DPDK patches and discussions
* [dpdk-dev] [PATCH 00/13] add hairpin feature
@ 2019-09-26  6:28 Ori Kam
  2019-09-26  6:28 ` [dpdk-dev] [PATCH 01/13] ethdev: support setup function for hairpin queue Ori Kam
                   ` (19 more replies)
  0 siblings, 20 replies; 186+ messages in thread
From: Ori Kam @ 2019-09-26  6:28 UTC (permalink / raw)
  Cc: dev, orika, jingjing.wu, stephen, wenzhuo.lu, bernard.iremonger,
	thomas, ferruh.yigit, arybchenko, viacheslavo

This patch set implements the hairpin feature.
The hairpin feature was introduced in RFC [1].

The hairpin feature (an alternative name could be "forward") acts as a
"bump on the wire", meaning that a packet received from the wire can be
modified using offloaded actions and then sent back to the wire without
application intervention, which saves CPU cycles.

Hairpin is the inverse of loopback, in which the application sends a
packet and then receives it again without the packet ever being sent to
the wire.

Hairpin can be used by a number of different NFVs, for example load
balancers, gateways and so on.

As can be seen from this description, a hairpin is basically an Rx
queue connected to a Tx queue.

During the design phase I considered two ways to implement this
feature: the first is adding a new rte_flow action, and the second is
creating a special kind of queue.

The advantages of the queue approach:
1. More control for the application, e.g. over the queue depth (the
amount of memory that should be used).
2. QoS support. QoS is normally a parameter of a queue, so this
approach integrates easily with such systems.
3. Native integration with the rte_flow API: simply setting the target
queue/RSS to a hairpin queue routes the traffic to the hairpin queue.
4. Support for queue offloads.

Each hairpin Rxq can be connected to one or more Txqs, which may belong
to different ports if the PMD supports it. The same goes the other way:
each hairpin Txq can be connected to one or more Rxqs.
This is why both the Txq setup and the Rxq setup take the hairpin
configuration structure.

From the PMD perspective, the number of Rxqs/Txqs is the total of
standard queues plus hairpin queues.

To configure a hairpin queue the user should call
rte_eth_rx_hairpin_queue_setup / rte_eth_tx_hairpin_queue_setup instead
of the normal queue setup functions.

The hairpin queues are not part of the normal RSS functionality.

To use the queues, the user simply creates a flow whose RSS/queue
action targets hairpin queues (see the usage sketch below).
The reasons for adding two new functions for hairpin queue setup are:
1. Avoid an API break.
2. Avoid extra and unused parameters.
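
To make the intended usage concrete, below is a minimal application-side
sketch assuming the API introduced in patch 01; the helper name, queue
indexes and descriptor count are illustrative only:

	#include <rte_ethdev.h>

	/*
	 * Bind Rx hairpin queue 'rxq' to Tx hairpin queue 'txq' on the
	 * same port. Both indexes must fall within the queue counts
	 * passed to rte_eth_dev_configure() (standard plus hairpin).
	 */
	static int
	setup_hairpin_pair(uint16_t port_id, uint16_t rxq, uint16_t txq,
			   uint16_t nb_desc)
	{
		struct rte_eth_hairpin_conf conf = { .peer_n = 1 };
		int ret;

		conf.peers[0].port = port_id;
		conf.peers[0].queue = txq;
		/* NULL rx_conf/tx_conf selects the driver defaults. */
		ret = rte_eth_rx_hairpin_queue_setup(port_id, rxq, nb_desc,
						     SOCKET_ID_ANY, NULL,
						     &conf);
		if (ret < 0)
			return ret;
		conf.peers[0].queue = rxq;
		return rte_eth_tx_hairpin_queue_setup(port_id, txq, nb_desc,
						      SOCKET_ID_ANY, NULL,
						      &conf);
	}

Traffic is then steered into the hairpin path with a regular rte_flow
rule whose QUEUE or RSS action targets the hairpin Rx queue; no new
flow primitives are required.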


This series must be applied after series [2].

[1] https://inbox.dpdk.org/dev/1565703468-55617-1-git-send-email-orika@mellanox.com/
[2] https://inbox.dpdk.org/dev/1569398015-6027-1-git-send-email-viacheslavo@mellanox.com/

Cc: wenzhuo.lu@intel.com
Cc: bernard.iremonger@intel.com
Cc: thomas@monjalon.net
Cc: ferruh.yigit@intel.com
Cc: arybchenko@solarflare.com
Cc: viacheslavo@mellanox.com


Ori Kam (13):
  ethdev: support setup function for hairpin queue
  net/mlx5: query hca hairpin capabilities
  net/mlx5: support Rx hairpin queues
  net/mlx5: prepare txq to work with different types
  net/mlx5: support Tx hairpin queues
  app/testpmd: add hairpin support
  net/mlx5: add hairpin binding function
  net/mlx5: add support for hairpin hrxq
  net/mlx5: add internal tag item and action
  net/mlx5: add id generation function
  net/mlx5: add default flows for hairpin
  net/mlx5: split hairpin flows
  doc: add hairpin feature

 app/test-pmd/parameters.c                |  12 +
 app/test-pmd/testpmd.c                   |  59 ++++-
 app/test-pmd/testpmd.h                   |   1 +
 doc/guides/rel_notes/release_19_11.rst   |   5 +
 drivers/net/mlx5/mlx5.c                  | 160 ++++++++++++-
 drivers/net/mlx5/mlx5.h                  |  65 ++++-
 drivers/net/mlx5/mlx5_devx_cmds.c        | 194 +++++++++++++++
 drivers/net/mlx5/mlx5_flow.c             | 393 ++++++++++++++++++++++++++++++-
 drivers/net/mlx5/mlx5_flow.h             |  73 +++++-
 drivers/net/mlx5/mlx5_flow_dv.c          | 231 +++++++++++++++++-
 drivers/net/mlx5/mlx5_flow_verbs.c       |  11 +-
 drivers/net/mlx5/mlx5_prm.h              | 127 +++++++++-
 drivers/net/mlx5/mlx5_rxq.c              | 323 ++++++++++++++++++++++---
 drivers/net/mlx5/mlx5_rxtx.c             |   2 +-
 drivers/net/mlx5/mlx5_rxtx.h             |  72 +++++-
 drivers/net/mlx5/mlx5_trigger.c          | 134 ++++++++++-
 drivers/net/mlx5/mlx5_txq.c              | 309 ++++++++++++++++++++----
 lib/librte_ethdev/rte_ethdev.c           | 213 +++++++++++++++++
 lib/librte_ethdev/rte_ethdev.h           | 145 ++++++++++++
 lib/librte_ethdev/rte_ethdev_core.h      |  18 ++
 lib/librte_ethdev/rte_ethdev_version.map |   4 +
 21 files changed, 2424 insertions(+), 127 deletions(-)

-- 
1.8.3.1



* [dpdk-dev] [PATCH 01/13] ethdev: support setup function for hairpin queue
  2019-09-26  6:28 [dpdk-dev] [PATCH 00/13] add hairpin feature Ori Kam
@ 2019-09-26  6:28 ` Ori Kam
  2019-09-26 12:18   ` Andrew Rybchenko
  2019-09-26  6:28 ` [dpdk-dev] [PATCH 02/13] net/mlx5: query hca hairpin capabilities Ori Kam
                   ` (18 subsequent siblings)
  19 siblings, 1 reply; 186+ messages in thread
From: Ori Kam @ 2019-09-26  6:28 UTC (permalink / raw)
  To: Thomas Monjalon, Ferruh Yigit, Andrew Rybchenko
  Cc: dev, orika, jingjing.wu, stephen

This commit introduces the Rx/Tx hairpin queue setup functions.

A hairpin queue is an Rx/Tx queue that is used by the NIC in order to
offload wire-to-wire traffic.

Each hairpin queue is bound to one or more queues of the other type.
For example, a Tx hairpin queue must be bound to at least one Rx
hairpin queue and vice versa.
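
For illustration, the binding is carried in the new
rte_eth_hairpin_conf structure passed to both setup functions. A Tx
hairpin queue bound to two Rx hairpin queues (the one-or-more case)
could be described as follows; the port/queue numbers are hypothetical
and whether more than one peer is accepted is PMD dependent:

	struct rte_eth_hairpin_conf conf = {
		.peer_n = 2,
		.peers = {
			{ .port = 0, .queue = 4 }, /* Rx hairpin queue 4 */
			{ .port = 0, .queue = 5 }, /* Rx hairpin queue 5 */
		},
	};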

Signed-off-by: Ori Kam <orika@mellanox.com>
---
 lib/librte_ethdev/rte_ethdev.c           | 213 +++++++++++++++++++++++++++++++
 lib/librte_ethdev/rte_ethdev.h           | 145 +++++++++++++++++++++
 lib/librte_ethdev/rte_ethdev_core.h      |  18 +++
 lib/librte_ethdev/rte_ethdev_version.map |   4 +
 4 files changed, 380 insertions(+)

diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
index 30b0c78..4021f38 100644
--- a/lib/librte_ethdev/rte_ethdev.c
+++ b/lib/librte_ethdev/rte_ethdev.c
@@ -1701,6 +1701,115 @@ struct rte_eth_dev *
 }
 
 int
+rte_eth_rx_hairpin_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
+			       uint16_t nb_rx_desc, unsigned int socket_id,
+			       const struct rte_eth_rxconf *rx_conf,
+			       const struct rte_eth_hairpin_conf *hairpin_conf)
+{
+	int ret;
+	struct rte_eth_dev *dev;
+	struct rte_eth_dev_info dev_info;
+	struct rte_eth_rxconf local_conf;
+	void **rxq;
+
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
+
+	dev = &rte_eth_devices[port_id];
+	if (rx_queue_id >= dev->data->nb_rx_queues) {
+		RTE_ETHDEV_LOG(ERR, "Invalid RX queue_id=%u\n", rx_queue_id);
+		return -EINVAL;
+	}
+
+	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_infos_get, -ENOTSUP);
+	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_hairpin_queue_setup,
+				-ENOTSUP);
+
+	rte_eth_dev_info_get(port_id, &dev_info);
+
+	/* Use default specified by driver, if nb_rx_desc is zero */
+	if (nb_rx_desc == 0) {
+		nb_rx_desc = dev_info.default_rxportconf.ring_size;
+		/* If driver default is also zero, fall back on EAL default */
+		if (nb_rx_desc == 0)
+			nb_rx_desc = RTE_ETH_DEV_FALLBACK_RX_RINGSIZE;
+	}
+
+	if (nb_rx_desc > dev_info.rx_desc_lim.nb_max ||
+			nb_rx_desc < dev_info.rx_desc_lim.nb_min ||
+			nb_rx_desc % dev_info.rx_desc_lim.nb_align != 0) {
+
+		RTE_ETHDEV_LOG(ERR,
+			       "Invalid value for nb_rx_desc(=%hu), should be: "
+			       "<= %hu, >= %hu, and a product of %hu\n",
+			nb_rx_desc, dev_info.rx_desc_lim.nb_max,
+			dev_info.rx_desc_lim.nb_min,
+			dev_info.rx_desc_lim.nb_align);
+		return -EINVAL;
+	}
+
+	if (dev->data->dev_started &&
+		!(dev_info.dev_capa &
+			RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP))
+		return -EBUSY;
+
+	if (dev->data->dev_started &&
+		(dev->data->rx_queue_state[rx_queue_id] !=
+			RTE_ETH_QUEUE_STATE_STOPPED))
+		return -EBUSY;
+
+	rxq = dev->data->rx_queues;
+	if (rxq[rx_queue_id]) {
+		RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_release,
+					-ENOTSUP);
+		(*dev->dev_ops->rx_queue_release)(rxq[rx_queue_id]);
+		rxq[rx_queue_id] = NULL;
+	}
+
+	if (rx_conf == NULL)
+		rx_conf = &dev_info.default_rxconf;
+
+	local_conf = *rx_conf;
+
+	/*
+	 * If an offloading has already been enabled in
+	 * rte_eth_dev_configure(), it has been enabled on all queues,
+	 * so there is no need to enable it in this queue again.
+	 * The local_conf.offloads input to underlying PMD only carries
+	 * those offloadings which are only enabled on this queue and
+	 * not enabled on all queues.
+	 */
+	local_conf.offloads &= ~dev->data->dev_conf.rxmode.offloads;
+
+	/*
+	 * New added offloadings for this queue are those not enabled in
+	 * rte_eth_dev_configure() and they must be per-queue type.
+	 * A pure per-port offloading can't be enabled on a queue while
+	 * disabled on another queue. A pure per-port offloading can't
+	 * be enabled for any queue as new added one if it hasn't been
+	 * enabled in rte_eth_dev_configure().
+	 */
+	if ((local_conf.offloads & dev_info.rx_queue_offload_capa) !=
+	     local_conf.offloads) {
+		RTE_ETHDEV_LOG(ERR,
+			"Ethdev port_id=%d rx_queue_id=%d, "
+			"new added offloads 0x%"PRIx64" must be "
+			"within per-queue offload capabilities "
+			"0x%"PRIx64" in %s()\n",
+			port_id, rx_queue_id, local_conf.offloads,
+			dev_info.rx_queue_offload_capa,
+			__func__);
+		return -EINVAL;
+	}
+
+	ret = (*dev->dev_ops->rx_hairpin_queue_setup)(dev, rx_queue_id,
+						      nb_rx_desc, socket_id,
+						      &local_conf,
+						      hairpin_conf);
+
+	return eth_err(port_id, ret);
+}
+
+int
 rte_eth_tx_queue_setup(uint16_t port_id, uint16_t tx_queue_id,
 		       uint16_t nb_tx_desc, unsigned int socket_id,
 		       const struct rte_eth_txconf *tx_conf)
@@ -1799,6 +1908,110 @@ struct rte_eth_dev *
 		       tx_queue_id, nb_tx_desc, socket_id, &local_conf));
 }
 
+int
+rte_eth_tx_hairpin_queue_setup(uint16_t port_id, uint16_t tx_queue_id,
+			       uint16_t nb_tx_desc, unsigned int socket_id,
+			       const struct rte_eth_txconf *tx_conf,
+			       const struct rte_eth_hairpin_conf *hairpin_conf)
+{
+	struct rte_eth_dev *dev;
+	struct rte_eth_dev_info dev_info;
+	struct rte_eth_txconf local_conf;
+	void **txq;
+
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
+
+	dev = &rte_eth_devices[port_id];
+	if (tx_queue_id >= dev->data->nb_tx_queues) {
+		RTE_ETHDEV_LOG(ERR, "Invalid TX queue_id=%u\n", tx_queue_id);
+		return -EINVAL;
+	}
+
+	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_infos_get, -ENOTSUP);
+	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_hairpin_queue_setup,
+				-ENOTSUP);
+
+	rte_eth_dev_info_get(port_id, &dev_info);
+
+	/* Use default specified by driver, if nb_tx_desc is zero */
+	if (nb_tx_desc == 0) {
+		nb_tx_desc = dev_info.default_txportconf.ring_size;
+		/* If driver default is zero, fall back on EAL default */
+		if (nb_tx_desc == 0)
+			nb_tx_desc = RTE_ETH_DEV_FALLBACK_TX_RINGSIZE;
+	}
+	if (nb_tx_desc > dev_info.tx_desc_lim.nb_max ||
+	    nb_tx_desc < dev_info.tx_desc_lim.nb_min ||
+	    nb_tx_desc % dev_info.tx_desc_lim.nb_align != 0) {
+		RTE_ETHDEV_LOG(ERR,
+			       "Invalid value for nb_tx_desc(=%hu), "
+			       "should be: <= %hu, >= %hu, and a product of "
+			       " %hu\n",
+			       nb_tx_desc, dev_info.tx_desc_lim.nb_max,
+			       dev_info.tx_desc_lim.nb_min,
+			       dev_info.tx_desc_lim.nb_align);
+		return -EINVAL;
+	}
+
+	if (dev->data->dev_started &&
+		!(dev_info.dev_capa &
+		  RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP))
+		return -EBUSY;
+
+	if (dev->data->dev_started &&
+		(dev->data->tx_queue_state[tx_queue_id] !=
+		 RTE_ETH_QUEUE_STATE_STOPPED))
+		return -EBUSY;
+
+	txq = dev->data->tx_queues;
+	if (txq[tx_queue_id]) {
+		RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_queue_release,
+					-ENOTSUP);
+		(*dev->dev_ops->tx_queue_release)(txq[tx_queue_id]);
+		txq[tx_queue_id] = NULL;
+	}
+
+	if (tx_conf == NULL)
+		tx_conf = &dev_info.default_txconf;
+
+	local_conf = *tx_conf;
+
+	/*
+	 * If an offloading has already been enabled in
+	 * rte_eth_dev_configure(), it has been enabled on all queues,
+	 * so there is no need to enable it in this queue again.
+	 * The local_conf.offloads input to underlying PMD only carries
+	 * those offloadings which are only enabled on this queue and
+	 * not enabled on all queues.
+	 */
+	local_conf.offloads &= ~dev->data->dev_conf.txmode.offloads;
+
+	/*
+	 * New added offloadings for this queue are those not enabled in
+	 * rte_eth_dev_configure() and they must be per-queue type.
+	 * A pure per-port offloading can't be enabled on a queue while
+	 * disabled on another queue. A pure per-port offloading can't
+	 * be enabled for any queue as new added one if it hasn't been
+	 * enabled in rte_eth_dev_configure().
+	 */
+	if ((local_conf.offloads & dev_info.tx_queue_offload_capa) !=
+	     local_conf.offloads) {
+		RTE_ETHDEV_LOG(ERR,
+			       "Ethdev port_id=%d tx_queue_id=%d, new added "
+			       "offloads 0x%"PRIx64" must be within "
+			       "per-queue offload capabilities 0x%"PRIx64" "
+			       "in %s()\n",
+			       port_id, tx_queue_id, local_conf.offloads,
+			       dev_info.tx_queue_offload_capa,
+			       __func__);
+		return -EINVAL;
+	}
+
+	return eth_err(port_id, (*dev->dev_ops->tx_hairpin_queue_setup)
+		       (dev, tx_queue_id, nb_tx_desc, socket_id, &local_conf,
+			hairpin_conf));
+}
+
 void
 rte_eth_tx_buffer_drop_callback(struct rte_mbuf **pkts, uint16_t unsent,
 		void *userdata __rte_unused)
diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
index 475dbda..b3b1597 100644
--- a/lib/librte_ethdev/rte_ethdev.h
+++ b/lib/librte_ethdev/rte_ethdev.h
@@ -803,6 +803,30 @@ struct rte_eth_txconf {
 	uint64_t offloads;
 };
 
+#define RTE_ETH_MAX_HAIRPIN_PEERS 32
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * A structure used to hold hairpin peer data.
+ */
+struct rte_eth_hairpin_peer {
+	uint16_t port; /**< Peer port. */
+	uint16_t queue; /**< Peer queue. */
+};
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * A structure used to configure hairpin binding.
+ */
+struct rte_eth_hairpin_conf {
+	uint16_t peer_n; /**< The number of peers. */
+	struct rte_eth_hairpin_peer peers[RTE_ETH_MAX_HAIRPIN_PEERS];
+};
+
 /**
  * A structure contains information about HW descriptor ring limitations.
  */
@@ -1769,6 +1793,60 @@ int rte_eth_rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
 		struct rte_mempool *mb_pool);
 
 /**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Allocate and set up a hairpin receive queue for an Ethernet device.
+ *
+ * The function sets up the selected queue to be used in hairpin mode.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param rx_queue_id
+ *   The index of the receive queue to set up.
+ *   The value must be in the range [0, nb_rx_queue - 1] previously supplied
+ *   to rte_eth_dev_configure().
+ * @param nb_rx_desc
+ *   The number of receive descriptors to allocate for the receive ring.
+ * @param socket_id
+ *   The *socket_id* argument is the socket identifier in case of NUMA.
+ *   The value can be *SOCKET_ID_ANY* if there is no NUMA constraint for
+ *   the DMA memory allocated for the receive descriptors of the ring.
+ * @param rx_conf
+ *   The pointer to the configuration data to be used for the receive queue.
+ *   NULL value is allowed, in which case default RX configuration
+ *   will be used.
+ *   The *rx_conf* structure contains an *rx_thresh* structure with the values
+ *   of the Prefetch, Host, and Write-Back threshold registers of the receive
+ *   ring.
+ *   In addition it contains the hardware offloads features to activate using
+ *   the DEV_RX_OFFLOAD_* flags.
+ *   If an offloading set in rx_conf->offloads
+ *   hasn't been set in the input argument eth_conf->rxmode.offloads
+ *   to rte_eth_dev_configure(), it is a new added offloading, it must be
+ *   per-queue type and it is enabled for the queue.
+ *   No need to repeat any bit in rx_conf->offloads which has already been
+ *   enabled in rte_eth_dev_configure() at port level. An offloading enabled
+ *   at port level can't be disabled at queue level.
+ * @param hairpin_conf
+ *   The pointer to the hairpin binding configuration.
+ * @return
+ *   - 0: Success, receive queue correctly set up.
+ *   - -EINVAL: The size of network buffers which can be allocated from the
+ *      memory pool does not fit the various buffer sizes allowed by the
+ *      device controller.
+ *   - -ENOMEM: Unable to allocate the receive ring descriptors or to
+ *      allocate network memory buffers from the memory pool when
+ *      initializing receive descriptors.
+ */
+__rte_experimental
+int rte_eth_rx_hairpin_queue_setup
+	(uint16_t port_id, uint16_t rx_queue_id,
+	 uint16_t nb_rx_desc, unsigned int socket_id,
+	 const struct rte_eth_rxconf *rx_conf,
+	 const struct rte_eth_hairpin_conf *hairpin_conf);
+
+/**
  * Allocate and set up a transmit queue for an Ethernet device.
  *
  * @param port_id
@@ -1821,6 +1899,73 @@ int rte_eth_tx_queue_setup(uint16_t port_id, uint16_t tx_queue_id,
 		const struct rte_eth_txconf *tx_conf);
 
 /**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Allocate and set up a transmit hairpin queue for an Ethernet device.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param tx_queue_id
+ *   The index of the transmit queue to set up.
+ *   The value must be in the range [0, nb_tx_queue - 1] previously supplied
+ *   to rte_eth_dev_configure().
+ * @param nb_tx_desc
+ *   The number of transmit descriptors to allocate for the transmit ring.
+ * @param socket_id
+ *   The *socket_id* argument is the socket identifier in case of NUMA.
+ *   Its value can be *SOCKET_ID_ANY* if there is no NUMA constraint for
+ *   the DMA memory allocated for the transmit descriptors of the ring.
+ * @param tx_conf
+ *   The pointer to the configuration data to be used for the transmit queue.
+ *   NULL value is allowed, in which case default TX configuration
+ *   will be used.
+ *   The *tx_conf* structure contains the following data:
+ *   - The *tx_thresh* structure with the values of the Prefetch, Host, and
+ *     Write-Back threshold registers of the transmit ring.
+ *     When setting Write-Back threshold to a value greater than zero,
+ *     *tx_rs_thresh* value should be explicitly set to one.
+ *   - The *tx_free_thresh* value indicates the [minimum] number of network
+ *     buffers that must be pending in the transmit ring to trigger their
+ *     [implicit] freeing by the driver transmit function.
+ *   - The *tx_rs_thresh* value indicates the [minimum] number of transmit
+ *     descriptors that must be pending in the transmit ring before setting the
+ *     RS bit on a descriptor by the driver transmit function.
+ *     The *tx_rs_thresh* value should be less than or equal to the
+ *     *tx_free_thresh* value, and both of them should be less than
+ *     *nb_tx_desc* - 3.
+ *   - The *txq_flags* member contains flags to pass to the TX queue setup
+ *     function to configure the behavior of the TX queue. This should be set
+ *     to 0 if no special configuration is required.
+ *     This API is obsolete and will be deprecated. Applications
+ *     should set it to ETH_TXQ_FLAGS_IGNORE and use
+ *     the offloads field below.
+ *   - The *offloads* member contains Tx offloads to be enabled.
+ *     If an offloading set in tx_conf->offloads
+ *     hasn't been set in the input argument eth_conf->txmode.offloads
+ *     to rte_eth_dev_configure(), it is a new added offloading, it must be
+ *     per-queue type and it is enabled for the queue.
+ *     No need to repeat any bit in tx_conf->offloads which has already been
+ *     enabled in rte_eth_dev_configure() at port level. An offloading enabled
+ *     at port level can't be disabled at queue level.
+ *
+ *     Note that setting *tx_free_thresh* or *tx_rs_thresh* value to 0 forces
+ *     the transmit function to use default values.
+ * @param hairpin_conf
+ *   The hairpin binding configuration.
+ *
+ * @return
+ *   - 0: Success, the transmit queue is correctly set up.
+ *   - -ENOMEM: Unable to allocate the transmit ring descriptors.
+ */
+__rte_experimental
+int rte_eth_tx_hairpin_queue_setup
+	(uint16_t port_id, uint16_t tx_queue_id,
+	 uint16_t nb_tx_desc, unsigned int socket_id,
+	 const struct rte_eth_txconf *tx_conf,
+	 const struct rte_eth_hairpin_conf *hairpin_conf);
+
+/**
  * Return the NUMA socket to which an Ethernet device is connected
  *
  * @param port_id
diff --git a/lib/librte_ethdev/rte_ethdev_core.h b/lib/librte_ethdev/rte_ethdev_core.h
index 2394b32..bc40708 100644
--- a/lib/librte_ethdev/rte_ethdev_core.h
+++ b/lib/librte_ethdev/rte_ethdev_core.h
@@ -126,6 +126,13 @@ typedef int (*eth_rx_queue_setup_t)(struct rte_eth_dev *dev,
 				    struct rte_mempool *mb_pool);
 /**< @internal Set up a receive queue of an Ethernet device. */
 
+typedef int (*eth_rx_hairpin_queue_setup_t)
+	(struct rte_eth_dev *dev, uint16_t rx_queue_id,
+	 uint16_t nb_rx_desc, unsigned int socket_id,
+	 const struct rte_eth_rxconf *rx_conf,
+	 const struct rte_eth_hairpin_conf *hairpin_conf);
+/**< @internal Set up a receive hairpin queue of an Ethernet device. */
+
 typedef int (*eth_tx_queue_setup_t)(struct rte_eth_dev *dev,
 				    uint16_t tx_queue_id,
 				    uint16_t nb_tx_desc,
@@ -133,6 +140,13 @@ typedef int (*eth_tx_queue_setup_t)(struct rte_eth_dev *dev,
 				    const struct rte_eth_txconf *tx_conf);
 /**< @internal Setup a transmit queue of an Ethernet device. */
 
+typedef int (*eth_tx_hairpin_queue_setup_t)
+	(struct rte_eth_dev *dev, uint16_t tx_queue_id,
+	 uint16_t nb_tx_desc, unsigned int socket_id,
+	 const struct rte_eth_txconf *tx_conf,
+	 const struct rte_eth_hairpin_conf *hairpin_conf);
+/**< @internal Setup a transmit hairpin queue of an Ethernet device. */
+
 typedef int (*eth_rx_enable_intr_t)(struct rte_eth_dev *dev,
 				    uint16_t rx_queue_id);
 /**< @internal Enable interrupt of a receive queue of an Ethernet device. */
@@ -433,6 +447,8 @@ struct eth_dev_ops {
 	eth_queue_start_t          tx_queue_start;/**< Start TX for a queue. */
 	eth_queue_stop_t           tx_queue_stop; /**< Stop TX for a queue. */
 	eth_rx_queue_setup_t       rx_queue_setup;/**< Set up device RX queue. */
+	eth_rx_hairpin_queue_setup_t rx_hairpin_queue_setup;
+	/**< Set up device RX hairpin queue. */
 	eth_queue_release_t        rx_queue_release; /**< Release RX queue. */
 	eth_rx_queue_count_t       rx_queue_count;
 	/**< Get the number of used RX descriptors. */
@@ -444,6 +460,8 @@ struct eth_dev_ops {
 	eth_rx_enable_intr_t       rx_queue_intr_enable;  /**< Enable Rx queue interrupt. */
 	eth_rx_disable_intr_t      rx_queue_intr_disable; /**< Disable Rx queue interrupt. */
 	eth_tx_queue_setup_t       tx_queue_setup;/**< Set up device TX queue. */
+	eth_tx_hairpin_queue_setup_t tx_hairpin_queue_setup;
+	/**< Set up device TX hairpin queue. */
 	eth_queue_release_t        tx_queue_release; /**< Release TX queue. */
 	eth_tx_done_cleanup_t      tx_done_cleanup;/**< Free tx ring mbufs */
 
diff --git a/lib/librte_ethdev/rte_ethdev_version.map b/lib/librte_ethdev/rte_ethdev_version.map
index 6df42a4..99e05fe 100644
--- a/lib/librte_ethdev/rte_ethdev_version.map
+++ b/lib/librte_ethdev/rte_ethdev_version.map
@@ -283,4 +283,8 @@ EXPERIMENTAL {
 
 	# added in 19.08
 	rte_eth_read_clock;
+
+	# added in 19.11
+	rte_eth_rx_hairpin_queue_setup;
+	rte_eth_tx_hairpin_queue_setup;
 };
-- 
1.8.3.1



* [dpdk-dev] [PATCH 02/13] net/mlx5: query hca hairpin capabilities
  2019-09-26  6:28 [dpdk-dev] [PATCH 00/13] add hairpin feature Ori Kam
  2019-09-26  6:28 ` [dpdk-dev] [PATCH 01/13] ethdev: support setup function for hairpin queue Ori Kam
@ 2019-09-26  6:28 ` Ori Kam
  2019-09-26  9:31   ` Slava Ovsiienko
  2019-09-26  6:28 ` [dpdk-dev] [PATCH 03/13] net/mlx5: support Rx hairpin queues Ori Kam
                   ` (17 subsequent siblings)
  19 siblings, 1 reply; 186+ messages in thread
From: Ori Kam @ 2019-09-26  6:28 UTC (permalink / raw)
  To: Matan Azrad, Shahaf Shuler, Viacheslav Ovsiienko
  Cc: dev, orika, jingjing.wu, stephen

This commit queries and stores the hairpin capabilities of the device.

Those capabilities will be used when creating hairpin queues.
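
As a sketch of how these capability fields might be consumed later in
the series (the variable names, e.g. nb_hairpin_queues, are
illustrative, not actual code from these patches):

	struct mlx5_hca_attr *hca = &priv->config.hca_attr;

	if (!hca->hairpin)
		return -ENOTSUP; /* device has no hairpin support */
	/* log_max_hairpin_queues is a log2 limit; hypothetical check: */
	if (nb_hairpin_queues > (1u << hca->log_max_hairpin_queues))
		return -EINVAL;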

Signed-off-by: Ori Kam <orika@mellanox.com>
---
 drivers/net/mlx5/mlx5.h           | 4 ++++
 drivers/net/mlx5/mlx5_devx_cmds.c | 7 +++++++
 2 files changed, 11 insertions(+)

diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index d4d2ca8..cd896c8 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -184,6 +184,10 @@ struct mlx5_hca_attr {
 	uint32_t tunnel_lro_vxlan:1;
 	uint32_t lro_max_msg_sz_mode:2;
 	uint32_t lro_timer_supported_periods[MLX5_LRO_NUM_SUPP_PERIODS];
+	uint32_t hairpin:1;
+	uint32_t log_max_hairpin_queues:5;
+	uint32_t log_max_hairpin_wq_data_sz:5;
+	uint32_t log_max_hairpin_num_packets:5;
 };
 
 /* Flow list . */
diff --git a/drivers/net/mlx5/mlx5_devx_cmds.c b/drivers/net/mlx5/mlx5_devx_cmds.c
index acfe1de..b072c37 100644
--- a/drivers/net/mlx5/mlx5_devx_cmds.c
+++ b/drivers/net/mlx5/mlx5_devx_cmds.c
@@ -327,6 +327,13 @@ struct mlx5_devx_obj *
 	attr->flow_counters_dump = MLX5_GET(cmd_hca_cap, hcattr,
 					    flow_counters_dump);
 	attr->eswitch_manager = MLX5_GET(cmd_hca_cap, hcattr, eswitch_manager);
+	attr->hairpin = MLX5_GET(cmd_hca_cap, hcattr, hairpin);
+	attr->log_max_hairpin_queues = MLX5_GET(cmd_hca_cap, hcattr,
+						log_max_hairpin_queues);
+	attr->log_max_hairpin_wq_data_sz = MLX5_GET(cmd_hca_cap, hcattr,
+						    log_max_hairpin_wq_data_sz);
+	attr->log_max_hairpin_num_packets = MLX5_GET
+		(cmd_hca_cap, hcattr, log_max_hairpin_num_packets);
 	attr->eth_net_offloads = MLX5_GET(cmd_hca_cap, hcattr,
 					  eth_net_offloads);
 	attr->eth_virt = MLX5_GET(cmd_hca_cap, hcattr, eth_virt);
-- 
1.8.3.1



* [dpdk-dev] [PATCH 03/13] net/mlx5: support Rx hairpin queues
  2019-09-26  6:28 [dpdk-dev] [PATCH 00/13] add hairpin feature Ori Kam
  2019-09-26  6:28 ` [dpdk-dev] [PATCH 01/13] ethdev: support setup function for hairpin queue Ori Kam
  2019-09-26  6:28 ` [dpdk-dev] [PATCH 02/13] net/mlx5: query hca hairpin capabilities Ori Kam
@ 2019-09-26  6:28 ` Ori Kam
  2019-09-26  9:32   ` Slava Ovsiienko
  2019-09-26  6:29 ` [dpdk-dev] [PATCH 04/13] net/mlx5: prepare txq to work with different types Ori Kam
                   ` (16 subsequent siblings)
  19 siblings, 1 reply; 186+ messages in thread
From: Ori Kam @ 2019-09-26  6:28 UTC (permalink / raw)
  To: Matan Azrad, Shahaf Shuler, Viacheslav Ovsiienko
  Cc: dev, orika, jingjing.wu, stephen

This commit adds support for creating Rx hairpin queues.
A hairpin queue is created using DevX and is used only by the HW,
so the data-path part of the RQ is left unused.
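
Because the RQ data path is unused, every loop that prepares data-path
resources (mempools, MR caches) now recovers the control structure and
skips non-standard queues. The recurring pattern in this patch is:

	struct mlx5_rxq_data *rxq = (*priv->rxqs)[i];
	struct mlx5_rxq_ctrl *rxq_ctrl =
		container_of(rxq, struct mlx5_rxq_ctrl, rxq);

	if (rxq == NULL || rxq_ctrl->type != MLX5_RXQ_TYPE_STANDARD)
		continue; /* hairpin queues own no mempool or MR cache */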

Signed-off-by: Ori Kam <orika@mellanox.com>
---
 drivers/net/mlx5/mlx5.c         |   2 +
 drivers/net/mlx5/mlx5_rxq.c     | 286 ++++++++++++++++++++++++++++++++++++----
 drivers/net/mlx5/mlx5_rxtx.h    |  17 +++
 drivers/net/mlx5/mlx5_trigger.c |   7 +
 4 files changed, 288 insertions(+), 24 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index be01db9..81894fb 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -974,6 +974,7 @@ struct mlx5_dev_spawn_data {
 	.dev_supported_ptypes_get = mlx5_dev_supported_ptypes_get,
 	.vlan_filter_set = mlx5_vlan_filter_set,
 	.rx_queue_setup = mlx5_rx_queue_setup,
+	.rx_hairpin_queue_setup = mlx5_rx_hairpin_queue_setup,
 	.tx_queue_setup = mlx5_tx_queue_setup,
 	.rx_queue_release = mlx5_rx_queue_release,
 	.tx_queue_release = mlx5_tx_queue_release,
@@ -1040,6 +1041,7 @@ struct mlx5_dev_spawn_data {
 	.dev_supported_ptypes_get = mlx5_dev_supported_ptypes_get,
 	.vlan_filter_set = mlx5_vlan_filter_set,
 	.rx_queue_setup = mlx5_rx_queue_setup,
+	.rx_hairpin_queue_setup = mlx5_rx_hairpin_queue_setup,
 	.tx_queue_setup = mlx5_tx_queue_setup,
 	.rx_queue_release = mlx5_rx_queue_release,
 	.tx_queue_release = mlx5_tx_queue_release,
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index a1fdeef..a673da9 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -106,21 +106,25 @@
 	struct mlx5_priv *priv = dev->data->dev_private;
 	uint16_t i;
 	uint16_t n = 0;
+	uint16_t n_ibv = 0;
 
 	if (mlx5_check_mprq_support(dev) < 0)
 		return 0;
 	/* All the configured queues should be enabled. */
 	for (i = 0; i < priv->rxqs_n; ++i) {
 		struct mlx5_rxq_data *rxq = (*priv->rxqs)[i];
+		struct mlx5_rxq_ctrl *rxq_ctrl = container_of
+			(rxq, struct mlx5_rxq_ctrl, rxq);
 
-		if (!rxq)
+		if (rxq == NULL || rxq_ctrl->type != MLX5_RXQ_TYPE_STANDARD)
 			continue;
+		n_ibv++;
 		if (mlx5_rxq_mprq_enabled(rxq))
 			++n;
 	}
 	/* Multi-Packet RQ can't be partially configured. */
-	assert(n == 0 || n == priv->rxqs_n);
-	return n == priv->rxqs_n;
+	assert(n == 0 || n == n_ibv);
+	return n == n_ibv;
 }
 
 /**
@@ -427,6 +431,7 @@
 }
 
 /**
+ * Rx queue presetup checks.
  *
  * @param dev
  *   Pointer to Ethernet device structure.
@@ -434,25 +439,14 @@
  *   RX queue index.
  * @param desc
  *   Number of descriptors to configure in queue.
- * @param socket
- *   NUMA socket on which memory must be allocated.
- * @param[in] conf
- *   Thresholds parameters.
- * @param mp
- *   Memory pool for buffer allocations.
  *
  * @return
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
-int
-mlx5_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
-		    unsigned int socket, const struct rte_eth_rxconf *conf,
-		    struct rte_mempool *mp)
+static int
+mlx5_rx_queue_pre_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
-	struct mlx5_rxq_data *rxq = (*priv->rxqs)[idx];
-	struct mlx5_rxq_ctrl *rxq_ctrl =
-		container_of(rxq, struct mlx5_rxq_ctrl, rxq);
 
 	if (!rte_is_power_of_2(desc)) {
 		desc = 1 << log2above(desc);
@@ -476,6 +470,41 @@
 		return -rte_errno;
 	}
 	mlx5_rxq_release(dev, idx);
+	return 0;
+}
+
+/**
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ * @param idx
+ *   RX queue index.
+ * @param desc
+ *   Number of descriptors to configure in queue.
+ * @param socket
+ *   NUMA socket on which memory must be allocated.
+ * @param[in] conf
+ *   Thresholds parameters.
+ * @param mp
+ *   Memory pool for buffer allocations.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
+		    unsigned int socket, const struct rte_eth_rxconf *conf,
+		    struct rte_mempool *mp)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_rxq_data *rxq = (*priv->rxqs)[idx];
+	struct mlx5_rxq_ctrl *rxq_ctrl =
+		container_of(rxq, struct mlx5_rxq_ctrl, rxq);
+	int res;
+
+	res = mlx5_rx_queue_pre_setup(dev, idx, desc);
+	if (res)
+		return res;
 	rxq_ctrl = mlx5_rxq_new(dev, idx, desc, socket, conf, mp);
 	if (!rxq_ctrl) {
 		DRV_LOG(ERR, "port %u unable to allocate queue index %u",
@@ -490,6 +519,62 @@
 }
 
 /**
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ * @param idx
+ *   RX queue index.
+ * @param desc
+ *   Number of descriptors to configure in queue.
+ * @param socket
+ *   NUMA socket on which memory must be allocated.
+ * @param[in] conf
+ *   Thresholds parameters.
+ * @param hairpin_conf
+ *   Hairpin configuration parameters.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_rx_hairpin_queue_setup(struct rte_eth_dev *dev, uint16_t idx,
+			    uint16_t desc, unsigned int socket,
+			    const struct rte_eth_rxconf *conf,
+			    const struct rte_eth_hairpin_conf *hairpin_conf)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_rxq_data *rxq = (*priv->rxqs)[idx];
+	struct mlx5_rxq_ctrl *rxq_ctrl =
+		container_of(rxq, struct mlx5_rxq_ctrl, rxq);
+	int res;
+
+	res = mlx5_rx_queue_pre_setup(dev, idx, desc);
+	if (res)
+		return res;
+	if (hairpin_conf->peer_n != 1 ||
+	    hairpin_conf->peers[0].port != dev->data->port_id ||
+	    hairpin_conf->peers[0].queue >= priv->txqs_n) {
+		DRV_LOG(ERR, "port %u unable to setup hairpin queue index %u:"
+			" invalid hairpin configuration", dev->data->port_id,
+			idx);
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+	rxq_ctrl = mlx5_rxq_hairpin_new(dev, idx, desc, socket, conf,
+					hairpin_conf);
+	if (!rxq_ctrl) {
+		DRV_LOG(ERR, "port %u unable to allocate queue index %u",
+			dev->data->port_id, idx);
+		rte_errno = ENOMEM;
+		return -rte_errno;
+	}
+	DRV_LOG(DEBUG, "port %u adding Rx queue %u to list",
+		dev->data->port_id, idx);
+	(*priv->rxqs)[idx] = &rxq_ctrl->rxq;
+	return 0;
+}
+
+/**
  * DPDK callback to release a RX queue.
  *
  * @param dpdk_rxq
@@ -561,6 +646,24 @@
 }
 
 /**
+ * Release an Rx hairpin related resources.
+ *
+ * @param rxq_obj
+ *   Hairpin Rx queue object.
+ */
+static void
+rxq_obj_hairpin_release(struct mlx5_rxq_obj *rxq_obj)
+{
+	struct mlx5_devx_modify_rq_attr rq_attr = { 0 };
+
+	assert(rxq_obj);
+	rq_attr.state = MLX5_RQC_STATE_RST;
+	rq_attr.rq_state = MLX5_RQC_STATE_RDY;
+	mlx5_devx_cmd_modify_rq(rxq_obj->rq, &rq_attr);
+	claim_zero(mlx5_devx_cmd_destroy(rxq_obj->rq));
+}
+
+/**
  * Release an Rx verbs/DevX queue object.
  *
  * @param rxq_obj
@@ -577,14 +680,22 @@
 		assert(rxq_obj->wq);
 	assert(rxq_obj->cq);
 	if (rte_atomic32_dec_and_test(&rxq_obj->refcnt)) {
-		rxq_free_elts(rxq_obj->rxq_ctrl);
-		if (rxq_obj->type == MLX5_RXQ_OBJ_TYPE_IBV) {
+		switch (rxq_obj->type) {
+		case MLX5_RXQ_OBJ_TYPE_IBV:
+			rxq_free_elts(rxq_obj->rxq_ctrl);
 			claim_zero(mlx5_glue->destroy_wq(rxq_obj->wq));
-		} else if (rxq_obj->type == MLX5_RXQ_OBJ_TYPE_DEVX_RQ) {
+			claim_zero(mlx5_glue->destroy_cq(rxq_obj->cq));
+			break;
+		case MLX5_RXQ_OBJ_TYPE_DEVX_RQ:
+			rxq_free_elts(rxq_obj->rxq_ctrl);
 			claim_zero(mlx5_devx_cmd_destroy(rxq_obj->rq));
 			rxq_release_rq_resources(rxq_obj->rxq_ctrl);
+			claim_zero(mlx5_glue->destroy_cq(rxq_obj->cq));
+			break;
+		case MLX5_RXQ_OBJ_TYPE_DEVX_HAIRPIN:
+			rxq_obj_hairpin_release(rxq_obj);
+			break;
 		}
-		claim_zero(mlx5_glue->destroy_cq(rxq_obj->cq));
 		if (rxq_obj->channel)
 			claim_zero(mlx5_glue->destroy_comp_channel
 				   (rxq_obj->channel));
@@ -1132,6 +1243,70 @@
 }
 
 /**
+ * Create the Rx hairpin queue object.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param idx
+ *   Queue index in DPDK Rx queue array
+ *
+ * @return
+ *   The hairpin DevX object initialised, NULL otherwise and rte_errno is set.
+ */
+static struct mlx5_rxq_obj *
+mlx5_rxq_obj_hairpin_new(struct rte_eth_dev *dev, uint16_t idx)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_rxq_data *rxq_data = (*priv->rxqs)[idx];
+	struct mlx5_rxq_ctrl *rxq_ctrl =
+		container_of(rxq_data, struct mlx5_rxq_ctrl, rxq);
+	struct mlx5_devx_create_rq_attr attr = { 0 };
+	struct mlx5_rxq_obj *tmpl = NULL;
+	int ret = 0;
+
+	assert(rxq_data);
+	assert(!rxq_ctrl->obj);
+	tmpl = rte_calloc_socket(__func__, 1, sizeof(*tmpl), 0,
+				 rxq_ctrl->socket);
+	if (!tmpl) {
+		DRV_LOG(ERR,
+			"port %u Rx hairpin queue %u cannot allocate resources",
+			dev->data->port_id, rxq_data->idx);
+		rte_errno = ENOMEM;
+		goto error;
+	}
+	tmpl->type = MLX5_RXQ_OBJ_TYPE_DEVX_HAIRPIN;
+	tmpl->rxq_ctrl = rxq_ctrl;
+	attr.hairpin = 1;
+	/* Workaround for hairpin startup */
+	attr.wq_attr.log_hairpin_num_packets = log2above(32);
+	/* Workaround for packets larger than 1KB */
+	attr.wq_attr.log_hairpin_data_sz =
+			priv->config.hca_attr.log_max_hairpin_wq_data_sz;
+	tmpl->rq = mlx5_devx_cmd_create_rq(priv->sh->ctx, &attr,
+					   rxq_ctrl->socket);
+	if (!tmpl->rq) {
+		DRV_LOG(ERR,
+			"port %u Rx hairpin queue %u can't create rq object",
+			dev->data->port_id, idx);
+		rte_errno = errno;
+		goto error;
+	}
+	DRV_LOG(DEBUG, "port %u rxq %u updated with %p", dev->data->port_id,
+		idx, (void *)&tmpl);
+	rte_atomic32_inc(&tmpl->refcnt);
+	LIST_INSERT_HEAD(&priv->rxqsobj, tmpl, next);
+	priv->verbs_alloc_ctx.type = MLX5_VERBS_ALLOC_TYPE_NONE;
+	return tmpl;
+error:
+	ret = rte_errno; /* Save rte_errno before cleanup. */
+	if (tmpl->rq)
+		mlx5_devx_cmd_destroy(tmpl->rq);
+	rte_errno = ret; /* Restore rte_errno. */
+	return NULL;
+}
+
+/**
  * Create the Rx queue Verbs/DevX object.
  *
  * @param dev
@@ -1163,6 +1338,8 @@ struct mlx5_rxq_obj *
 
 	assert(rxq_data);
 	assert(!rxq_ctrl->obj);
+	if (type == MLX5_RXQ_OBJ_TYPE_DEVX_HAIRPIN)
+		return mlx5_rxq_obj_hairpin_new(dev, idx);
 	priv->verbs_alloc_ctx.type = MLX5_VERBS_ALLOC_TYPE_RX_QUEUE;
 	priv->verbs_alloc_ctx.obj = rxq_ctrl;
 	tmpl = rte_calloc_socket(__func__, 1, sizeof(*tmpl), 0,
@@ -1433,15 +1610,19 @@ struct mlx5_rxq_obj *
 	unsigned int strd_num_n = 0;
 	unsigned int strd_sz_n = 0;
 	unsigned int i;
+	unsigned int n_ibv = 0;
 
 	if (!mlx5_mprq_enabled(dev))
 		return 0;
 	/* Count the total number of descriptors configured. */
 	for (i = 0; i != priv->rxqs_n; ++i) {
 		struct mlx5_rxq_data *rxq = (*priv->rxqs)[i];
+		struct mlx5_rxq_ctrl *rxq_ctrl = container_of
+			(rxq, struct mlx5_rxq_ctrl, rxq);
 
-		if (rxq == NULL)
+		if (rxq == NULL || rxq_ctrl->type != MLX5_RXQ_TYPE_STANDARD)
 			continue;
+		n_ibv++;
 		desc += 1 << rxq->elts_n;
 		/* Get the max number of strides. */
 		if (strd_num_n < rxq->strd_num_n)
@@ -1466,7 +1647,7 @@ struct mlx5_rxq_obj *
 	 * this Mempool gets available again.
 	 */
 	desc *= 4;
-	obj_num = desc + MLX5_MPRQ_MP_CACHE_SZ * priv->rxqs_n;
+	obj_num = desc + MLX5_MPRQ_MP_CACHE_SZ * n_ibv;
 	/*
 	 * rte_mempool_create_empty() has sanity check to refuse large cache
 	 * size compared to the number of elements.
@@ -1514,8 +1695,10 @@ struct mlx5_rxq_obj *
 	/* Set mempool for each Rx queue. */
 	for (i = 0; i != priv->rxqs_n; ++i) {
 		struct mlx5_rxq_data *rxq = (*priv->rxqs)[i];
+		struct mlx5_rxq_ctrl *rxq_ctrl = container_of
+			(rxq, struct mlx5_rxq_ctrl, rxq);
 
-		if (rxq == NULL)
+		if (rxq == NULL || rxq_ctrl->type != MLX5_RXQ_TYPE_STANDARD)
 			continue;
 		rxq->mprq_mp = mp;
 	}
@@ -1620,6 +1803,7 @@ struct mlx5_rxq_ctrl *
 		rte_errno = ENOMEM;
 		return NULL;
 	}
+	tmpl->type = MLX5_RXQ_TYPE_STANDARD;
 	if (mlx5_mr_btree_init(&tmpl->rxq.mr_ctrl.cache_bh,
 			       MLX5_MR_BTREE_CACHE_N, socket)) {
 		/* rte_errno is already set. */
@@ -1788,6 +1972,59 @@ struct mlx5_rxq_ctrl *
 }
 
 /**
+ * Create a DPDK Rx hairpin queue.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param idx
+ *   RX queue index.
+ * @param desc
+ *   Number of descriptors to configure in queue.
+ * @param socket
+ *   NUMA socket on which memory must be allocated.
+ * @param conf
+ *   The Rx configuration.
+ * @param hairpin_conf
+ *   The hairpin binding configuration.
+ *
+ * @return
+ *   A DPDK queue object on success, NULL otherwise and rte_errno is set.
+ */
+struct mlx5_rxq_ctrl *
+mlx5_rxq_hairpin_new(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
+		     unsigned int socket, const struct rte_eth_rxconf *conf,
+		     const struct rte_eth_hairpin_conf *hairpin_conf)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_rxq_ctrl *tmpl;
+	uint64_t offloads = conf->offloads |
+			   dev->data->dev_conf.rxmode.offloads;
+
+	tmpl = rte_calloc_socket("RXQ", 1, sizeof(*tmpl), 0, socket);
+	if (!tmpl) {
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+	tmpl->type = MLX5_RXQ_TYPE_HAIRPIN;
+	tmpl->socket = socket;
+	/* Configure VLAN stripping. */
+	tmpl->rxq.vlan_strip = !!(offloads & DEV_RX_OFFLOAD_VLAN_STRIP);
+	/* Save port ID. */
+	tmpl->rxq.rss_hash = 0;
+	tmpl->rxq.port_id = dev->data->port_id;
+	tmpl->priv = priv;
+	tmpl->rxq.mp = NULL;
+	tmpl->rxq.elts_n = log2above(desc);
+	tmpl->rxq.elts = NULL;
+	tmpl->rxq.mr_ctrl.cache_bh = (struct mlx5_mr_btree) { 0 };
+	tmpl->hairpin_conf = *hairpin_conf;
+	tmpl->rxq.idx = idx;
+	rte_atomic32_inc(&tmpl->refcnt);
+	LIST_INSERT_HEAD(&priv->rxqsctrl, tmpl, next);
+	return tmpl;
+}
+
+/**
  * Get a Rx queue.
  *
  * @param dev
@@ -1841,7 +2078,8 @@ struct mlx5_rxq_ctrl *
 		if (rxq_ctrl->dbr_umem_id_valid)
 			claim_zero(mlx5_release_dbr(dev, rxq_ctrl->dbr_umem_id,
 						    rxq_ctrl->dbr_offset));
-		mlx5_mr_btree_free(&rxq_ctrl->rxq.mr_ctrl.cache_bh);
+		if (rxq_ctrl->type == MLX5_RXQ_TYPE_STANDARD)
+			mlx5_mr_btree_free(&rxq_ctrl->rxq.mr_ctrl.cache_bh);
 		LIST_REMOVE(rxq_ctrl, next);
 		rte_free(rxq_ctrl);
 		(*priv->rxqs)[idx] = NULL;
diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index 4bb28a4..dbb616e 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -159,6 +159,13 @@ struct mlx5_rxq_data {
 enum mlx5_rxq_obj_type {
 	MLX5_RXQ_OBJ_TYPE_IBV,		/* mlx5_rxq_obj with ibv_wq. */
 	MLX5_RXQ_OBJ_TYPE_DEVX_RQ,	/* mlx5_rxq_obj with mlx5_devx_rq. */
+	MLX5_RXQ_OBJ_TYPE_DEVX_HAIRPIN,
+	/* mlx5_rxq_obj with mlx5_devx_rq and hairpin support. */
+};
+
+enum mlx5_rxq_type {
+	MLX5_RXQ_TYPE_STANDARD, /* Standard Rx queue. */
+	MLX5_RXQ_TYPE_HAIRPIN, /* Hairpin Rx queue. */
 };
 
 /* Verbs/DevX Rx queue elements. */
@@ -183,6 +190,7 @@ struct mlx5_rxq_ctrl {
 	rte_atomic32_t refcnt; /* Reference counter. */
 	struct mlx5_rxq_obj *obj; /* Verbs/DevX elements. */
 	struct mlx5_priv *priv; /* Back pointer to private data. */
+	enum mlx5_rxq_type type; /* Rxq type. */
 	unsigned int socket; /* CPU socket ID for allocations. */
 	unsigned int irq:1; /* Whether IRQ is enabled. */
 	unsigned int dbr_umem_id_valid:1; /* dbr_umem_id holds a valid value. */
@@ -193,6 +201,7 @@ struct mlx5_rxq_ctrl {
 	uint32_t dbr_umem_id; /* Storing door-bell information, */
 	uint64_t dbr_offset;  /* needed when freeing door-bell. */
 	struct mlx5dv_devx_umem *wq_umem; /* WQ buffer registration info. */
+	struct rte_eth_hairpin_conf hairpin_conf; /* Hairpin configuration. */
 };
 
 enum mlx5_ind_tbl_type {
@@ -339,6 +348,10 @@ struct mlx5_txq_ctrl {
 int mlx5_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 			unsigned int socket, const struct rte_eth_rxconf *conf,
 			struct rte_mempool *mp);
+int mlx5_rx_hairpin_queue_setup
+	(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
+	 unsigned int socket, const struct rte_eth_rxconf *conf,
+	 const struct rte_eth_hairpin_conf *hairpin_conf);
 void mlx5_rx_queue_release(void *dpdk_rxq);
 int mlx5_rx_intr_vec_enable(struct rte_eth_dev *dev);
 void mlx5_rx_intr_vec_disable(struct rte_eth_dev *dev);
@@ -351,6 +364,10 @@ struct mlx5_rxq_ctrl *mlx5_rxq_new(struct rte_eth_dev *dev, uint16_t idx,
 				   uint16_t desc, unsigned int socket,
 				   const struct rte_eth_rxconf *conf,
 				   struct rte_mempool *mp);
+struct mlx5_rxq_ctrl *mlx5_rxq_hairpin_new
+	(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
+	 unsigned int socket, const struct rte_eth_rxconf *conf,
+	 const struct rte_eth_hairpin_conf *hairpin_conf);
 struct mlx5_rxq_ctrl *mlx5_rxq_get(struct rte_eth_dev *dev, uint16_t idx);
 int mlx5_rxq_release(struct rte_eth_dev *dev, uint16_t idx);
 int mlx5_rxq_verify(struct rte_eth_dev *dev);
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index 122f31c..cb31ae2 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -118,6 +118,13 @@
 
 		if (!rxq_ctrl)
 			continue;
+		if (rxq_ctrl->type == MLX5_RXQ_TYPE_HAIRPIN) {
+			rxq_ctrl->obj = mlx5_rxq_obj_new
+				(dev, i, MLX5_RXQ_OBJ_TYPE_DEVX_HAIRPIN);
+			if (!rxq_ctrl->obj)
+				goto error;
+			continue;
+		}
 		/* Pre-register Rx mempool. */
 		mp = mlx5_rxq_mprq_enabled(&rxq_ctrl->rxq) ?
 		     rxq_ctrl->rxq.mprq_mp : rxq_ctrl->rxq.mp;
-- 
1.8.3.1



* [dpdk-dev] [PATCH 04/13] net/mlx5: prepare txq to work with different types
  2019-09-26  6:28 [dpdk-dev] [PATCH 00/13] add hairpin feature Ori Kam
                   ` (2 preceding siblings ...)
  2019-09-26  6:28 ` [dpdk-dev] [PATCH 03/13] net/mlx5: support Rx hairpin queues Ori Kam
@ 2019-09-26  6:29 ` Ori Kam
  2019-09-26  9:32   ` Slava Ovsiienko
  2019-09-26  6:29 ` [dpdk-dev] [PATCH 05/13] net/mlx5: support Tx hairpin queues Ori Kam
                   ` (15 subsequent siblings)
  19 siblings, 1 reply; 186+ messages in thread
From: Ori Kam @ 2019-09-26  6:29 UTC (permalink / raw)
  To: Matan Azrad, Shahaf Shuler, Viacheslav Ovsiienko
  Cc: dev, orika, jingjing.wu, stephen

Currently all Tx queues are created using Verbs.
This commit changes the naming so it no longer refers to Verbs,
since the next commit introduces a new queue type (hairpin).
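
After the rename the object can hold either a Verbs QP/CQ pair or a
DevX SQ in a union, selected by the new type field, so release code can
dispatch on it. A rough sketch (the DevX/hairpin branch only becomes
reachable with the next commit):

	switch (txq_obj->type) {
	case MLX5_TXQ_OBJ_TYPE_IBV:
		claim_zero(mlx5_glue->destroy_qp(txq_obj->qp));
		claim_zero(mlx5_glue->destroy_cq(txq_obj->cq));
		break;
	case MLX5_TXQ_OBJ_TYPE_DEVX_HAIRPIN:
		claim_zero(mlx5_devx_cmd_destroy(txq_obj->sq));
		break;
	}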

Signed-off-by: Ori Kam <orika@mellanox.com>

---
 drivers/net/mlx5/mlx5.c         |  2 +-
 drivers/net/mlx5/mlx5.h         |  2 +-
 drivers/net/mlx5/mlx5_rxtx.c    |  2 +-
 drivers/net/mlx5/mlx5_rxtx.h    | 39 +++++++++++++++++------
 drivers/net/mlx5/mlx5_trigger.c |  4 +--
 drivers/net/mlx5/mlx5_txq.c     | 70 ++++++++++++++++++++---------------------
 6 files changed, 69 insertions(+), 50 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 81894fb..f0d122d 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -911,7 +911,7 @@ struct mlx5_dev_spawn_data {
 	if (ret)
 		DRV_LOG(WARNING, "port %u some Rx queues still remain",
 			dev->data->port_id);
-	ret = mlx5_txq_ibv_verify(dev);
+	ret = mlx5_txq_obj_verify(dev);
 	if (ret)
 		DRV_LOG(WARNING, "port %u some Verbs Tx queue still remain",
 			dev->data->port_id);
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index cd896c8..a34972c 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -645,7 +645,7 @@ struct mlx5_priv {
 	LIST_HEAD(rxqobj, mlx5_rxq_obj) rxqsobj; /* Verbs/DevX Rx queues. */
 	LIST_HEAD(hrxq, mlx5_hrxq) hrxqs; /* Verbs Hash Rx queues. */
 	LIST_HEAD(txq, mlx5_txq_ctrl) txqsctrl; /* DPDK Tx queues. */
-	LIST_HEAD(txqibv, mlx5_txq_ibv) txqsibv; /* Verbs Tx queues. */
+	LIST_HEAD(txqobj, mlx5_txq_obj) txqsobj; /* Verbs/DevX Tx queues. */
 	/* Indirection tables. */
 	LIST_HEAD(ind_tables, mlx5_ind_table_obj) ind_tbls;
 	/* Pointer to next element. */
diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index 10d0ca1..f23708c 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -863,7 +863,7 @@ enum mlx5_txcmp_code {
 			.qp_state = IBV_QPS_RESET,
 			.port_num = (uint8_t)priv->ibv_port,
 		};
-		struct ibv_qp *qp = txq_ctrl->ibv->qp;
+		struct ibv_qp *qp = txq_ctrl->obj->qp;
 
 		ret = mlx5_glue->modify_qp(qp, &mod, IBV_QP_STATE);
 		if (ret) {
diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index dbb616e..2de674a 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -308,13 +308,31 @@ struct mlx5_txq_data {
 	/* Storage for queued packets, must be the last field. */
 } __rte_cache_aligned;
 
-/* Verbs Rx queue elements. */
-struct mlx5_txq_ibv {
-	LIST_ENTRY(mlx5_txq_ibv) next; /* Pointer to the next element. */
+enum mlx5_txq_obj_type {
+	MLX5_TXQ_OBJ_TYPE_IBV,		/* mlx5_txq_obj with ibv_wq. */
+	MLX5_TXQ_OBJ_TYPE_DEVX_HAIRPIN,
+	/* mlx5_txq_obj with mlx5_devx_tq and hairpin support. */
+};
+
+enum mlx5_txq_type {
+	MLX5_TXQ_TYPE_STANDARD, /* Standard Tx queue. */
+	MLX5_TXQ_TYPE_HAIRPIN, /* Hairpin Tx queue. */
+};
+
+/* Verbs/DevX Tx queue elements. */
+struct mlx5_txq_obj {
+	LIST_ENTRY(mlx5_txq_obj) next; /* Pointer to the next element. */
 	rte_atomic32_t refcnt; /* Reference counter. */
 	struct mlx5_txq_ctrl *txq_ctrl; /* Pointer to the control queue. */
-	struct ibv_cq *cq; /* Completion Queue. */
-	struct ibv_qp *qp; /* Queue Pair. */
+	enum mlx5_txq_obj_type type; /* The txq object type. */
+	RTE_STD_C11
+	union {
+		struct {
+			struct ibv_cq *cq; /* Completion Queue. */
+			struct ibv_qp *qp; /* Queue Pair. */
+		};
+		struct mlx5_devx_obj *sq; /* DevX object for Tx queue. */
+	};
 };
 
 /* TX queue control descriptor. */
@@ -322,9 +340,10 @@ struct mlx5_txq_ctrl {
 	LIST_ENTRY(mlx5_txq_ctrl) next; /* Pointer to the next element. */
 	rte_atomic32_t refcnt; /* Reference counter. */
 	unsigned int socket; /* CPU socket ID for allocations. */
+	enum mlx5_txq_type type; /* The txq ctrl type. */
 	unsigned int max_inline_data; /* Max inline data. */
 	unsigned int max_tso_header; /* Max TSO header size. */
-	struct mlx5_txq_ibv *ibv; /* Verbs queue object. */
+	struct mlx5_txq_obj *obj; /* Verbs/DevX queue object. */
 	struct mlx5_priv *priv; /* Back pointer to private data. */
 	off_t uar_mmap_offset; /* UAR mmap offset for non-primary process. */
 	void *bf_reg; /* BlueFlame register from Verbs. */
@@ -395,10 +414,10 @@ int mlx5_tx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 			unsigned int socket, const struct rte_eth_txconf *conf);
 void mlx5_tx_queue_release(void *dpdk_txq);
 int mlx5_tx_uar_init_secondary(struct rte_eth_dev *dev, int fd);
-struct mlx5_txq_ibv *mlx5_txq_ibv_new(struct rte_eth_dev *dev, uint16_t idx);
-struct mlx5_txq_ibv *mlx5_txq_ibv_get(struct rte_eth_dev *dev, uint16_t idx);
-int mlx5_txq_ibv_release(struct mlx5_txq_ibv *txq_ibv);
-int mlx5_txq_ibv_verify(struct rte_eth_dev *dev);
+struct mlx5_txq_obj *mlx5_txq_obj_new(struct rte_eth_dev *dev, uint16_t idx);
+struct mlx5_txq_obj *mlx5_txq_obj_get(struct rte_eth_dev *dev, uint16_t idx);
+int mlx5_txq_obj_release(struct mlx5_txq_obj *txq_obj);
+int mlx5_txq_obj_verify(struct rte_eth_dev *dev);
 struct mlx5_txq_ctrl *mlx5_txq_new(struct rte_eth_dev *dev, uint16_t idx,
 				   uint16_t desc, unsigned int socket,
 				   const struct rte_eth_txconf *conf);
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index cb31ae2..50c4df5 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -52,8 +52,8 @@
 		if (!txq_ctrl)
 			continue;
 		txq_alloc_elts(txq_ctrl);
-		txq_ctrl->ibv = mlx5_txq_ibv_new(dev, i);
-		if (!txq_ctrl->ibv) {
+		txq_ctrl->obj = mlx5_txq_obj_new(dev, i);
+		if (!txq_ctrl->obj) {
 			rte_errno = ENOMEM;
 			goto error;
 		}
diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
index d9fd143..e1ed4eb 100644
--- a/drivers/net/mlx5/mlx5_txq.c
+++ b/drivers/net/mlx5/mlx5_txq.c
@@ -375,15 +375,15 @@
  * @return
  *   The Verbs object initialised, NULL otherwise and rte_errno is set.
  */
-struct mlx5_txq_ibv *
-mlx5_txq_ibv_new(struct rte_eth_dev *dev, uint16_t idx)
+struct mlx5_txq_obj *
+mlx5_txq_obj_new(struct rte_eth_dev *dev, uint16_t idx)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_txq_data *txq_data = (*priv->txqs)[idx];
 	struct mlx5_txq_ctrl *txq_ctrl =
 		container_of(txq_data, struct mlx5_txq_ctrl, txq);
-	struct mlx5_txq_ibv tmpl;
-	struct mlx5_txq_ibv *txq_ibv = NULL;
+	struct mlx5_txq_obj tmpl;
+	struct mlx5_txq_obj *txq_obj = NULL;
 	union {
 		struct ibv_qp_init_attr_ex init;
 		struct ibv_cq_init_attr_ex cq;
@@ -411,7 +411,7 @@ struct mlx5_txq_ibv *
 		rte_errno = EINVAL;
 		return NULL;
 	}
-	memset(&tmpl, 0, sizeof(struct mlx5_txq_ibv));
+	memset(&tmpl, 0, sizeof(struct mlx5_txq_obj));
 	attr.cq = (struct ibv_cq_init_attr_ex){
 		.comp_mask = 0,
 	};
@@ -502,9 +502,9 @@ struct mlx5_txq_ibv *
 		rte_errno = errno;
 		goto error;
 	}
-	txq_ibv = rte_calloc_socket(__func__, 1, sizeof(struct mlx5_txq_ibv), 0,
+	txq_obj = rte_calloc_socket(__func__, 1, sizeof(struct mlx5_txq_obj), 0,
 				    txq_ctrl->socket);
-	if (!txq_ibv) {
+	if (!txq_obj) {
 		DRV_LOG(ERR, "port %u Tx queue %u cannot allocate memory",
 			dev->data->port_id, idx);
 		rte_errno = ENOMEM;
@@ -568,9 +568,9 @@ struct mlx5_txq_ibv *
 		}
 	}
 #endif
-	txq_ibv->qp = tmpl.qp;
-	txq_ibv->cq = tmpl.cq;
-	rte_atomic32_inc(&txq_ibv->refcnt);
+	txq_obj->qp = tmpl.qp;
+	txq_obj->cq = tmpl.cq;
+	rte_atomic32_inc(&txq_obj->refcnt);
 	txq_ctrl->bf_reg = qp.bf.reg;
 	if (qp.comp_mask & MLX5DV_QP_MASK_UAR_MMAP_OFFSET) {
 		txq_ctrl->uar_mmap_offset = qp.uar_mmap_offset;
@@ -585,18 +585,18 @@ struct mlx5_txq_ibv *
 		goto error;
 	}
 	txq_uar_init(txq_ctrl);
-	LIST_INSERT_HEAD(&priv->txqsibv, txq_ibv, next);
-	txq_ibv->txq_ctrl = txq_ctrl;
+	LIST_INSERT_HEAD(&priv->txqsobj, txq_obj, next);
+	txq_obj->txq_ctrl = txq_ctrl;
 	priv->verbs_alloc_ctx.type = MLX5_VERBS_ALLOC_TYPE_NONE;
-	return txq_ibv;
+	return txq_obj;
 error:
 	ret = rte_errno; /* Save rte_errno before cleanup. */
 	if (tmpl.cq)
 		claim_zero(mlx5_glue->destroy_cq(tmpl.cq));
 	if (tmpl.qp)
 		claim_zero(mlx5_glue->destroy_qp(tmpl.qp));
-	if (txq_ibv)
-		rte_free(txq_ibv);
+	if (txq_obj)
+		rte_free(txq_obj);
 	priv->verbs_alloc_ctx.type = MLX5_VERBS_ALLOC_TYPE_NONE;
 	rte_errno = ret; /* Restore rte_errno. */
 	return NULL;
@@ -613,8 +613,8 @@ struct mlx5_txq_ibv *
  * @return
  *   The Verbs object if it exists.
  */
-struct mlx5_txq_ibv *
-mlx5_txq_ibv_get(struct rte_eth_dev *dev, uint16_t idx)
+struct mlx5_txq_obj *
+mlx5_txq_obj_get(struct rte_eth_dev *dev, uint16_t idx)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_txq_ctrl *txq_ctrl;
@@ -624,29 +624,29 @@ struct mlx5_txq_ibv *
 	if (!(*priv->txqs)[idx])
 		return NULL;
 	txq_ctrl = container_of((*priv->txqs)[idx], struct mlx5_txq_ctrl, txq);
-	if (txq_ctrl->ibv)
-		rte_atomic32_inc(&txq_ctrl->ibv->refcnt);
-	return txq_ctrl->ibv;
+	if (txq_ctrl->obj)
+		rte_atomic32_inc(&txq_ctrl->obj->refcnt);
+	return txq_ctrl->obj;
 }
 
 /**
  * Release an Tx verbs queue object.
  *
- * @param txq_ibv
+ * @param txq_obj
  *   Verbs Tx queue object.
  *
  * @return
  *   1 while a reference on it exists, 0 when freed.
  */
 int
-mlx5_txq_ibv_release(struct mlx5_txq_ibv *txq_ibv)
+mlx5_txq_obj_release(struct mlx5_txq_obj *txq_obj)
 {
-	assert(txq_ibv);
-	if (rte_atomic32_dec_and_test(&txq_ibv->refcnt)) {
-		claim_zero(mlx5_glue->destroy_qp(txq_ibv->qp));
-		claim_zero(mlx5_glue->destroy_cq(txq_ibv->cq));
-		LIST_REMOVE(txq_ibv, next);
-		rte_free(txq_ibv);
+	assert(txq_obj);
+	if (rte_atomic32_dec_and_test(&txq_obj->refcnt)) {
+		claim_zero(mlx5_glue->destroy_qp(txq_obj->qp));
+		claim_zero(mlx5_glue->destroy_cq(txq_obj->cq));
+		LIST_REMOVE(txq_obj, next);
+		rte_free(txq_obj);
 		return 0;
 	}
 	return 1;
@@ -662,15 +662,15 @@ struct mlx5_txq_ibv *
  *   The number of object not released.
  */
 int
-mlx5_txq_ibv_verify(struct rte_eth_dev *dev)
+mlx5_txq_obj_verify(struct rte_eth_dev *dev)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	int ret = 0;
-	struct mlx5_txq_ibv *txq_ibv;
+	struct mlx5_txq_obj *txq_obj;
 
-	LIST_FOREACH(txq_ibv, &priv->txqsibv, next) {
+	LIST_FOREACH(txq_obj, &priv->txqsobj, next) {
 		DRV_LOG(DEBUG, "port %u Verbs Tx queue %u still referenced",
-			dev->data->port_id, txq_ibv->txq_ctrl->txq.idx);
+			dev->data->port_id, txq_obj->txq_ctrl->txq.idx);
 		++ret;
 	}
 	return ret;
@@ -980,7 +980,7 @@ struct mlx5_txq_ctrl *
 	if ((*priv->txqs)[idx]) {
 		ctrl = container_of((*priv->txqs)[idx], struct mlx5_txq_ctrl,
 				    txq);
-		mlx5_txq_ibv_get(dev, idx);
+		mlx5_txq_obj_get(dev, idx);
 		rte_atomic32_inc(&ctrl->refcnt);
 	}
 	return ctrl;
@@ -1006,8 +1006,8 @@ struct mlx5_txq_ctrl *
 	if (!(*priv->txqs)[idx])
 		return 0;
 	txq = container_of((*priv->txqs)[idx], struct mlx5_txq_ctrl, txq);
-	if (txq->ibv && !mlx5_txq_ibv_release(txq->ibv))
-		txq->ibv = NULL;
+	if (txq->obj && !mlx5_txq_obj_release(txq->obj))
+		txq->obj = NULL;
 	if (rte_atomic32_dec_and_test(&txq->refcnt)) {
 		txq_free_elts(txq);
 		mlx5_mr_btree_free(&txq->txq.mr_ctrl.cache_bh);
-- 
1.8.3.1



* [dpdk-dev] [PATCH 05/13] net/mlx5: support Tx hairpin queues
  2019-09-26  6:28 [dpdk-dev] [PATCH 00/13] add hairpin feature Ori Kam
                   ` (3 preceding siblings ...)
  2019-09-26  6:29 ` [dpdk-dev] [PATCH 04/13] net/mlx5: prepare txq to work with different types Ori Kam
@ 2019-09-26  6:29 ` Ori Kam
  2019-09-26  9:32   ` Slava Ovsiienko
  2019-09-26  6:29 ` [dpdk-dev] [PATCH 06/13] app/testpmd: add hairpin support Ori Kam
                   ` (14 subsequent siblings)
  19 siblings, 1 reply; 186+ messages in thread
From: Ori Kam @ 2019-09-26  6:29 UTC (permalink / raw)
  To: Matan Azrad, Shahaf Shuler, Viacheslav Ovsiienko
  Cc: dev, orika, jingjing.wu, stephen

This commit adds support for creating Tx hairpin queues. A hairpin
queue is created using DevX and is used only by the hardware; the
application never posts descriptors to it directly.
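
A minimal usage sketch, based on the testpmd changes later in this
series (port_id, txq_id, peer_rxq, nb_txd, socket_id and txconf are
hypothetical application variables):

	struct rte_eth_hairpin_conf hairpin_conf = { .peer_n = 1 };

	hairpin_conf.peers[0].port = port_id;    /* peer on the same port */
	hairpin_conf.peers[0].queue = peer_rxq;  /* peer Rx hairpin queue */
	ret = rte_eth_tx_hairpin_queue_setup(port_id, txq_id, nb_txd,
					     socket_id, &txconf,
					     &hairpin_conf);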

Signed-off-by: Ori Kam <orika@mellanox.com>
---
 drivers/net/mlx5/mlx5.c           |  26 ++++
 drivers/net/mlx5/mlx5.h           |  46 +++++++
 drivers/net/mlx5/mlx5_devx_cmds.c | 186 +++++++++++++++++++++++++++++
 drivers/net/mlx5/mlx5_prm.h       | 118 ++++++++++++++++++
 drivers/net/mlx5/mlx5_rxtx.h      |  20 +++-
 drivers/net/mlx5/mlx5_trigger.c   |  10 +-
 drivers/net/mlx5/mlx5_txq.c       | 245 +++++++++++++++++++++++++++++++++++---
 7 files changed, 631 insertions(+), 20 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index f0d122d..ad36743 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -325,6 +325,9 @@ struct mlx5_dev_spawn_data {
 	struct mlx5_ibv_shared *sh;
 	int err = 0;
 	uint32_t i;
+#ifdef HAVE_IBV_FLOW_DV_SUPPORT
+	struct mlx5_devx_tis_attr tis_attr = { 0 };
+#endif
 
 	assert(spawn);
 	/* Secondary process should not create the shared context. */
@@ -394,6 +397,19 @@ struct mlx5_dev_spawn_data {
 		DRV_LOG(ERR, "Fail to extract pdn from PD");
 		goto error;
 	}
+	sh->td = mlx5_devx_cmd_create_td(sh->ctx);
+	if (!sh->td) {
+		DRV_LOG(ERR, "TD allocation failure");
+		err = ENOMEM;
+		goto error;
+	}
+	tis_attr.transport_domain = sh->td->id;
+	sh->tis = mlx5_devx_cmd_create_tis(sh->ctx, &tis_attr);
+	if (!sh->tis) {
+		DRV_LOG(ERR, "TIS allocation failure");
+		err = ENOMEM;
+		goto error;
+	}
 #endif /* HAVE_IBV_FLOW_DV_SUPPORT */
 	/*
 	 * Once the device is added to the list of memory event
@@ -425,6 +441,10 @@ struct mlx5_dev_spawn_data {
 error:
 	pthread_mutex_unlock(&mlx5_ibv_list_mutex);
 	assert(sh);
+	if (sh->tis)
+		claim_zero(mlx5_devx_cmd_destroy(sh->tis));
+	if (sh->td)
+		claim_zero(mlx5_devx_cmd_destroy(sh->td));
 	if (sh->pd)
 		claim_zero(mlx5_glue->dealloc_pd(sh->pd));
 	if (sh->ctx)
@@ -485,6 +505,10 @@ struct mlx5_dev_spawn_data {
 	pthread_mutex_destroy(&sh->intr_mutex);
 	if (sh->pd)
 		claim_zero(mlx5_glue->dealloc_pd(sh->pd));
+	if (sh->tis)
+		claim_zero(mlx5_devx_cmd_destroy(sh->tis));
+	if (sh->td)
+		claim_zero(mlx5_devx_cmd_destroy(sh->td));
 	if (sh->ctx)
 		claim_zero(mlx5_glue->close_device(sh->ctx));
 	rte_free(sh);
@@ -976,6 +1000,7 @@ struct mlx5_dev_spawn_data {
 	.rx_queue_setup = mlx5_rx_queue_setup,
 	.rx_hairpin_queue_setup = mlx5_rx_hairpin_queue_setup,
 	.tx_queue_setup = mlx5_tx_queue_setup,
+	.tx_hairpin_queue_setup = mlx5_tx_hairpin_queue_setup,
 	.rx_queue_release = mlx5_rx_queue_release,
 	.tx_queue_release = mlx5_tx_queue_release,
 	.flow_ctrl_get = mlx5_dev_get_flow_ctrl,
@@ -1043,6 +1068,7 @@ struct mlx5_dev_spawn_data {
 	.rx_queue_setup = mlx5_rx_queue_setup,
 	.rx_hairpin_queue_setup = mlx5_rx_hairpin_queue_setup,
 	.tx_queue_setup = mlx5_tx_queue_setup,
+	.tx_hairpin_queue_setup = mlx5_tx_hairpin_queue_setup,
 	.rx_queue_release = mlx5_rx_queue_release,
 	.tx_queue_release = mlx5_tx_queue_release,
 	.flow_ctrl_get = mlx5_dev_get_flow_ctrl,
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index a34972c..506920e 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -350,6 +350,43 @@ struct mlx5_devx_rqt_attr {
 	uint32_t rq_list[];
 };
 
+/* TIS attributes structure. */
+struct mlx5_devx_tis_attr {
+	uint32_t strict_lag_tx_port_affinity:1;
+	uint32_t tls_en:1;
+	uint32_t lag_tx_port_affinity:4;
+	uint32_t prio:4;
+	uint32_t transport_domain:24;
+};
+
+/* SQ attributes structure, used by SQ create operation. */
+struct mlx5_devx_create_sq_attr {
+	uint32_t rlky:1;
+	uint32_t cd_master:1;
+	uint32_t fre:1;
+	uint32_t flush_in_error_en:1;
+	uint32_t allow_multi_pkt_send_wqe:1;
+	uint32_t min_wqe_inline_mode:3;
+	uint32_t state:4;
+	uint32_t reg_umr:1;
+	uint32_t allow_swp:1;
+	uint32_t hairpin:1;
+	uint32_t user_index:24;
+	uint32_t cqn:24;
+	uint32_t packet_pacing_rate_limit_index:16;
+	uint32_t tis_lst_sz:16;
+	uint32_t tis_num:24;
+	struct mlx5_devx_wq_attr wq_attr;
+};
+
+/* SQ attributes structure, used by SQ modify operation. */
+struct mlx5_devx_modify_sq_attr {
+	uint32_t sq_state:4;
+	uint32_t state:4;
+	uint32_t hairpin_peer_rq:24;
+	uint32_t hairpin_peer_vhca:16;
+};
+
 /**
  * Type of object being allocated.
  */
@@ -591,6 +628,8 @@ struct mlx5_ibv_shared {
 	struct rte_intr_handle intr_handle; /* Interrupt handler for device. */
 	struct rte_intr_handle intr_handle_devx; /* DEVX interrupt handler. */
 	struct mlx5dv_devx_cmd_comp *devx_comp; /* DEVX async comp obj. */
+	struct mlx5_devx_obj *tis; /* TIS object. */
+	struct mlx5_devx_obj *td; /* Transport domain. */
 	struct mlx5_ibv_shared_port port[]; /* per device port data array. */
 };
 
@@ -911,5 +950,12 @@ struct mlx5_devx_obj *mlx5_devx_cmd_create_tir(struct ibv_context *ctx,
 					struct mlx5_devx_tir_attr *tir_attr);
 struct mlx5_devx_obj *mlx5_devx_cmd_create_rqt(struct ibv_context *ctx,
 					struct mlx5_devx_rqt_attr *rqt_attr);
+struct mlx5_devx_obj *mlx5_devx_cmd_create_sq
+	(struct ibv_context *ctx, struct mlx5_devx_create_sq_attr *sq_attr);
+int mlx5_devx_cmd_modify_sq
+	(struct mlx5_devx_obj *sq, struct mlx5_devx_modify_sq_attr *sq_attr);
+struct mlx5_devx_obj *mlx5_devx_cmd_create_tis
+	(struct ibv_context *ctx, struct mlx5_devx_tis_attr *tis_attr);
+struct mlx5_devx_obj *mlx5_devx_cmd_create_td(struct ibv_context *ctx);
 
 #endif /* RTE_PMD_MLX5_H_ */
diff --git a/drivers/net/mlx5/mlx5_devx_cmds.c b/drivers/net/mlx5/mlx5_devx_cmds.c
index b072c37..917bbf9 100644
--- a/drivers/net/mlx5/mlx5_devx_cmds.c
+++ b/drivers/net/mlx5/mlx5_devx_cmds.c
@@ -709,3 +709,189 @@ struct mlx5_devx_obj *
 	rqt->id = MLX5_GET(create_rqt_out, out, rqtn);
 	return rqt;
 }
+
+/**
+ * Create SQ using DevX API.
+ *
+ * @param[in] ctx
+ *   ibv_context returned from mlx5dv_open_device.
+ * @param [in] sq_attr
+ *   Pointer to SQ attributes structure.
+ *
+ * @return
+ *   The DevX object created, NULL otherwise and rte_errno is set.
+ */
+struct mlx5_devx_obj *
+mlx5_devx_cmd_create_sq(struct ibv_context *ctx,
+			struct mlx5_devx_create_sq_attr *sq_attr)
+{
+	uint32_t in[MLX5_ST_SZ_DW(create_sq_in)] = {0};
+	uint32_t out[MLX5_ST_SZ_DW(create_sq_out)] = {0};
+	void *sq_ctx;
+	void *wq_ctx;
+	struct mlx5_devx_wq_attr *wq_attr;
+	struct mlx5_devx_obj *sq = NULL;
+
+	sq = rte_calloc(__func__, 1, sizeof(*sq), 0);
+	if (!sq) {
+		DRV_LOG(ERR, "Failed to allocate SQ data");
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+	MLX5_SET(create_sq_in, in, opcode, MLX5_CMD_OP_CREATE_SQ);
+	sq_ctx = MLX5_ADDR_OF(create_sq_in, in, ctx);
+	MLX5_SET(sqc, sq_ctx, rlky, sq_attr->rlky);
+	MLX5_SET(sqc, sq_ctx, cd_master, sq_attr->cd_master);
+	MLX5_SET(sqc, sq_ctx, fre, sq_attr->fre);
+	MLX5_SET(sqc, sq_ctx, flush_in_error_en, sq_attr->flush_in_error_en);
+	MLX5_SET(sqc, sq_ctx, allow_multi_pkt_send_wqe,
+		 sq_attr->allow_multi_pkt_send_wqe);
+	MLX5_SET(sqc, sq_ctx, min_wqe_inline_mode,
+		 sq_attr->min_wqe_inline_mode);
+	MLX5_SET(sqc, sq_ctx, state, sq_attr->state);
+	MLX5_SET(sqc, sq_ctx, reg_umr, sq_attr->reg_umr);
+	MLX5_SET(sqc, sq_ctx, allow_swp, sq_attr->allow_swp);
+	MLX5_SET(sqc, sq_ctx, hairpin, sq_attr->hairpin);
+	MLX5_SET(sqc, sq_ctx, user_index, sq_attr->user_index);
+	MLX5_SET(sqc, sq_ctx, cqn, sq_attr->cqn);
+	MLX5_SET(sqc, sq_ctx, packet_pacing_rate_limit_index,
+		 sq_attr->packet_pacing_rate_limit_index);
+	MLX5_SET(sqc, sq_ctx, tis_lst_sz, sq_attr->tis_lst_sz);
+	MLX5_SET(sqc, sq_ctx, tis_num_0, sq_attr->tis_num);
+	wq_ctx = MLX5_ADDR_OF(sqc, sq_ctx, wq);
+	wq_attr = &sq_attr->wq_attr;
+	devx_cmd_fill_wq_data(wq_ctx, wq_attr);
+	sq->obj = mlx5_glue->devx_obj_create(ctx, in, sizeof(in),
+					     out, sizeof(out));
+	if (!sq->obj) {
+		DRV_LOG(ERR, "Failed to create SQ using DevX");
+		rte_errno = errno;
+		rte_free(sq);
+		return NULL;
+	}
+	sq->id = MLX5_GET(create_sq_out, out, sqn);
+	return sq;
+}
+
+/**
+ * Modify SQ using DevX API.
+ *
+ * @param[in] sq
+ *   Pointer to SQ object structure.
+ * @param [in] sq_attr
+ *   Pointer to SQ attributes structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_devx_cmd_modify_sq(struct mlx5_devx_obj *sq,
+			struct mlx5_devx_modify_sq_attr *sq_attr)
+{
+	uint32_t in[MLX5_ST_SZ_DW(modify_sq_in)] = {0};
+	uint32_t out[MLX5_ST_SZ_DW(modify_sq_out)] = {0};
+	void *sq_ctx;
+	int ret;
+
+	MLX5_SET(modify_sq_in, in, opcode, MLX5_CMD_OP_MODIFY_SQ);
+	MLX5_SET(modify_sq_in, in, sq_state, sq_attr->sq_state);
+	MLX5_SET(modify_sq_in, in, sqn, sq->id);
+	sq_ctx = MLX5_ADDR_OF(modify_sq_in, in, ctx);
+	MLX5_SET(sqc, sq_ctx, state, sq_attr->state);
+	MLX5_SET(sqc, sq_ctx, hairpin_peer_rq, sq_attr->hairpin_peer_rq);
+	MLX5_SET(sqc, sq_ctx, hairpin_peer_vhca, sq_attr->hairpin_peer_vhca);
+	ret = mlx5_glue->devx_obj_modify(sq->obj, in, sizeof(in),
+					 out, sizeof(out));
+	if (ret) {
+		DRV_LOG(ERR, "Failed to modify SQ using DevX");
+		rte_errno = errno;
+		return -errno;
+	}
+	return ret;
+}
+
+/**
+ * Create TIS using DevX API.
+ *
+ * @param[in] ctx
+ *   ibv_context returned from mlx5dv_open_device.
+ * @param [in] tis_attr
+ *   Pointer to TIS attributes structure.
+ *
+ * @return
+ *   The DevX object created, NULL otherwise and rte_errno is set.
+ */
+struct mlx5_devx_obj *
+mlx5_devx_cmd_create_tis(struct ibv_context *ctx,
+			 struct mlx5_devx_tis_attr *tis_attr)
+{
+	uint32_t in[MLX5_ST_SZ_DW(create_tis_in)] = {0};
+	uint32_t out[MLX5_ST_SZ_DW(create_tis_out)] = {0};
+	struct mlx5_devx_obj *tis = NULL;
+	void *tis_ctx;
+
+	tis = rte_calloc(__func__, 1, sizeof(*tis), 0);
+	if (!tis) {
+		DRV_LOG(ERR, "Failed to allocate TIS object");
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+	MLX5_SET(create_tis_in, in, opcode, MLX5_CMD_OP_CREATE_TIS);
+	tis_ctx = MLX5_ADDR_OF(create_tis_in, in, ctx);
+	MLX5_SET(tisc, tis_ctx, strict_lag_tx_port_affinity,
+		 tis_attr->strict_lag_tx_port_affinity);
+	MLX5_SET(tisc, tis_ctx, lag_tx_port_affinity,
+		 tis_attr->lag_tx_port_affinity);
+	MLX5_SET(tisc, tis_ctx, prio, tis_attr->prio);
+	MLX5_SET(tisc, tis_ctx, transport_domain,
+		 tis_attr->transport_domain);
+	tis->obj = mlx5_glue->devx_obj_create(ctx, in, sizeof(in),
+					      out, sizeof(out));
+	if (!tis->obj) {
+		DRV_LOG(ERR, "Failed to create TIS using DevX");
+		rte_errno = errno;
+		rte_free(tis);
+		return NULL;
+	}
+	tis->id = MLX5_GET(create_tis_out, out, tisn);
+	return tis;
+}
+
+/**
+ * Create transport domain using DevX API.
+ *
+ * @param[in] ctx
+ *   ibv_context returned from mlx5dv_open_device.
+ *
+ * @return
+ *   The DevX object created, NULL otherwise and rte_errno is set.
+ */
+struct mlx5_devx_obj *
+mlx5_devx_cmd_create_td(struct ibv_context *ctx)
+{
+	uint32_t in[MLX5_ST_SZ_DW(alloc_transport_domain_in)] = {0};
+	uint32_t out[MLX5_ST_SZ_DW(alloc_transport_domain_out)] = {0};
+	struct mlx5_devx_obj *td = NULL;
+
+	td = rte_calloc(__func__, 1, sizeof(*td), 0);
+	if (!td) {
+		DRV_LOG(ERR, "Failed to allocate TD object");
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+	MLX5_SET(alloc_transport_domain_in, in, opcode,
+		 MLX5_CMD_OP_ALLOC_TRANSPORT_DOMAIN);
+	td->obj = mlx5_glue->devx_obj_create(ctx, in, sizeof(in),
+					     out, sizeof(out));
+	if (!td->obj) {
+		DRV_LOG(ERR, "Failed to create TIS using DevX");
+		rte_errno = errno;
+		rte_free(td);
+		return NULL;
+	}
+	td->id = MLX5_GET(alloc_transport_domain_out, out,
+			   transport_domain);
+	return td;
+}
diff --git a/drivers/net/mlx5/mlx5_prm.h b/drivers/net/mlx5/mlx5_prm.h
index 3765df0..faa7996 100644
--- a/drivers/net/mlx5/mlx5_prm.h
+++ b/drivers/net/mlx5/mlx5_prm.h
@@ -666,9 +666,13 @@ enum {
 	MLX5_CMD_OP_QUERY_HCA_CAP = 0x100,
 	MLX5_CMD_OP_CREATE_MKEY = 0x200,
 	MLX5_CMD_OP_QUERY_NIC_VPORT_CONTEXT = 0x754,
+	MLX5_CMD_OP_ALLOC_TRANSPORT_DOMAIN = 0x816,
 	MLX5_CMD_OP_CREATE_TIR = 0x900,
+	MLX5_CMD_OP_CREATE_SQ = 0x904,
+	MLX5_CMD_OP_MODIFY_SQ = 0x905,
 	MLX5_CMD_OP_CREATE_RQ = 0x908,
 	MLX5_CMD_OP_MODIFY_RQ = 0x909,
+	MLX5_CMD_OP_CREATE_TIS = 0x912,
 	MLX5_CMD_OP_QUERY_TIS = 0x915,
 	MLX5_CMD_OP_CREATE_RQT = 0x916,
 	MLX5_CMD_OP_ALLOC_FLOW_COUNTER = 0x939,
@@ -1311,6 +1315,23 @@ struct mlx5_ifc_query_tis_in_bits {
 	u8 reserved_at_60[0x20];
 };
 
+struct mlx5_ifc_alloc_transport_domain_out_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 reserved_at_40[0x8];
+	u8 transport_domain[0x18];
+	u8 reserved_at_60[0x20];
+};
+
+struct mlx5_ifc_alloc_transport_domain_in_bits {
+	u8 opcode[0x10];
+	u8 reserved_at_10[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 reserved_at_40[0x40];
+};
+
 enum {
 	MLX5_WQ_TYPE_LINKED_LIST                = 0x0,
 	MLX5_WQ_TYPE_CYCLIC                     = 0x1,
@@ -1427,6 +1448,24 @@ struct mlx5_ifc_modify_rq_out_bits {
 	u8 reserved_at_40[0x40];
 };
 
+struct mlx5_ifc_create_tis_out_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 reserved_at_40[0x8];
+	u8 tisn[0x18];
+	u8 reserved_at_60[0x20];
+};
+
+struct mlx5_ifc_create_tis_in_bits {
+	u8 opcode[0x10];
+	u8 uid[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 reserved_at_40[0xc0];
+	struct mlx5_ifc_tisc_bits ctx;
+};
+
 enum {
 	MLX5_MODIFY_RQ_IN_MODIFY_BITMASK_WQ_LWM = 1ULL << 0,
 	MLX5_MODIFY_RQ_IN_MODIFY_BITMASK_VSD = 1ULL << 1,
@@ -1572,6 +1611,85 @@ struct mlx5_ifc_create_rqt_in_bits {
 #pragma GCC diagnostic error "-Wpedantic"
 #endif
 
+struct mlx5_ifc_sqc_bits {
+	u8 rlky[0x1];
+	u8 cd_master[0x1];
+	u8 fre[0x1];
+	u8 flush_in_error_en[0x1];
+	u8 allow_multi_pkt_send_wqe[0x1];
+	u8 min_wqe_inline_mode[0x3];
+	u8 state[0x4];
+	u8 reg_umr[0x1];
+	u8 allow_swp[0x1];
+	u8 hairpin[0x1];
+	u8 reserved_at_f[0x11];
+	u8 reserved_at_20[0x8];
+	u8 user_index[0x18];
+	u8 reserved_at_40[0x8];
+	u8 cqn[0x18];
+	u8 reserved_at_60[0x8];
+	u8 hairpin_peer_rq[0x18];
+	u8 reserved_at_80[0x10];
+	u8 hairpin_peer_vhca[0x10];
+	u8 reserved_at_a0[0x50];
+	u8 packet_pacing_rate_limit_index[0x10];
+	u8 tis_lst_sz[0x10];
+	u8 reserved_at_110[0x10];
+	u8 reserved_at_120[0x40];
+	u8 reserved_at_160[0x8];
+	u8 tis_num_0[0x18];
+	struct mlx5_ifc_wq_bits wq;
+};
+
+struct mlx5_ifc_query_sq_in_bits {
+	u8 opcode[0x10];
+	u8 reserved_at_10[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 reserved_at_40[0x8];
+	u8 sqn[0x18];
+	u8 reserved_at_60[0x20];
+};
+
+struct mlx5_ifc_modify_sq_out_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 reserved_at_40[0x40];
+};
+
+struct mlx5_ifc_modify_sq_in_bits {
+	u8 opcode[0x10];
+	u8 uid[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 sq_state[0x4];
+	u8 reserved_at_44[0x4];
+	u8 sqn[0x18];
+	u8 reserved_at_60[0x20];
+	u8 modify_bitmask[0x40];
+	u8 reserved_at_c0[0x40];
+	struct mlx5_ifc_sqc_bits ctx;
+};
+
+struct mlx5_ifc_create_sq_out_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 reserved_at_40[0x8];
+	u8 sqn[0x18];
+	u8 reserved_at_60[0x20];
+};
+
+struct mlx5_ifc_create_sq_in_bits {
+	u8 opcode[0x10];
+	u8 uid[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 reserved_at_40[0xc0];
+	struct mlx5_ifc_sqc_bits ctx;
+};
+
 /* CQE format mask. */
 #define MLX5E_CQE_FORMAT_MASK 0xc
 
diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index 2de674a..8fa22e5 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -324,14 +324,18 @@ struct mlx5_txq_obj {
 	LIST_ENTRY(mlx5_txq_obj) next; /* Pointer to the next element. */
 	rte_atomic32_t refcnt; /* Reference counter. */
 	struct mlx5_txq_ctrl *txq_ctrl; /* Pointer to the control queue. */
-	enum mlx5_rxq_obj_type type; /* The txq object type. */
+	enum mlx5_txq_obj_type type; /* The txq object type. */
 	RTE_STD_C11
 	union {
 		struct {
 			struct ibv_cq *cq; /* Completion Queue. */
 			struct ibv_qp *qp; /* Queue Pair. */
 		};
-		struct mlx5_devx_obj *sq; /* DevX object for Sx queue. */
+		struct {
+			struct mlx5_devx_obj *sq;
+			/* DevX object for the SQ. */
+			struct mlx5_devx_obj *tis; /* The TIS object. */
+		};
 	};
 };
 
@@ -348,6 +352,7 @@ struct mlx5_txq_ctrl {
 	off_t uar_mmap_offset; /* UAR mmap offset for non-primary process. */
 	void *bf_reg; /* BlueFlame register from Verbs. */
 	uint16_t dump_file_n; /* Number of dump files. */
+	struct rte_eth_hairpin_conf hairpin_conf; /* Hairpin configuration. */
 	struct mlx5_txq_data txq; /* Data path structure. */
 	/* Must be the last field in the structure, contains elts[]. */
 };
@@ -412,15 +417,24 @@ struct mlx5_hrxq *mlx5_hrxq_get(struct rte_eth_dev *dev,
 
 int mlx5_tx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 			unsigned int socket, const struct rte_eth_txconf *conf);
+int mlx5_tx_hairpin_queue_setup
+	(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
+	 unsigned int socket, const struct rte_eth_txconf *conf,
+	 const struct rte_eth_hairpin_conf *hairpin_conf);
 void mlx5_tx_queue_release(void *dpdk_txq);
 int mlx5_tx_uar_init_secondary(struct rte_eth_dev *dev, int fd);
-struct mlx5_txq_obj *mlx5_txq_obj_new(struct rte_eth_dev *dev, uint16_t idx);
+struct mlx5_txq_obj *mlx5_txq_obj_new(struct rte_eth_dev *dev, uint16_t idx,
+				      enum mlx5_txq_obj_type type);
 struct mlx5_txq_obj *mlx5_txq_obj_get(struct rte_eth_dev *dev, uint16_t idx);
 int mlx5_txq_obj_release(struct mlx5_txq_obj *txq_ibv);
 int mlx5_txq_obj_verify(struct rte_eth_dev *dev);
 struct mlx5_txq_ctrl *mlx5_txq_new(struct rte_eth_dev *dev, uint16_t idx,
 				   uint16_t desc, unsigned int socket,
 				   const struct rte_eth_txconf *conf);
+struct mlx5_txq_ctrl *mlx5_txq_hairpin_new
+	(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
+	 unsigned int socket, const struct rte_eth_txconf *conf,
+	 const struct rte_eth_hairpin_conf *hairpin_conf);
 struct mlx5_txq_ctrl *mlx5_txq_get(struct rte_eth_dev *dev, uint16_t idx);
 int mlx5_txq_release(struct rte_eth_dev *dev, uint16_t idx);
 int mlx5_txq_releasable(struct rte_eth_dev *dev, uint16_t idx);
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index 50c4df5..3ec86c4 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -51,8 +51,14 @@
 
 		if (!txq_ctrl)
 			continue;
-		txq_alloc_elts(txq_ctrl);
-		txq_ctrl->obj = mlx5_txq_obj_new(dev, i);
+		if (txq_ctrl->type == MLX5_TXQ_TYPE_HAIRPIN) {
+			txq_ctrl->obj = mlx5_txq_obj_new
+				(dev, i, MLX5_TXQ_OBJ_TYPE_DEVX_HAIRPIN);
+		} else {
+			txq_alloc_elts(txq_ctrl);
+			txq_ctrl->obj = mlx5_txq_obj_new
+				(dev, i, MLX5_TXQ_OBJ_TYPE_IBV);
+		}
 		if (!txq_ctrl->obj) {
 			rte_errno = ENOMEM;
 			goto error;
diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
index e1ed4eb..44233e9 100644
--- a/drivers/net/mlx5/mlx5_txq.c
+++ b/drivers/net/mlx5/mlx5_txq.c
@@ -136,30 +136,22 @@
 }
 
 /**
- * DPDK callback to configure a TX queue.
+ * Tx queue presetup checks.
  *
  * @param dev
  *   Pointer to Ethernet device structure.
  * @param idx
- *   TX queue index.
+ *   Tx queue index.
  * @param desc
  *   Number of descriptors to configure in queue.
- * @param socket
- *   NUMA socket on which memory must be allocated.
- * @param[in] conf
- *   Thresholds parameters.
  *
  * @return
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
-int
-mlx5_tx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
-		    unsigned int socket, const struct rte_eth_txconf *conf)
+static int
+mlx5_tx_queue_pre_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
-	struct mlx5_txq_data *txq = (*priv->txqs)[idx];
-	struct mlx5_txq_ctrl *txq_ctrl =
-		container_of(txq, struct mlx5_txq_ctrl, txq);
 
 	if (desc <= MLX5_TX_COMP_THRESH) {
 		DRV_LOG(WARNING,
@@ -191,6 +183,38 @@
 		return -rte_errno;
 	}
 	mlx5_txq_release(dev, idx);
+	return 0;
+}
+
+/**
+ * DPDK callback to configure a TX queue.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ * @param idx
+ *   TX queue index.
+ * @param desc
+ *   Number of descriptors to configure in queue.
+ * @param socket
+ *   NUMA socket on which memory must be allocated.
+ * @param[in] conf
+ *   Thresholds parameters.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_tx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
+		    unsigned int socket, const struct rte_eth_txconf *conf)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_txq_data *txq = (*priv->txqs)[idx];
+	struct mlx5_txq_ctrl *txq_ctrl =
+		container_of(txq, struct mlx5_txq_ctrl, txq);
+	int res;
+
+	res = mlx5_tx_queue_pre_setup(dev, idx, desc);
+	if (res)
+		return res;
 	txq_ctrl = mlx5_txq_new(dev, idx, desc, socket, conf);
 	if (!txq_ctrl) {
 		DRV_LOG(ERR, "port %u unable to allocate queue index %u",
@@ -204,6 +228,63 @@
 }
 
 /**
+ * DPDK callback to configure a TX hairpin queue.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ * @param idx
+ *   TX queue index.
+ * @param desc
+ *   Number of descriptors to configure in queue.
+ * @param socket
+ *   NUMA socket on which memory must be allocated.
+ * @param[in] conf
+ *   Thresholds parameters.
+ * @param[in] hairpin_conf
+ *   The hairpin binding configuration.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_tx_hairpin_queue_setup(struct rte_eth_dev *dev, uint16_t idx,
+			    uint16_t desc, unsigned int socket,
+			    const struct rte_eth_txconf *conf,
+			    const struct rte_eth_hairpin_conf *hairpin_conf)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_txq_data *txq = (*priv->txqs)[idx];
+	struct mlx5_txq_ctrl *txq_ctrl =
+		container_of(txq, struct mlx5_txq_ctrl, txq);
+	int res;
+
+	res = mlx5_tx_queue_pre_setup(dev, idx, desc);
+	if (res)
+		return res;
+	if (hairpin_conf->peer_n != 1 ||
+	    hairpin_conf->peers[0].port != dev->data->port_id ||
+	    hairpin_conf->peers[0].queue >= priv->rxqs_n) {
+		DRV_LOG(ERR, "port %u unable to setup hairpin queue index %u "
+			" invalid hairpind configuration", dev->data->port_id,
+			idx);
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+	txq_ctrl = mlx5_txq_hairpin_new(dev, idx, desc, socket, conf,
+					hairpin_conf);
+	if (!txq_ctrl) {
+		DRV_LOG(ERR, "port %u unable to allocate queue index %u",
+			dev->data->port_id, idx);
+		return -rte_errno;
+	}
+	DRV_LOG(DEBUG, "port %u adding Tx queue %u to list",
+		dev->data->port_id, idx);
+	(*priv->txqs)[idx] = &txq_ctrl->txq;
+	txq_ctrl->type = MLX5_TXQ_TYPE_HAIRPIN;
+	return 0;
+}
+
+/**
  * DPDK callback to release a TX queue.
  *
  * @param dpdk_txq
@@ -246,6 +327,8 @@
 	const size_t page_size = sysconf(_SC_PAGESIZE);
 #endif
 
+	if (txq_ctrl->type != MLX5_TXQ_TYPE_STANDARD)
+		return;
 	assert(rte_eal_process_type() == RTE_PROC_PRIMARY);
 	assert(ppriv);
 	ppriv->uar_table[txq_ctrl->txq.idx] = txq_ctrl->bf_reg;
@@ -282,6 +365,8 @@
 	uintptr_t offset;
 	const size_t page_size = sysconf(_SC_PAGESIZE);
 
+	if (txq_ctrl->type != MLX5_TXQ_TYPE_STANDARD)
+		return 0;
 	assert(ppriv);
 	/*
 	 * As rdma-core, UARs are mapped in size of OS page
@@ -316,6 +401,8 @@
 	const size_t page_size = sysconf(_SC_PAGESIZE);
 	void *addr;
 
+	if (txq_ctrl->type != MLX5_TXQ_TYPE_STANDARD)
+		return;
 	addr = ppriv->uar_table[txq_ctrl->txq.idx];
 	munmap(RTE_PTR_ALIGN_FLOOR(addr, page_size), page_size);
 }
@@ -346,6 +433,8 @@
 			continue;
 		txq = (*priv->txqs)[i];
 		txq_ctrl = container_of(txq, struct mlx5_txq_ctrl, txq);
+		if (txq_ctrl->type != MLX5_TXQ_TYPE_STANDARD)
+			continue;
 		assert(txq->idx == (uint16_t)i);
 		ret = txq_uar_init_secondary(txq_ctrl, fd);
 		if (ret)
@@ -365,18 +454,87 @@
 }
 
 /**
+ * Create the Tx hairpin queue object.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param idx
+ *   Queue index in DPDK Tx queue array.
+ *
+ * @return
+ *   The hairpin DevX object initialised, NULL otherwise and rte_errno is set.
+ */
+static struct mlx5_txq_obj *
+mlx5_txq_obj_hairpin_new(struct rte_eth_dev *dev, uint16_t idx)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_txq_data *txq_data = (*priv->txqs)[idx];
+	struct mlx5_txq_ctrl *txq_ctrl =
+		container_of(txq_data, struct mlx5_txq_ctrl, txq);
+	struct mlx5_devx_create_sq_attr attr = { 0 };
+	struct mlx5_txq_obj *tmpl = NULL;
+	int ret = 0;
+
+	assert(txq_data);
+	assert(!txq_ctrl->obj);
+	tmpl = rte_calloc_socket(__func__, 1, sizeof(*tmpl), 0,
+				 txq_ctrl->socket);
+	if (!tmpl) {
+		DRV_LOG(ERR,
+			"port %u Tx queue %u cannot allocate memory resources",
+			dev->data->port_id, txq_data->idx);
+		rte_errno = ENOMEM;
+		return NULL; /* tmpl is NULL, nothing to clean up. */
+	}
+	tmpl->type = MLX5_TXQ_OBJ_TYPE_DEVX_HAIRPIN;
+	tmpl->txq_ctrl = txq_ctrl;
+	attr.hairpin = 1;
+	attr.tis_lst_sz = 1;
+	/* Workaround for hairpin startup. */
+	attr.wq_attr.log_hairpin_num_packets = log2above(32);
+	/* Workaround for packets larger than 1KB. */
+	attr.wq_attr.log_hairpin_data_sz =
+			priv->config.hca_attr.log_max_hairpin_wq_data_sz;
+	attr.tis_num = priv->sh->tis->id;
+	tmpl->sq = mlx5_devx_cmd_create_sq(priv->sh->ctx, &attr);
+	if (!tmpl->sq) {
+		DRV_LOG(ERR,
+			"port %u tx hairpin queue %u can't create sq object",
+			dev->data->port_id, idx);
+		rte_errno = errno;
+		goto error;
+	}
+	DRV_LOG(DEBUG, "port %u sxq %u updated with %p", dev->data->port_id,
+		idx, (void *)&tmpl);
+	rte_atomic32_inc(&tmpl->refcnt);
+	LIST_INSERT_HEAD(&priv->txqsobj, tmpl, next);
+	return tmpl;
+error:
+	ret = rte_errno; /* Save rte_errno before cleanup. */
+	if (tmpl->tis)
+		mlx5_devx_cmd_destroy(tmpl->tis);
+	if (tmpl->sq)
+		mlx5_devx_cmd_destroy(tmpl->sq);
+	rte_free(tmpl);
+	rte_errno = ret; /* Restore rte_errno. */
+	return NULL;
+}
+
+/**
  * Create the Tx queue Verbs object.
  *
  * @param dev
  *   Pointer to Ethernet device.
  * @param idx
  *   Queue index in DPDK Tx queue array.
+ * @param type
+ *   Type of the Tx queue object to create.
  *
  * @return
  *   The Verbs object initialised, NULL otherwise and rte_errno is set.
  */
 struct mlx5_txq_obj *
-mlx5_txq_obj_new(struct rte_eth_dev *dev, uint16_t idx)
+mlx5_txq_obj_new(struct rte_eth_dev *dev, uint16_t idx,
+		 enum mlx5_txq_obj_type type)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_txq_data *txq_data = (*priv->txqs)[idx];
@@ -396,6 +554,8 @@ struct mlx5_txq_obj *
 	const int desc = 1 << txq_data->elts_n;
 	int ret = 0;
 
+	if (type == MLX5_TXQ_OBJ_TYPE_DEVX_HAIRPIN)
+		return mlx5_txq_obj_hairpin_new(dev, idx);
 #ifdef HAVE_IBV_FLOW_DV_SUPPORT
 	/* If using DevX, need additional mask to read tisn value. */
 	if (priv->config.devx && !priv->sh->tdn)
@@ -643,8 +803,13 @@ struct mlx5_txq_obj *
 {
 	assert(txq_obj);
 	if (rte_atomic32_dec_and_test(&txq_obj->refcnt)) {
-		claim_zero(mlx5_glue->destroy_qp(txq_obj->qp));
-		claim_zero(mlx5_glue->destroy_cq(txq_obj->cq));
+		if (txq_obj->type == MLX5_TXQ_OBJ_TYPE_DEVX_HAIRPIN) {
+			if (txq_obj->tis)
+				claim_zero(mlx5_devx_cmd_destroy(txq_obj->tis));
+		} else {
+			claim_zero(mlx5_glue->destroy_qp(txq_obj->qp));
+			claim_zero(mlx5_glue->destroy_cq(txq_obj->cq));
+		}
 		LIST_REMOVE(txq_obj, next);
 		rte_free(txq_obj);
 		return 0;
@@ -953,6 +1118,7 @@ struct mlx5_txq_ctrl *
 		goto error;
 	}
 	rte_atomic32_inc(&tmpl->refcnt);
+	tmpl->type = MLX5_TXQ_TYPE_STANDARD;
 	LIST_INSERT_HEAD(&priv->txqsctrl, tmpl, next);
 	return tmpl;
 error:
@@ -961,6 +1127,55 @@ struct mlx5_txq_ctrl *
 }
 
 /**
+ * Create a DPDK Tx hairpin queue.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param idx
+ *   TX queue index.
+ * @param desc
+ *   Number of descriptors to configure in queue.
+ * @param socket
+ *   NUMA socket on which memory must be allocated.
+ * @param[in] conf
+ *  Thresholds parameters.
+ * @param hairpin_conf
+ *  The hairpin configuration.
+ *
+ * @return
+ *   A DPDK queue object on success, NULL otherwise and rte_errno is set.
+ */
+struct mlx5_txq_ctrl *
+mlx5_txq_hairpin_new(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
+		     unsigned int socket, const struct rte_eth_txconf *conf,
+		     const struct rte_eth_hairpin_conf *hairpin_conf)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_txq_ctrl *tmpl;
+
+	tmpl = rte_calloc_socket("TXQ", 1,
+				 sizeof(*tmpl),
+				 0, socket);
+	if (!tmpl) {
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+	assert(desc > MLX5_TX_COMP_THRESH);
+	tmpl->txq.offloads = conf->offloads |
+			     dev->data->dev_conf.txmode.offloads;
+	tmpl->priv = priv;
+	tmpl->socket = socket;
+	tmpl->txq.elts_n = log2above(desc);
+	tmpl->txq.port_id = dev->data->port_id;
+	tmpl->txq.idx = idx;
+	tmpl->hairpin_conf = *hairpin_conf;
+	tmpl->type = MLX5_TXQ_TYPE_HAIRPIN;
+	rte_atomic32_inc(&tmpl->refcnt);
+	LIST_INSERT_HEAD(&priv->txqsctrl, tmpl, next);
+	return tmpl;
+}
+
+/**
  * Get a Tx queue.
  *
  * @param dev
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 186+ messages in thread

* [dpdk-dev] [PATCH 06/13] app/testpmd: add hairpin support
  2019-09-26  6:28 [dpdk-dev] [PATCH 00/13] add hairpin feature Ori Kam
                   ` (4 preceding siblings ...)
  2019-09-26  6:29 ` [dpdk-dev] [PATCH 05/13] net/mlx5: support Tx hairpin queues Ori Kam
@ 2019-09-26  6:29 ` Ori Kam
  2019-09-26  9:32   ` Slava Ovsiienko
  2019-09-26  6:29 ` [dpdk-dev] [PATCH 07/13] net/mlx5: add hairpin binding function Ori Kam
                   ` (13 subsequent siblings)
  19 siblings, 1 reply; 186+ messages in thread
From: Ori Kam @ 2019-09-26  6:29 UTC (permalink / raw)
  To: Wenzhuo Lu, Jingjing Wu, Bernard Iremonger; +Cc: dev, orika, stephen

This commit introduces hairpin queues to testpmd. The number of
hairpin queues per port is configured using --hairpinq=<n>, which adds
n queues to both the total number of Tx queues and Rx queues. The
connection between the queues is 1 to 1: the first Rx hairpin queue is
connected to the first Tx hairpin queue, and so on.
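
For example, a hypothetical invocation (EAL and device options elided):

	testpmd ... -- -i --rxq=2 --txq=2 --hairpinq=2

gives each port 4 Rx and 4 Tx queues in total; queues 2-3 are the
hairpin queues, with Rx hairpin queue 2 peered to Tx hairpin queue 2
and Rx hairpin queue 3 peered to Tx hairpin queue 3.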

Signed-off-by: Ori Kam <orika@mellanox.com>
---
 app/test-pmd/parameters.c | 12 ++++++++++
 app/test-pmd/testpmd.c    | 59 +++++++++++++++++++++++++++++++++++++++++++++--
 app/test-pmd/testpmd.h    |  1 +
 3 files changed, 70 insertions(+), 2 deletions(-)

diff --git a/app/test-pmd/parameters.c b/app/test-pmd/parameters.c
index 6c78dca..16bdcc8 100644
--- a/app/test-pmd/parameters.c
+++ b/app/test-pmd/parameters.c
@@ -147,6 +147,8 @@
 	printf("  --rxd=N: set the number of descriptors in RX rings to N.\n");
 	printf("  --txq=N: set the number of TX queues per port to N.\n");
 	printf("  --txd=N: set the number of descriptors in TX rings to N.\n");
+	printf("  --hairpinq=N: set the number of hairpin queues per port to "
+	       "N.\n");
 	printf("  --burst=N: set the number of packets per burst to N.\n");
 	printf("  --mbcache=N: set the cache of mbuf memory pool to N.\n");
 	printf("  --rxpt=N: set prefetch threshold register of RX rings to N.\n");
@@ -618,6 +620,7 @@
 		{ "txq",			1, 0, 0 },
 		{ "rxd",			1, 0, 0 },
 		{ "txd",			1, 0, 0 },
+		{ "hairpinq",			1, 0, 0 },
 		{ "burst",			1, 0, 0 },
 		{ "mbcache",			1, 0, 0 },
 		{ "txpt",			1, 0, 0 },
@@ -1036,6 +1039,15 @@
 						  " >= 0 && <= %u\n", n,
 						  get_allowed_max_nb_txq(&pid));
 			}
+			if (!strcmp(lgopts[opt_idx].name, "hairpinq")) {
+				n = atoi(optarg);
+				if (n >= 0 && check_nb_txq((queueid_t)n) == 0)
+					nb_hairpinq = (queueid_t) n;
+				else
+					rte_exit(EXIT_FAILURE, "txq %d invalid - must be"
+						  " >= 0 && <= %u\n", n,
+						  get_allowed_max_nb_txq(&pid));
+			}
 			if (!nb_rxq && !nb_txq) {
 				rte_exit(EXIT_FAILURE, "Either rx or tx queues should "
 						"be non-zero\n");
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index de91e1b..f15a308 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -235,6 +235,7 @@ struct fwd_engine * fwd_engines[] = {
 /*
  * Configurable number of RX/TX queues.
  */
+queueid_t nb_hairpinq; /**< Number of hairpin queues per port. */
 queueid_t nb_rxq = 1; /**< Number of RX queues per port. */
 queueid_t nb_txq = 1; /**< Number of TX queues per port. */
 
@@ -2064,6 +2065,10 @@ struct extmem_param {
 	queueid_t qi;
 	struct rte_port *port;
 	struct rte_ether_addr mac_addr;
+	struct rte_eth_hairpin_conf hairpin_conf = {
+		.peer_n = 1,
+	};
+	int i;
 
 	if (port_id_is_invalid(pid, ENABLED_WARN))
 		return 0;
@@ -2097,8 +2102,9 @@ struct extmem_param {
 			printf("Configuring Port %d (socket %u)\n", pi,
 					port->socket_id);
 			/* configure port */
-			diag = rte_eth_dev_configure(pi, nb_rxq, nb_txq,
-						&(port->dev_conf));
+			diag = rte_eth_dev_configure(pi, nb_rxq + nb_hairpinq,
+						     nb_txq + nb_hairpinq,
+						     &(port->dev_conf));
 			if (diag != 0) {
 				if (rte_atomic16_cmpset(&(port->port_status),
 				RTE_PORT_HANDLING, RTE_PORT_STOPPED) == 0)
@@ -2191,6 +2197,55 @@ struct extmem_param {
 				port->need_reconfig_queues = 1;
 				return -1;
 			}
+			/* setup hairpin queues */
+			i = 0;
+			for (qi = nb_txq; qi < nb_hairpinq + nb_txq; qi++) {
+				hairpin_conf.peers[0].port = pi;
+				hairpin_conf.peers[0].queue = i + nb_rxq;
+				diag = rte_eth_tx_hairpin_queue_setup
+					(pi, qi, nb_txd,
+					 port->socket_id, &(port->tx_conf[qi]),
+					 &hairpin_conf);
+				i++;
+				if (diag == 0)
+					continue;
+
+				/* Fail to setup Tx hairpin queue, return. */
+				if (rte_atomic16_cmpset(&(port->port_status),
+							RTE_PORT_HANDLING,
+							RTE_PORT_STOPPED) == 0)
+					printf("Port %d can not be set back "
+							"to stopped\n", pi);
+				printf("Fail to configure port %d hairpin "
+				       "queues\n", pi);
+				/* try to reconfigure queues next time */
+				port->need_reconfig_queues = 1;
+				return -1;
+			}
+			i = 0;
+			for (qi = nb_rxq; qi < nb_hairpinq + nb_rxq; qi++) {
+				hairpin_conf.peers[0].port = pi;
+				hairpin_conf.peers[0].queue = i + nb_txq;
+				diag = rte_eth_rx_hairpin_queue_setup
+					(pi, qi, nb_rxd,
+					 port->socket_id, &(port->rx_conf[qi]),
+					 &hairpin_conf);
+				i++;
+				if (diag == 0)
+					continue;
+
+				/* Fail to setup Rx hairpin queue, return. */
+				if (rte_atomic16_cmpset(&(port->port_status),
+							RTE_PORT_HANDLING,
+							RTE_PORT_STOPPED) == 0)
+					printf("Port %d can not be set back "
+							"to stopped\n", pi);
+				printf("Fail to configure port %d hairpin "
+				       "queues\n", pi);
+				/* try to reconfigure queues next time */
+				port->need_reconfig_queues = 1;
+				return -1;
+			}
 		}
 		configure_rxtx_dump_callbacks(verbose_level);
 		/* start port */
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index d73955d..09baa72 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -383,6 +383,7 @@ struct queue_stats_mappings {
 
 extern uint64_t rss_hf;
 
+extern queueid_t nb_hairpinq;
 extern queueid_t nb_rxq;
 extern queueid_t nb_txq;
 
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 186+ messages in thread

* [dpdk-dev] [PATCH 07/13] net/mlx5: add hairpin binding function
  2019-09-26  6:28 [dpdk-dev] [PATCH 00/13] add hairpin feature Ori Kam
                   ` (5 preceding siblings ...)
  2019-09-26  6:29 ` [dpdk-dev] [PATCH 06/13] app/testpmd: add hairpin support Ori Kam
@ 2019-09-26  6:29 ` Ori Kam
  2019-09-26  9:33   ` Slava Ovsiienko
  2019-09-26  6:29 ` [dpdk-dev] [PATCH 08/13] net/mlx5: add support for hairpin hrxq Ori Kam
                   ` (12 subsequent siblings)
  19 siblings, 1 reply; 186+ messages in thread
From: Ori Kam @ 2019-09-26  6:29 UTC (permalink / raw)
  To: Matan Azrad, Shahaf Shuler, Viacheslav Ovsiienko
  Cc: dev, orika, jingjing.wu, stephen

When starting the port, in addition to creating the queues, we need to
bind the hairpin Tx queues to their peer Rx queues.
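
In outline, the binding below drives each queue pair from RST to RDY
while cross-referencing the peers (a sketch of the sequence implemented
in mlx5_hairpin_bind(); sq, rq and priv are as in the function body):

	struct mlx5_devx_modify_sq_attr sq_attr = { 0 };

	sq_attr.sq_state = MLX5_SQC_STATE_RST;	/* current state */
	sq_attr.state = MLX5_SQC_STATE_RDY;	/* target state */
	sq_attr.hairpin_peer_rq = rq->id;
	sq_attr.hairpin_peer_vhca = priv->config.hca_attr.vhca_id;
	ret = mlx5_devx_cmd_modify_sq(sq, &sq_attr);
	/* The RQ is then moved to RDY pointing back at the SQ. */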

Signed-off-by: Ori Kam <orika@mellanox.com>
---
 drivers/net/mlx5/mlx5.h           |  1 +
 drivers/net/mlx5/mlx5_devx_cmds.c |  1 +
 drivers/net/mlx5/mlx5_prm.h       |  6 +++
 drivers/net/mlx5/mlx5_trigger.c   | 97 +++++++++++++++++++++++++++++++++++++++
 4 files changed, 105 insertions(+)

diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 506920e..41eb35a 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -188,6 +188,7 @@ struct mlx5_hca_attr {
 	uint32_t log_max_hairpin_queues:5;
 	uint32_t log_max_hairpin_wq_data_sz:5;
 	uint32_t log_max_hairpin_num_packets:5;
+	uint32_t vhca_id:16;
 };
 
 /* Flow list . */
diff --git a/drivers/net/mlx5/mlx5_devx_cmds.c b/drivers/net/mlx5/mlx5_devx_cmds.c
index 917bbf9..0243733 100644
--- a/drivers/net/mlx5/mlx5_devx_cmds.c
+++ b/drivers/net/mlx5/mlx5_devx_cmds.c
@@ -334,6 +334,7 @@ struct mlx5_devx_obj *
 						    log_max_hairpin_wq_data_sz);
 	attr->log_max_hairpin_num_packets = MLX5_GET
 		(cmd_hca_cap, hcattr, log_min_hairpin_wq_data_sz);
+	attr->vhca_id = MLX5_GET(cmd_hca_cap, hcattr, vhca_id);
 	attr->eth_net_offloads = MLX5_GET(cmd_hca_cap, hcattr,
 					  eth_net_offloads);
 	attr->eth_virt = MLX5_GET(cmd_hca_cap, hcattr, eth_virt);
diff --git a/drivers/net/mlx5/mlx5_prm.h b/drivers/net/mlx5/mlx5_prm.h
index faa7996..d4084db 100644
--- a/drivers/net/mlx5/mlx5_prm.h
+++ b/drivers/net/mlx5/mlx5_prm.h
@@ -1611,6 +1611,12 @@ struct mlx5_ifc_create_rqt_in_bits {
 #pragma GCC diagnostic error "-Wpedantic"
 #endif
 
+enum {
+	MLX5_SQC_STATE_RST  = 0x0,
+	MLX5_SQC_STATE_RDY  = 0x1,
+	MLX5_SQC_STATE_ERR  = 0x3,
+};
+
 struct mlx5_ifc_sqc_bits {
 	u8 rlky[0x1];
 	u8 cd_master[0x1];
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index 3ec86c4..a4fcdb3 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -162,6 +162,96 @@
 }
 
 /**
+ * Binds Tx queues to Rx queues for hairpin.
+ *
+ * Walks over the hairpin Tx queues and binds each one to its peer Rx queue.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx5_hairpin_bind(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_devx_modify_sq_attr sq_attr = { 0 };
+	struct mlx5_devx_modify_rq_attr rq_attr = { 0 };
+	struct mlx5_txq_ctrl *txq_ctrl;
+	struct mlx5_rxq_ctrl *rxq_ctrl;
+	struct mlx5_devx_obj *sq;
+	struct mlx5_devx_obj *rq;
+	unsigned int i;
+	int ret = 0;
+
+	for (i = 0; i != priv->txqs_n; ++i) {
+		txq_ctrl = mlx5_txq_get(dev, i);
+		if (!txq_ctrl)
+			continue;
+		if (txq_ctrl->type != MLX5_TXQ_TYPE_HAIRPIN) {
+			mlx5_txq_release(dev, i);
+			continue;
+		}
+		if (!txq_ctrl->obj) {
+			rte_errno = ENOMEM;
+			DRV_LOG(ERR, "port %u no txq object found: %d",
+				dev->data->port_id, i);
+			mlx5_txq_release(dev, i);
+			return -rte_errno;
+		}
+		sq = txq_ctrl->obj->sq;
+		rxq_ctrl = mlx5_rxq_get(dev,
+					txq_ctrl->hairpin_conf.peers[0].queue);
+		if (!rxq_ctrl) {
+			mlx5_txq_release(dev, i);
+			rte_errno = EINVAL;
+			DRV_LOG(ERR, "port %u no rxq object found: %d",
+				dev->data->port_id,
+				txq_ctrl->hairpin_conf.peers[0].queue);
+			return -rte_errno;
+		}
+		if (rxq_ctrl->type != MLX5_RXQ_TYPE_HAIRPIN ||
+		    rxq_ctrl->hairpin_conf.peers[0].queue != i) {
+			rte_errno = EINVAL;
+			DRV_LOG(ERR, "port %u Tx queue %d cannot be bound to "
+				"Rx queue %d", dev->data->port_id,
+				i, txq_ctrl->hairpin_conf.peers[0].queue);
+			goto error;
+		}
+		rq = rxq_ctrl->obj->rq;
+		if (!rq) {
+			rte_errno = ENOMEM;
+			DRV_LOG(ERR, "port %u hairpin no matching rxq: %d",
+				dev->data->port_id,
+				txq_ctrl->hairpin_conf.peers[0].queue);
+			goto error;
+		}
+		sq_attr.state = MLX5_SQC_STATE_RDY;
+		sq_attr.sq_state = MLX5_SQC_STATE_RST;
+		sq_attr.hairpin_peer_rq = rq->id;
+		sq_attr.hairpin_peer_vhca = priv->config.hca_attr.vhca_id;
+		ret = mlx5_devx_cmd_modify_sq(sq, &sq_attr);
+		if (ret)
+			goto error;
+		rq_attr.state = MLX5_SQC_STATE_RDY;
+		rq_attr.rq_state = MLX5_SQC_STATE_RST;
+		rq_attr.hairpin_peer_sq = sq->id;
+		rq_attr.hairpin_peer_vhca = priv->config.hca_attr.vhca_id;
+		ret = mlx5_devx_cmd_modify_rq(rq, &rq_attr);
+		if (ret)
+			goto error;
+		mlx5_txq_release(dev, i);
+		mlx5_rxq_release(dev, txq_ctrl->hairpin_conf.peers[0].queue);
+	}
+	return 0;
+error:
+	mlx5_txq_release(dev, i);
+	mlx5_rxq_release(dev, txq_ctrl->hairpin_conf.peers[0].queue);
+	return -rte_errno;
+}
+
+/**
  * DPDK callback to start the device.
  *
  * Simulate device start by attaching all configured flows.
@@ -192,6 +282,13 @@
 		mlx5_txq_stop(dev);
 		return -rte_errno;
 	}
+	ret = mlx5_hairpin_bind(dev);
+	if (ret) {
+		DRV_LOG(ERR, "port %u hairpin binding failed: %s",
+			dev->data->port_id, strerror(rte_errno));
+		mlx5_txq_stop(dev);
+		return -rte_errno;
+	}
 	dev->data->dev_started = 1;
 	ret = mlx5_rx_intr_vec_enable(dev);
 	if (ret) {
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 186+ messages in thread

* [dpdk-dev] [PATCH 08/13] net/mlx5: add support for hairpin hrxq
  2019-09-26  6:28 [dpdk-dev] [PATCH 00/13] add hairpin feature Ori Kam
                   ` (6 preceding siblings ...)
  2019-09-26  6:29 ` [dpdk-dev] [PATCH 07/13] net/mlx5: add hairpin binding function Ori Kam
@ 2019-09-26  6:29 ` Ori Kam
  2019-09-26  9:33   ` Slava Ovsiienko
  2019-09-26  6:29 ` [dpdk-dev] [PATCH 09/13] net/mlx5: add internal tag item and action Ori Kam
                   ` (11 subsequent siblings)
  19 siblings, 1 reply; 186+ messages in thread
From: Ori Kam @ 2019-09-26  6:29 UTC (permalink / raw)
  To: Matan Azrad, Shahaf Shuler, Viacheslav Ovsiienko
  Cc: dev, orika, jingjing.wu, stephen

The hairpin hrxq is based on the DevX hrxq but uses a different
transport domain.
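
In effect the TIR creation below selects the transport domain per
queue object type (restating the change as a sketch; names as in the
diff):

	if (rxq_ctrl->obj->type == MLX5_RXQ_OBJ_TYPE_DEVX_HAIRPIN)
		tir_attr.transport_domain = priv->sh->td->id;
	else
		tir_attr.transport_domain = priv->sh->tdn;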

Signed-off-by: Ori Kam <orika@mellanox.com>
---
 drivers/net/mlx5/mlx5_rxq.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index a673da9..bf39112 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -2344,13 +2344,13 @@ struct mlx5_hrxq *
 	struct mlx5_ind_table_obj *ind_tbl;
 	int err;
 	struct mlx5_devx_obj *tir = NULL;
+	struct mlx5_rxq_data *rxq_data = (*priv->rxqs)[queues[0]];
+	struct mlx5_rxq_ctrl *rxq_ctrl =
+		container_of(rxq_data, struct mlx5_rxq_ctrl, rxq);
 
 	queues_n = hash_fields ? queues_n : 1;
 	ind_tbl = mlx5_ind_table_obj_get(dev, queues, queues_n);
 	if (!ind_tbl) {
-		struct mlx5_rxq_data *rxq_data = (*priv->rxqs)[queues[0]];
-		struct mlx5_rxq_ctrl *rxq_ctrl =
-			container_of(rxq_data, struct mlx5_rxq_ctrl, rxq);
 		enum mlx5_ind_tbl_type type;
 
 		type = rxq_ctrl->obj->type == MLX5_RXQ_OBJ_TYPE_IBV ?
@@ -2446,7 +2446,10 @@ struct mlx5_hrxq *
 		tir_attr.rx_hash_fn = MLX5_RX_HASH_FN_TOEPLITZ;
 		memcpy(&tir_attr.rx_hash_field_selector_outer, &hash_fields,
 		       sizeof(uint64_t));
-		tir_attr.transport_domain = priv->sh->tdn;
+		if (rxq_ctrl->obj->type == MLX5_RXQ_OBJ_TYPE_DEVX_HAIRPIN)
+			tir_attr.transport_domain = priv->sh->td->id;
+		else
+			tir_attr.transport_domain = priv->sh->tdn;
 		memcpy(tir_attr.rx_hash_toeplitz_key, rss_key, rss_key_len);
 		tir_attr.indirect_table = ind_tbl->rqt->id;
 		if (dev->data->dev_conf.lpbk_mode)
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 186+ messages in thread

* [dpdk-dev] [PATCH 09/13] net/mlx5: add internal tag item and action
  2019-09-26  6:28 [dpdk-dev] [PATCH 00/13] add hairpin feature Ori Kam
                   ` (7 preceding siblings ...)
  2019-09-26  6:29 ` [dpdk-dev] [PATCH 08/13] net/mlx5: add support for hairpin hrxq Ori Kam
@ 2019-09-26  6:29 ` Ori Kam
  2019-09-26  9:33   ` Slava Ovsiienko
  2019-09-26  6:29 ` [dpdk-dev] [PATCH 10/13] net/mlx5: add id generation function Ori Kam
                   ` (10 subsequent siblings)
  19 siblings, 1 reply; 186+ messages in thread
From: Ori Kam @ 2019-09-26  6:29 UTC (permalink / raw)
  To: Matan Azrad, Shahaf Shuler, Viacheslav Ovsiienko
  Cc: dev, orika, jingjing.wu, stephen

This commit introduces setting and matching on registers. The item and
the action will be used by a number of different features such as
hairpin, metering and metadata.
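
As an illustration of how the private types added below are meant to
be used when splitting a flow (a hedged sketch; flow_id is a
hypothetical correlation value and the register choice would normally
come from flow_get_reg_id()):

	struct mlx5_rte_flow_action_set_tag set_tag = {
		.id = REG_C_2,
		.data = rte_cpu_to_be_32(flow_id), /* stored big-endian */
	};
	struct rte_flow_action tag_action = {
		.type = (enum rte_flow_action_type)
			MLX5_RTE_FLOW_ACTION_TYPE_TAG,
		.conf = &set_tag,
	};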

Signed-off-by: Ori Kam <orika@mellanox.com>
---
 drivers/net/mlx5/mlx5_flow.c    |  52 +++++++++++++
 drivers/net/mlx5/mlx5_flow.h    |  54 ++++++++++++--
 drivers/net/mlx5/mlx5_flow_dv.c | 158 +++++++++++++++++++++++++++++++++++++++-
 drivers/net/mlx5/mlx5_prm.h     |   3 +-
 4 files changed, 257 insertions(+), 10 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index 482f65b..00afc18 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -316,6 +316,58 @@ struct mlx5_flow_tunnel_info {
 	},
 };
 
+enum mlx5_feature_name {
+	MLX5_HAIRPIN_RX,
+	MLX5_HAIRPIN_TX,
+	MLX5_APPLICATION,
+};
+
+/**
+ * Translate tag ID to register.
+ *
+ * @param[in] dev
+ *   Pointer to the Ethernet device structure.
+ * @param[in] feature
+ *   The feature that request the register.
+ * @param[in] id
+ *   The request register ID.
+ * @param[out] error
+ *   Error description in case of any.
+ *
+ * @return
+ *   The request register on success, a negative errno
+ *   value otherwise and rte_errno is set.
+ */
+__rte_unused
+static enum modify_reg flow_get_reg_id(struct rte_eth_dev *dev,
+				       enum mlx5_feature_name feature,
+				       uint32_t id,
+				       struct rte_flow_error *error)
+{
+	static enum modify_reg id2reg[] = {
+		[0] = REG_A,
+		[1] = REG_C_2,
+		[2] = REG_C_3,
+		[3] = REG_C_4,
+		[4] = REG_B,};
+
+	dev = (void *)dev;
+	switch (feature) {
+	case MLX5_HAIRPIN_RX:
+		return REG_B;
+	case MLX5_HAIRPIN_TX:
+		return REG_A;
+	case MLX5_APPLICATION:
+		if (id > 4)
+			return rte_flow_error_set(error, EINVAL,
+						  RTE_FLOW_ERROR_TYPE_ITEM,
+						  NULL, "invalid tag id");
+		return id2reg[id];
+	}
+	return rte_flow_error_set(error, EINVAL, RTE_FLOW_ERROR_TYPE_ITEM,
+				  NULL, "invalid feature name");
+}
+
 /**
  * Discover the maximum number of priority available.
  *
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 235bccd..0148c1b 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -27,6 +27,43 @@
 #include "mlx5.h"
 #include "mlx5_prm.h"
 
+enum modify_reg {
+	REG_A,
+	REG_B,
+	REG_C_0,
+	REG_C_1,
+	REG_C_2,
+	REG_C_3,
+	REG_C_4,
+	REG_C_5,
+	REG_C_6,
+	REG_C_7,
+};
+
+/* Private rte flow items. */
+enum mlx5_rte_flow_item_type {
+	MLX5_RTE_FLOW_ITEM_TYPE_END = INT_MIN,
+	MLX5_RTE_FLOW_ITEM_TYPE_TAG,
+};
+
+/* Private rte flow actions. */
+enum mlx5_rte_flow_action_type {
+	MLX5_RTE_FLOW_ACTION_TYPE_END = INT_MIN,
+	MLX5_RTE_FLOW_ACTION_TYPE_TAG,
+};
+
+/* Matches on selected register. */
+struct mlx5_rte_flow_item_tag {
+	uint16_t id;
+	rte_be32_t data;
+};
+
+/* Modify selected register. */
+struct mlx5_rte_flow_action_set_tag {
+	uint16_t id;
+	rte_be32_t data;
+};
+
 /* Pattern outer Layer bits. */
 #define MLX5_FLOW_LAYER_OUTER_L2 (1u << 0)
 #define MLX5_FLOW_LAYER_OUTER_L3_IPV4 (1u << 1)
@@ -53,16 +90,17 @@
 /* General pattern items bits. */
 #define MLX5_FLOW_ITEM_METADATA (1u << 16)
 #define MLX5_FLOW_ITEM_PORT_ID (1u << 17)
+#define MLX5_FLOW_ITEM_TAG (1u << 18)
 
 /* Pattern MISC bits. */
-#define MLX5_FLOW_LAYER_ICMP (1u << 18)
-#define MLX5_FLOW_LAYER_ICMP6 (1u << 19)
-#define MLX5_FLOW_LAYER_GRE_KEY (1u << 20)
+#define MLX5_FLOW_LAYER_ICMP (1u << 19)
+#define MLX5_FLOW_LAYER_ICMP6 (1u << 20)
+#define MLX5_FLOW_LAYER_GRE_KEY (1u << 21)
 
 /* Pattern tunnel Layer bits (continued). */
-#define MLX5_FLOW_LAYER_IPIP (1u << 21)
-#define MLX5_FLOW_LAYER_IPV6_ENCAP (1u << 22)
-#define MLX5_FLOW_LAYER_NVGRE (1u << 23)
+#define MLX5_FLOW_LAYER_IPIP (1u << 22)
+#define MLX5_FLOW_LAYER_IPV6_ENCAP (1u << 23)
+#define MLX5_FLOW_LAYER_NVGRE (1u << 24)
 
 /* Outer Masks. */
 #define MLX5_FLOW_LAYER_OUTER_L3 \
@@ -139,6 +177,7 @@
 #define MLX5_FLOW_ACTION_DEC_TCP_SEQ (1u << 29)
 #define MLX5_FLOW_ACTION_INC_TCP_ACK (1u << 30)
 #define MLX5_FLOW_ACTION_DEC_TCP_ACK (1u << 31)
+#define MLX5_FLOW_ACTION_SET_TAG (1ull << 32)
 
 #define MLX5_FLOW_FATE_ACTIONS \
 	(MLX5_FLOW_ACTION_DROP | MLX5_FLOW_ACTION_QUEUE | \
@@ -172,7 +211,8 @@
 				      MLX5_FLOW_ACTION_DEC_TCP_SEQ | \
 				      MLX5_FLOW_ACTION_INC_TCP_ACK | \
 				      MLX5_FLOW_ACTION_DEC_TCP_ACK | \
-				      MLX5_FLOW_ACTION_OF_SET_VLAN_VID)
+				      MLX5_FLOW_ACTION_OF_SET_VLAN_VID | \
+				      MLX5_FLOW_ACTION_SET_TAG)
 
 #define MLX5_FLOW_VLAN_ACTIONS (MLX5_FLOW_ACTION_OF_POP_VLAN | \
 				MLX5_FLOW_ACTION_OF_PUSH_VLAN)
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index 2a7e3ed..dde0831 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -723,6 +723,59 @@ struct field_modify_info modify_tcp[] = {
 					     MLX5_MODIFICATION_TYPE_ADD, error);
 }
 
+static enum mlx5_modification_field reg_to_field[] = {
+	[REG_A] = MLX5_MODI_META_DATA_REG_A,
+	[REG_B] = MLX5_MODI_META_DATA_REG_B,
+	[REG_C_0] = MLX5_MODI_META_REG_C_0,
+	[REG_C_1] = MLX5_MODI_META_REG_C_1,
+	[REG_C_2] = MLX5_MODI_META_REG_C_2,
+	[REG_C_3] = MLX5_MODI_META_REG_C_3,
+	[REG_C_4] = MLX5_MODI_META_REG_C_4,
+	[REG_C_5] = MLX5_MODI_META_REG_C_5,
+	[REG_C_6] = MLX5_MODI_META_REG_C_6,
+	[REG_C_7] = MLX5_MODI_META_REG_C_7,
+};
+
+/**
+ * Convert register set to DV specification.
+ *
+ * @param[in,out] resource
+ *   Pointer to the modify-header resource.
+ * @param[in] action
+ *   Pointer to action specification.
+ * @param[out] error
+ *   Pointer to the error structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+flow_dv_convert_action_set_reg
+			(struct mlx5_flow_dv_modify_hdr_resource *resource,
+			 const struct rte_flow_action *action,
+			 struct rte_flow_error *error)
+{
+	const struct mlx5_rte_flow_action_set_tag *conf = (action->conf);
+	struct mlx5_modification_cmd *actions = resource->actions;
+	uint32_t i = resource->actions_num;
+
+	if (i >= MLX5_MODIFY_NUM)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_ACTION, NULL,
+					  "too many items to modify");
+	actions[i].action_type = MLX5_MODIFICATION_TYPE_SET;
+	actions[i].field = reg_to_field[conf->id];
+	actions[i].data0 = rte_cpu_to_be_32(actions[i].data0);
+	actions[i].data1 = conf->data;
+	++i;
+	resource->actions_num = i;
+	if (!resource->actions_num)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_ACTION, NULL,
+					  "invalid modification flow item");
+	return 0;
+}
+
 /**
  * Validate META item.
  *
@@ -4640,6 +4693,94 @@ struct field_modify_info modify_tcp[] = {
 }
 
 /**
+ * Add tag item to matcher
+ *
+ * @param[in, out] matcher
+ *   Flow matcher.
+ * @param[in, out] key
+ *   Flow matcher value.
+ * @param[in] item
+ *   Flow pattern to translate.
+ */
+static void
+flow_dv_translate_item_tag(void *matcher, void *key,
+			   const struct rte_flow_item *item)
+{
+	void *misc2_m =
+		MLX5_ADDR_OF(fte_match_param, matcher, misc_parameters_2);
+	void *misc2_v =
+		MLX5_ADDR_OF(fte_match_param, key, misc_parameters_2);
+	const struct mlx5_rte_flow_item_tag *tag_v = item->spec;
+	const struct mlx5_rte_flow_item_tag *tag_m = item->mask;
+	enum modify_reg reg = tag_v->id;
+	rte_be32_t value = tag_v->data;
+	rte_be32_t mask = tag_m->data;
+
+	switch (reg) {
+	case REG_A:
+		MLX5_SET(fte_match_set_misc2, misc2_m, metadata_reg_a,
+				rte_be_to_cpu_32(mask));
+		MLX5_SET(fte_match_set_misc2, misc2_v, metadata_reg_a,
+				rte_be_to_cpu_32(value));
+		break;
+	case REG_B:
+		MLX5_SET(fte_match_set_misc2, misc2_m, metadata_reg_b,
+				 rte_be_to_cpu_32(mask));
+		MLX5_SET(fte_match_set_misc2, misc2_v, metadata_reg_b,
+				rte_be_to_cpu_32(value));
+		break;
+	case REG_C_0:
+		MLX5_SET(fte_match_set_misc2, misc2_m, metadata_reg_c_0,
+				 rte_be_to_cpu_32(mask));
+		MLX5_SET(fte_match_set_misc2, misc2_v, metadata_reg_c_0,
+				rte_be_to_cpu_32(value));
+		break;
+	case REG_C_1:
+		MLX5_SET(fte_match_set_misc2, misc2_m, metadata_reg_c_1,
+				 rte_be_to_cpu_32(mask));
+		MLX5_SET(fte_match_set_misc2, misc2_v, metadata_reg_c_1,
+				rte_be_to_cpu_32(value));
+		break;
+	case REG_C_2:
+		MLX5_SET(fte_match_set_misc2, misc2_m, metadata_reg_c_2,
+				 rte_be_to_cpu_32(mask));
+		MLX5_SET(fte_match_set_misc2, misc2_v, metadata_reg_c_2,
+				rte_be_to_cpu_32(value));
+		break;
+	case REG_C_3:
+		MLX5_SET(fte_match_set_misc2, misc2_m, metadata_reg_c_3,
+				 rte_be_to_cpu_32(mask));
+		MLX5_SET(fte_match_set_misc2, misc2_v, metadata_reg_c_3,
+				rte_be_to_cpu_32(value));
+		break;
+	case REG_C_4:
+		MLX5_SET(fte_match_set_misc2, misc2_m, metadata_reg_c_4,
+				 rte_be_to_cpu_32(mask));
+		MLX5_SET(fte_match_set_misc2, misc2_v, metadata_reg_c_4,
+				rte_be_to_cpu_32(value));
+		break;
+	case REG_C_5:
+		MLX5_SET(fte_match_set_misc2, misc2_m, metadata_reg_c_5,
+				 rte_be_to_cpu_32(mask));
+		MLX5_SET(fte_match_set_misc2, misc2_v, metadata_reg_c_5,
+				rte_be_to_cpu_32(value));
+		break;
+	case REG_C_6:
+		MLX5_SET(fte_match_set_misc2, misc2_m, metadata_reg_c_6,
+				 rte_be_to_cpu_32(mask));
+		MLX5_SET(fte_match_set_misc2, misc2_v, metadata_reg_c_6,
+				rte_be_to_cpu_32(value));
+		break;
+	case REG_C_7:
+		MLX5_SET(fte_match_set_misc2, misc2_m, metadata_reg_c_7,
+				 rte_be_to_cpu_32(mask));
+		MLX5_SET(fte_match_set_misc2, misc2_v, metadata_reg_c_7,
+				rte_be_to_cpu_32(value));
+		break;
+	}
+}
+
+/**
  * Add source vport match to the specified matcher.
  *
  * @param[in, out] matcher
@@ -5225,8 +5366,9 @@ struct field_modify_info modify_tcp[] = {
 		struct mlx5_flow_tbl_resource *tbl;
 		uint32_t port_id = 0;
 		struct mlx5_flow_dv_port_id_action_resource port_id_resource;
+		int action_type = actions->type;
 
-		switch (actions->type) {
+		switch (action_type) {
 		case RTE_FLOW_ACTION_TYPE_VOID:
 			break;
 		case RTE_FLOW_ACTION_TYPE_PORT_ID:
@@ -5541,6 +5683,12 @@ struct field_modify_info modify_tcp[] = {
 					MLX5_FLOW_ACTION_INC_TCP_ACK :
 					MLX5_FLOW_ACTION_DEC_TCP_ACK;
 			break;
+		case MLX5_RTE_FLOW_ACTION_TYPE_TAG:
+			if (flow_dv_convert_action_set_reg(&res, actions,
+							   error))
+				return -rte_errno;
+			action_flags |= MLX5_FLOW_ACTION_SET_TAG;
+			break;
 		case RTE_FLOW_ACTION_TYPE_END:
 			actions_end = true;
 			if (action_flags & MLX5_FLOW_MODIFY_HDR_ACTIONS) {
@@ -5565,8 +5713,9 @@ struct field_modify_info modify_tcp[] = {
 	flow->actions = action_flags;
 	for (; items->type != RTE_FLOW_ITEM_TYPE_END; items++) {
 		int tunnel = !!(item_flags & MLX5_FLOW_LAYER_TUNNEL);
+		int item_type = items->type;
 
-		switch (items->type) {
+		switch (item_type) {
 		case RTE_FLOW_ITEM_TYPE_PORT_ID:
 			flow_dv_translate_item_port_id(dev, match_mask,
 						       match_value, items);
@@ -5712,6 +5861,11 @@ struct field_modify_info modify_tcp[] = {
 						      items, tunnel);
 			last_item = MLX5_FLOW_LAYER_ICMP6;
 			break;
+		case MLX5_RTE_FLOW_ITEM_TYPE_TAG:
+			flow_dv_translate_item_tag(match_mask, match_value,
+						   items);
+			last_item = MLX5_FLOW_ITEM_TAG;
+			break;
 		default:
 			break;
 		}
diff --git a/drivers/net/mlx5/mlx5_prm.h b/drivers/net/mlx5/mlx5_prm.h
index d4084db..695578f 100644
--- a/drivers/net/mlx5/mlx5_prm.h
+++ b/drivers/net/mlx5/mlx5_prm.h
@@ -623,7 +623,8 @@ struct mlx5_ifc_fte_match_set_misc2_bits {
 	u8 metadata_reg_c_1[0x20];
 	u8 metadata_reg_c_0[0x20];
 	u8 metadata_reg_a[0x20];
-	u8 reserved_at_1a0[0x60];
+	u8 metadata_reg_b[0x20];
+	u8 reserved_at_1c0[0x40];
 };
 
 struct mlx5_ifc_fte_match_set_misc3_bits {
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 186+ messages in thread

* [dpdk-dev] [PATCH 10/13] net/mlx5: add id generation function
  2019-09-26  6:28 [dpdk-dev] [PATCH 00/13] add hairpin feature Ori Kam
                   ` (8 preceding siblings ...)
  2019-09-26  6:29 ` [dpdk-dev] [PATCH 09/13] net/mlx5: add internal tag item and action Ori Kam
@ 2019-09-26  6:29 ` Ori Kam
  2019-09-26  9:34   ` Slava Ovsiienko
  2019-09-26  6:29 ` [dpdk-dev] [PATCH 11/13] net/mlx5: add default flows for hairpin Ori Kam
                   ` (9 subsequent siblings)
  19 siblings, 1 reply; 186+ messages in thread
From: Ori Kam @ 2019-09-26  6:29 UTC (permalink / raw)
  To: Matan Azrad, Shahaf Shuler, Viacheslav Ovsiienko
  Cc: dev, orika, jingjing.wu, stephen

When splitting flows, for example in hairpin or metering, there is a
need to combine the resulting flows. This is done using an ID. This
commit introduces a simple way to generate such IDs.

A bitmap was not used because its allocation and release are O(n),
while in the chosen approach both allocation and release are O(1).
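
Typical usage of the pool (a sketch based on the functions added
below; error handling abbreviated):

	struct mlx5_flow_id_pool *pool = mlx5_flow_id_pool_alloc();
	uint32_t id;

	if (!pool || mlx5_flow_id_get(pool, &id))
		return -rte_errno; /* allocation failed */
	/* ... use id to correlate the split flows ... */
	mlx5_flow_id_release(pool, id);
	mlx5_flow_id_pool_release(pool);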

Signed-off-by: Ori Kam <orika@mellanox.com>
---
 drivers/net/mlx5/mlx5.c      | 120 ++++++++++++++++++++++++++++++++++++++++++-
 drivers/net/mlx5/mlx5_flow.h |  14 +++++
 2 files changed, 133 insertions(+), 1 deletion(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index ad36743..940503d 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -179,6 +179,124 @@ struct mlx5_dev_spawn_data {
 static LIST_HEAD(, mlx5_ibv_shared) mlx5_ibv_list = LIST_HEAD_INITIALIZER();
 static pthread_mutex_t mlx5_ibv_list_mutex = PTHREAD_MUTEX_INITIALIZER;
 
+#define MLX5_FLOW_MIN_ID_POOL_SIZE 512
+#define MLX5_ID_GENERATION_ARRAY_FACTOR 16
+
+/**
+ * Allocate ID pool structure.
+ *
+ * @return
+ *   Pointer to pool object, NULL value otherwise.
+ */
+struct mlx5_flow_id_pool *
+mlx5_flow_id_pool_alloc(void)
+{
+	struct mlx5_flow_id_pool *pool;
+	void *mem;
+
+	pool = rte_zmalloc("id pool allocation", sizeof(*pool),
+			   RTE_CACHE_LINE_SIZE);
+	if (!pool) {
+		DRV_LOG(ERR, "can't allocate id pool");
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+	mem = rte_zmalloc("", MLX5_FLOW_MIN_ID_POOL_SIZE * sizeof(uint32_t),
+			  RTE_CACHE_LINE_SIZE);
+	if (!mem) {
+		DRV_LOG(ERR, "can't allocate mem for id pool");
+		rte_errno = ENOMEM;
+		goto error;
+	}
+	pool->free_arr = mem;
+	pool->curr = pool->free_arr;
+	pool->last = pool->free_arr + MLX5_FLOW_MIN_ID_POOL_SIZE;
+	pool->base_index = 0;
+	return pool;
+error:
+	rte_free(pool);
+	return NULL;
+}
+
+/**
+ * Release ID pool structure.
+ *
+ * @param[in] pool
+ *   Pointer to flow id pool object to free.
+ */
+void
+mlx5_flow_id_pool_release(struct mlx5_flow_id_pool *pool)
+{
+	rte_free(pool->free_arr);
+	rte_free(pool);
+}
+
+/**
+ * Generate ID.
+ *
+ * @param[in] pool
+ *   Pointer to flow id pool.
+ * @param[out] id
+ *   The generated ID.
+ *
+ * @return
+ *   0 on success, error value otherwise.
+ */
+uint32_t
+mlx5_flow_id_get(struct mlx5_flow_id_pool *pool, uint32_t *id)
+{
+	if (pool->curr == pool->free_arr) {
+		if (pool->base_index == UINT32_MAX) {
+			rte_errno = ENOMEM;
+			DRV_LOG(ERR, "no free id");
+			return -rte_errno;
+		}
+		*id = ++pool->base_index;
+		return 0;
+	}
+	*id = *(--pool->curr);
+	return 0;
+}
+
+/**
+ * Release ID.
+ *
+ * @param[in] pool
+ *   Pointer to flow id pool.
+ * @param[in] id
+ *   The ID to release.
+ *
+ * @return
+ *   0 on success, error value otherwise.
+ */
+uint32_t
+mlx5_flow_id_release(struct mlx5_flow_id_pool *pool, uint32_t id)
+{
+	uint32_t size;
+	uint32_t size2;
+	void *mem;
+
+	if (pool->curr == pool->last) {
+		size = pool->curr - pool->free_arr;
+		size2 = size * MLX5_ID_GENERATION_ARRAY_FACTOR;
+		assert(size2 > size);
+		mem = rte_malloc("", size2 * sizeof(uint32_t), 0);
+		if (!mem) {
+			DRV_LOG(ERR, "can't allocate mem for id pool");
+			rte_errno = ENOMEM;
+			return -rte_errno;
+		}
+		memcpy(mem, pool->free_arr, size * sizeof(uint32_t));
+		rte_free(pool->free_arr);
+		pool->free_arr = mem;
+		pool->curr = pool->free_arr + size;
+		pool->last = pool->free_arr + size2;
+	}
+	*pool->curr = id;
+	pool->curr++;
+	return 0;
+}
+
 /**
  * Initialize the counters management structure.
  *
@@ -329,7 +447,7 @@ struct mlx5_dev_spawn_data {
 	struct mlx5_devx_tis_attr tis_attr = { 0 };
 #endif
 
 	assert(spawn);
 	/* Secondary process should not create the shared context. */
 	assert(rte_eal_process_type() == RTE_PROC_PRIMARY);
 	pthread_mutex_lock(&mlx5_ibv_list_mutex);
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 0148c1b..1b14fb7 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -495,8 +495,22 @@ struct mlx5_flow_driver_ops {
 #define MLX5_CNT_CONTAINER_UNUSED(sh, batch, thread) (&(sh)->cmng.ccont \
 	[(~((sh)->cmng.mhi[batch] >> (thread)) & 0x1) * 2 + (batch)])
 
+/* ID generation structure. */
+struct mlx5_flow_id_pool {
+	uint32_t *free_arr; /**< Pointer to an array of free values. */
+	uint32_t base_index;
+	/**< The next index that can be used without any free elements. */
+	uint32_t *curr; /**< Pointer to the index to pop. */
+	uint32_t *last; /**< Pointer to the last element in the empty array. */
+};
+
 /* mlx5_flow.c */
 
+struct mlx5_flow_id_pool *mlx5_flow_id_pool_alloc(void);
+void mlx5_flow_id_pool_release(struct mlx5_flow_id_pool *pool);
+uint32_t mlx5_flow_id_get(struct mlx5_flow_id_pool *pool, uint32_t *id);
+uint32_t mlx5_flow_id_release(struct mlx5_flow_id_pool *pool,
+			      uint32_t id);
 int mlx5_flow_group_to_table(const struct rte_flow_attr *attributes,
 			     bool external, uint32_t group, uint32_t *table,
 			     struct rte_flow_error *error);
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 186+ messages in thread

* [dpdk-dev] [PATCH 11/13] net/mlx5: add default flows for hairpin
  2019-09-26  6:28 [dpdk-dev] [PATCH 00/13] add hairpin feature Ori Kam
                   ` (9 preceding siblings ...)
  2019-09-26  6:29 ` [dpdk-dev] [PATCH 10/13] net/mlx5: add id generation function Ori Kam
@ 2019-09-26  6:29 ` Ori Kam
  2019-09-26  9:34   ` Slava Ovsiienko
  2019-09-26  6:29 ` [dpdk-dev] [PATCH 12/13] net/mlx5: split hairpin flows Ori Kam
                   ` (8 subsequent siblings)
  19 siblings, 1 reply; 186+ messages in thread
From: Ori Kam @ 2019-09-26  6:29 UTC (permalink / raw)
  To: Matan Azrad, Shahaf Shuler, Viacheslav Ovsiienko
  Cc: dev, orika, jingjing.wu, stephen

When using hairpin, all traffic from Tx hairpin queues should jump
to a dedicated table where matching can be done using registers.
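
In rte_flow terms, the default rule installed for each hairpin Tx queue
is roughly the following sketch (N is a placeholder for the queue index;
the actual rule is built by mlx5_ctrl_flow_source_queue() in this patch):

	/* Match packets coming from hairpin Tx queue N... */
	struct mlx5_rte_flow_item_tx_queue spec = { .queue = N };
	struct mlx5_rte_flow_item_tx_queue mask = { .queue = UINT32_MAX };
	struct rte_flow_item items[] = {
		{ .type = MLX5_RTE_FLOW_ITEM_TYPE_TX_QUEUE,
		  .spec = &spec, .mask = &mask },
		{ .type = RTE_FLOW_ITEM_TYPE_END },
	};
	/* ...and jump them to the dedicated hairpin table. */
	struct rte_flow_action_jump jump = { .group = MLX5_HAIRPIN_TX_TABLE };
	struct rte_flow_action actions[] = {
		{ .type = RTE_FLOW_ACTION_TYPE_JUMP, .conf = &jump },
		{ .type = RTE_FLOW_ACTION_TYPE_END },
	};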

Signed-off-by: Ori Kam <orika@mellanox.com>
---
 drivers/net/mlx5/mlx5.h         |  2 ++
 drivers/net/mlx5/mlx5_flow.c    | 60 +++++++++++++++++++++++++++++++++++++++
 drivers/net/mlx5/mlx5_flow.h    |  9 ++++++
 drivers/net/mlx5/mlx5_flow_dv.c | 63 +++++++++++++++++++++++++++++++++++++++--
 drivers/net/mlx5/mlx5_trigger.c | 18 ++++++++++++
 5 files changed, 150 insertions(+), 2 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 41eb35a..5f1a25d 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -556,6 +556,7 @@ struct mlx5_flow_tbl_resource {
 };
 
 #define MLX5_MAX_TABLES UINT16_MAX
+#define MLX5_HAIRPIN_TX_TABLE (UINT16_MAX - 1)
 #define MLX5_MAX_TABLES_FDB UINT16_MAX
 
 #define MLX5_DBR_PAGE_SIZE 4096 /* Must be >= 512. */
@@ -872,6 +873,7 @@ int mlx5_dev_filter_ctrl(struct rte_eth_dev *dev,
 int mlx5_flow_start(struct rte_eth_dev *dev, struct mlx5_flows *list);
 void mlx5_flow_stop(struct rte_eth_dev *dev, struct mlx5_flows *list);
 int mlx5_flow_verify(struct rte_eth_dev *dev);
+int mlx5_ctrl_flow_source_queue(struct rte_eth_dev *dev, uint32_t queue);
 int mlx5_ctrl_flow_vlan(struct rte_eth_dev *dev,
 			struct rte_flow_item_eth *eth_spec,
 			struct rte_flow_item_eth *eth_mask,
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index 00afc18..33ed204 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -2712,6 +2712,66 @@ struct rte_flow *
 }
 
 /**
+ * Enable default hairpin egress flow.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param queue
+ *   The queue index.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_ctrl_flow_source_queue(struct rte_eth_dev *dev,
+			    uint32_t queue)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	const struct rte_flow_attr attr = {
+		.egress = 1,
+		.priority = 0,
+	};
+	struct mlx5_rte_flow_item_tx_queue queue_spec = {
+		.queue = queue,
+	};
+	struct mlx5_rte_flow_item_tx_queue queue_mask = {
+		.queue = UINT32_MAX,
+	};
+	struct rte_flow_item items[] = {
+		{
+			.type = MLX5_RTE_FLOW_ITEM_TYPE_TX_QUEUE,
+			.spec = &queue_spec,
+			.last = NULL,
+			.mask = &queue_mask,
+		},
+		{
+			.type = RTE_FLOW_ITEM_TYPE_END,
+		},
+	};
+	struct rte_flow_action_jump jump = {
+		.group = MLX5_HAIRPIN_TX_TABLE,
+	};
+	struct rte_flow_action actions[2];
+	struct rte_flow *flow;
+	struct rte_flow_error error;
+
+	actions[0].type = RTE_FLOW_ACTION_TYPE_JUMP;
+	actions[0].conf = &jump;
+	actions[1].type = RTE_FLOW_ACTION_TYPE_END;
+	flow = flow_list_create(dev, &priv->ctrl_flows,
+				&attr, items, actions, false, &error);
+	if (!flow) {
+		DRV_LOG(DEBUG,
+			"Failed to create ctrl flow: rte_errno(%d),"
+			" type(%d), message(%s)",
+			rte_errno, error.type,
+			error.message ? error.message : " (no stated reason)");
+		return -rte_errno;
+	}
+	return 0;
+}
+
+/**
  * Enable a control flow configured from the control plane.
  *
  * @param dev
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 1b14fb7..bb67380 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -44,6 +44,7 @@ enum modify_reg {
 enum mlx5_rte_flow_item_type {
 	MLX5_RTE_FLOW_ITEM_TYPE_END = INT_MIN,
 	MLX5_RTE_FLOW_ITEM_TYPE_TAG,
+	MLX5_RTE_FLOW_ITEM_TYPE_TX_QUEUE,
 };
 
 /* Private rte flow actions. */
@@ -64,6 +65,11 @@ struct mlx5_rte_flow_action_set_tag {
 	rte_be32_t data;
 };
 
+/* Matches on source queue. */
+struct mlx5_rte_flow_item_tx_queue {
+	uint32_t queue;
+};
+
 /* Pattern outer Layer bits. */
 #define MLX5_FLOW_LAYER_OUTER_L2 (1u << 0)
 #define MLX5_FLOW_LAYER_OUTER_L3_IPV4 (1u << 1)
@@ -102,6 +108,9 @@ struct mlx5_rte_flow_action_set_tag {
 #define MLX5_FLOW_LAYER_IPV6_ENCAP (1u << 23)
 #define MLX5_FLOW_LAYER_NVGRE (1u << 24)
 
+/* Queue items. */
+#define MLX5_FLOW_ITEM_TX_QUEUE (1u << 25)
+
 /* Outer Masks. */
 #define MLX5_FLOW_LAYER_OUTER_L3 \
 	(MLX5_FLOW_LAYER_OUTER_L3_IPV4 | MLX5_FLOW_LAYER_OUTER_L3_IPV6)
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index dde0831..2b48680 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -3357,7 +3357,9 @@ struct field_modify_info modify_tcp[] = {
 		return ret;
 	for (; items->type != RTE_FLOW_ITEM_TYPE_END; items++) {
 		int tunnel = !!(item_flags & MLX5_FLOW_LAYER_TUNNEL);
-		switch (items->type) {
+		int type = items->type;
+
+		switch (type) {
 		case RTE_FLOW_ITEM_TYPE_VOID:
 			break;
 		case RTE_FLOW_ITEM_TYPE_PORT_ID:
@@ -3518,6 +3520,9 @@ struct field_modify_info modify_tcp[] = {
 				return ret;
 			last_item = MLX5_FLOW_LAYER_ICMP6;
 			break;
+		case MLX5_RTE_FLOW_ITEM_TYPE_TAG:
+		case MLX5_RTE_FLOW_ITEM_TYPE_TX_QUEUE:
+			break;
 		default:
 			return rte_flow_error_set(error, ENOTSUP,
 						  RTE_FLOW_ERROR_TYPE_ITEM,
@@ -3526,11 +3531,12 @@ struct field_modify_info modify_tcp[] = {
 		item_flags |= last_item;
 	}
 	for (; actions->type != RTE_FLOW_ACTION_TYPE_END; actions++) {
+		int type = actions->type;
 		if (actions_n == MLX5_DV_MAX_NUMBER_OF_ACTIONS)
 			return rte_flow_error_set(error, ENOTSUP,
 						  RTE_FLOW_ERROR_TYPE_ACTION,
 						  actions, "too many actions");
-		switch (actions->type) {
+		switch (type) {
 		case RTE_FLOW_ACTION_TYPE_VOID:
 			break;
 		case RTE_FLOW_ACTION_TYPE_PORT_ID:
@@ -3796,6 +3802,8 @@ struct field_modify_info modify_tcp[] = {
 						MLX5_FLOW_ACTION_INC_TCP_ACK :
 						MLX5_FLOW_ACTION_DEC_TCP_ACK;
 			break;
+		case MLX5_RTE_FLOW_ACTION_TYPE_TAG:
+			break;
 		default:
 			return rte_flow_error_set(error, ENOTSUP,
 						  RTE_FLOW_ERROR_TYPE_ACTION,
@@ -5291,6 +5299,48 @@ struct field_modify_info modify_tcp[] = {
 }
 
 /**
+ * Add Tx queue matcher.
+ *
+ * @param[in] dev
+ *   Pointer to the dev struct.
+ * @param[in, out] matcher
+ *   Flow matcher.
+ * @param[in, out] key
+ *   Flow matcher value.
+ * @param[in] item
+ *   Flow pattern to translate.
+ */
+static void
+flow_dv_translate_item_tx_queue(struct rte_eth_dev *dev,
+				void *matcher, void *key,
+				const struct rte_flow_item *item)
+{
+	const struct mlx5_rte_flow_item_tx_queue *queue_m;
+	const struct mlx5_rte_flow_item_tx_queue *queue_v;
+	void *misc_m =
+		MLX5_ADDR_OF(fte_match_param, matcher, misc_parameters);
+	void *misc_v =
+		MLX5_ADDR_OF(fte_match_param, key, misc_parameters);
+	struct mlx5_txq_ctrl *txq;
+	uint32_t queue;
+
+	queue_m = (const void *)item->mask;
+	if (!queue_m)
+		return;
+	queue_v = (const void *)item->spec;
+	if (!queue_v)
+		return;
+	txq = mlx5_txq_get(dev, queue_v->queue);
+	if (!txq)
+		return;
+	queue = txq->obj->sq->id;
+	MLX5_SET(fte_match_set_misc, misc_m, source_sqn, queue_m->queue);
+	MLX5_SET(fte_match_set_misc, misc_v, source_sqn,
+		 queue & queue_m->queue);
+	mlx5_txq_release(dev, queue_v->queue);
+}
+
+/**
  * Fill the flow with DV spec.
  *
  * @param[in] dev
@@ -5866,6 +5919,12 @@ struct field_modify_info modify_tcp[] = {
 						   items);
 			last_item = MLX5_FLOW_ITEM_TAG;
 			break;
+		case MLX5_RTE_FLOW_ITEM_TYPE_TX_QUEUE:
+			flow_dv_translate_item_tx_queue(dev, match_mask,
+							match_value,
+							items);
+			last_item = MLX5_FLOW_ITEM_TX_QUEUE;
+			break;
 		default:
 			break;
 		}
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index a4fcdb3..a476cd5 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -396,6 +396,24 @@
 	unsigned int j;
 	int ret;
 
+	/*
+	 * The hairpin Txq default flow should be created no matter whether
+	 * the port is in isolated mode. Otherwise all packets to be sent
+	 * would go out directly, without the Tx flow actions, e.g. encapsulation.
+	 */
+	for (i = 0; i != priv->txqs_n; ++i) {
+		struct mlx5_txq_ctrl *txq_ctrl = mlx5_txq_get(dev, i);
+		if (!txq_ctrl)
+			continue;
+		if (txq_ctrl->type == MLX5_TXQ_TYPE_HAIRPIN) {
+			ret = mlx5_ctrl_flow_source_queue(dev, i);
+			if (ret) {
+				mlx5_txq_release(dev, i);
+				goto error;
+			}
+		}
+		mlx5_txq_release(dev, i);
+	}
 	if (priv->config.dv_esw_en && !priv->config.vf)
 		if (!mlx5_flow_create_esw_table_zero_flow(dev))
 			goto error;
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 186+ messages in thread

* [dpdk-dev] [PATCH 12/13] net/mlx5: split hairpin flows
  2019-09-26  6:28 [dpdk-dev] [PATCH 00/13] add hairpin feature Ori Kam
                   ` (10 preceding siblings ...)
  2019-09-26  6:29 ` [dpdk-dev] [PATCH 11/13] net/mlx5: add default flows for hairpin Ori Kam
@ 2019-09-26  6:29 ` Ori Kam
  2019-09-26  9:34   ` Slava Ovsiienko
  2019-09-26  6:29 ` [dpdk-dev] [PATCH 13/13] doc: add hairpin feature Ori Kam
                   ` (7 subsequent siblings)
  19 siblings, 1 reply; 186+ messages in thread
From: Ori Kam @ 2019-09-26  6:29 UTC (permalink / raw)
  To: Matan Azrad, Shahaf Shuler, Viacheslav Ovsiienko
  Cc: dev, orika, jingjing.wu, stephen

Since the encap action is not supported in RX, we need to split the
hairpin flow into RX and TX.
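
As a sketch, an application rule such as:

	pattern: eth / ipv4 / udp
	actions: raw_encap / queue (hairpin)

is split into two internal flows, tied together by a generated flow ID:

	Rx flow (ingress):
	    pattern: eth / ipv4 / udp
	    actions: set_tag(id) / queue (hairpin)

	Tx flow (egress, group MLX5_HAIRPIN_TX_TABLE):
	    pattern: tag(id)
	    actions: raw_encap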

Signed-off-by: Ori Kam <orika@mellanox.com>
---
 drivers/net/mlx5/mlx5.c            |  10 ++
 drivers/net/mlx5/mlx5.h            |  10 ++
 drivers/net/mlx5/mlx5_flow.c       | 281 +++++++++++++++++++++++++++++++++++--
 drivers/net/mlx5/mlx5_flow.h       |  14 +-
 drivers/net/mlx5/mlx5_flow_dv.c    |  10 +-
 drivers/net/mlx5/mlx5_flow_verbs.c |  11 +-
 drivers/net/mlx5/mlx5_rxq.c        |  26 ++++
 drivers/net/mlx5/mlx5_rxtx.h       |   2 +
 8 files changed, 334 insertions(+), 30 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 940503d..2837cba 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -528,6 +528,12 @@ struct mlx5_flow_id_pool *
 		err = ENOMEM;
 		goto error;
 	}
+	sh->flow_id_pool = mlx5_flow_id_pool_alloc();
+	if (!sh->flow_id_pool) {
+		DRV_LOG(ERR, "can't create flow id pool");
+		err = ENOMEM;
+		goto error;
+	}
 #endif /* HAVE_IBV_FLOW_DV_SUPPORT */
 	/*
 	 * Once the device is added to the list of memory event
@@ -567,6 +573,8 @@ struct mlx5_flow_id_pool *
 		claim_zero(mlx5_glue->dealloc_pd(sh->pd));
 	if (sh->ctx)
 		claim_zero(mlx5_glue->close_device(sh->ctx));
+	if (sh->flow_id_pool)
+		mlx5_flow_id_pool_release(sh->flow_id_pool);
 	rte_free(sh);
 	assert(err > 0);
 	rte_errno = err;
@@ -629,6 +637,8 @@ struct mlx5_flow_id_pool *
 		claim_zero(mlx5_devx_cmd_destroy(sh->td));
 	if (sh->ctx)
 		claim_zero(mlx5_glue->close_device(sh->ctx));
+	if (sh->flow_id_pool)
+		mlx5_flow_id_pool_release(sh->flow_id_pool);
 	rte_free(sh);
 exit:
 	pthread_mutex_unlock(&mlx5_ibv_list_mutex);
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 5f1a25d..5336554 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -574,6 +574,15 @@ struct mlx5_devx_dbr_page {
 	uint64_t dbr_bitmap[MLX5_DBR_BITMAP_SIZE];
 };
 
+/* ID generation structure. */
+struct mlx5_flow_id_pool {
+	uint32_t *free_arr; /**< Pointer to an array of free values. */
+	uint32_t base_index;
+	/**< The next index that can be used without any free elements. */
+	uint32_t *curr; /**< Pointer to the index to pop. */
+	uint32_t *last; /**< Pointer to the last element in the empty array. */
+};
+
 /*
  * Shared Infiniband device context for Master/Representors
  * which belong to same IB device with multiple IB ports.
@@ -632,6 +641,7 @@ struct mlx5_ibv_shared {
 	struct mlx5dv_devx_cmd_comp *devx_comp; /* DEVX async comp obj. */
 	struct mlx5_devx_obj *tis; /* TIS object. */
 	struct mlx5_devx_obj *td; /* Transport domain. */
+	struct mlx5_flow_id_pool *flow_id_pool; /* Flow ID pool. */
 	struct mlx5_ibv_shared_port port[]; /* per device port data array. */
 };
 
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index 33ed204..50e1d11 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -606,7 +606,7 @@ uint32_t mlx5_flow_adjust_priority(struct rte_eth_dev *dev, int32_t priority,
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct rte_flow *flow = dev_flow->flow;
-	const int mark = !!(flow->actions &
+	const int mark = !!(dev_flow->actions &
 			    (MLX5_FLOW_ACTION_FLAG | MLX5_FLOW_ACTION_MARK));
 	const int tunnel = !!(dev_flow->layers & MLX5_FLOW_LAYER_TUNNEL);
 	unsigned int i;
@@ -669,7 +669,7 @@ uint32_t mlx5_flow_adjust_priority(struct rte_eth_dev *dev, int32_t priority,
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct rte_flow *flow = dev_flow->flow;
-	const int mark = !!(flow->actions &
+	const int mark = !!(dev_flow->actions &
 			    (MLX5_FLOW_ACTION_FLAG | MLX5_FLOW_ACTION_MARK));
 	const int tunnel = !!(dev_flow->layers & MLX5_FLOW_LAYER_TUNNEL);
 	unsigned int i;
@@ -2419,6 +2419,210 @@ uint32_t mlx5_flow_adjust_priority(struct rte_eth_dev *dev, int32_t priority,
 }
 
 /**
+ * Check if the flow should be split due to hairpin.
+ * The reason for the split is that in current HW we can't
+ * support encap on Rx, so if a flow has encap we move it
+ * to Tx.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param[in] attr
+ *   Flow rule attributes.
+ * @param[in] actions
+ *   Associated actions (list terminated by the END action).
+ *
+ * @return
+ *   > 0 the number of actions and the flow should be split,
+ *   0 when no split required.
+ */
+static int
+flow_check_hairpin_split(struct rte_eth_dev *dev,
+			 const struct rte_flow_attr *attr,
+			 const struct rte_flow_action actions[])
+{
+	int queue_action = 0;
+	int action_n = 0;
+	int encap = 0;
+	const struct rte_flow_action_queue *queue;
+	const struct rte_flow_action_rss *rss;
+	const struct rte_flow_action_raw_encap *raw_encap;
+
+	if (!attr->ingress)
+		return 0;
+	for (; actions->type != RTE_FLOW_ACTION_TYPE_END; actions++) {
+		switch (actions->type) {
+		case RTE_FLOW_ACTION_TYPE_QUEUE:
+			queue = actions->conf;
+			if (mlx5_rxq_get_type(dev, queue->index) !=
+			    MLX5_RXQ_TYPE_HAIRPIN)
+				return 0;
+			queue_action = 1;
+			action_n++;
+			break;
+		case RTE_FLOW_ACTION_TYPE_RSS:
+			rss = actions->conf;
+			if (mlx5_rxq_get_type(dev, rss->queue[0]) !=
+			    MLX5_RXQ_TYPE_HAIRPIN)
+				return 0;
+			queue_action = 1;
+			action_n++;
+			break;
+		case RTE_FLOW_ACTION_TYPE_VXLAN_ENCAP:
+		case RTE_FLOW_ACTION_TYPE_NVGRE_ENCAP:
+			encap = 1;
+			action_n++;
+			break;
+		case RTE_FLOW_ACTION_TYPE_RAW_ENCAP:
+			raw_encap = actions->conf;
+			if (raw_encap->size >
+			    (sizeof(struct rte_flow_item_eth) +
+			     sizeof(struct rte_flow_item_ipv4)))
+				encap = 1;
+			action_n++;
+			break;
+		default:
+			action_n++;
+			break;
+		}
+	}
+	if (encap == 1 && queue_action)
+		return action_n;
+	return 0;
+}
+
+#define MLX5_MAX_SPLIT_ACTIONS 24
+#define MLX5_MAX_SPLIT_ITEMS 24
+
+/**
+ * Split the hairpin flow.
+ * Since HW can't support encap on Rx we move the encap to Tx.
+ * If the count action is after the encap then we also
+ * move the count action. In this case the count will also measure
+ * the outer bytes.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param[in] actions
+ *   Associated actions (list terminated by the END action).
+ * @param[out] actions_rx
+ *   Rx flow actions.
+ * @param[out] actions_tx
+ *   Tx flow actions.
+ * @param[out] pattern_tx
+ *   The pattern items for the Tx flow.
+ * @param[out] flow_id
+ *   The flow ID connected to this flow.
+ *
+ * @return
+ *   0 on success.
+ */
+static int
+flow_hairpin_split(struct rte_eth_dev *dev,
+		   const struct rte_flow_action actions[],
+		   struct rte_flow_action actions_rx[],
+		   struct rte_flow_action actions_tx[],
+		   struct rte_flow_item pattern_tx[],
+		   uint32_t *flow_id)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	const struct rte_flow_action_raw_encap *raw_encap;
+	const struct rte_flow_action_raw_decap *raw_decap;
+	struct mlx5_rte_flow_action_set_tag *set_tag;
+	struct rte_flow_action *tag_action;
+	struct mlx5_rte_flow_item_tag *tag_item;
+	struct rte_flow_item *item;
+	char *addr;
+	struct rte_flow_error error;
+	int encap = 0;
+
+	mlx5_flow_id_get(priv->sh->flow_id_pool, flow_id);
+	for (; actions->type != RTE_FLOW_ACTION_TYPE_END; actions++) {
+		switch (actions->type) {
+		case RTE_FLOW_ACTION_TYPE_VXLAN_ENCAP:
+		case RTE_FLOW_ACTION_TYPE_NVGRE_ENCAP:
+			rte_memcpy(actions_tx, actions,
+			       sizeof(struct rte_flow_action));
+			actions_tx++;
+			break;
+		case RTE_FLOW_ACTION_TYPE_COUNT:
+			if (encap) {
+				rte_memcpy(actions_tx, actions,
+					   sizeof(struct rte_flow_action));
+				actions_tx++;
+			} else {
+				rte_memcpy(actions_rx, actions,
+					   sizeof(struct rte_flow_action));
+				actions_rx++;
+			}
+			break;
+		case RTE_FLOW_ACTION_TYPE_RAW_ENCAP:
+			raw_encap = actions->conf;
+			if (raw_encap->size >
+			    (sizeof(struct rte_flow_item_eth) +
+			     sizeof(struct rte_flow_item_ipv4))) {
+				memcpy(actions_tx, actions,
+				       sizeof(struct rte_flow_action));
+				actions_tx++;
+				encap = 1;
+			} else {
+				rte_memcpy(actions_rx, actions,
+					   sizeof(struct rte_flow_action));
+				actions_rx++;
+			}
+			break;
+		case RTE_FLOW_ACTION_TYPE_RAW_DECAP:
+			raw_decap = actions->conf;
+			if (raw_decap->size <
+			    (sizeof(struct rte_flow_item_eth) +
+			     sizeof(struct rte_flow_item_ipv4))) {
+				memcpy(actions_tx, actions,
+				       sizeof(struct rte_flow_action));
+				actions_tx++;
+			} else {
+				rte_memcpy(actions_rx, actions,
+					   sizeof(struct rte_flow_action));
+				actions_rx++;
+			}
+			break;
+		default:
+			rte_memcpy(actions_rx, actions,
+				   sizeof(struct rte_flow_action));
+			actions_rx++;
+			break;
+		}
+	}
+	/* Add set meta action and end action for the Rx flow. */
+	tag_action = actions_rx;
+	tag_action->type = MLX5_RTE_FLOW_ACTION_TYPE_TAG;
+	actions_rx++;
+	rte_memcpy(actions_rx, actions, sizeof(struct rte_flow_action));
+	actions_rx++;
+	set_tag = (void *)actions_rx;
+	set_tag->id = flow_get_reg_id(dev, MLX5_HAIRPIN_RX, 0, &error);
+	set_tag->data = rte_cpu_to_be_32(*flow_id);
+	tag_action->conf = set_tag;
+	/* Create Tx item list. */
+	rte_memcpy(actions_tx, actions, sizeof(struct rte_flow_action));
+	addr = (void *)&pattern_tx[2];
+	item = pattern_tx;
+	item->type = MLX5_RTE_FLOW_ITEM_TYPE_TAG;
+	tag_item = (void *)addr;
+	tag_item->data = rte_cpu_to_be_32(*flow_id);
+	tag_item->id = flow_get_reg_id(dev, MLX5_HAIRPIN_TX, 0, &error);
+	item->spec = tag_item;
+	addr += sizeof(struct mlx5_rte_flow_item_tag);
+	tag_item = (void *)addr;
+	tag_item->data = UINT32_MAX;
+	tag_item->id = UINT16_MAX;
+	item->mask = tag_item;
+	addr += sizeof(struct mlx5_rte_flow_item_tag);
+	item->last = NULL;
+	item++;
+	item->type = RTE_FLOW_ITEM_TYPE_END;
+	return 0;
+}
+
+/**
  * Create a flow and add it to @p list.
  *
  * @param dev
@@ -2446,6 +2650,7 @@ uint32_t mlx5_flow_adjust_priority(struct rte_eth_dev *dev, int32_t priority,
 		 const struct rte_flow_action actions[],
 		 bool external, struct rte_flow_error *error)
 {
+	struct mlx5_priv *priv = dev->data->dev_private;
 	struct rte_flow *flow = NULL;
 	struct mlx5_flow *dev_flow;
 	const struct rte_flow_action_rss *rss;
@@ -2453,16 +2658,44 @@ uint32_t mlx5_flow_adjust_priority(struct rte_eth_dev *dev, int32_t priority,
 		struct rte_flow_expand_rss buf;
 		uint8_t buffer[2048];
 	} expand_buffer;
+	union {
+		struct rte_flow_action actions[MLX5_MAX_SPLIT_ACTIONS];
+		uint8_t buffer[2048];
+	} actions_rx;
+	union {
+		struct rte_flow_action actions[MLX5_MAX_SPLIT_ACTIONS];
+		uint8_t buffer[2048];
+	} actions_hairpin_tx;
+	union {
+		struct rte_flow_item items[MLX5_MAX_SPLIT_ITEMS];
+		uint8_t buffer[2048];
+	} items_tx;
 	struct rte_flow_expand_rss *buf = &expand_buffer.buf;
+	const struct rte_flow_action *p_actions_rx = actions;
 	int ret;
 	uint32_t i;
 	uint32_t flow_size;
+	int hairpin_flow = 0;
+	uint32_t hairpin_id = 0;
+	struct rte_flow_attr attr_tx = { .priority = 0 };
 
-	ret = flow_drv_validate(dev, attr, items, actions, external, error);
+	hairpin_flow = flow_check_hairpin_split(dev, attr, actions);
+	if (hairpin_flow > 0) {
+		if (hairpin_flow > MLX5_MAX_SPLIT_ACTIONS) {
+			rte_errno = EINVAL;
+			return NULL;
+		}
+		flow_hairpin_split(dev, actions, actions_rx.actions,
+				   actions_hairpin_tx.actions, items_tx.items,
+				   &hairpin_id);
+		p_actions_rx = actions_rx.actions;
+	}
+	ret = flow_drv_validate(dev, attr, items, p_actions_rx, external,
+				error);
 	if (ret < 0)
-		return NULL;
+		goto error_before_flow;
 	flow_size = sizeof(struct rte_flow);
-	rss = flow_get_rss_action(actions);
+	rss = flow_get_rss_action(p_actions_rx);
 	if (rss)
 		flow_size += RTE_ALIGN_CEIL(rss->queue_num * sizeof(uint16_t),
 					    sizeof(void *));
@@ -2471,11 +2704,13 @@ uint32_t mlx5_flow_adjust_priority(struct rte_eth_dev *dev, int32_t priority,
 	flow = rte_calloc(__func__, 1, flow_size, 0);
 	if (!flow) {
 		rte_errno = ENOMEM;
-		return NULL;
+		goto error_before_flow;
 	}
 	flow->drv_type = flow_get_drv_type(dev, attr);
 	flow->ingress = attr->ingress;
 	flow->transfer = attr->transfer;
+	if (hairpin_id != 0)
+		flow->hairpin_flow_id = hairpin_id;
 	assert(flow->drv_type > MLX5_FLOW_TYPE_MIN &&
 	       flow->drv_type < MLX5_FLOW_TYPE_MAX);
 	flow->queue = (void *)(flow + 1);
@@ -2496,7 +2731,7 @@ uint32_t mlx5_flow_adjust_priority(struct rte_eth_dev *dev, int32_t priority,
 	}
 	for (i = 0; i < buf->entries; ++i) {
 		dev_flow = flow_drv_prepare(flow, attr, buf->entry[i].pattern,
-					    actions, error);
+					    p_actions_rx, error);
 		if (!dev_flow)
 			goto error;
 		dev_flow->flow = flow;
@@ -2504,7 +2739,24 @@ uint32_t mlx5_flow_adjust_priority(struct rte_eth_dev *dev, int32_t priority,
 		LIST_INSERT_HEAD(&flow->dev_flows, dev_flow, next);
 		ret = flow_drv_translate(dev, dev_flow, attr,
 					 buf->entry[i].pattern,
-					 actions, error);
+					 p_actions_rx, error);
+		if (ret < 0)
+			goto error;
+	}
+	/* Create the tx flow. */
+	if (hairpin_flow) {
+		attr_tx.group = MLX5_HAIRPIN_TX_TABLE;
+		attr_tx.ingress = 0;
+		attr_tx.egress = 1;
+		dev_flow = flow_drv_prepare(flow, &attr_tx, items_tx.items,
+					    actions_hairpin_tx.actions, error);
+		if (!dev_flow)
+			goto error;
+		dev_flow->flow = flow;
+		LIST_INSERT_HEAD(&flow->dev_flows, dev_flow, next);
+		ret = flow_drv_translate(dev, dev_flow, &attr_tx,
+					 items_tx.items,
+					 actions_hairpin_tx.actions, error);
 		if (ret < 0)
 			goto error;
 	}
@@ -2516,8 +2768,16 @@ uint32_t mlx5_flow_adjust_priority(struct rte_eth_dev *dev, int32_t priority,
 	TAILQ_INSERT_TAIL(list, flow, next);
 	flow_rxq_flags_set(dev, flow);
 	return flow;
+error_before_flow:
+	if (hairpin_id)
+		mlx5_flow_id_release(priv->sh->flow_id_pool,
+				     hairpin_id);
+	return NULL;
 error:
 	ret = rte_errno; /* Save rte_errno before cleanup. */
+	if (flow->hairpin_flow_id)
+		mlx5_flow_id_release(priv->sh->flow_id_pool,
+				     flow->hairpin_flow_id);
 	assert(flow);
 	flow_drv_destroy(dev, flow);
 	rte_free(flow);
@@ -2607,12 +2867,17 @@ struct rte_flow *
 flow_list_destroy(struct rte_eth_dev *dev, struct mlx5_flows *list,
 		  struct rte_flow *flow)
 {
+	struct mlx5_priv *priv = dev->data->dev_private;
+
 	/*
 	 * Update RX queue flags only if port is started, otherwise it is
 	 * already clean.
 	 */
 	if (dev->data->dev_started)
 		flow_rxq_flags_trim(dev, flow);
+	if (flow->hairpin_flow_id)
+		mlx5_flow_id_release(priv->sh->flow_id_pool,
+				     flow->hairpin_flow_id);
 	flow_drv_destroy(dev, flow);
 	TAILQ_REMOVE(list, flow, next);
 	rte_free(flow->fdir);
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index bb67380..90a289e 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -434,6 +434,8 @@ struct mlx5_flow {
 	struct rte_flow *flow; /**< Pointer to the main flow. */
 	uint64_t layers;
 	/**< Bit-fields of present layers, see MLX5_FLOW_LAYER_*. */
+	uint64_t actions;
+	/**< Bit-fields of detected actions, see MLX5_FLOW_ACTION_*. */
 	union {
 #ifdef HAVE_IBV_FLOW_DV_SUPPORT
 		struct mlx5_flow_dv dv;
@@ -455,12 +457,11 @@ struct rte_flow {
 	uint16_t (*queue)[]; /**< Destination queues to redirect traffic to. */
 	LIST_HEAD(dev_flows, mlx5_flow) dev_flows;
 	/**< Device flows that are part of the flow. */
-	uint64_t actions;
-	/**< Bit-fields of detected actions, see MLX5_FLOW_ACTION_*. */
 	struct mlx5_fdir *fdir; /**< Pointer to associated FDIR if any. */
 	uint8_t ingress; /**< 1 if the flow is ingress. */
 	uint32_t group; /**< The group index. */
 	uint8_t transfer; /**< 1 if the flow is E-Switch flow. */
+	uint32_t hairpin_flow_id; /**< The flow id used for hairpin. */
 };
 
 typedef int (*mlx5_flow_validate_t)(struct rte_eth_dev *dev,
@@ -504,15 +505,6 @@ struct mlx5_flow_driver_ops {
 #define MLX5_CNT_CONTAINER_UNUSED(sh, batch, thread) (&(sh)->cmng.ccont \
 	[(~((sh)->cmng.mhi[batch] >> (thread)) & 0x1) * 2 + (batch)])
 
-/* ID generation structure. */
-struct mlx5_flow_id_pool {
-	uint32_t *free_arr; /**< Pointer to an array of free values. */
-	uint32_t base_index;
-	/**< The next index that can be used without any free elements. */
-	uint32_t *curr; /**< Pointer to the index to pop. */
-	uint32_t *last; /**< Pointer to the last element in the empty array. */
-};
-
 /* mlx5_flow.c */
 
 struct mlx5_flow_id_pool *mlx5_flow_id_pool_alloc(void);
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index 2b48680..6828bd1 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -5763,7 +5763,7 @@ struct field_modify_info modify_tcp[] = {
 			modify_action_position = actions_n++;
 	}
 	dev_flow->dv.actions_n = actions_n;
-	flow->actions = action_flags;
+	dev_flow->actions = action_flags;
 	for (; items->type != RTE_FLOW_ITEM_TYPE_END; items++) {
 		int tunnel = !!(item_flags & MLX5_FLOW_LAYER_TUNNEL);
 		int item_type = items->type;
@@ -5985,7 +5985,7 @@ struct field_modify_info modify_tcp[] = {
 	LIST_FOREACH(dev_flow, &flow->dev_flows, next) {
 		dv = &dev_flow->dv;
 		n = dv->actions_n;
-		if (flow->actions & MLX5_FLOW_ACTION_DROP) {
+		if (dev_flow->actions & MLX5_FLOW_ACTION_DROP) {
 			if (flow->transfer) {
 				dv->actions[n++] = priv->sh->esw_drop_action;
 			} else {
@@ -6000,7 +6000,7 @@ struct field_modify_info modify_tcp[] = {
 				}
 				dv->actions[n++] = dv->hrxq->action;
 			}
-		} else if (flow->actions &
+		} else if (dev_flow->actions &
 			   (MLX5_FLOW_ACTION_QUEUE | MLX5_FLOW_ACTION_RSS)) {
 			struct mlx5_hrxq *hrxq;
 
@@ -6056,7 +6056,7 @@ struct field_modify_info modify_tcp[] = {
 	LIST_FOREACH(dev_flow, &flow->dev_flows, next) {
 		struct mlx5_flow_dv *dv = &dev_flow->dv;
 		if (dv->hrxq) {
-			if (flow->actions & MLX5_FLOW_ACTION_DROP)
+			if (dev_flow->actions & MLX5_FLOW_ACTION_DROP)
 				mlx5_hrxq_drop_release(dev);
 			else
 				mlx5_hrxq_release(dev, dv->hrxq);
@@ -6290,7 +6290,7 @@ struct field_modify_info modify_tcp[] = {
 			dv->flow = NULL;
 		}
 		if (dv->hrxq) {
-			if (flow->actions & MLX5_FLOW_ACTION_DROP)
+			if (dev_flow->actions & MLX5_FLOW_ACTION_DROP)
 				mlx5_hrxq_drop_release(dev);
 			else
 				mlx5_hrxq_release(dev, dv->hrxq);
diff --git a/drivers/net/mlx5/mlx5_flow_verbs.c b/drivers/net/mlx5/mlx5_flow_verbs.c
index 23110f2..fd27f6c 100644
--- a/drivers/net/mlx5/mlx5_flow_verbs.c
+++ b/drivers/net/mlx5/mlx5_flow_verbs.c
@@ -191,7 +191,7 @@
 {
 #if defined(HAVE_IBV_DEVICE_COUNTERS_SET_V42) || \
 	defined(HAVE_IBV_DEVICE_COUNTERS_SET_V45)
-	if (flow->actions & MLX5_FLOW_ACTION_COUNT) {
+	if (flow->counter->cs) {
 		struct rte_flow_query_count *qc = data;
 		uint64_t counters[2] = {0, 0};
 #if defined(HAVE_IBV_DEVICE_COUNTERS_SET_V42)
@@ -1410,7 +1410,6 @@
 		     const struct rte_flow_action actions[],
 		     struct rte_flow_error *error)
 {
-	struct rte_flow *flow = dev_flow->flow;
 	uint64_t item_flags = 0;
 	uint64_t action_flags = 0;
 	uint64_t priority = attr->priority;
@@ -1460,7 +1459,7 @@
 						  "action not supported");
 		}
 	}
-	flow->actions = action_flags;
+	dev_flow->actions = action_flags;
 	for (; items->type != RTE_FLOW_ITEM_TYPE_END; items++) {
 		int tunnel = !!(item_flags & MLX5_FLOW_LAYER_TUNNEL);
 
@@ -1592,7 +1591,7 @@
 			verbs->flow = NULL;
 		}
 		if (verbs->hrxq) {
-			if (flow->actions & MLX5_FLOW_ACTION_DROP)
+			if (dev_flow->actions & MLX5_FLOW_ACTION_DROP)
 				mlx5_hrxq_drop_release(dev);
 			else
 				mlx5_hrxq_release(dev, verbs->hrxq);
@@ -1656,7 +1655,7 @@
 
 	LIST_FOREACH(dev_flow, &flow->dev_flows, next) {
 		verbs = &dev_flow->verbs;
-		if (flow->actions & MLX5_FLOW_ACTION_DROP) {
+		if (dev_flow->actions & MLX5_FLOW_ACTION_DROP) {
 			verbs->hrxq = mlx5_hrxq_drop_new(dev);
 			if (!verbs->hrxq) {
 				rte_flow_error_set
@@ -1717,7 +1716,7 @@
 	LIST_FOREACH(dev_flow, &flow->dev_flows, next) {
 		verbs = &dev_flow->verbs;
 		if (verbs->hrxq) {
-			if (flow->actions & MLX5_FLOW_ACTION_DROP)
+			if (dev_flow->actions & MLX5_FLOW_ACTION_DROP)
 				mlx5_hrxq_drop_release(dev);
 			else
 				mlx5_hrxq_release(dev, verbs->hrxq);
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index bf39112..e51a0c6 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -2113,6 +2113,32 @@ struct mlx5_rxq_ctrl *
 }
 
 /**
+ * Get a Rx queue type.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param idx
+ *   Rx queue index.
+ *
+ * @return
+ *   The Rx queue type.
+ */
+enum mlx5_rxq_type
+mlx5_rxq_get_type(struct rte_eth_dev *dev, uint16_t idx)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_rxq_ctrl *rxq_ctrl = NULL;
+
+	if ((*priv->rxqs)[idx]) {
+		rxq_ctrl = container_of((*priv->rxqs)[idx],
+					struct mlx5_rxq_ctrl,
+					rxq);
+		return rxq_ctrl->type;
+	}
+	return MLX5_RXQ_TYPE_UNDEFINED;
+}
+
+/**
  * Create an indirection table.
  *
  * @param dev
diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index 8fa22e5..4707b29 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -166,6 +166,7 @@ enum mlx5_rxq_obj_type {
 enum mlx5_rxq_type {
 	MLX5_RXQ_TYPE_STANDARD, /* Standard Rx queue. */
 	MLX5_RXQ_TYPE_HAIRPIN, /* Hairpin Rx queue. */
+	MLX5_RXQ_TYPE_UNDEFINED, /* Undefined queue type. */
 };
 
 /* Verbs/DevX Rx queue elements. */
@@ -408,6 +409,7 @@ struct mlx5_hrxq *mlx5_hrxq_get(struct rte_eth_dev *dev,
 				const uint16_t *queues, uint32_t queues_n);
 int mlx5_hrxq_release(struct rte_eth_dev *dev, struct mlx5_hrxq *hxrq);
 int mlx5_hrxq_verify(struct rte_eth_dev *dev);
+enum mlx5_rxq_type mlx5_rxq_get_type(struct rte_eth_dev *dev, uint16_t idx);
 struct mlx5_hrxq *mlx5_hrxq_drop_new(struct rte_eth_dev *dev);
 void mlx5_hrxq_drop_release(struct rte_eth_dev *dev);
 uint64_t mlx5_get_rx_port_offloads(void);
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 186+ messages in thread

* [dpdk-dev] [PATCH 13/13] doc: add hairpin feature
  2019-09-26  6:28 [dpdk-dev] [PATCH 00/13] add hairpin feature Ori Kam
                   ` (11 preceding siblings ...)
  2019-09-26  6:29 ` [dpdk-dev] [PATCH 12/13] net/mlx5: split hairpin flows Ori Kam
@ 2019-09-26  6:29 ` Ori Kam
  2019-09-26  9:34   ` Slava Ovsiienko
  2019-09-26 12:32 ` [dpdk-dev] [PATCH 00/13] " Andrew Rybchenko
                   ` (6 subsequent siblings)
  19 siblings, 1 reply; 186+ messages in thread
From: Ori Kam @ 2019-09-26  6:29 UTC (permalink / raw)
  To: John McNamara, Marko Kovacevic; +Cc: dev, orika, jingjing.wu, stephen

This commit adds the hairpin feature to the release notes.

Signed-off-by: Ori Kam <orika@mellanox.com>
---
 doc/guides/rel_notes/release_19_11.rst | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/doc/guides/rel_notes/release_19_11.rst b/doc/guides/rel_notes/release_19_11.rst
index c8d97f1..a880655 100644
--- a/doc/guides/rel_notes/release_19_11.rst
+++ b/doc/guides/rel_notes/release_19_11.rst
@@ -56,6 +56,10 @@ New Features
      Also, make sure to start the actual text at the margin.
      =========================================================
 
+* **Added hairpin queue.**
+
+  On supported NICs, we can now set up hairpin queues which will offload
+  packets from the wire back to the wire.
 
 Removed Items
 -------------
@@ -234,4 +238,5 @@ Tested Platforms
   * Added support for VLAN push flow offload command.
   * Added support for VLAN set PCP offload command.
   * Added support for VLAN set VID offload command.
+  * Added hairpin support.
 
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [dpdk-dev] [PATCH 02/13] net/mlx5: query hca hairpin capabilities
  2019-09-26  6:28 ` [dpdk-dev] [PATCH 02/13] net/mlx5: query hca hairpin capabilities Ori Kam
@ 2019-09-26  9:31   ` Slava Ovsiienko
  0 siblings, 0 replies; 186+ messages in thread
From: Slava Ovsiienko @ 2019-09-26  9:31 UTC (permalink / raw)
  To: Ori Kam, Matan Azrad, Shahaf Shuler; +Cc: dev, Ori Kam, jingjing.wu, stephen

> -----Original Message-----
> From: Ori Kam <orika@mellanox.com>
> Sent: Thursday, September 26, 2019 9:29
> To: Matan Azrad <matan@mellanox.com>; Shahaf Shuler
> <shahafs@mellanox.com>; Slava Ovsiienko <viacheslavo@mellanox.com>
> Cc: dev@dpdk.org; Ori Kam <orika@mellanox.com>; jingjing.wu@intel.com;
> stephen@networkplumber.org
> Subject: [PATCH 02/13] net/mlx5: query hca hairpin capabilities
> 
> This commit queries and stores the hairpin capabilities from the device.
> 
> Those capabilities will be used when creating the hairpin queue.
> 
> Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>

> ---
>  drivers/net/mlx5/mlx5.h           | 4 ++++
>  drivers/net/mlx5/mlx5_devx_cmds.c | 7 +++++++
>  2 files changed, 11 insertions(+)
> 
> diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
> index d4d2ca8..cd896c8 100644
> --- a/drivers/net/mlx5/mlx5.h
> +++ b/drivers/net/mlx5/mlx5.h
> @@ -184,6 +184,10 @@ struct mlx5_hca_attr {
>  	uint32_t tunnel_lro_vxlan:1;
>  	uint32_t lro_max_msg_sz_mode:2;
>  	uint32_t lro_timer_supported_periods[MLX5_LRO_NUM_SUPP_PERIODS];
> +	uint32_t hairpin:1;
> +	uint32_t log_max_hairpin_queues:5;
> +	uint32_t log_max_hairpin_wq_data_sz:5;
> +	uint32_t log_max_hairpin_num_packets:5;
>  };
> 
>  /* Flow list . */
> diff --git a/drivers/net/mlx5/mlx5_devx_cmds.c b/drivers/net/mlx5/mlx5_devx_cmds.c
> index acfe1de..b072c37 100644
> --- a/drivers/net/mlx5/mlx5_devx_cmds.c
> +++ b/drivers/net/mlx5/mlx5_devx_cmds.c
> @@ -327,6 +327,13 @@ struct mlx5_devx_obj *
>  	attr->flow_counters_dump = MLX5_GET(cmd_hca_cap, hcattr,
>  					    flow_counters_dump);
>  	attr->eswitch_manager = MLX5_GET(cmd_hca_cap, hcattr, eswitch_manager);
> +	attr->hairpin = MLX5_GET(cmd_hca_cap, hcattr, hairpin);
> +	attr->log_max_hairpin_queues = MLX5_GET(cmd_hca_cap, hcattr,
> +						log_max_hairpin_queues);
> +	attr->log_max_hairpin_wq_data_sz = MLX5_GET(cmd_hca_cap, hcattr,
> +						log_max_hairpin_wq_data_sz);
> +	attr->log_max_hairpin_num_packets = MLX5_GET
> +		(cmd_hca_cap, hcattr, log_min_hairpin_wq_data_sz);
>  	attr->eth_net_offloads = MLX5_GET(cmd_hca_cap, hcattr,
>  					  eth_net_offloads);
>  	attr->eth_virt = MLX5_GET(cmd_hca_cap, hcattr, eth_virt);
> --
> 1.8.3.1


^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [dpdk-dev] [PATCH 03/13] net/mlx5: support Rx hairpin queues
  2019-09-26  6:28 ` [dpdk-dev] [PATCH 03/13] net/mlx5: support Rx hairpin queues Ori Kam
@ 2019-09-26  9:32   ` Slava Ovsiienko
  0 siblings, 0 replies; 186+ messages in thread
From: Slava Ovsiienko @ 2019-09-26  9:32 UTC (permalink / raw)
  To: Ori Kam, Matan Azrad, Shahaf Shuler; +Cc: dev, Ori Kam, jingjing.wu, stephen

> -----Original Message-----
> From: Ori Kam <orika@mellanox.com>
> Sent: Thursday, September 26, 2019 9:29
> To: Matan Azrad <matan@mellanox.com>; Shahaf Shuler
> <shahafs@mellanox.com>; Slava Ovsiienko <viacheslavo@mellanox.com>
> Cc: dev@dpdk.org; Ori Kam <orika@mellanox.com>; jingjing.wu@intel.com;
> stephen@networkplumber.org
> Subject: [PATCH 03/13] net/mlx5: support Rx hairpin queues
> 
> This commit adds the support for creating Rx hairpin queues.
> A hairpin queue is a queue that is created using DevX and used only by the
> HW. As a result, the data part of the RQ is not used.
> 
> Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>

> ---
>  drivers/net/mlx5/mlx5.c         |   2 +
>  drivers/net/mlx5/mlx5_rxq.c     | 286 ++++++++++++++++++++++++++++++++++++----
>  drivers/net/mlx5/mlx5_rxtx.h    |  17 +++
>  drivers/net/mlx5/mlx5_trigger.c |   7 +
>  4 files changed, 288 insertions(+), 24 deletions(-)
> 
> diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
> index be01db9..81894fb 100644
> --- a/drivers/net/mlx5/mlx5.c
> +++ b/drivers/net/mlx5/mlx5.c
> @@ -974,6 +974,7 @@ struct mlx5_dev_spawn_data {
>  	.dev_supported_ptypes_get = mlx5_dev_supported_ptypes_get,
>  	.vlan_filter_set = mlx5_vlan_filter_set,
>  	.rx_queue_setup = mlx5_rx_queue_setup,
> +	.rx_hairpin_queue_setup = mlx5_rx_hairpin_queue_setup,
>  	.tx_queue_setup = mlx5_tx_queue_setup,
>  	.rx_queue_release = mlx5_rx_queue_release,
>  	.tx_queue_release = mlx5_tx_queue_release,
> @@ -1040,6 +1041,7 @@ struct mlx5_dev_spawn_data {
>  	.dev_supported_ptypes_get = mlx5_dev_supported_ptypes_get,
>  	.vlan_filter_set = mlx5_vlan_filter_set,
>  	.rx_queue_setup = mlx5_rx_queue_setup,
> +	.rx_hairpin_queue_setup = mlx5_rx_hairpin_queue_setup,
>  	.tx_queue_setup = mlx5_tx_queue_setup,
>  	.rx_queue_release = mlx5_rx_queue_release,
>  	.tx_queue_release = mlx5_tx_queue_release,
> diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
> index a1fdeef..a673da9 100644
> --- a/drivers/net/mlx5/mlx5_rxq.c
> +++ b/drivers/net/mlx5/mlx5_rxq.c
> @@ -106,21 +106,25 @@
>  	struct mlx5_priv *priv = dev->data->dev_private;
>  	uint16_t i;
>  	uint16_t n = 0;
> +	uint16_t n_ibv = 0;
> 
>  	if (mlx5_check_mprq_support(dev) < 0)
>  		return 0;
>  	/* All the configured queues should be enabled. */
>  	for (i = 0; i < priv->rxqs_n; ++i) {
>  		struct mlx5_rxq_data *rxq = (*priv->rxqs)[i];
> +		struct mlx5_rxq_ctrl *rxq_ctrl = container_of
> +			(rxq, struct mlx5_rxq_ctrl, rxq);
> 
> -		if (!rxq)
> +		if (rxq == NULL || rxq_ctrl->type != MLX5_RXQ_TYPE_STANDARD)
>  			continue;
> +		n_ibv++;
>  		if (mlx5_rxq_mprq_enabled(rxq))
>  			++n;
>  	}
>  	/* Multi-Packet RQ can't be partially configured. */
> -	assert(n == 0 || n == priv->rxqs_n);
> -	return n == priv->rxqs_n;
> +	assert(n == 0 || n == n_ibv);
> +	return n == n_ibv;
>  }
> 
>  /**
> @@ -427,6 +431,7 @@
>  }
> 
>  /**
> + * Rx queue presetup checks.
>   *
>   * @param dev
>   *   Pointer to Ethernet device structure.
> @@ -434,25 +439,14 @@
>   *   RX queue index.
>   * @param desc
>   *   Number of descriptors to configure in queue.
> - * @param socket
> - *   NUMA socket on which memory must be allocated.
> - * @param[in] conf
> - *   Thresholds parameters.
> - * @param mp
> - *   Memory pool for buffer allocations.
>   *
>   * @return
>   *   0 on success, a negative errno value otherwise and rte_errno is set.
>   */
> -int
> -mlx5_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
> -		    unsigned int socket, const struct rte_eth_rxconf *conf,
> -		    struct rte_mempool *mp)
> +static int
> +mlx5_rx_queue_pre_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc)
>  {
>  	struct mlx5_priv *priv = dev->data->dev_private;
> -	struct mlx5_rxq_data *rxq = (*priv->rxqs)[idx];
> -	struct mlx5_rxq_ctrl *rxq_ctrl =
> -		container_of(rxq, struct mlx5_rxq_ctrl, rxq);
> 
>  	if (!rte_is_power_of_2(desc)) {
>  		desc = 1 << log2above(desc);
> @@ -476,6 +470,41 @@
>  		return -rte_errno;
>  	}
>  	mlx5_rxq_release(dev, idx);
> +	return 0;
> +}
> +
> +/**
> + *
> + * @param dev
> + *   Pointer to Ethernet device structure.
> + * @param idx
> + *   RX queue index.
> + * @param desc
> + *   Number of descriptors to configure in queue.
> + * @param socket
> + *   NUMA socket on which memory must be allocated.
> + * @param[in] conf
> + *   Thresholds parameters.
> + * @param mp
> + *   Memory pool for buffer allocations.
> + *
> + * @return
> + *   0 on success, a negative errno value otherwise and rte_errno is set.
> + */
> +int
> +mlx5_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
> +		    unsigned int socket, const struct rte_eth_rxconf *conf,
> +		    struct rte_mempool *mp)
> +{
> +	struct mlx5_priv *priv = dev->data->dev_private;
> +	struct mlx5_rxq_data *rxq = (*priv->rxqs)[idx];
> +	struct mlx5_rxq_ctrl *rxq_ctrl =
> +		container_of(rxq, struct mlx5_rxq_ctrl, rxq);
> +	int res;
> +
> +	res = mlx5_rx_queue_pre_setup(dev, idx, desc);
> +	if (res)
> +		return res;
>  	rxq_ctrl = mlx5_rxq_new(dev, idx, desc, socket, conf, mp);
>  	if (!rxq_ctrl) {
>  		DRV_LOG(ERR, "port %u unable to allocate queue index %u",
> @@ -490,6 +519,62 @@
>  }
> 
>  /**
> + *
> + * @param dev
> + *   Pointer to Ethernet device structure.
> + * @param idx
> + *   RX queue index.
> + * @param desc
> + *   Number of descriptors to configure in queue.
> + * @param socket
> + *   NUMA socket on which memory must be allocated.
> + * @param[in] conf
> + *   Thresholds parameters.
> + * @param hairpin_conf
> + *   Hairpin configuration parameters.
> + *
> + * @return
> + *   0 on success, a negative errno value otherwise and rte_errno is set.
> + */
> +int
> +mlx5_rx_hairpin_queue_setup(struct rte_eth_dev *dev, uint16_t idx,
> +			    uint16_t desc, unsigned int socket,
> +			    const struct rte_eth_rxconf *conf,
> +			    const struct rte_eth_hairpin_conf *hairpin_conf)
> +{
> +	struct mlx5_priv *priv = dev->data->dev_private;
> +	struct mlx5_rxq_data *rxq = (*priv->rxqs)[idx];
> +	struct mlx5_rxq_ctrl *rxq_ctrl =
> +		container_of(rxq, struct mlx5_rxq_ctrl, rxq);
> +	int res;
> +
> +	res = mlx5_rx_queue_pre_setup(dev, idx, desc);
> +	if (res)
> +		return res;
> +	if (hairpin_conf->peer_n != 1 ||
> +	    hairpin_conf->peers[0].port != dev->data->port_id ||
> +	    hairpin_conf->peers[0].queue >= priv->txqs_n) {
> +		DRV_LOG(ERR, "port %u unable to setup hairpin queue index %u "
> +			" invalid hairpind configuration", dev->data->port_id,
> +			idx);
> +		rte_errno = EINVAL;
> +		return -rte_errno;
> +	}
> +	rxq_ctrl = mlx5_rxq_hairpin_new(dev, idx, desc, socket, conf,
> +					hairpin_conf);
> +	if (!rxq_ctrl) {
> +		DRV_LOG(ERR, "port %u unable to allocate queue index %u",
> +			dev->data->port_id, idx);
> +		rte_errno = ENOMEM;
> +		return -rte_errno;
> +	}
> +	DRV_LOG(DEBUG, "port %u adding Rx queue %u to list",
> +		dev->data->port_id, idx);
> +	(*priv->rxqs)[idx] = &rxq_ctrl->rxq;
> +	return 0;
> +}
> +
> +/**
>   * DPDK callback to release a RX queue.
>   *
>   * @param dpdk_rxq
> @@ -561,6 +646,24 @@
>  }
> 
>  /**
> + * Release an Rx hairpin related resources.
> + *
> + * @param rxq_obj
> + *   Hairpin Rx queue object.
> + */
> +static void
> +rxq_obj_hairpin_release(struct mlx5_rxq_obj *rxq_obj)
> +{
> +	struct mlx5_devx_modify_rq_attr rq_attr = { 0 };
> +
> +	assert(rxq_obj);
> +	rq_attr.state = MLX5_RQC_STATE_RST;
> +	rq_attr.rq_state = MLX5_RQC_STATE_RDY;
> +	mlx5_devx_cmd_modify_rq(rxq_obj->rq, &rq_attr);
> +	claim_zero(mlx5_devx_cmd_destroy(rxq_obj->rq));
> +}
> +
> +/**
>   * Release an Rx verbs/DevX queue object.
>   *
>   * @param rxq_obj
> @@ -577,14 +680,22 @@
>  		assert(rxq_obj->wq);
>  	assert(rxq_obj->cq);
>  	if (rte_atomic32_dec_and_test(&rxq_obj->refcnt)) {
> -		rxq_free_elts(rxq_obj->rxq_ctrl);
> -		if (rxq_obj->type == MLX5_RXQ_OBJ_TYPE_IBV) {
> +		switch (rxq_obj->type) {
> +		case MLX5_RXQ_OBJ_TYPE_IBV:
> +			rxq_free_elts(rxq_obj->rxq_ctrl);
>  			claim_zero(mlx5_glue->destroy_wq(rxq_obj->wq));
> -		} else if (rxq_obj->type == MLX5_RXQ_OBJ_TYPE_DEVX_RQ) {
> +			claim_zero(mlx5_glue->destroy_cq(rxq_obj->cq));
> +			break;
> +		case MLX5_RXQ_OBJ_TYPE_DEVX_RQ:
> +			rxq_free_elts(rxq_obj->rxq_ctrl);
>  			claim_zero(mlx5_devx_cmd_destroy(rxq_obj->rq));
>  			rxq_release_rq_resources(rxq_obj->rxq_ctrl);
> +			claim_zero(mlx5_glue->destroy_cq(rxq_obj->cq));
> +			break;
> +		case MLX5_RXQ_OBJ_TYPE_DEVX_HAIRPIN:
> +			rxq_obj_hairpin_release(rxq_obj);
> +			break;
>  		}
> -		claim_zero(mlx5_glue->destroy_cq(rxq_obj->cq));
>  		if (rxq_obj->channel)
>  			claim_zero(mlx5_glue->destroy_comp_channel
>  				   (rxq_obj->channel));
> @@ -1132,6 +1243,70 @@
>  }
> 
>  /**
> + * Create the Rx hairpin queue object.
> + *
> + * @param dev
> + *   Pointer to Ethernet device.
> + * @param idx
> + *   Queue index in DPDK Rx queue array
> + *
> + * @return
> + *   The hairpin DevX object initialised, NULL otherwise and rte_errno is set.
> + */
> +static struct mlx5_rxq_obj *
> +mlx5_rxq_obj_hairpin_new(struct rte_eth_dev *dev, uint16_t idx)
> +{
> +	struct mlx5_priv *priv = dev->data->dev_private;
> +	struct mlx5_rxq_data *rxq_data = (*priv->rxqs)[idx];
> +	struct mlx5_rxq_ctrl *rxq_ctrl =
> +		container_of(rxq_data, struct mlx5_rxq_ctrl, rxq);
> +	struct mlx5_devx_create_rq_attr attr = { 0 };
> +	struct mlx5_rxq_obj *tmpl = NULL;
> +	int ret = 0;
> +
> +	assert(rxq_data);
> +	assert(!rxq_ctrl->obj);
> +	tmpl = rte_calloc_socket(__func__, 1, sizeof(*tmpl), 0,
> +				 rxq_ctrl->socket);
> +	if (!tmpl) {
> +		DRV_LOG(ERR,
> +			"port %u Rx queue %u cannot allocate verbs resources",
> +			dev->data->port_id, rxq_data->idx);
> +		rte_errno = ENOMEM;
> +		goto error;
> +	}
> +	tmpl->type = MLX5_RXQ_OBJ_TYPE_DEVX_HAIRPIN;
> +	tmpl->rxq_ctrl = rxq_ctrl;
> +	attr.hairpin = 1;
> +	/* Workaround for hairpin startup */
> +	attr.wq_attr.log_hairpin_num_packets = log2above(32);
> +	/* Workaround for packets larger than 1KB */
> +	attr.wq_attr.log_hairpin_data_sz =
> +			priv->config.hca_attr.log_max_hairpin_wq_data_sz;
> +	tmpl->rq = mlx5_devx_cmd_create_rq(priv->sh->ctx, &attr,
> +					   rxq_ctrl->socket);
> +	if (!tmpl->rq) {
> +		DRV_LOG(ERR,
> +			"port %u Rx hairpin queue %u can't create rq object",
> +			dev->data->port_id, idx);
> +		rte_errno = errno;
> +		goto error;
> +	}
> +	DRV_LOG(DEBUG, "port %u rxq %u updated with %p", dev->data->port_id,
> +		idx, (void *)&tmpl);
> +	rte_atomic32_inc(&tmpl->refcnt);
> +	LIST_INSERT_HEAD(&priv->rxqsobj, tmpl, next);
> +	priv->verbs_alloc_ctx.type = MLX5_VERBS_ALLOC_TYPE_NONE;
> +	return tmpl;
> +error:
> +	ret = rte_errno; /* Save rte_errno before cleanup. */
> +	if (tmpl->rq)
> +		mlx5_devx_cmd_destroy(tmpl->rq);
> +	rte_errno = ret; /* Restore rte_errno. */
> +	return NULL;
> +}
> +
> +/**
>   * Create the Rx queue Verbs/DevX object.
>   *
>   * @param dev
> @@ -1163,6 +1338,8 @@ struct mlx5_rxq_obj *
> 
>  	assert(rxq_data);
>  	assert(!rxq_ctrl->obj);
> +	if (type == MLX5_RXQ_OBJ_TYPE_DEVX_HAIRPIN)
> +		return mlx5_rxq_obj_hairpin_new(dev, idx);
>  	priv->verbs_alloc_ctx.type = MLX5_VERBS_ALLOC_TYPE_RX_QUEUE;
>  	priv->verbs_alloc_ctx.obj = rxq_ctrl;
>  	tmpl = rte_calloc_socket(__func__, 1, sizeof(*tmpl), 0,
> @@ -1433,15 +1610,19 @@ struct mlx5_rxq_obj *
>  	unsigned int strd_num_n = 0;
>  	unsigned int strd_sz_n = 0;
>  	unsigned int i;
> +	unsigned int n_ibv = 0;
> 
>  	if (!mlx5_mprq_enabled(dev))
>  		return 0;
>  	/* Count the total number of descriptors configured. */
>  	for (i = 0; i != priv->rxqs_n; ++i) {
>  		struct mlx5_rxq_data *rxq = (*priv->rxqs)[i];
> +		struct mlx5_rxq_ctrl *rxq_ctrl = container_of
> +			(rxq, struct mlx5_rxq_ctrl, rxq);
> 
> -		if (rxq == NULL)
> +		if (rxq == NULL || rxq_ctrl->type != MLX5_RXQ_TYPE_STANDARD)
>  			continue;
> +		n_ibv++;
>  		desc += 1 << rxq->elts_n;
>  		/* Get the max number of strides. */
>  		if (strd_num_n < rxq->strd_num_n)
> @@ -1466,7 +1647,7 @@ struct mlx5_rxq_obj *
>  	 * this Mempool gets available again.
>  	 */
>  	desc *= 4;
> -	obj_num = desc + MLX5_MPRQ_MP_CACHE_SZ * priv->rxqs_n;
> +	obj_num = desc + MLX5_MPRQ_MP_CACHE_SZ * n_ibv;
>  	/*
>  	 * rte_mempool_create_empty() has sanity check to refuse large cache
>  	 * size compared to the number of elements.
> @@ -1514,8 +1695,10 @@ struct mlx5_rxq_obj *
>  	/* Set mempool for each Rx queue. */
>  	for (i = 0; i != priv->rxqs_n; ++i) {
>  		struct mlx5_rxq_data *rxq = (*priv->rxqs)[i];
> +		struct mlx5_rxq_ctrl *rxq_ctrl = container_of
> +			(rxq, struct mlx5_rxq_ctrl, rxq);
> 
> -		if (rxq == NULL)
> +		if (rxq == NULL || rxq_ctrl->type != MLX5_RXQ_TYPE_STANDARD)
>  			continue;
>  		rxq->mprq_mp = mp;
>  	}
> @@ -1620,6 +1803,7 @@ struct mlx5_rxq_ctrl *
>  		rte_errno = ENOMEM;
>  		return NULL;
>  	}
> +	tmpl->type = MLX5_RXQ_TYPE_STANDARD;
>  	if (mlx5_mr_btree_init(&tmpl->rxq.mr_ctrl.cache_bh,
>  			       MLX5_MR_BTREE_CACHE_N, socket)) {
>  		/* rte_errno is already set. */
> @@ -1788,6 +1972,59 @@ struct mlx5_rxq_ctrl *
>  }
> 
>  /**
> + * Create a DPDK Rx hairpin queue.
> + *
> + * @param dev
> + *   Pointer to Ethernet device.
> + * @param idx
> + *   RX queue index.
> + * @param desc
> + *   Number of descriptors to configure in queue.
> + * @param socket
> + *   NUMA socket on which memory must be allocated.
> + * @param conf
> + *   The Rx configuration.
> + * @param hairpin_conf
> + *   The hairpin binding configuration.
> + *
> + * @return
> + *   A DPDK queue object on success, NULL otherwise and rte_errno is set.
> + */
> +struct mlx5_rxq_ctrl *
> +mlx5_rxq_hairpin_new(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
> +		     unsigned int socket, const struct rte_eth_rxconf *conf,
> +		     const struct rte_eth_hairpin_conf *hairpin_conf)
> +{
> +	struct mlx5_priv *priv = dev->data->dev_private;
> +	struct mlx5_rxq_ctrl *tmpl;
> +	uint64_t offloads = conf->offloads |
> +			   dev->data->dev_conf.rxmode.offloads;
> +
> +	tmpl = rte_calloc_socket("RXQ", 1, sizeof(*tmpl), 0, socket);
> +	if (!tmpl) {
> +		rte_errno = ENOMEM;
> +		return NULL;
> +	}
> +	tmpl->type = MLX5_RXQ_TYPE_HAIRPIN;
> +	tmpl->socket = socket;
> +	/* Configure VLAN stripping. */
> +	tmpl->rxq.vlan_strip = !!(offloads & DEV_RX_OFFLOAD_VLAN_STRIP);
> +	/* Save port ID. */
> +	tmpl->rxq.rss_hash = 0;
> +	tmpl->rxq.port_id = dev->data->port_id;
> +	tmpl->priv = priv;
> +	tmpl->rxq.mp = NULL;
> +	tmpl->rxq.elts_n = log2above(desc);
> +	tmpl->rxq.elts = NULL;
> +	tmpl->rxq.mr_ctrl.cache_bh = (struct mlx5_mr_btree) { 0 };
> +	tmpl->hairpin_conf = *hairpin_conf;
> +	tmpl->rxq.idx = idx;
> +	rte_atomic32_inc(&tmpl->refcnt);
> +	LIST_INSERT_HEAD(&priv->rxqsctrl, tmpl, next);
> +	return tmpl;
> +}
> +
> +/**
>   * Get a Rx queue.
>   *
>   * @param dev
> @@ -1841,7 +2078,8 @@ struct mlx5_rxq_ctrl *
>  		if (rxq_ctrl->dbr_umem_id_valid)
>  			claim_zero(mlx5_release_dbr(dev, rxq_ctrl->dbr_umem_id,
>  						    rxq_ctrl->dbr_offset));
> -		mlx5_mr_btree_free(&rxq_ctrl->rxq.mr_ctrl.cache_bh);
> +		if (rxq_ctrl->type == MLX5_RXQ_TYPE_STANDARD)
> +			mlx5_mr_btree_free(&rxq_ctrl->rxq.mr_ctrl.cache_bh);
>  		LIST_REMOVE(rxq_ctrl, next);
>  		rte_free(rxq_ctrl);
>  		(*priv->rxqs)[idx] = NULL;
> diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
> index 4bb28a4..dbb616e 100644
> --- a/drivers/net/mlx5/mlx5_rxtx.h
> +++ b/drivers/net/mlx5/mlx5_rxtx.h
> @@ -159,6 +159,13 @@ struct mlx5_rxq_data {
>  enum mlx5_rxq_obj_type {
>  	MLX5_RXQ_OBJ_TYPE_IBV,		/* mlx5_rxq_obj with ibv_wq. */
>  	MLX5_RXQ_OBJ_TYPE_DEVX_RQ,	/* mlx5_rxq_obj with mlx5_devx_rq. */
> +	MLX5_RXQ_OBJ_TYPE_DEVX_HAIRPIN,
> +	/* mlx5_rxq_obj with mlx5_devx_rq and hairpin support. */
> +};
> +
> +enum mlx5_rxq_type {
> +	MLX5_RXQ_TYPE_STANDARD, /* Standard Rx queue. */
> +	MLX5_RXQ_TYPE_HAIRPIN, /* Hairpin Rx queue. */
>  };
> 
>  /* Verbs/DevX Rx queue elements. */
> @@ -183,6 +190,7 @@ struct mlx5_rxq_ctrl {
>  	rte_atomic32_t refcnt; /* Reference counter. */
>  	struct mlx5_rxq_obj *obj; /* Verbs/DevX elements. */
>  	struct mlx5_priv *priv; /* Back pointer to private data. */
> +	enum mlx5_rxq_type type; /* Rxq type. */
>  	unsigned int socket; /* CPU socket ID for allocations. */
>  	unsigned int irq:1; /* Whether IRQ is enabled. */
>  	unsigned int dbr_umem_id_valid:1; /* dbr_umem_id holds a valid value. */
> @@ -193,6 +201,7 @@ struct mlx5_rxq_ctrl {
>  	uint32_t dbr_umem_id; /* Storing door-bell information, */
>  	uint64_t dbr_offset;  /* needed when freeing door-bell. */
>  	struct mlx5dv_devx_umem *wq_umem; /* WQ buffer registration info. */
> +	struct rte_eth_hairpin_conf hairpin_conf; /* Hairpin configuration. */
>  };
> 
>  enum mlx5_ind_tbl_type {
> @@ -339,6 +348,10 @@ struct mlx5_txq_ctrl {
>  int mlx5_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
>  			unsigned int socket, const struct rte_eth_rxconf *conf,
>  			struct rte_mempool *mp);
> +int mlx5_rx_hairpin_queue_setup
> +	(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
> +	 unsigned int socket, const struct rte_eth_rxconf *conf,
> +	 const struct rte_eth_hairpin_conf *hairpin_conf);
>  void mlx5_rx_queue_release(void *dpdk_rxq);
>  int mlx5_rx_intr_vec_enable(struct rte_eth_dev *dev);
>  void mlx5_rx_intr_vec_disable(struct rte_eth_dev *dev);
> @@ -351,6 +364,10 @@ struct mlx5_rxq_ctrl *mlx5_rxq_new(struct rte_eth_dev *dev, uint16_t idx,
>  				   uint16_t desc, unsigned int socket,
>  				   const struct rte_eth_rxconf *conf,
>  				   struct rte_mempool *mp);
> +struct mlx5_rxq_ctrl *mlx5_rxq_hairpin_new
> +	(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
> +	 unsigned int socket, const struct rte_eth_rxconf *conf,
> +	 const struct rte_eth_hairpin_conf *hairpin_conf);
>  struct mlx5_rxq_ctrl *mlx5_rxq_get(struct rte_eth_dev *dev, uint16_t idx);
>  int mlx5_rxq_release(struct rte_eth_dev *dev, uint16_t idx);
>  int mlx5_rxq_verify(struct rte_eth_dev *dev);
> diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
> index 122f31c..cb31ae2 100644
> --- a/drivers/net/mlx5/mlx5_trigger.c
> +++ b/drivers/net/mlx5/mlx5_trigger.c
> @@ -118,6 +118,13 @@
> 
>  		if (!rxq_ctrl)
>  			continue;
> +		if (rxq_ctrl->type == MLX5_RXQ_TYPE_HAIRPIN) {
> +			rxq_ctrl->obj = mlx5_rxq_obj_new
> +				(dev, i, MLX5_RXQ_OBJ_TYPE_DEVX_HAIRPIN);
> +			if (!rxq_ctrl->obj)
> +				goto error;
> +			continue;
> +		}
>  		/* Pre-register Rx mempool. */
>  		mp = mlx5_rxq_mprq_enabled(&rxq_ctrl->rxq) ?
>  		     rxq_ctrl->rxq.mprq_mp : rxq_ctrl->rxq.mp;
> --
> 1.8.3.1


^ permalink raw reply	[flat|nested] 186+ messages in thread
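
To make the new Rx path above concrete, here is a minimal usage sketch of
binding one hairpin Rx/Tx queue pair on a single port through the ethdev
entry points this series adds. It assumes the rte_eth_hairpin_conf layout
used by the testpmd patch later in this thread (peer_n, peers[0].port,
peers[0].queue); passing NULL for the regular queue-conf arguments is an
assumption that defaults are accepted, as with the normal setup functions.

#include <rte_ethdev.h>

/* Sketch: pair Rx hairpin queue 'rxq' with Tx hairpin queue 'txq'. */
static int
setup_hairpin_pair(uint16_t port_id, uint16_t rxq, uint16_t txq,
		   uint16_t nb_desc, unsigned int socket)
{
	struct rte_eth_hairpin_conf conf = { .peer_n = 1 };
	int ret;

	conf.peers[0].port = port_id;	/* Rx queue points at its Tx peer. */
	conf.peers[0].queue = txq;
	ret = rte_eth_rx_hairpin_queue_setup(port_id, rxq, nb_desc,
					     socket, NULL, &conf);
	if (ret != 0)
		return ret;
	conf.peers[0].queue = rxq;	/* Tx queue points back at the Rx peer. */
	return rte_eth_tx_hairpin_queue_setup(port_id, txq, nb_desc,
					      socket, NULL, &conf);
}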

* Re: [dpdk-dev] [PATCH 04/13] net/mlx5: prepare txq to work with different types
  2019-09-26  6:29 ` [dpdk-dev] [PATCH 04/13] net/mlx5: prepare txq to work with different types Ori Kam
@ 2019-09-26  9:32   ` Slava Ovsiienko
  0 siblings, 0 replies; 186+ messages in thread
From: Slava Ovsiienko @ 2019-09-26  9:32 UTC (permalink / raw)
  To: Ori Kam, Matan Azrad, Shahaf Shuler; +Cc: dev, Ori Kam, jingjing.wu, stephen

> -----Original Message-----
> From: Ori Kam <orika@mellanox.com>
> Sent: Thursday, September 26, 2019 9:29
> To: Matan Azrad <matan@mellanox.com>; Shahaf Shuler
> <shahafs@mellanox.com>; Slava Ovsiienko <viacheslavo@mellanox.com>
> Cc: dev@dpdk.org; Ori Kam <orika@mellanox.com>; jingjing.wu@intel.com;
> stephen@networkplumber.org
> Subject: [PATCH 04/13] net/mlx5: prepare txq to work with different types
> 
> Currently all Tx queues are created using Verbs.
> This commit modifies the naming so it will not include Verbs, since the next
> commit introduces a new queue type (hairpin).
> 
> Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>

> 
> Conflicts:
> 	drivers/net/mlx5/mlx5_txq.c
> ---
>  drivers/net/mlx5/mlx5.c         |  2 +-
>  drivers/net/mlx5/mlx5.h         |  2 +-
>  drivers/net/mlx5/mlx5_rxtx.c    |  2 +-
>  drivers/net/mlx5/mlx5_rxtx.h    | 39 +++++++++++++++++------
>  drivers/net/mlx5/mlx5_trigger.c |  4 +--
>  drivers/net/mlx5/mlx5_txq.c     | 70 ++++++++++++++++++++---------------------
>  6 files changed, 69 insertions(+), 50 deletions(-)
> 
> diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
> index 81894fb..f0d122d 100644
> --- a/drivers/net/mlx5/mlx5.c
> +++ b/drivers/net/mlx5/mlx5.c
> @@ -911,7 +911,7 @@ struct mlx5_dev_spawn_data {
>  	if (ret)
>  		DRV_LOG(WARNING, "port %u some Rx queues still remain",
>  			dev->data->port_id);
> -	ret = mlx5_txq_ibv_verify(dev);
> +	ret = mlx5_txq_obj_verify(dev);
>  	if (ret)
>  		DRV_LOG(WARNING, "port %u some Verbs Tx queue still
> remain",
>  			dev->data->port_id);
> diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
> index cd896c8..a34972c 100644
> --- a/drivers/net/mlx5/mlx5.h
> +++ b/drivers/net/mlx5/mlx5.h
> @@ -645,7 +645,7 @@ struct mlx5_priv {
>  	LIST_HEAD(rxqobj, mlx5_rxq_obj) rxqsobj; /* Verbs/DevX Rx queues. */
>  	LIST_HEAD(hrxq, mlx5_hrxq) hrxqs; /* Verbs Hash Rx queues. */
>  	LIST_HEAD(txq, mlx5_txq_ctrl) txqsctrl; /* DPDK Tx queues. */
> -	LIST_HEAD(txqibv, mlx5_txq_ibv) txqsibv; /* Verbs Tx queues. */
> +	LIST_HEAD(txqobj, mlx5_txq_obj) txqsobj; /* Verbs/DevX Tx queues. */
>  	/* Indirection tables. */
>  	LIST_HEAD(ind_tables, mlx5_ind_table_obj) ind_tbls;
>  	/* Pointer to next element. */
> diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
> index 10d0ca1..f23708c 100644
> --- a/drivers/net/mlx5/mlx5_rxtx.c
> +++ b/drivers/net/mlx5/mlx5_rxtx.c
> @@ -863,7 +863,7 @@ enum mlx5_txcmp_code {
>  			.qp_state = IBV_QPS_RESET,
>  			.port_num = (uint8_t)priv->ibv_port,
>  		};
> -		struct ibv_qp *qp = txq_ctrl->ibv->qp;
> +		struct ibv_qp *qp = txq_ctrl->obj->qp;
> 
>  		ret = mlx5_glue->modify_qp(qp, &mod, IBV_QP_STATE);
>  		if (ret) {
> diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
> index dbb616e..2de674a 100644
> --- a/drivers/net/mlx5/mlx5_rxtx.h
> +++ b/drivers/net/mlx5/mlx5_rxtx.h
> @@ -308,13 +308,31 @@ struct mlx5_txq_data {
>  	/* Storage for queued packets, must be the last field. */  }
> __rte_cache_aligned;
> 
> -/* Verbs Rx queue elements. */
> -struct mlx5_txq_ibv {
> -	LIST_ENTRY(mlx5_txq_ibv) next; /* Pointer to the next element. */
> +enum mlx5_txq_obj_type {
> +	MLX5_TXQ_OBJ_TYPE_IBV,		/* mlx5_txq_obj with ibv_wq. */
> +	MLX5_TXQ_OBJ_TYPE_DEVX_HAIRPIN,
> +	/* mlx5_txq_obj with mlx5_devx_tq and hairpin support. */
> +};
> +
> +enum mlx5_txq_type {
> +	MLX5_TXQ_TYPE_STANDARD, /* Standard Tx queue. */
> +	MLX5_TXQ_TYPE_HAIRPIN, /* Hairpin Tx queue. */
> +};
> +
> +/* Verbs/DevX Tx queue elements. */
> +struct mlx5_txq_obj {
> +	LIST_ENTRY(mlx5_txq_obj) next; /* Pointer to the next element. */
>  	rte_atomic32_t refcnt; /* Reference counter. */
>  	struct mlx5_txq_ctrl *txq_ctrl; /* Pointer to the control queue. */
> -	struct ibv_cq *cq; /* Completion Queue. */
> -	struct ibv_qp *qp; /* Queue Pair. */
> +	enum mlx5_rxq_obj_type type; /* The txq object type. */
> +	RTE_STD_C11
> +	union {
> +		struct {
> +			struct ibv_cq *cq; /* Completion Queue. */
> +			struct ibv_qp *qp; /* Queue Pair. */
> +		};
> +		struct mlx5_devx_obj *sq; /* DevX object for Sx queue. */
> +	};
>  };
> 
>  /* TX queue control descriptor. */
> @@ -322,9 +340,10 @@ struct mlx5_txq_ctrl {
>  	LIST_ENTRY(mlx5_txq_ctrl) next; /* Pointer to the next element. */
>  	rte_atomic32_t refcnt; /* Reference counter. */
>  	unsigned int socket; /* CPU socket ID for allocations. */
> +	enum mlx5_txq_type type; /* The txq ctrl type. */
>  	unsigned int max_inline_data; /* Max inline data. */
>  	unsigned int max_tso_header; /* Max TSO header size. */
> -	struct mlx5_txq_ibv *ibv; /* Verbs queue object. */
> +	struct mlx5_txq_obj *obj; /* Verbs/DevX queue object. */
>  	struct mlx5_priv *priv; /* Back pointer to private data. */
>  	off_t uar_mmap_offset; /* UAR mmap offset for non-primary process. */
>  	void *bf_reg; /* BlueFlame register from Verbs. */
> @@ -395,10 +414,10 @@ int mlx5_tx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
>  			unsigned int socket, const struct rte_eth_txconf *conf);
>  void mlx5_tx_queue_release(void *dpdk_txq);
>  int mlx5_tx_uar_init_secondary(struct rte_eth_dev *dev, int fd);
> -struct mlx5_txq_ibv *mlx5_txq_ibv_new(struct rte_eth_dev *dev, uint16_t idx);
> -struct mlx5_txq_ibv *mlx5_txq_ibv_get(struct rte_eth_dev *dev, uint16_t idx);
> -int mlx5_txq_ibv_release(struct mlx5_txq_ibv *txq_ibv);
> -int mlx5_txq_ibv_verify(struct rte_eth_dev *dev);
> +struct mlx5_txq_obj *mlx5_txq_obj_new(struct rte_eth_dev *dev, uint16_t idx);
> +struct mlx5_txq_obj *mlx5_txq_obj_get(struct rte_eth_dev *dev, uint16_t idx);
> +int mlx5_txq_obj_release(struct mlx5_txq_obj *txq_ibv);
> +int mlx5_txq_obj_verify(struct rte_eth_dev *dev);
>  struct mlx5_txq_ctrl *mlx5_txq_new(struct rte_eth_dev *dev, uint16_t idx,
>  				   uint16_t desc, unsigned int socket,
>  				   const struct rte_eth_txconf *conf);
> diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
> index cb31ae2..50c4df5 100644
> --- a/drivers/net/mlx5/mlx5_trigger.c
> +++ b/drivers/net/mlx5/mlx5_trigger.c
> @@ -52,8 +52,8 @@
>  		if (!txq_ctrl)
>  			continue;
>  		txq_alloc_elts(txq_ctrl);
> -		txq_ctrl->ibv = mlx5_txq_ibv_new(dev, i);
> -		if (!txq_ctrl->ibv) {
> +		txq_ctrl->obj = mlx5_txq_obj_new(dev, i);
> +		if (!txq_ctrl->obj) {
>  			rte_errno = ENOMEM;
>  			goto error;
>  		}
> diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
> index d9fd143..e1ed4eb 100644
> --- a/drivers/net/mlx5/mlx5_txq.c
> +++ b/drivers/net/mlx5/mlx5_txq.c
> @@ -375,15 +375,15 @@
>   * @return
>   *   The Verbs object initialised, NULL otherwise and rte_errno is set.
>   */
> -struct mlx5_txq_ibv *
> -mlx5_txq_ibv_new(struct rte_eth_dev *dev, uint16_t idx)
> +struct mlx5_txq_obj *
> +mlx5_txq_obj_new(struct rte_eth_dev *dev, uint16_t idx)
>  {
>  	struct mlx5_priv *priv = dev->data->dev_private;
>  	struct mlx5_txq_data *txq_data = (*priv->txqs)[idx];
>  	struct mlx5_txq_ctrl *txq_ctrl =
>  		container_of(txq_data, struct mlx5_txq_ctrl, txq);
> -	struct mlx5_txq_ibv tmpl;
> -	struct mlx5_txq_ibv *txq_ibv = NULL;
> +	struct mlx5_txq_obj tmpl;
> +	struct mlx5_txq_obj *txq_obj = NULL;
>  	union {
>  		struct ibv_qp_init_attr_ex init;
>  		struct ibv_cq_init_attr_ex cq;
> @@ -411,7 +411,7 @@ struct mlx5_txq_ibv *
>  		rte_errno = EINVAL;
>  		return NULL;
>  	}
> -	memset(&tmpl, 0, sizeof(struct mlx5_txq_ibv));
> +	memset(&tmpl, 0, sizeof(struct mlx5_txq_obj));
>  	attr.cq = (struct ibv_cq_init_attr_ex){
>  		.comp_mask = 0,
>  	};
> @@ -502,9 +502,9 @@ struct mlx5_txq_ibv *
>  		rte_errno = errno;
>  		goto error;
>  	}
> -	txq_ibv = rte_calloc_socket(__func__, 1, sizeof(struct mlx5_txq_ibv), 0,
> +	txq_obj = rte_calloc_socket(__func__, 1, sizeof(struct mlx5_txq_obj), 0,
>  				    txq_ctrl->socket);
> -	if (!txq_ibv) {
> +	if (!txq_obj) {
>  		DRV_LOG(ERR, "port %u Tx queue %u cannot allocate
> memory",
>  			dev->data->port_id, idx);
>  		rte_errno = ENOMEM;
> @@ -568,9 +568,9 @@ struct mlx5_txq_ibv *
>  		}
>  	}
>  #endif
> -	txq_ibv->qp = tmpl.qp;
> -	txq_ibv->cq = tmpl.cq;
> -	rte_atomic32_inc(&txq_ibv->refcnt);
> +	txq_obj->qp = tmpl.qp;
> +	txq_obj->cq = tmpl.cq;
> +	rte_atomic32_inc(&txq_obj->refcnt);
>  	txq_ctrl->bf_reg = qp.bf.reg;
>  	if (qp.comp_mask & MLX5DV_QP_MASK_UAR_MMAP_OFFSET) {
>  		txq_ctrl->uar_mmap_offset = qp.uar_mmap_offset;
> @@ -585,18 +585,18 @@ struct mlx5_txq_ibv *
>  		goto error;
>  	}
>  	txq_uar_init(txq_ctrl);
> -	LIST_INSERT_HEAD(&priv->txqsibv, txq_ibv, next);
> -	txq_ibv->txq_ctrl = txq_ctrl;
> +	LIST_INSERT_HEAD(&priv->txqsobj, txq_obj, next);
> +	txq_obj->txq_ctrl = txq_ctrl;
>  	priv->verbs_alloc_ctx.type = MLX5_VERBS_ALLOC_TYPE_NONE;
> -	return txq_ibv;
> +	return txq_obj;
>  error:
>  	ret = rte_errno; /* Save rte_errno before cleanup. */
>  	if (tmpl.cq)
>  		claim_zero(mlx5_glue->destroy_cq(tmpl.cq));
>  	if (tmpl.qp)
>  		claim_zero(mlx5_glue->destroy_qp(tmpl.qp));
> -	if (txq_ibv)
> -		rte_free(txq_ibv);
> +	if (txq_obj)
> +		rte_free(txq_obj);
>  	priv->verbs_alloc_ctx.type = MLX5_VERBS_ALLOC_TYPE_NONE;
>  	rte_errno = ret; /* Restore rte_errno. */
>  	return NULL;
> @@ -613,8 +613,8 @@ struct mlx5_txq_ibv *
>   * @return
>   *   The Verbs object if it exists.
>   */
> -struct mlx5_txq_ibv *
> -mlx5_txq_ibv_get(struct rte_eth_dev *dev, uint16_t idx)
> +struct mlx5_txq_obj *
> +mlx5_txq_obj_get(struct rte_eth_dev *dev, uint16_t idx)
>  {
>  	struct mlx5_priv *priv = dev->data->dev_private;
>  	struct mlx5_txq_ctrl *txq_ctrl;
> @@ -624,29 +624,29 @@ struct mlx5_txq_ibv *
>  	if (!(*priv->txqs)[idx])
>  		return NULL;
>  	txq_ctrl = container_of((*priv->txqs)[idx], struct mlx5_txq_ctrl, txq);
> -	if (txq_ctrl->ibv)
> -		rte_atomic32_inc(&txq_ctrl->ibv->refcnt);
> -	return txq_ctrl->ibv;
> +	if (txq_ctrl->obj)
> +		rte_atomic32_inc(&txq_ctrl->obj->refcnt);
> +	return txq_ctrl->obj;
>  }
> 
>  /**
>   * Release an Tx verbs queue object.
>   *
> - * @param txq_ibv
> + * @param txq_obj
>   *   Verbs Tx queue object.
>   *
>   * @return
>   *   1 while a reference on it exists, 0 when freed.
>   */
>  int
> -mlx5_txq_ibv_release(struct mlx5_txq_ibv *txq_ibv)
> +mlx5_txq_obj_release(struct mlx5_txq_obj *txq_obj)
>  {
> -	assert(txq_ibv);
> -	if (rte_atomic32_dec_and_test(&txq_ibv->refcnt)) {
> -		claim_zero(mlx5_glue->destroy_qp(txq_ibv->qp));
> -		claim_zero(mlx5_glue->destroy_cq(txq_ibv->cq));
> -		LIST_REMOVE(txq_ibv, next);
> -		rte_free(txq_ibv);
> +	assert(txq_obj);
> +	if (rte_atomic32_dec_and_test(&txq_obj->refcnt)) {
> +		claim_zero(mlx5_glue->destroy_qp(txq_obj->qp));
> +		claim_zero(mlx5_glue->destroy_cq(txq_obj->cq));
> +		LIST_REMOVE(txq_obj, next);
> +		rte_free(txq_obj);
>  		return 0;
>  	}
>  	return 1;
> @@ -662,15 +662,15 @@ struct mlx5_txq_ibv *
>   *   The number of object not released.
>   */
>  int
> -mlx5_txq_ibv_verify(struct rte_eth_dev *dev)
> +mlx5_txq_obj_verify(struct rte_eth_dev *dev)
>  {
>  	struct mlx5_priv *priv = dev->data->dev_private;
>  	int ret = 0;
> -	struct mlx5_txq_ibv *txq_ibv;
> +	struct mlx5_txq_obj *txq_obj;
> 
> -	LIST_FOREACH(txq_ibv, &priv->txqsibv, next) {
> +	LIST_FOREACH(txq_obj, &priv->txqsobj, next) {
>  		DRV_LOG(DEBUG, "port %u Verbs Tx queue %u still
> referenced",
> -			dev->data->port_id, txq_ibv->txq_ctrl->txq.idx);
> +			dev->data->port_id, txq_obj->txq_ctrl->txq.idx);
>  		++ret;
>  	}
>  	return ret;
> @@ -980,7 +980,7 @@ struct mlx5_txq_ctrl *
>  	if ((*priv->txqs)[idx]) {
>  		ctrl = container_of((*priv->txqs)[idx], struct mlx5_txq_ctrl,
>  				    txq);
> -		mlx5_txq_ibv_get(dev, idx);
> +		mlx5_txq_obj_get(dev, idx);
>  		rte_atomic32_inc(&ctrl->refcnt);
>  	}
>  	return ctrl;
> @@ -1006,8 +1006,8 @@ struct mlx5_txq_ctrl *
>  	if (!(*priv->txqs)[idx])
>  		return 0;
>  	txq = container_of((*priv->txqs)[idx], struct mlx5_txq_ctrl, txq);
> -	if (txq->ibv && !mlx5_txq_ibv_release(txq->ibv))
> -		txq->ibv = NULL;
> +	if (txq->obj && !mlx5_txq_obj_release(txq->obj))
> +		txq->obj = NULL;
>  	if (rte_atomic32_dec_and_test(&txq->refcnt)) {
>  		txq_free_elts(txq);
>  		mlx5_mr_btree_free(&txq->txq.mr_ctrl.cache_bh);
> --
> 1.8.3.1


^ permalink raw reply	[flat|nested] 186+ messages in thread
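
The renaming in patch 04 prepares a single object type that patch 05 then
extends with DevX resources; the resulting shape is a type-tagged union.
A sketch of that dispatch pattern is below, with placeholder names and
void pointers standing in for the Verbs/DevX types (the real structures
are mlx5_txq_obj/mlx5_txq_ctrl in mlx5_rxtx.h); it illustrates the
pattern only, it is not the driver code. The anonymous struct member
needs C11, which the patch enables via RTE_STD_C11.

/* Type-tagged queue object: one list node, two possible resource sets. */
enum txq_obj_type {
	TXQ_OBJ_TYPE_IBV,		/* Created through Verbs. */
	TXQ_OBJ_TYPE_DEVX_HAIRPIN,	/* Created through DevX. */
};

struct txq_obj {
	enum txq_obj_type type;
	union {
		struct {		/* Valid when type == ..._IBV. */
			void *cq;	/* stands in for struct ibv_cq * */
			void *qp;	/* stands in for struct ibv_qp * */
		};
		void *sq;		/* stands in for the DevX SQ object */
	};
};

/* Release destroys only the resources the tag says were created. */
static void
txq_obj_release(struct txq_obj *obj)
{
	if (obj->type == TXQ_OBJ_TYPE_IBV) {
		/* ibv_destroy_qp(obj->qp); ibv_destroy_cq(obj->cq); */
	} else {
		/* mlx5_devx_cmd_destroy(obj->sq); */
	}
}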

* Re: [dpdk-dev] [PATCH 05/13] net/mlx5: support Tx hairpin queues
  2019-09-26  6:29 ` [dpdk-dev] [PATCH 05/13] net/mlx5: support Tx hairpin queues Ori Kam
@ 2019-09-26  9:32   ` Slava Ovsiienko
  0 siblings, 0 replies; 186+ messages in thread
From: Slava Ovsiienko @ 2019-09-26  9:32 UTC (permalink / raw)
  To: Ori Kam, Matan Azrad, Shahaf Shuler; +Cc: dev, Ori Kam, jingjing.wu, stephen

> -----Original Message-----
> From: Ori Kam <orika@mellanox.com>
> Sent: Thursday, September 26, 2019 9:29
> To: Matan Azrad <matan@mellanox.com>; Shahaf Shuler
> <shahafs@mellanox.com>; Slava Ovsiienko <viacheslavo@mellanox.com>
> Cc: dev@dpdk.org; Ori Kam <orika@mellanox.com>; jingjing.wu@intel.com;
> stephen@networkplumber.org
> Subject: [PATCH 05/13] net/mlx5: support Tx hairpin queues
> 
> This commit adds support for creating Tx hairpin queues.
> A hairpin queue is a queue that is created using DevX and used only by the HW.
> 
> Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>

> ---
>  drivers/net/mlx5/mlx5.c           |  26 ++++
>  drivers/net/mlx5/mlx5.h           |  46 +++++++
>  drivers/net/mlx5/mlx5_devx_cmds.c | 186 +++++++++++++++++++++++++++++
>  drivers/net/mlx5/mlx5_prm.h       | 118 ++++++++++++++++++
>  drivers/net/mlx5/mlx5_rxtx.h      |  20 +++-
>  drivers/net/mlx5/mlx5_trigger.c   |  10 +-
>  drivers/net/mlx5/mlx5_txq.c       | 245 +++++++++++++++++++++++++++++++++++---
>  7 files changed, 631 insertions(+), 20 deletions(-)
> 
> diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
> index f0d122d..ad36743 100644
> --- a/drivers/net/mlx5/mlx5.c
> +++ b/drivers/net/mlx5/mlx5.c
> @@ -325,6 +325,9 @@ struct mlx5_dev_spawn_data {
>  	struct mlx5_ibv_shared *sh;
>  	int err = 0;
>  	uint32_t i;
> +#ifdef HAVE_IBV_FLOW_DV_SUPPORT
> +	struct mlx5_devx_tis_attr tis_attr = { 0 };
> +#endif
> 
>  	assert(spawn);
>  	/* Secondary process should not create the shared context. */
> @@ -394,6 +397,19 @@ struct mlx5_dev_spawn_data {
>  		DRV_LOG(ERR, "Fail to extract pdn from PD");
>  		goto error;
>  	}
> +	sh->td = mlx5_devx_cmd_create_td(sh->ctx);
> +	if (!sh->td) {
> +		DRV_LOG(ERR, "TD allocation failure");
> +		err = ENOMEM;
> +		goto error;
> +	}
> +	tis_attr.transport_domain = sh->td->id;
> +	sh->tis = mlx5_devx_cmd_create_tis(sh->ctx, &tis_attr);
> +	if (!sh->tis) {
> +		DRV_LOG(ERR, "TIS allocation failure");
> +		err = ENOMEM;
> +		goto error;
> +	}
>  #endif /* HAVE_IBV_FLOW_DV_SUPPORT */
>  	/*
>  	 * Once the device is added to the list of memory event
> @@ -425,6 +441,10 @@ struct mlx5_dev_spawn_data {
>  error:
>  	pthread_mutex_unlock(&mlx5_ibv_list_mutex);
>  	assert(sh);
> +	if (sh->tis)
> +		claim_zero(mlx5_devx_cmd_destroy(sh->tis));
> +	if (sh->td)
> +		claim_zero(mlx5_devx_cmd_destroy(sh->td));
>  	if (sh->pd)
>  		claim_zero(mlx5_glue->dealloc_pd(sh->pd));
>  	if (sh->ctx)
> @@ -485,6 +505,10 @@ struct mlx5_dev_spawn_data {
>  	pthread_mutex_destroy(&sh->intr_mutex);
>  	if (sh->pd)
>  		claim_zero(mlx5_glue->dealloc_pd(sh->pd));
> +	if (sh->tis)
> +		claim_zero(mlx5_devx_cmd_destroy(sh->tis));
> +	if (sh->td)
> +		claim_zero(mlx5_devx_cmd_destroy(sh->td));
>  	if (sh->ctx)
>  		claim_zero(mlx5_glue->close_device(sh->ctx));
>  	rte_free(sh);
> @@ -976,6 +1000,7 @@ struct mlx5_dev_spawn_data {
>  	.rx_queue_setup = mlx5_rx_queue_setup,
>  	.rx_hairpin_queue_setup = mlx5_rx_hairpin_queue_setup,
>  	.tx_queue_setup = mlx5_tx_queue_setup,
> +	.tx_hairpin_queue_setup = mlx5_tx_hairpin_queue_setup,
>  	.rx_queue_release = mlx5_rx_queue_release,
>  	.tx_queue_release = mlx5_tx_queue_release,
>  	.flow_ctrl_get = mlx5_dev_get_flow_ctrl,
> @@ -1043,6 +1068,7 @@ struct mlx5_dev_spawn_data {
>  	.rx_queue_setup = mlx5_rx_queue_setup,
>  	.rx_hairpin_queue_setup = mlx5_rx_hairpin_queue_setup,
>  	.tx_queue_setup = mlx5_tx_queue_setup,
> +	.tx_hairpin_queue_setup = mlx5_tx_hairpin_queue_setup,
>  	.rx_queue_release = mlx5_rx_queue_release,
>  	.tx_queue_release = mlx5_tx_queue_release,
>  	.flow_ctrl_get = mlx5_dev_get_flow_ctrl,
> diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
> index a34972c..506920e 100644
> --- a/drivers/net/mlx5/mlx5.h
> +++ b/drivers/net/mlx5/mlx5.h
> @@ -350,6 +350,43 @@ struct mlx5_devx_rqt_attr {
>  	uint32_t rq_list[];
>  };
> 
> +/* TIS attributes structure. */
> +struct mlx5_devx_tis_attr {
> +	uint32_t strict_lag_tx_port_affinity:1;
> +	uint32_t tls_en:1;
> +	uint32_t lag_tx_port_affinity:4;
> +	uint32_t prio:4;
> +	uint32_t transport_domain:24;
> +};
> +
> +/* SQ attributes structure, used by SQ create operation. */
> +struct mlx5_devx_create_sq_attr {
> +	uint32_t rlky:1;
> +	uint32_t cd_master:1;
> +	uint32_t fre:1;
> +	uint32_t flush_in_error_en:1;
> +	uint32_t allow_multi_pkt_send_wqe:1;
> +	uint32_t min_wqe_inline_mode:3;
> +	uint32_t state:4;
> +	uint32_t reg_umr:1;
> +	uint32_t allow_swp:1;
> +	uint32_t hairpin:1;
> +	uint32_t user_index:24;
> +	uint32_t cqn:24;
> +	uint32_t packet_pacing_rate_limit_index:16;
> +	uint32_t tis_lst_sz:16;
> +	uint32_t tis_num:24;
> +	struct mlx5_devx_wq_attr wq_attr;
> +};
> +
> +/* SQ attributes structure, used by SQ modify operation. */
> +struct mlx5_devx_modify_sq_attr {
> +	uint32_t sq_state:4;
> +	uint32_t state:4;
> +	uint32_t hairpin_peer_rq:24;
> +	uint32_t hairpin_peer_vhca:16;
> +};
> +
>  /**
>   * Type of object being allocated.
>   */
> @@ -591,6 +628,8 @@ struct mlx5_ibv_shared {
>  	struct rte_intr_handle intr_handle; /* Interrupt handler for device. */
>  	struct rte_intr_handle intr_handle_devx; /* DEVX interrupt handler. */
>  	struct mlx5dv_devx_cmd_comp *devx_comp; /* DEVX async comp obj. */
> +	struct mlx5_devx_obj *tis; /* TIS object. */
> +	struct mlx5_devx_obj *td; /* Transport domain. */
>  	struct mlx5_ibv_shared_port port[]; /* per device port data array. */
> };
> 
> @@ -911,5 +950,12 @@ struct mlx5_devx_obj *mlx5_devx_cmd_create_tir(struct ibv_context *ctx,
>  					struct mlx5_devx_tir_attr *tir_attr);
>  struct mlx5_devx_obj *mlx5_devx_cmd_create_rqt(struct ibv_context *ctx,
>  					struct mlx5_devx_rqt_attr *rqt_attr);
> +struct mlx5_devx_obj *mlx5_devx_cmd_create_sq
> +	(struct ibv_context *ctx, struct mlx5_devx_create_sq_attr *sq_attr);
> +int mlx5_devx_cmd_modify_sq
> +	(struct mlx5_devx_obj *sq, struct mlx5_devx_modify_sq_attr *sq_attr);
> +struct mlx5_devx_obj *mlx5_devx_cmd_create_tis
> +	(struct ibv_context *ctx, struct mlx5_devx_tis_attr *tis_attr);
> +struct mlx5_devx_obj *mlx5_devx_cmd_create_td(struct ibv_context *ctx);
> 
>  #endif /* RTE_PMD_MLX5_H_ */
> diff --git a/drivers/net/mlx5/mlx5_devx_cmds.c b/drivers/net/mlx5/mlx5_devx_cmds.c
> index b072c37..917bbf9 100644
> --- a/drivers/net/mlx5/mlx5_devx_cmds.c
> +++ b/drivers/net/mlx5/mlx5_devx_cmds.c
> @@ -709,3 +709,189 @@ struct mlx5_devx_obj *
>  	rqt->id = MLX5_GET(create_rqt_out, out, rqtn);
>  	return rqt;
>  }
> +
> +/**
> + * Create SQ using DevX API.
> + *
> + * @param[in] ctx
> + *   ibv_context returned from mlx5dv_open_device.
> + * @param [in] sq_attr
> + *   Pointer to SQ attributes structure.
> + * @param [in] socket
> + *   CPU socket ID for allocations.
> + *
> + * @return
> + *   The DevX object created, NULL otherwise and rte_errno is set.
> + **/
> +struct mlx5_devx_obj *
> +mlx5_devx_cmd_create_sq(struct ibv_context *ctx,
> +			struct mlx5_devx_create_sq_attr *sq_attr)
> +{
> +	uint32_t in[MLX5_ST_SZ_DW(create_sq_in)] = {0};
> +	uint32_t out[MLX5_ST_SZ_DW(create_sq_out)] = {0};
> +	void *sq_ctx;
> +	void *wq_ctx;
> +	struct mlx5_devx_wq_attr *wq_attr;
> +	struct mlx5_devx_obj *sq = NULL;
> +
> +	sq = rte_calloc(__func__, 1, sizeof(*sq), 0);
> +	if (!sq) {
> +		DRV_LOG(ERR, "Failed to allocate SQ data");
> +		rte_errno = ENOMEM;
> +		return NULL;
> +	}
> +	MLX5_SET(create_sq_in, in, opcode, MLX5_CMD_OP_CREATE_SQ);
> +	sq_ctx = MLX5_ADDR_OF(create_sq_in, in, ctx);
> +	MLX5_SET(sqc, sq_ctx, rlky, sq_attr->rlky);
> +	MLX5_SET(sqc, sq_ctx, cd_master, sq_attr->cd_master);
> +	MLX5_SET(sqc, sq_ctx, fre, sq_attr->fre);
> +	MLX5_SET(sqc, sq_ctx, flush_in_error_en, sq_attr->flush_in_error_en);
> +	MLX5_SET(sqc, sq_ctx, allow_multi_pkt_send_wqe,
> +		 sq_attr->allow_multi_pkt_send_wqe);
> +	MLX5_SET(sqc, sq_ctx, min_wqe_inline_mode,
> +		 sq_attr->min_wqe_inline_mode);
> +	MLX5_SET(sqc, sq_ctx, state, sq_attr->state);
> +	MLX5_SET(sqc, sq_ctx, reg_umr, sq_attr->reg_umr);
> +	MLX5_SET(sqc, sq_ctx, allow_swp, sq_attr->allow_swp);
> +	MLX5_SET(sqc, sq_ctx, hairpin, sq_attr->hairpin);
> +	MLX5_SET(sqc, sq_ctx, user_index, sq_attr->user_index);
> +	MLX5_SET(sqc, sq_ctx, cqn, sq_attr->cqn);
> +	MLX5_SET(sqc, sq_ctx, packet_pacing_rate_limit_index,
> +		 sq_attr->packet_pacing_rate_limit_index);
> +	MLX5_SET(sqc, sq_ctx, tis_lst_sz, sq_attr->tis_lst_sz);
> +	MLX5_SET(sqc, sq_ctx, tis_num_0, sq_attr->tis_num);
> +	wq_ctx = MLX5_ADDR_OF(sqc, sq_ctx, wq);
> +	wq_attr = &sq_attr->wq_attr;
> +	devx_cmd_fill_wq_data(wq_ctx, wq_attr);
> +	sq->obj = mlx5_glue->devx_obj_create(ctx, in, sizeof(in),
> +					     out, sizeof(out));
> +	if (!sq->obj) {
> +		DRV_LOG(ERR, "Failed to create SQ using DevX");
> +		rte_errno = errno;
> +		rte_free(sq);
> +		return NULL;
> +	}
> +	sq->id = MLX5_GET(create_sq_out, out, sqn);
> +	return sq;
> +}
> +
> +/**
> + * Modify SQ using DevX API.
> + *
> + * @param[in] sq
> + *   Pointer to SQ object structure.
> + * @param [in] sq_attr
> + *   Pointer to SQ attributes structure.
> + *
> + * @return
> + *   0 on success, a negative errno value otherwise and rte_errno is set.
> + */
> +int
> +mlx5_devx_cmd_modify_sq(struct mlx5_devx_obj *sq,
> +			struct mlx5_devx_modify_sq_attr *sq_attr)
> +{
> +	uint32_t in[MLX5_ST_SZ_DW(modify_sq_in)] = {0};
> +	uint32_t out[MLX5_ST_SZ_DW(modify_sq_out)] = {0};
> +	void *sq_ctx;
> +	int ret;
> +
> +	MLX5_SET(modify_sq_in, in, opcode, MLX5_CMD_OP_MODIFY_SQ);
> +	MLX5_SET(modify_sq_in, in, sq_state, sq_attr->sq_state);
> +	MLX5_SET(modify_sq_in, in, sqn, sq->id);
> +	sq_ctx = MLX5_ADDR_OF(modify_sq_in, in, ctx);
> +	MLX5_SET(sqc, sq_ctx, state, sq_attr->state);
> +	MLX5_SET(sqc, sq_ctx, hairpin_peer_rq, sq_attr->hairpin_peer_rq);
> +	MLX5_SET(sqc, sq_ctx, hairpin_peer_vhca, sq_attr->hairpin_peer_vhca);
> +	ret = mlx5_glue->devx_obj_modify(sq->obj, in, sizeof(in),
> +					 out, sizeof(out));
> +	if (ret) {
> +		DRV_LOG(ERR, "Failed to modify SQ using DevX");
> +		rte_errno = errno;
> +		return -errno;
> +	}
> +	return ret;
> +}
> +
> +/**
> + * Create TIS using DevX API.
> + *
> + * @param[in] ctx
> + *   ibv_context returned from mlx5dv_open_device.
> + * @param [in] tis_attr
> + *   Pointer to TIS attributes structure.
> + *
> + * @return
> + *   The DevX object created, NULL otherwise and rte_errno is set.
> + */
> +struct mlx5_devx_obj *
> +mlx5_devx_cmd_create_tis(struct ibv_context *ctx,
> +			 struct mlx5_devx_tis_attr *tis_attr)
> +{
> +	uint32_t in[MLX5_ST_SZ_DW(create_tis_in)] = {0};
> +	uint32_t out[MLX5_ST_SZ_DW(create_tis_out)] = {0};
> +	struct mlx5_devx_obj *tis = NULL;
> +	void *tis_ctx;
> +
> +	tis = rte_calloc(__func__, 1, sizeof(*tis), 0);
> +	if (!tis) {
> +		DRV_LOG(ERR, "Failed to allocate TIS object");
> +		rte_errno = ENOMEM;
> +		return NULL;
> +	}
> +	MLX5_SET(create_tis_in, in, opcode, MLX5_CMD_OP_CREATE_TIS);
> +	tis_ctx = MLX5_ADDR_OF(create_tis_in, in, ctx);
> +	MLX5_SET(tisc, tis_ctx, strict_lag_tx_port_affinity,
> +		 tis_attr->strict_lag_tx_port_affinity);
> +	MLX5_SET(tisc, tis_ctx, lag_tx_port_affinity,
> +		 tis_attr->lag_tx_port_affinity);
> +	MLX5_SET(tisc, tis_ctx, prio, tis_attr->prio);
> +	MLX5_SET(tisc, tis_ctx, transport_domain,
> +		 tis_attr->transport_domain);
> +	tis->obj = mlx5_glue->devx_obj_create(ctx, in, sizeof(in),
> +					      out, sizeof(out));
> +	if (!tis->obj) {
> +		DRV_LOG(ERR, "Failed to create TIS using DevX");
> +		rte_errno = errno;
> +		rte_free(tis);
> +		return NULL;
> +	}
> +	tis->id = MLX5_GET(create_tis_out, out, tisn);
> +	return tis;
> +}
> +
> +/**
> + * Create transport domain using DevX API.
> + *
> + * @param[in] ctx
> + *   ibv_context returned from mlx5dv_open_device.
> + *
> + * @return
> + *   The DevX object created, NULL otherwise and rte_errno is set.
> + */
> +struct mlx5_devx_obj *
> +mlx5_devx_cmd_create_td(struct ibv_context *ctx)
> +{
> +	uint32_t in[MLX5_ST_SZ_DW(alloc_transport_domain_in)] = {0};
> +	uint32_t out[MLX5_ST_SZ_DW(alloc_transport_domain_out)] = {0};
> +	struct mlx5_devx_obj *td = NULL;
> +
> +	td = rte_calloc(__func__, 1, sizeof(*td), 0);
> +	if (!td) {
> +		DRV_LOG(ERR, "Failed to allocate TD object");
> +		rte_errno = ENOMEM;
> +		return NULL;
> +	}
> +	MLX5_SET(alloc_transport_domain_in, in, opcode,
> +		 MLX5_CMD_OP_ALLOC_TRANSPORT_DOMAIN);
> +	td->obj = mlx5_glue->devx_obj_create(ctx, in, sizeof(in),
> +					     out, sizeof(out));
> +	if (!td->obj) {
> +		DRV_LOG(ERR, "Failed to create TIS using DevX");
> +		rte_errno = errno;
> +		rte_free(td);
> +		return NULL;
> +	}
> +	td->id = MLX5_GET(alloc_transport_domain_out, out,
> +			   transport_domain);
> +	return td;
> +}
> diff --git a/drivers/net/mlx5/mlx5_prm.h b/drivers/net/mlx5/mlx5_prm.h
> index 3765df0..faa7996 100644
> --- a/drivers/net/mlx5/mlx5_prm.h
> +++ b/drivers/net/mlx5/mlx5_prm.h
> @@ -666,9 +666,13 @@ enum {
>  	MLX5_CMD_OP_QUERY_HCA_CAP = 0x100,
>  	MLX5_CMD_OP_CREATE_MKEY = 0x200,
>  	MLX5_CMD_OP_QUERY_NIC_VPORT_CONTEXT = 0x754,
> +	MLX5_CMD_OP_ALLOC_TRANSPORT_DOMAIN = 0x816,
>  	MLX5_CMD_OP_CREATE_TIR = 0x900,
> +	MLX5_CMD_OP_CREATE_SQ = 0X904,
> +	MLX5_CMD_OP_MODIFY_SQ = 0X905,
>  	MLX5_CMD_OP_CREATE_RQ = 0x908,
>  	MLX5_CMD_OP_MODIFY_RQ = 0x909,
> +	MLX5_CMD_OP_CREATE_TIS = 0x912,
>  	MLX5_CMD_OP_QUERY_TIS = 0x915,
>  	MLX5_CMD_OP_CREATE_RQT = 0x916,
>  	MLX5_CMD_OP_ALLOC_FLOW_COUNTER = 0x939,
> @@ -1311,6 +1315,23 @@ struct mlx5_ifc_query_tis_in_bits {
>  	u8 reserved_at_60[0x20];
>  };
> 
> +struct mlx5_ifc_alloc_transport_domain_out_bits {
> +	u8 status[0x8];
> +	u8 reserved_at_8[0x18];
> +	u8 syndrome[0x20];
> +	u8 reserved_at_40[0x8];
> +	u8 transport_domain[0x18];
> +	u8 reserved_at_60[0x20];
> +};
> +
> +struct mlx5_ifc_alloc_transport_domain_in_bits {
> +	u8 opcode[0x10];
> +	u8 reserved_at_10[0x10];
> +	u8 reserved_at_20[0x10];
> +	u8 op_mod[0x10];
> +	u8 reserved_at_40[0x40];
> +};
> +
>  enum {
>  	MLX5_WQ_TYPE_LINKED_LIST                = 0x0,
>  	MLX5_WQ_TYPE_CYCLIC                     = 0x1,
> @@ -1427,6 +1448,24 @@ struct mlx5_ifc_modify_rq_out_bits {
>  	u8 reserved_at_40[0x40];
>  };
> 
> +struct mlx5_ifc_create_tis_out_bits {
> +	u8 status[0x8];
> +	u8 reserved_at_8[0x18];
> +	u8 syndrome[0x20];
> +	u8 reserved_at_40[0x8];
> +	u8 tisn[0x18];
> +	u8 reserved_at_60[0x20];
> +};
> +
> +struct mlx5_ifc_create_tis_in_bits {
> +	u8 opcode[0x10];
> +	u8 uid[0x10];
> +	u8 reserved_at_20[0x10];
> +	u8 op_mod[0x10];
> +	u8 reserved_at_40[0xc0];
> +	struct mlx5_ifc_tisc_bits ctx;
> +};
> +
>  enum {
>  	MLX5_MODIFY_RQ_IN_MODIFY_BITMASK_WQ_LWM = 1ULL << 0,
>  	MLX5_MODIFY_RQ_IN_MODIFY_BITMASK_VSD = 1ULL << 1,
> @@ -1572,6 +1611,85 @@ struct mlx5_ifc_create_rqt_in_bits {
>  #pragma GCC diagnostic error "-Wpedantic"
>  #endif
> 
> +struct mlx5_ifc_sqc_bits {
> +	u8 rlky[0x1];
> +	u8 cd_master[0x1];
> +	u8 fre[0x1];
> +	u8 flush_in_error_en[0x1];
> +	u8 allow_multi_pkt_send_wqe[0x1];
> +	u8 min_wqe_inline_mode[0x3];
> +	u8 state[0x4];
> +	u8 reg_umr[0x1];
> +	u8 allow_swp[0x1];
> +	u8 hairpin[0x1];
> +	u8 reserved_at_f[0x11];
> +	u8 reserved_at_20[0x8];
> +	u8 user_index[0x18];
> +	u8 reserved_at_40[0x8];
> +	u8 cqn[0x18];
> +	u8 reserved_at_60[0x8];
> +	u8 hairpin_peer_rq[0x18];
> +	u8 reserved_at_80[0x10];
> +	u8 hairpin_peer_vhca[0x10];
> +	u8 reserved_at_a0[0x50];
> +	u8 packet_pacing_rate_limit_index[0x10];
> +	u8 tis_lst_sz[0x10];
> +	u8 reserved_at_110[0x10];
> +	u8 reserved_at_120[0x40];
> +	u8 reserved_at_160[0x8];
> +	u8 tis_num_0[0x18];
> +	struct mlx5_ifc_wq_bits wq;
> +};
> +
> +struct mlx5_ifc_query_sq_in_bits {
> +	u8 opcode[0x10];
> +	u8 reserved_at_10[0x10];
> +	u8 reserved_at_20[0x10];
> +	u8 op_mod[0x10];
> +	u8 reserved_at_40[0x8];
> +	u8 sqn[0x18];
> +	u8 reserved_at_60[0x20];
> +};
> +
> +struct mlx5_ifc_modify_sq_out_bits {
> +	u8 status[0x8];
> +	u8 reserved_at_8[0x18];
> +	u8 syndrome[0x20];
> +	u8 reserved_at_40[0x40];
> +};
> +
> +struct mlx5_ifc_modify_sq_in_bits {
> +	u8 opcode[0x10];
> +	u8 uid[0x10];
> +	u8 reserved_at_20[0x10];
> +	u8 op_mod[0x10];
> +	u8 sq_state[0x4];
> +	u8 reserved_at_44[0x4];
> +	u8 sqn[0x18];
> +	u8 reserved_at_60[0x20];
> +	u8 modify_bitmask[0x40];
> +	u8 reserved_at_c0[0x40];
> +	struct mlx5_ifc_sqc_bits ctx;
> +};
> +
> +struct mlx5_ifc_create_sq_out_bits {
> +	u8 status[0x8];
> +	u8 reserved_at_8[0x18];
> +	u8 syndrome[0x20];
> +	u8 reserved_at_40[0x8];
> +	u8 sqn[0x18];
> +	u8 reserved_at_60[0x20];
> +};
> +
> +struct mlx5_ifc_create_sq_in_bits {
> +	u8 opcode[0x10];
> +	u8 uid[0x10];
> +	u8 reserved_at_20[0x10];
> +	u8 op_mod[0x10];
> +	u8 reserved_at_40[0xc0];
> +	struct mlx5_ifc_sqc_bits ctx;
> +};
> +
>  /* CQE format mask. */
>  #define MLX5E_CQE_FORMAT_MASK 0xc
> 
> diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
> index 2de674a..8fa22e5 100644
> --- a/drivers/net/mlx5/mlx5_rxtx.h
> +++ b/drivers/net/mlx5/mlx5_rxtx.h
> @@ -324,14 +324,18 @@ struct mlx5_txq_obj {
>  	LIST_ENTRY(mlx5_txq_obj) next; /* Pointer to the next element. */
>  	rte_atomic32_t refcnt; /* Reference counter. */
>  	struct mlx5_txq_ctrl *txq_ctrl; /* Pointer to the control queue. */
> -	enum mlx5_rxq_obj_type type; /* The txq object type. */
> +	enum mlx5_txq_obj_type type; /* The txq object type. */
>  	RTE_STD_C11
>  	union {
>  		struct {
>  			struct ibv_cq *cq; /* Completion Queue. */
>  			struct ibv_qp *qp; /* Queue Pair. */
>  		};
> -		struct mlx5_devx_obj *sq; /* DevX object for Sx queue. */
> +		struct {
> +			struct mlx5_devx_obj *sq;
> +			/* DevX object for Sx queue. */
> +			struct mlx5_devx_obj *tis; /* The TIS object. */
> +		};
>  	};
>  };
> 
> @@ -348,6 +352,7 @@ struct mlx5_txq_ctrl {
>  	off_t uar_mmap_offset; /* UAR mmap offset for non-primary process. */
>  	void *bf_reg; /* BlueFlame register from Verbs. */
>  	uint16_t dump_file_n; /* Number of dump files. */
> +	struct rte_eth_hairpin_conf hairpin_conf; /* Hairpin configuration. */
>  	struct mlx5_txq_data txq; /* Data path structure. */
>  	/* Must be the last field in the structure, contains elts[]. */
>  };
> @@ -412,15 +417,24 @@ struct mlx5_hrxq *mlx5_hrxq_get(struct rte_eth_dev *dev,
> 
>  int mlx5_tx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
>  			unsigned int socket, const struct rte_eth_txconf *conf);
> +int mlx5_tx_hairpin_queue_setup
> +	(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
> +	 unsigned int socket, const struct rte_eth_txconf *conf,
> +	 const struct rte_eth_hairpin_conf *hairpin_conf);
>  void mlx5_tx_queue_release(void *dpdk_txq);
>  int mlx5_tx_uar_init_secondary(struct rte_eth_dev *dev, int fd);
> -struct mlx5_txq_obj *mlx5_txq_obj_new(struct rte_eth_dev *dev, uint16_t idx);
> +struct mlx5_txq_obj *mlx5_txq_obj_new(struct rte_eth_dev *dev, uint16_t idx,
> +				      enum mlx5_txq_obj_type type);
>  struct mlx5_txq_obj *mlx5_txq_obj_get(struct rte_eth_dev *dev, uint16_t idx);
>  int mlx5_txq_obj_release(struct mlx5_txq_obj *txq_ibv);
>  int mlx5_txq_obj_verify(struct rte_eth_dev *dev);
>  struct mlx5_txq_ctrl *mlx5_txq_new(struct rte_eth_dev *dev, uint16_t idx,
>  				   uint16_t desc, unsigned int socket,
>  				   const struct rte_eth_txconf *conf);
> +struct mlx5_txq_ctrl *mlx5_txq_hairpin_new
> +	(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
> +	 unsigned int socket, const struct rte_eth_txconf *conf,
> +	 const struct rte_eth_hairpin_conf *hairpin_conf);
>  struct mlx5_txq_ctrl *mlx5_txq_get(struct rte_eth_dev *dev, uint16_t idx);
>  int mlx5_txq_release(struct rte_eth_dev *dev, uint16_t idx);
>  int mlx5_txq_releasable(struct rte_eth_dev *dev, uint16_t idx);
> diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
> index 50c4df5..3ec86c4 100644
> --- a/drivers/net/mlx5/mlx5_trigger.c
> +++ b/drivers/net/mlx5/mlx5_trigger.c
> @@ -51,8 +51,14 @@
> 
>  		if (!txq_ctrl)
>  			continue;
> -		txq_alloc_elts(txq_ctrl);
> -		txq_ctrl->obj = mlx5_txq_obj_new(dev, i);
> +		if (txq_ctrl->type == MLX5_TXQ_TYPE_HAIRPIN) {
> +			txq_ctrl->obj = mlx5_txq_obj_new
> +				(dev, i, MLX5_TXQ_OBJ_TYPE_DEVX_HAIRPIN);
> +		} else {
> +			txq_alloc_elts(txq_ctrl);
> +			txq_ctrl->obj = mlx5_txq_obj_new
> +				(dev, i, MLX5_TXQ_OBJ_TYPE_IBV);
> +		}
>  		if (!txq_ctrl->obj) {
>  			rte_errno = ENOMEM;
>  			goto error;
> diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
> index e1ed4eb..44233e9 100644
> --- a/drivers/net/mlx5/mlx5_txq.c
> +++ b/drivers/net/mlx5/mlx5_txq.c
> @@ -136,30 +136,22 @@
>  }
> 
>  /**
> - * DPDK callback to configure a TX queue.
> + * Tx queue presetup checks.
>   *
>   * @param dev
>   *   Pointer to Ethernet device structure.
>   * @param idx
> - *   TX queue index.
> + *   Tx queue index.
>   * @param desc
>   *   Number of descriptors to configure in queue.
> - * @param socket
> - *   NUMA socket on which memory must be allocated.
> - * @param[in] conf
> - *   Thresholds parameters.
>   *
>   * @return
>   *   0 on success, a negative errno value otherwise and rte_errno is set.
>   */
> -int
> -mlx5_tx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
> -		    unsigned int socket, const struct rte_eth_txconf *conf)
> +static int
> +mlx5_tx_queue_pre_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc)
>  {
>  	struct mlx5_priv *priv = dev->data->dev_private;
> -	struct mlx5_txq_data *txq = (*priv->txqs)[idx];
> -	struct mlx5_txq_ctrl *txq_ctrl =
> -		container_of(txq, struct mlx5_txq_ctrl, txq);
> 
>  	if (desc <= MLX5_TX_COMP_THRESH) {
>  		DRV_LOG(WARNING,
> @@ -191,6 +183,38 @@
>  		return -rte_errno;
>  	}
>  	mlx5_txq_release(dev, idx);
> +	return 0;
> +}
> +
> +/**
> + * DPDK callback to configure a TX queue.
> + *
> + * @param dev
> + *   Pointer to Ethernet device structure.
> + * @param idx
> + *   TX queue index.
> + * @param desc
> + *   Number of descriptors to configure in queue.
> + * @param socket
> + *   NUMA socket on which memory must be allocated.
> + * @param[in] conf
> + *   Thresholds parameters.
> + *
> + * @return
> + *   0 on success, a negative errno value otherwise and rte_errno is set.
> + */
> +int
> +mlx5_tx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
> +		    unsigned int socket, const struct rte_eth_txconf *conf)
> +{
> +	struct mlx5_priv *priv = dev->data->dev_private;
> +	struct mlx5_txq_data *txq = (*priv->txqs)[idx];
> +	struct mlx5_txq_ctrl *txq_ctrl =
> +		container_of(txq, struct mlx5_txq_ctrl, txq);
> +	int res;
> +
> +	res = mlx5_tx_queue_pre_setup(dev, idx, desc);
> +	if (res)
> +		return res;
>  	txq_ctrl = mlx5_txq_new(dev, idx, desc, socket, conf);
>  	if (!txq_ctrl) {
>  		DRV_LOG(ERR, "port %u unable to allocate queue index %u",
> @@ -204,6 +228,63 @@  }
> 
>  /**
> + * DPDK callback to configure a TX hairpin queue.
> + *
> + * @param dev
> + *   Pointer to Ethernet device structure.
> + * @param idx
> + *   TX queue index.
> + * @param desc
> + *   Number of descriptors to configure in queue.
> + * @param socket
> + *   NUMA socket on which memory must be allocated.
> + * @param[in] conf
> + *   Thresholds parameters.
> + * @param[in] hairpin_conf
> + *   The hairpin binding configuration.
> + *
> + * @return
> + *   0 on success, a negative errno value otherwise and rte_errno is set.
> + */
> +int
> +mlx5_tx_hairpin_queue_setup(struct rte_eth_dev *dev, uint16_t idx,
> +			    uint16_t desc, unsigned int socket,
> +			    const struct rte_eth_txconf *conf,
> +			    const struct rte_eth_hairpin_conf *hairpin_conf)
> +{
> +	struct mlx5_priv *priv = dev->data->dev_private;
> +	struct mlx5_txq_data *txq = (*priv->txqs)[idx];
> +	struct mlx5_txq_ctrl *txq_ctrl =
> +		container_of(txq, struct mlx5_txq_ctrl, txq);
> +	int res;
> +
> +	res = mlx5_tx_queue_pre_setup(dev, idx, desc);
> +	if (res)
> +		return res;
> +	if (hairpin_conf->peer_n != 1 ||
> +	    hairpin_conf->peers[0].port != dev->data->port_id ||
> +	    hairpin_conf->peers[0].queue >= priv->rxqs_n) {
> +		DRV_LOG(ERR, "port %u unable to setup hairpin queue index
> %u "
> +			" invalid hairpind configuration", dev->data->port_id,
> +			idx);
> +		rte_errno = EINVAL;
> +		return -rte_errno;
> +	}
> +	txq_ctrl = mlx5_txq_hairpin_new(dev, idx, desc, socket, conf,
> +					hairpin_conf);
> +	if (!txq_ctrl) {
> +		DRV_LOG(ERR, "port %u unable to allocate queue index %u",
> +			dev->data->port_id, idx);
> +		return -rte_errno;
> +	}
> +	DRV_LOG(DEBUG, "port %u adding Tx queue %u to list",
> +		dev->data->port_id, idx);
> +	(*priv->txqs)[idx] = &txq_ctrl->txq;
> +	txq_ctrl->type = MLX5_TXQ_TYPE_HAIRPIN;
> +	return 0;
> +}
> +
> +/**
>   * DPDK callback to release a TX queue.
>   *
>   * @param dpdk_txq
> @@ -246,6 +327,8 @@
>  	const size_t page_size = sysconf(_SC_PAGESIZE);
>  #endif
> 
> +	if (txq_ctrl->type != MLX5_TXQ_TYPE_STANDARD)
> +		return;
>  	assert(rte_eal_process_type() == RTE_PROC_PRIMARY);
>  	assert(ppriv);
>  	ppriv->uar_table[txq_ctrl->txq.idx] = txq_ctrl->bf_reg;
> @@ -282,6 +365,8 @@
>  	uintptr_t offset;
>  	const size_t page_size = sysconf(_SC_PAGESIZE);
> 
> +	if (txq_ctrl->type != MLX5_TXQ_TYPE_STANDARD)
> +		return 0;
>  	assert(ppriv);
>  	/*
>  	 * As rdma-core, UARs are mapped in size of OS page
> @@ -316,6 +401,8 @@
>  	const size_t page_size = sysconf(_SC_PAGESIZE);
>  	void *addr;
> 
> +	if (txq_ctrl->type != MLX5_TXQ_TYPE_STANDARD)
> +		return;
>  	addr = ppriv->uar_table[txq_ctrl->txq.idx];
>  	munmap(RTE_PTR_ALIGN_FLOOR(addr, page_size), page_size);
>  }
> @@ -346,6 +433,8 @@
>  			continue;
>  		txq = (*priv->txqs)[i];
>  		txq_ctrl = container_of(txq, struct mlx5_txq_ctrl, txq);
> +		if (txq_ctrl->type != MLX5_TXQ_TYPE_STANDARD)
> +			continue;
>  		assert(txq->idx == (uint16_t)i);
>  		ret = txq_uar_init_secondary(txq_ctrl, fd);
>  		if (ret)
> @@ -365,18 +454,87 @@
>  }
> 
>  /**
> + * Create the Tx hairpin queue object.
> + *
> + * @param dev
> + *   Pointer to Ethernet device.
> + * @param idx
> + *   Queue index in DPDK Tx queue array
> + *
> + * @return
> + *   The hairpin DevX object initialised, NULL otherwise and rte_errno is set.
> + */
> +static struct mlx5_txq_obj *
> +mlx5_txq_obj_hairpin_new(struct rte_eth_dev *dev, uint16_t idx)
> +{
> +	struct mlx5_priv *priv = dev->data->dev_private;
> +	struct mlx5_txq_data *txq_data = (*priv->txqs)[idx];
> +	struct mlx5_txq_ctrl *txq_ctrl =
> +		container_of(txq_data, struct mlx5_txq_ctrl, txq);
> +	struct mlx5_devx_create_sq_attr attr = { 0 };
> +	struct mlx5_txq_obj *tmpl = NULL;
> +	int ret = 0;
> +
> +	assert(txq_data);
> +	assert(!txq_ctrl->obj);
> +	tmpl = rte_calloc_socket(__func__, 1, sizeof(*tmpl), 0,
> +				 txq_ctrl->socket);
> +	if (!tmpl) {
> +		DRV_LOG(ERR,
> +			"port %u Tx queue %u cannot allocate memory
> resources",
> +			dev->data->port_id, txq_data->idx);
> +		rte_errno = ENOMEM;
> +		goto error;
> +	}
> +	tmpl->type = MLX5_TXQ_OBJ_TYPE_DEVX_HAIRPIN;
> +	tmpl->txq_ctrl = txq_ctrl;
> +	attr.hairpin = 1;
> +	attr.tis_lst_sz = 1;
> +	/* Workaround for hairpin startup */
> +	attr.wq_attr.log_hairpin_num_packets = log2above(32);
> +	/* Workaround for packets larger than 1KB */
> +	attr.wq_attr.log_hairpin_data_sz =
> +			priv->config.hca_attr.log_max_hairpin_wq_data_sz;
> +	attr.tis_num = priv->sh->tis->id;
> +	tmpl->sq = mlx5_devx_cmd_create_sq(priv->sh->ctx, &attr);
> +	if (!tmpl->sq) {
> +		DRV_LOG(ERR,
> +			"port %u tx hairpin queue %u can't create sq object",
> +			dev->data->port_id, idx);
> +		rte_errno = errno;
> +		goto error;
> +	}
> +	DRV_LOG(DEBUG, "port %u sxq %u updated with %p", dev->data-
> >port_id,
> +		idx, (void *)&tmpl);
> +	rte_atomic32_inc(&tmpl->refcnt);
> +	LIST_INSERT_HEAD(&priv->txqsobj, tmpl, next);
> +	return tmpl;
> +error:
> +	ret = rte_errno; /* Save rte_errno before cleanup. */
> +	if (tmpl) {
> +		if (tmpl->tis)
> +			mlx5_devx_cmd_destroy(tmpl->tis);
> +		if (tmpl->sq)
> +			mlx5_devx_cmd_destroy(tmpl->sq);
> +		rte_free(tmpl);
> +	}
> +	rte_errno = ret; /* Restore rte_errno. */
> +	return NULL;
> +}
> +
> +/**
>   * Create the Tx queue Verbs object.
>   *
>   * @param dev
>   *   Pointer to Ethernet device.
>   * @param idx
>   *   Queue index in DPDK Tx queue array.
> + * @param type
> + *   Type of the Tx queue object to create.
>   *
>   * @return
>   *   The Verbs object initialised, NULL otherwise and rte_errno is set.
>   */
>  struct mlx5_txq_obj *
> -mlx5_txq_obj_new(struct rte_eth_dev *dev, uint16_t idx)
> +mlx5_txq_obj_new(struct rte_eth_dev *dev, uint16_t idx,
> +		 enum mlx5_txq_obj_type type)
>  {
>  	struct mlx5_priv *priv = dev->data->dev_private;
>  	struct mlx5_txq_data *txq_data = (*priv->txqs)[idx];
> @@ -396,6 +554,8 @@ struct mlx5_txq_obj *
>  	const int desc = 1 << txq_data->elts_n;
>  	int ret = 0;
> 
> +	if (type == MLX5_TXQ_OBJ_TYPE_DEVX_HAIRPIN)
> +		return mlx5_txq_obj_hairpin_new(dev, idx);
>  #ifdef HAVE_IBV_FLOW_DV_SUPPORT
>  	/* If using DevX, need additional mask to read tisn value. */
>  	if (priv->config.devx && !priv->sh->tdn)
> @@ -643,8 +803,13 @@ struct mlx5_txq_obj *
>  {
>  	assert(txq_obj);
>  	if (rte_atomic32_dec_and_test(&txq_obj->refcnt)) {
> -		claim_zero(mlx5_glue->destroy_qp(txq_obj->qp));
> -		claim_zero(mlx5_glue->destroy_cq(txq_obj->cq));
> +		if (txq_obj->type == MLX5_TXQ_OBJ_TYPE_DEVX_HAIRPIN) {
> +			if (txq_obj->tis)
> +				claim_zero(mlx5_devx_cmd_destroy(txq_obj->tis));
> +		} else {
> +			claim_zero(mlx5_glue->destroy_qp(txq_obj->qp));
> +			claim_zero(mlx5_glue->destroy_cq(txq_obj->cq));
> +		}
>  		LIST_REMOVE(txq_obj, next);
>  		rte_free(txq_obj);
>  		return 0;
> @@ -953,6 +1118,7 @@ struct mlx5_txq_ctrl *
>  		goto error;
>  	}
>  	rte_atomic32_inc(&tmpl->refcnt);
> +	tmpl->type = MLX5_TXQ_TYPE_STANDARD;
>  	LIST_INSERT_HEAD(&priv->txqsctrl, tmpl, next);
>  	return tmpl;
>  error:
> @@ -961,6 +1127,55 @@ struct mlx5_txq_ctrl *
>  }
> 
>  /**
> + * Create a DPDK Tx hairpin queue.
> + *
> + * @param dev
> + *   Pointer to Ethernet device.
> + * @param idx
> + *   TX queue index.
> + * @param desc
> + *   Number of descriptors to configure in queue.
> + * @param socket
> + *   NUMA socket on which memory must be allocated.
> + * @param[in] conf
> + *  Thresholds parameters.
> + * @param hairpin_conf
> + *  The hairpin configuration.
> + *
> + * @return
> + *   A DPDK queue object on success, NULL otherwise and rte_errno is set.
> + */
> +struct mlx5_txq_ctrl *
> +mlx5_txq_hairpin_new(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
> +		     unsigned int socket, const struct rte_eth_txconf *conf,
> +		     const struct rte_eth_hairpin_conf *hairpin_conf)
> +{
> +	struct mlx5_priv *priv = dev->data->dev_private;
> +	struct mlx5_txq_ctrl *tmpl;
> +
> +	tmpl = rte_calloc_socket("TXQ", 1,
> +				 sizeof(*tmpl),
> +				 0, socket);
> +	if (!tmpl) {
> +		rte_errno = ENOMEM;
> +		return NULL;
> +	}
> +	assert(desc > MLX5_TX_COMP_THRESH);
> +	tmpl->txq.offloads = conf->offloads |
> +			     dev->data->dev_conf.txmode.offloads;
> +	tmpl->priv = priv;
> +	tmpl->socket = socket;
> +	tmpl->txq.elts_n = log2above(desc);
> +	tmpl->txq.port_id = dev->data->port_id;
> +	tmpl->txq.idx = idx;
> +	tmpl->hairpin_conf = *hairpin_conf;
> +	tmpl->type = MLX5_TXQ_TYPE_HAIRPIN;
> +	rte_atomic32_inc(&tmpl->refcnt);
> +	LIST_INSERT_HEAD(&priv->txqsctrl, tmpl, next);
> +	return tmpl;
> +}
> +
> +/**
>   * Get a Tx queue.
>   *
>   * @param dev
> --
> 1.8.3.1


^ permalink raw reply	[flat|nested] 186+ messages in thread
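
Pulling the pieces of patch 05 together, the object chain behind one Tx
hairpin queue is: transport domain -> TIS -> hairpin SQ. Below is a
condensed sketch of the creation order, using the names from the patch
with error handling and unrelated attributes dropped; it is a reading
aid, not a replacement for the diff.

/* At device spawn (mlx5.c): one TD and one TIS per shared context. */
struct mlx5_devx_tis_attr tis_attr = { 0 };

sh->td = mlx5_devx_cmd_create_td(sh->ctx);
tis_attr.transport_domain = sh->td->id;
sh->tis = mlx5_devx_cmd_create_tis(sh->ctx, &tis_attr);

/* At queue-object creation (mlx5_txq.c): a hairpin SQ on that TIS. */
struct mlx5_devx_create_sq_attr sq_attr = { 0 };

sq_attr.hairpin = 1;
sq_attr.tis_lst_sz = 1;
sq_attr.tis_num = sh->tis->id;
tmpl->sq = mlx5_devx_cmd_create_sq(sh->ctx, &sq_attr);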

* Re: [dpdk-dev] [PATCH 06/13] app/testpmd: add hairpin support
  2019-09-26  6:29 ` [dpdk-dev] [PATCH 06/13] app/testpmd: add hairpin support Ori Kam
@ 2019-09-26  9:32   ` Slava Ovsiienko
  0 siblings, 0 replies; 186+ messages in thread
From: Slava Ovsiienko @ 2019-09-26  9:32 UTC (permalink / raw)
  To: Ori Kam, Wenzhuo Lu, Jingjing Wu, Bernard Iremonger; +Cc: dev, Ori Kam, stephen

> -----Original Message-----
> From: dev <dev-bounces@dpdk.org> On Behalf Of Ori Kam
> Sent: Thursday, September 26, 2019 9:29
> To: Wenzhuo Lu <wenzhuo.lu@intel.com>; Jingjing Wu
> <jingjing.wu@intel.com>; Bernard Iremonger
> <bernard.iremonger@intel.com>
> Cc: dev@dpdk.org; Ori Kam <orika@mellanox.com>;
> stephen@networkplumber.org
> Subject: [dpdk-dev] [PATCH 06/13] app/testpmd: add hairpin support
> 
> This commit introduces hairpin queues to testpmd.
> The hairpin queues are configured using --hairpinq=<n>; this option adds n
> queue objects to both the total number of TX queues and RX queues.
> The connection between the queues is 1 to 1: the first Rx hairpin queue is
> connected to the first Tx hairpin queue.
> 
> Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>

> ---
>  app/test-pmd/parameters.c | 12 ++++++++++
>  app/test-pmd/testpmd.c    | 59 +++++++++++++++++++++++++++++++++++++++++++++--
>  app/test-pmd/testpmd.h    |  1 +
>  3 files changed, 70 insertions(+), 2 deletions(-)
> 
> diff --git a/app/test-pmd/parameters.c b/app/test-pmd/parameters.c
> index 6c78dca..16bdcc8 100644
> --- a/app/test-pmd/parameters.c
> +++ b/app/test-pmd/parameters.c
> @@ -147,6 +147,8 @@
>  	printf("  --rxd=N: set the number of descriptors in RX rings to N.\n");
>  	printf("  --txq=N: set the number of TX queues per port to N.\n");
>  	printf("  --txd=N: set the number of descriptors in TX rings to N.\n");
> +	printf("  --hairpinq=N: set the number of hairpin queues per port to "
> +	       "N.\n");
>  	printf("  --burst=N: set the number of packets per burst to N.\n");
>  	printf("  --mbcache=N: set the cache of mbuf memory pool to N.\n");
>  	printf("  --rxpt=N: set prefetch threshold register of RX rings to
> N.\n"); @@ -618,6 +620,7 @@
>  		{ "txq",			1, 0, 0 },
>  		{ "rxd",			1, 0, 0 },
>  		{ "txd",			1, 0, 0 },
> +		{ "hairpinq",			1, 0, 0 },
>  		{ "burst",			1, 0, 0 },
>  		{ "mbcache",			1, 0, 0 },
>  		{ "txpt",			1, 0, 0 },
> @@ -1036,6 +1039,15 @@
>  						  " >= 0 && <= %u\n", n,
> 
> get_allowed_max_nb_txq(&pid));
>  			}
> +			if (!strcmp(lgopts[opt_idx].name, "hairpinq")) {
> +				n = atoi(optarg);
> +				if (n >= 0 && check_nb_txq((queueid_t)n) == 0)
> +					nb_hairpinq = (queueid_t) n;
> +				else
> +					rte_exit(EXIT_FAILURE, "hairpinq %d invalid - must be"
> +						  " >= 0 && <= %u\n", n,
> +						  get_allowed_max_nb_txq(&pid));
> +			}
>  			if (!nb_rxq && !nb_txq) {
>  				rte_exit(EXIT_FAILURE, "Either rx or tx
> queues should "
>  						"be non-zero\n");
> diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
> index de91e1b..f15a308 100644
> --- a/app/test-pmd/testpmd.c
> +++ b/app/test-pmd/testpmd.c
> @@ -235,6 +235,7 @@ struct fwd_engine * fwd_engines[] = {
>  /*
>   * Configurable number of RX/TX queues.
>   */
> +queueid_t nb_hairpinq; /**< Number of hairpin queues per port. */
>  queueid_t nb_rxq = 1; /**< Number of RX queues per port. */
>  queueid_t nb_txq = 1; /**< Number of TX queues per port. */
> 
> @@ -2064,6 +2065,10 @@ struct extmem_param {
>  	queueid_t qi;
>  	struct rte_port *port;
>  	struct rte_ether_addr mac_addr;
> +	struct rte_eth_hairpin_conf hairpin_conf = {
> +		.peer_n = 1,
> +	};
> +	int i;
> 
>  	if (port_id_is_invalid(pid, ENABLED_WARN))
>  		return 0;
> @@ -2097,8 +2102,9 @@ struct extmem_param {
>  			printf("Configuring Port %d (socket %u)\n", pi,
>  					port->socket_id);
>  			/* configure port */
> -			diag = rte_eth_dev_configure(pi, nb_rxq, nb_txq,
> -						&(port->dev_conf));
> +			diag = rte_eth_dev_configure(pi, nb_rxq + nb_hairpinq,
> +						     nb_txq + nb_hairpinq,
> +						     &(port->dev_conf));
>  			if (diag != 0) {
>  				if (rte_atomic16_cmpset(&(port->port_status),
>  				RTE_PORT_HANDLING, RTE_PORT_STOPPED) == 0)
> @@ -2191,6 +2197,55 @@ struct extmem_param {
>  				port->need_reconfig_queues = 1;
>  				return -1;
>  			}
> +			/* setup hairpin queues */
> +			i = 0;
> +			for (qi = nb_txq; qi < nb_hairpinq + nb_txq; qi++) {
> +				hairpin_conf.peers[0].port = pi;
> +				hairpin_conf.peers[0].queue = i + nb_rxq;
> +				diag = rte_eth_tx_hairpin_queue_setup
> +					(pi, qi, nb_txd,
> +					 port->socket_id, &(port->tx_conf[qi]),
> +					 &hairpin_conf);
> +				i++;
> +				if (diag == 0)
> +					continue;
> +
> +				/* Fail to setup tx hairpin queue, return */
> +				if (rte_atomic16_cmpset(&(port->port_status),
> +							RTE_PORT_HANDLING,
> +							RTE_PORT_STOPPED) == 0)
> +					printf("Port %d can not be set back "
> +							"to stopped\n", pi);
> +				printf("Fail to configure port %d hairpin "
> +				       "queues\n", pi);
> +				/* try to reconfigure queues next time */
> +				port->need_reconfig_queues = 1;
> +				return -1;
> +			}
> +			i = 0;
> +			for (qi = nb_rxq; qi < nb_hairpinq + nb_rxq; qi++) {
> +				hairpin_conf.peers[0].port = pi;
> +				hairpin_conf.peers[0].queue = i + nb_txq;
> +				diag = rte_eth_rx_hairpin_queue_setup
> +					(pi, qi, nb_rxd,
> +					 port->socket_id, &(port->rx_conf[qi]),
> +					 &hairpin_conf);
> +				i++;
> +				if (diag == 0)
> +					continue;
> +
> +				/* Fail to setup rx hairpin queue, return */
> +				if (rte_atomic16_cmpset(&(port->port_status),
> +							RTE_PORT_HANDLING,
> +							RTE_PORT_STOPPED) == 0)
> +					printf("Port %d can not be set back "
> +							"to stopped\n", pi);
> +				printf("Fail to configure port %d hairpin "
> +				       "queues\n", pi);
> +				/* try to reconfigure queues next time */
> +				port->need_reconfig_queues = 1;
> +				return -1;
> +			}
>  		}
>  		configure_rxtx_dump_callbacks(verbose_level);
>  		/* start port */
> diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
> index d73955d..09baa72 100644
> --- a/app/test-pmd/testpmd.h
> +++ b/app/test-pmd/testpmd.h
> @@ -383,6 +383,7 @@ struct queue_stats_mappings {
> 
>  extern uint64_t rss_hf;
> 
> +extern queueid_t nb_hairpinq;
>  extern queueid_t nb_rxq;
>  extern queueid_t nb_txq;
> 
> --
> 1.8.3.1


^ permalink raw reply	[flat|nested] 186+ messages in thread
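
The queue numbering that falls out of the testpmd patch is worth spelling
out: with --hairpinq=N the port is configured with nb_rxq + N Rx queues
and nb_txq + N Tx queues, and the hairpin queues occupy the tail indices,
paired 1:1. A sketch of the peering computed by the loops in the patch,
for hairpin pair i (0 <= i < N) on port pi:

/* Indices of the i-th hairpin pair (names as in the patch). */
queueid_t tx_qi = nb_txq + i;	/* hairpin Tx queue index */
queueid_t rx_qi = nb_rxq + i;	/* its peer hairpin Rx queue index */

/* The Tx queue's peer entry points at the Rx queue on the same port: */
hairpin_conf.peers[0].port = pi;
hairpin_conf.peers[0].queue = rx_qi;
/* and, symmetrically, the Rx queue's peer entry points at tx_qi. */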

* Re: [dpdk-dev] [PATCH 07/13] net/mlx5: add hairpin binding function
  2019-09-26  6:29 ` [dpdk-dev] [PATCH 07/13] net/mlx5: add hairpin binding function Ori Kam
@ 2019-09-26  9:33   ` Slava Ovsiienko
  0 siblings, 0 replies; 186+ messages in thread
From: Slava Ovsiienko @ 2019-09-26  9:33 UTC (permalink / raw)
  To: Ori Kam, Matan Azrad, Shahaf Shuler; +Cc: dev, Ori Kam, jingjing.wu, stephen

> -----Original Message-----
> From: Ori Kam <orika@mellanox.com>
> Sent: Thursday, September 26, 2019 9:29
> To: Matan Azrad <matan@mellanox.com>; Shahaf Shuler
> <shahafs@mellanox.com>; Slava Ovsiienko <viacheslavo@mellanox.com>
> Cc: dev@dpdk.org; Ori Kam <orika@mellanox.com>; jingjing.wu@intel.com;
> stephen@networkplumber.org
> Subject: [PATCH 07/13] net/mlx5: add hairpin binding function
> 
> When starting the port, in addition to creating the queues, we need to bind
> the hairpin queues.
> 
> Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>

> ---
>  drivers/net/mlx5/mlx5.h           |  1 +
>  drivers/net/mlx5/mlx5_devx_cmds.c |  1 +
>  drivers/net/mlx5/mlx5_prm.h       |  6 +++
>  drivers/net/mlx5/mlx5_trigger.c   | 97 +++++++++++++++++++++++++++++++++++++++
>  4 files changed, 105 insertions(+)
> 
> diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
> index 506920e..41eb35a 100644
> --- a/drivers/net/mlx5/mlx5.h
> +++ b/drivers/net/mlx5/mlx5.h
> @@ -188,6 +188,7 @@ struct mlx5_hca_attr {
>  	uint32_t log_max_hairpin_queues:5;
>  	uint32_t log_max_hairpin_wq_data_sz:5;
>  	uint32_t log_max_hairpin_num_packets:5;
> +	uint32_t vhca_id:16;
>  };
> 
>  /* Flow list . */
> diff --git a/drivers/net/mlx5/mlx5_devx_cmds.c b/drivers/net/mlx5/mlx5_devx_cmds.c
> index 917bbf9..0243733 100644
> --- a/drivers/net/mlx5/mlx5_devx_cmds.c
> +++ b/drivers/net/mlx5/mlx5_devx_cmds.c
> @@ -334,6 +334,7 @@ struct mlx5_devx_obj *
>  						 log_max_hairpin_wq_data_sz);
>  	attr->log_max_hairpin_num_packets = MLX5_GET
>  		(cmd_hca_cap, hcattr, log_min_hairpin_wq_data_sz);
> +	attr->vhca_id = MLX5_GET(cmd_hca_cap, hcattr, vhca_id);
>  	attr->eth_net_offloads = MLX5_GET(cmd_hca_cap, hcattr,
>  					  eth_net_offloads);
>  	attr->eth_virt = MLX5_GET(cmd_hca_cap, hcattr, eth_virt);
> diff --git a/drivers/net/mlx5/mlx5_prm.h b/drivers/net/mlx5/mlx5_prm.h
> index faa7996..d4084db 100644
> --- a/drivers/net/mlx5/mlx5_prm.h
> +++ b/drivers/net/mlx5/mlx5_prm.h
> @@ -1611,6 +1611,12 @@ struct mlx5_ifc_create_rqt_in_bits {
>  #pragma GCC diagnostic error "-Wpedantic"
>  #endif
> 
> +enum {
> +	MLX5_SQC_STATE_RST  = 0x0,
> +	MLX5_SQC_STATE_RDY  = 0x1,
> +	MLX5_SQC_STATE_ERR  = 0x3,
> +};
> +
>  struct mlx5_ifc_sqc_bits {
>  	u8 rlky[0x1];
>  	u8 cd_master[0x1];
> diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
> index 3ec86c4..a4fcdb3 100644
> --- a/drivers/net/mlx5/mlx5_trigger.c
> +++ b/drivers/net/mlx5/mlx5_trigger.c
> @@ -162,6 +162,96 @@
>  }
> 
>  /**
> + * Binds Tx queues to Rx queues for hairpin.
> + *
> + * Binds Tx queues to the target Rx queues.
> + *
> + * @param dev
> + *   Pointer to Ethernet device structure.
> + *
> + * @return
> + *   0 on success, a negative errno value otherwise and rte_errno is set.
> + */
> +static int
> +mlx5_hairpin_bind(struct rte_eth_dev *dev)
> +{
> +	struct mlx5_priv *priv = dev->data->dev_private;
> +	struct mlx5_devx_modify_sq_attr sq_attr = { 0 };
> +	struct mlx5_devx_modify_rq_attr rq_attr = { 0 };
> +	struct mlx5_txq_ctrl *txq_ctrl;
> +	struct mlx5_rxq_ctrl *rxq_ctrl;
> +	struct mlx5_devx_obj *sq;
> +	struct mlx5_devx_obj *rq;
> +	unsigned int i;
> +	int ret = 0;
> +
> +	for (i = 0; i != priv->txqs_n; ++i) {
> +		txq_ctrl = mlx5_txq_get(dev, i);
> +		if (!txq_ctrl)
> +			continue;
> +		if (txq_ctrl->type != MLX5_TXQ_TYPE_HAIRPIN) {
> +			mlx5_txq_release(dev, i);
> +			continue;
> +		}
> +		if (!txq_ctrl->obj) {
> +			rte_errno = ENOMEM;
> +			DRV_LOG(ERR, "port %u no txq object found: %d",
> +				dev->data->port_id, i);
> +			mlx5_txq_release(dev, i);
> +			return -rte_errno;
> +		}
> +		sq = txq_ctrl->obj->sq;
> +		rxq_ctrl = mlx5_rxq_get(dev,
> +					txq_ctrl->hairpin_conf.peers[0].queue);
> +		if (!rxq_ctrl) {
> +			mlx5_txq_release(dev, i);
> +			rte_errno = EINVAL;
> +			DRV_LOG(ERR, "port %u no rxq object found: %d",
> +				dev->data->port_id,
> +				txq_ctrl->hairpin_conf.peers[0].queue);
> +			return -rte_errno;
> +		}
> +		if (rxq_ctrl->type != MLX5_RXQ_TYPE_HAIRPIN ||
> +		    rxq_ctrl->hairpin_conf.peers[0].queue != i) {
> +			rte_errno = ENOMEM;
> +			DRV_LOG(ERR, "port %u Tx queue %d can't be binded
> to "
> +				"Rx queue %d", dev->data->port_id,
> +				i, txq_ctrl->hairpin_conf.peers[0].queue);
> +			goto error;
> +		}
> +		rq = rxq_ctrl->obj->rq;
> +		if (!rq) {
> +			rte_errno = ENOMEM;
> +			DRV_LOG(ERR, "port %u hairpin no matching rxq:
> %d",
> +				dev->data->port_id,
> +				txq_ctrl->hairpin_conf.peers[0].queue);
> +			goto error;
> +		}
> +		sq_attr.state = MLX5_SQC_STATE_RDY;
> +		sq_attr.sq_state = MLX5_SQC_STATE_RST;
> +		sq_attr.hairpin_peer_rq = rq->id;
> +		sq_attr.hairpin_peer_vhca = priv->config.hca_attr.vhca_id;
> +		ret = mlx5_devx_cmd_modify_sq(sq, &sq_attr);
> +		if (ret)
> +			goto error;
> +		rq_attr.state = MLX5_SQC_STATE_RDY;
> +		rq_attr.rq_state = MLX5_SQC_STATE_RST;
> +		rq_attr.hairpin_peer_sq = sq->id;
> +		rq_attr.hairpin_peer_vhca = priv->config.hca_attr.vhca_id;
> +		ret = mlx5_devx_cmd_modify_rq(rq, &rq_attr);
> +		if (ret)
> +			goto error;
> +		mlx5_txq_release(dev, i);
> +		mlx5_rxq_release(dev, txq_ctrl->hairpin_conf.peers[0].queue);
> +	}
> +	return 0;
> +error:
> +	mlx5_txq_release(dev, i);
> +	mlx5_rxq_release(dev, txq_ctrl->hairpin_conf.peers[0].queue);
> +	return -rte_errno;
> +}
> +
> +/**
>   * DPDK callback to start the device.
>   *
>   * Simulate device start by attaching all configured flows.
> @@ -192,6 +282,13 @@
>  		mlx5_txq_stop(dev);
>  		return -rte_errno;
>  	}
> +	ret = mlx5_hairpin_bind(dev);
> +	if (ret) {
> +		DRV_LOG(ERR, "port %u hairpin binding failed: %s",
> +			dev->data->port_id, strerror(rte_errno));
> +		mlx5_txq_stop(dev);
> +		return -rte_errno;
> +	}
>  	dev->data->dev_started = 1;
>  	ret = mlx5_rx_intr_vec_enable(dev);
>  	if (ret) {
> --
> 1.8.3.1


^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [dpdk-dev] [PATCH 08/13] net/mlx5: add support for hairpin hrxq
  2019-09-26  6:29 ` [dpdk-dev] [PATCH 08/13] net/mlx5: add support for hairpin hrxq Ori Kam
@ 2019-09-26  9:33   ` Slava Ovsiienko
  0 siblings, 0 replies; 186+ messages in thread
From: Slava Ovsiienko @ 2019-09-26  9:33 UTC (permalink / raw)
  To: Ori Kam, Matan Azrad, Shahaf Shuler; +Cc: dev, Ori Kam, jingjing.wu, stephen

> -----Original Message-----
> From: Ori Kam <orika@mellanox.com>
> Sent: Thursday, September 26, 2019 9:29
> To: Matan Azrad <matan@mellanox.com>; Shahaf Shuler
> <shahafs@mellanox.com>; Slava Ovsiienko <viacheslavo@mellanox.com>
> Cc: dev@dpdk.org; Ori Kam <orika@mellanox.com>; jingjing.wu@intel.com;
> stephen@networkplumber.org
> Subject: [PATCH 08/13] net/mlx5: add support for hairpin hrxq
> 
> The hairpin hrxq is based on the DevX hrxq but uses a different transport
> domain.
> 
> Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>

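The functional change is only the transport domain selected when the TIR
is created on top of hairpin queues; condensed from the diff below:

	/* Hairpin TIRs hang off the DevX transport domain object while
	 * standard TIRs keep using the Verbs-derived domain number. */
	if (rxq_ctrl->obj->type == MLX5_RXQ_OBJ_TYPE_DEVX_HAIRPIN)
		tir_attr.transport_domain = priv->sh->td->id;
	else
		tir_attr.transport_domain = priv->sh->tdn;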
> ---
>  drivers/net/mlx5/mlx5_rxq.c | 11 +++++++----
>  1 file changed, 7 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
> index a673da9..bf39112 100644
> --- a/drivers/net/mlx5/mlx5_rxq.c
> +++ b/drivers/net/mlx5/mlx5_rxq.c
> @@ -2344,13 +2344,13 @@ struct mlx5_hrxq *
>  	struct mlx5_ind_table_obj *ind_tbl;
>  	int err;
>  	struct mlx5_devx_obj *tir = NULL;
> +	struct mlx5_rxq_data *rxq_data = (*priv->rxqs)[queues[0]];
> +	struct mlx5_rxq_ctrl *rxq_ctrl =
> +		container_of(rxq_data, struct mlx5_rxq_ctrl, rxq);
> 
>  	queues_n = hash_fields ? queues_n : 1;
>  	ind_tbl = mlx5_ind_table_obj_get(dev, queues, queues_n);
>  	if (!ind_tbl) {
> -		struct mlx5_rxq_data *rxq_data = (*priv->rxqs)[queues[0]];
> -		struct mlx5_rxq_ctrl *rxq_ctrl =
> -			container_of(rxq_data, struct mlx5_rxq_ctrl, rxq);
>  		enum mlx5_ind_tbl_type type;
> 
>  		type = rxq_ctrl->obj->type == MLX5_RXQ_OBJ_TYPE_IBV ?
> @@ -2446,7 +2446,10 @@ struct mlx5_hrxq *
>  		tir_attr.rx_hash_fn = MLX5_RX_HASH_FN_TOEPLITZ;
> +		memcpy(&tir_attr.rx_hash_field_selector_outer, &hash_fields,
>  		       sizeof(uint64_t));
> -		tir_attr.transport_domain = priv->sh->tdn;
> +		if (rxq_ctrl->obj->type == MLX5_RXQ_OBJ_TYPE_DEVX_HAIRPIN)
> +			tir_attr.transport_domain = priv->sh->td->id;
> +		else
> +			tir_attr.transport_domain = priv->sh->tdn;
>  		memcpy(tir_attr.rx_hash_toeplitz_key, rss_key, rss_key_len);
>  		tir_attr.indirect_table = ind_tbl->rqt->id;
>  		if (dev->data->dev_conf.lpbk_mode)
> --
> 1.8.3.1


^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [dpdk-dev] [PATCH 09/13] net/mlx5: add internal tag item and action
  2019-09-26  6:29 ` [dpdk-dev] [PATCH 09/13] net/mlx5: add internal tag item and action Ori Kam
@ 2019-09-26  9:33   ` Slava Ovsiienko
  0 siblings, 0 replies; 186+ messages in thread
From: Slava Ovsiienko @ 2019-09-26  9:33 UTC (permalink / raw)
  To: Ori Kam, Matan Azrad, Shahaf Shuler; +Cc: dev, Ori Kam, jingjing.wu, stephen

> -----Original Message-----
> From: Ori Kam <orika@mellanox.com>
> Sent: Thursday, September 26, 2019 9:29
> To: Matan Azrad <matan@mellanox.com>; Shahaf Shuler
> <shahafs@mellanox.com>; Slava Ovsiienko <viacheslavo@mellanox.com>
> Cc: dev@dpdk.org; Ori Kam <orika@mellanox.com>; jingjing.wu@intel.com;
> stephen@networkplumber.org
> Subject: [PATCH 09/13] net/mlx5: add internal tag item and action
> 
> This commit introduces setting and matching on registers.
> The new item and action will be used by a number of different features,
> such as hairpin, metering and metadata.
> 
> Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>

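To see how a feature is expected to consume these internals, here is a
hypothetical caller (the wrapping function is invented for illustration;
flow_get_reg_id(), the enum values and the private set-tag structure are
the ones this patch adds):

	static int
	build_hairpin_mark(struct rte_eth_dev *dev, uint32_t flow_id,
			   struct mlx5_rte_flow_action_set_tag *set_tag,
			   struct rte_flow_error *error)
	{
		enum modify_reg reg;

		/* The Rx side of a hairpin flow is mapped to REG_B. */
		reg = flow_get_reg_id(dev, MLX5_HAIRPIN_RX, 0, error);
		if ((int)reg < 0)
			return -rte_errno;
		set_tag->id = reg;
		set_tag->data = rte_cpu_to_be_32(flow_id);
		/* The caller then emits an action of type
		 * MLX5_RTE_FLOW_ACTION_TYPE_TAG with conf = set_tag. */
		return 0;
	}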
> ---
>  drivers/net/mlx5/mlx5_flow.c    |  52 +++++++++++++
>  drivers/net/mlx5/mlx5_flow.h    |  54 ++++++++++++--
>  drivers/net/mlx5/mlx5_flow_dv.c | 158 +++++++++++++++++++++++++++++++++++++++-
>  drivers/net/mlx5/mlx5_prm.h     |   3 +-
>  4 files changed, 257 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
> index 482f65b..00afc18 100644
> --- a/drivers/net/mlx5/mlx5_flow.c
> +++ b/drivers/net/mlx5/mlx5_flow.c
> @@ -316,6 +316,58 @@ struct mlx5_flow_tunnel_info {
>  	},
>  };
> 
> +enum mlx5_feature_name {
> +	MLX5_HAIRPIN_RX,
> +	MLX5_HAIRPIN_TX,
> +	MLX5_APPLICATION,
> +};
> +
> +/**
> + * Translate tag ID to register.
> + *
> + * @param[in] dev
> + *   Pointer to the Ethernet device structure.
> + * @param[in] feature
> + *   The feature that request the register.
> + * @param[in] id
> + *   The request register ID.
> + * @param[out] error
> + *   Error description in case of any.
> + *
> + * @return
> + *   The request register on success, a negative errno
> + *   value otherwise and rte_errno is set.
> + */
> +__rte_unused
> +static enum modify_reg flow_get_reg_id(struct rte_eth_dev *dev,
> +				       enum mlx5_feature_name feature,
> +				       uint32_t id,
> +				       struct rte_flow_error *error)
> +{
> +	static enum modify_reg id2reg[] = {
> +		[0] = REG_A,
> +		[1] = REG_C_2,
> +		[2] = REG_C_3,
> +		[3] = REG_C_4,
> +		[4] = REG_B,};
> +
> +	dev = (void *)dev;
> +	switch (feature) {
> +	case MLX5_HAIRPIN_RX:
> +		return REG_B;
> +	case MLX5_HAIRPIN_TX:
> +		return REG_A;
> +	case MLX5_APPLICATION:
> +		if (id > 4)
> +			return rte_flow_error_set(error, EINVAL,
> +						  RTE_FLOW_ERROR_TYPE_ITEM,
> +						  NULL, "invalid tag id");
> +		return id2reg[id];
> +	}
> +	return rte_flow_error_set(error, EINVAL, RTE_FLOW_ERROR_TYPE_ITEM,
> +				  NULL, "invalid feature name");
> +}
> +
>  /**
>   * Discover the maximum number of priority available.
>   *
> diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
> index 235bccd..0148c1b 100644
> --- a/drivers/net/mlx5/mlx5_flow.h
> +++ b/drivers/net/mlx5/mlx5_flow.h
> @@ -27,6 +27,43 @@
>  #include "mlx5.h"
>  #include "mlx5_prm.h"
> 
> +enum modify_reg {
> +	REG_A,
> +	REG_B,
> +	REG_C_0,
> +	REG_C_1,
> +	REG_C_2,
> +	REG_C_3,
> +	REG_C_4,
> +	REG_C_5,
> +	REG_C_6,
> +	REG_C_7,
> +};
> +
> +/* Private rte flow items. */
> +enum mlx5_rte_flow_item_type {
> +	MLX5_RTE_FLOW_ITEM_TYPE_END = INT_MIN,
> +	MLX5_RTE_FLOW_ITEM_TYPE_TAG,
> +};
> +
> +/* Private rte flow actions. */
> +enum mlx5_rte_flow_action_type {
> +	MLX5_RTE_FLOW_ACTION_TYPE_END = INT_MIN,
> +	MLX5_RTE_FLOW_ACTION_TYPE_TAG,
> +};
> +
> +/* Matches on selected register. */
> +struct mlx5_rte_flow_item_tag {
> +	uint16_t id;
> +	rte_be32_t data;
> +};
> +
> +/* Modify selected register. */
> +struct mlx5_rte_flow_action_set_tag {
> +	uint16_t id;
> +	rte_be32_t data;
> +};
> +
>  /* Pattern outer Layer bits. */
>  #define MLX5_FLOW_LAYER_OUTER_L2 (1u << 0)
>  #define MLX5_FLOW_LAYER_OUTER_L3_IPV4 (1u << 1)
> @@ -53,16 +90,17 @@
>  /* General pattern items bits. */
>  #define MLX5_FLOW_ITEM_METADATA (1u << 16)
>  #define MLX5_FLOW_ITEM_PORT_ID (1u << 17)
> +#define MLX5_FLOW_ITEM_TAG (1u << 18)
> 
>  /* Pattern MISC bits. */
> -#define MLX5_FLOW_LAYER_ICMP (1u << 18)
> -#define MLX5_FLOW_LAYER_ICMP6 (1u << 19)
> -#define MLX5_FLOW_LAYER_GRE_KEY (1u << 20)
> +#define MLX5_FLOW_LAYER_ICMP (1u << 19)
> +#define MLX5_FLOW_LAYER_ICMP6 (1u << 20)
> +#define MLX5_FLOW_LAYER_GRE_KEY (1u << 21)
> 
>  /* Pattern tunnel Layer bits (continued). */
> -#define MLX5_FLOW_LAYER_IPIP (1u << 21)
> -#define MLX5_FLOW_LAYER_IPV6_ENCAP (1u << 22)
> -#define MLX5_FLOW_LAYER_NVGRE (1u << 23)
> +#define MLX5_FLOW_LAYER_IPIP (1u << 22)
> +#define MLX5_FLOW_LAYER_IPV6_ENCAP (1u << 23)
> +#define MLX5_FLOW_LAYER_NVGRE (1u << 24)
> 
>  /* Outer Masks. */
>  #define MLX5_FLOW_LAYER_OUTER_L3 \
> @@ -139,6 +177,7 @@
>  #define MLX5_FLOW_ACTION_DEC_TCP_SEQ (1u << 29)
>  #define MLX5_FLOW_ACTION_INC_TCP_ACK (1u << 30)
>  #define MLX5_FLOW_ACTION_DEC_TCP_ACK (1u << 31)
> +#define MLX5_FLOW_ACTION_SET_TAG (1ull << 32)
> 
>  #define MLX5_FLOW_FATE_ACTIONS \
>  	(MLX5_FLOW_ACTION_DROP | MLX5_FLOW_ACTION_QUEUE | \
> @@ -172,7 +211,8 @@
>  				      MLX5_FLOW_ACTION_DEC_TCP_SEQ | \
>  				      MLX5_FLOW_ACTION_INC_TCP_ACK | \
>  				      MLX5_FLOW_ACTION_DEC_TCP_ACK | \
> -				      MLX5_FLOW_ACTION_OF_SET_VLAN_VID)
> +				      MLX5_FLOW_ACTION_OF_SET_VLAN_VID | \
> +				      MLX5_FLOW_ACTION_SET_TAG)
> 
>  #define MLX5_FLOW_VLAN_ACTIONS (MLX5_FLOW_ACTION_OF_POP_VLAN | \
>  				MLX5_FLOW_ACTION_OF_PUSH_VLAN)
> diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
> index 2a7e3ed..dde0831 100644
> --- a/drivers/net/mlx5/mlx5_flow_dv.c
> +++ b/drivers/net/mlx5/mlx5_flow_dv.c
> @@ -723,6 +723,59 @@ struct field_modify_info modify_tcp[] = {
> 						MLX5_MODIFICATION_TYPE_ADD, error);
>  }
> 
> +static enum mlx5_modification_field reg_to_field[] = {
> +	[REG_A] = MLX5_MODI_META_DATA_REG_A,
> +	[REG_B] = MLX5_MODI_META_DATA_REG_B,
> +	[REG_C_0] = MLX5_MODI_META_REG_C_0,
> +	[REG_C_1] = MLX5_MODI_META_REG_C_1,
> +	[REG_C_2] = MLX5_MODI_META_REG_C_2,
> +	[REG_C_3] = MLX5_MODI_META_REG_C_3,
> +	[REG_C_4] = MLX5_MODI_META_REG_C_4,
> +	[REG_C_5] = MLX5_MODI_META_REG_C_5,
> +	[REG_C_6] = MLX5_MODI_META_REG_C_6,
> +	[REG_C_7] = MLX5_MODI_META_REG_C_7,
> +};
> +
> +/**
> + * Convert register set to DV specification.
> + *
> + * @param[in,out] resource
> + *   Pointer to the modify-header resource.
> + * @param[in] action
> + *   Pointer to action specification.
> + * @param[out] error
> + *   Pointer to the error structure.
> + *
> + * @return
> + *   0 on success, a negative errno value otherwise and rte_errno is set.
> + */
> +static int
> +flow_dv_convert_action_set_reg
> +			(struct mlx5_flow_dv_modify_hdr_resource *resource,
> +			 const struct rte_flow_action *action,
> +			 struct rte_flow_error *error)
> +{
> +	const struct mlx5_rte_flow_action_set_tag *conf = (action->conf);
> +	struct mlx5_modification_cmd *actions = resource->actions;
> +	uint32_t i = resource->actions_num;
> +
> +	if (i >= MLX5_MODIFY_NUM)
> +		return rte_flow_error_set(error, EINVAL,
> +					  RTE_FLOW_ERROR_TYPE_ACTION, NULL,
> +					  "too many items to modify");
> +	actions[i].action_type = MLX5_MODIFICATION_TYPE_SET;
> +	actions[i].field = reg_to_field[conf->id];
> +	actions[i].data0 = rte_cpu_to_be_32(actions[i].data0);
> +	actions[i].data1 = conf->data;
> +	++i;
> +	resource->actions_num = i;
> +	if (!resource->actions_num)
> +		return rte_flow_error_set(error, EINVAL,
> +					  RTE_FLOW_ERROR_TYPE_ACTION, NULL,
> +					  "invalid modification flow item");
> +	return 0;
> +}
> +
>  /**
>   * Validate META item.
>   *
> @@ -4640,6 +4693,94 @@ struct field_modify_info modify_tcp[] = {
>  }
> 
>  /**
> + * Add tag item to matcher
> + *
> + * @param[in, out] matcher
> + *   Flow matcher.
> + * @param[in, out] key
> + *   Flow matcher value.
> + * @param[in] item
> + *   Flow pattern to translate.
> + */
> +static void
> +flow_dv_translate_item_tag(void *matcher, void *key,
> +			   const struct rte_flow_item *item)
> +{
> +	void *misc2_m =
> +		MLX5_ADDR_OF(fte_match_param, matcher, misc_parameters_2);
> +	void *misc2_v =
> +		MLX5_ADDR_OF(fte_match_param, key, misc_parameters_2);
> +	const struct mlx5_rte_flow_item_tag *tag_v = item->spec;
> +	const struct mlx5_rte_flow_item_tag *tag_m = item->mask;
> +	enum modify_reg reg = tag_v->id;
> +	rte_be32_t value = tag_v->data;
> +	rte_be32_t mask = tag_m->data;
> +
> +	switch (reg) {
> +	case REG_A:
> +		MLX5_SET(fte_match_set_misc2, misc2_m, metadata_reg_a,
> +			 rte_be_to_cpu_32(mask));
> +		MLX5_SET(fte_match_set_misc2, misc2_v, metadata_reg_a,
> +			 rte_be_to_cpu_32(value));
> +		break;
> +	case REG_B:
> +		MLX5_SET(fte_match_set_misc2, misc2_m, metadata_reg_b,
> +			 rte_be_to_cpu_32(mask));
> +		MLX5_SET(fte_match_set_misc2, misc2_v, metadata_reg_b,
> +			 rte_be_to_cpu_32(value));
> +		break;
> +	case REG_C_0:
> +		MLX5_SET(fte_match_set_misc2, misc2_m, metadata_reg_c_0,
> +			 rte_be_to_cpu_32(mask));
> +		MLX5_SET(fte_match_set_misc2, misc2_v, metadata_reg_c_0,
> +			 rte_be_to_cpu_32(value));
> +		break;
> +	case REG_C_1:
> +		MLX5_SET(fte_match_set_misc2, misc2_m, metadata_reg_c_1,
> +			 rte_be_to_cpu_32(mask));
> +		MLX5_SET(fte_match_set_misc2, misc2_v, metadata_reg_c_1,
> +			 rte_be_to_cpu_32(value));
> +		break;
> +	case REG_C_2:
> +		MLX5_SET(fte_match_set_misc2, misc2_m, metadata_reg_c_2,
> +			 rte_be_to_cpu_32(mask));
> +		MLX5_SET(fte_match_set_misc2, misc2_v, metadata_reg_c_2,
> +			 rte_be_to_cpu_32(value));
> +		break;
> +	case REG_C_3:
> +		MLX5_SET(fte_match_set_misc2, misc2_m, metadata_reg_c_3,
> +			 rte_be_to_cpu_32(mask));
> +		MLX5_SET(fte_match_set_misc2, misc2_v, metadata_reg_c_3,
> +			 rte_be_to_cpu_32(value));
> +		break;
> +	case REG_C_4:
> +		MLX5_SET(fte_match_set_misc2, misc2_m, metadata_reg_c_4,
> +			 rte_be_to_cpu_32(mask));
> +		MLX5_SET(fte_match_set_misc2, misc2_v, metadata_reg_c_4,
> +			 rte_be_to_cpu_32(value));
> +		break;
> +	case REG_C_5:
> +		MLX5_SET(fte_match_set_misc2, misc2_m, metadata_reg_c_5,
> +			 rte_be_to_cpu_32(mask));
> +		MLX5_SET(fte_match_set_misc2, misc2_v, metadata_reg_c_5,
> +			 rte_be_to_cpu_32(value));
> +		break;
> +	case REG_C_6:
> +		MLX5_SET(fte_match_set_misc2, misc2_m, metadata_reg_c_6,
> +			 rte_be_to_cpu_32(mask));
> +		MLX5_SET(fte_match_set_misc2, misc2_v, metadata_reg_c_6,
> +			 rte_be_to_cpu_32(value));
> +		break;
> +	case REG_C_7:
> +		MLX5_SET(fte_match_set_misc2, misc2_m, metadata_reg_c_7,
> +			 rte_be_to_cpu_32(mask));
> +		MLX5_SET(fte_match_set_misc2, misc2_v, metadata_reg_c_7,
> +			 rte_be_to_cpu_32(value));
> +		break;
> +	}
> +}
> +
> +/**
>   * Add source vport match to the specified matcher.
>   *
>   * @param[in, out] matcher
> @@ -5225,8 +5366,9 @@ struct field_modify_info modify_tcp[] = {
>  		struct mlx5_flow_tbl_resource *tbl;
>  		uint32_t port_id = 0;
>  		struct mlx5_flow_dv_port_id_action_resource port_id_resource;
> +		int action_type = actions->type;
> 
> -		switch (actions->type) {
> +		switch (action_type) {
>  		case RTE_FLOW_ACTION_TYPE_VOID:
>  			break;
>  		case RTE_FLOW_ACTION_TYPE_PORT_ID:
> @@ -5541,6 +5683,12 @@ struct field_modify_info modify_tcp[] = {
>  					MLX5_FLOW_ACTION_INC_TCP_ACK :
>  					MLX5_FLOW_ACTION_DEC_TCP_ACK;
>  			break;
> +		case MLX5_RTE_FLOW_ACTION_TYPE_TAG:
> +			if (flow_dv_convert_action_set_reg(&res, actions,
> +							   error))
> +				return -rte_errno;
> +			action_flags |= MLX5_FLOW_ACTION_SET_TAG;
> +			break;
>  		case RTE_FLOW_ACTION_TYPE_END:
>  			actions_end = true;
>  			if (action_flags & MLX5_FLOW_MODIFY_HDR_ACTIONS) {
> @@ -5565,8 +5713,9 @@ struct field_modify_info modify_tcp[] = {
>  	flow->actions = action_flags;
>  	for (; items->type != RTE_FLOW_ITEM_TYPE_END; items++) {
>  		int tunnel = !!(item_flags & MLX5_FLOW_LAYER_TUNNEL);
> +		int item_type = items->type;
> 
> -		switch (items->type) {
> +		switch (item_type) {
>  		case RTE_FLOW_ITEM_TYPE_PORT_ID:
>  			flow_dv_translate_item_port_id(dev, match_mask,
>  						       match_value, items);
> @@ -5712,6 +5861,11 @@ struct field_modify_info modify_tcp[] = {
>  						      items, tunnel);
>  			last_item = MLX5_FLOW_LAYER_ICMP6;
>  			break;
> +		case MLX5_RTE_FLOW_ITEM_TYPE_TAG:
> +			flow_dv_translate_item_tag(match_mask, match_value,
> +						   items);
> +			last_item = MLX5_FLOW_ITEM_TAG;
> +			break;
>  		default:
>  			break;
>  		}
> diff --git a/drivers/net/mlx5/mlx5_prm.h b/drivers/net/mlx5/mlx5_prm.h
> index d4084db..695578f 100644
> --- a/drivers/net/mlx5/mlx5_prm.h
> +++ b/drivers/net/mlx5/mlx5_prm.h
> @@ -623,7 +623,8 @@ struct mlx5_ifc_fte_match_set_misc2_bits {
>  	u8 metadata_reg_c_1[0x20];
>  	u8 metadata_reg_c_0[0x20];
>  	u8 metadata_reg_a[0x20];
> -	u8 reserved_at_1a0[0x60];
> +	u8 metadata_reg_b[0x20];
> +	u8 reserved_at_1c0[0x40];
>  };
> 
>  struct mlx5_ifc_fte_match_set_misc3_bits {
> --
> 1.8.3.1


^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [dpdk-dev] [PATCH 10/13] net/mlx5: add id generation function
  2019-09-26  6:29 ` [dpdk-dev] [PATCH 10/13] net/mlx5: add id generation function Ori Kam
@ 2019-09-26  9:34   ` Slava Ovsiienko
  0 siblings, 0 replies; 186+ messages in thread
From: Slava Ovsiienko @ 2019-09-26  9:34 UTC (permalink / raw)
  To: Ori Kam, Matan Azrad, Shahaf Shuler; +Cc: dev, Ori Kam, jingjing.wu, stephen

> -----Original Message-----
> From: Ori Kam <orika@mellanox.com>
> Sent: Thursday, September 26, 2019 9:29
> To: Matan Azrad <matan@mellanox.com>; Shahaf Shuler
> <shahafs@mellanox.com>; Slava Ovsiienko <viacheslavo@mellanox.com>
> Cc: dev@dpdk.org; Ori Kam <orika@mellanox.com>; jingjing.wu@intel.com;
> stephen@networkplumber.org
> Subject: [PATCH 10/13] net/mlx5: add id generation function
> 
> When splitting flows, for example for hairpin or metering, there is a need
> to correlate the resulting flows. This is done using an ID.
> This commit introduces a simple way to generate such IDs.
> 
> A bitmap was not used because its allocation and release are O(n), while
> in the chosen approach both allocation and release are O(1).
> 
> Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>

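A minimal usage sketch of the new pool API; attaching the pool to the
shared context only happens later in the series, so a local variable
stands in for it here:

	struct mlx5_flow_id_pool *pool = mlx5_flow_id_pool_alloc();
	uint32_t id = 0;

	if (pool && !mlx5_flow_id_get(pool, &id)) {
		/* ... use "id" to correlate the split flows ... */
		/* O(1) release: the id is pushed back on the free array. */
		mlx5_flow_id_release(pool, id);
	}
	if (pool)
		mlx5_flow_id_pool_release(pool);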
> ---
>  drivers/net/mlx5/mlx5.c      | 120 ++++++++++++++++++++++++++++++++++++++++++-
>  drivers/net/mlx5/mlx5_flow.h |  14 +++++
>  2 files changed, 133 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
> index ad36743..940503d 100644
> --- a/drivers/net/mlx5/mlx5.c
> +++ b/drivers/net/mlx5/mlx5.c
> @@ -179,6 +179,124 @@ struct mlx5_dev_spawn_data {
>  static LIST_HEAD(, mlx5_ibv_shared) mlx5_ibv_list = LIST_HEAD_INITIALIZER();
>  static pthread_mutex_t mlx5_ibv_list_mutex = PTHREAD_MUTEX_INITIALIZER;
> 
> +#define MLX5_FLOW_MIN_ID_POOL_SIZE 512
> +#define MLX5_ID_GENERATION_ARRAY_FACTOR 16
> +
> +/**
> + * Allocate ID pool structure.
> + *
> + * @return
> + *   Pointer to pool object, NULL value otherwise.
> + */
> +struct mlx5_flow_id_pool *
> +mlx5_flow_id_pool_alloc(void)
> +{
> +	struct mlx5_flow_id_pool *pool;
> +	void *mem;
> +
> +	pool = rte_zmalloc("id pool allocation", sizeof(*pool),
> +			   RTE_CACHE_LINE_SIZE);
> +	if (!pool) {
> +		DRV_LOG(ERR, "can't allocate id pool");
> +		rte_errno  = ENOMEM;
> +		return NULL;
> +	}
> +	mem = rte_zmalloc("", MLX5_FLOW_MIN_ID_POOL_SIZE * sizeof(uint32_t),
> +			  RTE_CACHE_LINE_SIZE);
> +	if (!mem) {
> +		DRV_LOG(ERR, "can't allocate mem for id pool");
> +		rte_errno  = ENOMEM;
> +		goto error;
> +	}
> +	pool->free_arr = mem;
> +	pool->curr = pool->free_arr;
> +	pool->last = pool->free_arr + MLX5_FLOW_MIN_ID_POOL_SIZE;
> +	pool->base_index = 0;
> +	return pool;
> +error:
> +	rte_free(pool);
> +	return NULL;
> +}
> +
> +/**
> + * Release ID pool structure.
> + *
> + * @param[in] pool
> + *   Pointer to flow id pool object to free.
> + */
> +void
> +mlx5_flow_id_pool_release(struct mlx5_flow_id_pool *pool)
> +{
> +	rte_free(pool->free_arr);
> +	rte_free(pool);
> +}
> +
> +/**
> + * Generate ID.
> + *
> + * @param[in] pool
> + *   Pointer to flow id pool.
> + * @param[out] id
> + *   The generated ID.
> + *
> + * @return
> + *   0 on success, error value otherwise.
> + */
> +uint32_t
> +mlx5_flow_id_get(struct mlx5_flow_id_pool *pool, uint32_t *id)
> +{
> +	if (pool->curr == pool->free_arr) {
> +		if (pool->base_index == UINT32_MAX) {
> +			rte_errno  = ENOMEM;
> +			DRV_LOG(ERR, "no free id");
> +			return -rte_errno;
> +		}
> +		*id = ++pool->base_index;
> +		return 0;
> +	}
> +	*id = *(--pool->curr);
> +	return 0;
> +}
> +
> +/**
> + * Release ID.
> + *
> + * @param[in] pool
> + *   Pointer to flow id pool.
> + * @param[out] id
> + *   The generated ID.
> + *
> + * @return
> + *   0 on success, error value otherwise.
> + */
> +uint32_t
> +mlx5_flow_id_release(struct mlx5_flow_id_pool *pool, uint32_t id)
> +{
> +	uint32_t size;
> +	uint32_t size2;
> +	void *mem;
> +
> +	if (pool->curr == pool->last) {
> +		size = pool->curr - pool->free_arr;
> +		size2 = size * MLX5_ID_GENERATION_ARRAY_FACTOR;
> +		assert(size2 > size);
> +		mem = rte_malloc("", size2 * sizeof(uint32_t), 0);
> +		if (!mem) {
> +			DRV_LOG(ERR, "can't allocate mem for id pool");
> +			rte_errno  = ENOMEM;
> +			return -rte_errno;
> +		}
> +		memcpy(mem, pool->free_arr, size * sizeof(uint32_t));
> +		rte_free(pool->free_arr);
> +		pool->free_arr = mem;
> +		pool->curr = pool->free_arr + size;
> +		pool->last = pool->free_arr + size2;
> +	}
> +	*pool->curr = id;
> +	pool->curr++;
> +	return 0;
> +}
> +
>  /**
>   * Initialize the counters management structure.
>   *
> @@ -329,7 +447,7 @@ struct mlx5_dev_spawn_data {
>  	struct mlx5_devx_tis_attr tis_attr = { 0 };
>  #endif
> 
> -	assert(spawn);
> +assert(spawn);
>  	/* Secondary process should not create the shared context. */
>  	assert(rte_eal_process_type() == RTE_PROC_PRIMARY);
>  	pthread_mutex_lock(&mlx5_ibv_list_mutex);
> diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
> index 0148c1b..1b14fb7 100644
> --- a/drivers/net/mlx5/mlx5_flow.h
> +++ b/drivers/net/mlx5/mlx5_flow.h
> @@ -495,8 +495,22 @@ struct mlx5_flow_driver_ops {
>  #define MLX5_CNT_CONTAINER_UNUSED(sh, batch, thread) (&(sh)->cmng.ccont \
>  	[(~((sh)->cmng.mhi[batch] >> (thread)) & 0x1) * 2 + (batch)])
> 
> +/* ID generation structure. */
> +struct mlx5_flow_id_pool {
> +	uint32_t *free_arr; /**< Pointer to the a array of free values. */
> +	uint32_t base_index;
> +	/**< The next index that can be used without any free elements. */
> +	uint32_t *curr; /**< Pointer to the index to pop. */
> +	uint32_t *last; /**< Pointer to the last element in the empty arrray. */
> +};
> +
>  /* mlx5_flow.c */
> 
> +struct mlx5_flow_id_pool *mlx5_flow_id_pool_alloc(void);
> +void mlx5_flow_id_pool_release(struct mlx5_flow_id_pool *pool);
> +uint32_t mlx5_flow_id_get(struct mlx5_flow_id_pool *pool, uint32_t *id);
> +uint32_t mlx5_flow_id_release(struct mlx5_flow_id_pool *pool,
> +			      uint32_t id);
>  int mlx5_flow_group_to_table(const struct rte_flow_attr *attributes,
>  			     bool external, uint32_t group, uint32_t *table,
>  			     struct rte_flow_error *error);
> --
> 1.8.3.1


^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [dpdk-dev] [PATCH 11/13] net/mlx5: add default flows for hairpin
  2019-09-26  6:29 ` [dpdk-dev] [PATCH 11/13] net/mlx5: add default flows for hairpin Ori Kam
@ 2019-09-26  9:34   ` Slava Ovsiienko
  0 siblings, 0 replies; 186+ messages in thread
From: Slava Ovsiienko @ 2019-09-26  9:34 UTC (permalink / raw)
  To: Ori Kam, Matan Azrad, Shahaf Shuler; +Cc: dev, Ori Kam, jingjing.wu, stephen

> -----Original Message-----
> From: Ori Kam <orika@mellanox.com>
> Sent: Thursday, September 26, 2019 9:29
> To: Matan Azrad <matan@mellanox.com>; Shahaf Shuler
> <shahafs@mellanox.com>; Slava Ovsiienko <viacheslavo@mellanox.com>
> Cc: dev@dpdk.org; Ori Kam <orika@mellanox.com>; jingjing.wu@intel.com;
> stephen@networkplumber.org
> Subject: [PATCH 11/13] net/mlx5: add default flows for hairpin
> 
> When using hairpin, all traffic from Tx hairpin queues should jump to a
> dedicated table where matching can be done using registers.
> 
> Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>

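Conceptually, for every hairpin Tx queue the driver installs the
equivalent of the rule below at port start, built with the private
Tx-queue item this patch adds rather than any public rte_flow item
(a sketch of the rule shape, not literal syntax):

	pattern : tx_queue index is <hairpin txq>  (matched on the SQ number)
	actions : jump group MLX5_HAIRPIN_TX_TABLE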
> ---
>  drivers/net/mlx5/mlx5.h         |  2 ++
>  drivers/net/mlx5/mlx5_flow.c    | 60 +++++++++++++++++++++++++++++++++++++++
>  drivers/net/mlx5/mlx5_flow.h    |  9 ++++++
>  drivers/net/mlx5/mlx5_flow_dv.c | 63 +++++++++++++++++++++++++++++++++++++++--
>  drivers/net/mlx5/mlx5_trigger.c | 18 ++++++++++++
>  5 files changed, 150 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
> index 41eb35a..5f1a25d 100644
> --- a/drivers/net/mlx5/mlx5.h
> +++ b/drivers/net/mlx5/mlx5.h
> @@ -556,6 +556,7 @@ struct mlx5_flow_tbl_resource {
>  };
> 
>  #define MLX5_MAX_TABLES UINT16_MAX
> +#define MLX5_HAIRPIN_TX_TABLE (UINT16_MAX - 1)
>  #define MLX5_MAX_TABLES_FDB UINT16_MAX
> 
>  #define MLX5_DBR_PAGE_SIZE 4096 /* Must be >= 512. */
> @@ -872,6 +873,7 @@ int mlx5_dev_filter_ctrl(struct rte_eth_dev *dev,
>  int mlx5_flow_start(struct rte_eth_dev *dev, struct mlx5_flows *list);
>  void mlx5_flow_stop(struct rte_eth_dev *dev, struct mlx5_flows *list);
>  int mlx5_flow_verify(struct rte_eth_dev *dev);
> +int mlx5_ctrl_flow_source_queue(struct rte_eth_dev *dev, uint32_t queue);
>  int mlx5_ctrl_flow_vlan(struct rte_eth_dev *dev,
>  			struct rte_flow_item_eth *eth_spec,
>  			struct rte_flow_item_eth *eth_mask,
> diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
> index 00afc18..33ed204 100644
> --- a/drivers/net/mlx5/mlx5_flow.c
> +++ b/drivers/net/mlx5/mlx5_flow.c
> @@ -2712,6 +2712,66 @@ struct rte_flow *
>  }
> 
>  /**
> + * Enable default hairpin egress flow.
> + *
> + * @param dev
> + *   Pointer to Ethernet device.
> + * @param queue
> + *   The queue index.
> + *
> + * @return
> + *   0 on success, a negative errno value otherwise and rte_errno is set.
> + */
> +int
> +mlx5_ctrl_flow_source_queue(struct rte_eth_dev *dev,
> +			    uint32_t queue)
> +{
> +	struct mlx5_priv *priv = dev->data->dev_private;
> +	const struct rte_flow_attr attr = {
> +		.egress = 1,
> +		.priority = 0,
> +	};
> +	struct mlx5_rte_flow_item_tx_queue queue_spec = {
> +		.queue = queue,
> +	};
> +	struct mlx5_rte_flow_item_tx_queue queue_mask = {
> +		.queue = UINT32_MAX,
> +	};
> +	struct rte_flow_item items[] = {
> +		{
> +			.type = MLX5_RTE_FLOW_ITEM_TYPE_TX_QUEUE,
> +			.spec = &queue_spec,
> +			.last = NULL,
> +			.mask = &queue_mask,
> +		},
> +		{
> +			.type = RTE_FLOW_ITEM_TYPE_END,
> +		},
> +	};
> +	struct rte_flow_action_jump jump = {
> +		.group = MLX5_HAIRPIN_TX_TABLE,
> +	};
> +	struct rte_flow_action actions[2];
> +	struct rte_flow *flow;
> +	struct rte_flow_error error;
> +
> +	actions[0].type = RTE_FLOW_ACTION_TYPE_JUMP;
> +	actions[0].conf = &jump;
> +	actions[1].type = RTE_FLOW_ACTION_TYPE_END;
> +	flow = flow_list_create(dev, &priv->ctrl_flows,
> +				&attr, items, actions, false, &error);
> +	if (!flow) {
> +		DRV_LOG(DEBUG,
> +			"Failed to create ctrl flow: rte_errno(%d),"
> +			" type(%d), message(%s)\n",
> +			rte_errno, error.type,
> +			error.message ? error.message : " (no stated reason)");
> +		return -rte_errno;
> +	}
> +	return 0;
> +}
> +
> +/**
>   * Enable a control flow configured from the control plane.
>   *
>   * @param dev
> diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
> index 1b14fb7..bb67380 100644
> --- a/drivers/net/mlx5/mlx5_flow.h
> +++ b/drivers/net/mlx5/mlx5_flow.h
> @@ -44,6 +44,7 @@ enum modify_reg {
>  enum mlx5_rte_flow_item_type {
>  	MLX5_RTE_FLOW_ITEM_TYPE_END = INT_MIN,
>  	MLX5_RTE_FLOW_ITEM_TYPE_TAG,
> +	MLX5_RTE_FLOW_ITEM_TYPE_TX_QUEUE,
>  };
> 
>  /* Private rte flow actions. */
> @@ -64,6 +65,11 @@ struct mlx5_rte_flow_action_set_tag {
>  	rte_be32_t data;
>  };
> 
> +/* Matches on source queue. */
> +struct mlx5_rte_flow_item_tx_queue {
> +	uint32_t queue;
> +};
> +
>  /* Pattern outer Layer bits. */
>  #define MLX5_FLOW_LAYER_OUTER_L2 (1u << 0)
>  #define MLX5_FLOW_LAYER_OUTER_L3_IPV4 (1u << 1)
> @@ -102,6 +108,9 @@ struct mlx5_rte_flow_action_set_tag {
>  #define MLX5_FLOW_LAYER_IPV6_ENCAP (1u << 23)
>  #define MLX5_FLOW_LAYER_NVGRE (1u << 24)
> 
> +/* Queue items. */
> +#define MLX5_FLOW_ITEM_TX_QUEUE (1u << 25)
> +
>  /* Outer Masks. */
>  #define MLX5_FLOW_LAYER_OUTER_L3 \
>  	(MLX5_FLOW_LAYER_OUTER_L3_IPV4 | MLX5_FLOW_LAYER_OUTER_L3_IPV6)
> diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
> index dde0831..2b48680 100644
> --- a/drivers/net/mlx5/mlx5_flow_dv.c
> +++ b/drivers/net/mlx5/mlx5_flow_dv.c
> @@ -3357,7 +3357,9 @@ struct field_modify_info modify_tcp[] = {
>  		return ret;
>  	for (; items->type != RTE_FLOW_ITEM_TYPE_END; items++) {
>  		int tunnel = !!(item_flags & MLX5_FLOW_LAYER_TUNNEL);
> -		switch (items->type) {
> +		int type = items->type;
> +
> +		switch (type) {
>  		case RTE_FLOW_ITEM_TYPE_VOID:
>  			break;
>  		case RTE_FLOW_ITEM_TYPE_PORT_ID:
> @@ -3518,6 +3520,9 @@ struct field_modify_info modify_tcp[] = {
>  				return ret;
>  			last_item = MLX5_FLOW_LAYER_ICMP6;
>  			break;
> +		case MLX5_RTE_FLOW_ITEM_TYPE_TAG:
> +		case MLX5_RTE_FLOW_ITEM_TYPE_TX_QUEUE:
> +			break;
>  		default:
>  			return rte_flow_error_set(error, ENOTSUP,
> 
> RTE_FLOW_ERROR_TYPE_ITEM,
> @@ -3526,11 +3531,12 @@ struct field_modify_info modify_tcp[] = {
>  		item_flags |= last_item;
>  	}
>  	for (; actions->type != RTE_FLOW_ACTION_TYPE_END; actions++) {
> +		int type = actions->type;
>  		if (actions_n == MLX5_DV_MAX_NUMBER_OF_ACTIONS)
>  			return rte_flow_error_set(error, ENOTSUP,
> 
> RTE_FLOW_ERROR_TYPE_ACTION,
>  						  actions, "too many
> actions");
> -		switch (actions->type) {
> +		switch (type) {
>  		case RTE_FLOW_ACTION_TYPE_VOID:
>  			break;
>  		case RTE_FLOW_ACTION_TYPE_PORT_ID:
> @@ -3796,6 +3802,8 @@ struct field_modify_info modify_tcp[] = {
> 
> 	MLX5_FLOW_ACTION_INC_TCP_ACK :
> 
> 	MLX5_FLOW_ACTION_DEC_TCP_ACK;
>  			break;
> +		case MLX5_RTE_FLOW_ACTION_TYPE_TAG:
> +			break;
>  		default:
>  			return rte_flow_error_set(error, ENOTSUP,
> 
> RTE_FLOW_ERROR_TYPE_ACTION,
> @@ -5291,6 +5299,51 @@ struct field_modify_info modify_tcp[] = {
>  }
> 
>  /**
> + * Add Tx queue matcher
> + *
> + * @param[in] dev
> + *   Pointer to the dev struct.
> + * @param[in, out] matcher
> + *   Flow matcher.
> + * @param[in, out] key
> + *   Flow matcher value.
> + * @param[in] item
> + *   Flow pattern to translate.
> + * @param[in] inner
> + *   Item is inner pattern.
> + */
> +static void
> +flow_dv_translate_item_tx_queue(struct rte_eth_dev *dev,
> +				void *matcher, void *key,
> +				const struct rte_flow_item *item)
> +{
> +	const struct mlx5_rte_flow_item_tx_queue *queue_m;
> +	const struct mlx5_rte_flow_item_tx_queue *queue_v;
> +	void *misc_m =
> +		MLX5_ADDR_OF(fte_match_param, matcher, misc_parameters);
> +	void *misc_v =
> +		MLX5_ADDR_OF(fte_match_param, key, misc_parameters);
> +	struct mlx5_txq_ctrl *txq;
> +	uint32_t queue;
> +
> +
> +	queue_m = (const void *)item->mask;
> +	if (!queue_m)
> +		return;
> +	queue_v = (const void *)item->spec;
> +	if (!queue_v)
> +		return;
> +	txq = mlx5_txq_get(dev, queue_v->queue);
> +	if (!txq)
> +		return;
> +	queue = txq->obj->sq->id;
> +	MLX5_SET(fte_match_set_misc, misc_m, source_sqn, queue_m->queue);
> +	MLX5_SET(fte_match_set_misc, misc_v, source_sqn,
> +		 queue & queue_m->queue);
> +	mlx5_txq_release(dev, queue_v->queue);
> +}
> +
> +/**
>   * Fill the flow with DV spec.
>   *
>   * @param[in] dev
> @@ -5866,6 +5919,12 @@ struct field_modify_info modify_tcp[] = {
>  						   items);
>  			last_item = MLX5_FLOW_ITEM_TAG;
>  			break;
> +		case MLX5_RTE_FLOW_ITEM_TYPE_TX_QUEUE:
> +			flow_dv_translate_item_tx_queue(dev, match_mask,
> +							match_value,
> +							items);
> +			last_item = MLX5_FLOW_ITEM_TX_QUEUE;
> +			break;
>  		default:
>  			break;
>  		}
> diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
> index a4fcdb3..a476cd5 100644
> --- a/drivers/net/mlx5/mlx5_trigger.c
> +++ b/drivers/net/mlx5/mlx5_trigger.c
> @@ -396,6 +396,24 @@
>  	unsigned int j;
>  	int ret;
> 
> +	/*
> +	 * Hairpin txq default flow should be created no matter if it is
> +	 * isolation mode. Or else all the packets to be sent will be sent
> +	 * out directly without the TX flow actions, e.g. encapsulation.
> +	 */
> +	for (i = 0; i != priv->txqs_n; ++i) {
> +		struct mlx5_txq_ctrl *txq_ctrl = mlx5_txq_get(dev, i);
> +		if (!txq_ctrl)
> +			continue;
> +		if (txq_ctrl->type == MLX5_TXQ_TYPE_HAIRPIN) {
> +			ret = mlx5_ctrl_flow_source_queue(dev, i);
> +			if (ret) {
> +				mlx5_txq_release(dev, i);
> +				goto error;
> +			}
> +		}
> +		mlx5_txq_release(dev, i);
> +	}
>  	if (priv->config.dv_esw_en && !priv->config.vf)
>  		if (!mlx5_flow_create_esw_table_zero_flow(dev))
>  			goto error;
> --
> 1.8.3.1


^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [dpdk-dev] [PATCH 12/13] net/mlx5: split hairpin flows
  2019-09-26  6:29 ` [dpdk-dev] [PATCH 12/13] net/mlx5: split hairpin flows Ori Kam
@ 2019-09-26  9:34   ` Slava Ovsiienko
  0 siblings, 0 replies; 186+ messages in thread
From: Slava Ovsiienko @ 2019-09-26  9:34 UTC (permalink / raw)
  To: Ori Kam, Matan Azrad, Shahaf Shuler; +Cc: dev, Ori Kam, jingjing.wu, stephen

> -----Original Message-----
> From: Ori Kam <orika@mellanox.com>
> Sent: Thursday, September 26, 2019 9:29
> To: Matan Azrad <matan@mellanox.com>; Shahaf Shuler
> <shahafs@mellanox.com>; Slava Ovsiienko <viacheslavo@mellanox.com>
> Cc: dev@dpdk.org; Ori Kam <orika@mellanox.com>; jingjing.wu@intel.com;
> stephen@networkplumber.org
> Subject: [PATCH 12/13] net/mlx5: split hairpin flows
> 
> Since the encap action is not supported on Rx, we need to split the hairpin
> flow into an Rx part and a Tx part.
> 
> Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>

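The shape of the split, for an ingress rule whose fate is a hairpin queue
and whose actions include an encap; the register roles are the ones
flow_get_reg_id() assigns to MLX5_HAIRPIN_RX/TX and the flow_id comes
from the per-device ID pool introduced earlier in the series:

	original : ingress, match <pattern>,
	           actions: raw_encap, queue <hairpin rxq>
	Rx flow  : ingress, match <pattern>,
	           actions: set_tag(REG_B) = flow_id, queue <hairpin rxq>
	Tx flow  : egress, group MLX5_HAIRPIN_TX_TABLE,
	           match: tag(REG_A) == flow_id,
	           actions: raw_encap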
> ---
>  drivers/net/mlx5/mlx5.c            |  10 ++
>  drivers/net/mlx5/mlx5.h            |  10 ++
>  drivers/net/mlx5/mlx5_flow.c       | 281 +++++++++++++++++++++++++++++++++++--
>  drivers/net/mlx5/mlx5_flow.h       |  14 +-
>  drivers/net/mlx5/mlx5_flow_dv.c    |  10 +-
>  drivers/net/mlx5/mlx5_flow_verbs.c |  11 +-
>  drivers/net/mlx5/mlx5_rxq.c        |  26 ++++
>  drivers/net/mlx5/mlx5_rxtx.h       |   2 +
>  8 files changed, 334 insertions(+), 30 deletions(-)
> 
> diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
> index 940503d..2837cba 100644
> --- a/drivers/net/mlx5/mlx5.c
> +++ b/drivers/net/mlx5/mlx5.c
> @@ -528,6 +528,12 @@ struct mlx5_flow_id_pool *
>  		err = ENOMEM;
>  		goto error;
>  	}
> +	sh->flow_id_pool = mlx5_flow_id_pool_alloc();
> +	if (!sh->flow_id_pool) {
> +		DRV_LOG(ERR, "can't create flow id pool");
> +		err = ENOMEM;
> +		goto error;
> +	}
>  #endif /* HAVE_IBV_FLOW_DV_SUPPORT */
>  	/*
>  	 * Once the device is added to the list of memory event
> @@ -567,6 +573,8 @@ struct mlx5_flow_id_pool *
>  		claim_zero(mlx5_glue->dealloc_pd(sh->pd));
>  	if (sh->ctx)
>  		claim_zero(mlx5_glue->close_device(sh->ctx));
> +	if (sh->flow_id_pool)
> +		mlx5_flow_id_pool_release(sh->flow_id_pool);
>  	rte_free(sh);
>  	assert(err > 0);
>  	rte_errno = err;
> @@ -629,6 +637,8 @@ struct mlx5_flow_id_pool *
>  		claim_zero(mlx5_devx_cmd_destroy(sh->td));
>  	if (sh->ctx)
>  		claim_zero(mlx5_glue->close_device(sh->ctx));
> +	if (sh->flow_id_pool)
> +		mlx5_flow_id_pool_release(sh->flow_id_pool);
>  	rte_free(sh);
>  exit:
>  	pthread_mutex_unlock(&mlx5_ibv_list_mutex);
> diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
> index 5f1a25d..5336554 100644
> --- a/drivers/net/mlx5/mlx5.h
> +++ b/drivers/net/mlx5/mlx5.h
> @@ -574,6 +574,15 @@ struct mlx5_devx_dbr_page {
>  	uint64_t dbr_bitmap[MLX5_DBR_BITMAP_SIZE];
>  };
> 
> +/* ID generation structure. */
> +struct mlx5_flow_id_pool {
> +	uint32_t *free_arr; /**< Pointer to the a array of free values. */
> +	uint32_t base_index;
> +	/**< The next index that can be used without any free elements. */
> +	uint32_t *curr; /**< Pointer to the index to pop. */
> +	uint32_t *last; /**< Pointer to the last element in the empty arrray. */
> +};
> +
>  /*
>   * Shared Infiniband device context for Master/Representors
>   * which belong to same IB device with multiple IB ports.
> @@ -632,6 +641,7 @@ struct mlx5_ibv_shared {
>  	struct mlx5dv_devx_cmd_comp *devx_comp; /* DEVX async comp obj. */
>  	struct mlx5_devx_obj *tis; /* TIS object. */
>  	struct mlx5_devx_obj *td; /* Transport domain. */
> +	struct mlx5_flow_id_pool *flow_id_pool; /* Flow ID pool. */
>  	struct mlx5_ibv_shared_port port[]; /* per device port data array. */
> };
> 
> diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
> index 33ed204..50e1d11 100644
> --- a/drivers/net/mlx5/mlx5_flow.c
> +++ b/drivers/net/mlx5/mlx5_flow.c
> @@ -606,7 +606,7 @@ uint32_t mlx5_flow_adjust_priority(struct rte_eth_dev *dev, int32_t priority,
>  {
>  	struct mlx5_priv *priv = dev->data->dev_private;
>  	struct rte_flow *flow = dev_flow->flow;
> -	const int mark = !!(flow->actions &
> +	const int mark = !!(dev_flow->actions &
>  			    (MLX5_FLOW_ACTION_FLAG | MLX5_FLOW_ACTION_MARK));
>  	const int tunnel = !!(dev_flow->layers & MLX5_FLOW_LAYER_TUNNEL);
>  	unsigned int i;
> @@ -669,7 +669,7 @@ uint32_t mlx5_flow_adjust_priority(struct rte_eth_dev *dev, int32_t priority,
>  {
>  	struct mlx5_priv *priv = dev->data->dev_private;
>  	struct rte_flow *flow = dev_flow->flow;
> -	const int mark = !!(flow->actions &
> +	const int mark = !!(dev_flow->actions &
>  			    (MLX5_FLOW_ACTION_FLAG | MLX5_FLOW_ACTION_MARK));
>  	const int tunnel = !!(dev_flow->layers & MLX5_FLOW_LAYER_TUNNEL);
>  	unsigned int i;
> @@ -2419,6 +2419,210 @@ uint32_t mlx5_flow_adjust_priority(struct rte_eth_dev *dev, int32_t priority,
>  }
> 
>  /**
> + * Check if the flow should be splited due to hairpin.
> + * The reason for the split is that in current HW we can't
> + * support encap on Rx, so if a flow have encap we move it
> + * to Tx.
> + *
> + * @param dev
> + *   Pointer to Ethernet device.
> + * @param[in] attr
> + *   Flow rule attributes.
> + * @param[in] actions
> + *   Associated actions (list terminated by the END action).
> + *
> + * @return
> + *   > 0 the number of actions and the flow should be split,
> + *   0 when no split required.
> + */
> +static int
> +flow_check_hairpin_split(struct rte_eth_dev *dev,
> +			 const struct rte_flow_attr *attr,
> +			 const struct rte_flow_action actions[])
> +{
> +	int queue_action = 0;
> +	int action_n = 0;
> +	int encap = 0;
> +	const struct rte_flow_action_queue *queue;
> +	const struct rte_flow_action_rss *rss;
> +	const struct rte_flow_action_raw_encap *raw_encap;
> +
> +	if (!attr->ingress)
> +		return 0;
> +	for (; actions->type != RTE_FLOW_ACTION_TYPE_END; actions++) {
> +		switch (actions->type) {
> +		case RTE_FLOW_ACTION_TYPE_QUEUE:
> +			queue = actions->conf;
> +			if (mlx5_rxq_get_type(dev, queue->index) !=
> +			    MLX5_RXQ_TYPE_HAIRPIN)
> +				return 0;
> +			queue_action = 1;
> +			action_n++;
> +			break;
> +		case RTE_FLOW_ACTION_TYPE_RSS:
> +			rss = actions->conf;
> +			if (mlx5_rxq_get_type(dev, rss->queue[0]) !=
> +			    MLX5_RXQ_TYPE_HAIRPIN)
> +				return 0;
> +			queue_action = 1;
> +			action_n++;
> +			break;
> +		case RTE_FLOW_ACTION_TYPE_VXLAN_ENCAP:
> +		case RTE_FLOW_ACTION_TYPE_NVGRE_ENCAP:
> +			encap = 1;
> +			action_n++;
> +			break;
> +		case RTE_FLOW_ACTION_TYPE_RAW_ENCAP:
> +			raw_encap = actions->conf;
> +			if (raw_encap->size >
> +			    (sizeof(struct rte_flow_item_eth) +
> +			     sizeof(struct rte_flow_item_ipv4)))
> +				encap = 1;
> +			action_n++;
> +			break;
> +		default:
> +			action_n++;
> +			break;
> +		}
> +	}
> +	if (encap == 1 && queue_action)
> +		return action_n;
> +	return 0;
> +}
> +
> +#define MLX5_MAX_SPLIT_ACTIONS 24
> +#define MLX5_MAX_SPLIT_ITEMS 24
> +
> +/**
> + * Split the hairpin flow.
> + * Since HW can't support encap on Rx we move the encap to Tx.
> + * If the count action is after the encap then we also
> + * move the count action. in this case the count will also measure
> + * the outer bytes.
> + *
> + * @param dev
> + *   Pointer to Ethernet device.
> + * @param[in] actions
> + *   Associated actions (list terminated by the END action).
> + * @param[out] actions_rx
> + *   Rx flow actions.
> + * @param[out] actions_tx
> + *   Tx flow actions..
> + * @param[out] pattern_tx
> + *   The pattern items for the Tx flow.
> + * @param[out] flow_id
> + *   The flow ID connected to this flow.
> + *
> + * @return
> + *   0 on success.
> + */
> +static int
> +flow_hairpin_split(struct rte_eth_dev *dev,
> +		   const struct rte_flow_action actions[],
> +		   struct rte_flow_action actions_rx[],
> +		   struct rte_flow_action actions_tx[],
> +		   struct rte_flow_item pattern_tx[],
> +		   uint32_t *flow_id)
> +{
> +	struct mlx5_priv *priv = dev->data->dev_private;
> +	const struct rte_flow_action_raw_encap *raw_encap;
> +	const struct rte_flow_action_raw_decap *raw_decap;
> +	struct mlx5_rte_flow_action_set_tag *set_tag;
> +	struct rte_flow_action *tag_action;
> +	struct mlx5_rte_flow_item_tag *tag_item;
> +	struct rte_flow_item *item;
> +	char *addr;
> +	struct rte_flow_error error;
> +	int encap = 0;
> +
> +	mlx5_flow_id_get(priv->sh->flow_id_pool, flow_id);
> +	for (; actions->type != RTE_FLOW_ACTION_TYPE_END; actions++) {
> +		switch (actions->type) {
> +		case RTE_FLOW_ACTION_TYPE_VXLAN_ENCAP:
> +		case RTE_FLOW_ACTION_TYPE_NVGRE_ENCAP:
> +			rte_memcpy(actions_tx, actions,
> +			       sizeof(struct rte_flow_action));
> +			actions_tx++;
> +			break;
> +		case RTE_FLOW_ACTION_TYPE_COUNT:
> +			if (encap) {
> +				rte_memcpy(actions_tx, actions,
> +					   sizeof(struct rte_flow_action));
> +				actions_tx++;
> +			} else {
> +				rte_memcpy(actions_rx, actions,
> +					   sizeof(struct rte_flow_action));
> +				actions_rx++;
> +			}
> +			break;
> +		case RTE_FLOW_ACTION_TYPE_RAW_ENCAP:
> +			raw_encap = actions->conf;
> +			if (raw_encap->size >
> +			    (sizeof(struct rte_flow_item_eth) +
> +			     sizeof(struct rte_flow_item_ipv4))) {
> +				memcpy(actions_tx, actions,
> +				       sizeof(struct rte_flow_action));
> +				actions_tx++;
> +				encap = 1;
> +			} else {
> +				rte_memcpy(actions_rx, actions,
> +					   sizeof(struct rte_flow_action));
> +				actions_rx++;
> +			}
> +			break;
> +		case RTE_FLOW_ACTION_TYPE_RAW_DECAP:
> +			raw_decap = actions->conf;
> +			if (raw_decap->size <
> +			    (sizeof(struct rte_flow_item_eth) +
> +			     sizeof(struct rte_flow_item_ipv4))) {
> +				memcpy(actions_tx, actions,
> +				       sizeof(struct rte_flow_action));
> +				actions_tx++;
> +			} else {
> +				rte_memcpy(actions_rx, actions,
> +					   sizeof(struct rte_flow_action));
> +				actions_rx++;
> +			}
> +			break;
> +		default:
> +			rte_memcpy(actions_rx, actions,
> +				   sizeof(struct rte_flow_action));
> +			actions_rx++;
> +			break;
> +		}
> +	}
> +	/* Add set meta action and end action for the Rx flow. */
> +	tag_action = actions_rx;
> +	tag_action->type = MLX5_RTE_FLOW_ACTION_TYPE_TAG;
> +	actions_rx++;
> +	rte_memcpy(actions_rx, actions, sizeof(struct rte_flow_action));
> +	actions_rx++;
> +	set_tag = (void *)actions_rx;
> +	set_tag->id = flow_get_reg_id(dev, MLX5_HAIRPIN_RX, 0, &error);
> +	set_tag->data = rte_cpu_to_be_32(*flow_id);
> +	tag_action->conf = set_tag;
> +	/* Create Tx item list. */
> +	rte_memcpy(actions_tx, actions, sizeof(struct rte_flow_action));
> +	addr = (void *)&pattern_tx[2];
> +	item = pattern_tx;
> +	item->type = MLX5_RTE_FLOW_ITEM_TYPE_TAG;
> +	tag_item = (void *)addr;
> +	tag_item->data = rte_cpu_to_be_32(*flow_id);
> +	tag_item->id = flow_get_reg_id(dev, MLX5_HAIRPIN_TX, 0, &error);
> +	item->spec = tag_item;
> +	addr += sizeof(struct mlx5_rte_flow_item_tag);
> +	tag_item = (void *)addr;
> +	tag_item->data = UINT32_MAX;
> +	tag_item->id = UINT16_MAX;
> +	item->mask = tag_item;
> +	addr += sizeof(struct mlx5_rte_flow_item_tag);
> +	item->last = NULL;
> +	item++;
> +	item->type = RTE_FLOW_ITEM_TYPE_END;
> +	return 0;
> +}
> +
> +/**
>   * Create a flow and add it to @p list.
>   *
>   * @param dev
> @@ -2446,6 +2650,7 @@ uint32_t mlx5_flow_adjust_priority(struct rte_eth_dev *dev, int32_t priority,
>  		 const struct rte_flow_action actions[],
>  		 bool external, struct rte_flow_error *error)
>  {
> +	struct mlx5_priv *priv = dev->data->dev_private;
>  	struct rte_flow *flow = NULL;
>  	struct mlx5_flow *dev_flow;
>  	const struct rte_flow_action_rss *rss;
> @@ -2453,16 +2658,44 @@ uint32_t mlx5_flow_adjust_priority(struct rte_eth_dev *dev, int32_t priority,
>  		struct rte_flow_expand_rss buf;
>  		uint8_t buffer[2048];
>  	} expand_buffer;
> +	union {
> +		struct rte_flow_action actions[MLX5_MAX_SPLIT_ACTIONS];
> +		uint8_t buffer[2048];
> +	} actions_rx;
> +	union {
> +		struct rte_flow_action actions[MLX5_MAX_SPLIT_ACTIONS];
> +		uint8_t buffer[2048];
> +	} actions_hairpin_tx;
> +	union {
> +		struct rte_flow_item items[MLX5_MAX_SPLIT_ITEMS];
> +		uint8_t buffer[2048];
> +	} items_tx;
>  	struct rte_flow_expand_rss *buf = &expand_buffer.buf;
> +	const struct rte_flow_action *p_actions_rx = actions;
>  	int ret;
>  	uint32_t i;
>  	uint32_t flow_size;
> +	int hairpin_flow = 0;
> +	uint32_t hairpin_id = 0;
> +	struct rte_flow_attr attr_tx = { .priority = 0 };
> 
> -	ret = flow_drv_validate(dev, attr, items, actions, external, error);
> +	hairpin_flow = flow_check_hairpin_split(dev, attr, actions);
> +	if (hairpin_flow > 0) {
> +		if (hairpin_flow > MLX5_MAX_SPLIT_ACTIONS) {
> +			rte_errno = EINVAL;
> +			return NULL;
> +		}
> +		flow_hairpin_split(dev, actions, actions_rx.actions,
> +				   actions_hairpin_tx.actions, items_tx.items,
> +				   &hairpin_id);
> +		p_actions_rx = actions_rx.actions;
> +	}
> +	ret = flow_drv_validate(dev, attr, items, p_actions_rx, external,
> +				error);
>  	if (ret < 0)
> -		return NULL;
> +		goto error_before_flow;
>  	flow_size = sizeof(struct rte_flow);
> -	rss = flow_get_rss_action(actions);
> +	rss = flow_get_rss_action(p_actions_rx);
>  	if (rss)
>  		flow_size += RTE_ALIGN_CEIL(rss->queue_num * sizeof(uint16_t),
>  					    sizeof(void *));
> @@ -2471,11 +2704,13 @@ uint32_t mlx5_flow_adjust_priority(struct rte_eth_dev *dev, int32_t priority,
>  	flow = rte_calloc(__func__, 1, flow_size, 0);
>  	if (!flow) {
>  		rte_errno = ENOMEM;
> -		return NULL;
> +		goto error_before_flow;
>  	}
>  	flow->drv_type = flow_get_drv_type(dev, attr);
>  	flow->ingress = attr->ingress;
>  	flow->transfer = attr->transfer;
> +	if (hairpin_id != 0)
> +		flow->hairpin_flow_id = hairpin_id;
>  	assert(flow->drv_type > MLX5_FLOW_TYPE_MIN &&
>  	       flow->drv_type < MLX5_FLOW_TYPE_MAX);
>  	flow->queue = (void *)(flow + 1);
> @@ -2496,7 +2731,7 @@ uint32_t mlx5_flow_adjust_priority(struct rte_eth_dev *dev, int32_t priority,
>  	}
>  	for (i = 0; i < buf->entries; ++i) {
>  		dev_flow = flow_drv_prepare(flow, attr, buf->entry[i].pattern,
> -					    actions, error);
> +					    p_actions_rx, error);
>  		if (!dev_flow)
>  			goto error;
>  		dev_flow->flow = flow;
> @@ -2504,7 +2739,24 @@ uint32_t mlx5_flow_adjust_priority(struct rte_eth_dev *dev, int32_t priority,
>  		LIST_INSERT_HEAD(&flow->dev_flows, dev_flow, next);
>  		ret = flow_drv_translate(dev, dev_flow, attr,
>  					 buf->entry[i].pattern,
> -					 actions, error);
> +					 p_actions_rx, error);
> +		if (ret < 0)
> +			goto error;
> +	}
> +	/* Create the tx flow. */
> +	if (hairpin_flow) {
> +		attr_tx.group = MLX5_HAIRPIN_TX_TABLE;
> +		attr_tx.ingress = 0;
> +		attr_tx.egress = 1;
> +		dev_flow = flow_drv_prepare(flow, &attr_tx, items_tx.items,
> +					    actions_hairpin_tx.actions, error);
> +		if (!dev_flow)
> +			goto error;
> +		dev_flow->flow = flow;
> +		LIST_INSERT_HEAD(&flow->dev_flows, dev_flow, next);
> +		ret = flow_drv_translate(dev, dev_flow, &attr_tx,
> +					 items_tx.items,
> +					 actions_hairpin_tx.actions, error);
>  		if (ret < 0)
>  			goto error;
>  	}
> @@ -2516,8 +2768,16 @@ uint32_t mlx5_flow_adjust_priority(struct rte_eth_dev *dev, int32_t priority,
>  	TAILQ_INSERT_TAIL(list, flow, next);
>  	flow_rxq_flags_set(dev, flow);
>  	return flow;
> +error_before_flow:
> +	if (hairpin_id)
> +		mlx5_flow_id_release(priv->sh->flow_id_pool,
> +				     hairpin_id);
> +	return NULL;
>  error:
>  	ret = rte_errno; /* Save rte_errno before cleanup. */
> +	if (flow->hairpin_flow_id)
> +		mlx5_flow_id_release(priv->sh->flow_id_pool,
> +				     flow->hairpin_flow_id);
>  	assert(flow);
>  	flow_drv_destroy(dev, flow);
>  	rte_free(flow);
> @@ -2607,12 +2867,17 @@ struct rte_flow *
>  flow_list_destroy(struct rte_eth_dev *dev, struct mlx5_flows *list,
>  		  struct rte_flow *flow)
>  {
> +	struct mlx5_priv *priv = dev->data->dev_private;
> +
>  	/*
>  	 * Update RX queue flags only if port is started, otherwise it is
>  	 * already clean.
>  	 */
>  	if (dev->data->dev_started)
>  		flow_rxq_flags_trim(dev, flow);
> +	if (flow->hairpin_flow_id)
> +		mlx5_flow_id_release(priv->sh->flow_id_pool,
> +				     flow->hairpin_flow_id);
>  	flow_drv_destroy(dev, flow);
>  	TAILQ_REMOVE(list, flow, next);
>  	rte_free(flow->fdir);
> diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
> index bb67380..90a289e 100644
> --- a/drivers/net/mlx5/mlx5_flow.h
> +++ b/drivers/net/mlx5/mlx5_flow.h
> @@ -434,6 +434,8 @@ struct mlx5_flow {
>  	struct rte_flow *flow; /**< Pointer to the main flow. */
>  	uint64_t layers;
>  	/**< Bit-fields of present layers, see MLX5_FLOW_LAYER_*. */
> +	uint64_t actions;
> +	/**< Bit-fields of detected actions, see MLX5_FLOW_ACTION_*. */
>  	union {
>  #ifdef HAVE_IBV_FLOW_DV_SUPPORT
>  		struct mlx5_flow_dv dv;
> @@ -455,12 +457,11 @@ struct rte_flow {
>  	uint16_t (*queue)[]; /**< Destination queues to redirect traffic to. */
>  	LIST_HEAD(dev_flows, mlx5_flow) dev_flows;
>  	/**< Device flows that are part of the flow. */
> -	uint64_t actions;
> -	/**< Bit-fields of detected actions, see MLX5_FLOW_ACTION_*. */
>  	struct mlx5_fdir *fdir; /**< Pointer to associated FDIR if any. */
>  	uint8_t ingress; /**< 1 if the flow is ingress. */
>  	uint32_t group; /**< The group index. */
>  	uint8_t transfer; /**< 1 if the flow is E-Switch flow. */
> +	uint32_t hairpin_flow_id; /**< The flow id used for hairpin. */
>  };
> 
>  typedef int (*mlx5_flow_validate_t)(struct rte_eth_dev *dev,
> @@ -504,15 +505,6 @@ struct mlx5_flow_driver_ops {
>  #define MLX5_CNT_CONTAINER_UNUSED(sh, batch, thread) (&(sh)->cmng.ccont \
>  	[(~((sh)->cmng.mhi[batch] >> (thread)) & 0x1) * 2 + (batch)])
> 
> -/* ID generation structure. */
> -struct mlx5_flow_id_pool {
> -	uint32_t *free_arr; /**< Pointer to the a array of free values. */
> -	uint32_t base_index;
> -	/**< The next index that can be used without any free elements. */
> -	uint32_t *curr; /**< Pointer to the index to pop. */
> -	uint32_t *last; /**< Pointer to the last element in the empty arrray. */
> -};
> -
>  /* mlx5_flow.c */
> 
>  struct mlx5_flow_id_pool *mlx5_flow_id_pool_alloc(void);
> diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
> index 2b48680..6828bd1 100644
> --- a/drivers/net/mlx5/mlx5_flow_dv.c
> +++ b/drivers/net/mlx5/mlx5_flow_dv.c
> @@ -5763,7 +5763,7 @@ struct field_modify_info modify_tcp[] = {
>  			modify_action_position = actions_n++;
>  	}
>  	dev_flow->dv.actions_n = actions_n;
> -	flow->actions = action_flags;
> +	dev_flow->actions = action_flags;
>  	for (; items->type != RTE_FLOW_ITEM_TYPE_END; items++) {
>  		int tunnel = !!(item_flags & MLX5_FLOW_LAYER_TUNNEL);
>  		int item_type = items->type;
> @@ -5985,7 +5985,7 @@ struct field_modify_info modify_tcp[] = {
>  	LIST_FOREACH(dev_flow, &flow->dev_flows, next) {
>  		dv = &dev_flow->dv;
>  		n = dv->actions_n;
> -		if (flow->actions & MLX5_FLOW_ACTION_DROP) {
> +		if (dev_flow->actions & MLX5_FLOW_ACTION_DROP) {
>  			if (flow->transfer) {
>  				dv->actions[n++] = priv->sh->esw_drop_action;
>  			} else {
> @@ -6000,7 +6000,7 @@ struct field_modify_info modify_tcp[] = {
>  				}
>  				dv->actions[n++] = dv->hrxq->action;
>  			}
> -		} else if (flow->actions &
> +		} else if (dev_flow->actions &
>  			   (MLX5_FLOW_ACTION_QUEUE | MLX5_FLOW_ACTION_RSS)) {
>  			struct mlx5_hrxq *hrxq;
> 
> @@ -6056,7 +6056,7 @@ struct field_modify_info modify_tcp[] = {
>  	LIST_FOREACH(dev_flow, &flow->dev_flows, next) {
>  		struct mlx5_flow_dv *dv = &dev_flow->dv;
>  		if (dv->hrxq) {
> -			if (flow->actions & MLX5_FLOW_ACTION_DROP)
> +			if (dev_flow->actions & MLX5_FLOW_ACTION_DROP)
>  				mlx5_hrxq_drop_release(dev);
>  			else
>  				mlx5_hrxq_release(dev, dv->hrxq);
> @@ -6290,7 +6290,7 @@ struct field_modify_info modify_tcp[] = {
>  			dv->flow = NULL;
>  		}
>  		if (dv->hrxq) {
> -			if (flow->actions & MLX5_FLOW_ACTION_DROP)
> +			if (dev_flow->actions & MLX5_FLOW_ACTION_DROP)
>  				mlx5_hrxq_drop_release(dev);
>  			else
>  				mlx5_hrxq_release(dev, dv->hrxq);
> diff --git a/drivers/net/mlx5/mlx5_flow_verbs.c
> b/drivers/net/mlx5/mlx5_flow_verbs.c
> index 23110f2..fd27f6c 100644
> --- a/drivers/net/mlx5/mlx5_flow_verbs.c
> +++ b/drivers/net/mlx5/mlx5_flow_verbs.c
> @@ -191,7 +191,7 @@
>  {
>  #if defined(HAVE_IBV_DEVICE_COUNTERS_SET_V42) || \
>  	defined(HAVE_IBV_DEVICE_COUNTERS_SET_V45)
> -	if (flow->actions & MLX5_FLOW_ACTION_COUNT) {
> +	if (flow->counter->cs) {
>  		struct rte_flow_query_count *qc = data;
>  		uint64_t counters[2] = {0, 0};
>  #if defined(HAVE_IBV_DEVICE_COUNTERS_SET_V42)
> @@ -1410,7 +1410,6 @@
>  		     const struct rte_flow_action actions[],
>  		     struct rte_flow_error *error)
>  {
> -	struct rte_flow *flow = dev_flow->flow;
>  	uint64_t item_flags = 0;
>  	uint64_t action_flags = 0;
>  	uint64_t priority = attr->priority;
> @@ -1460,7 +1459,7 @@
>  						  "action not supported");
>  		}
>  	}
> -	flow->actions = action_flags;
> +	dev_flow->actions = action_flags;
>  	for (; items->type != RTE_FLOW_ITEM_TYPE_END; items++) {
>  		int tunnel = !!(item_flags & MLX5_FLOW_LAYER_TUNNEL);
> 
> @@ -1592,7 +1591,7 @@
>  			verbs->flow = NULL;
>  		}
>  		if (verbs->hrxq) {
> -			if (flow->actions & MLX5_FLOW_ACTION_DROP)
> +			if (dev_flow->actions & MLX5_FLOW_ACTION_DROP)
>  				mlx5_hrxq_drop_release(dev);
>  			else
>  				mlx5_hrxq_release(dev, verbs->hrxq);
> @@ -1656,7 +1655,7 @@
> 
>  	LIST_FOREACH(dev_flow, &flow->dev_flows, next) {
>  		verbs = &dev_flow->verbs;
> -		if (flow->actions & MLX5_FLOW_ACTION_DROP) {
> +		if (dev_flow->actions & MLX5_FLOW_ACTION_DROP) {
>  			verbs->hrxq = mlx5_hrxq_drop_new(dev);
>  			if (!verbs->hrxq) {
>  				rte_flow_error_set
> @@ -1717,7 +1716,7 @@
>  	LIST_FOREACH(dev_flow, &flow->dev_flows, next) {
>  		verbs = &dev_flow->verbs;
>  		if (verbs->hrxq) {
> -			if (flow->actions & MLX5_FLOW_ACTION_DROP)
> +			if (dev_flow->actions & MLX5_FLOW_ACTION_DROP)
>  				mlx5_hrxq_drop_release(dev);
>  			else
>  				mlx5_hrxq_release(dev, verbs->hrxq);
> diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
> index bf39112..e51a0c6 100644
> --- a/drivers/net/mlx5/mlx5_rxq.c
> +++ b/drivers/net/mlx5/mlx5_rxq.c
> @@ -2113,6 +2113,32 @@ struct mlx5_rxq_ctrl *
>  }
> 
>  /**
> + * Get a Rx queue type.
> + *
> + * @param dev
> + *   Pointer to Ethernet device.
> + * @param idx
> + *   Rx queue index.
> + *
> + * @return
> + *   The Rx queue type.
> + */
> +enum mlx5_rxq_type
> +mlx5_rxq_get_type(struct rte_eth_dev *dev, uint16_t idx)
> +{
> +	struct mlx5_priv *priv = dev->data->dev_private;
> +	struct mlx5_rxq_ctrl *rxq_ctrl = NULL;
> +
> +	if ((*priv->rxqs)[idx]) {
> +		rxq_ctrl = container_of((*priv->rxqs)[idx],
> +					struct mlx5_rxq_ctrl,
> +					rxq);
> +		return rxq_ctrl->type;
> +	}
> +	return MLX5_RXQ_TYPE_UNDEFINED;
> +}
> +
> +/**
>   * Create an indirection table.
>   *
>   * @param dev
> diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
> index 8fa22e5..4707b29 100644
> --- a/drivers/net/mlx5/mlx5_rxtx.h
> +++ b/drivers/net/mlx5/mlx5_rxtx.h
> @@ -166,6 +166,7 @@ enum mlx5_rxq_obj_type {
>  enum mlx5_rxq_type {
>  	MLX5_RXQ_TYPE_STANDARD, /* Standard Rx queue. */
>  	MLX5_RXQ_TYPE_HAIRPIN, /* Hairpin Rx queue. */
> +	MLX5_RXQ_TYPE_UNDEFINED,
>  };
> 
>  /* Verbs/DevX Rx queue elements. */
> @@ -408,6 +409,7 @@ struct mlx5_hrxq *mlx5_hrxq_get(struct rte_eth_dev *dev,
>  				const uint16_t *queues, uint32_t queues_n);
>  int mlx5_hrxq_release(struct rte_eth_dev *dev, struct mlx5_hrxq *hxrq);
>  int mlx5_hrxq_verify(struct rte_eth_dev *dev);
> +enum mlx5_rxq_type mlx5_rxq_get_type(struct rte_eth_dev *dev, uint16_t idx);
>  struct mlx5_hrxq *mlx5_hrxq_drop_new(struct rte_eth_dev *dev);
>  void mlx5_hrxq_drop_release(struct rte_eth_dev *dev);
>  uint64_t mlx5_get_rx_port_offloads(void);
> --
> 1.8.3.1


^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [dpdk-dev] [PATCH 13/13] doc: add hairpin feature
  2019-09-26  6:29 ` [dpdk-dev] [PATCH 13/13] doc: add hairpin feature Ori Kam
@ 2019-09-26  9:34   ` Slava Ovsiienko
  0 siblings, 0 replies; 186+ messages in thread
From: Slava Ovsiienko @ 2019-09-26  9:34 UTC (permalink / raw)
  To: Ori Kam, John McNamara, Marko Kovacevic
  Cc: dev, Ori Kam, jingjing.wu, stephen

> -----Original Message-----
> From: dev <dev-bounces@dpdk.org> On Behalf Of Ori Kam
> Sent: Thursday, September 26, 2019 9:29
> To: John McNamara <john.mcnamara@intel.com>; Marko Kovacevic
> <marko.kovacevic@intel.com>
> Cc: dev@dpdk.org; Ori Kam <orika@mellanox.com>; jingjing.wu@intel.com;
> stephen@networkplumber.org
> Subject: [dpdk-dev] [PATCH 13/13] doc: add hairpin feature
> 
> This commit adds the hairpin feature to the release notes.
> 
> Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>

> ---
>  doc/guides/rel_notes/release_19_11.rst | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/doc/guides/rel_notes/release_19_11.rst
> b/doc/guides/rel_notes/release_19_11.rst
> index c8d97f1..a880655 100644
> --- a/doc/guides/rel_notes/release_19_11.rst
> +++ b/doc/guides/rel_notes/release_19_11.rst
> @@ -56,6 +56,10 @@ New Features
>       Also, make sure to start the actual text at the margin.
>       =========================================================
> 
> +* **Added hairpin queue.**
> +
> +  On supported NICs, we can now set up a hairpin queue, which will
> +  offload packets from the wire back to the wire.
> 
>  Removed Items
>  -------------
> @@ -234,4 +238,5 @@ Tested Platforms
>    * Added support for VLAN push flow offload command.
>    * Added support for VLAN set PCP offload command.
>    * Added support for VLAN set VID offload command.
> +  * Added hairpin support.
> 
> --
> 1.8.3.1


^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [dpdk-dev] [PATCH 01/13] ethdev: support setup function for hairpin queue
  2019-09-26  6:28 ` [dpdk-dev] [PATCH 01/13] ethdev: support setup function for hairpin queue Ori Kam
@ 2019-09-26 12:18   ` Andrew Rybchenko
       [not found]     ` <AM0PR0502MB4019A2FEADE5F9DCD0D9DDFED2860@AM0PR0502MB4019.eurprd05.prod.outlook.com>
  2019-10-03 18:39     ` Ray Kinsella
  0 siblings, 2 replies; 186+ messages in thread
From: Andrew Rybchenko @ 2019-09-26 12:18 UTC (permalink / raw)
  To: Ori Kam, Thomas Monjalon, Ferruh Yigit; +Cc: dev, jingjing.wu, stephen

On 9/26/19 9:28 AM, Ori Kam wrote:
> This commit introduces the RX/TX hairpin setup function.

RX/TX should be Rx/Tx here and everywhere below.

> Hairpin is an RX/TX queue that is used by the NIC in order to offload
> wire to wire traffic.
>
> Each hairpin queue is bound to one or more queues of the other type.
> For example, a TX hairpin queue should be bound to at least 1 RX hairpin
> queue and vice versa.

How should the application find out that hairpin queues are supported?
How many?
How should the application find out which ports/queues could be used for pinning?
Is a hairpin domain at the device level sufficient to expose the limitations?

> Signed-off-by: Ori Kam <orika@mellanox.com>
> ---
>   lib/librte_ethdev/rte_ethdev.c           | 213 +++++++++++++++++++++++++++++++
>   lib/librte_ethdev/rte_ethdev.h           | 145 +++++++++++++++++++++
>   lib/librte_ethdev/rte_ethdev_core.h      |  18 +++
>   lib/librte_ethdev/rte_ethdev_version.map |   4 +
>   4 files changed, 380 insertions(+)
>
> diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
> index 30b0c78..4021f38 100644
> --- a/lib/librte_ethdev/rte_ethdev.c
> +++ b/lib/librte_ethdev/rte_ethdev.c
> @@ -1701,6 +1701,115 @@ struct rte_eth_dev *
>   }
>   
>   int
> +rte_eth_rx_hairpin_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
> +			       uint16_t nb_rx_desc, unsigned int socket_id,
> +			       const struct rte_eth_rxconf *rx_conf,
> +			       const struct rte_eth_hairpin_conf *hairpin_conf)

The code below duplicates rte_eth_rx_queue_setup() a lot, and that is very
bad from a maintenance point of view. Similar problem with the Tx hairpin
queue setup.

> +{
> +	int ret;
> +	struct rte_eth_dev *dev;
> +	struct rte_eth_dev_info dev_info;
> +	struct rte_eth_rxconf local_conf;
> +	void **rxq;
> +
> +	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
> +
> +	dev = &rte_eth_devices[port_id];
> +	if (rx_queue_id >= dev->data->nb_rx_queues) {
> +		RTE_ETHDEV_LOG(ERR, "Invalid RX queue_id=%u\n", rx_queue_id);
> +		return -EINVAL;
> +	}
> +
> +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_infos_get, -ENOTSUP);
> +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_hairpin_queue_setup,
> +				-ENOTSUP);
> +
> +	rte_eth_dev_info_get(port_id, &dev_info);
> +
> +	/* Use default specified by driver, if nb_rx_desc is zero */
> +	if (nb_rx_desc == 0) {
> +		nb_rx_desc = dev_info.default_rxportconf.ring_size;
> +		/* If driver default is also zero, fall back on EAL default */
> +		if (nb_rx_desc == 0)
> +			nb_rx_desc = RTE_ETH_DEV_FALLBACK_RX_RINGSIZE;
> +	}
> +
> +	if (nb_rx_desc > dev_info.rx_desc_lim.nb_max ||
> +			nb_rx_desc < dev_info.rx_desc_lim.nb_min ||
> +			nb_rx_desc % dev_info.rx_desc_lim.nb_align != 0) {
> +
> +		RTE_ETHDEV_LOG(ERR,
> +			       "Invalid value for nb_rx_desc(=%hu), should be: "
> +			       "<= %hu, >= %hu, and a product of %hu\n",
> +			nb_rx_desc, dev_info.rx_desc_lim.nb_max,
> +			dev_info.rx_desc_lim.nb_min,
> +			dev_info.rx_desc_lim.nb_align);
> +		return -EINVAL;
> +	}
> +
> +	if (dev->data->dev_started &&
> +		!(dev_info.dev_capa &
> +			RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP))
> +		return -EBUSY;
> +
> +	if (dev->data->dev_started &&
> +		(dev->data->rx_queue_state[rx_queue_id] !=
> +			RTE_ETH_QUEUE_STATE_STOPPED))
> +		return -EBUSY;
> +
> +	rxq = dev->data->rx_queues;
> +	if (rxq[rx_queue_id]) {
> +		RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_release,
> +					-ENOTSUP);
> +		(*dev->dev_ops->rx_queue_release)(rxq[rx_queue_id]);
> +		rxq[rx_queue_id] = NULL;
> +	}
> +
> +	if (rx_conf == NULL)
> +		rx_conf = &dev_info.default_rxconf;
> +
> +	local_conf = *rx_conf;
> +
> +	/*
> +	 * If an offloading has already been enabled in
> +	 * rte_eth_dev_configure(), it has been enabled on all queues,
> +	 * so there is no need to enable it in this queue again.
> +	 * The local_conf.offloads input to underlying PMD only carries
> +	 * those offloadings which are only enabled on this queue and
> +	 * not enabled on all queues.
> +	 */
> +	local_conf.offloads &= ~dev->data->dev_conf.rxmode.offloads;
> +
> +	/*
> +	 * New added offloadings for this queue are those not enabled in
> +	 * rte_eth_dev_configure() and they must be per-queue type.
> +	 * A pure per-port offloading can't be enabled on a queue while
> +	 * disabled on another queue. A pure per-port offloading can't
> +	 * be enabled for any queue as new added one if it hasn't been
> +	 * enabled in rte_eth_dev_configure().
> +	 */
> +	if ((local_conf.offloads & dev_info.rx_queue_offload_capa) !=
> +	     local_conf.offloads) {
> +		RTE_ETHDEV_LOG(ERR,
> +			"Ethdev port_id=%d rx_queue_id=%d, "
> +			"new added offloads 0x%"PRIx64" must be "
> +			"within per-queue offload capabilities "
> +			"0x%"PRIx64" in %s()\n",
> +			port_id, rx_queue_id, local_conf.offloads,
> +			dev_info.rx_queue_offload_capa,
> +			__func__);
> +		return -EINVAL;
> +	}
> +
> +	ret = (*dev->dev_ops->rx_hairpin_queue_setup)(dev, rx_queue_id,
> +						      nb_rx_desc, socket_id,
> +						      &local_conf,
> +						      hairpin_conf);
> +
> +	return eth_err(port_id, ret);
> +}
> +
> +int
>   rte_eth_tx_queue_setup(uint16_t port_id, uint16_t tx_queue_id,
>   		       uint16_t nb_tx_desc, unsigned int socket_id,
>   		       const struct rte_eth_txconf *tx_conf)
> @@ -1799,6 +1908,110 @@ struct rte_eth_dev *
>   		       tx_queue_id, nb_tx_desc, socket_id, &local_conf));
>   }
>   
> +int
> +rte_eth_tx_hairpin_queue_setup(uint16_t port_id, uint16_t tx_queue_id,
> +			       uint16_t nb_tx_desc, unsigned int socket_id,
> +			       const struct rte_eth_txconf *tx_conf,
> +			       const struct rte_eth_hairpin_conf *hairpin_conf)
> +{
> +	struct rte_eth_dev *dev;
> +	struct rte_eth_dev_info dev_info;
> +	struct rte_eth_txconf local_conf;
> +	void **txq;
> +
> +	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
> +
> +	dev = &rte_eth_devices[port_id];
> +	if (tx_queue_id >= dev->data->nb_tx_queues) {
> +		RTE_ETHDEV_LOG(ERR, "Invalid TX queue_id=%u\n", tx_queue_id);
> +		return -EINVAL;
> +	}
> +
> +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_infos_get, -ENOTSUP);
> +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_hairpin_queue_setup,
> +				-ENOTSUP);
> +
> +	rte_eth_dev_info_get(port_id, &dev_info);
> +
> +	/* Use default specified by driver, if nb_tx_desc is zero */
> +	if (nb_tx_desc == 0) {
> +		nb_tx_desc = dev_info.default_txportconf.ring_size;
> +		/* If driver default is zero, fall back on EAL default */
> +		if (nb_tx_desc == 0)
> +			nb_tx_desc = RTE_ETH_DEV_FALLBACK_TX_RINGSIZE;
> +	}
> +	if (nb_tx_desc > dev_info.tx_desc_lim.nb_max ||
> +	    nb_tx_desc < dev_info.tx_desc_lim.nb_min ||
> +	    nb_tx_desc % dev_info.tx_desc_lim.nb_align != 0) {
> +		RTE_ETHDEV_LOG(ERR,
> +			       "Invalid value for nb_tx_desc(=%hu), "
> +			       "should be: <= %hu, >= %hu, and a product of "
> +			       " %hu\n",
> +			       nb_tx_desc, dev_info.tx_desc_lim.nb_max,
> +			       dev_info.tx_desc_lim.nb_min,
> +			       dev_info.tx_desc_lim.nb_align);
> +		return -EINVAL;
> +	}
> +
> +	if (dev->data->dev_started &&
> +		!(dev_info.dev_capa &
> +		  RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP))
> +		return -EBUSY;
> +
> +	if (dev->data->dev_started &&
> +		(dev->data->tx_queue_state[tx_queue_id] !=
> +		 RTE_ETH_QUEUE_STATE_STOPPED))
> +		return -EBUSY;
> +
> +	txq = dev->data->tx_queues;
> +	if (txq[tx_queue_id]) {
> +		RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_queue_release,
> +					-ENOTSUP);
> +		(*dev->dev_ops->tx_queue_release)(txq[tx_queue_id]);
> +		txq[tx_queue_id] = NULL;
> +	}
> +
> +	if (tx_conf == NULL)
> +		tx_conf = &dev_info.default_txconf;
> +
> +	local_conf = *tx_conf;
> +
> +	/*
> +	 * If an offloading has already been enabled in
> +	 * rte_eth_dev_configure(), it has been enabled on all queues,
> +	 * so there is no need to enable it in this queue again.
> +	 * The local_conf.offloads input to underlying PMD only carries
> +	 * those offloadings which are only enabled on this queue and
> +	 * not enabled on all queues.
> +	 */
> +	local_conf.offloads &= ~dev->data->dev_conf.txmode.offloads;
> +
> +	/*
> +	 * New added offloadings for this queue are those not enabled in
> +	 * rte_eth_dev_configure() and they must be per-queue type.
> +	 * A pure per-port offloading can't be enabled on a queue while
> +	 * disabled on another queue. A pure per-port offloading can't
> +	 * be enabled for any queue as new added one if it hasn't been
> +	 * enabled in rte_eth_dev_configure().
> +	 */
> +	if ((local_conf.offloads & dev_info.tx_queue_offload_capa) !=
> +	     local_conf.offloads) {
> +		RTE_ETHDEV_LOG(ERR,
> +			       "Ethdev port_id=%d tx_queue_id=%d, new added "
> +			       "offloads 0x%"PRIx64" must be within "
> +			       "per-queue offload capabilities 0x%"PRIx64" "
> +			       "in %s()\n",
> +			       port_id, tx_queue_id, local_conf.offloads,
> +			       dev_info.tx_queue_offload_capa,
> +			       __func__);
> +		return -EINVAL;
> +	}
> +
> +	return eth_err(port_id, (*dev->dev_ops->tx_hairpin_queue_setup)
> +		       (dev, tx_queue_id, nb_tx_desc, socket_id, &local_conf,
> +			hairpin_conf));
> +}
> +
>   void
>   rte_eth_tx_buffer_drop_callback(struct rte_mbuf **pkts, uint16_t unsent,
>   		void *userdata __rte_unused)
> diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
> index 475dbda..b3b1597 100644
> --- a/lib/librte_ethdev/rte_ethdev.h
> +++ b/lib/librte_ethdev/rte_ethdev.h
> @@ -803,6 +803,30 @@ struct rte_eth_txconf {
>   	uint64_t offloads;
>   };
>   
> +#define RTE_ETH_MAX_HAIRPIN_PEERS 32
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
> + *
> + * A structure used to hold hairpin peer data.
> + */
> +struct rte_eth_hairpin_peer {
> +	uint16_t port; /**< Peer port. */
> +	uint16_t queue; /**< Peer queue. */
> +};
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
> + *
> + * A structure used to configure hairpin binding.
> + */
> +struct rte_eth_hairpin_conf {
> +	uint16_t peer_n; /**< The number of peers. */
> +	struct rte_eth_hairpin_peer peers[RTE_ETH_MAX_HAIRPIN_PEERS];
> +};
> +
>   /**
>    * A structure contains information about HW descriptor ring limitations.
>    */
> @@ -1769,6 +1793,60 @@ int rte_eth_rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
>   		struct rte_mempool *mb_pool);
>   
>   /**
> + * @warning
> + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
> + *
> + * Allocate and set up a hairpin receive queue for an Ethernet device.
> + *
> + * The function set up the selected queue to be used in hairpin.
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param rx_queue_id
> + *   The index of the receive queue to set up.
> + *   The value must be in the range [0, nb_rx_queue - 1] previously supplied
> + *   to rte_eth_dev_configure().

May any Rx queue be set up as a hairpin queue?
Can it still be used for regular traffic?

> + * @param nb_rx_desc
> + *   The number of receive descriptors to allocate for the receive ring.

Does it still make sense for hairpin queue?

> + * @param socket_id
> + *   The *socket_id* argument is the socket identifier in case of NUMA.
> + *   The value can be *SOCKET_ID_ANY* if there is no NUMA constraint for
> + *   the DMA memory allocated for the receive descriptors of the ring.

Is it still required to be provided for hairpin Rx queue?

> + * @param rx_conf
> + *   The pointer to the configuration data to be used for the receive queue.
> + *   NULL value is allowed, in which case default RX configuration
> + *   will be used.
> + *   The *rx_conf* structure contains an *rx_thresh* structure with the values
> + *   of the Prefetch, Host, and Write-Back threshold registers of the receive
> + *   ring.
> + *   In addition it contains the hardware offloads features to activate using
> + *   the DEV_RX_OFFLOAD_* flags.
> + *   If an offloading set in rx_conf->offloads
> + *   hasn't been set in the input argument eth_conf->rxmode.offloads
> + *   to rte_eth_dev_configure(), it is a new added offloading, it must be
> + *   per-queue type and it is enabled for the queue.
> + *   No need to repeat any bit in rx_conf->offloads which has already been
> + *   enabled in rte_eth_dev_configure() at port level. An offloading enabled
> + *   at port level can't be disabled at queue level.

Which offloads still make sense in the case of a hairpin Rx queue?
What about thresholds, drop enable?

> + * @param hairpin_conf
> + *   The pointer to the hairpin binding configuration.
> + * @return
> + *   - 0: Success, receive queue correctly set up.
> + *   - -EINVAL: The size of network buffers which can be allocated from the
> + *      memory pool does not fit the various buffer sizes allowed by the
> + *      device controller.
> + *   - -ENOMEM: Unable to allocate the receive ring descriptors or to
> + *      allocate network memory buffers from the memory pool when
> + *      initializing receive descriptors.
> + */
> +__rte_experimental
> +int rte_eth_rx_hairpin_queue_setup
> +	(uint16_t port_id, uint16_t rx_queue_id,
> +	 uint16_t nb_rx_desc, unsigned int socket_id,
> +	 const struct rte_eth_rxconf *rx_conf,
> +	 const struct rte_eth_hairpin_conf *hairpin_conf);
> +
> +/**
>    * Allocate and set up a transmit queue for an Ethernet device.
>    *
>    * @param port_id
> @@ -1821,6 +1899,73 @@ int rte_eth_tx_queue_setup(uint16_t port_id, uint16_t tx_queue_id,
>   		const struct rte_eth_txconf *tx_conf);
>   
>   /**
> + * @warning
> + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
> + *
> + * Allocate and set up a transmit hairpin queue for an Ethernet device.
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param tx_queue_id
> + *   The index of the transmit queue to set up.
> + *   The value must be in the range [0, nb_tx_queue - 1] previously supplied
> + *   to rte_eth_dev_configure().

May any Tx queue be set up as a hairpin queue?

> + * @param nb_tx_desc
> + *   The number of transmit descriptors to allocate for the transmit ring.

Is it really required for hairpin queue? Are min/max/align limits still 
the same?

> + * @param socket_id
> + *   The *socket_id* argument is the socket identifier in case of NUMA.
> + *   Its value can be *SOCKET_ID_ANY* if there is no NUMA constraint for
> + *   the DMA memory allocated for the transmit descriptors of the ring.

Does it still make sense for Tx hairpin queue?

> + * @param tx_conf
> + *   The pointer to the configuration data to be used for the transmit queue.
> + *   NULL value is allowed, in which case default RX configuration
> + *   will be used.
> + *   The *tx_conf* structure contains the following data:
> + *   - The *tx_thresh* structure with the values of the Prefetch, Host, and
> + *     Write-Back threshold registers of the transmit ring.
> + *     When setting Write-Back threshold to the value greater then zero,
> + *     *tx_rs_thresh* value should be explicitly set to one.
> + *   - The *tx_free_thresh* value indicates the [minimum] number of network
> + *     buffers that must be pending in the transmit ring to trigger their
> + *     [implicit] freeing by the driver transmit function.
> + *   - The *tx_rs_thresh* value indicates the [minimum] number of transmit
> + *     descriptors that must be pending in the transmit ring before setting the
> + *     RS bit on a descriptor by the driver transmit function.
> + *     The *tx_rs_thresh* value should be less or equal then
> + *     *tx_free_thresh* value, and both of them should be less then
> + *     *nb_tx_desc* - 3.

I'm not sure that everything above makes sense for hairpin Tx queue.

> + *   - The *txq_flags* member contains flags to pass to the TX queue setup
> + *     function to configure the behavior of the TX queue. This should be set
> + *     to 0 if no special configuration is required.
> + *     This API is obsolete and will be deprecated. Applications
> + *     should set it to ETH_TXQ_FLAGS_IGNORE and use
> + *     the offloads field below.

There has been no txq_flags for a long time already. So, I'm wondering when
it was copied from rte_eth_tx_queue_setup().

> + *   - The *offloads* member contains Tx offloads to be enabled.
> + *     If an offloading set in tx_conf->offloads
> + *     hasn't been set in the input argument eth_conf->txmode.offloads
> + *     to rte_eth_dev_configure(), it is a new added offloading, it must be
> + *     per-queue type and it is enabled for the queue.
> + *     No need to repeat any bit in tx_conf->offloads which has already been
> + *     enabled in rte_eth_dev_configure() at port level. An offloading enabled
> + *     at port level can't be disabled at queue level.

Which offloads really make sense and are valid to use for hairpin Tx queues?
Do we need separate caps for hairpin offloads?

> + *
> + *     Note that setting *tx_free_thresh* or *tx_rs_thresh* value to 0 forces
> + *     the transmit function to use default values.
> + * @param hairpin_conf
> + *   The hairpin binding configuration.
> + *
> + * @return
> + *   - 0: Success, the transmit queue is correctly set up.
> + *   - -ENOMEM: Unable to allocate the transmit ring descriptors.
> + */
> +__rte_experimental
> +int rte_eth_tx_hairpin_queue_setup
> +	(uint16_t port_id, uint16_t tx_queue_id,
> +	 uint16_t nb_tx_desc, unsigned int socket_id,
> +	 const struct rte_eth_txconf *tx_conf,
> +	 const struct rte_eth_hairpin_conf *hairpin_conf);
> +
> +/**
>    * Return the NUMA socket to which an Ethernet device is connected
>    *
>    * @param port_id
>

[snip]

^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [dpdk-dev] [PATCH 00/13] add hairpin feature
  2019-09-26  6:28 [dpdk-dev] [PATCH 00/13] add hairpin feature Ori Kam
                   ` (12 preceding siblings ...)
  2019-09-26  6:29 ` [dpdk-dev] [PATCH 13/13] doc: add hairpin feature Ori Kam
@ 2019-09-26 12:32 ` Andrew Rybchenko
  2019-09-26 15:22   ` Ori Kam
  2019-10-04 19:54 ` [dpdk-dev] [PATCH v2 00/14] " Ori Kam
                   ` (5 subsequent siblings)
  19 siblings, 1 reply; 186+ messages in thread
From: Andrew Rybchenko @ 2019-09-26 12:32 UTC (permalink / raw)
  To: Ori Kam
  Cc: dev, jingjing.wu, stephen, wenzhuo.lu, bernard.iremonger, thomas,
	ferruh.yigit, viacheslavo

On 9/26/19 9:28 AM, Ori Kam wrote:
> This patch set implements the hairpin feature.
> The hairpin feature was introduced in RFC[1]
>
> The hairpin feature (different name can be forward) acts as "bump on the wire",
> meaning that a packet that is received from the wire can be modified using
> offloaded action and then sent back to the wire without application intervention
> which save CPU cycles.
>
> The hairpin is the inverse function of loopback in which application
> sends a packet then it is received again by the
> application without being sent to the wire.
>
> The hairpin can be used by a number of different NVF, for example load
> balancer, gateway and so on.
>
> As can be seen from the hairpin description, hairpin is basically RX queue
> connected to TX queue.

Is it just a pipe, or are RTE flow API rules required?
If it is just a pipe, what about transformations which could be useful in
this case (encaps/decaps, NAT, etc.)? How can they be achieved?
If it is not a pipe and flow API rules are required, why is the peer
information required?

> During the design phase I was thinking of two ways to implement this
> feature the first one is adding a new rte flow action. and the second
> one is create a special kind of queue.
>
> The advantages of using the queue approch:
> 1. More control for the application. queue depth (the memory size that
> should be used).

But it inherits many parameters which are not really applicable to hairpin
queues. If all parameters are applicable, it should be explained in the
context of the hairpin queues.

> 2. Enable QoS. QoS is normaly a parametr of queue, so in this approch it
> will be easy to integrate with such system.

Could you elaborate on it?

> 3. Native integression with the rte flow API. Just setting the target
> queue/rss to hairpin queue, will result that the traffic will be routed
> to the hairpin queue.

It sounds like queues are not required for the flow API at all.
If the goal is to send traffic outside to a specified physical port,
just specify it as a flow API action. That's it.

> 4. Enable queue offloading.

Which offloads are applicable to hairpin queues?

> Each hairpin Rxq can be connected Txq / number of Txqs which can belong to a
> different ports assuming the PMD supports it. The same goes the other
> way each hairpin Txq can be connected to one or more Rxqs.
> This is the reason that both the Txq setup and Rxq setup are getting the
> hairpin configuration structure.
>
>  From PMD prespctive the number of Rxq/Txq is the total of standard
> queues + hairpin queues.
>
> To configure hairpin queue the user should call
> rte_eth_rx_hairpin_queue_setup / rte_eth_tx_hairpin_queue_setup insteed
> of the normal queue setup functions.
>
> The hairpin queues are not part of the normal RSS functiosn.
>
> To use the queues the user simply create a flow that points to RSS/queue
> actions that are hairpin queues.
> The reason for selecting 2 new functions for hairpin queue setup are:
> 1. avoid API break.
> 2. avoid extra and unused parameters.
>
>
> This series must be applied after series[2]
>
> [1] https://inbox.dpdk.org/dev/1565703468-55617-1-git-send-email-orika@mellanox.com/
> [2] https://inbox.dpdk.org/dev/1569398015-6027-1-git-send-email-viacheslavo@mellanox.com/

[snip]


^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [dpdk-dev] [PATCH 00/13] add hairpin feature
  2019-09-26 12:32 ` [dpdk-dev] [PATCH 00/13] " Andrew Rybchenko
@ 2019-09-26 15:22   ` Ori Kam
  2019-09-26 15:48     ` Andrew Rybchenko
  0 siblings, 1 reply; 186+ messages in thread
From: Ori Kam @ 2019-09-26 15:22 UTC (permalink / raw)
  To: Andrew Rybchenko
  Cc: dev, jingjing.wu, stephen, wenzhuo.lu, bernard.iremonger,
	Thomas Monjalon, ferruh.yigit, Slava Ovsiienko

Hi Andrew,
Thanks for your comments, please see below.

> -----Original Message-----
> From: Andrew Rybchenko <arybchenko@solarflare.com>
> 
> On 9/26/19 9:28 AM, Ori Kam wrote:
> > This patch set implements the hairpin feature.
> > The hairpin feature was introduced in RFC[1]
> >
> > The hairpin feature (different name can be forward) acts as "bump on the
> wire",
> > meaning that a packet that is received from the wire can be modified using
> > offloaded action and then sent back to the wire without application
> intervention
> > which save CPU cycles.
> >
> > The hairpin is the inverse function of loopback in which application
> > sends a packet then it is received again by the
> > application without being sent to the wire.
> >
> > The hairpin can be used by a number of different NVF, for example load
> > balancer, gateway and so on.
> >
> > As can be seen from the hairpin description, hairpin is basically RX queue
> > connected to TX queue.
> 
> Is it just a pipe or RTE flow API rules required?
> If it is just a pipe, what about transformations which could be useful
> in this
> case (encaps/decaps, NAT etc)? How to achieve it?
> If it is not a pipe and flow API rules are required, why is peer information
> required?
> 

RTE flow is required, and the peer information is needed in order to connect the Rx queue to the
Tx queue. From the application's point of view, it simply sets an ingress RTE flow rule that has queue or RSS
actions, with queues that are hairpin queues, as sketched below.
It may be possible to have one Rx queue connected to a number of Tx queues in order to distribute the sending.
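
For example, a rough sketch of this, based on the API proposed in this
series (error checks omitted; port_id is assumed to be a valid port, and
the ring size, queue indexes and empty match pattern are only
placeholders). Rx hairpin queue 1 is peered with Tx hairpin queue 1 on
the same port, and a flow rule steers all ingress traffic to it:

	struct rte_eth_hairpin_conf hairpin_conf = {
		.peer_n = 1,
		.peers[0] = { .port = port_id, .queue = 1 },
	};
	struct rte_flow_attr attr = { .ingress = 1 };
	struct rte_flow_item pattern[] = {
		{ .type = RTE_FLOW_ITEM_TYPE_ETH },
		{ .type = RTE_FLOW_ITEM_TYPE_END },
	};
	struct rte_flow_action_queue queue = { .index = 1 };
	struct rte_flow_action actions[] = {
		{ .type = RTE_FLOW_ACTION_TYPE_QUEUE, .conf = &queue },
		{ .type = RTE_FLOW_ACTION_TYPE_END },
	};
	struct rte_flow_error flow_err;

	/* Called at init time instead of the normal queue setup calls. */
	rte_eth_rx_hairpin_queue_setup(port_id, 1, 512, SOCKET_ID_ANY,
				       NULL, &hairpin_conf);
	rte_eth_tx_hairpin_queue_setup(port_id, 1, 512, SOCKET_ID_ANY,
				       NULL, &hairpin_conf);
	/* Matching traffic is now forwarded back to the wire by the HW. */
	rte_flow_create(port_id, &attr, pattern, actions, &flow_err);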
 
> > During the design phase I was thinking of two ways to implement this
> > feature the first one is adding a new rte flow action. and the second
> > one is create a special kind of queue.
> >
> > The advantages of using the queue approch:
> > 1. More control for the application. queue depth (the memory size that
> > should be used).
> 
> But it inherits many parameters which are not really applicable to hairpin
> queues. If all parameters are applicable, it should be explained in the
> context of the hairpin queues.
> 
Most if not all parameters are applicable also for hairpin queues,
and the ones that weren't, for example the mempool, were removed.

> > 2. Enable QoS. QoS is normaly a parametr of queue, so in this approch it
> > will be easy to integrate with such system.
> 
> Could you elaborate it.
> 
I will try.
If you are asking about use cases, we can assume a cloud provider that has a number
of customers, each with different bandwidth. We can configure a Tx queue with higher
priority, which will result in that queue getting more bandwidth.
This is true for both hairpin and non-hairpin queues.
We are working on a more detailed API for how to use it, but the HW can support it.

> > 3. Native integression with the rte flow API. Just setting the target
> > queue/rss to hairpin queue, will result that the traffic will be routed
> > to the hairpin queue.
> 
> It sounds like queues are not required for flow API at all.
> If the goal is to send traffic outside to specified physical port,
> just specify it as a flow API action. That's it.
> 
This was one of the possible options, but as stated above we think there is more meaning in looking
at it as a queue, which gives the application better control, for example selecting which queues
to connect to which queues. If it had been done as an RTE flow action, the PMD would create the
queues and the binding internally, and the application would lose control.

> > 4. Enable queue offloading.
> 
> Which offloads are applicable to hairpin queues?
> 
VLAN stripping, for example, and all of the rte_flow actions that target a queue; see the sketch below.
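
For example, a minimal sketch (assuming the PMD accepts this offload on a
hairpin queue; dev_info, port_id, queue_id, nb_desc and hairpin_conf are
assumed to be already set up):

	struct rte_eth_rxconf rxconf = dev_info.default_rxconf;

	/* Request VLAN stripping on the hairpin Rx queue, the same way it
	 * would be requested on a normal Rx queue.
	 */
	rxconf.offloads |= DEV_RX_OFFLOAD_VLAN_STRIP;
	rte_eth_rx_hairpin_queue_setup(port_id, queue_id, nb_desc,
				       SOCKET_ID_ANY, &rxconf,
				       &hairpin_conf);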

> > Each hairpin Rxq can be connected Txq / number of Txqs which can belong to
> a
> > different ports assuming the PMD supports it. The same goes the other
> > way each hairpin Txq can be connected to one or more Rxqs.
> > This is the reason that both the Txq setup and Rxq setup are getting the
> > hairpin configuration structure.
> >
> >  From PMD prespctive the number of Rxq/Txq is the total of standard
> > queues + hairpin queues.
> >
> > To configure hairpin queue the user should call
> > rte_eth_rx_hairpin_queue_setup / rte_eth_tx_hairpin_queue_setup insteed
> > of the normal queue setup functions.
> >
> > The hairpin queues are not part of the normal RSS functiosn.
> >
> > To use the queues the user simply create a flow that points to RSS/queue
> > actions that are hairpin queues.
> > The reason for selecting 2 new functions for hairpin queue setup are:
> > 1. avoid API break.
> > 2. avoid extra and unused parameters.
> >
> >
> > This series must be applied after series[2]
> >
> > [1] https://inbox.dpdk.org/dev/1565703468-55617-1-git-send-email-orika@mellanox.com/
> > [2] https://inbox.dpdk.org/dev/1569398015-6027-1-git-send-email-viacheslavo@mellanox.com/
> 
> [snip]

Thanks
Ori

^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [dpdk-dev] [PATCH 00/13] add hairpin feature
  2019-09-26 15:22   ` Ori Kam
@ 2019-09-26 15:48     ` Andrew Rybchenko
  2019-09-26 16:11       ` Ori Kam
  0 siblings, 1 reply; 186+ messages in thread
From: Andrew Rybchenko @ 2019-09-26 15:48 UTC (permalink / raw)
  To: Ori Kam
  Cc: dev, jingjing.wu, stephen, wenzhuo.lu, bernard.iremonger,
	Thomas Monjalon, ferruh.yigit, Slava Ovsiienko

Hi Ori,

On 9/26/19 6:22 PM, Ori Kam wrote:
> Hi Andrew,
> Thanks for your comments please see blow.
>
>> -----Original Message-----
>> From: Andrew Rybchenko <arybchenko@solarflare.com>
>>
>> On 9/26/19 9:28 AM, Ori Kam wrote:
>>> This patch set implements the hairpin feature.
>>> The hairpin feature was introduced in RFC[1]
>>>
>>> The hairpin feature (different name can be forward) acts as "bump on the
>> wire",
>>> meaning that a packet that is received from the wire can be modified using
>>> offloaded action and then sent back to the wire without application
>> intervention
>>> which save CPU cycles.
>>>
>>> The hairpin is the inverse function of loopback in which application
>>> sends a packet then it is received again by the
>>> application without being sent to the wire.
>>>
>>> The hairpin can be used by a number of different NVF, for example load
>>> balancer, gateway and so on.
>>>
>>> As can be seen from the hairpin description, hairpin is basically RX queue
>>> connected to TX queue.
>> Is it just a pipe or RTE flow API rules required?
>> If it is just a pipe, what about transformations which could be useful
>> in this
>> case (encaps/decaps, NAT etc)? How to achieve it?
>> If it is not a pipe and flow API rules are required, why is peer information
>> required?
>>
> RTE flow is required, and the peer information is needed in order to connect between the RX queue to the
> TX queue. From application it simply set ingress RTE flow rule that has queue or RSS actions,
> with queues that are hairpin queues.
> It may be possible to have one RX connected to number of TX queues in order to distribute the sending.

It looks like I am starting to understand. First, RTE flow does its job and
redirects some packets to hairpin Rx queue(s). Then, the connection
of hairpin Rx queues to Tx queues does its job. What happens if
an Rx queue is connected to many Tx queues? Are the packets duplicated?

>>> During the design phase I was thinking of two ways to implement this
>>> feature the first one is adding a new rte flow action. and the second
>>> one is create a special kind of queue.
>>>
>>> The advantages of using the queue approch:
>>> 1. More control for the application. queue depth (the memory size that
>>> should be used).
>> But it inherits many parameters which are not really applicable to hairpin
>> queues. If all parameters are applicable, it should be explained in the
>> context of the hairpin queues.
>>
> Most if not all parameters can be applicable also for hairpin queue.
> And the one that wasn’t for example mempool was removed.

I would really like to understand the meaning of each Rx/Tx queue
configuration parameter for the hairpin case. So, I hope to see it in the
documentation.

>>> 2. Enable QoS. QoS is normaly a parametr of queue, so in this approch it
>>> will be easy to integrate with such system.
>> Could you elaborate it.
>>
> I will try.
> If you are asking about use cases, we can assume a cloud provider that has number
> of customers each with different bandwidth. We can configure a Tx queue with higher
> priority which will result in that this queue will get more bandwidth.
> This is true also for hairpin and non-hairpin.
> We are working on more detail API how to use it, but the HW can support it.

OK, a bit abstract still, but makes sense.

>>> 3. Native integression with the rte flow API. Just setting the target
>>> queue/rss to hairpin queue, will result that the traffic will be routed
>>> to the hairpin queue.
>> It sounds like queues are not required for flow API at all.
>> If the goal is to send traffic outside to specified physical port,
>> just specify it as a flow API action. That's it.
>>
> This was one of the possible options, but like stated above we think that there is more meaning to look
> at it as a queue, which will give the application better control, for example selecting which queues
> to connect to which queues. If it would have been done as RTE flow action then the PMD will create the queues and
> binding internally and the application will lose control.
>
>>> 4. Enable queue offloading.
>> Which offloads are applicable to hairpin queues?
>>
> Vlan striping for example,  and all of the rte flow actions that targets a queue.

Can it be done with the VLAN_POP action at the RTE flow level?
The question is why we need it here as an Rx queue offload.
Who will get and process the stripped VLAN?
I don't understand what you mean by the rte_flow actions here.
Sorry, but I still think that many Rx and Tx offloads are not applicable.

>>> Each hairpin Rxq can be connected Txq / number of Txqs which can belong to
>> a
>>> different ports assuming the PMD supports it. The same goes the other
>>> way each hairpin Txq can be connected to one or more Rxqs.
>>> This is the reason that both the Txq setup and Rxq setup are getting the
>>> hairpin configuration structure.
>>>
>>>   From PMD prespctive the number of Rxq/Txq is the total of standard
>>> queues + hairpin queues.
>>>
>>> To configure hairpin queue the user should call
>>> rte_eth_rx_hairpin_queue_setup / rte_eth_tx_hairpin_queue_setup insteed
>>> of the normal queue setup functions.
>>>
>>> The hairpin queues are not part of the normal RSS functiosn.
>>>
>>> To use the queues the user simply create a flow that points to RSS/queue
>>> actions that are hairpin queues.
>>> The reason for selecting 2 new functions for hairpin queue setup are:
>>> 1. avoid API break.
>>> 2. avoid extra and unused parameters.
>>>
>>>
>>> This series must be applied after series[2]
>>>
>>> [1] https://inbox.dpdk.org/dev/1565703468-55617-1-git-send-email-orika@mellanox.com/
>>> [2] https://inbox.dpdk.org/dev/1569398015-6027-1-git-send-email-viacheslavo@mellanox.com/
>>
>> [snip]
> Thanks
> Ori


^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [dpdk-dev] [PATCH 01/13] ethdev: support setup function for hairpin queue
       [not found]     ` <AM0PR0502MB4019A2FEADE5F9DCD0D9DDFED2860@AM0PR0502MB4019.eurprd05.prod.outlook.com>
@ 2019-09-26 15:58       ` Ori Kam
  2019-09-26 17:24         ` Andrew Rybchenko
  0 siblings, 1 reply; 186+ messages in thread
From: Ori Kam @ 2019-09-26 15:58 UTC (permalink / raw)
  To: Thomas Monjalon, Ferruh Yigit, Andrew Rybchenko; +Cc: dev, jingjing.wu, stephen

Hi Andrew,
Thanks for your comments PSB.
 
> -----Original Message-----
> From: Andrew Rybchenko <arybchenko@solarflare.com>
> On 9/26/19 9:28 AM, Ori Kam wrote:
> > This commit introduce the RX/TX hairpin setup function.
>
> RX/TX should be Rx/Tx here and everywhere below.
>
> > Hairpin is RX/TX queue that is used by the nic in order to offload
> > wire to wire traffic.
> >
> > Each hairpin queue is binded to one or more queues from other type.
> > For example TX hairpin queue should be binded to at least 1 RX hairpin
> > queue and vice versa.
>
> How should application find out that hairpin queues are supported?

It should be stated in the release notes of DPDK when a manufacturer adds support for this.
In addition, if the application tries to set up a hairpin queue and it fails, it can mean, depending on
the error, that hairpin is not supported; see the sketch below.
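
For example, a sketch of such a probe (queue_id, nb_desc and hairpin_conf
are assumed to be already prepared):

	/* -ENOTSUP means the PMD does not implement the hairpin setup
	 * callback at all; other error codes may only mean that this
	 * particular configuration is not supported.
	 */
	int ret = rte_eth_rx_hairpin_queue_setup(port_id, queue_id, nb_desc,
						 SOCKET_ID_ANY, NULL,
						 &hairpin_conf);
	if (ret == -ENOTSUP)
		printf("port %u: hairpin queues not supported\n", port_id);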

> How many?

There is no limit to the number of hairpin queues; from the application's point of view, all queues can be hairpin queues.

> How should application find out which ports/queues could be used for
> pinning?

All ports and queues can be supported. If the application requests an invalid combination, for example
(in the current Mellanox implementation) binding between two ports, then the setup function will fail.

If you would like, I can add a capability for this, but there are too many options. For example, the
number of queues and the binding limitations will all be very hard to declare.


> Is hair-pinning domain on device level sufficient to expose limitations?
>
I'm sorry but I don’t understand your question.

> > Signed-off-by: Ori Kam <orika@mellanox.com>
> > ---
> >   lib/librte_ethdev/rte_ethdev.c           | 213
>> +++++++++++++++++++++++++++++++
> >   lib/librte_ethdev/rte_ethdev.h           | 145 +++++++++++++++++++++
> >   lib/librte_ethdev/rte_ethdev_core.h      |  18 +++
> >   lib/librte_ethdev/rte_ethdev_version.map |   4 +
> >   4 files changed, 380 insertions(+)
> >
> > diff --git a/lib/librte_ethdev/rte_ethdev.c
> > b/lib/librte_ethdev/rte_ethdev.c index 30b0c78..4021f38 100644
> > --- a/lib/librte_ethdev/rte_ethdev.c
> > +++ b/lib/librte_ethdev/rte_ethdev.c
> > @@ -1701,6 +1701,115 @@ struct rte_eth_dev *
> >   }
> >
> >   int
> > +rte_eth_rx_hairpin_queue_setup(uint16_t port_id, uint16_t
> > rx_queue_id,
> > +			       uint16_t nb_rx_desc, unsigned int socket_id,
> > +			       const struct rte_eth_rxconf *rx_conf,
> > +			       const struct rte_eth_hairpin_conf *hairpin_conf)
>
> Below code duplicates rte_eth_rx_queue_setup() a lot and it is very bad
> from maintenance point of view. Similar problem with Tx hairpin queue
> setup.
>

I'm aware of that. The reasons I chose it are (the same goes for Tx):
1. Use the same function approach, meaning use the current setup function.
    The issues with this are:
     * API break.
     * It will have extra parameters; for example, the mempool will not be used
        for hairpin, and the hairpin configuration will not be used for a normal queue.
        It is possible to use a struct, but again that is an API break and some fields are not used.
     * We are just starting with the hairpin; most likely there will be modifications, so
         it is better to have a different function.
     * From the application's side it is understood that this is a different kind of queue,
         which shouldn't be used like a normal queue.

> > +{
> > +	int ret;
> > +	struct rte_eth_dev *dev;
> > +	struct rte_eth_dev_info dev_info;
> > +	struct rte_eth_rxconf local_conf;
> > +	void **rxq;
> > +
> > +	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
> > +
> > +	dev = &rte_eth_devices[port_id];
> > +	if (rx_queue_id >= dev->data->nb_rx_queues) {
> > +		RTE_ETHDEV_LOG(ERR, "Invalid RX queue_id=%u\n",
> rx_queue_id);
> > +		return -EINVAL;
> > +	}
> > +
> > +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_infos_get, -
> ENOTSUP);
> > +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops-
> >rx_hairpin_queue_setup,
> > +				-ENOTSUP);
> > +
> > +	rte_eth_dev_info_get(port_id, &dev_info);
> > +
> > +	/* Use default specified by driver, if nb_rx_desc is zero */
> > +	if (nb_rx_desc == 0) {
> > +		nb_rx_desc = dev_info.default_rxportconf.ring_size;
> > +		/* If driver default is also zero, fall back on EAL default */
> > +		if (nb_rx_desc == 0)
> > +			nb_rx_desc =
> RTE_ETH_DEV_FALLBACK_RX_RINGSIZE;
> > +	}
> > +
> > +	if (nb_rx_desc > dev_info.rx_desc_lim.nb_max ||
> > +			nb_rx_desc < dev_info.rx_desc_lim.nb_min ||
> > +			nb_rx_desc % dev_info.rx_desc_lim.nb_align != 0) {
> > +
> > +		RTE_ETHDEV_LOG(ERR,
> > +			       "Invalid value for nb_rx_desc(=%hu), should be: "
> > +			       "<= %hu, >= %hu, and a product of %hu\n",
> > +			nb_rx_desc, dev_info.rx_desc_lim.nb_max,
> > +			dev_info.rx_desc_lim.nb_min,
> > +			dev_info.rx_desc_lim.nb_align);
> > +		return -EINVAL;
> > +	}
> > +
> > +	if (dev->data->dev_started &&
> > +		!(dev_info.dev_capa &
> > +
> 	RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP))
> > +		return -EBUSY;
> > +
> > +	if (dev->data->dev_started &&
> > +		(dev->data->rx_queue_state[rx_queue_id] !=
> > +			RTE_ETH_QUEUE_STATE_STOPPED))
> > +		return -EBUSY;
> > +
> > +	rxq = dev->data->rx_queues;
> > +	if (rxq[rx_queue_id]) {
> > +		RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops-
> >rx_queue_release,
> > +					-ENOTSUP);
> > +		(*dev->dev_ops->rx_queue_release)(rxq[rx_queue_id]);
> > +		rxq[rx_queue_id] = NULL;
> > +	}
> > +
> > +	if (rx_conf == NULL)
> > +		rx_conf = &dev_info.default_rxconf;
> > +
> > +	local_conf = *rx_conf;
> > +
> > +	/*
> > +	 * If an offloading has already been enabled in
> > +	 * rte_eth_dev_configure(), it has been enabled on all queues,
> > +	 * so there is no need to enable it in this queue again.
> > +	 * The local_conf.offloads input to underlying PMD only carries
> > +	 * those offloadings which are only enabled on this queue and
> > +	 * not enabled on all queues.
> > +	 */
> > +	local_conf.offloads &= ~dev->data->dev_conf.rxmode.offloads;
> > +
> > +	/*
> > +	 * New added offloadings for this queue are those not enabled in
> > +	 * rte_eth_dev_configure() and they must be per-queue type.
> > +	 * A pure per-port offloading can't be enabled on a queue while
> > +	 * disabled on another queue. A pure per-port offloading can't
> > +	 * be enabled for any queue as new added one if it hasn't been
> > +	 * enabled in rte_eth_dev_configure().
> > +	 */
> > +	if ((local_conf.offloads & dev_info.rx_queue_offload_capa) !=
> > +	     local_conf.offloads) {
> > +		RTE_ETHDEV_LOG(ERR,
> > +			"Ethdev port_id=%d rx_queue_id=%d, "
> > +			"new added offloads 0x%"PRIx64" must be "
> > +			"within per-queue offload capabilities "
> > +			"0x%"PRIx64" in %s()\n",
> > +			port_id, rx_queue_id, local_conf.offloads,
> > +			dev_info.rx_queue_offload_capa,
> > +			__func__);
> > +		return -EINVAL;
> > +	}
> > +
> > +	ret = (*dev->dev_ops->rx_hairpin_queue_setup)(dev,
> rx_queue_id,
> > +						      nb_rx_desc, socket_id,
> > +						      &local_conf,
> > +						      hairpin_conf);
> > +
> > +	return eth_err(port_id, ret);
> > +}
> > +
> > +int
> >   rte_eth_tx_queue_setup(uint16_t port_id, uint16_t tx_queue_id,
> >   		       uint16_t nb_tx_desc, unsigned int socket_id,
> >   		       const struct rte_eth_txconf *tx_conf) @@ -1799,6
> +1908,110
> > @@ struct rte_eth_dev *
> >   		       tx_queue_id, nb_tx_desc, socket_id, &local_conf));
> >   }
> >
> > +int
> > +rte_eth_tx_hairpin_queue_setup(uint16_t port_id, uint16_t
> tx_queue_id,
> > +			       uint16_t nb_tx_desc, unsigned int socket_id,
> > +			       const struct rte_eth_txconf *tx_conf,
> > +			       const struct rte_eth_hairpin_conf *hairpin_conf)
> {
> > +	struct rte_eth_dev *dev;
> > +	struct rte_eth_dev_info dev_info;
> > +	struct rte_eth_txconf local_conf;
> > +	void **txq;
> > +
> > +	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
> > +
> > +	dev = &rte_eth_devices[port_id];
> > +	if (tx_queue_id >= dev->data->nb_tx_queues) {
> > +		RTE_ETHDEV_LOG(ERR, "Invalid TX queue_id=%u\n",
> tx_queue_id);
> > +		return -EINVAL;
> > +	}
> > +
> > +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_infos_get, -
> ENOTSUP);
> > +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops-
> >tx_hairpin_queue_setup,
> > +				-ENOTSUP);
> > +
> > +	rte_eth_dev_info_get(port_id, &dev_info);
> > +
> > +	/* Use default specified by driver, if nb_tx_desc is zero */
> > +	if (nb_tx_desc == 0) {
> > +		nb_tx_desc = dev_info.default_txportconf.ring_size;
> > +		/* If driver default is zero, fall back on EAL default */
> > +		if (nb_tx_desc == 0)
> > +			nb_tx_desc =
> RTE_ETH_DEV_FALLBACK_TX_RINGSIZE;
> > +	}
> > +	if (nb_tx_desc > dev_info.tx_desc_lim.nb_max ||
> > +	    nb_tx_desc < dev_info.tx_desc_lim.nb_min ||
> > +	    nb_tx_desc % dev_info.tx_desc_lim.nb_align != 0) {
> > +		RTE_ETHDEV_LOG(ERR,
> > +			       "Invalid value for nb_tx_desc(=%hu), "
> > +			       "should be: <= %hu, >= %hu, and a product of "
> > +			       " %hu\n",
> > +			       nb_tx_desc, dev_info.tx_desc_lim.nb_max,
> > +			       dev_info.tx_desc_lim.nb_min,
> > +			       dev_info.tx_desc_lim.nb_align);
> > +		return -EINVAL;
> > +	}
> > +
> > +	if (dev->data->dev_started &&
> > +		!(dev_info.dev_capa &
> > +		  RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP))
> > +		return -EBUSY;
> > +
> > +	if (dev->data->dev_started &&
> > +		(dev->data->tx_queue_state[tx_queue_id] !=
> > +		 RTE_ETH_QUEUE_STATE_STOPPED))
> > +		return -EBUSY;
> > +
> > +	txq = dev->data->tx_queues;
> > +	if (txq[tx_queue_id]) {
> > +		RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops-
> >tx_queue_release,
> > +					-ENOTSUP);
> > +		(*dev->dev_ops->tx_queue_release)(txq[tx_queue_id]);
> > +		txq[tx_queue_id] = NULL;
> > +	}
> > +
> > +	if (tx_conf == NULL)
> > +		tx_conf = &dev_info.default_txconf;
> > +
> > +	local_conf = *tx_conf;
> > +
> > +	/*
> > +	 * If an offloading has already been enabled in
> > +	 * rte_eth_dev_configure(), it has been enabled on all queues,
> > +	 * so there is no need to enable it in this queue again.
> > +	 * The local_conf.offloads input to underlying PMD only carries
> > +	 * those offloadings which are only enabled on this queue and
> > +	 * not enabled on all queues.
> > +	 */
> > +	local_conf.offloads &= ~dev->data->dev_conf.txmode.offloads;
> > +
> > +	/*
> > +	 * New added offloadings for this queue are those not enabled in
> > +	 * rte_eth_dev_configure() and they must be per-queue type.
> > +	 * A pure per-port offloading can't be enabled on a queue while
> > +	 * disabled on another queue. A pure per-port offloading can't
> > +	 * be enabled for any queue as new added one if it hasn't been
> > +	 * enabled in rte_eth_dev_configure().
> > +	 */
> > +	if ((local_conf.offloads & dev_info.tx_queue_offload_capa) !=
> > +	     local_conf.offloads) {
> > +		RTE_ETHDEV_LOG(ERR,
> > +			       "Ethdev port_id=%d tx_queue_id=%d, new
> added "
> > +			       "offloads 0x%"PRIx64" must be within "
> > +			       "per-queue offload capabilities 0x%"PRIx64" "
> > +			       "in %s()\n",
> > +			       port_id, tx_queue_id, local_conf.offloads,
> > +			       dev_info.tx_queue_offload_capa,
> > +			       __func__);
> > +		return -EINVAL;
> > +	}
> > +
> > +	return eth_err(port_id, (*dev->dev_ops->tx_hairpin_queue_setup)
> > +		       (dev, tx_queue_id, nb_tx_desc, socket_id, &local_conf,
> > +			hairpin_conf));
> > +}
> > +
> >   void
> >   rte_eth_tx_buffer_drop_callback(struct rte_mbuf **pkts, uint16_t
> unsent,
> >   		void *userdata __rte_unused)
> > diff --git a/lib/librte_ethdev/rte_ethdev.h
> > b/lib/librte_ethdev/rte_ethdev.h index 475dbda..b3b1597 100644
> > --- a/lib/librte_ethdev/rte_ethdev.h
> > +++ b/lib/librte_ethdev/rte_ethdev.h
> > @@ -803,6 +803,30 @@ struct rte_eth_txconf {
> >   	uint64_t offloads;
> >   };
> >
> > +#define RTE_ETH_MAX_HAIRPIN_PEERS 32
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change, or be removed, without prior
> > +notice
> > + *
> > + * A structure used to hold hairpin peer data.
> > + */
> > +struct rte_eth_hairpin_peer {
> > +	uint16_t port; /**< Peer port. */
> > +	uint16_t queue; /**< Peer queue. */
> > +};
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change, or be removed, without prior
> > +notice
> > + *
> > + * A structure used to configure hairpin binding.
> > + */
> > +struct rte_eth_hairpin_conf {
> > +	uint16_t peer_n; /**< The number of peers. */
> > +	struct rte_eth_hairpin_peer peers[RTE_ETH_MAX_HAIRPIN_PEERS];
> };
> > +
> >   /**
> >    * A structure contains information about HW descriptor ring limitations.
> >    */
> > @@ -1769,6 +1793,60 @@ int rte_eth_rx_queue_setup(uint16_t port_id,
> uint16_t rx_queue_id,
> >   		struct rte_mempool *mb_pool);
> >
> >   /**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change, or be removed, without prior
> > + notice
> > + *
> > + * Allocate and set up a hairpin receive queue for an Ethernet device.
> > + *
> > + * The function set up the selected queue to be used in hairpin.
> > + *
> > + * @param port_id
> > + *   The port identifier of the Ethernet device.
> > + * @param rx_queue_id
> > + *   The index of the receive queue to set up.
> > + *   The value must be in the range [0, nb_rx_queue - 1] previously
> supplied
> > + *   to rte_eth_dev_configure().
>
> May any Rx queue be set up as a hairpin queue?
> Can it still be used for regular traffic?
>

No, if a queue is used as hairpin it can't be used for normal traffic.
This is also why I like the idea of two different functions, in order to create
this distinction.

> > + * @param nb_rx_desc
> > + *   The number of receive descriptors to allocate for the receive ring.
>
> Does it still make sense for hairpin queue?
>

Yes, since it can affect the memory size used by the device, and it can affect performance.

> > + * @param socket_id
> > + *   The *socket_id* argument is the socket identifier in case of NUMA.
> > + *   The value can be *SOCKET_ID_ANY* if there is no NUMA constraint
> for
> > + *   the DMA memory allocated for the receive descriptors of the ring.
>
> Is it still required to be provided for hairpin Rx queue?
>

Yes, for internal PMD structures to be allocated, but we can remove it if pressed.

> > + * @param rx_conf
> > + *   The pointer to the configuration data to be used for the receive
> queue.
> > + *   NULL value is allowed, in which case default RX configuration
> > + *   will be used.
> > + *   The *rx_conf* structure contains an *rx_thresh* structure with the
> values
> > + *   of the Prefetch, Host, and Write-Back threshold registers of the
> receive
> > + *   ring.
> > + *   In addition it contains the hardware offloads features to activate using
> > + *   the DEV_RX_OFFLOAD_* flags.
> > + *   If an offloading set in rx_conf->offloads
> > + *   hasn't been set in the input argument eth_conf->rxmode.offloads
> > + *   to rte_eth_dev_configure(), it is a new added offloading, it must be
> > + *   per-queue type and it is enabled for the queue.
> > + *   No need to repeat any bit in rx_conf->offloads which has already been
> > + *   enabled in rte_eth_dev_configure() at port level. An offloading
> enabled
> > + *   at port level can't be disabled at queue level.
>
> Which offloads still make sense in the case of hairpin Rx queue?
> What about thresholds, drop enable?
>

Drop and thresholds make sense; for example, the application can state that,
in case of back pressure, packets should start being dropped in order not to
affect the entire NIC (see the sketch below).
Regarding offloads, mainly VLAN strip or VLAN insert, but those can also
be done in rte_flow.
Future offloads like QoS or others may be shared.
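
For example, a sketch of requesting drop on back pressure (whether a given
PMD honors rx_drop_en on a hairpin queue is implementation specific;
port_id, queue_id, nb_desc and hairpin_conf are assumed to be set up):

	struct rte_eth_rxconf rxconf = {
		/* Drop packets under back pressure instead of pushing the
		 * pressure back to the wire.
		 */
		.rx_drop_en = 1,
	};

	rte_eth_rx_hairpin_queue_setup(port_id, queue_id, nb_desc,
				       SOCKET_ID_ANY, &rxconf,
				       &hairpin_conf);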

> > + * @param hairpin_conf
> > + *   The pointer to the hairpin binding configuration.
> > + * @return
> > + *   - 0: Success, receive queue correctly set up.
> > + *   - -EINVAL: The size of network buffers which can be allocated from the
> > + *      memory pool does not fit the various buffer sizes allowed by the
> > + *      device controller.
> > + *   - -ENOMEM: Unable to allocate the receive ring descriptors or to
> > + *      allocate network memory buffers from the memory pool when
> > + *      initializing receive descriptors.
> > + */
> > +__rte_experimental
> > +int rte_eth_rx_hairpin_queue_setup
> > +	(uint16_t port_id, uint16_t rx_queue_id,
> > +	 uint16_t nb_rx_desc, unsigned int socket_id,
> > +	 const struct rte_eth_rxconf *rx_conf,
> > +	 const struct rte_eth_hairpin_conf *hairpin_conf);
> > +
> > +/**
> >    * Allocate and set up a transmit queue for an Ethernet device.
> >    *
> >    * @param port_id
> > @@ -1821,6 +1899,73 @@ int rte_eth_tx_queue_setup(uint16_t port_id,
> uint16_t tx_queue_id,
> >   		const struct rte_eth_txconf *tx_conf);
> >
> >   /**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change, or be removed, without prior
> > + notice
> > + *
> > + * Allocate and set up a transmit hairpin queue for an Ethernet device.
> > + *
> > + * @param port_id
> > + *   The port identifier of the Ethernet device.
> > + * @param tx_queue_id
> > + *   The index of the transmit queue to set up.
> > + *   The value must be in the range [0, nb_tx_queue - 1] previously
> supplied
> > + *   to rte_eth_dev_configure().
>
> May any Tx queue be set up as a hairpin queue?
>

Yes, just like any Rx queue.

> > + * @param nb_tx_desc
> > + *   The number of transmit descriptors to allocate for the transmit ring.
>
> Is it really required for hairpin queue? Are min/max/align limits still the
> same?
>
The number of descriptors can affect memory and performance.
Regarding min/max/align, I guess this depends on the implementation in the NIC.

> > + * @param socket_id
> > + *   The *socket_id* argument is the socket identifier in case of NUMA.
> > + *   Its value can be *SOCKET_ID_ANY* if there is no NUMA constraint for
> > + *   the DMA memory allocated for the transmit descriptors of the ring.
>
> Does it still make sense for Tx hairpin queue?
>

Same as for the Rx: it is used for internal PMD structures, but maybe on
other NICs they can use this.

> > + * @param tx_conf
> > + *   The pointer to the configuration data to be used for the transmit
> queue.
> > + *   NULL value is allowed, in which case default RX configuration
> > + *   will be used.
> > + *   The *tx_conf* structure contains the following data:
> > + *   - The *tx_thresh* structure with the values of the Prefetch, Host, and
> > + *     Write-Back threshold registers of the transmit ring.
> > + *     When setting Write-Back threshold to the value greater then zero,
> > + *     *tx_rs_thresh* value should be explicitly set to one.
> > + *   - The *tx_free_thresh* value indicates the [minimum] number of
> network
> > + *     buffers that must be pending in the transmit ring to trigger their
> > + *     [implicit] freeing by the driver transmit function.
> > + *   - The *tx_rs_thresh* value indicates the [minimum] number of
> transmit
> > + *     descriptors that must be pending in the transmit ring before setting
> the
> > + *     RS bit on a descriptor by the driver transmit function.
> > + *     The *tx_rs_thresh* value should be less or equal then
> > + *     *tx_free_thresh* value, and both of them should be less then
> > + *     *nb_tx_desc* - 3.
>
> I'm not sure that everything above makes sense for hairpin Tx queue.
>

You are right, not all of them make sense,
but since I don't know other NICs I prefer to give them those values, in case they need them.
If you wish I can change the documentation.

> > + *   - The *txq_flags* member contains flags to pass to the TX queue
> setup
> > + *     function to configure the behavior of the TX queue. This should be
> set
> > + *     to 0 if no special configuration is required.
> > + *     This API is obsolete and will be deprecated. Applications
> > + *     should set it to ETH_TXQ_FLAGS_IGNORE and use
> > + *     the offloads field below.
>
> There is no txq_flags for a long time already. So, I'm wondering when it was
> copied from rte_eth_tx_queue_setup().
>
My bad, it was copied from 17.11. Will fix.

> > + *   - The *offloads* member contains Tx offloads to be enabled.
> > + *     If an offloading set in tx_conf->offloads
> > + *     hasn't been set in the input argument eth_conf->txmode.offloads
> > + *     to rte_eth_dev_configure(), it is a new added offloading, it must be
> > + *     per-queue type and it is enabled for the queue.
> > + *     No need to repeat any bit in tx_conf->offloads which has already
> been
> > + *     enabled in rte_eth_dev_configure() at port level. An offloading
> enabled
> > + *     at port level can't be disabled at queue level.
>
> Which offloads really make sense and are valid to use for hairpin Tx queues?
> Do we need separate caps for hairpin offloads?
>
I'm sure that we will need caps, for example for QoS, but I don't know which ones yet.


> > + *
> > + *     Note that setting *tx_free_thresh* or *tx_rs_thresh* value to 0
> forces
> > + *     the transmit function to use default values.
> > + * @param hairpin_conf
> > + *   The hairpin binding configuration.
> > + *
> > + * @return
> > + *   - 0: Success, the transmit queue is correctly set up.
> > + *   - -ENOMEM: Unable to allocate the transmit ring descriptors.
> > + */
> > +__rte_experimental
> > +int rte_eth_tx_hairpin_queue_setup
> > +	(uint16_t port_id, uint16_t tx_queue_id,
> > +	 uint16_t nb_tx_desc, unsigned int socket_id,
> > +	 const struct rte_eth_txconf *tx_conf,
> > +	 const struct rte_eth_hairpin_conf *hairpin_conf);
> > +
> > +/**
> >    * Return the NUMA socket to which an Ethernet device is connected
> >    *
> >    * @param port_id
> >
>
> [snip]

^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [dpdk-dev] [PATCH 00/13] add hairpin feature
  2019-09-26 15:48     ` Andrew Rybchenko
@ 2019-09-26 16:11       ` Ori Kam
  0 siblings, 0 replies; 186+ messages in thread
From: Ori Kam @ 2019-09-26 16:11 UTC (permalink / raw)
  To: Andrew Rybchenko
  Cc: dev, jingjing.wu, stephen, wenzhuo.lu, bernard.iremonger,
	Thomas Monjalon, ferruh.yigit, Slava Ovsiienko


Hi Andrew,

> -----Original Message-----
> From: Andrew Rybchenko <arybchenko@solarflare.com>
> 
> Hi Ori,
> 
> On 9/26/19 6:22 PM, Ori Kam wrote:
> > Hi Andrew,
> > Thanks for your comments please see blow.
> >
> >> -----Original Message-----
> >> From: Andrew Rybchenko <arybchenko@solarflare.com>
> >>
> >> On 9/26/19 9:28 AM, Ori Kam wrote:
> >>> This patch set implements the hairpin feature.
> >>> The hairpin feature was introduced in RFC[1]
> >>>
> >>> The hairpin feature (different name can be forward) acts as "bump on the
> >> wire",
> >>> meaning that a packet that is received from the wire can be modified using
> >>> offloaded action and then sent back to the wire without application
> >> intervention
> >>> which save CPU cycles.
> >>>
> >>> The hairpin is the inverse function of loopback in which application
> >>> sends a packet then it is received again by the
> >>> application without being sent to the wire.
> >>>
> >>> The hairpin can be used by a number of different NVF, for example load
> >>> balancer, gateway and so on.
> >>>
> >>> As can be seen from the hairpin description, hairpin is basically RX queue
> >>> connected to TX queue.
> >> Is it just a pipe or RTE flow API rules required?
> >> If it is just a pipe, what about transformations which could be useful
> >> in this
> >> case (encaps/decaps, NAT etc)? How to achieve it?
> >> If it is not a pipe and flow API rules are required, why is peer information
> >> required?
> >>
> > RTE flow is required, and the peer information is needed in order to connect
> between the RX queue to the
> > TX queue. From application it simply set ingress RTE flow rule that has queue
> or RSS actions,
> > with queues that are hairpin queues.
> > It may be possible to have one RX connected to number of TX queues in order
> to distribute the sending.
> 
> It looks like I start to understand. First, RTE flow does its job and
> redirects some packets to hairpin Rx queue(s). Then, connection
> of hairpin Rx queues to Tx queues does its job. What happens if
> an Rx queue is connected to many Tx queues? Are packets duplicated?
> 


Yes, you are correct in your understanding.
Regarding the number of Tx queues connected to a single Rx queue, that is an answer I can't
give you; it depends on the NIC. It could duplicate the packets or it could RSS them.
In Mellanox we currently support only a 1:1 connection.
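
For the 1:1 case, the binding would look roughly like this from the
application's side (a sketch with arbitrary queue numbers; both
directions must name their peer):

	/* Rx queue 1 of port 0 hairpinned to Tx queue 1 of port 0. */
	struct rte_eth_hairpin_conf rx_hp = {
		.peer_n = 1,
		.peers[0] = { .port = 0, .queue = 1 }, /* peer Tx queue */
	};
	struct rte_eth_hairpin_conf tx_hp = {
		.peer_n = 1,
		.peers[0] = { .port = 0, .queue = 1 }, /* peer Rx queue */
	};

	rte_eth_rx_hairpin_queue_setup(0, 1, 512, SOCKET_ID_ANY, NULL, &rx_hp);
	rte_eth_tx_hairpin_queue_setup(0, 1, 512, SOCKET_ID_ANY, NULL, &tx_hp);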

> >>> During the design phase I was thinking of two ways to implement this
> >>> feature the first one is adding a new rte flow action. and the second
> >>> one is create a special kind of queue.
> >>>
> >>> The advantages of using the queue approch:
> >>> 1. More control for the application. queue depth (the memory size that
> >>> should be used).
> >> But it inherits many parameters which are not really applicable to hairpin
> >> queues. If all parameters are applicable, it should be explained in the
> >> context of the hairpin queues.
> >>
> > Most if not all parameters can be applicable also for hairpin queue.
> > And the one that wasn’t for example mempool was removed.
> 
> I would really like to understand meaning of each Rx/Tx queue
> configuration parameter for hairpin case. So, I hope to see it in the
> documentation.
> 

Those are just like the normal queues; maybe some NICs need this information.

> >>> 2. Enable QoS. QoS is normaly a parametr of queue, so in this approch it
> >>> will be easy to integrate with such system.
> >> Could you elaborate it.
> >>
> > I will try.
> > If you are asking about use cases, we can assume a cloud provider that has
> number
> > of customers each with different bandwidth. We can configure a Tx queue
> with higher
> > priority which will result in that this queue will get more bandwidth.
> > This is true also for hairpin and non-hairpin.
> > We are working on more detail API how to use it, but the HW can support it.
> 
> OK, a bit abstract still, but makes sense.
> 
😊 
> >>> 3. Native integression with the rte flow API. Just setting the target
> >>> queue/rss to hairpin queue, will result that the traffic will be routed
> >>> to the hairpin queue.
> >> It sounds like queues are not required for flow API at all.
> >> If the goal is to send traffic outside to specified physical port,
> >> just specify it as an flow API action. That's it.
> >>
> > This was one of the possible options, but like stated above we think that there
> is more meaning to look
> > at it as a queue, which will give the application better control, for example
> selecting which queues
> > to connect to which queues. If it would have been done as RTE flow action
> then the PMD will create the queues and
> > binding internally and the application will lose control.
> >
> >>> 4. Enable queue offloading.
> >> Which offloads are applicable to hairpin queues?
> >>
> > Vlan striping for example,  and all of the rte flow actions that targets a
> queue.
> 
> Can it be done with VLAN_POP action at RTE flow level?
> The question is why we need it here as Rx queue offload.
> Who will get and process stripped VLAN?
> I don't understand what do you mean by the rte flow actions here.
> Sorry, but I still think that many Rx and Tx offloads are not applicable.
> 

I agree with you that all important actions can be done using RTE flow.
But maybe some NICs don't use RTE flow, and then it is good for them.
The most important reason is that I think that in the future we will have shared
offloads.
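
For example, stripping the VLAN on the hairpin path can be expressed
purely in rte_flow, roughly like below (an untested sketch; it assumes
the PMD supports OF_POP_VLAN, and the ids are arbitrary):

	uint16_t port_id = 0, hairpin_rxq_id = 1;
	struct rte_flow_attr attr = { .ingress = 1 };
	struct rte_flow_item pattern[] = {
		{ .type = RTE_FLOW_ITEM_TYPE_ETH },
		{ .type = RTE_FLOW_ITEM_TYPE_VLAN },
		{ .type = RTE_FLOW_ITEM_TYPE_END },
	};
	struct rte_flow_action_queue queue = { .index = hairpin_rxq_id };
	struct rte_flow_action actions[] = {
		{ .type = RTE_FLOW_ACTION_TYPE_OF_POP_VLAN },
		{ .type = RTE_FLOW_ACTION_TYPE_QUEUE, .conf = &queue },
		{ .type = RTE_FLOW_ACTION_TYPE_END },
	};
	struct rte_flow_error error;
	struct rte_flow *flow = rte_flow_create(port_id, &attr, pattern,
						actions, &error);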

> >>> Each hairpin Rxq can be connected Txq / number of Txqs which can belong
> to
> >> a
> >>> different ports assuming the PMD supports it. The same goes the other
> >>> way each hairpin Txq can be connected to one or more Rxqs.
> >>> This is the reason that both the Txq setup and Rxq setup are getting the
> >>> hairpin configuration structure.
> >>>
> >>>   From PMD prespctive the number of Rxq/Txq is the total of standard
> >>> queues + hairpin queues.
> >>>
> >>> To configure hairpin queue the user should call
> >>> rte_eth_rx_hairpin_queue_setup / rte_eth_tx_hairpin_queue_setup
> insteed
> >>> of the normal queue setup functions.
> >>>
> >>> The hairpin queues are not part of the normal RSS functiosn.
> >>>
> >>> To use the queues the user simply create a flow that points to RSS/queue
> >>> actions that are hairpin queues.
> >>> The reason for selecting 2 new functions for hairpin queue setup are:
> >>> 1. avoid API break.
> >>> 2. avoid extra and unused parameters.
> >>>
> >>>
> >>> This series must be applied after series[2]
> >>>
> >>> [1] https://inbox.dpdk.org/dev/1565703468-55617-1-git-send-email-orika@mellanox.com/
> >>> [2] https://inbox.dpdk.org/dev/1569398015-6027-1-git-send-email-viacheslavo@mellanox.com/
> >>
> >> [snip]
> > Thanks
> > Ori

Thanks,
Ori

^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [dpdk-dev] [PATCH 01/13] ethdev: support setup function for hairpin queue
  2019-09-26 15:58       ` Ori Kam
@ 2019-09-26 17:24         ` Andrew Rybchenko
  2019-09-28 15:19           ` Ori Kam
  0 siblings, 1 reply; 186+ messages in thread
From: Andrew Rybchenko @ 2019-09-26 17:24 UTC (permalink / raw)
  To: Ori Kam, Thomas Monjalon, Ferruh Yigit; +Cc: dev, jingjing.wu, stephen

On 9/26/19 6:58 PM, Ori Kam wrote:
> Hi Andrew,
> Thanks for your comments PSB.
>   
>> -----Original Message-----
>> From: Andrew Rybchenko <arybchenko@solarflare.com>
>> On 9/26/19 9:28 AM, Ori Kam wrote:
>>> This commit introduce the RX/TX hairpin setup function.
>>> RX/TX should be Rx/Tx here and everywhere below.
>>>
>>> Hairpin is RX/TX queue that is used by the nic in order to offload
>>> wire to wire traffic.
>>>
>>> Each hairpin queue is binded to one or more queues from other type.
>>> For example TX hairpin queue should be binded to at least 1 RX hairpin
>>> queue and vice versa.
>> How should application find out that hairpin queues are supported?
> It should be stated in the release notes of DPDK when a manufacturer adds support for this.
> In addition, if the application tries to set up a hairpin queue and it fails, it can mean,
> depending on the error, that hairpin is not supported.

I'm talking about dev_info-like information. Documentation is nice, but
it is not very useful for implementing an application which works with
NICs from different vendors.

>> How many?
> There is no limit to the number of hairpin queues; from the application's point of view all queues can be hairpin queues.

I'm pretty sure that it could be vendor specific.

>> How should application find out which ports/queues could be used for
>> pining?
> All ports and queues can be supported. If the application requests an invalid combination,
> for example binding between two ports in the current Mellanox implementation, then the
> setup function will fail.
>
> If you would like I can add a capability for this, but there are too many options. For example,
> the number of queues and binding limitations; all of those will be very hard to declare.
>
>
>> Is hair-pinning domain on device level sufficient to expose limitations?
>>
> I'm sorry but I don’t understand your question.

I was just trying to imagine how we could say that we can hairpin
one port's Rx queues to another port's Tx queues.

>>> Signed-off-by: Ori Kam <orika@mellanox.com>
>>> ---
>>>    lib/librte_ethdev/rte_ethdev.c           | 213
>>> +++++++++++++++++++++++++++++++
>>>    lib/librte_ethdev/rte_ethdev.h           | 145 +++++++++++++++++++++
>>>    lib/librte_ethdev/rte_ethdev_core.h      |  18 +++
>>>    lib/librte_ethdev/rte_ethdev_version.map |   4 +
>>>    4 files changed, 380 insertions(+)
>>>
>>> diff --git a/lib/librte_ethdev/rte_ethdev.c
>>> b/lib/librte_ethdev/rte_ethdev.c index 30b0c78..4021f38 100644
>>> --- a/lib/librte_ethdev/rte_ethdev.c
>>> +++ b/lib/librte_ethdev/rte_ethdev.c
>>> @@ -1701,6 +1701,115 @@ struct rte_eth_dev *
>>>    }
>>>
>>>    int
>>> +rte_eth_rx_hairpin_queue_setup(uint16_t port_id, uint16_t
>>> rx_queue_id,
>>> +			       uint16_t nb_rx_desc, unsigned int socket_id,
>>> +			       const struct rte_eth_rxconf *rx_conf,
>>> +			       const struct rte_eth_hairpin_conf *hairpin_conf)
>>> Below code duplicates rte_eth_rx_queue_setup() a lot and it is very bad
>>> from maintenance point of view. Similar problem with Tx hairpin queue
>>> setup.
>>>
> I'm aware of that. The reasons I chose this are (same goes for Tx):
> 1. The alternative is the same-function approach, meaning to reuse the current setup function;
>      the issues with this are:
>       * API break.
>       * It will have extra parameters; for example, the mempool will not be used
>          for hairpin, and the hairpin configuration will not be used for a normal queue.
>          It is possible to use a struct, but again this is an API break and some fields are not used.
>       * We are just starting with hairpin; most likely there will be modifications, so
>           it is better to have a different function.
>       * From the application side it is understood that this is a different kind of queue,
>           which shouldn't be used directly by the application.

It does not excuse duplicating so much code below. If we have separate
dev_info-like limitations for hairpin, it would make sense, but I hope that
it would still be possible to avoid code duplication.

>>> +{
>>> +	int ret;
>>> +	struct rte_eth_dev *dev;
>>> +	struct rte_eth_dev_info dev_info;
>>> +	struct rte_eth_rxconf local_conf;
>>> +	void **rxq;
>>> +
>>> +	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
>>> +
>>> +	dev = &rte_eth_devices[port_id];
>>> +	if (rx_queue_id >= dev->data->nb_rx_queues) {
>>> +		RTE_ETHDEV_LOG(ERR, "Invalid RX queue_id=%u\n",
>> rx_queue_id);
>>> +		return -EINVAL;
>>> +	}
>>> +
>>> +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_infos_get, -
>> ENOTSUP);
>>> +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops-
>>> rx_hairpin_queue_setup,
>>> +				-ENOTSUP);
>>> +
>>> +	rte_eth_dev_info_get(port_id, &dev_info);
>>> +
>>> +	/* Use default specified by driver, if nb_rx_desc is zero */
>>> +	if (nb_rx_desc == 0) {
>>> +		nb_rx_desc = dev_info.default_rxportconf.ring_size;
>>> +		/* If driver default is also zero, fall back on EAL default */
>>> +		if (nb_rx_desc == 0)
>>> +			nb_rx_desc =
>> RTE_ETH_DEV_FALLBACK_RX_RINGSIZE;
>>> +	}
>>> +
>>> +	if (nb_rx_desc > dev_info.rx_desc_lim.nb_max ||
>>> +			nb_rx_desc < dev_info.rx_desc_lim.nb_min ||
>>> +			nb_rx_desc % dev_info.rx_desc_lim.nb_align != 0) {
>>> +
>>> +		RTE_ETHDEV_LOG(ERR,
>>> +			       "Invalid value for nb_rx_desc(=%hu), should be: "
>>> +			       "<= %hu, >= %hu, and a product of %hu\n",
>>> +			nb_rx_desc, dev_info.rx_desc_lim.nb_max,
>>> +			dev_info.rx_desc_lim.nb_min,
>>> +			dev_info.rx_desc_lim.nb_align);
>>> +		return -EINVAL;
>>> +	}
>>> +
>>> +	if (dev->data->dev_started &&
>>> +		!(dev_info.dev_capa &
>>> +
>> 	RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP))
>>> +		return -EBUSY;
>>> +
>>> +	if (dev->data->dev_started &&
>>> +		(dev->data->rx_queue_state[rx_queue_id] !=
>>> +			RTE_ETH_QUEUE_STATE_STOPPED))
>>> +		return -EBUSY;
>>> +
>>> +	rxq = dev->data->rx_queues;
>>> +	if (rxq[rx_queue_id]) {
>>> +		RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops-
>>> rx_queue_release,
>>> +					-ENOTSUP);
>>> +		(*dev->dev_ops->rx_queue_release)(rxq[rx_queue_id]);
>>> +		rxq[rx_queue_id] = NULL;
>>> +	}
>>> +
>>> +	if (rx_conf == NULL)
>>> +		rx_conf = &dev_info.default_rxconf;
>>> +
>>> +	local_conf = *rx_conf;
>>> +
>>> +	/*
>>> +	 * If an offloading has already been enabled in
>>> +	 * rte_eth_dev_configure(), it has been enabled on all queues,
>>> +	 * so there is no need to enable it in this queue again.
>>> +	 * The local_conf.offloads input to underlying PMD only carries
>>> +	 * those offloadings which are only enabled on this queue and
>>> +	 * not enabled on all queues.
>>> +	 */
>>> +	local_conf.offloads &= ~dev->data->dev_conf.rxmode.offloads;
>>> +
>>> +	/*
>>> +	 * New added offloadings for this queue are those not enabled in
>>> +	 * rte_eth_dev_configure() and they must be per-queue type.
>>> +	 * A pure per-port offloading can't be enabled on a queue while
>>> +	 * disabled on another queue. A pure per-port offloading can't
>>> +	 * be enabled for any queue as new added one if it hasn't been
>>> +	 * enabled in rte_eth_dev_configure().
>>> +	 */
>>> +	if ((local_conf.offloads & dev_info.rx_queue_offload_capa) !=
>>> +	     local_conf.offloads) {
>>> +		RTE_ETHDEV_LOG(ERR,
>>> +			"Ethdev port_id=%d rx_queue_id=%d, "
>>> +			"new added offloads 0x%"PRIx64" must be "
>>> +			"within per-queue offload capabilities "
>>> +			"0x%"PRIx64" in %s()\n",
>>> +			port_id, rx_queue_id, local_conf.offloads,
>>> +			dev_info.rx_queue_offload_capa,
>>> +			__func__);
>>> +		return -EINVAL;
>>> +	}
>>> +
>>> +	ret = (*dev->dev_ops->rx_hairpin_queue_setup)(dev,
>> rx_queue_id,
>>> +						      nb_rx_desc, socket_id,
>>> +						      &local_conf,
>>> +						      hairpin_conf);
>>> +
>>> +	return eth_err(port_id, ret);
>>> +}
>>> +
>>> +int
>>>    rte_eth_tx_queue_setup(uint16_t port_id, uint16_t tx_queue_id,
>>>    		       uint16_t nb_tx_desc, unsigned int socket_id,
>>>    		       const struct rte_eth_txconf *tx_conf) @@ -1799,6
>> +1908,110
>>> @@ struct rte_eth_dev *
>>>    		       tx_queue_id, nb_tx_desc, socket_id, &local_conf));
>>>    }
>>>
>>> +int
>>> +rte_eth_tx_hairpin_queue_setup(uint16_t port_id, uint16_t
>> tx_queue_id,
>>> +			       uint16_t nb_tx_desc, unsigned int socket_id,
>>> +			       const struct rte_eth_txconf *tx_conf,
>>> +			       const struct rte_eth_hairpin_conf *hairpin_conf)
>> {
>>> +	struct rte_eth_dev *dev;
>>> +	struct rte_eth_dev_info dev_info;
>>> +	struct rte_eth_txconf local_conf;
>>> +	void **txq;
>>> +
>>> +	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
>>> +
>>> +	dev = &rte_eth_devices[port_id];
>>> +	if (tx_queue_id >= dev->data->nb_tx_queues) {
>>> +		RTE_ETHDEV_LOG(ERR, "Invalid TX queue_id=%u\n",
>> tx_queue_id);
>>> +		return -EINVAL;
>>> +	}
>>> +
>>> +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_infos_get, -
>> ENOTSUP);
>>> +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops-
>>> tx_hairpin_queue_setup,
>>> +				-ENOTSUP);
>>> +
>>> +	rte_eth_dev_info_get(port_id, &dev_info);
>>> +
>>> +	/* Use default specified by driver, if nb_tx_desc is zero */
>>> +	if (nb_tx_desc == 0) {
>>> +		nb_tx_desc = dev_info.default_txportconf.ring_size;
>>> +		/* If driver default is zero, fall back on EAL default */
>>> +		if (nb_tx_desc == 0)
>>> +			nb_tx_desc =
>> RTE_ETH_DEV_FALLBACK_TX_RINGSIZE;
>>> +	}
>>> +	if (nb_tx_desc > dev_info.tx_desc_lim.nb_max ||
>>> +	    nb_tx_desc < dev_info.tx_desc_lim.nb_min ||
>>> +	    nb_tx_desc % dev_info.tx_desc_lim.nb_align != 0) {
>>> +		RTE_ETHDEV_LOG(ERR,
>>> +			       "Invalid value for nb_tx_desc(=%hu), "
>>> +			       "should be: <= %hu, >= %hu, and a product of "
>>> +			       " %hu\n",
>>> +			       nb_tx_desc, dev_info.tx_desc_lim.nb_max,
>>> +			       dev_info.tx_desc_lim.nb_min,
>>> +			       dev_info.tx_desc_lim.nb_align);
>>> +		return -EINVAL;
>>> +	}
>>> +
>>> +	if (dev->data->dev_started &&
>>> +		!(dev_info.dev_capa &
>>> +		  RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP))
>>> +		return -EBUSY;
>>> +
>>> +	if (dev->data->dev_started &&
>>> +		(dev->data->tx_queue_state[tx_queue_id] !=
>>> +		 RTE_ETH_QUEUE_STATE_STOPPED))
>>> +		return -EBUSY;
>>> +
>>> +	txq = dev->data->tx_queues;
>>> +	if (txq[tx_queue_id]) {
>>> +		RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops-
>>> tx_queue_release,
>>> +					-ENOTSUP);
>>> +		(*dev->dev_ops->tx_queue_release)(txq[tx_queue_id]);
>>> +		txq[tx_queue_id] = NULL;
>>> +	}
>>> +
>>> +	if (tx_conf == NULL)
>>> +		tx_conf = &dev_info.default_txconf;
>>> +
>>> +	local_conf = *tx_conf;
>>> +
>>> +	/*
>>> +	 * If an offloading has already been enabled in
>>> +	 * rte_eth_dev_configure(), it has been enabled on all queues,
>>> +	 * so there is no need to enable it in this queue again.
>>> +	 * The local_conf.offloads input to underlying PMD only carries
>>> +	 * those offloadings which are only enabled on this queue and
>>> +	 * not enabled on all queues.
>>> +	 */
>>> +	local_conf.offloads &= ~dev->data->dev_conf.txmode.offloads;
>>> +
>>> +	/*
>>> +	 * New added offloadings for this queue are those not enabled in
>>> +	 * rte_eth_dev_configure() and they must be per-queue type.
>>> +	 * A pure per-port offloading can't be enabled on a queue while
>>> +	 * disabled on another queue. A pure per-port offloading can't
>>> +	 * be enabled for any queue as new added one if it hasn't been
>>> +	 * enabled in rte_eth_dev_configure().
>>> +	 */
>>> +	if ((local_conf.offloads & dev_info.tx_queue_offload_capa) !=
>>> +	     local_conf.offloads) {
>>> +		RTE_ETHDEV_LOG(ERR,
>>> +			       "Ethdev port_id=%d tx_queue_id=%d, new
>> added "
>>> +			       "offloads 0x%"PRIx64" must be within "
>>> +			       "per-queue offload capabilities 0x%"PRIx64" "
>>> +			       "in %s()\n",
>>> +			       port_id, tx_queue_id, local_conf.offloads,
>>> +			       dev_info.tx_queue_offload_capa,
>>> +			       __func__);
>>> +		return -EINVAL;
>>> +	}
>>> +
>>> +	return eth_err(port_id, (*dev->dev_ops->tx_hairpin_queue_setup)
>>> +		       (dev, tx_queue_id, nb_tx_desc, socket_id, &local_conf,
>>> +			hairpin_conf));
>>> +}
>>> +
>>>    void
>>>    rte_eth_tx_buffer_drop_callback(struct rte_mbuf **pkts, uint16_t
>> unsent,
>>>    		void *userdata __rte_unused)
>>> diff --git a/lib/librte_ethdev/rte_ethdev.h
>>> b/lib/librte_ethdev/rte_ethdev.h index 475dbda..b3b1597 100644
>>> --- a/lib/librte_ethdev/rte_ethdev.h
>>> +++ b/lib/librte_ethdev/rte_ethdev.h
>>> @@ -803,6 +803,30 @@ struct rte_eth_txconf {
>>>    	uint64_t offloads;
>>>    };
>>>
>>> +#define RTE_ETH_MAX_HAIRPIN_PEERS 32
>>> +
>>> +/**
>>> + * @warning
>>> + * @b EXPERIMENTAL: this API may change, or be removed, without prior
>>> +notice
>>> + *
>>> + * A structure used to hold hairpin peer data.
>>> + */
>>> +struct rte_eth_hairpin_peer {
>>> +	uint16_t port; /**< Peer port. */
>>> +	uint16_t queue; /**< Peer queue. */
>>> +};
>>> +
>>> +/**
>>> + * @warning
>>> + * @b EXPERIMENTAL: this API may change, or be removed, without prior
>>> +notice
>>> + *
>>> + * A structure used to configure hairpin binding.
>>> + */
>>> +struct rte_eth_hairpin_conf {
>>> +	uint16_t peer_n; /**< The number of peers. */
>>> +	struct rte_eth_hairpin_peer peers[RTE_ETH_MAX_HAIRPIN_PEERS];
>> };
>>> +
>>>    /**
>>>     * A structure contains information about HW descriptor ring limitations.
>>>     */
>>> @@ -1769,6 +1793,60 @@ int rte_eth_rx_queue_setup(uint16_t port_id,
>> uint16_t rx_queue_id,
>>>    		struct rte_mempool *mb_pool);
>>>
>>>    /**
>>> + * @warning
>>> + * @b EXPERIMENTAL: this API may change, or be removed, without prior
>>> + notice
>>> + *
>>> + * Allocate and set up a hairpin receive queue for an Ethernet device.
>>> + *
>>> + * The function set up the selected queue to be used in hairpin.
>>> + *
>>> + * @param port_id
>>> + *   The port identifier of the Ethernet device.
>>> + * @param rx_queue_id
>>> + *   The index of the receive queue to set up.
>>> + *   The value must be in the range [0, nb_rx_queue - 1] previously
>> supplied
>>> + *   to rte_eth_dev_configure().
>> Is any Rx queue may be setup as hairpin queue?
>> Can it be still used for regular traffic?
>>
> No, if a queue is used as hairpin it can't be used for normal traffic.
> This is also why I like the idea of two different functions, in order to create
> this distinction.

If so, do we need at least debug-level checks in Tx/Rx burst functions?
Is it required to patch the rte flow RSS action to ensure that Rx queues of
only one kind are specified?
What about attempts to add Rx/Tx callbacks for hairpin queues?

>>> + * @param nb_rx_desc
>>> + *   The number of receive descriptors to allocate for the receive ring.
>> Does it still make sense for hairpin queue?
>>
> Yes, since it can affect memory size used by the device, and can affect performance.
>
>>> + * @param socket_id
>>> + *   The *socket_id* argument is the socket identifier in case of NUMA.
>>> + *   The value can be *SOCKET_ID_ANY* if there is no NUMA constraint
>> for
>>> + *   the DMA memory allocated for the receive descriptors of the ring.
>> Is it still required to be provided for hairpin Rx queue?
>>
> Yes, for internal PMD structures to be allocated, but we can if pressed remove it.
>
>>> + * @param rx_conf
>>> + *   The pointer to the configuration data to be used for the receive
>> queue.
>>> + *   NULL value is allowed, in which case default RX configuration
>>> + *   will be used.
>>> + *   The *rx_conf* structure contains an *rx_thresh* structure with the
>> values
>>> + *   of the Prefetch, Host, and Write-Back threshold registers of the
>> receive
>>> + *   ring.
>>> + *   In addition it contains the hardware offloads features to activate using
>>> + *   the DEV_RX_OFFLOAD_* flags.
>>> + *   If an offloading set in rx_conf->offloads
>>> + *   hasn't been set in the input argument eth_conf->rxmode.offloads
>>> + *   to rte_eth_dev_configure(), it is a new added offloading, it must be
>>> + *   per-queue type and it is enabled for the queue.
>>> + *   No need to repeat any bit in rx_conf->offloads which has already been
>>> + *   enabled in rte_eth_dev_configure() at port level. An offloading
>> enabled
>>> + *   at port level can't be disabled at queue level.
>> Which offloads still make sense in the case of hairpin Rx queue?
>> What about thresholds, drop enable?
>>
> Drop and thresholds make sense; for example, the application can state that,
> in case of back pressure, the queue should start dropping packets in order not
> to affect the entire NIC.
> Regarding offloads, mainly VLAN strip or VLAN insert, but those can also
> be done via rte_flow.
> But future offloads like QoS or others may be shared.

I'm not a fan of dead parameters which are added just to use
the same structure. It raises too many questions on maintenance.
Also I don't like the idea of sharing hairpin and regular offloads.
Maybe it is OK to share the namespace (still unsure), but the capabilities
are definitely different and some regular offloads are simply not
applicable to the hairpin case.

>>> + * @param hairpin_conf
>>> + *   The pointer to the hairpin binding configuration.
>>> + * @return
>>> + *   - 0: Success, receive queue correctly set up.
>>> + *   - -EINVAL: The size of network buffers which can be allocated from the
>>> + *      memory pool does not fit the various buffer sizes allowed by the
>>> + *      device controller.
>>> + *   - -ENOMEM: Unable to allocate the receive ring descriptors or to
>>> + *      allocate network memory buffers from the memory pool when
>>> + *      initializing receive descriptors.
>>> + */
>>> +__rte_experimental
>>> +int rte_eth_rx_hairpin_queue_setup
>>> +	(uint16_t port_id, uint16_t rx_queue_id,
>>> +	 uint16_t nb_rx_desc, unsigned int socket_id,
>>> +	 const struct rte_eth_rxconf *rx_conf,
>>> +	 const struct rte_eth_hairpin_conf *hairpin_conf);
>>> +
>>> +/**
>>>     * Allocate and set up a transmit queue for an Ethernet device.
>>>     *
>>>     * @param port_id
>>> @@ -1821,6 +1899,73 @@ int rte_eth_tx_queue_setup(uint16_t port_id,
>> uint16_t tx_queue_id,
>>>    		const struct rte_eth_txconf *tx_conf);
>>>
>>>    /**
>>> + * @warning
>>> + * @b EXPERIMENTAL: this API may change, or be removed, without prior
>>> + notice
>>> + *
>>> + * Allocate and set up a transmit hairpin queue for an Ethernet device.
>>> + *
>>> + * @param port_id
>>> + *   The port identifier of the Ethernet device.
>>> + * @param tx_queue_id
>>> + *   The index of the transmit queue to set up.
>>> + *   The value must be in the range [0, nb_tx_queue - 1] previously
>> supplied
>>> + *   to rte_eth_dev_configure().
>> Is any Tx queue may be setup as hairpin queue?
>>
> Yes, just like any Rx queue.
>
>>> + * @param nb_tx_desc
>>> + *   The number of transmit descriptors to allocate for the transmit ring.
>> Is it really required for hairpin queue? Are min/max/align limits still the
>> same?
>>
> The number of descriptors can affect memory usage and performance.
> Regarding min/max/align, I guess this depends on the implementation in the NIC.

Again, it looks like separate dev_info-like information.

>>> + * @param socket_id
>>> + *   The *socket_id* argument is the socket identifier in case of NUMA.
>>> + *   Its value can be *SOCKET_ID_ANY* if there is no NUMA constraint for
>>> + *   the DMA memory allocated for the transmit descriptors of the ring.
>> Does it still make sense for Tx hairpin queue?
>>
> Same as for the Rx: it is used for internal PMD structures, but maybe
> other NICs can use this.
>
>>> + * @param tx_conf
>>> + *   The pointer to the configuration data to be used for the transmit
>> queue.
>>> + *   NULL value is allowed, in which case default RX configuration
>>> + *   will be used.
>>> + *   The *tx_conf* structure contains the following data:
>>> + *   - The *tx_thresh* structure with the values of the Prefetch, Host, and
>>> + *     Write-Back threshold registers of the transmit ring.
>>> + *     When setting Write-Back threshold to the value greater then zero,
>>> + *     *tx_rs_thresh* value should be explicitly set to one.
>>> + *   - The *tx_free_thresh* value indicates the [minimum] number of
>> network
>>> + *     buffers that must be pending in the transmit ring to trigger their
>>> + *     [implicit] freeing by the driver transmit function.
>>> + *   - The *tx_rs_thresh* value indicates the [minimum] number of
>> transmit
>>> + *     descriptors that must be pending in the transmit ring before setting
>> the
>>> + *     RS bit on a descriptor by the driver transmit function.
>>> + *     The *tx_rs_thresh* value should be less or equal then
>>> + *     *tx_free_thresh* value, and both of them should be less then
>>> + *     *nb_tx_desc* - 3.
>> I'm not sure that everything above makes sense for hairpin Tx queue.
>>
> You are right, not all of them make sense,
> but since I don't know other NICs I prefer to give them those values, in case they need them.
> If you wish I can change the documentation.

Dead parameters are not nice.

>>> + *   - The *txq_flags* member contains flags to pass to the TX queue
>> setup
>>> + *     function to configure the behavior of the TX queue. This should be
>> set
>>> + *     to 0 if no special configuration is required.
>>> + *     This API is obsolete and will be deprecated. Applications
>>> + *     should set it to ETH_TXQ_FLAGS_IGNORE and use
>>> + *     the offloads field below.
>> There is no txq_flags for a long time already. So, I'm wondering when it was
>> copies from rte_eth_tx_queue_setup().
>>
> My bad, it was copied from 17.11. Will fix.
>
>>> + *   - The *offloads* member contains Tx offloads to be enabled.
>>> + *     If an offloading set in tx_conf->offloads
>>> + *     hasn't been set in the input argument eth_conf->txmode.offloads
>>> + *     to rte_eth_dev_configure(), it is a new added offloading, it must be
>>> + *     per-queue type and it is enabled for the queue.
>>> + *     No need to repeat any bit in tx_conf->offloads which has already
>> been
>>> + *     enabled in rte_eth_dev_configure() at port level. An offloading
>> enabled
>>> + *     at port level can't be disabled at queue level.
>> Which offloads do really make sense and valid to use for hairpin Tx queues?
>> Do we need separate caps for hairpin offloads?
>>
> I'm sure that we will need caps, for example for QoS, but I don't know which ones yet.

Same as Rx.

>>> + *
>>> + *     Note that setting *tx_free_thresh* or *tx_rs_thresh* value to 0
>> forces
>>> + *     the transmit function to use default values.
>>> + * @param hairpin_conf
>>> + *   The hairpin binding configuration.
>>> + *
>>> + * @return
>>> + *   - 0: Success, the transmit queue is correctly set up.
>>> + *   - -ENOMEM: Unable to allocate the transmit ring descriptors.
>>> + */
>>> +__rte_experimental
>>> +int rte_eth_tx_hairpin_queue_setup
>>> +	(uint16_t port_id, uint16_t tx_queue_id,
>>> +	 uint16_t nb_tx_desc, unsigned int socket_id,
>>> +	 const struct rte_eth_txconf *tx_conf,
>>> +	 const struct rte_eth_hairpin_conf *hairpin_conf);
>>> +
>>> +/**
>>>     * Return the NUMA socket to which an Ethernet device is connected
>>>     *
>>>     * @param port_id
>>>
>> [snip]


^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [dpdk-dev] [PATCH 01/13] ethdev: support setup function for hairpin queue
  2019-09-26 17:24         ` Andrew Rybchenko
@ 2019-09-28 15:19           ` Ori Kam
  2019-09-29 12:10             ` Andrew Rybchenko
  0 siblings, 1 reply; 186+ messages in thread
From: Ori Kam @ 2019-09-28 15:19 UTC (permalink / raw)
  To: Andrew Rybchenko, Thomas Monjalon, Ferruh Yigit; +Cc: dev, jingjing.wu, stephen

Hi Andrew.
PSB

> -----Original Message-----
> From: Andrew Rybchenko <arybchenko@solarflare.com>
> Sent: Thursday, September 26, 2019 8:24 PM
> To: Ori Kam <orika@mellanox.com>; Thomas Monjalon
> <thomas@monjalon.net>; Ferruh Yigit <ferruh.yigit@intel.com>
> Cc: dev@dpdk.org; jingjing.wu@intel.com; stephen@networkplumber.org
> Subject: Re: [dpdk-dev] [PATCH 01/13] ethdev: support setup function for
> hairpin queue
> 
> On 9/26/19 6:58 PM, Ori Kam wrote:
> > Hi Andrew,
> > Thanks for your comments PSB.
> >
> >> -----Original Message-----
> >> From: Andrew Rybchenko <arybchenko@solarflare.com>
> >> On 9/26/19 9:28 AM, Ori Kam wrote:
> >>> This commit introduce the RX/TX hairpin setup function.
> >>> RX/TX should be Rx/Tx here and everywhere below.
> >>>
> >>> Hairpin is RX/TX queue that is used by the nic in order to offload
> >>> wire to wire traffic.
> >>>
> >>> Each hairpin queue is binded to one or more queues from other type.
> >>> For example TX hairpin queue should be binded to at least 1 RX hairpin
> >>> queue and vice versa.
> >> How should application find out that hairpin queues are supported?
> > It should be stated in the release note of the DPDK, when manufacture adds
> support for this.
> > In addition if the application try to set hairpin queue and it fails it can mean
> depending on the
> > error that the hairpin is not supported.
> 
> I'm talking about dev_info-like information. Documentation is nice, but
> it is not very useful for implementing an application which works with
> NICs from different vendors.
>
> 

What if we add a get-hairpin-capabilities function?
We could have the max number of queues, whether the NIC supports 1:n connections,
which offloads are supported and so on. So basically we create a new set of capabilities
for hairpin; I think this will also remove the other concerns that you have.
What do you think?
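
Something along these lines, perhaps (all names are hypothetical, just
to make the idea concrete):

	/* Hypothetical hairpin capability structure. */
	struct rte_eth_hairpin_cap {
		uint16_t max_n_queues;    /* max number of hairpin queues */
		uint16_t max_rx_2_tx;     /* max Tx peers of one Rx queue */
		uint16_t max_tx_2_rx;     /* max Rx peers of one Tx queue */
		uint8_t  port_binding;    /* binding between ports supported */
		uint64_t rx_offload_capa; /* hairpin-specific Rx offloads */
		uint64_t tx_offload_capa; /* hairpin-specific Tx offloads */
	};

	__rte_experimental
	int rte_eth_dev_hairpin_capability_get(uint16_t port_id,
					       struct rte_eth_hairpin_cap *cap);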
  
> >> How many?
> > There is no limit to the number of hairpin queues from application all queues
> can be hairpin queues.
> 
> I'm pretty sure that it could be vendor specific.
>

Please see my answer above.
 
> >> How should application find out which ports/queues could be used for
> >> pining?
> > All ports and queues can be supported, if the application request invalid
> combination, for example
> > in current Mellanox implementation binding between two ports then the
> setup function will  fail.
> >
> > If you would like I can add capability for this, but there are too many options.
> For example number
> > of queues, binding limitations all of those will be very hard to declare.
> >
> >
> >> Is hair-pinning domain on device level sufficient to expose limitations?
> >>
> > I'm sorry but I don’t understand your question.
> 
> I was just trying to imagine how we could say that we can hairpin
> one port's Rx queues to another port's Tx queues.
>

Like I suggested above, if I add a capability function we could have
a field that says port binding is supported, or something else along this line.
 
> >>> Signed-off-by: Ori Kam <orika@mellanox.com>
> >>> ---
> >>>    lib/librte_ethdev/rte_ethdev.c           | 213
> >>> +++++++++++++++++++++++++++++++
> >>>    lib/librte_ethdev/rte_ethdev.h           | 145 +++++++++++++++++++++
> >>>    lib/librte_ethdev/rte_ethdev_core.h      |  18 +++
> >>>    lib/librte_ethdev/rte_ethdev_version.map |   4 +
> >>>    4 files changed, 380 insertions(+)
> >>>
> >>> diff --git a/lib/librte_ethdev/rte_ethdev.c
> >>> b/lib/librte_ethdev/rte_ethdev.c index 30b0c78..4021f38 100644
> >>> --- a/lib/librte_ethdev/rte_ethdev.c
> >>> +++ b/lib/librte_ethdev/rte_ethdev.c
> >>> @@ -1701,6 +1701,115 @@ struct rte_eth_dev *
> >>>    }
> >>>
> >>>    int
> >>> +rte_eth_rx_hairpin_queue_setup(uint16_t port_id, uint16_t
> >>> rx_queue_id,
> >>> +			       uint16_t nb_rx_desc, unsigned int socket_id,
> >>> +			       const struct rte_eth_rxconf *rx_conf,
> >>> +			       const struct rte_eth_hairpin_conf *hairpin_conf)
> >>> Below code duplicates rte_eth_rx_queue_setup() a lot and it is very bad
> >>> from maintenance point of view. Similar problem with Tx hairpin queue
> >>> setup.
> >>>
> > I'm aware of that. The reasons I choose it are: (same goes to Tx)
> > 1. use the same function approach, meaning to use the current  setup
> function
> >      the issues with this are:
> >       * API break.
> >       * It will have extra parameters, for example mempool will not be used
> >          for hairpin and hairpin configuration will not be used for normal queue.
> >          It is possible to use a struct but again API break and some fields are not
> used.
> >       * we are just starting with the hairpin, most likely there will be
> modification so
> >           it is better to have a different function.
> >       * From application he undertand that this is a different kind of queue,
> which shouldn't be
> >           used by the application.
> 
> It does not excuse duplicating so much code below. If we have separate
> dev_info-like limitations for hairpin, it would make sense, but I hope that
> it would still be possible to avoid code duplication.
> 

We can start with the most basic implementation, which will mean that the function
will almost be empty. When other vendors or Mellanox require some additional
test or code, they will be able to decide whether to add new code to the function, or to
extract the shared code from the standard function into a dedicated helper and
use that helper in both setup functions.
What do you think?
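
Roughly what I have in mind (a sketch; the helper name and the exact
split between shared and hairpin-specific checks are open):

	/* Hypothetical helper shared by rte_eth_rx_queue_setup() and
	 * rte_eth_rx_hairpin_queue_setup(). */
	static int
	eth_rx_queue_setup_check(struct rte_eth_dev *dev, uint16_t rx_queue_id,
				 uint16_t *nb_rx_desc,
				 const struct rte_eth_dev_info *dev_info)
	{
		if (rx_queue_id >= dev->data->nb_rx_queues)
			return -EINVAL;
		/* Use the driver default, then the EAL fallback, if zero. */
		if (*nb_rx_desc == 0) {
			*nb_rx_desc = dev_info->default_rxportconf.ring_size;
			if (*nb_rx_desc == 0)
				*nb_rx_desc = RTE_ETH_DEV_FALLBACK_RX_RINGSIZE;
		}
		/* Descriptor limits and started-state checks would move
		 * here as well. */
		return 0;
	}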

> >>> +{
> >>> +	int ret;
> >>> +	struct rte_eth_dev *dev;
> >>> +	struct rte_eth_dev_info dev_info;
> >>> +	struct rte_eth_rxconf local_conf;
> >>> +	void **rxq;
> >>> +
> >>> +	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
> >>> +
> >>> +	dev = &rte_eth_devices[port_id];
> >>> +	if (rx_queue_id >= dev->data->nb_rx_queues) {
> >>> +		RTE_ETHDEV_LOG(ERR, "Invalid RX queue_id=%u\n",
> >> rx_queue_id);
> >>> +		return -EINVAL;
> >>> +	}
> >>> +
> >>> +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_infos_get, -
> >> ENOTSUP);
> >>> +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops-
> >>> rx_hairpin_queue_setup,
> >>> +				-ENOTSUP);
> >>> +
> >>> +	rte_eth_dev_info_get(port_id, &dev_info);
> >>> +
> >>> +	/* Use default specified by driver, if nb_rx_desc is zero */
> >>> +	if (nb_rx_desc == 0) {
> >>> +		nb_rx_desc = dev_info.default_rxportconf.ring_size;
> >>> +		/* If driver default is also zero, fall back on EAL default */
> >>> +		if (nb_rx_desc == 0)
> >>> +			nb_rx_desc =
> >> RTE_ETH_DEV_FALLBACK_RX_RINGSIZE;
> >>> +	}
> >>> +
> >>> +	if (nb_rx_desc > dev_info.rx_desc_lim.nb_max ||
> >>> +			nb_rx_desc < dev_info.rx_desc_lim.nb_min ||
> >>> +			nb_rx_desc % dev_info.rx_desc_lim.nb_align != 0) {
> >>> +
> >>> +		RTE_ETHDEV_LOG(ERR,
> >>> +			       "Invalid value for nb_rx_desc(=%hu), should be: "
> >>> +			       "<= %hu, >= %hu, and a product of %hu\n",
> >>> +			nb_rx_desc, dev_info.rx_desc_lim.nb_max,
> >>> +			dev_info.rx_desc_lim.nb_min,
> >>> +			dev_info.rx_desc_lim.nb_align);
> >>> +		return -EINVAL;
> >>> +	}
> >>> +
> >>> +	if (dev->data->dev_started &&
> >>> +		!(dev_info.dev_capa &
> >>> +
> >> 	RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP))
> >>> +		return -EBUSY;
> >>> +
> >>> +	if (dev->data->dev_started &&
> >>> +		(dev->data->rx_queue_state[rx_queue_id] !=
> >>> +			RTE_ETH_QUEUE_STATE_STOPPED))
> >>> +		return -EBUSY;
> >>> +
> >>> +	rxq = dev->data->rx_queues;
> >>> +	if (rxq[rx_queue_id]) {
> >>> +		RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops-
> >>> rx_queue_release,
> >>> +					-ENOTSUP);
> >>> +		(*dev->dev_ops->rx_queue_release)(rxq[rx_queue_id]);
> >>> +		rxq[rx_queue_id] = NULL;
> >>> +	}
> >>> +
> >>> +	if (rx_conf == NULL)
> >>> +		rx_conf = &dev_info.default_rxconf;
> >>> +
> >>> +	local_conf = *rx_conf;
> >>> +
> >>> +	/*
> >>> +	 * If an offloading has already been enabled in
> >>> +	 * rte_eth_dev_configure(), it has been enabled on all queues,
> >>> +	 * so there is no need to enable it in this queue again.
> >>> +	 * The local_conf.offloads input to underlying PMD only carries
> >>> +	 * those offloadings which are only enabled on this queue and
> >>> +	 * not enabled on all queues.
> >>> +	 */
> >>> +	local_conf.offloads &= ~dev->data->dev_conf.rxmode.offloads;
> >>> +
> >>> +	/*
> >>> +	 * New added offloadings for this queue are those not enabled in
> >>> +	 * rte_eth_dev_configure() and they must be per-queue type.
> >>> +	 * A pure per-port offloading can't be enabled on a queue while
> >>> +	 * disabled on another queue. A pure per-port offloading can't
> >>> +	 * be enabled for any queue as new added one if it hasn't been
> >>> +	 * enabled in rte_eth_dev_configure().
> >>> +	 */
> >>> +	if ((local_conf.offloads & dev_info.rx_queue_offload_capa) !=
> >>> +	     local_conf.offloads) {
> >>> +		RTE_ETHDEV_LOG(ERR,
> >>> +			"Ethdev port_id=%d rx_queue_id=%d, "
> >>> +			"new added offloads 0x%"PRIx64" must be "
> >>> +			"within per-queue offload capabilities "
> >>> +			"0x%"PRIx64" in %s()\n",
> >>> +			port_id, rx_queue_id, local_conf.offloads,
> >>> +			dev_info.rx_queue_offload_capa,
> >>> +			__func__);
> >>> +		return -EINVAL;
> >>> +	}
> >>> +
> >>> +	ret = (*dev->dev_ops->rx_hairpin_queue_setup)(dev,
> >> rx_queue_id,
> >>> +						      nb_rx_desc, socket_id,
> >>> +						      &local_conf,
> >>> +						      hairpin_conf);
> >>> +
> >>> +	return eth_err(port_id, ret);
> >>> +}
> >>> +
> >>> +int
> >>>    rte_eth_tx_queue_setup(uint16_t port_id, uint16_t tx_queue_id,
> >>>    		       uint16_t nb_tx_desc, unsigned int socket_id,
> >>>    		       const struct rte_eth_txconf *tx_conf) @@ -1799,6
> >> +1908,110
> >>> @@ struct rte_eth_dev *
> >>>    		       tx_queue_id, nb_tx_desc, socket_id, &local_conf));
> >>>    }
> >>>
> >>> +int
> >>> +rte_eth_tx_hairpin_queue_setup(uint16_t port_id, uint16_t
> >> tx_queue_id,
> >>> +			       uint16_t nb_tx_desc, unsigned int socket_id,
> >>> +			       const struct rte_eth_txconf *tx_conf,
> >>> +			       const struct rte_eth_hairpin_conf *hairpin_conf)
> >> {
> >>> +	struct rte_eth_dev *dev;
> >>> +	struct rte_eth_dev_info dev_info;
> >>> +	struct rte_eth_txconf local_conf;
> >>> +	void **txq;
> >>> +
> >>> +	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
> >>> +
> >>> +	dev = &rte_eth_devices[port_id];
> >>> +	if (tx_queue_id >= dev->data->nb_tx_queues) {
> >>> +		RTE_ETHDEV_LOG(ERR, "Invalid TX queue_id=%u\n",
> >> tx_queue_id);
> >>> +		return -EINVAL;
> >>> +	}
> >>> +
> >>> +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_infos_get, -
> >> ENOTSUP);
> >>> +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops-
> >>> tx_hairpin_queue_setup,
> >>> +				-ENOTSUP);
> >>> +
> >>> +	rte_eth_dev_info_get(port_id, &dev_info);
> >>> +
> >>> +	/* Use default specified by driver, if nb_tx_desc is zero */
> >>> +	if (nb_tx_desc == 0) {
> >>> +		nb_tx_desc = dev_info.default_txportconf.ring_size;
> >>> +		/* If driver default is zero, fall back on EAL default */
> >>> +		if (nb_tx_desc == 0)
> >>> +			nb_tx_desc =
> >> RTE_ETH_DEV_FALLBACK_TX_RINGSIZE;
> >>> +	}
> >>> +	if (nb_tx_desc > dev_info.tx_desc_lim.nb_max ||
> >>> +	    nb_tx_desc < dev_info.tx_desc_lim.nb_min ||
> >>> +	    nb_tx_desc % dev_info.tx_desc_lim.nb_align != 0) {
> >>> +		RTE_ETHDEV_LOG(ERR,
> >>> +			       "Invalid value for nb_tx_desc(=%hu), "
> >>> +			       "should be: <= %hu, >= %hu, and a product of "
> >>> +			       " %hu\n",
> >>> +			       nb_tx_desc, dev_info.tx_desc_lim.nb_max,
> >>> +			       dev_info.tx_desc_lim.nb_min,
> >>> +			       dev_info.tx_desc_lim.nb_align);
> >>> +		return -EINVAL;
> >>> +	}
> >>> +
> >>> +	if (dev->data->dev_started &&
> >>> +		!(dev_info.dev_capa &
> >>> +		  RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP))
> >>> +		return -EBUSY;
> >>> +
> >>> +	if (dev->data->dev_started &&
> >>> +		(dev->data->tx_queue_state[tx_queue_id] !=
> >>> +		 RTE_ETH_QUEUE_STATE_STOPPED))
> >>> +		return -EBUSY;
> >>> +
> >>> +	txq = dev->data->tx_queues;
> >>> +	if (txq[tx_queue_id]) {
> >>> +		RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops-
> >>> tx_queue_release,
> >>> +					-ENOTSUP);
> >>> +		(*dev->dev_ops->tx_queue_release)(txq[tx_queue_id]);
> >>> +		txq[tx_queue_id] = NULL;
> >>> +	}
> >>> +
> >>> +	if (tx_conf == NULL)
> >>> +		tx_conf = &dev_info.default_txconf;
> >>> +
> >>> +	local_conf = *tx_conf;
> >>> +
> >>> +	/*
> >>> +	 * If an offloading has already been enabled in
> >>> +	 * rte_eth_dev_configure(), it has been enabled on all queues,
> >>> +	 * so there is no need to enable it in this queue again.
> >>> +	 * The local_conf.offloads input to underlying PMD only carries
> >>> +	 * those offloadings which are only enabled on this queue and
> >>> +	 * not enabled on all queues.
> >>> +	 */
> >>> +	local_conf.offloads &= ~dev->data->dev_conf.txmode.offloads;
> >>> +
> >>> +	/*
> >>> +	 * New added offloadings for this queue are those not enabled in
> >>> +	 * rte_eth_dev_configure() and they must be per-queue type.
> >>> +	 * A pure per-port offloading can't be enabled on a queue while
> >>> +	 * disabled on another queue. A pure per-port offloading can't
> >>> +	 * be enabled for any queue as new added one if it hasn't been
> >>> +	 * enabled in rte_eth_dev_configure().
> >>> +	 */
> >>> +	if ((local_conf.offloads & dev_info.tx_queue_offload_capa) !=
> >>> +	     local_conf.offloads) {
> >>> +		RTE_ETHDEV_LOG(ERR,
> >>> +			       "Ethdev port_id=%d tx_queue_id=%d, new
> >> added "
> >>> +			       "offloads 0x%"PRIx64" must be within "
> >>> +			       "per-queue offload capabilities 0x%"PRIx64" "
> >>> +			       "in %s()\n",
> >>> +			       port_id, tx_queue_id, local_conf.offloads,
> >>> +			       dev_info.tx_queue_offload_capa,
> >>> +			       __func__);
> >>> +		return -EINVAL;
> >>> +	}
> >>> +
> >>> +	return eth_err(port_id, (*dev->dev_ops->tx_hairpin_queue_setup)
> >>> +		       (dev, tx_queue_id, nb_tx_desc, socket_id, &local_conf,
> >>> +			hairpin_conf));
> >>> +}
> >>> +
> >>>    void
> >>>    rte_eth_tx_buffer_drop_callback(struct rte_mbuf **pkts, uint16_t
> >> unsent,
> >>>    		void *userdata __rte_unused)
> >>> diff --git a/lib/librte_ethdev/rte_ethdev.h
> >>> b/lib/librte_ethdev/rte_ethdev.h index 475dbda..b3b1597 100644
> >>> --- a/lib/librte_ethdev/rte_ethdev.h
> >>> +++ b/lib/librte_ethdev/rte_ethdev.h
> >>> @@ -803,6 +803,30 @@ struct rte_eth_txconf {
> >>>    	uint64_t offloads;
> >>>    };
> >>>
> >>> +#define RTE_ETH_MAX_HAIRPIN_PEERS 32
> >>> +
> >>> +/**
> >>> + * @warning
> >>> + * @b EXPERIMENTAL: this API may change, or be removed, without prior
> >>> +notice
> >>> + *
> >>> + * A structure used to hold hairpin peer data.
> >>> + */
> >>> +struct rte_eth_hairpin_peer {
> >>> +	uint16_t port; /**< Peer port. */
> >>> +	uint16_t queue; /**< Peer queue. */
> >>> +};
> >>> +
> >>> +/**
> >>> + * @warning
> >>> + * @b EXPERIMENTAL: this API may change, or be removed, without prior
> >>> +notice
> >>> + *
> >>> + * A structure used to configure hairpin binding.
> >>> + */
> >>> +struct rte_eth_hairpin_conf {
> >>> +	uint16_t peer_n; /**< The number of peers. */
> >>> +	struct rte_eth_hairpin_peer peers[RTE_ETH_MAX_HAIRPIN_PEERS];
> >> };
> >>> +
> >>>    /**
> >>>     * A structure contains information about HW descriptor ring limitations.
> >>>     */
> >>> @@ -1769,6 +1793,60 @@ int rte_eth_rx_queue_setup(uint16_t port_id,
> >> uint16_t rx_queue_id,
> >>>    		struct rte_mempool *mb_pool);
> >>>
> >>>    /**
> >>> + * @warning
> >>> + * @b EXPERIMENTAL: this API may change, or be removed, without prior
> >>> + notice
> >>> + *
> >>> + * Allocate and set up a hairpin receive queue for an Ethernet device.
> >>> + *
> >>> + * The function set up the selected queue to be used in hairpin.
> >>> + *
> >>> + * @param port_id
> >>> + *   The port identifier of the Ethernet device.
> >>> + * @param rx_queue_id
> >>> + *   The index of the receive queue to set up.
> >>> + *   The value must be in the range [0, nb_rx_queue - 1] previously
> >> supplied
> >>> + *   to rte_eth_dev_configure().
> >> Is any Rx queue may be setup as hairpin queue?
> >> Can it be still used for regular traffic?
> >>
> > No if a queue is used as hairpin it can't be used for normal traffic.
> > This is also why I like the idea of two different functions, in order to create
> > This distinction.
> 
> If so, do we need at least debug-level checks in Tx/Rx burst functions?
> Is it required to patch the rte flow RSS action to ensure that Rx queues of
> only one kind are specified?
> What about attempts to add Rx/Tx callbacks for hairpin queues?
> 

I think the checks should be done at the PMD level, since from a high level they are
the same. Callbacks for Rx/Tx don't make sense, since the idea is to bypass the
CPU.
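
If we do want a debug aid, a PMD could guard its SW datapath with
something like this (a sketch; the queue structure and flag are
hypothetical):

	/* Hypothetical PMD Rx queue structure with a hairpin flag. */
	struct pmd_rxq {
		unsigned int hairpin:1;
		/* ... */
	};

	static uint16_t
	pmd_rx_burst(void *queue, struct rte_mbuf **pkts, uint16_t nb_pkts)
	{
		struct pmd_rxq *rxq = queue;

	#ifdef RTE_LIBRTE_ETHDEV_DEBUG
		/* Hairpin traffic never reaches the SW burst functions. */
		if (unlikely(rxq->hairpin))
			return 0;
	#endif
		/* ... normal receive path ... */
		RTE_SET_USED(pkts);
		RTE_SET_USED(nb_pkts);
		return 0;
	}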

> >>> + * @param nb_rx_desc
> >>> + *   The number of receive descriptors to allocate for the receive ring.
> >> Does it still make sense for hairpin queue?
> >>
> > Yes, since it can affect memory size used by the device, and can affect
> performance.
> >
> >>> + * @param socket_id
> >>> + *   The *socket_id* argument is the socket identifier in case of NUMA.
> >>> + *   The value can be *SOCKET_ID_ANY* if there is no NUMA constraint
> >> for
> >>> + *   the DMA memory allocated for the receive descriptors of the ring.
> >> Is it still required to be provided for hairpin Rx queue?
> >>
> > Yes, for internal PMD structures to be allocated, but we can if pressed
> remove it.
> >
> >>> + * @param rx_conf
> >>> + *   The pointer to the configuration data to be used for the receive
> >> queue.
> >>> + *   NULL value is allowed, in which case default RX configuration
> >>> + *   will be used.
> >>> + *   The *rx_conf* structure contains an *rx_thresh* structure with the
> >> values
> >>> + *   of the Prefetch, Host, and Write-Back threshold registers of the
> >> receive
> >>> + *   ring.
> >>> + *   In addition it contains the hardware offloads features to activate using
> >>> + *   the DEV_RX_OFFLOAD_* flags.
> >>> + *   If an offloading set in rx_conf->offloads
> >>> + *   hasn't been set in the input argument eth_conf->rxmode.offloads
> >>> + *   to rte_eth_dev_configure(), it is a new added offloading, it must be
> >>> + *   per-queue type and it is enabled for the queue.
> >>> + *   No need to repeat any bit in rx_conf->offloads which has already been
> >>> + *   enabled in rte_eth_dev_configure() at port level. An offloading
> >> enabled
> >>> + *   at port level can't be disabled at queue level.
> >> Which offloads still make sense in the case of hairpin Rx queue?
> >> What about thresholds, drop enable?
> >>
> > Drop and thresholds make sense, for example the application can state that,
> > in case of back pressure to start dropping packets in order not to affect the
> > entire nic.
> > regarding offloads mainly vlan strip or vlan insert but those can also
> > be used in rte_flow.
> > But future offloads like QoS or other maybe shared.
> 
> I'm not a fan of dead parameters which are added just to use
> the same structure. It raises too many questions on maintenance.
> Also I don't like the idea of sharing hairpin and regular offloads.
> Maybe it is OK to share the namespace (still unsure), but the capabilities
> are definitely different and some regular offloads are simply not
> applicable to the hairpin case.
> 
I agree with you; I think that my suggestion above (new caps for hairpin)
solves this issue. Do you agree?
I will remove the rte_eth_txconf and only have the hairpin_conf with some new
fields, same for the Rx. Is that O.K.?
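
I.e. something like this instead of passing a full rte_eth_txconf (a
sketch; the new hairpin_conf fields are only a first idea):

	struct rte_eth_hairpin_conf {
		uint16_t peer_n; /**< The number of peers. */
		struct rte_eth_hairpin_peer peers[RTE_ETH_MAX_HAIRPIN_PEERS];
		/* Possible hairpin-specific additions replacing txconf/rxconf,
		 * e.g. a dedicated offloads bitmap. */
		uint64_t offloads;
	};

	__rte_experimental
	int rte_eth_tx_hairpin_queue_setup
		(uint16_t port_id, uint16_t tx_queue_id,
		 uint16_t nb_tx_desc, unsigned int socket_id,
		 const struct rte_eth_hairpin_conf *hairpin_conf);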


> >>> + * @param hairpin_conf
> >>> + *   The pointer to the hairpin binding configuration.
> >>> + * @return
> >>> + *   - 0: Success, receive queue correctly set up.
> >>> + *   - -EINVAL: The size of network buffers which can be allocated from the
> >>> + *      memory pool does not fit the various buffer sizes allowed by the
> >>> + *      device controller.
> >>> + *   - -ENOMEM: Unable to allocate the receive ring descriptors or to
> >>> + *      allocate network memory buffers from the memory pool when
> >>> + *      initializing receive descriptors.
> >>> + */
> >>> +__rte_experimental
> >>> +int rte_eth_rx_hairpin_queue_setup
> >>> +	(uint16_t port_id, uint16_t rx_queue_id,
> >>> +	 uint16_t nb_rx_desc, unsigned int socket_id,
> >>> +	 const struct rte_eth_rxconf *rx_conf,
> >>> +	 const struct rte_eth_hairpin_conf *hairpin_conf);
> >>> +
> >>> +/**
> >>>     * Allocate and set up a transmit queue for an Ethernet device.
> >>>     *
> >>>     * @param port_id
> >>> @@ -1821,6 +1899,73 @@ int rte_eth_tx_queue_setup(uint16_t port_id,
> >> uint16_t tx_queue_id,
> >>>    		const struct rte_eth_txconf *tx_conf);
> >>>
> >>>    /**
> >>> + * @warning
> >>> + * @b EXPERIMENTAL: this API may change, or be removed, without prior
> >>> + notice
> >>> + *
> >>> + * Allocate and set up a transmit hairpin queue for an Ethernet device.
> >>> + *
> >>> + * @param port_id
> >>> + *   The port identifier of the Ethernet device.
> >>> + * @param tx_queue_id
> >>> + *   The index of the transmit queue to set up.
> >>> + *   The value must be in the range [0, nb_tx_queue - 1] previously
> >> supplied
> >>> + *   to rte_eth_dev_configure().
> >> Is any Tx queue may be setup as hairpin queue?
> >>
> > Yes just like any Rx queue.
> >
> >>> + * @param nb_tx_desc
> >>> + *   The number of transmit descriptors to allocate for the transmit ring.
> >> Is it really required for hairpin queue? Are min/max/align limits still the
> >> same?
> >>
> > The number of descriptors can effect memory and performance.
> > Regarding min/max/align I guess this depends on the implementation in the
> nic.
> 
> Again, it looks like separate dev_info-like information.
> 

Please see comments above.

> >>> + * @param socket_id
> >>> + *   The *socket_id* argument is the socket identifier in case of NUMA.
> >>> + *   Its value can be *SOCKET_ID_ANY* if there is no NUMA constraint for
> >>> + *   the DMA memory allocated for the transmit descriptors of the ring.
> >> Does it still make sense for Tx hairpin queue?
> >>
> > Same as for the RX, it is used for internal PMD structures, but maybe on
> > other nics they can use this.
> >
> >>> + * @param tx_conf
> >>> + *   The pointer to the configuration data to be used for the transmit
> >> queue.
> >>> + *   NULL value is allowed, in which case default RX configuration
> >>> + *   will be used.
> >>> + *   The *tx_conf* structure contains the following data:
> >>> + *   - The *tx_thresh* structure with the values of the Prefetch, Host, and
> >>> + *     Write-Back threshold registers of the transmit ring.
> >>> + *     When setting Write-Back threshold to the value greater then zero,
> >>> + *     *tx_rs_thresh* value should be explicitly set to one.
> >>> + *   - The *tx_free_thresh* value indicates the [minimum] number of
> >> network
> >>> + *     buffers that must be pending in the transmit ring to trigger their
> >>> + *     [implicit] freeing by the driver transmit function.
> >>> + *   - The *tx_rs_thresh* value indicates the [minimum] number of
> >> transmit
> >>> + *     descriptors that must be pending in the transmit ring before setting
> >> the
> >>> + *     RS bit on a descriptor by the driver transmit function.
> >>> + *     The *tx_rs_thresh* value should be less or equal then
> >>> + *     *tx_free_thresh* value, and both of them should be less then
> >>> + *     *nb_tx_desc* - 3.
> >> I'm not sure that everything above makes sense for hairpin Tx queue.
> >>
> > You are right not all of them make sense,
> > But since I don't know other nics I prefer to give them those values, if they
> need them.
> > If you wish I can change the documentation.
> 
> Dead parameters are not nice.
>

See comments above.
 
> >>> + *   - The *txq_flags* member contains flags to pass to the TX queue
> >> setup
> >>> + *     function to configure the behavior of the TX queue. This should be
> >> set
> >>> + *     to 0 if no special configuration is required.
> >>> + *     This API is obsolete and will be deprecated. Applications
> >>> + *     should set it to ETH_TXQ_FLAGS_IGNORE and use
> >>> + *     the offloads field below.
> >> There is no txq_flags for a long time already. So, I'm wondering when it was
> >> copies from rte_eth_tx_queue_setup().
> >>
> > My bad from 17.11. will fix.
> >
> >>> + *   - The *offloads* member contains Tx offloads to be enabled.
> >>> + *     If an offloading set in tx_conf->offloads
> >>> + *     hasn't been set in the input argument eth_conf->txmode.offloads
> >>> + *     to rte_eth_dev_configure(), it is a new added offloading, it must be
> >>> + *     per-queue type and it is enabled for the queue.
> >>> + *     No need to repeat any bit in tx_conf->offloads which has already
> >> been
> >>> + *     enabled in rte_eth_dev_configure() at port level. An offloading
> >> enabled
> >>> + *     at port level can't be disabled at queue level.
> >> Which offloads do really make sense and valid to use for hairpin Tx queues?
> >> Do we need separate caps for hairpin offloads?
> >>
> > I'm sure that we will need caps for example QoS but I don't know which yet.
> 
> Same as Rx.
>

Agreed, please see comments above.
 
> >>> + *
> >>> + *     Note that setting *tx_free_thresh* or *tx_rs_thresh* value to 0
> >> forces
> >>> + *     the transmit function to use default values.
> >>> + * @param hairpin_conf
> >>> + *   The hairpin binding configuration.
> >>> + *
> >>> + * @return
> >>> + *   - 0: Success, the transmit queue is correctly set up.
> >>> + *   - -ENOMEM: Unable to allocate the transmit ring descriptors.
> >>> + */
> >>> +__rte_experimental
> >>> +int rte_eth_tx_hairpin_queue_setup
> >>> +	(uint16_t port_id, uint16_t tx_queue_id,
> >>> +	 uint16_t nb_tx_desc, unsigned int socket_id,
> >>> +	 const struct rte_eth_txconf *tx_conf,
> >>> +	 const struct rte_eth_hairpin_conf *hairpin_conf);
> >>> +
> >>> +/**
> >>>     * Return the NUMA socket to which an Ethernet device is connected
> >>>     *
> >>>     * @param port_id
> >>>
> >> [snip]

Thanks,
Ori

^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [dpdk-dev] [PATCH 01/13] ethdev: support setup function for hairpin queue
  2019-09-28 15:19           ` Ori Kam
@ 2019-09-29 12:10             ` Andrew Rybchenko
  2019-10-02 12:19               ` Ori Kam
  0 siblings, 1 reply; 186+ messages in thread
From: Andrew Rybchenko @ 2019-09-29 12:10 UTC (permalink / raw)
  To: Ori Kam, Thomas Monjalon, Ferruh Yigit; +Cc: dev, jingjing.wu, stephen

Hi Ori,

On 9/28/19 6:19 PM, Ori Kam wrote:
> Hi Andrew.
> PSB
>
>> -----Original Message-----
>> From: Andrew Rybchenko <arybchenko@solarflare.com>
>> Sent: Thursday, September 26, 2019 8:24 PM
>> To: Ori Kam <orika@mellanox.com>; Thomas Monjalon
>> <thomas@monjalon.net>; Ferruh Yigit <ferruh.yigit@intel.com>
>> Cc: dev@dpdk.org; jingjing.wu@intel.com; stephen@networkplumber.org
>> Subject: Re: [dpdk-dev] [PATCH 01/13] ethdev: support setup function for
>> hairpin queue
>>
>> On 9/26/19 6:58 PM, Ori Kam wrote:
>>> Hi Andrew,
>>> Thanks for your comments PSB.
>>>
>>>> -----Original Message-----
>>>> From: Andrew Rybchenko <arybchenko@solarflare.com>
>>>> On 9/26/19 9:28 AM, Ori Kam wrote:
>>>>> This commit introduce the RX/TX hairpin setup function.
>>>>> RX/TX should be Rx/Tx here and everywhere below.
>>>>>
>>>>> Hairpin is RX/TX queue that is used by the nic in order to offload
>>>>> wire to wire traffic.
>>>>>
>>>>> Each hairpin queue is binded to one or more queues from other type.
>>>>> For example TX hairpin queue should be binded to at least 1 RX hairpin
>>>>> queue and vice versa.
>>>> How should application find out that hairpin queues are supported?
>>> It should be stated in the release note of the DPDK, when manufacture adds
>> support for this.
>>> In addition if the application try to set hairpin queue and it fails it can mean
>> depending on the
>>> error that the hairpin is not supported.
>> I'm talking about dev_info-like information. Documentation is nice, but
>> it is not
>> very useful to implement application which works with NICs from
>> different vendors.
>>
> What if we add get hairpin capabilities function.
> We could have,  the max number of queues, if the nic support 1:n connection,
> which offloads are supported and so on. So basically create a new set of capabilities
> for hairpin this I think will also remove other concern that you have.
> What do you think?

Yes, I think an API to report capabilities would be useful.
It should also be used in the setup functions in order to check at the
generic level that the setup request is OK vs the caps.
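
A minimal sketch of what such a capability query could look like (all
names here are illustrative, nothing is settled at this point):

struct rte_eth_hairpin_cap {
	uint16_t max_nb_queues; /**< Max number of hairpin queues. */
	uint16_t max_rx_2_tx;   /**< Max Tx queues one Rx queue can bind to. */
	uint16_t max_tx_2_rx;   /**< Max Rx queues one Tx queue can bind to. */
	uint16_t max_nb_desc;   /**< Max descriptors per hairpin queue. */
};

__rte_experimental
int rte_eth_dev_hairpin_capability_get(uint16_t port_id,
				       struct rte_eth_hairpin_cap *cap);

The setup functions could then validate nb_desc and the peer list against
these caps before calling into the PMD.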

>>>> How many?
>>> There is no limit to the number of hairpin queues from application all queues
>> can be hairpin queues.
>>
>> I'm pretty sure that it could be vendor specific.
>>
> Please see my answer above.
>   
>>>> How should application find out which ports/queues could be used for
>>>> pining?
>>> All ports and queues can be supported, if the application request invalid
>> combination, for example
>>> in current Mellanox implementation binding between two ports then the
>> setup function will  fail.
>>> If you would like I can add capability for this, but there are too many options.
>> For example number
>>> of queues, binding limitations all of those will be very hard to declare.
>>>
>>>
>>>> Is hair-pinning domain on device level sufficient to expose limitations?
>>>>
>>> I'm sorry but I don’t understand your question.
>> I was just trying to imagine how we could  say that we can hairpin
>> one port Rx queues to another port Tx queues.
>>
> Like I suggested above if I will add a capability function we could have
> a field that says port_binidng supported, or something else, along this line.

Not sure that I understand, but I'll take a look when submitted.

>>>>> Signed-off-by: Ori Kam <orika@mellanox.com>
>>>>> ---
>>>>>     lib/librte_ethdev/rte_ethdev.c           | 213
>>>>> +++++++++++++++++++++++++++++++
>>>>>     lib/librte_ethdev/rte_ethdev.h           | 145 +++++++++++++++++++++
>>>>>     lib/librte_ethdev/rte_ethdev_core.h      |  18 +++
>>>>>     lib/librte_ethdev/rte_ethdev_version.map |   4 +
>>>>>     4 files changed, 380 insertions(+)
>>>>>
>>>>> diff --git a/lib/librte_ethdev/rte_ethdev.c
>>>>> b/lib/librte_ethdev/rte_ethdev.c index 30b0c78..4021f38 100644
>>>>> --- a/lib/librte_ethdev/rte_ethdev.c
>>>>> +++ b/lib/librte_ethdev/rte_ethdev.c
>>>>> @@ -1701,6 +1701,115 @@ struct rte_eth_dev *
>>>>>     }
>>>>>
>>>>>     int
>>>>> +rte_eth_rx_hairpin_queue_setup(uint16_t port_id, uint16_t
>>>>> rx_queue_id,
>>>>> +			       uint16_t nb_rx_desc, unsigned int socket_id,
>>>>> +			       const struct rte_eth_rxconf *rx_conf,
>>>>> +			       const struct rte_eth_hairpin_conf *hairpin_conf)
>>>>> Below code duplicates rte_eth_rx_queue_setup() a lot and it is very bad
>>>>> from maintenance point of view. Similar problem with Tx hairpin queue
>>>>> setup.
>>>>>
>>> I'm aware of that. The reasons I choose it are: (same goes to Tx)
>>> 1. use the same function approach, meaning to use the current  setup
>> function
>>>       the issues with this are:
>>>        * API break.
>>>        * It will have extra parameters, for example mempool will not be used
>>>           for hairpin and hairpin configuration will not be used for normal queue.
>>>           It is possible to use a struct but again API break and some fields are not
>> used.
>>>        * we are just starting with the hairpin, most likely there will be
>> modification so
>>>            it is better to have a different function.
>>>        * From application he undertand that this is a different kind of queue,
>> which shouldn't be
>>>            used by the application.
>> It does not excuse to duplicate so much code below. If we have separate
>> dev_info-like limitations for hairpin, it would make sense, but I hope that
>> it would be still possible to avoid code duplication.
>>
> We can start with the most basic implementation, which will mean that the function
> will almost be empty, when other vendors or Mellanox will require some additional
> test or code they will be able to decide if to add new code to he function, or
> extract the shared code from the standard function to a specific function, and
> use this function in both setup functions.
> What do you think?

Let's try and take a look at the code.

[snip]

>>>>> @@ -1769,6 +1793,60 @@ int rte_eth_rx_queue_setup(uint16_t port_id,
>>>> uint16_t rx_queue_id,
>>>>>     		struct rte_mempool *mb_pool);
>>>>>
>>>>>     /**
>>>>> + * @warning
>>>>> + * @b EXPERIMENTAL: this API may change, or be removed, without prior
>>>>> + notice
>>>>> + *
>>>>> + * Allocate and set up a hairpin receive queue for an Ethernet device.
>>>>> + *
>>>>> + * The function set up the selected queue to be used in hairpin.
>>>>> + *
>>>>> + * @param port_id
>>>>> + *   The port identifier of the Ethernet device.
>>>>> + * @param rx_queue_id
>>>>> + *   The index of the receive queue to set up.
>>>>> + *   The value must be in the range [0, nb_rx_queue - 1] previously
>>>> supplied
>>>>> + *   to rte_eth_dev_configure().
>>>> Is any Rx queue may be setup as hairpin queue?
>>>> Can it be still used for regular traffic?
>>>>
>>> No if a queue is used as hairpin it can't be used for normal traffic.
>>> This is also why I like the idea of two different functions, in order to create
>>> This distinction.
>> If so, do we need at least debug-level checks in Tx/Rx burst functions?
>> Is it required to patch rte flow RSS action to ensure that Rx queues of
>> only one kind are specified?
>> What about attempt to add Rx/Tx callbacks for hairpin queues?
>>
> I think the checks should be done in PMD level. Since from high level they are the
> same.

Sorry, I don't understand why. If something can be checked at the generic
level, it should be done there to avoid duplication in all drivers.

> Call backs for Rx/Tx doesn't make sense, since the idea is to bypass the CPU.

If so, I think rte_eth_add_tx_callback() should be patched to return an
error if the specified queue is hairpin. Same for Rx.
Any other cases?
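
A sketch of such a check, assuming a dedicated queue state value is added
(the state name below is hypothetical):

	/*
	 * Inside rte_eth_add_tx_callback(), before installing the
	 * callback; hairpin queues have no SW burst path.
	 */
	if (dev->data->tx_queue_state[queue_id] ==
	    RTE_ETH_QUEUE_STATE_HAIRPIN) {
		rte_errno = EINVAL;
		return NULL;
	}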

>>>>> + * @param nb_rx_desc
>>>>> + *   The number of receive descriptors to allocate for the receive ring.
>>>> Does it still make sense for hairpin queue?
>>>>
>>> Yes, since it can affect memory size used by the device, and can affect
>> performance.
>>>>> + * @param socket_id
>>>>> + *   The *socket_id* argument is the socket identifier in case of NUMA.
>>>>> + *   The value can be *SOCKET_ID_ANY* if there is no NUMA constraint
>>>> for
>>>>> + *   the DMA memory allocated for the receive descriptors of the ring.
>>>> Is it still required to be provided for hairpin Rx queue?
>>>>
>>> Yes, for internal PMD structures to be allocated, but we can if pressed
>> remove it.
>>>>> + * @param rx_conf
>>>>> + *   The pointer to the configuration data to be used for the receive
>>>> queue.
>>>>> + *   NULL value is allowed, in which case default RX configuration
>>>>> + *   will be used.
>>>>> + *   The *rx_conf* structure contains an *rx_thresh* structure with the
>>>> values
>>>>> + *   of the Prefetch, Host, and Write-Back threshold registers of the
>>>> receive
>>>>> + *   ring.
>>>>> + *   In addition it contains the hardware offloads features to activate using
>>>>> + *   the DEV_RX_OFFLOAD_* flags.
>>>>> + *   If an offloading set in rx_conf->offloads
>>>>> + *   hasn't been set in the input argument eth_conf->rxmode.offloads
>>>>> + *   to rte_eth_dev_configure(), it is a new added offloading, it must be
>>>>> + *   per-queue type and it is enabled for the queue.
>>>>> + *   No need to repeat any bit in rx_conf->offloads which has already been
>>>>> + *   enabled in rte_eth_dev_configure() at port level. An offloading
>>>> enabled
>>>>> + *   at port level can't be disabled at queue level.
>>>> Which offloads still make sense in the case of hairpin Rx queue?
>>>> What about threshhods, drop enable?
>>>>
>>> Drop and thresholds make sense, for example the application can state that,
>>> in case of back pressure to start dropping packets in order not to affect the
>>> entire nic.
>>> regarding offloads mainly vlan strip or vlan insert but those can also
>>> be used in rte_flow.
>>> But future offloads like QoS or other maybe shared.
>> I'm not a fan of dead parameters which are added just to use
>> the same structure. It raises too many questions on maintenance.
>> Also I don't like idea to share hairpin and regular offloads.
>> May be it is OK to share namespace (still unsure), but capabilities
>> are definitely different and some regular offloads are simply not
>> applicable to hairpin case.
>>
> I agree with you I think that my suggestion above (new caps for hairpin)
> solve this issue. Do you agree?
> I will remove the rte_eth_txconf and only hae the hairpin_conf with some new
> fields, same for the Rx, is that O.K.?

I think it would be better to keep only the used parameters.
Anyway, it is an experimental API and we can add the missing parameters
when required.

[snip]

Thanks,
Andrew.

^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [dpdk-dev] [PATCH 01/13] ethdev: support setup function for hairpin queue
  2019-09-29 12:10             ` Andrew Rybchenko
@ 2019-10-02 12:19               ` Ori Kam
  2019-10-03 13:26                 ` Andrew Rybchenko
  0 siblings, 1 reply; 186+ messages in thread
From: Ori Kam @ 2019-10-02 12:19 UTC (permalink / raw)
  To: Andrew Rybchenko, Thomas Monjalon, Ferruh Yigit; +Cc: dev, jingjing.wu, stephen

Hi Andrew,

Sorry it took me some time to respond (I'm on vacation 😊).
I think we are in agreement in most cases. The only open issue is the
checks, so please see my comments below.
As soon as we reach an understanding on this issue, I will start working on V2.

Thanks,
Ori
 
> -----Original Message-----
> From: Andrew Rybchenko <arybchenko@solarflare.com>
> Sent: Sunday, September 29, 2019 3:11 PM
> To: Ori Kam <orika@mellanox.com>; Thomas Monjalon
> <thomas@monjalon.net>; Ferruh Yigit <ferruh.yigit@intel.com>
> Cc: dev@dpdk.org; jingjing.wu@intel.com; stephen@networkplumber.org
> Subject: Re: [dpdk-dev] [PATCH 01/13] ethdev: support setup function for
> hairpin queue
> 
> Hi Ori,
> 
> On 9/28/19 6:19 PM, Ori Kam wrote:
> > Hi Andrew.
> > PSB
> >
> >> -----Original Message-----
> >> From: Andrew Rybchenko <arybchenko@solarflare.com>
> >> Sent: Thursday, September 26, 2019 8:24 PM
> >> To: Ori Kam <orika@mellanox.com>; Thomas Monjalon
> >> <thomas@monjalon.net>; Ferruh Yigit <ferruh.yigit@intel.com>
> >> Cc: dev@dpdk.org; jingjing.wu@intel.com; stephen@networkplumber.org
> >> Subject: Re: [dpdk-dev] [PATCH 01/13] ethdev: support setup function for
> >> hairpin queue
> >>
> >> On 9/26/19 6:58 PM, Ori Kam wrote:
> >>> Hi Andrew,
> >>> Thanks for your comments PSB.
> >>>
> >>>> -----Original Message-----
> >>>> From: Andrew Rybchenko <arybchenko@solarflare.com>
> >>>> On 9/26/19 9:28 AM, Ori Kam wrote:
> >>>>> This commit introduce the RX/TX hairpin setup function.
> >>>>> RX/TX should be Rx/Tx here and everywhere below.
> >>>>>
> >>>>> Hairpin is RX/TX queue that is used by the nic in order to offload
> >>>>> wire to wire traffic.
> >>>>>
> >>>>> Each hairpin queue is binded to one or more queues from other type.
> >>>>> For example TX hairpin queue should be binded to at least 1 RX hairpin
> >>>>> queue and vice versa.
> >>>> How should application find out that hairpin queues are supported?
> >>> It should be stated in the release note of the DPDK, when manufacture
> adds
> >> support for this.
> >>> In addition if the application try to set hairpin queue and it fails it can
> mean
> >> depending on the
> >>> error that the hairpin is not supported.
> >> I'm talking about dev_info-like information. Documentation is nice, but
> >> it is not
> >> very useful to implement application which works with NICs from
> >> different vendors.
> >>
> > What if we add get hairpin capabilities function.
> > We could have,  the max number of queues, if the nic support 1:n connection,
> > which offloads are supported and so on. So basically create a new set of
> capabilities
> > for hairpin this I think will also remove other concern that you have.
> > What do you think?
> 
> Yes, I think an API to report capabilities would be useful.
> It should be also used in setup functions in order to do checks on
> generic level that setup request is OK vs caps.
> 

Will be in my next version.

> >>>> How many?
> >>> There is no limit to the number of hairpin queues from application all
> queues
> >> can be hairpin queues.
> >>
> >> I'm pretty sure that it could be vendor specific.
> >>
> > Please see my answer above.
> >
> >>>> How should application find out which ports/queues could be used for
> >>>> pining?
> >>> All ports and queues can be supported, if the application request invalid
> >> combination, for example
> >>> in current Mellanox implementation binding between two ports then the
> >> setup function will  fail.
> >>> If you would like I can add capability for this, but there are too many
> options.
> >> For example number
> >>> of queues, binding limitations all of those will be very hard to declare.
> >>>
> >>>
> >>>> Is hair-pinning domain on device level sufficient to expose limitations?
> >>>>
> >>> I'm sorry but I don’t understand your question.
> >> I was just trying to imagine how we could  say that we can hairpin
> >> one port Rx queues to another port Tx queues.
> >>
> > Like I suggested above if I will add a capability function we could have
> > a field that says port_binidng supported, or something else, along this line.
> 
> Not sure that I understand, but I'll take a look when submitted.
> 

Thanks.

> >>>>> Signed-off-by: Ori Kam <orika@mellanox.com>
> >>>>> ---
> >>>>>     lib/librte_ethdev/rte_ethdev.c           | 213
> >>>>> +++++++++++++++++++++++++++++++
> >>>>>     lib/librte_ethdev/rte_ethdev.h           | 145 +++++++++++++++++++++
> >>>>>     lib/librte_ethdev/rte_ethdev_core.h      |  18 +++
> >>>>>     lib/librte_ethdev/rte_ethdev_version.map |   4 +
> >>>>>     4 files changed, 380 insertions(+)
> >>>>>
> >>>>> diff --git a/lib/librte_ethdev/rte_ethdev.c
> >>>>> b/lib/librte_ethdev/rte_ethdev.c index 30b0c78..4021f38 100644
> >>>>> --- a/lib/librte_ethdev/rte_ethdev.c
> >>>>> +++ b/lib/librte_ethdev/rte_ethdev.c
> >>>>> @@ -1701,6 +1701,115 @@ struct rte_eth_dev *
> >>>>>     }
> >>>>>
> >>>>>     int
> >>>>> +rte_eth_rx_hairpin_queue_setup(uint16_t port_id, uint16_t
> >>>>> rx_queue_id,
> >>>>> +			       uint16_t nb_rx_desc, unsigned int
> socket_id,
> >>>>> +			       const struct rte_eth_rxconf *rx_conf,
> >>>>> +			       const struct rte_eth_hairpin_conf
> *hairpin_conf)
> >>>>> Below code duplicates rte_eth_rx_queue_setup() a lot and it is very bad
> >>>>> from maintenance point of view. Similar problem with Tx hairpin queue
> >>>>> setup.
> >>>>>
> >>> I'm aware of that. The reasons I choose it are: (same goes to Tx)
> >>> 1. use the same function approach, meaning to use the current  setup
> >> function
> >>>       the issues with this are:
> >>>        * API break.
> >>>        * It will have extra parameters, for example mempool will not be used
> >>>           for hairpin and hairpin configuration will not be used for normal
> queue.
> >>>           It is possible to use a struct but again API break and some fields are
> not
> >> used.
> >>>        * we are just starting with the hairpin, most likely there will be
> >> modification so
> >>>            it is better to have a different function.
> >>>        * From application he undertand that this is a different kind of queue,
> >> which shouldn't be
> >>>            used by the application.
> >> It does not excuse to duplicate so much code below. If we have separate
> >> dev_info-like limitations for hairpin, it would make sense, but I hope that
> >> it would be still possible to avoid code duplication.
> >>
> > We can start with the most basic implementation, which will mean that the
> function
> > will almost be empty, when other vendors or Mellanox will require some
> additional
> > test or code they will be able to decide if to add new code to he function, or
> > extract the shared code from the standard function to a specific function, and
> > use this function in both setup functions.
> > What do you think?
> 
> Let's try and take a look at the code.
>

Thanks, 

 
> [snip]
> 
> >>>>> @@ -1769,6 +1793,60 @@ int rte_eth_rx_queue_setup(uint16_t
> port_id,
> >>>> uint16_t rx_queue_id,
> >>>>>     		struct rte_mempool *mb_pool);
> >>>>>
> >>>>>     /**
> >>>>> + * @warning
> >>>>> + * @b EXPERIMENTAL: this API may change, or be removed, without
> prior
> >>>>> + notice
> >>>>> + *
> >>>>> + * Allocate and set up a hairpin receive queue for an Ethernet device.
> >>>>> + *
> >>>>> + * The function set up the selected queue to be used in hairpin.
> >>>>> + *
> >>>>> + * @param port_id
> >>>>> + *   The port identifier of the Ethernet device.
> >>>>> + * @param rx_queue_id
> >>>>> + *   The index of the receive queue to set up.
> >>>>> + *   The value must be in the range [0, nb_rx_queue - 1] previously
> >>>> supplied
> >>>>> + *   to rte_eth_dev_configure().
> >>>> Is any Rx queue may be setup as hairpin queue?
> >>>> Can it be still used for regular traffic?
> >>>>
> >>> No if a queue is used as hairpin it can't be used for normal traffic.
> >>> This is also why I like the idea of two different functions, in order to create
> >>> This distinction.
> >> If so, do we need at least debug-level checks in Tx/Rx burst functions?
> >> Is it required to patch rte flow RSS action to ensure that Rx queues of
> >> only one kind are specified?
> >> What about attempt to add Rx/Tx callbacks for hairpin queues?
> >>
> > I think the checks should be done in PMD level. Since from high level they are
> the
> > same.
> 
> Sorry, I don't understand why. If something could be checked on generic
> level,
> it should be done to avoid duplication in all drivers.
> 

The issue with this approach is that at the ethdev level we don't know anything about the queue.
This would mean that we need to add an extra function to query the queue type for each PMD.
We could also assume that if no get-type function exists in the PMD, then the queue is always a standard queue.
So my suggestion, if you would like to move the checks, is to add a queue type enum at the ethdev level, and add
a function call to query the queue type. What do you think?
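
A rough sketch of that suggestion (all names are hypothetical):

enum rte_eth_queue_type {
	RTE_ETH_QUEUE_TYPE_STANDARD, /**< Implicit default. */
	RTE_ETH_QUEUE_TYPE_HAIRPIN,
};

/*
 * Hypothetical query; a PMD that does not implement the underlying
 * callback would implicitly report all of its queues as STANDARD.
 */
int rte_eth_rx_queue_type_get(uint16_t port_id, uint16_t queue_id,
			      enum rte_eth_queue_type *type);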

> > Call backs for Rx/Tx doesn't make sense, since the idea is to bypass the CPU.
> 
> If so, I think rte_eth_add_tx_callback() should be patched to return an
> error
> if specified queue is hairpin. Same for Rx.
> Any other cases?
> 

Same answer as above.

> >>>>> + * @param nb_rx_desc
> >>>>> + *   The number of receive descriptors to allocate for the receive ring.
> >>>> Does it still make sense for hairpin queue?
> >>>>
> >>> Yes, since it can affect memory size used by the device, and can affect
> >> performance.
> >>>>> + * @param socket_id
> >>>>> + *   The *socket_id* argument is the socket identifier in case of NUMA.
> >>>>> + *   The value can be *SOCKET_ID_ANY* if there is no NUMA constraint
> >>>> for
> >>>>> + *   the DMA memory allocated for the receive descriptors of the ring.
> >>>> Is it still required to be provided for hairpin Rx queue?
> >>>>
> >>> Yes, for internal PMD structures to be allocated, but we can if pressed
> >> remove it.
> >>>>> + * @param rx_conf
> >>>>> + *   The pointer to the configuration data to be used for the receive
> >>>> queue.
> >>>>> + *   NULL value is allowed, in which case default RX configuration
> >>>>> + *   will be used.
> >>>>> + *   The *rx_conf* structure contains an *rx_thresh* structure with the
> >>>> values
> >>>>> + *   of the Prefetch, Host, and Write-Back threshold registers of the
> >>>> receive
> >>>>> + *   ring.
> >>>>> + *   In addition it contains the hardware offloads features to activate
> using
> >>>>> + *   the DEV_RX_OFFLOAD_* flags.
> >>>>> + *   If an offloading set in rx_conf->offloads
> >>>>> + *   hasn't been set in the input argument eth_conf->rxmode.offloads
> >>>>> + *   to rte_eth_dev_configure(), it is a new added offloading, it must be
> >>>>> + *   per-queue type and it is enabled for the queue.
> >>>>> + *   No need to repeat any bit in rx_conf->offloads which has already
> been
> >>>>> + *   enabled in rte_eth_dev_configure() at port level. An offloading
> >>>> enabled
> >>>>> + *   at port level can't be disabled at queue level.
> >>>> Which offloads still make sense in the case of hairpin Rx queue?
> >>>> What about threshhods, drop enable?
> >>>>
> >>> Drop and thresholds make sense, for example the application can state
> that,
> >>> in case of back pressure to start dropping packets in order not to affect the
> >>> entire nic.
> >>> regarding offloads mainly vlan strip or vlan insert but those can also
> >>> be used in rte_flow.
> >>> But future offloads like QoS or other maybe shared.
> >> I'm not a fan of dead parameters which are added just to use
> >> the same structure. It raises too many questions on maintenance.
> >> Also I don't like idea to share hairpin and regular offloads.
> >> May be it is OK to share namespace (still unsure), but capabilities
> >> are definitely different and some regular offloads are simply not
> >> applicable to hairpin case.
> >>
> > I agree with you I think that my suggestion above (new caps for hairpin)
> > solve this issue. Do you agree?
> > I will remove the rte_eth_txconf and only hae the hairpin_conf with some
> new
> > fields, same for the Rx, is that O.K.?
> 
> I think it would be better to keep only used parameters.
> Anyway, it is experimental API and we can add missing parameters
> when required.
> 

Agree.

> [snip]
> 
> Thanks,
> Andrew.

^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [dpdk-dev] [PATCH 01/13] ethdev: support setup function for hairpin queue
  2019-10-02 12:19               ` Ori Kam
@ 2019-10-03 13:26                 ` Andrew Rybchenko
  2019-10-03 17:46                   ` Ori Kam
  0 siblings, 1 reply; 186+ messages in thread
From: Andrew Rybchenko @ 2019-10-03 13:26 UTC (permalink / raw)
  To: Ori Kam, Thomas Monjalon, Ferruh Yigit; +Cc: dev, jingjing.wu, stephen

Hi Ori,

@Thomas, @Ferruh, please, see question below.

On 10/2/19 3:19 PM, Ori Kam wrote:
> Hi Andrew,
>
> Sorry it took me some time to responded, (I'm on vacation 😊)
> I think we are in most cases in agreement. The only open issue is the
> checks so please see my comments below.
> As soon as we get to understanding about this issue, I will start working on V2.
>
> Thanks,
> Ori

[snip]

>>>>>>> @@ -1769,6 +1793,60 @@ int rte_eth_rx_queue_setup(uint16_t port_id,
>>>>>> uint16_t rx_queue_id,
>>>>>>>      		struct rte_mempool *mb_pool);
>>>>>>>
>>>>>>>      /**
>>>>>>> + * @warning
>>>>>>> + * @b EXPERIMENTAL: this API may change, or be removed, without prior
>>>>>>> + notice
>>>>>>> + *
>>>>>>> + * Allocate and set up a hairpin receive queue for an Ethernet device.
>>>>>>> + *
>>>>>>> + * The function set up the selected queue to be used in hairpin.
>>>>>>> + *
>>>>>>> + * @param port_id
>>>>>>> + *   The port identifier of the Ethernet device.
>>>>>>> + * @param rx_queue_id
>>>>>>> + *   The index of the receive queue to set up.
>>>>>>> + *   The value must be in the range [0, nb_rx_queue - 1] previously supplied
>>>>>>> + *   to rte_eth_dev_configure().
>>>>>> Is any Rx queue may be setup as hairpin queue?
>>>>>> Can it be still used for regular traffic?
>>>>>>
>>>>> No if a queue is used as hairpin it can't be used for normal traffic.
>>>>> This is also why I like the idea of two different functions, in order to create
>>>>> This distinction.
>>>> If so, do we need at least debug-level checks in Tx/Rx burst functions?
>>>> Is it required to patch rte flow RSS action to ensure that Rx queues of
>>>> only one kind are specified?
>>>> What about attempt to add Rx/Tx callbacks for hairpin queues?
>>>>
>>> I think the checks should be done in PMD level. Since from high level they are the
>>> same.
>> Sorry, I don't understand why. If something could be checked on generic level,
>> it should be done to avoid duplication in all drivers.
> The issue with this approach is that at the ethdev level we don't know anything about the queue.
> This will mean that we will need to add extra functions to query the queue type for each PMD.
> We could also assume that if to get type function exist in the pmd then the queue is always standard queue.
> So my suggestion if you would like to move the checks is to add queue type enum in the ethdev level, and add
> function call to query the queue type. What do you think?

I would consider using dev_data rx_queue_state and tx_queue_state to
keep the information, so that it is directly available without extra function
calls. Or add extra information. dev_data is internal, so that does not look
like a problem. What do you think?
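
For example, the generic debug checks in the burst functions could then
become something like this (the state name is hypothetical):

#ifdef RTE_LIBRTE_ETHDEV_DEBUG
	/* Hairpin queues have no SW burst path, nothing to receive. */
	if (dev->data->rx_queue_state[queue_id] ==
	    RTE_ETH_QUEUE_STATE_HAIRPIN)
		return 0;
#endif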

>>> Call backs for Rx/Tx doesn't make sense, since the idea is to bypass the CPU.
>> If so, I think rte_eth_add_tx_callback() should be patched to return an
>> error
>> if specified queue is hairpin. Same for Rx.
>> Any other cases?
>>
> Same answer as above.

[snip]

Andrew.

^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [dpdk-dev] [PATCH 01/13] ethdev: support setup function for hairpin queue
  2019-10-03 13:26                 ` Andrew Rybchenko
@ 2019-10-03 17:46                   ` Ori Kam
  0 siblings, 0 replies; 186+ messages in thread
From: Ori Kam @ 2019-10-03 17:46 UTC (permalink / raw)
  To: Andrew Rybchenko, Thomas Monjalon, Ferruh Yigit; +Cc: dev, jingjing.wu, stephen

Hi Andrew,

@Thomas Monjalon, @Ferruh Yigit

Please comment if you have any issues with my answer.

Thanks,
Ori

> -----Original Message-----
> From: Andrew Rybchenko <arybchenko@solarflare.com>
> Sent: Thursday, October 3, 2019 4:26 PM
> To: Ori Kam <orika@mellanox.com>; Thomas Monjalon
> <thomas@monjalon.net>; Ferruh Yigit <ferruh.yigit@intel.com>
> Cc: dev@dpdk.org; jingjing.wu@intel.com; stephen@networkplumber.org
> Subject: Re: [dpdk-dev] [PATCH 01/13] ethdev: support setup function for
> hairpin queue
> 
> Hi Ori,
> 
> @Thomas, @Ferruh, please, see question below.
> 
> On 10/2/19 3:19 PM, Ori Kam wrote:
> > Hi Andrew,
> >
> > Sorry it took me some time to responded, (I'm on vacation 😊)
> > I think we are in most cases in agreement. The only open issue is the
> > checks so please see my comments below.
> > As soon as we get to understanding about this issue, I will start working on V2.
> >
> > Thanks,
> > Ori
> 
> [snip]
> 
> >>>>>>> @@ -1769,6 +1793,60 @@ int rte_eth_rx_queue_setup(uint16_t
> port_id,
> >>>>>> uint16_t rx_queue_id,
> >>>>>>>      		struct rte_mempool *mb_pool);
> >>>>>>>
> >>>>>>>      /**
> >>>>>>> + * @warning
> >>>>>>> + * @b EXPERIMENTAL: this API may change, or be removed, without
> prior
> >>>>>>> + notice
> >>>>>>> + *
> >>>>>>> + * Allocate and set up a hairpin receive queue for an Ethernet device.
> >>>>>>> + *
> >>>>>>> + * The function set up the selected queue to be used in hairpin.
> >>>>>>> + *
> >>>>>>> + * @param port_id
> >>>>>>> + *   The port identifier of the Ethernet device.
> >>>>>>> + * @param rx_queue_id
> >>>>>>> + *   The index of the receive queue to set up.
> >>>>>>> + *   The value must be in the range [0, nb_rx_queue - 1] previously
> supplied
> >>>>>>> + *   to rte_eth_dev_configure().
> >>>>>> Is any Rx queue may be setup as hairpin queue?
> >>>>>> Can it be still used for regular traffic?
> >>>>>>
> >>>>> No if a queue is used as hairpin it can't be used for normal traffic.
> >>>>> This is also why I like the idea of two different functions, in order to
> create
> >>>>> This distinction.
> >>>> If so, do we need at least debug-level checks in Tx/Rx burst functions?
> >>>> Is it required to patch rte flow RSS action to ensure that Rx queues of
> >>>> only one kind are specified?
> >>>> What about attempt to add Rx/Tx callbacks for hairpin queues?
> >>>>
> >>> I think the checks should be done in PMD level. Since from high level they
> are the
> >>> same.
> >> Sorry, I don't understand why. If something could be checked on generic
> level,
> >> it should be done to avoid duplication in all drivers.
> > The issue with this approach is that at the ethdev level we don't know
> anything about the queue.
> > This will mean that we will need to add extra functions to query the queue
> type for each PMD.
> > We could also assume that if to get type function exist in the pmd then the
> queue is always standard queue.
> > So my suggestion if you would like to move the checks is to add queue type
> enum in the ethdev level, and add
> > function call to query the queue type. What do you think?
> 
> I would consider to use dev_data rx_queue_state and tx_queue_state to
> keep the information to have it directly available without extra function
> calls. Or add extra information. dev_data is internal and it looks like not
> a problem. What do you think?
> 

I like the new state idea; it will save some memory compared to adding a
dedicated type field in dev_data. It will also avoid an extra ABI change.

> >>> Call backs for Rx/Tx doesn't make sense, since the idea is to bypass the
> CPU.
> >> If so, I think rte_eth_add_tx_callback() should be patched to return an
> >> error
> >> if specified queue is hairpin. Same for Rx.
> >> Any other cases?
> >>
> > Same answer as above.
> 
> [snip]
> 
> Andrew.

^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [dpdk-dev] [PATCH 01/13] ethdev: support setup function for hairpin queue
  2019-09-26 12:18   ` Andrew Rybchenko
       [not found]     ` <AM0PR0502MB4019A2FEADE5F9DCD0D9DDFED2860@AM0PR0502MB4019.eurprd05.prod.outlook.com>
@ 2019-10-03 18:39     ` Ray Kinsella
  1 sibling, 0 replies; 186+ messages in thread
From: Ray Kinsella @ 2019-10-03 18:39 UTC (permalink / raw)
  To: dev, Haiyue Wang

Hi

On 26/09/2019 13:18, Andrew Rybchenko wrote:
> On 9/26/19 9:28 AM, Ori Kam wrote:
>> This commit introduce the RX/TX hairpin setup function.
> 
> RX/TX should be Rx/Tx here and everywhere below.
> 
>> Hairpin is RX/TX queue that is used by the nic in order to offload
>> wire to wire traffic.
>>
>> Each hairpin queue is binded to one or more queues from other type.
>> For example TX hairpin queue should be binded to at least 1 RX hairpin
>> queue and vice versa.
> 
> How should application find out that hairpin queues are supported?

You might want to look at the patch "[dpdk-dev] [PATCH v2 0/4] get Rx/Tx
packet burst mode information" from Haiyue Wang,

where he adds an information bitmask to describe the capabilities of the PMD.

Ray K

> How many?
> How should application find out which ports/queues could be used for
> pining?
> Is hair-pinning domain on device level sufficient to expose limitations?
> 
>> Signed-off-by: Ori Kam <orika@mellanox.com>
>> ---
>>   lib/librte_ethdev/rte_ethdev.c           | 213
>> +++++++++++++++++++++++++++++++
>>   lib/librte_ethdev/rte_ethdev.h           | 145 +++++++++++++++++++++
>>   lib/librte_ethdev/rte_ethdev_core.h      |  18 +++
>>   lib/librte_ethdev/rte_ethdev_version.map |   4 +
>>   4 files changed, 380 insertions(+)
>>
>> diff --git a/lib/librte_ethdev/rte_ethdev.c
>> b/lib/librte_ethdev/rte_ethdev.c
>> index 30b0c78..4021f38 100644
>> --- a/lib/librte_ethdev/rte_ethdev.c
>> +++ b/lib/librte_ethdev/rte_ethdev.c
>> @@ -1701,6 +1701,115 @@ struct rte_eth_dev *
>>   }
>>     int
>> +rte_eth_rx_hairpin_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
>> +                   uint16_t nb_rx_desc, unsigned int socket_id,
>> +                   const struct rte_eth_rxconf *rx_conf,
>> +                   const struct rte_eth_hairpin_conf *hairpin_conf)
> 
> Below code duplicates rte_eth_rx_queue_setup() a lot and it is very
> bad from maintenance point of view. Similar problem with Tx hairpin
> queue setup.
> 
>> +{
>> +    int ret;
>> +    struct rte_eth_dev *dev;
>> +    struct rte_eth_dev_info dev_info;
>> +    struct rte_eth_rxconf local_conf;
>> +    void **rxq;
>> +
>> +    RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
>> +
>> +    dev = &rte_eth_devices[port_id];
>> +    if (rx_queue_id >= dev->data->nb_rx_queues) {
>> +        RTE_ETHDEV_LOG(ERR, "Invalid RX queue_id=%u\n", rx_queue_id);
>> +        return -EINVAL;
>> +    }
>> +
>> +    RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_infos_get, -ENOTSUP);
>> +    RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_hairpin_queue_setup,
>> +                -ENOTSUP);
>> +
>> +    rte_eth_dev_info_get(port_id, &dev_info);
>> +
>> +    /* Use default specified by driver, if nb_rx_desc is zero */
>> +    if (nb_rx_desc == 0) {
>> +        nb_rx_desc = dev_info.default_rxportconf.ring_size;
>> +        /* If driver default is also zero, fall back on EAL default */
>> +        if (nb_rx_desc == 0)
>> +            nb_rx_desc = RTE_ETH_DEV_FALLBACK_RX_RINGSIZE;
>> +    }
>> +
>> +    if (nb_rx_desc > dev_info.rx_desc_lim.nb_max ||
>> +            nb_rx_desc < dev_info.rx_desc_lim.nb_min ||
>> +            nb_rx_desc % dev_info.rx_desc_lim.nb_align != 0) {
>> +
>> +        RTE_ETHDEV_LOG(ERR,
>> +                   "Invalid value for nb_rx_desc(=%hu), should be: "
>> +                   "<= %hu, >= %hu, and a product of %hu\n",
>> +            nb_rx_desc, dev_info.rx_desc_lim.nb_max,
>> +            dev_info.rx_desc_lim.nb_min,
>> +            dev_info.rx_desc_lim.nb_align);
>> +        return -EINVAL;
>> +    }
>> +
>> +    if (dev->data->dev_started &&
>> +        !(dev_info.dev_capa &
>> +            RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP))
>> +        return -EBUSY;
>> +
>> +    if (dev->data->dev_started &&
>> +        (dev->data->rx_queue_state[rx_queue_id] !=
>> +            RTE_ETH_QUEUE_STATE_STOPPED))
>> +        return -EBUSY;
>> +
>> +    rxq = dev->data->rx_queues;
>> +    if (rxq[rx_queue_id]) {
>> +        RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_release,
>> +                    -ENOTSUP);
>> +        (*dev->dev_ops->rx_queue_release)(rxq[rx_queue_id]);
>> +        rxq[rx_queue_id] = NULL;
>> +    }
>> +
>> +    if (rx_conf == NULL)
>> +        rx_conf = &dev_info.default_rxconf;
>> +
>> +    local_conf = *rx_conf;
>> +
>> +    /*
>> +     * If an offloading has already been enabled in
>> +     * rte_eth_dev_configure(), it has been enabled on all queues,
>> +     * so there is no need to enable it in this queue again.
>> +     * The local_conf.offloads input to underlying PMD only carries
>> +     * those offloadings which are only enabled on this queue and
>> +     * not enabled on all queues.
>> +     */
>> +    local_conf.offloads &= ~dev->data->dev_conf.rxmode.offloads;
>> +
>> +    /*
>> +     * New added offloadings for this queue are those not enabled in
>> +     * rte_eth_dev_configure() and they must be per-queue type.
>> +     * A pure per-port offloading can't be enabled on a queue while
>> +     * disabled on another queue. A pure per-port offloading can't
>> +     * be enabled for any queue as new added one if it hasn't been
>> +     * enabled in rte_eth_dev_configure().
>> +     */
>> +    if ((local_conf.offloads & dev_info.rx_queue_offload_capa) !=
>> +         local_conf.offloads) {
>> +        RTE_ETHDEV_LOG(ERR,
>> +            "Ethdev port_id=%d rx_queue_id=%d, "
>> +            "new added offloads 0x%"PRIx64" must be "
>> +            "within per-queue offload capabilities "
>> +            "0x%"PRIx64" in %s()\n",
>> +            port_id, rx_queue_id, local_conf.offloads,
>> +            dev_info.rx_queue_offload_capa,
>> +            __func__);
>> +        return -EINVAL;
>> +    }
>> +
>> +    ret = (*dev->dev_ops->rx_hairpin_queue_setup)(dev, rx_queue_id,
>> +                              nb_rx_desc, socket_id,
>> +                              &local_conf,
>> +                              hairpin_conf);
>> +
>> +    return eth_err(port_id, ret);
>> +}
>> +
>> +int
>>   rte_eth_tx_queue_setup(uint16_t port_id, uint16_t tx_queue_id,
>>                  uint16_t nb_tx_desc, unsigned int socket_id,
>>                  const struct rte_eth_txconf *tx_conf)
>> @@ -1799,6 +1908,110 @@ struct rte_eth_dev *
>>                  tx_queue_id, nb_tx_desc, socket_id, &local_conf));
>>   }
>>   +int
>> +rte_eth_tx_hairpin_queue_setup(uint16_t port_id, uint16_t tx_queue_id,
>> +                   uint16_t nb_tx_desc, unsigned int socket_id,
>> +                   const struct rte_eth_txconf *tx_conf,
>> +                   const struct rte_eth_hairpin_conf *hairpin_conf)
>> +{
>> +    struct rte_eth_dev *dev;
>> +    struct rte_eth_dev_info dev_info;
>> +    struct rte_eth_txconf local_conf;
>> +    void **txq;
>> +
>> +    RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
>> +
>> +    dev = &rte_eth_devices[port_id];
>> +    if (tx_queue_id >= dev->data->nb_tx_queues) {
>> +        RTE_ETHDEV_LOG(ERR, "Invalid TX queue_id=%u\n", tx_queue_id);
>> +        return -EINVAL;
>> +    }
>> +
>> +    RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_infos_get, -ENOTSUP);
>> +    RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_hairpin_queue_setup,
>> +                -ENOTSUP);
>> +
>> +    rte_eth_dev_info_get(port_id, &dev_info);
>> +
>> +    /* Use default specified by driver, if nb_tx_desc is zero */
>> +    if (nb_tx_desc == 0) {
>> +        nb_tx_desc = dev_info.default_txportconf.ring_size;
>> +        /* If driver default is zero, fall back on EAL default */
>> +        if (nb_tx_desc == 0)
>> +            nb_tx_desc = RTE_ETH_DEV_FALLBACK_TX_RINGSIZE;
>> +    }
>> +    if (nb_tx_desc > dev_info.tx_desc_lim.nb_max ||
>> +        nb_tx_desc < dev_info.tx_desc_lim.nb_min ||
>> +        nb_tx_desc % dev_info.tx_desc_lim.nb_align != 0) {
>> +        RTE_ETHDEV_LOG(ERR,
>> +                   "Invalid value for nb_tx_desc(=%hu), "
>> +                   "should be: <= %hu, >= %hu, and a product of "
>> +                   " %hu\n",
>> +                   nb_tx_desc, dev_info.tx_desc_lim.nb_max,
>> +                   dev_info.tx_desc_lim.nb_min,
>> +                   dev_info.tx_desc_lim.nb_align);
>> +        return -EINVAL;
>> +    }
>> +
>> +    if (dev->data->dev_started &&
>> +        !(dev_info.dev_capa &
>> +          RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP))
>> +        return -EBUSY;
>> +
>> +    if (dev->data->dev_started &&
>> +        (dev->data->tx_queue_state[tx_queue_id] !=
>> +         RTE_ETH_QUEUE_STATE_STOPPED))
>> +        return -EBUSY;
>> +
>> +    txq = dev->data->tx_queues;
>> +    if (txq[tx_queue_id]) {
>> +        RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_queue_release,
>> +                    -ENOTSUP);
>> +        (*dev->dev_ops->tx_queue_release)(txq[tx_queue_id]);
>> +        txq[tx_queue_id] = NULL;
>> +    }
>> +
>> +    if (tx_conf == NULL)
>> +        tx_conf = &dev_info.default_txconf;
>> +
>> +    local_conf = *tx_conf;
>> +
>> +    /*
>> +     * If an offloading has already been enabled in
>> +     * rte_eth_dev_configure(), it has been enabled on all queues,
>> +     * so there is no need to enable it in this queue again.
>> +     * The local_conf.offloads input to underlying PMD only carries
>> +     * those offloadings which are only enabled on this queue and
>> +     * not enabled on all queues.
>> +     */
>> +    local_conf.offloads &= ~dev->data->dev_conf.txmode.offloads;
>> +
>> +    /*
>> +     * New added offloadings for this queue are those not enabled in
>> +     * rte_eth_dev_configure() and they must be per-queue type.
>> +     * A pure per-port offloading can't be enabled on a queue while
>> +     * disabled on another queue. A pure per-port offloading can't
>> +     * be enabled for any queue as new added one if it hasn't been
>> +     * enabled in rte_eth_dev_configure().
>> +     */
>> +    if ((local_conf.offloads & dev_info.tx_queue_offload_capa) !=
>> +         local_conf.offloads) {
>> +        RTE_ETHDEV_LOG(ERR,
>> +                   "Ethdev port_id=%d tx_queue_id=%d, new added "
>> +                   "offloads 0x%"PRIx64" must be within "
>> +                   "per-queue offload capabilities 0x%"PRIx64" "
>> +                   "in %s()\n",
>> +                   port_id, tx_queue_id, local_conf.offloads,
>> +                   dev_info.tx_queue_offload_capa,
>> +                   __func__);
>> +        return -EINVAL;
>> +    }
>> +
>> +    return eth_err(port_id, (*dev->dev_ops->tx_hairpin_queue_setup)
>> +               (dev, tx_queue_id, nb_tx_desc, socket_id, &local_conf,
>> +            hairpin_conf));
>> +}
>> +
>>   void
>>   rte_eth_tx_buffer_drop_callback(struct rte_mbuf **pkts, uint16_t
>> unsent,
>>           void *userdata __rte_unused)
>> diff --git a/lib/librte_ethdev/rte_ethdev.h
>> b/lib/librte_ethdev/rte_ethdev.h
>> index 475dbda..b3b1597 100644
>> --- a/lib/librte_ethdev/rte_ethdev.h
>> +++ b/lib/librte_ethdev/rte_ethdev.h
>> @@ -803,6 +803,30 @@ struct rte_eth_txconf {
>>       uint64_t offloads;
>>   };
>>   +#define RTE_ETH_MAX_HAIRPIN_PEERS 32
>> +
>> +/**
>> + * @warning
>> + * @b EXPERIMENTAL: this API may change, or be removed, without prior
>> notice
>> + *
>> + * A structure used to hold hairpin peer data.
>> + */
>> +struct rte_eth_hairpin_peer {
>> +    uint16_t port; /**< Peer port. */
>> +    uint16_t queue; /**< Peer queue. */
>> +};
>> +
>> +/**
>> + * @warning
>> + * @b EXPERIMENTAL: this API may change, or be removed, without prior
>> notice
>> + *
>> + * A structure used to configure hairpin binding.
>> + */
>> +struct rte_eth_hairpin_conf {
>> +    uint16_t peer_n; /**< The number of peers. */
>> +    struct rte_eth_hairpin_peer peers[RTE_ETH_MAX_HAIRPIN_PEERS];
>> +};
>> +
>>   /**
>>    * A structure contains information about HW descriptor ring
>> limitations.
>>    */
>> @@ -1769,6 +1793,60 @@ int rte_eth_rx_queue_setup(uint16_t port_id,
>> uint16_t rx_queue_id,
>>           struct rte_mempool *mb_pool);
>>     /**
>> + * @warning
>> + * @b EXPERIMENTAL: this API may change, or be removed, without prior
>> notice
>> + *
>> + * Allocate and set up a hairpin receive queue for an Ethernet device.
>> + *
>> + * The function set up the selected queue to be used in hairpin.
>> + *
>> + * @param port_id
>> + *   The port identifier of the Ethernet device.
>> + * @param rx_queue_id
>> + *   The index of the receive queue to set up.
>> + *   The value must be in the range [0, nb_rx_queue - 1] previously
>> supplied
>> + *   to rte_eth_dev_configure().
> 
> Is any Rx queue may be setup as hairpin queue?
> Can it be still used for regular traffic?
> 
>> + * @param nb_rx_desc
>> + *   The number of receive descriptors to allocate for the receive ring.
> 
> Does it still make sense for hairpin queue?
> 
>> + * @param socket_id
>> + *   The *socket_id* argument is the socket identifier in case of NUMA.
>> + *   The value can be *SOCKET_ID_ANY* if there is no NUMA constraint for
>> + *   the DMA memory allocated for the receive descriptors of the ring.
> 
> Is it still required to be provided for hairpin Rx queue?
> 
>> + * @param rx_conf
>> + *   The pointer to the configuration data to be used for the receive
>> queue.
>> + *   NULL value is allowed, in which case default RX configuration
>> + *   will be used.
>> + *   The *rx_conf* structure contains an *rx_thresh* structure with
>> the values
>> + *   of the Prefetch, Host, and Write-Back threshold registers of the
>> receive
>> + *   ring.
>> + *   In addition it contains the hardware offloads features to
>> activate using
>> + *   the DEV_RX_OFFLOAD_* flags.
>> + *   If an offloading set in rx_conf->offloads
>> + *   hasn't been set in the input argument eth_conf->rxmode.offloads
>> + *   to rte_eth_dev_configure(), it is a new added offloading, it
>> must be
>> + *   per-queue type and it is enabled for the queue.
>> + *   No need to repeat any bit in rx_conf->offloads which has already
>> been
>> + *   enabled in rte_eth_dev_configure() at port level. An offloading
>> enabled
>> + *   at port level can't be disabled at queue level.
> 
> Which offloads still make sense in the case of hairpin Rx queue?
> What about threshhods, drop enable?
> 
>> + * @param hairpin_conf
>> + *   The pointer to the hairpin binding configuration.
>> + * @return
>> + *   - 0: Success, receive queue correctly set up.
>> + *   - -EINVAL: The size of network buffers which can be allocated
>> from the
>> + *      memory pool does not fit the various buffer sizes allowed by the
>> + *      device controller.
>> + *   - -ENOMEM: Unable to allocate the receive ring descriptors or to
>> + *      allocate network memory buffers from the memory pool when
>> + *      initializing receive descriptors.
>> + */
>> +__rte_experimental
>> +int rte_eth_rx_hairpin_queue_setup
>> +    (uint16_t port_id, uint16_t rx_queue_id,
>> +     uint16_t nb_rx_desc, unsigned int socket_id,
>> +     const struct rte_eth_rxconf *rx_conf,
>> +     const struct rte_eth_hairpin_conf *hairpin_conf);
>> +
>> +/**
>>    * Allocate and set up a transmit queue for an Ethernet device.
>>    *
>>    * @param port_id
>> @@ -1821,6 +1899,73 @@ int rte_eth_tx_queue_setup(uint16_t port_id,
>> uint16_t tx_queue_id,
>>           const struct rte_eth_txconf *tx_conf);
>>     /**
>> + * @warning
>> + * @b EXPERIMENTAL: this API may change, or be removed, without prior
>> notice
>> + *
>> + * Allocate and set up a transmit hairpin queue for an Ethernet device.
>> + *
>> + * @param port_id
>> + *   The port identifier of the Ethernet device.
>> + * @param tx_queue_id
>> + *   The index of the transmit queue to set up.
>> + *   The value must be in the range [0, nb_tx_queue - 1] previously
>> supplied
>> + *   to rte_eth_dev_configure().
> 
> Is any Tx queue may be setup as hairpin queue?
> 
>> + * @param nb_tx_desc
>> + *   The number of transmit descriptors to allocate for the transmit
>> ring.
> 
> Is it really required for hairpin queue? Are min/max/align limits still
> the same?
> 
>> + * @param socket_id
>> + *   The *socket_id* argument is the socket identifier in case of NUMA.
>> + *   Its value can be *SOCKET_ID_ANY* if there is no NUMA constraint for
>> + *   the DMA memory allocated for the transmit descriptors of the ring.
> 
> Does it still make sense for Tx hairpin queue?
> 
>> + * @param tx_conf
>> + *   The pointer to the configuration data to be used for the
>> transmit queue.
>> + *   NULL value is allowed, in which case default RX configuration
>> + *   will be used.
>> + *   The *tx_conf* structure contains the following data:
>> + *   - The *tx_thresh* structure with the values of the Prefetch,
>> Host, and
>> + *     Write-Back threshold registers of the transmit ring.
>> + *     When setting Write-Back threshold to the value greater then zero,
>> + *     *tx_rs_thresh* value should be explicitly set to one.
>> + *   - The *tx_free_thresh* value indicates the [minimum] number of
>> network
>> + *     buffers that must be pending in the transmit ring to trigger
>> their
>> + *     [implicit] freeing by the driver transmit function.
>> + *   - The *tx_rs_thresh* value indicates the [minimum] number of
>> transmit
>> + *     descriptors that must be pending in the transmit ring before
>> setting the
>> + *     RS bit on a descriptor by the driver transmit function.
>> + *     The *tx_rs_thresh* value should be less or equal then
>> + *     *tx_free_thresh* value, and both of them should be less then
>> + *     *nb_tx_desc* - 3.
> 
> I'm not sure that everything above makes sense for hairpin Tx queue.
> 
>> + *   - The *txq_flags* member contains flags to pass to the TX queue
>> setup
>> + *     function to configure the behavior of the TX queue. This
>> should be set
>> + *     to 0 if no special configuration is required.
>> + *     This API is obsolete and will be deprecated. Applications
>> + *     should set it to ETH_TXQ_FLAGS_IGNORE and use
>> + *     the offloads field below.
> 
> There is no txq_flags for a long time already. So, I'm wondering when it
> was
> copies from rte_eth_tx_queue_setup().
> 
>> + *   - The *offloads* member contains Tx offloads to be enabled.
>> + *     If an offloading set in tx_conf->offloads
>> + *     hasn't been set in the input argument eth_conf->txmode.offloads
>> + *     to rte_eth_dev_configure(), it is a new added offloading, it
>> must be
>> + *     per-queue type and it is enabled for the queue.
>> + *     No need to repeat any bit in tx_conf->offloads which has
>> already been
>> + *     enabled in rte_eth_dev_configure() at port level. An
>> offloading enabled
>> + *     at port level can't be disabled at queue level.
> 
> Which offloads do really make sense and valid to use for hairpin Tx queues?
> Do we need separate caps for hairpin offloads?
> 
>> + *
>> + *     Note that setting *tx_free_thresh* or *tx_rs_thresh* value to
>> 0 forces
>> + *     the transmit function to use default values.
>> + * @param hairpin_conf
>> + *   The hairpin binding configuration.
>> + *
>> + * @return
>> + *   - 0: Success, the transmit queue is correctly set up.
>> + *   - -ENOMEM: Unable to allocate the transmit ring descriptors.
>> + */
>> +__rte_experimental
>> +int rte_eth_tx_hairpin_queue_setup
>> +    (uint16_t port_id, uint16_t tx_queue_id,
>> +     uint16_t nb_tx_desc, unsigned int socket_id,
>> +     const struct rte_eth_txconf *tx_conf,
>> +     const struct rte_eth_hairpin_conf *hairpin_conf);
>> +
>> +/**
>>    * Return the NUMA socket to which an Ethernet device is connected
>>    *
>>    * @param port_id
>>
> 
> [snip]
> 

^ permalink raw reply	[flat|nested] 186+ messages in thread

* [dpdk-dev] [PATCH v2 00/14] add hairpin feature
  2019-09-26  6:28 [dpdk-dev] [PATCH 00/13] add hairpin feature Ori Kam
                   ` (13 preceding siblings ...)
  2019-09-26 12:32 ` [dpdk-dev] [PATCH 00/13] " Andrew Rybchenko
@ 2019-10-04 19:54 ` Ori Kam
  2019-10-04 19:54   ` [dpdk-dev] [PATCH v2 01/14] ethdev: add support for hairpin queue Ori Kam
                     ` (13 more replies)
  2019-10-15  9:04 ` [dpdk-dev] [PATCH v3 00/14] " Ori Kam
                   ` (4 subsequent siblings)
  19 siblings, 14 replies; 186+ messages in thread
From: Ori Kam @ 2019-10-04 19:54 UTC (permalink / raw)
  Cc: dev, orika, jingjing.wu, stephen, wenzhuo.lu, bernard.iremonger,
	thomas, ferruh.yigit, arybchenko, viacheslavo

This patch set implements the hairpin feature.
The hairpin feature was introduced in RFC[1]

The hairpin feature (different name can be forward) acts as "bump on the wire",
meaning that a packet that is received from the wire can be modified using
offloaded action and then sent back to the wire without application intervention
which save CPU cycles.

The hairpin is the inverse function of loopback in which application
sends a packet then it is received again by the
application without being sent to the wire.

The hairpin can be used by a number of different NVF, for example load
balancer, gateway and so on.

As can be seen from the hairpin description, hairpin is basically RX queue
connected to TX queue.

During the design phase I considered two ways to implement this feature:
the first one is adding a new rte_flow action, and the second one is
creating a special kind of queue.

The advantages of using the queue approach:
1. More control for the application, e.g. over the queue depth (the
memory size that should be used).
2. Enables QoS. QoS is normally a parameter of a queue, so with this
approach it is easy to integrate with such systems.
3. Native integration with the rte_flow API. Simply setting the target
queue/RSS to a hairpin queue routes the traffic to the hairpin queue.
4. Enables queue offloading.

Each hairpin Rxq can be connected to one Txq or a number of Txqs, which
can belong to different ports if the PMD supports it. The same goes the
other way: each hairpin Txq can be connected to one or more Rxqs.
This is the reason that both the Txq setup and the Rxq setup take the
hairpin configuration structure.

From the PMD perspective the number of Rxqs/Txqs is the total of standard
queues + hairpin queues.

To configure a hairpin queue the user should call
rte_eth_rx_hairpin_queue_setup / rte_eth_tx_hairpin_queue_setup instead
of the normal queue setup functions.

The hairpin queues are not part of the normal RSS functions.

To use the queues the user simply creates a flow that points to RSS/queue
actions targeting hairpin queues; see the sketch below.
The reasons for adding 2 new functions for hairpin queue setup are:
1. avoiding an API break.
2. avoiding extra and unused parameters.
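
A minimal usage sketch (illustrative only -- the port and queue numbers
are hypothetical and error handling is omitted):

/* Loop port 0 Rx queue 1 back through Tx queue 1 of the same port. */
struct rte_eth_hairpin_conf hairpin_conf = {
	.peer_n = 1,
	.peers[0] = { .port = 0, .queue = 1 },
};

/* The hairpin setup calls replace the normal setups for this index. */
rte_eth_rx_hairpin_queue_setup(0, 1, 0 /* 0 = PMD default depth */,
			       &hairpin_conf);
rte_eth_tx_hairpin_queue_setup(0, 1, 0, &hairpin_conf);

/* Traffic is steered to the hairpin Rx queue by a regular QUEUE action. */
struct rte_flow_attr attr = { .ingress = 1 };
struct rte_flow_item pattern[] = {
	{ .type = RTE_FLOW_ITEM_TYPE_ETH },
	{ .type = RTE_FLOW_ITEM_TYPE_END },
};
struct rte_flow_action_queue queue = { .index = 1 };
struct rte_flow_action actions[] = {
	{ .type = RTE_FLOW_ACTION_TYPE_QUEUE, .conf = &queue },
	{ .type = RTE_FLOW_ACTION_TYPE_END },
};
struct rte_flow_error error;

rte_flow_create(0, &attr, pattern, actions, &error);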


This series must be applied after series[2]

[1] https://inbox.dpdk.org/dev/1565703468-55617-1-git-send-email-orika@mellanox.com/
[2] https://inbox.dpdk.org/dev/1569398015-6027-1-git-send-email-viacheslavo@mellanox.com/

Cc: wenzhuo.lu@intel.com
Cc: bernard.iremonger@intel.com
Cc: thomas@monjalon.net
Cc: ferruh.yigit@intel.com
Cc: arybchenko@solarflare.com
Cc: viacheslavo@mellanox.com

------
V2:
 - update according to comments from ML.

Ori Kam (14):
  ethdev: add support for hairpin queue
  net/mlx5: query hca hairpin capabilities
  net/mlx5: support Rx hairpin queues
  net/mlx5: prepare txq to work with different types
  net/mlx5: support Tx hairpin queues
  net/mlx5: add get hairpin capabilities
  app/testpmd: add hairpin support
  net/mlx5: add hairpin binding function
  net/mlx5: add support for hairpin hrxq
  net/mlx5: add internal tag item and action
  net/mlx5: add id generation function
  net/mlx5: add default flows for hairpin
  net/mlx5: split hairpin flows
  doc: add hairpin feature

 app/test-pmd/parameters.c                |  12 +
 app/test-pmd/testpmd.c                   |  62 ++++-
 app/test-pmd/testpmd.h                   |   1 +
 doc/guides/rel_notes/release_19_11.rst   |   5 +
 drivers/net/mlx5/mlx5.c                  | 162 ++++++++++++-
 drivers/net/mlx5/mlx5.h                  |  69 +++++-
 drivers/net/mlx5/mlx5_devx_cmds.c        | 194 +++++++++++++++
 drivers/net/mlx5/mlx5_ethdev.c           | 125 ++++++++--
 drivers/net/mlx5/mlx5_flow.c             | 393 ++++++++++++++++++++++++++++++-
 drivers/net/mlx5/mlx5_flow.h             |  73 +++++-
 drivers/net/mlx5/mlx5_flow_dv.c          | 231 +++++++++++++++++-
 drivers/net/mlx5/mlx5_flow_verbs.c       |  11 +-
 drivers/net/mlx5/mlx5_prm.h              | 127 +++++++++-
 drivers/net/mlx5/mlx5_rss.c              |   1 +
 drivers/net/mlx5/mlx5_rxq.c              | 318 ++++++++++++++++++++++---
 drivers/net/mlx5/mlx5_rxtx.c             |   2 +-
 drivers/net/mlx5/mlx5_rxtx.h             |  68 +++++-
 drivers/net/mlx5/mlx5_trigger.c          | 140 ++++++++++-
 drivers/net/mlx5/mlx5_txq.c              | 294 +++++++++++++++++++----
 lib/librte_ethdev/rte_ethdev.c           | 214 ++++++++++++++++-
 lib/librte_ethdev/rte_ethdev.h           | 126 ++++++++++
 lib/librte_ethdev/rte_ethdev_core.h      |  27 ++-
 lib/librte_ethdev/rte_ethdev_version.map |   5 +
 23 files changed, 2503 insertions(+), 157 deletions(-)

-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 186+ messages in thread

* [dpdk-dev] [PATCH v2 01/14] ethdev: add support for hairpin queue
  2019-10-04 19:54 ` [dpdk-dev] [PATCH v2 00/14] " Ori Kam
@ 2019-10-04 19:54   ` Ori Kam
  2019-10-08 16:11     ` Andrew Rybchenko
  2019-10-04 19:54   ` [dpdk-dev] [PATCH v2 02/14] net/mlx5: query hca hairpin capabilities Ori Kam
                     ` (12 subsequent siblings)
  13 siblings, 1 reply; 186+ messages in thread
From: Ori Kam @ 2019-10-04 19:54 UTC (permalink / raw)
  To: Thomas Monjalon, Ferruh Yigit, Andrew Rybchenko
  Cc: dev, orika, jingjing.wu, stephen

This commit introduces the hairpin queue type.

A hairpin queue is built from an Rx queue bound to a Tx queue.
It is used to offload traffic coming from the wire and redirect it back
to the wire.

There are 3 new functions:
- rte_eth_dev_hairpin_capability_get
- rte_eth_rx_hairpin_queue_setup
- rte_eth_tx_hairpin_queue_setup

In order to use the queue, an rte_flow must be created with a
queue / RSS action that targets one or more of the Rx queues.
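
For instance, an application might gate its hairpin configuration on the
new capability query (a hedged sketch, not part of the patch itself):

struct rte_eth_hairpin_cap cap;

if (rte_eth_dev_hairpin_capability_get(port_id, &cap) != 0)
	return; /* -ENOTSUP: this PMD does not implement hairpin. */
/*
 * cap.max_nb_desc bounds nb_rx_desc/nb_tx_desc in the setup calls;
 * passing 0 there makes ethdev substitute this driver default.
 */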

Signed-off-by: Ori Kam <orika@mellanox.com>
---
V2:
 - update according to ML comments.

---
 lib/librte_ethdev/rte_ethdev.c           | 214 ++++++++++++++++++++++++++++++-
 lib/librte_ethdev/rte_ethdev.h           | 126 ++++++++++++++++++
 lib/librte_ethdev/rte_ethdev_core.h      |  27 +++-
 lib/librte_ethdev/rte_ethdev_version.map |   5 +
 4 files changed, 368 insertions(+), 4 deletions(-)

diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
index af82360..ee8af42 100644
--- a/lib/librte_ethdev/rte_ethdev.c
+++ b/lib/librte_ethdev/rte_ethdev.c
@@ -1752,12 +1752,102 @@ struct rte_eth_dev *
 		if (!dev->data->min_rx_buf_size ||
 		    dev->data->min_rx_buf_size > mbp_buf_size)
 			dev->data->min_rx_buf_size = mbp_buf_size;
+		if (dev->data->rx_queue_state[rx_queue_id] ==
+		    RTE_ETH_QUEUE_STATE_HAIRPIN)
+			dev->data->rx_queue_state[rx_queue_id] =
+				RTE_ETH_QUEUE_STATE_STOPPED;
 	}
 
 	return eth_err(port_id, ret);
 }
 
 int
+rte_eth_rx_hairpin_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
+			       uint16_t nb_rx_desc,
+			       const struct rte_eth_hairpin_conf *conf)
+{
+	int ret;
+	struct rte_eth_dev *dev;
+	struct rte_eth_hairpin_cap cap;
+	struct rte_eth_dev_info dev_info;
+	void **rxq;
+	int i;
+	int count = 0;
+
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
+
+	dev = &rte_eth_devices[port_id];
+	if (rx_queue_id >= dev->data->nb_rx_queues) {
+		RTE_ETHDEV_LOG(ERR, "Invalid RX queue_id=%u\n", rx_queue_id);
+		return -EINVAL;
+	}
+	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_infos_get, -ENOTSUP);
+	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->hairpin_cap_get,
+				-ENOTSUP);
+	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_hairpin_queue_setup,
+				-ENOTSUP);
+	rte_eth_dev_hairpin_capability_get(port_id, &cap);
+	/* Use default specified by driver, if nb_rx_desc is zero */
+	if (nb_rx_desc == 0)
+		nb_rx_desc = cap.max_nb_desc;
+	if (nb_rx_desc > cap.max_nb_desc) {
+		RTE_ETHDEV_LOG(ERR,
+			       "Invalid value for nb_rx_desc(=%hu), should be: "
+			       "<= %hu", nb_rx_desc, cap.max_nb_desc);
+		return -EINVAL;
+	}
+	ret = rte_eth_dev_info_get(port_id, &dev_info);
+	if (ret != 0)
+		return ret;
+	if (dev->data->dev_started &&
+		!(dev_info.dev_capa &
+		  RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP))
+		return -EBUSY;
+	if (dev->data->dev_started &&
+		(dev->data->rx_queue_state[rx_queue_id] !=
+		 RTE_ETH_QUEUE_STATE_STOPPED))
+		return -EBUSY;
+	if (conf->peer_n > cap.max_rx_2_tx) {
+		RTE_ETHDEV_LOG(ERR,
+			       "Invalid value for number of peers(=%hu), "
+			       "should be: <= %hu", conf->peer_n,
+			       cap.max_rx_2_tx);
+		return -EINVAL;
+	}
+	if (conf->peer_n == 0) {
+		RTE_ETHDEV_LOG(ERR,
+			       "Invalid value for number of peers(=%hu), "
+			       "should be: > 0", conf->peer_n);
+		return -EINVAL;
+	}
+	if (cap.max_n_queues != -1) {
+		for (i = 0; i < dev->data->nb_rx_queues; i++) {
+			if (dev->data->rx_queue_state[i] ==
+			    RTE_ETH_QUEUE_STATE_HAIRPIN)
+				count++;
+		}
+		if (count > cap.max_n_queues) {
+			RTE_ETHDEV_LOG(ERR,
+				       "Too many Rx hairpin queues %d", count);
+			return -EINVAL;
+		}
+	}
+	rxq = dev->data->rx_queues;
+	if (rxq[rx_queue_id]) {
+		RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_release,
+					-ENOTSUP);
+		(*dev->dev_ops->rx_queue_release)(rxq[rx_queue_id]);
+		rxq[rx_queue_id] = NULL;
+	}
+	ret = (*dev->dev_ops->rx_hairpin_queue_setup)(dev, rx_queue_id,
+						      nb_rx_desc, conf);
+	if (!ret)
+		dev->data->rx_queue_state[rx_queue_id] =
+			RTE_ETH_QUEUE_STATE_HAIRPIN;
+	return eth_err(port_id, ret);
+}
+
+int
 rte_eth_tx_queue_setup(uint16_t port_id, uint16_t tx_queue_id,
 		       uint16_t nb_tx_desc, unsigned int socket_id,
 		       const struct rte_eth_txconf *tx_conf)
@@ -1851,9 +1941,97 @@ struct rte_eth_dev *
 			__func__);
 		return -EINVAL;
 	}
+	ret = (*dev->dev_ops->tx_queue_setup)(dev, tx_queue_id, nb_tx_desc,
+					      socket_id, &local_conf);
+	if (!ret)
+		if (dev->data->tx_queue_state[tx_queue_id] ==
+		    RTE_ETH_QUEUE_STATE_HAIRPIN)
+			dev->data->tx_queue_state[tx_queue_id] =
+				RTE_ETH_QUEUE_STATE_STOPPED;
+	return eth_err(port_id, ret);
+}
+
+int
+rte_eth_tx_hairpin_queue_setup(uint16_t port_id, uint16_t tx_queue_id,
+			       uint16_t nb_tx_desc,
+			       const struct rte_eth_hairpin_conf *conf)
+{
+	struct rte_eth_dev *dev;
+	struct rte_eth_hairpin_cap cap;
+	struct rte_eth_dev_info dev_info;
+	void **txq;
+	int i;
+	int count = 0;
+	int ret;
 
-	return eth_err(port_id, (*dev->dev_ops->tx_queue_setup)(dev,
-		       tx_queue_id, nb_tx_desc, socket_id, &local_conf));
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
+	dev = &rte_eth_devices[port_id];
+	if (tx_queue_id >= dev->data->nb_tx_queues) {
+		RTE_ETHDEV_LOG(ERR, "Invalid TX queue_id=%u\n", tx_queue_id);
+		return -EINVAL;
+	}
+	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_infos_get, -ENOTSUP);
+	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->hairpin_cap_get,
+				-ENOTSUP);
+	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_hairpin_queue_setup,
+				-ENOTSUP);
+	rte_eth_dev_info_get(port_id, &dev_info);
+	rte_eth_dev_hairpin_capability_get(port_id, &cap);
+	/* Use default specified by driver, if nb_tx_desc is zero */
+	if (nb_tx_desc == 0)
+		nb_tx_desc = cap.max_nb_desc;
+	if (nb_tx_desc > cap.max_nb_desc) {
+		RTE_ETHDEV_LOG(ERR,
+			       "Invalid value for nb_tx_desc(=%hu), should be: "
+			       "<= %hu", nb_tx_desc, cap.max_nb_desc);
+		return -EINVAL;
+	}
+	if (conf->peer_n > cap.max_tx_2_rx) {
+		RTE_ETHDEV_LOG(ERR,
+			       "Invalid value for number of peers(=%hu), "
+			       "should be: <= %hu", conf->peer_n,
+			       cap.max_tx_2_rx);
+		return -EINVAL;
+	}
+	if (conf->peer_n == 0) {
+		RTE_ETHDEV_LOG(ERR,
+			       "Invalid value for number of peers(=%hu), "
+			       "should be: > 0", conf->peer_n);
+		return -EINVAL;
+	}
+	if (cap.max_n_queues != -1) {
+		for (i = 0; i < dev->data->nb_tx_queues; i++) {
+			if (dev->data->tx_queue_state[i] ==
+			    RTE_ETH_QUEUE_STATE_HAIRPIN)
+				count++;
+		}
+		if (count > cap.max_n_queues) {
+			RTE_ETHDEV_LOG(ERR,
+				       "Too many Tx hairpin queues %d", count);
+			return -EINVAL;
+		}
+	}
+	if (dev->data->dev_started &&
+		!(dev_info.dev_capa &
+		  RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP))
+		return -EBUSY;
+	if (dev->data->dev_started &&
+		(dev->data->tx_queue_state[tx_queue_id] !=
+		 RTE_ETH_QUEUE_STATE_STOPPED))
+		return -EBUSY;
+	txq = dev->data->tx_queues;
+	if (txq[tx_queue_id]) {
+		RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_queue_release,
+					-ENOTSUP);
+		(*dev->dev_ops->tx_queue_release)(txq[tx_queue_id]);
+		txq[tx_queue_id] = NULL;
+	}
+	ret = (*dev->dev_ops->tx_hairpin_queue_setup)
+		(dev, tx_queue_id, nb_tx_desc, conf);
+	if (!ret)
+		dev->data->tx_queue_state[tx_queue_id] =
+			RTE_ETH_QUEUE_STATE_HAIRPIN;
+	return eth_err(port_id, ret);
 }
 
 void
@@ -3981,12 +4159,20 @@ int rte_eth_set_queue_rate_limit(uint16_t port_id, uint16_t queue_idx,
 	rte_errno = ENOTSUP;
 	return NULL;
 #endif
+	struct rte_eth_dev *dev;
+
 	/* check input parameters */
 	if (!rte_eth_dev_is_valid_port(port_id) || fn == NULL ||
 		    queue_id >= rte_eth_devices[port_id].data->nb_rx_queues) {
 		rte_errno = EINVAL;
 		return NULL;
 	}
+	dev = &rte_eth_devices[port_id];
+	if (dev->data->rx_queue_state[queue_id] ==
+	    RTE_ETH_QUEUE_STATE_HAIRPIN) {
+		rte_errno = EINVAL;
+		return NULL;
+	}
 	struct rte_eth_rxtx_callback *cb = rte_zmalloc(NULL, sizeof(*cb), 0);
 
 	if (cb == NULL) {
@@ -4058,6 +4244,8 @@ int rte_eth_set_queue_rate_limit(uint16_t port_id, uint16_t queue_idx,
 	rte_errno = ENOTSUP;
 	return NULL;
 #endif
+	struct rte_eth_dev *dev;
+
 	/* check input parameters */
 	if (!rte_eth_dev_is_valid_port(port_id) || fn == NULL ||
 		    queue_id >= rte_eth_devices[port_id].data->nb_tx_queues) {
@@ -4065,6 +4253,13 @@ int rte_eth_set_queue_rate_limit(uint16_t port_id, uint16_t queue_idx,
 		return NULL;
 	}
 
+	dev = &rte_eth_devices[port_id];
+	if (dev->data->tx_queue_state[queue_id] ==
+	    RTE_ETH_QUEUE_STATE_HAIRPIN) {
+		rte_errno = EINVAL;
+		return NULL;
+	}
+
 	struct rte_eth_rxtx_callback *cb = rte_zmalloc(NULL, sizeof(*cb), 0);
 
 	if (cb == NULL) {
@@ -4510,6 +4705,21 @@ int rte_eth_set_queue_rate_limit(uint16_t port_id, uint16_t queue_idx,
 }
 
 int
+rte_eth_dev_hairpin_capability_get(uint16_t port_id,
+				   struct rte_eth_hairpin_cap *cap)
+{
+	struct rte_eth_dev *dev;
+
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
+
+	dev = &rte_eth_devices[port_id];
+	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->hairpin_cap_get,
+				-ENOTSUP);
+	return eth_err(port_id, (*dev->dev_ops->hairpin_cap_get)
+		       (dev, cap));
+}
+
+int
 rte_eth_dev_pool_ops_supported(uint16_t port_id, const char *pool)
 {
 	struct rte_eth_dev *dev;
diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
index d937fb4..29dcfea 100644
--- a/lib/librte_ethdev/rte_ethdev.h
+++ b/lib/librte_ethdev/rte_ethdev.h
@@ -804,6 +804,46 @@ struct rte_eth_txconf {
 };
 
 /**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * A structure used to return the hairpin capabilities that are supported.
+ */
+struct rte_eth_hairpin_cap {
+	int16_t max_n_queues;
+	/**< The max number of hairpin queues. -1 means no limit. */
+	int16_t max_rx_2_tx;
+	/**< Max number of Rx queues to be connected to one Tx queue. */
+	int16_t max_tx_2_rx;
+	/**< Max number of Tx queues to be connected to one Rx queue. */
+	uint16_t max_nb_desc; /**< The max num of descriptors. */
+};
+
+#define RTE_ETH_MAX_HAIRPIN_PEERS 32
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * A structure used to hold hairpin peer data.
+ */
+struct rte_eth_hairpin_peer {
+	uint16_t port; /**< Peer port. */
+	uint16_t queue; /**< Peer queue. */
+};
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * A structure used to configure hairpin binding.
+ */
+struct rte_eth_hairpin_conf {
+	uint16_t peer_n; /**< The number of peers. */
+	struct rte_eth_hairpin_peer peers[RTE_ETH_MAX_HAIRPIN_PEERS];
+};
+
+/**
  * A structure contains information about HW descriptor ring limitations.
  */
 struct rte_eth_desc_lim {
@@ -1080,6 +1120,8 @@ struct rte_eth_conf {
 /**< Device supports Rx queue setup after device started*/
 #define RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP 0x00000002
 /**< Device supports Tx queue setup after device started*/
+#define RTE_ETH_DEV_CAPA_HAIRPIN_SUPPORT 0x00000004
+/**< Device supports hairpin queues. */
 
 /*
  * If new Tx offload capabilities are defined, they also must be
@@ -1277,6 +1319,7 @@ struct rte_eth_dcb_info {
  */
 #define RTE_ETH_QUEUE_STATE_STOPPED 0
 #define RTE_ETH_QUEUE_STATE_STARTED 1
+#define RTE_ETH_QUEUE_STATE_HAIRPIN 2
 
 #define RTE_ETH_ALL RTE_MAX_ETHPORTS
 
@@ -1771,6 +1814,34 @@ int rte_eth_rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
 		struct rte_mempool *mb_pool);
 
 /**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Allocate and set up a hairpin receive queue for an Ethernet device.
+ *
+ * The function sets up the selected queue to be used in hairpin mode.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param rx_queue_id
+ *   The index of the receive queue to set up.
+ *   The value must be in the range [0, nb_rx_queue - 1] previously supplied
+ *   to rte_eth_dev_configure().
+ * @param nb_rx_desc
+ *   The number of receive descriptors to allocate for the receive ring.
+ * @param conf
+ *   The pointer to the hairpin configuration.
+ * @return
+ *   - 0: Success, receive queue correctly set up.
+ *   - -EINVAL: The selected queue can't be configured for hairpin.
+ *   - -ENOMEM: Unable to allocate the resources required for the queue.
+ */
+__rte_experimental
+int rte_eth_rx_hairpin_queue_setup
+	(uint16_t port_id, uint16_t rx_queue_id, uint16_t nb_rx_desc,
+	 const struct rte_eth_hairpin_conf *conf);
+
+/**
  * Allocate and set up a transmit queue for an Ethernet device.
  *
  * @param port_id
@@ -1823,6 +1894,33 @@ int rte_eth_tx_queue_setup(uint16_t port_id, uint16_t tx_queue_id,
 		const struct rte_eth_txconf *tx_conf);
 
 /**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Allocate and set up a transmit hairpin queue for an Ethernet device.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param tx_queue_id
+ *   The index of the transmit queue to set up.
+ *   The value must be in the range [0, nb_tx_queue - 1] previously supplied
+ *   to rte_eth_dev_configure().
+ * @param nb_tx_desc
+ *   The number of transmit descriptors to allocate for the transmit ring.
+ * @param conf
+ *   The hairpin configuration.
+ *
+ * @return
+ *   - 0: Success, transmit queue correctly set up.
+ *   - -EINVAL: The selected queue can't be configured for hairpin.
+ *   - -ENOMEM: Unable to allocate the resources required for the queue.
+ */
+__rte_experimental
+int rte_eth_tx_hairpin_queue_setup
+	(uint16_t port_id, uint16_t tx_queue_id, uint16_t nb_tx_desc,
+	 const struct rte_eth_hairpin_conf *conf);
+
+/**
  * Return the NUMA socket to which an Ethernet device is connected
  *
  * @param port_id
@@ -4037,6 +4135,22 @@ int rte_eth_dev_adjust_nb_rx_tx_desc(uint16_t port_id,
 void *
 rte_eth_dev_get_sec_ctx(uint16_t port_id);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Query the device hairpin capabilities.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param cap
+ *   Pointer to a structure that will hold the hairpin capabilities.
+ * @return
+ *   - 0 on success, -ENOTSUP if the device doesn't support hairpin.
+ */
+__rte_experimental
+int rte_eth_dev_hairpin_capability_get(uint16_t port_id,
+				       struct rte_eth_hairpin_cap *cap);
 
 #include <rte_ethdev_core.h>
 
@@ -4137,6 +4251,12 @@ int rte_eth_dev_adjust_nb_rx_tx_desc(uint16_t port_id,
 		RTE_ETHDEV_LOG(ERR, "Invalid RX queue_id=%u\n", queue_id);
 		return 0;
 	}
+	if (dev->data->rx_queue_state[queue_id] ==
+	    RTE_ETH_QUEUE_STATE_HAIRPIN) {
+		RTE_ETHDEV_LOG(ERR, "RX queue_id=%u is hairpin queue\n",
+			       queue_id);
+		return 0;
+	}
 #endif
 	nb_rx = (*dev->rx_pkt_burst)(dev->data->rx_queues[queue_id],
 				     rx_pkts, nb_pkts);
@@ -4403,6 +4523,12 @@ static inline int rte_eth_tx_descriptor_status(uint16_t port_id,
 		RTE_ETHDEV_LOG(ERR, "Invalid TX queue_id=%u\n", queue_id);
 		return 0;
 	}
+	if (dev->data->tx_queue_state[queue_id] ==
+	    RTE_ETH_QUEUE_STATE_HAIRPIN) {
+		RTE_ETHDEV_LOG(ERR, "TX queue_id=%u is hairpin queue\n",
+			       queue_id);
+		return 0;
+	}
 #endif
 
 #ifdef RTE_ETHDEV_RXTX_CALLBACKS
diff --git a/lib/librte_ethdev/rte_ethdev_core.h b/lib/librte_ethdev/rte_ethdev_core.h
index dcb5ae6..ef46e71 100644
--- a/lib/librte_ethdev/rte_ethdev_core.h
+++ b/lib/librte_ethdev/rte_ethdev_core.h
@@ -250,6 +250,12 @@ typedef int (*eth_rx_queue_setup_t)(struct rte_eth_dev *dev,
 				    struct rte_mempool *mb_pool);
 /**< @internal Set up a receive queue of an Ethernet device. */
 
+typedef int (*eth_rx_hairpin_queue_setup_t)
+	(struct rte_eth_dev *dev, uint16_t rx_queue_id,
+	 uint16_t nb_rx_desc,
+	 const struct rte_eth_hairpin_conf *conf);
+/**< @internal Set up a receive hairpin queue of an Ethernet device. */
+
 typedef int (*eth_tx_queue_setup_t)(struct rte_eth_dev *dev,
 				    uint16_t tx_queue_id,
 				    uint16_t nb_tx_desc,
@@ -257,6 +263,12 @@ typedef int (*eth_tx_queue_setup_t)(struct rte_eth_dev *dev,
 				    const struct rte_eth_txconf *tx_conf);
 /**< @internal Setup a transmit queue of an Ethernet device. */
 
+typedef int (*eth_tx_hairpin_queue_setup_t)
+	(struct rte_eth_dev *dev, uint16_t tx_queue_id,
+	 uint16_t nb_tx_desc,
+	 const struct rte_eth_hairpin_conf *hairpin_conf);
+/**< @internal Setup a transmit hairpin queue of an Ethernet device. */
+
 typedef int (*eth_rx_enable_intr_t)(struct rte_eth_dev *dev,
 				    uint16_t rx_queue_id);
 /**< @internal Enable interrupt of a receive queue of an Ethernet device. */
@@ -505,6 +517,10 @@ typedef int (*eth_pool_ops_supported_t)(struct rte_eth_dev *dev,
 						const char *pool);
 /**< @internal Test if a port supports specific mempool ops */
 
+typedef int (*eth_hairpin_cap_get_t)(struct rte_eth_dev *dev,
+				     struct rte_eth_hairpin_cap *cap);
+/**< @internal get the hairpin capabilities. */
+
 /**
  * @internal A structure containing the functions exported by an Ethernet driver.
  */
@@ -557,6 +573,8 @@ struct eth_dev_ops {
 	eth_queue_start_t          tx_queue_start;/**< Start TX for a queue. */
 	eth_queue_stop_t           tx_queue_stop; /**< Stop TX for a queue. */
 	eth_rx_queue_setup_t       rx_queue_setup;/**< Set up device RX queue. */
+	eth_rx_hairpin_queue_setup_t rx_hairpin_queue_setup;
+	/**< Set up device RX hairpin queue. */
 	eth_queue_release_t        rx_queue_release; /**< Release RX queue. */
 	eth_rx_queue_count_t       rx_queue_count;
 	/**< Get the number of used RX descriptors. */
@@ -568,6 +586,8 @@ struct eth_dev_ops {
 	eth_rx_enable_intr_t       rx_queue_intr_enable;  /**< Enable Rx queue interrupt. */
 	eth_rx_disable_intr_t      rx_queue_intr_disable; /**< Disable Rx queue interrupt. */
 	eth_tx_queue_setup_t       tx_queue_setup;/**< Set up device TX queue. */
+	eth_tx_hairpin_queue_setup_t tx_hairpin_queue_setup;
+	/**< Set up device TX hairpin queue. */
 	eth_queue_release_t        tx_queue_release; /**< Release TX queue. */
 	eth_tx_done_cleanup_t      tx_done_cleanup;/**< Free tx ring mbufs */
 
@@ -639,6 +659,9 @@ struct eth_dev_ops {
 
 	eth_pool_ops_supported_t pool_ops_supported;
 	/**< Test if a port supports specific mempool ops */
+
+	eth_hairpin_cap_get_t hairpin_cap_get;
+	/**< Returns the hairpin capabilities. */
 };
 
 /**
@@ -746,9 +769,9 @@ struct rte_eth_dev_data {
 		dev_started : 1,   /**< Device state: STARTED(1) / STOPPED(0). */
 		lro         : 1;   /**< RX LRO is ON(1) / OFF(0) */
 	uint8_t rx_queue_state[RTE_MAX_QUEUES_PER_PORT];
-			/**< Queues state: STARTED(1) / STOPPED(0). */
+		/**< Queues state: HAIRPIN(2) / STARTED(1) / STOPPED(0). */
 	uint8_t tx_queue_state[RTE_MAX_QUEUES_PER_PORT];
-			/**< Queues state: STARTED(1) / STOPPED(0). */
+		/**< Queues state: HAIRPIN(2) / STARTED(1) / STOPPED(0). */
 	uint32_t dev_flags;             /**< Capabilities. */
 	enum rte_kernel_driver kdrv;    /**< Kernel driver passthrough. */
 	int numa_node;                  /**< NUMA node connection. */
diff --git a/lib/librte_ethdev/rte_ethdev_version.map b/lib/librte_ethdev/rte_ethdev_version.map
index 6df42a4..77b0a86 100644
--- a/lib/librte_ethdev/rte_ethdev_version.map
+++ b/lib/librte_ethdev/rte_ethdev_version.map
@@ -283,4 +283,9 @@ EXPERIMENTAL {
 
 	# added in 19.08
 	rte_eth_read_clock;
+
+	# added in 19.11
+	rte_eth_rx_hairpin_queue_setup;
+	rte_eth_tx_hairpin_queue_setup;
+	rte_eth_dev_hairpin_capability_get;
 };
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 186+ messages in thread

* [dpdk-dev] [PATCH v2 02/14] net/mlx5: query hca hairpin capabilities
  2019-10-04 19:54 ` [dpdk-dev] [PATCH v2 00/14] " Ori Kam
  2019-10-04 19:54   ` [dpdk-dev] [PATCH v2 01/14] ethdev: add support for hairpin queue Ori Kam
@ 2019-10-04 19:54   ` Ori Kam
  2019-10-04 19:54   ` [dpdk-dev] [PATCH v2 03/14] net/mlx5: support Rx hairpin queues Ori Kam
                     ` (11 subsequent siblings)
  13 siblings, 0 replies; 186+ messages in thread
From: Ori Kam @ 2019-10-04 19:54 UTC (permalink / raw)
  To: Matan Azrad, Shahaf Shuler, Viacheslav Ovsiienko
  Cc: dev, orika, jingjing.wu, stephen

This commit queries and stores the hairpin capabilities from the device.

Those capabilities will be used when creating the hairpin queue.
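
The stored fields are log2 values taken directly from the HCA
capabilities. As a hedged sketch (not the actual code of patch 06, which
adds the real callback), they could back the ethdev capability query
roughly like this:

static int
mlx5_hairpin_cap_get(struct rte_eth_dev *dev, struct rte_eth_hairpin_cap *cap)
{
	struct mlx5_priv *priv = dev->data->dev_private;
	struct mlx5_hca_attr *hca = &priv->config.hca_attr;

	if (!hca->hairpin)
		return -ENOTSUP;
	/* Expand the log2 limits queried below into absolute values. */
	cap->max_n_queues = 1 << hca->log_max_hairpin_queues;
	cap->max_rx_2_tx = 1;
	cap->max_tx_2_rx = 1;
	cap->max_nb_desc = 1 << hca->log_max_hairpin_num_packets;
	return 0;
}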

Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/net/mlx5/mlx5.h           | 4 ++++
 drivers/net/mlx5/mlx5_devx_cmds.c | 7 +++++++
 2 files changed, 11 insertions(+)

diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 164df11..35eaddc 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -184,6 +184,10 @@ struct mlx5_hca_attr {
 	uint32_t tunnel_lro_vxlan:1;
 	uint32_t lro_max_msg_sz_mode:2;
 	uint32_t lro_timer_supported_periods[MLX5_LRO_NUM_SUPP_PERIODS];
+	uint32_t hairpin:1;
+	uint32_t log_max_hairpin_queues:5;
+	uint32_t log_max_hairpin_wq_data_sz:5;
+	uint32_t log_max_hairpin_num_packets:5;
 };
 
 /* Flow list . */
diff --git a/drivers/net/mlx5/mlx5_devx_cmds.c b/drivers/net/mlx5/mlx5_devx_cmds.c
index acfe1de..b072c37 100644
--- a/drivers/net/mlx5/mlx5_devx_cmds.c
+++ b/drivers/net/mlx5/mlx5_devx_cmds.c
@@ -327,6 +327,13 @@ struct mlx5_devx_obj *
 	attr->flow_counters_dump = MLX5_GET(cmd_hca_cap, hcattr,
 					    flow_counters_dump);
 	attr->eswitch_manager = MLX5_GET(cmd_hca_cap, hcattr, eswitch_manager);
+	attr->hairpin = MLX5_GET(cmd_hca_cap, hcattr, hairpin);
+	attr->log_max_hairpin_queues = MLX5_GET(cmd_hca_cap, hcattr,
+						log_max_hairpin_queues);
+	attr->log_max_hairpin_wq_data_sz = MLX5_GET(cmd_hca_cap, hcattr,
+						    log_max_hairpin_wq_data_sz);
+	attr->log_max_hairpin_num_packets = MLX5_GET
+		(cmd_hca_cap, hcattr, log_min_hairpin_wq_data_sz);
 	attr->eth_net_offloads = MLX5_GET(cmd_hca_cap, hcattr,
 					  eth_net_offloads);
 	attr->eth_virt = MLX5_GET(cmd_hca_cap, hcattr, eth_virt);
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 186+ messages in thread

* [dpdk-dev] [PATCH v2 03/14] net/mlx5: support Rx hairpin queues
  2019-10-04 19:54 ` [dpdk-dev] [PATCH v2 00/14] " Ori Kam
  2019-10-04 19:54   ` [dpdk-dev] [PATCH v2 01/14] ethdev: add support for hairpin queue Ori Kam
  2019-10-04 19:54   ` [dpdk-dev] [PATCH v2 02/14] net/mlx5: query hca hairpin capabilities Ori Kam
@ 2019-10-04 19:54   ` Ori Kam
  2019-10-04 19:54   ` [dpdk-dev] [PATCH v2 04/14] net/mlx5: prepare txq to work with different types Ori Kam
                     ` (10 subsequent siblings)
  13 siblings, 0 replies; 186+ messages in thread
From: Ori Kam @ 2019-10-04 19:54 UTC (permalink / raw)
  To: Matan Azrad, Shahaf Shuler, Viacheslav Ovsiienko
  Cc: dev, orika, jingjing.wu, stephen

This commit adds support for creating Rx hairpin queues.
A hairpin queue is a queue that is created using DevX and used only
by the HW. As a result, the entire data path part of the RQ goes
unused.
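
In this version the PMD only accepts a single hairpin peer located on
the same port, so a valid configuration for an Rx queue is restricted to
the following shape (a hedged example mirroring the check in
mlx5_rx_hairpin_queue_setup() below; the identifiers are placeholders):

struct rte_eth_hairpin_conf conf = {
	.peer_n = 1,                /* exactly one peer */
	.peers[0] = {
		.port = my_port_id, /* must be the receiving port itself */
		.queue = peer_txq,  /* must be below the Tx queue count */
	},
};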

Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/net/mlx5/mlx5.c         |   2 +
 drivers/net/mlx5/mlx5_rxq.c     | 270 ++++++++++++++++++++++++++++++++++++----
 drivers/net/mlx5/mlx5_rxtx.h    |  15 +++
 drivers/net/mlx5/mlx5_trigger.c |   7 ++
 4 files changed, 270 insertions(+), 24 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 13d112e..1b84fdc 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -974,6 +974,7 @@ struct mlx5_dev_spawn_data {
 	.dev_supported_ptypes_get = mlx5_dev_supported_ptypes_get,
 	.vlan_filter_set = mlx5_vlan_filter_set,
 	.rx_queue_setup = mlx5_rx_queue_setup,
+	.rx_hairpin_queue_setup = mlx5_rx_hairpin_queue_setup,
 	.tx_queue_setup = mlx5_tx_queue_setup,
 	.rx_queue_release = mlx5_rx_queue_release,
 	.tx_queue_release = mlx5_tx_queue_release,
@@ -1040,6 +1041,7 @@ struct mlx5_dev_spawn_data {
 	.dev_supported_ptypes_get = mlx5_dev_supported_ptypes_get,
 	.vlan_filter_set = mlx5_vlan_filter_set,
 	.rx_queue_setup = mlx5_rx_queue_setup,
+	.rx_hairpin_queue_setup = mlx5_rx_hairpin_queue_setup,
 	.tx_queue_setup = mlx5_tx_queue_setup,
 	.rx_queue_release = mlx5_rx_queue_release,
 	.tx_queue_release = mlx5_tx_queue_release,
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index a1fdeef..97a2031 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -106,21 +106,25 @@
 	struct mlx5_priv *priv = dev->data->dev_private;
 	uint16_t i;
 	uint16_t n = 0;
+	uint16_t n_ibv = 0;
 
 	if (mlx5_check_mprq_support(dev) < 0)
 		return 0;
 	/* All the configured queues should be enabled. */
 	for (i = 0; i < priv->rxqs_n; ++i) {
 		struct mlx5_rxq_data *rxq = (*priv->rxqs)[i];
+		struct mlx5_rxq_ctrl *rxq_ctrl = container_of
+			(rxq, struct mlx5_rxq_ctrl, rxq);
 
-		if (!rxq)
+		if (rxq == NULL || rxq_ctrl->type != MLX5_RXQ_TYPE_STANDARD)
 			continue;
+		n_ibv++;
 		if (mlx5_rxq_mprq_enabled(rxq))
 			++n;
 	}
 	/* Multi-Packet RQ can't be partially configured. */
-	assert(n == 0 || n == priv->rxqs_n);
-	return n == priv->rxqs_n;
+	assert(n == 0 || n == n_ibv);
+	return n == n_ibv;
 }
 
 /**
@@ -427,6 +431,7 @@
 }
 
 /**
+ * Rx queue presetup checks.
  *
  * @param dev
  *   Pointer to Ethernet device structure.
@@ -434,25 +439,14 @@
  *   RX queue index.
  * @param desc
  *   Number of descriptors to configure in queue.
- * @param socket
- *   NUMA socket on which memory must be allocated.
- * @param[in] conf
- *   Thresholds parameters.
- * @param mp
- *   Memory pool for buffer allocations.
  *
  * @return
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
-int
-mlx5_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
-		    unsigned int socket, const struct rte_eth_rxconf *conf,
-		    struct rte_mempool *mp)
+static int
+mlx5_rx_queue_pre_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
-	struct mlx5_rxq_data *rxq = (*priv->rxqs)[idx];
-	struct mlx5_rxq_ctrl *rxq_ctrl =
-		container_of(rxq, struct mlx5_rxq_ctrl, rxq);
 
 	if (!rte_is_power_of_2(desc)) {
 		desc = 1 << log2above(desc);
@@ -476,6 +470,41 @@
 		return -rte_errno;
 	}
 	mlx5_rxq_release(dev, idx);
+	return 0;
+}
+
+/**
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ * @param idx
+ *   RX queue index.
+ * @param desc
+ *   Number of descriptors to configure in queue.
+ * @param socket
+ *   NUMA socket on which memory must be allocated.
+ * @param[in] conf
+ *   Thresholds parameters.
+ * @param mp
+ *   Memory pool for buffer allocations.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
+		    unsigned int socket, const struct rte_eth_rxconf *conf,
+		    struct rte_mempool *mp)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_rxq_data *rxq = (*priv->rxqs)[idx];
+	struct mlx5_rxq_ctrl *rxq_ctrl =
+		container_of(rxq, struct mlx5_rxq_ctrl, rxq);
+	int res;
+
+	res = mlx5_rx_queue_pre_setup(dev, idx, desc);
+	if (res)
+		return res;
 	rxq_ctrl = mlx5_rxq_new(dev, idx, desc, socket, conf, mp);
 	if (!rxq_ctrl) {
 		DRV_LOG(ERR, "port %u unable to allocate queue index %u",
@@ -490,6 +519,56 @@
 }
 
 /**
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ * @param idx
+ *   RX queue index.
+ * @param desc
+ *   Number of descriptors to configure in queue.
+ * @param hairpin_conf
+ *   Hairpin configuration parameters.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_rx_hairpin_queue_setup(struct rte_eth_dev *dev, uint16_t idx,
+			    uint16_t desc,
+			    const struct rte_eth_hairpin_conf *hairpin_conf)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_rxq_data *rxq = (*priv->rxqs)[idx];
+	struct mlx5_rxq_ctrl *rxq_ctrl =
+		container_of(rxq, struct mlx5_rxq_ctrl, rxq);
+	int res;
+
+	res = mlx5_rx_queue_pre_setup(dev, idx, desc);
+	if (res)
+		return res;
+	if (hairpin_conf->peer_n != 1 ||
+	    hairpin_conf->peers[0].port != dev->data->port_id ||
+	    hairpin_conf->peers[0].queue >= priv->txqs_n) {
+		DRV_LOG(ERR, "port %u unable to setup hairpin queue index %u "
+			"invalid hairpin configuration", dev->data->port_id,
+			idx);
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+	rxq_ctrl = mlx5_rxq_hairpin_new(dev, idx, desc, hairpin_conf);
+	if (!rxq_ctrl) {
+		DRV_LOG(ERR, "port %u unable to allocate queue index %u",
+			dev->data->port_id, idx);
+		rte_errno = ENOMEM;
+		return -rte_errno;
+	}
+	DRV_LOG(DEBUG, "port %u adding Rx queue %u to list",
+		dev->data->port_id, idx);
+	(*priv->rxqs)[idx] = &rxq_ctrl->rxq;
+	return 0;
+}
+
+/**
  * DPDK callback to release a RX queue.
  *
  * @param dpdk_rxq
@@ -561,6 +640,24 @@
 }
 
 /**
+ * Release an Rx hairpin related resources.
+ *
+ * @param rxq_obj
+ *   Hairpin Rx queue object.
+ */
+static void
+rxq_obj_hairpin_release(struct mlx5_rxq_obj *rxq_obj)
+{
+	struct mlx5_devx_modify_rq_attr rq_attr = { 0 };
+
+	assert(rxq_obj);
+	rq_attr.state = MLX5_RQC_STATE_RST;
+	rq_attr.rq_state = MLX5_RQC_STATE_RDY;
+	mlx5_devx_cmd_modify_rq(rxq_obj->rq, &rq_attr);
+	claim_zero(mlx5_devx_cmd_destroy(rxq_obj->rq));
+}
+
+/**
  * Release an Rx verbs/DevX queue object.
  *
  * @param rxq_obj
@@ -577,14 +674,22 @@
 		assert(rxq_obj->wq);
 	assert(rxq_obj->cq);
 	if (rte_atomic32_dec_and_test(&rxq_obj->refcnt)) {
-		rxq_free_elts(rxq_obj->rxq_ctrl);
-		if (rxq_obj->type == MLX5_RXQ_OBJ_TYPE_IBV) {
+		switch (rxq_obj->type) {
+		case MLX5_RXQ_OBJ_TYPE_IBV:
+			rxq_free_elts(rxq_obj->rxq_ctrl);
 			claim_zero(mlx5_glue->destroy_wq(rxq_obj->wq));
-		} else if (rxq_obj->type == MLX5_RXQ_OBJ_TYPE_DEVX_RQ) {
+			claim_zero(mlx5_glue->destroy_cq(rxq_obj->cq));
+			break;
+		case MLX5_RXQ_OBJ_TYPE_DEVX_RQ:
+			rxq_free_elts(rxq_obj->rxq_ctrl);
 			claim_zero(mlx5_devx_cmd_destroy(rxq_obj->rq));
 			rxq_release_rq_resources(rxq_obj->rxq_ctrl);
+			claim_zero(mlx5_glue->destroy_cq(rxq_obj->cq));
+			break;
+		case MLX5_RXQ_OBJ_TYPE_DEVX_HAIRPIN:
+			rxq_obj_hairpin_release(rxq_obj);
+			break;
 		}
-		claim_zero(mlx5_glue->destroy_cq(rxq_obj->cq));
 		if (rxq_obj->channel)
 			claim_zero(mlx5_glue->destroy_comp_channel
 				   (rxq_obj->channel));
@@ -1132,6 +1237,70 @@
 }
 
 /**
+ * Create the Rx hairpin queue object.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param idx
+ *   Queue index in DPDK Rx queue array
+ *
+ * @return
+ *   The hairpin DevX object initialised, NULL otherwise and rte_errno is set.
+ */
+static struct mlx5_rxq_obj *
+mlx5_rxq_obj_hairpin_new(struct rte_eth_dev *dev, uint16_t idx)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_rxq_data *rxq_data = (*priv->rxqs)[idx];
+	struct mlx5_rxq_ctrl *rxq_ctrl =
+		container_of(rxq_data, struct mlx5_rxq_ctrl, rxq);
+	struct mlx5_devx_create_rq_attr attr = { 0 };
+	struct mlx5_rxq_obj *tmpl = NULL;
+	int ret = 0;
+
+	assert(rxq_data);
+	assert(!rxq_ctrl->obj);
+	tmpl = rte_calloc_socket(__func__, 1, sizeof(*tmpl), 0,
+				 rxq_ctrl->socket);
+	if (!tmpl) {
+		DRV_LOG(ERR,
+			"port %u Rx queue %u cannot allocate verbs resources",
+			dev->data->port_id, rxq_data->idx);
+		rte_errno = ENOMEM;
+		goto error;
+	}
+	tmpl->type = MLX5_RXQ_OBJ_TYPE_DEVX_HAIRPIN;
+	tmpl->rxq_ctrl = rxq_ctrl;
+	attr.hairpin = 1;
+	/* Workaround for hairpin startup */
+	attr.wq_attr.log_hairpin_num_packets = log2above(32);
+	/* Workaround for packets larger than 1KB */
+	attr.wq_attr.log_hairpin_data_sz =
+			priv->config.hca_attr.log_max_hairpin_wq_data_sz;
+	tmpl->rq = mlx5_devx_cmd_create_rq(priv->sh->ctx, &attr,
+					   rxq_ctrl->socket);
+	if (!tmpl->rq) {
+		DRV_LOG(ERR,
+			"port %u Rx hairpin queue %u can't create rq object",
+			dev->data->port_id, idx);
+		rte_errno = errno;
+		goto error;
+	}
+	DRV_LOG(DEBUG, "port %u rxq %u updated with %p", dev->data->port_id,
+		idx, (void *)&tmpl);
+	rte_atomic32_inc(&tmpl->refcnt);
+	LIST_INSERT_HEAD(&priv->rxqsobj, tmpl, next);
+	priv->verbs_alloc_ctx.type = MLX5_VERBS_ALLOC_TYPE_NONE;
+	return tmpl;
+error:
+	ret = rte_errno; /* Save rte_errno before cleanup. */
+	if (tmpl->rq)
+		mlx5_devx_cmd_destroy(tmpl->rq);
+	rte_errno = ret; /* Restore rte_errno. */
+	return NULL;
+}
+
+/**
  * Create the Rx queue Verbs/DevX object.
  *
  * @param dev
@@ -1163,6 +1332,8 @@ struct mlx5_rxq_obj *
 
 	assert(rxq_data);
 	assert(!rxq_ctrl->obj);
+	if (type == MLX5_RXQ_OBJ_TYPE_DEVX_HAIRPIN)
+		return mlx5_rxq_obj_hairpin_new(dev, idx);
 	priv->verbs_alloc_ctx.type = MLX5_VERBS_ALLOC_TYPE_RX_QUEUE;
 	priv->verbs_alloc_ctx.obj = rxq_ctrl;
 	tmpl = rte_calloc_socket(__func__, 1, sizeof(*tmpl), 0,
@@ -1433,15 +1604,19 @@ struct mlx5_rxq_obj *
 	unsigned int strd_num_n = 0;
 	unsigned int strd_sz_n = 0;
 	unsigned int i;
+	unsigned int n_ibv = 0;
 
 	if (!mlx5_mprq_enabled(dev))
 		return 0;
 	/* Count the total number of descriptors configured. */
 	for (i = 0; i != priv->rxqs_n; ++i) {
 		struct mlx5_rxq_data *rxq = (*priv->rxqs)[i];
+		struct mlx5_rxq_ctrl *rxq_ctrl = container_of
+			(rxq, struct mlx5_rxq_ctrl, rxq);
 
-		if (rxq == NULL)
+		if (rxq == NULL || rxq_ctrl->type != MLX5_RXQ_TYPE_STANDARD)
 			continue;
+		n_ibv++;
 		desc += 1 << rxq->elts_n;
 		/* Get the max number of strides. */
 		if (strd_num_n < rxq->strd_num_n)
@@ -1466,7 +1641,7 @@ struct mlx5_rxq_obj *
 	 * this Mempool gets available again.
 	 */
 	desc *= 4;
-	obj_num = desc + MLX5_MPRQ_MP_CACHE_SZ * priv->rxqs_n;
+	obj_num = desc + MLX5_MPRQ_MP_CACHE_SZ * n_ibv;
 	/*
 	 * rte_mempool_create_empty() has sanity check to refuse large cache
 	 * size compared to the number of elements.
@@ -1514,8 +1689,10 @@ struct mlx5_rxq_obj *
 	/* Set mempool for each Rx queue. */
 	for (i = 0; i != priv->rxqs_n; ++i) {
 		struct mlx5_rxq_data *rxq = (*priv->rxqs)[i];
+		struct mlx5_rxq_ctrl *rxq_ctrl = container_of
+			(rxq, struct mlx5_rxq_ctrl, rxq);
 
-		if (rxq == NULL)
+		if (rxq == NULL || rxq_ctrl->type != MLX5_RXQ_TYPE_STANDARD)
 			continue;
 		rxq->mprq_mp = mp;
 	}
@@ -1620,6 +1797,7 @@ struct mlx5_rxq_ctrl *
 		rte_errno = ENOMEM;
 		return NULL;
 	}
+	tmpl->type = MLX5_RXQ_TYPE_STANDARD;
 	if (mlx5_mr_btree_init(&tmpl->rxq.mr_ctrl.cache_bh,
 			       MLX5_MR_BTREE_CACHE_N, socket)) {
 		/* rte_errno is already set. */
@@ -1788,6 +1966,49 @@ struct mlx5_rxq_ctrl *
 }
 
 /**
+ * Create a DPDK Rx hairpin queue.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param idx
+ *   RX queue index.
+ * @param desc
+ *   Number of descriptors to configure in queue.
+ * @param hairpin_conf
+ *   The hairpin binding configuration.
+ *
+ * @return
+ *   A DPDK queue object on success, NULL otherwise and rte_errno is set.
+ */
+struct mlx5_rxq_ctrl *
+mlx5_rxq_hairpin_new(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
+		     const struct rte_eth_hairpin_conf *hairpin_conf)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_rxq_ctrl *tmpl;
+
+	tmpl = rte_calloc_socket("RXQ", 1, sizeof(*tmpl), 0, SOCKET_ID_ANY);
+	if (!tmpl) {
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+	tmpl->type = MLX5_RXQ_TYPE_HAIRPIN;
+	tmpl->socket = SOCKET_ID_ANY;
+	tmpl->rxq.rss_hash = 0;
+	tmpl->rxq.port_id = dev->data->port_id;
+	tmpl->priv = priv;
+	tmpl->rxq.mp = NULL;
+	tmpl->rxq.elts_n = log2above(desc);
+	tmpl->rxq.elts = NULL;
+	tmpl->rxq.mr_ctrl.cache_bh = (struct mlx5_mr_btree) { 0 };
+	tmpl->hairpin_conf = *hairpin_conf;
+	tmpl->rxq.idx = idx;
+	rte_atomic32_inc(&tmpl->refcnt);
+	LIST_INSERT_HEAD(&priv->rxqsctrl, tmpl, next);
+	return tmpl;
+}
+
+/**
  * Get a Rx queue.
  *
  * @param dev
@@ -1841,7 +2062,8 @@ struct mlx5_rxq_ctrl *
 		if (rxq_ctrl->dbr_umem_id_valid)
 			claim_zero(mlx5_release_dbr(dev, rxq_ctrl->dbr_umem_id,
 						    rxq_ctrl->dbr_offset));
-		mlx5_mr_btree_free(&rxq_ctrl->rxq.mr_ctrl.cache_bh);
+		if (rxq_ctrl->type == MLX5_RXQ_TYPE_STANDARD)
+			mlx5_mr_btree_free(&rxq_ctrl->rxq.mr_ctrl.cache_bh);
 		LIST_REMOVE(rxq_ctrl, next);
 		rte_free(rxq_ctrl);
 		(*priv->rxqs)[idx] = NULL;
diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index 4bb28a4..13fdc38 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -159,6 +159,13 @@ struct mlx5_rxq_data {
 enum mlx5_rxq_obj_type {
 	MLX5_RXQ_OBJ_TYPE_IBV,		/* mlx5_rxq_obj with ibv_wq. */
 	MLX5_RXQ_OBJ_TYPE_DEVX_RQ,	/* mlx5_rxq_obj with mlx5_devx_rq. */
+	MLX5_RXQ_OBJ_TYPE_DEVX_HAIRPIN,
+	/* mlx5_rxq_obj with mlx5_devx_rq and hairpin support. */
+};
+
+enum mlx5_rxq_type {
+	MLX5_RXQ_TYPE_STANDARD, /* Standard Rx queue. */
+	MLX5_RXQ_TYPE_HAIRPIN, /* Hairpin Rx queue. */
 };
 
 /* Verbs/DevX Rx queue elements. */
@@ -183,6 +190,7 @@ struct mlx5_rxq_ctrl {
 	rte_atomic32_t refcnt; /* Reference counter. */
 	struct mlx5_rxq_obj *obj; /* Verbs/DevX elements. */
 	struct mlx5_priv *priv; /* Back pointer to private data. */
+	enum mlx5_rxq_type type; /* Rxq type. */
 	unsigned int socket; /* CPU socket ID for allocations. */
 	unsigned int irq:1; /* Whether IRQ is enabled. */
 	unsigned int dbr_umem_id_valid:1; /* dbr_umem_id holds a valid value. */
@@ -193,6 +201,7 @@ struct mlx5_rxq_ctrl {
 	uint32_t dbr_umem_id; /* Storing door-bell information, */
 	uint64_t dbr_offset;  /* needed when freeing door-bell. */
 	struct mlx5dv_devx_umem *wq_umem; /* WQ buffer registration info. */
+	struct rte_eth_hairpin_conf hairpin_conf; /* Hairpin configuration. */
 };
 
 enum mlx5_ind_tbl_type {
@@ -339,6 +348,9 @@ struct mlx5_txq_ctrl {
 int mlx5_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 			unsigned int socket, const struct rte_eth_rxconf *conf,
 			struct rte_mempool *mp);
+int mlx5_rx_hairpin_queue_setup
+	(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
+	 const struct rte_eth_hairpin_conf *hairpin_conf);
 void mlx5_rx_queue_release(void *dpdk_rxq);
 int mlx5_rx_intr_vec_enable(struct rte_eth_dev *dev);
 void mlx5_rx_intr_vec_disable(struct rte_eth_dev *dev);
@@ -351,6 +363,9 @@ struct mlx5_rxq_ctrl *mlx5_rxq_new(struct rte_eth_dev *dev, uint16_t idx,
 				   uint16_t desc, unsigned int socket,
 				   const struct rte_eth_rxconf *conf,
 				   struct rte_mempool *mp);
+struct mlx5_rxq_ctrl *mlx5_rxq_hairpin_new
+	(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
+	 const struct rte_eth_hairpin_conf *hairpin_conf);
 struct mlx5_rxq_ctrl *mlx5_rxq_get(struct rte_eth_dev *dev, uint16_t idx);
 int mlx5_rxq_release(struct rte_eth_dev *dev, uint16_t idx);
 int mlx5_rxq_verify(struct rte_eth_dev *dev);
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index 122f31c..cb31ae2 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -118,6 +118,13 @@
 
 		if (!rxq_ctrl)
 			continue;
+		if (rxq_ctrl->type == MLX5_RXQ_TYPE_HAIRPIN) {
+			rxq_ctrl->obj = mlx5_rxq_obj_new
+				(dev, i, MLX5_RXQ_OBJ_TYPE_DEVX_HAIRPIN);
+			if (!rxq_ctrl->obj)
+				goto error;
+			continue;
+		}
 		/* Pre-register Rx mempool. */
 		mp = mlx5_rxq_mprq_enabled(&rxq_ctrl->rxq) ?
 		     rxq_ctrl->rxq.mprq_mp : rxq_ctrl->rxq.mp;
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 186+ messages in thread

* [dpdk-dev] [PATCH v2 04/14] net/mlx5: prepare txq to work with different types
  2019-10-04 19:54 ` [dpdk-dev] [PATCH v2 00/14] " Ori Kam
                     ` (2 preceding siblings ...)
  2019-10-04 19:54   ` [dpdk-dev] [PATCH v2 03/14] net/mlx5: support Rx hairpin queues Ori Kam
@ 2019-10-04 19:54   ` Ori Kam
  2019-10-04 19:54   ` [dpdk-dev] [PATCH v2 05/14] net/mlx5: support Tx hairpin queues Ori Kam
                     ` (9 subsequent siblings)
  13 siblings, 0 replies; 186+ messages in thread
From: Ori Kam @ 2019-10-04 19:54 UTC (permalink / raw)
  To: Matan Azrad, Shahaf Shuler, Viacheslav Ovsiienko
  Cc: dev, orika, jingjing.wu, stephen

Currently all Tx queues are created using Verbs.
This commit modifies the naming so it will not include Verbs,
since in the next commit a new type will be introduced (hairpin).

Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>

---
 drivers/net/mlx5/mlx5.c         |  2 +-
 drivers/net/mlx5/mlx5.h         |  2 +-
 drivers/net/mlx5/mlx5_rxtx.c    |  2 +-
 drivers/net/mlx5/mlx5_rxtx.h    | 39 +++++++++++++++++------
 drivers/net/mlx5/mlx5_trigger.c |  4 +--
 drivers/net/mlx5/mlx5_txq.c     | 70 ++++++++++++++++++++---------------------
 6 files changed, 69 insertions(+), 50 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 1b84fdc..b3d3365 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -911,7 +911,7 @@ struct mlx5_dev_spawn_data {
 	if (ret)
 		DRV_LOG(WARNING, "port %u some Rx queues still remain",
 			dev->data->port_id);
-	ret = mlx5_txq_ibv_verify(dev);
+	ret = mlx5_txq_obj_verify(dev);
 	if (ret)
 		DRV_LOG(WARNING, "port %u some Verbs Tx queue still remain",
 			dev->data->port_id);
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 35eaddc..fa39a44 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -645,7 +645,7 @@ struct mlx5_priv {
 	LIST_HEAD(rxqobj, mlx5_rxq_obj) rxqsobj; /* Verbs/DevX Rx queues. */
 	LIST_HEAD(hrxq, mlx5_hrxq) hrxqs; /* Verbs Hash Rx queues. */
 	LIST_HEAD(txq, mlx5_txq_ctrl) txqsctrl; /* DPDK Tx queues. */
-	LIST_HEAD(txqibv, mlx5_txq_ibv) txqsibv; /* Verbs Tx queues. */
+	LIST_HEAD(txqobj, mlx5_txq_obj) txqsobj; /* Verbs/DevX Tx queues. */
 	/* Indirection tables. */
 	LIST_HEAD(ind_tables, mlx5_ind_table_obj) ind_tbls;
 	/* Pointer to next element. */
diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index 10d0ca1..f23708c 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -863,7 +863,7 @@ enum mlx5_txcmp_code {
 			.qp_state = IBV_QPS_RESET,
 			.port_num = (uint8_t)priv->ibv_port,
 		};
-		struct ibv_qp *qp = txq_ctrl->ibv->qp;
+		struct ibv_qp *qp = txq_ctrl->obj->qp;
 
 		ret = mlx5_glue->modify_qp(qp, &mod, IBV_QP_STATE);
 		if (ret) {
diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index 13fdc38..12f9bfb 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -308,13 +308,31 @@ struct mlx5_txq_data {
 	/* Storage for queued packets, must be the last field. */
 } __rte_cache_aligned;
 
-/* Verbs Rx queue elements. */
-struct mlx5_txq_ibv {
-	LIST_ENTRY(mlx5_txq_ibv) next; /* Pointer to the next element. */
+enum mlx5_txq_obj_type {
+	MLX5_TXQ_OBJ_TYPE_IBV,		/* mlx5_txq_obj with ibv_wq. */
+	MLX5_TXQ_OBJ_TYPE_DEVX_HAIRPIN,
+	/* mlx5_txq_obj with mlx5_devx_tq and hairpin support. */
+};
+
+enum mlx5_txq_type {
+	MLX5_TXQ_TYPE_STANDARD, /* Standard Tx queue. */
+	MLX5_TXQ_TYPE_HAIRPIN, /* Hairpin Tx queue. */
+};
+
+/* Verbs/DevX Tx queue elements. */
+struct mlx5_txq_obj {
+	LIST_ENTRY(mlx5_txq_obj) next; /* Pointer to the next element. */
 	rte_atomic32_t refcnt; /* Reference counter. */
 	struct mlx5_txq_ctrl *txq_ctrl; /* Pointer to the control queue. */
-	struct ibv_cq *cq; /* Completion Queue. */
-	struct ibv_qp *qp; /* Queue Pair. */
+	enum mlx5_txq_obj_type type; /* The txq object type. */
+	RTE_STD_C11
+	union {
+		struct {
+			struct ibv_cq *cq; /* Completion Queue. */
+			struct ibv_qp *qp; /* Queue Pair. */
+		};
+		struct mlx5_devx_obj *sq; /* DevX object for Tx queue. */
+	};
 };
 
 /* TX queue control descriptor. */
@@ -322,9 +340,10 @@ struct mlx5_txq_ctrl {
 	LIST_ENTRY(mlx5_txq_ctrl) next; /* Pointer to the next element. */
 	rte_atomic32_t refcnt; /* Reference counter. */
 	unsigned int socket; /* CPU socket ID for allocations. */
+	enum mlx5_txq_type type; /* The txq ctrl type. */
 	unsigned int max_inline_data; /* Max inline data. */
 	unsigned int max_tso_header; /* Max TSO header size. */
-	struct mlx5_txq_ibv *ibv; /* Verbs queue object. */
+	struct mlx5_txq_obj *obj; /* Verbs/DevX queue object. */
 	struct mlx5_priv *priv; /* Back pointer to private data. */
 	off_t uar_mmap_offset; /* UAR mmap offset for non-primary process. */
 	void *bf_reg; /* BlueFlame register from Verbs. */
@@ -393,10 +412,10 @@ int mlx5_tx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 			unsigned int socket, const struct rte_eth_txconf *conf);
 void mlx5_tx_queue_release(void *dpdk_txq);
 int mlx5_tx_uar_init_secondary(struct rte_eth_dev *dev, int fd);
-struct mlx5_txq_ibv *mlx5_txq_ibv_new(struct rte_eth_dev *dev, uint16_t idx);
-struct mlx5_txq_ibv *mlx5_txq_ibv_get(struct rte_eth_dev *dev, uint16_t idx);
-int mlx5_txq_ibv_release(struct mlx5_txq_ibv *txq_ibv);
-int mlx5_txq_ibv_verify(struct rte_eth_dev *dev);
+struct mlx5_txq_obj *mlx5_txq_obj_new(struct rte_eth_dev *dev, uint16_t idx);
+struct mlx5_txq_obj *mlx5_txq_obj_get(struct rte_eth_dev *dev, uint16_t idx);
+int mlx5_txq_obj_release(struct mlx5_txq_obj *txq_obj);
+int mlx5_txq_obj_verify(struct rte_eth_dev *dev);
 struct mlx5_txq_ctrl *mlx5_txq_new(struct rte_eth_dev *dev, uint16_t idx,
 				   uint16_t desc, unsigned int socket,
 				   const struct rte_eth_txconf *conf);
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index cb31ae2..50c4df5 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -52,8 +52,8 @@
 		if (!txq_ctrl)
 			continue;
 		txq_alloc_elts(txq_ctrl);
-		txq_ctrl->ibv = mlx5_txq_ibv_new(dev, i);
-		if (!txq_ctrl->ibv) {
+		txq_ctrl->obj = mlx5_txq_obj_new(dev, i);
+		if (!txq_ctrl->obj) {
 			rte_errno = ENOMEM;
 			goto error;
 		}
diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
index d9fd143..e1ed4eb 100644
--- a/drivers/net/mlx5/mlx5_txq.c
+++ b/drivers/net/mlx5/mlx5_txq.c
@@ -375,15 +375,15 @@
  * @return
  *   The Verbs object initialised, NULL otherwise and rte_errno is set.
  */
-struct mlx5_txq_ibv *
-mlx5_txq_ibv_new(struct rte_eth_dev *dev, uint16_t idx)
+struct mlx5_txq_obj *
+mlx5_txq_obj_new(struct rte_eth_dev *dev, uint16_t idx)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_txq_data *txq_data = (*priv->txqs)[idx];
 	struct mlx5_txq_ctrl *txq_ctrl =
 		container_of(txq_data, struct mlx5_txq_ctrl, txq);
-	struct mlx5_txq_ibv tmpl;
-	struct mlx5_txq_ibv *txq_ibv = NULL;
+	struct mlx5_txq_obj tmpl;
+	struct mlx5_txq_obj *txq_obj = NULL;
 	union {
 		struct ibv_qp_init_attr_ex init;
 		struct ibv_cq_init_attr_ex cq;
@@ -411,7 +411,7 @@ struct mlx5_txq_ibv *
 		rte_errno = EINVAL;
 		return NULL;
 	}
-	memset(&tmpl, 0, sizeof(struct mlx5_txq_ibv));
+	memset(&tmpl, 0, sizeof(struct mlx5_txq_obj));
 	attr.cq = (struct ibv_cq_init_attr_ex){
 		.comp_mask = 0,
 	};
@@ -502,9 +502,9 @@ struct mlx5_txq_ibv *
 		rte_errno = errno;
 		goto error;
 	}
-	txq_ibv = rte_calloc_socket(__func__, 1, sizeof(struct mlx5_txq_ibv), 0,
+	txq_obj = rte_calloc_socket(__func__, 1, sizeof(struct mlx5_txq_obj), 0,
 				    txq_ctrl->socket);
-	if (!txq_ibv) {
+	if (!txq_obj) {
 		DRV_LOG(ERR, "port %u Tx queue %u cannot allocate memory",
 			dev->data->port_id, idx);
 		rte_errno = ENOMEM;
@@ -568,9 +568,9 @@ struct mlx5_txq_ibv *
 		}
 	}
 #endif
-	txq_ibv->qp = tmpl.qp;
-	txq_ibv->cq = tmpl.cq;
-	rte_atomic32_inc(&txq_ibv->refcnt);
+	txq_obj->qp = tmpl.qp;
+	txq_obj->cq = tmpl.cq;
+	rte_atomic32_inc(&txq_obj->refcnt);
 	txq_ctrl->bf_reg = qp.bf.reg;
 	if (qp.comp_mask & MLX5DV_QP_MASK_UAR_MMAP_OFFSET) {
 		txq_ctrl->uar_mmap_offset = qp.uar_mmap_offset;
@@ -585,18 +585,18 @@ struct mlx5_txq_ibv *
 		goto error;
 	}
 	txq_uar_init(txq_ctrl);
-	LIST_INSERT_HEAD(&priv->txqsibv, txq_ibv, next);
-	txq_ibv->txq_ctrl = txq_ctrl;
+	LIST_INSERT_HEAD(&priv->txqsobj, txq_obj, next);
+	txq_obj->txq_ctrl = txq_ctrl;
 	priv->verbs_alloc_ctx.type = MLX5_VERBS_ALLOC_TYPE_NONE;
-	return txq_ibv;
+	return txq_obj;
 error:
 	ret = rte_errno; /* Save rte_errno before cleanup. */
 	if (tmpl.cq)
 		claim_zero(mlx5_glue->destroy_cq(tmpl.cq));
 	if (tmpl.qp)
 		claim_zero(mlx5_glue->destroy_qp(tmpl.qp));
-	if (txq_ibv)
-		rte_free(txq_ibv);
+	if (txq_obj)
+		rte_free(txq_obj);
 	priv->verbs_alloc_ctx.type = MLX5_VERBS_ALLOC_TYPE_NONE;
 	rte_errno = ret; /* Restore rte_errno. */
 	return NULL;
@@ -613,8 +613,8 @@ struct mlx5_txq_ibv *
  * @return
  *   The Verbs object if it exists.
  */
-struct mlx5_txq_ibv *
-mlx5_txq_ibv_get(struct rte_eth_dev *dev, uint16_t idx)
+struct mlx5_txq_obj *
+mlx5_txq_obj_get(struct rte_eth_dev *dev, uint16_t idx)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_txq_ctrl *txq_ctrl;
@@ -624,29 +624,29 @@ struct mlx5_txq_ibv *
 	if (!(*priv->txqs)[idx])
 		return NULL;
 	txq_ctrl = container_of((*priv->txqs)[idx], struct mlx5_txq_ctrl, txq);
-	if (txq_ctrl->ibv)
-		rte_atomic32_inc(&txq_ctrl->ibv->refcnt);
-	return txq_ctrl->ibv;
+	if (txq_ctrl->obj)
+		rte_atomic32_inc(&txq_ctrl->obj->refcnt);
+	return txq_ctrl->obj;
 }
 
 /**
  * Release a Tx Verbs queue object.
  *
- * @param txq_ibv
+ * @param txq_obj
  *   Verbs Tx queue object.
  *
  * @return
  *   1 while a reference on it exists, 0 when freed.
  */
 int
-mlx5_txq_ibv_release(struct mlx5_txq_ibv *txq_ibv)
+mlx5_txq_obj_release(struct mlx5_txq_obj *txq_obj)
 {
-	assert(txq_ibv);
-	if (rte_atomic32_dec_and_test(&txq_ibv->refcnt)) {
-		claim_zero(mlx5_glue->destroy_qp(txq_ibv->qp));
-		claim_zero(mlx5_glue->destroy_cq(txq_ibv->cq));
-		LIST_REMOVE(txq_ibv, next);
-		rte_free(txq_ibv);
+	assert(txq_obj);
+	if (rte_atomic32_dec_and_test(&txq_obj->refcnt)) {
+		claim_zero(mlx5_glue->destroy_qp(txq_obj->qp));
+		claim_zero(mlx5_glue->destroy_cq(txq_obj->cq));
+		LIST_REMOVE(txq_obj, next);
+		rte_free(txq_obj);
 		return 0;
 	}
 	return 1;
@@ -662,15 +662,15 @@ struct mlx5_txq_ibv *
  *   The number of object not released.
  */
 int
-mlx5_txq_ibv_verify(struct rte_eth_dev *dev)
+mlx5_txq_obj_verify(struct rte_eth_dev *dev)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	int ret = 0;
-	struct mlx5_txq_ibv *txq_ibv;
+	struct mlx5_txq_obj *txq_obj;
 
-	LIST_FOREACH(txq_ibv, &priv->txqsibv, next) {
+	LIST_FOREACH(txq_obj, &priv->txqsobj, next) {
 		DRV_LOG(DEBUG, "port %u Verbs Tx queue %u still referenced",
-			dev->data->port_id, txq_ibv->txq_ctrl->txq.idx);
+			dev->data->port_id, txq_obj->txq_ctrl->txq.idx);
 		++ret;
 	}
 	return ret;
@@ -980,7 +980,7 @@ struct mlx5_txq_ctrl *
 	if ((*priv->txqs)[idx]) {
 		ctrl = container_of((*priv->txqs)[idx], struct mlx5_txq_ctrl,
 				    txq);
-		mlx5_txq_ibv_get(dev, idx);
+		mlx5_txq_obj_get(dev, idx);
 		rte_atomic32_inc(&ctrl->refcnt);
 	}
 	return ctrl;
@@ -1006,8 +1006,8 @@ struct mlx5_txq_ctrl *
 	if (!(*priv->txqs)[idx])
 		return 0;
 	txq = container_of((*priv->txqs)[idx], struct mlx5_txq_ctrl, txq);
-	if (txq->ibv && !mlx5_txq_ibv_release(txq->ibv))
-		txq->ibv = NULL;
+	if (txq->obj && !mlx5_txq_obj_release(txq->obj))
+		txq->obj = NULL;
 	if (rte_atomic32_dec_and_test(&txq->refcnt)) {
 		txq_free_elts(txq);
 		mlx5_mr_btree_free(&txq->txq.mr_ctrl.cache_bh);
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 186+ messages in thread

* [dpdk-dev] [PATCH v2 05/14] net/mlx5: support Tx hairpin queues
  2019-10-04 19:54 ` [dpdk-dev] [PATCH v2 00/14] " Ori Kam
                     ` (3 preceding siblings ...)
  2019-10-04 19:54   ` [dpdk-dev] [PATCH v2 04/14] net/mlx5: prepare txq to work with different types Ori Kam
@ 2019-10-04 19:54   ` Ori Kam
  2019-10-04 19:54   ` [dpdk-dev] [PATCH v2 06/14] net/mlx5: add get hairpin capabilities Ori Kam
                     ` (8 subsequent siblings)
  13 siblings, 0 replies; 186+ messages in thread
From: Ori Kam @ 2019-10-04 19:54 UTC (permalink / raw)
  To: Matan Azrad, Shahaf Shuler, Viacheslav Ovsiienko
  Cc: dev, orika, jingjing.wu, stephen

This commit adds support for creating Tx hairpin queues.
A hairpin queue is a queue that is created using DevX and used only
by the HW.
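
The SQ for such a queue is created with the hairpin bit set and is later
bound to its peer RQ by a modify command. Conceptually (a hedged sketch
using the attribute structures added in this patch; sh, peer_rq_id and
peer_vhca_id are assumed to be in scope, and the real logic lives in the
mlx5_txq.c and mlx5_trigger.c hunks):

/* Create the hairpin SQ attached to the shared TIS... */
struct mlx5_devx_create_sq_attr sq_attr = {
	.hairpin = 1,
	.tis_lst_sz = 1,
	.tis_num = sh->tis->id,
};
struct mlx5_devx_obj *sq = mlx5_devx_cmd_create_sq(sh->ctx, &sq_attr);

/* ...then move it to the ready state, naming the peer RQ to bind to. */
struct mlx5_devx_modify_sq_attr mod = {
	.state = MLX5_SQC_STATE_RDY,
	.hairpin_peer_rq = peer_rq_id,
	.hairpin_peer_vhca = peer_vhca_id,
};
mlx5_devx_cmd_modify_sq(sq, &mod);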

Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>

---
 drivers/net/mlx5/mlx5.c           |  26 +++++
 drivers/net/mlx5/mlx5.h           |  46 ++++++++
 drivers/net/mlx5/mlx5_devx_cmds.c | 186 ++++++++++++++++++++++++++++++
 drivers/net/mlx5/mlx5_prm.h       | 118 +++++++++++++++++++
 drivers/net/mlx5/mlx5_rxtx.h      |  18 ++-
 drivers/net/mlx5/mlx5_trigger.c   |  10 +-
 drivers/net/mlx5/mlx5_txq.c       | 230 +++++++++++++++++++++++++++++++++++---
 7 files changed, 614 insertions(+), 20 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index b3d3365..d010193 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -325,6 +325,9 @@ struct mlx5_dev_spawn_data {
 	struct mlx5_ibv_shared *sh;
 	int err = 0;
 	uint32_t i;
+#ifdef HAVE_IBV_FLOW_DV_SUPPORT
+	struct mlx5_devx_tis_attr tis_attr = { 0 };
+#endif
 
 	assert(spawn);
 	/* Secondary process should not create the shared context. */
@@ -394,6 +397,19 @@ struct mlx5_dev_spawn_data {
 		DRV_LOG(ERR, "Fail to extract pdn from PD");
 		goto error;
 	}
+	sh->td = mlx5_devx_cmd_create_td(sh->ctx);
+	if (!sh->td) {
+		DRV_LOG(ERR, "TD allocation failure");
+		err = ENOMEM;
+		goto error;
+	}
+	tis_attr.transport_domain = sh->td->id;
+	sh->tis = mlx5_devx_cmd_create_tis(sh->ctx, &tis_attr);
+	if (!sh->tis) {
+		DRV_LOG(ERR, "TIS allocation failure");
+		err = ENOMEM;
+		goto error;
+	}
 #endif /* HAVE_IBV_FLOW_DV_SUPPORT */
 	/*
 	 * Once the device is added to the list of memory event
@@ -425,6 +441,10 @@ struct mlx5_dev_spawn_data {
 error:
 	pthread_mutex_unlock(&mlx5_ibv_list_mutex);
 	assert(sh);
+	if (sh->tis)
+		claim_zero(mlx5_devx_cmd_destroy(sh->tis));
+	if (sh->td)
+		claim_zero(mlx5_devx_cmd_destroy(sh->td));
 	if (sh->pd)
 		claim_zero(mlx5_glue->dealloc_pd(sh->pd));
 	if (sh->ctx)
@@ -485,6 +505,10 @@ struct mlx5_dev_spawn_data {
 	pthread_mutex_destroy(&sh->intr_mutex);
 	if (sh->pd)
 		claim_zero(mlx5_glue->dealloc_pd(sh->pd));
+	if (sh->tis)
+		claim_zero(mlx5_devx_cmd_destroy(sh->tis));
+	if (sh->td)
+		claim_zero(mlx5_devx_cmd_destroy(sh->td));
 	if (sh->ctx)
 		claim_zero(mlx5_glue->close_device(sh->ctx));
 	rte_free(sh);
@@ -976,6 +1000,7 @@ struct mlx5_dev_spawn_data {
 	.rx_queue_setup = mlx5_rx_queue_setup,
 	.rx_hairpin_queue_setup = mlx5_rx_hairpin_queue_setup,
 	.tx_queue_setup = mlx5_tx_queue_setup,
+	.tx_hairpin_queue_setup = mlx5_tx_hairpin_queue_setup,
 	.rx_queue_release = mlx5_rx_queue_release,
 	.tx_queue_release = mlx5_tx_queue_release,
 	.flow_ctrl_get = mlx5_dev_get_flow_ctrl,
@@ -1043,6 +1068,7 @@ struct mlx5_dev_spawn_data {
 	.rx_queue_setup = mlx5_rx_queue_setup,
 	.rx_hairpin_queue_setup = mlx5_rx_hairpin_queue_setup,
 	.tx_queue_setup = mlx5_tx_queue_setup,
+	.tx_hairpin_queue_setup = mlx5_tx_hairpin_queue_setup,
 	.rx_queue_release = mlx5_rx_queue_release,
 	.tx_queue_release = mlx5_tx_queue_release,
 	.flow_ctrl_get = mlx5_dev_get_flow_ctrl,
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index fa39a44..f4c1680 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -350,6 +350,43 @@ struct mlx5_devx_rqt_attr {
 	uint32_t rq_list[];
 };
 
+/* TIS attributes structure. */
+struct mlx5_devx_tis_attr {
+	uint32_t strict_lag_tx_port_affinity:1;
+	uint32_t tls_en:1;
+	uint32_t lag_tx_port_affinity:4;
+	uint32_t prio:4;
+	uint32_t transport_domain:24;
+};
+
+/* SQ attributes structure, used by SQ create operation. */
+struct mlx5_devx_create_sq_attr {
+	uint32_t rlky:1;
+	uint32_t cd_master:1;
+	uint32_t fre:1;
+	uint32_t flush_in_error_en:1;
+	uint32_t allow_multi_pkt_send_wqe:1;
+	uint32_t min_wqe_inline_mode:3;
+	uint32_t state:4;
+	uint32_t reg_umr:1;
+	uint32_t allow_swp:1;
+	uint32_t hairpin:1;
+	uint32_t user_index:24;
+	uint32_t cqn:24;
+	uint32_t packet_pacing_rate_limit_index:16;
+	uint32_t tis_lst_sz:16;
+	uint32_t tis_num:24;
+	struct mlx5_devx_wq_attr wq_attr;
+};
+
+/* SQ attributes structure, used by SQ modify operation. */
+struct mlx5_devx_modify_sq_attr {
+	uint32_t sq_state:4;
+	uint32_t state:4;
+	uint32_t hairpin_peer_rq:24;
+	uint32_t hairpin_peer_vhca:16;
+};
+
 /**
  * Type of object being allocated.
  */
@@ -591,6 +628,8 @@ struct mlx5_ibv_shared {
 	struct rte_intr_handle intr_handle; /* Interrupt handler for device. */
 	struct rte_intr_handle intr_handle_devx; /* DEVX interrupt handler. */
 	struct mlx5dv_devx_cmd_comp *devx_comp; /* DEVX async comp obj. */
+	struct mlx5_devx_obj *tis; /* TIS object. */
+	struct mlx5_devx_obj *td; /* Transport domain. */
 	struct mlx5_ibv_shared_port port[]; /* per device port data array. */
 };
 
@@ -911,5 +950,12 @@ struct mlx5_devx_obj *mlx5_devx_cmd_create_tir(struct ibv_context *ctx,
 					struct mlx5_devx_tir_attr *tir_attr);
 struct mlx5_devx_obj *mlx5_devx_cmd_create_rqt(struct ibv_context *ctx,
 					struct mlx5_devx_rqt_attr *rqt_attr);
+struct mlx5_devx_obj *mlx5_devx_cmd_create_sq
+	(struct ibv_context *ctx, struct mlx5_devx_create_sq_attr *sq_attr);
+int mlx5_devx_cmd_modify_sq
+	(struct mlx5_devx_obj *sq, struct mlx5_devx_modify_sq_attr *sq_attr);
+struct mlx5_devx_obj *mlx5_devx_cmd_create_tis
+	(struct ibv_context *ctx, struct mlx5_devx_tis_attr *tis_attr);
+struct mlx5_devx_obj *mlx5_devx_cmd_create_td(struct ibv_context *ctx);
 
 #endif /* RTE_PMD_MLX5_H_ */
diff --git a/drivers/net/mlx5/mlx5_devx_cmds.c b/drivers/net/mlx5/mlx5_devx_cmds.c
index b072c37..917bbf9 100644
--- a/drivers/net/mlx5/mlx5_devx_cmds.c
+++ b/drivers/net/mlx5/mlx5_devx_cmds.c
@@ -709,3 +709,189 @@ struct mlx5_devx_obj *
 	rqt->id = MLX5_GET(create_rqt_out, out, rqtn);
 	return rqt;
 }
+
+/**
+ * Create SQ using DevX API.
+ *
+ * @param[in] ctx
+ *   ibv_context returned from mlx5dv_open_device.
+ * @param [in] sq_attr
+ *   Pointer to SQ attributes structure.
+ *
+ * @return
+ *   The DevX object created, NULL otherwise and rte_errno is set.
+ */
+struct mlx5_devx_obj *
+mlx5_devx_cmd_create_sq(struct ibv_context *ctx,
+			struct mlx5_devx_create_sq_attr *sq_attr)
+{
+	uint32_t in[MLX5_ST_SZ_DW(create_sq_in)] = {0};
+	uint32_t out[MLX5_ST_SZ_DW(create_sq_out)] = {0};
+	void *sq_ctx;
+	void *wq_ctx;
+	struct mlx5_devx_wq_attr *wq_attr;
+	struct mlx5_devx_obj *sq = NULL;
+
+	sq = rte_calloc(__func__, 1, sizeof(*sq), 0);
+	if (!sq) {
+		DRV_LOG(ERR, "Failed to allocate SQ data");
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+	MLX5_SET(create_sq_in, in, opcode, MLX5_CMD_OP_CREATE_SQ);
+	sq_ctx = MLX5_ADDR_OF(create_sq_in, in, ctx);
+	MLX5_SET(sqc, sq_ctx, rlky, sq_attr->rlky);
+	MLX5_SET(sqc, sq_ctx, cd_master, sq_attr->cd_master);
+	MLX5_SET(sqc, sq_ctx, fre, sq_attr->fre);
+	MLX5_SET(sqc, sq_ctx, flush_in_error_en, sq_attr->flush_in_error_en);
+	MLX5_SET(sqc, sq_ctx, allow_multi_pkt_send_wqe,
+		 sq_attr->allow_multi_pkt_send_wqe);
+	MLX5_SET(sqc, sq_ctx, min_wqe_inline_mode,
+		 sq_attr->min_wqe_inline_mode);
+	MLX5_SET(sqc, sq_ctx, state, sq_attr->state);
+	MLX5_SET(sqc, sq_ctx, reg_umr, sq_attr->reg_umr);
+	MLX5_SET(sqc, sq_ctx, allow_swp, sq_attr->allow_swp);
+	MLX5_SET(sqc, sq_ctx, hairpin, sq_attr->hairpin);
+	MLX5_SET(sqc, sq_ctx, user_index, sq_attr->user_index);
+	MLX5_SET(sqc, sq_ctx, cqn, sq_attr->cqn);
+	MLX5_SET(sqc, sq_ctx, packet_pacing_rate_limit_index,
+		 sq_attr->packet_pacing_rate_limit_index);
+	MLX5_SET(sqc, sq_ctx, tis_lst_sz, sq_attr->tis_lst_sz);
+	MLX5_SET(sqc, sq_ctx, tis_num_0, sq_attr->tis_num);
+	wq_ctx = MLX5_ADDR_OF(sqc, sq_ctx, wq);
+	wq_attr = &sq_attr->wq_attr;
+	devx_cmd_fill_wq_data(wq_ctx, wq_attr);
+	sq->obj = mlx5_glue->devx_obj_create(ctx, in, sizeof(in),
+					     out, sizeof(out));
+	if (!sq->obj) {
+		DRV_LOG(ERR, "Failed to create SQ using DevX");
+		rte_errno = errno;
+		rte_free(sq);
+		return NULL;
+	}
+	sq->id = MLX5_GET(create_sq_out, out, sqn);
+	return sq;
+}
+
+/**
+ * Modify SQ using DevX API.
+ *
+ * @param[in] sq
+ *   Pointer to SQ object structure.
+ * @param [in] sq_attr
+ *   Pointer to SQ attributes structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_devx_cmd_modify_sq(struct mlx5_devx_obj *sq,
+			struct mlx5_devx_modify_sq_attr *sq_attr)
+{
+	uint32_t in[MLX5_ST_SZ_DW(modify_sq_in)] = {0};
+	uint32_t out[MLX5_ST_SZ_DW(modify_sq_out)] = {0};
+	void *sq_ctx;
+	int ret;
+
+	MLX5_SET(modify_sq_in, in, opcode, MLX5_CMD_OP_MODIFY_SQ);
+	MLX5_SET(modify_sq_in, in, sq_state, sq_attr->sq_state);
+	MLX5_SET(modify_sq_in, in, sqn, sq->id);
+	sq_ctx = MLX5_ADDR_OF(modify_sq_in, in, ctx);
+	MLX5_SET(sqc, sq_ctx, state, sq_attr->state);
+	MLX5_SET(sqc, sq_ctx, hairpin_peer_rq, sq_attr->hairpin_peer_rq);
+	MLX5_SET(sqc, sq_ctx, hairpin_peer_vhca, sq_attr->hairpin_peer_vhca);
+	ret = mlx5_glue->devx_obj_modify(sq->obj, in, sizeof(in),
+					 out, sizeof(out));
+	if (ret) {
+		DRV_LOG(ERR, "Failed to modify SQ using DevX");
+		rte_errno = errno;
+		return -errno;
+	}
+	return ret;
+}
+
+/**
+ * Create TIS using DevX API.
+ *
+ * @param[in] ctx
+ *   ibv_context returned from mlx5dv_open_device.
+ * @param [in] tis_attr
+ *   Pointer to TIS attributes structure.
+ *
+ * @return
+ *   The DevX object created, NULL otherwise and rte_errno is set.
+ */
+struct mlx5_devx_obj *
+mlx5_devx_cmd_create_tis(struct ibv_context *ctx,
+			 struct mlx5_devx_tis_attr *tis_attr)
+{
+	uint32_t in[MLX5_ST_SZ_DW(create_tis_in)] = {0};
+	uint32_t out[MLX5_ST_SZ_DW(create_tis_out)] = {0};
+	struct mlx5_devx_obj *tis = NULL;
+	void *tis_ctx;
+
+	tis = rte_calloc(__func__, 1, sizeof(*tis), 0);
+	if (!tis) {
+		DRV_LOG(ERR, "Failed to allocate TIS object");
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+	MLX5_SET(create_tis_in, in, opcode, MLX5_CMD_OP_CREATE_TIS);
+	tis_ctx = MLX5_ADDR_OF(create_tis_in, in, ctx);
+	MLX5_SET(tisc, tis_ctx, strict_lag_tx_port_affinity,
+		 tis_attr->strict_lag_tx_port_affinity);
+	MLX5_SET(tisc, tis_ctx, lag_tx_port_affinity,
+		 tis_attr->lag_tx_port_affinity);
+	MLX5_SET(tisc, tis_ctx, prio, tis_attr->prio);
+	MLX5_SET(tisc, tis_ctx, transport_domain,
+		 tis_attr->transport_domain);
+	tis->obj = mlx5_glue->devx_obj_create(ctx, in, sizeof(in),
+					      out, sizeof(out));
+	if (!tis->obj) {
+		DRV_LOG(ERR, "Failed to create TIS using DevX");
+		rte_errno = errno;
+		rte_free(tis);
+		return NULL;
+	}
+	tis->id = MLX5_GET(create_tis_out, out, tisn);
+	return tis;
+}
+
+/**
+ * Create transport domain using DevX API.
+ *
+ * @param[in] ctx
+ *   ibv_context returned from mlx5dv_open_device.
+ *
+ * @return
+ *   The DevX object created, NULL otherwise and rte_errno is set.
+ */
+struct mlx5_devx_obj *
+mlx5_devx_cmd_create_td(struct ibv_context *ctx)
+{
+	uint32_t in[MLX5_ST_SZ_DW(alloc_transport_domain_in)] = {0};
+	uint32_t out[MLX5_ST_SZ_DW(alloc_transport_domain_out)] = {0};
+	struct mlx5_devx_obj *td = NULL;
+
+	td = rte_calloc(__func__, 1, sizeof(*td), 0);
+	if (!td) {
+		DRV_LOG(ERR, "Failed to allocate TD object");
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+	MLX5_SET(alloc_transport_domain_in, in, opcode,
+		 MLX5_CMD_OP_ALLOC_TRANSPORT_DOMAIN);
+	td->obj = mlx5_glue->devx_obj_create(ctx, in, sizeof(in),
+					     out, sizeof(out));
+	if (!td->obj) {
+		DRV_LOG(ERR, "Failed to create TIS using DevX");
+		rte_errno = errno;
+		rte_free(td);
+		return NULL;
+	}
+	td->id = MLX5_GET(alloc_transport_domain_out, out,
+			   transport_domain);
+	return td;
+}
diff --git a/drivers/net/mlx5/mlx5_prm.h b/drivers/net/mlx5/mlx5_prm.h
index 3765df0..faa7996 100644
--- a/drivers/net/mlx5/mlx5_prm.h
+++ b/drivers/net/mlx5/mlx5_prm.h
@@ -666,9 +666,13 @@ enum {
 	MLX5_CMD_OP_QUERY_HCA_CAP = 0x100,
 	MLX5_CMD_OP_CREATE_MKEY = 0x200,
 	MLX5_CMD_OP_QUERY_NIC_VPORT_CONTEXT = 0x754,
+	MLX5_CMD_OP_ALLOC_TRANSPORT_DOMAIN = 0x816,
 	MLX5_CMD_OP_CREATE_TIR = 0x900,
+	MLX5_CMD_OP_CREATE_SQ = 0X904,
+	MLX5_CMD_OP_MODIFY_SQ = 0X905,
 	MLX5_CMD_OP_CREATE_RQ = 0x908,
 	MLX5_CMD_OP_MODIFY_RQ = 0x909,
+	MLX5_CMD_OP_CREATE_TIS = 0x912,
 	MLX5_CMD_OP_QUERY_TIS = 0x915,
 	MLX5_CMD_OP_CREATE_RQT = 0x916,
 	MLX5_CMD_OP_ALLOC_FLOW_COUNTER = 0x939,
@@ -1311,6 +1315,23 @@ struct mlx5_ifc_query_tis_in_bits {
 	u8 reserved_at_60[0x20];
 };
 
+struct mlx5_ifc_alloc_transport_domain_out_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 reserved_at_40[0x8];
+	u8 transport_domain[0x18];
+	u8 reserved_at_60[0x20];
+};
+
+struct mlx5_ifc_alloc_transport_domain_in_bits {
+	u8 opcode[0x10];
+	u8 reserved_at_10[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 reserved_at_40[0x40];
+};
+
 enum {
 	MLX5_WQ_TYPE_LINKED_LIST                = 0x0,
 	MLX5_WQ_TYPE_CYCLIC                     = 0x1,
@@ -1427,6 +1448,24 @@ struct mlx5_ifc_modify_rq_out_bits {
 	u8 reserved_at_40[0x40];
 };
 
+struct mlx5_ifc_create_tis_out_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 reserved_at_40[0x8];
+	u8 tisn[0x18];
+	u8 reserved_at_60[0x20];
+};
+
+struct mlx5_ifc_create_tis_in_bits {
+	u8 opcode[0x10];
+	u8 uid[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 reserved_at_40[0xc0];
+	struct mlx5_ifc_tisc_bits ctx;
+};
+
 enum {
 	MLX5_MODIFY_RQ_IN_MODIFY_BITMASK_WQ_LWM = 1ULL << 0,
 	MLX5_MODIFY_RQ_IN_MODIFY_BITMASK_VSD = 1ULL << 1,
@@ -1572,6 +1611,85 @@ struct mlx5_ifc_create_rqt_in_bits {
 #pragma GCC diagnostic error "-Wpedantic"
 #endif
 
+struct mlx5_ifc_sqc_bits {
+	u8 rlky[0x1];
+	u8 cd_master[0x1];
+	u8 fre[0x1];
+	u8 flush_in_error_en[0x1];
+	u8 allow_multi_pkt_send_wqe[0x1];
+	u8 min_wqe_inline_mode[0x3];
+	u8 state[0x4];
+	u8 reg_umr[0x1];
+	u8 allow_swp[0x1];
+	u8 hairpin[0x1];
+	u8 reserved_at_f[0x11];
+	u8 reserved_at_20[0x8];
+	u8 user_index[0x18];
+	u8 reserved_at_40[0x8];
+	u8 cqn[0x18];
+	u8 reserved_at_60[0x8];
+	u8 hairpin_peer_rq[0x18];
+	u8 reserved_at_80[0x10];
+	u8 hairpin_peer_vhca[0x10];
+	u8 reserved_at_a0[0x50];
+	u8 packet_pacing_rate_limit_index[0x10];
+	u8 tis_lst_sz[0x10];
+	u8 reserved_at_110[0x10];
+	u8 reserved_at_120[0x40];
+	u8 reserved_at_160[0x8];
+	u8 tis_num_0[0x18];
+	struct mlx5_ifc_wq_bits wq;
+};
+
+struct mlx5_ifc_query_sq_in_bits {
+	u8 opcode[0x10];
+	u8 reserved_at_10[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 reserved_at_40[0x8];
+	u8 sqn[0x18];
+	u8 reserved_at_60[0x20];
+};
+
+struct mlx5_ifc_modify_sq_out_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 reserved_at_40[0x40];
+};
+
+struct mlx5_ifc_modify_sq_in_bits {
+	u8 opcode[0x10];
+	u8 uid[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 sq_state[0x4];
+	u8 reserved_at_44[0x4];
+	u8 sqn[0x18];
+	u8 reserved_at_60[0x20];
+	u8 modify_bitmask[0x40];
+	u8 reserved_at_c0[0x40];
+	struct mlx5_ifc_sqc_bits ctx;
+};
+
+struct mlx5_ifc_create_sq_out_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 reserved_at_40[0x8];
+	u8 sqn[0x18];
+	u8 reserved_at_60[0x20];
+};
+
+struct mlx5_ifc_create_sq_in_bits {
+	u8 opcode[0x10];
+	u8 uid[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 reserved_at_40[0xc0];
+	struct mlx5_ifc_sqc_bits ctx;
+};
+
 /* CQE format mask. */
 #define MLX5E_CQE_FORMAT_MASK 0xc
 
diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index 12f9bfb..271b648 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -324,14 +324,18 @@ struct mlx5_txq_obj {
 	LIST_ENTRY(mlx5_txq_obj) next; /* Pointer to the next element. */
 	rte_atomic32_t refcnt; /* Reference counter. */
 	struct mlx5_txq_ctrl *txq_ctrl; /* Pointer to the control queue. */
-	enum mlx5_rxq_obj_type type; /* The txq object type. */
+	enum mlx5_txq_obj_type type; /* The txq object type. */
 	RTE_STD_C11
 	union {
 		struct {
 			struct ibv_cq *cq; /* Completion Queue. */
 			struct ibv_qp *qp; /* Queue Pair. */
 		};
-		struct mlx5_devx_obj *sq; /* DevX object for Sx queue. */
+		struct {
+			struct mlx5_devx_obj *sq;
+			/* DevX object for Tx queue (SQ). */
+			struct mlx5_devx_obj *tis; /* The TIS object. */
+		};
 	};
 };
 
@@ -348,6 +352,7 @@ struct mlx5_txq_ctrl {
 	off_t uar_mmap_offset; /* UAR mmap offset for non-primary process. */
 	void *bf_reg; /* BlueFlame register from Verbs. */
 	uint16_t dump_file_n; /* Number of dump files. */
+	struct rte_eth_hairpin_conf hairpin_conf; /* Hairpin configuration. */
 	struct mlx5_txq_data txq; /* Data path structure. */
 	/* Must be the last field in the structure, contains elts[]. */
 };
@@ -410,15 +415,22 @@ struct mlx5_hrxq *mlx5_hrxq_get(struct rte_eth_dev *dev,
 
 int mlx5_tx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 			unsigned int socket, const struct rte_eth_txconf *conf);
+int mlx5_tx_hairpin_queue_setup
+	(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
+	 const struct rte_eth_hairpin_conf *hairpin_conf);
 void mlx5_tx_queue_release(void *dpdk_txq);
 int mlx5_tx_uar_init_secondary(struct rte_eth_dev *dev, int fd);
-struct mlx5_txq_obj *mlx5_txq_obj_new(struct rte_eth_dev *dev, uint16_t idx);
+struct mlx5_txq_obj *mlx5_txq_obj_new(struct rte_eth_dev *dev, uint16_t idx,
+				      enum mlx5_txq_obj_type type);
 struct mlx5_txq_obj *mlx5_txq_obj_get(struct rte_eth_dev *dev, uint16_t idx);
 int mlx5_txq_obj_release(struct mlx5_txq_obj *txq_ibv);
 int mlx5_txq_obj_verify(struct rte_eth_dev *dev);
 struct mlx5_txq_ctrl *mlx5_txq_new(struct rte_eth_dev *dev, uint16_t idx,
 				   uint16_t desc, unsigned int socket,
 				   const struct rte_eth_txconf *conf);
+struct mlx5_txq_ctrl *mlx5_txq_hairpin_new
+	(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
+	 const struct rte_eth_hairpin_conf *hairpin_conf);
 struct mlx5_txq_ctrl *mlx5_txq_get(struct rte_eth_dev *dev, uint16_t idx);
 int mlx5_txq_release(struct rte_eth_dev *dev, uint16_t idx);
 int mlx5_txq_releasable(struct rte_eth_dev *dev, uint16_t idx);
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index 50c4df5..3ec86c4 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -51,8 +51,14 @@
 
 		if (!txq_ctrl)
 			continue;
-		txq_alloc_elts(txq_ctrl);
-		txq_ctrl->obj = mlx5_txq_obj_new(dev, i);
+		if (txq_ctrl->type == MLX5_TXQ_TYPE_HAIRPIN) {
+			txq_ctrl->obj = mlx5_txq_obj_new
+				(dev, i, MLX5_TXQ_OBJ_TYPE_DEVX_HAIRPIN);
+		} else {
+			txq_alloc_elts(txq_ctrl);
+			txq_ctrl->obj = mlx5_txq_obj_new
+				(dev, i, MLX5_TXQ_OBJ_TYPE_IBV);
+		}
 		if (!txq_ctrl->obj) {
 			rte_errno = ENOMEM;
 			goto error;
diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
index e1ed4eb..6b8f3da 100644
--- a/drivers/net/mlx5/mlx5_txq.c
+++ b/drivers/net/mlx5/mlx5_txq.c
@@ -136,30 +136,22 @@
 }
 
 /**
- * DPDK callback to configure a TX queue.
+ * Tx queue presetup checks.
  *
  * @param dev
  *   Pointer to Ethernet device structure.
  * @param idx
- *   TX queue index.
+ *   Tx queue index.
  * @param desc
  *   Number of descriptors to configure in queue.
- * @param socket
- *   NUMA socket on which memory must be allocated.
- * @param[in] conf
- *   Thresholds parameters.
  *
  * @return
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
-int
-mlx5_tx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
-		    unsigned int socket, const struct rte_eth_txconf *conf)
+static int
+mlx5_tx_queue_pre_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
-	struct mlx5_txq_data *txq = (*priv->txqs)[idx];
-	struct mlx5_txq_ctrl *txq_ctrl =
-		container_of(txq, struct mlx5_txq_ctrl, txq);
 
 	if (desc <= MLX5_TX_COMP_THRESH) {
 		DRV_LOG(WARNING,
@@ -191,6 +183,38 @@
 		return -rte_errno;
 	}
 	mlx5_txq_release(dev, idx);
+	return 0;
+}
+
+/**
+ * DPDK callback to configure a TX queue.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ * @param idx
+ *   TX queue index.
+ * @param desc
+ *   Number of descriptors to configure in queue.
+ * @param socket
+ *   NUMA socket on which memory must be allocated.
+ * @param[in] conf
+ *   Thresholds parameters.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_tx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
+		    unsigned int socket, const struct rte_eth_txconf *conf)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_txq_data *txq = (*priv->txqs)[idx];
+	struct mlx5_txq_ctrl *txq_ctrl =
+		container_of(txq, struct mlx5_txq_ctrl, txq);
+	int res;
+
+	res = mlx5_tx_queue_pre_setup(dev, idx, desc);
+	if (res)
+		return res;
 	txq_ctrl = mlx5_txq_new(dev, idx, desc, socket, conf);
 	if (!txq_ctrl) {
 		DRV_LOG(ERR, "port %u unable to allocate queue index %u",
@@ -204,6 +228,57 @@
 }
 
 /**
+ * DPDK callback to configure a TX hairpin queue.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ * @param idx
+ *   TX queue index.
+ * @param desc
+ *   Number of descriptors to configure in queue.
+ * @param[in] hairpin_conf
+ *   The hairpin binding configuration.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_tx_hairpin_queue_setup(struct rte_eth_dev *dev, uint16_t idx,
+			    uint16_t desc,
+			    const struct rte_eth_hairpin_conf *hairpin_conf)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_txq_data *txq = (*priv->txqs)[idx];
+	struct mlx5_txq_ctrl *txq_ctrl =
+		container_of(txq, struct mlx5_txq_ctrl, txq);
+	int res;
+
+	res = mlx5_tx_queue_pre_setup(dev, idx, desc);
+	if (res)
+		return res;
+	if (hairpin_conf->peer_n != 1 ||
+	    hairpin_conf->peers[0].port != dev->data->port_id ||
+	    hairpin_conf->peers[0].queue >= priv->rxqs_n) {
+		DRV_LOG(ERR, "port %u unable to setup hairpin queue index %u "
+			" invalid hairpind configuration", dev->data->port_id,
+			idx);
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+	txq_ctrl = mlx5_txq_hairpin_new(dev, idx, desc, hairpin_conf);
+	if (!txq_ctrl) {
+		DRV_LOG(ERR, "port %u unable to allocate queue index %u",
+			dev->data->port_id, idx);
+		return -rte_errno;
+	}
+	DRV_LOG(DEBUG, "port %u adding Tx queue %u to list",
+		dev->data->port_id, idx);
+	(*priv->txqs)[idx] = &txq_ctrl->txq;
+	txq_ctrl->type = MLX5_TXQ_TYPE_HAIRPIN;
+	return 0;
+}
+
+/**
  * DPDK callback to release a TX queue.
  *
  * @param dpdk_txq
@@ -246,6 +321,8 @@
 	const size_t page_size = sysconf(_SC_PAGESIZE);
 #endif
 
+	if (txq_ctrl->type != MLX5_TXQ_TYPE_STANDARD)
+		return;
 	assert(rte_eal_process_type() == RTE_PROC_PRIMARY);
 	assert(ppriv);
 	ppriv->uar_table[txq_ctrl->txq.idx] = txq_ctrl->bf_reg;
@@ -282,6 +359,8 @@
 	uintptr_t offset;
 	const size_t page_size = sysconf(_SC_PAGESIZE);
 
+	if (txq_ctrl->type != MLX5_TXQ_TYPE_STANDARD)
+		return 0;
 	assert(ppriv);
 	/*
 	 * As rdma-core, UARs are mapped in size of OS page
@@ -316,6 +395,8 @@
 	const size_t page_size = sysconf(_SC_PAGESIZE);
 	void *addr;
 
+	if (txq_ctrl->type != MLX5_TXQ_TYPE_STANDARD)
+		return;
 	addr = ppriv->uar_table[txq_ctrl->txq.idx];
 	munmap(RTE_PTR_ALIGN_FLOOR(addr, page_size), page_size);
 }
@@ -346,6 +427,8 @@
 			continue;
 		txq = (*priv->txqs)[i];
 		txq_ctrl = container_of(txq, struct mlx5_txq_ctrl, txq);
+		if (txq_ctrl->type != MLX5_TXQ_TYPE_STANDARD)
+			continue;
 		assert(txq->idx == (uint16_t)i);
 		ret = txq_uar_init_secondary(txq_ctrl, fd);
 		if (ret)
@@ -365,18 +448,87 @@
 }
 
 /**
+ * Create the Tx hairpin queue object.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param idx
+ *   Queue index in DPDK Tx queue array
+ *
+ * @return
+ *   The hairpin DevX object initialised, NULL otherwise and rte_errno is set.
+ */
+static struct mlx5_txq_obj *
+mlx5_txq_obj_hairpin_new(struct rte_eth_dev *dev, uint16_t idx)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_txq_data *txq_data = (*priv->txqs)[idx];
+	struct mlx5_txq_ctrl *txq_ctrl =
+		container_of(txq_data, struct mlx5_txq_ctrl, txq);
+	struct mlx5_devx_create_sq_attr attr = { 0 };
+	struct mlx5_txq_obj *tmpl = NULL;
+	int ret = 0;
+
+	assert(txq_data);
+	assert(!txq_ctrl->obj);
+	tmpl = rte_calloc_socket(__func__, 1, sizeof(*tmpl), 0,
+				 txq_ctrl->socket);
+	if (!tmpl) {
+		DRV_LOG(ERR,
+			"port %u Tx queue %u cannot allocate memory resources",
+			dev->data->port_id, txq_data->idx);
+		rte_errno = ENOMEM;
+		goto error;
+	}
+	tmpl->type = MLX5_TXQ_OBJ_TYPE_DEVX_HAIRPIN;
+	tmpl->txq_ctrl = txq_ctrl;
+	attr.hairpin = 1;
+	attr.tis_lst_sz = 1;
+	/* Workaround for hairpin startup */
+	attr.wq_attr.log_hairpin_num_packets = log2above(32);
+	/* Workaround for packets larger than 1KB */
+	attr.wq_attr.log_hairpin_data_sz =
+			priv->config.hca_attr.log_max_hairpin_wq_data_sz;
+	attr.tis_num = priv->sh->tis->id;
+	tmpl->sq = mlx5_devx_cmd_create_sq(priv->sh->ctx, &attr);
+	if (!tmpl->sq) {
+		DRV_LOG(ERR,
+			"port %u tx hairpin queue %u can't create sq object",
+			dev->data->port_id, idx);
+		rte_errno = errno;
+		goto error;
+	}
+	DRV_LOG(DEBUG, "port %u sxq %u updated with %p", dev->data->port_id,
+		idx, (void *)&tmpl);
+	rte_atomic32_inc(&tmpl->refcnt);
+	LIST_INSERT_HEAD(&priv->txqsobj, tmpl, next);
+	return tmpl;
+error:
+	ret = rte_errno; /* Save rte_errno before cleanup. */
+	if (tmpl->tis)
+		mlx5_devx_cmd_destroy(tmpl->tis);
+	if (tmpl->sq)
+		mlx5_devx_cmd_destroy(tmpl->sq);
+	rte_errno = ret; /* Restore rte_errno. */
+	return NULL;
+}
+
+/**
  * Create the Tx queue Verbs object.
  *
  * @param dev
  *   Pointer to Ethernet device.
  * @param idx
  *   Queue index in DPDK Tx queue array.
+ * @param type
+ *   Type of the Tx queue object to create.
  *
  * @return
  *   The Verbs object initialised, NULL otherwise and rte_errno is set.
  */
 struct mlx5_txq_obj *
-mlx5_txq_obj_new(struct rte_eth_dev *dev, uint16_t idx)
+mlx5_txq_obj_new(struct rte_eth_dev *dev, uint16_t idx,
+		 enum mlx5_txq_obj_type type)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_txq_data *txq_data = (*priv->txqs)[idx];
@@ -396,6 +548,8 @@ struct mlx5_txq_obj *
 	const int desc = 1 << txq_data->elts_n;
 	int ret = 0;
 
+	if (type == MLX5_TXQ_OBJ_TYPE_DEVX_HAIRPIN)
+		return mlx5_txq_obj_hairpin_new(dev, idx);
 #ifdef HAVE_IBV_FLOW_DV_SUPPORT
 	/* If using DevX, need additional mask to read tisn value. */
 	if (priv->config.devx && !priv->sh->tdn)
@@ -643,8 +797,13 @@ struct mlx5_txq_obj *
 {
 	assert(txq_obj);
 	if (rte_atomic32_dec_and_test(&txq_obj->refcnt)) {
-		claim_zero(mlx5_glue->destroy_qp(txq_obj->qp));
-		claim_zero(mlx5_glue->destroy_cq(txq_obj->cq));
+		if (txq_obj->type == MLX5_TXQ_OBJ_TYPE_DEVX_HAIRPIN) {
+			if (txq_obj->tis)
+				claim_zero(mlx5_devx_cmd_destroy(txq_obj->tis));
+		} else {
+			claim_zero(mlx5_glue->destroy_qp(txq_obj->qp));
+			claim_zero(mlx5_glue->destroy_cq(txq_obj->cq));
+		}
 		LIST_REMOVE(txq_obj, next);
 		rte_free(txq_obj);
 		return 0;
@@ -953,6 +1112,7 @@ struct mlx5_txq_ctrl *
 		goto error;
 	}
 	rte_atomic32_inc(&tmpl->refcnt);
+	tmpl->type = MLX5_TXQ_TYPE_STANDARD;
 	LIST_INSERT_HEAD(&priv->txqsctrl, tmpl, next);
 	return tmpl;
 error:
@@ -961,6 +1121,46 @@ struct mlx5_txq_ctrl *
 }
 
 /**
+ * Create a DPDK Tx hairpin queue.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param idx
+ *   TX queue index.
+ * @param desc
+ *   Number of descriptors to configure in queue.
+ * @param hairpin_conf
+ *  The hairpin configuration.
+ *
+ * @return
+ *   A DPDK queue object on success, NULL otherwise and rte_errno is set.
+ */
+struct mlx5_txq_ctrl *
+mlx5_txq_hairpin_new(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
+		     const struct rte_eth_hairpin_conf *hairpin_conf)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_txq_ctrl *tmpl;
+
+	tmpl = rte_calloc_socket("TXQ", 1,
+				 sizeof(*tmpl), 0, SOCKET_ID_ANY);
+	if (!tmpl) {
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+	tmpl->priv = priv;
+	tmpl->socket = SOCKET_ID_ANY;
+	tmpl->txq.elts_n = log2above(desc);
+	tmpl->txq.port_id = dev->data->port_id;
+	tmpl->txq.idx = idx;
+	tmpl->hairpin_conf = *hairpin_conf;
+	tmpl->type = MLX5_TXQ_TYPE_HAIRPIN;
+	rte_atomic32_inc(&tmpl->refcnt);
+	LIST_INSERT_HEAD(&priv->txqsctrl, tmpl, next);
+	return tmpl;
+}
+
+/**
  * Get a Tx queue.
  *
  * @param dev
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 186+ messages in thread

* [dpdk-dev] [PATCH v2 06/14] net/mlx5: add get hairpin capabilities
  2019-10-04 19:54 ` [dpdk-dev] [PATCH v2 00/14] " Ori Kam
                     ` (4 preceding siblings ...)
  2019-10-04 19:54   ` [dpdk-dev] [PATCH v2 05/14] net/mlx5: support Tx hairpin queues Ori Kam
@ 2019-10-04 19:54   ` Ori Kam
  2019-10-04 19:54   ` [dpdk-dev] [PATCH v2 07/14] app/testpmd: add hairpin support Ori Kam
                     ` (7 subsequent siblings)
  13 siblings, 0 replies; 186+ messages in thread
From: Ori Kam @ 2019-10-04 19:54 UTC (permalink / raw)
  To: Matan Azrad, Shahaf Shuler, Viacheslav Ovsiienko
  Cc: dev, orika, jingjing.wu, stephen

This commit adds the hairpin capabilities get function.
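
A hedged usage sketch, based on the rte_eth_hairpin_cap fields set
by the callback below (max_n_queues, max_rx_2_tx, max_tx_2_rx and
max_nb_desc); port_id is an illustrative value:

    /* Probe hairpin support before requesting hairpin queues. */
    struct rte_eth_hairpin_cap cap;

    if (rte_eth_dev_hairpin_capability_get(port_id, &cap) != 0) {
        printf("port %u: no hairpin support\n", port_id);
        return;
    }
    printf("port %u: up to %u descriptors per hairpin queue\n",
           port_id, cap.max_nb_desc);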

Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>

---
 drivers/net/mlx5/mlx5.c        |  2 ++
 drivers/net/mlx5/mlx5.h        |  3 ++-
 drivers/net/mlx5/mlx5_ethdev.c | 23 +++++++++++++++++++++++
 3 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index d010193..55b891a 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -1026,6 +1026,7 @@ struct mlx5_dev_spawn_data {
 	.udp_tunnel_port_add  = mlx5_udp_tunnel_port_add,
 	.get_module_info = mlx5_get_module_info,
 	.get_module_eeprom = mlx5_get_module_eeprom,
+	.hairpin_cap_get = mlx5_hairpin_cap_get,
 };
 
 /* Available operations from secondary process. */
@@ -1088,6 +1089,7 @@ struct mlx5_dev_spawn_data {
 	.is_removed = mlx5_is_removed,
 	.get_module_info = mlx5_get_module_info,
 	.get_module_eeprom = mlx5_get_module_eeprom,
+	.hairpin_cap_get = mlx5_hairpin_cap_get,
 };
 
 /**
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index f4c1680..b4c0c88 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -782,7 +782,8 @@ int mlx5_get_module_info(struct rte_eth_dev *dev,
 			 struct rte_eth_dev_module_info *modinfo);
 int mlx5_get_module_eeprom(struct rte_eth_dev *dev,
 			   struct rte_dev_eeprom_info *info);
-
+int mlx5_hairpin_cap_get(struct rte_eth_dev *dev,
+			 struct rte_eth_hairpin_cap *cap);
 /* mlx5_mac.c */
 
 int mlx5_get_mac(struct rte_eth_dev *dev, uint8_t (*mac)[RTE_ETHER_ADDR_LEN]);
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index aa645d0..9ed1cf0 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -618,6 +618,7 @@ struct ethtool_link_settings {
 			break;
 		}
 	}
+	info->dev_capa = RTE_ETH_DEV_CAPA_HAIRPIN_SUPPORT;
 	return 0;
 }
 
@@ -2028,3 +2029,25 @@ int mlx5_get_module_eeprom(struct rte_eth_dev *dev,
 	rte_free(eeprom);
 	return ret;
 }
+
+/**
+ * DPDK callback to retrieve hairpin capabilities.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ * @param[out] cap
+ *   Storage for hairpin capability data.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int mlx5_hairpin_cap_get(struct rte_eth_dev *dev,
+			 struct rte_eth_hairpin_cap *cap)
+{
+	dev = (void *)dev;
+	cap->max_n_queues = -1;
+	cap->max_rx_2_tx = 1;
+	cap->max_tx_2_rx = 1;
+	cap->max_nb_desc = 8192;
+	return 0;
+}
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 186+ messages in thread

* [dpdk-dev] [PATCH v2 07/14] app/testpmd: add hairpin support
  2019-10-04 19:54 ` [dpdk-dev] [PATCH v2 00/14] " Ori Kam
                     ` (5 preceding siblings ...)
  2019-10-04 19:54   ` [dpdk-dev] [PATCH v2 06/14] net/mlx5: add get hairpin capabilities Ori Kam
@ 2019-10-04 19:54   ` Ori Kam
  2019-10-04 19:54   ` [dpdk-dev] [PATCH v2 08/14] net/mlx5: add hairpin binding function Ori Kam
                     ` (6 subsequent siblings)
  13 siblings, 0 replies; 186+ messages in thread
From: Ori Kam @ 2019-10-04 19:54 UTC (permalink / raw)
  To: Wenzhuo Lu, Jingjing Wu, Bernard Iremonger; +Cc: dev, orika, stephen

This commit introduces hairpin queues to testpmd.
The hairpin queues are configured using --hairpinq=<n>.
This option adds n hairpin queue objects to both the total number
of Tx queues and Rx queues.
The queues are connected 1 to 1: the first Rx hairpin queue is
connected to the first Tx hairpin queue, and so on.
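
An illustrative invocation (the PCI address is an example), giving
the port two normal and two hairpin queues per direction:

    testpmd -w 0000:03:00.0 -- -i --rxq=2 --txq=2 --hairpinq=2

The port is then configured with 4 Rx and 4 Tx queues: queues 0-1
are normal, queues 2-3 are hairpin, Rx hairpin queue 2 is peered
with Tx hairpin queue 2 and Rx queue 3 with Tx queue 3.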

Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>

---
 app/test-pmd/parameters.c | 12 +++++++++
 app/test-pmd/testpmd.c    | 62 +++++++++++++++++++++++++++++++++++++++++++++--
 app/test-pmd/testpmd.h    |  1 +
 3 files changed, 73 insertions(+), 2 deletions(-)

diff --git a/app/test-pmd/parameters.c b/app/test-pmd/parameters.c
index 6c78dca..16bdcc8 100644
--- a/app/test-pmd/parameters.c
+++ b/app/test-pmd/parameters.c
@@ -147,6 +147,8 @@
 	printf("  --rxd=N: set the number of descriptors in RX rings to N.\n");
 	printf("  --txq=N: set the number of TX queues per port to N.\n");
 	printf("  --txd=N: set the number of descriptors in TX rings to N.\n");
+	printf("  --hairpinq=N: set the number of hairpin queues per port to "
+	       "N.\n");
 	printf("  --burst=N: set the number of packets per burst to N.\n");
 	printf("  --mbcache=N: set the cache of mbuf memory pool to N.\n");
 	printf("  --rxpt=N: set prefetch threshold register of RX rings to N.\n");
@@ -618,6 +620,7 @@
 		{ "txq",			1, 0, 0 },
 		{ "rxd",			1, 0, 0 },
 		{ "txd",			1, 0, 0 },
+		{ "hairpinq",			1, 0, 0 },
 		{ "burst",			1, 0, 0 },
 		{ "mbcache",			1, 0, 0 },
 		{ "txpt",			1, 0, 0 },
@@ -1036,6 +1039,15 @@
 						  " >= 0 && <= %u\n", n,
 						  get_allowed_max_nb_txq(&pid));
 			}
+			if (!strcmp(lgopts[opt_idx].name, "hairpinq")) {
+				n = atoi(optarg);
+				if (n >= 0 && check_nb_txq((queueid_t)n) == 0)
+					nb_hairpinq = (queueid_t) n;
+				else
+					rte_exit(EXIT_FAILURE, "txq %d invalid - must be"
+						  " >= 0 && <= %u\n", n,
+						  get_allowed_max_nb_txq(&pid));
+			}
 			if (!nb_rxq && !nb_txq) {
 				rte_exit(EXIT_FAILURE, "Either rx or tx queues should "
 						"be non-zero\n");
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 5701f31..25241f6 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -235,6 +235,7 @@ struct fwd_engine * fwd_engines[] = {
 /*
  * Configurable number of RX/TX queues.
  */
+queueid_t nb_hairpinq; /**< Number of hairpin queues per port. */
 queueid_t nb_rxq = 1; /**< Number of RX queues per port. */
 queueid_t nb_txq = 1; /**< Number of TX queues per port. */
 
@@ -2064,6 +2065,11 @@ struct extmem_param {
 	queueid_t qi;
 	struct rte_port *port;
 	struct rte_ether_addr mac_addr;
+	struct rte_eth_hairpin_conf hairpin_conf = {
+		.peer_n = 1,
+	};
+	int i;
+	struct rte_eth_hairpin_cap cap;
 
 	if (port_id_is_invalid(pid, ENABLED_WARN))
 		return 0;
@@ -2096,9 +2102,16 @@ struct extmem_param {
 			configure_rxtx_dump_callbacks(0);
 			printf("Configuring Port %d (socket %u)\n", pi,
 					port->socket_id);
+			if (nb_hairpinq > 0 &&
+			    rte_eth_dev_hairpin_capability_get(pi, &cap)) {
+				printf("Port %d doesn't support hairpin "
+				       "queues\n", pi);
+				return -1;
+			}
 			/* configure port */
-			diag = rte_eth_dev_configure(pi, nb_rxq, nb_txq,
-						&(port->dev_conf));
+			diag = rte_eth_dev_configure(pi, nb_rxq + nb_hairpinq,
+						     nb_txq + nb_hairpinq,
+						     &(port->dev_conf));
 			if (diag != 0) {
 				if (rte_atomic16_cmpset(&(port->port_status),
 				RTE_PORT_HANDLING, RTE_PORT_STOPPED) == 0)
@@ -2191,6 +2204,51 @@ struct extmem_param {
 				port->need_reconfig_queues = 1;
 				return -1;
 			}
+			/* setup hairpin queues */
+			i = 0;
+			for (qi = nb_txq; qi < nb_hairpinq + nb_txq; qi++) {
+				hairpin_conf.peers[0].port = pi;
+				hairpin_conf.peers[0].queue = i + nb_rxq;
+				diag = rte_eth_tx_hairpin_queue_setup
+					(pi, qi, nb_txd, &hairpin_conf);
+				i++;
+				if (diag == 0)
+					continue;
+
+				/* Failed to set up Tx hairpin queue, return. */
+				if (rte_atomic16_cmpset(&(port->port_status),
+							RTE_PORT_HANDLING,
+							RTE_PORT_STOPPED) == 0)
+					printf("Port %d can not be set back "
+							"to stopped\n", pi);
+				printf("Fail to configure port %d hairpin "
+				       "queues\n", pi);
+				/* try to reconfigure queues next time */
+				port->need_reconfig_queues = 1;
+				return -1;
+			}
+			i = 0;
+			for (qi = nb_rxq; qi < nb_hairpinq + nb_rxq; qi++) {
+				hairpin_conf.peers[0].port = pi;
+				hairpin_conf.peers[0].queue = i + nb_txq;
+				diag = rte_eth_rx_hairpin_queue_setup
+					(pi, qi, nb_rxd, &hairpin_conf);
+				i++;
+				if (diag == 0)
+					continue;
+
+				/* Failed to set up Rx hairpin queue, return. */
+				if (rte_atomic16_cmpset(&(port->port_status),
+							RTE_PORT_HANDLING,
+							RTE_PORT_STOPPED) == 0)
+					printf("Port %d can not be set back "
+							"to stopped\n", pi);
+				printf("Fail to configure port %d hairpin "
+				       "queues\n", pi);
+				/* try to reconfigure queues next time */
+				port->need_reconfig_queues = 1;
+				return -1;
+			}
 		}
 		configure_rxtx_dump_callbacks(verbose_level);
 		/* start port */
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index f8ebe71..a28d043 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -383,6 +383,7 @@ struct queue_stats_mappings {
 
 extern uint64_t rss_hf;
 
+extern queueid_t nb_hairpinq;
 extern queueid_t nb_rxq;
 extern queueid_t nb_txq;
 
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 186+ messages in thread

* [dpdk-dev] [PATCH v2 08/14] net/mlx5: add hairpin binding function
  2019-10-04 19:54 ` [dpdk-dev] [PATCH v2 00/14] " Ori Kam
                     ` (6 preceding siblings ...)
  2019-10-04 19:54   ` [dpdk-dev] [PATCH v2 07/14] app/testpmd: add hairpin support Ori Kam
@ 2019-10-04 19:54   ` Ori Kam
  2019-10-04 19:54   ` [dpdk-dev] [PATCH v2 09/14] net/mlx5: add support for hairpin hrxq Ori Kam
                     ` (5 subsequent siblings)
  13 siblings, 0 replies; 186+ messages in thread
From: Ori Kam @ 2019-10-04 19:54 UTC (permalink / raw)
  To: Matan Azrad, Shahaf Shuler, Viacheslav Ovsiienko
  Cc: dev, orika, jingjing.wu, stephen

When starting the port, in addition to creating the queues,
we need to bind the hairpin Tx queues to their peer Rx queues.
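
In outline, the bind is two DevX modify calls per queue pair, first
on the SQ and then on the RQ (a condensed sketch of the code added
below; the vhca_id comes from the queried HCA capabilities):

    sq_attr.sq_state = MLX5_SQC_STATE_RST;  /* Current state. */
    sq_attr.state = MLX5_SQC_STATE_RDY;     /* Target state. */
    sq_attr.hairpin_peer_rq = rq->id;
    sq_attr.hairpin_peer_vhca = priv->config.hca_attr.vhca_id;
    if (mlx5_devx_cmd_modify_sq(sq, &sq_attr))
        goto error;
    rq_attr.rq_state = MLX5_SQC_STATE_RST;
    rq_attr.state = MLX5_SQC_STATE_RDY;
    rq_attr.hairpin_peer_sq = sq->id;
    rq_attr.hairpin_peer_vhca = priv->config.hca_attr.vhca_id;
    if (mlx5_devx_cmd_modify_rq(rq, &rq_attr))
        goto error;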

Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/net/mlx5/mlx5.h           |  1 +
 drivers/net/mlx5/mlx5_devx_cmds.c |  1 +
 drivers/net/mlx5/mlx5_prm.h       |  6 +++
 drivers/net/mlx5/mlx5_trigger.c   | 97 +++++++++++++++++++++++++++++++++++++++
 4 files changed, 105 insertions(+)

diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index b4c0c88..6f7ad9b 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -188,6 +188,7 @@ struct mlx5_hca_attr {
 	uint32_t log_max_hairpin_queues:5;
 	uint32_t log_max_hairpin_wq_data_sz:5;
 	uint32_t log_max_hairpin_num_packets:5;
+	uint32_t vhca_id:16;
 };
 
 /* Flow list . */
diff --git a/drivers/net/mlx5/mlx5_devx_cmds.c b/drivers/net/mlx5/mlx5_devx_cmds.c
index 917bbf9..0243733 100644
--- a/drivers/net/mlx5/mlx5_devx_cmds.c
+++ b/drivers/net/mlx5/mlx5_devx_cmds.c
@@ -334,6 +334,7 @@ struct mlx5_devx_obj *
 						    log_max_hairpin_wq_data_sz);
 	attr->log_max_hairpin_num_packets = MLX5_GET
 		(cmd_hca_cap, hcattr, log_min_hairpin_wq_data_sz);
+	attr->vhca_id = MLX5_GET(cmd_hca_cap, hcattr, vhca_id);
 	attr->eth_net_offloads = MLX5_GET(cmd_hca_cap, hcattr,
 					  eth_net_offloads);
 	attr->eth_virt = MLX5_GET(cmd_hca_cap, hcattr, eth_virt);
diff --git a/drivers/net/mlx5/mlx5_prm.h b/drivers/net/mlx5/mlx5_prm.h
index faa7996..d4084db 100644
--- a/drivers/net/mlx5/mlx5_prm.h
+++ b/drivers/net/mlx5/mlx5_prm.h
@@ -1611,6 +1611,12 @@ struct mlx5_ifc_create_rqt_in_bits {
 #pragma GCC diagnostic error "-Wpedantic"
 #endif
 
+enum {
+	MLX5_SQC_STATE_RST  = 0x0,
+	MLX5_SQC_STATE_RDY  = 0x1,
+	MLX5_SQC_STATE_ERR  = 0x3,
+};
+
 struct mlx5_ifc_sqc_bits {
 	u8 rlky[0x1];
 	u8 cd_master[0x1];
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index 3ec86c4..a4fcdb3 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -162,6 +162,96 @@
 }
 
 /**
+ * Binds Tx queues to Rx queues for hairpin.
+ *
+ * Binds Tx queues to the target Rx queues.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx5_hairpin_bind(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_devx_modify_sq_attr sq_attr = { 0 };
+	struct mlx5_devx_modify_rq_attr rq_attr = { 0 };
+	struct mlx5_txq_ctrl *txq_ctrl;
+	struct mlx5_rxq_ctrl *rxq_ctrl;
+	struct mlx5_devx_obj *sq;
+	struct mlx5_devx_obj *rq;
+	unsigned int i;
+	int ret = 0;
+
+	for (i = 0; i != priv->txqs_n; ++i) {
+		txq_ctrl = mlx5_txq_get(dev, i);
+		if (!txq_ctrl)
+			continue;
+		if (txq_ctrl->type != MLX5_TXQ_TYPE_HAIRPIN) {
+			mlx5_txq_release(dev, i);
+			continue;
+		}
+		if (!txq_ctrl->obj) {
+			rte_errno = ENOMEM;
+			DRV_LOG(ERR, "port %u no txq object found: %d",
+				dev->data->port_id, i);
+			mlx5_txq_release(dev, i);
+			return -rte_errno;
+		}
+		sq = txq_ctrl->obj->sq;
+		rxq_ctrl = mlx5_rxq_get(dev,
+					txq_ctrl->hairpin_conf.peers[0].queue);
+		if (!rxq_ctrl) {
+			mlx5_txq_release(dev, i);
+			rte_errno = EINVAL;
+			DRV_LOG(ERR, "port %u no rxq object found: %d",
+				dev->data->port_id,
+				txq_ctrl->hairpin_conf.peers[0].queue);
+			return -rte_errno;
+		}
+		if (rxq_ctrl->type != MLX5_RXQ_TYPE_HAIRPIN ||
+		    rxq_ctrl->hairpin_conf.peers[0].queue != i) {
+			rte_errno = ENOMEM;
+			DRV_LOG(ERR, "port %u Tx queue %d can't be binded to "
+				"Rx queue %d", dev->data->port_id,
+				i, txq_ctrl->hairpin_conf.peers[0].queue);
+			goto error;
+		}
+		rq = rxq_ctrl->obj->rq;
+		if (!rq) {
+			rte_errno = ENOMEM;
+			DRV_LOG(ERR, "port %u hairpin no matching rxq: %d",
+				dev->data->port_id,
+				txq_ctrl->hairpin_conf.peers[0].queue);
+			goto error;
+		}
+		sq_attr.state = MLX5_SQC_STATE_RDY;
+		sq_attr.sq_state = MLX5_SQC_STATE_RST;
+		sq_attr.hairpin_peer_rq = rq->id;
+		sq_attr.hairpin_peer_vhca = priv->config.hca_attr.vhca_id;
+		ret = mlx5_devx_cmd_modify_sq(sq, &sq_attr);
+		if (ret)
+			goto error;
+		rq_attr.state = MLX5_SQC_STATE_RDY;
+		rq_attr.rq_state = MLX5_SQC_STATE_RST;
+		rq_attr.hairpin_peer_sq = sq->id;
+		rq_attr.hairpin_peer_vhca = priv->config.hca_attr.vhca_id;
+		ret = mlx5_devx_cmd_modify_rq(rq, &rq_attr);
+		if (ret)
+			goto error;
+		mlx5_txq_release(dev, i);
+		mlx5_rxq_release(dev, txq_ctrl->hairpin_conf.peers[0].queue);
+	}
+	return 0;
+error:
+	mlx5_txq_release(dev, i);
+	mlx5_rxq_release(dev, txq_ctrl->hairpin_conf.peers[0].queue);
+	return -rte_errno;
+}
+
+/**
  * DPDK callback to start the device.
  *
  * Simulate device start by attaching all configured flows.
@@ -192,6 +282,13 @@
 		mlx5_txq_stop(dev);
 		return -rte_errno;
 	}
+	ret = mlx5_hairpin_bind(dev);
+	if (ret) {
+		DRV_LOG(ERR, "port %u hairpin binding failed: %s",
+			dev->data->port_id, strerror(rte_errno));
+		mlx5_txq_stop(dev);
+		return -rte_errno;
+	}
 	dev->data->dev_started = 1;
 	ret = mlx5_rx_intr_vec_enable(dev);
 	if (ret) {
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 186+ messages in thread

* [dpdk-dev] [PATCH v2 09/14] net/mlx5: add support for hairpin hrxq
  2019-10-04 19:54 ` [dpdk-dev] [PATCH v2 00/14] " Ori Kam
                     ` (7 preceding siblings ...)
  2019-10-04 19:54   ` [dpdk-dev] [PATCH v2 08/14] net/mlx5: add hairpin binding function Ori Kam
@ 2019-10-04 19:54   ` Ori Kam
  2019-10-04 19:54   ` [dpdk-dev] [PATCH v2 10/14] net/mlx5: add internal tag item and action Ori Kam
                     ` (4 subsequent siblings)
  13 siblings, 0 replies; 186+ messages in thread
From: Ori Kam @ 2019-10-04 19:54 UTC (permalink / raw)
  To: Matan Azrad, Shahaf Shuler, Viacheslav Ovsiienko
  Cc: dev, orika, jingjing.wu, stephen

Add support for RSS on hairpin queues.
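
Since the default reta now covers only the standard queues, an
application spreads traffic across hairpin queues with an explicit
rule. Assuming the testpmd layout from the previous patch
(--rxq=2 --hairpinq=2, so queues 2-3 are the hairpin ones):

    testpmd> flow create 0 ingress pattern eth / ipv4 / end \
        actions rss queues 2 3 end / end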

Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/net/mlx5/mlx5.h         |   3 ++
 drivers/net/mlx5/mlx5_ethdev.c  | 102 ++++++++++++++++++++++++++++++----------
 drivers/net/mlx5/mlx5_rss.c     |   1 +
 drivers/net/mlx5/mlx5_rxq.c     |  22 ++++++---
 drivers/net/mlx5/mlx5_trigger.c |   6 +++
 5 files changed, 104 insertions(+), 30 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 6f7ad9b..9e81045 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -711,6 +711,7 @@ struct mlx5_priv {
 	rte_spinlock_t uar_lock[MLX5_UAR_PAGE_NUM_MAX];
 	/* UAR same-page access control required in 32bit implementations. */
 #endif
+	uint8_t skip_default_rss_reta; /* Skip configuration of default reta. */
 };
 
 #define PORT_ID(priv) ((priv)->dev_data->port_id)
@@ -785,6 +786,8 @@ int mlx5_get_module_eeprom(struct rte_eth_dev *dev,
 			   struct rte_dev_eeprom_info *info);
 int mlx5_hairpin_cap_get(struct rte_eth_dev *dev,
 			 struct rte_eth_hairpin_cap *cap);
+int mlx5_dev_configure_rss_reta(struct rte_eth_dev *dev);
+
 /* mlx5_mac.c */
 
 int mlx5_get_mac(struct rte_eth_dev *dev, uint8_t (*mac)[RTE_ETHER_ADDR_LEN]);
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index 9ed1cf0..a07d4f0 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -383,9 +383,6 @@ struct ethtool_link_settings {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	unsigned int rxqs_n = dev->data->nb_rx_queues;
 	unsigned int txqs_n = dev->data->nb_tx_queues;
-	unsigned int i;
-	unsigned int j;
-	unsigned int reta_idx_n;
 	const uint8_t use_app_rss_key =
 		!!dev->data->dev_conf.rx_adv_conf.rss_conf.rss_key;
 	int ret = 0;
@@ -431,28 +428,8 @@ struct ethtool_link_settings {
 		DRV_LOG(INFO, "port %u Rx queues number update: %u -> %u",
 			dev->data->port_id, priv->rxqs_n, rxqs_n);
 		priv->rxqs_n = rxqs_n;
-		/*
-		 * If the requested number of RX queues is not a power of two,
-		 * use the maximum indirection table size for better balancing.
-		 * The result is always rounded to the next power of two.
-		 */
-		reta_idx_n = (1 << log2above((rxqs_n & (rxqs_n - 1)) ?
-					     priv->config.ind_table_max_size :
-					     rxqs_n));
-		ret = mlx5_rss_reta_index_resize(dev, reta_idx_n);
-		if (ret)
-			return ret;
-		/*
-		 * When the number of RX queues is not a power of two,
-		 * the remaining table entries are padded with reused WQs
-		 * and hashes are not spread uniformly.
-		 */
-		for (i = 0, j = 0; (i != reta_idx_n); ++i) {
-			(*priv->reta_idx)[i] = j;
-			if (++j == rxqs_n)
-				j = 0;
-		}
 	}
+	priv->skip_default_rss_reta = 0;
 	ret = mlx5_proc_priv_init(dev);
 	if (ret)
 		return ret;
@@ -460,6 +437,83 @@ struct ethtool_link_settings {
 }
 
 /**
+ * Configure default RSS reta.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_dev_configure_rss_reta(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	unsigned int rxqs_n = dev->data->nb_rx_queues;
+	unsigned int i;
+	unsigned int j;
+	unsigned int reta_idx_n;
+	int ret = 0;
+	unsigned int *rss_queue_arr = NULL;
+	unsigned int rss_queue_n = 0;
+
+	if (priv->skip_default_rss_reta)
+		return ret;
+	rss_queue_arr = rte_malloc("", rxqs_n * sizeof(unsigned int), 0);
+	if (!rss_queue_arr) {
+		DRV_LOG(ERR, "port %u cannot allocate RSS queue list (%u)",
+			dev->data->port_id, rxqs_n);
+		rte_errno = ENOMEM;
+		return -rte_errno;
+	}
+	for (i = 0, j = 0; i < rxqs_n; i++) {
+		struct mlx5_rxq_data *rxq_data;
+		struct mlx5_rxq_ctrl *rxq_ctrl;
+
+		rxq_data = (*priv->rxqs)[i];
+		rxq_ctrl = container_of(rxq_data, struct mlx5_rxq_ctrl, rxq);
+		if (rxq_ctrl->type == MLX5_RXQ_TYPE_STANDARD)
+			rss_queue_arr[j++] = i;
+	}
+	rss_queue_n = j;
+	if (rss_queue_n > priv->config.ind_table_max_size) {
+		DRV_LOG(ERR, "port %u cannot handle this many Rx queues (%u)",
+			dev->data->port_id, rss_queue_n);
+		rte_errno = EINVAL;
+		rte_free(rss_queue_arr);
+		return -rte_errno;
+	}
+	DRV_LOG(INFO, "port %u Rx queues number update: %u -> %u",
+		dev->data->port_id, priv->rxqs_n, rxqs_n);
+	priv->rxqs_n = rxqs_n;
+	/*
+	 * If the requested number of RX queues is not a power of two,
+	 * use the maximum indirection table size for better balancing.
+	 * The result is always rounded to the next power of two.
+	 */
+	reta_idx_n = (1 << log2above((rss_queue_n & (rss_queue_n - 1)) ?
+				priv->config.ind_table_max_size :
+				rss_queue_n));
+	ret = mlx5_rss_reta_index_resize(dev, reta_idx_n);
+	if (ret) {
+		rte_free(rss_queue_arr);
+		return ret;
+	}
+	/*
+	 * When the number of RX queues is not a power of two,
+	 * the remaining table entries are padded with reused WQs
+	 * and hashes are not spread uniformly.
+	 */
+	for (i = 0, j = 0; (i != reta_idx_n); ++i) {
+		(*priv->reta_idx)[i] = rss_queue_arr[j];
+		if (++j == rss_queue_n)
+			j = 0;
+	}
+	rte_free(rss_queue_arr);
+	return ret;
+}
+
+/**
  * Sets default tuning parameters.
  *
  * @param dev
diff --git a/drivers/net/mlx5/mlx5_rss.c b/drivers/net/mlx5/mlx5_rss.c
index 891d764..1028264 100644
--- a/drivers/net/mlx5/mlx5_rss.c
+++ b/drivers/net/mlx5/mlx5_rss.c
@@ -223,6 +223,7 @@
 	}
 	if (dev->data->dev_started) {
 		mlx5_dev_stop(dev);
+		priv->skip_default_rss_reta = 1;
 		return mlx5_dev_start(dev);
 	}
 	return 0;
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index 97a2031..a8ff8b2 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -2156,9 +2156,13 @@ struct mlx5_rxq_ctrl *
 		}
 	} else { /* ind_tbl->type == MLX5_IND_TBL_TYPE_DEVX */
 		struct mlx5_devx_rqt_attr *rqt_attr = NULL;
+		const unsigned int rqt_n =
+			1 << (rte_is_power_of_2(queues_n) ?
+			      log2above(queues_n) :
+			      log2above(priv->config.ind_table_max_size));
 
 		rqt_attr = rte_calloc(__func__, 1, sizeof(*rqt_attr) +
-				      queues_n * sizeof(uint16_t), 0);
+				      rqt_n * sizeof(uint32_t), 0);
 		if (!rqt_attr) {
 			DRV_LOG(ERR, "port %u cannot allocate RQT resources",
 				dev->data->port_id);
@@ -2166,7 +2170,7 @@ struct mlx5_rxq_ctrl *
 			goto error;
 		}
 		rqt_attr->rqt_max_size = priv->config.ind_table_max_size;
-		rqt_attr->rqt_actual_size = queues_n;
+		rqt_attr->rqt_actual_size = rqt_n;
 		for (i = 0; i != queues_n; ++i) {
 			struct mlx5_rxq_ctrl *rxq = mlx5_rxq_get(dev,
 								 queues[i]);
@@ -2175,6 +2179,9 @@ struct mlx5_rxq_ctrl *
 			rqt_attr->rq_list[i] = rxq->obj->rq->id;
 			ind_tbl->queues[i] = queues[i];
 		}
+		k = i; /* Retain value of i for use in error case. */
+		for (j = 0; k != rqt_n; ++k, ++j)
+			rqt_attr->rq_list[k] = rqt_attr->rq_list[j];
 		ind_tbl->rqt = mlx5_devx_cmd_create_rqt(priv->sh->ctx,
 							rqt_attr);
 		rte_free(rqt_attr);
@@ -2328,13 +2335,13 @@ struct mlx5_hrxq *
 	struct mlx5_ind_table_obj *ind_tbl;
 	int err;
 	struct mlx5_devx_obj *tir = NULL;
+	struct mlx5_rxq_data *rxq_data = (*priv->rxqs)[queues[0]];
+	struct mlx5_rxq_ctrl *rxq_ctrl =
+		container_of(rxq_data, struct mlx5_rxq_ctrl, rxq);
 
 	queues_n = hash_fields ? queues_n : 1;
 	ind_tbl = mlx5_ind_table_obj_get(dev, queues, queues_n);
 	if (!ind_tbl) {
-		struct mlx5_rxq_data *rxq_data = (*priv->rxqs)[queues[0]];
-		struct mlx5_rxq_ctrl *rxq_ctrl =
-			container_of(rxq_data, struct mlx5_rxq_ctrl, rxq);
 		enum mlx5_ind_tbl_type type;
 
 		type = rxq_ctrl->obj->type == MLX5_RXQ_OBJ_TYPE_IBV ?
@@ -2430,7 +2437,10 @@ struct mlx5_hrxq *
 		tir_attr.rx_hash_fn = MLX5_RX_HASH_FN_TOEPLITZ;
 		memcpy(&tir_attr.rx_hash_field_selector_outer, &hash_fields,
 		       sizeof(uint64_t));
-		tir_attr.transport_domain = priv->sh->tdn;
+		if (rxq_ctrl->obj->type == MLX5_RXQ_OBJ_TYPE_DEVX_HAIRPIN)
+			tir_attr.transport_domain = priv->sh->td->id;
+		else
+			tir_attr.transport_domain = priv->sh->tdn;
 		memcpy(tir_attr.rx_hash_toeplitz_key, rss_key, rss_key_len);
 		tir_attr.indirect_table = ind_tbl->rqt->id;
 		if (dev->data->dev_conf.lpbk_mode)
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index a4fcdb3..f66b6ee 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -269,6 +269,12 @@
 	int ret;
 
 	DRV_LOG(DEBUG, "port %u starting device", dev->data->port_id);
+	ret = mlx5_dev_configure_rss_reta(dev);
+	if (ret) {
+		DRV_LOG(ERR, "port %u reta config failed: %s",
+			dev->data->port_id, strerror(rte_errno));
+		return -rte_errno;
+	}
 	ret = mlx5_txq_start(dev);
 	if (ret) {
 		DRV_LOG(ERR, "port %u Tx queue allocation failed: %s",
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 186+ messages in thread

* [dpdk-dev] [PATCH v2 10/14] net/mlx5: add internal tag item and action
  2019-10-04 19:54 ` [dpdk-dev] [PATCH v2 00/14] " Ori Kam
                     ` (8 preceding siblings ...)
  2019-10-04 19:54   ` [dpdk-dev] [PATCH v2 09/14] net/mlx5: add support for hairpin hrxq Ori Kam
@ 2019-10-04 19:54   ` Ori Kam
  2019-10-04 19:54   ` [dpdk-dev] [PATCH v2 11/14] net/mlx5: add id generation function Ori Kam
                     ` (3 subsequent siblings)
  13 siblings, 0 replies; 186+ messages in thread
From: Ori Kam @ 2019-10-04 19:54 UTC (permalink / raw)
  To: Matan Azrad, Shahaf Shuler, Viacheslav Ovsiienko
  Cc: dev, orika, jingjing.wu, stephen

This commit introduces setting and matching on registers.
This item and action will be used by a number of different
features such as hairpin, metering and metadata.
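
As a sketch of the intended (PMD-internal) use, spliced into an
action list the driver builds for itself; the negative enum values
below keep these types out of the public rte_flow number space, and
flow_id is a placeholder value:

    struct mlx5_rte_flow_action_set_tag set_tag = {
        .id = REG_B,                       /* Target register. */
        .data = rte_cpu_to_be_32(flow_id), /* Value to write. */
    };
    struct rte_flow_action actions[] = {
        {
            .type = (enum rte_flow_action_type)
                MLX5_RTE_FLOW_ACTION_TYPE_TAG,
            .conf = &set_tag,
        },
        { .type = RTE_FLOW_ACTION_TYPE_END },
    };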

Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>

---
 drivers/net/mlx5/mlx5_flow.c    |  52 +++++++++++++
 drivers/net/mlx5/mlx5_flow.h    |  54 ++++++++++++--
 drivers/net/mlx5/mlx5_flow_dv.c | 158 +++++++++++++++++++++++++++++++++++++++-
 drivers/net/mlx5/mlx5_prm.h     |   3 +-
 4 files changed, 257 insertions(+), 10 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index 482f65b..00afc18 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -316,6 +316,58 @@ struct mlx5_flow_tunnel_info {
 	},
 };
 
+enum mlx5_feature_name {
+	MLX5_HAIRPIN_RX,
+	MLX5_HAIRPIN_TX,
+	MLX5_APPLICATION,
+};
+
+/**
+ * Translate tag ID to register.
+ *
+ * @param[in] dev
+ *   Pointer to the Ethernet device structure.
+ * @param[in] feature
+ *   The feature that request the register.
+ * @param[in] id
+ *   The request register ID.
+ * @param[out] error
+ *   Error description in case of any.
+ *
+ * @return
+ *   The request register on success, a negative errno
+ *   value otherwise and rte_errno is set.
+ */
+__rte_unused
+static enum modify_reg flow_get_reg_id(struct rte_eth_dev *dev,
+				       enum mlx5_feature_name feature,
+				       uint32_t id,
+				       struct rte_flow_error *error)
+{
+	static enum modify_reg id2reg[] = {
+		[0] = REG_A,
+		[1] = REG_C_2,
+		[2] = REG_C_3,
+		[3] = REG_C_4,
+		[4] = REG_B,
+	};
+
+	dev = (void *)dev;
+	switch (feature) {
+	case MLX5_HAIRPIN_RX:
+		return REG_B;
+	case MLX5_HAIRPIN_TX:
+		return REG_A;
+	case MLX5_APPLICATION:
+		if (id > 4)
+			return rte_flow_error_set(error, EINVAL,
+						  RTE_FLOW_ERROR_TYPE_ITEM,
+						  NULL, "invalid tag id");
+		return id2reg[id];
+	}
+	return rte_flow_error_set(error, EINVAL, RTE_FLOW_ERROR_TYPE_ITEM,
+				  NULL, "invalid feature name");
+}
+
 /**
  * Discover the maximum number of priority available.
  *
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 235bccd..0148c1b 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -27,6 +27,43 @@
 #include "mlx5.h"
 #include "mlx5_prm.h"
 
+enum modify_reg {
+	REG_A,
+	REG_B,
+	REG_C_0,
+	REG_C_1,
+	REG_C_2,
+	REG_C_3,
+	REG_C_4,
+	REG_C_5,
+	REG_C_6,
+	REG_C_7,
+};
+
+/* Private rte flow items. */
+enum mlx5_rte_flow_item_type {
+	MLX5_RTE_FLOW_ITEM_TYPE_END = INT_MIN,
+	MLX5_RTE_FLOW_ITEM_TYPE_TAG,
+};
+
+/* Private rte flow actions. */
+enum mlx5_rte_flow_action_type {
+	MLX5_RTE_FLOW_ACTION_TYPE_END = INT_MIN,
+	MLX5_RTE_FLOW_ACTION_TYPE_TAG,
+};
+
+/* Matches on selected register. */
+struct mlx5_rte_flow_item_tag {
+	uint16_t id;
+	rte_be32_t data;
+};
+
+/* Modify selected register. */
+struct mlx5_rte_flow_action_set_tag {
+	uint16_t id;
+	rte_be32_t data;
+};
+
 /* Pattern outer Layer bits. */
 #define MLX5_FLOW_LAYER_OUTER_L2 (1u << 0)
 #define MLX5_FLOW_LAYER_OUTER_L3_IPV4 (1u << 1)
@@ -53,16 +90,17 @@
 /* General pattern items bits. */
 #define MLX5_FLOW_ITEM_METADATA (1u << 16)
 #define MLX5_FLOW_ITEM_PORT_ID (1u << 17)
+#define MLX5_FLOW_ITEM_TAG (1u << 18)
 
 /* Pattern MISC bits. */
-#define MLX5_FLOW_LAYER_ICMP (1u << 18)
-#define MLX5_FLOW_LAYER_ICMP6 (1u << 19)
-#define MLX5_FLOW_LAYER_GRE_KEY (1u << 20)
+#define MLX5_FLOW_LAYER_ICMP (1u << 19)
+#define MLX5_FLOW_LAYER_ICMP6 (1u << 20)
+#define MLX5_FLOW_LAYER_GRE_KEY (1u << 21)
 
 /* Pattern tunnel Layer bits (continued). */
-#define MLX5_FLOW_LAYER_IPIP (1u << 21)
-#define MLX5_FLOW_LAYER_IPV6_ENCAP (1u << 22)
-#define MLX5_FLOW_LAYER_NVGRE (1u << 23)
+#define MLX5_FLOW_LAYER_IPIP (1u << 22)
+#define MLX5_FLOW_LAYER_IPV6_ENCAP (1u << 23)
+#define MLX5_FLOW_LAYER_NVGRE (1u << 24)
 
 /* Outer Masks. */
 #define MLX5_FLOW_LAYER_OUTER_L3 \
@@ -139,6 +177,7 @@
 #define MLX5_FLOW_ACTION_DEC_TCP_SEQ (1u << 29)
 #define MLX5_FLOW_ACTION_INC_TCP_ACK (1u << 30)
 #define MLX5_FLOW_ACTION_DEC_TCP_ACK (1u << 31)
+#define MLX5_FLOW_ACTION_SET_TAG (1ull << 32)
 
 #define MLX5_FLOW_FATE_ACTIONS \
 	(MLX5_FLOW_ACTION_DROP | MLX5_FLOW_ACTION_QUEUE | \
@@ -172,7 +211,8 @@
 				      MLX5_FLOW_ACTION_DEC_TCP_SEQ | \
 				      MLX5_FLOW_ACTION_INC_TCP_ACK | \
 				      MLX5_FLOW_ACTION_DEC_TCP_ACK | \
-				      MLX5_FLOW_ACTION_OF_SET_VLAN_VID)
+				      MLX5_FLOW_ACTION_OF_SET_VLAN_VID | \
+				      MLX5_FLOW_ACTION_SET_TAG)
 
 #define MLX5_FLOW_VLAN_ACTIONS (MLX5_FLOW_ACTION_OF_POP_VLAN | \
 				MLX5_FLOW_ACTION_OF_PUSH_VLAN)
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index 2a7e3ed..dde0831 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -723,6 +723,59 @@ struct field_modify_info modify_tcp[] = {
 					     MLX5_MODIFICATION_TYPE_ADD, error);
 }
 
+static enum mlx5_modification_field reg_to_field[] = {
+	[REG_A] = MLX5_MODI_META_DATA_REG_A,
+	[REG_B] = MLX5_MODI_META_DATA_REG_B,
+	[REG_C_0] = MLX5_MODI_META_REG_C_0,
+	[REG_C_1] = MLX5_MODI_META_REG_C_1,
+	[REG_C_2] = MLX5_MODI_META_REG_C_2,
+	[REG_C_3] = MLX5_MODI_META_REG_C_3,
+	[REG_C_4] = MLX5_MODI_META_REG_C_4,
+	[REG_C_5] = MLX5_MODI_META_REG_C_5,
+	[REG_C_6] = MLX5_MODI_META_REG_C_6,
+	[REG_C_7] = MLX5_MODI_META_REG_C_7,
+};
+
+/**
+ * Convert register set to DV specification.
+ *
+ * @param[in,out] resource
+ *   Pointer to the modify-header resource.
+ * @param[in] action
+ *   Pointer to action specification.
+ * @param[out] error
+ *   Pointer to the error structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+flow_dv_convert_action_set_reg
+			(struct mlx5_flow_dv_modify_hdr_resource *resource,
+			 const struct rte_flow_action *action,
+			 struct rte_flow_error *error)
+{
+	const struct mlx5_rte_flow_action_set_tag *conf = (action->conf);
+	struct mlx5_modification_cmd *actions = resource->actions;
+	uint32_t i = resource->actions_num;
+
+	if (i >= MLX5_MODIFY_NUM)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_ACTION, NULL,
+					  "too many items to modify");
+	actions[i].action_type = MLX5_MODIFICATION_TYPE_SET;
+	actions[i].field = reg_to_field[conf->id];
+	actions[i].data0 = rte_cpu_to_be_32(actions[i].data0);
+	actions[i].data1 = conf->data;
+	++i;
+	resource->actions_num = i;
+	if (!resource->actions_num)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_ACTION, NULL,
+					  "invalid modification flow item");
+	return 0;
+}
+
 /**
  * Validate META item.
  *
@@ -4640,6 +4693,94 @@ struct field_modify_info modify_tcp[] = {
 }
 
 /**
+ * Add tag item to matcher.
+ *
+ * @param[in, out] matcher
+ *   Flow matcher.
+ * @param[in, out] key
+ *   Flow matcher value.
+ * @param[in] item
+ *   Flow pattern to translate.
+ */
+static void
+flow_dv_translate_item_tag(void *matcher, void *key,
+			   const struct rte_flow_item *item)
+{
+	void *misc2_m =
+		MLX5_ADDR_OF(fte_match_param, matcher, misc_parameters_2);
+	void *misc2_v =
+		MLX5_ADDR_OF(fte_match_param, key, misc_parameters_2);
+	const struct mlx5_rte_flow_item_tag *tag_v = item->spec;
+	const struct mlx5_rte_flow_item_tag *tag_m = item->mask;
+	enum modify_reg reg = tag_v->id;
+	rte_be32_t value = tag_v->data;
+	rte_be32_t mask = tag_m->data;
+
+	switch (reg) {
+	case REG_A:
+		MLX5_SET(fte_match_set_misc2, misc2_m, metadata_reg_a,
+				rte_be_to_cpu_32(mask));
+		MLX5_SET(fte_match_set_misc2, misc2_v, metadata_reg_a,
+				rte_be_to_cpu_32(value));
+		break;
+	case REG_B:
+		MLX5_SET(fte_match_set_misc2, misc2_m, metadata_reg_b,
+				 rte_be_to_cpu_32(mask));
+		MLX5_SET(fte_match_set_misc2, misc2_v, metadata_reg_b,
+				rte_be_to_cpu_32(value));
+		break;
+	case REG_C_0:
+		MLX5_SET(fte_match_set_misc2, misc2_m, metadata_reg_c_0,
+				 rte_be_to_cpu_32(mask));
+		MLX5_SET(fte_match_set_misc2, misc2_v, metadata_reg_c_0,
+				rte_be_to_cpu_32(value));
+		break;
+	case REG_C_1:
+		MLX5_SET(fte_match_set_misc2, misc2_m, metadata_reg_c_1,
+				 rte_be_to_cpu_32(mask));
+		MLX5_SET(fte_match_set_misc2, misc2_v, metadata_reg_c_1,
+				rte_be_to_cpu_32(value));
+		break;
+	case REG_C_2:
+		MLX5_SET(fte_match_set_misc2, misc2_m, metadata_reg_c_2,
+				 rte_be_to_cpu_32(mask));
+		MLX5_SET(fte_match_set_misc2, misc2_v, metadata_reg_c_2,
+				rte_be_to_cpu_32(value));
+		break;
+	case REG_C_3:
+		MLX5_SET(fte_match_set_misc2, misc2_m, metadata_reg_c_3,
+				 rte_be_to_cpu_32(mask));
+		MLX5_SET(fte_match_set_misc2, misc2_v, metadata_reg_c_3,
+				rte_be_to_cpu_32(value));
+		break;
+	case REG_C_4:
+		MLX5_SET(fte_match_set_misc2, misc2_m, metadata_reg_c_4,
+				 rte_be_to_cpu_32(mask));
+		MLX5_SET(fte_match_set_misc2, misc2_v, metadata_reg_c_4,
+				rte_be_to_cpu_32(value));
+		break;
+	case REG_C_5:
+		MLX5_SET(fte_match_set_misc2, misc2_m, metadata_reg_c_5,
+				 rte_be_to_cpu_32(mask));
+		MLX5_SET(fte_match_set_misc2, misc2_v, metadata_reg_c_5,
+				rte_be_to_cpu_32(value));
+		break;
+	case REG_C_6:
+		MLX5_SET(fte_match_set_misc2, misc2_m, metadata_reg_c_6,
+				 rte_be_to_cpu_32(mask));
+		MLX5_SET(fte_match_set_misc2, misc2_v, metadata_reg_c_6,
+				rte_be_to_cpu_32(value));
+		break;
+	case REG_C_7:
+		MLX5_SET(fte_match_set_misc2, misc2_m, metadata_reg_c_7,
+				 rte_be_to_cpu_32(mask));
+		MLX5_SET(fte_match_set_misc2, misc2_v, metadata_reg_c_7,
+				rte_be_to_cpu_32(value));
+		break;
+	}
+}
+
+/**
  * Add source vport match to the specified matcher.
  *
  * @param[in, out] matcher
@@ -5225,8 +5366,9 @@ struct field_modify_info modify_tcp[] = {
 		struct mlx5_flow_tbl_resource *tbl;
 		uint32_t port_id = 0;
 		struct mlx5_flow_dv_port_id_action_resource port_id_resource;
+		int action_type = actions->type;
 
-		switch (actions->type) {
+		switch (action_type) {
 		case RTE_FLOW_ACTION_TYPE_VOID:
 			break;
 		case RTE_FLOW_ACTION_TYPE_PORT_ID:
@@ -5541,6 +5683,12 @@ struct field_modify_info modify_tcp[] = {
 					MLX5_FLOW_ACTION_INC_TCP_ACK :
 					MLX5_FLOW_ACTION_DEC_TCP_ACK;
 			break;
+		case MLX5_RTE_FLOW_ACTION_TYPE_TAG:
+			if (flow_dv_convert_action_set_reg(&res, actions,
+							   error))
+				return -rte_errno;
+			action_flags |= MLX5_FLOW_ACTION_SET_TAG;
+			break;
 		case RTE_FLOW_ACTION_TYPE_END:
 			actions_end = true;
 			if (action_flags & MLX5_FLOW_MODIFY_HDR_ACTIONS) {
@@ -5565,8 +5713,9 @@ struct field_modify_info modify_tcp[] = {
 	flow->actions = action_flags;
 	for (; items->type != RTE_FLOW_ITEM_TYPE_END; items++) {
 		int tunnel = !!(item_flags & MLX5_FLOW_LAYER_TUNNEL);
+		int item_type = items->type;
 
-		switch (items->type) {
+		switch (item_type) {
 		case RTE_FLOW_ITEM_TYPE_PORT_ID:
 			flow_dv_translate_item_port_id(dev, match_mask,
 						       match_value, items);
@@ -5712,6 +5861,11 @@ struct field_modify_info modify_tcp[] = {
 						      items, tunnel);
 			last_item = MLX5_FLOW_LAYER_ICMP6;
 			break;
+		case MLX5_RTE_FLOW_ITEM_TYPE_TAG:
+			flow_dv_translate_item_tag(match_mask, match_value,
+						   items);
+			last_item = MLX5_FLOW_ITEM_TAG;
+			break;
 		default:
 			break;
 		}
diff --git a/drivers/net/mlx5/mlx5_prm.h b/drivers/net/mlx5/mlx5_prm.h
index d4084db..695578f 100644
--- a/drivers/net/mlx5/mlx5_prm.h
+++ b/drivers/net/mlx5/mlx5_prm.h
@@ -623,7 +623,8 @@ struct mlx5_ifc_fte_match_set_misc2_bits {
 	u8 metadata_reg_c_1[0x20];
 	u8 metadata_reg_c_0[0x20];
 	u8 metadata_reg_a[0x20];
-	u8 reserved_at_1a0[0x60];
+	u8 metadata_reg_b[0x20];
+	u8 reserved_at_1c0[0x40];
 };
 
 struct mlx5_ifc_fte_match_set_misc3_bits {
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 186+ messages in thread

* [dpdk-dev] [PATCH v2 11/14] net/mlx5: add id generation function
  2019-10-04 19:54 ` [dpdk-dev] [PATCH v2 00/14] " Ori Kam
                     ` (9 preceding siblings ...)
  2019-10-04 19:54   ` [dpdk-dev] [PATCH v2 10/14] net/mlx5: add internal tag item and action Ori Kam
@ 2019-10-04 19:54   ` Ori Kam
  2019-10-04 19:54   ` [dpdk-dev] [PATCH v2 12/14] net/mlx5: add default flows for hairpin Ori Kam
                     ` (2 subsequent siblings)
  13 siblings, 0 replies; 186+ messages in thread
From: Ori Kam @ 2019-10-04 19:54 UTC (permalink / raw)
  To: Matan Azrad, Shahaf Shuler, Viacheslav Ovsiienko
  Cc: dev, orika, jingjing.wu, stephen

When splitting flows, for example in hairpin / metering, there is a need
to combine the flows. This is done using an ID.
This commit introduces a simple way to generate such IDs.

A bitmap was not used because its release and allocation operations are
O(n), while in the chosen approach both allocation and release are O(1).
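
The pool is essentially a LIFO stack of released IDs on top of a
monotonically growing base index. A minimal sketch of the concept (the
struct and function names below are illustrative only, not the actual
implementation):

	struct id_pool {
		uint32_t *free_arr;	/* Stack of released IDs. */
		uint32_t *curr;		/* Stack top (next slot to pop). */
		uint32_t base_index;	/* Highest fresh ID handed out. */
	};

	/* O(1): pop a previously released ID, else take a fresh one. */
	static uint32_t id_get(struct id_pool *p)
	{
		if (p->curr != p->free_arr)
			return *(--p->curr);	/* Reuse a released ID. */
		return ++p->base_index;		/* Never-used ID. */
	}

	/* O(1) amortized: push the released ID back onto the stack. */
	static void id_put(struct id_pool *p, uint32_t id)
	{
		*(p->curr++) = id;	/* The array is grown first if full. */
	}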

Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/net/mlx5/mlx5.c      | 120 ++++++++++++++++++++++++++++++++++++++++++-
 drivers/net/mlx5/mlx5_flow.h |  14 +++++
 2 files changed, 133 insertions(+), 1 deletion(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 55b891a..0b5c19c 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -179,6 +179,124 @@ struct mlx5_dev_spawn_data {
 static LIST_HEAD(, mlx5_ibv_shared) mlx5_ibv_list = LIST_HEAD_INITIALIZER();
 static pthread_mutex_t mlx5_ibv_list_mutex = PTHREAD_MUTEX_INITIALIZER;
 
+#define MLX5_FLOW_MIN_ID_POOL_SIZE 512
+#define MLX5_ID_GENERATION_ARRAY_FACTOR 16
+
+/**
+ * Allocate ID pool structure.
+ *
+ * @return
+ *   Pointer to pool object, NULL value otherwise.
+ */
+struct mlx5_flow_id_pool *
+mlx5_flow_id_pool_alloc(void)
+{
+	struct mlx5_flow_id_pool *pool;
+	void *mem;
+
+	pool = rte_zmalloc("id pool allocation", sizeof(*pool),
+			   RTE_CACHE_LINE_SIZE);
+	if (!pool) {
+		DRV_LOG(ERR, "can't allocate id pool");
+		rte_errno  = ENOMEM;
+		return NULL;
+	}
+	mem = rte_zmalloc("", MLX5_FLOW_MIN_ID_POOL_SIZE * sizeof(uint32_t),
+			  RTE_CACHE_LINE_SIZE);
+	if (!mem) {
+		DRV_LOG(ERR, "can't allocate mem for id pool");
+		rte_errno  = ENOMEM;
+		goto error;
+	}
+	pool->free_arr = mem;
+	pool->curr = pool->free_arr;
+	pool->last = pool->free_arr + MLX5_FLOW_MIN_ID_POOL_SIZE;
+	pool->base_index = 0;
+	return pool;
+error:
+	rte_free(pool);
+	return NULL;
+}
+
+/**
+ * Release ID pool structure.
+ *
+ * @param[in] pool
+ *   Pointer to flow id pool object to free.
+ */
+void
+mlx5_flow_id_pool_release(struct mlx5_flow_id_pool *pool)
+{
+	rte_free(pool->free_arr);
+	rte_free(pool);
+}
+
+/**
+ * Generate ID.
+ *
+ * @param[in] pool
+ *   Pointer to flow id pool.
+ * @param[out] id
+ *   The generated ID.
+ *
+ * @return
+ *   0 on success, error value otherwise.
+ */
+uint32_t
+mlx5_flow_id_get(struct mlx5_flow_id_pool *pool, uint32_t *id)
+{
+	if (pool->curr == pool->free_arr) {
+		if (pool->base_index == UINT32_MAX) {
+			rte_errno  = ENOMEM;
+			DRV_LOG(ERR, "no free id");
+			return -rte_errno;
+		}
+		*id = ++pool->base_index;
+		return 0;
+	}
+	*id = *(--pool->curr);
+	return 0;
+}
+
+/**
+ * Release ID.
+ *
+ * @param[in] pool
+ *   Pointer to flow id pool.
+ * @param[in] id
+ *   The ID to release.
+ *
+ * @return
+ *   0 on success, error value otherwise.
+ */
+uint32_t
+mlx5_flow_id_release(struct mlx5_flow_id_pool *pool, uint32_t id)
+{
+	uint32_t size;
+	uint32_t size2;
+	void *mem;
+
+	if (pool->curr == pool->last) {
+		size = pool->curr - pool->free_arr;
+		size2 = size * MLX5_ID_GENERATION_ARRAY_FACTOR;
+		assert(size2 > size);
+		mem = rte_malloc("", size2 * sizeof(uint32_t), 0);
+		if (!mem) {
+			DRV_LOG(ERR, "can't allocate mem for id pool");
+			rte_errno  = ENOMEM;
+			return -rte_errno;
+		}
+		memcpy(mem, pool->free_arr, size * sizeof(uint32_t));
+		rte_free(pool->free_arr);
+		pool->free_arr = mem;
+		pool->curr = pool->free_arr + size;
+		pool->last = pool->free_arr + size2;
+	}
+	*pool->curr = id;
+	pool->curr++;
+	return 0;
+}
+
 /**
  * Initialize the counters management structure.
  *
@@ -329,7 +447,7 @@ struct mlx5_dev_spawn_data {
 	struct mlx5_devx_tis_attr tis_attr = { 0 };
 #endif
 
 	assert(spawn);
 	/* Secondary process should not create the shared context. */
 	assert(rte_eal_process_type() == RTE_PROC_PRIMARY);
 	pthread_mutex_lock(&mlx5_ibv_list_mutex);
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 0148c1b..1b14fb7 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -495,8 +495,22 @@ struct mlx5_flow_driver_ops {
 #define MLX5_CNT_CONTAINER_UNUSED(sh, batch, thread) (&(sh)->cmng.ccont \
 	[(~((sh)->cmng.mhi[batch] >> (thread)) & 0x1) * 2 + (batch)])
 
+/* ID generation structure. */
+struct mlx5_flow_id_pool {
+	uint32_t *free_arr; /**< Pointer to the array of free values. */
+	uint32_t base_index;
+	/**< The next index that can be used without any free elements. */
+	uint32_t *curr; /**< Pointer to the index to pop. */
+	uint32_t *last; /**< Pointer to the last element in the array. */
+};
+
 /* mlx5_flow.c */
 
+struct mlx5_flow_id_pool *mlx5_flow_id_pool_alloc(void);
+void mlx5_flow_id_pool_release(struct mlx5_flow_id_pool *pool);
+uint32_t mlx5_flow_id_get(struct mlx5_flow_id_pool *pool, uint32_t *id);
+uint32_t mlx5_flow_id_release(struct mlx5_flow_id_pool *pool,
+			      uint32_t id);
 int mlx5_flow_group_to_table(const struct rte_flow_attr *attributes,
 			     bool external, uint32_t group, uint32_t *table,
 			     struct rte_flow_error *error);
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 186+ messages in thread

* [dpdk-dev] [PATCH v2 12/14] net/mlx5: add default flows for hairpin
  2019-10-04 19:54 ` [dpdk-dev] [PATCH v2 00/14] " Ori Kam
                     ` (10 preceding siblings ...)
  2019-10-04 19:54   ` [dpdk-dev] [PATCH v2 11/14] net/mlx5: add id generation function Ori Kam
@ 2019-10-04 19:54   ` Ori Kam
  2019-10-04 19:54   ` [dpdk-dev] [PATCH v2 13/14] net/mlx5: split hairpin flows Ori Kam
  2019-10-04 19:54   ` [dpdk-dev] [PATCH v2 14/14] doc: add hairpin feature Ori Kam
  13 siblings, 0 replies; 186+ messages in thread
From: Ori Kam @ 2019-10-04 19:54 UTC (permalink / raw)
  To: Matan Azrad, Shahaf Shuler, Viacheslav Ovsiienko
  Cc: dev, orika, jingjing.wu, stephen

When using hairpin, all traffic from TX hairpin queues should jump
to a dedicated table where matching can be done using registers.

Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>

---
 drivers/net/mlx5/mlx5.h         |  2 ++
 drivers/net/mlx5/mlx5_flow.c    | 60 +++++++++++++++++++++++++++++++++++++++
 drivers/net/mlx5/mlx5_flow.h    |  9 ++++++
 drivers/net/mlx5/mlx5_flow_dv.c | 63 +++++++++++++++++++++++++++++++++++++++--
 drivers/net/mlx5/mlx5_trigger.c | 18 ++++++++++++
 5 files changed, 150 insertions(+), 2 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 9e81045..c9d2ae0 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -556,6 +556,7 @@ struct mlx5_flow_tbl_resource {
 };
 
 #define MLX5_MAX_TABLES UINT16_MAX
+#define MLX5_HAIRPIN_TX_TABLE (UINT16_MAX - 1)
 #define MLX5_MAX_TABLES_FDB UINT16_MAX
 
 #define MLX5_DBR_PAGE_SIZE 4096 /* Must be >= 512. */
@@ -876,6 +877,7 @@ int mlx5_dev_filter_ctrl(struct rte_eth_dev *dev,
 int mlx5_flow_start(struct rte_eth_dev *dev, struct mlx5_flows *list);
 void mlx5_flow_stop(struct rte_eth_dev *dev, struct mlx5_flows *list);
 int mlx5_flow_verify(struct rte_eth_dev *dev);
+int mlx5_ctrl_flow_source_queue(struct rte_eth_dev *dev, uint32_t queue);
 int mlx5_ctrl_flow_vlan(struct rte_eth_dev *dev,
 			struct rte_flow_item_eth *eth_spec,
 			struct rte_flow_item_eth *eth_mask,
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index 00afc18..33ed204 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -2712,6 +2712,66 @@ struct rte_flow *
 }
 
 /**
+ * Enable default hairpin egress flow.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param queue
+ *   The queue index.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_ctrl_flow_source_queue(struct rte_eth_dev *dev,
+			    uint32_t queue)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	const struct rte_flow_attr attr = {
+		.egress = 1,
+		.priority = 0,
+	};
+	struct mlx5_rte_flow_item_tx_queue queue_spec = {
+		.queue = queue,
+	};
+	struct mlx5_rte_flow_item_tx_queue queue_mask = {
+		.queue = UINT32_MAX,
+	};
+	struct rte_flow_item items[] = {
+		{
+			.type = MLX5_RTE_FLOW_ITEM_TYPE_TX_QUEUE,
+			.spec = &queue_spec,
+			.last = NULL,
+			.mask = &queue_mask,
+		},
+		{
+			.type = RTE_FLOW_ITEM_TYPE_END,
+		},
+	};
+	struct rte_flow_action_jump jump = {
+		.group = MLX5_HAIRPIN_TX_TABLE,
+	};
+	struct rte_flow_action actions[2];
+	struct rte_flow *flow;
+	struct rte_flow_error error;
+
+	actions[0].type = RTE_FLOW_ACTION_TYPE_JUMP;
+	actions[0].conf = &jump;
+	actions[1].type = RTE_FLOW_ACTION_TYPE_END;
+	flow = flow_list_create(dev, &priv->ctrl_flows,
+				&attr, items, actions, false, &error);
+	if (!flow) {
+		DRV_LOG(DEBUG,
+			"Failed to create ctrl flow: rte_errno(%d),"
+			" type(%d), message(%s)\n",
+			rte_errno, error.type,
+			error.message ? error.message : " (no stated reason)");
+		return -rte_errno;
+	}
+	return 0;
+}
+
+/**
  * Enable a control flow configured from the control plane.
  *
  * @param dev
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 1b14fb7..bb67380 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -44,6 +44,7 @@ enum modify_reg {
 enum mlx5_rte_flow_item_type {
 	MLX5_RTE_FLOW_ITEM_TYPE_END = INT_MIN,
 	MLX5_RTE_FLOW_ITEM_TYPE_TAG,
+	MLX5_RTE_FLOW_ITEM_TYPE_TX_QUEUE,
 };
 
 /* Private rte flow actions. */
@@ -64,6 +65,11 @@ struct mlx5_rte_flow_action_set_tag {
 	rte_be32_t data;
 };
 
+/* Matches on source queue. */
+struct mlx5_rte_flow_item_tx_queue {
+	uint32_t queue;
+};
+
 /* Pattern outer Layer bits. */
 #define MLX5_FLOW_LAYER_OUTER_L2 (1u << 0)
 #define MLX5_FLOW_LAYER_OUTER_L3_IPV4 (1u << 1)
@@ -102,6 +108,9 @@ struct mlx5_rte_flow_action_set_tag {
 #define MLX5_FLOW_LAYER_IPV6_ENCAP (1u << 23)
 #define MLX5_FLOW_LAYER_NVGRE (1u << 24)
 
+/* Queue items. */
+#define MLX5_FLOW_ITEM_TX_QUEUE (1u << 25)
+
 /* Outer Masks. */
 #define MLX5_FLOW_LAYER_OUTER_L3 \
 	(MLX5_FLOW_LAYER_OUTER_L3_IPV4 | MLX5_FLOW_LAYER_OUTER_L3_IPV6)
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index dde0831..2b48680 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -3357,7 +3357,9 @@ struct field_modify_info modify_tcp[] = {
 		return ret;
 	for (; items->type != RTE_FLOW_ITEM_TYPE_END; items++) {
 		int tunnel = !!(item_flags & MLX5_FLOW_LAYER_TUNNEL);
-		switch (items->type) {
+		int type = items->type;
+
+		switch (type) {
 		case RTE_FLOW_ITEM_TYPE_VOID:
 			break;
 		case RTE_FLOW_ITEM_TYPE_PORT_ID:
@@ -3518,6 +3520,9 @@ struct field_modify_info modify_tcp[] = {
 				return ret;
 			last_item = MLX5_FLOW_LAYER_ICMP6;
 			break;
+		case MLX5_RTE_FLOW_ITEM_TYPE_TAG:
+		case MLX5_RTE_FLOW_ITEM_TYPE_TX_QUEUE:
+			break;
 		default:
 			return rte_flow_error_set(error, ENOTSUP,
 						  RTE_FLOW_ERROR_TYPE_ITEM,
@@ -3526,11 +3531,12 @@ struct field_modify_info modify_tcp[] = {
 		item_flags |= last_item;
 	}
 	for (; actions->type != RTE_FLOW_ACTION_TYPE_END; actions++) {
+		int type = actions->type;
 		if (actions_n == MLX5_DV_MAX_NUMBER_OF_ACTIONS)
 			return rte_flow_error_set(error, ENOTSUP,
 						  RTE_FLOW_ERROR_TYPE_ACTION,
 						  actions, "too many actions");
-		switch (actions->type) {
+		switch (type) {
 		case RTE_FLOW_ACTION_TYPE_VOID:
 			break;
 		case RTE_FLOW_ACTION_TYPE_PORT_ID:
@@ -3796,6 +3802,8 @@ struct field_modify_info modify_tcp[] = {
 						MLX5_FLOW_ACTION_INC_TCP_ACK :
 						MLX5_FLOW_ACTION_DEC_TCP_ACK;
 			break;
+		case MLX5_RTE_FLOW_ACTION_TYPE_TAG:
+			break;
 		default:
 			return rte_flow_error_set(error, ENOTSUP,
 						  RTE_FLOW_ERROR_TYPE_ACTION,
@@ -5291,6 +5299,51 @@ struct field_modify_info modify_tcp[] = {
 }
 
 /**
+ * Add Tx queue matcher.
+ *
+ * @param[in] dev
+ *   Pointer to the dev struct.
+ * @param[in, out] matcher
+ *   Flow matcher.
+ * @param[in, out] key
+ *   Flow matcher value.
+ * @param[in] item
+ *   Flow pattern to translate.
+ */
+static void
+flow_dv_translate_item_tx_queue(struct rte_eth_dev *dev,
+				void *matcher, void *key,
+				const struct rte_flow_item *item)
+{
+	const struct mlx5_rte_flow_item_tx_queue *queue_m;
+	const struct mlx5_rte_flow_item_tx_queue *queue_v;
+	void *misc_m =
+		MLX5_ADDR_OF(fte_match_param, matcher, misc_parameters);
+	void *misc_v =
+		MLX5_ADDR_OF(fte_match_param, key, misc_parameters);
+	struct mlx5_txq_ctrl *txq;
+	uint32_t queue;
+
+	queue_m = (const void *)item->mask;
+	if (!queue_m)
+		return;
+	queue_v = (const void *)item->spec;
+	if (!queue_v)
+		return;
+	txq = mlx5_txq_get(dev, queue_v->queue);
+	if (!txq)
+		return;
+	queue = txq->obj->sq->id;
+	MLX5_SET(fte_match_set_misc, misc_m, source_sqn, queue_m->queue);
+	MLX5_SET(fte_match_set_misc, misc_v, source_sqn,
+		 queue & queue_m->queue);
+	mlx5_txq_release(dev, queue_v->queue);
+}
+
+/**
  * Fill the flow with DV spec.
  *
  * @param[in] dev
@@ -5866,6 +5919,12 @@ struct field_modify_info modify_tcp[] = {
 						   items);
 			last_item = MLX5_FLOW_ITEM_TAG;
 			break;
+		case MLX5_RTE_FLOW_ITEM_TYPE_TX_QUEUE:
+			flow_dv_translate_item_tx_queue(dev, match_mask,
+							match_value,
+							items);
+			last_item = MLX5_FLOW_ITEM_TX_QUEUE;
+			break;
 		default:
 			break;
 		}
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index f66b6ee..cafab25 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -402,6 +402,24 @@
 	unsigned int j;
 	int ret;
 
+	/*
+	 * The hairpin txq default flow should be created no matter whether
+	 * isolated mode is enabled. Otherwise all packets to be sent will go
+	 * out directly without the TX flow actions, e.g. encapsulation.
+	 */
+	for (i = 0; i != priv->txqs_n; ++i) {
+		struct mlx5_txq_ctrl *txq_ctrl = mlx5_txq_get(dev, i);
+		if (!txq_ctrl)
+			continue;
+		if (txq_ctrl->type == MLX5_TXQ_TYPE_HAIRPIN) {
+			ret = mlx5_ctrl_flow_source_queue(dev, i);
+			if (ret) {
+				mlx5_txq_release(dev, i);
+				goto error;
+			}
+		}
+		mlx5_txq_release(dev, i);
+	}
 	if (priv->config.dv_esw_en && !priv->config.vf)
 		if (!mlx5_flow_create_esw_table_zero_flow(dev))
 			goto error;
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 186+ messages in thread

* [dpdk-dev] [PATCH v2 13/14] net/mlx5: split hairpin flows
  2019-10-04 19:54 ` [dpdk-dev] [PATCH v2 00/14] " Ori Kam
                     ` (11 preceding siblings ...)
  2019-10-04 19:54   ` [dpdk-dev] [PATCH v2 12/14] net/mlx5: add default flows for hairpin Ori Kam
@ 2019-10-04 19:54   ` Ori Kam
  2019-10-04 19:54   ` [dpdk-dev] [PATCH v2 14/14] doc: add hairpin feature Ori Kam
  13 siblings, 0 replies; 186+ messages in thread
From: Ori Kam @ 2019-10-04 19:54 UTC (permalink / raw)
  To: Matan Azrad, Shahaf Shuler, Viacheslav Ovsiienko
  Cc: dev, orika, jingjing.wu, stephen

Since the encap action is not supported in RX, we need to split the
hairpin flow into RX and TX.
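
For example, a rule like the following (a sketch using the public
rte_flow API; attr, pattern, encap_conf, hairpin_rxq_idx and port_id are
assumed to be set up by the application) has both a hairpin queue fate
and an encap action, so it is split by this patch: the queue action plus
an internal tag-set stay on the Rx flow, while the encap is moved to an
egress flow matching on the same tag:

	struct rte_flow_action_queue queue = { .index = hairpin_rxq_idx };
	struct rte_flow_action actions[] = {
		{ .type = RTE_FLOW_ACTION_TYPE_VXLAN_ENCAP,
		  .conf = &encap_conf },
		{ .type = RTE_FLOW_ACTION_TYPE_QUEUE, .conf = &queue },
		{ .type = RTE_FLOW_ACTION_TYPE_END },
	};
	struct rte_flow *flow = rte_flow_create(port_id, &attr, pattern,
						actions, &error);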

Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/net/mlx5/mlx5.c            |  10 ++
 drivers/net/mlx5/mlx5.h            |  10 ++
 drivers/net/mlx5/mlx5_flow.c       | 281 +++++++++++++++++++++++++++++++++++--
 drivers/net/mlx5/mlx5_flow.h       |  14 +-
 drivers/net/mlx5/mlx5_flow_dv.c    |  10 +-
 drivers/net/mlx5/mlx5_flow_verbs.c |  11 +-
 drivers/net/mlx5/mlx5_rxq.c        |  26 ++++
 drivers/net/mlx5/mlx5_rxtx.h       |   2 +
 8 files changed, 334 insertions(+), 30 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 0b5c19c..083956b 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -528,6 +528,12 @@ struct mlx5_flow_id_pool *
 		err = ENOMEM;
 		goto error;
 	}
+	sh->flow_id_pool = mlx5_flow_id_pool_alloc();
+	if (!sh->flow_id_pool) {
+		DRV_LOG(ERR, "can't create flow id pool");
+		err = ENOMEM;
+		goto error;
+	}
 #endif /* HAVE_IBV_FLOW_DV_SUPPORT */
 	/*
 	 * Once the device is added to the list of memory event
@@ -567,6 +573,8 @@ struct mlx5_flow_id_pool *
 		claim_zero(mlx5_glue->dealloc_pd(sh->pd));
 	if (sh->ctx)
 		claim_zero(mlx5_glue->close_device(sh->ctx));
+	if (sh->flow_id_pool)
+		mlx5_flow_id_pool_release(sh->flow_id_pool);
 	rte_free(sh);
 	assert(err > 0);
 	rte_errno = err;
@@ -629,6 +637,8 @@ struct mlx5_flow_id_pool *
 		claim_zero(mlx5_devx_cmd_destroy(sh->td));
 	if (sh->ctx)
 		claim_zero(mlx5_glue->close_device(sh->ctx));
+	if (sh->flow_id_pool)
+		mlx5_flow_id_pool_release(sh->flow_id_pool);
 	rte_free(sh);
 exit:
 	pthread_mutex_unlock(&mlx5_ibv_list_mutex);
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index c9d2ae0..7870b22 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -574,6 +574,15 @@ struct mlx5_devx_dbr_page {
 	uint64_t dbr_bitmap[MLX5_DBR_BITMAP_SIZE];
 };
 
+/* ID generation structure. */
+struct mlx5_flow_id_pool {
+	uint32_t *free_arr; /**< Pointer to the array of free values. */
+	uint32_t base_index;
+	/**< The next index that can be used without any free elements. */
+	uint32_t *curr; /**< Pointer to the index to pop. */
+	uint32_t *last; /**< Pointer to the last element in the array. */
+};
+
 /*
  * Shared Infiniband device context for Master/Representors
  * which belong to same IB device with multiple IB ports.
@@ -632,6 +641,7 @@ struct mlx5_ibv_shared {
 	struct mlx5dv_devx_cmd_comp *devx_comp; /* DEVX async comp obj. */
 	struct mlx5_devx_obj *tis; /* TIS object. */
 	struct mlx5_devx_obj *td; /* Transport domain. */
+	struct mlx5_flow_id_pool *flow_id_pool; /* Flow ID pool. */
 	struct mlx5_ibv_shared_port port[]; /* per device port data array. */
 };
 
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index 33ed204..50e1d11 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -606,7 +606,7 @@ uint32_t mlx5_flow_adjust_priority(struct rte_eth_dev *dev, int32_t priority,
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct rte_flow *flow = dev_flow->flow;
-	const int mark = !!(flow->actions &
+	const int mark = !!(dev_flow->actions &
 			    (MLX5_FLOW_ACTION_FLAG | MLX5_FLOW_ACTION_MARK));
 	const int tunnel = !!(dev_flow->layers & MLX5_FLOW_LAYER_TUNNEL);
 	unsigned int i;
@@ -669,7 +669,7 @@ uint32_t mlx5_flow_adjust_priority(struct rte_eth_dev *dev, int32_t priority,
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct rte_flow *flow = dev_flow->flow;
-	const int mark = !!(flow->actions &
+	const int mark = !!(dev_flow->actions &
 			    (MLX5_FLOW_ACTION_FLAG | MLX5_FLOW_ACTION_MARK));
 	const int tunnel = !!(dev_flow->layers & MLX5_FLOW_LAYER_TUNNEL);
 	unsigned int i;
@@ -2419,6 +2419,210 @@ uint32_t mlx5_flow_adjust_priority(struct rte_eth_dev *dev, int32_t priority,
 }
 
 /**
+ * Check if the flow should be split due to hairpin.
+ * The reason for the split is that in current HW we can't
+ * support encap on Rx, so if a flow has encap we move it
+ * to Tx.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param[in] attr
+ *   Flow rule attributes.
+ * @param[in] actions
+ *   Associated actions (list terminated by the END action).
+ *
+ * @return
+ *   > 0 the number of actions and the flow should be split,
+ *   0 when no split required.
+ */
+static int
+flow_check_hairpin_split(struct rte_eth_dev *dev,
+			 const struct rte_flow_attr *attr,
+			 const struct rte_flow_action actions[])
+{
+	int queue_action = 0;
+	int action_n = 0;
+	int encap = 0;
+	const struct rte_flow_action_queue *queue;
+	const struct rte_flow_action_rss *rss;
+	const struct rte_flow_action_raw_encap *raw_encap;
+
+	if (!attr->ingress)
+		return 0;
+	for (; actions->type != RTE_FLOW_ACTION_TYPE_END; actions++) {
+		switch (actions->type) {
+		case RTE_FLOW_ACTION_TYPE_QUEUE:
+			queue = actions->conf;
+			if (mlx5_rxq_get_type(dev, queue->index) !=
+			    MLX5_RXQ_TYPE_HAIRPIN)
+				return 0;
+			queue_action = 1;
+			action_n++;
+			break;
+		case RTE_FLOW_ACTION_TYPE_RSS:
+			rss = actions->conf;
+			if (mlx5_rxq_get_type(dev, rss->queue[0]) !=
+			    MLX5_RXQ_TYPE_HAIRPIN)
+				return 0;
+			queue_action = 1;
+			action_n++;
+			break;
+		case RTE_FLOW_ACTION_TYPE_VXLAN_ENCAP:
+		case RTE_FLOW_ACTION_TYPE_NVGRE_ENCAP:
+			encap = 1;
+			action_n++;
+			break;
+		case RTE_FLOW_ACTION_TYPE_RAW_ENCAP:
+			raw_encap = actions->conf;
+			if (raw_encap->size >
+			    (sizeof(struct rte_flow_item_eth) +
+			     sizeof(struct rte_flow_item_ipv4)))
+				encap = 1;
+			action_n++;
+			break;
+		default:
+			action_n++;
+			break;
+		}
+	}
+	if (encap == 1 && queue_action)
+		return action_n;
+	return 0;
+}
+
+#define MLX5_MAX_SPLIT_ACTIONS 24
+#define MLX5_MAX_SPLIT_ITEMS 24
+
+/**
+ * Split the hairpin flow.
+ * Since HW can't support encap on Rx we move the encap to Tx.
+ * If the count action is after the encap then we also
+ * move the count action. In this case the count will also measure
+ * the outer bytes.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param[in] actions
+ *   Associated actions (list terminated by the END action).
+ * @param[out] actions_rx
+ *   Rx flow actions.
+ * @param[out] actions_tx
+ *   Tx flow actions.
+ * @param[out] pattern_tx
+ *   The pattern items for the Tx flow.
+ * @param[out] flow_id
+ *   The flow ID connected to this flow.
+ *
+ * @return
+ *   0 on success.
+ */
+static int
+flow_hairpin_split(struct rte_eth_dev *dev,
+		   const struct rte_flow_action actions[],
+		   struct rte_flow_action actions_rx[],
+		   struct rte_flow_action actions_tx[],
+		   struct rte_flow_item pattern_tx[],
+		   uint32_t *flow_id)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	const struct rte_flow_action_raw_encap *raw_encap;
+	const struct rte_flow_action_raw_decap *raw_decap;
+	struct mlx5_rte_flow_action_set_tag *set_tag;
+	struct rte_flow_action *tag_action;
+	struct mlx5_rte_flow_item_tag *tag_item;
+	struct rte_flow_item *item;
+	char *addr;
+	struct rte_flow_error error;
+	int encap = 0;
+
+	mlx5_flow_id_get(priv->sh->flow_id_pool, flow_id);
+	for (; actions->type != RTE_FLOW_ACTION_TYPE_END; actions++) {
+		switch (actions->type) {
+		case RTE_FLOW_ACTION_TYPE_VXLAN_ENCAP:
+		case RTE_FLOW_ACTION_TYPE_NVGRE_ENCAP:
+			rte_memcpy(actions_tx, actions,
+			       sizeof(struct rte_flow_action));
+			actions_tx++;
+			break;
+		case RTE_FLOW_ACTION_TYPE_COUNT:
+			if (encap) {
+				rte_memcpy(actions_tx, actions,
+					   sizeof(struct rte_flow_action));
+				actions_tx++;
+			} else {
+				rte_memcpy(actions_rx, actions,
+					   sizeof(struct rte_flow_action));
+				actions_rx++;
+			}
+			break;
+		case RTE_FLOW_ACTION_TYPE_RAW_ENCAP:
+			raw_encap = actions->conf;
+			if (raw_encap->size >
+			    (sizeof(struct rte_flow_item_eth) +
+			     sizeof(struct rte_flow_item_ipv4))) {
+				memcpy(actions_tx, actions,
+				       sizeof(struct rte_flow_action));
+				actions_tx++;
+				encap = 1;
+			} else {
+				rte_memcpy(actions_rx, actions,
+					   sizeof(struct rte_flow_action));
+				actions_rx++;
+			}
+			break;
+		case RTE_FLOW_ACTION_TYPE_RAW_DECAP:
+			raw_decap = actions->conf;
+			if (raw_decap->size <
+			    (sizeof(struct rte_flow_item_eth) +
+			     sizeof(struct rte_flow_item_ipv4))) {
+				memcpy(actions_tx, actions,
+				       sizeof(struct rte_flow_action));
+				actions_tx++;
+			} else {
+				rte_memcpy(actions_rx, actions,
+					   sizeof(struct rte_flow_action));
+				actions_rx++;
+			}
+			break;
+		default:
+			rte_memcpy(actions_rx, actions,
+				   sizeof(struct rte_flow_action));
+			actions_rx++;
+			break;
+		}
+	}
+	/* Add set meta action and end action for the Rx flow. */
+	tag_action = actions_rx;
+	tag_action->type = MLX5_RTE_FLOW_ACTION_TYPE_TAG;
+	actions_rx++;
+	rte_memcpy(actions_rx, actions, sizeof(struct rte_flow_action));
+	actions_rx++;
+	set_tag = (void *)actions_rx;
+	set_tag->id = flow_get_reg_id(dev, MLX5_HAIRPIN_RX, 0, &error);
+	set_tag->data = rte_cpu_to_be_32(*flow_id);
+	tag_action->conf = set_tag;
+	/* Create Tx item list. */
+	rte_memcpy(actions_tx, actions, sizeof(struct rte_flow_action));
+	addr = (void *)&pattern_tx[2];
+	item = pattern_tx;
+	item->type = MLX5_RTE_FLOW_ITEM_TYPE_TAG;
+	tag_item = (void *)addr;
+	tag_item->data = rte_cpu_to_be_32(*flow_id);
+	tag_item->id = flow_get_reg_id(dev, MLX5_HAIRPIN_TX, 0, &error);
+	item->spec = tag_item;
+	addr += sizeof(struct mlx5_rte_flow_item_tag);
+	tag_item = (void *)addr;
+	tag_item->data = UINT32_MAX;
+	tag_item->id = UINT16_MAX;
+	item->mask = tag_item;
+	addr += sizeof(struct mlx5_rte_flow_item_tag);
+	item->last = NULL;
+	item++;
+	item->type = RTE_FLOW_ITEM_TYPE_END;
+	return 0;
+}
+
+/**
  * Create a flow and add it to @p list.
  *
  * @param dev
@@ -2446,6 +2650,7 @@ uint32_t mlx5_flow_adjust_priority(struct rte_eth_dev *dev, int32_t priority,
 		 const struct rte_flow_action actions[],
 		 bool external, struct rte_flow_error *error)
 {
+	struct mlx5_priv *priv = dev->data->dev_private;
 	struct rte_flow *flow = NULL;
 	struct mlx5_flow *dev_flow;
 	const struct rte_flow_action_rss *rss;
@@ -2453,16 +2658,44 @@ uint32_t mlx5_flow_adjust_priority(struct rte_eth_dev *dev, int32_t priority,
 		struct rte_flow_expand_rss buf;
 		uint8_t buffer[2048];
 	} expand_buffer;
+	union {
+		struct rte_flow_action actions[MLX5_MAX_SPLIT_ACTIONS];
+		uint8_t buffer[2048];
+	} actions_rx;
+	union {
+		struct rte_flow_action actions[MLX5_MAX_SPLIT_ACTIONS];
+		uint8_t buffer[2048];
+	} actions_hairpin_tx;
+	union {
+		struct rte_flow_item items[MLX5_MAX_SPLIT_ITEMS];
+		uint8_t buffer[2048];
+	} items_tx;
 	struct rte_flow_expand_rss *buf = &expand_buffer.buf;
+	const struct rte_flow_action *p_actions_rx = actions;
 	int ret;
 	uint32_t i;
 	uint32_t flow_size;
+	int hairpin_flow = 0;
+	uint32_t hairpin_id = 0;
+	struct rte_flow_attr attr_tx = { .priority = 0 };
 
-	ret = flow_drv_validate(dev, attr, items, actions, external, error);
+	hairpin_flow = flow_check_hairpin_split(dev, attr, actions);
+	if (hairpin_flow > 0) {
+		if (hairpin_flow > MLX5_MAX_SPLIT_ACTIONS) {
+			rte_errno = EINVAL;
+			return NULL;
+		}
+		flow_hairpin_split(dev, actions, actions_rx.actions,
+				   actions_hairpin_tx.actions, items_tx.items,
+				   &hairpin_id);
+		p_actions_rx = actions_rx.actions;
+	}
+	ret = flow_drv_validate(dev, attr, items, p_actions_rx, external,
+				error);
 	if (ret < 0)
-		return NULL;
+		goto error_before_flow;
 	flow_size = sizeof(struct rte_flow);
-	rss = flow_get_rss_action(actions);
+	rss = flow_get_rss_action(p_actions_rx);
 	if (rss)
 		flow_size += RTE_ALIGN_CEIL(rss->queue_num * sizeof(uint16_t),
 					    sizeof(void *));
@@ -2471,11 +2704,13 @@ uint32_t mlx5_flow_adjust_priority(struct rte_eth_dev *dev, int32_t priority,
 	flow = rte_calloc(__func__, 1, flow_size, 0);
 	if (!flow) {
 		rte_errno = ENOMEM;
-		return NULL;
+		goto error_before_flow;
 	}
 	flow->drv_type = flow_get_drv_type(dev, attr);
 	flow->ingress = attr->ingress;
 	flow->transfer = attr->transfer;
+	if (hairpin_id != 0)
+		flow->hairpin_flow_id = hairpin_id;
 	assert(flow->drv_type > MLX5_FLOW_TYPE_MIN &&
 	       flow->drv_type < MLX5_FLOW_TYPE_MAX);
 	flow->queue = (void *)(flow + 1);
@@ -2496,7 +2731,7 @@ uint32_t mlx5_flow_adjust_priority(struct rte_eth_dev *dev, int32_t priority,
 	}
 	for (i = 0; i < buf->entries; ++i) {
 		dev_flow = flow_drv_prepare(flow, attr, buf->entry[i].pattern,
-					    actions, error);
+					    p_actions_rx, error);
 		if (!dev_flow)
 			goto error;
 		dev_flow->flow = flow;
@@ -2504,7 +2739,24 @@ uint32_t mlx5_flow_adjust_priority(struct rte_eth_dev *dev, int32_t priority,
 		LIST_INSERT_HEAD(&flow->dev_flows, dev_flow, next);
 		ret = flow_drv_translate(dev, dev_flow, attr,
 					 buf->entry[i].pattern,
-					 actions, error);
+					 p_actions_rx, error);
+		if (ret < 0)
+			goto error;
+	}
+	/* Create the tx flow. */
+	if (hairpin_flow) {
+		attr_tx.group = MLX5_HAIRPIN_TX_TABLE;
+		attr_tx.ingress = 0;
+		attr_tx.egress = 1;
+		dev_flow = flow_drv_prepare(flow, &attr_tx, items_tx.items,
+					    actions_hairpin_tx.actions, error);
+		if (!dev_flow)
+			goto error;
+		dev_flow->flow = flow;
+		LIST_INSERT_HEAD(&flow->dev_flows, dev_flow, next);
+		ret = flow_drv_translate(dev, dev_flow, &attr_tx,
+					 items_tx.items,
+					 actions_hairpin_tx.actions, error);
 		if (ret < 0)
 			goto error;
 	}
@@ -2516,8 +2768,16 @@ uint32_t mlx5_flow_adjust_priority(struct rte_eth_dev *dev, int32_t priority,
 	TAILQ_INSERT_TAIL(list, flow, next);
 	flow_rxq_flags_set(dev, flow);
 	return flow;
+error_before_flow:
+	if (hairpin_id)
+		mlx5_flow_id_release(priv->sh->flow_id_pool,
+				     hairpin_id);
+	return NULL;
 error:
 	ret = rte_errno; /* Save rte_errno before cleanup. */
+	if (flow->hairpin_flow_id)
+		mlx5_flow_id_release(priv->sh->flow_id_pool,
+				     flow->hairpin_flow_id);
 	assert(flow);
 	flow_drv_destroy(dev, flow);
 	rte_free(flow);
@@ -2607,12 +2867,17 @@ struct rte_flow *
 flow_list_destroy(struct rte_eth_dev *dev, struct mlx5_flows *list,
 		  struct rte_flow *flow)
 {
+	struct mlx5_priv *priv = dev->data->dev_private;
+
 	/*
 	 * Update RX queue flags only if port is started, otherwise it is
 	 * already clean.
 	 */
 	if (dev->data->dev_started)
 		flow_rxq_flags_trim(dev, flow);
+	if (flow->hairpin_flow_id)
+		mlx5_flow_id_release(priv->sh->flow_id_pool,
+				     flow->hairpin_flow_id);
 	flow_drv_destroy(dev, flow);
 	TAILQ_REMOVE(list, flow, next);
 	rte_free(flow->fdir);
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index bb67380..90a289e 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -434,6 +434,8 @@ struct mlx5_flow {
 	struct rte_flow *flow; /**< Pointer to the main flow. */
 	uint64_t layers;
 	/**< Bit-fields of present layers, see MLX5_FLOW_LAYER_*. */
+	uint64_t actions;
+	/**< Bit-fields of detected actions, see MLX5_FLOW_ACTION_*. */
 	union {
 #ifdef HAVE_IBV_FLOW_DV_SUPPORT
 		struct mlx5_flow_dv dv;
@@ -455,12 +457,11 @@ struct rte_flow {
 	uint16_t (*queue)[]; /**< Destination queues to redirect traffic to. */
 	LIST_HEAD(dev_flows, mlx5_flow) dev_flows;
 	/**< Device flows that are part of the flow. */
-	uint64_t actions;
-	/**< Bit-fields of detected actions, see MLX5_FLOW_ACTION_*. */
 	struct mlx5_fdir *fdir; /**< Pointer to associated FDIR if any. */
 	uint8_t ingress; /**< 1 if the flow is ingress. */
 	uint32_t group; /**< The group index. */
 	uint8_t transfer; /**< 1 if the flow is E-Switch flow. */
+	uint32_t hairpin_flow_id; /**< The flow id used for hairpin. */
 };
 
 typedef int (*mlx5_flow_validate_t)(struct rte_eth_dev *dev,
@@ -504,15 +505,6 @@ struct mlx5_flow_driver_ops {
 #define MLX5_CNT_CONTAINER_UNUSED(sh, batch, thread) (&(sh)->cmng.ccont \
 	[(~((sh)->cmng.mhi[batch] >> (thread)) & 0x1) * 2 + (batch)])
 
-/* ID generation structure. */
-struct mlx5_flow_id_pool {
-	uint32_t *free_arr; /**< Pointer to the array of free values. */
-	uint32_t base_index;
-	/**< The next index that can be used without any free elements. */
-	uint32_t *curr; /**< Pointer to the index to pop. */
-	uint32_t *last; /**< Pointer to the last element in the array. */
-};
-
 /* mlx5_flow.c */
 
 struct mlx5_flow_id_pool *mlx5_flow_id_pool_alloc(void);
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index 2b48680..6828bd1 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -5763,7 +5763,7 @@ struct field_modify_info modify_tcp[] = {
 			modify_action_position = actions_n++;
 	}
 	dev_flow->dv.actions_n = actions_n;
-	flow->actions = action_flags;
+	dev_flow->actions = action_flags;
 	for (; items->type != RTE_FLOW_ITEM_TYPE_END; items++) {
 		int tunnel = !!(item_flags & MLX5_FLOW_LAYER_TUNNEL);
 		int item_type = items->type;
@@ -5985,7 +5985,7 @@ struct field_modify_info modify_tcp[] = {
 	LIST_FOREACH(dev_flow, &flow->dev_flows, next) {
 		dv = &dev_flow->dv;
 		n = dv->actions_n;
-		if (flow->actions & MLX5_FLOW_ACTION_DROP) {
+		if (dev_flow->actions & MLX5_FLOW_ACTION_DROP) {
 			if (flow->transfer) {
 				dv->actions[n++] = priv->sh->esw_drop_action;
 			} else {
@@ -6000,7 +6000,7 @@ struct field_modify_info modify_tcp[] = {
 				}
 				dv->actions[n++] = dv->hrxq->action;
 			}
-		} else if (flow->actions &
+		} else if (dev_flow->actions &
 			   (MLX5_FLOW_ACTION_QUEUE | MLX5_FLOW_ACTION_RSS)) {
 			struct mlx5_hrxq *hrxq;
 
@@ -6056,7 +6056,7 @@ struct field_modify_info modify_tcp[] = {
 	LIST_FOREACH(dev_flow, &flow->dev_flows, next) {
 		struct mlx5_flow_dv *dv = &dev_flow->dv;
 		if (dv->hrxq) {
-			if (flow->actions & MLX5_FLOW_ACTION_DROP)
+			if (dev_flow->actions & MLX5_FLOW_ACTION_DROP)
 				mlx5_hrxq_drop_release(dev);
 			else
 				mlx5_hrxq_release(dev, dv->hrxq);
@@ -6290,7 +6290,7 @@ struct field_modify_info modify_tcp[] = {
 			dv->flow = NULL;
 		}
 		if (dv->hrxq) {
-			if (flow->actions & MLX5_FLOW_ACTION_DROP)
+			if (dev_flow->actions & MLX5_FLOW_ACTION_DROP)
 				mlx5_hrxq_drop_release(dev);
 			else
 				mlx5_hrxq_release(dev, dv->hrxq);
diff --git a/drivers/net/mlx5/mlx5_flow_verbs.c b/drivers/net/mlx5/mlx5_flow_verbs.c
index 23110f2..fd27f6c 100644
--- a/drivers/net/mlx5/mlx5_flow_verbs.c
+++ b/drivers/net/mlx5/mlx5_flow_verbs.c
@@ -191,7 +191,7 @@
 {
 #if defined(HAVE_IBV_DEVICE_COUNTERS_SET_V42) || \
 	defined(HAVE_IBV_DEVICE_COUNTERS_SET_V45)
-	if (flow->actions & MLX5_FLOW_ACTION_COUNT) {
+	if (flow->counter->cs) {
 		struct rte_flow_query_count *qc = data;
 		uint64_t counters[2] = {0, 0};
 #if defined(HAVE_IBV_DEVICE_COUNTERS_SET_V42)
@@ -1410,7 +1410,6 @@
 		     const struct rte_flow_action actions[],
 		     struct rte_flow_error *error)
 {
-	struct rte_flow *flow = dev_flow->flow;
 	uint64_t item_flags = 0;
 	uint64_t action_flags = 0;
 	uint64_t priority = attr->priority;
@@ -1460,7 +1459,7 @@
 						  "action not supported");
 		}
 	}
-	flow->actions = action_flags;
+	dev_flow->actions = action_flags;
 	for (; items->type != RTE_FLOW_ITEM_TYPE_END; items++) {
 		int tunnel = !!(item_flags & MLX5_FLOW_LAYER_TUNNEL);
 
@@ -1592,7 +1591,7 @@
 			verbs->flow = NULL;
 		}
 		if (verbs->hrxq) {
-			if (flow->actions & MLX5_FLOW_ACTION_DROP)
+			if (dev_flow->actions & MLX5_FLOW_ACTION_DROP)
 				mlx5_hrxq_drop_release(dev);
 			else
 				mlx5_hrxq_release(dev, verbs->hrxq);
@@ -1656,7 +1655,7 @@
 
 	LIST_FOREACH(dev_flow, &flow->dev_flows, next) {
 		verbs = &dev_flow->verbs;
-		if (flow->actions & MLX5_FLOW_ACTION_DROP) {
+		if (dev_flow->actions & MLX5_FLOW_ACTION_DROP) {
 			verbs->hrxq = mlx5_hrxq_drop_new(dev);
 			if (!verbs->hrxq) {
 				rte_flow_error_set
@@ -1717,7 +1716,7 @@
 	LIST_FOREACH(dev_flow, &flow->dev_flows, next) {
 		verbs = &dev_flow->verbs;
 		if (verbs->hrxq) {
-			if (flow->actions & MLX5_FLOW_ACTION_DROP)
+			if (dev_flow->actions & MLX5_FLOW_ACTION_DROP)
 				mlx5_hrxq_drop_release(dev);
 			else
 				mlx5_hrxq_release(dev, verbs->hrxq);
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index a8ff8b2..c39118a 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -2097,6 +2097,32 @@ struct mlx5_rxq_ctrl *
 }
 
 /**
+ * Get a Rx queue type.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param idx
+ *   Rx queue index.
+ *
+ * @return
+ *   The Rx queue type.
+ */
+enum mlx5_rxq_type
+mlx5_rxq_get_type(struct rte_eth_dev *dev, uint16_t idx)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_rxq_ctrl *rxq_ctrl = NULL;
+
+	if ((*priv->rxqs)[idx]) {
+		rxq_ctrl = container_of((*priv->rxqs)[idx],
+					struct mlx5_rxq_ctrl,
+					rxq);
+		return rxq_ctrl->type;
+	}
+	return MLX5_RXQ_TYPE_UNDEFINED;
+}
+
+/**
  * Create an indirection table.
  *
  * @param dev
diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index 271b648..d4ba25f 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -166,6 +166,7 @@ enum mlx5_rxq_obj_type {
 enum mlx5_rxq_type {
 	MLX5_RXQ_TYPE_STANDARD, /* Standard Rx queue. */
 	MLX5_RXQ_TYPE_HAIRPIN, /* Hairpin Rx queue. */
+	MLX5_RXQ_TYPE_UNDEFINED,
 };
 
 /* Verbs/DevX Rx queue elements. */
@@ -406,6 +407,7 @@ struct mlx5_hrxq *mlx5_hrxq_get(struct rte_eth_dev *dev,
 				const uint16_t *queues, uint32_t queues_n);
 int mlx5_hrxq_release(struct rte_eth_dev *dev, struct mlx5_hrxq *hxrq);
 int mlx5_hrxq_verify(struct rte_eth_dev *dev);
+enum mlx5_rxq_type mlx5_rxq_get_type(struct rte_eth_dev *dev, uint16_t idx);
 struct mlx5_hrxq *mlx5_hrxq_drop_new(struct rte_eth_dev *dev);
 void mlx5_hrxq_drop_release(struct rte_eth_dev *dev);
 uint64_t mlx5_get_rx_port_offloads(void);
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 186+ messages in thread

* [dpdk-dev] [PATCH v2 14/14] doc: add hairpin feature
  2019-10-04 19:54 ` [dpdk-dev] [PATCH v2 00/14] " Ori Kam
                     ` (12 preceding siblings ...)
  2019-10-04 19:54   ` [dpdk-dev] [PATCH v2 13/14] net/mlx5: split hairpin flows Ori Kam
@ 2019-10-04 19:54   ` Ori Kam
  2019-10-08 14:55     ` Andrew Rybchenko
  13 siblings, 1 reply; 186+ messages in thread
From: Ori Kam @ 2019-10-04 19:54 UTC (permalink / raw)
  To: John McNamara, Marko Kovacevic; +Cc: dev, orika, jingjing.wu, stephen

This commit adds the hairpin feature to the release notes.

Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>

---
 doc/guides/rel_notes/release_19_11.rst | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/doc/guides/rel_notes/release_19_11.rst b/doc/guides/rel_notes/release_19_11.rst
index cd4e350..4bfd418 100644
--- a/doc/guides/rel_notes/release_19_11.rst
+++ b/doc/guides/rel_notes/release_19_11.rst
@@ -87,6 +87,10 @@ New Features
 
   Added support for the ``RTE_ETH_DEV_CLOSE_REMOVE`` flag.
 
+* **Added hairpin queue.**
+
+  On supported nics, we can now setup haipin queue which will offload packets from the wire,
+  back to the wire.
 
 Removed Items
 -------------
@@ -286,4 +290,5 @@ Tested Platforms
   * Added support for VLAN push flow offload command.
   * Added support for VLAN set PCP offload command.
   * Added support for VLAN set VID offload command.
+  * Added hairpin support.
 
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [dpdk-dev] [PATCH v2 14/14] doc: add hairpin feature
  2019-10-04 19:54   ` [dpdk-dev] [PATCH v2 14/14] doc: add hairpin feature Ori Kam
@ 2019-10-08 14:55     ` Andrew Rybchenko
  2019-10-10  8:24       ` Ori Kam
  0 siblings, 1 reply; 186+ messages in thread
From: Andrew Rybchenko @ 2019-10-08 14:55 UTC (permalink / raw)
  To: Ori Kam, John McNamara, Marko Kovacevic; +Cc: dev, jingjing.wu, stephen

On 10/4/19 10:54 PM, Ori Kam wrote:
> This commit adds the hairpin feature to the release notes.
>
> Signed-off-by: Ori Kam <orika@mellanox.com>
> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
>
> ---
>   doc/guides/rel_notes/release_19_11.rst | 5 +++++
>   1 file changed, 5 insertions(+)
>
> diff --git a/doc/guides/rel_notes/release_19_11.rst b/doc/guides/rel_notes/release_19_11.rst
> index cd4e350..4bfd418 100644
> --- a/doc/guides/rel_notes/release_19_11.rst
> +++ b/doc/guides/rel_notes/release_19_11.rst
> @@ -87,6 +87,10 @@ New Features
>   
>     Added support for the ``RTE_ETH_DEV_CLOSE_REMOVE`` flag.
>   
> +* **Added hairpin queue.**
> +
> +  On supported nics, we can now setup haipin queue which will offload packets from the wire,
> +  back to the wire.

One more empty line is required above.
Also I guess nics should be NICs.

>   Removed Items
>   -------------
> @@ -286,4 +290,5 @@ Tested Platforms
>     * Added support for VLAN push flow offload command.
>     * Added support for VLAN set PCP offload command.
>     * Added support for VLAN set VID offload command.
> +  * Added hairpin support.
>   


^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [dpdk-dev] [PATCH v2 01/14] ethdev: add support for hairpin queue
  2019-10-04 19:54   ` [dpdk-dev] [PATCH v2 01/14] ethdev: add support for hairpin queue Ori Kam
@ 2019-10-08 16:11     ` Andrew Rybchenko
  2019-10-10 21:07       ` Ori Kam
  0 siblings, 1 reply; 186+ messages in thread
From: Andrew Rybchenko @ 2019-10-08 16:11 UTC (permalink / raw)
  To: Ori Kam, Thomas Monjalon, Ferruh Yigit; +Cc: dev, jingjing.wu, stephen

Hi Ori,

thanks for updated version. See my notes below.

There are a few style notes about line breaks which are not defined in the
coding style. Of course, they may be ignored.

On 10/4/19 10:54 PM, Ori Kam wrote:
> This commit introduces the hairpin queue type.
>
> The hairpin queue is built from an Rx queue bound to a Tx queue.
> It is used to offload traffic coming from the wire and redirect it back
> to the wire.
>
> There are 3 new functions:
> - rte_eth_dev_hairpin_capability_get
> - rte_eth_rx_hairpin_queue_setup
> - rte_eth_tx_hairpin_queue_setup
>
> In order to use the queue, there is a need to create rte_flow
> with queue / RSS action that targets one or more of the Rx queues.
>
> Signed-off-by: Ori Kam <orika@mellanox.com>

rte_eth_dev_[rt]x_queue_stop() should return error if used for hairpin 
queue.
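
E.g. an early check along these lines (just a sketch, untested) in
rte_eth_dev_rx_queue_stop(), reusing the state this patch introduces:

	if (dev->data->rx_queue_state[rx_queue_id] ==
	    RTE_ETH_QUEUE_STATE_HAIRPIN)
		return -EINVAL;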
Right now rte_eth_dev_[rt]x_queue_start() will return 0. Not sure about it.
What about rte_eth_rx_queue_info_get() and rte_eth_tx_queue_info_get()?
Any other Rx/Tx queue functions?

> ---
> V2:
>   - update according to ML comments.
>
> ---
>   lib/librte_ethdev/rte_ethdev.c           | 214 ++++++++++++++++++++++++++++++-
>   lib/librte_ethdev/rte_ethdev.h           | 126 ++++++++++++++++++
>   lib/librte_ethdev/rte_ethdev_core.h      |  27 +++-
>   lib/librte_ethdev/rte_ethdev_version.map |   5 +
>   4 files changed, 368 insertions(+), 4 deletions(-)
>
> diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
> index af82360..ee8af42 100644
> --- a/lib/librte_ethdev/rte_ethdev.c
> +++ b/lib/librte_ethdev/rte_ethdev.c
> @@ -1752,12 +1752,102 @@ struct rte_eth_dev *
>   		if (!dev->data->min_rx_buf_size ||
>   		    dev->data->min_rx_buf_size > mbp_buf_size)
>   			dev->data->min_rx_buf_size = mbp_buf_size;
> +		if (dev->data->rx_queue_state[rx_queue_id] ==
> +		    RTE_ETH_QUEUE_STATE_HAIRPIN)
> +			dev->data->rx_queue_state[rx_queue_id] =
> +				RTE_ETH_QUEUE_STATE_STOPPED;

I don't understand it. Why is rte_eth_rx_queue_setup() changed?

>   	}
>   
>   	return eth_err(port_id, ret);
>   }
>   
>   int
> +rte_eth_rx_hairpin_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
> +			       uint16_t nb_rx_desc,
> +			       const struct rte_eth_hairpin_conf *conf)
> +{
> +	int ret;
> +	struct rte_eth_dev *dev;
> +	struct rte_eth_hairpin_cap cap;
> +	struct rte_eth_dev_info dev_info;
> +	void **rxq;
> +	int i;
> +	int count = 0;
> +
> +	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
> +
> +	dev = &rte_eth_devices[port_id];
> +	if (rx_queue_id >= dev->data->nb_rx_queues) {
> +		RTE_ETHDEV_LOG(ERR, "Invalid RX queue_id=%u\n", rx_queue_id);
> +		return -EINVAL;
> +	}
> +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_infos_get, -ENOTSUP);
> +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->hairpin_cap_get,
> +				-ENOTSUP);
> +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_hairpin_queue_setup,
> +				-ENOTSUP);
> +	rte_eth_dev_hairpin_capability_get(port_id, &cap);

Return value should be checked. It makes the hairpin_cap_get check above
unnecessary.

> +	/* Use default specified by driver, if nb_rx_desc is zero */
> +	if (nb_rx_desc == 0)
> +		nb_rx_desc = cap.max_nb_desc;
> +	if (nb_rx_desc > cap.max_nb_desc) {
> +		RTE_ETHDEV_LOG(ERR,
> +			       "Invalid value for nb_rx_desc(=%hu), should be: "
> +			       "<= %hu", nb_rx_desc, cap.max_nb_desc);
> +		return -EINVAL;
> +	}
> +	ret = rte_eth_dev_info_get(port_id, &dev_info);
> +	if (ret != 0)
> +		return ret;
> +	if (dev->data->dev_started &&
> +		!(dev_info.dev_capa &
> +		  RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP))
> +		return -EBUSY;
> +	if (dev->data->dev_started &&
> +		(dev->data->rx_queue_state[rx_queue_id] !=
> +		 RTE_ETH_QUEUE_STATE_STOPPED))
> +		return -EBUSY;
> +	if (conf->peer_n > cap.max_rx_2_tx) {
> +		RTE_ETHDEV_LOG(ERR,
> +			       "Invalid value for number of peers(=%hu), "
> +			       "should be: <= %hu", conf->peer_n,
> +			       cap.max_rx_2_tx);
> +		return -EINVAL;
> +	}
> +	if (conf->peer_n == 0) {
> +		RTE_ETHDEV_LOG(ERR,
> +			       "Invalid value for number of peers(=%hu), "
> +			       "should be: > 0", conf->peer_n);
> +		return -EINVAL;
> +	}
> +	if (cap.max_n_queues != -1) {
> +		for (i = 0; i < dev->data->nb_rx_queues; i++) {
> +			if (dev->data->rx_queue_state[i] ==
> +			    RTE_ETH_QUEUE_STATE_HAIRPIN)
> +				count++;
> +		}
> +		if (count > cap.max_n_queues) {
> +			RTE_ETHDEV_LOG(ERR,
> +				       "To many Rx hairpin queues %d", count);
> +			return -EINVAL;
> +		}
> +	}
> +	rxq = dev->data->rx_queues;
> +	if (rxq[rx_queue_id]) {

Please, compare with NULL (I know that rte_eth_rx_queue_setup() does 
like above).

> +		RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_release,
> +					-ENOTSUP);
> +		(*dev->dev_ops->rx_queue_release)(rxq[rx_queue_id]);
> +		rxq[rx_queue_id] = NULL;
> +	}
> +	ret = (*dev->dev_ops->rx_hairpin_queue_setup)(dev, rx_queue_id,
> +						      nb_rx_desc, conf);
> +	if (!ret)

Please, compare with 0

> +		dev->data->rx_queue_state[rx_queue_id] =
> +			RTE_ETH_QUEUE_STATE_HAIRPIN;
> +	return eth_err(port_id, ret);
> +}
> +
> +int
>   rte_eth_tx_queue_setup(uint16_t port_id, uint16_t tx_queue_id,
>   		       uint16_t nb_tx_desc, unsigned int socket_id,
>   		       const struct rte_eth_txconf *tx_conf)
> @@ -1851,9 +1941,97 @@ struct rte_eth_dev *
>   			__func__);
>   		return -EINVAL;
>   	}
> +	ret = (*dev->dev_ops->tx_queue_setup)(dev, tx_queue_id, nb_tx_desc,
> +					      socket_id, &local_conf);
> +	if (!ret)
> +		if (dev->data->tx_queue_state[tx_queue_id] ==
> +		    RTE_ETH_QUEUE_STATE_HAIRPIN)
> +			dev->data->tx_queue_state[tx_queue_id] =
> +				RTE_ETH_QUEUE_STATE_STOPPED;

Why is it changed?

> +	return eth_err(port_id, ret);
> +}
> +
> +int
> +rte_eth_tx_hairpin_queue_setup(uint16_t port_id, uint16_t tx_queue_id,
> +			       uint16_t nb_tx_desc,
> +			       const struct rte_eth_hairpin_conf *conf)
> +{
> +	struct rte_eth_dev *dev;
> +	struct rte_eth_hairpin_cap cap;
> +	struct rte_eth_dev_info dev_info;
> +	void **txq;
> +	int i;
> +	int count = 0;
> +	int ret;
>   
> -	return eth_err(port_id, (*dev->dev_ops->tx_queue_setup)(dev,
> -		       tx_queue_id, nb_tx_desc, socket_id, &local_conf));
> +	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
> +	dev = &rte_eth_devices[port_id];
> +	if (tx_queue_id >= dev->data->nb_tx_queues) {
> +		RTE_ETHDEV_LOG(ERR, "Invalid TX queue_id=%u\n", tx_queue_id);
> +		return -EINVAL;
> +	}
> +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_infos_get, -ENOTSUP);
> +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->hairpin_cap_get,
> +				-ENOTSUP);
> +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_hairpin_queue_setup,
> +				-ENOTSUP);
> +	rte_eth_dev_info_get(port_id, &dev_info);

return value should be checked.

> +	rte_eth_dev_hairpin_capability_get(port_id, &cap);

Check return value and you can rely on  hairpin_cap_get check inside.

> +	/* Use default specified by driver, if nb_tx_desc is zero */
> +	if (nb_tx_desc == 0)
> +		nb_tx_desc = cap.max_nb_desc;
> +	if (nb_tx_desc > cap.max_nb_desc) {
> +		RTE_ETHDEV_LOG(ERR,
> +			       "Invalid value for nb_tx_desc(=%hu), should be: "
> +			       "<= %hu", nb_tx_desc, cap.max_nb_desc);
> +		return -EINVAL;
> +	}
> +	if (conf->peer_n > cap.max_tx_2_rx) {
> +		RTE_ETHDEV_LOG(ERR,
> +			       "Invalid value for number of peers(=%hu), "
> +			       "should be: <= %hu", conf->peer_n,
> +			       cap.max_tx_2_rx);
> +		return -EINVAL;
> +	}
> +	if (conf->peer_n == 0) {
> +		RTE_ETHDEV_LOG(ERR,
> +			       "Invalid value for number of peers(=%hu), "
> +			       "should be: > 0", conf->peer_n);
> +		return -EINVAL;
> +	}
> +	if (cap.max_n_queues != -1) {
> +		for (i = 0; i < dev->data->nb_tx_queues; i++) {
> +			if (dev->data->tx_queue_state[i] ==
> +			    RTE_ETH_QUEUE_STATE_HAIRPIN)
> +				count++;
> +		}
> +		if (count > cap.max_n_queues) {
> +			RTE_ETHDEV_LOG(ERR,
> +				       "To many Rx hairpin queues %d", count);
> +			return -EINVAL;
> +		}
> +	}
> +	if (dev->data->dev_started &&
> +		!(dev_info.dev_capa &
> +		  RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP))
> +		return -EBUSY;
> +	if (dev->data->dev_started &&
> +		(dev->data->tx_queue_state[tx_queue_id] !=
> +		 RTE_ETH_QUEUE_STATE_STOPPED))
> +		return -EBUSY;
> +	txq = dev->data->tx_queues;
> +	if (txq[tx_queue_id]) {

Please, compare with NULL

> +		RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_queue_release,
> +					-ENOTSUP);
> +		(*dev->dev_ops->tx_queue_release)(txq[tx_queue_id]);
> +		txq[tx_queue_id] = NULL;
> +	}
> +	ret = (*dev->dev_ops->tx_hairpin_queue_setup)
> +		(dev, tx_queue_id, nb_tx_desc, conf);
> +	if (!ret)

Please, compare with 0

> +		dev->data->tx_queue_state[tx_queue_id] =
> +			RTE_ETH_QUEUE_STATE_HAIRPIN;
> +	return eth_err(port_id, ret);
>   }
>   
>   void
> @@ -3981,12 +4159,20 @@ int rte_eth_set_queue_rate_limit(uint16_t port_id, uint16_t queue_idx,
>   	rte_errno = ENOTSUP;
>   	return NULL;
>   #endif
> +	struct rte_eth_dev *dev;
> +
>   	/* check input parameters */
>   	if (!rte_eth_dev_is_valid_port(port_id) || fn == NULL ||
>   		    queue_id >= rte_eth_devices[port_id].data->nb_rx_queues) {
>   		rte_errno = EINVAL;
>   		return NULL;
>   	}
> +	dev = &rte_eth_devices[port_id];
> +	if (dev->data->rx_queue_state[queue_id] ==
> +	    RTE_ETH_QUEUE_STATE_HAIRPIN) {

It looks like the line break above is not required; removing it would
make the code a bit shorter.

> +		rte_errno = EINVAL;
> +		return NULL;
> +	}
>   	struct rte_eth_rxtx_callback *cb = rte_zmalloc(NULL, sizeof(*cb), 0);
>   
>   	if (cb == NULL) {
> @@ -4058,6 +4244,8 @@ int rte_eth_set_queue_rate_limit(uint16_t port_id, uint16_t queue_idx,
>   	rte_errno = ENOTSUP;
>   	return NULL;
>   #endif
> +	struct rte_eth_dev *dev;
> +
>   	/* check input parameters */
>   	if (!rte_eth_dev_is_valid_port(port_id) || fn == NULL ||
>   		    queue_id >= rte_eth_devices[port_id].data->nb_tx_queues) {
> @@ -4065,6 +4253,13 @@ int rte_eth_set_queue_rate_limit(uint16_t port_id, uint16_t queue_idx,
>   		return NULL;
>   	}
>   
> +	dev = &rte_eth_devices[port_id];
> +	if (dev->data->tx_queue_state[queue_id] ==
> +	    RTE_ETH_QUEUE_STATE_HAIRPIN) {

It looks like the line break above is not required; removing it would
make the code a bit shorter.

> +		rte_errno = EINVAL;
> +		return NULL;
> +	}
> +
>   	struct rte_eth_rxtx_callback *cb = rte_zmalloc(NULL, sizeof(*cb), 0);
>   
>   	if (cb == NULL) {
> @@ -4510,6 +4705,21 @@ int rte_eth_set_queue_rate_limit(uint16_t port_id, uint16_t queue_idx,
>   }
>   
>   int
> +rte_eth_dev_hairpin_capability_get(uint16_t port_id,
> +				   struct rte_eth_hairpin_cap *cap)
> +{
> +	struct rte_eth_dev *dev;
> +
> +	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
> +
> +	dev = &rte_eth_devices[port_id];
> +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->hairpin_cap_get,
> +				-ENOTSUP);

I think it would be useful to memset() cap with 0 to be sure that it is 
initialized.
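
I.e., a one-line sketch, placed right after the port_id validation:

	memset(cap, 0, sizeof(*cap));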

> +	return eth_err(port_id, (*dev->dev_ops->hairpin_cap_get)
> +		       (dev, cap));


Please consider avoiding the line breaks above, to make the code a bit
shorter. Of course, it should not exceed the line length limit.
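
E.g., a sketch of the shorter form (it should still fit within the line
length limit):

	return eth_err(port_id, (*dev->dev_ops->hairpin_cap_get)(dev, cap));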

> +}
> +
> +int
>   rte_eth_dev_pool_ops_supported(uint16_t port_id, const char *pool)
>   {
>   	struct rte_eth_dev *dev;
> diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
> index d937fb4..29dcfea 100644
> --- a/lib/librte_ethdev/rte_ethdev.h
> +++ b/lib/librte_ethdev/rte_ethdev.h
> @@ -804,6 +804,46 @@ struct rte_eth_txconf {
>   };
>   
>   /**
> + * @warning
> + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
> + *
> + * A structure used to return the hairpin capabilities that are supportd.

supportd -> supported

> + */
> +struct rte_eth_hairpin_cap {
> +	int16_t max_n_queues;
> +	/**< The max number of hairpin queuesi. -1 no limit. */

I'm not sure that I like the type difference from max_rx_queues here.
I think it would be better to use uint16_t. I would say there is no
point in highlighting "no limit": first of all, I'm not sure that it
makes sense; second, UINT16_MAX will obviously do the job in uint16_t.

Is it both Rx and Tx?

> +	int16_t max_rx_2_tx;
> +	/**< Max number of Rx queues to be connected to one Tx queue. */
> +	int16_t max_tx_2_rx;
> +	/**< Max number of Tx queues to be connected to one Rx queue. */

I would prefer to have uint16_t here as well. Mainly for consistency.

> +	uint16_t max_nb_desc; /**< The max num of descriptors. */

Is it common for Rx and Tx? What about min?
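
To illustrate, a sketch of the structure with consistent unsigned
types (field names kept from the patch; the comment is my assumption
of the intended semantics):

	struct rte_eth_hairpin_cap {
		uint16_t max_n_queues; /* UINT16_MAX means no limit */
		uint16_t max_rx_2_tx;
		uint16_t max_tx_2_rx;
		uint16_t max_nb_desc;
	};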

> +};
> +
> +#define RTE_ETH_MAX_HAIRPIN_PEERS 32
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
> + *
> + * A structure used to hold hairpin peer data.
> + */
> +struct rte_eth_hairpin_peer {
> +	uint16_t port; /**< Peer port. */
> +	uint16_t queue; /**< Peer queue. */
> +};
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
> + *
> + * A structure used to configure hairpin binding.

It should be explained what happens if an Rx queue has many Tx queues.
Is the packet duplicated? Or distributed?

> + */
> +struct rte_eth_hairpin_conf {
> +	uint16_t peer_n; /**< The number of peers. */
> +	struct rte_eth_hairpin_peer peers[RTE_ETH_MAX_HAIRPIN_PEERS];
> +};
> +
> +/**
>    * A structure contains information about HW descriptor ring limitations.
>    */
>   struct rte_eth_desc_lim {
> @@ -1080,6 +1120,8 @@ struct rte_eth_conf {
>   /**< Device supports Rx queue setup after device started*/
>   #define RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP 0x00000002
>   /**< Device supports Tx queue setup after device started*/
> +#define RTE_ETH_DEV_CAPA_HAIRPIN_SUPPORT 0x00000004
> +/**< Device supports hairpin queues. */

Do we really need it? Isn't rte_eth_dev_hairpin_capability_get() returning
-ENOTSUP sufficient?
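
From the application point of view, a probe like the sketch below
(error handling elided) seems sufficient without an extra capability
flag:

	struct rte_eth_hairpin_cap cap;

	if (rte_eth_dev_hairpin_capability_get(port_id, &cap) != 0) {
		/* hairpin is not supported on this port */
	}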

>   /*
>    * If new Tx offload capabilities are defined, they also must be
> @@ -1277,6 +1319,7 @@ struct rte_eth_dcb_info {
>    */
>   #define RTE_ETH_QUEUE_STATE_STOPPED 0
>   #define RTE_ETH_QUEUE_STATE_STARTED 1
> +#define RTE_ETH_QUEUE_STATE_HAIRPIN 2
>   
>   #define RTE_ETH_ALL RTE_MAX_ETHPORTS
>   
> @@ -1771,6 +1814,34 @@ int rte_eth_rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
>   		struct rte_mempool *mb_pool);
>   
>   /**
> + * @warning
> + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
> + *
> + * Allocate and set up a hairpin receive queue for an Ethernet device.
> + *
> + * The function set up the selected queue to be used in hairpin.
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param rx_queue_id
> + *   The index of the receive queue to set up.
> + *   The value must be in the range [0, nb_rx_queue - 1] previously supplied
> + *   to rte_eth_dev_configure().
> + * @param nb_rx_desc
> + *   The number of receive descriptors to allocate for the receive ring.

Is it allowed to specify 0 to pick the driver's recommended value?

> + * @param conf
> + *   The pointer to the hairpin configuration.
> + * @return
> + *   - 0: Success, receive queue correctly set up.
> + *   - -EINVAL: Selected Queue can't be configured for hairpin.
> + *   - -ENOMEM: Unable to allocate the resources required for the queue.

Please follow a return value description style similar to
rte_eth_dev_info_get(), which is more common in this file.
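
I.e., roughly the following (a sketch based on that function's
description; the exact wording is up to you):

 * @return
 *   - (0) if successful.
 *   - (-ENOTSUP) if the device does not support hairpin queues.
 *   - (-EINVAL) if the selected queue can't be configured for hairpin.
 *   - (-ENOMEM) if unable to allocate the resources required for the queue.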

> + */
> +__rte_experimental
> +int rte_eth_rx_hairpin_queue_setup
> +	(uint16_t port_id, uint16_t rx_queue_id, uint16_t nb_rx_desc,
> +	 const struct rte_eth_hairpin_conf *conf);
> +
> +/**
>    * Allocate and set up a transmit queue for an Ethernet device.
>    *
>    * @param port_id
> @@ -1823,6 +1894,33 @@ int rte_eth_tx_queue_setup(uint16_t port_id, uint16_t tx_queue_id,
>   		const struct rte_eth_txconf *tx_conf);
>   
>   /**
> + * @warning
> + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
> + *
> + * Allocate and set up a transmit hairpin queue for an Ethernet device.
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param tx_queue_id
> + *   The index of the transmit queue to set up.
> + *   The value must be in the range [0, nb_tx_queue - 1] previously supplied
> + *   to rte_eth_dev_configure().
> + * @param nb_tx_desc
> + *   The number of transmit descriptors to allocate for the transmit ring.

Is it allowed to specify 0 to pick the driver's recommended value?

> + * @param conf
> + *   The hairpin configuration.
> + *
> + * @return
> + *   - 0: Success, transmit queue correctly set up.
> + *   - -EINVAL: Selected Queue can't be configured for hairpin.
> + *   - -ENOMEM: Unable to allocate the resources required for the queue.

Please follow a return value description style similar to
rte_eth_dev_info_get(), which is more common in this file.

> + */
> +__rte_experimental
> +int rte_eth_tx_hairpin_queue_setup
> +	(uint16_t port_id, uint16_t tx_queue_id, uint16_t nb_tx_desc,
> +	 const struct rte_eth_hairpin_conf *conf);
> +
> +/**
>    * Return the NUMA socket to which an Ethernet device is connected
>    *
>    * @param port_id
> @@ -4037,6 +4135,22 @@ int rte_eth_dev_adjust_nb_rx_tx_desc(uint16_t port_id,
>   void *
>   rte_eth_dev_get_sec_ctx(uint16_t port_id);
>   
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
> + *
> + * Query the device hairpin capabilities.
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param cap
> + *   Pointer to a structure that will hold the hairpin capabilities.
> + * @return
> + *   - 0 on success, -ENOTSUP if the device doesn't support hairpin.

Please follow a return value description style similar to
rte_eth_dev_info_get(), which is more common in this file.

> + */
> +__rte_experimental
> +int rte_eth_dev_hairpin_capability_get(uint16_t port_id,
> +				       struct rte_eth_hairpin_cap *cap);
>   
>   #include <rte_ethdev_core.h>
>   
> @@ -4137,6 +4251,12 @@ int rte_eth_dev_adjust_nb_rx_tx_desc(uint16_t port_id,
>   		RTE_ETHDEV_LOG(ERR, "Invalid RX queue_id=%u\n", queue_id);
>   		return 0;
>   	}
> +	if (dev->data->rx_queue_state[queue_id] ==
> +	    RTE_ETH_QUEUE_STATE_HAIRPIN) {
> +		RTE_ETHDEV_LOG(ERR, "RX queue_id=%u is hairpin queue\n",
> +			       queue_id);
> +		return 0;
> +	}
>   #endif
>   	nb_rx = (*dev->rx_pkt_burst)(dev->data->rx_queues[queue_id],
>   				     rx_pkts, nb_pkts);
> @@ -4403,6 +4523,12 @@ static inline int rte_eth_tx_descriptor_status(uint16_t port_id,
>   		RTE_ETHDEV_LOG(ERR, "Invalid TX queue_id=%u\n", queue_id);
>   		return 0;
>   	}
> +	if (dev->data->tx_queue_state[queue_id] ==
> +	    RTE_ETH_QUEUE_STATE_HAIRPIN) {
> +		RTE_ETHDEV_LOG(ERR, "TX queue_id=%u is hairpin queue\n",
> +			       queue_id);
> +		return 0;
> +	}
>   #endif
>   
>   #ifdef RTE_ETHDEV_RXTX_CALLBACKS
> diff --git a/lib/librte_ethdev/rte_ethdev_core.h b/lib/librte_ethdev/rte_ethdev_core.h
> index dcb5ae6..ef46e71 100644
> --- a/lib/librte_ethdev/rte_ethdev_core.h
> +++ b/lib/librte_ethdev/rte_ethdev_core.h
> @@ -250,6 +250,12 @@ typedef int (*eth_rx_queue_setup_t)(struct rte_eth_dev *dev,
>   				    struct rte_mempool *mb_pool);
>   /**< @internal Set up a receive queue of an Ethernet device. */
>   
> +typedef int (*eth_rx_hairpin_queue_setup_t)
> +	(struct rte_eth_dev *dev, uint16_t rx_queue_id,
> +	 uint16_t nb_rx_desc,
> +	 const struct rte_eth_hairpin_conf *conf);
> +/**< @internal Set up a receive hairpin queue of an Ethernet device. */
> +
>   typedef int (*eth_tx_queue_setup_t)(struct rte_eth_dev *dev,
>   				    uint16_t tx_queue_id,
>   				    uint16_t nb_tx_desc,
> @@ -257,6 +263,12 @@ typedef int (*eth_tx_queue_setup_t)(struct rte_eth_dev *dev,
>   				    const struct rte_eth_txconf *tx_conf);
>   /**< @internal Setup a transmit queue of an Ethernet device. */
>   
> +typedef int (*eth_tx_hairpin_queue_setup_t)
> +	(struct rte_eth_dev *dev, uint16_t tx_queue_id,
> +	 uint16_t nb_tx_desc,
> +	 const struct rte_eth_hairpin_conf *hairpin_conf);
> +/**< @internal Setup a transmit hairpin queue of an Ethernet device. */
> +
>   typedef int (*eth_rx_enable_intr_t)(struct rte_eth_dev *dev,
>   				    uint16_t rx_queue_id);
>   /**< @internal Enable interrupt of a receive queue of an Ethernet device. */
> @@ -505,6 +517,10 @@ typedef int (*eth_pool_ops_supported_t)(struct rte_eth_dev *dev,
>   						const char *pool);
>   /**< @internal Test if a port supports specific mempool ops */
>   
> +typedef int (*eth_hairpin_cap_get_t)(struct rte_eth_dev *dev,
> +				     struct rte_eth_hairpin_cap *cap);
> +/**< @internal get the hairpin capabilities. */
> +
>   /**
>    * @internal A structure containing the functions exported by an Ethernet driver.
>    */
> @@ -557,6 +573,8 @@ struct eth_dev_ops {
>   	eth_queue_start_t          tx_queue_start;/**< Start TX for a queue. */
>   	eth_queue_stop_t           tx_queue_stop; /**< Stop TX for a queue. */
>   	eth_rx_queue_setup_t       rx_queue_setup;/**< Set up device RX queue. */
> +	eth_rx_hairpin_queue_setup_t rx_hairpin_queue_setup;
> +	/**< Set up device RX hairpin queue. */
>   	eth_queue_release_t        rx_queue_release; /**< Release RX queue. */
>   	eth_rx_queue_count_t       rx_queue_count;
>   	/**< Get the number of used RX descriptors. */
> @@ -568,6 +586,8 @@ struct eth_dev_ops {
>   	eth_rx_enable_intr_t       rx_queue_intr_enable;  /**< Enable Rx queue interrupt. */
>   	eth_rx_disable_intr_t      rx_queue_intr_disable; /**< Disable Rx queue interrupt. */
>   	eth_tx_queue_setup_t       tx_queue_setup;/**< Set up device TX queue. */
> +	eth_tx_hairpin_queue_setup_t tx_hairpin_queue_setup;
> +	/**< Set up device TX hairpin queue. */
>   	eth_queue_release_t        tx_queue_release; /**< Release TX queue. */
>   	eth_tx_done_cleanup_t      tx_done_cleanup;/**< Free tx ring mbufs */
>   
> @@ -639,6 +659,9 @@ struct eth_dev_ops {
>   
>   	eth_pool_ops_supported_t pool_ops_supported;
>   	/**< Test if a port supports specific mempool ops */
> +
> +	eth_hairpin_cap_get_t hairpin_cap_get;
> +	/**< Returns the hairpin capabilities. */
>   };
>   
>   /**
> @@ -746,9 +769,9 @@ struct rte_eth_dev_data {
>   		dev_started : 1,   /**< Device state: STARTED(1) / STOPPED(0). */
>   		lro         : 1;   /**< RX LRO is ON(1) / OFF(0) */
>   	uint8_t rx_queue_state[RTE_MAX_QUEUES_PER_PORT];
> -			/**< Queues state: STARTED(1) / STOPPED(0). */
> +		/**< Queues state: HAIRPIN(2) / STARTED(1) / STOPPED(0). */
>   	uint8_t tx_queue_state[RTE_MAX_QUEUES_PER_PORT];
> -			/**< Queues state: STARTED(1) / STOPPED(0). */
> +		/**< Queues state: HAIRPIN(2) STARTED(1) / STOPPED(0). */
>   	uint32_t dev_flags;             /**< Capabilities. */
>   	enum rte_kernel_driver kdrv;    /**< Kernel driver passthrough. */
>   	int numa_node;                  /**< NUMA node connection. */
> diff --git a/lib/librte_ethdev/rte_ethdev_version.map b/lib/librte_ethdev/rte_ethdev_version.map
> index 6df42a4..77b0a86 100644
> --- a/lib/librte_ethdev/rte_ethdev_version.map
> +++ b/lib/librte_ethdev/rte_ethdev_version.map
> @@ -283,4 +283,9 @@ EXPERIMENTAL {
>   
>   	# added in 19.08
>   	rte_eth_read_clock;
> +
> +	# added in 19.11
> +	rte_eth_rx_hairpin_queue_setup;
> +	rte_eth_tx_hairpin_queue_setup;
> +	rte_eth_dev_hairpin_capability_get;
>   };



* Re: [dpdk-dev] [PATCH v2 14/14] doc: add hairpin feature
  2019-10-08 14:55     ` Andrew Rybchenko
@ 2019-10-10  8:24       ` Ori Kam
  0 siblings, 0 replies; 186+ messages in thread
From: Ori Kam @ 2019-10-10  8:24 UTC (permalink / raw)
  To: Andrew Rybchenko, John McNamara, Marko Kovacevic
  Cc: dev, jingjing.wu, stephen

Hi Andrew,
PSB

> -----Original Message-----
> From: Andrew Rybchenko <arybchenko@solarflare.com>
> Sent: Tuesday, October 8, 2019 5:56 PM
> To: Ori Kam <orika@mellanox.com>; John McNamara
> <john.mcnamara@intel.com>; Marko Kovacevic <marko.kovacevic@intel.com>
> Cc: dev@dpdk.org; jingjing.wu@intel.com; stephen@networkplumber.org
> Subject: Re: [dpdk-dev] [PATCH v2 14/14] doc: add hairpin feature
> 
> On 10/4/19 10:54 PM, Ori Kam wrote:
> > This commit adds the hairpin feature to the release notes.
> >
> > Signed-off-by: Ori Kam <orika@mellanox.com>
> > Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> >
> > ---
> >   doc/guides/rel_notes/release_19_11.rst | 5 +++++
> >   1 file changed, 5 insertions(+)
> >
> > diff --git a/doc/guides/rel_notes/release_19_11.rst
> b/doc/guides/rel_notes/release_19_11.rst
> > index cd4e350..4bfd418 100644
> > --- a/doc/guides/rel_notes/release_19_11.rst
> > +++ b/doc/guides/rel_notes/release_19_11.rst
> > @@ -87,6 +87,10 @@ New Features
> >
> >     Added support for the ``RTE_ETH_DEV_CLOSE_REMOVE`` flag.
> >
> > +* **Added hairpin queue.**
> > +
> > +  On supported nics, we can now setup haipin queue which will offload
> packets from the wire,
> > +  back to the wire.
> 
> One more empty line is required above.
> Also I guess nics should be NICs.
> 

Will fix.

> >   Removed Items
> >   -------------
> > @@ -286,4 +290,5 @@ Tested Platforms
> >     * Added support for VLAN push flow offload command.
> >     * Added support for VLAN set PCP offload command.
> >     * Added support for VLAN set VID offload command.
> > +  * Added hairpin support.
> >



* Re: [dpdk-dev] [PATCH v2 01/14] ethdev: add support for hairpin queue
  2019-10-08 16:11     ` Andrew Rybchenko
@ 2019-10-10 21:07       ` Ori Kam
  2019-10-14  9:37         ` Andrew Rybchenko
  0 siblings, 1 reply; 186+ messages in thread
From: Ori Kam @ 2019-10-10 21:07 UTC (permalink / raw)
  To: Andrew Rybchenko, Thomas Monjalon, Ferruh Yigit; +Cc: dev, jingjing.wu, stephen

Hi Andrew,

Thanks for your comments,

PSB,
Ori

> -----Original Message-----
> From: Andrew Rybchenko <arybchenko@solarflare.com>
> Sent: Tuesday, October 8, 2019 7:11 PM
> To: Ori Kam <orika@mellanox.com>; Thomas Monjalon
> <thomas@monjalon.net>; Ferruh Yigit <ferruh.yigit@intel.com>
> Cc: dev@dpdk.org; jingjing.wu@intel.com; stephen@networkplumber.org
> Subject: Re: [PATCH v2 01/14] ethdev: add support for hairpin queue
> 
> Hi Ori,
> 
> thanks for updated version. See my notes below.
> 
> There are few style notes about line breaks which are not defined in
> coding style. Of course, it may be ignored.
> 

I will fix what I can.

> On 10/4/19 10:54 PM, Ori Kam wrote:
> > This commit introduces the hairpin queue type.
> >
> > The hairpin queue is built from an Rx queue bound to a Tx queue.
> > It is used to offload traffic coming from the wire and redirect it back
> > to the wire.
> >
> > There are 3 new functions:
> > - rte_eth_dev_hairpin_capability_get
> > - rte_eth_rx_hairpin_queue_setup
> > - rte_eth_tx_hairpin_queue_setup
> >
> > In order to use the queue, there is a need to create rte_flow
> > with queue / RSS action that targets one or more of the Rx queues.
> >
> > Signed-off-by: Ori Kam <orika@mellanox.com>
> 
> rte_eth_dev_[rt]x_queue_stop() should return error if used for hairpin
> queue.
> Right now rte_eth_dev_[rt]x_queue_start() will return 0. Not sure about it.
> What about rte_eth_rx_queue_info_get() and rte_eth_tx_queue_info_get()?
> Any other Rx/Tx queue functions?
> 

I will add an error return to both functions (Tx/Rx).
I don't see any other functions (only the info_get and queue_stop).

> > ---
> > V2:
> >   - update according to ML comments.
> >
> > ---
> >   lib/librte_ethdev/rte_ethdev.c           | 214
> ++++++++++++++++++++++++++++++-
> >   lib/librte_ethdev/rte_ethdev.h           | 126 ++++++++++++++++++
> >   lib/librte_ethdev/rte_ethdev_core.h      |  27 +++-
> >   lib/librte_ethdev/rte_ethdev_version.map |   5 +
> >   4 files changed, 368 insertions(+), 4 deletions(-)
> >
> > diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
> > index af82360..ee8af42 100644
> > --- a/lib/librte_ethdev/rte_ethdev.c
> > +++ b/lib/librte_ethdev/rte_ethdev.c
> > @@ -1752,12 +1752,102 @@ struct rte_eth_dev *
> >   		if (!dev->data->min_rx_buf_size ||
> >   		    dev->data->min_rx_buf_size > mbp_buf_size)
> >   			dev->data->min_rx_buf_size = mbp_buf_size;
> > +		if (dev->data->rx_queue_state[rx_queue_id] ==
> > +		    RTE_ETH_QUEUE_STATE_HAIRPIN)
> > +			dev->data->rx_queue_state[rx_queue_id] =
> > +				RTE_ETH_QUEUE_STATE_STOPPED;
> 
> I don't understand it. Why is rte_eth_rx_queue_setup() changed?
> 

This was done so the user can reset the queue back to a normal queue.
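
The flow I had in mind is roughly (a sketch; the names are only for
illustration):

	/* qid was set up with rte_eth_rx_hairpin_queue_setup() earlier. */
	rte_eth_rx_queue_setup(port_id, qid, nb_desc, socket_id, NULL, pool);
	/* The queue state goes from HAIRPIN back to STOPPED. */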

> >   	}
> >
> >   	return eth_err(port_id, ret);
> >   }
> >
> >   int
> > +rte_eth_rx_hairpin_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
> > +			       uint16_t nb_rx_desc,
> > +			       const struct rte_eth_hairpin_conf *conf)
> > +{
> > +	int ret;
> > +	struct rte_eth_dev *dev;
> > +	struct rte_eth_hairpin_cap cap;
> > +	struct rte_eth_dev_info dev_info;
> > +	void **rxq;
> > +	int i;
> > +	int count = 0;
> > +
> > +	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
> > +
> > +	dev = &rte_eth_devices[port_id];
> > +	if (rx_queue_id >= dev->data->nb_rx_queues) {
> > +		RTE_ETHDEV_LOG(ERR, "Invalid RX queue_id=%u\n",
> rx_queue_id);
> > +		return -EINVAL;
> > +	}
> > +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_infos_get, -ENOTSUP);
> > +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->hairpin_cap_get,
> > +				-ENOTSUP);
> > +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_hairpin_queue_setup,
> > +				-ENOTSUP);
> > +	rte_eth_dev_hairpin_capability_get(port_id, &cap);
> 
> The return value should be checked; it makes the hairpin_cap_get check
> above unnecessary.
> 
I will add a check.
 
> > +	/* Use default specified by driver, if nb_rx_desc is zero */
> > +	if (nb_rx_desc == 0)
> > +		nb_rx_desc = cap.max_nb_desc;
> > +	if (nb_rx_desc > cap.max_nb_desc) {
> > +		RTE_ETHDEV_LOG(ERR,
> > +			       "Invalid value for nb_rx_desc(=%hu), should be: "
> > +			       "<= %hu", nb_rx_desc, cap.max_nb_desc);
> > +		return -EINVAL;
> > +	}
> > +	ret = rte_eth_dev_info_get(port_id, &dev_info);
> > +	if (ret != 0)
> > +		return ret;
> > +	if (dev->data->dev_started &&
> > +		!(dev_info.dev_capa &
> > +		  RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP))
> > +		return -EBUSY;
> > +	if (dev->data->dev_started &&
> > +		(dev->data->rx_queue_state[rx_queue_id] !=
> > +		 RTE_ETH_QUEUE_STATE_STOPPED))
> > +		return -EBUSY;
> > +	if (conf->peer_n > cap.max_rx_2_tx) {
> > +		RTE_ETHDEV_LOG(ERR,
> > +			       "Invalid value for number of peers(=%hu), "
> > +			       "should be: <= %hu", conf->peer_n,
> > +			       cap.max_rx_2_tx);
> > +		return -EINVAL;
> > +	}
> > +	if (conf->peer_n == 0) {
> > +		RTE_ETHDEV_LOG(ERR,
> > +			       "Invalid value for number of peers(=%hu), "
> > +			       "should be: > 0", conf->peer_n);
> > +		return -EINVAL;
> > +	}
> > +	if (cap.max_n_queues != -1) {
> > +		for (i = 0; i < dev->data->nb_rx_queues; i++) {
> > +			if (dev->data->rx_queue_state[i] ==
> > +			    RTE_ETH_QUEUE_STATE_HAIRPIN)
> > +				count++;
> > +		}
> > +		if (count > cap.max_n_queues) {
> > +			RTE_ETHDEV_LOG(ERR,
> > +				       "To many Rx hairpin queues %d", count);
> > +			return -EINVAL;
> > +		}
> > +	}
> > +	rxq = dev->data->rx_queues;
> > +	if (rxq[rx_queue_id]) {
> 
> Please, compare with NULL (I know that rte_eth_rx_queue_setup() does
> it like above).
>

O.K., I will change it, but may I ask why? If it is written the same way
in the rte_eth_rx_queue_setup function, why do you want this change?

> > +		RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_release,
> > +					-ENOTSUP);
> > +		(*dev->dev_ops->rx_queue_release)(rxq[rx_queue_id]);
> > +		rxq[rx_queue_id] = NULL;
> > +	}
> > +	ret = (*dev->dev_ops->rx_hairpin_queue_setup)(dev, rx_queue_id,
> > +						      nb_rx_desc, conf);
> > +	if (!ret)
> 
> Please, compare with 0
> 

Will do, but again, just for my knowledge: why?

> > +		dev->data->rx_queue_state[rx_queue_id] =
> > +			RTE_ETH_QUEUE_STATE_HAIRPIN;
> > +	return eth_err(port_id, ret);
> > +}
> > +
> > +int
> >   rte_eth_tx_queue_setup(uint16_t port_id, uint16_t tx_queue_id,
> >   		       uint16_t nb_tx_desc, unsigned int socket_id,
> >   		       const struct rte_eth_txconf *tx_conf)
> > @@ -1851,9 +1941,97 @@ struct rte_eth_dev *
> >   			__func__);
> >   		return -EINVAL;
> >   	}
> > +	ret = (*dev->dev_ops->tx_queue_setup)(dev, tx_queue_id, nb_tx_desc,
> > +					      socket_id, &local_conf);
> > +	if (!ret)
> > +		if (dev->data->tx_queue_state[tx_queue_id] ==
> > +		    RTE_ETH_QUEUE_STATE_HAIRPIN)
> > +			dev->data->tx_queue_state[tx_queue_id] =
> > +				RTE_ETH_QUEUE_STATE_STOPPED;
> 
> Why is it changed?
> 

Like in the Rx case, to enable switching back to a normal queue.

> > +	return eth_err(port_id, ret);
> > +}
> > +
> > +int
> > +rte_eth_tx_hairpin_queue_setup(uint16_t port_id, uint16_t tx_queue_id,
> > +			       uint16_t nb_tx_desc,
> > +			       const struct rte_eth_hairpin_conf *conf)
> > +{
> > +	struct rte_eth_dev *dev;
> > +	struct rte_eth_hairpin_cap cap;
> > +	struct rte_eth_dev_info dev_info;
> > +	void **txq;
> > +	int i;
> > +	int count = 0;
> > +	int ret;
> >
> > -	return eth_err(port_id, (*dev->dev_ops->tx_queue_setup)(dev,
> > -		       tx_queue_id, nb_tx_desc, socket_id, &local_conf));
> > +	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
> > +	dev = &rte_eth_devices[port_id];
> > +	if (tx_queue_id >= dev->data->nb_tx_queues) {
> > +		RTE_ETHDEV_LOG(ERR, "Invalid TX queue_id=%u\n",
> tx_queue_id);
> > +		return -EINVAL;
> > +	}
> > +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_infos_get, -ENOTSUP);
> > +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->hairpin_cap_get,
> > +				-ENOTSUP);
> > +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_hairpin_queue_setup,
> > +				-ENOTSUP);
> > +	rte_eth_dev_info_get(port_id, &dev_info);
> 
> The return value should be checked.
> 
> > +	rte_eth_dev_hairpin_capability_get(port_id, &cap);
> 
> Check the return value, and then you can rely on the hairpin_cap_get
> check done inside.
> 

Same as above, I will add a check, but I'm not sure I understand the
second part. Just to make sure I understand: you are saying that if I
don't check the return value, I'm not allowed to check the cap, right?

> > +	/* Use default specified by driver, if nb_tx_desc is zero */
> > +	if (nb_tx_desc == 0)
> > +		nb_tx_desc = cap.max_nb_desc;
> > +	if (nb_tx_desc > cap.max_nb_desc) {
> > +		RTE_ETHDEV_LOG(ERR,
> > +			       "Invalid value for nb_tx_desc(=%hu), should be: "
> > +			       "<= %hu", nb_tx_desc, cap.max_nb_desc);
> > +		return -EINVAL;
> > +	}
> > +	if (conf->peer_n > cap.max_tx_2_rx) {
> > +		RTE_ETHDEV_LOG(ERR,
> > +			       "Invalid value for number of peers(=%hu), "
> > +			       "should be: <= %hu", conf->peer_n,
> > +			       cap.max_tx_2_rx);
> > +		return -EINVAL;
> > +	}
> > +	if (conf->peer_n == 0) {
> > +		RTE_ETHDEV_LOG(ERR,
> > +			       "Invalid value for number of peers(=%hu), "
> > +			       "should be: > 0", conf->peer_n);
> > +		return -EINVAL;
> > +	}
> > +	if (cap.max_n_queues != -1) {
> > +		for (i = 0; i < dev->data->nb_tx_queues; i++) {
> > +			if (dev->data->tx_queue_state[i] ==
> > +			    RTE_ETH_QUEUE_STATE_HAIRPIN)
> > +				count++;
> > +		}
> > +		if (count > cap.max_n_queues) {
> > +			RTE_ETHDEV_LOG(ERR,
> > +				       "To many Rx hairpin queues %d", count);
> > +			return -EINVAL;
> > +		}
> > +	}
> > +	if (dev->data->dev_started &&
> > +		!(dev_info.dev_capa &
> > +		  RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP))
> > +		return -EBUSY;
> > +	if (dev->data->dev_started &&
> > +		(dev->data->tx_queue_state[tx_queue_id] !=
> > +		 RTE_ETH_QUEUE_STATE_STOPPED))
> > +		return -EBUSY;
> > +	txq = dev->data->tx_queues;
> > +	if (txq[tx_queue_id]) {
> 
> Please, compare with NULL
> 

O.K.

> > +		RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_queue_release,
> > +					-ENOTSUP);
> > +		(*dev->dev_ops->tx_queue_release)(txq[tx_queue_id]);
> > +		txq[tx_queue_id] = NULL;
> > +	}
> > +	ret = (*dev->dev_ops->tx_hairpin_queue_setup)
> > +		(dev, tx_queue_id, nb_tx_desc, conf);
> > +	if (!ret)
> 
> Please, compare with 0
> 

O.K.

> > +		dev->data->tx_queue_state[tx_queue_id] =
> > +			RTE_ETH_QUEUE_STATE_HAIRPIN;
> > +	return eth_err(port_id, ret);
> >   }
> >
> >   void
> > @@ -3981,12 +4159,20 @@ int rte_eth_set_queue_rate_limit(uint16_t
> port_id, uint16_t queue_idx,
> >   	rte_errno = ENOTSUP;
> >   	return NULL;
> >   #endif
> > +	struct rte_eth_dev *dev;
> > +
> >   	/* check input parameters */
> >   	if (!rte_eth_dev_is_valid_port(port_id) || fn == NULL ||
> >   		    queue_id >= rte_eth_devices[port_id].data->nb_rx_queues) {
> >   		rte_errno = EINVAL;
> >   		return NULL;
> >   	}
> > +	dev = &rte_eth_devices[port_id];
> > +	if (dev->data->rx_queue_state[queue_id] ==
> > +	    RTE_ETH_QUEUE_STATE_HAIRPIN) {
> 
> It looks like the line break above is not required; removing it would
> make the code a bit shorter.
> 

Will fix if possible.

> > +		rte_errno = EINVAL;
> > +		return NULL;
> > +	}
> >   	struct rte_eth_rxtx_callback *cb = rte_zmalloc(NULL, sizeof(*cb), 0);
> >
> >   	if (cb == NULL) {
> > @@ -4058,6 +4244,8 @@ int rte_eth_set_queue_rate_limit(uint16_t port_id,
> uint16_t queue_idx,
> >   	rte_errno = ENOTSUP;
> >   	return NULL;
> >   #endif
> > +	struct rte_eth_dev *dev;
> > +
> >   	/* check input parameters */
> >   	if (!rte_eth_dev_is_valid_port(port_id) || fn == NULL ||
> >   		    queue_id >= rte_eth_devices[port_id].data->nb_tx_queues) {
> > @@ -4065,6 +4253,13 @@ int rte_eth_set_queue_rate_limit(uint16_t
> port_id, uint16_t queue_idx,
> >   		return NULL;
> >   	}
> >
> > +	dev = &rte_eth_devices[port_id];
> > +	if (dev->data->tx_queue_state[queue_id] ==
> > +	    RTE_ETH_QUEUE_STATE_HAIRPIN) {
> 
> It looks like the line break above is not required; removing it would
> make the code a bit shorter.
> 

Will fix.

> > +		rte_errno = EINVAL;
> > +		return NULL;
> > +	}
> > +
> >   	struct rte_eth_rxtx_callback *cb = rte_zmalloc(NULL, sizeof(*cb), 0);
> >
> >   	if (cb == NULL) {
> > @@ -4510,6 +4705,21 @@ int rte_eth_set_queue_rate_limit(uint16_t
> port_id, uint16_t queue_idx,
> >   }
> >
> >   int
> > +rte_eth_dev_hairpin_capability_get(uint16_t port_id,
> > +				   struct rte_eth_hairpin_cap *cap)
> > +{
> > +	struct rte_eth_dev *dev;
> > +
> > +	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
> > +
> > +	dev = &rte_eth_devices[port_id];
> > +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->hairpin_cap_get,
> > +				-ENOTSUP);
> 
> I think it would be useful to memset() cap with 0 to be sure that it is
> initialized.
> 

O.K., but if we follow your comments from above, the result is not valid
if we return an error, so it is not a must; but like I said, I don't mind
adding it.

> > +	return eth_err(port_id, (*dev->dev_ops->hairpin_cap_get)
> > +		       (dev, cap));
> 
> 
> Please consider avoiding the line breaks above, to make the code a bit
> shorter. Of course, it should not exceed the line length limit.
> 

Will check.

> > +}
> > +
> > +int
> >   rte_eth_dev_pool_ops_supported(uint16_t port_id, const char *pool)
> >   {
> >   	struct rte_eth_dev *dev;
> > diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
> > index d937fb4..29dcfea 100644
> > --- a/lib/librte_ethdev/rte_ethdev.h
> > +++ b/lib/librte_ethdev/rte_ethdev.h
> > @@ -804,6 +804,46 @@ struct rte_eth_txconf {
> >   };
> >
> >   /**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change, or be removed, without prior
> notice
> > + *
> > + * A structure used to return the hairpin capabilities that are supportd.
> 
> supportd -> supported
>

Will fix.
 
> > + */
> > +struct rte_eth_hairpin_cap {
> > +	int16_t max_n_queues;
> > +	/**< The max number of hairpin queuesi. -1 no limit. */
> 
> I'm not sure that I like the type difference from max_rx_queues here.
> I think it would be better to use uint16_t. I would say there is no
> point in highlighting "no limit": first of all, I'm not sure that it
> makes sense; second, UINT16_MAX will obviously do the job in uint16_t.
> 
> Is it both Rx and Tx?
> 
> > +	int16_t max_rx_2_tx;
> > +	/**< Max number of Rx queues to be connected to one Tx queue. */
> > +	int16_t max_tx_2_rx;
> > +	/**< Max number of Tx queues to be connected to one Rx queue. */
> 
> I would prefer to have uint16_t here as well. Mainly for consistency.
> 

Agree, will fix,

> > +	uint16_t max_nb_desc; /**< The max num of descriptors. */
> 
> Is it common for Rx and Tx? What about min?
> 

I think the min is always 1, by definition.

> > +};
> > +
> > +#define RTE_ETH_MAX_HAIRPIN_PEERS 32
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change, or be removed, without prior
> notice
> > + *
> > + * A structure used to hold hairpin peer data.
> > + */
> > +struct rte_eth_hairpin_peer {
> > +	uint16_t port; /**< Peer port. */
> > +	uint16_t queue; /**< Peer queue. */
> > +};
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change, or be removed, without prior
> notice
> > + *
> > + * A structure used to configure hairpin binding.
> 
> It should be explained what happens if an Rx queue has many Tx queues.
> Is the packet duplicated? Or distributed?
> 

Like we said before, I don't know; it depends on the NIC.
In the case of Mellanox, we don't support 1 to many.

> > + */
> > +struct rte_eth_hairpin_conf {
> > +	uint16_t peer_n; /**< The number of peers. */
> > +	struct rte_eth_hairpin_peer peers[RTE_ETH_MAX_HAIRPIN_PEERS];
> > +};
> > +
> > +/**
> >    * A structure contains information about HW descriptor ring limitations.
> >    */
> >   struct rte_eth_desc_lim {
> > @@ -1080,6 +1120,8 @@ struct rte_eth_conf {
> >   /**< Device supports Rx queue setup after device started*/
> >   #define RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP 0x00000002
> >   /**< Device supports Tx queue setup after device started*/
> > +#define RTE_ETH_DEV_CAPA_HAIRPIN_SUPPORT 0x00000004
> > +/**< Device supports hairpin queues. */
> 
> Do we really need it? Isn't rte_eth_dev_hairpin_capability_get() returning
> -ENOTSUP sufficient?
>

I agree with you, but my thinking was that by using the cap, the
application can avoid calling a function that will fail.
If you approve, I will remove this cap.
 
> >   /*
> >    * If new Tx offload capabilities are defined, they also must be
> > @@ -1277,6 +1319,7 @@ struct rte_eth_dcb_info {
> >    */
> >   #define RTE_ETH_QUEUE_STATE_STOPPED 0
> >   #define RTE_ETH_QUEUE_STATE_STARTED 1
> > +#define RTE_ETH_QUEUE_STATE_HAIRPIN 2
> >
> >   #define RTE_ETH_ALL RTE_MAX_ETHPORTS
> >
> > @@ -1771,6 +1814,34 @@ int rte_eth_rx_queue_setup(uint16_t port_id,
> uint16_t rx_queue_id,
> >   		struct rte_mempool *mb_pool);
> >
> >   /**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change, or be removed, without prior
> notice
> > + *
> > + * Allocate and set up a hairpin receive queue for an Ethernet device.
> > + *
> > + * The function set up the selected queue to be used in hairpin.
> > + *
> > + * @param port_id
> > + *   The port identifier of the Ethernet device.
> > + * @param rx_queue_id
> > + *   The index of the receive queue to set up.
> > + *   The value must be in the range [0, nb_rx_queue - 1] previously supplied
> > + *   to rte_eth_dev_configure().
> > + * @param nb_rx_desc
> > + *   The number of receive descriptors to allocate for the receive ring.
> 
> Is it allowed to specify 0 to pick the driver's recommended value?
> 

Yes (assuming you are talking about nb_rx_desc).

> > + * @param conf
> > + *   The pointer to the hairpin configuration.
> > + * @return
> > + *   - 0: Success, receive queue correctly set up.
> > + *   - -EINVAL: Selected Queue can't be configured for hairpin.
> > + *   - -ENOMEM: Unable to allocate the resources required for the queue.
> 
> Please follow a return value description style similar to
> rte_eth_dev_info_get(), which is more common in this file.
> 

O.K

> > + */
> > +__rte_experimental
> > +int rte_eth_rx_hairpin_queue_setup
> > +	(uint16_t port_id, uint16_t rx_queue_id, uint16_t nb_rx_desc,
> > +	 const struct rte_eth_hairpin_conf *conf);
> > +
> > +/**
> >    * Allocate and set up a transmit queue for an Ethernet device.
> >    *
> >    * @param port_id
> > @@ -1823,6 +1894,33 @@ int rte_eth_tx_queue_setup(uint16_t port_id,
> uint16_t tx_queue_id,
> >   		const struct rte_eth_txconf *tx_conf);
> >
> >   /**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change, or be removed, without prior
> notice
> > + *
> > + * Allocate and set up a transmit hairpin queue for an Ethernet device.
> > + *
> > + * @param port_id
> > + *   The port identifier of the Ethernet device.
> > + * @param tx_queue_id
> > + *   The index of the transmit queue to set up.
> > + *   The value must be in the range [0, nb_tx_queue - 1] previously supplied
> > + *   to rte_eth_dev_configure().
> > + * @param nb_tx_desc
> > + *   The number of transmit descriptors to allocate for the transmit ring.
> 
> Is it allowed to specify 0 to pick the driver's recommended value?
> 

Like above, yes.

> > + * @param conf
> > + *   The hairpin configuration.
> > + *
> > + * @return
> > + *   - 0: Success, transmit queue correctly set up.
> > + *   - -EINVAL: Selected Queue can't be configured for hairpin.
> > + *   - -ENOMEM: Unable to allocate the resources required for the queue.
> 
> Please follow a return value description style similar to
> rte_eth_dev_info_get(), which is more common in this file.
> 

Will fix.

> > + */
> > +__rte_experimental
> > +int rte_eth_tx_hairpin_queue_setup
> > +	(uint16_t port_id, uint16_t tx_queue_id, uint16_t nb_tx_desc,
> > +	 const struct rte_eth_hairpin_conf *conf);
> > +
> > +/**
> >    * Return the NUMA socket to which an Ethernet device is connected
> >    *
> >    * @param port_id
> > @@ -4037,6 +4135,22 @@ int rte_eth_dev_adjust_nb_rx_tx_desc(uint16_t
> port_id,
> >   void *
> >   rte_eth_dev_get_sec_ctx(uint16_t port_id);
> >
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change, or be removed, without prior
> notice
> > + *
> > + * Query the device hairpin capabilities.
> > + *
> > + * @param port_id
> > + *   The port identifier of the Ethernet device.
> > + * @param cap
> > + *   Pointer to a structure that will hold the hairpin capabilities.
> > + * @return
> > + *   - 0 on success, -ENOTSUP if the device doesn't support hairpin.
> 
> Please follow a return value description style similar to
> rte_eth_dev_info_get(), which is more common in this file.
> 

Will fix.

> > + */
> > +__rte_experimental
> > +int rte_eth_dev_hairpin_capability_get(uint16_t port_id,
> > +				       struct rte_eth_hairpin_cap *cap);
> >
> >   #include <rte_ethdev_core.h>
> >
> > @@ -4137,6 +4251,12 @@ int rte_eth_dev_adjust_nb_rx_tx_desc(uint16_t
> port_id,
> >   		RTE_ETHDEV_LOG(ERR, "Invalid RX queue_id=%u\n",
> queue_id);
> >   		return 0;
> >   	}
> > +	if (dev->data->rx_queue_state[queue_id] ==
> > +	    RTE_ETH_QUEUE_STATE_HAIRPIN) {
> > +		RTE_ETHDEV_LOG(ERR, "RX queue_id=%u is hairpin queue\n",
> > +			       queue_id);
> > +		return 0;
> > +	}
> >   #endif
> >   	nb_rx = (*dev->rx_pkt_burst)(dev->data->rx_queues[queue_id],
> >   				     rx_pkts, nb_pkts);
> > @@ -4403,6 +4523,12 @@ static inline int
> rte_eth_tx_descriptor_status(uint16_t port_id,
> >   		RTE_ETHDEV_LOG(ERR, "Invalid TX queue_id=%u\n",
> queue_id);
> >   		return 0;
> >   	}
> > +	if (dev->data->tx_queue_state[queue_id] ==
> > +	    RTE_ETH_QUEUE_STATE_HAIRPIN) {
> > +		RTE_ETHDEV_LOG(ERR, "TX queue_id=%u is hairpin queue\n",
> > +			       queue_id);
> > +		return 0;
> > +	}
> >   #endif
> >
> >   #ifdef RTE_ETHDEV_RXTX_CALLBACKS
> > diff --git a/lib/librte_ethdev/rte_ethdev_core.h
> b/lib/librte_ethdev/rte_ethdev_core.h
> > index dcb5ae6..ef46e71 100644
> > --- a/lib/librte_ethdev/rte_ethdev_core.h
> > +++ b/lib/librte_ethdev/rte_ethdev_core.h
> > @@ -250,6 +250,12 @@ typedef int (*eth_rx_queue_setup_t)(struct
> rte_eth_dev *dev,
> >   				    struct rte_mempool *mb_pool);
> >   /**< @internal Set up a receive queue of an Ethernet device. */
> >
> > +typedef int (*eth_rx_hairpin_queue_setup_t)
> > +	(struct rte_eth_dev *dev, uint16_t rx_queue_id,
> > +	 uint16_t nb_rx_desc,
> > +	 const struct rte_eth_hairpin_conf *conf);
> > +/**< @internal Set up a receive hairpin queue of an Ethernet device. */
> > +
> >   typedef int (*eth_tx_queue_setup_t)(struct rte_eth_dev *dev,
> >   				    uint16_t tx_queue_id,
> >   				    uint16_t nb_tx_desc,
> > @@ -257,6 +263,12 @@ typedef int (*eth_tx_queue_setup_t)(struct
> rte_eth_dev *dev,
> >   				    const struct rte_eth_txconf *tx_conf);
> >   /**< @internal Setup a transmit queue of an Ethernet device. */
> >
> > +typedef int (*eth_tx_hairpin_queue_setup_t)
> > +	(struct rte_eth_dev *dev, uint16_t tx_queue_id,
> > +	 uint16_t nb_tx_desc,
> > +	 const struct rte_eth_hairpin_conf *hairpin_conf);
> > +/**< @internal Setup a transmit hairpin queue of an Ethernet device. */
> > +
> >   typedef int (*eth_rx_enable_intr_t)(struct rte_eth_dev *dev,
> >   				    uint16_t rx_queue_id);
> >   /**< @internal Enable interrupt of a receive queue of an Ethernet device. */
> > @@ -505,6 +517,10 @@ typedef int (*eth_pool_ops_supported_t)(struct
> rte_eth_dev *dev,
> >   						const char *pool);
> >   /**< @internal Test if a port supports specific mempool ops */
> >
> > +typedef int (*eth_hairpin_cap_get_t)(struct rte_eth_dev *dev,
> > +				     struct rte_eth_hairpin_cap *cap);
> > +/**< @internal get the hairpin capabilities. */
> > +
> >   /**
> >    * @internal A structure containing the functions exported by an Ethernet
> driver.
> >    */
> > @@ -557,6 +573,8 @@ struct eth_dev_ops {
> >   	eth_queue_start_t          tx_queue_start;/**< Start TX for a queue. */
> >   	eth_queue_stop_t           tx_queue_stop; /**< Stop TX for a queue. */
> >   	eth_rx_queue_setup_t       rx_queue_setup;/**< Set up device RX
> queue. */
> > +	eth_rx_hairpin_queue_setup_t rx_hairpin_queue_setup;
> > +	/**< Set up device RX hairpin queue. */
> >   	eth_queue_release_t        rx_queue_release; /**< Release RX queue.
> */
> >   	eth_rx_queue_count_t       rx_queue_count;
> >   	/**< Get the number of used RX descriptors. */
> > @@ -568,6 +586,8 @@ struct eth_dev_ops {
> >   	eth_rx_enable_intr_t       rx_queue_intr_enable;  /**< Enable Rx queue
> interrupt. */
> >   	eth_rx_disable_intr_t      rx_queue_intr_disable; /**< Disable Rx queue
> interrupt. */
> >   	eth_tx_queue_setup_t       tx_queue_setup;/**< Set up device TX
> queue. */
> > +	eth_tx_hairpin_queue_setup_t tx_hairpin_queue_setup;
> > +	/**< Set up device TX hairpin queue. */
> >   	eth_queue_release_t        tx_queue_release; /**< Release TX queue. */
> >   	eth_tx_done_cleanup_t      tx_done_cleanup;/**< Free tx ring mbufs */
> >
> > @@ -639,6 +659,9 @@ struct eth_dev_ops {
> >
> >   	eth_pool_ops_supported_t pool_ops_supported;
> >   	/**< Test if a port supports specific mempool ops */
> > +
> > +	eth_hairpin_cap_get_t hairpin_cap_get;
> > +	/**< Returns the hairpin capabilities. */
> >   };
> >
> >   /**
> > @@ -746,9 +769,9 @@ struct rte_eth_dev_data {
> >   		dev_started : 1,   /**< Device state: STARTED(1) / STOPPED(0).
> */
> >   		lro         : 1;   /**< RX LRO is ON(1) / OFF(0) */
> >   	uint8_t rx_queue_state[RTE_MAX_QUEUES_PER_PORT];
> > -			/**< Queues state: STARTED(1) / STOPPED(0). */
> > +		/**< Queues state: HAIRPIN(2) / STARTED(1) / STOPPED(0). */
> >   	uint8_t tx_queue_state[RTE_MAX_QUEUES_PER_PORT];
> > -			/**< Queues state: STARTED(1) / STOPPED(0). */
> > +		/**< Queues state: HAIRPIN(2) STARTED(1) / STOPPED(0). */
> >   	uint32_t dev_flags;             /**< Capabilities. */
> >   	enum rte_kernel_driver kdrv;    /**< Kernel driver passthrough. */
> >   	int numa_node;                  /**< NUMA node connection. */
> > diff --git a/lib/librte_ethdev/rte_ethdev_version.map
> b/lib/librte_ethdev/rte_ethdev_version.map
> > index 6df42a4..77b0a86 100644
> > --- a/lib/librte_ethdev/rte_ethdev_version.map
> > +++ b/lib/librte_ethdev/rte_ethdev_version.map
> > @@ -283,4 +283,9 @@ EXPERIMENTAL {
> >
> >   	# added in 19.08
> >   	rte_eth_read_clock;
> > +
> > +	# added in 19.11
> > +	rte_eth_rx_hairpin_queue_setup;
> > +	rte_eth_tx_hairpin_queue_setup;
> > +	rte_eth_dev_hairpin_capability_get;
> >   };



* Re: [dpdk-dev] [PATCH v2 01/14] ethdev: add support for hairpin queue
  2019-10-10 21:07       ` Ori Kam
@ 2019-10-14  9:37         ` Andrew Rybchenko
  2019-10-14 10:19           ` Ori Kam
  0 siblings, 1 reply; 186+ messages in thread
From: Andrew Rybchenko @ 2019-10-14  9:37 UTC (permalink / raw)
  To: Ori Kam, Thomas Monjalon, Ferruh Yigit; +Cc: dev, jingjing.wu, stephen

Hi Ori,

see my answers below.

On 10/11/19 12:07 AM, Ori Kam wrote:
> Hi Andrew,
>
> Thanks for your comments,
>
> PSB,
> Ori
>
>> -----Original Message-----
>> From: Andrew Rybchenko <arybchenko@solarflare.com>
>> Sent: Tuesday, October 8, 2019 7:11 PM
>> To: Ori Kam <orika@mellanox.com>; Thomas Monjalon
>> <thomas@monjalon.net>; Ferruh Yigit <ferruh.yigit@intel.com>
>> Cc: dev@dpdk.org; jingjing.wu@intel.com; stephen@networkplumber.org
>> Subject: Re: [PATCH v2 01/14] ethdev: add support for hairpin queue
>>
>> Hi Ori,
>>
>> thanks for updated version. See my notes below.
>>
>> There are few style notes about line breaks which are not defined in
>> coding style. Of course, it may be ignored.
>>
> I will fix what I can.
>
>> On 10/4/19 10:54 PM, Ori Kam wrote:
>>> This commit introduces the hairpin queue type.
>>>
>>> The hairpin queue is built from an Rx queue bound to a Tx queue.
>>> It is used to offload traffic coming from the wire and redirect it back
>>> to the wire.
>>>
>>> There are 3 new functions:
>>> - rte_eth_dev_hairpin_capability_get
>>> - rte_eth_rx_hairpin_queue_setup
>>> - rte_eth_tx_hairpin_queue_setup
>>>
>>> In order to use the queue, there is a need to create rte_flow
>>> with queue / RSS action that targets one or more of the Rx queues.
>>>
>>> Signed-off-by: Ori Kam <orika@mellanox.com>
>> rte_eth_dev_[rt]x_queue_stop() should return error if used for hairpin
>> queue.
>> Right now rte_eth_dev_[rt]x_queue_start() will return 0. Not sure about it.
>> What about rte_eth_rx_queue_info_get() and rte_eth_tx_queue_info_get()?
>> Any other Rx/Tx queue functions?
>>
> I will add an error return to both functions (Tx/Rx).
> I don't see any other functions (only the info_get and queue_stop).
>
>>> ---
>>> V2:
>>>    - update according to ML comments.
>>>
>>> ---
>>>    lib/librte_ethdev/rte_ethdev.c           | 214
>> ++++++++++++++++++++++++++++++-
>>>    lib/librte_ethdev/rte_ethdev.h           | 126 ++++++++++++++++++
>>>    lib/librte_ethdev/rte_ethdev_core.h      |  27 +++-
>>>    lib/librte_ethdev/rte_ethdev_version.map |   5 +
>>>    4 files changed, 368 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
>>> index af82360..ee8af42 100644
>>> --- a/lib/librte_ethdev/rte_ethdev.c
>>> +++ b/lib/librte_ethdev/rte_ethdev.c
>>> @@ -1752,12 +1752,102 @@ struct rte_eth_dev *
>>>    		if (!dev->data->min_rx_buf_size ||
>>>    		    dev->data->min_rx_buf_size > mbp_buf_size)
>>>    			dev->data->min_rx_buf_size = mbp_buf_size;
>>> +		if (dev->data->rx_queue_state[rx_queue_id] ==
>>> +		    RTE_ETH_QUEUE_STATE_HAIRPIN)
>>> +			dev->data->rx_queue_state[rx_queue_id] =
>>> +				RTE_ETH_QUEUE_STATE_STOPPED;
>> I don't understand it. Why is rte_eth_rx_queue_setup() changed?
>>
> This was done so the user can reset the queue back to a normal queue.

I think it should be done in queue release.
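
I.e., a sketch of what I mean, with the release path owning the state
transition:

	(*dev->dev_ops->rx_queue_release)(rxq[rx_queue_id]);
	rxq[rx_queue_id] = NULL;
	dev->data->rx_queue_state[rx_queue_id] = RTE_ETH_QUEUE_STATE_STOPPED;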

>>>    	}
>>>
>>>    	return eth_err(port_id, ret);
>>>    }
>>>
>>>    int
>>> +rte_eth_rx_hairpin_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
>>> +			       uint16_t nb_rx_desc,
>>> +			       const struct rte_eth_hairpin_conf *conf)
>>> +{
>>> +	int ret;
>>> +	struct rte_eth_dev *dev;
>>> +	struct rte_eth_hairpin_cap cap;
>>> +	struct rte_eth_dev_info dev_info;
>>> +	void **rxq;
>>> +	int i;
>>> +	int count = 0;
>>> +
>>> +	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
>>> +
>>> +	dev = &rte_eth_devices[port_id];
>>> +	if (rx_queue_id >= dev->data->nb_rx_queues) {
>>> +		RTE_ETHDEV_LOG(ERR, "Invalid RX queue_id=%u\n",
>> rx_queue_id);
>>> +		return -EINVAL;
>>> +	}
>>> +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_infos_get, -ENOTSUP);
>>> +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->hairpin_cap_get,
>>> +				-ENOTSUP);
>>> +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_hairpin_queue_setup,
>>> +				-ENOTSUP);
>>> +	rte_eth_dev_hairpin_capability_get(port_id, &cap);
>> The return value should be checked; it makes the hairpin_cap_get check
>> above unnecessary.
>>
> I will add a check.
>   
>>> +	/* Use default specified by driver, if nb_rx_desc is zero */
>>> +	if (nb_rx_desc == 0)
>>> +		nb_rx_desc = cap.max_nb_desc;
>>> +	if (nb_rx_desc > cap.max_nb_desc) {
>>> +		RTE_ETHDEV_LOG(ERR,
>>> +			       "Invalid value for nb_rx_desc(=%hu), should be: "
>>> +			       "<= %hu", nb_rx_desc, cap.max_nb_desc);
>>> +		return -EINVAL;
>>> +	}
>>> +	ret = rte_eth_dev_info_get(port_id, &dev_info);
>>> +	if (ret != 0)
>>> +		return ret;
>>> +	if (dev->data->dev_started &&
>>> +		!(dev_info.dev_capa &
>>> +		  RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP))
>>> +		return -EBUSY;
>>> +	if (dev->data->dev_started &&
>>> +		(dev->data->rx_queue_state[rx_queue_id] !=
>>> +		 RTE_ETH_QUEUE_STATE_STOPPED))
>>> +		return -EBUSY;
>>> +	if (conf->peer_n > cap.max_rx_2_tx) {
>>> +		RTE_ETHDEV_LOG(ERR,
>>> +			       "Invalid value for number of peers(=%hu), "
>>> +			       "should be: <= %hu", conf->peer_n,
>>> +			       cap.max_rx_2_tx);
>>> +		return -EINVAL;
>>> +	}
>>> +	if (conf->peer_n == 0) {
>>> +		RTE_ETHDEV_LOG(ERR,
>>> +			       "Invalid value for number of peers(=%hu), "
>>> +			       "should be: > 0", conf->peer_n);
>>> +		return -EINVAL;
>>> +	}
>>> +	if (cap.max_n_queues != -1) {
>>> +		for (i = 0; i < dev->data->nb_rx_queues; i++) {
>>> +			if (dev->data->rx_queue_state[i] ==
>>> +			    RTE_ETH_QUEUE_STATE_HAIRPIN)
>>> +				count++;
>>> +		}
>>> +		if (count > cap.max_n_queues) {
>>> +			RTE_ETHDEV_LOG(ERR,
>>> +				       "To many Rx hairpin queues %d", count);
>>> +			return -EINVAL;
>>> +		}
>>> +	}
>>> +	rxq = dev->data->rx_queues;
>>> +	if (rxq[rx_queue_id]) {
>> Please, compare with NULL (I know that rte_eth_rx_queue_setup() does
>> it like above).
>>
> O.K., I will change it, but may I ask why? If it is written the same way
> in the rte_eth_rx_queue_setup function, why do you want this change?

https://doc.dpdk.org/guides/contributing/coding_style.html#null-pointers

>>> +		RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_release,
>>> +					-ENOTSUP);
>>> +		(*dev->dev_ops->rx_queue_release)(rxq[rx_queue_id]);
>>> +		rxq[rx_queue_id] = NULL;
>>> +	}
>>> +	ret = (*dev->dev_ops->rx_hairpin_queue_setup)(dev, rx_queue_id,
>>> +						      nb_rx_desc, conf);
>>> +	if (!ret)
>> Please, compare with 0
>>
> Will do, but again, just for my knowledge: why?

https://doc.dpdk.org/guides/contributing/coding_style.html#function-calls

>>> +		dev->data->rx_queue_state[rx_queue_id] =
>>> +			RTE_ETH_QUEUE_STATE_HAIRPIN;
>>> +	return eth_err(port_id, ret);
>>> +}
>>> +
>>> +int
>>>    rte_eth_tx_queue_setup(uint16_t port_id, uint16_t tx_queue_id,
>>>    		       uint16_t nb_tx_desc, unsigned int socket_id,
>>>    		       const struct rte_eth_txconf *tx_conf)
>>> @@ -1851,9 +1941,97 @@ struct rte_eth_dev *
>>>    			__func__);
>>>    		return -EINVAL;
>>>    	}
>>> +	ret = (*dev->dev_ops->tx_queue_setup)(dev, tx_queue_id, nb_tx_desc,
>>> +					      socket_id, &local_conf);
>>> +	if (!ret)
>>> +		if (dev->data->tx_queue_state[tx_queue_id] ==
>>> +		    RTE_ETH_QUEUE_STATE_HAIRPIN)
>>> +			dev->data->tx_queue_state[tx_queue_id] =
>>> +				RTE_ETH_QUEUE_STATE_STOPPED;
>> Why is it changed?
>>
> Like in the Rx case, to enable switching back to a normal queue.
>
>>> +	return eth_err(port_id, ret);
>>> +}
>>> +
>>> +int
>>> +rte_eth_tx_hairpin_queue_setup(uint16_t port_id, uint16_t tx_queue_id,
>>> +			       uint16_t nb_tx_desc,
>>> +			       const struct rte_eth_hairpin_conf *conf)
>>> +{
>>> +	struct rte_eth_dev *dev;
>>> +	struct rte_eth_hairpin_cap cap;
>>> +	struct rte_eth_dev_info dev_info;
>>> +	void **txq;
>>> +	int i;
>>> +	int count = 0;
>>> +	int ret;
>>>
>>> -	return eth_err(port_id, (*dev->dev_ops->tx_queue_setup)(dev,
>>> -		       tx_queue_id, nb_tx_desc, socket_id, &local_conf));
>>> +	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
>>> +	dev = &rte_eth_devices[port_id];
>>> +	if (tx_queue_id >= dev->data->nb_tx_queues) {
>>> +		RTE_ETHDEV_LOG(ERR, "Invalid TX queue_id=%u\n",
>> tx_queue_id);
>>> +		return -EINVAL;
>>> +	}
>>> +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_infos_get, -ENOTSUP);
>>> +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->hairpin_cap_get,
>>> +				-ENOTSUP);
>>> +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_hairpin_queue_setup,
>>> +				-ENOTSUP);
>>> +	rte_eth_dev_info_get(port_id, &dev_info);
>> The return value should be checked.
>>
>>> +	rte_eth_dev_hairpin_capability_get(port_id, &cap);
>> Check the return value, and then you can rely on the hairpin_cap_get
>> check done inside.
>>
> Same as above, I will add a check, but I'm not sure I understand the
> second part. Just to make sure I understand: you are saying that if I
> don't check the return value, I'm not allowed to check the cap, right?

It is about dev_ops->hairpin_cap_get vs NULL check.
It is checked inside rte_eth_dev_hairpin_capability_get() and we should
not duplicate the check above.

>>> +	/* Use default specified by driver, if nb_tx_desc is zero */
>>> +	if (nb_tx_desc == 0)
>>> +		nb_tx_desc = cap.max_nb_desc;
>>> +	if (nb_tx_desc > cap.max_nb_desc) {
>>> +		RTE_ETHDEV_LOG(ERR,
>>> +			       "Invalid value for nb_tx_desc(=%hu), should be: "
>>> +			       "<= %hu", nb_tx_desc, cap.max_nb_desc);
>>> +		return -EINVAL;
>>> +	}
>>> +	if (conf->peer_n > cap.max_tx_2_rx) {
>>> +		RTE_ETHDEV_LOG(ERR,
>>> +			       "Invalid value for number of peers(=%hu), "
>>> +			       "should be: <= %hu", conf->peer_n,
>>> +			       cap.max_tx_2_rx);
>>> +		return -EINVAL;
>>> +	}
>>> +	if (conf->peer_n == 0) {
>>> +		RTE_ETHDEV_LOG(ERR,
>>> +			       "Invalid value for number of peers(=%hu), "
>>> +			       "should be: > 0", conf->peer_n);
>>> +		return -EINVAL;
>>> +	}
>>> +	if (cap.max_n_queues != -1) {
>>> +		for (i = 0; i < dev->data->nb_tx_queues; i++) {
>>> +			if (dev->data->tx_queue_state[i] ==
>>> +			    RTE_ETH_QUEUE_STATE_HAIRPIN)
>>> +				count++;
>>> +		}
>>> +		if (count > cap.max_n_queues) {
>>> +			RTE_ETHDEV_LOG(ERR,
>>> +				       "To many Rx hairpin queues %d", count);
>>> +			return -EINVAL;
>>> +		}
>>> +	}
>>> +	if (dev->data->dev_started &&
>>> +		!(dev_info.dev_capa &
>>> +		  RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP))
>>> +		return -EBUSY;
>>> +	if (dev->data->dev_started &&
>>> +		(dev->data->tx_queue_state[tx_queue_id] !=
>>> +		 RTE_ETH_QUEUE_STATE_STOPPED))
>>> +		return -EBUSY;
>>> +	txq = dev->data->tx_queues;
>>> +	if (txq[tx_queue_id]) {
>> Please, compare with NULL
>>
> O.K.
>
>>> +		RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_queue_release,
>>> +					-ENOTSUP);
>>> +		(*dev->dev_ops->tx_queue_release)(txq[tx_queue_id]);
>>> +		txq[tx_queue_id] = NULL;
>>> +	}
>>> +	ret = (*dev->dev_ops->tx_hairpin_queue_setup)
>>> +		(dev, tx_queue_id, nb_tx_desc, conf);
>>> +	if (!ret)
>> Please, compare with 0
>>
> O.K.
>
>>> +		dev->data->tx_queue_state[tx_queue_id] =
>>> +			RTE_ETH_QUEUE_STATE_HAIRPIN;
>>> +	return eth_err(port_id, ret);
>>>    }
>>>
>>>    void
>>> @@ -3981,12 +4159,20 @@ int rte_eth_set_queue_rate_limit(uint16_t
>> port_id, uint16_t queue_idx,
>>>    	rte_errno = ENOTSUP;
>>>    	return NULL;
>>>    #endif
>>> +	struct rte_eth_dev *dev;
>>> +
>>>    	/* check input parameters */
>>>    	if (!rte_eth_dev_is_valid_port(port_id) || fn == NULL ||
>>>    		    queue_id >= rte_eth_devices[port_id].data->nb_rx_queues) {
>>>    		rte_errno = EINVAL;
>>>    		return NULL;
>>>    	}
>>> +	dev = &rte_eth_devices[port_id];
>>> +	if (dev->data->rx_queue_state[queue_id] ==
>>> +	    RTE_ETH_QUEUE_STATE_HAIRPIN) {
>> It looks like the line break above is not required; removing it would
>> make the code a bit shorter.
>>
> Will fix if possible.
>
>>> +		rte_errno = EINVAL;
>>> +		return NULL;
>>> +	}
>>>    	struct rte_eth_rxtx_callback *cb = rte_zmalloc(NULL, sizeof(*cb), 0);
>>>
>>>    	if (cb == NULL) {
>>> @@ -4058,6 +4244,8 @@ int rte_eth_set_queue_rate_limit(uint16_t port_id,
>> uint16_t queue_idx,
>>>    	rte_errno = ENOTSUP;
>>>    	return NULL;
>>>    #endif
>>> +	struct rte_eth_dev *dev;
>>> +
>>>    	/* check input parameters */
>>>    	if (!rte_eth_dev_is_valid_port(port_id) || fn == NULL ||
>>>    		    queue_id >= rte_eth_devices[port_id].data->nb_tx_queues) {
>>> @@ -4065,6 +4253,13 @@ int rte_eth_set_queue_rate_limit(uint16_t
>> port_id, uint16_t queue_idx,
>>>    		return NULL;
>>>    	}
>>>
>>> +	dev = &rte_eth_devices[port_id];
>>> +	if (dev->data->tx_queue_state[queue_id] ==
>>> +	    RTE_ETH_QUEUE_STATE_HAIRPIN) {
>> It looks like the line break above is not required; removing it would
>> make the code a bit shorter.
>>
> Will fix.
>
>>> +		rte_errno = EINVAL;
>>> +		return NULL;
>>> +	}
>>> +
>>>    	struct rte_eth_rxtx_callback *cb = rte_zmalloc(NULL, sizeof(*cb), 0);
>>>
>>>    	if (cb == NULL) {
>>> @@ -4510,6 +4705,21 @@ int rte_eth_set_queue_rate_limit(uint16_t
>> port_id, uint16_t queue_idx,
>>>    }
>>>
>>>    int
>>> +rte_eth_dev_hairpin_capability_get(uint16_t port_id,
>>> +				   struct rte_eth_hairpin_cap *cap)
>>> +{
>>> +	struct rte_eth_dev *dev;
>>> +
>>> +	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
>>> +
>>> +	dev = &rte_eth_devices[port_id];
>>> +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->hairpin_cap_get,
>>> +				-ENOTSUP);
>> I think it would be useful to memset() cap with 0 to be sure that it is
>> initialized.
>>
> O.K but if we follow your comments from above, the result is not valid if we return
> error, so it is not a must, but like I said I don't care to add it.
>
>>> +	return eth_err(port_id, (*dev->dev_ops->hairpin_cap_get)
>>> +		       (dev, cap));
>>
>> Please, consider to avoid line breaks above to make code a bit shorter.
>> Of course, it should not exceed line length limit.
>>
> Will check.
>
>>> +}
>>> +
>>> +int
>>>    rte_eth_dev_pool_ops_supported(uint16_t port_id, const char *pool)
>>>    {
>>>    	struct rte_eth_dev *dev;
>>> diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
>>> index d937fb4..29dcfea 100644
>>> --- a/lib/librte_ethdev/rte_ethdev.h
>>> +++ b/lib/librte_ethdev/rte_ethdev.h
>>> @@ -804,6 +804,46 @@ struct rte_eth_txconf {
>>>    };
>>>
>>>    /**
>>> + * @warning
>>> + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
>>> + *
>>> + * A structure used to return the hairpin capabilities that are supportd.
>> supportd -> supported
>>
> Will fix.
>   
>>> + */
>>> +struct rte_eth_hairpin_cap {
>>> +	int16_t max_n_queues;
>>> +	/**< The max number of hairpin queuesi. -1 no limit. */
>> I'm not sure that I like type difference from max_rx_queues here.
>> I think it would be better to use uint16_t. I would say there is point
>> to highlight no limit (first of all I'm not sure that it makes sense,
>> second, UINT16_MAX will obviously do the job in uint16_t.
>>
>> Is it both Rx and Tx?
>>
>>> +	int16_t max_rx_2_tx;
>>> +	/**< Max number of Rx queues to be connected to one Tx queue. */
>>> +	int16_t max_tx_2_rx;
>>> +	/**< Max number of Tx queues to be connected to one Rx queue. */
>> I would prefer to have uint16_t here as well. Mainly for consistency.
>>
> Agree, will fix,
>
>>> +	uint16_t max_nb_desc; /**< The max num of descriptors. */
>> Is it common for Rx and Tx? What about min?
>>
> I think min is always 1, by definition.

OK

>>> +};
>>> +
>>> +#define RTE_ETH_MAX_HAIRPIN_PEERS 32
>>> +
>>> +/**
>>> + * @warning
>>> + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
>>> + *
>>> + * A structure used to hold hairpin peer data.
>>> + */
>>> +struct rte_eth_hairpin_peer {
>>> +	uint16_t port; /**< Peer port. */
>>> +	uint16_t queue; /**< Peer queue. */
>>> +};
>>> +
>>> +/**
>>> + * @warning
>>> + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
>>> + *
>>> + * A structure used to configure hairpin binding.
>> It should be explained what happens if Rx queue has many Tx queues.
>> Is packet duplicated? Or distributed?
>>
> Like we said before, I don't know; it depends on the NIC.
> In the case of Mellanox we don't support 1-to-many.

I see, but what should the application expect?

>>> + */
>>> +struct rte_eth_hairpin_conf {
>>> +	uint16_t peer_n; /**< The number of peers. */
>>> +	struct rte_eth_hairpin_peer peers[RTE_ETH_MAX_HAIRPIN_PEERS];
>>> +};
>>> +
>>> +/**
>>>     * A structure contains information about HW descriptor ring limitations.
>>>     */
>>>    struct rte_eth_desc_lim {
>>> @@ -1080,6 +1120,8 @@ struct rte_eth_conf {
>>>    /**< Device supports Rx queue setup after device started*/
>>>    #define RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP 0x00000002
>>>    /**< Device supports Tx queue setup after device started*/
>>> +#define RTE_ETH_DEV_CAPA_HAIRPIN_SUPPORT 0x00000004
>>> +/**< Device supports hairpin queues. */
>> Do we really need it? Isn't rte_eth_dev_hairpin_capability_get() returning
>> -ENOTSUP sufficient?
>>
> I agree with you, but my thinking was that using the cap the application
> can avoid calling to function that will fail.
> If you approve I will remove this cap.

Runtime setup flags are required since these functions are always
supported, but it is not always possible to use these functions when the
device is started. In this case I think there is no necessity to add
the capability, since we have no similar caps for anything else.
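
For illustration, an application could probe hairpin support with a few
lines like the following sketch (port_id stands for any valid port;
the names are the ones introduced by this series):

    struct rte_eth_hairpin_cap cap;
    int ret = rte_eth_dev_hairpin_capability_get(port_id, &cap);

    if (ret == -ENOTSUP) {
        /* device has no hairpin support, fall back to SW forwarding */
    } else if (ret == 0) {
        /* cap.max_nb_desc, cap.max_rx_2_tx, etc. are now valid */
    }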

>>>    /*
>>>     * If new Tx offload capabilities are defined, they also must be
>>> @@ -1277,6 +1319,7 @@ struct rte_eth_dcb_info {
>>>     */
>>>    #define RTE_ETH_QUEUE_STATE_STOPPED 0
>>>    #define RTE_ETH_QUEUE_STATE_STARTED 1
>>> +#define RTE_ETH_QUEUE_STATE_HAIRPIN 2
>>>
>>>    #define RTE_ETH_ALL RTE_MAX_ETHPORTS
>>>
>>> @@ -1771,6 +1814,34 @@ int rte_eth_rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
>>>    		struct rte_mempool *mb_pool);
>>>
>>>    /**
>>> + * @warning
>>> + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
>>> + *
>>> + * Allocate and set up a hairpin receive queue for an Ethernet device.
>>> + *
>>> + * The function set up the selected queue to be used in hairpin.
>>> + *
>>> + * @param port_id
>>> + *   The port identifier of the Ethernet device.
>>> + * @param rx_queue_id
>>> + *   The index of the receive queue to set up.
>>> + *   The value must be in the range [0, nb_rx_queue - 1] previously supplied
>>> + *   to rte_eth_dev_configure().
>>> + * @param nb_rx_desc
>>> + *   The number of receive descriptors to allocate for the receive ring.
>> Is it allows to specify 0 to pick driver recommended value?
>>
> Yes, (assuming you are talking about the nb_rx_desc)

Please, highlight it in the description.

>>> + * @param conf
>>> + *   The pointer to the hairpin configuration.
>>> + * @return
>>> + *   - 0: Success, receive queue correctly set up.
>>> + *   - -EINVAL: Selected Queue can't be configured for hairpin.
>>> + *   - -ENOMEM: Unable to allocate the resources required for the queue.
>> Please, follow return value description style similar to
>> rte_eth_dev_info_get()
>> which is more common to the file.
>>
> O.K
>
>>> + */
>>> +__rte_experimental
>>> +int rte_eth_rx_hairpin_queue_setup
>>> +	(uint16_t port_id, uint16_t rx_queue_id, uint16_t nb_rx_desc,
>>> +	 const struct rte_eth_hairpin_conf *conf);
>>> +
>>> +/**
>>>     * Allocate and set up a transmit queue for an Ethernet device.
>>>     *
>>>     * @param port_id
>>> @@ -1823,6 +1894,33 @@ int rte_eth_tx_queue_setup(uint16_t port_id, uint16_t tx_queue_id,
>>>    		const struct rte_eth_txconf *tx_conf);
>>>
>>>    /**
>>> + * @warning
>>> + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
>>> + *
>>> + * Allocate and set up a transmit hairpin queue for an Ethernet device.
>>> + *
>>> + * @param port_id
>>> + *   The port identifier of the Ethernet device.
>>> + * @param tx_queue_id
>>> + *   The index of the transmit queue to set up.
>>> + *   The value must be in the range [0, nb_tx_queue - 1] previously supplied
>>> + *   to rte_eth_dev_configure().
>>> + * @param nb_tx_desc
>>> + *   The number of transmit descriptors to allocate for the transmit ring.
>> Is it allows to specify 0 to pick driver recommended value?
>>
> Like above yes,
>
>>> + * @param conf
>>> + *   The hairpin configuration.
>>> + *
>>> + * @return
>>> + *   - 0: Success, transmit queue correctly set up.
>>> + *   - -EINVAL: Selected Queue can't be configured for hairpin.
>>> + *   - -ENOMEM: Unable to allocate the resources required for the queue.
>> Please, follow return value description style similar to
>> rte_eth_dev_info_get()
>> which is more common to the file.
>>
> Will fix.
>
>>> + */
>>> +__rte_experimental
>>> +int rte_eth_tx_hairpin_queue_setup
>>> +	(uint16_t port_id, uint16_t tx_queue_id, uint16_t nb_tx_desc,
>>> +	 const struct rte_eth_hairpin_conf *conf);
>>> +
>>> +/**
>>>     * Return the NUMA socket to which an Ethernet device is connected
>>>     *
>>>     * @param port_id
>>> @@ -4037,6 +4135,22 @@ int rte_eth_dev_adjust_nb_rx_tx_desc(uint16_t port_id,
>>>    void *
>>>    rte_eth_dev_get_sec_ctx(uint16_t port_id);
>>>
>>> +/**
>>> + * @warning
>>> + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
>>> + *
>>> + * Query the device hairpin capabilities.
>>> + *
>>> + * @param port_id
>>> + *   The port identifier of the Ethernet device.
>>> + * @param cap
>>> + *   Pointer to a structure that will hold the hairpin capabilities.
>>> + * @return
>>> + *   - 0 on success, -ENOTSUP if the device doesn't support hairpin.
>> Please, follow return value description style similar to
>> rte_eth_dev_info_get()
>> which is more common to the file.
>>
> Will fix.
>
>>> + */
>>> +__rte_experimental
>>> +int rte_eth_dev_hairpin_capability_get(uint16_t port_id,
>>> +				       struct rte_eth_hairpin_cap *cap);
>>>
>>>    #include <rte_ethdev_core.h>
>>>
>>> @@ -4137,6 +4251,12 @@ int rte_eth_dev_adjust_nb_rx_tx_desc(uint16_t port_id,
>>>    		RTE_ETHDEV_LOG(ERR, "Invalid RX queue_id=%u\n", queue_id);
>>>    		return 0;
>>>    	}
>>> +	if (dev->data->rx_queue_state[queue_id] ==
>>> +	    RTE_ETH_QUEUE_STATE_HAIRPIN) {
>>> +		RTE_ETHDEV_LOG(ERR, "RX queue_id=%u is hairpin queue\n",
>>> +			       queue_id);
>>> +		return 0;
>>> +	}
>>>    #endif
>>>    	nb_rx = (*dev->rx_pkt_burst)(dev->data->rx_queues[queue_id],
>>>    				     rx_pkts, nb_pkts);
>>> @@ -4403,6 +4523,12 @@ static inline int rte_eth_tx_descriptor_status(uint16_t port_id,
>>>    		RTE_ETHDEV_LOG(ERR, "Invalid TX queue_id=%u\n", queue_id);
>>>    		return 0;
>>>    	}
>>> +	if (dev->data->tx_queue_state[queue_id] ==
>>> +	    RTE_ETH_QUEUE_STATE_HAIRPIN) {
>>> +		RTE_ETHDEV_LOG(ERR, "TX queue_id=%u is hairpin queue\n",
>>> +			       queue_id);
>>> +		return 0;
>>> +	}
>>>    #endif
>>>
>>>    #ifdef RTE_ETHDEV_RXTX_CALLBACKS
>>> diff --git a/lib/librte_ethdev/rte_ethdev_core.h b/lib/librte_ethdev/rte_ethdev_core.h
>>> index dcb5ae6..ef46e71 100644
>>> --- a/lib/librte_ethdev/rte_ethdev_core.h
>>> +++ b/lib/librte_ethdev/rte_ethdev_core.h
>>> @@ -250,6 +250,12 @@ typedef int (*eth_rx_queue_setup_t)(struct rte_eth_dev *dev,
>>>    				    struct rte_mempool *mb_pool);
>>>    /**< @internal Set up a receive queue of an Ethernet device. */
>>>
>>> +typedef int (*eth_rx_hairpin_queue_setup_t)
>>> +	(struct rte_eth_dev *dev, uint16_t rx_queue_id,
>>> +	 uint16_t nb_rx_desc,
>>> +	 const struct rte_eth_hairpin_conf *conf);
>>> +/**< @internal Set up a receive hairpin queue of an Ethernet device. */
>>> +
>>>    typedef int (*eth_tx_queue_setup_t)(struct rte_eth_dev *dev,
>>>    				    uint16_t tx_queue_id,
>>>    				    uint16_t nb_tx_desc,
>>> @@ -257,6 +263,12 @@ typedef int (*eth_tx_queue_setup_t)(struct rte_eth_dev *dev,
>>>    				    const struct rte_eth_txconf *tx_conf);
>>>    /**< @internal Setup a transmit queue of an Ethernet device. */
>>>
>>> +typedef int (*eth_tx_hairpin_queue_setup_t)
>>> +	(struct rte_eth_dev *dev, uint16_t tx_queue_id,
>>> +	 uint16_t nb_tx_desc,
>>> +	 const struct rte_eth_hairpin_conf *hairpin_conf);
>>> +/**< @internal Setup a transmit hairpin queue of an Ethernet device. */
>>> +
>>>    typedef int (*eth_rx_enable_intr_t)(struct rte_eth_dev *dev,
>>>    				    uint16_t rx_queue_id);
>>>    /**< @internal Enable interrupt of a receive queue of an Ethernet device. */
>>> @@ -505,6 +517,10 @@ typedef int (*eth_pool_ops_supported_t)(struct rte_eth_dev *dev,
>>>    						const char *pool);
>>>    /**< @internal Test if a port supports specific mempool ops */
>>>
>>> +typedef int (*eth_hairpin_cap_get_t)(struct rte_eth_dev *dev,
>>> +				     struct rte_eth_hairpin_cap *cap);
>>> +/**< @internal get the hairpin capabilities. */
>>> +
>>>    /**
>>>     * @internal A structure containing the functions exported by an Ethernet driver.
>>>     */
>>> @@ -557,6 +573,8 @@ struct eth_dev_ops {
>>>    	eth_queue_start_t          tx_queue_start;/**< Start TX for a queue. */
>>>    	eth_queue_stop_t           tx_queue_stop; /**< Stop TX for a queue. */
>>>    	eth_rx_queue_setup_t       rx_queue_setup;/**< Set up device RX queue. */
>>> +	eth_rx_hairpin_queue_setup_t rx_hairpin_queue_setup;
>>> +	/**< Set up device RX hairpin queue. */
>>>    	eth_queue_release_t        rx_queue_release; /**< Release RX queue. */
>>>    	eth_rx_queue_count_t       rx_queue_count;
>>>    	/**< Get the number of used RX descriptors. */
>>> @@ -568,6 +586,8 @@ struct eth_dev_ops {
>>>    	eth_rx_enable_intr_t       rx_queue_intr_enable;  /**< Enable Rx queue interrupt. */
>>>    	eth_rx_disable_intr_t      rx_queue_intr_disable; /**< Disable Rx queue interrupt. */
>>>    	eth_tx_queue_setup_t       tx_queue_setup;/**< Set up device TX queue. */
>>> +	eth_tx_hairpin_queue_setup_t tx_hairpin_queue_setup;
>>> +	/**< Set up device TX hairpin queue. */
>>>    	eth_queue_release_t        tx_queue_release; /**< Release TX queue. */
>>>    	eth_tx_done_cleanup_t      tx_done_cleanup;/**< Free tx ring mbufs */
>>>
>>> @@ -639,6 +659,9 @@ struct eth_dev_ops {
>>>
>>>    	eth_pool_ops_supported_t pool_ops_supported;
>>>    	/**< Test if a port supports specific mempool ops */
>>> +
>>> +	eth_hairpin_cap_get_t hairpin_cap_get;
>>> +	/**< Returns the hairpin capabilities. */
>>>    };
>>>
>>>    /**
>>> @@ -746,9 +769,9 @@ struct rte_eth_dev_data {
>>>    		dev_started : 1,   /**< Device state: STARTED(1) / STOPPED(0). */
>>>    		lro         : 1;   /**< RX LRO is ON(1) / OFF(0) */
>>>    	uint8_t rx_queue_state[RTE_MAX_QUEUES_PER_PORT];
>>> -			/**< Queues state: STARTED(1) / STOPPED(0). */
>>> +		/**< Queues state: HAIRPIN(2) / STARTED(1) / STOPPED(0). */
>>>    	uint8_t tx_queue_state[RTE_MAX_QUEUES_PER_PORT];
>>> -			/**< Queues state: STARTED(1) / STOPPED(0). */
>>> +		/**< Queues state: HAIRPIN(2) STARTED(1) / STOPPED(0). */
>>>    	uint32_t dev_flags;             /**< Capabilities. */
>>>    	enum rte_kernel_driver kdrv;    /**< Kernel driver passthrough. */
>>>    	int numa_node;                  /**< NUMA node connection. */
>>> diff --git a/lib/librte_ethdev/rte_ethdev_version.map b/lib/librte_ethdev/rte_ethdev_version.map
>>> index 6df42a4..77b0a86 100644
>>> --- a/lib/librte_ethdev/rte_ethdev_version.map
>>> +++ b/lib/librte_ethdev/rte_ethdev_version.map
>>> @@ -283,4 +283,9 @@ EXPERIMENTAL {
>>>
>>>    	# added in 19.08
>>>    	rte_eth_read_clock;
>>> +
>>> +	# added in 19.11
>>> +	rte_eth_rx_hairpin_queue_setup;
>>> +	rte_eth_tx_hairpin_queue_setup;
>>> +	rte_eth_dev_hairpin_capability_get;
>>>    };



* Re: [dpdk-dev] [PATCH v2 01/14] ethdev: add support for hairpin queue
  2019-10-14  9:37         ` Andrew Rybchenko
@ 2019-10-14 10:19           ` Ori Kam
  0 siblings, 0 replies; 186+ messages in thread
From: Ori Kam @ 2019-10-14 10:19 UTC (permalink / raw)
  To: Andrew Rybchenko, Thomas Monjalon, Ferruh Yigit; +Cc: dev, jingjing.wu, stephen

Thanks for your comments,

I will start working on V3

Ori

> -----Original Message-----
> From: Andrew Rybchenko <arybchenko@solarflare.com>
> Sent: Monday, October 14, 2019 12:37 PM
> To: Ori Kam <orika@mellanox.com>; Thomas Monjalon
> <thomas@monjalon.net>; Ferruh Yigit <ferruh.yigit@intel.com>
> Cc: dev@dpdk.org; jingjing.wu@intel.com; stephen@networkplumber.org
> Subject: Re: [PATCH v2 01/14] ethdev: add support for hairpin queue
> 
> Hi Ori,
> 
> see my answers below.
> 
> On 10/11/19 12:07 AM, Ori Kam wrote:
> > Hi Andrew,
> >
> > Thanks for your comments,
> >
> > PSB,
> > Ori
> >
> >> -----Original Message-----
> >> From: Andrew Rybchenko <arybchenko@solarflare.com>
> >> Sent: Tuesday, October 8, 2019 7:11 PM
> >> To: Ori Kam <orika@mellanox.com>; Thomas Monjalon
> >> <thomas@monjalon.net>; Ferruh Yigit <ferruh.yigit@intel.com>
> >> Cc: dev@dpdk.org; jingjing.wu@intel.com; stephen@networkplumber.org
> >> Subject: Re: [PATCH v2 01/14] ethdev: add support for hairpin queue
> >>
> >> Hi Ori,
> >>
> >> thanks for updated version. See my notes below.
> >>
> >> There are few style notes about line breaks which are not defined in
> >> coding style. Of course, it may be ignored.
> >>
> > I will fix what I can.
> >
> >> On 10/4/19 10:54 PM, Ori Kam wrote:
> >>> This commit introduce hairpin queue type.
> >>>
> >>> The hairpin queue in build from Rx queue binded to Tx queue.
> >>> It is used to offload traffic coming from the wire and redirect it back
> >>> to the wire.
> >>>
> >>> There are 3 new functions:
> >>> - rte_eth_dev_hairpin_capability_get
> >>> - rte_eth_rx_hairpin_queue_setup
> >>> - rte_eth_tx_hairpin_queue_setup
> >>>
> >>> In order to use the queue, there is a need to create rte_flow
> >>> with queue / RSS action that targets one or more of the Rx queues.
> >>>
> >>> Signed-off-by: Ori Kam <orika@mellanox.com>
> >> rte_eth_dev_[rt]x_queue_stop() should return error if used for hairpin
> >> queue.
> >> Right now rte_eth_dev_[rt]x_queue_start() will return 0. Not sure about it.
> >> What about rte_eth_rx_queue_info_get() and rte_eth_tx_queue_info_get()?
> >> Any other Rx/Tx queue functions?
> >>
> > I will add error to both functions (Tx/Rx)
> > I don't see any other function (only the info_get and queue_stop)
> >
> >>> ---
> >>> V2:
> >>>    - update according to ML comments.
> >>>
> >>> ---
> >>>    lib/librte_ethdev/rte_ethdev.c           | 214 ++++++++++++++++++-
> >>>    lib/librte_ethdev/rte_ethdev.h           | 126 ++++++++++++++++++
> >>>    lib/librte_ethdev/rte_ethdev_core.h      |  27 +++-
> >>>    lib/librte_ethdev/rte_ethdev_version.map |   5 +
> >>>    4 files changed, 368 insertions(+), 4 deletions(-)
> >>>
> >>> diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
> >>> index af82360..ee8af42 100644
> >>> --- a/lib/librte_ethdev/rte_ethdev.c
> >>> +++ b/lib/librte_ethdev/rte_ethdev.c
> >>> @@ -1752,12 +1752,102 @@ struct rte_eth_dev *
> >>>    		if (!dev->data->min_rx_buf_size ||
> >>>    		    dev->data->min_rx_buf_size > mbp_buf_size)
> >>>    			dev->data->min_rx_buf_size = mbp_buf_size;
> >>> +		if (dev->data->rx_queue_state[rx_queue_id] ==
> >>> +		    RTE_ETH_QUEUE_STATE_HAIRPIN)
> >>> +			dev->data->rx_queue_state[rx_queue_id] =
> >>> +				RTE_ETH_QUEUE_STATE_STOPPED;
> >> I don't understand it. Why is rte_eth_rx_queue_setup() changed?
> >>
> > This was done so user can reset the queue back to normal queue.
> 
> I think it should be done in queue release.
> 
> >>>    	}
> >>>
> >>>    	return eth_err(port_id, ret);
> >>>    }
> >>>
> >>>    int
> >>> +rte_eth_rx_hairpin_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
> >>> +			       uint16_t nb_rx_desc,
> >>> +			       const struct rte_eth_hairpin_conf *conf)
> >>> +{
> >>> +	int ret;
> >>> +	struct rte_eth_dev *dev;
> >>> +	struct rte_eth_hairpin_cap cap;
> >>> +	struct rte_eth_dev_info dev_info;
> >>> +	void **rxq;
> >>> +	int i;
> >>> +	int count = 0;
> >>> +
> >>> +	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
> >>> +
> >>> +	dev = &rte_eth_devices[port_id];
> >>> +	if (rx_queue_id >= dev->data->nb_rx_queues) {
> >>> +		RTE_ETHDEV_LOG(ERR, "Invalid RX queue_id=%u\n", rx_queue_id);
> >>> +		return -EINVAL;
> >>> +	}
> >>> +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_infos_get, -ENOTSUP);
> >>> +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->hairpin_cap_get,
> >>> +				-ENOTSUP);
> >>> +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_hairpin_queue_setup,
> >>> +				-ENOTSUP);
> >>> +	rte_eth_dev_hairpin_capability_get(port_id, &cap);
> >> Return value should be checked. It makes  hairpin_cap_get check above
> >> unnecessary.
> >>
> > I will add a check.
> >
> >>> +	/* Use default specified by driver, if nb_rx_desc is zero */
> >>> +	if (nb_rx_desc == 0)
> >>> +		nb_rx_desc = cap.max_nb_desc;
> >>> +	if (nb_rx_desc > cap.max_nb_desc) {
> >>> +		RTE_ETHDEV_LOG(ERR,
> >>> +			       "Invalid value for nb_rx_desc(=%hu), should be: "
> >>> +			       "<= %hu", nb_rx_desc, cap.max_nb_desc);
> >>> +		return -EINVAL;
> >>> +	}
> >>> +	ret = rte_eth_dev_info_get(port_id, &dev_info);
> >>> +	if (ret != 0)
> >>> +		return ret;
> >>> +	if (dev->data->dev_started &&
> >>> +		!(dev_info.dev_capa &
> >>> +		  RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP))
> >>> +		return -EBUSY;
> >>> +	if (dev->data->dev_started &&
> >>> +		(dev->data->rx_queue_state[rx_queue_id] !=
> >>> +		 RTE_ETH_QUEUE_STATE_STOPPED))
> >>> +		return -EBUSY;
> >>> +	if (conf->peer_n > cap.max_rx_2_tx) {
> >>> +		RTE_ETHDEV_LOG(ERR,
> >>> +			       "Invalid value for number of peers(=%hu), "
> >>> +			       "should be: <= %hu", conf->peer_n,
> >>> +			       cap.max_rx_2_tx);
> >>> +		return -EINVAL;
> >>> +	}
> >>> +	if (conf->peer_n == 0) {
> >>> +		RTE_ETHDEV_LOG(ERR,
> >>> +			       "Invalid value for number of peers(=%hu), "
> >>> +			       "should be: > 0", conf->peer_n);
> >>> +		return -EINVAL;
> >>> +	}
> >>> +	if (cap.max_n_queues != -1) {
> >>> +		for (i = 0; i < dev->data->nb_rx_queues; i++) {
> >>> +			if (dev->data->rx_queue_state[i] ==
> >>> +			    RTE_ETH_QUEUE_STATE_HAIRPIN)
> >>> +				count++;
> >>> +		}
> >>> +		if (count > cap.max_n_queues) {
> >>> +			RTE_ETHDEV_LOG(ERR,
> >>> +				       "To many Rx hairpin queues %d", count);
> >>> +			return -EINVAL;
> >>> +		}
> >>> +	}
> >>> +	rxq = dev->data->rx_queues;
> >>> +	if (rxq[rx_queue_id]) {
> >> Please, compare with NULL (I know that rte_eth_rx_queue_setup() does
> >> like above).
> >>
> > O.K. I will change, but may I ask why? If in the rte_eth_rx_queue_setup
> > function it is written the same way,
> > why do you want this change?
> 
> https://doc.dpdk.org/guides/contributing/coding_style.html#null-pointers
> 


Thanks,

> >>> +		RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_release,
> >>> +					-ENOTSUP);
> >>> +		(*dev->dev_ops->rx_queue_release)(rxq[rx_queue_id]);
> >>> +		rxq[rx_queue_id] = NULL;
> >>> +	}
> >>> +	ret = (*dev->dev_ops->rx_hairpin_queue_setup)(dev, rx_queue_id,
> >>> +						      nb_rx_desc, conf);
> >>> +	if (!ret)
> >> Please, compare with 0
> >>
> > Will do, but again just for my knowledge why?
> 
> https://doc.dpdk.org/guides/contributing/coding_style.html#function-calls
> 

Thanks,

> >>> +		dev->data->rx_queue_state[rx_queue_id] =
> >>> +			RTE_ETH_QUEUE_STATE_HAIRPIN;
> >>> +	return eth_err(port_id, ret);
> >>> +}
> >>> +
> >>> +int
> >>>    rte_eth_tx_queue_setup(uint16_t port_id, uint16_t tx_queue_id,
> >>>    		       uint16_t nb_tx_desc, unsigned int socket_id,
> >>>    		       const struct rte_eth_txconf *tx_conf)
> >>> @@ -1851,9 +1941,97 @@ struct rte_eth_dev *
> >>>    			__func__);
> >>>    		return -EINVAL;
> >>>    	}
> >>> +	ret = (*dev->dev_ops->tx_queue_setup)(dev, tx_queue_id, nb_tx_desc,
> >>> +					      socket_id, &local_conf);
> >>> +	if (!ret)
> >>> +		if (dev->data->tx_queue_state[tx_queue_id] ==
> >>> +		    RTE_ETH_QUEUE_STATE_HAIRPIN)
> >>> +			dev->data->tx_queue_state[tx_queue_id] =
> >>> +				RTE_ETH_QUEUE_STATE_STOPPED;
> >> Why is it changed?
> >>
> > Like in the Rx to enable switching back to normal queue.
> >
> >>> +	return eth_err(port_id, ret);
> >>> +}
> >>> +
> >>> +int
> >>> +rte_eth_tx_hairpin_queue_setup(uint16_t port_id, uint16_t tx_queue_id,
> >>> +			       uint16_t nb_tx_desc,
> >>> +			       const struct rte_eth_hairpin_conf *conf)
> >>> +{
> >>> +	struct rte_eth_dev *dev;
> >>> +	struct rte_eth_hairpin_cap cap;
> >>> +	struct rte_eth_dev_info dev_info;
> >>> +	void **txq;
> >>> +	int i;
> >>> +	int count = 0;
> >>> +	int ret;
> >>>
> >>> -	return eth_err(port_id, (*dev->dev_ops->tx_queue_setup)(dev,
> >>> -		       tx_queue_id, nb_tx_desc, socket_id, &local_conf));
> >>> +	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
> >>> +	dev = &rte_eth_devices[port_id];
> >>> +	if (tx_queue_id >= dev->data->nb_tx_queues) {
> >>> +		RTE_ETHDEV_LOG(ERR, "Invalid TX queue_id=%u\n", tx_queue_id);
> >>> +		return -EINVAL;
> >>> +	}
> >>> +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_infos_get, -ENOTSUP);
> >>> +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->hairpin_cap_get,
> >>> +				-ENOTSUP);
> >>> +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_hairpin_queue_setup,
> >>> +				-ENOTSUP);
> >>> +	rte_eth_dev_info_get(port_id, &dev_info);
> >> return value should be checked.
> >>
> >>> +	rte_eth_dev_hairpin_capability_get(port_id, &cap);
> >> Check return value and you can rely on  hairpin_cap_get check inside.
> >>
> > Same as above I will add a check but I'm not sure I understand the second
> > part
> > Just to make sure I understand, you are talking about that if I don't check
> > the return value I'm not allowed to check the cap right?
> 
> It is about dev_ops->hairpin_cap_get vs NULL check.
> It is checked inside rte_eth_dev_hairpin_capability_get() and we should
> not duplicate the check above.
> 

O.K.

> >>> +	/* Use default specified by driver, if nb_tx_desc is zero */
> >>> +	if (nb_tx_desc == 0)
> >>> +		nb_tx_desc = cap.max_nb_desc;
> >>> +	if (nb_tx_desc > cap.max_nb_desc) {
> >>> +		RTE_ETHDEV_LOG(ERR,
> >>> +			       "Invalid value for nb_tx_desc(=%hu), should be: "
> >>> +			       "<= %hu", nb_tx_desc, cap.max_nb_desc);
> >>> +		return -EINVAL;
> >>> +	}
> >>> +	if (conf->peer_n > cap.max_tx_2_rx) {
> >>> +		RTE_ETHDEV_LOG(ERR,
> >>> +			       "Invalid value for number of peers(=%hu), "
> >>> +			       "should be: <= %hu", conf->peer_n,
> >>> +			       cap.max_tx_2_rx);
> >>> +		return -EINVAL;
> >>> +	}
> >>> +	if (conf->peer_n == 0) {
> >>> +		RTE_ETHDEV_LOG(ERR,
> >>> +			       "Invalid value for number of peers(=%hu), "
> >>> +			       "should be: > 0", conf->peer_n);
> >>> +		return -EINVAL;
> >>> +	}
> >>> +	if (cap.max_n_queues != -1) {
> >>> +		for (i = 0; i < dev->data->nb_tx_queues; i++) {
> >>> +			if (dev->data->tx_queue_state[i] ==
> >>> +			    RTE_ETH_QUEUE_STATE_HAIRPIN)
> >>> +				count++;
> >>> +		}
> >>> +		if (count > cap.max_n_queues) {
> >>> +			RTE_ETHDEV_LOG(ERR,
> >>> +				       "To many Rx hairpin queues %d", count);
> >>> +			return -EINVAL;
> >>> +		}
> >>> +	}
> >>> +	if (dev->data->dev_started &&
> >>> +		!(dev_info.dev_capa &
> >>> +		  RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP))
> >>> +		return -EBUSY;
> >>> +	if (dev->data->dev_started &&
> >>> +		(dev->data->tx_queue_state[tx_queue_id] !=
> >>> +		 RTE_ETH_QUEUE_STATE_STOPPED))
> >>> +		return -EBUSY;
> >>> +	txq = dev->data->tx_queues;
> >>> +	if (txq[tx_queue_id]) {
> >> Please, compare with NULL
> >>
> > O.K.
> >
> >>> +		RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_queue_release,
> >>> +					-ENOTSUP);
> >>> +		(*dev->dev_ops->tx_queue_release)(txq[tx_queue_id]);
> >>> +		txq[tx_queue_id] = NULL;
> >>> +	}
> >>> +	ret = (*dev->dev_ops->tx_hairpin_queue_setup)
> >>> +		(dev, tx_queue_id, nb_tx_desc, conf);
> >>> +	if (!ret)
> >> Please, compare with 0
> >>
> > O.K.
> >
> >>> +		dev->data->tx_queue_state[tx_queue_id] =
> >>> +			RTE_ETH_QUEUE_STATE_HAIRPIN;
> >>> +	return eth_err(port_id, ret);
> >>>    }
> >>>
> >>>    void
> >>> @@ -3981,12 +4159,20 @@ int rte_eth_set_queue_rate_limit(uint16_t port_id, uint16_t queue_idx,
> >>>    	rte_errno = ENOTSUP;
> >>>    	return NULL;
> >>>    #endif
> >>> +	struct rte_eth_dev *dev;
> >>> +
> >>>    	/* check input parameters */
> >>>    	if (!rte_eth_dev_is_valid_port(port_id) || fn == NULL ||
> >>>    		    queue_id >= rte_eth_devices[port_id].data->nb_rx_queues) {
> >>>    		rte_errno = EINVAL;
> >>>    		return NULL;
> >>>    	}
> >>> +	dev = &rte_eth_devices[port_id];
> >>> +	if (dev->data->rx_queue_state[queue_id] ==
> >>> +	    RTE_ETH_QUEUE_STATE_HAIRPIN) {
> >> It looks like line break is not required above. Just to make code a bit
> >> shorter.
> >>
> > Will fix if possible.
> >
> >>> +		rte_errno = EINVAL;
> >>> +		return NULL;
> >>> +	}
> >>>    	struct rte_eth_rxtx_callback *cb = rte_zmalloc(NULL, sizeof(*cb), 0);
> >>>
> >>>    	if (cb == NULL) {
> >>> @@ -4058,6 +4244,8 @@ int rte_eth_set_queue_rate_limit(uint16_t port_id, uint16_t queue_idx,
> >>>    	rte_errno = ENOTSUP;
> >>>    	return NULL;
> >>>    #endif
> >>> +	struct rte_eth_dev *dev;
> >>> +
> >>>    	/* check input parameters */
> >>>    	if (!rte_eth_dev_is_valid_port(port_id) || fn == NULL ||
> >>>    		    queue_id >= rte_eth_devices[port_id].data->nb_tx_queues) {
> >>> @@ -4065,6 +4253,13 @@ int rte_eth_set_queue_rate_limit(uint16_t port_id, uint16_t queue_idx,
> >>>    		return NULL;
> >>>    	}
> >>>
> >>> +	dev = &rte_eth_devices[port_id];
> >>> +	if (dev->data->tx_queue_state[queue_id] ==
> >>> +	    RTE_ETH_QUEUE_STATE_HAIRPIN) {
> >> It looks like line break is not required above. Just to make code a bit
> >> shorter.
> >>
> > Will fix.
> >
> >>> +		rte_errno = EINVAL;
> >>> +		return NULL;
> >>> +	}
> >>> +
> >>>    	struct rte_eth_rxtx_callback *cb = rte_zmalloc(NULL, sizeof(*cb), 0);
> >>>
> >>>    	if (cb == NULL) {
> >>> @@ -4510,6 +4705,21 @@ int rte_eth_set_queue_rate_limit(uint16_t port_id, uint16_t queue_idx,
> >>>    }
> >>>
> >>>    int
> >>> +rte_eth_dev_hairpin_capability_get(uint16_t port_id,
> >>> +				   struct rte_eth_hairpin_cap *cap)
> >>> +{
> >>> +	struct rte_eth_dev *dev;
> >>> +
> >>> +	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
> >>> +
> >>> +	dev = &rte_eth_devices[port_id];
> >>> +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->hairpin_cap_get,
> >>> +				-ENOTSUP);
> >> I think it would be useful to memset() cap with 0 to be sure that it is
> >> initialized.
> >>
> > O.K but if we follow your comments from above, the result is not valid if we
> > return
> > error, so it is not a must, but like I said I don't care to add it.
> >
> >>> +	return eth_err(port_id, (*dev->dev_ops->hairpin_cap_get)
> >>> +		       (dev, cap));
> >>
> >> Please, consider to avoid line breaks above to make code a bit shorter.
> >> Of course, it should not exceed line length limit.
> >>
> > Will check.
> >
> >>> +}
> >>> +
> >>> +int
> >>>    rte_eth_dev_pool_ops_supported(uint16_t port_id, const char *pool)
> >>>    {
> >>>    	struct rte_eth_dev *dev;
> >>> diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
> >>> index d937fb4..29dcfea 100644
> >>> --- a/lib/librte_ethdev/rte_ethdev.h
> >>> +++ b/lib/librte_ethdev/rte_ethdev.h
> >>> @@ -804,6 +804,46 @@ struct rte_eth_txconf {
> >>>    };
> >>>
> >>>    /**
> >>> + * @warning
> >>> + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
> >>> + *
> >>> + * A structure used to return the hairpin capabilities that are supportd.
> >> supportd -> supported
> >>
> > Will fix.
> >
> >>> + */
> >>> +struct rte_eth_hairpin_cap {
> >>> +	int16_t max_n_queues;
> >>> +	/**< The max number of hairpin queuesi. -1 no limit. */
> >> I'm not sure that I like type difference from max_rx_queues here.
> >> I think it would be better to use uint16_t. I would say there is point
> >> to highlight no limit (first of all I'm not sure that it makes sense,
> >> second, UINT16_MAX will obviously do the job in uint16_t.
> >>
> >> Is it both Rx and Tx?
> >>
> >>> +	int16_t max_rx_2_tx;
> >>> +	/**< Max number of Rx queues to be connected to one Tx queue. */
> >>> +	int16_t max_tx_2_rx;
> >>> +	/**< Max number of Tx queues to be connected to one Rx queue. */
> >> I would prefer to have uint16_t here as well. Mainly for consistency.
> >>
> > Agree, will fix,
> >
> >>> +	uint16_t max_nb_desc; /**< The max num of descriptors. */
> >> Is it common for Rx and Tx? What about min?
> >>
> > I think min is always 1, by definition.
> 
> OK
> 
> >>> +};
> >>> +
> >>> +#define RTE_ETH_MAX_HAIRPIN_PEERS 32
> >>> +
> >>> +/**
> >>> + * @warning
> >>> + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
> >>> + *
> >>> + * A structure used to hold hairpin peer data.
> >>> + */
> >>> +struct rte_eth_hairpin_peer {
> >>> +	uint16_t port; /**< Peer port. */
> >>> +	uint16_t queue; /**< Peer queue. */
> >>> +};
> >>> +
> >>> +/**
> >>> + * @warning
> >>> + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
> >>> + *
> >>> + * A structure used to configure hairpin binding.
> >> It should be explained what happens if Rx queue has many Tx queues.
> >> Is packet duplicated? Or distributed?
> >>
> > Like we said before, I don't know; it depends on the NIC.
> > In the case of Mellanox we don't support 1-to-many.
> 
> I see, but what should the application expect?

I think that when a NIC supports such a connection, it should state the expected
results in its capabilities or in its documentation.
Personally I think that the common use case will be distribution (RSS on the Tx
side), but that is only my guess; your question will be answered when someone
implements it.
 
> 
> >>> + */
> >>> +struct rte_eth_hairpin_conf {
> >>> +	uint16_t peer_n; /**< The number of peers. */
> >>> +	struct rte_eth_hairpin_peer peers[RTE_ETH_MAX_HAIRPIN_PEERS];
> >>> +};
> >>> +
> >>> +/**
> >>>     * A structure contains information about HW descriptor ring limitations.
> >>>     */
> >>>    struct rte_eth_desc_lim {
> >>> @@ -1080,6 +1120,8 @@ struct rte_eth_conf {
> >>>    /**< Device supports Rx queue setup after device started*/
> >>>    #define RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP 0x00000002
> >>>    /**< Device supports Tx queue setup after device started*/
> >>> +#define RTE_ETH_DEV_CAPA_HAIRPIN_SUPPORT 0x00000004
> >>> +/**< Device supports hairpin queues. */
> >> Do we really need it? Isn't rte_eth_dev_hairpin_capability_get() returning
> >> -ENOTSUP sufficient?
> >>
> > I agree with you, but my thinking was that using the cap the application
> > can avoid calling to function that will fail.
> > If you approve I will remove this cap.
> 
> Runtime setup flags are required since these functions are always
> supported, but it is not always possible to use these functions when the
> device is started. In this case I think there is no necessity to add
> the capability, since we have no similar caps for anything else.
> 

O.K. will remove it.

> >>>    /*
> >>>     * If new Tx offload capabilities are defined, they also must be
> >>> @@ -1277,6 +1319,7 @@ struct rte_eth_dcb_info {
> >>>     */
> >>>    #define RTE_ETH_QUEUE_STATE_STOPPED 0
> >>>    #define RTE_ETH_QUEUE_STATE_STARTED 1
> >>> +#define RTE_ETH_QUEUE_STATE_HAIRPIN 2
> >>>
> >>>    #define RTE_ETH_ALL RTE_MAX_ETHPORTS
> >>>
> >>> @@ -1771,6 +1814,34 @@ int rte_eth_rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
> >>>    		struct rte_mempool *mb_pool);
> >>>
> >>>    /**
> >>> + * @warning
> >>> + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
> >>> + *
> >>> + * Allocate and set up a hairpin receive queue for an Ethernet device.
> >>> + *
> >>> + * The function set up the selected queue to be used in hairpin.
> >>> + *
> >>> + * @param port_id
> >>> + *   The port identifier of the Ethernet device.
> >>> + * @param rx_queue_id
> >>> + *   The index of the receive queue to set up.
> >>> + *   The value must be in the range [0, nb_rx_queue - 1] previously supplied
> >>> + *   to rte_eth_dev_configure().
> >>> + * @param nb_rx_desc
> >>> + *   The number of receive descriptors to allocate for the receive ring.
> >> Is it allows to specify 0 to pick driver recommended value?
> >>
> > Yes, (assuming you are talking about the nb_rx_desc)
> 
> Please, highlight it in the description.
> 

O.K.

[snip]

Thanks,
Ori


* [dpdk-dev] [PATCH v3 00/14] add hairpin feature
  2019-09-26  6:28 [dpdk-dev] [PATCH 00/13] add hairpin feature Ori Kam
                   ` (14 preceding siblings ...)
  2019-10-04 19:54 ` [dpdk-dev] [PATCH v2 00/14] " Ori Kam
@ 2019-10-15  9:04 ` Ori Kam
  2019-10-15  9:04   ` [dpdk-dev] [PATCH v3 01/14] ethdev: add support for hairpin queue Ori Kam
                     ` (13 more replies)
  2019-10-17 15:32 ` [dpdk-dev] [PATCH v4 00/15] " Ori Kam
                   ` (3 subsequent siblings)
  19 siblings, 14 replies; 186+ messages in thread
From: Ori Kam @ 2019-10-15  9:04 UTC (permalink / raw)
  Cc: dev, orika, jingjing.wu, stephen, wenzhuo.lu, bernard.iremonger,
	thomas, ferruh.yigit, arybchenko, viacheslavo

This patch set implements the hairpin feature.
The hairpin feature was introduced in RFC[1]

The hairpin feature (a different name could be "forward") acts as a "bump on
the wire", meaning that a packet received from the wire can be modified using
offloaded actions and then sent back to the wire without application
intervention, which saves CPU cycles.

The hairpin is the inverse of loopback, in which the application
sends a packet and then receives it again
without the packet being sent to the wire.

The hairpin can be used by a number of different VNFs, for example a load
balancer, a gateway and so on.

As can be seen from the hairpin description, hairpin is basically an Rx queue
connected to a Tx queue.

During the design phase I considered two ways to implement this
feature: the first is adding a new rte_flow action, and the second
is creating a special kind of queue.

The advantages of using the queue approach:
1. More control for the application: queue depth (the memory size that
should be used).
2. Enables QoS. QoS is normally a parameter of a queue, so with this approach
it will be easy to integrate with such a system.
3. Native integration with the rte_flow API. Just setting the target
queue/RSS to a hairpin queue will result in the traffic being routed
to the hairpin queue.
4. Enables queue offloading.

Each hairpin Rxq can be connected to one or more Txqs, which may belong to
different ports if the PMD supports it. The same goes the other
way: each hairpin Txq can be connected to one or more Rxqs.
This is the reason that both the Txq setup and the Rxq setup receive the
hairpin configuration structure.

From the PMD perspective, the number of Rxqs/Txqs is the total of standard
queues plus hairpin queues, as sketched below.
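
For instance (a sketch; the queue counts and port_conf are illustrative):

    /* 4 standard queues + 1 hairpin queue per direction: the PMD sees
     * nb_rx_queues = nb_tx_queues = 5 */
    rte_eth_dev_configure(port_id, 5, 5, &port_conf);
    /* queues 0-3 are set up as normal queues, queue 4 as hairpin */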

To configure a hairpin queue the user should call
rte_eth_rx_hairpin_queue_setup / rte_eth_tx_hairpin_queue_setup instead
of the normal queue setup functions.
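
A minimal setup sketch (the queue index, ring size and peer values are
illustrative; error handling omitted):

    struct rte_eth_hairpin_conf hairpin_conf = {
        .peer_n = 1,
        /* bind Rx queue 4 and Tx queue 4 of this port to each other */
        .peers[0] = { .port = port_id, .queue = 4 },
    };
    /* nb_desc == 0 lets the PMD pick its default ring size */
    rte_eth_rx_hairpin_queue_setup(port_id, 4, 0, &hairpin_conf);
    rte_eth_tx_hairpin_queue_setup(port_id, 4, 0, &hairpin_conf);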

The hairpin queues are not part of the normal RSS functionality.

To use the queues the user simply creates a flow that points to RSS/queue
actions that target hairpin queues.
The reasons for adding two new functions for hairpin queue setup are:
1. Avoid an API break.
2. Avoid extra and unused parameters.


This series must be applied after series[2]

[1] https://inbox.dpdk.org/dev/1565703468-55617-1-git-send-email-orika@mellanox.com/
[2] https://inbox.dpdk.org/dev/1569398015-6027-1-git-send-email-viacheslavo@mellanox.com/

Cc: wenzhuo.lu@intel.com
Cc: bernard.iremonger@intel.com
Cc: thomas@monjalon.net
Cc: ferruh.yigit@intel.com
Cc: arybchenko@solarflare.com
Cc: viacheslavo@mellanox.com

------
V3:
 - update according to comments from ML.

V2:
 - update according to comments from ML.

Ori Kam (14):
  ethdev: add support for hairpin queue
  net/mlx5: query hca hairpin capabilities
  net/mlx5: support Rx hairpin queues
  net/mlx5: prepare txq to work with different types
  net/mlx5: support Tx hairpin queues
  net/mlx5: add get hairpin capabilities
  app/testpmd: add hairpin support
  net/mlx5: add hairpin binding function
  net/mlx5: add support for hairpin hrxq
  net/mlx5: add internal tag item and action
  net/mlx5: add id generation function
  net/mlx5: add default flows for hairpin
  net/mlx5: split hairpin flows
  doc: add hairpin feature

 app/test-pmd/parameters.c                |  28 +++
 app/test-pmd/testpmd.c                   | 109 ++++++++-
 app/test-pmd/testpmd.h                   |   3 +
 doc/guides/rel_notes/release_19_11.rst   |   6 +
 drivers/net/mlx5/mlx5.c                  | 170 ++++++++++++-
 drivers/net/mlx5/mlx5.h                  |  69 +++++-
 drivers/net/mlx5/mlx5_devx_cmds.c        | 194 +++++++++++++++
 drivers/net/mlx5/mlx5_ethdev.c           | 129 ++++++++--
 drivers/net/mlx5/mlx5_flow.c             | 393 ++++++++++++++++++++++++++++++-
 drivers/net/mlx5/mlx5_flow.h             |  73 +++++-
 drivers/net/mlx5/mlx5_flow_dv.c          | 231 +++++++++++++++++-
 drivers/net/mlx5/mlx5_flow_verbs.c       |  11 +-
 drivers/net/mlx5/mlx5_prm.h              | 127 +++++++++-
 drivers/net/mlx5/mlx5_rss.c              |   1 +
 drivers/net/mlx5/mlx5_rxq.c              | 318 ++++++++++++++++++++++---
 drivers/net/mlx5/mlx5_rxtx.c             |   2 +-
 drivers/net/mlx5/mlx5_rxtx.h             |  68 +++++-
 drivers/net/mlx5/mlx5_trigger.c          | 140 ++++++++++-
 drivers/net/mlx5/mlx5_txq.c              | 294 +++++++++++++++++++----
 lib/librte_ethdev/rte_ethdev.c           | 260 +++++++++++++++++++-
 lib/librte_ethdev/rte_ethdev.h           | 143 ++++++++++-
 lib/librte_ethdev/rte_ethdev_core.h      |  27 ++-
 lib/librte_ethdev/rte_ethdev_version.map |   5 +
 23 files changed, 2632 insertions(+), 169 deletions(-)

-- 
1.8.3.1



* [dpdk-dev] [PATCH v3 01/14] ethdev: add support for hairpin queue
  2019-10-15  9:04 ` [dpdk-dev] [PATCH v3 00/14] " Ori Kam
@ 2019-10-15  9:04   ` Ori Kam
  2019-10-15 10:12     ` Andrew Rybchenko
  2019-10-15  9:04   ` [dpdk-dev] [PATCH v3 02/14] net/mlx5: query hca hairpin capabilities Ori Kam
                     ` (12 subsequent siblings)
  13 siblings, 1 reply; 186+ messages in thread
From: Ori Kam @ 2019-10-15  9:04 UTC (permalink / raw)
  To: Thomas Monjalon, Ferruh Yigit, Andrew Rybchenko
  Cc: dev, orika, jingjing.wu, stephen

This commit introduces the hairpin queue type.

A hairpin queue is built from an Rx queue bound to a Tx queue.
It is used to offload traffic coming from the wire and redirect it back
to the wire.

There are 3 new functions:
- rte_eth_dev_hairpin_capability_get
- rte_eth_rx_hairpin_queue_setup
- rte_eth_tx_hairpin_queue_setup

In order to use the queue, the application needs to create an rte_flow
with a queue / RSS action that targets one or more of the Rx queues,
for example as in the sketch below.
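
A possible flow steering all ingress traffic to hairpin Rx queue 4
(a sketch; the queue index is illustrative):

    struct rte_flow_attr attr = { .ingress = 1 };
    struct rte_flow_item pattern[] = {
        { .type = RTE_FLOW_ITEM_TYPE_ETH },
        { .type = RTE_FLOW_ITEM_TYPE_END },
    };
    struct rte_flow_action_queue queue = { .index = 4 };
    struct rte_flow_action actions[] = {
        { .type = RTE_FLOW_ACTION_TYPE_QUEUE, .conf = &queue },
        { .type = RTE_FLOW_ACTION_TYPE_END },
    };
    struct rte_flow_error error;
    struct rte_flow *flow = rte_flow_create(port_id, &attr, pattern,
                                            actions, &error);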

Signed-off-by: Ori Kam <orika@mellanox.com>

---
V3:
 - update according to ML comments.

V2:
 - update according to ML comments.

---
 lib/librte_ethdev/rte_ethdev.c           | 260 ++++++++++++++++++++++++++++++-
 lib/librte_ethdev/rte_ethdev.h           | 143 ++++++++++++++++-
 lib/librte_ethdev/rte_ethdev_core.h      |  27 +++-
 lib/librte_ethdev/rte_ethdev_version.map |   5 +
 4 files changed, 422 insertions(+), 13 deletions(-)

diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
index af82360..22a97de 100644
--- a/lib/librte_ethdev/rte_ethdev.c
+++ b/lib/librte_ethdev/rte_ethdev.c
@@ -904,10 +904,19 @@ struct rte_eth_dev *
 
 	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_start, -ENOTSUP);
 
-	if (dev->data->rx_queue_state[rx_queue_id] != RTE_ETH_QUEUE_STATE_STOPPED) {
-		RTE_ETHDEV_LOG(INFO,
-			"Queue %"PRIu16" of device with port_id=%"PRIu16" already started\n",
+	if (dev->data->rx_queue_state[rx_queue_id] ==
+	    RTE_ETH_QUEUE_STATE_HAIRPIN) {
+		RTE_ETHDEV_LOG(INFO, "Queue %"PRIu16" of device with "
+			"port_id=%"PRIu16" is hairpin queue\n",
 			rx_queue_id, port_id);
+		return -EINVAL;
+	}
+
+	if (dev->data->rx_queue_state[rx_queue_id] ==
+	    RTE_ETH_QUEUE_STATE_STARTED) {
+		RTE_ETHDEV_LOG(INFO, "Queue %"PRIu16" of device with "
+			"port_id=%"PRIu16" already started\n", rx_queue_id,
+			port_id);
 		return 0;
 	}
 
@@ -931,6 +940,14 @@ struct rte_eth_dev *
 
 	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_stop, -ENOTSUP);
 
+	if (dev->data->rx_queue_state[rx_queue_id] ==
+	    RTE_ETH_QUEUE_STATE_HAIRPIN) {
+		RTE_ETHDEV_LOG(INFO, "Queue %"PRIu16" of device with "
+			"port_id=%"PRIu16" is hairpin queue\n", rx_queue_id,
+			port_id);
+		return -EINVAL;
+	}
+
 	if (dev->data->rx_queue_state[rx_queue_id] == RTE_ETH_QUEUE_STATE_STOPPED) {
 		RTE_ETHDEV_LOG(INFO,
 			"Queue %"PRIu16" of device with port_id=%"PRIu16" already stopped\n",
@@ -964,6 +981,14 @@ struct rte_eth_dev *
 
 	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_queue_start, -ENOTSUP);
 
+	if (dev->data->tx_queue_state[tx_queue_id] ==
+	   RTE_ETH_QUEUE_STATE_HAIRPIN) {
+		RTE_ETHDEV_LOG(INFO, "Queue %"PRIu16" of device with "
+			"port_id=%"PRIu16" is hairpin queue\n", tx_queue_id,
+			port_id);
+		return -EINVAL;
+	}
+
 	if (dev->data->tx_queue_state[tx_queue_id] != RTE_ETH_QUEUE_STATE_STOPPED) {
 		RTE_ETHDEV_LOG(INFO,
 			"Queue %"PRIu16" of device with port_id=%"PRIu16" already started\n",
@@ -989,6 +1014,14 @@ struct rte_eth_dev *
 
 	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_queue_stop, -ENOTSUP);
 
+	if (dev->data->tx_queue_state[tx_queue_id] ==
+	    RTE_ETH_QUEUE_STATE_HAIRPIN) {
+		RTE_ETHDEV_LOG(INFO, "Queue %"PRIu16" of device with "
+			"port_id=%"PRIu16" is hairpin queue\n", tx_queue_id,
+			port_id);
+		return -EINVAL;
+	}
+
 	if (dev->data->tx_queue_state[tx_queue_id] == RTE_ETH_QUEUE_STATE_STOPPED) {
 		RTE_ETHDEV_LOG(INFO,
 			"Queue %"PRIu16" of device with port_id=%"PRIu16" already stopped\n",
@@ -1758,6 +1791,92 @@ struct rte_eth_dev *
 }
 
 int
+rte_eth_rx_hairpin_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
+			       uint16_t nb_rx_desc,
+			       const struct rte_eth_hairpin_conf *conf)
+{
+	int ret;
+	struct rte_eth_dev *dev;
+	struct rte_eth_hairpin_cap cap;
+	struct rte_eth_dev_info dev_info;
+	void **rxq;
+	int i;
+	int count = 0;
+
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
+
+	dev = &rte_eth_devices[port_id];
+	if (rx_queue_id >= dev->data->nb_rx_queues) {
+		RTE_ETHDEV_LOG(ERR, "Invalid RX queue_id=%u\n", rx_queue_id);
+		return -EINVAL;
+	}
+	ret = rte_eth_dev_hairpin_capability_get(port_id, &cap);
+	if (ret != 0)
+		return ret;
+	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_infos_get, -ENOTSUP);
+	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_hairpin_queue_setup,
+				-ENOTSUP);
+	/* Use default specified by driver, if nb_rx_desc is zero */
+	if (nb_rx_desc == 0)
+		nb_rx_desc = cap.max_nb_desc;
+	if (nb_rx_desc > cap.max_nb_desc) {
+		RTE_ETHDEV_LOG(ERR,
+			       "Invalid value for nb_rx_desc(=%hu), should be: "
+			       "<= %hu", nb_rx_desc, cap.max_nb_desc);
+		return -EINVAL;
+	}
+	ret = rte_eth_dev_info_get(port_id, &dev_info);
+	if (ret != 0)
+		return ret;
+	if (dev->data->dev_started &&
+		!(dev_info.dev_capa &
+		  RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP))
+		return -EBUSY;
+	if (dev->data->dev_started &&
+		(dev->data->rx_queue_state[rx_queue_id] !=
+		 RTE_ETH_QUEUE_STATE_STOPPED))
+		return -EBUSY;
+	if (conf->peer_n > cap.max_rx_2_tx) {
+		RTE_ETHDEV_LOG(ERR,
+			       "Invalid value for number of peers(=%hu), "
+			       "should be: <= %hu", conf->peer_n,
+			       cap.max_rx_2_tx);
+		return -EINVAL;
+	}
+	if (conf->peer_n == 0) {
+		RTE_ETHDEV_LOG(ERR,
+			       "Invalid value for number of peers(=%hu), "
+			       "should be: > 0", conf->peer_n);
+		return -EINVAL;
+	}
+	if (cap.max_n_queues != UINT16_MAX) {
+		for (i = 0; i < dev->data->nb_rx_queues; i++) {
+			if (dev->data->rx_queue_state[i] ==
+			    RTE_ETH_QUEUE_STATE_HAIRPIN)
+				count++;
+		}
+		if (count > cap.max_n_queues) {
+			RTE_ETHDEV_LOG(ERR,
+				       "Too many Rx hairpin queues %d", count);
+			return -EINVAL;
+		}
+	}
+	rxq = dev->data->rx_queues;
+	if (rxq[rx_queue_id] != NULL) {
+		RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_release,
+					-ENOTSUP);
+		(*dev->dev_ops->rx_queue_release)(rxq[rx_queue_id]);
+		rxq[rx_queue_id] = NULL;
+	}
+	ret = (*dev->dev_ops->rx_hairpin_queue_setup)(dev, rx_queue_id,
+						      nb_rx_desc, conf);
+	if (ret == 0)
+		dev->data->rx_queue_state[rx_queue_id] =
+			RTE_ETH_QUEUE_STATE_HAIRPIN;
+	return eth_err(port_id, ret);
+}
+
+int
 rte_eth_tx_queue_setup(uint16_t port_id, uint16_t tx_queue_id,
 		       uint16_t nb_tx_desc, unsigned int socket_id,
 		       const struct rte_eth_txconf *tx_conf)
@@ -1851,9 +1970,92 @@ struct rte_eth_dev *
 			__func__);
 		return -EINVAL;
 	}
+	ret = (*dev->dev_ops->tx_queue_setup)(dev, tx_queue_id, nb_tx_desc,
+					      socket_id, &local_conf);
+	return eth_err(port_id, ret);
+}
 
-	return eth_err(port_id, (*dev->dev_ops->tx_queue_setup)(dev,
-		       tx_queue_id, nb_tx_desc, socket_id, &local_conf));
+int
+rte_eth_tx_hairpin_queue_setup(uint16_t port_id, uint16_t tx_queue_id,
+			       uint16_t nb_tx_desc,
+			       const struct rte_eth_hairpin_conf *conf)
+{
+	struct rte_eth_dev *dev;
+	struct rte_eth_hairpin_cap cap;
+	struct rte_eth_dev_info dev_info;
+	void **txq;
+	int i;
+	int count = 0;
+	int ret;
+
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
+	dev = &rte_eth_devices[port_id];
+	if (tx_queue_id >= dev->data->nb_tx_queues) {
+		RTE_ETHDEV_LOG(ERR, "Invalid TX queue_id=%u\n", tx_queue_id);
+		return -EINVAL;
+	}
+	ret = rte_eth_dev_hairpin_capability_get(port_id, &cap);
+	if (ret != 0)
+		return ret;
+	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_infos_get, -ENOTSUP);
+	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_hairpin_queue_setup,
+				-ENOTSUP);
+	ret = rte_eth_dev_info_get(port_id, &dev_info);
+	if (ret != 0)
+		return ret;
+	/* Use default specified by driver, if nb_tx_desc is zero */
+	if (nb_tx_desc == 0)
+		nb_tx_desc = cap.max_nb_desc;
+	if (nb_tx_desc > cap.max_nb_desc) {
+		RTE_ETHDEV_LOG(ERR,
+			       "Invalid value for nb_tx_desc(=%hu), should be: "
+			       "<= %hu", nb_tx_desc, cap.max_nb_desc);
+		return -EINVAL;
+	}
+	if (conf->peer_n > cap.max_tx_2_rx) {
+		RTE_ETHDEV_LOG(ERR,
+			       "Invalid value for number of peers(=%hu), "
+			       "should be: <= %hu", conf->peer_n,
+			       cap.max_tx_2_rx);
+		return -EINVAL;
+	}
+	if (conf->peer_n == 0) {
+		RTE_ETHDEV_LOG(ERR,
+			       "Invalid value for number of peers(=%hu), "
+			       "should be: > 0", conf->peer_n);
+		return -EINVAL;
+	}
+	if (cap.max_n_queues != UINT16_MAX) {
+		for (i = 0; i < dev->data->nb_tx_queues; i++) {
+			if (dev->data->tx_queue_state[i] ==
+			    RTE_ETH_QUEUE_STATE_HAIRPIN)
+				count++;
+		}
+		if (count > cap.max_n_queues) {
+			RTE_ETHDEV_LOG(ERR,
+				       "Too many Tx hairpin queues %d", count);
+			return -EINVAL;
+		}
+	}
+	if (dev->data->dev_started &&
+		!(dev_info.dev_capa &
+		  RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP))
+		return -EBUSY;
+	if (dev->data->dev_started &&
+		(dev->data->tx_queue_state[tx_queue_id] !=
+		 RTE_ETH_QUEUE_STATE_STOPPED))
+		return -EBUSY;
+	txq = dev->data->tx_queues;
+	if (txq[tx_queue_id] != NULL) {
+		RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_queue_release,
+					-ENOTSUP);
+		(*dev->dev_ops->tx_queue_release)(txq[tx_queue_id]);
+		txq[tx_queue_id] = NULL;
+	}
+	ret = (*dev->dev_ops->tx_hairpin_queue_setup)
+		(dev, tx_queue_id, nb_tx_desc, conf);
+	if (ret == 0)
+		dev->data->tx_queue_state[tx_queue_id] =
+			RTE_ETH_QUEUE_STATE_HAIRPIN;
+	return eth_err(port_id, ret);
 }
 
 void
@@ -3981,12 +4183,20 @@ int rte_eth_set_queue_rate_limit(uint16_t port_id, uint16_t queue_idx,
 	rte_errno = ENOTSUP;
 	return NULL;
 #endif
+	struct rte_eth_dev *dev;
+
 	/* check input parameters */
 	if (!rte_eth_dev_is_valid_port(port_id) || fn == NULL ||
 		    queue_id >= rte_eth_devices[port_id].data->nb_rx_queues) {
 		rte_errno = EINVAL;
 		return NULL;
 	}
+	dev = &rte_eth_devices[port_id];
+	if (dev->data->rx_queue_state[queue_id] ==
+	    RTE_ETH_QUEUE_STATE_HAIRPIN) {
+		rte_errno = EINVAL;
+		return NULL;
+	}
 	struct rte_eth_rxtx_callback *cb = rte_zmalloc(NULL, sizeof(*cb), 0);
 
 	if (cb == NULL) {
@@ -4058,6 +4268,8 @@ int rte_eth_set_queue_rate_limit(uint16_t port_id, uint16_t queue_idx,
 	rte_errno = ENOTSUP;
 	return NULL;
 #endif
+	struct rte_eth_dev *dev;
+
 	/* check input parameters */
 	if (!rte_eth_dev_is_valid_port(port_id) || fn == NULL ||
 		    queue_id >= rte_eth_devices[port_id].data->nb_tx_queues) {
@@ -4065,6 +4277,13 @@ int rte_eth_set_queue_rate_limit(uint16_t port_id, uint16_t queue_idx,
 		return NULL;
 	}
 
+	dev = &rte_eth_devices[port_id];
+	if (dev->data->tx_queue_state[queue_id] ==
+	    RTE_ETH_QUEUE_STATE_HAIRPIN) {
+		rte_errno = EINVAL;
+		return NULL;
+	}
+
 	struct rte_eth_rxtx_callback *cb = rte_zmalloc(NULL, sizeof(*cb), 0);
 
 	if (cb == NULL) {
@@ -4180,6 +4399,14 @@ int rte_eth_set_queue_rate_limit(uint16_t port_id, uint16_t queue_idx,
 
 	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rxq_info_get, -ENOTSUP);
 
+	if (dev->data->rx_queue_state[queue_id] ==
+	    RTE_ETH_QUEUE_STATE_HAIRPIN) {
+		RTE_ETHDEV_LOG(INFO, "Queue %"PRIu16" of device with "
+			"port_id=%"PRIu16" is hairpin queue\n", queue_id,
+			port_id);
+		return -EINVAL;
+	}
+
 	memset(qinfo, 0, sizeof(*qinfo));
 	dev->dev_ops->rxq_info_get(dev, queue_id, qinfo);
 	return 0;
@@ -4202,6 +4429,14 @@ int rte_eth_set_queue_rate_limit(uint16_t port_id, uint16_t queue_idx,
 		return -EINVAL;
 	}
 
+	if (dev->data->tx_queue_state[queue_id] ==
+	    RTE_ETH_QUEUE_STATE_HAIRPIN) {
+		RTE_ETHDEV_LOG(INFO, "Queue %"PRIu16" of device with "
+			"port_id=%"PRIu16" is hairpin queue\n", queue_id,
+			port_id);
+		return -EINVAL;
+	}
+
 	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->txq_info_get, -ENOTSUP);
 
 	memset(qinfo, 0, sizeof(*qinfo));
@@ -4510,6 +4745,21 @@ int rte_eth_set_queue_rate_limit(uint16_t port_id, uint16_t queue_idx,
 }
 
 int
+rte_eth_dev_hairpin_capability_get(uint16_t port_id,
+				   struct rte_eth_hairpin_cap *cap)
+{
+	struct rte_eth_dev *dev;
+
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
+
+	dev = &rte_eth_devices[port_id];
+	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->hairpin_cap_get,
+				-ENOTSUP);
+	memset(cap, 0, sizeof(*cap));
+	return eth_err(port_id, (*dev->dev_ops->hairpin_cap_get)(dev, cap));
+}
+
+int
 rte_eth_dev_pool_ops_supported(uint16_t port_id, const char *pool)
 {
 	struct rte_eth_dev *dev;
diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
index d937fb4..51843c1 100644
--- a/lib/librte_ethdev/rte_ethdev.h
+++ b/lib/librte_ethdev/rte_ethdev.h
@@ -804,6 +804,46 @@ struct rte_eth_txconf {
 };
 
 /**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * A structure used to return the hairpin capabilities that are supported.
+ */
+struct rte_eth_hairpin_cap {
+	uint16_t max_n_queues;
+	/**< The max number of hairpin queues (different bindings). */
+	uint16_t max_rx_2_tx;
+	/**< Max number of Rx queues to be connected to one Tx queue. */
+	uint16_t max_tx_2_rx;
+	/**< Max number of Tx queues to be connected to one Rx queue. */
+	uint16_t max_nb_desc; /**< The max num of descriptors. */
+};
+
+#define RTE_ETH_MAX_HAIRPIN_PEERS 32
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * A structure used to hold hairpin peer data.
+ */
+struct rte_eth_hairpin_peer {
+	uint16_t port; /**< Peer port. */
+	uint16_t queue; /**< Peer queue. */
+};
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * A structure used to configure hairpin binding.
+ */
+struct rte_eth_hairpin_conf {
+	uint16_t peer_n; /**< The number of peers. */
+	struct rte_eth_hairpin_peer peers[RTE_ETH_MAX_HAIRPIN_PEERS];
+};
+
+/**
  * A structure contains information about HW descriptor ring limitations.
  */
 struct rte_eth_desc_lim {
@@ -1277,6 +1317,7 @@ struct rte_eth_dcb_info {
  */
 #define RTE_ETH_QUEUE_STATE_STOPPED 0
 #define RTE_ETH_QUEUE_STATE_STARTED 1
+#define RTE_ETH_QUEUE_STATE_HAIRPIN 2
 
 #define RTE_ETH_ALL RTE_MAX_ETHPORTS
 
@@ -1771,6 +1812,36 @@ int rte_eth_rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
 		struct rte_mempool *mb_pool);
 
 /**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Allocate and set up a hairpin receive queue for an Ethernet device.
+ *
+ * The function sets up the selected queue to be used in hairpin mode.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param rx_queue_id
+ *   The index of the receive queue to set up.
+ *   The value must be in the range [0, nb_rx_queue - 1] previously supplied
+ *   to rte_eth_dev_configure().
+ * @param nb_rx_desc
+ *   The number of receive descriptors to allocate for the receive ring.
+ *   0 means the PMD will use default value.
+ * @param conf
+ *   The pointer to the hairpin configuration.
+ * @return
+ *   - (0) if successful.
+ *   - (-ENOTSUP) if hardware doesn't support.
+ *   - (-EINVAL) if bad parameter.
+ *   - (-ENOMEM) if unable to allocate the resources.
+ */
+__rte_experimental
+int rte_eth_rx_hairpin_queue_setup
+	(uint16_t port_id, uint16_t rx_queue_id, uint16_t nb_rx_desc,
+	 const struct rte_eth_hairpin_conf *conf);
+
+/**
  * Allocate and set up a transmit queue for an Ethernet device.
  *
  * @param port_id
@@ -1823,6 +1894,35 @@ int rte_eth_tx_queue_setup(uint16_t port_id, uint16_t tx_queue_id,
 		const struct rte_eth_txconf *tx_conf);
 
 /**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Allocate and set up a transmit hairpin queue for an Ethernet device.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param tx_queue_id
+ *   The index of the transmit queue to set up.
+ *   The value must be in the range [0, nb_tx_queue - 1] previously supplied
+ *   to rte_eth_dev_configure().
+ * @param nb_tx_desc
+ *   The number of transmit descriptors to allocate for the transmit ring.
+ *   0 to set default PMD value.
+ * @param conf
+ *   The hairpin configuration.
+ *
+ * @return
+ *   - (0) if successful.
+ *   - (-ENOTSUP) if hardware doesn't support hairpin queues.
+ *   - (-EINVAL) if bad parameter.
+ *   - (-ENOMEM) if unable to allocate the resources.
+ */
+__rte_experimental
+int rte_eth_tx_hairpin_queue_setup
+	(uint16_t port_id, uint16_t tx_queue_id, uint16_t nb_tx_desc,
+	 const struct rte_eth_hairpin_conf *conf);
+
+/**
  * Return the NUMA socket to which an Ethernet device is connected
  *
  * @param port_id
@@ -1857,7 +1957,7 @@ int rte_eth_tx_queue_setup(uint16_t port_id, uint16_t tx_queue_id,
  *   to rte_eth_dev_configure().
  * @return
  *   - 0: Success, the receive queue is started.
- *   - -EINVAL: The port_id or the queue_id out of range.
+ *   - -EINVAL: The port_id or the queue_id is out of range, or the queue
+ *     is a hairpin queue.
  *   - -EIO: if device is removed.
  *   - -ENOTSUP: The function not supported in PMD driver.
  */
@@ -1874,7 +1974,7 @@ int rte_eth_tx_queue_setup(uint16_t port_id, uint16_t tx_queue_id,
  *   to rte_eth_dev_configure().
  * @return
  *   - 0: Success, the receive queue is stopped.
- *   - -EINVAL: The port_id or the queue_id out of range.
+ *   - -EINVAL: The port_id or the queue_id is out of range, or the queue
+ *     is a hairpin queue.
  *   - -EIO: if device is removed.
  *   - -ENOTSUP: The function not supported in PMD driver.
  */
@@ -1892,7 +1992,7 @@ int rte_eth_tx_queue_setup(uint16_t port_id, uint16_t tx_queue_id,
  *   to rte_eth_dev_configure().
  * @return
  *   - 0: Success, the transmit queue is started.
- *   - -EINVAL: The port_id or the queue_id out of range.
+ *   - -EINVAL: The port_id or the queue_id is out of range, or the queue
+ *     is a hairpin queue.
  *   - -EIO: if device is removed.
  *   - -ENOTSUP: The function not supported in PMD driver.
  */
@@ -1909,7 +2009,7 @@ int rte_eth_tx_queue_setup(uint16_t port_id, uint16_t tx_queue_id,
  *   to rte_eth_dev_configure().
  * @return
  *   - 0: Success, the transmit queue is stopped.
- *   - -EINVAL: The port_id or the queue_id out of range.
+ *   - -EINVAL: The port_id or the queue_id is out of range, or the queue
+ *     is a hairpin queue.
  *   - -EIO: if device is removed.
  *   - -ENOTSUP: The function not supported in PMD driver.
  */
@@ -3575,7 +3675,8 @@ int rte_eth_remove_tx_callback(uint16_t port_id, uint16_t queue_id,
  * @return
  *   - 0: Success
  *   - -ENOTSUP: routine is not supported by the device PMD.
- *   - -EINVAL:  The port_id or the queue_id is out of range.
+ *   - -EINVAL:  The port_id or the queue_id is out of range, or the queue
+ *               is a hairpin queue.
  */
 int rte_eth_rx_queue_info_get(uint16_t port_id, uint16_t queue_id,
 	struct rte_eth_rxq_info *qinfo);
@@ -3595,7 +3696,8 @@ int rte_eth_rx_queue_info_get(uint16_t port_id, uint16_t queue_id,
  * @return
  *   - 0: Success
  *   - -ENOTSUP: routine is not supported by the device PMD.
- *   - -EINVAL:  The port_id or the queue_id is out of range.
+ *   - -EINVAL:  The port_id or the queue_id is out of range, or the queue
+ *               is a hairpin queue.
  */
 int rte_eth_tx_queue_info_get(uint16_t port_id, uint16_t queue_id,
 	struct rte_eth_txq_info *qinfo);
@@ -4037,6 +4139,23 @@ int rte_eth_dev_adjust_nb_rx_tx_desc(uint16_t port_id,
 void *
 rte_eth_dev_get_sec_ctx(uint16_t port_id);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Query the device hairpin capabilities.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param cap
+ *   Pointer to a structure that will hold the hairpin capabilities.
+ * @return
+ *   - (0) if successful.
+ *   - (-ENOTSUP) if hardware doesn't support hairpin.
+ */
+__rte_experimental
+int rte_eth_dev_hairpin_capability_get(uint16_t port_id,
+				       struct rte_eth_hairpin_cap *cap);
 
 #include <rte_ethdev_core.h>
 
@@ -4137,6 +4256,12 @@ int rte_eth_dev_adjust_nb_rx_tx_desc(uint16_t port_id,
 		RTE_ETHDEV_LOG(ERR, "Invalid RX queue_id=%u\n", queue_id);
 		return 0;
 	}
+	if (dev->data->rx_queue_state[queue_id] ==
+	    RTE_ETH_QUEUE_STATE_HAIRPIN) {
+		RTE_ETHDEV_LOG(ERR, "RX queue_id=%u is hairpin queue\n",
+			       queue_id);
+		return 0;
+	}
 #endif
 	nb_rx = (*dev->rx_pkt_burst)(dev->data->rx_queues[queue_id],
 				     rx_pkts, nb_pkts);
@@ -4403,6 +4528,12 @@ static inline int rte_eth_tx_descriptor_status(uint16_t port_id,
 		RTE_ETHDEV_LOG(ERR, "Invalid TX queue_id=%u\n", queue_id);
 		return 0;
 	}
+	if (dev->data->tx_queue_state[queue_id] ==
+	    RTE_ETH_QUEUE_STATE_HAIRPIN) {
+		RTE_ETHDEV_LOG(ERR, "TX queue_id=%u is hairpin queue\n",
+			       queue_id);
+		return 0;
+	}
 #endif
 
 #ifdef RTE_ETHDEV_RXTX_CALLBACKS
diff --git a/lib/librte_ethdev/rte_ethdev_core.h b/lib/librte_ethdev/rte_ethdev_core.h
index dcb5ae6..ef46e71 100644
--- a/lib/librte_ethdev/rte_ethdev_core.h
+++ b/lib/librte_ethdev/rte_ethdev_core.h
@@ -250,6 +250,12 @@ typedef int (*eth_rx_queue_setup_t)(struct rte_eth_dev *dev,
 				    struct rte_mempool *mb_pool);
 /**< @internal Set up a receive queue of an Ethernet device. */
 
+typedef int (*eth_rx_hairpin_queue_setup_t)
+	(struct rte_eth_dev *dev, uint16_t rx_queue_id,
+	 uint16_t nb_rx_desc,
+	 const struct rte_eth_hairpin_conf *conf);
+/**< @internal Set up a receive hairpin queue of an Ethernet device. */
+
 typedef int (*eth_tx_queue_setup_t)(struct rte_eth_dev *dev,
 				    uint16_t tx_queue_id,
 				    uint16_t nb_tx_desc,
@@ -257,6 +263,12 @@ typedef int (*eth_tx_queue_setup_t)(struct rte_eth_dev *dev,
 				    const struct rte_eth_txconf *tx_conf);
 /**< @internal Setup a transmit queue of an Ethernet device. */
 
+typedef int (*eth_tx_hairpin_queue_setup_t)
+	(struct rte_eth_dev *dev, uint16_t tx_queue_id,
+	 uint16_t nb_tx_desc,
+	 const struct rte_eth_hairpin_conf *hairpin_conf);
+/**< @internal Setup a transmit hairpin queue of an Ethernet device. */
+
 typedef int (*eth_rx_enable_intr_t)(struct rte_eth_dev *dev,
 				    uint16_t rx_queue_id);
 /**< @internal Enable interrupt of a receive queue of an Ethernet device. */
@@ -505,6 +517,10 @@ typedef int (*eth_pool_ops_supported_t)(struct rte_eth_dev *dev,
 						const char *pool);
 /**< @internal Test if a port supports specific mempool ops */
 
+typedef int (*eth_hairpin_cap_get_t)(struct rte_eth_dev *dev,
+				     struct rte_eth_hairpin_cap *cap);
+/**< @internal get the hairpin capabilities. */
+
 /**
  * @internal A structure containing the functions exported by an Ethernet driver.
  */
@@ -557,6 +573,8 @@ struct eth_dev_ops {
 	eth_queue_start_t          tx_queue_start;/**< Start TX for a queue. */
 	eth_queue_stop_t           tx_queue_stop; /**< Stop TX for a queue. */
 	eth_rx_queue_setup_t       rx_queue_setup;/**< Set up device RX queue. */
+	eth_rx_hairpin_queue_setup_t rx_hairpin_queue_setup;
+	/**< Set up device RX hairpin queue. */
 	eth_queue_release_t        rx_queue_release; /**< Release RX queue. */
 	eth_rx_queue_count_t       rx_queue_count;
 	/**< Get the number of used RX descriptors. */
@@ -568,6 +586,8 @@ struct eth_dev_ops {
 	eth_rx_enable_intr_t       rx_queue_intr_enable;  /**< Enable Rx queue interrupt. */
 	eth_rx_disable_intr_t      rx_queue_intr_disable; /**< Disable Rx queue interrupt. */
 	eth_tx_queue_setup_t       tx_queue_setup;/**< Set up device TX queue. */
+	eth_tx_hairpin_queue_setup_t tx_hairpin_queue_setup;
+	/**< Set up device TX hairpin queue. */
 	eth_queue_release_t        tx_queue_release; /**< Release TX queue. */
 	eth_tx_done_cleanup_t      tx_done_cleanup;/**< Free tx ring mbufs */
 
@@ -639,6 +659,9 @@ struct eth_dev_ops {
 
 	eth_pool_ops_supported_t pool_ops_supported;
 	/**< Test if a port supports specific mempool ops */
+
+	eth_hairpin_cap_get_t hairpin_cap_get;
+	/**< Returns the hairpin capabilities. */
 };
 
 /**
@@ -746,9 +769,9 @@ struct rte_eth_dev_data {
 		dev_started : 1,   /**< Device state: STARTED(1) / STOPPED(0). */
 		lro         : 1;   /**< RX LRO is ON(1) / OFF(0) */
 	uint8_t rx_queue_state[RTE_MAX_QUEUES_PER_PORT];
-			/**< Queues state: STARTED(1) / STOPPED(0). */
+		/**< Queues state: HAIRPIN(2) / STARTED(1) / STOPPED(0). */
 	uint8_t tx_queue_state[RTE_MAX_QUEUES_PER_PORT];
-			/**< Queues state: STARTED(1) / STOPPED(0). */
+		/**< Queues state: HAIRPIN(2) / STARTED(1) / STOPPED(0). */
 	uint32_t dev_flags;             /**< Capabilities. */
 	enum rte_kernel_driver kdrv;    /**< Kernel driver passthrough. */
 	int numa_node;                  /**< NUMA node connection. */
diff --git a/lib/librte_ethdev/rte_ethdev_version.map b/lib/librte_ethdev/rte_ethdev_version.map
index 6df42a4..77b0a86 100644
--- a/lib/librte_ethdev/rte_ethdev_version.map
+++ b/lib/librte_ethdev/rte_ethdev_version.map
@@ -283,4 +283,9 @@ EXPERIMENTAL {
 
 	# added in 19.08
 	rte_eth_read_clock;
+
+	# added in 19.11
+	rte_eth_dev_hairpin_capability_get;
+	rte_eth_rx_hairpin_queue_setup;
+	rte_eth_tx_hairpin_queue_setup;
 };
-- 
1.8.3.1

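A minimal sketch of how an application could drive the new API, assuming
the device is already configured with at least two Rx/Tx queues (the
queue index below is arbitrary and error handling is trimmed):

	#include <rte_ethdev.h>

	static int
	setup_hairpin(uint16_t port_id)
	{
		struct rte_eth_hairpin_cap cap;
		/* Bind Rx queue 1 to Tx queue 1 of the same port. */
		struct rte_eth_hairpin_conf conf = {
			.peer_n = 1,
			.peers[0] = { .port = port_id, .queue = 1 },
		};
		int ret;

		ret = rte_eth_dev_hairpin_capability_get(port_id, &cap);
		if (ret != 0)
			return ret; /* -ENOTSUP: no hairpin support. */
		/* nb_desc == 0 lets the PMD pick its default ring size. */
		ret = rte_eth_rx_hairpin_queue_setup(port_id, 1, 0, &conf);
		if (ret != 0)
			return ret;
		return rte_eth_tx_hairpin_queue_setup(port_id, 1, 0, &conf);
	}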

^ permalink raw reply	[flat|nested] 186+ messages in thread

* [dpdk-dev] [PATCH v3 02/14] net/mlx5: query hca hairpin capabilities
  2019-10-15  9:04 ` [dpdk-dev] [PATCH v3 00/14] " Ori Kam
  2019-10-15  9:04   ` [dpdk-dev] [PATCH v3 01/14] ethdev: add support for hairpin queue Ori Kam
@ 2019-10-15  9:04   ` Ori Kam
  2019-10-15  9:04   ` [dpdk-dev] [PATCH v3 03/14] net/mlx5: support Rx hairpin queues Ori Kam
                     ` (11 subsequent siblings)
  13 siblings, 0 replies; 186+ messages in thread
From: Ori Kam @ 2019-10-15  9:04 UTC (permalink / raw)
  To: Matan Azrad, Shahaf Shuler, Viacheslav Ovsiienko
  Cc: dev, orika, jingjing.wu, stephen

This commit queries and stores the hairpin capabilities of the device.

Those capabilities will be used when creating the hairpin queue.

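As a sketch of how these log-scale limits could surface through the new
ethdev capability structure (the actual wiring lands later in the series;
the helper below is hypothetical):

	/* Hypothetical helper: expand the stored log2 limits into the
	 * experimental rte_eth_hairpin_cap. The 1:1 Rx/Tx binding limit
	 * is an assumption of this sketch. */
	static void
	mlx5_hairpin_cap_from_hca(const struct mlx5_hca_attr *hca_attr,
				  struct rte_eth_hairpin_cap *cap)
	{
		cap->max_n_queues = hca_attr->hairpin ?
				1 << hca_attr->log_max_hairpin_queues : 0;
		cap->max_rx_2_tx = 1;
		cap->max_tx_2_rx = 1;
		cap->max_nb_desc =
				1 << hca_attr->log_max_hairpin_num_packets;
	}
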
Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>

---
 drivers/net/mlx5/mlx5.h           | 4 ++++
 drivers/net/mlx5/mlx5_devx_cmds.c | 7 +++++++
 2 files changed, 11 insertions(+)

diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index baf945c..4d14e9e 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -184,6 +184,10 @@ struct mlx5_hca_attr {
 	uint32_t tunnel_lro_vxlan:1;
 	uint32_t lro_max_msg_sz_mode:2;
 	uint32_t lro_timer_supported_periods[MLX5_LRO_NUM_SUPP_PERIODS];
+	uint32_t hairpin:1;
+	uint32_t log_max_hairpin_queues:5;
+	uint32_t log_max_hairpin_wq_data_sz:5;
+	uint32_t log_max_hairpin_num_packets:5;
 };
 
 /* Flow list . */
diff --git a/drivers/net/mlx5/mlx5_devx_cmds.c b/drivers/net/mlx5/mlx5_devx_cmds.c
index acfe1de..b072c37 100644
--- a/drivers/net/mlx5/mlx5_devx_cmds.c
+++ b/drivers/net/mlx5/mlx5_devx_cmds.c
@@ -327,6 +327,13 @@ struct mlx5_devx_obj *
 	attr->flow_counters_dump = MLX5_GET(cmd_hca_cap, hcattr,
 					    flow_counters_dump);
 	attr->eswitch_manager = MLX5_GET(cmd_hca_cap, hcattr, eswitch_manager);
+	attr->hairpin = MLX5_GET(cmd_hca_cap, hcattr, hairpin);
+	attr->log_max_hairpin_queues = MLX5_GET(cmd_hca_cap, hcattr,
+						log_max_hairpin_queues);
+	attr->log_max_hairpin_wq_data_sz = MLX5_GET(cmd_hca_cap, hcattr,
+						    log_max_hairpin_wq_data_sz);
+	attr->log_max_hairpin_num_packets = MLX5_GET
+		(cmd_hca_cap, hcattr, log_max_hairpin_num_packets);
 	attr->eth_net_offloads = MLX5_GET(cmd_hca_cap, hcattr,
 					  eth_net_offloads);
 	attr->eth_virt = MLX5_GET(cmd_hca_cap, hcattr, eth_virt);
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 186+ messages in thread

* [dpdk-dev] [PATCH v3 03/14] net/mlx5: support Rx hairpin queues
  2019-10-15  9:04 ` [dpdk-dev] [PATCH v3 00/14] " Ori Kam
  2019-10-15  9:04   ` [dpdk-dev] [PATCH v3 01/14] ethdev: add support for hairpin queue Ori Kam
  2019-10-15  9:04   ` [dpdk-dev] [PATCH v3 02/14] net/mlx5: query hca hairpin capabilities Ori Kam
@ 2019-10-15  9:04   ` Ori Kam
  2019-10-15  9:04   ` [dpdk-dev] [PATCH v3 04/14] net/mlx5: prepare txq to work with different types Ori Kam
                     ` (10 subsequent siblings)
  13 siblings, 0 replies; 186+ messages in thread
From: Ori Kam @ 2019-10-15  9:04 UTC (permalink / raw)
  To: Matan Azrad, Shahaf Shuler, Viacheslav Ovsiienko
  Cc: dev, orika, jingjing.wu, stephen

This commit adds support for creating Rx hairpin queues.
A hairpin queue is created using DevX and is used only by the HW,
so the data-path part of the RQ is left unused.

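The hairpin configuration accepted here is deliberately narrow; a short
sketch distilling the check done in mlx5_rx_hairpin_queue_setup (the
helper name is made up for illustration):

	#include <errno.h>
	#include <rte_ethdev.h>

	/* Accept exactly one peer, on the same port, naming a valid Tx
	 * queue index; anything else is rejected with EINVAL. */
	static int
	hairpin_conf_check(const struct rte_eth_hairpin_conf *conf,
			   uint16_t port_id, uint16_t txqs_n)
	{
		if (conf->peer_n != 1 ||
		    conf->peers[0].port != port_id ||
		    conf->peers[0].queue >= txqs_n)
			return -EINVAL;
		return 0;
	}
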
Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>

---
 drivers/net/mlx5/mlx5.c         |   2 +
 drivers/net/mlx5/mlx5_rxq.c     | 270 ++++++++++++++++++++++++++++++++++++----
 drivers/net/mlx5/mlx5_rxtx.h    |  15 +++
 drivers/net/mlx5/mlx5_trigger.c |   7 ++
 4 files changed, 270 insertions(+), 24 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 34376f6..49edb7e 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -974,6 +974,7 @@ struct mlx5_dev_spawn_data {
 	.dev_supported_ptypes_get = mlx5_dev_supported_ptypes_get,
 	.vlan_filter_set = mlx5_vlan_filter_set,
 	.rx_queue_setup = mlx5_rx_queue_setup,
+	.rx_hairpin_queue_setup = mlx5_rx_hairpin_queue_setup,
 	.tx_queue_setup = mlx5_tx_queue_setup,
 	.rx_queue_release = mlx5_rx_queue_release,
 	.tx_queue_release = mlx5_tx_queue_release,
@@ -1040,6 +1041,7 @@ struct mlx5_dev_spawn_data {
 	.dev_supported_ptypes_get = mlx5_dev_supported_ptypes_get,
 	.vlan_filter_set = mlx5_vlan_filter_set,
 	.rx_queue_setup = mlx5_rx_queue_setup,
+	.rx_hairpin_queue_setup = mlx5_rx_hairpin_queue_setup,
 	.tx_queue_setup = mlx5_tx_queue_setup,
 	.rx_queue_release = mlx5_rx_queue_release,
 	.tx_queue_release = mlx5_tx_queue_release,
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index 0db065a..66596df 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -106,21 +106,25 @@
 	struct mlx5_priv *priv = dev->data->dev_private;
 	uint16_t i;
 	uint16_t n = 0;
+	uint16_t n_ibv = 0;
 
 	if (mlx5_check_mprq_support(dev) < 0)
 		return 0;
 	/* All the configured queues should be enabled. */
 	for (i = 0; i < priv->rxqs_n; ++i) {
 		struct mlx5_rxq_data *rxq = (*priv->rxqs)[i];
+		struct mlx5_rxq_ctrl *rxq_ctrl = container_of
+			(rxq, struct mlx5_rxq_ctrl, rxq);
 
-		if (!rxq)
+		if (rxq == NULL || rxq_ctrl->type != MLX5_RXQ_TYPE_STANDARD)
 			continue;
+		n_ibv++;
 		if (mlx5_rxq_mprq_enabled(rxq))
 			++n;
 	}
 	/* Multi-Packet RQ can't be partially configured. */
-	assert(n == 0 || n == priv->rxqs_n);
-	return n == priv->rxqs_n;
+	assert(n == 0 || n == n_ibv);
+	return n == n_ibv;
 }
 
 /**
@@ -427,6 +431,7 @@
 }
 
 /**
+ * Rx queue presetup checks.
  *
  * @param dev
  *   Pointer to Ethernet device structure.
@@ -434,25 +439,14 @@
  *   RX queue index.
  * @param desc
  *   Number of descriptors to configure in queue.
- * @param socket
- *   NUMA socket on which memory must be allocated.
- * @param[in] conf
- *   Thresholds parameters.
- * @param mp
- *   Memory pool for buffer allocations.
  *
  * @return
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
-int
-mlx5_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
-		    unsigned int socket, const struct rte_eth_rxconf *conf,
-		    struct rte_mempool *mp)
+static int
+mlx5_rx_queue_pre_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
-	struct mlx5_rxq_data *rxq = (*priv->rxqs)[idx];
-	struct mlx5_rxq_ctrl *rxq_ctrl =
-		container_of(rxq, struct mlx5_rxq_ctrl, rxq);
 
 	if (!rte_is_power_of_2(desc)) {
 		desc = 1 << log2above(desc);
@@ -476,6 +470,41 @@
 		return -rte_errno;
 	}
 	mlx5_rxq_release(dev, idx);
+	return 0;
+}
+
+/**
+ * DPDK callback to configure an Rx queue.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ * @param idx
+ *   RX queue index.
+ * @param desc
+ *   Number of descriptors to configure in queue.
+ * @param socket
+ *   NUMA socket on which memory must be allocated.
+ * @param[in] conf
+ *   Thresholds parameters.
+ * @param mp
+ *   Memory pool for buffer allocations.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
+		    unsigned int socket, const struct rte_eth_rxconf *conf,
+		    struct rte_mempool *mp)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_rxq_data *rxq = (*priv->rxqs)[idx];
+	struct mlx5_rxq_ctrl *rxq_ctrl =
+		container_of(rxq, struct mlx5_rxq_ctrl, rxq);
+	int res;
+
+	res = mlx5_rx_queue_pre_setup(dev, idx, desc);
+	if (res)
+		return res;
 	rxq_ctrl = mlx5_rxq_new(dev, idx, desc, socket, conf, mp);
 	if (!rxq_ctrl) {
 		DRV_LOG(ERR, "port %u unable to allocate queue index %u",
@@ -490,6 +519,56 @@
 }
 
 /**
+ * DPDK callback to configure a hairpin Rx queue.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ * @param idx
+ *   RX queue index.
+ * @param desc
+ *   Number of descriptors to configure in queue.
+ * @param hairpin_conf
+ *   Hairpin configuration parameters.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_rx_hairpin_queue_setup(struct rte_eth_dev *dev, uint16_t idx,
+			    uint16_t desc,
+			    const struct rte_eth_hairpin_conf *hairpin_conf)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_rxq_data *rxq = (*priv->rxqs)[idx];
+	struct mlx5_rxq_ctrl *rxq_ctrl =
+		container_of(rxq, struct mlx5_rxq_ctrl, rxq);
+	int res;
+
+	res = mlx5_rx_queue_pre_setup(dev, idx, desc);
+	if (res)
+		return res;
+	if (hairpin_conf->peer_n != 1 ||
+	    hairpin_conf->peers[0].port != dev->data->port_id ||
+	    hairpin_conf->peers[0].queue >= priv->txqs_n) {
+		DRV_LOG(ERR, "port %u unable to setup hairpin queue index %u "
+			" invalid hairpin configuration", dev->data->port_id,
+			idx);
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+	rxq_ctrl = mlx5_rxq_hairpin_new(dev, idx, desc, hairpin_conf);
+	if (!rxq_ctrl) {
+		DRV_LOG(ERR, "port %u unable to allocate queue index %u",
+			dev->data->port_id, idx);
+		rte_errno = ENOMEM;
+		return -rte_errno;
+	}
+	DRV_LOG(DEBUG, "port %u adding hairpin Rx queue %u to list",
+		dev->data->port_id, idx);
+	(*priv->rxqs)[idx] = &rxq_ctrl->rxq;
+	return 0;
+}
+
+/**
  * DPDK callback to release a RX queue.
  *
  * @param dpdk_rxq
@@ -561,6 +640,24 @@
 }
 
 /**
+ * Release Rx hairpin related resources.
+ *
+ * @param rxq_obj
+ *   Hairpin Rx queue object.
+ */
+static void
+rxq_obj_hairpin_release(struct mlx5_rxq_obj *rxq_obj)
+{
+	struct mlx5_devx_modify_rq_attr rq_attr = { 0 };
+
+	assert(rxq_obj);
+	rq_attr.state = MLX5_RQC_STATE_RST;
+	rq_attr.rq_state = MLX5_RQC_STATE_RDY;
+	mlx5_devx_cmd_modify_rq(rxq_obj->rq, &rq_attr);
+	claim_zero(mlx5_devx_cmd_destroy(rxq_obj->rq));
+}
+
+/**
  * Release an Rx verbs/DevX queue object.
  *
  * @param rxq_obj
@@ -577,14 +674,22 @@
 		assert(rxq_obj->wq);
 	assert(rxq_obj->cq);
 	if (rte_atomic32_dec_and_test(&rxq_obj->refcnt)) {
-		rxq_free_elts(rxq_obj->rxq_ctrl);
-		if (rxq_obj->type == MLX5_RXQ_OBJ_TYPE_IBV) {
+		switch (rxq_obj->type) {
+		case MLX5_RXQ_OBJ_TYPE_IBV:
+			rxq_free_elts(rxq_obj->rxq_ctrl);
 			claim_zero(mlx5_glue->destroy_wq(rxq_obj->wq));
-		} else if (rxq_obj->type == MLX5_RXQ_OBJ_TYPE_DEVX_RQ) {
+			claim_zero(mlx5_glue->destroy_cq(rxq_obj->cq));
+			break;
+		case MLX5_RXQ_OBJ_TYPE_DEVX_RQ:
+			rxq_free_elts(rxq_obj->rxq_ctrl);
 			claim_zero(mlx5_devx_cmd_destroy(rxq_obj->rq));
 			rxq_release_rq_resources(rxq_obj->rxq_ctrl);
+			claim_zero(mlx5_glue->destroy_cq(rxq_obj->cq));
+			break;
+		case MLX5_RXQ_OBJ_TYPE_DEVX_HAIRPIN:
+			rxq_obj_hairpin_release(rxq_obj);
+			break;
 		}
-		claim_zero(mlx5_glue->destroy_cq(rxq_obj->cq));
 		if (rxq_obj->channel)
 			claim_zero(mlx5_glue->destroy_comp_channel
 				   (rxq_obj->channel));
@@ -1132,6 +1237,70 @@
 }
 
 /**
+ * Create the Rx hairpin queue object.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param idx
+ *   Queue index in DPDK Rx queue array
+ *
+ * @return
+ *   The hairpin DevX object initialised, NULL otherwise and rte_errno is set.
+ */
+static struct mlx5_rxq_obj *
+mlx5_rxq_obj_hairpin_new(struct rte_eth_dev *dev, uint16_t idx)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_rxq_data *rxq_data = (*priv->rxqs)[idx];
+	struct mlx5_rxq_ctrl *rxq_ctrl =
+		container_of(rxq_data, struct mlx5_rxq_ctrl, rxq);
+	struct mlx5_devx_create_rq_attr attr = { 0 };
+	struct mlx5_rxq_obj *tmpl = NULL;
+	int ret = 0;
+
+	assert(rxq_data);
+	assert(!rxq_ctrl->obj);
+	tmpl = rte_calloc_socket(__func__, 1, sizeof(*tmpl), 0,
+				 rxq_ctrl->socket);
+	if (!tmpl) {
+		DRV_LOG(ERR,
+			"port %u Rx hairpin queue %u cannot allocate resources",
+			dev->data->port_id, rxq_data->idx);
+		rte_errno = ENOMEM;
+		goto error;
+	}
+	tmpl->type = MLX5_RXQ_OBJ_TYPE_DEVX_HAIRPIN;
+	tmpl->rxq_ctrl = rxq_ctrl;
+	attr.hairpin = 1;
+	/* Workaround for hairpin startup */
+	attr.wq_attr.log_hairpin_num_packets = log2above(32);
+	/* Workaround for packets larger than 1KB */
+	attr.wq_attr.log_hairpin_data_sz =
+			priv->config.hca_attr.log_max_hairpin_wq_data_sz;
+	tmpl->rq = mlx5_devx_cmd_create_rq(priv->sh->ctx, &attr,
+					   rxq_ctrl->socket);
+	if (!tmpl->rq) {
+		DRV_LOG(ERR,
+			"port %u Rx hairpin queue %u can't create rq object",
+			dev->data->port_id, idx);
+		rte_errno = errno;
+		goto error;
+	}
+	DRV_LOG(DEBUG, "port %u rxq %u updated with %p", dev->data->port_id,
+		idx, (void *)&tmpl);
+	rte_atomic32_inc(&tmpl->refcnt);
+	LIST_INSERT_HEAD(&priv->rxqsobj, tmpl, next);
+	priv->verbs_alloc_ctx.type = MLX5_VERBS_ALLOC_TYPE_NONE;
+	return tmpl;
+error:
+	ret = rte_errno; /* Save rte_errno before cleanup. */
+	if (tmpl && tmpl->rq)
+		mlx5_devx_cmd_destroy(tmpl->rq);
+	rte_errno = ret; /* Restore rte_errno. */
+	return NULL;
+}
+
+/**
  * Create the Rx queue Verbs/DevX object.
  *
  * @param dev
@@ -1163,6 +1332,8 @@ struct mlx5_rxq_obj *
 
 	assert(rxq_data);
 	assert(!rxq_ctrl->obj);
+	if (type == MLX5_RXQ_OBJ_TYPE_DEVX_HAIRPIN)
+		return mlx5_rxq_obj_hairpin_new(dev, idx);
 	priv->verbs_alloc_ctx.type = MLX5_VERBS_ALLOC_TYPE_RX_QUEUE;
 	priv->verbs_alloc_ctx.obj = rxq_ctrl;
 	tmpl = rte_calloc_socket(__func__, 1, sizeof(*tmpl), 0,
@@ -1433,15 +1604,19 @@ struct mlx5_rxq_obj *
 	unsigned int strd_num_n = 0;
 	unsigned int strd_sz_n = 0;
 	unsigned int i;
+	unsigned int n_ibv = 0;
 
 	if (!mlx5_mprq_enabled(dev))
 		return 0;
 	/* Count the total number of descriptors configured. */
 	for (i = 0; i != priv->rxqs_n; ++i) {
 		struct mlx5_rxq_data *rxq = (*priv->rxqs)[i];
+		struct mlx5_rxq_ctrl *rxq_ctrl = container_of
+			(rxq, struct mlx5_rxq_ctrl, rxq);
 
-		if (rxq == NULL)
+		if (rxq == NULL || rxq_ctrl->type != MLX5_RXQ_TYPE_STANDARD)
 			continue;
+		n_ibv++;
 		desc += 1 << rxq->elts_n;
 		/* Get the max number of strides. */
 		if (strd_num_n < rxq->strd_num_n)
@@ -1466,7 +1641,7 @@ struct mlx5_rxq_obj *
 	 * this Mempool gets available again.
 	 */
 	desc *= 4;
-	obj_num = desc + MLX5_MPRQ_MP_CACHE_SZ * priv->rxqs_n;
+	obj_num = desc + MLX5_MPRQ_MP_CACHE_SZ * n_ibv;
 	/*
 	 * rte_mempool_create_empty() has sanity check to refuse large cache
 	 * size compared to the number of elements.
@@ -1514,8 +1689,10 @@ struct mlx5_rxq_obj *
 	/* Set mempool for each Rx queue. */
 	for (i = 0; i != priv->rxqs_n; ++i) {
 		struct mlx5_rxq_data *rxq = (*priv->rxqs)[i];
+		struct mlx5_rxq_ctrl *rxq_ctrl = container_of
+			(rxq, struct mlx5_rxq_ctrl, rxq);
 
-		if (rxq == NULL)
+		if (rxq == NULL || rxq_ctrl->type != MLX5_RXQ_TYPE_STANDARD)
 			continue;
 		rxq->mprq_mp = mp;
 	}
@@ -1620,6 +1797,7 @@ struct mlx5_rxq_ctrl *
 		rte_errno = ENOMEM;
 		return NULL;
 	}
+	tmpl->type = MLX5_RXQ_TYPE_STANDARD;
 	if (mlx5_mr_btree_init(&tmpl->rxq.mr_ctrl.cache_bh,
 			       MLX5_MR_BTREE_CACHE_N, socket)) {
 		/* rte_errno is already set. */
@@ -1788,6 +1966,49 @@ struct mlx5_rxq_ctrl *
 }
 
 /**
+ * Create a DPDK Rx hairpin queue.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param idx
+ *   RX queue index.
+ * @param desc
+ *   Number of descriptors to configure in queue.
+ * @param hairpin_conf
+ *   The hairpin binding configuration.
+ *
+ * @return
+ *   A DPDK queue object on success, NULL otherwise and rte_errno is set.
+ */
+struct mlx5_rxq_ctrl *
+mlx5_rxq_hairpin_new(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
+		     const struct rte_eth_hairpin_conf *hairpin_conf)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_rxq_ctrl *tmpl;
+
+	tmpl = rte_calloc_socket("RXQ", 1, sizeof(*tmpl), 0, SOCKET_ID_ANY);
+	if (!tmpl) {
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+	tmpl->type = MLX5_RXQ_TYPE_HAIRPIN;
+	tmpl->socket = SOCKET_ID_ANY;
+	tmpl->rxq.rss_hash = 0;
+	tmpl->rxq.port_id = dev->data->port_id;
+	tmpl->priv = priv;
+	tmpl->rxq.mp = NULL;
+	tmpl->rxq.elts_n = log2above(desc);
+	tmpl->rxq.elts = NULL;
+	tmpl->rxq.mr_ctrl.cache_bh = (struct mlx5_mr_btree) { 0 };
+	tmpl->hairpin_conf = *hairpin_conf;
+	tmpl->rxq.idx = idx;
+	rte_atomic32_inc(&tmpl->refcnt);
+	LIST_INSERT_HEAD(&priv->rxqsctrl, tmpl, next);
+	return tmpl;
+}
+
+/**
  * Get a Rx queue.
  *
  * @param dev
@@ -1841,7 +2062,8 @@ struct mlx5_rxq_ctrl *
 		if (rxq_ctrl->dbr_umem_id_valid)
 			claim_zero(mlx5_release_dbr(dev, rxq_ctrl->dbr_umem_id,
 						    rxq_ctrl->dbr_offset));
-		mlx5_mr_btree_free(&rxq_ctrl->rxq.mr_ctrl.cache_bh);
+		if (rxq_ctrl->type == MLX5_RXQ_TYPE_STANDARD)
+			mlx5_mr_btree_free(&rxq_ctrl->rxq.mr_ctrl.cache_bh);
 		LIST_REMOVE(rxq_ctrl, next);
 		rte_free(rxq_ctrl);
 		(*priv->rxqs)[idx] = NULL;
diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index 4bb28a4..13fdc38 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -159,6 +159,13 @@ struct mlx5_rxq_data {
 enum mlx5_rxq_obj_type {
 	MLX5_RXQ_OBJ_TYPE_IBV,		/* mlx5_rxq_obj with ibv_wq. */
 	MLX5_RXQ_OBJ_TYPE_DEVX_RQ,	/* mlx5_rxq_obj with mlx5_devx_rq. */
+	MLX5_RXQ_OBJ_TYPE_DEVX_HAIRPIN,
+	/* mlx5_rxq_obj with mlx5_devx_rq and hairpin support. */
+};
+
+enum mlx5_rxq_type {
+	MLX5_RXQ_TYPE_STANDARD, /* Standard Rx queue. */
+	MLX5_RXQ_TYPE_HAIRPIN, /* Hairpin Rx queue. */
 };
 
 /* Verbs/DevX Rx queue elements. */
@@ -183,6 +190,7 @@ struct mlx5_rxq_ctrl {
 	rte_atomic32_t refcnt; /* Reference counter. */
 	struct mlx5_rxq_obj *obj; /* Verbs/DevX elements. */
 	struct mlx5_priv *priv; /* Back pointer to private data. */
+	enum mlx5_rxq_type type; /* Rxq type. */
 	unsigned int socket; /* CPU socket ID for allocations. */
 	unsigned int irq:1; /* Whether IRQ is enabled. */
 	unsigned int dbr_umem_id_valid:1; /* dbr_umem_id holds a valid value. */
@@ -193,6 +201,7 @@ struct mlx5_rxq_ctrl {
 	uint32_t dbr_umem_id; /* Storing door-bell information, */
 	uint64_t dbr_offset;  /* needed when freeing door-bell. */
 	struct mlx5dv_devx_umem *wq_umem; /* WQ buffer registration info. */
+	struct rte_eth_hairpin_conf hairpin_conf; /* Hairpin configuration. */
 };
 
 enum mlx5_ind_tbl_type {
@@ -339,6 +348,9 @@ struct mlx5_txq_ctrl {
 int mlx5_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 			unsigned int socket, const struct rte_eth_rxconf *conf,
 			struct rte_mempool *mp);
+int mlx5_rx_hairpin_queue_setup
+	(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
+	 const struct rte_eth_hairpin_conf *hairpin_conf);
 void mlx5_rx_queue_release(void *dpdk_rxq);
 int mlx5_rx_intr_vec_enable(struct rte_eth_dev *dev);
 void mlx5_rx_intr_vec_disable(struct rte_eth_dev *dev);
@@ -351,6 +363,9 @@ struct mlx5_rxq_ctrl *mlx5_rxq_new(struct rte_eth_dev *dev, uint16_t idx,
 				   uint16_t desc, unsigned int socket,
 				   const struct rte_eth_rxconf *conf,
 				   struct rte_mempool *mp);
+struct mlx5_rxq_ctrl *mlx5_rxq_hairpin_new
+	(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
+	 const struct rte_eth_hairpin_conf *hairpin_conf);
 struct mlx5_rxq_ctrl *mlx5_rxq_get(struct rte_eth_dev *dev, uint16_t idx);
 int mlx5_rxq_release(struct rte_eth_dev *dev, uint16_t idx);
 int mlx5_rxq_verify(struct rte_eth_dev *dev);
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index 122f31c..cb31ae2 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -118,6 +118,13 @@
 
 		if (!rxq_ctrl)
 			continue;
+		if (rxq_ctrl->type == MLX5_RXQ_TYPE_HAIRPIN) {
+			rxq_ctrl->obj = mlx5_rxq_obj_new
+				(dev, i, MLX5_RXQ_OBJ_TYPE_DEVX_HAIRPIN);
+			if (!rxq_ctrl->obj)
+				goto error;
+			continue;
+		}
 		/* Pre-register Rx mempool. */
 		mp = mlx5_rxq_mprq_enabled(&rxq_ctrl->rxq) ?
 		     rxq_ctrl->rxq.mprq_mp : rxq_ctrl->rxq.mp;
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 186+ messages in thread

* [dpdk-dev] [PATCH v3 04/14] net/mlx5: prepare txq to work with different types
  2019-10-15  9:04 ` [dpdk-dev] [PATCH v3 00/14] " Ori Kam
                     ` (2 preceding siblings ...)
  2019-10-15  9:04   ` [dpdk-dev] [PATCH v3 03/14] net/mlx5: support Rx hairpin queues Ori Kam
@ 2019-10-15  9:04   ` Ori Kam
  2019-10-15  9:04   ` [dpdk-dev] [PATCH v3 05/14] net/mlx5: support Tx hairpin queues Ori Kam
                     ` (9 subsequent siblings)
  13 siblings, 0 replies; 186+ messages in thread
From: Ori Kam @ 2019-10-15  9:04 UTC (permalink / raw)
  To: Matan Azrad, Shahaf Shuler, Viacheslav Ovsiienko
  Cc: dev, orika, jingjing.wu, stephen

Currently all Tx queues are created using Verbs.
This commit modifies the naming so it no longer refers to Verbs,
since the next commit introduces a new queue type (hairpin).

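A sketch of where the rename is heading: once the object carries a type
tag, teardown can dispatch per type (the DevX hairpin branch shown here
is what the next commit adds; the function name is hypothetical):

	static void
	txq_obj_free(struct mlx5_txq_obj *txq_obj)
	{
		switch (txq_obj->type) {
		case MLX5_TXQ_OBJ_TYPE_IBV:
			/* Verbs queues own a QP/CQ pair. */
			claim_zero(mlx5_glue->destroy_qp(txq_obj->qp));
			claim_zero(mlx5_glue->destroy_cq(txq_obj->cq));
			break;
		case MLX5_TXQ_OBJ_TYPE_DEVX_HAIRPIN:
			/* Hairpin queues own a single DevX SQ object. */
			claim_zero(mlx5_devx_cmd_destroy(txq_obj->sq));
			break;
		}
	}
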
Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>

---
 drivers/net/mlx5/mlx5.c         |  2 +-
 drivers/net/mlx5/mlx5.h         |  2 +-
 drivers/net/mlx5/mlx5_rxtx.c    |  2 +-
 drivers/net/mlx5/mlx5_rxtx.h    | 39 +++++++++++++++++------
 drivers/net/mlx5/mlx5_trigger.c |  4 +--
 drivers/net/mlx5/mlx5_txq.c     | 70 ++++++++++++++++++++---------------------
 6 files changed, 69 insertions(+), 50 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 49edb7e..2431a55 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -911,7 +911,7 @@ struct mlx5_dev_spawn_data {
 	if (ret)
 		DRV_LOG(WARNING, "port %u some Rx queues still remain",
 			dev->data->port_id);
-	ret = mlx5_txq_ibv_verify(dev);
+	ret = mlx5_txq_obj_verify(dev);
 	if (ret)
 		DRV_LOG(WARNING, "port %u some Verbs Tx queue still remain",
 			dev->data->port_id);
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 4d14e9e..36cced9 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -645,7 +645,7 @@ struct mlx5_priv {
 	LIST_HEAD(rxqobj, mlx5_rxq_obj) rxqsobj; /* Verbs/DevX Rx queues. */
 	LIST_HEAD(hrxq, mlx5_hrxq) hrxqs; /* Verbs Hash Rx queues. */
 	LIST_HEAD(txq, mlx5_txq_ctrl) txqsctrl; /* DPDK Tx queues. */
-	LIST_HEAD(txqibv, mlx5_txq_ibv) txqsibv; /* Verbs Tx queues. */
+	LIST_HEAD(txqobj, mlx5_txq_obj) txqsobj; /* Verbs/DevX Tx queues. */
 	/* Indirection tables. */
 	LIST_HEAD(ind_tables, mlx5_ind_table_obj) ind_tbls;
 	/* Pointer to next element. */
diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index 10d0ca1..f23708c 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -863,7 +863,7 @@ enum mlx5_txcmp_code {
 			.qp_state = IBV_QPS_RESET,
 			.port_num = (uint8_t)priv->ibv_port,
 		};
-		struct ibv_qp *qp = txq_ctrl->ibv->qp;
+		struct ibv_qp *qp = txq_ctrl->obj->qp;
 
 		ret = mlx5_glue->modify_qp(qp, &mod, IBV_QP_STATE);
 		if (ret) {
diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index 13fdc38..12f9bfb 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -308,13 +308,31 @@ struct mlx5_txq_data {
 	/* Storage for queued packets, must be the last field. */
 } __rte_cache_aligned;
 
-/* Verbs Rx queue elements. */
-struct mlx5_txq_ibv {
-	LIST_ENTRY(mlx5_txq_ibv) next; /* Pointer to the next element. */
+enum mlx5_txq_obj_type {
+	MLX5_TXQ_OBJ_TYPE_IBV,		/* mlx5_txq_obj with ibv_wq. */
+	MLX5_TXQ_OBJ_TYPE_DEVX_HAIRPIN,
+	/* mlx5_txq_obj with mlx5_devx_tq and hairpin support. */
+};
+
+enum mlx5_txq_type {
+	MLX5_TXQ_TYPE_STANDARD, /* Standard Tx queue. */
+	MLX5_TXQ_TYPE_HAIRPIN, /* Hairpin Tx queue. */
+};
+
+/* Verbs/DevX Tx queue elements. */
+struct mlx5_txq_obj {
+	LIST_ENTRY(mlx5_txq_obj) next; /* Pointer to the next element. */
 	rte_atomic32_t refcnt; /* Reference counter. */
 	struct mlx5_txq_ctrl *txq_ctrl; /* Pointer to the control queue. */
-	struct ibv_cq *cq; /* Completion Queue. */
-	struct ibv_qp *qp; /* Queue Pair. */
+	enum mlx5_rxq_obj_type type; /* The txq object type. */
+	RTE_STD_C11
+	union {
+		struct {
+			struct ibv_cq *cq; /* Completion Queue. */
+			struct ibv_qp *qp; /* Queue Pair. */
+		};
+		struct mlx5_devx_obj *sq; /* DevX object for Tx queue. */
+	};
 };
 
 /* TX queue control descriptor. */
@@ -322,9 +340,10 @@ struct mlx5_txq_ctrl {
 	LIST_ENTRY(mlx5_txq_ctrl) next; /* Pointer to the next element. */
 	rte_atomic32_t refcnt; /* Reference counter. */
 	unsigned int socket; /* CPU socket ID for allocations. */
+	enum mlx5_txq_type type; /* The txq ctrl type. */
 	unsigned int max_inline_data; /* Max inline data. */
 	unsigned int max_tso_header; /* Max TSO header size. */
-	struct mlx5_txq_ibv *ibv; /* Verbs queue object. */
+	struct mlx5_txq_obj *obj; /* Verbs/DevX queue object. */
 	struct mlx5_priv *priv; /* Back pointer to private data. */
 	off_t uar_mmap_offset; /* UAR mmap offset for non-primary process. */
 	void *bf_reg; /* BlueFlame register from Verbs. */
@@ -393,10 +412,10 @@ int mlx5_tx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 			unsigned int socket, const struct rte_eth_txconf *conf);
 void mlx5_tx_queue_release(void *dpdk_txq);
 int mlx5_tx_uar_init_secondary(struct rte_eth_dev *dev, int fd);
-struct mlx5_txq_ibv *mlx5_txq_ibv_new(struct rte_eth_dev *dev, uint16_t idx);
-struct mlx5_txq_ibv *mlx5_txq_ibv_get(struct rte_eth_dev *dev, uint16_t idx);
-int mlx5_txq_ibv_release(struct mlx5_txq_ibv *txq_ibv);
-int mlx5_txq_ibv_verify(struct rte_eth_dev *dev);
+struct mlx5_txq_obj *mlx5_txq_obj_new(struct rte_eth_dev *dev, uint16_t idx);
+struct mlx5_txq_obj *mlx5_txq_obj_get(struct rte_eth_dev *dev, uint16_t idx);
+int mlx5_txq_obj_release(struct mlx5_txq_obj *txq_ibv);
+int mlx5_txq_obj_verify(struct rte_eth_dev *dev);
 struct mlx5_txq_ctrl *mlx5_txq_new(struct rte_eth_dev *dev, uint16_t idx,
 				   uint16_t desc, unsigned int socket,
 				   const struct rte_eth_txconf *conf);
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index cb31ae2..50c4df5 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -52,8 +52,8 @@
 		if (!txq_ctrl)
 			continue;
 		txq_alloc_elts(txq_ctrl);
-		txq_ctrl->ibv = mlx5_txq_ibv_new(dev, i);
-		if (!txq_ctrl->ibv) {
+		txq_ctrl->obj = mlx5_txq_obj_new(dev, i);
+		if (!txq_ctrl->obj) {
 			rte_errno = ENOMEM;
 			goto error;
 		}
diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
index 53d45e7..a6e2563 100644
--- a/drivers/net/mlx5/mlx5_txq.c
+++ b/drivers/net/mlx5/mlx5_txq.c
@@ -375,15 +375,15 @@
  * @return
  *   The Verbs object initialised, NULL otherwise and rte_errno is set.
  */
-struct mlx5_txq_ibv *
-mlx5_txq_ibv_new(struct rte_eth_dev *dev, uint16_t idx)
+struct mlx5_txq_obj *
+mlx5_txq_obj_new(struct rte_eth_dev *dev, uint16_t idx)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_txq_data *txq_data = (*priv->txqs)[idx];
 	struct mlx5_txq_ctrl *txq_ctrl =
 		container_of(txq_data, struct mlx5_txq_ctrl, txq);
-	struct mlx5_txq_ibv tmpl;
-	struct mlx5_txq_ibv *txq_ibv = NULL;
+	struct mlx5_txq_obj tmpl;
+	struct mlx5_txq_obj *txq_obj = NULL;
 	union {
 		struct ibv_qp_init_attr_ex init;
 		struct ibv_cq_init_attr_ex cq;
@@ -411,7 +411,7 @@ struct mlx5_txq_ibv *
 		rte_errno = EINVAL;
 		return NULL;
 	}
-	memset(&tmpl, 0, sizeof(struct mlx5_txq_ibv));
+	memset(&tmpl, 0, sizeof(struct mlx5_txq_obj));
 	attr.cq = (struct ibv_cq_init_attr_ex){
 		.comp_mask = 0,
 	};
@@ -502,9 +502,9 @@ struct mlx5_txq_ibv *
 		rte_errno = errno;
 		goto error;
 	}
-	txq_ibv = rte_calloc_socket(__func__, 1, sizeof(struct mlx5_txq_ibv), 0,
+	txq_obj = rte_calloc_socket(__func__, 1, sizeof(struct mlx5_txq_obj), 0,
 				    txq_ctrl->socket);
-	if (!txq_ibv) {
+	if (!txq_obj) {
 		DRV_LOG(ERR, "port %u Tx queue %u cannot allocate memory",
 			dev->data->port_id, idx);
 		rte_errno = ENOMEM;
@@ -568,9 +568,9 @@ struct mlx5_txq_ibv *
 		}
 	}
 #endif
-	txq_ibv->qp = tmpl.qp;
-	txq_ibv->cq = tmpl.cq;
-	rte_atomic32_inc(&txq_ibv->refcnt);
+	txq_obj->qp = tmpl.qp;
+	txq_obj->cq = tmpl.cq;
+	rte_atomic32_inc(&txq_obj->refcnt);
 	txq_ctrl->bf_reg = qp.bf.reg;
 	if (qp.comp_mask & MLX5DV_QP_MASK_UAR_MMAP_OFFSET) {
 		txq_ctrl->uar_mmap_offset = qp.uar_mmap_offset;
@@ -585,18 +585,18 @@ struct mlx5_txq_ibv *
 		goto error;
 	}
 	txq_uar_init(txq_ctrl);
-	LIST_INSERT_HEAD(&priv->txqsibv, txq_ibv, next);
-	txq_ibv->txq_ctrl = txq_ctrl;
+	LIST_INSERT_HEAD(&priv->txqsobj, txq_obj, next);
+	txq_obj->txq_ctrl = txq_ctrl;
 	priv->verbs_alloc_ctx.type = MLX5_VERBS_ALLOC_TYPE_NONE;
-	return txq_ibv;
+	return txq_obj;
 error:
 	ret = rte_errno; /* Save rte_errno before cleanup. */
 	if (tmpl.cq)
 		claim_zero(mlx5_glue->destroy_cq(tmpl.cq));
 	if (tmpl.qp)
 		claim_zero(mlx5_glue->destroy_qp(tmpl.qp));
-	if (txq_ibv)
-		rte_free(txq_ibv);
+	if (txq_obj)
+		rte_free(txq_obj);
 	priv->verbs_alloc_ctx.type = MLX5_VERBS_ALLOC_TYPE_NONE;
 	rte_errno = ret; /* Restore rte_errno. */
 	return NULL;
@@ -613,8 +613,8 @@ struct mlx5_txq_ibv *
  * @return
  *   The Verbs object if it exists.
  */
-struct mlx5_txq_ibv *
-mlx5_txq_ibv_get(struct rte_eth_dev *dev, uint16_t idx)
+struct mlx5_txq_obj *
+mlx5_txq_obj_get(struct rte_eth_dev *dev, uint16_t idx)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_txq_ctrl *txq_ctrl;
@@ -624,29 +624,29 @@ struct mlx5_txq_ibv *
 	if (!(*priv->txqs)[idx])
 		return NULL;
 	txq_ctrl = container_of((*priv->txqs)[idx], struct mlx5_txq_ctrl, txq);
-	if (txq_ctrl->ibv)
-		rte_atomic32_inc(&txq_ctrl->ibv->refcnt);
-	return txq_ctrl->ibv;
+	if (txq_ctrl->obj)
+		rte_atomic32_inc(&txq_ctrl->obj->refcnt);
+	return txq_ctrl->obj;
 }
 
 /**
 * Release a Tx verbs queue object.
  *
- * @param txq_ibv
+ * @param txq_obj
  *   Verbs Tx queue object.
  *
  * @return
  *   1 while a reference on it exists, 0 when freed.
  */
 int
-mlx5_txq_ibv_release(struct mlx5_txq_ibv *txq_ibv)
+mlx5_txq_obj_release(struct mlx5_txq_obj *txq_obj)
 {
-	assert(txq_ibv);
-	if (rte_atomic32_dec_and_test(&txq_ibv->refcnt)) {
-		claim_zero(mlx5_glue->destroy_qp(txq_ibv->qp));
-		claim_zero(mlx5_glue->destroy_cq(txq_ibv->cq));
-		LIST_REMOVE(txq_ibv, next);
-		rte_free(txq_ibv);
+	assert(txq_obj);
+	if (rte_atomic32_dec_and_test(&txq_obj->refcnt)) {
+		claim_zero(mlx5_glue->destroy_qp(txq_obj->qp));
+		claim_zero(mlx5_glue->destroy_cq(txq_obj->cq));
+		LIST_REMOVE(txq_obj, next);
+		rte_free(txq_obj);
 		return 0;
 	}
 	return 1;
@@ -662,15 +662,15 @@ struct mlx5_txq_ibv *
  *   The number of object not released.
  */
 int
-mlx5_txq_ibv_verify(struct rte_eth_dev *dev)
+mlx5_txq_obj_verify(struct rte_eth_dev *dev)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	int ret = 0;
-	struct mlx5_txq_ibv *txq_ibv;
+	struct mlx5_txq_obj *txq_obj;
 
-	LIST_FOREACH(txq_ibv, &priv->txqsibv, next) {
+	LIST_FOREACH(txq_obj, &priv->txqsobj, next) {
 		DRV_LOG(DEBUG, "port %u Verbs Tx queue %u still referenced",
-			dev->data->port_id, txq_ibv->txq_ctrl->txq.idx);
+			dev->data->port_id, txq_obj->txq_ctrl->txq.idx);
 		++ret;
 	}
 	return ret;
@@ -1127,7 +1127,7 @@ struct mlx5_txq_ctrl *
 	if ((*priv->txqs)[idx]) {
 		ctrl = container_of((*priv->txqs)[idx], struct mlx5_txq_ctrl,
 				    txq);
-		mlx5_txq_ibv_get(dev, idx);
+		mlx5_txq_obj_get(dev, idx);
 		rte_atomic32_inc(&ctrl->refcnt);
 	}
 	return ctrl;
@@ -1153,8 +1153,8 @@ struct mlx5_txq_ctrl *
 	if (!(*priv->txqs)[idx])
 		return 0;
 	txq = container_of((*priv->txqs)[idx], struct mlx5_txq_ctrl, txq);
-	if (txq->ibv && !mlx5_txq_ibv_release(txq->ibv))
-		txq->ibv = NULL;
+	if (txq->obj && !mlx5_txq_obj_release(txq->obj))
+		txq->obj = NULL;
 	if (rte_atomic32_dec_and_test(&txq->refcnt)) {
 		txq_free_elts(txq);
 		mlx5_mr_btree_free(&txq->txq.mr_ctrl.cache_bh);
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 186+ messages in thread

* [dpdk-dev] [PATCH v3 05/14] net/mlx5: support Tx hairpin queues
  2019-10-15  9:04 ` [dpdk-dev] [PATCH v3 00/14] " Ori Kam
                     ` (3 preceding siblings ...)
  2019-10-15  9:04   ` [dpdk-dev] [PATCH v3 04/14] net/mlx5: prepare txq to work with different types Ori Kam
@ 2019-10-15  9:04   ` Ori Kam
  2019-10-15  9:04   ` [dpdk-dev] [PATCH v3 06/14] net/mlx5: add get hairpin capabilities Ori Kam
                     ` (8 subsequent siblings)
  13 siblings, 0 replies; 186+ messages in thread
From: Ori Kam @ 2019-10-15  9:04 UTC (permalink / raw)
  To: Matan Azrad, Shahaf Shuler, Viacheslav Ovsiienko
  Cc: dev, orika, jingjing.wu, stephen

This commit adds support for creating Tx hairpin queues.
A hairpin queue is created using DevX and is used only by the HW.

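A sketch of the binding step a later commit in the series performs with
these commands: move a freshly created hairpin SQ from RST to RDY while
pointing it at its peer RQ (the MLX5_SQC_STATE_* values and the
hca_attr.vhca_id field are assumed from the PRM definitions added around
this patch; the helper name is hypothetical):

	static int
	hairpin_sq_bind(struct mlx5_priv *priv, struct mlx5_devx_obj *sq,
			uint32_t peer_rq_id)
	{
		struct mlx5_devx_modify_sq_attr sq_attr = { 0 };

		sq_attr.sq_state = MLX5_SQC_STATE_RST;
		sq_attr.state = MLX5_SQC_STATE_RDY;
		sq_attr.hairpin_peer_rq = peer_rq_id;
		sq_attr.hairpin_peer_vhca =
				priv->config.hca_attr.vhca_id;
		return mlx5_devx_cmd_modify_sq(sq, &sq_attr);
	}
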
Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>

---
 drivers/net/mlx5/mlx5.c           |  36 +++++-
 drivers/net/mlx5/mlx5.h           |  46 ++++++++
 drivers/net/mlx5/mlx5_devx_cmds.c | 186 ++++++++++++++++++++++++++++++
 drivers/net/mlx5/mlx5_prm.h       | 118 +++++++++++++++++++
 drivers/net/mlx5/mlx5_rxtx.h      |  18 ++-
 drivers/net/mlx5/mlx5_trigger.c   |  10 +-
 drivers/net/mlx5/mlx5_txq.c       | 230 +++++++++++++++++++++++++++++++++++---
 7 files changed, 620 insertions(+), 24 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 2431a55..c53a9c6 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -325,6 +325,9 @@ struct mlx5_dev_spawn_data {
 	struct mlx5_ibv_shared *sh;
 	int err = 0;
 	uint32_t i;
+#ifdef HAVE_IBV_FLOW_DV_SUPPORT
+	struct mlx5_devx_tis_attr tis_attr = { 0 };
+#endif
 
 	assert(spawn);
 	/* Secondary process should not create the shared context. */
@@ -389,10 +392,25 @@ struct mlx5_dev_spawn_data {
 		goto error;
 	}
 #ifdef HAVE_IBV_FLOW_DV_SUPPORT
-	err = mlx5_get_pdn(sh->pd, &sh->pdn);
-	if (err) {
-		DRV_LOG(ERR, "Fail to extract pdn from PD");
-		goto error;
+	if (sh->devx) {
+		err = mlx5_get_pdn(sh->pd, &sh->pdn);
+		if (err) {
+			DRV_LOG(ERR, "Fail to extract pdn from PD");
+			goto error;
+		}
+		sh->td = mlx5_devx_cmd_create_td(sh->ctx);
+		if (!sh->td) {
+			DRV_LOG(ERR, "TD allocation failure");
+			err = ENOMEM;
+			goto error;
+		}
+		tis_attr.transport_domain = sh->td->id;
+		sh->tis = mlx5_devx_cmd_create_tis(sh->ctx, &tis_attr);
+		if (!sh->tis) {
+			DRV_LOG(ERR, "TIS allocation failure");
+			err = ENOMEM;
+			goto error;
+		}
 	}
 #endif /* HAVE_IBV_FLOW_DV_SUPPORT */
 	/*
@@ -425,6 +443,10 @@ struct mlx5_dev_spawn_data {
 error:
 	pthread_mutex_unlock(&mlx5_ibv_list_mutex);
 	assert(sh);
+	if (sh->tis)
+		claim_zero(mlx5_devx_cmd_destroy(sh->tis));
+	if (sh->td)
+		claim_zero(mlx5_devx_cmd_destroy(sh->td));
 	if (sh->pd)
 		claim_zero(mlx5_glue->dealloc_pd(sh->pd));
 	if (sh->ctx)
@@ -485,6 +507,10 @@ struct mlx5_dev_spawn_data {
 	pthread_mutex_destroy(&sh->intr_mutex);
 	if (sh->pd)
 		claim_zero(mlx5_glue->dealloc_pd(sh->pd));
+	if (sh->tis)
+		claim_zero(mlx5_devx_cmd_destroy(sh->tis));
+	if (sh->td)
+		claim_zero(mlx5_devx_cmd_destroy(sh->td));
 	if (sh->ctx)
 		claim_zero(mlx5_glue->close_device(sh->ctx));
 	rte_free(sh);
@@ -976,6 +1002,7 @@ struct mlx5_dev_spawn_data {
 	.rx_queue_setup = mlx5_rx_queue_setup,
 	.rx_hairpin_queue_setup = mlx5_rx_hairpin_queue_setup,
 	.tx_queue_setup = mlx5_tx_queue_setup,
+	.tx_hairpin_queue_setup = mlx5_tx_hairpin_queue_setup,
 	.rx_queue_release = mlx5_rx_queue_release,
 	.tx_queue_release = mlx5_tx_queue_release,
 	.flow_ctrl_get = mlx5_dev_get_flow_ctrl,
@@ -1043,6 +1070,7 @@ struct mlx5_dev_spawn_data {
 	.rx_queue_setup = mlx5_rx_queue_setup,
 	.rx_hairpin_queue_setup = mlx5_rx_hairpin_queue_setup,
 	.tx_queue_setup = mlx5_tx_queue_setup,
+	.tx_hairpin_queue_setup = mlx5_tx_hairpin_queue_setup,
 	.rx_queue_release = mlx5_rx_queue_release,
 	.tx_queue_release = mlx5_tx_queue_release,
 	.flow_ctrl_get = mlx5_dev_get_flow_ctrl,
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 36cced9..7ea4950 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -350,6 +350,43 @@ struct mlx5_devx_rqt_attr {
 	uint32_t rq_list[];
 };
 
+/* TIS attributes structure. */
+struct mlx5_devx_tis_attr {
+	uint32_t strict_lag_tx_port_affinity:1;
+	uint32_t tls_en:1;
+	uint32_t lag_tx_port_affinity:4;
+	uint32_t prio:4;
+	uint32_t transport_domain:24;
+};
+
+/* SQ attributes structure, used by SQ create operation. */
+struct mlx5_devx_create_sq_attr {
+	uint32_t rlky:1;
+	uint32_t cd_master:1;
+	uint32_t fre:1;
+	uint32_t flush_in_error_en:1;
+	uint32_t allow_multi_pkt_send_wqe:1;
+	uint32_t min_wqe_inline_mode:3;
+	uint32_t state:4;
+	uint32_t reg_umr:1;
+	uint32_t allow_swp:1;
+	uint32_t hairpin:1;
+	uint32_t user_index:24;
+	uint32_t cqn:24;
+	uint32_t packet_pacing_rate_limit_index:16;
+	uint32_t tis_lst_sz:16;
+	uint32_t tis_num:24;
+	struct mlx5_devx_wq_attr wq_attr;
+};
+
+/* SQ attributes structure, used by SQ modify operation. */
+struct mlx5_devx_modify_sq_attr {
+	uint32_t sq_state:4;
+	uint32_t state:4;
+	uint32_t hairpin_peer_rq:24;
+	uint32_t hairpin_peer_vhca:16;
+};
+
 /**
  * Type of object being allocated.
  */
@@ -591,6 +628,8 @@ struct mlx5_ibv_shared {
 	struct rte_intr_handle intr_handle; /* Interrupt handler for device. */
 	struct rte_intr_handle intr_handle_devx; /* DEVX interrupt handler. */
 	struct mlx5dv_devx_cmd_comp *devx_comp; /* DEVX async comp obj. */
+	struct mlx5_devx_obj *tis; /* TIS object. */
+	struct mlx5_devx_obj *td; /* Transport domain. */
 	struct mlx5_ibv_shared_port port[]; /* per device port data array. */
 };
 
@@ -911,5 +950,12 @@ struct mlx5_devx_obj *mlx5_devx_cmd_create_tir(struct ibv_context *ctx,
 					struct mlx5_devx_tir_attr *tir_attr);
 struct mlx5_devx_obj *mlx5_devx_cmd_create_rqt(struct ibv_context *ctx,
 					struct mlx5_devx_rqt_attr *rqt_attr);
+struct mlx5_devx_obj *mlx5_devx_cmd_create_sq
+	(struct ibv_context *ctx, struct mlx5_devx_create_sq_attr *sq_attr);
+int mlx5_devx_cmd_modify_sq
+	(struct mlx5_devx_obj *sq, struct mlx5_devx_modify_sq_attr *sq_attr);
+struct mlx5_devx_obj *mlx5_devx_cmd_create_tis
+	(struct ibv_context *ctx, struct mlx5_devx_tis_attr *tis_attr);
+struct mlx5_devx_obj *mlx5_devx_cmd_create_td(struct ibv_context *ctx);
 
 #endif /* RTE_PMD_MLX5_H_ */
diff --git a/drivers/net/mlx5/mlx5_devx_cmds.c b/drivers/net/mlx5/mlx5_devx_cmds.c
index b072c37..917bbf9 100644
--- a/drivers/net/mlx5/mlx5_devx_cmds.c
+++ b/drivers/net/mlx5/mlx5_devx_cmds.c
@@ -709,3 +709,189 @@ struct mlx5_devx_obj *
 	rqt->id = MLX5_GET(create_rqt_out, out, rqtn);
 	return rqt;
 }
+
+/**
+ * Create SQ using DevX API.
+ *
+ * @param[in] ctx
+ *   ibv_context returned from mlx5dv_open_device.
+ * @param [in] sq_attr
+ *   Pointer to SQ attributes structure.
+ * @param [in] socket
+ *   CPU socket ID for allocations.
+ *
+ * @return
+ *   The DevX object created, NULL otherwise and rte_errno is set.
+ **/
+struct mlx5_devx_obj *
+mlx5_devx_cmd_create_sq(struct ibv_context *ctx,
+			struct mlx5_devx_create_sq_attr *sq_attr)
+{
+	uint32_t in[MLX5_ST_SZ_DW(create_sq_in)] = {0};
+	uint32_t out[MLX5_ST_SZ_DW(create_sq_out)] = {0};
+	void *sq_ctx;
+	void *wq_ctx;
+	struct mlx5_devx_wq_attr *wq_attr;
+	struct mlx5_devx_obj *sq = NULL;
+
+	sq = rte_calloc(__func__, 1, sizeof(*sq), 0);
+	if (!sq) {
+		DRV_LOG(ERR, "Failed to allocate SQ data");
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+	MLX5_SET(create_sq_in, in, opcode, MLX5_CMD_OP_CREATE_SQ);
+	sq_ctx = MLX5_ADDR_OF(create_sq_in, in, ctx);
+	MLX5_SET(sqc, sq_ctx, rlky, sq_attr->rlky);
+	MLX5_SET(sqc, sq_ctx, cd_master, sq_attr->cd_master);
+	MLX5_SET(sqc, sq_ctx, fre, sq_attr->fre);
+	MLX5_SET(sqc, sq_ctx, flush_in_error_en, sq_attr->flush_in_error_en);
+	MLX5_SET(sqc, sq_ctx, allow_multi_pkt_send_wqe,
+		 sq_attr->allow_multi_pkt_send_wqe);
+	MLX5_SET(sqc, sq_ctx, min_wqe_inline_mode,
+		 sq_attr->min_wqe_inline_mode);
+	MLX5_SET(sqc, sq_ctx, state, sq_attr->state);
+	MLX5_SET(sqc, sq_ctx, reg_umr, sq_attr->reg_umr);
+	MLX5_SET(sqc, sq_ctx, allow_swp, sq_attr->allow_swp);
+	MLX5_SET(sqc, sq_ctx, hairpin, sq_attr->hairpin);
+	MLX5_SET(sqc, sq_ctx, user_index, sq_attr->user_index);
+	MLX5_SET(sqc, sq_ctx, cqn, sq_attr->cqn);
+	MLX5_SET(sqc, sq_ctx, packet_pacing_rate_limit_index,
+		 sq_attr->packet_pacing_rate_limit_index);
+	MLX5_SET(sqc, sq_ctx, tis_lst_sz, sq_attr->tis_lst_sz);
+	MLX5_SET(sqc, sq_ctx, tis_num_0, sq_attr->tis_num);
+	wq_ctx = MLX5_ADDR_OF(sqc, sq_ctx, wq);
+	wq_attr = &sq_attr->wq_attr;
+	devx_cmd_fill_wq_data(wq_ctx, wq_attr);
+	sq->obj = mlx5_glue->devx_obj_create(ctx, in, sizeof(in),
+					     out, sizeof(out));
+	if (!sq->obj) {
+		DRV_LOG(ERR, "Failed to create SQ using DevX");
+		rte_errno = errno;
+		rte_free(sq);
+		return NULL;
+	}
+	sq->id = MLX5_GET(create_sq_out, out, sqn);
+	return sq;
+}
+
+/**
+ * Modify SQ using DevX API.
+ *
+ * @param[in] sq
+ *   Pointer to SQ object structure.
+ * @param [in] sq_attr
+ *   Pointer to SQ attributes structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_devx_cmd_modify_sq(struct mlx5_devx_obj *sq,
+			struct mlx5_devx_modify_sq_attr *sq_attr)
+{
+	uint32_t in[MLX5_ST_SZ_DW(modify_sq_in)] = {0};
+	uint32_t out[MLX5_ST_SZ_DW(modify_sq_out)] = {0};
+	void *sq_ctx;
+	int ret;
+
+	MLX5_SET(modify_sq_in, in, opcode, MLX5_CMD_OP_MODIFY_SQ);
+	MLX5_SET(modify_sq_in, in, sq_state, sq_attr->sq_state);
+	MLX5_SET(modify_sq_in, in, sqn, sq->id);
+	sq_ctx = MLX5_ADDR_OF(modify_sq_in, in, ctx);
+	MLX5_SET(sqc, sq_ctx, state, sq_attr->state);
+	MLX5_SET(sqc, sq_ctx, hairpin_peer_rq, sq_attr->hairpin_peer_rq);
+	MLX5_SET(sqc, sq_ctx, hairpin_peer_vhca, sq_attr->hairpin_peer_vhca);
+	ret = mlx5_glue->devx_obj_modify(sq->obj, in, sizeof(in),
+					 out, sizeof(out));
+	if (ret) {
+		DRV_LOG(ERR, "Failed to modify SQ using DevX");
+		rte_errno = errno;
+		return -errno;
+	}
+	return ret;
+}
+
+/**
+ * Create TIS using DevX API.
+ *
+ * @param[in] ctx
+ *   ibv_context returned from mlx5dv_open_device.
+ * @param [in] tis_attr
+ *   Pointer to TIS attributes structure.
+ *
+ * @return
+ *   The DevX object created, NULL otherwise and rte_errno is set.
+ */
+struct mlx5_devx_obj *
+mlx5_devx_cmd_create_tis(struct ibv_context *ctx,
+			 struct mlx5_devx_tis_attr *tis_attr)
+{
+	uint32_t in[MLX5_ST_SZ_DW(create_tis_in)] = {0};
+	uint32_t out[MLX5_ST_SZ_DW(create_tis_out)] = {0};
+	struct mlx5_devx_obj *tis = NULL;
+	void *tis_ctx;
+
+	tis = rte_calloc(__func__, 1, sizeof(*tis), 0);
+	if (!tis) {
+		DRV_LOG(ERR, "Failed to allocate TIS object");
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+	MLX5_SET(create_tis_in, in, opcode, MLX5_CMD_OP_CREATE_TIS);
+	tis_ctx = MLX5_ADDR_OF(create_tis_in, in, ctx);
+	MLX5_SET(tisc, tis_ctx, strict_lag_tx_port_affinity,
+		 tis_attr->strict_lag_tx_port_affinity);
+	MLX5_SET(tisc, tis_ctx, lag_tx_port_affinity,
+		 tis_attr->lag_tx_port_affinity);
+	MLX5_SET(tisc, tis_ctx, prio, tis_attr->prio);
+	MLX5_SET(tisc, tis_ctx, transport_domain,
+		 tis_attr->transport_domain);
+	tis->obj = mlx5_glue->devx_obj_create(ctx, in, sizeof(in),
+					      out, sizeof(out));
+	if (!tis->obj) {
+		DRV_LOG(ERR, "Failed to create TIS using DevX");
+		rte_errno = errno;
+		rte_free(tis);
+		return NULL;
+	}
+	tis->id = MLX5_GET(create_tis_out, out, tisn);
+	return tis;
+}
+
+/**
+ * Create transport domain using DevX API.
+ *
+ * @param[in] ctx
+ *   ibv_context returned from mlx5dv_open_device.
+ *
+ * @return
+ *   The DevX object created, NULL otherwise and rte_errno is set.
+ */
+struct mlx5_devx_obj *
+mlx5_devx_cmd_create_td(struct ibv_context *ctx)
+{
+	uint32_t in[MLX5_ST_SZ_DW(alloc_transport_domain_in)] = {0};
+	uint32_t out[MLX5_ST_SZ_DW(alloc_transport_domain_out)] = {0};
+	struct mlx5_devx_obj *td = NULL;
+
+	td = rte_calloc(__func__, 1, sizeof(*td), 0);
+	if (!td) {
+		DRV_LOG(ERR, "Failed to allocate TD object");
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+	MLX5_SET(alloc_transport_domain_in, in, opcode,
+		 MLX5_CMD_OP_ALLOC_TRANSPORT_DOMAIN);
+	td->obj = mlx5_glue->devx_obj_create(ctx, in, sizeof(in),
+					     out, sizeof(out));
+	if (!td->obj) {
+		DRV_LOG(ERR, "Failed to create TD using DevX");
+		rte_errno = errno;
+		rte_free(td);
+		return NULL;
+	}
+	td->id = MLX5_GET(alloc_transport_domain_out, out,
+			   transport_domain);
+	return td;
+}
diff --git a/drivers/net/mlx5/mlx5_prm.h b/drivers/net/mlx5/mlx5_prm.h
index 3765df0..faa7996 100644
--- a/drivers/net/mlx5/mlx5_prm.h
+++ b/drivers/net/mlx5/mlx5_prm.h
@@ -666,9 +666,13 @@ enum {
 	MLX5_CMD_OP_QUERY_HCA_CAP = 0x100,
 	MLX5_CMD_OP_CREATE_MKEY = 0x200,
 	MLX5_CMD_OP_QUERY_NIC_VPORT_CONTEXT = 0x754,
+	MLX5_CMD_OP_ALLOC_TRANSPORT_DOMAIN = 0x816,
 	MLX5_CMD_OP_CREATE_TIR = 0x900,
+	MLX5_CMD_OP_CREATE_SQ = 0x904,
+	MLX5_CMD_OP_MODIFY_SQ = 0x905,
 	MLX5_CMD_OP_CREATE_RQ = 0x908,
 	MLX5_CMD_OP_MODIFY_RQ = 0x909,
+	MLX5_CMD_OP_CREATE_TIS = 0x912,
 	MLX5_CMD_OP_QUERY_TIS = 0x915,
 	MLX5_CMD_OP_CREATE_RQT = 0x916,
 	MLX5_CMD_OP_ALLOC_FLOW_COUNTER = 0x939,
@@ -1311,6 +1315,23 @@ struct mlx5_ifc_query_tis_in_bits {
 	u8 reserved_at_60[0x20];
 };
 
+struct mlx5_ifc_alloc_transport_domain_out_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 reserved_at_40[0x8];
+	u8 transport_domain[0x18];
+	u8 reserved_at_60[0x20];
+};
+
+struct mlx5_ifc_alloc_transport_domain_in_bits {
+	u8 opcode[0x10];
+	u8 reserved_at_10[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 reserved_at_40[0x40];
+};
+
 enum {
 	MLX5_WQ_TYPE_LINKED_LIST                = 0x0,
 	MLX5_WQ_TYPE_CYCLIC                     = 0x1,
@@ -1427,6 +1448,24 @@ struct mlx5_ifc_modify_rq_out_bits {
 	u8 reserved_at_40[0x40];
 };
 
+struct mlx5_ifc_create_tis_out_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 reserved_at_40[0x8];
+	u8 tisn[0x18];
+	u8 reserved_at_60[0x20];
+};
+
+struct mlx5_ifc_create_tis_in_bits {
+	u8 opcode[0x10];
+	u8 uid[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 reserved_at_40[0xc0];
+	struct mlx5_ifc_tisc_bits ctx;
+};
+
 enum {
 	MLX5_MODIFY_RQ_IN_MODIFY_BITMASK_WQ_LWM = 1ULL << 0,
 	MLX5_MODIFY_RQ_IN_MODIFY_BITMASK_VSD = 1ULL << 1,
@@ -1572,6 +1611,85 @@ struct mlx5_ifc_create_rqt_in_bits {
 #pragma GCC diagnostic error "-Wpedantic"
 #endif
 
+struct mlx5_ifc_sqc_bits {
+	u8 rlky[0x1];
+	u8 cd_master[0x1];
+	u8 fre[0x1];
+	u8 flush_in_error_en[0x1];
+	u8 allow_multi_pkt_send_wqe[0x1];
+	u8 min_wqe_inline_mode[0x3];
+	u8 state[0x4];
+	u8 reg_umr[0x1];
+	u8 allow_swp[0x1];
+	u8 hairpin[0x1];
+	u8 reserved_at_f[0x11];
+	u8 reserved_at_20[0x8];
+	u8 user_index[0x18];
+	u8 reserved_at_40[0x8];
+	u8 cqn[0x18];
+	u8 reserved_at_60[0x8];
+	u8 hairpin_peer_rq[0x18];
+	u8 reserved_at_80[0x10];
+	u8 hairpin_peer_vhca[0x10];
+	u8 reserved_at_a0[0x50];
+	u8 packet_pacing_rate_limit_index[0x10];
+	u8 tis_lst_sz[0x10];
+	u8 reserved_at_110[0x10];
+	u8 reserved_at_120[0x40];
+	u8 reserved_at_160[0x8];
+	u8 tis_num_0[0x18];
+	struct mlx5_ifc_wq_bits wq;
+};
+
+struct mlx5_ifc_query_sq_in_bits {
+	u8 opcode[0x10];
+	u8 reserved_at_10[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 reserved_at_40[0x8];
+	u8 sqn[0x18];
+	u8 reserved_at_60[0x20];
+};
+
+struct mlx5_ifc_modify_sq_out_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 reserved_at_40[0x40];
+};
+
+struct mlx5_ifc_modify_sq_in_bits {
+	u8 opcode[0x10];
+	u8 uid[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 sq_state[0x4];
+	u8 reserved_at_44[0x4];
+	u8 sqn[0x18];
+	u8 reserved_at_60[0x20];
+	u8 modify_bitmask[0x40];
+	u8 reserved_at_c0[0x40];
+	struct mlx5_ifc_sqc_bits ctx;
+};
+
+struct mlx5_ifc_create_sq_out_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 reserved_at_40[0x8];
+	u8 sqn[0x18];
+	u8 reserved_at_60[0x20];
+};
+
+struct mlx5_ifc_create_sq_in_bits {
+	u8 opcode[0x10];
+	u8 uid[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 reserved_at_40[0xc0];
+	struct mlx5_ifc_sqc_bits ctx;
+};
+
 /* CQE format mask. */
 #define MLX5E_CQE_FORMAT_MASK 0xc
 
diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index 12f9bfb..271b648 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -324,14 +324,18 @@ struct mlx5_txq_obj {
 	LIST_ENTRY(mlx5_txq_obj) next; /* Pointer to the next element. */
 	rte_atomic32_t refcnt; /* Reference counter. */
 	struct mlx5_txq_ctrl *txq_ctrl; /* Pointer to the control queue. */
-	enum mlx5_rxq_obj_type type; /* The txq object type. */
+	enum mlx5_txq_obj_type type; /* The txq object type. */
 	RTE_STD_C11
 	union {
 		struct {
 			struct ibv_cq *cq; /* Completion Queue. */
 			struct ibv_qp *qp; /* Queue Pair. */
 		};
-		struct mlx5_devx_obj *sq; /* DevX object for Sx queue. */
+		struct {
+			struct mlx5_devx_obj *sq;
+			/* DevX object for Tx queue (SQ). */
+			struct mlx5_devx_obj *tis; /* The TIS object. */
+		};
 	};
 };
 
@@ -348,6 +352,7 @@ struct mlx5_txq_ctrl {
 	off_t uar_mmap_offset; /* UAR mmap offset for non-primary process. */
 	void *bf_reg; /* BlueFlame register from Verbs. */
 	uint16_t dump_file_n; /* Number of dump files. */
+	struct rte_eth_hairpin_conf hairpin_conf; /* Hairpin configuration. */
 	struct mlx5_txq_data txq; /* Data path structure. */
 	/* Must be the last field in the structure, contains elts[]. */
 };
@@ -410,15 +415,22 @@ struct mlx5_hrxq *mlx5_hrxq_get(struct rte_eth_dev *dev,
 
 int mlx5_tx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 			unsigned int socket, const struct rte_eth_txconf *conf);
+int mlx5_tx_hairpin_queue_setup
+	(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
+	 const struct rte_eth_hairpin_conf *hairpin_conf);
 void mlx5_tx_queue_release(void *dpdk_txq);
 int mlx5_tx_uar_init_secondary(struct rte_eth_dev *dev, int fd);
-struct mlx5_txq_obj *mlx5_txq_obj_new(struct rte_eth_dev *dev, uint16_t idx);
+struct mlx5_txq_obj *mlx5_txq_obj_new(struct rte_eth_dev *dev, uint16_t idx,
+				      enum mlx5_txq_obj_type type);
 struct mlx5_txq_obj *mlx5_txq_obj_get(struct rte_eth_dev *dev, uint16_t idx);
 int mlx5_txq_obj_release(struct mlx5_txq_obj *txq_ibv);
 int mlx5_txq_obj_verify(struct rte_eth_dev *dev);
 struct mlx5_txq_ctrl *mlx5_txq_new(struct rte_eth_dev *dev, uint16_t idx,
 				   uint16_t desc, unsigned int socket,
 				   const struct rte_eth_txconf *conf);
+struct mlx5_txq_ctrl *mlx5_txq_hairpin_new
+	(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
+	 const struct rte_eth_hairpin_conf *hairpin_conf);
 struct mlx5_txq_ctrl *mlx5_txq_get(struct rte_eth_dev *dev, uint16_t idx);
 int mlx5_txq_release(struct rte_eth_dev *dev, uint16_t idx);
 int mlx5_txq_releasable(struct rte_eth_dev *dev, uint16_t idx);
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index 50c4df5..3ec86c4 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -51,8 +51,14 @@
 
 		if (!txq_ctrl)
 			continue;
-		txq_alloc_elts(txq_ctrl);
-		txq_ctrl->obj = mlx5_txq_obj_new(dev, i);
+		if (txq_ctrl->type == MLX5_TXQ_TYPE_HAIRPIN) {
+			txq_ctrl->obj = mlx5_txq_obj_new
+				(dev, i, MLX5_TXQ_OBJ_TYPE_DEVX_HAIRPIN);
+		} else {
+			txq_alloc_elts(txq_ctrl);
+			txq_ctrl->obj = mlx5_txq_obj_new
+				(dev, i, MLX5_TXQ_OBJ_TYPE_IBV);
+		}
 		if (!txq_ctrl->obj) {
 			rte_errno = ENOMEM;
 			goto error;
diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
index a6e2563..f9bfe31 100644
--- a/drivers/net/mlx5/mlx5_txq.c
+++ b/drivers/net/mlx5/mlx5_txq.c
@@ -136,30 +136,22 @@
 }
 
 /**
- * DPDK callback to configure a TX queue.
+ * Tx queue presetup checks.
  *
  * @param dev
  *   Pointer to Ethernet device structure.
  * @param idx
- *   TX queue index.
+ *   Tx queue index.
  * @param desc
  *   Number of descriptors to configure in queue.
- * @param socket
- *   NUMA socket on which memory must be allocated.
- * @param[in] conf
- *   Thresholds parameters.
  *
  * @return
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
-int
-mlx5_tx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
-		    unsigned int socket, const struct rte_eth_txconf *conf)
+static int
+mlx5_tx_queue_pre_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
-	struct mlx5_txq_data *txq = (*priv->txqs)[idx];
-	struct mlx5_txq_ctrl *txq_ctrl =
-		container_of(txq, struct mlx5_txq_ctrl, txq);
 
 	if (desc <= MLX5_TX_COMP_THRESH) {
 		DRV_LOG(WARNING,
@@ -191,6 +183,38 @@
 		return -rte_errno;
 	}
 	mlx5_txq_release(dev, idx);
+	return 0;
+}
+/**
+ * DPDK callback to configure a TX queue.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ * @param idx
+ *   TX queue index.
+ * @param desc
+ *   Number of descriptors to configure in queue.
+ * @param socket
+ *   NUMA socket on which memory must be allocated.
+ * @param[in] conf
+ *   Thresholds parameters.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_tx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
+		    unsigned int socket, const struct rte_eth_txconf *conf)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_txq_data *txq = (*priv->txqs)[idx];
+	struct mlx5_txq_ctrl *txq_ctrl =
+		container_of(txq, struct mlx5_txq_ctrl, txq);
+	int res;
+
+	res = mlx5_tx_queue_pre_setup(dev, idx, desc);
+	if (res)
+		return res;
 	txq_ctrl = mlx5_txq_new(dev, idx, desc, socket, conf);
 	if (!txq_ctrl) {
 		DRV_LOG(ERR, "port %u unable to allocate queue index %u",
@@ -204,6 +228,57 @@
 }
 
 /**
+ * DPDK callback to configure a TX hairpin queue.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ * @param idx
+ *   TX queue index.
+ * @param desc
+ *   Number of descriptors to configure in queue.
+ * @param[in] hairpin_conf
+ *   The hairpin binding configuration.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_tx_hairpin_queue_setup(struct rte_eth_dev *dev, uint16_t idx,
+			    uint16_t desc,
+			    const struct rte_eth_hairpin_conf *hairpin_conf)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_txq_data *txq = (*priv->txqs)[idx];
+	struct mlx5_txq_ctrl *txq_ctrl =
+		container_of(txq, struct mlx5_txq_ctrl, txq);
+	int res;
+
+	res = mlx5_tx_queue_pre_setup(dev, idx, desc);
+	if (res)
+		return res;
+	if (hairpin_conf->peer_n != 1 ||
+	    hairpin_conf->peers[0].port != dev->data->port_id ||
+	    hairpin_conf->peers[0].queue >= priv->rxqs_n) {
+		DRV_LOG(ERR, "port %u unable to setup hairpin queue index %u "
+			" invalid hairpind configuration", dev->data->port_id,
+			idx);
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+	txq_ctrl = mlx5_txq_hairpin_new(dev, idx, desc, hairpin_conf);
+	if (!txq_ctrl) {
+		DRV_LOG(ERR, "port %u unable to allocate queue index %u",
+			dev->data->port_id, idx);
+		return -rte_errno;
+	}
+	DRV_LOG(DEBUG, "port %u adding Tx queue %u to list",
+		dev->data->port_id, idx);
+	(*priv->txqs)[idx] = &txq_ctrl->txq;
+	txq_ctrl->type = MLX5_TXQ_TYPE_HAIRPIN;
+	return 0;
+}
+
+/**
  * DPDK callback to release a TX queue.
  *
  * @param dpdk_txq
@@ -246,6 +321,8 @@
 	const size_t page_size = sysconf(_SC_PAGESIZE);
 #endif
 
+	if (txq_ctrl->type != MLX5_TXQ_TYPE_STANDARD)
+		return;
 	assert(rte_eal_process_type() == RTE_PROC_PRIMARY);
 	assert(ppriv);
 	ppriv->uar_table[txq_ctrl->txq.idx] = txq_ctrl->bf_reg;
@@ -282,6 +359,8 @@
 	uintptr_t offset;
 	const size_t page_size = sysconf(_SC_PAGESIZE);
 
+	if (txq_ctrl->type != MLX5_TXQ_TYPE_STANDARD)
+		return 0;
 	assert(ppriv);
 	/*
 	 * As rdma-core, UARs are mapped in size of OS page
@@ -316,6 +395,8 @@
 	const size_t page_size = sysconf(_SC_PAGESIZE);
 	void *addr;
 
+	if (txq_ctrl->type != MLX5_TXQ_TYPE_STANDARD)
+		return;
 	addr = ppriv->uar_table[txq_ctrl->txq.idx];
 	munmap(RTE_PTR_ALIGN_FLOOR(addr, page_size), page_size);
 }
@@ -346,6 +427,8 @@
 			continue;
 		txq = (*priv->txqs)[i];
 		txq_ctrl = container_of(txq, struct mlx5_txq_ctrl, txq);
+		if (txq_ctrl->type != MLX5_TXQ_TYPE_STANDARD)
+			continue;
 		assert(txq->idx == (uint16_t)i);
 		ret = txq_uar_init_secondary(txq_ctrl, fd);
 		if (ret)
@@ -365,18 +448,87 @@
 }
 
 /**
+ * Create the Tx hairpin queue object.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param idx
+ *   Queue index in DPDK Tx queue array.
+ *
+ * @return
+ *   The hairpin DevX object initialised, NULL otherwise and rte_errno is set.
+ */
+static struct mlx5_txq_obj *
+mlx5_txq_obj_hairpin_new(struct rte_eth_dev *dev, uint16_t idx)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_txq_data *txq_data = (*priv->txqs)[idx];
+	struct mlx5_txq_ctrl *txq_ctrl =
+		container_of(txq_data, struct mlx5_txq_ctrl, txq);
+	struct mlx5_devx_create_sq_attr attr = { 0 };
+	struct mlx5_txq_obj *tmpl = NULL;
+	int ret = 0;
+
+	assert(txq_data);
+	assert(!txq_ctrl->obj);
+	tmpl = rte_calloc_socket(__func__, 1, sizeof(*tmpl), 0,
+				 txq_ctrl->socket);
+	if (!tmpl) {
+		DRV_LOG(ERR,
+			"port %u Tx queue %u cannot allocate memory resources",
+			dev->data->port_id, txq_data->idx);
+		rte_errno = ENOMEM;
+		goto error;
+	}
+	tmpl->type = MLX5_TXQ_OBJ_TYPE_DEVX_HAIRPIN;
+	tmpl->txq_ctrl = txq_ctrl;
+	attr.hairpin = 1;
+	attr.tis_lst_sz = 1;
+	/* Workaround for hairpin startup */
+	attr.wq_attr.log_hairpin_num_packets = log2above(32);
+	/* Workaround for packets larger than 1KB */
+	attr.wq_attr.log_hairpin_data_sz =
+			priv->config.hca_attr.log_max_hairpin_wq_data_sz;
+	attr.tis_num = priv->sh->tis->id;
+	tmpl->sq = mlx5_devx_cmd_create_sq(priv->sh->ctx, &attr);
+	if (!tmpl->sq) {
+		DRV_LOG(ERR,
+			"port %u tx hairpin queue %u can't create sq object",
+			dev->data->port_id, idx);
+		rte_errno = errno;
+		goto error;
+	}
+	DRV_LOG(DEBUG, "port %u sxq %u updated with %p", dev->data->port_id,
+		idx, (void *)&tmpl);
+	rte_atomic32_inc(&tmpl->refcnt);
+	LIST_INSERT_HEAD(&priv->txqsobj, tmpl, next);
+	return tmpl;
+error:
+	ret = rte_errno; /* Save rte_errno before cleanup. */
+	if (tmpl && tmpl->tis)
+		mlx5_devx_cmd_destroy(tmpl->tis);
+	if (tmpl && tmpl->sq)
+		mlx5_devx_cmd_destroy(tmpl->sq);
+	rte_errno = ret; /* Restore rte_errno. */
+	return NULL;
+}
+
+/**
  * Create the Tx queue Verbs object.
  *
  * @param dev
  *   Pointer to Ethernet device.
  * @param idx
  *   Queue index in DPDK Tx queue array.
+ * @param type
+ *   Type of the Tx queue object to create.
  *
  * @return
  *   The Verbs object initialised, NULL otherwise and rte_errno is set.
  */
 struct mlx5_txq_obj *
-mlx5_txq_obj_new(struct rte_eth_dev *dev, uint16_t idx)
+mlx5_txq_obj_new(struct rte_eth_dev *dev, uint16_t idx,
+		 enum mlx5_txq_obj_type type)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_txq_data *txq_data = (*priv->txqs)[idx];
@@ -396,6 +548,8 @@ struct mlx5_txq_obj *
 	const int desc = 1 << txq_data->elts_n;
 	int ret = 0;
 
+	if (type == MLX5_TXQ_OBJ_TYPE_DEVX_HAIRPIN)
+		return mlx5_txq_obj_hairpin_new(dev, idx);
 #ifdef HAVE_IBV_FLOW_DV_SUPPORT
 	/* If using DevX, need additional mask to read tisn value. */
 	if (priv->config.devx && !priv->sh->tdn)
@@ -643,8 +797,13 @@ struct mlx5_txq_obj *
 {
 	assert(txq_obj);
 	if (rte_atomic32_dec_and_test(&txq_obj->refcnt)) {
-		claim_zero(mlx5_glue->destroy_qp(txq_obj->qp));
-		claim_zero(mlx5_glue->destroy_cq(txq_obj->cq));
+		if (txq_obj->type == MLX5_TXQ_OBJ_TYPE_DEVX_HAIRPIN) {
+			if (txq_obj->tis)
+				claim_zero(mlx5_devx_cmd_destroy(txq_obj->tis));
+		} else {
+			claim_zero(mlx5_glue->destroy_qp(txq_obj->qp));
+			claim_zero(mlx5_glue->destroy_cq(txq_obj->cq));
+		}
 		LIST_REMOVE(txq_obj, next);
 		rte_free(txq_obj);
 		return 0;
@@ -1100,6 +1259,7 @@ struct mlx5_txq_ctrl *
 		goto error;
 	}
 	rte_atomic32_inc(&tmpl->refcnt);
+	tmpl->type = MLX5_TXQ_TYPE_STANDARD;
 	LIST_INSERT_HEAD(&priv->txqsctrl, tmpl, next);
 	return tmpl;
 error:
@@ -1108,6 +1268,46 @@ struct mlx5_txq_ctrl *
 }
 
 /**
+ * Create a DPDK Tx hairpin queue.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param idx
+ *   TX queue index.
+ * @param desc
+ *   Number of descriptors to configure in queue.
+ * @param hairpin_conf
+ *  The hairpin configuration.
+ *
+ * @return
+ *   A DPDK queue object on success, NULL otherwise and rte_errno is set.
+ */
+struct mlx5_txq_ctrl *
+mlx5_txq_hairpin_new(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
+		     const struct rte_eth_hairpin_conf *hairpin_conf)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_txq_ctrl *tmpl;
+
+	tmpl = rte_calloc_socket("TXQ", 1,
+				 sizeof(*tmpl), 0, SOCKET_ID_ANY);
+	if (!tmpl) {
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+	tmpl->priv = priv;
+	tmpl->socket = SOCKET_ID_ANY;
+	tmpl->txq.elts_n = log2above(desc);
+	tmpl->txq.port_id = dev->data->port_id;
+	tmpl->txq.idx = idx;
+	tmpl->hairpin_conf = *hairpin_conf;
+	tmpl->type = MLX5_TXQ_TYPE_HAIRPIN;
+	rte_atomic32_inc(&tmpl->refcnt);
+	LIST_INSERT_HEAD(&priv->txqsctrl, tmpl, next);
+	return tmpl;
+}
+
+/**
  * Get a Tx queue.
  *
  * @param dev
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 186+ messages in thread
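
As a usage illustration of the setup path added above (a minimal
sketch modeled on the testpmd changes later in this series; the helper
name and queue indices are hypothetical), an application configures a
hairpin Tx queue by calling the hairpin setup function instead of the
regular one:

	#include <rte_ethdev.h>

	static int
	setup_tx_hairpin(uint16_t port, uint16_t txq, uint16_t peer_rxq,
			 uint16_t nb_desc)
	{
		struct rte_eth_hairpin_conf conf = {
			.peer_n = 1, /* this series supports one peer */
		};

		conf.peers[0].port = port;      /* peer is the same port */
		conf.peers[0].queue = peer_rxq; /* peer Rx hairpin queue */
		/* Used instead of rte_eth_tx_queue_setup(). */
		return rte_eth_tx_hairpin_queue_setup(port, txq, nb_desc,
						      &conf);
	}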

* [dpdk-dev] [PATCH v3 06/14] net/mlx5: add get hairpin capabilities
  2019-10-15  9:04 ` [dpdk-dev] [PATCH v3 00/14] " Ori Kam
                     ` (4 preceding siblings ...)
  2019-10-15  9:04   ` [dpdk-dev] [PATCH v3 05/14] net/mlx5: support Tx hairpin queues Ori Kam
@ 2019-10-15  9:04   ` Ori Kam
  2019-10-15  9:04   ` [dpdk-dev] [PATCH v3 07/14] app/testpmd: add hairpin support Ori Kam
                     ` (7 subsequent siblings)
  13 siblings, 0 replies; 186+ messages in thread
From: Ori Kam @ 2019-10-15  9:04 UTC (permalink / raw)
  To: Matan Azrad, Shahaf Shuler, Viacheslav Ovsiienko
  Cc: dev, orika, jingjing.wu, stephen

This commit adds the hairpin capabilities get function.

Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>

---
 drivers/net/mlx5/mlx5.c        |  2 ++
 drivers/net/mlx5/mlx5.h        |  3 ++-
 drivers/net/mlx5/mlx5_ethdev.c | 27 +++++++++++++++++++++++++++
 3 files changed, 31 insertions(+), 1 deletion(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index c53a9c6..7962936 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -1028,6 +1028,7 @@ struct mlx5_dev_spawn_data {
 	.udp_tunnel_port_add  = mlx5_udp_tunnel_port_add,
 	.get_module_info = mlx5_get_module_info,
 	.get_module_eeprom = mlx5_get_module_eeprom,
+	.hairpin_cap_get = mlx5_hairpin_cap_get,
 };
 
 /* Available operations from secondary process. */
@@ -1090,6 +1091,7 @@ struct mlx5_dev_spawn_data {
 	.is_removed = mlx5_is_removed,
 	.get_module_info = mlx5_get_module_info,
 	.get_module_eeprom = mlx5_get_module_eeprom,
+	.hairpin_cap_get = mlx5_hairpin_cap_get,
 };
 
 /**
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 7ea4950..ce044b9 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -782,7 +782,8 @@ int mlx5_get_module_info(struct rte_eth_dev *dev,
 			 struct rte_eth_dev_module_info *modinfo);
 int mlx5_get_module_eeprom(struct rte_eth_dev *dev,
 			   struct rte_dev_eeprom_info *info);
-
+int mlx5_hairpin_cap_get(struct rte_eth_dev *dev,
+			 struct rte_eth_hairpin_cap *cap);
 /* mlx5_mac.c */
 
 int mlx5_get_mac(struct rte_eth_dev *dev, uint8_t (*mac)[RTE_ETHER_ADDR_LEN]);
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index f2b1752..95c70f7 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -2028,3 +2028,30 @@ int mlx5_get_module_eeprom(struct rte_eth_dev *dev,
 	rte_free(eeprom);
 	return ret;
 }
+
+/**
+ * DPDK callback to retrieve hairpin capabilities.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ * @param[out] cap
+ *   Storage for hairpin capability data.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int mlx5_hairpin_cap_get(struct rte_eth_dev *dev,
+			 struct rte_eth_hairpin_cap *cap)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+
+	if (priv->sh->devx == 0) {
+		rte_errno = ENOTSUP;
+		return -rte_errno;
+	}
+	cap->max_n_queues = UINT16_MAX;
+	cap->max_rx_2_tx = 1;
+	cap->max_tx_2_rx = 1;
+	cap->max_nb_desc = 8192;
+	return 0;
+}
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 186+ messages in thread
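
A minimal sketch of using this callback from an application before
setting up hairpin queues (the clamping policy is illustrative):

	struct rte_eth_hairpin_cap cap;
	int ret = rte_eth_dev_hairpin_capability_get(port_id, &cap);

	if (ret != 0)
		return ret; /* mlx5 returns -ENOTSUP when DevX is off */
	/* Reported here: max_n_queues, max_rx_2_tx = 1,
	 * max_tx_2_rx = 1, max_nb_desc = 8192. */
	if (nb_desc > cap.max_nb_desc)
		nb_desc = cap.max_nb_desc;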

* [dpdk-dev] [PATCH v3 07/14] app/testpmd: add hairpin support
  2019-10-15  9:04 ` [dpdk-dev] [PATCH v3 00/14] " Ori Kam
                     ` (5 preceding siblings ...)
  2019-10-15  9:04   ` [dpdk-dev] [PATCH v3 06/14] net/mlx5: add get hairpin capabilities Ori Kam
@ 2019-10-15  9:04   ` Ori Kam
  2019-10-15  9:04   ` [dpdk-dev] [PATCH v3 08/14] net/mlx5: add hairpin binding function Ori Kam
                     ` (6 subsequent siblings)
  13 siblings, 0 replies; 186+ messages in thread
From: Ori Kam @ 2019-10-15  9:04 UTC (permalink / raw)
  To: Wenzhuo Lu, Jingjing Wu, Bernard Iremonger; +Cc: dev, orika, stephen

This commit introduces hairpin queues to testpmd.
The hairpin queues are configured using --hairpinq=<n>.
This option adds n queue objects to both the total number
of Tx queues and Rx queues.
The connection between the queues is 1 to 1: the first Rx hairpin
queue is connected to the first Tx hairpin queue, and so on.

Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>

---
 app/test-pmd/parameters.c |  28 ++++++++++++
 app/test-pmd/testpmd.c    | 109 +++++++++++++++++++++++++++++++++++++++++++++-
 app/test-pmd/testpmd.h    |   3 ++
 3 files changed, 138 insertions(+), 2 deletions(-)

diff --git a/app/test-pmd/parameters.c b/app/test-pmd/parameters.c
index 6c78dca..6246129 100644
--- a/app/test-pmd/parameters.c
+++ b/app/test-pmd/parameters.c
@@ -147,6 +147,8 @@
 	printf("  --rxd=N: set the number of descriptors in RX rings to N.\n");
 	printf("  --txq=N: set the number of TX queues per port to N.\n");
 	printf("  --txd=N: set the number of descriptors in TX rings to N.\n");
+	printf("  --hairpinq=N: set the number of hairpin queues per port to "
+	       "N.\n");
 	printf("  --burst=N: set the number of packets per burst to N.\n");
 	printf("  --mbcache=N: set the cache of mbuf memory pool to N.\n");
 	printf("  --rxpt=N: set prefetch threshold register of RX rings to N.\n");
@@ -618,6 +620,7 @@
 		{ "txq",			1, 0, 0 },
 		{ "rxd",			1, 0, 0 },
 		{ "txd",			1, 0, 0 },
+		{ "hairpinq",			1, 0, 0 },
 		{ "burst",			1, 0, 0 },
 		{ "mbcache",			1, 0, 0 },
 		{ "txpt",			1, 0, 0 },
@@ -1036,6 +1039,31 @@
 						  " >= 0 && <= %u\n", n,
 						  get_allowed_max_nb_txq(&pid));
 			}
+			if (!strcmp(lgopts[opt_idx].name, "hairpinq")) {
+				n = atoi(optarg);
+				if (n >= 0 &&
+				    check_nb_hairpinq((queueid_t)n) == 0)
+					nb_hairpinq = (queueid_t) n;
+				else
+					rte_exit(EXIT_FAILURE, "txq %d invalid - must be"
+						  " >= 0 && <= %u\n", n,
+						  get_allowed_max_nb_hairpinq
+						  (&pid));
+				if ((n + nb_txq) < 0 ||
+				    check_nb_txq((queueid_t)(n + nb_txq)) != 0)
+					rte_exit(EXIT_FAILURE, "txq + hairpinq "
+						 "%d invalid - must be"
+						  " >= 0 && <= %u\n",
+						  n + nb_txq,
+						  get_allowed_max_nb_txq(&pid));
+				if ((n + nb_rxq) < 0 ||
+				    check_nb_rxq((queueid_t)(n + nb_rxq)) != 0)
+					rte_exit(EXIT_FAILURE, "rxq + hairpinq "
+						 "%d invalid - must be"
+						  " >= 0 && <= %u\n",
+						  n + nb_rxq,
+						  get_allowed_max_nb_rxq(&pid));
+			}
 			if (!nb_rxq && !nb_txq) {
 				rte_exit(EXIT_FAILURE, "Either rx or tx queues should "
 						"be non-zero\n");
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 5701f31..8290e22 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -235,6 +235,7 @@ struct fwd_engine * fwd_engines[] = {
 /*
  * Configurable number of RX/TX queues.
  */
+queueid_t nb_hairpinq; /**< Number of hairpin queues per port. */
 queueid_t nb_rxq = 1; /**< Number of RX queues per port. */
 queueid_t nb_txq = 1; /**< Number of TX queues per port. */
 
@@ -1103,6 +1104,53 @@ struct extmem_param {
 	return 0;
 }
 
+/*
+ * Get the allowed maximum number of hairpin queues.
+ * *pid returns the port id which has the minimal value of
+ * max_hairpin_queues among all ports.
+ */
+queueid_t
+get_allowed_max_nb_hairpinq(portid_t *pid)
+{
+	queueid_t allowed_max_hairpinq = MAX_QUEUE_ID;
+	portid_t pi;
+	struct rte_eth_hairpin_cap cap;
+
+	RTE_ETH_FOREACH_DEV(pi) {
+		if (rte_eth_dev_hairpin_capability_get(pi, &cap) != 0) {
+			*pid = pi;
+			return 0;
+		}
+		if (cap.max_n_queues < allowed_max_hairpinq) {
+			allowed_max_hairpinq = cap.max_n_queues;
+			*pid = pi;
+		}
+	}
+	return allowed_max_hairpinq;
+}
+
+/*
+ * Check whether the input number of hairpin queues is valid.
+ * It is valid if it does not exceed the maximum number of
+ * hairpin queues of any port.
+ * Return 0 if valid, -1 otherwise.
+ */
+int
+check_nb_hairpinq(queueid_t hairpinq)
+{
+	queueid_t allowed_max_hairpinq;
+	portid_t pid = 0;
+
+	allowed_max_hairpinq = get_allowed_max_nb_hairpinq(&pid);
+	if (hairpinq > allowed_max_hairpinq) {
+		printf("Fail: input hairpin (%u) can't be greater "
+		       "than max_hairpin_queues (%u) of port %u\n",
+		       hairpinq, allowed_max_hairpinq, pid);
+		return -1;
+	}
+	return 0;
+}
+
 static void
 init_config(void)
 {
@@ -2064,6 +2112,11 @@ struct extmem_param {
 	queueid_t qi;
 	struct rte_port *port;
 	struct rte_ether_addr mac_addr;
+	struct rte_eth_hairpin_conf hairpin_conf = {
+		.peer_n = 1,
+	};
+	int i;
+	struct rte_eth_hairpin_cap cap;
 
 	if (port_id_is_invalid(pid, ENABLED_WARN))
 		return 0;
@@ -2096,9 +2149,16 @@ struct extmem_param {
 			configure_rxtx_dump_callbacks(0);
 			printf("Configuring Port %d (socket %u)\n", pi,
 					port->socket_id);
+			if (nb_hairpinq > 0 &&
+			    rte_eth_dev_hairpin_capability_get(pi, &cap)) {
+				printf("Port %d doesn't support hairpin "
+				       "queues\n", pi);
+				return -1;
+			}
 			/* configure port */
-			diag = rte_eth_dev_configure(pi, nb_rxq, nb_txq,
-						&(port->dev_conf));
+			diag = rte_eth_dev_configure(pi, nb_rxq + nb_hairpinq,
+						     nb_txq + nb_hairpinq,
+						     &(port->dev_conf));
 			if (diag != 0) {
 				if (rte_atomic16_cmpset(&(port->port_status),
 				RTE_PORT_HANDLING, RTE_PORT_STOPPED) == 0)
@@ -2191,6 +2251,51 @@ struct extmem_param {
 				port->need_reconfig_queues = 1;
 				return -1;
 			}
+			/* setup hairpin queues */
+			i = 0;
+			for (qi = nb_txq; qi < nb_hairpinq + nb_txq; qi++) {
+				hairpin_conf.peers[0].port = pi;
+				hairpin_conf.peers[0].queue = i + nb_rxq;
+				diag = rte_eth_tx_hairpin_queue_setup
+					(pi, qi, nb_txd, &hairpin_conf);
+				i++;
+				if (diag == 0)
+					continue;
+
+				/* Failed to set up Tx hairpin queue, return. */
+				if (rte_atomic16_cmpset(&(port->port_status),
+							RTE_PORT_HANDLING,
+							RTE_PORT_STOPPED) == 0)
+					printf("Port %d can not be set back "
+							"to stopped\n", pi);
+				printf("Fail to configure port %d hairpin "
+				       "queues\n", pi);
+				/* try to reconfigure queues next time */
+				port->need_reconfig_queues = 1;
+				return -1;
+			}
+			i = 0;
+			for (qi = nb_rxq; qi < nb_hairpinq + nb_rxq; qi++) {
+				hairpin_conf.peers[0].port = pi;
+				hairpin_conf.peers[0].queue = i + nb_txq;
+				diag = rte_eth_rx_hairpin_queue_setup
+					(pi, qi, nb_rxd, &hairpin_conf);
+				i++;
+				if (diag == 0)
+					continue;
+
+				/* Failed to set up Rx hairpin queue, return. */
+				if (rte_atomic16_cmpset(&(port->port_status),
+							RTE_PORT_HANDLING,
+							RTE_PORT_STOPPED) == 0)
+					printf("Port %d can not be set back "
+							"to stopped\n", pi);
+				printf("Fail to configure port %d hairpin "
+				       "queues\n", pi);
+				/* try to reconfigure queues next time */
+				port->need_reconfig_queues = 1;
+				return -1;
+			}
 		}
 		configure_rxtx_dump_callbacks(verbose_level);
 		/* start port */
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index f8ebe71..0682c11 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -383,6 +383,7 @@ struct queue_stats_mappings {
 
 extern uint64_t rss_hf;
 
+extern queueid_t nb_hairpinq;
 extern queueid_t nb_rxq;
 extern queueid_t nb_txq;
 
@@ -854,6 +855,8 @@ enum print_warning {
 int check_nb_rxq(queueid_t rxq);
 queueid_t get_allowed_max_nb_txq(portid_t *pid);
 int check_nb_txq(queueid_t txq);
+queueid_t get_allowed_max_nb_hairpinq(portid_t *pid);
+int check_nb_hairpinq(queueid_t hairpinq);
 
 uint16_t dump_rx_pkts(uint16_t port_id, uint16_t queue, struct rte_mbuf *pkts[],
 		      uint16_t nb_pkts, __rte_unused uint16_t max_pkts,
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 186+ messages in thread
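
For example, the following invocation (EAL core/memory options are
illustrative) gives each port two regular and two hairpin queues, so
Rx/Tx queue indices 2-3 are the hairpin ones, bound 1:1:

	testpmd -l 0-3 -n 4 -- -i --rxq=2 --txq=2 --hairpinq=2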

* [dpdk-dev] [PATCH v3 08/14] net/mlx5: add hairpin binding function
  2019-10-15  9:04 ` [dpdk-dev] [PATCH v3 00/14] " Ori Kam
                     ` (6 preceding siblings ...)
  2019-10-15  9:04   ` [dpdk-dev] [PATCH v3 07/14] app/testpmd: add hairpin support Ori Kam
@ 2019-10-15  9:04   ` Ori Kam
  2019-10-15  9:04   ` [dpdk-dev] [PATCH v3 09/14] net/mlx5: add support for hairpin hrxq Ori Kam
                     ` (5 subsequent siblings)
  13 siblings, 0 replies; 186+ messages in thread
From: Ori Kam @ 2019-10-15  9:04 UTC (permalink / raw)
  To: Matan Azrad, Shahaf Shuler, Viacheslav Ovsiienko
  Cc: dev, orika, jingjing.wu, stephen

When starting the port, in addition to creating the queues,
we need to bind the hairpin queues.

Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>

---
 drivers/net/mlx5/mlx5.h           |  1 +
 drivers/net/mlx5/mlx5_devx_cmds.c |  1 +
 drivers/net/mlx5/mlx5_prm.h       |  6 +++
 drivers/net/mlx5/mlx5_trigger.c   | 97 +++++++++++++++++++++++++++++++++++++++
 4 files changed, 105 insertions(+)

diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index ce044b9..a43accf 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -188,6 +188,7 @@ struct mlx5_hca_attr {
 	uint32_t log_max_hairpin_queues:5;
 	uint32_t log_max_hairpin_wq_data_sz:5;
 	uint32_t log_max_hairpin_num_packets:5;
+	uint32_t vhca_id:16;
 };
 
 /* Flow list . */
diff --git a/drivers/net/mlx5/mlx5_devx_cmds.c b/drivers/net/mlx5/mlx5_devx_cmds.c
index 917bbf9..0243733 100644
--- a/drivers/net/mlx5/mlx5_devx_cmds.c
+++ b/drivers/net/mlx5/mlx5_devx_cmds.c
@@ -334,6 +334,7 @@ struct mlx5_devx_obj *
 						    log_max_hairpin_wq_data_sz);
 	attr->log_max_hairpin_num_packets = MLX5_GET
 		(cmd_hca_cap, hcattr, log_min_hairpin_wq_data_sz);
+	attr->vhca_id = MLX5_GET(cmd_hca_cap, hcattr, vhca_id);
 	attr->eth_net_offloads = MLX5_GET(cmd_hca_cap, hcattr,
 					  eth_net_offloads);
 	attr->eth_virt = MLX5_GET(cmd_hca_cap, hcattr, eth_virt);
diff --git a/drivers/net/mlx5/mlx5_prm.h b/drivers/net/mlx5/mlx5_prm.h
index faa7996..d4084db 100644
--- a/drivers/net/mlx5/mlx5_prm.h
+++ b/drivers/net/mlx5/mlx5_prm.h
@@ -1611,6 +1611,12 @@ struct mlx5_ifc_create_rqt_in_bits {
 #pragma GCC diagnostic error "-Wpedantic"
 #endif
 
+enum {
+	MLX5_SQC_STATE_RST  = 0x0,
+	MLX5_SQC_STATE_RDY  = 0x1,
+	MLX5_SQC_STATE_ERR  = 0x3,
+};
+
 struct mlx5_ifc_sqc_bits {
 	u8 rlky[0x1];
 	u8 cd_master[0x1];
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index 3ec86c4..a4fcdb3 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -162,6 +162,96 @@
 }
 
 /**
+ * Binds Tx queues to Rx queues for hairpin.
+ *
+ * Binds Tx queues to the target Rx queues.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx5_hairpin_bind(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_devx_modify_sq_attr sq_attr = { 0 };
+	struct mlx5_devx_modify_rq_attr rq_attr = { 0 };
+	struct mlx5_txq_ctrl *txq_ctrl;
+	struct mlx5_rxq_ctrl *rxq_ctrl;
+	struct mlx5_devx_obj *sq;
+	struct mlx5_devx_obj *rq;
+	unsigned int i;
+	int ret = 0;
+
+	for (i = 0; i != priv->txqs_n; ++i) {
+		txq_ctrl = mlx5_txq_get(dev, i);
+		if (!txq_ctrl)
+			continue;
+		if (txq_ctrl->type != MLX5_TXQ_TYPE_HAIRPIN) {
+			mlx5_txq_release(dev, i);
+			continue;
+		}
+		if (!txq_ctrl->obj) {
+			rte_errno = ENOMEM;
+			DRV_LOG(ERR, "port %u no txq object found: %d",
+				dev->data->port_id, i);
+			mlx5_txq_release(dev, i);
+			return -rte_errno;
+		}
+		sq = txq_ctrl->obj->sq;
+		rxq_ctrl = mlx5_rxq_get(dev,
+					txq_ctrl->hairpin_conf.peers[0].queue);
+		if (!rxq_ctrl) {
+			mlx5_txq_release(dev, i);
+			rte_errno = EINVAL;
+			DRV_LOG(ERR, "port %u no rxq object found: %d",
+				dev->data->port_id,
+				txq_ctrl->hairpin_conf.peers[0].queue);
+			return -rte_errno;
+		}
+		if (rxq_ctrl->type != MLX5_RXQ_TYPE_HAIRPIN ||
+		    rxq_ctrl->hairpin_conf.peers[0].queue != i) {
+			rte_errno = ENOMEM;
+			DRV_LOG(ERR, "port %u Tx queue %d can't be binded to "
+				"Rx queue %d", dev->data->port_id,
+				i, txq_ctrl->hairpin_conf.peers[0].queue);
+			goto error;
+		}
+		rq = rxq_ctrl->obj->rq;
+		if (!rq) {
+			rte_errno = ENOMEM;
+			DRV_LOG(ERR, "port %u hairpin no matching rxq: %d",
+				dev->data->port_id,
+				txq_ctrl->hairpin_conf.peers[0].queue);
+			goto error;
+		}
+		sq_attr.state = MLX5_SQC_STATE_RDY;
+		sq_attr.sq_state = MLX5_SQC_STATE_RST;
+		sq_attr.hairpin_peer_rq = rq->id;
+		sq_attr.hairpin_peer_vhca = priv->config.hca_attr.vhca_id;
+		ret = mlx5_devx_cmd_modify_sq(sq, &sq_attr);
+		if (ret)
+			goto error;
+		rq_attr.state = MLX5_SQC_STATE_RDY;
+		rq_attr.rq_state = MLX5_SQC_STATE_RST;
+		rq_attr.hairpin_peer_sq = sq->id;
+		rq_attr.hairpin_peer_vhca = priv->config.hca_attr.vhca_id;
+		ret = mlx5_devx_cmd_modify_rq(rq, &rq_attr);
+		if (ret)
+			goto error;
+		mlx5_txq_release(dev, i);
+		mlx5_rxq_release(dev, txq_ctrl->hairpin_conf.peers[0].queue);
+	}
+	return 0;
+error:
+	mlx5_txq_release(dev, i);
+	mlx5_rxq_release(dev, txq_ctrl->hairpin_conf.peers[0].queue);
+	return -rte_errno;
+}
+
+/**
  * DPDK callback to start the device.
  *
  * Simulate device start by attaching all configured flows.
@@ -192,6 +282,13 @@
 		mlx5_txq_stop(dev);
 		return -rte_errno;
 	}
+	ret = mlx5_hairpin_bind(dev);
+	if (ret) {
+		DRV_LOG(ERR, "port %u hairpin binding failed: %s",
+			dev->data->port_id, strerror(rte_errno));
+		mlx5_txq_stop(dev);
+		return -rte_errno;
+	}
 	dev->data->dev_started = 1;
 	ret = mlx5_rx_intr_vec_enable(dev);
 	if (ret) {
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 186+ messages in thread
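
Condensed from mlx5_hairpin_bind() above, the per-pair bind order is:
point the SQ at its peer RQ and move it RST->RDY, then do the same for
the RQ (a sketch, with error handling as in the patch):

	sq_attr.sq_state = MLX5_SQC_STATE_RST;       /* current state */
	sq_attr.state = MLX5_SQC_STATE_RDY;          /* target state */
	sq_attr.hairpin_peer_rq = rq->id;
	sq_attr.hairpin_peer_vhca = priv->config.hca_attr.vhca_id;
	if (mlx5_devx_cmd_modify_sq(sq, &sq_attr))
		goto error;
	rq_attr.rq_state = MLX5_SQC_STATE_RST;
	rq_attr.state = MLX5_SQC_STATE_RDY;
	rq_attr.hairpin_peer_sq = sq->id;
	rq_attr.hairpin_peer_vhca = priv->config.hca_attr.vhca_id;
	if (mlx5_devx_cmd_modify_rq(rq, &rq_attr))
		goto error;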

* [dpdk-dev] [PATCH v3 09/14] net/mlx5: add support for hairpin hrxq
  2019-10-15  9:04 ` [dpdk-dev] [PATCH v3 00/14] " Ori Kam
                     ` (7 preceding siblings ...)
  2019-10-15  9:04   ` [dpdk-dev] [PATCH v3 08/14] net/mlx5: add hairpin binding function Ori Kam
@ 2019-10-15  9:04   ` Ori Kam
  2019-10-15  9:04   ` [dpdk-dev] [PATCH v3 10/14] net/mlx5: add internal tag item and action Ori Kam
                     ` (4 subsequent siblings)
  13 siblings, 0 replies; 186+ messages in thread
From: Ori Kam @ 2019-10-15  9:04 UTC (permalink / raw)
  To: Matan Azrad, Shahaf Shuler, Viacheslav Ovsiienko
  Cc: dev, orika, jingjing.wu, stephen

Add support for RSS on hairpin queues.

Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>

---
 drivers/net/mlx5/mlx5.h         |   3 ++
 drivers/net/mlx5/mlx5_ethdev.c  | 102 ++++++++++++++++++++++++++++++----------
 drivers/net/mlx5/mlx5_rss.c     |   1 +
 drivers/net/mlx5/mlx5_rxq.c     |  22 ++++++---
 drivers/net/mlx5/mlx5_trigger.c |   6 +++
 5 files changed, 104 insertions(+), 30 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index a43accf..391ae2c 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -711,6 +711,7 @@ struct mlx5_priv {
 	rte_spinlock_t uar_lock[MLX5_UAR_PAGE_NUM_MAX];
 	/* UAR same-page access control required in 32bit implementations. */
 #endif
+	uint8_t skip_default_rss_reta; /* Skip configuration of default reta. */
 };
 
 #define PORT_ID(priv) ((priv)->dev_data->port_id)
@@ -785,6 +786,8 @@ int mlx5_get_module_eeprom(struct rte_eth_dev *dev,
 			   struct rte_dev_eeprom_info *info);
 int mlx5_hairpin_cap_get(struct rte_eth_dev *dev,
 			 struct rte_eth_hairpin_cap *cap);
+int mlx5_dev_configure_rss_reta(struct rte_eth_dev *dev);
+
 /* mlx5_mac.c */
 
 int mlx5_get_mac(struct rte_eth_dev *dev, uint8_t (*mac)[RTE_ETHER_ADDR_LEN]);
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index 95c70f7..5b811e8 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -383,9 +383,6 @@ struct ethtool_link_settings {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	unsigned int rxqs_n = dev->data->nb_rx_queues;
 	unsigned int txqs_n = dev->data->nb_tx_queues;
-	unsigned int i;
-	unsigned int j;
-	unsigned int reta_idx_n;
 	const uint8_t use_app_rss_key =
 		!!dev->data->dev_conf.rx_adv_conf.rss_conf.rss_key;
 	int ret = 0;
@@ -431,28 +428,8 @@ struct ethtool_link_settings {
 		DRV_LOG(INFO, "port %u Rx queues number update: %u -> %u",
 			dev->data->port_id, priv->rxqs_n, rxqs_n);
 		priv->rxqs_n = rxqs_n;
-		/*
-		 * If the requested number of RX queues is not a power of two,
-		 * use the maximum indirection table size for better balancing.
-		 * The result is always rounded to the next power of two.
-		 */
-		reta_idx_n = (1 << log2above((rxqs_n & (rxqs_n - 1)) ?
-					     priv->config.ind_table_max_size :
-					     rxqs_n));
-		ret = mlx5_rss_reta_index_resize(dev, reta_idx_n);
-		if (ret)
-			return ret;
-		/*
-		 * When the number of RX queues is not a power of two,
-		 * the remaining table entries are padded with reused WQs
-		 * and hashes are not spread uniformly.
-		 */
-		for (i = 0, j = 0; (i != reta_idx_n); ++i) {
-			(*priv->reta_idx)[i] = j;
-			if (++j == rxqs_n)
-				j = 0;
-		}
 	}
+	priv->skip_default_rss_reta = 0;
 	ret = mlx5_proc_priv_init(dev);
 	if (ret)
 		return ret;
@@ -460,6 +437,83 @@ struct ethtool_link_settings {
 }
 
 /**
+ * Configure default RSS reta.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_dev_configure_rss_reta(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	unsigned int rxqs_n = dev->data->nb_rx_queues;
+	unsigned int i;
+	unsigned int j;
+	unsigned int reta_idx_n;
+	int ret = 0;
+	unsigned int *rss_queue_arr = NULL;
+	unsigned int rss_queue_n = 0;
+
+	if (priv->skip_default_rss_reta)
+		return ret;
+	rss_queue_arr = rte_malloc("", rxqs_n * sizeof(unsigned int), 0);
+	if (!rss_queue_arr) {
+		DRV_LOG(ERR, "port %u cannot allocate RSS queue list (%u)",
+			dev->data->port_id, rxqs_n);
+		rte_errno = ENOMEM;
+		return -rte_errno;
+	}
+	for (i = 0, j = 0; i < rxqs_n; i++) {
+		struct mlx5_rxq_data *rxq_data;
+		struct mlx5_rxq_ctrl *rxq_ctrl;
+
+		rxq_data = (*priv->rxqs)[i];
+		rxq_ctrl = container_of(rxq_data, struct mlx5_rxq_ctrl, rxq);
+		if (rxq_ctrl->type == MLX5_RXQ_TYPE_STANDARD)
+			rss_queue_arr[j++] = i;
+	}
+	rss_queue_n = j;
+	if (rss_queue_n > priv->config.ind_table_max_size) {
+		DRV_LOG(ERR, "port %u cannot handle this many Rx queues (%u)",
+			dev->data->port_id, rss_queue_n);
+		rte_errno = EINVAL;
+		rte_free(rss_queue_arr);
+		return -rte_errno;
+	}
+	DRV_LOG(INFO, "port %u Rx queues number update: %u -> %u",
+		dev->data->port_id, priv->rxqs_n, rxqs_n);
+	priv->rxqs_n = rxqs_n;
+	/*
+	 * If the requested number of RX queues is not a power of two,
+	 * use the maximum indirection table size for better balancing.
+	 * The result is always rounded to the next power of two.
+	 */
+	reta_idx_n = (1 << log2above((rss_queue_n & (rss_queue_n - 1)) ?
+				priv->config.ind_table_max_size :
+				rss_queue_n));
+	ret = mlx5_rss_reta_index_resize(dev, reta_idx_n);
+	if (ret) {
+		rte_free(rss_queue_arr);
+		return ret;
+	}
+	/*
+	 * When the number of RX queues is not a power of two,
+	 * the remaining table entries are padded with reused WQs
+	 * and hashes are not spread uniformly.
+	 */
+	for (i = 0, j = 0; (i != reta_idx_n); ++i) {
+		(*priv->reta_idx)[i] = rss_queue_arr[j];
+		if (++j == rss_queue_n)
+			j = 0;
+	}
+	rte_free(rss_queue_arr);
+	return ret;
+}
+
+/**
  * Sets default tuning parameters.
  *
  * @param dev
diff --git a/drivers/net/mlx5/mlx5_rss.c b/drivers/net/mlx5/mlx5_rss.c
index 891d764..1028264 100644
--- a/drivers/net/mlx5/mlx5_rss.c
+++ b/drivers/net/mlx5/mlx5_rss.c
@@ -223,6 +223,7 @@
 	}
 	if (dev->data->dev_started) {
 		mlx5_dev_stop(dev);
+		priv->skip_default_rss_reta = 1;
 		return mlx5_dev_start(dev);
 	}
 	return 0;
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index 66596df..a8ff8b2 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -2156,9 +2156,13 @@ struct mlx5_rxq_ctrl *
 		}
 	} else { /* ind_tbl->type == MLX5_IND_TBL_TYPE_DEVX */
 		struct mlx5_devx_rqt_attr *rqt_attr = NULL;
+		const unsigned int rqt_n =
+			1 << (rte_is_power_of_2(queues_n) ?
+			      log2above(queues_n) :
+			      log2above(priv->config.ind_table_max_size));
 
 		rqt_attr = rte_calloc(__func__, 1, sizeof(*rqt_attr) +
-				      queues_n * sizeof(uint32_t), 0);
+				      rqt_n * sizeof(uint32_t), 0);
 		if (!rqt_attr) {
 			DRV_LOG(ERR, "port %u cannot allocate RQT resources",
 				dev->data->port_id);
@@ -2166,7 +2170,7 @@ struct mlx5_rxq_ctrl *
 			goto error;
 		}
 		rqt_attr->rqt_max_size = priv->config.ind_table_max_size;
-		rqt_attr->rqt_actual_size = queues_n;
+		rqt_attr->rqt_actual_size = rqt_n;
 		for (i = 0; i != queues_n; ++i) {
 			struct mlx5_rxq_ctrl *rxq = mlx5_rxq_get(dev,
 								 queues[i]);
@@ -2175,6 +2179,9 @@ struct mlx5_rxq_ctrl *
 			rqt_attr->rq_list[i] = rxq->obj->rq->id;
 			ind_tbl->queues[i] = queues[i];
 		}
+		k = i; /* Retain value of i for use in error case. */
+		for (j = 0; k != rqt_n; ++k, ++j)
+			rqt_attr->rq_list[k] = rqt_attr->rq_list[j];
 		ind_tbl->rqt = mlx5_devx_cmd_create_rqt(priv->sh->ctx,
 							rqt_attr);
 		rte_free(rqt_attr);
@@ -2328,13 +2335,13 @@ struct mlx5_hrxq *
 	struct mlx5_ind_table_obj *ind_tbl;
 	int err;
 	struct mlx5_devx_obj *tir = NULL;
+	struct mlx5_rxq_data *rxq_data = (*priv->rxqs)[queues[0]];
+	struct mlx5_rxq_ctrl *rxq_ctrl =
+		container_of(rxq_data, struct mlx5_rxq_ctrl, rxq);
 
 	queues_n = hash_fields ? queues_n : 1;
 	ind_tbl = mlx5_ind_table_obj_get(dev, queues, queues_n);
 	if (!ind_tbl) {
-		struct mlx5_rxq_data *rxq_data = (*priv->rxqs)[queues[0]];
-		struct mlx5_rxq_ctrl *rxq_ctrl =
-			container_of(rxq_data, struct mlx5_rxq_ctrl, rxq);
 		enum mlx5_ind_tbl_type type;
 
 		type = rxq_ctrl->obj->type == MLX5_RXQ_OBJ_TYPE_IBV ?
@@ -2430,7 +2437,10 @@ struct mlx5_hrxq *
 		tir_attr.rx_hash_fn = MLX5_RX_HASH_FN_TOEPLITZ;
 		memcpy(&tir_attr.rx_hash_field_selector_outer, &hash_fields,
 		       sizeof(uint64_t));
-		tir_attr.transport_domain = priv->sh->tdn;
+		if (rxq_ctrl->obj->type == MLX5_RXQ_OBJ_TYPE_DEVX_HAIRPIN)
+			tir_attr.transport_domain = priv->sh->td->id;
+		else
+			tir_attr.transport_domain = priv->sh->tdn;
 		memcpy(tir_attr.rx_hash_toeplitz_key, rss_key, rss_key_len);
 		tir_attr.indirect_table = ind_tbl->rqt->id;
 		if (dev->data->dev_conf.lpbk_mode)
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index a4fcdb3..f66b6ee 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -269,6 +269,12 @@
 	int ret;
 
 	DRV_LOG(DEBUG, "port %u starting device", dev->data->port_id);
+	ret = mlx5_dev_configure_rss_reta(dev);
+	if (ret) {
+		DRV_LOG(ERR, "port %u reta config failed: %s",
+			dev->data->port_id, strerror(rte_errno));
+		return -rte_errno;
+	}
 	ret = mlx5_txq_start(dev);
 	if (ret) {
 		DRV_LOG(ERR, "port %u Tx queue allocation failed: %s",
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 186+ messages in thread
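
A worked example of the indirection table sizing in
mlx5_dev_configure_rss_reta() above, assuming ind_table_max_size is
512:

	/* rss_queue_n = 4 (power of two):
	 *     4 & 3 == 0  ->  reta_idx_n = 1 << log2above(4)   = 4
	 * rss_queue_n = 6 (not a power of two):
	 *     6 & 5 != 0  ->  reta_idx_n = 1 << log2above(512) = 512
	 * The 512 entries are filled with queues 0..5 repeated, so
	 * hashes spread as evenly as possible; hairpin queues were
	 * already filtered out of rss_queue_arr.
	 */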

* [dpdk-dev] [PATCH v3 10/14] net/mlx5: add internal tag item and action
  2019-10-15  9:04 ` [dpdk-dev] [PATCH v3 00/14] " Ori Kam
                     ` (8 preceding siblings ...)
  2019-10-15  9:04   ` [dpdk-dev] [PATCH v3 09/14] net/mlx5: add support for hairpin hrxq Ori Kam
@ 2019-10-15  9:04   ` Ori Kam
  2019-10-15  9:04   ` [dpdk-dev] [PATCH v3 11/14] net/mlx5: add id generation function Ori Kam
                     ` (3 subsequent siblings)
  13 siblings, 0 replies; 186+ messages in thread
From: Ori Kam @ 2019-10-15  9:04 UTC (permalink / raw)
  To: Matan Azrad, Shahaf Shuler, Viacheslav Ovsiienko
  Cc: dev, orika, jingjing.wu, stephen

This commit introduces setting and matching on registers.
The item and action will be used by a number of different
features like hairpin, metering and metadata.

Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>

---
 drivers/net/mlx5/mlx5_flow.c    |  52 +++++++++++++
 drivers/net/mlx5/mlx5_flow.h    |  54 ++++++++++++--
 drivers/net/mlx5/mlx5_flow_dv.c | 158 +++++++++++++++++++++++++++++++++++++++-
 drivers/net/mlx5/mlx5_prm.h     |   3 +-
 4 files changed, 257 insertions(+), 10 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index 578d003..b4bcd1a 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -316,6 +316,58 @@ struct mlx5_flow_tunnel_info {
 	},
 };
 
+enum mlx5_feature_name {
+	MLX5_HAIRPIN_RX,
+	MLX5_HAIRPIN_TX,
+	MLX5_APPLICATION,
+};
+
+/**
+ * Translate tag ID to register.
+ *
+ * @param[in] dev
+ *   Pointer to the Ethernet device structure.
+ * @param[in] feature
+ *   The feature that request the register.
+ * @param[in] id
+ *   The request register ID.
+ * @param[out] error
+ *   Error description in case of any.
+ *
+ * @return
+ *   The request register on success, a negative errno
+ *   value otherwise and rte_errno is set.
+ */
+__rte_unused
+static enum modify_reg flow_get_reg_id(struct rte_eth_dev *dev,
+				       enum mlx5_feature_name feature,
+				       uint32_t id,
+				       struct rte_flow_error *error)
+{
+	static enum modify_reg id2reg[] = {
+		[0] = REG_A,
+		[1] = REG_C_2,
+		[2] = REG_C_3,
+		[3] = REG_C_4,
+		[4] = REG_B,};
+
+	dev = (void *)dev;
+	switch (feature) {
+	case MLX5_HAIRPIN_RX:
+		return REG_B;
+	case MLX5_HAIRPIN_TX:
+		return REG_A;
+	case MLX5_APPLICATION:
+		if (id > 4)
+			return rte_flow_error_set(error, EINVAL,
+						  RTE_FLOW_ERROR_TYPE_ITEM,
+						  NULL, "invalid tag id");
+		return id2reg[id];
+	}
+	return rte_flow_error_set(error, EINVAL, RTE_FLOW_ERROR_TYPE_ITEM,
+				  NULL, "invalid feature name");
+}
+
 /**
  * Discover the maximum number of priority available.
  *
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 235bccd..0148c1b 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -27,6 +27,43 @@
 #include "mlx5.h"
 #include "mlx5_prm.h"
 
+enum modify_reg {
+	REG_A,
+	REG_B,
+	REG_C_0,
+	REG_C_1,
+	REG_C_2,
+	REG_C_3,
+	REG_C_4,
+	REG_C_5,
+	REG_C_6,
+	REG_C_7,
+};
+
+/* Private rte flow items. */
+enum mlx5_rte_flow_item_type {
+	MLX5_RTE_FLOW_ITEM_TYPE_END = INT_MIN,
+	MLX5_RTE_FLOW_ITEM_TYPE_TAG,
+};
+
+/* Private rte flow actions. */
+enum mlx5_rte_flow_action_type {
+	MLX5_RTE_FLOW_ACTION_TYPE_END = INT_MIN,
+	MLX5_RTE_FLOW_ACTION_TYPE_TAG,
+};
+
+/* Matches on selected register. */
+struct mlx5_rte_flow_item_tag {
+	uint16_t id;
+	rte_be32_t data;
+};
+
+/* Modify selected register. */
+struct mlx5_rte_flow_action_set_tag {
+	uint16_t id;
+	rte_be32_t data;
+};
+
 /* Pattern outer Layer bits. */
 #define MLX5_FLOW_LAYER_OUTER_L2 (1u << 0)
 #define MLX5_FLOW_LAYER_OUTER_L3_IPV4 (1u << 1)
@@ -53,16 +90,17 @@
 /* General pattern items bits. */
 #define MLX5_FLOW_ITEM_METADATA (1u << 16)
 #define MLX5_FLOW_ITEM_PORT_ID (1u << 17)
+#define MLX5_FLOW_ITEM_TAG (1u << 18)
 
 /* Pattern MISC bits. */
-#define MLX5_FLOW_LAYER_ICMP (1u << 18)
-#define MLX5_FLOW_LAYER_ICMP6 (1u << 19)
-#define MLX5_FLOW_LAYER_GRE_KEY (1u << 20)
+#define MLX5_FLOW_LAYER_ICMP (1u << 19)
+#define MLX5_FLOW_LAYER_ICMP6 (1u << 20)
+#define MLX5_FLOW_LAYER_GRE_KEY (1u << 21)
 
 /* Pattern tunnel Layer bits (continued). */
-#define MLX5_FLOW_LAYER_IPIP (1u << 21)
-#define MLX5_FLOW_LAYER_IPV6_ENCAP (1u << 22)
-#define MLX5_FLOW_LAYER_NVGRE (1u << 23)
+#define MLX5_FLOW_LAYER_IPIP (1u << 22)
+#define MLX5_FLOW_LAYER_IPV6_ENCAP (1u << 23)
+#define MLX5_FLOW_LAYER_NVGRE (1u << 24)
 
 /* Outer Masks. */
 #define MLX5_FLOW_LAYER_OUTER_L3 \
@@ -139,6 +177,7 @@
 #define MLX5_FLOW_ACTION_DEC_TCP_SEQ (1u << 29)
 #define MLX5_FLOW_ACTION_INC_TCP_ACK (1u << 30)
 #define MLX5_FLOW_ACTION_DEC_TCP_ACK (1u << 31)
+#define MLX5_FLOW_ACTION_SET_TAG (1ull << 32)
 
 #define MLX5_FLOW_FATE_ACTIONS \
 	(MLX5_FLOW_ACTION_DROP | MLX5_FLOW_ACTION_QUEUE | \
@@ -172,7 +211,8 @@
 				      MLX5_FLOW_ACTION_DEC_TCP_SEQ | \
 				      MLX5_FLOW_ACTION_INC_TCP_ACK | \
 				      MLX5_FLOW_ACTION_DEC_TCP_ACK | \
-				      MLX5_FLOW_ACTION_OF_SET_VLAN_VID)
+				      MLX5_FLOW_ACTION_OF_SET_VLAN_VID | \
+				      MLX5_FLOW_ACTION_SET_TAG)
 
 #define MLX5_FLOW_VLAN_ACTIONS (MLX5_FLOW_ACTION_OF_POP_VLAN | \
 				MLX5_FLOW_ACTION_OF_PUSH_VLAN)
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index f0422dc..dde6673 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -723,6 +723,59 @@ struct field_modify_info modify_tcp[] = {
 					     MLX5_MODIFICATION_TYPE_ADD, error);
 }
 
+static enum mlx5_modification_field reg_to_field[] = {
+	[REG_A] = MLX5_MODI_META_DATA_REG_A,
+	[REG_B] = MLX5_MODI_META_DATA_REG_B,
+	[REG_C_0] = MLX5_MODI_META_REG_C_0,
+	[REG_C_1] = MLX5_MODI_META_REG_C_1,
+	[REG_C_2] = MLX5_MODI_META_REG_C_2,
+	[REG_C_3] = MLX5_MODI_META_REG_C_3,
+	[REG_C_4] = MLX5_MODI_META_REG_C_4,
+	[REG_C_5] = MLX5_MODI_META_REG_C_5,
+	[REG_C_6] = MLX5_MODI_META_REG_C_6,
+	[REG_C_7] = MLX5_MODI_META_REG_C_7,
+};
+
+/**
+ * Convert register set to DV specification.
+ *
+ * @param[in,out] resource
+ *   Pointer to the modify-header resource.
+ * @param[in] action
+ *   Pointer to action specification.
+ * @param[out] error
+ *   Pointer to the error structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+flow_dv_convert_action_set_reg
+			(struct mlx5_flow_dv_modify_hdr_resource *resource,
+			 const struct rte_flow_action *action,
+			 struct rte_flow_error *error)
+{
+	const struct mlx5_rte_flow_action_set_tag *conf = (action->conf);
+	struct mlx5_modification_cmd *actions = resource->actions;
+	uint32_t i = resource->actions_num;
+
+	if (i >= MLX5_MODIFY_NUM)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_ACTION, NULL,
+					  "too many items to modify");
+	actions[i].action_type = MLX5_MODIFICATION_TYPE_SET;
+	actions[i].field = reg_to_field[conf->id];
+	actions[i].data0 = rte_cpu_to_be_32(actions[i].data0);
+	actions[i].data1 = conf->data;
+	++i;
+	resource->actions_num = i;
+	if (!resource->actions_num)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_ACTION, NULL,
+					  "invalid modification flow item");
+	return 0;
+}
+
 /**
  * Validate META item.
  *
@@ -4640,6 +4693,94 @@ struct field_modify_info modify_tcp[] = {
 }
 
 /**
+ * Add tag item to matcher
+ *
+ * @param[in, out] matcher
+ *   Flow matcher.
+ * @param[in, out] key
+ *   Flow matcher value.
+ * @param[in] item
+ *   Flow pattern to translate.
+ */
+static void
+flow_dv_translate_item_tag(void *matcher, void *key,
+			   const struct rte_flow_item *item)
+{
+	void *misc2_m =
+		MLX5_ADDR_OF(fte_match_param, matcher, misc_parameters_2);
+	void *misc2_v =
+		MLX5_ADDR_OF(fte_match_param, key, misc_parameters_2);
+	const struct mlx5_rte_flow_item_tag *tag_v = item->spec;
+	const struct mlx5_rte_flow_item_tag *tag_m = item->mask;
+	enum modify_reg reg = tag_v->id;
+	rte_be32_t value = tag_v->data;
+	rte_be32_t mask = tag_m->data;
+
+	switch (reg) {
+	case REG_A:
+		MLX5_SET(fte_match_set_misc2, misc2_m, metadata_reg_a,
+				rte_be_to_cpu_32(mask));
+		MLX5_SET(fte_match_set_misc2, misc2_v, metadata_reg_a,
+				rte_be_to_cpu_32(value));
+		break;
+	case REG_B:
+		MLX5_SET(fte_match_set_misc2, misc2_m, metadata_reg_b,
+				 rte_be_to_cpu_32(mask));
+		MLX5_SET(fte_match_set_misc2, misc2_v, metadata_reg_b,
+				rte_be_to_cpu_32(value));
+		break;
+	case REG_C_0:
+		MLX5_SET(fte_match_set_misc2, misc2_m, metadata_reg_c_0,
+				 rte_be_to_cpu_32(mask));
+		MLX5_SET(fte_match_set_misc2, misc2_v, metadata_reg_c_0,
+				rte_be_to_cpu_32(value));
+		break;
+	case REG_C_1:
+		MLX5_SET(fte_match_set_misc2, misc2_m, metadata_reg_c_1,
+				 rte_be_to_cpu_32(mask));
+		MLX5_SET(fte_match_set_misc2, misc2_v, metadata_reg_c_1,
+				rte_be_to_cpu_32(value));
+		break;
+	case REG_C_2:
+		MLX5_SET(fte_match_set_misc2, misc2_m, metadata_reg_c_2,
+				 rte_be_to_cpu_32(mask));
+		MLX5_SET(fte_match_set_misc2, misc2_v, metadata_reg_c_2,
+				rte_be_to_cpu_32(value));
+		break;
+	case REG_C_3:
+		MLX5_SET(fte_match_set_misc2, misc2_m, metadata_reg_c_3,
+				 rte_be_to_cpu_32(mask));
+		MLX5_SET(fte_match_set_misc2, misc2_v, metadata_reg_c_3,
+				rte_be_to_cpu_32(value));
+		break;
+	case REG_C_4:
+		MLX5_SET(fte_match_set_misc2, misc2_m, metadata_reg_c_4,
+				 rte_be_to_cpu_32(mask));
+		MLX5_SET(fte_match_set_misc2, misc2_v, metadata_reg_c_4,
+				rte_be_to_cpu_32(value));
+		break;
+	case REG_C_5:
+		MLX5_SET(fte_match_set_misc2, misc2_m, metadata_reg_c_5,
+				 rte_be_to_cpu_32(mask));
+		MLX5_SET(fte_match_set_misc2, misc2_v, metadata_reg_c_5,
+				rte_be_to_cpu_32(value));
+		break;
+	case REG_C_6:
+		MLX5_SET(fte_match_set_misc2, misc2_m, metadata_reg_c_6,
+				 rte_be_to_cpu_32(mask));
+		MLX5_SET(fte_match_set_misc2, misc2_v, metadata_reg_c_6,
+				rte_be_to_cpu_32(value));
+		break;
+	case REG_C_7:
+		MLX5_SET(fte_match_set_misc2, misc2_m, metadata_reg_c_7,
+				 rte_be_to_cpu_32(mask));
+		MLX5_SET(fte_match_set_misc2, misc2_v, metadata_reg_c_7,
+				rte_be_to_cpu_32(value));
+		break;
+	}
+}
+
+/**
  * Add source vport match to the specified matcher.
  *
  * @param[in, out] matcher
@@ -5225,8 +5366,9 @@ struct field_modify_info modify_tcp[] = {
 		struct mlx5_flow_tbl_resource *tbl;
 		uint32_t port_id = 0;
 		struct mlx5_flow_dv_port_id_action_resource port_id_resource;
+		int action_type = actions->type;
 
-		switch (actions->type) {
+		switch (action_type) {
 		case RTE_FLOW_ACTION_TYPE_VOID:
 			break;
 		case RTE_FLOW_ACTION_TYPE_PORT_ID:
@@ -5541,6 +5683,12 @@ struct field_modify_info modify_tcp[] = {
 					MLX5_FLOW_ACTION_INC_TCP_ACK :
 					MLX5_FLOW_ACTION_DEC_TCP_ACK;
 			break;
+		case MLX5_RTE_FLOW_ACTION_TYPE_TAG:
+			if (flow_dv_convert_action_set_reg(&res, actions,
+							   error))
+				return -rte_errno;
+			action_flags |= MLX5_FLOW_ACTION_SET_TAG;
+			break;
 		case RTE_FLOW_ACTION_TYPE_END:
 			actions_end = true;
 			if (action_flags & MLX5_FLOW_MODIFY_HDR_ACTIONS) {
@@ -5565,8 +5713,9 @@ struct field_modify_info modify_tcp[] = {
 	flow->actions = action_flags;
 	for (; items->type != RTE_FLOW_ITEM_TYPE_END; items++) {
 		int tunnel = !!(item_flags & MLX5_FLOW_LAYER_TUNNEL);
+		int item_type = items->type;
 
-		switch (items->type) {
+		switch (item_type) {
 		case RTE_FLOW_ITEM_TYPE_PORT_ID:
 			flow_dv_translate_item_port_id(dev, match_mask,
 						       match_value, items);
@@ -5712,6 +5861,11 @@ struct field_modify_info modify_tcp[] = {
 						      items, tunnel);
 			last_item = MLX5_FLOW_LAYER_ICMP6;
 			break;
+		case MLX5_RTE_FLOW_ITEM_TYPE_TAG:
+			flow_dv_translate_item_tag(match_mask, match_value,
+						   items);
+			last_item = MLX5_FLOW_ITEM_TAG;
+			break;
 		default:
 			break;
 		}
diff --git a/drivers/net/mlx5/mlx5_prm.h b/drivers/net/mlx5/mlx5_prm.h
index d4084db..695578f 100644
--- a/drivers/net/mlx5/mlx5_prm.h
+++ b/drivers/net/mlx5/mlx5_prm.h
@@ -623,7 +623,8 @@ struct mlx5_ifc_fte_match_set_misc2_bits {
 	u8 metadata_reg_c_1[0x20];
 	u8 metadata_reg_c_0[0x20];
 	u8 metadata_reg_a[0x20];
-	u8 reserved_at_1a0[0x60];
+	u8 metadata_reg_b[0x20];
+	u8 reserved_at_1c0[0x40];
 };
 
 struct mlx5_ifc_fte_match_set_misc3_bits {
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 186+ messages in thread
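
A sketch of how the private action is meant to be used when building
internal flows (register choice and data are illustrative; the cast is
needed because the private enum values start at INT_MIN):

	struct mlx5_rte_flow_action_set_tag set_tag = {
		.id = REG_C_2,                    /* illustrative register */
		.data = rte_cpu_to_be_32(0x1234), /* value, big-endian */
	};
	const struct rte_flow_action actions[] = {
		{
			.type = (enum rte_flow_action_type)
				MLX5_RTE_FLOW_ACTION_TYPE_TAG,
			.conf = &set_tag,
		},
		{ .type = RTE_FLOW_ACTION_TYPE_END },
	};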

* [dpdk-dev] [PATCH v3 11/14] net/mlx5: add id generation function
  2019-10-15  9:04 ` [dpdk-dev] [PATCH v3 00/14] " Ori Kam
                     ` (9 preceding siblings ...)
  2019-10-15  9:04   ` [dpdk-dev] [PATCH v3 10/14] net/mlx5: add internal tag item and action Ori Kam
@ 2019-10-15  9:04   ` Ori Kam
  2019-10-15  9:04   ` [dpdk-dev] [PATCH v3 12/14] net/mlx5: add default flows for hairpin Ori Kam
                     ` (2 subsequent siblings)
  13 siblings, 0 replies; 186+ messages in thread
From: Ori Kam @ 2019-10-15  9:04 UTC (permalink / raw)
  To: Matan Azrad, Shahaf Shuler, Viacheslav Ovsiienko
  Cc: dev, orika, jingjing.wu, stephen

When splitting flows, for example in hairpin / metering, there is a need
to combine the flows. This is done using an ID.
This commit introduces a simple way to generate such IDs.

A bitmap was not used because its release and allocation
are O(n), while in the chosen approach both allocation and
release are O(1).

Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>

---
 drivers/net/mlx5/mlx5.c      | 120 ++++++++++++++++++++++++++++++++++++++++++-
 drivers/net/mlx5/mlx5_flow.h |  14 +++++
 2 files changed, 133 insertions(+), 1 deletion(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 7962936..0c3239c 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -179,6 +179,124 @@ struct mlx5_dev_spawn_data {
 static LIST_HEAD(, mlx5_ibv_shared) mlx5_ibv_list = LIST_HEAD_INITIALIZER();
 static pthread_mutex_t mlx5_ibv_list_mutex = PTHREAD_MUTEX_INITIALIZER;
 
+#define MLX5_FLOW_MIN_ID_POOL_SIZE 512
+#define MLX5_ID_GENERATION_ARRAY_FACTOR 16
+
+/**
+ * Allocate ID pool structure.
+ *
+ * @return
+ *   Pointer to pool object, NULL value otherwise.
+ */
+struct mlx5_flow_id_pool *
+mlx5_flow_id_pool_alloc(void)
+{
+	struct mlx5_flow_id_pool *pool;
+	void *mem;
+
+	pool = rte_zmalloc("id pool allocation", sizeof(*pool),
+			   RTE_CACHE_LINE_SIZE);
+	if (!pool) {
+		DRV_LOG(ERR, "can't allocate id pool");
+		rte_errno  = ENOMEM;
+		return NULL;
+	}
+	mem = rte_zmalloc("", MLX5_FLOW_MIN_ID_POOL_SIZE * sizeof(uint32_t),
+			  RTE_CACHE_LINE_SIZE);
+	if (!mem) {
+		DRV_LOG(ERR, "can't allocate mem for id pool");
+		rte_errno  = ENOMEM;
+		goto error;
+	}
+	pool->free_arr = mem;
+	pool->curr = pool->free_arr;
+	pool->last = pool->free_arr + MLX5_FLOW_MIN_ID_POOL_SIZE;
+	pool->base_index = 0;
+	return pool;
+error:
+	rte_free(pool);
+	return NULL;
+}
+
+/**
+ * Release ID pool structure.
+ *
+ * @param[in] pool
+ *   Pointer to flow id pool object to free.
+ */
+void
+mlx5_flow_id_pool_release(struct mlx5_flow_id_pool *pool)
+{
+	rte_free(pool->free_arr);
+	rte_free(pool);
+}
+
+/**
+ * Generate ID.
+ *
+ * @param[in] pool
+ *   Pointer to flow id pool.
+ * @param[out] id
+ *   The generated ID.
+ *
+ * @return
+ *   0 on success, error value otherwise.
+ */
+uint32_t
+mlx5_flow_id_get(struct mlx5_flow_id_pool *pool, uint32_t *id)
+{
+	if (pool->curr == pool->free_arr) {
+		if (pool->base_index == UINT32_MAX) {
+			rte_errno  = ENOMEM;
+			DRV_LOG(ERR, "no free id");
+			return -rte_errno;
+		}
+		*id = ++pool->base_index;
+		return 0;
+	}
+	*id = *(--pool->curr);
+	return 0;
+}
+
+/**
+ * Release ID.
+ *
+ * @param[in] pool
+ *   Pointer to flow id pool.
+ * @param[out] id
+ *   The generated ID.
+ *
+ * @return
+ *   0 on success, error value otherwise.
+ */
+uint32_t
+mlx5_flow_id_release(struct mlx5_flow_id_pool *pool, uint32_t id)
+{
+	uint32_t size;
+	uint32_t size2;
+	void *mem;
+
+	if (pool->curr == pool->last) {
+		size = pool->curr - pool->free_arr;
+		size2 = size * MLX5_ID_GENERATION_ARRAY_FACTOR;
+		assert(size2 > size);
+		mem = rte_malloc("", size2 * sizeof(uint32_t), 0);
+		if (!mem) {
+			DRV_LOG(ERR, "can't allocate mem for id pool");
+			rte_errno  = ENOMEM;
+			return -rte_errno;
+		}
+		memcpy(mem, pool->free_arr, size * sizeof(uint32_t));
+		rte_free(pool->free_arr);
+		pool->free_arr = mem;
+		pool->curr = pool->free_arr + size;
+		pool->last = pool->free_arr + size2;
+	}
+	*pool->curr = id;
+	pool->curr++;
+	return 0;
+}
+
 /**
  * Initialize the counters management structure.
  *
@@ -329,7 +447,7 @@ struct mlx5_dev_spawn_data {
 	struct mlx5_devx_tis_attr tis_attr = { 0 };
 #endif
 
-	assert(spawn);
+	assert(spawn);
 	/* Secondary process should not create the shared context. */
 	assert(rte_eal_process_type() == RTE_PROC_PRIMARY);
 	pthread_mutex_lock(&mlx5_ibv_list_mutex);
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 0148c1b..1b14fb7 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -495,8 +495,22 @@ struct mlx5_flow_driver_ops {
 #define MLX5_CNT_CONTAINER_UNUSED(sh, batch, thread) (&(sh)->cmng.ccont \
 	[(~((sh)->cmng.mhi[batch] >> (thread)) & 0x1) * 2 + (batch)])
 
+/* ID generation structure. */
+struct mlx5_flow_id_pool {
+	uint32_t *free_arr; /**< Pointer to the array of free values. */
+	uint32_t base_index;
+	/**< The next index that can be used without any free elements. */
+	uint32_t *curr; /**< Pointer to the index to pop. */
+	uint32_t *last; /**< Pointer to the last element in the empty array. */
+};
+
 /* mlx5_flow.c */
 
+struct mlx5_flow_id_pool *mlx5_flow_id_pool_alloc(void);
+void mlx5_flow_id_pool_release(struct mlx5_flow_id_pool *pool);
+uint32_t mlx5_flow_id_get(struct mlx5_flow_id_pool *pool, uint32_t *id);
+uint32_t mlx5_flow_id_release(struct mlx5_flow_id_pool *pool,
+			      uint32_t id);
 int mlx5_flow_group_to_table(const struct rte_flow_attr *attributes,
 			     bool external, uint32_t group, uint32_t *table,
 			     struct rte_flow_error *error);
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 186+ messages in thread

* [dpdk-dev] [PATCH v3 12/14] net/mlx5: add default flows for hairpin
  2019-10-15  9:04 ` [dpdk-dev] [PATCH v3 00/14] " Ori Kam
                     ` (10 preceding siblings ...)
  2019-10-15  9:04   ` [dpdk-dev] [PATCH v3 11/14] net/mlx5: add id generation function Ori Kam
@ 2019-10-15  9:04   ` Ori Kam
  2019-10-15  9:04   ` [dpdk-dev] [PATCH v3 13/14] net/mlx5: split hairpin flows Ori Kam
  2019-10-15  9:04   ` [dpdk-dev] [PATCH v3 14/14] doc: add hairpin feature Ori Kam
  13 siblings, 0 replies; 186+ messages in thread
From: Ori Kam @ 2019-10-15  9:04 UTC (permalink / raw)
  To: Matan Azrad, Shahaf Shuler, Viacheslav Ovsiienko
  Cc: dev, orika, jingjing.wu, stephen

When using hairpin, all traffic from Tx hairpin queues should jump
to a dedicated table where matching can be done using registers.
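
In rte_flow terms, the default rule created per hairpin Tx queue is
roughly the following (a sketch only; the Tx queue item and the table id
are internal to the PMD):

	attr:    egress, priority 0, group 0
	pattern: tx_queue is <hairpin SQ>  (MLX5_RTE_FLOW_ITEM_TYPE_TX_QUEUE)
	actions: jump group MLX5_HAIRPIN_TX_TABLE (UINT16_MAX - 1) / end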

Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>

---
 drivers/net/mlx5/mlx5.h         |  2 ++
 drivers/net/mlx5/mlx5_flow.c    | 60 +++++++++++++++++++++++++++++++++++++++
 drivers/net/mlx5/mlx5_flow.h    |  9 ++++++
 drivers/net/mlx5/mlx5_flow_dv.c | 63 +++++++++++++++++++++++++++++++++++++++--
 drivers/net/mlx5/mlx5_trigger.c | 18 ++++++++++++
 5 files changed, 150 insertions(+), 2 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 391ae2c..8e86bcf 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -556,6 +556,7 @@ struct mlx5_flow_tbl_resource {
 };
 
 #define MLX5_MAX_TABLES UINT16_MAX
+#define MLX5_HAIRPIN_TX_TABLE (UINT16_MAX - 1)
 #define MLX5_MAX_TABLES_FDB UINT16_MAX
 
 #define MLX5_DBR_PAGE_SIZE 4096 /* Must be >= 512. */
@@ -876,6 +877,7 @@ int mlx5_dev_filter_ctrl(struct rte_eth_dev *dev,
 int mlx5_flow_start(struct rte_eth_dev *dev, struct mlx5_flows *list);
 void mlx5_flow_stop(struct rte_eth_dev *dev, struct mlx5_flows *list);
 int mlx5_flow_verify(struct rte_eth_dev *dev);
+int mlx5_ctrl_flow_source_queue(struct rte_eth_dev *dev, uint32_t queue);
 int mlx5_ctrl_flow_vlan(struct rte_eth_dev *dev,
 			struct rte_flow_item_eth *eth_spec,
 			struct rte_flow_item_eth *eth_mask,
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index b4bcd1a..b6dc105 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -2731,6 +2731,66 @@ struct rte_flow *
 }
 
 /**
+ * Enable default hairpin egress flow.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param queue
+ *   The queue index.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_ctrl_flow_source_queue(struct rte_eth_dev *dev,
+			    uint32_t queue)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	const struct rte_flow_attr attr = {
+		.egress = 1,
+		.priority = 0,
+	};
+	struct mlx5_rte_flow_item_tx_queue queue_spec = {
+		.queue = queue,
+	};
+	struct mlx5_rte_flow_item_tx_queue queue_mask = {
+		.queue = UINT32_MAX,
+	};
+	struct rte_flow_item items[] = {
+		{
+			.type = MLX5_RTE_FLOW_ITEM_TYPE_TX_QUEUE,
+			.spec = &queue_spec,
+			.last = NULL,
+			.mask = &queue_mask,
+		},
+		{
+			.type = RTE_FLOW_ITEM_TYPE_END,
+		},
+	};
+	struct rte_flow_action_jump jump = {
+		.group = MLX5_HAIRPIN_TX_TABLE,
+	};
+	struct rte_flow_action actions[2];
+	struct rte_flow *flow;
+	struct rte_flow_error error;
+
+	actions[0].type = RTE_FLOW_ACTION_TYPE_JUMP;
+	actions[0].conf = &jump;
+	actions[1].type = RTE_FLOW_ACTION_TYPE_END;
+	flow = flow_list_create(dev, &priv->ctrl_flows,
+				&attr, items, actions, false, &error);
+	if (!flow) {
+		DRV_LOG(DEBUG,
+			"Failed to create ctrl flow: rte_errno(%d),"
+			" type(%d), message(%s)\n",
+			rte_errno, error.type,
+			error.message ? error.message : " (no stated reason)");
+		return -rte_errno;
+	}
+	return 0;
+}
+
+/**
  * Enable a control flow configured from the control plane.
  *
  * @param dev
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 1b14fb7..bb67380 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -44,6 +44,7 @@ enum modify_reg {
 enum mlx5_rte_flow_item_type {
 	MLX5_RTE_FLOW_ITEM_TYPE_END = INT_MIN,
 	MLX5_RTE_FLOW_ITEM_TYPE_TAG,
+	MLX5_RTE_FLOW_ITEM_TYPE_TX_QUEUE,
 };
 
 /* Private rte flow actions. */
@@ -64,6 +65,11 @@ struct mlx5_rte_flow_action_set_tag {
 	rte_be32_t data;
 };
 
+/* Matches on source queue. */
+struct mlx5_rte_flow_item_tx_queue {
+	uint32_t queue;
+};
+
 /* Pattern outer Layer bits. */
 #define MLX5_FLOW_LAYER_OUTER_L2 (1u << 0)
 #define MLX5_FLOW_LAYER_OUTER_L3_IPV4 (1u << 1)
@@ -102,6 +108,9 @@ struct mlx5_rte_flow_action_set_tag {
 #define MLX5_FLOW_LAYER_IPV6_ENCAP (1u << 23)
 #define MLX5_FLOW_LAYER_NVGRE (1u << 24)
 
+/* Queue items. */
+#define MLX5_FLOW_ITEM_TX_QUEUE (1u << 25)
+
 /* Outer Masks. */
 #define MLX5_FLOW_LAYER_OUTER_L3 \
 	(MLX5_FLOW_LAYER_OUTER_L3_IPV4 | MLX5_FLOW_LAYER_OUTER_L3_IPV6)
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index dde6673..c7a3f6b 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -3357,7 +3357,9 @@ struct field_modify_info modify_tcp[] = {
 		return ret;
 	for (; items->type != RTE_FLOW_ITEM_TYPE_END; items++) {
 		int tunnel = !!(item_flags & MLX5_FLOW_LAYER_TUNNEL);
-		switch (items->type) {
+		int type = items->type;
+
+		switch (type) {
 		case RTE_FLOW_ITEM_TYPE_VOID:
 			break;
 		case RTE_FLOW_ITEM_TYPE_PORT_ID:
@@ -3518,6 +3520,9 @@ struct field_modify_info modify_tcp[] = {
 				return ret;
 			last_item = MLX5_FLOW_LAYER_ICMP6;
 			break;
+		case MLX5_RTE_FLOW_ITEM_TYPE_TAG:
+		case MLX5_RTE_FLOW_ITEM_TYPE_TX_QUEUE:
+			break;
 		default:
 			return rte_flow_error_set(error, ENOTSUP,
 						  RTE_FLOW_ERROR_TYPE_ITEM,
@@ -3526,11 +3531,12 @@ struct field_modify_info modify_tcp[] = {
 		item_flags |= last_item;
 	}
 	for (; actions->type != RTE_FLOW_ACTION_TYPE_END; actions++) {
+		int type = actions->type;
 		if (actions_n == MLX5_DV_MAX_NUMBER_OF_ACTIONS)
 			return rte_flow_error_set(error, ENOTSUP,
 						  RTE_FLOW_ERROR_TYPE_ACTION,
 						  actions, "too many actions");
-		switch (actions->type) {
+		switch (type) {
 		case RTE_FLOW_ACTION_TYPE_VOID:
 			break;
 		case RTE_FLOW_ACTION_TYPE_PORT_ID:
@@ -3796,6 +3802,8 @@ struct field_modify_info modify_tcp[] = {
 						MLX5_FLOW_ACTION_INC_TCP_ACK :
 						MLX5_FLOW_ACTION_DEC_TCP_ACK;
 			break;
+		case MLX5_RTE_FLOW_ACTION_TYPE_TAG:
+			break;
 		default:
 			return rte_flow_error_set(error, ENOTSUP,
 						  RTE_FLOW_ERROR_TYPE_ACTION,
@@ -5291,6 +5299,51 @@ struct field_modify_info modify_tcp[] = {
 }
 
 /**
+ * Add Tx queue matcher.
+ *
+ * @param[in] dev
+ *   Pointer to the dev struct.
+ * @param[in, out] matcher
+ *   Flow matcher.
+ * @param[in, out] key
+ *   Flow matcher value.
+ * @param[in] item
+ *   Flow pattern to translate.
+ */
+static void
+flow_dv_translate_item_tx_queue(struct rte_eth_dev *dev,
+				void *matcher, void *key,
+				const struct rte_flow_item *item)
+{
+	const struct mlx5_rte_flow_item_tx_queue *queue_m;
+	const struct mlx5_rte_flow_item_tx_queue *queue_v;
+	void *misc_m =
+		MLX5_ADDR_OF(fte_match_param, matcher, misc_parameters);
+	void *misc_v =
+		MLX5_ADDR_OF(fte_match_param, key, misc_parameters);
+	struct mlx5_txq_ctrl *txq;
+	uint32_t queue;
+
+	queue_m = (const void *)item->mask;
+	if (!queue_m)
+		return;
+	queue_v = (const void *)item->spec;
+	if (!queue_v)
+		return;
+	txq = mlx5_txq_get(dev, queue_v->queue);
+	if (!txq)
+		return;
+	queue = txq->obj->sq->id;
+	MLX5_SET(fte_match_set_misc, misc_m, source_sqn, queue_m->queue);
+	MLX5_SET(fte_match_set_misc, misc_v, source_sqn,
+		 queue & queue_m->queue);
+	mlx5_txq_release(dev, queue_v->queue);
+}
+
+/**
  * Fill the flow with DV spec.
  *
  * @param[in] dev
@@ -5866,6 +5919,12 @@ struct field_modify_info modify_tcp[] = {
 						   items);
 			last_item = MLX5_FLOW_ITEM_TAG;
 			break;
+		case MLX5_RTE_FLOW_ITEM_TYPE_TX_QUEUE:
+			flow_dv_translate_item_tx_queue(dev, match_mask,
+							match_value,
+							items);
+			last_item = MLX5_FLOW_ITEM_TX_QUEUE;
+			break;
 		default:
 			break;
 		}
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index f66b6ee..cafab25 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -402,6 +402,24 @@
 	unsigned int j;
 	int ret;
 
+	/*
+	 * The hairpin txq default flow should be created no matter whether
+	 * isolated mode is enabled. Otherwise all packets to be sent would
+	 * go out directly without the Tx flow actions, e.g. encapsulation.
+	 */
+	for (i = 0; i != priv->txqs_n; ++i) {
+		struct mlx5_txq_ctrl *txq_ctrl = mlx5_txq_get(dev, i);
+		if (!txq_ctrl)
+			continue;
+		if (txq_ctrl->type == MLX5_TXQ_TYPE_HAIRPIN) {
+			ret = mlx5_ctrl_flow_source_queue(dev, i);
+			if (ret) {
+				mlx5_txq_release(dev, i);
+				goto error;
+			}
+		}
+		mlx5_txq_release(dev, i);
+	}
 	if (priv->config.dv_esw_en && !priv->config.vf)
 		if (!mlx5_flow_create_esw_table_zero_flow(dev))
 			goto error;
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 186+ messages in thread

* [dpdk-dev] [PATCH v3 13/14] net/mlx5: split hairpin flows
  2019-10-15  9:04 ` [dpdk-dev] [PATCH v3 00/14] " Ori Kam
                     ` (11 preceding siblings ...)
  2019-10-15  9:04   ` [dpdk-dev] [PATCH v3 12/14] net/mlx5: add default flows for hairpin Ori Kam
@ 2019-10-15  9:04   ` Ori Kam
  2019-10-15  9:04   ` [dpdk-dev] [PATCH v3 14/14] doc: add hairpin feature Ori Kam
  13 siblings, 0 replies; 186+ messages in thread
From: Ori Kam @ 2019-10-15  9:04 UTC (permalink / raw)
  To: Matan Azrad, Shahaf Shuler, Viacheslav Ovsiienko
  Cc: dev, orika, jingjing.wu, stephen

Since the encap action is not supported on Rx, we need to split the
hairpin flow into an Rx flow and a Tx flow.
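
Conceptually, a single application flow over a hairpin queue is split as
follows (a sketch; the tag register and the flow id are internal):

	application: pattern P       -> actions: raw_encap E, queue Q (hairpin)
	Rx flow:     pattern P       -> actions: set_tag(id), queue Q (hairpin)
	Tx flow:     match tag == id -> actions: raw_encap E

where "id" comes from the flow id pool introduced earlier in the series.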

Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>

---
 drivers/net/mlx5/mlx5.c            |  10 ++
 drivers/net/mlx5/mlx5.h            |  10 ++
 drivers/net/mlx5/mlx5_flow.c       | 281 +++++++++++++++++++++++++++++++++++--
 drivers/net/mlx5/mlx5_flow.h       |  14 +-
 drivers/net/mlx5/mlx5_flow_dv.c    |  10 +-
 drivers/net/mlx5/mlx5_flow_verbs.c |  11 +-
 drivers/net/mlx5/mlx5_rxq.c        |  26 ++++
 drivers/net/mlx5/mlx5_rxtx.h       |   2 +
 8 files changed, 334 insertions(+), 30 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 0c3239c..bd9c203 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -530,6 +530,12 @@ struct mlx5_flow_id_pool *
 			goto error;
 		}
 	}
+	sh->flow_id_pool = mlx5_flow_id_pool_alloc();
+	if (!sh->flow_id_pool) {
+		DRV_LOG(ERR, "can't create flow id pool");
+		err = ENOMEM;
+		goto error;
+	}
 #endif /* HAVE_IBV_FLOW_DV_SUPPORT */
 	/*
 	 * Once the device is added to the list of memory event
@@ -569,6 +575,8 @@ struct mlx5_flow_id_pool *
 		claim_zero(mlx5_glue->dealloc_pd(sh->pd));
 	if (sh->ctx)
 		claim_zero(mlx5_glue->close_device(sh->ctx));
+	if (sh->flow_id_pool)
+		mlx5_flow_id_pool_release(sh->flow_id_pool);
 	rte_free(sh);
 	assert(err > 0);
 	rte_errno = err;
@@ -631,6 +639,8 @@ struct mlx5_flow_id_pool *
 		claim_zero(mlx5_devx_cmd_destroy(sh->td));
 	if (sh->ctx)
 		claim_zero(mlx5_glue->close_device(sh->ctx));
+	if (sh->flow_id_pool)
+		mlx5_flow_id_pool_release(sh->flow_id_pool);
 	rte_free(sh);
 exit:
 	pthread_mutex_unlock(&mlx5_ibv_list_mutex);
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 8e86bcf..5f40a39 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -574,6 +574,15 @@ struct mlx5_devx_dbr_page {
 	uint64_t dbr_bitmap[MLX5_DBR_BITMAP_SIZE];
 };
 
+/* ID generation structure. */
+struct mlx5_flow_id_pool {
+	uint32_t *free_arr; /**< Pointer to the array of free values. */
+	uint32_t base_index;
+	/**< The next index that can be used without any free elements. */
+	uint32_t *curr; /**< Pointer to the index to pop. */
+	uint32_t *last; /**< Pointer to the last element in the empty array. */
+};
+
 /*
  * Shared Infiniband device context for Master/Representors
  * which belong to same IB device with multiple IB ports.
@@ -632,6 +641,7 @@ struct mlx5_ibv_shared {
 	struct mlx5dv_devx_cmd_comp *devx_comp; /* DEVX async comp obj. */
 	struct mlx5_devx_obj *tis; /* TIS object. */
 	struct mlx5_devx_obj *td; /* Transport domain. */
+	struct mlx5_flow_id_pool *flow_id_pool; /* Flow ID pool. */
 	struct mlx5_ibv_shared_port port[]; /* per device port data array. */
 };
 
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index b6dc105..bb13857 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -606,7 +606,7 @@ uint32_t mlx5_flow_adjust_priority(struct rte_eth_dev *dev, int32_t priority,
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct rte_flow *flow = dev_flow->flow;
-	const int mark = !!(flow->actions &
+	const int mark = !!(dev_flow->actions &
 			    (MLX5_FLOW_ACTION_FLAG | MLX5_FLOW_ACTION_MARK));
 	const int tunnel = !!(dev_flow->layers & MLX5_FLOW_LAYER_TUNNEL);
 	unsigned int i;
@@ -669,7 +669,7 @@ uint32_t mlx5_flow_adjust_priority(struct rte_eth_dev *dev, int32_t priority,
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct rte_flow *flow = dev_flow->flow;
-	const int mark = !!(flow->actions &
+	const int mark = !!(dev_flow->actions &
 			    (MLX5_FLOW_ACTION_FLAG | MLX5_FLOW_ACTION_MARK));
 	const int tunnel = !!(dev_flow->layers & MLX5_FLOW_LAYER_TUNNEL);
 	unsigned int i;
@@ -2438,6 +2438,210 @@ uint32_t mlx5_flow_adjust_priority(struct rte_eth_dev *dev, int32_t priority,
 }
 
 /**
+ * Check if the flow should be split due to hairpin.
+ * The reason for the split is that current HW can't
+ * support encap on Rx, so if a flow has encap we move it
+ * to Tx.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param[in] attr
+ *   Flow rule attributes.
+ * @param[in] actions
+ *   Associated actions (list terminated by the END action).
+ *
+ * @return
+ *   > 0 the number of actions and the flow should be split,
+ *   0 when no split required.
+ */
+static int
+flow_check_hairpin_split(struct rte_eth_dev *dev,
+			 const struct rte_flow_attr *attr,
+			 const struct rte_flow_action actions[])
+{
+	int queue_action = 0;
+	int action_n = 0;
+	int encap = 0;
+	const struct rte_flow_action_queue *queue;
+	const struct rte_flow_action_rss *rss;
+	const struct rte_flow_action_raw_encap *raw_encap;
+
+	if (!attr->ingress)
+		return 0;
+	for (; actions->type != RTE_FLOW_ACTION_TYPE_END; actions++) {
+		switch (actions->type) {
+		case RTE_FLOW_ACTION_TYPE_QUEUE:
+			queue = actions->conf;
+			if (mlx5_rxq_get_type(dev, queue->index) !=
+			    MLX5_RXQ_TYPE_HAIRPIN)
+				return 0;
+			queue_action = 1;
+			action_n++;
+			break;
+		case RTE_FLOW_ACTION_TYPE_RSS:
+			rss = actions->conf;
+			if (mlx5_rxq_get_type(dev, rss->queue[0]) !=
+			    MLX5_RXQ_TYPE_HAIRPIN)
+				return 0;
+			queue_action = 1;
+			action_n++;
+			break;
+		case RTE_FLOW_ACTION_TYPE_VXLAN_ENCAP:
+		case RTE_FLOW_ACTION_TYPE_NVGRE_ENCAP:
+			encap = 1;
+			action_n++;
+			break;
+		case RTE_FLOW_ACTION_TYPE_RAW_ENCAP:
+			raw_encap = actions->conf;
+			if (raw_encap->size >
+			    (sizeof(struct rte_flow_item_eth) +
+			     sizeof(struct rte_flow_item_ipv4)))
+				encap = 1;
+			action_n++;
+			break;
+		default:
+			action_n++;
+			break;
+		}
+	}
+	if (encap == 1 && queue_action)
+		return action_n;
+	return 0;
+}
+
+#define MLX5_MAX_SPLIT_ACTIONS 24
+#define MLX5_MAX_SPLIT_ITEMS 24
+
+/**
+ * Split the hairpin flow.
+ * Since HW can't support encap on Rx, we move the encap to Tx.
+ * If the count action comes after the encap, we also move the
+ * count action; in this case the count will also measure
+ * the outer bytes.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param[in] actions
+ *   Associated actions (list terminated by the END action).
+ * @param[out] actions_rx
+ *   Rx flow actions.
+ * @param[out] actions_tx
+ *   Tx flow actions.
+ * @param[out] pattern_tx
+ *   The pattern items for the Tx flow.
+ * @param[out] flow_id
+ *   The flow ID connected to this flow.
+ *
+ * @return
+ *   0 on success.
+ */
+static int
+flow_hairpin_split(struct rte_eth_dev *dev,
+		   const struct rte_flow_action actions[],
+		   struct rte_flow_action actions_rx[],
+		   struct rte_flow_action actions_tx[],
+		   struct rte_flow_item pattern_tx[],
+		   uint32_t *flow_id)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	const struct rte_flow_action_raw_encap *raw_encap;
+	const struct rte_flow_action_raw_decap *raw_decap;
+	struct mlx5_rte_flow_action_set_tag *set_tag;
+	struct rte_flow_action *tag_action;
+	struct mlx5_rte_flow_item_tag *tag_item;
+	struct rte_flow_item *item;
+	char *addr;
+	struct rte_flow_error error;
+	int encap = 0;
+
+	mlx5_flow_id_get(priv->sh->flow_id_pool, flow_id);
+	for (; actions->type != RTE_FLOW_ACTION_TYPE_END; actions++) {
+		switch (actions->type) {
+		case RTE_FLOW_ACTION_TYPE_VXLAN_ENCAP:
+		case RTE_FLOW_ACTION_TYPE_NVGRE_ENCAP:
+			rte_memcpy(actions_tx, actions,
+			       sizeof(struct rte_flow_action));
+			actions_tx++;
+			break;
+		case RTE_FLOW_ACTION_TYPE_COUNT:
+			if (encap) {
+				rte_memcpy(actions_tx, actions,
+					   sizeof(struct rte_flow_action));
+				actions_tx++;
+			} else {
+				rte_memcpy(actions_rx, actions,
+					   sizeof(struct rte_flow_action));
+				actions_rx++;
+			}
+			break;
+		case RTE_FLOW_ACTION_TYPE_RAW_ENCAP:
+			raw_encap = actions->conf;
+			if (raw_encap->size >
+			    (sizeof(struct rte_flow_item_eth) +
+			     sizeof(struct rte_flow_item_ipv4))) {
+				memcpy(actions_tx, actions,
+				       sizeof(struct rte_flow_action));
+				actions_tx++;
+				encap = 1;
+			} else {
+				rte_memcpy(actions_rx, actions,
+					   sizeof(struct rte_flow_action));
+				actions_rx++;
+			}
+			break;
+		case RTE_FLOW_ACTION_TYPE_RAW_DECAP:
+			raw_decap = actions->conf;
+			if (raw_decap->size <
+			    (sizeof(struct rte_flow_item_eth) +
+			     sizeof(struct rte_flow_item_ipv4))) {
+				memcpy(actions_tx, actions,
+				       sizeof(struct rte_flow_action));
+				actions_tx++;
+			} else {
+				rte_memcpy(actions_rx, actions,
+					   sizeof(struct rte_flow_action));
+				actions_rx++;
+			}
+			break;
+		default:
+			rte_memcpy(actions_rx, actions,
+				   sizeof(struct rte_flow_action));
+			actions_rx++;
+			break;
+		}
+	}
+	/* Add set meta action and end action for the Rx flow. */
+	tag_action = actions_rx;
+	tag_action->type = MLX5_RTE_FLOW_ACTION_TYPE_TAG;
+	actions_rx++;
+	rte_memcpy(actions_rx, actions, sizeof(struct rte_flow_action));
+	actions_rx++;
+	set_tag = (void *)actions_rx;
+	set_tag->id = flow_get_reg_id(dev, MLX5_HAIRPIN_RX, 0, &error);
+	set_tag->data = rte_cpu_to_be_32(*flow_id);
+	tag_action->conf = set_tag;
+	/* Create Tx item list. */
+	rte_memcpy(actions_tx, actions, sizeof(struct rte_flow_action));
+	addr = (void *)&pattern_tx[2];
+	item = pattern_tx;
+	item->type = MLX5_RTE_FLOW_ITEM_TYPE_TAG;
+	tag_item = (void *)addr;
+	tag_item->data = rte_cpu_to_be_32(*flow_id);
+	tag_item->id = flow_get_reg_id(dev, MLX5_HAIRPIN_TX, 0, &error);
+	item->spec = tag_item;
+	addr += sizeof(struct mlx5_rte_flow_item_tag);
+	tag_item = (void *)addr;
+	tag_item->data = UINT32_MAX;
+	tag_item->id = UINT16_MAX;
+	item->mask = tag_item;
+	addr += sizeof(struct mlx5_rte_flow_item_tag);
+	item->last = NULL;
+	item++;
+	item->type = RTE_FLOW_ITEM_TYPE_END;
+	return 0;
+}
+
+/**
  * Create a flow and add it to @p list.
  *
  * @param dev
@@ -2465,6 +2669,7 @@ uint32_t mlx5_flow_adjust_priority(struct rte_eth_dev *dev, int32_t priority,
 		 const struct rte_flow_action actions[],
 		 bool external, struct rte_flow_error *error)
 {
+	struct mlx5_priv *priv = dev->data->dev_private;
 	struct rte_flow *flow = NULL;
 	struct mlx5_flow *dev_flow;
 	const struct rte_flow_action_rss *rss;
@@ -2472,16 +2677,44 @@ uint32_t mlx5_flow_adjust_priority(struct rte_eth_dev *dev, int32_t priority,
 		struct rte_flow_expand_rss buf;
 		uint8_t buffer[2048];
 	} expand_buffer;
+	union {
+		struct rte_flow_action actions[MLX5_MAX_SPLIT_ACTIONS];
+		uint8_t buffer[2048];
+	} actions_rx;
+	union {
+		struct rte_flow_action actions[MLX5_MAX_SPLIT_ACTIONS];
+		uint8_t buffer[2048];
+	} actions_hairpin_tx;
+	union {
+		struct rte_flow_item items[MLX5_MAX_SPLIT_ITEMS];
+		uint8_t buffer[2048];
+	} items_tx;
 	struct rte_flow_expand_rss *buf = &expand_buffer.buf;
+	const struct rte_flow_action *p_actions_rx = actions;
 	int ret;
 	uint32_t i;
 	uint32_t flow_size;
+	int hairpin_flow = 0;
+	uint32_t hairpin_id = 0;
+	struct rte_flow_attr attr_tx = { .priority = 0 };
 
-	ret = flow_drv_validate(dev, attr, items, actions, external, error);
+	hairpin_flow = flow_check_hairpin_split(dev, attr, actions);
+	if (hairpin_flow > 0) {
+		if (hairpin_flow > MLX5_MAX_SPLIT_ACTIONS) {
+			rte_errno = EINVAL;
+			return NULL;
+		}
+		flow_hairpin_split(dev, actions, actions_rx.actions,
+				   actions_hairpin_tx.actions, items_tx.items,
+				   &hairpin_id);
+		p_actions_rx = actions_rx.actions;
+	}
+	ret = flow_drv_validate(dev, attr, items, p_actions_rx, external,
+				error);
 	if (ret < 0)
-		return NULL;
+		goto error_before_flow;
 	flow_size = sizeof(struct rte_flow);
-	rss = flow_get_rss_action(actions);
+	rss = flow_get_rss_action(p_actions_rx);
 	if (rss)
 		flow_size += RTE_ALIGN_CEIL(rss->queue_num * sizeof(uint16_t),
 					    sizeof(void *));
@@ -2490,11 +2723,13 @@ uint32_t mlx5_flow_adjust_priority(struct rte_eth_dev *dev, int32_t priority,
 	flow = rte_calloc(__func__, 1, flow_size, 0);
 	if (!flow) {
 		rte_errno = ENOMEM;
-		return NULL;
+		goto error_before_flow;
 	}
 	flow->drv_type = flow_get_drv_type(dev, attr);
 	flow->ingress = attr->ingress;
 	flow->transfer = attr->transfer;
+	if (hairpin_id != 0)
+		flow->hairpin_flow_id = hairpin_id;
 	assert(flow->drv_type > MLX5_FLOW_TYPE_MIN &&
 	       flow->drv_type < MLX5_FLOW_TYPE_MAX);
 	flow->queue = (void *)(flow + 1);
@@ -2515,7 +2750,7 @@ uint32_t mlx5_flow_adjust_priority(struct rte_eth_dev *dev, int32_t priority,
 	}
 	for (i = 0; i < buf->entries; ++i) {
 		dev_flow = flow_drv_prepare(flow, attr, buf->entry[i].pattern,
-					    actions, error);
+					    p_actions_rx, error);
 		if (!dev_flow)
 			goto error;
 		dev_flow->flow = flow;
@@ -2523,7 +2758,24 @@ uint32_t mlx5_flow_adjust_priority(struct rte_eth_dev *dev, int32_t priority,
 		LIST_INSERT_HEAD(&flow->dev_flows, dev_flow, next);
 		ret = flow_drv_translate(dev, dev_flow, attr,
 					 buf->entry[i].pattern,
-					 actions, error);
+					 p_actions_rx, error);
+		if (ret < 0)
+			goto error;
+	}
+	/* Create the tx flow. */
+	if (hairpin_flow) {
+		attr_tx.group = MLX5_HAIRPIN_TX_TABLE;
+		attr_tx.ingress = 0;
+		attr_tx.egress = 1;
+		dev_flow = flow_drv_prepare(flow, &attr_tx, items_tx.items,
+					    actions_hairpin_tx.actions, error);
+		if (!dev_flow)
+			goto error;
+		dev_flow->flow = flow;
+		LIST_INSERT_HEAD(&flow->dev_flows, dev_flow, next);
+		ret = flow_drv_translate(dev, dev_flow, &attr_tx,
+					 items_tx.items,
+					 actions_hairpin_tx.actions, error);
 		if (ret < 0)
 			goto error;
 	}
@@ -2535,8 +2787,16 @@ uint32_t mlx5_flow_adjust_priority(struct rte_eth_dev *dev, int32_t priority,
 	TAILQ_INSERT_TAIL(list, flow, next);
 	flow_rxq_flags_set(dev, flow);
 	return flow;
+error_before_flow:
+	if (hairpin_id)
+		mlx5_flow_id_release(priv->sh->flow_id_pool,
+				     hairpin_id);
+	return NULL;
 error:
 	ret = rte_errno; /* Save rte_errno before cleanup. */
+	if (flow->hairpin_flow_id)
+		mlx5_flow_id_release(priv->sh->flow_id_pool,
+				     flow->hairpin_flow_id);
 	assert(flow);
 	flow_drv_destroy(dev, flow);
 	rte_free(flow);
@@ -2626,12 +2886,17 @@ struct rte_flow *
 flow_list_destroy(struct rte_eth_dev *dev, struct mlx5_flows *list,
 		  struct rte_flow *flow)
 {
+	struct mlx5_priv *priv = dev->data->dev_private;
+
 	/*
 	 * Update RX queue flags only if port is started, otherwise it is
 	 * already clean.
 	 */
 	if (dev->data->dev_started)
 		flow_rxq_flags_trim(dev, flow);
+	if (flow->hairpin_flow_id)
+		mlx5_flow_id_release(priv->sh->flow_id_pool,
+				     flow->hairpin_flow_id);
 	flow_drv_destroy(dev, flow);
 	TAILQ_REMOVE(list, flow, next);
 	rte_free(flow->fdir);
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index bb67380..90a289e 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -434,6 +434,8 @@ struct mlx5_flow {
 	struct rte_flow *flow; /**< Pointer to the main flow. */
 	uint64_t layers;
 	/**< Bit-fields of present layers, see MLX5_FLOW_LAYER_*. */
+	uint64_t actions;
+	/**< Bit-fields of detected actions, see MLX5_FLOW_ACTION_*. */
 	union {
 #ifdef HAVE_IBV_FLOW_DV_SUPPORT
 		struct mlx5_flow_dv dv;
@@ -455,12 +457,11 @@ struct rte_flow {
 	uint16_t (*queue)[]; /**< Destination queues to redirect traffic to. */
 	LIST_HEAD(dev_flows, mlx5_flow) dev_flows;
 	/**< Device flows that are part of the flow. */
-	uint64_t actions;
-	/**< Bit-fields of detected actions, see MLX5_FLOW_ACTION_*. */
 	struct mlx5_fdir *fdir; /**< Pointer to associated FDIR if any. */
 	uint8_t ingress; /**< 1 if the flow is ingress. */
 	uint32_t group; /**< The group index. */
 	uint8_t transfer; /**< 1 if the flow is E-Switch flow. */
+	uint32_t hairpin_flow_id; /**< The flow id used for hairpin. */
 };
 
 typedef int (*mlx5_flow_validate_t)(struct rte_eth_dev *dev,
@@ -504,15 +505,6 @@ struct mlx5_flow_driver_ops {
 #define MLX5_CNT_CONTAINER_UNUSED(sh, batch, thread) (&(sh)->cmng.ccont \
 	[(~((sh)->cmng.mhi[batch] >> (thread)) & 0x1) * 2 + (batch)])
 
-/* ID generation structure. */
-struct mlx5_flow_id_pool {
-	uint32_t *free_arr; /**< Pointer to the array of free values. */
-	uint32_t base_index;
-	/**< The next index that can be used without any free elements. */
-	uint32_t *curr; /**< Pointer to the index to pop. */
-	uint32_t *last; /**< Pointer to the last element in the empty array. */
-};
-
 /* mlx5_flow.c */
 
 struct mlx5_flow_id_pool *mlx5_flow_id_pool_alloc(void);
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index c7a3f6b..367e632 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -5763,7 +5763,7 @@ struct field_modify_info modify_tcp[] = {
 			modify_action_position = actions_n++;
 	}
 	dev_flow->dv.actions_n = actions_n;
-	flow->actions = action_flags;
+	dev_flow->actions = action_flags;
 	for (; items->type != RTE_FLOW_ITEM_TYPE_END; items++) {
 		int tunnel = !!(item_flags & MLX5_FLOW_LAYER_TUNNEL);
 		int item_type = items->type;
@@ -5985,7 +5985,7 @@ struct field_modify_info modify_tcp[] = {
 	LIST_FOREACH(dev_flow, &flow->dev_flows, next) {
 		dv = &dev_flow->dv;
 		n = dv->actions_n;
-		if (flow->actions & MLX5_FLOW_ACTION_DROP) {
+		if (dev_flow->actions & MLX5_FLOW_ACTION_DROP) {
 			if (flow->transfer) {
 				dv->actions[n++] = priv->sh->esw_drop_action;
 			} else {
@@ -6000,7 +6000,7 @@ struct field_modify_info modify_tcp[] = {
 				}
 				dv->actions[n++] = dv->hrxq->action;
 			}
-		} else if (flow->actions &
+		} else if (dev_flow->actions &
 			   (MLX5_FLOW_ACTION_QUEUE | MLX5_FLOW_ACTION_RSS)) {
 			struct mlx5_hrxq *hrxq;
 
@@ -6056,7 +6056,7 @@ struct field_modify_info modify_tcp[] = {
 	LIST_FOREACH(dev_flow, &flow->dev_flows, next) {
 		struct mlx5_flow_dv *dv = &dev_flow->dv;
 		if (dv->hrxq) {
-			if (flow->actions & MLX5_FLOW_ACTION_DROP)
+			if (dev_flow->actions & MLX5_FLOW_ACTION_DROP)
 				mlx5_hrxq_drop_release(dev);
 			else
 				mlx5_hrxq_release(dev, dv->hrxq);
@@ -6290,7 +6290,7 @@ struct field_modify_info modify_tcp[] = {
 			dv->flow = NULL;
 		}
 		if (dv->hrxq) {
-			if (flow->actions & MLX5_FLOW_ACTION_DROP)
+			if (dev_flow->actions & MLX5_FLOW_ACTION_DROP)
 				mlx5_hrxq_drop_release(dev);
 			else
 				mlx5_hrxq_release(dev, dv->hrxq);
diff --git a/drivers/net/mlx5/mlx5_flow_verbs.c b/drivers/net/mlx5/mlx5_flow_verbs.c
index 23110f2..fd27f6c 100644
--- a/drivers/net/mlx5/mlx5_flow_verbs.c
+++ b/drivers/net/mlx5/mlx5_flow_verbs.c
@@ -191,7 +191,7 @@
 {
 #if defined(HAVE_IBV_DEVICE_COUNTERS_SET_V42) || \
 	defined(HAVE_IBV_DEVICE_COUNTERS_SET_V45)
-	if (flow->actions & MLX5_FLOW_ACTION_COUNT) {
+	if (flow->counter->cs) {
 		struct rte_flow_query_count *qc = data;
 		uint64_t counters[2] = {0, 0};
 #if defined(HAVE_IBV_DEVICE_COUNTERS_SET_V42)
@@ -1410,7 +1410,6 @@
 		     const struct rte_flow_action actions[],
 		     struct rte_flow_error *error)
 {
-	struct rte_flow *flow = dev_flow->flow;
 	uint64_t item_flags = 0;
 	uint64_t action_flags = 0;
 	uint64_t priority = attr->priority;
@@ -1460,7 +1459,7 @@
 						  "action not supported");
 		}
 	}
-	flow->actions = action_flags;
+	dev_flow->actions = action_flags;
 	for (; items->type != RTE_FLOW_ITEM_TYPE_END; items++) {
 		int tunnel = !!(item_flags & MLX5_FLOW_LAYER_TUNNEL);
 
@@ -1592,7 +1591,7 @@
 			verbs->flow = NULL;
 		}
 		if (verbs->hrxq) {
-			if (flow->actions & MLX5_FLOW_ACTION_DROP)
+			if (dev_flow->actions & MLX5_FLOW_ACTION_DROP)
 				mlx5_hrxq_drop_release(dev);
 			else
 				mlx5_hrxq_release(dev, verbs->hrxq);
@@ -1656,7 +1655,7 @@
 
 	LIST_FOREACH(dev_flow, &flow->dev_flows, next) {
 		verbs = &dev_flow->verbs;
-		if (flow->actions & MLX5_FLOW_ACTION_DROP) {
+		if (dev_flow->actions & MLX5_FLOW_ACTION_DROP) {
 			verbs->hrxq = mlx5_hrxq_drop_new(dev);
 			if (!verbs->hrxq) {
 				rte_flow_error_set
@@ -1717,7 +1716,7 @@
 	LIST_FOREACH(dev_flow, &flow->dev_flows, next) {
 		verbs = &dev_flow->verbs;
 		if (verbs->hrxq) {
-			if (flow->actions & MLX5_FLOW_ACTION_DROP)
+			if (dev_flow->actions & MLX5_FLOW_ACTION_DROP)
 				mlx5_hrxq_drop_release(dev);
 			else
 				mlx5_hrxq_release(dev, verbs->hrxq);
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index a8ff8b2..c39118a 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -2097,6 +2097,32 @@ struct mlx5_rxq_ctrl *
 }
 
 /**
+ * Get a Rx queue type.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param idx
+ *   Rx queue index.
+ *
+ * @return
+ *   The Rx queue type.
+ */
+enum mlx5_rxq_type
+mlx5_rxq_get_type(struct rte_eth_dev *dev, uint16_t idx)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_rxq_ctrl *rxq_ctrl = NULL;
+
+	if ((*priv->rxqs)[idx]) {
+		rxq_ctrl = container_of((*priv->rxqs)[idx],
+					struct mlx5_rxq_ctrl,
+					rxq);
+		return rxq_ctrl->type;
+	}
+	return MLX5_RXQ_TYPE_UNDEFINED;
+}
+
+/**
  * Create an indirection table.
  *
  * @param dev
diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index 271b648..d4ba25f 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -166,6 +166,7 @@ enum mlx5_rxq_obj_type {
 enum mlx5_rxq_type {
 	MLX5_RXQ_TYPE_STANDARD, /* Standard Rx queue. */
 	MLX5_RXQ_TYPE_HAIRPIN, /* Hairpin Rx queue. */
+	MLX5_RXQ_TYPE_UNDEFINED,
 };
 
 /* Verbs/DevX Rx queue elements. */
@@ -406,6 +407,7 @@ struct mlx5_hrxq *mlx5_hrxq_get(struct rte_eth_dev *dev,
 				const uint16_t *queues, uint32_t queues_n);
 int mlx5_hrxq_release(struct rte_eth_dev *dev, struct mlx5_hrxq *hxrq);
 int mlx5_hrxq_verify(struct rte_eth_dev *dev);
+enum mlx5_rxq_type mlx5_rxq_get_type(struct rte_eth_dev *dev, uint16_t idx);
 struct mlx5_hrxq *mlx5_hrxq_drop_new(struct rte_eth_dev *dev);
 void mlx5_hrxq_drop_release(struct rte_eth_dev *dev);
 uint64_t mlx5_get_rx_port_offloads(void);
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 186+ messages in thread

* [dpdk-dev] [PATCH v3 14/14] doc: add hairpin feature
  2019-10-15  9:04 ` [dpdk-dev] [PATCH v3 00/14] " Ori Kam
                     ` (12 preceding siblings ...)
  2019-10-15  9:04   ` [dpdk-dev] [PATCH v3 13/14] net/mlx5: split hairpin flows Ori Kam
@ 2019-10-15  9:04   ` Ori Kam
  13 siblings, 0 replies; 186+ messages in thread
From: Ori Kam @ 2019-10-15  9:04 UTC (permalink / raw)
  To: John McNamara, Marko Kovacevic; +Cc: dev, orika, jingjing.wu, stephen

This commit adds the hairpin feature to the release notes.

Signed-off-by: Ori Kam <orika@mellanox.com>

---
V3:
 - address ML comments.

---
 doc/guides/rel_notes/release_19_11.rst | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/doc/guides/rel_notes/release_19_11.rst b/doc/guides/rel_notes/release_19_11.rst
index cd4e350..2a27cb4 100644
--- a/doc/guides/rel_notes/release_19_11.rst
+++ b/doc/guides/rel_notes/release_19_11.rst
@@ -87,6 +87,11 @@ New Features
 
   Added support for the ``RTE_ETH_DEV_CLOSE_REMOVE`` flag.
 
+* **Added hairpin queue.**
+
+  On supported NICs, we can now set up hairpin queues which offload
+  packets from the wire back to the wire.
+
 
 Removed Items
 -------------
@@ -286,4 +291,5 @@ Tested Platforms
   * Added support for VLAN push flow offload command.
   * Added support for VLAN set PCP offload command.
   * Added support for VLAN set VID offload command.
+  * Added hairpin support.
 
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [dpdk-dev] [PATCH v3 01/14] ethdev: add support for hairpin queue
  2019-10-15  9:04   ` [dpdk-dev] [PATCH v3 01/14] ethdev: add support for hairpin queue Ori Kam
@ 2019-10-15 10:12     ` Andrew Rybchenko
  2019-10-16 19:36       ` Ori Kam
  0 siblings, 1 reply; 186+ messages in thread
From: Andrew Rybchenko @ 2019-10-15 10:12 UTC (permalink / raw)
  To: Ori Kam, Thomas Monjalon, Ferruh Yigit; +Cc: dev, jingjing.wu, stephen

Hi Ori,

On 10/15/19 12:04 PM, Ori Kam wrote:
> This commit introduces the hairpin queue type.
>
> The hairpin queue is built from an Rx queue bound to a Tx queue.
> It is used to offload traffic coming from the wire and redirect it back
> to the wire.
>
> There are 3 new functions:
> - rte_eth_dev_hairpin_capability_get
> - rte_eth_rx_hairpin_queue_setup
> - rte_eth_tx_hairpin_queue_setup
>
> In order to use the queue, there is a need to create rte_flow
> with queue / RSS action that targets one or more of the Rx queues.
>
> Signed-off-by: Ori Kam <orika@mellanox.com>
>
> ---
> V3:
>   - update according to ML comments.
>
> V2:
>   - update according to ML comments.
>
> ---
>   lib/librte_ethdev/rte_ethdev.c           | 260 ++++++++++++++++++++++++++++++-
>   lib/librte_ethdev/rte_ethdev.h           | 143 ++++++++++++++++-
>   lib/librte_ethdev/rte_ethdev_core.h      |  27 +++-
>   lib/librte_ethdev/rte_ethdev_version.map |   5 +
>   4 files changed, 422 insertions(+), 13 deletions(-)
>
> diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
> index af82360..22a97de 100644
> --- a/lib/librte_ethdev/rte_ethdev.c
> +++ b/lib/librte_ethdev/rte_ethdev.c
> @@ -904,10 +904,19 @@ struct rte_eth_dev *
>   
>   	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_start, -ENOTSUP);
>   
> -	if (dev->data->rx_queue_state[rx_queue_id] != RTE_ETH_QUEUE_STATE_STOPPED) {
> -		RTE_ETHDEV_LOG(INFO,
> -			"Queue %"PRIu16" of device with port_id=%"PRIu16" already started\n",
> +	if (dev->data->rx_queue_state[rx_queue_id] ==
> +	    RTE_ETH_QUEUE_STATE_HAIRPIN) {
> +		RTE_ETHDEV_LOG(INFO, "Queue %"PRIu16" of device with "
> +			"port_id=%"PRIu16" is hairpin queue\n",
>   			rx_queue_id, port_id);
> +		return -EINVAL;
> +	}
> +
> +	if (dev->data->rx_queue_state[rx_queue_id] ==
> +	    RTE_ETH_QUEUE_STATE_STARTED) {
> +		RTE_ETHDEV_LOG(INFO, "Queue %"PRIu16" of device with "
> +			"port_id=%"PRIu16" already started\n", rx_queue_id,
> +			port_id);
>   		return 0;
>   	}
>   

You should not touch existing code here. Yes, the line is longer than 80
characters, but fixing coding style is a separate thing.

Also, a format string should not be split across multiple lines, since
that makes it hard to use grep to find it in the sources (i.e. when you
see it in logs and would like to find the corresponding line in the
sources).
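
For example (illustrative only), the first form below is hard to find
with grep, while the second is preferred even though it exceeds 80
characters:

	RTE_ETHDEV_LOG(INFO, "Queue %"PRIu16" of device with "
		"port_id=%"PRIu16" is hairpin queue\n", ...);

	RTE_ETHDEV_LOG(INFO, "Queue %"PRIu16" of device with port_id=%"PRIu16" is hairpin queue\n", ...);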

[snip]

> @@ -1758,6 +1791,92 @@ struct rte_eth_dev *
>   }
>   
>   int
> +rte_eth_rx_hairpin_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
> +			       uint16_t nb_rx_desc,
> +			       const struct rte_eth_hairpin_conf *conf)
> +{
> +	int ret;
> +	struct rte_eth_dev *dev;
> +	struct rte_eth_hairpin_cap cap;
> +	struct rte_eth_dev_info dev_info;
> +	void **rxq;
> +	int i;
> +	int count = 0;
> +
> +	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
> +
> +	dev = &rte_eth_devices[port_id];
> +	if (rx_queue_id >= dev->data->nb_rx_queues) {
> +		RTE_ETHDEV_LOG(ERR, "Invalid RX queue_id=%u\n", rx_queue_id);
> +		return -EINVAL;
> +	}
> +	ret = rte_eth_dev_hairpin_capability_get(port_id, &cap);
> +	if (ret != 0)
> +		return ret;
> +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_infos_get, -ENOTSUP);

There is not necessity to check the pointer here, since it is checked
inside rte_eth_dev_info_get() and error is returned.

> +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_hairpin_queue_setup,
> +				-ENOTSUP);
> +	/* Use default specified by driver, if nb_rx_desc is zero */
> +	if (nb_rx_desc == 0)
> +		nb_rx_desc = cap.max_nb_desc;
> +	if (nb_rx_desc > cap.max_nb_desc) {
> +		RTE_ETHDEV_LOG(ERR,
> +			       "Invalid value for nb_rx_desc(=%hu), should be: "
> +			       "<= %hu", nb_rx_desc, cap.max_nb_desc);
> +		return -EINVAL;
> +	}
> +	ret = rte_eth_dev_info_get(port_id, &dev_info);
> +	if (ret != 0)
> +		return ret;
> +	if (dev->data->dev_started &&
> +		!(dev_info.dev_capa &
> +		  RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP))
> +		return -EBUSY;
> +	if (dev->data->dev_started &&
> +		(dev->data->rx_queue_state[rx_queue_id] !=
> +		 RTE_ETH_QUEUE_STATE_STOPPED))
> +		return -EBUSY;

As I understand it, this does not allow changing the hairpin queue setup
by calling setup once again.

> +	if (conf->peer_n > cap.max_rx_2_tx) {
> +		RTE_ETHDEV_LOG(ERR,
> +			       "Invalid value for number of peers(=%hu), "
> +			       "should be: <= %hu", conf->peer_n,
> +			       cap.max_rx_2_tx);
> +		return -EINVAL;
> +	}
> +	if (conf->peer_n == 0) {
> +		RTE_ETHDEV_LOG(ERR,
> +			       "Invalid value for number of peers(=%hu), "
> +			       "should be: > 0", conf->peer_n);
> +		return -EINVAL;
> +	}
> +	if (cap.max_n_queues != UINT16_MAX) {
> +		for (i = 0; i < dev->data->nb_rx_queues; i++) {
> +			if (dev->data->rx_queue_state[i] ==
> +			    RTE_ETH_QUEUE_STATE_HAIRPIN)
> +				count++;
> +		}
> +		if (count > cap.max_n_queues) {
> +			RTE_ETHDEV_LOG(ERR,
> +				       "Too many Rx hairpin queues %d", count);
> +			return -EINVAL;
> +		}
> +	}
> +	rxq = dev->data->rx_queues;
> +	if (rxq[rx_queue_id] != NULL) {
> +		RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_release,
> +					-ENOTSUP);
> +		(*dev->dev_ops->rx_queue_release)(rxq[rx_queue_id]);
> +		rxq[rx_queue_id] = NULL;
> +	}
> +	ret = (*dev->dev_ops->rx_hairpin_queue_setup)(dev, rx_queue_id,
> +						      nb_rx_desc, conf);
> +	if (ret == 0)
> +		dev->data->rx_queue_state[rx_queue_id] =
> +			RTE_ETH_QUEUE_STATE_HAIRPIN;
> +	return eth_err(port_id, ret);
> +}
> +
> +int
>   rte_eth_tx_queue_setup(uint16_t port_id, uint16_t tx_queue_id,
>   		       uint16_t nb_tx_desc, unsigned int socket_id,
>   		       const struct rte_eth_txconf *tx_conf)
> @@ -1851,9 +1970,92 @@ struct rte_eth_dev *
>   			__func__);
>   		return -EINVAL;
>   	}
> +	ret = (*dev->dev_ops->tx_queue_setup)(dev, tx_queue_id, nb_tx_desc,
> +					      socket_id, &local_conf);
> +	return eth_err(port_id, ret);
> +}

Unrelated change

>   
> -	return eth_err(port_id, (*dev->dev_ops->tx_queue_setup)(dev,
> -		       tx_queue_id, nb_tx_desc, socket_id, &local_conf));
> +int
> +rte_eth_tx_hairpin_queue_setup(uint16_t port_id, uint16_t tx_queue_id,
> +			       uint16_t nb_tx_desc,
> +			       const struct rte_eth_hairpin_conf *conf)
> +{
> +	struct rte_eth_dev *dev;
> +	struct rte_eth_hairpin_cap cap;
> +	struct rte_eth_dev_info dev_info;
> +	void **txq;
> +	int i;
> +	int count = 0;
> +	int ret;
> +
> +	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
> +	dev = &rte_eth_devices[port_id];
> +	if (tx_queue_id >= dev->data->nb_tx_queues) {
> +		RTE_ETHDEV_LOG(ERR, "Invalid TX queue_id=%u\n", tx_queue_id);
> +		return -EINVAL;
> +	}
> +	ret = rte_eth_dev_hairpin_capability_get(port_id, &cap);
> +	if (ret != 0)
> +		return ret;
> +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_infos_get, -ENOTSUP);

There is not necessity to check the pointer here, since it is checked
inside rte_eth_dev_info_get() and error is returned.

> +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_hairpin_queue_setup,
> +				-ENOTSUP);
> +	rte_eth_dev_info_get(port_id, &dev_info);


Please, check return status.

> +	/* Use default specified by driver, if nb_tx_desc is zero */
> +	if (nb_tx_desc == 0)
> +		nb_tx_desc = cap.max_nb_desc;
> +	if (nb_tx_desc > cap.max_nb_desc) {
> +		RTE_ETHDEV_LOG(ERR,
> +			       "Invalid value for nb_tx_desc(=%hu), should be: "
> +			       "<= %hu", nb_tx_desc, cap.max_nb_desc);
> +		return -EINVAL;
> +	}
> +	if (conf->peer_n > cap.max_tx_2_rx) {
> +		RTE_ETHDEV_LOG(ERR,
> +			       "Invalid value for number of peers(=%hu), "
> +			       "should be: <= %hu", conf->peer_n,
> +			       cap.max_tx_2_rx);
> +		return -EINVAL;
> +	}
> +	if (conf->peer_n == 0) {
> +		RTE_ETHDEV_LOG(ERR,
> +			       "Invalid value for number of peers(=%hu), "
> +			       "should be: > 0", conf->peer_n);
> +		return -EINVAL;
> +	}
> +	if (cap.max_n_queues != UINT16_MAX) {
> +		for (i = 0; i < dev->data->nb_tx_queues; i++) {
> +			if (dev->data->tx_queue_state[i] ==
> +			    RTE_ETH_QUEUE_STATE_HAIRPIN)
> +				count++;
> +		}
> +		if (count > cap.max_n_queues) {
> +			RTE_ETHDEV_LOG(ERR,
> +				       "Too many Tx hairpin queues %d", count);
> +			return -EINVAL;
> +		}
> +	}

I don't understand why the order of checks differs above and here.

> +	if (dev->data->dev_started &&
> +		!(dev_info.dev_capa &
> +		  RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP))
> +		return -EBUSY;
> +	if (dev->data->dev_started &&
> +		(dev->data->tx_queue_state[tx_queue_id] !=
> +		 RTE_ETH_QUEUE_STATE_STOPPED))
> +		return -EBUSY;

As I understand it, this does not allow changing the hairpin queue setup
by calling setup once again.

> +	txq = dev->data->tx_queues;
> +	if (txq[tx_queue_id] != NULL) {
> +		RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_queue_release,
> +					-ENOTSUP);
> +		(*dev->dev_ops->tx_queue_release)(txq[tx_queue_id]);
> +		txq[tx_queue_id] = NULL;
> +	}
> +	ret = (*dev->dev_ops->tx_hairpin_queue_setup)
> +		(dev, tx_queue_id, nb_tx_desc, conf);
> +	if (ret == 0)
> +		dev->data->tx_queue_state[tx_queue_id] =
> +			RTE_ETH_QUEUE_STATE_HAIRPIN;
> +	return eth_err(port_id, ret);
>   }
>   
>   void

[snip]

> diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
> index d937fb4..51843c1 100644
> --- a/lib/librte_ethdev/rte_ethdev.h
> +++ b/lib/librte_ethdev/rte_ethdev.h

[snip]

> @@ -1277,6 +1317,7 @@ struct rte_eth_dcb_info {
>    */
>   #define RTE_ETH_QUEUE_STATE_STOPPED 0
>   #define RTE_ETH_QUEUE_STATE_STARTED 1
> +#define RTE_ETH_QUEUE_STATE_HAIRPIN 2

See my notes below.
Also, maybe out of scope of this review, but I'd move these defines
out of the public header to rte_ethdev_driver.h in a separate patch.

>   #define RTE_ETH_ALL RTE_MAX_ETHPORTS
>   
> @@ -1771,6 +1812,36 @@ int rte_eth_rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
>   		struct rte_mempool *mb_pool);
>   
>   /**
> + * @warning
> + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
> + *
> + * Allocate and set up a hairpin receive queue for an Ethernet device.
> + *
> + * The function set up the selected queue to be used in hairpin.
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param rx_queue_id
> + *   The index of the receive queue to set up.
> + *   The value must be in the range [0, nb_rx_queue - 1] previously supplied
> + *   to rte_eth_dev_configure().
> + * @param nb_rx_desc
> + *   The number of receive descriptors to allocate for the receive ring.
> + *   0 means the PMD will use default value.
> + * @param conf
> + *   The pointer to the hairpin configuration.

There is an empty line between the parameters and the return description
below, but it is missing here. It should be the same in both places, and
I'd prefer to have the empty line to make it easier to read.

> + * @return
> + *   - (0) if successful.
> + *   - (-ENOTSUP) if hardware doesn't support.
> + *   - (-EINVAL) if bad parameter.
> + *   - (-ENOMEM) if unable to allocate the resources.
> + */
> +__rte_experimental
> +int rte_eth_rx_hairpin_queue_setup
> +	(uint16_t port_id, uint16_t rx_queue_id, uint16_t nb_rx_desc,
> +	 const struct rte_eth_hairpin_conf *conf);
> +
> +/**
>    * Allocate and set up a transmit queue for an Ethernet device.
>    *
>    * @param port_id
> @@ -1823,6 +1894,35 @@ int rte_eth_tx_queue_setup(uint16_t port_id, uint16_t tx_queue_id,
>   		const struct rte_eth_txconf *tx_conf);
>   
>   /**
> + * @warning
> + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
> + *
> + * Allocate and set up a transmit hairpin queue for an Ethernet device.
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param tx_queue_id
> + *   The index of the transmit queue to set up.
> + *   The value must be in the range [0, nb_tx_queue - 1] previously supplied
> + *   to rte_eth_dev_configure().
> + * @param nb_tx_desc
> + *   The number of transmit descriptors to allocate for the transmit ring.
> + *   0 to set default PMD value.
> + * @param conf
> + *   The hairpin configuration.
> + *
> + * @return
> + *   - (0) if successful.
> + *   - (-ENOTSUP) if hardware doesn't support.
> + *   - (-EINVAL) if bad parameter.
> + *   - (-ENOMEM) if unable to allocate the resources.
> + */
> +__rte_experimental
> +int rte_eth_tx_hairpin_queue_setup
> +	(uint16_t port_id, uint16_t tx_queue_id, uint16_t nb_tx_desc,
> +	 const struct rte_eth_hairpin_conf *conf);
> +
> +/**
>    * Return the NUMA socket to which an Ethernet device is connected
>    *
>    * @param port_id

[snip]

> diff --git a/lib/librte_ethdev/rte_ethdev_core.h b/lib/librte_ethdev/rte_ethdev_core.h
> index dcb5ae6..ef46e71 100644
> --- a/lib/librte_ethdev/rte_ethdev_core.h
> +++ b/lib/librte_ethdev/rte_ethdev_core.h
> @@ -250,6 +250,12 @@ typedef int (*eth_rx_queue_setup_t)(struct rte_eth_dev *dev,
>   				    struct rte_mempool *mb_pool);
>   /**< @internal Set up a receive queue of an Ethernet device. */
>   
> +typedef int (*eth_rx_hairpin_queue_setup_t)
> +	(struct rte_eth_dev *dev, uint16_t rx_queue_id,
> +	 uint16_t nb_rx_desc,
> +	 const struct rte_eth_hairpin_conf *conf);
> +/**< @internal Set up a receive hairpin queue of an Ethernet device. */
> +

Please write down a full description, similar to eth_promiscuous_enable_t,
before the typedef. Don't forget about listing the return values.

>   typedef int (*eth_tx_queue_setup_t)(struct rte_eth_dev *dev,
>   				    uint16_t tx_queue_id,
>   				    uint16_t nb_tx_desc,
> @@ -257,6 +263,12 @@ typedef int (*eth_tx_queue_setup_t)(struct rte_eth_dev *dev,
>   				    const struct rte_eth_txconf *tx_conf);
>   /**< @internal Setup a transmit queue of an Ethernet device. */
>   
> +typedef int (*eth_tx_hairpin_queue_setup_t)
> +	(struct rte_eth_dev *dev, uint16_t tx_queue_id,
> +	 uint16_t nb_tx_desc,
> +	 const struct rte_eth_hairpin_conf *hairpin_conf);
> +/**< @internal Setup a transmit hairpin queue of an Ethernet device. */
> +


Please write down a full description, similar to eth_promiscuous_enable_t,
before the typedef. Don't forget about listing the return values.

>   typedef int (*eth_rx_enable_intr_t)(struct rte_eth_dev *dev,
>   				    uint16_t rx_queue_id);
>   /**< @internal Enable interrupt of a receive queue of an Ethernet device. */
> @@ -505,6 +517,10 @@ typedef int (*eth_pool_ops_supported_t)(struct rte_eth_dev *dev,
>   						const char *pool);
>   /**< @internal Test if a port supports specific mempool ops */
>   
> +typedef int (*eth_hairpin_cap_get_t)(struct rte_eth_dev *dev,
> +				     struct rte_eth_hairpin_cap *cap);
> +/**< @internal get the hairpin capabilities. */
> +

Please write down a full description, similar to eth_promiscuous_enable_t,
before the typedef. Don't forget about listing the return values.

If you reorder the functions as suggested below, the hairpin queue setup
typedefs should be defined here.

>   /**
>    * @internal A structure containing the functions exported by an Ethernet driver.
>    */
> @@ -557,6 +573,8 @@ struct eth_dev_ops {
>   	eth_queue_start_t          tx_queue_start;/**< Start TX for a queue. */
>   	eth_queue_stop_t           tx_queue_stop; /**< Stop TX for a queue. */
>   	eth_rx_queue_setup_t       rx_queue_setup;/**< Set up device RX queue. */
> +	eth_rx_hairpin_queue_setup_t rx_hairpin_queue_setup;
> +	/**< Set up device RX hairpin queue. */
>   	eth_queue_release_t        rx_queue_release; /**< Release RX queue. */
>   	eth_rx_queue_count_t       rx_queue_count;
>   	/**< Get the number of used RX descriptors. */
> @@ -568,6 +586,8 @@ struct eth_dev_ops {
>   	eth_rx_enable_intr_t       rx_queue_intr_enable;  /**< Enable Rx queue interrupt. */
>   	eth_rx_disable_intr_t      rx_queue_intr_disable; /**< Disable Rx queue interrupt. */
>   	eth_tx_queue_setup_t       tx_queue_setup;/**< Set up device TX queue. */
> +	eth_tx_hairpin_queue_setup_t tx_hairpin_queue_setup;
> +	/**< Set up device TX hairpin queue. */
>   	eth_queue_release_t        tx_queue_release; /**< Release TX queue. */
>   	eth_tx_done_cleanup_t      tx_done_cleanup;/**< Free tx ring mbufs */
>   
> @@ -639,6 +659,9 @@ struct eth_dev_ops {
>   
>   	eth_pool_ops_supported_t pool_ops_supported;
>   	/**< Test if a port supports specific mempool ops */
> +
> +	eth_hairpin_cap_get_t hairpin_cap_get;
> +	/**< Returns the hairpin capabilities. */

May I suggest putting the hairpin queue setup functions here.
It would group the hairpin-related functions together.

>   };
>   
>   /**
> @@ -746,9 +769,9 @@ struct rte_eth_dev_data {
>   		dev_started : 1,   /**< Device state: STARTED(1) / STOPPED(0). */
>   		lro         : 1;   /**< RX LRO is ON(1) / OFF(0) */
>   	uint8_t rx_queue_state[RTE_MAX_QUEUES_PER_PORT];
> -			/**< Queues state: STARTED(1) / STOPPED(0). */
> +		/**< Queues state: HAIRPIN(2) / STARTED(1) / STOPPED(0). */
>   	uint8_t tx_queue_state[RTE_MAX_QUEUES_PER_PORT];
> -			/**< Queues state: STARTED(1) / STOPPED(0). */
> +		/**< Queues state: HAIRPIN(2) STARTED(1) / STOPPED(0). */

/ is missing above after HAIRPIN(2).
In fact there is no point in duplicating the values in parentheses, but
that is out of scope of this review.

I'm not 100% happy that it makes it impossible to mark hairpin queues
as started/stopped. It is not that important right now, but maybe it is
better to use the state as a bit field: bit 0 - stopped/started,
bit 1 - regular/hairpin. Anyway, it is an internal interface.
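
A sketch of that encoding (illustrative only, not part of this patch):

	#define RTE_ETH_QUEUE_STATE_STARTED_BIT (1u << 0) /* stopped(0)/started(1) */
	#define RTE_ETH_QUEUE_STATE_HAIRPIN_BIT (1u << 1) /* regular(0)/hairpin(1) */

	/* A started hairpin queue would then be (STARTED_BIT | HAIRPIN_BIT). */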

[snip]


^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [dpdk-dev] [PATCH v3 01/14] ethdev: add support for hairpin queue
  2019-10-15 10:12     ` Andrew Rybchenko
@ 2019-10-16 19:36       ` Ori Kam
  2019-10-17 10:41         ` Andrew Rybchenko
  0 siblings, 1 reply; 186+ messages in thread
From: Ori Kam @ 2019-10-16 19:36 UTC (permalink / raw)
  To: Andrew Rybchenko, Thomas Monjalon, Ferruh Yigit; +Cc: dev, jingjing.wu, stephen

Hi Andrew,

Thanks again for your time.

PSB, 
Ori


> -----Original Message-----
> From: Andrew Rybchenko <arybchenko@solarflare.com>
> Sent: Tuesday, October 15, 2019 1:12 PM
> To: Ori Kam <orika@mellanox.com>; Thomas Monjalon
> <thomas@monjalon.net>; Ferruh Yigit <ferruh.yigit@intel.com>
> Cc: dev@dpdk.org; jingjing.wu@intel.com; stephen@networkplumber.org
> Subject: Re: [PATCH v3 01/14] ethdev: add support for hairpin queue
> 
> Hi Ori,
> 
> On 10/15/19 12:04 PM, Ori Kam wrote:
> > This commit introduces the hairpin queue type.
> >
> > The hairpin queue is built from an Rx queue bound to a Tx queue.
> > It is used to offload traffic coming from the wire and redirect it back
> > to the wire.
> >
> > There are 3 new functions:
> > - rte_eth_dev_hairpin_capability_get
> > - rte_eth_rx_hairpin_queue_setup
> > - rte_eth_tx_hairpin_queue_setup
> >
> > In order to use the queue, there is a need to create rte_flow
> > with queue / RSS action that targets one or more of the Rx queues.
> >
> > Signed-off-by: Ori Kam <orika@mellanox.com>
> >
> > ---
> > V3:
> >   - update according to ML comments.
> >
> > V2:
> >   - update according to ML comments.
> >
> > ---
> >   lib/librte_ethdev/rte_ethdev.c           | 260
> ++++++++++++++++++++++++++++++-
> >   lib/librte_ethdev/rte_ethdev.h           | 143 ++++++++++++++++-
> >   lib/librte_ethdev/rte_ethdev_core.h      |  27 +++-
> >   lib/librte_ethdev/rte_ethdev_version.map |   5 +
> >   4 files changed, 422 insertions(+), 13 deletions(-)
> >
> > diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
> > index af82360..22a97de 100644
> > --- a/lib/librte_ethdev/rte_ethdev.c
> > +++ b/lib/librte_ethdev/rte_ethdev.c
> > @@ -904,10 +904,19 @@ struct rte_eth_dev *
> >
> >   	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_start, -
> ENOTSUP);
> >
> > -	if (dev->data->rx_queue_state[rx_queue_id] !=
> RTE_ETH_QUEUE_STATE_STOPPED) {
> > -		RTE_ETHDEV_LOG(INFO,
> > -			"Queue %"PRIu16" of device with port_id=%"PRIu16"
> already started\n",
> > +	if (dev->data->rx_queue_state[rx_queue_id] ==
> > +	    RTE_ETH_QUEUE_STATE_HAIRPIN) {
> > +		RTE_ETHDEV_LOG(INFO, "Queue %"PRIu16" of device with "
> > +			"port_id=%"PRIu16" is hairpin queue\n",
> >   			rx_queue_id, port_id);
> > +		return -EINVAL;
> > +	}
> > +
> > +	if (dev->data->rx_queue_state[rx_queue_id] ==
> > +	    RTE_ETH_QUEUE_STATE_STARTED) {
> > +		RTE_ETHDEV_LOG(INFO, "Queue %"PRIu16" of device with "
> > +			"port_id=%"PRIu16" already started\n", rx_queue_id,
> > +			port_id);
> >   		return 0;
> >   	}
> >
> 
> You should not touch existing code here. Yes, line is longer than 80
> symbols,
> but fixing codding style is a separate thing.
> 

The code style change was made since I needed to change the if statement
from the original != stopped to == started. But if you prefer, I can undo
this change.

> Also format string should not be split into many lines since it
> makes it hard to use grep to find it in sources (i.e. when you
> see it in longs and would like to find corresponding line in
> sources).
> 

O.K. by me; I'm used to mlx5, where line length is the more important thing.

> [snip]
> 
> > @@ -1758,6 +1791,92 @@ struct rte_eth_dev *
> >   }
> >
> >   int
> > +rte_eth_rx_hairpin_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
> > +			       uint16_t nb_rx_desc,
> > +			       const struct rte_eth_hairpin_conf *conf)
> > +{
> > +	int ret;
> > +	struct rte_eth_dev *dev;
> > +	struct rte_eth_hairpin_cap cap;
> > +	struct rte_eth_dev_info dev_info;
> > +	void **rxq;
> > +	int i;
> > +	int count = 0;
> > +
> > +	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
> > +
> > +	dev = &rte_eth_devices[port_id];
> > +	if (rx_queue_id >= dev->data->nb_rx_queues) {
> > +		RTE_ETHDEV_LOG(ERR, "Invalid RX queue_id=%u\n",
> rx_queue_id);
> > +		return -EINVAL;
> > +	}
> > +	ret = rte_eth_dev_hairpin_capability_get(port_id, &cap);
> > +	if (ret != 0)
> > +		return ret;
> > +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_infos_get, -
> ENOTSUP);
> 
> There is not necessity to check the pointer here, since it is checked
> inside rte_eth_dev_info_get() and error is returned.
> 

O.K.

> > +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops-
> >rx_hairpin_queue_setup,
> > +				-ENOTSUP);
> > +	/* Use default specified by driver, if nb_rx_desc is zero */
> > +	if (nb_rx_desc == 0)
> > +		nb_rx_desc = cap.max_nb_desc;
> > +	if (nb_rx_desc > cap.max_nb_desc) {
> > +		RTE_ETHDEV_LOG(ERR,
> > +			       "Invalid value for nb_rx_desc(=%hu), should be: "
> > +			       "<= %hu", nb_rx_desc, cap.max_nb_desc);
> > +		return -EINVAL;
> > +	}
> > +	ret = rte_eth_dev_info_get(port_id, &dev_info);
> > +	if (ret != 0)
> > +		return ret;
> > +	if (dev->data->dev_started &&
> > +		!(dev_info.dev_capa &
> > +		  RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP))
> > +		return -EBUSY;
> > +	if (dev->data->dev_started &&
> > +		(dev->data->rx_queue_state[rx_queue_id] !=
> > +		 RTE_ETH_QUEUE_STATE_STOPPED))
> > +		return -EBUSY;
> 
> As I understand it, this does not allow changing the hairpin queue setup
> by calling setup once again.
> 

I will replace both of the above if cases with a single check that returns an
error if the device is started, since we can't support changing the hairpin
setup while the device is running.
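
Concretely, the two guards collapse into the single check that appears in v4
below:

	if (dev->data->dev_started)
		return -EBUSY;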

> > +	if (conf->peer_n > cap.max_rx_2_tx) {
> > +		RTE_ETHDEV_LOG(ERR,
> > +			       "Invalid value for number of peers(=%hu), "
> > +			       "should be: <= %hu", conf->peer_n,
> > +			       cap.max_rx_2_tx);
> > +		return -EINVAL;
> > +	}
> > +	if (conf->peer_n == 0) {
> > +		RTE_ETHDEV_LOG(ERR,
> > +			       "Invalid value for number of peers(=%hu), "
> > +			       "should be: > 0", conf->peer_n);
> > +		return -EINVAL;
> > +	}
> > +	if (cap.max_n_queues != UINT16_MAX) {
> > +		for (i = 0; i < dev->data->nb_rx_queues; i++) {
> > +			if (dev->data->rx_queue_state[i] ==
> > +			    RTE_ETH_QUEUE_STATE_HAIRPIN)
> > +				count++;
> > +		}
> > +		if (count > cap.max_n_queues) {
> > +			RTE_ETHDEV_LOG(ERR,
> > +				       "Too many Rx hairpin queues %d", count);
> > +			return -EINVAL;
> > +		}
> > +	}
> > +	rxq = dev->data->rx_queues;
> > +	if (rxq[rx_queue_id] != NULL) {
> > +		RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_release,
> > +					-ENOTSUP);
> > +		(*dev->dev_ops->rx_queue_release)(rxq[rx_queue_id]);
> > +		rxq[rx_queue_id] = NULL;
> > +	}
> > +	ret = (*dev->dev_ops->rx_hairpin_queue_setup)(dev, rx_queue_id,
> > +						      nb_rx_desc, conf);
> > +	if (ret == 0)
> > +		dev->data->rx_queue_state[rx_queue_id] =
> > +			RTE_ETH_QUEUE_STATE_HAIRPIN;
> > +	return eth_err(port_id, ret);
> > +}
> > +
> > +int
> >   rte_eth_tx_queue_setup(uint16_t port_id, uint16_t tx_queue_id,
> >   		       uint16_t nb_tx_desc, unsigned int socket_id,
> >   		       const struct rte_eth_txconf *tx_conf)
> > @@ -1851,9 +1970,92 @@ struct rte_eth_dev *
> >   			__func__);
> >   		return -EINVAL;
> >   	}
> > +	ret = (*dev->dev_ops->tx_queue_setup)(dev, tx_queue_id, nb_tx_desc,
> > +					      socket_id, &local_conf);
> > +	return eth_err(port_id, ret);
> > +}
> 
> Unrelated change
> 

O.K. will remove.

> >
> > -	return eth_err(port_id, (*dev->dev_ops->tx_queue_setup)(dev,
> > -		       tx_queue_id, nb_tx_desc, socket_id, &local_conf));
> > +int
> > +rte_eth_tx_hairpin_queue_setup(uint16_t port_id, uint16_t tx_queue_id,
> > +			       uint16_t nb_tx_desc,
> > +			       const struct rte_eth_hairpin_conf *conf)
> > +{
> > +	struct rte_eth_dev *dev;
> > +	struct rte_eth_hairpin_cap cap;
> > +	struct rte_eth_dev_info dev_info;
> > +	void **txq;
> > +	int i;
> > +	int count = 0;
> > +	int ret;
> > +
> > +	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
> > +	dev = &rte_eth_devices[port_id];
> > +	if (tx_queue_id >= dev->data->nb_tx_queues) {
> > +		RTE_ETHDEV_LOG(ERR, "Invalid TX queue_id=%u\n", tx_queue_id);
> > +		return -EINVAL;
> > +	}
> > +	ret = rte_eth_dev_hairpin_capability_get(port_id, &cap);
> > +	if (ret != 0)
> > +		return ret;
> > +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_infos_get, -ENOTSUP);
> 
> There is no necessity to check the pointer here, since it is checked
> inside rte_eth_dev_info_get() and an error is returned.
> 

O.K. will remove function call.

> > +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_hairpin_queue_setup,
> > +				-ENOTSUP);
> > +	rte_eth_dev_info_get(port_id, &dev_info);
> 
> 
> Please, check return status.
> 

Sure, will add check.

> > +	/* Use default specified by driver, if nb_tx_desc is zero */
> > +	if (nb_tx_desc == 0)
> > +		nb_tx_desc = cap.max_nb_desc;
> > +	if (nb_tx_desc > cap.max_nb_desc) {
> > +		RTE_ETHDEV_LOG(ERR,
> > +			       "Invalid value for nb_tx_desc(=%hu), should be: "
> > +			       "<= %hu", nb_tx_desc, cap.max_nb_desc);
> > +		return -EINVAL;
> > +	}
> > +	if (conf->peer_n > cap.max_tx_2_rx) {
> > +		RTE_ETHDEV_LOG(ERR,
> > +			       "Invalid value for number of peers(=%hu), "
> > +			       "should be: <= %hu", conf->peer_n,
> > +			       cap.max_tx_2_rx);
> > +		return -EINVAL;
> > +	}
> > +	if (conf->peer_n == 0) {
> > +		RTE_ETHDEV_LOG(ERR,
> > +			       "Invalid value for number of peers(=%hu), "
> > +			       "should be: > 0", conf->peer_n);
> > +		return -EINVAL;
> > +	}
> > +	if (cap.max_n_queues != UINT16_MAX) {
> > +		for (i = 0; i < dev->data->nb_tx_queues; i++) {
> > +			if (dev->data->tx_queue_state[i] ==
> > +			    RTE_ETH_QUEUE_STATE_HAIRPIN)
> > +				count++;
> > +		}
> > +		if (count > cap.max_n_queues) {
> > +			RTE_ETHDEV_LOG(ERR,
> > +				       "Too many Tx hairpin queues %d", count);
> > +			return -EINVAL;
> > +		}
> > +	}
> 
> I don't understand why the order of checks differs above and here.
> 

Will align the order.

> > +	if (dev->data->dev_started &&
> > +		!(dev_info.dev_capa &
> > +		  RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP))
> > +		return -EBUSY;
> > +	if (dev->data->dev_started &&
> > +		(dev->data->tx_queue_state[tx_queue_id] !=
> > +		 RTE_ETH_QUEUE_STATE_STOPPED))
> > +		return -EBUSY;
> 
> As I understand it, this does not allow changing the hairpin queue setup
> by calling setup once again.
> 

Like above, I will check only whether the device is started, since we can't
update a hairpin queue while the device is started.

> > +	txq = dev->data->tx_queues;
> > +	if (txq[tx_queue_id] != NULL) {
> > +		RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_queue_release,
> > +					-ENOTSUP);
> > +		(*dev->dev_ops->tx_queue_release)(txq[tx_queue_id]);
> > +		txq[tx_queue_id] = NULL;
> > +	}
> > +	ret = (*dev->dev_ops->tx_hairpin_queue_setup)
> > +		(dev, tx_queue_id, nb_tx_desc, conf);
> > +	if (ret == 0)
> > +		dev->data->tx_queue_state[tx_queue_id] =
> > +			RTE_ETH_QUEUE_STATE_HAIRPIN;
> > +	return eth_err(port_id, ret);
> >   }
> >
> >   void
> 
> [snip]
> 
> > diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
> > index d937fb4..51843c1 100644
> > --- a/lib/librte_ethdev/rte_ethdev.h
> > +++ b/lib/librte_ethdev/rte_ethdev.h
> 
> [snip]
> 
> > @@ -1277,6 +1317,7 @@ struct rte_eth_dcb_info {
> >    */
> >   #define RTE_ETH_QUEUE_STATE_STOPPED 0
> >   #define RTE_ETH_QUEUE_STATE_STARTED 1
> > +#define RTE_ETH_QUEUE_STATE_HAIRPIN 2
> 
> See my notes below.
> Also, maybe out of scope of the review, but
> I'd move these defines out of the public header to rte_ethdev_driver.h
> in a separate patch.
>

O.K. I will add a commit to move them.
 
> >   #define RTE_ETH_ALL RTE_MAX_ETHPORTS
> >
> > @@ -1771,6 +1812,36 @@ int rte_eth_rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
> >   		struct rte_mempool *mb_pool);
> >
> >   /**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
> > + *
> > + * Allocate and set up a hairpin receive queue for an Ethernet device.
> > + *
> > + * The function sets up the selected queue to be used in hairpin mode.
> > + *
> > + * @param port_id
> > + *   The port identifier of the Ethernet device.
> > + * @param rx_queue_id
> > + *   The index of the receive queue to set up.
> > + *   The value must be in the range [0, nb_rx_queue - 1] previously supplied
> > + *   to rte_eth_dev_configure().
> > + * @param nb_rx_desc
> > + *   The number of receive descriptors to allocate for the receive ring.
> > + *   0 means the PMD will use default value.
> > + * @param conf
> > + *   The pointer to the hairpin configuration.
> 
> There is an empty line between the parameters and the return description
> below, but it is missing here. It should be the same in both places, and
> I'd prefer to have the empty line to make it easier to read.
> 

This is how rte_eth_dev_info_get looks. Just following your request 😊
I think the empty line is better, so I will add it.

> > + * @return
> > + *   - (0) if successful.
> > + *   - (-ENOTSUP) if hardware doesn't support.
> > + *   - (-EINVAL) if bad parameter.
> > + *   - (-ENOMEM) if unable to allocate the resources.
> > + */
> > +__rte_experimental
> > +int rte_eth_rx_hairpin_queue_setup
> > +	(uint16_t port_id, uint16_t rx_queue_id, uint16_t nb_rx_desc,
> > +	 const struct rte_eth_hairpin_conf *conf);
> > +
> > +/**
> >    * Allocate and set up a transmit queue for an Ethernet device.
> >    *
> >    * @param port_id
> > @@ -1823,6 +1894,35 @@ int rte_eth_tx_queue_setup(uint16_t port_id, uint16_t tx_queue_id,
> >   		const struct rte_eth_txconf *tx_conf);
> >
> >   /**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
> > + *
> > + * Allocate and set up a transmit hairpin queue for an Ethernet device.
> > + *
> > + * @param port_id
> > + *   The port identifier of the Ethernet device.
> > + * @param tx_queue_id
> > + *   The index of the transmit queue to set up.
> > + *   The value must be in the range [0, nb_tx_queue - 1] previously supplied
> > + *   to rte_eth_dev_configure().
> > + * @param nb_tx_desc
> > + *   The number of transmit descriptors to allocate for the transmit ring.
> > + *   0 to set default PMD value.
> > + * @param conf
> > + *   The hairpin configuration.
> > + *
> > + * @return
> > + *   - (0) if successful.
> > + *   - (-ENOTSUP) if hardware doesn't support.
> > + *   - (-EINVAL) if bad parameter.
> > + *   - (-ENOMEM) if unable to allocate the resources.
> > + */
> > +__rte_experimental
> > +int rte_eth_tx_hairpin_queue_setup
> > +	(uint16_t port_id, uint16_t tx_queue_id, uint16_t nb_tx_desc,
> > +	 const struct rte_eth_hairpin_conf *conf);
> > +
> > +/**
> >    * Return the NUMA socket to which an Ethernet device is connected
> >    *
> >    * @param port_id
> 
> [snip]
> 
> > diff --git a/lib/librte_ethdev/rte_ethdev_core.h b/lib/librte_ethdev/rte_ethdev_core.h
> > index dcb5ae6..ef46e71 100644
> > --- a/lib/librte_ethdev/rte_ethdev_core.h
> > +++ b/lib/librte_ethdev/rte_ethdev_core.h
> > @@ -250,6 +250,12 @@ typedef int (*eth_rx_queue_setup_t)(struct rte_eth_dev *dev,
> >   				    struct rte_mempool *mb_pool);
> >   /**< @internal Set up a receive queue of an Ethernet device. */
> >
> > +typedef int (*eth_rx_hairpin_queue_setup_t)
> > +	(struct rte_eth_dev *dev, uint16_t rx_queue_id,
> > +	 uint16_t nb_rx_desc,
> > +	 const struct rte_eth_hairpin_conf *conf);
> > +/**< @internal Set up a receive hairpin queue of an Ethernet device. */
> > +
> 
> Please write down a full description, similar to eth_promiscuous_enable_t,
> before the typedef. Don't forget to list the return values.
> 

O.K. will add description.

> >   typedef int (*eth_tx_queue_setup_t)(struct rte_eth_dev *dev,
> >   				    uint16_t tx_queue_id,
> >   				    uint16_t nb_tx_desc,
> > @@ -257,6 +263,12 @@ typedef int (*eth_tx_queue_setup_t)(struct rte_eth_dev *dev,
> >   				    const struct rte_eth_txconf *tx_conf);
> >   /**< @internal Setup a transmit queue of an Ethernet device. */
> >
> > +typedef int (*eth_tx_hairpin_queue_setup_t)
> > +	(struct rte_eth_dev *dev, uint16_t tx_queue_id,
> > +	 uint16_t nb_tx_desc,
> > +	 const struct rte_eth_hairpin_conf *hairpin_conf);
> > +/**< @internal Setup a transmit hairpin queue of an Ethernet device. */
> > +
> 
> 
> Please write down a full description, similar to eth_promiscuous_enable_t,
> before the typedef. Don't forget to list the return values.
> 
O.K. will add description.

> >   typedef int (*eth_rx_enable_intr_t)(struct rte_eth_dev *dev,
> >   				    uint16_t rx_queue_id);
> >   /**< @internal Enable interrupt of a receive queue of an Ethernet device. */
> > @@ -505,6 +517,10 @@ typedef int (*eth_pool_ops_supported_t)(struct rte_eth_dev *dev,
> >   						const char *pool);
> >   /**< @internal Test if a port supports specific mempool ops */
> >
> > +typedef int (*eth_hairpin_cap_get_t)(struct rte_eth_dev *dev,
> > +				     struct rte_eth_hairpin_cap *cap);
> > +/**< @internal get the hairpin capabilities. */
> > +
> 
> Please write down a full description, similar to eth_promiscuous_enable_t,
> before the typedef. Don't forget to list the return values.
>

O.K. will add description.

 
> If you reorder the functions as suggested below, the hairpin queue setup
> typedefs should be defined here.
> 
> >   /**
> >    * @internal A structure containing the functions exported by an Ethernet driver.
> >    */
> > @@ -557,6 +573,8 @@ struct eth_dev_ops {
> >   	eth_queue_start_t          tx_queue_start;/**< Start TX for a queue. */
> >   	eth_queue_stop_t           tx_queue_stop; /**< Stop TX for a queue. */
> >   	eth_rx_queue_setup_t       rx_queue_setup;/**< Set up device RX queue. */
> > +	eth_rx_hairpin_queue_setup_t rx_hairpin_queue_setup;
> > +	/**< Set up device RX hairpin queue. */
> >   	eth_queue_release_t        rx_queue_release; /**< Release RX queue. */
> >   	eth_rx_queue_count_t       rx_queue_count;
> >   	/**< Get the number of used RX descriptors. */
> > @@ -568,6 +586,8 @@ struct eth_dev_ops {
> >   	eth_rx_enable_intr_t       rx_queue_intr_enable;  /**< Enable Rx queue interrupt. */
> >   	eth_rx_disable_intr_t      rx_queue_intr_disable; /**< Disable Rx queue interrupt. */
> >   	eth_tx_queue_setup_t       tx_queue_setup;/**< Set up device TX queue. */
> > +	eth_tx_hairpin_queue_setup_t tx_hairpin_queue_setup;
> > +	/**< Set up device TX hairpin queue. */
> >   	eth_queue_release_t        tx_queue_release; /**< Release TX queue. */
> >   	eth_tx_done_cleanup_t      tx_done_cleanup;/**< Free tx ring mbufs */
> >
> > @@ -639,6 +659,9 @@ struct eth_dev_ops {
> >
> >   	eth_pool_ops_supported_t pool_ops_supported;
> >   	/**< Test if a port supports specific mempool ops */
> > +
> > +	eth_hairpin_cap_get_t hairpin_cap_get;
> > +	/**< Returns the hairpin capabilities. */
> 
> May I suggest putting the hairpin queue setup functions here?
> It will group the hairpin related functions together.
> 

I don't mind; both make sense to me (grouping the setup functions or grouping
the hairpin functions), but I will follow your suggestion.
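
For reference, the grouped layout as it lands in v4 below:

	eth_hairpin_cap_get_t hairpin_cap_get;
	/**< Returns the hairpin capabilities. */
	eth_rx_hairpin_queue_setup_t rx_hairpin_queue_setup;
	/**< Set up device RX hairpin queue. */
	eth_tx_hairpin_queue_setup_t tx_hairpin_queue_setup;
	/**< Set up device TX hairpin queue. */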

> >   };
> >
> >   /**
> > @@ -746,9 +769,9 @@ struct rte_eth_dev_data {
> >   		dev_started : 1,   /**< Device state: STARTED(1) / STOPPED(0). */
> >   		lro         : 1;   /**< RX LRO is ON(1) / OFF(0) */
> >   	uint8_t rx_queue_state[RTE_MAX_QUEUES_PER_PORT];
> > -			/**< Queues state: STARTED(1) / STOPPED(0). */
> > +		/**< Queues state: HAIRPIN(2) / STARTED(1) / STOPPED(0). */
> >   	uint8_t tx_queue_state[RTE_MAX_QUEUES_PER_PORT];
> > -			/**< Queues state: STARTED(1) / STOPPED(0). */
> > +		/**< Queues state: HAIRPIN(2) STARTED(1) / STOPPED(0). */
> 
> / is missing above after HAIRPIN(2).
> In fact there is no point in duplicating the values in parentheses, but
> that is out of scope for this review.
> 

I will add the missing /

> I'm not 100% happy that it makes it impossible to mark hairpin queues
> as started/stopped. It is not that important right now, but maybe it is
> better to use the state as a bit field: bit 0 - stopped/started,
> bit 1 - regular/hairpin. Anyway, it is an internal interface.
> 

Your idea is very nice, but there are some things to consider.
For example, if converted to bits it will take more memory.
We could treat the u8 as flags, but this would mean that a queue could be
both hairpin and stopped/started at the same time, and I'm not sure whether
that is good or bad. Like you say, it is internal, so let's keep the current
implementation and discuss your idea after this patch set, after we see how
the community uses the hairpin feature. Is that O.K. with you?
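
For the record, a minimal sketch of the bit-field encoding discussed above
(the names are hypothetical and not part of this series):

	/* Bit 0 - stopped(0)/started(1), bit 1 - regular(0)/hairpin(1). */
	#define RTE_ETH_QUEUE_STATE_STARTED_BIT (1u << 0)
	#define RTE_ETH_QUEUE_STATE_HAIRPIN_BIT (1u << 1)

	/* A started hairpin queue would then be encoded as: */
	uint8_t state = RTE_ETH_QUEUE_STATE_STARTED_BIT |
			RTE_ETH_QUEUE_STATE_HAIRPIN_BIT;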

 

> [snip]


^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [dpdk-dev] [PATCH v3 01/14] ethdev: add support for hairpin queue
  2019-10-16 19:36       ` Ori Kam
@ 2019-10-17 10:41         ` Andrew Rybchenko
  0 siblings, 0 replies; 186+ messages in thread
From: Andrew Rybchenko @ 2019-10-17 10:41 UTC (permalink / raw)
  To: Ori Kam, Thomas Monjalon, Ferruh Yigit; +Cc: dev, jingjing.wu, stephen

Hi Ori,

On 10/16/19 10:36 PM, Ori Kam wrote:
> Hi Andrew,
>
> Thanks again for your time.
>
> PSB,
> Ori
>
>
>> -----Original Message-----
>> From: Andrew Rybchenko <arybchenko@solarflare.com>
>> Sent: Tuesday, October 15, 2019 1:12 PM
>> To: Ori Kam <orika@mellanox.com>; Thomas Monjalon
>> <thomas@monjalon.net>; Ferruh Yigit <ferruh.yigit@intel.com>
>> Cc: dev@dpdk.org; jingjing.wu@intel.com; stephen@networkplumber.org
>> Subject: Re: [PATCH v3 01/14] ethdev: add support for hairpin queue
>>
>> Hi Ori,
>>
>> On 10/15/19 12:04 PM, Ori Kam wrote:
>>> This commit introduces the hairpin queue type.
>>>
>>> A hairpin queue is built from an Rx queue bound to a Tx queue.
>>> It is used to offload traffic coming from the wire and redirect it back
>>> to the wire.
>>>
>>> There are 3 new functions:
>>> - rte_eth_dev_hairpin_capability_get
>>> - rte_eth_rx_hairpin_queue_setup
>>> - rte_eth_tx_hairpin_queue_setup
>>>
>>> In order to use the queue, there is a need to create rte_flow
>>> with queue / RSS action that targets one or more of the Rx queues.
>>>
>>> Signed-off-by: Ori Kam <orika@mellanox.com>

[snip]

>>> @@ -746,9 +769,9 @@ struct rte_eth_dev_data {
>>>    		dev_started : 1,   /**< Device state: STARTED(1) / STOPPED(0). */
>>>    		lro         : 1;   /**< RX LRO is ON(1) / OFF(0) */
>>>    	uint8_t rx_queue_state[RTE_MAX_QUEUES_PER_PORT];
>>> -			/**< Queues state: STARTED(1) / STOPPED(0). */
>>> +		/**< Queues state: HAIRPIN(2) / STARTED(1) / STOPPED(0). */
>>>    	uint8_t tx_queue_state[RTE_MAX_QUEUES_PER_PORT];
>>> -			/**< Queues state: STARTED(1) / STOPPED(0). */
>>> +		/**< Queues state: HAIRPIN(2) STARTED(1) / STOPPED(0). */
>> / is missing above after HAIRPIN(2).
>> In fact there is no point in duplicating the values in parentheses, but
>> that is out of scope for this review.
>>
> I will add the missing /
>
>> I'm not 100% happy that it makes it impossible to mark hairpin queues
>> as started/stopped. It is not that important right now, but maybe it is
>> better to use the state as a bit field: bit 0 - stopped/started,
>> bit 1 - regular/hairpin. Anyway, it is an internal interface.
>>
> Your idea is very nice, but there are some things to consider.
> For example, if converted to bits it will take more memory.
> We could treat the u8 as flags, but this would mean that a queue could be
> both hairpin and stopped/started at the same time, and I'm not sure whether
> that is good or bad. Like you say, it is internal, so let's keep the current
> implementation and discuss your idea after this patch set, after we see how
> the community uses the hairpin feature. Is that O.K. with you?

Yes.


^ permalink raw reply	[flat|nested] 186+ messages in thread

* [dpdk-dev] [PATCH v4 00/15] add hairpin feature
  2019-09-26  6:28 [dpdk-dev] [PATCH 00/13] add hairpin feature Ori Kam
                   ` (15 preceding siblings ...)
  2019-10-15  9:04 ` [dpdk-dev] [PATCH v3 00/14] " Ori Kam
@ 2019-10-17 15:32 ` Ori Kam
  2019-10-17 15:32   ` [dpdk-dev] [PATCH v4 01/15] ethdev: move queue state defines to private file Ori Kam
                     ` (15 more replies)
  2019-10-23 13:37 ` [dpdk-dev] [PATCH v5 " Ori Kam
                   ` (2 subsequent siblings)
  19 siblings, 16 replies; 186+ messages in thread
From: Ori Kam @ 2019-10-17 15:32 UTC (permalink / raw)
  Cc: dev, orika, jingjing.wu, stephen, wenzhuo.lu, bernard.iremonger,
	thomas, ferruh.yigit, arybchenko, viacheslavo

This patch set implements the hairpin feature.
The hairpin feature was introduced in RFC[1]

The hairpin feature (a different name could be hairpin forwarding) acts as a
"bump on the wire", meaning that a packet received from the wire can be
modified using offloaded actions and then sent back to the wire without
application intervention, which saves CPU cycles.

The hairpin is the inverse function of loopback, in which the application
sends a packet and then receives it again without it being sent to the wire.

The hairpin can be used by a number of different NFVs, for example a load
balancer, a gateway and so on.

As can be seen from the hairpin description, a hairpin is basically an RX
queue connected to a TX queue.

During the design phase I was considering two ways to implement this
feature: the first one is adding a new rte_flow action, and the second
one is creating a special kind of queue.

The advantages of using the queue approach:
1. More control for the application: queue depth (the memory size that
should be used).
2. Enables QoS. QoS is normally a parameter of a queue, so with this
approach it will be easy to integrate with such systems.
3. Native integration with the rte_flow API. Just setting the target
queue/RSS to a hairpin queue will result in the traffic being routed
to the hairpin queue.
4. Enables queue offloading.

Each hairpin Rxq can be connected to one Txq or a number of Txqs, which can
belong to different ports if the PMD supports it. The same goes the other
way: each hairpin Txq can be connected to one or more Rxqs.
This is the reason that both the Txq setup and the Rxq setup receive the
hairpin configuration structure.

From the PMD perspective, the number of Rxqs/Txqs is the total of standard
queues + hairpin queues.

To configure a hairpin queue the user should call
rte_eth_rx_hairpin_queue_setup / rte_eth_tx_hairpin_queue_setup instead
of the normal queue setup functions.

The hairpin queues are not part of the normal RSS functionality.

To use the queues, the user simply creates a flow that points to queue/RSS
actions targeting hairpin queues (see the sketch below).
The reasons for selecting 2 new functions for hairpin queue setup are:
1. avoid an API break.
2. avoid extra and unused parameters.
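
A minimal usage sketch (here port_id is the application's port, queue 1 of
that port serves as both the hairpin Rx and Tx queue, and error handling is
omitted):

	struct rte_eth_hairpin_conf conf = {
		.peer_n = 1,
		.peers[0] = { .port = port_id, .queue = 1 },
	};

	/* Called instead of rte_eth_rx_queue_setup()/rte_eth_tx_queue_setup();
	 * 0 descriptors means "use the PMD default". */
	rte_eth_rx_hairpin_queue_setup(port_id, 1, 0, &conf);
	rte_eth_tx_hairpin_queue_setup(port_id, 1, 0, &conf);

	/* Traffic is then hairpinned by an rte_flow rule whose queue action
	 * targets the hairpin Rx queue. */
	struct rte_flow_action_queue queue = { .index = 1 };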


This series must be applied after series[2]

[1] https://inbox.dpdk.org/dev/1565703468-55617-1-git-send-email-orika@mellanox.com/
[2] https://inbox.dpdk.org/dev/1569398015-6027-1-git-send-email-viacheslavo@mellanox.com/

Cc: wenzhuo.lu@intel.com
Cc: bernard.iremonger@intel.com
Cc: thomas@monjalon.net
Cc: ferruh.yigit@intel.com
Cc: arybchenko@solarflare.com
Cc: viacheslavo@mellanox.com

------
V4:
 - update according to comments from ML.

V3:
 - update according to comments from ML.

V2:
 - update according to comments from ML.

Ori Kam (15):
  ethdev: move queue state defines to private file
  ethdev: add support for hairpin queue
  net/mlx5: query hca hairpin capabilities
  net/mlx5: support Rx hairpin queues
  net/mlx5: prepare txq to work with different types
  net/mlx5: support Tx hairpin queues
  net/mlx5: add get hairpin capabilities
  app/testpmd: add hairpin support
  net/mlx5: add hairpin binding function
  net/mlx5: add support for hairpin hrxq
  net/mlx5: add internal tag item and action
  net/mlx5: add id generation function
  net/mlx5: add default flows for hairpin
  net/mlx5: split hairpin flows
  doc: add hairpin feature

 app/test-pmd/parameters.c                |  28 +++
 app/test-pmd/testpmd.c                   | 109 ++++++++-
 app/test-pmd/testpmd.h                   |   3 +
 doc/guides/rel_notes/release_19_11.rst   |   6 +
 drivers/net/mlx5/mlx5.c                  | 170 ++++++++++++-
 drivers/net/mlx5/mlx5.h                  |  69 +++++-
 drivers/net/mlx5/mlx5_devx_cmds.c        | 194 +++++++++++++++
 drivers/net/mlx5/mlx5_ethdev.c           | 129 ++++++++--
 drivers/net/mlx5/mlx5_flow.c             | 393 ++++++++++++++++++++++++++++++-
 drivers/net/mlx5/mlx5_flow.h             |  73 +++++-
 drivers/net/mlx5/mlx5_flow_dv.c          | 231 +++++++++++++++++-
 drivers/net/mlx5/mlx5_flow_verbs.c       |  11 +-
 drivers/net/mlx5/mlx5_prm.h              | 127 +++++++++-
 drivers/net/mlx5/mlx5_rss.c              |   1 +
 drivers/net/mlx5/mlx5_rxq.c              | 318 ++++++++++++++++++++++---
 drivers/net/mlx5/mlx5_rxtx.c             |   2 +-
 drivers/net/mlx5/mlx5_rxtx.h             |  68 +++++-
 drivers/net/mlx5/mlx5_trigger.c          | 140 ++++++++++-
 drivers/net/mlx5/mlx5_txq.c              | 294 +++++++++++++++++++----
 lib/librte_ethdev/rte_ethdev.c           | 229 ++++++++++++++++++
 lib/librte_ethdev/rte_ethdev.h           | 149 +++++++++++-
 lib/librte_ethdev/rte_ethdev_core.h      |  91 ++++++-
 lib/librte_ethdev/rte_ethdev_driver.h    |   7 +
 lib/librte_ethdev/rte_ethdev_version.map |   5 +
 24 files changed, 2677 insertions(+), 170 deletions(-)

-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 186+ messages in thread

* [dpdk-dev] [PATCH v4 01/15] ethdev: move queue state defines to private file
  2019-10-17 15:32 ` [dpdk-dev] [PATCH v4 00/15] " Ori Kam
@ 2019-10-17 15:32   ` Ori Kam
  2019-10-17 15:37     ` Stephen Hemminger
  2019-10-22 10:59     ` Andrew Rybchenko
  2019-10-17 15:32   ` [dpdk-dev] [PATCH v4 02/15] ethdev: add support for hairpin queue Ori Kam
                     ` (14 subsequent siblings)
  15 siblings, 2 replies; 186+ messages in thread
From: Ori Kam @ 2019-10-17 15:32 UTC (permalink / raw)
  To: Thomas Monjalon, Ferruh Yigit, Andrew Rybchenko
  Cc: dev, orika, jingjing.wu, stephen

The queue state defines are internal to the DPDK.
This commit moves them to a private header file.
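
After this change, drivers pick up the RTE_ETH_QUEUE_STATE_* defines through
the private driver header:

	#include <rte_ethdev_driver.h>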

Signed-off-by: Ori Kam <orika@mellanox.com>

---
V4:
 - new file, created due to ML comments.

---
 lib/librte_ethdev/rte_ethdev.h        | 6 ------
 lib/librte_ethdev/rte_ethdev_driver.h | 6 ++++++
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
index d937fb4..187a2bb 100644
--- a/lib/librte_ethdev/rte_ethdev.h
+++ b/lib/librte_ethdev/rte_ethdev.h
@@ -1272,12 +1272,6 @@ struct rte_eth_dcb_info {
 	struct rte_eth_dcb_tc_queue_mapping tc_queue;
 };
 
-/**
- * RX/TX queue states
- */
-#define RTE_ETH_QUEUE_STATE_STOPPED 0
-#define RTE_ETH_QUEUE_STATE_STARTED 1
-
 #define RTE_ETH_ALL RTE_MAX_ETHPORTS
 
 /* Macros to check for valid port */
diff --git a/lib/librte_ethdev/rte_ethdev_driver.h b/lib/librte_ethdev/rte_ethdev_driver.h
index 936ff8c..c404f17 100644
--- a/lib/librte_ethdev/rte_ethdev_driver.h
+++ b/lib/librte_ethdev/rte_ethdev_driver.h
@@ -22,6 +22,12 @@
 #endif
 
 /**
+ * RX/TX queue states
+ */
+#define RTE_ETH_QUEUE_STATE_STOPPED 0
+#define RTE_ETH_QUEUE_STATE_STARTED 1
+
+/**
  * @internal
  * Returns a ethdev slot specified by the unique identifier name.
  *
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 186+ messages in thread

* [dpdk-dev] [PATCH v4 02/15] ethdev: add support for hairpin queue
  2019-10-17 15:32 ` [dpdk-dev] [PATCH v4 00/15] " Ori Kam
  2019-10-17 15:32   ` [dpdk-dev] [PATCH v4 01/15] ethdev: move queue state defines to private file Ori Kam
@ 2019-10-17 15:32   ` Ori Kam
  2019-10-17 21:01     ` Thomas Monjalon
                       ` (2 more replies)
  2019-10-17 15:32   ` [dpdk-dev] [PATCH v4 03/15] net/mlx5: query hca hairpin capabilities Ori Kam
                     ` (13 subsequent siblings)
  15 siblings, 3 replies; 186+ messages in thread
From: Ori Kam @ 2019-10-17 15:32 UTC (permalink / raw)
  To: Thomas Monjalon, Ferruh Yigit, Andrew Rybchenko
  Cc: dev, orika, jingjing.wu, stephen

This commit introduces the hairpin queue type.

A hairpin queue is built from an Rx queue bound to a Tx queue.
It is used to offload traffic coming from the wire and redirect it back
to the wire.

There are 3 new functions:
- rte_eth_dev_hairpin_capability_get
- rte_eth_rx_hairpin_queue_setup
- rte_eth_tx_hairpin_queue_setup

In order to use the queue, there is a need to create rte_flow
with queue / RSS action that targets one or more of the Rx queues.

Signed-off-by: Ori Kam <orika@mellanox.com>

---
V4:
 - update according to ML comments.

V3:
 - update according to ML comments.

V2:
 - update according to ML comments.

---
 lib/librte_ethdev/rte_ethdev.c           | 229 +++++++++++++++++++++++++++++++
 lib/librte_ethdev/rte_ethdev.h           | 143 ++++++++++++++++++-
 lib/librte_ethdev/rte_ethdev_core.h      |  91 +++++++++++-
 lib/librte_ethdev/rte_ethdev_driver.h    |   1 +
 lib/librte_ethdev/rte_ethdev_version.map |   5 +
 5 files changed, 461 insertions(+), 8 deletions(-)

diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
index af82360..10a8bf2 100644
--- a/lib/librte_ethdev/rte_ethdev.c
+++ b/lib/librte_ethdev/rte_ethdev.c
@@ -904,6 +904,14 @@ struct rte_eth_dev *
 
 	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_start, -ENOTSUP);
 
+	if (dev->data->rx_queue_state[rx_queue_id] ==
+	    RTE_ETH_QUEUE_STATE_HAIRPIN) {
+		RTE_ETHDEV_LOG(INFO,
+			"Queue %"PRIu16" of device with port_id=%"PRIu16" is hairpin queue\n",
+			rx_queue_id, port_id);
+		return -EINVAL;
+	}
+
 	if (dev->data->rx_queue_state[rx_queue_id] != RTE_ETH_QUEUE_STATE_STOPPED) {
 		RTE_ETHDEV_LOG(INFO,
 			"Queue %"PRIu16" of device with port_id=%"PRIu16" already started\n",
@@ -931,6 +939,14 @@ struct rte_eth_dev *
 
 	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_stop, -ENOTSUP);
 
+	if (dev->data->rx_queue_state[rx_queue_id] ==
+	    RTE_ETH_QUEUE_STATE_HAIRPIN) {
+		RTE_ETHDEV_LOG(INFO,
+			"Queue %"PRIu16" of device with port_id=%"PRIu16" is hairpin queue\n",
+			rx_queue_id, port_id);
+		return -EINVAL;
+	}
+
 	if (dev->data->rx_queue_state[rx_queue_id] == RTE_ETH_QUEUE_STATE_STOPPED) {
 		RTE_ETHDEV_LOG(INFO,
 			"Queue %"PRIu16" of device with port_id=%"PRIu16" already stopped\n",
@@ -964,6 +980,14 @@ struct rte_eth_dev *
 
 	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_queue_start, -ENOTSUP);
 
+	if (dev->data->tx_queue_state[tx_queue_id] ==
+	   RTE_ETH_QUEUE_STATE_HAIRPIN) {
+		RTE_ETHDEV_LOG(INFO,
+			"Queue %"PRIu16" of device with port_id=%"PRIu16" is hairpin queue\n",
+			tx_queue_id, port_id);
+		return -EINVAL;
+	}
+
 	if (dev->data->tx_queue_state[tx_queue_id] != RTE_ETH_QUEUE_STATE_STOPPED) {
 		RTE_ETHDEV_LOG(INFO,
 			"Queue %"PRIu16" of device with port_id=%"PRIu16" already started\n",
@@ -989,6 +1013,14 @@ struct rte_eth_dev *
 
 	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_queue_stop, -ENOTSUP);
 
+	if (dev->data->tx_queue_state[tx_queue_id] ==
+	    RTE_ETH_QUEUE_STATE_HAIRPIN) {
+		RTE_ETHDEV_LOG(INFO,
+			"Queue %"PRIu16" of device with port_id=%"PRIu16" is hairpin queue\n",
+			tx_queue_id, port_id);
+		return -EINVAL;
+	}
+
 	if (dev->data->tx_queue_state[tx_queue_id] == RTE_ETH_QUEUE_STATE_STOPPED) {
 		RTE_ETHDEV_LOG(INFO,
 			"Queue %"PRIu16" of device with port_id=%"PRIu16" already stopped\n",
@@ -1758,6 +1790,81 @@ struct rte_eth_dev *
 }
 
 int
+rte_eth_rx_hairpin_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
+			       uint16_t nb_rx_desc,
+			       const struct rte_eth_hairpin_conf *conf)
+{
+	int ret;
+	struct rte_eth_dev *dev;
+	struct rte_eth_hairpin_cap cap;
+	void **rxq;
+	int i;
+	int count = 0;
+
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
+
+	dev = &rte_eth_devices[port_id];
+	if (rx_queue_id >= dev->data->nb_rx_queues) {
+		RTE_ETHDEV_LOG(ERR, "Invalid RX queue_id=%u\n", rx_queue_id);
+		return -EINVAL;
+	}
+	ret = rte_eth_dev_hairpin_capability_get(port_id, &cap);
+	if (ret != 0)
+		return ret;
+	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_hairpin_queue_setup,
+				-ENOTSUP);
+	/* Use default specified by driver, if nb_rx_desc is zero */
+	if (nb_rx_desc == 0)
+		nb_rx_desc = cap.max_nb_desc;
+	if (nb_rx_desc > cap.max_nb_desc) {
+		RTE_ETHDEV_LOG(ERR,
+			       "Invalid value for nb_rx_desc(=%hu), should be: "
+			       "<= %hu", nb_rx_desc, cap.max_nb_desc);
+		return -EINVAL;
+	}
+	if (conf->peer_n > cap.max_rx_2_tx) {
+		RTE_ETHDEV_LOG(ERR,
+			       "Invalid value for number of peers(=%hu), "
+			       "should be: <= %hu", conf->peer_n,
+			       cap.max_rx_2_tx);
+		return -EINVAL;
+	}
+	if (conf->peer_n == 0) {
+		RTE_ETHDEV_LOG(ERR,
+			       "Invalid value for number of peers(=%hu), "
+			       "should be: > 0", conf->peer_n);
+		return -EINVAL;
+	}
+	if (cap.max_n_queues != UINT16_MAX) {
+		for (i = 0; i < dev->data->nb_rx_queues; i++) {
+			if (dev->data->rx_queue_state[i] ==
+			    RTE_ETH_QUEUE_STATE_HAIRPIN)
+				count++;
+		}
+		if (count > cap.max_n_queues) {
+			RTE_ETHDEV_LOG(ERR,
+				       "Too many Rx hairpin queues %d", count);
+			return -EINVAL;
+		}
+	}
+	if (dev->data->dev_started)
+		return -EBUSY;
+	rxq = dev->data->rx_queues;
+	if (rxq[rx_queue_id] != NULL) {
+		RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_release,
+					-ENOTSUP);
+		(*dev->dev_ops->rx_queue_release)(rxq[rx_queue_id]);
+		rxq[rx_queue_id] = NULL;
+	}
+	ret = (*dev->dev_ops->rx_hairpin_queue_setup)(dev, rx_queue_id,
+						      nb_rx_desc, conf);
+	if (ret == 0)
+		dev->data->rx_queue_state[rx_queue_id] =
+			RTE_ETH_QUEUE_STATE_HAIRPIN;
+	return eth_err(port_id, ret);
+}
+
+int
 rte_eth_tx_queue_setup(uint16_t port_id, uint16_t tx_queue_id,
 		       uint16_t nb_tx_desc, unsigned int socket_id,
 		       const struct rte_eth_txconf *tx_conf)
@@ -1856,6 +1963,80 @@ struct rte_eth_dev *
 		       tx_queue_id, nb_tx_desc, socket_id, &local_conf));
 }
 
+int
+rte_eth_tx_hairpin_queue_setup(uint16_t port_id, uint16_t tx_queue_id,
+			       uint16_t nb_tx_desc,
+			       const struct rte_eth_hairpin_conf *conf)
+{
+	struct rte_eth_dev *dev;
+	struct rte_eth_hairpin_cap cap;
+	void **txq;
+	int i;
+	int count = 0;
+	int ret;
+
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
+	dev = &rte_eth_devices[port_id];
+	if (tx_queue_id >= dev->data->nb_tx_queues) {
+		RTE_ETHDEV_LOG(ERR, "Invalid TX queue_id=%u\n", tx_queue_id);
+		return -EINVAL;
+	}
+	ret = rte_eth_dev_hairpin_capability_get(port_id, &cap);
+	if (ret != 0)
+		return ret;
+	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_hairpin_queue_setup,
+				-ENOTSUP);
+	/* Use default specified by driver, if nb_tx_desc is zero */
+	if (nb_tx_desc == 0)
+		nb_tx_desc = cap.max_nb_desc;
+	if (nb_tx_desc > cap.max_nb_desc) {
+		RTE_ETHDEV_LOG(ERR,
+			       "Invalid value for nb_tx_desc(=%hu), should be: "
+			       "<= %hu", nb_tx_desc, cap.max_nb_desc);
+		return -EINVAL;
+	}
+	if (conf->peer_n > cap.max_tx_2_rx) {
+		RTE_ETHDEV_LOG(ERR,
+			       "Invalid value for number of peers(=%hu), "
+			       "should be: <= %hu", conf->peer_n,
+			       cap.max_tx_2_rx);
+		return -EINVAL;
+	}
+	if (conf->peer_n == 0) {
+		RTE_ETHDEV_LOG(ERR,
+			       "Invalid value for number of peers(=%hu), "
+			       "should be: > 0", conf->peer_n);
+		return -EINVAL;
+	}
+	if (cap.max_n_queues != UINT16_MAX) {
+		for (i = 0; i < dev->data->nb_tx_queues; i++) {
+			if (dev->data->tx_queue_state[i] ==
+			    RTE_ETH_QUEUE_STATE_HAIRPIN)
+				count++;
+		}
+		if (count > cap.max_n_queues) {
+			RTE_ETHDEV_LOG(ERR,
+				       "Too many Tx hairpin queues %d", count);
+			return -EINVAL;
+		}
+	}
+	if (dev->data->dev_started)
+		return -EBUSY;
+	txq = dev->data->tx_queues;
+	if (txq[tx_queue_id] != NULL) {
+		RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_queue_release,
+					-ENOTSUP);
+		(*dev->dev_ops->tx_queue_release)(txq[tx_queue_id]);
+		txq[tx_queue_id] = NULL;
+	}
+	ret = (*dev->dev_ops->tx_hairpin_queue_setup)
+		(dev, tx_queue_id, nb_tx_desc, conf);
+	if (ret == 0)
+		dev->data->tx_queue_state[tx_queue_id] =
+			RTE_ETH_QUEUE_STATE_HAIRPIN;
+	return eth_err(port_id, ret);
+}
+
 void
 rte_eth_tx_buffer_drop_callback(struct rte_mbuf **pkts, uint16_t unsent,
 		void *userdata __rte_unused)
@@ -3981,12 +4162,20 @@ int rte_eth_set_queue_rate_limit(uint16_t port_id, uint16_t queue_idx,
 	rte_errno = ENOTSUP;
 	return NULL;
 #endif
+	struct rte_eth_dev *dev;
+
 	/* check input parameters */
 	if (!rte_eth_dev_is_valid_port(port_id) || fn == NULL ||
 		    queue_id >= rte_eth_devices[port_id].data->nb_rx_queues) {
 		rte_errno = EINVAL;
 		return NULL;
 	}
+	dev = &rte_eth_devices[port_id];
+	if (dev->data->rx_queue_state[queue_id] ==
+	    RTE_ETH_QUEUE_STATE_HAIRPIN) {
+		rte_errno = EINVAL;
+		return NULL;
+	}
 	struct rte_eth_rxtx_callback *cb = rte_zmalloc(NULL, sizeof(*cb), 0);
 
 	if (cb == NULL) {
@@ -4058,6 +4247,8 @@ int rte_eth_set_queue_rate_limit(uint16_t port_id, uint16_t queue_idx,
 	rte_errno = ENOTSUP;
 	return NULL;
 #endif
+	struct rte_eth_dev *dev;
+
 	/* check input parameters */
 	if (!rte_eth_dev_is_valid_port(port_id) || fn == NULL ||
 		    queue_id >= rte_eth_devices[port_id].data->nb_tx_queues) {
@@ -4065,6 +4256,13 @@ int rte_eth_set_queue_rate_limit(uint16_t port_id, uint16_t queue_idx,
 		return NULL;
 	}
 
+	dev = &rte_eth_devices[port_id];
+	if (dev->data->tx_queue_state[queue_id] ==
+	    RTE_ETH_QUEUE_STATE_HAIRPIN) {
+		rte_errno = EINVAL;
+		return NULL;
+	}
+
 	struct rte_eth_rxtx_callback *cb = rte_zmalloc(NULL, sizeof(*cb), 0);
 
 	if (cb == NULL) {
@@ -4180,6 +4378,14 @@ int rte_eth_set_queue_rate_limit(uint16_t port_id, uint16_t queue_idx,
 
 	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rxq_info_get, -ENOTSUP);
 
+	if (dev->data->rx_queue_state[queue_id] ==
+	    RTE_ETH_QUEUE_STATE_HAIRPIN) {
+		RTE_ETHDEV_LOG(INFO,
+			"Queue %"PRIu16" of device with port_id=%"PRIu16" is hairpin queue\n",
+			queue_id, port_id);
+		return -EINVAL;
+	}
+
 	memset(qinfo, 0, sizeof(*qinfo));
 	dev->dev_ops->rxq_info_get(dev, queue_id, qinfo);
 	return 0;
@@ -4202,6 +4408,14 @@ int rte_eth_set_queue_rate_limit(uint16_t port_id, uint16_t queue_idx,
 		return -EINVAL;
 	}
 
+	if (dev->data->tx_queue_state[queue_id] ==
+	    RTE_ETH_QUEUE_STATE_HAIRPIN) {
+		RTE_ETHDEV_LOG(INFO,
+			"Queue %"PRIu16" of device with port_id=%"PRIu16" is hairpin queue\n",
+			queue_id, port_id);
+		return -EINVAL;
+	}
+
 	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->txq_info_get, -ENOTSUP);
 
 	memset(qinfo, 0, sizeof(*qinfo));
@@ -4510,6 +4724,21 @@ int rte_eth_set_queue_rate_limit(uint16_t port_id, uint16_t queue_idx,
 }
 
 int
+rte_eth_dev_hairpin_capability_get(uint16_t port_id,
+				   struct rte_eth_hairpin_cap *cap)
+{
+	struct rte_eth_dev *dev;
+
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
+
+	dev = &rte_eth_devices[port_id];
+	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->hairpin_cap_get,
+				-ENOTSUP);
+	memset(cap, 0, sizeof(*cap));
+	return eth_err(port_id, (*dev->dev_ops->hairpin_cap_get)(dev, cap));
+}
+
+int
 rte_eth_dev_pool_ops_supported(uint16_t port_id, const char *pool)
 {
 	struct rte_eth_dev *dev;
diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
index 187a2bb..276f55f 100644
--- a/lib/librte_ethdev/rte_ethdev.h
+++ b/lib/librte_ethdev/rte_ethdev.h
@@ -804,6 +804,46 @@ struct rte_eth_txconf {
 };
 
 /**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * A structure used to return the hairpin capabilities that are supported.
+ */
+struct rte_eth_hairpin_cap {
+	uint16_t max_n_queues;
+	/**< The max number of hairpin queues (different bindings). */
+	uint16_t max_rx_2_tx;
+	/**< Max number of Rx queues to be connected to one Tx queue. */
+	uint16_t max_tx_2_rx;
+	/**< Max number of Tx queues to be connected to one Rx queue. */
+	uint16_t max_nb_desc; /**< The max num of descriptors. */
+};
+
+#define RTE_ETH_MAX_HAIRPIN_PEERS 32
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * A structure used to hold hairpin peer data.
+ */
+struct rte_eth_hairpin_peer {
+	uint16_t port; /**< Peer port. */
+	uint16_t queue; /**< Peer queue. */
+};
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * A structure used to configure hairpin binding.
+ */
+struct rte_eth_hairpin_conf {
+	uint16_t peer_n; /**< The number of peers. */
+	struct rte_eth_hairpin_peer peers[RTE_ETH_MAX_HAIRPIN_PEERS];
+};
+
+/**
  * A structure contains information about HW descriptor ring limitations.
  */
 struct rte_eth_desc_lim {
@@ -1765,6 +1805,37 @@ int rte_eth_rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
 		struct rte_mempool *mb_pool);
 
 /**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Allocate and set up a hairpin receive queue for an Ethernet device.
+ *
+ * The function sets up the selected queue to be used in hairpin mode.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param rx_queue_id
+ *   The index of the receive queue to set up.
+ *   The value must be in the range [0, nb_rx_queue - 1] previously supplied
+ *   to rte_eth_dev_configure().
+ * @param nb_rx_desc
+ *   The number of receive descriptors to allocate for the receive ring.
+ *   0 means the PMD will use default value.
+ * @param conf
+ *   The pointer to the hairpin configuration.
+ *
+ * @return
+ *   - (0) if successful.
+ *   - (-ENOTSUP) if hardware doesn't support.
+ *   - (-EINVAL) if bad parameter.
+ *   - (-ENOMEM) if unable to allocate the resources.
+ */
+__rte_experimental
+int rte_eth_rx_hairpin_queue_setup
+	(uint16_t port_id, uint16_t rx_queue_id, uint16_t nb_rx_desc,
+	 const struct rte_eth_hairpin_conf *conf);
+
+/**
  * Allocate and set up a transmit queue for an Ethernet device.
  *
  * @param port_id
@@ -1817,6 +1888,35 @@ int rte_eth_tx_queue_setup(uint16_t port_id, uint16_t tx_queue_id,
 		const struct rte_eth_txconf *tx_conf);
 
 /**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Allocate and set up a transmit hairpin queue for an Ethernet device.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param tx_queue_id
+ *   The index of the transmit queue to set up.
+ *   The value must be in the range [0, nb_tx_queue - 1] previously supplied
+ *   to rte_eth_dev_configure().
+ * @param nb_tx_desc
+ *   The number of transmit descriptors to allocate for the transmit ring.
+ *   0 to set default PMD value.
+ * @param conf
+ *   The hairpin configuration.
+ *
+ * @return
+ *   - (0) if successful.
+ *   - (-ENOTSUP) if hardware doesn't support.
+ *   - (-EINVAL) if bad parameter.
+ *   - (-ENOMEM) if unable to allocate the resources.
+ */
+__rte_experimental
+int rte_eth_tx_hairpin_queue_setup
+	(uint16_t port_id, uint16_t tx_queue_id, uint16_t nb_tx_desc,
+	 const struct rte_eth_hairpin_conf *conf);
+
+/**
  * Return the NUMA socket to which an Ethernet device is connected
  *
  * @param port_id
@@ -1851,7 +1951,7 @@ int rte_eth_tx_queue_setup(uint16_t port_id, uint16_t tx_queue_id,
  *   to rte_eth_dev_configure().
  * @return
  *   - 0: Success, the receive queue is started.
- *   - -EINVAL: The port_id or the queue_id out of range.
+ *   - -EINVAL: The port_id or the queue_id is out of range, or the queue belongs to hairpin.
  *   - -EIO: if device is removed.
  *   - -ENOTSUP: The function not supported in PMD driver.
  */
@@ -1868,7 +1968,7 @@ int rte_eth_tx_queue_setup(uint16_t port_id, uint16_t tx_queue_id,
  *   to rte_eth_dev_configure().
  * @return
  *   - 0: Success, the receive queue is stopped.
- *   - -EINVAL: The port_id or the queue_id out of range.
+ *   - -EINVAL: The port_id or the queue_id is out of range, or the queue belongs to hairpin.
  *   - -EIO: if device is removed.
  *   - -ENOTSUP: The function not supported in PMD driver.
  */
@@ -1886,7 +1986,7 @@ int rte_eth_tx_queue_setup(uint16_t port_id, uint16_t tx_queue_id,
  *   to rte_eth_dev_configure().
  * @return
  *   - 0: Success, the transmit queue is started.
- *   - -EINVAL: The port_id or the queue_id out of range.
+ *   - -EINVAL: The port_id or the queue_id is out of range, or the queue belongs to hairpin.
  *   - -EIO: if device is removed.
  *   - -ENOTSUP: The function not supported in PMD driver.
  */
@@ -1903,7 +2003,7 @@ int rte_eth_tx_queue_setup(uint16_t port_id, uint16_t tx_queue_id,
  *   to rte_eth_dev_configure().
  * @return
  *   - 0: Success, the transmit queue is stopped.
- *   - -EINVAL: The port_id or the queue_id out of range.
+ *   - -EINVAL: The port_id or the queue_id is out of range, or the queue belongs to hairpin.
  *   - -EIO: if device is removed.
  *   - -ENOTSUP: The function not supported in PMD driver.
  */
@@ -3569,7 +3669,8 @@ int rte_eth_remove_tx_callback(uint16_t port_id, uint16_t queue_id,
  * @return
  *   - 0: Success
  *   - -ENOTSUP: routine is not supported by the device PMD.
- *   - -EINVAL:  The port_id or the queue_id is out of range.
+ *   - -EINVAL:  The port_id or the queue_id is out of range, or the queue
+ *               is a hairpin queue.
  */
 int rte_eth_rx_queue_info_get(uint16_t port_id, uint16_t queue_id,
 	struct rte_eth_rxq_info *qinfo);
@@ -3589,7 +3690,8 @@ int rte_eth_rx_queue_info_get(uint16_t port_id, uint16_t queue_id,
  * @return
  *   - 0: Success
  *   - -ENOTSUP: routine is not supported by the device PMD.
- *   - -EINVAL:  The port_id or the queue_id is out of range.
+ *   - -EINVAL:  The port_id or the queue_id is out of range, or the queue
+ *               is a hairpin queue.
  */
 int rte_eth_tx_queue_info_get(uint16_t port_id, uint16_t queue_id,
 	struct rte_eth_txq_info *qinfo);
@@ -4031,6 +4133,23 @@ int rte_eth_dev_adjust_nb_rx_tx_desc(uint16_t port_id,
 void *
 rte_eth_dev_get_sec_ctx(uint16_t port_id);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Query the device hairpin capabilities.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param cap
+ *   Pointer to a structure that will hold the hairpin capabilities.
+ * @return
+ *   - (0) if successful.
+ *   - (-ENOTSUP) if hardware doesn't support.
+ */
+__rte_experimental
+int rte_eth_dev_hairpin_capability_get(uint16_t port_id,
+				       struct rte_eth_hairpin_cap *cap);
 
 #include <rte_ethdev_core.h>
 
@@ -4131,6 +4250,12 @@ int rte_eth_dev_adjust_nb_rx_tx_desc(uint16_t port_id,
 		RTE_ETHDEV_LOG(ERR, "Invalid RX queue_id=%u\n", queue_id);
 		return 0;
 	}
+	if (dev->data->rx_queue_state[queue_id] ==
+	    RTE_ETH_QUEUE_STATE_HAIRPIN) {
+		RTE_ETHDEV_LOG(ERR, "RX queue_id=%u is hairpin queue\n",
+			       queue_id);
+		return 0;
+	}
 #endif
 	nb_rx = (*dev->rx_pkt_burst)(dev->data->rx_queues[queue_id],
 				     rx_pkts, nb_pkts);
@@ -4397,6 +4522,12 @@ static inline int rte_eth_tx_descriptor_status(uint16_t port_id,
 		RTE_ETHDEV_LOG(ERR, "Invalid TX queue_id=%u\n", queue_id);
 		return 0;
 	}
+	if (dev->data->tx_queue_state[queue_id] ==
+	    RTE_ETH_QUEUE_STATE_HAIRPIN) {
+		RTE_ETHDEV_LOG(ERR, "TX queue_id=%u is hairpin queue\n",
+			       queue_id);
+		return 0;
+	}
 #endif
 
 #ifdef RTE_ETHDEV_RXTX_CALLBACKS
diff --git a/lib/librte_ethdev/rte_ethdev_core.h b/lib/librte_ethdev/rte_ethdev_core.h
index dcb5ae6..6d61158 100644
--- a/lib/librte_ethdev/rte_ethdev_core.h
+++ b/lib/librte_ethdev/rte_ethdev_core.h
@@ -506,6 +506,86 @@ typedef int (*eth_pool_ops_supported_t)(struct rte_eth_dev *dev,
 /**< @internal Test if a port supports specific mempool ops */
 
 /**
+ * @internal
+ * Get the hairpin capabilities.
+ *
+ * @param dev
+ *   ethdev handle of port.
+ * @param cap
+ *   returns the hairpin capabilities from the device.
+ *
+ * @return
+ *   Negative errno value on error, 0 on success.
+ *
+ * @retval 0
+ *   Success, hairpin is supported.
+ * @retval -ENOTSUP
+ *   Hairpin is not supported.
+ */
+typedef int (*eth_hairpin_cap_get_t)(struct rte_eth_dev *dev,
+				     struct rte_eth_hairpin_cap *cap);
+
+/**
+ * @internal
+ * Setup RX hairpin queue.
+ *
+ * @param dev
+ *   ethdev handle of port.
+ * @param rx_queue_id
+ *   the selected RX queue index.
+ * @param nb_rx_desc
+ *   the requested number of descriptors for this queue. 0 - use PMD default.
+ * @param conf
+ *   the RX hairpin configuration structure.
+ *
+ * @return
+ *   Negative errno value on error, 0 on success.
+ *
+ * @retval 0
+ *   Success, hairpin is supported.
+ * @retval -ENOTSUP
+ *   Hairpin is not supported.
+ * @retval -EINVAL
+ *   One of the parameters is invalid.
+ * @retval -ENOMEM
+ *   Unable to allocate resources.
+ */
+typedef int (*eth_rx_hairpin_queue_setup_t)
+	(struct rte_eth_dev *dev, uint16_t rx_queue_id,
+	 uint16_t nb_rx_desc,
+	 const struct rte_eth_hairpin_conf *conf);
+
+/**
+ * @internal
+ * Setup TX hairpin queue.
+ *
+ * @param dev
+ *   ethdev handle of port.
+ * @param tx_queue_id
+ *   the selected TX queue index.
+ * @param nb_tx_desc
+ *   the requested number of descriptors for this queue. 0 - use PMD default.
+ * @param conf
+ *   the TX hairpin configuration structure.
+ *
+ * @return
+ *   Negative errno value on error, 0 on success.
+ *
+ * @retval 0
+ *   Success, hairpin is supported.
+ * @retval -ENOTSUP
+ *   Hairpin is not supported.
+ * @retval -EINVAL
+ *   One of the parameters is invalid.
+ * @retval -ENOMEM
+ *   Unable to allocate resources.
+ */
+typedef int (*eth_tx_hairpin_queue_setup_t)
+	(struct rte_eth_dev *dev, uint16_t tx_queue_id,
+	 uint16_t nb_tx_desc,
+	 const struct rte_eth_hairpin_conf *hairpin_conf);
+
+/**
  * @internal A structure containing the functions exported by an Ethernet driver.
  */
 struct eth_dev_ops {
@@ -639,6 +719,13 @@ struct eth_dev_ops {
 
 	eth_pool_ops_supported_t pool_ops_supported;
 	/**< Test if a port supports specific mempool ops */
+
+	eth_hairpin_cap_get_t hairpin_cap_get;
+	/**< Returns the hairpin capabilities. */
+	eth_rx_hairpin_queue_setup_t rx_hairpin_queue_setup;
+	/**< Set up device RX hairpin queue. */
+	eth_tx_hairpin_queue_setup_t tx_hairpin_queue_setup;
+	/**< Set up device TX hairpin queue. */
 };
 
 /**
@@ -746,9 +833,9 @@ struct rte_eth_dev_data {
 		dev_started : 1,   /**< Device state: STARTED(1) / STOPPED(0). */
 		lro         : 1;   /**< RX LRO is ON(1) / OFF(0) */
 	uint8_t rx_queue_state[RTE_MAX_QUEUES_PER_PORT];
-			/**< Queues state: STARTED(1) / STOPPED(0). */
+		/**< Queues state: HAIRPIN(2) / STARTED(1) / STOPPED(0). */
 	uint8_t tx_queue_state[RTE_MAX_QUEUES_PER_PORT];
-			/**< Queues state: STARTED(1) / STOPPED(0). */
+		/**< Queues state: HAIRPIN(2) / STARTED(1) / STOPPED(0). */
 	uint32_t dev_flags;             /**< Capabilities. */
 	enum rte_kernel_driver kdrv;    /**< Kernel driver passthrough. */
 	int numa_node;                  /**< NUMA node connection. */
diff --git a/lib/librte_ethdev/rte_ethdev_driver.h b/lib/librte_ethdev/rte_ethdev_driver.h
index c404f17..59d4c01 100644
--- a/lib/librte_ethdev/rte_ethdev_driver.h
+++ b/lib/librte_ethdev/rte_ethdev_driver.h
@@ -26,6 +26,7 @@
  */
 #define RTE_ETH_QUEUE_STATE_STOPPED 0
 #define RTE_ETH_QUEUE_STATE_STARTED 1
+#define RTE_ETH_QUEUE_STATE_HAIRPIN 2
 
 /**
  * @internal
diff --git a/lib/librte_ethdev/rte_ethdev_version.map b/lib/librte_ethdev/rte_ethdev_version.map
index 6df42a4..77b0a86 100644
--- a/lib/librte_ethdev/rte_ethdev_version.map
+++ b/lib/librte_ethdev/rte_ethdev_version.map
@@ -283,4 +283,9 @@ EXPERIMENTAL {
 
 	# added in 19.08
 	rte_eth_read_clock;
+
+	# added in 19.11
+	rte_eth_rx_hairpin_queue_setup;
+	rte_eth_tx_hairpin_queue_setup;
+	rte_eth_dev_hairpin_capability_get;
 };
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 186+ messages in thread

* [dpdk-dev] [PATCH v4 03/15] net/mlx5: query hca hairpin capabilities
  2019-10-17 15:32 ` [dpdk-dev] [PATCH v4 00/15] " Ori Kam
  2019-10-17 15:32   ` [dpdk-dev] [PATCH v4 01/15] ethdev: move queue state defines to private file Ori Kam
  2019-10-17 15:32   ` [dpdk-dev] [PATCH v4 02/15] ethdev: add support for hairpin queue Ori Kam
@ 2019-10-17 15:32   ` Ori Kam
  2019-10-17 15:32   ` [dpdk-dev] [PATCH v4 04/15] net/mlx5: support Rx hairpin queues Ori Kam
                     ` (12 subsequent siblings)
  15 siblings, 0 replies; 186+ messages in thread
From: Ori Kam @ 2019-10-17 15:32 UTC (permalink / raw)
  To: Matan Azrad, Shahaf Shuler, Viacheslav Ovsiienko
  Cc: dev, orika, jingjing.wu, stephen

This commit queries and stores the hairpin capabilities from the device.

Those capabilities will be used when creating the hairpin queue.
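
As an illustration only (the actual wiring to the ethdev capability structure
is added later in this series, and the mapping below is hypothetical), the
log2 attributes read here could translate as:

	cap->max_n_queues = attr->hairpin ?
			    (1 << attr->log_max_hairpin_queues) : 0;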

Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>

---
 drivers/net/mlx5/mlx5.h           | 4 ++++
 drivers/net/mlx5/mlx5_devx_cmds.c | 7 +++++++
 2 files changed, 11 insertions(+)

diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index baf945c..4d14e9e 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -184,6 +184,10 @@ struct mlx5_hca_attr {
 	uint32_t tunnel_lro_vxlan:1;
 	uint32_t lro_max_msg_sz_mode:2;
 	uint32_t lro_timer_supported_periods[MLX5_LRO_NUM_SUPP_PERIODS];
+	uint32_t hairpin:1;
+	uint32_t log_max_hairpin_queues:5;
+	uint32_t log_max_hairpin_wq_data_sz:5;
+	uint32_t log_max_hairpin_num_packets:5;
 };
 
 /* Flow list . */
diff --git a/drivers/net/mlx5/mlx5_devx_cmds.c b/drivers/net/mlx5/mlx5_devx_cmds.c
index acfe1de..b072c37 100644
--- a/drivers/net/mlx5/mlx5_devx_cmds.c
+++ b/drivers/net/mlx5/mlx5_devx_cmds.c
@@ -327,6 +327,13 @@ struct mlx5_devx_obj *
 	attr->flow_counters_dump = MLX5_GET(cmd_hca_cap, hcattr,
 					    flow_counters_dump);
 	attr->eswitch_manager = MLX5_GET(cmd_hca_cap, hcattr, eswitch_manager);
+	attr->hairpin = MLX5_GET(cmd_hca_cap, hcattr, hairpin);
+	attr->log_max_hairpin_queues = MLX5_GET(cmd_hca_cap, hcattr,
+						log_max_hairpin_queues);
+	attr->log_max_hairpin_wq_data_sz = MLX5_GET(cmd_hca_cap, hcattr,
+						    log_max_hairpin_wq_data_sz);
+	attr->log_max_hairpin_num_packets = MLX5_GET
+		(cmd_hca_cap, hcattr, log_max_hairpin_num_packets);
 	attr->eth_net_offloads = MLX5_GET(cmd_hca_cap, hcattr,
 					  eth_net_offloads);
 	attr->eth_virt = MLX5_GET(cmd_hca_cap, hcattr, eth_virt);
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 186+ messages in thread

* [dpdk-dev] [PATCH v4 04/15] net/mlx5: support Rx hairpin queues
  2019-10-17 15:32 ` [dpdk-dev] [PATCH v4 00/15] " Ori Kam
                     ` (2 preceding siblings ...)
  2019-10-17 15:32   ` [dpdk-dev] [PATCH v4 03/15] net/mlx5: query hca hairpin capabilities Ori Kam
@ 2019-10-17 15:32   ` Ori Kam
  2019-10-17 15:32   ` [dpdk-dev] [PATCH v4 05/15] net/mlx5: prepare txq to work with different types Ori Kam
                     ` (11 subsequent siblings)
  15 siblings, 0 replies; 186+ messages in thread
From: Ori Kam @ 2019-10-17 15:32 UTC (permalink / raw)
  To: Matan Azrad, Shahaf Shuler, Viacheslav Ovsiienko
  Cc: dev, orika, jingjing.wu, stephen

This commit adds the support for creating Rx hairpin queues.
A hairpin queue is a queue that is created using DevX and is used only by
the HW, which means that the data part of the RQ is not used.

Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>

---
 drivers/net/mlx5/mlx5.c         |   2 +
 drivers/net/mlx5/mlx5_rxq.c     | 270 ++++++++++++++++++++++++++++++++++++----
 drivers/net/mlx5/mlx5_rxtx.h    |  15 +++
 drivers/net/mlx5/mlx5_trigger.c |   7 ++
 4 files changed, 270 insertions(+), 24 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 34376f6..49edb7e 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -974,6 +974,7 @@ struct mlx5_dev_spawn_data {
 	.dev_supported_ptypes_get = mlx5_dev_supported_ptypes_get,
 	.vlan_filter_set = mlx5_vlan_filter_set,
 	.rx_queue_setup = mlx5_rx_queue_setup,
+	.rx_hairpin_queue_setup = mlx5_rx_hairpin_queue_setup,
 	.tx_queue_setup = mlx5_tx_queue_setup,
 	.rx_queue_release = mlx5_rx_queue_release,
 	.tx_queue_release = mlx5_tx_queue_release,
@@ -1040,6 +1041,7 @@ struct mlx5_dev_spawn_data {
 	.dev_supported_ptypes_get = mlx5_dev_supported_ptypes_get,
 	.vlan_filter_set = mlx5_vlan_filter_set,
 	.rx_queue_setup = mlx5_rx_queue_setup,
+	.rx_hairpin_queue_setup = mlx5_rx_hairpin_queue_setup,
 	.tx_queue_setup = mlx5_tx_queue_setup,
 	.rx_queue_release = mlx5_rx_queue_release,
 	.tx_queue_release = mlx5_tx_queue_release,
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index 0db065a..66596df 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -106,21 +106,25 @@
 	struct mlx5_priv *priv = dev->data->dev_private;
 	uint16_t i;
 	uint16_t n = 0;
+	uint16_t n_ibv = 0;
 
 	if (mlx5_check_mprq_support(dev) < 0)
 		return 0;
 	/* All the configured queues should be enabled. */
 	for (i = 0; i < priv->rxqs_n; ++i) {
 		struct mlx5_rxq_data *rxq = (*priv->rxqs)[i];
+		struct mlx5_rxq_ctrl *rxq_ctrl = container_of
+			(rxq, struct mlx5_rxq_ctrl, rxq);
 
-		if (!rxq)
+		if (rxq == NULL || rxq_ctrl->type != MLX5_RXQ_TYPE_STANDARD)
 			continue;
+		n_ibv++;
 		if (mlx5_rxq_mprq_enabled(rxq))
 			++n;
 	}
 	/* Multi-Packet RQ can't be partially configured. */
-	assert(n == 0 || n == priv->rxqs_n);
-	return n == priv->rxqs_n;
+	assert(n == 0 || n == n_ibv);
+	return n == n_ibv;
 }
 
 /**
@@ -427,6 +431,7 @@
 }
 
 /**
+ * Rx queue presetup checks.
  *
  * @param dev
  *   Pointer to Ethernet device structure.
@@ -434,25 +439,14 @@
  *   RX queue index.
  * @param desc
  *   Number of descriptors to configure in queue.
- * @param socket
- *   NUMA socket on which memory must be allocated.
- * @param[in] conf
- *   Thresholds parameters.
- * @param mp
- *   Memory pool for buffer allocations.
  *
  * @return
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
-int
-mlx5_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
-		    unsigned int socket, const struct rte_eth_rxconf *conf,
-		    struct rte_mempool *mp)
+static int
+mlx5_rx_queue_pre_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
-	struct mlx5_rxq_data *rxq = (*priv->rxqs)[idx];
-	struct mlx5_rxq_ctrl *rxq_ctrl =
-		container_of(rxq, struct mlx5_rxq_ctrl, rxq);
 
 	if (!rte_is_power_of_2(desc)) {
 		desc = 1 << log2above(desc);
@@ -476,6 +470,41 @@
 		return -rte_errno;
 	}
 	mlx5_rxq_release(dev, idx);
+	return 0;
+}
+
+/**
+ * DPDK callback to configure a Rx queue.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ * @param idx
+ *   RX queue index.
+ * @param desc
+ *   Number of descriptors to configure in queue.
+ * @param socket
+ *   NUMA socket on which memory must be allocated.
+ * @param[in] conf
+ *   Thresholds parameters.
+ * @param mp
+ *   Memory pool for buffer allocations.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
+		    unsigned int socket, const struct rte_eth_rxconf *conf,
+		    struct rte_mempool *mp)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_rxq_data *rxq = (*priv->rxqs)[idx];
+	struct mlx5_rxq_ctrl *rxq_ctrl =
+		container_of(rxq, struct mlx5_rxq_ctrl, rxq);
+	int res;
+
+	res = mlx5_rx_queue_pre_setup(dev, idx, desc);
+	if (res)
+		return res;
 	rxq_ctrl = mlx5_rxq_new(dev, idx, desc, socket, conf, mp);
 	if (!rxq_ctrl) {
 		DRV_LOG(ERR, "port %u unable to allocate queue index %u",
@@ -490,6 +519,56 @@
 }
 
 /**
+ * DPDK callback to configure a Rx hairpin queue.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ * @param idx
+ *   RX queue index.
+ * @param desc
+ *   Number of descriptors to configure in queue.
+ * @param hairpin_conf
+ *   Hairpin configuration parameters.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_rx_hairpin_queue_setup(struct rte_eth_dev *dev, uint16_t idx,
+			    uint16_t desc,
+			    const struct rte_eth_hairpin_conf *hairpin_conf)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_rxq_data *rxq = (*priv->rxqs)[idx];
+	struct mlx5_rxq_ctrl *rxq_ctrl =
+		container_of(rxq, struct mlx5_rxq_ctrl, rxq);
+	int res;
+
+	res = mlx5_rx_queue_pre_setup(dev, idx, desc);
+	if (res)
+		return res;
+	if (hairpin_conf->peer_n != 1 ||
+	    hairpin_conf->peers[0].port != dev->data->port_id ||
+	    hairpin_conf->peers[0].queue >= priv->txqs_n) {
+		DRV_LOG(ERR, "port %u unable to setup hairpin queue index %u "
+			" invalid hairpind configuration", dev->data->port_id,
+			idx);
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+	rxq_ctrl = mlx5_rxq_hairpin_new(dev, idx, desc, hairpin_conf);
+	if (!rxq_ctrl) {
+		DRV_LOG(ERR, "port %u unable to allocate queue index %u",
+			dev->data->port_id, idx);
+		rte_errno = ENOMEM;
+		return -rte_errno;
+	}
+	DRV_LOG(DEBUG, "port %u adding Rx queue %u to list",
+		dev->data->port_id, idx);
+	(*priv->rxqs)[idx] = &rxq_ctrl->rxq;
+	return 0;
+}
+
+/**
  * DPDK callback to release a RX queue.
  *
  * @param dpdk_rxq
@@ -561,6 +640,24 @@
 }
 
 /**
+ * Release Rx hairpin related resources.
+ *
+ * @param rxq_obj
+ *   Hairpin Rx queue object.
+ */
+static void
+rxq_obj_hairpin_release(struct mlx5_rxq_obj *rxq_obj)
+{
+	struct mlx5_devx_modify_rq_attr rq_attr = { 0 };
+
+	assert(rxq_obj);
+	rq_attr.state = MLX5_RQC_STATE_RST;
+	rq_attr.rq_state = MLX5_RQC_STATE_RDY;
+	mlx5_devx_cmd_modify_rq(rxq_obj->rq, &rq_attr);
+	claim_zero(mlx5_devx_cmd_destroy(rxq_obj->rq));
+}
+
+/**
  * Release an Rx verbs/DevX queue object.
  *
  * @param rxq_obj
@@ -577,14 +674,22 @@
 		assert(rxq_obj->wq);
 	assert(rxq_obj->cq);
 	if (rte_atomic32_dec_and_test(&rxq_obj->refcnt)) {
-		rxq_free_elts(rxq_obj->rxq_ctrl);
-		if (rxq_obj->type == MLX5_RXQ_OBJ_TYPE_IBV) {
+		switch (rxq_obj->type) {
+		case MLX5_RXQ_OBJ_TYPE_IBV:
+			rxq_free_elts(rxq_obj->rxq_ctrl);
 			claim_zero(mlx5_glue->destroy_wq(rxq_obj->wq));
-		} else if (rxq_obj->type == MLX5_RXQ_OBJ_TYPE_DEVX_RQ) {
+			claim_zero(mlx5_glue->destroy_cq(rxq_obj->cq));
+			break;
+		case MLX5_RXQ_OBJ_TYPE_DEVX_RQ:
+			rxq_free_elts(rxq_obj->rxq_ctrl);
 			claim_zero(mlx5_devx_cmd_destroy(rxq_obj->rq));
 			rxq_release_rq_resources(rxq_obj->rxq_ctrl);
+			claim_zero(mlx5_glue->destroy_cq(rxq_obj->cq));
+			break;
+		case MLX5_RXQ_OBJ_TYPE_DEVX_HAIRPIN:
+			rxq_obj_hairpin_release(rxq_obj);
+			break;
 		}
-		claim_zero(mlx5_glue->destroy_cq(rxq_obj->cq));
 		if (rxq_obj->channel)
 			claim_zero(mlx5_glue->destroy_comp_channel
 				   (rxq_obj->channel));
@@ -1132,6 +1237,70 @@
 }
 
 /**
+ * Create the Rx hairpin queue object.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param idx
+ *   Queue index in DPDK Rx queue array.
+ *
+ * @return
+ *   The hairpin DevX object initialised, NULL otherwise and rte_errno is set.
+ */
+static struct mlx5_rxq_obj *
+mlx5_rxq_obj_hairpin_new(struct rte_eth_dev *dev, uint16_t idx)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_rxq_data *rxq_data = (*priv->rxqs)[idx];
+	struct mlx5_rxq_ctrl *rxq_ctrl =
+		container_of(rxq_data, struct mlx5_rxq_ctrl, rxq);
+	struct mlx5_devx_create_rq_attr attr = { 0 };
+	struct mlx5_rxq_obj *tmpl = NULL;
+	int ret = 0;
+
+	assert(rxq_data);
+	assert(!rxq_ctrl->obj);
+	tmpl = rte_calloc_socket(__func__, 1, sizeof(*tmpl), 0,
+				 rxq_ctrl->socket);
+	if (!tmpl) {
+		DRV_LOG(ERR,
+			"port %u Rx queue %u cannot allocate verbs resources",
+			dev->data->port_id, rxq_data->idx);
+		rte_errno = ENOMEM;
+		goto error;
+	}
+	tmpl->type = MLX5_RXQ_OBJ_TYPE_DEVX_HAIRPIN;
+	tmpl->rxq_ctrl = rxq_ctrl;
+	attr.hairpin = 1;
+	/* Workaround for hairpin startup */
+	attr.wq_attr.log_hairpin_num_packets = log2above(32);
+	/* Workaround for packets larger than 1KB */
+	attr.wq_attr.log_hairpin_data_sz =
+			priv->config.hca_attr.log_max_hairpin_wq_data_sz;
+	tmpl->rq = mlx5_devx_cmd_create_rq(priv->sh->ctx, &attr,
+					   rxq_ctrl->socket);
+	if (!tmpl->rq) {
+		DRV_LOG(ERR,
+			"port %u Rx hairpin queue %u can't create rq object",
+			dev->data->port_id, idx);
+		rte_errno = errno;
+		goto error;
+	}
+	DRV_LOG(DEBUG, "port %u rxq %u updated with %p", dev->data->port_id,
+		idx, (void *)&tmpl);
+	rte_atomic32_inc(&tmpl->refcnt);
+	LIST_INSERT_HEAD(&priv->rxqsobj, tmpl, next);
+	priv->verbs_alloc_ctx.type = MLX5_VERBS_ALLOC_TYPE_NONE;
+	return tmpl;
+error:
+	ret = rte_errno; /* Save rte_errno before cleanup. */
+	if (tmpl->rq)
+		mlx5_devx_cmd_destroy(tmpl->rq);
+	rte_errno = ret; /* Restore rte_errno. */
+	return NULL;
+}
+
+/**
  * Create the Rx queue Verbs/DevX object.
  *
  * @param dev
@@ -1163,6 +1332,8 @@ struct mlx5_rxq_obj *
 
 	assert(rxq_data);
 	assert(!rxq_ctrl->obj);
+	if (type == MLX5_RXQ_OBJ_TYPE_DEVX_HAIRPIN)
+		return mlx5_rxq_obj_hairpin_new(dev, idx);
 	priv->verbs_alloc_ctx.type = MLX5_VERBS_ALLOC_TYPE_RX_QUEUE;
 	priv->verbs_alloc_ctx.obj = rxq_ctrl;
 	tmpl = rte_calloc_socket(__func__, 1, sizeof(*tmpl), 0,
@@ -1433,15 +1604,19 @@ struct mlx5_rxq_obj *
 	unsigned int strd_num_n = 0;
 	unsigned int strd_sz_n = 0;
 	unsigned int i;
+	unsigned int n_ibv = 0;
 
 	if (!mlx5_mprq_enabled(dev))
 		return 0;
 	/* Count the total number of descriptors configured. */
 	for (i = 0; i != priv->rxqs_n; ++i) {
 		struct mlx5_rxq_data *rxq = (*priv->rxqs)[i];
+		struct mlx5_rxq_ctrl *rxq_ctrl = container_of
+			(rxq, struct mlx5_rxq_ctrl, rxq);
 
-		if (rxq == NULL)
+		if (rxq == NULL || rxq_ctrl->type != MLX5_RXQ_TYPE_STANDARD)
 			continue;
+		n_ibv++;
 		desc += 1 << rxq->elts_n;
 		/* Get the max number of strides. */
 		if (strd_num_n < rxq->strd_num_n)
@@ -1466,7 +1641,7 @@ struct mlx5_rxq_obj *
 	 * this Mempool gets available again.
 	 */
 	desc *= 4;
-	obj_num = desc + MLX5_MPRQ_MP_CACHE_SZ * priv->rxqs_n;
+	obj_num = desc + MLX5_MPRQ_MP_CACHE_SZ * n_ibv;
 	/*
 	 * rte_mempool_create_empty() has sanity check to refuse large cache
 	 * size compared to the number of elements.
@@ -1514,8 +1689,10 @@ struct mlx5_rxq_obj *
 	/* Set mempool for each Rx queue. */
 	for (i = 0; i != priv->rxqs_n; ++i) {
 		struct mlx5_rxq_data *rxq = (*priv->rxqs)[i];
+		struct mlx5_rxq_ctrl *rxq_ctrl = container_of
+			(rxq, struct mlx5_rxq_ctrl, rxq);
 
-		if (rxq == NULL)
+		if (rxq == NULL || rxq_ctrl->type != MLX5_RXQ_TYPE_STANDARD)
 			continue;
 		rxq->mprq_mp = mp;
 	}
@@ -1620,6 +1797,7 @@ struct mlx5_rxq_ctrl *
 		rte_errno = ENOMEM;
 		return NULL;
 	}
+	tmpl->type = MLX5_RXQ_TYPE_STANDARD;
 	if (mlx5_mr_btree_init(&tmpl->rxq.mr_ctrl.cache_bh,
 			       MLX5_MR_BTREE_CACHE_N, socket)) {
 		/* rte_errno is already set. */
@@ -1788,6 +1966,49 @@ struct mlx5_rxq_ctrl *
 }
 
 /**
+ * Create a DPDK Rx hairpin queue.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param idx
+ *   RX queue index.
+ * @param desc
+ *   Number of descriptors to configure in queue.
+ * @param hairpin_conf
+ *   The hairpin binding configuration.
+ *
+ * @return
+ *   A DPDK queue object on success, NULL otherwise and rte_errno is set.
+ */
+struct mlx5_rxq_ctrl *
+mlx5_rxq_hairpin_new(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
+		     const struct rte_eth_hairpin_conf *hairpin_conf)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_rxq_ctrl *tmpl;
+
+	tmpl = rte_calloc_socket("RXQ", 1, sizeof(*tmpl), 0, SOCKET_ID_ANY);
+	if (!tmpl) {
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+	tmpl->type = MLX5_RXQ_TYPE_HAIRPIN;
+	tmpl->socket = SOCKET_ID_ANY;
+	tmpl->rxq.rss_hash = 0;
+	tmpl->rxq.port_id = dev->data->port_id;
+	tmpl->priv = priv;
+	tmpl->rxq.mp = NULL;
+	tmpl->rxq.elts_n = log2above(desc);
+	tmpl->rxq.elts = NULL;
+	tmpl->rxq.mr_ctrl.cache_bh = (struct mlx5_mr_btree) { 0 };
+	tmpl->hairpin_conf = *hairpin_conf;
+	tmpl->rxq.idx = idx;
+	rte_atomic32_inc(&tmpl->refcnt);
+	LIST_INSERT_HEAD(&priv->rxqsctrl, tmpl, next);
+	return tmpl;
+}
+
+/**
  * Get a Rx queue.
  *
  * @param dev
@@ -1841,7 +2062,8 @@ struct mlx5_rxq_ctrl *
 		if (rxq_ctrl->dbr_umem_id_valid)
 			claim_zero(mlx5_release_dbr(dev, rxq_ctrl->dbr_umem_id,
 						    rxq_ctrl->dbr_offset));
-		mlx5_mr_btree_free(&rxq_ctrl->rxq.mr_ctrl.cache_bh);
+		if (rxq_ctrl->type == MLX5_RXQ_TYPE_STANDARD)
+			mlx5_mr_btree_free(&rxq_ctrl->rxq.mr_ctrl.cache_bh);
 		LIST_REMOVE(rxq_ctrl, next);
 		rte_free(rxq_ctrl);
 		(*priv->rxqs)[idx] = NULL;
diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index 4bb28a4..13fdc38 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -159,6 +159,13 @@ struct mlx5_rxq_data {
 enum mlx5_rxq_obj_type {
 	MLX5_RXQ_OBJ_TYPE_IBV,		/* mlx5_rxq_obj with ibv_wq. */
 	MLX5_RXQ_OBJ_TYPE_DEVX_RQ,	/* mlx5_rxq_obj with mlx5_devx_rq. */
+	MLX5_RXQ_OBJ_TYPE_DEVX_HAIRPIN,
+	/* mlx5_rxq_obj with mlx5_devx_rq and hairpin support. */
+};
+
+enum mlx5_rxq_type {
+	MLX5_RXQ_TYPE_STANDARD, /* Standard Rx queue. */
+	MLX5_RXQ_TYPE_HAIRPIN, /* Hairpin Rx queue. */
 };
 
 /* Verbs/DevX Rx queue elements. */
@@ -183,6 +190,7 @@ struct mlx5_rxq_ctrl {
 	rte_atomic32_t refcnt; /* Reference counter. */
 	struct mlx5_rxq_obj *obj; /* Verbs/DevX elements. */
 	struct mlx5_priv *priv; /* Back pointer to private data. */
+	enum mlx5_rxq_type type; /* Rxq type. */
 	unsigned int socket; /* CPU socket ID for allocations. */
 	unsigned int irq:1; /* Whether IRQ is enabled. */
 	unsigned int dbr_umem_id_valid:1; /* dbr_umem_id holds a valid value. */
@@ -193,6 +201,7 @@ struct mlx5_rxq_ctrl {
 	uint32_t dbr_umem_id; /* Storing door-bell information, */
 	uint64_t dbr_offset;  /* needed when freeing door-bell. */
 	struct mlx5dv_devx_umem *wq_umem; /* WQ buffer registration info. */
+	struct rte_eth_hairpin_conf hairpin_conf; /* Hairpin configuration. */
 };
 
 enum mlx5_ind_tbl_type {
@@ -339,6 +348,9 @@ struct mlx5_txq_ctrl {
 int mlx5_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 			unsigned int socket, const struct rte_eth_rxconf *conf,
 			struct rte_mempool *mp);
+int mlx5_rx_hairpin_queue_setup
+	(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
+	 const struct rte_eth_hairpin_conf *hairpin_conf);
 void mlx5_rx_queue_release(void *dpdk_rxq);
 int mlx5_rx_intr_vec_enable(struct rte_eth_dev *dev);
 void mlx5_rx_intr_vec_disable(struct rte_eth_dev *dev);
@@ -351,6 +363,9 @@ struct mlx5_rxq_ctrl *mlx5_rxq_new(struct rte_eth_dev *dev, uint16_t idx,
 				   uint16_t desc, unsigned int socket,
 				   const struct rte_eth_rxconf *conf,
 				   struct rte_mempool *mp);
+struct mlx5_rxq_ctrl *mlx5_rxq_hairpin_new
+	(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
+	 const struct rte_eth_hairpin_conf *hairpin_conf);
 struct mlx5_rxq_ctrl *mlx5_rxq_get(struct rte_eth_dev *dev, uint16_t idx);
 int mlx5_rxq_release(struct rte_eth_dev *dev, uint16_t idx);
 int mlx5_rxq_verify(struct rte_eth_dev *dev);
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index 122f31c..cb31ae2 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -118,6 +118,13 @@
 
 		if (!rxq_ctrl)
 			continue;
+		if (rxq_ctrl->type == MLX5_RXQ_TYPE_HAIRPIN) {
+			rxq_ctrl->obj = mlx5_rxq_obj_new
+				(dev, i, MLX5_RXQ_OBJ_TYPE_DEVX_HAIRPIN);
+			if (!rxq_ctrl->obj)
+				goto error;
+			continue;
+		}
 		/* Pre-register Rx mempool. */
 		mp = mlx5_rxq_mprq_enabled(&rxq_ctrl->rxq) ?
 		     rxq_ctrl->rxq.mprq_mp : rxq_ctrl->rxq.mp;
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 186+ messages in thread

* [dpdk-dev] [PATCH v4 05/15] net/mlx5: prepare txq to work with different types
  2019-10-17 15:32 ` [dpdk-dev] [PATCH v4 00/15] " Ori Kam
                     ` (3 preceding siblings ...)
  2019-10-17 15:32   ` [dpdk-dev] [PATCH v4 04/15] net/mlx5: support Rx hairpin queues Ori Kam
@ 2019-10-17 15:32   ` Ori Kam
  2019-10-17 15:32   ` [dpdk-dev] [PATCH v4 06/15] net/mlx5: support Tx hairpin queues Ori Kam
                     ` (10 subsequent siblings)
  15 siblings, 0 replies; 186+ messages in thread
From: Ori Kam @ 2019-10-17 15:32 UTC (permalink / raw)
  To: Matan Azrad, Shahaf Shuler, Viacheslav Ovsiienko
  Cc: dev, orika, jingjing.wu, stephen

Currently all Tx queues are created using Verbs.
This commit renames the related objects so the naming no longer refers
to Verbs, since the next commit introduces a new queue type (hairpin).

Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>

---
 drivers/net/mlx5/mlx5.c         |  2 +-
 drivers/net/mlx5/mlx5.h         |  2 +-
 drivers/net/mlx5/mlx5_rxtx.c    |  2 +-
 drivers/net/mlx5/mlx5_rxtx.h    | 39 +++++++++++++++++------
 drivers/net/mlx5/mlx5_trigger.c |  4 +--
 drivers/net/mlx5/mlx5_txq.c     | 70 ++++++++++++++++++++---------------------
 6 files changed, 69 insertions(+), 50 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 49edb7e..2431a55 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -911,7 +911,7 @@ struct mlx5_dev_spawn_data {
 	if (ret)
 		DRV_LOG(WARNING, "port %u some Rx queues still remain",
 			dev->data->port_id);
-	ret = mlx5_txq_ibv_verify(dev);
+	ret = mlx5_txq_obj_verify(dev);
 	if (ret)
 		DRV_LOG(WARNING, "port %u some Verbs Tx queue still remain",
 			dev->data->port_id);
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 4d14e9e..36cced9 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -645,7 +645,7 @@ struct mlx5_priv {
 	LIST_HEAD(rxqobj, mlx5_rxq_obj) rxqsobj; /* Verbs/DevX Rx queues. */
 	LIST_HEAD(hrxq, mlx5_hrxq) hrxqs; /* Verbs Hash Rx queues. */
 	LIST_HEAD(txq, mlx5_txq_ctrl) txqsctrl; /* DPDK Tx queues. */
-	LIST_HEAD(txqibv, mlx5_txq_ibv) txqsibv; /* Verbs Tx queues. */
+	LIST_HEAD(txqobj, mlx5_txq_obj) txqsobj; /* Verbs/DevX Tx queues. */
 	/* Indirection tables. */
 	LIST_HEAD(ind_tables, mlx5_ind_table_obj) ind_tbls;
 	/* Pointer to next element. */
diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index 10d0ca1..f23708c 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -863,7 +863,7 @@ enum mlx5_txcmp_code {
 			.qp_state = IBV_QPS_RESET,
 			.port_num = (uint8_t)priv->ibv_port,
 		};
-		struct ibv_qp *qp = txq_ctrl->ibv->qp;
+		struct ibv_qp *qp = txq_ctrl->obj->qp;
 
 		ret = mlx5_glue->modify_qp(qp, &mod, IBV_QP_STATE);
 		if (ret) {
diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index 13fdc38..12f9bfb 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -308,13 +308,31 @@ struct mlx5_txq_data {
 	/* Storage for queued packets, must be the last field. */
 } __rte_cache_aligned;
 
-/* Verbs Rx queue elements. */
-struct mlx5_txq_ibv {
-	LIST_ENTRY(mlx5_txq_ibv) next; /* Pointer to the next element. */
+enum mlx5_txq_obj_type {
+	MLX5_TXQ_OBJ_TYPE_IBV,		/* mlx5_txq_obj with ibv_wq. */
+	MLX5_TXQ_OBJ_TYPE_DEVX_HAIRPIN,
+	/* mlx5_txq_obj with mlx5_devx_tq and hairpin support. */
+};
+
+enum mlx5_txq_type {
+	MLX5_TXQ_TYPE_STANDARD, /* Standard Tx queue. */
+	MLX5_TXQ_TYPE_HAIRPIN, /* Hairpin Tx queue. */
+};
+
+/* Verbs/DevX Tx queue elements. */
+struct mlx5_txq_obj {
+	LIST_ENTRY(mlx5_txq_obj) next; /* Pointer to the next element. */
 	rte_atomic32_t refcnt; /* Reference counter. */
 	struct mlx5_txq_ctrl *txq_ctrl; /* Pointer to the control queue. */
-	struct ibv_cq *cq; /* Completion Queue. */
-	struct ibv_qp *qp; /* Queue Pair. */
+	enum mlx5_rxq_obj_type type; /* The txq object type. */
+	RTE_STD_C11
+	union {
+		struct {
+			struct ibv_cq *cq; /* Completion Queue. */
+			struct ibv_qp *qp; /* Queue Pair. */
+		};
+		struct mlx5_devx_obj *sq; /* DevX object for Tx queue. */
+	};
 };
 
 /* TX queue control descriptor. */
@@ -322,9 +340,10 @@ struct mlx5_txq_ctrl {
 	LIST_ENTRY(mlx5_txq_ctrl) next; /* Pointer to the next element. */
 	rte_atomic32_t refcnt; /* Reference counter. */
 	unsigned int socket; /* CPU socket ID for allocations. */
+	enum mlx5_txq_type type; /* The txq ctrl type. */
 	unsigned int max_inline_data; /* Max inline data. */
 	unsigned int max_tso_header; /* Max TSO header size. */
-	struct mlx5_txq_ibv *ibv; /* Verbs queue object. */
+	struct mlx5_txq_obj *obj; /* Verbs/DevX queue object. */
 	struct mlx5_priv *priv; /* Back pointer to private data. */
 	off_t uar_mmap_offset; /* UAR mmap offset for non-primary process. */
 	void *bf_reg; /* BlueFlame register from Verbs. */
@@ -393,10 +412,10 @@ int mlx5_tx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 			unsigned int socket, const struct rte_eth_txconf *conf);
 void mlx5_tx_queue_release(void *dpdk_txq);
 int mlx5_tx_uar_init_secondary(struct rte_eth_dev *dev, int fd);
-struct mlx5_txq_ibv *mlx5_txq_ibv_new(struct rte_eth_dev *dev, uint16_t idx);
-struct mlx5_txq_ibv *mlx5_txq_ibv_get(struct rte_eth_dev *dev, uint16_t idx);
-int mlx5_txq_ibv_release(struct mlx5_txq_ibv *txq_ibv);
-int mlx5_txq_ibv_verify(struct rte_eth_dev *dev);
+struct mlx5_txq_obj *mlx5_txq_obj_new(struct rte_eth_dev *dev, uint16_t idx);
+struct mlx5_txq_obj *mlx5_txq_obj_get(struct rte_eth_dev *dev, uint16_t idx);
+int mlx5_txq_obj_release(struct mlx5_txq_obj *txq_obj);
+int mlx5_txq_obj_verify(struct rte_eth_dev *dev);
 struct mlx5_txq_ctrl *mlx5_txq_new(struct rte_eth_dev *dev, uint16_t idx,
 				   uint16_t desc, unsigned int socket,
 				   const struct rte_eth_txconf *conf);
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index cb31ae2..50c4df5 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -52,8 +52,8 @@
 		if (!txq_ctrl)
 			continue;
 		txq_alloc_elts(txq_ctrl);
-		txq_ctrl->ibv = mlx5_txq_ibv_new(dev, i);
-		if (!txq_ctrl->ibv) {
+		txq_ctrl->obj = mlx5_txq_obj_new(dev, i);
+		if (!txq_ctrl->obj) {
 			rte_errno = ENOMEM;
 			goto error;
 		}
diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
index 53d45e7..a6e2563 100644
--- a/drivers/net/mlx5/mlx5_txq.c
+++ b/drivers/net/mlx5/mlx5_txq.c
@@ -375,15 +375,15 @@
  * @return
  *   The Verbs object initialised, NULL otherwise and rte_errno is set.
  */
-struct mlx5_txq_ibv *
-mlx5_txq_ibv_new(struct rte_eth_dev *dev, uint16_t idx)
+struct mlx5_txq_obj *
+mlx5_txq_obj_new(struct rte_eth_dev *dev, uint16_t idx)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_txq_data *txq_data = (*priv->txqs)[idx];
 	struct mlx5_txq_ctrl *txq_ctrl =
 		container_of(txq_data, struct mlx5_txq_ctrl, txq);
-	struct mlx5_txq_ibv tmpl;
-	struct mlx5_txq_ibv *txq_ibv = NULL;
+	struct mlx5_txq_obj tmpl;
+	struct mlx5_txq_obj *txq_obj = NULL;
 	union {
 		struct ibv_qp_init_attr_ex init;
 		struct ibv_cq_init_attr_ex cq;
@@ -411,7 +411,7 @@ struct mlx5_txq_ibv *
 		rte_errno = EINVAL;
 		return NULL;
 	}
-	memset(&tmpl, 0, sizeof(struct mlx5_txq_ibv));
+	memset(&tmpl, 0, sizeof(struct mlx5_txq_obj));
 	attr.cq = (struct ibv_cq_init_attr_ex){
 		.comp_mask = 0,
 	};
@@ -502,9 +502,9 @@ struct mlx5_txq_ibv *
 		rte_errno = errno;
 		goto error;
 	}
-	txq_ibv = rte_calloc_socket(__func__, 1, sizeof(struct mlx5_txq_ibv), 0,
+	txq_obj = rte_calloc_socket(__func__, 1, sizeof(struct mlx5_txq_obj), 0,
 				    txq_ctrl->socket);
-	if (!txq_ibv) {
+	if (!txq_obj) {
 		DRV_LOG(ERR, "port %u Tx queue %u cannot allocate memory",
 			dev->data->port_id, idx);
 		rte_errno = ENOMEM;
@@ -568,9 +568,9 @@ struct mlx5_txq_ibv *
 		}
 	}
 #endif
-	txq_ibv->qp = tmpl.qp;
-	txq_ibv->cq = tmpl.cq;
-	rte_atomic32_inc(&txq_ibv->refcnt);
+	txq_obj->qp = tmpl.qp;
+	txq_obj->cq = tmpl.cq;
+	rte_atomic32_inc(&txq_obj->refcnt);
 	txq_ctrl->bf_reg = qp.bf.reg;
 	if (qp.comp_mask & MLX5DV_QP_MASK_UAR_MMAP_OFFSET) {
 		txq_ctrl->uar_mmap_offset = qp.uar_mmap_offset;
@@ -585,18 +585,18 @@ struct mlx5_txq_ibv *
 		goto error;
 	}
 	txq_uar_init(txq_ctrl);
-	LIST_INSERT_HEAD(&priv->txqsibv, txq_ibv, next);
-	txq_ibv->txq_ctrl = txq_ctrl;
+	LIST_INSERT_HEAD(&priv->txqsobj, txq_obj, next);
+	txq_obj->txq_ctrl = txq_ctrl;
 	priv->verbs_alloc_ctx.type = MLX5_VERBS_ALLOC_TYPE_NONE;
-	return txq_ibv;
+	return txq_obj;
 error:
 	ret = rte_errno; /* Save rte_errno before cleanup. */
 	if (tmpl.cq)
 		claim_zero(mlx5_glue->destroy_cq(tmpl.cq));
 	if (tmpl.qp)
 		claim_zero(mlx5_glue->destroy_qp(tmpl.qp));
-	if (txq_ibv)
-		rte_free(txq_ibv);
+	if (txq_obj)
+		rte_free(txq_obj);
 	priv->verbs_alloc_ctx.type = MLX5_VERBS_ALLOC_TYPE_NONE;
 	rte_errno = ret; /* Restore rte_errno. */
 	return NULL;
@@ -613,8 +613,8 @@ struct mlx5_txq_ibv *
  * @return
  *   The Verbs object if it exists.
  */
-struct mlx5_txq_ibv *
-mlx5_txq_ibv_get(struct rte_eth_dev *dev, uint16_t idx)
+struct mlx5_txq_obj *
+mlx5_txq_obj_get(struct rte_eth_dev *dev, uint16_t idx)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_txq_ctrl *txq_ctrl;
@@ -624,29 +624,29 @@ struct mlx5_txq_ibv *
 	if (!(*priv->txqs)[idx])
 		return NULL;
 	txq_ctrl = container_of((*priv->txqs)[idx], struct mlx5_txq_ctrl, txq);
-	if (txq_ctrl->ibv)
-		rte_atomic32_inc(&txq_ctrl->ibv->refcnt);
-	return txq_ctrl->ibv;
+	if (txq_ctrl->obj)
+		rte_atomic32_inc(&txq_ctrl->obj->refcnt);
+	return txq_ctrl->obj;
 }
 
 /**
 * Release a Tx verbs queue object.
  *
- * @param txq_ibv
+ * @param txq_obj
  *   Verbs Tx queue object.
  *
  * @return
  *   1 while a reference on it exists, 0 when freed.
  */
 int
-mlx5_txq_ibv_release(struct mlx5_txq_ibv *txq_ibv)
+mlx5_txq_obj_release(struct mlx5_txq_obj *txq_obj)
 {
-	assert(txq_ibv);
-	if (rte_atomic32_dec_and_test(&txq_ibv->refcnt)) {
-		claim_zero(mlx5_glue->destroy_qp(txq_ibv->qp));
-		claim_zero(mlx5_glue->destroy_cq(txq_ibv->cq));
-		LIST_REMOVE(txq_ibv, next);
-		rte_free(txq_ibv);
+	assert(txq_obj);
+	if (rte_atomic32_dec_and_test(&txq_obj->refcnt)) {
+		claim_zero(mlx5_glue->destroy_qp(txq_obj->qp));
+		claim_zero(mlx5_glue->destroy_cq(txq_obj->cq));
+		LIST_REMOVE(txq_obj, next);
+		rte_free(txq_obj);
 		return 0;
 	}
 	return 1;
@@ -662,15 +662,15 @@ struct mlx5_txq_ibv *
 *   The number of objects not released.
  */
 int
-mlx5_txq_ibv_verify(struct rte_eth_dev *dev)
+mlx5_txq_obj_verify(struct rte_eth_dev *dev)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	int ret = 0;
-	struct mlx5_txq_ibv *txq_ibv;
+	struct mlx5_txq_obj *txq_obj;
 
-	LIST_FOREACH(txq_ibv, &priv->txqsibv, next) {
+	LIST_FOREACH(txq_obj, &priv->txqsobj, next) {
 		DRV_LOG(DEBUG, "port %u Verbs Tx queue %u still referenced",
-			dev->data->port_id, txq_ibv->txq_ctrl->txq.idx);
+			dev->data->port_id, txq_obj->txq_ctrl->txq.idx);
 		++ret;
 	}
 	return ret;
@@ -1127,7 +1127,7 @@ struct mlx5_txq_ctrl *
 	if ((*priv->txqs)[idx]) {
 		ctrl = container_of((*priv->txqs)[idx], struct mlx5_txq_ctrl,
 				    txq);
-		mlx5_txq_ibv_get(dev, idx);
+		mlx5_txq_obj_get(dev, idx);
 		rte_atomic32_inc(&ctrl->refcnt);
 	}
 	return ctrl;
@@ -1153,8 +1153,8 @@ struct mlx5_txq_ctrl *
 	if (!(*priv->txqs)[idx])
 		return 0;
 	txq = container_of((*priv->txqs)[idx], struct mlx5_txq_ctrl, txq);
-	if (txq->ibv && !mlx5_txq_ibv_release(txq->ibv))
-		txq->ibv = NULL;
+	if (txq->obj && !mlx5_txq_obj_release(txq->obj))
+		txq->obj = NULL;
 	if (rte_atomic32_dec_and_test(&txq->refcnt)) {
 		txq_free_elts(txq);
 		mlx5_mr_btree_free(&txq->txq.mr_ctrl.cache_bh);
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 186+ messages in thread

* [dpdk-dev] [PATCH v4 06/15] net/mlx5: support Tx hairpin queues
  2019-10-17 15:32 ` [dpdk-dev] [PATCH v4 00/15] " Ori Kam
                     ` (4 preceding siblings ...)
  2019-10-17 15:32   ` [dpdk-dev] [PATCH v4 05/15] net/mlx5: prepare txq to work with different types Ori Kam
@ 2019-10-17 15:32   ` Ori Kam
  2019-10-17 15:32   ` [dpdk-dev] [PATCH v4 07/15] net/mlx5: add get hairpin capabilities Ori Kam
                     ` (9 subsequent siblings)
  15 siblings, 0 replies; 186+ messages in thread
From: Ori Kam @ 2019-10-17 15:32 UTC (permalink / raw)
  To: Matan Azrad, Shahaf Shuler, Viacheslav Ovsiienko
  Cc: dev, orika, jingjing.wu, stephen

This commit adds support for creating Tx hairpin queues.
A hairpin queue is created using DevX and is used only by the HW
(see the binding sketch below).
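
As an illustration, below is a hedged sketch of how the new
mlx5_devx_cmd_modify_sq() helper can be used to move a hairpin SQ from
reset to ready state and point it at its peer RQ, which is what the
hairpin binding patch later in this series does. The MLX5_SQC_STATE_*
names are assumed analogous to the MLX5_RQC_STATE_* values used on the
Rx side, and the peer_rq_id/peer_vhca_id inputs are placeholders.

	/* sq is assumed to come from mlx5_devx_cmd_create_sq(). */
	static int
	hairpin_sq_to_ready(struct mlx5_devx_obj *sq, uint32_t peer_rq_id,
			    uint16_t peer_vhca_id)
	{
		struct mlx5_devx_modify_sq_attr attr = { 0 };

		attr.sq_state = MLX5_SQC_STATE_RST; /* Current SQ state. */
		attr.state = MLX5_SQC_STATE_RDY; /* Requested SQ state. */
		attr.hairpin_peer_rq = peer_rq_id;
		attr.hairpin_peer_vhca = peer_vhca_id;
		return mlx5_devx_cmd_modify_sq(sq, &attr);
	}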

Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>

---
 drivers/net/mlx5/mlx5.c           |  36 +++++-
 drivers/net/mlx5/mlx5.h           |  46 ++++++++
 drivers/net/mlx5/mlx5_devx_cmds.c | 186 ++++++++++++++++++++++++++++++
 drivers/net/mlx5/mlx5_prm.h       | 118 +++++++++++++++++++
 drivers/net/mlx5/mlx5_rxtx.h      |  18 ++-
 drivers/net/mlx5/mlx5_trigger.c   |  10 +-
 drivers/net/mlx5/mlx5_txq.c       | 230 +++++++++++++++++++++++++++++++++++---
 7 files changed, 620 insertions(+), 24 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 2431a55..c53a9c6 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -325,6 +325,9 @@ struct mlx5_dev_spawn_data {
 	struct mlx5_ibv_shared *sh;
 	int err = 0;
 	uint32_t i;
+#ifdef HAVE_IBV_FLOW_DV_SUPPORT
+	struct mlx5_devx_tis_attr tis_attr = { 0 };
+#endif
 
 	assert(spawn);
 	/* Secondary process should not create the shared context. */
@@ -389,10 +392,25 @@ struct mlx5_dev_spawn_data {
 		goto error;
 	}
 #ifdef HAVE_IBV_FLOW_DV_SUPPORT
-	err = mlx5_get_pdn(sh->pd, &sh->pdn);
-	if (err) {
-		DRV_LOG(ERR, "Fail to extract pdn from PD");
-		goto error;
+	if (sh->devx) {
+		err = mlx5_get_pdn(sh->pd, &sh->pdn);
+		if (err) {
+			DRV_LOG(ERR, "Fail to extract pdn from PD");
+			goto error;
+		}
+		sh->td = mlx5_devx_cmd_create_td(sh->ctx);
+		if (!sh->td) {
+			DRV_LOG(ERR, "TD allocation failure");
+			err = ENOMEM;
+			goto error;
+		}
+		tis_attr.transport_domain = sh->td->id;
+		sh->tis = mlx5_devx_cmd_create_tis(sh->ctx, &tis_attr);
+		if (!sh->tis) {
+			DRV_LOG(ERR, "TIS allocation failure");
+			err = ENOMEM;
+			goto error;
+		}
 	}
 #endif /* HAVE_IBV_FLOW_DV_SUPPORT */
 	/*
@@ -425,6 +443,10 @@ struct mlx5_dev_spawn_data {
 error:
 	pthread_mutex_unlock(&mlx5_ibv_list_mutex);
 	assert(sh);
+	if (sh->tis)
+		claim_zero(mlx5_devx_cmd_destroy(sh->tis));
+	if (sh->td)
+		claim_zero(mlx5_devx_cmd_destroy(sh->td));
 	if (sh->pd)
 		claim_zero(mlx5_glue->dealloc_pd(sh->pd));
 	if (sh->ctx)
@@ -485,6 +507,10 @@ struct mlx5_dev_spawn_data {
 	pthread_mutex_destroy(&sh->intr_mutex);
 	if (sh->pd)
 		claim_zero(mlx5_glue->dealloc_pd(sh->pd));
+	if (sh->tis)
+		claim_zero(mlx5_devx_cmd_destroy(sh->tis));
+	if (sh->td)
+		claim_zero(mlx5_devx_cmd_destroy(sh->td));
 	if (sh->ctx)
 		claim_zero(mlx5_glue->close_device(sh->ctx));
 	rte_free(sh);
@@ -976,6 +1002,7 @@ struct mlx5_dev_spawn_data {
 	.rx_queue_setup = mlx5_rx_queue_setup,
 	.rx_hairpin_queue_setup = mlx5_rx_hairpin_queue_setup,
 	.tx_queue_setup = mlx5_tx_queue_setup,
+	.tx_hairpin_queue_setup = mlx5_tx_hairpin_queue_setup,
 	.rx_queue_release = mlx5_rx_queue_release,
 	.tx_queue_release = mlx5_tx_queue_release,
 	.flow_ctrl_get = mlx5_dev_get_flow_ctrl,
@@ -1043,6 +1070,7 @@ struct mlx5_dev_spawn_data {
 	.rx_queue_setup = mlx5_rx_queue_setup,
 	.rx_hairpin_queue_setup = mlx5_rx_hairpin_queue_setup,
 	.tx_queue_setup = mlx5_tx_queue_setup,
+	.tx_hairpin_queue_setup = mlx5_tx_hairpin_queue_setup,
 	.rx_queue_release = mlx5_rx_queue_release,
 	.tx_queue_release = mlx5_tx_queue_release,
 	.flow_ctrl_get = mlx5_dev_get_flow_ctrl,
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 36cced9..7ea4950 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -350,6 +350,43 @@ struct mlx5_devx_rqt_attr {
 	uint32_t rq_list[];
 };
 
+/* TIS attributes structure. */
+struct mlx5_devx_tis_attr {
+	uint32_t strict_lag_tx_port_affinity:1;
+	uint32_t tls_en:1;
+	uint32_t lag_tx_port_affinity:4;
+	uint32_t prio:4;
+	uint32_t transport_domain:24;
+};
+
+/* SQ attributes structure, used by SQ create operation. */
+struct mlx5_devx_create_sq_attr {
+	uint32_t rlky:1;
+	uint32_t cd_master:1;
+	uint32_t fre:1;
+	uint32_t flush_in_error_en:1;
+	uint32_t allow_multi_pkt_send_wqe:1;
+	uint32_t min_wqe_inline_mode:3;
+	uint32_t state:4;
+	uint32_t reg_umr:1;
+	uint32_t allow_swp:1;
+	uint32_t hairpin:1;
+	uint32_t user_index:24;
+	uint32_t cqn:24;
+	uint32_t packet_pacing_rate_limit_index:16;
+	uint32_t tis_lst_sz:16;
+	uint32_t tis_num:24;
+	struct mlx5_devx_wq_attr wq_attr;
+};
+
+/* SQ attributes structure, used by SQ modify operation. */
+struct mlx5_devx_modify_sq_attr {
+	uint32_t sq_state:4;
+	uint32_t state:4;
+	uint32_t hairpin_peer_rq:24;
+	uint32_t hairpin_peer_vhca:16;
+};
+
 /**
  * Type of object being allocated.
  */
@@ -591,6 +628,8 @@ struct mlx5_ibv_shared {
 	struct rte_intr_handle intr_handle; /* Interrupt handler for device. */
 	struct rte_intr_handle intr_handle_devx; /* DEVX interrupt handler. */
 	struct mlx5dv_devx_cmd_comp *devx_comp; /* DEVX async comp obj. */
+	struct mlx5_devx_obj *tis; /* TIS object. */
+	struct mlx5_devx_obj *td; /* Transport domain. */
 	struct mlx5_ibv_shared_port port[]; /* per device port data array. */
 };
 
@@ -911,5 +950,12 @@ struct mlx5_devx_obj *mlx5_devx_cmd_create_tir(struct ibv_context *ctx,
 					struct mlx5_devx_tir_attr *tir_attr);
 struct mlx5_devx_obj *mlx5_devx_cmd_create_rqt(struct ibv_context *ctx,
 					struct mlx5_devx_rqt_attr *rqt_attr);
+struct mlx5_devx_obj *mlx5_devx_cmd_create_sq
+	(struct ibv_context *ctx, struct mlx5_devx_create_sq_attr *sq_attr);
+int mlx5_devx_cmd_modify_sq
+	(struct mlx5_devx_obj *sq, struct mlx5_devx_modify_sq_attr *sq_attr);
+struct mlx5_devx_obj *mlx5_devx_cmd_create_tis
+	(struct ibv_context *ctx, struct mlx5_devx_tis_attr *tis_attr);
+struct mlx5_devx_obj *mlx5_devx_cmd_create_td(struct ibv_context *ctx);
 
 #endif /* RTE_PMD_MLX5_H_ */
diff --git a/drivers/net/mlx5/mlx5_devx_cmds.c b/drivers/net/mlx5/mlx5_devx_cmds.c
index b072c37..917bbf9 100644
--- a/drivers/net/mlx5/mlx5_devx_cmds.c
+++ b/drivers/net/mlx5/mlx5_devx_cmds.c
@@ -709,3 +709,189 @@ struct mlx5_devx_obj *
 	rqt->id = MLX5_GET(create_rqt_out, out, rqtn);
 	return rqt;
 }
+
+/**
+ * Create SQ using DevX API.
+ *
+ * @param[in] ctx
+ *   ibv_context returned from mlx5dv_open_device.
+ * @param [in] sq_attr
+ *   Pointer to SQ attributes structure.
+ *
+ * @return
+ *   The DevX object created, NULL otherwise and rte_errno is set.
+ */
+struct mlx5_devx_obj *
+mlx5_devx_cmd_create_sq(struct ibv_context *ctx,
+			struct mlx5_devx_create_sq_attr *sq_attr)
+{
+	uint32_t in[MLX5_ST_SZ_DW(create_sq_in)] = {0};
+	uint32_t out[MLX5_ST_SZ_DW(create_sq_out)] = {0};
+	void *sq_ctx;
+	void *wq_ctx;
+	struct mlx5_devx_wq_attr *wq_attr;
+	struct mlx5_devx_obj *sq = NULL;
+
+	sq = rte_calloc(__func__, 1, sizeof(*sq), 0);
+	if (!sq) {
+		DRV_LOG(ERR, "Failed to allocate SQ data");
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+	MLX5_SET(create_sq_in, in, opcode, MLX5_CMD_OP_CREATE_SQ);
+	sq_ctx = MLX5_ADDR_OF(create_sq_in, in, ctx);
+	MLX5_SET(sqc, sq_ctx, rlky, sq_attr->rlky);
+	MLX5_SET(sqc, sq_ctx, cd_master, sq_attr->cd_master);
+	MLX5_SET(sqc, sq_ctx, fre, sq_attr->fre);
+	MLX5_SET(sqc, sq_ctx, flush_in_error_en, sq_attr->flush_in_error_en);
+	MLX5_SET(sqc, sq_ctx, allow_multi_pkt_send_wqe,
+		 sq_attr->allow_multi_pkt_send_wqe);
+	MLX5_SET(sqc, sq_ctx, min_wqe_inline_mode,
+		 sq_attr->min_wqe_inline_mode);
+	MLX5_SET(sqc, sq_ctx, state, sq_attr->state);
+	MLX5_SET(sqc, sq_ctx, reg_umr, sq_attr->reg_umr);
+	MLX5_SET(sqc, sq_ctx, allow_swp, sq_attr->allow_swp);
+	MLX5_SET(sqc, sq_ctx, hairpin, sq_attr->hairpin);
+	MLX5_SET(sqc, sq_ctx, user_index, sq_attr->user_index);
+	MLX5_SET(sqc, sq_ctx, cqn, sq_attr->cqn);
+	MLX5_SET(sqc, sq_ctx, packet_pacing_rate_limit_index,
+		 sq_attr->packet_pacing_rate_limit_index);
+	MLX5_SET(sqc, sq_ctx, tis_lst_sz, sq_attr->tis_lst_sz);
+	MLX5_SET(sqc, sq_ctx, tis_num_0, sq_attr->tis_num);
+	wq_ctx = MLX5_ADDR_OF(sqc, sq_ctx, wq);
+	wq_attr = &sq_attr->wq_attr;
+	devx_cmd_fill_wq_data(wq_ctx, wq_attr);
+	sq->obj = mlx5_glue->devx_obj_create(ctx, in, sizeof(in),
+					     out, sizeof(out));
+	if (!sq->obj) {
+		DRV_LOG(ERR, "Failed to create SQ using DevX");
+		rte_errno = errno;
+		rte_free(sq);
+		return NULL;
+	}
+	sq->id = MLX5_GET(create_sq_out, out, sqn);
+	return sq;
+}
+
+/**
+ * Modify SQ using DevX API.
+ *
+ * @param[in] sq
+ *   Pointer to SQ object structure.
+ * @param [in] sq_attr
+ *   Pointer to SQ attributes structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_devx_cmd_modify_sq(struct mlx5_devx_obj *sq,
+			struct mlx5_devx_modify_sq_attr *sq_attr)
+{
+	uint32_t in[MLX5_ST_SZ_DW(modify_sq_in)] = {0};
+	uint32_t out[MLX5_ST_SZ_DW(modify_sq_out)] = {0};
+	void *sq_ctx;
+	int ret;
+
+	MLX5_SET(modify_sq_in, in, opcode, MLX5_CMD_OP_MODIFY_SQ);
+	MLX5_SET(modify_sq_in, in, sq_state, sq_attr->sq_state);
+	MLX5_SET(modify_sq_in, in, sqn, sq->id);
+	sq_ctx = MLX5_ADDR_OF(modify_sq_in, in, ctx);
+	MLX5_SET(sqc, sq_ctx, state, sq_attr->state);
+	MLX5_SET(sqc, sq_ctx, hairpin_peer_rq, sq_attr->hairpin_peer_rq);
+	MLX5_SET(sqc, sq_ctx, hairpin_peer_vhca, sq_attr->hairpin_peer_vhca);
+	ret = mlx5_glue->devx_obj_modify(sq->obj, in, sizeof(in),
+					 out, sizeof(out));
+	if (ret) {
+		DRV_LOG(ERR, "Failed to modify SQ using DevX");
+		rte_errno = errno;
+		return -errno;
+	}
+	return ret;
+}
+
+/**
+ * Create TIS using DevX API.
+ *
+ * @param[in] ctx
+ *   ibv_context returned from mlx5dv_open_device.
+ * @param [in] tis_attr
+ *   Pointer to TIS attributes structure.
+ *
+ * @return
+ *   The DevX object created, NULL otherwise and rte_errno is set.
+ */
+struct mlx5_devx_obj *
+mlx5_devx_cmd_create_tis(struct ibv_context *ctx,
+			 struct mlx5_devx_tis_attr *tis_attr)
+{
+	uint32_t in[MLX5_ST_SZ_DW(create_tis_in)] = {0};
+	uint32_t out[MLX5_ST_SZ_DW(create_tis_out)] = {0};
+	struct mlx5_devx_obj *tis = NULL;
+	void *tis_ctx;
+
+	tis = rte_calloc(__func__, 1, sizeof(*tis), 0);
+	if (!tis) {
+		DRV_LOG(ERR, "Failed to allocate TIS object");
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+	MLX5_SET(create_tis_in, in, opcode, MLX5_CMD_OP_CREATE_TIS);
+	tis_ctx = MLX5_ADDR_OF(create_tis_in, in, ctx);
+	MLX5_SET(tisc, tis_ctx, strict_lag_tx_port_affinity,
+		 tis_attr->strict_lag_tx_port_affinity);
+	MLX5_SET(tisc, tis_ctx, lag_tx_port_affinity,
+		 tis_attr->lag_tx_port_affinity);
+	MLX5_SET(tisc, tis_ctx, prio, tis_attr->prio);
+	MLX5_SET(tisc, tis_ctx, transport_domain,
+		 tis_attr->transport_domain);
+	tis->obj = mlx5_glue->devx_obj_create(ctx, in, sizeof(in),
+					      out, sizeof(out));
+	if (!tis->obj) {
+		DRV_LOG(ERR, "Failed to create TIS using DevX");
+		rte_errno = errno;
+		rte_free(tis);
+		return NULL;
+	}
+	tis->id = MLX5_GET(create_tis_out, out, tisn);
+	return tis;
+}
+
+/**
+ * Create transport domain using DevX API.
+ *
+ * @param[in] ctx
+ *   ibv_context returned from mlx5dv_open_device.
+ *
+ * @return
+ *   The DevX object created, NULL otherwise and rte_errno is set.
+ */
+struct mlx5_devx_obj *
+mlx5_devx_cmd_create_td(struct ibv_context *ctx)
+{
+	uint32_t in[MLX5_ST_SZ_DW(alloc_transport_domain_in)] = {0};
+	uint32_t out[MLX5_ST_SZ_DW(alloc_transport_domain_out)] = {0};
+	struct mlx5_devx_obj *td = NULL;
+
+	td = rte_calloc(__func__, 1, sizeof(*td), 0);
+	if (!td) {
+		DRV_LOG(ERR, "Failed to allocate TD object");
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+	MLX5_SET(alloc_transport_domain_in, in, opcode,
+		 MLX5_CMD_OP_ALLOC_TRANSPORT_DOMAIN);
+	td->obj = mlx5_glue->devx_obj_create(ctx, in, sizeof(in),
+					     out, sizeof(out));
+	if (!td->obj) {
+		DRV_LOG(ERR, "Failed to create TIS using DevX");
+		rte_errno = errno;
+		rte_free(td);
+		return NULL;
+	}
+	td->id = MLX5_GET(alloc_transport_domain_out, out,
+			   transport_domain);
+	return td;
+}
diff --git a/drivers/net/mlx5/mlx5_prm.h b/drivers/net/mlx5/mlx5_prm.h
index 3765df0..faa7996 100644
--- a/drivers/net/mlx5/mlx5_prm.h
+++ b/drivers/net/mlx5/mlx5_prm.h
@@ -666,9 +666,13 @@ enum {
 	MLX5_CMD_OP_QUERY_HCA_CAP = 0x100,
 	MLX5_CMD_OP_CREATE_MKEY = 0x200,
 	MLX5_CMD_OP_QUERY_NIC_VPORT_CONTEXT = 0x754,
+	MLX5_CMD_OP_ALLOC_TRANSPORT_DOMAIN = 0x816,
 	MLX5_CMD_OP_CREATE_TIR = 0x900,
+	MLX5_CMD_OP_CREATE_SQ = 0x904,
+	MLX5_CMD_OP_MODIFY_SQ = 0x905,
 	MLX5_CMD_OP_CREATE_RQ = 0x908,
 	MLX5_CMD_OP_MODIFY_RQ = 0x909,
+	MLX5_CMD_OP_CREATE_TIS = 0x912,
 	MLX5_CMD_OP_QUERY_TIS = 0x915,
 	MLX5_CMD_OP_CREATE_RQT = 0x916,
 	MLX5_CMD_OP_ALLOC_FLOW_COUNTER = 0x939,
@@ -1311,6 +1315,23 @@ struct mlx5_ifc_query_tis_in_bits {
 	u8 reserved_at_60[0x20];
 };
 
+struct mlx5_ifc_alloc_transport_domain_out_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 reserved_at_40[0x8];
+	u8 transport_domain[0x18];
+	u8 reserved_at_60[0x20];
+};
+
+struct mlx5_ifc_alloc_transport_domain_in_bits {
+	u8 opcode[0x10];
+	u8 reserved_at_10[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 reserved_at_40[0x40];
+};
+
 enum {
 	MLX5_WQ_TYPE_LINKED_LIST                = 0x0,
 	MLX5_WQ_TYPE_CYCLIC                     = 0x1,
@@ -1427,6 +1448,24 @@ struct mlx5_ifc_modify_rq_out_bits {
 	u8 reserved_at_40[0x40];
 };
 
+struct mlx5_ifc_create_tis_out_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 reserved_at_40[0x8];
+	u8 tisn[0x18];
+	u8 reserved_at_60[0x20];
+};
+
+struct mlx5_ifc_create_tis_in_bits {
+	u8 opcode[0x10];
+	u8 uid[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 reserved_at_40[0xc0];
+	struct mlx5_ifc_tisc_bits ctx;
+};
+
 enum {
 	MLX5_MODIFY_RQ_IN_MODIFY_BITMASK_WQ_LWM = 1ULL << 0,
 	MLX5_MODIFY_RQ_IN_MODIFY_BITMASK_VSD = 1ULL << 1,
@@ -1572,6 +1611,85 @@ struct mlx5_ifc_create_rqt_in_bits {
 #pragma GCC diagnostic error "-Wpedantic"
 #endif
 
+struct mlx5_ifc_sqc_bits {
+	u8 rlky[0x1];
+	u8 cd_master[0x1];
+	u8 fre[0x1];
+	u8 flush_in_error_en[0x1];
+	u8 allow_multi_pkt_send_wqe[0x1];
+	u8 min_wqe_inline_mode[0x3];
+	u8 state[0x4];
+	u8 reg_umr[0x1];
+	u8 allow_swp[0x1];
+	u8 hairpin[0x1];
+	u8 reserved_at_f[0x11];
+	u8 reserved_at_20[0x8];
+	u8 user_index[0x18];
+	u8 reserved_at_40[0x8];
+	u8 cqn[0x18];
+	u8 reserved_at_60[0x8];
+	u8 hairpin_peer_rq[0x18];
+	u8 reserved_at_80[0x10];
+	u8 hairpin_peer_vhca[0x10];
+	u8 reserved_at_a0[0x50];
+	u8 packet_pacing_rate_limit_index[0x10];
+	u8 tis_lst_sz[0x10];
+	u8 reserved_at_110[0x10];
+	u8 reserved_at_120[0x40];
+	u8 reserved_at_160[0x8];
+	u8 tis_num_0[0x18];
+	struct mlx5_ifc_wq_bits wq;
+};
+
+struct mlx5_ifc_query_sq_in_bits {
+	u8 opcode[0x10];
+	u8 reserved_at_10[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 reserved_at_40[0x8];
+	u8 sqn[0x18];
+	u8 reserved_at_60[0x20];
+};
+
+struct mlx5_ifc_modify_sq_out_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 reserved_at_40[0x40];
+};
+
+struct mlx5_ifc_modify_sq_in_bits {
+	u8 opcode[0x10];
+	u8 uid[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 sq_state[0x4];
+	u8 reserved_at_44[0x4];
+	u8 sqn[0x18];
+	u8 reserved_at_60[0x20];
+	u8 modify_bitmask[0x40];
+	u8 reserved_at_c0[0x40];
+	struct mlx5_ifc_sqc_bits ctx;
+};
+
+struct mlx5_ifc_create_sq_out_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 reserved_at_40[0x8];
+	u8 sqn[0x18];
+	u8 reserved_at_60[0x20];
+};
+
+struct mlx5_ifc_create_sq_in_bits {
+	u8 opcode[0x10];
+	u8 uid[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 reserved_at_40[0xc0];
+	struct mlx5_ifc_sqc_bits ctx;
+};
+
 /* CQE format mask. */
 #define MLX5E_CQE_FORMAT_MASK 0xc
 
diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index 12f9bfb..271b648 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -324,14 +324,18 @@ struct mlx5_txq_obj {
 	LIST_ENTRY(mlx5_txq_obj) next; /* Pointer to the next element. */
 	rte_atomic32_t refcnt; /* Reference counter. */
 	struct mlx5_txq_ctrl *txq_ctrl; /* Pointer to the control queue. */
-	enum mlx5_rxq_obj_type type; /* The txq object type. */
+	enum mlx5_txq_obj_type type; /* The txq object type. */
 	RTE_STD_C11
 	union {
 		struct {
 			struct ibv_cq *cq; /* Completion Queue. */
 			struct ibv_qp *qp; /* Queue Pair. */
 		};
-		struct mlx5_devx_obj *sq; /* DevX object for Tx queue. */
+		struct {
+			struct mlx5_devx_obj *sq;
+			/* DevX object for Tx queue. */
+			struct mlx5_devx_obj *tis; /* The TIS object. */
+		};
 	};
 };
 
@@ -348,6 +352,7 @@ struct mlx5_txq_ctrl {
 	off_t uar_mmap_offset; /* UAR mmap offset for non-primary process. */
 	void *bf_reg; /* BlueFlame register from Verbs. */
 	uint16_t dump_file_n; /* Number of dump files. */
+	struct rte_eth_hairpin_conf hairpin_conf; /* Hairpin configuration. */
 	struct mlx5_txq_data txq; /* Data path structure. */
 	/* Must be the last field in the structure, contains elts[]. */
 };
@@ -410,15 +415,22 @@ struct mlx5_hrxq *mlx5_hrxq_get(struct rte_eth_dev *dev,
 
 int mlx5_tx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 			unsigned int socket, const struct rte_eth_txconf *conf);
+int mlx5_tx_hairpin_queue_setup
+	(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
+	 const struct rte_eth_hairpin_conf *hairpin_conf);
 void mlx5_tx_queue_release(void *dpdk_txq);
 int mlx5_tx_uar_init_secondary(struct rte_eth_dev *dev, int fd);
-struct mlx5_txq_obj *mlx5_txq_obj_new(struct rte_eth_dev *dev, uint16_t idx);
+struct mlx5_txq_obj *mlx5_txq_obj_new(struct rte_eth_dev *dev, uint16_t idx,
+				      enum mlx5_txq_obj_type type);
 struct mlx5_txq_obj *mlx5_txq_obj_get(struct rte_eth_dev *dev, uint16_t idx);
 int mlx5_txq_obj_release(struct mlx5_txq_obj *txq_obj);
 int mlx5_txq_obj_verify(struct rte_eth_dev *dev);
 struct mlx5_txq_ctrl *mlx5_txq_new(struct rte_eth_dev *dev, uint16_t idx,
 				   uint16_t desc, unsigned int socket,
 				   const struct rte_eth_txconf *conf);
+struct mlx5_txq_ctrl *mlx5_txq_hairpin_new
+	(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
+	 const struct rte_eth_hairpin_conf *hairpin_conf);
 struct mlx5_txq_ctrl *mlx5_txq_get(struct rte_eth_dev *dev, uint16_t idx);
 int mlx5_txq_release(struct rte_eth_dev *dev, uint16_t idx);
 int mlx5_txq_releasable(struct rte_eth_dev *dev, uint16_t idx);
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index 50c4df5..3ec86c4 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -51,8 +51,14 @@
 
 		if (!txq_ctrl)
 			continue;
-		txq_alloc_elts(txq_ctrl);
-		txq_ctrl->obj = mlx5_txq_obj_new(dev, i);
+		if (txq_ctrl->type == MLX5_TXQ_TYPE_HAIRPIN) {
+			txq_ctrl->obj = mlx5_txq_obj_new
+				(dev, i, MLX5_TXQ_OBJ_TYPE_DEVX_HAIRPIN);
+		} else {
+			txq_alloc_elts(txq_ctrl);
+			txq_ctrl->obj = mlx5_txq_obj_new
+				(dev, i, MLX5_TXQ_OBJ_TYPE_IBV);
+		}
 		if (!txq_ctrl->obj) {
 			rte_errno = ENOMEM;
 			goto error;
diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
index a6e2563..f9bfe31 100644
--- a/drivers/net/mlx5/mlx5_txq.c
+++ b/drivers/net/mlx5/mlx5_txq.c
@@ -136,30 +136,22 @@
 }
 
 /**
- * DPDK callback to configure a TX queue.
+ * Tx queue presetup checks.
  *
  * @param dev
  *   Pointer to Ethernet device structure.
  * @param idx
- *   TX queue index.
+ *   Tx queue index.
  * @param desc
  *   Number of descriptors to configure in queue.
- * @param socket
- *   NUMA socket on which memory must be allocated.
- * @param[in] conf
- *   Thresholds parameters.
  *
  * @return
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
-int
-mlx5_tx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
-		    unsigned int socket, const struct rte_eth_txconf *conf)
+static int
+mlx5_tx_queue_pre_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
-	struct mlx5_txq_data *txq = (*priv->txqs)[idx];
-	struct mlx5_txq_ctrl *txq_ctrl =
-		container_of(txq, struct mlx5_txq_ctrl, txq);
 
 	if (desc <= MLX5_TX_COMP_THRESH) {
 		DRV_LOG(WARNING,
@@ -191,6 +183,38 @@
 		return -rte_errno;
 	}
 	mlx5_txq_release(dev, idx);
+	return 0;
+}
+
+/**
+ * DPDK callback to configure a TX queue.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ * @param idx
+ *   TX queue index.
+ * @param desc
+ *   Number of descriptors to configure in queue.
+ * @param socket
+ *   NUMA socket on which memory must be allocated.
+ * @param[in] conf
+ *   Thresholds parameters.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_tx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
+		    unsigned int socket, const struct rte_eth_txconf *conf)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_txq_data *txq = (*priv->txqs)[idx];
+	struct mlx5_txq_ctrl *txq_ctrl =
+		container_of(txq, struct mlx5_txq_ctrl, txq);
+	int res;
+
+	res = mlx5_tx_queue_pre_setup(dev, idx, desc);
+	if (res)
+		return res;
 	txq_ctrl = mlx5_txq_new(dev, idx, desc, socket, conf);
 	if (!txq_ctrl) {
 		DRV_LOG(ERR, "port %u unable to allocate queue index %u",
@@ -204,6 +228,57 @@
 }
 
 /**
+ * DPDK callback to configure a TX hairpin queue.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ * @param idx
+ *   TX queue index.
+ * @param desc
+ *   Number of descriptors to configure in queue.
+ * @param[in] hairpin_conf
+ *   The hairpin binding configuration.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_tx_hairpin_queue_setup(struct rte_eth_dev *dev, uint16_t idx,
+			    uint16_t desc,
+			    const struct rte_eth_hairpin_conf *hairpin_conf)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_txq_data *txq = (*priv->txqs)[idx];
+	struct mlx5_txq_ctrl *txq_ctrl =
+		container_of(txq, struct mlx5_txq_ctrl, txq);
+	int res;
+
+	res = mlx5_tx_queue_pre_setup(dev, idx, desc);
+	if (res)
+		return res;
+	if (hairpin_conf->peer_n != 1 ||
+	    hairpin_conf->peers[0].port != dev->data->port_id ||
+	    hairpin_conf->peers[0].queue >= priv->rxqs_n) {
+		DRV_LOG(ERR, "port %u unable to setup hairpin queue index %u "
+			" invalid hairpind configuration", dev->data->port_id,
+			idx);
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+	txq_ctrl = mlx5_txq_hairpin_new(dev, idx, desc,	hairpin_conf);
+	if (!txq_ctrl) {
+		DRV_LOG(ERR, "port %u unable to allocate queue index %u",
+			dev->data->port_id, idx);
+		return -rte_errno;
+	}
+	DRV_LOG(DEBUG, "port %u adding Tx queue %u to list",
+		dev->data->port_id, idx);
+	(*priv->txqs)[idx] = &txq_ctrl->txq;
+	txq_ctrl->type = MLX5_TXQ_TYPE_HAIRPIN;
+	return 0;
+}
+
+/**
  * DPDK callback to release a TX queue.
  *
  * @param dpdk_txq
@@ -246,6 +321,8 @@
 	const size_t page_size = sysconf(_SC_PAGESIZE);
 #endif
 
+	if (txq_ctrl->type != MLX5_TXQ_TYPE_STANDARD)
+		return;
 	assert(rte_eal_process_type() == RTE_PROC_PRIMARY);
 	assert(ppriv);
 	ppriv->uar_table[txq_ctrl->txq.idx] = txq_ctrl->bf_reg;
@@ -282,6 +359,8 @@
 	uintptr_t offset;
 	const size_t page_size = sysconf(_SC_PAGESIZE);
 
+	if (txq_ctrl->type != MLX5_TXQ_TYPE_STANDARD)
+		return 0;
 	assert(ppriv);
 	/*
 	 * As rdma-core, UARs are mapped in size of OS page
@@ -316,6 +395,8 @@
 	const size_t page_size = sysconf(_SC_PAGESIZE);
 	void *addr;
 
+	if (txq_ctrl->type != MLX5_TXQ_TYPE_STANDARD)
+		return;
 	addr = ppriv->uar_table[txq_ctrl->txq.idx];
 	munmap(RTE_PTR_ALIGN_FLOOR(addr, page_size), page_size);
 }
@@ -346,6 +427,8 @@
 			continue;
 		txq = (*priv->txqs)[i];
 		txq_ctrl = container_of(txq, struct mlx5_txq_ctrl, txq);
+		if (txq_ctrl->type != MLX5_TXQ_TYPE_STANDARD)
+			continue;
 		assert(txq->idx == (uint16_t)i);
 		ret = txq_uar_init_secondary(txq_ctrl, fd);
 		if (ret)
@@ -365,18 +448,87 @@
 }
 
 /**
+ * Create the Tx hairpin queue object.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param idx
+ *   Queue index in DPDK Tx queue array.
+ *
+ * @return
+ *   The hairpin DevX object initialised, NULL otherwise and rte_errno is set.
+ */
+static struct mlx5_txq_obj *
+mlx5_txq_obj_hairpin_new(struct rte_eth_dev *dev, uint16_t idx)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_txq_data *txq_data = (*priv->txqs)[idx];
+	struct mlx5_txq_ctrl *txq_ctrl =
+		container_of(txq_data, struct mlx5_txq_ctrl, txq);
+	struct mlx5_devx_create_sq_attr attr = { 0 };
+	struct mlx5_txq_obj *tmpl = NULL;
+	int ret = 0;
+
+	assert(txq_data);
+	assert(!txq_ctrl->obj);
+	tmpl = rte_calloc_socket(__func__, 1, sizeof(*tmpl), 0,
+				 txq_ctrl->socket);
+	if (!tmpl) {
+		DRV_LOG(ERR,
+			"port %u Tx queue %u cannot allocate memory resources",
+			dev->data->port_id, txq_data->idx);
+		rte_errno = ENOMEM;
+		goto error;
+	}
+	tmpl->type = MLX5_TXQ_OBJ_TYPE_DEVX_HAIRPIN;
+	tmpl->txq_ctrl = txq_ctrl;
+	attr.hairpin = 1;
+	attr.tis_lst_sz = 1;
+	/* Workaround for hairpin startup */
+	attr.wq_attr.log_hairpin_num_packets = log2above(32);
+	/* Workaround for packets larger than 1KB */
+	attr.wq_attr.log_hairpin_data_sz =
+			priv->config.hca_attr.log_max_hairpin_wq_data_sz;
+	attr.tis_num = priv->sh->tis->id;
+	tmpl->sq = mlx5_devx_cmd_create_sq(priv->sh->ctx, &attr);
+	if (!tmpl->sq) {
+		DRV_LOG(ERR,
+			"port %u tx hairpin queue %u can't create sq object",
+			dev->data->port_id, idx);
+		rte_errno = errno;
+		goto error;
+	}
+	DRV_LOG(DEBUG, "port %u sxq %u updated with %p", dev->data->port_id,
+		idx, (void *)&tmpl);
+	rte_atomic32_inc(&tmpl->refcnt);
+	LIST_INSERT_HEAD(&priv->txqsobj, tmpl, next);
+	return tmpl;
+error:
+	ret = rte_errno; /* Save rte_errno before cleanup. */
+	if (tmpl->tis)
+		mlx5_devx_cmd_destroy(tmpl->tis);
+	if (tmpl->sq)
+		mlx5_devx_cmd_destroy(tmpl->sq);
+	rte_errno = ret; /* Restore rte_errno. */
+	return NULL;
+}
+
+/**
  * Create the Tx queue Verbs object.
  *
  * @param dev
  *   Pointer to Ethernet device.
  * @param idx
  *   Queue index in DPDK Tx queue array.
+ * @param type
+ *   Type of the Tx queue object to create.
  *
  * @return
  *   The Verbs object initialised, NULL otherwise and rte_errno is set.
  */
 struct mlx5_txq_obj *
-mlx5_txq_obj_new(struct rte_eth_dev *dev, uint16_t idx)
+mlx5_txq_obj_new(struct rte_eth_dev *dev, uint16_t idx,
+		 enum mlx5_txq_obj_type type)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_txq_data *txq_data = (*priv->txqs)[idx];
@@ -396,6 +548,8 @@ struct mlx5_txq_obj *
 	const int desc = 1 << txq_data->elts_n;
 	int ret = 0;
 
+	if (type == MLX5_TXQ_OBJ_TYPE_DEVX_HAIRPIN)
+		return mlx5_txq_obj_hairpin_new(dev, idx);
 #ifdef HAVE_IBV_FLOW_DV_SUPPORT
 	/* If using DevX, need additional mask to read tisn value. */
 	if (priv->config.devx && !priv->sh->tdn)
@@ -643,8 +797,13 @@ struct mlx5_txq_obj *
 {
 	assert(txq_obj);
 	if (rte_atomic32_dec_and_test(&txq_obj->refcnt)) {
-		claim_zero(mlx5_glue->destroy_qp(txq_obj->qp));
-		claim_zero(mlx5_glue->destroy_cq(txq_obj->cq));
+		if (txq_obj->type == MLX5_TXQ_OBJ_TYPE_DEVX_HAIRPIN) {
+			if (txq_obj->tis)
+				claim_zero(mlx5_devx_cmd_destroy(txq_obj->tis));
+		} else {
+			claim_zero(mlx5_glue->destroy_qp(txq_obj->qp));
+			claim_zero(mlx5_glue->destroy_cq(txq_obj->cq));
+		}
 		LIST_REMOVE(txq_obj, next);
 		rte_free(txq_obj);
 		return 0;
@@ -1100,6 +1259,7 @@ struct mlx5_txq_ctrl *
 		goto error;
 	}
 	rte_atomic32_inc(&tmpl->refcnt);
+	tmpl->type = MLX5_TXQ_TYPE_STANDARD;
 	LIST_INSERT_HEAD(&priv->txqsctrl, tmpl, next);
 	return tmpl;
 error:
@@ -1108,6 +1268,46 @@ struct mlx5_txq_ctrl *
 }
 
 /**
+ * Create a DPDK Tx hairpin queue.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param idx
+ *   TX queue index.
+ * @param desc
+ *   Number of descriptors to configure in queue.
+ * @param hairpin_conf
+ *  The hairpin configuration.
+ *
+ * @return
+ *   A DPDK queue object on success, NULL otherwise and rte_errno is set.
+ */
+struct mlx5_txq_ctrl *
+mlx5_txq_hairpin_new(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
+		     const struct rte_eth_hairpin_conf *hairpin_conf)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_txq_ctrl *tmpl;
+
+	tmpl = rte_calloc_socket("TXQ", 1,
+				 sizeof(*tmpl), 0, SOCKET_ID_ANY);
+	if (!tmpl) {
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+	tmpl->priv = priv;
+	tmpl->socket = SOCKET_ID_ANY;
+	tmpl->txq.elts_n = log2above(desc);
+	tmpl->txq.port_id = dev->data->port_id;
+	tmpl->txq.idx = idx;
+	tmpl->hairpin_conf = *hairpin_conf;
+	tmpl->type = MLX5_TXQ_TYPE_HAIRPIN;
+	rte_atomic32_inc(&tmpl->refcnt);
+	LIST_INSERT_HEAD(&priv->txqsctrl, tmpl, next);
+	return tmpl;
+}
+
+/**
  * Get a Tx queue.
  *
  * @param dev
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 186+ messages in thread

* [dpdk-dev] [PATCH v4 07/15] net/mlx5: add get hairpin capabilities
  2019-10-17 15:32 ` [dpdk-dev] [PATCH v4 00/15] " Ori Kam
                     ` (5 preceding siblings ...)
  2019-10-17 15:32   ` [dpdk-dev] [PATCH v4 06/15] net/mlx5: support Tx hairpin queues Ori Kam
@ 2019-10-17 15:32   ` Ori Kam
  2019-10-17 15:32   ` [dpdk-dev] [PATCH v4 08/15] app/testpmd: add hairpin support Ori Kam
                     ` (8 subsequent siblings)
  15 siblings, 0 replies; 186+ messages in thread
From: Ori Kam @ 2019-10-17 15:32 UTC (permalink / raw)
  To: Matan Azrad, Shahaf Shuler, Viacheslav Ovsiienko
  Cc: dev, orika, jingjing.wu, stephen

This commit adds the hairpin capabilities get function (see the usage
sketch below).
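
For context, here is a short usage sketch of the matching ethdev call.
The wrapper name rte_eth_dev_hairpin_capability_get and the capability
field names follow this patch revision of the series and should be
treated as assumptions.

	#include <rte_ethdev.h>

	/* Return non-zero when the port reports hairpin support. */
	static int
	port_supports_hairpin(uint16_t port_id)
	{
		struct rte_eth_hairpin_cap cap;

		if (rte_eth_dev_hairpin_capability_get(port_id, &cap) != 0)
			return 0; /* E.g. -ENOTSUP when DevX is disabled. */
		return cap.max_n_queues >= 1 && cap.max_rx_2_tx >= 1 &&
		       cap.max_tx_2_rx >= 1;
	}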

Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>

---
 drivers/net/mlx5/mlx5.c        |  2 ++
 drivers/net/mlx5/mlx5.h        |  3 ++-
 drivers/net/mlx5/mlx5_ethdev.c | 27 +++++++++++++++++++++++++++
 3 files changed, 31 insertions(+), 1 deletion(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index c53a9c6..7962936 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -1028,6 +1028,7 @@ struct mlx5_dev_spawn_data {
 	.udp_tunnel_port_add  = mlx5_udp_tunnel_port_add,
 	.get_module_info = mlx5_get_module_info,
 	.get_module_eeprom = mlx5_get_module_eeprom,
+	.hairpin_cap_get = mlx5_hairpin_cap_get,
 };
 
 /* Available operations from secondary process. */
@@ -1090,6 +1091,7 @@ struct mlx5_dev_spawn_data {
 	.is_removed = mlx5_is_removed,
 	.get_module_info = mlx5_get_module_info,
 	.get_module_eeprom = mlx5_get_module_eeprom,
+	.hairpin_cap_get = mlx5_hairpin_cap_get,
 };
 
 /**
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 7ea4950..ce044b9 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -782,7 +782,8 @@ int mlx5_get_module_info(struct rte_eth_dev *dev,
 			 struct rte_eth_dev_module_info *modinfo);
 int mlx5_get_module_eeprom(struct rte_eth_dev *dev,
 			   struct rte_dev_eeprom_info *info);
-
+int mlx5_hairpin_cap_get(struct rte_eth_dev *dev,
+			 struct rte_eth_hairpin_cap *cap);
 /* mlx5_mac.c */
 
 int mlx5_get_mac(struct rte_eth_dev *dev, uint8_t (*mac)[RTE_ETHER_ADDR_LEN]);
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index f2b1752..95c70f7 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -2028,3 +2028,30 @@ int mlx5_get_module_eeprom(struct rte_eth_dev *dev,
 	rte_free(eeprom);
 	return ret;
 }
+
+/**
+ * DPDK callback to retrieve hairpin capabilities.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ * @param[out] cap
+ *   Storage for hairpin capability data.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int mlx5_hairpin_cap_get(struct rte_eth_dev *dev,
+			 struct rte_eth_hairpin_cap *cap)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+
+	if (priv->sh->devx == 0) {
+		rte_errno = ENOTSUP;
+		return -rte_errno;
+	}
+	cap->max_n_queues = UINT16_MAX;
+	cap->max_rx_2_tx = 1;
+	cap->max_tx_2_rx = 1;
+	cap->max_nb_desc = 8192;
+	return 0;
+}
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 186+ messages in thread

* [dpdk-dev] [PATCH v4 08/15] app/testpmd: add hairpin support
  2019-10-17 15:32 ` [dpdk-dev] [PATCH v4 00/15] " Ori Kam
                     ` (6 preceding siblings ...)
  2019-10-17 15:32   ` [dpdk-dev] [PATCH v4 07/15] net/mlx5: add get hairpin capabilities Ori Kam
@ 2019-10-17 15:32   ` Ori Kam
  2019-10-17 15:32   ` [dpdk-dev] [PATCH v4 09/15] net/mlx5: add hairpin binding function Ori Kam
                     ` (7 subsequent siblings)
  15 siblings, 0 replies; 186+ messages in thread
From: Ori Kam @ 2019-10-17 15:32 UTC (permalink / raw)
  To: Wenzhuo Lu, Jingjing Wu, Bernard Iremonger; +Cc: dev, orika, stephen

This commit introduces hairpin queues to testpmd.
The hairpin queues are configured using --hairpinq=<n>.
This option adds n queue objects to both the total number
of Tx queues and Rx queues.
The connection between the queues is 1 to 1: the first Rx hairpin
queue is connected to the first Tx hairpin queue, and so on.

Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>

---
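Note: a sketch of the same setup sequence from an application point
of view, assuming the port was first configured with
nb_rxq + nb_hairpinq Rx queues and nb_txq + nb_hairpinq Tx queues
(testpmd reaches the same state with e.g. --rxq=2 --txq=2 --hairpinq=2):

#include <rte_ethdev.h>

static int
setup_hairpin_queues(uint16_t port, uint16_t nb_rxq, uint16_t nb_txq,
		     uint16_t nb_hairpinq, uint16_t nb_desc)
{
	struct rte_eth_hairpin_conf conf = { .peer_n = 1 };
	uint16_t i;
	int ret;

	for (i = 0; i < nb_hairpinq; i++) {
		/* Tx hairpin queue peered with the matching Rx one. */
		conf.peers[0].port = port;
		conf.peers[0].queue = nb_rxq + i;
		ret = rte_eth_tx_hairpin_queue_setup(port, nb_txq + i,
						     nb_desc, &conf);
		if (ret != 0)
			return ret;
		/* Rx hairpin queue peered with the matching Tx one. */
		conf.peers[0].queue = nb_txq + i;
		ret = rte_eth_rx_hairpin_queue_setup(port, nb_rxq + i,
						     nb_desc, &conf);
		if (ret != 0)
			return ret;
	}
	return 0;
}
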
 app/test-pmd/parameters.c |  28 ++++++++++++
 app/test-pmd/testpmd.c    | 109 +++++++++++++++++++++++++++++++++++++++++++++-
 app/test-pmd/testpmd.h    |   3 ++
 3 files changed, 138 insertions(+), 2 deletions(-)

diff --git a/app/test-pmd/parameters.c b/app/test-pmd/parameters.c
index 6c78dca..6246129 100644
--- a/app/test-pmd/parameters.c
+++ b/app/test-pmd/parameters.c
@@ -147,6 +147,8 @@
 	printf("  --rxd=N: set the number of descriptors in RX rings to N.\n");
 	printf("  --txq=N: set the number of TX queues per port to N.\n");
 	printf("  --txd=N: set the number of descriptors in TX rings to N.\n");
+	printf("  --hairpinq=N: set the number of hairpin queues per port to "
+	       "N.\n");
 	printf("  --burst=N: set the number of packets per burst to N.\n");
 	printf("  --mbcache=N: set the cache of mbuf memory pool to N.\n");
 	printf("  --rxpt=N: set prefetch threshold register of RX rings to N.\n");
@@ -618,6 +620,7 @@
 		{ "txq",			1, 0, 0 },
 		{ "rxd",			1, 0, 0 },
 		{ "txd",			1, 0, 0 },
+		{ "hairpinq",			1, 0, 0 },
 		{ "burst",			1, 0, 0 },
 		{ "mbcache",			1, 0, 0 },
 		{ "txpt",			1, 0, 0 },
@@ -1036,6 +1039,31 @@
 						  " >= 0 && <= %u\n", n,
 						  get_allowed_max_nb_txq(&pid));
 			}
+			if (!strcmp(lgopts[opt_idx].name, "hairpinq")) {
+				n = atoi(optarg);
+				if (n >= 0 &&
+				    check_nb_hairpinq((queueid_t)n) == 0)
+					nb_hairpinq = (queueid_t) n;
+				else
+					rte_exit(EXIT_FAILURE, "hairpinq %d invalid - must be"
+						  " >= 0 && <= %u\n", n,
+						  get_allowed_max_nb_hairpinq
+						  (&pid));
+				if ((n + nb_txq) < 0 ||
+				    check_nb_txq((queueid_t)(n + nb_txq)) != 0)
+					rte_exit(EXIT_FAILURE, "txq + hairpinq "
+						 "%d invalid - must be"
+						  " >= 0 && <= %u\n",
+						  n + nb_txq,
+						  get_allowed_max_nb_txq(&pid));
+				if ((n + nb_rxq) < 0 ||
+				    check_nb_rxq((queueid_t)(n + nb_rxq)) != 0)
+					rte_exit(EXIT_FAILURE, "rxq + hairpinq "
+						 "%d invalid - must be"
+						  " >= 0 && <= %u\n",
+						  n + nb_rxq,
+						  get_allowed_max_nb_rxq(&pid));
+			}
 			if (!nb_rxq && !nb_txq) {
 				rte_exit(EXIT_FAILURE, "Either rx or tx queues should "
 						"be non-zero\n");
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 5701f31..8290e22 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -235,6 +235,7 @@ struct fwd_engine * fwd_engines[] = {
 /*
  * Configurable number of RX/TX queues.
  */
+queueid_t nb_hairpinq; /**< Number of hairpin queues per port. */
 queueid_t nb_rxq = 1; /**< Number of RX queues per port. */
 queueid_t nb_txq = 1; /**< Number of TX queues per port. */
 
@@ -1103,6 +1104,53 @@ struct extmem_param {
 	return 0;
 }
 
+/*
+ * Get the allowed maximum number of hairpin queues.
+ * *pid returns the port ID which has the minimal value of
+ * max_hairpin_queues among all ports.
+ */
+queueid_t
+get_allowed_max_nb_hairpinq(portid_t *pid)
+{
+	queueid_t allowed_max_hairpinq = MAX_QUEUE_ID;
+	portid_t pi;
+	struct rte_eth_hairpin_cap cap;
+
+	RTE_ETH_FOREACH_DEV(pi) {
+		if (rte_eth_dev_hairpin_capability_get(pi, &cap) != 0) {
+			*pid = pi;
+			return 0;
+		}
+		if (cap.max_n_queues < allowed_max_hairpinq) {
+			allowed_max_hairpinq = cap.max_n_queues;
+			*pid = pi;
+		}
+	}
+	return allowed_max_hairpinq;
+}
+
+/*
+ * Check whether the requested number of hairpin queues is valid.
+ * It is valid if it does not exceed the maximum number of hairpin
+ * queues supported by every port.
+ * Return 0 if valid, -1 otherwise.
+ */
+int
+check_nb_hairpinq(queueid_t hairpinq)
+{
+	queueid_t allowed_max_hairpinq;
+	portid_t pid = 0;
+
+	allowed_max_hairpinq = get_allowed_max_nb_hairpinq(&pid);
+	if (hairpinq > allowed_max_hairpinq) {
+		printf("Fail: input hairpin (%u) can't be greater "
+		       "than max_hairpin_queues (%u) of port %u\n",
+		       hairpinq, allowed_max_hairpinq, pid);
+		return -1;
+	}
+	return 0;
+}
+
 static void
 init_config(void)
 {
@@ -2064,6 +2112,11 @@ struct extmem_param {
 	queueid_t qi;
 	struct rte_port *port;
 	struct rte_ether_addr mac_addr;
+	struct rte_eth_hairpin_conf hairpin_conf = {
+		.peer_n = 1,
+	};
+	int i;
+	struct rte_eth_hairpin_cap cap;
 
 	if (port_id_is_invalid(pid, ENABLED_WARN))
 		return 0;
@@ -2096,9 +2149,16 @@ struct extmem_param {
 			configure_rxtx_dump_callbacks(0);
 			printf("Configuring Port %d (socket %u)\n", pi,
 					port->socket_id);
+			if (nb_hairpinq > 0 &&
+			    rte_eth_dev_hairpin_capability_get(pi, &cap)) {
+				printf("Port %d doesn't support hairpin "
+				       "queues\n", pi);
+				return -1;
+			}
 			/* configure port */
-			diag = rte_eth_dev_configure(pi, nb_rxq, nb_txq,
-						&(port->dev_conf));
+			diag = rte_eth_dev_configure(pi, nb_rxq + nb_hairpinq,
+						     nb_txq + nb_hairpinq,
+						     &(port->dev_conf));
 			if (diag != 0) {
 				if (rte_atomic16_cmpset(&(port->port_status),
 				RTE_PORT_HANDLING, RTE_PORT_STOPPED) == 0)
@@ -2191,6 +2251,51 @@ struct extmem_param {
 				port->need_reconfig_queues = 1;
 				return -1;
 			}
+			/* setup hairpin queues */
+			i = 0;
+			for (qi = nb_txq; qi < nb_hairpinq + nb_txq; qi++) {
+				hairpin_conf.peers[0].port = pi;
+				hairpin_conf.peers[0].queue = i + nb_rxq;
+				diag = rte_eth_tx_hairpin_queue_setup
+					(pi, qi, nb_txd, &hairpin_conf);
+				i++;
+				if (diag == 0)
+					continue;
+
+				/* Failed to set up Tx hairpin queue, return */
+				if (rte_atomic16_cmpset(&(port->port_status),
+							RTE_PORT_HANDLING,
+							RTE_PORT_STOPPED) == 0)
+					printf("Port %d can not be set back "
+							"to stopped\n", pi);
+				printf("Fail to configure port %d hairpin "
+				       "queues\n", pi);
+				/* try to reconfigure queues next time */
+				port->need_reconfig_queues = 1;
+				return -1;
+			}
+			i = 0;
+			for (qi = nb_rxq; qi < nb_hairpinq + nb_rxq; qi++) {
+				hairpin_conf.peers[0].port = pi;
+				hairpin_conf.peers[0].queue = i + nb_txq;
+				diag = rte_eth_rx_hairpin_queue_setup
+					(pi, qi, nb_rxd, &hairpin_conf);
+				i++;
+				if (diag == 0)
+					continue;
+
+				/* Failed to set up Rx hairpin queue, return */
+				if (rte_atomic16_cmpset(&(port->port_status),
+							RTE_PORT_HANDLING,
+							RTE_PORT_STOPPED) == 0)
+					printf("Port %d can not be set back "
+							"to stopped\n", pi);
+				printf("Fail to configure port %d hairpin "
+				       "queues\n", pi);
+				/* try to reconfigure queues next time */
+				port->need_reconfig_queues = 1;
+				return -1;
+			}
 		}
 		configure_rxtx_dump_callbacks(verbose_level);
 		/* start port */
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index f8ebe71..0682c11 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -383,6 +383,7 @@ struct queue_stats_mappings {
 
 extern uint64_t rss_hf;
 
+extern queueid_t nb_hairpinq;
 extern queueid_t nb_rxq;
 extern queueid_t nb_txq;
 
@@ -854,6 +855,8 @@ enum print_warning {
 int check_nb_rxq(queueid_t rxq);
 queueid_t get_allowed_max_nb_txq(portid_t *pid);
 int check_nb_txq(queueid_t txq);
+queueid_t get_allowed_max_nb_hairpinq(portid_t *pid);
+int check_nb_hairpinq(queueid_t hairpinq);
 
 uint16_t dump_rx_pkts(uint16_t port_id, uint16_t queue, struct rte_mbuf *pkts[],
 		      uint16_t nb_pkts, __rte_unused uint16_t max_pkts,
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 186+ messages in thread

* [dpdk-dev] [PATCH v4 09/15] net/mlx5: add hairpin binding function
  2019-10-17 15:32 ` [dpdk-dev] [PATCH v4 00/15] " Ori Kam
                     ` (7 preceding siblings ...)
  2019-10-17 15:32   ` [dpdk-dev] [PATCH v4 08/15] app/testpmd: add hairpin support Ori Kam
@ 2019-10-17 15:32   ` Ori Kam
  2019-10-17 15:32   ` [dpdk-dev] [PATCH v4 10/15] net/mlx5: add support for hairpin hrxq Ori Kam
                     ` (6 subsequent siblings)
  15 siblings, 0 replies; 186+ messages in thread
From: Ori Kam @ 2019-10-17 15:32 UTC (permalink / raw)
  To: Matan Azrad, Shahaf Shuler, Viacheslav Ovsiienko
  Cc: dev, orika, jingjing.wu, stephen

When starting the port, in addition to creating the queues,
we need to bind the hairpin queues.

Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>

---
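Note: per queue pair the bind handshake below reduces to the
following order (sketch; variables as in mlx5_hairpin_bind(), error
handling elided). The SQ is moved to RDY first with the peer RQ id,
then the RQ is moved to RDY pointing back at the SQ:

	sq_attr.sq_state = MLX5_SQC_STATE_RST;
	sq_attr.state = MLX5_SQC_STATE_RDY;
	sq_attr.hairpin_peer_rq = rq->id;
	sq_attr.hairpin_peer_vhca = priv->config.hca_attr.vhca_id;
	ret = mlx5_devx_cmd_modify_sq(sq, &sq_attr);
	if (!ret) {
		rq_attr.rq_state = MLX5_SQC_STATE_RST;
		rq_attr.state = MLX5_SQC_STATE_RDY;
		rq_attr.hairpin_peer_sq = sq->id;
		rq_attr.hairpin_peer_vhca = priv->config.hca_attr.vhca_id;
		ret = mlx5_devx_cmd_modify_rq(rq, &rq_attr);
	}
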
 drivers/net/mlx5/mlx5.h           |  1 +
 drivers/net/mlx5/mlx5_devx_cmds.c |  1 +
 drivers/net/mlx5/mlx5_prm.h       |  6 +++
 drivers/net/mlx5/mlx5_trigger.c   | 97 +++++++++++++++++++++++++++++++++++++++
 4 files changed, 105 insertions(+)

diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index ce044b9..a43accf 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -188,6 +188,7 @@ struct mlx5_hca_attr {
 	uint32_t log_max_hairpin_queues:5;
 	uint32_t log_max_hairpin_wq_data_sz:5;
 	uint32_t log_max_hairpin_num_packets:5;
+	uint32_t vhca_id:16;
 };
 
 /* Flow list . */
diff --git a/drivers/net/mlx5/mlx5_devx_cmds.c b/drivers/net/mlx5/mlx5_devx_cmds.c
index 917bbf9..0243733 100644
--- a/drivers/net/mlx5/mlx5_devx_cmds.c
+++ b/drivers/net/mlx5/mlx5_devx_cmds.c
@@ -334,6 +334,7 @@ struct mlx5_devx_obj *
 						    log_max_hairpin_wq_data_sz);
 	attr->log_max_hairpin_num_packets = MLX5_GET
 		(cmd_hca_cap, hcattr, log_min_hairpin_wq_data_sz);
+	attr->vhca_id = MLX5_GET(cmd_hca_cap, hcattr, vhca_id);
 	attr->eth_net_offloads = MLX5_GET(cmd_hca_cap, hcattr,
 					  eth_net_offloads);
 	attr->eth_virt = MLX5_GET(cmd_hca_cap, hcattr, eth_virt);
diff --git a/drivers/net/mlx5/mlx5_prm.h b/drivers/net/mlx5/mlx5_prm.h
index faa7996..d4084db 100644
--- a/drivers/net/mlx5/mlx5_prm.h
+++ b/drivers/net/mlx5/mlx5_prm.h
@@ -1611,6 +1611,12 @@ struct mlx5_ifc_create_rqt_in_bits {
 #pragma GCC diagnostic error "-Wpedantic"
 #endif
 
+enum {
+	MLX5_SQC_STATE_RST  = 0x0,
+	MLX5_SQC_STATE_RDY  = 0x1,
+	MLX5_SQC_STATE_ERR  = 0x3,
+};
+
 struct mlx5_ifc_sqc_bits {
 	u8 rlky[0x1];
 	u8 cd_master[0x1];
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index 3ec86c4..a4fcdb3 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -162,6 +162,96 @@
 }
 
 /**
+ * Bind Tx queues to their peer Rx queues for hairpin.
+ *
+ * Each hairpin Tx queue is connected to its configured peer Rx queue.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx5_hairpin_bind(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_devx_modify_sq_attr sq_attr = { 0 };
+	struct mlx5_devx_modify_rq_attr rq_attr = { 0 };
+	struct mlx5_txq_ctrl *txq_ctrl;
+	struct mlx5_rxq_ctrl *rxq_ctrl;
+	struct mlx5_devx_obj *sq;
+	struct mlx5_devx_obj *rq;
+	unsigned int i;
+	int ret = 0;
+
+	for (i = 0; i != priv->txqs_n; ++i) {
+		txq_ctrl = mlx5_txq_get(dev, i);
+		if (!txq_ctrl)
+			continue;
+		if (txq_ctrl->type != MLX5_TXQ_TYPE_HAIRPIN) {
+			mlx5_txq_release(dev, i);
+			continue;
+		}
+		if (!txq_ctrl->obj) {
+			rte_errno = ENOMEM;
+			DRV_LOG(ERR, "port %u no txq object found: %d",
+				dev->data->port_id, i);
+			mlx5_txq_release(dev, i);
+			return -rte_errno;
+		}
+		sq = txq_ctrl->obj->sq;
+		rxq_ctrl = mlx5_rxq_get(dev,
+					txq_ctrl->hairpin_conf.peers[0].queue);
+		if (!rxq_ctrl) {
+			mlx5_txq_release(dev, i);
+			rte_errno = EINVAL;
+			DRV_LOG(ERR, "port %u no rxq object found: %d",
+				dev->data->port_id,
+				txq_ctrl->hairpin_conf.peers[0].queue);
+			return -rte_errno;
+		}
+		if (rxq_ctrl->type != MLX5_RXQ_TYPE_HAIRPIN ||
+		    rxq_ctrl->hairpin_conf.peers[0].queue != i) {
+			rte_errno = ENOMEM;
+			DRV_LOG(ERR, "port %u Tx queue %d can't be bound to "
+				"Rx queue %d", dev->data->port_id,
+				i, txq_ctrl->hairpin_conf.peers[0].queue);
+			goto error;
+		}
+		rq = rxq_ctrl->obj->rq;
+		if (!rq) {
+			rte_errno = ENOMEM;
+			DRV_LOG(ERR, "port %u hairpin no matching rxq: %d",
+				dev->data->port_id,
+				txq_ctrl->hairpin_conf.peers[0].queue);
+			goto error;
+		}
+		sq_attr.state = MLX5_SQC_STATE_RDY;
+		sq_attr.sq_state = MLX5_SQC_STATE_RST;
+		sq_attr.hairpin_peer_rq = rq->id;
+		sq_attr.hairpin_peer_vhca = priv->config.hca_attr.vhca_id;
+		ret = mlx5_devx_cmd_modify_sq(sq, &sq_attr);
+		if (ret)
+			goto error;
+		rq_attr.state = MLX5_SQC_STATE_RDY;
+		rq_attr.rq_state = MLX5_SQC_STATE_RST;
+		rq_attr.hairpin_peer_sq = sq->id;
+		rq_attr.hairpin_peer_vhca = priv->config.hca_attr.vhca_id;
+		ret = mlx5_devx_cmd_modify_rq(rq, &rq_attr);
+		if (ret)
+			goto error;
+		mlx5_txq_release(dev, i);
+		mlx5_rxq_release(dev, txq_ctrl->hairpin_conf.peers[0].queue);
+	}
+	return 0;
+error:
+	mlx5_txq_release(dev, i);
+	mlx5_rxq_release(dev, txq_ctrl->hairpin_conf.peers[0].queue);
+	return -rte_errno;
+}
+
+/**
  * DPDK callback to start the device.
  *
  * Simulate device start by attaching all configured flows.
@@ -192,6 +282,13 @@
 		mlx5_txq_stop(dev);
 		return -rte_errno;
 	}
+	ret = mlx5_hairpin_bind(dev);
+	if (ret) {
+		DRV_LOG(ERR, "port %u hairpin binding failed: %s",
+			dev->data->port_id, strerror(rte_errno));
+		mlx5_txq_stop(dev);
+		return -rte_errno;
+	}
 	dev->data->dev_started = 1;
 	ret = mlx5_rx_intr_vec_enable(dev);
 	if (ret) {
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 186+ messages in thread

* [dpdk-dev] [PATCH v4 10/15] net/mlx5: add support for hairpin hrxq
  2019-10-17 15:32 ` [dpdk-dev] [PATCH v4 00/15] " Ori Kam
                     ` (8 preceding siblings ...)
  2019-10-17 15:32   ` [dpdk-dev] [PATCH v4 09/15] net/mlx5: add hairpin binding function Ori Kam
@ 2019-10-17 15:32   ` Ori Kam
  2019-10-17 15:32   ` [dpdk-dev] [PATCH v4 11/15] net/mlx5: add internal tag item and action Ori Kam
                     ` (5 subsequent siblings)
  15 siblings, 0 replies; 186+ messages in thread
From: Ori Kam @ 2019-10-17 15:32 UTC (permalink / raw)
  To: Matan Azrad, Shahaf Shuler, Viacheslav Ovsiienko
  Cc: dev, orika, jingjing.wu, stephen

Add support for RSS on hairpin queues.

Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>

---
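Note: with this change an application can spread hairpin traffic with
a regular RSS action; a sketch, assuming queue indexes 2 and 3 were
set up earlier as hairpin Rx queues:

#include <rte_common.h>
#include <rte_ethdev.h>
#include <rte_flow.h>

static const uint16_t hp_queues[] = { 2, 3 };
static const struct rte_flow_attr attr = { .ingress = 1 };
static const struct rte_flow_action_rss rss_conf = {
	.types = ETH_RSS_IP,
	.queue_num = RTE_DIM(hp_queues),
	.queue = hp_queues,
};
static const struct rte_flow_item pattern[] = {
	{ .type = RTE_FLOW_ITEM_TYPE_ETH },
	{ .type = RTE_FLOW_ITEM_TYPE_END },
};
static const struct rte_flow_action actions[] = {
	{ .type = RTE_FLOW_ACTION_TYPE_RSS, .conf = &rss_conf },
	{ .type = RTE_FLOW_ACTION_TYPE_END },
};
/* rte_flow_create(port_id, &attr, pattern, actions, &error) then
 * routes matching traffic to the hairpin queues. */
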
 drivers/net/mlx5/mlx5.h         |   3 ++
 drivers/net/mlx5/mlx5_ethdev.c  | 102 ++++++++++++++++++++++++++++++----------
 drivers/net/mlx5/mlx5_rss.c     |   1 +
 drivers/net/mlx5/mlx5_rxq.c     |  22 ++++++---
 drivers/net/mlx5/mlx5_trigger.c |   6 +++
 5 files changed, 104 insertions(+), 30 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index a43accf..391ae2c 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -711,6 +711,7 @@ struct mlx5_priv {
 	rte_spinlock_t uar_lock[MLX5_UAR_PAGE_NUM_MAX];
 	/* UAR same-page access control required in 32bit implementations. */
 #endif
+	uint8_t skip_default_rss_reta; /* Skip configuration of default reta. */
 };
 
 #define PORT_ID(priv) ((priv)->dev_data->port_id)
@@ -785,6 +786,8 @@ int mlx5_get_module_eeprom(struct rte_eth_dev *dev,
 			   struct rte_dev_eeprom_info *info);
 int mlx5_hairpin_cap_get(struct rte_eth_dev *dev,
 			 struct rte_eth_hairpin_cap *cap);
+int mlx5_dev_configure_rss_reta(struct rte_eth_dev *dev);
+
 /* mlx5_mac.c */
 
 int mlx5_get_mac(struct rte_eth_dev *dev, uint8_t (*mac)[RTE_ETHER_ADDR_LEN]);
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index 95c70f7..5b811e8 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -383,9 +383,6 @@ struct ethtool_link_settings {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	unsigned int rxqs_n = dev->data->nb_rx_queues;
 	unsigned int txqs_n = dev->data->nb_tx_queues;
-	unsigned int i;
-	unsigned int j;
-	unsigned int reta_idx_n;
 	const uint8_t use_app_rss_key =
 		!!dev->data->dev_conf.rx_adv_conf.rss_conf.rss_key;
 	int ret = 0;
@@ -431,28 +428,8 @@ struct ethtool_link_settings {
 		DRV_LOG(INFO, "port %u Rx queues number update: %u -> %u",
 			dev->data->port_id, priv->rxqs_n, rxqs_n);
 		priv->rxqs_n = rxqs_n;
-		/*
-		 * If the requested number of RX queues is not a power of two,
-		 * use the maximum indirection table size for better balancing.
-		 * The result is always rounded to the next power of two.
-		 */
-		reta_idx_n = (1 << log2above((rxqs_n & (rxqs_n - 1)) ?
-					     priv->config.ind_table_max_size :
-					     rxqs_n));
-		ret = mlx5_rss_reta_index_resize(dev, reta_idx_n);
-		if (ret)
-			return ret;
-		/*
-		 * When the number of RX queues is not a power of two,
-		 * the remaining table entries are padded with reused WQs
-		 * and hashes are not spread uniformly.
-		 */
-		for (i = 0, j = 0; (i != reta_idx_n); ++i) {
-			(*priv->reta_idx)[i] = j;
-			if (++j == rxqs_n)
-				j = 0;
-		}
 	}
+	priv->skip_default_rss_reta = 0;
 	ret = mlx5_proc_priv_init(dev);
 	if (ret)
 		return ret;
@@ -460,6 +437,83 @@ struct ethtool_link_settings {
 }
 
 /**
+ * Configure default RSS reta.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_dev_configure_rss_reta(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	unsigned int rxqs_n = dev->data->nb_rx_queues;
+	unsigned int i;
+	unsigned int j;
+	unsigned int reta_idx_n;
+	int ret = 0;
+	unsigned int *rss_queue_arr = NULL;
+	unsigned int rss_queue_n = 0;
+
+	if (priv->skip_default_rss_reta)
+		return ret;
+	rss_queue_arr = rte_malloc("", rxqs_n * sizeof(unsigned int), 0);
+	if (!rss_queue_arr) {
+		DRV_LOG(ERR, "port %u cannot allocate RSS queue list (%u)",
+			dev->data->port_id, rxqs_n);
+		rte_errno = ENOMEM;
+		return -rte_errno;
+	}
+	for (i = 0, j = 0; i < rxqs_n; i++) {
+		struct mlx5_rxq_data *rxq_data;
+		struct mlx5_rxq_ctrl *rxq_ctrl;
+
+		rxq_data = (*priv->rxqs)[i];
+		rxq_ctrl = container_of(rxq_data, struct mlx5_rxq_ctrl, rxq);
+		if (rxq_ctrl->type == MLX5_RXQ_TYPE_STANDARD)
+			rss_queue_arr[j++] = i;
+	}
+	rss_queue_n = j;
+	if (rss_queue_n > priv->config.ind_table_max_size) {
+		DRV_LOG(ERR, "port %u cannot handle this many Rx queues (%u)",
+			dev->data->port_id, rss_queue_n);
+		rte_errno = EINVAL;
+		rte_free(rss_queue_arr);
+		return -rte_errno;
+	}
+	DRV_LOG(INFO, "port %u Rx queues number update: %u -> %u",
+		dev->data->port_id, priv->rxqs_n, rxqs_n);
+	priv->rxqs_n = rxqs_n;
+	/*
+	 * If the requested number of RX queues is not a power of two,
+	 * use the maximum indirection table size for better balancing.
+	 * The result is always rounded to the next power of two.
+	 */
+	reta_idx_n = (1 << log2above((rss_queue_n & (rss_queue_n - 1)) ?
+				priv->config.ind_table_max_size :
+				rss_queue_n));
+	ret = mlx5_rss_reta_index_resize(dev, reta_idx_n);
+	if (ret) {
+		rte_free(rss_queue_arr);
+		return ret;
+	}
+	/*
+	 * When the number of RX queues is not a power of two,
+	 * the remaining table entries are padded with reused WQs
+	 * and hashes are not spread uniformly.
+	 */
+	for (i = 0, j = 0; (i != reta_idx_n); ++i) {
+		(*priv->reta_idx)[i] = rss_queue_arr[j];
+		if (++j == rss_queue_n)
+			j = 0;
+	}
+	rte_free(rss_queue_arr);
+	return ret;
+}
+
+/**
  * Sets default tuning parameters.
  *
  * @param dev
diff --git a/drivers/net/mlx5/mlx5_rss.c b/drivers/net/mlx5/mlx5_rss.c
index 891d764..1028264 100644
--- a/drivers/net/mlx5/mlx5_rss.c
+++ b/drivers/net/mlx5/mlx5_rss.c
@@ -223,6 +223,7 @@
 	}
 	if (dev->data->dev_started) {
 		mlx5_dev_stop(dev);
+		priv->skip_default_rss_reta = 1;
 		return mlx5_dev_start(dev);
 	}
 	return 0;
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index 66596df..a8ff8b2 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -2156,9 +2156,13 @@ struct mlx5_rxq_ctrl *
 		}
 	} else { /* ind_tbl->type == MLX5_IND_TBL_TYPE_DEVX */
 		struct mlx5_devx_rqt_attr *rqt_attr = NULL;
+		const unsigned int rqt_n =
+			1 << (rte_is_power_of_2(queues_n) ?
+			      log2above(queues_n) :
+			      log2above(priv->config.ind_table_max_size));
 
 		rqt_attr = rte_calloc(__func__, 1, sizeof(*rqt_attr) +
-				      queues_n * sizeof(uint32_t), 0);
+				      rqt_n * sizeof(uint32_t), 0);
 		if (!rqt_attr) {
 			DRV_LOG(ERR, "port %u cannot allocate RQT resources",
 				dev->data->port_id);
@@ -2166,7 +2170,7 @@ struct mlx5_rxq_ctrl *
 			goto error;
 		}
 		rqt_attr->rqt_max_size = priv->config.ind_table_max_size;
-		rqt_attr->rqt_actual_size = queues_n;
+		rqt_attr->rqt_actual_size = rqt_n;
 		for (i = 0; i != queues_n; ++i) {
 			struct mlx5_rxq_ctrl *rxq = mlx5_rxq_get(dev,
 								 queues[i]);
@@ -2175,6 +2179,9 @@ struct mlx5_rxq_ctrl *
 			rqt_attr->rq_list[i] = rxq->obj->rq->id;
 			ind_tbl->queues[i] = queues[i];
 		}
+		k = i; /* Retain value of i for use in error case. */
+		for (j = 0; k != rqt_n; ++k, ++j)
+			rqt_attr->rq_list[k] = rqt_attr->rq_list[j];
 		ind_tbl->rqt = mlx5_devx_cmd_create_rqt(priv->sh->ctx,
 							rqt_attr);
 		rte_free(rqt_attr);
@@ -2328,13 +2335,13 @@ struct mlx5_hrxq *
 	struct mlx5_ind_table_obj *ind_tbl;
 	int err;
 	struct mlx5_devx_obj *tir = NULL;
+	struct mlx5_rxq_data *rxq_data = (*priv->rxqs)[queues[0]];
+	struct mlx5_rxq_ctrl *rxq_ctrl =
+		container_of(rxq_data, struct mlx5_rxq_ctrl, rxq);
 
 	queues_n = hash_fields ? queues_n : 1;
 	ind_tbl = mlx5_ind_table_obj_get(dev, queues, queues_n);
 	if (!ind_tbl) {
-		struct mlx5_rxq_data *rxq_data = (*priv->rxqs)[queues[0]];
-		struct mlx5_rxq_ctrl *rxq_ctrl =
-			container_of(rxq_data, struct mlx5_rxq_ctrl, rxq);
 		enum mlx5_ind_tbl_type type;
 
 		type = rxq_ctrl->obj->type == MLX5_RXQ_OBJ_TYPE_IBV ?
@@ -2430,7 +2437,10 @@ struct mlx5_hrxq *
 		tir_attr.rx_hash_fn = MLX5_RX_HASH_FN_TOEPLITZ;
 		memcpy(&tir_attr.rx_hash_field_selector_outer, &hash_fields,
 		       sizeof(uint64_t));
-		tir_attr.transport_domain = priv->sh->tdn;
+		if (rxq_ctrl->obj->type == MLX5_RXQ_OBJ_TYPE_DEVX_HAIRPIN)
+			tir_attr.transport_domain = priv->sh->td->id;
+		else
+			tir_attr.transport_domain = priv->sh->tdn;
 		memcpy(tir_attr.rx_hash_toeplitz_key, rss_key, rss_key_len);
 		tir_attr.indirect_table = ind_tbl->rqt->id;
 		if (dev->data->dev_conf.lpbk_mode)
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index a4fcdb3..f66b6ee 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -269,6 +269,12 @@
 	int ret;
 
 	DRV_LOG(DEBUG, "port %u starting device", dev->data->port_id);
+	ret = mlx5_dev_configure_rss_reta(dev);
+	if (ret) {
+		DRV_LOG(ERR, "port %u reta config failed: %s",
+			dev->data->port_id, strerror(rte_errno));
+		return -rte_errno;
+	}
 	ret = mlx5_txq_start(dev);
 	if (ret) {
 		DRV_LOG(ERR, "port %u Tx queue allocation failed: %s",
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 186+ messages in thread

* [dpdk-dev] [PATCH v4 11/15] net/mlx5: add internal tag item and action
  2019-10-17 15:32 ` [dpdk-dev] [PATCH v4 00/15] " Ori Kam
                     ` (9 preceding siblings ...)
  2019-10-17 15:32   ` [dpdk-dev] [PATCH v4 10/15] net/mlx5: add support for hairpin hrxq Ori Kam
@ 2019-10-17 15:32   ` Ori Kam
  2019-10-17 15:32   ` [dpdk-dev] [PATCH v4 12/15] net/mlx5: add id generation function Ori Kam
                     ` (4 subsequent siblings)
  15 siblings, 0 replies; 186+ messages in thread
From: Ori Kam @ 2019-10-17 15:32 UTC (permalink / raw)
  To: Matan Azrad, Shahaf Shuler, Viacheslav Ovsiienko
  Cc: dev, orika, jingjing.wu, stephen

This commit introduces setting and matching on registers.
This item and action will be used by a number of different
features such as hairpin, metering and metadata.

Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>

---
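Note: a sketch of how a driver-internal rule can set a register and
later match on it with the new private types (0x1234 is just an
example value; the casts are needed because the private types live
below the public enum range):

	struct mlx5_rte_flow_action_set_tag set_tag = {
		.id = REG_B, /* Rx-side register, see flow_get_reg_id() */
		.data = rte_cpu_to_be_32(0x1234),
	};
	struct rte_flow_action tag_action = {
		.type = (enum rte_flow_action_type)
			MLX5_RTE_FLOW_ACTION_TYPE_TAG,
		.conf = &set_tag,
	};
	struct mlx5_rte_flow_item_tag tag_spec = {
		.id = REG_B,
		.data = rte_cpu_to_be_32(0x1234),
	};
	struct mlx5_rte_flow_item_tag tag_mask = {
		.data = RTE_BE32(UINT32_MAX),
	};
	struct rte_flow_item tag_item = {
		.type = (enum rte_flow_item_type)
			MLX5_RTE_FLOW_ITEM_TYPE_TAG,
		.spec = &tag_spec,
		.mask = &tag_mask,
	};
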
 drivers/net/mlx5/mlx5_flow.c    |  52 +++++++++++++
 drivers/net/mlx5/mlx5_flow.h    |  54 ++++++++++++--
 drivers/net/mlx5/mlx5_flow_dv.c | 158 +++++++++++++++++++++++++++++++++++++++-
 drivers/net/mlx5/mlx5_prm.h     |   3 +-
 4 files changed, 257 insertions(+), 10 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index 578d003..b4bcd1a 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -316,6 +316,58 @@ struct mlx5_flow_tunnel_info {
 	},
 };
 
+enum mlx5_feature_name {
+	MLX5_HAIRPIN_RX,
+	MLX5_HAIRPIN_TX,
+	MLX5_APPLICATION,
+};
+
+/**
+ * Translate tag ID to register.
+ *
+ * @param[in] dev
+ *   Pointer to the Ethernet device structure.
+ * @param[in] feature
+ *   The feature that request the register.
+ * @param[in] id
+ *   The request register ID.
+ * @param[out] error
+ *   Error description in case of any.
+ *
+ * @return
+ *   The request register on success, a negative errno
+ *   value otherwise and rte_errno is set.
+ */
+__rte_unused
+static enum modify_reg flow_get_reg_id(struct rte_eth_dev *dev,
+				       enum mlx5_feature_name feature,
+				       uint32_t id,
+				       struct rte_flow_error *error)
+{
+	static enum modify_reg id2reg[] = {
+		[0] = REG_A,
+		[1] = REG_C_2,
+		[2] = REG_C_3,
+		[3] = REG_C_4,
+		[4] = REG_B,};
+
+	dev = (void *)dev; /* Silence the unused parameter warning. */
+	switch (feature) {
+	case MLX5_HAIRPIN_RX:
+		return REG_B;
+	case MLX5_HAIRPIN_TX:
+		return REG_A;
+	case MLX5_APPLICATION:
+		if (id > 4)
+			return rte_flow_error_set(error, EINVAL,
+						  RTE_FLOW_ERROR_TYPE_ITEM,
+						  NULL, "invalid tag id");
+		return id2reg[id];
+	}
+	return rte_flow_error_set(error, EINVAL, RTE_FLOW_ERROR_TYPE_ITEM,
+				  NULL, "invalid feature name");
+}
+
 /**
  * Discover the maximum number of priority available.
  *
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 235bccd..0148c1b 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -27,6 +27,43 @@
 #include "mlx5.h"
 #include "mlx5_prm.h"
 
+enum modify_reg {
+	REG_A,
+	REG_B,
+	REG_C_0,
+	REG_C_1,
+	REG_C_2,
+	REG_C_3,
+	REG_C_4,
+	REG_C_5,
+	REG_C_6,
+	REG_C_7,
+};
+
+/* Private rte flow items. */
+enum mlx5_rte_flow_item_type {
+	MLX5_RTE_FLOW_ITEM_TYPE_END = INT_MIN,
+	MLX5_RTE_FLOW_ITEM_TYPE_TAG,
+};
+
+/* Private rte flow actions. */
+enum mlx5_rte_flow_action_type {
+	MLX5_RTE_FLOW_ACTION_TYPE_END = INT_MIN,
+	MLX5_RTE_FLOW_ACTION_TYPE_TAG,
+};
+
+/* Matches on selected register. */
+struct mlx5_rte_flow_item_tag {
+	uint16_t id;
+	rte_be32_t data;
+};
+
+/* Modify selected register. */
+struct mlx5_rte_flow_action_set_tag {
+	uint16_t id;
+	rte_be32_t data;
+};
+
 /* Pattern outer Layer bits. */
 #define MLX5_FLOW_LAYER_OUTER_L2 (1u << 0)
 #define MLX5_FLOW_LAYER_OUTER_L3_IPV4 (1u << 1)
@@ -53,16 +90,17 @@
 /* General pattern items bits. */
 #define MLX5_FLOW_ITEM_METADATA (1u << 16)
 #define MLX5_FLOW_ITEM_PORT_ID (1u << 17)
+#define MLX5_FLOW_ITEM_TAG (1u << 18)
 
 /* Pattern MISC bits. */
-#define MLX5_FLOW_LAYER_ICMP (1u << 18)
-#define MLX5_FLOW_LAYER_ICMP6 (1u << 19)
-#define MLX5_FLOW_LAYER_GRE_KEY (1u << 20)
+#define MLX5_FLOW_LAYER_ICMP (1u << 19)
+#define MLX5_FLOW_LAYER_ICMP6 (1u << 20)
+#define MLX5_FLOW_LAYER_GRE_KEY (1u << 21)
 
 /* Pattern tunnel Layer bits (continued). */
-#define MLX5_FLOW_LAYER_IPIP (1u << 21)
-#define MLX5_FLOW_LAYER_IPV6_ENCAP (1u << 22)
-#define MLX5_FLOW_LAYER_NVGRE (1u << 23)
+#define MLX5_FLOW_LAYER_IPIP (1u << 22)
+#define MLX5_FLOW_LAYER_IPV6_ENCAP (1u << 23)
+#define MLX5_FLOW_LAYER_NVGRE (1u << 24)
 
 /* Outer Masks. */
 #define MLX5_FLOW_LAYER_OUTER_L3 \
@@ -139,6 +177,7 @@
 #define MLX5_FLOW_ACTION_DEC_TCP_SEQ (1u << 29)
 #define MLX5_FLOW_ACTION_INC_TCP_ACK (1u << 30)
 #define MLX5_FLOW_ACTION_DEC_TCP_ACK (1u << 31)
+#define MLX5_FLOW_ACTION_SET_TAG (1ull << 32)
 
 #define MLX5_FLOW_FATE_ACTIONS \
 	(MLX5_FLOW_ACTION_DROP | MLX5_FLOW_ACTION_QUEUE | \
@@ -172,7 +211,8 @@
 				      MLX5_FLOW_ACTION_DEC_TCP_SEQ | \
 				      MLX5_FLOW_ACTION_INC_TCP_ACK | \
 				      MLX5_FLOW_ACTION_DEC_TCP_ACK | \
-				      MLX5_FLOW_ACTION_OF_SET_VLAN_VID)
+				      MLX5_FLOW_ACTION_OF_SET_VLAN_VID | \
+				      MLX5_FLOW_ACTION_SET_TAG)
 
 #define MLX5_FLOW_VLAN_ACTIONS (MLX5_FLOW_ACTION_OF_POP_VLAN | \
 				MLX5_FLOW_ACTION_OF_PUSH_VLAN)
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index f0422dc..dde6673 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -723,6 +723,59 @@ struct field_modify_info modify_tcp[] = {
 					     MLX5_MODIFICATION_TYPE_ADD, error);
 }
 
+static enum mlx5_modification_field reg_to_field[] = {
+	[REG_A] = MLX5_MODI_META_DATA_REG_A,
+	[REG_B] = MLX5_MODI_META_DATA_REG_B,
+	[REG_C_0] = MLX5_MODI_META_REG_C_0,
+	[REG_C_1] = MLX5_MODI_META_REG_C_1,
+	[REG_C_2] = MLX5_MODI_META_REG_C_2,
+	[REG_C_3] = MLX5_MODI_META_REG_C_3,
+	[REG_C_4] = MLX5_MODI_META_REG_C_4,
+	[REG_C_5] = MLX5_MODI_META_REG_C_5,
+	[REG_C_6] = MLX5_MODI_META_REG_C_6,
+	[REG_C_7] = MLX5_MODI_META_REG_C_7,
+};
+
+/**
+ * Convert register set to DV specification.
+ *
+ * @param[in,out] resource
+ *   Pointer to the modify-header resource.
+ * @param[in] action
+ *   Pointer to action specification.
+ * @param[out] error
+ *   Pointer to the error structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+flow_dv_convert_action_set_reg
+			(struct mlx5_flow_dv_modify_hdr_resource *resource,
+			 const struct rte_flow_action *action,
+			 struct rte_flow_error *error)
+{
+	const struct mlx5_rte_flow_action_set_tag *conf = (action->conf);
+	struct mlx5_modification_cmd *actions = resource->actions;
+	uint32_t i = resource->actions_num;
+
+	if (i >= MLX5_MODIFY_NUM)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_ACTION, NULL,
+					  "too many items to modify");
+	actions[i].action_type = MLX5_MODIFICATION_TYPE_SET;
+	actions[i].field = reg_to_field[conf->id];
+	actions[i].data0 = rte_cpu_to_be_32(actions[i].data0);
+	actions[i].data1 = conf->data;
+	++i;
+	resource->actions_num = i;
+	if (!resource->actions_num)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_ACTION, NULL,
+					  "invalid modification flow item");
+	return 0;
+}
+
 /**
  * Validate META item.
  *
@@ -4640,6 +4693,94 @@ struct field_modify_info modify_tcp[] = {
 }
 
 /**
+ * Add tag item to matcher.
+ *
+ * @param[in, out] matcher
+ *   Flow matcher.
+ * @param[in, out] key
+ *   Flow matcher value.
+ * @param[in] item
+ *   Flow pattern to translate.
+ */
+static void
+flow_dv_translate_item_tag(void *matcher, void *key,
+			   const struct rte_flow_item *item)
+{
+	void *misc2_m =
+		MLX5_ADDR_OF(fte_match_param, matcher, misc_parameters_2);
+	void *misc2_v =
+		MLX5_ADDR_OF(fte_match_param, key, misc_parameters_2);
+	const struct mlx5_rte_flow_item_tag *tag_v = item->spec;
+	const struct mlx5_rte_flow_item_tag *tag_m = item->mask;
+	enum modify_reg reg = tag_v->id;
+	rte_be32_t value = tag_v->data;
+	rte_be32_t mask = tag_m->data;
+
+	switch (reg) {
+	case REG_A:
+		MLX5_SET(fte_match_set_misc2, misc2_m, metadata_reg_a,
+				rte_be_to_cpu_32(mask));
+		MLX5_SET(fte_match_set_misc2, misc2_v, metadata_reg_a,
+				rte_be_to_cpu_32(value));
+		break;
+	case REG_B:
+		MLX5_SET(fte_match_set_misc2, misc2_m, metadata_reg_b,
+				 rte_be_to_cpu_32(mask));
+		MLX5_SET(fte_match_set_misc2, misc2_v, metadata_reg_b,
+				rte_be_to_cpu_32(value));
+		break;
+	case REG_C_0:
+		MLX5_SET(fte_match_set_misc2, misc2_m, metadata_reg_c_0,
+				 rte_be_to_cpu_32(mask));
+		MLX5_SET(fte_match_set_misc2, misc2_v, metadata_reg_c_0,
+				rte_be_to_cpu_32(value));
+		break;
+	case REG_C_1:
+		MLX5_SET(fte_match_set_misc2, misc2_m, metadata_reg_c_1,
+				 rte_be_to_cpu_32(mask));
+		MLX5_SET(fte_match_set_misc2, misc2_v, metadata_reg_c_1,
+				rte_be_to_cpu_32(value));
+		break;
+	case REG_C_2:
+		MLX5_SET(fte_match_set_misc2, misc2_m, metadata_reg_c_2,
+				 rte_be_to_cpu_32(mask));
+		MLX5_SET(fte_match_set_misc2, misc2_v, metadata_reg_c_2,
+				rte_be_to_cpu_32(value));
+		break;
+	case REG_C_3:
+		MLX5_SET(fte_match_set_misc2, misc2_m, metadata_reg_c_3,
+				 rte_be_to_cpu_32(mask));
+		MLX5_SET(fte_match_set_misc2, misc2_v, metadata_reg_c_3,
+				rte_be_to_cpu_32(value));
+		break;
+	case REG_C_4:
+		MLX5_SET(fte_match_set_misc2, misc2_m, metadata_reg_c_4,
+				 rte_be_to_cpu_32(mask));
+		MLX5_SET(fte_match_set_misc2, misc2_v, metadata_reg_c_4,
+				rte_be_to_cpu_32(value));
+		break;
+	case REG_C_5:
+		MLX5_SET(fte_match_set_misc2, misc2_m, metadata_reg_c_5,
+				 rte_be_to_cpu_32(mask));
+		MLX5_SET(fte_match_set_misc2, misc2_v, metadata_reg_c_5,
+				rte_be_to_cpu_32(value));
+		break;
+	case REG_C_6:
+		MLX5_SET(fte_match_set_misc2, misc2_m, metadata_reg_c_6,
+				 rte_be_to_cpu_32(mask));
+		MLX5_SET(fte_match_set_misc2, misc2_v, metadata_reg_c_6,
+				rte_be_to_cpu_32(value));
+		break;
+	case REG_C_7:
+		MLX5_SET(fte_match_set_misc2, misc2_m, metadata_reg_c_7,
+				 rte_be_to_cpu_32(mask));
+		MLX5_SET(fte_match_set_misc2, misc2_v, metadata_reg_c_7,
+				rte_be_to_cpu_32(value));
+		break;
+	}
+}
+
+/**
  * Add source vport match to the specified matcher.
  *
  * @param[in, out] matcher
@@ -5225,8 +5366,9 @@ struct field_modify_info modify_tcp[] = {
 		struct mlx5_flow_tbl_resource *tbl;
 		uint32_t port_id = 0;
 		struct mlx5_flow_dv_port_id_action_resource port_id_resource;
+		int action_type = actions->type;
 
-		switch (actions->type) {
+		switch (action_type) {
 		case RTE_FLOW_ACTION_TYPE_VOID:
 			break;
 		case RTE_FLOW_ACTION_TYPE_PORT_ID:
@@ -5541,6 +5683,12 @@ struct field_modify_info modify_tcp[] = {
 					MLX5_FLOW_ACTION_INC_TCP_ACK :
 					MLX5_FLOW_ACTION_DEC_TCP_ACK;
 			break;
+		case MLX5_RTE_FLOW_ACTION_TYPE_TAG:
+			if (flow_dv_convert_action_set_reg(&res, actions,
+							   error))
+				return -rte_errno;
+			action_flags |= MLX5_FLOW_ACTION_SET_TAG;
+			break;
 		case RTE_FLOW_ACTION_TYPE_END:
 			actions_end = true;
 			if (action_flags & MLX5_FLOW_MODIFY_HDR_ACTIONS) {
@@ -5565,8 +5713,9 @@ struct field_modify_info modify_tcp[] = {
 	flow->actions = action_flags;
 	for (; items->type != RTE_FLOW_ITEM_TYPE_END; items++) {
 		int tunnel = !!(item_flags & MLX5_FLOW_LAYER_TUNNEL);
+		int item_type = items->type;
 
-		switch (items->type) {
+		switch (item_type) {
 		case RTE_FLOW_ITEM_TYPE_PORT_ID:
 			flow_dv_translate_item_port_id(dev, match_mask,
 						       match_value, items);
@@ -5712,6 +5861,11 @@ struct field_modify_info modify_tcp[] = {
 						      items, tunnel);
 			last_item = MLX5_FLOW_LAYER_ICMP6;
 			break;
+		case MLX5_RTE_FLOW_ITEM_TYPE_TAG:
+			flow_dv_translate_item_tag(match_mask, match_value,
+						   items);
+			last_item = MLX5_FLOW_ITEM_TAG;
+			break;
 		default:
 			break;
 		}
diff --git a/drivers/net/mlx5/mlx5_prm.h b/drivers/net/mlx5/mlx5_prm.h
index d4084db..695578f 100644
--- a/drivers/net/mlx5/mlx5_prm.h
+++ b/drivers/net/mlx5/mlx5_prm.h
@@ -623,7 +623,8 @@ struct mlx5_ifc_fte_match_set_misc2_bits {
 	u8 metadata_reg_c_1[0x20];
 	u8 metadata_reg_c_0[0x20];
 	u8 metadata_reg_a[0x20];
-	u8 reserved_at_1a0[0x60];
+	u8 metadata_reg_b[0x20];
+	u8 reserved_at_1c0[0x40];
 };
 
 struct mlx5_ifc_fte_match_set_misc3_bits {
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 186+ messages in thread

* [dpdk-dev] [PATCH v4 12/15] net/mlx5: add id generation function
  2019-10-17 15:32 ` [dpdk-dev] [PATCH v4 00/15] " Ori Kam
                     ` (10 preceding siblings ...)
  2019-10-17 15:32   ` [dpdk-dev] [PATCH v4 11/15] net/mlx5: add internal tag item and action Ori Kam
@ 2019-10-17 15:32   ` Ori Kam
  2019-10-17 15:32   ` [dpdk-dev] [PATCH v4 13/15] net/mlx5: add default flows for hairpin Ori Kam
                     ` (3 subsequent siblings)
  15 siblings, 0 replies; 186+ messages in thread
From: Ori Kam @ 2019-10-17 15:32 UTC (permalink / raw)
  To: Matan Azrad, Shahaf Shuler, Viacheslav Ovsiienko
  Cc: dev, orika, jingjing.wu, stephen

When splitting flows, for example for hairpin or metering, there is
a need to combine the flows. This is done using an ID.
This commit introduces a simple way to generate such IDs.

A bitmap was not used because its allocation and release are O(n),
while in the chosen approach both allocation and release are O(1).

Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>

---
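Note: intended usage of the pool, as a sketch (the real caller is
added by the hairpin flow splitting patch):

static int
flow_id_pool_smoke_test(void)
{
	struct mlx5_flow_id_pool *pool = mlx5_flow_id_pool_alloc();
	uint32_t id = 0;

	if (pool == NULL)
		return -rte_errno; /* ENOMEM set by the allocator */
	if (mlx5_flow_id_get(pool, &id) == 0) {
		/* Use id to pair the Rx and Tx halves of a flow. */
		mlx5_flow_id_release(pool, id);
	}
	mlx5_flow_id_pool_release(pool);
	return 0;
}
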
 drivers/net/mlx5/mlx5.c      | 118 +++++++++++++++++++++++++++++++++++++++++++
 drivers/net/mlx5/mlx5_flow.h |  14 +++++
 2 files changed, 132 insertions(+)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 7962936..0c3239c 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -179,6 +179,124 @@ struct mlx5_dev_spawn_data {
 static LIST_HEAD(, mlx5_ibv_shared) mlx5_ibv_list = LIST_HEAD_INITIALIZER();
 static pthread_mutex_t mlx5_ibv_list_mutex = PTHREAD_MUTEX_INITIALIZER;
 
+#define MLX5_FLOW_MIN_ID_POOL_SIZE 512
+#define MLX5_ID_GENERATION_ARRAY_FACTOR 16
+
+/**
+ * Allocate ID pool structure.
+ *
+ * @return
+ *   Pointer to pool object, NULL value otherwise.
+ */
+struct mlx5_flow_id_pool *
+mlx5_flow_id_pool_alloc(void)
+{
+	struct mlx5_flow_id_pool *pool;
+	void *mem;
+
+	pool = rte_zmalloc("id pool allocation", sizeof(*pool),
+			   RTE_CACHE_LINE_SIZE);
+	if (!pool) {
+		DRV_LOG(ERR, "can't allocate id pool");
+		rte_errno  = ENOMEM;
+		return NULL;
+	}
+	mem = rte_zmalloc("", MLX5_FLOW_MIN_ID_POOL_SIZE * sizeof(uint32_t),
+			  RTE_CACHE_LINE_SIZE);
+	if (!mem) {
+		DRV_LOG(ERR, "can't allocate mem for id pool");
+		rte_errno  = ENOMEM;
+		goto error;
+	}
+	pool->free_arr = mem;
+	pool->curr = pool->free_arr;
+	pool->last = pool->free_arr + MLX5_FLOW_MIN_ID_POOL_SIZE;
+	pool->base_index = 0;
+	return pool;
+error:
+	rte_free(pool);
+	return NULL;
+}
+
+/**
+ * Release ID pool structure.
+ *
+ * @param[in] pool
+ *   Pointer to flow id pool object to free.
+ */
+void
+mlx5_flow_id_pool_release(struct mlx5_flow_id_pool *pool)
+{
+	rte_free(pool->free_arr);
+	rte_free(pool);
+}
+
+/**
+ * Generate ID.
+ *
+ * @param[in] pool
+ *   Pointer to flow id pool.
+ * @param[out] id
+ *   The generated ID.
+ *
+ * @return
+ *   0 on success, error value otherwise.
+ */
+uint32_t
+mlx5_flow_id_get(struct mlx5_flow_id_pool *pool, uint32_t *id)
+{
+	if (pool->curr == pool->free_arr) {
+		if (pool->base_index == UINT32_MAX) {
+			rte_errno  = ENOMEM;
+			DRV_LOG(ERR, "no free id");
+			return -rte_errno;
+		}
+		*id = ++pool->base_index;
+		return 0;
+	}
+	*id = *(--pool->curr);
+	return 0;
+}
+
+/**
+ * Release ID.
+ *
+ * @param[in] pool
+ *   Pointer to flow id pool.
+ * @param[out] id
+ *   The generated ID.
+ *
+ * @return
+ *   0 on success, error value otherwise.
+ */
+uint32_t
+mlx5_flow_id_release(struct mlx5_flow_id_pool *pool, uint32_t id)
+{
+	uint32_t size;
+	uint32_t size2;
+	void *mem;
+
+	if (pool->curr == pool->last) {
+		size = pool->curr - pool->free_arr;
+		size2 = size * MLX5_ID_GENERATION_ARRAY_FACTOR;
+		assert(size2 > size);
+		mem = rte_malloc("", size2 * sizeof(uint32_t), 0);
+		if (!mem) {
+			DRV_LOG(ERR, "can't allocate mem for id pool");
+			rte_errno  = ENOMEM;
+			return -rte_errno;
+		}
+		memcpy(mem, pool->free_arr, size * sizeof(uint32_t));
+		rte_free(pool->free_arr);
+		pool->free_arr = mem;
+		pool->curr = pool->free_arr + size;
+		pool->last = pool->free_arr + size2;
+	}
+	*pool->curr = id;
+	pool->curr++;
+	return 0;
+}
+
 /**
  * Initialize the counters management structure.
  *
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 0148c1b..1b14fb7 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -495,8 +495,22 @@ struct mlx5_flow_driver_ops {
 #define MLX5_CNT_CONTAINER_UNUSED(sh, batch, thread) (&(sh)->cmng.ccont \
 	[(~((sh)->cmng.mhi[batch] >> (thread)) & 0x1) * 2 + (batch)])
 
+/* ID generation structure. */
+struct mlx5_flow_id_pool {
+	uint32_t *free_arr; /**< Pointer to an array of free values. */
+	uint32_t base_index;
+	/**< The next index that can be used without any free elements. */
+	uint32_t *curr; /**< Pointer to the index to pop. */
+	uint32_t *last; /**< Pointer to the last element in the empty array. */
+};
+
 /* mlx5_flow.c */
 
+struct mlx5_flow_id_pool *mlx5_flow_id_pool_alloc(void);
+void mlx5_flow_id_pool_release(struct mlx5_flow_id_pool *pool);
+uint32_t mlx5_flow_id_get(struct mlx5_flow_id_pool *pool, uint32_t *id);
+uint32_t mlx5_flow_id_release(struct mlx5_flow_id_pool *pool,
+			      uint32_t id);
 int mlx5_flow_group_to_table(const struct rte_flow_attr *attributes,
 			     bool external, uint32_t group, uint32_t *table,
 			     struct rte_flow_error *error);
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 186+ messages in thread

* [dpdk-dev] [PATCH v4 13/15] net/mlx5: add default flows for hairpin
  2019-10-17 15:32 ` [dpdk-dev] [PATCH v4 00/15] " Ori Kam
                     ` (11 preceding siblings ...)
  2019-10-17 15:32   ` [dpdk-dev] [PATCH v4 12/15] net/mlx5: add id generation function Ori Kam
@ 2019-10-17 15:32   ` Ori Kam
  2019-10-17 15:32   ` [dpdk-dev] [PATCH v4 14/15] net/mlx5: split hairpin flows Ori Kam
                     ` (2 subsequent siblings)
  15 siblings, 0 replies; 186+ messages in thread
From: Ori Kam @ 2019-10-17 15:32 UTC (permalink / raw)
  To: Matan Azrad, Shahaf Shuler, Viacheslav Ovsiienko
  Cc: dev, orika, jingjing.wu, stephen

When using hairpin, all traffic from Tx hairpin queues should jump
to a dedicated table where matching can be done using registers.

Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>

---
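Note: per hairpin Tx queue the control flow added below reduces to
this rule (sketch; txq_idx stands for the hairpin queue index). The
private TX_QUEUE item is translated into a match on the SQ number
(source_sqn) of that queue:

	uint16_t txq_idx = 1; /* hypothetical hairpin Tx queue index */
	struct mlx5_rte_flow_item_tx_queue queue_spec = {
		.queue = txq_idx,
	};
	struct mlx5_rte_flow_item_tx_queue queue_mask = {
		.queue = UINT32_MAX,
	};
	struct rte_flow_item items[] = {
		{
			.type = (enum rte_flow_item_type)
				MLX5_RTE_FLOW_ITEM_TYPE_TX_QUEUE,
			.spec = &queue_spec,
			.mask = &queue_mask,
		},
		{ .type = RTE_FLOW_ITEM_TYPE_END },
	};
	struct rte_flow_action_jump jump = {
		.group = MLX5_HAIRPIN_TX_TABLE,
	};
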
 drivers/net/mlx5/mlx5.h         |  2 ++
 drivers/net/mlx5/mlx5_flow.c    | 60 +++++++++++++++++++++++++++++++++++++++
 drivers/net/mlx5/mlx5_flow.h    |  9 ++++++
 drivers/net/mlx5/mlx5_flow_dv.c | 60 ++++++++++++++++++++++++++++++++++++++--
 drivers/net/mlx5/mlx5_trigger.c | 18 ++++++++++++
 5 files changed, 147 insertions(+), 2 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 391ae2c..8e86bcf 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -556,6 +556,7 @@ struct mlx5_flow_tbl_resource {
 };
 
 #define MLX5_MAX_TABLES UINT16_MAX
+#define MLX5_HAIRPIN_TX_TABLE (UINT16_MAX - 1)
 #define MLX5_MAX_TABLES_FDB UINT16_MAX
 
 #define MLX5_DBR_PAGE_SIZE 4096 /* Must be >= 512. */
@@ -876,6 +877,7 @@ int mlx5_dev_filter_ctrl(struct rte_eth_dev *dev,
 int mlx5_flow_start(struct rte_eth_dev *dev, struct mlx5_flows *list);
 void mlx5_flow_stop(struct rte_eth_dev *dev, struct mlx5_flows *list);
 int mlx5_flow_verify(struct rte_eth_dev *dev);
+int mlx5_ctrl_flow_source_queue(struct rte_eth_dev *dev, uint32_t queue);
 int mlx5_ctrl_flow_vlan(struct rte_eth_dev *dev,
 			struct rte_flow_item_eth *eth_spec,
 			struct rte_flow_item_eth *eth_mask,
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index b4bcd1a..b6dc105 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -2731,6 +2731,66 @@ struct rte_flow *
 }
 
 /**
+ * Enable default hairpin egress flow.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param queue
+ *   The queue index.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_ctrl_flow_source_queue(struct rte_eth_dev *dev,
+			    uint32_t queue)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	const struct rte_flow_attr attr = {
+		.egress = 1,
+		.priority = 0,
+	};
+	struct mlx5_rte_flow_item_tx_queue queue_spec = {
+		.queue = queue,
+	};
+	struct mlx5_rte_flow_item_tx_queue queue_mask = {
+		.queue = UINT32_MAX,
+	};
+	struct rte_flow_item items[] = {
+		{
+			.type = MLX5_RTE_FLOW_ITEM_TYPE_TX_QUEUE,
+			.spec = &queue_spec,
+			.last = NULL,
+			.mask = &queue_mask,
+		},
+		{
+			.type = RTE_FLOW_ITEM_TYPE_END,
+		},
+	};
+	struct rte_flow_action_jump jump = {
+		.group = MLX5_HAIRPIN_TX_TABLE,
+	};
+	struct rte_flow_action actions[2];
+	struct rte_flow *flow;
+	struct rte_flow_error error;
+
+	actions[0].type = RTE_FLOW_ACTION_TYPE_JUMP;
+	actions[0].conf = &jump;
+	actions[1].type = RTE_FLOW_ACTION_TYPE_END;
+	flow = flow_list_create(dev, &priv->ctrl_flows,
+				&attr, items, actions, false, &error);
+	if (!flow) {
+		DRV_LOG(DEBUG,
+			"Failed to create ctrl flow: rte_errno(%d),"
+			" type(%d), message(%s)\n",
+			rte_errno, error.type,
+			error.message ? error.message : " (no stated reason)");
+		return -rte_errno;
+	}
+	return 0;
+}
+
+/**
  * Enable a control flow configured from the control plane.
  *
  * @param dev
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 1b14fb7..bb67380 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -44,6 +44,7 @@ enum modify_reg {
 enum mlx5_rte_flow_item_type {
 	MLX5_RTE_FLOW_ITEM_TYPE_END = INT_MIN,
 	MLX5_RTE_FLOW_ITEM_TYPE_TAG,
+	MLX5_RTE_FLOW_ITEM_TYPE_TX_QUEUE,
 };
 
 /* Private rte flow actions. */
@@ -64,6 +65,11 @@ struct mlx5_rte_flow_action_set_tag {
 	rte_be32_t data;
 };
 
+/* Matches on source queue. */
+struct mlx5_rte_flow_item_tx_queue {
+	uint32_t queue;
+};
+
 /* Pattern outer Layer bits. */
 #define MLX5_FLOW_LAYER_OUTER_L2 (1u << 0)
 #define MLX5_FLOW_LAYER_OUTER_L3_IPV4 (1u << 1)
@@ -102,6 +108,9 @@ struct mlx5_rte_flow_action_set_tag {
 #define MLX5_FLOW_LAYER_IPV6_ENCAP (1u << 23)
 #define MLX5_FLOW_LAYER_NVGRE (1u << 24)
 
+/* Queue items. */
+#define MLX5_FLOW_ITEM_TX_QUEUE (1u << 25)
+
 /* Outer Masks. */
 #define MLX5_FLOW_LAYER_OUTER_L3 \
 	(MLX5_FLOW_LAYER_OUTER_L3_IPV4 | MLX5_FLOW_LAYER_OUTER_L3_IPV6)
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index dde6673..c7a3f6b 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -3357,7 +3357,9 @@ struct field_modify_info modify_tcp[] = {
 		return ret;
 	for (; items->type != RTE_FLOW_ITEM_TYPE_END; items++) {
 		int tunnel = !!(item_flags & MLX5_FLOW_LAYER_TUNNEL);
-		switch (items->type) {
+		int type = items->type;
+
+		switch (type) {
 		case RTE_FLOW_ITEM_TYPE_VOID:
 			break;
 		case RTE_FLOW_ITEM_TYPE_PORT_ID:
@@ -3518,6 +3520,9 @@ struct field_modify_info modify_tcp[] = {
 				return ret;
 			last_item = MLX5_FLOW_LAYER_ICMP6;
 			break;
+		case MLX5_RTE_FLOW_ITEM_TYPE_TAG:
+		case MLX5_RTE_FLOW_ITEM_TYPE_TX_QUEUE:
+			break;
 		default:
 			return rte_flow_error_set(error, ENOTSUP,
 						  RTE_FLOW_ERROR_TYPE_ITEM,
@@ -3526,11 +3531,12 @@ struct field_modify_info modify_tcp[] = {
 		item_flags |= last_item;
 	}
 	for (; actions->type != RTE_FLOW_ACTION_TYPE_END; actions++) {
+		int type = actions->type;
 		if (actions_n == MLX5_DV_MAX_NUMBER_OF_ACTIONS)
 			return rte_flow_error_set(error, ENOTSUP,
 						  RTE_FLOW_ERROR_TYPE_ACTION,
 						  actions, "too many actions");
-		switch (actions->type) {
+		switch (type) {
 		case RTE_FLOW_ACTION_TYPE_VOID:
 			break;
 		case RTE_FLOW_ACTION_TYPE_PORT_ID:
@@ -3796,6 +3802,8 @@ struct field_modify_info modify_tcp[] = {
 						MLX5_FLOW_ACTION_INC_TCP_ACK :
 						MLX5_FLOW_ACTION_DEC_TCP_ACK;
 			break;
+		case MLX5_RTE_FLOW_ACTION_TYPE_TAG:
+			break;
 		default:
 			return rte_flow_error_set(error, ENOTSUP,
 						  RTE_FLOW_ERROR_TYPE_ACTION,
@@ -5291,6 +5299,48 @@ struct field_modify_info modify_tcp[] = {
 }
 
 /**
+ * Add Tx queue matcher.
+ *
+ * @param[in] dev
+ *   Pointer to the dev struct.
+ * @param[in, out] matcher
+ *   Flow matcher.
+ * @param[in, out] key
+ *   Flow matcher value.
+ * @param[in] item
+ *   Flow pattern to translate.
+ */
+static void
+flow_dv_translate_item_tx_queue(struct rte_eth_dev *dev,
+				void *matcher, void *key,
+				const struct rte_flow_item *item)
+{
+	const struct mlx5_rte_flow_item_tx_queue *queue_m;
+	const struct mlx5_rte_flow_item_tx_queue *queue_v;
+	void *misc_m =
+		MLX5_ADDR_OF(fte_match_param, matcher, misc_parameters);
+	void *misc_v =
+		MLX5_ADDR_OF(fte_match_param, key, misc_parameters);
+	struct mlx5_txq_ctrl *txq;
+	uint32_t queue;
+
+	queue_m = (const void *)item->mask;
+	if (!queue_m)
+		return;
+	queue_v = (const void *)item->spec;
+	if (!queue_v)
+		return;
+	txq = mlx5_txq_get(dev, queue_v->queue);
+	if (!txq)
+		return;
+	queue = txq->obj->sq->id;
+	MLX5_SET(fte_match_set_misc, misc_m, source_sqn, queue_m->queue);
+	MLX5_SET(fte_match_set_misc, misc_v, source_sqn,
+		 queue & queue_m->queue);
+	mlx5_txq_release(dev, queue_v->queue);
+}
+
+/**
  * Fill the flow with DV spec.
  *
  * @param[in] dev
@@ -5866,6 +5919,12 @@ struct field_modify_info modify_tcp[] = {
 						   items);
 			last_item = MLX5_FLOW_ITEM_TAG;
 			break;
+		case MLX5_RTE_FLOW_ITEM_TYPE_TX_QUEUE:
+			flow_dv_translate_item_tx_queue(dev, match_mask,
+							match_value,
+							items);
+			last_item = MLX5_FLOW_ITEM_TX_QUEUE;
+			break;
 		default:
 			break;
 		}
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index f66b6ee..cafab25 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -402,6 +402,24 @@
 	unsigned int j;
 	int ret;
 
+	/*
+	 * Hairpin txq default flow should be created no matter if it is
+	 * isolation mode. Or else all the packets to be sent will be sent
+	 * out directly without the TX flow actions, e.g. encapsulation.
+	 */
+	for (i = 0; i != priv->txqs_n; ++i) {
+		struct mlx5_txq_ctrl *txq_ctrl = mlx5_txq_get(dev, i);
+		if (!txq_ctrl)
+			continue;
+		if (txq_ctrl->type == MLX5_TXQ_TYPE_HAIRPIN) {
+			ret = mlx5_ctrl_flow_source_queue(dev, i);
+			if (ret) {
+				mlx5_txq_release(dev, i);
+				goto error;
+			}
+		}
+		mlx5_txq_release(dev, i);
+	}
 	if (priv->config.dv_esw_en && !priv->config.vf)
 		if (!mlx5_flow_create_esw_table_zero_flow(dev))
 			goto error;
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 186+ messages in thread

* [dpdk-dev] [PATCH v4 14/15] net/mlx5: split hairpin flows
  2019-10-17 15:32 ` [dpdk-dev] [PATCH v4 00/15] " Ori Kam
                     ` (12 preceding siblings ...)
  2019-10-17 15:32   ` [dpdk-dev] [PATCH v4 13/15] net/mlx5: add default flows for hairpin Ori Kam
@ 2019-10-17 15:32   ` Ori Kam
  2019-10-17 15:32   ` [dpdk-dev] [PATCH v4 15/15] doc: add hairpin feature Ori Kam
  2019-10-18 19:07   ` [dpdk-dev] [PATCH v4 00/15] " Ferruh Yigit
  15 siblings, 0 replies; 186+ messages in thread
From: Ori Kam @ 2019-10-17 15:32 UTC (permalink / raw)
  To: Matan Azrad, Shahaf Shuler, Viacheslav Ovsiienko
  Cc: dev, orika, jingjing.wu, stephen

Since the encap action is not supported in RX, we need to split the
hairpin flow into RX and TX.

Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>

---
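Note: an example of an application rule that triggers the split
(sketch; the encap buffer stands for a prebuilt outer
Ethernet/IPv4/UDP/VXLAN header, so its size is above the eth + ipv4
threshold checked below). The PMD keeps the queue action on the Rx
flow, appends an internal SET_TAG with an ID taken from the flow id
pool, and re-creates the encap on a Tx flow matching that tag:

	uint8_t encap_hdr[50]; /* outer headers, filled by the app */
	struct rte_flow_action_raw_encap encap = {
		.data = encap_hdr,
		.size = sizeof(encap_hdr),
	};
	struct rte_flow_action_queue queue = {
		.index = 2, /* assumed hairpin Rx queue */
	};
	struct rte_flow_action actions[] = {
		{ .type = RTE_FLOW_ACTION_TYPE_RAW_ENCAP, .conf = &encap },
		{ .type = RTE_FLOW_ACTION_TYPE_QUEUE, .conf = &queue },
		{ .type = RTE_FLOW_ACTION_TYPE_END },
	};
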
 drivers/net/mlx5/mlx5.c            |  10 ++
 drivers/net/mlx5/mlx5.h            |  10 ++
 drivers/net/mlx5/mlx5_flow.c       | 281 +++++++++++++++++++++++++++++++++++--
 drivers/net/mlx5/mlx5_flow.h       |  14 +-
 drivers/net/mlx5/mlx5_flow_dv.c    |  10 +-
 drivers/net/mlx5/mlx5_flow_verbs.c |  11 +-
 drivers/net/mlx5/mlx5_rxq.c        |  26 ++++
 drivers/net/mlx5/mlx5_rxtx.h       |   2 +
 8 files changed, 334 insertions(+), 30 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 0c3239c..bd9c203 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -530,6 +530,12 @@ struct mlx5_flow_id_pool *
 			goto error;
 		}
 	}
+	sh->flow_id_pool = mlx5_flow_id_pool_alloc();
+	if (!sh->flow_id_pool) {
+		DRV_LOG(ERR, "can't create flow id pool");
+		err = ENOMEM;
+		goto error;
+	}
 #endif /* HAVE_IBV_FLOW_DV_SUPPORT */
 	/*
 	 * Once the device is added to the list of memory event
@@ -569,6 +575,8 @@ struct mlx5_flow_id_pool *
 		claim_zero(mlx5_glue->dealloc_pd(sh->pd));
 	if (sh->ctx)
 		claim_zero(mlx5_glue->close_device(sh->ctx));
+	if (sh->flow_id_pool)
+		mlx5_flow_id_pool_release(sh->flow_id_pool);
 	rte_free(sh);
 	assert(err > 0);
 	rte_errno = err;
@@ -631,6 +639,8 @@ struct mlx5_flow_id_pool *
 		claim_zero(mlx5_devx_cmd_destroy(sh->td));
 	if (sh->ctx)
 		claim_zero(mlx5_glue->close_device(sh->ctx));
+	if (sh->flow_id_pool)
+		mlx5_flow_id_pool_release(sh->flow_id_pool);
 	rte_free(sh);
 exit:
 	pthread_mutex_unlock(&mlx5_ibv_list_mutex);
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 8e86bcf..5f40a39 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -574,6 +574,15 @@ struct mlx5_devx_dbr_page {
 	uint64_t dbr_bitmap[MLX5_DBR_BITMAP_SIZE];
 };
 
+/* ID generation structure. */
+struct mlx5_flow_id_pool {
+	uint32_t *free_arr; /**< Pointer to an array of free values. */
+	uint32_t base_index;
+	/**< The next index that can be used without any free elements. */
+	uint32_t *curr; /**< Pointer to the index to pop. */
+	uint32_t *last; /**< Pointer to the last element in the empty array. */
+};
+
 /*
  * Shared Infiniband device context for Master/Representors
  * which belong to same IB device with multiple IB ports.
@@ -632,6 +641,7 @@ struct mlx5_ibv_shared {
 	struct mlx5dv_devx_cmd_comp *devx_comp; /* DEVX async comp obj. */
 	struct mlx5_devx_obj *tis; /* TIS object. */
 	struct mlx5_devx_obj *td; /* Transport domain. */
+	struct mlx5_flow_id_pool *flow_id_pool; /* Flow ID pool. */
 	struct mlx5_ibv_shared_port port[]; /* per device port data array. */
 };
 
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index b6dc105..bb13857 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -606,7 +606,7 @@ uint32_t mlx5_flow_adjust_priority(struct rte_eth_dev *dev, int32_t priority,
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct rte_flow *flow = dev_flow->flow;
-	const int mark = !!(flow->actions &
+	const int mark = !!(dev_flow->actions &
 			    (MLX5_FLOW_ACTION_FLAG | MLX5_FLOW_ACTION_MARK));
 	const int tunnel = !!(dev_flow->layers & MLX5_FLOW_LAYER_TUNNEL);
 	unsigned int i;
@@ -669,7 +669,7 @@ uint32_t mlx5_flow_adjust_priority(struct rte_eth_dev *dev, int32_t priority,
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct rte_flow *flow = dev_flow->flow;
-	const int mark = !!(flow->actions &
+	const int mark = !!(dev_flow->actions &
 			    (MLX5_FLOW_ACTION_FLAG | MLX5_FLOW_ACTION_MARK));
 	const int tunnel = !!(dev_flow->layers & MLX5_FLOW_LAYER_TUNNEL);
 	unsigned int i;
@@ -2438,6 +2438,210 @@ uint32_t mlx5_flow_adjust_priority(struct rte_eth_dev *dev, int32_t priority,
 }
 
 /**
+ * Check if the flow should be split due to hairpin.
+ * The reason for the split is that in the current HW we can't
+ * support encap on Rx, so if a flow has encap we move it
+ * to Tx.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param[in] attr
+ *   Flow rule attributes.
+ * @param[in] actions
+ *   Associated actions (list terminated by the END action).
+ *
+ * @return
+ *   > 0 the number of actions if the flow should be split,
+ *   0 when no split is required.
+ */
+static int
+flow_check_hairpin_split(struct rte_eth_dev *dev,
+			 const struct rte_flow_attr *attr,
+			 const struct rte_flow_action actions[])
+{
+	int queue_action = 0;
+	int action_n = 0;
+	int encap = 0;
+	const struct rte_flow_action_queue *queue;
+	const struct rte_flow_action_rss *rss;
+	const struct rte_flow_action_raw_encap *raw_encap;
+
+	if (!attr->ingress)
+		return 0;
+	for (; actions->type != RTE_FLOW_ACTION_TYPE_END; actions++) {
+		switch (actions->type) {
+		case RTE_FLOW_ACTION_TYPE_QUEUE:
+			queue = actions->conf;
+			if (mlx5_rxq_get_type(dev, queue->index) !=
+			    MLX5_RXQ_TYPE_HAIRPIN)
+				return 0;
+			queue_action = 1;
+			action_n++;
+			break;
+		case RTE_FLOW_ACTION_TYPE_RSS:
+			rss = actions->conf;
+			if (mlx5_rxq_get_type(dev, rss->queue[0]) !=
+			    MLX5_RXQ_TYPE_HAIRPIN)
+				return 0;
+			queue_action = 1;
+			action_n++;
+			break;
+		case RTE_FLOW_ACTION_TYPE_VXLAN_ENCAP:
+		case RTE_FLOW_ACTION_TYPE_NVGRE_ENCAP:
+			encap = 1;
+			action_n++;
+			break;
+		case RTE_FLOW_ACTION_TYPE_RAW_ENCAP:
+			raw_encap = actions->conf;
+			if (raw_encap->size >
+			    (sizeof(struct rte_flow_item_eth) +
+			     sizeof(struct rte_flow_item_ipv4)))
+				encap = 1;
+			action_n++;
+			break;
+		default:
+			action_n++;
+			break;
+		}
+	}
+	if (encap == 1 && queue_action)
+		return action_n;
+	return 0;
+}
+
+#define MLX5_MAX_SPLIT_ACTIONS 24
+#define MLX5_MAX_SPLIT_ITEMS 24
+
+/**
+ * Split the hairpin flow.
+ * Since HW can't support encap on Rx we move the encap to Tx.
+ * If the count action is after the encap then we also
+ * move the count action. In this case the count will also measure
+ * the outer bytes.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param[in] actions
+ *   Associated actions (list terminated by the END action).
+ * @param[out] actions_rx
+ *   Rx flow actions.
+ * @param[out] actions_tx
+ *   Tx flow actions.
+ * @param[out] pattern_tx
+ *   The pattern items for the Tx flow.
+ * @param[out] flow_id
+ *   The flow ID connected to this flow.
+ *
+ * @return
+ *   0 on success.
+ */
+static int
+flow_hairpin_split(struct rte_eth_dev *dev,
+		   const struct rte_flow_action actions[],
+		   struct rte_flow_action actions_rx[],
+		   struct rte_flow_action actions_tx[],
+		   struct rte_flow_item pattern_tx[],
+		   uint32_t *flow_id)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	const struct rte_flow_action_raw_encap *raw_encap;
+	const struct rte_flow_action_raw_decap *raw_decap;
+	struct mlx5_rte_flow_action_set_tag *set_tag;
+	struct rte_flow_action *tag_action;
+	struct mlx5_rte_flow_item_tag *tag_item;
+	struct rte_flow_item *item;
+	char *addr;
+	struct rte_flow_error error;
+	int encap = 0;
+
+	mlx5_flow_id_get(priv->sh->flow_id_pool, flow_id);
+	for (; actions->type != RTE_FLOW_ACTION_TYPE_END; actions++) {
+		switch (actions->type) {
+		case RTE_FLOW_ACTION_TYPE_VXLAN_ENCAP:
+		case RTE_FLOW_ACTION_TYPE_NVGRE_ENCAP:
+			rte_memcpy(actions_tx, actions,
+			       sizeof(struct rte_flow_action));
+			actions_tx++;
+			break;
+		case RTE_FLOW_ACTION_TYPE_COUNT:
+			if (encap) {
+				rte_memcpy(actions_tx, actions,
+					   sizeof(struct rte_flow_action));
+				actions_tx++;
+			} else {
+				rte_memcpy(actions_rx, actions,
+					   sizeof(struct rte_flow_action));
+				actions_rx++;
+			}
+			break;
+		case RTE_FLOW_ACTION_TYPE_RAW_ENCAP:
+			raw_encap = actions->conf;
+			if (raw_encap->size >
+			    (sizeof(struct rte_flow_item_eth) +
+			     sizeof(struct rte_flow_item_ipv4))) {
+				memcpy(actions_tx, actions,
+				       sizeof(struct rte_flow_action));
+				actions_tx++;
+				encap = 1;
+			} else {
+				rte_memcpy(actions_rx, actions,
+					   sizeof(struct rte_flow_action));
+				actions_rx++;
+			}
+			break;
+		case RTE_FLOW_ACTION_TYPE_RAW_DECAP:
+			raw_decap = actions->conf;
+			if (raw_decap->size <
+			    (sizeof(struct rte_flow_item_eth) +
+			     sizeof(struct rte_flow_item_ipv4))) {
+				memcpy(actions_tx, actions,
+				       sizeof(struct rte_flow_action));
+				actions_tx++;
+			} else {
+				rte_memcpy(actions_rx, actions,
+					   sizeof(struct rte_flow_action));
+				actions_rx++;
+			}
+			break;
+		default:
+			rte_memcpy(actions_rx, actions,
+				   sizeof(struct rte_flow_action));
+			actions_rx++;
+			break;
+		}
+	}
+	/* Add set meta action and end action for the Rx flow. */
+	tag_action = actions_rx;
+	tag_action->type = MLX5_RTE_FLOW_ACTION_TYPE_TAG;
+	actions_rx++;
+	rte_memcpy(actions_rx, actions, sizeof(struct rte_flow_action));
+	actions_rx++;
+	set_tag = (void *)actions_rx;
+	set_tag->id = flow_get_reg_id(dev, MLX5_HAIRPIN_RX, 0, &error);
+	set_tag->data = rte_cpu_to_be_32(*flow_id);
+	tag_action->conf = set_tag;
+	/* Create Tx item list. */
+	rte_memcpy(actions_tx, actions, sizeof(struct rte_flow_action));
+	addr = (void *)&pattern_tx[2];
+	item = pattern_tx;
+	item->type = MLX5_RTE_FLOW_ITEM_TYPE_TAG;
+	tag_item = (void *)addr;
+	tag_item->data = rte_cpu_to_be_32(*flow_id);
+	tag_item->id = flow_get_reg_id(dev, MLX5_HAIRPIN_TX, 0, &error);
+	item->spec = tag_item;
+	addr += sizeof(struct mlx5_rte_flow_item_tag);
+	tag_item = (void *)addr;
+	tag_item->data = UINT32_MAX;
+	tag_item->id = UINT16_MAX;
+	item->mask = tag_item;
+	addr += sizeof(struct mlx5_rte_flow_item_tag);
+	item->last = NULL;
+	item++;
+	item->type = RTE_FLOW_ITEM_TYPE_END;
+	return 0;
+}
+
+/**
  * Create a flow and add it to @p list.
  *
  * @param dev
@@ -2465,6 +2669,7 @@ uint32_t mlx5_flow_adjust_priority(struct rte_eth_dev *dev, int32_t priority,
 		 const struct rte_flow_action actions[],
 		 bool external, struct rte_flow_error *error)
 {
+	struct mlx5_priv *priv = dev->data->dev_private;
 	struct rte_flow *flow = NULL;
 	struct mlx5_flow *dev_flow;
 	const struct rte_flow_action_rss *rss;
@@ -2472,16 +2677,44 @@ uint32_t mlx5_flow_adjust_priority(struct rte_eth_dev *dev, int32_t priority,
 		struct rte_flow_expand_rss buf;
 		uint8_t buffer[2048];
 	} expand_buffer;
+	union {
+		struct rte_flow_action actions[MLX5_MAX_SPLIT_ACTIONS];
+		uint8_t buffer[2048];
+	} actions_rx;
+	union {
+		struct rte_flow_action actions[MLX5_MAX_SPLIT_ACTIONS];
+		uint8_t buffer[2048];
+	} actions_hairpin_tx;
+	union {
+		struct rte_flow_item items[MLX5_MAX_SPLIT_ITEMS];
+		uint8_t buffer[2048];
+	} items_tx;
 	struct rte_flow_expand_rss *buf = &expand_buffer.buf;
+	const struct rte_flow_action *p_actions_rx = actions;
 	int ret;
 	uint32_t i;
 	uint32_t flow_size;
+	int hairpin_flow = 0;
+	uint32_t hairpin_id = 0;
+	struct rte_flow_attr attr_tx = { .priority = 0 };
 
-	ret = flow_drv_validate(dev, attr, items, actions, external, error);
+	hairpin_flow = flow_check_hairpin_split(dev, attr, actions);
+	if (hairpin_flow > 0) {
+		if (hairpin_flow > MLX5_MAX_SPLIT_ACTIONS) {
+			rte_errno = EINVAL;
+			return NULL;
+		}
+		flow_hairpin_split(dev, actions, actions_rx.actions,
+				   actions_hairpin_tx.actions, items_tx.items,
+				   &hairpin_id);
+		p_actions_rx = actions_rx.actions;
+	}
+	ret = flow_drv_validate(dev, attr, items, p_actions_rx, external,
+				error);
 	if (ret < 0)
-		return NULL;
+		goto error_before_flow;
 	flow_size = sizeof(struct rte_flow);
-	rss = flow_get_rss_action(actions);
+	rss = flow_get_rss_action(p_actions_rx);
 	if (rss)
 		flow_size += RTE_ALIGN_CEIL(rss->queue_num * sizeof(uint16_t),
 					    sizeof(void *));
@@ -2490,11 +2723,13 @@ uint32_t mlx5_flow_adjust_priority(struct rte_eth_dev *dev, int32_t priority,
 	flow = rte_calloc(__func__, 1, flow_size, 0);
 	if (!flow) {
 		rte_errno = ENOMEM;
-		return NULL;
+		goto error_before_flow;
 	}
 	flow->drv_type = flow_get_drv_type(dev, attr);
 	flow->ingress = attr->ingress;
 	flow->transfer = attr->transfer;
+	if (hairpin_id != 0)
+		flow->hairpin_flow_id = hairpin_id;
 	assert(flow->drv_type > MLX5_FLOW_TYPE_MIN &&
 	       flow->drv_type < MLX5_FLOW_TYPE_MAX);
 	flow->queue = (void *)(flow + 1);
@@ -2515,7 +2750,7 @@ uint32_t mlx5_flow_adjust_priority(struct rte_eth_dev *dev, int32_t priority,
 	}
 	for (i = 0; i < buf->entries; ++i) {
 		dev_flow = flow_drv_prepare(flow, attr, buf->entry[i].pattern,
-					    actions, error);
+					    p_actions_rx, error);
 		if (!dev_flow)
 			goto error;
 		dev_flow->flow = flow;
@@ -2523,7 +2758,24 @@ uint32_t mlx5_flow_adjust_priority(struct rte_eth_dev *dev, int32_t priority,
 		LIST_INSERT_HEAD(&flow->dev_flows, dev_flow, next);
 		ret = flow_drv_translate(dev, dev_flow, attr,
 					 buf->entry[i].pattern,
-					 actions, error);
+					 p_actions_rx, error);
+		if (ret < 0)
+			goto error;
+	}
+	/* Create the tx flow. */
+	if (hairpin_flow) {
+		attr_tx.group = MLX5_HAIRPIN_TX_TABLE;
+		attr_tx.ingress = 0;
+		attr_tx.egress = 1;
+		dev_flow = flow_drv_prepare(flow, &attr_tx, items_tx.items,
+					    actions_hairpin_tx.actions, error);
+		if (!dev_flow)
+			goto error;
+		dev_flow->flow = flow;
+		LIST_INSERT_HEAD(&flow->dev_flows, dev_flow, next);
+		ret = flow_drv_translate(dev, dev_flow, &attr_tx,
+					 items_tx.items,
+					 actions_hairpin_tx.actions, error);
 		if (ret < 0)
 			goto error;
 	}
@@ -2535,8 +2787,16 @@ uint32_t mlx5_flow_adjust_priority(struct rte_eth_dev *dev, int32_t priority,
 	TAILQ_INSERT_TAIL(list, flow, next);
 	flow_rxq_flags_set(dev, flow);
 	return flow;
+error_before_flow:
+	if (hairpin_id)
+		mlx5_flow_id_release(priv->sh->flow_id_pool,
+				     hairpin_id);
+	return NULL;
 error:
 	ret = rte_errno; /* Save rte_errno before cleanup. */
+	if (flow->hairpin_flow_id)
+		mlx5_flow_id_release(priv->sh->flow_id_pool,
+				     flow->hairpin_flow_id);
 	assert(flow);
 	flow_drv_destroy(dev, flow);
 	rte_free(flow);
@@ -2626,12 +2886,17 @@ struct rte_flow *
 flow_list_destroy(struct rte_eth_dev *dev, struct mlx5_flows *list,
 		  struct rte_flow *flow)
 {
+	struct mlx5_priv *priv = dev->data->dev_private;
+
 	/*
 	 * Update RX queue flags only if port is started, otherwise it is
 	 * already clean.
 	 */
 	if (dev->data->dev_started)
 		flow_rxq_flags_trim(dev, flow);
+	if (flow->hairpin_flow_id)
+		mlx5_flow_id_release(priv->sh->flow_id_pool,
+				     flow->hairpin_flow_id);
 	flow_drv_destroy(dev, flow);
 	TAILQ_REMOVE(list, flow, next);
 	rte_free(flow->fdir);
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index bb67380..90a289e 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -434,6 +434,8 @@ struct mlx5_flow {
 	struct rte_flow *flow; /**< Pointer to the main flow. */
 	uint64_t layers;
 	/**< Bit-fields of present layers, see MLX5_FLOW_LAYER_*. */
+	uint64_t actions;
+	/**< Bit-fields of detected actions, see MLX5_FLOW_ACTION_*. */
 	union {
 #ifdef HAVE_IBV_FLOW_DV_SUPPORT
 		struct mlx5_flow_dv dv;
@@ -455,12 +457,11 @@ struct rte_flow {
 	uint16_t (*queue)[]; /**< Destination queues to redirect traffic to. */
 	LIST_HEAD(dev_flows, mlx5_flow) dev_flows;
 	/**< Device flows that are part of the flow. */
-	uint64_t actions;
-	/**< Bit-fields of detected actions, see MLX5_FLOW_ACTION_*. */
 	struct mlx5_fdir *fdir; /**< Pointer to associated FDIR if any. */
 	uint8_t ingress; /**< 1 if the flow is ingress. */
 	uint32_t group; /**< The group index. */
 	uint8_t transfer; /**< 1 if the flow is E-Switch flow. */
+	uint32_t hairpin_flow_id; /**< The flow id used for hairpin. */
 };
 
 typedef int (*mlx5_flow_validate_t)(struct rte_eth_dev *dev,
@@ -504,15 +505,6 @@ struct mlx5_flow_driver_ops {
 #define MLX5_CNT_CONTAINER_UNUSED(sh, batch, thread) (&(sh)->cmng.ccont \
 	[(~((sh)->cmng.mhi[batch] >> (thread)) & 0x1) * 2 + (batch)])
 
-/* ID generation structure. */
-struct mlx5_flow_id_pool {
-	uint32_t *free_arr; /**< Pointer to the a array of free values. */
-	uint32_t base_index;
-	/**< The next index that can be used without any free elements. */
-	uint32_t *curr; /**< Pointer to the index to pop. */
-	uint32_t *last; /**< Pointer to the last element in the empty arrray. */
-};
-
 /* mlx5_flow.c */
 
 struct mlx5_flow_id_pool *mlx5_flow_id_pool_alloc(void);
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index c7a3f6b..367e632 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -5763,7 +5763,7 @@ struct field_modify_info modify_tcp[] = {
 			modify_action_position = actions_n++;
 	}
 	dev_flow->dv.actions_n = actions_n;
-	flow->actions = action_flags;
+	dev_flow->actions = action_flags;
 	for (; items->type != RTE_FLOW_ITEM_TYPE_END; items++) {
 		int tunnel = !!(item_flags & MLX5_FLOW_LAYER_TUNNEL);
 		int item_type = items->type;
@@ -5985,7 +5985,7 @@ struct field_modify_info modify_tcp[] = {
 	LIST_FOREACH(dev_flow, &flow->dev_flows, next) {
 		dv = &dev_flow->dv;
 		n = dv->actions_n;
-		if (flow->actions & MLX5_FLOW_ACTION_DROP) {
+		if (dev_flow->actions & MLX5_FLOW_ACTION_DROP) {
 			if (flow->transfer) {
 				dv->actions[n++] = priv->sh->esw_drop_action;
 			} else {
@@ -6000,7 +6000,7 @@ struct field_modify_info modify_tcp[] = {
 				}
 				dv->actions[n++] = dv->hrxq->action;
 			}
-		} else if (flow->actions &
+		} else if (dev_flow->actions &
 			   (MLX5_FLOW_ACTION_QUEUE | MLX5_FLOW_ACTION_RSS)) {
 			struct mlx5_hrxq *hrxq;
 
@@ -6056,7 +6056,7 @@ struct field_modify_info modify_tcp[] = {
 	LIST_FOREACH(dev_flow, &flow->dev_flows, next) {
 		struct mlx5_flow_dv *dv = &dev_flow->dv;
 		if (dv->hrxq) {
-			if (flow->actions & MLX5_FLOW_ACTION_DROP)
+			if (dev_flow->actions & MLX5_FLOW_ACTION_DROP)
 				mlx5_hrxq_drop_release(dev);
 			else
 				mlx5_hrxq_release(dev, dv->hrxq);
@@ -6290,7 +6290,7 @@ struct field_modify_info modify_tcp[] = {
 			dv->flow = NULL;
 		}
 		if (dv->hrxq) {
-			if (flow->actions & MLX5_FLOW_ACTION_DROP)
+			if (dev_flow->actions & MLX5_FLOW_ACTION_DROP)
 				mlx5_hrxq_drop_release(dev);
 			else
 				mlx5_hrxq_release(dev, dv->hrxq);
diff --git a/drivers/net/mlx5/mlx5_flow_verbs.c b/drivers/net/mlx5/mlx5_flow_verbs.c
index 23110f2..fd27f6c 100644
--- a/drivers/net/mlx5/mlx5_flow_verbs.c
+++ b/drivers/net/mlx5/mlx5_flow_verbs.c
@@ -191,7 +191,7 @@
 {
 #if defined(HAVE_IBV_DEVICE_COUNTERS_SET_V42) || \
 	defined(HAVE_IBV_DEVICE_COUNTERS_SET_V45)
-	if (flow->actions & MLX5_FLOW_ACTION_COUNT) {
+	if (flow->counter->cs) {
 		struct rte_flow_query_count *qc = data;
 		uint64_t counters[2] = {0, 0};
 #if defined(HAVE_IBV_DEVICE_COUNTERS_SET_V42)
@@ -1410,7 +1410,6 @@
 		     const struct rte_flow_action actions[],
 		     struct rte_flow_error *error)
 {
-	struct rte_flow *flow = dev_flow->flow;
 	uint64_t item_flags = 0;
 	uint64_t action_flags = 0;
 	uint64_t priority = attr->priority;
@@ -1460,7 +1459,7 @@
 						  "action not supported");
 		}
 	}
-	flow->actions = action_flags;
+	dev_flow->actions = action_flags;
 	for (; items->type != RTE_FLOW_ITEM_TYPE_END; items++) {
 		int tunnel = !!(item_flags & MLX5_FLOW_LAYER_TUNNEL);
 
@@ -1592,7 +1591,7 @@
 			verbs->flow = NULL;
 		}
 		if (verbs->hrxq) {
-			if (flow->actions & MLX5_FLOW_ACTION_DROP)
+			if (dev_flow->actions & MLX5_FLOW_ACTION_DROP)
 				mlx5_hrxq_drop_release(dev);
 			else
 				mlx5_hrxq_release(dev, verbs->hrxq);
@@ -1656,7 +1655,7 @@
 
 	LIST_FOREACH(dev_flow, &flow->dev_flows, next) {
 		verbs = &dev_flow->verbs;
-		if (flow->actions & MLX5_FLOW_ACTION_DROP) {
+		if (dev_flow->actions & MLX5_FLOW_ACTION_DROP) {
 			verbs->hrxq = mlx5_hrxq_drop_new(dev);
 			if (!verbs->hrxq) {
 				rte_flow_error_set
@@ -1717,7 +1716,7 @@
 	LIST_FOREACH(dev_flow, &flow->dev_flows, next) {
 		verbs = &dev_flow->verbs;
 		if (verbs->hrxq) {
-			if (flow->actions & MLX5_FLOW_ACTION_DROP)
+			if (dev_flow->actions & MLX5_FLOW_ACTION_DROP)
 				mlx5_hrxq_drop_release(dev);
 			else
 				mlx5_hrxq_release(dev, verbs->hrxq);
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index a8ff8b2..c39118a 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -2097,6 +2097,32 @@ struct mlx5_rxq_ctrl *
 }
 
 /**
+ * Get a Rx queue type.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param idx
+ *   Rx queue index.
+ *
+ * @return
+ *   The Rx queue type.
+ */
+enum mlx5_rxq_type
+mlx5_rxq_get_type(struct rte_eth_dev *dev, uint16_t idx)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_rxq_ctrl *rxq_ctrl = NULL;
+
+	if ((*priv->rxqs)[idx]) {
+		rxq_ctrl = container_of((*priv->rxqs)[idx],
+					struct mlx5_rxq_ctrl,
+					rxq);
+		return rxq_ctrl->type;
+	}
+	return MLX5_RXQ_TYPE_UNDEFINED;
+}
+
+/**
  * Create an indirection table.
  *
  * @param dev
diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index 271b648..d4ba25f 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -166,6 +166,7 @@ enum mlx5_rxq_obj_type {
 enum mlx5_rxq_type {
 	MLX5_RXQ_TYPE_STANDARD, /* Standard Rx queue. */
 	MLX5_RXQ_TYPE_HAIRPIN, /* Hairpin Rx queue. */
+	MLX5_RXQ_TYPE_UNDEFINED,
 };
 
 /* Verbs/DevX Rx queue elements. */
@@ -406,6 +407,7 @@ struct mlx5_hrxq *mlx5_hrxq_get(struct rte_eth_dev *dev,
 				const uint16_t *queues, uint32_t queues_n);
 int mlx5_hrxq_release(struct rte_eth_dev *dev, struct mlx5_hrxq *hxrq);
 int mlx5_hrxq_verify(struct rte_eth_dev *dev);
+enum mlx5_rxq_type mlx5_rxq_get_type(struct rte_eth_dev *dev, uint16_t idx);
 struct mlx5_hrxq *mlx5_hrxq_drop_new(struct rte_eth_dev *dev);
 void mlx5_hrxq_drop_release(struct rte_eth_dev *dev);
 uint64_t mlx5_get_rx_port_offloads(void);
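
A note on how the two pieces above fit together: flow_hairpin_split()
rewrites one user rule into an Rx rule (the original actions minus the
encap, plus an internal TAG action carrying a generated flow id) and a
Tx rule on MLX5_HAIRPIN_TX_TABLE that matches that tag and performs the
encap. mlx5_flow_id_pool is the allocator for those ids: release pushes
an id onto free_arr, get pops one if available and otherwise mints
base_index++. Below is a minimal user-space sketch of the same free-list
idea (not the mlx5 code itself; growth of the free array and the exact
id-space handling are simplified):

#include <stdint.h>
#include <stdlib.h>

struct id_pool {
	uint32_t *free_arr;  /* stack of released ids */
	uint32_t *curr;      /* one past the top of the stack */
	uint32_t *last;      /* end of the free_arr storage */
	uint32_t base_index; /* next never-used id */
};

static struct id_pool *
id_pool_alloc(size_t n)
{
	struct id_pool *pool = calloc(1, sizeof(*pool));

	if (pool == NULL)
		return NULL;
	pool->free_arr = malloc(n * sizeof(*pool->free_arr));
	if (pool->free_arr == NULL) {
		free(pool);
		return NULL;
	}
	pool->curr = pool->free_arr; /* empty stack */
	pool->last = pool->free_arr + n;
	return pool;
}

static int
id_get(struct id_pool *pool, uint32_t *id)
{
	if (pool->curr == pool->free_arr) {
		*id = pool->base_index++; /* nothing released yet, mint one */
		return 0;
	}
	*id = *(--pool->curr); /* reuse the most recently released id */
	return 0;
}

static int
id_release(struct id_pool *pool, uint32_t id)
{
	if (pool->curr == pool->last)
		return -1; /* stack full; a real pool would grow free_arr */
	*pool->curr++ = id;
	return 0;
}
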
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 186+ messages in thread

* [dpdk-dev] [PATCH v4 15/15] doc: add hairpin feature
  2019-10-17 15:32 ` [dpdk-dev] [PATCH v4 00/15] " Ori Kam
                     ` (13 preceding siblings ...)
  2019-10-17 15:32   ` [dpdk-dev] [PATCH v4 14/15] net/mlx5: split hairpin flows Ori Kam
@ 2019-10-17 15:32   ` Ori Kam
  2019-10-18 19:07   ` [dpdk-dev] [PATCH v4 00/15] " Ferruh Yigit
  15 siblings, 0 replies; 186+ messages in thread
From: Ori Kam @ 2019-10-17 15:32 UTC (permalink / raw)
  To: John McNamara, Marko Kovacevic; +Cc: dev, orika, jingjing.wu, stephen

This commit adds the hairpin feature to the release notes.

Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>

---
 doc/guides/rel_notes/release_19_11.rst | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/doc/guides/rel_notes/release_19_11.rst b/doc/guides/rel_notes/release_19_11.rst
index cd4e350..2a27cb4 100644
--- a/doc/guides/rel_notes/release_19_11.rst
+++ b/doc/guides/rel_notes/release_19_11.rst
@@ -87,6 +87,11 @@ New Features
 
   Added support for the ``RTE_ETH_DEV_CLOSE_REMOVE`` flag.
 
+* **Added hairpin queue.**
+
+  On supported NICs, we can now set up hairpin queues, which send packets
+  received from the wire back to the wire without application intervention.
+
 
 Removed Items
 -------------
@@ -286,4 +291,5 @@ Tested Platforms
   * Added support for VLAN push flow offload command.
   * Added support for VLAN set PCP offload command.
   * Added support for VLAN set VID offload command.
+  * Added hairpin support.
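
For reference, a minimal usage sketch following the v4 API in this series
(single port, one Rx queue hairpinned to one Tx queue; the queue ids and
helper name are illustrative). Traffic is then steered into the hairpin
Rx queue with an ordinary rte_flow rule whose QUEUE (or RSS) action
targets it:

#include <rte_ethdev.h>

static int
setup_hairpin_pair(uint16_t port_id, uint16_t rxq_id, uint16_t txq_id)
{
	/* Each side names its peer: the Rx queue points at the Tx queue
	 * and vice versa. */
	struct rte_eth_hairpin_conf rxq_conf = {
		.peer_n = 1,
		.peers[0] = { .port = port_id, .queue = txq_id },
	};
	struct rte_eth_hairpin_conf txq_conf = {
		.peer_n = 1,
		.peers[0] = { .port = port_id, .queue = rxq_id },
	};

	/* nb_desc == 0 asks the PMD for its default ring size. Both calls
	 * must come after rte_eth_dev_configure() and before
	 * rte_eth_dev_start(). */
	if (rte_eth_rx_hairpin_queue_setup(port_id, rxq_id, 0, &rxq_conf) != 0)
		return -1;
	return rte_eth_tx_hairpin_queue_setup(port_id, txq_id, 0, &txq_conf);
}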
 
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [dpdk-dev] [PATCH v4 01/15] ethdev: move queue state defines to private file
  2019-10-17 15:32   ` [dpdk-dev] [PATCH v4 01/15] ethdev: move queue state defines to private file Ori Kam
@ 2019-10-17 15:37     ` Stephen Hemminger
  2019-10-22 10:59     ` Andrew Rybchenko
  1 sibling, 0 replies; 186+ messages in thread
From: Stephen Hemminger @ 2019-10-17 15:37 UTC (permalink / raw)
  To: Ori Kam; +Cc: Thomas Monjalon, Ferruh Yigit, Andrew Rybchenko, dev, jingjing.wu

On Thu, 17 Oct 2019 15:32:03 +0000
Ori Kam <orika@mellanox.com> wrote:

>  /**
> + * RX/TX queue states
> + */
> +#define RTE_ETH_QUEUE_STATE_STOPPED 0
> +#define RTE_ETH_QUEUE_STATE_STARTED 1

Why not make it an enum?
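
For instance (just a sketch, keeping today's 0/1 values):

enum rte_eth_queue_state {
	RTE_ETH_QUEUE_STATE_STOPPED = 0,
	RTE_ETH_QUEUE_STATE_STARTED = 1,
};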

^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [dpdk-dev] [PATCH v4 02/15] ethdev: add support for hairpin queue
  2019-10-17 15:32   ` [dpdk-dev] [PATCH v4 02/15] ethdev: add support for hairpin queue Ori Kam
@ 2019-10-17 21:01     ` Thomas Monjalon
  2019-10-22 11:37     ` Andrew Rybchenko
  2019-10-23  7:04     ` Thomas Monjalon
  2 siblings, 0 replies; 186+ messages in thread
From: Thomas Monjalon @ 2019-10-17 21:01 UTC (permalink / raw)
  To: Ori Kam; +Cc: dev

17/10/2019 17:32, Ori Kam:
> V4:
>  - update according to ML comments.
> 
> V3:
>  - update according to ML comments.
> 
> V2:
>  - update according to ML comments.

I would prefer to see a summary of the changes in the changelog,
so we can easily track the progress.



^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [dpdk-dev] [PATCH v4 00/15] add hairpin feature
  2019-10-17 15:32 ` [dpdk-dev] [PATCH v4 00/15] " Ori Kam
                     ` (14 preceding siblings ...)
  2019-10-17 15:32   ` [dpdk-dev] [PATCH v4 15/15] doc: add hairpin feature Ori Kam
@ 2019-10-18 19:07   ` Ferruh Yigit
  15 siblings, 0 replies; 186+ messages in thread
From: Ferruh Yigit @ 2019-10-18 19:07 UTC (permalink / raw)
  To: Ori Kam
  Cc: dev, jingjing.wu, stephen, wenzhuo.lu, bernard.iremonger, thomas,
	arybchenko, viacheslavo

On 10/17/2019 4:32 PM, Ori Kam wrote:
> This patch set implements the hairpin feature.
> The hairpin feature was introduced in RFC[1]
> 
> The hairpin feature (different name can be forward) acts as "bump on the wire",
> meaning that a packet that is received from the wire can be modified using
> offloaded action and then sent back to the wire without application intervention
> which save CPU cycles.
> 
> The hairpin is the inverse function of loopback in which application
> sends a packet then it is received again by the
> application without being sent to the wire.
> 
> The hairpin can be used by a number of different NVF, for example load
> balancer, gateway and so on.
> 
> As can be seen from the hairpin description, hairpin is basically RX queue
> connected to TX queue.
> 
> During the design phase I was thinking of two ways to implement this
> feature the first one is adding a new rte flow action. and the second
> one is create a special kind of queue.
> 
> The advantages of using the queue approch:
> 1. More control for the application. queue depth (the memory size that
> should be used).
> 2. Enable QoS. QoS is normaly a parametr of queue, so in this approch it
> will be easy to integrate with such system.
> 3. Native integression with the rte flow API. Just setting the target
> queue/rss to hairpin queue, will result that the traffic will be routed
> to the hairpin queue.
> 4. Enable queue offloading.
> 
> Each hairpin Rxq can be connected Txq / number of Txqs which can belong to a
> different ports assuming the PMD supports it. The same goes the other
> way each hairpin Txq can be connected to one or more Rxqs.
> This is the reason that both the Txq setup and Rxq setup are getting the
> hairpin configuration structure.
> 
> From PMD prespctive the number of Rxq/Txq is the total of standard
> queues + hairpin queues.
> 
> To configure hairpin queue the user should call
> rte_eth_rx_hairpin_queue_setup / rte_eth_tx_hairpin_queue_setup insteed
> of the normal queue setup functions.
> 
> The hairpin queues are not part of the normal RSS functiosn.
> 
> To use the queues the user simply create a flow that points to RSS/queue
> actions that are hairpin queues.
> The reason for selecting 2 new functions for hairpin queue setup are:
> 1. avoid API break.
> 2. avoid extra and unused parameters.
> 
> 
> This series must be applied after series[2]

This dependency is already merged, right? If so, it can be dropped from the cover letter.

> 
> [1] https://inbox.dpdk.org/dev/1565703468-55617-1-git-send-email-orika@mellanox.com/
> [2] https://inbox.dpdk.org/dev/1569398015-6027-1-git-send-email-viacheslavo@mellanox.com/
> 
> Cc: wenzhuo.lu@intel.com
> Cc: bernard.iremonger@intel.com
> Cc: thomas@monjalon.net
> Cc: ferruh.yigit@intel.com
> Cc: arybchenko@solarflare.com
> Cc: viacheslavo@mellanox.com
> 
> ------
> V4:
>  - update according to comments from ML.
> 
> V3:
>  - update according to comments from ML.
> 
> V2:
>  - update according to comments from ML.
> 
> Ori Kam (15):
>   ethdev: move queue state defines to private file
>   ethdev: add support for hairpin queue
>   net/mlx5: query hca hairpin capabilities
>   net/mlx5: support Rx hairpin queues
>   net/mlx5: prepare txq to work with different types
>   net/mlx5: support Tx hairpin queues
>   net/mlx5: add get hairpin capabilities
>   app/testpmd: add hairpin support
>   net/mlx5: add hairpin binding function
>   net/mlx5: add support for hairpin hrxq
>   net/mlx5: add internal tag item and action
>   net/mlx5: add id generation function
>   net/mlx5: add default flows for hairpin
>   net/mlx5: split hairpin flows
>   doc: add hairpin feature

There are build errors, as the patchwork status also shows.

^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [dpdk-dev] [PATCH v4 01/15] ethdev: move queue state defines to private file
  2019-10-17 15:32   ` [dpdk-dev] [PATCH v4 01/15] ethdev: move queue state defines to private file Ori Kam
  2019-10-17 15:37     ` Stephen Hemminger
@ 2019-10-22 10:59     ` Andrew Rybchenko
  1 sibling, 0 replies; 186+ messages in thread
From: Andrew Rybchenko @ 2019-10-22 10:59 UTC (permalink / raw)
  To: Ori Kam, Thomas Monjalon, Ferruh Yigit; +Cc: dev, jingjing.wu, stephen

On 10/17/19 6:32 PM, Ori Kam wrote:
> The queue state defines are internal to the DPDK.
> This commit moves them to a private header file.
>
> Signed-off-by: Ori Kam <orika@mellanox.com>

Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>

Yes, there is a lot of room for improvement in queue state
management etc., but this one is a step in the right direction to
remove it from the public API.


^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [dpdk-dev] [PATCH v4 02/15] ethdev: add support for hairpin queue
  2019-10-17 15:32   ` [dpdk-dev] [PATCH v4 02/15] ethdev: add support for hairpin queue Ori Kam
  2019-10-17 21:01     ` Thomas Monjalon
@ 2019-10-22 11:37     ` Andrew Rybchenko
  2019-10-23  6:23       ` Ori Kam
  2019-10-23  7:04     ` Thomas Monjalon
  2 siblings, 1 reply; 186+ messages in thread
From: Andrew Rybchenko @ 2019-10-22 11:37 UTC (permalink / raw)
  To: Ori Kam, Thomas Monjalon, Ferruh Yigit; +Cc: dev, jingjing.wu, stephen

Hi Ori,

see my notes below.

A generic note is that we have a strict policy about Rx/Tx (not RX/TX) in
commit messages, but I'd like to follow it in comments and log messages
too, at least in new code. It is already a mixture in the existing code.

On 10/17/19 6:32 PM, Ori Kam wrote:
> This commit introduces the hairpin queue type.
>
> The hairpin queue is built from an Rx queue bound to a Tx queue.
> It is used to offload traffic coming from the wire and redirect it back
> to the wire.
>
> There are 3 new functions:
> - rte_eth_dev_hairpin_capability_get
> - rte_eth_rx_hairpin_queue_setup
> - rte_eth_tx_hairpin_queue_setup
>
> In order to use the queue, there is a need to create rte_flow
> with queue / RSS action that targets one or more of the Rx queues.
>
> Signed-off-by: Ori Kam <orika@mellanox.com>
>
> ---
> V4:
>   - update according to ML comments.
>
> V3:
>   - update according to ML comments.
>
> V2:
>   - update according to ML comments.
>
> ---
>   lib/librte_ethdev/rte_ethdev.c           | 229 +++++++++++++++++++++++++++++++
>   lib/librte_ethdev/rte_ethdev.h           | 143 ++++++++++++++++++-
>   lib/librte_ethdev/rte_ethdev_core.h      |  91 +++++++++++-
>   lib/librte_ethdev/rte_ethdev_driver.h    |   1 +
>   lib/librte_ethdev/rte_ethdev_version.map |   5 +
>   5 files changed, 461 insertions(+), 8 deletions(-)
>
> diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
> index af82360..10a8bf2 100644
> --- a/lib/librte_ethdev/rte_ethdev.c
> +++ b/lib/librte_ethdev/rte_ethdev.c
> @@ -904,6 +904,14 @@ struct rte_eth_dev *
>   
>   	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_start, -ENOTSUP);
>   
> +	if (dev->data->rx_queue_state[rx_queue_id] ==
> +	    RTE_ETH_QUEUE_STATE_HAIRPIN) {

May I suggest adding a static helper function to check
whether a device Rx queue is a hairpin queue, plus a similar
function for Tx. It would make the changes around
rx_queue_state less intrusive.
These functions should be used everywhere below in
the code.
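
Something like this, for example (the name is illustrative):

static inline int
eth_dev_is_rx_hairpin_queue(struct rte_eth_dev *dev, uint16_t queue_id)
{
	return dev->data->rx_queue_state[queue_id] ==
	       RTE_ETH_QUEUE_STATE_HAIRPIN;
}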

> +		RTE_ETHDEV_LOG(INFO,
> +			"Queue %"PRIu16" of device with port_id=%"PRIu16" is hairpin queue\n",

The message is the same for many cases, which is bad since it does
not make it easy to identify where it was logged.
It should be mentioned here that this is an attempt to start an Rx queue.

> +			rx_queue_id, port_id);
> +		return -EINVAL;
> +	}
> +
>   	if (dev->data->rx_queue_state[rx_queue_id] != RTE_ETH_QUEUE_STATE_STOPPED) {
>   		RTE_ETHDEV_LOG(INFO,
>   			"Queue %"PRIu16" of device with port_id=%"PRIu16" already started\n",
> @@ -931,6 +939,14 @@ struct rte_eth_dev *
>   
>   	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_stop, -ENOTSUP);
>   
> +	if (dev->data->rx_queue_state[rx_queue_id] ==
> +	    RTE_ETH_QUEUE_STATE_HAIRPIN) {
> +		RTE_ETHDEV_LOG(INFO,
> +			"Queue %"PRIu16" of device with port_id=%"PRIu16" is hairpin queue\n",
> +			rx_queue_id, port_id);
> +		return -EINVAL;
> +	}
> +
>   	if (dev->data->rx_queue_state[rx_queue_id] == RTE_ETH_QUEUE_STATE_STOPPED) {
>   		RTE_ETHDEV_LOG(INFO,
>   			"Queue %"PRIu16" of device with port_id=%"PRIu16" already stopped\n",
> @@ -964,6 +980,14 @@ struct rte_eth_dev *
>   
>   	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_queue_start, -ENOTSUP);
>   
> +	if (dev->data->tx_queue_state[tx_queue_id] ==
> +	   RTE_ETH_QUEUE_STATE_HAIRPIN) {
> +		RTE_ETHDEV_LOG(INFO,
> +			"Queue %"PRIu16" of device with port_id=%"PRIu16" is hairpin queue\n",
> +			tx_queue_id, port_id);
> +		return -EINVAL;
> +	}
> +
>   	if (dev->data->tx_queue_state[tx_queue_id] != RTE_ETH_QUEUE_STATE_STOPPED) {
>   		RTE_ETHDEV_LOG(INFO,
>   			"Queue %"PRIu16" of device with port_id=%"PRIu16" already started\n",
> @@ -989,6 +1013,14 @@ struct rte_eth_dev *
>   
>   	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_queue_stop, -ENOTSUP);
>   
> +	if (dev->data->tx_queue_state[tx_queue_id] ==
> +	    RTE_ETH_QUEUE_STATE_HAIRPIN) {
> +		RTE_ETHDEV_LOG(INFO,
> +			"Queue %"PRIu16" of device with port_id=%"PRIu16" is hairpin queue\n",
> +			tx_queue_id, port_id);
> +		return -EINVAL;
> +	}
> +
>   	if (dev->data->tx_queue_state[tx_queue_id] == RTE_ETH_QUEUE_STATE_STOPPED) {
>   		RTE_ETHDEV_LOG(INFO,
>   			"Queue %"PRIu16" of device with port_id=%"PRIu16" already stopped\n",
> @@ -1758,6 +1790,81 @@ struct rte_eth_dev *
>   }
>   
>   int
> +rte_eth_rx_hairpin_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
> +			       uint16_t nb_rx_desc,
> +			       const struct rte_eth_hairpin_conf *conf)
> +{
> +	int ret;
> +	struct rte_eth_dev *dev;
> +	struct rte_eth_hairpin_cap cap;
> +	void **rxq;
> +	int i;
> +	int count = 0;
> +
> +	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
> +
> +	dev = &rte_eth_devices[port_id];
> +	if (rx_queue_id >= dev->data->nb_rx_queues) {
> +		RTE_ETHDEV_LOG(ERR, "Invalid RX queue_id=%u\n", rx_queue_id);
> +		return -EINVAL;
> +	}
> +	ret = rte_eth_dev_hairpin_capability_get(port_id, &cap);
> +	if (ret != 0)
> +		return ret;
> +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_hairpin_queue_setup,
> +				-ENOTSUP);
> +	/* Use default specified by driver, if nb_rx_desc is zero */
> +	if (nb_rx_desc == 0)
> +		nb_rx_desc = cap.max_nb_desc;
> +	if (nb_rx_desc > cap.max_nb_desc) {
> +		RTE_ETHDEV_LOG(ERR,
> +			       "Invalid value for nb_rx_desc(=%hu), should be: "
> +			       "<= %hu", nb_rx_desc, cap.max_nb_desc);

Please, don't split format string
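
I.e. keep the format string whole, e.g. (same call, arguments unchanged):

RTE_ETHDEV_LOG(ERR,
	       "Invalid value for nb_rx_desc(=%hu), should be: <= %hu\n",
	       nb_rx_desc, cap.max_nb_desc);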

> +		return -EINVAL;
> +	}
> +	if (conf->peer_n > cap.max_rx_2_tx) {
> +		RTE_ETHDEV_LOG(ERR,
> +			       "Invalid value for number of peers(=%hu), "
> +			       "should be: <= %hu", conf->peer_n,

Please, don't split format string.
Also make the message unique. Right now it is the same for Rx and Tx.

> +			       cap.max_rx_2_tx);
> +		return -EINVAL;
> +	}
> +	if (conf->peer_n == 0) {
> +		RTE_ETHDEV_LOG(ERR,
> +			       "Invalid value for number of peers(=%hu), "
> +			       "should be: > 0", conf->peer_n);

Please, don't split format string
Also make the message unique. Right now it is the same for Rx and Tx.

> +		return -EINVAL;
> +	}
> +	if (cap.max_n_queues != UINT16_MAX) {
> +		for (i = 0; i < dev->data->nb_rx_queues; i++) {
> +			if (dev->data->rx_queue_state[i] ==
> +			    RTE_ETH_QUEUE_STATE_HAIRPIN)
> +				count++;
> +		}
> +		if (count > cap.max_n_queues) {
> +			RTE_ETHDEV_LOG(ERR,
> +				       "To many Rx hairpin queues %d", count);
> +			return -EINVAL;
> +		}
> +	}
> +	if (dev->data->dev_started)
> +		return -EBUSY;
> +	rxq = dev->data->rx_queues;
> +	if (rxq[rx_queue_id] != NULL) {
> +		RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_release,
> +					-ENOTSUP);
> +		(*dev->dev_ops->rx_queue_release)(rxq[rx_queue_id]);
> +		rxq[rx_queue_id] = NULL;
> +	}
> +	ret = (*dev->dev_ops->rx_hairpin_queue_setup)(dev, rx_queue_id,
> +						      nb_rx_desc, conf);
> +	if (ret == 0)
> +		dev->data->rx_queue_state[rx_queue_id] =
> +			RTE_ETH_QUEUE_STATE_HAIRPIN;
> +	return eth_err(port_id, ret);
> +}
> +
> +int
>   rte_eth_tx_queue_setup(uint16_t port_id, uint16_t tx_queue_id,
>   		       uint16_t nb_tx_desc, unsigned int socket_id,
>   		       const struct rte_eth_txconf *tx_conf)
> @@ -1856,6 +1963,80 @@ struct rte_eth_dev *
>   		       tx_queue_id, nb_tx_desc, socket_id, &local_conf));
>   }
>   
> +int
> +rte_eth_tx_hairpin_queue_setup(uint16_t port_id, uint16_t tx_queue_id,
> +			       uint16_t nb_tx_desc,
> +			       const struct rte_eth_hairpin_conf *conf)
> +{
> +	struct rte_eth_dev *dev;
> +	struct rte_eth_hairpin_cap cap;
> +	void **txq;
> +	int i;
> +	int count = 0;
> +	int ret;
> +
> +	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
> +	dev = &rte_eth_devices[port_id];
> +	if (tx_queue_id >= dev->data->nb_tx_queues) {
> +		RTE_ETHDEV_LOG(ERR, "Invalid TX queue_id=%u\n", tx_queue_id);
> +		return -EINVAL;
> +	}
> +	ret = rte_eth_dev_hairpin_capability_get(port_id, &cap);
> +	if (ret != 0)
> +		return ret;
> +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_hairpin_queue_setup,
> +				-ENOTSUP);
> +	/* Use default specified by driver, if nb_tx_desc is zero */
> +	if (nb_tx_desc == 0)
> +		nb_tx_desc = cap.max_nb_desc;
> +	if (nb_tx_desc > cap.max_nb_desc) {
> +		RTE_ETHDEV_LOG(ERR,
> +			       "Invalid value for nb_tx_desc(=%hu), should be: "
> +			       "<= %hu", nb_tx_desc, cap.max_nb_desc);

Please, don't split format string

> +		return -EINVAL;
> +	}
> +	if (conf->peer_n > cap.max_tx_2_rx) {
> +		RTE_ETHDEV_LOG(ERR,
> +			       "Invalid value for number of peers(=%hu), "
> +			       "should be: <= %hu", conf->peer_n,

Please, don't split format string

> +			       cap.max_tx_2_rx);
> +		return -EINVAL;
> +	}
> +	if (conf->peer_n == 0) {
> +		RTE_ETHDEV_LOG(ERR,
> +			       "Invalid value for number of peers(=%hu), "
> +			       "should be: > 0", conf->peer_n);

Please, don't split format string

> +		return -EINVAL;
> +	}
> +	if (cap.max_n_queues != UINT16_MAX) {
> +		for (i = 0; i < dev->data->nb_tx_queues; i++) {
> +			if (dev->data->tx_queue_state[i] ==
> +			    RTE_ETH_QUEUE_STATE_HAIRPIN)
> +				count++;
> +		}
> +		if (count > cap.max_n_queues) {
> +			RTE_ETHDEV_LOG(ERR,
> +				       "To many Rx hairpin queues %d", count);
> +			return -EINVAL;
> +		}
> +	}
> +	if (dev->data->dev_started)
> +		return -EBUSY;
> +	txq = dev->data->tx_queues;
> +	if (txq[tx_queue_id] != NULL) {
> +		RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_queue_release,
> +					-ENOTSUP);
> +		(*dev->dev_ops->tx_queue_release)(txq[tx_queue_id]);
> +		txq[tx_queue_id] = NULL;
> +	}
> +	ret = (*dev->dev_ops->tx_hairpin_queue_setup)
> +		(dev, tx_queue_id, nb_tx_desc, conf);
> +	if (ret == 0)
> +		dev->data->tx_queue_state[tx_queue_id] =
> +			RTE_ETH_QUEUE_STATE_HAIRPIN;
> +	return eth_err(port_id, ret);
> +}
> +
>   void
>   rte_eth_tx_buffer_drop_callback(struct rte_mbuf **pkts, uint16_t unsent,
>   		void *userdata __rte_unused)
> @@ -3981,12 +4162,20 @@ int rte_eth_set_queue_rate_limit(uint16_t port_id, uint16_t queue_idx,
>   	rte_errno = ENOTSUP;
>   	return NULL;
>   #endif
> +	struct rte_eth_dev *dev;
> +
>   	/* check input parameters */
>   	if (!rte_eth_dev_is_valid_port(port_id) || fn == NULL ||
>   		    queue_id >= rte_eth_devices[port_id].data->nb_rx_queues) {
>   		rte_errno = EINVAL;
>   		return NULL;
>   	}
> +	dev = &rte_eth_devices[port_id];
> +	if (dev->data->rx_queue_state[queue_id] ==
> +	    RTE_ETH_QUEUE_STATE_HAIRPIN) {
> +		rte_errno = EINVAL;
> +		return NULL;
> +	}
>   	struct rte_eth_rxtx_callback *cb = rte_zmalloc(NULL, sizeof(*cb), 0);
>   
>   	if (cb == NULL) {
> @@ -4058,6 +4247,8 @@ int rte_eth_set_queue_rate_limit(uint16_t port_id, uint16_t queue_idx,
>   	rte_errno = ENOTSUP;
>   	return NULL;
>   #endif
> +	struct rte_eth_dev *dev;
> +
>   	/* check input parameters */
>   	if (!rte_eth_dev_is_valid_port(port_id) || fn == NULL ||
>   		    queue_id >= rte_eth_devices[port_id].data->nb_tx_queues) {
> @@ -4065,6 +4256,13 @@ int rte_eth_set_queue_rate_limit(uint16_t port_id, uint16_t queue_idx,
>   		return NULL;
>   	}
>   
> +	dev = &rte_eth_devices[port_id];
> +	if (dev->data->tx_queue_state[queue_id] ==
> +	    RTE_ETH_QUEUE_STATE_HAIRPIN) {
> +		rte_errno = EINVAL;
> +		return NULL;
> +	}
> +
>   	struct rte_eth_rxtx_callback *cb = rte_zmalloc(NULL, sizeof(*cb), 0);
>   
>   	if (cb == NULL) {
> @@ -4180,6 +4378,14 @@ int rte_eth_set_queue_rate_limit(uint16_t port_id, uint16_t queue_idx,
>   
>   	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rxq_info_get, -ENOTSUP);
>   
> +	if (dev->data->rx_queue_state[queue_id] ==
> +	    RTE_ETH_QUEUE_STATE_HAIRPIN) {
> +		RTE_ETHDEV_LOG(INFO,
> +			"Queue %"PRIu16" of device with port_id=%"PRIu16" is hairpin queue\n",

I think it would be useful if the log mentioned what the caller tried to do.
See the note about log message uniqueness, e.g.
"Cannot get RxQ info: port %"PRIu16" queue %"PRIu16" is hairpin"
Also I think it is better to do this check before the rxq_info_get check,
mainly to put it near the queue range check and group all
queue ID checks together.

> +			queue_id, port_id);
> +		return -EINVAL;
> +	}
> +
>   	memset(qinfo, 0, sizeof(*qinfo));
>   	dev->dev_ops->rxq_info_get(dev, queue_id, qinfo);
>   	return 0;
> @@ -4202,6 +4408,14 @@ int rte_eth_set_queue_rate_limit(uint16_t port_id, uint16_t queue_idx,
>   		return -EINVAL;
>   	}
>   
> +	if (dev->data->tx_queue_state[queue_id] ==
> +	    RTE_ETH_QUEUE_STATE_HAIRPIN) {
> +		RTE_ETHDEV_LOG(INFO,
> +			"Queue %"PRIu16" of device with port_id=%"PRIu16" is hairpin queue\n",

Same as above.

> +			queue_id, port_id);
> +		return -EINVAL;
> +	}
> +
>   	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->txq_info_get, -ENOTSUP);
>   
>   	memset(qinfo, 0, sizeof(*qinfo));
> @@ -4510,6 +4724,21 @@ int rte_eth_set_queue_rate_limit(uint16_t port_id, uint16_t queue_idx,
>   }
>   
>   int
> +rte_eth_dev_hairpin_capability_get(uint16_t port_id,
> +				   struct rte_eth_hairpin_cap *cap)
> +{
> +	struct rte_eth_dev *dev;
> +
> +	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
> +
> +	dev = &rte_eth_devices[port_id];
> +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->hairpin_cap_get,
> +				-ENOTSUP);
> +	memset(cap, 0, sizeof(*cap));
> +	return eth_err(port_id, (*dev->dev_ops->hairpin_cap_get)(dev, cap));
> +}
> +
> +int
>   rte_eth_dev_pool_ops_supported(uint16_t port_id, const char *pool)
>   {
>   	struct rte_eth_dev *dev;
> diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
> index 187a2bb..276f55f 100644
> --- a/lib/librte_ethdev/rte_ethdev.h
> +++ b/lib/librte_ethdev/rte_ethdev.h
> @@ -804,6 +804,46 @@ struct rte_eth_txconf {
>   };
>   
>   /**
> + * @warning
> + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
> + *
> + * A structure used to return the hairpin capabilities that are supported.
> + */
> +struct rte_eth_hairpin_cap {
> +	uint16_t max_n_queues;
> +	/**< The max number of hairpin queues (different bindings). */
> +	uint16_t max_rx_2_tx;
> +	/**< Max number of Rx queues to be connected to one Tx queue. */
> +	uint16_t max_tx_2_rx;
> +	/**< Max number of Tx queues to be connected to one Rx queue. */
> +	uint16_t max_nb_desc; /**< The max num of descriptors. */
> +};
> +
> +#define RTE_ETH_MAX_HAIRPIN_PEERS 32
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
> + *
> + * A structure used to hold hairpin peer data.
> + */
> +struct rte_eth_hairpin_peer {
> +	uint16_t port; /**< Peer port. */
> +	uint16_t queue; /**< Peer queue. */
> +};
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
> + *
> + * A structure used to configure hairpin binding.
> + */
> +struct rte_eth_hairpin_conf {
> +	uint16_t peer_n; /**< The number of peers. */
> +	struct rte_eth_hairpin_peer peers[RTE_ETH_MAX_HAIRPIN_PEERS];
> +};
> +
> +/**
>    * A structure contains information about HW descriptor ring limitations.
>    */
>   struct rte_eth_desc_lim {
> @@ -1765,6 +1805,37 @@ int rte_eth_rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
>   		struct rte_mempool *mb_pool);
>   
>   /**
> + * @warning
> + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
> + *
> + * Allocate and set up a hairpin receive queue for an Ethernet device.
> + *
> + * The function set up the selected queue to be used in hairpin.
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param rx_queue_id
> + *   The index of the receive queue to set up.
> + *   The value must be in the range [0, nb_rx_queue - 1] previously supplied
> + *   to rte_eth_dev_configure().
> + * @param nb_rx_desc
> + *   The number of receive descriptors to allocate for the receive ring.
> + *   0 means the PMD will use default value.
> + * @param conf
> + *   The pointer to the hairpin configuration.
> + *
> + * @return
> + *   - (0) if successful.
> + *   - (-ENOTSUP) if hardware doesn't support.
> + *   - (-EINVAL) if bad parameter.
> + *   - (-ENOMEM) if unable to allocate the resources.
> + */
> +__rte_experimental
> +int rte_eth_rx_hairpin_queue_setup
> +	(uint16_t port_id, uint16_t rx_queue_id, uint16_t nb_rx_desc,
> +	 const struct rte_eth_hairpin_conf *conf);
> +
> +/**
>    * Allocate and set up a transmit queue for an Ethernet device.
>    *
>    * @param port_id
> @@ -1817,6 +1888,35 @@ int rte_eth_tx_queue_setup(uint16_t port_id, uint16_t tx_queue_id,
>   		const struct rte_eth_txconf *tx_conf);
>   
>   /**
> + * @warning
> + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
> + *
> + * Allocate and set up a transmit hairpin queue for an Ethernet device.
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param tx_queue_id
> + *   The index of the transmit queue to set up.
> + *   The value must be in the range [0, nb_tx_queue - 1] previously supplied
> + *   to rte_eth_dev_configure().
> + * @param nb_tx_desc
> + *   The number of transmit descriptors to allocate for the transmit ring.
> + *   0 to set default PMD value.
> + * @param conf
> + *   The hairpin configuration.
> + *
> + * @return
> + *   - (0) if successful.
> + *   - (-ENOTSUP) if hardware doesn't support.
> + *   - (-EINVAL) if bad parameter.
> + *   - (-ENOMEM) if unable to allocate the resources.
> + */
> +__rte_experimental
> +int rte_eth_tx_hairpin_queue_setup
> +	(uint16_t port_id, uint16_t tx_queue_id, uint16_t nb_tx_desc,
> +	 const struct rte_eth_hairpin_conf *conf);
> +
> +/**
>    * Return the NUMA socket to which an Ethernet device is connected
>    *
>    * @param port_id
> @@ -1851,7 +1951,7 @@ int rte_eth_tx_queue_setup(uint16_t port_id, uint16_t tx_queue_id,
>    *   to rte_eth_dev_configure().
>    * @return
>    *   - 0: Success, the receive queue is started.
> - *   - -EINVAL: The port_id or the queue_id out of range.
> + *   - -EINVAL: The port_id or the queue_id out of range or belong to hairpin.
>    *   - -EIO: if device is removed.
>    *   - -ENOTSUP: The function not supported in PMD driver.
>    */
> @@ -1868,7 +1968,7 @@ int rte_eth_tx_queue_setup(uint16_t port_id, uint16_t tx_queue_id,
>    *   to rte_eth_dev_configure().
>    * @return
>    *   - 0: Success, the receive queue is stopped.
> - *   - -EINVAL: The port_id or the queue_id out of range.
> + *   - -EINVAL: The port_id or the queue_id out of range or belong to hairpin.
>    *   - -EIO: if device is removed.
>    *   - -ENOTSUP: The function not supported in PMD driver.
>    */
> @@ -1886,7 +1986,7 @@ int rte_eth_tx_queue_setup(uint16_t port_id, uint16_t tx_queue_id,
>    *   to rte_eth_dev_configure().
>    * @return
>    *   - 0: Success, the transmit queue is started.
> - *   - -EINVAL: The port_id or the queue_id out of range.
> + *   - -EINVAL: The port_id or the queue_id out of range or belong to hairpin.
>    *   - -EIO: if device is removed.
>    *   - -ENOTSUP: The function not supported in PMD driver.
>    */
> @@ -1903,7 +2003,7 @@ int rte_eth_tx_queue_setup(uint16_t port_id, uint16_t tx_queue_id,
>    *   to rte_eth_dev_configure().
>    * @return
>    *   - 0: Success, the transmit queue is stopped.
> - *   - -EINVAL: The port_id or the queue_id out of range.
> + *   - -EINVAL: The port_id or the queue_id out of range or belong to hairpin.
>    *   - -EIO: if device is removed.
>    *   - -ENOTSUP: The function not supported in PMD driver.
>    */
> @@ -3569,7 +3669,8 @@ int rte_eth_remove_tx_callback(uint16_t port_id, uint16_t queue_id,
>    * @return
>    *   - 0: Success
>    *   - -ENOTSUP: routine is not supported by the device PMD.
> - *   - -EINVAL:  The port_id or the queue_id is out of range.
> + *   - -EINVAL:  The port_id or the queue_id is out of range, or the queue
> + *               is hairpin queue.
>    */
>   int rte_eth_rx_queue_info_get(uint16_t port_id, uint16_t queue_id,
>   	struct rte_eth_rxq_info *qinfo);
> @@ -3589,7 +3690,8 @@ int rte_eth_rx_queue_info_get(uint16_t port_id, uint16_t queue_id,
>    * @return
>    *   - 0: Success
>    *   - -ENOTSUP: routine is not supported by the device PMD.
> - *   - -EINVAL:  The port_id or the queue_id is out of range.
> + *   - -EINVAL:  The port_id or the queue_id is out of range, or the queue
> + *               is hairpin queue.
>    */
>   int rte_eth_tx_queue_info_get(uint16_t port_id, uint16_t queue_id,
>   	struct rte_eth_txq_info *qinfo);
> @@ -4031,6 +4133,23 @@ int rte_eth_dev_adjust_nb_rx_tx_desc(uint16_t port_id,
>   void *
>   rte_eth_dev_get_sec_ctx(uint16_t port_id);
>   
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
> + *
> + * Query the device hairpin capabilities.
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param cap
> + *   Pointer to a structure that will hold the hairpin capabilities.
> + * @return
> + *   - (0) if successful.
> + *   - (-ENOTSUP) if hardware doesn't support.
> + */
> +__rte_experimental
> +int rte_eth_dev_hairpin_capability_get(uint16_t port_id,
> +				       struct rte_eth_hairpin_cap *cap);
>   
>   #include <rte_ethdev_core.h>
>   
> @@ -4131,6 +4250,12 @@ int rte_eth_dev_adjust_nb_rx_tx_desc(uint16_t port_id,
>   		RTE_ETHDEV_LOG(ERR, "Invalid RX queue_id=%u\n", queue_id);
>   		return 0;
>   	}
> +	if (dev->data->rx_queue_state[queue_id] ==
> +	    RTE_ETH_QUEUE_STATE_HAIRPIN) {
> +		RTE_ETHDEV_LOG(ERR, "RX queue_id=%u is hairpin queue\n",

I see that these log messages are very similar to the ones above, but I still
think it would be useful to mention the context to make the log clear, e.g.
"Cannot Rx from hairpin queue %"PRIu16" at port %"PRIu16

> +			       queue_id);
> +		return 0;
> +	}
>   #endif
>   	nb_rx = (*dev->rx_pkt_burst)(dev->data->rx_queues[queue_id],
>   				     rx_pkts, nb_pkts);
> @@ -4397,6 +4522,12 @@ static inline int rte_eth_tx_descriptor_status(uint16_t port_id,
>   		RTE_ETHDEV_LOG(ERR, "Invalid TX queue_id=%u\n", queue_id);
>   		return 0;
>   	}
> +	if (dev->data->tx_queue_state[queue_id] ==
> +	    RTE_ETH_QUEUE_STATE_HAIRPIN) {
> +		RTE_ETHDEV_LOG(ERR, "TX queue_id=%u is hairpin queue\n",
> +			       queue_id);

I think it would be useful to mention the context to make the log clear, e.g.
"Cannot Tx to hairpin queue %"PRIu16" at port %"PRIu16

> +		return 0;
> +	}
>   #endif
>   
>   #ifdef RTE_ETHDEV_RXTX_CALLBACKS

[snip]


^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [dpdk-dev] [PATCH v4 02/15] ethdev: add support for hairpin queue
  2019-10-22 11:37     ` Andrew Rybchenko
@ 2019-10-23  6:23       ` Ori Kam
  0 siblings, 0 replies; 186+ messages in thread
From: Ori Kam @ 2019-10-23  6:23 UTC (permalink / raw)
  To: Andrew Rybchenko, Thomas Monjalon, Ferruh Yigit; +Cc: dev, jingjing.wu, stephen

Hi Andrew,

> -----Original Message-----
> From: Andrew Rybchenko <arybchenko@solarflare.com>
> Sent: Tuesday, October 22, 2019 2:38 PM
> To: Ori Kam <orika@mellanox.com>; Thomas Monjalon
> <thomas@monjalon.net>; Ferruh Yigit <ferruh.yigit@intel.com>
> Cc: dev@dpdk.org; jingjing.wu@intel.com; stephen@networkplumber.org
> Subject: Re: [PATCH v4 02/15] ethdev: add support for hairpin queue
> 
> Hi Ori,
> 
> see my notes below.
> 
> A generic note is that we have a strict policy about Rx/Tx (not RX/TX) in
> commit messages, but I'd like to follow it in comments and log messages
> too, at least in new code. It is already a mixture in the existing code.
>

O.K. will make sure my code is aligned.
 
> On 10/17/19 6:32 PM, Ori Kam wrote:
> > This commit introduces the hairpin queue type.
> >
> > The hairpin queue is built from an Rx queue bound to a Tx queue.
> > It is used to offload traffic coming from the wire and redirect it back
> > to the wire.
> >
> > There are 3 new functions:
> > - rte_eth_dev_hairpin_capability_get
> > - rte_eth_rx_hairpin_queue_setup
> > - rte_eth_tx_hairpin_queue_setup
> >
> > In order to use the queue, there is a need to create rte_flow
> > with queue / RSS action that targets one or more of the Rx queues.
> >
> > Signed-off-by: Ori Kam <orika@mellanox.com>
> >
> > ---
> > V4:
> >   - update according to ML comments.
> >
> > V3:
> >   - update according to ML comments.
> >
> > V2:
> >   - update according to ML comments.
> >
> > ---
> >   lib/librte_ethdev/rte_ethdev.c           | 229
> +++++++++++++++++++++++++++++++
> >   lib/librte_ethdev/rte_ethdev.h           | 143 ++++++++++++++++++-
> >   lib/librte_ethdev/rte_ethdev_core.h      |  91 +++++++++++-
> >   lib/librte_ethdev/rte_ethdev_driver.h    |   1 +
> >   lib/librte_ethdev/rte_ethdev_version.map |   5 +
> >   5 files changed, 461 insertions(+), 8 deletions(-)
> >
> > diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
> > index af82360..10a8bf2 100644
> > --- a/lib/librte_ethdev/rte_ethdev.c
> > +++ b/lib/librte_ethdev/rte_ethdev.c
> > @@ -904,6 +904,14 @@ struct rte_eth_dev *
> >
> >   	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_start, -
> ENOTSUP);
> >
> > +	if (dev->data->rx_queue_state[rx_queue_id] ==
> > +	    RTE_ETH_QUEUE_STATE_HAIRPIN) {
> 
> May I suggest adding a static helper function to check
> whether a device Rx queue is a hairpin queue, plus a similar
> function for Tx. It would make the changes around
> rx_queue_state less intrusive.
> These functions should be used everywhere below in
> the code.
> 

Agree, will change.

> > +		RTE_ETHDEV_LOG(INFO,
> > +			"Queue %"PRIu16" of device with port_id=%"PRIu16" is
> hairpin queue\n",
> 
> The message is the same for many cases, which is bad since it does
> not make it easy to identify where it was logged.
> It should be mentioned here that this is an attempt to start an Rx queue.
> 

O.K. will change.

> > +			rx_queue_id, port_id);
> > +		return -EINVAL;
> > +	}
> > +
> >   	if (dev->data->rx_queue_state[rx_queue_id] !=
> RTE_ETH_QUEUE_STATE_STOPPED) {
> >   		RTE_ETHDEV_LOG(INFO,
> >   			"Queue %"PRIu16" of device with port_id=%"PRIu16"
> already started\n",
> > @@ -931,6 +939,14 @@ struct rte_eth_dev *
> >
> >   	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_stop, -
> ENOTSUP);
> >
> > +	if (dev->data->rx_queue_state[rx_queue_id] ==
> > +	    RTE_ETH_QUEUE_STATE_HAIRPIN) {
> > +		RTE_ETHDEV_LOG(INFO,
> > +			"Queue %"PRIu16" of device with port_id=%"PRIu16" is
> hairpin queue\n",
> > +			rx_queue_id, port_id);
> > +		return -EINVAL;
> > +	}
> > +
> >   	if (dev->data->rx_queue_state[rx_queue_id] ==
> RTE_ETH_QUEUE_STATE_STOPPED) {
> >   		RTE_ETHDEV_LOG(INFO,
> >   			"Queue %"PRIu16" of device with port_id=%"PRIu16"
> already stopped\n",
> > @@ -964,6 +980,14 @@ struct rte_eth_dev *
> >
> >   	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_queue_start, -
> ENOTSUP);
> >
> > +	if (dev->data->tx_queue_state[tx_queue_id] ==
> > +	   RTE_ETH_QUEUE_STATE_HAIRPIN) {
> > +		RTE_ETHDEV_LOG(INFO,
> > +			"Queue %"PRIu16" of device with port_id=%"PRIu16" is
> hairpin queue\n",
> > +			tx_queue_id, port_id);
> > +		return -EINVAL;
> > +	}
> > +
> >   	if (dev->data->tx_queue_state[tx_queue_id] !=
> RTE_ETH_QUEUE_STATE_STOPPED) {
> >   		RTE_ETHDEV_LOG(INFO,
> >   			"Queue %"PRIu16" of device with port_id=%"PRIu16"
> already started\n",
> > @@ -989,6 +1013,14 @@ struct rte_eth_dev *
> >
> >   	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_queue_stop, -
> ENOTSUP);
> >
> > +	if (dev->data->tx_queue_state[tx_queue_id] ==
> > +	    RTE_ETH_QUEUE_STATE_HAIRPIN) {
> > +		RTE_ETHDEV_LOG(INFO,
> > +			"Queue %"PRIu16" of device with port_id=%"PRIu16" is
> hairpin queue\n",
> > +			tx_queue_id, port_id);
> > +		return -EINVAL;
> > +	}
> > +
> >   	if (dev->data->tx_queue_state[tx_queue_id] ==
> RTE_ETH_QUEUE_STATE_STOPPED) {
> >   		RTE_ETHDEV_LOG(INFO,
> >   			"Queue %"PRIu16" of device with port_id=%"PRIu16"
> already stopped\n",
> > @@ -1758,6 +1790,81 @@ struct rte_eth_dev *
> >   }
> >
> >   int
> > +rte_eth_rx_hairpin_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
> > +			       uint16_t nb_rx_desc,
> > +			       const struct rte_eth_hairpin_conf *conf)
> > +{
> > +	int ret;
> > +	struct rte_eth_dev *dev;
> > +	struct rte_eth_hairpin_cap cap;
> > +	void **rxq;
> > +	int i;
> > +	int count = 0;
> > +
> > +	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
> > +
> > +	dev = &rte_eth_devices[port_id];
> > +	if (rx_queue_id >= dev->data->nb_rx_queues) {
> > +		RTE_ETHDEV_LOG(ERR, "Invalid RX queue_id=%u\n",
> rx_queue_id);
> > +		return -EINVAL;
> > +	}
> > +	ret = rte_eth_dev_hairpin_capability_get(port_id, &cap);
> > +	if (ret != 0)
> > +		return ret;
> > +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops-
> >rx_hairpin_queue_setup,
> > +				-ENOTSUP);
> > +	/* Use default specified by driver, if nb_rx_desc is zero */
> > +	if (nb_rx_desc == 0)
> > +		nb_rx_desc = cap.max_nb_desc;
> > +	if (nb_rx_desc > cap.max_nb_desc) {
> > +		RTE_ETHDEV_LOG(ERR,
> > +			       "Invalid value for nb_rx_desc(=%hu), should be: "
> > +			       "<= %hu", nb_rx_desc, cap.max_nb_desc);
> 
> Please, don't split format string
> 

O.K. will fix.

> > +		return -EINVAL;
> > +	}
> > +	if (conf->peer_n > cap.max_rx_2_tx) {
> > +		RTE_ETHDEV_LOG(ERR,
> > +			       "Invalid value for number of peers(=%hu), "
> > +			       "should be: <= %hu", conf->peer_n,
> 
> Please, don't split format string.
> Also make the message unique. Right now it is the same for Rx and Tx.
>

O.K. will fix.
 
> > +			       cap.max_rx_2_tx);
> > +		return -EINVAL;
> > +	}
> > +	if (conf->peer_n == 0) {
> > +		RTE_ETHDEV_LOG(ERR,
> > +			       "Invalid value for number of peers(=%hu), "
> > +			       "should be: > 0", conf->peer_n);
> 
> Please, don't split format string
> Also make the message unique. Right now it is the same for Rx and Tx.
> 

O.K. will fix.

> > +		return -EINVAL;
> > +	}
> > +	if (cap.max_n_queues != UINT16_MAX) {
> > +		for (i = 0; i < dev->data->nb_rx_queues; i++) {
> > +			if (dev->data->rx_queue_state[i] ==
> > +			    RTE_ETH_QUEUE_STATE_HAIRPIN)
> > +				count++;
> > +		}
> > +		if (count > cap.max_n_queues) {
> > +			RTE_ETHDEV_LOG(ERR,
> > +				       "To many Rx hairpin queues %d", count);
> > +			return -EINVAL;
> > +		}
> > +	}
> > +	if (dev->data->dev_started)
> > +		return -EBUSY;
> > +	rxq = dev->data->rx_queues;
> > +	if (rxq[rx_queue_id] != NULL) {
> > +		RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops-
> >rx_queue_release,
> > +					-ENOTSUP);
> > +		(*dev->dev_ops->rx_queue_release)(rxq[rx_queue_id]);
> > +		rxq[rx_queue_id] = NULL;
> > +	}
> > +	ret = (*dev->dev_ops->rx_hairpin_queue_setup)(dev, rx_queue_id,
> > +						      nb_rx_desc, conf);
> > +	if (ret == 0)
> > +		dev->data->rx_queue_state[rx_queue_id] =
> > +			RTE_ETH_QUEUE_STATE_HAIRPIN;
> > +	return eth_err(port_id, ret);
> > +}
> > +
> > +int
> >   rte_eth_tx_queue_setup(uint16_t port_id, uint16_t tx_queue_id,
> >   		       uint16_t nb_tx_desc, unsigned int socket_id,
> >   		       const struct rte_eth_txconf *tx_conf)
> > @@ -1856,6 +1963,80 @@ struct rte_eth_dev *
> >   		       tx_queue_id, nb_tx_desc, socket_id, &local_conf));
> >   }
> >
> > +int
> > +rte_eth_tx_hairpin_queue_setup(uint16_t port_id, uint16_t tx_queue_id,
> > +			       uint16_t nb_tx_desc,
> > +			       const struct rte_eth_hairpin_conf *conf)
> > +{
> > +	struct rte_eth_dev *dev;
> > +	struct rte_eth_hairpin_cap cap;
> > +	void **txq;
> > +	int i;
> > +	int count = 0;
> > +	int ret;
> > +
> > +	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
> > +	dev = &rte_eth_devices[port_id];
> > +	if (tx_queue_id >= dev->data->nb_tx_queues) {
> > +		RTE_ETHDEV_LOG(ERR, "Invalid TX queue_id=%u\n", tx_queue_id);
> > +		return -EINVAL;
> > +	}
> > +	ret = rte_eth_dev_hairpin_capability_get(port_id, &cap);
> > +	if (ret != 0)
> > +		return ret;
> > +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_hairpin_queue_setup,
> > +				-ENOTSUP);
> > +	/* Use default specified by driver, if nb_tx_desc is zero */
> > +	if (nb_tx_desc == 0)
> > +		nb_tx_desc = cap.max_nb_desc;
> > +	if (nb_tx_desc > cap.max_nb_desc) {
> > +		RTE_ETHDEV_LOG(ERR,
> > +			       "Invalid value for nb_tx_desc(=%hu), should be: "
> > +			       "<= %hu", nb_tx_desc, cap.max_nb_desc);
> 
> Please, don't split format string
>

 
O.K. will fix.

> > +		return -EINVAL;
> > +	}
> > +	if (conf->peer_n > cap.max_tx_2_rx) {
> > +		RTE_ETHDEV_LOG(ERR,
> > +			       "Invalid value for number of peers(=%hu), "
> > +			       "should be: <= %hu", conf->peer_n,
> 
> Please, don't split format string
> 

O.K. will fix.

> > +			       cap.max_tx_2_rx);
> > +		return -EINVAL;
> > +	}
> > +	if (conf->peer_n == 0) {
> > +		RTE_ETHDEV_LOG(ERR,
> > +			       "Invalid value for number of peers(=%hu), "
> > +			       "should be: > 0", conf->peer_n);
> 
> Please, don't split format string
> 

O.K. will fix.

> > +		return -EINVAL;
> > +	}
> > +	if (cap.max_n_queues != UINT16_MAX) {
> > +		for (i = 0; i < dev->data->nb_tx_queues; i++) {
> > +			if (dev->data->tx_queue_state[i] ==
> > +			    RTE_ETH_QUEUE_STATE_HAIRPIN)
> > +				count++;
> > +		}
> > +		if (count > cap.max_n_queues) {
> > +			RTE_ETHDEV_LOG(ERR,
> > +				       "To many Rx hairpin queues %d", count);
> > +			return -EINVAL;
> > +		}
> > +	}
> > +	if (dev->data->dev_started)
> > +		return -EBUSY;
> > +	txq = dev->data->tx_queues;
> > +	if (txq[tx_queue_id] != NULL) {
> > +		RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_queue_release,
> > +					-ENOTSUP);
> > +		(*dev->dev_ops->tx_queue_release)(txq[tx_queue_id]);
> > +		txq[tx_queue_id] = NULL;
> > +	}
> > +	ret = (*dev->dev_ops->tx_hairpin_queue_setup)
> > +		(dev, tx_queue_id, nb_tx_desc, conf);
> > +	if (ret == 0)
> > +		dev->data->tx_queue_state[tx_queue_id] =
> > +			RTE_ETH_QUEUE_STATE_HAIRPIN;
> > +	return eth_err(port_id, ret);
> > +}
> > +
> >   void
> >   rte_eth_tx_buffer_drop_callback(struct rte_mbuf **pkts, uint16_t unsent,
> >   		void *userdata __rte_unused)
> > @@ -3981,12 +4162,20 @@ int rte_eth_set_queue_rate_limit(uint16_t port_id, uint16_t queue_idx,
> >   	rte_errno = ENOTSUP;
> >   	return NULL;
> >   #endif
> > +	struct rte_eth_dev *dev;
> > +
> >   	/* check input parameters */
> >   	if (!rte_eth_dev_is_valid_port(port_id) || fn == NULL ||
> >   		    queue_id >= rte_eth_devices[port_id].data->nb_rx_queues) {
> >   		rte_errno = EINVAL;
> >   		return NULL;
> >   	}
> > +	dev = &rte_eth_devices[port_id];
> > +	if (dev->data->rx_queue_state[queue_id] ==
> > +	    RTE_ETH_QUEUE_STATE_HAIRPIN) {
> > +		rte_errno = EINVAL;
> > +		return NULL;
> > +	}
> >   	struct rte_eth_rxtx_callback *cb = rte_zmalloc(NULL, sizeof(*cb), 0);
> >
> >   	if (cb == NULL) {
> > @@ -4058,6 +4247,8 @@ int rte_eth_set_queue_rate_limit(uint16_t port_id, uint16_t queue_idx,
> >   	rte_errno = ENOTSUP;
> >   	return NULL;
> >   #endif
> > +	struct rte_eth_dev *dev;
> > +
> >   	/* check input parameters */
> >   	if (!rte_eth_dev_is_valid_port(port_id) || fn == NULL ||
> >   		    queue_id >= rte_eth_devices[port_id].data->nb_tx_queues) {
> > @@ -4065,6 +4256,13 @@ int rte_eth_set_queue_rate_limit(uint16_t port_id, uint16_t queue_idx,
> >   		return NULL;
> >   	}
> >
> > +	dev = &rte_eth_devices[port_id];
> > +	if (dev->data->tx_queue_state[queue_id] ==
> > +	    RTE_ETH_QUEUE_STATE_HAIRPIN) {
> > +		rte_errno = EINVAL;
> > +		return NULL;
> > +	}
> > +
> >   	struct rte_eth_rxtx_callback *cb = rte_zmalloc(NULL, sizeof(*cb), 0);
> >
> >   	if (cb == NULL) {
> > @@ -4180,6 +4378,14 @@ int rte_eth_set_queue_rate_limit(uint16_t port_id, uint16_t queue_idx,
> >
> >   	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rxq_info_get, -ENOTSUP);
> >
> > +	if (dev->data->rx_queue_state[queue_id] ==
> > +	    RTE_ETH_QUEUE_STATE_HAIRPIN) {
> > +		RTE_ETHDEV_LOG(INFO,
> > +			"Queue %"PRIu16" of device with port_id=%"PRIu16" is hairpin queue\n",
> 
> I think it would be useful if log mentions what caller tried to do.
> See note about log messages uniqueness, e.g.
> "Cannot get RxQ info: port %"PRIu16" queue %"PRIu16" is hairpin"
> Also I think it is better to check it before rxq_info_get check
> mainly to put it nearby queue range check to group all
> queue ID checks together.
> 

O.K. will fix.

> > +			queue_id, port_id);
> > +		return -EINVAL;
> > +	}
> > +
> >   	memset(qinfo, 0, sizeof(*qinfo));
> >   	dev->dev_ops->rxq_info_get(dev, queue_id, qinfo);
> >   	return 0;
> > @@ -4202,6 +4408,14 @@ int rte_eth_set_queue_rate_limit(uint16_t port_id, uint16_t queue_idx,
> >   		return -EINVAL;
> >   	}
> >
> > +	if (dev->data->tx_queue_state[queue_id] ==
> > +	    RTE_ETH_QUEUE_STATE_HAIRPIN) {
> > +		RTE_ETHDEV_LOG(INFO,
> > +			"Queue %"PRIu16" of device with port_id=%"PRIu16" is hairpin queue\n",
> 
> Same as above.
> 

O.K. will fix.

> > +			queue_id, port_id);
> > +		return -EINVAL;
> > +	}
> > +
> >   	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->txq_info_get, -ENOTSUP);
> >
> >   	memset(qinfo, 0, sizeof(*qinfo));
> > @@ -4510,6 +4724,21 @@ int rte_eth_set_queue_rate_limit(uint16_t port_id, uint16_t queue_idx,
> >   }
> >
> >   int
> > +rte_eth_dev_hairpin_capability_get(uint16_t port_id,
> > +				   struct rte_eth_hairpin_cap *cap)
> > +{
> > +	struct rte_eth_dev *dev;
> > +
> > +	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
> > +
> > +	dev = &rte_eth_devices[port_id];
> > +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->hairpin_cap_get,
> > +				-ENOTSUP);
> > +	memset(cap, 0, sizeof(*cap));
> > +	return eth_err(port_id, (*dev->dev_ops->hairpin_cap_get)(dev, cap));
> > +}
> > +
> > +int
> >   rte_eth_dev_pool_ops_supported(uint16_t port_id, const char *pool)
> >   {
> >   	struct rte_eth_dev *dev;
> > diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
> > index 187a2bb..276f55f 100644
> > --- a/lib/librte_ethdev/rte_ethdev.h
> > +++ b/lib/librte_ethdev/rte_ethdev.h
> > @@ -804,6 +804,46 @@ struct rte_eth_txconf {
> >   };
> >
> >   /**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
> > + *
> > + * A structure used to return the hairpin capabilities that are supported.
> > + */
> > +struct rte_eth_hairpin_cap {
> > +	uint16_t max_n_queues;
> > +	/**< The max number of hairpin queues (different bindings). */
> > +	uint16_t max_rx_2_tx;
> > +	/**< Max number of Rx queues to be connected to one Tx queue. */
> > +	uint16_t max_tx_2_rx;
> > +	/**< Max number of Tx queues to be connected to one Rx queue. */
> > +	uint16_t max_nb_desc; /**< The max num of descriptors. */
> > +};
> > +
> > +#define RTE_ETH_MAX_HAIRPIN_PEERS 32
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
> > + *
> > + * A structure used to hold hairpin peer data.
> > + */
> > +struct rte_eth_hairpin_peer {
> > +	uint16_t port; /**< Peer port. */
> > +	uint16_t queue; /**< Peer queue. */
> > +};
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
> > + *
> > + * A structure used to configure hairpin binding.
> > + */
> > +struct rte_eth_hairpin_conf {
> > +	uint16_t peer_n; /**< The number of peers. */
> > +	struct rte_eth_hairpin_peer peers[RTE_ETH_MAX_HAIRPIN_PEERS];
> > +};
> > +
> > +/**
> >    * A structure contains information about HW descriptor ring limitations.
> >    */
> >   struct rte_eth_desc_lim {
> > @@ -1765,6 +1805,37 @@ int rte_eth_rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
> >   		struct rte_mempool *mb_pool);
> >
> >   /**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
> > + *
> > + * Allocate and set up a hairpin receive queue for an Ethernet device.
> > + *
> > + * The function set up the selected queue to be used in hairpin.
> > + *
> > + * @param port_id
> > + *   The port identifier of the Ethernet device.
> > + * @param rx_queue_id
> > + *   The index of the receive queue to set up.
> > + *   The value must be in the range [0, nb_rx_queue - 1] previously supplied
> > + *   to rte_eth_dev_configure().
> > + * @param nb_rx_desc
> > + *   The number of receive descriptors to allocate for the receive ring.
> > + *   0 means the PMD will use default value.
> > + * @param conf
> > + *   The pointer to the hairpin configuration.
> > + *
> > + * @return
> > + *   - (0) if successful.
> > + *   - (-ENOTSUP) if hardware doesn't support.
> > + *   - (-EINVAL) if bad parameter.
> > + *   - (-ENOMEM) if unable to allocate the resources.
> > + */
> > +__rte_experimental
> > +int rte_eth_rx_hairpin_queue_setup
> > +	(uint16_t port_id, uint16_t rx_queue_id, uint16_t nb_rx_desc,
> > +	 const struct rte_eth_hairpin_conf *conf);
> > +
> > +/**
> >    * Allocate and set up a transmit queue for an Ethernet device.
> >    *
> >    * @param port_id
> > @@ -1817,6 +1888,35 @@ int rte_eth_tx_queue_setup(uint16_t port_id, uint16_t tx_queue_id,
> >   		const struct rte_eth_txconf *tx_conf);
> >
> >   /**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
> > + *
> > + * Allocate and set up a transmit hairpin queue for an Ethernet device.
> > + *
> > + * @param port_id
> > + *   The port identifier of the Ethernet device.
> > + * @param tx_queue_id
> > + *   The index of the transmit queue to set up.
> > + *   The value must be in the range [0, nb_tx_queue - 1] previously supplied
> > + *   to rte_eth_dev_configure().
> > + * @param nb_tx_desc
> > + *   The number of transmit descriptors to allocate for the transmit ring.
> > + *   0 to set default PMD value.
> > + * @param conf
> > + *   The hairpin configuration.
> > + *
> > + * @return
> > + *   - (0) if successful.
> > + *   - (-ENOTSUP) if hardware doesn't support.
> > + *   - (-EINVAL) if bad parameter.
> > + *   - (-ENOMEM) if unable to allocate the resources.
> > + */
> > +__rte_experimental
> > +int rte_eth_tx_hairpin_queue_setup
> > +	(uint16_t port_id, uint16_t tx_queue_id, uint16_t nb_tx_desc,
> > +	 const struct rte_eth_hairpin_conf *conf);
> > +
> > +/**
> >    * Return the NUMA socket to which an Ethernet device is connected
> >    *
> >    * @param port_id
> > @@ -1851,7 +1951,7 @@ int rte_eth_tx_queue_setup(uint16_t port_id, uint16_t tx_queue_id,
> >    *   to rte_eth_dev_configure().
> >    * @return
> >    *   - 0: Success, the receive queue is started.
> > - *   - -EINVAL: The port_id or the queue_id out of range.
> > + *   - -EINVAL: The port_id or the queue_id out of range or belong to hairpin.
> >    *   - -EIO: if device is removed.
> >    *   - -ENOTSUP: The function not supported in PMD driver.
> >    */
> > @@ -1868,7 +1968,7 @@ int rte_eth_tx_queue_setup(uint16_t port_id, uint16_t tx_queue_id,
> >    *   to rte_eth_dev_configure().
> >    * @return
> >    *   - 0: Success, the receive queue is stopped.
> > - *   - -EINVAL: The port_id or the queue_id out of range.
> > + *   - -EINVAL: The port_id or the queue_id out of range or belong to hairpin.
> >    *   - -EIO: if device is removed.
> >    *   - -ENOTSUP: The function not supported in PMD driver.
> >    */
> > @@ -1886,7 +1986,7 @@ int rte_eth_tx_queue_setup(uint16_t port_id, uint16_t tx_queue_id,
> >    *   to rte_eth_dev_configure().
> >    * @return
> >    *   - 0: Success, the transmit queue is started.
> > - *   - -EINVAL: The port_id or the queue_id out of range.
> > + *   - -EINVAL: The port_id or the queue_id out of range or belong to hairpin.
> >    *   - -EIO: if device is removed.
> >    *   - -ENOTSUP: The function not supported in PMD driver.
> >    */
> > @@ -1903,7 +2003,7 @@ int rte_eth_tx_queue_setup(uint16_t port_id, uint16_t tx_queue_id,
> >    *   to rte_eth_dev_configure().
> >    * @return
> >    *   - 0: Success, the transmit queue is stopped.
> > - *   - -EINVAL: The port_id or the queue_id out of range.
> > + *   - -EINVAL: The port_id or the queue_id out of range or belong to hairpin.
> >    *   - -EIO: if device is removed.
> >    *   - -ENOTSUP: The function not supported in PMD driver.
> >    */
> > @@ -3569,7 +3669,8 @@ int rte_eth_remove_tx_callback(uint16_t port_id, uint16_t queue_id,
> >    * @return
> >    *   - 0: Success
> >    *   - -ENOTSUP: routine is not supported by the device PMD.
> > - *   - -EINVAL:  The port_id or the queue_id is out of range.
> > + *   - -EINVAL:  The port_id or the queue_id is out of range, or the queue
> > + *               is hairpin queue.
> >    */
> >   int rte_eth_rx_queue_info_get(uint16_t port_id, uint16_t queue_id,
> >   	struct rte_eth_rxq_info *qinfo);
> > @@ -3589,7 +3690,8 @@ int rte_eth_rx_queue_info_get(uint16_t port_id, uint16_t queue_id,
> >    * @return
> >    *   - 0: Success
> >    *   - -ENOTSUP: routine is not supported by the device PMD.
> > - *   - -EINVAL:  The port_id or the queue_id is out of range.
> > + *   - -EINVAL:  The port_id or the queue_id is out of range, or the queue
> > + *               is hairpin queue.
> >    */
> >   int rte_eth_tx_queue_info_get(uint16_t port_id, uint16_t queue_id,
> >   	struct rte_eth_txq_info *qinfo);
> > @@ -4031,6 +4133,23 @@ int rte_eth_dev_adjust_nb_rx_tx_desc(uint16_t port_id,
> >   void *
> >   rte_eth_dev_get_sec_ctx(uint16_t port_id);
> >
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
> > + *
> > + * Query the device hairpin capabilities.
> > + *
> > + * @param port_id
> > + *   The port identifier of the Ethernet device.
> > + * @param cap
> > + *   Pointer to a structure that will hold the hairpin capabilities.
> > + * @return
> > + *   - (0) if successful.
> > + *   - (-ENOTSUP) if hardware doesn't support.
> > + */
> > +__rte_experimental
> > +int rte_eth_dev_hairpin_capability_get(uint16_t port_id,
> > +				       struct rte_eth_hairpin_cap *cap);
> >
> >   #include <rte_ethdev_core.h>
> >
> > @@ -4131,6 +4250,12 @@ int rte_eth_dev_adjust_nb_rx_tx_desc(uint16_t port_id,
> >   		RTE_ETHDEV_LOG(ERR, "Invalid RX queue_id=%u\n", queue_id);
> >   		return 0;
> >   	}
> > +	if (dev->data->rx_queue_state[queue_id] ==
> > +	    RTE_ETH_QUEUE_STATE_HAIRPIN) {
> > +		RTE_ETHDEV_LOG(ERR, "RX queue_id=%u is hairpin queue\n",
> 
> I see but these log messages are very similar to above, but I still think
> it would be useful to mention context to make it clear in log, e.g.
> "Cannot Rx from hairpin queue%"PRIu16" at port %"PRIu16
> 

O.K. will fix.

> > +			       queue_id);
> > +		return 0;
> > +	}
> >   #endif
> >   	nb_rx = (*dev->rx_pkt_burst)(dev->data->rx_queues[queue_id],
> >   				     rx_pkts, nb_pkts);
> > @@ -4397,6 +4522,12 @@ static inline int rte_eth_tx_descriptor_status(uint16_t port_id,
> >   		RTE_ETHDEV_LOG(ERR, "Invalid TX queue_id=%u\n", queue_id);
> >   		return 0;
> >   	}
> > +	if (dev->data->tx_queue_state[queue_id] ==
> > +	    RTE_ETH_QUEUE_STATE_HAIRPIN) {
> > +		RTE_ETHDEV_LOG(ERR, "TX queue_id=%u is hairpin queue\n",
> > +			       queue_id);
> 
> I think it would be useful to mention context to make it clear in log, e.g.
> "Cannot Tx to hairpin queue%"PRIu16" at port %"PRIu16
> 

O.K. will fix.

> > +		return 0;
> > +	}
> >   #endif
> >
> >   #ifdef RTE_ETHDEV_RXTX_CALLBACKS
> 
> [snip]


^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [dpdk-dev] [PATCH v4 02/15] ethdev: add support for hairpin queue
  2019-10-17 15:32   ` [dpdk-dev] [PATCH v4 02/15] ethdev: add support for hairpin queue Ori Kam
  2019-10-17 21:01     ` Thomas Monjalon
  2019-10-22 11:37     ` Andrew Rybchenko
@ 2019-10-23  7:04     ` Thomas Monjalon
  2019-10-23 10:09       ` Ori Kam
  2 siblings, 1 reply; 186+ messages in thread
From: Thomas Monjalon @ 2019-10-23  7:04 UTC (permalink / raw)
  To: Ori Kam; +Cc: dev, Ferruh Yigit, Andrew Rybchenko, jingjing.wu, stephen

17/10/2019 17:32, Ori Kam:
> --- a/lib/librte_ethdev/rte_ethdev.h
> +++ b/lib/librte_ethdev/rte_ethdev.h
>  /**
> + * @warning
> + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
> + *
> + * A structure used to return the hairpin capabilities that are supported.
> + */
> +struct rte_eth_hairpin_cap {
> +	uint16_t max_n_queues;
> +	/**< The max number of hairpin queues (different bindings). */
> +	uint16_t max_rx_2_tx;
> +	/**< Max number of Rx queues to be connected to one Tx queue. */
> +	uint16_t max_tx_2_rx;
> +	/**< Max number of Tx queues to be connected to one Rx queue. */
> +	uint16_t max_nb_desc; /**< The max num of descriptors. */
> +};

I think you can switch to "comment-first style" for this struct.


> +#define RTE_ETH_MAX_HAIRPIN_PEERS 32

Usually I think such define is in the build config.
Any other opinion?


> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
> + *
> + * A structure used to hold hairpin peer data.
> + */
> +struct rte_eth_hairpin_peer {
> +	uint16_t port; /**< Peer port. */
> +	uint16_t queue; /**< Peer queue. */
> +};

It may be the right place to give more words about what is a peer,
can we have multiple peers, etc.


> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
> + *
> + * A structure used to configure hairpin binding.
> + */
> +struct rte_eth_hairpin_conf {
> +	uint16_t peer_n; /**< The number of peers. */

In general, I don't like one-letter abbreviations.
Is peer_count better?

> +	struct rte_eth_hairpin_peer peers[RTE_ETH_MAX_HAIRPIN_PEERS];
> +};




^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [dpdk-dev] [PATCH v4 02/15] ethdev: add support for hairpin queue
  2019-10-23  7:04     ` Thomas Monjalon
@ 2019-10-23 10:09       ` Ori Kam
  2019-10-23 10:18         ` Bruce Richardson
  0 siblings, 1 reply; 186+ messages in thread
From: Ori Kam @ 2019-10-23 10:09 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev, Ferruh Yigit, Andrew Rybchenko, jingjing.wu, stephen

Hi Thomas,

> -----Original Message-----
> From: Thomas Monjalon <thomas@monjalon.net>
> Sent: Wednesday, October 23, 2019 10:05 AM
> To: Ori Kam <orika@mellanox.com>
> Cc: dev@dpdk.org; Ferruh Yigit <ferruh.yigit@intel.com>; Andrew Rybchenko
> <arybchenko@solarflare.com>; jingjing.wu@intel.com;
> stephen@networkplumber.org
> Subject: Re: [dpdk-dev] [PATCH v4 02/15] ethdev: add support for hairpin queue
> 
> 17/10/2019 17:32, Ori Kam:
> > --- a/lib/librte_ethdev/rte_ethdev.h
> > +++ b/lib/librte_ethdev/rte_ethdev.h
> >  /**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change, or be removed, without prior
> notice
> > + *
> > + * A structure used to return the hairpin capabilities that are supported.
> > + */
> > +struct rte_eth_hairpin_cap {
> > +	uint16_t max_n_queues;
> > +	/**< The max number of hairpin queues (different bindings). */
> > +	uint16_t max_rx_2_tx;
> > +	/**< Max number of Rx queues to be connected to one Tx queue. */
> > +	uint16_t max_tx_2_rx;
> > +	/**< Max number of Tx queues to be connected to one Rx queue. */
> > +	uint16_t max_nb_desc; /**< The max num of descriptors. */
> > +};
> 
> I think you can switch to "comment-first style" for this struct.
> 

O.K I will change.

> 
> > +#define RTE_ETH_MAX_HAIRPIN_PEERS 32
> 
> Usually I think such define is in the build config.
> Any other opinion?
> 

I need to check. But if you don't mind let's keep it this way, and modify it
later after we see how other manufacturers add hairpin.

> 
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change, or be removed, without prior
> notice
> > + *
> > + * A structure used to hold hairpin peer data.
> > + */
> > +struct rte_eth_hairpin_peer {
> > +	uint16_t port; /**< Peer port. */
> > +	uint16_t queue; /**< Peer queue. */
> > +};
> 
> It may be the right place to give more words about what is a peer,
> can we have multiple peers, etc.
> 


I'm not sure what I can say to make it clearer but I will try.
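
For example, the doxygen text could say something like this (the wording
is only a suggestion, not final):

/**
 * A peer is the queue at the other end of the hairpin binding:
 * the peers of an Rx hairpin queue are Tx queues and vice versa.
 * A single queue can be bound to one or more peers, up to the
 * max_rx_2_tx / max_tx_2_rx limits reported by
 * rte_eth_dev_hairpin_capability_get().
 */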

> 
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change, or be removed, without prior
> notice
> > + *
> > + * A structure used to configure hairpin binding.
> > + */
> > +struct rte_eth_hairpin_conf {
> > +	uint16_t peer_n; /**< The number of peers. */
> 
> In general, I don't like one-letter abbreviations.
> Is peer_count better?
> 

O.K. I will change to count.

> > +	struct rte_eth_hairpin_peer peers[RTE_ETH_MAX_HAIRPIN_PEERS];
> > +};
> 
> 


^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [dpdk-dev] [PATCH v4 02/15] ethdev: add support for hairpin queue
  2019-10-23 10:09       ` Ori Kam
@ 2019-10-23 10:18         ` Bruce Richardson
  0 siblings, 0 replies; 186+ messages in thread
From: Bruce Richardson @ 2019-10-23 10:18 UTC (permalink / raw)
  To: Ori Kam
  Cc: Thomas Monjalon, dev, Ferruh Yigit, Andrew Rybchenko,
	jingjing.wu, stephen

On Wed, Oct 23, 2019 at 10:09:45AM +0000, Ori Kam wrote:
> Hi Thomas,
> 
> > -----Original Message-----
> > From: Thomas Monjalon <thomas@monjalon.net>
> > Sent: Wednesday, October 23, 2019 10:05 AM
> > To: Ori Kam <orika@mellanox.com>
> > Cc: dev@dpdk.org; Ferruh Yigit <ferruh.yigit@intel.com>; Andrew Rybchenko
> > <arybchenko@solarflare.com>; jingjing.wu@intel.com;
> > stephen@networkplumber.org
> > Subject: Re: [dpdk-dev] [PATCH v4 02/15] ethdev: add support for hairpin queue
> > 
> > 17/10/2019 17:32, Ori Kam:
> > > --- a/lib/librte_ethdev/rte_ethdev.h
> > > +++ b/lib/librte_ethdev/rte_ethdev.h
> > >  /**
> > > + * @warning
> > > + * @b EXPERIMENTAL: this API may change, or be removed, without prior
> > notice
> > > + *
> > > + * A structure used to return the hairpin capabilities that are supported.
> > > + */
> > > +struct rte_eth_hairpin_cap {
> > > +	uint16_t max_n_queues;
> > > +	/**< The max number of hairpin queues (different bindings). */
> > > +	uint16_t max_rx_2_tx;
> > > +	/**< Max number of Rx queues to be connected to one Tx queue. */
> > > +	uint16_t max_tx_2_rx;
> > > +	/**< Max number of Tx queues to be connected to one Rx queue. */
> > > +	uint16_t max_nb_desc; /**< The max num of descriptors. */
> > > +};
> > 
> > I think you can switch to "comment-first style" for this struct.
> > 
> 
> O.K I will change.
> 
> > 
> > > +#define RTE_ETH_MAX_HAIRPIN_PEERS 32
> > 
> > Usually I think such define is in the build config.
> > Any other opinion?
> > 
+1 for not moving it to the build config unless absolutely necessary.

^ permalink raw reply	[flat|nested] 186+ messages in thread

* [dpdk-dev] [PATCH v5 00/15] add hairpin feature
  2019-09-26  6:28 [dpdk-dev] [PATCH 00/13] add hairpin feature Ori Kam
                   ` (16 preceding siblings ...)
  2019-10-17 15:32 ` [dpdk-dev] [PATCH v4 00/15] " Ori Kam
@ 2019-10-23 13:37 ` Ori Kam
  2019-10-23 13:37   ` [dpdk-dev] [PATCH v5 01/15] ethdev: move queue state defines to private file Ori Kam
                     ` (15 more replies)
  2019-10-27 12:24 ` [dpdk-dev] [PATCH v6 00/14] " Ori Kam
  2019-10-30 23:53 ` [dpdk-dev] [PATCH v7 00/14] add hairpin feature Ori Kam
  19 siblings, 16 replies; 186+ messages in thread
From: Ori Kam @ 2019-10-23 13:37 UTC (permalink / raw)
  Cc: dev, orika, jingjing.wu, stephen, wenzhuo.lu, bernard.iremonger,
	thomas, ferruh.yigit, arybchenko, viacheslavo

This patch set implements the hairpin feature.
The hairpin feature was introduced in RFC[1]

The hairpin feature (a different name could be "forward") acts as a
"bump on the wire", meaning that a packet received from the wire can be
modified using offloaded actions and then sent back to the wire without
application intervention, which saves CPU cycles.

The hairpin is the inverse function of loopback, in which the
application sends a packet and then receives it again without the
packet being sent to the wire.

The hairpin can be used by a number of different NFVs, for example a
load balancer, a gateway and so on.

As can be seen from the hairpin description, hairpin is basically an Rx
queue connected to a Tx queue.

During the design phase I was thinking of two ways to implement this
feature: the first one is adding a new rte_flow action, and the second
one is creating a special kind of queue.

The advantages of using the queue approach:
1. More control for the application: queue depth (the memory size that
should be used).
2. Enable QoS. QoS is normally a parameter of a queue, so in this
approach it will be easy to integrate with such systems.
3. Native integration with the rte_flow API. Just setting the target
queue/RSS to a hairpin queue will result in the traffic being routed
to the hairpin queue.
4. Enable queue offloading.

Each hairpin Rxq can be connected to one Txq or a number of Txqs, which
can belong to different ports, assuming the PMD supports it. The same
goes the other way: each hairpin Txq can be connected to one or more
Rxqs. This is the reason that both the Txq setup and the Rxq setup take
the hairpin configuration structure.

From the PMD perspective the number of Rxqs/Txqs is the total of
standard queues + hairpin queues.

To configure a hairpin queue the user should call
rte_eth_rx_hairpin_queue_setup / rte_eth_tx_hairpin_queue_setup instead
of the normal queue setup functions.

The hairpin queues are not part of the normal RSS functions.

To use the queues the user simply creates a flow that points to
RSS/queue actions that are hairpin queues.
The reasons for selecting 2 new functions for hairpin queue setup are:
1. avoid an API break.
2. avoid extra and unused parameters.
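
To illustrate the intended usage, here is a minimal sketch (port_id and
the queue index are placeholders, error handling is omitted):

	struct rte_eth_hairpin_conf hairpin_conf = {
		.peer_count = 1,
		/* Bind Rx queue 1 to Tx queue 1 of the same port. */
		.peers[0] = { .port = port_id, .queue = 1 },
	};
	int ret;

	/* nb_desc == 0 means "use the PMD default". */
	ret = rte_eth_rx_hairpin_queue_setup(port_id, 1, 0, &hairpin_conf);
	if (ret == 0)
		ret = rte_eth_tx_hairpin_queue_setup(port_id, 1, 0,
						     &hairpin_conf);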



[1] https://inbox.dpdk.org/dev/1565703468-55617-1-git-send-email-orika@mellanox.com/

Cc: wenzhuo.lu@intel.com
Cc: bernard.iremonger@intel.com
Cc: thomas@monjalon.net
Cc: ferruh.yigit@intel.com
Cc: arybchenko@solarflare.com
Cc: viacheslavo@mellanox.com

------
V5:
 - modify log messages to be more distinct.
 - keep log messages on a single line even if they exceed 80 characters.
 - change peer_n to peer_count.
 - add functions to get if queue is hairpin queue.

V4:
 - update according to comments from ML.

V3:
 - update according to comments from ML.

V2:
 - update according to comments from ML.

Ori Kam (15):
  ethdev: move queue state defines to private file
  ethdev: add support for hairpin queue
  net/mlx5: query hca hairpin capabilities
  net/mlx5: support Rx hairpin queues
  net/mlx5: prepare txq to work with different types
  net/mlx5: support Tx hairpin queues
  net/mlx5: add get hairpin capabilities
  app/testpmd: add hairpin support
  net/mlx5: add hairpin binding function
  net/mlx5: add support for hairpin hrxq
  net/mlx5: add internal tag item and action
  net/mlx5: add id generation function
  net/mlx5: add default flows for hairpin
  net/mlx5: split hairpin flows
  doc: add hairpin feature

 app/test-pmd/parameters.c                |  28 +++
 app/test-pmd/testpmd.c                   | 109 ++++++++-
 app/test-pmd/testpmd.h                   |   3 +
 doc/guides/rel_notes/release_19_11.rst   |   7 +
 drivers/net/mlx5/mlx5.c                  | 170 ++++++++++++-
 drivers/net/mlx5/mlx5.h                  |  69 +++++-
 drivers/net/mlx5/mlx5_devx_cmds.c        | 194 +++++++++++++++
 drivers/net/mlx5/mlx5_ethdev.c           | 129 ++++++++--
 drivers/net/mlx5/mlx5_flow.c             | 393 ++++++++++++++++++++++++++++++-
 drivers/net/mlx5/mlx5_flow.h             |  67 +++++-
 drivers/net/mlx5/mlx5_flow_dv.c          | 231 +++++++++++++++++-
 drivers/net/mlx5/mlx5_flow_verbs.c       |  11 +-
 drivers/net/mlx5/mlx5_prm.h              | 127 +++++++++-
 drivers/net/mlx5/mlx5_rss.c              |   1 +
 drivers/net/mlx5/mlx5_rxq.c              | 318 ++++++++++++++++++++++---
 drivers/net/mlx5/mlx5_rxtx.c             |   2 +-
 drivers/net/mlx5/mlx5_rxtx.h             |  68 +++++-
 drivers/net/mlx5/mlx5_trigger.c          | 140 ++++++++++-
 drivers/net/mlx5/mlx5_txq.c              | 294 +++++++++++++++++++----
 lib/librte_ethdev/rte_ethdev.c           | 217 +++++++++++++++++
 lib/librte_ethdev/rte_ethdev.h           | 147 +++++++++++-
 lib/librte_ethdev/rte_ethdev_core.h      |  91 ++++++-
 lib/librte_ethdev/rte_ethdev_driver.h    |  50 ++++
 lib/librte_ethdev/rte_ethdev_version.map |   3 +
 24 files changed, 2702 insertions(+), 167 deletions(-)

-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 186+ messages in thread

* [dpdk-dev] [PATCH v5 01/15] ethdev: move queue state defines to private file
  2019-10-23 13:37 ` [dpdk-dev] [PATCH v5 " Ori Kam
@ 2019-10-23 13:37   ` Ori Kam
  2019-10-23 13:37   ` [dpdk-dev] [PATCH v5 02/15] ethdev: add support for hairpin queue Ori Kam
                     ` (14 subsequent siblings)
  15 siblings, 0 replies; 186+ messages in thread
From: Ori Kam @ 2019-10-23 13:37 UTC (permalink / raw)
  To: Thomas Monjalon, Ferruh Yigit, Andrew Rybchenko
  Cc: dev, orika, jingjing.wu, stephen

The queue state defines are internal to the DPDK.
This commit moves them to a private header file.

Signed-off-by: Ori Kam <orika@mellanox.com>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
---
 lib/librte_ethdev/rte_ethdev.h        | 6 ------
 lib/librte_ethdev/rte_ethdev_driver.h | 6 ++++++
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
index 33c528b..09c611a 100644
--- a/lib/librte_ethdev/rte_ethdev.h
+++ b/lib/librte_ethdev/rte_ethdev.h
@@ -1336,12 +1336,6 @@ struct rte_eth_dcb_info {
 	struct rte_eth_dcb_tc_queue_mapping tc_queue;
 };
 
-/**
- * RX/TX queue states
- */
-#define RTE_ETH_QUEUE_STATE_STOPPED 0
-#define RTE_ETH_QUEUE_STATE_STARTED 1
-
 #define RTE_ETH_ALL RTE_MAX_ETHPORTS
 
 /* Macros to check for valid port */
diff --git a/lib/librte_ethdev/rte_ethdev_driver.h b/lib/librte_ethdev/rte_ethdev_driver.h
index 936ff8c..c404f17 100644
--- a/lib/librte_ethdev/rte_ethdev_driver.h
+++ b/lib/librte_ethdev/rte_ethdev_driver.h
@@ -22,6 +22,12 @@
 #endif
 
 /**
+ * RX/TX queue states
+ */
+#define RTE_ETH_QUEUE_STATE_STOPPED 0
+#define RTE_ETH_QUEUE_STATE_STARTED 1
+
+/**
  * @internal
  * Returns a ethdev slot specified by the unique identifier name.
  *
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 186+ messages in thread

* [dpdk-dev] [PATCH v5 02/15] ethdev: add support for hairpin queue
  2019-10-23 13:37 ` [dpdk-dev] [PATCH v5 " Ori Kam
  2019-10-23 13:37   ` [dpdk-dev] [PATCH v5 01/15] ethdev: move queue state defines to private file Ori Kam
@ 2019-10-23 13:37   ` Ori Kam
  2019-10-24  7:54     ` Andrew Rybchenko
  2019-10-23 13:37   ` [dpdk-dev] [PATCH v5 03/15] net/mlx5: query hca hairpin capabilities Ori Kam
                     ` (13 subsequent siblings)
  15 siblings, 1 reply; 186+ messages in thread
From: Ori Kam @ 2019-10-23 13:37 UTC (permalink / raw)
  To: Thomas Monjalon, Ferruh Yigit, Andrew Rybchenko
  Cc: dev, orika, jingjing.wu, stephen

This commit introduces the hairpin queue type.

A hairpin queue is built from an Rx queue bound to a Tx queue.
It is used to offload traffic coming from the wire and redirect it back
to the wire.

There are 3 new functions:
- rte_eth_dev_hairpin_capability_get
- rte_eth_rx_hairpin_queue_setup
- rte_eth_tx_hairpin_queue_setup

In order to use the queue, the application needs to create an rte_flow
with a queue / RSS action that targets one or more of the Rx queues.
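
For illustration, a flow that redirects all ingress traffic to hairpin
Rx queue 1 could look like this (sketch only, the queue index is an
example and port_id is assumed to be a valid port):

	struct rte_flow_attr attr = { .ingress = 1 };
	struct rte_flow_item pattern[] = {
		{ .type = RTE_FLOW_ITEM_TYPE_ETH },
		{ .type = RTE_FLOW_ITEM_TYPE_END },
	};
	/* Queue 1 was set up with rte_eth_rx_hairpin_queue_setup(). */
	struct rte_flow_action_queue queue = { .index = 1 };
	struct rte_flow_action actions[] = {
		{ .type = RTE_FLOW_ACTION_TYPE_QUEUE, .conf = &queue },
		{ .type = RTE_FLOW_ACTION_TYPE_END },
	};
	struct rte_flow_error error;
	struct rte_flow *flow = rte_flow_create(port_id, &attr, pattern,
						actions, &error);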

Signed-off-by: Ori Kam <orika@mellanox.com>
---
V5:
 - add function to check if queue is hairpin queue.
 - modify log messages to be more distinct.
 - update log messages to be only on one line.
 - change peer_n to peer_count.

V4:
 - update according to ML comments.

V3:
 - update according to ML comments.

V2:
 - update according to ML comments.

---
 lib/librte_ethdev/rte_ethdev.c           | 217 +++++++++++++++++++++++++++++++
 lib/librte_ethdev/rte_ethdev.h           | 141 +++++++++++++++++++-
 lib/librte_ethdev/rte_ethdev_core.h      |  91 ++++++++++++-
 lib/librte_ethdev/rte_ethdev_driver.h    |  44 +++++++
 lib/librte_ethdev/rte_ethdev_version.map |   3 +
 5 files changed, 488 insertions(+), 8 deletions(-)

diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
index 78da293..199e96e 100644
--- a/lib/librte_ethdev/rte_ethdev.c
+++ b/lib/librte_ethdev/rte_ethdev.c
@@ -923,6 +923,13 @@ struct rte_eth_dev *
 
 	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_start, -ENOTSUP);
 
+	if (rte_eth_dev_is_rx_hairpin_queue(dev, rx_queue_id) == 1) {
+		RTE_ETHDEV_LOG(INFO,
+			"Can't start Rx hairpin queue %"PRIu16" of device with port_id=%"PRIu16"\n",
+			rx_queue_id, port_id);
+		return -EINVAL;
+	}
+
 	if (dev->data->rx_queue_state[rx_queue_id] != RTE_ETH_QUEUE_STATE_STOPPED) {
 		RTE_ETHDEV_LOG(INFO,
 			"Queue %"PRIu16" of device with port_id=%"PRIu16" already started\n",
@@ -950,6 +957,13 @@ struct rte_eth_dev *
 
 	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_stop, -ENOTSUP);
 
+	if (rte_eth_dev_is_rx_hairpin_queue(dev, rx_queue_id) == 1) {
+		RTE_ETHDEV_LOG(INFO,
+			"Can't stop Rx hairpin queue %"PRIu16" of device with port_id=%"PRIu16"\n",
+			rx_queue_id, port_id);
+		return -EINVAL;
+	}
+
 	if (dev->data->rx_queue_state[rx_queue_id] == RTE_ETH_QUEUE_STATE_STOPPED) {
 		RTE_ETHDEV_LOG(INFO,
 			"Queue %"PRIu16" of device with port_id=%"PRIu16" already stopped\n",
@@ -983,6 +997,13 @@ struct rte_eth_dev *
 
 	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_queue_start, -ENOTSUP);
 
+	if (rte_eth_dev_is_tx_hairpin_queue(dev, tx_queue_id) == 1) {
+		RTE_ETHDEV_LOG(INFO,
+			"Can't start Tx hairpin queue %"PRIu16" of device with port_id=%"PRIu16"\n",
+			tx_queue_id, port_id);
+		return -EINVAL;
+	}
+
 	if (dev->data->tx_queue_state[tx_queue_id] != RTE_ETH_QUEUE_STATE_STOPPED) {
 		RTE_ETHDEV_LOG(INFO,
 			"Queue %"PRIu16" of device with port_id=%"PRIu16" already started\n",
@@ -1008,6 +1029,13 @@ struct rte_eth_dev *
 
 	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_queue_stop, -ENOTSUP);
 
+	if (rte_eth_dev_is_tx_hairpin_queue(dev, tx_queue_id) == 1) {
+		RTE_ETHDEV_LOG(INFO,
+			"Can't stop Tx hairpin queue %"PRIu16" of device with port_id=%"PRIu16"\n",
+			tx_queue_id, port_id);
+		return -EINVAL;
+	}
+
 	if (dev->data->tx_queue_state[tx_queue_id] == RTE_ETH_QUEUE_STATE_STOPPED) {
 		RTE_ETHDEV_LOG(INFO,
 			"Queue %"PRIu16" of device with port_id=%"PRIu16" already stopped\n",
@@ -1780,6 +1808,79 @@ struct rte_eth_dev *
 }
 
 int
+rte_eth_rx_hairpin_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
+			       uint16_t nb_rx_desc,
+			       const struct rte_eth_hairpin_conf *conf)
+{
+	int ret;
+	struct rte_eth_dev *dev;
+	struct rte_eth_hairpin_cap cap;
+	void **rxq;
+	int i;
+	int count = 0;
+
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
+
+	dev = &rte_eth_devices[port_id];
+	if (rx_queue_id >= dev->data->nb_rx_queues) {
+		RTE_ETHDEV_LOG(ERR, "Invalid RX queue_id=%u\n", rx_queue_id);
+		return -EINVAL;
+	}
+	ret = rte_eth_dev_hairpin_capability_get(port_id, &cap);
+	if (ret != 0)
+		return ret;
+	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_hairpin_queue_setup,
+				-ENOTSUP);
+	/* Use default specified by driver, if nb_rx_desc is zero */
+	if (nb_rx_desc == 0)
+		nb_rx_desc = cap.max_nb_desc;
+	if (nb_rx_desc > cap.max_nb_desc) {
+		RTE_ETHDEV_LOG(ERR,
+			"Invalid value for nb_rx_desc(=%hu), should be: <= %hu",
+			nb_rx_desc, cap.max_nb_desc);
+		return -EINVAL;
+	}
+	if (conf->peer_count > cap.max_rx_2_tx) {
+		RTE_ETHDEV_LOG(ERR,
+			"Invalid value for number of peers for Rx queue(=%hu), should be: <= %hu",
+			conf->peer_count, cap.max_rx_2_tx);
+		return -EINVAL;
+	}
+	if (conf->peer_count == 0) {
+		RTE_ETHDEV_LOG(ERR,
+			"Invalid value for number of peers for Rx queue(=%hu), should be: > 0",
+			conf->peer_count);
+		return -EINVAL;
+	}
+	if (cap.max_nb_queues != UINT16_MAX) {
+		for (i = 0; i < dev->data->nb_rx_queues; i++) {
+			if (rte_eth_dev_is_rx_hairpin_queue(dev, i) == 1)
+				count++;
+		}
+		if (count > cap.max_nb_queues) {
+			RTE_ETHDEV_LOG(ERR, "Too many Rx hairpin queues %d",
+			count);
+			return -EINVAL;
+		}
+	}
+	if (dev->data->dev_started)
+		return -EBUSY;
+	rxq = dev->data->rx_queues;
+	if (rxq[rx_queue_id] != NULL) {
+		RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_release,
+					-ENOTSUP);
+		(*dev->dev_ops->rx_queue_release)(rxq[rx_queue_id]);
+		rxq[rx_queue_id] = NULL;
+	}
+	ret = (*dev->dev_ops->rx_hairpin_queue_setup)(dev, rx_queue_id,
+						      nb_rx_desc, conf);
+	if (ret == 0)
+		dev->data->rx_queue_state[rx_queue_id] =
+			RTE_ETH_QUEUE_STATE_HAIRPIN;
+	return eth_err(port_id, ret);
+}
+
+int
 rte_eth_tx_queue_setup(uint16_t port_id, uint16_t tx_queue_id,
 		       uint16_t nb_tx_desc, unsigned int socket_id,
 		       const struct rte_eth_txconf *tx_conf)
@@ -1878,6 +1979,78 @@ struct rte_eth_dev *
 		       tx_queue_id, nb_tx_desc, socket_id, &local_conf));
 }
 
+int
+rte_eth_tx_hairpin_queue_setup(uint16_t port_id, uint16_t tx_queue_id,
+			       uint16_t nb_tx_desc,
+			       const struct rte_eth_hairpin_conf *conf)
+{
+	struct rte_eth_dev *dev;
+	struct rte_eth_hairpin_cap cap;
+	void **txq;
+	int i;
+	int count = 0;
+	int ret;
+
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
+	dev = &rte_eth_devices[port_id];
+	if (tx_queue_id >= dev->data->nb_tx_queues) {
+		RTE_ETHDEV_LOG(ERR, "Invalid TX queue_id=%u\n", tx_queue_id);
+		return -EINVAL;
+	}
+	ret = rte_eth_dev_hairpin_capability_get(port_id, &cap);
+	if (ret != 0)
+		return ret;
+	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_hairpin_queue_setup,
+				-ENOTSUP);
+	/* Use default specified by driver, if nb_tx_desc is zero */
+	if (nb_tx_desc == 0)
+		nb_tx_desc = cap.max_nb_desc;
+	if (nb_tx_desc > cap.max_nb_desc) {
+		RTE_ETHDEV_LOG(ERR,
+			"Invalid value for nb_tx_desc(=%hu), should be: <= %hu",
+			nb_tx_desc, cap.max_nb_desc);
+		return -EINVAL;
+	}
+	if (conf->peer_count > cap.max_tx_2_rx) {
+		RTE_ETHDEV_LOG(ERR,
+			"Invalid value for number of peers for Tx queue(=%hu), should be: <= %hu",
+			conf->peer_count, cap.max_tx_2_rx);
+		return -EINVAL;
+	}
+	if (conf->peer_count == 0) {
+		RTE_ETHDEV_LOG(ERR,
+			"Invalid value for number of peers for Tx queue(=%hu), should be: > 0",
+			conf->peer_count);
+		return -EINVAL;
+	}
+	if (cap.max_nb_queues != UINT16_MAX) {
+		for (i = 0; i < dev->data->nb_tx_queues; i++) {
+			if (rte_eth_dev_is_tx_hairpin_queue(dev, i) == 1)
+				count++;
+		}
+		if (count > cap.max_nb_queues) {
+			RTE_ETHDEV_LOG(ERR,
+				       "Too many Tx hairpin queues %d", count);
+			return -EINVAL;
+		}
+	}
+	if (dev->data->dev_started)
+		return -EBUSY;
+	txq = dev->data->tx_queues;
+	if (txq[tx_queue_id] != NULL) {
+		RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_queue_release,
+					-ENOTSUP);
+		(*dev->dev_ops->tx_queue_release)(txq[tx_queue_id]);
+		txq[tx_queue_id] = NULL;
+	}
+	ret = (*dev->dev_ops->tx_hairpin_queue_setup)
+		(dev, tx_queue_id, nb_tx_desc, conf);
+	if (ret == 0)
+		dev->data->tx_queue_state[tx_queue_id] =
+			RTE_ETH_QUEUE_STATE_HAIRPIN;
+	return eth_err(port_id, ret);
+}
+
 void
 rte_eth_tx_buffer_drop_callback(struct rte_mbuf **pkts, uint16_t unsent,
 		void *userdata __rte_unused)
@@ -4007,12 +4180,19 @@ int rte_eth_set_queue_rate_limit(uint16_t port_id, uint16_t queue_idx,
 	rte_errno = ENOTSUP;
 	return NULL;
 #endif
+	struct rte_eth_dev *dev;
+
 	/* check input parameters */
 	if (!rte_eth_dev_is_valid_port(port_id) || fn == NULL ||
 		    queue_id >= rte_eth_devices[port_id].data->nb_rx_queues) {
 		rte_errno = EINVAL;
 		return NULL;
 	}
+	dev = &rte_eth_devices[port_id];
+	if (rte_eth_dev_is_rx_hairpin_queue(dev, queue_id) == 1) {
+		rte_errno = EINVAL;
+		return NULL;
+	}
 	struct rte_eth_rxtx_callback *cb = rte_zmalloc(NULL, sizeof(*cb), 0);
 
 	if (cb == NULL) {
@@ -4084,6 +4264,8 @@ int rte_eth_set_queue_rate_limit(uint16_t port_id, uint16_t queue_idx,
 	rte_errno = ENOTSUP;
 	return NULL;
 #endif
+	struct rte_eth_dev *dev;
+
 	/* check input parameters */
 	if (!rte_eth_dev_is_valid_port(port_id) || fn == NULL ||
 		    queue_id >= rte_eth_devices[port_id].data->nb_tx_queues) {
@@ -4091,6 +4273,12 @@ int rte_eth_set_queue_rate_limit(uint16_t port_id, uint16_t queue_idx,
 		return NULL;
 	}
 
+	dev = &rte_eth_devices[port_id];
+	if (rte_eth_dev_is_tx_hairpin_queue(dev, queue_id) == 1) {
+		rte_errno = EINVAL;
+		return NULL;
+	}
+
 	struct rte_eth_rxtx_callback *cb = rte_zmalloc(NULL, sizeof(*cb), 0);
 
 	if (cb == NULL) {
@@ -4204,6 +4392,13 @@ int rte_eth_set_queue_rate_limit(uint16_t port_id, uint16_t queue_idx,
 		return -EINVAL;
 	}
 
+	if (rte_eth_dev_is_rx_hairpin_queue(dev, queue_id) == 1) {
+		RTE_ETHDEV_LOG(INFO,
+			"Can't get info for Rx hairpin queue %"PRIu16" of device with port_id=%"PRIu16"\n",
+			queue_id, port_id);
+		return -EINVAL;
+	}
+
 	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rxq_info_get, -ENOTSUP);
 
 	memset(qinfo, 0, sizeof(*qinfo));
@@ -4228,6 +4423,13 @@ int rte_eth_set_queue_rate_limit(uint16_t port_id, uint16_t queue_idx,
 		return -EINVAL;
 	}
 
+	if (rte_eth_dev_is_tx_hairpin_queue(dev, queue_id) == 1) {
+		RTE_ETHDEV_LOG(INFO,
+			"Can't get info for Tx hairpin queue %"PRIu16" of device with port_id=%"PRIu16"\n",
+			queue_id, port_id);
+		return -EINVAL;
+	}
+
 	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->txq_info_get, -ENOTSUP);
 
 	memset(qinfo, 0, sizeof(*qinfo));
@@ -4600,6 +4802,21 @@ int rte_eth_set_queue_rate_limit(uint16_t port_id, uint16_t queue_idx,
 }
 
 int
+rte_eth_dev_hairpin_capability_get(uint16_t port_id,
+				   struct rte_eth_hairpin_cap *cap)
+{
+	struct rte_eth_dev *dev;
+
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
+
+	dev = &rte_eth_devices[port_id];
+	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->hairpin_cap_get,
+				-ENOTSUP);
+	memset(cap, 0, sizeof(*cap));
+	return eth_err(port_id, (*dev->dev_ops->hairpin_cap_get)(dev, cap));
+}
+
+int
 rte_eth_dev_pool_ops_supported(uint16_t port_id, const char *pool)
 {
 	struct rte_eth_dev *dev;
diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
index 09c611a..24b7a3c 100644
--- a/lib/librte_ethdev/rte_ethdev.h
+++ b/lib/librte_ethdev/rte_ethdev.h
@@ -839,6 +839,46 @@ struct rte_eth_txconf {
 };
 
 /**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * A structure used to return the hairpin capabilities that are supported.
+ */
+struct rte_eth_hairpin_cap {
+	/** The max number of hairpin queues (different bindings). */
+	uint16_t max_nb_queues;
+	/** Max number of Rx queues to be connected to one Tx queue. */
+	uint16_t max_rx_2_tx;
+	/** Max number of Tx queues to be connected to one Rx queue. */
+	uint16_t max_tx_2_rx;
+	uint16_t max_nb_desc; /**< The max num of descriptors. */
+};
+
+#define RTE_ETH_MAX_HAIRPIN_PEERS 32
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * A structure used to hold hairpin peer data.
+ */
+struct rte_eth_hairpin_peer {
+	uint16_t port; /**< Peer port. */
+	uint16_t queue; /**< Peer queue. */
+};
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * A structure used to configure hairpin binding.
+ */
+struct rte_eth_hairpin_conf {
+	uint16_t peer_count; /**< The number of peers. */
+	struct rte_eth_hairpin_peer peers[RTE_ETH_MAX_HAIRPIN_PEERS];
+};
+
+/**
  * A structure contains information about HW descriptor ring limitations.
  */
 struct rte_eth_desc_lim {
@@ -1829,6 +1869,37 @@ int rte_eth_rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
 		struct rte_mempool *mb_pool);
 
 /**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Allocate and set up a hairpin receive queue for an Ethernet device.
+ *
+ * The function sets up the selected queue to be used in hairpin mode.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param rx_queue_id
+ *   The index of the receive queue to set up.
+ *   The value must be in the range [0, nb_rx_queue - 1] previously supplied
+ *   to rte_eth_dev_configure().
+ * @param nb_rx_desc
+ *   The number of receive descriptors to allocate for the receive ring.
+ *   0 means the PMD will use default value.
+ * @param conf
+ *   The pointer to the hairpin configuration.
+ *
+ * @return
+ *   - (0) if successful.
+ *   - (-ENOTSUP) if hardware doesn't support.
+ *   - (-EINVAL) if bad parameter.
+ *   - (-ENOMEM) if unable to allocate the resources.
+ */
+__rte_experimental
+int rte_eth_rx_hairpin_queue_setup
+	(uint16_t port_id, uint16_t rx_queue_id, uint16_t nb_rx_desc,
+	 const struct rte_eth_hairpin_conf *conf);
+
+/**
  * Allocate and set up a transmit queue for an Ethernet device.
  *
  * @param port_id
@@ -1881,6 +1952,35 @@ int rte_eth_tx_queue_setup(uint16_t port_id, uint16_t tx_queue_id,
 		const struct rte_eth_txconf *tx_conf);
 
 /**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Allocate and set up a transmit hairpin queue for an Ethernet device.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param tx_queue_id
+ *   The index of the transmit queue to set up.
+ *   The value must be in the range [0, nb_tx_queue - 1] previously supplied
+ *   to rte_eth_dev_configure().
+ * @param nb_tx_desc
+ *   The number of transmit descriptors to allocate for the transmit ring.
+ *   0 to set default PMD value.
+ * @param conf
+ *   The hairpin configuration.
+ *
+ * @return
+ *   - (0) if successful.
+ *   - (-ENOTSUP) if hardware doesn't support.
+ *   - (-EINVAL) if bad parameter.
+ *   - (-ENOMEM) if unable to allocate the resources.
+ */
+__rte_experimental
+int rte_eth_tx_hairpin_queue_setup
+	(uint16_t port_id, uint16_t tx_queue_id, uint16_t nb_tx_desc,
+	 const struct rte_eth_hairpin_conf *conf);
+
+/**
  * Return the NUMA socket to which an Ethernet device is connected
  *
  * @param port_id
@@ -1915,7 +2015,7 @@ int rte_eth_tx_queue_setup(uint16_t port_id, uint16_t tx_queue_id,
  *   to rte_eth_dev_configure().
  * @return
  *   - 0: Success, the receive queue is started.
- *   - -EINVAL: The port_id or the queue_id out of range.
+ *   - -EINVAL: The port_id or the queue_id out of range or the queue belongs to hairpin.
  *   - -EIO: if device is removed.
  *   - -ENOTSUP: The function not supported in PMD driver.
  */
@@ -1932,7 +2032,7 @@ int rte_eth_tx_queue_setup(uint16_t port_id, uint16_t tx_queue_id,
  *   to rte_eth_dev_configure().
  * @return
  *   - 0: Success, the receive queue is stopped.
- *   - -EINVAL: The port_id or the queue_id out of range.
+ *   - -EINVAL: The port_id or the queue_id out of range or the queue belongs to hairpin.
  *   - -EIO: if device is removed.
  *   - -ENOTSUP: The function not supported in PMD driver.
  */
@@ -1950,7 +2050,7 @@ int rte_eth_tx_queue_setup(uint16_t port_id, uint16_t tx_queue_id,
  *   to rte_eth_dev_configure().
  * @return
  *   - 0: Success, the transmit queue is started.
- *   - -EINVAL: The port_id or the queue_id out of range.
+ *   - -EINVAL: The port_id or the queue_id out of range or the queue belongs to hairpin.
  *   - -EIO: if device is removed.
  *   - -ENOTSUP: The function not supported in PMD driver.
  */
@@ -1967,7 +2067,7 @@ int rte_eth_tx_queue_setup(uint16_t port_id, uint16_t tx_queue_id,
  *   to rte_eth_dev_configure().
  * @return
  *   - 0: Success, the transmit queue is stopped.
- *   - -EINVAL: The port_id or the queue_id out of range.
+ *   - -EINVAL: The port_id or the queue_id out of range or the queue belongs to hairpin.
  *   - -EIO: if device is removed.
  *   - -ENOTSUP: The function not supported in PMD driver.
  */
@@ -3633,7 +3733,8 @@ int rte_eth_remove_tx_callback(uint16_t port_id, uint16_t queue_id,
  * @return
  *   - 0: Success
  *   - -ENOTSUP: routine is not supported by the device PMD.
- *   - -EINVAL:  The port_id or the queue_id is out of range.
+ *   - -EINVAL:  The port_id or the queue_id is out of range, or the queue
+ *               is a hairpin queue.
  */
 int rte_eth_rx_queue_info_get(uint16_t port_id, uint16_t queue_id,
 	struct rte_eth_rxq_info *qinfo);
@@ -3653,7 +3754,8 @@ int rte_eth_rx_queue_info_get(uint16_t port_id, uint16_t queue_id,
  * @return
  *   - 0: Success
  *   - -ENOTSUP: routine is not supported by the device PMD.
- *   - -EINVAL:  The port_id or the queue_id is out of range.
+ *   - -EINVAL:  The port_id or the queue_id is out of range, or the queue
+ *               is a hairpin queue.
  */
 int rte_eth_tx_queue_info_get(uint16_t port_id, uint16_t queue_id,
 	struct rte_eth_txq_info *qinfo);
@@ -4151,6 +4253,23 @@ int rte_eth_dev_adjust_nb_rx_tx_desc(uint16_t port_id,
 void *
 rte_eth_dev_get_sec_ctx(uint16_t port_id);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Query the device hairpin capabilities.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param cap
+ *   Pointer to a structure that will hold the hairpin capabilities.
+ * @return
+ *   - (0) if successful.
+ *   - (-ENOTSUP) if hardware doesn't support.
+ */
+__rte_experimental
+int rte_eth_dev_hairpin_capability_get(uint16_t port_id,
+				       struct rte_eth_hairpin_cap *cap);
 
 #include <rte_ethdev_core.h>
 
@@ -4251,6 +4370,11 @@ int rte_eth_dev_adjust_nb_rx_tx_desc(uint16_t port_id,
 		RTE_ETHDEV_LOG(ERR, "Invalid RX queue_id=%u\n", queue_id);
 		return 0;
 	}
+	if (rte_eth_dev_is_rx_hairpin_queue(dev, queue_id) == 1) {
+		RTE_ETHDEV_LOG(ERR, "Rx burst failed, queue_id=%u is hairpin queue\n",
+			       queue_id);
+		return 0;
+	}
 #endif
 	nb_rx = (*dev->rx_pkt_burst)(dev->data->rx_queues[queue_id],
 				     rx_pkts, nb_pkts);
@@ -4517,6 +4641,11 @@ static inline int rte_eth_tx_descriptor_status(uint16_t port_id,
 		RTE_ETHDEV_LOG(ERR, "Invalid TX queue_id=%u\n", queue_id);
 		return 0;
 	}
+	if (rte_eth_dev_is_tx_hairpin_queue(dev, queue_id) == 1) {
+		RTE_ETHDEV_LOG(ERR, "Tx burst failed, queue_id=%u is hairpin queue\n",
+			       queue_id);
+		return 0;
+	}
 #endif
 
 #ifdef RTE_ETHDEV_RXTX_CALLBACKS
diff --git a/lib/librte_ethdev/rte_ethdev_core.h b/lib/librte_ethdev/rte_ethdev_core.h
index 392aea8..f215af7 100644
--- a/lib/librte_ethdev/rte_ethdev_core.h
+++ b/lib/librte_ethdev/rte_ethdev_core.h
@@ -509,6 +509,86 @@ typedef int (*eth_pool_ops_supported_t)(struct rte_eth_dev *dev,
 /**< @internal Test if a port supports specific mempool ops */
 
 /**
+ * @internal
+ * Get the hairpin capabilities.
+ *
+ * @param dev
+ *   ethdev handle of port.
+ * @param cap
+ *   returns the hairpin capabilities from the device.
+ *
+ * @return
+ *   Negative errno value on error, 0 on success.
+ *
+ * @retval 0
+ *   Success, hairpin is supported.
+ * @retval -ENOTSUP
+ *   Hairpin is not supported.
+ */
+typedef int (*eth_hairpin_cap_get_t)(struct rte_eth_dev *dev,
+				     struct rte_eth_hairpin_cap *cap);
+
+/**
+ * @internal
+ * Setup RX hairpin queue.
+ *
+ * @param dev
+ *   ethdev handle of port.
+ * @param rx_queue_id
+ *   the selected RX queue index.
+ * @param nb_rx_desc
+ *   the requested number of descriptors for this queue. 0 - use PMD default.
+ * @param conf
+ *   the RX hairpin configuration structure.
+ *
+ * @return
+ *   Negative errno value on error, 0 on success.
+ *
+ * @retval 0
+ *   Success, hairpin is supported.
+ * @retval -ENOTSUP
+ *   Hairpin is not supported.
+ * @retval -EINVAL
+ *   One of the parameters is invalid.
+ * @retval -ENOMEM
+ *   Unable to allocate resources.
+ */
+typedef int (*eth_rx_hairpin_queue_setup_t)
+	(struct rte_eth_dev *dev, uint16_t rx_queue_id,
+	 uint16_t nb_rx_desc,
+	 const struct rte_eth_hairpin_conf *conf);
+
+/**
+ * @internal
+ * Setup TX hairpin queue.
+ *
+ * @param dev
+ *   ethdev handle of port.
+ * @param tx_queue_id
+ *   the selected TX queue index.
+ * @param nb_tx_desc
+ *   the requested number of descriptors for this queue. 0 - use PMD default.
+ * @param conf
+ *   the TX hairpin configuration structure.
+ *
+ * @return
+ *   Negative errno value on error, 0 on success.
+ *
+ * @retval 0
+ *   Success, hairpin is supported.
+ * @retval -ENOTSUP
+ *   Hairpin is not supported.
+ * @retval -EINVAL
+ *   One of the parameters is invalid.
+ * @retval -ENOMEM
+ *   Unable to allocate resources.
+ */
+typedef int (*eth_tx_hairpin_queue_setup_t)
+	(struct rte_eth_dev *dev, uint16_t tx_queue_id,
+	 uint16_t nb_tx_desc,
+	 const struct rte_eth_hairpin_conf *hairpin_conf);
+
+/**
  * @internal A structure containing the functions exported by an Ethernet driver.
  */
 struct eth_dev_ops {
@@ -644,6 +724,13 @@ struct eth_dev_ops {
 
 	eth_pool_ops_supported_t pool_ops_supported;
 	/**< Test if a port supports specific mempool ops */
+
+	eth_hairpin_cap_get_t hairpin_cap_get;
+	/**< Returns the hairpin capabilities. */
+	eth_rx_hairpin_queue_setup_t rx_hairpin_queue_setup;
+	/**< Set up device RX hairpin queue. */
+	eth_tx_hairpin_queue_setup_t tx_hairpin_queue_setup;
+	/**< Set up device TX hairpin queue. */
 };
 
 /**
@@ -751,9 +838,9 @@ struct rte_eth_dev_data {
 		dev_started : 1,   /**< Device state: STARTED(1) / STOPPED(0). */
 		lro         : 1;   /**< RX LRO is ON(1) / OFF(0) */
 	uint8_t rx_queue_state[RTE_MAX_QUEUES_PER_PORT];
-			/**< Queues state: STARTED(1) / STOPPED(0). */
+		/**< Queues state: HAIRPIN(2) / STARTED(1) / STOPPED(0). */
 	uint8_t tx_queue_state[RTE_MAX_QUEUES_PER_PORT];
-			/**< Queues state: STARTED(1) / STOPPED(0). */
+		/**< Queues state: HAIRPIN(2) / STARTED(1) / STOPPED(0). */
 	uint32_t dev_flags;             /**< Capabilities. */
 	enum rte_kernel_driver kdrv;    /**< Kernel driver passthrough. */
 	int numa_node;                  /**< NUMA node connection. */
diff --git a/lib/librte_ethdev/rte_ethdev_driver.h b/lib/librte_ethdev/rte_ethdev_driver.h
index c404f17..98023d7 100644
--- a/lib/librte_ethdev/rte_ethdev_driver.h
+++ b/lib/librte_ethdev/rte_ethdev_driver.h
@@ -26,6 +26,50 @@
  */
 #define RTE_ETH_QUEUE_STATE_STOPPED 0
 #define RTE_ETH_QUEUE_STATE_STARTED 1
+#define RTE_ETH_QUEUE_STATE_HAIRPIN 2
+
+/**
+ * @internal
+ * Check if the selected Rx queue is hairpin queue.
+ *
+ * @param dev
+ *  Pointer to the selected device.
+ * @param queue_id
+ *  The selected queue.
+ *
+ * @return
+ *   - (1) if the queue is hairpin queue, 0 otherwise.
+ */
+static inline int
+rte_eth_dev_is_rx_hairpin_queue(struct rte_eth_dev *dev, uint16_t queue_id)
+{
+	if (dev->data->rx_queue_state[queue_id] ==
+	    RTE_ETH_QUEUE_STATE_HAIRPIN)
+		return 1;
+	return 0;
+}
+
+
+/**
+ * @internal
+ * Check if the selected Tx queue is hairpin queue.
+ *
+ * @param dev
+ *  Pointer to the selected device.
+ * @param queue_id
+ *  The selected queue.
+ *
+ * @return
+ *   - (1) if the queue is hairpin queue, 0 otherwise.
+ */
+static inline int
+rte_eth_dev_is_tx_hairpin_queue(struct rte_eth_dev *dev, uint16_t queue_id)
+{
+	if (dev->data->tx_queue_state[queue_id] ==
+	    RTE_ETH_QUEUE_STATE_HAIRPIN)
+		return 1;
+	return 0;
+}
 
 /**
  * @internal
diff --git a/lib/librte_ethdev/rte_ethdev_version.map b/lib/librte_ethdev/rte_ethdev_version.map
index e59d516..48b5389 100644
--- a/lib/librte_ethdev/rte_ethdev_version.map
+++ b/lib/librte_ethdev/rte_ethdev_version.map
@@ -288,4 +288,7 @@ EXPERIMENTAL {
 	rte_eth_rx_burst_mode_get;
 	rte_eth_tx_burst_mode_get;
 	rte_eth_burst_mode_option_name;
+	rte_eth_rx_hairpin_queue_setup;
+	rte_eth_tx_hairpin_queue_setup;
+	rte_eth_dev_hairpin_capability_get;
 };
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 186+ messages in thread

* [dpdk-dev] [PATCH v5 03/15] net/mlx5: query hca hairpin capabilities
  2019-10-23 13:37 ` [dpdk-dev] [PATCH v5 " Ori Kam
  2019-10-23 13:37   ` [dpdk-dev] [PATCH v5 01/15] ethdev: move queue state defines to private file Ori Kam
  2019-10-23 13:37   ` [dpdk-dev] [PATCH v5 02/15] ethdev: add support for hairpin queue Ori Kam
@ 2019-10-23 13:37   ` Ori Kam
  2019-10-23 13:37   ` [dpdk-dev] [PATCH v5 04/15] net/mlx5: support Rx hairpin queues Ori Kam
                     ` (12 subsequent siblings)
  15 siblings, 0 replies; 186+ messages in thread
From: Ori Kam @ 2019-10-23 13:37 UTC (permalink / raw)
  To: Matan Azrad, Shahaf Shuler, Viacheslav Ovsiienko
  Cc: dev, orika, jingjing.wu, stephen

This commit queries and stores the hairpin capabilities from the device.

Those capabilities will be used when creating the hairpin queue.
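
On the application side, the capability query could be used like this
(illustrative sketch only, assuming port_id is a valid port):

	struct rte_eth_hairpin_cap cap;

	if (rte_eth_dev_hairpin_capability_get(port_id, &cap) != 0)
		return -ENOTSUP; /* no hairpin support on this port */
	/* nb_desc and peer_count passed to the hairpin queue setup
	 * functions must respect cap.max_nb_desc, cap.max_rx_2_tx
	 * and cap.max_tx_2_rx.
	 */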

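As an illustration, the stored attributes can later gate hairpin queue
creation; a sketch with a hypothetical helper name, using the fields as
defined in the diff below:

/* Sketch: validate a hairpin queue request against the queried caps. */
static int
mlx5_hairpin_caps_check(struct mlx5_hca_attr *attr, uint16_t nb_queues)
{
	if (!attr->hairpin)
		return -ENOTSUP; /* device does not support hairpin at all */
	if (nb_queues > (1u << attr->log_max_hairpin_queues))
		return -EINVAL; /* more hairpin queues than the HCA allows */
	return 0;
}
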
Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/net/mlx5/mlx5.h           | 4 ++++
 drivers/net/mlx5/mlx5_devx_cmds.c | 7 +++++++
 2 files changed, 11 insertions(+)

diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index b6a51b2..ee04dd0 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -187,6 +187,10 @@ struct mlx5_hca_attr {
 	uint32_t lro_max_msg_sz_mode:2;
 	uint32_t lro_timer_supported_periods[MLX5_LRO_NUM_SUPP_PERIODS];
 	uint32_t flex_parser_protocols;
+	uint32_t hairpin:1;
+	uint32_t log_max_hairpin_queues:5;
+	uint32_t log_max_hairpin_wq_data_sz:5;
+	uint32_t log_max_hairpin_num_packets:5;
 };
 
 /* Flow list . */
diff --git a/drivers/net/mlx5/mlx5_devx_cmds.c b/drivers/net/mlx5/mlx5_devx_cmds.c
index 51947d3..17c1671 100644
--- a/drivers/net/mlx5/mlx5_devx_cmds.c
+++ b/drivers/net/mlx5/mlx5_devx_cmds.c
@@ -327,6 +327,13 @@ struct mlx5_devx_obj *
 	attr->flow_counters_dump = MLX5_GET(cmd_hca_cap, hcattr,
 					    flow_counters_dump);
 	attr->eswitch_manager = MLX5_GET(cmd_hca_cap, hcattr, eswitch_manager);
+	attr->hairpin = MLX5_GET(cmd_hca_cap, hcattr, hairpin);
+	attr->log_max_hairpin_queues = MLX5_GET(cmd_hca_cap, hcattr,
+						log_max_hairpin_queues);
+	attr->log_max_hairpin_wq_data_sz = MLX5_GET(cmd_hca_cap, hcattr,
+						    log_max_hairpin_wq_data_sz);
+	attr->log_max_hairpin_num_packets = MLX5_GET
+		(cmd_hca_cap, hcattr, log_max_hairpin_num_packets);
 	attr->eth_net_offloads = MLX5_GET(cmd_hca_cap, hcattr,
 					  eth_net_offloads);
 	attr->eth_virt = MLX5_GET(cmd_hca_cap, hcattr, eth_virt);
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 186+ messages in thread

* [dpdk-dev] [PATCH v5 04/15] net/mlx5: support Rx hairpin queues
  2019-10-23 13:37 ` [dpdk-dev] [PATCH v5 " Ori Kam
                     ` (2 preceding siblings ...)
  2019-10-23 13:37   ` [dpdk-dev] [PATCH v5 03/15] net/mlx5: query hca hairpin capabilities Ori Kam
@ 2019-10-23 13:37   ` Ori Kam
  2019-10-23 13:37   ` [dpdk-dev] [PATCH v5 05/15] net/mlx5: prepare txq to work with different types Ori Kam
                     ` (11 subsequent siblings)
  15 siblings, 0 replies; 186+ messages in thread
From: Ori Kam @ 2019-10-23 13:37 UTC (permalink / raw)
  To: Matan Azrad, Shahaf Shuler, Viacheslav Ovsiienko
  Cc: dev, orika, jingjing.wu, stephen

This commit adds support for creating Rx hairpin queues.
A hairpin queue is a queue that is created using DevX and used only by
the HW, so the data path part of the RQ is left unused.

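From the application side such a queue is created with the hairpin
setup call added earlier in the series; a minimal sketch, assuming
port_id, rxq_id, peer_txq and nb_desc are already chosen:

struct rte_eth_hairpin_conf conf = {
	.peer_count = 1,
	.peers[0] = { .port = port_id, .queue = peer_txq },
};
/* In this version the peer must be a Tx queue on the same port. */
int ret = rte_eth_rx_hairpin_queue_setup(port_id, rxq_id, nb_desc, &conf);

if (ret != 0)
	rte_exit(EXIT_FAILURE, "Rx hairpin setup failed: %d\n", ret);
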
Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/net/mlx5/mlx5.c         |   2 +
 drivers/net/mlx5/mlx5_rxq.c     | 270 ++++++++++++++++++++++++++++++++++++----
 drivers/net/mlx5/mlx5_rxtx.h    |  15 +++
 drivers/net/mlx5/mlx5_trigger.c |   7 ++
 4 files changed, 270 insertions(+), 24 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index fac5105..6be423f 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -985,6 +985,7 @@ struct mlx5_dev_spawn_data {
 	.dev_supported_ptypes_get = mlx5_dev_supported_ptypes_get,
 	.vlan_filter_set = mlx5_vlan_filter_set,
 	.rx_queue_setup = mlx5_rx_queue_setup,
+	.rx_hairpin_queue_setup = mlx5_rx_hairpin_queue_setup,
 	.tx_queue_setup = mlx5_tx_queue_setup,
 	.rx_queue_release = mlx5_rx_queue_release,
 	.tx_queue_release = mlx5_tx_queue_release,
@@ -1051,6 +1052,7 @@ struct mlx5_dev_spawn_data {
 	.dev_supported_ptypes_get = mlx5_dev_supported_ptypes_get,
 	.vlan_filter_set = mlx5_vlan_filter_set,
 	.rx_queue_setup = mlx5_rx_queue_setup,
+	.rx_hairpin_queue_setup = mlx5_rx_hairpin_queue_setup,
 	.tx_queue_setup = mlx5_tx_queue_setup,
 	.rx_queue_release = mlx5_rx_queue_release,
 	.tx_queue_release = mlx5_tx_queue_release,
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index f0ab843..c70e161 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -106,21 +106,25 @@
 	struct mlx5_priv *priv = dev->data->dev_private;
 	uint16_t i;
 	uint16_t n = 0;
+	uint16_t n_ibv = 0;
 
 	if (mlx5_check_mprq_support(dev) < 0)
 		return 0;
 	/* All the configured queues should be enabled. */
 	for (i = 0; i < priv->rxqs_n; ++i) {
 		struct mlx5_rxq_data *rxq = (*priv->rxqs)[i];
+		struct mlx5_rxq_ctrl *rxq_ctrl = container_of
+			(rxq, struct mlx5_rxq_ctrl, rxq);
 
-		if (!rxq)
+		if (rxq == NULL || rxq_ctrl->type != MLX5_RXQ_TYPE_STANDARD)
 			continue;
+		n_ibv++;
 		if (mlx5_rxq_mprq_enabled(rxq))
 			++n;
 	}
 	/* Multi-Packet RQ can't be partially configured. */
-	assert(n == 0 || n == priv->rxqs_n);
-	return n == priv->rxqs_n;
+	assert(n == 0 || n == n_ibv);
+	return n == n_ibv;
 }
 
 /**
@@ -427,6 +431,7 @@
 }
 
 /**
+ * Rx queue presetup checks.
  *
  * @param dev
  *   Pointer to Ethernet device structure.
@@ -434,25 +439,14 @@
  *   RX queue index.
  * @param desc
  *   Number of descriptors to configure in queue.
- * @param socket
- *   NUMA socket on which memory must be allocated.
- * @param[in] conf
- *   Thresholds parameters.
- * @param mp
- *   Memory pool for buffer allocations.
  *
  * @return
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
-int
-mlx5_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
-		    unsigned int socket, const struct rte_eth_rxconf *conf,
-		    struct rte_mempool *mp)
+static int
+mlx5_rx_queue_pre_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
-	struct mlx5_rxq_data *rxq = (*priv->rxqs)[idx];
-	struct mlx5_rxq_ctrl *rxq_ctrl =
-		container_of(rxq, struct mlx5_rxq_ctrl, rxq);
 
 	if (!rte_is_power_of_2(desc)) {
 		desc = 1 << log2above(desc);
@@ -476,6 +470,41 @@
 		return -rte_errno;
 	}
 	mlx5_rxq_release(dev, idx);
+	return 0;
+}
+
+/**
+ * DPDK callback to configure a Rx queue.
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ * @param idx
+ *   RX queue index.
+ * @param desc
+ *   Number of descriptors to configure in queue.
+ * @param socket
+ *   NUMA socket on which memory must be allocated.
+ * @param[in] conf
+ *   Thresholds parameters.
+ * @param mp
+ *   Memory pool for buffer allocations.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
+		    unsigned int socket, const struct rte_eth_rxconf *conf,
+		    struct rte_mempool *mp)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_rxq_data *rxq = (*priv->rxqs)[idx];
+	struct mlx5_rxq_ctrl *rxq_ctrl =
+		container_of(rxq, struct mlx5_rxq_ctrl, rxq);
+	int res;
+
+	res = mlx5_rx_queue_pre_setup(dev, idx, desc);
+	if (res)
+		return res;
 	rxq_ctrl = mlx5_rxq_new(dev, idx, desc, socket, conf, mp);
 	if (!rxq_ctrl) {
 		DRV_LOG(ERR, "port %u unable to allocate queue index %u",
@@ -490,6 +519,56 @@
 }
 
 /**
+ * DPDK callback to configure a Rx hairpin queue.
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ * @param idx
+ *   RX queue index.
+ * @param desc
+ *   Number of descriptors to configure in queue.
+ * @param hairpin_conf
+ *   Hairpin configuration parameters.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_rx_hairpin_queue_setup(struct rte_eth_dev *dev, uint16_t idx,
+			    uint16_t desc,
+			    const struct rte_eth_hairpin_conf *hairpin_conf)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_rxq_data *rxq = (*priv->rxqs)[idx];
+	struct mlx5_rxq_ctrl *rxq_ctrl =
+		container_of(rxq, struct mlx5_rxq_ctrl, rxq);
+	int res;
+
+	res = mlx5_rx_queue_pre_setup(dev, idx, desc);
+	if (res)
+		return res;
+	if (hairpin_conf->peer_count != 1 ||
+	    hairpin_conf->peers[0].port != dev->data->port_id ||
+	    hairpin_conf->peers[0].queue >= priv->txqs_n) {
+		DRV_LOG(ERR, "port %u unable to setup hairpin queue index %u "
+			" invalid hairpind configuration", dev->data->port_id,
+			idx);
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+	rxq_ctrl = mlx5_rxq_hairpin_new(dev, idx, desc, hairpin_conf);
+	if (!rxq_ctrl) {
+		DRV_LOG(ERR, "port %u unable to allocate queue index %u",
+			dev->data->port_id, idx);
+		rte_errno = ENOMEM;
+		return -rte_errno;
+	}
+	DRV_LOG(DEBUG, "port %u adding Rx queue %u to list",
+		dev->data->port_id, idx);
+	(*priv->rxqs)[idx] = &rxq_ctrl->rxq;
+	return 0;
+}
+
+/**
  * DPDK callback to release a RX queue.
  *
  * @param dpdk_rxq
@@ -561,6 +640,24 @@
 }
 
 /**
+ * Release Rx hairpin related resources.
+ *
+ * @param rxq_obj
+ *   Hairpin Rx queue object.
+ */
+static void
+rxq_obj_hairpin_release(struct mlx5_rxq_obj *rxq_obj)
+{
+	struct mlx5_devx_modify_rq_attr rq_attr = { 0 };
+
+	assert(rxq_obj);
+	rq_attr.state = MLX5_RQC_STATE_RST;
+	rq_attr.rq_state = MLX5_RQC_STATE_RDY;
+	mlx5_devx_cmd_modify_rq(rxq_obj->rq, &rq_attr);
+	claim_zero(mlx5_devx_cmd_destroy(rxq_obj->rq));
+}
+
+/**
  * Release an Rx verbs/DevX queue object.
  *
  * @param rxq_obj
@@ -577,14 +674,22 @@
 		assert(rxq_obj->wq);
 	assert(rxq_obj->cq);
 	if (rte_atomic32_dec_and_test(&rxq_obj->refcnt)) {
-		rxq_free_elts(rxq_obj->rxq_ctrl);
-		if (rxq_obj->type == MLX5_RXQ_OBJ_TYPE_IBV) {
+		switch (rxq_obj->type) {
+		case MLX5_RXQ_OBJ_TYPE_IBV:
+			rxq_free_elts(rxq_obj->rxq_ctrl);
 			claim_zero(mlx5_glue->destroy_wq(rxq_obj->wq));
-		} else if (rxq_obj->type == MLX5_RXQ_OBJ_TYPE_DEVX_RQ) {
+			claim_zero(mlx5_glue->destroy_cq(rxq_obj->cq));
+			break;
+		case MLX5_RXQ_OBJ_TYPE_DEVX_RQ:
+			rxq_free_elts(rxq_obj->rxq_ctrl);
 			claim_zero(mlx5_devx_cmd_destroy(rxq_obj->rq));
 			rxq_release_rq_resources(rxq_obj->rxq_ctrl);
+			claim_zero(mlx5_glue->destroy_cq(rxq_obj->cq));
+			break;
+		case MLX5_RXQ_OBJ_TYPE_DEVX_HAIRPIN:
+			rxq_obj_hairpin_release(rxq_obj);
+			break;
 		}
-		claim_zero(mlx5_glue->destroy_cq(rxq_obj->cq));
 		if (rxq_obj->channel)
 			claim_zero(mlx5_glue->destroy_comp_channel
 				   (rxq_obj->channel));
@@ -1132,6 +1237,70 @@
 }
 
 /**
+ * Create the Rx hairpin queue object.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param idx
+ *   Queue index in DPDK Rx queue array
+ *
+ * @return
+ *   The hairpin DevX object initialised, NULL otherwise and rte_errno is set.
+ */
+static struct mlx5_rxq_obj *
+mlx5_rxq_obj_hairpin_new(struct rte_eth_dev *dev, uint16_t idx)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_rxq_data *rxq_data = (*priv->rxqs)[idx];
+	struct mlx5_rxq_ctrl *rxq_ctrl =
+		container_of(rxq_data, struct mlx5_rxq_ctrl, rxq);
+	struct mlx5_devx_create_rq_attr attr = { 0 };
+	struct mlx5_rxq_obj *tmpl = NULL;
+	int ret = 0;
+
+	assert(rxq_data);
+	assert(!rxq_ctrl->obj);
+	tmpl = rte_calloc_socket(__func__, 1, sizeof(*tmpl), 0,
+				 rxq_ctrl->socket);
+	if (!tmpl) {
+		DRV_LOG(ERR,
+			"port %u Rx queue %u cannot allocate verbs resources",
+			dev->data->port_id, rxq_data->idx);
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+	tmpl->type = MLX5_RXQ_OBJ_TYPE_DEVX_HAIRPIN;
+	tmpl->rxq_ctrl = rxq_ctrl;
+	attr.hairpin = 1;
+	/* Workaround for hairpin startup */
+	attr.wq_attr.log_hairpin_num_packets = log2above(32);
+	/* Workaround for packets larger than 1KB */
+	attr.wq_attr.log_hairpin_data_sz =
+			priv->config.hca_attr.log_max_hairpin_wq_data_sz;
+	tmpl->rq = mlx5_devx_cmd_create_rq(priv->sh->ctx, &attr,
+					   rxq_ctrl->socket);
+	if (!tmpl->rq) {
+		DRV_LOG(ERR,
+			"port %u Rx hairpin queue %u can't create rq object",
+			dev->data->port_id, idx);
+		rte_errno = errno;
+		goto error;
+	}
+	DRV_LOG(DEBUG, "port %u rxq %u updated with %p", dev->data->port_id,
+		idx, (void *)&tmpl);
+	rte_atomic32_inc(&tmpl->refcnt);
+	LIST_INSERT_HEAD(&priv->rxqsobj, tmpl, next);
+	priv->verbs_alloc_ctx.type = MLX5_VERBS_ALLOC_TYPE_NONE;
+	return tmpl;
+error:
+	ret = rte_errno; /* Save rte_errno before cleanup. */
+	if (tmpl->rq)
+		mlx5_devx_cmd_destroy(tmpl->rq);
+	rte_errno = ret; /* Restore rte_errno. */
+	return NULL;
+}
+
+/**
  * Create the Rx queue Verbs/DevX object.
  *
  * @param dev
@@ -1163,6 +1332,8 @@ struct mlx5_rxq_obj *
 
 	assert(rxq_data);
 	assert(!rxq_ctrl->obj);
+	if (type == MLX5_RXQ_OBJ_TYPE_DEVX_HAIRPIN)
+		return mlx5_rxq_obj_hairpin_new(dev, idx);
 	priv->verbs_alloc_ctx.type = MLX5_VERBS_ALLOC_TYPE_RX_QUEUE;
 	priv->verbs_alloc_ctx.obj = rxq_ctrl;
 	tmpl = rte_calloc_socket(__func__, 1, sizeof(*tmpl), 0,
@@ -1433,15 +1604,19 @@ struct mlx5_rxq_obj *
 	unsigned int strd_num_n = 0;
 	unsigned int strd_sz_n = 0;
 	unsigned int i;
+	unsigned int n_ibv = 0;
 
 	if (!mlx5_mprq_enabled(dev))
 		return 0;
 	/* Count the total number of descriptors configured. */
 	for (i = 0; i != priv->rxqs_n; ++i) {
 		struct mlx5_rxq_data *rxq = (*priv->rxqs)[i];
+		struct mlx5_rxq_ctrl *rxq_ctrl = container_of
+			(rxq, struct mlx5_rxq_ctrl, rxq);
 
-		if (rxq == NULL)
+		if (rxq == NULL || rxq_ctrl->type != MLX5_RXQ_TYPE_STANDARD)
 			continue;
+		n_ibv++;
 		desc += 1 << rxq->elts_n;
 		/* Get the max number of strides. */
 		if (strd_num_n < rxq->strd_num_n)
@@ -1466,7 +1641,7 @@ struct mlx5_rxq_obj *
 	 * this Mempool gets available again.
 	 */
 	desc *= 4;
-	obj_num = desc + MLX5_MPRQ_MP_CACHE_SZ * priv->rxqs_n;
+	obj_num = desc + MLX5_MPRQ_MP_CACHE_SZ * n_ibv;
 	/*
 	 * rte_mempool_create_empty() has sanity check to refuse large cache
 	 * size compared to the number of elements.
@@ -1514,8 +1689,10 @@ struct mlx5_rxq_obj *
 	/* Set mempool for each Rx queue. */
 	for (i = 0; i != priv->rxqs_n; ++i) {
 		struct mlx5_rxq_data *rxq = (*priv->rxqs)[i];
+		struct mlx5_rxq_ctrl *rxq_ctrl = container_of
+			(rxq, struct mlx5_rxq_ctrl, rxq);
 
-		if (rxq == NULL)
+		if (rxq == NULL || rxq_ctrl->type != MLX5_RXQ_TYPE_STANDARD)
 			continue;
 		rxq->mprq_mp = mp;
 	}
@@ -1620,6 +1797,7 @@ struct mlx5_rxq_ctrl *
 		rte_errno = ENOMEM;
 		return NULL;
 	}
+	tmpl->type = MLX5_RXQ_TYPE_STANDARD;
 	if (mlx5_mr_btree_init(&tmpl->rxq.mr_ctrl.cache_bh,
 			       MLX5_MR_BTREE_CACHE_N, socket)) {
 		/* rte_errno is already set. */
@@ -1788,6 +1966,49 @@ struct mlx5_rxq_ctrl *
 }
 
 /**
+ * Create a DPDK Rx hairpin queue.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param idx
+ *   RX queue index.
+ * @param desc
+ *   Number of descriptors to configure in queue.
+ * @param hairpin_conf
+ *   The hairpin binding configuration.
+ *
+ * @return
+ *   A DPDK queue object on success, NULL otherwise and rte_errno is set.
+ */
+struct mlx5_rxq_ctrl *
+mlx5_rxq_hairpin_new(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
+		     const struct rte_eth_hairpin_conf *hairpin_conf)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_rxq_ctrl *tmpl;
+
+	tmpl = rte_calloc_socket("RXQ", 1, sizeof(*tmpl), 0, SOCKET_ID_ANY);
+	if (!tmpl) {
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+	tmpl->type = MLX5_RXQ_TYPE_HAIRPIN;
+	tmpl->socket = SOCKET_ID_ANY;
+	tmpl->rxq.rss_hash = 0;
+	tmpl->rxq.port_id = dev->data->port_id;
+	tmpl->priv = priv;
+	tmpl->rxq.mp = NULL;
+	tmpl->rxq.elts_n = log2above(desc);
+	tmpl->rxq.elts = NULL;
+	tmpl->rxq.mr_ctrl.cache_bh = (struct mlx5_mr_btree) { 0 };
+	tmpl->hairpin_conf = *hairpin_conf;
+	tmpl->rxq.idx = idx;
+	rte_atomic32_inc(&tmpl->refcnt);
+	LIST_INSERT_HEAD(&priv->rxqsctrl, tmpl, next);
+	return tmpl;
+}
+
+/**
  * Get a Rx queue.
  *
  * @param dev
@@ -1841,7 +2062,8 @@ struct mlx5_rxq_ctrl *
 		if (rxq_ctrl->dbr_umem_id_valid)
 			claim_zero(mlx5_release_dbr(dev, rxq_ctrl->dbr_umem_id,
 						    rxq_ctrl->dbr_offset));
-		mlx5_mr_btree_free(&rxq_ctrl->rxq.mr_ctrl.cache_bh);
+		if (rxq_ctrl->type == MLX5_RXQ_TYPE_STANDARD)
+			mlx5_mr_btree_free(&rxq_ctrl->rxq.mr_ctrl.cache_bh);
 		LIST_REMOVE(rxq_ctrl, next);
 		rte_free(rxq_ctrl);
 		(*priv->rxqs)[idx] = NULL;
diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index 4bb28a4..13fdc38 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -159,6 +159,13 @@ struct mlx5_rxq_data {
 enum mlx5_rxq_obj_type {
 	MLX5_RXQ_OBJ_TYPE_IBV,		/* mlx5_rxq_obj with ibv_wq. */
 	MLX5_RXQ_OBJ_TYPE_DEVX_RQ,	/* mlx5_rxq_obj with mlx5_devx_rq. */
+	MLX5_RXQ_OBJ_TYPE_DEVX_HAIRPIN,
+	/* mlx5_rxq_obj with mlx5_devx_rq and hairpin support. */
+};
+
+enum mlx5_rxq_type {
+	MLX5_RXQ_TYPE_STANDARD, /* Standard Rx queue. */
+	MLX5_RXQ_TYPE_HAIRPIN, /* Hairpin Rx queue. */
 };
 
 /* Verbs/DevX Rx queue elements. */
@@ -183,6 +190,7 @@ struct mlx5_rxq_ctrl {
 	rte_atomic32_t refcnt; /* Reference counter. */
 	struct mlx5_rxq_obj *obj; /* Verbs/DevX elements. */
 	struct mlx5_priv *priv; /* Back pointer to private data. */
+	enum mlx5_rxq_type type; /* Rxq type. */
 	unsigned int socket; /* CPU socket ID for allocations. */
 	unsigned int irq:1; /* Whether IRQ is enabled. */
 	unsigned int dbr_umem_id_valid:1; /* dbr_umem_id holds a valid value. */
@@ -193,6 +201,7 @@ struct mlx5_rxq_ctrl {
 	uint32_t dbr_umem_id; /* Storing door-bell information, */
 	uint64_t dbr_offset;  /* needed when freeing door-bell. */
 	struct mlx5dv_devx_umem *wq_umem; /* WQ buffer registration info. */
+	struct rte_eth_hairpin_conf hairpin_conf; /* Hairpin configuration. */
 };
 
 enum mlx5_ind_tbl_type {
@@ -339,6 +348,9 @@ struct mlx5_txq_ctrl {
 int mlx5_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 			unsigned int socket, const struct rte_eth_rxconf *conf,
 			struct rte_mempool *mp);
+int mlx5_rx_hairpin_queue_setup
+	(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
+	 const struct rte_eth_hairpin_conf *hairpin_conf);
 void mlx5_rx_queue_release(void *dpdk_rxq);
 int mlx5_rx_intr_vec_enable(struct rte_eth_dev *dev);
 void mlx5_rx_intr_vec_disable(struct rte_eth_dev *dev);
@@ -351,6 +363,9 @@ struct mlx5_rxq_ctrl *mlx5_rxq_new(struct rte_eth_dev *dev, uint16_t idx,
 				   uint16_t desc, unsigned int socket,
 				   const struct rte_eth_rxconf *conf,
 				   struct rte_mempool *mp);
+struct mlx5_rxq_ctrl *mlx5_rxq_hairpin_new
+	(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
+	 const struct rte_eth_hairpin_conf *hairpin_conf);
 struct mlx5_rxq_ctrl *mlx5_rxq_get(struct rte_eth_dev *dev, uint16_t idx);
 int mlx5_rxq_release(struct rte_eth_dev *dev, uint16_t idx);
 int mlx5_rxq_verify(struct rte_eth_dev *dev);
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index 122f31c..cb31ae2 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -118,6 +118,13 @@
 
 		if (!rxq_ctrl)
 			continue;
+		if (rxq_ctrl->type == MLX5_RXQ_TYPE_HAIRPIN) {
+			rxq_ctrl->obj = mlx5_rxq_obj_new
+				(dev, i, MLX5_RXQ_OBJ_TYPE_DEVX_HAIRPIN);
+			if (!rxq_ctrl->obj)
+				goto error;
+			continue;
+		}
 		/* Pre-register Rx mempool. */
 		mp = mlx5_rxq_mprq_enabled(&rxq_ctrl->rxq) ?
 		     rxq_ctrl->rxq.mprq_mp : rxq_ctrl->rxq.mp;
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 186+ messages in thread

* [dpdk-dev] [PATCH v5 05/15] net/mlx5: prepare txq to work with different types
  2019-10-23 13:37 ` [dpdk-dev] [PATCH v5 " Ori Kam
                     ` (3 preceding siblings ...)
  2019-10-23 13:37   ` [dpdk-dev] [PATCH v5 04/15] net/mlx5: support Rx hairpin queues Ori Kam
@ 2019-10-23 13:37   ` Ori Kam
  2019-10-23 13:37   ` [dpdk-dev] [PATCH v5 06/15] net/mlx5: support Tx hairpin queues Ori Kam
                     ` (10 subsequent siblings)
  15 siblings, 0 replies; 186+ messages in thread
From: Ori Kam @ 2019-10-23 13:37 UTC (permalink / raw)
  To: Matan Azrad, Shahaf Shuler, Viacheslav Ovsiienko
  Cc: dev, orika, jingjing.wu, stephen

Currently all Tx queues are created using Verbs.
This commit renames the queue objects so the names no longer refer to
Verbs, since the next commit introduces a new type (hairpin).

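After this change the Tx queue object is a small tagged union, so
cleanup code can dispatch on the type field along the lines of this
sketch (the actual dispatch only lands in the next patch, once the
DevX hairpin type becomes reachable):

switch (txq_obj->type) {
case MLX5_TXQ_OBJ_TYPE_IBV:
	/* Verbs queue: destroy the QP and CQ through the glue layer. */
	claim_zero(mlx5_glue->destroy_qp(txq_obj->qp));
	claim_zero(mlx5_glue->destroy_cq(txq_obj->cq));
	break;
case MLX5_TXQ_OBJ_TYPE_DEVX_HAIRPIN:
	/* DevX hairpin queue: only the SQ object has to be destroyed. */
	claim_zero(mlx5_devx_cmd_destroy(txq_obj->sq));
	break;
}
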
Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/net/mlx5/mlx5.c         |  2 +-
 drivers/net/mlx5/mlx5.h         |  2 +-
 drivers/net/mlx5/mlx5_rxtx.c    |  2 +-
 drivers/net/mlx5/mlx5_rxtx.h    | 39 +++++++++++++++++------
 drivers/net/mlx5/mlx5_trigger.c |  4 +--
 drivers/net/mlx5/mlx5_txq.c     | 70 ++++++++++++++++++++---------------------
 6 files changed, 69 insertions(+), 50 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 6be423f..8d1595c 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -922,7 +922,7 @@ struct mlx5_dev_spawn_data {
 	if (ret)
 		DRV_LOG(WARNING, "port %u some Rx queues still remain",
 			dev->data->port_id);
-	ret = mlx5_txq_ibv_verify(dev);
+	ret = mlx5_txq_obj_verify(dev);
 	if (ret)
 		DRV_LOG(WARNING, "port %u some Verbs Tx queue still remain",
 			dev->data->port_id);
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index ee04dd0..3afb4cc 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -650,7 +650,7 @@ struct mlx5_priv {
 	LIST_HEAD(rxqobj, mlx5_rxq_obj) rxqsobj; /* Verbs/DevX Rx queues. */
 	LIST_HEAD(hrxq, mlx5_hrxq) hrxqs; /* Verbs Hash Rx queues. */
 	LIST_HEAD(txq, mlx5_txq_ctrl) txqsctrl; /* DPDK Tx queues. */
-	LIST_HEAD(txqibv, mlx5_txq_ibv) txqsibv; /* Verbs Tx queues. */
+	LIST_HEAD(txqobj, mlx5_txq_obj) txqsobj; /* Verbs/DevX Tx queues. */
 	/* Indirection tables. */
 	LIST_HEAD(ind_tables, mlx5_ind_table_obj) ind_tbls;
 	/* Pointer to next element. */
diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index 5ec2b48..f597c89 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -863,7 +863,7 @@ enum mlx5_txcmp_code {
 			.qp_state = IBV_QPS_RESET,
 			.port_num = (uint8_t)priv->ibv_port,
 		};
-		struct ibv_qp *qp = txq_ctrl->ibv->qp;
+		struct ibv_qp *qp = txq_ctrl->obj->qp;
 
 		ret = mlx5_glue->modify_qp(qp, &mod, IBV_QP_STATE);
 		if (ret) {
diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index 13fdc38..12f9bfb 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -308,13 +308,31 @@ struct mlx5_txq_data {
 	/* Storage for queued packets, must be the last field. */
 } __rte_cache_aligned;
 
-/* Verbs Rx queue elements. */
-struct mlx5_txq_ibv {
-	LIST_ENTRY(mlx5_txq_ibv) next; /* Pointer to the next element. */
+enum mlx5_txq_obj_type {
+	MLX5_TXQ_OBJ_TYPE_IBV,		/* mlx5_txq_obj with ibv_wq. */
+	MLX5_TXQ_OBJ_TYPE_DEVX_HAIRPIN,
+	/* mlx5_txq_obj with mlx5_devx_tq and hairpin support. */
+};
+
+enum mlx5_txq_type {
+	MLX5_TXQ_TYPE_STANDARD, /* Standard Tx queue. */
+	MLX5_TXQ_TYPE_HAIRPIN, /* Hairpin Tx queue. */
+};
+
+/* Verbs/DevX Tx queue elements. */
+struct mlx5_txq_obj {
+	LIST_ENTRY(mlx5_txq_obj) next; /* Pointer to the next element. */
 	rte_atomic32_t refcnt; /* Reference counter. */
 	struct mlx5_txq_ctrl *txq_ctrl; /* Pointer to the control queue. */
-	struct ibv_cq *cq; /* Completion Queue. */
-	struct ibv_qp *qp; /* Queue Pair. */
+	enum mlx5_rxq_obj_type type; /* The txq object type. */
+	RTE_STD_C11
+	union {
+		struct {
+			struct ibv_cq *cq; /* Completion Queue. */
+			struct ibv_qp *qp; /* Queue Pair. */
+		};
+		struct mlx5_devx_obj *sq; /* DevX object for SQ. */
+	};
 };
 
 /* TX queue control descriptor. */
@@ -322,9 +340,10 @@ struct mlx5_txq_ctrl {
 	LIST_ENTRY(mlx5_txq_ctrl) next; /* Pointer to the next element. */
 	rte_atomic32_t refcnt; /* Reference counter. */
 	unsigned int socket; /* CPU socket ID for allocations. */
+	enum mlx5_txq_type type; /* The txq ctrl type. */
 	unsigned int max_inline_data; /* Max inline data. */
 	unsigned int max_tso_header; /* Max TSO header size. */
-	struct mlx5_txq_ibv *ibv; /* Verbs queue object. */
+	struct mlx5_txq_obj *obj; /* Verbs/DevX queue object. */
 	struct mlx5_priv *priv; /* Back pointer to private data. */
 	off_t uar_mmap_offset; /* UAR mmap offset for non-primary process. */
 	void *bf_reg; /* BlueFlame register from Verbs. */
@@ -393,10 +412,10 @@ int mlx5_tx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 			unsigned int socket, const struct rte_eth_txconf *conf);
 void mlx5_tx_queue_release(void *dpdk_txq);
 int mlx5_tx_uar_init_secondary(struct rte_eth_dev *dev, int fd);
-struct mlx5_txq_ibv *mlx5_txq_ibv_new(struct rte_eth_dev *dev, uint16_t idx);
-struct mlx5_txq_ibv *mlx5_txq_ibv_get(struct rte_eth_dev *dev, uint16_t idx);
-int mlx5_txq_ibv_release(struct mlx5_txq_ibv *txq_ibv);
-int mlx5_txq_ibv_verify(struct rte_eth_dev *dev);
+struct mlx5_txq_obj *mlx5_txq_obj_new(struct rte_eth_dev *dev, uint16_t idx);
+struct mlx5_txq_obj *mlx5_txq_obj_get(struct rte_eth_dev *dev, uint16_t idx);
+int mlx5_txq_obj_release(struct mlx5_txq_obj *txq_ibv);
+int mlx5_txq_obj_verify(struct rte_eth_dev *dev);
 struct mlx5_txq_ctrl *mlx5_txq_new(struct rte_eth_dev *dev, uint16_t idx,
 				   uint16_t desc, unsigned int socket,
 				   const struct rte_eth_txconf *conf);
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index cb31ae2..50c4df5 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -52,8 +52,8 @@
 		if (!txq_ctrl)
 			continue;
 		txq_alloc_elts(txq_ctrl);
-		txq_ctrl->ibv = mlx5_txq_ibv_new(dev, i);
-		if (!txq_ctrl->ibv) {
+		txq_ctrl->obj = mlx5_txq_obj_new(dev, i);
+		if (!txq_ctrl->obj) {
 			rte_errno = ENOMEM;
 			goto error;
 		}
diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
index 53d45e7..a6e2563 100644
--- a/drivers/net/mlx5/mlx5_txq.c
+++ b/drivers/net/mlx5/mlx5_txq.c
@@ -375,15 +375,15 @@
  * @return
  *   The Verbs object initialised, NULL otherwise and rte_errno is set.
  */
-struct mlx5_txq_ibv *
-mlx5_txq_ibv_new(struct rte_eth_dev *dev, uint16_t idx)
+struct mlx5_txq_obj *
+mlx5_txq_obj_new(struct rte_eth_dev *dev, uint16_t idx)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_txq_data *txq_data = (*priv->txqs)[idx];
 	struct mlx5_txq_ctrl *txq_ctrl =
 		container_of(txq_data, struct mlx5_txq_ctrl, txq);
-	struct mlx5_txq_ibv tmpl;
-	struct mlx5_txq_ibv *txq_ibv = NULL;
+	struct mlx5_txq_obj tmpl;
+	struct mlx5_txq_obj *txq_obj = NULL;
 	union {
 		struct ibv_qp_init_attr_ex init;
 		struct ibv_cq_init_attr_ex cq;
@@ -411,7 +411,7 @@ struct mlx5_txq_ibv *
 		rte_errno = EINVAL;
 		return NULL;
 	}
-	memset(&tmpl, 0, sizeof(struct mlx5_txq_ibv));
+	memset(&tmpl, 0, sizeof(struct mlx5_txq_obj));
 	attr.cq = (struct ibv_cq_init_attr_ex){
 		.comp_mask = 0,
 	};
@@ -502,9 +502,9 @@ struct mlx5_txq_ibv *
 		rte_errno = errno;
 		goto error;
 	}
-	txq_ibv = rte_calloc_socket(__func__, 1, sizeof(struct mlx5_txq_ibv), 0,
+	txq_obj = rte_calloc_socket(__func__, 1, sizeof(struct mlx5_txq_obj), 0,
 				    txq_ctrl->socket);
-	if (!txq_ibv) {
+	if (!txq_obj) {
 		DRV_LOG(ERR, "port %u Tx queue %u cannot allocate memory",
 			dev->data->port_id, idx);
 		rte_errno = ENOMEM;
@@ -568,9 +568,9 @@ struct mlx5_txq_ibv *
 		}
 	}
 #endif
-	txq_ibv->qp = tmpl.qp;
-	txq_ibv->cq = tmpl.cq;
-	rte_atomic32_inc(&txq_ibv->refcnt);
+	txq_obj->qp = tmpl.qp;
+	txq_obj->cq = tmpl.cq;
+	rte_atomic32_inc(&txq_obj->refcnt);
 	txq_ctrl->bf_reg = qp.bf.reg;
 	if (qp.comp_mask & MLX5DV_QP_MASK_UAR_MMAP_OFFSET) {
 		txq_ctrl->uar_mmap_offset = qp.uar_mmap_offset;
@@ -585,18 +585,18 @@ struct mlx5_txq_ibv *
 		goto error;
 	}
 	txq_uar_init(txq_ctrl);
-	LIST_INSERT_HEAD(&priv->txqsibv, txq_ibv, next);
-	txq_ibv->txq_ctrl = txq_ctrl;
+	LIST_INSERT_HEAD(&priv->txqsobj, txq_obj, next);
+	txq_obj->txq_ctrl = txq_ctrl;
 	priv->verbs_alloc_ctx.type = MLX5_VERBS_ALLOC_TYPE_NONE;
-	return txq_ibv;
+	return txq_obj;
 error:
 	ret = rte_errno; /* Save rte_errno before cleanup. */
 	if (tmpl.cq)
 		claim_zero(mlx5_glue->destroy_cq(tmpl.cq));
 	if (tmpl.qp)
 		claim_zero(mlx5_glue->destroy_qp(tmpl.qp));
-	if (txq_ibv)
-		rte_free(txq_ibv);
+	if (txq_obj)
+		rte_free(txq_obj);
 	priv->verbs_alloc_ctx.type = MLX5_VERBS_ALLOC_TYPE_NONE;
 	rte_errno = ret; /* Restore rte_errno. */
 	return NULL;
@@ -613,8 +613,8 @@ struct mlx5_txq_ibv *
  * @return
  *   The Verbs object if it exists.
  */
-struct mlx5_txq_ibv *
-mlx5_txq_ibv_get(struct rte_eth_dev *dev, uint16_t idx)
+struct mlx5_txq_obj *
+mlx5_txq_obj_get(struct rte_eth_dev *dev, uint16_t idx)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_txq_ctrl *txq_ctrl;
@@ -624,29 +624,29 @@ struct mlx5_txq_ibv *
 	if (!(*priv->txqs)[idx])
 		return NULL;
 	txq_ctrl = container_of((*priv->txqs)[idx], struct mlx5_txq_ctrl, txq);
-	if (txq_ctrl->ibv)
-		rte_atomic32_inc(&txq_ctrl->ibv->refcnt);
-	return txq_ctrl->ibv;
+	if (txq_ctrl->obj)
+		rte_atomic32_inc(&txq_ctrl->obj->refcnt);
+	return txq_ctrl->obj;
 }
 
 /**
  * Release an Tx verbs queue object.
  *
- * @param txq_ibv
+ * @param txq_obj
  *   Verbs Tx queue object.
  *
  * @return
  *   1 while a reference on it exists, 0 when freed.
  */
 int
-mlx5_txq_ibv_release(struct mlx5_txq_ibv *txq_ibv)
+mlx5_txq_obj_release(struct mlx5_txq_obj *txq_obj)
 {
-	assert(txq_ibv);
-	if (rte_atomic32_dec_and_test(&txq_ibv->refcnt)) {
-		claim_zero(mlx5_glue->destroy_qp(txq_ibv->qp));
-		claim_zero(mlx5_glue->destroy_cq(txq_ibv->cq));
-		LIST_REMOVE(txq_ibv, next);
-		rte_free(txq_ibv);
+	assert(txq_obj);
+	if (rte_atomic32_dec_and_test(&txq_obj->refcnt)) {
+		claim_zero(mlx5_glue->destroy_qp(txq_obj->qp));
+		claim_zero(mlx5_glue->destroy_cq(txq_obj->cq));
+		LIST_REMOVE(txq_obj, next);
+		rte_free(txq_obj);
 		return 0;
 	}
 	return 1;
@@ -662,15 +662,15 @@ struct mlx5_txq_ibv *
  *   The number of object not released.
  */
 int
-mlx5_txq_ibv_verify(struct rte_eth_dev *dev)
+mlx5_txq_obj_verify(struct rte_eth_dev *dev)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	int ret = 0;
-	struct mlx5_txq_ibv *txq_ibv;
+	struct mlx5_txq_obj *txq_obj;
 
-	LIST_FOREACH(txq_ibv, &priv->txqsibv, next) {
+	LIST_FOREACH(txq_obj, &priv->txqsobj, next) {
 		DRV_LOG(DEBUG, "port %u Verbs Tx queue %u still referenced",
-			dev->data->port_id, txq_ibv->txq_ctrl->txq.idx);
+			dev->data->port_id, txq_obj->txq_ctrl->txq.idx);
 		++ret;
 	}
 	return ret;
@@ -1127,7 +1127,7 @@ struct mlx5_txq_ctrl *
 	if ((*priv->txqs)[idx]) {
 		ctrl = container_of((*priv->txqs)[idx], struct mlx5_txq_ctrl,
 				    txq);
-		mlx5_txq_ibv_get(dev, idx);
+		mlx5_txq_obj_get(dev, idx);
 		rte_atomic32_inc(&ctrl->refcnt);
 	}
 	return ctrl;
@@ -1153,8 +1153,8 @@ struct mlx5_txq_ctrl *
 	if (!(*priv->txqs)[idx])
 		return 0;
 	txq = container_of((*priv->txqs)[idx], struct mlx5_txq_ctrl, txq);
-	if (txq->ibv && !mlx5_txq_ibv_release(txq->ibv))
-		txq->ibv = NULL;
+	if (txq->obj && !mlx5_txq_obj_release(txq->obj))
+		txq->obj = NULL;
 	if (rte_atomic32_dec_and_test(&txq->refcnt)) {
 		txq_free_elts(txq);
 		mlx5_mr_btree_free(&txq->txq.mr_ctrl.cache_bh);
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 186+ messages in thread

* [dpdk-dev] [PATCH v5 06/15] net/mlx5: support Tx hairpin queues
  2019-10-23 13:37 ` [dpdk-dev] [PATCH v5 " Ori Kam
                     ` (4 preceding siblings ...)
  2019-10-23 13:37   ` [dpdk-dev] [PATCH v5 05/15] net/mlx5: prepare txq to work with different types Ori Kam
@ 2019-10-23 13:37   ` Ori Kam
  2019-10-23 13:37   ` [dpdk-dev] [PATCH v5 07/15] net/mlx5: add get hairpin capabilities Ori Kam
                     ` (9 subsequent siblings)
  15 siblings, 0 replies; 186+ messages in thread
From: Ori Kam @ 2019-10-23 13:37 UTC (permalink / raw)
  To: Matan Azrad, Shahaf Shuler, Viacheslav Ovsiienko
  Cc: dev, orika, jingjing.wu, stephen

This commit adds support for creating Tx hairpin queues.
A hairpin queue is a queue that is created using DevX and used only by
the HW.

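Once both hairpin queues exist, the SQ still has to be bound to its
peer RQ and moved to the ready state. The binding itself lands later in
the series, but with the DevX command added here it boils down to
something like this sketch (peer ids are placeholders, and the RST/RDY
state encodings are assumed from the PRM):

struct mlx5_devx_modify_sq_attr sq_attr = {
	.sq_state = 0,			/* current state: RST (assumed) */
	.state = 1,			/* target state: RDY (assumed) */
	.hairpin_peer_rq = peer_rq_id,	/* id of the peer hairpin RQ */
	.hairpin_peer_vhca = peer_vhca_id,
};
int ret = mlx5_devx_cmd_modify_sq(txq_obj->sq, &sq_attr);
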
Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/net/mlx5/mlx5.c           |  36 +++++-
 drivers/net/mlx5/mlx5.h           |  46 ++++++++
 drivers/net/mlx5/mlx5_devx_cmds.c | 186 ++++++++++++++++++++++++++++++
 drivers/net/mlx5/mlx5_prm.h       | 118 +++++++++++++++++++
 drivers/net/mlx5/mlx5_rxtx.h      |  18 ++-
 drivers/net/mlx5/mlx5_trigger.c   |  10 +-
 drivers/net/mlx5/mlx5_txq.c       | 230 +++++++++++++++++++++++++++++++++++---
 7 files changed, 620 insertions(+), 24 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 8d1595c..49b1e82 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -324,6 +324,9 @@ struct mlx5_dev_spawn_data {
 	struct mlx5_ibv_shared *sh;
 	int err = 0;
 	uint32_t i;
+#ifdef HAVE_IBV_FLOW_DV_SUPPORT
+	struct mlx5_devx_tis_attr tis_attr = { 0 };
+#endif
 
 	assert(spawn);
 	/* Secondary process should not create the shared context. */
@@ -390,10 +393,25 @@ struct mlx5_dev_spawn_data {
 		goto error;
 	}
 #ifdef HAVE_IBV_FLOW_DV_SUPPORT
-	err = mlx5_get_pdn(sh->pd, &sh->pdn);
-	if (err) {
-		DRV_LOG(ERR, "Fail to extract pdn from PD");
-		goto error;
+	if (sh->devx) {
+		err = mlx5_get_pdn(sh->pd, &sh->pdn);
+		if (err) {
+			DRV_LOG(ERR, "Fail to extract pdn from PD");
+			goto error;
+		}
+		sh->td = mlx5_devx_cmd_create_td(sh->ctx);
+		if (!sh->td) {
+			DRV_LOG(ERR, "TD allocation failure");
+			err = ENOMEM;
+			goto error;
+		}
+		tis_attr.transport_domain = sh->td->id;
+		sh->tis = mlx5_devx_cmd_create_tis(sh->ctx, &tis_attr);
+		if (!sh->tis) {
+			DRV_LOG(ERR, "TIS allocation failure");
+			err = ENOMEM;
+			goto error;
+		}
 	}
 #endif /* HAVE_IBV_FLOW_DV_SUPPORT */
 	/*
@@ -426,6 +444,10 @@ struct mlx5_dev_spawn_data {
 error:
 	pthread_mutex_unlock(&mlx5_ibv_list_mutex);
 	assert(sh);
+	if (sh->tis)
+		claim_zero(mlx5_devx_cmd_destroy(sh->tis));
+	if (sh->td)
+		claim_zero(mlx5_devx_cmd_destroy(sh->td));
 	if (sh->pd)
 		claim_zero(mlx5_glue->dealloc_pd(sh->pd));
 	if (sh->ctx)
@@ -495,6 +517,10 @@ struct mlx5_dev_spawn_data {
 	pthread_mutex_destroy(&sh->intr_mutex);
 	if (sh->pd)
 		claim_zero(mlx5_glue->dealloc_pd(sh->pd));
+	if (sh->tis)
+		claim_zero(mlx5_devx_cmd_destroy(sh->tis));
+	if (sh->td)
+		claim_zero(mlx5_devx_cmd_destroy(sh->td));
 	if (sh->ctx)
 		claim_zero(mlx5_glue->close_device(sh->ctx));
 	rte_free(sh);
@@ -987,6 +1013,7 @@ struct mlx5_dev_spawn_data {
 	.rx_queue_setup = mlx5_rx_queue_setup,
 	.rx_hairpin_queue_setup = mlx5_rx_hairpin_queue_setup,
 	.tx_queue_setup = mlx5_tx_queue_setup,
+	.tx_hairpin_queue_setup = mlx5_tx_hairpin_queue_setup,
 	.rx_queue_release = mlx5_rx_queue_release,
 	.tx_queue_release = mlx5_tx_queue_release,
 	.flow_ctrl_get = mlx5_dev_get_flow_ctrl,
@@ -1054,6 +1081,7 @@ struct mlx5_dev_spawn_data {
 	.rx_queue_setup = mlx5_rx_queue_setup,
 	.rx_hairpin_queue_setup = mlx5_rx_hairpin_queue_setup,
 	.tx_queue_setup = mlx5_tx_queue_setup,
+	.tx_hairpin_queue_setup = mlx5_tx_hairpin_queue_setup,
 	.rx_queue_release = mlx5_rx_queue_release,
 	.tx_queue_release = mlx5_tx_queue_release,
 	.flow_ctrl_get = mlx5_dev_get_flow_ctrl,
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 3afb4cc..566bf2d 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -353,6 +353,43 @@ struct mlx5_devx_rqt_attr {
 	uint32_t rq_list[];
 };
 
+/* TIS attributes structure. */
+struct mlx5_devx_tis_attr {
+	uint32_t strict_lag_tx_port_affinity:1;
+	uint32_t tls_en:1;
+	uint32_t lag_tx_port_affinity:4;
+	uint32_t prio:4;
+	uint32_t transport_domain:24;
+};
+
+/* SQ attributes structure, used by SQ create operation. */
+struct mlx5_devx_create_sq_attr {
+	uint32_t rlky:1;
+	uint32_t cd_master:1;
+	uint32_t fre:1;
+	uint32_t flush_in_error_en:1;
+	uint32_t allow_multi_pkt_send_wqe:1;
+	uint32_t min_wqe_inline_mode:3;
+	uint32_t state:4;
+	uint32_t reg_umr:1;
+	uint32_t allow_swp:1;
+	uint32_t hairpin:1;
+	uint32_t user_index:24;
+	uint32_t cqn:24;
+	uint32_t packet_pacing_rate_limit_index:16;
+	uint32_t tis_lst_sz:16;
+	uint32_t tis_num:24;
+	struct mlx5_devx_wq_attr wq_attr;
+};
+
+/* SQ attributes structure, used by SQ modify operation. */
+struct mlx5_devx_modify_sq_attr {
+	uint32_t sq_state:4;
+	uint32_t state:4;
+	uint32_t hairpin_peer_rq:24;
+	uint32_t hairpin_peer_vhca:16;
+};
+
 /**
  * Type of object being allocated.
  */
@@ -596,6 +633,8 @@ struct mlx5_ibv_shared {
 	uint32_t devx_intr_cnt; /* Devx interrupt handler reference counter. */
 	struct rte_intr_handle intr_handle_devx; /* DEVX interrupt handler. */
 	struct mlx5dv_devx_cmd_comp *devx_comp; /* DEVX async comp obj. */
+	struct mlx5_devx_obj *tis; /* TIS object. */
+	struct mlx5_devx_obj *td; /* Transport domain. */
 	struct mlx5_ibv_shared_port port[]; /* per device port data array. */
 };
 
@@ -918,5 +957,12 @@ struct mlx5_devx_obj *mlx5_devx_cmd_create_tir(struct ibv_context *ctx,
 					struct mlx5_devx_tir_attr *tir_attr);
 struct mlx5_devx_obj *mlx5_devx_cmd_create_rqt(struct ibv_context *ctx,
 					struct mlx5_devx_rqt_attr *rqt_attr);
+struct mlx5_devx_obj *mlx5_devx_cmd_create_sq
+	(struct ibv_context *ctx, struct mlx5_devx_create_sq_attr *sq_attr);
+int mlx5_devx_cmd_modify_sq
+	(struct mlx5_devx_obj *sq, struct mlx5_devx_modify_sq_attr *sq_attr);
+struct mlx5_devx_obj *mlx5_devx_cmd_create_tis
+	(struct ibv_context *ctx, struct mlx5_devx_tis_attr *tis_attr);
+struct mlx5_devx_obj *mlx5_devx_cmd_create_td(struct ibv_context *ctx);
 
 #endif /* RTE_PMD_MLX5_H_ */
diff --git a/drivers/net/mlx5/mlx5_devx_cmds.c b/drivers/net/mlx5/mlx5_devx_cmds.c
index 17c1671..a501f1f 100644
--- a/drivers/net/mlx5/mlx5_devx_cmds.c
+++ b/drivers/net/mlx5/mlx5_devx_cmds.c
@@ -717,3 +717,189 @@ struct mlx5_devx_obj *
 	rqt->id = MLX5_GET(create_rqt_out, out, rqtn);
 	return rqt;
 }
+
+/**
+ * Create SQ using DevX API.
+ *
+ * @param[in] ctx
+ *   ibv_context returned from mlx5dv_open_device.
+ * @param [in] sq_attr
+ *   Pointer to SQ attributes structure.
+ * @param [in] socket
+ *   CPU socket ID for allocations.
+ *
+ * @return
+ *   The DevX object created, NULL otherwise and rte_errno is set.
+ **/
+struct mlx5_devx_obj *
+mlx5_devx_cmd_create_sq(struct ibv_context *ctx,
+			struct mlx5_devx_create_sq_attr *sq_attr)
+{
+	uint32_t in[MLX5_ST_SZ_DW(create_sq_in)] = {0};
+	uint32_t out[MLX5_ST_SZ_DW(create_sq_out)] = {0};
+	void *sq_ctx;
+	void *wq_ctx;
+	struct mlx5_devx_wq_attr *wq_attr;
+	struct mlx5_devx_obj *sq = NULL;
+
+	sq = rte_calloc(__func__, 1, sizeof(*sq), 0);
+	if (!sq) {
+		DRV_LOG(ERR, "Failed to allocate SQ data");
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+	MLX5_SET(create_sq_in, in, opcode, MLX5_CMD_OP_CREATE_SQ);
+	sq_ctx = MLX5_ADDR_OF(create_sq_in, in, ctx);
+	MLX5_SET(sqc, sq_ctx, rlky, sq_attr->rlky);
+	MLX5_SET(sqc, sq_ctx, cd_master, sq_attr->cd_master);
+	MLX5_SET(sqc, sq_ctx, fre, sq_attr->fre);
+	MLX5_SET(sqc, sq_ctx, flush_in_error_en, sq_attr->flush_in_error_en);
+	MLX5_SET(sqc, sq_ctx, allow_multi_pkt_send_wqe,
+		 sq_attr->allow_multi_pkt_send_wqe);
+	MLX5_SET(sqc, sq_ctx, min_wqe_inline_mode,
+		 sq_attr->min_wqe_inline_mode);
+	MLX5_SET(sqc, sq_ctx, state, sq_attr->state);
+	MLX5_SET(sqc, sq_ctx, reg_umr, sq_attr->reg_umr);
+	MLX5_SET(sqc, sq_ctx, allow_swp, sq_attr->allow_swp);
+	MLX5_SET(sqc, sq_ctx, hairpin, sq_attr->hairpin);
+	MLX5_SET(sqc, sq_ctx, user_index, sq_attr->user_index);
+	MLX5_SET(sqc, sq_ctx, cqn, sq_attr->cqn);
+	MLX5_SET(sqc, sq_ctx, packet_pacing_rate_limit_index,
+		 sq_attr->packet_pacing_rate_limit_index);
+	MLX5_SET(sqc, sq_ctx, tis_lst_sz, sq_attr->tis_lst_sz);
+	MLX5_SET(sqc, sq_ctx, tis_num_0, sq_attr->tis_num);
+	wq_ctx = MLX5_ADDR_OF(sqc, sq_ctx, wq);
+	wq_attr = &sq_attr->wq_attr;
+	devx_cmd_fill_wq_data(wq_ctx, wq_attr);
+	sq->obj = mlx5_glue->devx_obj_create(ctx, in, sizeof(in),
+					     out, sizeof(out));
+	if (!sq->obj) {
+		DRV_LOG(ERR, "Failed to create SQ using DevX");
+		rte_errno = errno;
+		rte_free(sq);
+		return NULL;
+	}
+	sq->id = MLX5_GET(create_sq_out, out, sqn);
+	return sq;
+}
+
+/**
+ * Modify SQ using DevX API.
+ *
+ * @param[in] sq
+ *   Pointer to SQ object structure.
+ * @param [in] sq_attr
+ *   Pointer to SQ attributes structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_devx_cmd_modify_sq(struct mlx5_devx_obj *sq,
+			struct mlx5_devx_modify_sq_attr *sq_attr)
+{
+	uint32_t in[MLX5_ST_SZ_DW(modify_sq_in)] = {0};
+	uint32_t out[MLX5_ST_SZ_DW(modify_sq_out)] = {0};
+	void *sq_ctx;
+	int ret;
+
+	MLX5_SET(modify_sq_in, in, opcode, MLX5_CMD_OP_MODIFY_SQ);
+	MLX5_SET(modify_sq_in, in, sq_state, sq_attr->sq_state);
+	MLX5_SET(modify_sq_in, in, sqn, sq->id);
+	sq_ctx = MLX5_ADDR_OF(modify_sq_in, in, ctx);
+	MLX5_SET(sqc, sq_ctx, state, sq_attr->state);
+	MLX5_SET(sqc, sq_ctx, hairpin_peer_rq, sq_attr->hairpin_peer_rq);
+	MLX5_SET(sqc, sq_ctx, hairpin_peer_vhca, sq_attr->hairpin_peer_vhca);
+	ret = mlx5_glue->devx_obj_modify(sq->obj, in, sizeof(in),
+					 out, sizeof(out));
+	if (ret) {
+		DRV_LOG(ERR, "Failed to modify SQ using DevX");
+		rte_errno = errno;
+		return -errno;
+	}
+	return ret;
+}
+
+/**
+ * Create TIS using DevX API.
+ *
+ * @param[in] ctx
+ *   ibv_context returned from mlx5dv_open_device.
+ * @param [in] tis_attr
+ *   Pointer to TIS attributes structure.
+ *
+ * @return
+ *   The DevX object created, NULL otherwise and rte_errno is set.
+ */
+struct mlx5_devx_obj *
+mlx5_devx_cmd_create_tis(struct ibv_context *ctx,
+			 struct mlx5_devx_tis_attr *tis_attr)
+{
+	uint32_t in[MLX5_ST_SZ_DW(create_tis_in)] = {0};
+	uint32_t out[MLX5_ST_SZ_DW(create_tis_out)] = {0};
+	struct mlx5_devx_obj *tis = NULL;
+	void *tis_ctx;
+
+	tis = rte_calloc(__func__, 1, sizeof(*tis), 0);
+	if (!tis) {
+		DRV_LOG(ERR, "Failed to allocate TIS object");
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+	MLX5_SET(create_tis_in, in, opcode, MLX5_CMD_OP_CREATE_TIS);
+	tis_ctx = MLX5_ADDR_OF(create_tis_in, in, ctx);
+	MLX5_SET(tisc, tis_ctx, strict_lag_tx_port_affinity,
+		 tis_attr->strict_lag_tx_port_affinity);
+	MLX5_SET(tisc, tis_ctx, lag_tx_port_affinity,
+		 tis_attr->lag_tx_port_affinity);
+	MLX5_SET(tisc, tis_ctx, prio, tis_attr->prio);
+	MLX5_SET(tisc, tis_ctx, transport_domain,
+		 tis_attr->transport_domain);
+	tis->obj = mlx5_glue->devx_obj_create(ctx, in, sizeof(in),
+					      out, sizeof(out));
+	if (!tis->obj) {
+		DRV_LOG(ERR, "Failed to create TIS using DevX");
+		rte_errno = errno;
+		rte_free(tis);
+		return NULL;
+	}
+	tis->id = MLX5_GET(create_tis_out, out, tisn);
+	return tis;
+}
+
+/**
+ * Create transport domain using DevX API.
+ *
+ * @param[in] ctx
+ *   ibv_context returned from mlx5dv_open_device.
+ *
+ * @return
+ *   The DevX object created, NULL otherwise and rte_errno is set.
+ */
+struct mlx5_devx_obj *
+mlx5_devx_cmd_create_td(struct ibv_context *ctx)
+{
+	uint32_t in[MLX5_ST_SZ_DW(alloc_transport_domain_in)] = {0};
+	uint32_t out[MLX5_ST_SZ_DW(alloc_transport_domain_out)] = {0};
+	struct mlx5_devx_obj *td = NULL;
+
+	td = rte_calloc(__func__, 1, sizeof(*td), 0);
+	if (!td) {
+		DRV_LOG(ERR, "Failed to allocate TD object");
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+	MLX5_SET(alloc_transport_domain_in, in, opcode,
+		 MLX5_CMD_OP_ALLOC_TRANSPORT_DOMAIN);
+	td->obj = mlx5_glue->devx_obj_create(ctx, in, sizeof(in),
+					     out, sizeof(out));
+	if (!td->obj) {
+		DRV_LOG(ERR, "Failed to create TIS using DevX");
+		rte_errno = errno;
+		rte_free(td);
+		return NULL;
+	}
+	td->id = MLX5_GET(alloc_transport_domain_out, out,
+			   transport_domain);
+	return td;
+}
diff --git a/drivers/net/mlx5/mlx5_prm.h b/drivers/net/mlx5/mlx5_prm.h
index c86f8b8..c687cfb 100644
--- a/drivers/net/mlx5/mlx5_prm.h
+++ b/drivers/net/mlx5/mlx5_prm.h
@@ -671,9 +671,13 @@ enum {
 	MLX5_CMD_OP_QUERY_HCA_CAP = 0x100,
 	MLX5_CMD_OP_CREATE_MKEY = 0x200,
 	MLX5_CMD_OP_QUERY_NIC_VPORT_CONTEXT = 0x754,
+	MLX5_CMD_OP_ALLOC_TRANSPORT_DOMAIN = 0x816,
 	MLX5_CMD_OP_CREATE_TIR = 0x900,
+	MLX5_CMD_OP_CREATE_SQ = 0X904,
+	MLX5_CMD_OP_MODIFY_SQ = 0X905,
 	MLX5_CMD_OP_CREATE_RQ = 0x908,
 	MLX5_CMD_OP_MODIFY_RQ = 0x909,
+	MLX5_CMD_OP_CREATE_TIS = 0x912,
 	MLX5_CMD_OP_QUERY_TIS = 0x915,
 	MLX5_CMD_OP_CREATE_RQT = 0x916,
 	MLX5_CMD_OP_ALLOC_FLOW_COUNTER = 0x939,
@@ -1328,6 +1332,23 @@ struct mlx5_ifc_query_tis_in_bits {
 	u8 reserved_at_60[0x20];
 };
 
+struct mlx5_ifc_alloc_transport_domain_out_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 reserved_at_40[0x8];
+	u8 transport_domain[0x18];
+	u8 reserved_at_60[0x20];
+};
+
+struct mlx5_ifc_alloc_transport_domain_in_bits {
+	u8 opcode[0x10];
+	u8 reserved_at_10[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 reserved_at_40[0x40];
+};
+
 enum {
 	MLX5_WQ_TYPE_LINKED_LIST                = 0x0,
 	MLX5_WQ_TYPE_CYCLIC                     = 0x1,
@@ -1444,6 +1465,24 @@ struct mlx5_ifc_modify_rq_out_bits {
 	u8 reserved_at_40[0x40];
 };
 
+struct mlx5_ifc_create_tis_out_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 reserved_at_40[0x8];
+	u8 tisn[0x18];
+	u8 reserved_at_60[0x20];
+};
+
+struct mlx5_ifc_create_tis_in_bits {
+	u8 opcode[0x10];
+	u8 uid[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 reserved_at_40[0xc0];
+	struct mlx5_ifc_tisc_bits ctx;
+};
+
 enum {
 	MLX5_MODIFY_RQ_IN_MODIFY_BITMASK_WQ_LWM = 1ULL << 0,
 	MLX5_MODIFY_RQ_IN_MODIFY_BITMASK_VSD = 1ULL << 1,
@@ -1589,6 +1628,85 @@ struct mlx5_ifc_create_rqt_in_bits {
 #pragma GCC diagnostic error "-Wpedantic"
 #endif
 
+struct mlx5_ifc_sqc_bits {
+	u8 rlky[0x1];
+	u8 cd_master[0x1];
+	u8 fre[0x1];
+	u8 flush_in_error_en[0x1];
+	u8 allow_multi_pkt_send_wqe[0x1];
+	u8 min_wqe_inline_mode[0x3];
+	u8 state[0x4];
+	u8 reg_umr[0x1];
+	u8 allow_swp[0x1];
+	u8 hairpin[0x1];
+	u8 reserved_at_f[0x11];
+	u8 reserved_at_20[0x8];
+	u8 user_index[0x18];
+	u8 reserved_at_40[0x8];
+	u8 cqn[0x18];
+	u8 reserved_at_60[0x8];
+	u8 hairpin_peer_rq[0x18];
+	u8 reserved_at_80[0x10];
+	u8 hairpin_peer_vhca[0x10];
+	u8 reserved_at_a0[0x50];
+	u8 packet_pacing_rate_limit_index[0x10];
+	u8 tis_lst_sz[0x10];
+	u8 reserved_at_110[0x10];
+	u8 reserved_at_120[0x40];
+	u8 reserved_at_160[0x8];
+	u8 tis_num_0[0x18];
+	struct mlx5_ifc_wq_bits wq;
+};
+
+struct mlx5_ifc_query_sq_in_bits {
+	u8 opcode[0x10];
+	u8 reserved_at_10[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 reserved_at_40[0x8];
+	u8 sqn[0x18];
+	u8 reserved_at_60[0x20];
+};
+
+struct mlx5_ifc_modify_sq_out_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 reserved_at_40[0x40];
+};
+
+struct mlx5_ifc_modify_sq_in_bits {
+	u8 opcode[0x10];
+	u8 uid[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 sq_state[0x4];
+	u8 reserved_at_44[0x4];
+	u8 sqn[0x18];
+	u8 reserved_at_60[0x20];
+	u8 modify_bitmask[0x40];
+	u8 reserved_at_c0[0x40];
+	struct mlx5_ifc_sqc_bits ctx;
+};
+
+struct mlx5_ifc_create_sq_out_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 reserved_at_40[0x8];
+	u8 sqn[0x18];
+	u8 reserved_at_60[0x20];
+};
+
+struct mlx5_ifc_create_sq_in_bits {
+	u8 opcode[0x10];
+	u8 uid[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 reserved_at_40[0xc0];
+	struct mlx5_ifc_sqc_bits ctx;
+};
+
 /* CQE format mask. */
 #define MLX5E_CQE_FORMAT_MASK 0xc
 
diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index 12f9bfb..271b648 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -324,14 +324,18 @@ struct mlx5_txq_obj {
 	LIST_ENTRY(mlx5_txq_obj) next; /* Pointer to the next element. */
 	rte_atomic32_t refcnt; /* Reference counter. */
 	struct mlx5_txq_ctrl *txq_ctrl; /* Pointer to the control queue. */
-	enum mlx5_rxq_obj_type type; /* The txq object type. */
+	enum mlx5_txq_obj_type type; /* The txq object type. */
 	RTE_STD_C11
 	union {
 		struct {
 			struct ibv_cq *cq; /* Completion Queue. */
 			struct ibv_qp *qp; /* Queue Pair. */
 		};
-		struct mlx5_devx_obj *sq; /* DevX object for SQ. */
+		struct {
+			struct mlx5_devx_obj *sq;
+			/* DevX object for SQ. */
+			struct mlx5_devx_obj *tis; /* The TIS object. */
+		};
 	};
 };
 
@@ -348,6 +352,7 @@ struct mlx5_txq_ctrl {
 	off_t uar_mmap_offset; /* UAR mmap offset for non-primary process. */
 	void *bf_reg; /* BlueFlame register from Verbs. */
 	uint16_t dump_file_n; /* Number of dump files. */
+	struct rte_eth_hairpin_conf hairpin_conf; /* Hairpin configuration. */
 	struct mlx5_txq_data txq; /* Data path structure. */
 	/* Must be the last field in the structure, contains elts[]. */
 };
@@ -410,15 +415,22 @@ struct mlx5_hrxq *mlx5_hrxq_get(struct rte_eth_dev *dev,
 
 int mlx5_tx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 			unsigned int socket, const struct rte_eth_txconf *conf);
+int mlx5_tx_hairpin_queue_setup
+	(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
+	 const struct rte_eth_hairpin_conf *hairpin_conf);
 void mlx5_tx_queue_release(void *dpdk_txq);
 int mlx5_tx_uar_init_secondary(struct rte_eth_dev *dev, int fd);
-struct mlx5_txq_obj *mlx5_txq_obj_new(struct rte_eth_dev *dev, uint16_t idx);
+struct mlx5_txq_obj *mlx5_txq_obj_new(struct rte_eth_dev *dev, uint16_t idx,
+				      enum mlx5_txq_obj_type type);
 struct mlx5_txq_obj *mlx5_txq_obj_get(struct rte_eth_dev *dev, uint16_t idx);
 int mlx5_txq_obj_release(struct mlx5_txq_obj *txq_ibv);
 int mlx5_txq_obj_verify(struct rte_eth_dev *dev);
 struct mlx5_txq_ctrl *mlx5_txq_new(struct rte_eth_dev *dev, uint16_t idx,
 				   uint16_t desc, unsigned int socket,
 				   const struct rte_eth_txconf *conf);
+struct mlx5_txq_ctrl *mlx5_txq_hairpin_new
+	(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
+	 const struct rte_eth_hairpin_conf *hairpin_conf);
 struct mlx5_txq_ctrl *mlx5_txq_get(struct rte_eth_dev *dev, uint16_t idx);
 int mlx5_txq_release(struct rte_eth_dev *dev, uint16_t idx);
 int mlx5_txq_releasable(struct rte_eth_dev *dev, uint16_t idx);
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index 50c4df5..3ec86c4 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -51,8 +51,14 @@
 
 		if (!txq_ctrl)
 			continue;
-		txq_alloc_elts(txq_ctrl);
-		txq_ctrl->obj = mlx5_txq_obj_new(dev, i);
+		if (txq_ctrl->type == MLX5_TXQ_TYPE_HAIRPIN) {
+			txq_ctrl->obj = mlx5_txq_obj_new
+				(dev, i, MLX5_TXQ_OBJ_TYPE_DEVX_HAIRPIN);
+		} else {
+			txq_alloc_elts(txq_ctrl);
+			txq_ctrl->obj = mlx5_txq_obj_new
+				(dev, i, MLX5_TXQ_OBJ_TYPE_IBV);
+		}
 		if (!txq_ctrl->obj) {
 			rte_errno = ENOMEM;
 			goto error;
diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
index a6e2563..dfc379c 100644
--- a/drivers/net/mlx5/mlx5_txq.c
+++ b/drivers/net/mlx5/mlx5_txq.c
@@ -136,30 +136,22 @@
 }
 
 /**
- * DPDK callback to configure a TX queue.
+ * Tx queue presetup checks.
  *
  * @param dev
  *   Pointer to Ethernet device structure.
  * @param idx
- *   TX queue index.
+ *   Tx queue index.
  * @param desc
  *   Number of descriptors to configure in queue.
- * @param socket
- *   NUMA socket on which memory must be allocated.
- * @param[in] conf
- *   Thresholds parameters.
  *
  * @return
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
-int
-mlx5_tx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
-		    unsigned int socket, const struct rte_eth_txconf *conf)
+static int
+mlx5_tx_queue_pre_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
-	struct mlx5_txq_data *txq = (*priv->txqs)[idx];
-	struct mlx5_txq_ctrl *txq_ctrl =
-		container_of(txq, struct mlx5_txq_ctrl, txq);
 
 	if (desc <= MLX5_TX_COMP_THRESH) {
 		DRV_LOG(WARNING,
@@ -191,6 +183,38 @@
 		return -rte_errno;
 	}
 	mlx5_txq_release(dev, idx);
+	return 0;
+}
+/**
+ * DPDK callback to configure a TX queue.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ * @param idx
+ *   TX queue index.
+ * @param desc
+ *   Number of descriptors to configure in queue.
+ * @param socket
+ *   NUMA socket on which memory must be allocated.
+ * @param[in] conf
+ *   Thresholds parameters.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_tx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
+		    unsigned int socket, const struct rte_eth_txconf *conf)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_txq_data *txq = (*priv->txqs)[idx];
+	struct mlx5_txq_ctrl *txq_ctrl =
+		container_of(txq, struct mlx5_txq_ctrl, txq);
+	int res;
+
+	res = mlx5_tx_queue_pre_setup(dev, idx, desc);
+	if (res)
+		return res;
 	txq_ctrl = mlx5_txq_new(dev, idx, desc, socket, conf);
 	if (!txq_ctrl) {
 		DRV_LOG(ERR, "port %u unable to allocate queue index %u",
@@ -204,6 +228,57 @@
 }
 
 /**
+ * DPDK callback to configure a TX hairpin queue.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ * @param idx
+ *   TX queue index.
+ * @param desc
+ *   Number of descriptors to configure in queue.
+ * @param[in] hairpin_conf
+ *   The hairpin binding configuration.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_tx_hairpin_queue_setup(struct rte_eth_dev *dev, uint16_t idx,
+			    uint16_t desc,
+			    const struct rte_eth_hairpin_conf *hairpin_conf)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_txq_data *txq = (*priv->txqs)[idx];
+	struct mlx5_txq_ctrl *txq_ctrl =
+		container_of(txq, struct mlx5_txq_ctrl, txq);
+	int res;
+
+	res = mlx5_tx_queue_pre_setup(dev, idx, desc);
+	if (res)
+		return res;
+	if (hairpin_conf->peer_count != 1 ||
+	    hairpin_conf->peers[0].port != dev->data->port_id ||
+	    hairpin_conf->peers[0].queue >= priv->rxqs_n) {
+		DRV_LOG(ERR, "port %u unable to setup hairpin queue index %u "
+			" invalid hairpind configuration", dev->data->port_id,
+			idx);
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+	txq_ctrl = mlx5_txq_hairpin_new(dev, idx, desc, hairpin_conf);
+	if (!txq_ctrl) {
+		DRV_LOG(ERR, "port %u unable to allocate queue index %u",
+			dev->data->port_id, idx);
+		return -rte_errno;
+	}
+	DRV_LOG(DEBUG, "port %u adding Tx queue %u to list",
+		dev->data->port_id, idx);
+	(*priv->txqs)[idx] = &txq_ctrl->txq;
+	txq_ctrl->type = MLX5_TXQ_TYPE_HAIRPIN;
+	return 0;
+}
+
+/**
  * DPDK callback to release a TX queue.
  *
  * @param dpdk_txq
@@ -246,6 +321,8 @@
 	const size_t page_size = sysconf(_SC_PAGESIZE);
 #endif
 
+	if (txq_ctrl->type != MLX5_TXQ_TYPE_STANDARD)
+		return;
 	assert(rte_eal_process_type() == RTE_PROC_PRIMARY);
 	assert(ppriv);
 	ppriv->uar_table[txq_ctrl->txq.idx] = txq_ctrl->bf_reg;
@@ -282,6 +359,8 @@
 	uintptr_t offset;
 	const size_t page_size = sysconf(_SC_PAGESIZE);
 
+	if (txq_ctrl->type != MLX5_TXQ_TYPE_STANDARD)
+		return 0;
 	assert(ppriv);
 	/*
 	 * As rdma-core, UARs are mapped in size of OS page
@@ -316,6 +395,8 @@
 	const size_t page_size = sysconf(_SC_PAGESIZE);
 	void *addr;
 
+	if (txq_ctrl->type != MLX5_TXQ_TYPE_STANDARD)
+		return;
 	addr = ppriv->uar_table[txq_ctrl->txq.idx];
 	munmap(RTE_PTR_ALIGN_FLOOR(addr, page_size), page_size);
 }
@@ -346,6 +427,8 @@
 			continue;
 		txq = (*priv->txqs)[i];
 		txq_ctrl = container_of(txq, struct mlx5_txq_ctrl, txq);
+		if (txq_ctrl->type != MLX5_TXQ_TYPE_STANDARD)
+			continue;
 		assert(txq->idx == (uint16_t)i);
 		ret = txq_uar_init_secondary(txq_ctrl, fd);
 		if (ret)
@@ -365,18 +448,87 @@
 }
 
 /**
+ * Create the Tx hairpin queue object.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param idx
+ *   Queue index in DPDK Tx queue array
+ *
+ * @return
+ *   The hairpin DevX object initialised, NULL otherwise and rte_errno is set.
+ */
+static struct mlx5_txq_obj *
+mlx5_txq_obj_hairpin_new(struct rte_eth_dev *dev, uint16_t idx)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_txq_data *txq_data = (*priv->txqs)[idx];
+	struct mlx5_txq_ctrl *txq_ctrl =
+		container_of(txq_data, struct mlx5_txq_ctrl, txq);
+	struct mlx5_devx_create_sq_attr attr = { 0 };
+	struct mlx5_txq_obj *tmpl = NULL;
+	int ret = 0;
+
+	assert(txq_data);
+	assert(!txq_ctrl->obj);
+	tmpl = rte_calloc_socket(__func__, 1, sizeof(*tmpl), 0,
+				 txq_ctrl->socket);
+	if (!tmpl) {
+		DRV_LOG(ERR,
+			"port %u Tx queue %u cannot allocate memory resources",
+			dev->data->port_id, txq_data->idx);
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+	tmpl->type = MLX5_TXQ_OBJ_TYPE_DEVX_HAIRPIN;
+	tmpl->txq_ctrl = txq_ctrl;
+	attr.hairpin = 1;
+	attr.tis_lst_sz = 1;
+	/* Workaround for hairpin startup */
+	attr.wq_attr.log_hairpin_num_packets = log2above(32);
+	/* Workaround for packets larger than 1KB */
+	attr.wq_attr.log_hairpin_data_sz =
+			priv->config.hca_attr.log_max_hairpin_wq_data_sz;
+	attr.tis_num = priv->sh->tis->id;
+	tmpl->sq = mlx5_devx_cmd_create_sq(priv->sh->ctx, &attr);
+	if (!tmpl->sq) {
+		DRV_LOG(ERR,
+			"port %u tx hairpin queue %u can't create sq object",
+			dev->data->port_id, idx);
+		rte_errno = errno;
+		goto error;
+	}
+	DRV_LOG(DEBUG, "port %u sxq %u updated with %p", dev->data->port_id,
+		idx, (void *)&tmpl);
+	rte_atomic32_inc(&tmpl->refcnt);
+	LIST_INSERT_HEAD(&priv->txqsobj, tmpl, next);
+	return tmpl;
+error:
+	ret = rte_errno; /* Save rte_errno before cleanup. */
+	if (tmpl->tis)
+		mlx5_devx_cmd_destroy(tmpl->tis);
+	if (tmpl->sq)
+		mlx5_devx_cmd_destroy(tmpl->sq);
+	rte_errno = ret; /* Restore rte_errno. */
+	return NULL;
+}
+
+/**
  * Create the Tx queue Verbs object.
  *
  * @param dev
  *   Pointer to Ethernet device.
  * @param idx
  *   Queue index in DPDK Tx queue array.
+ * @param type
+ *   Type of the Tx queue object to create.
  *
  * @return
  *   The Verbs object initialised, NULL otherwise and rte_errno is set.
  */
 struct mlx5_txq_obj *
-mlx5_txq_obj_new(struct rte_eth_dev *dev, uint16_t idx)
+mlx5_txq_obj_new(struct rte_eth_dev *dev, uint16_t idx,
+		 enum mlx5_txq_obj_type type)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_txq_data *txq_data = (*priv->txqs)[idx];
@@ -396,6 +548,8 @@ struct mlx5_txq_obj *
 	const int desc = 1 << txq_data->elts_n;
 	int ret = 0;
 
+	if (type == MLX5_TXQ_OBJ_TYPE_DEVX_HAIRPIN)
+		return mlx5_txq_obj_hairpin_new(dev, idx);
 #ifdef HAVE_IBV_FLOW_DV_SUPPORT
 	/* If using DevX, need additional mask to read tisn value. */
 	if (priv->config.devx && !priv->sh->tdn)
@@ -643,8 +797,13 @@ struct mlx5_txq_obj *
 {
 	assert(txq_obj);
 	if (rte_atomic32_dec_and_test(&txq_obj->refcnt)) {
-		claim_zero(mlx5_glue->destroy_qp(txq_obj->qp));
-		claim_zero(mlx5_glue->destroy_cq(txq_obj->cq));
+		if (txq_obj->type == MLX5_TXQ_OBJ_TYPE_DEVX_HAIRPIN) {
+			if (txq_obj->tis)
+				claim_zero(mlx5_devx_cmd_destroy(txq_obj->tis));
+		} else {
+			claim_zero(mlx5_glue->destroy_qp(txq_obj->qp));
+			claim_zero(mlx5_glue->destroy_cq(txq_obj->cq));
+		}
 		LIST_REMOVE(txq_obj, next);
 		rte_free(txq_obj);
 		return 0;
@@ -1100,6 +1259,7 @@ struct mlx5_txq_ctrl *
 		goto error;
 	}
 	rte_atomic32_inc(&tmpl->refcnt);
+	tmpl->type = MLX5_TXQ_TYPE_STANDARD;
 	LIST_INSERT_HEAD(&priv->txqsctrl, tmpl, next);
 	return tmpl;
 error:
@@ -1108,6 +1268,46 @@ struct mlx5_txq_ctrl *
 }
 
 /**
+ * Create a DPDK Tx hairpin queue.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param idx
+ *   TX queue index.
+ * @param desc
+ *   Number of descriptors to configure in queue.
+ * @param hairpin_conf
+ *   The hairpin configuration.
+ *
+ * @return
+ *   A DPDK queue object on success, NULL otherwise and rte_errno is set.
+ */
+struct mlx5_txq_ctrl *
+mlx5_txq_hairpin_new(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
+		     const struct rte_eth_hairpin_conf *hairpin_conf)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_txq_ctrl *tmpl;
+
+	tmpl = rte_calloc_socket("TXQ", 1,
+				 sizeof(*tmpl), 0, SOCKET_ID_ANY);
+	if (!tmpl) {
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+	tmpl->priv = priv;
+	tmpl->socket = SOCKET_ID_ANY;
+	tmpl->txq.elts_n = log2above(desc);
+	tmpl->txq.port_id = dev->data->port_id;
+	tmpl->txq.idx = idx;
+	tmpl->hairpin_conf = *hairpin_conf;
+	tmpl->type = MLX5_TXQ_TYPE_HAIRPIN;
+	rte_atomic32_inc(&tmpl->refcnt);
+	LIST_INSERT_HEAD(&priv->txqsctrl, tmpl, next);
+	return tmpl;
+}
+
+/**
  * Get a Tx queue.
  *
  * @param dev
-- 
1.8.3.1



* [dpdk-dev] [PATCH v5 07/15] net/mlx5: add get hairpin capabilities
  2019-10-23 13:37 ` [dpdk-dev] [PATCH v5 " Ori Kam
                     ` (5 preceding siblings ...)
  2019-10-23 13:37   ` [dpdk-dev] [PATCH v5 06/15] net/mlx5: support Tx hairpin queues Ori Kam
@ 2019-10-23 13:37   ` Ori Kam
  2019-10-23 13:37   ` [dpdk-dev] [PATCH v5 08/15] app/testpmd: add hairpin support Ori Kam
                     ` (8 subsequent siblings)
  15 siblings, 0 replies; 186+ messages in thread
From: Ori Kam @ 2019-10-23 13:37 UTC (permalink / raw)
  To: Matan Azrad, Shahaf Shuler, Viacheslav Ovsiienko
  Cc: dev, orika, jingjing.wu, stephen

This commit adds the hairpin capabilities get function.
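
A minimal usage sketch from the application side, assuming a valid
probed port; dump_hairpin_cap() is a hypothetical helper name, not
part of the patch:

#include <stdio.h>
#include <rte_ethdev.h>

/* Query and print the hairpin capabilities of a port. */
static int
dump_hairpin_cap(uint16_t port_id)
{
	struct rte_eth_hairpin_cap cap;
	int ret = rte_eth_dev_hairpin_capability_get(port_id, &cap);

	if (ret != 0)
		return ret; /* e.g. -ENOTSUP when the PMD has no support. */
	printf("port %u hairpin: max queues %u, Rx->Tx %u, Tx->Rx %u,"
	       " max desc %u\n", port_id, cap.max_nb_queues,
	       cap.max_rx_2_tx, cap.max_tx_2_rx, cap.max_nb_desc);
	return 0;
}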

Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/net/mlx5/mlx5.c        |  2 ++
 drivers/net/mlx5/mlx5.h        |  3 ++-
 drivers/net/mlx5/mlx5_ethdev.c | 27 +++++++++++++++++++++++++++
 3 files changed, 31 insertions(+), 1 deletion(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 49b1e82..b0fdd9b 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -1039,6 +1039,7 @@ struct mlx5_dev_spawn_data {
 	.udp_tunnel_port_add  = mlx5_udp_tunnel_port_add,
 	.get_module_info = mlx5_get_module_info,
 	.get_module_eeprom = mlx5_get_module_eeprom,
+	.hairpin_cap_get = mlx5_hairpin_cap_get,
 };
 
 /* Available operations from secondary process. */
@@ -1101,6 +1102,7 @@ struct mlx5_dev_spawn_data {
 	.is_removed = mlx5_is_removed,
 	.get_module_info = mlx5_get_module_info,
 	.get_module_eeprom = mlx5_get_module_eeprom,
+	.hairpin_cap_get = mlx5_hairpin_cap_get,
 };
 
 /**
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 566bf2d..742bedd 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -789,7 +789,8 @@ int mlx5_get_module_info(struct rte_eth_dev *dev,
 			 struct rte_eth_dev_module_info *modinfo);
 int mlx5_get_module_eeprom(struct rte_eth_dev *dev,
 			   struct rte_dev_eeprom_info *info);
-
+int mlx5_hairpin_cap_get(struct rte_eth_dev *dev,
+			 struct rte_eth_hairpin_cap *cap);
 /* mlx5_mac.c */
 
 int mlx5_get_mac(struct rte_eth_dev *dev, uint8_t (*mac)[RTE_ETHER_ADDR_LEN]);
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index 2278b24..fe1b4d4 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -2114,3 +2114,30 @@ int mlx5_get_module_eeprom(struct rte_eth_dev *dev,
 	rte_free(eeprom);
 	return ret;
 }
+
+/**
+ * DPDK callback to retrieve hairpin capabilities.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ * @param[out] cap
+ *   Storage for hairpin capability data.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int mlx5_hairpin_cap_get(struct rte_eth_dev *dev,
+			 struct rte_eth_hairpin_cap *cap)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+
+	if (priv->sh->devx == 0) {
+		rte_errno = ENOTSUP;
+		return -rte_errno;
+	}
+	cap->max_nb_queues = UINT16_MAX;
+	cap->max_rx_2_tx = 1;
+	cap->max_tx_2_rx = 1;
+	cap->max_nb_desc = 8192;
+	return 0;
+}
-- 
1.8.3.1



* [dpdk-dev] [PATCH v5 08/15] app/testpmd: add hairpin support
  2019-10-23 13:37 ` [dpdk-dev] [PATCH v5 " Ori Kam
                     ` (6 preceding siblings ...)
  2019-10-23 13:37   ` [dpdk-dev] [PATCH v5 07/15] net/mlx5: add get hairpin capabilities Ori Kam
@ 2019-10-23 13:37   ` Ori Kam
  2019-10-23 13:37   ` [dpdk-dev] [PATCH v5 09/15] net/mlx5: add hairpin binding function Ori Kam
                     ` (7 subsequent siblings)
  15 siblings, 0 replies; 186+ messages in thread
From: Ori Kam @ 2019-10-23 13:37 UTC (permalink / raw)
  To: Wenzhuo Lu, Jingjing Wu, Bernard Iremonger; +Cc: dev, orika, stephen

This commit introduces hairpin queues to testpmd.
The hairpin queues are configured using --hairpinq=<n>, which adds
n queues to both the total number of Tx queues and Rx queues.
The connection between the queues is 1:1: the first Rx hairpin queue
is connected to the first Tx hairpin queue, and so on. An example
invocation is shown below.
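
For illustration, a hypothetical invocation (the core list, memory
channels and PCI address are placeholders for a real setup):

testpmd -l 0-1 -n 4 -w 0000:03:00.0 -- -i --rxq=2 --txq=2 --hairpinq=2

With these parameters each port gets 4 Rx and 4 Tx queues in total;
queues 0-1 are normal queues, while Rx queues 2-3 are hairpin queues
peered 1:1 with Tx hairpin queues 2-3.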

Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 app/test-pmd/parameters.c |  28 ++++++++++++
 app/test-pmd/testpmd.c    | 109 +++++++++++++++++++++++++++++++++++++++++++++-
 app/test-pmd/testpmd.h    |   3 ++
 3 files changed, 138 insertions(+), 2 deletions(-)

diff --git a/app/test-pmd/parameters.c b/app/test-pmd/parameters.c
index 9ea87c1..9b6e35b 100644
--- a/app/test-pmd/parameters.c
+++ b/app/test-pmd/parameters.c
@@ -149,6 +149,8 @@
 	printf("  --rxd=N: set the number of descriptors in RX rings to N.\n");
 	printf("  --txq=N: set the number of TX queues per port to N.\n");
 	printf("  --txd=N: set the number of descriptors in TX rings to N.\n");
+	printf("  --hairpinq=N: set the number of hairpin queues per port to "
+	       "N.\n");
 	printf("  --burst=N: set the number of packets per burst to N.\n");
 	printf("  --mbcache=N: set the cache of mbuf memory pool to N.\n");
 	printf("  --rxpt=N: set prefetch threshold register of RX rings to N.\n");
@@ -622,6 +624,7 @@
 		{ "txq",			1, 0, 0 },
 		{ "rxd",			1, 0, 0 },
 		{ "txd",			1, 0, 0 },
+		{ "hairpinq",			1, 0, 0 },
 		{ "burst",			1, 0, 0 },
 		{ "mbcache",			1, 0, 0 },
 		{ "txpt",			1, 0, 0 },
@@ -1045,6 +1048,31 @@
 						  " >= 0 && <= %u\n", n,
 						  get_allowed_max_nb_txq(&pid));
 			}
+			if (!strcmp(lgopts[opt_idx].name, "hairpinq")) {
+				n = atoi(optarg);
+				if (n >= 0 &&
+				    check_nb_hairpinq((queueid_t)n) == 0)
+					nb_hairpinq = (queueid_t) n;
+				else
+					rte_exit(EXIT_FAILURE, "hairpinq %d invalid - must be"
+						  " >= 0 && <= %u\n", n,
+						  get_allowed_max_nb_hairpinq
+						  (&pid));
+				if ((n + nb_txq) < 0 ||
+				    check_nb_txq((queueid_t)(n + nb_txq)) != 0)
+					rte_exit(EXIT_FAILURE, "txq + hairpinq "
+						 "%d invalid - must be"
+						  " >= 0 && <= %u\n",
+						  n + nb_txq,
+						  get_allowed_max_nb_txq(&pid));
+				if ((n + nb_rxq) < 0 ||
+				    check_nb_rxq((queueid_t)(n + nb_rxq)) != 0)
+					rte_exit(EXIT_FAILURE, "rxq + hairpinq "
+						 "%d invalid - must be"
+						  " >= 0 && <= %u\n",
+						  n + nb_rxq,
+						  get_allowed_max_nb_rxq(&pid));
+			}
 			if (!nb_rxq && !nb_txq) {
 				rte_exit(EXIT_FAILURE, "Either rx or tx queues should "
 						"be non-zero\n");
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 5701f31..fec946f 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -235,6 +235,7 @@ struct fwd_engine * fwd_engines[] = {
 /*
  * Configurable number of RX/TX queues.
  */
+queueid_t nb_hairpinq; /**< Number of hairpin queues per port. */
 queueid_t nb_rxq = 1; /**< Number of RX queues per port. */
 queueid_t nb_txq = 1; /**< Number of TX queues per port. */
 
@@ -1103,6 +1104,53 @@ struct extmem_param {
 	return 0;
 }
 
+/*
+ * Get the allowed maximum number of hairpin queues.
+ * *pid returns the port ID which has the minimal value of
+ * max_hairpin_queues among all ports.
+ */
+queueid_t
+get_allowed_max_nb_hairpinq(portid_t *pid)
+{
+	queueid_t allowed_max_hairpinq = MAX_QUEUE_ID;
+	portid_t pi;
+	struct rte_eth_hairpin_cap cap;
+
+	RTE_ETH_FOREACH_DEV(pi) {
+		if (rte_eth_dev_hairpin_capability_get(pi, &cap) != 0) {
+			*pid = pi;
+			return 0;
+		}
+		if (cap.max_nb_queues < allowed_max_hairpinq) {
+			allowed_max_hairpinq = cap.max_nb_queues;
+			*pid = pi;
+		}
+	}
+	return allowed_max_hairpinq;
+}
+
+/*
+ * Check whether the input number of hairpin queues is valid.
+ * It is valid if it does not exceed the maximum number of
+ * hairpin queues supported by any port.
+ * Return 0 if valid, -1 otherwise.
+ */
+int
+check_nb_hairpinq(queueid_t hairpinq)
+{
+	queueid_t allowed_max_hairpinq;
+	portid_t pid = 0;
+
+	allowed_max_hairpinq = get_allowed_max_nb_hairpinq(&pid);
+	if (hairpinq > allowed_max_hairpinq) {
+		printf("Fail: input hairpin (%u) can't be greater "
+		       "than max_hairpin_queues (%u) of port %u\n",
+		       hairpinq, allowed_max_hairpinq, pid);
+		return -1;
+	}
+	return 0;
+}
+
 static void
 init_config(void)
 {
@@ -2064,6 +2112,11 @@ struct extmem_param {
 	queueid_t qi;
 	struct rte_port *port;
 	struct rte_ether_addr mac_addr;
+	struct rte_eth_hairpin_conf hairpin_conf = {
+		.peer_count = 1,
+	};
+	int i;
+	struct rte_eth_hairpin_cap cap;
 
 	if (port_id_is_invalid(pid, ENABLED_WARN))
 		return 0;
@@ -2096,9 +2149,16 @@ struct extmem_param {
 			configure_rxtx_dump_callbacks(0);
 			printf("Configuring Port %d (socket %u)\n", pi,
 					port->socket_id);
+			if (nb_hairpinq > 0 &&
+			    rte_eth_dev_hairpin_capability_get(pi, &cap)) {
+				printf("Port %d doesn't support hairpin "
+				       "queues\n", pi);
+				return -1;
+			}
 			/* configure port */
-			diag = rte_eth_dev_configure(pi, nb_rxq, nb_txq,
-						&(port->dev_conf));
+			diag = rte_eth_dev_configure(pi, nb_rxq + nb_hairpinq,
+						     nb_txq + nb_hairpinq,
+						     &(port->dev_conf));
 			if (diag != 0) {
 				if (rte_atomic16_cmpset(&(port->port_status),
 				RTE_PORT_HANDLING, RTE_PORT_STOPPED) == 0)
@@ -2191,6 +2251,51 @@ struct extmem_param {
 				port->need_reconfig_queues = 1;
 				return -1;
 			}
+			/* setup hairpin queues */
+			i = 0;
+			for (qi = nb_txq; qi < nb_hairpinq + nb_txq; qi++) {
+				hairpin_conf.peers[0].port = pi;
+				hairpin_conf.peers[0].queue = i + nb_rxq;
+				diag = rte_eth_tx_hairpin_queue_setup
+					(pi, qi, nb_txd, &hairpin_conf);
+				i++;
+				if (diag == 0)
+					continue;
+
+				/* Failed to setup Tx hairpin queue, return. */
+				if (rte_atomic16_cmpset(&(port->port_status),
+							RTE_PORT_HANDLING,
+							RTE_PORT_STOPPED) == 0)
+					printf("Port %d can not be set back "
+							"to stopped\n", pi);
+				printf("Failed to configure port %d hairpin "
+				       "queues\n", pi);
+				/* try to reconfigure queues next time */
+				port->need_reconfig_queues = 1;
+				return -1;
+			}
+			i = 0;
+			for (qi = nb_rxq; qi < nb_hairpinq + nb_rxq; qi++) {
+				hairpin_conf.peers[0].port = pi;
+				hairpin_conf.peers[0].queue = i + nb_txq;
+				diag = rte_eth_rx_hairpin_queue_setup
+					(pi, qi, nb_rxd, &hairpin_conf);
+				i++;
+				if (diag == 0)
+					continue;
+
+				/* Failed to setup Rx hairpin queue, return. */
+				if (rte_atomic16_cmpset(&(port->port_status),
+							RTE_PORT_HANDLING,
+							RTE_PORT_STOPPED) == 0)
+					printf("Port %d can not be set back "
+							"to stopped\n", pi);
+				printf("Failed to configure port %d hairpin "
+				       "queues\n", pi);
+				/* try to reconfigure queues next time */
+				port->need_reconfig_queues = 1;
+				return -1;
+			}
 		}
 		configure_rxtx_dump_callbacks(verbose_level);
 		/* start port */
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index 540bf82..1e94cf6 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -383,6 +383,7 @@ struct queue_stats_mappings {
 
 extern uint64_t rss_hf;
 
+extern queueid_t nb_hairpinq;
 extern queueid_t nb_rxq;
 extern queueid_t nb_txq;
 
@@ -855,6 +856,8 @@ enum print_warning {
 int check_nb_rxq(queueid_t rxq);
 queueid_t get_allowed_max_nb_txq(portid_t *pid);
 int check_nb_txq(queueid_t txq);
+queueid_t get_allowed_max_nb_hairpinq(portid_t *pid);
+int check_nb_hairpinq(queueid_t hairpinq);
 
 uint16_t dump_rx_pkts(uint16_t port_id, uint16_t queue, struct rte_mbuf *pkts[],
 		      uint16_t nb_pkts, __rte_unused uint16_t max_pkts,
-- 
1.8.3.1



* [dpdk-dev] [PATCH v5 09/15] net/mlx5: add hairpin binding function
  2019-10-23 13:37 ` [dpdk-dev] [PATCH v5 " Ori Kam
                     ` (7 preceding siblings ...)
  2019-10-23 13:37   ` [dpdk-dev] [PATCH v5 08/15] app/testpmd: add hairpin support Ori Kam
@ 2019-10-23 13:37   ` Ori Kam
  2019-10-23 13:37   ` [dpdk-dev] [PATCH v5 10/15] net/mlx5: add support for hairpin hrxq Ori Kam
                     ` (6 subsequent siblings)
  15 siblings, 0 replies; 186+ messages in thread
From: Ori Kam @ 2019-10-23 13:37 UTC (permalink / raw)
  To: Matan Azrad, Shahaf Shuler, Viacheslav Ovsiienko
  Cc: dev, orika, jingjing.wu, stephen

When starting the port, in addition to creating the queues,
we need to bind the hairpin queues.

Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/net/mlx5/mlx5.h           |  1 +
 drivers/net/mlx5/mlx5_devx_cmds.c |  1 +
 drivers/net/mlx5/mlx5_prm.h       |  6 +++
 drivers/net/mlx5/mlx5_trigger.c   | 97 +++++++++++++++++++++++++++++++++++++++
 4 files changed, 105 insertions(+)

diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 742bedd..33cfc5b 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -191,6 +191,7 @@ struct mlx5_hca_attr {
 	uint32_t log_max_hairpin_queues:5;
 	uint32_t log_max_hairpin_wq_data_sz:5;
 	uint32_t log_max_hairpin_num_packets:5;
+	uint32_t vhca_id:16;
 };
 
 /* Flow list . */
diff --git a/drivers/net/mlx5/mlx5_devx_cmds.c b/drivers/net/mlx5/mlx5_devx_cmds.c
index a501f1f..3471a9b 100644
--- a/drivers/net/mlx5/mlx5_devx_cmds.c
+++ b/drivers/net/mlx5/mlx5_devx_cmds.c
@@ -334,6 +334,7 @@ struct mlx5_devx_obj *
 						    log_max_hairpin_wq_data_sz);
 	attr->log_max_hairpin_num_packets = MLX5_GET
 		(cmd_hca_cap, hcattr, log_min_hairpin_wq_data_sz);
+	attr->vhca_id = MLX5_GET(cmd_hca_cap, hcattr, vhca_id);
 	attr->eth_net_offloads = MLX5_GET(cmd_hca_cap, hcattr,
 					  eth_net_offloads);
 	attr->eth_virt = MLX5_GET(cmd_hca_cap, hcattr, eth_virt);
diff --git a/drivers/net/mlx5/mlx5_prm.h b/drivers/net/mlx5/mlx5_prm.h
index c687cfb..e4b19f8 100644
--- a/drivers/net/mlx5/mlx5_prm.h
+++ b/drivers/net/mlx5/mlx5_prm.h
@@ -1628,6 +1628,12 @@ struct mlx5_ifc_create_rqt_in_bits {
 #pragma GCC diagnostic error "-Wpedantic"
 #endif
 
+enum {
+	MLX5_SQC_STATE_RST  = 0x0,
+	MLX5_SQC_STATE_RDY  = 0x1,
+	MLX5_SQC_STATE_ERR  = 0x3,
+};
+
 struct mlx5_ifc_sqc_bits {
 	u8 rlky[0x1];
 	u8 cd_master[0x1];
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index 3ec86c4..a4fcdb3 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -162,6 +162,96 @@
 }
 
 /**
+ * Binds Tx queues to Rx queues for hairpin.
+ *
+ * Binds Tx queues to the target Rx queues.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx5_hairpin_bind(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_devx_modify_sq_attr sq_attr = { 0 };
+	struct mlx5_devx_modify_rq_attr rq_attr = { 0 };
+	struct mlx5_txq_ctrl *txq_ctrl;
+	struct mlx5_rxq_ctrl *rxq_ctrl;
+	struct mlx5_devx_obj *sq;
+	struct mlx5_devx_obj *rq;
+	unsigned int i;
+	int ret = 0;
+
+	for (i = 0; i != priv->txqs_n; ++i) {
+		txq_ctrl = mlx5_txq_get(dev, i);
+		if (!txq_ctrl)
+			continue;
+		if (txq_ctrl->type != MLX5_TXQ_TYPE_HAIRPIN) {
+			mlx5_txq_release(dev, i);
+			continue;
+		}
+		if (!txq_ctrl->obj) {
+			rte_errno = ENOMEM;
+			DRV_LOG(ERR, "port %u no txq object found: %d",
+				dev->data->port_id, i);
+			mlx5_txq_release(dev, i);
+			return -rte_errno;
+		}
+		sq = txq_ctrl->obj->sq;
+		rxq_ctrl = mlx5_rxq_get(dev,
+					txq_ctrl->hairpin_conf.peers[0].queue);
+		if (!rxq_ctrl) {
+			mlx5_txq_release(dev, i);
+			rte_errno = EINVAL;
+			DRV_LOG(ERR, "port %u no rxq object found: %d",
+				dev->data->port_id,
+				txq_ctrl->hairpin_conf.peers[0].queue);
+			return -rte_errno;
+		}
+		if (rxq_ctrl->type != MLX5_RXQ_TYPE_HAIRPIN ||
+		    rxq_ctrl->hairpin_conf.peers[0].queue != i) {
+			rte_errno = ENOMEM;
+			DRV_LOG(ERR, "port %u Tx queue %d can't be bound to "
+				"Rx queue %d", dev->data->port_id,
+				i, txq_ctrl->hairpin_conf.peers[0].queue);
+			goto error;
+		}
+		rq = rxq_ctrl->obj->rq;
+		if (!rq) {
+			rte_errno = ENOMEM;
+			DRV_LOG(ERR, "port %u hairpin no matching rxq: %d",
+				dev->data->port_id,
+				txq_ctrl->hairpin_conf.peers[0].queue);
+			goto error;
+		}
+		sq_attr.state = MLX5_SQC_STATE_RDY;
+		sq_attr.sq_state = MLX5_SQC_STATE_RST;
+		sq_attr.hairpin_peer_rq = rq->id;
+		sq_attr.hairpin_peer_vhca = priv->config.hca_attr.vhca_id;
+		ret = mlx5_devx_cmd_modify_sq(sq, &sq_attr);
+		if (ret)
+			goto error;
+		rq_attr.state = MLX5_SQC_STATE_RDY;
+		rq_attr.rq_state = MLX5_SQC_STATE_RST;
+		rq_attr.hairpin_peer_sq = sq->id;
+		rq_attr.hairpin_peer_vhca = priv->config.hca_attr.vhca_id;
+		ret = mlx5_devx_cmd_modify_rq(rq, &rq_attr);
+		if (ret)
+			goto error;
+		mlx5_txq_release(dev, i);
+		mlx5_rxq_release(dev, txq_ctrl->hairpin_conf.peers[0].queue);
+	}
+	return 0;
+error:
+	mlx5_txq_release(dev, i);
+	mlx5_rxq_release(dev, txq_ctrl->hairpin_conf.peers[0].queue);
+	return -rte_errno;
+}
+
+/**
  * DPDK callback to start the device.
  *
  * Simulate device start by attaching all configured flows.
@@ -192,6 +282,13 @@
 		mlx5_txq_stop(dev);
 		return -rte_errno;
 	}
+	ret = mlx5_hairpin_bind(dev);
+	if (ret) {
+		DRV_LOG(ERR, "port %u hairpin binding failed: %s",
+			dev->data->port_id, strerror(rte_errno));
+		mlx5_txq_stop(dev);
+		return -rte_errno;
+	}
 	dev->data->dev_started = 1;
 	ret = mlx5_rx_intr_vec_enable(dev);
 	if (ret) {
-- 
1.8.3.1



* [dpdk-dev] [PATCH v5 10/15] net/mlx5: add support for hairpin hrxq
  2019-10-23 13:37 ` [dpdk-dev] [PATCH v5 " Ori Kam
                     ` (8 preceding siblings ...)
  2019-10-23 13:37   ` [dpdk-dev] [PATCH v5 09/15] net/mlx5: add hairpin binding function Ori Kam
@ 2019-10-23 13:37   ` Ori Kam
  2019-10-23 13:37   ` [dpdk-dev] [PATCH v5 11/15] net/mlx5: add internal tag item and action Ori Kam
                     ` (5 subsequent siblings)
  15 siblings, 0 replies; 186+ messages in thread
From: Ori Kam @ 2019-10-23 13:37 UTC (permalink / raw)
  To: Matan Azrad, Shahaf Shuler, Viacheslav Ovsiienko
  Cc: dev, orika, jingjing.wu, stephen

Add support for RSS on hairpin queues.
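
As an illustration, once the hairpin queues exist, traffic can be
distributed over them with a regular RSS action, e.g. in testpmd
(queue indices 2-3 are assumed to be the hairpin queues):

flow create 0 ingress pattern eth / end actions rss queues 2 3 end / end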

Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/net/mlx5/mlx5.h         |   3 ++
 drivers/net/mlx5/mlx5_ethdev.c  | 102 ++++++++++++++++++++++++++++++----------
 drivers/net/mlx5/mlx5_rss.c     |   1 +
 drivers/net/mlx5/mlx5_rxq.c     |  22 ++++++---
 drivers/net/mlx5/mlx5_trigger.c |   6 +++
 5 files changed, 104 insertions(+), 30 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 33cfc5b..a36ba2d 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -716,6 +716,7 @@ struct mlx5_priv {
 	rte_spinlock_t uar_lock[MLX5_UAR_PAGE_NUM_MAX];
 	/* UAR same-page access control required in 32bit implementations. */
 #endif
+	uint8_t skip_default_rss_reta; /* Skip configuration of default reta. */
 };
 
 #define PORT_ID(priv) ((priv)->dev_data->port_id)
@@ -792,6 +793,8 @@ int mlx5_get_module_eeprom(struct rte_eth_dev *dev,
 			   struct rte_dev_eeprom_info *info);
 int mlx5_hairpin_cap_get(struct rte_eth_dev *dev,
 			 struct rte_eth_hairpin_cap *cap);
+int mlx5_dev_configure_rss_reta(struct rte_eth_dev *dev);
+
 /* mlx5_mac.c */
 
 int mlx5_get_mac(struct rte_eth_dev *dev, uint8_t (*mac)[RTE_ETHER_ADDR_LEN]);
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index fe1b4d4..c2bed2f 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -383,9 +383,6 @@ struct ethtool_link_settings {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	unsigned int rxqs_n = dev->data->nb_rx_queues;
 	unsigned int txqs_n = dev->data->nb_tx_queues;
-	unsigned int i;
-	unsigned int j;
-	unsigned int reta_idx_n;
 	const uint8_t use_app_rss_key =
 		!!dev->data->dev_conf.rx_adv_conf.rss_conf.rss_key;
 	int ret = 0;
@@ -431,28 +428,8 @@ struct ethtool_link_settings {
 		DRV_LOG(INFO, "port %u Rx queues number update: %u -> %u",
 			dev->data->port_id, priv->rxqs_n, rxqs_n);
 		priv->rxqs_n = rxqs_n;
-		/*
-		 * If the requested number of RX queues is not a power of two,
-		 * use the maximum indirection table size for better balancing.
-		 * The result is always rounded to the next power of two.
-		 */
-		reta_idx_n = (1 << log2above((rxqs_n & (rxqs_n - 1)) ?
-					     priv->config.ind_table_max_size :
-					     rxqs_n));
-		ret = mlx5_rss_reta_index_resize(dev, reta_idx_n);
-		if (ret)
-			return ret;
-		/*
-		 * When the number of RX queues is not a power of two,
-		 * the remaining table entries are padded with reused WQs
-		 * and hashes are not spread uniformly.
-		 */
-		for (i = 0, j = 0; (i != reta_idx_n); ++i) {
-			(*priv->reta_idx)[i] = j;
-			if (++j == rxqs_n)
-				j = 0;
-		}
 	}
+	priv->skip_default_rss_reta = 0;
 	ret = mlx5_proc_priv_init(dev);
 	if (ret)
 		return ret;
@@ -460,6 +437,83 @@ struct ethtool_link_settings {
 }
 
 /**
+ * Configure default RSS reta.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_dev_configure_rss_reta(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	unsigned int rxqs_n = dev->data->nb_rx_queues;
+	unsigned int i;
+	unsigned int j;
+	unsigned int reta_idx_n;
+	int ret = 0;
+	unsigned int *rss_queue_arr = NULL;
+	unsigned int rss_queue_n = 0;
+
+	if (priv->skip_default_rss_reta)
+		return ret;
+	rss_queue_arr = rte_malloc("", rxqs_n * sizeof(unsigned int), 0);
+	if (!rss_queue_arr) {
+		DRV_LOG(ERR, "port %u cannot allocate RSS queue list (%u)",
+			dev->data->port_id, rxqs_n);
+		rte_errno = ENOMEM;
+		return -rte_errno;
+	}
+	for (i = 0, j = 0; i < rxqs_n; i++) {
+		struct mlx5_rxq_data *rxq_data;
+		struct mlx5_rxq_ctrl *rxq_ctrl;
+
+		rxq_data = (*priv->rxqs)[i];
+		rxq_ctrl = container_of(rxq_data, struct mlx5_rxq_ctrl, rxq);
+		if (rxq_ctrl->type == MLX5_RXQ_TYPE_STANDARD)
+			rss_queue_arr[j++] = i;
+	}
+	rss_queue_n = j;
+	if (rss_queue_n > priv->config.ind_table_max_size) {
+		DRV_LOG(ERR, "port %u cannot handle this many Rx queues (%u)",
+			dev->data->port_id, rss_queue_n);
+		rte_errno = EINVAL;
+		rte_free(rss_queue_arr);
+		return -rte_errno;
+	}
+	DRV_LOG(INFO, "port %u Rx queues number update: %u -> %u",
+		dev->data->port_id, priv->rxqs_n, rxqs_n);
+	priv->rxqs_n = rxqs_n;
+	/*
+	 * If the requested number of RX queues is not a power of two,
+	 * use the maximum indirection table size for better balancing.
+	 * The result is always rounded to the next power of two.
+	 */
+	reta_idx_n = (1 << log2above((rss_queue_n & (rss_queue_n - 1)) ?
+				priv->config.ind_table_max_size :
+				rss_queue_n));
+	ret = mlx5_rss_reta_index_resize(dev, reta_idx_n);
+	if (ret) {
+		rte_free(rss_queue_arr);
+		return ret;
+	}
+	/*
+	 * When the number of RX queues is not a power of two,
+	 * the remaining table entries are padded with reused WQs
+	 * and hashes are not spread uniformly.
+	 */
+	for (i = 0, j = 0; (i != reta_idx_n); ++i) {
+		(*priv->reta_idx)[i] = rss_queue_arr[j];
+		if (++j == rss_queue_n)
+			j = 0;
+	}
+	rte_free(rss_queue_arr);
+	return ret;
+}
+
+/**
  * Sets default tuning parameters.
  *
  * @param dev
diff --git a/drivers/net/mlx5/mlx5_rss.c b/drivers/net/mlx5/mlx5_rss.c
index 891d764..1028264 100644
--- a/drivers/net/mlx5/mlx5_rss.c
+++ b/drivers/net/mlx5/mlx5_rss.c
@@ -223,6 +223,7 @@
 	}
 	if (dev->data->dev_started) {
 		mlx5_dev_stop(dev);
+		priv->skip_default_rss_reta = 1;
 		return mlx5_dev_start(dev);
 	}
 	return 0;
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index c70e161..2c3d5eb 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -2156,9 +2156,13 @@ struct mlx5_rxq_ctrl *
 		}
 	} else { /* ind_tbl->type == MLX5_IND_TBL_TYPE_DEVX */
 		struct mlx5_devx_rqt_attr *rqt_attr = NULL;
+		const unsigned int rqt_n =
+			1 << (rte_is_power_of_2(queues_n) ?
+			      log2above(queues_n) :
+			      log2above(priv->config.ind_table_max_size));
 
 		rqt_attr = rte_calloc(__func__, 1, sizeof(*rqt_attr) +
-				      queues_n * sizeof(uint32_t), 0);
+				      rqt_n * sizeof(uint32_t), 0);
 		if (!rqt_attr) {
 			DRV_LOG(ERR, "port %u cannot allocate RQT resources",
 				dev->data->port_id);
@@ -2166,7 +2170,7 @@ struct mlx5_rxq_ctrl *
 			goto error;
 		}
 		rqt_attr->rqt_max_size = priv->config.ind_table_max_size;
-		rqt_attr->rqt_actual_size = queues_n;
+		rqt_attr->rqt_actual_size = rqt_n;
 		for (i = 0; i != queues_n; ++i) {
 			struct mlx5_rxq_ctrl *rxq = mlx5_rxq_get(dev,
 								 queues[i]);
@@ -2175,6 +2179,9 @@ struct mlx5_rxq_ctrl *
 			rqt_attr->rq_list[i] = rxq->obj->rq->id;
 			ind_tbl->queues[i] = queues[i];
 		}
+		k = i; /* Retain value of i for use in error case. */
+		for (j = 0; k != rqt_n; ++k, ++j)
+			rqt_attr->rq_list[k] = rqt_attr->rq_list[j];
 		ind_tbl->rqt = mlx5_devx_cmd_create_rqt(priv->sh->ctx,
 							rqt_attr);
 		rte_free(rqt_attr);
@@ -2328,13 +2335,13 @@ struct mlx5_hrxq *
 	struct mlx5_ind_table_obj *ind_tbl;
 	int err;
 	struct mlx5_devx_obj *tir = NULL;
+	struct mlx5_rxq_data *rxq_data = (*priv->rxqs)[queues[0]];
+	struct mlx5_rxq_ctrl *rxq_ctrl =
+		container_of(rxq_data, struct mlx5_rxq_ctrl, rxq);
 
 	queues_n = hash_fields ? queues_n : 1;
 	ind_tbl = mlx5_ind_table_obj_get(dev, queues, queues_n);
 	if (!ind_tbl) {
-		struct mlx5_rxq_data *rxq_data = (*priv->rxqs)[queues[0]];
-		struct mlx5_rxq_ctrl *rxq_ctrl =
-			container_of(rxq_data, struct mlx5_rxq_ctrl, rxq);
 		enum mlx5_ind_tbl_type type;
 
 		type = rxq_ctrl->obj->type == MLX5_RXQ_OBJ_TYPE_IBV ?
@@ -2430,7 +2437,10 @@ struct mlx5_hrxq *
 		tir_attr.rx_hash_fn = MLX5_RX_HASH_FN_TOEPLITZ;
 		memcpy(&tir_attr.rx_hash_field_selector_outer, &hash_fields,
 		       sizeof(uint64_t));
-		tir_attr.transport_domain = priv->sh->tdn;
+		if (rxq_ctrl->obj->type == MLX5_RXQ_OBJ_TYPE_DEVX_HAIRPIN)
+			tir_attr.transport_domain = priv->sh->td->id;
+		else
+			tir_attr.transport_domain = priv->sh->tdn;
 		memcpy(tir_attr.rx_hash_toeplitz_key, rss_key, rss_key_len);
 		tir_attr.indirect_table = ind_tbl->rqt->id;
 		if (dev->data->dev_conf.lpbk_mode)
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index a4fcdb3..f66b6ee 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -269,6 +269,12 @@
 	int ret;
 
 	DRV_LOG(DEBUG, "port %u starting device", dev->data->port_id);
+	ret = mlx5_dev_configure_rss_reta(dev);
+	if (ret) {
+		DRV_LOG(ERR, "port %u reta config failed: %s",
+			dev->data->port_id, strerror(rte_errno));
+		return -rte_errno;
+	}
 	ret = mlx5_txq_start(dev);
 	if (ret) {
 		DRV_LOG(ERR, "port %u Tx queue allocation failed: %s",
-- 
1.8.3.1



* [dpdk-dev] [PATCH v5 11/15] net/mlx5: add internal tag item and action
  2019-10-23 13:37 ` [dpdk-dev] [PATCH v5 " Ori Kam
                     ` (9 preceding siblings ...)
  2019-10-23 13:37   ` [dpdk-dev] [PATCH v5 10/15] net/mlx5: add support for hairpin hrxq Ori Kam
@ 2019-10-23 13:37   ` Ori Kam
  2019-10-23 13:37   ` [dpdk-dev] [PATCH v5 12/15] net/mlx5: add id generation function Ori Kam
                     ` (4 subsequent siblings)
  15 siblings, 0 replies; 186+ messages in thread
From: Ori Kam @ 2019-10-23 13:37 UTC (permalink / raw)
  To: Matan Azrad, Shahaf Shuler, Viacheslav Ovsiienko
  Cc: dev, orika, jingjing.wu, stephen

This commit introduces setting and matching on registers.
This item and action will be used by a number of different
features such as hairpin, metering and metadata.
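
As a sketch, the private item added below could be instantiated
inside the PMD as follows to match on register C2 (the register
choice and the value 5 are arbitrary, for illustration only):

	struct mlx5_rte_flow_item_tag tag_spec = {
		.id = REG_C_2,
		.data = rte_cpu_to_be_32(5),
	};
	struct mlx5_rte_flow_item_tag tag_mask = {
		.id = REG_C_2,
		.data = RTE_BE32(UINT32_MAX),
	};
	struct rte_flow_item item = {
		.type = (enum rte_flow_item_type)MLX5_RTE_FLOW_ITEM_TYPE_TAG,
		.spec = &tag_spec,
		.mask = &tag_mask,
	};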

Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/net/mlx5/mlx5_flow.c    |  52 +++++++++++++
 drivers/net/mlx5/mlx5_flow.h    |  48 +++++++++++-
 drivers/net/mlx5/mlx5_flow_dv.c | 158 +++++++++++++++++++++++++++++++++++++++-
 drivers/net/mlx5/mlx5_prm.h     |   3 +-
 4 files changed, 254 insertions(+), 7 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index d4d956f..a309b6f 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -316,6 +316,58 @@ struct mlx5_flow_tunnel_info {
 	},
 };
 
+enum mlx5_feature_name {
+	MLX5_HAIRPIN_RX,
+	MLX5_HAIRPIN_TX,
+	MLX5_APPLICATION,
+};
+
+/**
+ * Translate tag ID to register.
+ *
+ * @param[in] dev
+ *   Pointer to the Ethernet device structure.
+ * @param[in] feature
+ *   The feature that request the register.
+ * @param[in] id
+ *   The requested register ID.
+ * @param[out] error
+ *   Error description in case of failure.
+ *
+ * @return
+ *   The requested register on success, a negative errno
+ *   value otherwise and rte_errno is set.
+ */
+__rte_unused
+static enum modify_reg flow_get_reg_id(struct rte_eth_dev *dev,
+				       enum mlx5_feature_name feature,
+				       uint32_t id,
+				       struct rte_flow_error *error)
+{
+	static enum modify_reg id2reg[] = {
+		[0] = REG_A,
+		[1] = REG_C_2,
+		[2] = REG_C_3,
+		[3] = REG_C_4,
+		[4] = REG_B,};
+
+	dev = (void *)dev;
+	switch (feature) {
+	case MLX5_HAIRPIN_RX:
+		return REG_B;
+	case MLX5_HAIRPIN_TX:
+		return REG_A;
+	case MLX5_APPLICATION:
+		if (id > 4)
+			return rte_flow_error_set(error, EINVAL,
+						  RTE_FLOW_ERROR_TYPE_ITEM,
+						  NULL, "invalid tag id");
+		return id2reg[id];
+	}
+	return rte_flow_error_set(error, EINVAL, RTE_FLOW_ERROR_TYPE_ITEM,
+				  NULL, "invalid feature name");
+}
+
 /**
  * Discover the maximum number of priority available.
  *
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 9658db1..a79b48b 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -27,6 +27,43 @@
 #include "mlx5.h"
 #include "mlx5_prm.h"
 
+enum modify_reg {
+	REG_A,
+	REG_B,
+	REG_C_0,
+	REG_C_1,
+	REG_C_2,
+	REG_C_3,
+	REG_C_4,
+	REG_C_5,
+	REG_C_6,
+	REG_C_7,
+};
+
+/* Private rte flow items. */
+enum mlx5_rte_flow_item_type {
+	MLX5_RTE_FLOW_ITEM_TYPE_END = INT_MIN,
+	MLX5_RTE_FLOW_ITEM_TYPE_TAG,
+};
+
+/* Private rte flow actions. */
+enum mlx5_rte_flow_action_type {
+	MLX5_RTE_FLOW_ACTION_TYPE_END = INT_MIN,
+	MLX5_RTE_FLOW_ACTION_TYPE_TAG,
+};
+
+/* Matches on selected register. */
+struct mlx5_rte_flow_item_tag {
+	uint16_t id;
+	rte_be32_t data;
+};
+
+/* Modify selected register. */
+struct mlx5_rte_flow_action_set_tag {
+	uint16_t id;
+	rte_be32_t data;
+};
+
 /* Pattern outer Layer bits. */
 #define MLX5_FLOW_LAYER_OUTER_L2 (1u << 0)
 #define MLX5_FLOW_LAYER_OUTER_L3_IPV4 (1u << 1)
@@ -53,11 +90,12 @@
 /* General pattern items bits. */
 #define MLX5_FLOW_ITEM_METADATA (1u << 16)
 #define MLX5_FLOW_ITEM_PORT_ID (1u << 17)
+#define MLX5_FLOW_ITEM_TAG (1u << 18)
 
 /* Pattern MISC bits. */
-#define MLX5_FLOW_LAYER_ICMP (1u << 18)
-#define MLX5_FLOW_LAYER_ICMP6 (1u << 19)
-#define MLX5_FLOW_LAYER_GRE_KEY (1u << 20)
+#define MLX5_FLOW_LAYER_ICMP (1u << 19)
+#define MLX5_FLOW_LAYER_ICMP6 (1u << 20)
+#define MLX5_FLOW_LAYER_GRE_KEY (1u << 21)
 
 /* Pattern tunnel Layer bits (continued). */
 #define MLX5_FLOW_LAYER_IPIP (1u << 21)
@@ -141,6 +179,7 @@
 #define MLX5_FLOW_ACTION_DEC_TCP_SEQ (1u << 29)
 #define MLX5_FLOW_ACTION_INC_TCP_ACK (1u << 30)
 #define MLX5_FLOW_ACTION_DEC_TCP_ACK (1u << 31)
+#define MLX5_FLOW_ACTION_SET_TAG (1ull << 32)
 
 #define MLX5_FLOW_FATE_ACTIONS \
 	(MLX5_FLOW_ACTION_DROP | MLX5_FLOW_ACTION_QUEUE | \
@@ -174,7 +213,8 @@
 				      MLX5_FLOW_ACTION_DEC_TCP_SEQ | \
 				      MLX5_FLOW_ACTION_INC_TCP_ACK | \
 				      MLX5_FLOW_ACTION_DEC_TCP_ACK | \
-				      MLX5_FLOW_ACTION_OF_SET_VLAN_VID)
+				      MLX5_FLOW_ACTION_OF_SET_VLAN_VID | \
+				      MLX5_FLOW_ACTION_SET_TAG)
 
 #define MLX5_FLOW_VLAN_ACTIONS (MLX5_FLOW_ACTION_OF_POP_VLAN | \
 				MLX5_FLOW_ACTION_OF_PUSH_VLAN)
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index b1aa427..a671952 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -723,6 +723,59 @@ struct field_modify_info modify_tcp[] = {
 					     MLX5_MODIFICATION_TYPE_ADD, error);
 }
 
+static enum mlx5_modification_field reg_to_field[] = {
+	[REG_A] = MLX5_MODI_META_DATA_REG_A,
+	[REG_B] = MLX5_MODI_META_DATA_REG_B,
+	[REG_C_0] = MLX5_MODI_META_REG_C_0,
+	[REG_C_1] = MLX5_MODI_META_REG_C_1,
+	[REG_C_2] = MLX5_MODI_META_REG_C_2,
+	[REG_C_3] = MLX5_MODI_META_REG_C_3,
+	[REG_C_4] = MLX5_MODI_META_REG_C_4,
+	[REG_C_5] = MLX5_MODI_META_REG_C_5,
+	[REG_C_6] = MLX5_MODI_META_REG_C_6,
+	[REG_C_7] = MLX5_MODI_META_REG_C_7,
+};
+
+/**
+ * Convert register set to DV specification.
+ *
+ * @param[in,out] resource
+ *   Pointer to the modify-header resource.
+ * @param[in] action
+ *   Pointer to action specification.
+ * @param[out] error
+ *   Pointer to the error structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+flow_dv_convert_action_set_reg
+			(struct mlx5_flow_dv_modify_hdr_resource *resource,
+			 const struct rte_flow_action *action,
+			 struct rte_flow_error *error)
+{
+	const struct mlx5_rte_flow_action_set_tag *conf = (action->conf);
+	struct mlx5_modification_cmd *actions = resource->actions;
+	uint32_t i = resource->actions_num;
+
+	if (i >= MLX5_MODIFY_NUM)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_ACTION, NULL,
+					  "too many items to modify");
+	actions[i].action_type = MLX5_MODIFICATION_TYPE_SET;
+	actions[i].field = reg_to_field[conf->id];
+	actions[i].data0 = rte_cpu_to_be_32(actions[i].data0);
+	actions[i].data1 = conf->data;
+	++i;
+	resource->actions_num = i;
+	if (!resource->actions_num)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_ACTION, NULL,
+					  "invalid modification flow item");
+	return 0;
+}
+
 /**
  * Validate META item.
  *
@@ -4719,6 +4772,94 @@ struct field_modify_info modify_tcp[] = {
 }
 
 /**
+ * Add tag item to matcher.
+ *
+ * @param[in, out] matcher
+ *   Flow matcher.
+ * @param[in, out] key
+ *   Flow matcher value.
+ * @param[in] item
+ *   Flow pattern to translate.
+ */
+static void
+flow_dv_translate_item_tag(void *matcher, void *key,
+			   const struct rte_flow_item *item)
+{
+	void *misc2_m =
+		MLX5_ADDR_OF(fte_match_param, matcher, misc_parameters_2);
+	void *misc2_v =
+		MLX5_ADDR_OF(fte_match_param, key, misc_parameters_2);
+	const struct mlx5_rte_flow_item_tag *tag_v = item->spec;
+	const struct mlx5_rte_flow_item_tag *tag_m = item->mask;
+	enum modify_reg reg = tag_v->id;
+	rte_be32_t value = tag_v->data;
+	rte_be32_t mask = tag_m->data;
+
+	switch (reg) {
+	case REG_A:
+		MLX5_SET(fte_match_set_misc2, misc2_m, metadata_reg_a,
+				rte_be_to_cpu_32(mask));
+		MLX5_SET(fte_match_set_misc2, misc2_v, metadata_reg_a,
+				rte_be_to_cpu_32(value));
+		break;
+	case REG_B:
+		MLX5_SET(fte_match_set_misc2, misc2_m, metadata_reg_b,
+				 rte_be_to_cpu_32(mask));
+		MLX5_SET(fte_match_set_misc2, misc2_v, metadata_reg_b,
+				rte_be_to_cpu_32(value));
+		break;
+	case REG_C_0:
+		MLX5_SET(fte_match_set_misc2, misc2_m, metadata_reg_c_0,
+				 rte_be_to_cpu_32(mask));
+		MLX5_SET(fte_match_set_misc2, misc2_v, metadata_reg_c_0,
+				rte_be_to_cpu_32(value));
+		break;
+	case REG_C_1:
+		MLX5_SET(fte_match_set_misc2, misc2_m, metadata_reg_c_1,
+				 rte_be_to_cpu_32(mask));
+		MLX5_SET(fte_match_set_misc2, misc2_v, metadata_reg_c_1,
+				rte_be_to_cpu_32(value));
+		break;
+	case REG_C_2:
+		MLX5_SET(fte_match_set_misc2, misc2_m, metadata_reg_c_2,
+				 rte_be_to_cpu_32(mask));
+		MLX5_SET(fte_match_set_misc2, misc2_v, metadata_reg_c_2,
+				rte_be_to_cpu_32(value));
+		break;
+	case REG_C_3:
+		MLX5_SET(fte_match_set_misc2, misc2_m, metadata_reg_c_3,
+				 rte_be_to_cpu_32(mask));
+		MLX5_SET(fte_match_set_misc2, misc2_v, metadata_reg_c_3,
+				rte_be_to_cpu_32(value));
+		break;
+	case REG_C_4:
+		MLX5_SET(fte_match_set_misc2, misc2_m, metadata_reg_c_4,
+				 rte_be_to_cpu_32(mask));
+		MLX5_SET(fte_match_set_misc2, misc2_v, metadata_reg_c_4,
+				rte_be_to_cpu_32(value));
+		break;
+	case REG_C_5:
+		MLX5_SET(fte_match_set_misc2, misc2_m, metadata_reg_c_5,
+				 rte_be_to_cpu_32(mask));
+		MLX5_SET(fte_match_set_misc2, misc2_v, metadata_reg_c_5,
+				rte_be_to_cpu_32(value));
+		break;
+	case REG_C_6:
+		MLX5_SET(fte_match_set_misc2, misc2_m, metadata_reg_c_6,
+				 rte_be_to_cpu_32(mask));
+		MLX5_SET(fte_match_set_misc2, misc2_v, metadata_reg_c_6,
+				rte_be_to_cpu_32(value));
+		break;
+	case REG_C_7:
+		MLX5_SET(fte_match_set_misc2, misc2_m, metadata_reg_c_7,
+				 rte_be_to_cpu_32(mask));
+		MLX5_SET(fte_match_set_misc2, misc2_v, metadata_reg_c_7,
+				rte_be_to_cpu_32(value));
+		break;
+	}
+}
+
+/**
  * Add source vport match to the specified matcher.
  *
  * @param[in, out] matcher
@@ -5304,8 +5445,9 @@ struct field_modify_info modify_tcp[] = {
 		struct mlx5_flow_tbl_resource *tbl;
 		uint32_t port_id = 0;
 		struct mlx5_flow_dv_port_id_action_resource port_id_resource;
+		int action_type = actions->type;
 
-		switch (actions->type) {
+		switch (action_type) {
 		case RTE_FLOW_ACTION_TYPE_VOID:
 			break;
 		case RTE_FLOW_ACTION_TYPE_PORT_ID:
@@ -5620,6 +5762,12 @@ struct field_modify_info modify_tcp[] = {
 					MLX5_FLOW_ACTION_INC_TCP_ACK :
 					MLX5_FLOW_ACTION_DEC_TCP_ACK;
 			break;
+		case MLX5_RTE_FLOW_ACTION_TYPE_TAG:
+			if (flow_dv_convert_action_set_reg(&res, actions,
+							   error))
+				return -rte_errno;
+			action_flags |= MLX5_FLOW_ACTION_SET_TAG;
+			break;
 		case RTE_FLOW_ACTION_TYPE_END:
 			actions_end = true;
 			if (action_flags & MLX5_FLOW_MODIFY_HDR_ACTIONS) {
@@ -5644,8 +5792,9 @@ struct field_modify_info modify_tcp[] = {
 	flow->actions = action_flags;
 	for (; items->type != RTE_FLOW_ITEM_TYPE_END; items++) {
 		int tunnel = !!(item_flags & MLX5_FLOW_LAYER_TUNNEL);
+		int item_type = items->type;
 
-		switch (items->type) {
+		switch (item_type) {
 		case RTE_FLOW_ITEM_TYPE_PORT_ID:
 			flow_dv_translate_item_port_id(dev, match_mask,
 						       match_value, items);
@@ -5796,6 +5945,11 @@ struct field_modify_info modify_tcp[] = {
 						      items, tunnel);
 			last_item = MLX5_FLOW_LAYER_ICMP6;
 			break;
+		case MLX5_RTE_FLOW_ITEM_TYPE_TAG:
+			flow_dv_translate_item_tag(match_mask, match_value,
+						   items);
+			last_item = MLX5_FLOW_ITEM_TAG;
+			break;
 		default:
 			break;
 		}
diff --git a/drivers/net/mlx5/mlx5_prm.h b/drivers/net/mlx5/mlx5_prm.h
index e4b19f8..96b9166 100644
--- a/drivers/net/mlx5/mlx5_prm.h
+++ b/drivers/net/mlx5/mlx5_prm.h
@@ -628,7 +628,8 @@ struct mlx5_ifc_fte_match_set_misc2_bits {
 	u8 metadata_reg_c_1[0x20];
 	u8 metadata_reg_c_0[0x20];
 	u8 metadata_reg_a[0x20];
-	u8 reserved_at_1a0[0x60];
+	u8 metadata_reg_b[0x20];
+	u8 reserved_at_1c0[0x40];
 };
 
 struct mlx5_ifc_fte_match_set_misc3_bits {
-- 
1.8.3.1



* [dpdk-dev] [PATCH v5 12/15] net/mlx5: add id generation function
  2019-10-23 13:37 ` [dpdk-dev] [PATCH v5 " Ori Kam
                     ` (10 preceding siblings ...)
  2019-10-23 13:37   ` [dpdk-dev] [PATCH v5 11/15] net/mlx5: add internal tag item and action Ori Kam
@ 2019-10-23 13:37   ` Ori Kam
  2019-10-23 13:37   ` [dpdk-dev] [PATCH v5 13/15] net/mlx5: add default flows for hairpin Ori Kam
                     ` (3 subsequent siblings)
  15 siblings, 0 replies; 186+ messages in thread
From: Ori Kam @ 2019-10-23 13:37 UTC (permalink / raw)
  To: Matan Azrad, Shahaf Shuler, Viacheslav Ovsiienko
  Cc: dev, orika, jingjing.wu, stephen

When splitting flows, for example for hairpin / metering, there is a
need to link the resulting flows together. This is done using an ID.
This commit introduces a simple way to generate such IDs.

The reason a bitmap was not used is that its release and allocation
are O(n), while in the chosen approach both allocation and release
are O(1).
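
A short usage sketch of the pool API added below (driver-internal
code, error handling abbreviated):

	struct mlx5_flow_id_pool *pool = mlx5_flow_id_pool_alloc();
	uint32_t id;

	if (!pool)
		return -rte_errno;
	if (!mlx5_flow_id_get(pool, &id)) {
		/* Use "id" to tie the split sub-flows together. */
		mlx5_flow_id_release(pool, id); /* O(1) push on free_arr. */
	}
	mlx5_flow_id_pool_release(pool);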

Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>

---
 drivers/net/mlx5/mlx5.c      | 120 ++++++++++++++++++++++++++++++++++++++++++-
 drivers/net/mlx5/mlx5_flow.h |  14 +++++
 2 files changed, 133 insertions(+), 1 deletion(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index b0fdd9b..b7a98b8 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -178,6 +178,124 @@ struct mlx5_dev_spawn_data {
 static LIST_HEAD(, mlx5_ibv_shared) mlx5_ibv_list = LIST_HEAD_INITIALIZER();
 static pthread_mutex_t mlx5_ibv_list_mutex = PTHREAD_MUTEX_INITIALIZER;
 
+#define MLX5_FLOW_MIN_ID_POOL_SIZE 512
+#define MLX5_ID_GENERATION_ARRAY_FACTOR 16
+
+/**
+ * Allocate ID pool structure.
+ *
+ * @return
+ *   Pointer to pool object, NULL value otherwise.
+ */
+struct mlx5_flow_id_pool *
+mlx5_flow_id_pool_alloc(void)
+{
+	struct mlx5_flow_id_pool *pool;
+	void *mem;
+
+	pool = rte_zmalloc("id pool allocation", sizeof(*pool),
+			   RTE_CACHE_LINE_SIZE);
+	if (!pool) {
+		DRV_LOG(ERR, "can't allocate id pool");
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+	mem = rte_zmalloc("", MLX5_FLOW_MIN_ID_POOL_SIZE * sizeof(uint32_t),
+			  RTE_CACHE_LINE_SIZE);
+	if (!mem) {
+		DRV_LOG(ERR, "can't allocate mem for id pool");
+		rte_errno = ENOMEM;
+		goto error;
+	}
+	pool->free_arr = mem;
+	pool->curr = pool->free_arr;
+	pool->last = pool->free_arr + MLX5_FLOW_MIN_ID_POOL_SIZE;
+	pool->base_index = 0;
+	return pool;
+error:
+	rte_free(pool);
+	return NULL;
+}
+
+/**
+ * Release ID pool structure.
+ *
+ * @param[in] pool
+ *   Pointer to flow id pool object to free.
+ */
+void
+mlx5_flow_id_pool_release(struct mlx5_flow_id_pool *pool)
+{
+	rte_free(pool->free_arr);
+	rte_free(pool);
+}
+
+/**
+ * Generate ID.
+ *
+ * @param[in] pool
+ *   Pointer to flow id pool.
+ * @param[out] id
+ *   The generated ID.
+ *
+ * @return
+ *   0 on success, error value otherwise.
+ */
+uint32_t
+mlx5_flow_id_get(struct mlx5_flow_id_pool *pool, uint32_t *id)
+{
+	if (pool->curr == pool->free_arr) {
+		if (pool->base_index == UINT32_MAX) {
+			rte_errno = ENOMEM;
+			DRV_LOG(ERR, "no free id");
+			return -rte_errno;
+		}
+		*id = ++pool->base_index;
+		return 0;
+	}
+	*id = *(--pool->curr);
+	return 0;
+}
+
+/**
+ * Release ID.
+ *
+ * @param[in] pool
+ *   Pointer to flow id pool.
+ * @param[out] id
+ *   The generated ID.
+ *
+ * @return
+ *   0 on success, error value otherwise.
+ */
+uint32_t
+mlx5_flow_id_release(struct mlx5_flow_id_pool *pool, uint32_t id)
+{
+	uint32_t size;
+	uint32_t size2;
+	void *mem;
+
+	if (pool->curr == pool->last) {
+		size = pool->curr - pool->free_arr;
+		size2 = size * MLX5_ID_GENERATION_ARRAY_FACTOR;
+		assert(size2 > size);
+		mem = rte_malloc("", size2 * sizeof(uint32_t), 0);
+		if (!mem) {
+			DRV_LOG(ERR, "can't allocate mem for id pool");
+			rte_errno = ENOMEM;
+			return -rte_errno;
+		}
+		memcpy(mem, pool->free_arr, size * sizeof(uint32_t));
+		rte_free(pool->free_arr);
+		pool->free_arr = mem;
+		pool->curr = pool->free_arr + size;
+		pool->last = pool->free_arr + size2;
+	}
+	*pool->curr = id;
+	pool->curr++;
+	return 0;
+}
+
 /**
  * Initialize the counters management structure.
  *
@@ -328,7 +446,7 @@ struct mlx5_dev_spawn_data {
 	struct mlx5_devx_tis_attr tis_attr = { 0 };
 #endif
 
-	assert(spawn);
+	assert(spawn);
 	/* Secondary process should not create the shared context. */
 	assert(rte_eal_process_type() == RTE_PROC_PRIMARY);
 	pthread_mutex_lock(&mlx5_ibv_list_mutex);
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index a79b48b..fddc06b 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -527,8 +527,22 @@ struct mlx5_flow_driver_ops {
 #define MLX5_CNT_CONTAINER_UNUSED(sh, batch, thread) (&(sh)->cmng.ccont \
 	[(~((sh)->cmng.mhi[batch] >> (thread)) & 0x1) * 2 + (batch)])
 
+/* ID generation structure. */
+struct mlx5_flow_id_pool {
+	uint32_t *free_arr; /**< Pointer to an array of free values. */
+	uint32_t base_index;
+	/**< The next index that can be used without any free elements. */
+	uint32_t *curr; /**< Pointer to the index to pop. */
+	uint32_t *last; /**< Pointer to the last element in the empty array. */
+};
+
 /* mlx5_flow.c */
 
+struct mlx5_flow_id_pool *mlx5_flow_id_pool_alloc(void);
+void mlx5_flow_id_pool_release(struct mlx5_flow_id_pool *pool);
+uint32_t mlx5_flow_id_get(struct mlx5_flow_id_pool *pool, uint32_t *id);
+uint32_t mlx5_flow_id_release(struct mlx5_flow_id_pool *pool,
+			      uint32_t id);
 int mlx5_flow_group_to_table(const struct rte_flow_attr *attributes,
 			     bool external, uint32_t group, uint32_t *table,
 			     struct rte_flow_error *error);
-- 
1.8.3.1



* [dpdk-dev] [PATCH v5 13/15] net/mlx5: add default flows for hairpin
  2019-10-23 13:37 ` [dpdk-dev] [PATCH v5 " Ori Kam
                     ` (11 preceding siblings ...)
  2019-10-23 13:37   ` [dpdk-dev] [PATCH v5 12/15] net/mlx5: add id generation function Ori Kam
@ 2019-10-23 13:37   ` Ori Kam
  2019-10-23 13:37   ` [dpdk-dev] [PATCH v5 14/15] net/mlx5: split hairpin flows Ori Kam
                     ` (2 subsequent siblings)
  15 siblings, 0 replies; 186+ messages in thread
From: Ori Kam @ 2019-10-23 13:37 UTC (permalink / raw)
  To: Matan Azrad, Shahaf Shuler, Viacheslav Ovsiienko
  Cc: dev, orika, jingjing.wu, stephen

When using hairpin, all traffic from Tx hairpin queues should jump
to a dedicated table where matching can be done using registers.
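
Conceptually, for every hairpin Tx queue N the driver installs the
equivalent of the following pseudo-rule (the tx_queue item stands
for the private Tx queue item; 65534 is MLX5_HAIRPIN_TX_TABLE, i.e.
UINT16_MAX - 1):

    egress pattern tx_queue is N / end actions jump group 65534 / end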

Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/net/mlx5/mlx5.h         |  2 ++
 drivers/net/mlx5/mlx5_flow.c    | 60 +++++++++++++++++++++++++++++++++++++++
 drivers/net/mlx5/mlx5_flow.h    |  9 ++++++
 drivers/net/mlx5/mlx5_flow_dv.c | 63 +++++++++++++++++++++++++++++++++++++++--
 drivers/net/mlx5/mlx5_trigger.c | 18 ++++++++++++
 5 files changed, 150 insertions(+), 2 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index a36ba2d..1181c1f 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -560,6 +560,7 @@ struct mlx5_flow_tbl_resource {
 };
 
 #define MLX5_MAX_TABLES UINT16_MAX
+#define MLX5_HAIRPIN_TX_TABLE (UINT16_MAX - 1)
 #define MLX5_MAX_TABLES_FDB UINT16_MAX
 
 #define MLX5_DBR_PAGE_SIZE 4096 /* Must be >= 512. */
@@ -883,6 +884,7 @@ int mlx5_dev_filter_ctrl(struct rte_eth_dev *dev,
 int mlx5_flow_start(struct rte_eth_dev *dev, struct mlx5_flows *list);
 void mlx5_flow_stop(struct rte_eth_dev *dev, struct mlx5_flows *list);
 int mlx5_flow_verify(struct rte_eth_dev *dev);
+int mlx5_ctrl_flow_source_queue(struct rte_eth_dev *dev, uint32_t queue);
 int mlx5_ctrl_flow_vlan(struct rte_eth_dev *dev,
 			struct rte_flow_item_eth *eth_spec,
 			struct rte_flow_item_eth *eth_mask,
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index a309b6f..1148db0 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -2820,6 +2820,66 @@ struct rte_flow *
 }
 
 /**
+ * Enable default hairpin egress flow.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param queue
+ *   The queue index.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_ctrl_flow_source_queue(struct rte_eth_dev *dev,
+			    uint32_t queue)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	const struct rte_flow_attr attr = {
+		.egress = 1,
+		.priority = 0,
+	};
+	struct mlx5_rte_flow_item_tx_queue queue_spec = {
+		.queue = queue,
+	};
+	struct mlx5_rte_flow_item_tx_queue queue_mask = {
+		.queue = UINT32_MAX,
+	};
+	struct rte_flow_item items[] = {
+		{
+			.type = MLX5_RTE_FLOW_ITEM_TYPE_TX_QUEUE,
+			.spec = &queue_spec,
+			.last = NULL,
+			.mask = &queue_mask,
+		},
+		{
+			.type = RTE_FLOW_ITEM_TYPE_END,
+		},
+	};
+	struct rte_flow_action_jump jump = {
+		.group = MLX5_HAIRPIN_TX_TABLE,
+	};
+	struct rte_flow_action actions[2];
+	struct rte_flow *flow;
+	struct rte_flow_error error;
+
+	actions[0].type = RTE_FLOW_ACTION_TYPE_JUMP;
+	actions[0].conf = &jump;
+	actions[1].type = RTE_FLOW_ACTION_TYPE_END;
+	flow = flow_list_create(dev, &priv->ctrl_flows,
+				&attr, items, actions, false, &error);
+	if (!flow) {
+		DRV_LOG(DEBUG,
+			"Failed to create ctrl flow: rte_errno(%d),"
+			" type(%d), message(%s)\n",
+			rte_errno, error.type,
+			error.message ? error.message : " (no stated reason)");
+		return -rte_errno;
+	}
+	return 0;
+}
+
+/**
  * Enable a control flow configured from the control plane.
  *
  * @param dev
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index fddc06b..f81e1b1 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -44,6 +44,7 @@ enum modify_reg {
 enum mlx5_rte_flow_item_type {
 	MLX5_RTE_FLOW_ITEM_TYPE_END = INT_MIN,
 	MLX5_RTE_FLOW_ITEM_TYPE_TAG,
+	MLX5_RTE_FLOW_ITEM_TYPE_TX_QUEUE,
 };
 
 /* Private rte flow actions. */
@@ -64,6 +65,11 @@ struct mlx5_rte_flow_action_set_tag {
 	rte_be32_t data;
 };
 
+/* Matches on source queue. */
+struct mlx5_rte_flow_item_tx_queue {
+	uint32_t queue;
+};
+
 /* Pattern outer Layer bits. */
 #define MLX5_FLOW_LAYER_OUTER_L2 (1u << 0)
 #define MLX5_FLOW_LAYER_OUTER_L3_IPV4 (1u << 1)
@@ -103,6 +109,9 @@ struct mlx5_rte_flow_action_set_tag {
 #define MLX5_FLOW_LAYER_NVGRE (1u << 23)
 #define MLX5_FLOW_LAYER_GENEVE (1u << 24)
 
+/* Queue items. */
+#define MLX5_FLOW_ITEM_TX_QUEUE (1u << 25)
+
 /* Outer Masks. */
 #define MLX5_FLOW_LAYER_OUTER_L3 \
 	(MLX5_FLOW_LAYER_OUTER_L3_IPV4 | MLX5_FLOW_LAYER_OUTER_L3_IPV6)
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index a671952..4d881bb 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -3357,7 +3357,9 @@ struct field_modify_info modify_tcp[] = {
 		return ret;
 	for (; items->type != RTE_FLOW_ITEM_TYPE_END; items++) {
 		int tunnel = !!(item_flags & MLX5_FLOW_LAYER_TUNNEL);
-		switch (items->type) {
+		int type = items->type;
+
+		switch (type) {
 		case RTE_FLOW_ITEM_TYPE_VOID:
 			break;
 		case RTE_FLOW_ITEM_TYPE_PORT_ID:
@@ -3526,6 +3528,9 @@ struct field_modify_info modify_tcp[] = {
 				return ret;
 			last_item = MLX5_FLOW_LAYER_ICMP6;
 			break;
+		case MLX5_RTE_FLOW_ITEM_TYPE_TAG:
+		case MLX5_RTE_FLOW_ITEM_TYPE_TX_QUEUE:
+			break;
 		default:
 			return rte_flow_error_set(error, ENOTSUP,
 						  RTE_FLOW_ERROR_TYPE_ITEM,
@@ -3534,11 +3539,12 @@ struct field_modify_info modify_tcp[] = {
 		item_flags |= last_item;
 	}
 	for (; actions->type != RTE_FLOW_ACTION_TYPE_END; actions++) {
+		int type = actions->type;
 		if (actions_n == MLX5_DV_MAX_NUMBER_OF_ACTIONS)
 			return rte_flow_error_set(error, ENOTSUP,
 						  RTE_FLOW_ERROR_TYPE_ACTION,
 						  actions, "too many actions");
-		switch (actions->type) {
+		switch (type) {
 		case RTE_FLOW_ACTION_TYPE_VOID:
 			break;
 		case RTE_FLOW_ACTION_TYPE_PORT_ID:
@@ -3804,6 +3810,8 @@ struct field_modify_info modify_tcp[] = {
 						MLX5_FLOW_ACTION_INC_TCP_ACK :
 						MLX5_FLOW_ACTION_DEC_TCP_ACK;
 			break;
+		case MLX5_RTE_FLOW_ACTION_TYPE_TAG:
+			break;
 		default:
 			return rte_flow_error_set(error, ENOTSUP,
 						  RTE_FLOW_ERROR_TYPE_ACTION,
@@ -5370,6 +5378,51 @@ struct field_modify_info modify_tcp[] = {
 }
 
 /**
+ * Add Tx queue matcher.
+ *
+ * @param[in] dev
+ *   Pointer to the dev struct.
+ * @param[in, out] matcher
+ *   Flow matcher.
+ * @param[in, out] key
+ *   Flow matcher value.
+ * @param[in] item
+ *   Flow pattern to translate.
+ * @param[in] inner
+ *   Item is inner pattern.
+ */
+static void
+flow_dv_translate_item_tx_queue(struct rte_eth_dev *dev,
+				void *matcher, void *key,
+				const struct rte_flow_item *item)
+{
+	const struct mlx5_rte_flow_item_tx_queue *queue_m;
+	const struct mlx5_rte_flow_item_tx_queue *queue_v;
+	void *misc_m =
+		MLX5_ADDR_OF(fte_match_param, matcher, misc_parameters);
+	void *misc_v =
+		MLX5_ADDR_OF(fte_match_param, key, misc_parameters);
+	struct mlx5_txq_ctrl *txq;
+	uint32_t queue;
+
+
+	queue_m = (const void *)item->mask;
+	if (!queue_m)
+		return;
+	queue_v = (const void *)item->spec;
+	if (!queue_v)
+		return;
+	txq = mlx5_txq_get(dev, queue_v->queue);
+	if (!txq)
+		return;
+	queue = txq->obj->sq->id;
+	MLX5_SET(fte_match_set_misc, misc_m, source_sqn, queue_m->queue);
+	MLX5_SET(fte_match_set_misc, misc_v, source_sqn,
+		 queue & queue_m->queue);
+	mlx5_txq_release(dev, queue_v->queue);
+}
+
+/**
  * Fill the flow with DV spec.
  *
  * @param[in] dev
@@ -5950,6 +6003,12 @@ struct field_modify_info modify_tcp[] = {
 						   items);
 			last_item = MLX5_FLOW_ITEM_TAG;
 			break;
+		case MLX5_RTE_FLOW_ITEM_TYPE_TX_QUEUE:
+			flow_dv_translate_item_tx_queue(dev, match_mask,
+							match_value,
+							items);
+			last_item = MLX5_FLOW_ITEM_TX_QUEUE;
+			break;
 		default:
 			break;
 		}
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index f66b6ee..cafab25 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -402,6 +402,24 @@
 	unsigned int j;
 	int ret;
 
+	/*
+	 * The hairpin Txq default flow should be created no matter whether
+	 * isolated mode is enabled. Otherwise, all packets would be sent
+	 * out directly without the Tx flow actions, e.g. encapsulation.
+	 */
+	for (i = 0; i != priv->txqs_n; ++i) {
+		struct mlx5_txq_ctrl *txq_ctrl = mlx5_txq_get(dev, i);
+		if (!txq_ctrl)
+			continue;
+		if (txq_ctrl->type == MLX5_TXQ_TYPE_HAIRPIN) {
+			ret = mlx5_ctrl_flow_source_queue(dev, i);
+			if (ret) {
+				mlx5_txq_release(dev, i);
+				goto error;
+			}
+		}
+		mlx5_txq_release(dev, i);
+	}
 	if (priv->config.dv_esw_en && !priv->config.vf)
 		if (!mlx5_flow_create_esw_table_zero_flow(dev))
 			goto error;
-- 
1.8.3.1



* [dpdk-dev] [PATCH v5 14/15] net/mlx5: split hairpin flows
  2019-10-23 13:37 ` [dpdk-dev] [PATCH v5 " Ori Kam
                     ` (12 preceding siblings ...)
  2019-10-23 13:37   ` [dpdk-dev] [PATCH v5 13/15] net/mlx5: add default flows for hairpin Ori Kam
@ 2019-10-23 13:37   ` Ori Kam
  2019-10-23 13:37   ` [dpdk-dev] [PATCH v5 15/15] doc: add hairpin feature Ori Kam
  2019-10-25 18:49   ` [dpdk-dev] [PATCH v5 00/15] " Ferruh Yigit
  15 siblings, 0 replies; 186+ messages in thread
From: Ori Kam @ 2019-10-23 13:37 UTC (permalink / raw)
  To: Matan Azrad, Shahaf Shuler, Viacheslav Ovsiienko
  Cc: dev, orika, jingjing.wu, stephen

Since the encap action is not supported in RX, we need to split the
hairpin flow into RX and TX.
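
For illustration only, the kind of application flow that triggers the
split looks roughly like the following (a minimal sketch, not part of
the patch; the encap buffer and the hairpin queue index are assumed to
be prepared by the caller):

	#include <rte_flow.h>

	static struct rte_flow *
	create_hairpin_encap_flow(uint16_t port_id, uint16_t hairpin_queue,
				  uint8_t *encap_hdr, size_t encap_hdr_len)
	{
		/*
		 * Ingress flow with both an encap action and a hairpin
		 * queue target; the PMD splits it into an RX flow
		 * (set tag + queue) and a TX flow (match tag + encap).
		 */
		struct rte_flow_attr attr = { .ingress = 1 };
		struct rte_flow_item pattern[] = {
			{ .type = RTE_FLOW_ITEM_TYPE_ETH },
			{ .type = RTE_FLOW_ITEM_TYPE_END },
		};
		struct rte_flow_action_raw_encap encap = {
			.data = encap_hdr,
			.size = encap_hdr_len,
		};
		struct rte_flow_action_queue queue = {
			.index = hairpin_queue,
		};
		struct rte_flow_action actions[] = {
			{ .type = RTE_FLOW_ACTION_TYPE_RAW_ENCAP,
			  .conf = &encap },
			{ .type = RTE_FLOW_ACTION_TYPE_QUEUE,
			  .conf = &queue },
			{ .type = RTE_FLOW_ACTION_TYPE_END },
		};
		struct rte_flow_error error;

		return rte_flow_create(port_id, &attr, pattern, actions,
				       &error);
	}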

Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/net/mlx5/mlx5.c            |  10 ++
 drivers/net/mlx5/mlx5.h            |  10 ++
 drivers/net/mlx5/mlx5_flow.c       | 281 +++++++++++++++++++++++++++++++++++--
 drivers/net/mlx5/mlx5_flow.h       |  14 +-
 drivers/net/mlx5/mlx5_flow_dv.c    |  10 +-
 drivers/net/mlx5/mlx5_flow_verbs.c |  11 +-
 drivers/net/mlx5/mlx5_rxq.c        |  26 ++++
 drivers/net/mlx5/mlx5_rxtx.h       |   2 +
 8 files changed, 334 insertions(+), 30 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index b7a98b8..b622339 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -531,6 +531,12 @@ struct mlx5_flow_id_pool *
 			goto error;
 		}
 	}
+	sh->flow_id_pool = mlx5_flow_id_pool_alloc();
+	if (!sh->flow_id_pool) {
+		DRV_LOG(ERR, "can't create flow id pool");
+		err = ENOMEM;
+		goto error;
+	}
 #endif /* HAVE_IBV_FLOW_DV_SUPPORT */
 	/*
 	 * Once the device is added to the list of memory event
@@ -570,6 +576,8 @@ struct mlx5_flow_id_pool *
 		claim_zero(mlx5_glue->dealloc_pd(sh->pd));
 	if (sh->ctx)
 		claim_zero(mlx5_glue->close_device(sh->ctx));
+	if (sh->flow_id_pool)
+		mlx5_flow_id_pool_release(sh->flow_id_pool);
 	rte_free(sh);
 	assert(err > 0);
 	rte_errno = err;
@@ -641,6 +649,8 @@ struct mlx5_flow_id_pool *
 		claim_zero(mlx5_devx_cmd_destroy(sh->td));
 	if (sh->ctx)
 		claim_zero(mlx5_glue->close_device(sh->ctx));
+	if (sh->flow_id_pool)
+		mlx5_flow_id_pool_release(sh->flow_id_pool);
 	rte_free(sh);
 exit:
 	pthread_mutex_unlock(&mlx5_ibv_list_mutex);
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 1181c1f..f644998 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -578,6 +578,15 @@ struct mlx5_devx_dbr_page {
 	uint64_t dbr_bitmap[MLX5_DBR_BITMAP_SIZE];
 };
 
+/* ID generation structure. */
+struct mlx5_flow_id_pool {
+	uint32_t *free_arr; /**< Pointer to the array of free values. */
+	uint32_t base_index;
+	/**< The next index that can be used without any free elements. */
+	uint32_t *curr; /**< Pointer to the index to pop. */
+	uint32_t *last; /**< Pointer to the last element in the empty array. */
+};
+
 /*
  * Shared Infiniband device context for Master/Representors
  * which belong to same IB device with multiple IB ports.
@@ -637,6 +646,7 @@ struct mlx5_ibv_shared {
 	struct mlx5dv_devx_cmd_comp *devx_comp; /* DEVX async comp obj. */
 	struct mlx5_devx_obj *tis; /* TIS object. */
 	struct mlx5_devx_obj *td; /* Transport domain. */
+	struct mlx5_flow_id_pool *flow_id_pool; /* Flow ID pool. */
 	struct mlx5_ibv_shared_port port[]; /* per device port data array. */
 };
 
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index 1148db0..5f01f9c 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -606,7 +606,7 @@ uint32_t mlx5_flow_adjust_priority(struct rte_eth_dev *dev, int32_t priority,
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct rte_flow *flow = dev_flow->flow;
-	const int mark = !!(flow->actions &
+	const int mark = !!(dev_flow->actions &
 			    (MLX5_FLOW_ACTION_FLAG | MLX5_FLOW_ACTION_MARK));
 	const int tunnel = !!(dev_flow->layers & MLX5_FLOW_LAYER_TUNNEL);
 	unsigned int i;
@@ -669,7 +669,7 @@ uint32_t mlx5_flow_adjust_priority(struct rte_eth_dev *dev, int32_t priority,
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct rte_flow *flow = dev_flow->flow;
-	const int mark = !!(flow->actions &
+	const int mark = !!(dev_flow->actions &
 			    (MLX5_FLOW_ACTION_FLAG | MLX5_FLOW_ACTION_MARK));
 	const int tunnel = !!(dev_flow->layers & MLX5_FLOW_LAYER_TUNNEL);
 	unsigned int i;
@@ -2527,6 +2527,210 @@ uint32_t mlx5_flow_adjust_priority(struct rte_eth_dev *dev, int32_t priority,
 }
 
 /**
+ * Check if the flow should be split due to hairpin.
+ * The reason for the split is that in current HW we can't
+ * support encap on Rx, so if a flow has encap we move it
+ * to Tx.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param[in] attr
+ *   Flow rule attributes.
+ * @param[in] actions
+ *   Associated actions (list terminated by the END action).
+ *
+ * @return
+ *   > 0 the number of actions if the flow should be split,
+ *   0 when no split is required.
+ */
+static int
+flow_check_hairpin_split(struct rte_eth_dev *dev,
+			 const struct rte_flow_attr *attr,
+			 const struct rte_flow_action actions[])
+{
+	int queue_action = 0;
+	int action_n = 0;
+	int encap = 0;
+	const struct rte_flow_action_queue *queue;
+	const struct rte_flow_action_rss *rss;
+	const struct rte_flow_action_raw_encap *raw_encap;
+
+	if (!attr->ingress)
+		return 0;
+	for (; actions->type != RTE_FLOW_ACTION_TYPE_END; actions++) {
+		switch (actions->type) {
+		case RTE_FLOW_ACTION_TYPE_QUEUE:
+			queue = actions->conf;
+			if (mlx5_rxq_get_type(dev, queue->index) !=
+			    MLX5_RXQ_TYPE_HAIRPIN)
+				return 0;
+			queue_action = 1;
+			action_n++;
+			break;
+		case RTE_FLOW_ACTION_TYPE_RSS:
+			rss = actions->conf;
+			if (mlx5_rxq_get_type(dev, rss->queue[0]) !=
+			    MLX5_RXQ_TYPE_HAIRPIN)
+				return 0;
+			queue_action = 1;
+			action_n++;
+			break;
+		case RTE_FLOW_ACTION_TYPE_VXLAN_ENCAP:
+		case RTE_FLOW_ACTION_TYPE_NVGRE_ENCAP:
+			encap = 1;
+			action_n++;
+			break;
+		case RTE_FLOW_ACTION_TYPE_RAW_ENCAP:
+			raw_encap = actions->conf;
+			if (raw_encap->size >
+			    (sizeof(struct rte_flow_item_eth) +
+			     sizeof(struct rte_flow_item_ipv4)))
+				encap = 1;
+			action_n++;
+			break;
+		default:
+			action_n++;
+			break;
+		}
+	}
+	if (encap == 1 && queue_action)
+		return action_n;
+	return 0;
+}
+
+#define MLX5_MAX_SPLIT_ACTIONS 24
+#define MLX5_MAX_SPLIT_ITEMS 24
+
+/**
+ * Split the hairpin flow.
+ * Since HW can't support encap on Rx we move the encap to Tx.
+ * If the count action is after the encap then we also
+ * move the count action. In this case the count will also measure
+ * the outer bytes.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param[in] actions
+ *   Associated actions (list terminated by the END action).
+ * @param[out] actions_rx
+ *   Rx flow actions.
+ * @param[out] actions_tx
+ *   Tx flow actions.
+ * @param[out] pattern_tx
+ *   The pattern items for the Tx flow.
+ * @param[out] flow_id
+ *   The flow ID connected to this flow.
+ *
+ * @return
+ *   0 on success.
+ */
+static int
+flow_hairpin_split(struct rte_eth_dev *dev,
+		   const struct rte_flow_action actions[],
+		   struct rte_flow_action actions_rx[],
+		   struct rte_flow_action actions_tx[],
+		   struct rte_flow_item pattern_tx[],
+		   uint32_t *flow_id)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	const struct rte_flow_action_raw_encap *raw_encap;
+	const struct rte_flow_action_raw_decap *raw_decap;
+	struct mlx5_rte_flow_action_set_tag *set_tag;
+	struct rte_flow_action *tag_action;
+	struct mlx5_rte_flow_item_tag *tag_item;
+	struct rte_flow_item *item;
+	char *addr;
+	struct rte_flow_error error;
+	int encap = 0;
+
+	mlx5_flow_id_get(priv->sh->flow_id_pool, flow_id);
+	for (; actions->type != RTE_FLOW_ACTION_TYPE_END; actions++) {
+		switch (actions->type) {
+		case RTE_FLOW_ACTION_TYPE_VXLAN_ENCAP:
+		case RTE_FLOW_ACTION_TYPE_NVGRE_ENCAP:
+			rte_memcpy(actions_tx, actions,
+			       sizeof(struct rte_flow_action));
+			actions_tx++;
+			break;
+		case RTE_FLOW_ACTION_TYPE_COUNT:
+			if (encap) {
+				rte_memcpy(actions_tx, actions,
+					   sizeof(struct rte_flow_action));
+				actions_tx++;
+			} else {
+				rte_memcpy(actions_rx, actions,
+					   sizeof(struct rte_flow_action));
+				actions_rx++;
+			}
+			break;
+		case RTE_FLOW_ACTION_TYPE_RAW_ENCAP:
+			raw_encap = actions->conf;
+			if (raw_encap->size >
+			    (sizeof(struct rte_flow_item_eth) +
+			     sizeof(struct rte_flow_item_ipv4))) {
+				memcpy(actions_tx, actions,
+				       sizeof(struct rte_flow_action));
+				actions_tx++;
+				encap = 1;
+			} else {
+				rte_memcpy(actions_rx, actions,
+					   sizeof(struct rte_flow_action));
+				actions_rx++;
+			}
+			break;
+		case RTE_FLOW_ACTION_TYPE_RAW_DECAP:
+			raw_decap = actions->conf;
+			if (raw_decap->size <
+			    (sizeof(struct rte_flow_item_eth) +
+			     sizeof(struct rte_flow_item_ipv4))) {
+				memcpy(actions_tx, actions,
+				       sizeof(struct rte_flow_action));
+				actions_tx++;
+			} else {
+				rte_memcpy(actions_rx, actions,
+					   sizeof(struct rte_flow_action));
+				actions_rx++;
+			}
+			break;
+		default:
+			rte_memcpy(actions_rx, actions,
+				   sizeof(struct rte_flow_action));
+			actions_rx++;
+			break;
+		}
+	}
+	/* Add set meta action and end action for the Rx flow. */
+	tag_action = actions_rx;
+	tag_action->type = MLX5_RTE_FLOW_ACTION_TYPE_TAG;
+	actions_rx++;
+	rte_memcpy(actions_rx, actions, sizeof(struct rte_flow_action));
+	actions_rx++;
+	set_tag = (void *)actions_rx;
+	set_tag->id = flow_get_reg_id(dev, MLX5_HAIRPIN_RX, 0, &error);
+	set_tag->data = rte_cpu_to_be_32(*flow_id);
+	tag_action->conf = set_tag;
+	/* Create Tx item list. */
+	rte_memcpy(actions_tx, actions, sizeof(struct rte_flow_action));
+	addr = (void *)&pattern_tx[2];
+	item = pattern_tx;
+	item->type = MLX5_RTE_FLOW_ITEM_TYPE_TAG;
+	tag_item = (void *)addr;
+	tag_item->data = rte_cpu_to_be_32(*flow_id);
+	tag_item->id = flow_get_reg_id(dev, MLX5_HAIRPIN_TX, 0, &error);
+	item->spec = tag_item;
+	addr += sizeof(struct mlx5_rte_flow_item_tag);
+	tag_item = (void *)addr;
+	tag_item->data = UINT32_MAX;
+	tag_item->id = UINT16_MAX;
+	item->mask = tag_item;
+	addr += sizeof(struct mlx5_rte_flow_item_tag);
+	item->last = NULL;
+	item++;
+	item->type = RTE_FLOW_ITEM_TYPE_END;
+	return 0;
+}
+
+/**
  * Create a flow and add it to @p list.
  *
  * @param dev
@@ -2554,6 +2758,7 @@ uint32_t mlx5_flow_adjust_priority(struct rte_eth_dev *dev, int32_t priority,
 		 const struct rte_flow_action actions[],
 		 bool external, struct rte_flow_error *error)
 {
+	struct mlx5_priv *priv = dev->data->dev_private;
 	struct rte_flow *flow = NULL;
 	struct mlx5_flow *dev_flow;
 	const struct rte_flow_action_rss *rss;
@@ -2561,16 +2766,44 @@ uint32_t mlx5_flow_adjust_priority(struct rte_eth_dev *dev, int32_t priority,
 		struct rte_flow_expand_rss buf;
 		uint8_t buffer[2048];
 	} expand_buffer;
+	union {
+		struct rte_flow_action actions[MLX5_MAX_SPLIT_ACTIONS];
+		uint8_t buffer[2048];
+	} actions_rx;
+	union {
+		struct rte_flow_action actions[MLX5_MAX_SPLIT_ACTIONS];
+		uint8_t buffer[2048];
+	} actions_hairpin_tx;
+	union {
+		struct rte_flow_item items[MLX5_MAX_SPLIT_ITEMS];
+		uint8_t buffer[2048];
+	} items_tx;
 	struct rte_flow_expand_rss *buf = &expand_buffer.buf;
+	const struct rte_flow_action *p_actions_rx = actions;
 	int ret;
 	uint32_t i;
 	uint32_t flow_size;
+	int hairpin_flow = 0;
+	uint32_t hairpin_id = 0;
+	struct rte_flow_attr attr_tx = { .priority = 0 };
 
-	ret = flow_drv_validate(dev, attr, items, actions, external, error);
+	hairpin_flow = flow_check_hairpin_split(dev, attr, actions);
+	if (hairpin_flow > 0) {
+		if (hairpin_flow > MLX5_MAX_SPLIT_ACTIONS) {
+			rte_errno = EINVAL;
+			return NULL;
+		}
+		flow_hairpin_split(dev, actions, actions_rx.actions,
+				   actions_hairpin_tx.actions, items_tx.items,
+				   &hairpin_id);
+		p_actions_rx = actions_rx.actions;
+	}
+	ret = flow_drv_validate(dev, attr, items, p_actions_rx, external,
+				error);
 	if (ret < 0)
-		return NULL;
+		goto error_before_flow;
 	flow_size = sizeof(struct rte_flow);
-	rss = flow_get_rss_action(actions);
+	rss = flow_get_rss_action(p_actions_rx);
 	if (rss)
 		flow_size += RTE_ALIGN_CEIL(rss->queue_num * sizeof(uint16_t),
 					    sizeof(void *));
@@ -2579,11 +2812,13 @@ uint32_t mlx5_flow_adjust_priority(struct rte_eth_dev *dev, int32_t priority,
 	flow = rte_calloc(__func__, 1, flow_size, 0);
 	if (!flow) {
 		rte_errno = ENOMEM;
-		return NULL;
+		goto error_before_flow;
 	}
 	flow->drv_type = flow_get_drv_type(dev, attr);
 	flow->ingress = attr->ingress;
 	flow->transfer = attr->transfer;
+	if (hairpin_id != 0)
+		flow->hairpin_flow_id = hairpin_id;
 	assert(flow->drv_type > MLX5_FLOW_TYPE_MIN &&
 	       flow->drv_type < MLX5_FLOW_TYPE_MAX);
 	flow->queue = (void *)(flow + 1);
@@ -2604,7 +2839,7 @@ uint32_t mlx5_flow_adjust_priority(struct rte_eth_dev *dev, int32_t priority,
 	}
 	for (i = 0; i < buf->entries; ++i) {
 		dev_flow = flow_drv_prepare(flow, attr, buf->entry[i].pattern,
-					    actions, error);
+					    p_actions_rx, error);
 		if (!dev_flow)
 			goto error;
 		dev_flow->flow = flow;
@@ -2612,7 +2847,24 @@ uint32_t mlx5_flow_adjust_priority(struct rte_eth_dev *dev, int32_t priority,
 		LIST_INSERT_HEAD(&flow->dev_flows, dev_flow, next);
 		ret = flow_drv_translate(dev, dev_flow, attr,
 					 buf->entry[i].pattern,
-					 actions, error);
+					 p_actions_rx, error);
+		if (ret < 0)
+			goto error;
+	}
+	/* Create the tx flow. */
+	if (hairpin_flow) {
+		attr_tx.group = MLX5_HAIRPIN_TX_TABLE;
+		attr_tx.ingress = 0;
+		attr_tx.egress = 1;
+		dev_flow = flow_drv_prepare(flow, &attr_tx, items_tx.items,
+					    actions_hairpin_tx.actions, error);
+		if (!dev_flow)
+			goto error;
+		dev_flow->flow = flow;
+		LIST_INSERT_HEAD(&flow->dev_flows, dev_flow, next);
+		ret = flow_drv_translate(dev, dev_flow, &attr_tx,
+					 items_tx.items,
+					 actions_hairpin_tx.actions, error);
 		if (ret < 0)
 			goto error;
 	}
@@ -2624,8 +2876,16 @@ uint32_t mlx5_flow_adjust_priority(struct rte_eth_dev *dev, int32_t priority,
 	TAILQ_INSERT_TAIL(list, flow, next);
 	flow_rxq_flags_set(dev, flow);
 	return flow;
+error_before_flow:
+	if (hairpin_id)
+		mlx5_flow_id_release(priv->sh->flow_id_pool,
+				     hairpin_id);
+	return NULL;
 error:
 	ret = rte_errno; /* Save rte_errno before cleanup. */
+	if (flow->hairpin_flow_id)
+		mlx5_flow_id_release(priv->sh->flow_id_pool,
+				     flow->hairpin_flow_id);
 	assert(flow);
 	flow_drv_destroy(dev, flow);
 	rte_free(flow);
@@ -2715,12 +2975,17 @@ struct rte_flow *
 flow_list_destroy(struct rte_eth_dev *dev, struct mlx5_flows *list,
 		  struct rte_flow *flow)
 {
+	struct mlx5_priv *priv = dev->data->dev_private;
+
 	/*
 	 * Update RX queue flags only if port is started, otherwise it is
 	 * already clean.
 	 */
 	if (dev->data->dev_started)
 		flow_rxq_flags_trim(dev, flow);
+	if (flow->hairpin_flow_id)
+		mlx5_flow_id_release(priv->sh->flow_id_pool,
+				     flow->hairpin_flow_id);
 	flow_drv_destroy(dev, flow);
 	TAILQ_REMOVE(list, flow, next);
 	rte_free(flow->fdir);
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index f81e1b1..7559810 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -466,6 +466,8 @@ struct mlx5_flow {
 	struct rte_flow *flow; /**< Pointer to the main flow. */
 	uint64_t layers;
 	/**< Bit-fields of present layers, see MLX5_FLOW_LAYER_*. */
+	uint64_t actions;
+	/**< Bit-fields of detected actions, see MLX5_FLOW_ACTION_*. */
 	union {
 #ifdef HAVE_IBV_FLOW_DV_SUPPORT
 		struct mlx5_flow_dv dv;
@@ -487,12 +489,11 @@ struct rte_flow {
 	uint16_t (*queue)[]; /**< Destination queues to redirect traffic to. */
 	LIST_HEAD(dev_flows, mlx5_flow) dev_flows;
 	/**< Device flows that are part of the flow. */
-	uint64_t actions;
-	/**< Bit-fields of detected actions, see MLX5_FLOW_ACTION_*. */
 	struct mlx5_fdir *fdir; /**< Pointer to associated FDIR if any. */
 	uint8_t ingress; /**< 1 if the flow is ingress. */
 	uint32_t group; /**< The group index. */
 	uint8_t transfer; /**< 1 if the flow is E-Switch flow. */
+	uint32_t hairpin_flow_id; /**< The flow id used for hairpin. */
 };
 
 typedef int (*mlx5_flow_validate_t)(struct rte_eth_dev *dev,
@@ -536,15 +537,6 @@ struct mlx5_flow_driver_ops {
 #define MLX5_CNT_CONTAINER_UNUSED(sh, batch, thread) (&(sh)->cmng.ccont \
 	[(~((sh)->cmng.mhi[batch] >> (thread)) & 0x1) * 2 + (batch)])
 
-/* ID generation structure. */
-struct mlx5_flow_id_pool {
-	uint32_t *free_arr; /**< Pointer to the a array of free values. */
-	uint32_t base_index;
-	/**< The next index that can be used without any free elements. */
-	uint32_t *curr; /**< Pointer to the index to pop. */
-	uint32_t *last; /**< Pointer to the last element in the empty arrray. */
-};
-
 /* mlx5_flow.c */
 
 struct mlx5_flow_id_pool *mlx5_flow_id_pool_alloc(void);
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index 4d881bb..691420c 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -5842,7 +5842,7 @@ struct field_modify_info modify_tcp[] = {
 			modify_action_position = actions_n++;
 	}
 	dev_flow->dv.actions_n = actions_n;
-	flow->actions = action_flags;
+	dev_flow->actions = action_flags;
 	for (; items->type != RTE_FLOW_ITEM_TYPE_END; items++) {
 		int tunnel = !!(item_flags & MLX5_FLOW_LAYER_TUNNEL);
 		int item_type = items->type;
@@ -6069,7 +6069,7 @@ struct field_modify_info modify_tcp[] = {
 	LIST_FOREACH(dev_flow, &flow->dev_flows, next) {
 		dv = &dev_flow->dv;
 		n = dv->actions_n;
-		if (flow->actions & MLX5_FLOW_ACTION_DROP) {
+		if (dev_flow->actions & MLX5_FLOW_ACTION_DROP) {
 			if (flow->transfer) {
 				dv->actions[n++] = priv->sh->esw_drop_action;
 			} else {
@@ -6084,7 +6084,7 @@ struct field_modify_info modify_tcp[] = {
 				}
 				dv->actions[n++] = dv->hrxq->action;
 			}
-		} else if (flow->actions &
+		} else if (dev_flow->actions &
 			   (MLX5_FLOW_ACTION_QUEUE | MLX5_FLOW_ACTION_RSS)) {
 			struct mlx5_hrxq *hrxq;
 
@@ -6140,7 +6140,7 @@ struct field_modify_info modify_tcp[] = {
 	LIST_FOREACH(dev_flow, &flow->dev_flows, next) {
 		struct mlx5_flow_dv *dv = &dev_flow->dv;
 		if (dv->hrxq) {
-			if (flow->actions & MLX5_FLOW_ACTION_DROP)
+			if (dev_flow->actions & MLX5_FLOW_ACTION_DROP)
 				mlx5_hrxq_drop_release(dev);
 			else
 				mlx5_hrxq_release(dev, dv->hrxq);
@@ -6374,7 +6374,7 @@ struct field_modify_info modify_tcp[] = {
 			dv->flow = NULL;
 		}
 		if (dv->hrxq) {
-			if (flow->actions & MLX5_FLOW_ACTION_DROP)
+			if (dev_flow->actions & MLX5_FLOW_ACTION_DROP)
 				mlx5_hrxq_drop_release(dev);
 			else
 				mlx5_hrxq_release(dev, dv->hrxq);
diff --git a/drivers/net/mlx5/mlx5_flow_verbs.c b/drivers/net/mlx5/mlx5_flow_verbs.c
index 23110f2..fd27f6c 100644
--- a/drivers/net/mlx5/mlx5_flow_verbs.c
+++ b/drivers/net/mlx5/mlx5_flow_verbs.c
@@ -191,7 +191,7 @@
 {
 #if defined(HAVE_IBV_DEVICE_COUNTERS_SET_V42) || \
 	defined(HAVE_IBV_DEVICE_COUNTERS_SET_V45)
-	if (flow->actions & MLX5_FLOW_ACTION_COUNT) {
+	if (flow->counter && flow->counter->cs) {
 		struct rte_flow_query_count *qc = data;
 		uint64_t counters[2] = {0, 0};
 #if defined(HAVE_IBV_DEVICE_COUNTERS_SET_V42)
@@ -1410,7 +1410,6 @@
 		     const struct rte_flow_action actions[],
 		     struct rte_flow_error *error)
 {
-	struct rte_flow *flow = dev_flow->flow;
 	uint64_t item_flags = 0;
 	uint64_t action_flags = 0;
 	uint64_t priority = attr->priority;
@@ -1460,7 +1459,7 @@
 						  "action not supported");
 		}
 	}
-	flow->actions = action_flags;
+	dev_flow->actions = action_flags;
 	for (; items->type != RTE_FLOW_ITEM_TYPE_END; items++) {
 		int tunnel = !!(item_flags & MLX5_FLOW_LAYER_TUNNEL);
 
@@ -1592,7 +1591,7 @@
 			verbs->flow = NULL;
 		}
 		if (verbs->hrxq) {
-			if (flow->actions & MLX5_FLOW_ACTION_DROP)
+			if (dev_flow->actions & MLX5_FLOW_ACTION_DROP)
 				mlx5_hrxq_drop_release(dev);
 			else
 				mlx5_hrxq_release(dev, verbs->hrxq);
@@ -1656,7 +1655,7 @@
 
 	LIST_FOREACH(dev_flow, &flow->dev_flows, next) {
 		verbs = &dev_flow->verbs;
-		if (flow->actions & MLX5_FLOW_ACTION_DROP) {
+		if (dev_flow->actions & MLX5_FLOW_ACTION_DROP) {
 			verbs->hrxq = mlx5_hrxq_drop_new(dev);
 			if (!verbs->hrxq) {
 				rte_flow_error_set
@@ -1717,7 +1716,7 @@
 	LIST_FOREACH(dev_flow, &flow->dev_flows, next) {
 		verbs = &dev_flow->verbs;
 		if (verbs->hrxq) {
-			if (flow->actions & MLX5_FLOW_ACTION_DROP)
+			if (dev_flow->actions & MLX5_FLOW_ACTION_DROP)
 				mlx5_hrxq_drop_release(dev);
 			else
 				mlx5_hrxq_release(dev, verbs->hrxq);
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index 2c3d5eb..24d0eaa 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -2097,6 +2097,32 @@ struct mlx5_rxq_ctrl *
 }
 
 /**
+ * Get the Rx queue type.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param idx
+ *   Rx queue index.
+ *
+ * @return
+ *   The Rx queue type.
+ */
+enum mlx5_rxq_type
+mlx5_rxq_get_type(struct rte_eth_dev *dev, uint16_t idx)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_rxq_ctrl *rxq_ctrl = NULL;
+
+	if ((*priv->rxqs)[idx]) {
+		rxq_ctrl = container_of((*priv->rxqs)[idx],
+					struct mlx5_rxq_ctrl,
+					rxq);
+		return rxq_ctrl->type;
+	}
+	return MLX5_RXQ_TYPE_UNDEFINED;
+}
+
+/**
  * Create an indirection table.
  *
  * @param dev
diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index 271b648..d4ba25f 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -166,6 +166,7 @@ enum mlx5_rxq_obj_type {
 enum mlx5_rxq_type {
 	MLX5_RXQ_TYPE_STANDARD, /* Standard Rx queue. */
 	MLX5_RXQ_TYPE_HAIRPIN, /* Hairpin Rx queue. */
+	MLX5_RXQ_TYPE_UNDEFINED,
 };
 
 /* Verbs/DevX Rx queue elements. */
@@ -406,6 +407,7 @@ struct mlx5_hrxq *mlx5_hrxq_get(struct rte_eth_dev *dev,
 				const uint16_t *queues, uint32_t queues_n);
 int mlx5_hrxq_release(struct rte_eth_dev *dev, struct mlx5_hrxq *hxrq);
 int mlx5_hrxq_verify(struct rte_eth_dev *dev);
+enum mlx5_rxq_type mlx5_rxq_get_type(struct rte_eth_dev *dev, uint16_t idx);
 struct mlx5_hrxq *mlx5_hrxq_drop_new(struct rte_eth_dev *dev);
 void mlx5_hrxq_drop_release(struct rte_eth_dev *dev);
 uint64_t mlx5_get_rx_port_offloads(void);
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 186+ messages in thread

* [dpdk-dev] [PATCH v5 15/15] doc: add hairpin feature
  2019-10-23 13:37 ` [dpdk-dev] [PATCH v5 " Ori Kam
                     ` (13 preceding siblings ...)
  2019-10-23 13:37   ` [dpdk-dev] [PATCH v5 14/15] net/mlx5: split hairpin flows Ori Kam
@ 2019-10-23 13:37   ` Ori Kam
  2019-10-24  8:11     ` Thomas Monjalon
  2019-10-25 18:49   ` [dpdk-dev] [PATCH v5 00/15] " Ferruh Yigit
  15 siblings, 1 reply; 186+ messages in thread
From: Ori Kam @ 2019-10-23 13:37 UTC (permalink / raw)
  To: John McNamara, Marko Kovacevic; +Cc: dev, orika, jingjing.wu, stephen

This commit adds the hairpin feature to the release notes.

Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 doc/guides/rel_notes/release_19_11.rst | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/doc/guides/rel_notes/release_19_11.rst b/doc/guides/rel_notes/release_19_11.rst
index d96d6fd..133ff86 100644
--- a/doc/guides/rel_notes/release_19_11.rst
+++ b/doc/guides/rel_notes/release_19_11.rst
@@ -138,6 +138,7 @@ New Features
   * Added support for VLAN set PCP offload command.
   * Added support for VLAN set VID offload command.
   * Added support for matching on packets with the Geneve tunnel header.
+  * Added hairpin support.
 
 * **Updated the AF_XDP PMD.**
 
@@ -184,6 +185,11 @@ New Features
   * Added a console command to testpmd app, ``show port (port_id) ptypes`` which
     gives ability to print port supported ptypes in different protocol layers.
 
+* **Added hairpin queue.**
+
+  On supported NICs, we can now set up hairpin queues which offload packets
+  from the wire back to the wire.
+
 
 Removed Items
 -------------
@@ -377,3 +383,4 @@ Tested Platforms
    Also, make sure to start the actual text at the margin.
    =========================================================
 
+
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [dpdk-dev] [PATCH v5 02/15] ethdev: add support for hairpin queue
  2019-10-23 13:37   ` [dpdk-dev] [PATCH v5 02/15] ethdev: add support for hairpin queue Ori Kam
@ 2019-10-24  7:54     ` Andrew Rybchenko
  2019-10-24  8:29       ` Ori Kam
  0 siblings, 1 reply; 186+ messages in thread
From: Andrew Rybchenko @ 2019-10-24  7:54 UTC (permalink / raw)
  To: Ori Kam, Thomas Monjalon, Ferruh Yigit; +Cc: dev, jingjing.wu, stephen

Hi Ori,

thanks for applying the review notes. Please see below.

On 10/23/19 4:37 PM, Ori Kam wrote:
> This commit introduces a hairpin queue type.
>
> The hairpin queue is built from an Rx queue bound to a Tx queue.
> It is used to offload traffic coming from the wire and redirect it back
> to the wire.
>
> There are 3 new functions:
> - rte_eth_dev_hairpin_capability_get
> - rte_eth_rx_hairpin_queue_setup
> - rte_eth_tx_hairpin_queue_setup
>
> In order to use the queue, there is a need to create rte_flow
> with queue / RSS action that targets one or more of the Rx queues.
>
> Signed-off-by: Ori Kam <orika@mellanox.com>

Just a bit below
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>

[snip]

> diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
> index 78da293..199e96e 100644
> --- a/lib/librte_ethdev/rte_ethdev.c
> +++ b/lib/librte_ethdev/rte_ethdev.c
> @@ -923,6 +923,13 @@ struct rte_eth_dev *
>   
>   	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_start, -ENOTSUP);
>   
> +	if (rte_eth_dev_is_rx_hairpin_queue(dev, rx_queue_id) == 1) {

I think the function should return bool and its result should be
used as is, without == 1 or anything like this.
Everywhere in the patch.

[snip]

> diff --git a/lib/librte_ethdev/rte_ethdev_driver.h b/lib/librte_ethdev/rte_ethdev_driver.h
> index c404f17..98023d7 100644
> --- a/lib/librte_ethdev/rte_ethdev_driver.h
> +++ b/lib/librte_ethdev/rte_ethdev_driver.h
> @@ -26,6 +26,50 @@
>    */
>   #define RTE_ETH_QUEUE_STATE_STOPPED 0
>   #define RTE_ETH_QUEUE_STATE_STARTED 1
> +#define RTE_ETH_QUEUE_STATE_HAIRPIN 2
> +
> +/**
> + * @internal
> + * Check if the selected Rx queue is hairpin queue.
> + *
> + * @param dev
> + *  Pointer to the selected device.
> + * @param queue_id
> + *  The selected queue.
> + *
> + * @return
> + *   - (1) if the queue is hairpin queue, 0 otherwise.
> + */
> +static inline int

I think the function should return bool (and stdbool.h should be included).
Return value description should be updated.

> +rte_eth_dev_is_rx_hairpin_queue(struct rte_eth_dev *dev, uint16_t queue_id)
> +{
> +	if (dev->data->rx_queue_state[queue_id] ==
> +	    RTE_ETH_QUEUE_STATE_HAIRPIN)
> +		return 1;
> +	return 0;
> +}
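
I.e. something like the following (just a sketch of the suggested bool
variant, not text from the patch; it assumes the header includes
stdbool.h):

	static inline bool
	rte_eth_dev_is_rx_hairpin_queue(struct rte_eth_dev *dev,
					uint16_t queue_id)
	{
		return dev->data->rx_queue_state[queue_id] ==
		       RTE_ETH_QUEUE_STATE_HAIRPIN;
	}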
> +
> +
> +/**
> + * @internal
> + * Check if the selected Tx queue is hairpin queue.
> + *
> + * @param dev
> + *  Pointer to the selected device.
> + * @param queue_id
> + *  The selected queue.
> + *
> + * @return
> + *   - (1) if the queue is hairpin queue, 0 otherwise.
> + */
> +static inline int

Same here.

> +rte_eth_dev_is_tx_hairpin_queue(struct rte_eth_dev *dev, uint16_t queue_id)
> +{
> +	if (dev->data->tx_queue_state[queue_id] ==
> +	    RTE_ETH_QUEUE_STATE_HAIRPIN)
> +		return 1;
> +	return 0;
> +}
>   
>   /**
>    * @internal

[snip]


^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [dpdk-dev] [PATCH v5 15/15] doc: add hairpin feature
  2019-10-23 13:37   ` [dpdk-dev] [PATCH v5 15/15] doc: add hairpin feature Ori Kam
@ 2019-10-24  8:11     ` Thomas Monjalon
  0 siblings, 0 replies; 186+ messages in thread
From: Thomas Monjalon @ 2019-10-24  8:11 UTC (permalink / raw)
  To: Ori Kam; +Cc: dev, John McNamara, Marko Kovacevic, jingjing.wu, stephen

I forgot to comment on this patch.
Ferruh, can you please address it when merging?

23/10/2019 15:37, Ori Kam:
> --- a/doc/guides/rel_notes/release_19_11.rst
> +++ b/doc/guides/rel_notes/release_19_11.rst
> @@ -138,6 +138,7 @@ New Features
>    * Added support for VLAN set PCP offload command.
>    * Added support for VLAN set VID offload command.
>    * Added support for matching on packets with the Geneve tunnel header.
> +  * Added hairpin support.

It could be merged in a mlx5 patch, maybe patch 7 about capabilities.

> @@ -184,6 +185,11 @@ New Features
> +* **Added hairpin queue.**
> +
> +  On supported NICs, we can now set up hairpin queues which offload packets
> +  from the wire back to the wire.

It should be merged in the first patch (ethdev).



^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [dpdk-dev] [PATCH v5 02/15] ethdev: add support for hairpin queue
  2019-10-24  7:54     ` Andrew Rybchenko
@ 2019-10-24  8:29       ` Ori Kam
  2019-10-24 14:47         ` Andrew Rybchenko
  0 siblings, 1 reply; 186+ messages in thread
From: Ori Kam @ 2019-10-24  8:29 UTC (permalink / raw)
  To: Andrew Rybchenko, Thomas Monjalon, Ferruh Yigit; +Cc: dev, jingjing.wu, stephen

Hi Andrew,

When writing the new function I thought about using bool, but
I decided against it for the following reasons:
1. There is no use of bool anywhere in the code, and there is no special reason to add it now.
2. Other functions of this kind already return int, for example (rte_eth_dev_is_valid_port / rte_eth_is_valid_owner_id).

Thanks,
Ori

> -----Original Message-----
> From: Andrew Rybchenko <arybchenko@solarflare.com>
> Sent: Thursday, October 24, 2019 10:55 AM
> To: Ori Kam <orika@mellanox.com>; Thomas Monjalon
> <thomas@monjalon.net>; Ferruh Yigit <ferruh.yigit@intel.com>
> Cc: dev@dpdk.org; jingjing.wu@intel.com; stephen@networkplumber.org
> Subject: Re: [dpdk-dev] [PATCH v5 02/15] ethdev: add support for hairpin queue
> 
> Hi Ori,
> 
> thanks for review notes applied. Please, see below.
> 
> On 10/23/19 4:37 PM, Ori Kam wrote:
> > This commit introduces a hairpin queue type.
> >
> > The hairpin queue is built from an Rx queue bound to a Tx queue.
> > It is used to offload traffic coming from the wire and redirect it back
> > to the wire.
> >
> > There are 3 new functions:
> > - rte_eth_dev_hairpin_capability_get
> > - rte_eth_rx_hairpin_queue_setup
> > - rte_eth_tx_hairpin_queue_setup
> >
> > In order to use the queue, there is a need to create rte_flow
> > with queue / RSS action that targets one or more of the Rx queues.
> >
> > Signed-off-by: Ori Kam <orika@mellanox.com>
> 
> Just a bit below
> Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
> 
> [snip]
> 
> > diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
> > index 78da293..199e96e 100644
> > --- a/lib/librte_ethdev/rte_ethdev.c
> > +++ b/lib/librte_ethdev/rte_ethdev.c
> > @@ -923,6 +923,13 @@ struct rte_eth_dev *
> >
> >   	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_start, -
> ENOTSUP);
> >
> > +	if (rte_eth_dev_is_rx_hairpin_queue(dev, rx_queue_id) == 1) {
> 
> I think the function should return bool and its result should be
> used as is, without == 1 or anything like this.
> Everywhere in the patch.
> 
> [snip]
> 
> > diff --git a/lib/librte_ethdev/rte_ethdev_driver.h
> b/lib/librte_ethdev/rte_ethdev_driver.h
> > index c404f17..98023d7 100644
> > --- a/lib/librte_ethdev/rte_ethdev_driver.h
> > +++ b/lib/librte_ethdev/rte_ethdev_driver.h
> > @@ -26,6 +26,50 @@
> >    */
> >   #define RTE_ETH_QUEUE_STATE_STOPPED 0
> >   #define RTE_ETH_QUEUE_STATE_STARTED 1
> > +#define RTE_ETH_QUEUE_STATE_HAIRPIN 2
> > +
> > +/**
> > + * @internal
> > + * Check if the selected Rx queue is hairpin queue.
> > + *
> > + * @param dev
> > + *  Pointer to the selected device.
> > + * @param queue_id
> > + *  The selected queue.
> > + *
> > + * @return
> > + *   - (1) if the queue is hairpin queue, 0 otherwise.
> > + */
> > +static inline int
> 
> I think the function should return bool (and stdbool.h should be included).
> Return value description should be updated.
> 
> > +rte_eth_dev_is_rx_hairpin_queue(struct rte_eth_dev *dev, uint16_t
> queue_id)
> > +{
> > +	if (dev->data->rx_queue_state[queue_id] ==
> > +	    RTE_ETH_QUEUE_STATE_HAIRPIN)
> > +		return 1;
> > +	return 0;
> > +}
> > +
> > +
> > +/**
> > + * @internal
> > + * Check if the selected Tx queue is hairpin queue.
> > + *
> > + * @param dev
> > + *  Pointer to the selected device.
> > + * @param queue_id
> > + *  The selected queue.
> > + *
> > + * @return
> > + *   - (1) if the queue is hairpin queue, 0 otherwise.
> > + */
> > +static inline int
> 
> Same here.
> 
> > +rte_eth_dev_is_tx_hairpin_queue(struct rte_eth_dev *dev, uint16_t
> queue_id)
> > +{
> > +	if (dev->data->tx_queue_state[queue_id] ==
> > +	    RTE_ETH_QUEUE_STATE_HAIRPIN)
> > +		return 1;
> > +	return 0;
> > +}
> >
> >   /**
> >    * @internal
> 
> [snip]


^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [dpdk-dev] [PATCH v5 02/15] ethdev: add support for hairpin queue
  2019-10-24  8:29       ` Ori Kam
@ 2019-10-24 14:47         ` Andrew Rybchenko
  2019-10-24 15:17           ` Thomas Monjalon
  0 siblings, 1 reply; 186+ messages in thread
From: Andrew Rybchenko @ 2019-10-24 14:47 UTC (permalink / raw)
  To: Ori Kam, Thomas Monjalon, Ferruh Yigit; +Cc: dev, jingjing.wu, stephen

On 10/24/19 11:29 AM, Ori Kam wrote:
> Hi Andrew,
>
> When writing the new function I thought about using bool, but
> I decided against it for the following reasons:
> 1. There is no use of bool anywhere in the code, and there is no special reason to add it now.

rte_ethdev.c includes stdbool.h and uses bool

> 2. Other functions of this kind already return int, for example (rte_eth_dev_is_valid_port / rte_eth_is_valid_owner_id).
>
> Thanks,
> Ori
>
>> -----Original Message-----
>> From: Andrew Rybchenko <arybchenko@solarflare.com>
>> Sent: Thursday, October 24, 2019 10:55 AM
>> To: Ori Kam <orika@mellanox.com>; Thomas Monjalon
>> <thomas@monjalon.net>; Ferruh Yigit <ferruh.yigit@intel.com>
>> Cc: dev@dpdk.org; jingjing.wu@intel.com; stephen@networkplumber.org
>> Subject: Re: [dpdk-dev] [PATCH v5 02/15] ethdev: add support for hairpin queue
>>
>> Hi Ori,
>>
>> thanks for applying the review notes. Please see below.
>>
>> On 10/23/19 4:37 PM, Ori Kam wrote:
>>> This commit introduces a hairpin queue type.
>>>
>>> The hairpin queue is built from an Rx queue bound to a Tx queue.
>>> It is used to offload traffic coming from the wire and redirect it back
>>> to the wire.
>>>
>>> There are 3 new functions:
>>> - rte_eth_dev_hairpin_capability_get
>>> - rte_eth_rx_hairpin_queue_setup
>>> - rte_eth_tx_hairpin_queue_setup
>>>
>>> In order to use the queue, there is a need to create rte_flow
>>> with queue / RSS action that targets one or more of the Rx queues.
>>>
>>> Signed-off-by: Ori Kam <orika@mellanox.com>
>> Just a bit below
>> Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
>>
>> [snip]
>>
>>> diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
>>> index 78da293..199e96e 100644
>>> --- a/lib/librte_ethdev/rte_ethdev.c
>>> +++ b/lib/librte_ethdev/rte_ethdev.c
>>> @@ -923,6 +923,13 @@ struct rte_eth_dev *
>>>
>>>    	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_start, -
>> ENOTSUP);
>>> +	if (rte_eth_dev_is_rx_hairpin_queue(dev, rx_queue_id) == 1) {
>> I think the function should return bool and its result should be
>> used as is, without == 1 or anything like this.
>> Everywhere in the patch.
>>
>> [snip]
>>
>>> diff --git a/lib/librte_ethdev/rte_ethdev_driver.h
>> b/lib/librte_ethdev/rte_ethdev_driver.h
>>> index c404f17..98023d7 100644
>>> --- a/lib/librte_ethdev/rte_ethdev_driver.h
>>> +++ b/lib/librte_ethdev/rte_ethdev_driver.h
>>> @@ -26,6 +26,50 @@
>>>     */
>>>    #define RTE_ETH_QUEUE_STATE_STOPPED 0
>>>    #define RTE_ETH_QUEUE_STATE_STARTED 1
>>> +#define RTE_ETH_QUEUE_STATE_HAIRPIN 2
>>> +
>>> +/**
>>> + * @internal
>>> + * Check if the selected Rx queue is hairpin queue.
>>> + *
>>> + * @param dev
>>> + *  Pointer to the selected device.
>>> + * @param queue_id
>>> + *  The selected queue.
>>> + *
>>> + * @return
>>> + *   - (1) if the queue is hairpin queue, 0 otherwise.
>>> + */
>>> +static inline int
>> I think the function should return bool (and stdbool.h should be included).
>> Return value description should be updated.
>>
>>> +rte_eth_dev_is_rx_hairpin_queue(struct rte_eth_dev *dev, uint16_t
>> queue_id)
>>> +{
>>> +	if (dev->data->rx_queue_state[queue_id] ==
>>> +	    RTE_ETH_QUEUE_STATE_HAIRPIN)
>>> +		return 1;
>>> +	return 0;
>>> +}
>>> +
>>> +
>>> +/**
>>> + * @internal
>>> + * Check if the selected Tx queue is hairpin queue.
>>> + *
>>> + * @param dev
>>> + *  Pointer to the selected device.
>>> + * @param queue_id
>>> + *  The selected queue.
>>> + *
>>> + * @return
>>> + *   - (1) if the queue is hairpin queue, 0 otherwise.
>>> + */
>>> +static inline int
>> Same here.
>>
>>> +rte_eth_dev_is_tx_hairpin_queue(struct rte_eth_dev *dev, uint16_t
>> queue_id)
>>> +{
>>> +	if (dev->data->tx_queue_state[queue_id] ==
>>> +	    RTE_ETH_QUEUE_STATE_HAIRPIN)
>>> +		return 1;
>>> +	return 0;
>>> +}
>>>
>>>    /**
>>>     * @internal
>> [snip]


^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [dpdk-dev] [PATCH v5 02/15] ethdev: add support for hairpin queue
  2019-10-24 14:47         ` Andrew Rybchenko
@ 2019-10-24 15:17           ` Thomas Monjalon
  2019-10-24 15:30             ` Andrew Rybchenko
  0 siblings, 1 reply; 186+ messages in thread
From: Thomas Monjalon @ 2019-10-24 15:17 UTC (permalink / raw)
  To: Andrew Rybchenko; +Cc: dev, Ori Kam, Ferruh Yigit, jingjing.wu, stephen

24/10/2019 16:47, Andrew Rybchenko:
> On 10/24/19 11:29 AM, Ori Kam wrote:
> > Hi Andrew,
> >
> > When writing the new function I thought about using bool, but
> > I decided against it for the following reasons:
> > 1. There is no use of bool anywhere in the code, and there is no special reason to add it now.
> 
> rte_ethdev.c includes stdbool.h and uses bool
> 
> > 2. Other functions of this kind already return int, for example (rte_eth_dev_is_valid_port / rte_eth_is_valid_owner_id).

I agree with Ori here for 2 reasons:
1. It is better to be consistent in the API
2. I remember having some issues with some drivers when introducing stdbool in the API.

I think it may be nice to convert all such API to bool in one patch,
and check if there are some remaining issues with bool usage in drivers or with PPC.
But I suggest to do such API change in DPDK 20.11.



^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [dpdk-dev] [PATCH v5 02/15] ethdev: add support for hairpin queue
  2019-10-24 15:17           ` Thomas Monjalon
@ 2019-10-24 15:30             ` Andrew Rybchenko
  2019-10-24 15:34               ` Thomas Monjalon
  0 siblings, 1 reply; 186+ messages in thread
From: Andrew Rybchenko @ 2019-10-24 15:30 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev, Ori Kam, Ferruh Yigit, jingjing.wu, stephen

On 10/24/19 6:17 PM, Thomas Monjalon wrote:
> 24/10/2019 16:47, Andrew Rybchenko:
>> On 10/24/19 11:29 AM, Ori Kam wrote:
>>> Hi Andrew,
>>>
>>> When writing the new function I thought about using bool, but
>>> I decided against it for the following reasons:
>>> 1. There is no use of bool anywhere in the code, and there is no special reason to add it now.
>> rte_ethdev.c includes stdbool.h and uses bool
>>
>>> 2. Other functions of this kind already return int, for example (rte_eth_dev_is_valid_port / rte_eth_is_valid_owner_id).
> I agree with Ori here for 2 reasons:
> 1. It is better to be consistent in the API
> 2. I remember having some issues with some drivers when introducing stdbool in the API.
>
> I think it may be nice to convert all such API to bool in one patch,
> and check if there are some remaining issues with bool usage in drivers or with PPC.
> But I suggest to do such API change in DPDK 20.11.

OK, no problem. Does it still let us avoid the == 1 comparison? Just to
avoid changes in these lines in the future.



^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [dpdk-dev] [PATCH v5 02/15] ethdev: add support for hairpin queue
  2019-10-24 15:30             ` Andrew Rybchenko
@ 2019-10-24 15:34               ` Thomas Monjalon
  2019-10-25 19:01                 ` Ori Kam
  0 siblings, 1 reply; 186+ messages in thread
From: Thomas Monjalon @ 2019-10-24 15:34 UTC (permalink / raw)
  To: Andrew Rybchenko; +Cc: dev, Ori Kam, Ferruh Yigit, jingjing.wu, stephen

24/10/2019 17:30, Andrew Rybchenko:
> On 10/24/19 6:17 PM, Thomas Monjalon wrote:
> > 24/10/2019 16:47, Andrew Rybchenko:
> >> On 10/24/19 11:29 AM, Ori Kam wrote:
> >>> Hi Andrew,
> >>>
> >>> When writing the new function I thought about using bool, but
> >>> I decided against it for the following reasons:
> >>> 1. There is no use of bool anywhere in the code, and there is no special reason to add it now.
> >> rte_ethdev.c includes stdbool.h and uses bool
> >>
> >>> 2. Other functions of this kind already return int, for example (rte_eth_dev_is_valid_port / rte_eth_is_valid_owner_id).
> > I agree with Ori here for 2 reasons:
> > 1. It is better to be consistent in the API
> > 2. I remember having some issues with some drivers when introducing stdbool in the API.
> >
> > I think it may be nice to convert all such API to bool in one patch,
> > and check if there are some remaining issues with bool usage in drivers or with PPC.
> > But I suggest to do such API change in DPDK 20.11.
> 
> OK, no problem. Does it still let us avoid the == 1 comparison? Just to
> avoid changes in these lines in the future.

Yes probably better to avoid explicit comparison, but prefer boolean operator (!).




^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [dpdk-dev] [PATCH v5 00/15] add hairpin feature
  2019-10-23 13:37 ` [dpdk-dev] [PATCH v5 " Ori Kam
                     ` (14 preceding siblings ...)
  2019-10-23 13:37   ` [dpdk-dev] [PATCH v5 15/15] doc: add hairpin feature Ori Kam
@ 2019-10-25 18:49   ` Ferruh Yigit
  15 siblings, 0 replies; 186+ messages in thread
From: Ferruh Yigit @ 2019-10-25 18:49 UTC (permalink / raw)
  To: Ori Kam
  Cc: dev, jingjing.wu, stephen, wenzhuo.lu, bernard.iremonger, thomas,
	arybchenko, viacheslavo, Rastislav Cernay, Jan Remes

On 10/23/2019 2:37 PM, Ori Kam wrote:
> This patch set implements the hairpin feature.
> The hairpin feature was introduced in RFC[1]
> 
> The hairpin feature (a different name could be forwarding) acts as a "bump on
> the wire", meaning that a packet that is received from the wire can be modified
> using offloaded actions and then sent back to the wire without application
> intervention, which saves CPU cycles.
> 
> The hairpin is the inverse of loopback, in which the application
> sends a packet and then receives it again
> without the packet being sent to the wire.
> 
> The hairpin can be used by a number of different VNFs, for example load
> balancers, gateways and so on.
> 
> As can be seen from the hairpin description, a hairpin is basically an RX
> queue connected to a TX queue.
> 
> During the design phase I was thinking of two ways to implement this
> feature: the first one is adding a new rte_flow action, and the second
> one is creating a special kind of queue.
> 
> The advantages of using the queue approach:
> 1. More control for the application over the queue depth (the memory size
> that should be used).
> 2. Enables QoS. QoS is normally a parameter of a queue, so in this approach
> it will be easy to integrate with such systems.
> 3. Native integration with the rte_flow API. Just setting the target
> queue/RSS to a hairpin queue will result in the traffic being routed
> to the hairpin queue.
> 4. Enables queue offloading.
> 
> Each hairpin Rxq can be connected to one Txq or a number of Txqs, which can
> belong to different ports if the PMD supports it. The same goes the other
> way: each hairpin Txq can be connected to one or more Rxqs.
> This is the reason that both the Txq setup and the Rxq setup take the
> hairpin configuration structure.
> 
> From the PMD perspective the number of Rxqs/Txqs is the total of standard
> queues + hairpin queues.
> 
> To configure a hairpin queue the user should call
> rte_eth_rx_hairpin_queue_setup / rte_eth_tx_hairpin_queue_setup instead
> of the normal queue setup functions.
> 
> The hairpin queues are not part of the normal RSS functionality.
> 
> To use the queues the user simply creates a flow that points to RSS/queue
> actions that are hairpin queues.
> The reasons for selecting 2 new functions for hairpin queue setup are:
> 1. avoid API break.
> 2. avoid extra and unused parameters.
> 
> 
> 
> [1] https://inbox.dpdk.org/dev/1565703468-55617-1-git-send-email-orika@mellanox.com/
> 
> Cc: wenzhuo.lu@intel.com
> Cc: bernard.iremonger@intel.com
> Cc: thomas@monjalon.net
> Cc: ferruh.yigit@intel.com
> Cc: arybchenko@solarflare.com
> Cc: viacheslavo@mellanox.com
> 
> ------
> V5:
>  - modify log messages to be more distinct.
>  - keep log messages on a single line even if longer than 80 characters.
>  - change peer_n to peer_count.
>  - add functions to get if queue is hairpin queue.
> 
> V4:
>  - update according to comments from ML.
> 
> V3:
>  - update according to comments from ML.
> 
> V2:
>  - update according to comments from ML.
> 
> Ori Kam (15):
>   ethdev: move queue state defines to private file
>   ethdev: add support for hairpin queue
>   net/mlx5: query hca hairpin capabilities
>   net/mlx5: support Rx hairpin queues
>   net/mlx5: prepare txq to work with different types
>   net/mlx5: support Tx hairpin queues
>   net/mlx5: add get hairpin capabilities
>   app/testpmd: add hairpin support
>   net/mlx5: add hairpin binding function
>   net/mlx5: add support for hairpin hrxq
>   net/mlx5: add internal tag item and action
>   net/mlx5: add id generation function
>   net/mlx5: add default flows for hairpin
>   net/mlx5: split hairpin flows
>   doc: add hairpin feature

nfb is causing a build error because of 'RTE_ETH_QUEUE_STATE_STARTED':

.../drivers/net/nfb/nfb_rx.c(66): error: identifier
"RTE_ETH_QUEUE_STATE_STARTED" is undefined
        dev->data->rx_queue_state[rxq_id] = RTE_ETH_QUEUE_STATE_STARTED;
                                            ^
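
Presumably the driver now needs the internal ethdev header, since the
queue state defines moved there in patch 01. A one-line sketch of the
kind of fix needed (assuming nfb_rx.c):

	#include <rte_ethdev_driver.h>	/* RTE_ETH_QUEUE_STATE_* */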


^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [dpdk-dev] [PATCH v5 02/15] ethdev: add support for hairpin queue
  2019-10-24 15:34               ` Thomas Monjalon
@ 2019-10-25 19:01                 ` Ori Kam
  2019-10-25 22:16                   ` Thomas Monjalon
  0 siblings, 1 reply; 186+ messages in thread
From: Ori Kam @ 2019-10-25 19:01 UTC (permalink / raw)
  To: Thomas Monjalon, Andrew Rybchenko; +Cc: dev, Ferruh Yigit, jingjing.wu, stephen

Hi Andrew, Thomas

> -----Original Message-----
> From: Thomas Monjalon <thomas@monjalon.net>
> Sent: Thursday, October 24, 2019 6:35 PM
> To: Andrew Rybchenko <arybchenko@solarflare.com>
> Cc: dev@dpdk.org; Ori Kam <orika@mellanox.com>; Ferruh Yigit
> <ferruh.yigit@intel.com>; jingjing.wu@intel.com;
> stephen@networkplumber.org
> Subject: Re: [dpdk-dev] [PATCH v5 02/15] ethdev: add support for hairpin queue
> 
> 24/10/2019 17:30, Andrew Rybchenko:
> > On 10/24/19 6:17 PM, Thomas Monjalon wrote:
> > > 24/10/2019 16:47, Andrew Rybchenko:
> > >> On 10/24/19 11:29 AM, Ori Kam wrote:
> > >>> Hi Andrew,
> > >>>
> > >>> When writing the new function I thought about using bool, but
> > >>> I decided against it for the following reasons:
> > >>> 1. There is no use of bool anywhere in the code, and there is no special
> reason to add it now.
> > >> rte_ethdev.c includes stdbool.h and uses bool
> > >>
> > >>> 2. Other functions of this kind already return int, for example
> (rte_eth_dev_is_valid_port / rte_eth_is_valid_owner_id).
> > > I agree with Ori here for 2 reasons:
> > > 1. It is better to be consistent in the API
> > > 2. I remember having some issues with some drivers when introducing
> stdbool in the API.
> > >
> > > I think it may be nice to convert all such API to bool in one patch,
> > > and check if there are some remaining issues with bool usage in drivers or
> with PPC.
> > > But I suggest to do such API change in DPDK 20.11.
> >
> > OK, no problem. Does it still let us avoid the == 1 comparison? Just to
> > avoid changes in these lines in the future.
> 
> Yes probably better to avoid explicit comparison, but prefer boolean operator
> (!).
> 
> 

Thomas, I understand your comments, but here is Andrew's comment on my V2-01 patch:
"
>>> +	ret = (*dev->dev_ops->rx_hairpin_queue_setup)(dev, rx_queue_id,
>>> +						      nb_rx_desc, conf);
>>> +	if (!ret)
>> Please, compare with 0
>>
> Will do, but again just for my knowledge why?

https://doc.dpdk.org/guides/contributing/coding_style.html#function-calls

"
I don't see any relevant info in the link, but maybe I'm missing something.
What are the rules?
Thomas, also keep in mind that in most cases the condition that is tested is the positive one, meaning it will look something like this:
if (rte_eth_dev_is_rx_hairpin_queue(dev, queue_id)) {     
        rte_errno = EINVAL;                               
        return NULL;                                      
}                                       

What do you think?
                 



^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [dpdk-dev] [PATCH v5 02/15] ethdev: add support for hairpin queue
  2019-10-25 19:01                 ` Ori Kam
@ 2019-10-25 22:16                   ` Thomas Monjalon
  0 siblings, 0 replies; 186+ messages in thread
From: Thomas Monjalon @ 2019-10-25 22:16 UTC (permalink / raw)
  To: Ori Kam; +Cc: Andrew Rybchenko, dev, Ferruh Yigit, jingjing.wu, stephen

25/10/2019 21:01, Ori Kam:
> From: Thomas Monjalon <thomas@monjalon.net>
> > 24/10/2019 17:30, Andrew Rybchenko:
> > > On 10/24/19 6:17 PM, Thomas Monjalon wrote:
> > > > 24/10/2019 16:47, Andrew Rybchenko:
> > > >> On 10/24/19 11:29 AM, Ori Kam wrote:
> > > >>> Hi Andrew,
> > > >>>
> > > >>> When writing the new function I thought about using bool, but
> > > >>> I decided against it for the following reasons:
> > > >>> 1. There is no use of bool anywhere in the code, and there is no special
> > reason to add it now.
> > > >> rte_ethdev.c includes stdbool.h and uses bool
> > > >>
> > > >>> 2. Other functions of this kind already return int, for example
> > (rte_eth_dev_is_valid_port / rte_eth_is_valid_owner_id).
> > > > I agree with Ori here for 2 reasons:
> > > > 1. It is better to be consistent in the API
> > > > 2. I remember having some issues with some drivers when introducing
> > stdbool in the API.
> > > >
> > > > I think it may be nice to convert all such API to bool in one patch,
> > > > and check if there are some remaining issues with bool usage in drivers or
> > with PPC.
> > > > But I suggest to do such API change in DPDK 20.11.
> > >
> > > OK, no problem. Does it still let us avoid the == 1 comparison? Just to
> > > avoid changes in these lines in the future.
> > 
> > Yes probably better to avoid explicit comparison, but prefer boolean operator
> > (!).
> > 
> > 
> 
> Thomas, I understand your comments, but here is Andrew's comment on my V2-01 patch:
> "
> >>> +	ret = (*dev->dev_ops->rx_hairpin_queue_setup)(dev, rx_queue_id,
> >>> +						      nb_rx_desc, conf);
> >>> +	if (!ret)
> >> Please, compare with 0
> >>
> > Will do, but again just for my knowledge why?
> 
> https://doc.dpdk.org/guides/contributing/coding_style.html#function-calls
> 
> "
> I don't see any relevant info in the link, but maybe I'm missing something.
> What are the rules?
> Thomas, also keep in mind that in most cases the condition that is tested is the positive one, meaning it will look something like this:
> if (rte_eth_dev_is_rx_hairpin_queue(dev, queue_id)) {     
>         rte_errno = EINVAL;                               
>         return NULL;                                      
> }                                       
> 
> What do you think?

I think for normal functions with error codes,
we should compare explicitly with a value.
But for boolean-type functions like "is_hairpin_queue",
we should have implicit (natural) comparison. So yes, this is correct:
	if (rte_eth_dev_is_rx_hairpin_queue(dev, queue_id))
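
To illustrate the convention with the functions from this series (a
sketch only; port_id, queue_id, nb_rx_desc and conf are assumed to be
set up by the caller):

	int ret;

	/* Function returning an error code: compare explicitly. */
	ret = rte_eth_rx_hairpin_queue_setup(port_id, queue_id,
					     nb_rx_desc, &conf);
	if (ret != 0)
		return ret;

	/* Boolean-type function: natural comparison. */
	if (rte_eth_dev_is_rx_hairpin_queue(dev, queue_id))
		return -EINVAL;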



^ permalink raw reply	[flat|nested] 186+ messages in thread

* [dpdk-dev] [PATCH v6 00/14] add hairpin feature
  2019-09-26  6:28 [dpdk-dev] [PATCH 00/13] add hairpin feature Ori Kam
                   ` (17 preceding siblings ...)
  2019-10-23 13:37 ` [dpdk-dev] [PATCH v5 " Ori Kam
@ 2019-10-27 12:24 ` Ori Kam
  2019-10-27 12:24   ` [dpdk-dev] [PATCH v6 01/14] ethdev: move queue state defines to private file Ori Kam
                     ` (13 more replies)
  2019-10-30 23:53 ` [dpdk-dev] [PATCH v7 00/14] add hairpin feature Ori Kam
  19 siblings, 14 replies; 186+ messages in thread
From: Ori Kam @ 2019-10-27 12:24 UTC (permalink / raw)
  Cc: dev, orika, jingjing.wu, stephen, wenzhuo.lu, bernard.iremonger,
	thomas, ferruh.yigit, arybchenko, viacheslavo

This patch set implements the hairpin feature.
The hairpin feature was introduced in RFC[1]

The hairpin feature (a different name could be forwarding) acts as a "bump on
the wire", meaning that a packet that is received from the wire can be modified
using offloaded actions and then sent back to the wire without application
intervention, which saves CPU cycles.

The hairpin is the inverse of loopback, in which the application
sends a packet and then receives it again
without the packet being sent to the wire.

The hairpin can be used by a number of different VNFs, for example load
balancers, gateways and so on.

As can be seen from the hairpin description, a hairpin is basically an RX
queue connected to a TX queue.

During the design phase I was thinking of two ways to implement this
feature: the first one is adding a new rte_flow action, and the second
one is creating a special kind of queue.

The advantages of using the queue approach:
1. More control for the application over the queue depth (the memory size
that should be used).
2. Enables QoS. QoS is normally a parameter of a queue, so in this approach
it will be easy to integrate with such systems.
3. Native integration with the rte_flow API. Just setting the target
queue/RSS to a hairpin queue will result in the traffic being routed
to the hairpin queue.
4. Enables queue offloading.

Each hairpin Rxq can be connected to one Txq or a number of Txqs, which can
belong to different ports if the PMD supports it. The same goes the other
way: each hairpin Txq can be connected to one or more Rxqs.
This is the reason that both the Txq setup and the Rxq setup take the
hairpin configuration structure.

From the PMD perspective the number of Rxqs/Txqs is the total of standard
queues + hairpin queues.

To configure a hairpin queue the user should call
rte_eth_rx_hairpin_queue_setup / rte_eth_tx_hairpin_queue_setup instead
of the normal queue setup functions.

The hairpin queues are not part of the normal RSS functionality.

To use the queues the user simply creates a flow that points to RSS/queue
actions that are hairpin queues.
The reasons for selecting 2 new functions for hairpin queue setup are
(see the setup sketch after this list):
1. avoid API break.
2. avoid extra and unused parameters.
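
A minimal setup sketch inside the application's port setup function,
using the API added in this series (assumed context: the port was
configured with one extra queue of each type, so index n_rxq is the
hairpin Rx queue, index n_txq is the hairpin Tx queue and
nb_hairpin_desc is the chosen queue depth):

	struct rte_eth_hairpin_conf hairpin_conf = {
		.peer_count = 1,
		/* The Rx hairpin queue peers with the Tx hairpin queue. */
		.peers[0] = { .port = port_id, .queue = n_txq },
	};
	int ret;

	ret = rte_eth_rx_hairpin_queue_setup(port_id, n_rxq,
					     nb_hairpin_desc, &hairpin_conf);
	if (ret != 0)
		return ret;
	/* And the Tx hairpin queue peers back with the Rx one. */
	hairpin_conf.peers[0].queue = n_rxq;
	ret = rte_eth_tx_hairpin_queue_setup(port_id, n_txq,
					     nb_hairpin_desc, &hairpin_conf);
	if (ret != 0)
		return ret;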



[1] https://inbox.dpdk.org/dev/1565703468-55617-1-git-send-email-orika@mellanox.com/

Cc: wenzhuo.lu@intel.com
Cc: bernard.iremonger@intel.com
Cc: thomas@monjalon.net
Cc: ferruh.yigit@intel.com
Cc: arybchenko@solarflare.com
Cc: viacheslavo@mellanox.com

------
V6:
 - add missing include in nfb driver.
 - change the comparisons of rte_eth_dev_is_tx_hairpin_queue /
   rte_eth_dev_is_rx_hairpin_queue results to boolean style.
 - split the doc patch to the relevant patches.

V5:
 - modify log messages to be more distinct.
 - keep log messages on a single line even if longer than 80 characters.
 - change peer_n to peer_count.
 - add functions to get if queue is hairpin queue.

V4:
 - update according to comments from ML.

V3:
 - update according to comments from ML.

V2:
 - update according to comments from ML.


Ori Kam (14):
  ethdev: move queue state defines to private file
  ethdev: add support for hairpin queue
  net/mlx5: query hca hairpin capabilities
  net/mlx5: support Rx hairpin queues
  net/mlx5: prepare txq to work with different types
  net/mlx5: support Tx hairpin queues
  net/mlx5: add get hairpin capabilities
  app/testpmd: add hairpin support
  net/mlx5: add hairpin binding function
  net/mlx5: add support for hairpin hrxq
  net/mlx5: add internal tag item and action
  net/mlx5: add id generation function
  net/mlx5: add default flows for hairpin
  net/mlx5: split hairpin flows

 app/test-pmd/parameters.c                |  28 +++
 app/test-pmd/testpmd.c                   | 109 ++++++++-
 app/test-pmd/testpmd.h                   |   3 +
 doc/guides/rel_notes/release_19_11.rst   |   6 +
 drivers/net/mlx5/mlx5.c                  | 170 ++++++++++++-
 drivers/net/mlx5/mlx5.h                  |  69 +++++-
 drivers/net/mlx5/mlx5_devx_cmds.c        | 194 +++++++++++++++
 drivers/net/mlx5/mlx5_ethdev.c           | 129 ++++++++--
 drivers/net/mlx5/mlx5_flow.c             | 393 ++++++++++++++++++++++++++++++-
 drivers/net/mlx5/mlx5_flow.h             |  67 +++++-
 drivers/net/mlx5/mlx5_flow_dv.c          | 231 +++++++++++++++++-
 drivers/net/mlx5/mlx5_flow_verbs.c       |  11 +-
 drivers/net/mlx5/mlx5_prm.h              | 127 +++++++++-
 drivers/net/mlx5/mlx5_rss.c              |   1 +
 drivers/net/mlx5/mlx5_rxq.c              | 318 ++++++++++++++++++++++---
 drivers/net/mlx5/mlx5_rxtx.c             |   2 +-
 drivers/net/mlx5/mlx5_rxtx.h             |  68 +++++-
 drivers/net/mlx5/mlx5_trigger.c          | 140 ++++++++++-
 drivers/net/mlx5/mlx5_txq.c              | 294 +++++++++++++++++++----
 drivers/net/nfb/nfb_tx.h                 |   1 +
 lib/librte_ethdev/rte_ethdev.c           | 217 +++++++++++++++++
 lib/librte_ethdev/rte_ethdev.h           | 147 +++++++++++-
 lib/librte_ethdev/rte_ethdev_core.h      |  91 ++++++-
 lib/librte_ethdev/rte_ethdev_driver.h    |  50 ++++
 lib/librte_ethdev/rte_ethdev_version.map |   3 +
 25 files changed, 2702 insertions(+), 167 deletions(-)

-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 186+ messages in thread

* [dpdk-dev] [PATCH v6 01/14] ethdev: move queue state defines to private file
  2019-10-27 12:24 ` [dpdk-dev] [PATCH v6 00/14] " Ori Kam
@ 2019-10-27 12:24   ` Ori Kam
  2019-10-27 12:24   ` [dpdk-dev] [PATCH v6 02/14] ethdev: add support for hairpin queue Ori Kam
                     ` (12 subsequent siblings)
  13 siblings, 0 replies; 186+ messages in thread
From: Ori Kam @ 2019-10-27 12:24 UTC (permalink / raw)
  To: Rastislav Cernay, Jan Remes, Thomas Monjalon, Ferruh Yigit,
	Andrew Rybchenko
  Cc: dev, orika, jingjing.wu, stephen

The queue state defines are internal to the DPDK.
This commit moves them to a private header file.

Signed-off-by: Ori Kam <orika@mellanox.com>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>

---
V6:
 - add rte_ethdev_driver include to nfb driver.

---
 drivers/net/nfb/nfb_tx.h              | 1 +
 lib/librte_ethdev/rte_ethdev.h        | 6 ------
 lib/librte_ethdev/rte_ethdev_driver.h | 6 ++++++
 3 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/drivers/net/nfb/nfb_tx.h b/drivers/net/nfb/nfb_tx.h
index edf5ede..b6578cc 100644
--- a/drivers/net/nfb/nfb_tx.h
+++ b/drivers/net/nfb/nfb_tx.h
@@ -10,6 +10,7 @@
 #include <nfb/nfb.h>
 #include <nfb/ndp.h>
 
+#include <rte_ethdev_driver.h>
 #include <rte_ethdev.h>
 #include <rte_malloc.h>
 
diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
index c36c1b6..9e1f9ae 100644
--- a/lib/librte_ethdev/rte_ethdev.h
+++ b/lib/librte_ethdev/rte_ethdev.h
@@ -1336,12 +1336,6 @@ struct rte_eth_dcb_info {
 	struct rte_eth_dcb_tc_queue_mapping tc_queue;
 };
 
-/**
- * RX/TX queue states
- */
-#define RTE_ETH_QUEUE_STATE_STOPPED 0
-#define RTE_ETH_QUEUE_STATE_STARTED 1
-
 #define RTE_ETH_ALL RTE_MAX_ETHPORTS
 
 /* Macros to check for valid port */
diff --git a/lib/librte_ethdev/rte_ethdev_driver.h b/lib/librte_ethdev/rte_ethdev_driver.h
index 936ff8c..c404f17 100644
--- a/lib/librte_ethdev/rte_ethdev_driver.h
+++ b/lib/librte_ethdev/rte_ethdev_driver.h
@@ -22,6 +22,12 @@
 #endif
 
 /**
+ * RX/TX queue states
+ */
+#define RTE_ETH_QUEUE_STATE_STOPPED 0
+#define RTE_ETH_QUEUE_STATE_STARTED 1
+
+/**
  * @internal
  * Returns a ethdev slot specified by the unique identifier name.
  *
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 186+ messages in thread

* [dpdk-dev] [PATCH v6 02/14] ethdev: add support for hairpin queue
  2019-10-27 12:24 ` [dpdk-dev] [PATCH v6 00/14] " Ori Kam
  2019-10-27 12:24   ` [dpdk-dev] [PATCH v6 01/14] ethdev: move queue state defines to private file Ori Kam
@ 2019-10-27 12:24   ` Ori Kam
  2019-10-28 15:16     ` Andrew Rybchenko
  2019-10-27 12:24   ` [dpdk-dev] [PATCH v6 03/14] net/mlx5: query hca hairpin capabilities Ori Kam
                     ` (11 subsequent siblings)
  13 siblings, 1 reply; 186+ messages in thread
From: Ori Kam @ 2019-10-27 12:24 UTC (permalink / raw)
  To: John McNamara, Marko Kovacevic, Thomas Monjalon, Ferruh Yigit,
	Andrew Rybchenko
  Cc: dev, orika, jingjing.wu, stephen

This commit introduces the hairpin queue type.

A hairpin queue is built from an Rx queue bound to a Tx queue.
It is used to offload traffic coming from the wire and redirect it back
to the wire.

There are 3 new functions:
- rte_eth_dev_hairpin_capability_get
- rte_eth_rx_hairpin_queue_setup
- rte_eth_tx_hairpin_queue_setup

In order to use the queue, there is a need to create an rte_flow
with a queue / RSS action that targets one or more of the Rx queues.
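
As a sketch of the intended usage (variable names here are illustrative,
not part of this patch), an application would first query the capabilities
and size its request accordingly:

	struct rte_eth_hairpin_cap cap;
	int ret = rte_eth_dev_hairpin_capability_get(port_id, &cap);

	if (ret != 0)
		return ret; /* e.g. -ENOTSUP: the PMD has no hairpin support */
	/* Clamp the request to what the device reports it can do. */
	if (nb_desc > cap.max_nb_desc)
		nb_desc = cap.max_nb_desc;
	if (conf.peer_count == 0 || conf.peer_count > cap.max_rx_2_tx)
		return -EINVAL; /* the setup function would reject it anyway */
	ret = rte_eth_rx_hairpin_queue_setup(port_id, queue_id, nb_desc,
					     &conf);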

Signed-off-by: Ori Kam <orika@mellanox.com>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
---
V6:
 - change the comparison of rte_eth_dev_is_rx/tx_hairpin_queue to boolean.
 - add hairpin to release note.

V5:
 - add function to check if queue is hairpin queue.
 - modify log messages to be more distinct.
 - update log messages to be only on one line.
 - change peer_n to peer_count.

V4:
 - update according to ML comments.

V3:
 - update according to ML comments.

V2:
 - update according to ML comments
---
 doc/guides/rel_notes/release_19_11.rst   |   5 +
 lib/librte_ethdev/rte_ethdev.c           | 217 +++++++++++++++++++++++++++++++
 lib/librte_ethdev/rte_ethdev.h           | 141 +++++++++++++++++++-
 lib/librte_ethdev/rte_ethdev_core.h      |  91 ++++++++++++-
 lib/librte_ethdev/rte_ethdev_driver.h    |  44 +++++++
 lib/librte_ethdev/rte_ethdev_version.map |   3 +
 6 files changed, 493 insertions(+), 8 deletions(-)

diff --git a/doc/guides/rel_notes/release_19_11.rst b/doc/guides/rel_notes/release_19_11.rst
index e02e2f4..87310db 100644
--- a/doc/guides/rel_notes/release_19_11.rst
+++ b/doc/guides/rel_notes/release_19_11.rst
@@ -224,6 +224,11 @@ New Features
   * Added a console command to testpmd app, ``show port (port_id) ptypes`` which
     gives ability to print port supported ptypes in different protocol layers.
 
+* **Added hairpin queue.**
+
+  On supported NICs, we can now set up hairpin queues which will offload
+  packets from the wire, back to the wire.
+
 
 Removed Items
 -------------
diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
index 7743205..68aca1f 100644
--- a/lib/librte_ethdev/rte_ethdev.c
+++ b/lib/librte_ethdev/rte_ethdev.c
@@ -923,6 +923,13 @@ struct rte_eth_dev *
 
 	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_start, -ENOTSUP);
 
+	if (rte_eth_dev_is_rx_hairpin_queue(dev, rx_queue_id)) {
+		RTE_ETHDEV_LOG(INFO,
+			"Can't start hairpin Rx queue %"PRIu16" of device with port_id=%"PRIu16"\n",
+			rx_queue_id, port_id);
+		return -EINVAL;
+	}
+
 	if (dev->data->rx_queue_state[rx_queue_id] != RTE_ETH_QUEUE_STATE_STOPPED) {
 		RTE_ETHDEV_LOG(INFO,
 			"Queue %"PRIu16" of device with port_id=%"PRIu16" already started\n",
@@ -950,6 +957,13 @@ struct rte_eth_dev *
 
 	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_stop, -ENOTSUP);
 
+	if (rte_eth_dev_is_rx_hairpin_queue(dev, rx_queue_id)) {
+		RTE_ETHDEV_LOG(INFO,
+			"Can't stop hairpin Rx queue %"PRIu16" of device with port_id=%"PRIu16"\n",
+			rx_queue_id, port_id);
+		return -EINVAL;
+	}
+
 	if (dev->data->rx_queue_state[rx_queue_id] == RTE_ETH_QUEUE_STATE_STOPPED) {
 		RTE_ETHDEV_LOG(INFO,
 			"Queue %"PRIu16" of device with port_id=%"PRIu16" already stopped\n",
@@ -983,6 +997,13 @@ struct rte_eth_dev *
 
 	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_queue_start, -ENOTSUP);
 
+	if (rte_eth_dev_is_tx_hairpin_queue(dev, tx_queue_id)) {
+		RTE_ETHDEV_LOG(INFO,
+			"Can't start hairpin Tx queue %"PRIu16" of device with port_id=%"PRIu16"\n",
+			tx_queue_id, port_id);
+		return -EINVAL;
+	}
+
 	if (dev->data->tx_queue_state[tx_queue_id] != RTE_ETH_QUEUE_STATE_STOPPED) {
 		RTE_ETHDEV_LOG(INFO,
 			"Queue %"PRIu16" of device with port_id=%"PRIu16" already started\n",
@@ -1008,6 +1029,13 @@ struct rte_eth_dev *
 
 	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_queue_stop, -ENOTSUP);
 
+	if (rte_eth_dev_is_tx_hairpin_queue(dev, tx_queue_id)) {
+		RTE_ETHDEV_LOG(INFO,
+			"Can't stop hairpin Tx queue %"PRIu16" of device with port_id=%"PRIu16"\n",
+			tx_queue_id, port_id);
+		return -EINVAL;
+	}
+
 	if (dev->data->tx_queue_state[tx_queue_id] == RTE_ETH_QUEUE_STATE_STOPPED) {
 		RTE_ETHDEV_LOG(INFO,
 			"Queue %"PRIu16" of device with port_id=%"PRIu16" already stopped\n",
@@ -1780,6 +1808,79 @@ struct rte_eth_dev *
 }
 
 int
+rte_eth_rx_hairpin_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
+			       uint16_t nb_rx_desc,
+			       const struct rte_eth_hairpin_conf *conf)
+{
+	int ret;
+	struct rte_eth_dev *dev;
+	struct rte_eth_hairpin_cap cap;
+	void **rxq;
+	int i;
+	int count = 0;
+
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
+
+	dev = &rte_eth_devices[port_id];
+	if (rx_queue_id >= dev->data->nb_rx_queues) {
+		RTE_ETHDEV_LOG(ERR, "Invalid RX queue_id=%u\n", rx_queue_id);
+		return -EINVAL;
+	}
+	ret = rte_eth_dev_hairpin_capability_get(port_id, &cap);
+	if (ret != 0)
+		return ret;
+	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_hairpin_queue_setup,
+				-ENOTSUP);
+	/* Use default specified by driver, if nb_rx_desc is zero */
+	if (nb_rx_desc == 0)
+		nb_rx_desc = cap.max_nb_desc;
+	if (nb_rx_desc > cap.max_nb_desc) {
+		RTE_ETHDEV_LOG(ERR,
+			"Invalid value for nb_rx_desc(=%hu), should be: <= %hu",
+			nb_rx_desc, cap.max_nb_desc);
+		return -EINVAL;
+	}
+	if (conf->peer_count > cap.max_rx_2_tx) {
+		RTE_ETHDEV_LOG(ERR,
+			"Invalid value for number of peers for Rx queue(=%hu), should be: <= %hu",
+			conf->peer_count, cap.max_rx_2_tx);
+		return -EINVAL;
+	}
+	if (conf->peer_count == 0) {
+		RTE_ETHDEV_LOG(ERR,
+			"Invalid value for number of peers for Rx queue(=%hu), should be: > 0",
+			conf->peer_count);
+		return -EINVAL;
+	}
+	if (cap.max_nb_queues != UINT16_MAX) {
+		for (i = 0; i < dev->data->nb_rx_queues; i++) {
+			if (rte_eth_dev_is_rx_hairpin_queue(dev, i))
+				count++;
+		}
+		if (count > cap.max_nb_queues) {
+			RTE_ETHDEV_LOG(ERR, "Too many Rx hairpin queues %d",
+			count);
+			return -EINVAL;
+		}
+	}
+	if (dev->data->dev_started)
+		return -EBUSY;
+	rxq = dev->data->rx_queues;
+	if (rxq[rx_queue_id] != NULL) {
+		RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_release,
+					-ENOTSUP);
+		(*dev->dev_ops->rx_queue_release)(rxq[rx_queue_id]);
+		rxq[rx_queue_id] = NULL;
+	}
+	ret = (*dev->dev_ops->rx_hairpin_queue_setup)(dev, rx_queue_id,
+						      nb_rx_desc, conf);
+	if (ret == 0)
+		dev->data->rx_queue_state[rx_queue_id] =
+			RTE_ETH_QUEUE_STATE_HAIRPIN;
+	return eth_err(port_id, ret);
+}
+
+int
 rte_eth_tx_queue_setup(uint16_t port_id, uint16_t tx_queue_id,
 		       uint16_t nb_tx_desc, unsigned int socket_id,
 		       const struct rte_eth_txconf *tx_conf)
@@ -1878,6 +1979,78 @@ struct rte_eth_dev *
 		       tx_queue_id, nb_tx_desc, socket_id, &local_conf));
 }
 
+int
+rte_eth_tx_hairpin_queue_setup(uint16_t port_id, uint16_t tx_queue_id,
+			       uint16_t nb_tx_desc,
+			       const struct rte_eth_hairpin_conf *conf)
+{
+	struct rte_eth_dev *dev;
+	struct rte_eth_hairpin_cap cap;
+	void **txq;
+	int i;
+	int count = 0;
+	int ret;
+
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
+	dev = &rte_eth_devices[port_id];
+	if (tx_queue_id >= dev->data->nb_tx_queues) {
+		RTE_ETHDEV_LOG(ERR, "Invalid TX queue_id=%u\n", tx_queue_id);
+		return -EINVAL;
+	}
+	ret = rte_eth_dev_hairpin_capability_get(port_id, &cap);
+	if (ret != 0)
+		return ret;
+	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_hairpin_queue_setup,
+				-ENOTSUP);
+	/* Use default specified by driver, if nb_tx_desc is zero */
+	if (nb_tx_desc == 0)
+		nb_tx_desc = cap.max_nb_desc;
+	if (nb_tx_desc > cap.max_nb_desc) {
+		RTE_ETHDEV_LOG(ERR,
+			"Invalid value for nb_tx_desc(=%hu), should be: <= %hu",
+			nb_tx_desc, cap.max_nb_desc);
+		return -EINVAL;
+	}
+	if (conf->peer_count > cap.max_tx_2_rx) {
+		RTE_ETHDEV_LOG(ERR,
+			"Invalid value for number of peers for Tx queue(=%hu), should be: <= %hu",
+			conf->peer_count, cap.max_tx_2_rx);
+		return -EINVAL;
+	}
+	if (conf->peer_count == 0) {
+		RTE_ETHDEV_LOG(ERR,
+			"Invalid value for number of peers for Tx queue(=%hu), should be: > 0",
+			conf->peer_count);
+		return -EINVAL;
+	}
+	if (cap.max_nb_queues != UINT16_MAX) {
+		for (i = 0; i < dev->data->nb_tx_queues; i++) {
+			if (rte_eth_dev_is_tx_hairpin_queue(dev, i))
+				count++;
+		}
+		if (count > cap.max_nb_queues) {
+			RTE_ETHDEV_LOG(ERR,
+				       "Too many Tx hairpin queues %d", count);
+			return -EINVAL;
+		}
+	}
+	if (dev->data->dev_started)
+		return -EBUSY;
+	txq = dev->data->tx_queues;
+	if (txq[tx_queue_id] != NULL) {
+		RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_queue_release,
+					-ENOTSUP);
+		(*dev->dev_ops->tx_queue_release)(txq[tx_queue_id]);
+		txq[tx_queue_id] = NULL;
+	}
+	ret = (*dev->dev_ops->tx_hairpin_queue_setup)
+		(dev, tx_queue_id, nb_tx_desc, conf);
+	if (ret == 0)
+		dev->data->tx_queue_state[tx_queue_id] =
+			RTE_ETH_QUEUE_STATE_HAIRPIN;
+	return eth_err(port_id, ret);
+}
+
 void
 rte_eth_tx_buffer_drop_callback(struct rte_mbuf **pkts, uint16_t unsent,
 		void *userdata __rte_unused)
@@ -4007,12 +4180,19 @@ int rte_eth_set_queue_rate_limit(uint16_t port_id, uint16_t queue_idx,
 	rte_errno = ENOTSUP;
 	return NULL;
 #endif
+	struct rte_eth_dev *dev;
+
 	/* check input parameters */
 	if (!rte_eth_dev_is_valid_port(port_id) || fn == NULL ||
 		    queue_id >= rte_eth_devices[port_id].data->nb_rx_queues) {
 		rte_errno = EINVAL;
 		return NULL;
 	}
+	dev = &rte_eth_devices[port_id];
+	if (rte_eth_dev_is_rx_hairpin_queue(dev, queue_id)) {
+		rte_errno = EINVAL;
+		return NULL;
+	}
 	struct rte_eth_rxtx_callback *cb = rte_zmalloc(NULL, sizeof(*cb), 0);
 
 	if (cb == NULL) {
@@ -4084,6 +4264,8 @@ int rte_eth_set_queue_rate_limit(uint16_t port_id, uint16_t queue_idx,
 	rte_errno = ENOTSUP;
 	return NULL;
 #endif
+	struct rte_eth_dev *dev;
+
 	/* check input parameters */
 	if (!rte_eth_dev_is_valid_port(port_id) || fn == NULL ||
 		    queue_id >= rte_eth_devices[port_id].data->nb_tx_queues) {
@@ -4091,6 +4273,12 @@ int rte_eth_set_queue_rate_limit(uint16_t port_id, uint16_t queue_idx,
 		return NULL;
 	}
 
+	dev = &rte_eth_devices[port_id];
+	if (rte_eth_dev_is_tx_hairpin_queue(dev, queue_id)) {
+		rte_errno = EINVAL;
+		return NULL;
+	}
+
 	struct rte_eth_rxtx_callback *cb = rte_zmalloc(NULL, sizeof(*cb), 0);
 
 	if (cb == NULL) {
@@ -4204,6 +4392,13 @@ int rte_eth_set_queue_rate_limit(uint16_t port_id, uint16_t queue_idx,
 		return -EINVAL;
 	}
 
+	if (rte_eth_dev_is_rx_hairpin_queue(dev, queue_id)) {
+		RTE_ETHDEV_LOG(INFO,
+			"Can't get hairpin Rx queue %"PRIu16" info of device with port_id=%"PRIu16"\n",
+			queue_id, port_id);
+		return -EINVAL;
+	}
+
 	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rxq_info_get, -ENOTSUP);
 
 	memset(qinfo, 0, sizeof(*qinfo));
@@ -4228,6 +4423,13 @@ int rte_eth_set_queue_rate_limit(uint16_t port_id, uint16_t queue_idx,
 		return -EINVAL;
 	}
 
+	if (rte_eth_dev_is_tx_hairpin_queue(dev, queue_id)) {
+		RTE_ETHDEV_LOG(INFO,
+			"Can't get hairpin Tx queue %"PRIu16" info of device with port_id=%"PRIu16"\n",
+			queue_id, port_id);
+		return -EINVAL;
+	}
+
 	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->txq_info_get, -ENOTSUP);
 
 	memset(qinfo, 0, sizeof(*qinfo));
@@ -4600,6 +4802,21 @@ int rte_eth_set_queue_rate_limit(uint16_t port_id, uint16_t queue_idx,
 }
 
 int
+rte_eth_dev_hairpin_capability_get(uint16_t port_id,
+				   struct rte_eth_hairpin_cap *cap)
+{
+	struct rte_eth_dev *dev;
+
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
+
+	dev = &rte_eth_devices[port_id];
+	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->hairpin_cap_get,
+				-ENOTSUP);
+	memset(cap, 0, sizeof(*cap));
+	return eth_err(port_id, (*dev->dev_ops->hairpin_cap_get)(dev, cap));
+}
+
+int
 rte_eth_dev_pool_ops_supported(uint16_t port_id, const char *pool)
 {
 	struct rte_eth_dev *dev;
diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
index 9e1f9ae..9b69255 100644
--- a/lib/librte_ethdev/rte_ethdev.h
+++ b/lib/librte_ethdev/rte_ethdev.h
@@ -839,6 +839,46 @@ struct rte_eth_txconf {
 };
 
 /**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * A structure used to return the hairpin capabilities that are supported.
+ */
+struct rte_eth_hairpin_cap {
+	/** The max number of hairpin queues (different bindings). */
+	uint16_t max_nb_queues;
+	/** Max number of Rx queues to be connected to one Tx queue. */
+	uint16_t max_rx_2_tx;
+	/** Max number of Tx queues to be connected to one Rx queue. */
+	uint16_t max_tx_2_rx;
+	uint16_t max_nb_desc; /**< The max num of descriptors. */
+};
+
+#define RTE_ETH_MAX_HAIRPIN_PEERS 32
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * A structure used to hold hairpin peer data.
+ */
+struct rte_eth_hairpin_peer {
+	uint16_t port; /**< Peer port. */
+	uint16_t queue; /**< Peer queue. */
+};
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * A structure used to configure hairpin binding.
+ */
+struct rte_eth_hairpin_conf {
+	uint16_t peer_count; /**< The number of peers. */
+	struct rte_eth_hairpin_peer peers[RTE_ETH_MAX_HAIRPIN_PEERS];
+};
+
+/**
  * A structure contains information about HW descriptor ring limitations.
  */
 struct rte_eth_desc_lim {
@@ -1829,6 +1869,37 @@ int rte_eth_rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
 		struct rte_mempool *mb_pool);
 
 /**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Allocate and set up a hairpin receive queue for an Ethernet device.
+ *
+ * The function sets up the selected queue to be used in hairpin mode.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param rx_queue_id
+ *   The index of the receive queue to set up.
+ *   The value must be in the range [0, nb_rx_queue - 1] previously supplied
+ *   to rte_eth_dev_configure().
+ * @param nb_rx_desc
+ *   The number of receive descriptors to allocate for the receive ring.
+ *   0 means the PMD will use default value.
+ * @param conf
+ *   The pointer to the hairpin configuration.
+ *
+ * @return
+ *   - (0) if successful.
+ *   - (-ENOTSUP) if hardware doesn't support.
+ *   - (-EINVAL) if bad parameter.
+ *   - (-ENOMEM) if unable to allocate the resources.
+ */
+__rte_experimental
+int rte_eth_rx_hairpin_queue_setup
+	(uint16_t port_id, uint16_t rx_queue_id, uint16_t nb_rx_desc,
+	 const struct rte_eth_hairpin_conf *conf);
+
+/**
  * Allocate and set up a transmit queue for an Ethernet device.
  *
  * @param port_id
@@ -1881,6 +1952,35 @@ int rte_eth_tx_queue_setup(uint16_t port_id, uint16_t tx_queue_id,
 		const struct rte_eth_txconf *tx_conf);
 
 /**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Allocate and set up a transmit hairpin queue for an Ethernet device.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param tx_queue_id
+ *   The index of the transmit queue to set up.
+ *   The value must be in the range [0, nb_tx_queue - 1] previously supplied
+ *   to rte_eth_dev_configure().
+ * @param nb_tx_desc
+ *   The number of transmit descriptors to allocate for the transmit ring.
+ *   0 to set default PMD value.
+ * @param conf
+ *   The hairpin configuration.
+ *
+ * @return
+ *   - (0) if successful.
+ *   - (-ENOTSUP) if hardware doesn't support.
+ *   - (-EINVAL) if bad parameter.
+ *   - (-ENOMEM) if unable to allocate the resources.
+ */
+__rte_experimental
+int rte_eth_tx_hairpin_queue_setup
+	(uint16_t port_id, uint16_t tx_queue_id, uint16_t nb_tx_desc,
+	 const struct rte_eth_hairpin_conf *conf);
+
+/**
  * Return the NUMA socket to which an Ethernet device is connected
  *
  * @param port_id
@@ -1915,7 +2015,7 @@ int rte_eth_tx_queue_setup(uint16_t port_id, uint16_t tx_queue_id,
  *   to rte_eth_dev_configure().
  * @return
  *   - 0: Success, the receive queue is started.
- *   - -EINVAL: The port_id or the queue_id out of range.
+ *   - -EINVAL: The port_id or the queue_id is out of range, or the queue is a hairpin queue.
  *   - -EIO: if device is removed.
  *   - -ENOTSUP: The function not supported in PMD driver.
  */
@@ -1932,7 +2032,7 @@ int rte_eth_tx_queue_setup(uint16_t port_id, uint16_t tx_queue_id,
  *   to rte_eth_dev_configure().
  * @return
  *   - 0: Success, the receive queue is stopped.
- *   - -EINVAL: The port_id or the queue_id out of range.
+ *   - -EINVAL: The port_id or the queue_id is out of range, or the queue is a hairpin queue.
  *   - -EIO: if device is removed.
  *   - -ENOTSUP: The function not supported in PMD driver.
  */
@@ -1950,7 +2050,7 @@ int rte_eth_tx_queue_setup(uint16_t port_id, uint16_t tx_queue_id,
  *   to rte_eth_dev_configure().
  * @return
  *   - 0: Success, the transmit queue is started.
- *   - -EINVAL: The port_id or the queue_id out of range.
+ *   - -EINVAL: The port_id or the queue_id is out of range, or the queue is a hairpin queue.
  *   - -EIO: if device is removed.
  *   - -ENOTSUP: The function not supported in PMD driver.
  */
@@ -1967,7 +2067,7 @@ int rte_eth_tx_queue_setup(uint16_t port_id, uint16_t tx_queue_id,
  *   to rte_eth_dev_configure().
  * @return
  *   - 0: Success, the transmit queue is stopped.
- *   - -EINVAL: The port_id or the queue_id out of range.
+ *   - -EINVAL: The port_id or the queue_id is out of range, or the queue is a hairpin queue.
  *   - -EIO: if device is removed.
  *   - -ENOTSUP: The function not supported in PMD driver.
  */
@@ -3633,7 +3733,8 @@ int rte_eth_remove_tx_callback(uint16_t port_id, uint16_t queue_id,
  * @return
  *   - 0: Success
  *   - -ENOTSUP: routine is not supported by the device PMD.
- *   - -EINVAL:  The port_id or the queue_id is out of range.
+ *   - -EINVAL:  The port_id or the queue_id is out of range, or the queue
+ *               is a hairpin queue.
  */
 int rte_eth_rx_queue_info_get(uint16_t port_id, uint16_t queue_id,
 	struct rte_eth_rxq_info *qinfo);
@@ -3653,7 +3754,8 @@ int rte_eth_rx_queue_info_get(uint16_t port_id, uint16_t queue_id,
  * @return
  *   - 0: Success
  *   - -ENOTSUP: routine is not supported by the device PMD.
- *   - -EINVAL:  The port_id or the queue_id is out of range.
+ *   - -EINVAL:  The port_id or the queue_id is out of range, or the queue
+ *               is a hairpin queue.
  */
 int rte_eth_tx_queue_info_get(uint16_t port_id, uint16_t queue_id,
 	struct rte_eth_txq_info *qinfo);
@@ -4151,6 +4253,23 @@ int rte_eth_dev_adjust_nb_rx_tx_desc(uint16_t port_id,
 void *
 rte_eth_dev_get_sec_ctx(uint16_t port_id);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Query the device hairpin capabilities.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param cap
+ *   Pointer to a structure that will hold the hairpin capabilities.
+ * @return
+ *   - (0) if successful.
+ *   - (-ENOTSUP) if hardware doesn't support.
+ */
+__rte_experimental
+int rte_eth_dev_hairpin_capability_get(uint16_t port_id,
+				       struct rte_eth_hairpin_cap *cap);
 
 #include <rte_ethdev_core.h>
 
@@ -4251,6 +4370,11 @@ int rte_eth_dev_adjust_nb_rx_tx_desc(uint16_t port_id,
 		RTE_ETHDEV_LOG(ERR, "Invalid RX queue_id=%u\n", queue_id);
 		return 0;
 	}
+	if (rte_eth_dev_is_rx_hairpin_queue(dev, queue_id)) {
+		RTE_ETHDEV_LOG(ERR, "Rx burst failed, queue_id=%u is a hairpin queue\n",
+			       queue_id);
+		return 0;
+	}
 #endif
 	nb_rx = (*dev->rx_pkt_burst)(dev->data->rx_queues[queue_id],
 				     rx_pkts, nb_pkts);
@@ -4517,6 +4641,11 @@ static inline int rte_eth_tx_descriptor_status(uint16_t port_id,
 		RTE_ETHDEV_LOG(ERR, "Invalid TX queue_id=%u\n", queue_id);
 		return 0;
 	}
+	if (rte_eth_dev_is_tx_hairpin_queue(dev, queue_id)) {
+		RTE_ETHDEV_LOG(ERR, "Tx burst failed, queue_id=%u is a hairpin queue\n",
+			       queue_id);
+		return 0;
+	}
 #endif
 
 #ifdef RTE_ETHDEV_RXTX_CALLBACKS
diff --git a/lib/librte_ethdev/rte_ethdev_core.h b/lib/librte_ethdev/rte_ethdev_core.h
index 392aea8..f215af7 100644
--- a/lib/librte_ethdev/rte_ethdev_core.h
+++ b/lib/librte_ethdev/rte_ethdev_core.h
@@ -509,6 +509,86 @@ typedef int (*eth_pool_ops_supported_t)(struct rte_eth_dev *dev,
 /**< @internal Test if a port supports specific mempool ops */
 
 /**
+ * @internal
+ * Get the hairpin capabilities.
+ *
+ * @param dev
+ *   ethdev handle of port.
+ * @param cap
+ *   returns the hairpin capabilities from the device.
+ *
+ * @return
+ *   Negative errno value on error, 0 on success.
+ *
+ * @retval 0
+ *   Success, hairpin is supported.
+ * @retval -ENOTSUP
+ *   Hairpin is not supported.
+ */
+typedef int (*eth_hairpin_cap_get_t)(struct rte_eth_dev *dev,
+				     struct rte_eth_hairpin_cap *cap);
+
+/**
+ * @internal
+ * Setup RX hairpin queue.
+ *
+ * @param dev
+ *   ethdev handle of port.
+ * @param rx_queue_id
+ *   the selected RX queue index.
+ * @param nb_rx_desc
+ *   the requested number of descriptors for this queue. 0 - use PMD default.
+ * @param conf
+ *   the RX hairpin configuration structure.
+ *
+ * @return
+ *   Negative errno value on error, 0 on success.
+ *
+ * @retval 0
+ *   Success, hairpin is supported.
+ * @retval -ENOTSUP
+ *   Hairpin is not supported.
+ * @retval -EINVAL
+ *   One of the parameters is invalid.
+ * @retval -ENOMEM
+ *   Unable to allocate resources.
+ */
+typedef int (*eth_rx_hairpin_queue_setup_t)
+	(struct rte_eth_dev *dev, uint16_t rx_queue_id,
+	 uint16_t nb_rx_desc,
+	 const struct rte_eth_hairpin_conf *conf);
+
+/**
+ * @internal
+ * Setup TX hairpin queue.
+ *
+ * @param dev
+ *   ethdev handle of port.
+ * @param tx_queue_id
+ *   the selected TX queue index.
+ * @param nb_tx_desc
+ *   the requested number of descriptors for this queue. 0 - use PMD default.
+ * @param conf
+ *   the TX hairpin configuration structure.
+ *
+ * @return
+ *   Negative errno value on error, 0 on success.
+ *
+ * @retval 0
+ *   Success, hairpin is supported.
+ * @retval -ENOTSUP
+ *   Hairpin is not supported.
+ * @retval -EINVAL
+ *   One of the parameters is invalid.
+ * @retval -ENOMEM
+ *   Unable to allocate resources.
+ */
+typedef int (*eth_tx_hairpin_queue_setup_t)
+	(struct rte_eth_dev *dev, uint16_t tx_queue_id,
+	 uint16_t nb_tx_desc,
+	 const struct rte_eth_hairpin_conf *hairpin_conf);
+
+/**
  * @internal A structure containing the functions exported by an Ethernet driver.
  */
 struct eth_dev_ops {
@@ -644,6 +724,13 @@ struct eth_dev_ops {
 
 	eth_pool_ops_supported_t pool_ops_supported;
 	/**< Test if a port supports specific mempool ops */
+
+	eth_hairpin_cap_get_t hairpin_cap_get;
+	/**< Returns the hairpin capabilities. */
+	eth_rx_hairpin_queue_setup_t rx_hairpin_queue_setup;
+	/**< Set up device RX hairpin queue. */
+	eth_tx_hairpin_queue_setup_t tx_hairpin_queue_setup;
+	/**< Set up device TX hairpin queue. */
 };
 
 /**
@@ -751,9 +838,9 @@ struct rte_eth_dev_data {
 		dev_started : 1,   /**< Device state: STARTED(1) / STOPPED(0). */
 		lro         : 1;   /**< RX LRO is ON(1) / OFF(0) */
 	uint8_t rx_queue_state[RTE_MAX_QUEUES_PER_PORT];
-			/**< Queues state: STARTED(1) / STOPPED(0). */
+		/**< Queues state: HAIRPIN(2) / STARTED(1) / STOPPED(0). */
 	uint8_t tx_queue_state[RTE_MAX_QUEUES_PER_PORT];
-			/**< Queues state: STARTED(1) / STOPPED(0). */
+		/**< Queues state: HAIRPIN(2) / STARTED(1) / STOPPED(0). */
 	uint32_t dev_flags;             /**< Capabilities. */
 	enum rte_kernel_driver kdrv;    /**< Kernel driver passthrough. */
 	int numa_node;                  /**< NUMA node connection. */
diff --git a/lib/librte_ethdev/rte_ethdev_driver.h b/lib/librte_ethdev/rte_ethdev_driver.h
index c404f17..98023d7 100644
--- a/lib/librte_ethdev/rte_ethdev_driver.h
+++ b/lib/librte_ethdev/rte_ethdev_driver.h
@@ -26,6 +26,50 @@
  */
 #define RTE_ETH_QUEUE_STATE_STOPPED 0
 #define RTE_ETH_QUEUE_STATE_STARTED 1
+#define RTE_ETH_QUEUE_STATE_HAIRPIN 2
+
+/**
+ * @internal
+ * Check if the selected Rx queue is hairpin queue.
+ *
+ * @param dev
+ *  Pointer to the selected device.
+ * @param queue_id
+ *  The selected queue.
+ *
+ * @return
+ *   - (1) if the queue is hairpin queue, 0 otherwise.
+ */
+static inline int
+rte_eth_dev_is_rx_hairpin_queue(struct rte_eth_dev *dev, uint16_t queue_id)
+{
+	if (dev->data->rx_queue_state[queue_id] ==
+	    RTE_ETH_QUEUE_STATE_HAIRPIN)
+		return 1;
+	return 0;
+}
+
+
+/**
+ * @internal
+ * Check if the selected Tx queue is hairpin queue.
+ *
+ * @param dev
+ *  Pointer to the selected device.
+ * @param queue_id
+ *  The selected queue.
+ *
+ * @return
+ *   - (1) if the queue is hairpin queue, 0 otherwise.
+ */
+static inline int
+rte_eth_dev_is_tx_hairpin_queue(struct rte_eth_dev *dev, uint16_t queue_id)
+{
+	if (dev->data->tx_queue_state[queue_id] ==
+	    RTE_ETH_QUEUE_STATE_HAIRPIN)
+		return 1;
+	return 0;
+}
 
 /**
  * @internal
diff --git a/lib/librte_ethdev/rte_ethdev_version.map b/lib/librte_ethdev/rte_ethdev_version.map
index e59d516..48b5389 100644
--- a/lib/librte_ethdev/rte_ethdev_version.map
+++ b/lib/librte_ethdev/rte_ethdev_version.map
@@ -288,4 +288,7 @@ EXPERIMENTAL {
 	rte_eth_rx_burst_mode_get;
 	rte_eth_tx_burst_mode_get;
 	rte_eth_burst_mode_option_name;
+	rte_eth_rx_hairpin_queue_setup;
+	rte_eth_tx_hairpin_queue_setup;
+	rte_eth_dev_hairpin_capability_get;
 };
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 186+ messages in thread

* [dpdk-dev] [PATCH v6 03/14] net/mlx5: query hca hairpin capabilities
  2019-10-27 12:24 ` [dpdk-dev] [PATCH v6 00/14] " Ori Kam
  2019-10-27 12:24   ` [dpdk-dev] [PATCH v6 01/14] ethdev: move queue state defines to private file Ori Kam
  2019-10-27 12:24   ` [dpdk-dev] [PATCH v6 02/14] ethdev: add support for hairpin queue Ori Kam
@ 2019-10-27 12:24   ` Ori Kam
  2019-10-27 12:24   ` [dpdk-dev] [PATCH v6 04/14] net/mlx5: support Rx hairpin queues Ori Kam
                     ` (10 subsequent siblings)
  13 siblings, 0 replies; 186+ messages in thread
From: Ori Kam @ 2019-10-27 12:24 UTC (permalink / raw)
  To: Matan Azrad, Shahaf Shuler, Viacheslav Ovsiienko
  Cc: dev, orika, jingjing.wu, stephen

This commit queries and stores the hairpin capabilities of the device.

Those capabilities will be used when creating the hairpin queue.
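
A later patch in this series ("net/mlx5: add get hairpin capabilities")
exposes these attributes through the new ethdev callback, roughly along
these lines (a sketch only; the exact values and checks are illustrative):

	static int
	mlx5_hairpin_cap_get(struct rte_eth_dev *dev,
			     struct rte_eth_hairpin_cap *cap)
	{
		struct mlx5_priv *priv = dev->data->dev_private;

		/* The hairpin bit is queried from the HCA in this commit. */
		if (!priv->config.hca_attr.hairpin) {
			rte_errno = ENOTSUP;
			return -rte_errno;
		}
		cap->max_nb_queues = UINT16_MAX;
		cap->max_rx_2_tx = 1;
		cap->max_tx_2_rx = 1;
		cap->max_nb_desc = 8192;
		return 0;
	}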

Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/net/mlx5/mlx5.h           | 4 ++++
 drivers/net/mlx5/mlx5_devx_cmds.c | 7 +++++++
 2 files changed, 11 insertions(+)

diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index b6a51b2..ee04dd0 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -187,6 +187,10 @@ struct mlx5_hca_attr {
 	uint32_t lro_max_msg_sz_mode:2;
 	uint32_t lro_timer_supported_periods[MLX5_LRO_NUM_SUPP_PERIODS];
 	uint32_t flex_parser_protocols;
+	uint32_t hairpin:1;
+	uint32_t log_max_hairpin_queues:5;
+	uint32_t log_max_hairpin_wq_data_sz:5;
+	uint32_t log_max_hairpin_num_packets:5;
 };
 
 /* Flow list . */
diff --git a/drivers/net/mlx5/mlx5_devx_cmds.c b/drivers/net/mlx5/mlx5_devx_cmds.c
index 51947d3..17c1671 100644
--- a/drivers/net/mlx5/mlx5_devx_cmds.c
+++ b/drivers/net/mlx5/mlx5_devx_cmds.c
@@ -327,6 +327,13 @@ struct mlx5_devx_obj *
 	attr->flow_counters_dump = MLX5_GET(cmd_hca_cap, hcattr,
 					    flow_counters_dump);
 	attr->eswitch_manager = MLX5_GET(cmd_hca_cap, hcattr, eswitch_manager);
+	attr->hairpin = MLX5_GET(cmd_hca_cap, hcattr, hairpin);
+	attr->log_max_hairpin_queues = MLX5_GET(cmd_hca_cap, hcattr,
+						log_max_hairpin_queues);
+	attr->log_max_hairpin_wq_data_sz = MLX5_GET(cmd_hca_cap, hcattr,
+						    log_max_hairpin_wq_data_sz);
+	attr->log_max_hairpin_num_packets = MLX5_GET
+		(cmd_hca_cap, hcattr, log_max_hairpin_num_packets);
 	attr->eth_net_offloads = MLX5_GET(cmd_hca_cap, hcattr,
 					  eth_net_offloads);
 	attr->eth_virt = MLX5_GET(cmd_hca_cap, hcattr, eth_virt);
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 186+ messages in thread

* [dpdk-dev] [PATCH v6 04/14] net/mlx5: support Rx hairpin queues
  2019-10-27 12:24 ` [dpdk-dev] [PATCH v6 00/14] " Ori Kam
                     ` (2 preceding siblings ...)
  2019-10-27 12:24   ` [dpdk-dev] [PATCH v6 03/14] net/mlx5: query hca hairpin capabilities Ori Kam
@ 2019-10-27 12:24   ` Ori Kam
  2019-10-27 12:24   ` [dpdk-dev] [PATCH v6 05/14] net/mlx5: prepare txq to work with different types Ori Kam
                     ` (9 subsequent siblings)
  13 siblings, 0 replies; 186+ messages in thread
From: Ori Kam @ 2019-10-27 12:24 UTC (permalink / raw)
  To: Matan Azrad, Shahaf Shuler, Viacheslav Ovsiienko
  Cc: dev, orika, jingjing.wu, stephen

This commit adds support for creating Rx hairpin queues.
A hairpin queue is a queue that is created using DevX and is used only
by the HW. As a result, the data path part of the RQ is not
used.

Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/net/mlx5/mlx5.c         |   2 +
 drivers/net/mlx5/mlx5_rxq.c     | 270 ++++++++++++++++++++++++++++++++++++----
 drivers/net/mlx5/mlx5_rxtx.h    |  15 +++
 drivers/net/mlx5/mlx5_trigger.c |   7 ++
 4 files changed, 270 insertions(+), 24 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index fac5105..6be423f 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -985,6 +985,7 @@ struct mlx5_dev_spawn_data {
 	.dev_supported_ptypes_get = mlx5_dev_supported_ptypes_get,
 	.vlan_filter_set = mlx5_vlan_filter_set,
 	.rx_queue_setup = mlx5_rx_queue_setup,
+	.rx_hairpin_queue_setup = mlx5_rx_hairpin_queue_setup,
 	.tx_queue_setup = mlx5_tx_queue_setup,
 	.rx_queue_release = mlx5_rx_queue_release,
 	.tx_queue_release = mlx5_tx_queue_release,
@@ -1051,6 +1052,7 @@ struct mlx5_dev_spawn_data {
 	.dev_supported_ptypes_get = mlx5_dev_supported_ptypes_get,
 	.vlan_filter_set = mlx5_vlan_filter_set,
 	.rx_queue_setup = mlx5_rx_queue_setup,
+	.rx_hairpin_queue_setup = mlx5_rx_hairpin_queue_setup,
 	.tx_queue_setup = mlx5_tx_queue_setup,
 	.rx_queue_release = mlx5_rx_queue_release,
 	.tx_queue_release = mlx5_tx_queue_release,
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index f0ab843..c70e161 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -106,21 +106,25 @@
 	struct mlx5_priv *priv = dev->data->dev_private;
 	uint16_t i;
 	uint16_t n = 0;
+	uint16_t n_ibv = 0;
 
 	if (mlx5_check_mprq_support(dev) < 0)
 		return 0;
 	/* All the configured queues should be enabled. */
 	for (i = 0; i < priv->rxqs_n; ++i) {
 		struct mlx5_rxq_data *rxq = (*priv->rxqs)[i];
+		struct mlx5_rxq_ctrl *rxq_ctrl = container_of
+			(rxq, struct mlx5_rxq_ctrl, rxq);
 
-		if (!rxq)
+		if (rxq == NULL || rxq_ctrl->type != MLX5_RXQ_TYPE_STANDARD)
 			continue;
+		n_ibv++;
 		if (mlx5_rxq_mprq_enabled(rxq))
 			++n;
 	}
 	/* Multi-Packet RQ can't be partially configured. */
-	assert(n == 0 || n == priv->rxqs_n);
-	return n == priv->rxqs_n;
+	assert(n == 0 || n == n_ibv);
+	return n == n_ibv;
 }
 
 /**
@@ -427,6 +431,7 @@
 }
 
 /**
+ * Rx queue presetup checks.
  *
  * @param dev
  *   Pointer to Ethernet device structure.
@@ -434,25 +439,14 @@
  *   RX queue index.
  * @param desc
  *   Number of descriptors to configure in queue.
- * @param socket
- *   NUMA socket on which memory must be allocated.
- * @param[in] conf
- *   Thresholds parameters.
- * @param mp
- *   Memory pool for buffer allocations.
  *
  * @return
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
-int
-mlx5_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
-		    unsigned int socket, const struct rte_eth_rxconf *conf,
-		    struct rte_mempool *mp)
+static int
+mlx5_rx_queue_pre_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
-	struct mlx5_rxq_data *rxq = (*priv->rxqs)[idx];
-	struct mlx5_rxq_ctrl *rxq_ctrl =
-		container_of(rxq, struct mlx5_rxq_ctrl, rxq);
 
 	if (!rte_is_power_of_2(desc)) {
 		desc = 1 << log2above(desc);
@@ -476,6 +470,41 @@
 		return -rte_errno;
 	}
 	mlx5_rxq_release(dev, idx);
+	return 0;
+}
+
+/**
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ * @param idx
+ *   RX queue index.
+ * @param desc
+ *   Number of descriptors to configure in queue.
+ * @param socket
+ *   NUMA socket on which memory must be allocated.
+ * @param[in] conf
+ *   Thresholds parameters.
+ * @param mp
+ *   Memory pool for buffer allocations.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
+		    unsigned int socket, const struct rte_eth_rxconf *conf,
+		    struct rte_mempool *mp)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_rxq_data *rxq = (*priv->rxqs)[idx];
+	struct mlx5_rxq_ctrl *rxq_ctrl =
+		container_of(rxq, struct mlx5_rxq_ctrl, rxq);
+	int res;
+
+	res = mlx5_rx_queue_pre_setup(dev, idx, desc);
+	if (res)
+		return res;
 	rxq_ctrl = mlx5_rxq_new(dev, idx, desc, socket, conf, mp);
 	if (!rxq_ctrl) {
 		DRV_LOG(ERR, "port %u unable to allocate queue index %u",
@@ -490,6 +519,56 @@
 }
 
 /**
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ * @param idx
+ *   RX queue index.
+ * @param desc
+ *   Number of descriptors to configure in queue.
+ * @param hairpin_conf
+ *   Hairpin configuration parameters.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_rx_hairpin_queue_setup(struct rte_eth_dev *dev, uint16_t idx,
+			    uint16_t desc,
+			    const struct rte_eth_hairpin_conf *hairpin_conf)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_rxq_data *rxq = (*priv->rxqs)[idx];
+	struct mlx5_rxq_ctrl *rxq_ctrl =
+		container_of(rxq, struct mlx5_rxq_ctrl, rxq);
+	int res;
+
+	res = mlx5_rx_queue_pre_setup(dev, idx, desc);
+	if (res)
+		return res;
+	if (hairpin_conf->peer_count != 1 ||
+	    hairpin_conf->peers[0].port != dev->data->port_id ||
+	    hairpin_conf->peers[0].queue >= priv->txqs_n) {
+		DRV_LOG(ERR, "port %u unable to setup hairpin queue index %u "
+			" invalid hairpin configuration", dev->data->port_id,
+			idx);
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+	rxq_ctrl = mlx5_rxq_hairpin_new(dev, idx, desc, hairpin_conf);
+	if (!rxq_ctrl) {
+		DRV_LOG(ERR, "port %u unable to allocate queue index %u",
+			dev->data->port_id, idx);
+		rte_errno = ENOMEM;
+		return -rte_errno;
+	}
+	DRV_LOG(DEBUG, "port %u adding Rx queue %u to list",
+		dev->data->port_id, idx);
+	(*priv->rxqs)[idx] = &rxq_ctrl->rxq;
+	return 0;
+}
+
+/**
  * DPDK callback to release a RX queue.
  *
  * @param dpdk_rxq
@@ -561,6 +640,24 @@
 }
 
 /**
+ * Release an Rx hairpin related resources.
+ *
+ * @param rxq_obj
+ *   Hairpin Rx queue object.
+ */
+static void
+rxq_obj_hairpin_release(struct mlx5_rxq_obj *rxq_obj)
+{
+	struct mlx5_devx_modify_rq_attr rq_attr = { 0 };
+
+	assert(rxq_obj);
+	rq_attr.state = MLX5_RQC_STATE_RST;
+	rq_attr.rq_state = MLX5_RQC_STATE_RDY;
+	mlx5_devx_cmd_modify_rq(rxq_obj->rq, &rq_attr);
+	claim_zero(mlx5_devx_cmd_destroy(rxq_obj->rq));
+}
+
+/**
  * Release an Rx verbs/DevX queue object.
  *
  * @param rxq_obj
@@ -577,14 +674,22 @@
 		assert(rxq_obj->wq);
 	assert(rxq_obj->cq);
 	if (rte_atomic32_dec_and_test(&rxq_obj->refcnt)) {
-		rxq_free_elts(rxq_obj->rxq_ctrl);
-		if (rxq_obj->type == MLX5_RXQ_OBJ_TYPE_IBV) {
+		switch (rxq_obj->type) {
+		case MLX5_RXQ_OBJ_TYPE_IBV:
+			rxq_free_elts(rxq_obj->rxq_ctrl);
 			claim_zero(mlx5_glue->destroy_wq(rxq_obj->wq));
-		} else if (rxq_obj->type == MLX5_RXQ_OBJ_TYPE_DEVX_RQ) {
+			claim_zero(mlx5_glue->destroy_cq(rxq_obj->cq));
+			break;
+		case MLX5_RXQ_OBJ_TYPE_DEVX_RQ:
+			rxq_free_elts(rxq_obj->rxq_ctrl);
 			claim_zero(mlx5_devx_cmd_destroy(rxq_obj->rq));
 			rxq_release_rq_resources(rxq_obj->rxq_ctrl);
+			claim_zero(mlx5_glue->destroy_cq(rxq_obj->cq));
+			break;
+		case MLX5_RXQ_OBJ_TYPE_DEVX_HAIRPIN:
+			rxq_obj_hairpin_release(rxq_obj);
+			break;
 		}
-		claim_zero(mlx5_glue->destroy_cq(rxq_obj->cq));
 		if (rxq_obj->channel)
 			claim_zero(mlx5_glue->destroy_comp_channel
 				   (rxq_obj->channel));
@@ -1132,6 +1237,70 @@
 }
 
 /**
+ * Create the Rx hairpin queue object.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param idx
+ *   Queue index in DPDK Rx queue array
+ *
+ * @return
+ *   The hairpin DevX object initialised, NULL otherwise and rte_errno is set.
+ */
+static struct mlx5_rxq_obj *
+mlx5_rxq_obj_hairpin_new(struct rte_eth_dev *dev, uint16_t idx)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_rxq_data *rxq_data = (*priv->rxqs)[idx];
+	struct mlx5_rxq_ctrl *rxq_ctrl =
+		container_of(rxq_data, struct mlx5_rxq_ctrl, rxq);
+	struct mlx5_devx_create_rq_attr attr = { 0 };
+	struct mlx5_rxq_obj *tmpl = NULL;
+	int ret = 0;
+
+	assert(rxq_data);
+	assert(!rxq_ctrl->obj);
+	tmpl = rte_calloc_socket(__func__, 1, sizeof(*tmpl), 0,
+				 rxq_ctrl->socket);
+	if (!tmpl) {
+		DRV_LOG(ERR,
+			"port %u Rx queue %u cannot allocate verbs resources",
+			dev->data->port_id, rxq_data->idx);
+		rte_errno = ENOMEM;
+		goto error;
+	}
+	tmpl->type = MLX5_RXQ_OBJ_TYPE_DEVX_HAIRPIN;
+	tmpl->rxq_ctrl = rxq_ctrl;
+	attr.hairpin = 1;
+	/* Workaround for hairpin startup */
+	attr.wq_attr.log_hairpin_num_packets = log2above(32);
+	/* Workaround for packets larger than 1KB */
+	attr.wq_attr.log_hairpin_data_sz =
+			priv->config.hca_attr.log_max_hairpin_wq_data_sz;
+	tmpl->rq = mlx5_devx_cmd_create_rq(priv->sh->ctx, &attr,
+					   rxq_ctrl->socket);
+	if (!tmpl->rq) {
+		DRV_LOG(ERR,
+			"port %u Rx hairpin queue %u can't create rq object",
+			dev->data->port_id, idx);
+		rte_errno = errno;
+		goto error;
+	}
+	DRV_LOG(DEBUG, "port %u rxq %u updated with %p", dev->data->port_id,
+		idx, (void *)&tmpl);
+	rte_atomic32_inc(&tmpl->refcnt);
+	LIST_INSERT_HEAD(&priv->rxqsobj, tmpl, next);
+	priv->verbs_alloc_ctx.type = MLX5_VERBS_ALLOC_TYPE_NONE;
+	return tmpl;
+error:
+	ret = rte_errno; /* Save rte_errno before cleanup. */
+	if (tmpl->rq)
+		mlx5_devx_cmd_destroy(tmpl->rq);
+	rte_errno = ret; /* Restore rte_errno. */
+	return NULL;
+}
+
+/**
  * Create the Rx queue Verbs/DevX object.
  *
  * @param dev
@@ -1163,6 +1332,8 @@ struct mlx5_rxq_obj *
 
 	assert(rxq_data);
 	assert(!rxq_ctrl->obj);
+	if (type == MLX5_RXQ_OBJ_TYPE_DEVX_HAIRPIN)
+		return mlx5_rxq_obj_hairpin_new(dev, idx);
 	priv->verbs_alloc_ctx.type = MLX5_VERBS_ALLOC_TYPE_RX_QUEUE;
 	priv->verbs_alloc_ctx.obj = rxq_ctrl;
 	tmpl = rte_calloc_socket(__func__, 1, sizeof(*tmpl), 0,
@@ -1433,15 +1604,19 @@ struct mlx5_rxq_obj *
 	unsigned int strd_num_n = 0;
 	unsigned int strd_sz_n = 0;
 	unsigned int i;
+	unsigned int n_ibv = 0;
 
 	if (!mlx5_mprq_enabled(dev))
 		return 0;
 	/* Count the total number of descriptors configured. */
 	for (i = 0; i != priv->rxqs_n; ++i) {
 		struct mlx5_rxq_data *rxq = (*priv->rxqs)[i];
+		struct mlx5_rxq_ctrl *rxq_ctrl = container_of
+			(rxq, struct mlx5_rxq_ctrl, rxq);
 
-		if (rxq == NULL)
+		if (rxq == NULL || rxq_ctrl->type != MLX5_RXQ_TYPE_STANDARD)
 			continue;
+		n_ibv++;
 		desc += 1 << rxq->elts_n;
 		/* Get the max number of strides. */
 		if (strd_num_n < rxq->strd_num_n)
@@ -1466,7 +1641,7 @@ struct mlx5_rxq_obj *
 	 * this Mempool gets available again.
 	 */
 	desc *= 4;
-	obj_num = desc + MLX5_MPRQ_MP_CACHE_SZ * priv->rxqs_n;
+	obj_num = desc + MLX5_MPRQ_MP_CACHE_SZ * n_ibv;
 	/*
 	 * rte_mempool_create_empty() has sanity check to refuse large cache
 	 * size compared to the number of elements.
@@ -1514,8 +1689,10 @@ struct mlx5_rxq_obj *
 	/* Set mempool for each Rx queue. */
 	for (i = 0; i != priv->rxqs_n; ++i) {
 		struct mlx5_rxq_data *rxq = (*priv->rxqs)[i];
+		struct mlx5_rxq_ctrl *rxq_ctrl = container_of
+			(rxq, struct mlx5_rxq_ctrl, rxq);
 
-		if (rxq == NULL)
+		if (rxq == NULL || rxq_ctrl->type != MLX5_RXQ_TYPE_STANDARD)
 			continue;
 		rxq->mprq_mp = mp;
 	}
@@ -1620,6 +1797,7 @@ struct mlx5_rxq_ctrl *
 		rte_errno = ENOMEM;
 		return NULL;
 	}
+	tmpl->type = MLX5_RXQ_TYPE_STANDARD;
 	if (mlx5_mr_btree_init(&tmpl->rxq.mr_ctrl.cache_bh,
 			       MLX5_MR_BTREE_CACHE_N, socket)) {
 		/* rte_errno is already set. */
@@ -1788,6 +1966,49 @@ struct mlx5_rxq_ctrl *
 }
 
 /**
+ * Create a DPDK Rx hairpin queue.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param idx
+ *   RX queue index.
+ * @param desc
+ *   Number of descriptors to configure in queue.
+ * @param hairpin_conf
+ *   The hairpin binding configuration.
+ *
+ * @return
+ *   A DPDK queue object on success, NULL otherwise and rte_errno is set.
+ */
+struct mlx5_rxq_ctrl *
+mlx5_rxq_hairpin_new(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
+		     const struct rte_eth_hairpin_conf *hairpin_conf)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_rxq_ctrl *tmpl;
+
+	tmpl = rte_calloc_socket("RXQ", 1, sizeof(*tmpl), 0, SOCKET_ID_ANY);
+	if (!tmpl) {
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+	tmpl->type = MLX5_RXQ_TYPE_HAIRPIN;
+	tmpl->socket = SOCKET_ID_ANY;
+	tmpl->rxq.rss_hash = 0;
+	tmpl->rxq.port_id = dev->data->port_id;
+	tmpl->priv = priv;
+	tmpl->rxq.mp = NULL;
+	tmpl->rxq.elts_n = log2above(desc);
+	tmpl->rxq.elts = NULL;
+	tmpl->rxq.mr_ctrl.cache_bh = (struct mlx5_mr_btree) { 0 };
+	tmpl->hairpin_conf = *hairpin_conf;
+	tmpl->rxq.idx = idx;
+	rte_atomic32_inc(&tmpl->refcnt);
+	LIST_INSERT_HEAD(&priv->rxqsctrl, tmpl, next);
+	return tmpl;
+}
+
+/**
  * Get a Rx queue.
  *
  * @param dev
@@ -1841,7 +2062,8 @@ struct mlx5_rxq_ctrl *
 		if (rxq_ctrl->dbr_umem_id_valid)
 			claim_zero(mlx5_release_dbr(dev, rxq_ctrl->dbr_umem_id,
 						    rxq_ctrl->dbr_offset));
-		mlx5_mr_btree_free(&rxq_ctrl->rxq.mr_ctrl.cache_bh);
+		if (rxq_ctrl->type == MLX5_RXQ_TYPE_STANDARD)
+			mlx5_mr_btree_free(&rxq_ctrl->rxq.mr_ctrl.cache_bh);
 		LIST_REMOVE(rxq_ctrl, next);
 		rte_free(rxq_ctrl);
 		(*priv->rxqs)[idx] = NULL;
diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index 4bb28a4..13fdc38 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -159,6 +159,13 @@ struct mlx5_rxq_data {
 enum mlx5_rxq_obj_type {
 	MLX5_RXQ_OBJ_TYPE_IBV,		/* mlx5_rxq_obj with ibv_wq. */
 	MLX5_RXQ_OBJ_TYPE_DEVX_RQ,	/* mlx5_rxq_obj with mlx5_devx_rq. */
+	MLX5_RXQ_OBJ_TYPE_DEVX_HAIRPIN,
+	/* mlx5_rxq_obj with mlx5_devx_rq and hairpin support. */
+};
+
+enum mlx5_rxq_type {
+	MLX5_RXQ_TYPE_STANDARD, /* Standard Rx queue. */
+	MLX5_RXQ_TYPE_HAIRPIN, /* Hairpin Rx queue. */
 };
 
 /* Verbs/DevX Rx queue elements. */
@@ -183,6 +190,7 @@ struct mlx5_rxq_ctrl {
 	rte_atomic32_t refcnt; /* Reference counter. */
 	struct mlx5_rxq_obj *obj; /* Verbs/DevX elements. */
 	struct mlx5_priv *priv; /* Back pointer to private data. */
+	enum mlx5_rxq_type type; /* Rxq type. */
 	unsigned int socket; /* CPU socket ID for allocations. */
 	unsigned int irq:1; /* Whether IRQ is enabled. */
 	unsigned int dbr_umem_id_valid:1; /* dbr_umem_id holds a valid value. */
@@ -193,6 +201,7 @@ struct mlx5_rxq_ctrl {
 	uint32_t dbr_umem_id; /* Storing door-bell information, */
 	uint64_t dbr_offset;  /* needed when freeing door-bell. */
 	struct mlx5dv_devx_umem *wq_umem; /* WQ buffer registration info. */
+	struct rte_eth_hairpin_conf hairpin_conf; /* Hairpin configuration. */
 };
 
 enum mlx5_ind_tbl_type {
@@ -339,6 +348,9 @@ struct mlx5_txq_ctrl {
 int mlx5_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 			unsigned int socket, const struct rte_eth_rxconf *conf,
 			struct rte_mempool *mp);
+int mlx5_rx_hairpin_queue_setup
+	(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
+	 const struct rte_eth_hairpin_conf *hairpin_conf);
 void mlx5_rx_queue_release(void *dpdk_rxq);
 int mlx5_rx_intr_vec_enable(struct rte_eth_dev *dev);
 void mlx5_rx_intr_vec_disable(struct rte_eth_dev *dev);
@@ -351,6 +363,9 @@ struct mlx5_rxq_ctrl *mlx5_rxq_new(struct rte_eth_dev *dev, uint16_t idx,
 				   uint16_t desc, unsigned int socket,
 				   const struct rte_eth_rxconf *conf,
 				   struct rte_mempool *mp);
+struct mlx5_rxq_ctrl *mlx5_rxq_hairpin_new
+	(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
+	 const struct rte_eth_hairpin_conf *hairpin_conf);
 struct mlx5_rxq_ctrl *mlx5_rxq_get(struct rte_eth_dev *dev, uint16_t idx);
 int mlx5_rxq_release(struct rte_eth_dev *dev, uint16_t idx);
 int mlx5_rxq_verify(struct rte_eth_dev *dev);
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index 122f31c..cb31ae2 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -118,6 +118,13 @@
 
 		if (!rxq_ctrl)
 			continue;
+		if (rxq_ctrl->type == MLX5_RXQ_TYPE_HAIRPIN) {
+			rxq_ctrl->obj = mlx5_rxq_obj_new
+				(dev, i, MLX5_RXQ_OBJ_TYPE_DEVX_HAIRPIN);
+			if (!rxq_ctrl->obj)
+				goto error;
+			continue;
+		}
 		/* Pre-register Rx mempool. */
 		mp = mlx5_rxq_mprq_enabled(&rxq_ctrl->rxq) ?
 		     rxq_ctrl->rxq.mprq_mp : rxq_ctrl->rxq.mp;
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 186+ messages in thread

* [dpdk-dev] [PATCH v6 05/14] net/mlx5: prepare txq to work with different types
  2019-10-27 12:24 ` [dpdk-dev] [PATCH v6 00/14] " Ori Kam
                     ` (3 preceding siblings ...)
  2019-10-27 12:24   ` [dpdk-dev] [PATCH v6 04/14] net/mlx5: support Rx hairpin queues Ori Kam
@ 2019-10-27 12:24   ` Ori Kam
  2019-10-27 12:24   ` [dpdk-dev] [PATCH v6 06/14] net/mlx5: support Tx hairpin queues Ori Kam
                     ` (8 subsequent siblings)
  13 siblings, 0 replies; 186+ messages in thread
From: Ori Kam @ 2019-10-27 12:24 UTC (permalink / raw)
  To: Matan Azrad, Shahaf Shuler, Viacheslav Ovsiienko
  Cc: dev, orika, jingjing.wu, stephen

Currently all Tx queues are created using Verbs.
This commit modifies the naming so it does not refer to Verbs,
since the next commit introduces a new queue type (hairpin).

Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/net/mlx5/mlx5.c         |  2 +-
 drivers/net/mlx5/mlx5.h         |  2 +-
 drivers/net/mlx5/mlx5_rxtx.c    |  2 +-
 drivers/net/mlx5/mlx5_rxtx.h    | 39 +++++++++++++++++------
 drivers/net/mlx5/mlx5_trigger.c |  4 +--
 drivers/net/mlx5/mlx5_txq.c     | 70 ++++++++++++++++++++---------------------
 6 files changed, 69 insertions(+), 50 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 6be423f..8d1595c 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -922,7 +922,7 @@ struct mlx5_dev_spawn_data {
 	if (ret)
 		DRV_LOG(WARNING, "port %u some Rx queues still remain",
 			dev->data->port_id);
-	ret = mlx5_txq_ibv_verify(dev);
+	ret = mlx5_txq_obj_verify(dev);
 	if (ret)
 		DRV_LOG(WARNING, "port %u some Verbs Tx queue still remain",
 			dev->data->port_id);
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index ee04dd0..3afb4cc 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -650,7 +650,7 @@ struct mlx5_priv {
 	LIST_HEAD(rxqobj, mlx5_rxq_obj) rxqsobj; /* Verbs/DevX Rx queues. */
 	LIST_HEAD(hrxq, mlx5_hrxq) hrxqs; /* Verbs Hash Rx queues. */
 	LIST_HEAD(txq, mlx5_txq_ctrl) txqsctrl; /* DPDK Tx queues. */
-	LIST_HEAD(txqibv, mlx5_txq_ibv) txqsibv; /* Verbs Tx queues. */
+	LIST_HEAD(txqobj, mlx5_txq_obj) txqsobj; /* Verbs/DevX Tx queues. */
 	/* Indirection tables. */
 	LIST_HEAD(ind_tables, mlx5_ind_table_obj) ind_tbls;
 	/* Pointer to next element. */
diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index 5ec2b48..f597c89 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -863,7 +863,7 @@ enum mlx5_txcmp_code {
 			.qp_state = IBV_QPS_RESET,
 			.port_num = (uint8_t)priv->ibv_port,
 		};
-		struct ibv_qp *qp = txq_ctrl->ibv->qp;
+		struct ibv_qp *qp = txq_ctrl->obj->qp;
 
 		ret = mlx5_glue->modify_qp(qp, &mod, IBV_QP_STATE);
 		if (ret) {
diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index 13fdc38..12f9bfb 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -308,13 +308,31 @@ struct mlx5_txq_data {
 	/* Storage for queued packets, must be the last field. */
 } __rte_cache_aligned;
 
-/* Verbs Rx queue elements. */
-struct mlx5_txq_ibv {
-	LIST_ENTRY(mlx5_txq_ibv) next; /* Pointer to the next element. */
+enum mlx5_txq_obj_type {
+	MLX5_TXQ_OBJ_TYPE_IBV,		/* mlx5_txq_obj with ibv_wq. */
+	MLX5_TXQ_OBJ_TYPE_DEVX_HAIRPIN,
+	/* mlx5_txq_obj with mlx5_devx_tq and hairpin support. */
+};
+
+enum mlx5_txq_type {
+	MLX5_TXQ_TYPE_STANDARD, /* Standard Tx queue. */
+	MLX5_TXQ_TYPE_HAIRPIN, /* Hairpin Tx queue. */
+};
+
+/* Verbs/DevX Tx queue elements. */
+struct mlx5_txq_obj {
+	LIST_ENTRY(mlx5_txq_obj) next; /* Pointer to the next element. */
 	rte_atomic32_t refcnt; /* Reference counter. */
 	struct mlx5_txq_ctrl *txq_ctrl; /* Pointer to the control queue. */
-	struct ibv_cq *cq; /* Completion Queue. */
-	struct ibv_qp *qp; /* Queue Pair. */
+	enum mlx5_txq_obj_type type; /* The txq object type. */
+	RTE_STD_C11
+	union {
+		struct {
+			struct ibv_cq *cq; /* Completion Queue. */
+			struct ibv_qp *qp; /* Queue Pair. */
+		};
+		struct mlx5_devx_obj *sq; /* DevX object for Tx queue. */
+	};
 };
 
 /* TX queue control descriptor. */
@@ -322,9 +340,10 @@ struct mlx5_txq_ctrl {
 	LIST_ENTRY(mlx5_txq_ctrl) next; /* Pointer to the next element. */
 	rte_atomic32_t refcnt; /* Reference counter. */
 	unsigned int socket; /* CPU socket ID for allocations. */
+	enum mlx5_txq_type type; /* The txq ctrl type. */
 	unsigned int max_inline_data; /* Max inline data. */
 	unsigned int max_tso_header; /* Max TSO header size. */
-	struct mlx5_txq_ibv *ibv; /* Verbs queue object. */
+	struct mlx5_txq_obj *obj; /* Verbs/DevX queue object. */
 	struct mlx5_priv *priv; /* Back pointer to private data. */
 	off_t uar_mmap_offset; /* UAR mmap offset for non-primary process. */
 	void *bf_reg; /* BlueFlame register from Verbs. */
@@ -393,10 +412,10 @@ int mlx5_tx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 			unsigned int socket, const struct rte_eth_txconf *conf);
 void mlx5_tx_queue_release(void *dpdk_txq);
 int mlx5_tx_uar_init_secondary(struct rte_eth_dev *dev, int fd);
-struct mlx5_txq_ibv *mlx5_txq_ibv_new(struct rte_eth_dev *dev, uint16_t idx);
-struct mlx5_txq_ibv *mlx5_txq_ibv_get(struct rte_eth_dev *dev, uint16_t idx);
-int mlx5_txq_ibv_release(struct mlx5_txq_ibv *txq_ibv);
-int mlx5_txq_ibv_verify(struct rte_eth_dev *dev);
+struct mlx5_txq_obj *mlx5_txq_obj_new(struct rte_eth_dev *dev, uint16_t idx);
+struct mlx5_txq_obj *mlx5_txq_obj_get(struct rte_eth_dev *dev, uint16_t idx);
+int mlx5_txq_obj_release(struct mlx5_txq_obj *txq_obj);
+int mlx5_txq_obj_verify(struct rte_eth_dev *dev);
 struct mlx5_txq_ctrl *mlx5_txq_new(struct rte_eth_dev *dev, uint16_t idx,
 				   uint16_t desc, unsigned int socket,
 				   const struct rte_eth_txconf *conf);
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index cb31ae2..50c4df5 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -52,8 +52,8 @@
 		if (!txq_ctrl)
 			continue;
 		txq_alloc_elts(txq_ctrl);
-		txq_ctrl->ibv = mlx5_txq_ibv_new(dev, i);
-		if (!txq_ctrl->ibv) {
+		txq_ctrl->obj = mlx5_txq_obj_new(dev, i);
+		if (!txq_ctrl->obj) {
 			rte_errno = ENOMEM;
 			goto error;
 		}
diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
index 53d45e7..a6e2563 100644
--- a/drivers/net/mlx5/mlx5_txq.c
+++ b/drivers/net/mlx5/mlx5_txq.c
@@ -375,15 +375,15 @@
  * @return
  *   The Verbs object initialised, NULL otherwise and rte_errno is set.
  */
-struct mlx5_txq_ibv *
-mlx5_txq_ibv_new(struct rte_eth_dev *dev, uint16_t idx)
+struct mlx5_txq_obj *
+mlx5_txq_obj_new(struct rte_eth_dev *dev, uint16_t idx)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_txq_data *txq_data = (*priv->txqs)[idx];
 	struct mlx5_txq_ctrl *txq_ctrl =
 		container_of(txq_data, struct mlx5_txq_ctrl, txq);
-	struct mlx5_txq_ibv tmpl;
-	struct mlx5_txq_ibv *txq_ibv = NULL;
+	struct mlx5_txq_obj tmpl;
+	struct mlx5_txq_obj *txq_obj = NULL;
 	union {
 		struct ibv_qp_init_attr_ex init;
 		struct ibv_cq_init_attr_ex cq;
@@ -411,7 +411,7 @@ struct mlx5_txq_ibv *
 		rte_errno = EINVAL;
 		return NULL;
 	}
-	memset(&tmpl, 0, sizeof(struct mlx5_txq_ibv));
+	memset(&tmpl, 0, sizeof(struct mlx5_txq_obj));
 	attr.cq = (struct ibv_cq_init_attr_ex){
 		.comp_mask = 0,
 	};
@@ -502,9 +502,9 @@ struct mlx5_txq_ibv *
 		rte_errno = errno;
 		goto error;
 	}
-	txq_ibv = rte_calloc_socket(__func__, 1, sizeof(struct mlx5_txq_ibv), 0,
+	txq_obj = rte_calloc_socket(__func__, 1, sizeof(struct mlx5_txq_obj), 0,
 				    txq_ctrl->socket);
-	if (!txq_ibv) {
+	if (!txq_obj) {
 		DRV_LOG(ERR, "port %u Tx queue %u cannot allocate memory",
 			dev->data->port_id, idx);
 		rte_errno = ENOMEM;
@@ -568,9 +568,9 @@ struct mlx5_txq_ibv *
 		}
 	}
 #endif
-	txq_ibv->qp = tmpl.qp;
-	txq_ibv->cq = tmpl.cq;
-	rte_atomic32_inc(&txq_ibv->refcnt);
+	txq_obj->qp = tmpl.qp;
+	txq_obj->cq = tmpl.cq;
+	rte_atomic32_inc(&txq_obj->refcnt);
 	txq_ctrl->bf_reg = qp.bf.reg;
 	if (qp.comp_mask & MLX5DV_QP_MASK_UAR_MMAP_OFFSET) {
 		txq_ctrl->uar_mmap_offset = qp.uar_mmap_offset;
@@ -585,18 +585,18 @@ struct mlx5_txq_ibv *
 		goto error;
 	}
 	txq_uar_init(txq_ctrl);
-	LIST_INSERT_HEAD(&priv->txqsibv, txq_ibv, next);
-	txq_ibv->txq_ctrl = txq_ctrl;
+	LIST_INSERT_HEAD(&priv->txqsobj, txq_obj, next);
+	txq_obj->txq_ctrl = txq_ctrl;
 	priv->verbs_alloc_ctx.type = MLX5_VERBS_ALLOC_TYPE_NONE;
-	return txq_ibv;
+	return txq_obj;
 error:
 	ret = rte_errno; /* Save rte_errno before cleanup. */
 	if (tmpl.cq)
 		claim_zero(mlx5_glue->destroy_cq(tmpl.cq));
 	if (tmpl.qp)
 		claim_zero(mlx5_glue->destroy_qp(tmpl.qp));
-	if (txq_ibv)
-		rte_free(txq_ibv);
+	if (txq_obj)
+		rte_free(txq_obj);
 	priv->verbs_alloc_ctx.type = MLX5_VERBS_ALLOC_TYPE_NONE;
 	rte_errno = ret; /* Restore rte_errno. */
 	return NULL;
@@ -613,8 +613,8 @@ struct mlx5_txq_ibv *
  * @return
  *   The Verbs object if it exists.
  */
-struct mlx5_txq_ibv *
-mlx5_txq_ibv_get(struct rte_eth_dev *dev, uint16_t idx)
+struct mlx5_txq_obj *
+mlx5_txq_obj_get(struct rte_eth_dev *dev, uint16_t idx)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_txq_ctrl *txq_ctrl;
@@ -624,29 +624,29 @@ struct mlx5_txq_ibv *
 	if (!(*priv->txqs)[idx])
 		return NULL;
 	txq_ctrl = container_of((*priv->txqs)[idx], struct mlx5_txq_ctrl, txq);
-	if (txq_ctrl->ibv)
-		rte_atomic32_inc(&txq_ctrl->ibv->refcnt);
-	return txq_ctrl->ibv;
+	if (txq_ctrl->obj)
+		rte_atomic32_inc(&txq_ctrl->obj->refcnt);
+	return txq_ctrl->obj;
 }
 
 /**
  * Release a Tx Verbs queue object.
  *
- * @param txq_ibv
+ * @param txq_obj
  *   Verbs Tx queue object.
  *
  * @return
  *   1 while a reference on it exists, 0 when freed.
  */
 int
-mlx5_txq_ibv_release(struct mlx5_txq_ibv *txq_ibv)
+mlx5_txq_obj_release(struct mlx5_txq_obj *txq_obj)
 {
-	assert(txq_ibv);
-	if (rte_atomic32_dec_and_test(&txq_ibv->refcnt)) {
-		claim_zero(mlx5_glue->destroy_qp(txq_ibv->qp));
-		claim_zero(mlx5_glue->destroy_cq(txq_ibv->cq));
-		LIST_REMOVE(txq_ibv, next);
-		rte_free(txq_ibv);
+	assert(txq_obj);
+	if (rte_atomic32_dec_and_test(&txq_obj->refcnt)) {
+		claim_zero(mlx5_glue->destroy_qp(txq_obj->qp));
+		claim_zero(mlx5_glue->destroy_cq(txq_obj->cq));
+		LIST_REMOVE(txq_obj, next);
+		rte_free(txq_obj);
 		return 0;
 	}
 	return 1;
@@ -662,15 +662,15 @@ struct mlx5_txq_ibv *
  *   The number of objects not released.
  */
 int
-mlx5_txq_ibv_verify(struct rte_eth_dev *dev)
+mlx5_txq_obj_verify(struct rte_eth_dev *dev)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	int ret = 0;
-	struct mlx5_txq_ibv *txq_ibv;
+	struct mlx5_txq_obj *txq_obj;
 
-	LIST_FOREACH(txq_ibv, &priv->txqsibv, next) {
+	LIST_FOREACH(txq_obj, &priv->txqsobj, next) {
 		DRV_LOG(DEBUG, "port %u Verbs Tx queue %u still referenced",
-			dev->data->port_id, txq_ibv->txq_ctrl->txq.idx);
+			dev->data->port_id, txq_obj->txq_ctrl->txq.idx);
 		++ret;
 	}
 	return ret;
@@ -1127,7 +1127,7 @@ struct mlx5_txq_ctrl *
 	if ((*priv->txqs)[idx]) {
 		ctrl = container_of((*priv->txqs)[idx], struct mlx5_txq_ctrl,
 				    txq);
-		mlx5_txq_ibv_get(dev, idx);
+		mlx5_txq_obj_get(dev, idx);
 		rte_atomic32_inc(&ctrl->refcnt);
 	}
 	return ctrl;
@@ -1153,8 +1153,8 @@ struct mlx5_txq_ctrl *
 	if (!(*priv->txqs)[idx])
 		return 0;
 	txq = container_of((*priv->txqs)[idx], struct mlx5_txq_ctrl, txq);
-	if (txq->ibv && !mlx5_txq_ibv_release(txq->ibv))
-		txq->ibv = NULL;
+	if (txq->obj && !mlx5_txq_obj_release(txq->obj))
+		txq->obj = NULL;
 	if (rte_atomic32_dec_and_test(&txq->refcnt)) {
 		txq_free_elts(txq);
 		mlx5_mr_btree_free(&txq->txq.mr_ctrl.cache_bh);
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 186+ messages in thread

* [dpdk-dev] [PATCH v6 06/14] net/mlx5: support Tx hairpin queues
  2019-10-27 12:24 ` [dpdk-dev] [PATCH v6 00/14] " Ori Kam
                     ` (4 preceding siblings ...)
  2019-10-27 12:24   ` [dpdk-dev] [PATCH v6 05/14] net/mlx5: prepare txq to work with different types Ori Kam
@ 2019-10-27 12:24   ` Ori Kam
  2019-10-27 12:24   ` [dpdk-dev] [PATCH v6 07/14] net/mlx5: add get hairpin capabilities Ori Kam
                     ` (7 subsequent siblings)
  13 siblings, 0 replies; 186+ messages in thread
From: Ori Kam @ 2019-10-27 12:24 UTC (permalink / raw)
  To: Matan Azrad, Shahaf Shuler, Viacheslav Ovsiienko
  Cc: dev, orika, jingjing.wu, stephen

This commit adds support for creating Tx hairpin queues.
A hairpin queue is a queue that is created using DevX and is used
only by the hardware.
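
For context, here is a minimal sketch (not part of the patch) of how the
DevX helpers introduced below fit together when a hairpin Tx queue object
is built; it is condensed from mlx5_txq_obj_hairpin_new(), with error
handling and list bookkeeping omitted, and the attribute values are
illustrative only:

	struct mlx5_devx_create_sq_attr attr = { 0 };
	struct mlx5_devx_obj *sq;

	attr.hairpin = 1;                 /* Mark the SQ as hairpin. */
	attr.tis_lst_sz = 1;              /* A single TIS in the list. */
	attr.tis_num = priv->sh->tis->id; /* TIS created at device spawn. */
	sq = mlx5_devx_cmd_create_sq(priv->sh->ctx, &attr);
	if (!sq)
		return NULL; /* rte_errno is set by the helper. */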

Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/net/mlx5/mlx5.c           |  36 +++++-
 drivers/net/mlx5/mlx5.h           |  46 ++++++++
 drivers/net/mlx5/mlx5_devx_cmds.c | 186 ++++++++++++++++++++++++++++++
 drivers/net/mlx5/mlx5_prm.h       | 118 +++++++++++++++++++
 drivers/net/mlx5/mlx5_rxtx.h      |  18 ++-
 drivers/net/mlx5/mlx5_trigger.c   |  10 +-
 drivers/net/mlx5/mlx5_txq.c       | 230 +++++++++++++++++++++++++++++++++++---
 7 files changed, 620 insertions(+), 24 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 8d1595c..49b1e82 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -324,6 +324,9 @@ struct mlx5_dev_spawn_data {
 	struct mlx5_ibv_shared *sh;
 	int err = 0;
 	uint32_t i;
+#ifdef HAVE_IBV_FLOW_DV_SUPPORT
+	struct mlx5_devx_tis_attr tis_attr = { 0 };
+#endif
 
 	assert(spawn);
 	/* Secondary process should not create the shared context. */
@@ -390,10 +393,25 @@ struct mlx5_dev_spawn_data {
 		goto error;
 	}
 #ifdef HAVE_IBV_FLOW_DV_SUPPORT
-	err = mlx5_get_pdn(sh->pd, &sh->pdn);
-	if (err) {
-		DRV_LOG(ERR, "Fail to extract pdn from PD");
-		goto error;
+	if (sh->devx) {
+		err = mlx5_get_pdn(sh->pd, &sh->pdn);
+		if (err) {
+			DRV_LOG(ERR, "Fail to extract pdn from PD");
+			goto error;
+		}
+		sh->td = mlx5_devx_cmd_create_td(sh->ctx);
+		if (!sh->td) {
+			DRV_LOG(ERR, "TD allocation failure");
+			err = ENOMEM;
+			goto error;
+		}
+		tis_attr.transport_domain = sh->td->id;
+		sh->tis = mlx5_devx_cmd_create_tis(sh->ctx, &tis_attr);
+		if (!sh->tis) {
+			DRV_LOG(ERR, "TIS allocation failure");
+			err = ENOMEM;
+			goto error;
+		}
 	}
 #endif /* HAVE_IBV_FLOW_DV_SUPPORT */
 	/*
@@ -426,6 +444,10 @@ struct mlx5_dev_spawn_data {
 error:
 	pthread_mutex_unlock(&mlx5_ibv_list_mutex);
 	assert(sh);
+	if (sh->tis)
+		claim_zero(mlx5_devx_cmd_destroy(sh->tis));
+	if (sh->td)
+		claim_zero(mlx5_devx_cmd_destroy(sh->td));
 	if (sh->pd)
 		claim_zero(mlx5_glue->dealloc_pd(sh->pd));
 	if (sh->ctx)
@@ -495,6 +517,10 @@ struct mlx5_dev_spawn_data {
 	pthread_mutex_destroy(&sh->intr_mutex);
 	if (sh->pd)
 		claim_zero(mlx5_glue->dealloc_pd(sh->pd));
+	if (sh->tis)
+		claim_zero(mlx5_devx_cmd_destroy(sh->tis));
+	if (sh->td)
+		claim_zero(mlx5_devx_cmd_destroy(sh->td));
 	if (sh->ctx)
 		claim_zero(mlx5_glue->close_device(sh->ctx));
 	rte_free(sh);
@@ -987,6 +1013,7 @@ struct mlx5_dev_spawn_data {
 	.rx_queue_setup = mlx5_rx_queue_setup,
 	.rx_hairpin_queue_setup = mlx5_rx_hairpin_queue_setup,
 	.tx_queue_setup = mlx5_tx_queue_setup,
+	.tx_hairpin_queue_setup = mlx5_tx_hairpin_queue_setup,
 	.rx_queue_release = mlx5_rx_queue_release,
 	.tx_queue_release = mlx5_tx_queue_release,
 	.flow_ctrl_get = mlx5_dev_get_flow_ctrl,
@@ -1054,6 +1081,7 @@ struct mlx5_dev_spawn_data {
 	.rx_queue_setup = mlx5_rx_queue_setup,
 	.rx_hairpin_queue_setup = mlx5_rx_hairpin_queue_setup,
 	.tx_queue_setup = mlx5_tx_queue_setup,
+	.tx_hairpin_queue_setup = mlx5_tx_hairpin_queue_setup,
 	.rx_queue_release = mlx5_rx_queue_release,
 	.tx_queue_release = mlx5_tx_queue_release,
 	.flow_ctrl_get = mlx5_dev_get_flow_ctrl,
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 3afb4cc..566bf2d 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -353,6 +353,43 @@ struct mlx5_devx_rqt_attr {
 	uint32_t rq_list[];
 };
 
+/* TIS attributes structure. */
+struct mlx5_devx_tis_attr {
+	uint32_t strict_lag_tx_port_affinity:1;
+	uint32_t tls_en:1;
+	uint32_t lag_tx_port_affinity:4;
+	uint32_t prio:4;
+	uint32_t transport_domain:24;
+};
+
+/* SQ attributes structure, used by SQ create operation. */
+struct mlx5_devx_create_sq_attr {
+	uint32_t rlky:1;
+	uint32_t cd_master:1;
+	uint32_t fre:1;
+	uint32_t flush_in_error_en:1;
+	uint32_t allow_multi_pkt_send_wqe:1;
+	uint32_t min_wqe_inline_mode:3;
+	uint32_t state:4;
+	uint32_t reg_umr:1;
+	uint32_t allow_swp:1;
+	uint32_t hairpin:1;
+	uint32_t user_index:24;
+	uint32_t cqn:24;
+	uint32_t packet_pacing_rate_limit_index:16;
+	uint32_t tis_lst_sz:16;
+	uint32_t tis_num:24;
+	struct mlx5_devx_wq_attr wq_attr;
+};
+
+/* SQ attributes structure, used by SQ modify operation. */
+struct mlx5_devx_modify_sq_attr {
+	uint32_t sq_state:4;
+	uint32_t state:4;
+	uint32_t hairpin_peer_rq:24;
+	uint32_t hairpin_peer_vhca:16;
+};
+
 /**
  * Type of object being allocated.
  */
@@ -596,6 +633,8 @@ struct mlx5_ibv_shared {
 	uint32_t devx_intr_cnt; /* Devx interrupt handler reference counter. */
 	struct rte_intr_handle intr_handle_devx; /* DEVX interrupt handler. */
 	struct mlx5dv_devx_cmd_comp *devx_comp; /* DEVX async comp obj. */
+	struct mlx5_devx_obj *tis; /* TIS object. */
+	struct mlx5_devx_obj *td; /* Transport domain. */
 	struct mlx5_ibv_shared_port port[]; /* per device port data array. */
 };
 
@@ -918,5 +957,12 @@ struct mlx5_devx_obj *mlx5_devx_cmd_create_tir(struct ibv_context *ctx,
 					struct mlx5_devx_tir_attr *tir_attr);
 struct mlx5_devx_obj *mlx5_devx_cmd_create_rqt(struct ibv_context *ctx,
 					struct mlx5_devx_rqt_attr *rqt_attr);
+struct mlx5_devx_obj *mlx5_devx_cmd_create_sq
+	(struct ibv_context *ctx, struct mlx5_devx_create_sq_attr *sq_attr);
+int mlx5_devx_cmd_modify_sq
+	(struct mlx5_devx_obj *sq, struct mlx5_devx_modify_sq_attr *sq_attr);
+struct mlx5_devx_obj *mlx5_devx_cmd_create_tis
+	(struct ibv_context *ctx, struct mlx5_devx_tis_attr *tis_attr);
+struct mlx5_devx_obj *mlx5_devx_cmd_create_td(struct ibv_context *ctx);
 
 #endif /* RTE_PMD_MLX5_H_ */
diff --git a/drivers/net/mlx5/mlx5_devx_cmds.c b/drivers/net/mlx5/mlx5_devx_cmds.c
index 17c1671..a501f1f 100644
--- a/drivers/net/mlx5/mlx5_devx_cmds.c
+++ b/drivers/net/mlx5/mlx5_devx_cmds.c
@@ -717,3 +717,189 @@ struct mlx5_devx_obj *
 	rqt->id = MLX5_GET(create_rqt_out, out, rqtn);
 	return rqt;
 }
+
+/**
+ * Create SQ using DevX API.
+ *
+ * @param[in] ctx
+ *   ibv_context returned from mlx5dv_open_device.
+ * @param [in] sq_attr
+ *   Pointer to SQ attributes structure.
+ * @param [in] socket
+ *   CPU socket ID for allocations.
+ *
+ * @return
+ *   The DevX object created, NULL otherwise and rte_errno is set.
+ **/
+struct mlx5_devx_obj *
+mlx5_devx_cmd_create_sq(struct ibv_context *ctx,
+			struct mlx5_devx_create_sq_attr *sq_attr)
+{
+	uint32_t in[MLX5_ST_SZ_DW(create_sq_in)] = {0};
+	uint32_t out[MLX5_ST_SZ_DW(create_sq_out)] = {0};
+	void *sq_ctx;
+	void *wq_ctx;
+	struct mlx5_devx_wq_attr *wq_attr;
+	struct mlx5_devx_obj *sq = NULL;
+
+	sq = rte_calloc(__func__, 1, sizeof(*sq), 0);
+	if (!sq) {
+		DRV_LOG(ERR, "Failed to allocate SQ data");
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+	MLX5_SET(create_sq_in, in, opcode, MLX5_CMD_OP_CREATE_SQ);
+	sq_ctx = MLX5_ADDR_OF(create_sq_in, in, ctx);
+	MLX5_SET(sqc, sq_ctx, rlky, sq_attr->rlky);
+	MLX5_SET(sqc, sq_ctx, cd_master, sq_attr->cd_master);
+	MLX5_SET(sqc, sq_ctx, fre, sq_attr->fre);
+	MLX5_SET(sqc, sq_ctx, flush_in_error_en, sq_attr->flush_in_error_en);
+	MLX5_SET(sqc, sq_ctx, allow_multi_pkt_send_wqe,
+		 sq_attr->allow_multi_pkt_send_wqe);
+	MLX5_SET(sqc, sq_ctx, min_wqe_inline_mode,
+		 sq_attr->min_wqe_inline_mode);
+	MLX5_SET(sqc, sq_ctx, state, sq_attr->state);
+	MLX5_SET(sqc, sq_ctx, reg_umr, sq_attr->reg_umr);
+	MLX5_SET(sqc, sq_ctx, allow_swp, sq_attr->allow_swp);
+	MLX5_SET(sqc, sq_ctx, hairpin, sq_attr->hairpin);
+	MLX5_SET(sqc, sq_ctx, user_index, sq_attr->user_index);
+	MLX5_SET(sqc, sq_ctx, cqn, sq_attr->cqn);
+	MLX5_SET(sqc, sq_ctx, packet_pacing_rate_limit_index,
+		 sq_attr->packet_pacing_rate_limit_index);
+	MLX5_SET(sqc, sq_ctx, tis_lst_sz, sq_attr->tis_lst_sz);
+	MLX5_SET(sqc, sq_ctx, tis_num_0, sq_attr->tis_num);
+	wq_ctx = MLX5_ADDR_OF(sqc, sq_ctx, wq);
+	wq_attr = &sq_attr->wq_attr;
+	devx_cmd_fill_wq_data(wq_ctx, wq_attr);
+	sq->obj = mlx5_glue->devx_obj_create(ctx, in, sizeof(in),
+					     out, sizeof(out));
+	if (!sq->obj) {
+		DRV_LOG(ERR, "Failed to create SQ using DevX");
+		rte_errno = errno;
+		rte_free(sq);
+		return NULL;
+	}
+	sq->id = MLX5_GET(create_sq_out, out, sqn);
+	return sq;
+}
+
+/**
+ * Modify SQ using DevX API.
+ *
+ * @param[in] sq
+ *   Pointer to SQ object structure.
+ * @param [in] sq_attr
+ *   Pointer to SQ attributes structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_devx_cmd_modify_sq(struct mlx5_devx_obj *sq,
+			struct mlx5_devx_modify_sq_attr *sq_attr)
+{
+	uint32_t in[MLX5_ST_SZ_DW(modify_sq_in)] = {0};
+	uint32_t out[MLX5_ST_SZ_DW(modify_sq_out)] = {0};
+	void *sq_ctx;
+	int ret;
+
+	MLX5_SET(modify_sq_in, in, opcode, MLX5_CMD_OP_MODIFY_SQ);
+	MLX5_SET(modify_sq_in, in, sq_state, sq_attr->sq_state);
+	MLX5_SET(modify_sq_in, in, sqn, sq->id);
+	sq_ctx = MLX5_ADDR_OF(modify_sq_in, in, ctx);
+	MLX5_SET(sqc, sq_ctx, state, sq_attr->state);
+	MLX5_SET(sqc, sq_ctx, hairpin_peer_rq, sq_attr->hairpin_peer_rq);
+	MLX5_SET(sqc, sq_ctx, hairpin_peer_vhca, sq_attr->hairpin_peer_vhca);
+	ret = mlx5_glue->devx_obj_modify(sq->obj, in, sizeof(in),
+					 out, sizeof(out));
+	if (ret) {
+		DRV_LOG(ERR, "Failed to modify SQ using DevX");
+		rte_errno = errno;
+		return -errno;
+	}
+	return ret;
+}
+
+/**
+ * Create TIS using DevX API.
+ *
+ * @param[in] ctx
+ *   ibv_context returned from mlx5dv_open_device.
+ * @param [in] tis_attr
+ *   Pointer to TIS attributes structure.
+ *
+ * @return
+ *   The DevX object created, NULL otherwise and rte_errno is set.
+ */
+struct mlx5_devx_obj *
+mlx5_devx_cmd_create_tis(struct ibv_context *ctx,
+			 struct mlx5_devx_tis_attr *tis_attr)
+{
+	uint32_t in[MLX5_ST_SZ_DW(create_tis_in)] = {0};
+	uint32_t out[MLX5_ST_SZ_DW(create_tis_out)] = {0};
+	struct mlx5_devx_obj *tis = NULL;
+	void *tis_ctx;
+
+	tis = rte_calloc(__func__, 1, sizeof(*tis), 0);
+	if (!tis) {
+		DRV_LOG(ERR, "Failed to allocate TIS object");
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+	MLX5_SET(create_tis_in, in, opcode, MLX5_CMD_OP_CREATE_TIS);
+	tis_ctx = MLX5_ADDR_OF(create_tis_in, in, ctx);
+	MLX5_SET(tisc, tis_ctx, strict_lag_tx_port_affinity,
+		 tis_attr->strict_lag_tx_port_affinity);
+	MLX5_SET(tisc, tis_ctx, lag_tx_port_affinity,
+		 tis_attr->lag_tx_port_affinity);
+	MLX5_SET(tisc, tis_ctx, prio, tis_attr->prio);
+	MLX5_SET(tisc, tis_ctx, transport_domain,
+		 tis_attr->transport_domain);
+	tis->obj = mlx5_glue->devx_obj_create(ctx, in, sizeof(in),
+					      out, sizeof(out));
+	if (!tis->obj) {
+		DRV_LOG(ERR, "Failed to create TIS using DevX");
+		rte_errno = errno;
+		rte_free(tis);
+		return NULL;
+	}
+	tis->id = MLX5_GET(create_tis_out, out, tisn);
+	return tis;
+}
+
+/**
+ * Create transport domain using DevX API.
+ *
+ * @param[in] ctx
+ *   ibv_context returned from mlx5dv_open_device.
+ *
+ * @return
+ *   The DevX object created, NULL otherwise and rte_errno is set.
+ */
+struct mlx5_devx_obj *
+mlx5_devx_cmd_create_td(struct ibv_context *ctx)
+{
+	uint32_t in[MLX5_ST_SZ_DW(alloc_transport_domain_in)] = {0};
+	uint32_t out[MLX5_ST_SZ_DW(alloc_transport_domain_out)] = {0};
+	struct mlx5_devx_obj *td = NULL;
+
+	td = rte_calloc(__func__, 1, sizeof(*td), 0);
+	if (!td) {
+		DRV_LOG(ERR, "Failed to allocate TD object");
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+	MLX5_SET(alloc_transport_domain_in, in, opcode,
+		 MLX5_CMD_OP_ALLOC_TRANSPORT_DOMAIN);
+	td->obj = mlx5_glue->devx_obj_create(ctx, in, sizeof(in),
+					     out, sizeof(out));
+	if (!td->obj) {
+		DRV_LOG(ERR, "Failed to create TIS using DevX");
+		rte_errno = errno;
+		rte_free(td);
+		return NULL;
+	}
+	td->id = MLX5_GET(alloc_transport_domain_out, out,
+			   transport_domain);
+	return td;
+}
diff --git a/drivers/net/mlx5/mlx5_prm.h b/drivers/net/mlx5/mlx5_prm.h
index c86f8b8..c687cfb 100644
--- a/drivers/net/mlx5/mlx5_prm.h
+++ b/drivers/net/mlx5/mlx5_prm.h
@@ -671,9 +671,13 @@ enum {
 	MLX5_CMD_OP_QUERY_HCA_CAP = 0x100,
 	MLX5_CMD_OP_CREATE_MKEY = 0x200,
 	MLX5_CMD_OP_QUERY_NIC_VPORT_CONTEXT = 0x754,
+	MLX5_CMD_OP_ALLOC_TRANSPORT_DOMAIN = 0x816,
 	MLX5_CMD_OP_CREATE_TIR = 0x900,
+	MLX5_CMD_OP_CREATE_SQ = 0X904,
+	MLX5_CMD_OP_MODIFY_SQ = 0X905,
 	MLX5_CMD_OP_CREATE_RQ = 0x908,
 	MLX5_CMD_OP_MODIFY_RQ = 0x909,
+	MLX5_CMD_OP_CREATE_TIS = 0x912,
 	MLX5_CMD_OP_QUERY_TIS = 0x915,
 	MLX5_CMD_OP_CREATE_RQT = 0x916,
 	MLX5_CMD_OP_ALLOC_FLOW_COUNTER = 0x939,
@@ -1328,6 +1332,23 @@ struct mlx5_ifc_query_tis_in_bits {
 	u8 reserved_at_60[0x20];
 };
 
+struct mlx5_ifc_alloc_transport_domain_out_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 reserved_at_40[0x8];
+	u8 transport_domain[0x18];
+	u8 reserved_at_60[0x20];
+};
+
+struct mlx5_ifc_alloc_transport_domain_in_bits {
+	u8 opcode[0x10];
+	u8 reserved_at_10[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 reserved_at_40[0x40];
+};
+
 enum {
 	MLX5_WQ_TYPE_LINKED_LIST                = 0x0,
 	MLX5_WQ_TYPE_CYCLIC                     = 0x1,
@@ -1444,6 +1465,24 @@ struct mlx5_ifc_modify_rq_out_bits {
 	u8 reserved_at_40[0x40];
 };
 
+struct mlx5_ifc_create_tis_out_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 reserved_at_40[0x8];
+	u8 tisn[0x18];
+	u8 reserved_at_60[0x20];
+};
+
+struct mlx5_ifc_create_tis_in_bits {
+	u8 opcode[0x10];
+	u8 uid[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 reserved_at_40[0xc0];
+	struct mlx5_ifc_tisc_bits ctx;
+};
+
 enum {
 	MLX5_MODIFY_RQ_IN_MODIFY_BITMASK_WQ_LWM = 1ULL << 0,
 	MLX5_MODIFY_RQ_IN_MODIFY_BITMASK_VSD = 1ULL << 1,
@@ -1589,6 +1628,85 @@ struct mlx5_ifc_create_rqt_in_bits {
 #pragma GCC diagnostic error "-Wpedantic"
 #endif
 
+struct mlx5_ifc_sqc_bits {
+	u8 rlky[0x1];
+	u8 cd_master[0x1];
+	u8 fre[0x1];
+	u8 flush_in_error_en[0x1];
+	u8 allow_multi_pkt_send_wqe[0x1];
+	u8 min_wqe_inline_mode[0x3];
+	u8 state[0x4];
+	u8 reg_umr[0x1];
+	u8 allow_swp[0x1];
+	u8 hairpin[0x1];
+	u8 reserved_at_f[0x11];
+	u8 reserved_at_20[0x8];
+	u8 user_index[0x18];
+	u8 reserved_at_40[0x8];
+	u8 cqn[0x18];
+	u8 reserved_at_60[0x8];
+	u8 hairpin_peer_rq[0x18];
+	u8 reserved_at_80[0x10];
+	u8 hairpin_peer_vhca[0x10];
+	u8 reserved_at_a0[0x50];
+	u8 packet_pacing_rate_limit_index[0x10];
+	u8 tis_lst_sz[0x10];
+	u8 reserved_at_110[0x10];
+	u8 reserved_at_120[0x40];
+	u8 reserved_at_160[0x8];
+	u8 tis_num_0[0x18];
+	struct mlx5_ifc_wq_bits wq;
+};
+
+struct mlx5_ifc_query_sq_in_bits {
+	u8 opcode[0x10];
+	u8 reserved_at_10[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 reserved_at_40[0x8];
+	u8 sqn[0x18];
+	u8 reserved_at_60[0x20];
+};
+
+struct mlx5_ifc_modify_sq_out_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 reserved_at_40[0x40];
+};
+
+struct mlx5_ifc_modify_sq_in_bits {
+	u8 opcode[0x10];
+	u8 uid[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 sq_state[0x4];
+	u8 reserved_at_44[0x4];
+	u8 sqn[0x18];
+	u8 reserved_at_60[0x20];
+	u8 modify_bitmask[0x40];
+	u8 reserved_at_c0[0x40];
+	struct mlx5_ifc_sqc_bits ctx;
+};
+
+struct mlx5_ifc_create_sq_out_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 reserved_at_40[0x8];
+	u8 sqn[0x18];
+	u8 reserved_at_60[0x20];
+};
+
+struct mlx5_ifc_create_sq_in_bits {
+	u8 opcode[0x10];
+	u8 uid[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 reserved_at_40[0xc0];
+	struct mlx5_ifc_sqc_bits ctx;
+};
+
 /* CQE format mask. */
 #define MLX5E_CQE_FORMAT_MASK 0xc
 
diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index 12f9bfb..271b648 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -324,14 +324,18 @@ struct mlx5_txq_obj {
 	LIST_ENTRY(mlx5_txq_obj) next; /* Pointer to the next element. */
 	rte_atomic32_t refcnt; /* Reference counter. */
 	struct mlx5_txq_ctrl *txq_ctrl; /* Pointer to the control queue. */
-	enum mlx5_rxq_obj_type type; /* The txq object type. */
+	enum mlx5_txq_obj_type type; /* The txq object type. */
 	RTE_STD_C11
 	union {
 		struct {
 			struct ibv_cq *cq; /* Completion Queue. */
 			struct ibv_qp *qp; /* Queue Pair. */
 		};
-		struct mlx5_devx_obj *sq; /* DevX object for Tx queue. */
+		struct {
+			struct mlx5_devx_obj *sq;
+			/* DevX object for Tx queue. */
+			struct mlx5_devx_obj *tis; /* The TIS object. */
+		};
 	};
 };
 
@@ -348,6 +352,7 @@ struct mlx5_txq_ctrl {
 	off_t uar_mmap_offset; /* UAR mmap offset for non-primary process. */
 	void *bf_reg; /* BlueFlame register from Verbs. */
 	uint16_t dump_file_n; /* Number of dump files. */
+	struct rte_eth_hairpin_conf hairpin_conf; /* Hairpin configuration. */
 	struct mlx5_txq_data txq; /* Data path structure. */
 	/* Must be the last field in the structure, contains elts[]. */
 };
@@ -410,15 +415,22 @@ struct mlx5_hrxq *mlx5_hrxq_get(struct rte_eth_dev *dev,
 
 int mlx5_tx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 			unsigned int socket, const struct rte_eth_txconf *conf);
+int mlx5_tx_hairpin_queue_setup
+	(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
+	 const struct rte_eth_hairpin_conf *hairpin_conf);
 void mlx5_tx_queue_release(void *dpdk_txq);
 int mlx5_tx_uar_init_secondary(struct rte_eth_dev *dev, int fd);
-struct mlx5_txq_obj *mlx5_txq_obj_new(struct rte_eth_dev *dev, uint16_t idx);
+struct mlx5_txq_obj *mlx5_txq_obj_new(struct rte_eth_dev *dev, uint16_t idx,
+				      enum mlx5_txq_obj_type type);
 struct mlx5_txq_obj *mlx5_txq_obj_get(struct rte_eth_dev *dev, uint16_t idx);
 int mlx5_txq_obj_release(struct mlx5_txq_obj *txq_obj);
 int mlx5_txq_obj_verify(struct rte_eth_dev *dev);
 struct mlx5_txq_ctrl *mlx5_txq_new(struct rte_eth_dev *dev, uint16_t idx,
 				   uint16_t desc, unsigned int socket,
 				   const struct rte_eth_txconf *conf);
+struct mlx5_txq_ctrl *mlx5_txq_hairpin_new
+	(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
+	 const struct rte_eth_hairpin_conf *hairpin_conf);
 struct mlx5_txq_ctrl *mlx5_txq_get(struct rte_eth_dev *dev, uint16_t idx);
 int mlx5_txq_release(struct rte_eth_dev *dev, uint16_t idx);
 int mlx5_txq_releasable(struct rte_eth_dev *dev, uint16_t idx);
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index 50c4df5..3ec86c4 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -51,8 +51,14 @@
 
 		if (!txq_ctrl)
 			continue;
-		txq_alloc_elts(txq_ctrl);
-		txq_ctrl->obj = mlx5_txq_obj_new(dev, i);
+		if (txq_ctrl->type == MLX5_TXQ_TYPE_HAIRPIN) {
+			txq_ctrl->obj = mlx5_txq_obj_new
+				(dev, i, MLX5_TXQ_OBJ_TYPE_DEVX_HAIRPIN);
+		} else {
+			txq_alloc_elts(txq_ctrl);
+			txq_ctrl->obj = mlx5_txq_obj_new
+				(dev, i, MLX5_TXQ_OBJ_TYPE_IBV);
+		}
 		if (!txq_ctrl->obj) {
 			rte_errno = ENOMEM;
 			goto error;
diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
index a6e2563..dfc379c 100644
--- a/drivers/net/mlx5/mlx5_txq.c
+++ b/drivers/net/mlx5/mlx5_txq.c
@@ -136,30 +136,22 @@
 }
 
 /**
- * DPDK callback to configure a TX queue.
+ * Tx queue presetup checks.
  *
  * @param dev
  *   Pointer to Ethernet device structure.
  * @param idx
- *   TX queue index.
+ *   Tx queue index.
  * @param desc
  *   Number of descriptors to configure in queue.
- * @param socket
- *   NUMA socket on which memory must be allocated.
- * @param[in] conf
- *   Thresholds parameters.
  *
  * @return
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
-int
-mlx5_tx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
-		    unsigned int socket, const struct rte_eth_txconf *conf)
+static int
+mlx5_tx_queue_pre_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
-	struct mlx5_txq_data *txq = (*priv->txqs)[idx];
-	struct mlx5_txq_ctrl *txq_ctrl =
-		container_of(txq, struct mlx5_txq_ctrl, txq);
 
 	if (desc <= MLX5_TX_COMP_THRESH) {
 		DRV_LOG(WARNING,
@@ -191,6 +183,38 @@
 		return -rte_errno;
 	}
 	mlx5_txq_release(dev, idx);
+	return 0;
+}
+
+/**
+ * DPDK callback to configure a TX queue.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ * @param idx
+ *   TX queue index.
+ * @param desc
+ *   Number of descriptors to configure in queue.
+ * @param socket
+ *   NUMA socket on which memory must be allocated.
+ * @param[in] conf
+ *   Thresholds parameters.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_tx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
+		    unsigned int socket, const struct rte_eth_txconf *conf)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_txq_data *txq = (*priv->txqs)[idx];
+	struct mlx5_txq_ctrl *txq_ctrl =
+		container_of(txq, struct mlx5_txq_ctrl, txq);
+	int res;
+
+	res = mlx5_tx_queue_pre_setup(dev, idx, desc);
+	if (res)
+		return res;
 	txq_ctrl = mlx5_txq_new(dev, idx, desc, socket, conf);
 	if (!txq_ctrl) {
 		DRV_LOG(ERR, "port %u unable to allocate queue index %u",
@@ -204,6 +228,57 @@
 }
 
 /**
+ * DPDK callback to configure a TX hairpin queue.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ * @param idx
+ *   TX queue index.
+ * @param desc
+ *   Number of descriptors to configure in queue.
+ * @param[in] hairpin_conf
+ *   The hairpin binding configuration.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_tx_hairpin_queue_setup(struct rte_eth_dev *dev, uint16_t idx,
+			    uint16_t desc,
+			    const struct rte_eth_hairpin_conf *hairpin_conf)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_txq_data *txq = (*priv->txqs)[idx];
+	struct mlx5_txq_ctrl *txq_ctrl =
+		container_of(txq, struct mlx5_txq_ctrl, txq);
+	int res;
+
+	res = mlx5_tx_queue_pre_setup(dev, idx, desc);
+	if (res)
+		return res;
+	if (hairpin_conf->peer_count != 1 ||
+	    hairpin_conf->peers[0].port != dev->data->port_id ||
+	    hairpin_conf->peers[0].queue >= priv->rxqs_n) {
+		DRV_LOG(ERR, "port %u unable to setup hairpin queue index %u "
+			" invalid hairpind configuration", dev->data->port_id,
+			idx);
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+	txq_ctrl = mlx5_txq_hairpin_new(dev, idx, desc,	hairpin_conf);
+	if (!txq_ctrl) {
+		DRV_LOG(ERR, "port %u unable to allocate queue index %u",
+			dev->data->port_id, idx);
+		return -rte_errno;
+	}
+	DRV_LOG(DEBUG, "port %u adding Tx queue %u to list",
+		dev->data->port_id, idx);
+	(*priv->txqs)[idx] = &txq_ctrl->txq;
+	txq_ctrl->type = MLX5_TXQ_TYPE_HAIRPIN;
+	return 0;
+}
+
+/**
  * DPDK callback to release a TX queue.
  *
  * @param dpdk_txq
@@ -246,6 +321,8 @@
 	const size_t page_size = sysconf(_SC_PAGESIZE);
 #endif
 
+	if (txq_ctrl->type != MLX5_TXQ_TYPE_STANDARD)
+		return;
 	assert(rte_eal_process_type() == RTE_PROC_PRIMARY);
 	assert(ppriv);
 	ppriv->uar_table[txq_ctrl->txq.idx] = txq_ctrl->bf_reg;
@@ -282,6 +359,8 @@
 	uintptr_t offset;
 	const size_t page_size = sysconf(_SC_PAGESIZE);
 
+	if (txq_ctrl->type != MLX5_TXQ_TYPE_STANDARD)
+		return 0;
 	assert(ppriv);
 	/*
 	 * As rdma-core, UARs are mapped in size of OS page
@@ -316,6 +395,8 @@
 	const size_t page_size = sysconf(_SC_PAGESIZE);
 	void *addr;
 
+	if (txq_ctrl->type != MLX5_TXQ_TYPE_STANDARD)
+		return;
 	addr = ppriv->uar_table[txq_ctrl->txq.idx];
 	munmap(RTE_PTR_ALIGN_FLOOR(addr, page_size), page_size);
 }
@@ -346,6 +427,8 @@
 			continue;
 		txq = (*priv->txqs)[i];
 		txq_ctrl = container_of(txq, struct mlx5_txq_ctrl, txq);
+		if (txq_ctrl->type != MLX5_TXQ_TYPE_STANDARD)
+			continue;
 		assert(txq->idx == (uint16_t)i);
 		ret = txq_uar_init_secondary(txq_ctrl, fd);
 		if (ret)
@@ -365,18 +448,87 @@
 }
 
 /**
+ * Create the Tx hairpin queue object.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param idx
+ *   Queue index in DPDK Tx queue array.
+ *
+ * @return
+ *   The hairpin DevX object initialised, NULL otherwise and rte_errno is set.
+ */
+static struct mlx5_txq_obj *
+mlx5_txq_obj_hairpin_new(struct rte_eth_dev *dev, uint16_t idx)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_txq_data *txq_data = (*priv->txqs)[idx];
+	struct mlx5_txq_ctrl *txq_ctrl =
+		container_of(txq_data, struct mlx5_txq_ctrl, txq);
+	struct mlx5_devx_create_sq_attr attr = { 0 };
+	struct mlx5_txq_obj *tmpl = NULL;
+	int ret = 0;
+
+	assert(txq_data);
+	assert(!txq_ctrl->obj);
+	tmpl = rte_calloc_socket(__func__, 1, sizeof(*tmpl), 0,
+				 txq_ctrl->socket);
+	if (!tmpl) {
+		DRV_LOG(ERR,
+			"port %u Tx queue %u cannot allocate memory resources",
+			dev->data->port_id, txq_data->idx);
+		rte_errno = ENOMEM;
+		goto error;
+	}
+	tmpl->type = MLX5_TXQ_OBJ_TYPE_DEVX_HAIRPIN;
+	tmpl->txq_ctrl = txq_ctrl;
+	attr.hairpin = 1;
+	attr.tis_lst_sz = 1;
+	/* Workaround for hairpin startup. */
+	attr.wq_attr.log_hairpin_num_packets = log2above(32);
+	/* Workaround for packets larger than 1KB. */
+	attr.wq_attr.log_hairpin_data_sz =
+			priv->config.hca_attr.log_max_hairpin_wq_data_sz;
+	attr.tis_num = priv->sh->tis->id;
+	tmpl->sq = mlx5_devx_cmd_create_sq(priv->sh->ctx, &attr);
+	if (!tmpl->sq) {
+		DRV_LOG(ERR,
+			"port %u tx hairpin queue %u can't create sq object",
+			dev->data->port_id, idx);
+		rte_errno = errno;
+		goto error;
+	}
+	DRV_LOG(DEBUG, "port %u sxq %u updated with %p", dev->data->port_id,
+		idx, (void *)&tmpl);
+	rte_atomic32_inc(&tmpl->refcnt);
+	LIST_INSERT_HEAD(&priv->txqsobj, tmpl, next);
+	return tmpl;
+error:
+	ret = rte_errno; /* Save rte_errno before cleanup. */
+	/* tmpl may be NULL when its allocation failed above. */
+	if (tmpl) {
+		if (tmpl->tis)
+			mlx5_devx_cmd_destroy(tmpl->tis);
+		if (tmpl->sq)
+			mlx5_devx_cmd_destroy(tmpl->sq);
+		rte_free(tmpl);
+	}
+	rte_errno = ret; /* Restore rte_errno. */
+	return NULL;
+}
+
+/**
  * Create the Tx queue Verbs object.
  *
  * @param dev
  *   Pointer to Ethernet device.
  * @param idx
  *   Queue index in DPDK Tx queue array.
+ * @param type
+ *   Type of the Tx queue object to create.
  *
  * @return
  *   The Verbs object initialised, NULL otherwise and rte_errno is set.
  */
 struct mlx5_txq_obj *
-mlx5_txq_obj_new(struct rte_eth_dev *dev, uint16_t idx)
+mlx5_txq_obj_new(struct rte_eth_dev *dev, uint16_t idx,
+		 enum mlx5_txq_obj_type type)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_txq_data *txq_data = (*priv->txqs)[idx];
@@ -396,6 +548,8 @@ struct mlx5_txq_obj *
 	const int desc = 1 << txq_data->elts_n;
 	int ret = 0;
 
+	if (type == MLX5_TXQ_OBJ_TYPE_DEVX_HAIRPIN)
+		return mlx5_txq_obj_hairpin_new(dev, idx);
 #ifdef HAVE_IBV_FLOW_DV_SUPPORT
 	/* If using DevX, need additional mask to read tisn value. */
 	if (priv->config.devx && !priv->sh->tdn)
@@ -643,8 +797,13 @@ struct mlx5_txq_obj *
 {
 	assert(txq_obj);
 	if (rte_atomic32_dec_and_test(&txq_obj->refcnt)) {
-		claim_zero(mlx5_glue->destroy_qp(txq_obj->qp));
-		claim_zero(mlx5_glue->destroy_cq(txq_obj->cq));
+		if (txq_obj->type == MLX5_TXQ_OBJ_TYPE_DEVX_HAIRPIN) {
+			if (txq_obj->tis)
+				claim_zero(mlx5_devx_cmd_destroy(txq_obj->tis));
+		} else {
+			claim_zero(mlx5_glue->destroy_qp(txq_obj->qp));
+			claim_zero(mlx5_glue->destroy_cq(txq_obj->cq));
+		}
 		LIST_REMOVE(txq_obj, next);
 		rte_free(txq_obj);
 		return 0;
@@ -1100,6 +1259,7 @@ struct mlx5_txq_ctrl *
 		goto error;
 	}
 	rte_atomic32_inc(&tmpl->refcnt);
+	tmpl->type = MLX5_TXQ_TYPE_STANDARD;
 	LIST_INSERT_HEAD(&priv->txqsctrl, tmpl, next);
 	return tmpl;
 error:
@@ -1108,6 +1268,46 @@ struct mlx5_txq_ctrl *
 }
 
 /**
+ * Create a DPDK Tx hairpin queue.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param idx
+ *   TX queue index.
+ * @param desc
+ *   Number of descriptors to configure in queue.
+ * @param hairpin_conf
+ *  The hairpin configuration.
+ *
+ * @return
+ *   A DPDK queue object on success, NULL otherwise and rte_errno is set.
+ */
+struct mlx5_txq_ctrl *
+mlx5_txq_hairpin_new(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
+		     const struct rte_eth_hairpin_conf *hairpin_conf)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_txq_ctrl *tmpl;
+
+	tmpl = rte_calloc_socket("TXQ", 1,
+				 sizeof(*tmpl), 0, SOCKET_ID_ANY);
+	if (!tmpl) {
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+	tmpl->priv = priv;
+	tmpl->socket = SOCKET_ID_ANY;
+	tmpl->txq.elts_n = log2above(desc);
+	tmpl->txq.port_id = dev->data->port_id;
+	tmpl->txq.idx = idx;
+	tmpl->hairpin_conf = *hairpin_conf;
+	tmpl->type = MLX5_TXQ_TYPE_HAIRPIN;
+	rte_atomic32_inc(&tmpl->refcnt);
+	LIST_INSERT_HEAD(&priv->txqsctrl, tmpl, next);
+	return tmpl;
+}
+
+/**
  * Get a Tx queue.
  *
  * @param dev
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 186+ messages in thread

* [dpdk-dev] [PATCH v6 07/14] net/mlx5: add get hairpin capabilities
  2019-10-27 12:24 ` [dpdk-dev] [PATCH v6 00/14] " Ori Kam
                     ` (5 preceding siblings ...)
  2019-10-27 12:24   ` [dpdk-dev] [PATCH v6 06/14] net/mlx5: support Tx hairpin queues Ori Kam
@ 2019-10-27 12:24   ` Ori Kam
  2019-10-27 12:24   ` [dpdk-dev] [PATCH v6 08/14] app/testpmd: add hairpin support Ori Kam
                     ` (6 subsequent siblings)
  13 siblings, 0 replies; 186+ messages in thread
From: Ori Kam @ 2019-10-27 12:24 UTC (permalink / raw)
  To: John McNamara, Marko Kovacevic, Matan Azrad, Shahaf Shuler,
	Viacheslav Ovsiienko
  Cc: dev, orika, jingjing.wu, stephen

This commit adds the function for getting hairpin capabilities.
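
As a usage sketch (not part of the patch), an application can probe these
limits through the new ethdev call before configuring hairpin queues; the
error handling shown here is illustrative:

	struct rte_eth_hairpin_cap cap;
	int ret;

	ret = rte_eth_dev_hairpin_capability_get(port_id, &cap);
	if (ret != 0)
		return ret; /* E.g. not supported when DevX is unavailable. */
	printf("hairpin: max queues %u, max descriptors %u\n",
	       cap.max_nb_queues, cap.max_nb_desc);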

Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>

---
V6:
 - add MLX5 hairpin support to release note.

---
 doc/guides/rel_notes/release_19_11.rst |  1 +
 drivers/net/mlx5/mlx5.c                |  2 ++
 drivers/net/mlx5/mlx5.h                |  3 ++-
 drivers/net/mlx5/mlx5_ethdev.c         | 27 +++++++++++++++++++++++++++
 4 files changed, 32 insertions(+), 1 deletion(-)

diff --git a/doc/guides/rel_notes/release_19_11.rst b/doc/guides/rel_notes/release_19_11.rst
index 87310db..3ad41d9 100644
--- a/doc/guides/rel_notes/release_19_11.rst
+++ b/doc/guides/rel_notes/release_19_11.rst
@@ -149,6 +149,7 @@ New Features
   * Added support for VLAN set PCP offload command.
   * Added support for VLAN set VID offload command.
   * Added support for matching on packets with Geneve tunnel header.
+  * Added hairpin support.
 
 * **Updated the AF_XDP PMD.**
 
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 49b1e82..b0fdd9b 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -1039,6 +1039,7 @@ struct mlx5_dev_spawn_data {
 	.udp_tunnel_port_add  = mlx5_udp_tunnel_port_add,
 	.get_module_info = mlx5_get_module_info,
 	.get_module_eeprom = mlx5_get_module_eeprom,
+	.hairpin_cap_get = mlx5_hairpin_cap_get,
 };
 
 /* Available operations from secondary process. */
@@ -1101,6 +1102,7 @@ struct mlx5_dev_spawn_data {
 	.is_removed = mlx5_is_removed,
 	.get_module_info = mlx5_get_module_info,
 	.get_module_eeprom = mlx5_get_module_eeprom,
+	.hairpin_cap_get = mlx5_hairpin_cap_get,
 };
 
 /**
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 566bf2d..742bedd 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -789,7 +789,8 @@ int mlx5_get_module_info(struct rte_eth_dev *dev,
 			 struct rte_eth_dev_module_info *modinfo);
 int mlx5_get_module_eeprom(struct rte_eth_dev *dev,
 			   struct rte_dev_eeprom_info *info);
-
+int mlx5_hairpin_cap_get(struct rte_eth_dev *dev,
+			 struct rte_eth_hairpin_cap *cap);
 /* mlx5_mac.c */
 
 int mlx5_get_mac(struct rte_eth_dev *dev, uint8_t (*mac)[RTE_ETHER_ADDR_LEN]);
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index 2278b24..fe1b4d4 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -2114,3 +2114,30 @@ int mlx5_get_module_eeprom(struct rte_eth_dev *dev,
 	rte_free(eeprom);
 	return ret;
 }
+
+/**
+ * DPDK callback to retrieve hairpin capabilities.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ * @param[out] cap
+ *   Storage for hairpin capability data.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int mlx5_hairpin_cap_get(struct rte_eth_dev *dev,
+			 struct rte_eth_hairpin_cap *cap)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+
+	if (priv->sh->devx == 0) {
+		rte_errno = ENOTSUP;
+		return -rte_errno;
+	}
+	cap->max_nb_queues = UINT16_MAX;
+	cap->max_rx_2_tx = 1;
+	cap->max_tx_2_rx = 1;
+	cap->max_nb_desc = 8192;
+	return 0;
+}
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 186+ messages in thread

* [dpdk-dev] [PATCH v6 08/14] app/testpmd: add hairpin support
  2019-10-27 12:24 ` [dpdk-dev] [PATCH v6 00/14] " Ori Kam
                     ` (6 preceding siblings ...)
  2019-10-27 12:24   ` [dpdk-dev] [PATCH v6 07/14] net/mlx5: add get hairpin capabilities Ori Kam
@ 2019-10-27 12:24   ` Ori Kam
  2019-10-27 12:24   ` [dpdk-dev] [PATCH v6 09/14] net/mlx5: add hairpin binding function Ori Kam
                     ` (5 subsequent siblings)
  13 siblings, 0 replies; 186+ messages in thread
From: Ori Kam @ 2019-10-27 12:24 UTC (permalink / raw)
  To: Wenzhuo Lu, Jingjing Wu, Bernard Iremonger; +Cc: dev, orika, stephen

This commit introduces hairpin queues to testpmd.
Hairpin queues are configured using --hairpinq=<n>;
the option adds n queue objects to both the total number
of TX queues and RX queues.
The connection between the queues is 1 to 1: the first Rx hairpin queue
is connected to the first Tx hairpin queue, and so on.
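
For illustration, the 1-to-1 wiring described above reduces to the
following application-side sketch (condensed from the setup loops added
below; nb_rxq/nb_txq are the numbers of regular queues, and error checks
are omitted):

	struct rte_eth_hairpin_conf hairpin_conf = { .peer_count = 1 };
	uint16_t i;

	for (i = 0; i < nb_hairpinq; i++) {
		/* Tx hairpin queue nb_txq + i peers with Rx queue nb_rxq + i. */
		hairpin_conf.peers[0].port = pi;
		hairpin_conf.peers[0].queue = i + nb_rxq;
		rte_eth_tx_hairpin_queue_setup(pi, nb_txq + i, nb_txd,
					       &hairpin_conf);
		/* And symmetrically for the Rx hairpin queue. */
		hairpin_conf.peers[0].queue = i + nb_txq;
		rte_eth_rx_hairpin_queue_setup(pi, nb_rxq + i, nb_rxd,
					       &hairpin_conf);
	}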

Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 app/test-pmd/parameters.c |  28 ++++++++++++
 app/test-pmd/testpmd.c    | 109 +++++++++++++++++++++++++++++++++++++++++++++-
 app/test-pmd/testpmd.h    |   3 ++
 3 files changed, 138 insertions(+), 2 deletions(-)

diff --git a/app/test-pmd/parameters.c b/app/test-pmd/parameters.c
index 9ea87c1..9b6e35b 100644
--- a/app/test-pmd/parameters.c
+++ b/app/test-pmd/parameters.c
@@ -149,6 +149,8 @@
 	printf("  --rxd=N: set the number of descriptors in RX rings to N.\n");
 	printf("  --txq=N: set the number of TX queues per port to N.\n");
 	printf("  --txd=N: set the number of descriptors in TX rings to N.\n");
+	printf("  --hairpinq=N: set the number of hairpin queues per port to "
+	       "N.\n");
 	printf("  --burst=N: set the number of packets per burst to N.\n");
 	printf("  --mbcache=N: set the cache of mbuf memory pool to N.\n");
 	printf("  --rxpt=N: set prefetch threshold register of RX rings to N.\n");
@@ -622,6 +624,7 @@
 		{ "txq",			1, 0, 0 },
 		{ "rxd",			1, 0, 0 },
 		{ "txd",			1, 0, 0 },
+		{ "hairpinq",			1, 0, 0 },
 		{ "burst",			1, 0, 0 },
 		{ "mbcache",			1, 0, 0 },
 		{ "txpt",			1, 0, 0 },
@@ -1045,6 +1048,31 @@
 						  " >= 0 && <= %u\n", n,
 						  get_allowed_max_nb_txq(&pid));
 			}
+			if (!strcmp(lgopts[opt_idx].name, "hairpinq")) {
+				n = atoi(optarg);
+				if (n >= 0 &&
+				    check_nb_hairpinq((queueid_t)n) == 0)
+					nb_hairpinq = (queueid_t) n;
+				else
+					rte_exit(EXIT_FAILURE, "txq %d invalid - must be"
+						  " >= 0 && <= %u\n", n,
+						  get_allowed_max_nb_hairpinq
+						  (&pid));
+				if ((n + nb_txq) < 0 ||
+				    check_nb_txq((queueid_t)(n + nb_txq)) != 0)
+					rte_exit(EXIT_FAILURE, "txq + hairpinq "
+						 "%d invalid - must be"
+						  " >= 0 && <= %u\n",
+						  n + nb_txq,
+						  get_allowed_max_nb_txq(&pid));
+				if ((n + nb_rxq) < 0 ||
+				    check_nb_rxq((queueid_t)(n + nb_rxq)) != 0)
+					rte_exit(EXIT_FAILURE, "rxq + hairpinq "
+						 "%d invalid - must be"
+						  " >= 0 && <= %u\n",
+						  n + nb_rxq,
+						  get_allowed_max_nb_rxq(&pid));
+			}
 			if (!nb_rxq && !nb_txq) {
 				rte_exit(EXIT_FAILURE, "Either rx or tx queues should "
 						"be non-zero\n");
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 5701f31..fec946f 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -235,6 +235,7 @@ struct fwd_engine * fwd_engines[] = {
 /*
  * Configurable number of RX/TX queues.
  */
+queueid_t nb_hairpinq; /**< Number of hairpin queues per port. */
 queueid_t nb_rxq = 1; /**< Number of RX queues per port. */
 queueid_t nb_txq = 1; /**< Number of TX queues per port. */
 
@@ -1103,6 +1104,53 @@ struct extmem_param {
 	return 0;
 }
 
+/*
+ * Get the allowed maximum number of hairpin queues.
+ * *pid returns the port ID that has the minimal value of
+ * max_hairpin_queues among all ports.
+ */
+queueid_t
+get_allowed_max_nb_hairpinq(portid_t *pid)
+{
+	queueid_t allowed_max_hairpinq = MAX_QUEUE_ID;
+	portid_t pi;
+	struct rte_eth_hairpin_cap cap;
+
+	RTE_ETH_FOREACH_DEV(pi) {
+		if (rte_eth_dev_hairpin_capability_get(pi, &cap) != 0) {
+			*pid = pi;
+			return 0;
+		}
+		if (cap.max_nb_queues < allowed_max_hairpinq) {
+			allowed_max_hairpinq = cap.max_nb_queues;
+			*pid = pi;
+		}
+	}
+	return allowed_max_hairpinq;
+}
+
+/*
+ * Check whether the requested number of hairpin queues is valid.
+ * It is valid if it does not exceed the maximum number of hairpin
+ * queues supported by every port.
+ * Return 0 if valid, -1 otherwise.
+ */
+int
+check_nb_hairpinq(queueid_t hairpinq)
+{
+	queueid_t allowed_max_hairpinq;
+	portid_t pid = 0;
+
+	allowed_max_hairpinq = get_allowed_max_nb_hairpinq(&pid);
+	if (hairpinq > allowed_max_hairpinq) {
+		printf("Fail: input hairpin (%u) can't be greater "
+		       "than max_hairpin_queues (%u) of port %u\n",
+		       hairpinq, allowed_max_hairpinq, pid);
+		return -1;
+	}
+	return 0;
+}
+
 static void
 init_config(void)
 {
@@ -2064,6 +2112,11 @@ struct extmem_param {
 	queueid_t qi;
 	struct rte_port *port;
 	struct rte_ether_addr mac_addr;
+	struct rte_eth_hairpin_conf hairpin_conf = {
+		.peer_count = 1,
+	};
+	int i;
+	struct rte_eth_hairpin_cap cap;
 
 	if (port_id_is_invalid(pid, ENABLED_WARN))
 		return 0;
@@ -2096,9 +2149,16 @@ struct extmem_param {
 			configure_rxtx_dump_callbacks(0);
 			printf("Configuring Port %d (socket %u)\n", pi,
 					port->socket_id);
+			if (nb_hairpinq > 0 &&
+			    rte_eth_dev_hairpin_capability_get(pi, &cap)) {
+				printf("Port %d doesn't support hairpin "
+				       "queues\n", pi);
+				return -1;
+			}
 			/* configure port */
-			diag = rte_eth_dev_configure(pi, nb_rxq, nb_txq,
-						&(port->dev_conf));
+			diag = rte_eth_dev_configure(pi, nb_rxq + nb_hairpinq,
+						     nb_txq + nb_hairpinq,
+						     &(port->dev_conf));
 			if (diag != 0) {
 				if (rte_atomic16_cmpset(&(port->port_status),
 				RTE_PORT_HANDLING, RTE_PORT_STOPPED) == 0)
@@ -2191,6 +2251,51 @@ struct extmem_param {
 				port->need_reconfig_queues = 1;
 				return -1;
 			}
+			/* setup hairpin queues */
+			i = 0;
+			for (qi = nb_txq; qi < nb_hairpinq + nb_txq; qi++) {
+				hairpin_conf.peers[0].port = pi;
+				hairpin_conf.peers[0].queue = i + nb_rxq;
+				diag = rte_eth_tx_hairpin_queue_setup
+					(pi, qi, nb_txd, &hairpin_conf);
+				i++;
+				if (diag == 0)
+					continue;
+
+				/* Fail to setup Tx hairpin queue, return */
+				if (rte_atomic16_cmpset(&(port->port_status),
+							RTE_PORT_HANDLING,
+							RTE_PORT_STOPPED) == 0)
+					printf("Port %d can not be set back "
+							"to stopped\n", pi);
+				printf("Fail to configure port %d hairpin "
+				       "queues\n", pi);
+				/* try to reconfigure queues next time */
+				port->need_reconfig_queues = 1;
+				return -1;
+			}
+			i = 0;
+			for (qi = nb_rxq; qi < nb_hairpinq + nb_rxq; qi++) {
+				hairpin_conf.peers[0].port = pi;
+				hairpin_conf.peers[0].queue = i + nb_txq;
+				diag = rte_eth_rx_hairpin_queue_setup
+					(pi, qi, nb_rxd, &hairpin_conf);
+				i++;
+				if (diag == 0)
+					continue;
+
+				/* Fail to setup Rx hairpin queue, return */
+				if (rte_atomic16_cmpset(&(port->port_status),
+							RTE_PORT_HANDLING,
+							RTE_PORT_STOPPED) == 0)
+					printf("Port %d can not be set back "
+							"to stopped\n", pi);
+				printf("Fail to configure port %d hairpin "
+				       "queues\n", pi);
+				/* try to reconfigure queues next time */
+				port->need_reconfig_queues = 1;
+				return -1;
+			}
 		}
 		configure_rxtx_dump_callbacks(verbose_level);
 		/* start port */
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index fb8d456..625093d 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -385,6 +385,7 @@ struct queue_stats_mappings {
 
 extern uint64_t rss_hf;
 
+extern queueid_t nb_hairpinq;
 extern queueid_t nb_rxq;
 extern queueid_t nb_txq;
 
@@ -857,6 +858,8 @@ enum print_warning {
 int check_nb_rxq(queueid_t rxq);
 queueid_t get_allowed_max_nb_txq(portid_t *pid);
 int check_nb_txq(queueid_t txq);
+queueid_t get_allowed_max_nb_hairpinq(portid_t *pid);
+int check_nb_hairpinq(queueid_t hairpinq);
 
 uint16_t dump_rx_pkts(uint16_t port_id, uint16_t queue, struct rte_mbuf *pkts[],
 		      uint16_t nb_pkts, __rte_unused uint16_t max_pkts,
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 186+ messages in thread

* [dpdk-dev] [PATCH v6 09/14] net/mlx5: add hairpin binding function
  2019-10-27 12:24 ` [dpdk-dev] [PATCH v6 00/14] " Ori Kam
                     ` (7 preceding siblings ...)
  2019-10-27 12:24   ` [dpdk-dev] [PATCH v6 08/14] app/testpmd: add hairpin support Ori Kam
@ 2019-10-27 12:24   ` Ori Kam
  2019-10-27 12:24   ` [dpdk-dev] [PATCH v6 10/14] net/mlx5: add support for hairpin hrxq Ori Kam
                     ` (4 subsequent siblings)
  13 siblings, 0 replies; 186+ messages in thread
From: Ori Kam @ 2019-10-27 12:24 UTC (permalink / raw)
  To: Matan Azrad, Shahaf Shuler, Viacheslav Ovsiienko
  Cc: dev, orika, jingjing.wu, stephen

When starting the port, in addition to creating the queues,
we need to bind the hairpin queues.
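
A condensed sketch (not part of the patch) of the bind sequence
implemented below, performed per Tx/Rx hairpin queue pair with error
handling omitted:

	struct mlx5_devx_modify_sq_attr sq_attr = { 0 };
	struct mlx5_devx_modify_rq_attr rq_attr = { 0 };

	/* Move the SQ from RESET to READY, pointing it at its peer RQ. */
	sq_attr.sq_state = MLX5_SQC_STATE_RST;
	sq_attr.state = MLX5_SQC_STATE_RDY;
	sq_attr.hairpin_peer_rq = rq->id;
	sq_attr.hairpin_peer_vhca = priv->config.hca_attr.vhca_id;
	mlx5_devx_cmd_modify_sq(sq, &sq_attr);
	/* The same transition for the RQ, pointing back at the SQ. */
	rq_attr.rq_state = MLX5_SQC_STATE_RST;
	rq_attr.state = MLX5_SQC_STATE_RDY;
	rq_attr.hairpin_peer_sq = sq->id;
	rq_attr.hairpin_peer_vhca = priv->config.hca_attr.vhca_id;
	mlx5_devx_cmd_modify_rq(rq, &rq_attr);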

Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/net/mlx5/mlx5.h           |  1 +
 drivers/net/mlx5/mlx5_devx_cmds.c |  1 +
 drivers/net/mlx5/mlx5_prm.h       |  6 +++
 drivers/net/mlx5/mlx5_trigger.c   | 97 +++++++++++++++++++++++++++++++++++++++
 4 files changed, 105 insertions(+)

diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 742bedd..33cfc5b 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -191,6 +191,7 @@ struct mlx5_hca_attr {
 	uint32_t log_max_hairpin_queues:5;
 	uint32_t log_max_hairpin_wq_data_sz:5;
 	uint32_t log_max_hairpin_num_packets:5;
+	uint32_t vhca_id:16;
 };
 
 /* Flow list . */
diff --git a/drivers/net/mlx5/mlx5_devx_cmds.c b/drivers/net/mlx5/mlx5_devx_cmds.c
index a501f1f..3471a9b 100644
--- a/drivers/net/mlx5/mlx5_devx_cmds.c
+++ b/drivers/net/mlx5/mlx5_devx_cmds.c
@@ -334,6 +334,7 @@ struct mlx5_devx_obj *
 						    log_max_hairpin_wq_data_sz);
 	attr->log_max_hairpin_num_packets = MLX5_GET
 		(cmd_hca_cap, hcattr, log_min_hairpin_wq_data_sz);
+	attr->vhca_id = MLX5_GET(cmd_hca_cap, hcattr, vhca_id);
 	attr->eth_net_offloads = MLX5_GET(cmd_hca_cap, hcattr,
 					  eth_net_offloads);
 	attr->eth_virt = MLX5_GET(cmd_hca_cap, hcattr, eth_virt);
diff --git a/drivers/net/mlx5/mlx5_prm.h b/drivers/net/mlx5/mlx5_prm.h
index c687cfb..e4b19f8 100644
--- a/drivers/net/mlx5/mlx5_prm.h
+++ b/drivers/net/mlx5/mlx5_prm.h
@@ -1628,6 +1628,12 @@ struct mlx5_ifc_create_rqt_in_bits {
 #pragma GCC diagnostic error "-Wpedantic"
 #endif
 
+enum {
+	MLX5_SQC_STATE_RST  = 0x0,
+	MLX5_SQC_STATE_RDY  = 0x1,
+	MLX5_SQC_STATE_ERR  = 0x3,
+};
+
 struct mlx5_ifc_sqc_bits {
 	u8 rlky[0x1];
 	u8 cd_master[0x1];
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index 3ec86c4..a4fcdb3 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -162,6 +162,96 @@
 }
 
 /**
+ * Binds Tx queues to Rx queues for hairpin.
+ *
+ * Binds each hairpin Tx queue to its peer Rx queue and moves both
+ * to the ready state.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx5_hairpin_bind(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_devx_modify_sq_attr sq_attr = { 0 };
+	struct mlx5_devx_modify_rq_attr rq_attr = { 0 };
+	struct mlx5_txq_ctrl *txq_ctrl;
+	struct mlx5_rxq_ctrl *rxq_ctrl;
+	struct mlx5_devx_obj *sq;
+	struct mlx5_devx_obj *rq;
+	unsigned int i;
+	int ret = 0;
+
+	for (i = 0; i != priv->txqs_n; ++i) {
+		txq_ctrl = mlx5_txq_get(dev, i);
+		if (!txq_ctrl)
+			continue;
+		if (txq_ctrl->type != MLX5_TXQ_TYPE_HAIRPIN) {
+			mlx5_txq_release(dev, i);
+			continue;
+		}
+		if (!txq_ctrl->obj) {
+			rte_errno = ENOMEM;
+			DRV_LOG(ERR, "port %u no txq object found: %d",
+				dev->data->port_id, i);
+			mlx5_txq_release(dev, i);
+			return -rte_errno;
+		}
+		sq = txq_ctrl->obj->sq;
+		rxq_ctrl = mlx5_rxq_get(dev,
+					txq_ctrl->hairpin_conf.peers[0].queue);
+		if (!rxq_ctrl) {
+			mlx5_txq_release(dev, i);
+			rte_errno = EINVAL;
+			DRV_LOG(ERR, "port %u no rxq object found: %d",
+				dev->data->port_id,
+				txq_ctrl->hairpin_conf.peers[0].queue);
+			return -rte_errno;
+		}
+		if (rxq_ctrl->type != MLX5_RXQ_TYPE_HAIRPIN ||
+		    rxq_ctrl->hairpin_conf.peers[0].queue != i) {
+			rte_errno = ENOMEM;
+			DRV_LOG(ERR, "port %u Tx queue %d can't be binded to "
+				"Rx queue %d", dev->data->port_id,
+				i, txq_ctrl->hairpin_conf.peers[0].queue);
+			goto error;
+		}
+		rq = rxq_ctrl->obj->rq;
+		if (!rq) {
+			rte_errno = ENOMEM;
+			DRV_LOG(ERR, "port %u hairpin no matching rxq: %d",
+				dev->data->port_id,
+				txq_ctrl->hairpin_conf.peers[0].queue);
+			goto error;
+		}
+		sq_attr.state = MLX5_SQC_STATE_RDY;
+		sq_attr.sq_state = MLX5_SQC_STATE_RST;
+		sq_attr.hairpin_peer_rq = rq->id;
+		sq_attr.hairpin_peer_vhca = priv->config.hca_attr.vhca_id;
+		ret = mlx5_devx_cmd_modify_sq(sq, &sq_attr);
+		if (ret)
+			goto error;
+		rq_attr.state = MLX5_SQC_STATE_RDY;
+		rq_attr.rq_state = MLX5_SQC_STATE_RST;
+		rq_attr.hairpin_peer_sq = sq->id;
+		rq_attr.hairpin_peer_vhca = priv->config.hca_attr.vhca_id;
+		ret = mlx5_devx_cmd_modify_rq(rq, &rq_attr);
+		if (ret)
+			goto error;
+		mlx5_txq_release(dev, i);
+		mlx5_rxq_release(dev, txq_ctrl->hairpin_conf.peers[0].queue);
+	}
+	return 0;
+error:
+	mlx5_txq_release(dev, i);
+	mlx5_rxq_release(dev, txq_ctrl->hairpin_conf.peers[0].queue);
+	return -rte_errno;
+}
+
+/**
  * DPDK callback to start the device.
  *
  * Simulate device start by attaching all configured flows.
@@ -192,6 +282,13 @@
 		mlx5_txq_stop(dev);
 		return -rte_errno;
 	}
+	ret = mlx5_hairpin_bind(dev);
+	if (ret) {
+		DRV_LOG(ERR, "port %u hairpin binding failed: %s",
+			dev->data->port_id, strerror(rte_errno));
+		mlx5_txq_stop(dev);
+		return -rte_errno;
+	}
 	dev->data->dev_started = 1;
 	ret = mlx5_rx_intr_vec_enable(dev);
 	if (ret) {
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 186+ messages in thread

* [dpdk-dev] [PATCH v6 10/14] net/mlx5: add support for hairpin hrxq
  2019-10-27 12:24 ` [dpdk-dev] [PATCH v6 00/14] " Ori Kam
                     ` (8 preceding siblings ...)
  2019-10-27 12:24   ` [dpdk-dev] [PATCH v6 09/14] net/mlx5: add hairpin binding function Ori Kam
@ 2019-10-27 12:24   ` Ori Kam
  2019-10-27 12:24   ` [dpdk-dev] [PATCH v6 11/14] net/mlx5: add internal tag item and action Ori Kam
                     ` (3 subsequent siblings)
  13 siblings, 0 replies; 186+ messages in thread
From: Ori Kam @ 2019-10-27 12:24 UTC (permalink / raw)
  To: Matan Azrad, Shahaf Shuler, Viacheslav Ovsiienko
  Cc: dev, orika, jingjing.wu, stephen

Add support for RSS on hairpin queues.
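
In short, a sketch of the core logic added below: the default RSS RETA is
now built only from standard Rx queues, so hairpin queues never enter the
default indirection table:

	/* Only standard Rx queues take part in the default RETA. */
	for (i = 0, j = 0; i < rxqs_n; i++) {
		rxq_ctrl = container_of((*priv->rxqs)[i],
					struct mlx5_rxq_ctrl, rxq);
		if (rxq_ctrl->type == MLX5_RXQ_TYPE_STANDARD)
			rss_queue_arr[j++] = i;
	}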

Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/net/mlx5/mlx5.h         |   3 ++
 drivers/net/mlx5/mlx5_ethdev.c  | 102 ++++++++++++++++++++++++++++++----------
 drivers/net/mlx5/mlx5_rss.c     |   1 +
 drivers/net/mlx5/mlx5_rxq.c     |  22 ++++++---
 drivers/net/mlx5/mlx5_trigger.c |   6 +++
 5 files changed, 104 insertions(+), 30 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 33cfc5b..a36ba2d 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -716,6 +716,7 @@ struct mlx5_priv {
 	rte_spinlock_t uar_lock[MLX5_UAR_PAGE_NUM_MAX];
 	/* UAR same-page access control required in 32bit implementations. */
 #endif
+	uint8_t skip_default_rss_reta; /* Skip configuration of default reta. */
 };
 
 #define PORT_ID(priv) ((priv)->dev_data->port_id)
@@ -792,6 +793,8 @@ int mlx5_get_module_eeprom(struct rte_eth_dev *dev,
 			   struct rte_dev_eeprom_info *info);
 int mlx5_hairpin_cap_get(struct rte_eth_dev *dev,
 			 struct rte_eth_hairpin_cap *cap);
+int mlx5_dev_configure_rss_reta(struct rte_eth_dev *dev);
+
 /* mlx5_mac.c */
 
 int mlx5_get_mac(struct rte_eth_dev *dev, uint8_t (*mac)[RTE_ETHER_ADDR_LEN]);
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index fe1b4d4..c2bed2f 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -383,9 +383,6 @@ struct ethtool_link_settings {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	unsigned int rxqs_n = dev->data->nb_rx_queues;
 	unsigned int txqs_n = dev->data->nb_tx_queues;
-	unsigned int i;
-	unsigned int j;
-	unsigned int reta_idx_n;
 	const uint8_t use_app_rss_key =
 		!!dev->data->dev_conf.rx_adv_conf.rss_conf.rss_key;
 	int ret = 0;
@@ -431,28 +428,8 @@ struct ethtool_link_settings {
 		DRV_LOG(INFO, "port %u Rx queues number update: %u -> %u",
 			dev->data->port_id, priv->rxqs_n, rxqs_n);
 		priv->rxqs_n = rxqs_n;
-		/*
-		 * If the requested number of RX queues is not a power of two,
-		 * use the maximum indirection table size for better balancing.
-		 * The result is always rounded to the next power of two.
-		 */
-		reta_idx_n = (1 << log2above((rxqs_n & (rxqs_n - 1)) ?
-					     priv->config.ind_table_max_size :
-					     rxqs_n));
-		ret = mlx5_rss_reta_index_resize(dev, reta_idx_n);
-		if (ret)
-			return ret;
-		/*
-		 * When the number of RX queues is not a power of two,
-		 * the remaining table entries are padded with reused WQs
-		 * and hashes are not spread uniformly.
-		 */
-		for (i = 0, j = 0; (i != reta_idx_n); ++i) {
-			(*priv->reta_idx)[i] = j;
-			if (++j == rxqs_n)
-				j = 0;
-		}
 	}
+	priv->skip_default_rss_reta = 0;
 	ret = mlx5_proc_priv_init(dev);
 	if (ret)
 		return ret;
@@ -460,6 +437,83 @@ struct ethtool_link_settings {
 }
 
 /**
+ * Configure default RSS reta.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_dev_configure_rss_reta(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	unsigned int rxqs_n = dev->data->nb_rx_queues;
+	unsigned int i;
+	unsigned int j;
+	unsigned int reta_idx_n;
+	int ret = 0;
+	unsigned int *rss_queue_arr = NULL;
+	unsigned int rss_queue_n = 0;
+
+	if (priv->skip_default_rss_reta)
+		return ret;
+	rss_queue_arr = rte_malloc("", rxqs_n * sizeof(unsigned int), 0);
+	if (!rss_queue_arr) {
+		DRV_LOG(ERR, "port %u cannot allocate RSS queue list (%u)",
+			dev->data->port_id, rxqs_n);
+		rte_errno = ENOMEM;
+		return -rte_errno;
+	}
+	for (i = 0, j = 0; i < rxqs_n; i++) {
+		struct mlx5_rxq_data *rxq_data;
+		struct mlx5_rxq_ctrl *rxq_ctrl;
+
+		rxq_data = (*priv->rxqs)[i];
+		rxq_ctrl = container_of(rxq_data, struct mlx5_rxq_ctrl, rxq);
+		if (rxq_ctrl->type == MLX5_RXQ_TYPE_STANDARD)
+			rss_queue_arr[j++] = i;
+	}
+	rss_queue_n = j;
+	if (rss_queue_n > priv->config.ind_table_max_size) {
+		DRV_LOG(ERR, "port %u cannot handle this many Rx queues (%u)",
+			dev->data->port_id, rss_queue_n);
+		rte_errno = EINVAL;
+		rte_free(rss_queue_arr);
+		return -rte_errno;
+	}
+	DRV_LOG(INFO, "port %u Rx queues number update: %u -> %u",
+		dev->data->port_id, priv->rxqs_n, rxqs_n);
+	priv->rxqs_n = rxqs_n;
+	/*
+	 * If the requested number of RX queues is not a power of two,
+	 * use the maximum indirection table size for better balancing.
+	 * The result is always rounded to the next power of two.
+	 */
+	reta_idx_n = (1 << log2above((rss_queue_n & (rss_queue_n - 1)) ?
+				priv->config.ind_table_max_size :
+				rss_queue_n));
+	ret = mlx5_rss_reta_index_resize(dev, reta_idx_n);
+	if (ret) {
+		rte_free(rss_queue_arr);
+		return ret;
+	}
+	/*
+	 * When the number of RX queues is not a power of two,
+	 * the remaining table entries are padded with reused WQs
+	 * and hashes are not spread uniformly.
+	 */
+	for (i = 0, j = 0; (i != reta_idx_n); ++i) {
+		(*priv->reta_idx)[i] = rss_queue_arr[j];
+		if (++j == rss_queue_n)
+			j = 0;
+	}
+	rte_free(rss_queue_arr);
+	return ret;
+}
+
+/**
  * Sets default tuning parameters.
  *
  * @param dev
diff --git a/drivers/net/mlx5/mlx5_rss.c b/drivers/net/mlx5/mlx5_rss.c
index 891d764..1028264 100644
--- a/drivers/net/mlx5/mlx5_rss.c
+++ b/drivers/net/mlx5/mlx5_rss.c
@@ -223,6 +223,7 @@
 	}
 	if (dev->data->dev_started) {
 		mlx5_dev_stop(dev);
+		priv->skip_default_rss_reta = 1;
 		return mlx5_dev_start(dev);
 	}
 	return 0;
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index c70e161..2c3d5eb 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -2156,9 +2156,13 @@ struct mlx5_rxq_ctrl *
 		}
 	} else { /* ind_tbl->type == MLX5_IND_TBL_TYPE_DEVX */
 		struct mlx5_devx_rqt_attr *rqt_attr = NULL;
+		const unsigned int rqt_n =
+			1 << (rte_is_power_of_2(queues_n) ?
+			      log2above(queues_n) :
+			      log2above(priv->config.ind_table_max_size));
 
 		rqt_attr = rte_calloc(__func__, 1, sizeof(*rqt_attr) +
-				      queues_n * sizeof(uint32_t), 0);
+				      rqt_n * sizeof(uint32_t), 0);
 		if (!rqt_attr) {
 			DRV_LOG(ERR, "port %u cannot allocate RQT resources",
 				dev->data->port_id);
@@ -2166,7 +2170,7 @@ struct mlx5_rxq_ctrl *
 			goto error;
 		}
 		rqt_attr->rqt_max_size = priv->config.ind_table_max_size;
-		rqt_attr->rqt_actual_size = queues_n;
+		rqt_attr->rqt_actual_size = rqt_n;
 		for (i = 0; i != queues_n; ++i) {
 			struct mlx5_rxq_ctrl *rxq = mlx5_rxq_get(dev,
 								 queues[i]);
@@ -2175,6 +2179,9 @@ struct mlx5_rxq_ctrl *
 			rqt_attr->rq_list[i] = rxq->obj->rq->id;
 			ind_tbl->queues[i] = queues[i];
 		}
+		k = i; /* Retain value of i for use in error case. */
+		for (j = 0; k != rqt_n; ++k, ++j)
+			rqt_attr->rq_list[k] = rqt_attr->rq_list[j];
 		ind_tbl->rqt = mlx5_devx_cmd_create_rqt(priv->sh->ctx,
 							rqt_attr);
 		rte_free(rqt_attr);
@@ -2328,13 +2335,13 @@ struct mlx5_hrxq *
 	struct mlx5_ind_table_obj *ind_tbl;
 	int err;
 	struct mlx5_devx_obj *tir = NULL;
+	struct mlx5_rxq_data *rxq_data = (*priv->rxqs)[queues[0]];
+	struct mlx5_rxq_ctrl *rxq_ctrl =
+		container_of(rxq_data, struct mlx5_rxq_ctrl, rxq);
 
 	queues_n = hash_fields ? queues_n : 1;
 	ind_tbl = mlx5_ind_table_obj_get(dev, queues, queues_n);
 	if (!ind_tbl) {
-		struct mlx5_rxq_data *rxq_data = (*priv->rxqs)[queues[0]];
-		struct mlx5_rxq_ctrl *rxq_ctrl =
-			container_of(rxq_data, struct mlx5_rxq_ctrl, rxq);
 		enum mlx5_ind_tbl_type type;
 
 		type = rxq_ctrl->obj->type == MLX5_RXQ_OBJ_TYPE_IBV ?
@@ -2430,7 +2437,10 @@ struct mlx5_hrxq *
 		tir_attr.rx_hash_fn = MLX5_RX_HASH_FN_TOEPLITZ;
 		memcpy(&tir_attr.rx_hash_field_selector_outer, &hash_fields,
 		       sizeof(uint64_t));
-		tir_attr.transport_domain = priv->sh->tdn;
+		if (rxq_ctrl->obj->type == MLX5_RXQ_OBJ_TYPE_DEVX_HAIRPIN)
+			tir_attr.transport_domain = priv->sh->td->id;
+		else
+			tir_attr.transport_domain = priv->sh->tdn;
 		memcpy(tir_attr.rx_hash_toeplitz_key, rss_key, rss_key_len);
 		tir_attr.indirect_table = ind_tbl->rqt->id;
 		if (dev->data->dev_conf.lpbk_mode)
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index a4fcdb3..f66b6ee 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -269,6 +269,12 @@
 	int ret;
 
 	DRV_LOG(DEBUG, "port %u starting device", dev->data->port_id);
+	ret = mlx5_dev_configure_rss_reta(dev);
+	if (ret) {
+		DRV_LOG(ERR, "port %u reta config failed: %s",
+			dev->data->port_id, strerror(rte_errno));
+		return -rte_errno;
+	}
 	ret = mlx5_txq_start(dev);
 	if (ret) {
 		DRV_LOG(ERR, "port %u Tx queue allocation failed: %s",
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 186+ messages in thread

* [dpdk-dev] [PATCH v6 11/14] net/mlx5: add internal tag item and action
  2019-10-27 12:24 ` [dpdk-dev] [PATCH v6 00/14] " Ori Kam
                     ` (9 preceding siblings ...)
  2019-10-27 12:24   ` [dpdk-dev] [PATCH v6 10/14] net/mlx5: add support for hairpin hrxq Ori Kam
@ 2019-10-27 12:24   ` Ori Kam
  2019-10-27 12:24   ` [dpdk-dev] [PATCH v6 12/14] net/mlx5: add id generation function Ori Kam
                     ` (2 subsequent siblings)
  13 siblings, 0 replies; 186+ messages in thread
From: Ori Kam @ 2019-10-27 12:24 UTC (permalink / raw)
  To: Matan Azrad, Shahaf Shuler, Viacheslav Ovsiienko
  Cc: dev, orika, jingjing.wu, stephen

This commit introduces setting and matching on registers.
This item and action will be used with a number of different
features such as hairpin, metering and metadata.
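
For instance, the Rx part of a split hairpin flow could append the
internal action added below to write a flow ID into a register. A
minimal sketch (flow_id is an illustrative variable; the hairpin Rx
side resolves to REG_B in flow_get_reg_id()):

	struct mlx5_rte_flow_action_set_tag set_tag = {
		.id = REG_B, /* Hairpin Rx side register. */
		.data = rte_cpu_to_be_32(flow_id), /* ID to carry to Tx. */
	};
	struct rte_flow_action tag_action = {
		.type = MLX5_RTE_FLOW_ACTION_TYPE_TAG,
		.conf = &set_tag,
	};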

Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/net/mlx5/mlx5_flow.c    |  52 +++++++++++++
 drivers/net/mlx5/mlx5_flow.h    |  48 +++++++++++-
 drivers/net/mlx5/mlx5_flow_dv.c | 158 +++++++++++++++++++++++++++++++++++++++-
 drivers/net/mlx5/mlx5_prm.h     |   3 +-
 4 files changed, 254 insertions(+), 7 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index d4d956f..a309b6f 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -316,6 +316,58 @@ struct mlx5_flow_tunnel_info {
 	},
 };
 
+enum mlx5_feature_name {
+	MLX5_HAIRPIN_RX,
+	MLX5_HAIRPIN_TX,
+	MLX5_APPLICATION,
+};
+
+/**
+ * Translate tag ID to register.
+ *
+ * @param[in] dev
+ *   Pointer to the Ethernet device structure.
+ * @param[in] feature
+ *   The feature that requests the register.
+ * @param[in] id
+ *   The requested register ID.
+ * @param[out] error
+ *   Error description in case of any.
+ *
+ * @return
+ *   The requested register on success, a negative errno
+ *   value otherwise and rte_errno is set.
+ */
+__rte_unused
+static enum modify_reg flow_get_reg_id(struct rte_eth_dev *dev,
+				       enum mlx5_feature_name feature,
+				       uint32_t id,
+				       struct rte_flow_error *error)
+{
+	static enum modify_reg id2reg[] = {
+		[0] = REG_A,
+		[1] = REG_C_2,
+		[2] = REG_C_3,
+		[3] = REG_C_4,
+		[4] = REG_B,};
+
+	dev = (void *)dev;
+	switch (feature) {
+	case MLX5_HAIRPIN_RX:
+		return REG_B;
+	case MLX5_HAIRPIN_TX:
+		return REG_A;
+	case MLX5_APPLICATION:
+		if (id > 4)
+			return rte_flow_error_set(error, EINVAL,
+						  RTE_FLOW_ERROR_TYPE_ITEM,
+						  NULL, "invalid tag id");
+		return id2reg[id];
+	}
+	return rte_flow_error_set(error, EINVAL, RTE_FLOW_ERROR_TYPE_ITEM,
+				  NULL, "invalid feature name");
+}
+
 /**
  * Discover the maximum number of priority available.
  *
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 9658db1..a79b48b 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -27,6 +27,43 @@
 #include "mlx5.h"
 #include "mlx5_prm.h"
 
+enum modify_reg {
+	REG_A,
+	REG_B,
+	REG_C_0,
+	REG_C_1,
+	REG_C_2,
+	REG_C_3,
+	REG_C_4,
+	REG_C_5,
+	REG_C_6,
+	REG_C_7,
+};
+
+/* Private rte flow items. */
+enum mlx5_rte_flow_item_type {
+	MLX5_RTE_FLOW_ITEM_TYPE_END = INT_MIN,
+	MLX5_RTE_FLOW_ITEM_TYPE_TAG,
+};
+
+/* Private rte flow actions. */
+enum mlx5_rte_flow_action_type {
+	MLX5_RTE_FLOW_ACTION_TYPE_END = INT_MIN,
+	MLX5_RTE_FLOW_ACTION_TYPE_TAG,
+};
+
+/* Matches on selected register. */
+struct mlx5_rte_flow_item_tag {
+	uint16_t id;
+	rte_be32_t data;
+};
+
+/* Modify selected register. */
+struct mlx5_rte_flow_action_set_tag {
+	uint16_t id;
+	rte_be32_t data;
+};
+
 /* Pattern outer Layer bits. */
 #define MLX5_FLOW_LAYER_OUTER_L2 (1u << 0)
 #define MLX5_FLOW_LAYER_OUTER_L3_IPV4 (1u << 1)
@@ -53,11 +90,12 @@
 /* General pattern items bits. */
 #define MLX5_FLOW_ITEM_METADATA (1u << 16)
 #define MLX5_FLOW_ITEM_PORT_ID (1u << 17)
+#define MLX5_FLOW_ITEM_TAG (1u << 18)
 
 /* Pattern MISC bits. */
-#define MLX5_FLOW_LAYER_ICMP (1u << 18)
-#define MLX5_FLOW_LAYER_ICMP6 (1u << 19)
-#define MLX5_FLOW_LAYER_GRE_KEY (1u << 20)
+#define MLX5_FLOW_LAYER_ICMP (1u << 19)
+#define MLX5_FLOW_LAYER_ICMP6 (1u << 20)
+#define MLX5_FLOW_LAYER_GRE_KEY (1u << 21)
 
 /* Pattern tunnel Layer bits (continued). */
 #define MLX5_FLOW_LAYER_IPIP (1u << 21)
@@ -141,6 +179,7 @@
 #define MLX5_FLOW_ACTION_DEC_TCP_SEQ (1u << 29)
 #define MLX5_FLOW_ACTION_INC_TCP_ACK (1u << 30)
 #define MLX5_FLOW_ACTION_DEC_TCP_ACK (1u << 31)
+#define MLX5_FLOW_ACTION_SET_TAG (1ull << 32)
 
 #define MLX5_FLOW_FATE_ACTIONS \
 	(MLX5_FLOW_ACTION_DROP | MLX5_FLOW_ACTION_QUEUE | \
@@ -174,7 +213,8 @@
 				      MLX5_FLOW_ACTION_DEC_TCP_SEQ | \
 				      MLX5_FLOW_ACTION_INC_TCP_ACK | \
 				      MLX5_FLOW_ACTION_DEC_TCP_ACK | \
-				      MLX5_FLOW_ACTION_OF_SET_VLAN_VID)
+				      MLX5_FLOW_ACTION_OF_SET_VLAN_VID | \
+				      MLX5_FLOW_ACTION_SET_TAG)
 
 #define MLX5_FLOW_VLAN_ACTIONS (MLX5_FLOW_ACTION_OF_POP_VLAN | \
 				MLX5_FLOW_ACTION_OF_PUSH_VLAN)
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index e5f4c4c..1c9dc36 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -724,6 +724,59 @@ struct field_modify_info modify_tcp[] = {
 					     MLX5_MODIFICATION_TYPE_ADD, error);
 }
 
+static enum mlx5_modification_field reg_to_field[] = {
+	[REG_A] = MLX5_MODI_META_DATA_REG_A,
+	[REG_B] = MLX5_MODI_META_DATA_REG_B,
+	[REG_C_0] = MLX5_MODI_META_REG_C_0,
+	[REG_C_1] = MLX5_MODI_META_REG_C_1,
+	[REG_C_2] = MLX5_MODI_META_REG_C_2,
+	[REG_C_3] = MLX5_MODI_META_REG_C_3,
+	[REG_C_4] = MLX5_MODI_META_REG_C_4,
+	[REG_C_5] = MLX5_MODI_META_REG_C_5,
+	[REG_C_6] = MLX5_MODI_META_REG_C_6,
+	[REG_C_7] = MLX5_MODI_META_REG_C_7,
+};
+
+/**
+ * Convert register set to DV specification.
+ *
+ * @param[in,out] resource
+ *   Pointer to the modify-header resource.
+ * @param[in] action
+ *   Pointer to action specification.
+ * @param[out] error
+ *   Pointer to the error structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+flow_dv_convert_action_set_reg
+			(struct mlx5_flow_dv_modify_hdr_resource *resource,
+			 const struct rte_flow_action *action,
+			 struct rte_flow_error *error)
+{
+	const struct mlx5_rte_flow_action_set_tag *conf = (action->conf);
+	struct mlx5_modification_cmd *actions = resource->actions;
+	uint32_t i = resource->actions_num;
+
+	if (i >= MLX5_MODIFY_NUM)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_ACTION, NULL,
+					  "too many items to modify");
+	actions[i].action_type = MLX5_MODIFICATION_TYPE_SET;
+	actions[i].field = reg_to_field[conf->id];
+	actions[i].data0 = rte_cpu_to_be_32(actions[i].data0);
+	actions[i].data1 = conf->data;
+	++i;
+	resource->actions_num = i;
+	if (!resource->actions_num)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_ACTION, NULL,
+					  "invalid modification flow item");
+	return 0;
+}
+
 /**
  * Validate META item.
  *
@@ -4720,6 +4773,94 @@ struct field_modify_info modify_tcp[] = {
 }
 
 /**
+ * Add tag item to matcher.
+ *
+ * @param[in, out] matcher
+ *   Flow matcher.
+ * @param[in, out] key
+ *   Flow matcher value.
+ * @param[in] item
+ *   Flow pattern to translate.
+ */
+static void
+flow_dv_translate_item_tag(void *matcher, void *key,
+			   const struct rte_flow_item *item)
+{
+	void *misc2_m =
+		MLX5_ADDR_OF(fte_match_param, matcher, misc_parameters_2);
+	void *misc2_v =
+		MLX5_ADDR_OF(fte_match_param, key, misc_parameters_2);
+	const struct mlx5_rte_flow_item_tag *tag_v = item->spec;
+	const struct mlx5_rte_flow_item_tag *tag_m = item->mask;
+	enum modify_reg reg = tag_v->id;
+	rte_be32_t value = tag_v->data;
+	rte_be32_t mask = tag_m->data;
+
+	switch (reg) {
+	case REG_A:
+		MLX5_SET(fte_match_set_misc2, misc2_m, metadata_reg_a,
+				rte_be_to_cpu_32(mask));
+		MLX5_SET(fte_match_set_misc2, misc2_v, metadata_reg_a,
+				rte_be_to_cpu_32(value));
+		break;
+	case REG_B:
+		MLX5_SET(fte_match_set_misc2, misc2_m, metadata_reg_b,
+				 rte_be_to_cpu_32(mask));
+		MLX5_SET(fte_match_set_misc2, misc2_v, metadata_reg_b,
+				rte_be_to_cpu_32(value));
+		break;
+	case REG_C_0:
+		MLX5_SET(fte_match_set_misc2, misc2_m, metadata_reg_c_0,
+				 rte_be_to_cpu_32(mask));
+		MLX5_SET(fte_match_set_misc2, misc2_v, metadata_reg_c_0,
+				rte_be_to_cpu_32(value));
+		break;
+	case REG_C_1:
+		MLX5_SET(fte_match_set_misc2, misc2_m, metadata_reg_c_1,
+				 rte_be_to_cpu_32(mask));
+		MLX5_SET(fte_match_set_misc2, misc2_v, metadata_reg_c_1,
+				rte_be_to_cpu_32(value));
+		break;
+	case REG_C_2:
+		MLX5_SET(fte_match_set_misc2, misc2_m, metadata_reg_c_2,
+				 rte_be_to_cpu_32(mask));
+		MLX5_SET(fte_match_set_misc2, misc2_v, metadata_reg_c_2,
+				rte_be_to_cpu_32(value));
+		break;
+	case REG_C_3:
+		MLX5_SET(fte_match_set_misc2, misc2_m, metadata_reg_c_3,
+				 rte_be_to_cpu_32(mask));
+		MLX5_SET(fte_match_set_misc2, misc2_v, metadata_reg_c_3,
+				rte_be_to_cpu_32(value));
+		break;
+	case REG_C_4:
+		MLX5_SET(fte_match_set_misc2, misc2_m, metadata_reg_c_4,
+				 rte_be_to_cpu_32(mask));
+		MLX5_SET(fte_match_set_misc2, misc2_v, metadata_reg_c_4,
+				rte_be_to_cpu_32(value));
+		break;
+	case REG_C_5:
+		MLX5_SET(fte_match_set_misc2, misc2_m, metadata_reg_c_5,
+				 rte_be_to_cpu_32(mask));
+		MLX5_SET(fte_match_set_misc2, misc2_v, metadata_reg_c_5,
+				rte_be_to_cpu_32(value));
+		break;
+	case REG_C_6:
+		MLX5_SET(fte_match_set_misc2, misc2_m, metadata_reg_c_6,
+				 rte_be_to_cpu_32(mask));
+		MLX5_SET(fte_match_set_misc2, misc2_v, metadata_reg_c_6,
+				rte_be_to_cpu_32(value));
+		break;
+	case REG_C_7:
+		MLX5_SET(fte_match_set_misc2, misc2_m, metadata_reg_c_7,
+				 rte_be_to_cpu_32(mask));
+		MLX5_SET(fte_match_set_misc2, misc2_v, metadata_reg_c_7,
+				rte_be_to_cpu_32(value));
+		break;
+	}
+}
+
+/**
  * Add source vport match to the specified matcher.
  *
  * @param[in, out] matcher
@@ -5305,8 +5446,9 @@ struct field_modify_info modify_tcp[] = {
 		struct mlx5_flow_tbl_resource *tbl;
 		uint32_t port_id = 0;
 		struct mlx5_flow_dv_port_id_action_resource port_id_resource;
+		int action_type = actions->type;
 
-		switch (actions->type) {
+		switch (action_type) {
 		case RTE_FLOW_ACTION_TYPE_VOID:
 			break;
 		case RTE_FLOW_ACTION_TYPE_PORT_ID:
@@ -5621,6 +5763,12 @@ struct field_modify_info modify_tcp[] = {
 					MLX5_FLOW_ACTION_INC_TCP_ACK :
 					MLX5_FLOW_ACTION_DEC_TCP_ACK;
 			break;
+		case MLX5_RTE_FLOW_ACTION_TYPE_TAG:
+			if (flow_dv_convert_action_set_reg(&res, actions,
+							   error))
+				return -rte_errno;
+			action_flags |= MLX5_FLOW_ACTION_SET_TAG;
+			break;
 		case RTE_FLOW_ACTION_TYPE_END:
 			actions_end = true;
 			if (action_flags & MLX5_FLOW_MODIFY_HDR_ACTIONS) {
@@ -5645,8 +5793,9 @@ struct field_modify_info modify_tcp[] = {
 	flow->actions = action_flags;
 	for (; items->type != RTE_FLOW_ITEM_TYPE_END; items++) {
 		int tunnel = !!(item_flags & MLX5_FLOW_LAYER_TUNNEL);
+		int item_type = items->type;
 
-		switch (items->type) {
+		switch (item_type) {
 		case RTE_FLOW_ITEM_TYPE_PORT_ID:
 			flow_dv_translate_item_port_id(dev, match_mask,
 						       match_value, items);
@@ -5797,6 +5946,11 @@ struct field_modify_info modify_tcp[] = {
 						      items, tunnel);
 			last_item = MLX5_FLOW_LAYER_ICMP6;
 			break;
+		case MLX5_RTE_FLOW_ITEM_TYPE_TAG:
+			flow_dv_translate_item_tag(match_mask, match_value,
+						   items);
+			last_item = MLX5_FLOW_ITEM_TAG;
+			break;
 		default:
 			break;
 		}
diff --git a/drivers/net/mlx5/mlx5_prm.h b/drivers/net/mlx5/mlx5_prm.h
index e4b19f8..96b9166 100644
--- a/drivers/net/mlx5/mlx5_prm.h
+++ b/drivers/net/mlx5/mlx5_prm.h
@@ -628,7 +628,8 @@ struct mlx5_ifc_fte_match_set_misc2_bits {
 	u8 metadata_reg_c_1[0x20];
 	u8 metadata_reg_c_0[0x20];
 	u8 metadata_reg_a[0x20];
-	u8 reserved_at_1a0[0x60];
+	u8 metadata_reg_b[0x20];
+	u8 reserved_at_1c0[0x40];
 };
 
 struct mlx5_ifc_fte_match_set_misc3_bits {
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 186+ messages in thread

* [dpdk-dev] [PATCH v6 12/14] net/mlx5: add id generation function
  2019-10-27 12:24 ` [dpdk-dev] [PATCH v6 00/14] " Ori Kam
                     ` (10 preceding siblings ...)
  2019-10-27 12:24   ` [dpdk-dev] [PATCH v6 11/14] net/mlx5: add internal tag item and action Ori Kam
@ 2019-10-27 12:24   ` Ori Kam
  2019-10-27 12:25   ` [dpdk-dev] [PATCH v6 13/14] net/mlx5: add default flows for hairpin Ori Kam
  2019-10-27 12:25   ` [dpdk-dev] [PATCH v6 14/14] net/mlx5: split hairpin flows Ori Kam
  13 siblings, 0 replies; 186+ messages in thread
From: Ori Kam @ 2019-10-27 12:24 UTC (permalink / raw)
  To: Matan Azrad, Shahaf Shuler, Viacheslav Ovsiienko
  Cc: dev, orika, jingjing.wu, stephen

When splitting flows, for example for hairpin or metering, there is a
need to link the resulting flows together. This is done using an ID.
This commit introduces a simple way to generate such IDs.

A bitmap was not used because its release and allocation are O(n),
while in the chosen approach both allocation and release are O(1).
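
A minimal usage sketch of the API added below (error handling mostly
elided; variable names are illustrative):

	struct mlx5_flow_id_pool *pool = mlx5_flow_id_pool_alloc();
	uint32_t id;

	if (pool && !mlx5_flow_id_get(pool, &id)) {
		/* ... use id to link the related flows ... */
		mlx5_flow_id_release(pool, id);
	}
	if (pool)
		mlx5_flow_id_pool_release(pool);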

Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>

---
 drivers/net/mlx5/mlx5.c      | 120 ++++++++++++++++++++++++++++++++++++++++++-
 drivers/net/mlx5/mlx5_flow.h |  14 +++++
 2 files changed, 133 insertions(+), 1 deletion(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index b0fdd9b..b7a98b8 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -178,6 +178,124 @@ struct mlx5_dev_spawn_data {
 static LIST_HEAD(, mlx5_ibv_shared) mlx5_ibv_list = LIST_HEAD_INITIALIZER();
 static pthread_mutex_t mlx5_ibv_list_mutex = PTHREAD_MUTEX_INITIALIZER;
 
+#define MLX5_FLOW_MIN_ID_POOL_SIZE 512
+#define MLX5_ID_GENERATION_ARRAY_FACTOR 16
+
+/**
+ * Allocate ID pool structure.
+ *
+ * @return
+ *   Pointer to pool object, NULL value otherwise.
+ */
+struct mlx5_flow_id_pool *
+mlx5_flow_id_pool_alloc(void)
+{
+	struct mlx5_flow_id_pool *pool;
+	void *mem;
+
+	pool = rte_zmalloc("id pool allocation", sizeof(*pool),
+			   RTE_CACHE_LINE_SIZE);
+	if (!pool) {
+		DRV_LOG(ERR, "can't allocate id pool");
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+	mem = rte_zmalloc("", MLX5_FLOW_MIN_ID_POOL_SIZE * sizeof(uint32_t),
+			  RTE_CACHE_LINE_SIZE);
+	if (!mem) {
+		DRV_LOG(ERR, "can't allocate mem for id pool");
+		rte_errno = ENOMEM;
+		goto error;
+	}
+	pool->free_arr = mem;
+	pool->curr = pool->free_arr;
+	pool->last = pool->free_arr + MLX5_FLOW_MIN_ID_POOL_SIZE;
+	pool->base_index = 0;
+	return pool;
+error:
+	rte_free(pool);
+	return NULL;
+}
+
+/**
+ * Release ID pool structure.
+ *
+ * @param[in] pool
+ *   Pointer to flow id pool object to free.
+ */
+void
+mlx5_flow_id_pool_release(struct mlx5_flow_id_pool *pool)
+{
+	rte_free(pool->free_arr);
+	rte_free(pool);
+}
+
+/**
+ * Generate ID.
+ *
+ * @param[in] pool
+ *   Pointer to flow id pool.
+ * @param[out] id
+ *   The generated ID.
+ *
+ * @return
+ *   0 on success, error value otherwise.
+ */
+uint32_t
+mlx5_flow_id_get(struct mlx5_flow_id_pool *pool, uint32_t *id)
+{
+	if (pool->curr == pool->free_arr) {
+		if (pool->base_index == UINT32_MAX) {
+			rte_errno = ENOMEM;
+			DRV_LOG(ERR, "no free id");
+			return -rte_errno;
+		}
+		*id = ++pool->base_index;
+		return 0;
+	}
+	*id = *(--pool->curr);
+	return 0;
+}
+
+/**
+ * Release ID.
+ *
+ * @param[in] pool
+ *   Pointer to flow id pool.
+ * @param[in] id
+ *   The ID to release.
+ *
+ * @return
+ *   0 on success, error value otherwise.
+ */
+uint32_t
+mlx5_flow_id_release(struct mlx5_flow_id_pool *pool, uint32_t id)
+{
+	uint32_t size;
+	uint32_t size2;
+	void *mem;
+
+	if (pool->curr == pool->last) {
+		size = pool->curr - pool->free_arr;
+		size2 = size * MLX5_ID_GENERATION_ARRAY_FACTOR;
+		assert(size2 > size);
+		mem = rte_malloc("", size2 * sizeof(uint32_t), 0);
+		if (!mem) {
+			DRV_LOG(ERR, "can't allocate mem for id pool");
+			rte_errno = ENOMEM;
+			return -rte_errno;
+		}
+		memcpy(mem, pool->free_arr, size * sizeof(uint32_t));
+		rte_free(pool->free_arr);
+		pool->free_arr = mem;
+		pool->curr = pool->free_arr + size;
+		pool->last = pool->free_arr + size2;
+	}
+	*pool->curr = id;
+	pool->curr++;
+	return 0;
+}
+
 /**
  * Initialize the counters management structure.
  *
@@ -328,7 +446,7 @@ struct mlx5_dev_spawn_data {
 	struct mlx5_devx_tis_attr tis_attr = { 0 };
 #endif
 
-	assert(spawn);
+	assert(spawn);
 	/* Secondary process should not create the shared context. */
 	assert(rte_eal_process_type() == RTE_PROC_PRIMARY);
 	pthread_mutex_lock(&mlx5_ibv_list_mutex);
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index a79b48b..fddc06b 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -527,8 +527,22 @@ struct mlx5_flow_driver_ops {
 #define MLX5_CNT_CONTAINER_UNUSED(sh, batch, thread) (&(sh)->cmng.ccont \
 	[(~((sh)->cmng.mhi[batch] >> (thread)) & 0x1) * 2 + (batch)])
 
+/* ID generation structure. */
+struct mlx5_flow_id_pool {
+	uint32_t *free_arr; /**< Pointer to the array of free values. */
+	uint32_t base_index;
+	/**< The next index that can be used without any free elements. */
+	uint32_t *curr; /**< Pointer to the index to pop. */
+	uint32_t *last; /**< Pointer to the last element in the empty array. */
+};
+
 /* mlx5_flow.c */
 
+struct mlx5_flow_id_pool *mlx5_flow_id_pool_alloc(void);
+void mlx5_flow_id_pool_release(struct mlx5_flow_id_pool *pool);
+uint32_t mlx5_flow_id_get(struct mlx5_flow_id_pool *pool, uint32_t *id);
+uint32_t mlx5_flow_id_release(struct mlx5_flow_id_pool *pool,
+			      uint32_t id);
 int mlx5_flow_group_to_table(const struct rte_flow_attr *attributes,
 			     bool external, uint32_t group, uint32_t *table,
 			     struct rte_flow_error *error);
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 186+ messages in thread

* [dpdk-dev] [PATCH v6 13/14] net/mlx5: add default flows for hairpin
  2019-10-27 12:24 ` [dpdk-dev] [PATCH v6 00/14] " Ori Kam
                     ` (11 preceding siblings ...)
  2019-10-27 12:24   ` [dpdk-dev] [PATCH v6 12/14] net/mlx5: add id generation function Ori Kam
@ 2019-10-27 12:25   ` Ori Kam
  2019-10-27 12:25   ` [dpdk-dev] [PATCH v6 14/14] net/mlx5: split hairpin flows Ori Kam
  13 siblings, 0 replies; 186+ messages in thread
From: Ori Kam @ 2019-10-27 12:25 UTC (permalink / raw)
  To: Matan Azrad, Shahaf Shuler, Viacheslav Ovsiienko
  Cc: dev, orika, jingjing.wu, stephen

When using hairpin, all traffic from Tx hairpin queues should jump
to a dedicated table where matching can be done using registers.
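
Schematically, the default rule that mlx5_ctrl_flow_source_queue()
below installs for each hairpin Tx queue is:

	attr:    egress
	pattern: internal TX_QUEUE item, spec = { .queue = <Tx queue index> }
	         (translated into a match on the queue's SQ number)
	actions: JUMP to group MLX5_HAIRPIN_TX_TABLE (UINT16_MAX - 1), END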

Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/net/mlx5/mlx5.h         |  2 ++
 drivers/net/mlx5/mlx5_flow.c    | 60 +++++++++++++++++++++++++++++++++++++++
 drivers/net/mlx5/mlx5_flow.h    |  9 ++++++
 drivers/net/mlx5/mlx5_flow_dv.c | 63 +++++++++++++++++++++++++++++++++++++++--
 drivers/net/mlx5/mlx5_trigger.c | 18 ++++++++++++
 5 files changed, 150 insertions(+), 2 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index a36ba2d..1181c1f 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -560,6 +560,7 @@ struct mlx5_flow_tbl_resource {
 };
 
 #define MLX5_MAX_TABLES UINT16_MAX
+#define MLX5_HAIRPIN_TX_TABLE (UINT16_MAX - 1)
 #define MLX5_MAX_TABLES_FDB UINT16_MAX
 
 #define MLX5_DBR_PAGE_SIZE 4096 /* Must be >= 512. */
@@ -883,6 +884,7 @@ int mlx5_dev_filter_ctrl(struct rte_eth_dev *dev,
 int mlx5_flow_start(struct rte_eth_dev *dev, struct mlx5_flows *list);
 void mlx5_flow_stop(struct rte_eth_dev *dev, struct mlx5_flows *list);
 int mlx5_flow_verify(struct rte_eth_dev *dev);
+int mlx5_ctrl_flow_source_queue(struct rte_eth_dev *dev, uint32_t queue);
 int mlx5_ctrl_flow_vlan(struct rte_eth_dev *dev,
 			struct rte_flow_item_eth *eth_spec,
 			struct rte_flow_item_eth *eth_mask,
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index a309b6f..1148db0 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -2820,6 +2820,66 @@ struct rte_flow *
 }
 
 /**
+ * Enable default hairpin egress flow.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param queue
+ *   The queue index.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_ctrl_flow_source_queue(struct rte_eth_dev *dev,
+			    uint32_t queue)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	const struct rte_flow_attr attr = {
+		.egress = 1,
+		.priority = 0,
+	};
+	struct mlx5_rte_flow_item_tx_queue queue_spec = {
+		.queue = queue,
+	};
+	struct mlx5_rte_flow_item_tx_queue queue_mask = {
+		.queue = UINT32_MAX,
+	};
+	struct rte_flow_item items[] = {
+		{
+			.type = MLX5_RTE_FLOW_ITEM_TYPE_TX_QUEUE,
+			.spec = &queue_spec,
+			.last = NULL,
+			.mask = &queue_mask,
+		},
+		{
+			.type = RTE_FLOW_ITEM_TYPE_END,
+		},
+	};
+	struct rte_flow_action_jump jump = {
+		.group = MLX5_HAIRPIN_TX_TABLE,
+	};
+	struct rte_flow_action actions[2];
+	struct rte_flow *flow;
+	struct rte_flow_error error;
+
+	actions[0].type = RTE_FLOW_ACTION_TYPE_JUMP;
+	actions[0].conf = &jump;
+	actions[1].type = RTE_FLOW_ACTION_TYPE_END;
+	flow = flow_list_create(dev, &priv->ctrl_flows,
+				&attr, items, actions, false, &error);
+	if (!flow) {
+		DRV_LOG(DEBUG,
+			"Failed to create ctrl flow: rte_errno(%d),"
+			" type(%d), message(%s)\n",
+			rte_errno, error.type,
+			error.message ? error.message : " (no stated reason)");
+		return -rte_errno;
+	}
+	return 0;
+}
+
+/**
  * Enable a control flow configured from the control plane.
  *
  * @param dev
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index fddc06b..f81e1b1 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -44,6 +44,7 @@ enum modify_reg {
 enum mlx5_rte_flow_item_type {
 	MLX5_RTE_FLOW_ITEM_TYPE_END = INT_MIN,
 	MLX5_RTE_FLOW_ITEM_TYPE_TAG,
+	MLX5_RTE_FLOW_ITEM_TYPE_TX_QUEUE,
 };
 
 /* Private rte flow actions. */
@@ -64,6 +65,11 @@ struct mlx5_rte_flow_action_set_tag {
 	rte_be32_t data;
 };
 
+/* Matches on source queue. */
+struct mlx5_rte_flow_item_tx_queue {
+	uint32_t queue;
+};
+
 /* Pattern outer Layer bits. */
 #define MLX5_FLOW_LAYER_OUTER_L2 (1u << 0)
 #define MLX5_FLOW_LAYER_OUTER_L3_IPV4 (1u << 1)
@@ -103,6 +109,9 @@ struct mlx5_rte_flow_action_set_tag {
 #define MLX5_FLOW_LAYER_NVGRE (1u << 23)
 #define MLX5_FLOW_LAYER_GENEVE (1u << 24)
 
+/* Queue items. */
+#define MLX5_FLOW_ITEM_TX_QUEUE (1u << 25)
+
 /* Outer Masks. */
 #define MLX5_FLOW_LAYER_OUTER_L3 \
 	(MLX5_FLOW_LAYER_OUTER_L3_IPV4 | MLX5_FLOW_LAYER_OUTER_L3_IPV6)
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index 1c9dc36..13178cc 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -3358,7 +3358,9 @@ struct field_modify_info modify_tcp[] = {
 		return ret;
 	for (; items->type != RTE_FLOW_ITEM_TYPE_END; items++) {
 		int tunnel = !!(item_flags & MLX5_FLOW_LAYER_TUNNEL);
-		switch (items->type) {
+		int type = items->type;
+
+		switch (type) {
 		case RTE_FLOW_ITEM_TYPE_VOID:
 			break;
 		case RTE_FLOW_ITEM_TYPE_PORT_ID:
@@ -3527,6 +3529,9 @@ struct field_modify_info modify_tcp[] = {
 				return ret;
 			last_item = MLX5_FLOW_LAYER_ICMP6;
 			break;
+		case MLX5_RTE_FLOW_ITEM_TYPE_TAG:
+		case MLX5_RTE_FLOW_ITEM_TYPE_TX_QUEUE:
+			break;
 		default:
 			return rte_flow_error_set(error, ENOTSUP,
 						  RTE_FLOW_ERROR_TYPE_ITEM,
@@ -3535,11 +3540,12 @@ struct field_modify_info modify_tcp[] = {
 		item_flags |= last_item;
 	}
 	for (; actions->type != RTE_FLOW_ACTION_TYPE_END; actions++) {
+		int type = actions->type;
 		if (actions_n == MLX5_DV_MAX_NUMBER_OF_ACTIONS)
 			return rte_flow_error_set(error, ENOTSUP,
 						  RTE_FLOW_ERROR_TYPE_ACTION,
 						  actions, "too many actions");
-		switch (actions->type) {
+		switch (type) {
 		case RTE_FLOW_ACTION_TYPE_VOID:
 			break;
 		case RTE_FLOW_ACTION_TYPE_PORT_ID:
@@ -3805,6 +3811,8 @@ struct field_modify_info modify_tcp[] = {
 						MLX5_FLOW_ACTION_INC_TCP_ACK :
 						MLX5_FLOW_ACTION_DEC_TCP_ACK;
 			break;
+		case MLX5_RTE_FLOW_ACTION_TYPE_TAG:
+			break;
 		default:
 			return rte_flow_error_set(error, ENOTSUP,
 						  RTE_FLOW_ERROR_TYPE_ACTION,
@@ -5371,6 +5379,51 @@ struct field_modify_info modify_tcp[] = {
 }
 
 /**
+ * Add Tx queue matcher.
+ *
+ * @param[in] dev
+ *   Pointer to the dev struct.
+ * @param[in, out] matcher
+ *   Flow matcher.
+ * @param[in, out] key
+ *   Flow matcher value.
+ * @param[in] item
+ *   Flow pattern to translate.
+ * @param[in] inner
+ *   Item is inner pattern.
+ */
+static void
+flow_dv_translate_item_tx_queue(struct rte_eth_dev *dev,
+				void *matcher, void *key,
+				const struct rte_flow_item *item)
+{
+	const struct mlx5_rte_flow_item_tx_queue *queue_m;
+	const struct mlx5_rte_flow_item_tx_queue *queue_v;
+	void *misc_m =
+		MLX5_ADDR_OF(fte_match_param, matcher, misc_parameters);
+	void *misc_v =
+		MLX5_ADDR_OF(fte_match_param, key, misc_parameters);
+	struct mlx5_txq_ctrl *txq;
+	uint32_t queue;
+
+
+	queue_m = (const void *)item->mask;
+	if (!queue_m)
+		return;
+	queue_v = (const void *)item->spec;
+	if (!queue_v)
+		return;
+	txq = mlx5_txq_get(dev, queue_v->queue);
+	if (!txq)
+		return;
+	queue = txq->obj->sq->id;
+	MLX5_SET(fte_match_set_misc, misc_m, source_sqn, queue_m->queue);
+	MLX5_SET(fte_match_set_misc, misc_v, source_sqn,
+		 queue & queue_m->queue);
+	mlx5_txq_release(dev, queue_v->queue);
+}
+
+/**
  * Fill the flow with DV spec.
  *
  * @param[in] dev
@@ -5951,6 +6004,12 @@ struct field_modify_info modify_tcp[] = {
 						   items);
 			last_item = MLX5_FLOW_ITEM_TAG;
 			break;
+		case MLX5_RTE_FLOW_ITEM_TYPE_TX_QUEUE:
+			flow_dv_translate_item_tx_queue(dev, match_mask,
+							match_value,
+							items);
+			last_item = MLX5_FLOW_ITEM_TX_QUEUE;
+			break;
 		default:
 			break;
 		}
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index f66b6ee..cafab25 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -402,6 +402,24 @@
 	unsigned int j;
 	int ret;
 
+	/*
+	 * The hairpin Txq default flow should be created regardless of
+	 * isolation mode. Otherwise all the packets to be sent will go
+	 * out directly without the Tx flow actions, e.g. encapsulation.
+	 */
+	for (i = 0; i != priv->txqs_n; ++i) {
+		struct mlx5_txq_ctrl *txq_ctrl = mlx5_txq_get(dev, i);
+		if (!txq_ctrl)
+			continue;
+		if (txq_ctrl->type == MLX5_TXQ_TYPE_HAIRPIN) {
+			ret = mlx5_ctrl_flow_source_queue(dev, i);
+			if (ret) {
+				mlx5_txq_release(dev, i);
+				goto error;
+			}
+		}
+		mlx5_txq_release(dev, i);
+	}
 	if (priv->config.dv_esw_en && !priv->config.vf)
 		if (!mlx5_flow_create_esw_table_zero_flow(dev))
 			goto error;
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 186+ messages in thread

* [dpdk-dev] [PATCH v6 14/14] net/mlx5: split hairpin flows
  2019-10-27 12:24 ` [dpdk-dev] [PATCH v6 00/14] " Ori Kam
                     ` (12 preceding siblings ...)
  2019-10-27 12:25   ` [dpdk-dev] [PATCH v6 13/14] net/mlx5: add default flows for hairpin Ori Kam
@ 2019-10-27 12:25   ` Ori Kam
  13 siblings, 0 replies; 186+ messages in thread
From: Ori Kam @ 2019-10-27 12:25 UTC (permalink / raw)
  To: Matan Azrad, Shahaf Shuler, Viacheslav Ovsiienko
  Cc: dev, orika, jingjing.wu, stephen

Since the encap action is not supported on Rx, we need to split the
hairpin flow into Rx and Tx parts.
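
Schematically, for an ingress flow whose destination is a hairpin
queue and which carries an encap action, the split turns

	match M -> raw_encap E, queue Q (hairpin)

into two flows linked by a generated flow ID (the registers are
resolved via flow_get_reg_id()):

	Rx (ingress):               match M -> set tag = ID, queue Q
	Tx (egress, hairpin table): match tag == ID -> raw_encap E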

Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/net/mlx5/mlx5.c            |  10 ++
 drivers/net/mlx5/mlx5.h            |  10 ++
 drivers/net/mlx5/mlx5_flow.c       | 281 +++++++++++++++++++++++++++++++++++--
 drivers/net/mlx5/mlx5_flow.h       |  14 +-
 drivers/net/mlx5/mlx5_flow_dv.c    |  10 +-
 drivers/net/mlx5/mlx5_flow_verbs.c |  11 +-
 drivers/net/mlx5/mlx5_rxq.c        |  26 ++++
 drivers/net/mlx5/mlx5_rxtx.h       |   2 +
 8 files changed, 334 insertions(+), 30 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index b7a98b8..b622339 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -531,6 +531,12 @@ struct mlx5_flow_id_pool *
 			goto error;
 		}
 	}
+	sh->flow_id_pool = mlx5_flow_id_pool_alloc();
+	if (!sh->flow_id_pool) {
+		DRV_LOG(ERR, "can't create flow id pool");
+		err = ENOMEM;
+		goto error;
+	}
 #endif /* HAVE_IBV_FLOW_DV_SUPPORT */
 	/*
 	 * Once the device is added to the list of memory event
@@ -570,6 +576,8 @@ struct mlx5_flow_id_pool *
 		claim_zero(mlx5_glue->dealloc_pd(sh->pd));
 	if (sh->ctx)
 		claim_zero(mlx5_glue->close_device(sh->ctx));
+	if (sh->flow_id_pool)
+		mlx5_flow_id_pool_release(sh->flow_id_pool);
 	rte_free(sh);
 	assert(err > 0);
 	rte_errno = err;
@@ -641,6 +649,8 @@ struct mlx5_flow_id_pool *
 		claim_zero(mlx5_devx_cmd_destroy(sh->td));
 	if (sh->ctx)
 		claim_zero(mlx5_glue->close_device(sh->ctx));
+	if (sh->flow_id_pool)
+		mlx5_flow_id_pool_release(sh->flow_id_pool);
 	rte_free(sh);
 exit:
 	pthread_mutex_unlock(&mlx5_ibv_list_mutex);
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 1181c1f..f644998 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -578,6 +578,15 @@ struct mlx5_devx_dbr_page {
 	uint64_t dbr_bitmap[MLX5_DBR_BITMAP_SIZE];
 };
 
+/* ID generation structure. */
+struct mlx5_flow_id_pool {
+	uint32_t *free_arr; /**< Pointer to the array of free values. */
+	uint32_t base_index;
+	/**< The next index that can be used without any free elements. */
+	uint32_t *curr; /**< Pointer to the index to pop. */
+	uint32_t *last; /**< Pointer to the last element in the empty array. */
+};
+
 /*
  * Shared Infiniband device context for Master/Representors
  * which belong to same IB device with multiple IB ports.
@@ -637,6 +646,7 @@ struct mlx5_ibv_shared {
 	struct mlx5dv_devx_cmd_comp *devx_comp; /* DEVX async comp obj. */
 	struct mlx5_devx_obj *tis; /* TIS object. */
 	struct mlx5_devx_obj *td; /* Transport domain. */
+	struct mlx5_flow_id_pool *flow_id_pool; /* Flow ID pool. */
 	struct mlx5_ibv_shared_port port[]; /* per device port data array. */
 };
 
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index 1148db0..5f01f9c 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -606,7 +606,7 @@ uint32_t mlx5_flow_adjust_priority(struct rte_eth_dev *dev, int32_t priority,
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct rte_flow *flow = dev_flow->flow;
-	const int mark = !!(flow->actions &
+	const int mark = !!(dev_flow->actions &
 			    (MLX5_FLOW_ACTION_FLAG | MLX5_FLOW_ACTION_MARK));
 	const int tunnel = !!(dev_flow->layers & MLX5_FLOW_LAYER_TUNNEL);
 	unsigned int i;
@@ -669,7 +669,7 @@ uint32_t mlx5_flow_adjust_priority(struct rte_eth_dev *dev, int32_t priority,
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct rte_flow *flow = dev_flow->flow;
-	const int mark = !!(flow->actions &
+	const int mark = !!(dev_flow->actions &
 			    (MLX5_FLOW_ACTION_FLAG | MLX5_FLOW_ACTION_MARK));
 	const int tunnel = !!(dev_flow->layers & MLX5_FLOW_LAYER_TUNNEL);
 	unsigned int i;
@@ -2527,6 +2527,210 @@ uint32_t mlx5_flow_adjust_priority(struct rte_eth_dev *dev, int32_t priority,
 }
 
 /**
+ * Check if the flow should be split due to hairpin.
+ * The reason for the split is that current HW can't
+ * support encap on Rx, so if a flow has encap we move it
+ * to Tx.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param[in] attr
+ *   Flow rule attributes.
+ * @param[in] actions
+ *   Associated actions (list terminated by the END action).
+ *
+ * @return
+ *   > 0 the number of actions and the flow should be split,
+ *   0 when no split required.
+ */
+static int
+flow_check_hairpin_split(struct rte_eth_dev *dev,
+			 const struct rte_flow_attr *attr,
+			 const struct rte_flow_action actions[])
+{
+	int queue_action = 0;
+	int action_n = 0;
+	int encap = 0;
+	const struct rte_flow_action_queue *queue;
+	const struct rte_flow_action_rss *rss;
+	const struct rte_flow_action_raw_encap *raw_encap;
+
+	if (!attr->ingress)
+		return 0;
+	for (; actions->type != RTE_FLOW_ACTION_TYPE_END; actions++) {
+		switch (actions->type) {
+		case RTE_FLOW_ACTION_TYPE_QUEUE:
+			queue = actions->conf;
+			if (mlx5_rxq_get_type(dev, queue->index) !=
+			    MLX5_RXQ_TYPE_HAIRPIN)
+				return 0;
+			queue_action = 1;
+			action_n++;
+			break;
+		case RTE_FLOW_ACTION_TYPE_RSS:
+			rss = actions->conf;
+			if (mlx5_rxq_get_type(dev, rss->queue[0]) !=
+			    MLX5_RXQ_TYPE_HAIRPIN)
+				return 0;
+			queue_action = 1;
+			action_n++;
+			break;
+		case RTE_FLOW_ACTION_TYPE_VXLAN_ENCAP:
+		case RTE_FLOW_ACTION_TYPE_NVGRE_ENCAP:
+			encap = 1;
+			action_n++;
+			break;
+		case RTE_FLOW_ACTION_TYPE_RAW_ENCAP:
+			raw_encap = actions->conf;
+			if (raw_encap->size >
+			    (sizeof(struct rte_flow_item_eth) +
+			     sizeof(struct rte_flow_item_ipv4)))
+				encap = 1;
+			action_n++;
+			break;
+		default:
+			action_n++;
+			break;
+		}
+	}
+	if (encap == 1 && queue_action)
+		return action_n;
+	return 0;
+}
+
+#define MLX5_MAX_SPLIT_ACTIONS 24
+#define MLX5_MAX_SPLIT_ITEMS 24
+
+/**
+ * Split the hairpin flow.
+ * Since HW can't support encap on Rx, we move the encap to Tx.
+ * If the count action is after the encap then we also
+ * move the count action. In this case the count will also measure
+ * the outer bytes.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param[in] actions
+ *   Associated actions (list terminated by the END action).
+ * @param[out] actions_rx
+ *   Rx flow actions.
+ * @param[out] actions_tx
+ *   Tx flow actions.
+ * @param[out] pattern_tx
+ *   The pattern items for the Tx flow.
+ * @param[out] flow_id
+ *   The flow ID connected to this flow.
+ *
+ * @return
+ *   0 on success.
+ */
+static int
+flow_hairpin_split(struct rte_eth_dev *dev,
+		   const struct rte_flow_action actions[],
+		   struct rte_flow_action actions_rx[],
+		   struct rte_flow_action actions_tx[],
+		   struct rte_flow_item pattern_tx[],
+		   uint32_t *flow_id)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	const struct rte_flow_action_raw_encap *raw_encap;
+	const struct rte_flow_action_raw_decap *raw_decap;
+	struct mlx5_rte_flow_action_set_tag *set_tag;
+	struct rte_flow_action *tag_action;
+	struct mlx5_rte_flow_item_tag *tag_item;
+	struct rte_flow_item *item;
+	char *addr;
+	struct rte_flow_error error;
+	int encap = 0;
+
+	mlx5_flow_id_get(priv->sh->flow_id_pool, flow_id);
+	for (; actions->type != RTE_FLOW_ACTION_TYPE_END; actions++) {
+		switch (actions->type) {
+		case RTE_FLOW_ACTION_TYPE_VXLAN_ENCAP:
+		case RTE_FLOW_ACTION_TYPE_NVGRE_ENCAP:
+			rte_memcpy(actions_tx, actions,
+			       sizeof(struct rte_flow_action));
+			actions_tx++;
+			break;
+		case RTE_FLOW_ACTION_TYPE_COUNT:
+			if (encap) {
+				rte_memcpy(actions_tx, actions,
+					   sizeof(struct rte_flow_action));
+				actions_tx++;
+			} else {
+				rte_memcpy(actions_rx, actions,
+					   sizeof(struct rte_flow_action));
+				actions_rx++;
+			}
+			break;
+		case RTE_FLOW_ACTION_TYPE_RAW_ENCAP:
+			raw_encap = actions->conf;
+			if (raw_encap->size >
+			    (sizeof(struct rte_flow_item_eth) +
+			     sizeof(struct rte_flow_item_ipv4))) {
+				memcpy(actions_tx, actions,
+				       sizeof(struct rte_flow_action));
+				actions_tx++;
+				encap = 1;
+			} else {
+				rte_memcpy(actions_rx, actions,
+					   sizeof(struct rte_flow_action));
+				actions_rx++;
+			}
+			break;
+		case RTE_FLOW_ACTION_TYPE_RAW_DECAP:
+			raw_decap = actions->conf;
+			if (raw_decap->size <
+			    (sizeof(struct rte_flow_item_eth) +
+			     sizeof(struct rte_flow_item_ipv4))) {
+				memcpy(actions_tx, actions,
+				       sizeof(struct rte_flow_action));
+				actions_tx++;
+			} else {
+				rte_memcpy(actions_rx, actions,
+					   sizeof(struct rte_flow_action));
+				actions_rx++;
+			}
+			break;
+		default:
+			rte_memcpy(actions_rx, actions,
+				   sizeof(struct rte_flow_action));
+			actions_rx++;
+			break;
+		}
+	}
+	/* Add set meta action and end action for the Rx flow. */
+	tag_action = actions_rx;
+	tag_action->type = MLX5_RTE_FLOW_ACTION_TYPE_TAG;
+	actions_rx++;
+	rte_memcpy(actions_rx, actions, sizeof(struct rte_flow_action));
+	actions_rx++;
+	set_tag = (void *)actions_rx;
+	set_tag->id = flow_get_reg_id(dev, MLX5_HAIRPIN_RX, 0, &error);
+	set_tag->data = rte_cpu_to_be_32(*flow_id);
+	tag_action->conf = set_tag;
+	/* Create Tx item list. */
+	rte_memcpy(actions_tx, actions, sizeof(struct rte_flow_action));
+	addr = (void *)&pattern_tx[2];
+	item = pattern_tx;
+	item->type = MLX5_RTE_FLOW_ITEM_TYPE_TAG;
+	tag_item = (void *)addr;
+	tag_item->data = rte_cpu_to_be_32(*flow_id);
+	tag_item->id = flow_get_reg_id(dev, MLX5_HAIRPIN_TX, 0, &error);
+	item->spec = tag_item;
+	addr += sizeof(struct mlx5_rte_flow_item_tag);
+	tag_item = (void *)addr;
+	tag_item->data = UINT32_MAX;
+	tag_item->id = UINT16_MAX;
+	item->mask = tag_item;
+	addr += sizeof(struct mlx5_rte_flow_item_tag);
+	item->last = NULL;
+	item++;
+	item->type = RTE_FLOW_ITEM_TYPE_END;
+	return 0;
+}
+
+/**
  * Create a flow and add it to @p list.
  *
  * @param dev
@@ -2554,6 +2758,7 @@ uint32_t mlx5_flow_adjust_priority(struct rte_eth_dev *dev, int32_t priority,
 		 const struct rte_flow_action actions[],
 		 bool external, struct rte_flow_error *error)
 {
+	struct mlx5_priv *priv = dev->data->dev_private;
 	struct rte_flow *flow = NULL;
 	struct mlx5_flow *dev_flow;
 	const struct rte_flow_action_rss *rss;
@@ -2561,16 +2766,44 @@ uint32_t mlx5_flow_adjust_priority(struct rte_eth_dev *dev, int32_t priority,
 		struct rte_flow_expand_rss buf;
 		uint8_t buffer[2048];
 	} expand_buffer;
+	union {
+		struct rte_flow_action actions[MLX5_MAX_SPLIT_ACTIONS];
+		uint8_t buffer[2048];
+	} actions_rx;
+	union {
+		struct rte_flow_action actions[MLX5_MAX_SPLIT_ACTIONS];
+		uint8_t buffer[2048];
+	} actions_hairpin_tx;
+	union {
+		struct rte_flow_item items[MLX5_MAX_SPLIT_ITEMS];
+		uint8_t buffer[2048];
+	} items_tx;
 	struct rte_flow_expand_rss *buf = &expand_buffer.buf;
+	const struct rte_flow_action *p_actions_rx = actions;
 	int ret;
 	uint32_t i;
 	uint32_t flow_size;
+	int hairpin_flow = 0;
+	uint32_t hairpin_id = 0;
+	struct rte_flow_attr attr_tx = { .priority = 0 };
 
-	ret = flow_drv_validate(dev, attr, items, actions, external, error);
+	hairpin_flow = flow_check_hairpin_split(dev, attr, actions);
+	if (hairpin_flow > 0) {
+		if (hairpin_flow > MLX5_MAX_SPLIT_ACTIONS) {
+			rte_errno = EINVAL;
+			return NULL;
+		}
+		flow_hairpin_split(dev, actions, actions_rx.actions,
+				   actions_hairpin_tx.actions, items_tx.items,
+				   &hairpin_id);
+		p_actions_rx = actions_rx.actions;
+	}
+	ret = flow_drv_validate(dev, attr, items, p_actions_rx, external,
+				error);
 	if (ret < 0)
-		return NULL;
+		goto error_before_flow;
 	flow_size = sizeof(struct rte_flow);
-	rss = flow_get_rss_action(actions);
+	rss = flow_get_rss_action(p_actions_rx);
 	if (rss)
 		flow_size += RTE_ALIGN_CEIL(rss->queue_num * sizeof(uint16_t),
 					    sizeof(void *));
@@ -2579,11 +2812,13 @@ uint32_t mlx5_flow_adjust_priority(struct rte_eth_dev *dev, int32_t priority,
 	flow = rte_calloc(__func__, 1, flow_size, 0);
 	if (!flow) {
 		rte_errno = ENOMEM;
-		return NULL;
+		goto error_before_flow;
 	}
 	flow->drv_type = flow_get_drv_type(dev, attr);
 	flow->ingress = attr->ingress;
 	flow->transfer = attr->transfer;
+	if (hairpin_id != 0)
+		flow->hairpin_flow_id = hairpin_id;
 	assert(flow->drv_type > MLX5_FLOW_TYPE_MIN &&
 	       flow->drv_type < MLX5_FLOW_TYPE_MAX);
 	flow->queue = (void *)(flow + 1);
@@ -2604,7 +2839,7 @@ uint32_t mlx5_flow_adjust_priority(struct rte_eth_dev *dev, int32_t priority,
 	}
 	for (i = 0; i < buf->entries; ++i) {
 		dev_flow = flow_drv_prepare(flow, attr, buf->entry[i].pattern,
-					    actions, error);
+					    p_actions_rx, error);
 		if (!dev_flow)
 			goto error;
 		dev_flow->flow = flow;
@@ -2612,7 +2847,24 @@ uint32_t mlx5_flow_adjust_priority(struct rte_eth_dev *dev, int32_t priority,
 		LIST_INSERT_HEAD(&flow->dev_flows, dev_flow, next);
 		ret = flow_drv_translate(dev, dev_flow, attr,
 					 buf->entry[i].pattern,
-					 actions, error);
+					 p_actions_rx, error);
+		if (ret < 0)
+			goto error;
+	}
+	/* Create the tx flow. */
+	if (hairpin_flow) {
+		attr_tx.group = MLX5_HAIRPIN_TX_TABLE;
+		attr_tx.ingress = 0;
+		attr_tx.egress = 1;
+		dev_flow = flow_drv_prepare(flow, &attr_tx, items_tx.items,
+					    actions_hairpin_tx.actions, error);
+		if (!dev_flow)
+			goto error;
+		dev_flow->flow = flow;
+		LIST_INSERT_HEAD(&flow->dev_flows, dev_flow, next);
+		ret = flow_drv_translate(dev, dev_flow, &attr_tx,
+					 items_tx.items,
+					 actions_hairpin_tx.actions, error);
 		if (ret < 0)
 			goto error;
 	}
@@ -2624,8 +2876,16 @@ uint32_t mlx5_flow_adjust_priority(struct rte_eth_dev *dev, int32_t priority,
 	TAILQ_INSERT_TAIL(list, flow, next);
 	flow_rxq_flags_set(dev, flow);
 	return flow;
+error_before_flow:
+	if (hairpin_id)
+		mlx5_flow_id_release(priv->sh->flow_id_pool,
+				     hairpin_id);
+	return NULL;
 error:
 	ret = rte_errno; /* Save rte_errno before cleanup. */
+	if (flow->hairpin_flow_id)
+		mlx5_flow_id_release(priv->sh->flow_id_pool,
+				     flow->hairpin_flow_id);
 	assert(flow);
 	flow_drv_destroy(dev, flow);
 	rte_free(flow);
@@ -2715,12 +2975,17 @@ struct rte_flow *
 flow_list_destroy(struct rte_eth_dev *dev, struct mlx5_flows *list,
 		  struct rte_flow *flow)
 {
+	struct mlx5_priv *priv = dev->data->dev_private;
+
 	/*
 	 * Update RX queue flags only if port is started, otherwise it is
 	 * already clean.
 	 */
 	if (dev->data->dev_started)
 		flow_rxq_flags_trim(dev, flow);
+	if (flow->hairpin_flow_id)
+		mlx5_flow_id_release(priv->sh->flow_id_pool,
+				     flow->hairpin_flow_id);
 	flow_drv_destroy(dev, flow);
 	TAILQ_REMOVE(list, flow, next);
 	rte_free(flow->fdir);
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index f81e1b1..7559810 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -466,6 +466,8 @@ struct mlx5_flow {
 	struct rte_flow *flow; /**< Pointer to the main flow. */
 	uint64_t layers;
 	/**< Bit-fields of present layers, see MLX5_FLOW_LAYER_*. */
+	uint64_t actions;
+	/**< Bit-fields of detected actions, see MLX5_FLOW_ACTION_*. */
 	union {
 #ifdef HAVE_IBV_FLOW_DV_SUPPORT
 		struct mlx5_flow_dv dv;
@@ -487,12 +489,11 @@ struct rte_flow {
 	uint16_t (*queue)[]; /**< Destination queues to redirect traffic to. */
 	LIST_HEAD(dev_flows, mlx5_flow) dev_flows;
 	/**< Device flows that are part of the flow. */
-	uint64_t actions;
-	/**< Bit-fields of detected actions, see MLX5_FLOW_ACTION_*. */
 	struct mlx5_fdir *fdir; /**< Pointer to associated FDIR if any. */
 	uint8_t ingress; /**< 1 if the flow is ingress. */
 	uint32_t group; /**< The group index. */
 	uint8_t transfer; /**< 1 if the flow is E-Switch flow. */
+	uint32_t hairpin_flow_id; /**< The flow id used for hairpin. */
 };
 
 typedef int (*mlx5_flow_validate_t)(struct rte_eth_dev *dev,
@@ -536,15 +537,6 @@ struct mlx5_flow_driver_ops {
 #define MLX5_CNT_CONTAINER_UNUSED(sh, batch, thread) (&(sh)->cmng.ccont \
 	[(~((sh)->cmng.mhi[batch] >> (thread)) & 0x1) * 2 + (batch)])
 
-/* ID generation structure. */
-struct mlx5_flow_id_pool {
-	uint32_t *free_arr; /**< Pointer to the array of free values. */
-	uint32_t base_index;
-	/**< The next index that can be used without any free elements. */
-	uint32_t *curr; /**< Pointer to the index to pop. */
-	uint32_t *last; /**< Pointer to the last element in the empty array. */
-};
-
 /* mlx5_flow.c */
 
 struct mlx5_flow_id_pool *mlx5_flow_id_pool_alloc(void);
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index 13178cc..d9a7fd4 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -5843,7 +5843,7 @@ struct field_modify_info modify_tcp[] = {
 			modify_action_position = actions_n++;
 	}
 	dev_flow->dv.actions_n = actions_n;
-	flow->actions = action_flags;
+	dev_flow->actions = action_flags;
 	for (; items->type != RTE_FLOW_ITEM_TYPE_END; items++) {
 		int tunnel = !!(item_flags & MLX5_FLOW_LAYER_TUNNEL);
 		int item_type = items->type;
@@ -6070,7 +6070,7 @@ struct field_modify_info modify_tcp[] = {
 	LIST_FOREACH(dev_flow, &flow->dev_flows, next) {
 		dv = &dev_flow->dv;
 		n = dv->actions_n;
-		if (flow->actions & MLX5_FLOW_ACTION_DROP) {
+		if (dev_flow->actions & MLX5_FLOW_ACTION_DROP) {
 			if (flow->transfer) {
 				dv->actions[n++] = priv->sh->esw_drop_action;
 			} else {
@@ -6085,7 +6085,7 @@ struct field_modify_info modify_tcp[] = {
 				}
 				dv->actions[n++] = dv->hrxq->action;
 			}
-		} else if (flow->actions &
+		} else if (dev_flow->actions &
 			   (MLX5_FLOW_ACTION_QUEUE | MLX5_FLOW_ACTION_RSS)) {
 			struct mlx5_hrxq *hrxq;
 
@@ -6141,7 +6141,7 @@ struct field_modify_info modify_tcp[] = {
 	LIST_FOREACH(dev_flow, &flow->dev_flows, next) {
 		struct mlx5_flow_dv *dv = &dev_flow->dv;
 		if (dv->hrxq) {
-			if (flow->actions & MLX5_FLOW_ACTION_DROP)
+			if (dev_flow->actions & MLX5_FLOW_ACTION_DROP)
 				mlx5_hrxq_drop_release(dev);
 			else
 				mlx5_hrxq_release(dev, dv->hrxq);
@@ -6375,7 +6375,7 @@ struct field_modify_info modify_tcp[] = {
 			dv->flow = NULL;
 		}
 		if (dv->hrxq) {
-			if (flow->actions & MLX5_FLOW_ACTION_DROP)
+			if (dev_flow->actions & MLX5_FLOW_ACTION_DROP)
 				mlx5_hrxq_drop_release(dev);
 			else
 				mlx5_hrxq_release(dev, dv->hrxq);
diff --git a/drivers/net/mlx5/mlx5_flow_verbs.c b/drivers/net/mlx5/mlx5_flow_verbs.c
index 23110f2..fd27f6c 100644
--- a/drivers/net/mlx5/mlx5_flow_verbs.c
+++ b/drivers/net/mlx5/mlx5_flow_verbs.c
@@ -191,7 +191,7 @@
 {
 #if defined(HAVE_IBV_DEVICE_COUNTERS_SET_V42) || \
 	defined(HAVE_IBV_DEVICE_COUNTERS_SET_V45)
-	if (flow->actions & MLX5_FLOW_ACTION_COUNT) {
+	if (flow->counter->cs) {
 		struct rte_flow_query_count *qc = data;
 		uint64_t counters[2] = {0, 0};
 #if defined(HAVE_IBV_DEVICE_COUNTERS_SET_V42)
@@ -1410,7 +1410,6 @@
 		     const struct rte_flow_action actions[],
 		     struct rte_flow_error *error)
 {
-	struct rte_flow *flow = dev_flow->flow;
 	uint64_t item_flags = 0;
 	uint64_t action_flags = 0;
 	uint64_t priority = attr->priority;
@@ -1460,7 +1459,7 @@
 						  "action not supported");
 		}
 	}
-	flow->actions = action_flags;
+	dev_flow->actions = action_flags;
 	for (; items->type != RTE_FLOW_ITEM_TYPE_END; items++) {
 		int tunnel = !!(item_flags & MLX5_FLOW_LAYER_TUNNEL);
 
@@ -1592,7 +1591,7 @@
 			verbs->flow = NULL;
 		}
 		if (verbs->hrxq) {
-			if (flow->actions & MLX5_FLOW_ACTION_DROP)
+			if (dev_flow->actions & MLX5_FLOW_ACTION_DROP)
 				mlx5_hrxq_drop_release(dev);
 			else
 				mlx5_hrxq_release(dev, verbs->hrxq);
@@ -1656,7 +1655,7 @@
 
 	LIST_FOREACH(dev_flow, &flow->dev_flows, next) {
 		verbs = &dev_flow->verbs;
-		if (flow->actions & MLX5_FLOW_ACTION_DROP) {
+		if (dev_flow->actions & MLX5_FLOW_ACTION_DROP) {
 			verbs->hrxq = mlx5_hrxq_drop_new(dev);
 			if (!verbs->hrxq) {
 				rte_flow_error_set
@@ -1717,7 +1716,7 @@
 	LIST_FOREACH(dev_flow, &flow->dev_flows, next) {
 		verbs = &dev_flow->verbs;
 		if (verbs->hrxq) {
-			if (flow->actions & MLX5_FLOW_ACTION_DROP)
+			if (dev_flow->actions & MLX5_FLOW_ACTION_DROP)
 				mlx5_hrxq_drop_release(dev);
 			else
 				mlx5_hrxq_release(dev, verbs->hrxq);
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index 2c3d5eb..24d0eaa 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -2097,6 +2097,32 @@ struct mlx5_rxq_ctrl *
 }
 
 /**
+ * Get a Rx queue type.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param idx
+ *   Rx queue index.
+ *
+ * @return
+ *   The Rx queue type.
+ */
+enum mlx5_rxq_type
+mlx5_rxq_get_type(struct rte_eth_dev *dev, uint16_t idx)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_rxq_ctrl *rxq_ctrl = NULL;
+
+	if ((*priv->rxqs)[idx]) {
+		rxq_ctrl = container_of((*priv->rxqs)[idx],
+					struct mlx5_rxq_ctrl,
+					rxq);
+		return rxq_ctrl->type;
+	}
+	return MLX5_RXQ_TYPE_UNDEFINED;
+}
+
+/**
  * Create an indirection table.
  *
  * @param dev
diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index 271b648..d4ba25f 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -166,6 +166,7 @@ enum mlx5_rxq_obj_type {
 enum mlx5_rxq_type {
 	MLX5_RXQ_TYPE_STANDARD, /* Standard Rx queue. */
 	MLX5_RXQ_TYPE_HAIRPIN, /* Hairpin Rx queue. */
+	MLX5_RXQ_TYPE_UNDEFINED,
 };
 
 /* Verbs/DevX Rx queue elements. */
@@ -406,6 +407,7 @@ struct mlx5_hrxq *mlx5_hrxq_get(struct rte_eth_dev *dev,
 				const uint16_t *queues, uint32_t queues_n);
 int mlx5_hrxq_release(struct rte_eth_dev *dev, struct mlx5_hrxq *hxrq);
 int mlx5_hrxq_verify(struct rte_eth_dev *dev);
+enum mlx5_rxq_type mlx5_rxq_get_type(struct rte_eth_dev *dev, uint16_t idx);
 struct mlx5_hrxq *mlx5_hrxq_drop_new(struct rte_eth_dev *dev);
 void mlx5_hrxq_drop_release(struct rte_eth_dev *dev);
 uint64_t mlx5_get_rx_port_offloads(void);
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [dpdk-dev] [PATCH v6 02/14] ethdev: add support for hairpin queue
  2019-10-27 12:24   ` [dpdk-dev] [PATCH v6 02/14] ethdev: add support for hairpin queue Ori Kam
@ 2019-10-28 15:16     ` Andrew Rybchenko
  2019-10-28 18:44       ` Ori Kam
  0 siblings, 1 reply; 186+ messages in thread
From: Andrew Rybchenko @ 2019-10-28 15:16 UTC (permalink / raw)
  To: Ori Kam, John McNamara, Marko Kovacevic, Thomas Monjalon, Ferruh Yigit
  Cc: dev, jingjing.wu, stephen

Hi Ori,

On 10/27/19 3:24 PM, Ori Kam wrote:
> This commit introduces the hairpin queue type.
>
> The hairpin queue is built from an Rx queue bound to a Tx queue.
> It is used to offload traffic coming from the wire and redirect it back
> to the wire.
>
> There are 3 new functions:
> - rte_eth_dev_hairpin_capability_get
> - rte_eth_rx_hairpin_queue_setup
> - rte_eth_tx_hairpin_queue_setup
>
> In order to use the queue, there is a need to create rte_flow
> with queue / RSS action that targets one or more of the Rx queues.
>
> Signed-off-by: Ori Kam <orika@mellanox.com>
> Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>

LGTM, nothing critical except maybe the maximum number check,
which I missed before.
Plus a few style suggestions which may be dropped, but I'd be
happier if they were applied.

Thanks.

[snip]

> diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
> index 7743205..68aca1f 100644
> --- a/lib/librte_ethdev/rte_ethdev.c
> +++ b/lib/librte_ethdev/rte_ethdev.c
> @@ -923,6 +923,13 @@ struct rte_eth_dev *
>   
>   	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_start, -ENOTSUP);
>   
> +	if (rte_eth_dev_is_rx_hairpin_queue(dev, rx_queue_id)) {
> +		RTE_ETHDEV_LOG(INFO,
> +			"Can't start Rx queue %"PRIu16" of device with port_id=%"PRIu16" is hairpin queue\n",

Log message looks a bit strange:
     Can't start Rx queue 5 of device with port_id=0 is hairpin queue
maybe put the key information first:
     Can't start hairpin Rx queue 5 of device with port_id=0

> +			rx_queue_id, port_id);
> +		return -EINVAL;
> +	}
> +
>   	if (dev->data->rx_queue_state[rx_queue_id] != RTE_ETH_QUEUE_STATE_STOPPED) {
>   		RTE_ETHDEV_LOG(INFO,
>   			"Queue %"PRIu16" of device with port_id=%"PRIu16" already started\n",
> @@ -950,6 +957,13 @@ struct rte_eth_dev *
>   
>   	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_stop, -ENOTSUP);
>   
> +	if (rte_eth_dev_is_rx_hairpin_queue(dev, rx_queue_id)) {
> +		RTE_ETHDEV_LOG(INFO,
> +			"Can't stop Rx queue %"PRIu16" of device with port_id=%"PRIu16" is hairpin queue\n",

Same

> +			rx_queue_id, port_id);
> +		return -EINVAL;
> +	}
> +
>   	if (dev->data->rx_queue_state[rx_queue_id] == RTE_ETH_QUEUE_STATE_STOPPED) {
>   		RTE_ETHDEV_LOG(INFO,
>   			"Queue %"PRIu16" of device with port_id=%"PRIu16" already stopped\n",
> @@ -983,6 +997,13 @@ struct rte_eth_dev *
>   
>   	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_queue_start, -ENOTSUP);
>   
> +	if (rte_eth_dev_is_tx_hairpin_queue(dev, tx_queue_id)) {
> +		RTE_ETHDEV_LOG(INFO,
> +			"Can't start Tx queue %"PRIu16" of device with port_id=%"PRIu16" is hairpin queue\n",

Same

> +			tx_queue_id, port_id);
> +		return -EINVAL;
> +	}
> +
>   	if (dev->data->tx_queue_state[tx_queue_id] != RTE_ETH_QUEUE_STATE_STOPPED) {
>   		RTE_ETHDEV_LOG(INFO,
>   			"Queue %"PRIu16" of device with port_id=%"PRIu16" already started\n",
> @@ -1008,6 +1029,13 @@ struct rte_eth_dev *
>   
>   	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_queue_stop, -ENOTSUP);
>   
> +	if (rte_eth_dev_is_tx_hairpin_queue(dev, tx_queue_id)) {
> +		RTE_ETHDEV_LOG(INFO,
> +			"Can't stop Tx queue %"PRIu16" of device with port_id=%"PRIu16" is hairpin queue\n",

Same

> +			tx_queue_id, port_id);
> +		return -EINVAL;
> +	}
> +
>   	if (dev->data->tx_queue_state[tx_queue_id] == RTE_ETH_QUEUE_STATE_STOPPED) {
>   		RTE_ETHDEV_LOG(INFO,
>   			"Queue %"PRIu16" of device with port_id=%"PRIu16" already stopped\n",
> @@ -1780,6 +1808,79 @@ struct rte_eth_dev *
>   }
>   
>   int
> +rte_eth_rx_hairpin_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
> +			       uint16_t nb_rx_desc,
> +			       const struct rte_eth_hairpin_conf *conf)
> +{
> +	int ret;
> +	struct rte_eth_dev *dev;
> +	struct rte_eth_hairpin_cap cap;
> +	void **rxq;
> +	int i;
> +	int count = 0;
> +
> +	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
> +
> +	dev = &rte_eth_devices[port_id];
> +	if (rx_queue_id >= dev->data->nb_rx_queues) {
> +		RTE_ETHDEV_LOG(ERR, "Invalid RX queue_id=%u\n", rx_queue_id);
> +		return -EINVAL;
> +	}
> +	ret = rte_eth_dev_hairpin_capability_get(port_id, &cap);
> +	if (ret != 0)
> +		return ret;
> +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_hairpin_queue_setup,
> +				-ENOTSUP);

Most likely unsupported hairpin is caught by the capability get above.
So, maybe it is better to move the check to just before its usage far below.
Also, if the line length is sufficient I think it would be better to put -ENOTSUP
on the previous line, just to follow the port_id check style.

> +	/* Use default specified by driver, if nb_rx_desc is zero */
> +	if (nb_rx_desc == 0)
> +		nb_rx_desc = cap.max_nb_desc;

The function description and the comment above mention a PMD default, but
there is no default; it just uses the maximum. I have no strong opinion
on whether a default is really required or it is OK to say that the maximum is used.
The only concern is: why maximum?

> +	if (nb_rx_desc > cap.max_nb_desc) {
> +		RTE_ETHDEV_LOG(ERR,
> +			"Invalid value for nb_rx_desc(=%hu), should be: <= %hu",
> +			nb_rx_desc, cap.max_nb_desc);
> +		return -EINVAL;
> +	}
> +	if (conf->peer_count > cap.max_rx_2_tx) {
> +		RTE_ETHDEV_LOG(ERR,
> +			"Invalid value for number of peers for Rx queue(=%hu), should be: <= %hu",
> +			conf->peer_count, cap.max_rx_2_tx);
> +		return -EINVAL;
> +	}
> +	if (conf->peer_count == 0) {
> +		RTE_ETHDEV_LOG(ERR,
> +			"Invalid value for number of peers for Rx queue(=%hu), should be: > 0",
> +			conf->peer_count);
> +		return -EINVAL;
> +	}
> +	if (cap.max_nb_queues != UINT16_MAX) {

I'm not sure that we need to handle it separately. The code below
should handle it, and there is no point in optimizing it.

> +		for (i = 0; i < dev->data->nb_rx_queues; i++) {

May I suggest to assign count = 0 to make it a bit easier to read and
more robust against future changes.

> +			if (rte_eth_dev_is_rx_hairpin_queue(dev, i))

The condition should be more tricky if we resetup hairpin queue.
I.e. we should check if i is rx_queue_id and count it anyway.

> +				count++;
> +		}
> +		if (count > cap.max_nb_queues) {
> +			RTE_ETHDEV_LOG(ERR, "To many Rx hairpin queues %d",

I think it would be useful to log max here as well to catch
unset max cases easier.

> +			count);
> +			return -EINVAL;
> +		}
> +	}
> +	if (dev->data->dev_started)
> +		return -EBUSY;
> +	rxq = dev->data->rx_queues;
> +	if (rxq[rx_queue_id] != NULL) {
> +		RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_release,
> +					-ENOTSUP);
> +		(*dev->dev_ops->rx_queue_release)(rxq[rx_queue_id]);
> +		rxq[rx_queue_id] = NULL;
> +	}
> +	ret = (*dev->dev_ops->rx_hairpin_queue_setup)(dev, rx_queue_id,
> +						      nb_rx_desc, conf);
> +	if (ret == 0)
> +		dev->data->rx_queue_state[rx_queue_id] =
> +			RTE_ETH_QUEUE_STATE_HAIRPIN;
> +	return eth_err(port_id, ret);
> +}
> +
> +int
>   rte_eth_tx_queue_setup(uint16_t port_id, uint16_t tx_queue_id,
>   		       uint16_t nb_tx_desc, unsigned int socket_id,
>   		       const struct rte_eth_txconf *tx_conf)
> @@ -1878,6 +1979,78 @@ struct rte_eth_dev *
>   		       tx_queue_id, nb_tx_desc, socket_id, &local_conf));
>   }
>   
> +int
> +rte_eth_tx_hairpin_queue_setup(uint16_t port_id, uint16_t tx_queue_id,
> +			       uint16_t nb_tx_desc,
> +			       const struct rte_eth_hairpin_conf *conf)

Same notes as for Rx queue above.

> +{
> +	struct rte_eth_dev *dev;
> +	struct rte_eth_hairpin_cap cap;
> +	void **txq;
> +	int i;
> +	int count = 0;
> +	int ret;
> +
> +	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
> +	dev = &rte_eth_devices[port_id];
> +	if (tx_queue_id >= dev->data->nb_tx_queues) {
> +		RTE_ETHDEV_LOG(ERR, "Invalid TX queue_id=%u\n", tx_queue_id);
> +		return -EINVAL;
> +	}
> +	ret = rte_eth_dev_hairpin_capability_get(port_id, &cap);
> +	if (ret != 0)
> +		return ret;
> +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_hairpin_queue_setup,
> +				-ENOTSUP);
> +	/* Use default specified by driver, if nb_tx_desc is zero */
> +	if (nb_tx_desc == 0)
> +		nb_tx_desc = cap.max_nb_desc;
> +	if (nb_tx_desc > cap.max_nb_desc) {
> +		RTE_ETHDEV_LOG(ERR,
> +			"Invalid value for nb_tx_desc(=%hu), should be: <= %hu",
> +			nb_tx_desc, cap.max_nb_desc);
> +		return -EINVAL;
> +	}
> +	if (conf->peer_count > cap.max_tx_2_rx) {
> +		RTE_ETHDEV_LOG(ERR,
> +			"Invalid value for number of peers for Tx queue(=%hu), should be: <= %hu",
> +			conf->peer_count, cap.max_tx_2_rx);
> +		return -EINVAL;
> +	}
> +	if (conf->peer_count == 0) {
> +		RTE_ETHDEV_LOG(ERR,
> +			"Invalid value for number of peers for Tx queue(=%hu), should be: > 0",
> +			conf->peer_count);
> +		return -EINVAL;
> +	}
> +	if (cap.max_nb_queues != UINT16_MAX) {
> +		for (i = 0; i < dev->data->nb_tx_queues; i++) {
> +			if (rte_eth_dev_is_tx_hairpin_queue(dev, i))
> +				count++;
> +		}
> +		if (count > cap.max_nb_queues) {
> +			RTE_ETHDEV_LOG(ERR,
> +				       "To many Tx hairpin queues %d", count);
> +			return -EINVAL;
> +		}
> +	}
> +	if (dev->data->dev_started)
> +		return -EBUSY;
> +	txq = dev->data->tx_queues;
> +	if (txq[tx_queue_id] != NULL) {
> +		RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_queue_release,
> +					-ENOTSUP);
> +		(*dev->dev_ops->tx_queue_release)(txq[tx_queue_id]);
> +		txq[tx_queue_id] = NULL;
> +	}
> +	ret = (*dev->dev_ops->tx_hairpin_queue_setup)
> +		(dev, tx_queue_id, nb_tx_desc, conf);
> +	if (ret == 0)
> +		dev->data->tx_queue_state[tx_queue_id] =
> +			RTE_ETH_QUEUE_STATE_HAIRPIN;
> +	return eth_err(port_id, ret);
> +}
> +
>   void
>   rte_eth_tx_buffer_drop_callback(struct rte_mbuf **pkts, uint16_t unsent,
>   		void *userdata __rte_unused)
> @@ -4007,12 +4180,19 @@ int rte_eth_set_queue_rate_limit(uint16_t port_id, uint16_t queue_idx,
>   	rte_errno = ENOTSUP;
>   	return NULL;
>   #endif
> +	struct rte_eth_dev *dev;
> +
>   	/* check input parameters */
>   	if (!rte_eth_dev_is_valid_port(port_id) || fn == NULL ||
>   		    queue_id >= rte_eth_devices[port_id].data->nb_rx_queues) {
>   		rte_errno = EINVAL;
>   		return NULL;
>   	}
> +	dev = &rte_eth_devices[port_id];
> +	if (rte_eth_dev_is_rx_hairpin_queue(dev, queue_id)) {
> +		rte_errno = EINVAL;
> +		return NULL;
> +	}
>   	struct rte_eth_rxtx_callback *cb = rte_zmalloc(NULL, sizeof(*cb), 0);
>   
>   	if (cb == NULL) {
> @@ -4084,6 +4264,8 @@ int rte_eth_set_queue_rate_limit(uint16_t port_id, uint16_t queue_idx,
>   	rte_errno = ENOTSUP;
>   	return NULL;
>   #endif
> +	struct rte_eth_dev *dev;
> +
>   	/* check input parameters */
>   	if (!rte_eth_dev_is_valid_port(port_id) || fn == NULL ||
>   		    queue_id >= rte_eth_devices[port_id].data->nb_tx_queues) {
> @@ -4091,6 +4273,12 @@ int rte_eth_set_queue_rate_limit(uint16_t port_id, uint16_t queue_idx,
>   		return NULL;
>   	}
>   
> +	dev = &rte_eth_devices[port_id];
> +	if (rte_eth_dev_is_tx_hairpin_queue(dev, queue_id)) {
> +		rte_errno = EINVAL;
> +		return NULL;
> +	}
> +
>   	struct rte_eth_rxtx_callback *cb = rte_zmalloc(NULL, sizeof(*cb), 0);
>   
>   	if (cb == NULL) {
> @@ -4204,6 +4392,13 @@ int rte_eth_set_queue_rate_limit(uint16_t port_id, uint16_t queue_idx,
>   		return -EINVAL;
>   	}
>   
> +	if (rte_eth_dev_is_rx_hairpin_queue(dev, queue_id)) {
> +		RTE_ETHDEV_LOG(INFO,
> +			"Can't get queue info for Rx queue %"PRIu16" of device with port_id=%"PRIu16" is hairpin queue\n",

"queue" is repeated 3 times above ;) I'm afraid it is too much, may be:
"Can't get hairpin Rx queue %" PRIu16 " port %" PRIu16 " info\n"
or
"Can't get hairpin Rx queue %" PRIu16 " info of device with port_id=%" 
PRIu16 "\n"
Anyway up to you.

> +			queue_id, port_id);
> +		return -EINVAL;
> +	}
> +
>   	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rxq_info_get, -ENOTSUP);
>   
>   	memset(qinfo, 0, sizeof(*qinfo));
> @@ -4228,6 +4423,13 @@ int rte_eth_set_queue_rate_limit(uint16_t port_id, uint16_t queue_idx,
>   		return -EINVAL;
>   	}
>   
> +	if (rte_eth_dev_is_tx_hairpin_queue(dev, queue_id)) {
> +		RTE_ETHDEV_LOG(INFO,
> +			"Can't get queue info for Tx queue %"PRIu16" of device with port_id=%"PRIu16" is hairpin queue\n",

Same

> +			queue_id, port_id);
> +		return -EINVAL;
> +	}
> +
>   	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->txq_info_get, -ENOTSUP);
>   
>   	memset(qinfo, 0, sizeof(*qinfo));
> @@ -4600,6 +4802,21 @@ int rte_eth_set_queue_rate_limit(uint16_t port_id, uint16_t queue_idx,
>   }
>   
>   int
> +rte_eth_dev_hairpin_capability_get(uint16_t port_id,
> +				   struct rte_eth_hairpin_cap *cap)
> +{
> +	struct rte_eth_dev *dev;
> +
> +	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
> +
> +	dev = &rte_eth_devices[port_id];
> +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->hairpin_cap_get,
> +				-ENOTSUP);

Please, move -ENOTSUP to the previous line since line length is sufficient
and make it similar to port_id check above.
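I.e., assuming the line fits in 80 columns:

	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->hairpin_cap_get, -ENOTSUP);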

> +	memset(cap, 0, sizeof(*cap));
> +	return eth_err(port_id, (*dev->dev_ops->hairpin_cap_get)(dev, cap));
> +}
> +
> +int
>   rte_eth_dev_pool_ops_supported(uint16_t port_id, const char *pool)
>   {
>   	struct rte_eth_dev *dev;
> diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
> index 9e1f9ae..9b69255 100644
> --- a/lib/librte_ethdev/rte_ethdev.h
> +++ b/lib/librte_ethdev/rte_ethdev.h
> @@ -839,6 +839,46 @@ struct rte_eth_txconf {
>   };
>   
>   /**
> + * @warning
> + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
> + *
> + * A structure used to return the hairpin capabilities that are supported.
> + */
> +struct rte_eth_hairpin_cap {
> +	/** The max number of hairpin queues (different bindings). */
> +	uint16_t max_nb_queues;
> +	/**< Max number of Rx queues to be connected to one Tx queue. */

Should be /**

> +	uint16_t max_rx_2_tx;
> +	/**< Max number of Tx queues to be connected to one Rx queue. */

Should be /**

[snip]
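As an illustration of the comment style, the structure would then read as
below; the two members after the snip (max_tx_2_rx, max_nb_desc) are
reconstructed here from the checks earlier in the patch, so take them as an
assumption:

struct rte_eth_hairpin_cap {
	/** The max number of hairpin queues (different bindings). */
	uint16_t max_nb_queues;
	/** Max number of Rx queues to be connected to one Tx queue. */
	uint16_t max_rx_2_tx;
	/** Max number of Tx queues to be connected to one Rx queue. */
	uint16_t max_tx_2_rx;
	/** Max number of descriptors per hairpin queue. */
	uint16_t max_nb_desc;
};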


^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [dpdk-dev] [PATCH v6 02/14] ethdev: add support for hairpin queue
  2019-10-28 15:16     ` Andrew Rybchenko
@ 2019-10-28 18:44       ` Ori Kam
  2019-10-29  7:38         ` Andrew Rybchenko
  0 siblings, 1 reply; 186+ messages in thread
From: Ori Kam @ 2019-10-28 18:44 UTC (permalink / raw)
  To: Andrew Rybchenko, John McNamara, Marko Kovacevic,
	Thomas Monjalon, Ferruh Yigit
  Cc: dev, jingjing.wu, stephen

Hi Andrew,


> -----Original Message-----
> From: Andrew Rybchenko <arybchenko@solarflare.com>
> Sent: Monday, October 28, 2019 5:16 PM
> To: Ori Kam <orika@mellanox.com>; John McNamara
> <john.mcnamara@intel.com>; Marko Kovacevic
> <marko.kovacevic@intel.com>; Thomas Monjalon <thomas@monjalon.net>;
> Ferruh Yigit <ferruh.yigit@intel.com>
> Cc: dev@dpdk.org; jingjing.wu@intel.com; stephen@networkplumber.org
> Subject: Re: [dpdk-dev] [PATCH v6 02/14] ethdev: add support for hairpin queue
> 
> Hi Ori,
> 
> On 10/27/19 3:24 PM, Ori Kam wrote:
> > This commit introduces the hairpin queue type.
> >
> > The hairpin queue is built from an Rx queue bound to a Tx queue.
> > It is used to offload traffic coming from the wire and redirect it back
> > to the wire.
> >
> > There are 3 new functions:
> > - rte_eth_dev_hairpin_capability_get
> > - rte_eth_rx_hairpin_queue_setup
> > - rte_eth_tx_hairpin_queue_setup
> >
> > In order to use the queue, there is a need to create rte_flow
> > with queue / RSS action that targets one or more of the Rx queues.
> >
> > Signed-off-by: Ori Kam <orika@mellanox.com>
> > Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
> 
> LGTM, nothing critical, except maybe the maximum number check
> which I lost from my view before.
> Plus a few style suggestions which may be dropped, but I'd be
> happier if applied.
> 
> Thanks.
> 
I really appreciate your time and comments.
This patch is the base of a number of other series (Meta/Metering),
so if it is nothing critical I prefer to get this set merged and then change what is needed,
if it is OK with you.

Detailed comments, please see below.

> [snip]
> 
> > diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
> > index 7743205..68aca1f 100644
> > --- a/lib/librte_ethdev/rte_ethdev.c
> > +++ b/lib/librte_ethdev/rte_ethdev.c
> > @@ -923,6 +923,13 @@ struct rte_eth_dev *
> >
> >   	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_start, -
> ENOTSUP);
> >
> > +	if (rte_eth_dev_is_rx_hairpin_queue(dev, rx_queue_id)) {
> > +		RTE_ETHDEV_LOG(INFO,
> > +			"Can't start Rx queue %"PRIu16" of device with
> port_id=%"PRIu16" is hairpin queue\n",
> 
> The log message looks a bit strange:
>      Can't start Rx queue 5 of device with port_id=0 is hairpin queue
> maybe put the key information first:
>      Can't start hairpin Rx queue 5 of device with port_id=0
> 

I'm not a native English speaker but I think the meaning is different.
In my original log it means that you try to start a queue but fail due to
the fact that the queue is a hairpin queue.

In your version it means that you can't start a hairpin queue, but there is no
reason why not.

What do you think?

> > +			rx_queue_id, port_id);
> > +		return -EINVAL;
> > +	}
> > +
> >   	if (dev->data->rx_queue_state[rx_queue_id] !=
> RTE_ETH_QUEUE_STATE_STOPPED) {
> >   		RTE_ETHDEV_LOG(INFO,
> >   			"Queue %"PRIu16" of device with port_id=%"PRIu16"
> already started\n",
> > @@ -950,6 +957,13 @@ struct rte_eth_dev *
> >
> >   	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_stop, -
> ENOTSUP);
> >
> > +	if (rte_eth_dev_is_rx_hairpin_queue(dev, rx_queue_id)) {
> > +		RTE_ETHDEV_LOG(INFO,
> > +			"Can't stop Rx queue %"PRIu16" of device with
> port_id=%"PRIu16" is hairpin queue\n",
> 
> Same
>
Please see comment above.
 
> > +			rx_queue_id, port_id);
> > +		return -EINVAL;
> > +	}
> > +
> >   	if (dev->data->rx_queue_state[rx_queue_id] ==
> RTE_ETH_QUEUE_STATE_STOPPED) {
> >   		RTE_ETHDEV_LOG(INFO,
> >   			"Queue %"PRIu16" of device with port_id=%"PRIu16"
> already stopped\n",
> > @@ -983,6 +997,13 @@ struct rte_eth_dev *
> >
> >   	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_queue_start, -
> ENOTSUP);
> >
> > +	if (rte_eth_dev_is_tx_hairpin_queue(dev, tx_queue_id)) {
> > +		RTE_ETHDEV_LOG(INFO,
> > +			"Can't start Tx queue %"PRIu16" of device with
> port_id=%"PRIu16" is hairpin queue\n",
> 
> Same
> 
Please see comment above.

> > +			tx_queue_id, port_id);
> > +		return -EINVAL;
> > +	}
> > +
> >   	if (dev->data->tx_queue_state[tx_queue_id] !=
> RTE_ETH_QUEUE_STATE_STOPPED) {
> >   		RTE_ETHDEV_LOG(INFO,
> >   			"Queue %"PRIu16" of device with port_id=%"PRIu16"
> already started\n",
> > @@ -1008,6 +1029,13 @@ struct rte_eth_dev *
> >
> >   	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_queue_stop, -
> ENOTSUP);
> >
> > +	if (rte_eth_dev_is_tx_hairpin_queue(dev, tx_queue_id)) {
> > +		RTE_ETHDEV_LOG(INFO,
> > +			"Can't stop Tx queue %"PRIu16" of device with
> port_id=%"PRIu16" is hairpin queue\n",
> 
> Same
>

Please see comment above.
 
> > +			tx_queue_id, port_id);
> > +		return -EINVAL;
> > +	}
> > +
> >   	if (dev->data->tx_queue_state[tx_queue_id] ==
> RTE_ETH_QUEUE_STATE_STOPPED) {
> >   		RTE_ETHDEV_LOG(INFO,
> >   			"Queue %"PRIu16" of device with port_id=%"PRIu16"
> already stopped\n",
> > @@ -1780,6 +1808,79 @@ struct rte_eth_dev *
> >   }
> >
> >   int
> > +rte_eth_rx_hairpin_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
> > +			       uint16_t nb_rx_desc,
> > +			       const struct rte_eth_hairpin_conf *conf)
> > +{
> > +	int ret;
> > +	struct rte_eth_dev *dev;
> > +	struct rte_eth_hairpin_cap cap;
> > +	void **rxq;
> > +	int i;
> > +	int count = 0;
> > +
> > +	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
> > +
> > +	dev = &rte_eth_devices[port_id];
> > +	if (rx_queue_id >= dev->data->nb_rx_queues) {
> > +		RTE_ETHDEV_LOG(ERR, "Invalid RX queue_id=%u\n",
> rx_queue_id);
> > +		return -EINVAL;
> > +	}
> > +	ret = rte_eth_dev_hairpin_capability_get(port_id, &cap);
> > +	if (ret != 0)
> > +		return ret;
> > +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops-
> >rx_hairpin_queue_setup,
> > +				-ENOTSUP);
> 
> Most likely unsupported hairpin is caught by the capability get above.
> So, maybe it is better to move the check to just before its usage far below.
> Also, if the line length is sufficient I think it would be better to put -ENOTSUP
> on the previous line, just to follow the port_id check style.
> 

I think that in most functions we start with the checks.
Personally I like to have the basic checks at the beginning of the code,
but I will do what you think is best. If I remember correctly the line
length is too short, but I will test again.

> > +	/* Use default specified by driver, if nb_rx_desc is zero */
> > +	if (nb_rx_desc == 0)
> > +		nb_rx_desc = cap.max_nb_desc;
> 
> The function description and the comment above mention a PMD default, but
> there is no default; it just uses the maximum. I have no strong opinion
> on whether a default is really required or it is OK to say that the maximum is used.
> The only concern is: why maximum?
> 

Most likely the best value is the max, but I can add a new field to the cap
that says the default value. What do you think?

> > +	if (nb_rx_desc > cap.max_nb_desc) {
> > +		RTE_ETHDEV_LOG(ERR,
> > +			"Invalid value for nb_rx_desc(=%hu), should be: <=
> %hu",
> > +			nb_rx_desc, cap.max_nb_desc);
> > +		return -EINVAL;
> > +	}
> > +	if (conf->peer_count > cap.max_rx_2_tx) {
> > +		RTE_ETHDEV_LOG(ERR,
> > +			"Invalid value for number of peers for Rx queue(=%hu),
> should be: <= %hu",
> > +			conf->peer_count, cap.max_rx_2_tx);
> > +		return -EINVAL;
> > +	}
> > +	if (conf->peer_count == 0) {
> > +		RTE_ETHDEV_LOG(ERR,
> > +			"Invalid value for number of peers for Rx queue(=%hu),
> should be: > 0",
> > +			conf->peer_count);
> > +		return -EINVAL;
> > +	}
> > +	if (cap.max_nb_queues != UINT16_MAX) {
> 
> I'm not sure that we need to handle it separately. The code below
> should handle it, and there is no point in optimizing it.
> 

This is done to save time: if the user sets UINT16_MAX there is no point in the
loop. I can add the check as a condition of the loop, but then it looks incorrect
since we are checking something that can't be changed.
What do you think?

> > +		for (i = 0; i < dev->data->nb_rx_queues; i++) {
> 
> May I suggest to assign count = 0 to make it a bit easier to read and
> more robust against future changes.
> 

You mean add count = 0 to the first part of the loop?

> > +			if (rte_eth_dev_is_rx_hairpin_queue(dev, i))
> 
> The condition should be more tricky if we resetup hairpin queue.
> I.e. we should check if i is rx_queue_id and count it anyway.
> 
> > +				count++;
> > +		}
> > +		if (count > cap.max_nb_queues) {
> > +			RTE_ETHDEV_LOG(ERR, "To many Rx hairpin queues
> %d",
> 
> I think it would be useful to log max here as well to catch
> unset max cases easier.
> 

I'm not sure I understand.

> > +			count);
> > +			return -EINVAL;
> > +		}
> > +	}
> > +	if (dev->data->dev_started)
> > +		return -EBUSY;
> > +	rxq = dev->data->rx_queues;
> > +	if (rxq[rx_queue_id] != NULL) {
> > +		RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops-
> >rx_queue_release,
> > +					-ENOTSUP);
> > +		(*dev->dev_ops->rx_queue_release)(rxq[rx_queue_id]);
> > +		rxq[rx_queue_id] = NULL;
> > +	}
> > +	ret = (*dev->dev_ops->rx_hairpin_queue_setup)(dev, rx_queue_id,
> > +						      nb_rx_desc, conf);
> > +	if (ret == 0)
> > +		dev->data->rx_queue_state[rx_queue_id] =
> > +			RTE_ETH_QUEUE_STATE_HAIRPIN;
> > +	return eth_err(port_id, ret);
> > +}
> > +
> > +int
> >   rte_eth_tx_queue_setup(uint16_t port_id, uint16_t tx_queue_id,
> >   		       uint16_t nb_tx_desc, unsigned int socket_id,
> >   		       const struct rte_eth_txconf *tx_conf)
> > @@ -1878,6 +1979,78 @@ struct rte_eth_dev *
> >   		       tx_queue_id, nb_tx_desc, socket_id, &local_conf));
> >   }
> >
> > +int
> > +rte_eth_tx_hairpin_queue_setup(uint16_t port_id, uint16_t tx_queue_id,
> > +			       uint16_t nb_tx_desc,
> > +			       const struct rte_eth_hairpin_conf *conf)
> 
> Same notes as for Rx queue above.
> 

O.K. same comments.

> > +{
> > +	struct rte_eth_dev *dev;
> > +	struct rte_eth_hairpin_cap cap;
> > +	void **txq;
> > +	int i;
> > +	int count = 0;
> > +	int ret;
> > +
> > +	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
> > +	dev = &rte_eth_devices[port_id];
> > +	if (tx_queue_id >= dev->data->nb_tx_queues) {
> > +		RTE_ETHDEV_LOG(ERR, "Invalid TX queue_id=%u\n",
> tx_queue_id);
> > +		return -EINVAL;
> > +	}
> > +	ret = rte_eth_dev_hairpin_capability_get(port_id, &cap);
> > +	if (ret != 0)
> > +		return ret;
> > +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops-
> >tx_hairpin_queue_setup,
> > +				-ENOTSUP);
> > +	/* Use default specified by driver, if nb_tx_desc is zero */
> > +	if (nb_tx_desc == 0)
> > +		nb_tx_desc = cap.max_nb_desc;
> > +	if (nb_tx_desc > cap.max_nb_desc) {
> > +		RTE_ETHDEV_LOG(ERR,
> > +			"Invalid value for nb_tx_desc(=%hu), should be: <=
> %hu",
> > +			nb_tx_desc, cap.max_nb_desc);
> > +		return -EINVAL;
> > +	}
> > +	if (conf->peer_count > cap.max_tx_2_rx) {
> > +		RTE_ETHDEV_LOG(ERR,
> > +			"Invalid value for number of peers for Tx queue(=%hu),
> should be: <= %hu",
> > +			conf->peer_count, cap.max_tx_2_rx);
> > +		return -EINVAL;
> > +	}
> > +	if (conf->peer_count == 0) {
> > +		RTE_ETHDEV_LOG(ERR,
> > +			"Invalid value for number of peers for Tx queue(=%hu),
> should be: > 0",
> > +			conf->peer_count);
> > +		return -EINVAL;
> > +	}
> > +	if (cap.max_nb_queues != UINT16_MAX) {
> > +		for (i = 0; i < dev->data->nb_tx_queues; i++) {
> > +			if (rte_eth_dev_is_tx_hairpin_queue(dev, i))
> > +				count++;
> > +		}
> > +		if (count > cap.max_nb_queues) {
> > +			RTE_ETHDEV_LOG(ERR,
> > +				       "To many Tx hairpin queues %d", count);
> > +			return -EINVAL;
> > +		}
> > +	}
> > +	if (dev->data->dev_started)
> > +		return -EBUSY;
> > +	txq = dev->data->tx_queues;
> > +	if (txq[tx_queue_id] != NULL) {
> > +		RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops-
> >tx_queue_release,
> > +					-ENOTSUP);
> > +		(*dev->dev_ops->tx_queue_release)(txq[tx_queue_id]);
> > +		txq[tx_queue_id] = NULL;
> > +	}
> > +	ret = (*dev->dev_ops->tx_hairpin_queue_setup)
> > +		(dev, tx_queue_id, nb_tx_desc, conf);
> > +	if (ret == 0)
> > +		dev->data->tx_queue_state[tx_queue_id] =
> > +			RTE_ETH_QUEUE_STATE_HAIRPIN;
> > +	return eth_err(port_id, ret);
> > +}
> > +
> >   void
> >   rte_eth_tx_buffer_drop_callback(struct rte_mbuf **pkts, uint16_t unsent,
> >   		void *userdata __rte_unused)
> > @@ -4007,12 +4180,19 @@ int rte_eth_set_queue_rate_limit(uint16_t
> port_id, uint16_t queue_idx,
> >   	rte_errno = ENOTSUP;
> >   	return NULL;
> >   #endif
> > +	struct rte_eth_dev *dev;
> > +
> >   	/* check input parameters */
> >   	if (!rte_eth_dev_is_valid_port(port_id) || fn == NULL ||
> >   		    queue_id >= rte_eth_devices[port_id].data->nb_rx_queues) {
> >   		rte_errno = EINVAL;
> >   		return NULL;
> >   	}
> > +	dev = &rte_eth_devices[port_id];
> > +	if (rte_eth_dev_is_rx_hairpin_queue(dev, queue_id)) {
> > +		rte_errno = EINVAL;
> > +		return NULL;
> > +	}
> >   	struct rte_eth_rxtx_callback *cb = rte_zmalloc(NULL, sizeof(*cb), 0);
> >
> >   	if (cb == NULL) {
> > @@ -4084,6 +4264,8 @@ int rte_eth_set_queue_rate_limit(uint16_t port_id,
> uint16_t queue_idx,
> >   	rte_errno = ENOTSUP;
> >   	return NULL;
> >   #endif
> > +	struct rte_eth_dev *dev;
> > +
> >   	/* check input parameters */
> >   	if (!rte_eth_dev_is_valid_port(port_id) || fn == NULL ||
> >   		    queue_id >= rte_eth_devices[port_id].data->nb_tx_queues) {
> > @@ -4091,6 +4273,12 @@ int rte_eth_set_queue_rate_limit(uint16_t
> port_id, uint16_t queue_idx,
> >   		return NULL;
> >   	}
> >
> > +	dev = &rte_eth_devices[port_id];
> > +	if (rte_eth_dev_is_tx_hairpin_queue(dev, queue_id)) {
> > +		rte_errno = EINVAL;
> > +		return NULL;
> > +	}
> > +
> >   	struct rte_eth_rxtx_callback *cb = rte_zmalloc(NULL, sizeof(*cb), 0);
> >
> >   	if (cb == NULL) {
> > @@ -4204,6 +4392,13 @@ int rte_eth_set_queue_rate_limit(uint16_t
> port_id, uint16_t queue_idx,
> >   		return -EINVAL;
> >   	}
> >
> > +	if (rte_eth_dev_is_rx_hairpin_queue(dev, queue_id)) {
> > +		RTE_ETHDEV_LOG(INFO,
> > +			"Can't get queue info for Rx queue %"PRIu16" of device
> with port_id=%"PRIu16" is hairpin queue\n",
> 
> "queue" is repeated 3 times above ;) I'm afraid it is too much, may be:
> "Can't get hairpin Rx queue %" PRIu16 " port %" PRIu16 " info\n"
> or
> "Can't get hairpin Rx queue %" PRIu16 " info of device with port_id=%"
> PRIu16 "\n"
> Anyway up to you.
> 

O.K.  I will update.

> > +			queue_id, port_id);
> > +		return -EINVAL;
> > +	}
> > +
> >   	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rxq_info_get, -
> ENOTSUP);
> >
> >   	memset(qinfo, 0, sizeof(*qinfo));
> > @@ -4228,6 +4423,13 @@ int rte_eth_set_queue_rate_limit(uint16_t
> port_id, uint16_t queue_idx,
> >   		return -EINVAL;
> >   	}
> >
> > +	if (rte_eth_dev_is_tx_hairpin_queue(dev, queue_id)) {
> > +		RTE_ETHDEV_LOG(INFO,
> > +			"Can't get queue info for Tx queue %"PRIu16" of device
> with port_id=%"PRIu16" is hairpin queue\n",
> 
> Same
>

Same.
 
> > +			queue_id, port_id);
> > +		return -EINVAL;
> > +	}
> > +
> >   	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->txq_info_get, -
> ENOTSUP);
> >
> >   	memset(qinfo, 0, sizeof(*qinfo));
> > @@ -4600,6 +4802,21 @@ int rte_eth_set_queue_rate_limit(uint16_t
> port_id, uint16_t queue_idx,
> >   }
> >
> >   int
> > +rte_eth_dev_hairpin_capability_get(uint16_t port_id,
> > +				   struct rte_eth_hairpin_cap *cap)
> > +{
> > +	struct rte_eth_dev *dev;
> > +
> > +	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
> > +
> > +	dev = &rte_eth_devices[port_id];
> > +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->hairpin_cap_get,
> > +				-ENOTSUP);
> 
> Please, move -ENOTSUP to the previous line since line length is sufficient
> and make it similar to port_id check above.
> 

Last time I checked it didn't have room; I will check again.

> > +	memset(cap, 0, sizeof(*cap));
> > +	return eth_err(port_id, (*dev->dev_ops->hairpin_cap_get)(dev, cap));
> > +}
> > +
> > +int
> >   rte_eth_dev_pool_ops_supported(uint16_t port_id, const char *pool)
> >   {
> >   	struct rte_eth_dev *dev;
> > diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
> > index 9e1f9ae..9b69255 100644
> > --- a/lib/librte_ethdev/rte_ethdev.h
> > +++ b/lib/librte_ethdev/rte_ethdev.h
> > @@ -839,6 +839,46 @@ struct rte_eth_txconf {
> >   };
> >
> >   /**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change, or be removed, without prior
> notice
> > + *
> > + * A structure used to return the hairpin capabilities that are supported.
> > + */
> > +struct rte_eth_hairpin_cap {
> > +	/** The max number of hairpin queues (different bindings). */
> > +	uint16_t max_nb_queues;
> > +	/**< Max number of Rx queues to be connected to one Tx queue. */
> 
> Should be /**
> 

Will fix.

> > +	uint16_t max_rx_2_tx;
> > +	/**< Max number of Tx queues to be connected to one Rx queue. */
> 
> Should be /**
> 

Will fix.

> [snip]


Thanks,
Ori

^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [dpdk-dev] [PATCH v6 02/14] ethdev: add support for hairpin queue
  2019-10-28 18:44       ` Ori Kam
@ 2019-10-29  7:38         ` Andrew Rybchenko
  2019-10-29 19:39           ` Ori Kam
  0 siblings, 1 reply; 186+ messages in thread
From: Andrew Rybchenko @ 2019-10-29  7:38 UTC (permalink / raw)
  To: Ori Kam, John McNamara, Marko Kovacevic, Thomas Monjalon, Ferruh Yigit
  Cc: dev, jingjing.wu, stephen

On 10/28/19 9:44 PM, Ori Kam wrote:
> Hi Andrew,
>
>
>> -----Original Message-----
>> From: Andrew Rybchenko <arybchenko@solarflare.com>
>> Sent: Monday, October 28, 2019 5:16 PM
>> To: Ori Kam <orika@mellanox.com>; John McNamara
>> <john.mcnamara@intel.com>; Marko Kovacevic
>> <marko.kovacevic@intel.com>; Thomas Monjalon <thomas@monjalon.net>;
>> Ferruh Yigit <ferruh.yigit@intel.com>
>> Cc: dev@dpdk.org; jingjing.wu@intel.com; stephen@networkplumber.org
>> Subject: Re: [dpdk-dev] [PATCH v6 02/14] ethdev: add support for hairpin queue
>>
>> Hi Ori,
>>
>> On 10/27/19 3:24 PM, Ori Kam wrote:
>>> This commit introduces the hairpin queue type.
>>>
>>> The hairpin queue is built from an Rx queue bound to a Tx queue.
>>> It is used to offload traffic coming from the wire and redirect it back
>>> to the wire.
>>>
>>> There are 3 new functions:
>>> - rte_eth_dev_hairpin_capability_get
>>> - rte_eth_rx_hairpin_queue_setup
>>> - rte_eth_tx_hairpin_queue_setup
>>>
>>> In order to use the queue, there is a need to create rte_flow
>>> with queue / RSS action that targets one or more of the Rx queues.
>>>
>>> Signed-off-by: Ori Kam <orika@mellanox.com>
>>> Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
>> LGTM, nothing critical, except maybe the maximum number check
>> which I lost from my view before.
>> Plus a few style suggestions which may be dropped, but I'd be
>> happier if applied.
>>
>> Thanks.
>>
> I really appreciate your time and comments.
> This patch is the base of a number of other series (Meta/Metering),
> so if it is nothing critical I prefer to get this set merged and then change what is needed,
> if it is OK with you.

OK for me

> Detailed comments, please see below.
>
>> [snip]
>>
>>> diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
>>> index 7743205..68aca1f 100644
>>> --- a/lib/librte_ethdev/rte_ethdev.c
>>> +++ b/lib/librte_ethdev/rte_ethdev.c
>>> @@ -923,6 +923,13 @@ struct rte_eth_dev *
>>>
>>>    	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_start, -
>> ENOTSUP);
>>> +	if (rte_eth_dev_is_rx_hairpin_queue(dev, rx_queue_id)) {
>>> +		RTE_ETHDEV_LOG(INFO,
>>> +			"Can't start Rx queue %"PRIu16" of device with
>> port_id=%"PRIu16" is hairpin queue\n",
>>
>> The log message looks a bit strange:
>>       Can't start Rx queue 5 of device with port_id=0 is hairpin queue
>> maybe put the key information first:
>>       Can't start hairpin Rx queue 5 of device with port_id=0
>>
> I'm not a native English speaker but I think the meaning is different.

Obviously me too

> In my original log it means that you try to start a queue but fail due to
> the fact that the queue is a hairpin queue.
>
> In your version it means that you can't start a hairpin queue, but there is no
> reason why not.
>
> What do you think?

Let's keep your version if there are no better suggestions from native
speakers.

>>> +			rx_queue_id, port_id);
>>> +		return -EINVAL;
>>> +	}
>>> +
>>>    	if (dev->data->rx_queue_state[rx_queue_id] !=
>> RTE_ETH_QUEUE_STATE_STOPPED) {
>>>    		RTE_ETHDEV_LOG(INFO,
>>>    			"Queue %"PRIu16" of device with port_id=%"PRIu16"
>> already started\n",
>>> @@ -950,6 +957,13 @@ struct rte_eth_dev *
>>>
>>>    	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_stop, -
>> ENOTSUP);
>>> +	if (rte_eth_dev_is_rx_hairpin_queue(dev, rx_queue_id)) {
>>> +		RTE_ETHDEV_LOG(INFO,
>>> +			"Can't stop Rx queue %"PRIu16" of device with
>> port_id=%"PRIu16" is hairpin queue\n",
>>
>> Same
>>
> Please see comment above.
>   
>>> +			rx_queue_id, port_id);
>>> +		return -EINVAL;
>>> +	}
>>> +
>>>    	if (dev->data->rx_queue_state[rx_queue_id] ==
>> RTE_ETH_QUEUE_STATE_STOPPED) {
>>>    		RTE_ETHDEV_LOG(INFO,
>>>    			"Queue %"PRIu16" of device with port_id=%"PRIu16"
>> already stopped\n",
>>> @@ -983,6 +997,13 @@ struct rte_eth_dev *
>>>
>>>    	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_queue_start, -
>> ENOTSUP);
>>> +	if (rte_eth_dev_is_tx_hairpin_queue(dev, tx_queue_id)) {
>>> +		RTE_ETHDEV_LOG(INFO,
>>> +			"Can't start Tx queue %"PRIu16" of device with
>> port_id=%"PRIu16" is hairpin queue\n",
>>
>> Same
>>
> Please see comment above.
>
>>> +			tx_queue_id, port_id);
>>> +		return -EINVAL;
>>> +	}
>>> +
>>>    	if (dev->data->tx_queue_state[tx_queue_id] !=
>> RTE_ETH_QUEUE_STATE_STOPPED) {
>>>    		RTE_ETHDEV_LOG(INFO,
>>>    			"Queue %"PRIu16" of device with port_id=%"PRIu16"
>> already started\n",
>>> @@ -1008,6 +1029,13 @@ struct rte_eth_dev *
>>>
>>>    	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_queue_stop, -
>> ENOTSUP);
>>> +	if (rte_eth_dev_is_tx_hairpin_queue(dev, tx_queue_id)) {
>>> +		RTE_ETHDEV_LOG(INFO,
>>> +			"Can't stop Tx queue %"PRIu16" of device with
>> port_id=%"PRIu16" is hairpin queue\n",
>>
>> Same
>>
> Please see comment above.
>   
>>> +			tx_queue_id, port_id);
>>> +		return -EINVAL;
>>> +	}
>>> +
>>>    	if (dev->data->tx_queue_state[tx_queue_id] ==
>> RTE_ETH_QUEUE_STATE_STOPPED) {
>>>    		RTE_ETHDEV_LOG(INFO,
>>>    			"Queue %"PRIu16" of device with port_id=%"PRIu16"
>> already stopped\n",
>>> @@ -1780,6 +1808,79 @@ struct rte_eth_dev *
>>>    }
>>>
>>>    int
>>> +rte_eth_rx_hairpin_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
>>> +			       uint16_t nb_rx_desc,
>>> +			       const struct rte_eth_hairpin_conf *conf)
>>> +{
>>> +	int ret;
>>> +	struct rte_eth_dev *dev;
>>> +	struct rte_eth_hairpin_cap cap;
>>> +	void **rxq;
>>> +	int i;
>>> +	int count = 0;
>>> +
>>> +	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
>>> +
>>> +	dev = &rte_eth_devices[port_id];
>>> +	if (rx_queue_id >= dev->data->nb_rx_queues) {
>>> +		RTE_ETHDEV_LOG(ERR, "Invalid RX queue_id=%u\n",
>> rx_queue_id);
>>> +		return -EINVAL;
>>> +	}
>>> +	ret = rte_eth_dev_hairpin_capability_get(port_id, &cap);
>>> +	if (ret != 0)
>>> +		return ret;
>>> +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops-
>>> rx_hairpin_queue_setup,
>>> +				-ENOTSUP);
>> Most likely unsupported hairpin is caught by the capability get above.
>> So, maybe it is better to move the check to just before its usage far below.
>> Also, if the line length is sufficient I think it would be better to put -ENOTSUP
>> on the previous line, just to follow the port_id check style.
>>
> I think that in most functions we start with the checks.
> Personally I like to have the basic checks at the beginning of the code,
> but I will do what you think is best. If I remember correctly the line
> length is too short, but I will test again.

Up to you. Thanks.

>>> +	/* Use default specified by driver, if nb_rx_desc is zero */
>>> +	if (nb_rx_desc == 0)
>>> +		nb_rx_desc = cap.max_nb_desc;
>> The function description and the comment above mention a PMD default, but
>> there is no default; it just uses the maximum. I have no strong opinion
>> on whether a default is really required or it is OK to say that the maximum is used.
>> The only concern is: why maximum?
>>
> Most likely the best value is the max, but I can add a new field to the cap
> that says the default value. What do you think?

I'm not 100% sure, since a default requires 0-value handling, and
I think falling back to the maximum could be the right handling here.
Maybe it is better to document that the maximum is used and
introduce a default if it is really required in the future.
It should be reconsidered when the API is promoted from
experimental to stable.

Basically both options are OK for me.
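E.g. the nb_rx_desc parameter description could say something like this
(wording is only a suggestion):

 * @param nb_rx_desc
 *   The number of Rx descriptors to allocate for the hairpin queue.
 *   If 0, the maximum reported by rte_eth_dev_hairpin_capability_get()
 *   (cap.max_nb_desc) is used.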

>>> +	if (nb_rx_desc > cap.max_nb_desc) {
>>> +		RTE_ETHDEV_LOG(ERR,
>>> +			"Invalid value for nb_rx_desc(=%hu), should be: <=
>> %hu",
>>> +			nb_rx_desc, cap.max_nb_desc);
>>> +		return -EINVAL;
>>> +	}
>>> +	if (conf->peer_count > cap.max_rx_2_tx) {
>>> +		RTE_ETHDEV_LOG(ERR,
>>> +			"Invalid value for number of peers for Rx queue(=%hu),
>> should be: <= %hu",
>>> +			conf->peer_count, cap.max_rx_2_tx);
>>> +		return -EINVAL;
>>> +	}
>>> +	if (conf->peer_count == 0) {
>>> +		RTE_ETHDEV_LOG(ERR,
>>> +			"Invalid value for number of peers for Rx queue(=%hu),
>> should be: > 0",
>>> +			conf->peer_count);
>>> +		return -EINVAL;
>>> +	}
>>> +	if (cap.max_nb_queues != UINT16_MAX) {
>> I'm not sure that we need to handle it separately. The code below
>> should handle it, and there is no point in optimizing it.
>>
> This is done to save time: if the user sets UINT16_MAX there is no point in the
> loop. I can add the check as a condition of the loop, but then it looks incorrect
> since we are checking something that can't be changed.
> What do you think?

Frankly speaking I see no value in the optimization. It is the control
path and I'd prefer simpler code here.

>>> +		for (i = 0; i < dev->data->nb_rx_queues; i++) {
>> May I suggest to assign count = 0 to make it a bit easier to read and
>> more robust against future changes.
>>
> You mean add count = 0 to the first part of the loop?

Yes, right now count initialization is done too far from the line.

>>> +			if (rte_eth_dev_is_rx_hairpin_queue(dev, i))
>> The condition should be more tricky if we resetup hairpin queue.
>> I.e. we should check if i is rx_queue_id and count it anyway.
>>
>>> +				count++;
>>> +		}
>>> +		if (count > cap.max_nb_queues) {
>>> +			RTE_ETHDEV_LOG(ERR, "To many Rx hairpin queues
>> %d",
>>
>> I think it would be useful to log max here as well to catch
>> unset max cases easier.
>>
> I'm not sure I understand.

If the question is about logging, the answer is simple:
if the user forgets to initialize the maximum number of hairpin queues
properly, it will be zero and setup will fail here. So, it would be
good to log the maximum value here just to make it clear which
limit is exceeded.

If the question is about the above check, let's consider the case when
the maximum is one and one hairpin queue is already set up, but the
user tries to set up one more. The above loop will count only one since
the hairpin state for the current queue is set below. So, the condition will
allow setting up a second hairpin queue.
In theory, we could initialize count=1 to count this one, but
it would break the case when we call setup once again for a
queue which is already hairpin. The API allows and handles it.
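
Folding both points together, an untested sketch of what I mean:

	count = 0;
	for (i = 0; i < dev->data->nb_rx_queues; i++) {
		/* Count the queue being set up unconditionally: its
		 * hairpin state is set only after setup succeeds, yet
		 * it consumes one binding, while a re-setup of an
		 * existing hairpin queue is still counted just once. */
		if (i == rx_queue_id ||
		    rte_eth_dev_is_rx_hairpin_queue(dev, i))
			count++;
	}
	if (count > cap.max_nb_queues) {
		RTE_ETHDEV_LOG(ERR,
			"Too many Rx hairpin queues %d, max is %hu",
			count, cap.max_nb_queues);
		return -EINVAL;
	}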

>>> +			count);
>>> +			return -EINVAL;
>>> +		}
>>> +	}
>>> +	if (dev->data->dev_started)
>>> +		return -EBUSY;
>>> +	rxq = dev->data->rx_queues;
>>> +	if (rxq[rx_queue_id] != NULL) {
>>> +		RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops-
>>> rx_queue_release,
>>> +					-ENOTSUP);
>>> +		(*dev->dev_ops->rx_queue_release)(rxq[rx_queue_id]);
>>> +		rxq[rx_queue_id] = NULL;
>>> +	}
>>> +	ret = (*dev->dev_ops->rx_hairpin_queue_setup)(dev, rx_queue_id,
>>> +						      nb_rx_desc, conf);
>>> +	if (ret == 0)
>>> +		dev->data->rx_queue_state[rx_queue_id] =
>>> +			RTE_ETH_QUEUE_STATE_HAIRPIN;
>>> +	return eth_err(port_id, ret);
>>> +}
>>> +
>>> +int
>>>    rte_eth_tx_queue_setup(uint16_t port_id, uint16_t tx_queue_id,
>>>    		       uint16_t nb_tx_desc, unsigned int socket_id,
>>>    		       const struct rte_eth_txconf *tx_conf)
>>> @@ -1878,6 +1979,78 @@ struct rte_eth_dev *
>>>    		       tx_queue_id, nb_tx_desc, socket_id, &local_conf));
>>>    }
>>>
>>> +int
>>> +rte_eth_tx_hairpin_queue_setup(uint16_t port_id, uint16_t tx_queue_id,
>>> +			       uint16_t nb_tx_desc,
>>> +			       const struct rte_eth_hairpin_conf *conf)
>> Same notes as for Rx queue above.
>>
> O.K. same comments.
>
>>> +{
>>> +	struct rte_eth_dev *dev;
>>> +	struct rte_eth_hairpin_cap cap;
>>> +	void **txq;
>>> +	int i;
>>> +	int count = 0;
>>> +	int ret;
>>> +
>>> +	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
>>> +	dev = &rte_eth_devices[port_id];
>>> +	if (tx_queue_id >= dev->data->nb_tx_queues) {
>>> +		RTE_ETHDEV_LOG(ERR, "Invalid TX queue_id=%u\n",
>> tx_queue_id);
>>> +		return -EINVAL;
>>> +	}
>>> +	ret = rte_eth_dev_hairpin_capability_get(port_id, &cap);
>>> +	if (ret != 0)
>>> +		return ret;
>>> +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops-
>>> tx_hairpin_queue_setup,
>>> +				-ENOTSUP);
>>> +	/* Use default specified by driver, if nb_tx_desc is zero */
>>> +	if (nb_tx_desc == 0)
>>> +		nb_tx_desc = cap.max_nb_desc;
>>> +	if (nb_tx_desc > cap.max_nb_desc) {
>>> +		RTE_ETHDEV_LOG(ERR,
>>> +			"Invalid value for nb_tx_desc(=%hu), should be: <=
>> %hu",
>>> +			nb_tx_desc, cap.max_nb_desc);
>>> +		return -EINVAL;
>>> +	}
>>> +	if (conf->peer_count > cap.max_tx_2_rx) {
>>> +		RTE_ETHDEV_LOG(ERR,
>>> +			"Invalid value for number of peers for Tx queue(=%hu),
>> should be: <= %hu",
>>> +			conf->peer_count, cap.max_tx_2_rx);
>>> +		return -EINVAL;
>>> +	}
>>> +	if (conf->peer_count == 0) {
>>> +		RTE_ETHDEV_LOG(ERR,
>>> +			"Invalid value for number of peers for Tx queue(=%hu),
>> should be: > 0",
>>> +			conf->peer_count);
>>> +		return -EINVAL;
>>> +	}
>>> +	if (cap.max_nb_queues != UINT16_MAX) {
>>> +		for (i = 0; i < dev->data->nb_tx_queues; i++) {
>>> +			if (rte_eth_dev_is_tx_hairpin_queue(dev, i))
>>> +				count++;
>>> +		}
>>> +		if (count > cap.max_nb_queues) {
>>> +			RTE_ETHDEV_LOG(ERR,
>>> +				       "To many Tx hairpin queues %d", count);
>>> +			return -EINVAL;
>>> +		}
>>> +	}
>>> +	if (dev->data->dev_started)
>>> +		return -EBUSY;
>>> +	txq = dev->data->tx_queues;
>>> +	if (txq[tx_queue_id] != NULL) {
>>> +		RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops-
>>> tx_queue_release,
>>> +					-ENOTSUP);
>>> +		(*dev->dev_ops->tx_queue_release)(txq[tx_queue_id]);
>>> +		txq[tx_queue_id] = NULL;
>>> +	}
>>> +	ret = (*dev->dev_ops->tx_hairpin_queue_setup)
>>> +		(dev, tx_queue_id, nb_tx_desc, conf);
>>> +	if (ret == 0)
>>> +		dev->data->tx_queue_state[tx_queue_id] =
>>> +			RTE_ETH_QUEUE_STATE_HAIRPIN;
>>> +	return eth_err(port_id, ret);
>>> +}
>>> +
>>>    void
>>>    rte_eth_tx_buffer_drop_callback(struct rte_mbuf **pkts, uint16_t unsent,
>>>    		void *userdata __rte_unused)
>>> @@ -4007,12 +4180,19 @@ int rte_eth_set_queue_rate_limit(uint16_t
>> port_id, uint16_t queue_idx,
>>>    	rte_errno = ENOTSUP;
>>>    	return NULL;
>>>    #endif
>>> +	struct rte_eth_dev *dev;
>>> +
>>>    	/* check input parameters */
>>>    	if (!rte_eth_dev_is_valid_port(port_id) || fn == NULL ||
>>>    		    queue_id >= rte_eth_devices[port_id].data->nb_rx_queues) {
>>>    		rte_errno = EINVAL;
>>>    		return NULL;
>>>    	}
>>> +	dev = &rte_eth_devices[port_id];
>>> +	if (rte_eth_dev_is_rx_hairpin_queue(dev, queue_id)) {
>>> +		rte_errno = EINVAL;
>>> +		return NULL;
>>> +	}
>>>    	struct rte_eth_rxtx_callback *cb = rte_zmalloc(NULL, sizeof(*cb), 0);
>>>
>>>    	if (cb == NULL) {
>>> @@ -4084,6 +4264,8 @@ int rte_eth_set_queue_rate_limit(uint16_t port_id,
>> uint16_t queue_idx,
>>>    	rte_errno = ENOTSUP;
>>>    	return NULL;
>>>    #endif
>>> +	struct rte_eth_dev *dev;
>>> +
>>>    	/* check input parameters */
>>>    	if (!rte_eth_dev_is_valid_port(port_id) || fn == NULL ||
>>>    		    queue_id >= rte_eth_devices[port_id].data->nb_tx_queues) {
>>> @@ -4091,6 +4273,12 @@ int rte_eth_set_queue_rate_limit(uint16_t
>> port_id, uint16_t queue_idx,
>>>    		return NULL;
>>>    	}
>>>
>>> +	dev = &rte_eth_devices[port_id];
>>> +	if (rte_eth_dev_is_tx_hairpin_queue(dev, queue_id)) {
>>> +		rte_errno = EINVAL;
>>> +		return NULL;
>>> +	}
>>> +
>>>    	struct rte_eth_rxtx_callback *cb = rte_zmalloc(NULL, sizeof(*cb), 0);
>>>
>>>    	if (cb == NULL) {
>>> @@ -4204,6 +4392,13 @@ int rte_eth_set_queue_rate_limit(uint16_t
>> port_id, uint16_t queue_idx,
>>>    		return -EINVAL;
>>>    	}
>>>
>>> +	if (rte_eth_dev_is_rx_hairpin_queue(dev, queue_id)) {
>>> +		RTE_ETHDEV_LOG(INFO,
>>> +			"Can't get queue info for Rx queue %"PRIu16" of device
>> with port_id=%"PRIu16" is hairpin queue\n",
>>
>> "queue" is repeated 3 times above ;) I'm afraid it is too much, may be:
>> "Can't get hairpin Rx queue %" PRIu16 " port %" PRIu16 " info\n"
>> or
>> "Can't get hairpin Rx queue %" PRIu16 " info of device with port_id=%"
>> PRIu16 "\n"
>> Anyway up to you.
>>
> O.K.  I will update.
>
>>> +			queue_id, port_id);
>>> +		return -EINVAL;
>>> +	}
>>> +
>>>    	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rxq_info_get, -
>> ENOTSUP);
>>>    	memset(qinfo, 0, sizeof(*qinfo));
>>> @@ -4228,6 +4423,13 @@ int rte_eth_set_queue_rate_limit(uint16_t
>> port_id, uint16_t queue_idx,
>>>    		return -EINVAL;
>>>    	}
>>>
>>> +	if (rte_eth_dev_is_tx_hairpin_queue(dev, queue_id)) {
>>> +		RTE_ETHDEV_LOG(INFO,
>>> +			"Can't get queue info for Tx queue %"PRIu16" of device
>> with port_id=%"PRIu16" is hairpin queue\n",
>>
>> Same
>>
> Same.
>   
>>> +			queue_id, port_id);
>>> +		return -EINVAL;
>>> +	}
>>> +
>>>    	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->txq_info_get, -
>> ENOTSUP);
>>>    	memset(qinfo, 0, sizeof(*qinfo));
>>> @@ -4600,6 +4802,21 @@ int rte_eth_set_queue_rate_limit(uint16_t
>> port_id, uint16_t queue_idx,
>>>    }
>>>
>>>    int
>>> +rte_eth_dev_hairpin_capability_get(uint16_t port_id,
>>> +				   struct rte_eth_hairpin_cap *cap)
>>> +{
>>> +	struct rte_eth_dev *dev;
>>> +
>>> +	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
>>> +
>>> +	dev = &rte_eth_devices[port_id];
>>> +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->hairpin_cap_get,
>>> +				-ENOTSUP);
>> Please, move -ENOTSUP to the previous line since line length is sufficient
>> and make it similar to port_id check above.
>>
> Last time I checked it didn't have room; I will check again.
>
>>> +	memset(cap, 0, sizeof(*cap));
>>> +	return eth_err(port_id, (*dev->dev_ops->hairpin_cap_get)(dev, cap));
>>> +}
>>> +
>>> +int
>>>    rte_eth_dev_pool_ops_supported(uint16_t port_id, const char *pool)
>>>    {
>>>    	struct rte_eth_dev *dev;
>>> diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
>>> index 9e1f9ae..9b69255 100644
>>> --- a/lib/librte_ethdev/rte_ethdev.h
>>> +++ b/lib/librte_ethdev/rte_ethdev.h
>>> @@ -839,6 +839,46 @@ struct rte_eth_txconf {
>>>    };
>>>
>>>    /**
>>> + * @warning
>>> + * @b EXPERIMENTAL: this API may change, or be removed, without prior
>> notice
>>> + *
>>> + * A structure used to return the hairpin capabilities that are supported.
>>> + */
>>> +struct rte_eth_hairpin_cap {
>>> +	/** The max number of hairpin queues (different bindings). */
>>> +	uint16_t max_nb_queues;
>>> +	/**< Max number of Rx queues to be connected to one Tx queue. */
>> Should be /**
>>
> Will fix.
>
>>> +	uint16_t max_rx_2_tx;
>>> +	/**< Max number of Tx queues to be connected to one Rx queue. */
>> Should be /**
>>
> Will fix.
>
>> [snip]
>
> Thanks,
> Ori


^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [dpdk-dev] [PATCH v6 02/14] ethdev: add support for hairpin queue
  2019-10-29  7:38         ` Andrew Rybchenko
@ 2019-10-29 19:39           ` Ori Kam
  2019-10-30  6:39             ` Andrew Rybchenko
  0 siblings, 1 reply; 186+ messages in thread
From: Ori Kam @ 2019-10-29 19:39 UTC (permalink / raw)
  To: Andrew Rybchenko, John McNamara, Marko Kovacevic,
	Thomas Monjalon, Ferruh Yigit
  Cc: dev, jingjing.wu, stephen

Hi Andrew,

> -----Original Message-----
> From: Andrew Rybchenko <arybchenko@solarflare.com>
> Sent: Tuesday, October 29, 2019 9:39 AM
> To: Ori Kam <orika@mellanox.com>; John McNamara
> <john.mcnamara@intel.com>; Marko Kovacevic
> <marko.kovacevic@intel.com>; Thomas Monjalon <thomas@monjalon.net>;
> Ferruh Yigit <ferruh.yigit@intel.com>
> Cc: dev@dpdk.org; jingjing.wu@intel.com; stephen@networkplumber.org
> Subject: Re: [dpdk-dev] [PATCH v6 02/14] ethdev: add support for hairpin queue
> 
> On 10/28/19 9:44 PM, Ori Kam wrote:
> > Hi Andrew,
> >
> >
> >> -----Original Message-----
> >> From: Andrew Rybchenko <arybchenko@solarflare.com>
> >> Sent: Monday, October 28, 2019 5:16 PM
> >> To: Ori Kam <orika@mellanox.com>; John McNamara
> >> <john.mcnamara@intel.com>; Marko Kovacevic
> >> <marko.kovacevic@intel.com>; Thomas Monjalon <thomas@monjalon.net>;
> >> Ferruh Yigit <ferruh.yigit@intel.com>
> >> Cc: dev@dpdk.org; jingjing.wu@intel.com; stephen@networkplumber.org
> >> Subject: Re: [dpdk-dev] [PATCH v6 02/14] ethdev: add support for hairpin
> queue
> >>
> >> Hi Ori,
> >>
> >> On 10/27/19 3:24 PM, Ori Kam wrote:
> >>> This commit introduces the hairpin queue type.
> >>>
> >>> The hairpin queue is built from an Rx queue bound to a Tx queue.
> >>> It is used to offload traffic coming from the wire and redirect it back
> >>> to the wire.
> >>>
> >>> There are 3 new functions:
> >>> - rte_eth_dev_hairpin_capability_get
> >>> - rte_eth_rx_hairpin_queue_setup
> >>> - rte_eth_tx_hairpin_queue_setup
> >>>
> >>> In order to use the queue, there is a need to create rte_flow
> >>> with queue / RSS action that targets one or more of the Rx queues.
> >>>
> >>> Signed-off-by: Ori Kam <orika@mellanox.com>
> >>> Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
> >> LGTM, nothing critical, except maybe the maximum number check
> >> which I lost from my view before.
> >> Plus a few style suggestions which may be dropped, but I'd be
> >> happier if applied.
> >>
> >> Thanks.
> >>
> > I really appreciate your time and comments.
> > This patch is the base of a number of other series (Meta/Metering),
> > so if it is nothing critical I prefer to get this set merged and then change what is needed,
> > if it is OK with you.
> 
> OK for me
> 

Thanks, I will send a new patch as soon as this gets merged.

> > Detailed comments, please see below.
> >
> >> [snip]
> >>
> >>> diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
> >>> index 7743205..68aca1f 100644
> >>> --- a/lib/librte_ethdev/rte_ethdev.c
> >>> +++ b/lib/librte_ethdev/rte_ethdev.c
> >>> @@ -923,6 +923,13 @@ struct rte_eth_dev *
> >>>
> >>>    	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_start, -
> >> ENOTSUP);
> >>> +	if (rte_eth_dev_is_rx_hairpin_queue(dev, rx_queue_id)) {
> >>> +		RTE_ETHDEV_LOG(INFO,
> >>> +			"Can't start Rx queue %"PRIu16" of device with
> >> port_id=%"PRIu16" is hairpin queue\n",
> >>
> >> The log message looks a bit strange:
> >>       Can't start Rx queue 5 of device with port_id=0 is hairpin queue
> >> maybe put the key information first:
> >>       Can't start hairpin Rx queue 5 of device with port_id=0
> >>
> > I'm not a native English speaker but I think the meaning is different.
> 
> Obviously me too
> 
> > In my original log it means that you try to start a queue but fail due to
> > the fact that the queue is a hairpin queue.
> >
> > In your version it means that you can't start a hairpin queue, but there is no
> > reason why not.
> >
> > What do you think?
> 
> Let's keep your version if there are no better suggestions from native
> speakers.
> 

Thanks,

> >>> +			rx_queue_id, port_id);
> >>> +		return -EINVAL;
> >>> +	}
> >>> +
> >>>    	if (dev->data->rx_queue_state[rx_queue_id] !=
> >> RTE_ETH_QUEUE_STATE_STOPPED) {
> >>>    		RTE_ETHDEV_LOG(INFO,
> >>>    			"Queue %"PRIu16" of device with port_id=%"PRIu16"
> >> already started\n",
> >>> @@ -950,6 +957,13 @@ struct rte_eth_dev *
> >>>
> >>>    	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_stop, -
> >> ENOTSUP);
> >>> +	if (rte_eth_dev_is_rx_hairpin_queue(dev, rx_queue_id)) {
> >>> +		RTE_ETHDEV_LOG(INFO,
> >>> +			"Can't stop Rx queue %"PRIu16" of device with
> >> port_id=%"PRIu16" is hairpin queue\n",
> >>
> >> Same
> >>
> > Please see comment above.
> >
> >>> +			rx_queue_id, port_id);
> >>> +		return -EINVAL;
> >>> +	}
> >>> +
> >>>    	if (dev->data->rx_queue_state[rx_queue_id] ==
> >> RTE_ETH_QUEUE_STATE_STOPPED) {
> >>>    		RTE_ETHDEV_LOG(INFO,
> >>>    			"Queue %"PRIu16" of device with port_id=%"PRIu16"
> >> already stopped\n",
> >>> @@ -983,6 +997,13 @@ struct rte_eth_dev *
> >>>
> >>>    	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_queue_start, -
> >> ENOTSUP);
> >>> +	if (rte_eth_dev_is_tx_hairpin_queue(dev, tx_queue_id)) {
> >>> +		RTE_ETHDEV_LOG(INFO,
> >>> +			"Can't start Tx queue %"PRIu16" of device with
> >> port_id=%"PRIu16" is hairpin queue\n",
> >>
> >> Same
> >>
> > Please see comment above.
> >
> >>> +			tx_queue_id, port_id);
> >>> +		return -EINVAL;
> >>> +	}
> >>> +
> >>>    	if (dev->data->tx_queue_state[tx_queue_id] !=
> >> RTE_ETH_QUEUE_STATE_STOPPED) {
> >>>    		RTE_ETHDEV_LOG(INFO,
> >>>    			"Queue %"PRIu16" of device with port_id=%"PRIu16"
> >> already started\n",
> >>> @@ -1008,6 +1029,13 @@ struct rte_eth_dev *
> >>>
> >>>    	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_queue_stop, -
> >> ENOTSUP);
> >>> +	if (rte_eth_dev_is_tx_hairpin_queue(dev, tx_queue_id)) {
> >>> +		RTE_ETHDEV_LOG(INFO,
> >>> +			"Can't stop Tx queue %"PRIu16" of device with
> >> port_id=%"PRIu16" is hairpin queue\n",
> >>
> >> Same
> >>
> > Please see comment above.
> >
> >>> +			tx_queue_id, port_id);
> >>> +		return -EINVAL;
> >>> +	}
> >>> +
> >>>    	if (dev->data->tx_queue_state[tx_queue_id] ==
> >> RTE_ETH_QUEUE_STATE_STOPPED) {
> >>>    		RTE_ETHDEV_LOG(INFO,
> >>>    			"Queue %"PRIu16" of device with port_id=%"PRIu16"
> >> already stopped\n",
> >>> @@ -1780,6 +1808,79 @@ struct rte_eth_dev *
> >>>    }
> >>>
> >>>    int
> >>> +rte_eth_rx_hairpin_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
> >>> +			       uint16_t nb_rx_desc,
> >>> +			       const struct rte_eth_hairpin_conf *conf)
> >>> +{
> >>> +	int ret;
> >>> +	struct rte_eth_dev *dev;
> >>> +	struct rte_eth_hairpin_cap cap;
> >>> +	void **rxq;
> >>> +	int i;
> >>> +	int count = 0;
> >>> +
> >>> +	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
> >>> +
> >>> +	dev = &rte_eth_devices[port_id];
> >>> +	if (rx_queue_id >= dev->data->nb_rx_queues) {
> >>> +		RTE_ETHDEV_LOG(ERR, "Invalid RX queue_id=%u\n",
> >> rx_queue_id);
> >>> +		return -EINVAL;
> >>> +	}
> >>> +	ret = rte_eth_dev_hairpin_capability_get(port_id, &cap);
> >>> +	if (ret != 0)
> >>> +		return ret;
> >>> +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops-
> >>> rx_hairpin_queue_setup,
> >>> +				-ENOTSUP);
> >> Most likely unsupported hairpin is caught by the capability get above.
> >> So, maybe it is better to move the check to just before its usage far below.
> >> Also, if the line length is sufficient I think it would be better to put -ENOTSUP
> >> on the previous line, just to follow the port_id check style.
> >>
> > I think that in most functions we start with the checks.
> > Personally I like to have the basic checks at the beginning of the code,
> > but I will do what you think is best. If I remember correctly the line
> > length is too short, but I will test again.
> 
> Up to you. Thanks.
> 

I will keep my version, but I will test again whether I can merge it into one line.

> >>> +	/* Use default specified by driver, if nb_rx_desc is zero */
> >>> +	if (nb_rx_desc == 0)
> >>> +		nb_rx_desc = cap.max_nb_desc;
> >> The function description and the comment above mention a PMD default, but
> >> there is no default; it just uses the maximum. I have no strong opinion
> >> on whether a default is really required or it is OK to say that the maximum is used.
> >> The only concern is: why maximum?
> >>
> > Most likely the best value is the max, but I can add a new field to the cap
> > that says the default value. What do you think?
> 
> I'm not 100% sure, since a default requires 0-value handling, and
> I think falling back to the maximum could be the right handling here.
> Maybe it is better to document that the maximum is used and
> introduce a default if it is really required in the future.
> It should be reconsidered when the API is promoted from
> experimental to stable.
> 
> Basically both options are OK for me.
> 

I like your idea, I will change the documentation to say that the max will be used.

> >>> +	if (nb_rx_desc > cap.max_nb_desc) {
> >>> +		RTE_ETHDEV_LOG(ERR,
> >>> +			"Invalid value for nb_rx_desc(=%hu), should be: <=
> >> %hu",
> >>> +			nb_rx_desc, cap.max_nb_desc);
> >>> +		return -EINVAL;
> >>> +	}
> >>> +	if (conf->peer_count > cap.max_rx_2_tx) {
> >>> +		RTE_ETHDEV_LOG(ERR,
> >>> +			"Invalid value for number of peers for Rx queue(=%hu),
> >> should be: <= %hu",
> >>> +			conf->peer_count, cap.max_rx_2_tx);
> >>> +		return -EINVAL;
> >>> +	}
> >>> +	if (conf->peer_count == 0) {
> >>> +		RTE_ETHDEV_LOG(ERR,
> >>> +			"Invalid value for number of peers for Rx queue(=%hu),
> >> should be: > 0",
> >>> +			conf->peer_count);
> >>> +		return -EINVAL;
> >>> +	}
> >>> +	if (cap.max_nb_queues != UINT16_MAX) {
> >> I'm not sure that we need to handle it separately. The code below
> >> should handle it, and there is no point in optimizing it.
> >>
> > This is done to save time: if the user sets UINT16_MAX there is no point in the
> > loop. I can add the check as a condition of the loop, but then it looks incorrect
> > since we are checking something that can't be changed.
> > What do you think?
> 
> Frankly speaking I see no value in the optimization. It is the control
> path and I'd prefer simpler code here.
> 

O.K., I will add the check to the for statement.

> >>> +		for (i = 0; i < dev->data->nb_rx_queues; i++) {
> >> May I suggest to assign count = 0 to make it a bit easier to read and
> >> more robust against future changes.
> >>
> > You mean add count = 0 to the first part of the loop?
> 
> Yes, right now count initialization is done too far from the line.
> 

Will fix.

> >>> +			if (rte_eth_dev_is_rx_hairpin_queue(dev, i))
> >> The condition should be more tricky if we resetup hairpin queue.
> >> I.e. we should check if i is rx_queue_id and count it anyway.
> >>
> >>> +				count++;
> >>> +		}
> >>> +		if (count > cap.max_nb_queues) {
> >>> +			RTE_ETHDEV_LOG(ERR, "To many Rx hairpin queues
> >> %d",
> >>
> >> I think it would be useful to log max here as well to catch
> >> unset max cases easier.
> >>
> > I'm not sure I understand.
> 
> If the question is about logging, the answer is simple:
> if the user forget to initialize maximum number of hairpin queues
> properly, it will be zero and setup will fail here. So, it would be
> good to log maximum value here just to make it clear which
> limit is exceeded.
> 

Maybe I'm missing something, but the PMD sets the max number of hairpin queues.
In any case I agree we should log both what the user requested and the max
that the PMD reports.

> If the question is about above check, let's consider the case when
> maximum is one and one hairpin queue is already setup, but
> user tries to setup one more. Above loop will count only one since
> hairpin state for current queue is set below. So, the condition will
> allow to setup the second hairpin queue.
> In theory, we could initialize cound=1 to count this one, but
> it would break the case when we call setup once again for the
> queue which is already hairpin. API allows and handles it.
> 

Nice catch. I think the best solution is to compare the count to
cap.max_nb_queues - 1, and even before this comparison, to check whether the
current queue is already a hairpin queue; if so, we can skip this check.
What do you think?

> >>> +			count);
> >>> +			return -EINVAL;
> >>> +		}
> >>> +	}
> >>> +	if (dev->data->dev_started)
> >>> +		return -EBUSY;
> >>> +	rxq = dev->data->rx_queues;
> >>> +	if (rxq[rx_queue_id] != NULL) {
> >>> +		RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops-
> >>> rx_queue_release,
> >>> +					-ENOTSUP);
> >>> +		(*dev->dev_ops->rx_queue_release)(rxq[rx_queue_id]);
> >>> +		rxq[rx_queue_id] = NULL;
> >>> +	}
> >>> +	ret = (*dev->dev_ops->rx_hairpin_queue_setup)(dev, rx_queue_id,
> >>> +						      nb_rx_desc, conf);
> >>> +	if (ret == 0)
> >>> +		dev->data->rx_queue_state[rx_queue_id] =
> >>> +			RTE_ETH_QUEUE_STATE_HAIRPIN;
> >>> +	return eth_err(port_id, ret);
> >>> +}
> >>> +
> >>> +int
> >>>    rte_eth_tx_queue_setup(uint16_t port_id, uint16_t tx_queue_id,
> >>>    		       uint16_t nb_tx_desc, unsigned int socket_id,
> >>>    		       const struct rte_eth_txconf *tx_conf)
> >>> @@ -1878,6 +1979,78 @@ struct rte_eth_dev *
> >>>    		       tx_queue_id, nb_tx_desc, socket_id, &local_conf));
> >>>    }
> >>>
> >>> +int
> >>> +rte_eth_tx_hairpin_queue_setup(uint16_t port_id, uint16_t tx_queue_id,
> >>> +			       uint16_t nb_tx_desc,
> >>> +			       const struct rte_eth_hairpin_conf *conf)
> >> Same notes as for Rx queue above.
> >>
> > O.K. same comments.
> >
> >>> +{
> >>> +	struct rte_eth_dev *dev;
> >>> +	struct rte_eth_hairpin_cap cap;
> >>> +	void **txq;
> >>> +	int i;
> >>> +	int count = 0;
> >>> +	int ret;
> >>> +
> >>> +	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
> >>> +	dev = &rte_eth_devices[port_id];
> >>> +	if (tx_queue_id >= dev->data->nb_tx_queues) {
> >>> +		RTE_ETHDEV_LOG(ERR, "Invalid TX queue_id=%u\n",
> >> tx_queue_id);
> >>> +		return -EINVAL;
> >>> +	}
> >>> +	ret = rte_eth_dev_hairpin_capability_get(port_id, &cap);
> >>> +	if (ret != 0)
> >>> +		return ret;
> >>> +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops-
> >>> tx_hairpin_queue_setup,
> >>> +				-ENOTSUP);
> >>> +	/* Use default specified by driver, if nb_tx_desc is zero */
> >>> +	if (nb_tx_desc == 0)
> >>> +		nb_tx_desc = cap.max_nb_desc;
> >>> +	if (nb_tx_desc > cap.max_nb_desc) {
> >>> +		RTE_ETHDEV_LOG(ERR,
> >>> +			"Invalid value for nb_tx_desc(=%hu), should be: <=
> >> %hu",
> >>> +			nb_tx_desc, cap.max_nb_desc);
> >>> +		return -EINVAL;
> >>> +	}
> >>> +	if (conf->peer_count > cap.max_tx_2_rx) {
> >>> +		RTE_ETHDEV_LOG(ERR,
> >>> +			"Invalid value for number of peers for Tx queue(=%hu),
> >> should be: <= %hu",
> >>> +			conf->peer_count, cap.max_tx_2_rx);
> >>> +		return -EINVAL;
> >>> +	}
> >>> +	if (conf->peer_count == 0) {
> >>> +		RTE_ETHDEV_LOG(ERR,
> >>> +			"Invalid value for number of peers for Tx queue(=%hu),
> >> should be: > 0",
> >>> +			conf->peer_count);
> >>> +		return -EINVAL;
> >>> +	}
> >>> +	if (cap.max_nb_queues != UINT16_MAX) {
> >>> +		for (i = 0; i < dev->data->nb_tx_queues; i++) {
> >>> +			if (rte_eth_dev_is_tx_hairpin_queue(dev, i))
> >>> +				count++;
> >>> +		}
> >>> +		if (count > cap.max_nb_queues) {
> >>> +			RTE_ETHDEV_LOG(ERR,
> >>> +				       "To many Tx hairpin queues %d", count);
> >>> +			return -EINVAL;
> >>> +		}
> >>> +	}
> >>> +	if (dev->data->dev_started)
> >>> +		return -EBUSY;
> >>> +	txq = dev->data->tx_queues;
> >>> +	if (txq[tx_queue_id] != NULL) {
> >>> +		RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops-
> >>> tx_queue_release,
> >>> +					-ENOTSUP);
> >>> +		(*dev->dev_ops->tx_queue_release)(txq[tx_queue_id]);
> >>> +		txq[tx_queue_id] = NULL;
> >>> +	}
> >>> +	ret = (*dev->dev_ops->tx_hairpin_queue_setup)
> >>> +		(dev, tx_queue_id, nb_tx_desc, conf);
> >>> +	if (ret == 0)
> >>> +		dev->data->tx_queue_state[tx_queue_id] =
> >>> +			RTE_ETH_QUEUE_STATE_HAIRPIN;
> >>> +	return eth_err(port_id, ret);
> >>> +}
> >>> +
> >>>    void
> >>>    rte_eth_tx_buffer_drop_callback(struct rte_mbuf **pkts, uint16_t
> unsent,
> >>>    		void *userdata __rte_unused)
> >>> @@ -4007,12 +4180,19 @@ int rte_eth_set_queue_rate_limit(uint16_t
> >> port_id, uint16_t queue_idx,
> >>>    	rte_errno = ENOTSUP;
> >>>    	return NULL;
> >>>    #endif
> >>> +	struct rte_eth_dev *dev;
> >>> +
> >>>    	/* check input parameters */
> >>>    	if (!rte_eth_dev_is_valid_port(port_id) || fn == NULL ||
> >>>    		    queue_id >= rte_eth_devices[port_id].data->nb_rx_queues) {
> >>>    		rte_errno = EINVAL;
> >>>    		return NULL;
> >>>    	}
> >>> +	dev = &rte_eth_devices[port_id];
> >>> +	if (rte_eth_dev_is_rx_hairpin_queue(dev, queue_id)) {
> >>> +		rte_errno = EINVAL;
> >>> +		return NULL;
> >>> +	}
> >>>    	struct rte_eth_rxtx_callback *cb = rte_zmalloc(NULL, sizeof(*cb), 0);
> >>>
> >>>    	if (cb == NULL) {
> >>> @@ -4084,6 +4264,8 @@ int rte_eth_set_queue_rate_limit(uint16_t
> port_id,
> >> uint16_t queue_idx,
> >>>    	rte_errno = ENOTSUP;
> >>>    	return NULL;
> >>>    #endif
> >>> +	struct rte_eth_dev *dev;
> >>> +
> >>>    	/* check input parameters */
> >>>    	if (!rte_eth_dev_is_valid_port(port_id) || fn == NULL ||
> >>>    		    queue_id >= rte_eth_devices[port_id].data->nb_tx_queues) {
> >>> @@ -4091,6 +4273,12 @@ int rte_eth_set_queue_rate_limit(uint16_t
> >> port_id, uint16_t queue_idx,
> >>>    		return NULL;
> >>>    	}
> >>>
> >>> +	dev = &rte_eth_devices[port_id];
> >>> +	if (rte_eth_dev_is_tx_hairpin_queue(dev, queue_id)) {
> >>> +		rte_errno = EINVAL;
> >>> +		return NULL;
> >>> +	}
> >>> +
> >>>    	struct rte_eth_rxtx_callback *cb = rte_zmalloc(NULL, sizeof(*cb), 0);
> >>>
> >>>    	if (cb == NULL) {
> >>> @@ -4204,6 +4392,13 @@ int rte_eth_set_queue_rate_limit(uint16_t
> >> port_id, uint16_t queue_idx,
> >>>    		return -EINVAL;
> >>>    	}
> >>>
> >>> +	if (rte_eth_dev_is_rx_hairpin_queue(dev, queue_id)) {
> >>> +		RTE_ETHDEV_LOG(INFO,
> >>> +			"Can't get queue info for Rx queue %"PRIu16" of device
> >> with port_id=%"PRIu16" is hairpin queue\n",
> >>
> >> "queue" is repeated 3 times above ;) I'm afraid it is too much, may be:
> >> "Can't get hairpin Rx queue %" PRIu16 " port %" PRIu16 " info\n"
> >> or
> >> "Can't get hairpin Rx queue %" PRIu16 " info of device with port_id=%"
> >> PRIu16 "\n"
> >> Anyway up to you.
> >>
> > O.K.  I will update.
> >
> >>> +			queue_id, port_id);
> >>> +		return -EINVAL;
> >>> +	}
> >>> +
> >>>    	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rxq_info_get, -
> >> ENOTSUP);
> >>>    	memset(qinfo, 0, sizeof(*qinfo));
> >>> @@ -4228,6 +4423,13 @@ int rte_eth_set_queue_rate_limit(uint16_t
> >> port_id, uint16_t queue_idx,
> >>>    		return -EINVAL;
> >>>    	}
> >>>
> >>> +	if (rte_eth_dev_is_tx_hairpin_queue(dev, queue_id)) {
> >>> +		RTE_ETHDEV_LOG(INFO,
> >>> +			"Can't get queue info for Tx queue %"PRIu16" of device
> >> with port_id=%"PRIu16" is hairpin queue\n",
> >>
> >> Same
> >>
> > Same.
> >
> >>> +			queue_id, port_id);
> >>> +		return -EINVAL;
> >>> +	}
> >>> +
> >>>    	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->txq_info_get, -
> >> ENOTSUP);
> >>>    	memset(qinfo, 0, sizeof(*qinfo));
> >>> @@ -4600,6 +4802,21 @@ int rte_eth_set_queue_rate_limit(uint16_t
> >> port_id, uint16_t queue_idx,
> >>>    }
> >>>
> >>>    int
> >>> +rte_eth_dev_hairpin_capability_get(uint16_t port_id,
> >>> +				   struct rte_eth_hairpin_cap *cap)
> >>> +{
> >>> +	struct rte_eth_dev *dev;
> >>> +
> >>> +	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
> >>> +
> >>> +	dev = &rte_eth_devices[port_id];
> >>> +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->hairpin_cap_get,
> >>> +				-ENOTSUP);
> >> Please, move -ENOTSUP to the previous line since line length is sufficient
> >> and make it similar to port_id check above.
> >>
> > Last time I check it didn't have room, I will check again.
> >
> >>> +	memset(cap, 0, sizeof(*cap));
> >>> +	return eth_err(port_id, (*dev->dev_ops->hairpin_cap_get)(dev, cap));
> >>> +}
> >>> +
> >>> +int
> >>>    rte_eth_dev_pool_ops_supported(uint16_t port_id, const char *pool)
> >>>    {
> >>>    	struct rte_eth_dev *dev;
> >>> diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
> >>> index 9e1f9ae..9b69255 100644
> >>> --- a/lib/librte_ethdev/rte_ethdev.h
> >>> +++ b/lib/librte_ethdev/rte_ethdev.h
> >>> @@ -839,6 +839,46 @@ struct rte_eth_txconf {
> >>>    };
> >>>
> >>>    /**
> >>> + * @warning
> >>> + * @b EXPERIMENTAL: this API may change, or be removed, without prior
> >> notice
> >>> + *
> >>> + * A structure used to return the hairpin capabilities that are supported.
> >>> + */
> >>> +struct rte_eth_hairpin_cap {
> >>> +	/** The max number of hairpin queues (different bindings). */
> >>> +	uint16_t max_nb_queues;
> >>> +	/**< Max number of Rx queues to be connected to one Tx queue. */
> >> Should be /**
> >>
> > Will fix.
> >
> >>> +	uint16_t max_rx_2_tx;
> >>> +	/**< Max number of Tx queues to be connected to one Rx queue. */
> >> Should be /**
> >>
> > Will fix.
> >
> >> [snip]
> >
> > Thanks,
> > Ori

Thanks,
Ori

^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [dpdk-dev] [PATCH v6 02/14] ethdev: add support for hairpin queue
  2019-10-29 19:39           ` Ori Kam
@ 2019-10-30  6:39             ` Andrew Rybchenko
  2019-10-30  6:56               ` Ori Kam
  0 siblings, 1 reply; 186+ messages in thread
From: Andrew Rybchenko @ 2019-10-30  6:39 UTC (permalink / raw)
  To: Ori Kam, John McNamara, Marko Kovacevic, Thomas Monjalon, Ferruh Yigit
  Cc: dev, jingjing.wu, stephen

Hi Ori,

On 10/29/19 10:39 PM, Ori Kam wrote:
>> On 10/28/19 9:44 PM, Ori Kam wrote:
>>>> On 10/27/19 3:24 PM, Ori Kam wrote:
>>>>> +			if (rte_eth_dev_is_rx_hairpin_queue(dev, i))
>>>> The condition should be more tricky if we resetup hairpin queue.
>>>> I.e. we should check if i is rx_queue_id and count it anyway.
>>>>
>>>>> +				count++;
>>>>> +		}
>>>>> +		if (count > cap.max_nb_queues) {
>>>>> +			RTE_ETHDEV_LOG(ERR, "To many Rx hairpin queues
>>>> %d",
>>>>
>>>> I think it would be useful to log max here as well to catch
>>>> unset max cases easier.
>>>>
>>> I'm not sure I understand.
>> If the question is about logging, the answer is simple:
>> if the user forget to initialize maximum number of hairpin queues
>> properly, it will be zero and setup will fail here. So, it would be
>> good to log maximum value here just to make it clear which
>> limit is exceeded.
>>
> Maybe I'm missing something but the PMD sets the max number of hairpin queues.

Yes, it is just my paranoia, to simplify debugging in case a PMD
forgets to do it.

> But in any case I agree we should log what the user requested and what is the max
> that the PMD reports.
>
>> If the question is about above check, let's consider the case when
>> maximum is one and one hairpin queue is already setup, but
>> user tries to setup one more. Above loop will count only one since
>> hairpin state for current queue is set below. So, the condition will
>> allow to setup the second hairpin queue.
>> In theory, we could initialize cound=1 to count this one, but
>> it would break the case when we call setup once again for the
>> queue which is already hairpin. API allows and handles it.
>>
> Nice catch. I think the best solution is to compare the count to cap.max_nb_queues - 1.
> and even before this comparison check if the current queue is already hairpin queue if so
> we can skip this check.
> What do you think?

I think the right solution is to always count the current queue, since it
is either becoming hairpin or already hairpin, i.e.

if (i == rx_queue_id || rte_eth_dev_is_rx_hairpin_queue(dev, i))

So the result will always be the total number of hairpin queues whenever
the current one is hairpin.
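
For reference, a sketch of the resulting check with the current queue
always counted (this is the form used by the v7 patch below):

	for (i = 0, count = 0; i < dev->data->nb_rx_queues; i++) {
		if (i == rx_queue_id || rte_eth_dev_is_rx_hairpin_queue(dev, i))
			count++;
	}
	if (count > cap.max_nb_queues)
		return -EINVAL; /* too many hairpin queues */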

Andrew.

^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [dpdk-dev] [PATCH v6 02/14] ethdev: add support for hairpin queue
  2019-10-30  6:39             ` Andrew Rybchenko
@ 2019-10-30  6:56               ` Ori Kam
  0 siblings, 0 replies; 186+ messages in thread
From: Ori Kam @ 2019-10-30  6:56 UTC (permalink / raw)
  To: Andrew Rybchenko, John McNamara, Marko Kovacevic,
	Thomas Monjalon, Ferruh Yigit
  Cc: dev, jingjing.wu, stephen

Hi Andrew,

> -----Original Message-----
> From: Andrew Rybchenko <arybchenko@solarflare.com>
> Sent: Wednesday, October 30, 2019 8:39 AM
> To: Ori Kam <orika@mellanox.com>; John McNamara
> <john.mcnamara@intel.com>; Marko Kovacevic
> <marko.kovacevic@intel.com>; Thomas Monjalon <thomas@monjalon.net>;
> Ferruh Yigit <ferruh.yigit@intel.com>
> Cc: dev@dpdk.org; jingjing.wu@intel.com; stephen@networkplumber.org
> Subject: Re: [dpdk-dev] [PATCH v6 02/14] ethdev: add support for hairpin queue
> 
> Hi Ori,
> 
> On 10/29/19 10:39 PM, Ori Kam wrote:
> >> On 10/28/19 9:44 PM, Ori Kam wrote:
> >>>> On 10/27/19 3:24 PM, Ori Kam wrote:
> >>>>> +			if (rte_eth_dev_is_rx_hairpin_queue(dev, i))
> >>>> The condition should be more tricky if we resetup hairpin queue.
> >>>> I.e. we should check if i is rx_queue_id and count it anyway.
> >>>>
> >>>>> +				count++;
> >>>>> +		}
> >>>>> +		if (count > cap.max_nb_queues) {
> >>>>> +			RTE_ETHDEV_LOG(ERR, "To many Rx hairpin
> queues
> >>>> %d",
> >>>>
> >>>> I think it would be useful to log max here as well to catch
> >>>> unset max cases easier.
> >>>>
> >>> I'm not sure I understand.
> >> If the question is about logging, the answer is simple:
> >> if the user forget to initialize maximum number of hairpin queues
> >> properly, it will be zero and setup will fail here. So, it would be
> >> good to log maximum value here just to make it clear which
> >> limit is exceeded.
> >>
> > Maybe I'm missing something but the PMD sets the max number of hairpin
> queues.
> 
> Yes, it is just my paranoia to simplify debugging the case if PMD
> forgets to do it.
> 

Paranoia is a good thing.
I will add the logging.

> > But in any case I agree we should log what the user requested and what is the
> max
> > that the PMD reports.
> >
> >> If the question is about above check, let's consider the case when
> >> maximum is one and one hairpin queue is already setup, but
> >> user tries to setup one more. Above loop will count only one since
> >> hairpin state for current queue is set below. So, the condition will
> >> allow to setup the second hairpin queue.
> >> In theory, we could initialize cound=1 to count this one, but
> >> it would break the case when we call setup once again for the
> >> queue which is already hairpin. API allows and handles it.
> >>
> > Nice catch. I think the best solution is to compare the count to
> cap.max_nb_queues - 1.
> > and even before this comparison check if the current queue is already hairpin
> queue if so
> > we can skip this check.
> > What do you think?
> 
> I think the right solution is always count current queue since it is either
> becoming hairpin or already hairpin, i.e.
> 
> if (i == rx_queue_id || rte_eth_dev_is_rx_hairpin_queue(dev, i))
> 
> So, the result will be always total number of hairpin queues if
> this one is hairpin.
> 
> Andrew.

Good idea, I will implement it.

Thanks,
Ori

^ permalink raw reply	[flat|nested] 186+ messages in thread

* [dpdk-dev] [PATCH v7 00/14] add hairpin feature
  2019-09-26  6:28 [dpdk-dev] [PATCH 00/13] add hairpin feature Ori Kam
                   ` (18 preceding siblings ...)
  2019-10-27 12:24 ` [dpdk-dev] [PATCH v6 00/14] " Ori Kam
@ 2019-10-30 23:53 ` Ori Kam
  2019-10-30 23:53   ` [dpdk-dev] [PATCH v7 01/14] ethdev: move queue state defines to private file Ori Kam
                     ` (14 more replies)
  19 siblings, 15 replies; 186+ messages in thread
From: Ori Kam @ 2019-10-30 23:53 UTC (permalink / raw)
  Cc: dev, orika, jingjing.wu, stephen, wenzhuo.lu, bernard.iremonger,
	thomas, ferruh.yigit, arybchenko, viacheslavo

This patch set implements the hairpin feature.
The hairpin feature was introduced in RFC [1].

The hairpin feature (a different name could be "forward") acts as a "bump on
the wire", meaning that a packet received from the wire can be modified using
offloaded actions and then sent back to the wire without application
intervention, which saves CPU cycles.

The hairpin is the inverse function of loopback, in which the application
sends a packet and then receives it again, without the packet being sent to
the wire.

The hairpin can be used by a number of different VNFs, for example a load
balancer, a gateway and so on.

As can be seen from the hairpin description, hairpin is basically an RX queue
connected to a TX queue.

During the design phase I was considering two ways to implement this
feature: the first one is adding a new rte_flow action, and the second
one is creating a special kind of queue.

The advantages of using the queue approach:
1. More control for the application, e.g. over the queue depth (the memory
size that should be used).
2. Enables QoS. QoS is normally a parameter of a queue, so in this approach
it will be easy to integrate with such systems.
3. Native integration with the rte_flow API. Just setting the target
queue/RSS to a hairpin queue will result in the traffic being routed
to the hairpin queue.
4. Enables queue offloading.

Each hairpin Rxq can be connected to one Txq or a number of Txqs, which can
belong to different ports, assuming the PMD supports it. The same goes the
other way: each hairpin Txq can be connected to one or more Rxqs.
This is the reason that both the Txq setup and the Rxq setup take the
hairpin configuration structure.

From the PMD perspective the number of Rxqs/Txqs is the total of standard
queues + hairpin queues.

To configure a hairpin queue the user should call
rte_eth_rx_hairpin_queue_setup / rte_eth_tx_hairpin_queue_setup instead
of the normal queue setup functions.

The hairpin queues are not part of the normal RSS functions.

To use the queues the user simply creates a flow that points to RSS/queue
actions that are hairpin queues.
The reasons for selecting 2 new functions for hairpin queue setup are:
1. avoid an API break.
2. avoid extra and unused parameters.
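
For illustration, a minimal application-side sketch that binds one Rx
hairpin queue to one Tx hairpin queue on the same port (the port and queue
indexes are example values, not part of the API):

	struct rte_eth_hairpin_conf conf = { .peer_count = 1 };
	uint16_t port_id = 0;		/* example port */
	uint16_t rx_q = 1, tx_q = 1;	/* example hairpin queue indexes */
	int ret;

	/* Rx hairpin queue peering with the Tx queue; 0 desc = PMD maximum. */
	conf.peers[0].port = port_id;
	conf.peers[0].queue = tx_q;
	ret = rte_eth_rx_hairpin_queue_setup(port_id, rx_q, 0, &conf);

	/* Tx hairpin queue peering back with the Rx queue. */
	conf.peers[0].queue = rx_q;
	if (ret == 0)
		ret = rte_eth_tx_hairpin_queue_setup(port_id, tx_q, 0, &conf);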



[1] https://inbox.dpdk.org/dev/1565703468-55617-1-git-send-email-orika@mellanox.com/

Cc: wenzhuo.lu@intel.com
Cc: bernard.iremonger@intel.com
Cc: thomas@monjalon.net
Cc: ferruh.yigit@intel.com
Cc: arybchenko@solarflare.com
Cc: viacheslavo@mellanox.com

------
V7:
 - all changes are in patch 2: ethdev: add support for hairpin queue
 - Move is_rx/tx_hairpin_queue to ethdev.c and ethdev.h, and remove the inline.
   - change checks for max number of hairpin queues.
   - modify log messages.

V6:
 - add missing include in nfb driver.
 - change the comparison of rte_eth_dev_is_tx_hairpin_queue /
   rte_eth_dev_is_rx_hairpin_queue to a boolean operator.
 - split the doc patch to the relevant patches.

V5:
 - modify log messages to be more distinct.
 - set that log message will be in the same line even if > 80.
 - change peer_n to peer_count.
 - add functions to get if queue is hairpin queue.

V4:
 - update according to comments from ML.

V3:
 - update according to comments from ML.

V2:
 - update according to comments from ML.




Ori Kam (14):
  ethdev: move queue state defines to private file
  ethdev: add support for hairpin queue
  net/mlx5: query hca hairpin capabilities
  net/mlx5: support Rx hairpin queues
  net/mlx5: prepare txq to work with different types
  net/mlx5: support Tx hairpin queues
  net/mlx5: add get hairpin capabilities
  app/testpmd: add hairpin support
  net/mlx5: add hairpin binding function
  net/mlx5: add support for hairpin hrxq
  net/mlx5: add internal tag item and action
  net/mlx5: add id generation function
  net/mlx5: add default flows for hairpin
  net/mlx5: split hairpin flows

 app/test-pmd/parameters.c                |  28 +++
 app/test-pmd/testpmd.c                   | 109 ++++++++-
 app/test-pmd/testpmd.h                   |   3 +
 doc/guides/rel_notes/release_19_11.rst   |   6 +
 drivers/net/mlx5/mlx5.c                  | 170 ++++++++++++-
 drivers/net/mlx5/mlx5.h                  |  69 +++++-
 drivers/net/mlx5/mlx5_devx_cmds.c        | 194 +++++++++++++++
 drivers/net/mlx5/mlx5_ethdev.c           | 129 ++++++++--
 drivers/net/mlx5/mlx5_flow.c             | 393 ++++++++++++++++++++++++++++++-
 drivers/net/mlx5/mlx5_flow.h             |  67 +++++-
 drivers/net/mlx5/mlx5_flow_dv.c          | 231 +++++++++++++++++-
 drivers/net/mlx5/mlx5_flow_verbs.c       |  11 +-
 drivers/net/mlx5/mlx5_prm.h              | 127 +++++++++-
 drivers/net/mlx5/mlx5_rss.c              |   1 +
 drivers/net/mlx5/mlx5_rxq.c              | 318 ++++++++++++++++++++++---
 drivers/net/mlx5/mlx5_rxtx.c             |   2 +-
 drivers/net/mlx5/mlx5_rxtx.h             |  68 +++++-
 drivers/net/mlx5/mlx5_trigger.c          | 140 ++++++++++-
 drivers/net/mlx5/mlx5_txq.c              | 294 +++++++++++++++++++----
 drivers/net/nfb/nfb_tx.h                 |   1 +
 lib/librte_ethdev/rte_ethdev.c           | 232 ++++++++++++++++++
 lib/librte_ethdev/rte_ethdev.h           | 177 +++++++++++++-
 lib/librte_ethdev/rte_ethdev_core.h      |  91 ++++++-
 lib/librte_ethdev/rte_ethdev_driver.h    |   7 +
 lib/librte_ethdev/rte_ethdev_version.map |   3 +
 25 files changed, 2704 insertions(+), 167 deletions(-)

-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 186+ messages in thread

* [dpdk-dev] [PATCH v7 01/14] ethdev: move queue state defines to private file
  2019-10-30 23:53 ` [dpdk-dev] [PATCH v7 00/14] add hairpin feature Ori Kam
@ 2019-10-30 23:53   ` Ori Kam
  2019-10-30 23:53   ` [dpdk-dev] [PATCH v7 02/14] ethdev: add support for hairpin queue Ori Kam
                     ` (13 subsequent siblings)
  14 siblings, 0 replies; 186+ messages in thread
From: Ori Kam @ 2019-10-30 23:53 UTC (permalink / raw)
  To: Rastislav Cernay, Jan Remes, Thomas Monjalon, Ferruh Yigit,
	Andrew Rybchenko
  Cc: dev, orika, jingjing.wu, stephen

The queue state defines are internal to the DPDK.
This commit moves them to a private header file.

Signed-off-by: Ori Kam <orika@mellanox.com>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
---
 drivers/net/nfb/nfb_tx.h              | 1 +
 lib/librte_ethdev/rte_ethdev.h        | 6 ------
 lib/librte_ethdev/rte_ethdev_driver.h | 6 ++++++
 3 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/drivers/net/nfb/nfb_tx.h b/drivers/net/nfb/nfb_tx.h
index edf5ede..b6578cc 100644
--- a/drivers/net/nfb/nfb_tx.h
+++ b/drivers/net/nfb/nfb_tx.h
@@ -10,6 +10,7 @@
 #include <nfb/nfb.h>
 #include <nfb/ndp.h>
 
+#include <rte_ethdev_driver.h>
 #include <rte_ethdev.h>
 #include <rte_malloc.h>
 
diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
index c36c1b6..9e1f9ae 100644
--- a/lib/librte_ethdev/rte_ethdev.h
+++ b/lib/librte_ethdev/rte_ethdev.h
@@ -1336,12 +1336,6 @@ struct rte_eth_dcb_info {
 	struct rte_eth_dcb_tc_queue_mapping tc_queue;
 };
 
-/**
- * RX/TX queue states
- */
-#define RTE_ETH_QUEUE_STATE_STOPPED 0
-#define RTE_ETH_QUEUE_STATE_STARTED 1
-
 #define RTE_ETH_ALL RTE_MAX_ETHPORTS
 
 /* Macros to check for valid port */
diff --git a/lib/librte_ethdev/rte_ethdev_driver.h b/lib/librte_ethdev/rte_ethdev_driver.h
index 936ff8c..c404f17 100644
--- a/lib/librte_ethdev/rte_ethdev_driver.h
+++ b/lib/librte_ethdev/rte_ethdev_driver.h
@@ -22,6 +22,12 @@
 #endif
 
 /**
+ * RX/TX queue states
+ */
+#define RTE_ETH_QUEUE_STATE_STOPPED 0
+#define RTE_ETH_QUEUE_STATE_STARTED 1
+
+/**
  * @internal
  * Returns a ethdev slot specified by the unique identifier name.
  *
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 186+ messages in thread

* [dpdk-dev] [PATCH v7 02/14] ethdev: add support for hairpin queue
  2019-10-30 23:53 ` [dpdk-dev] [PATCH v7 00/14] add hairpin feature Ori Kam
  2019-10-30 23:53   ` [dpdk-dev] [PATCH v7 01/14] ethdev: move queue state defines to private file Ori Kam
@ 2019-10-30 23:53   ` Ori Kam
  2019-10-31  8:25     ` Andrew Rybchenko
  2019-11-05 11:24     ` Ferruh Yigit
  2019-10-30 23:53   ` [dpdk-dev] [PATCH v7 03/14] net/mlx5: query hca hairpin capabilities Ori Kam
                     ` (12 subsequent siblings)
  14 siblings, 2 replies; 186+ messages in thread
From: Ori Kam @ 2019-10-30 23:53 UTC (permalink / raw)
  To: John McNamara, Marko Kovacevic, Thomas Monjalon, Ferruh Yigit,
	Andrew Rybchenko
  Cc: dev, orika, jingjing.wu, stephen

This commit introduces the hairpin queue type.

A hairpin queue is built from an Rx queue bound to a Tx queue.
It is used to offload traffic coming from the wire and redirect it back
to the wire.

There are 3 new functions:
- rte_eth_dev_hairpin_capability_get
- rte_eth_rx_hairpin_queue_setup
- rte_eth_tx_hairpin_queue_setup

In order to use the queue, there is a need to create rte_flow
with queue / RSS action that targets one or more of the Rx queues.
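
For illustration, a sketch of such a flow: match all ingress Ethernet
traffic and redirect it to hairpin Rx queue 1 (the pattern and queue index
are example choices):

	uint16_t port_id = 0;				/* example port */
	struct rte_flow_attr attr = { .ingress = 1 };
	struct rte_flow_action_queue queue = { .index = 1 }; /* hairpin Rxq */
	struct rte_flow_item pattern[] = {
		{ .type = RTE_FLOW_ITEM_TYPE_ETH },
		{ .type = RTE_FLOW_ITEM_TYPE_END },
	};
	struct rte_flow_action actions[] = {
		{ .type = RTE_FLOW_ACTION_TYPE_QUEUE, .conf = &queue },
		{ .type = RTE_FLOW_ACTION_TYPE_END },
	};
	struct rte_flow_error error;
	struct rte_flow *flow = rte_flow_create(port_id, &attr, pattern,
						actions, &error);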

Signed-off-by: Ori Kam <orika@mellanox.com>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
---
V7:
 - Move is_rx/tx_hairpin_queue to ethdev.c and ethdev.h, and remove the inline.
 - change checks for max number of hairpin queues.
 - modify log messages.

V6:
 - change comparison of rte_eth_dev_is_rx/tx_hairpin_queue to boolean.
 - add hairpin to release note.

V5:
 - add function to check if queue is hairpin queue.
 - modify log messages to be more distinct.
 - update log messages to be only on one line.
 - change peer_n to peer_count.

V4:
 - update according to ML comments.

V3:
 - update according to ML comments.

V2:
 - update according to ML comments
---
 doc/guides/rel_notes/release_19_11.rst   |   5 +
 lib/librte_ethdev/rte_ethdev.c           | 232 +++++++++++++++++++++++++++++++
 lib/librte_ethdev/rte_ethdev.h           | 171 ++++++++++++++++++++++-
 lib/librte_ethdev/rte_ethdev_core.h      |  91 +++++++++++-
 lib/librte_ethdev/rte_ethdev_driver.h    |   1 +
 lib/librte_ethdev/rte_ethdev_version.map |   3 +
 6 files changed, 495 insertions(+), 8 deletions(-)

diff --git a/doc/guides/rel_notes/release_19_11.rst b/doc/guides/rel_notes/release_19_11.rst
index ae8e7b2..6871453 100644
--- a/doc/guides/rel_notes/release_19_11.rst
+++ b/doc/guides/rel_notes/release_19_11.rst
@@ -231,6 +231,11 @@ New Features
   * Added a console command to testpmd app, ``show port (port_id) ptypes`` which
     gives ability to print port supported ptypes in different protocol layers.
 
+* **Added hairpin queue.**
+
+  On supported NICs, we can now set up a hairpin queue, which will offload
+  packets from the wire, back to the wire.
+
 
 Removed Items
 -------------
diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
index 7743205..4c6725f 100644
--- a/lib/librte_ethdev/rte_ethdev.c
+++ b/lib/librte_ethdev/rte_ethdev.c
@@ -923,6 +923,13 @@ struct rte_eth_dev *
 
 	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_start, -ENOTSUP);
 
+	if (rte_eth_dev_is_rx_hairpin_queue(dev, rx_queue_id)) {
+		RTE_ETHDEV_LOG(INFO,
+			"Can't start Rx hairpin queue %"PRIu16" of device with port_id=%"PRIu16"\n",
+			rx_queue_id, port_id);
+		return -EINVAL;
+	}
+
 	if (dev->data->rx_queue_state[rx_queue_id] != RTE_ETH_QUEUE_STATE_STOPPED) {
 		RTE_ETHDEV_LOG(INFO,
 			"Queue %"PRIu16" of device with port_id=%"PRIu16" already started\n",
@@ -950,6 +957,13 @@ struct rte_eth_dev *
 
 	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_stop, -ENOTSUP);
 
+	if (rte_eth_dev_is_rx_hairpin_queue(dev, rx_queue_id)) {
+		RTE_ETHDEV_LOG(INFO,
+			"Can't stop Rx hairpin queue %"PRIu16" of device with port_id=%"PRIu16"\n",
+			rx_queue_id, port_id);
+		return -EINVAL;
+	}
+
 	if (dev->data->rx_queue_state[rx_queue_id] == RTE_ETH_QUEUE_STATE_STOPPED) {
 		RTE_ETHDEV_LOG(INFO,
 			"Queue %"PRIu16" of device with port_id=%"PRIu16" already stopped\n",
@@ -983,6 +997,13 @@ struct rte_eth_dev *
 
 	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_queue_start, -ENOTSUP);
 
+	if (rte_eth_dev_is_tx_hairpin_queue(dev, tx_queue_id)) {
+		RTE_ETHDEV_LOG(INFO,
+			"Can't start Tx hairpin queue %"PRIu16" of device with port_id=%"PRIu16"\n",
+			tx_queue_id, port_id);
+		return -EINVAL;
+	}
+
 	if (dev->data->tx_queue_state[tx_queue_id] != RTE_ETH_QUEUE_STATE_STOPPED) {
 		RTE_ETHDEV_LOG(INFO,
 			"Queue %"PRIu16" of device with port_id=%"PRIu16" already started\n",
@@ -1008,6 +1029,13 @@ struct rte_eth_dev *
 
 	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_queue_stop, -ENOTSUP);
 
+	if (rte_eth_dev_is_tx_hairpin_queue(dev, tx_queue_id)) {
+		RTE_ETHDEV_LOG(INFO,
+			"Can't stop Tx hairpin queue %"PRIu16" of device with port_id=%"PRIu16"\n",
+			tx_queue_id, port_id);
+		return -EINVAL;
+	}
+
 	if (dev->data->tx_queue_state[tx_queue_id] == RTE_ETH_QUEUE_STATE_STOPPED) {
 		RTE_ETHDEV_LOG(INFO,
 			"Queue %"PRIu16" of device with port_id=%"PRIu16" already stopped\n",
@@ -1780,6 +1808,78 @@ struct rte_eth_dev *
 }
 
 int
+rte_eth_rx_hairpin_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
+			       uint16_t nb_rx_desc,
+			       const struct rte_eth_hairpin_conf *conf)
+{
+	int ret;
+	struct rte_eth_dev *dev;
+	struct rte_eth_hairpin_cap cap;
+	void **rxq;
+	int i;
+	int count;
+
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
+
+	dev = &rte_eth_devices[port_id];
+	if (rx_queue_id >= dev->data->nb_rx_queues) {
+		RTE_ETHDEV_LOG(ERR, "Invalid RX queue_id=%u\n", rx_queue_id);
+		return -EINVAL;
+	}
+	ret = rte_eth_dev_hairpin_capability_get(port_id, &cap);
+	if (ret != 0)
+		return ret;
+	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_hairpin_queue_setup,
+				-ENOTSUP);
+	/* if nb_rx_desc is zero use max number of desc from the driver. */
+	if (nb_rx_desc == 0)
+		nb_rx_desc = cap.max_nb_desc;
+	if (nb_rx_desc > cap.max_nb_desc) {
+		RTE_ETHDEV_LOG(ERR,
+			"Invalid value for nb_rx_desc(=%hu), should be: <= %hu",
+			nb_rx_desc, cap.max_nb_desc);
+		return -EINVAL;
+	}
+	if (conf->peer_count > cap.max_rx_2_tx) {
+		RTE_ETHDEV_LOG(ERR,
+			"Invalid value for number of peers for Rx queue(=%hu), should be: <= %hu",
+			conf->peer_count, cap.max_rx_2_tx);
+		return -EINVAL;
+	}
+	if (conf->peer_count == 0) {
+		RTE_ETHDEV_LOG(ERR,
+			"Invalid value for number of peers for Rx queue(=%hu), should be: > 0",
+			conf->peer_count);
+		return -EINVAL;
+	}
+	for (i = 0, count = 0; i < dev->data->nb_rx_queues &&
+	     cap.max_nb_queues != UINT16_MAX; i++) {
+		if (i == rx_queue_id || rte_eth_dev_is_rx_hairpin_queue(dev, i))
+			count++;
+	}
+	if (count > cap.max_nb_queues) {
+		RTE_ETHDEV_LOG(ERR, "To many Rx hairpin queues max is %d",
+		cap.max_nb_queues);
+		return -EINVAL;
+	}
+	if (dev->data->dev_started)
+		return -EBUSY;
+	rxq = dev->data->rx_queues;
+	if (rxq[rx_queue_id] != NULL) {
+		RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_release,
+					-ENOTSUP);
+		(*dev->dev_ops->rx_queue_release)(rxq[rx_queue_id]);
+		rxq[rx_queue_id] = NULL;
+	}
+	ret = (*dev->dev_ops->rx_hairpin_queue_setup)(dev, rx_queue_id,
+						      nb_rx_desc, conf);
+	if (ret == 0)
+		dev->data->rx_queue_state[rx_queue_id] =
+			RTE_ETH_QUEUE_STATE_HAIRPIN;
+	return eth_err(port_id, ret);
+}
+
+int
 rte_eth_tx_queue_setup(uint16_t port_id, uint16_t tx_queue_id,
 		       uint16_t nb_tx_desc, unsigned int socket_id,
 		       const struct rte_eth_txconf *tx_conf)
@@ -1878,6 +1978,77 @@ struct rte_eth_dev *
 		       tx_queue_id, nb_tx_desc, socket_id, &local_conf));
 }
 
+int
+rte_eth_tx_hairpin_queue_setup(uint16_t port_id, uint16_t tx_queue_id,
+			       uint16_t nb_tx_desc,
+			       const struct rte_eth_hairpin_conf *conf)
+{
+	struct rte_eth_dev *dev;
+	struct rte_eth_hairpin_cap cap;
+	void **txq;
+	int i;
+	int count;
+	int ret;
+
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
+	dev = &rte_eth_devices[port_id];
+	if (tx_queue_id >= dev->data->nb_tx_queues) {
+		RTE_ETHDEV_LOG(ERR, "Invalid TX queue_id=%u\n", tx_queue_id);
+		return -EINVAL;
+	}
+	ret = rte_eth_dev_hairpin_capability_get(port_id, &cap);
+	if (ret != 0)
+		return ret;
+	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_hairpin_queue_setup,
+				-ENOTSUP);
+	/* if nb_tx_desc is zero use max number of desc from the driver. */
+	if (nb_tx_desc == 0)
+		nb_tx_desc = cap.max_nb_desc;
+	if (nb_tx_desc > cap.max_nb_desc) {
+		RTE_ETHDEV_LOG(ERR,
+			"Invalid value for nb_tx_desc(=%hu), should be: <= %hu",
+			nb_tx_desc, cap.max_nb_desc);
+		return -EINVAL;
+	}
+	if (conf->peer_count > cap.max_tx_2_rx) {
+		RTE_ETHDEV_LOG(ERR,
+			"Invalid value for number of peers for Tx queue(=%hu), should be: <= %hu",
+			conf->peer_count, cap.max_tx_2_rx);
+		return -EINVAL;
+	}
+	if (conf->peer_count == 0) {
+		RTE_ETHDEV_LOG(ERR,
+			"Invalid value for number of peers for Tx queue(=%hu), should be: > 0",
+			conf->peer_count);
+		return -EINVAL;
+	}
+	for (i = 0, count = 0; i < dev->data->nb_tx_queues &&
+	     cap.max_nb_queues != UINT16_MAX; i++) {
+		if (i == tx_queue_id || rte_eth_dev_is_tx_hairpin_queue(dev, i))
+			count++;
+	}
+	if (count > cap.max_nb_queues) {
+		RTE_ETHDEV_LOG(ERR, "To many Tx hairpin queues max is %d",
+		cap.max_nb_queues);
+		return -EINVAL;
+	}
+	if (dev->data->dev_started)
+		return -EBUSY;
+	txq = dev->data->tx_queues;
+	if (txq[tx_queue_id] != NULL) {
+		RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_queue_release,
+					-ENOTSUP);
+		(*dev->dev_ops->tx_queue_release)(txq[tx_queue_id]);
+		txq[tx_queue_id] = NULL;
+	}
+	ret = (*dev->dev_ops->tx_hairpin_queue_setup)
+		(dev, tx_queue_id, nb_tx_desc, conf);
+	if (ret == 0)
+		dev->data->tx_queue_state[tx_queue_id] =
+			RTE_ETH_QUEUE_STATE_HAIRPIN;
+	return eth_err(port_id, ret);
+}
+
 void
 rte_eth_tx_buffer_drop_callback(struct rte_mbuf **pkts, uint16_t unsent,
 		void *userdata __rte_unused)
@@ -4007,12 +4178,19 @@ int rte_eth_set_queue_rate_limit(uint16_t port_id, uint16_t queue_idx,
 	rte_errno = ENOTSUP;
 	return NULL;
 #endif
+	struct rte_eth_dev *dev;
+
 	/* check input parameters */
 	if (!rte_eth_dev_is_valid_port(port_id) || fn == NULL ||
 		    queue_id >= rte_eth_devices[port_id].data->nb_rx_queues) {
 		rte_errno = EINVAL;
 		return NULL;
 	}
+	dev = &rte_eth_devices[port_id];
+	if (rte_eth_dev_is_rx_hairpin_queue(dev, queue_id)) {
+		rte_errno = EINVAL;
+		return NULL;
+	}
 	struct rte_eth_rxtx_callback *cb = rte_zmalloc(NULL, sizeof(*cb), 0);
 
 	if (cb == NULL) {
@@ -4084,6 +4262,8 @@ int rte_eth_set_queue_rate_limit(uint16_t port_id, uint16_t queue_idx,
 	rte_errno = ENOTSUP;
 	return NULL;
 #endif
+	struct rte_eth_dev *dev;
+
 	/* check input parameters */
 	if (!rte_eth_dev_is_valid_port(port_id) || fn == NULL ||
 		    queue_id >= rte_eth_devices[port_id].data->nb_tx_queues) {
@@ -4091,6 +4271,12 @@ int rte_eth_set_queue_rate_limit(uint16_t port_id, uint16_t queue_idx,
 		return NULL;
 	}
 
+	dev = &rte_eth_devices[port_id];
+	if (rte_eth_dev_is_tx_hairpin_queue(dev, queue_id)) {
+		rte_errno = EINVAL;
+		return NULL;
+	}
+
 	struct rte_eth_rxtx_callback *cb = rte_zmalloc(NULL, sizeof(*cb), 0);
 
 	if (cb == NULL) {
@@ -4204,6 +4390,13 @@ int rte_eth_set_queue_rate_limit(uint16_t port_id, uint16_t queue_idx,
 		return -EINVAL;
 	}
 
+	if (rte_eth_dev_is_rx_hairpin_queue(dev, queue_id)) {
+		RTE_ETHDEV_LOG(INFO,
+			"Can't get hairpin Rx queue %"PRIu16" info of device with port_id=%"PRIu16"\n",
+			queue_id, port_id);
+		return -EINVAL;
+	}
+
 	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rxq_info_get, -ENOTSUP);
 
 	memset(qinfo, 0, sizeof(*qinfo));
@@ -4228,6 +4421,13 @@ int rte_eth_set_queue_rate_limit(uint16_t port_id, uint16_t queue_idx,
 		return -EINVAL;
 	}
 
+	if (rte_eth_dev_is_tx_hairpin_queue(dev, queue_id)) {
+		RTE_ETHDEV_LOG(INFO,
+			"Can't get hairpin Tx queue %"PRIu16" info of device with port_id=%"PRIu16"\n",
+			queue_id, port_id);
+		return -EINVAL;
+	}
+
 	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->txq_info_get, -ENOTSUP);
 
 	memset(qinfo, 0, sizeof(*qinfo));
@@ -4600,6 +4800,38 @@ int rte_eth_set_queue_rate_limit(uint16_t port_id, uint16_t queue_idx,
 }
 
 int
+rte_eth_dev_hairpin_capability_get(uint16_t port_id,
+				   struct rte_eth_hairpin_cap *cap)
+{
+	struct rte_eth_dev *dev;
+
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
+
+	dev = &rte_eth_devices[port_id];
+	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->hairpin_cap_get, -ENOTSUP);
+	memset(cap, 0, sizeof(*cap));
+	return eth_err(port_id, (*dev->dev_ops->hairpin_cap_get)(dev, cap));
+}
+
+int
+rte_eth_dev_is_rx_hairpin_queue(struct rte_eth_dev *dev, uint16_t queue_id)
+{
+	if (dev->data->rx_queue_state[queue_id] ==
+	    RTE_ETH_QUEUE_STATE_HAIRPIN)
+		return 1;
+	return 0;
+}
+
+int
+rte_eth_dev_is_tx_hairpin_queue(struct rte_eth_dev *dev, uint16_t queue_id)
+{
+	if (dev->data->tx_queue_state[queue_id] ==
+	    RTE_ETH_QUEUE_STATE_HAIRPIN)
+		return 1;
+	return 0;
+}
+
+int
 rte_eth_dev_pool_ops_supported(uint16_t port_id, const char *pool)
 {
 	struct rte_eth_dev *dev;
diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
index 9e1f9ae..d2dc8ab 100644
--- a/lib/librte_ethdev/rte_ethdev.h
+++ b/lib/librte_ethdev/rte_ethdev.h
@@ -839,6 +839,46 @@ struct rte_eth_txconf {
 };
 
 /**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * A structure used to return the hairpin capabilities that are supported.
+ */
+struct rte_eth_hairpin_cap {
+	/** The max number of hairpin queues (different bindings). */
+	uint16_t max_nb_queues;
+	/** Max number of Rx queues to be connected to one Tx queue. */
+	uint16_t max_rx_2_tx;
+	/** Max number of Tx queues to be connected to one Rx queue. */
+	uint16_t max_tx_2_rx;
+	uint16_t max_nb_desc; /**< The max num of descriptors. */
+};
+
+#define RTE_ETH_MAX_HAIRPIN_PEERS 32
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * A structure used to hold hairpin peer data.
+ */
+struct rte_eth_hairpin_peer {
+	uint16_t port; /**< Peer port. */
+	uint16_t queue; /**< Peer queue. */
+};
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * A structure used to configure hairpin binding.
+ */
+struct rte_eth_hairpin_conf {
+	uint16_t peer_count; /**< The number of peers. */
+	struct rte_eth_hairpin_peer peers[RTE_ETH_MAX_HAIRPIN_PEERS];
+};
+
+/**
  * A structure contains information about HW descriptor ring limitations.
  */
 struct rte_eth_desc_lim {
@@ -1829,6 +1869,37 @@ int rte_eth_rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
 		struct rte_mempool *mb_pool);
 
 /**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Allocate and set up a hairpin receive queue for an Ethernet device.
+ *
+ * The function sets up the selected queue to be used in hairpin.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param rx_queue_id
+ *   The index of the receive queue to set up.
+ *   The value must be in the range [0, nb_rx_queue - 1] previously supplied
+ *   to rte_eth_dev_configure().
+ * @param nb_rx_desc
+ *   The number of receive descriptors to allocate for the receive ring.
+ *   0 means the PMD will use the maximum number of descriptors.
+ * @param conf
+ *   The pointer to the hairpin configuration.
+ *
+ * @return
+ *   - (0) if successful.
+ *   - (-ENOTSUP) if hardware doesn't support.
+ *   - (-EINVAL) if bad parameter.
+ *   - (-ENOMEM) if unable to allocate the resources.
+ */
+__rte_experimental
+int rte_eth_rx_hairpin_queue_setup
+	(uint16_t port_id, uint16_t rx_queue_id, uint16_t nb_rx_desc,
+	 const struct rte_eth_hairpin_conf *conf);
+
+/**
  * Allocate and set up a transmit queue for an Ethernet device.
  *
  * @param port_id
@@ -1881,6 +1952,35 @@ int rte_eth_tx_queue_setup(uint16_t port_id, uint16_t tx_queue_id,
 		const struct rte_eth_txconf *tx_conf);
 
 /**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Allocate and set up a transmit hairpin queue for an Ethernet device.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param tx_queue_id
+ *   The index of the transmit queue to set up.
+ *   The value must be in the range [0, nb_tx_queue - 1] previously supplied
+ *   to rte_eth_dev_configure().
+ * @param nb_tx_desc
+ *   The number of transmit descriptors to allocate for the transmit ring.
+ *   0 means the PMD will use the maximum number of descriptors.
+ * @param conf
+ *   The hairpin configuration.
+ *
+ * @return
+ *   - (0) if successful.
+ *   - (-ENOTSUP) if hardware doesn't support.
+ *   - (-EINVAL) if bad parameter.
+ *   - (-ENOMEM) if unable to allocate the resources.
+ */
+__rte_experimental
+int rte_eth_tx_hairpin_queue_setup
+	(uint16_t port_id, uint16_t tx_queue_id, uint16_t nb_tx_desc,
+	 const struct rte_eth_hairpin_conf *conf);
+
+/**
  * Return the NUMA socket to which an Ethernet device is connected
  *
  * @param port_id
@@ -1915,7 +2015,7 @@ int rte_eth_tx_queue_setup(uint16_t port_id, uint16_t tx_queue_id,
  *   to rte_eth_dev_configure().
  * @return
  *   - 0: Success, the receive queue is started.
- *   - -EINVAL: The port_id or the queue_id out of range.
+ *   - -EINVAL: The port_id or queue_id is out of range, or the queue is hairpin.
  *   - -EIO: if device is removed.
  *   - -ENOTSUP: The function not supported in PMD driver.
  */
@@ -1932,7 +2032,7 @@ int rte_eth_tx_queue_setup(uint16_t port_id, uint16_t tx_queue_id,
  *   to rte_eth_dev_configure().
  * @return
  *   - 0: Success, the receive queue is stopped.
- *   - -EINVAL: The port_id or the queue_id out of range.
+ *   - -EINVAL: The port_id or queue_id is out of range, or the queue is hairpin.
  *   - -EIO: if device is removed.
  *   - -ENOTSUP: The function not supported in PMD driver.
  */
@@ -1950,7 +2050,7 @@ int rte_eth_tx_queue_setup(uint16_t port_id, uint16_t tx_queue_id,
  *   to rte_eth_dev_configure().
  * @return
  *   - 0: Success, the transmit queue is started.
- *   - -EINVAL: The port_id or the queue_id out of range.
+ *   - -EINVAL: The port_id or queue_id is out of range, or the queue is hairpin.
  *   - -EIO: if device is removed.
  *   - -ENOTSUP: The function not supported in PMD driver.
  */
@@ -1967,7 +2067,7 @@ int rte_eth_tx_queue_setup(uint16_t port_id, uint16_t tx_queue_id,
  *   to rte_eth_dev_configure().
  * @return
  *   - 0: Success, the transmit queue is stopped.
- *   - -EINVAL: The port_id or the queue_id out of range.
+ *   - -EINVAL: The port_id or queue_id is out of range, or the queue is hairpin.
  *   - -EIO: if device is removed.
  *   - -ENOTSUP: The function not supported in PMD driver.
  */
@@ -3633,7 +3733,8 @@ int rte_eth_remove_tx_callback(uint16_t port_id, uint16_t queue_id,
  * @return
  *   - 0: Success
  *   - -ENOTSUP: routine is not supported by the device PMD.
- *   - -EINVAL:  The port_id or the queue_id is out of range.
+ *   - -EINVAL:  The port_id or the queue_id is out of range, or the queue
+ *               is a hairpin queue.
  */
 int rte_eth_rx_queue_info_get(uint16_t port_id, uint16_t queue_id,
 	struct rte_eth_rxq_info *qinfo);
@@ -3653,7 +3754,8 @@ int rte_eth_rx_queue_info_get(uint16_t port_id, uint16_t queue_id,
  * @return
  *   - 0: Success
  *   - -ENOTSUP: routine is not supported by the device PMD.
- *   - -EINVAL:  The port_id or the queue_id is out of range.
+ *   - -EINVAL:  The port_id or the queue_id is out of range, or the queue
+ *               is a hairpin queue.
  */
 int rte_eth_tx_queue_info_get(uint16_t port_id, uint16_t queue_id,
 	struct rte_eth_txq_info *qinfo);
@@ -4151,10 +4253,57 @@ int rte_eth_dev_adjust_nb_rx_tx_desc(uint16_t port_id,
 void *
 rte_eth_dev_get_sec_ctx(uint16_t port_id);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Query the device hairpin capabilities.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param cap
+ *   Pointer to a structure that will hold the hairpin capabilities.
+ * @return
+ *   - (0) if successful.
+ *   - (-ENOTSUP) if hardware doesn't support.
+ */
+__rte_experimental
+int rte_eth_dev_hairpin_capability_get(uint16_t port_id,
+				       struct rte_eth_hairpin_cap *cap);
 
 #include <rte_ethdev_core.h>
 
 /**
+ * @internal
+ * Check if the selected Rx queue is hairpin queue.
+ *
+ * @param dev
+ *  Pointer to the selected device.
+ * @param queue_id
+ *  The selected queue.
+ *
+ * @return
+ *   - (1) if the queue is hairpin queue, 0 otherwise.
+ */
+int
+rte_eth_dev_is_rx_hairpin_queue(struct rte_eth_dev *dev, uint16_t queue_id);
+
+/**
+ * @internal
+ * Check if the selected Tx queue is hairpin queue.
+ *
+ * @param dev
+ *  Pointer to the selected device.
+ * @param queue_id
+ *  The selected queue.
+ *
+ * @return
+ *   - (1) if the queue is hairpin queue, 0 otherwise.
+ */
+int
+rte_eth_dev_is_tx_hairpin_queue(struct rte_eth_dev *dev, uint16_t queue_id);
+
+/**
  *
  * Retrieve a burst of input packets from a receive queue of an Ethernet
  * device. The retrieved packets are stored in *rte_mbuf* structures whose
@@ -4251,6 +4400,11 @@ int rte_eth_dev_adjust_nb_rx_tx_desc(uint16_t port_id,
 		RTE_ETHDEV_LOG(ERR, "Invalid RX queue_id=%u\n", queue_id);
 		return 0;
 	}
+	if (rte_eth_dev_is_rx_hairpin_queue(dev, queue_id)) {
+		RTE_ETHDEV_LOG(ERR, "Rx burst failed, queue_id=%u is hairpin queue\n",
+			       queue_id);
+		return 0;
+	}
 #endif
 	nb_rx = (*dev->rx_pkt_burst)(dev->data->rx_queues[queue_id],
 				     rx_pkts, nb_pkts);
@@ -4517,6 +4671,11 @@ static inline int rte_eth_tx_descriptor_status(uint16_t port_id,
 		RTE_ETHDEV_LOG(ERR, "Invalid TX queue_id=%u\n", queue_id);
 		return 0;
 	}
+	if (rte_eth_dev_is_tx_hairpin_queue(dev, queue_id)) {
+		RTE_ETHDEV_LOG(ERR, "Tx burst failed, queue_id=%u is hairpin queue\n",
+			       queue_id);
+		return 0;
+	}
 #endif
 
 #ifdef RTE_ETHDEV_RXTX_CALLBACKS
diff --git a/lib/librte_ethdev/rte_ethdev_core.h b/lib/librte_ethdev/rte_ethdev_core.h
index 392aea8..f215af7 100644
--- a/lib/librte_ethdev/rte_ethdev_core.h
+++ b/lib/librte_ethdev/rte_ethdev_core.h
@@ -509,6 +509,86 @@ typedef int (*eth_pool_ops_supported_t)(struct rte_eth_dev *dev,
 /**< @internal Test if a port supports specific mempool ops */
 
 /**
+ * @internal
+ * Get the hairpin capabilities.
+ *
+ * @param dev
+ *   ethdev handle of port.
+ * @param cap
+ *   returns the hairpin capabilities from the device.
+ *
+ * @return
+ *   Negative errno value on error, 0 on success.
+ *
+ * @retval 0
+ *   Success, hairpin is supported.
+ * @retval -ENOTSUP
+ *   Hairpin is not supported.
+ */
+typedef int (*eth_hairpin_cap_get_t)(struct rte_eth_dev *dev,
+				     struct rte_eth_hairpin_cap *cap);
+
+/**
+ * @internal
+ * Setup RX hairpin queue.
+ *
+ * @param dev
+ *   ethdev handle of port.
+ * @param rx_queue_id
+ *   the selected RX queue index.
+ * @param nb_rx_desc
+ *   the requested number of descriptors for this queue. 0 - use PMD maximum.
+ * @param conf
+ *   the RX hairpin configuration structure.
+ *
+ * @return
+ *   Negative errno value on error, 0 on success.
+ *
+ * @retval 0
+ *   Success, hairpin is supported.
+ * @retval -ENOTSUP
+ *   Hairpin is not supported.
+ * @retval -EINVAL
+ *   One of the parameters is invalid.
+ * @retval -ENOMEM
+ *   Unable to allocate resources.
+ */
+typedef int (*eth_rx_hairpin_queue_setup_t)
+	(struct rte_eth_dev *dev, uint16_t rx_queue_id,
+	 uint16_t nb_rx_desc,
+	 const struct rte_eth_hairpin_conf *conf);
+
+/**
+ * @internal
+ * Setup TX hairpin queue.
+ *
+ * @param dev
+ *   ethdev handle of port.
+ * @param tx_queue_id
+ *   the selected TX queue index.
+ * @param nb_tx_desc
+ *   the requested number of descriptors for this queue. 0 - use PMD maximum.
+ * @param conf
+ *   the TX hairpin configuration structure.
+ *
+ * @return
+ *   Negative errno value on error, 0 on success.
+ *
+ * @retval 0
+ *   Success, hairpin is supported.
+ * @retval -ENOTSUP
+ *   Hairpin is not supported.
+ * @retval -EINVAL
+ *   One of the parameters is invalid.
+ * @retval -ENOMEM
+ *   Unable to allocate resources.
+ */
+typedef int (*eth_tx_hairpin_queue_setup_t)
+	(struct rte_eth_dev *dev, uint16_t tx_queue_id,
+	 uint16_t nb_tx_desc,
+	 const struct rte_eth_hairpin_conf *hairpin_conf);
+
+/**
  * @internal A structure containing the functions exported by an Ethernet driver.
  */
 struct eth_dev_ops {
@@ -644,6 +724,13 @@ struct eth_dev_ops {
 
 	eth_pool_ops_supported_t pool_ops_supported;
 	/**< Test if a port supports specific mempool ops */
+
+	eth_hairpin_cap_get_t hairpin_cap_get;
+	/**< Returns the hairpin capabilities. */
+	eth_rx_hairpin_queue_setup_t rx_hairpin_queue_setup;
+	/**< Set up device RX hairpin queue. */
+	eth_tx_hairpin_queue_setup_t tx_hairpin_queue_setup;
+	/**< Set up device TX hairpin queue. */
 };
 
 /**
@@ -751,9 +838,9 @@ struct rte_eth_dev_data {
 		dev_started : 1,   /**< Device state: STARTED(1) / STOPPED(0). */
 		lro         : 1;   /**< RX LRO is ON(1) / OFF(0) */
 	uint8_t rx_queue_state[RTE_MAX_QUEUES_PER_PORT];
-			/**< Queues state: STARTED(1) / STOPPED(0). */
+		/**< Queues state: HAIRPIN(2) / STARTED(1) / STOPPED(0). */
 	uint8_t tx_queue_state[RTE_MAX_QUEUES_PER_PORT];
-			/**< Queues state: STARTED(1) / STOPPED(0). */
+		/**< Queues state: HAIRPIN(2) / STARTED(1) / STOPPED(0). */
 	uint32_t dev_flags;             /**< Capabilities. */
 	enum rte_kernel_driver kdrv;    /**< Kernel driver passthrough. */
 	int numa_node;                  /**< NUMA node connection. */
diff --git a/lib/librte_ethdev/rte_ethdev_driver.h b/lib/librte_ethdev/rte_ethdev_driver.h
index c404f17..59d4c01 100644
--- a/lib/librte_ethdev/rte_ethdev_driver.h
+++ b/lib/librte_ethdev/rte_ethdev_driver.h
@@ -26,6 +26,7 @@
  */
 #define RTE_ETH_QUEUE_STATE_STOPPED 0
 #define RTE_ETH_QUEUE_STATE_STARTED 1
+#define RTE_ETH_QUEUE_STATE_HAIRPIN 2
 
 /**
  * @internal
diff --git a/lib/librte_ethdev/rte_ethdev_version.map b/lib/librte_ethdev/rte_ethdev_version.map
index e59d516..48b5389 100644
--- a/lib/librte_ethdev/rte_ethdev_version.map
+++ b/lib/librte_ethdev/rte_ethdev_version.map
@@ -288,4 +288,7 @@ EXPERIMENTAL {
 	rte_eth_rx_burst_mode_get;
 	rte_eth_tx_burst_mode_get;
 	rte_eth_burst_mode_option_name;
+	rte_eth_rx_hairpin_queue_setup;
+	rte_eth_tx_hairpin_queue_setup;
+	rte_eth_dev_hairpin_capability_get;
 };
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 186+ messages in thread

* [dpdk-dev] [PATCH v7 03/14] net/mlx5: query hca hairpin capabilities
  2019-10-30 23:53 ` [dpdk-dev] [PATCH v7 00/14] add hairpin feature Ori Kam
  2019-10-30 23:53   ` [dpdk-dev] [PATCH v7 01/14] ethdev: move queue state defines to private file Ori Kam
  2019-10-30 23:53   ` [dpdk-dev] [PATCH v7 02/14] ethdev: add support for hairpin queue Ori Kam
@ 2019-10-30 23:53   ` Ori Kam
  2019-10-30 23:53   ` [dpdk-dev] [PATCH v7 04/14] net/mlx5: support Rx hairpin queues Ori Kam
                     ` (11 subsequent siblings)
  14 siblings, 0 replies; 186+ messages in thread
From: Ori Kam @ 2019-10-30 23:53 UTC (permalink / raw)
  To: Matan Azrad, Shahaf Shuler, Viacheslav Ovsiienko
  Cc: dev, orika, jingjing.wu, stephen

This commit queries and stores the hairpin capabilities from the device.

Those capabilities will be used when creating the hairpin queue.
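
A hypothetical sketch of how these stored attributes could gate hairpin
queue creation later in the series (the priv->config.hca_attr path is an
assumption for illustration):

	/* Refuse hairpin queue setup when the HCA does not support it. */
	if (!priv->config.hca_attr.hairpin) {
		rte_errno = ENOTSUP;
		return -rte_errno;
	}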

Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/net/mlx5/mlx5.h           | 4 ++++
 drivers/net/mlx5/mlx5_devx_cmds.c | 7 +++++++
 2 files changed, 11 insertions(+)

diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index b6a51b2..ee04dd0 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -187,6 +187,10 @@ struct mlx5_hca_attr {
 	uint32_t lro_max_msg_sz_mode:2;
 	uint32_t lro_timer_supported_periods[MLX5_LRO_NUM_SUPP_PERIODS];
 	uint32_t flex_parser_protocols;
+	uint32_t hairpin:1;
+	uint32_t log_max_hairpin_queues:5;
+	uint32_t log_max_hairpin_wq_data_sz:5;
+	uint32_t log_max_hairpin_num_packets:5;
 };
 
 /* Flow list . */
diff --git a/drivers/net/mlx5/mlx5_devx_cmds.c b/drivers/net/mlx5/mlx5_devx_cmds.c
index 51947d3..17c1671 100644
--- a/drivers/net/mlx5/mlx5_devx_cmds.c
+++ b/drivers/net/mlx5/mlx5_devx_cmds.c
@@ -327,6 +327,13 @@ struct mlx5_devx_obj *
 	attr->flow_counters_dump = MLX5_GET(cmd_hca_cap, hcattr,
 					    flow_counters_dump);
 	attr->eswitch_manager = MLX5_GET(cmd_hca_cap, hcattr, eswitch_manager);
+	attr->hairpin = MLX5_GET(cmd_hca_cap, hcattr, hairpin);
+	attr->log_max_hairpin_queues = MLX5_GET(cmd_hca_cap, hcattr,
+						log_max_hairpin_queues);
+	attr->log_max_hairpin_wq_data_sz = MLX5_GET(cmd_hca_cap, hcattr,
+						    log_max_hairpin_wq_data_sz);
+	attr->log_max_hairpin_num_packets = MLX5_GET
+		(cmd_hca_cap, hcattr, log_min_hairpin_wq_data_sz);
 	attr->eth_net_offloads = MLX5_GET(cmd_hca_cap, hcattr,
 					  eth_net_offloads);
 	attr->eth_virt = MLX5_GET(cmd_hca_cap, hcattr, eth_virt);
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 186+ messages in thread

* [dpdk-dev] [PATCH v7 04/14] net/mlx5: support Rx hairpin queues
  2019-10-30 23:53 ` [dpdk-dev] [PATCH v7 00/14] add hairpin feature Ori Kam
                     ` (2 preceding siblings ...)
  2019-10-30 23:53   ` [dpdk-dev] [PATCH v7 03/14] net/mlx5: query hca hairpin capabilities Ori Kam
@ 2019-10-30 23:53   ` Ori Kam
  2019-10-30 23:53   ` [dpdk-dev] [PATCH v7 05/14] net/mlx5: prepare txq to work with different types Ori Kam
                     ` (10 subsequent siblings)
  14 siblings, 0 replies; 186+ messages in thread
From: Ori Kam @ 2019-10-30 23:53 UTC (permalink / raw)
  To: Matan Azrad, Shahaf Shuler, Viacheslav Ovsiienko
  Cc: dev, orika, jingjing.wu, stephen

This commit adds support for creating Rx hairpin queues.
A hairpin queue is a queue that is created using DevX and is used only
by the HW, which means that the data path part of the RQ is not being
used.

Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/net/mlx5/mlx5.c         |   2 +
 drivers/net/mlx5/mlx5_rxq.c     | 270 ++++++++++++++++++++++++++++++++++++----
 drivers/net/mlx5/mlx5_rxtx.h    |  15 +++
 drivers/net/mlx5/mlx5_trigger.c |   7 ++
 4 files changed, 270 insertions(+), 24 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 6dd3def..87993b3 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -985,6 +985,7 @@ struct mlx5_dev_spawn_data {
 	.dev_supported_ptypes_get = mlx5_dev_supported_ptypes_get,
 	.vlan_filter_set = mlx5_vlan_filter_set,
 	.rx_queue_setup = mlx5_rx_queue_setup,
+	.rx_hairpin_queue_setup = mlx5_rx_hairpin_queue_setup,
 	.tx_queue_setup = mlx5_tx_queue_setup,
 	.rx_queue_release = mlx5_rx_queue_release,
 	.tx_queue_release = mlx5_tx_queue_release,
@@ -1051,6 +1052,7 @@ struct mlx5_dev_spawn_data {
 	.dev_supported_ptypes_get = mlx5_dev_supported_ptypes_get,
 	.vlan_filter_set = mlx5_vlan_filter_set,
 	.rx_queue_setup = mlx5_rx_queue_setup,
+	.rx_hairpin_queue_setup = mlx5_rx_hairpin_queue_setup,
 	.tx_queue_setup = mlx5_tx_queue_setup,
 	.rx_queue_release = mlx5_rx_queue_release,
 	.tx_queue_release = mlx5_tx_queue_release,
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index f0ab843..c70e161 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -106,21 +106,25 @@
 	struct mlx5_priv *priv = dev->data->dev_private;
 	uint16_t i;
 	uint16_t n = 0;
+	uint16_t n_ibv = 0;
 
 	if (mlx5_check_mprq_support(dev) < 0)
 		return 0;
 	/* All the configured queues should be enabled. */
 	for (i = 0; i < priv->rxqs_n; ++i) {
 		struct mlx5_rxq_data *rxq = (*priv->rxqs)[i];
+		struct mlx5_rxq_ctrl *rxq_ctrl = container_of
+			(rxq, struct mlx5_rxq_ctrl, rxq);
 
-		if (!rxq)
+		if (rxq == NULL || rxq_ctrl->type != MLX5_RXQ_TYPE_STANDARD)
 			continue;
+		n_ibv++;
 		if (mlx5_rxq_mprq_enabled(rxq))
 			++n;
 	}
 	/* Multi-Packet RQ can't be partially configured. */
-	assert(n == 0 || n == priv->rxqs_n);
-	return n == priv->rxqs_n;
+	assert(n == 0 || n == n_ibv);
+	return n == n_ibv;
 }
 
 /**
@@ -427,6 +431,7 @@
 }
 
 /**
+ * Rx queue presetup checks.
  *
  * @param dev
  *   Pointer to Ethernet device structure.
@@ -434,25 +439,14 @@
  *   RX queue index.
  * @param desc
  *   Number of descriptors to configure in queue.
- * @param socket
- *   NUMA socket on which memory must be allocated.
- * @param[in] conf
- *   Thresholds parameters.
- * @param mp
- *   Memory pool for buffer allocations.
  *
  * @return
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
-int
-mlx5_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
-		    unsigned int socket, const struct rte_eth_rxconf *conf,
-		    struct rte_mempool *mp)
+static int
+mlx5_rx_queue_pre_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
-	struct mlx5_rxq_data *rxq = (*priv->rxqs)[idx];
-	struct mlx5_rxq_ctrl *rxq_ctrl =
-		container_of(rxq, struct mlx5_rxq_ctrl, rxq);
 
 	if (!rte_is_power_of_2(desc)) {
 		desc = 1 << log2above(desc);
@@ -476,6 +470,41 @@
 		return -rte_errno;
 	}
 	mlx5_rxq_release(dev, idx);
+	return 0;
+}
+
+/**
+ * DPDK callback to configure a RX queue.
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ * @param idx
+ *   RX queue index.
+ * @param desc
+ *   Number of descriptors to configure in queue.
+ * @param socket
+ *   NUMA socket on which memory must be allocated.
+ * @param[in] conf
+ *   Thresholds parameters.
+ * @param mp
+ *   Memory pool for buffer allocations.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
+		    unsigned int socket, const struct rte_eth_rxconf *conf,
+		    struct rte_mempool *mp)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_rxq_data *rxq = (*priv->rxqs)[idx];
+	struct mlx5_rxq_ctrl *rxq_ctrl =
+		container_of(rxq, struct mlx5_rxq_ctrl, rxq);
+	int res;
+
+	res = mlx5_rx_queue_pre_setup(dev, idx, desc);
+	if (res)
+		return res;
 	rxq_ctrl = mlx5_rxq_new(dev, idx, desc, socket, conf, mp);
 	if (!rxq_ctrl) {
 		DRV_LOG(ERR, "port %u unable to allocate queue index %u",
@@ -490,6 +519,56 @@
 }
 
 /**
+ * DPDK callback to configure a RX hairpin queue.
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ * @param idx
+ *   RX queue index.
+ * @param desc
+ *   Number of descriptors to configure in queue.
+ * @param hairpin_conf
+ *   Hairpin configuration parameters.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_rx_hairpin_queue_setup(struct rte_eth_dev *dev, uint16_t idx,
+			    uint16_t desc,
+			    const struct rte_eth_hairpin_conf *hairpin_conf)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_rxq_data *rxq = (*priv->rxqs)[idx];
+	struct mlx5_rxq_ctrl *rxq_ctrl =
+		container_of(rxq, struct mlx5_rxq_ctrl, rxq);
+	int res;
+
+	res = mlx5_rx_queue_pre_setup(dev, idx, desc);
+	if (res)
+		return res;
+	if (hairpin_conf->peer_count != 1 ||
+	    hairpin_conf->peers[0].port != dev->data->port_id ||
+	    hairpin_conf->peers[0].queue >= priv->txqs_n) {
+		DRV_LOG(ERR, "port %u unable to setup hairpin queue index %u "
+			" invalid hairpind configuration", dev->data->port_id,
+			idx);
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+	rxq_ctrl = mlx5_rxq_hairpin_new(dev, idx, desc, hairpin_conf);
+	if (!rxq_ctrl) {
+		DRV_LOG(ERR, "port %u unable to allocate queue index %u",
+			dev->data->port_id, idx);
+		rte_errno = ENOMEM;
+		return -rte_errno;
+	}
+	DRV_LOG(DEBUG, "port %u adding Rx queue %u to list",
+		dev->data->port_id, idx);
+	(*priv->rxqs)[idx] = &rxq_ctrl->rxq;
+	return 0;
+}
+
+/**
  * DPDK callback to release a RX queue.
  *
  * @param dpdk_rxq
@@ -561,6 +640,24 @@
 }
 
 /**
+ * Release Rx hairpin related resources.
+ *
+ * @param rxq_obj
+ *   Hairpin Rx queue object.
+ */
+static void
+rxq_obj_hairpin_release(struct mlx5_rxq_obj *rxq_obj)
+{
+	struct mlx5_devx_modify_rq_attr rq_attr = { 0 };
+
+	assert(rxq_obj);
+	rq_attr.state = MLX5_RQC_STATE_RST;
+	rq_attr.rq_state = MLX5_RQC_STATE_RDY;
+	mlx5_devx_cmd_modify_rq(rxq_obj->rq, &rq_attr);
+	claim_zero(mlx5_devx_cmd_destroy(rxq_obj->rq));
+}
+
+/**
  * Release an Rx verbs/DevX queue object.
  *
  * @param rxq_obj
@@ -577,14 +674,22 @@
 		assert(rxq_obj->wq);
 	assert(rxq_obj->cq);
 	if (rte_atomic32_dec_and_test(&rxq_obj->refcnt)) {
-		rxq_free_elts(rxq_obj->rxq_ctrl);
-		if (rxq_obj->type == MLX5_RXQ_OBJ_TYPE_IBV) {
+		switch (rxq_obj->type) {
+		case MLX5_RXQ_OBJ_TYPE_IBV:
+			rxq_free_elts(rxq_obj->rxq_ctrl);
 			claim_zero(mlx5_glue->destroy_wq(rxq_obj->wq));
-		} else if (rxq_obj->type == MLX5_RXQ_OBJ_TYPE_DEVX_RQ) {
+			claim_zero(mlx5_glue->destroy_cq(rxq_obj->cq));
+			break;
+		case MLX5_RXQ_OBJ_TYPE_DEVX_RQ:
+			rxq_free_elts(rxq_obj->rxq_ctrl);
 			claim_zero(mlx5_devx_cmd_destroy(rxq_obj->rq));
 			rxq_release_rq_resources(rxq_obj->rxq_ctrl);
+			claim_zero(mlx5_glue->destroy_cq(rxq_obj->cq));
+			break;
+		case MLX5_RXQ_OBJ_TYPE_DEVX_HAIRPIN:
+			rxq_obj_hairpin_release(rxq_obj);
+			break;
 		}
-		claim_zero(mlx5_glue->destroy_cq(rxq_obj->cq));
 		if (rxq_obj->channel)
 			claim_zero(mlx5_glue->destroy_comp_channel
 				   (rxq_obj->channel));
@@ -1132,6 +1237,70 @@
 }
 
 /**
+ * Create the Rx hairpin queue object.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param idx
+ *   Queue index in DPDK Rx queue array
+ *
+ * @return
+ *   The hairpin DevX object initialised, NULL otherwise and rte_errno is set.
+ */
+static struct mlx5_rxq_obj *
+mlx5_rxq_obj_hairpin_new(struct rte_eth_dev *dev, uint16_t idx)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_rxq_data *rxq_data = (*priv->rxqs)[idx];
+	struct mlx5_rxq_ctrl *rxq_ctrl =
+		container_of(rxq_data, struct mlx5_rxq_ctrl, rxq);
+	struct mlx5_devx_create_rq_attr attr = { 0 };
+	struct mlx5_rxq_obj *tmpl = NULL;
+	int ret = 0;
+
+	assert(rxq_data);
+	assert(!rxq_ctrl->obj);
+	tmpl = rte_calloc_socket(__func__, 1, sizeof(*tmpl), 0,
+				 rxq_ctrl->socket);
+	if (!tmpl) {
+		DRV_LOG(ERR,
+			"port %u Rx queue %u cannot allocate verbs resources",
+			dev->data->port_id, rxq_data->idx);
+		rte_errno = ENOMEM;
+		goto error;
+	}
+	tmpl->type = MLX5_RXQ_OBJ_TYPE_DEVX_HAIRPIN;
+	tmpl->rxq_ctrl = rxq_ctrl;
+	attr.hairpin = 1;
+	/* Workaround for hairpin startup */
+	attr.wq_attr.log_hairpin_num_packets = log2above(32);
+	/* Workaround for packets larger than 1KB */
+	attr.wq_attr.log_hairpin_data_sz =
+			priv->config.hca_attr.log_max_hairpin_wq_data_sz;
+	tmpl->rq = mlx5_devx_cmd_create_rq(priv->sh->ctx, &attr,
+					   rxq_ctrl->socket);
+	if (!tmpl->rq) {
+		DRV_LOG(ERR,
+			"port %u Rx hairpin queue %u can't create rq object",
+			dev->data->port_id, idx);
+		rte_errno = errno;
+		goto error;
+	}
+	DRV_LOG(DEBUG, "port %u rxq %u updated with %p", dev->data->port_id,
+		idx, (void *)&tmpl);
+	rte_atomic32_inc(&tmpl->refcnt);
+	LIST_INSERT_HEAD(&priv->rxqsobj, tmpl, next);
+	priv->verbs_alloc_ctx.type = MLX5_VERBS_ALLOC_TYPE_NONE;
+	return tmpl;
+error:
+	ret = rte_errno; /* Save rte_errno before cleanup. */
+	if (tmpl && tmpl->rq)
+		mlx5_devx_cmd_destroy(tmpl->rq);
+	rte_errno = ret; /* Restore rte_errno. */
+	return NULL;
+}
+
+/**
  * Create the Rx queue Verbs/DevX object.
  *
  * @param dev
@@ -1163,6 +1332,8 @@ struct mlx5_rxq_obj *
 
 	assert(rxq_data);
 	assert(!rxq_ctrl->obj);
+	if (type == MLX5_RXQ_OBJ_TYPE_DEVX_HAIRPIN)
+		return mlx5_rxq_obj_hairpin_new(dev, idx);
 	priv->verbs_alloc_ctx.type = MLX5_VERBS_ALLOC_TYPE_RX_QUEUE;
 	priv->verbs_alloc_ctx.obj = rxq_ctrl;
 	tmpl = rte_calloc_socket(__func__, 1, sizeof(*tmpl), 0,
@@ -1433,15 +1604,19 @@ struct mlx5_rxq_obj *
 	unsigned int strd_num_n = 0;
 	unsigned int strd_sz_n = 0;
 	unsigned int i;
+	unsigned int n_ibv = 0;
 
 	if (!mlx5_mprq_enabled(dev))
 		return 0;
 	/* Count the total number of descriptors configured. */
 	for (i = 0; i != priv->rxqs_n; ++i) {
 		struct mlx5_rxq_data *rxq = (*priv->rxqs)[i];
+		struct mlx5_rxq_ctrl *rxq_ctrl = container_of
+			(rxq, struct mlx5_rxq_ctrl, rxq);
 
-		if (rxq == NULL)
+		if (rxq == NULL || rxq_ctrl->type != MLX5_RXQ_TYPE_STANDARD)
 			continue;
+		n_ibv++;
 		desc += 1 << rxq->elts_n;
 		/* Get the max number of strides. */
 		if (strd_num_n < rxq->strd_num_n)
@@ -1466,7 +1641,7 @@ struct mlx5_rxq_obj *
 	 * this Mempool gets available again.
 	 */
 	desc *= 4;
-	obj_num = desc + MLX5_MPRQ_MP_CACHE_SZ * priv->rxqs_n;
+	obj_num = desc + MLX5_MPRQ_MP_CACHE_SZ * n_ibv;
 	/*
 	 * rte_mempool_create_empty() has sanity check to refuse large cache
 	 * size compared to the number of elements.
@@ -1514,8 +1689,10 @@ struct mlx5_rxq_obj *
 	/* Set mempool for each Rx queue. */
 	for (i = 0; i != priv->rxqs_n; ++i) {
 		struct mlx5_rxq_data *rxq = (*priv->rxqs)[i];
+		struct mlx5_rxq_ctrl *rxq_ctrl = container_of
+			(rxq, struct mlx5_rxq_ctrl, rxq);
 
-		if (rxq == NULL)
+		if (rxq == NULL || rxq_ctrl->type != MLX5_RXQ_TYPE_STANDARD)
 			continue;
 		rxq->mprq_mp = mp;
 	}
@@ -1620,6 +1797,7 @@ struct mlx5_rxq_ctrl *
 		rte_errno = ENOMEM;
 		return NULL;
 	}
+	tmpl->type = MLX5_RXQ_TYPE_STANDARD;
 	if (mlx5_mr_btree_init(&tmpl->rxq.mr_ctrl.cache_bh,
 			       MLX5_MR_BTREE_CACHE_N, socket)) {
 		/* rte_errno is already set. */
@@ -1788,6 +1966,49 @@ struct mlx5_rxq_ctrl *
 }
 
 /**
+ * Create a DPDK Rx hairpin queue.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param idx
+ *   RX queue index.
+ * @param desc
+ *   Number of descriptors to configure in queue.
+ * @param hairpin_conf
+ *   The hairpin binding configuration.
+ *
+ * @return
+ *   A DPDK queue object on success, NULL otherwise and rte_errno is set.
+ */
+struct mlx5_rxq_ctrl *
+mlx5_rxq_hairpin_new(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
+		     const struct rte_eth_hairpin_conf *hairpin_conf)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_rxq_ctrl *tmpl;
+
+	tmpl = rte_calloc_socket("RXQ", 1, sizeof(*tmpl), 0, SOCKET_ID_ANY);
+	if (!tmpl) {
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+	tmpl->type = MLX5_RXQ_TYPE_HAIRPIN;
+	tmpl->socket = SOCKET_ID_ANY;
+	tmpl->rxq.rss_hash = 0;
+	tmpl->rxq.port_id = dev->data->port_id;
+	tmpl->priv = priv;
+	tmpl->rxq.mp = NULL;
+	tmpl->rxq.elts_n = log2above(desc);
+	tmpl->rxq.elts = NULL;
+	tmpl->rxq.mr_ctrl.cache_bh = (struct mlx5_mr_btree) { 0 };
+	tmpl->hairpin_conf = *hairpin_conf;
+	tmpl->rxq.idx = idx;
+	rte_atomic32_inc(&tmpl->refcnt);
+	LIST_INSERT_HEAD(&priv->rxqsctrl, tmpl, next);
+	return tmpl;
+}
+
+/**
  * Get a Rx queue.
  *
  * @param dev
@@ -1841,7 +2062,8 @@ struct mlx5_rxq_ctrl *
 		if (rxq_ctrl->dbr_umem_id_valid)
 			claim_zero(mlx5_release_dbr(dev, rxq_ctrl->dbr_umem_id,
 						    rxq_ctrl->dbr_offset));
-		mlx5_mr_btree_free(&rxq_ctrl->rxq.mr_ctrl.cache_bh);
+		if (rxq_ctrl->type == MLX5_RXQ_TYPE_STANDARD)
+			mlx5_mr_btree_free(&rxq_ctrl->rxq.mr_ctrl.cache_bh);
 		LIST_REMOVE(rxq_ctrl, next);
 		rte_free(rxq_ctrl);
 		(*priv->rxqs)[idx] = NULL;
diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index 4bb28a4..13fdc38 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -159,6 +159,13 @@ struct mlx5_rxq_data {
 enum mlx5_rxq_obj_type {
 	MLX5_RXQ_OBJ_TYPE_IBV,		/* mlx5_rxq_obj with ibv_wq. */
 	MLX5_RXQ_OBJ_TYPE_DEVX_RQ,	/* mlx5_rxq_obj with mlx5_devx_rq. */
+	MLX5_RXQ_OBJ_TYPE_DEVX_HAIRPIN,
+	/* mlx5_rxq_obj with mlx5_devx_rq and hairpin support. */
+};
+
+enum mlx5_rxq_type {
+	MLX5_RXQ_TYPE_STANDARD, /* Standard Rx queue. */
+	MLX5_RXQ_TYPE_HAIRPIN, /* Hairpin Rx queue. */
 };
 
 /* Verbs/DevX Rx queue elements. */
@@ -183,6 +190,7 @@ struct mlx5_rxq_ctrl {
 	rte_atomic32_t refcnt; /* Reference counter. */
 	struct mlx5_rxq_obj *obj; /* Verbs/DevX elements. */
 	struct mlx5_priv *priv; /* Back pointer to private data. */
+	enum mlx5_rxq_type type; /* Rxq type. */
 	unsigned int socket; /* CPU socket ID for allocations. */
 	unsigned int irq:1; /* Whether IRQ is enabled. */
 	unsigned int dbr_umem_id_valid:1; /* dbr_umem_id holds a valid value. */
@@ -193,6 +201,7 @@ struct mlx5_rxq_ctrl {
 	uint32_t dbr_umem_id; /* Storing door-bell information, */
 	uint64_t dbr_offset;  /* needed when freeing door-bell. */
 	struct mlx5dv_devx_umem *wq_umem; /* WQ buffer registration info. */
+	struct rte_eth_hairpin_conf hairpin_conf; /* Hairpin configuration. */
 };
 
 enum mlx5_ind_tbl_type {
@@ -339,6 +348,9 @@ struct mlx5_txq_ctrl {
 int mlx5_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 			unsigned int socket, const struct rte_eth_rxconf *conf,
 			struct rte_mempool *mp);
+int mlx5_rx_hairpin_queue_setup
+	(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
+	 const struct rte_eth_hairpin_conf *hairpin_conf);
 void mlx5_rx_queue_release(void *dpdk_rxq);
 int mlx5_rx_intr_vec_enable(struct rte_eth_dev *dev);
 void mlx5_rx_intr_vec_disable(struct rte_eth_dev *dev);
@@ -351,6 +363,9 @@ struct mlx5_rxq_ctrl *mlx5_rxq_new(struct rte_eth_dev *dev, uint16_t idx,
 				   uint16_t desc, unsigned int socket,
 				   const struct rte_eth_rxconf *conf,
 				   struct rte_mempool *mp);
+struct mlx5_rxq_ctrl *mlx5_rxq_hairpin_new
+	(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
+	 const struct rte_eth_hairpin_conf *hairpin_conf);
 struct mlx5_rxq_ctrl *mlx5_rxq_get(struct rte_eth_dev *dev, uint16_t idx);
 int mlx5_rxq_release(struct rte_eth_dev *dev, uint16_t idx);
 int mlx5_rxq_verify(struct rte_eth_dev *dev);
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index 122f31c..cb31ae2 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -118,6 +118,13 @@
 
 		if (!rxq_ctrl)
 			continue;
+		if (rxq_ctrl->type == MLX5_RXQ_TYPE_HAIRPIN) {
+			rxq_ctrl->obj = mlx5_rxq_obj_new
+				(dev, i, MLX5_RXQ_OBJ_TYPE_DEVX_HAIRPIN);
+			if (!rxq_ctrl->obj)
+				goto error;
+			continue;
+		}
 		/* Pre-register Rx mempool. */
 		mp = mlx5_rxq_mprq_enabled(&rxq_ctrl->rxq) ?
 		     rxq_ctrl->rxq.mprq_mp : rxq_ctrl->rxq.mp;
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 186+ messages in thread

* [dpdk-dev] [PATCH v7 05/14] net/mlx5: prepare txq to work with different types
  2019-10-30 23:53 ` [dpdk-dev] [PATCH v7 00/14] add hairpin feature Ori Kam
                     ` (3 preceding siblings ...)
  2019-10-30 23:53   ` [dpdk-dev] [PATCH v7 04/14] net/mlx5: support Rx hairpin queues Ori Kam
@ 2019-10-30 23:53   ` Ori Kam
  2019-10-30 23:53   ` [dpdk-dev] [PATCH v7 06/14] net/mlx5: support Tx hairpin queues Ori Kam
                     ` (9 subsequent siblings)
  14 siblings, 0 replies; 186+ messages in thread
From: Ori Kam @ 2019-10-30 23:53 UTC (permalink / raw)
  To: Matan Azrad, Shahaf Shuler, Viacheslav Ovsiienko
  Cc: dev, orika, jingjing.wu, stephen

Currently all Tx queues are created using Verbs.
This commit renames the Tx queue objects so the naming is no longer
Verbs specific, since the next commit introduces a new queue type
(hairpin).

Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/net/mlx5/mlx5.c         |  2 +-
 drivers/net/mlx5/mlx5.h         |  2 +-
 drivers/net/mlx5/mlx5_rxtx.c    |  2 +-
 drivers/net/mlx5/mlx5_rxtx.h    | 39 +++++++++++++++++------
 drivers/net/mlx5/mlx5_trigger.c |  4 +--
 drivers/net/mlx5/mlx5_txq.c     | 70 ++++++++++++++++++++---------------------
 6 files changed, 69 insertions(+), 50 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 87993b3..365d5da 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -922,7 +922,7 @@ struct mlx5_dev_spawn_data {
 	if (ret)
 		DRV_LOG(WARNING, "port %u some Rx queues still remain",
 			dev->data->port_id);
-	ret = mlx5_txq_ibv_verify(dev);
+	ret = mlx5_txq_obj_verify(dev);
 	if (ret)
 		DRV_LOG(WARNING, "port %u some Verbs Tx queue still remain",
 			dev->data->port_id);
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index ee04dd0..3afb4cc 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -650,7 +650,7 @@ struct mlx5_priv {
 	LIST_HEAD(rxqobj, mlx5_rxq_obj) rxqsobj; /* Verbs/DevX Rx queues. */
 	LIST_HEAD(hrxq, mlx5_hrxq) hrxqs; /* Verbs Hash Rx queues. */
 	LIST_HEAD(txq, mlx5_txq_ctrl) txqsctrl; /* DPDK Tx queues. */
-	LIST_HEAD(txqibv, mlx5_txq_ibv) txqsibv; /* Verbs Tx queues. */
+	LIST_HEAD(txqobj, mlx5_txq_obj) txqsobj; /* Verbs/DevX Tx queues. */
 	/* Indirection tables. */
 	LIST_HEAD(ind_tables, mlx5_ind_table_obj) ind_tbls;
 	/* Pointer to next element. */
diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index 5ec2b48..f597c89 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -863,7 +863,7 @@ enum mlx5_txcmp_code {
 			.qp_state = IBV_QPS_RESET,
 			.port_num = (uint8_t)priv->ibv_port,
 		};
-		struct ibv_qp *qp = txq_ctrl->ibv->qp;
+		struct ibv_qp *qp = txq_ctrl->obj->qp;
 
 		ret = mlx5_glue->modify_qp(qp, &mod, IBV_QP_STATE);
 		if (ret) {
diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index 13fdc38..12f9bfb 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -308,13 +308,31 @@ struct mlx5_txq_data {
 	/* Storage for queued packets, must be the last field. */
 } __rte_cache_aligned;
 
-/* Verbs Rx queue elements. */
-struct mlx5_txq_ibv {
-	LIST_ENTRY(mlx5_txq_ibv) next; /* Pointer to the next element. */
+enum mlx5_txq_obj_type {
+	MLX5_TXQ_OBJ_TYPE_IBV,		/* mlx5_txq_obj with ibv_wq. */
+	MLX5_TXQ_OBJ_TYPE_DEVX_HAIRPIN,
+	/* mlx5_txq_obj with mlx5_devx_tq and hairpin support. */
+};
+
+enum mlx5_txq_type {
+	MLX5_TXQ_TYPE_STANDARD, /* Standard Tx queue. */
+	MLX5_TXQ_TYPE_HAIRPIN, /* Hairpin Tx queue. */
+};
+
+/* Verbs/DevX Tx queue elements. */
+struct mlx5_txq_obj {
+	LIST_ENTRY(mlx5_txq_obj) next; /* Pointer to the next element. */
 	rte_atomic32_t refcnt; /* Reference counter. */
 	struct mlx5_txq_ctrl *txq_ctrl; /* Pointer to the control queue. */
-	struct ibv_cq *cq; /* Completion Queue. */
-	struct ibv_qp *qp; /* Queue Pair. */
+	enum mlx5_rxq_obj_type type; /* The txq object type. */
+	RTE_STD_C11
+	union {
+		struct {
+			struct ibv_cq *cq; /* Completion Queue. */
+			struct ibv_qp *qp; /* Queue Pair. */
+		};
+		struct mlx5_devx_obj *sq; /* DevX object for Tx queue. */
+	};
 };
 
 /* TX queue control descriptor. */
@@ -322,9 +340,10 @@ struct mlx5_txq_ctrl {
 	LIST_ENTRY(mlx5_txq_ctrl) next; /* Pointer to the next element. */
 	rte_atomic32_t refcnt; /* Reference counter. */
 	unsigned int socket; /* CPU socket ID for allocations. */
+	enum mlx5_txq_type type; /* The txq ctrl type. */
 	unsigned int max_inline_data; /* Max inline data. */
 	unsigned int max_tso_header; /* Max TSO header size. */
-	struct mlx5_txq_ibv *ibv; /* Verbs queue object. */
+	struct mlx5_txq_obj *obj; /* Verbs/DevX queue object. */
 	struct mlx5_priv *priv; /* Back pointer to private data. */
 	off_t uar_mmap_offset; /* UAR mmap offset for non-primary process. */
 	void *bf_reg; /* BlueFlame register from Verbs. */
@@ -393,10 +412,10 @@ int mlx5_tx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 			unsigned int socket, const struct rte_eth_txconf *conf);
 void mlx5_tx_queue_release(void *dpdk_txq);
 int mlx5_tx_uar_init_secondary(struct rte_eth_dev *dev, int fd);
-struct mlx5_txq_ibv *mlx5_txq_ibv_new(struct rte_eth_dev *dev, uint16_t idx);
-struct mlx5_txq_ibv *mlx5_txq_ibv_get(struct rte_eth_dev *dev, uint16_t idx);
-int mlx5_txq_ibv_release(struct mlx5_txq_ibv *txq_ibv);
-int mlx5_txq_ibv_verify(struct rte_eth_dev *dev);
+struct mlx5_txq_obj *mlx5_txq_obj_new(struct rte_eth_dev *dev, uint16_t idx);
+struct mlx5_txq_obj *mlx5_txq_obj_get(struct rte_eth_dev *dev, uint16_t idx);
+int mlx5_txq_obj_release(struct mlx5_txq_obj *txq_ibv);
+int mlx5_txq_obj_verify(struct rte_eth_dev *dev);
 struct mlx5_txq_ctrl *mlx5_txq_new(struct rte_eth_dev *dev, uint16_t idx,
 				   uint16_t desc, unsigned int socket,
 				   const struct rte_eth_txconf *conf);
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index cb31ae2..50c4df5 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -52,8 +52,8 @@
 		if (!txq_ctrl)
 			continue;
 		txq_alloc_elts(txq_ctrl);
-		txq_ctrl->ibv = mlx5_txq_ibv_new(dev, i);
-		if (!txq_ctrl->ibv) {
+		txq_ctrl->obj = mlx5_txq_obj_new(dev, i);
+		if (!txq_ctrl->obj) {
 			rte_errno = ENOMEM;
 			goto error;
 		}
diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
index 53d45e7..a6e2563 100644
--- a/drivers/net/mlx5/mlx5_txq.c
+++ b/drivers/net/mlx5/mlx5_txq.c
@@ -375,15 +375,15 @@
  * @return
  *   The Verbs object initialised, NULL otherwise and rte_errno is set.
  */
-struct mlx5_txq_ibv *
-mlx5_txq_ibv_new(struct rte_eth_dev *dev, uint16_t idx)
+struct mlx5_txq_obj *
+mlx5_txq_obj_new(struct rte_eth_dev *dev, uint16_t idx)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_txq_data *txq_data = (*priv->txqs)[idx];
 	struct mlx5_txq_ctrl *txq_ctrl =
 		container_of(txq_data, struct mlx5_txq_ctrl, txq);
-	struct mlx5_txq_ibv tmpl;
-	struct mlx5_txq_ibv *txq_ibv = NULL;
+	struct mlx5_txq_obj tmpl;
+	struct mlx5_txq_obj *txq_obj = NULL;
 	union {
 		struct ibv_qp_init_attr_ex init;
 		struct ibv_cq_init_attr_ex cq;
@@ -411,7 +411,7 @@ struct mlx5_txq_ibv *
 		rte_errno = EINVAL;
 		return NULL;
 	}
-	memset(&tmpl, 0, sizeof(struct mlx5_txq_ibv));
+	memset(&tmpl, 0, sizeof(struct mlx5_txq_obj));
 	attr.cq = (struct ibv_cq_init_attr_ex){
 		.comp_mask = 0,
 	};
@@ -502,9 +502,9 @@ struct mlx5_txq_ibv *
 		rte_errno = errno;
 		goto error;
 	}
-	txq_ibv = rte_calloc_socket(__func__, 1, sizeof(struct mlx5_txq_ibv), 0,
+	txq_obj = rte_calloc_socket(__func__, 1, sizeof(struct mlx5_txq_obj), 0,
 				    txq_ctrl->socket);
-	if (!txq_ibv) {
+	if (!txq_obj) {
 		DRV_LOG(ERR, "port %u Tx queue %u cannot allocate memory",
 			dev->data->port_id, idx);
 		rte_errno = ENOMEM;
@@ -568,9 +568,9 @@ struct mlx5_txq_ibv *
 		}
 	}
 #endif
-	txq_ibv->qp = tmpl.qp;
-	txq_ibv->cq = tmpl.cq;
-	rte_atomic32_inc(&txq_ibv->refcnt);
+	txq_obj->qp = tmpl.qp;
+	txq_obj->cq = tmpl.cq;
+	rte_atomic32_inc(&txq_obj->refcnt);
 	txq_ctrl->bf_reg = qp.bf.reg;
 	if (qp.comp_mask & MLX5DV_QP_MASK_UAR_MMAP_OFFSET) {
 		txq_ctrl->uar_mmap_offset = qp.uar_mmap_offset;
@@ -585,18 +585,18 @@ struct mlx5_txq_ibv *
 		goto error;
 	}
 	txq_uar_init(txq_ctrl);
-	LIST_INSERT_HEAD(&priv->txqsibv, txq_ibv, next);
-	txq_ibv->txq_ctrl = txq_ctrl;
+	LIST_INSERT_HEAD(&priv->txqsobj, txq_obj, next);
+	txq_obj->txq_ctrl = txq_ctrl;
 	priv->verbs_alloc_ctx.type = MLX5_VERBS_ALLOC_TYPE_NONE;
-	return txq_ibv;
+	return txq_obj;
 error:
 	ret = rte_errno; /* Save rte_errno before cleanup. */
 	if (tmpl.cq)
 		claim_zero(mlx5_glue->destroy_cq(tmpl.cq));
 	if (tmpl.qp)
 		claim_zero(mlx5_glue->destroy_qp(tmpl.qp));
-	if (txq_ibv)
-		rte_free(txq_ibv);
+	if (txq_obj)
+		rte_free(txq_obj);
 	priv->verbs_alloc_ctx.type = MLX5_VERBS_ALLOC_TYPE_NONE;
 	rte_errno = ret; /* Restore rte_errno. */
 	return NULL;
@@ -613,8 +613,8 @@ struct mlx5_txq_ibv *
  * @return
  *   The Verbs object if it exists.
  */
-struct mlx5_txq_ibv *
-mlx5_txq_ibv_get(struct rte_eth_dev *dev, uint16_t idx)
+struct mlx5_txq_obj *
+mlx5_txq_obj_get(struct rte_eth_dev *dev, uint16_t idx)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_txq_ctrl *txq_ctrl;
@@ -624,29 +624,29 @@ struct mlx5_txq_ibv *
 	if (!(*priv->txqs)[idx])
 		return NULL;
 	txq_ctrl = container_of((*priv->txqs)[idx], struct mlx5_txq_ctrl, txq);
-	if (txq_ctrl->ibv)
-		rte_atomic32_inc(&txq_ctrl->ibv->refcnt);
-	return txq_ctrl->ibv;
+	if (txq_ctrl->obj)
+		rte_atomic32_inc(&txq_ctrl->obj->refcnt);
+	return txq_ctrl->obj;
 }
 
 /**
  * Release an Tx verbs queue object.
  *
- * @param txq_ibv
+ * @param txq_obj
  *   Verbs Tx queue object.
  *
  * @return
  *   1 while a reference on it exists, 0 when freed.
  */
 int
-mlx5_txq_ibv_release(struct mlx5_txq_ibv *txq_ibv)
+mlx5_txq_obj_release(struct mlx5_txq_obj *txq_obj)
 {
-	assert(txq_ibv);
-	if (rte_atomic32_dec_and_test(&txq_ibv->refcnt)) {
-		claim_zero(mlx5_glue->destroy_qp(txq_ibv->qp));
-		claim_zero(mlx5_glue->destroy_cq(txq_ibv->cq));
-		LIST_REMOVE(txq_ibv, next);
-		rte_free(txq_ibv);
+	assert(txq_obj);
+	if (rte_atomic32_dec_and_test(&txq_obj->refcnt)) {
+		claim_zero(mlx5_glue->destroy_qp(txq_obj->qp));
+		claim_zero(mlx5_glue->destroy_cq(txq_obj->cq));
+		LIST_REMOVE(txq_obj, next);
+		rte_free(txq_obj);
 		return 0;
 	}
 	return 1;
@@ -662,15 +662,15 @@ struct mlx5_txq_ibv *
  *   The number of object not released.
  */
 int
-mlx5_txq_ibv_verify(struct rte_eth_dev *dev)
+mlx5_txq_obj_verify(struct rte_eth_dev *dev)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	int ret = 0;
-	struct mlx5_txq_ibv *txq_ibv;
+	struct mlx5_txq_obj *txq_obj;
 
-	LIST_FOREACH(txq_ibv, &priv->txqsibv, next) {
+	LIST_FOREACH(txq_obj, &priv->txqsobj, next) {
 		DRV_LOG(DEBUG, "port %u Verbs Tx queue %u still referenced",
-			dev->data->port_id, txq_ibv->txq_ctrl->txq.idx);
+			dev->data->port_id, txq_obj->txq_ctrl->txq.idx);
 		++ret;
 	}
 	return ret;
@@ -1127,7 +1127,7 @@ struct mlx5_txq_ctrl *
 	if ((*priv->txqs)[idx]) {
 		ctrl = container_of((*priv->txqs)[idx], struct mlx5_txq_ctrl,
 				    txq);
-		mlx5_txq_ibv_get(dev, idx);
+		mlx5_txq_obj_get(dev, idx);
 		rte_atomic32_inc(&ctrl->refcnt);
 	}
 	return ctrl;
@@ -1153,8 +1153,8 @@ struct mlx5_txq_ctrl *
 	if (!(*priv->txqs)[idx])
 		return 0;
 	txq = container_of((*priv->txqs)[idx], struct mlx5_txq_ctrl, txq);
-	if (txq->ibv && !mlx5_txq_ibv_release(txq->ibv))
-		txq->ibv = NULL;
+	if (txq->obj && !mlx5_txq_obj_release(txq->obj))
+		txq->obj = NULL;
 	if (rte_atomic32_dec_and_test(&txq->refcnt)) {
 		txq_free_elts(txq);
 		mlx5_mr_btree_free(&txq->txq.mr_ctrl.cache_bh);
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 186+ messages in thread

* [dpdk-dev] [PATCH v7 06/14] net/mlx5: support Tx hairpin queues
  2019-10-30 23:53 ` [dpdk-dev] [PATCH v7 00/14] add hairpin feature Ori Kam
                     ` (4 preceding siblings ...)
  2019-10-30 23:53   ` [dpdk-dev] [PATCH v7 05/14] net/mlx5: prepare txq to work with different types Ori Kam
@ 2019-10-30 23:53   ` Ori Kam
  2019-10-30 23:53   ` [dpdk-dev] [PATCH v7 07/14] net/mlx5: add get hairpin capabilities Ori Kam
                     ` (8 subsequent siblings)
  14 siblings, 0 replies; 186+ messages in thread
From: Ori Kam @ 2019-10-30 23:53 UTC (permalink / raw)
  To: Matan Azrad, Shahaf Shuler, Viacheslav Ovsiienko
  Cc: dev, orika, jingjing.wu, stephen

This commit adds support for creating Tx hairpin queues.
A hairpin queue is a queue that is created using DevX and used only
by the HW.
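
At the API level the Tx side mirrors the Rx side; a hypothetical
application-level sketch (port_id and the 1:1 pairing with Rx queue 1 are
assumptions), after which this PMD creates the DevX SQ bound to the
shared TIS/TD objects:

	#include <rte_ethdev.h>

	static int
	setup_tx_hairpin(uint16_t port_id)
	{
		/* Peer Tx hairpin queue 1 back to Rx hairpin queue 1. */
		struct rte_eth_hairpin_conf conf = {
			.peer_count = 1,
			.peers[0] = { .port = port_id, .queue = 1 },
		};

		return rte_eth_tx_hairpin_queue_setup(port_id, 1, 512, &conf);
	}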

Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/net/mlx5/mlx5.c           |  36 +++++-
 drivers/net/mlx5/mlx5.h           |  46 ++++++++
 drivers/net/mlx5/mlx5_devx_cmds.c | 186 ++++++++++++++++++++++++++++++
 drivers/net/mlx5/mlx5_prm.h       | 118 +++++++++++++++++++
 drivers/net/mlx5/mlx5_rxtx.h      |  18 ++-
 drivers/net/mlx5/mlx5_trigger.c   |  10 +-
 drivers/net/mlx5/mlx5_txq.c       | 230 +++++++++++++++++++++++++++++++++++---
 7 files changed, 620 insertions(+), 24 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 365d5da..8e7ff1d 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -324,6 +324,9 @@ struct mlx5_dev_spawn_data {
 	struct mlx5_ibv_shared *sh;
 	int err = 0;
 	uint32_t i;
+#ifdef HAVE_IBV_FLOW_DV_SUPPORT
+	struct mlx5_devx_tis_attr tis_attr = { 0 };
+#endif
 
 	assert(spawn);
 	/* Secondary process should not create the shared context. */
@@ -390,10 +393,25 @@ struct mlx5_dev_spawn_data {
 		goto error;
 	}
 #ifdef HAVE_IBV_FLOW_DV_SUPPORT
-	err = mlx5_get_pdn(sh->pd, &sh->pdn);
-	if (err) {
-		DRV_LOG(ERR, "Fail to extract pdn from PD");
-		goto error;
+	if (sh->devx) {
+		err = mlx5_get_pdn(sh->pd, &sh->pdn);
+		if (err) {
+			DRV_LOG(ERR, "Fail to extract pdn from PD");
+			goto error;
+		}
+		sh->td = mlx5_devx_cmd_create_td(sh->ctx);
+		if (!sh->td) {
+			DRV_LOG(ERR, "TD allocation failure");
+			err = ENOMEM;
+			goto error;
+		}
+		tis_attr.transport_domain = sh->td->id;
+		sh->tis = mlx5_devx_cmd_create_tis(sh->ctx, &tis_attr);
+		if (!sh->tis) {
+			DRV_LOG(ERR, "TIS allocation failure");
+			err = ENOMEM;
+			goto error;
+		}
 	}
 #endif /* HAVE_IBV_FLOW_DV_SUPPORT */
 	/*
@@ -426,6 +444,10 @@ struct mlx5_dev_spawn_data {
 error:
 	pthread_mutex_unlock(&mlx5_ibv_list_mutex);
 	assert(sh);
+	if (sh->tis)
+		claim_zero(mlx5_devx_cmd_destroy(sh->tis));
+	if (sh->td)
+		claim_zero(mlx5_devx_cmd_destroy(sh->td));
 	if (sh->pd)
 		claim_zero(mlx5_glue->dealloc_pd(sh->pd));
 	if (sh->ctx)
@@ -495,6 +517,10 @@ struct mlx5_dev_spawn_data {
 	pthread_mutex_destroy(&sh->intr_mutex);
 	if (sh->pd)
 		claim_zero(mlx5_glue->dealloc_pd(sh->pd));
+	if (sh->tis)
+		claim_zero(mlx5_devx_cmd_destroy(sh->tis));
+	if (sh->td)
+		claim_zero(mlx5_devx_cmd_destroy(sh->td));
 	if (sh->ctx)
 		claim_zero(mlx5_glue->close_device(sh->ctx));
 	rte_free(sh);
@@ -987,6 +1013,7 @@ struct mlx5_dev_spawn_data {
 	.rx_queue_setup = mlx5_rx_queue_setup,
 	.rx_hairpin_queue_setup = mlx5_rx_hairpin_queue_setup,
 	.tx_queue_setup = mlx5_tx_queue_setup,
+	.tx_hairpin_queue_setup = mlx5_tx_hairpin_queue_setup,
 	.rx_queue_release = mlx5_rx_queue_release,
 	.tx_queue_release = mlx5_tx_queue_release,
 	.flow_ctrl_get = mlx5_dev_get_flow_ctrl,
@@ -1054,6 +1081,7 @@ struct mlx5_dev_spawn_data {
 	.rx_queue_setup = mlx5_rx_queue_setup,
 	.rx_hairpin_queue_setup = mlx5_rx_hairpin_queue_setup,
 	.tx_queue_setup = mlx5_tx_queue_setup,
+	.tx_hairpin_queue_setup = mlx5_tx_hairpin_queue_setup,
 	.rx_queue_release = mlx5_rx_queue_release,
 	.tx_queue_release = mlx5_tx_queue_release,
 	.flow_ctrl_get = mlx5_dev_get_flow_ctrl,
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 3afb4cc..566bf2d 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -353,6 +353,43 @@ struct mlx5_devx_rqt_attr {
 	uint32_t rq_list[];
 };
 
+/* TIS attributes structure. */
+struct mlx5_devx_tis_attr {
+	uint32_t strict_lag_tx_port_affinity:1;
+	uint32_t tls_en:1;
+	uint32_t lag_tx_port_affinity:4;
+	uint32_t prio:4;
+	uint32_t transport_domain:24;
+};
+
+/* SQ attributes structure, used by SQ create operation. */
+struct mlx5_devx_create_sq_attr {
+	uint32_t rlky:1;
+	uint32_t cd_master:1;
+	uint32_t fre:1;
+	uint32_t flush_in_error_en:1;
+	uint32_t allow_multi_pkt_send_wqe:1;
+	uint32_t min_wqe_inline_mode:3;
+	uint32_t state:4;
+	uint32_t reg_umr:1;
+	uint32_t allow_swp:1;
+	uint32_t hairpin:1;
+	uint32_t user_index:24;
+	uint32_t cqn:24;
+	uint32_t packet_pacing_rate_limit_index:16;
+	uint32_t tis_lst_sz:16;
+	uint32_t tis_num:24;
+	struct mlx5_devx_wq_attr wq_attr;
+};
+
+/* SQ attributes structure, used by SQ modify operation. */
+struct mlx5_devx_modify_sq_attr {
+	uint32_t sq_state:4;
+	uint32_t state:4;
+	uint32_t hairpin_peer_rq:24;
+	uint32_t hairpin_peer_vhca:16;
+};
+
 /**
  * Type of object being allocated.
  */
@@ -596,6 +633,8 @@ struct mlx5_ibv_shared {
 	uint32_t devx_intr_cnt; /* Devx interrupt handler reference counter. */
 	struct rte_intr_handle intr_handle_devx; /* DEVX interrupt handler. */
 	struct mlx5dv_devx_cmd_comp *devx_comp; /* DEVX async comp obj. */
+	struct mlx5_devx_obj *tis; /* TIS object. */
+	struct mlx5_devx_obj *td; /* Transport domain. */
 	struct mlx5_ibv_shared_port port[]; /* per device port data array. */
 };
 
@@ -918,5 +957,12 @@ struct mlx5_devx_obj *mlx5_devx_cmd_create_tir(struct ibv_context *ctx,
 					struct mlx5_devx_tir_attr *tir_attr);
 struct mlx5_devx_obj *mlx5_devx_cmd_create_rqt(struct ibv_context *ctx,
 					struct mlx5_devx_rqt_attr *rqt_attr);
+struct mlx5_devx_obj *mlx5_devx_cmd_create_sq
+	(struct ibv_context *ctx, struct mlx5_devx_create_sq_attr *sq_attr);
+int mlx5_devx_cmd_modify_sq
+	(struct mlx5_devx_obj *sq, struct mlx5_devx_modify_sq_attr *sq_attr);
+struct mlx5_devx_obj *mlx5_devx_cmd_create_tis
+	(struct ibv_context *ctx, struct mlx5_devx_tis_attr *tis_attr);
+struct mlx5_devx_obj *mlx5_devx_cmd_create_td(struct ibv_context *ctx);
 
 #endif /* RTE_PMD_MLX5_H_ */
diff --git a/drivers/net/mlx5/mlx5_devx_cmds.c b/drivers/net/mlx5/mlx5_devx_cmds.c
index 17c1671..a501f1f 100644
--- a/drivers/net/mlx5/mlx5_devx_cmds.c
+++ b/drivers/net/mlx5/mlx5_devx_cmds.c
@@ -717,3 +717,189 @@ struct mlx5_devx_obj *
 	rqt->id = MLX5_GET(create_rqt_out, out, rqtn);
 	return rqt;
 }
+
+/**
+ * Create SQ using DevX API.
+ *
+ * @param[in] ctx
+ *   ibv_context returned from mlx5dv_open_device.
+ * @param [in] sq_attr
+ *   Pointer to SQ attributes structure.
+ * @param [in] socket
+ *   CPU socket ID for allocations.
+ *
+ * @return
+ *   The DevX object created, NULL otherwise and rte_errno is set.
+ **/
+struct mlx5_devx_obj *
+mlx5_devx_cmd_create_sq(struct ibv_context *ctx,
+			struct mlx5_devx_create_sq_attr *sq_attr)
+{
+	uint32_t in[MLX5_ST_SZ_DW(create_sq_in)] = {0};
+	uint32_t out[MLX5_ST_SZ_DW(create_sq_out)] = {0};
+	void *sq_ctx;
+	void *wq_ctx;
+	struct mlx5_devx_wq_attr *wq_attr;
+	struct mlx5_devx_obj *sq = NULL;
+
+	sq = rte_calloc(__func__, 1, sizeof(*sq), 0);
+	if (!sq) {
+		DRV_LOG(ERR, "Failed to allocate SQ data");
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+	MLX5_SET(create_sq_in, in, opcode, MLX5_CMD_OP_CREATE_SQ);
+	sq_ctx = MLX5_ADDR_OF(create_sq_in, in, ctx);
+	MLX5_SET(sqc, sq_ctx, rlky, sq_attr->rlky);
+	MLX5_SET(sqc, sq_ctx, cd_master, sq_attr->cd_master);
+	MLX5_SET(sqc, sq_ctx, fre, sq_attr->fre);
+	MLX5_SET(sqc, sq_ctx, flush_in_error_en, sq_attr->flush_in_error_en);
+	MLX5_SET(sqc, sq_ctx, allow_multi_pkt_send_wqe,
+		 sq_attr->allow_multi_pkt_send_wqe);
+	MLX5_SET(sqc, sq_ctx, min_wqe_inline_mode,
+		 sq_attr->min_wqe_inline_mode);
+	MLX5_SET(sqc, sq_ctx, state, sq_attr->state);
+	MLX5_SET(sqc, sq_ctx, reg_umr, sq_attr->reg_umr);
+	MLX5_SET(sqc, sq_ctx, allow_swp, sq_attr->allow_swp);
+	MLX5_SET(sqc, sq_ctx, hairpin, sq_attr->hairpin);
+	MLX5_SET(sqc, sq_ctx, user_index, sq_attr->user_index);
+	MLX5_SET(sqc, sq_ctx, cqn, sq_attr->cqn);
+	MLX5_SET(sqc, sq_ctx, packet_pacing_rate_limit_index,
+		 sq_attr->packet_pacing_rate_limit_index);
+	MLX5_SET(sqc, sq_ctx, tis_lst_sz, sq_attr->tis_lst_sz);
+	MLX5_SET(sqc, sq_ctx, tis_num_0, sq_attr->tis_num);
+	wq_ctx = MLX5_ADDR_OF(sqc, sq_ctx, wq);
+	wq_attr = &sq_attr->wq_attr;
+	devx_cmd_fill_wq_data(wq_ctx, wq_attr);
+	sq->obj = mlx5_glue->devx_obj_create(ctx, in, sizeof(in),
+					     out, sizeof(out));
+	if (!sq->obj) {
+		DRV_LOG(ERR, "Failed to create SQ using DevX");
+		rte_errno = errno;
+		rte_free(sq);
+		return NULL;
+	}
+	sq->id = MLX5_GET(create_sq_out, out, sqn);
+	return sq;
+}
+
+/**
+ * Modify SQ using DevX API.
+ *
+ * @param[in] sq
+ *   Pointer to SQ object structure.
+ * @param [in] sq_attr
+ *   Pointer to SQ attributes structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_devx_cmd_modify_sq(struct mlx5_devx_obj *sq,
+			struct mlx5_devx_modify_sq_attr *sq_attr)
+{
+	uint32_t in[MLX5_ST_SZ_DW(modify_sq_in)] = {0};
+	uint32_t out[MLX5_ST_SZ_DW(modify_sq_out)] = {0};
+	void *sq_ctx;
+	int ret;
+
+	MLX5_SET(modify_sq_in, in, opcode, MLX5_CMD_OP_MODIFY_SQ);
+	MLX5_SET(modify_sq_in, in, sq_state, sq_attr->sq_state);
+	MLX5_SET(modify_sq_in, in, sqn, sq->id);
+	sq_ctx = MLX5_ADDR_OF(modify_sq_in, in, ctx);
+	MLX5_SET(sqc, sq_ctx, state, sq_attr->state);
+	MLX5_SET(sqc, sq_ctx, hairpin_peer_rq, sq_attr->hairpin_peer_rq);
+	MLX5_SET(sqc, sq_ctx, hairpin_peer_vhca, sq_attr->hairpin_peer_vhca);
+	ret = mlx5_glue->devx_obj_modify(sq->obj, in, sizeof(in),
+					 out, sizeof(out));
+	if (ret) {
+		DRV_LOG(ERR, "Failed to modify SQ using DevX");
+		rte_errno = errno;
+		return -errno;
+	}
+	return ret;
+}
+
+/**
+ * Create TIS using DevX API.
+ *
+ * @param[in] ctx
+ *   ibv_context returned from mlx5dv_open_device.
+ * @param [in] tis_attr
+ *   Pointer to TIS attributes structure.
+ *
+ * @return
+ *   The DevX object created, NULL otherwise and rte_errno is set.
+ */
+struct mlx5_devx_obj *
+mlx5_devx_cmd_create_tis(struct ibv_context *ctx,
+			 struct mlx5_devx_tis_attr *tis_attr)
+{
+	uint32_t in[MLX5_ST_SZ_DW(create_tis_in)] = {0};
+	uint32_t out[MLX5_ST_SZ_DW(create_tis_out)] = {0};
+	struct mlx5_devx_obj *tis = NULL;
+	void *tis_ctx;
+
+	tis = rte_calloc(__func__, 1, sizeof(*tis), 0);
+	if (!tis) {
+		DRV_LOG(ERR, "Failed to allocate TIS object");
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+	MLX5_SET(create_tis_in, in, opcode, MLX5_CMD_OP_CREATE_TIS);
+	tis_ctx = MLX5_ADDR_OF(create_tis_in, in, ctx);
+	MLX5_SET(tisc, tis_ctx, strict_lag_tx_port_affinity,
+		 tis_attr->strict_lag_tx_port_affinity);
+	MLX5_SET(tisc, tis_ctx, lag_tx_port_affinity,
+		 tis_attr->lag_tx_port_affinity);
+	MLX5_SET(tisc, tis_ctx, prio, tis_attr->prio);
+	MLX5_SET(tisc, tis_ctx, transport_domain,
+		 tis_attr->transport_domain);
+	tis->obj = mlx5_glue->devx_obj_create(ctx, in, sizeof(in),
+					      out, sizeof(out));
+	if (!tis->obj) {
+		DRV_LOG(ERR, "Failed to create TIS using DevX");
+		rte_errno = errno;
+		rte_free(tis);
+		return NULL;
+	}
+	tis->id = MLX5_GET(create_tis_out, out, tisn);
+	return tis;
+}
+
+/**
+ * Create transport domain using DevX API.
+ *
+ * @param[in] ctx
+ *   ibv_context returned from mlx5dv_open_device.
+ *
+ * @return
+ *   The DevX object created, NULL otherwise and rte_errno is set.
+ */
+struct mlx5_devx_obj *
+mlx5_devx_cmd_create_td(struct ibv_context *ctx)
+{
+	uint32_t in[MLX5_ST_SZ_DW(alloc_transport_domain_in)] = {0};
+	uint32_t out[MLX5_ST_SZ_DW(alloc_transport_domain_out)] = {0};
+	struct mlx5_devx_obj *td = NULL;
+
+	td = rte_calloc(__func__, 1, sizeof(*td), 0);
+	if (!td) {
+		DRV_LOG(ERR, "Failed to allocate TD object");
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+	MLX5_SET(alloc_transport_domain_in, in, opcode,
+		 MLX5_CMD_OP_ALLOC_TRANSPORT_DOMAIN);
+	td->obj = mlx5_glue->devx_obj_create(ctx, in, sizeof(in),
+					     out, sizeof(out));
+	if (!td->obj) {
+		DRV_LOG(ERR, "Failed to create TIS using DevX");
+		rte_errno = errno;
+		rte_free(td);
+		return NULL;
+	}
+	td->id = MLX5_GET(alloc_transport_domain_out, out,
+			   transport_domain);
+	return td;
+}
diff --git a/drivers/net/mlx5/mlx5_prm.h b/drivers/net/mlx5/mlx5_prm.h
index c86f8b8..c687cfb 100644
--- a/drivers/net/mlx5/mlx5_prm.h
+++ b/drivers/net/mlx5/mlx5_prm.h
@@ -671,9 +671,13 @@ enum {
 	MLX5_CMD_OP_QUERY_HCA_CAP = 0x100,
 	MLX5_CMD_OP_CREATE_MKEY = 0x200,
 	MLX5_CMD_OP_QUERY_NIC_VPORT_CONTEXT = 0x754,
+	MLX5_CMD_OP_ALLOC_TRANSPORT_DOMAIN = 0x816,
 	MLX5_CMD_OP_CREATE_TIR = 0x900,
+	MLX5_CMD_OP_CREATE_SQ = 0x904,
+	MLX5_CMD_OP_MODIFY_SQ = 0x905,
 	MLX5_CMD_OP_CREATE_RQ = 0x908,
 	MLX5_CMD_OP_MODIFY_RQ = 0x909,
+	MLX5_CMD_OP_CREATE_TIS = 0x912,
 	MLX5_CMD_OP_QUERY_TIS = 0x915,
 	MLX5_CMD_OP_CREATE_RQT = 0x916,
 	MLX5_CMD_OP_ALLOC_FLOW_COUNTER = 0x939,
@@ -1328,6 +1332,23 @@ struct mlx5_ifc_query_tis_in_bits {
 	u8 reserved_at_60[0x20];
 };
 
+struct mlx5_ifc_alloc_transport_domain_out_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 reserved_at_40[0x8];
+	u8 transport_domain[0x18];
+	u8 reserved_at_60[0x20];
+};
+
+struct mlx5_ifc_alloc_transport_domain_in_bits {
+	u8 opcode[0x10];
+	u8 reserved_at_10[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 reserved_at_40[0x40];
+};
+
 enum {
 	MLX5_WQ_TYPE_LINKED_LIST                = 0x0,
 	MLX5_WQ_TYPE_CYCLIC                     = 0x1,
@@ -1444,6 +1465,24 @@ struct mlx5_ifc_modify_rq_out_bits {
 	u8 reserved_at_40[0x40];
 };
 
+struct mlx5_ifc_create_tis_out_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 reserved_at_40[0x8];
+	u8 tisn[0x18];
+	u8 reserved_at_60[0x20];
+};
+
+struct mlx5_ifc_create_tis_in_bits {
+	u8 opcode[0x10];
+	u8 uid[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 reserved_at_40[0xc0];
+	struct mlx5_ifc_tisc_bits ctx;
+};
+
 enum {
 	MLX5_MODIFY_RQ_IN_MODIFY_BITMASK_WQ_LWM = 1ULL << 0,
 	MLX5_MODIFY_RQ_IN_MODIFY_BITMASK_VSD = 1ULL << 1,
@@ -1589,6 +1628,85 @@ struct mlx5_ifc_create_rqt_in_bits {
 #pragma GCC diagnostic error "-Wpedantic"
 #endif
 
+struct mlx5_ifc_sqc_bits {
+	u8 rlky[0x1];
+	u8 cd_master[0x1];
+	u8 fre[0x1];
+	u8 flush_in_error_en[0x1];
+	u8 allow_multi_pkt_send_wqe[0x1];
+	u8 min_wqe_inline_mode[0x3];
+	u8 state[0x4];
+	u8 reg_umr[0x1];
+	u8 allow_swp[0x1];
+	u8 hairpin[0x1];
+	u8 reserved_at_f[0x11];
+	u8 reserved_at_20[0x8];
+	u8 user_index[0x18];
+	u8 reserved_at_40[0x8];
+	u8 cqn[0x18];
+	u8 reserved_at_60[0x8];
+	u8 hairpin_peer_rq[0x18];
+	u8 reserved_at_80[0x10];
+	u8 hairpin_peer_vhca[0x10];
+	u8 reserved_at_a0[0x50];
+	u8 packet_pacing_rate_limit_index[0x10];
+	u8 tis_lst_sz[0x10];
+	u8 reserved_at_110[0x10];
+	u8 reserved_at_120[0x40];
+	u8 reserved_at_160[0x8];
+	u8 tis_num_0[0x18];
+	struct mlx5_ifc_wq_bits wq;
+};
+
+struct mlx5_ifc_query_sq_in_bits {
+	u8 opcode[0x10];
+	u8 reserved_at_10[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 reserved_at_40[0x8];
+	u8 sqn[0x18];
+	u8 reserved_at_60[0x20];
+};
+
+struct mlx5_ifc_modify_sq_out_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 reserved_at_40[0x40];
+};
+
+struct mlx5_ifc_modify_sq_in_bits {
+	u8 opcode[0x10];
+	u8 uid[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 sq_state[0x4];
+	u8 reserved_at_44[0x4];
+	u8 sqn[0x18];
+	u8 reserved_at_60[0x20];
+	u8 modify_bitmask[0x40];
+	u8 reserved_at_c0[0x40];
+	struct mlx5_ifc_sqc_bits ctx;
+};
+
+struct mlx5_ifc_create_sq_out_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 reserved_at_40[0x8];
+	u8 sqn[0x18];
+	u8 reserved_at_60[0x20];
+};
+
+struct mlx5_ifc_create_sq_in_bits {
+	u8 opcode[0x10];
+	u8 uid[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 reserved_at_40[0xc0];
+	struct mlx5_ifc_sqc_bits ctx;
+};
+
 /* CQE format mask. */
 #define MLX5E_CQE_FORMAT_MASK 0xc
 
diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index 12f9bfb..271b648 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -324,14 +324,18 @@ struct mlx5_txq_obj {
 	LIST_ENTRY(mlx5_txq_obj) next; /* Pointer to the next element. */
 	rte_atomic32_t refcnt; /* Reference counter. */
 	struct mlx5_txq_ctrl *txq_ctrl; /* Pointer to the control queue. */
-	enum mlx5_rxq_obj_type type; /* The txq object type. */
+	enum mlx5_txq_obj_type type; /* The txq object type. */
 	RTE_STD_C11
 	union {
 		struct {
 			struct ibv_cq *cq; /* Completion Queue. */
 			struct ibv_qp *qp; /* Queue Pair. */
 		};
-		struct mlx5_devx_obj *sq; /* DevX object for Tx queue. */
+		struct {
+			struct mlx5_devx_obj *sq;
+			/* DevX object for Tx queue. */
+			struct mlx5_devx_obj *tis; /* The TIS object. */
+		};
 	};
 };
 
@@ -348,6 +352,7 @@ struct mlx5_txq_ctrl {
 	off_t uar_mmap_offset; /* UAR mmap offset for non-primary process. */
 	void *bf_reg; /* BlueFlame register from Verbs. */
 	uint16_t dump_file_n; /* Number of dump files. */
+	struct rte_eth_hairpin_conf hairpin_conf; /* Hairpin configuration. */
 	struct mlx5_txq_data txq; /* Data path structure. */
 	/* Must be the last field in the structure, contains elts[]. */
 };
@@ -410,15 +415,22 @@ struct mlx5_hrxq *mlx5_hrxq_get(struct rte_eth_dev *dev,
 
 int mlx5_tx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 			unsigned int socket, const struct rte_eth_txconf *conf);
+int mlx5_tx_hairpin_queue_setup
+	(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
+	 const struct rte_eth_hairpin_conf *hairpin_conf);
 void mlx5_tx_queue_release(void *dpdk_txq);
 int mlx5_tx_uar_init_secondary(struct rte_eth_dev *dev, int fd);
-struct mlx5_txq_obj *mlx5_txq_obj_new(struct rte_eth_dev *dev, uint16_t idx);
+struct mlx5_txq_obj *mlx5_txq_obj_new(struct rte_eth_dev *dev, uint16_t idx,
+				      enum mlx5_txq_obj_type type);
 struct mlx5_txq_obj *mlx5_txq_obj_get(struct rte_eth_dev *dev, uint16_t idx);
 int mlx5_txq_obj_release(struct mlx5_txq_obj *txq_ibv);
 int mlx5_txq_obj_verify(struct rte_eth_dev *dev);
 struct mlx5_txq_ctrl *mlx5_txq_new(struct rte_eth_dev *dev, uint16_t idx,
 				   uint16_t desc, unsigned int socket,
 				   const struct rte_eth_txconf *conf);
+struct mlx5_txq_ctrl *mlx5_txq_hairpin_new
+	(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
+	 const struct rte_eth_hairpin_conf *hairpin_conf);
 struct mlx5_txq_ctrl *mlx5_txq_get(struct rte_eth_dev *dev, uint16_t idx);
 int mlx5_txq_release(struct rte_eth_dev *dev, uint16_t idx);
 int mlx5_txq_releasable(struct rte_eth_dev *dev, uint16_t idx);
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index 50c4df5..3ec86c4 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -51,8 +51,14 @@
 
 		if (!txq_ctrl)
 			continue;
-		txq_alloc_elts(txq_ctrl);
-		txq_ctrl->obj = mlx5_txq_obj_new(dev, i);
+		if (txq_ctrl->type == MLX5_TXQ_TYPE_HAIRPIN) {
+			txq_ctrl->obj = mlx5_txq_obj_new
+				(dev, i, MLX5_TXQ_OBJ_TYPE_DEVX_HAIRPIN);
+		} else {
+			txq_alloc_elts(txq_ctrl);
+			txq_ctrl->obj = mlx5_txq_obj_new
+				(dev, i, MLX5_TXQ_OBJ_TYPE_IBV);
+		}
 		if (!txq_ctrl->obj) {
 			rte_errno = ENOMEM;
 			goto error;
diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
index a6e2563..dfc379c 100644
--- a/drivers/net/mlx5/mlx5_txq.c
+++ b/drivers/net/mlx5/mlx5_txq.c
@@ -136,30 +136,22 @@
 }
 
 /**
- * DPDK callback to configure a TX queue.
+ * Tx queue presetup checks.
  *
  * @param dev
  *   Pointer to Ethernet device structure.
  * @param idx
- *   TX queue index.
+ *   Tx queue index.
  * @param desc
  *   Number of descriptors to configure in queue.
- * @param socket
- *   NUMA socket on which memory must be allocated.
- * @param[in] conf
- *   Thresholds parameters.
  *
  * @return
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
-int
-mlx5_tx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
-		    unsigned int socket, const struct rte_eth_txconf *conf)
+static int
+mlx5_tx_queue_pre_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
-	struct mlx5_txq_data *txq = (*priv->txqs)[idx];
-	struct mlx5_txq_ctrl *txq_ctrl =
-		container_of(txq, struct mlx5_txq_ctrl, txq);
 
 	if (desc <= MLX5_TX_COMP_THRESH) {
 		DRV_LOG(WARNING,
@@ -191,6 +183,38 @@
 		return -rte_errno;
 	}
 	mlx5_txq_release(dev, idx);
+	return 0;
+}
+
+/**
+ * DPDK callback to configure a TX queue.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ * @param idx
+ *   TX queue index.
+ * @param desc
+ *   Number of descriptors to configure in queue.
+ * @param socket
+ *   NUMA socket on which memory must be allocated.
+ * @param[in] conf
+ *   Thresholds parameters.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_tx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
+		    unsigned int socket, const struct rte_eth_txconf *conf)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_txq_data *txq = (*priv->txqs)[idx];
+	struct mlx5_txq_ctrl *txq_ctrl =
+		container_of(txq, struct mlx5_txq_ctrl, txq);
+	int res;
+
+	res = mlx5_tx_queue_pre_setup(dev, idx, desc);
+	if (res)
+		return res;
 	txq_ctrl = mlx5_txq_new(dev, idx, desc, socket, conf);
 	if (!txq_ctrl) {
 		DRV_LOG(ERR, "port %u unable to allocate queue index %u",
@@ -204,6 +228,57 @@
 }
 
 /**
+ * DPDK callback to configure a TX hairpin queue.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ * @param idx
+ *   TX queue index.
+ * @param desc
+ *   Number of descriptors to configure in queue.
+ * @param[in] hairpin_conf
+ *   The hairpin binding configuration.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_tx_hairpin_queue_setup(struct rte_eth_dev *dev, uint16_t idx,
+			    uint16_t desc,
+			    const struct rte_eth_hairpin_conf *hairpin_conf)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_txq_data *txq = (*priv->txqs)[idx];
+	struct mlx5_txq_ctrl *txq_ctrl =
+		container_of(txq, struct mlx5_txq_ctrl, txq);
+	int res;
+
+	res = mlx5_tx_queue_pre_setup(dev, idx, desc);
+	if (res)
+		return res;
+	if (hairpin_conf->peer_count != 1 ||
+	    hairpin_conf->peers[0].port != dev->data->port_id ||
+	    hairpin_conf->peers[0].queue >= priv->rxqs_n) {
+		DRV_LOG(ERR, "port %u unable to setup hairpin queue index %u "
+			" invalid hairpind configuration", dev->data->port_id,
+			idx);
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+	txq_ctrl = mlx5_txq_hairpin_new(dev, idx, desc, hairpin_conf);
+	if (!txq_ctrl) {
+		DRV_LOG(ERR, "port %u unable to allocate queue index %u",
+			dev->data->port_id, idx);
+		return -rte_errno;
+	}
+	DRV_LOG(DEBUG, "port %u adding Tx queue %u to list",
+		dev->data->port_id, idx);
+	(*priv->txqs)[idx] = &txq_ctrl->txq;
+	txq_ctrl->type = MLX5_TXQ_TYPE_HAIRPIN;
+	return 0;
+}
+
+/**
  * DPDK callback to release a TX queue.
  *
  * @param dpdk_txq
@@ -246,6 +321,8 @@
 	const size_t page_size = sysconf(_SC_PAGESIZE);
 #endif
 
+	if (txq_ctrl->type != MLX5_TXQ_TYPE_STANDARD)
+		return;
 	assert(rte_eal_process_type() == RTE_PROC_PRIMARY);
 	assert(ppriv);
 	ppriv->uar_table[txq_ctrl->txq.idx] = txq_ctrl->bf_reg;
@@ -282,6 +359,8 @@
 	uintptr_t offset;
 	const size_t page_size = sysconf(_SC_PAGESIZE);
 
+	if (txq_ctrl->type != MLX5_TXQ_TYPE_STANDARD)
+		return 0;
 	assert(ppriv);
 	/*
 	 * As rdma-core, UARs are mapped in size of OS page
@@ -316,6 +395,8 @@
 	const size_t page_size = sysconf(_SC_PAGESIZE);
 	void *addr;
 
+	if (txq_ctrl->type != MLX5_TXQ_TYPE_STANDARD)
+		return;
 	addr = ppriv->uar_table[txq_ctrl->txq.idx];
 	munmap(RTE_PTR_ALIGN_FLOOR(addr, page_size), page_size);
 }
@@ -346,6 +427,8 @@
 			continue;
 		txq = (*priv->txqs)[i];
 		txq_ctrl = container_of(txq, struct mlx5_txq_ctrl, txq);
+		if (txq_ctrl->type != MLX5_TXQ_TYPE_STANDARD)
+			continue;
 		assert(txq->idx == (uint16_t)i);
 		ret = txq_uar_init_secondary(txq_ctrl, fd);
 		if (ret)
@@ -365,18 +448,87 @@
 }
 
 /**
+ * Create the Tx hairpin queue object.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param idx
+ *   Queue index in DPDK Tx queue array
+ *
+ * @return
+ *   The hairpin DevX object initialised, NULL otherwise and rte_errno is set.
+ */
+static struct mlx5_txq_obj *
+mlx5_txq_obj_hairpin_new(struct rte_eth_dev *dev, uint16_t idx)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_txq_data *txq_data = (*priv->txqs)[idx];
+	struct mlx5_txq_ctrl *txq_ctrl =
+		container_of(txq_data, struct mlx5_txq_ctrl, txq);
+	struct mlx5_devx_create_sq_attr attr = { 0 };
+	struct mlx5_txq_obj *tmpl = NULL;
+	int ret = 0;
+
+	assert(txq_data);
+	assert(!txq_ctrl->obj);
+	tmpl = rte_calloc_socket(__func__, 1, sizeof(*tmpl), 0,
+				 txq_ctrl->socket);
+	if (!tmpl) {
+		DRV_LOG(ERR,
+			"port %u Tx queue %u cannot allocate memory resources",
+			dev->data->port_id, txq_data->idx);
+		rte_errno = ENOMEM;
+		goto error;
+	}
+	tmpl->type = MLX5_TXQ_OBJ_TYPE_DEVX_HAIRPIN;
+	tmpl->txq_ctrl = txq_ctrl;
+	attr.hairpin = 1;
+	attr.tis_lst_sz = 1;
+	/* Workaround for hairpin startup */
+	attr.wq_attr.log_hairpin_num_packets = log2above(32);
+	/* Workaround for packets larger than 1KB */
+	attr.wq_attr.log_hairpin_data_sz =
+			priv->config.hca_attr.log_max_hairpin_wq_data_sz;
+	attr.tis_num = priv->sh->tis->id;
+	tmpl->sq = mlx5_devx_cmd_create_sq(priv->sh->ctx, &attr);
+	if (!tmpl->sq) {
+		DRV_LOG(ERR,
+			"port %u tx hairpin queue %u can't create sq object",
+			dev->data->port_id, idx);
+		rte_errno = errno;
+		goto error;
+	}
+	DRV_LOG(DEBUG, "port %u sxq %u updated with %p", dev->data->port_id,
+		idx, (void *)&tmpl);
+	rte_atomic32_inc(&tmpl->refcnt);
+	LIST_INSERT_HEAD(&priv->txqsobj, tmpl, next);
+	return tmpl;
+error:
+	ret = rte_errno; /* Save rte_errno before cleanup. */
+	if (tmpl && tmpl->tis)
+		mlx5_devx_cmd_destroy(tmpl->tis);
+	if (tmpl && tmpl->sq)
+		mlx5_devx_cmd_destroy(tmpl->sq);
+	rte_errno = ret; /* Restore rte_errno. */
+	return NULL;
+}
+
+/**
  * Create the Tx queue Verbs object.
  *
  * @param dev
  *   Pointer to Ethernet device.
  * @param idx
  *   Queue index in DPDK Tx queue array.
+ * @param type
+ *   Type of the Tx queue object to create.
  *
  * @return
  *   The Verbs object initialised, NULL otherwise and rte_errno is set.
  */
 struct mlx5_txq_obj *
-mlx5_txq_obj_new(struct rte_eth_dev *dev, uint16_t idx)
+mlx5_txq_obj_new(struct rte_eth_dev *dev, uint16_t idx,
+		 enum mlx5_txq_obj_type type)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_txq_data *txq_data = (*priv->txqs)[idx];
@@ -396,6 +548,8 @@ struct mlx5_txq_obj *
 	const int desc = 1 << txq_data->elts_n;
 	int ret = 0;
 
+	if (type == MLX5_TXQ_OBJ_TYPE_DEVX_HAIRPIN)
+		return mlx5_txq_obj_hairpin_new(dev, idx);
 #ifdef HAVE_IBV_FLOW_DV_SUPPORT
 	/* If using DevX, need additional mask to read tisn value. */
 	if (priv->config.devx && !priv->sh->tdn)
@@ -643,8 +797,13 @@ struct mlx5_txq_obj *
 {
 	assert(txq_obj);
 	if (rte_atomic32_dec_and_test(&txq_obj->refcnt)) {
-		claim_zero(mlx5_glue->destroy_qp(txq_obj->qp));
-		claim_zero(mlx5_glue->destroy_cq(txq_obj->cq));
+		if (txq_obj->type == MLX5_TXQ_OBJ_TYPE_DEVX_HAIRPIN) {
+			if (txq_obj->tis)
+				claim_zero(mlx5_devx_cmd_destroy(txq_obj->tis));
+		} else {
+			claim_zero(mlx5_glue->destroy_qp(txq_obj->qp));
+			claim_zero(mlx5_glue->destroy_cq(txq_obj->cq));
+		}
 		LIST_REMOVE(txq_obj, next);
 		rte_free(txq_obj);
 		return 0;
@@ -1100,6 +1259,7 @@ struct mlx5_txq_ctrl *
 		goto error;
 	}
 	rte_atomic32_inc(&tmpl->refcnt);
+	tmpl->type = MLX5_TXQ_TYPE_STANDARD;
 	LIST_INSERT_HEAD(&priv->txqsctrl, tmpl, next);
 	return tmpl;
 error:
@@ -1108,6 +1268,46 @@ struct mlx5_txq_ctrl *
 }
 
 /**
+ * Create a DPDK Tx hairpin queue.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param idx
+ *   TX queue index.
+ * @param desc
+ *   Number of descriptors to configure in queue.
+ * @param hairpin_conf
+ *  The hairpin configuration.
+ *
+ * @return
+ *   A DPDK queue object on success, NULL otherwise and rte_errno is set.
+ */
+struct mlx5_txq_ctrl *
+mlx5_txq_hairpin_new(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
+		     const struct rte_eth_hairpin_conf *hairpin_conf)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_txq_ctrl *tmpl;
+
+	tmpl = rte_calloc_socket("TXQ", 1,
+				 sizeof(*tmpl), 0, SOCKET_ID_ANY);
+	if (!tmpl) {
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+	tmpl->priv = priv;
+	tmpl->socket = SOCKET_ID_ANY;
+	tmpl->txq.elts_n = log2above(desc);
+	tmpl->txq.port_id = dev->data->port_id;
+	tmpl->txq.idx = idx;
+	tmpl->hairpin_conf = *hairpin_conf;
+	tmpl->type = MLX5_TXQ_TYPE_HAIRPIN;
+	rte_atomic32_inc(&tmpl->refcnt);
+	LIST_INSERT_HEAD(&priv->txqsctrl, tmpl, next);
+	return tmpl;
+}
+
+/**
  * Get a Tx queue.
  *
  * @param dev
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 186+ messages in thread

* [dpdk-dev] [PATCH v7 07/14] net/mlx5: add get hairpin capabilities
  2019-10-30 23:53 ` [dpdk-dev] [PATCH v7 00/14] add hairpin feature Ori Kam
                     ` (5 preceding siblings ...)
  2019-10-30 23:53   ` [dpdk-dev] [PATCH v7 06/14] net/mlx5: support Tx hairpin queues Ori Kam
@ 2019-10-30 23:53   ` Ori Kam
  2019-10-30 23:53   ` [dpdk-dev] [PATCH v7 08/14] app/testpmd: add hairpin support Ori Kam
                     ` (7 subsequent siblings)
  14 siblings, 0 replies; 186+ messages in thread
From: Ori Kam @ 2019-10-30 23:53 UTC (permalink / raw)
  To: John McNamara, Marko Kovacevic, Matan Azrad, Shahaf Shuler,
	Viacheslav Ovsiienko
  Cc: dev, orika, jingjing.wu, stephen

This commit adds the hairpin capabilities get function.
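
An application can probe these limits before allocating hairpin queues; a
minimal sketch using the matching ethdev wrapper
rte_eth_dev_hairpin_capability_get() from this series:

	#include <stdio.h>
	#include <rte_ethdev.h>

	static void
	print_hairpin_caps(uint16_t port_id)
	{
		struct rte_eth_hairpin_cap cap;

		/* For mlx5 this fails with ENOTSUP when DevX is unavailable. */
		if (rte_eth_dev_hairpin_capability_get(port_id, &cap) != 0) {
			printf("port %u: no hairpin support\n", port_id);
			return;
		}
		printf("port %u: max %u hairpin queues, %u descriptors\n",
		       port_id, cap.max_nb_queues, cap.max_nb_desc);
	}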

Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 doc/guides/rel_notes/release_19_11.rst |  1 +
 drivers/net/mlx5/mlx5.c                |  2 ++
 drivers/net/mlx5/mlx5.h                |  3 ++-
 drivers/net/mlx5/mlx5_ethdev.c         | 27 +++++++++++++++++++++++++++
 4 files changed, 32 insertions(+), 1 deletion(-)

diff --git a/doc/guides/rel_notes/release_19_11.rst b/doc/guides/rel_notes/release_19_11.rst
index 6871453..f6e90cb 100644
--- a/doc/guides/rel_notes/release_19_11.rst
+++ b/doc/guides/rel_notes/release_19_11.rst
@@ -156,6 +156,7 @@ New Features
   * Added support for VLAN set PCP offload command.
   * Added support for VLAN set VID offload command.
  * Added support for matching on packets with the Geneve tunnel header.
+  * Added hairpin support.
 
 * **Updated the AF_XDP PMD.**
 
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 8e7ff1d..e72e9eb 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -1039,6 +1039,7 @@ struct mlx5_dev_spawn_data {
 	.udp_tunnel_port_add  = mlx5_udp_tunnel_port_add,
 	.get_module_info = mlx5_get_module_info,
 	.get_module_eeprom = mlx5_get_module_eeprom,
+	.hairpin_cap_get = mlx5_hairpin_cap_get,
 };
 
 /* Available operations from secondary process. */
@@ -1101,6 +1102,7 @@ struct mlx5_dev_spawn_data {
 	.is_removed = mlx5_is_removed,
 	.get_module_info = mlx5_get_module_info,
 	.get_module_eeprom = mlx5_get_module_eeprom,
+	.hairpin_cap_get = mlx5_hairpin_cap_get,
 };
 
 /**
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 566bf2d..742bedd 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -789,7 +789,8 @@ int mlx5_get_module_info(struct rte_eth_dev *dev,
 			 struct rte_eth_dev_module_info *modinfo);
 int mlx5_get_module_eeprom(struct rte_eth_dev *dev,
 			   struct rte_dev_eeprom_info *info);
-
+int mlx5_hairpin_cap_get(struct rte_eth_dev *dev,
+			 struct rte_eth_hairpin_cap *cap);
 /* mlx5_mac.c */
 
 int mlx5_get_mac(struct rte_eth_dev *dev, uint8_t (*mac)[RTE_ETHER_ADDR_LEN]);
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index 2278b24..fe1b4d4 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -2114,3 +2114,30 @@ int mlx5_get_module_eeprom(struct rte_eth_dev *dev,
 	rte_free(eeprom);
 	return ret;
 }
+
+/**
+ * DPDK callback to retrieve hairpin capabilities.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ * @param[out] cap
+ *   Storage for hairpin capability data.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int mlx5_hairpin_cap_get(struct rte_eth_dev *dev,
+			 struct rte_eth_hairpin_cap *cap)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+
+	if (priv->sh->devx == 0) {
+		rte_errno = ENOTSUP;
+		return -rte_errno;
+	}
+	cap->max_nb_queues = UINT16_MAX;
+	cap->max_rx_2_tx = 1;
+	cap->max_tx_2_rx = 1;
+	cap->max_nb_desc = 8192;
+	return 0;
+}
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 186+ messages in thread

* [dpdk-dev] [PATCH v7 08/14] app/testpmd: add hairpin support
  2019-10-30 23:53 ` [dpdk-dev] [PATCH v7 00/14] add hairpin feature Ori Kam
                     ` (6 preceding siblings ...)
  2019-10-30 23:53   ` [dpdk-dev] [PATCH v7 07/14] net/mlx5: add get hairpin capabilities Ori Kam
@ 2019-10-30 23:53   ` Ori Kam
  2019-10-31 17:11     ` Ferruh Yigit
  2019-10-30 23:53   ` [dpdk-dev] [PATCH v7 09/14] net/mlx5: add hairpin binding function Ori Kam
                     ` (6 subsequent siblings)
  14 siblings, 1 reply; 186+ messages in thread
From: Ori Kam @ 2019-10-30 23:53 UTC (permalink / raw)
  To: Wenzhuo Lu, Jingjing Wu, Bernard Iremonger; +Cc: dev, orika, stephen

This commit introduces hairpin queues to testpmd.
The hairpin queues are configured using --hairpinq=<n>.
This option adds n queue objects to both the total number
of Tx queues and Rx queues.
The connection between the queues is 1 to 1: the first Rx hairpin queue
is connected to the first Tx hairpin queue.

Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 app/test-pmd/parameters.c |  28 ++++++++++++
 app/test-pmd/testpmd.c    | 109 +++++++++++++++++++++++++++++++++++++++++++++-
 app/test-pmd/testpmd.h    |   3 ++
 3 files changed, 138 insertions(+), 2 deletions(-)

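A minimal sketch of the 1:1 pairing the commit message describes (the
helper name and parameters are illustrative; the two hairpin setup calls
are the ethdev functions added earlier in this series):

static int
setup_hairpin_pairs(uint16_t port_id, uint16_t nb_rxq, uint16_t nb_txq,
		    uint16_t nb_hairpinq, uint16_t nb_desc)
{
	struct rte_eth_hairpin_conf conf = { .peer_count = 1 };
	uint16_t i;
	int ret;

	for (i = 0; i < nb_hairpinq; i++) {
		conf.peers[0].port = port_id;
		conf.peers[0].queue = nb_rxq + i; /* peer Rx hairpin queue */
		ret = rte_eth_tx_hairpin_queue_setup(port_id, nb_txq + i,
						     nb_desc, &conf);
		if (ret != 0)
			return ret;
		conf.peers[0].queue = nb_txq + i; /* peer Tx hairpin queue */
		ret = rte_eth_rx_hairpin_queue_setup(port_id, nb_rxq + i,
						     nb_desc, &conf);
		if (ret != 0)
			return ret;
	}
	return 0;
}
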
diff --git a/app/test-pmd/parameters.c b/app/test-pmd/parameters.c
index 9ea87c1..9b6e35b 100644
--- a/app/test-pmd/parameters.c
+++ b/app/test-pmd/parameters.c
@@ -149,6 +149,8 @@
 	printf("  --rxd=N: set the number of descriptors in RX rings to N.\n");
 	printf("  --txq=N: set the number of TX queues per port to N.\n");
 	printf("  --txd=N: set the number of descriptors in TX rings to N.\n");
+	printf("  --hairpinq=N: set the number of hairpin queues per port to "
+	       "N.\n");
 	printf("  --burst=N: set the number of packets per burst to N.\n");
 	printf("  --mbcache=N: set the cache of mbuf memory pool to N.\n");
 	printf("  --rxpt=N: set prefetch threshold register of RX rings to N.\n");
@@ -622,6 +624,7 @@
 		{ "txq",			1, 0, 0 },
 		{ "rxd",			1, 0, 0 },
 		{ "txd",			1, 0, 0 },
+		{ "hairpinq",			1, 0, 0 },
 		{ "burst",			1, 0, 0 },
 		{ "mbcache",			1, 0, 0 },
 		{ "txpt",			1, 0, 0 },
@@ -1045,6 +1048,31 @@
 						  " >= 0 && <= %u\n", n,
 						  get_allowed_max_nb_txq(&pid));
 			}
+			if (!strcmp(lgopts[opt_idx].name, "hairpinq")) {
+				n = atoi(optarg);
+				if (n >= 0 &&
+				    check_nb_hairpinq((queueid_t)n) == 0)
+					nb_hairpinq = (queueid_t) n;
+				else
+					rte_exit(EXIT_FAILURE, "hairpinq %d invalid - must be"
+						  " >= 0 && <= %u\n", n,
+						  get_allowed_max_nb_hairpinq
+						  (&pid));
+				if ((n + nb_txq) < 0 ||
+				    check_nb_txq((queueid_t)(n + nb_txq)) != 0)
+					rte_exit(EXIT_FAILURE, "txq + hairpinq "
+						 "%d invalid - must be"
+						  " >= 0 && <= %u\n",
+						  n + nb_txq,
+						  get_allowed_max_nb_txq(&pid));
+				if ((n + nb_rxq) < 0 ||
+				    check_nb_rxq((queueid_t)(n + nb_rxq)) != 0)
+					rte_exit(EXIT_FAILURE, "rxq + hairpinq "
+						 "%d invalid - must be"
+						  " >= 0 && <= %u\n",
+						  n + nb_rxq,
+						  get_allowed_max_nb_rxq(&pid));
+			}
 			if (!nb_rxq && !nb_txq) {
 				rte_exit(EXIT_FAILURE, "Either rx or tx queues should "
 						"be non-zero\n");
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 38acbc5..0fc5b45 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -234,6 +234,7 @@ struct fwd_engine * fwd_engines[] = {
 /*
  * Configurable number of RX/TX queues.
  */
+queueid_t nb_hairpinq; /**< Number of hairpin queues per port. */
 queueid_t nb_rxq = 1; /**< Number of RX queues per port. */
 queueid_t nb_txq = 1; /**< Number of TX queues per port. */
 
@@ -1067,6 +1068,53 @@ struct extmem_param {
 	return 0;
 }
 
+/*
+ * Get the allowed maximum number of hairpin queues.
+ * *pid returns the port ID that has the minimal value of
+ * max_hairpin_queues among all ports.
+ */
+queueid_t
+get_allowed_max_nb_hairpinq(portid_t *pid)
+{
+	queueid_t allowed_max_hairpinq = MAX_QUEUE_ID;
+	portid_t pi;
+	struct rte_eth_hairpin_cap cap;
+
+	RTE_ETH_FOREACH_DEV(pi) {
+		if (rte_eth_dev_hairpin_capability_get(pi, &cap) != 0) {
+			*pid = pi;
+			return 0;
+		}
+		if (cap.max_nb_queues < allowed_max_hairpinq) {
+			allowed_max_hairpinq = cap.max_nb_queues;
+			*pid = pi;
+		}
+	}
+	return allowed_max_hairpinq;
+}
+
+/*
+ * Check whether the requested number of hairpin queues is valid.
+ * It is valid if it does not exceed the maximum number of
+ * hairpin queues supported by any port.
+ * Return 0 if valid, -1 otherwise.
+ */
+int
+check_nb_hairpinq(queueid_t hairpinq)
+{
+	queueid_t allowed_max_hairpinq;
+	portid_t pid = 0;
+
+	allowed_max_hairpinq = get_allowed_max_nb_hairpinq(&pid);
+	if (hairpinq > allowed_max_hairpinq) {
+		printf("Fail: input hairpin (%u) can't be greater "
+		       "than max_hairpin_queues (%u) of port %u\n",
+		       hairpinq, allowed_max_hairpinq, pid);
+		return -1;
+	}
+	return 0;
+}
+
 static void
 init_config(void)
 {
@@ -2028,6 +2076,11 @@ struct extmem_param {
 	queueid_t qi;
 	struct rte_port *port;
 	struct rte_ether_addr mac_addr;
+	struct rte_eth_hairpin_conf hairpin_conf = {
+		.peer_count = 1,
+	};
+	int i;
+	struct rte_eth_hairpin_cap cap;
 
 	if (port_id_is_invalid(pid, ENABLED_WARN))
 		return 0;
@@ -2060,9 +2113,16 @@ struct extmem_param {
 			configure_rxtx_dump_callbacks(0);
 			printf("Configuring Port %d (socket %u)\n", pi,
 					port->socket_id);
+			if (nb_hairpinq > 0 &&
+			    rte_eth_dev_hairpin_capability_get(pi, &cap)) {
+				printf("Port %d doesn't support hairpin "
+				       "queues\n", pi);
+				return -1;
+			}
 			/* configure port */
-			diag = rte_eth_dev_configure(pi, nb_rxq, nb_txq,
-						&(port->dev_conf));
+			diag = rte_eth_dev_configure(pi, nb_rxq + nb_hairpinq,
+						     nb_txq + nb_hairpinq,
+						     &(port->dev_conf));
 			if (diag != 0) {
 				if (rte_atomic16_cmpset(&(port->port_status),
 				RTE_PORT_HANDLING, RTE_PORT_STOPPED) == 0)
@@ -2155,6 +2215,51 @@ struct extmem_param {
 				port->need_reconfig_queues = 1;
 				return -1;
 			}
+			/* setup hairpin queues */
+			i = 0;
+			for (qi = nb_txq; qi < nb_hairpinq + nb_txq; qi++) {
+				hairpin_conf.peers[0].port = pi;
+				hairpin_conf.peers[0].queue = i + nb_rxq;
+				diag = rte_eth_tx_hairpin_queue_setup
+					(pi, qi, nb_txd, &hairpin_conf);
+				i++;
+				if (diag == 0)
+					continue;
+
+				/* Failed to set up Tx hairpin queue, return. */
+				if (rte_atomic16_cmpset(&(port->port_status),
+							RTE_PORT_HANDLING,
+							RTE_PORT_STOPPED) == 0)
+					printf("Port %d can not be set back "
+							"to stopped\n", pi);
+				printf("Fail to configure port %d hairpin "
+				       "queues\n", pi);
+				/* try to reconfigure queues next time */
+				port->need_reconfig_queues = 1;
+				return -1;
+			}
+			i = 0;
+			for (qi = nb_rxq; qi < nb_hairpinq + nb_rxq; qi++) {
+				hairpin_conf.peers[0].port = pi;
+				hairpin_conf.peers[0].queue = i + nb_txq;
+				diag = rte_eth_rx_hairpin_queue_setup
+					(pi, qi, nb_rxd, &hairpin_conf);
+				i++;
+				if (diag == 0)
+					continue;
+
+				/* Failed to set up Rx hairpin queue, return. */
+				if (rte_atomic16_cmpset(&(port->port_status),
+							RTE_PORT_HANDLING,
+							RTE_PORT_STOPPED) == 0)
+					printf("Port %d can not be set back "
+							"to stopped\n", pi);
+				printf("Fail to configure port %d hairpin "
+				       "queues\n", pi);
+				/* try to reconfigure queues next time */
+				port->need_reconfig_queues = 1;
+				return -1;
+			}
 		}
 		configure_rxtx_dump_callbacks(verbose_level);
 		/* start port */
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index ec10a1a..8da1e8e 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -385,6 +385,7 @@ struct queue_stats_mappings {
 
 extern uint64_t rss_hf;
 
+extern queueid_t nb_hairpinq;
 extern queueid_t nb_rxq;
 extern queueid_t nb_txq;
 
@@ -859,6 +860,8 @@ enum print_warning {
 int check_nb_rxq(queueid_t rxq);
 queueid_t get_allowed_max_nb_txq(portid_t *pid);
 int check_nb_txq(queueid_t txq);
+queueid_t get_allowed_max_nb_hairpinq(portid_t *pid);
+int check_nb_hairpinq(queueid_t hairpinq);
 
 uint16_t dump_rx_pkts(uint16_t port_id, uint16_t queue, struct rte_mbuf *pkts[],
 		      uint16_t nb_pkts, __rte_unused uint16_t max_pkts,
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 186+ messages in thread

* [dpdk-dev] [PATCH v7 09/14] net/mlx5: add hairpin binding function
  2019-10-30 23:53 ` [dpdk-dev] [PATCH v7 00/14] add hairpin feature Ori Kam
                     ` (7 preceding siblings ...)
  2019-10-30 23:53   ` [dpdk-dev] [PATCH v7 08/14] app/testpmd: add hairpin support Ori Kam
@ 2019-10-30 23:53   ` Ori Kam
  2019-10-30 23:53   ` [dpdk-dev] [PATCH v7 10/14] net/mlx5: add support for hairpin hrxq Ori Kam
                     ` (5 subsequent siblings)
  14 siblings, 0 replies; 186+ messages in thread
From: Ori Kam @ 2019-10-30 23:53 UTC (permalink / raw)
  To: Matan Azrad, Shahaf Shuler, Viacheslav Ovsiienko
  Cc: dev, orika, jingjing.wu, stephen

When starting the port, in addition to creating the queues,
we need to bind the hairpin queues to their peers.

Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/net/mlx5/mlx5.h           |  1 +
 drivers/net/mlx5/mlx5_devx_cmds.c |  1 +
 drivers/net/mlx5/mlx5_prm.h       |  6 +++
 drivers/net/mlx5/mlx5_trigger.c   | 97 +++++++++++++++++++++++++++++++++++++++
 4 files changed, 105 insertions(+)

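A condensed sketch of the per-pair bind step implemented below (assuming
sq and rq are the DevX objects of an already-validated Tx/Rx hairpin
pair): the SQ is moved from RST to RDY pointing at its peer RQ, then the
RQ likewise pointing back at the SQ.

	struct mlx5_devx_modify_sq_attr sq_attr = { 0 };
	struct mlx5_devx_modify_rq_attr rq_attr = { 0 };

	sq_attr.sq_state = MLX5_SQC_STATE_RST;
	sq_attr.state = MLX5_SQC_STATE_RDY;
	sq_attr.hairpin_peer_rq = rq->id;
	sq_attr.hairpin_peer_vhca = priv->config.hca_attr.vhca_id;
	if (mlx5_devx_cmd_modify_sq(sq, &sq_attr))
		return -rte_errno;
	rq_attr.rq_state = MLX5_SQC_STATE_RST;
	rq_attr.state = MLX5_SQC_STATE_RDY;
	rq_attr.hairpin_peer_sq = sq->id;
	rq_attr.hairpin_peer_vhca = priv->config.hca_attr.vhca_id;
	if (mlx5_devx_cmd_modify_rq(rq, &rq_attr))
		return -rte_errno;
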
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 742bedd..33cfc5b 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -191,6 +191,7 @@ struct mlx5_hca_attr {
 	uint32_t log_max_hairpin_queues:5;
 	uint32_t log_max_hairpin_wq_data_sz:5;
 	uint32_t log_max_hairpin_num_packets:5;
+	uint32_t vhca_id:16;
 };
 
 /* Flow list . */
diff --git a/drivers/net/mlx5/mlx5_devx_cmds.c b/drivers/net/mlx5/mlx5_devx_cmds.c
index a501f1f..3471a9b 100644
--- a/drivers/net/mlx5/mlx5_devx_cmds.c
+++ b/drivers/net/mlx5/mlx5_devx_cmds.c
@@ -334,6 +334,7 @@ struct mlx5_devx_obj *
 						    log_max_hairpin_wq_data_sz);
 	attr->log_max_hairpin_num_packets = MLX5_GET
 		(cmd_hca_cap, hcattr, log_min_hairpin_wq_data_sz);
+	attr->vhca_id = MLX5_GET(cmd_hca_cap, hcattr, vhca_id);
 	attr->eth_net_offloads = MLX5_GET(cmd_hca_cap, hcattr,
 					  eth_net_offloads);
 	attr->eth_virt = MLX5_GET(cmd_hca_cap, hcattr, eth_virt);
diff --git a/drivers/net/mlx5/mlx5_prm.h b/drivers/net/mlx5/mlx5_prm.h
index c687cfb..e4b19f8 100644
--- a/drivers/net/mlx5/mlx5_prm.h
+++ b/drivers/net/mlx5/mlx5_prm.h
@@ -1628,6 +1628,12 @@ struct mlx5_ifc_create_rqt_in_bits {
 #pragma GCC diagnostic error "-Wpedantic"
 #endif
 
+enum {
+	MLX5_SQC_STATE_RST  = 0x0,
+	MLX5_SQC_STATE_RDY  = 0x1,
+	MLX5_SQC_STATE_ERR  = 0x3,
+};
+
 struct mlx5_ifc_sqc_bits {
 	u8 rlky[0x1];
 	u8 cd_master[0x1];
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index 3ec86c4..a4fcdb3 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -162,6 +162,96 @@
 }
 
 /**
+ * Binds Tx queues to Rx queues for hairpin.
+ *
+ * Binds Tx queues to the target Rx queues.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx5_hairpin_bind(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_devx_modify_sq_attr sq_attr = { 0 };
+	struct mlx5_devx_modify_rq_attr rq_attr = { 0 };
+	struct mlx5_txq_ctrl *txq_ctrl;
+	struct mlx5_rxq_ctrl *rxq_ctrl;
+	struct mlx5_devx_obj *sq;
+	struct mlx5_devx_obj *rq;
+	unsigned int i;
+	int ret = 0;
+
+	for (i = 0; i != priv->txqs_n; ++i) {
+		txq_ctrl = mlx5_txq_get(dev, i);
+		if (!txq_ctrl)
+			continue;
+		if (txq_ctrl->type != MLX5_TXQ_TYPE_HAIRPIN) {
+			mlx5_txq_release(dev, i);
+			continue;
+		}
+		if (!txq_ctrl->obj) {
+			rte_errno = ENOMEM;
+			DRV_LOG(ERR, "port %u no txq object found: %d",
+				dev->data->port_id, i);
+			mlx5_txq_release(dev, i);
+			return -rte_errno;
+		}
+		sq = txq_ctrl->obj->sq;
+		rxq_ctrl = mlx5_rxq_get(dev,
+					txq_ctrl->hairpin_conf.peers[0].queue);
+		if (!rxq_ctrl) {
+			mlx5_txq_release(dev, i);
+			rte_errno = EINVAL;
+			DRV_LOG(ERR, "port %u no rxq object found: %d",
+				dev->data->port_id,
+				txq_ctrl->hairpin_conf.peers[0].queue);
+			return -rte_errno;
+		}
+		if (rxq_ctrl->type != MLX5_RXQ_TYPE_HAIRPIN ||
+		    rxq_ctrl->hairpin_conf.peers[0].queue != i) {
+			rte_errno = ENOMEM;
+			DRV_LOG(ERR, "port %u Tx queue %d can't be bound to "
+				"Rx queue %d", dev->data->port_id,
+				i, txq_ctrl->hairpin_conf.peers[0].queue);
+			goto error;
+		}
+		rq = rxq_ctrl->obj->rq;
+		if (!rq) {
+			rte_errno = ENOMEM;
+			DRV_LOG(ERR, "port %u hairpin no matching rxq: %d",
+				dev->data->port_id,
+				txq_ctrl->hairpin_conf.peers[0].queue);
+			goto error;
+		}
+		sq_attr.state = MLX5_SQC_STATE_RDY;
+		sq_attr.sq_state = MLX5_SQC_STATE_RST;
+		sq_attr.hairpin_peer_rq = rq->id;
+		sq_attr.hairpin_peer_vhca = priv->config.hca_attr.vhca_id;
+		ret = mlx5_devx_cmd_modify_sq(sq, &sq_attr);
+		if (ret)
+			goto error;
+		rq_attr.state = MLX5_SQC_STATE_RDY;
+		rq_attr.rq_state = MLX5_SQC_STATE_RST;
+		rq_attr.hairpin_peer_sq = sq->id;
+		rq_attr.hairpin_peer_vhca = priv->config.hca_attr.vhca_id;
+		ret = mlx5_devx_cmd_modify_rq(rq, &rq_attr);
+		if (ret)
+			goto error;
+		mlx5_txq_release(dev, i);
+		mlx5_rxq_release(dev, txq_ctrl->hairpin_conf.peers[0].queue);
+	}
+	return 0;
+error:
+	mlx5_txq_release(dev, i);
+	mlx5_rxq_release(dev, txq_ctrl->hairpin_conf.peers[0].queue);
+	return -rte_errno;
+}
+
+/**
  * DPDK callback to start the device.
  *
  * Simulate device start by attaching all configured flows.
@@ -192,6 +282,13 @@
 		mlx5_txq_stop(dev);
 		return -rte_errno;
 	}
+	ret = mlx5_hairpin_bind(dev);
+	if (ret) {
+		DRV_LOG(ERR, "port %u hairpin binding failed: %s",
+			dev->data->port_id, strerror(rte_errno));
+		mlx5_txq_stop(dev);
+		return -rte_errno;
+	}
 	dev->data->dev_started = 1;
 	ret = mlx5_rx_intr_vec_enable(dev);
 	if (ret) {
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 186+ messages in thread

* [dpdk-dev] [PATCH v7 10/14] net/mlx5: add support for hairpin hrxq
  2019-10-30 23:53 ` [dpdk-dev] [PATCH v7 00/14] add hairpin feature Ori Kam
                     ` (8 preceding siblings ...)
  2019-10-30 23:53   ` [dpdk-dev] [PATCH v7 09/14] net/mlx5: add hairpin binding function Ori Kam
@ 2019-10-30 23:53   ` Ori Kam
  2019-10-30 23:53   ` [dpdk-dev] [PATCH v7 11/14] net/mlx5: add internal tag item and action Ori Kam
                     ` (4 subsequent siblings)
  14 siblings, 0 replies; 186+ messages in thread
From: Ori Kam @ 2019-10-30 23:53 UTC (permalink / raw)
  To: Matan Azrad, Shahaf Shuler, Viacheslav Ovsiienko
  Cc: dev, orika, jingjing.wu, stephen

Add support for RSS on hairpin queues.

Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/net/mlx5/mlx5.h         |   3 ++
 drivers/net/mlx5/mlx5_ethdev.c  | 102 ++++++++++++++++++++++++++++++----------
 drivers/net/mlx5/mlx5_rss.c     |   1 +
 drivers/net/mlx5/mlx5_rxq.c     |  22 ++++++---
 drivers/net/mlx5/mlx5_trigger.c |   6 +++
 5 files changed, 104 insertions(+), 30 deletions(-)

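The key change below is that hairpin queues are filtered out of the
default RETA. A sketch of the fill logic, assuming rss_queue_arr holds
the indices of the rss_queue_n standard (non-hairpin) Rx queues and reta
is the indirection table (these names are illustrative; log2above() is
the mlx5 helper used in the diff):

	unsigned int reta_n =
		1 << log2above((rss_queue_n & (rss_queue_n - 1)) ?
			       ind_table_max_size : rss_queue_n);
	unsigned int i, j;

	for (i = 0, j = 0; i != reta_n; ++i) {
		reta[i] = rss_queue_arr[j]; /* wrap over standard queues */
		if (++j == rss_queue_n)
			j = 0;
	}
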
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 33cfc5b..a36ba2d 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -716,6 +716,7 @@ struct mlx5_priv {
 	rte_spinlock_t uar_lock[MLX5_UAR_PAGE_NUM_MAX];
 	/* UAR same-page access control required in 32bit implementations. */
 #endif
+	uint8_t skip_default_rss_reta; /* Skip configuration of default reta. */
 };
 
 #define PORT_ID(priv) ((priv)->dev_data->port_id)
@@ -792,6 +793,8 @@ int mlx5_get_module_eeprom(struct rte_eth_dev *dev,
 			   struct rte_dev_eeprom_info *info);
 int mlx5_hairpin_cap_get(struct rte_eth_dev *dev,
 			 struct rte_eth_hairpin_cap *cap);
+int mlx5_dev_configure_rss_reta(struct rte_eth_dev *dev);
+
 /* mlx5_mac.c */
 
 int mlx5_get_mac(struct rte_eth_dev *dev, uint8_t (*mac)[RTE_ETHER_ADDR_LEN]);
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index fe1b4d4..c2bed2f 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -383,9 +383,6 @@ struct ethtool_link_settings {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	unsigned int rxqs_n = dev->data->nb_rx_queues;
 	unsigned int txqs_n = dev->data->nb_tx_queues;
-	unsigned int i;
-	unsigned int j;
-	unsigned int reta_idx_n;
 	const uint8_t use_app_rss_key =
 		!!dev->data->dev_conf.rx_adv_conf.rss_conf.rss_key;
 	int ret = 0;
@@ -431,28 +428,8 @@ struct ethtool_link_settings {
 		DRV_LOG(INFO, "port %u Rx queues number update: %u -> %u",
 			dev->data->port_id, priv->rxqs_n, rxqs_n);
 		priv->rxqs_n = rxqs_n;
-		/*
-		 * If the requested number of RX queues is not a power of two,
-		 * use the maximum indirection table size for better balancing.
-		 * The result is always rounded to the next power of two.
-		 */
-		reta_idx_n = (1 << log2above((rxqs_n & (rxqs_n - 1)) ?
-					     priv->config.ind_table_max_size :
-					     rxqs_n));
-		ret = mlx5_rss_reta_index_resize(dev, reta_idx_n);
-		if (ret)
-			return ret;
-		/*
-		 * When the number of RX queues is not a power of two,
-		 * the remaining table entries are padded with reused WQs
-		 * and hashes are not spread uniformly.
-		 */
-		for (i = 0, j = 0; (i != reta_idx_n); ++i) {
-			(*priv->reta_idx)[i] = j;
-			if (++j == rxqs_n)
-				j = 0;
-		}
 	}
+	priv->skip_default_rss_reta = 0;
 	ret = mlx5_proc_priv_init(dev);
 	if (ret)
 		return ret;
@@ -460,6 +437,83 @@ struct ethtool_link_settings {
 }
 
 /**
+ * Configure default RSS reta.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_dev_configure_rss_reta(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	unsigned int rxqs_n = dev->data->nb_rx_queues;
+	unsigned int i;
+	unsigned int j;
+	unsigned int reta_idx_n;
+	int ret = 0;
+	unsigned int *rss_queue_arr = NULL;
+	unsigned int rss_queue_n = 0;
+
+	if (priv->skip_default_rss_reta)
+		return ret;
+	rss_queue_arr = rte_malloc("", rxqs_n * sizeof(unsigned int), 0);
+	if (!rss_queue_arr) {
+		DRV_LOG(ERR, "port %u cannot allocate RSS queue list (%u)",
+			dev->data->port_id, rxqs_n);
+		rte_errno = ENOMEM;
+		return -rte_errno;
+	}
+	for (i = 0, j = 0; i < rxqs_n; i++) {
+		struct mlx5_rxq_data *rxq_data;
+		struct mlx5_rxq_ctrl *rxq_ctrl;
+
+		rxq_data = (*priv->rxqs)[i];
+		rxq_ctrl = container_of(rxq_data, struct mlx5_rxq_ctrl, rxq);
+		if (rxq_ctrl->type == MLX5_RXQ_TYPE_STANDARD)
+			rss_queue_arr[j++] = i;
+	}
+	rss_queue_n = j;
+	if (rss_queue_n > priv->config.ind_table_max_size) {
+		DRV_LOG(ERR, "port %u cannot handle this many Rx queues (%u)",
+			dev->data->port_id, rss_queue_n);
+		rte_errno = EINVAL;
+		rte_free(rss_queue_arr);
+		return -rte_errno;
+	}
+	DRV_LOG(INFO, "port %u Rx queues number update: %u -> %u",
+		dev->data->port_id, priv->rxqs_n, rxqs_n);
+	priv->rxqs_n = rxqs_n;
+	/*
+	 * If the requested number of RX queues is not a power of two,
+	 * use the maximum indirection table size for better balancing.
+	 * The result is always rounded to the next power of two.
+	 */
+	reta_idx_n = (1 << log2above((rss_queue_n & (rss_queue_n - 1)) ?
+				priv->config.ind_table_max_size :
+				rss_queue_n));
+	ret = mlx5_rss_reta_index_resize(dev, reta_idx_n);
+	if (ret) {
+		rte_free(rss_queue_arr);
+		return ret;
+	}
+	/*
+	 * When the number of RX queues is not a power of two,
+	 * the remaining table entries are padded with reused WQs
+	 * and hashes are not spread uniformly.
+	 */
+	for (i = 0, j = 0; (i != reta_idx_n); ++i) {
+		(*priv->reta_idx)[i] = rss_queue_arr[j];
+		if (++j == rss_queue_n)
+			j = 0;
+	}
+	rte_free(rss_queue_arr);
+	return ret;
+}
+
+/**
  * Sets default tuning parameters.
  *
  * @param dev
diff --git a/drivers/net/mlx5/mlx5_rss.c b/drivers/net/mlx5/mlx5_rss.c
index 891d764..1028264 100644
--- a/drivers/net/mlx5/mlx5_rss.c
+++ b/drivers/net/mlx5/mlx5_rss.c
@@ -223,6 +223,7 @@
 	}
 	if (dev->data->dev_started) {
 		mlx5_dev_stop(dev);
+		priv->skip_default_rss_reta = 1;
 		return mlx5_dev_start(dev);
 	}
 	return 0;
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index c70e161..2c3d5eb 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -2156,9 +2156,13 @@ struct mlx5_rxq_ctrl *
 		}
 	} else { /* ind_tbl->type == MLX5_IND_TBL_TYPE_DEVX */
 		struct mlx5_devx_rqt_attr *rqt_attr = NULL;
+		const unsigned int rqt_n =
+			1 << (rte_is_power_of_2(queues_n) ?
+			      log2above(queues_n) :
+			      log2above(priv->config.ind_table_max_size));
 
 		rqt_attr = rte_calloc(__func__, 1, sizeof(*rqt_attr) +
-				      queues_n * sizeof(uint32_t), 0);
+				      rqt_n * sizeof(uint32_t), 0);
 		if (!rqt_attr) {
 			DRV_LOG(ERR, "port %u cannot allocate RQT resources",
 				dev->data->port_id);
@@ -2166,7 +2170,7 @@ struct mlx5_rxq_ctrl *
 			goto error;
 		}
 		rqt_attr->rqt_max_size = priv->config.ind_table_max_size;
-		rqt_attr->rqt_actual_size = queues_n;
+		rqt_attr->rqt_actual_size = rqt_n;
 		for (i = 0; i != queues_n; ++i) {
 			struct mlx5_rxq_ctrl *rxq = mlx5_rxq_get(dev,
 								 queues[i]);
@@ -2175,6 +2179,9 @@ struct mlx5_rxq_ctrl *
 			rqt_attr->rq_list[i] = rxq->obj->rq->id;
 			ind_tbl->queues[i] = queues[i];
 		}
+		k = i; /* Retain value of i for use in error case. */
+		for (j = 0; k != rqt_n; ++k, ++j)
+			rqt_attr->rq_list[k] = rqt_attr->rq_list[j];
 		ind_tbl->rqt = mlx5_devx_cmd_create_rqt(priv->sh->ctx,
 							rqt_attr);
 		rte_free(rqt_attr);
@@ -2328,13 +2335,13 @@ struct mlx5_hrxq *
 	struct mlx5_ind_table_obj *ind_tbl;
 	int err;
 	struct mlx5_devx_obj *tir = NULL;
+	struct mlx5_rxq_data *rxq_data = (*priv->rxqs)[queues[0]];
+	struct mlx5_rxq_ctrl *rxq_ctrl =
+		container_of(rxq_data, struct mlx5_rxq_ctrl, rxq);
 
 	queues_n = hash_fields ? queues_n : 1;
 	ind_tbl = mlx5_ind_table_obj_get(dev, queues, queues_n);
 	if (!ind_tbl) {
-		struct mlx5_rxq_data *rxq_data = (*priv->rxqs)[queues[0]];
-		struct mlx5_rxq_ctrl *rxq_ctrl =
-			container_of(rxq_data, struct mlx5_rxq_ctrl, rxq);
 		enum mlx5_ind_tbl_type type;
 
 		type = rxq_ctrl->obj->type == MLX5_RXQ_OBJ_TYPE_IBV ?
@@ -2430,7 +2437,10 @@ struct mlx5_hrxq *
 		tir_attr.rx_hash_fn = MLX5_RX_HASH_FN_TOEPLITZ;
 		memcpy(&tir_attr.rx_hash_field_selector_outer, &hash_fields,
 		       sizeof(uint64_t));
-		tir_attr.transport_domain = priv->sh->tdn;
+		if (rxq_ctrl->obj->type == MLX5_RXQ_OBJ_TYPE_DEVX_HAIRPIN)
+			tir_attr.transport_domain = priv->sh->td->id;
+		else
+			tir_attr.transport_domain = priv->sh->tdn;
 		memcpy(tir_attr.rx_hash_toeplitz_key, rss_key, rss_key_len);
 		tir_attr.indirect_table = ind_tbl->rqt->id;
 		if (dev->data->dev_conf.lpbk_mode)
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index a4fcdb3..f66b6ee 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -269,6 +269,12 @@
 	int ret;
 
 	DRV_LOG(DEBUG, "port %u starting device", dev->data->port_id);
+	ret = mlx5_dev_configure_rss_reta(dev);
+	if (ret) {
+		DRV_LOG(ERR, "port %u reta config failed: %s",
+			dev->data->port_id, strerror(rte_errno));
+		return -rte_errno;
+	}
 	ret = mlx5_txq_start(dev);
 	if (ret) {
 		DRV_LOG(ERR, "port %u Tx queue allocation failed: %s",
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 186+ messages in thread

* [dpdk-dev] [PATCH v7 11/14] net/mlx5: add internal tag item and action
  2019-10-30 23:53 ` [dpdk-dev] [PATCH v7 00/14] add hairpin feature Ori Kam
                     ` (9 preceding siblings ...)
  2019-10-30 23:53   ` [dpdk-dev] [PATCH v7 10/14] net/mlx5: add support for hairpin hrxq Ori Kam
@ 2019-10-30 23:53   ` Ori Kam
  2019-10-30 23:53   ` [dpdk-dev] [PATCH v7 12/14] net/mlx5: add id generation function Ori Kam
                     ` (3 subsequent siblings)
  14 siblings, 0 replies; 186+ messages in thread
From: Ori Kam @ 2019-10-30 23:53 UTC (permalink / raw)
  To: Matan Azrad, Shahaf Shuler, Viacheslav Ovsiienko
  Cc: dev, orika, jingjing.wu, stephen

This commit introduces setting and matching on registers.
This item and action will be used by a number of different
features such as hairpin, metering, and metadata.

Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/net/mlx5/mlx5_flow.c    |  52 +++++++++++++
 drivers/net/mlx5/mlx5_flow.h    |  48 +++++++++++-
 drivers/net/mlx5/mlx5_flow_dv.c | 158 +++++++++++++++++++++++++++++++++++++++-
 drivers/net/mlx5/mlx5_prm.h     |   3 +-
 4 files changed, 254 insertions(+), 7 deletions(-)

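Since the new item/action types use negative enum values, they can only
appear in driver-generated flows. A sketch of how such a flow fragment
could be built with the types added below (flow_id and the register
choice are illustrative, and a real action list would carry a fate
action before END):

	struct mlx5_rte_flow_action_set_tag set_tag = {
		.id = REG_C_2,
		.data = rte_cpu_to_be_32(flow_id),
	};
	struct rte_flow_action actions[] = {
		{
			.type = (enum rte_flow_action_type)
				MLX5_RTE_FLOW_ACTION_TYPE_TAG,
			.conf = &set_tag,
		},
		{ .type = RTE_FLOW_ACTION_TYPE_END },
	};
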
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index d4d956f..a309b6f 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -316,6 +316,58 @@ struct mlx5_flow_tunnel_info {
 	},
 };
 
+enum mlx5_feature_name {
+	MLX5_HAIRPIN_RX,
+	MLX5_HAIRPIN_TX,
+	MLX5_APPLICATION,
+};
+
+/**
+ * Translate tag ID to register.
+ *
+ * @param[in] dev
+ *   Pointer to the Ethernet device structure.
+ * @param[in] feature
+ *   The feature that requests the register.
+ * @param[in] id
+ *   The requested register ID.
+ * @param[out] error
+ *   Error description in case of any.
+ *
+ * @return
+ *   The requested register on success, a negative errno
+ *   value otherwise and rte_errno is set.
+ */
+__rte_unused
+static enum modify_reg flow_get_reg_id(struct rte_eth_dev *dev,
+				       enum mlx5_feature_name feature,
+				       uint32_t id,
+				       struct rte_flow_error *error)
+{
+	static enum modify_reg id2reg[] = {
+		[0] = REG_A,
+		[1] = REG_C_2,
+		[2] = REG_C_3,
+		[3] = REG_C_4,
+		[4] = REG_B,};
+
+	dev = (void *)dev;
+	switch (feature) {
+	case MLX5_HAIRPIN_RX:
+		return REG_B;
+	case MLX5_HAIRPIN_TX:
+		return REG_A;
+	case MLX5_APPLICATION:
+		if (id > 4)
+			return rte_flow_error_set(error, EINVAL,
+						  RTE_FLOW_ERROR_TYPE_ITEM,
+						  NULL, "invalid tag id");
+		return id2reg[id];
+	}
+	return rte_flow_error_set(error, EINVAL, RTE_FLOW_ERROR_TYPE_ITEM,
+				  NULL, "invalid feature name");
+}
+
 /**
  * Discover the maximum number of priority available.
  *
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 9658db1..a79b48b 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -27,6 +27,43 @@
 #include "mlx5.h"
 #include "mlx5_prm.h"
 
+enum modify_reg {
+	REG_A,
+	REG_B,
+	REG_C_0,
+	REG_C_1,
+	REG_C_2,
+	REG_C_3,
+	REG_C_4,
+	REG_C_5,
+	REG_C_6,
+	REG_C_7,
+};
+
+/* Private rte flow items. */
+enum mlx5_rte_flow_item_type {
+	MLX5_RTE_FLOW_ITEM_TYPE_END = INT_MIN,
+	MLX5_RTE_FLOW_ITEM_TYPE_TAG,
+};
+
+/* Private rte flow actions. */
+enum mlx5_rte_flow_action_type {
+	MLX5_RTE_FLOW_ACTION_TYPE_END = INT_MIN,
+	MLX5_RTE_FLOW_ACTION_TYPE_TAG,
+};
+
+/* Matches on selected register. */
+struct mlx5_rte_flow_item_tag {
+	uint16_t id;
+	rte_be32_t data;
+};
+
+/* Modify selected register. */
+struct mlx5_rte_flow_action_set_tag {
+	uint16_t id;
+	rte_be32_t data;
+};
+
 /* Pattern outer Layer bits. */
 #define MLX5_FLOW_LAYER_OUTER_L2 (1u << 0)
 #define MLX5_FLOW_LAYER_OUTER_L3_IPV4 (1u << 1)
@@ -53,11 +90,12 @@
 /* General pattern items bits. */
 #define MLX5_FLOW_ITEM_METADATA (1u << 16)
 #define MLX5_FLOW_ITEM_PORT_ID (1u << 17)
+#define MLX5_FLOW_ITEM_TAG (1u << 18)
 
 /* Pattern MISC bits. */
-#define MLX5_FLOW_LAYER_ICMP (1u << 18)
-#define MLX5_FLOW_LAYER_ICMP6 (1u << 19)
-#define MLX5_FLOW_LAYER_GRE_KEY (1u << 20)
+#define MLX5_FLOW_LAYER_ICMP (1u << 19)
+#define MLX5_FLOW_LAYER_ICMP6 (1u << 20)
+#define MLX5_FLOW_LAYER_GRE_KEY (1u << 21)
 
 /* Pattern tunnel Layer bits (continued). */
 #define MLX5_FLOW_LAYER_IPIP (1u << 21)
@@ -141,6 +179,7 @@
 #define MLX5_FLOW_ACTION_DEC_TCP_SEQ (1u << 29)
 #define MLX5_FLOW_ACTION_INC_TCP_ACK (1u << 30)
 #define MLX5_FLOW_ACTION_DEC_TCP_ACK (1u << 31)
+#define MLX5_FLOW_ACTION_SET_TAG (1ull << 32)
 
 #define MLX5_FLOW_FATE_ACTIONS \
 	(MLX5_FLOW_ACTION_DROP | MLX5_FLOW_ACTION_QUEUE | \
@@ -174,7 +213,8 @@
 				      MLX5_FLOW_ACTION_DEC_TCP_SEQ | \
 				      MLX5_FLOW_ACTION_INC_TCP_ACK | \
 				      MLX5_FLOW_ACTION_DEC_TCP_ACK | \
-				      MLX5_FLOW_ACTION_OF_SET_VLAN_VID)
+				      MLX5_FLOW_ACTION_OF_SET_VLAN_VID | \
+				      MLX5_FLOW_ACTION_SET_TAG)
 
 #define MLX5_FLOW_VLAN_ACTIONS (MLX5_FLOW_ACTION_OF_POP_VLAN | \
 				MLX5_FLOW_ACTION_OF_PUSH_VLAN)
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index e5f4c4c..1c9dc36 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -724,6 +724,59 @@ struct field_modify_info modify_tcp[] = {
 					     MLX5_MODIFICATION_TYPE_ADD, error);
 }
 
+static enum mlx5_modification_field reg_to_field[] = {
+	[REG_A] = MLX5_MODI_META_DATA_REG_A,
+	[REG_B] = MLX5_MODI_META_DATA_REG_B,
+	[REG_C_0] = MLX5_MODI_META_REG_C_0,
+	[REG_C_1] = MLX5_MODI_META_REG_C_1,
+	[REG_C_2] = MLX5_MODI_META_REG_C_2,
+	[REG_C_3] = MLX5_MODI_META_REG_C_3,
+	[REG_C_4] = MLX5_MODI_META_REG_C_4,
+	[REG_C_5] = MLX5_MODI_META_REG_C_5,
+	[REG_C_6] = MLX5_MODI_META_REG_C_6,
+	[REG_C_7] = MLX5_MODI_META_REG_C_7,
+};
+
+/**
+ * Convert register set to DV specification.
+ *
+ * @param[in,out] resource
+ *   Pointer to the modify-header resource.
+ * @param[in] action
+ *   Pointer to action specification.
+ * @param[out] error
+ *   Pointer to the error structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+flow_dv_convert_action_set_reg
+			(struct mlx5_flow_dv_modify_hdr_resource *resource,
+			 const struct rte_flow_action *action,
+			 struct rte_flow_error *error)
+{
+	const struct mlx5_rte_flow_action_set_tag *conf = (action->conf);
+	struct mlx5_modification_cmd *actions = resource->actions;
+	uint32_t i = resource->actions_num;
+
+	if (i >= MLX5_MODIFY_NUM)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_ACTION, NULL,
+					  "too many items to modify");
+	actions[i].action_type = MLX5_MODIFICATION_TYPE_SET;
+	actions[i].field = reg_to_field[conf->id];
+	actions[i].data0 = rte_cpu_to_be_32(actions[i].data0);
+	actions[i].data1 = conf->data;
+	++i;
+	resource->actions_num = i;
+	if (!resource->actions_num)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_ACTION, NULL,
+					  "invalid modification flow item");
+	return 0;
+}
+
 /**
  * Validate META item.
  *
@@ -4720,6 +4773,94 @@ struct field_modify_info modify_tcp[] = {
 }
 
 /**
+ * Add tag item to matcher
+ *
+ * @param[in, out] matcher
+ *   Flow matcher.
+ * @param[in, out] key
+ *   Flow matcher value.
+ * @param[in] item
+ *   Flow pattern to translate.
+ */
+static void
+flow_dv_translate_item_tag(void *matcher, void *key,
+			   const struct rte_flow_item *item)
+{
+	void *misc2_m =
+		MLX5_ADDR_OF(fte_match_param, matcher, misc_parameters_2);
+	void *misc2_v =
+		MLX5_ADDR_OF(fte_match_param, key, misc_parameters_2);
+	const struct mlx5_rte_flow_item_tag *tag_v = item->spec;
+	const struct mlx5_rte_flow_item_tag *tag_m = item->mask;
+	enum modify_reg reg = tag_v->id;
+	rte_be32_t value = tag_v->data;
+	rte_be32_t mask = tag_m->data;
+
+	switch (reg) {
+	case REG_A:
+		MLX5_SET(fte_match_set_misc2, misc2_m, metadata_reg_a,
+				rte_be_to_cpu_32(mask));
+		MLX5_SET(fte_match_set_misc2, misc2_v, metadata_reg_a,
+				rte_be_to_cpu_32(value));
+		break;
+	case REG_B:
+		MLX5_SET(fte_match_set_misc2, misc2_m, metadata_reg_b,
+				 rte_be_to_cpu_32(mask));
+		MLX5_SET(fte_match_set_misc2, misc2_v, metadata_reg_b,
+				rte_be_to_cpu_32(value));
+		break;
+	case REG_C_0:
+		MLX5_SET(fte_match_set_misc2, misc2_m, metadata_reg_c_0,
+				 rte_be_to_cpu_32(mask));
+		MLX5_SET(fte_match_set_misc2, misc2_v, metadata_reg_c_0,
+				rte_be_to_cpu_32(value));
+		break;
+	case REG_C_1:
+		MLX5_SET(fte_match_set_misc2, misc2_m, metadata_reg_c_1,
+				 rte_be_to_cpu_32(mask));
+		MLX5_SET(fte_match_set_misc2, misc2_v, metadata_reg_c_1,
+				rte_be_to_cpu_32(value));
+		break;
+	case REG_C_2:
+		MLX5_SET(fte_match_set_misc2, misc2_m, metadata_reg_c_2,
+				 rte_be_to_cpu_32(mask));
+		MLX5_SET(fte_match_set_misc2, misc2_v, metadata_reg_c_2,
+				rte_be_to_cpu_32(value));
+		break;
+	case REG_C_3:
+		MLX5_SET(fte_match_set_misc2, misc2_m, metadata_reg_c_3,
+				 rte_be_to_cpu_32(mask));
+		MLX5_SET(fte_match_set_misc2, misc2_v, metadata_reg_c_3,
+				rte_be_to_cpu_32(value));
+		break;
+	case REG_C_4:
+		MLX5_SET(fte_match_set_misc2, misc2_m, metadata_reg_c_4,
+				 rte_be_to_cpu_32(mask));
+		MLX5_SET(fte_match_set_misc2, misc2_v, metadata_reg_c_4,
+				rte_be_to_cpu_32(value));
+		break;
+	case REG_C_5:
+		MLX5_SET(fte_match_set_misc2, misc2_m, metadata_reg_c_5,
+				 rte_be_to_cpu_32(mask));
+		MLX5_SET(fte_match_set_misc2, misc2_v, metadata_reg_c_5,
+				rte_be_to_cpu_32(value));
+		break;
+	case REG_C_6:
+		MLX5_SET(fte_match_set_misc2, misc2_m, metadata_reg_c_6,
+				 rte_be_to_cpu_32(mask));
+		MLX5_SET(fte_match_set_misc2, misc2_v, metadata_reg_c_6,
+				rte_be_to_cpu_32(value));
+		break;
+	case REG_C_7:
+		MLX5_SET(fte_match_set_misc2, misc2_m, metadata_reg_c_7,
+				 rte_be_to_cpu_32(mask));
+		MLX5_SET(fte_match_set_misc2, misc2_v, metadata_reg_c_7,
+				rte_be_to_cpu_32(value));
+		break;
+	}
+}
+
+/**
  * Add source vport match to the specified matcher.
  *
  * @param[in, out] matcher
@@ -5305,8 +5446,9 @@ struct field_modify_info modify_tcp[] = {
 		struct mlx5_flow_tbl_resource *tbl;
 		uint32_t port_id = 0;
 		struct mlx5_flow_dv_port_id_action_resource port_id_resource;
+		int action_type = actions->type;
 
-		switch (actions->type) {
+		switch (action_type) {
 		case RTE_FLOW_ACTION_TYPE_VOID:
 			break;
 		case RTE_FLOW_ACTION_TYPE_PORT_ID:
@@ -5621,6 +5763,12 @@ struct field_modify_info modify_tcp[] = {
 					MLX5_FLOW_ACTION_INC_TCP_ACK :
 					MLX5_FLOW_ACTION_DEC_TCP_ACK;
 			break;
+		case MLX5_RTE_FLOW_ACTION_TYPE_TAG:
+			if (flow_dv_convert_action_set_reg(&res, actions,
+							   error))
+				return -rte_errno;
+			action_flags |= MLX5_FLOW_ACTION_SET_TAG;
+			break;
 		case RTE_FLOW_ACTION_TYPE_END:
 			actions_end = true;
 			if (action_flags & MLX5_FLOW_MODIFY_HDR_ACTIONS) {
@@ -5645,8 +5793,9 @@ struct field_modify_info modify_tcp[] = {
 	flow->actions = action_flags;
 	for (; items->type != RTE_FLOW_ITEM_TYPE_END; items++) {
 		int tunnel = !!(item_flags & MLX5_FLOW_LAYER_TUNNEL);
+		int item_type = items->type;
 
-		switch (items->type) {
+		switch (item_type) {
 		case RTE_FLOW_ITEM_TYPE_PORT_ID:
 			flow_dv_translate_item_port_id(dev, match_mask,
 						       match_value, items);
@@ -5797,6 +5946,11 @@ struct field_modify_info modify_tcp[] = {
 						      items, tunnel);
 			last_item = MLX5_FLOW_LAYER_ICMP6;
 			break;
+		case MLX5_RTE_FLOW_ITEM_TYPE_TAG:
+			flow_dv_translate_item_tag(match_mask, match_value,
+						   items);
+			last_item = MLX5_FLOW_ITEM_TAG;
+			break;
 		default:
 			break;
 		}
diff --git a/drivers/net/mlx5/mlx5_prm.h b/drivers/net/mlx5/mlx5_prm.h
index e4b19f8..96b9166 100644
--- a/drivers/net/mlx5/mlx5_prm.h
+++ b/drivers/net/mlx5/mlx5_prm.h
@@ -628,7 +628,8 @@ struct mlx5_ifc_fte_match_set_misc2_bits {
 	u8 metadata_reg_c_1[0x20];
 	u8 metadata_reg_c_0[0x20];
 	u8 metadata_reg_a[0x20];
-	u8 reserved_at_1a0[0x60];
+	u8 metadata_reg_b[0x20];
+	u8 reserved_at_1c0[0x40];
 };
 
 struct mlx5_ifc_fte_match_set_misc3_bits {
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 186+ messages in thread

* [dpdk-dev] [PATCH v7 12/14] net/mlx5: add id generation function
  2019-10-30 23:53 ` [dpdk-dev] [PATCH v7 00/14] add hairpin feature Ori Kam
                     ` (10 preceding siblings ...)
  2019-10-30 23:53   ` [dpdk-dev] [PATCH v7 11/14] net/mlx5: add internal tag item and action Ori Kam
@ 2019-10-30 23:53   ` Ori Kam
  2019-10-30 23:53   ` [dpdk-dev] [PATCH v7 13/14] net/mlx5: add default flows for hairpin Ori Kam
                     ` (2 subsequent siblings)
  14 siblings, 0 replies; 186+ messages in thread
From: Ori Kam @ 2019-10-30 23:53 UTC (permalink / raw)
  To: Matan Azrad, Shahaf Shuler, Viacheslav Ovsiienko
  Cc: dev, orika, jingjing.wu, stephen

When splitting flows, for example in hairpin / metering, there is a need
to combine the flows. This is done using an ID.
This commit introduces a simple way to generate such IDs.

A bitmap was not used because its release and allocation
are O(n), while in the chosen approach both allocation and
release are O(1).

Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>

---
 drivers/net/mlx5/mlx5.c      | 120 ++++++++++++++++++++++++++++++++++++++++++-
 drivers/net/mlx5/mlx5_flow.h |  14 +++++
 2 files changed, 133 insertions(+), 1 deletion(-)

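Typical usage of the pool, as a sketch (the four functions are the ones
added below; error handling trimmed):

	struct mlx5_flow_id_pool *pool = mlx5_flow_id_pool_alloc();
	uint32_t id;

	if (!pool)
		return -rte_errno;
	if (!mlx5_flow_id_get(pool, &id)) {
		/* Use id, e.g. to correlate split Rx/Tx flows. */
		mlx5_flow_id_release(pool, id); /* O(1) push on free_arr */
	}
	mlx5_flow_id_pool_release(pool);
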
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index e72e9eb..01a2ef7 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -178,6 +178,124 @@ struct mlx5_dev_spawn_data {
 static LIST_HEAD(, mlx5_ibv_shared) mlx5_ibv_list = LIST_HEAD_INITIALIZER();
 static pthread_mutex_t mlx5_ibv_list_mutex = PTHREAD_MUTEX_INITIALIZER;
 
+#define MLX5_FLOW_MIN_ID_POOL_SIZE 512
+#define MLX5_ID_GENERATION_ARRAY_FACTOR 16
+
+/**
+ * Allocate ID pool structure.
+ *
+ * @return
+ *   Pointer to pool object, NULL value otherwise.
+ */
+struct mlx5_flow_id_pool *
+mlx5_flow_id_pool_alloc(void)
+{
+	struct mlx5_flow_id_pool *pool;
+	void *mem;
+
+	pool = rte_zmalloc("id pool allocation", sizeof(*pool),
+			   RTE_CACHE_LINE_SIZE);
+	if (!pool) {
+		DRV_LOG(ERR, "can't allocate id pool");
+		rte_errno  = ENOMEM;
+		return NULL;
+	}
+	mem = rte_zmalloc("", MLX5_FLOW_MIN_ID_POOL_SIZE * sizeof(uint32_t),
+			  RTE_CACHE_LINE_SIZE);
+	if (!mem) {
+		DRV_LOG(ERR, "can't allocate mem for id pool");
+		rte_errno  = ENOMEM;
+		goto error;
+	}
+	pool->free_arr = mem;
+	pool->curr = pool->free_arr;
+	pool->last = pool->free_arr + MLX5_FLOW_MIN_ID_POOL_SIZE;
+	pool->base_index = 0;
+	return pool;
+error:
+	rte_free(pool);
+	return NULL;
+}
+
+/**
+ * Release ID pool structure.
+ *
+ * @param[in] pool
+ *   Pointer to flow id pool object to free.
+ */
+void
+mlx5_flow_id_pool_release(struct mlx5_flow_id_pool *pool)
+{
+	rte_free(pool->free_arr);
+	rte_free(pool);
+}
+
+/**
+ * Generate ID.
+ *
+ * @param[in] pool
+ *   Pointer to flow id pool.
+ * @param[out] id
+ *   The generated ID.
+ *
+ * @return
+ *   0 on success, error value otherwise.
+ */
+uint32_t
+mlx5_flow_id_get(struct mlx5_flow_id_pool *pool, uint32_t *id)
+{
+	if (pool->curr == pool->free_arr) {
+		if (pool->base_index == UINT32_MAX) {
+			rte_errno  = ENOMEM;
+			DRV_LOG(ERR, "no free id");
+			return -rte_errno;
+		}
+		*id = ++pool->base_index;
+		return 0;
+	}
+	*id = *(--pool->curr);
+	return 0;
+}
+
+/**
+ * Release ID.
+ *
+ * @param[in] pool
+ *   Pointer to flow id pool.
+ * @param[out] id
+ *   The generated ID.
+ *
+ * @return
+ *   0 on success, error value otherwise.
+ */
+uint32_t
+mlx5_flow_id_release(struct mlx5_flow_id_pool *pool, uint32_t id)
+{
+	uint32_t size;
+	uint32_t size2;
+	void *mem;
+
+	if (pool->curr == pool->last) {
+		size = pool->curr - pool->free_arr;
+		size2 = size * MLX5_ID_GENERATION_ARRAY_FACTOR;
+		assert(size2 > size);
+		mem = rte_malloc("", size2 * sizeof(uint32_t), 0);
+		if (!mem) {
+			DRV_LOG(ERR, "can't allocate mem for id pool");
+			rte_errno  = ENOMEM;
+			return -rte_errno;
+		}
+		memcpy(mem, pool->free_arr, size * sizeof(uint32_t));
+		rte_free(pool->free_arr);
+		pool->free_arr = mem;
+		pool->curr = pool->free_arr + size;
+		pool->last = pool->free_arr + size2;
+	}
+	*pool->curr = id;
+	pool->curr++;
+	return 0;
+}
+
 /**
  * Initialize the counters management structure.
  *
@@ -328,7 +446,7 @@ struct mlx5_dev_spawn_data {
 	struct mlx5_devx_tis_attr tis_attr = { 0 };
 #endif
 
 	assert(spawn);
 	/* Secondary process should not create the shared context. */
 	assert(rte_eal_process_type() == RTE_PROC_PRIMARY);
 	pthread_mutex_lock(&mlx5_ibv_list_mutex);
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index a79b48b..fddc06b 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -527,8 +527,22 @@ struct mlx5_flow_driver_ops {
 #define MLX5_CNT_CONTAINER_UNUSED(sh, batch, thread) (&(sh)->cmng.ccont \
 	[(~((sh)->cmng.mhi[batch] >> (thread)) & 0x1) * 2 + (batch)])
 
+/* ID generation structure. */
+struct mlx5_flow_id_pool {
+	uint32_t *free_arr; /**< Pointer to the array of free values. */
+	uint32_t base_index;
+	/**< The next index that can be used without any free elements. */
+	uint32_t *curr; /**< Pointer to the index to pop. */
+	uint32_t *last; /**< Pointer to the last element in the free array. */
+};
+
 /* mlx5_flow.c */
 
+struct mlx5_flow_id_pool *mlx5_flow_id_pool_alloc(void);
+void mlx5_flow_id_pool_release(struct mlx5_flow_id_pool *pool);
+uint32_t mlx5_flow_id_get(struct mlx5_flow_id_pool *pool, uint32_t *id);
+uint32_t mlx5_flow_id_release(struct mlx5_flow_id_pool *pool,
+			      uint32_t id);
 int mlx5_flow_group_to_table(const struct rte_flow_attr *attributes,
 			     bool external, uint32_t group, uint32_t *table,
 			     struct rte_flow_error *error);
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 186+ messages in thread

* [dpdk-dev] [PATCH v7 13/14] net/mlx5: add default flows for hairpin
  2019-10-30 23:53 ` [dpdk-dev] [PATCH v7 00/14] add hairpin feature Ori Kam
                     ` (11 preceding siblings ...)
  2019-10-30 23:53   ` [dpdk-dev] [PATCH v7 12/14] net/mlx5: add id generation function Ori Kam
@ 2019-10-30 23:53   ` Ori Kam
  2019-10-30 23:53   ` [dpdk-dev] [PATCH v7 14/14] net/mlx5: split hairpin flows Ori Kam
  2019-10-31 17:13   ` [dpdk-dev] [PATCH v7 00/14] add hairpin feature Ferruh Yigit
  14 siblings, 0 replies; 186+ messages in thread
From: Ori Kam @ 2019-10-30 23:53 UTC (permalink / raw)
  To: Matan Azrad, Shahaf Shuler, Viacheslav Ovsiienko
  Cc: dev, orika, jingjing.wu, stephen

When using hairpin, all traffic from Tx hairpin queues should jump
to a dedicated table where matching can be done using registers.

Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/net/mlx5/mlx5.h         |  2 ++
 drivers/net/mlx5/mlx5_flow.c    | 60 +++++++++++++++++++++++++++++++++++++++
 drivers/net/mlx5/mlx5_flow.h    |  9 ++++++
 drivers/net/mlx5/mlx5_flow_dv.c | 63 +++++++++++++++++++++++++++++++++++++++--
 drivers/net/mlx5/mlx5_trigger.c | 18 ++++++++++++
 5 files changed, 150 insertions(+), 2 deletions(-)

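A condensed view of the default rule the patch installs per hairpin Txq
(a sketch of mlx5_ctrl_flow_source_queue() below):

	/*
	 * match:  internal TX_QUEUE item == hairpin Txq index
	 *         (translated to the SQ number in the DV layer)
	 * action: JUMP to group MLX5_HAIRPIN_TX_TABLE (UINT16_MAX - 1),
	 *         where register matching takes over
	 */
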
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index a36ba2d..1181c1f 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -560,6 +560,7 @@ struct mlx5_flow_tbl_resource {
 };
 
 #define MLX5_MAX_TABLES UINT16_MAX
+#define MLX5_HAIRPIN_TX_TABLE (UINT16_MAX - 1)
 #define MLX5_MAX_TABLES_FDB UINT16_MAX
 
 #define MLX5_DBR_PAGE_SIZE 4096 /* Must be >= 512. */
@@ -883,6 +884,7 @@ int mlx5_dev_filter_ctrl(struct rte_eth_dev *dev,
 int mlx5_flow_start(struct rte_eth_dev *dev, struct mlx5_flows *list);
 void mlx5_flow_stop(struct rte_eth_dev *dev, struct mlx5_flows *list);
 int mlx5_flow_verify(struct rte_eth_dev *dev);
+int mlx5_ctrl_flow_source_queue(struct rte_eth_dev *dev, uint32_t queue);
 int mlx5_ctrl_flow_vlan(struct rte_eth_dev *dev,
 			struct rte_flow_item_eth *eth_spec,
 			struct rte_flow_item_eth *eth_mask,
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index a309b6f..1148db0 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -2820,6 +2820,66 @@ struct rte_flow *
 }
 
 /**
+ * Enable default hairpin egress flow.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param queue
+ *   The queue index.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_ctrl_flow_source_queue(struct rte_eth_dev *dev,
+			    uint32_t queue)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	const struct rte_flow_attr attr = {
+		.egress = 1,
+		.priority = 0,
+	};
+	struct mlx5_rte_flow_item_tx_queue queue_spec = {
+		.queue = queue,
+	};
+	struct mlx5_rte_flow_item_tx_queue queue_mask = {
+		.queue = UINT32_MAX,
+	};
+	struct rte_flow_item items[] = {
+		{
+			.type = MLX5_RTE_FLOW_ITEM_TYPE_TX_QUEUE,
+			.spec = &queue_spec,
+			.last = NULL,
+			.mask = &queue_mask,
+		},
+		{
+			.type = RTE_FLOW_ITEM_TYPE_END,
+		},
+	};
+	struct rte_flow_action_jump jump = {
+		.group = MLX5_HAIRPIN_TX_TABLE,
+	};
+	struct rte_flow_action actions[2];
+	struct rte_flow *flow;
+	struct rte_flow_error error;
+
+	actions[0].type = RTE_FLOW_ACTION_TYPE_JUMP;
+	actions[0].conf = &jump;
+	actions[1].type = RTE_FLOW_ACTION_TYPE_END;
+	flow = flow_list_create(dev, &priv->ctrl_flows,
+				&attr, items, actions, false, &error);
+	if (!flow) {
+		DRV_LOG(DEBUG,
+			"Failed to create ctrl flow: rte_errno(%d),"
+			" type(%d), message(%s)\n",
+			rte_errno, error.type,
+			error.message ? error.message : " (no stated reason)");
+		return -rte_errno;
+	}
+	return 0;
+}
+
+/**
  * Enable a control flow configured from the control plane.
  *
  * @param dev
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index fddc06b..f81e1b1 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -44,6 +44,7 @@ enum modify_reg {
 enum mlx5_rte_flow_item_type {
 	MLX5_RTE_FLOW_ITEM_TYPE_END = INT_MIN,
 	MLX5_RTE_FLOW_ITEM_TYPE_TAG,
+	MLX5_RTE_FLOW_ITEM_TYPE_TX_QUEUE,
 };
 
 /* Private rte flow actions. */
@@ -64,6 +65,11 @@ struct mlx5_rte_flow_action_set_tag {
 	rte_be32_t data;
 };
 
+/* Matches on source queue. */
+struct mlx5_rte_flow_item_tx_queue {
+	uint32_t queue;
+};
+
 /* Pattern outer Layer bits. */
 #define MLX5_FLOW_LAYER_OUTER_L2 (1u << 0)
 #define MLX5_FLOW_LAYER_OUTER_L3_IPV4 (1u << 1)
@@ -103,6 +109,9 @@ struct mlx5_rte_flow_action_set_tag {
 #define MLX5_FLOW_LAYER_NVGRE (1u << 23)
 #define MLX5_FLOW_LAYER_GENEVE (1u << 24)
 
+/* Queue items. */
+#define MLX5_FLOW_ITEM_TX_QUEUE (1u << 25)
+
 /* Outer Masks. */
 #define MLX5_FLOW_LAYER_OUTER_L3 \
 	(MLX5_FLOW_LAYER_OUTER_L3_IPV4 | MLX5_FLOW_LAYER_OUTER_L3_IPV6)
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index 1c9dc36..13178cc 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -3358,7 +3358,9 @@ struct field_modify_info modify_tcp[] = {
 		return ret;
 	for (; items->type != RTE_FLOW_ITEM_TYPE_END; items++) {
 		int tunnel = !!(item_flags & MLX5_FLOW_LAYER_TUNNEL);
-		switch (items->type) {
+		int type = items->type;
+
+		switch (type) {
 		case RTE_FLOW_ITEM_TYPE_VOID:
 			break;
 		case RTE_FLOW_ITEM_TYPE_PORT_ID:
@@ -3527,6 +3529,9 @@ struct field_modify_info modify_tcp[] = {
 				return ret;
 			last_item = MLX5_FLOW_LAYER_ICMP6;
 			break;
+		case MLX5_RTE_FLOW_ITEM_TYPE_TAG:
+		case MLX5_RTE_FLOW_ITEM_TYPE_TX_QUEUE:
+			break;
 		default:
 			return rte_flow_error_set(error, ENOTSUP,
 						  RTE_FLOW_ERROR_TYPE_ITEM,
@@ -3535,11 +3540,12 @@ struct field_modify_info modify_tcp[] = {
 		item_flags |= last_item;
 	}
 	for (; actions->type != RTE_FLOW_ACTION_TYPE_END; actions++) {
+		int type = actions->type;
 		if (actions_n == MLX5_DV_MAX_NUMBER_OF_ACTIONS)
 			return rte_flow_error_set(error, ENOTSUP,
 						  RTE_FLOW_ERROR_TYPE_ACTION,
 						  actions, "too many actions");
-		switch (actions->type) {
+		switch (type) {
 		case RTE_FLOW_ACTION_TYPE_VOID:
 			break;
 		case RTE_FLOW_ACTION_TYPE_PORT_ID:
@@ -3805,6 +3811,8 @@ struct field_modify_info modify_tcp[] = {
 						MLX5_FLOW_ACTION_INC_TCP_ACK :
 						MLX5_FLOW_ACTION_DEC_TCP_ACK;
 			break;
+		case MLX5_RTE_FLOW_ACTION_TYPE_TAG:
+			break;
 		default:
 			return rte_flow_error_set(error, ENOTSUP,
 						  RTE_FLOW_ERROR_TYPE_ACTION,
@@ -5371,6 +5379,51 @@ struct field_modify_info modify_tcp[] = {
 }
 
 /**
+ * Add Tx queue matcher
+ *
+ * @param[in] dev
+ *   Pointer to the dev struct.
+ * @param[in, out] matcher
+ *   Flow matcher.
+ * @param[in, out] key
+ *   Flow matcher value.
+ * @param[in] item
+ *   Flow pattern to translate.
+ * @param[in] inner
+ *   Item is inner pattern.
+ */
+static void
+flow_dv_translate_item_tx_queue(struct rte_eth_dev *dev,
+				void *matcher, void *key,
+				const struct rte_flow_item *item)
+{
+	const struct mlx5_rte_flow_item_tx_queue *queue_m;
+	const struct mlx5_rte_flow_item_tx_queue *queue_v;
+	void *misc_m =
+		MLX5_ADDR_OF(fte_match_param, matcher, misc_parameters);
+	void *misc_v =
+		MLX5_ADDR_OF(fte_match_param, key, misc_parameters);
+	struct mlx5_txq_ctrl *txq;
+	uint32_t queue;
+
+	queue_m = (const void *)item->mask;
+	if (!queue_m)
+		return;
+	queue_v = (const void *)item->spec;
+	if (!queue_v)
+		return;
+	txq = mlx5_txq_get(dev, queue_v->queue);
+	if (!txq)
+		return;
+	queue = txq->obj->sq->id;
+	MLX5_SET(fte_match_set_misc, misc_m, source_sqn, queue_m->queue);
+	MLX5_SET(fte_match_set_misc, misc_v, source_sqn,
+		 queue & queue_m->queue);
+	mlx5_txq_release(dev, queue_v->queue);
+}
+
+/**
  * Fill the flow with DV spec.
  *
  * @param[in] dev
@@ -5951,6 +6004,12 @@ struct field_modify_info modify_tcp[] = {
 						   items);
 			last_item = MLX5_FLOW_ITEM_TAG;
 			break;
+		case MLX5_RTE_FLOW_ITEM_TYPE_TX_QUEUE:
+			flow_dv_translate_item_tx_queue(dev, match_mask,
+							match_value,
+							items);
+			last_item = MLX5_FLOW_ITEM_TX_QUEUE;
+			break;
 		default:
 			break;
 		}
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index f66b6ee..cafab25 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -402,6 +402,24 @@
 	unsigned int j;
 	int ret;
 
+	 * The hairpin Txq default flow should be created regardless of the
+	 * isolation mode. Otherwise all the packets to be sent will go
+	 * out directly without the Tx flow actions, e.g. encapsulation.
+	 * out directly without the TX flow actions, e.g. encapsulation.
+	 */
+	for (i = 0; i != priv->txqs_n; ++i) {
+		struct mlx5_txq_ctrl *txq_ctrl = mlx5_txq_get(dev, i);
+		if (!txq_ctrl)
+			continue;
+		if (txq_ctrl->type == MLX5_TXQ_TYPE_HAIRPIN) {
+			ret = mlx5_ctrl_flow_source_queue(dev, i);
+			if (ret) {
+				mlx5_txq_release(dev, i);
+				goto error;
+			}
+		}
+		mlx5_txq_release(dev, i);
+	}
 	if (priv->config.dv_esw_en && !priv->config.vf)
 		if (!mlx5_flow_create_esw_table_zero_flow(dev))
 			goto error;
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 186+ messages in thread

* [dpdk-dev] [PATCH v7 14/14] net/mlx5: split hairpin flows
  2019-10-30 23:53 ` [dpdk-dev] [PATCH v7 00/14] add hairpin feature Ori Kam
                     ` (12 preceding siblings ...)
  2019-10-30 23:53   ` [dpdk-dev] [PATCH v7 13/14] net/mlx5: add default flows for hairpin Ori Kam
@ 2019-10-30 23:53   ` Ori Kam
  2019-10-31 17:13   ` [dpdk-dev] [PATCH v7 00/14] add hairpin feature Ferruh Yigit
  14 siblings, 0 replies; 186+ messages in thread
From: Ori Kam @ 2019-10-30 23:53 UTC (permalink / raw)
  To: Matan Azrad, Shahaf Shuler, Viacheslav Ovsiienko
  Cc: dev, orika, jingjing.wu, stephen

Since the encap action is not supported on Rx, we need to split the
hairpin flow into Rx and Tx parts.

Signed-off-by: Ori Kam <orika@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/net/mlx5/mlx5.c            |  10 ++
 drivers/net/mlx5/mlx5.h            |  10 ++
 drivers/net/mlx5/mlx5_flow.c       | 281 +++++++++++++++++++++++++++++++++++--
 drivers/net/mlx5/mlx5_flow.h       |  14 +-
 drivers/net/mlx5/mlx5_flow_dv.c    |  10 +-
 drivers/net/mlx5/mlx5_flow_verbs.c |  11 +-
 drivers/net/mlx5/mlx5_rxq.c        |  26 ++++
 drivers/net/mlx5/mlx5_rxtx.h       |   2 +
 8 files changed, 334 insertions(+), 30 deletions(-)

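Schematically, the split performed below turns one ingress rule into an
Rx/Tx pair correlated by a generated flow ID (a sketch; the exact action
placement is in flow_hairpin_split()):

	/*
	 * original (ingress): pattern -> raw_encap -> queue(hairpin Rxq)
	 * Rx part:            pattern -> set_tag(flow_id)
	 *                             -> queue(hairpin Rxq)
	 * Tx part (egress):   match tag(flow_id) -> raw_encap
	 */
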
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 01a2ef7..0bbc8d1 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -531,6 +531,12 @@ struct mlx5_flow_id_pool *
 			goto error;
 		}
 	}
+	sh->flow_id_pool = mlx5_flow_id_pool_alloc();
+	if (!sh->flow_id_pool) {
+		DRV_LOG(ERR, "can't create flow id pool");
+		err = ENOMEM;
+		goto error;
+	}
 #endif /* HAVE_IBV_FLOW_DV_SUPPORT */
 	/*
 	 * Once the device is added to the list of memory event
@@ -570,6 +576,8 @@ struct mlx5_flow_id_pool *
 		claim_zero(mlx5_glue->dealloc_pd(sh->pd));
 	if (sh->ctx)
 		claim_zero(mlx5_glue->close_device(sh->ctx));
+	if (sh->flow_id_pool)
+		mlx5_flow_id_pool_release(sh->flow_id_pool);
 	rte_free(sh);
 	assert(err > 0);
 	rte_errno = err;
@@ -641,6 +649,8 @@ struct mlx5_flow_id_pool *
 		claim_zero(mlx5_devx_cmd_destroy(sh->td));
 	if (sh->ctx)
 		claim_zero(mlx5_glue->close_device(sh->ctx));
+	if (sh->flow_id_pool)
+		mlx5_flow_id_pool_release(sh->flow_id_pool);
 	rte_free(sh);
 exit:
 	pthread_mutex_unlock(&mlx5_ibv_list_mutex);
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 1181c1f..f644998 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -578,6 +578,15 @@ struct mlx5_devx_dbr_page {
 	uint64_t dbr_bitmap[MLX5_DBR_BITMAP_SIZE];
 };
 
+/* ID generation structure. */
+struct mlx5_flow_id_pool {
+	uint32_t *free_arr; /**< Pointer to the array of free values. */
+	uint32_t base_index;
+	/**< The next index that can be used without any free elements. */
+	uint32_t *curr; /**< Pointer to the index to pop. */
+	uint32_t *last; /**< Pointer to the last element in the free array. */
+};
+
 /*
  * Shared Infiniband device context for Master/Representors
  * which belong to same IB device with multiple IB ports.
@@ -637,6 +646,7 @@ struct mlx5_ibv_shared {
 	struct mlx5dv_devx_cmd_comp *devx_comp; /* DEVX async comp obj. */
 	struct mlx5_devx_obj *tis; /* TIS object. */
 	struct mlx5_devx_obj *td; /* Transport domain. */
+	struct mlx5_flow_id_pool *flow_id_pool; /* Flow ID pool. */
 	struct mlx5_ibv_shared_port port[]; /* per device port data array. */
 };
 
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index 1148db0..5f01f9c 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -606,7 +606,7 @@ uint32_t mlx5_flow_adjust_priority(struct rte_eth_dev *dev, int32_t priority,
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct rte_flow *flow = dev_flow->flow;
-	const int mark = !!(flow->actions &
+	const int mark = !!(dev_flow->actions &
 			    (MLX5_FLOW_ACTION_FLAG | MLX5_FLOW_ACTION_MARK));
 	const int tunnel = !!(dev_flow->layers & MLX5_FLOW_LAYER_TUNNEL);
 	unsigned int i;
@@ -669,7 +669,7 @@ uint32_t mlx5_flow_adjust_priority(struct rte_eth_dev *dev, int32_t priority,
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct rte_flow *flow = dev_flow->flow;
-	const int mark = !!(flow->actions &
+	const int mark = !!(dev_flow->actions &
 			    (MLX5_FLOW_ACTION_FLAG | MLX5_FLOW_ACTION_MARK));
 	const int tunnel = !!(dev_flow->layers & MLX5_FLOW_LAYER_TUNNEL);
 	unsigned int i;
@@ -2527,6 +2527,210 @@ uint32_t mlx5_flow_adjust_priority(struct rte_eth_dev *dev, int32_t priority,
 }
 
 /**
+ * Check if the flow should be split due to hairpin.
+ * The reason for the split is that in the current HW we can't
+ * support encap on Rx, so if a flow has encap we move it
+ * to Tx.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param[in] attr
+ *   Flow rule attributes.
+ * @param[in] actions
+ *   Associated actions (list terminated by the END action).
+ *
+ * @return
+ *   > 0 the number of actions when the flow should be split,
+ *   0 when no split is required.
+ */
+static int
+flow_check_hairpin_split(struct rte_eth_dev *dev,
+			 const struct rte_flow_attr *attr,
+			 const struct rte_flow_action actions[])
+{
+	int queue_action = 0;
+	int action_n = 0;
+	int encap = 0;
+	const struct rte_flow_action_queue *queue;
+	const struct rte_flow_action_rss *rss;
+	const struct rte_flow_action_raw_encap *raw_encap;
+
+	if (!attr->ingress)
+		return 0;
+	for (; actions->type != RTE_FLOW_ACTION_TYPE_END; actions++) {
+		switch (actions->type) {
+		case RTE_FLOW_ACTION_TYPE_QUEUE:
+			queue = actions->conf;
+			if (mlx5_rxq_get_type(dev, queue->index) !=
+			    MLX5_RXQ_TYPE_HAIRPIN)
+				return 0;
+			queue_action = 1;
+			action_n++;
+			break;
+		case RTE_FLOW_ACTION_TYPE_RSS:
+			rss = actions->conf;
+			if (mlx5_rxq_get_type(dev, rss->queue[0]) !=
+			    MLX5_RXQ_TYPE_HAIRPIN)
+				return 0;
+			queue_action = 1;
+			action_n++;
+			break;
+		case RTE_FLOW_ACTION_TYPE_VXLAN_ENCAP:
+		case RTE_FLOW_ACTION_TYPE_NVGRE_ENCAP:
+			encap = 1;
+			action_n++;
+			break;
+		case RTE_FLOW_ACTION_TYPE_RAW_ENCAP:
+			raw_encap = actions->conf;
+			if (raw_encap->size >
+			    (sizeof(struct rte_flow_item_eth) +
+			     sizeof(struct rte_flow_item_ipv4)))
+				encap = 1;
+			action_n++;
+			break;
+		default:
+			action_n++;
+			break;
+		}
+	}
+	if (encap == 1 && queue_action)
+		return action_n;
+	return 0;
+}
+
+#define MLX5_MAX_SPLIT_ACTIONS 24
+#define MLX5_MAX_SPLIT_ITEMS 24
+
+/**
+ * Split the hairpin flow.
+ * Since HW can't support encap on Rx we move the encap to Tx.
+ * If the count action is after the encap then we also
+ * move the count action. In this case the count will also measure
+ * the outer bytes.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param[in] actions
+ *   Associated actions (list terminated by the END action).
+ * @param[out] actions_rx
+ *   Rx flow actions.
+ * @param[out] actions_tx
+ *   Tx flow actions.
+ * @param[out] pattern_tx
+ *   The pattern items for the Tx flow.
+ * @param[out] flow_id
+ *   The flow ID connected to this flow.
+ *
+ * @return
+ *   0 on success.
+ */
+static int
+flow_hairpin_split(struct rte_eth_dev *dev,
+		   const struct rte_flow_action actions[],
+		   struct rte_flow_action actions_rx[],
+		   struct rte_flow_action actions_tx[],
+		   struct rte_flow_item pattern_tx[],
+		   uint32_t *flow_id)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	const struct rte_flow_action_raw_encap *raw_encap;
+	const struct rte_flow_action_raw_decap *raw_decap;
+	struct mlx5_rte_flow_action_set_tag *set_tag;
+	struct rte_flow_action *tag_action;
+	struct mlx5_rte_flow_item_tag *tag_item;
+	struct rte_flow_item *item;
+	char *addr;
+	struct rte_flow_error error;
+	int encap = 0;
+
+	mlx5_flow_id_get(priv->sh->flow_id_pool, flow_id);
+	for (; actions->type != RTE_FLOW_ACTION_TYPE_END; actions++) {
+		switch (actions->type) {
+		case RTE_FLOW_ACTION_TYPE_VXLAN_ENCAP:
+		case RTE_FLOW_ACTION_TYPE_NVGRE_ENCAP:
+			rte_memcpy(actions_tx, actions,
+			       sizeof(struct rte_flow_action));
+			actions_tx++;
+			break;
+		case RTE_FLOW_ACTION_TYPE_COUNT:
+			if (encap) {
+				rte_memcpy(actions_tx, actions,
+					   sizeof(struct rte_flow_action));
+				actions_tx++;
+			} else {
+				rte_memcpy(actions_rx, actions,
+					   sizeof(struct rte_flow_action));
+				actions_rx++;
+			}
+			break;
+		case RTE_FLOW_ACTION_TYPE_RAW_ENCAP:
+			raw_encap = actions->conf;
+			if (raw_encap->size >
+			    (sizeof(struct rte_flow_item_eth) +
+			     sizeof(struct rte_flow_item_ipv4))) {
+				memcpy(actions_tx, actions,
+				       sizeof(struct rte_flow_action));
+				actions_tx++;
+				encap = 1;
+			} else {
+				rte_memcpy(actions_rx, actions,
+					   sizeof(struct rte_flow_action));
+				actions_rx++;
+			}
+			break;
+		case RTE_FLOW_ACTION_TYPE_RAW_DECAP:
+			raw_decap = actions->conf;
+			if (raw_decap->size <
+			    (sizeof(struct rte_flow_item_eth) +
+			     sizeof(struct rte_flow_item_ipv4))) {
+				memcpy(actions_tx, actions,
+				       sizeof(struct rte_flow_action));
+				actions_tx++;
+			} else {
+				rte_memcpy(actions_rx, actions,
+					   sizeof(struct rte_flow_action));
+				actions_rx++;
+			}
+			break;
+		default:
+			rte_memcpy(actions_rx, actions,
+				   sizeof(struct rte_flow_action));
+			actions_rx++;
+			break;
+		}
+	}
+	/* Add set meta action and end action for the Rx flow. */
+	tag_action = actions_rx;
+	tag_action->type = MLX5_RTE_FLOW_ACTION_TYPE_TAG;
+	actions_rx++;
+	rte_memcpy(actions_rx, actions, sizeof(struct rte_flow_action));
+	actions_rx++;
+	set_tag = (void *)actions_rx;
+	set_tag->id = flow_get_reg_id(dev, MLX5_HAIRPIN_RX, 0, &error);
+	set_tag->data = rte_cpu_to_be_32(*flow_id);
+	tag_action->conf = set_tag;
+	/* Create Tx item list. */
+	rte_memcpy(actions_tx, actions, sizeof(struct rte_flow_action));
+	addr = (void *)&pattern_tx[2];
+	item = pattern_tx;
+	item->type = MLX5_RTE_FLOW_ITEM_TYPE_TAG;
+	tag_item = (void *)addr;
+	tag_item->data = rte_cpu_to_be_32(*flow_id);
+	tag_item->id = flow_get_reg_id(dev, MLX5_HAIRPIN_TX, 0, &error);
+	item->spec = tag_item;
+	addr += sizeof(struct mlx5_rte_flow_item_tag);
+	tag_item = (void *)addr;
+	tag_item->data = UINT32_MAX;
+	tag_item->id = UINT16_MAX;
+	item->mask = tag_item;
+	addr += sizeof(struct mlx5_rte_flow_item_tag);
+	item->last = NULL;
+	item++;
+	item->type = RTE_FLOW_ITEM_TYPE_END;
+	return 0;
+}
+
+/**
  * Create a flow and add it to @p list.
  *
  * @param dev
@@ -2554,6 +2758,7 @@ uint32_t mlx5_flow_adjust_priority(struct rte_eth_dev *dev, int32_t priority,
 		 const struct rte_flow_action actions[],
 		 bool external, struct rte_flow_error *error)
 {
+	struct mlx5_priv *priv = dev->data->dev_private;
 	struct rte_flow *flow = NULL;
 	struct mlx5_flow *dev_flow;
 	const struct rte_flow_action_rss *rss;
@@ -2561,16 +2766,44 @@ uint32_t mlx5_flow_adjust_priority(struct rte_eth_dev *dev, int32_t priority,
 		struct rte_flow_expand_rss buf;
 		uint8_t buffer[2048];
 	} expand_buffer;
+	union {
+		struct rte_flow_action actions[MLX5_MAX_SPLIT_ACTIONS];
+		uint8_t buffer[2048];
+	} actions_rx;
+	union {
+		struct rte_flow_action actions[MLX5_MAX_SPLIT_ACTIONS];
+		uint8_t buffer[2048];
+	} actions_hairpin_tx;
+	union {
+		struct rte_flow_item items[MLX5_MAX_SPLIT_ITEMS];
+		uint8_t buffer[2048];
+	} items_tx;
 	struct rte_flow_expand_rss *buf = &expand_buffer.buf;
+	const struct rte_flow_action *p_actions_rx = actions;
 	int ret;
 	uint32_t i;
 	uint32_t flow_size;
+	int hairpin_flow = 0;
+	uint32_t hairpin_id = 0;
+	struct rte_flow_attr attr_tx = { .priority = 0 };
 
-	ret = flow_drv_validate(dev, attr, items, actions, external, error);
+	hairpin_flow = flow_check_hairpin_split(dev, attr, actions);
+	if (hairpin_flow > 0) {
+		if (hairpin_flow > MLX5_MAX_SPLIT_ACTIONS) {
+			rte_errno = EINVAL;
+			return NULL;
+		}
+		flow_hairpin_split(dev, actions, actions_rx.actions,
+				   actions_hairpin_tx.actions, items_tx.items,
+				   &hairpin_id);
+		p_actions_rx = actions_rx.actions;
+	}
+	ret = flow_drv_validate(dev, attr, items, p_actions_rx, external,
+				error);
 	if (ret < 0)
-		return NULL;
+		goto error_before_flow;
 	flow_size = sizeof(struct rte_flow);
-	rss = flow_get_rss_action(actions);
+	rss = flow_get_rss_action(p_actions_rx);
 	if (rss)
 		flow_size += RTE_ALIGN_CEIL(rss->queue_num * sizeof(uint16_t),
 					    sizeof(void *));
@@ -2579,11 +2812,13 @@ uint32_t mlx5_flow_adjust_priority(struct rte_eth_dev *dev, int32_t priority,
 	flow = rte_calloc(__func__, 1, flow_size, 0);
 	if (!flow) {
 		rte_errno = ENOMEM;
-		return NULL;
+		goto error_before_flow;
 	}
 	flow->drv_type = flow_get_drv_type(dev, attr);
 	flow->ingress = attr->ingress;
 	flow->transfer = attr->transfer;
+	if (hairpin_id != 0)
+		flow->hairpin_flow_id = hairpin_id;
 	assert(flow->drv_type > MLX5_FLOW_TYPE_MIN &&
 	       flow->drv_type < MLX5_FLOW_TYPE_MAX);
 	flow->queue = (void *)(flow + 1);
@@ -2604,7 +2839,7 @@ uint32_t mlx5_flow_adjust_priority(struct rte_eth_dev *dev, int32_t priority,
 	}
 	for (i = 0; i < buf->entries; ++i) {
 		dev_flow = flow_drv_prepare(flow, attr, buf->entry[i].pattern,
-					    actions, error);
+					    p_actions_rx, error);
 		if (!dev_flow)
 			goto error;
 		dev_flow->flow = flow;
@@ -2612,7 +2847,24 @@ uint32_t mlx5_flow_adjust_priority(struct rte_eth_dev *dev, int32_t priority,
 		LIST_INSERT_HEAD(&flow->dev_flows, dev_flow, next);
 		ret = flow_drv_translate(dev, dev_flow, attr,
 					 buf->entry[i].pattern,
-					 actions, error);
+					 p_actions_rx, error);
+		if (ret < 0)
+			goto error;
+	}
+	/* Create the tx flow. */
+	if (hairpin_flow) {
+		attr_tx.group = MLX5_HAIRPIN_TX_TABLE;
+		attr_tx.ingress = 0;
+		attr_tx.egress = 1;
+		dev_flow = flow_drv_prepare(flow, &attr_tx, items_tx.items,
+					    actions_hairpin_tx.actions, error);
+		if (!dev_flow)
+			goto error;
+		dev_flow->flow = flow;
+		LIST_INSERT_HEAD(&flow->dev_flows, dev_flow, next);
+		ret = flow_drv_translate(dev, dev_flow, &attr_tx,
+					 items_tx.items,
+					 actions_hairpin_tx.actions, error);
 		if (ret < 0)
 			goto error;
 	}
@@ -2624,8 +2876,16 @@ uint32_t mlx5_flow_adjust_priority(struct rte_eth_dev *dev, int32_t priority,
 	TAILQ_INSERT_TAIL(list, flow, next);
 	flow_rxq_flags_set(dev, flow);
 	return flow;
+error_before_flow:
+	if (hairpin_id)
+		mlx5_flow_id_release(priv->sh->flow_id_pool,
+				     hairpin_id);
+	return NULL;
 error:
 	ret = rte_errno; /* Save rte_errno before cleanup. */
+	if (flow->hairpin_flow_id)
+		mlx5_flow_id_release(priv->sh->flow_id_pool,
+				     flow->hairpin_flow_id);
 	assert(flow);
 	flow_drv_destroy(dev, flow);
 	rte_free(flow);
@@ -2715,12 +2975,17 @@ struct rte_flow *
 flow_list_destroy(struct rte_eth_dev *dev, struct mlx5_flows *list,
 		  struct rte_flow *flow)
 {
+	struct mlx5_priv *priv = dev->data->dev_private;
+
 	/*
 	 * Update RX queue flags only if port is started, otherwise it is
 	 * already clean.
 	 */
 	if (dev->data->dev_started)
 		flow_rxq_flags_trim(dev, flow);
+	if (flow->hairpin_flow_id)
+		mlx5_flow_id_release(priv->sh->flow_id_pool,
+				     flow->hairpin_flow_id);
 	flow_drv_destroy(dev, flow);
 	TAILQ_REMOVE(list, flow, next);
 	rte_free(flow->fdir);
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index f81e1b1..7559810 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -466,6 +466,8 @@ struct mlx5_flow {
 	struct rte_flow *flow; /**< Pointer to the main flow. */
 	uint64_t layers;
 	/**< Bit-fields of present layers, see MLX5_FLOW_LAYER_*. */
+	uint64_t actions;
+	/**< Bit-fields of detected actions, see MLX5_FLOW_ACTION_*. */
 	union {
 #ifdef HAVE_IBV_FLOW_DV_SUPPORT
 		struct mlx5_flow_dv dv;
@@ -487,12 +489,11 @@ struct rte_flow {
 	uint16_t (*queue)[]; /**< Destination queues to redirect traffic to. */
 	LIST_HEAD(dev_flows, mlx5_flow) dev_flows;
 	/**< Device flows that are part of the flow. */
-	uint64_t actions;
-	/**< Bit-fields of detected actions, see MLX5_FLOW_ACTION_*. */
 	struct mlx5_fdir *fdir; /**< Pointer to associated FDIR if any. */
 	uint8_t ingress; /**< 1 if the flow is ingress. */
 	uint32_t group; /**< The group index. */
 	uint8_t transfer; /**< 1 if the flow is E-Switch flow. */
+	uint32_t hairpin_flow_id; /**< The flow id used for hairpin. */
 };
 
 typedef int (*mlx5_flow_validate_t)(struct rte_eth_dev *dev,
@@ -536,15 +537,6 @@ struct mlx5_flow_driver_ops {
 #define MLX5_CNT_CONTAINER_UNUSED(sh, batch, thread) (&(sh)->cmng.ccont \
 	[(~((sh)->cmng.mhi[batch] >> (thread)) & 0x1) * 2 + (batch)])
 
-/* ID generation structure. */
-struct mlx5_flow_id_pool {
-	uint32_t *free_arr; /**< Pointer to the a array of free values. */
-	uint32_t base_index;
-	/**< The next index that can be used without any free elements. */
-	uint32_t *curr; /**< Pointer to the index to pop. */
-	uint32_t *last; /**< Pointer to the last element in the empty arrray. */
-};
-
 /* mlx5_flow.c */
 
 struct mlx5_flow_id_pool *mlx5_flow_id_pool_alloc(void);
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index 13178cc..d9a7fd4 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -5843,7 +5843,7 @@ struct field_modify_info modify_tcp[] = {
 			modify_action_position = actions_n++;
 	}
 	dev_flow->dv.actions_n = actions_n;
-	flow->actions = action_flags;
+	dev_flow->actions = action_flags;
 	for (; items->type != RTE_FLOW_ITEM_TYPE_END; items++) {
 		int tunnel = !!(item_flags & MLX5_FLOW_LAYER_TUNNEL);
 		int item_type = items->type;
@@ -6070,7 +6070,7 @@ struct field_modify_info modify_tcp[] = {
 	LIST_FOREACH(dev_flow, &flow->dev_flows, next) {
 		dv = &dev_flow->dv;
 		n = dv->actions_n;
-		if (flow->actions & MLX5_FLOW_ACTION_DROP) {
+		if (dev_flow->actions & MLX5_FLOW_ACTION_DROP) {
 			if (flow->transfer) {
 				dv->actions[n++] = priv->sh->esw_drop_action;
 			} else {
@@ -6085,7 +6085,7 @@ struct field_modify_info modify_tcp[] = {
 				}
 				dv->actions[n++] = dv->hrxq->action;
 			}
-		} else if (flow->actions &
+		} else if (dev_flow->actions &
 			   (MLX5_FLOW_ACTION_QUEUE | MLX5_FLOW_ACTION_RSS)) {
 			struct mlx5_hrxq *hrxq;
 
@@ -6141,7 +6141,7 @@ struct field_modify_info modify_tcp[] = {
 	LIST_FOREACH(dev_flow, &flow->dev_flows, next) {
 		struct mlx5_flow_dv *dv = &dev_flow->dv;
 		if (dv->hrxq) {
-			if (flow->actions & MLX5_FLOW_ACTION_DROP)
+			if (dev_flow->actions & MLX5_FLOW_ACTION_DROP)
 				mlx5_hrxq_drop_release(dev);
 			else
 				mlx5_hrxq_release(dev, dv->hrxq);
@@ -6375,7 +6375,7 @@ struct field_modify_info modify_tcp[] = {
 			dv->flow = NULL;
 		}
 		if (dv->hrxq) {
-			if (flow->actions & MLX5_FLOW_ACTION_DROP)
+			if (dev_flow->actions & MLX5_FLOW_ACTION_DROP)
 				mlx5_hrxq_drop_release(dev);
 			else
 				mlx5_hrxq_release(dev, dv->hrxq);
diff --git a/drivers/net/mlx5/mlx5_flow_verbs.c b/drivers/net/mlx5/mlx5_flow_verbs.c
index 23110f2..fd27f6c 100644
--- a/drivers/net/mlx5/mlx5_flow_verbs.c
+++ b/drivers/net/mlx5/mlx5_flow_verbs.c
@@ -191,7 +191,7 @@
 {
 #if defined(HAVE_IBV_DEVICE_COUNTERS_SET_V42) || \
 	defined(HAVE_IBV_DEVICE_COUNTERS_SET_V45)
-	if (flow->actions & MLX5_FLOW_ACTION_COUNT) {
+	if (flow->counter->cs) {
 		struct rte_flow_query_count *qc = data;
 		uint64_t counters[2] = {0, 0};
 #if defined(HAVE_IBV_DEVICE_COUNTERS_SET_V42)
@@ -1410,7 +1410,6 @@
 		     const struct rte_flow_action actions[],
 		     struct rte_flow_error *error)
 {
-	struct rte_flow *flow = dev_flow->flow;
 	uint64_t item_flags = 0;
 	uint64_t action_flags = 0;
 	uint64_t priority = attr->priority;
@@ -1460,7 +1459,7 @@
 						  "action not supported");
 		}
 	}
-	flow->actions = action_flags;
+	dev_flow->actions = action_flags;
 	for (; items->type != RTE_FLOW_ITEM_TYPE_END; items++) {
 		int tunnel = !!(item_flags & MLX5_FLOW_LAYER_TUNNEL);
 
@@ -1592,7 +1591,7 @@
 			verbs->flow = NULL;
 		}
 		if (verbs->hrxq) {
-			if (flow->actions & MLX5_FLOW_ACTION_DROP)
+			if (dev_flow->actions & MLX5_FLOW_ACTION_DROP)
 				mlx5_hrxq_drop_release(dev);
 			else
 				mlx5_hrxq_release(dev, verbs->hrxq);
@@ -1656,7 +1655,7 @@
 
 	LIST_FOREACH(dev_flow, &flow->dev_flows, next) {
 		verbs = &dev_flow->verbs;
-		if (flow->actions & MLX5_FLOW_ACTION_DROP) {
+		if (dev_flow->actions & MLX5_FLOW_ACTION_DROP) {
 			verbs->hrxq = mlx5_hrxq_drop_new(dev);
 			if (!verbs->hrxq) {
 				rte_flow_error_set
@@ -1717,7 +1716,7 @@
 	LIST_FOREACH(dev_flow, &flow->dev_flows, next) {
 		verbs = &dev_flow->verbs;
 		if (verbs->hrxq) {
-			if (flow->actions & MLX5_FLOW_ACTION_DROP)
+			if (dev_flow->actions & MLX5_FLOW_ACTION_DROP)
 				mlx5_hrxq_drop_release(dev);
 			else
 				mlx5_hrxq_release(dev, verbs->hrxq);
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index 2c3d5eb..24d0eaa 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -2097,6 +2097,32 @@ struct mlx5_rxq_ctrl *
 }
 
 /**
+ * Get a Rx queue type.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param idx
+ *   Rx queue index.
+ *
+ * @return
+ *   The Rx queue type.
+ */
+enum mlx5_rxq_type
+mlx5_rxq_get_type(struct rte_eth_dev *dev, uint16_t idx)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_rxq_ctrl *rxq_ctrl = NULL;
+
+	if ((*priv->rxqs)[idx]) {
+		rxq_ctrl = container_of((*priv->rxqs)[idx],
+					struct mlx5_rxq_ctrl,
+					rxq);
+		return rxq_ctrl->type;
+	}
+	return MLX5_RXQ_TYPE_UNDEFINED;
+}
+
+/**
  * Create an indirection table.
  *
  * @param dev
diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index 271b648..d4ba25f 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -166,6 +166,7 @@ enum mlx5_rxq_obj_type {
 enum mlx5_rxq_type {
 	MLX5_RXQ_TYPE_STANDARD, /* Standard Rx queue. */
 	MLX5_RXQ_TYPE_HAIRPIN, /* Hairpin Rx queue. */
+	MLX5_RXQ_TYPE_UNDEFINED,
 };
 
 /* Verbs/DevX Rx queue elements. */
@@ -406,6 +407,7 @@ struct mlx5_hrxq *mlx5_hrxq_get(struct rte_eth_dev *dev,
 				const uint16_t *queues, uint32_t queues_n);
 int mlx5_hrxq_release(struct rte_eth_dev *dev, struct mlx5_hrxq *hxrq);
 int mlx5_hrxq_verify(struct rte_eth_dev *dev);
+enum mlx5_rxq_type mlx5_rxq_get_type(struct rte_eth_dev *dev, uint16_t idx);
 struct mlx5_hrxq *mlx5_hrxq_drop_new(struct rte_eth_dev *dev);
 void mlx5_hrxq_drop_release(struct rte_eth_dev *dev);
 uint64_t mlx5_get_rx_port_offloads(void);
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [dpdk-dev] [PATCH v7 02/14] ethdev: add support for hairpin queue
  2019-10-30 23:53   ` [dpdk-dev] [PATCH v7 02/14] ethdev: add support for hairpin queue Ori Kam
@ 2019-10-31  8:25     ` Andrew Rybchenko
  2019-11-05 11:24     ` Ferruh Yigit
  1 sibling, 0 replies; 186+ messages in thread
From: Andrew Rybchenko @ 2019-10-31  8:25 UTC (permalink / raw)
  To: Ori Kam, John McNamara, Marko Kovacevic, Thomas Monjalon, Ferruh Yigit
  Cc: dev, jingjing.wu, stephen

On 10/31/19 2:53 AM, Ori Kam wrote:
> This commit introduces the hairpin queue type.
>
> A hairpin queue is built from an Rx queue bound to a Tx queue.
> It is used to offload traffic coming from the wire and redirect it back
> to the wire.
>
> There are 3 new functions:
> - rte_eth_dev_hairpin_capability_get
> - rte_eth_rx_hairpin_queue_setup
> - rte_eth_tx_hairpin_queue_setup
>
> In order to use the queue, there is a need to create rte_flow
> with queue / RSS action that targets one or more of the Rx queues.
>
> Signed-off-by: Ori Kam <orika@mellanox.com>
> Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>

LGTM; one note below, which could be fixed when applying.

> diff --git a/lib/librte_ethdev/rte_ethdev_version.map b/lib/librte_ethdev/rte_ethdev_version.map
> index e59d516..48b5389 100644
> --- a/lib/librte_ethdev/rte_ethdev_version.map
> +++ b/lib/librte_ethdev/rte_ethdev_version.map
> @@ -288,4 +288,7 @@ EXPERIMENTAL {
>   	rte_eth_rx_burst_mode_get;
>   	rte_eth_tx_burst_mode_get;
>   	rte_eth_burst_mode_option_name;
> +	rte_eth_rx_hairpin_queue_setup;
> +	rte_eth_tx_hairpin_queue_setup;
> +	rte_eth_dev_hairpin_capability_get;
>   };

As I understand, rte_eth_dev_is_rx_hairpin_queue() and
rte_eth_dev_is_tx_hairpin_queue() should be listed above.
Yes, these functions are internal, but they are used in header
inline functions and should be visible outside.
We have discussed similar case recently in [1].

[1] http://inbox.dpdk.org/dev/20191030142938.bpi4txlrebqfq7uw@platinum/
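
Concretely, the fix Andrew asks for would be an addition along these
lines to the EXPERIMENTAL section quoted above (a sketch of the expected
change, not the patch that was actually applied):

	EXPERIMENTAL {
		...
		rte_eth_dev_is_rx_hairpin_queue;
		rte_eth_dev_is_tx_hairpin_queue;
	};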


^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [dpdk-dev] [PATCH v7 08/14] app/testpmd: add hairpin support
  2019-10-30 23:53   ` [dpdk-dev] [PATCH v7 08/14] app/testpmd: add hairpin support Ori Kam
@ 2019-10-31 17:11     ` Ferruh Yigit
  2019-10-31 17:36       ` Ori Kam
  0 siblings, 1 reply; 186+ messages in thread
From: Ferruh Yigit @ 2019-10-31 17:11 UTC (permalink / raw)
  To: Ori Kam, Wenzhuo Lu, Jingjing Wu, Bernard Iremonger; +Cc: dev, stephen

On 10/30/2019 11:53 PM, Ori Kam wrote:
> This commit introduces hairpin queues to testpmd.
> A hairpin queue is configured using --hairpinq=<n>;
> the option adds n queue objects to both the total number
> of Tx queues and Rx queues.
> The connection between the queues is 1 to 1: the first Rx hairpin queue
> will be connected to the first Tx hairpin queue.
> 
> Signed-off-by: Ori Kam <orika@mellanox.com>
> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> ---
>  app/test-pmd/parameters.c |  28 ++++++++++++
>  app/test-pmd/testpmd.c    | 109 +++++++++++++++++++++++++++++++++++++++++++++-
>  app/test-pmd/testpmd.h    |   3 ++

The new parameter should be documented in
'doc/guides/testpmd_app_ug/run_app.rst';
can you please briefly describe how it works and what the mapping will look
like by default, etc.?

<...>

> @@ -2028,6 +2076,11 @@ struct extmem_param {
>  	queueid_t qi;
>  	struct rte_port *port;
>  	struct rte_ether_addr mac_addr;
> +	struct rte_eth_hairpin_conf hairpin_conf = {
> +		.peer_count = 1,
> +	};
> +	int i;
> +	struct rte_eth_hairpin_cap cap;
>  
>  	if (port_id_is_invalid(pid, ENABLED_WARN))
>  		return 0;
> @@ -2060,9 +2113,16 @@ struct extmem_param {
>  			configure_rxtx_dump_callbacks(0);
>  			printf("Configuring Port %d (socket %u)\n", pi,
>  					port->socket_id);
> +			if (nb_hairpinq > 0 &&
> +			    rte_eth_dev_hairpin_capability_get(pi, &cap)) {
> +				printf("Port %d doesn't support hairpin "
> +				       "queues\n", pi);
> +				return -1;
> +			}
>  			/* configure port */
> -			diag = rte_eth_dev_configure(pi, nb_rxq, nb_txq,
> -						&(port->dev_conf));
> +			diag = rte_eth_dev_configure(pi, nb_rxq + nb_hairpinq,
> +						     nb_txq + nb_hairpinq,
> +						     &(port->dev_conf));
>  			if (diag != 0) {
>  				if (rte_atomic16_cmpset(&(port->port_status),
>  				RTE_PORT_HANDLING, RTE_PORT_STOPPED) == 0)
> @@ -2155,6 +2215,51 @@ struct extmem_param {
>  				port->need_reconfig_queues = 1;
>  				return -1;
>  			}
> +			/* setup hairpin queues */
> +			i = 0;
> +			for (qi = nb_txq; qi < nb_hairpinq + nb_txq; qi++) {
> +				hairpin_conf.peers[0].port = pi;
> +				hairpin_conf.peers[0].queue = i + nb_rxq;
> +				diag = rte_eth_tx_hairpin_queue_setup
> +					(pi, qi, nb_txd, &hairpin_conf);
> +				i++;
> +				if (diag == 0)
> +					continue;
> +
> +				/* Fail to setup rx queue, return */
> +				if (rte_atomic16_cmpset(&(port->port_status),
> +							RTE_PORT_HANDLING,
> +							RTE_PORT_STOPPED) == 0)
> +					printf("Port %d can not be set back "
> +							"to stopped\n", pi);
> +				printf("Fail to configure port %d hairpin "
> +				       "queues\n", pi);
> +				/* try to reconfigure queues next time */
> +				port->need_reconfig_queues = 1;
> +				return -1;
> +			}
> +			i = 0;
> +			for (qi = nb_rxq; qi < nb_hairpinq + nb_rxq; qi++) {
> +				hairpin_conf.peers[0].port = pi;
> +				hairpin_conf.peers[0].queue = i + nb_txq;
> +				diag = rte_eth_rx_hairpin_queue_setup
> +					(pi, qi, nb_rxd, &hairpin_conf);
> +				i++;
> +				if (diag == 0)
> +					continue;
> +
> +				/* Fail to setup rx queue, return */
> +				if (rte_atomic16_cmpset(&(port->port_status),
> +							RTE_PORT_HANDLING,
> +							RTE_PORT_STOPPED) == 0)
> +					printf("Port %d can not be set back "
> +							"to stopped\n", pi);
> +				printf("Fail to configure port %d hairpin "
> +				       "queues\n", pi);
> +				/* try to reconfigure queues next time */
> +				port->need_reconfig_queues = 1;
> +				return -1;
> +			}

The 'start_port()' function is already huge; what do you think about moving
the hairpin-related setup into a dedicated function and calling it only when
"nb_hairpinq > 0"? This would make the hairpin-related config clearer and
'start_port()' simpler.
I think all hairpin-related operations can be extracted, like the capability
check, 'rte_eth_dev_configure' and the hairpin queue setup.
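
A hedged sketch of such a helper, lifted from the queue-setup loops in
the quoted hunk (the name and the simplified error handling are
illustrative, not the final testpmd code):

	static int
	setup_hairpin_queues(portid_t pi)
	{
		struct rte_eth_hairpin_conf hairpin_conf = {
			.peer_count = 1,
		};
		queueid_t qi;
		int i, diag;

		/* Tx hairpin queue (nb_txq + i) peers with Rxq (nb_rxq + i). */
		i = 0;
		for (qi = nb_txq; qi < nb_hairpinq + nb_txq; qi++) {
			hairpin_conf.peers[0].port = pi;
			hairpin_conf.peers[0].queue = i + nb_rxq;
			diag = rte_eth_tx_hairpin_queue_setup
				(pi, qi, nb_txd, &hairpin_conf);
			i++;
			if (diag != 0)
				return -1;
		}
		/* Rx hairpin queue (nb_rxq + i) peers with Txq (nb_txq + i). */
		i = 0;
		for (qi = nb_rxq; qi < nb_hairpinq + nb_rxq; qi++) {
			hairpin_conf.peers[0].port = pi;
			hairpin_conf.peers[0].queue = i + nb_txq;
			diag = rte_eth_rx_hairpin_queue_setup
				(pi, qi, nb_rxd, &hairpin_conf);
			i++;
			if (diag != 0)
				return -1;
		}
		return 0;
	}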

^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [dpdk-dev] [PATCH v7 00/14] add hairpin feature
  2019-10-30 23:53 ` [dpdk-dev] [PATCH v7 00/14] add hairpin feature Ori Kam
                     ` (13 preceding siblings ...)
  2019-10-30 23:53   ` [dpdk-dev] [PATCH v7 14/14] net/mlx5: split hairpin flows Ori Kam
@ 2019-10-31 17:13   ` Ferruh Yigit
  14 siblings, 0 replies; 186+ messages in thread
From: Ferruh Yigit @ 2019-10-31 17:13 UTC (permalink / raw)
  To: Ori Kam
  Cc: dev, jingjing.wu, stephen, wenzhuo.lu, bernard.iremonger, thomas,
	arybchenko, viacheslavo

On 10/30/2019 11:53 PM, Ori Kam wrote:
> This patch set implements the hairpin feature.
> The hairpin feature was introduced in RFC[1]
> 
> The hairpin feature (different name can be forward) acts as "bump on the wire",
> meaning that a packet that is received from the wire can be modified using
> offloaded action and then sent back to the wire without application intervention
> which save CPU cycles.
> 
> The hairpin is the inverse function of loopback in which application
> sends a packet then it is received again by the
> application without being sent to the wire.
> 
> The hairpin can be used by a number of different NVF, for example load
> balancer, gateway and so on.
> 
> As can be seen from the hairpin description, hairpin is basically RX queue
> connected to TX queue.
> 
> During the design phase I was thinking of two ways to implement this
> feature the first one is adding a new rte flow action. and the second
> one is create a special kind of queue.
> 
> The advantages of using the queue approch:
> 1. More control for the application. queue depth (the memory size that
> should be used).
> 2. Enable QoS. QoS is normaly a parametr of queue, so in this approch it
> will be easy to integrate with such system.
> 3. Native integression with the rte flow API. Just setting the target
> queue/rss to hairpin queue, will result that the traffic will be routed
> to the hairpin queue.
> 4. Enable queue offloading.
> 
> Each hairpin Rxq can be connected Txq / number of Txqs which can belong to a
> different ports assuming the PMD supports it. The same goes the other
> way each hairpin Txq can be connected to one or more Rxqs.
> This is the reason that both the Txq setup and Rxq setup are getting the
> hairpin configuration structure.
> 
> From PMD prespctive the number of Rxq/Txq is the total of standard
> queues + hairpin queues.
> 
> To configure hairpin queue the user should call
> rte_eth_rx_hairpin_queue_setup / rte_eth_tx_hairpin_queue_setup insteed
> of the normal queue setup functions.
> 
> The hairpin queues are not part of the normal RSS functiosn.
> 
> To use the queues the user simply create a flow that points to RSS/queue
> actions that are hairpin queues.
> The reason for selecting 2 new functions for hairpin queue setup are:
> 1. avoid API break.
> 2. avoid extra and unused parameters.
> 
> 
> 
> [1] https://inbox.dpdk.org/dev/1565703468-55617-1-git-send-email-orika@mellanox.com/
> 
> Cc: wenzhuo.lu@intel.com
> Cc: bernard.iremonger@intel.com
> Cc: thomas@monjalon.net
> Cc: ferruh.yigit@intel.com
> Cc: arybchenko@solarflare.com
> Cc: viacheslavo@mellanox.com
> 
> ------
> V7:
>  - all changes are in patch 2: ethdev: add support for hairpin queue
>    - Move is_rx/tx_hairpin_queue to ethdev.c and ethdev.h also remove the inline.
>    - change checks for max number of hairpin queues.
>    - modify log messages.
> 
> V6:
>  - add missing include in nfb driver.
>  - change comparing of rte_eth_dev_is_tx_hairpin_queue /
>    rte_eth_dev_is_rx_hairpin_queue to boolean operator.
>  - split the doc patch to the relevant patches.
> 
> V5:
>  - modify log messages to be more distinct.
>  - keep a log message on a single line even if it exceeds 80 characters.
>  - change peer_n to peer_count.
>  - add functions to get if queue is hairpin queue.
> 
> V4:
>  - update according to comments from ML.
> 
> V3:
>  - update according to comments from ML.
> 
> V2:
>  - update according to comments from ML.
> 
> 
> 
> 
> Ori Kam (14):
>   ethdev: move queue state defines to private file
>   ethdev: add support for hairpin queue
>   net/mlx5: query hca hairpin capabilities
>   net/mlx5: support Rx hairpin queues
>   net/mlx5: prepare txq to work with different types
>   net/mlx5: support Tx hairpin queues
>   net/mlx5: add get hairpin capabilities
>   app/testpmd: add hairpin support
>   net/mlx5: add hairpin binding function
>   net/mlx5: add support for hairpin hrxq
>   net/mlx5: add internal tag item and action
>   net/mlx5: add id generation function
>   net/mlx5: add default flows for hairpin
>   net/mlx5: split hairpin flows

Series applied to dpdk-next-net/master, thanks.

Except the testpmd patch (8/14), which can be worked on separately.

^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [dpdk-dev] [PATCH v7 08/14] app/testpmd: add hairpin support
  2019-10-31 17:11     ` Ferruh Yigit
@ 2019-10-31 17:36       ` Ori Kam
  2019-10-31 17:54         ` Ferruh Yigit
  0 siblings, 1 reply; 186+ messages in thread
From: Ori Kam @ 2019-10-31 17:36 UTC (permalink / raw)
  To: Ferruh Yigit, Wenzhuo Lu, Jingjing Wu, Bernard Iremonger; +Cc: dev, stephen

Hi Ferruh,

Thanks for the comments, PSB,

> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@intel.com>
> Sent: Thursday, October 31, 2019 7:12 PM
> To: Ori Kam <orika@mellanox.com>; Wenzhuo Lu <wenzhuo.lu@intel.com>;
> Jingjing Wu <jingjing.wu@intel.com>; Bernard Iremonger
> <bernard.iremonger@intel.com>
> Cc: dev@dpdk.org; stephen@networkplumber.org
> Subject: Re: [dpdk-dev] [PATCH v7 08/14] app/testpmd: add hairpin support
> 
> On 10/30/2019 11:53 PM, Ori Kam wrote:
> > This commit introduces hairpin queues to testpmd.
> > A hairpin queue is configured using --hairpinq=<n>;
> > the option adds n queue objects to both the total number
> > of Tx queues and Rx queues.
> > The connection between the queues is 1 to 1: the first Rx hairpin queue
> > will be connected to the first Tx hairpin queue.
> >
> > Signed-off-by: Ori Kam <orika@mellanox.com>
> > Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> > ---
> >  app/test-pmd/parameters.c |  28 ++++++++++++
> >  app/test-pmd/testpmd.c    | 109
> +++++++++++++++++++++++++++++++++++++++++++++-
> >  app/test-pmd/testpmd.h    |   3 ++
> 
> The new parameter should be documented in
> 'doc/guides/testpmd_app_ug/run_app.rst';
> can you please briefly describe how it works and what the mapping will look
> like by default, etc.?
> 

The default is no hairpin queues.
If hairpinq=x is specified, then we add x queues to the Rx queue list and x
queues to the Tx queue list, and bind them one Rx queue to one Tx queue.

For example, if the testpmd parameters are:
--rxq=3 --txq=2 --hairpinq=4 the result will be:

3 normal Rxq (queues 0,1,2)
2 normal Txq (queues 0,1)
4 hairpin queues (Rxq 3,4,5,6 Txq 2,3,4,5), where Rxq(3) will be connected to Txq(2), Rxq(4) will be connected to Txq(3) and so on.
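
A tiny self-contained program illustrating that mapping (the constants
mirror the example above; this is just the index arithmetic, not testpmd
code):

	#include <stdio.h>

	int main(void)
	{
		const int nb_rxq = 3, nb_txq = 2, nb_hairpinq = 4;
		int i;

		/* Hairpin Rxq (nb_rxq + i) binds to hairpin Txq (nb_txq + i). */
		for (i = 0; i < nb_hairpinq; i++)
			printf("Rxq(%d) <-> Txq(%d)\n", nb_rxq + i, nb_txq + i);
		return 0;
	}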

> <...>
> 
> > @@ -2028,6 +2076,11 @@ struct extmem_param {
> >  	queueid_t qi;
> >  	struct rte_port *port;
> >  	struct rte_ether_addr mac_addr;
> > +	struct rte_eth_hairpin_conf hairpin_conf = {
> > +		.peer_count = 1,
> > +	};
> > +	int i;
> > +	struct rte_eth_hairpin_cap cap;
> >
> >  	if (port_id_is_invalid(pid, ENABLED_WARN))
> >  		return 0;
> > @@ -2060,9 +2113,16 @@ struct extmem_param {
> >  			configure_rxtx_dump_callbacks(0);
> >  			printf("Configuring Port %d (socket %u)\n", pi,
> >  					port->socket_id);
> > +			if (nb_hairpinq > 0 &&
> > +			    rte_eth_dev_hairpin_capability_get(pi, &cap)) {
> > +				printf("Port %d doesn't support hairpin "
> > +				       "queues\n", pi);
> > +				return -1;
> > +			}
> >  			/* configure port */
> > -			diag = rte_eth_dev_configure(pi, nb_rxq, nb_txq,
> > -						&(port->dev_conf));
> > +			diag = rte_eth_dev_configure(pi, nb_rxq +
> nb_hairpinq,
> > +						     nb_txq + nb_hairpinq,
> > +						     &(port->dev_conf));
> >  			if (diag != 0) {
> >  				if (rte_atomic16_cmpset(&(port->port_status),
> >  				RTE_PORT_HANDLING, RTE_PORT_STOPPED)
> == 0)
> > @@ -2155,6 +2215,51 @@ struct extmem_param {
> >  				port->need_reconfig_queues = 1;
> >  				return -1;
> >  			}
> > +			/* setup hairpin queues */
> > +			i = 0;
> > +			for (qi = nb_txq; qi < nb_hairpinq + nb_txq; qi++) {
> > +				hairpin_conf.peers[0].port = pi;
> > +				hairpin_conf.peers[0].queue = i + nb_rxq;
> > +				diag = rte_eth_tx_hairpin_queue_setup
> > +					(pi, qi, nb_txd, &hairpin_conf);
> > +				i++;
> > +				if (diag == 0)
> > +					continue;
> > +
> > +				/* Fail to setup rx queue, return */
> > +				if (rte_atomic16_cmpset(&(port->port_status),
> > +
> 	RTE_PORT_HANDLING,
> > +							RTE_PORT_STOPPED)
> == 0)
> > +					printf("Port %d can not be set back "
> > +							"to stopped\n", pi);
> > +				printf("Fail to configure port %d hairpin "
> > +				       "queues\n", pi);
> > +				/* try to reconfigure queues next time */
> > +				port->need_reconfig_queues = 1;
> > +				return -1;
> > +			}
> > +			i = 0;
> > +			for (qi = nb_rxq; qi < nb_hairpinq + nb_rxq; qi++) {
> > +				hairpin_conf.peers[0].port = pi;
> > +				hairpin_conf.peers[0].queue = i + nb_txq;
> > +				diag = rte_eth_rx_hairpin_queue_setup
> > +					(pi, qi, nb_rxd, &hairpin_conf);
> > +				i++;
> > +				if (diag == 0)
> > +					continue;
> > +
> > +				/* Fail to setup rx queue, return */
> > +				if (rte_atomic16_cmpset(&(port->port_status),
> > +
> 	RTE_PORT_HANDLING,
> > +							RTE_PORT_STOPPED)
> == 0)
> > +					printf("Port %d can not be set back "
> > +							"to stopped\n", pi);
> > +				printf("Fail to configure port %d hairpin "
> > +				       "queues\n", pi);
> > +				/* try to reconfigure queues next time */
> > +				port->need_reconfig_queues = 1;
> > +				return -1;
> > +			}
> 
> The 'start_port()' function is already huge; what do you think about moving
> the hairpin-related setup into a dedicated function and calling it only when
> "nb_hairpinq > 0"? This would make the hairpin-related config clearer and
> 'start_port()' simpler.
> I think all hairpin-related operations can be extracted, like the capability
> check, 'rte_eth_dev_configure' and the hairpin queue setup.

I have no strong feeling; I just wanted to keep the function in the same
format, with all actions done inside it. It was also my intention to clearly
show the connection between the two types of queues.

Best,
Ori

^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [dpdk-dev] [PATCH v7 08/14] app/testpmd: add hairpin support
  2019-10-31 17:36       ` Ori Kam
@ 2019-10-31 17:54         ` Ferruh Yigit
  2019-10-31 18:59           ` Ori Kam
  0 siblings, 1 reply; 186+ messages in thread
From: Ferruh Yigit @ 2019-10-31 17:54 UTC (permalink / raw)
  To: Ori Kam, Wenzhuo Lu, Jingjing Wu, Bernard Iremonger; +Cc: dev, stephen

On 10/31/2019 5:36 PM, Ori Kam wrote:
> Hi Ferruh,
> 
> Thanks for the comments, PSB,
> 
>> -----Original Message-----
>> From: Ferruh Yigit <ferruh.yigit@intel.com>
>> Sent: Thursday, October 31, 2019 7:12 PM
>> To: Ori Kam <orika@mellanox.com>; Wenzhuo Lu <wenzhuo.lu@intel.com>;
>> Jingjing Wu <jingjing.wu@intel.com>; Bernard Iremonger
>> <bernard.iremonger@intel.com>
>> Cc: dev@dpdk.org; stephen@networkplumber.org
>> Subject: Re: [dpdk-dev] [PATCH v7 08/14] app/testpmd: add hairpin support
>>
>> On 10/30/2019 11:53 PM, Ori Kam wrote:
>>> This commit introduces hairpin queues to testpmd.
>>> A hairpin queue is configured using --hairpinq=<n>;
>>> the option adds n queue objects to both the total number
>>> of Tx queues and Rx queues.
>>> The connection between the queues is 1 to 1: the first Rx hairpin queue
>>> will be connected to the first Tx hairpin queue.
>>>
>>> Signed-off-by: Ori Kam <orika@mellanox.com>
>>> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
>>> ---
>>>  app/test-pmd/parameters.c |  28 ++++++++++++
>>>  app/test-pmd/testpmd.c    | 109
>> +++++++++++++++++++++++++++++++++++++++++++++-
>>>  app/test-pmd/testpmd.h    |   3 ++
>>
>> The new parameter should be documented in
>> 'doc/guides/testpmd_app_ug/run_app.rst';
>> can you please briefly describe how it works and what the mapping will look
>> like by default, etc.?
>>
> 
> The default is no hairpin queues.
> If hairpinq=x is specified, then we add x queues to the Rx queue list and x
> queues to the Tx queue list, and bind them one Rx queue to one Tx queue.
> 
> For example, if the testpmd parameters are:
> --rxq=3 --txq=2 --hairpinq=4 the result will be:
> 
> 3 normal Rxq (queues 0,1,2)
> 2 normal Txq (queues 0,1)
> 4 hairpin queues (Rxq 3,4,5,6 Txq 2,3,4,5), where Rxq(3) will be connected to Txq(2), Rxq(4) will be connected to Txq(3) and so on.

Thanks, can you please put them into documentation in next version?

> 
>> <...>
>>
>>> @@ -2028,6 +2076,11 @@ struct extmem_param {
>>>  	queueid_t qi;
>>>  	struct rte_port *port;
>>>  	struct rte_ether_addr mac_addr;
>>> +	struct rte_eth_hairpin_conf hairpin_conf = {
>>> +		.peer_count = 1,
>>> +	};
>>> +	int i;
>>> +	struct rte_eth_hairpin_cap cap;
>>>
>>>  	if (port_id_is_invalid(pid, ENABLED_WARN))
>>>  		return 0;
>>> @@ -2060,9 +2113,16 @@ struct extmem_param {
>>>  			configure_rxtx_dump_callbacks(0);
>>>  			printf("Configuring Port %d (socket %u)\n", pi,
>>>  					port->socket_id);
>>> +			if (nb_hairpinq > 0 &&
>>> +			    rte_eth_dev_hairpin_capability_get(pi, &cap)) {
>>> +				printf("Port %d doesn't support hairpin "
>>> +				       "queues\n", pi);
>>> +				return -1;
>>> +			}
>>>  			/* configure port */
>>> -			diag = rte_eth_dev_configure(pi, nb_rxq, nb_txq,
>>> -						&(port->dev_conf));
>>> +			diag = rte_eth_dev_configure(pi, nb_rxq +
>> nb_hairpinq,
>>> +						     nb_txq + nb_hairpinq,
>>> +						     &(port->dev_conf));
>>>  			if (diag != 0) {
>>>  				if (rte_atomic16_cmpset(&(port->port_status),
>>>  				RTE_PORT_HANDLING, RTE_PORT_STOPPED)
>> == 0)
>>> @@ -2155,6 +2215,51 @@ struct extmem_param {
>>>  				port->need_reconfig_queues = 1;
>>>  				return -1;
>>>  			}
>>> +			/* setup hairpin queues */
>>> +			i = 0;
>>> +			for (qi = nb_txq; qi < nb_hairpinq + nb_txq; qi++) {
>>> +				hairpin_conf.peers[0].port = pi;
>>> +				hairpin_conf.peers[0].queue = i + nb_rxq;
>>> +				diag = rte_eth_tx_hairpin_queue_setup
>>> +					(pi, qi, nb_txd, &hairpin_conf);
>>> +				i++;
>>> +				if (diag == 0)
>>> +					continue;
>>> +
>>> +				/* Fail to setup rx queue, return */
>>> +				if (rte_atomic16_cmpset(&(port->port_status),
>>> +
>> 	RTE_PORT_HANDLING,
>>> +							RTE_PORT_STOPPED)
>> == 0)
>>> +					printf("Port %d can not be set back "
>>> +							"to stopped\n", pi);
>>> +				printf("Fail to configure port %d hairpin "
>>> +				       "queues\n", pi);
>>> +				/* try to reconfigure queues next time */
>>> +				port->need_reconfig_queues = 1;
>>> +				return -1;
>>> +			}
>>> +			i = 0;
>>> +			for (qi = nb_rxq; qi < nb_hairpinq + nb_rxq; qi++) {
>>> +				hairpin_conf.peers[0].port = pi;
>>> +				hairpin_conf.peers[0].queue = i + nb_txq;
>>> +				diag = rte_eth_rx_hairpin_queue_setup
>>> +					(pi, qi, nb_rxd, &hairpin_conf);
>>> +				i++;
>>> +				if (diag == 0)
>>> +					continue;
>>> +
>>> +				/* Fail to setup rx queue, return */
>>> +				if (rte_atomic16_cmpset(&(port->port_status),
>>> +
>> 	RTE_PORT_HANDLING,
>>> +							RTE_PORT_STOPPED)
>> == 0)
>>> +					printf("Port %d can not be set back "
>>> +							"to stopped\n", pi);
>>> +				printf("Fail to configure port %d hairpin "
>>> +				       "queues\n", pi);
>>> +				/* try to reconfigure queues next time */
>>> +				port->need_reconfig_queues = 1;
>>> +				return -1;
>>> +			}
>>
>> The 'start_port()' function is already huge; what do you think about moving
>> the hairpin-related setup into a dedicated function and calling it only when
>> "nb_hairpinq > 0"? This would make the hairpin-related config clearer and
>> 'start_port()' simpler.
>> I think all hairpin-related operations can be extracted, like the capability
>> check, 'rte_eth_dev_configure' and the hairpin queue setup.
> 
> I have no strong feeling; I just wanted to keep the function in the same
> format, with all actions done inside it. It was also my intention to clearly
> show the connection between the two types of queues.
> 
> Best,
> Ori
> 


^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [dpdk-dev] [PATCH v7 08/14] app/testpmd: add hairpin support
  2019-10-31 17:54         ` Ferruh Yigit
@ 2019-10-31 18:59           ` Ori Kam
  0 siblings, 0 replies; 186+ messages in thread
From: Ori Kam @ 2019-10-31 18:59 UTC (permalink / raw)
  To: Ferruh Yigit, Wenzhuo Lu, Jingjing Wu, Bernard Iremonger; +Cc: dev, stephen



> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@intel.com>
> Sent: Thursday, October 31, 2019 7:55 PM
> To: Ori Kam <orika@mellanox.com>; Wenzhuo Lu <wenzhuo.lu@intel.com>;
> Jingjing Wu <jingjing.wu@intel.com>; Bernard Iremonger
> <bernard.iremonger@intel.com>
> Cc: dev@dpdk.org; stephen@networkplumber.org
> Subject: Re: [dpdk-dev] [PATCH v7 08/14] app/testpmd: add hairpin support
> 
> On 10/31/2019 5:36 PM, Ori Kam wrote:
> > Hi Ferruh,
> >
> > Thanks for the comments, PSB,
> >
> >> -----Original Message-----
> >> From: Ferruh Yigit <ferruh.yigit@intel.com>
> >> Sent: Thursday, October 31, 2019 7:12 PM
> >> To: Ori Kam <orika@mellanox.com>; Wenzhuo Lu <wenzhuo.lu@intel.com>;
> >> Jingjing Wu <jingjing.wu@intel.com>; Bernard Iremonger
> >> <bernard.iremonger@intel.com>
> >> Cc: dev@dpdk.org; stephen@networkplumber.org
> >> Subject: Re: [dpdk-dev] [PATCH v7 08/14] app/testpmd: add hairpin support
> >>
> >> On 10/30/2019 11:53 PM, Ori Kam wrote:
> >>> This commit introduces hairpin queues to testpmd.
> >>> A hairpin queue is configured using --hairpinq=<n>;
> >>> the option adds n queue objects to both the total number
> >>> of Tx queues and Rx queues.
> >>> The connection between the queues is 1 to 1: the first Rx hairpin queue
> >>> will be connected to the first Tx hairpin queue.
> >>>
> >>> Signed-off-by: Ori Kam <orika@mellanox.com>
> >>> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> >>> ---
> >>>  app/test-pmd/parameters.c |  28 ++++++++++++
> >>>  app/test-pmd/testpmd.c    | 109
> >> +++++++++++++++++++++++++++++++++++++++++++++-
> >>>  app/test-pmd/testpmd.h    |   3 ++
> >>
> >> The new parameter should be documented in
> >> 'doc/guides/testpmd_app_ug/run_app.rst';
> >> can you please briefly describe how it works and what the mapping will
> >> look like by default, etc.?
> >>
> >
> > The default is no hairpin queues.
> > If hairpinq=x is specified, then we add x queues to the Rx queue list and
> > x queues to the Tx queue list, and bind them one Rx queue to one Tx queue.
> >
> > For example, if the testpmd parameters are:
> > --rxq=3 --txq=2 --hairpinq=4 the result will be:
> >
> > 3 normal Rxq (queues 0,1,2)
> > 2 normal Txq (queues 0,1)
> > 4 hairpin queues (Rxq 3,4,5,6 Txq 2,3,4,5), where Rxq(3) will be connected
> > to Txq(2), Rxq(4) will be connected to Txq(3) and so on.
> 
> Thanks, can you please put them into documentation in next version?
> 

Sure no problem,

> >
> >> <...>
> >>
> >>> @@ -2028,6 +2076,11 @@ struct extmem_param {
> >>>  	queueid_t qi;
> >>>  	struct rte_port *port;
> >>>  	struct rte_ether_addr mac_addr;
> >>> +	struct rte_eth_hairpin_conf hairpin_conf = {
> >>> +		.peer_count = 1,
> >>> +	};
> >>> +	int i;
> >>> +	struct rte_eth_hairpin_cap cap;
> >>>
> >>>  	if (port_id_is_invalid(pid, ENABLED_WARN))
> >>>  		return 0;
> >>> @@ -2060,9 +2113,16 @@ struct extmem_param {
> >>>  			configure_rxtx_dump_callbacks(0);
> >>>  			printf("Configuring Port %d (socket %u)\n", pi,
> >>>  					port->socket_id);
> >>> +			if (nb_hairpinq > 0 &&
> >>> +			    rte_eth_dev_hairpin_capability_get(pi, &cap)) {
> >>> +				printf("Port %d doesn't support hairpin "
> >>> +				       "queues\n", pi);
> >>> +				return -1;
> >>> +			}
> >>>  			/* configure port */
> >>> -			diag = rte_eth_dev_configure(pi, nb_rxq, nb_txq,
> >>> -						&(port->dev_conf));
> >>> +			diag = rte_eth_dev_configure(pi, nb_rxq +
> >> nb_hairpinq,
> >>> +						     nb_txq + nb_hairpinq,
> >>> +						     &(port->dev_conf));
> >>>  			if (diag != 0) {
> >>>  				if (rte_atomic16_cmpset(&(port->port_status),
> >>>  				RTE_PORT_HANDLING, RTE_PORT_STOPPED)
> >> == 0)
> >>> @@ -2155,6 +2215,51 @@ struct extmem_param {
> >>>  				port->need_reconfig_queues = 1;
> >>>  				return -1;
> >>>  			}
> >>> +			/* setup hairpin queues */
> >>> +			i = 0;
> >>> +			for (qi = nb_txq; qi < nb_hairpinq + nb_txq; qi++) {
> >>> +				hairpin_conf.peers[0].port = pi;
> >>> +				hairpin_conf.peers[0].queue = i + nb_rxq;
> >>> +				diag = rte_eth_tx_hairpin_queue_setup
> >>> +					(pi, qi, nb_txd, &hairpin_conf);
> >>> +				i++;
> >>> +				if (diag == 0)
> >>> +					continue;
> >>> +
> >>> +				/* Fail to setup rx queue, return */
> >>> +				if (rte_atomic16_cmpset(&(port->port_status),
> >>> +
> >> 	RTE_PORT_HANDLING,
> >>> +							RTE_PORT_STOPPED)
> >> == 0)
> >>> +					printf("Port %d can not be set back "
> >>> +							"to stopped\n", pi);
> >>> +				printf("Fail to configure port %d hairpin "
> >>> +				       "queues\n", pi);
> >>> +				/* try to reconfigure queues next time */
> >>> +				port->need_reconfig_queues = 1;
> >>> +				return -1;
> >>> +			}
> >>> +			i = 0;
> >>> +			for (qi = nb_rxq; qi < nb_hairpinq + nb_rxq; qi++) {
> >>> +				hairpin_conf.peers[0].port = pi;
> >>> +				hairpin_conf.peers[0].queue = i + nb_txq;
> >>> +				diag = rte_eth_rx_hairpin_queue_setup
> >>> +					(pi, qi, nb_rxd, &hairpin_conf);
> >>> +				i++;
> >>> +				if (diag == 0)
> >>> +					continue;
> >>> +
> >>> +				/* Fail to setup rx queue, return */
> >>> +				if (rte_atomic16_cmpset(&(port->port_status),
> >>> +
> >> 	RTE_PORT_HANDLING,
> >>> +							RTE_PORT_STOPPED)
> >> == 0)
> >>> +					printf("Port %d can not be set back "
> >>> +							"to stopped\n", pi);
> >>> +				printf("Fail to configure port %d hairpin "
> >>> +				       "queues\n", pi);
> >>> +				/* try to reconfigure queues next time */
> >>> +				port->need_reconfig_queues = 1;
> >>> +				return -1;
> >>> +			}
> >>
> >> The 'start_port()' function is already huge; what do you think about
> >> moving the hairpin-related setup into a dedicated function and calling it
> >> only when "nb_hairpinq > 0"? This would make the hairpin-related config
> >> clearer and 'start_port()' simpler.
> >> I think all hairpin-related operations can be extracted, like the
> >> capability check, 'rte_eth_dev_configure' and the hairpin queue setup.
> >
> > I have no strong feeling; I just wanted to keep the function in the same
> > format, with all actions done inside it. It was also my intention to
> > clearly show the connection between the two types of queues.
> >
> > Best,
> > Ori
> >

Thanks,
Ori

^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [dpdk-dev] [PATCH v7 02/14] ethdev: add support for hairpin queue
  2019-10-30 23:53   ` [dpdk-dev] [PATCH v7 02/14] ethdev: add support for hairpin queue Ori Kam
  2019-10-31  8:25     ` Andrew Rybchenko
@ 2019-11-05 11:24     ` Ferruh Yigit
  2019-11-05 11:36       ` Ori Kam
  1 sibling, 1 reply; 186+ messages in thread
From: Ferruh Yigit @ 2019-11-05 11:24 UTC (permalink / raw)
  To: Ori Kam, John McNamara, Marko Kovacevic, Thomas Monjalon,
	Andrew Rybchenko
  Cc: dev, jingjing.wu, stephen, Jerin Jacob

On 10/30/2019 11:53 PM, Ori Kam wrote:
> This commit introduces the hairpin queue type.
> 
> A hairpin queue is built from an Rx queue bound to a Tx queue.
> It is used to offload traffic coming from the wire and redirect it back
> to the wire.
> 
> There are 3 new functions:
> - rte_eth_dev_hairpin_capability_get
> - rte_eth_rx_hairpin_queue_setup
> - rte_eth_tx_hairpin_queue_setup
> 
> In order to use the queue, there is a need to create rte_flow
> with queue / RSS action that targets one or more of the Rx queues.
> 
> Signed-off-by: Ori Kam <orika@mellanox.com>
> Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>

<...>

>  #include <rte_ethdev_core.h>
>  
>  /**
> + * @internal
> + * Check if the selected Rx queue is hairpin queue.
> + *
> + * @param dev
> + *  Pointer to the selected device.
> + * @param queue_id
> + *  The selected queue.
> + *
> + * @return
> + *   - (1) if the queue is hairpin queue, 0 otherwise.
> + */
> +int
> +rte_eth_dev_is_rx_hairpin_queue(struct rte_eth_dev *dev, uint16_t queue_id);
> +
> +/**
> + * @internal
> + * Check if the selected Tx queue is hairpin queue.
> + *
> + * @param dev
> + *  Pointer to the selected device.
> + * @param queue_id
> + *  The selected queue.
> + *
> + * @return
> + *   - (1) if the queue is hairpin queue, 0 otherwise.
> + */
> +int
> +rte_eth_dev_is_tx_hairpin_queue(struct rte_eth_dev *dev, uint16_t queue_id);
> +
> +/**

If these functions are internal, why are they in 'rte_ethdev.h'?

>   *
>   * Retrieve a burst of input packets from a receive queue of an Ethernet
>   * device. The retrieved packets are stored in *rte_mbuf* structures whose
> @@ -4251,6 +4400,11 @@ int rte_eth_dev_adjust_nb_rx_tx_desc(uint16_t port_id,
>  		RTE_ETHDEV_LOG(ERR, "Invalid RX queue_id=%u\n", queue_id);
>  		return 0;
>  	}
> +	if (rte_eth_dev_is_rx_hairpin_queue(dev, queue_id)) {
> +		RTE_ETHDEV_LOG(ERR, "Rx burst failed, queue_id=%u is hairpin queue\n",
> +			       queue_id);
> +		return 0;
> +	}
>  #endif
>  	nb_rx = (*dev->rx_pkt_burst)(dev->data->rx_queues[queue_id],
>  				     rx_pkts, nb_pkts);
> @@ -4517,6 +4671,11 @@ static inline int rte_eth_tx_descriptor_status(uint16_t port_id,
>  		RTE_ETHDEV_LOG(ERR, "Invalid TX queue_id=%u\n", queue_id);
>  		return 0;
>  	}
> +	if (rte_eth_dev_is_tx_hairpin_queue(dev, queue_id)) {
> +		RTE_ETHDEV_LOG(ERR, "Tx burst failed, queue_id=%u is hairpin queue\n",
> +			       queue_id);
> +		return 0;
> +	}
>  #endif

Hi Ori,

These are causing a build error (thanks, Jerin, for catching it): because they
are internal but called by a public static inline API, whoever calls the
'rte_eth_rx/tx_burst()' APIs in a shared build can't find the
'rte_eth_dev_is_rx/tx_hairpin_queue()' functions [1].

As far as I can see there are two options:
1) Remove these checks.
2) Make 'rte_eth_dev_is_rx/tx_hairpin_queue()' public API instead of internal.

If there is value in making 'rte_eth_dev_is_rx/tx_hairpin_queue()' public API
we should go with (2), else (1).
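
For clarity, a hedged, self-contained reduction of the failure mode
(illustrative names, not the real ethdev code): the body of a public
static inline lives in the header and is compiled into the application,
so every symbol it references must be exported from the shared library,
otherwise the application link fails exactly as in [1]:

	#include <stdint.h>

	/* Defined inside the shared library; if it is not listed in the
	 * version map, the symbol is hidden from applications. */
	int lib_internal_check(uint16_t queue_id);

	/* Public static inline from the library header: the call below is
	 * resolved at *application* link time against the .so exports. */
	static inline uint16_t
	lib_public_inline_burst(uint16_t queue_id)
	{
		if (lib_internal_check(queue_id))
			return 0;
		return 1;
	}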



[1]
/usr/bin/ld: rte_event_eth_rx_adapter.o: in function `rxa_eth_rx':
rte_event_eth_rx_adapter.c:(.text+0x1728): undefined reference to
`rte_eth_dev_is_rx_hairpin_queue'
/usr/bin/ld: rte_event_eth_rx_adapter.o: in function `rxa_service_func':
rte_event_eth_rx_adapter.c:(.text+0x22ab): undefined reference to
`rte_eth_dev_is_rx_hairpin_queue'
/usr/bin/ld: rte_event_eth_tx_adapter.o: in function `txa_service_buffer_retry':
rte_event_eth_tx_adapter.c:(.text+0xa43): undefined reference to
`rte_eth_dev_is_tx_hairpin_queue'
/usr/bin/ld: rte_event_eth_tx_adapter.o: in function `txa_service_func':
rte_event_eth_tx_adapter.c:(.text+0xe7d): undefined reference to
`rte_eth_dev_is_tx_hairpin_queue'
/usr/bin/ld: rte_event_eth_tx_adapter.c:(.text+0x1155): undefined reference to
`rte_eth_dev_is_tx_hairpin_queue'
collect2: error: ld returned 1 exit status

^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [dpdk-dev] [PATCH v7 02/14] ethdev: add support for hairpin queue
  2019-11-05 11:24     ` Ferruh Yigit
@ 2019-11-05 11:36       ` Ori Kam
  2019-11-05 11:49         ` Andrew Rybchenko
  0 siblings, 1 reply; 186+ messages in thread
From: Ori Kam @ 2019-11-05 11:36 UTC (permalink / raw)
  To: Ferruh Yigit, John McNamara, Marko Kovacevic, Thomas Monjalon,
	Andrew Rybchenko
  Cc: dev, jingjing.wu, stephen, Jerin Jacob



> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@intel.com>
> Sent: Tuesday, November 5, 2019 1:25 PM
> To: Ori Kam <orika@mellanox.com>; John McNamara
> <john.mcnamara@intel.com>; Marko Kovacevic
> <marko.kovacevic@intel.com>; Thomas Monjalon <thomas@monjalon.net>;
> Andrew Rybchenko <arybchenko@solarflare.com>
> Cc: dev@dpdk.org; jingjing.wu@intel.com; stephen@networkplumber.org; Jerin
> Jacob <jerin.jacob@caviumnetworks.com>
> Subject: Re: [PATCH v7 02/14] ethdev: add support for hairpin queue
> 
> On 10/30/2019 11:53 PM, Ori Kam wrote:
> > This commit introduces the hairpin queue type.
> >
> > A hairpin queue is built from an Rx queue bound to a Tx queue.
> > It is used to offload traffic coming from the wire and redirect it back
> > to the wire.
> >
> > There are 3 new functions:
> > - rte_eth_dev_hairpin_capability_get
> > - rte_eth_rx_hairpin_queue_setup
> > - rte_eth_tx_hairpin_queue_setup
> >
> > In order to use the queue, there is a need to create rte_flow
> > with queue / RSS action that targets one or more of the Rx queues.
> >
> > Signed-off-by: Ori Kam <orika@mellanox.com>
> > Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
> 
> <...>
> 
> >  #include <rte_ethdev_core.h>
> >
> >  /**
> > + * @internal
> > + * Check if the selected Rx queue is hairpin queue.
> > + *
> > + * @param dev
> > + *  Pointer to the selected device.
> > + * @param queue_id
> > + *  The selected queue.
> > + *
> > + * @return
> > + *   - (1) if the queue is hairpin queue, 0 otherwise.
> > + */
> > +int
> > +rte_eth_dev_is_rx_hairpin_queue(struct rte_eth_dev *dev, uint16_t
> queue_id);
> > +
> > +/**
> > + * @internal
> > + * Check if the selected Tx queue is hairpin queue.
> > + *
> > + * @param dev
> > + *  Pointer to the selected device.
> > + * @param queue_id
> > + *  The selected queue.
> > + *
> > + * @return
> > + *   - (1) if the queue is hairpin queue, 0 otherwise.
> > + */
> > +int
> > +rte_eth_dev_is_tx_hairpin_queue(struct rte_eth_dev *dev, uint16_t
> queue_id);
> > +
> > +/**
> 
> If these functions are internal, why are they in 'rte_ethdev.h'?
> 
> >   *
> >   * Retrieve a burst of input packets from a receive queue of an Ethernet
> >   * device. The retrieved packets are stored in *rte_mbuf* structures whose
> > @@ -4251,6 +4400,11 @@ int rte_eth_dev_adjust_nb_rx_tx_desc(uint16_t
> port_id,
> >  		RTE_ETHDEV_LOG(ERR, "Invalid RX queue_id=%u\n",
> queue_id);
> >  		return 0;
> >  	}
> > +	if (rte_eth_dev_is_rx_hairpin_queue(dev, queue_id)) {
> > +		RTE_ETHDEV_LOG(ERR, "Rx burst failed, queue_id=%u is
> hairpin queue\n",
> > +			       queue_id);
> > +		return 0;
> > +	}
> >  #endif
> >  	nb_rx = (*dev->rx_pkt_burst)(dev->data->rx_queues[queue_id],
> >  				     rx_pkts, nb_pkts);
> > @@ -4517,6 +4671,11 @@ static inline int
> rte_eth_tx_descriptor_status(uint16_t port_id,
> >  		RTE_ETHDEV_LOG(ERR, "Invalid TX queue_id=%u\n",
> queue_id);
> >  		return 0;
> >  	}
> > +	if (rte_eth_dev_is_tx_hairpin_queue(dev, queue_id)) {
> > +		RTE_ETHDEV_LOG(ERR, "Tx burst failed, queue_id=%u is
> hairpin queue\n",
> > +			       queue_id);
> > +		return 0;
> > +	}
> >  #endif
> 
> Hi Ori,
> 
> These are causing a build error (thanks, Jerin, for catching it): because
> they are internal but called by a public static inline API, whoever calls
> the 'rte_eth_rx/tx_burst()' APIs in a shared build can't find the
> 'rte_eth_dev_is_rx/tx_hairpin_queue()' functions [1].
> 
> As far as I can see there are two options:
> 1) Remove these checks.
> 2) Make 'rte_eth_dev_is_rx/tx_hairpin_queue()' public API instead of
> internal.
> 
> If there is value in making 'rte_eth_dev_is_rx/tx_hairpin_queue()' public
> API we should go with (2), else (1).
>

I think we can skip these checks,
but it was Andrew's request so we must get his response.
It was also his emphasis that they should be internal.


> 
> 
> [1]
> /usr/bin/ld: rte_event_eth_rx_adapter.o: in function `rxa_eth_rx':
> rte_event_eth_rx_adapter.c:(.text+0x1728): undefined reference to
> `rte_eth_dev_is_rx_hairpin_queue'
> /usr/bin/ld: rte_event_eth_rx_adapter.o: in function `rxa_service_func':
> rte_event_eth_rx_adapter.c:(.text+0x22ab): undefined reference to
> `rte_eth_dev_is_rx_hairpin_queue'
> /usr/bin/ld: rte_event_eth_tx_adapter.o: in function `txa_service_buffer_retry':
> rte_event_eth_tx_adapter.c:(.text+0xa43): undefined reference to
> `rte_eth_dev_is_tx_hairpin_queue'
> /usr/bin/ld: rte_event_eth_tx_adapter.o: in function `txa_service_func':
> rte_event_eth_tx_adapter.c:(.text+0xe7d): undefined reference to
> `rte_eth_dev_is_tx_hairpin_queue'
> /usr/bin/ld: rte_event_eth_tx_adapter.c:(.text+0x1155): undefined reference to
> `rte_eth_dev_is_tx_hairpin_queue'
> collect2: error: ld returned 1 exit status
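
The failure mode above is generic: a public static inline function is compiled
into every consumer, so every symbol it calls must be exported from the shared
library. A minimal sketch of the pattern, with hypothetical names (this is not
the actual ethdev code):

    /* libfoo/public_header.h */
    int foo_internal_check(void *obj);  /* defined in libfoo, but deliberately
                                         * left out of its .map export list */

    static inline int
    foo_fast_path(void *obj)
    {
            /* The body is inlined into the application, so this call is
             * resolved against libfoo.so when the application links,
             * and fails if foo_internal_check is not exported. */
            return foo_internal_check(obj);
    }

A static build hides the problem, because the object file that defines
foo_internal_check() is pulled into the final binary regardless of any
export list.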

^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [dpdk-dev] [PATCH v7 02/14] ethdev: add support for hairpin queue
  2019-11-05 11:36       ` Ori Kam
@ 2019-11-05 11:49         ` Andrew Rybchenko
  2019-11-05 12:00           ` Ori Kam
  2019-11-05 12:05           ` Ferruh Yigit
  0 siblings, 2 replies; 186+ messages in thread
From: Andrew Rybchenko @ 2019-11-05 11:49 UTC (permalink / raw)
  To: Ori Kam, Ferruh Yigit, John McNamara, Marko Kovacevic, Thomas Monjalon
  Cc: dev, jingjing.wu, stephen, Jerin Jacob

On 11/5/19 2:36 PM, Ori Kam wrote:
> 
> 
>> -----Original Message-----
>> From: Ferruh Yigit <ferruh.yigit@intel.com>
>> Sent: Tuesday, November 5, 2019 1:25 PM
>> To: Ori Kam <orika@mellanox.com>; John McNamara
>> <john.mcnamara@intel.com>; Marko Kovacevic
>> <marko.kovacevic@intel.com>; Thomas Monjalon <thomas@monjalon.net>;
>> Andrew Rybchenko <arybchenko@solarflare.com>
>> Cc: dev@dpdk.org; jingjing.wu@intel.com; stephen@networkplumber.org; Jerin
>> Jacob <jerin.jacob@caviumnetworks.com>
>> Subject: Re: [PATCH v7 02/14] ethdev: add support for hairpin queue
>>
>> On 10/30/2019 11:53 PM, Ori Kam wrote:
>>> This commit introduces the hairpin queue type.
>>>
>>> The hairpin queue is built from an Rx queue bound to a Tx queue.
>>> It is used to offload traffic coming from the wire and redirect it back
>>> to the wire.
>>>
>>> There are 3 new functions:
>>> - rte_eth_dev_hairpin_capability_get
>>> - rte_eth_rx_hairpin_queue_setup
>>> - rte_eth_tx_hairpin_queue_setup
>>>
>>> In order to use the queue, there is a need to create rte_flow
>>> with queue / RSS action that targets one or more of the Rx queues.
>>>
>>> Signed-off-by: Ori Kam <orika@mellanox.com>
>>> Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
>>
>> <...>
>>
>>>  #include <rte_ethdev_core.h>
>>>
>>>  /**
>>> + * @internal
>>> + * Check if the selected Rx queue is hairpin queue.
>>> + *
>>> + * @param dev
>>> + *  Pointer to the selected device.
>>> + * @param queue_id
>>> + *  The selected queue.
>>> + *
>>> + * @return
>>> + *   - (1) if the queue is hairpin queue, 0 otherwise.
>>> + */
>>> +int
>>> +rte_eth_dev_is_rx_hairpin_queue(struct rte_eth_dev *dev, uint16_t
>> queue_id);
>>> +
>>> +/**
>>> + * @internal
>>> + * Check if the selected Tx queue is hairpin queue.
>>> + *
>>> + * @param dev
>>> + *  Pointer to the selected device.
>>> + * @param queue_id
>>> + *  The selected queue.
>>> + *
>>> + * @return
>>> + *   - (1) if the queue is hairpin queue, 0 otherwise.
>>> + */
>>> +int
>>> +rte_eth_dev_is_tx_hairpin_queue(struct rte_eth_dev *dev, uint16_t
>> queue_id);
>>> +
>>> +/**
>>
>> If these functions are internal, why are they in 'rte_ethdev.h'?
>>
>>>   *
>>>   * Retrieve a burst of input packets from a receive queue of an Ethernet
>>>   * device. The retrieved packets are stored in *rte_mbuf* structures whose
>>> @@ -4251,6 +4400,11 @@ int rte_eth_dev_adjust_nb_rx_tx_desc(uint16_t
>> port_id,
>>>  		RTE_ETHDEV_LOG(ERR, "Invalid RX queue_id=%u\n",
>> queue_id);
>>>  		return 0;
>>>  	}
>>> +	if (rte_eth_dev_is_rx_hairpin_queue(dev, queue_id)) {
>>> +		RTE_ETHDEV_LOG(ERR, "Rx burst failed, queue_id=%u is
>> hairpin queue\n",
>>> +			       queue_id);
>>> +		return 0;
>>> +	}
>>>  #endif
>>>  	nb_rx = (*dev->rx_pkt_burst)(dev->data->rx_queues[queue_id],
>>>  				     rx_pkts, nb_pkts);
>>> @@ -4517,6 +4671,11 @@ static inline int
>> rte_eth_tx_descriptor_status(uint16_t port_id,
>>>  		RTE_ETHDEV_LOG(ERR, "Invalid TX queue_id=%u\n",
>> queue_id);
>>>  		return 0;
>>>  	}
>>> +	if (rte_eth_dev_is_tx_hairpin_queue(dev, queue_id)) {
>>> +		RTE_ETHDEV_LOG(ERR, "Tx burst failed, queue_id=%u is
>> hairpin queue\n",
>>> +			       queue_id);
>>> +		return 0;
>>> +	}
>>>  #endif
>>
>> Hi Ori,
>>
>> These are causing build error, thanks Jerin for catching, because they are
>> internal and called by a public static inline API, so whoever calls
>> 'rte_eth_rx/tx_burst()' APIs in the shared build, can't find
>> 'rte_eth_dev_is_rx/tx_hairpin_queue()' functions [1],
>>
>> as far as I can see there are two options:
>> 1) Remove these checks
>> 2) Make 'rte_eth_dev_is_rx/tx_hairpin_queue()' public API instead of internal
>>
>> If there is a value to make 'rte_eth_dev_is_rx/tx_hairpin_queue()' public API
>> we
>> should go with (2) else (1).
>>
> 
> I think we can skip the tests,
> but it was Andrew's request so we must get his response.
> It was also his emphasis that they should be internal.

It is important for me to keep rte_eth_dev_state internal and
few patches ago rte_eth_dev_is_rx_hairpin_queue() was inline.
I'm OK to make the function experimental or keep it internal
(no API/ABI stability requirements) but externally visible (in .map).
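
Concretely, "externally visible (in .map)" means listing the symbols in the
ethdev version script so that a shared build can resolve them, even while the
doxygen comment keeps calling them internal. A sketch, assuming they go under
the EXPERIMENTAL node of rte_ethdev_version.map:

    EXPERIMENTAL {
            global:

            rte_eth_dev_is_rx_hairpin_queue;
            rte_eth_dev_is_tx_hairpin_queue;
    };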

>> [1]
>> /usr/bin/ld: rte_event_eth_rx_adapter.o: in function `rxa_eth_rx':
>> rte_event_eth_rx_adapter.c:(.text+0x1728): undefined reference to
>> `rte_eth_dev_is_rx_hairpin_queue'
>> /usr/bin/ld: rte_event_eth_rx_adapter.o: in function `rxa_service_func':
>> rte_event_eth_rx_adapter.c:(.text+0x22ab): undefined reference to
>> `rte_eth_dev_is_rx_hairpin_queue'
>> /usr/bin/ld: rte_event_eth_tx_adapter.o: in function `txa_service_buffer_retry':
>> rte_event_eth_tx_adapter.c:(.text+0xa43): undefined reference to
>> `rte_eth_dev_is_tx_hairpin_queue'
>> /usr/bin/ld: rte_event_eth_tx_adapter.o: in function `txa_service_func':
>> rte_event_eth_tx_adapter.c:(.text+0xe7d): undefined reference to
>> `rte_eth_dev_is_tx_hairpin_queue'
>> /usr/bin/ld: rte_event_eth_tx_adapter.c:(.text+0x1155): undefined reference to
>> `rte_eth_dev_is_tx_hairpin_queue'
>> collect2: error: ld returned 1 exit status


^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [dpdk-dev] [PATCH v7 02/14] ethdev: add support for hairpin queue
  2019-11-05 11:49         ` Andrew Rybchenko
@ 2019-11-05 12:00           ` Ori Kam
  2019-11-05 12:05           ` Ferruh Yigit
  1 sibling, 0 replies; 186+ messages in thread
From: Ori Kam @ 2019-11-05 12:00 UTC (permalink / raw)
  To: Andrew Rybchenko, Ferruh Yigit, John McNamara, Marko Kovacevic,
	Thomas Monjalon
  Cc: dev, jingjing.wu, stephen, Jerin Jacob



> -----Original Message-----
> From: Andrew Rybchenko <arybchenko@solarflare.com>
> Sent: Tuesday, November 5, 2019 1:49 PM
> To: Ori Kam <orika@mellanox.com>; Ferruh Yigit <ferruh.yigit@intel.com>;
> John McNamara <john.mcnamara@intel.com>; Marko Kovacevic
> <marko.kovacevic@intel.com>; Thomas Monjalon <thomas@monjalon.net>
> Cc: dev@dpdk.org; jingjing.wu@intel.com; stephen@networkplumber.org; Jerin
> Jacob <jerin.jacob@caviumnetworks.com>
> Subject: Re: [dpdk-dev] [PATCH v7 02/14] ethdev: add support for hairpin queue
> 
> On 11/5/19 2:36 PM, Ori Kam wrote:
> >
> >
> >> -----Original Message-----
> >> From: Ferruh Yigit <ferruh.yigit@intel.com>
> >> Sent: Tuesday, November 5, 2019 1:25 PM
> >> To: Ori Kam <orika@mellanox.com>; John McNamara
> >> <john.mcnamara@intel.com>; Marko Kovacevic
> >> <marko.kovacevic@intel.com>; Thomas Monjalon <thomas@monjalon.net>;
> >> Andrew Rybchenko <arybchenko@solarflare.com>
> >> Cc: dev@dpdk.org; jingjing.wu@intel.com; stephen@networkplumber.org;
> Jerin
> >> Jacob <jerin.jacob@caviumnetworks.com>
> >> Subject: Re: [PATCH v7 02/14] ethdev: add support for hairpin queue
> >>
> >> On 10/30/2019 11:53 PM, Ori Kam wrote:
> >>> This commit introduces the hairpin queue type.
> >>>
> >>> The hairpin queue is built from an Rx queue bound to a Tx queue.
> >>> It is used to offload traffic coming from the wire and redirect it back
> >>> to the wire.
> >>>
> >>> There are 3 new functions:
> >>> - rte_eth_dev_hairpin_capability_get
> >>> - rte_eth_rx_hairpin_queue_setup
> >>> - rte_eth_tx_hairpin_queue_setup
> >>>
> >>> In order to use the queue, there is a need to create rte_flow
> >>> with queue / RSS action that targets one or more of the Rx queues.
> >>>
> >>> Signed-off-by: Ori Kam <orika@mellanox.com>
> >>> Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
> >>
> >> <...>
> >>
> >>>  #include <rte_ethdev_core.h>
> >>>
> >>>  /**
> >>> + * @internal
> >>> + * Check if the selected Rx queue is hairpin queue.
> >>> + *
> >>> + * @param dev
> >>> + *  Pointer to the selected device.
> >>> + * @param queue_id
> >>> + *  The selected queue.
> >>> + *
> >>> + * @return
> >>> + *   - (1) if the queue is hairpin queue, 0 otherwise.
> >>> + */
> >>> +int
> >>> +rte_eth_dev_is_rx_hairpin_queue(struct rte_eth_dev *dev, uint16_t
> >> queue_id);
> >>> +
> >>> +/**
> >>> + * @internal
> >>> + * Check if the selected Tx queue is hairpin queue.
> >>> + *
> >>> + * @param dev
> >>> + *  Pointer to the selected device.
> >>> + * @param queue_id
> >>> + *  The selected queue.
> >>> + *
> >>> + * @return
> >>> + *   - (1) if the queue is hairpin queue, 0 otherwise.
> >>> + */
> >>> +int
> >>> +rte_eth_dev_is_tx_hairpin_queue(struct rte_eth_dev *dev, uint16_t
> >> queue_id);
> >>> +
> >>> +/**
> >>
> >> If these functions are internal, why are they in 'rte_ethdev.h'?
> >>
> >>>   *
> >>>   * Retrieve a burst of input packets from a receive queue of an Ethernet
> >>>   * device. The retrieved packets are stored in *rte_mbuf* structures
> whose
> >>> @@ -4251,6 +4400,11 @@ int rte_eth_dev_adjust_nb_rx_tx_desc(uint16_t
> >> port_id,
> >>>  		RTE_ETHDEV_LOG(ERR, "Invalid RX queue_id=%u\n",
> >> queue_id);
> >>>  		return 0;
> >>>  	}
> >>> +	if (rte_eth_dev_is_rx_hairpin_queue(dev, queue_id)) {
> >>> +		RTE_ETHDEV_LOG(ERR, "Rx burst failed, queue_id=%u is
> >> hairpin queue\n",
> >>> +			       queue_id);
> >>> +		return 0;
> >>> +	}
> >>>  #endif
> >>>  	nb_rx = (*dev->rx_pkt_burst)(dev->data->rx_queues[queue_id],
> >>>  				     rx_pkts, nb_pkts);
> >>> @@ -4517,6 +4671,11 @@ static inline int
> >> rte_eth_tx_descriptor_status(uint16_t port_id,
> >>>  		RTE_ETHDEV_LOG(ERR, "Invalid TX queue_id=%u\n",
> >> queue_id);
> >>>  		return 0;
> >>>  	}
> >>> +	if (rte_eth_dev_is_tx_hairpin_queue(dev, queue_id)) {
> >>> +		RTE_ETHDEV_LOG(ERR, "Tx burst failed, queue_id=%u is
> >> hairpin queue\n",
> >>> +			       queue_id);
> >>> +		return 0;
> >>> +	}
> >>>  #endif
> >>
> >> Hi Ori,
> >>
> >> These are causing build error, thanks Jerin for catching, because they are
> >> internal and called by a public static inline API, so whoever calls
> >> 'rte_eth_rx/tx_burst()' APIs in the shared build, can't find
> >> 'rte_eth_dev_is_rx/tx_hairpin_queue()' functions [1],
> >>
> >> as far as I can see there are two options:
> >> 1) Remove these checks
> >> 2) Make 'rte_eth_dev_is_rx/tx_hairpin_queue()' public API instead of
> internal
> >>
> >> If there is a value to make 'rte_eth_dev_is_rx/tx_hairpin_queue()' public API
> >> we
> >> should go with (2) else (1).
> >>
> >
> > I think we can skip the tests,
> > but it was Andrew's request so we must get his response.
> > It was also his emphasis that they should be internal.
> 
> It is important for me to keep rte_eth_dev_state internal and
> few patches ago rte_eth_dev_is_rx_hairpin_queue() was inline.
> I'm OK to make the function experimental or keep it internal
> (no API/ABI stability requirements) but externally visible (in .map).
> 

Just to make sure I understand: you mean just adding is_rx_hairpin_queue to the map file, right?


> >> [1]
> >> /usr/bin/ld: rte_event_eth_rx_adapter.o: in function `rxa_eth_rx':
> >> rte_event_eth_rx_adapter.c:(.text+0x1728): undefined reference to
> >> `rte_eth_dev_is_rx_hairpin_queue'
> >> /usr/bin/ld: rte_event_eth_rx_adapter.o: in function `rxa_service_func':
> >> rte_event_eth_rx_adapter.c:(.text+0x22ab): undefined reference to
> >> `rte_eth_dev_is_rx_hairpin_queue'
> >> /usr/bin/ld: rte_event_eth_tx_adapter.o: in function
> `txa_service_buffer_retry':
> >> rte_event_eth_tx_adapter.c:(.text+0xa43): undefined reference to
> >> `rte_eth_dev_is_tx_hairpin_queue'
> >> /usr/bin/ld: rte_event_eth_tx_adapter.o: in function `txa_service_func':
> >> rte_event_eth_tx_adapter.c:(.text+0xe7d): undefined reference to
> >> `rte_eth_dev_is_tx_hairpin_queue'
> >> /usr/bin/ld: rte_event_eth_tx_adapter.c:(.text+0x1155): undefined reference
> to
> >> `rte_eth_dev_is_tx_hairpin_queue'
> >> collect2: error: ld returned 1 exit status


^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [dpdk-dev] [PATCH v7 02/14] ethdev: add support for hairpin queue
  2019-11-05 11:49         ` Andrew Rybchenko
  2019-11-05 12:00           ` Ori Kam
@ 2019-11-05 12:05           ` Ferruh Yigit
  2019-11-05 12:12             ` Andrew Rybchenko
  1 sibling, 1 reply; 186+ messages in thread
From: Ferruh Yigit @ 2019-11-05 12:05 UTC (permalink / raw)
  To: Andrew Rybchenko, Ori Kam, John McNamara, Marko Kovacevic,
	Thomas Monjalon
  Cc: dev, jingjing.wu, stephen, Jerin Jacob

On 11/5/2019 11:49 AM, Andrew Rybchenko wrote:
> On 11/5/19 2:36 PM, Ori Kam wrote:
>>
>>
>>> -----Original Message-----
>>> From: Ferruh Yigit <ferruh.yigit@intel.com>
>>> Sent: Tuesday, November 5, 2019 1:25 PM
>>> To: Ori Kam <orika@mellanox.com>; John McNamara
>>> <john.mcnamara@intel.com>; Marko Kovacevic
>>> <marko.kovacevic@intel.com>; Thomas Monjalon <thomas@monjalon.net>;
>>> Andrew Rybchenko <arybchenko@solarflare.com>
>>> Cc: dev@dpdk.org; jingjing.wu@intel.com; stephen@networkplumber.org; Jerin
>>> Jacob <jerin.jacob@caviumnetworks.com>
>>> Subject: Re: [PATCH v7 02/14] ethdev: add support for hairpin queue
>>>
>>> On 10/30/2019 11:53 PM, Ori Kam wrote:
>>>> This commit introduces the hairpin queue type.
>>>>
>>>> The hairpin queue is built from an Rx queue bound to a Tx queue.
>>>> It is used to offload traffic coming from the wire and redirect it back
>>>> to the wire.
>>>>
>>>> There are 3 new functions:
>>>> - rte_eth_dev_hairpin_capability_get
>>>> - rte_eth_rx_hairpin_queue_setup
>>>> - rte_eth_tx_hairpin_queue_setup
>>>>
>>>> In order to use the queue, there is a need to create rte_flow
>>>> with queue / RSS action that targets one or more of the Rx queues.
>>>>
>>>> Signed-off-by: Ori Kam <orika@mellanox.com>
>>>> Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
>>>
>>> <...>
>>>
>>>>  #include <rte_ethdev_core.h>
>>>>
>>>>  /**
>>>> + * @internal
>>>> + * Check if the selected Rx queue is hairpin queue.
>>>> + *
>>>> + * @param dev
>>>> + *  Pointer to the selected device.
>>>> + * @param queue_id
>>>> + *  The selected queue.
>>>> + *
>>>> + * @return
>>>> + *   - (1) if the queue is hairpin queue, 0 otherwise.
>>>> + */
>>>> +int
>>>> +rte_eth_dev_is_rx_hairpin_queue(struct rte_eth_dev *dev, uint16_t
>>> queue_id);
>>>> +
>>>> +/**
>>>> + * @internal
>>>> + * Check if the selected Tx queue is hairpin queue.
>>>> + *
>>>> + * @param dev
>>>> + *  Pointer to the selected device.
>>>> + * @param queue_id
>>>> + *  The selected queue.
>>>> + *
>>>> + * @return
>>>> + *   - (1) if the queue is hairpin queue, 0 otherwise.
>>>> + */
>>>> +int
>>>> +rte_eth_dev_is_tx_hairpin_queue(struct rte_eth_dev *dev, uint16_t
>>> queue_id);
>>>> +
>>>> +/**
>>>
>>> If these functions are internal, why are they in 'rte_ethdev.h'?
>>>
>>>>   *
>>>>   * Retrieve a burst of input packets from a receive queue of an Ethernet
>>>>   * device. The retrieved packets are stored in *rte_mbuf* structures whose
>>>> @@ -4251,6 +4400,11 @@ int rte_eth_dev_adjust_nb_rx_tx_desc(uint16_t
>>> port_id,
>>>>  		RTE_ETHDEV_LOG(ERR, "Invalid RX queue_id=%u\n",
>>> queue_id);
>>>>  		return 0;
>>>>  	}
>>>> +	if (rte_eth_dev_is_rx_hairpin_queue(dev, queue_id)) {
>>>> +		RTE_ETHDEV_LOG(ERR, "Rx burst failed, queue_id=%u is
>>> hairpin queue\n",
>>>> +			       queue_id);
>>>> +		return 0;
>>>> +	}
>>>>  #endif
>>>>  	nb_rx = (*dev->rx_pkt_burst)(dev->data->rx_queues[queue_id],
>>>>  				     rx_pkts, nb_pkts);
>>>> @@ -4517,6 +4671,11 @@ static inline int
>>> rte_eth_tx_descriptor_status(uint16_t port_id,
>>>>  		RTE_ETHDEV_LOG(ERR, "Invalid TX queue_id=%u\n",
>>> queue_id);
>>>>  		return 0;
>>>>  	}
>>>> +	if (rte_eth_dev_is_tx_hairpin_queue(dev, queue_id)) {
>>>> +		RTE_ETHDEV_LOG(ERR, "Tx burst failed, queue_id=%u is
>>> hairpin queue\n",
>>>> +			       queue_id);
>>>> +		return 0;
>>>> +	}
>>>>  #endif
>>>
>>> Hi Ori,
>>>
>>> These are causing build error, thanks Jerin for catching, because they are
>>> internal and called by a public static inline API, so whoever calls
>>> 'rte_eth_rx/tx_burst()' APIs in the shared build, can't find
>>> 'rte_eth_dev_is_rx/tx_hairpin_queue()' functions [1],
>>>
>>> as far as I can see there are two options:
>>> 1) Remove these checks
>>> 2) Make 'rte_eth_dev_is_rx/tx_hairpin_queue()' public API instead of internal
>>>
>>> If there is a value to make 'rte_eth_dev_is_rx/tx_hairpin_queue()' public API
>>> we
>>> should go with (2) else (1).
>>>
>>
>> I think we can skip the tests,
>> but it was Andrew's request so we must get his response.
>> It was also his emphasis that they should be internal.
> 
> It is important for me to keep rte_eth_dev_state internal and
> few patches ago rte_eth_dev_is_rx_hairpin_queue() was inline.

Are you saying you don't want the option to make
'rte_eth_dev_is_rx_hairpin_queue()' static inline because it will force
'RTE_ETH_QUEUE_STATE_xxx' to be public?

> I'm OK to make the function experimental or keep it internal
> (no API/ABI stability requirements) but externally visible (in .map).

I think we can't do this: add a function declaration to the public header file
and add it to the .map file but keep it internal. Instead we can make it a
proper API, and it should be experimental for at least the first release.
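
In rte_ethdev.h the proper-API route would look roughly like this (a sketch of
the usual experimental-API convention, not the final patch):

    /**
     * @warning
     * @b EXPERIMENTAL: this API may change without prior notice.
     *
     * Check whether the given Rx queue is a hairpin queue.
     */
    __rte_experimental
    int rte_eth_dev_is_rx_hairpin_queue(struct rte_eth_dev *dev,
                                        uint16_t queue_id);

plus a matching entry under the EXPERIMENTAL node of the .map file. The catch,
which surfaces later in this thread, is that experimental symbols warn for any
caller that has not defined ALLOW_EXPERIMENTAL_API.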

The question above was do we need this API, or instead should remove the check
from rx/tx_burst APIs?

> 
>>> [1]
>>> /usr/bin/ld: rte_event_eth_rx_adapter.o: in function `rxa_eth_rx':
>>> rte_event_eth_rx_adapter.c:(.text+0x1728): undefined reference to
>>> `rte_eth_dev_is_rx_hairpin_queue'
>>> /usr/bin/ld: rte_event_eth_rx_adapter.o: in function `rxa_service_func':
>>> rte_event_eth_rx_adapter.c:(.text+0x22ab): undefined reference to
>>> `rte_eth_dev_is_rx_hairpin_queue'
>>> /usr/bin/ld: rte_event_eth_tx_adapter.o: in function `txa_service_buffer_retry':
>>> rte_event_eth_tx_adapter.c:(.text+0xa43): undefined reference to
>>> `rte_eth_dev_is_tx_hairpin_queue'
>>> /usr/bin/ld: rte_event_eth_tx_adapter.o: in function `txa_service_func':
>>> rte_event_eth_tx_adapter.c:(.text+0xe7d): undefined reference to
>>> `rte_eth_dev_is_tx_hairpin_queue'
>>> /usr/bin/ld: rte_event_eth_tx_adapter.c:(.text+0x1155): undefined reference to
>>> `rte_eth_dev_is_tx_hairpin_queue'
>>> collect2: error: ld returned 1 exit status
> 


^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [dpdk-dev] [PATCH v7 02/14] ethdev: add support for hairpin queue
  2019-11-05 12:05           ` Ferruh Yigit
@ 2019-11-05 12:12             ` Andrew Rybchenko
  2019-11-05 12:23               ` Ferruh Yigit
  0 siblings, 1 reply; 186+ messages in thread
From: Andrew Rybchenko @ 2019-11-05 12:12 UTC (permalink / raw)
  To: Ferruh Yigit, Ori Kam, John McNamara, Marko Kovacevic, Thomas Monjalon
  Cc: dev, jingjing.wu, stephen, Jerin Jacob

On 11/5/19 3:05 PM, Ferruh Yigit wrote:
> On 11/5/2019 11:49 AM, Andrew Rybchenko wrote:
>> On 11/5/19 2:36 PM, Ori Kam wrote:
>>>
>>>
>>>> -----Original Message-----
>>>> From: Ferruh Yigit <ferruh.yigit@intel.com>
>>>> Sent: Tuesday, November 5, 2019 1:25 PM
>>>> To: Ori Kam <orika@mellanox.com>; John McNamara
>>>> <john.mcnamara@intel.com>; Marko Kovacevic
>>>> <marko.kovacevic@intel.com>; Thomas Monjalon <thomas@monjalon.net>;
>>>> Andrew Rybchenko <arybchenko@solarflare.com>
>>>> Cc: dev@dpdk.org; jingjing.wu@intel.com; stephen@networkplumber.org; Jerin
>>>> Jacob <jerin.jacob@caviumnetworks.com>
>>>> Subject: Re: [PATCH v7 02/14] ethdev: add support for hairpin queue
>>>>
>>>> On 10/30/2019 11:53 PM, Ori Kam wrote:
>>>>> This commit introduces the hairpin queue type.
>>>>>
>>>>> The hairpin queue is built from an Rx queue bound to a Tx queue.
>>>>> It is used to offload traffic coming from the wire and redirect it back
>>>>> to the wire.
>>>>>
>>>>> There are 3 new functions:
>>>>> - rte_eth_dev_hairpin_capability_get
>>>>> - rte_eth_rx_hairpin_queue_setup
>>>>> - rte_eth_tx_hairpin_queue_setup
>>>>>
>>>>> In order to use the queue, there is a need to create rte_flow
>>>>> with queue / RSS action that targets one or more of the Rx queues.
>>>>>
>>>>> Signed-off-by: Ori Kam <orika@mellanox.com>
>>>>> Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
>>>>
>>>> <...>
>>>>
>>>>>  #include <rte_ethdev_core.h>
>>>>>
>>>>>  /**
>>>>> + * @internal
>>>>> + * Check if the selected Rx queue is hairpin queue.
>>>>> + *
>>>>> + * @param dev
>>>>> + *  Pointer to the selected device.
>>>>> + * @param queue_id
>>>>> + *  The selected queue.
>>>>> + *
>>>>> + * @return
>>>>> + *   - (1) if the queue is hairpin queue, 0 otherwise.
>>>>> + */
>>>>> +int
>>>>> +rte_eth_dev_is_rx_hairpin_queue(struct rte_eth_dev *dev, uint16_t
>>>> queue_id);
>>>>> +
>>>>> +/**
>>>>> + * @internal
>>>>> + * Check if the selected Tx queue is hairpin queue.
>>>>> + *
>>>>> + * @param dev
>>>>> + *  Pointer to the selected device.
>>>>> + * @param queue_id
>>>>> + *  The selected queue.
>>>>> + *
>>>>> + * @return
>>>>> + *   - (1) if the queue is hairpin queue, 0 otherwise.
>>>>> + */
>>>>> +int
>>>>> +rte_eth_dev_is_tx_hairpin_queue(struct rte_eth_dev *dev, uint16_t
>>>> queue_id);
>>>>> +
>>>>> +/**
>>>>
>>>> If these functions are internal, why are they in 'rte_ethdev.h'?
>>>>
>>>>>   *
>>>>>   * Retrieve a burst of input packets from a receive queue of an Ethernet
>>>>>   * device. The retrieved packets are stored in *rte_mbuf* structures whose
>>>>> @@ -4251,6 +4400,11 @@ int rte_eth_dev_adjust_nb_rx_tx_desc(uint16_t
>>>> port_id,
>>>>>  		RTE_ETHDEV_LOG(ERR, "Invalid RX queue_id=%u\n",
>>>> queue_id);
>>>>>  		return 0;
>>>>>  	}
>>>>> +	if (rte_eth_dev_is_rx_hairpin_queue(dev, queue_id)) {
>>>>> +		RTE_ETHDEV_LOG(ERR, "Rx burst failed, queue_id=%u is
>>>> hairpin queue\n",
>>>>> +			       queue_id);
>>>>> +		return 0;
>>>>> +	}
>>>>>  #endif
>>>>>  	nb_rx = (*dev->rx_pkt_burst)(dev->data->rx_queues[queue_id],
>>>>>  				     rx_pkts, nb_pkts);
>>>>> @@ -4517,6 +4671,11 @@ static inline int
>>>> rte_eth_tx_descriptor_status(uint16_t port_id,
>>>>>  		RTE_ETHDEV_LOG(ERR, "Invalid TX queue_id=%u\n",
>>>> queue_id);
>>>>>  		return 0;
>>>>>  	}
>>>>> +	if (rte_eth_dev_is_tx_hairpin_queue(dev, queue_id)) {
>>>>> +		RTE_ETHDEV_LOG(ERR, "Tx burst failed, queue_id=%u is
>>>> hairpin queue\n",
>>>>> +			       queue_id);
>>>>> +		return 0;
>>>>> +	}
>>>>>  #endif
>>>>
>>>> Hi Ori,
>>>>
>>>> These are causing build error, thanks Jerin for catching, because they are
>>>> internal and called by a public static inline API, so whoever calls
>>>> 'rte_eth_rx/tx_burst()' APIs in the shared build, can't find
>>>> 'rte_eth_dev_is_rx/tx_hairpin_queue()' functions [1],
>>>>
>>>> as far as I can see there are two options:
>>>> 1) Remove these checks
>>>> 2) Make 'rte_eth_dev_is_rx/tx_hairpin_queue()' public API instead of internal
>>>>
>>>> If there is a value to make 'rte_eth_dev_is_rx/tx_hairpin_queue()' public API
>>>> we
>>>> should go with (2) else (1).
>>>>
>>>
>>> I think we can skip the tests,
>>> but it was Andrew's request so we must get his response.
>>> It was also his emphasis that they should be internal.
>>
>> It is important for me to keep rte_eth_dev_state internal and
>> few patches ago rte_eth_dev_is_rx_hairpin_queue() was inline.
> 
> Are you saying you don't want the option to make
> 'rte_eth_dev_is_rx_hairpin_queue()' static inline because it will force
> 'RTE_ETH_QUEUE_STATE_xxx' to be public?

Yes.

>> I'm OK to make the function experimental or keep it internal
>> (no API/ABI stability requirements) but externally visible (in .map).
> 
> I think we can't do this: add a function declaration to the public header file
> and add it to the .map file but keep it internal. Instead we can make it a
> proper API, and it should be experimental for at least the first release.

We have discussed similar thing with Olivier recently [1].

[1] http://inbox.dpdk.org/dev/20191030142938.bpi4txlrebqfq7uw@platinum/

> The question above was do we need this API, or instead should remove the check
> from rx/tx_burst APIs?

I think these checks are useful to ensure that these functions
are not used for hairpin queues. At least to catch it with debug
enabled.
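
For reference, the checks sit in the debug-only section of the burst helpers,
so a normal build pays nothing for them. The pattern, with the opening guard
restored (the quoted hunks only show the closing #endif; the macro name is
assumed from the surrounding ethdev code):

    #ifdef RTE_LIBRTE_ETHDEV_DEBUG
            if (queue_id >= dev->data->nb_rx_queues) {
                    RTE_ETHDEV_LOG(ERR, "Invalid RX queue_id=%u\n", queue_id);
                    return 0;
            }
            if (rte_eth_dev_is_rx_hairpin_queue(dev, queue_id)) {
                    RTE_ETHDEV_LOG(ERR,
                                   "Rx burst failed, queue_id=%u is hairpin queue\n",
                                   queue_id);
                    return 0;
            }
    #endif

Only a debug build calls the helper and can report misuse of a hairpin queue;
a release build compiles the whole block away.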

>>>> [1]
>>>> /usr/bin/ld: rte_event_eth_rx_adapter.o: in function `rxa_eth_rx':
>>>> rte_event_eth_rx_adapter.c:(.text+0x1728): undefined reference to
>>>> `rte_eth_dev_is_rx_hairpin_queue'
>>>> /usr/bin/ld: rte_event_eth_rx_adapter.o: in function `rxa_service_func':
>>>> rte_event_eth_rx_adapter.c:(.text+0x22ab): undefined reference to
>>>> `rte_eth_dev_is_rx_hairpin_queue'
>>>> /usr/bin/ld: rte_event_eth_tx_adapter.o: in function `txa_service_buffer_retry':
>>>> rte_event_eth_tx_adapter.c:(.text+0xa43): undefined reference to
>>>> `rte_eth_dev_is_tx_hairpin_queue'
>>>> /usr/bin/ld: rte_event_eth_tx_adapter.o: in function `txa_service_func':
>>>> rte_event_eth_tx_adapter.c:(.text+0xe7d): undefined reference to
>>>> `rte_eth_dev_is_tx_hairpin_queue'
>>>> /usr/bin/ld: rte_event_eth_tx_adapter.c:(.text+0x1155): undefined reference to
>>>> `rte_eth_dev_is_tx_hairpin_queue'
>>>> collect2: error: ld returned 1 exit status
>>


^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [dpdk-dev] [PATCH v7 02/14] ethdev: add support for hairpin queue
  2019-11-05 12:12             ` Andrew Rybchenko
@ 2019-11-05 12:23               ` Ferruh Yigit
  2019-11-05 12:27                 ` Andrew Rybchenko
  0 siblings, 1 reply; 186+ messages in thread
From: Ferruh Yigit @ 2019-11-05 12:23 UTC (permalink / raw)
  To: Andrew Rybchenko, Ori Kam, John McNamara, Marko Kovacevic,
	Thomas Monjalon
  Cc: dev, jingjing.wu, stephen, Jerin Jacob

On 11/5/2019 12:12 PM, Andrew Rybchenko wrote:
> On 11/5/19 3:05 PM, Ferruh Yigit wrote:
>> On 11/5/2019 11:49 AM, Andrew Rybchenko wrote:
>>> On 11/5/19 2:36 PM, Ori Kam wrote:
>>>>
>>>>
>>>>> -----Original Message-----
>>>>> From: Ferruh Yigit <ferruh.yigit@intel.com>
>>>>> Sent: Tuesday, November 5, 2019 1:25 PM
>>>>> To: Ori Kam <orika@mellanox.com>; John McNamara
>>>>> <john.mcnamara@intel.com>; Marko Kovacevic
>>>>> <marko.kovacevic@intel.com>; Thomas Monjalon <thomas@monjalon.net>;
>>>>> Andrew Rybchenko <arybchenko@solarflare.com>
>>>>> Cc: dev@dpdk.org; jingjing.wu@intel.com; stephen@networkplumber.org; Jerin
>>>>> Jacob <jerin.jacob@caviumnetworks.com>
>>>>> Subject: Re: [PATCH v7 02/14] ethdev: add support for hairpin queue
>>>>>
>>>>> On 10/30/2019 11:53 PM, Ori Kam wrote:
>>>>>> This commit introduces the hairpin queue type.
>>>>>>
>>>>>> The hairpin queue is built from an Rx queue bound to a Tx queue.
>>>>>> It is used to offload traffic coming from the wire and redirect it back
>>>>>> to the wire.
>>>>>>
>>>>>> There are 3 new functions:
>>>>>> - rte_eth_dev_hairpin_capability_get
>>>>>> - rte_eth_rx_hairpin_queue_setup
>>>>>> - rte_eth_tx_hairpin_queue_setup
>>>>>>
>>>>>> In order to use the queue, there is a need to create rte_flow
>>>>>> with queue / RSS action that targets one or more of the Rx queues.
>>>>>>
>>>>>> Signed-off-by: Ori Kam <orika@mellanox.com>
>>>>>> Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
>>>>>
>>>>> <...>
>>>>>
>>>>>>  #include <rte_ethdev_core.h>
>>>>>>
>>>>>>  /**
>>>>>> + * @internal
>>>>>> + * Check if the selected Rx queue is hairpin queue.
>>>>>> + *
>>>>>> + * @param dev
>>>>>> + *  Pointer to the selected device.
>>>>>> + * @param queue_id
>>>>>> + *  The selected queue.
>>>>>> + *
>>>>>> + * @return
>>>>>> + *   - (1) if the queue is hairpin queue, 0 otherwise.
>>>>>> + */
>>>>>> +int
>>>>>> +rte_eth_dev_is_rx_hairpin_queue(struct rte_eth_dev *dev, uint16_t
>>>>> queue_id);
>>>>>> +
>>>>>> +/**
>>>>>> + * @internal
>>>>>> + * Check if the selected Tx queue is hairpin queue.
>>>>>> + *
>>>>>> + * @param dev
>>>>>> + *  Pointer to the selected device.
>>>>>> + * @param queue_id
>>>>>> + *  The selected queue.
>>>>>> + *
>>>>>> + * @return
>>>>>> + *   - (1) if the queue is hairpin queue, 0 otherwise.
>>>>>> + */
>>>>>> +int
>>>>>> +rte_eth_dev_is_tx_hairpin_queue(struct rte_eth_dev *dev, uint16_t
>>>>> queue_id);
>>>>>> +
>>>>>> +/**
>>>>>
>>>>> If these functions are internal, why are they in 'rte_ethdev.h'?
>>>>>
>>>>>>   *
>>>>>>   * Retrieve a burst of input packets from a receive queue of an Ethernet
>>>>>>   * device. The retrieved packets are stored in *rte_mbuf* structures whose
>>>>>> @@ -4251,6 +4400,11 @@ int rte_eth_dev_adjust_nb_rx_tx_desc(uint16_t
>>>>> port_id,
>>>>>>  		RTE_ETHDEV_LOG(ERR, "Invalid RX queue_id=%u\n",
>>>>> queue_id);
>>>>>>  		return 0;
>>>>>>  	}
>>>>>> +	if (rte_eth_dev_is_rx_hairpin_queue(dev, queue_id)) {
>>>>>> +		RTE_ETHDEV_LOG(ERR, "Rx burst failed, queue_id=%u is
>>>>> hairpin queue\n",
>>>>>> +			       queue_id);
>>>>>> +		return 0;
>>>>>> +	}
>>>>>>  #endif
>>>>>>  	nb_rx = (*dev->rx_pkt_burst)(dev->data->rx_queues[queue_id],
>>>>>>  				     rx_pkts, nb_pkts);
>>>>>> @@ -4517,6 +4671,11 @@ static inline int
>>>>> rte_eth_tx_descriptor_status(uint16_t port_id,
>>>>>>  		RTE_ETHDEV_LOG(ERR, "Invalid TX queue_id=%u\n",
>>>>> queue_id);
>>>>>>  		return 0;
>>>>>>  	}
>>>>>> +	if (rte_eth_dev_is_tx_hairpin_queue(dev, queue_id)) {
>>>>>> +		RTE_ETHDEV_LOG(ERR, "Tx burst failed, queue_id=%u is
>>>>> hairpin queue\n",
>>>>>> +			       queue_id);
>>>>>> +		return 0;
>>>>>> +	}
>>>>>>  #endif
>>>>>
>>>>> Hi Ori,
>>>>>
>>>>> These are causing build error, thanks Jerin for catching, because they are
>>>>> internal and called by a public static inline API, so whoever calls
>>>>> 'rte_eth_rx/tx_burst()' APIs in the shared build, can't find
>>>>> 'rte_eth_dev_is_rx/tx_hairpin_queue()' functions [1],
>>>>>
>>>>> as far as I can see there are two options:
>>>>> 1) Remove these checks
>>>>> 2) Make 'rte_eth_dev_is_rx/tx_hairpin_queue()' public API instead of internal
>>>>>
>>>>> If there is a value to make 'rte_eth_dev_is_rx/tx_hairpin_queue()' public API
>>>>> we
>>>>> should go with (2) else (1).
>>>>>
>>>>
>>>> I think we can skip the tests,
>>>> but it was Andrew's request so we must get his response.
>>>> It was also his emphasis that they should be internal.
>>>
>>> It is important for me to keep rte_eth_dev_state internal and
>>> few patches ago rte_eth_dev_is_rx_hairpin_queue() was inline.
>>
>> Are you saying you don't want the option to make
>> 'rte_eth_dev_is_rx_hairpin_queue()' static inline because it will force
>> 'RTE_ETH_QUEUE_STATE_xxx' to be public?
> 
> Yes.

+1

> 
>>> I'm OK to make the function experimental or keep it internal
>>> (no API/ABI stability requirements) but externally visible (in .map).
>>
>> I think we can't do this: add a function declaration to the public header file
>> and add it to the .map file but keep it internal. Instead we can make it a
>> proper API, and it should be experimental for at least the first release.
> 
> We have discussed similar thing with Olivier recently [1].
> 
> [1] http://inbox.dpdk.org/dev/20191030142938.bpi4txlrebqfq7uw@platinum/

Yes we can say they are internal but there won't be anything preventing
applications from using them.

> 
>> The question above was do we need this API, or instead should remove the check
>> from rx/tx_burst APIs?
> 
> I think these checks are useful to ensure that these functions
> are not used for hairpin queues. At least to catch it with debug
> enabled.

OK, if so why not make them a proper API? Any concerns about it?
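
A proper API would presumably follow the usual ethdev shape and take a port_id
rather than the internal device pointer. A hypothetical sketch, only to make
the option concrete (the function name and the queue-state constant are
assumptions, not merged code):

    __rte_experimental
    int
    rte_eth_rx_queue_is_hairpin(uint16_t port_id, uint16_t queue_id)
    {
            struct rte_eth_dev *dev;

            RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
            dev = &rte_eth_devices[port_id];
            if (queue_id >= dev->data->nb_rx_queues)
                    return -EINVAL;
            return dev->data->rx_queue_state[queue_id] ==
                   RTE_ETH_QUEUE_STATE_HAIRPIN;
    }

Such a wrapper would keep struct rte_eth_dev and the RTE_ETH_QUEUE_STATE_*
values out of application hands while still answering the question.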

> 
>>>>> [1]
>>>>> /usr/bin/ld: rte_event_eth_rx_adapter.o: in function `rxa_eth_rx':
>>>>> rte_event_eth_rx_adapter.c:(.text+0x1728): undefined reference to
>>>>> `rte_eth_dev_is_rx_hairpin_queue'
>>>>> /usr/bin/ld: rte_event_eth_rx_adapter.o: in function `rxa_service_func':
>>>>> rte_event_eth_rx_adapter.c:(.text+0x22ab): undefined reference to
>>>>> `rte_eth_dev_is_rx_hairpin_queue'
>>>>> /usr/bin/ld: rte_event_eth_tx_adapter.o: in function `txa_service_buffer_retry':
>>>>> rte_event_eth_tx_adapter.c:(.text+0xa43): undefined reference to
>>>>> `rte_eth_dev_is_tx_hairpin_queue'
>>>>> /usr/bin/ld: rte_event_eth_tx_adapter.o: in function `txa_service_func':
>>>>> rte_event_eth_tx_adapter.c:(.text+0xe7d): undefined reference to
>>>>> `rte_eth_dev_is_tx_hairpin_queue'
>>>>> /usr/bin/ld: rte_event_eth_tx_adapter.c:(.text+0x1155): undefined reference to
>>>>> `rte_eth_dev_is_tx_hairpin_queue'
>>>>> collect2: error: ld returned 1 exit status
>>>
> 


^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [dpdk-dev] [PATCH v7 02/14] ethdev: add support for hairpin queue
  2019-11-05 12:23               ` Ferruh Yigit
@ 2019-11-05 12:27                 ` Andrew Rybchenko
  2019-11-05 12:51                   ` Thomas Monjalon
  0 siblings, 1 reply; 186+ messages in thread
From: Andrew Rybchenko @ 2019-11-05 12:27 UTC (permalink / raw)
  To: Ferruh Yigit, Ori Kam, John McNamara, Marko Kovacevic, Thomas Monjalon
  Cc: dev, jingjing.wu, stephen, Jerin Jacob

On 11/5/19 3:23 PM, Ferruh Yigit wrote:
> On 11/5/2019 12:12 PM, Andrew Rybchenko wrote:
>> On 11/5/19 3:05 PM, Ferruh Yigit wrote:
>>> On 11/5/2019 11:49 AM, Andrew Rybchenko wrote:
>>>> On 11/5/19 2:36 PM, Ori Kam wrote:
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Ferruh Yigit <ferruh.yigit@intel.com>
>>>>>> Sent: Tuesday, November 5, 2019 1:25 PM
>>>>>> To: Ori Kam <orika@mellanox.com>; John McNamara
>>>>>> <john.mcnamara@intel.com>; Marko Kovacevic
>>>>>> <marko.kovacevic@intel.com>; Thomas Monjalon <thomas@monjalon.net>;
>>>>>> Andrew Rybchenko <arybchenko@solarflare.com>
>>>>>> Cc: dev@dpdk.org; jingjing.wu@intel.com; stephen@networkplumber.org; Jerin
>>>>>> Jacob <jerin.jacob@caviumnetworks.com>
>>>>>> Subject: Re: [PATCH v7 02/14] ethdev: add support for hairpin queue
>>>>>>
>>>>>> On 10/30/2019 11:53 PM, Ori Kam wrote:
>>>>>>> This commit introduces the hairpin queue type.
>>>>>>>
>>>>>>> The hairpin queue is built from an Rx queue bound to a Tx queue.
>>>>>>> It is used to offload traffic coming from the wire and redirect it back
>>>>>>> to the wire.
>>>>>>>
>>>>>>> There are 3 new functions:
>>>>>>> - rte_eth_dev_hairpin_capability_get
>>>>>>> - rte_eth_rx_hairpin_queue_setup
>>>>>>> - rte_eth_tx_hairpin_queue_setup
>>>>>>>
>>>>>>> In order to use the queue, there is a need to create rte_flow
>>>>>>> with queue / RSS action that targets one or more of the Rx queues.
>>>>>>>
>>>>>>> Signed-off-by: Ori Kam <orika@mellanox.com>
>>>>>>> Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
>>>>>> <...>
>>>>>>
>>>>>>>  #include <rte_ethdev_core.h>
>>>>>>>
>>>>>>>  /**
>>>>>>> + * @internal
>>>>>>> + * Check if the selected Rx queue is hairpin queue.
>>>>>>> + *
>>>>>>> + * @param dev
>>>>>>> + *  Pointer to the selected device.
>>>>>>> + * @param queue_id
>>>>>>> + *  The selected queue.
>>>>>>> + *
>>>>>>> + * @return
>>>>>>> + *   - (1) if the queue is hairpin queue, 0 otherwise.
>>>>>>> + */
>>>>>>> +int
>>>>>>> +rte_eth_dev_is_rx_hairpin_queue(struct rte_eth_dev *dev, uint16_t
>>>>>> queue_id);
>>>>>>> +
>>>>>>> +/**
>>>>>>> + * @internal
>>>>>>> + * Check if the selected Tx queue is hairpin queue.
>>>>>>> + *
>>>>>>> + * @param dev
>>>>>>> + *  Pointer to the selected device.
>>>>>>> + * @param queue_id
>>>>>>> + *  The selected queue.
>>>>>>> + *
>>>>>>> + * @return
>>>>>>> + *   - (1) if the queue is hairpin queue, 0 otherwise.
>>>>>>> + */
>>>>>>> +int
>>>>>>> +rte_eth_dev_is_tx_hairpin_queue(struct rte_eth_dev *dev, uint16_t
>>>>>> queue_id);
>>>>>>> +
>>>>>>> +/**
>>>>>> If these functions are internal, why are they in 'rte_ethdev.h'?
>>>>>>
>>>>>>>   *
>>>>>>>   * Retrieve a burst of input packets from a receive queue of an Ethernet
>>>>>>>   * device. The retrieved packets are stored in *rte_mbuf* structures whose
>>>>>>> @@ -4251,6 +4400,11 @@ int rte_eth_dev_adjust_nb_rx_tx_desc(uint16_t
>>>>>> port_id,
>>>>>>>  		RTE_ETHDEV_LOG(ERR, "Invalid RX queue_id=%u\n",
>>>>>> queue_id);
>>>>>>>  		return 0;
>>>>>>>  	}
>>>>>>> +	if (rte_eth_dev_is_rx_hairpin_queue(dev, queue_id)) {
>>>>>>> +		RTE_ETHDEV_LOG(ERR, "Rx burst failed, queue_id=%u is
>>>>>> hairpin queue\n",
>>>>>>> +			       queue_id);
>>>>>>> +		return 0;
>>>>>>> +	}
>>>>>>>  #endif
>>>>>>>  	nb_rx = (*dev->rx_pkt_burst)(dev->data->rx_queues[queue_id],
>>>>>>>  				     rx_pkts, nb_pkts);
>>>>>>> @@ -4517,6 +4671,11 @@ static inline int
>>>>>> rte_eth_tx_descriptor_status(uint16_t port_id,
>>>>>>>  		RTE_ETHDEV_LOG(ERR, "Invalid TX queue_id=%u\n",
>>>>>> queue_id);
>>>>>>>  		return 0;
>>>>>>>  	}
>>>>>>> +	if (rte_eth_dev_is_tx_hairpin_queue(dev, queue_id)) {
>>>>>>> +		RTE_ETHDEV_LOG(ERR, "Tx burst failed, queue_id=%u is
>>>>>> hairpin queue\n",
>>>>>>> +			       queue_id);
>>>>>>> +		return 0;
>>>>>>> +	}
>>>>>>>  #endif
>>>>>> Hi Ori,
>>>>>>
>>>>>> These are causing build error, thanks Jerin for catching, because they are
>>>>>> internal and called by a public static inline API, so whoever calls
>>>>>> 'rte_eth_rx/tx_burst()' APIs in the shared build, can't find
>>>>>> 'rte_eth_dev_is_rx/tx_hairpin_queue()' functions [1],
>>>>>>
>>>>>> as far as I can see there are two options:
>>>>>> 1) Remove these checks
>>>>>> 2) Make 'rte_eth_dev_is_rx/tx_hairpin_queue()' public API instead of internal
>>>>>>
>>>>>> If there is a value to make 'rte_eth_dev_is_rx/tx_hairpin_queue()' public API
>>>>>> we
>>>>>> should go with (2) else (1).
>>>>>>
>>>>> I think we can skip the tests,
>>>>> but it was Andrew's request so we must get his response.
>>>>> It was also his emphasis that they should be internal.
>>>> It is important for me to keep rte_eth_dev_state internal and
>>>> few patches ago rte_eth_dev_is_rx_hairpin_queue() was inline.
>>> Are you saying you don't want the option to make
>>> 'rte_eth_dev_is_rx_hairpin_queue()' static inline because it will force
>>> 'RTE_ETH_QUEUE_STATE_xxx' to be public?
>> Yes.
> +1
>
>>>> I'm OK to make the function experimental or keep it internal
>>>> (no API/ABI stability requirements) but externally visible (in .map).
>>> I think we can't do this: add a function declaration to the public header file
>>> and add it to the .map file but keep it internal. Instead we can make it a
>>> proper API, and it should be experimental for at least the first release.
>> We have discussed similar thing with Olivier recently [1].
>>
>> [1] http://inbox.dpdk.org/dev/20191030142938.bpi4txlrebqfq7uw@platinum/
> Yes we can say they are internal but there won't be anything preventing
> applications from using them.

That's true, but making it internal says - don't use it.
Anyway, I have no strong opinion on experimental vs internal.

>>> The question above was do we need this API, or instead should remove the check
>>> from rx/tx_burst APIs?
>> I think these checks are useful to ensure that these functions
>> are not used for hairpin queues. At least to catch it with debug
>> enabled.
> OK, if so why not make them a proper API? Any concerns about it?
>
>>>>>> [1]
>>>>>> /usr/bin/ld: rte_event_eth_rx_adapter.o: in function `rxa_eth_rx':
>>>>>> rte_event_eth_rx_adapter.c:(.text+0x1728): undefined reference to
>>>>>> `rte_eth_dev_is_rx_hairpin_queue'
>>>>>> /usr/bin/ld: rte_event_eth_rx_adapter.o: in function `rxa_service_func':
>>>>>> rte_event_eth_rx_adapter.c:(.text+0x22ab): undefined reference to
>>>>>> `rte_eth_dev_is_rx_hairpin_queue'
>>>>>> /usr/bin/ld: rte_event_eth_tx_adapter.o: in function `txa_service_buffer_retry':
>>>>>> rte_event_eth_tx_adapter.c:(.text+0xa43): undefined reference to
>>>>>> `rte_eth_dev_is_tx_hairpin_queue'
>>>>>> /usr/bin/ld: rte_event_eth_tx_adapter.o: in function `txa_service_func':
>>>>>> rte_event_eth_tx_adapter.c:(.text+0xe7d): undefined reference to
>>>>>> `rte_eth_dev_is_tx_hairpin_queue'
>>>>>> /usr/bin/ld: rte_event_eth_tx_adapter.c:(.text+0x1155): undefined reference to
>>>>>> `rte_eth_dev_is_tx_hairpin_queue'
>>>>>> collect2: error: ld returned 1 exit status


^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [dpdk-dev] [PATCH v7 02/14] ethdev: add support for hairpin queue
  2019-11-05 12:27                 ` Andrew Rybchenko
@ 2019-11-05 12:51                   ` Thomas Monjalon
  2019-11-05 12:53                     ` Andrew Rybchenko
  0 siblings, 1 reply; 186+ messages in thread
From: Thomas Monjalon @ 2019-11-05 12:51 UTC (permalink / raw)
  To: Andrew Rybchenko
  Cc: Ferruh Yigit, Ori Kam, John McNamara, Marko Kovacevic, dev,
	jingjing.wu, stephen, Jerin Jacob

05/11/2019 13:27, Andrew Rybchenko:
> On 11/5/19 3:23 PM, Ferruh Yigit wrote:
> > On 11/5/2019 12:12 PM, Andrew Rybchenko wrote:
> >> On 11/5/19 3:05 PM, Ferruh Yigit wrote:
> >>> On 11/5/2019 11:49 AM, Andrew Rybchenko wrote:
> >>>> On 11/5/19 2:36 PM, Ori Kam wrote:
 >>>>>> From: Ferruh Yigit <ferruh.yigit@intel.com>
> >>>>>>>  /**
> >>>>>>> + * @internal
> >>>>>>> + * Check if the selected Rx queue is hairpin queue.
> >>>>>>> + *
> >>>>>>> + * @param dev
> >>>>>>> + *  Pointer to the selected device.
> >>>>>>> + * @param queue_id
> >>>>>>> + *  The selected queue.
> >>>>>>> + *
> >>>>>>> + * @return
> >>>>>>> + *   - (1) if the queue is hairpin queue, 0 otherwise.
> >>>>>>> + */
> >>>>>>> +int
> >>>>>>> +rte_eth_dev_is_rx_hairpin_queue(struct rte_eth_dev *dev, uint16_t
> >>>>>> queue_id);
> >>>>>>> +
> >>>>>>> +/**
> >>>>>>> + * @internal
> >>>>>>> + * Check if the selected Tx queue is hairpin queue.
> >>>>>>> + *
> >>>>>>> + * @param dev
> >>>>>>> + *  Pointer to the selected device.
> >>>>>>> + * @param queue_id
> >>>>>>> + *  The selected queue.
> >>>>>>> + *
> >>>>>>> + * @return
> >>>>>>> + *   - (1) if the queue is hairpin queue, 0 otherwise.
> >>>>>>> + */
> >>>>>>> +int
> >>>>>>> +rte_eth_dev_is_tx_hairpin_queue(struct rte_eth_dev *dev, uint16_t
> >>>>>> queue_id);
[...]
> >>>>>> These are causing build error, thanks Jerin for catching, because they are
> >>>>>> internal and called by a public static inline API, so whoever calls
> >>>>>> 'rte_eth_rx/tx_burst()' APIs in the shared build, can't find
> >>>>>> 'rte_eth_dev_is_rx/tx_hairpin_queue()' functions [1],
> >>>>>>
> >>>>>> as far as I can see there are two options:
> >>>>>> 1) Remove these checks
> >>>>>> 2) Make 'rte_eth_dev_is_rx/tx_hairpin_queue()' public API instead of internal
> >>>>>>
> >>>>>> If there is a value to make 'rte_eth_dev_is_rx/tx_hairpin_queue()' public API
> >>>>>> we
> >>>>>> should go with (2) else (1).
> >>>>>>
> >>>>> I think we can skip the tests,
> >>>>> but it was Andrew's request so we must get his response.
> >>>>> It was also his emphasis that they should be internal.
> >>>> It is important for me to keep rte_eth_dev_state internal and
> >>>> few patches ago rte_eth_dev_is_rx_hairpin_queue() was inline.
> >>> Are you saying you don't want the option to make
> >>> 'rte_eth_dev_is_rx_hairpin_queue()' static inline because it will force
> >>> 'RTE_ETH_QUEUE_STATE_xxx' to be public?
> >> Yes.
> > +1
> >
> >>>> I'm OK to make the function experimental or keep it internal
> >>>> (no API/ABI stability requirements) but externally visible (in .map).
> >>> I think we can't do this: add a function declaration to the public header file
> >>> and add it to the .map file but keep it internal. Instead we can make it a
> >>> proper API, and it should be experimental for at least the first release.
> >> We have discussed similar thing with Olivier recently [1].
> >>
> >> [1] http://inbox.dpdk.org/dev/20191030142938.bpi4txlrebqfq7uw@platinum/
> > Yes we can say they are internal but there won't be anything preventing
> > applications from using them.
> 
> That's true, but making it internal says - don't use it.
> Anyway, I have no strong opinion on experimental vs internal.
> 
> >>> The question above was do we need this API, or instead should remove the check
> >>> from rx/tx_burst APIs?
> >> I think these checks are useful to ensure that these functions
> >> are not used for hairpin queues. At least to catch it with debug
> >> enabled.
> > OK, if so why not make them a proper API? Any concerns about it?

Why should we not use this API in applications?



^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [dpdk-dev] [PATCH v7 02/14] ethdev: add support for hairpin queue
  2019-11-05 12:51                   ` Thomas Monjalon
@ 2019-11-05 12:53                     ` Andrew Rybchenko
  2019-11-05 13:02                       ` Thomas Monjalon
  0 siblings, 1 reply; 186+ messages in thread
From: Andrew Rybchenko @ 2019-11-05 12:53 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: Ferruh Yigit, Ori Kam, John McNamara, Marko Kovacevic, dev,
	jingjing.wu, stephen, Jerin Jacob

On 11/5/19 3:51 PM, Thomas Monjalon wrote:
> 05/11/2019 13:27, Andrew Rybchenko:
>> On 11/5/19 3:23 PM, Ferruh Yigit wrote:
>>> On 11/5/2019 12:12 PM, Andrew Rybchenko wrote:
>>>> On 11/5/19 3:05 PM, Ferruh Yigit wrote:
>>>>> On 11/5/2019 11:49 AM, Andrew Rybchenko wrote:
>>>>>> On 11/5/19 2:36 PM, Ori Kam wrote:
>  >>>>>> From: Ferruh Yigit <ferruh.yigit@intel.com>
>>>>>>>>>  /**
>>>>>>>>> + * @internal
>>>>>>>>> + * Check if the selected Rx queue is hairpin queue.
>>>>>>>>> + *
>>>>>>>>> + * @param dev
>>>>>>>>> + *  Pointer to the selected device.
>>>>>>>>> + * @param queue_id
>>>>>>>>> + *  The selected queue.
>>>>>>>>> + *
>>>>>>>>> + * @return
>>>>>>>>> + *   - (1) if the queue is hairpin queue, 0 otherwise.
>>>>>>>>> + */
>>>>>>>>> +int
>>>>>>>>> +rte_eth_dev_is_rx_hairpin_queue(struct rte_eth_dev *dev, uint16_t
>>>>>>>> queue_id);
>>>>>>>>> +
>>>>>>>>> +/**
>>>>>>>>> + * @internal
>>>>>>>>> + * Check if the selected Tx queue is hairpin queue.
>>>>>>>>> + *
>>>>>>>>> + * @param dev
>>>>>>>>> + *  Pointer to the selected device.
>>>>>>>>> + * @param queue_id
>>>>>>>>> + *  The selected queue.
>>>>>>>>> + *
>>>>>>>>> + * @return
>>>>>>>>> + *   - (1) if the queue is hairpin queue, 0 otherwise.
>>>>>>>>> + */
>>>>>>>>> +int
>>>>>>>>> +rte_eth_dev_is_tx_hairpin_queue(struct rte_eth_dev *dev, uint16_t
>>>>>>>> queue_id);
> [...]
>>>>>>>> These are causing build error, thanks Jerin for catching, because they are
>>>>>>>> internal and called by a public static inline API, so whoever calls
>>>>>>>> 'rte_eth_rx/tx_burst()' APIs in the shared build, can't find
>>>>>>>> 'rte_eth_dev_is_rx/tx_hairpin_queue()' functions [1],
>>>>>>>>
>>>>>>>> as far as I can see there are two options:
>>>>>>>> 1) Remove these checks
>>>>>>>> 2) Make 'rte_eth_dev_is_rx/tx_hairpin_queue()' public API instead of internal
>>>>>>>>
>>>>>>>> If there is a value to make 'rte_eth_dev_is_rx/tx_hairpin_queue()' public API
>>>>>>>> we
>>>>>>>> should go with (2) else (1).
>>>>>>>>
>>>>>>> I think we can skip the tests,
>>>>>>> but it was Andrew's request so we must get his response.
>>>>>>> It was also his emphasis that they should be internal.
>>>>>> It is important for me to keep rte_eth_dev_state internal and
>>>>>> few patches ago rte_eth_dev_is_rx_hairpin_queue() was inline.
>>>>> Are you saying you don't want the option to make
>>>>> 'rte_eth_dev_is_rx_hairpin_queue()' static inline because it will force
>>>>> 'RTE_ETH_QUEUE_STATE_xxx' to be public?
>>>> Yes.
>>> +1
>>>
>>>>>> I'm OK to make the function experimental or keep it internal
>>>>>> (no API/ABI stability requirements) but externally visible (in .map).
>>>>> I think we can't do this: add a function declaration to the public header file
>>>>> and add it to the .map file but keep it internal. Instead we can make it a
>>>>> proper API, and it should be experimental for at least the first release.
>>>> We have discussed similar thing with Olivier recently [1].
>>>>
>>>> [1] http://inbox.dpdk.org/dev/20191030142938.bpi4txlrebqfq7uw@platinum/
>>> Yes we can say they are internal but there won't be anything preventing
>>> applications from using them.
>>
>> That's true, but making it internal says - don't use it.
>> Anyway, I have no strong opinion on experimental vs internal.
>>
>>>>> The question above was do we need this API, or instead should remove the check
>>>>> from rx/tx_burst APIs?
>>>> I think these checks are useful to ensure that these functions
>>>> are not used for hairpin queues. At least to catch it with debug
>>>> enabled.
>>> OK, if so why not make them a proper API? Any concerns about it?
> 
> Why should we not use this API in applications?

I think the valid question is why the application needs the API.
Basically I don't mind, I just want to be sure that only the required
API is exposed.

^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [dpdk-dev] [PATCH v7 02/14] ethdev: add support for hairpin queue
  2019-11-05 12:53                     ` Andrew Rybchenko
@ 2019-11-05 13:02                       ` Thomas Monjalon
  2019-11-05 13:23                         ` Ori Kam
  2019-11-05 13:41                         ` Andrew Rybchenko
  0 siblings, 2 replies; 186+ messages in thread
From: Thomas Monjalon @ 2019-11-05 13:02 UTC (permalink / raw)
  To: Andrew Rybchenko
  Cc: Ferruh Yigit, Ori Kam, John McNamara, Marko Kovacevic, dev,
	jingjing.wu, stephen, Jerin Jacob

05/11/2019 13:53, Andrew Rybchenko:
> On 11/5/19 3:51 PM, Thomas Monjalon wrote:
> > 05/11/2019 13:27, Andrew Rybchenko:
> >> On 11/5/19 3:23 PM, Ferruh Yigit wrote:
> >>> On 11/5/2019 12:12 PM, Andrew Rybchenko wrote:
> >>>> On 11/5/19 3:05 PM, Ferruh Yigit wrote:
> >>>>> On 11/5/2019 11:49 AM, Andrew Rybchenko wrote:
> >>>>>> On 11/5/19 2:36 PM, Ori Kam wrote:
> >  >>>>>> From: Ferruh Yigit <ferruh.yigit@intel.com>
> >>>>>>>>>  /**
> >>>>>>>>> + * @internal
> >>>>>>>>> + * Check if the selected Rx queue is hairpin queue.
> >>>>>>>>> + *
> >>>>>>>>> + * @param dev
> >>>>>>>>> + *  Pointer to the selected device.
> >>>>>>>>> + * @param queue_id
> >>>>>>>>> + *  The selected queue.
> >>>>>>>>> + *
> >>>>>>>>> + * @return
> >>>>>>>>> + *   - (1) if the queue is hairpin queue, 0 otherwise.
> >>>>>>>>> + */
> >>>>>>>>> +int
> >>>>>>>>> +rte_eth_dev_is_rx_hairpin_queue(struct rte_eth_dev *dev, uint16_t
> >>>>>>>> queue_id);
> >>>>>>>>> +
> >>>>>>>>> +/**
> >>>>>>>>> + * @internal
> >>>>>>>>> + * Check if the selected Tx queue is hairpin queue.
> >>>>>>>>> + *
> >>>>>>>>> + * @param dev
> >>>>>>>>> + *  Pointer to the selected device.
> >>>>>>>>> + * @param queue_id
> >>>>>>>>> + *  The selected queue.
> >>>>>>>>> + *
> >>>>>>>>> + * @return
> >>>>>>>>> + *   - (1) if the queue is hairpin queue, 0 otherwise.
> >>>>>>>>> + */
> >>>>>>>>> +int
> >>>>>>>>> +rte_eth_dev_is_tx_hairpin_queue(struct rte_eth_dev *dev, uint16_t
> >>>>>>>> queue_id);
> > [...]
> >>>>>>>> These are causing build error, thanks Jerin for catching, because they are
> >>>>>>>> internal and called by a public static inline API, so whoever calls
> >>>>>>>> 'rte_eth_rx/tx_burst()' APIs in the shared build, can't find
> >>>>>>>> 'rte_eth_dev_is_rx/tx_hairpin_queue()' functions [1],
> >>>>>>>>
> >>>>>>>> as far as I can see there are two options:
> >>>>>>>> 1) Remove these checks
> >>>>>>>> 2) Make 'rte_eth_dev_is_rx/tx_hairpin_queue()' public API instead of internal
> >>>>>>>>
> >>>>>>>> If there is a value to make 'rte_eth_dev_is_rx/tx_hairpin_queue()' public API
> >>>>>>>> we
> >>>>>>>> should go with (2) else (1).
> >>>>>>>>
> >>>>>>> I think we can skip the tests,
> >>>>>>> but it was Andrew's request so we must get his response.
> >>>>>>> It was also his emphasis that they should be internal.
> >>>>>> It is important for me to keep rte_eth_dev_state internal and
> >>>>>> few patches ago rte_eth_dev_is_rx_hairpin_queue() was inline.
> >>>>> Are you saying you don't want the option to make
> >>>>> 'rte_eth_dev_is_rx_hairpin_queue()' static inline because it will force
> >>>>> 'RTE_ETH_QUEUE_STATE_xxx' to be public?
> >>>> Yes.
> >>> +1
> >>>
> >>>>>> I'm OK to make the function experimental or keep it internal
> >>>>>> (no API/ABI stability requirements) but externally visible (in .map).
> >>>>> I think we can't do this: add a function declaration to the public header file
> >>>>> and add it to the .map file but keep it internal. Instead we can make it a
> >>>>> proper API, and it should be experimental for at least the first release.
> >>>> We have discussed similar thing with Olivier recently [1].
> >>>>
> >>>> [1] http://inbox.dpdk.org/dev/20191030142938.bpi4txlrebqfq7uw@platinum/
> >>> Yes we can say they are internal but there won't be anything preventing
> >>> applications from using them.
> >>
> >> That's true, but making it internal says - don't use it.
> >> Anyway, I have no strong opinion on experimental vs internal.
> >>
> >>>>> The question above was do we need this API, or instead should remove the check
> >>>>> from rx/tx_burst APIs?
> >>>> I think these checks are useful to ensure that these functions
> >>>> are not used for hairpin queues. At least to catch it with debug
> >>>> enabled.
> >>> OK, if so why not make them a proper API? Any concerns about it?
> > 
> > Why should we not use this API in applications?
> 
> I think the valid question is why the application needs the API.
> Basically I don't mind, I just want to be sure that only the required
> API is exposed.

Because hairpin queues are not standard queues,
we may need to distinguish them.
I see it as a good helper for applications.
Am I missing something obvious?
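
One concrete case: an application that polls every configured queue must skip
the hairpin ones, since rte_eth_rx_burst() is not valid on them. A sketch,
assuming some public predicate exists (rte_eth_rx_queue_is_hairpin() here is
the hypothetical form sketched earlier in the thread, returning 1 for hairpin):

    #include <rte_ethdev.h>
    #include <rte_mbuf.h>

    static void
    poll_port(uint16_t port_id, uint16_t nb_rx_queues)
    {
            struct rte_mbuf *pkts[32];
            uint16_t q, nb;

            for (q = 0; q < nb_rx_queues; q++) {
                    /* hypothetical public predicate, 1 = hairpin queue */
                    if (rte_eth_rx_queue_is_hairpin(port_id, q) == 1)
                            continue;
                    nb = rte_eth_rx_burst(port_id, q, pkts, 32);
                    while (nb > 0)      /* placeholder for real processing */
                            rte_pktmbuf_free(pkts[--nb]);
            }
    }

Without such a helper the application has to remember on its own which queue
indexes it passed to rte_eth_rx_hairpin_queue_setup().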




^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [dpdk-dev] [PATCH v7 02/14] ethdev: add support for hairpin queue
  2019-11-05 13:02                       ` Thomas Monjalon
@ 2019-11-05 13:23                         ` Ori Kam
  2019-11-05 13:27                           ` Thomas Monjalon
  2019-11-05 13:41                         ` Andrew Rybchenko
  1 sibling, 1 reply; 186+ messages in thread
From: Ori Kam @ 2019-11-05 13:23 UTC (permalink / raw)
  To: Thomas Monjalon, Andrew Rybchenko
  Cc: Ferruh Yigit, John McNamara, Marko Kovacevic, dev, jingjing.wu,
	stephen, Jerin Jacob

Hi All



> -----Original Message-----
> From: Thomas Monjalon <thomas@monjalon.net>
> Sent: Tuesday, November 5, 2019 3:02 PM
> To: Andrew Rybchenko <arybchenko@solarflare.com>
> Cc: Ferruh Yigit <ferruh.yigit@intel.com>; Ori Kam <orika@mellanox.com>;
> John McNamara <john.mcnamara@intel.com>; Marko Kovacevic
> <marko.kovacevic@intel.com>; dev@dpdk.org; jingjing.wu@intel.com;
> stephen@networkplumber.org; Jerin Jacob <jerin.jacob@caviumnetworks.com>
> Subject: Re: [dpdk-dev] [PATCH v7 02/14] ethdev: add support for hairpin queue
> 
> 05/11/2019 13:53, Andrew Rybchenko:
> > On 11/5/19 3:51 PM, Thomas Monjalon wrote:
> > > 05/11/2019 13:27, Andrew Rybchenko:
> > >> On 11/5/19 3:23 PM, Ferruh Yigit wrote:
> > >>> On 11/5/2019 12:12 PM, Andrew Rybchenko wrote:
> > >>>> On 11/5/19 3:05 PM, Ferruh Yigit wrote:
> > >>>>> On 11/5/2019 11:49 AM, Andrew Rybchenko wrote:
> > >>>>>> On 11/5/19 2:36 PM, Ori Kam wrote:
> > >  >>>>>> From: Ferruh Yigit <ferruh.yigit@intel.com>
> > >>>>>>>>>  /**
> > >>>>>>>>> + * @internal
> > >>>>>>>>> + * Check if the selected Rx queue is hairpin queue.
> > >>>>>>>>> + *
> > >>>>>>>>> + * @param dev
> > >>>>>>>>> + *  Pointer to the selected device.
> > >>>>>>>>> + * @param queue_id
> > >>>>>>>>> + *  The selected queue.
> > >>>>>>>>> + *
> > >>>>>>>>> + * @return
> > >>>>>>>>> + *   - (1) if the queue is hairpin queue, 0 otherwise.
> > >>>>>>>>> + */
> > >>>>>>>>> +int
> > >>>>>>>>> +rte_eth_dev_is_rx_hairpin_queue(struct rte_eth_dev *dev,
> uint16_t
> > >>>>>>>> queue_id);
> > >>>>>>>>> +
> > >>>>>>>>> +/**
> > >>>>>>>>> + * @internal
> > >>>>>>>>> + * Check if the selected Tx queue is hairpin queue.
> > >>>>>>>>> + *
> > >>>>>>>>> + * @param dev
> > >>>>>>>>> + *  Pointer to the selected device.
> > >>>>>>>>> + * @param queue_id
> > >>>>>>>>> + *  The selected queue.
> > >>>>>>>>> + *
> > >>>>>>>>> + * @return
> > >>>>>>>>> + *   - (1) if the queue is hairpin queue, 0 otherwise.
> > >>>>>>>>> + */
> > >>>>>>>>> +int
> > >>>>>>>>> +rte_eth_dev_is_tx_hairpin_queue(struct rte_eth_dev *dev,
> uint16_t
> > >>>>>>>> queue_id);
> > > [...]
> > >>>>>>>> These are causing build error, thanks Jerin for catching, because
> they are
> > >>>>>>>> internal and called by a public static inline API, so whoever calls
> > >>>>>>>> 'rte_eth_rx/tx_burst()' APIs in the shared build, can't find
> > >>>>>>>> 'rte_eth_dev_is_rx/tx_hairpin_queue()' functions [1],
> > >>>>>>>>
> > >>>>>>>> as far as I can see there are two options:
> > >>>>>>>> 1) Remove these checks
> > >>>>>>>> 2) Make 'rte_eth_dev_is_rx/tx_hairpin_queue()' public API instead
> of internal
> > >>>>>>>>
> > >>>>>>>> If there is a value to make 'rte_eth_dev_is_rx/tx_hairpin_queue()'
> public API
> > >>>>>>>> we
> > >>>>>>>> should go with (2) else (1).
> > >>>>>>>>
> > >>>>>>> I think we can skip the tests,
> > >>>>>>> but it was Andrew's request so we must get his response.
> > >>>>>>> It was also his emphasis that they should be internal.
> > >>>>>> It is important for me to keep rte_eth_dev_state internal and
> > >>>>>> few patches ago rte_eth_dev_is_rx_hairpin_queue() was inline.
> > >>>>> Are you saying you don't want the option to make
> > >>>>> 'rte_eth_dev_is_rx_hairpin_queue()' static inline because it will force
> > >>>>> the 'RTE_ETH_QUEUE_STATE_xxx' being public?
> > >>>> Yes.
> > >>> +1
> > >>>
> > >>>>>> I'm OK to make the function experimental or keep it internal
> > >>>>>> (no API/ABI stability requirements) but externally visible (in .map).
> > >>>>> I think we can't do this, add a function declaration to the public
> > >>>>> header file and add it to the .map file but keep it internal. Instead
> > >>>>> we can make it a proper API and it should be experimental at least
> > >>>>> for the first release.
> > >>>> We have discussed a similar thing with Olivier recently [1].
> > >>>>
> > >>>> [1] http://inbox.dpdk.org/dev/20191030142938.bpi4txlrebqfq7uw@platinum/
> > >>> Yes we can say they are internal but there won't be anything preventing
> > >>> applications from using them.
> > >>
> > >> That's true, but making it internal says - don't use it.
> > >> Anyway, I have no strong opinion on experimental vs internal.
> > >>
> > >>>>> The question above was do we need this API, or should we instead remove
> > >>>>> the check from rx/tx_burst APIs?
> > >>>> I think these checks are useful to ensure that these functions
> > >>>> are not used for hairpin queues. At least to catch it with debug
> > >>>> enabled.
> > >>> OK, if so why not make them a proper API? Any concerns about it?
> > >
> > > Why should we not use this API in applications?
> >
> > I think the valid question is why an application needs the API.
> > Basically I don't mind, just want to be sure that only required
> > API is exposed.
> 
> Because hairpin queues are not standard queues,
> we may need to distinguish them.
> I see it as a good helper for applications.
> Am I missing something obvious?
> 
> 

Moving the API to experimental results in the following error:
error: 'rte_eth_dev_is_rx_hairpin_queue' is deprecated (declared at /.autodirect/mtrswgwork/orika/pegasus04_share/dpdk.org/x86_64-native-linuxapp-gcc/include/rte_ethdev.h:4276): Symbol is not yet part of stable ABI [-Werror=deprecated-declarations]

I suggest that we remove the checks; in any case these checks are only active in
debug mode, and when using the data path the user must be very careful and know
what he is doing.

Best,
Ori
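
For context, the check being dropped lives in the rte_eth_rx_burst() fast-path
inline in rte_ethdev.h. Below is a minimal sketch of its shape, assuming the
RTE_LIBRTE_ETHDEV_DEBUG guard and the log wording (the exact hunk is in patch
02/14 and may differ):

static inline uint16_t
rte_eth_rx_burst(uint16_t port_id, uint16_t queue_id,
		 struct rte_mbuf **rx_pkts, const uint16_t nb_pkts)
{
	struct rte_eth_dev *dev = &rte_eth_devices[port_id];

#ifdef RTE_LIBRTE_ETHDEV_DEBUG
	/* The contested check: the helper is internal (not exported in the
	 * .map file) yet referenced from this public static inline, so
	 * applications linking against the shared library end up with an
	 * undefined reference. */
	if (rte_eth_dev_is_rx_hairpin_queue(dev, queue_id)) {
		RTE_ETHDEV_LOG(ERR,
			"Rx burst not allowed on hairpin queue %u\n",
			queue_id);
		return 0;
	}
#endif
	/* Normal fast path: dispatch straight to the PMD burst routine. */
	return (*dev->rx_pkt_burst)(dev->data->rx_queues[queue_id],
				    rx_pkts, nb_pkts);
}

Removing the check keeps the fast path free of the internal symbol; calling
rx/tx_burst on a hairpin queue then becomes the application's responsibility,
as noted above.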



^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [dpdk-dev] [PATCH v7 02/14] ethdev: add support for hairpin queue
  2019-11-05 13:23                         ` Ori Kam
@ 2019-11-05 13:27                           ` Thomas Monjalon
  2019-11-05 13:34                             ` Andrew Rybchenko
  0 siblings, 1 reply; 186+ messages in thread
From: Thomas Monjalon @ 2019-11-05 13:27 UTC (permalink / raw)
  To: Ori Kam, Andrew Rybchenko, Ferruh Yigit
  Cc: John McNamara, Marko Kovacevic, dev, jingjing.wu, stephen, Jerin Jacob

05/11/2019 14:23, Ori Kam:
> From: Thomas Monjalon <thomas@monjalon.net>
> > 05/11/2019 13:53, Andrew Rybchenko:
> > > On 11/5/19 3:51 PM, Thomas Monjalon wrote:
> > > > 05/11/2019 13:27, Andrew Rybchenko:
> > > >> On 11/5/19 3:23 PM, Ferruh Yigit wrote:
> > > >>> On 11/5/2019 12:12 PM, Andrew Rybchenko wrote:
> > > >>>> On 11/5/19 3:05 PM, Ferruh Yigit wrote:
> > > >>>>> On 11/5/2019 11:49 AM, Andrew Rybchenko wrote:
> > > >>>>>> On 11/5/19 2:36 PM, Ori Kam wrote:
> > > >  >>>>>> From: Ferruh Yigit <ferruh.yigit@intel.com>
> > > >>>>>>>>>  /**
> > > >>>>>>>>> + * @internal
> > > >>>>>>>>> + * Check if the selected Rx queue is hairpin queue.
> > > >>>>>>>>> + *
> > > >>>>>>>>> + * @param dev
> > > >>>>>>>>> + *  Pointer to the selected device.
> > > >>>>>>>>> + * @param queue_id
> > > >>>>>>>>> + *  The selected queue.
> > > >>>>>>>>> + *
> > > >>>>>>>>> + * @return
> > > >>>>>>>>> + *   - (1) if the queue is hairpin queue, 0 otherwise.
> > > >>>>>>>>> + */
> > > >>>>>>>>> +int
> > > >>>>>>>>> +rte_eth_dev_is_rx_hairpin_queue(struct rte_eth_dev *dev, uint16_t queue_id);
> > > >>>>>>>>> +
> > > >>>>>>>>> +/**
> > > >>>>>>>>> + * @internal
> > > >>>>>>>>> + * Check if the selected Tx queue is hairpin queue.
> > > >>>>>>>>> + *
> > > >>>>>>>>> + * @param dev
> > > >>>>>>>>> + *  Pointer to the selected device.
> > > >>>>>>>>> + * @param queue_id
> > > >>>>>>>>> + *  The selected queue.
> > > >>>>>>>>> + *
> > > >>>>>>>>> + * @return
> > > >>>>>>>>> + *   - (1) if the queue is hairpin queue, 0 otherwise.
> > > >>>>>>>>> + */
> > > >>>>>>>>> +int
> > > >>>>>>>>> +rte_eth_dev_is_tx_hairpin_queue(struct rte_eth_dev *dev, uint16_t queue_id);
> > > > [...]
> > > >>>>>>>> These are causing a build error, thanks Jerin for catching, because
> > > >>>>>>>> they are internal and called by a public static inline API, so whoever
> > > >>>>>>>> calls 'rte_eth_rx/tx_burst()' APIs in the shared build, can't find
> > > >>>>>>>> 'rte_eth_dev_is_rx/tx_hairpin_queue()' functions [1],
> > > >>>>>>>>
> > > >>>>>>>> as far as I can see there are two options:
> > > >>>>>>>> 1) Remove these checks
> > > >>>>>>>> 2) Make 'rte_eth_dev_is_rx/tx_hairpin_queue()' public API instead of
> > > >>>>>>>> internal
> > > >>>>>>>>
> > > >>>>>>>> If there is a value to make 'rte_eth_dev_is_rx/tx_hairpin_queue()'
> > > >>>>>>>> public API we should go with (2) else (1).
> > > >>>>>>>>
> > > >>>>>>> I think we can skip the tests,
> > > >>>>>>> But it was Andrew's request so we must get his response.
> > > >>>>>>> It was also his emphasis that they should be internal.
> > > >>>>>> It is important for me to keep rte_eth_dev_state internal and
> > > >>>>>> few patches ago rte_eth_dev_is_rx_hairpin_queue() was inline.
> > > >>>>> Are you saying you don't want the option to make
> > > >>>>> 'rte_eth_dev_is_rx_hairpin_queue()' static inline because it will force
> > > >>>>> the 'RTE_ETH_QUEUE_STATE_xxx' being public?
> > > >>>> Yes.
> > > >>> +1
> > > >>>
> > > >>>>>> I'm OK to make the function experimental or keep it internal
> > > >>>>>> (no API/ABI stability requirements) but externally visible (in .map).
> > > >>>>> I think we can't do this, add a function declaration to the public
> > > >>>>> header file and add it to the .map file but keep it internal. Instead
> > > >>>>> we can make it a proper API and it should be experimental at least
> > > >>>>> for the first release.

Using an experimental function in an inline API may propagate the
experimental flag further.

> Moving the API to experimental results in the following error:
> error: 'rte_eth_dev_is_rx_hairpin_queue' is deprecated (declared at /.autodirect/mtrswgwork/orika/pegasus04_share/dpdk.org/x86_64-native-linuxapp-gcc/include/rte_ethdev.h:4276): Symbol is not yet part of stable ABI [-Werror=deprecated-declarations]
> 
> I suggest that we remove the checks; in any case these checks are only active in
> debug mode, and when using the data path the user must be very careful and know
> what he is doing.

I agree it seems better to remove these checks for now.
We can decide to re-introduce them later as non-experimental.
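
For readers hitting the same wall: the quoted error comes from the way DPDK
tags experimental symbols. Approximately, from rte_compat.h (the exact macro
body varies by release, so treat this as an illustration):

/* Unless the consumer defines ALLOW_EXPERIMENTAL_API, every reference to
 * a tagged symbol trips -Werror=deprecated-declarations with the exact
 * message quoted above. */
#ifndef ALLOW_EXPERIMENTAL_API
#define __rte_experimental \
	__attribute__((deprecated("Symbol is not yet part of stable ABI"), \
	section(".text.experimental")))
#else
#define __rte_experimental \
	__attribute__((section(".text.experimental")))
#endif

/* Tagging the helper experimental would make every expansion of the public
 * rte_eth_rx/tx_burst() inlines reference a deprecated-marked symbol,
 * which is the propagation described above. */
__rte_experimental
int rte_eth_dev_is_rx_hairpin_queue(struct rte_eth_dev *dev,
				    uint16_t queue_id);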



^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [dpdk-dev] [PATCH v7 02/14] ethdev: add support for hairpin queue
  2019-11-05 13:27                           ` Thomas Monjalon
@ 2019-11-05 13:34                             ` Andrew Rybchenko
  0 siblings, 0 replies; 186+ messages in thread
From: Andrew Rybchenko @ 2019-11-05 13:34 UTC (permalink / raw)
  To: Thomas Monjalon, Ori Kam, Ferruh Yigit
  Cc: John McNamara, Marko Kovacevic, dev, jingjing.wu, stephen, Jerin Jacob

On 11/5/19 4:27 PM, Thomas Monjalon wrote:
> 05/11/2019 14:23, Ori Kam:
>> From: Thomas Monjalon <thomas@monjalon.net>
>>> 05/11/2019 13:53, Andrew Rybchenko:
>>>> On 11/5/19 3:51 PM, Thomas Monjalon wrote:
>>>>> 05/11/2019 13:27, Andrew Rybchenko:
>>>>>> On 11/5/19 3:23 PM, Ferruh Yigit wrote:
>>>>>>> On 11/5/2019 12:12 PM, Andrew Rybchenko wrote:
>>>>>>>> On 11/5/19 3:05 PM, Ferruh Yigit wrote:
>>>>>>>>> On 11/5/2019 11:49 AM, Andrew Rybchenko wrote:
>>>>>>>>>> On 11/5/19 2:36 PM, Ori Kam wrote:
>>>>>  >>>>>> From: Ferruh Yigit <ferruh.yigit@intel.com>
>>>>>>>>>>>>>  /**
>>>>>>>>>>>>> + * @internal
>>>>>>>>>>>>> + * Check if the selected Rx queue is hairpin queue.
>>>>>>>>>>>>> + *
>>>>>>>>>>>>> + * @param dev
>>>>>>>>>>>>> + *  Pointer to the selected device.
>>>>>>>>>>>>> + * @param queue_id
>>>>>>>>>>>>> + *  The selected queue.
>>>>>>>>>>>>> + *
>>>>>>>>>>>>> + * @return
>>>>>>>>>>>>> + *   - (1) if the queue is hairpin queue, 0 otherwise.
>>>>>>>>>>>>> + */
>>>>>>>>>>>>> +int
>>>>>>>>>>>>> +rte_eth_dev_is_rx_hairpin_queue(struct rte_eth_dev *dev, uint16_t queue_id);
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +/**
>>>>>>>>>>>>> + * @internal
>>>>>>>>>>>>> + * Check if the selected Tx queue is hairpin queue.
>>>>>>>>>>>>> + *
>>>>>>>>>>>>> + * @param dev
>>>>>>>>>>>>> + *  Pointer to the selected device.
>>>>>>>>>>>>> + * @param queue_id
>>>>>>>>>>>>> + *  The selected queue.
>>>>>>>>>>>>> + *
>>>>>>>>>>>>> + * @return
>>>>>>>>>>>>> + *   - (1) if the queue is hairpin queue, 0 otherwise.
>>>>>>>>>>>>> + */
>>>>>>>>>>>>> +int
>>>>>>>>>>>>> +rte_eth_dev_is_tx_hairpin_queue(struct rte_eth_dev *dev, uint16_t queue_id);
>>>>> [...]
>>>>>>>>>>>> These are causing a build error, thanks Jerin for catching, because
>>>>>>>>>>>> they are internal and called by a public static inline API, so whoever
>>>>>>>>>>>> calls 'rte_eth_rx/tx_burst()' APIs in the shared build, can't find
>>>>>>>>>>>> 'rte_eth_dev_is_rx/tx_hairpin_queue()' functions [1],
>>>>>>>>>>>>
>>>>>>>>>>>> as far as I can see there are two options:
>>>>>>>>>>>> 1) Remove these checks
>>>>>>>>>>>> 2) Make 'rte_eth_dev_is_rx/tx_hairpin_queue()' public API instead of
>>>>>>>>>>>> internal
>>>>>>>>>>>> If there is a value to make 'rte_eth_dev_is_rx/tx_hairpin_queue()'
>>>>>>>>>>>> public API we should go with (2) else (1).
>>>>>>>>>>>>
>>>>>>>>>>> I think we can skip the tests,
>>>>>>>>>>> But it was Andrew's request so we must get his response.
>>>>>>>>>>> It was also his emphasis that they should be internal.
>>>>>>>>>> It is important for me to keep rte_eth_dev_state internal and
>>>>>>>>>> few patches ago rte_eth_dev_is_rx_hairpin_queue() was inline.
>>>>>>>>> Are you saying you don't want the option to make
>>>>>>>>> 'rte_eth_dev_is_rx_hairpin_queue()' static inline because it will force
>>>>>>>>> the 'RTE_ETH_QUEUE_STATE_xxx' being public?
>>>>>>>> Yes.
>>>>>>> +1
>>>>>>>
>>>>>>>>>> I'm OK to make the function experimental or keep it internal
>>>>>>>>>> (no API/ABI stability requirements) but externally visible (in .map).
>>>>>>>>> I think we can't do this, add a function declaration to the public
>>>>>>>>> header file and add it to the .map file but keep it internal. Instead
>>>>>>>>> we can make it a proper API and it should be experimental at least
>>>>>>>>> for the first release.
> Using an experimental function in an inline API may propagate the
> experimental flag further.
>
>> Moving the API to experimental results in the following error:
>> error: 'rte_eth_dev_is_rx_hairpin_queue' is deprecated (declared at /.autodirect/mtrswgwork/orika/pegasus04_share/dpdk.org/x86_64-native-linuxapp-gcc/include/rte_ethdev.h:4276): Symbol is not yet part of stable ABI [-Werror=deprecated-declarations]
>>
>> I suggest that we remove the checks; in any case these checks are only active in
>> debug mode, and when using the data path the user must be very careful and know
>> what he is doing.
> I agree it seems better to remove these checks for now.
> We can decide to re-introduce them later as non-experimental.

OK



^ permalink raw reply	[flat|nested] 186+ messages in thread

* Re: [dpdk-dev] [PATCH v7 02/14] ethdev: add support for hairpin queue
  2019-11-05 13:02                       ` Thomas Monjalon
  2019-11-05 13:23                         ` Ori Kam
@ 2019-11-05 13:41                         ` Andrew Rybchenko
  1 sibling, 0 replies; 186+ messages in thread
From: Andrew Rybchenko @ 2019-11-05 13:41 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: Ferruh Yigit, Ori Kam, John McNamara, Marko Kovacevic, dev,
	jingjing.wu, stephen, Jerin Jacob

On 11/5/19 4:02 PM, Thomas Monjalon wrote:
> 05/11/2019 13:53, Andrew Rybchenko:
>> On 11/5/19 3:51 PM, Thomas Monjalon wrote:
>>> 05/11/2019 13:27, Andrew Rybchenko:
>>>> On 11/5/19 3:23 PM, Ferruh Yigit wrote:
>>>>> On 11/5/2019 12:12 PM, Andrew Rybchenko wrote:
>>>>>> On 11/5/19 3:05 PM, Ferruh Yigit wrote:
>>>>>>> On 11/5/2019 11:49 AM, Andrew Rybchenko wrote:
>>>>>>>> On 11/5/19 2:36 PM, Ori Kam wrote:
>>>  >>>>>> From: Ferruh Yigit <ferruh.yigit@intel.com>
>>>>>>>>>>>  /**
>>>>>>>>>>> + * @internal
>>>>>>>>>>> + * Check if the selected Rx queue is hairpin queue.
>>>>>>>>>>> + *
>>>>>>>>>>> + * @param dev
>>>>>>>>>>> + *  Pointer to the selected device.
>>>>>>>>>>> + * @param queue_id
>>>>>>>>>>> + *  The selected queue.
>>>>>>>>>>> + *
>>>>>>>>>>> + * @return
>>>>>>>>>>> + *   - (1) if the queue is hairpin queue, 0 otherwise.
>>>>>>>>>>> + */
>>>>>>>>>>> +int
>>>>>>>>>>> +rte_eth_dev_is_rx_hairpin_queue(struct rte_eth_dev *dev, uint16_t queue_id);
>>>>>>>>>>> +
>>>>>>>>>>> +/**
>>>>>>>>>>> + * @internal
>>>>>>>>>>> + * Check if the selected Tx queue is hairpin queue.
>>>>>>>>>>> + *
>>>>>>>>>>> + * @param dev
>>>>>>>>>>> + *  Pointer to the selected device.
>>>>>>>>>>> + * @param queue_id
>>>>>>>>>>> + *  The selected queue.
>>>>>>>>>>> + *
>>>>>>>>>>> + * @return
>>>>>>>>>>> + *   - (1) if the queue is hairpin queue, 0 otherwise.
>>>>>>>>>>> + */
>>>>>>>>>>> +int
>>>>>>>>>>> +rte_eth_dev_is_tx_hairpin_queue(struct rte_eth_dev *dev, uint16_t queue_id);
>>> [...]
>>>>>>>>>> These are causing a build error, thanks Jerin for catching, because they are
>>>>>>>>>> internal and called by a public static inline API, so whoever calls
>>>>>>>>>> 'rte_eth_rx/tx_burst()' APIs in the shared build, can't find
>>>>>>>>>> 'rte_eth_dev_is_rx/tx_hairpin_queue()' functions [1],
>>>>>>>>>>
>>>>>>>>>> as far as I can see there are two options:
>>>>>>>>>> 1) Remove these checks
>>>>>>>>>> 2) Make 'rte_eth_dev_is_rx/tx_hairpin_queue()' public API instead of internal
>>>>>>>>>>
>>>>>>>>>> If there is a value to make 'rte_eth_dev_is_rx/tx_hairpin_queue()' public API
>>>>>>>>>> we should go with (2) else (1).
>>>>>>>>>>
>>>>>>>>> I think we can skip the tests,
>>>>>>>>> But it was Andrew's request so we must get his response.
>>>>>>>>> It was also his emphasis that they should be internal.
>>>>>>>> It is important for me to keep rte_eth_dev_state internal and
>>>>>>>> few patches ago rte_eth_dev_is_rx_hairpin_queue() was inline.
>>>>>>> Are you saying you don't want the option to make
>>>>>>> 'rte_eth_dev_is_rx_hairpin_queue()' static inline because it will force the
>>>>>>> 'RTE_ETH_QUEUE_STATE_xxx' being public?
>>>>>> Yes.
>>>>> +1
>>>>>
>>>>>>>> I'm OK to make the function experimental or keep it internal
>>>>>>>> (no API/ABI stability requirements) but externally visible (in .map).
>>>>>>> I think we can't do this, add a function declaration to the public header file
>>>>>>> and add it to the .map file but keep it internal. Instead we can make it a
>>>>>>> proper API and it should be experimental at least for the first release.
>>>>>> We have discussed a similar thing with Olivier recently [1].
>>>>>>
>>>>>> [1] http://inbox.dpdk.org/dev/20191030142938.bpi4txlrebqfq7uw@platinum/
>>>>> Yes we can say they are internal but there won't be anything preventing
>>>>> applications from using them.
>>>> That's true, but making it internal says - don't use it.
>>>> Anyway, I have no strong opinion on experimental vs internal.
>>>>
>>>>>>> The question above was do we need this API, or should we instead remove the check
>>>>>>> from rx/tx_burst APIs?
>>>>>> I think these checks are useful to ensure that these functions
>>>>>> are not used for hairpin queues. At least to catch it with debug
>>>>>> enabled.
>>>>> OK, if so why not make them a proper API? Any concerns about it?
>>> Why should we not use this API in applications?
>> I think the valid question is why an application needs the API.
>> Basically I don't mind, just want to be sure that only required
>> API is exposed.
> Because hairpin queues are not standard queues,
> we may need to distinguish them.
> I see it as a good helper for applications.
> Am I missing something obvious?

I think no. I would prefer an explicit reason, but since
these functions are simple, I'm OK to go with
"may need to distinguish them".


^ permalink raw reply	[flat|nested] 186+ messages in thread

end of thread, other threads:[~2019-11-05 13:42 UTC | newest]

Thread overview: 186+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-09-26  6:28 [dpdk-dev] [PATCH 00/13] add hairpin feature Ori Kam
2019-09-26  6:28 ` [dpdk-dev] [PATCH 01/13] ethdev: support setup function for hairpin queue Ori Kam
2019-09-26 12:18   ` Andrew Rybchenko
     [not found]     ` <AM0PR0502MB4019A2FEADE5F9DCD0D9DDFED2860@AM0PR0502MB4019.eurprd05.prod.outlook.com>
2019-09-26 15:58       ` Ori Kam
2019-09-26 17:24         ` Andrew Rybchenko
2019-09-28 15:19           ` Ori Kam
2019-09-29 12:10             ` Andrew Rybchenko
2019-10-02 12:19               ` Ori Kam
2019-10-03 13:26                 ` Andrew Rybchenko
2019-10-03 17:46                   ` Ori Kam
2019-10-03 18:39     ` Ray Kinsella
2019-09-26  6:28 ` [dpdk-dev] [PATCH 02/13] net/mlx5: query hca hairpin capabilities Ori Kam
2019-09-26  9:31   ` Slava Ovsiienko
2019-09-26  6:28 ` [dpdk-dev] [PATCH 03/13] net/mlx5: support Rx hairpin queues Ori Kam
2019-09-26  9:32   ` Slava Ovsiienko
2019-09-26  6:29 ` [dpdk-dev] [PATCH 04/13] net/mlx5: prepare txq to work with different types Ori Kam
2019-09-26  9:32   ` Slava Ovsiienko
2019-09-26  6:29 ` [dpdk-dev] [PATCH 05/13] net/mlx5: support Tx hairpin queues Ori Kam
2019-09-26  9:32   ` Slava Ovsiienko
2019-09-26  6:29 ` [dpdk-dev] [PATCH 06/13] app/testpmd: add hairpin support Ori Kam
2019-09-26  9:32   ` Slava Ovsiienko
2019-09-26  6:29 ` [dpdk-dev] [PATCH 07/13] net/mlx5: add hairpin binding function Ori Kam
2019-09-26  9:33   ` Slava Ovsiienko
2019-09-26  6:29 ` [dpdk-dev] [PATCH 08/13] net/mlx5: add support for hairpin hrxq Ori Kam
2019-09-26  9:33   ` Slava Ovsiienko
2019-09-26  6:29 ` [dpdk-dev] [PATCH 09/13] net/mlx5: add internal tag item and action Ori Kam
2019-09-26  9:33   ` Slava Ovsiienko
2019-09-26  6:29 ` [dpdk-dev] [PATCH 10/13] net/mlx5: add id generation function Ori Kam
2019-09-26  9:34   ` Slava Ovsiienko
2019-09-26  6:29 ` [dpdk-dev] [PATCH 11/13] net/mlx5: add default flows for hairpin Ori Kam
2019-09-26  9:34   ` Slava Ovsiienko
2019-09-26  6:29 ` [dpdk-dev] [PATCH 12/13] net/mlx5: split hairpin flows Ori Kam
2019-09-26  9:34   ` Slava Ovsiienko
2019-09-26  6:29 ` [dpdk-dev] [PATCH 13/13] doc: add hairpin feature Ori Kam
2019-09-26  9:34   ` Slava Ovsiienko
2019-09-26 12:32 ` [dpdk-dev] [PATCH 00/13] " Andrew Rybchenko
2019-09-26 15:22   ` Ori Kam
2019-09-26 15:48     ` Andrew Rybchenko
2019-09-26 16:11       ` Ori Kam
2019-10-04 19:54 ` [dpdk-dev] [PATCH v2 00/14] " Ori Kam
2019-10-04 19:54   ` [dpdk-dev] [PATCH v2 01/14] ethdev: add support for hairpin queue Ori Kam
2019-10-08 16:11     ` Andrew Rybchenko
2019-10-10 21:07       ` Ori Kam
2019-10-14  9:37         ` Andrew Rybchenko
2019-10-14 10:19           ` Ori Kam
2019-10-04 19:54   ` [dpdk-dev] [PATCH v2 02/14] net/mlx5: query hca hairpin capabilities Ori Kam
2019-10-04 19:54   ` [dpdk-dev] [PATCH v2 03/14] net/mlx5: support Rx hairpin queues Ori Kam
2019-10-04 19:54   ` [dpdk-dev] [PATCH v2 04/14] net/mlx5: prepare txq to work with different types Ori Kam
2019-10-04 19:54   ` [dpdk-dev] [PATCH v2 05/14] net/mlx5: support Tx hairpin queues Ori Kam
2019-10-04 19:54   ` [dpdk-dev] [PATCH v2 06/14] net/mlx5: add get hairpin capabilities Ori Kam
2019-10-04 19:54   ` [dpdk-dev] [PATCH v2 07/14] app/testpmd: add hairpin support Ori Kam
2019-10-04 19:54   ` [dpdk-dev] [PATCH v2 08/14] net/mlx5: add hairpin binding function Ori Kam
2019-10-04 19:54   ` [dpdk-dev] [PATCH v2 09/14] net/mlx5: add support for hairpin hrxq Ori Kam
2019-10-04 19:54   ` [dpdk-dev] [PATCH v2 10/14] net/mlx5: add internal tag item and action Ori Kam
2019-10-04 19:54   ` [dpdk-dev] [PATCH v2 11/14] net/mlx5: add id generation function Ori Kam
2019-10-04 19:54   ` [dpdk-dev] [PATCH v2 12/14] net/mlx5: add default flows for hairpin Ori Kam
2019-10-04 19:54   ` [dpdk-dev] [PATCH v2 13/14] net/mlx5: split hairpin flows Ori Kam
2019-10-04 19:54   ` [dpdk-dev] [PATCH v2 14/14] doc: add hairpin feature Ori Kam
2019-10-08 14:55     ` Andrew Rybchenko
2019-10-10  8:24       ` Ori Kam
2019-10-15  9:04 ` [dpdk-dev] [PATCH v3 00/14] " Ori Kam
2019-10-15  9:04   ` [dpdk-dev] [PATCH v3 01/14] ethdev: add support for hairpin queue Ori Kam
2019-10-15 10:12     ` Andrew Rybchenko
2019-10-16 19:36       ` Ori Kam
2019-10-17 10:41         ` Andrew Rybchenko
2019-10-15  9:04   ` [dpdk-dev] [PATCH v3 02/14] net/mlx5: query hca hairpin capabilities Ori Kam
2019-10-15  9:04   ` [dpdk-dev] [PATCH v3 03/14] net/mlx5: support Rx hairpin queues Ori Kam
2019-10-15  9:04   ` [dpdk-dev] [PATCH v3 04/14] net/mlx5: prepare txq to work with different types Ori Kam
2019-10-15  9:04   ` [dpdk-dev] [PATCH v3 05/14] net/mlx5: support Tx hairpin queues Ori Kam
2019-10-15  9:04   ` [dpdk-dev] [PATCH v3 06/14] net/mlx5: add get hairpin capabilities Ori Kam
2019-10-15  9:04   ` [dpdk-dev] [PATCH v3 07/14] app/testpmd: add hairpin support Ori Kam
2019-10-15  9:04   ` [dpdk-dev] [PATCH v3 08/14] net/mlx5: add hairpin binding function Ori Kam
2019-10-15  9:04   ` [dpdk-dev] [PATCH v3 09/14] net/mlx5: add support for hairpin hrxq Ori Kam
2019-10-15  9:04   ` [dpdk-dev] [PATCH v3 10/14] net/mlx5: add internal tag item and action Ori Kam
2019-10-15  9:04   ` [dpdk-dev] [PATCH v3 11/14] net/mlx5: add id generation function Ori Kam
2019-10-15  9:04   ` [dpdk-dev] [PATCH v3 12/14] net/mlx5: add default flows for hairpin Ori Kam
2019-10-15  9:04   ` [dpdk-dev] [PATCH v3 13/14] net/mlx5: split hairpin flows Ori Kam
2019-10-15  9:04   ` [dpdk-dev] [PATCH v3 14/14] doc: add hairpin feature Ori Kam
2019-10-17 15:32 ` [dpdk-dev] [PATCH v4 00/15] " Ori Kam
2019-10-17 15:32   ` [dpdk-dev] [PATCH v4 01/15] ethdev: move queue state defines to private file Ori Kam
2019-10-17 15:37     ` Stephen Hemminger
2019-10-22 10:59     ` Andrew Rybchenko
2019-10-17 15:32   ` [dpdk-dev] [PATCH v4 02/15] ethdev: add support for hairpin queue Ori Kam
2019-10-17 21:01     ` Thomas Monjalon
2019-10-22 11:37     ` Andrew Rybchenko
2019-10-23  6:23       ` Ori Kam
2019-10-23  7:04     ` Thomas Monjalon
2019-10-23 10:09       ` Ori Kam
2019-10-23 10:18         ` Bruce Richardson
2019-10-17 15:32   ` [dpdk-dev] [PATCH v4 03/15] net/mlx5: query hca hairpin capabilities Ori Kam
2019-10-17 15:32   ` [dpdk-dev] [PATCH v4 04/15] net/mlx5: support Rx hairpin queues Ori Kam
2019-10-17 15:32   ` [dpdk-dev] [PATCH v4 05/15] net/mlx5: prepare txq to work with different types Ori Kam
2019-10-17 15:32   ` [dpdk-dev] [PATCH v4 06/15] net/mlx5: support Tx hairpin queues Ori Kam
2019-10-17 15:32   ` [dpdk-dev] [PATCH v4 07/15] net/mlx5: add get hairpin capabilities Ori Kam
2019-10-17 15:32   ` [dpdk-dev] [PATCH v4 08/15] app/testpmd: add hairpin support Ori Kam
2019-10-17 15:32   ` [dpdk-dev] [PATCH v4 09/15] net/mlx5: add hairpin binding function Ori Kam
2019-10-17 15:32   ` [dpdk-dev] [PATCH v4 10/15] net/mlx5: add support for hairpin hrxq Ori Kam
2019-10-17 15:32   ` [dpdk-dev] [PATCH v4 11/15] net/mlx5: add internal tag item and action Ori Kam
2019-10-17 15:32   ` [dpdk-dev] [PATCH v4 12/15] net/mlx5: add id generation function Ori Kam
2019-10-17 15:32   ` [dpdk-dev] [PATCH v4 13/15] net/mlx5: add default flows for hairpin Ori Kam
2019-10-17 15:32   ` [dpdk-dev] [PATCH v4 14/15] net/mlx5: split hairpin flows Ori Kam
2019-10-17 15:32   ` [dpdk-dev] [PATCH v4 15/15] doc: add hairpin feature Ori Kam
2019-10-18 19:07   ` [dpdk-dev] [PATCH v4 00/15] " Ferruh Yigit
2019-10-23 13:37 ` [dpdk-dev] [PATCH v5 " Ori Kam
2019-10-23 13:37   ` [dpdk-dev] [PATCH v5 01/15] ethdev: move queue state defines to private file Ori Kam
2019-10-23 13:37   ` [dpdk-dev] [PATCH v5 02/15] ethdev: add support for hairpin queue Ori Kam
2019-10-24  7:54     ` Andrew Rybchenko
2019-10-24  8:29       ` Ori Kam
2019-10-24 14:47         ` Andrew Rybchenko
2019-10-24 15:17           ` Thomas Monjalon
2019-10-24 15:30             ` Andrew Rybchenko
2019-10-24 15:34               ` Thomas Monjalon
2019-10-25 19:01                 ` Ori Kam
2019-10-25 22:16                   ` Thomas Monjalon
2019-10-23 13:37   ` [dpdk-dev] [PATCH v5 03/15] net/mlx5: query hca hairpin capabilities Ori Kam
2019-10-23 13:37   ` [dpdk-dev] [PATCH v5 04/15] net/mlx5: support Rx hairpin queues Ori Kam
2019-10-23 13:37   ` [dpdk-dev] [PATCH v5 05/15] net/mlx5: prepare txq to work with different types Ori Kam
2019-10-23 13:37   ` [dpdk-dev] [PATCH v5 06/15] net/mlx5: support Tx hairpin queues Ori Kam
2019-10-23 13:37   ` [dpdk-dev] [PATCH v5 07/15] net/mlx5: add get hairpin capabilities Ori Kam
2019-10-23 13:37   ` [dpdk-dev] [PATCH v5 08/15] app/testpmd: add hairpin support Ori Kam
2019-10-23 13:37   ` [dpdk-dev] [PATCH v5 09/15] net/mlx5: add hairpin binding function Ori Kam
2019-10-23 13:37   ` [dpdk-dev] [PATCH v5 10/15] net/mlx5: add support for hairpin hrxq Ori Kam
2019-10-23 13:37   ` [dpdk-dev] [PATCH v5 11/15] net/mlx5: add internal tag item and action Ori Kam
2019-10-23 13:37   ` [dpdk-dev] [PATCH v5 12/15] net/mlx5: add id generation function Ori Kam
2019-10-23 13:37   ` [dpdk-dev] [PATCH v5 13/15] net/mlx5: add default flows for hairpin Ori Kam
2019-10-23 13:37   ` [dpdk-dev] [PATCH v5 14/15] net/mlx5: split hairpin flows Ori Kam
2019-10-23 13:37   ` [dpdk-dev] [PATCH v5 15/15] doc: add hairpin feature Ori Kam
2019-10-24  8:11     ` Thomas Monjalon
2019-10-25 18:49   ` [dpdk-dev] [PATCH v5 00/15] " Ferruh Yigit
2019-10-27 12:24 ` [dpdk-dev] [PATCH v6 00/14] " Ori Kam
2019-10-27 12:24   ` [dpdk-dev] [PATCH v6 01/14] ethdev: move queue state defines to private file Ori Kam
2019-10-27 12:24   ` [dpdk-dev] [PATCH v6 02/14] ethdev: add support for hairpin queue Ori Kam
2019-10-28 15:16     ` Andrew Rybchenko
2019-10-28 18:44       ` Ori Kam
2019-10-29  7:38         ` Andrew Rybchenko
2019-10-29 19:39           ` Ori Kam
2019-10-30  6:39             ` Andrew Rybchenko
2019-10-30  6:56               ` Ori Kam
2019-10-27 12:24   ` [dpdk-dev] [PATCH v6 03/14] net/mlx5: query hca hairpin capabilities Ori Kam
2019-10-27 12:24   ` [dpdk-dev] [PATCH v6 04/14] net/mlx5: support Rx hairpin queues Ori Kam
2019-10-27 12:24   ` [dpdk-dev] [PATCH v6 05/14] net/mlx5: prepare txq to work with different types Ori Kam
2019-10-27 12:24   ` [dpdk-dev] [PATCH v6 06/14] net/mlx5: support Tx hairpin queues Ori Kam
2019-10-27 12:24   ` [dpdk-dev] [PATCH v6 07/14] net/mlx5: add get hairpin capabilities Ori Kam
2019-10-27 12:24   ` [dpdk-dev] [PATCH v6 08/14] app/testpmd: add hairpin support Ori Kam
2019-10-27 12:24   ` [dpdk-dev] [PATCH v6 09/14] net/mlx5: add hairpin binding function Ori Kam
2019-10-27 12:24   ` [dpdk-dev] [PATCH v6 10/14] net/mlx5: add support for hairpin hrxq Ori Kam
2019-10-27 12:24   ` [dpdk-dev] [PATCH v6 11/14] net/mlx5: add internal tag item and action Ori Kam
2019-10-27 12:24   ` [dpdk-dev] [PATCH v6 12/14] net/mlx5: add id generation function Ori Kam
2019-10-27 12:25   ` [dpdk-dev] [PATCH v6 13/14] net/mlx5: add default flows for hairpin Ori Kam
2019-10-27 12:25   ` [dpdk-dev] [PATCH v6 14/14] net/mlx5: split hairpin flows Ori Kam
2019-10-30 23:53 ` [dpdk-dev] [PATCH v7 00/14] add hairpin feature Ori Kam
2019-10-30 23:53   ` [dpdk-dev] [PATCH v7 01/14] ethdev: move queue state defines to private file Ori Kam
2019-10-30 23:53   ` [dpdk-dev] [PATCH v7 02/14] ethdev: add support for hairpin queue Ori Kam
2019-10-31  8:25     ` Andrew Rybchenko
2019-11-05 11:24     ` Ferruh Yigit
2019-11-05 11:36       ` Ori Kam
2019-11-05 11:49         ` Andrew Rybchenko
2019-11-05 12:00           ` Ori Kam
2019-11-05 12:05           ` Ferruh Yigit
2019-11-05 12:12             ` Andrew Rybchenko
2019-11-05 12:23               ` Ferruh Yigit
2019-11-05 12:27                 ` Andrew Rybchenko
2019-11-05 12:51                   ` Thomas Monjalon
2019-11-05 12:53                     ` Andrew Rybchenko
2019-11-05 13:02                       ` Thomas Monjalon
2019-11-05 13:23                         ` Ori Kam
2019-11-05 13:27                           ` Thomas Monjalon
2019-11-05 13:34                             ` Andrew Rybchenko
2019-11-05 13:41                         ` Andrew Rybchenko
2019-10-30 23:53   ` [dpdk-dev] [PATCH v7 03/14] net/mlx5: query hca hairpin capabilities Ori Kam
2019-10-30 23:53   ` [dpdk-dev] [PATCH v7 04/14] net/mlx5: support Rx hairpin queues Ori Kam
2019-10-30 23:53   ` [dpdk-dev] [PATCH v7 05/14] net/mlx5: prepare txq to work with different types Ori Kam
2019-10-30 23:53   ` [dpdk-dev] [PATCH v7 06/14] net/mlx5: support Tx hairpin queues Ori Kam
2019-10-30 23:53   ` [dpdk-dev] [PATCH v7 07/14] net/mlx5: add get hairpin capabilities Ori Kam
2019-10-30 23:53   ` [dpdk-dev] [PATCH v7 08/14] app/testpmd: add hairpin support Ori Kam
2019-10-31 17:11     ` Ferruh Yigit
2019-10-31 17:36       ` Ori Kam
2019-10-31 17:54         ` Ferruh Yigit
2019-10-31 18:59           ` Ori Kam
2019-10-30 23:53   ` [dpdk-dev] [PATCH v7 09/14] net/mlx5: add hairpin binding function Ori Kam
2019-10-30 23:53   ` [dpdk-dev] [PATCH v7 10/14] net/mlx5: add support for hairpin hrxq Ori Kam
2019-10-30 23:53   ` [dpdk-dev] [PATCH v7 11/14] net/mlx5: add internal tag item and action Ori Kam
2019-10-30 23:53   ` [dpdk-dev] [PATCH v7 12/14] net/mlx5: add id generation function Ori Kam
2019-10-30 23:53   ` [dpdk-dev] [PATCH v7 13/14] net/mlx5: add default flows for hairpin Ori Kam
2019-10-30 23:53   ` [dpdk-dev] [PATCH v7 14/14] net/mlx5: split hairpin flows Ori Kam
2019-10-31 17:13   ` [dpdk-dev] [PATCH v7 00/14] add hairpin feature Ferruh Yigit
