DPDK patches and discussions
* [PATCH v2 0/3] Direct re-arming of buffers on receive side
@ 2022-09-27  2:47 Feifei Wang
  2022-09-27  2:47 ` [PATCH v2 1/3] ethdev: add API for direct rearm mode Feifei Wang
                   ` (3 more replies)
  0 siblings, 4 replies; 17+ messages in thread
From: Feifei Wang @ 2022-09-27  2:47 UTC (permalink / raw)
  Cc: dev, nd, Feifei Wang

Currently, the transmit side frees the buffers into the lcore cache and
the receive side allocates buffers from the lcore cache. The transmit
side typically frees 32 buffers, resulting in 32*8=256B of stores to the
lcore cache. The receive side allocates 32 buffers and stores them in
the receive side software ring, resulting in 32*8=256B of stores and
256B of loads from the lcore cache.

This patch proposes a mechanism to avoid freeing to/allocating from
the lcore cache, i.e. the receive side frees the buffers from the
transmit side directly into its own software ring. This avoids the 256B
of loads and stores introduced by the lcore cache. It also frees up the
cache lines used by the lcore cache.

However, this solution poses several constraints:

1) The receive queue needs to know which transmit queue it should take
the buffers from. The application logic decides which transmit port to
use to send out the packets. In many use cases the NIC might have a
single port ([1], [2], [3]), in which case a given transmit queue is
always mapped to a single receive queue (1:1 Rx queue: Tx queue). This
is easy to configure.

If the NIC has 2 ports (there are several references), then we will have
a 1:2 (Rx queue: Tx queue) mapping, which is still easy to configure.
However, if this is generalized to 'N' ports, the configuration can be
long. Moreover, the PMD would have to scan a list of transmit queues to
pull the buffers from.

2) The other factor that needs to be considered is the 'run-to-completion'
vs 'pipeline' model. In the run-to-completion model, the receive side and
the transmit side run serially on the same lcore. In the pipeline model,
the receive side and the transmit side might run on different lcores in
parallel. This requires locking and is not supported at this point.

3) Tx and Rx buffers must come from the same mempool, and we must also
ensure that the number of Tx buffers freed equals the number of Rx
buffers rearmed:
(txq->tx_rs_thresh == RTE_I40E_RXQ_REARM_THRESH)
Only then can 'tx_next_dd' be updated correctly in direct-rearm mode,
because tx_next_dd is the variable used to compute the Tx sw-ring free
location; its value stays (tx_rs_thresh - 1) entries ahead of the
position where the next free starts.
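
A minimal sketch of this counter handling, mirrored from the i40e
changes in patch 2/3 (the variables are the i40e Tx queue fields exposed
through the new txq_data structure):
-----------------------------------------------------------------------
	/* first buffer to free is at tx_next_dd - (tx_rs_thresh - 1) */
	txep = tx_sw_ring + tx_next_dd - (tx_rs_thresh - 1);
	/* ... hand tx_rs_thresh mbufs over to the Rx sw-ring ... */
	nb_tx_free += tx_rs_thresh;
	tx_next_dd += tx_rs_thresh;
	if (tx_next_dd >= nb_tx_desc)
		tx_next_dd = tx_rs_thresh - 1;	/* wraps one threshold in */
-----------------------------------------------------------------------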

Current status in this patch:
1) Two APIs are added for users to enable direct-rearm mode:
  In the control plane, users can call 'rte_eth_txq_data_get' to get the
  Tx sw_ring pointer and its txq_info (this avoids the Rx side loading
  Tx data directly);

  In the data plane, users can call 'rte_eth_rx_direct_rearm' to rearm
  Rx buffers and free Tx buffers at the same time (currently it supports
  a 1:1 (rxq:txq) mapping; a concrete sketch with the actual API
  signatures follows after this list):
-----------------------------------------------------------------------
  control plane:
  	rte_eth_txq_data_get(*txq_data);
  data plane:
  	loop {
  		rte_eth_rx_direct_rearm(*txq_data){
     			for (i = 0; i < 32; i++) {
       				rx.mbuf[i] = tx.mbuf[i];
       				initialize descs[i];
    			}
		}
		rte_eth_rx_burst;
		rte_eth_tx_burst;
  	}
-----------------------------------------------------------------------
2) The i40e driver is changed to do the direct re-arm on the receive
  side.
3) The l3fwd application is modified to enable direct rearm mode. Users
  can enable direct-rearm and map queues via input parameters.
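
For reference, a more concrete sketch of the same loop using the actual
API signatures from this series (port 0/queue 0, the burst size of 32
and the omitted error handling are placeholder choices):
-----------------------------------------------------------------------
  control plane:
	struct rte_eth_txq_data txq_data;
	rte_eth_tx_queue_data_get(0, 0, &txq_data);
  data plane:
	struct rte_mbuf *pkts[32];
	uint16_t nb_rx;
	for (;;) {
		/* move freed Tx mbufs straight into the Rx sw-ring */
		rte_eth_rx_direct_rearm(0, 0, &txq_data);
		nb_rx = rte_eth_rx_burst(0, 0, pkts, 32);
		/* ... packet processing ... */
		rte_eth_tx_burst(0, 0, pkts, nb_rx);
	}
-----------------------------------------------------------------------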

Testing status:
1. The testing results for L3fwd are as follows:
-------------------------------------------------------------------
enabled direct rearm
-------------------------------------------------------------------
Arm:
N1SDP(neon path):
without fast-free mode		with fast-free mode
	+15.09%				+4.2%

Ampere Altra(neon path):
without fast-free mode		with fast-free mode
	+10.9%				+14.6%
-------------------------------------------------------------------

2. The testing results for VPP-L3fwd are as follows:
-------------------------------------------------------------------
Arm:
N1SDP(neon path):
with direct re-arm mode enabled
	+4.5%

Ampere Altra(neon path):
with direct re-arm mode enabled
        +6.5%
-------------------------------------------------------------------

Reference:
[1] https://store.nvidia.com/en-us/networking/store/product/MCX623105AN-CDAT/NVIDIAMCX623105ANCDATConnectX6DxENAdapterCard100GbECryptoDisabled/
[2] https://www.intel.com/content/www/us/en/products/sku/192561/intel-ethernet-network-adapter-e810cqda1/specifications.html
[3] https://www.broadcom.com/products/ethernet-connectivity/network-adapters/100gb-nic-ocp/n1100g

V2:
1. Use data-plane API to enable direct-rearm (Konstantin, Honnappa)
2. Add 'txq_data_get' API to get txq info for Rx (Konstantin)
3. Use input parameter to enable direct rearm in l3fwd (Konstantin)
4. Add condition detection for direct rearm API (Morten, Andrew Rybchenko)

Feifei Wang (3):
  ethdev: add API for direct rearm mode
  net/i40e: enable direct rearm mode
  examples/l3fwd: enable direct rearm mode

 drivers/net/i40e/i40e_ethdev.c        |  1 +
 drivers/net/i40e/i40e_ethdev.h        |  2 +
 drivers/net/i40e/i40e_rxtx.c          | 19 ++++++
 drivers/net/i40e/i40e_rxtx.h          |  2 +
 drivers/net/i40e/i40e_rxtx_vec_neon.c | 93 ++++++++++++++++++++++++++
 examples/l3fwd/l3fwd.h                | 12 ++++
 examples/l3fwd/l3fwd_lpm.c            | 22 +++++++
 examples/l3fwd/main.c                 | 94 +++++++++++++++++++++++++-
 lib/ethdev/ethdev_driver.h            |  9 +++
 lib/ethdev/ethdev_private.c           |  1 +
 lib/ethdev/rte_ethdev.c               | 37 +++++++++++
 lib/ethdev/rte_ethdev.h               | 95 +++++++++++++++++++++++++++
 lib/ethdev/rte_ethdev_core.h          |  5 ++
 lib/ethdev/version.map                |  4 ++
 14 files changed, 395 insertions(+), 1 deletion(-)

-- 
2.25.1



* [PATCH v2 1/3] ethdev: add API for direct rearm mode
  2022-09-27  2:47 [PATCH v2 0/3] Direct re-arming of buffers on receive side Feifei Wang
@ 2022-09-27  2:47 ` Feifei Wang
  2022-10-03  8:58   ` Konstantin Ananyev
  2022-09-27  2:47 ` [PATCH v2 2/3] net/i40e: enable " Feifei Wang
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 17+ messages in thread
From: Feifei Wang @ 2022-09-27  2:47 UTC (permalink / raw)
  To: Thomas Monjalon, Ferruh Yigit, Andrew Rybchenko, Ray Kinsella
  Cc: dev, nd, Feifei Wang, Honnappa Nagarahalli, Ruifeng Wang

Add an API for enabling direct rearm mode and for mapping Rx and Tx
queues. Currently, the API supports a 1:1 (txq:rxq) mapping.

Furthermore, to avoid the Rx side loading Tx data directly, add an API
called 'rte_eth_txq_data_get' to get the Tx sw_ring and its information.

Suggested-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Suggested-by: Ruifeng Wang <ruifeng.wang@arm.com>
Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
---
 lib/ethdev/ethdev_driver.h   |  9 ++++
 lib/ethdev/ethdev_private.c  |  1 +
 lib/ethdev/rte_ethdev.c      | 37 ++++++++++++++
 lib/ethdev/rte_ethdev.h      | 95 ++++++++++++++++++++++++++++++++++++
 lib/ethdev/rte_ethdev_core.h |  5 ++
 lib/ethdev/version.map       |  4 ++
 6 files changed, 151 insertions(+)

diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
index 47a55a419e..14f52907c1 100644
--- a/lib/ethdev/ethdev_driver.h
+++ b/lib/ethdev/ethdev_driver.h
@@ -58,6 +58,8 @@ struct rte_eth_dev {
 	eth_rx_descriptor_status_t rx_descriptor_status;
 	/** Check the status of a Tx descriptor */
 	eth_tx_descriptor_status_t tx_descriptor_status;
+	/**  Use Tx mbufs for Rx to rearm */
+	eth_rx_direct_rearm_t rx_direct_rearm;
 
 	/**
 	 * Device data that is shared between primary and secondary processes
@@ -486,6 +488,11 @@ typedef int (*eth_rx_enable_intr_t)(struct rte_eth_dev *dev,
 typedef int (*eth_rx_disable_intr_t)(struct rte_eth_dev *dev,
 				    uint16_t rx_queue_id);
 
+/**< @internal Get Tx information of a transmit queue of an Ethernet device. */
+typedef void (*eth_txq_data_get_t)(struct rte_eth_dev *dev,
+				      uint16_t tx_queue_id,
+				      struct rte_eth_txq_data *txq_data);
+
 /** @internal Release memory resources allocated by given Rx/Tx queue. */
 typedef void (*eth_queue_release_t)(struct rte_eth_dev *dev,
 				    uint16_t queue_id);
@@ -1138,6 +1145,8 @@ struct eth_dev_ops {
 	eth_rxq_info_get_t         rxq_info_get;
 	/** Retrieve Tx queue information */
 	eth_txq_info_get_t         txq_info_get;
+	/** Get the address where Tx data is stored */
+	eth_txq_data_get_t         txq_data_get;
 	eth_burst_mode_get_t       rx_burst_mode_get; /**< Get Rx burst mode */
 	eth_burst_mode_get_t       tx_burst_mode_get; /**< Get Tx burst mode */
 	eth_fw_version_get_t       fw_version_get; /**< Get firmware version */
diff --git a/lib/ethdev/ethdev_private.c b/lib/ethdev/ethdev_private.c
index 48090c879a..bfe16c7d77 100644
--- a/lib/ethdev/ethdev_private.c
+++ b/lib/ethdev/ethdev_private.c
@@ -276,6 +276,7 @@ eth_dev_fp_ops_setup(struct rte_eth_fp_ops *fpo,
 	fpo->rx_queue_count = dev->rx_queue_count;
 	fpo->rx_descriptor_status = dev->rx_descriptor_status;
 	fpo->tx_descriptor_status = dev->tx_descriptor_status;
+	fpo->rx_direct_rearm = dev->rx_direct_rearm;
 
 	fpo->rxq.data = dev->data->rx_queues;
 	fpo->rxq.clbk = (void **)(uintptr_t)dev->post_rx_burst_cbs;
diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
index 0c2c1088c0..0dccec2e4b 100644
--- a/lib/ethdev/rte_ethdev.c
+++ b/lib/ethdev/rte_ethdev.c
@@ -1648,6 +1648,43 @@ rte_eth_dev_is_removed(uint16_t port_id)
 	return ret;
 }
 
+int
+rte_eth_tx_queue_data_get(uint16_t port_id, uint16_t queue_id,
+			struct rte_eth_txq_data *txq_data)
+{
+	struct rte_eth_dev *dev;
+
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
+	dev = &rte_eth_devices[port_id];
+
+	if (queue_id >= dev->data->nb_tx_queues) {
+		RTE_ETHDEV_LOG(ERR, "Invalid Tx queue_id=%u\n", queue_id);
+		return -EINVAL;
+	}
+
+	if (txq_data == NULL) {
+		RTE_ETHDEV_LOG(ERR, "Cannot get ethdev port %u Tx queue %u data to NULL\n",
+			port_id, queue_id);
+		return -EINVAL;
+	}
+
+	if (dev->data->tx_queues == NULL ||
+			dev->data->tx_queues[queue_id] == NULL) {
+		RTE_ETHDEV_LOG(ERR,
+			   "Tx queue %"PRIu16" of device with port_id=%"
+			   PRIu16" has not been setup\n",
+			   queue_id, port_id);
+		return -EINVAL;
+	}
+
+	if (*dev->dev_ops->txq_data_get == NULL)
+		return -ENOTSUP;
+
+	dev->dev_ops->txq_data_get(dev, queue_id, txq_data);
+
+	return 0;
+}
+
 static int
 rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
 			     uint16_t n_seg, uint32_t *mbp_buf_size,
diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
index 2e783536c1..daf7f05d62 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -1949,6 +1949,23 @@ struct rte_eth_txq_info {
 	uint8_t queue_state;        /**< one of RTE_ETH_QUEUE_STATE_*. */
 } __rte_cache_min_aligned;
 
+/**
+ * @internal
+ * Structure used to hold pointers to internal ethdev Tx data.
+ * The main purpose is to load and store Tx queue data in direct rearm mode.
+ */
+struct rte_eth_txq_data {
+	uint64_t *offloads;
+	void *tx_sw_ring;
+	volatile void *tx_ring;
+	uint16_t *tx_next_dd;
+	uint16_t *nb_tx_free;
+	uint16_t nb_tx_desc;
+	uint16_t tx_rs_thresh;
+	uint16_t tx_free_thresh;
+} __rte_cache_min_aligned;
+
+
 /* Generic Burst mode flag definition, values can be ORed. */
 
 /**
@@ -4718,6 +4735,27 @@ int rte_eth_remove_rx_callback(uint16_t port_id, uint16_t queue_id,
 int rte_eth_remove_tx_callback(uint16_t port_id, uint16_t queue_id,
 		const struct rte_eth_rxtx_callback *user_cb);
 
+/**
+ * Get the address which Tx data is stored.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param queue_id
+ *   The Tx queue on the Ethernet device for which information
+ *   will be retrieved.
+ * @param txq_data
+ *   A pointer to a structure of type *rte_eth_txq_data* to be filled.
+ *
+ * @return
+ *   - 0: Success
+ *   - -ENODEV:  If *port_id* is invalid.
+ *   - -ENOTSUP: routine is not supported by the device PMD.
+ *   - -EINVAL:  The queue_id is out of range.
+ */
+__rte_experimental
+int rte_eth_tx_queue_data_get(uint16_t port_id, uint16_t queue_id,
+		struct rte_eth_txq_data *txq_data);
+
 /**
  * Retrieve information about given port's Rx queue.
  *
@@ -6209,6 +6247,63 @@ rte_eth_tx_buffer(uint16_t port_id, uint16_t queue_id,
 	return rte_eth_tx_buffer_flush(port_id, queue_id, buffer);
 }
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Put Tx buffers into Rx sw-ring and rearm descs.
+ *
+ * @param port_id
+ *   Port identifying the receive side.
+ * @param queue_id
+ *   The index of the transmit queue identifying the receive side.
+ *   The value must be in the range [0, nb_rx_queue - 1] previously supplied
+ *   to rte_eth_dev_configure().
+ * @param txq_data
+ *   A pointer to a structure of type *rte_eth_txq_data* to be filled.
+ * @return
+ *   The number of direct-rearmed buffers.
+ */
+__rte_experimental
+static __rte_always_inline uint16_t
+rte_eth_rx_direct_rearm(uint16_t port_id, uint16_t queue_id,
+		struct rte_eth_txq_data *txq_data)
+{
+	uint16_t nb_rearm;
+	struct rte_eth_fp_ops *p;
+	void *qd;
+
+#ifdef RTE_ETHDEV_DEBUG_RX
+	if (port_id >= RTE_MAX_ETHPORTS ||
+			queue_id >= RTE_MAX_QUEUES_PER_PORT) {
+		RTE_ETHDEV_LOG(ERR,
+			"Invalid port_id=%u or queue_id=%u\n",
+			port_id, queue_id);
+		return 0;
+	}
+#endif
+
+	p = &rte_eth_fp_ops[port_id];
+	qd = p->rxq.data[queue_id];
+
+#ifdef RTE_ETHDEV_DEBUG_RX
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, 0);
+
+	if (qd == NULL) {
+		RTE_ETHDEV_LOG(ERR, "Invalid Rx queue_id=%u for port_id=%u\n",
+			queue_id, port_id);
+		return 0;
+	}
+
+	if (!p->rx_direct_rearm)
+		return -ENOTSUP;
+#endif
+
+	nb_rearm = p->rx_direct_rearm(qd, txq_data);
+
+	return nb_rearm;
+}
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/ethdev/rte_ethdev_core.h b/lib/ethdev/rte_ethdev_core.h
index dcf8adab92..6a328e2481 100644
--- a/lib/ethdev/rte_ethdev_core.h
+++ b/lib/ethdev/rte_ethdev_core.h
@@ -56,6 +56,9 @@ typedef int (*eth_rx_descriptor_status_t)(void *rxq, uint16_t offset);
 /** @internal Check the status of a Tx descriptor */
 typedef int (*eth_tx_descriptor_status_t)(void *txq, uint16_t offset);
 
+/** @internal Put Tx buffers into Rx sw-ring and rearm descs */
+typedef uint16_t (*eth_rx_direct_rearm_t)(void *rxq, struct rte_eth_txq_data *txq_data);
+
 /**
  * @internal
  * Structure used to hold opaque pointers to internal ethdev Rx/Tx
@@ -90,6 +93,8 @@ struct rte_eth_fp_ops {
 	eth_rx_queue_count_t rx_queue_count;
 	/** Check the status of a Rx descriptor. */
 	eth_rx_descriptor_status_t rx_descriptor_status;
+	/** Put Tx buffers into Rx sw-ring and rearm descs */
+	eth_rx_direct_rearm_t rx_direct_rearm;
 	/** Rx queues data. */
 	struct rte_ethdev_qdata rxq;
 	uintptr_t reserved1[3];
diff --git a/lib/ethdev/version.map b/lib/ethdev/version.map
index 03f52fee91..69e4c376d3 100644
--- a/lib/ethdev/version.map
+++ b/lib/ethdev/version.map
@@ -285,6 +285,10 @@ EXPERIMENTAL {
 	rte_mtr_color_in_protocol_priority_get;
 	rte_mtr_color_in_protocol_set;
 	rte_mtr_meter_vlan_table_update;
+
+	# added in 22.11
+	rte_eth_tx_queue_data_get;
+	rte_eth_rx_direct_rearm;
 };
 
 INTERNAL {
-- 
2.25.1



* [PATCH v2 2/3] net/i40e: enable direct rearm mode
  2022-09-27  2:47 [PATCH v2 0/3] Direct re-arming of buffers on receive side Feifei Wang
  2022-09-27  2:47 ` [PATCH v2 1/3] ethdev: add API for direct rearm mode Feifei Wang
@ 2022-09-27  2:47 ` Feifei Wang
  2022-09-27  2:47 ` [PATCH v2 3/3] examples/l3fwd: " Feifei Wang
  2022-09-29  6:19 ` Re: [PATCH v2 0/3] Direct re-arming of buffers on receive side Feifei Wang
  3 siblings, 0 replies; 17+ messages in thread
From: Feifei Wang @ 2022-09-27  2:47 UTC (permalink / raw)
  To: Yuying Zhang, Beilei Xing, Ruifeng Wang
  Cc: dev, nd, Feifei Wang, Honnappa Nagarahalli

Enable direct re-arm mode for the i40e driver. This patch supports the
case where the mapped Rx and Tx queues are handled by the same single
lcore.

Suggested-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
---
 drivers/net/i40e/i40e_ethdev.c        |  1 +
 drivers/net/i40e/i40e_ethdev.h        |  2 +
 drivers/net/i40e/i40e_rxtx.c          | 19 ++++++
 drivers/net/i40e/i40e_rxtx.h          |  2 +
 drivers/net/i40e/i40e_rxtx_vec_neon.c | 93 +++++++++++++++++++++++++++
 5 files changed, 117 insertions(+)

diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
index 65e689df32..649ec06f31 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -497,6 +497,7 @@ static const struct eth_dev_ops i40e_eth_dev_ops = {
 	.flow_ops_get                 = i40e_dev_flow_ops_get,
 	.rxq_info_get                 = i40e_rxq_info_get,
 	.txq_info_get                 = i40e_txq_info_get,
+	.txq_data_get                 = i40e_txq_data_get,
 	.rx_burst_mode_get            = i40e_rx_burst_mode_get,
 	.tx_burst_mode_get            = i40e_tx_burst_mode_get,
 	.timesync_enable              = i40e_timesync_enable,
diff --git a/drivers/net/i40e/i40e_ethdev.h b/drivers/net/i40e/i40e_ethdev.h
index fe943a45ff..ee13730917 100644
--- a/drivers/net/i40e/i40e_ethdev.h
+++ b/drivers/net/i40e/i40e_ethdev.h
@@ -1352,6 +1352,8 @@ void i40e_rxq_info_get(struct rte_eth_dev *dev, uint16_t queue_id,
 	struct rte_eth_rxq_info *qinfo);
 void i40e_txq_info_get(struct rte_eth_dev *dev, uint16_t queue_id,
 	struct rte_eth_txq_info *qinfo);
+void i40e_txq_data_get(struct rte_eth_dev *dev, uint16_t queue_id,
+	struct rte_eth_txq_data *txq_data);
 int i40e_rx_burst_mode_get(struct rte_eth_dev *dev, uint16_t queue_id,
 			   struct rte_eth_burst_mode *mode);
 int i40e_tx_burst_mode_get(struct rte_eth_dev *dev, uint16_t queue_id,
diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
index 788ffb51c2..be92ccd38a 100644
--- a/drivers/net/i40e/i40e_rxtx.c
+++ b/drivers/net/i40e/i40e_rxtx.c
@@ -3197,6 +3197,24 @@ i40e_txq_info_get(struct rte_eth_dev *dev, uint16_t queue_id,
 	qinfo->conf.offloads = txq->offloads;
 }
 
+void
+i40e_txq_data_get(struct rte_eth_dev *dev, uint16_t queue_id,
+			struct rte_eth_txq_data *txq_data)
+{
+	struct i40e_tx_queue *txq;
+
+	txq = dev->data->tx_queues[queue_id];
+
+	txq_data->offloads = &txq->offloads;
+	txq_data->tx_sw_ring = txq->sw_ring;
+	txq_data->tx_ring = txq->tx_ring;
+	txq_data->tx_next_dd = &txq->tx_next_dd;
+	txq_data->nb_tx_free = &txq->nb_tx_free;
+	txq_data->nb_tx_desc = txq->nb_tx_desc;
+	txq_data->tx_rs_thresh = txq->tx_rs_thresh;
+	txq_data->tx_free_thresh = txq->tx_free_thresh;
+}
+
 #ifdef RTE_ARCH_X86
 static inline bool
 get_avx_supported(bool request_avx512)
@@ -3321,6 +3339,7 @@ i40e_set_rx_function(struct rte_eth_dev *dev)
 			PMD_INIT_LOG(DEBUG, "Using Vector Rx (port %d).",
 				     dev->data->port_id);
 			dev->rx_pkt_burst = i40e_recv_pkts_vec;
+			dev->rx_direct_rearm = i40e_direct_rearm_vec;
 		}
 #endif /* RTE_ARCH_X86 */
 	} else if (!dev->data->scattered_rx && ad->rx_bulk_alloc_allowed) {
diff --git a/drivers/net/i40e/i40e_rxtx.h b/drivers/net/i40e/i40e_rxtx.h
index 5e6eecc501..7a8fa2d1e8 100644
--- a/drivers/net/i40e/i40e_rxtx.h
+++ b/drivers/net/i40e/i40e_rxtx.h
@@ -216,6 +216,8 @@ uint16_t i40e_simple_prep_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 			       uint16_t nb_pkts);
 uint16_t i40e_prep_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		uint16_t nb_pkts);
+uint16_t i40e_direct_rearm_vec(void *rx_queue,
+			  struct rte_eth_txq_data *txq_data);
 int i40e_tx_queue_init(struct i40e_tx_queue *txq);
 int i40e_rx_queue_init(struct i40e_rx_queue *rxq);
 void i40e_free_tx_resources(struct i40e_tx_queue *txq);
diff --git a/drivers/net/i40e/i40e_rxtx_vec_neon.c b/drivers/net/i40e/i40e_rxtx_vec_neon.c
index 12e6f1cbcb..84e0159e08 100644
--- a/drivers/net/i40e/i40e_rxtx_vec_neon.c
+++ b/drivers/net/i40e/i40e_rxtx_vec_neon.c
@@ -762,3 +762,96 @@ i40e_rx_vec_dev_conf_condition_check(struct rte_eth_dev *dev)
 {
 	return i40e_rx_vec_dev_conf_condition_check_default(dev);
 }
+
+uint16_t
+i40e_direct_rearm_vec(void *rx_queue, struct rte_eth_txq_data *txq_data)
+{
+	struct i40e_rx_queue *rxq = rx_queue;
+	struct i40e_rx_entry *rxep;
+	struct i40e_tx_entry *txep;
+	volatile union i40e_rx_desc *rxdp;
+	volatile struct i40e_tx_desc *tx_ring;
+	struct rte_mbuf *m;
+	uint16_t rx_id;
+	uint64x2_t dma_addr;
+	uint64_t paddr;
+	uint16_t i, n;
+	uint16_t nb_rearm;
+
+	if (rxq->rxrearm_nb > txq_data->tx_rs_thresh &&
+			*txq_data->nb_tx_free < txq_data->tx_free_thresh) {
+		tx_ring = txq_data->tx_ring;
+		/* check DD bits on threshold descriptor */
+		if ((tx_ring[*txq_data->tx_next_dd].cmd_type_offset_bsz &
+				rte_cpu_to_le_64(I40E_TXD_QW1_DTYPE_MASK)) !=
+				rte_cpu_to_le_64(I40E_TX_DESC_DTYPE_DESC_DONE)) {
+			return 0;
+		}
+		n = txq_data->tx_rs_thresh;
+
+		/* first buffer to free from S/W ring is at index
+		 * tx_next_dd - (tx_rs_thresh-1)
+		 */
+		txep = txq_data->tx_sw_ring;
+		txep += *txq_data->tx_next_dd - (txq_data->tx_rs_thresh - 1);
+		rxep = &rxq->sw_ring[rxq->rxrearm_start];
+		rxdp = rxq->rx_ring + rxq->rxrearm_start;
+
+		if (*txq_data->offloads & RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE) {
+			/* directly put mbufs from Tx to Rx,
+			 * and initialize the mbufs in vector
+			 */
+			for (i = 0; i < n; i++, rxep++, txep++) {
+				rxep[0].mbuf = txep[0].mbuf;
+
+				/* Initialize rxdp descs */
+				m = txep[0].mbuf;
+				paddr = m->buf_iova + RTE_PKTMBUF_HEADROOM;
+				dma_addr = vdupq_n_u64(paddr);
+				/* flush desc with pa dma_addr */
+				vst1q_u64((uint64_t *)&rxdp++->read, dma_addr);
+			}
+		} else {
+			for (i = 0, nb_rearm = 0; i < n; i++) {
+				m = rte_pktmbuf_prefree_seg(txep[i].mbuf);
+				if (m != NULL) {
+					rxep[i].mbuf = m;
+
+					/* Initialize rxdp descs */
+					paddr = m->buf_iova + RTE_PKTMBUF_HEADROOM;
+					dma_addr = vdupq_n_u64(paddr);
+					/* flush desc with pa dma_addr */
+					vst1q_u64((uint64_t *)&rxdp++->read, dma_addr);
+					nb_rearm++;
+				}
+			}
+			n = nb_rearm;
+		}
+
+		/* update counters for Tx */
+		*txq_data->nb_tx_free = *txq_data->nb_tx_free + txq_data->tx_rs_thresh;
+		*txq_data->tx_next_dd = *txq_data->tx_next_dd + txq_data->tx_rs_thresh;
+		if (*txq_data->tx_next_dd >= txq_data->nb_tx_desc)
+			*txq_data->tx_next_dd = txq_data->tx_rs_thresh - 1;
+
+		/* Update the descriptor initializer index */
+		rxq->rxrearm_start += n;
+		rx_id = rxq->rxrearm_start - 1;
+
+		if (unlikely(rxq->rxrearm_start >= rxq->nb_rx_desc)) {
+			rxq->rxrearm_start = rxq->rxrearm_start - rxq->nb_rx_desc;
+			if (!rxq->rxrearm_start)
+				rx_id = rxq->nb_rx_desc - 1;
+			else
+				rx_id = rxq->rxrearm_start - 1;
+		}
+		rxq->rxrearm_nb -= n;
+
+		rte_io_wmb();
+		/* Update the tail pointer on the NIC */
+		I40E_PCI_REG_WRITE_RELAXED(rxq->qrx_tail, rx_id);
+		return n;
+	}
+
+	return 0;
+}
-- 
2.25.1



* [PATCH v2 3/3] examples/l3fwd: enable direct rearm mode
  2022-09-27  2:47 [PATCH v2 0/3] Direct re-arming of buffers on receive side Feifei Wang
  2022-09-27  2:47 ` [PATCH v2 1/3] ethdev: add API for direct rearm mode Feifei Wang
  2022-09-27  2:47 ` [PATCH v2 2/3] net/i40e: enable " Feifei Wang
@ 2022-09-27  2:47 ` Feifei Wang
  2022-09-30 11:56   ` Jerin Jacob
  2022-09-29  6:19 ` Re: [PATCH v2 0/3] Direct re-arming of buffers on receive side Feifei Wang
  3 siblings, 1 reply; 17+ messages in thread
From: Feifei Wang @ 2022-09-27  2:47 UTC (permalink / raw)
  Cc: dev, nd, Feifei Wang, Honnappa Nagarahalli, Ruifeng Wang

Enable direct rearm mode in l3fwd. Users can use the parameter
'--direct-rearm=(rx_portid,rx_queueid,tx_portid,tx_queueid)'
to enable direct rearm; an example invocation is shown below.
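
A hypothetical invocation (the binary path, EAL options and port/queue
numbers below are placeholders, not taken from this patch):

	./dpdk-l3fwd -l 1 -n 4 -- -p 0x1 -P --lookup=lpm \
		--config="(0,0,1)" --direct-rearm="(0,0,0,0)"

This makes Rx queue 0 of port 0 rearm its descriptors with buffers freed
from Tx queue 0 of port 0 (a 1:1 rxq:txq mapping).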

Suggested-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
---
 examples/l3fwd/l3fwd.h     | 12 +++++
 examples/l3fwd/l3fwd_lpm.c | 22 +++++++++
 examples/l3fwd/main.c      | 94 +++++++++++++++++++++++++++++++++++++-
 3 files changed, 127 insertions(+), 1 deletion(-)

diff --git a/examples/l3fwd/l3fwd.h b/examples/l3fwd/l3fwd.h
index 40b5f32a9e..db097e344c 100644
--- a/examples/l3fwd/l3fwd.h
+++ b/examples/l3fwd/l3fwd.h
@@ -57,6 +57,10 @@
 #endif
 #define HASH_ENTRY_NUMBER_DEFAULT	16
 
+/* MAX number of direct rearm mapping entry */
+#define MAX_DIRECT_REARM_ENTRY_NUMBER   16
+#define MAX_DIRECT_REARM_QUEUE_PER_PORT 8
+
 struct parm_cfg {
 	const char *rule_ipv4_name;
 	const char *rule_ipv6_name;
@@ -114,6 +118,14 @@ extern struct parm_cfg parm_config;
 
 extern struct acl_algorithms acl_alg[];
 
+/* Used in direct rearm mode */
+extern bool enabled_direct_rearm;
+extern uint8_t direct_rearm_entry_number;
+extern bool queue_enabled_direct_rearm[RTE_MAX_ETHPORTS][MAX_DIRECT_REARM_QUEUE_PER_PORT];
+extern uint16_t direct_rearm_map_tx_port[RTE_MAX_ETHPORTS][MAX_DIRECT_REARM_QUEUE_PER_PORT];
+extern uint16_t direct_rearm_map_tx_queue[RTE_MAX_ETHPORTS][MAX_DIRECT_REARM_QUEUE_PER_PORT];
+extern uint8_t direct_rearm_entry_idx[RTE_MAX_ETHPORTS][MAX_DIRECT_REARM_QUEUE_PER_PORT];
+
 /* Send burst of packets on an output interface */
 static inline int
 send_burst(struct lcore_conf *qconf, uint16_t n, uint16_t port)
diff --git a/examples/l3fwd/l3fwd_lpm.c b/examples/l3fwd/l3fwd_lpm.c
index 22d7f61a42..973fe70aae 100644
--- a/examples/l3fwd/l3fwd_lpm.c
+++ b/examples/l3fwd/l3fwd_lpm.c
@@ -150,6 +150,8 @@ lpm_main_loop(__rte_unused void *dummy)
 	int i, nb_rx;
 	uint16_t portid;
 	uint8_t queueid;
+	uint8_t idx;
+	struct rte_eth_txq_data txq_data[MAX_DIRECT_REARM_ENTRY_NUMBER];
 	struct lcore_conf *qconf;
 	const uint64_t drain_tsc = (rte_get_tsc_hz() + US_PER_S - 1) /
 		US_PER_S * BURST_TX_DRAIN_US;
@@ -175,6 +177,20 @@ lpm_main_loop(__rte_unused void *dummy)
 			lcore_id, portid, queueid);
 	}
 
+	if (enabled_direct_rearm) {
+		uint16_t tx_portid;
+		uint8_t tx_queueid;
+
+		for (i = 0; i < n_rx_q; i++) {
+			portid = qconf->rx_queue_list[i].port_id;
+			queueid = qconf->rx_queue_list[i].queue_id;
+			tx_portid = direct_rearm_map_tx_port[portid][queueid];
+			tx_queueid = direct_rearm_map_tx_queue[portid][queueid];
+			idx = direct_rearm_entry_idx[portid][queueid];
+			rte_eth_tx_queue_data_get(tx_portid, tx_queueid, &(txq_data[idx]));
+		}
+	}
+
 	cur_tsc = rte_rdtsc();
 	prev_tsc = cur_tsc;
 
@@ -205,6 +221,12 @@ lpm_main_loop(__rte_unused void *dummy)
 		for (i = 0; i < n_rx_q; ++i) {
 			portid = qconf->rx_queue_list[i].port_id;
 			queueid = qconf->rx_queue_list[i].queue_id;
+
+			if (queue_enabled_direct_rearm[portid][queueid]) {
+				idx = direct_rearm_entry_idx[portid][queueid];
+				rte_eth_rx_direct_rearm(portid, queueid, &(txq_data[idx]));
+			}
+
 			nb_rx = rte_eth_rx_burst(portid, queueid, pkts_burst,
 				MAX_PKT_BURST);
 			if (nb_rx == 0)
diff --git a/examples/l3fwd/main.c b/examples/l3fwd/main.c
index e090328fcc..2e9c5c0dc4 100644
--- a/examples/l3fwd/main.c
+++ b/examples/l3fwd/main.c
@@ -91,6 +91,20 @@ uint32_t enabled_port_mask;
 int ipv6; /**< ipv6 is false by default. */
 uint32_t hash_entry_number = HASH_ENTRY_NUMBER_DEFAULT;
 
+/* Used for direct rearm mode */
+bool enabled_direct_rearm;/**< Flag to enable direct rearm mode. */
+uint8_t direct_rearm_entry_number; /**< Number of entry for direct rearm map. */
+/**< Direct rearm config parameters. */
+uint16_t direct_rearm_config[MAX_DIRECT_REARM_ENTRY_NUMBER][4];
+/**< Enable direct rearm flag for Rx queue . */
+bool queue_enabled_direct_rearm[RTE_MAX_ETHPORTS][MAX_DIRECT_REARM_QUEUE_PER_PORT];
+/**< Matrix for Rx queue mapping Tx port in direct rearm mode. */
+uint16_t direct_rearm_map_tx_port[RTE_MAX_ETHPORTS][MAX_DIRECT_REARM_QUEUE_PER_PORT];
+/**< Matrix for Rx queue mapping Tx queue in direct rearm mode. */
+uint16_t direct_rearm_map_tx_queue[RTE_MAX_ETHPORTS][MAX_DIRECT_REARM_QUEUE_PER_PORT];
+/**< Matrix for Rx queue mapping entry idx in direct rearm mode. */
+uint8_t direct_rearm_entry_idx[RTE_MAX_ETHPORTS][MAX_DIRECT_REARM_QUEUE_PER_PORT];
+
 struct lcore_conf lcore_conf[RTE_MAX_LCORE];
 
 struct parm_cfg parm_config;
@@ -403,6 +417,7 @@ print_usage(const char *prgname)
 		" [--mode]"
 		" [--eventq-sched]"
 		" [--event-vector [--event-vector-size SIZE] [--event-vector-tmo NS]]"
+		" --direct-rearm (rx_port, rx_queue, tx_port, tx_queue)[,(rx_port, rx_queue, tx_port, tx_queue)]"
 		" [-E]"
 		" [-L]\n\n"
 
@@ -436,6 +451,7 @@ print_usage(const char *prgname)
 		"  --event-vector:  Enable event vectorization.\n"
 		"  --event-vector-size: Max vector size if event vectorization is enabled.\n"
 		"  --event-vector-tmo: Max timeout to form vector in nanoseconds if event vectorization is enabled\n"
+		"  --direct-rearm (rx_port, rx_queue, tx_port, tx_queue): Put Tx queue sw-ring buffers into Rx queue\n"
 		"  -E : Enable exact match, legacy flag please use --lookup=em instead\n"
 		"  -L : Enable longest prefix match, legacy flag please use --lookup=lpm instead\n"
 		"  --rule_ipv4=FILE: Specify the ipv4 rules entries file.\n"
@@ -670,6 +686,53 @@ parse_lookup(const char *optarg)
 	return 0;
 }
 
+static int
+parse_direct_rearm(const char *q_arg)
+{
+	char s[256];
+	const char *p, *p0 = q_arg;
+	char *end;
+	enum fieldnames {
+		FLD_RX_PORT = 0,
+		FLD_RX_QUEUE,
+		FLD_TX_PORT,
+		FLD_TX_QUEUE,
+		_NUM_FLD
+	};
+	unsigned long int_fld[_NUM_FLD];
+	char *str_fld[_NUM_FLD];
+	int i;
+	unsigned int size;
+
+	while ((p = strchr(p0, '(')) != NULL) {
+		++p;
+		p0 = strchr(p, ')');
+		if (p0 == NULL)
+			return -1;
+
+		size = p0 - p;
+		if (size >= sizeof(s))
+			return -1;
+
+		snprintf(s, sizeof(s), "%.*s", size, p);
+		if (rte_strsplit(s, sizeof(s), str_fld, _NUM_FLD, ',') != _NUM_FLD)
+			return -1;
+		for (i = 0; i < _NUM_FLD; i++) {
+			errno = 0;
+			int_fld[i] = strtoul(str_fld[i], &end, 0);
+			if (errno != 0 || end == str_fld[i] || int_fld[i] > 255)
+				return -1;
+		}
+
+		direct_rearm_config[direct_rearm_entry_number][0] = int_fld[FLD_RX_PORT];
+		direct_rearm_config[direct_rearm_entry_number][1] = int_fld[FLD_RX_QUEUE];
+		direct_rearm_config[direct_rearm_entry_number][2] = int_fld[FLD_TX_PORT];
+		direct_rearm_config[direct_rearm_entry_number][3] = int_fld[FLD_TX_QUEUE];
+		++direct_rearm_entry_number;
+	}
+	return 0;
+}
+
 #define MAX_JUMBO_PKT_LEN  9600
 
 static const char short_options[] =
@@ -696,6 +759,7 @@ static const char short_options[] =
 #define CMD_LINE_OPT_ENABLE_VECTOR "event-vector"
 #define CMD_LINE_OPT_VECTOR_SIZE "event-vector-size"
 #define CMD_LINE_OPT_VECTOR_TMO_NS "event-vector-tmo"
+#define CMD_LINE_OPT_DIRECT_REARM "direct-rearm"
 #define CMD_LINE_OPT_RULE_IPV4 "rule_ipv4"
 #define CMD_LINE_OPT_RULE_IPV6 "rule_ipv6"
 #define CMD_LINE_OPT_ALG "alg"
@@ -725,7 +789,8 @@ enum {
 	CMD_LINE_OPT_LOOKUP_NUM,
 	CMD_LINE_OPT_ENABLE_VECTOR_NUM,
 	CMD_LINE_OPT_VECTOR_SIZE_NUM,
-	CMD_LINE_OPT_VECTOR_TMO_NS_NUM
+	CMD_LINE_OPT_VECTOR_TMO_NS_NUM,
+	CMD_LINE_OPT_DIRECT_REARM_NUM
 };
 
 static const struct option lgopts[] = {
@@ -747,6 +812,7 @@ static const struct option lgopts[] = {
 	{CMD_LINE_OPT_ENABLE_VECTOR, 0, 0, CMD_LINE_OPT_ENABLE_VECTOR_NUM},
 	{CMD_LINE_OPT_VECTOR_SIZE, 1, 0, CMD_LINE_OPT_VECTOR_SIZE_NUM},
 	{CMD_LINE_OPT_VECTOR_TMO_NS, 1, 0, CMD_LINE_OPT_VECTOR_TMO_NS_NUM},
+	{CMD_LINE_OPT_DIRECT_REARM, 1, 0, CMD_LINE_OPT_DIRECT_REARM_NUM},
 	{CMD_LINE_OPT_RULE_IPV4,   1, 0, CMD_LINE_OPT_RULE_IPV4_NUM},
 	{CMD_LINE_OPT_RULE_IPV6,   1, 0, CMD_LINE_OPT_RULE_IPV6_NUM},
 	{CMD_LINE_OPT_ALG,   1, 0, CMD_LINE_OPT_ALG_NUM},
@@ -912,6 +978,15 @@ parse_args(int argc, char **argv)
 		case CMD_LINE_OPT_VECTOR_TMO_NS_NUM:
 			evt_rsrc->vector_tmo_ns = strtoull(optarg, NULL, 10);
 			break;
+		case CMD_LINE_OPT_DIRECT_REARM_NUM:
+			enabled_direct_rearm = 1;
+			ret = parse_direct_rearm(optarg);
+			if (ret) {
+				fprintf(stderr, "Invalid direct rearm map\n");
+				print_usage(prgname);
+				return -1;
+				}
+			break;
 		case CMD_LINE_OPT_RULE_IPV4_NUM:
 			l3fwd_set_rule_ipv4_name(optarg);
 			break;
@@ -1594,6 +1669,23 @@ main(int argc, char **argv)
 		}
 	}
 
+	if (enabled_direct_rearm) {
+		uint16_t rx_port, tx_port;
+		uint8_t rx_queue, tx_queue;
+		uint8_t m = 0;
+		while (m < direct_rearm_entry_number) {
+			rx_port = direct_rearm_config[m][0];
+			rx_queue = direct_rearm_config[m][1];
+			tx_port = direct_rearm_config[m][2];
+			tx_queue = direct_rearm_config[m][3];
+			queue_enabled_direct_rearm[rx_port][rx_queue] = 1;
+			direct_rearm_map_tx_port[rx_port][rx_queue] = tx_port;
+			direct_rearm_map_tx_queue[rx_port][rx_queue] = tx_queue;
+			direct_rearm_entry_idx[rx_port][rx_queue] = m;
+			m++;
+		}
+	}
+
 	check_all_ports_link_status(enabled_port_mask);
 
 	ret = 0;
-- 
2.25.1



* Re: [PATCH v2 0/3] Direct re-arming of buffers on receive side
  2022-09-27  2:47 [PATCH v2 0/3] Direct re-arming of buffers on receive side Feifei Wang
                   ` (2 preceding siblings ...)
  2022-09-27  2:47 ` [PATCH v2 3/3] examples/l3fwd: " Feifei Wang
@ 2022-09-29  6:19 ` Feifei Wang
  2022-09-29 10:29   ` Ferruh Yigit
  3 siblings, 1 reply; 17+ messages in thread
From: Feifei Wang @ 2022-09-29  6:19 UTC (permalink / raw)
  To: Konstantin Ananyev, Andrew Rybchenko, mb; +Cc: dev, nd, nd

> -----Original Message-----
> From: Feifei Wang <feifei.wang2@arm.com>
> Sent: Tuesday, September 27, 2022 10:48 AM
> Cc: dev@dpdk.org; nd <nd@arm.com>; Feifei Wang
> <Feifei.Wang2@arm.com>
> Subject: [PATCH v2 0/3] Direct re-arming of buffers on receive side
> 
> Currently, the transmit side frees the buffers into the lcore cache and the
> receive side allocates buffers from the lcore cache. The transmit side typically
> frees 32 buffers resulting in 32*8=256B of stores to lcore cache. The receive
> side allocates 32 buffers and stores them in the receive side software ring,
> resulting in 32*8=256B of stores and 256B of load from the lcore cache.
> 
> This patch proposes a mechanism to avoid freeing to/allocating from the
> lcore cache. i.e. the receive side will free the buffers from transmit side
> directly into it's software ring. This will avoid the 256B of loads and stores
> introduced by the lcore cache. It also frees up the cache lines used by the
> lcore cache.
> 
> However, this solution poses several constraints:
> 
> 1)The receive queue needs to know which transmit queue it should take the
> buffers from. The application logic decides which transmit port to use to send
> out the packets. In many use cases the NIC might have a single port ([1], [2],
> [3]), in which case a given transmit queue is always mapped to a single
> receive queue (1:1 Rx queue: Tx queue). This is easy to configure.
> 
> If the NIC has 2 ports (there are several references), then we will have
> 1:2 (RX queue: TX queue) mapping which is still easy to configure.
> However, if this is generalized to 'N' ports, the configuration can be long.
> More over the PMD would have to scan a list of transmit queues to pull the
> buffers from.
> 
> 2)The other factor that needs to be considered is 'run-to-completion' vs
> 'pipeline' models. In the run-to-completion model, the receive side and the
> transmit side are running on the same lcore serially. In the pipeline model.
> The receive side and transmit side might be running on different lcores in
> parallel. This requires locking. This is not supported at this point.
> 
> 3)Tx and Rx buffers must be from the same mempool. And we also must
> ensure Tx buffer free number is equal to Rx buffer free number:
> (txq->tx_rs_thresh == RTE_I40E_RXQ_REARM_THRESH) Thus, 'tx_next_dd'
> can be updated correctly in direct-rearm mode. This is due to tx_next_dd is a
> variable to compute tx sw-ring free location.
> Its value will be one more round than the position where next time free
> starts.
> 
> Current status in this patch:
> 1)Two APIs are added for users to enable direct-rearm mode:
>   In control plane, users can call 'rte_eth_txq_data_get' to get Tx sw_ring
>   pointer and its txq_info (This avoid Rx load Tx data directly);
> 
>   In data plane, users can  call 'rte_eth_rx_direct_rearm' to rearm Rx
>   buffers and free Tx buffers at the same time (Currently it supports 1:1
>   (rxq:txq) mapping:)
> -----------------------------------------------------------------------
>   control plane:
>   	rte_eth_txq_data_get(*txq_data);
>   data plane:
>   	loop {
>   		rte_eth_rx_direct_rearm(*txq_data){
>      			for (i = 0; i <= 32; i++) {
>        				rx.mbuf[i] = tx.mbuf[i];
>        				initialize descs[i];
>     			}
> 		}
> 		rte_eth_rx_burst;
> 		rte_eth_tx_burst;
>   	}
> -----------------------------------------------------------------------
> 2)The i40e driver is changed to do the direct re-arm of the receive
>   side.
> 3)L3fwd application is modified to enable direct rearm mode. Users can
>   enable direct-rearm and map queues by input parameters.
> 
> Testing status:
> 1.The testing results for L3fwd are as follows:
> -------------------------------------------------------------------
> enabled direct rearm
> -------------------------------------------------------------------
> Arm:
> N1SDP(neon path):
> without fast-free mode		with fast-free mode
> 	+15.09%				+4.2%
> 
> Ampere Altra(neon path):
> without fast-free mode		with fast-free mode
> 	+10.9%				+14.6%
> -------------------------------------------------------------------
> 
> 2.The testing results for VPP-L3fwd are as follows:
> -------------------------------------------------------------------
> Arm:
> N1SDP(neon path):
> with direct re-arm mode enabled
> 	+4.5%
> 
> Ampere Altra(neon path):
> with direct re-arm mode enabled
>         +6.5%
> -------------------------------------------------------------------
> 
> Reference:
> [1] https://store.nvidia.com/en-
> us/networking/store/product/MCX623105AN-
> CDAT/NVIDIAMCX623105ANCDATConnectX6DxENAdapterCard100GbECrypt
> oDisabled/
> [2] https://www.intel.com/content/www/us/en/products/sku/192561/intel-
> ethernet-network-adapter-e810cqda1/specifications.html
> [3] https://www.broadcom.com/products/ethernet-connectivity/network-
> adapters/100gb-nic-ocp/n1100g
> 
> V2:
> 1. Use data-plane API to enable direct-rearm (Konstantin, Honnappa) 2. Add
> 'txq_data_get' API to get txq info for Rx (Konstantin) 3. Use input parameter
> to enable direct rearm in l3fwd (Konstantin) 4. Add condition detection for
> direct rearm API (Morten, Andrew Rybchenko)
> 
PING

Hi, 

Would you please give some comments on this version?
Thanks very much.

Best Regards
Feifei


* Re: Re: [PATCH v2 0/3] Direct re-arming of buffers on receive side
  2022-09-29  6:19 ` Re: [PATCH v2 0/3] Direct re-arming of buffers on receive side Feifei Wang
@ 2022-09-29 10:29   ` Ferruh Yigit
  0 siblings, 0 replies; 17+ messages in thread
From: Ferruh Yigit @ 2022-09-29 10:29 UTC (permalink / raw)
  To: Feifei Wang, Konstantin Ananyev, Andrew Rybchenko, mb; +Cc: dev, nd

On 9/29/2022 7:19 AM, Feifei Wang wrote:
>> -----Original Message-----
>> From: Feifei Wang <feifei.wang2@arm.com>
>> Sent: Tuesday, September 27, 2022 10:48 AM
>> Cc: dev@dpdk.org; nd <nd@arm.com>; Feifei Wang
>> <Feifei.Wang2@arm.com>
>> Subject: [PATCH v2 0/3] Direct re-arming of buffers on receive side
>>
>> Currently, the transmit side frees the buffers into the lcore cache and the
>> receive side allocates buffers from the lcore cache. The transmit side typically
>> frees 32 buffers resulting in 32*8=256B of stores to lcore cache. The receive
>> side allocates 32 buffers and stores them in the receive side software ring,
>> resulting in 32*8=256B of stores and 256B of load from the lcore cache.
>>
>> This patch proposes a mechanism to avoid freeing to/allocating from the
>> lcore cache. i.e. the receive side will free the buffers from transmit side
>> directly into it's software ring. This will avoid the 256B of loads and stores
>> introduced by the lcore cache. It also frees up the cache lines used by the
>> lcore cache.
>>
>> However, this solution poses several constraints:
>>
>> 1)The receive queue needs to know which transmit queue it should take the
>> buffers from. The application logic decides which transmit port to use to send
>> out the packets. In many use cases the NIC might have a single port ([1], [2],
>> [3]), in which case a given transmit queue is always mapped to a single
>> receive queue (1:1 Rx queue: Tx queue). This is easy to configure.
>>
>> If the NIC has 2 ports (there are several references), then we will have
>> 1:2 (RX queue: TX queue) mapping which is still easy to configure.
>> However, if this is generalized to 'N' ports, the configuration can be long.
>> More over the PMD would have to scan a list of transmit queues to pull the
>> buffers from.
>>
>> 2)The other factor that needs to be considered is 'run-to-completion' vs
>> 'pipeline' models. In the run-to-completion model, the receive side and the
>> transmit side are running on the same lcore serially. In the pipeline model.
>> The receive side and transmit side might be running on different lcores in
>> parallel. This requires locking. This is not supported at this point.
>>
>> 3)Tx and Rx buffers must be from the same mempool. And we also must
>> ensure Tx buffer free number is equal to Rx buffer free number:
>> (txq->tx_rs_thresh == RTE_I40E_RXQ_REARM_THRESH) Thus, 'tx_next_dd'
>> can be updated correctly in direct-rearm mode. This is due to tx_next_dd is a
>> variable to compute tx sw-ring free location.
>> Its value will be one more round than the position where next time free
>> starts.
>>
>> Current status in this patch:
>> 1)Two APIs are added for users to enable direct-rearm mode:
>>    In control plane, users can call 'rte_eth_txq_data_get' to get Tx sw_ring
>>    pointer and its txq_info (This avoid Rx load Tx data directly);
>>
>>    In data plane, users can  call 'rte_eth_rx_direct_rearm' to rearm Rx
>>    buffers and free Tx buffers at the same time (Currently it supports 1:1
>>    (rxq:txq) mapping:)
>> -----------------------------------------------------------------------
>>    control plane:
>>    	rte_eth_txq_data_get(*txq_data);
>>    data plane:
>>    	loop {
>>    		rte_eth_rx_direct_rearm(*txq_data){
>>       			for (i = 0; i <= 32; i++) {
>>         				rx.mbuf[i] = tx.mbuf[i];
>>         				initialize descs[i];
>>      			}
>> 		}
>> 		rte_eth_rx_burst;
>> 		rte_eth_tx_burst;
>>    	}
>> -----------------------------------------------------------------------
>> 2)The i40e driver is changed to do the direct re-arm of the receive
>>    side.
>> 3)L3fwd application is modified to enable direct rearm mode. Users can
>>    enable direct-rearm and map queues by input parameters.
>>
>> Testing status:
>> 1.The testing results for L3fwd are as follows:
>> -------------------------------------------------------------------
>> enabled direct rearm
>> -------------------------------------------------------------------
>> Arm:
>> N1SDP(neon path):
>> without fast-free mode		with fast-free mode
>> 	+15.09%				+4.2%
>>
>> Ampere Altra(neon path):
>> without fast-free mode		with fast-free mode
>> 	+10.9%				+14.6%
>> -------------------------------------------------------------------
>>
>> 2.The testing results for VPP-L3fwd are as follows:
>> -------------------------------------------------------------------
>> Arm:
>> N1SDP(neon path):
>> with direct re-arm mode enabled
>> 	+4.5%
>>
>> Ampere Altra(neon path):
>> with direct re-arm mode enabled
>>          +6.5%
>> -------------------------------------------------------------------
>>
>> Reference:
>> [1] https://store.nvidia.com/en-
>> us/networking/store/product/MCX623105AN-
>> CDAT/NVIDIAMCX623105ANCDATConnectX6DxENAdapterCard100GbECrypt
>> oDisabled/
>> [2] https://www.intel.com/content/www/us/en/products/sku/192561/intel-
>> ethernet-network-adapter-e810cqda1/specifications.html
>> [3] https://www.broadcom.com/products/ethernet-connectivity/network-
>> adapters/100gb-nic-ocp/n1100g
>>
>> V2:
>> 1. Use data-plane API to enable direct-rearm (Konstantin, Honnappa) 2. Add
>> 'txq_data_get' API to get txq info for Rx (Konstantin) 3. Use input parameter
>> to enable direct rearm in l3fwd (Konstantin) 4. Add condition detection for
>> direct rearm API (Morten, Andrew Rybchenko)
>>
> PING
> 
> Hi,
> 
> Would you please give some comments for this version?
> Thanks very much.
> 

Hi Feifei,

Because of the restrictions this feature has, although it gives a
performance improvement in some specific tests, I believe it doesn't have
much practical usage, so I have concerns about the feature.



* Re: [PATCH v2 3/3] examples/l3fwd: enable direct rearm mode
  2022-09-27  2:47 ` [PATCH v2 3/3] examples/l3fwd: " Feifei Wang
@ 2022-09-30 11:56   ` Jerin Jacob
  2022-10-11  7:28     ` Re: " Feifei Wang
  0 siblings, 1 reply; 17+ messages in thread
From: Jerin Jacob @ 2022-09-30 11:56 UTC (permalink / raw)
  To: Feifei Wang; +Cc: dev, nd, Honnappa Nagarahalli, Ruifeng Wang

On Tue, Sep 27, 2022 at 8:18 AM Feifei Wang <feifei.wang2@arm.com> wrote:
>
> Enable direct rearm mode in l3fwd. Users can use parameters:
> '--direct-rearm=(rx_portid,rx_queueid,tx_portid,tx_queueid)'
> to enable direct rearm.
>
> Suggested-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> ---
>  examples/l3fwd/l3fwd.h     | 12 +++++
>  examples/l3fwd/l3fwd_lpm.c | 22 +++++++++
>  examples/l3fwd/main.c      | 94 +++++++++++++++++++++++++++++++++++++-
>  3 files changed, 127 insertions(+), 1 deletion(-)
>
> diff --git a/examples/l3fwd/l3fwd.h b/examples/l3fwd/l3fwd.h
> index 40b5f32a9e..db097e344c 100644
> --- a/examples/l3fwd/l3fwd.h
> +++ b/examples/l3fwd/l3fwd.h
> @@ -57,6 +57,10 @@
>  #endif
>  #define HASH_ENTRY_NUMBER_DEFAULT      16
>
> +/* MAX number of direct rearm mapping entry */
> +#define MAX_DIRECT_REARM_ENTRY_NUMBER   16
> +#define MAX_DIRECT_REARM_QUEUE_PER_PORT 8
> +
>  struct parm_cfg {
>         const char *rule_ipv4_name;
>         const char *rule_ipv6_name;
> @@ -114,6 +118,14 @@ extern struct parm_cfg parm_config;
>
>  extern struct acl_algorithms acl_alg[];
>
> +/* Used in direct rearm mode */
> +extern bool enabled_direct_rearm;
> +extern uint8_t direct_rearm_entry_number;
> +extern bool queue_enabled_direct_rearm[RTE_MAX_ETHPORTS][MAX_DIRECT_REARM_QUEUE_PER_PORT];
> +extern uint16_t direct_rearm_map_tx_port[RTE_MAX_ETHPORTS][MAX_DIRECT_REARM_QUEUE_PER_PORT];
> +extern uint16_t direct_rearm_map_tx_queue[RTE_MAX_ETHPORTS][MAX_DIRECT_REARM_QUEUE_PER_PORT];
> +extern uint8_t direct_rearm_entry_idx[RTE_MAX_ETHPORTS][MAX_DIRECT_REARM_QUEUE_PER_PORT];
> +
>  /* Send burst of packets on an output interface */
>  static inline int
>  send_burst(struct lcore_conf *qconf, uint16_t n, uint16_t port)
> diff --git a/examples/l3fwd/l3fwd_lpm.c b/examples/l3fwd/l3fwd_lpm.c
> index 22d7f61a42..973fe70aae 100644
> --- a/examples/l3fwd/l3fwd_lpm.c
> +++ b/examples/l3fwd/l3fwd_lpm.c
> @@ -150,6 +150,8 @@ lpm_main_loop(__rte_unused void *dummy)

> @@ -205,6 +221,12 @@ lpm_main_loop(__rte_unused void *dummy)
>                 for (i = 0; i < n_rx_q; ++i) {
>                         portid = qconf->rx_queue_list[i].port_id;
>                         queueid = qconf->rx_queue_list[i].queue_id;
> +
> +                       if (queue_enabled_direct_rearm[portid][queueid]) {

IMO, we should not change fastpath code, impacting performance on
other targets, for a feature which has a very constrained use case.
Also, the need to express [1] kind of defeats the purpose of having
the LPM table populated.

IMO, if we need to add a test case for this API, testpmd would be an
ideal place, with a dedicated fwd_engine just for this.

[1]
> +               "  --direct-rearm (rx_port, rx_queue, tx_port, tx_queue): Put Tx queue sw-ring buffers into Rx queue\n"


* Re: [PATCH v2 1/3] ethdev: add API for direct rearm mode
  2022-09-27  2:47 ` [PATCH v2 1/3] ethdev: add API for direct rearm mode Feifei Wang
@ 2022-10-03  8:58   ` Konstantin Ananyev
  2022-10-11  7:19     ` Re: " Feifei Wang
  0 siblings, 1 reply; 17+ messages in thread
From: Konstantin Ananyev @ 2022-10-03  8:58 UTC (permalink / raw)
  To: Feifei Wang, Thomas Monjalon, Ferruh Yigit, Andrew Rybchenko,
	Ray Kinsella
  Cc: dev, nd, Honnappa Nagarahalli, Ruifeng Wang

27/09/2022 03:47, Feifei Wang writes:
> Add API for enabling direct rearm mode and for mapping RX and TX
> queues. Currently, the API supports 1:1(txq : rxq) mapping.
> 
> Furthermore, to avoid Rx load Tx data directly, add API called
> 'rte_eth_txq_data_get' to get Tx sw_ring and its information.
> 
> Suggested-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> Suggested-by: Ruifeng Wang <ruifeng.wang@arm.com>
> Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> ---
>   lib/ethdev/ethdev_driver.h   |  9 ++++
>   lib/ethdev/ethdev_private.c  |  1 +
>   lib/ethdev/rte_ethdev.c      | 37 ++++++++++++++
>   lib/ethdev/rte_ethdev.h      | 95 ++++++++++++++++++++++++++++++++++++
>   lib/ethdev/rte_ethdev_core.h |  5 ++
>   lib/ethdev/version.map       |  4 ++
>   6 files changed, 151 insertions(+)
> 
> diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
> index 47a55a419e..14f52907c1 100644
> --- a/lib/ethdev/ethdev_driver.h
> +++ b/lib/ethdev/ethdev_driver.h
> @@ -58,6 +58,8 @@ struct rte_eth_dev {
>   	eth_rx_descriptor_status_t rx_descriptor_status;
>   	/** Check the status of a Tx descriptor */
>   	eth_tx_descriptor_status_t tx_descriptor_status;
> +	/**  Use Tx mbufs for Rx to rearm */
> +	eth_rx_direct_rearm_t rx_direct_rearm;
>   
>   	/**
>   	 * Device data that is shared between primary and secondary processes
> @@ -486,6 +488,11 @@ typedef int (*eth_rx_enable_intr_t)(struct rte_eth_dev *dev,
>   typedef int (*eth_rx_disable_intr_t)(struct rte_eth_dev *dev,
>   				    uint16_t rx_queue_id);
>   
> +/**< @internal Get Tx information of a transmit queue of an Ethernet device. */
> +typedef void (*eth_txq_data_get_t)(struct rte_eth_dev *dev,
> +				      uint16_t tx_queue_id,
> +				      struct rte_eth_txq_data *txq_data);
> +
>   /** @internal Release memory resources allocated by given Rx/Tx queue. */
>   typedef void (*eth_queue_release_t)(struct rte_eth_dev *dev,
>   				    uint16_t queue_id);
> @@ -1138,6 +1145,8 @@ struct eth_dev_ops {
>   	eth_rxq_info_get_t         rxq_info_get;
>   	/** Retrieve Tx queue information */
>   	eth_txq_info_get_t         txq_info_get;
> +	/** Get the address where Tx data is stored */
> +	eth_txq_data_get_t         txq_data_get;
>   	eth_burst_mode_get_t       rx_burst_mode_get; /**< Get Rx burst mode */
>   	eth_burst_mode_get_t       tx_burst_mode_get; /**< Get Tx burst mode */
>   	eth_fw_version_get_t       fw_version_get; /**< Get firmware version */
> diff --git a/lib/ethdev/ethdev_private.c b/lib/ethdev/ethdev_private.c
> index 48090c879a..bfe16c7d77 100644
> --- a/lib/ethdev/ethdev_private.c
> +++ b/lib/ethdev/ethdev_private.c
> @@ -276,6 +276,7 @@ eth_dev_fp_ops_setup(struct rte_eth_fp_ops *fpo,
>   	fpo->rx_queue_count = dev->rx_queue_count;
>   	fpo->rx_descriptor_status = dev->rx_descriptor_status;
>   	fpo->tx_descriptor_status = dev->tx_descriptor_status;
> +	fpo->rx_direct_rearm = dev->rx_direct_rearm;
>   
>   	fpo->rxq.data = dev->data->rx_queues;
>   	fpo->rxq.clbk = (void **)(uintptr_t)dev->post_rx_burst_cbs;
> diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
> index 0c2c1088c0..0dccec2e4b 100644
> --- a/lib/ethdev/rte_ethdev.c
> +++ b/lib/ethdev/rte_ethdev.c
> @@ -1648,6 +1648,43 @@ rte_eth_dev_is_removed(uint16_t port_id)
>   	return ret;
>   }
>   
> +int
> +rte_eth_tx_queue_data_get(uint16_t port_id, uint16_t queue_id,
> +			struct rte_eth_txq_data *txq_data)
> +{
> +	struct rte_eth_dev *dev;
> +
> +	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
> +	dev = &rte_eth_devices[port_id];
> +
> +	if (queue_id >= dev->data->nb_tx_queues) {
> +		RTE_ETHDEV_LOG(ERR, "Invalid Tx queue_id=%u\n", queue_id);
> +		return -EINVAL;
> +	}
> +
> +	if (txq_data == NULL) {
> +		RTE_ETHDEV_LOG(ERR, "Cannot get ethdev port %u Tx queue %u data to NULL\n",
> +			port_id, queue_id);
> +		return -EINVAL;
> +	}
> +
> +	if (dev->data->tx_queues == NULL ||
> +			dev->data->tx_queues[queue_id] == NULL) {
> +		RTE_ETHDEV_LOG(ERR,
> +			   "Tx queue %"PRIu16" of device with port_id=%"
> +			   PRIu16" has not been setup\n",
> +			   queue_id, port_id);
> +		return -EINVAL;
> +	}
> +
> +	if (*dev->dev_ops->txq_data_get == NULL)
> +		return -ENOTSUP;
> +
> +	dev->dev_ops->txq_data_get(dev, queue_id, txq_data);
> +
> +	return 0;
> +}
> +
>   static int
>   rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
>   			     uint16_t n_seg, uint32_t *mbp_buf_size,
> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> index 2e783536c1..daf7f05d62 100644
> --- a/lib/ethdev/rte_ethdev.h
> +++ b/lib/ethdev/rte_ethdev.h
> @@ -1949,6 +1949,23 @@ struct rte_eth_txq_info {
>   	uint8_t queue_state;        /**< one of RTE_ETH_QUEUE_STATE_*. */
>   } __rte_cache_min_aligned;
>   
> +/**
> + * @internal
> + * Structure used to hold pointers to internal ethdev Tx data.
> + * The main purpose is to load and store Tx queue data in direct rearm mode.
> + */
> +struct rte_eth_txq_data {
> +	uint64_t *offloads;
> +	void *tx_sw_ring;
> +	volatile void *tx_ring;
> +	uint16_t *tx_next_dd;
> +	uint16_t *nb_tx_free;
> +	uint16_t nb_tx_desc;
> +	uint16_t tx_rs_thresh;
> +	uint16_t tx_free_thresh;
> +} __rte_cache_min_aligned;
> +

First of all, it is not clear why this struct has to be in a public
header, and why it can't be in one of the ethdev 'private' headers.
Second, it looks like a snippet of the private txq fields of
some Intel (and alike) PMDs (i40e, ice, etc.).
How is it supposed to be universal and applicable
to any PMD that decides to implement this new API?


>   /* Generic Burst mode flag definition, values can be ORed. */
>   
>   /**
> @@ -4718,6 +4735,27 @@ int rte_eth_remove_rx_callback(uint16_t port_id, uint16_t queue_id,
>   int rte_eth_remove_tx_callback(uint16_t port_id, uint16_t queue_id,
>   		const struct rte_eth_rxtx_callback *user_cb);
>   
> +/**
> + * Get the address which Tx data is stored.
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param queue_id
> + *   The Tx queue on the Ethernet device for which information
> + *   will be retrieved.
> + * @param txq_data
> + *   A pointer to a structure of type *rte_eth_txq_data* to be filled.
> + *
> + * @return
> + *   - 0: Success
> + *   - -ENODEV:  If *port_id* is invalid.
> + *   - -ENOTSUP: routine is not supported by the device PMD.
> + *   - -EINVAL:  The queue_id is out of range.
> + */
> +__rte_experimental
> +int rte_eth_tx_queue_data_get(uint16_t port_id, uint16_t queue_id,
> +		struct rte_eth_txq_data *txq_data);
> +
>   /**
>    * Retrieve information about given port's Rx queue.
>    *
> @@ -6209,6 +6247,63 @@ rte_eth_tx_buffer(uint16_t port_id, uint16_t queue_id,
>   	return rte_eth_tx_buffer_flush(port_id, queue_id, buffer);
>   }
>   
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
> + *
> + * Put Tx buffers into Rx sw-ring and rearm descs.
> + *
> + * @param port_id
> + *   Port identifying the receive side.
> + * @param queue_id
> + *   The index of the transmit queue identifying the receive side.
> + *   The value must be in the range [0, nb_rx_queue - 1] previously supplied
> + *   to rte_eth_dev_configure().
> + * @param txq_data
> + *   A pointer to a structure of type *rte_eth_txq_data* to be filled.
> + * @return
> + *   The number of direct-rearmed buffers.
> + */
> +__rte_experimental
> +static __rte_always_inline uint16_t
> +rte_eth_rx_direct_rearm(uint16_t port_id, uint16_t queue_id,
> +		struct rte_eth_txq_data *txq_data)
> +{
> +	uint16_t nb_rearm;
> +	struct rte_eth_fp_ops *p;
> +	void *qd;
> +
> +#ifdef RTE_ETHDEV_DEBUG_RX
> +	if (port_id >= RTE_MAX_ETHPORTS ||
> +			queue_id >= RTE_MAX_QUEUES_PER_PORT) {
> +		RTE_ETHDEV_LOG(ERR,
> +			"Invalid port_id=%u or queue_id=%u\n",
> +			port_id, queue_id);
> +		return 0;
> +	}
> +#endif
> +
> +	p = &rte_eth_fp_ops[port_id];
> +	qd = p->rxq.data[queue_id];
> +
> +#ifdef RTE_ETHDEV_DEBUG_RX
> +	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, 0);
> +
> +	if (qd == NULL) {
> +		RTE_ETHDEV_LOG(ERR, "Invalid Rx queue_id=%u for port_id=%u\n",
> +			queue_id, port_id);
> +		return 0;
> +	}
> +
> +	if (!p->rx_direct_rearm)

This check should always be done (unconditionally):
it is not a mandatory function for the driver (it can safely skip
implementing it).

> +		return -ENOTSUP;

This function returns uint16_t, so why return signed error codes here?
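
For illustration, the fragment above could be reworked along these lines
(just a rough sketch of the two points, not a tested change): do the
callback check unconditionally and keep the unsigned return convention
by reporting 0 rearmed buffers on failure:

	p = &rte_eth_fp_ops[port_id];
	qd = p->rxq.data[queue_id];

	/* not a mandatory callback - check it unconditionally */
	if (p->rx_direct_rearm == NULL)
		return 0;

#ifdef RTE_ETHDEV_DEBUG_RX
	if (qd == NULL) {
		RTE_ETHDEV_LOG(ERR, "Invalid Rx queue_id=%u for port_id=%u\n",
			queue_id, port_id);
		return 0;
	}
#endif

	return p->rx_direct_rearm(qd, txq_data);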


> +#endif
> +
> +	nb_rearm = p->rx_direct_rearm(qd, txq_data);

So the rx_direct_rearm() function knows how to extract data from the TX queue?
As I understand it, that is possible only in one case:
rx_direct_rearm() has full knowledge of and access to the txq internals, etc.
That means that the rxq and txq have to belong to the same driver and device
type.

First of all, I still think it is not the best design choice.
If we are going ahead with introducing this feature, it had better be as generic
as possible.
Plus, it mixes the TX and RX code-paths together,
while it would be much better to keep them independent as they
are right now.

Another thing with such an approach - even for the same PMD,
both TXQ and RXQ can have different internal data formats and
behavior logic (depending on port/queue configuration).
So rx_direct_rearm() function selection has to be done
based on both the RXQ and TXQ config.
So instead of rte_eth_tx_queue_data_get(), you'll probably need:
eth_rx_direct_rearm_t rte_eth_get_rx_direct_rearm_func(rx_port,
rx_queue, tx_port, tx_queue);
Then, it will be the user's responsibility to store it somewhere
and call it periodically:

control_path:
	...
	rearm_func = rte_eth_get_rx_direct_rearm_func(rxport, rxqueue,
		 txport, txqueue);
data-path:
	while(...) {
		rearm_func(rxport, txport, rxqueue, txqueue);
		rte_eth_rx_burst(rxport, rxqueue, ....);
		rte_eth_tx_burst(txport, txqueue, ....);
	}
	

In that case there seems to be absolutely no point in introducing
struct rte_eth_txq_data: rx_direct_rearm() accesses the TXQ private data
directly anyway.

Another way - make rte_eth_txq_data totally opaque and allow the PMD to
store there some data that will help it distinguish the expected TXQ
format. That will allow the PMD to keep rx_direct_rearm() the same for
all supported TXQ formats (it will make the decision internally based
on the data stored in txq_data).
Though in that case you'll probably need one more dev-op to free
txq_data.
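
To make that second option more concrete, a rough sketch of what the opaque
variant could look like (all names below - the create/free dev-ops in
particular - are hypothetical and not part of this patch):

struct rte_eth_txq_data;	/* opaque to ethdev and applications, defined by the PMD */

/* PMD allocates and fills its private txq_data for the given Tx queue */
typedef int (*eth_txq_data_create_t)(struct rte_eth_dev *dev,
				     uint16_t tx_queue_id,
				     struct rte_eth_txq_data **txq_data);

/* PMD releases whatever it allocated above */
typedef void (*eth_txq_data_free_t)(struct rte_eth_dev *dev,
				    struct rte_eth_txq_data *txq_data);

/* unchanged for all TXQ formats: the PMD decides internally, based on
 * what it stored in txq_data, how to walk that particular TXQ layout */
typedef uint16_t (*eth_rx_direct_rearm_t)(void *rxq,
					  struct rte_eth_txq_data *txq_data);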


> +
> +	return nb_rearm;
> +}
> +
>   #ifdef __cplusplus
>   }
>   #endif
> diff --git a/lib/ethdev/rte_ethdev_core.h b/lib/ethdev/rte_ethdev_core.h
> index dcf8adab92..6a328e2481 100644
> --- a/lib/ethdev/rte_ethdev_core.h
> +++ b/lib/ethdev/rte_ethdev_core.h
> @@ -56,6 +56,9 @@ typedef int (*eth_rx_descriptor_status_t)(void *rxq, uint16_t offset);
>   /** @internal Check the status of a Tx descriptor */
>   typedef int (*eth_tx_descriptor_status_t)(void *txq, uint16_t offset);
>   
> +/** @internal Put Tx buffers into Rx sw-ring and rearm descs */
> +typedef uint16_t (*eth_rx_direct_rearm_t)(void *rxq, struct rte_eth_txq_data *txq_data);
> +
>   /**
>    * @internal
>    * Structure used to hold opaque pointers to internal ethdev Rx/Tx
> @@ -90,6 +93,8 @@ struct rte_eth_fp_ops {
>   	eth_rx_queue_count_t rx_queue_count;
>   	/** Check the status of a Rx descriptor. */
>   	eth_rx_descriptor_status_t rx_descriptor_status;
> +	/** Put Tx buffers into Rx sw-ring and rearm descs */
> +	eth_rx_direct_rearm_t rx_direct_rearm;
>   	/** Rx queues data. */
>   	struct rte_ethdev_qdata rxq;
>   	uintptr_t reserved1[3];
> diff --git a/lib/ethdev/version.map b/lib/ethdev/version.map
> index 03f52fee91..69e4c376d3 100644
> --- a/lib/ethdev/version.map
> +++ b/lib/ethdev/version.map
> @@ -285,6 +285,10 @@ EXPERIMENTAL {
>   	rte_mtr_color_in_protocol_priority_get;
>   	rte_mtr_color_in_protocol_set;
>   	rte_mtr_meter_vlan_table_update;
> +
> +	# added in 22.11
> +	rte_eth_tx_queue_data_get;
> +	rte_eth_rx_direct_rearm;
>   };
>   
>   INTERNAL {


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 1/3] ethdev: add API for direct rearm mode
  2022-10-03  8:58   ` Konstantin Ananyev
@ 2022-10-11  7:19     ` Feifei Wang
  2022-10-11 22:21       ` Konstantin Ananyev
  0 siblings, 1 reply; 17+ messages in thread
From: Feifei Wang @ 2022-10-11  7:19 UTC (permalink / raw)
  To: Konstantin Ananyev, thomas, Ferruh Yigit, Andrew Rybchenko, Ray Kinsella
  Cc: dev, nd, Honnappa Nagarahalli, Ruifeng Wang, nd


> -----Original Message-----
> From: Konstantin Ananyev <konstantin.v.ananyev@yandex.ru>
> Sent: Monday, October 3, 2022 4:58 PM
> To: Feifei Wang <Feifei.Wang2@arm.com>; thomas@monjalon.net;
> Ferruh Yigit <ferruh.yigit@xilinx.com>; Andrew Rybchenko
> <andrew.rybchenko@oktetlabs.ru>; Ray Kinsella <mdr@ashroe.eu>
> Cc: dev@dpdk.org; nd <nd@arm.com>; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>; Ruifeng Wang
> <Ruifeng.Wang@arm.com>
> Subject: Re: [PATCH v2 1/3] ethdev: add API for direct rearm mode
> 
> 27/09/2022 03:47, Feifei Wang writes:
> > Add API for enabling direct rearm mode and for mapping RX and TX
> > queues. Currently, the API supports 1:1(txq : rxq) mapping.
> >
> > Furthermore, to avoid Rx load Tx data directly, add API called
> > 'rte_eth_txq_data_get' to get Tx sw_ring and its information.
> >
> > Suggested-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > Suggested-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
> > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > ---
> >   lib/ethdev/ethdev_driver.h   |  9 ++++
> >   lib/ethdev/ethdev_private.c  |  1 +
> >   lib/ethdev/rte_ethdev.c      | 37 ++++++++++++++
> >   lib/ethdev/rte_ethdev.h      | 95
> ++++++++++++++++++++++++++++++++++++
> >   lib/ethdev/rte_ethdev_core.h |  5 ++
> >   lib/ethdev/version.map       |  4 ++
> >   6 files changed, 151 insertions(+)
> >
> > diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
> > index 47a55a419e..14f52907c1 100644
> > --- a/lib/ethdev/ethdev_driver.h
> > +++ b/lib/ethdev/ethdev_driver.h
> > @@ -58,6 +58,8 @@ struct rte_eth_dev {
> >   	eth_rx_descriptor_status_t rx_descriptor_status;
> >   	/** Check the status of a Tx descriptor */
> >   	eth_tx_descriptor_status_t tx_descriptor_status;
> > +	/**  Use Tx mbufs for Rx to rearm */
> > +	eth_rx_direct_rearm_t rx_direct_rearm;
> >
> >   	/**
> >   	 * Device data that is shared between primary and secondary
> > processes @@ -486,6 +488,11 @@ typedef int
> (*eth_rx_enable_intr_t)(struct rte_eth_dev *dev,
> >   typedef int (*eth_rx_disable_intr_t)(struct rte_eth_dev *dev,
> >   				    uint16_t rx_queue_id);
> >
> > +/**< @internal Get Tx information of a transmit queue of an Ethernet
> > +device. */ typedef void (*eth_txq_data_get_t)(struct rte_eth_dev *dev,
> > +				      uint16_t tx_queue_id,
> > +				      struct rte_eth_txq_data *txq_data);
> > +
> >   /** @internal Release memory resources allocated by given Rx/Tx queue.
> */
> >   typedef void (*eth_queue_release_t)(struct rte_eth_dev *dev,
> >   				    uint16_t queue_id);
> > @@ -1138,6 +1145,8 @@ struct eth_dev_ops {
> >   	eth_rxq_info_get_t         rxq_info_get;
> >   	/** Retrieve Tx queue information */
> >   	eth_txq_info_get_t         txq_info_get;
> > +	/** Get the address where Tx data is stored */
> > +	eth_txq_data_get_t         txq_data_get;
> >   	eth_burst_mode_get_t       rx_burst_mode_get; /**< Get Rx burst
> mode */
> >   	eth_burst_mode_get_t       tx_burst_mode_get; /**< Get Tx burst
> mode */
> >   	eth_fw_version_get_t       fw_version_get; /**< Get firmware
> version */
> > diff --git a/lib/ethdev/ethdev_private.c b/lib/ethdev/ethdev_private.c
> > index 48090c879a..bfe16c7d77 100644
> > --- a/lib/ethdev/ethdev_private.c
> > +++ b/lib/ethdev/ethdev_private.c
> > @@ -276,6 +276,7 @@ eth_dev_fp_ops_setup(struct rte_eth_fp_ops
> *fpo,
> >   	fpo->rx_queue_count = dev->rx_queue_count;
> >   	fpo->rx_descriptor_status = dev->rx_descriptor_status;
> >   	fpo->tx_descriptor_status = dev->tx_descriptor_status;
> > +	fpo->rx_direct_rearm = dev->rx_direct_rearm;
> >
> >   	fpo->rxq.data = dev->data->rx_queues;
> >   	fpo->rxq.clbk = (void **)(uintptr_t)dev->post_rx_burst_cbs;
> > diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c index
> > 0c2c1088c0..0dccec2e4b 100644
> > --- a/lib/ethdev/rte_ethdev.c
> > +++ b/lib/ethdev/rte_ethdev.c
> > @@ -1648,6 +1648,43 @@ rte_eth_dev_is_removed(uint16_t port_id)
> >   	return ret;
> >   }
> >
> > +int
> > +rte_eth_tx_queue_data_get(uint16_t port_id, uint16_t queue_id,
> > +			struct rte_eth_txq_data *txq_data) {
> > +	struct rte_eth_dev *dev;
> > +
> > +	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
> > +	dev = &rte_eth_devices[port_id];
> > +
> > +	if (queue_id >= dev->data->nb_tx_queues) {
> > +		RTE_ETHDEV_LOG(ERR, "Invalid Tx queue_id=%u\n",
> queue_id);
> > +		return -EINVAL;
> > +	}
> > +
> > +	if (txq_data == NULL) {
> > +		RTE_ETHDEV_LOG(ERR, "Cannot get ethdev port %u Tx
> queue %u data to NULL\n",
> > +			port_id, queue_id);
> > +		return -EINVAL;
> > +	}
> > +
> > +	if (dev->data->tx_queues == NULL ||
> > +			dev->data->tx_queues[queue_id] == NULL) {
> > +		RTE_ETHDEV_LOG(ERR,
> > +			   "Tx queue %"PRIu16" of device with port_id=%"
> > +			   PRIu16" has not been setup\n",
> > +			   queue_id, port_id);
> > +		return -EINVAL;
> > +	}
> > +
> > +	if (*dev->dev_ops->txq_data_get == NULL)
> > +		return -ENOTSUP;
> > +
> > +	dev->dev_ops->txq_data_get(dev, queue_id, txq_data);
> > +
> > +	return 0;
> > +}
> > +
> >   static int
> >   rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
> >   			     uint16_t n_seg, uint32_t *mbp_buf_size, diff --git
> > a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h index
> > 2e783536c1..daf7f05d62 100644
> > --- a/lib/ethdev/rte_ethdev.h
> > +++ b/lib/ethdev/rte_ethdev.h
> > @@ -1949,6 +1949,23 @@ struct rte_eth_txq_info {
> >   	uint8_t queue_state;        /**< one of RTE_ETH_QUEUE_STATE_*. */
> >   } __rte_cache_min_aligned;
> >
> > +/**
> > + * @internal
> > + * Structure used to hold pointers to internal ethdev Tx data.
> > + * The main purpose is to load and store Tx queue data in direct rearm
> mode.
> > + */
> > +struct rte_eth_txq_data {
> > +	uint64_t *offloads;
> > +	void *tx_sw_ring;
> > +	volatile void *tx_ring;
> > +	uint16_t *tx_next_dd;
> > +	uint16_t *nb_tx_free;
> > +	uint16_t nb_tx_desc;
> > +	uint16_t tx_rs_thresh;
> > +	uint16_t tx_free_thresh;
> > +} __rte_cache_min_aligned;
> > +
> 
> first of all it is not clear why this struct has to be in public header, why it can't
> be in on of ethdev 'private' headers.
> Second it looks like a snippet from private txq fields for some Intel (and alike)
> PMDs (i40e, ice, etc.).
> How it supposed to to be universal and be applicable for any PMD that
> decides to implement this new API?
> 
> 
> >   /* Generic Burst mode flag definition, values can be ORed. */
> >
> >   /**
> > @@ -4718,6 +4735,27 @@ int rte_eth_remove_rx_callback(uint16_t
> port_id, uint16_t queue_id,
> >   int rte_eth_remove_tx_callback(uint16_t port_id, uint16_t queue_id,
> >   		const struct rte_eth_rxtx_callback *user_cb);
> >
> > +/**
> > + * Get the address which Tx data is stored.
> > + *
> > + * @param port_id
> > + *   The port identifier of the Ethernet device.
> > + * @param queue_id
> > + *   The Tx queue on the Ethernet device for which information
> > + *   will be retrieved.
> > + * @param txq_data
> > + *   A pointer to a structure of type *rte_eth_txq_data* to be filled.
> > + *
> > + * @return
> > + *   - 0: Success
> > + *   - -ENODEV:  If *port_id* is invalid.
> > + *   - -ENOTSUP: routine is not supported by the device PMD.
> > + *   - -EINVAL:  The queue_id is out of range.
> > + */
> > +__rte_experimental
> > +int rte_eth_tx_queue_data_get(uint16_t port_id, uint16_t queue_id,
> > +		struct rte_eth_txq_data *txq_data);
> > +
> >   /**
> >    * Retrieve information about given port's Rx queue.
> >    *
> > @@ -6209,6 +6247,63 @@ rte_eth_tx_buffer(uint16_t port_id, uint16_t
> queue_id,
> >   	return rte_eth_tx_buffer_flush(port_id, queue_id, buffer);
> >   }
> >
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change, or be removed, without prior
> > +notice
> > + *
> > + * Put Tx buffers into Rx sw-ring and rearm descs.
> > + *
> > + * @param port_id
> > + *   Port identifying the receive side.
> > + * @param queue_id
> > + *   The index of the transmit queue identifying the receive side.
> > + *   The value must be in the range [0, nb_rx_queue - 1] previously
> supplied
> > + *   to rte_eth_dev_configure().
> > + * @param txq_data
> > + *   A pointer to a structure of type *rte_eth_txq_data* to be filled.
> > + * @return
> > + *   The number of direct-rearmed buffers.
> > + */
> > +__rte_experimental
> > +static __rte_always_inline uint16_t
> > +rte_eth_rx_direct_rearm(uint16_t port_id, uint16_t queue_id,
> > +		struct rte_eth_txq_data *txq_data)
> > +{
> > +	uint16_t nb_rearm;
> > +	struct rte_eth_fp_ops *p;
> > +	void *qd;
> > +
> > +#ifdef RTE_ETHDEV_DEBUG_RX
> > +	if (port_id >= RTE_MAX_ETHPORTS ||
> > +			queue_id >= RTE_MAX_QUEUES_PER_PORT) {
> > +		RTE_ETHDEV_LOG(ERR,
> > +			"Invalid port_id=%u or queue_id=%u\n",
> > +			port_id, queue_id);
> > +		return 0;
> > +	}
> > +#endif
> > +
> > +	p = &rte_eth_fp_ops[port_id];
> > +	qd = p->rxq.data[queue_id];
> > +
> > +#ifdef RTE_ETHDEV_DEBUG_RX
> > +	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, 0);
> > +
> > +	if (qd == NULL) {
> > +		RTE_ETHDEV_LOG(ERR, "Invalid Rx queue_id=%u for
> port_id=%u\n",
> > +			queue_id, port_id);
> > +		return 0;
> > +	}
> > +
> > +	if (!p->rx_direct_rearm)
> 
> This check should be done always (unconditionally).
> it is not a mandatory function for the driver (it can safely skip to implement it).
> 
> > +		return -ENOTSUP;
> 
> This function returns uint16_t, why signed integers here?
> 
> 
> > +#endif
> > +
> > +	nb_rearm = p->rx_direct_rearm(qd, txq_data);
> 
> So rx_direct_rearm() function knows how to extract data from TX queue?
> As I understand that is possible only in one case:
> rx_direct_rearm() has full knowledge and acess of txq internals, etc.
> That means that rxq and txq have to belong to the same driver and device
> type.
> 
Thanks for the comments; I have some questions about this.

> First of all, I still think it is not the best design choice.
> If we going ahead with introducing this feature, it better be as generic as
> possible.
> Plus it mixes TX and RX code-paths together, while it would be much better
> to to keep them independent as they are right now.
> 
> Another thing with such approach - even for the same PMD, both TXQ and
> RXQ can have different internal data format and behavior logic (depending
> on port/queue configuration).
1. Here, TXQ and RXQ having different internal formats means the queue types
and descriptors can be different, right? If I understand correctly, based on your
first strategy, does it mean we will need a different 'rearm_func' for each
queue type in the same PMD?

> So rx_direct_rearm() function selection have to be done based on both RXQ
> and TXQ config.
> So instead of rte_eth_tx_queue_data_get(), you'll probably need:
> eth_rx_direct_rearm_t rte_eth_get_rx_direct_rearm_func(rx_port,
> rx_queue, tx_port, tx_queue);
> Then, it will be user responsibility to store it somewhere and call periodically:
> 
> control_path:
> 	...
> 	rearm_func = rte_eth_get_rx_direct_rearm_func(rxport, rxqueue,
> 		 txport, txqueue);
> data-path:
> 	while(...) {
> 		rearm_func(rxport, txport, rxqueue, txqueue);
> 		rte_eth_rx_burst(rxport, rxqueue, ....);
> 		rte_eth_tx_burst(txport, txqueue, ....);
> 	}
> 
> 
> In that case there seems absolutely no point to introduce struct
> rte_eth_txq_data. rx_direct_rearm() accesses TXQ private data directly
> anyway.
2. This is a very good proposal and it will be our first choice. Before working on it,
I have a few questions about how to implement 'rearm_func'.
As you say above, mixing the Rx and Tx path code in 'rearm_func' means the hard-coded
parts are mixed, something like:
rearm_func(...) {
    ...
    txep = &txq->sw_ring[txq->tx_next_dd - (txq->tx_rs_thresh - 1)];
    for (...) {
        rxep[i].mbuf = txep[i].mbuf;
        mb0 = txep[i].mbuf;
        paddr = mb0->buf_iova + RTE_PKTMBUF_HEADROOM;
        dma_addr0 = vdupq_n_u64(paddr);
        vst1q_u64((uint64_t *)&rxdp++->read, dma_addr0);
    }
}
Is my understanding right?

> 
> Another way - make rte_eth_txq_data totally opaque and allow PMD to
> store there some data that will help it to distinguish expected TXQ format.
> That will allow PMD to keep rx_direct_rearm() the same for all supported
> TXQ formats (it will make decision internally based on data stored in
> txq_data).
> Though in that case you'll probably need one more dev-op to free txq_data.


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 3/3] examples/l3fwd: enable direct rearm mode
  2022-09-30 11:56   ` Jerin Jacob
@ 2022-10-11  7:28     ` Feifei Wang
  2022-10-11  9:38       ` Jerin Jacob
  0 siblings, 1 reply; 17+ messages in thread
From: Feifei Wang @ 2022-10-11  7:28 UTC (permalink / raw)
  To: Jerin Jacob; +Cc: dev, nd, Honnappa Nagarahalli, Ruifeng Wang, nd



> -----Original Message-----
> From: Jerin Jacob <jerinjacobk@gmail.com>
> Sent: Friday, September 30, 2022 7:56 PM
> To: Feifei Wang <Feifei.Wang2@arm.com>
> Cc: dev@dpdk.org; nd <nd@arm.com>; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>; Ruifeng Wang
> <Ruifeng.Wang@arm.com>
> Subject: Re: [PATCH v2 3/3] examples/l3fwd: enable direct rearm mode
> 
> On Tue, Sep 27, 2022 at 8:18 AM Feifei Wang <feifei.wang2@arm.com>
> wrote:
> >
> > Enable direct rearm mode in l3fwd. Users can use parameters:
> > '--direct-rearm=(rx_portid,rx_queueid,tx_portid,tx_queueid)'
> > to enable direct rearm.
> >
> > Suggested-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
> > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > ---
> >  examples/l3fwd/l3fwd.h     | 12 +++++
> >  examples/l3fwd/l3fwd_lpm.c | 22 +++++++++
> >  examples/l3fwd/main.c      | 94
> +++++++++++++++++++++++++++++++++++++-
> >  3 files changed, 127 insertions(+), 1 deletion(-)
> >
> > diff --git a/examples/l3fwd/l3fwd.h b/examples/l3fwd/l3fwd.h index
> > 40b5f32a9e..db097e344c 100644
> > --- a/examples/l3fwd/l3fwd.h
> > +++ b/examples/l3fwd/l3fwd.h
> > @@ -57,6 +57,10 @@
> >  #endif
> >  #define HASH_ENTRY_NUMBER_DEFAULT      16
> >
> > +/* MAX number of direct rearm mapping entry */
> > +#define MAX_DIRECT_REARM_ENTRY_NUMBER   16
> > +#define MAX_DIRECT_REARM_QUEUE_PER_PORT 8
> > +
> >  struct parm_cfg {
> >         const char *rule_ipv4_name;
> >         const char *rule_ipv6_name;
> > @@ -114,6 +118,14 @@ extern struct parm_cfg parm_config;
> >
> >  extern struct acl_algorithms acl_alg[];
> >
> > +/* Used in direct rearm mode */
> > +extern bool enabled_direct_rearm;
> > +extern uint8_t direct_rearm_entry_number; extern bool
> >
> +queue_enabled_direct_rearm[RTE_MAX_ETHPORTS][MAX_DIRECT_REAR
> M_QUEUE_P
> > +ER_PORT]; extern uint16_t
> >
> +direct_rearm_map_tx_port[RTE_MAX_ETHPORTS][MAX_DIRECT_REARM_
> QUEUE_PER
> > +_PORT]; extern uint16_t
> >
> +direct_rearm_map_tx_queue[RTE_MAX_ETHPORTS][MAX_DIRECT_REAR
> M_QUEUE_PE
> > +R_PORT]; extern uint8_t
> >
> +direct_rearm_entry_idx[RTE_MAX_ETHPORTS][MAX_DIRECT_REARM_QUE
> UE_PER_P
> > +ORT];
> > +
> >  /* Send burst of packets on an output interface */  static inline int
> > send_burst(struct lcore_conf *qconf, uint16_t n, uint16_t port) diff
> > --git a/examples/l3fwd/l3fwd_lpm.c b/examples/l3fwd/l3fwd_lpm.c index
> > 22d7f61a42..973fe70aae 100644
> > --- a/examples/l3fwd/l3fwd_lpm.c
> > +++ b/examples/l3fwd/l3fwd_lpm.c
> > @@ -150,6 +150,8 @@ lpm_main_loop(__rte_unused void *dummy)
> 
> > @@ -205,6 +221,12 @@ lpm_main_loop(__rte_unused void *dummy)
> >                 for (i = 0; i < n_rx_q; ++i) {
> >                         portid = qconf->rx_queue_list[i].port_id;
> >                         queueid = qconf->rx_queue_list[i].queue_id;
> > +
> > +                       if
> > + (queue_enabled_direct_rearm[portid][queueid]) {
> 
> IMO, We should not change fastpath code that impacts performance on
> other targets for a feature which has a very constraint use case. Also need
> for expressing [1], kind of defeat the purpose LPM table populated.
> 
> IMO, If we need to add a test case for this API, testpmd would be an ideal
> place with a dedicated fwd_engine just for this.
> 
> [1]
> > +               "  --direct-rearm (rx_port, rx_queue, tx_port, tx_queue): Put Tx
> queue sw-ring buffers into Rx queue\n"
Thanks very much for your comments. This patch is mainly meant to facilitate
verification testing and to show how to use the direct-rearm API.

After 'direct-rearm' matures, we will consider enabling direct-rearm in testpmd
or another application in DPDK.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 3/3] examples/l3fwd: enable direct rearm mode
  2022-10-11  7:28     ` Re: " Feifei Wang
@ 2022-10-11  9:38       ` Jerin Jacob
  0 siblings, 0 replies; 17+ messages in thread
From: Jerin Jacob @ 2022-10-11  9:38 UTC (permalink / raw)
  To: Feifei Wang; +Cc: dev, nd, Honnappa Nagarahalli, Ruifeng Wang

On Tue, Oct 11, 2022 at 12:58 PM Feifei Wang <Feifei.Wang2@arm.com> wrote:
>
>
>
> > -----Original Message-----
> > From: Jerin Jacob <jerinjacobk@gmail.com>
> > Sent: Friday, September 30, 2022 7:56 PM
> > To: Feifei Wang <Feifei.Wang2@arm.com>
> > Cc: dev@dpdk.org; nd <nd@arm.com>; Honnappa Nagarahalli
> > <Honnappa.Nagarahalli@arm.com>; Ruifeng Wang
> > <Ruifeng.Wang@arm.com>
> > Subject: Re: [PATCH v2 3/3] examples/l3fwd: enable direct rearm mode
> >
> > On Tue, Sep 27, 2022 at 8:18 AM Feifei Wang <feifei.wang2@arm.com>
> > wrote:
> > >
> > > Enable direct rearm mode in l3fwd. Users can use parameters:
> > > '--direct-rearm=(rx_portid,rx_queueid,tx_portid,tx_queueid)'
> > > to enable direct rearm.
> > >
> > > Suggested-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > > Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
> > > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > > Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > > ---
> > >  examples/l3fwd/l3fwd.h     | 12 +++++
> > >  examples/l3fwd/l3fwd_lpm.c | 22 +++++++++
> > >  examples/l3fwd/main.c      | 94
> > +++++++++++++++++++++++++++++++++++++-
> > >  3 files changed, 127 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/examples/l3fwd/l3fwd.h b/examples/l3fwd/l3fwd.h index
> > > 40b5f32a9e..db097e344c 100644
> > > --- a/examples/l3fwd/l3fwd.h
> > > +++ b/examples/l3fwd/l3fwd.h
> > > @@ -57,6 +57,10 @@
> > >  #endif
> > >  #define HASH_ENTRY_NUMBER_DEFAULT      16
> > >
> > > +/* MAX number of direct rearm mapping entry */
> > > +#define MAX_DIRECT_REARM_ENTRY_NUMBER   16
> > > +#define MAX_DIRECT_REARM_QUEUE_PER_PORT 8
> > > +
> > >  struct parm_cfg {
> > >         const char *rule_ipv4_name;
> > >         const char *rule_ipv6_name;
> > > @@ -114,6 +118,14 @@ extern struct parm_cfg parm_config;
> > >
> > >  extern struct acl_algorithms acl_alg[];
> > >
> > > +/* Used in direct rearm mode */
> > > +extern bool enabled_direct_rearm;
> > > +extern uint8_t direct_rearm_entry_number; extern bool
> > >
> > +queue_enabled_direct_rearm[RTE_MAX_ETHPORTS][MAX_DIRECT_REAR
> > M_QUEUE_P
> > > +ER_PORT]; extern uint16_t
> > >
> > +direct_rearm_map_tx_port[RTE_MAX_ETHPORTS][MAX_DIRECT_REARM_
> > QUEUE_PER
> > > +_PORT]; extern uint16_t
> > >
> > +direct_rearm_map_tx_queue[RTE_MAX_ETHPORTS][MAX_DIRECT_REAR
> > M_QUEUE_PE
> > > +R_PORT]; extern uint8_t
> > >
> > +direct_rearm_entry_idx[RTE_MAX_ETHPORTS][MAX_DIRECT_REARM_QUE
> > UE_PER_P
> > > +ORT];
> > > +
> > >  /* Send burst of packets on an output interface */  static inline int
> > > send_burst(struct lcore_conf *qconf, uint16_t n, uint16_t port) diff
> > > --git a/examples/l3fwd/l3fwd_lpm.c b/examples/l3fwd/l3fwd_lpm.c index
> > > 22d7f61a42..973fe70aae 100644
> > > --- a/examples/l3fwd/l3fwd_lpm.c
> > > +++ b/examples/l3fwd/l3fwd_lpm.c
> > > @@ -150,6 +150,8 @@ lpm_main_loop(__rte_unused void *dummy)
> >
> > > @@ -205,6 +221,12 @@ lpm_main_loop(__rte_unused void *dummy)
> > >                 for (i = 0; i < n_rx_q; ++i) {
> > >                         portid = qconf->rx_queue_list[i].port_id;
> > >                         queueid = qconf->rx_queue_list[i].queue_id;
> > > +
> > > +                       if
> > > + (queue_enabled_direct_rearm[portid][queueid]) {
> >
> > IMO, We should not change fastpath code that impacts performance on
> > other targets for a feature which has a very constraint use case. Also need
> > for expressing [1], kind of defeat the purpose LPM table populated.
> >
> > IMO, If we need to add a test case for this API, testpmd would be an ideal
> > place with a dedicated fwd_engine just for this.
> >
> > [1]
> > > +               "  --direct-rearm (rx_port, rx_queue, tx_port, tx_queue): Put Tx
> > queue sw-ring buffers into Rx queue\n"
> Thanks very much for your comments. This patch here is mainly to facilitate verification test
> and show how to use direct-rearm API.

IMO, it should be the other way around: for PMD and ethdev API
verification, please use testpmd.

>
> After 'direct-rearm' matures, we will consider to enable direct-rearm in testpmd or other
> application in dpdk.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Re: [PATCH v2 1/3] ethdev: add API for direct rearm mode
  2022-10-11  7:19     ` Re: " Feifei Wang
@ 2022-10-11 22:21       ` Konstantin Ananyev
  2022-10-12 13:38         ` Re: " Feifei Wang
  0 siblings, 1 reply; 17+ messages in thread
From: Konstantin Ananyev @ 2022-10-11 22:21 UTC (permalink / raw)
  To: Feifei Wang, thomas, Ferruh Yigit, Andrew Rybchenko, Ray Kinsella
  Cc: dev, nd, Honnappa Nagarahalli, Ruifeng Wang



>>> Add API for enabling direct rearm mode and for mapping RX and TX
>>> queues. Currently, the API supports 1:1(txq : rxq) mapping.
>>>
>>> Furthermore, to avoid Rx load Tx data directly, add API called
>>> 'rte_eth_txq_data_get' to get Tx sw_ring and its information.
>>>
>>> Suggested-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
>>> Suggested-by: Ruifeng Wang <ruifeng.wang@arm.com>
>>> Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
>>> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
>>> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
>>> ---
>>>    lib/ethdev/ethdev_driver.h   |  9 ++++
>>>    lib/ethdev/ethdev_private.c  |  1 +
>>>    lib/ethdev/rte_ethdev.c      | 37 ++++++++++++++
>>>    lib/ethdev/rte_ethdev.h      | 95
>> ++++++++++++++++++++++++++++++++++++
>>>    lib/ethdev/rte_ethdev_core.h |  5 ++
>>>    lib/ethdev/version.map       |  4 ++
>>>    6 files changed, 151 insertions(+)
>>>
>>> diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
>>> index 47a55a419e..14f52907c1 100644
>>> --- a/lib/ethdev/ethdev_driver.h
>>> +++ b/lib/ethdev/ethdev_driver.h
>>> @@ -58,6 +58,8 @@ struct rte_eth_dev {
>>>    	eth_rx_descriptor_status_t rx_descriptor_status;
>>>    	/** Check the status of a Tx descriptor */
>>>    	eth_tx_descriptor_status_t tx_descriptor_status;
>>> +	/**  Use Tx mbufs for Rx to rearm */
>>> +	eth_rx_direct_rearm_t rx_direct_rearm;
>>>
>>>    	/**
>>>    	 * Device data that is shared between primary and secondary
>>> processes @@ -486,6 +488,11 @@ typedef int
>> (*eth_rx_enable_intr_t)(struct rte_eth_dev *dev,
>>>    typedef int (*eth_rx_disable_intr_t)(struct rte_eth_dev *dev,
>>>    				    uint16_t rx_queue_id);
>>>
>>> +/**< @internal Get Tx information of a transmit queue of an Ethernet
>>> +device. */ typedef void (*eth_txq_data_get_t)(struct rte_eth_dev *dev,
>>> +				      uint16_t tx_queue_id,
>>> +				      struct rte_eth_txq_data *txq_data);
>>> +
>>>    /** @internal Release memory resources allocated by given Rx/Tx queue.
>> */
>>>    typedef void (*eth_queue_release_t)(struct rte_eth_dev *dev,
>>>    				    uint16_t queue_id);
>>> @@ -1138,6 +1145,8 @@ struct eth_dev_ops {
>>>    	eth_rxq_info_get_t         rxq_info_get;
>>>    	/** Retrieve Tx queue information */
>>>    	eth_txq_info_get_t         txq_info_get;
>>> +	/** Get the address where Tx data is stored */
>>> +	eth_txq_data_get_t         txq_data_get;
>>>    	eth_burst_mode_get_t       rx_burst_mode_get; /**< Get Rx burst
>> mode */
>>>    	eth_burst_mode_get_t       tx_burst_mode_get; /**< Get Tx burst
>> mode */
>>>    	eth_fw_version_get_t       fw_version_get; /**< Get firmware
>> version */
>>> diff --git a/lib/ethdev/ethdev_private.c b/lib/ethdev/ethdev_private.c
>>> index 48090c879a..bfe16c7d77 100644
>>> --- a/lib/ethdev/ethdev_private.c
>>> +++ b/lib/ethdev/ethdev_private.c
>>> @@ -276,6 +276,7 @@ eth_dev_fp_ops_setup(struct rte_eth_fp_ops
>> *fpo,
>>>    	fpo->rx_queue_count = dev->rx_queue_count;
>>>    	fpo->rx_descriptor_status = dev->rx_descriptor_status;
>>>    	fpo->tx_descriptor_status = dev->tx_descriptor_status;
>>> +	fpo->rx_direct_rearm = dev->rx_direct_rearm;
>>>
>>>    	fpo->rxq.data = dev->data->rx_queues;
>>>    	fpo->rxq.clbk = (void **)(uintptr_t)dev->post_rx_burst_cbs;
>>> diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c index
>>> 0c2c1088c0..0dccec2e4b 100644
>>> --- a/lib/ethdev/rte_ethdev.c
>>> +++ b/lib/ethdev/rte_ethdev.c
>>> @@ -1648,6 +1648,43 @@ rte_eth_dev_is_removed(uint16_t port_id)
>>>    	return ret;
>>>    }
>>>
>>> +int
>>> +rte_eth_tx_queue_data_get(uint16_t port_id, uint16_t queue_id,
>>> +			struct rte_eth_txq_data *txq_data) {
>>> +	struct rte_eth_dev *dev;
>>> +
>>> +	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
>>> +	dev = &rte_eth_devices[port_id];
>>> +
>>> +	if (queue_id >= dev->data->nb_tx_queues) {
>>> +		RTE_ETHDEV_LOG(ERR, "Invalid Tx queue_id=%u\n",
>> queue_id);
>>> +		return -EINVAL;
>>> +	}
>>> +
>>> +	if (txq_data == NULL) {
>>> +		RTE_ETHDEV_LOG(ERR, "Cannot get ethdev port %u Tx
>> queue %u data to NULL\n",
>>> +			port_id, queue_id);
>>> +		return -EINVAL;
>>> +	}
>>> +
>>> +	if (dev->data->tx_queues == NULL ||
>>> +			dev->data->tx_queues[queue_id] == NULL) {
>>> +		RTE_ETHDEV_LOG(ERR,
>>> +			   "Tx queue %"PRIu16" of device with port_id=%"
>>> +			   PRIu16" has not been setup\n",
>>> +			   queue_id, port_id);
>>> +		return -EINVAL;
>>> +	}
>>> +
>>> +	if (*dev->dev_ops->txq_data_get == NULL)
>>> +		return -ENOTSUP;
>>> +
>>> +	dev->dev_ops->txq_data_get(dev, queue_id, txq_data);
>>> +
>>> +	return 0;
>>> +}
>>> +
>>>    static int
>>>    rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
>>>    			     uint16_t n_seg, uint32_t *mbp_buf_size, diff --git
>>> a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h index
>>> 2e783536c1..daf7f05d62 100644
>>> --- a/lib/ethdev/rte_ethdev.h
>>> +++ b/lib/ethdev/rte_ethdev.h
>>> @@ -1949,6 +1949,23 @@ struct rte_eth_txq_info {
>>>    	uint8_t queue_state;        /**< one of RTE_ETH_QUEUE_STATE_*. */
>>>    } __rte_cache_min_aligned;
>>>
>>> +/**
>>> + * @internal
>>> + * Structure used to hold pointers to internal ethdev Tx data.
>>> + * The main purpose is to load and store Tx queue data in direct rearm
>> mode.
>>> + */
>>> +struct rte_eth_txq_data {
>>> +	uint64_t *offloads;
>>> +	void *tx_sw_ring;
>>> +	volatile void *tx_ring;
>>> +	uint16_t *tx_next_dd;
>>> +	uint16_t *nb_tx_free;
>>> +	uint16_t nb_tx_desc;
>>> +	uint16_t tx_rs_thresh;
>>> +	uint16_t tx_free_thresh;
>>> +} __rte_cache_min_aligned;
>>> +
>>
>> first of all it is not clear why this struct has to be in public header, why it can't
>> be in on of ethdev 'private' headers.
>> Second it looks like a snippet from private txq fields for some Intel (and alike)
>> PMDs (i40e, ice, etc.).
>> How it supposed to to be universal and be applicable for any PMD that
>> decides to implement this new API?
>>
>>
>>>    /* Generic Burst mode flag definition, values can be ORed. */
>>>
>>>    /**
>>> @@ -4718,6 +4735,27 @@ int rte_eth_remove_rx_callback(uint16_t
>> port_id, uint16_t queue_id,
>>>    int rte_eth_remove_tx_callback(uint16_t port_id, uint16_t queue_id,
>>>    		const struct rte_eth_rxtx_callback *user_cb);
>>>
>>> +/**
>>> + * Get the address which Tx data is stored.
>>> + *
>>> + * @param port_id
>>> + *   The port identifier of the Ethernet device.
>>> + * @param queue_id
>>> + *   The Tx queue on the Ethernet device for which information
>>> + *   will be retrieved.
>>> + * @param txq_data
>>> + *   A pointer to a structure of type *rte_eth_txq_data* to be filled.
>>> + *
>>> + * @return
>>> + *   - 0: Success
>>> + *   - -ENODEV:  If *port_id* is invalid.
>>> + *   - -ENOTSUP: routine is not supported by the device PMD.
>>> + *   - -EINVAL:  The queue_id is out of range.
>>> + */
>>> +__rte_experimental
>>> +int rte_eth_tx_queue_data_get(uint16_t port_id, uint16_t queue_id,
>>> +		struct rte_eth_txq_data *txq_data);
>>> +
>>>    /**
>>>     * Retrieve information about given port's Rx queue.
>>>     *
>>> @@ -6209,6 +6247,63 @@ rte_eth_tx_buffer(uint16_t port_id, uint16_t
>> queue_id,
>>>    	return rte_eth_tx_buffer_flush(port_id, queue_id, buffer);
>>>    }
>>>
>>> +/**
>>> + * @warning
>>> + * @b EXPERIMENTAL: this API may change, or be removed, without prior
>>> +notice
>>> + *
>>> + * Put Tx buffers into Rx sw-ring and rearm descs.
>>> + *
>>> + * @param port_id
>>> + *   Port identifying the receive side.
>>> + * @param queue_id
>>> + *   The index of the transmit queue identifying the receive side.
>>> + *   The value must be in the range [0, nb_rx_queue - 1] previously
>> supplied
>>> + *   to rte_eth_dev_configure().
>>> + * @param txq_data
>>> + *   A pointer to a structure of type *rte_eth_txq_data* to be filled.
>>> + * @return
>>> + *   The number of direct-rearmed buffers.
>>> + */
>>> +__rte_experimental
>>> +static __rte_always_inline uint16_t
>>> +rte_eth_rx_direct_rearm(uint16_t port_id, uint16_t queue_id,
>>> +		struct rte_eth_txq_data *txq_data)
>>> +{
>>> +	uint16_t nb_rearm;
>>> +	struct rte_eth_fp_ops *p;
>>> +	void *qd;
>>> +
>>> +#ifdef RTE_ETHDEV_DEBUG_RX
>>> +	if (port_id >= RTE_MAX_ETHPORTS ||
>>> +			queue_id >= RTE_MAX_QUEUES_PER_PORT) {
>>> +		RTE_ETHDEV_LOG(ERR,
>>> +			"Invalid port_id=%u or queue_id=%u\n",
>>> +			port_id, queue_id);
>>> +		return 0;
>>> +	}
>>> +#endif
>>> +
>>> +	p = &rte_eth_fp_ops[port_id];
>>> +	qd = p->rxq.data[queue_id];
>>> +
>>> +#ifdef RTE_ETHDEV_DEBUG_RX
>>> +	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, 0);
>>> +
>>> +	if (qd == NULL) {
>>> +		RTE_ETHDEV_LOG(ERR, "Invalid Rx queue_id=%u for
>> port_id=%u\n",
>>> +			queue_id, port_id);
>>> +		return 0;
>>> +	}
>>> +
>>> +	if (!p->rx_direct_rearm)
>>
>> This check should be done always (unconditionally).
>> it is not a mandatory function for the driver (it can safely skip to implement it).
>>
>>> +		return -ENOTSUP;
>>
>> This function returns uint16_t, why signed integers here?
>>
>>
>>> +#endif
>>> +
>>> +	nb_rearm = p->rx_direct_rearm(qd, txq_data);
>>
>> So rx_direct_rearm() function knows how to extract data from TX queue?
>> As I understand that is possible only in one case:
>> rx_direct_rearm() has full knowledge and acess of txq internals, etc.
>> That means that rxq and txq have to belong to the same driver and device
>> type.
>>
> Thanks for the comments, and I have some questions for this.
> 
>> First of all, I still think it is not the best design choice.
>> If we going ahead with introducing this feature, it better be as generic as
>> possible.
>> Plus it mixes TX and RX code-paths together, while it would be much better
>> to to keep them independent as they are right now.
>>
>> Another thing with such approach - even for the same PMD, both TXQ and
>> RXQ can have different internal data format and behavior logic (depending
>> on port/queue configuration).
> 1. Here TXQ and RXQ have different internal format means the queue type
> and  descs can be different, right? If I understand correctly, based on your
> first strategy, is it means we will need different 'rearm_func' for different
> queue type in the same PMD?

Yes, I think so.
Let's say we have some PMD where, depending on the config,
there could be 2 different RXQ formats: rxq_a and rxq_b,
and 2 different TXQ formats: txq_c and txq_d.
Then, assuming the PMD would like to support direct-rearm mode for all four
combinations, it needs 4 different rearm functions:

rearm_txq_c_to_rxq_a()
rearm_txq_c_to_rxq_b()
rearm_txq_d_to_rxq_a()
rearm_txq_d_to_rxq_b()
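
Just to illustrate the selection step (the format enums and the helper below
are made up for this example, not taken from any existing PMD), the choice
could then be a small PMD-internal helper driven by both queue configs:

#include <stdint.h>

typedef uint16_t (*pmd_rearm_fn_t)(void *rxq, void *txq);

enum pmd_rxq_fmt { RXQ_FMT_A, RXQ_FMT_B };
enum pmd_txq_fmt { TXQ_FMT_C, TXQ_FMT_D };

/* the four combination-specific functions named above (bodies omitted) */
uint16_t rearm_txq_c_to_rxq_a(void *rxq, void *txq);
uint16_t rearm_txq_c_to_rxq_b(void *rxq, void *txq);
uint16_t rearm_txq_d_to_rxq_a(void *rxq, void *txq);
uint16_t rearm_txq_d_to_rxq_b(void *rxq, void *txq);

static pmd_rearm_fn_t
pmd_select_rearm_func(enum pmd_rxq_fmt rxq_fmt, enum pmd_txq_fmt txq_fmt)
{
	if (txq_fmt == TXQ_FMT_C)
		return rxq_fmt == RXQ_FMT_A ? rearm_txq_c_to_rxq_a :
					      rearm_txq_c_to_rxq_b;
	return rxq_fmt == RXQ_FMT_A ? rearm_txq_d_to_rxq_a :
				      rearm_txq_d_to_rxq_b;
}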


> 
>> So rx_direct_rearm() function selection have to be done based on both RXQ
>> and TXQ config.
>> So instead of rte_eth_tx_queue_data_get(), you'll probably need:
>> eth_rx_direct_rearm_t rte_eth_get_rx_direct_rearm_func(rx_port,
>> rx_queue, tx_port, tx_queue);
>> Then, it will be user responsibility to store it somewhere and call periodically:
>>
>> control_path:
>> 	...
>> 	rearm_func = rte_eth_get_rx_direct_rearm_func(rxport, rxqueue,
>> 		 txport, txqueue);
>> data-path:
>> 	while(...) {
>> 		rearm_func(rxport, txport, rxqueue, txqueue);
>> 		rte_eth_rx_burst(rxport, rxqueue, ....);
>> 		rte_eth_tx_burst(txport, txqueue, ....);
>> 	}
>>
>>
>> In that case there seems absolutely no point to introduce struct
>> rte_eth_txq_data. rx_direct_rearm() accesses TXQ private data directly
>> anyway.
> 2. This is a very good proposal and it will be our first choice. Before working on it,
> I have a few questions about how to implement 'rearm_func'.
> Like you say above, mixed Rx and Tx path code in 'rearm_func' means the hard-code
> is mixed like:
> rearm_func(...) {
>       ...
>      txep = &txq->sw_ring[txq->tx_next_dd - (txq->tx_rs_thresh - 1)];
>      for (...) {
>         rxep[i].mbuf = txep[i].mbuf;
>         mb0 = txep[i].mbuf;
>         paddr = mb0->buf_iova + RTE_PKTMBUF_HEADROOM;
>        dma_addr0 = vdupq_n_u64(paddr);
>        vst1q_u64((uint64_t *)&rxdp++->read, dma_addr0);
>      }
> }
> Is my understanding is right?


Sorry, I don't understand the question.
Can you perhaps elaborate a bit?

> 
>>
>> Another way - make rte_eth_txq_data totally opaque and allow PMD to
>> store there some data that will help it to distinguish expected TXQ format.
>> That will allow PMD to keep rx_direct_rearm() the same for all supported
>> TXQ formats (it will make decision internally based on data stored in
>> txq_data).
>> Though in that case you'll probably need one more dev-op to free txq_data.
> 


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Re: [PATCH v2 1/3] ethdev: add API for direct rearm mode
  2022-10-11 22:21       ` Konstantin Ananyev
@ 2022-10-12 13:38         ` Feifei Wang
  2022-10-13  9:48           ` Konstantin Ananyev
  0 siblings, 1 reply; 17+ messages in thread
From: Feifei Wang @ 2022-10-12 13:38 UTC (permalink / raw)
  To: Konstantin Ananyev, thomas, Ferruh Yigit, Andrew Rybchenko, Ray Kinsella
  Cc: dev, nd, Honnappa Nagarahalli, Ruifeng Wang, nd



> -----Original Message-----
> From: Konstantin Ananyev <konstantin.v.ananyev@yandex.ru>
> Sent: Wednesday, October 12, 2022 6:21 AM
> To: Feifei Wang <Feifei.Wang2@arm.com>; thomas@monjalon.net;
> Ferruh Yigit <ferruh.yigit@xilinx.com>; Andrew Rybchenko
> <andrew.rybchenko@oktetlabs.ru>; Ray Kinsella <mdr@ashroe.eu>
> Cc: dev@dpdk.org; nd <nd@arm.com>; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>; Ruifeng Wang
> <Ruifeng.Wang@arm.com>
> Subject: Re: Re: [PATCH v2 1/3] ethdev: add API for direct rearm mode
> 
> 
> 
> >>> Add API for enabling direct rearm mode and for mapping RX and TX
> >>> queues. Currently, the API supports 1:1(txq : rxq) mapping.
> >>>
> >>> Furthermore, to avoid Rx load Tx data directly, add API called
> >>> 'rte_eth_txq_data_get' to get Tx sw_ring and its information.
> >>>
> >>> Suggested-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> >>> Suggested-by: Ruifeng Wang <ruifeng.wang@arm.com>
> >>> Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
> >>> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> >>> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> >>> ---
> >>>    lib/ethdev/ethdev_driver.h   |  9 ++++
> >>>    lib/ethdev/ethdev_private.c  |  1 +
> >>>    lib/ethdev/rte_ethdev.c      | 37 ++++++++++++++
> >>>    lib/ethdev/rte_ethdev.h      | 95
> >> ++++++++++++++++++++++++++++++++++++
> >>>    lib/ethdev/rte_ethdev_core.h |  5 ++
> >>>    lib/ethdev/version.map       |  4 ++
> >>>    6 files changed, 151 insertions(+)
> >>>
> >>> diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
> >>> index 47a55a419e..14f52907c1 100644
> >>> --- a/lib/ethdev/ethdev_driver.h
> >>> +++ b/lib/ethdev/ethdev_driver.h
> >>> @@ -58,6 +58,8 @@ struct rte_eth_dev {
> >>>    	eth_rx_descriptor_status_t rx_descriptor_status;
> >>>    	/** Check the status of a Tx descriptor */
> >>>    	eth_tx_descriptor_status_t tx_descriptor_status;
> >>> +	/**  Use Tx mbufs for Rx to rearm */
> >>> +	eth_rx_direct_rearm_t rx_direct_rearm;
> >>>
> >>>    	/**
> >>>    	 * Device data that is shared between primary and secondary
> >>> processes @@ -486,6 +488,11 @@ typedef int
> >> (*eth_rx_enable_intr_t)(struct rte_eth_dev *dev,
> >>>    typedef int (*eth_rx_disable_intr_t)(struct rte_eth_dev *dev,
> >>>    				    uint16_t rx_queue_id);
> >>>
> >>> +/**< @internal Get Tx information of a transmit queue of an
> >>> +Ethernet device. */ typedef void (*eth_txq_data_get_t)(struct
> rte_eth_dev *dev,
> >>> +				      uint16_t tx_queue_id,
> >>> +				      struct rte_eth_txq_data *txq_data);
> >>> +
> >>>    /** @internal Release memory resources allocated by given Rx/Tx
> queue.
> >> */
> >>>    typedef void (*eth_queue_release_t)(struct rte_eth_dev *dev,
> >>>    				    uint16_t queue_id);
> >>> @@ -1138,6 +1145,8 @@ struct eth_dev_ops {
> >>>    	eth_rxq_info_get_t         rxq_info_get;
> >>>    	/** Retrieve Tx queue information */
> >>>    	eth_txq_info_get_t         txq_info_get;
> >>> +	/** Get the address where Tx data is stored */
> >>> +	eth_txq_data_get_t         txq_data_get;
> >>>    	eth_burst_mode_get_t       rx_burst_mode_get; /**< Get Rx burst
> >> mode */
> >>>    	eth_burst_mode_get_t       tx_burst_mode_get; /**< Get Tx burst
> >> mode */
> >>>    	eth_fw_version_get_t       fw_version_get; /**< Get firmware
> >> version */
> >>> diff --git a/lib/ethdev/ethdev_private.c
> >>> b/lib/ethdev/ethdev_private.c index 48090c879a..bfe16c7d77 100644
> >>> --- a/lib/ethdev/ethdev_private.c
> >>> +++ b/lib/ethdev/ethdev_private.c
> >>> @@ -276,6 +276,7 @@ eth_dev_fp_ops_setup(struct rte_eth_fp_ops
> >> *fpo,
> >>>    	fpo->rx_queue_count = dev->rx_queue_count;
> >>>    	fpo->rx_descriptor_status = dev->rx_descriptor_status;
> >>>    	fpo->tx_descriptor_status = dev->tx_descriptor_status;
> >>> +	fpo->rx_direct_rearm = dev->rx_direct_rearm;
> >>>
> >>>    	fpo->rxq.data = dev->data->rx_queues;
> >>>    	fpo->rxq.clbk = (void **)(uintptr_t)dev->post_rx_burst_cbs;
> >>> diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c index
> >>> 0c2c1088c0..0dccec2e4b 100644
> >>> --- a/lib/ethdev/rte_ethdev.c
> >>> +++ b/lib/ethdev/rte_ethdev.c
> >>> @@ -1648,6 +1648,43 @@ rte_eth_dev_is_removed(uint16_t port_id)
> >>>    	return ret;
> >>>    }
> >>>
> >>> +int
> >>> +rte_eth_tx_queue_data_get(uint16_t port_id, uint16_t queue_id,
> >>> +			struct rte_eth_txq_data *txq_data) {
> >>> +	struct rte_eth_dev *dev;
> >>> +
> >>> +	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
> >>> +	dev = &rte_eth_devices[port_id];
> >>> +
> >>> +	if (queue_id >= dev->data->nb_tx_queues) {
> >>> +		RTE_ETHDEV_LOG(ERR, "Invalid Tx queue_id=%u\n",
> >> queue_id);
> >>> +		return -EINVAL;
> >>> +	}
> >>> +
> >>> +	if (txq_data == NULL) {
> >>> +		RTE_ETHDEV_LOG(ERR, "Cannot get ethdev port %u Tx
> >> queue %u data to NULL\n",
> >>> +			port_id, queue_id);
> >>> +		return -EINVAL;
> >>> +	}
> >>> +
> >>> +	if (dev->data->tx_queues == NULL ||
> >>> +			dev->data->tx_queues[queue_id] == NULL) {
> >>> +		RTE_ETHDEV_LOG(ERR,
> >>> +			   "Tx queue %"PRIu16" of device with port_id=%"
> >>> +			   PRIu16" has not been setup\n",
> >>> +			   queue_id, port_id);
> >>> +		return -EINVAL;
> >>> +	}
> >>> +
> >>> +	if (*dev->dev_ops->txq_data_get == NULL)
> >>> +		return -ENOTSUP;
> >>> +
> >>> +	dev->dev_ops->txq_data_get(dev, queue_id, txq_data);
> >>> +
> >>> +	return 0;
> >>> +}
> >>> +
> >>>    static int
> >>>    rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split
> *rx_seg,
> >>>    			     uint16_t n_seg, uint32_t *mbp_buf_size, diff --git
> >>> a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h index
> >>> 2e783536c1..daf7f05d62 100644
> >>> --- a/lib/ethdev/rte_ethdev.h
> >>> +++ b/lib/ethdev/rte_ethdev.h
> >>> @@ -1949,6 +1949,23 @@ struct rte_eth_txq_info {
> >>>    	uint8_t queue_state;        /**< one of RTE_ETH_QUEUE_STATE_*. */
> >>>    } __rte_cache_min_aligned;
> >>>
> >>> +/**
> >>> + * @internal
> >>> + * Structure used to hold pointers to internal ethdev Tx data.
> >>> + * The main purpose is to load and store Tx queue data in direct
> >>> +rearm
> >> mode.
> >>> + */
> >>> +struct rte_eth_txq_data {
> >>> +	uint64_t *offloads;
> >>> +	void *tx_sw_ring;
> >>> +	volatile void *tx_ring;
> >>> +	uint16_t *tx_next_dd;
> >>> +	uint16_t *nb_tx_free;
> >>> +	uint16_t nb_tx_desc;
> >>> +	uint16_t tx_rs_thresh;
> >>> +	uint16_t tx_free_thresh;
> >>> +} __rte_cache_min_aligned;
> >>> +
> >>
> >> first of all it is not clear why this struct has to be in public
> >> header, why it can't be in on of ethdev 'private' headers.
> >> Second it looks like a snippet from private txq fields for some Intel
> >> (and alike) PMDs (i40e, ice, etc.).
> >> How it supposed to to be universal and be applicable for any PMD that
> >> decides to implement this new API?
> >>
> >>
> >>>    /* Generic Burst mode flag definition, values can be ORed. */
> >>>
> >>>    /**
> >>> @@ -4718,6 +4735,27 @@ int rte_eth_remove_rx_callback(uint16_t
> >> port_id, uint16_t queue_id,
> >>>    int rte_eth_remove_tx_callback(uint16_t port_id, uint16_t queue_id,
> >>>    		const struct rte_eth_rxtx_callback *user_cb);
> >>>
> >>> +/**
> >>> + * Get the address which Tx data is stored.
> >>> + *
> >>> + * @param port_id
> >>> + *   The port identifier of the Ethernet device.
> >>> + * @param queue_id
> >>> + *   The Tx queue on the Ethernet device for which information
> >>> + *   will be retrieved.
> >>> + * @param txq_data
> >>> + *   A pointer to a structure of type *rte_eth_txq_data* to be filled.
> >>> + *
> >>> + * @return
> >>> + *   - 0: Success
> >>> + *   - -ENODEV:  If *port_id* is invalid.
> >>> + *   - -ENOTSUP: routine is not supported by the device PMD.
> >>> + *   - -EINVAL:  The queue_id is out of range.
> >>> + */
> >>> +__rte_experimental
> >>> +int rte_eth_tx_queue_data_get(uint16_t port_id, uint16_t queue_id,
> >>> +		struct rte_eth_txq_data *txq_data);
> >>> +
> >>>    /**
> >>>     * Retrieve information about given port's Rx queue.
> >>>     *
> >>> @@ -6209,6 +6247,63 @@ rte_eth_tx_buffer(uint16_t port_id, uint16_t
> >> queue_id,
> >>>    	return rte_eth_tx_buffer_flush(port_id, queue_id, buffer);
> >>>    }
> >>>
> >>> +/**
> >>> + * @warning
> >>> + * @b EXPERIMENTAL: this API may change, or be removed, without
> >>> +prior notice
> >>> + *
> >>> + * Put Tx buffers into Rx sw-ring and rearm descs.
> >>> + *
> >>> + * @param port_id
> >>> + *   Port identifying the receive side.
> >>> + * @param queue_id
> >>> + *   The index of the transmit queue identifying the receive side.
> >>> + *   The value must be in the range [0, nb_rx_queue - 1] previously
> >> supplied
> >>> + *   to rte_eth_dev_configure().
> >>> + * @param txq_data
> >>> + *   A pointer to a structure of type *rte_eth_txq_data* to be filled.
> >>> + * @return
> >>> + *   The number of direct-rearmed buffers.
> >>> + */
> >>> +__rte_experimental
> >>> +static __rte_always_inline uint16_t
> >>> +rte_eth_rx_direct_rearm(uint16_t port_id, uint16_t queue_id,
> >>> +		struct rte_eth_txq_data *txq_data) {
> >>> +	uint16_t nb_rearm;
> >>> +	struct rte_eth_fp_ops *p;
> >>> +	void *qd;
> >>> +
> >>> +#ifdef RTE_ETHDEV_DEBUG_RX
> >>> +	if (port_id >= RTE_MAX_ETHPORTS ||
> >>> +			queue_id >= RTE_MAX_QUEUES_PER_PORT) {
> >>> +		RTE_ETHDEV_LOG(ERR,
> >>> +			"Invalid port_id=%u or queue_id=%u\n",
> >>> +			port_id, queue_id);
> >>> +		return 0;
> >>> +	}
> >>> +#endif
> >>> +
> >>> +	p = &rte_eth_fp_ops[port_id];
> >>> +	qd = p->rxq.data[queue_id];
> >>> +
> >>> +#ifdef RTE_ETHDEV_DEBUG_RX
> >>> +	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, 0);
> >>> +
> >>> +	if (qd == NULL) {
> >>> +		RTE_ETHDEV_LOG(ERR, "Invalid Rx queue_id=%u for
> >> port_id=%u\n",
> >>> +			queue_id, port_id);
> >>> +		return 0;
> >>> +	}
> >>> +
> >>> +	if (!p->rx_direct_rearm)
> >>
> >> This check should be done always (unconditionally).
> >> it is not a mandatory function for the driver (it can safely skip to
> implement it).
> >>
> >>> +		return -ENOTSUP;
> >>
> >> This function returns uint16_t, why signed integers here?
> >>
> >>
> >>> +#endif
> >>> +
> >>> +	nb_rearm = p->rx_direct_rearm(qd, txq_data);
> >>
> >> So rx_direct_rearm() function knows how to extract data from TX queue?
> >> As I understand that is possible only in one case:
> >> rx_direct_rearm() has full knowledge and acess of txq internals, etc.
> >> That means that rxq and txq have to belong to the same driver and
> >> device type.
> >>
> > Thanks for the comments, and I have some questions for this.
> >
> >> First of all, I still think it is not the best design choice.
> >> If we going ahead with introducing this feature, it better be as
> >> generic as possible.
> >> Plus it mixes TX and RX code-paths together, while it would be much
> >> better to to keep them independent as they are right now.
> >>
> >> Another thing with such approach - even for the same PMD, both TXQ
> >> and RXQ can have different internal data format and behavior logic
> >> (depending on port/queue configuration).
> > 1. Here TXQ and RXQ have different internal format means the queue
> > type and  descs can be different, right? If I understand correctly,
> > based on your first strategy, is it means we will need different
> > 'rearm_func' for different queue type in the same PMD?
> 
> Yes, I think so.
> If let say we have some PMD where depending on the config, there cpuld be
> 2 different RXQ formats: rxq_a and rxq_b, and 2 different txq formats: txq_c,
> txq_d.
> Then assuming PMD would like to support direct-rearm mode for all four
> combinations, it needs 4 different rearm functions:
> 
> rearm_txq_c_to_rxq_a()
> rearm_txq_c_to_rxq_b()
> rearm_txq_d_to_rxq_a()
> rearm_txq_d_to_rxq_b()
> 
Thank you for your detailed explanation; I understand it now.
> 
> >
> >> So rx_direct_rearm() function selection have to be done based on both
> >> RXQ and TXQ config.
> >> So instead of rte_eth_tx_queue_data_get(), you'll probably need:
> >> eth_rx_direct_rearm_t rte_eth_get_rx_direct_rearm_func(rx_port,
> >> rx_queue, tx_port, tx_queue);
> >> Then, it will be user responsibility to store it somewhere and call
> periodically:
> >>
> >> control_path:
> >> 	...
> >> 	rearm_func = rte_eth_get_rx_direct_rearm_func(rxport, rxqueue,
> >> 		 txport, txqueue);
> >> data-path:
> >> 	while(...) {
> >> 		rearm_func(rxport, txport, rxqueue, txqueue);
> >> 		rte_eth_rx_burst(rxport, rxqueue, ....);
> >> 		rte_eth_tx_burst(txport, txqueue, ....);
> >> 	}
> >>
> >>
> >> In that case there seems absolutely no point to introduce struct
> >> rte_eth_txq_data. rx_direct_rearm() accesses TXQ private data
> >> directly anyway.
> > 2. This is a very good proposal and it will be our first choice.
> > Before working on it, I have a few questions about how to implement
> 'rearm_func'.
> > Like you say above, mixed Rx and Tx path code in 'rearm_func' means
> > the hard-code is mixed like:
> > rearm_func(...) {
> >       ...
> >      txep = &txq->sw_ring[txq->tx_next_dd - (txq->tx_rs_thresh - 1)];
> >      for (...) {
> >         rxep[i].mbuf = txep[i].mbuf;
> >         mb0 = txep[i].mbuf;
> >         paddr = mb0->buf_iova + RTE_PKTMBUF_HEADROOM;
> >        dma_addr0 = vdupq_n_u64(paddr);
> >        vst1q_u64((uint64_t *)&rxdp++->read, dma_addr0);
> >      }
> > }
> > Is my understanding is right?
> 
> 
> Sorry, I don't understand the question.
> Can you probably elaborate a bit?

Sorry for my unclear expression.

I mean: should 'rearm_func' call two separate functions containing the Tx path code
and the Rx path code respectively, like:
rearm_func(...) {
        rte_tx_fill_sw_ring();
        rte_rx_rearm_descs();
}

Or should it just mix the Tx and Rx path code as I described before? I prefer mixing
the Rx and Tx hard code, because from a performance perspective this reduces the cost
of function calls.
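
For reference, a purely scalar sketch of the 'mixed' variant (the pmd_rxq/pmd_txq
layouts and field names below are invented for illustration; a real PMD would use
its own descriptor structures and most likely vector stores, as in the NEON snippet
earlier):

#include <stdint.h>
#include <rte_mbuf.h>

/* hypothetical queue layouts, for illustration only */
struct pmd_txq { struct rte_mbuf **sw_ring; uint16_t next_dd; uint16_t rs_thresh; };
struct pmd_rxq { struct rte_mbuf **sw_ring; volatile uint64_t *descs; uint16_t rearm_start; };

/* walk the Tx sw-ring and rearm the Rx descriptors in one loop, so the
 * Tx-free and Rx-rearm stages need no separate per-stage function calls */
static inline uint16_t
rearm_mixed(struct pmd_rxq *rxq, struct pmd_txq *txq)
{
	struct rte_mbuf **txep = &txq->sw_ring[txq->next_dd - (txq->rs_thresh - 1)];
	struct rte_mbuf **rxep = &rxq->sw_ring[rxq->rearm_start];
	volatile uint64_t *rxdp = &rxq->descs[rxq->rearm_start];
	uint16_t i;

	for (i = 0; i < txq->rs_thresh; i++) {
		rxep[i] = txep[i];
		/* buffer DMA address = buf_iova + RTE_PKTMBUF_HEADROOM */
		rxdp[i] = rte_mbuf_data_iova_default(txep[i]);
	}
	return txq->rs_thresh;
}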
> 
> >
> >>
> >> Another way - make rte_eth_txq_data totally opaque and allow PMD to
> >> store there some data that will help it to distinguish expected TXQ format.
> >> That will allow PMD to keep rx_direct_rearm() the same for all
> >> supported TXQ formats (it will make decision internally based on data
> >> stored in txq_data).
> >> Though in that case you'll probably need one more dev-op to free
> txq_data.
> >


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Re: Re: [PATCH v2 1/3] ethdev: add API for direct rearm mode
  2022-10-12 13:38         ` Re: " Feifei Wang
@ 2022-10-13  9:48           ` Konstantin Ananyev
  2022-10-26  7:35             ` Re: " Feifei Wang
  0 siblings, 1 reply; 17+ messages in thread
From: Konstantin Ananyev @ 2022-10-13  9:48 UTC (permalink / raw)
  To: Feifei Wang, thomas, Ferruh Yigit, Andrew Rybchenko, Ray Kinsella
  Cc: dev, nd, Honnappa Nagarahalli, Ruifeng Wang

12/10/2022 14:38, Feifei Wang writes:
> 
> 
>> -----Original Message-----
>> From: Konstantin Ananyev <konstantin.v.ananyev@yandex.ru>
>> Sent: Wednesday, October 12, 2022 6:21 AM
>> To: Feifei Wang <Feifei.Wang2@arm.com>; thomas@monjalon.net;
>> Ferruh Yigit <ferruh.yigit@xilinx.com>; Andrew Rybchenko
>> <andrew.rybchenko@oktetlabs.ru>; Ray Kinsella <mdr@ashroe.eu>
>> Cc: dev@dpdk.org; nd <nd@arm.com>; Honnappa Nagarahalli
>> <Honnappa.Nagarahalli@arm.com>; Ruifeng Wang
>> <Ruifeng.Wang@arm.com>
>> Subject: Re: Re: [PATCH v2 1/3] ethdev: add API for direct rearm mode
>>
>>
>>
>>>>> Add API for enabling direct rearm mode and for mapping RX and TX
>>>>> queues. Currently, the API supports 1:1(txq : rxq) mapping.
>>>>>
>>>>> Furthermore, to avoid Rx load Tx data directly, add API called
>>>>> 'rte_eth_txq_data_get' to get Tx sw_ring and its information.
>>>>>
>>>>> Suggested-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
>>>>> Suggested-by: Ruifeng Wang <ruifeng.wang@arm.com>
>>>>> Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
>>>>> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
>>>>> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
>>>>> ---
>>>>>     lib/ethdev/ethdev_driver.h   |  9 ++++
>>>>>     lib/ethdev/ethdev_private.c  |  1 +
>>>>>     lib/ethdev/rte_ethdev.c      | 37 ++++++++++++++
>>>>>     lib/ethdev/rte_ethdev.h      | 95
>>>> ++++++++++++++++++++++++++++++++++++
>>>>>     lib/ethdev/rte_ethdev_core.h |  5 ++
>>>>>     lib/ethdev/version.map       |  4 ++
>>>>>     6 files changed, 151 insertions(+)
>>>>>
>>>>> diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
>>>>> index 47a55a419e..14f52907c1 100644
>>>>> --- a/lib/ethdev/ethdev_driver.h
>>>>> +++ b/lib/ethdev/ethdev_driver.h
>>>>> @@ -58,6 +58,8 @@ struct rte_eth_dev {
>>>>>     	eth_rx_descriptor_status_t rx_descriptor_status;
>>>>>     	/** Check the status of a Tx descriptor */
>>>>>     	eth_tx_descriptor_status_t tx_descriptor_status;
>>>>> +	/**  Use Tx mbufs for Rx to rearm */
>>>>> +	eth_rx_direct_rearm_t rx_direct_rearm;
>>>>>
>>>>>     	/**
>>>>>     	 * Device data that is shared between primary and secondary
>>>>> processes @@ -486,6 +488,11 @@ typedef int
>>>> (*eth_rx_enable_intr_t)(struct rte_eth_dev *dev,
>>>>>     typedef int (*eth_rx_disable_intr_t)(struct rte_eth_dev *dev,
>>>>>     				    uint16_t rx_queue_id);
>>>>>
>>>>> +/**< @internal Get Tx information of a transmit queue of an
>>>>> +Ethernet device. */ typedef void (*eth_txq_data_get_t)(struct
>> rte_eth_dev *dev,
>>>>> +				      uint16_t tx_queue_id,
>>>>> +				      struct rte_eth_txq_data *txq_data);
>>>>> +
>>>>>     /** @internal Release memory resources allocated by given Rx/Tx
>> queue.
>>>> */
>>>>>     typedef void (*eth_queue_release_t)(struct rte_eth_dev *dev,
>>>>>     				    uint16_t queue_id);
>>>>> @@ -1138,6 +1145,8 @@ struct eth_dev_ops {
>>>>>     	eth_rxq_info_get_t         rxq_info_get;
>>>>>     	/** Retrieve Tx queue information */
>>>>>     	eth_txq_info_get_t         txq_info_get;
>>>>> +	/** Get the address where Tx data is stored */
>>>>> +	eth_txq_data_get_t         txq_data_get;
>>>>>     	eth_burst_mode_get_t       rx_burst_mode_get; /**< Get Rx burst
>>>> mode */
>>>>>     	eth_burst_mode_get_t       tx_burst_mode_get; /**< Get Tx burst
>>>> mode */
>>>>>     	eth_fw_version_get_t       fw_version_get; /**< Get firmware
>>>> version */
>>>>> diff --git a/lib/ethdev/ethdev_private.c
>>>>> b/lib/ethdev/ethdev_private.c index 48090c879a..bfe16c7d77 100644
>>>>> --- a/lib/ethdev/ethdev_private.c
>>>>> +++ b/lib/ethdev/ethdev_private.c
>>>>> @@ -276,6 +276,7 @@ eth_dev_fp_ops_setup(struct rte_eth_fp_ops
>>>> *fpo,
>>>>>     	fpo->rx_queue_count = dev->rx_queue_count;
>>>>>     	fpo->rx_descriptor_status = dev->rx_descriptor_status;
>>>>>     	fpo->tx_descriptor_status = dev->tx_descriptor_status;
>>>>> +	fpo->rx_direct_rearm = dev->rx_direct_rearm;
>>>>>
>>>>>     	fpo->rxq.data = dev->data->rx_queues;
>>>>>     	fpo->rxq.clbk = (void **)(uintptr_t)dev->post_rx_burst_cbs;
>>>>> diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c index
>>>>> 0c2c1088c0..0dccec2e4b 100644
>>>>> --- a/lib/ethdev/rte_ethdev.c
>>>>> +++ b/lib/ethdev/rte_ethdev.c
>>>>> @@ -1648,6 +1648,43 @@ rte_eth_dev_is_removed(uint16_t port_id)
>>>>>     	return ret;
>>>>>     }
>>>>>
>>>>> +int
>>>>> +rte_eth_tx_queue_data_get(uint16_t port_id, uint16_t queue_id,
>>>>> +			struct rte_eth_txq_data *txq_data) {
>>>>> +	struct rte_eth_dev *dev;
>>>>> +
>>>>> +	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
>>>>> +	dev = &rte_eth_devices[port_id];
>>>>> +
>>>>> +	if (queue_id >= dev->data->nb_tx_queues) {
>>>>> +		RTE_ETHDEV_LOG(ERR, "Invalid Tx queue_id=%u\n",
>>>> queue_id);
>>>>> +		return -EINVAL;
>>>>> +	}
>>>>> +
>>>>> +	if (txq_data == NULL) {
>>>>> +		RTE_ETHDEV_LOG(ERR, "Cannot get ethdev port %u Tx
>>>> queue %u data to NULL\n",
>>>>> +			port_id, queue_id);
>>>>> +		return -EINVAL;
>>>>> +	}
>>>>> +
>>>>> +	if (dev->data->tx_queues == NULL ||
>>>>> +			dev->data->tx_queues[queue_id] == NULL) {
>>>>> +		RTE_ETHDEV_LOG(ERR,
>>>>> +			   "Tx queue %"PRIu16" of device with port_id=%"
>>>>> +			   PRIu16" has not been setup\n",
>>>>> +			   queue_id, port_id);
>>>>> +		return -EINVAL;
>>>>> +	}
>>>>> +
>>>>> +	if (*dev->dev_ops->txq_data_get == NULL)
>>>>> +		return -ENOTSUP;
>>>>> +
>>>>> +	dev->dev_ops->txq_data_get(dev, queue_id, txq_data);
>>>>> +
>>>>> +	return 0;
>>>>> +}
>>>>> +
>>>>>     static int
>>>>>     rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split
>> *rx_seg,
>>>>>     			     uint16_t n_seg, uint32_t *mbp_buf_size, diff --git
>>>>> a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h index
>>>>> 2e783536c1..daf7f05d62 100644
>>>>> --- a/lib/ethdev/rte_ethdev.h
>>>>> +++ b/lib/ethdev/rte_ethdev.h
>>>>> @@ -1949,6 +1949,23 @@ struct rte_eth_txq_info {
>>>>>     	uint8_t queue_state;        /**< one of RTE_ETH_QUEUE_STATE_*. */
>>>>>     } __rte_cache_min_aligned;
>>>>>
>>>>> +/**
>>>>> + * @internal
>>>>> + * Structure used to hold pointers to internal ethdev Tx data.
>>>>> + * The main purpose is to load and store Tx queue data in direct
>>>>> +rearm
>>>> mode.
>>>>> + */
>>>>> +struct rte_eth_txq_data {
>>>>> +	uint64_t *offloads;
>>>>> +	void *tx_sw_ring;
>>>>> +	volatile void *tx_ring;
>>>>> +	uint16_t *tx_next_dd;
>>>>> +	uint16_t *nb_tx_free;
>>>>> +	uint16_t nb_tx_desc;
>>>>> +	uint16_t tx_rs_thresh;
>>>>> +	uint16_t tx_free_thresh;
>>>>> +} __rte_cache_min_aligned;
>>>>> +
>>>>
>>>> first of all it is not clear why this struct has to be in public
>>>> header, why it can't be in one of the ethdev 'private' headers.
>>>> Second it looks like a snippet from private txq fields for some Intel
>>>> (and alike) PMDs (i40e, ice, etc.).
>>>> How is it supposed to be universal and applicable for any PMD that
>>>> decides to implement this new API?
>>>>
>>>>
>>>>>     /* Generic Burst mode flag definition, values can be ORed. */
>>>>>
>>>>>     /**
>>>>> @@ -4718,6 +4735,27 @@ int rte_eth_remove_rx_callback(uint16_t
>>>> port_id, uint16_t queue_id,
>>>>>     int rte_eth_remove_tx_callback(uint16_t port_id, uint16_t queue_id,
>>>>>     		const struct rte_eth_rxtx_callback *user_cb);
>>>>>
>>>>> +/**
>>>>> + * Get the address which Tx data is stored.
>>>>> + *
>>>>> + * @param port_id
>>>>> + *   The port identifier of the Ethernet device.
>>>>> + * @param queue_id
>>>>> + *   The Tx queue on the Ethernet device for which information
>>>>> + *   will be retrieved.
>>>>> + * @param txq_data
>>>>> + *   A pointer to a structure of type *rte_eth_txq_data* to be filled.
>>>>> + *
>>>>> + * @return
>>>>> + *   - 0: Success
>>>>> + *   - -ENODEV:  If *port_id* is invalid.
>>>>> + *   - -ENOTSUP: routine is not supported by the device PMD.
>>>>> + *   - -EINVAL:  The queue_id is out of range.
>>>>> + */
>>>>> +__rte_experimental
>>>>> +int rte_eth_tx_queue_data_get(uint16_t port_id, uint16_t queue_id,
>>>>> +		struct rte_eth_txq_data *txq_data);
>>>>> +
>>>>>     /**
>>>>>      * Retrieve information about given port's Rx queue.
>>>>>      *
>>>>> @@ -6209,6 +6247,63 @@ rte_eth_tx_buffer(uint16_t port_id, uint16_t
>>>> queue_id,
>>>>>     	return rte_eth_tx_buffer_flush(port_id, queue_id, buffer);
>>>>>     }
>>>>>
>>>>> +/**
>>>>> + * @warning
>>>>> + * @b EXPERIMENTAL: this API may change, or be removed, without
>>>>> +prior notice
>>>>> + *
>>>>> + * Put Tx buffers into Rx sw-ring and rearm descs.
>>>>> + *
>>>>> + * @param port_id
>>>>> + *   Port identifying the receive side.
>>>>> + * @param queue_id
>>>>> + *   The index of the transmit queue identifying the receive side.
>>>>> + *   The value must be in the range [0, nb_rx_queue - 1] previously
>>>> supplied
>>>>> + *   to rte_eth_dev_configure().
>>>>> + * @param txq_data
>>>>> + *   A pointer to a structure of type *rte_eth_txq_data* to be filled.
>>>>> + * @return
>>>>> + *   The number of direct-rearmed buffers.
>>>>> + */
>>>>> +__rte_experimental
>>>>> +static __rte_always_inline uint16_t
>>>>> +rte_eth_rx_direct_rearm(uint16_t port_id, uint16_t queue_id,
>>>>> +		struct rte_eth_txq_data *txq_data) {
>>>>> +	uint16_t nb_rearm;
>>>>> +	struct rte_eth_fp_ops *p;
>>>>> +	void *qd;
>>>>> +
>>>>> +#ifdef RTE_ETHDEV_DEBUG_RX
>>>>> +	if (port_id >= RTE_MAX_ETHPORTS ||
>>>>> +			queue_id >= RTE_MAX_QUEUES_PER_PORT) {
>>>>> +		RTE_ETHDEV_LOG(ERR,
>>>>> +			"Invalid port_id=%u or queue_id=%u\n",
>>>>> +			port_id, queue_id);
>>>>> +		return 0;
>>>>> +	}
>>>>> +#endif
>>>>> +
>>>>> +	p = &rte_eth_fp_ops[port_id];
>>>>> +	qd = p->rxq.data[queue_id];
>>>>> +
>>>>> +#ifdef RTE_ETHDEV_DEBUG_RX
>>>>> +	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, 0);
>>>>> +
>>>>> +	if (qd == NULL) {
>>>>> +		RTE_ETHDEV_LOG(ERR, "Invalid Rx queue_id=%u for
>>>> port_id=%u\n",
>>>>> +			queue_id, port_id);
>>>>> +		return 0;
>>>>> +	}
>>>>> +
>>>>> +	if (!p->rx_direct_rearm)
>>>>
>>>> This check should be done always (unconditionally).
>>>> it is not a mandatory function for the driver (it can safely skip to
>> implement it).
>>>>
>>>>> +		return -ENOTSUP;
>>>>
>>>> This function returns uint16_t, why signed integers here?
>>>>
>>>>
>>>>> +#endif
>>>>> +
>>>>> +	nb_rearm = p->rx_direct_rearm(qd, txq_data);
>>>>
>>>> So rx_direct_rearm() function knows how to extract data from TX queue?
>>>> As I understand that is possible only in one case:
>>>> rx_direct_rearm() has full knowledge and access of txq internals, etc.
>>>> That means that rxq and txq have to belong to the same driver and
>>>> device type.
>>>>
>>> Thanks for the comments, and I have some questions for this.
>>>
>>>> First of all, I still think it is not the best design choice.
>>>> If we are going ahead with introducing this feature, it had better be as
>>>> generic as possible.
>>>> Plus it mixes TX and RX code-paths together, while it would be much
>>>> better to keep them independent as they are right now.
>>>>
>>>> Another thing with such approach - even for the same PMD, both TXQ
>>>> and RXQ can have different internal data format and behavior logic
>>>> (depending on port/queue configuration).
>>> 1. Here TXQ and RXQ have different internal format means the queue
>>> type and  descs can be different, right? If I understand correctly,
>>> based on your first strategy, is it means we will need different
>>> 'rearm_func' for different queue type in the same PMD?
>>
>> Yes, I think so.
>> If, let's say, we have some PMD where, depending on the config, there could be
>> 2 different RXQ formats: rxq_a and rxq_b, and 2 different txq formats: txq_c,
>> txq_d.
>> Then assuming PMD would like to support direct-rearm mode for all four
>> combinations, it needs 4 different rearm functions:
>>
>> rearm_txq_c_to_rxq_a()
>> rearm_txq_c_to_rxq_b()
>> rearm_txq_d_to_rxq_a()
>> rearm_txq_d_to_rxq_b()
>>
> Thank you for your detailed explanation, I can understand this.
>>
>>>
>>>> So rx_direct_rearm() function selection has to be done based on both
>>>> RXQ and TXQ config.
>>>> So instead of rte_eth_tx_queue_data_get(), you'll probably need:
>>>> eth_rx_direct_rearm_t rte_eth_get_rx_direct_rearm_func(rx_port,
>>>> rx_queue, tx_port, tx_queue);
>>>> Then, it will be user responsibility to store it somewhere and call
>> periodically:
>>>>
>>>> control_path:
>>>> 	...
>>>> 	rearm_func = rte_eth_get_rx_direct_rearm_func(rxport, rxqueue,
>>>> 		 txport, txqueue);
>>>> data-path:
>>>> 	while(...) {
>>>> 		rearm_func(rxport, txport, rxqueue, txqueue);
>>>> 		rte_eth_rx_burst(rxport, rxqueue, ....);
>>>> 		rte_eth_tx_burst(txport, txqueue, ....);
>>>> 	}
>>>>
>>>>
>>>> In that case there seems absolutely no point to introduce struct
>>>> rte_eth_txq_data. rx_direct_rearm() accesses TXQ private data
>>>> directly anyway.
>>> 2. This is a very good proposal and it will be our first choice.
>>> Before working on it, I have a few questions about how to implement
>> 'rearm_func'.
>>> Like you say above, mixed Rx and Tx path code in 'rearm_func' means
>>> the hard-code is mixed like:
>>> rearm_func(...) {
>>>        ...
>>>       txep = &txq->sw_ring[txq->tx_next_dd - (txq->tx_rs_thresh - 1)];
>>>       for (...) {
>>>          rxep[i].mbuf = txep[i].mbuf;
>>>          mb0 = txep[i].mbuf;
>>>          paddr = mb0->buf_iova + RTE_PKTMBUF_HEADROOM;
>>>         dma_addr0 = vdupq_n_u64(paddr);
>>>         vst1q_u64((uint64_t *)&rxdp++->read, dma_addr0);
>>>       }
>>> }
>>> Is my understanding is right?
>>
>>
>> Sorry, I don't understand the question.
>> Can you probably elaborate a bit?
> 
> Sorry for my unclear expression.
> 
> I mean, do we need two functions which contain the Tx and Rx path code respectively in rearm_func, like:
> rearm_func(...) {
>           rte_tx_fill_sw_ring;
>           rte_rx_rearm_descs;
> }
> 
> Or just mixed Tx and Rx path code like I said before? I prefer the 'Rx and Tx hard code mixed' option,
> because, from the performance perspective, this can reduce the cost of function calls.

I suppose it depends on what we choose:
If we restrict it in a way that rxq and txq have to belong to
the same PMD, then I suppose this decision could be left to each
particular PMD.
If we'd like to allow rearm to work across different PMDs
(i.e. it would allow rearming mlx with ice and vice versa),
then yes, we need PMDs to somehow expose an rx_sw_ring abstraction
to each other.
My preference would be the second one, as it will make this feature
more flexible and would help it to be adopted more widely.
Though the first one is probably easier to implement,
and as I understand you are leaning towards the first one.
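
To make the second option more concrete, below is a minimal sketch of the kind of
sw-ring abstraction a PMD could expose. All names here (rte_eth_rxq_rearm_data,
rte_eth_rxq_rearm_data_get, rte_eth_rxq_rearm) are hypothetical and are not part
of the current ethdev API; this only shows the shape of the idea.
--------------------------------------------------------------------------------------
#include <stdint.h>
#include <rte_mbuf.h>

/* Hypothetical, PMD-agnostic view of an Rx queue's refill point, so that
 * any mbuf producer (a Tx queue of another PMD, a cache of freed mbufs)
 * can hand buffers to it without knowing the rxq internals. */
struct rte_eth_rxq_rearm_data {
	struct rte_mbuf **rx_sw_ring; /* per-descriptor mbuf slots */
	uint16_t *rearm_start;        /* index of the next slot to refill */
	uint16_t *nb_need_rearm;      /* number of slots awaiting refill */
	uint16_t nb_rx_desc;          /* ring size, for wrap-around */
};

/* Control path: the Rx PMD fills the structure once per queue. */
int rte_eth_rxq_rearm_data_get(uint16_t rx_port, uint16_t rx_queue,
			       struct rte_eth_rxq_rearm_data *data);

/* Data path: any producer pushes up to 'nb' mbufs into the Rx sw-ring;
 * the Rx PMD later rewrites its descriptors from those slots. */
uint16_t rte_eth_rxq_rearm(struct rte_eth_rxq_rearm_data *data,
			   struct rte_mbuf **mbufs, uint16_t nb);
--------------------------------------------------------------------------------------
With something like this, the producer side only touches the generic slots,
and the cross-PMD case reduces to both drivers agreeing on this one structure.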

Konstantin



>>
>>>
>>>>
>>>> Another way - make rte_eth_txq_data totally opaque and allow PMD to
>>>> store there some data that will help it to distinguish expected TXQ format.
>>>> That will allow PMD to keep rx_direct_rearm() the same for all
>>>> supported TXQ formats (it will make decision internally based on data
>>>> stored in txq_data).
>>>> Though in that case you'll probably need one more dev-op to free
>> txq_data.
>>>
> 


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Re: Re: [PATCH v2 1/3] ethdev: add API for direct rearm mode
  2022-10-13  9:48           ` Konstantin Ananyev
@ 2022-10-26  7:35             ` Feifei Wang
  2022-10-31 16:36               ` Konstantin Ananyev
  0 siblings, 1 reply; 17+ messages in thread
From: Feifei Wang @ 2022-10-26  7:35 UTC (permalink / raw)
  To: Konstantin Ananyev, thomas, Ferruh Yigit, Andrew Rybchenko, Ray Kinsella
  Cc: dev, nd, Honnappa Nagarahalli, Ruifeng Wang, nd



> -----Original Message-----
> From: Konstantin Ananyev <konstantin.v.ananyev@yandex.ru>
> Sent: Thursday, October 13, 2022 5:49 PM
> To: Feifei Wang <Feifei.Wang2@arm.com>; thomas@monjalon.net;
> Ferruh Yigit <ferruh.yigit@xilinx.com>; Andrew Rybchenko
> <andrew.rybchenko@oktetlabs.ru>; Ray Kinsella <mdr@ashroe.eu>
> Cc: dev@dpdk.org; nd <nd@arm.com>; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>; Ruifeng Wang
> <Ruifeng.Wang@arm.com>
> Subject: Re: Re: Re: [PATCH v2 1/3] ethdev: add API for direct rearm mode
> 
> 12/10/2022 14:38, Feifei Wang wrote:
> >
> >
> >> -----Original Message-----
> >> From: Konstantin Ananyev <konstantin.v.ananyev@yandex.ru>
> >> Sent: Wednesday, October 12, 2022 6:21 AM
> >> To: Feifei Wang <Feifei.Wang2@arm.com>; thomas@monjalon.net;
> Ferruh
> >> Yigit <ferruh.yigit@xilinx.com>; Andrew Rybchenko
> >> <andrew.rybchenko@oktetlabs.ru>; Ray Kinsella <mdr@ashroe.eu>
> >> Cc: dev@dpdk.org; nd <nd@arm.com>; Honnappa Nagarahalli
> >> <Honnappa.Nagarahalli@arm.com>; Ruifeng Wang
> <Ruifeng.Wang@arm.com>
> >> Subject: Re: Re: [PATCH v2 1/3] ethdev: add API for direct rearm mode
> >>
> >>
> >>
> >>>>> Add API for enabling direct rearm mode and for mapping RX and TX
> >>>>> queues. Currently, the API supports 1:1(txq : rxq) mapping.
> >>>>>
> >>>>> Furthermore, to avoid Rx load Tx data directly, add API called
> >>>>> 'rte_eth_txq_data_get' to get Tx sw_ring and its information.
> >>>>>
> >>>>> Suggested-by: Honnappa Nagarahalli
> <honnappa.nagarahalli@arm.com>
> >>>>> Suggested-by: Ruifeng Wang <ruifeng.wang@arm.com>
> >>>>> Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
> >>>>> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> >>>>> Reviewed-by: Honnappa Nagarahalli
> <honnappa.nagarahalli@arm.com>
> >>>>> ---
> >>>>>     lib/ethdev/ethdev_driver.h   |  9 ++++
> >>>>>     lib/ethdev/ethdev_private.c  |  1 +
> >>>>>     lib/ethdev/rte_ethdev.c      | 37 ++++++++++++++
> >>>>>     lib/ethdev/rte_ethdev.h      | 95
> >>>> ++++++++++++++++++++++++++++++++++++
> >>>>>     lib/ethdev/rte_ethdev_core.h |  5 ++
> >>>>>     lib/ethdev/version.map       |  4 ++
> >>>>>     6 files changed, 151 insertions(+)
> >>>>>
> >>>>> diff --git a/lib/ethdev/ethdev_driver.h
> >>>>> b/lib/ethdev/ethdev_driver.h index 47a55a419e..14f52907c1 100644
> >>>>> --- a/lib/ethdev/ethdev_driver.h
> >>>>> +++ b/lib/ethdev/ethdev_driver.h
> >>>>> @@ -58,6 +58,8 @@ struct rte_eth_dev {
> >>>>>     	eth_rx_descriptor_status_t rx_descriptor_status;
> >>>>>     	/** Check the status of a Tx descriptor */
> >>>>>     	eth_tx_descriptor_status_t tx_descriptor_status;
> >>>>> +	/**  Use Tx mbufs for Rx to rearm */
> >>>>> +	eth_rx_direct_rearm_t rx_direct_rearm;
> >>>>>
> >>>>>     	/**
> >>>>>     	 * Device data that is shared between primary and secondary
> >>>>> processes @@ -486,6 +488,11 @@ typedef int
> >>>> (*eth_rx_enable_intr_t)(struct rte_eth_dev *dev,
> >>>>>     typedef int (*eth_rx_disable_intr_t)(struct rte_eth_dev *dev,
> >>>>>     				    uint16_t rx_queue_id);
> >>>>>
> >>>>> +/**< @internal Get Tx information of a transmit queue of an
> >>>>> +Ethernet device. */ typedef void (*eth_txq_data_get_t)(struct
> >> rte_eth_dev *dev,
> >>>>> +				      uint16_t tx_queue_id,
> >>>>> +				      struct rte_eth_txq_data
> *txq_data);
> >>>>> +
> >>>>>     /** @internal Release memory resources allocated by given
> >>>>> Rx/Tx
> >> queue.
> >>>> */
> >>>>>     typedef void (*eth_queue_release_t)(struct rte_eth_dev *dev,
> >>>>>     				    uint16_t queue_id);
> >>>>> @@ -1138,6 +1145,8 @@ struct eth_dev_ops {
> >>>>>     	eth_rxq_info_get_t         rxq_info_get;
> >>>>>     	/** Retrieve Tx queue information */
> >>>>>     	eth_txq_info_get_t         txq_info_get;
> >>>>> +	/** Get the address where Tx data is stored */
> >>>>> +	eth_txq_data_get_t         txq_data_get;
> >>>>>     	eth_burst_mode_get_t       rx_burst_mode_get; /**< Get Rx
> burst
> >>>> mode */
> >>>>>     	eth_burst_mode_get_t       tx_burst_mode_get; /**< Get Tx
> burst
> >>>> mode */
> >>>>>     	eth_fw_version_get_t       fw_version_get; /**< Get
> firmware
> >>>> version */
> >>>>> diff --git a/lib/ethdev/ethdev_private.c
> >>>>> b/lib/ethdev/ethdev_private.c index 48090c879a..bfe16c7d77 100644
> >>>>> --- a/lib/ethdev/ethdev_private.c
> >>>>> +++ b/lib/ethdev/ethdev_private.c
> >>>>> @@ -276,6 +276,7 @@ eth_dev_fp_ops_setup(struct
> rte_eth_fp_ops
> >>>> *fpo,
> >>>>>     	fpo->rx_queue_count = dev->rx_queue_count;
> >>>>>     	fpo->rx_descriptor_status = dev->rx_descriptor_status;
> >>>>>     	fpo->tx_descriptor_status = dev->tx_descriptor_status;
> >>>>> +	fpo->rx_direct_rearm = dev->rx_direct_rearm;
> >>>>>
> >>>>>     	fpo->rxq.data = dev->data->rx_queues;
> >>>>>     	fpo->rxq.clbk = (void **)(uintptr_t)dev->post_rx_burst_cbs;
> >>>>> diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
> >>>>> index 0c2c1088c0..0dccec2e4b 100644
> >>>>> --- a/lib/ethdev/rte_ethdev.c
> >>>>> +++ b/lib/ethdev/rte_ethdev.c
> >>>>> @@ -1648,6 +1648,43 @@ rte_eth_dev_is_removed(uint16_t port_id)
> >>>>>     	return ret;
> >>>>>     }
> >>>>>
> >>>>> +int
> >>>>> +rte_eth_tx_queue_data_get(uint16_t port_id, uint16_t queue_id,
> >>>>> +			struct rte_eth_txq_data *txq_data) {
> >>>>> +	struct rte_eth_dev *dev;
> >>>>> +
> >>>>> +	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
> >>>>> +	dev = &rte_eth_devices[port_id];
> >>>>> +
> >>>>> +	if (queue_id >= dev->data->nb_tx_queues) {
> >>>>> +		RTE_ETHDEV_LOG(ERR, "Invalid Tx queue_id=%u\n",
> >>>> queue_id);
> >>>>> +		return -EINVAL;
> >>>>> +	}
> >>>>> +
> >>>>> +	if (txq_data == NULL) {
> >>>>> +		RTE_ETHDEV_LOG(ERR, "Cannot get ethdev port %u
> Tx
> >>>> queue %u data to NULL\n",
> >>>>> +			port_id, queue_id);
> >>>>> +		return -EINVAL;
> >>>>> +	}
> >>>>> +
> >>>>> +	if (dev->data->tx_queues == NULL ||
> >>>>> +			dev->data->tx_queues[queue_id] == NULL) {
> >>>>> +		RTE_ETHDEV_LOG(ERR,
> >>>>> +			   "Tx queue %"PRIu16" of device with
> port_id=%"
> >>>>> +			   PRIu16" has not been setup\n",
> >>>>> +			   queue_id, port_id);
> >>>>> +		return -EINVAL;
> >>>>> +	}
> >>>>> +
> >>>>> +	if (*dev->dev_ops->txq_data_get == NULL)
> >>>>> +		return -ENOTSUP;
> >>>>> +
> >>>>> +	dev->dev_ops->txq_data_get(dev, queue_id, txq_data);
> >>>>> +
> >>>>> +	return 0;
> >>>>> +}
> >>>>> +
> >>>>>     static int
> >>>>>     rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split
> >> *rx_seg,
> >>>>>     			     uint16_t n_seg, uint32_t *mbp_buf_size,
> diff --git
> >>>>> a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h index
> >>>>> 2e783536c1..daf7f05d62 100644
> >>>>> --- a/lib/ethdev/rte_ethdev.h
> >>>>> +++ b/lib/ethdev/rte_ethdev.h
> >>>>> @@ -1949,6 +1949,23 @@ struct rte_eth_txq_info {
> >>>>>     	uint8_t queue_state;        /**< one of
> RTE_ETH_QUEUE_STATE_*. */
> >>>>>     } __rte_cache_min_aligned;
> >>>>>
> >>>>> +/**
> >>>>> + * @internal
> >>>>> + * Structure used to hold pointers to internal ethdev Tx data.
> >>>>> + * The main purpose is to load and store Tx queue data in direct
> >>>>> +rearm
> >>>> mode.
> >>>>> + */
> >>>>> +struct rte_eth_txq_data {
> >>>>> +	uint64_t *offloads;
> >>>>> +	void *tx_sw_ring;
> >>>>> +	volatile void *tx_ring;
> >>>>> +	uint16_t *tx_next_dd;
> >>>>> +	uint16_t *nb_tx_free;
> >>>>> +	uint16_t nb_tx_desc;
> >>>>> +	uint16_t tx_rs_thresh;
> >>>>> +	uint16_t tx_free_thresh;
> >>>>> +} __rte_cache_min_aligned;
> >>>>> +
> >>>>
> >>>> first of all it is not clear why this struct has to be in public
> >>>> header, why it can't be in one of the ethdev 'private' headers.
> >>>> Second it looks like a snippet from private txq fields for some
> >>>> Intel (and alike) PMDs (i40e, ice, etc.).
> >>>> How is it supposed to be universal and applicable for any PMD
> >>>> that decides to implement this new API?
> >>>>
> >>>>
> >>>>>     /* Generic Burst mode flag definition, values can be ORed. */
> >>>>>
> >>>>>     /**
> >>>>> @@ -4718,6 +4735,27 @@ int rte_eth_remove_rx_callback(uint16_t
> >>>> port_id, uint16_t queue_id,
> >>>>>     int rte_eth_remove_tx_callback(uint16_t port_id, uint16_t
> queue_id,
> >>>>>     		const struct rte_eth_rxtx_callback *user_cb);
> >>>>>
> >>>>> +/**
> >>>>> + * Get the address which Tx data is stored.
> >>>>> + *
> >>>>> + * @param port_id
> >>>>> + *   The port identifier of the Ethernet device.
> >>>>> + * @param queue_id
> >>>>> + *   The Tx queue on the Ethernet device for which information
> >>>>> + *   will be retrieved.
> >>>>> + * @param txq_data
> >>>>> + *   A pointer to a structure of type *rte_eth_txq_data* to be filled.
> >>>>> + *
> >>>>> + * @return
> >>>>> + *   - 0: Success
> >>>>> + *   - -ENODEV:  If *port_id* is invalid.
> >>>>> + *   - -ENOTSUP: routine is not supported by the device PMD.
> >>>>> + *   - -EINVAL:  The queue_id is out of range.
> >>>>> + */
> >>>>> +__rte_experimental
> >>>>> +int rte_eth_tx_queue_data_get(uint16_t port_id, uint16_t
> queue_id,
> >>>>> +		struct rte_eth_txq_data *txq_data);
> >>>>> +
> >>>>>     /**
> >>>>>      * Retrieve information about given port's Rx queue.
> >>>>>      *
> >>>>> @@ -6209,6 +6247,63 @@ rte_eth_tx_buffer(uint16_t port_id,
> >>>>> uint16_t
> >>>> queue_id,
> >>>>>     	return rte_eth_tx_buffer_flush(port_id, queue_id, buffer);
> >>>>>     }
> >>>>>
> >>>>> +/**
> >>>>> + * @warning
> >>>>> + * @b EXPERIMENTAL: this API may change, or be removed, without
> >>>>> +prior notice
> >>>>> + *
> >>>>> + * Put Tx buffers into Rx sw-ring and rearm descs.
> >>>>> + *
> >>>>> + * @param port_id
> >>>>> + *   Port identifying the receive side.
> >>>>> + * @param queue_id
> >>>>> + *   The index of the transmit queue identifying the receive side.
> >>>>> + *   The value must be in the range [0, nb_rx_queue - 1] previously
> >>>> supplied
> >>>>> + *   to rte_eth_dev_configure().
> >>>>> + * @param txq_data
> >>>>> + *   A pointer to a structure of type *rte_eth_txq_data* to be filled.
> >>>>> + * @return
> >>>>> + *   The number of direct-rearmed buffers.
> >>>>> + */
> >>>>> +__rte_experimental
> >>>>> +static __rte_always_inline uint16_t
> >>>>> +rte_eth_rx_direct_rearm(uint16_t port_id, uint16_t queue_id,
> >>>>> +		struct rte_eth_txq_data *txq_data) {
> >>>>> +	uint16_t nb_rearm;
> >>>>> +	struct rte_eth_fp_ops *p;
> >>>>> +	void *qd;
> >>>>> +
> >>>>> +#ifdef RTE_ETHDEV_DEBUG_RX
> >>>>> +	if (port_id >= RTE_MAX_ETHPORTS ||
> >>>>> +			queue_id >= RTE_MAX_QUEUES_PER_PORT)
> {
> >>>>> +		RTE_ETHDEV_LOG(ERR,
> >>>>> +			"Invalid port_id=%u or queue_id=%u\n",
> >>>>> +			port_id, queue_id);
> >>>>> +		return 0;
> >>>>> +	}
> >>>>> +#endif
> >>>>> +
> >>>>> +	p = &rte_eth_fp_ops[port_id];
> >>>>> +	qd = p->rxq.data[queue_id];
> >>>>> +
> >>>>> +#ifdef RTE_ETHDEV_DEBUG_RX
> >>>>> +	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, 0);
> >>>>> +
> >>>>> +	if (qd == NULL) {
> >>>>> +		RTE_ETHDEV_LOG(ERR, "Invalid Rx queue_id=%u for
> >>>> port_id=%u\n",
> >>>>> +			queue_id, port_id);
> >>>>> +		return 0;
> >>>>> +	}
> >>>>> +
> >>>>> +	if (!p->rx_direct_rearm)
> >>>>
> >>>> This check should be done always (unconditionally).
> >>>> it is not a mandatory function for the driver (it can safely skip
> >>>> to
> >> implement it).
> >>>>
> >>>>> +		return -ENOTSUP;
> >>>>
> >>>> This function returns uint16_t, why signed integers here?
> >>>>
> >>>>
> >>>>> +#endif
> >>>>> +
> >>>>> +	nb_rearm = p->rx_direct_rearm(qd, txq_data);
> >>>>
> >>>> So rx_direct_rearm() function knows how to extract data from TX
> queue?
> >>>> As I understand that is possible only in one case:
> >>>> rx_direct_rearm() has full knowledge and access of txq internals, etc.
> >>>> That means that rxq and txq have to belong to the same driver and
> >>>> device type.
> >>>>
> >>> Thanks for the comments, and I have some questions for this.
> >>>
> >>>> First of all, I still think it is not the best design choice.
> >>>> If we are going ahead with introducing this feature, it had better be as
> >>>> generic as possible.
> >>>> Plus it mixes TX and RX code-paths together, while it would be much
> >>>> better to keep them independent as they are right now.
> >>>>
> >>>> Another thing with such approach - even for the same PMD, both TXQ
> >>>> and RXQ can have different internal data format and behavior logic
> >>>> (depending on port/queue configuration).
> >>> 1. Here TXQ and RXQ have different internal format means the queue
> >>> type and  descs can be different, right? If I understand correctly,
> >>> based on your first strategy, is it means we will need different
> >>> 'rearm_func' for different queue type in the same PMD?
> >>
> >> Yes, I think so.
> >> If let say we have some PMD where depending on the config, there
> >> cpuld be
> >> 2 different RXQ formats: rxq_a and rxq_b, and 2 different txq
> >> formats: txq_c, txq_d.
> >> Then assuming PMD would like to support direct-rearm mode for all
> >> four combinations, it needs 4 different rearm functions:
> >>
> >> rearm_txq_c_to_rxq_a()
> >> rearm_txq_c_to_rxq_b()
> >> rearm_txq_d_to_rxq_a()
> >> rearm_txq_d_to_rxq_b()
> >>
> > Thank you for your detailed explanation, I can understand this.

> >>
> >>>
> >>>> So rx_direct_rearm() function selection has to be done based on
> >>>> both RXQ and TXQ config.
> >>>> So instead of rte_eth_tx_queue_data_get(), you'll probably need:
> >>>> eth_rx_direct_rearm_t rte_eth_get_rx_direct_rearm_func(rx_port,
> >>>> rx_queue, tx_port, tx_queue);
> >>>> Then, it will be user responsibility to store it somewhere and call
> >> periodically:
> >>>>
> >>>> control_path:
> >>>> 	...
> >>>> 	rearm_func = rte_eth_get_rx_direct_rearm_func(rxport, rxqueue,
> >>>> 		 txport, txqueue);
> >>>> data-path:
> >>>> 	while(...) {
> >>>> 		rearm_func(rxport, txport, rxqueue, txqueue);
> >>>> 		rte_eth_rx_burst(rxport, rxqueue, ....);
> >>>> 		rte_eth_tx_burst(txport, txqueue, ....);
> >>>> 	}
> >>>>
> >>>>
> >>>> In that case there seems absolutely no point to introduce struct
> >>>> rte_eth_txq_data. rx_direct_rearm() accesses TXQ private data
> >>>> directly anyway.
> >>> 2. This is a very good proposal and it will be our first choice.
> >>> Before working on it, I have a few questions about how to implement
> >> 'rearm_func'.
> >>> Like you say above, mixed Rx and Tx path code in 'rearm_func' means
> >>> the hard-code is mixed like:
> >>> rearm_func(...) {
> >>>        ...
> >>>       txep = &txq->sw_ring[txq->tx_next_dd - (txq->tx_rs_thresh - 1)];
> >>>       for (...) {
> >>>          rxep[i].mbuf = txep[i].mbuf;
> >>>          mb0 = txep[i].mbuf;
> >>>          paddr = mb0->buf_iova + RTE_PKTMBUF_HEADROOM;
> >>>         dma_addr0 = vdupq_n_u64(paddr);
> >>>         vst1q_u64((uint64_t *)&rxdp++->read, dma_addr0);
> >>>       }
> >>> }
> >>> Is my understanding is right?
> >>
> >>
> >> Sorry, I don't understand the question.
> >> Can you probably elaborate a bit?
> >
> > Sorry for my unclear expression.
> >
> > I mean if we need two func which contains tx and rx paths code
> respectively in rearm_func, like:
> > rearm_func(...) {
> >           rte_tx_fill_sw_ring;
> >           rte_rx_rearm_descs;
> > }
> >
> > Or just mixed tx and rx path code like I said before. I prefer 'rx and
> > tx hard code mixed', because from the performance perspective, this can
> reduce the cost of function calls.
> 
> I suppose it depends on what we choose:
> If we restrict it in a way that rxq and txq have to belong to the same PMD,
> then I suppose this decision could be left to each particular PMD.
> If we'd like to allow rearm to work across different PMDs (i.e. it would allow
> to rearm mlx with ice and vice versa), then yes we need PMDs somehow to
> expose rx_sw_ring abstraction to each other.
> My preference would be the second one - as it will make this feature more
> flexible and would help to adopt it more widely.
> Though the first one is probably easier to implement, and as I understand
> you are leaning towards the first one.
> 
> Konstantin

Hi, Konstantin

1. After further consideration, I think we should give up the 'rte_eth_get_rx_direct_rearm' function
and just keep one API, 'rte_eth_direct_rearm', which contains the paths for the different queue types in the PMD driver.

This is because the application can dynamically configure the queue mapping in the data path,
which expands the direct-rearm usage scenarios. For example, consider a flow
which is received by one Rx queue (rxq_1) and sent by two Tx queues of different types (type_a txq_1 and type_b txq_2).
Users can dynamically map direct_rearm according to the Tx path:
--------------------------------------------------------------------------------------
rte_eth_rx_burst(rxq_1);

%routing table lookup to decide tx queue%
txq_n = txq_1 or txq_2?

rte_eth_tx_burst(txq_n);
rte_eth_direct_rearm(txq_n);
--------------------------------------------------------------------------------------
Thus, this avoids repeatedly calling 'rte_eth_get_rx_direct_rearm' in the data path, which would hurt performance.

Furthermore, rte_eth_direct_rearm can be implemented as:
--------------------------------------------------------------------------------------
rte_eth_direct_rearm = i40e_eth_direct_rearm;
i40e_eth_direct_rearm {
	rearm_queue_config_a path;
	rearm_queue_config_b path;
	rearm_queue_config_c path;
	rearm_queue_config_d path;
}
--------------------------------------------------------------------------------------
This implementation follows the pattern of 'rte_eth_rx_burst' and 'rte_eth_tx_burst'. If there are different
queue types, the path selection is implemented in the PMD layer.
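
As a rough, self-contained sketch of what such a per-PMD path selection plus a
scalar rearm loop could look like (the 'demo_*' structures and helpers below are
illustrative stand-ins, not the real i40e code, and ring wrap-around and
threshold checks are omitted):
--------------------------------------------------------------------------------------
#include <stdint.h>
#include <rte_mbuf.h>

/* Illustrative sw-ring entries and queue structs, standing in for the
 * PMD's private rxq/txq definitions. */
struct demo_entry { struct rte_mbuf *mbuf; };

struct demo_txq {
	struct demo_entry *sw_ring;
	uint16_t tx_next_dd;
	uint16_t tx_rs_thresh;
};

struct demo_rxq {
	struct demo_entry *sw_ring;
	volatile uint64_t *descs;   /* stand-in for the HW descriptor ring */
	uint16_t rxrearm_start;
	int vector_path;            /* set once from the queue configuration */
};

/* Scalar path: move mbufs straight from the Tx sw-ring into the Rx
 * sw-ring and rewrite the matching Rx descriptors with the buffer IOVA. */
static uint16_t
demo_rearm_scalar(struct demo_rxq *rxq, struct demo_txq *txq)
{
	struct demo_entry *txep =
		&txq->sw_ring[txq->tx_next_dd - (txq->tx_rs_thresh - 1)];
	struct demo_entry *rxep = &rxq->sw_ring[rxq->rxrearm_start];
	uint16_t i;

	for (i = 0; i < txq->tx_rs_thresh; i++) {
		rxep[i].mbuf = txep[i].mbuf;
		rxq->descs[rxq->rxrearm_start + i] =
			rte_mbuf_data_iova_default(txep[i].mbuf);
	}
	return i;
}

/* PMD-level entry point: a real driver would branch here between its
 * scalar/vector Rx and Tx layouts (the four combinations discussed
 * earlier); this sketch keeps only the scalar path. */
static uint16_t
demo_eth_direct_rearm(struct demo_rxq *rxq, struct demo_txq *txq)
{
	if (rxq->vector_path) {
		/* a vector (e.g. NEON/SSE) variant would be selected here */
	}
	return demo_rearm_scalar(rxq, txq);
}
--------------------------------------------------------------------------------------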

2. I'm not sure whether cross-PMD usage is realistic.
Considering the trade-off between performance and mapping across different PMDs, maybe we can provide two
different direct-rearm modes for users: one mixes the Rx and Tx path code and is used within the same PMD;
the other keeps the Rx and Tx paths separate and can be used across different PMDs.

Best Regards
Feifei
> 
> 
> >>
> >>>
> >>>>
> >>>> Another way - make rte_eth_txq_data totally opaque and allow PMD
> to
> >>>> store there some data that will help it to distinguish expected TXQ
> format.
> >>>> That will allow PMD to keep rx_direct_rearm() the same for all
> >>>> supported TXQ formats (it will make decision internally based on
> >>>> data stored in txq_data).
> >>>> Though in that case you'll probably need one more dev-op to free
> >> txq_data.
> >>>
> >


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Re: Re: Re: [PATCH v2 1/3] ethdev: add API for direct rearm mode
  2022-10-26  7:35             ` Re: " Feifei Wang
@ 2022-10-31 16:36               ` Konstantin Ananyev
  2023-01-04  7:39                 ` Re: " Feifei Wang
  0 siblings, 1 reply; 17+ messages in thread
From: Konstantin Ananyev @ 2022-10-31 16:36 UTC (permalink / raw)
  To: Feifei Wang, thomas, Ferruh Yigit, Andrew Rybchenko, Ray Kinsella
  Cc: dev, nd, Honnappa Nagarahalli, Ruifeng Wang



Hi Feifei,


>>>>>>> Add API for enabling direct rearm mode and for mapping RX and TX
>>>>>>> queues. Currently, the API supports 1:1(txq : rxq) mapping.
>>>>>>>
>>>>>>> Furthermore, to avoid Rx load Tx data directly, add API called
>>>>>>> 'rte_eth_txq_data_get' to get Tx sw_ring and its information.
>>>>>>>
>>>>>>> Suggested-by: Honnappa Nagarahalli
>> <honnappa.nagarahalli@arm.com>
>>>>>>> Suggested-by: Ruifeng Wang <ruifeng.wang@arm.com>
>>>>>>> Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
>>>>>>> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
>>>>>>> Reviewed-by: Honnappa Nagarahalli
>> <honnappa.nagarahalli@arm.com>
>>>>>>> ---
>>>>>>>      lib/ethdev/ethdev_driver.h   |  9 ++++
>>>>>>>      lib/ethdev/ethdev_private.c  |  1 +
>>>>>>>      lib/ethdev/rte_ethdev.c      | 37 ++++++++++++++
>>>>>>>      lib/ethdev/rte_ethdev.h      | 95
>>>>>> ++++++++++++++++++++++++++++++++++++
>>>>>>>      lib/ethdev/rte_ethdev_core.h |  5 ++
>>>>>>>      lib/ethdev/version.map       |  4 ++
>>>>>>>      6 files changed, 151 insertions(+)
>>>>>>>
>>>>>>> diff --git a/lib/ethdev/ethdev_driver.h
>>>>>>> b/lib/ethdev/ethdev_driver.h index 47a55a419e..14f52907c1 100644
>>>>>>> --- a/lib/ethdev/ethdev_driver.h
>>>>>>> +++ b/lib/ethdev/ethdev_driver.h
>>>>>>> @@ -58,6 +58,8 @@ struct rte_eth_dev {
>>>>>>>      	eth_rx_descriptor_status_t rx_descriptor_status;
>>>>>>>      	/** Check the status of a Tx descriptor */
>>>>>>>      	eth_tx_descriptor_status_t tx_descriptor_status;
>>>>>>> +	/**  Use Tx mbufs for Rx to rearm */
>>>>>>> +	eth_rx_direct_rearm_t rx_direct_rearm;
>>>>>>>
>>>>>>>      	/**
>>>>>>>      	 * Device data that is shared between primary and secondary
>>>>>>> processes @@ -486,6 +488,11 @@ typedef int
>>>>>> (*eth_rx_enable_intr_t)(struct rte_eth_dev *dev,
>>>>>>>      typedef int (*eth_rx_disable_intr_t)(struct rte_eth_dev *dev,
>>>>>>>      				    uint16_t rx_queue_id);
>>>>>>>
>>>>>>> +/**< @internal Get Tx information of a transmit queue of an
>>>>>>> +Ethernet device. */ typedef void (*eth_txq_data_get_t)(struct
>>>> rte_eth_dev *dev,
>>>>>>> +				      uint16_t tx_queue_id,
>>>>>>> +				      struct rte_eth_txq_data
>> *txq_data);
>>>>>>> +
>>>>>>>      /** @internal Release memory resources allocated by given
>>>>>>> Rx/Tx
>>>> queue.
>>>>>> */
>>>>>>>      typedef void (*eth_queue_release_t)(struct rte_eth_dev *dev,
>>>>>>>      				    uint16_t queue_id);
>>>>>>> @@ -1138,6 +1145,8 @@ struct eth_dev_ops {
>>>>>>>      	eth_rxq_info_get_t         rxq_info_get;
>>>>>>>      	/** Retrieve Tx queue information */
>>>>>>>      	eth_txq_info_get_t         txq_info_get;
>>>>>>> +	/** Get the address where Tx data is stored */
>>>>>>> +	eth_txq_data_get_t         txq_data_get;
>>>>>>>      	eth_burst_mode_get_t       rx_burst_mode_get; /**< Get Rx
>> burst
>>>>>> mode */
>>>>>>>      	eth_burst_mode_get_t       tx_burst_mode_get; /**< Get Tx
>> burst
>>>>>> mode */
>>>>>>>      	eth_fw_version_get_t       fw_version_get; /**< Get
>> firmware
>>>>>> version */
>>>>>>> diff --git a/lib/ethdev/ethdev_private.c
>>>>>>> b/lib/ethdev/ethdev_private.c index 48090c879a..bfe16c7d77 100644
>>>>>>> --- a/lib/ethdev/ethdev_private.c
>>>>>>> +++ b/lib/ethdev/ethdev_private.c
>>>>>>> @@ -276,6 +276,7 @@ eth_dev_fp_ops_setup(struct
>> rte_eth_fp_ops
>>>>>> *fpo,
>>>>>>>      	fpo->rx_queue_count = dev->rx_queue_count;
>>>>>>>      	fpo->rx_descriptor_status = dev->rx_descriptor_status;
>>>>>>>      	fpo->tx_descriptor_status = dev->tx_descriptor_status;
>>>>>>> +	fpo->rx_direct_rearm = dev->rx_direct_rearm;
>>>>>>>
>>>>>>>      	fpo->rxq.data = dev->data->rx_queues;
>>>>>>>      	fpo->rxq.clbk = (void **)(uintptr_t)dev->post_rx_burst_cbs;
>>>>>>> diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
>>>>>>> index 0c2c1088c0..0dccec2e4b 100644
>>>>>>> --- a/lib/ethdev/rte_ethdev.c
>>>>>>> +++ b/lib/ethdev/rte_ethdev.c
>>>>>>> @@ -1648,6 +1648,43 @@ rte_eth_dev_is_removed(uint16_t port_id)
>>>>>>>      	return ret;
>>>>>>>      }
>>>>>>>
>>>>>>> +int
>>>>>>> +rte_eth_tx_queue_data_get(uint16_t port_id, uint16_t queue_id,
>>>>>>> +			struct rte_eth_txq_data *txq_data) {
>>>>>>> +	struct rte_eth_dev *dev;
>>>>>>> +
>>>>>>> +	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
>>>>>>> +	dev = &rte_eth_devices[port_id];
>>>>>>> +
>>>>>>> +	if (queue_id >= dev->data->nb_tx_queues) {
>>>>>>> +		RTE_ETHDEV_LOG(ERR, "Invalid Tx queue_id=%u\n",
>>>>>> queue_id);
>>>>>>> +		return -EINVAL;
>>>>>>> +	}
>>>>>>> +
>>>>>>> +	if (txq_data == NULL) {
>>>>>>> +		RTE_ETHDEV_LOG(ERR, "Cannot get ethdev port %u
>> Tx
>>>>>> queue %u data to NULL\n",
>>>>>>> +			port_id, queue_id);
>>>>>>> +		return -EINVAL;
>>>>>>> +	}
>>>>>>> +
>>>>>>> +	if (dev->data->tx_queues == NULL ||
>>>>>>> +			dev->data->tx_queues[queue_id] == NULL) {
>>>>>>> +		RTE_ETHDEV_LOG(ERR,
>>>>>>> +			   "Tx queue %"PRIu16" of device with
>> port_id=%"
>>>>>>> +			   PRIu16" has not been setup\n",
>>>>>>> +			   queue_id, port_id);
>>>>>>> +		return -EINVAL;
>>>>>>> +	}
>>>>>>> +
>>>>>>> +	if (*dev->dev_ops->txq_data_get == NULL)
>>>>>>> +		return -ENOTSUP;
>>>>>>> +
>>>>>>> +	dev->dev_ops->txq_data_get(dev, queue_id, txq_data);
>>>>>>> +
>>>>>>> +	return 0;
>>>>>>> +}
>>>>>>> +
>>>>>>>      static int
>>>>>>>      rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split
>>>> *rx_seg,
>>>>>>>      			     uint16_t n_seg, uint32_t *mbp_buf_size,
>> diff --git
>>>>>>> a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h index
>>>>>>> 2e783536c1..daf7f05d62 100644
>>>>>>> --- a/lib/ethdev/rte_ethdev.h
>>>>>>> +++ b/lib/ethdev/rte_ethdev.h
>>>>>>> @@ -1949,6 +1949,23 @@ struct rte_eth_txq_info {
>>>>>>>      	uint8_t queue_state;        /**< one of
>> RTE_ETH_QUEUE_STATE_*. */
>>>>>>>      } __rte_cache_min_aligned;
>>>>>>>
>>>>>>> +/**
>>>>>>> + * @internal
>>>>>>> + * Structure used to hold pointers to internal ethdev Tx data.
>>>>>>> + * The main purpose is to load and store Tx queue data in direct
>>>>>>> +rearm
>>>>>> mode.
>>>>>>> + */
>>>>>>> +struct rte_eth_txq_data {
>>>>>>> +	uint64_t *offloads;
>>>>>>> +	void *tx_sw_ring;
>>>>>>> +	volatile void *tx_ring;
>>>>>>> +	uint16_t *tx_next_dd;
>>>>>>> +	uint16_t *nb_tx_free;
>>>>>>> +	uint16_t nb_tx_desc;
>>>>>>> +	uint16_t tx_rs_thresh;
>>>>>>> +	uint16_t tx_free_thresh;
>>>>>>> +} __rte_cache_min_aligned;
>>>>>>> +
>>>>>>
>>>>>> first of all it is not clear why this struct has to be in public
>>>>>> header, why it can't be in one of the ethdev 'private' headers.
>>>>>> Second it looks like a snippet from private txq fields for some
>>>>>> Intel (and alike) PMDs (i40e, ice, etc.).
>>>>>> How is it supposed to be universal and applicable for any PMD
>>>>>> that decides to implement this new API?
>>>>>>
>>>>>>
>>>>>>>      /* Generic Burst mode flag definition, values can be ORed. */
>>>>>>>
>>>>>>>      /**
>>>>>>> @@ -4718,6 +4735,27 @@ int rte_eth_remove_rx_callback(uint16_t
>>>>>> port_id, uint16_t queue_id,
>>>>>>>      int rte_eth_remove_tx_callback(uint16_t port_id, uint16_t
>> queue_id,
>>>>>>>      		const struct rte_eth_rxtx_callback *user_cb);
>>>>>>>
>>>>>>> +/**
>>>>>>> + * Get the address which Tx data is stored.
>>>>>>> + *
>>>>>>> + * @param port_id
>>>>>>> + *   The port identifier of the Ethernet device.
>>>>>>> + * @param queue_id
>>>>>>> + *   The Tx queue on the Ethernet device for which information
>>>>>>> + *   will be retrieved.
>>>>>>> + * @param txq_data
>>>>>>> + *   A pointer to a structure of type *rte_eth_txq_data* to be filled.
>>>>>>> + *
>>>>>>> + * @return
>>>>>>> + *   - 0: Success
>>>>>>> + *   - -ENODEV:  If *port_id* is invalid.
>>>>>>> + *   - -ENOTSUP: routine is not supported by the device PMD.
>>>>>>> + *   - -EINVAL:  The queue_id is out of range.
>>>>>>> + */
>>>>>>> +__rte_experimental
>>>>>>> +int rte_eth_tx_queue_data_get(uint16_t port_id, uint16_t
>> queue_id,
>>>>>>> +		struct rte_eth_txq_data *txq_data);
>>>>>>> +
>>>>>>>      /**
>>>>>>>       * Retrieve information about given port's Rx queue.
>>>>>>>       *
>>>>>>> @@ -6209,6 +6247,63 @@ rte_eth_tx_buffer(uint16_t port_id,
>>>>>>> uint16_t
>>>>>> queue_id,
>>>>>>>      	return rte_eth_tx_buffer_flush(port_id, queue_id, buffer);
>>>>>>>      }
>>>>>>>
>>>>>>> +/**
>>>>>>> + * @warning
>>>>>>> + * @b EXPERIMENTAL: this API may change, or be removed, without
>>>>>>> +prior notice
>>>>>>> + *
>>>>>>> + * Put Tx buffers into Rx sw-ring and rearm descs.
>>>>>>> + *
>>>>>>> + * @param port_id
>>>>>>> + *   Port identifying the receive side.
>>>>>>> + * @param queue_id
>>>>>>> + *   The index of the transmit queue identifying the receive side.
>>>>>>> + *   The value must be in the range [0, nb_rx_queue - 1] previously
>>>>>> supplied
>>>>>>> + *   to rte_eth_dev_configure().
>>>>>>> + * @param txq_data
>>>>>>> + *   A pointer to a structure of type *rte_eth_txq_data* to be filled.
>>>>>>> + * @return
>>>>>>> + *   The number of direct-rearmed buffers.
>>>>>>> + */
>>>>>>> +__rte_experimental
>>>>>>> +static __rte_always_inline uint16_t
>>>>>>> +rte_eth_rx_direct_rearm(uint16_t port_id, uint16_t queue_id,
>>>>>>> +		struct rte_eth_txq_data *txq_data) {
>>>>>>> +	uint16_t nb_rearm;
>>>>>>> +	struct rte_eth_fp_ops *p;
>>>>>>> +	void *qd;
>>>>>>> +
>>>>>>> +#ifdef RTE_ETHDEV_DEBUG_RX
>>>>>>> +	if (port_id >= RTE_MAX_ETHPORTS ||
>>>>>>> +			queue_id >= RTE_MAX_QUEUES_PER_PORT)
>> {
>>>>>>> +		RTE_ETHDEV_LOG(ERR,
>>>>>>> +			"Invalid port_id=%u or queue_id=%u\n",
>>>>>>> +			port_id, queue_id);
>>>>>>> +		return 0;
>>>>>>> +	}
>>>>>>> +#endif
>>>>>>> +
>>>>>>> +	p = &rte_eth_fp_ops[port_id];
>>>>>>> +	qd = p->rxq.data[queue_id];
>>>>>>> +
>>>>>>> +#ifdef RTE_ETHDEV_DEBUG_RX
>>>>>>> +	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, 0);
>>>>>>> +
>>>>>>> +	if (qd == NULL) {
>>>>>>> +		RTE_ETHDEV_LOG(ERR, "Invalid Rx queue_id=%u for
>>>>>> port_id=%u\n",
>>>>>>> +			queue_id, port_id);
>>>>>>> +		return 0;
>>>>>>> +	}
>>>>>>> +
>>>>>>> +	if (!p->rx_direct_rearm)
>>>>>>
>>>>>> This check should be done always (unconditionally).
>>>>>> it is not a mandatory function for the driver (it can safely skip
>>>>>> to
>>>> implement it).
>>>>>>
>>>>>>> +		return -ENOTSUP;
>>>>>>
>>>>>> This function returns uint16_t, why signed integers here?
>>>>>>
>>>>>>
>>>>>>> +#endif
>>>>>>> +
>>>>>>> +	nb_rearm = p->rx_direct_rearm(qd, txq_data);
>>>>>>
>>>>>> So rx_direct_rearm() function knows how to extract data from TX
>> queue?
>>>>>> As I understand that is possible only in one case:
>>>>>> rx_direct_rearm() has full knowledge and access of txq internals, etc.
>>>>>> That means that rxq and txq have to belong to the same driver and
>>>>>> device type.
>>>>>>
>>>>> Thanks for the comments, and I have some questions for this.
>>>>>
>>>>>> First of all, I still think it is not the best design choice.
>>>>>> If we are going ahead with introducing this feature, it had better be as
>>>>>> generic as possible.
>>>>>> Plus it mixes TX and RX code-paths together, while it would be much
>>>>>> better to keep them independent as they are right now.
>>>>>>
>>>>>> Another thing with such approach - even for the same PMD, both TXQ
>>>>>> and RXQ can have different internal data format and behavior logic
>>>>>> (depending on port/queue configuration).
>>>>> 1. Here TXQ and RXQ have different internal format means the queue
>>>>> type and  descs can be different, right? If I understand correctly,
>>>>> based on your first strategy, is it means we will need different
>>>>> 'rearm_func' for different queue type in the same PMD?
>>>>
>>>> Yes, I think so.
>>>> If, let's say, we have some PMD where, depending on the config, there
>>>> could be
>>>> 2 different RXQ formats: rxq_a and rxq_b, and 2 different txq
>>>> formats: txq_c, txq_d.
>>>> Then assuming PMD would like to support direct-rearm mode for all
>>>> four combinations, it needs 4 different rearm functions:
>>>>
>>>> rearm_txq_c_to_rxq_a()
>>>> rearm_txq_c_to_rxq_b()
>>>> rearm_txq_d_to_rxq_a()
>>>> rearm_txq_d_to_rxq_b()
>>>>
>>> Thank you for your detailed explanation, I can understand this.
> 
>>>>
>>>>>
>>>>>> So rx_direct_rearm() function selection has to be done based on
>>>>>> both RXQ and TXQ config.
>>>>>> So instead of rte_eth_tx_queue_data_get(), you'll probably need:
>>>>>> eth_rx_direct_rearm_t rte_eth_get_rx_direct_rearm_func(rx_port,
>>>>>> rx_queue, tx_port, tx_queue);
>>>>>> Then, it will be user responsibility to store it somewhere and call
>>>> periodically:
>>>>>>
>>>>>> control_path:
>>>>>> 	...
>>>>>> 	rearm_func = rte_eth_get_rx_direct_rearm_func(rxport, rxqueue,
>>>>>> 		 txport, txqueue);
>>>>>> data-path:
>>>>>> 	while(...) {
>>>>>> 		rearm_func(rxport, txport, rxqueue, txqueue);
>>>>>> 		rte_eth_rx_burst(rxport, rxqueue, ....);
>>>>>> 		rte_eth_tx_burst(txport, txqueue, ....);
>>>>>> 	}
>>>>>>
>>>>>>
>>>>>> In that case there seems absolutely no point to introduce struct
>>>>>> rte_eth_txq_data. rx_direct_rearm() accesses TXQ private data
>>>>>> directly anyway.
>>>>> 2. This is a very good proposal and it will be our first choice.
>>>>> Before working on it, I have a few questions about how to implement
>>>> 'rearm_func'.
>>>>> Like you say above, mixed Rx and Tx path code in 'rearm_func' means
>>>>> the hard-code is mixed like:
>>>>> rearm_func(...) {
>>>>>         ...
>>>>>        txep = &txq->sw_ring[txq->tx_next_dd - (txq->tx_rs_thresh - 1)];
>>>>>        for (...) {
>>>>>           rxep[i].mbuf = txep[i].mbuf;
>>>>>           mb0 = txep[i].mbuf;
>>>>>           paddr = mb0->buf_iova + RTE_PKTMBUF_HEADROOM;
>>>>>          dma_addr0 = vdupq_n_u64(paddr);
>>>>>          vst1q_u64((uint64_t *)&rxdp++->read, dma_addr0);
>>>>>        }
>>>>> }
>>>>> Is my understanding is right?
>>>>
>>>>
>>>> Sorry, I don't understand the question.
>>>> Can you probably elaborate a bit?
>>>
>>> Sorry for my unclear expression.
>>>
>>> I mean if we need two func which contains tx and rx paths code
>> respectively in rearm_func, like:
>>> rearm_func(...) {
>>>            rte_tx_fill_sw_ring;
>>>            rte_rx_rearm_descs;
>>> }
>>>
>>> Or just mixed tx and rx path code like I said before. I prefer 'rx and
>>> tx hard code mixed', because from the performance perspective, this can
>> reduce the cost of function calls.
>>
>> I suppose it depends on what we choose:
>> If we restrict it in a way that rxq and txq have to belong to the same PMD,
>> then I suppose this decision could be left to each particular PMD.
>> If we'd like to allow rearm to work across different PMDs (i.e. it would allow
>> to rearm mlx with ice and vice versa), then yes we need PMDs somehow to
>> expose rx_sw_ring abstraction to each other.
>> My preference would be the second one - as it will make this feature more
>> flexible and would help to adopt it more widely.
>> Though the first one is probably easier to implement, and as I understand
>> you are leaning towards the first one.
>>
>> Konstantin
> 
> Hi, Konstantin
> 
> 1. After further consideration, I think we should give up the 'rte_eth_get_rx_direct_rearm' function
> and just keep one API, 'rte_eth_direct_rearm', which contains the paths for the different queue types in the PMD driver.
>
> This is because the application can dynamically configure the queue mapping in the data path,
> which expands the direct-rearm usage scenarios. For example, consider a flow
> which is received by one Rx queue (rxq_1) and sent by two Tx queues of different types (type_a txq_1 and type_b txq_2).
> Users can dynamically map direct_rearm according to the Tx path.

That's a good point; indeed, for such a case a special function pointer doesn't
look very convenient.
I still think that the best way to deal with multiple sources would be
to expose some sort of rx_sw_ring abstraction.
Then it could be obtained once and passed as a parameter to the actual rearm
function.
In that case, the user can even re-arm the RX queue not only from a TX queue,
but from other sources too (freed mbufs, etc.).
As I understand it, what you suggest is one unified function:
int rte_eth_rx_direct_rearm(rx_port, rx_queue, tx_port, tx_queue), correct?
I am not against it in principle, but does it mean that at each invocation,
inside the PMD layer, the first thing you would have to do is
check that rx_queue and tx_queue are from the same device type and
select the proper behaviour (for different queue configs)?
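
To illustrate that per-invocation cost, here is a minimal hypothetical sketch
(none of the 'demo_*' names exist in ethdev) of what the unified call would have
to do before any mbuf is actually moved:
--------------------------------------------------------------------------------------
#include <stdint.h>

/* Hypothetical per-queue bookkeeping the unified API would need. */
struct demo_queue {
	const void *drv_id;  /* identifies the owning PMD / device type */
	unsigned int cfg;    /* queue-layout variant within that PMD (< 4) */
	void *priv;          /* driver private queue data */
};

typedef uint16_t (*demo_rearm_fn)(void *rxq_priv, void *txq_priv);

/* Per call: verify both queues belong to the same device type, then pick
 * the path for this particular Rx/Tx config combination. */
static uint16_t
demo_rx_direct_rearm(const struct demo_queue *rx, const struct demo_queue *tx,
		     demo_rearm_fn paths[4][4])
{
	if (rx->drv_id != tx->drv_id) /* different device types: give up */
		return 0;
	return paths[rx->cfg][tx->cfg](rx->priv, tx->priv);
}
--------------------------------------------------------------------------------------
Whether this check-and-dispatch per call is cheap enough is exactly the trade-off
between the unified API and the pre-resolved rearm function pointer discussed
earlier in the thread.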

> --------------------------------------------------------------------------------------
> rte_eth_rx_burst(rxq_1);
> 
> %routing table lookup to decide tx queue%
> txq_n = txq_1 or txq_2?
> 
> rte_eth_tx_burst(txq_n);
> rte_eth_direct_rearm(txq_n);
> --------------------------------------------------------------------------------------
> Thus, this avoids repeatedly calling 'rte_eth_get_rx_direct_rearm' in the data path, which would hurt performance.
> 
> Furthermore, rte_eth_direct_rearm can be implemented as:
> --------------------------------------------------------------------------------------
> rte_eth_direct_rearm = i40e_eth_direct_rearm;
> i40e_eth_direct_rearm {
> 	rearm_queue_config_a path;
> 	rearm_queue_config_b path;
> 	rearm_queue_config_c path;
> 	rearm_queue_config_d path;
> }
> --------------------------------------------------------------------------------------
> This implementation follows the pattern of 'rte_eth_rx_burst' and 'rte_eth_tx_burst'. If there are different
> queue types, the path selection is implemented in the PMD layer.
> 
> 2. I'm not sure whether cross-PMD usage is realistic.
> Considering the trade-off between performance and mapping across different PMDs, maybe we can provide two
> different direct-rearm modes for users: one mixes the Rx and Tx path code and is used within the same PMD;
> the other keeps the Rx and Tx paths separate and can be used across different PMDs.
> Best Regards
> Feifei
>>
>>
>>>>
>>>>>
>>>>>>
>>>>>> Another way - make rte_eth_txq_data totally opaque and allow PMD
>> to
>>>>>> store there some data that will help it to distinguish expected TXQ
>> format.
>>>>>> That will allow PMD to keep rx_direct_rearm() the same for all
>>>>>> supported TXQ formats (it will make decision internally based on
>>>>>> data stored in txq_data).
>>>>>> Though in that case you'll probably need one more dev-op to free
>>>> txq_data.
>>>>>
>>>
> 


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Re: Re: Re: [PATCH v2 1/3] ethdev: add API for direct rearm mode
  2022-10-31 16:36               ` Konstantin Ananyev
@ 2023-01-04  7:39                 ` Feifei Wang
  0 siblings, 0 replies; 17+ messages in thread
From: Feifei Wang @ 2023-01-04  7:39 UTC (permalink / raw)
  To: Konstantin Ananyev, thomas, Ferruh Yigit, Andrew Rybchenko, Ray Kinsella
  Cc: dev, nd, Honnappa Nagarahalli, Ruifeng Wang, nd

Hi, Konstantin

> -----Original Message-----
> From: Konstantin Ananyev <konstantin.v.ananyev@yandex.ru>
> Sent: Tuesday, November 1, 2022 12:37 AM
> To: Feifei Wang <Feifei.Wang2@arm.com>; thomas@monjalon.net;
> Ferruh Yigit <ferruh.yigit@xilinx.com>; Andrew Rybchenko
> <andrew.rybchenko@oktetlabs.ru>; Ray Kinsella <mdr@ashroe.eu>
> Cc: dev@dpdk.org; nd <nd@arm.com>; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>; Ruifeng Wang
> <Ruifeng.Wang@arm.com>
> Subject: Re: Re: Re: Re: [PATCH v2 1/3] ethdev: add API for direct rearm
> mode
> 
> 
> 
> Hi Feifei,
> 
> 
> >>>>>>> Add API for enabling direct rearm mode and for mapping RX and TX
> >>>>>>> queues. Currently, the API supports 1:1(txq : rxq) mapping.
> >>>>>>>
> >>>>>>> Furthermore, to avoid Rx load Tx data directly, add API called
> >>>>>>> 'rte_eth_txq_data_get' to get Tx sw_ring and its information.
> >>>>>>>
> >>>>>>> Suggested-by: Honnappa Nagarahalli
> >> <honnappa.nagarahalli@arm.com>
> >>>>>>> Suggested-by: Ruifeng Wang <ruifeng.wang@arm.com>
> >>>>>>> Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
> >>>>>>> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> >>>>>>> Reviewed-by: Honnappa Nagarahalli
> >> <honnappa.nagarahalli@arm.com>
> >>>>>>> ---
> >>>>>>>      lib/ethdev/ethdev_driver.h   |  9 ++++
> >>>>>>>      lib/ethdev/ethdev_private.c  |  1 +
> >>>>>>>      lib/ethdev/rte_ethdev.c      | 37 ++++++++++++++
> >>>>>>>      lib/ethdev/rte_ethdev.h      | 95
> >>>>>> ++++++++++++++++++++++++++++++++++++
> >>>>>>>      lib/ethdev/rte_ethdev_core.h |  5 ++
> >>>>>>>      lib/ethdev/version.map       |  4 ++
> >>>>>>>      6 files changed, 151 insertions(+)
> >>>>>>>
> >>>>>>> diff --git a/lib/ethdev/ethdev_driver.h
> >>>>>>> b/lib/ethdev/ethdev_driver.h index 47a55a419e..14f52907c1
> 100644
> >>>>>>> --- a/lib/ethdev/ethdev_driver.h
> >>>>>>> +++ b/lib/ethdev/ethdev_driver.h
> >>>>>>> @@ -58,6 +58,8 @@ struct rte_eth_dev {
> >>>>>>>      	eth_rx_descriptor_status_t rx_descriptor_status;
> >>>>>>>      	/** Check the status of a Tx descriptor */
> >>>>>>>      	eth_tx_descriptor_status_t tx_descriptor_status;
> >>>>>>> +	/**  Use Tx mbufs for Rx to rearm */
> >>>>>>> +	eth_rx_direct_rearm_t rx_direct_rearm;
> >>>>>>>
> >>>>>>>      	/**
> >>>>>>>      	 * Device data that is shared between primary and
> >>>>>>> secondary processes @@ -486,6 +488,11 @@ typedef int
> >>>>>> (*eth_rx_enable_intr_t)(struct rte_eth_dev *dev,
> >>>>>>>      typedef int (*eth_rx_disable_intr_t)(struct rte_eth_dev *dev,
> >>>>>>>      				    uint16_t rx_queue_id);
> >>>>>>>
> >>>>>>> +/**< @internal Get Tx information of a transmit queue of an
> >>>>>>> +Ethernet device. */ typedef void (*eth_txq_data_get_t)(struct
> >>>> rte_eth_dev *dev,
> >>>>>>> +				      uint16_t tx_queue_id,
> >>>>>>> +				      struct rte_eth_txq_data
> >> *txq_data);
> >>>>>>> +
> >>>>>>>      /** @internal Release memory resources allocated by given
> >>>>>>> Rx/Tx
> >>>> queue.
> >>>>>> */
> >>>>>>>      typedef void (*eth_queue_release_t)(struct rte_eth_dev *dev,
> >>>>>>>      				    uint16_t queue_id); @@ -1138,6
> +1145,8 @@ struct
> >>>>>>> eth_dev_ops {
> >>>>>>>      	eth_rxq_info_get_t         rxq_info_get;
> >>>>>>>      	/** Retrieve Tx queue information */
> >>>>>>>      	eth_txq_info_get_t         txq_info_get;
> >>>>>>> +	/** Get the address where Tx data is stored */
> >>>>>>> +	eth_txq_data_get_t         txq_data_get;
> >>>>>>>      	eth_burst_mode_get_t       rx_burst_mode_get; /**< Get Rx
> >> burst
> >>>>>> mode */
> >>>>>>>      	eth_burst_mode_get_t       tx_burst_mode_get; /**< Get Tx
> >> burst
> >>>>>> mode */
> >>>>>>>      	eth_fw_version_get_t       fw_version_get; /**< Get
> >> firmware
> >>>>>> version */
> >>>>>>> diff --git a/lib/ethdev/ethdev_private.c
> >>>>>>> b/lib/ethdev/ethdev_private.c index 48090c879a..bfe16c7d77
> >>>>>>> 100644
> >>>>>>> --- a/lib/ethdev/ethdev_private.c
> >>>>>>> +++ b/lib/ethdev/ethdev_private.c
> >>>>>>> @@ -276,6 +276,7 @@ eth_dev_fp_ops_setup(struct
> >> rte_eth_fp_ops
> >>>>>> *fpo,
> >>>>>>>      	fpo->rx_queue_count = dev->rx_queue_count;
> >>>>>>>      	fpo->rx_descriptor_status = dev->rx_descriptor_status;
> >>>>>>>      	fpo->tx_descriptor_status = dev->tx_descriptor_status;
> >>>>>>> +	fpo->rx_direct_rearm = dev->rx_direct_rearm;
> >>>>>>>
> >>>>>>>      	fpo->rxq.data = dev->data->rx_queues;
> >>>>>>>      	fpo->rxq.clbk = (void
> >>>>>>> **)(uintptr_t)dev->post_rx_burst_cbs;
> >>>>>>> diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
> >>>>>>> index 0c2c1088c0..0dccec2e4b 100644
> >>>>>>> --- a/lib/ethdev/rte_ethdev.c
> >>>>>>> +++ b/lib/ethdev/rte_ethdev.c
> >>>>>>> @@ -1648,6 +1648,43 @@ rte_eth_dev_is_removed(uint16_t
> port_id)
> >>>>>>>      	return ret;
> >>>>>>>      }
> >>>>>>>
> >>>>>>> +int
> >>>>>>> +rte_eth_tx_queue_data_get(uint16_t port_id, uint16_t queue_id,
> >>>>>>> +			struct rte_eth_txq_data *txq_data) {
> >>>>>>> +	struct rte_eth_dev *dev;
> >>>>>>> +
> >>>>>>> +	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
> >>>>>>> +	dev = &rte_eth_devices[port_id];
> >>>>>>> +
> >>>>>>> +	if (queue_id >= dev->data->nb_tx_queues) {
> >>>>>>> +		RTE_ETHDEV_LOG(ERR, "Invalid Tx queue_id=%u\n",
> >>>>>> queue_id);
> >>>>>>> +		return -EINVAL;
> >>>>>>> +	}
> >>>>>>> +
> >>>>>>> +	if (txq_data == NULL) {
> >>>>>>> +		RTE_ETHDEV_LOG(ERR, "Cannot get ethdev port %u
> >> Tx
> >>>>>> queue %u data to NULL\n",
> >>>>>>> +			port_id, queue_id);
> >>>>>>> +		return -EINVAL;
> >>>>>>> +	}
> >>>>>>> +
> >>>>>>> +	if (dev->data->tx_queues == NULL ||
> >>>>>>> +			dev->data->tx_queues[queue_id] == NULL) {
> >>>>>>> +		RTE_ETHDEV_LOG(ERR,
> >>>>>>> +			   "Tx queue %"PRIu16" of device with
> >> port_id=%"
> >>>>>>> +			   PRIu16" has not been setup\n",
> >>>>>>> +			   queue_id, port_id);
> >>>>>>> +		return -EINVAL;
> >>>>>>> +	}
> >>>>>>> +
> >>>>>>> +	if (*dev->dev_ops->txq_data_get == NULL)
> >>>>>>> +		return -ENOTSUP;
> >>>>>>> +
> >>>>>>> +	dev->dev_ops->txq_data_get(dev, queue_id, txq_data);
> >>>>>>> +
> >>>>>>> +	return 0;
> >>>>>>> +}
> >>>>>>> +
> >>>>>>>      static int
> >>>>>>>      rte_eth_rx_queue_check_split(const struct
> >>>>>>> rte_eth_rxseg_split
> >>>> *rx_seg,
> >>>>>>>      			     uint16_t n_seg, uint32_t *mbp_buf_size,
> >> diff --git
> >>>>>>> a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h index
> >>>>>>> 2e783536c1..daf7f05d62 100644
> >>>>>>> --- a/lib/ethdev/rte_ethdev.h
> >>>>>>> +++ b/lib/ethdev/rte_ethdev.h
> >>>>>>> @@ -1949,6 +1949,23 @@ struct rte_eth_txq_info {
> >>>>>>>      	uint8_t queue_state;        /**< one of
> >> RTE_ETH_QUEUE_STATE_*. */
> >>>>>>>      } __rte_cache_min_aligned;
> >>>>>>>
> >>>>>>> +/**
> >>>>>>> + * @internal
> >>>>>>> + * Structure used to hold pointers to internal ethdev Tx data.
> >>>>>>> + * The main purpose is to load and store Tx queue data in
> >>>>>>> +direct rearm
> >>>>>> mode.
> >>>>>>> + */
> >>>>>>> +struct rte_eth_txq_data {
> >>>>>>> +	uint64_t *offloads;
> >>>>>>> +	void *tx_sw_ring;
> >>>>>>> +	volatile void *tx_ring;
> >>>>>>> +	uint16_t *tx_next_dd;
> >>>>>>> +	uint16_t *nb_tx_free;
> >>>>>>> +	uint16_t nb_tx_desc;
> >>>>>>> +	uint16_t tx_rs_thresh;
> >>>>>>> +	uint16_t tx_free_thresh;
> >>>>>>> +} __rte_cache_min_aligned;
> >>>>>>> +
> >>>>>>
> >>>>>> First of all, it is not clear why this struct has to be in a public
> >>>>>> header and why it can't be in one of the ethdev 'private' headers.
> >>>>>> Second, it looks like a snippet of private txq fields from some
> >>>>>> Intel (and alike) PMDs (i40e, ice, etc.).
> >>>>>> How is it supposed to be universal and applicable to any PMD
> >>>>>> that decides to implement this new API?
> >>>>>>
> >>>>>>
> >>>>>>>      /* Generic Burst mode flag definition, values can be ORed.
> >>>>>>> */
> >>>>>>>
> >>>>>>>      /**
> >>>>>>> @@ -4718,6 +4735,27 @@ int rte_eth_remove_rx_callback(uint16_t port_id, uint16_t queue_id,
> >>>>>>>      int rte_eth_remove_tx_callback(uint16_t port_id, uint16_t queue_id,
> >>>>>>>      		const struct rte_eth_rxtx_callback *user_cb);
> >>>>>>>
> >>>>>>> +/**
> >>>>>>> + * Get the address which Tx data is stored.
> >>>>>>> + *
> >>>>>>> + * @param port_id
> >>>>>>> + *   The port identifier of the Ethernet device.
> >>>>>>> + * @param queue_id
> >>>>>>> + *   The Tx queue on the Ethernet device for which information
> >>>>>>> + *   will be retrieved.
> >>>>>>> + * @param txq_data
> >>>>>>> + *   A pointer to a structure of type *rte_eth_txq_data* to be filled.
> >>>>>>> + *
> >>>>>>> + * @return
> >>>>>>> + *   - 0: Success
> >>>>>>> + *   - -ENODEV:  If *port_id* is invalid.
> >>>>>>> + *   - -ENOTSUP: routine is not supported by the device PMD.
> >>>>>>> + *   - -EINVAL:  The queue_id is out of range.
> >>>>>>> + */
> >>>>>>> +__rte_experimental
> >>>>>>> +int rte_eth_tx_queue_data_get(uint16_t port_id, uint16_t queue_id,
> >>>>>>> +		struct rte_eth_txq_data *txq_data);
> >>>>>>> +
> >>>>>>>      /**
> >>>>>>>       * Retrieve information about given port's Rx queue.
> >>>>>>>       *
> >>>>>>> @@ -6209,6 +6247,63 @@ rte_eth_tx_buffer(uint16_t port_id, uint16_t queue_id,
> >>>>>>>      	return rte_eth_tx_buffer_flush(port_id, queue_id, buffer);
> >>>>>>>      }
> >>>>>>>
> >>>>>>> +/**
> >>>>>>> + * @warning
> >>>>>>> + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
> >>>>>>> + *
> >>>>>>> + * Put Tx buffers into Rx sw-ring and rearm descs.
> >>>>>>> + *
> >>>>>>> + * @param port_id
> >>>>>>> + *   Port identifying the receive side.
> >>>>>>> + * @param queue_id
> >>>>>>> + *   The index of the transmit queue identifying the receive side.
> >>>>>>> + *   The value must be in the range [0, nb_rx_queue - 1] previously supplied
> >>>>>>> + *   to rte_eth_dev_configure().
> >>>>>>> + * @param txq_data
> >>>>>>> + *   A pointer to a structure of type *rte_eth_txq_data* to be filled.
> >>>>>>> + * @return
> >>>>>>> + *   The number of direct-rearmed buffers.
> >>>>>>> + */
> >>>>>>> +__rte_experimental
> >>>>>>> +static __rte_always_inline uint16_t
> >>>>>>> +rte_eth_rx_direct_rearm(uint16_t port_id, uint16_t queue_id,
> >>>>>>> +		struct rte_eth_txq_data *txq_data) {
> >>>>>>> +	uint16_t nb_rearm;
> >>>>>>> +	struct rte_eth_fp_ops *p;
> >>>>>>> +	void *qd;
> >>>>>>> +
> >>>>>>> +#ifdef RTE_ETHDEV_DEBUG_RX
> >>>>>>> +	if (port_id >= RTE_MAX_ETHPORTS ||
> >>>>>>> +			queue_id >= RTE_MAX_QUEUES_PER_PORT) {
> >>>>>>> +		RTE_ETHDEV_LOG(ERR,
> >>>>>>> +			"Invalid port_id=%u or queue_id=%u\n",
> >>>>>>> +			port_id, queue_id);
> >>>>>>> +		return 0;
> >>>>>>> +	}
> >>>>>>> +#endif
> >>>>>>> +
> >>>>>>> +	p = &rte_eth_fp_ops[port_id];
> >>>>>>> +	qd = p->rxq.data[queue_id];
> >>>>>>> +
> >>>>>>> +#ifdef RTE_ETHDEV_DEBUG_RX
> >>>>>>> +	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, 0);
> >>>>>>> +
> >>>>>>> +	if (qd == NULL) {
> >>>>>>> +		RTE_ETHDEV_LOG(ERR, "Invalid Rx queue_id=%u for port_id=%u\n",
> >>>>>>> +			queue_id, port_id);
> >>>>>>> +		return 0;
> >>>>>>> +	}
> >>>>>>> +
> >>>>>>> +	if (!p->rx_direct_rearm)
> >>>>>>
> >>>>>> This check should always be done (unconditionally).
> >>>>>> It is not a mandatory function for the driver (it can safely skip
> >>>>>> implementing it).
> >>>>>>
> >>>>>>> +		return -ENOTSUP;
> >>>>>>
> >>>>>> This function returns uint16_t, why signed integers here?
> >>>>>>
> >>>>>>
> >>>>>>> +#endif
> >>>>>>> +
> >>>>>>> +	nb_rearm = p->rx_direct_rearm(qd, txq_data);
> >>>>>>
> >>>>>> So the rx_direct_rearm() function knows how to extract data from the TX queue?
> >>>>>> As I understand it, that is possible only in one case:
> >>>>>> rx_direct_rearm() has full knowledge and access of txq internals, etc.
> >>>>>> That means that rxq and txq have to belong to the same driver and
> >>>>>> device type.
> >>>>>>
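
[For readers following along: a minimal usage sketch of the two APIs added in
this patch, as they are intended to be combined. Error handling and the 1:1
rxq:txq mapping are simplified, and the helper name direct_rearm_loop plus the
port/queue parameters are only illustrative.]
-----------------------------------------------------------------------
#include <rte_ethdev.h>
#include <rte_mbuf.h>

#define BURST_SIZE 32

static void
direct_rearm_loop(uint16_t rx_port, uint16_t rx_queue,
		uint16_t tx_port, uint16_t tx_queue)
{
	struct rte_eth_txq_data txq_data;
	struct rte_mbuf *pkts[BURST_SIZE];
	uint16_t nb_rx;

	/* control plane: fetch the Tx queue internals once */
	if (rte_eth_tx_queue_data_get(tx_port, tx_queue, &txq_data) != 0)
		return; /* PMD does not support direct rearm */

	/* data plane: rearm Rx descs straight from the Tx sw-ring,
	 * then run the usual Rx/Tx burst loop */
	for (;;) {
		rte_eth_rx_direct_rearm(rx_port, rx_queue, &txq_data);
		nb_rx = rte_eth_rx_burst(rx_port, rx_queue, pkts, BURST_SIZE);
		/* ... packet processing ... */
		rte_eth_tx_burst(tx_port, tx_queue, pkts, nb_rx);
	}
}
-----------------------------------------------------------------------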
> >>>>> Thanks for the comments, and I have some questions for this.
> >>>>>
> >>>>>> First of all, I still think it is not the best design choice.
> >>>>>> If we are going ahead with introducing this feature, it had better be as
> >>>>>> generic as possible.
> >>>>>> Plus it mixes TX and RX code-paths together, while it would be
> >>>>>> much better to keep them independent as they are right now.
> >>>>>>
> >>>>>> Another thing with such approach - even for the same PMD, both
> >>>>>> TXQ and RXQ can have different internal data format and behavior
> >>>>>> logic (depending on port/queue configuration).
> >>>>> 1. Here, TXQ and RXQ having different internal formats means the queue
> >>>>> type and descs can be different, right? If I understand
> >>>>> correctly, based on your first strategy, does it mean we will need a
> >>>>> different 'rearm_func' for each queue type in the same PMD?
> >>>>
> >>>> Yes, I think so.
> >>>> Let's say we have some PMD where, depending on the config, there could be
> >>>> 2 different RXQ formats: rxq_a and rxq_b, and 2 different txq
> >>>> formats: txq_c, txq_d.
> >>>> Then assuming PMD would like to support direct-rearm mode for all
> >>>> four combinations, it needs 4 different rearm functions:
> >>>>
> >>>> rearm_txq_c_to_rxq_a()
> >>>> rearm_txq_c_to_rxq_b()
> >>>> rearm_txq_d_to_rxq_a()
> >>>> rearm_txq_d_to_rxq_b()
> >>>>
> >>> Thank you for your detailed explanation, I can understand this.
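
[A minimal sketch of how such a per-combination selection could look inside a
PMD; the typedef, the enums and the stub bodies are assumptions added here for
illustration, only the four function names come from the discussion above.]
-----------------------------------------------------------------------
#include <stdint.h>

typedef uint16_t (*eth_rx_direct_rearm_t)(void *rxq, void *txq);

enum rxq_fmt { RXQ_A, RXQ_B, RXQ_FMT_NUM };
enum txq_fmt { TXQ_C, TXQ_D, TXQ_FMT_NUM };

/* stubs standing in for the real per-combination rearm loops */
static uint16_t rearm_txq_c_to_rxq_a(void *rxq, void *txq) { (void)rxq; (void)txq; return 0; }
static uint16_t rearm_txq_c_to_rxq_b(void *rxq, void *txq) { (void)rxq; (void)txq; return 0; }
static uint16_t rearm_txq_d_to_rxq_a(void *rxq, void *txq) { (void)rxq; (void)txq; return 0; }
static uint16_t rearm_txq_d_to_rxq_b(void *rxq, void *txq) { (void)rxq; (void)txq; return 0; }

/* selection is done once, based on both the RXQ and the TXQ config */
static const eth_rx_direct_rearm_t rearm_funcs[TXQ_FMT_NUM][RXQ_FMT_NUM] = {
	[TXQ_C][RXQ_A] = rearm_txq_c_to_rxq_a,
	[TXQ_C][RXQ_B] = rearm_txq_c_to_rxq_b,
	[TXQ_D][RXQ_A] = rearm_txq_d_to_rxq_a,
	[TXQ_D][RXQ_B] = rearm_txq_d_to_rxq_b,
};
-----------------------------------------------------------------------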
> >
> >>>>
> >>>>>
> >>>>>> So rx_direct_rearm() function selection have to be done based on
> >>>>>> both RXQ and TXQ config.
> >>>>>> So instead of rte_eth_tx_queue_data_get(), you'll probably need:
> >>>>>> eth_rx_direct_rearm_t rte_eth_get_rx_direct_rearm_func(rx_port,
> >>>>>> rx_queue, tx_port, tx_queue);
> >>>>>> Then, it will be user responsibility to store it somewhere and
> >>>>>> call
> >>>> periodically:
> >>>>>>
> >>>>>> control_path:
> >>>>>> 	...
> >>>>>> 	rearm_func = rte_eth_get_rx_direct_rearm_func(rxport,
> rxqueue,
> >>>>>> 		 txport, txqueue);
> >>>>>> data-path:
> >>>>>> 	while(...) {
> >>>>>> 		rearm_func(rxport, txport, rxqueue, txqueue);
> >>>>>> 		rte_eth_rx_burst(rxport, rxqueue, ....);
> >>>>>> 		rte_eth_tx_burst(txport, txqueue, ....);
> >>>>>> 	}
> >>>>>>
> >>>>>>
> >>>>>> In that case there seems absolutely no point to introduce struct
> >>>>>> rte_eth_txq_data. rx_direct_rearm() accesses TXQ private data
> >>>>>> directly anyway.
> >>>>> 2. This is a very good proposal and it will be our first choice.
> >>>>> Before working on it, I have a few questions about how to
> >>>>> implement
> >>>> 'rearm_func'.
> >>>>> Like you said above, mixing Rx and Tx path code in 'rearm_func'
> >>>>> means the hard-coded paths are mixed, like:
> >>>>> rearm_func(...) {
> >>>>>         ...
> >>>>>        txep = &txq->sw_ring[txq->tx_next_dd - (txq->tx_rs_thresh - 1)];
> >>>>>        for (...) {
> >>>>>           rxep[i].mbuf = txep[i].mbuf;
> >>>>>           mb0 = txep[i].mbuf;
> >>>>>           paddr = mb0->buf_iova + RTE_PKTMBUF_HEADROOM;
> >>>>>          dma_addr0 = vdupq_n_u64(paddr);
> >>>>>          vst1q_u64((uint64_t *)&rxdp++->read, dma_addr0);
> >>>>>        }
> >>>>> }
> >>>>> Is my understanding right?
> >>>>
> >>>>
> >>>> Sorry, I don't understand the question.
> >>>> Can you probably elaborate a bit?
> >>>
> >>> Sorry for my unclear expression.
> >>>
> >>> I mean, do we need two functions in rearm_func which contain the Tx
> >>> and Rx path code respectively, like:
> >>> rearm_func(...) {
> >>>            rte_tx_fill_sw_ring;
> >>>            rte_rx_rearm_descs;
> >>> }
> >>>
> >>> Or just mixed Tx and Rx path code like I said before. I prefer 'Rx
> >>> and Tx hard code mixed', because from a performance perspective this
> >>> can reduce the cost of function calls.
> >>
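
[To make the 'mixed' variant above concrete, here is a rough sketch of such a
rearm loop. The struct definitions are simplified stand-ins for the real queue
structures (only the field names follow the i40e-style convention used in this
thread), so this illustrates the idea rather than actual driver code.]
-----------------------------------------------------------------------
#include <stdint.h>
#include <arm_neon.h>
#include <rte_mbuf.h>

struct sw_entry {
	struct rte_mbuf *mbuf;
};

struct rearm_txq {			/* illustrative subset of a Tx queue */
	struct sw_entry *sw_ring;
	uint16_t tx_next_dd;
	uint16_t tx_rs_thresh;
};

struct rx_desc {			/* illustrative 16B read descriptor */
	uint64_t pkt_addr;
	uint64_t hdr_addr;
};

struct rearm_rxq {			/* illustrative subset of an Rx queue */
	struct sw_entry *sw_ring;
	volatile struct rx_desc *rx_ring;
	uint16_t rxrearm_start;
};

static inline uint16_t
rearm_mixed(struct rearm_rxq *rxq, struct rearm_txq *txq)
{
	struct sw_entry *txep =
		&txq->sw_ring[txq->tx_next_dd - (txq->tx_rs_thresh - 1)];
	struct sw_entry *rxep = &rxq->sw_ring[rxq->rxrearm_start];
	volatile struct rx_desc *rxdp = &rxq->rx_ring[rxq->rxrearm_start];
	uint16_t i;

	for (i = 0; i < txq->tx_rs_thresh; i++) {
		/* hand the Tx-completed mbuf straight to the Rx sw-ring */
		rxep[i].mbuf = txep[i].mbuf;
		uint64_t paddr = txep[i].mbuf->buf_iova + RTE_PKTMBUF_HEADROOM;
		/* flush the descriptor with the buffer DMA address */
		uint64x2_t dma_addr = vdupq_n_u64(paddr);
		vst1q_u64((uint64_t *)(uintptr_t)&rxdp[i], dma_addr);
	}
	return i;
}
-----------------------------------------------------------------------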
> >> I suppose it depends on what we choose:
> >> If we restrict it in a way that rxq and txq have to belong to the
> >> same PMD, then I suppose this decision could be left to each particular
> PMD.
> >> If we'd like to allow rearm to work across different PMDs (i.e. it
> >> would allow rearming mlx with ice and vice versa), then yes, we need
> >> PMDs to somehow expose an rx_sw_ring abstraction to each other.
> >> My preference would be the second one - as it will make this feature
> >> more flexible and would help it to be adopted more widely.
> >> Though the first one is probably easier to implement, and as I
> >> understand you are leaning towards the first one.
> >>
> >> Konstantin
> >
> > Hi, Konstantin
> >
> > 1. After further consideration, I think we should give up the
> > 'rte_eth_get_rx_direct_rearm' function and just keep one API,
> > 'rte_eth_direct_rearm', which contains the paths for the different
> > queue types in the PMD driver.
> >
> > This is because the application can dynamically configure the
> > queue mapping in the data path, and thus expand the direct-rearm usage
> > scenarios this way. For example, consider a flow which is
> > received by one rx_queue (rxq_1) and sent by two different types of
> > tx_queues (type_a txq_1 and type_b txq_2); users can dynamically map
> > direct_rearm according to the Tx path.
> 
> That's a good point; indeed, for such a case a special function pointer doesn't
> look very convenient.
> I still think that the best way to deal with multiple sources would be to
> expose some sort of rx_sw_ring abstraction.
> Then it could be obtained once and passed as a parameter to the actual rearm
> function.
> In that case, the user can even re-arm the RX queue not only from a TX queue, but
> from other sources too (freed mbufs, etc.).
> As I understand it, what you suggest is one unified function:
> int rte_eth_rx_direct_rearm(rx_port, rx_queue, tx_port, tx_queue), correct?
> I am not against it in principle, but does it mean that at each invocation, inside
> the PMD layer, the first thing you would have to do is check that rx_queue and
> tx_queue are from the same device type and select the proper behaviour (for
> different queue configs)?

Sorry for my late reply. We have now made direct-rearm deal with multiple sources by exposing
some sort of rx_sw_ring abstraction (for example, Rx: ixgbe, Tx: i40e):
http://patches.dpdk.org/project/dpdk/cover/20230104073043.1120168-1-feifei.wang2@arm.com/
Following are the  test results:
(1) dpdk l3fwd test with multiple drivers:  port 0: 82599 NIC   port 1: XL710 NIC
-------------------------------------------------------------------------------------------------------
                   Without fast free    With fast free
Thunderx2:         +9.44%               +7.14%
-------------------------------------------------------------------------------------------------------

(2) dpdk l3fwd test with same driver: port 0 && 1 XL710 NIC
-------------------------------------------------------------------------------------------------------
*Direct rearm with exposing rx_sw_ring:
                   Without fast free    With fast free
Ampere altra:      +14.98%              +15.77%
n1sdp:             +6.47%               +0.52%

*Direct rearm with one unified function (community patch):
                   Without fast free    With fast free
Ampere altra:      +17.18%              +16.92%
n1sdp:             +15.09%              +4.2%
-------------------------------------------------------------------------------------------------------

(3) VPP test with same driver: port 0 && 1 XL710 NIC
-------------------------------------------------------------------------------------------------------
*Direct rearm with exposing rx_sw_ring:
Ampere altra:      +4.59%
n1sdp:             +5.4%

*Direct rearm with one unified function (community patch):
Ampere altra:      +4.50%
n1sdp:             +4.5%
-------------------------------------------------------------------------------------------------------
According to the test results, direct-rearm mode with an exposed rx_sw_ring can
improve performance significantly for both the same-driver and the different-driver cases.

Compared with direct-rearm using one unified function (the community patch), there is some
performance degradation on n1sdp with the new direct-rearm mode. This is because we separate the
fill-sw_ring and flush-descriptors operations, which may increase cache misses. For the other servers,
performance is OK. I think the performance degradation on n1sdp is acceptable, and the new scheme
brings better extensibility.

Best Regards
Feifei
> 
> > --------------------------------------------------------------------------------------
> > rte_eth_rx_burst(rxq_1);
> >
> > %routing table lookup to decide tx queue% txq_n = txq_1 or txq_2?
> >
> > rte_eth_tx_burst(txq_n);
> > rte_eth_direct_rearm(txq_n);
> > --------------------------------------------------------------------------------------
> > Thus, this can avoid repeatedly calling 'rte_eth_get_rx_direct_rearm'
> > in the data path, which would otherwise reduce performance.
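
[A rough C sketch of the dynamic mapping quoted above, using the unified call
signature discussed earlier in this thread. rte_eth_direct_rearm is the API
under discussion (not an existing DPDK function), and the route_burst()
helper with its ROUTE_A/ROUTE_B values is a placeholder for the routing-table
lookup.]
-----------------------------------------------------------------------
#include <rte_ethdev.h>
#include <rte_mbuf.h>

#define BURST_SIZE 32

enum route { ROUTE_A, ROUTE_B };

/* placeholder for the routing-table lookup deciding the Tx side */
static enum route
route_burst(struct rte_mbuf **pkts, uint16_t n)
{
	(void)pkts; (void)n;
	return ROUTE_A;
}

/* proposed unified API from this thread (prototype assumed for the sketch) */
uint16_t rte_eth_direct_rearm(uint16_t rx_port, uint16_t rx_queue,
		uint16_t tx_port, uint16_t tx_queue);

static void
fwd_loop(uint16_t rx_port, uint16_t rxq_1,
	uint16_t port_a, uint16_t txq_1, uint16_t port_b, uint16_t txq_2)
{
	struct rte_mbuf *pkts[BURST_SIZE];
	uint16_t nb_rx, tx_port, txq_n;

	for (;;) {
		nb_rx = rte_eth_rx_burst(rx_port, rxq_1, pkts, BURST_SIZE);
		if (nb_rx == 0)
			continue;

		/* routing lookup decides the Tx queue for this burst */
		if (route_burst(pkts, nb_rx) == ROUTE_A) {
			tx_port = port_a;
			txq_n = txq_1;
		} else {
			tx_port = port_b;
			txq_n = txq_2;
		}

		rte_eth_tx_burst(tx_port, txq_n, pkts, nb_rx);
		/* rearm rxq_1 from whichever Tx queue was just used */
		rte_eth_direct_rearm(rx_port, rxq_1, tx_port, txq_n);
	}
}
-----------------------------------------------------------------------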
> >
> > Furthermore, rte_eth_direct_rearm can be implemented as:
> > --------------------------------------------------------------------------------------
> > rte_eth_direct_rearm = i40e_eth_direct_rearm;
> > i40e_eth_direct_rearm {
> > 	rearm_queue_config_a path;
> > 	rearm_queue_config_b path;
> > 	rearm_queue_config_c path;
> > 	rearm_queue_config_d path;
> > }
> > --------------------------------------------------------------------------------------
> > This implementation follows 'rte_eth_rx_burst' and 'rte_eth_tx_burst'.
> > If there are different queue types, the selection is implemented in the PMD layer.
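
[Filled out as C, the dispatch sketched above could look roughly like this; the
queue-config enum, the per-config helpers and the i40e_rearm_queue type are all
hypothetical names used only to illustrate the shape of the selection.]
-----------------------------------------------------------------------
#include <stdint.h>

enum rearm_queue_config {
	QUEUE_CONFIG_A, QUEUE_CONFIG_B, QUEUE_CONFIG_C, QUEUE_CONFIG_D
};

struct i40e_rearm_queue {		/* hypothetical queue wrapper */
	enum rearm_queue_config config;
};

/* stubs standing in for the per-config rearm paths */
static uint16_t rearm_queue_config_a(void *rxq, void *txq) { (void)rxq; (void)txq; return 0; }
static uint16_t rearm_queue_config_b(void *rxq, void *txq) { (void)rxq; (void)txq; return 0; }
static uint16_t rearm_queue_config_c(void *rxq, void *txq) { (void)rxq; (void)txq; return 0; }
static uint16_t rearm_queue_config_d(void *rxq, void *txq) { (void)rxq; (void)txq; return 0; }

static uint16_t
i40e_eth_direct_rearm(struct i40e_rearm_queue *rxq, void *txq)
{
	/* the queue configuration decides which rearm path runs */
	switch (rxq->config) {
	case QUEUE_CONFIG_A: return rearm_queue_config_a(rxq, txq);
	case QUEUE_CONFIG_B: return rearm_queue_config_b(rxq, txq);
	case QUEUE_CONFIG_C: return rearm_queue_config_c(rxq, txq);
	default:             return rearm_queue_config_d(rxq, txq);
	}
}
-----------------------------------------------------------------------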
> >
> > 2. I'm not sure whether cross-PMD usage is realistic.
> > Considering the trade-off between performance and different PMD mappings, maybe
> > we can provide two different direct-rearm modes for users. One mixes the Rx
> > and Tx paths and is used for the same PMD;
> > the other separates the Rx and Tx paths and is used for different PMDs.
> > Best Regards
> > Feifei
> >>
> >>
> >>>>
> >>>>>
> >>>>>>
> >>>>>> Another way - make rte_eth_txq_data totally opaque and allow
> PMD
> >> to
> >>>>>> store there some data that will help it to distinguish expected
> >>>>>> TXQ
> >> format.
> >>>>>> That will allow PMD to keep rx_direct_rearm() the same for all
> >>>>>> supported TXQ formats (it will make decision internally based on
> >>>>>> data stored in txq_data).
> >>>>>> Though in that case you'll probably need one more dev-op to free
> >>>> txq_data.
> >>>>>
> >>>
> >



end of thread, other threads:[~2023-01-04  7:39 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-09-27  2:47 [PATCH v2 0/3] Direct re-arming of buffers on receive side Feifei Wang
2022-09-27  2:47 ` [PATCH v2 1/3] ethdev: add API for direct rearm mode Feifei Wang
2022-10-03  8:58   ` Konstantin Ananyev
2022-10-11  7:19     ` 回复: " Feifei Wang
2022-10-11 22:21       ` Konstantin Ananyev
2022-10-12 13:38         ` 回复: " Feifei Wang
2022-10-13  9:48           ` Konstantin Ananyev
2022-10-26  7:35             ` 回复: " Feifei Wang
2022-10-31 16:36               ` Konstantin Ananyev
2023-01-04  7:39                 ` 回复: " Feifei Wang
2022-09-27  2:47 ` [PATCH v2 2/3] net/i40e: enable " Feifei Wang
2022-09-27  2:47 ` [PATCH v2 3/3] examples/l3fwd: " Feifei Wang
2022-09-30 11:56   ` Jerin Jacob
2022-10-11  7:28     ` 回复: " Feifei Wang
2022-10-11  9:38       ` Jerin Jacob
2022-09-29  6:19 ` 回复: [PATCH v2 0/3] Direct re-arming of buffers on receive side Feifei Wang
2022-09-29 10:29   ` Ferruh Yigit
