DPDK patches and discussions
 help / color / mirror / Atom feed
* [RFC v2] ethdev: an API for cache stashing hints
@ 2024-07-15 22:11 Wathsala Vithanage
  2024-07-17  2:27 ` Stephen Hemminger
                   ` (4 more replies)
  0 siblings, 5 replies; 27+ messages in thread
From: Wathsala Vithanage @ 2024-07-15 22:11 UTC (permalink / raw)
  To: dev, Thomas Monjalon, Ferruh Yigit, Andrew Rybchenko
  Cc: nd, Wathsala Vithanage, Dhruv Tripathi

An application provides cache stashing hints to the ethernet devices to
improve memory access latencies from the CPU and the NIC. This patch
introduces three distinct hints for this purpose.

The RTE_ETH_DEV_STASH_HINT_HOST_WILLNEED hint indicates that the host
(CPU) requires the data written by the NIC immediately. This implies
that the CPU expects to read data from its local cache rather than LLC
or main memory if possible. This would improve memory access latency in
the Rx path. For PCI devices with TPH capability, these hints translate
into DWHR (Device Writes Host Reads) access pattern. This hint is only
valid for receive queues.

The RTE_ETH_DEV_STASH_HINT_BI_DIR_DATA hint indicates that the host and
the device access the data structure equally. Rx/Tx queue descriptors
fit the description of such data. This hint applies to both Rx and Tx
directions.  In the PCI TPH context, this hint translates into a
Bi-Directional access pattern.

RTE_ETH_DEV_STASH_HINT_DEV_ONLY hint indicates that the CPU is not
involved in a given device's receive or transmit paths. This implies
that only devices are involved in the IO path. Depending on the
implementation, this hint may result in data getting placed in a cache
close to the device or not cached at all. For PCI devices with TPH
capability, this hint translates into D*D* (DWDR, DRDW, DWDW, DRDR)
access patterns. This is a bidirectional hint, and it can be applied to
both Rx and Tx queues.  

The RTE_ETH_DEV_STASH_HINT_HOST_DONTNEED hint indicates that the device
reads data written by the host (CPU) that may still be in the host's
local cache but is not required by the host anytime soon. This hint is
intended to prevent unnecessary cache invalidations that cause
interconnect latencies when a device writes to a buffer already in host
cache memory. In DPDK, this could happen with the recycling of mbufs
where a mbuf is placed in the Tx queue that then gets back into mempool
and gets recycled back into the Rx queue, all while a copy is being held
in the CPU's local cache unnecessarily. By using this hint on supported
platforms, the mbuf will be invalidated after the device completes the
buffer reading, but it will be well before the buffer gets recycled and
updated in the Rx path. This hint is only valid for transmit queues. 

Applications use three main interfaces in the ethdev library to discover
and set cache stashing hints. rte_eth_dev_stashing_hints_tx interface is
used to set hints on a Tx queue. rte_eth_dev_stashing_hints_rx interface
is used to set hints on an Rx queue. Both of these functions take the
following parameters as inputs: a port_id (the id of the ethernet
device), a cpu_id (the target CPU), a cache_level (the level of the
cache hierarchy the data should be stashed into), a queue_id (the queue
the hints are applied to). In addition to the above list of parameters,
a type parameter indicates the type of the object the application
expects to be stashed by the hardware. Depending on the hardware, these
may vary. Intel E810 NICs support the stashing of Rx/Tx descriptors,
packet headers, and packet payloads. These are indicated by the macros
RTE_ETH_DEV_STASH_TYPE_DESC, RTE_ETH_DEV_STASH_TYPE_HEADER,
RTE_ETH_DEV_STASH_TYPE_PAYLOAD. Hardware capable of stashing data at any
given offset into a packet can use the RTE_ETH_DEV_STASH_TYPE_OFFSET
type. When an offset is used, the offset parameter in the above two
functions should be set appropriately.

rte_eth_dev_stashing_hints_discover is used to discover the object types
and hints supported in the platform and the device. The function takes
types and hints pointers used as a bit vector to indicate hints and
types supported by the NIC. An application that intends to use stashing
hints should first discover supported hints and types and then use the
functions rte_eth_dev_stashing_hints_tx and
rte_eth_dev_stashing_hints_rx as required to set stashing hints
accordingly. eth_dev_ops structure has been updated with two new ops
that a PMD should implement to support cache stashing hints. A PMD that
intends to support cache stashing hints should initialize the
set_stashing_hints function pointer to a function that issues hints to
the underlying hardware in compliance with platform capabilities. The
same PMD should also implement a function that can return two-bit fields
indicating supported types and hints and then initialize the
discover_stashing_hints function pointer with it. If the NIC supports
cache stashing hints, the NIC should always set the
RTE_ETH_DEV_CAPA_CACHE_STASHING device capability.

Signed-off-by: Wathsala Vithanage <wathsala.vithanage@arm.com>
Reviewed-by: Dhruv Tripathi <dhruv.tripathi@arm.com>
---
 .mailmap                   |   1 +
 lib/ethdev/ethdev_driver.h |  67 +++++++++++
 lib/ethdev/rte_ethdev.c    | 153 +++++++++++++++++++++++++
 lib/ethdev/rte_ethdev.h    | 225 +++++++++++++++++++++++++++++++++++++
 lib/ethdev/version.map     |   6 +
 5 files changed, 452 insertions(+)

diff --git a/.mailmap b/.mailmap
index f1e64286a1..9c28b74655 100644
--- a/.mailmap
+++ b/.mailmap
@@ -338,6 +338,7 @@ Dexia Li <dexia.li@jaguarmicro.com>
 Dexuan Cui <decui@microsoft.com>
 Dharmik Thakkar <dharmikjayesh.thakkar@arm.com> <dharmik.thakkar@arm.com>
 Dheemanth Mallikarjun <dheemanthm@vmware.com>
+Dhruv Tripathi <dhruv.tripathi@arm.com>
 Diana Wang <na.wang@corigine.com>
 Didier Pallard <didier.pallard@6wind.com>
 Dilshod Urazov <dilshod.urazov@oktetlabs.ru>
diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
index 883e59a927..b90dc8793b 100644
--- a/lib/ethdev/ethdev_driver.h
+++ b/lib/ethdev/ethdev_driver.h
@@ -1235,6 +1235,70 @@ typedef int (*eth_count_aggr_ports_t)(struct rte_eth_dev *dev);
 typedef int (*eth_map_aggr_tx_affinity_t)(struct rte_eth_dev *dev, uint16_t tx_queue_id,
 					  uint8_t affinity);
 
+/**
+ * @internal
+ * Set cache stashing hint in the ethernet device.
+ *
+ * @param dev
+ *   Port (ethdev) handle.
+ * @param cpuid
+ *   ID of the targeted CPU.
+ * @param cache_level
+ *   Level of the cache to stash data.
+ * @param queue_id
+ *   List of receive queue ids used in rte_eth_rx_burst().
+ * @param queue_direction
+ *   RTE_ETH_DEV_QUEUE_TYPE_RX if queue that corresponds to queue_id is an
+ *   rx queue.
+ *   RTE_ETH_DEV_QUEUE_TYPE_TX if queue that corresponds to queue_id is a
+ *   tx queue.
+ * @param types
+ *   A vector of stashing types to apply hints on a given queue direction.
+ *   hints are applied on the types specified in types vector.
+ *   types can include queue descriptors (RTE_ETH_DEV_STASH_TYPE_DESC),
+ *   packet headers (RTE_ETH_DEV_STASH_TYPE_HEADER),
+ *   packet payloads (RTE_ETH_DEV_STASH_TYPE_PAYLOAD) or
+ *   to an offset (RTE_ETH_DEV_STASH_TYPE_OFFSET) in to a packet.
+ *   types have to be compatible with the queue_direction or an -EINVAL will
+ *   be returned.
+ * @param hints
+ *   Cache stashing hints
+ * @param offset
+ *   Offset into the packet if RTE_ETH_DEV_STASH_TYPE_OFFSET is set in hints.
+ *
+ * @return
+ *   -ENOTSUP if the device or the platform does not support cache stashing.
+ *   -ENOSYS  if the underlying PMD hasn't implemented cache stashing feature.
+ *   -EINVAL  on invalid arguments.
+ *   0 on success.
+ */
+typedef int (*eth_set_stashing_hints_t)(struct rte_eth_dev *dev, uint16_t cpuid,
+					uint8_t cache_level,
+					uint16_t queue_id, uint8_t queue_direction,
+					uint16_t types, uint8_t hints, off_t offset);
+
+/**
+ * @internal
+ * Discover cache stashing hints and object types supported in the ethernet device.
+ *
+ * @param dev
+ *   Port (ethdev) handle.
+ * @param types
+ *   Set bits for supported object types.
+ * @param hints
+ *   Set bits for supported stashing hints.
+ *
+ * @return
+ *   -ENOTSUP if the device or the platform does not support cache stashing.
+ *   -ENOSYS  if the underlying PMD hasn't implemented cache stashing feature.
+ *   -EINVAL  on NULL values for types or hints parameters.
+ *   On return, types and hints parameters will have bits set for supported
+ *   object types and hints.
+ *   0 on success.
+ */
+typedef int (*eth_discover_stashing_hints_t)(struct rte_eth_dev *dev,
+					     uint16_t *types, uint16_t *hints);
+
 /**
  * @internal A structure containing the functions exported by an Ethernet driver.
  */
@@ -1257,6 +1321,9 @@ struct eth_dev_ops {
 	eth_mac_addr_remove_t      mac_addr_remove; /**< Remove MAC address */
 	eth_mac_addr_add_t         mac_addr_add;  /**< Add a MAC address */
 	eth_mac_addr_set_t         mac_addr_set;  /**< Set a MAC address */
+	eth_set_stashing_hints_t   set_stashing_hints; /**< Set cache stashing*/
+	/**Discover supported stashing hints*/
+	eth_discover_stashing_hints_t discover_stashing_hints;
 	/** Set list of multicast addresses */
 	eth_set_mc_addr_list_t     set_mc_addr_list;
 	mtu_set_t                  mtu_set;       /**< Set MTU */
diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
index f1c658f49e..fafc94223e 100644
--- a/lib/ethdev/rte_ethdev.c
+++ b/lib/ethdev/rte_ethdev.c
@@ -153,6 +153,7 @@ static const struct {
 	{RTE_ETH_DEV_CAPA_RXQ_SHARE, "RXQ_SHARE"},
 	{RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP, "FLOW_RULE_KEEP"},
 	{RTE_ETH_DEV_CAPA_FLOW_SHARED_OBJECT_KEEP, "FLOW_SHARED_OBJECT_KEEP"},
+	{RTE_ETH_DEV_CAPA_CACHE_STASHING, "CACHE_STASHING"},
 };
 
 enum {
@@ -7008,4 +7009,156 @@ int rte_eth_dev_map_aggr_tx_affinity(uint16_t port_id, uint16_t tx_queue_id,
 	return ret;
 }
 
+int
+rte_eth_dev_validate_stashing_hints(uint16_t port_id, uint16_t queue_id,
+				    uint8_t queue_direction, uint16_t types,
+				    uint16_t hints)
+{
+	struct rte_eth_dev *dev;
+	struct rte_eth_dev_info dev_info;
+	uint16_t nb_queues;
+
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
+
+	/*
+	 * Check for invalid types
+	 */
+	if (!RTE_ETH_DEV_STASH_TYPE_VALID(types)) {
+		RTE_ETHDEV_LOG_LINE(ERR, "Invalid stashing type");
+		return -EINVAL;
+	}
+
+	/*
+	 * Ensure that hints (HOST_DONOTNEED, HOST_WILLNEED, BI_DIR_DATA, and
+	 * DEV_ONLY etc.) are not mixed incorrectly in the hint argument.
+	 * Only hints of one queue direction (Rx or Tx) can be combined in the
+	 * hint argument. If the hint argument contains hint types compatible
+	 * with both Rx and Tx directions it can be applied to any queue of the
+	 * two queue types.
+	 */
+	if (!RTE_ETH_DEV_STASH_HINT_IS_RXTX(hints)) {
+		/*
+		 * This is not a Rx and a Tx hint.
+		 * Therefore it can only be applied to single queue direction.
+		 */
+		if (RTE_ETH_DEV_STASH_HINT_IS_TX(hints) ==
+		    RTE_ETH_DEV_STASH_HINT_IS_RX(hints)) {
+			RTE_ETHDEV_LOG_LINE(ERR, "This hint is not compatible "
+					    "with both Rx and Tx paths");
+			return -EINVAL;
+		}
+		/*
+		 * Ensure that hint is compatible with the specified queue
+		 * direction in the queue_direction argument.
+		 */
+		if (((queue_direction == RTE_ETH_DEV_QUEUE_TYPE_TX) &&
+		    RTE_ETH_DEV_STASH_HINT_IS_RX(hints)) ||
+		    ((queue_direction == RTE_ETH_DEV_QUEUE_TYPE_RX) &&
+		    RTE_ETH_DEV_STASH_HINT_IS_TX(hints))) {
+			RTE_ETHDEV_LOG_LINE(ERR, "Hints are not applicable to "
+					    "this queue type");
+			return -EINVAL;
+		}
+	}
+
+	dev = &rte_eth_devices[port_id];
+
+	nb_queues = (queue_direction == RTE_ETH_DEV_QUEUE_TYPE_RX) ?
+				      dev->data->nb_rx_queues :
+				      dev->data->nb_tx_queues;
+
+	if (queue_id >= nb_queues) {
+		RTE_ETHDEV_LOG_LINE(ERR, "Invalid Rx queue_id=%u", queue_id);
+		return -EINVAL;
+	}
+
+	rte_eth_dev_info_get(port_id, &dev_info);
+
+	if ((dev_info.dev_capa & RTE_ETH_DEV_CAPA_CACHE_STASHING) !=
+	    RTE_ETH_DEV_CAPA_CACHE_STASHING)
+		return -ENOTSUP;
+
+	if (*dev->dev_ops->set_stashing_hints == NULL) {
+		RTE_ETHDEV_LOG_LINE(ERR, "Stashing hints are not implemented "
+				    "in %s for %s", dev_info.driver_name,
+				    dev_info.device->name);
+		return -ENOSYS;
+	}
+
+	return 0;
+}
+
+int
+rte_eth_dev_stashing_hints_rx(uint16_t port_id, uint16_t cpuid,
+			      uint8_t cache_level, uint16_t queue_id,
+			      uint16_t types, off_t offset,
+			      uint16_t hints)
+{
+	struct rte_eth_dev *dev;
+
+	int ret = rte_eth_dev_validate_stashing_hints(port_id, queue_id,
+						      RTE_ETH_DEV_QUEUE_TYPE_RX,
+						      types, hints);
+	if (ret < 0)
+		return ret;
+
+	dev = &rte_eth_devices[port_id];
+
+	return eth_err(port_id, (*dev->dev_ops->set_stashing_hints)(dev, cpuid,
+		       cache_level, queue_id, RTE_ETH_DEV_QUEUE_TYPE_RX,
+		       types, hints, offset));
+}
+
+int
+rte_eth_dev_stashing_hints_tx(uint16_t port_id, uint16_t cpuid,
+			      uint8_t cache_level, uint16_t queue_id,
+			      uint16_t types, off_t offset,
+			      uint16_t hints)
+{
+	struct rte_eth_dev *dev;
+
+	int ret = rte_eth_dev_validate_stashing_hints(port_id, queue_id,
+						      RTE_ETH_DEV_QUEUE_TYPE_TX,
+						      types, hints);
+	if (ret < 0)
+		return ret;
+
+	dev = &rte_eth_devices[port_id];
+
+	return eth_err(port_id,
+		       (*dev->dev_ops->set_stashing_hints) (dev, cpuid,
+		       cache_level, queue_id, RTE_ETH_DEV_QUEUE_TYPE_TX, types,
+		       hints, offset));
+}
+
+int
+rte_eth_dev_stashing_hints_discover(uint16_t port_id, uint16_t *types,
+				    uint16_t *hints)
+{
+	struct rte_eth_dev *dev;
+	struct rte_eth_dev_info dev_info;
+
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
+
+	if (!types || !hints)
+		return -EINVAL;
+
+	dev = &rte_eth_devices[port_id];
+	rte_eth_dev_info_get(port_id, &dev_info);
+
+	if ((dev_info.dev_capa & RTE_ETH_DEV_CAPA_CACHE_STASHING) !=
+	    RTE_ETH_DEV_CAPA_CACHE_STASHING)
+		return -ENOTSUP;
+
+	if (*dev->dev_ops->discover_stashing_hints == NULL) {
+		RTE_ETHDEV_LOG_LINE(ERR, "Stashing hints are not implemented "
+				    "in %s for %s", dev_info.driver_name,
+				    dev_info.device->name);
+		return -ENOSYS;
+	}
+	return eth_err(port_id,
+		       (*dev->dev_ops->discover_stashing_hints)
+		       (dev, types, hints));
+}
+
 RTE_LOG_REGISTER_DEFAULT(rte_eth_dev_logtype, INFO);
diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
index 548fada1c7..a42f272885 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -1648,6 +1648,9 @@ struct rte_eth_conf {
 #define RTE_ETH_DEV_CAPA_FLOW_SHARED_OBJECT_KEEP RTE_BIT64(4)
 /**@}*/
 
+/** Device supports stashing to CPU/system caches. */
+#define RTE_ETH_DEV_CAPA_CACHE_STASHING RTE_BIT64(5)
+
 /*
  * Fallback default preferred Rx/Tx port parameters.
  * These are used if an application requests default parameters
@@ -1819,6 +1822,8 @@ struct rte_eth_dev_info {
 	struct rte_eth_dev_portconf default_txportconf;
 	/** Generic device capabilities (RTE_ETH_DEV_CAPA_). */
 	uint64_t dev_capa;
+	uint16_t stashing_hints_capa;
+	uint16_t stashing_types_capa;
 	/**
 	 * Switching information for ports on a device with a
 	 * embedded managed interconnect/switch.
@@ -5964,6 +5969,226 @@ int rte_eth_cman_config_set(uint16_t port_id, const struct rte_eth_cman_config *
 __rte_experimental
 int rte_eth_cman_config_get(uint16_t port_id, struct rte_eth_cman_config *config);
 
+
+
+/** Queue type is RX. */
+#define RTE_ETH_DEV_QUEUE_TYPE_RX		0
+/** Queue type is TX. */
+#define RTE_ETH_DEV_QUEUE_TYPE_TX		1
+
+/**@{@name Ethernet device cache stashing hints
+ *@see rte_eth_dev_stashing_hints_discover
+ *@see rte_eth_dev_stashing_hints_rx
+ *@see rte_eth_dev_stashing_hints_tx
+ */
+/**
+ * Data read by the device could still be in a CPU local cache memory but
+ * not required by the CPU before ethernet device is done with Tx.
+ * In other words CPU does not mind evicting the relevant cache line(s)
+ * from it's local cache.
+ */
+#define RTE_ETH_DEV_STASH_HINT_HOST_DONTNEED	0x001
+
+/**
+ * Data is read and written equally by the CPU and the NIC.
+ */
+#define RTE_ETH_DEV_STASH_HINT_BI_DIR_DATA	0x100
+
+/**
+ * Data written by the device is read by a CPU immediately. CPU prefers
+ * availability of the data in it's local cache memory by the time read
+ * takes place.
+ */
+#define RTE_ETH_DEV_STASH_HINT_HOST_WILLNEED	0x010
+
+/**
+ * Data written by the device is only read by device.
+ * Host CPUs do not read this data or write to the location of the data.
+ */
+#define RTE_ETH_DEV_STASH_HINT_DEV_ONLY		0x200
+
+
+#define __RTE_ETH_DEV_STASH_HINT_TX_MASK	0x00f
+
+#define __RTE_ETH_DEV_STASH_HINT_RX_MASK	0x0f0
+
+#define __RTE_ETH_DEV_STASH_HINT_RXTX_MASK	0xf00
+
+
+/**@}*/
+
+#define RTE_ETH_DEV_STASH_HINT_IS_TX(h)				\
+	((!((h) & ~(__RTE_ETH_DEV_STASH_HINT_TX_MASK))) && (h))
+
+#define RTE_ETH_DEV_STASH_HINT_IS_RX(h)				\
+	((!((h) & ~(__RTE_ETH_DEV_STASH_HINT_RX_MASK))) && (h))
+
+#define RTE_ETH_DEV_STASH_HINT_IS_RXTX(h)		\
+	((!((h) & ~(__RTE_ETH_DEV_STASH_HINT_RXTX_MASK))) && (h))
+
+/**@{@name Stashable Rx/Tx queue object types supported by the ethernet device
+ *@see rte_eth_dev_stashing_hints_discover
+ *@see rte_eth_dev_stashing_hints_rx
+ *@see rte_eth_dev_stashing_hints_tx
+ */
+
+/**
+ * Apply stashing hint to data at a given offset from the start of a
+ * received packet.
+ */
+#define RTE_ETH_DEV_STASH_TYPE_OFFSET	0x0001
+
+/** Apply stashing hint to an rx descriptor. */
+#define RTE_ETH_DEV_STASH_TYPE_DESC	0x0002
+
+/** Apply stashing hint to a header of a received packet. */
+#define RTE_ETH_DEV_STASH_TYPE_HEADER	0x0004
+
+/** Apply stashing hint to a payload of a received packet. */
+#define RTE_ETH_DEV_STASH_TYPE_PAYLOAD	0x0008
+#define __RTE_ETH_DEV_STASH_TYPE_MASK	0x000f
+/**@}*/
+
+#define RTE_ETH_DEV_STASH_TYPE_VALID(t)				\
+	((!((t) & (~__RTE_ETH_DEV_STASH_TYPE_MASK))) && (t))
+
+/**
+ *
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * @internal
+ * Helper function to validate stashing hints.
+ */
+__rte_experimental
+int rte_eth_dev_validate_stashing_hints(uint16_t port_id, uint16_t queue_id,
+					uint8_t queue_direction, uint16_t type,
+					uint16_t hint);
+/**
+ *
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Provide cache stashing hints for improved memory access latencies for
+ * packets received by the NIC. Hints the underlying hardware that CPU indicated
+ * in cpuid parameter prefers to have the data specified in the type parameter
+ * at a level in the memory hierarchy specified in cache_level parameter for
+ * access pattern(s) specified in hints parameter.
+ * This feature is available only in supported NICs and platforms.
+ *
+ * @param port_id
+ *  The port identifier of the Ethernet device.
+ * @param cpuid
+ *  ID of the targeted CPU for the hint.
+ * @param cache_level
+ *  The preferred level of the cache the packets are expected at the time of
+ *  retrieval.
+ * @param queue_id
+ *  The index of the receive queue to which hints are applied.
+ * @param types
+ *  A vector of stashing types to apply hints on receive queue.
+ *  Hints are applied on the types specified in types vector.
+ *  types can include receive queue descriptors (RTE_ETH_DEV_STASH_TYPE_DESC),
+ *  packet headers (RTE_ETH_DEV_STASH_TYPE_HEADER),
+ *  packet payloads (RTE_ETH_DEV_STASH_TYPE_PAYLOAD) or
+ *  to an offset (RTE_ETH_DEV_STASH_TYPE_OFFSET) in to packet.
+ *  Types used should be compatible with RX queues, if not -EINVAL will be
+ *  returned.
+ * @param offset
+ *  Offset into the packet if RTE_ETH_DEV_STASH_TYPE_RX_OFFSET is set in hints.
+ * @param hints
+ *  A vector of stashing hints to the device and the platform.
+ * @return
+ *  - (-ENODEV) on incorrect port_ids.
+ *  - (-EINVAL) if both RX and TX types are used in conjuection in type
+ *  parameter.
+ *  - (-EINVAL) if hints are incompatible with RX queues.
+ *  - (-EINVAL) on invalid queue_id.
+ *  - (-ENOTSUP) if RTE_ETH_DEV_CAPA_CACHE_STASHING capability is unavailable.
+ *  - (-ENOSYS) if PMD does not implement cache stashing hints.
+ *  - (0) on Success.
+ */
+__rte_experimental
+int rte_eth_dev_stashing_hints_rx(uint16_t port_id, uint16_t cpuid,
+				 uint8_t cache_level, uint16_t queue_id,
+				 uint16_t types, off_t offset, uint16_t hints);
+
+/**
+ *
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Provide cache stashing hints for improved memory access latencies for
+ * packets being transmitted by the NIC. Hints the underlying hardware that CPU
+ * prefers to have the data specified in the type parameter at a level in the
+ * memory hierarchy specified in cache_level parameter for an access pattern
+ * specified in hints parameter.
+ * This feature is available only in supported NICs and platforms.
+ *
+ * @param port_id
+ *  The port identifier of the Ethernet device.
+ * @param cpuid
+ *  ID of the targeted CPU for the hint.
+ * @param cache_level
+ *  The preferred level of the cache the packets are expected at the time of
+ *  transmission.
+ * @param queue_id
+ *  The index of the transmit queue which hints are applied to.
+ * @param types
+ *  A vector of stashing types to apply hints on transmit queue.
+ *  hints are applied on types specified in types vector.
+ *  types can innclude transmit queue descriptors (RTE_ETH_DEV_STASH_TYPE_DESC),
+ *  packet headers (RTE_ETH_DEV_STASH_TYPE_HEADER),
+ *  packet payloads (RTE_ETH_DEV_STASH_TYPE_PAYLOAD) or
+ *  to an offset (RTE_ETH_DEV_STASH_TYPE_OFFSET) in to packet.
+ *  Types used should be compatible with TX queues, if not -EINVAL will be
+ *  returned.
+ * @param offset
+ *  Offset into the packet if RTE_ETH_DEV_STASH_TYPE_RX_OFFSET is set in hints.
+ * @param hints
+ *  A vector of stashing hints to the device and the platform.
+ * @return
+ *  - (-ENODEV) on incorrect port_ids.
+ *  - (-EINVAL) if both RX and TX types are used in conjuection in type
+ *  parameter.
+ *  - (-EINVAL) if hints are incompatible with TX queues.
+ *  - (-EINVAL) on invalid queue_id.
+ *  - (-ENOTSUP) if RTE_ETH_DEV_CAPA_CACHE_STASHING capability is unavailable.
+ *  - (-ENOSYS) if PMD does not implement cache stashing hints.
+ *  - (0) on Success.
+ */
+__rte_experimental
+int rte_eth_dev_stashing_hints_tx(uint16_t port_id, uint16_t cpuid,
+				 uint8_t cache_level, uint16_t queue_id,
+				 uint16_t types, off_t offset, uint16_t hints);
+
+/**
+ *
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Discover cache stashing hints and object types supported in the ethernet
+ * device.
+ *
+ * @param port_id
+ *  The port identifier of the Ethernet device.
+ * @param types
+ *  Supported types vector set by the ethernet device.
+ * @param hints
+ *  Supported hints vector set by the ethernet device.
+ * @return
+ *  On return types and hints parameters will have bits set for supported
+ *  object types.
+ *  - (-ENOTSUP) if the device or the platform does not support cache stashing.
+ *  - (-ENOSYS)  if the underlying PMD hasn't implemented cache stashing
+ *  feature.
+ *  - (-EINVAL)  on NULL values for types or hints parameters.
+ *  - (0) on success.
+ */
+__rte_experimental
+int rte_eth_dev_stashing_hints_discover(uint16_t port_id, uint16_t *types,
+					uint16_t *hints);
+
 #include <rte_ethdev_core.h>
 
 /**
diff --git a/lib/ethdev/version.map b/lib/ethdev/version.map
index 79f6f5293b..5eef0b4540 100644
--- a/lib/ethdev/version.map
+++ b/lib/ethdev/version.map
@@ -325,6 +325,12 @@ EXPERIMENTAL {
 	rte_flow_template_table_resizable;
 	rte_flow_template_table_resize;
 	rte_flow_template_table_resize_complete;
+
+	# added in 24.07
+	rte_eth_dev_stashing_hints_rx;
+	rte_eth_dev_stashing_hints_tx;
+	rte_eth_dev_stashing_hints_discover;
+	rte_eth_dev_validate_stashing_hints;
 };
 
 INTERNAL {
-- 
2.34.1


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC v2] ethdev: an API for cache stashing hints
  2024-07-15 22:11 [RFC v2] ethdev: an API for cache stashing hints Wathsala Vithanage
@ 2024-07-17  2:27 ` Stephen Hemminger
  2024-07-18 18:48   ` Wathsala Wathawana Vithanage
  2024-07-20  3:05   ` Honnappa Nagarahalli
  2024-07-17 10:32 ` Konstantin Ananyev
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 27+ messages in thread
From: Stephen Hemminger @ 2024-07-17  2:27 UTC (permalink / raw)
  To: Wathsala Vithanage
  Cc: dev, Thomas Monjalon, Ferruh Yigit, Andrew Rybchenko, nd, Dhruv Tripathi

On Mon, 15 Jul 2024 22:11:41 +0000
Wathsala Vithanage <wathsala.vithanage@arm.com> wrote:

> An application provides cache stashing hints to the ethernet devices to
> improve memory access latencies from the CPU and the NIC. This patch
> introduces three distinct hints for this purpose.
> 
> The RTE_ETH_DEV_STASH_HINT_HOST_WILLNEED hint indicates that the host
> (CPU) requires the data written by the NIC immediately. This implies
> that the CPU expects to read data from its local cache rather than LLC
> or main memory if possible. This would improve memory access latency in
> the Rx path. For PCI devices with TPH capability, these hints translate
> into DWHR (Device Writes Host Reads) access pattern. This hint is only
> valid for receive queues.
> 
> The RTE_ETH_DEV_STASH_HINT_BI_DIR_DATA hint indicates that the host and
> the device access the data structure equally. Rx/Tx queue descriptors
> fit the description of such data. This hint applies to both Rx and Tx
> directions.  In the PCI TPH context, this hint translates into a
> Bi-Directional access pattern.
> 
> RTE_ETH_DEV_STASH_HINT_DEV_ONLY hint indicates that the CPU is not
> involved in a given device's receive or transmit paths. This implies
> that only devices are involved in the IO path. Depending on the
> implementation, this hint may result in data getting placed in a cache
> close to the device or not cached at all. For PCI devices with TPH
> capability, this hint translates into D*D* (DWDR, DRDW, DWDW, DRDR)
> access patterns. This is a bidirectional hint, and it can be applied to
> both Rx and Tx queues.  
> 
> The RTE_ETH_DEV_STASH_HINT_HOST_DONTNEED hint indicates that the device
> reads data written by the host (CPU) that may still be in the host's
> local cache but is not required by the host anytime soon. This hint is
> intended to prevent unnecessary cache invalidations that cause
> interconnect latencies when a device writes to a buffer already in host
> cache memory. In DPDK, this could happen with the recycling of mbufs
> where a mbuf is placed in the Tx queue that then gets back into mempool
> and gets recycled back into the Rx queue, all while a copy is being held
> in the CPU's local cache unnecessarily. By using this hint on supported
> platforms, the mbuf will be invalidated after the device completes the
> buffer reading, but it will be well before the buffer gets recycled and
> updated in the Rx path. This hint is only valid for transmit queues. 
> 
> Applications use three main interfaces in the ethdev library to discover
> and set cache stashing hints. rte_eth_dev_stashing_hints_tx interface is
> used to set hints on a Tx queue. rte_eth_dev_stashing_hints_rx interface
> is used to set hints on an Rx queue. Both of these functions take the
> following parameters as inputs: a port_id (the id of the ethernet
> device), a cpu_id (the target CPU), a cache_level (the level of the
> cache hierarchy the data should be stashed into), a queue_id (the queue
> the hints are applied to). In addition to the above list of parameters,
> a type parameter indicates the type of the object the application
> expects to be stashed by the hardware. Depending on the hardware, these
> may vary. Intel E810 NICs support the stashing of Rx/Tx descriptors,
> packet headers, and packet payloads. These are indicated by the macros
> RTE_ETH_DEV_STASH_TYPE_DESC, RTE_ETH_DEV_STASH_TYPE_HEADER,
> RTE_ETH_DEV_STASH_TYPE_PAYLOAD. Hardware capable of stashing data at any
> given offset into a packet can use the RTE_ETH_DEV_STASH_TYPE_OFFSET
> type. When an offset is used, the offset parameter in the above two
> functions should be set appropriately.
> 
> rte_eth_dev_stashing_hints_discover is used to discover the object types
> and hints supported in the platform and the device. The function takes
> types and hints pointers used as a bit vector to indicate hints and
> types supported by the NIC. An application that intends to use stashing
> hints should first discover supported hints and types and then use the
> functions rte_eth_dev_stashing_hints_tx and
> rte_eth_dev_stashing_hints_rx as required to set stashing hints
> accordingly. eth_dev_ops structure has been updated with two new ops
> that a PMD should implement to support cache stashing hints. A PMD that
> intends to support cache stashing hints should initialize the
> set_stashing_hints function pointer to a function that issues hints to
> the underlying hardware in compliance with platform capabilities. The
> same PMD should also implement a function that can return two-bit fields
> indicating supported types and hints and then initialize the
> discover_stashing_hints function pointer with it. If the NIC supports
> cache stashing hints, the NIC should always set the
> RTE_ETH_DEV_CAPA_CACHE_STASHING device capability.
> 
> Signed-off-by: Wathsala Vithanage <wathsala.vithanage@arm.com>
> Reviewed-by: Dhruv Tripathi <dhruv.tripathi@arm.com>

My initial reaction is negative on this. The DPDK does not need more nerd knobs
for performance. If it is a performance win, it should be automatic and handled
by the driver.

If you absolutely have to have another flag, then it should be in existing config
(yes, extend the ABI) rather than adding more flags and calls in ethdev.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* RE: [RFC v2] ethdev: an API for cache stashing hints
  2024-07-15 22:11 [RFC v2] ethdev: an API for cache stashing hints Wathsala Vithanage
  2024-07-17  2:27 ` Stephen Hemminger
@ 2024-07-17 10:32 ` Konstantin Ananyev
  2024-07-22 11:18 ` Ferruh Yigit
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 27+ messages in thread
From: Konstantin Ananyev @ 2024-07-17 10:32 UTC (permalink / raw)
  To: Wathsala Vithanage, dev, Thomas Monjalon, Ferruh Yigit, Andrew Rybchenko
  Cc: nd, Dhruv Tripathi



> An application provides cache stashing hints to the ethernet devices to
> improve memory access latencies from the CPU and the NIC. This patch
> introduces three distinct hints for this purpose.
> 
> The RTE_ETH_DEV_STASH_HINT_HOST_WILLNEED hint indicates that the host
> (CPU) requires the data written by the NIC immediately. This implies
> that the CPU expects to read data from its local cache rather than LLC
> or main memory if possible. This would improve memory access latency in
> the Rx path. For PCI devices with TPH capability, these hints translate
> into DWHR (Device Writes Host Reads) access pattern. This hint is only
> valid for receive queues.
> 
> The RTE_ETH_DEV_STASH_HINT_BI_DIR_DATA hint indicates that the host and
> the device access the data structure equally. Rx/Tx queue descriptors
> fit the description of such data. This hint applies to both Rx and Tx
> directions.  In the PCI TPH context, this hint translates into a
> Bi-Directional access pattern.
> 
> RTE_ETH_DEV_STASH_HINT_DEV_ONLY hint indicates that the CPU is not
> involved in a given device's receive or transmit paths. This implies
> that only devices are involved in the IO path. Depending on the
> implementation, this hint may result in data getting placed in a cache
> close to the device or not cached at all. For PCI devices with TPH
> capability, this hint translates into D*D* (DWDR, DRDW, DWDW, DRDR)
> access patterns. This is a bidirectional hint, and it can be applied to
> both Rx and Tx queues.
> 
> The RTE_ETH_DEV_STASH_HINT_HOST_DONTNEED hint indicates that the device
> reads data written by the host (CPU) that may still be in the host's
> local cache but is not required by the host anytime soon. This hint is
> intended to prevent unnecessary cache invalidations that cause
> interconnect latencies when a device writes to a buffer already in host
> cache memory. In DPDK, this could happen with the recycling of mbufs
> where a mbuf is placed in the Tx queue that then gets back into mempool
> and gets recycled back into the Rx queue, all while a copy is being held
> in the CPU's local cache unnecessarily. By using this hint on supported
> platforms, the mbuf will be invalidated after the device completes the
> buffer reading, but it will be well before the buffer gets recycled and
> updated in the Rx path. This hint is only valid for transmit queues.
> 
> Applications use three main interfaces in the ethdev library to discover
> and set cache stashing hints. rte_eth_dev_stashing_hints_tx interface is
> used to set hints on a Tx queue. rte_eth_dev_stashing_hints_rx interface
> is used to set hints on an Rx queue. Both of these functions take the
> following parameters as inputs: a port_id (the id of the ethernet
> device), a cpu_id (the target CPU), a cache_level (the level of the
> cache hierarchy the data should be stashed into), a queue_id (the queue
> the hints are applied to). In addition to the above list of parameters,
> a type parameter indicates the type of the object the application
> expects to be stashed by the hardware. Depending on the hardware, these
> may vary. Intel E810 NICs support the stashing of Rx/Tx descriptors,
> packet headers, and packet payloads. These are indicated by the macros
> RTE_ETH_DEV_STASH_TYPE_DESC, RTE_ETH_DEV_STASH_TYPE_HEADER,
> RTE_ETH_DEV_STASH_TYPE_PAYLOAD. Hardware capable of stashing data at any
> given offset into a packet can use the RTE_ETH_DEV_STASH_TYPE_OFFSET
> type. When an offset is used, the offset parameter in the above two
> functions should be set appropriately.
> 
> rte_eth_dev_stashing_hints_discover is used to discover the object types
> and hints supported in the platform and the device. The function takes
> types and hints pointers used as a bit vector to indicate hints and
> types supported by the NIC. An application that intends to use stashing
> hints should first discover supported hints and types and then use the
> functions rte_eth_dev_stashing_hints_tx and
> rte_eth_dev_stashing_hints_rx as required to set stashing hints
> accordingly. eth_dev_ops structure has been updated with two new ops
> that a PMD should implement to support cache stashing hints. A PMD that
> intends to support cache stashing hints should initialize the
> set_stashing_hints function pointer to a function that issues hints to
> the underlying hardware in compliance with platform capabilities. The
> same PMD should also implement a function that can return two-bit fields
> indicating supported types and hints and then initialize the
> discover_stashing_hints function pointer with it. If the NIC supports
> cache stashing hints, the NIC should always set the
> RTE_ETH_DEV_CAPA_CACHE_STASHING device capability.

Sounds like an interesting idea...
Do you plan to have a reference implementation in one (or few) actual PMDs?
 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* RE: [RFC v2] ethdev: an API for cache stashing hints
  2024-07-17  2:27 ` Stephen Hemminger
@ 2024-07-18 18:48   ` Wathsala Wathawana Vithanage
  2024-07-20  3:05   ` Honnappa Nagarahalli
  1 sibling, 0 replies; 27+ messages in thread
From: Wathsala Wathawana Vithanage @ 2024-07-18 18:48 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: dev, thomas, Ferruh Yigit, Andrew Rybchenko, nd, Dhruv Tripathi,
	Honnappa Nagarahalli, nd

> 
> My initial reaction is negative on this. The DPDK does not need more nerd
> knobs for performance. If it is a performance win, it should be automatic and
> handled by the driver.
> 
> If you absolutely have to have another flag, then it should be in existing config
> (yes, extend the ABI) rather than adding more flags and calls in ethdev.


Thanks, Steve, for the feedback. My thesis is that in a DPDK-based packet processing system,
the application is more knowledgeable of memory buffer (packets) usage than the generic
underlying hardware or the PMD (I have provided some examples below with the hint they
would map into). Recognizing such cases, PCI SIG introduced TLP Packet Processing Hints (TPH).
Consequently, many interconnect designers enabled support for TPH in their interconnects so
that based on steering tags provided by an application to a NIC, which sets them in the TLP
header, memory buffers can be targeted toward a CPU at the desired level in the cache hierarchy.
With this proposed API, applications provide cache-stashing hints to ethernet devices to improve
memory access latencies from the CPU and the NIC to improve system performance.

Listed below are some use cases.

- A run-to-completion application may not need the next packet immediately in L1D. It may rather
issue a prefetch and do other work with packet and application data already in L1D before it needs
the next packet. A generic PMD will not know such subtleties in the application endpoint, and it
would resolve to stash buffers into the L1D indiscriminately or not do it at all. But, with a hint from
the application that buffers of the packets will be stashed at a cache level suitable for the
application. (like UNIX MADV_DONOTNEED but for mbufs at cache line granularity)

- Similarly, a pipelined application may use a hint that advice the buffers are needed in L1D as soon
as they arrive. (parallels MADV_WILLNEED)

- Let's call the time between a mbuf being allocated into an Rx queue, freed back into mempool in
the Tx path, and once again reallocated back in the Same Rx queue the "buffer recycle window". 
The length of the buffer recycle window is a function of the application in question; the PMD or the
NIC has no prior knowledge of this property of an application. A buffer may stay in the L1D of a CPU
throughout the entire recycle window if the window is short enough for that application.
An application with a short buffer recycle window may hint to the platform that the Tx buffer is not
needed anytime soon in the CPU cache via a hint to avoid unnecessary cache invalidations when
the buffer gets written by the Rx packet for the second time. (parallels MADV_DONOTNEED)

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC v2] ethdev: an API for cache stashing hints
  2024-07-17  2:27 ` Stephen Hemminger
  2024-07-18 18:48   ` Wathsala Wathawana Vithanage
@ 2024-07-20  3:05   ` Honnappa Nagarahalli
  1 sibling, 0 replies; 27+ messages in thread
From: Honnappa Nagarahalli @ 2024-07-20  3:05 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Wathsala Wathawana Vithanage, dev, thomas, Ferruh Yigit,
	Andrew Rybchenko, nd, Dhruv Tripathi



> On Jul 16, 2024, at 9:27 PM, Stephen Hemminger <stephen@networkplumber.org> wrote:
> 
> On Mon, 15 Jul 2024 22:11:41 +0000
> Wathsala Vithanage <wathsala.vithanage@arm.com> wrote:
> 
>> An application provides cache stashing hints to the ethernet devices to
>> improve memory access latencies from the CPU and the NIC. This patch
>> introduces three distinct hints for this purpose.
>> 
>> The RTE_ETH_DEV_STASH_HINT_HOST_WILLNEED hint indicates that the host
>> (CPU) requires the data written by the NIC immediately. This implies
>> that the CPU expects to read data from its local cache rather than LLC
>> or main memory if possible. This would improve memory access latency in
>> the Rx path. For PCI devices with TPH capability, these hints translate
>> into DWHR (Device Writes Host Reads) access pattern. This hint is only
>> valid for receive queues.
>> 
>> The RTE_ETH_DEV_STASH_HINT_BI_DIR_DATA hint indicates that the host and
>> the device access the data structure equally. Rx/Tx queue descriptors
>> fit the description of such data. This hint applies to both Rx and Tx
>> directions.  In the PCI TPH context, this hint translates into a
>> Bi-Directional access pattern.
>> 
>> RTE_ETH_DEV_STASH_HINT_DEV_ONLY hint indicates that the CPU is not
>> involved in a given device's receive or transmit paths. This implies
>> that only devices are involved in the IO path. Depending on the
>> implementation, this hint may result in data getting placed in a cache
>> close to the device or not cached at all. For PCI devices with TPH
>> capability, this hint translates into D*D* (DWDR, DRDW, DWDW, DRDR)
>> access patterns. This is a bidirectional hint, and it can be applied to
>> both Rx and Tx queues.  
>> 
>> The RTE_ETH_DEV_STASH_HINT_HOST_DONTNEED hint indicates that the device
>> reads data written by the host (CPU) that may still be in the host's
>> local cache but is not required by the host anytime soon. This hint is
>> intended to prevent unnecessary cache invalidations that cause
>> interconnect latencies when a device writes to a buffer already in host
>> cache memory. In DPDK, this could happen with the recycling of mbufs
>> where a mbuf is placed in the Tx queue that then gets back into mempool
>> and gets recycled back into the Rx queue, all while a copy is being held
>> in the CPU's local cache unnecessarily. By using this hint on supported
>> platforms, the mbuf will be invalidated after the device completes the
>> buffer reading, but it will be well before the buffer gets recycled and
>> updated in the Rx path. This hint is only valid for transmit queues. 
>> 
>> Applications use three main interfaces in the ethdev library to discover
>> and set cache stashing hints. rte_eth_dev_stashing_hints_tx interface is
>> used to set hints on a Tx queue. rte_eth_dev_stashing_hints_rx interface
>> is used to set hints on an Rx queue. Both of these functions take the
>> following parameters as inputs: a port_id (the id of the ethernet
>> device), a cpu_id (the target CPU), a cache_level (the level of the
>> cache hierarchy the data should be stashed into), a queue_id (the queue
>> the hints are applied to). In addition to the above list of parameters,
>> a type parameter indicates the type of the object the application
>> expects to be stashed by the hardware. Depending on the hardware, these
>> may vary. Intel E810 NICs support the stashing of Rx/Tx descriptors,
>> packet headers, and packet payloads. These are indicated by the macros
>> RTE_ETH_DEV_STASH_TYPE_DESC, RTE_ETH_DEV_STASH_TYPE_HEADER,
>> RTE_ETH_DEV_STASH_TYPE_PAYLOAD. Hardware capable of stashing data at any
>> given offset into a packet can use the RTE_ETH_DEV_STASH_TYPE_OFFSET
>> type. When an offset is used, the offset parameter in the above two
>> functions should be set appropriately.
>> 
>> rte_eth_dev_stashing_hints_discover is used to discover the object types
>> and hints supported in the platform and the device. The function takes
>> types and hints pointers used as a bit vector to indicate hints and
>> types supported by the NIC. An application that intends to use stashing
>> hints should first discover supported hints and types and then use the
>> functions rte_eth_dev_stashing_hints_tx and
>> rte_eth_dev_stashing_hints_rx as required to set stashing hints
>> accordingly. eth_dev_ops structure has been updated with two new ops
>> that a PMD should implement to support cache stashing hints. A PMD that
>> intends to support cache stashing hints should initialize the
>> set_stashing_hints function pointer to a function that issues hints to
>> the underlying hardware in compliance with platform capabilities. The
>> same PMD should also implement a function that can return two-bit fields
>> indicating supported types and hints and then initialize the
>> discover_stashing_hints function pointer with it. If the NIC supports
>> cache stashing hints, the NIC should always set the
>> RTE_ETH_DEV_CAPA_CACHE_STASHING device capability.
>> 
>> Signed-off-by: Wathsala Vithanage <wathsala.vithanage@arm.com>
>> Reviewed-by: Dhruv Tripathi <dhruv.tripathi@arm.com>
> 
> My initial reaction is negative on this. The DPDK does not need more nerd knobs
> for performance. If it is a performance win, it should be automatic and handled
> by the driver.
> 
IMO, DPDK provides low level APIs and they should provide flexibility for users to control what part of the data from NIC is stashed where. For ex: currently available systems across multiple architectures provide system wide configuration to control stashing data from the NIC to system cache. The configuration allows for all the data from NIC to be stated or none. Whereas some applications need access to just the headers and some others need access to all the packet data. 

> If you absolutely have to have another flag, then it should be in existing config
> (yes, extend the ABI) rather than adding more flags and calls in ethdev.
Agree. Extending the ABI would result in a better solution rather than another set of APIs.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC v2] ethdev: an API for cache stashing hints
  2024-07-15 22:11 [RFC v2] ethdev: an API for cache stashing hints Wathsala Vithanage
  2024-07-17  2:27 ` Stephen Hemminger
  2024-07-17 10:32 ` Konstantin Ananyev
@ 2024-07-22 11:18 ` Ferruh Yigit
  2024-07-26 20:01   ` Wathsala Wathawana Vithanage
  2024-10-21  1:52 ` [RFC v3 0/2] An API for Stashing Packets into CPU caches Wathsala Vithanage
  2024-10-23 17:59 ` [RFC v2] ethdev: an API for cache stashing hints Mattias Rönnblom
  4 siblings, 1 reply; 27+ messages in thread
From: Ferruh Yigit @ 2024-07-22 11:18 UTC (permalink / raw)
  To: Wathsala Vithanage, dev, Thomas Monjalon, Andrew Rybchenko
  Cc: nd, Dhruv Tripathi

On 7/15/2024 11:11 PM, Wathsala Vithanage wrote:
> An application provides cache stashing hints to the ethernet devices to
> improve memory access latencies from the CPU and the NIC. This patch
> introduces three distinct hints for this purpose.
> 
> The RTE_ETH_DEV_STASH_HINT_HOST_WILLNEED hint indicates that the host
> (CPU) requires the data written by the NIC immediately. This implies
> that the CPU expects to read data from its local cache rather than LLC
> or main memory if possible. This would improve memory access latency in
> the Rx path. For PCI devices with TPH capability, these hints translate
> into DWHR (Device Writes Host Reads) access pattern. This hint is only
> valid for receive queues.
> 
> The RTE_ETH_DEV_STASH_HINT_BI_DIR_DATA hint indicates that the host and
> the device access the data structure equally. Rx/Tx queue descriptors
> fit the description of such data. This hint applies to both Rx and Tx
> directions.  In the PCI TPH context, this hint translates into a
> Bi-Directional access pattern.
> 
> RTE_ETH_DEV_STASH_HINT_DEV_ONLY hint indicates that the CPU is not
> involved in a given device's receive or transmit paths. This implies
> that only devices are involved in the IO path. Depending on the
> implementation, this hint may result in data getting placed in a cache
> close to the device or not cached at all. For PCI devices with TPH
> capability, this hint translates into D*D* (DWDR, DRDW, DWDW, DRDR)
> access patterns. This is a bidirectional hint, and it can be applied to
> both Rx and Tx queues.  
> 
> The RTE_ETH_DEV_STASH_HINT_HOST_DONTNEED hint indicates that the device
> reads data written by the host (CPU) that may still be in the host's
> local cache but is not required by the host anytime soon. This hint is
> intended to prevent unnecessary cache invalidations that cause
> interconnect latencies when a device writes to a buffer already in host
> cache memory. In DPDK, this could happen with the recycling of mbufs
> where a mbuf is placed in the Tx queue that then gets back into mempool
> and gets recycled back into the Rx queue, all while a copy is being held
> in the CPU's local cache unnecessarily. By using this hint on supported
> platforms, the mbuf will be invalidated after the device completes the
> buffer reading, but it will be well before the buffer gets recycled and
> updated in the Rx path. This hint is only valid for transmit queues. 
> 
> Applications use three main interfaces in the ethdev library to discover
> and set cache stashing hints. rte_eth_dev_stashing_hints_tx interface is
> used to set hints on a Tx queue. rte_eth_dev_stashing_hints_rx interface
> is used to set hints on an Rx queue. Both of these functions take the
> following parameters as inputs: a port_id (the id of the ethernet
> device), a cpu_id (the target CPU), a cache_level (the level of the
> cache hierarchy the data should be stashed into), a queue_id (the queue
> the hints are applied to). In addition to the above list of parameters,
> a type parameter indicates the type of the object the application
> expects to be stashed by the hardware. Depending on the hardware, these
> may vary. Intel E810 NICs support the stashing of Rx/Tx descriptors,
> packet headers, and packet payloads. These are indicated by the macros
> RTE_ETH_DEV_STASH_TYPE_DESC, RTE_ETH_DEV_STASH_TYPE_HEADER,
> RTE_ETH_DEV_STASH_TYPE_PAYLOAD. Hardware capable of stashing data at any
> given offset into a packet can use the RTE_ETH_DEV_STASH_TYPE_OFFSET
> type. When an offset is used, the offset parameter in the above two
> functions should be set appropriately.
> 
> rte_eth_dev_stashing_hints_discover is used to discover the object types
> and hints supported in the platform and the device. The function takes
> types and hints pointers used as a bit vector to indicate hints and
> types supported by the NIC. An application that intends to use stashing
> hints should first discover supported hints and types and then use the
> functions rte_eth_dev_stashing_hints_tx and
> rte_eth_dev_stashing_hints_rx as required to set stashing hints
> accordingly. eth_dev_ops structure has been updated with two new ops
> that a PMD should implement to support cache stashing hints. A PMD that
> intends to support cache stashing hints should initialize the
> set_stashing_hints function pointer to a function that issues hints to
> the underlying hardware in compliance with platform capabilities. The
> same PMD should also implement a function that can return two-bit fields
> indicating supported types and hints and then initialize the
> discover_stashing_hints function pointer with it. If the NIC supports
> cache stashing hints, the NIC should always set the
> RTE_ETH_DEV_CAPA_CACHE_STASHING device capability.
> 
> Signed-off-by: Wathsala Vithanage <wathsala.vithanage@arm.com>
> Reviewed-by: Dhruv Tripathi <dhruv.tripathi@arm.com>
> 

This is a fine grained config for performance improvement, it may help
to see the performance impact and driver implementation complexity,
before deciding how practical it is.
As ethdev API, as long as it is separate set of APIs, I don't see much
concern to have them.

In existing FEC APIs, and now in the speed lane patchset [1], we are
following similar design with three APIs:
rte_eth_X_set()
rte_eth_X_get()
rte_eth_X_get_capability()

Instead of adding RTE_ETH_DEV_CAPA_ macro and contaminating
'rte_eth_dev_info' with this edge use case, what do you think follow
above design and have dedicated get capability API?

And I can see set() has two different APIs,
'rte_eth_dev_stashing_hints_rx' & 'rte_eth_dev_stashing_hints_tx', is
there a reason to have two separate APIs instead of having one which
gets RX & TX as argument, as done in internal device ops?



[1]
https://patches.dpdk.org/project/dpdk/list/?series=32450&state=*




^ permalink raw reply	[flat|nested] 27+ messages in thread

* RE: [RFC v2] ethdev: an API for cache stashing hints
  2024-07-22 11:18 ` Ferruh Yigit
@ 2024-07-26 20:01   ` Wathsala Wathawana Vithanage
  2024-09-22 21:43     ` Ferruh Yigit
  0 siblings, 1 reply; 27+ messages in thread
From: Wathsala Wathawana Vithanage @ 2024-07-26 20:01 UTC (permalink / raw)
  To: Ferruh Yigit, dev, thomas, Andrew Rybchenko
  Cc: nd, Dhruv Tripathi, Honnappa Nagarahalli, nd

> rte_eth_X_get_capability()
> 

rte_eth_dev_stashing_hints_discover is somewhat similar.

> Instead of adding RTE_ETH_DEV_CAPA_ macro and contaminating
> 'rte_eth_dev_info' with this edge use case, what do you think follow above
> design and have dedicated get capability API?

I think it's better to have a dedicated API, given that we already have a fine
grained capabilities discovery function. I will add this feedback to V3 of the
RFC.

> 
> And I can see set() has two different APIs, 'rte_eth_dev_stashing_hints_rx' &
> 'rte_eth_dev_stashing_hints_tx', is there a reason to have two separate APIs
> instead of having one which gets RX & TX as argument, as done in internal
> device ops?

Some types/hints may only apply to a single queue direction, so I thought it
would be better to separate them out into separate Rx and Tx APIs for ease
of comprehension/use for the developer.
In fact, underneath, it uses one API for both Rx and Tx.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC v2] ethdev: an API for cache stashing hints
  2024-07-26 20:01   ` Wathsala Wathawana Vithanage
@ 2024-09-22 21:43     ` Ferruh Yigit
  2024-10-04 17:52       ` Stephen Hemminger
  0 siblings, 1 reply; 27+ messages in thread
From: Ferruh Yigit @ 2024-09-22 21:43 UTC (permalink / raw)
  To: Wathsala Wathawana Vithanage, dev, thomas, Andrew Rybchenko
  Cc: nd, Dhruv Tripathi, Honnappa Nagarahalli, Varghese, Vipin

On 7/26/2024 9:01 PM, Wathsala Wathawana Vithanage wrote:
>> rte_eth_X_get_capability()
>>
> 
> rte_eth_dev_stashing_hints_discover is somewhat similar.
> 
>> Instead of adding RTE_ETH_DEV_CAPA_ macro and contaminating
>> 'rte_eth_dev_info' with this edge use case, what do you think follow above
>> design and have dedicated get capability API?
> 
> I think it's better to have a dedicated API, given that we already have a fine
> grained capabilities discovery function. I will add this feedback to V3 of the
> RFC.
> 
>>
>> And I can see set() has two different APIs, 'rte_eth_dev_stashing_hints_rx' &
>> 'rte_eth_dev_stashing_hints_tx', is there a reason to have two separate APIs
>> instead of having one which gets RX & TX as argument, as done in internal
>> device ops?
> 
> Some types/hints may only apply to a single queue direction, so I thought it
> would be better to separate them out into separate Rx and Tx APIs for ease
> of comprehension/use for the developer.
> In fact, underneath, it uses one API for both Rx and Tx.
> 

Hi Wathsala,

Do you still pursue this RFC, should we expect a new version for this
release?

Did you have any change to measure the impact of the changes in this patch?


Btw, do you think the LLC aware lcore selection patch [1] can be
relevant or can it help for the cases this patch addresses?

[1]
https://patches.dpdk.org/project/dpdk/list/?series=32851

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC v2] ethdev: an API for cache stashing hints
  2024-09-22 21:43     ` Ferruh Yigit
@ 2024-10-04 17:52       ` Stephen Hemminger
  2024-10-04 18:46         ` Wathsala Wathawana Vithanage
  0 siblings, 1 reply; 27+ messages in thread
From: Stephen Hemminger @ 2024-10-04 17:52 UTC (permalink / raw)
  To: Ferruh Yigit
  Cc: Wathsala Wathawana Vithanage, dev, thomas, Andrew Rybchenko, nd,
	Dhruv Tripathi, Honnappa Nagarahalli, Varghese, Vipin

On Sun, 22 Sep 2024 22:43:55 +0100
Ferruh Yigit <ferruh.yigit@amd.com> wrote:

> >>
> >> And I can see set() has two different APIs, 'rte_eth_dev_stashing_hints_rx' &
> >> 'rte_eth_dev_stashing_hints_tx', is there a reason to have two separate APIs
> >> instead of having one which gets RX & TX as argument, as done in internal
> >> device ops?  
> > 
> > Some types/hints may only apply to a single queue direction, so I thought it
> > would be better to separate them out into separate Rx and Tx APIs for ease
> > of comprehension/use for the developer.
> > In fact, underneath, it uses one API for both Rx and Tx.
> >   
> 
> Hi Wathsala,
> 
> Do you still pursue this RFC, should we expect a new version for this
> release?
> 
> Did you have any change to measure the impact of the changes in this patch?
> 
> 
> Btw, do you think the LLC aware lcore selection patch [1] can be
> relevant or can it help for the cases this patch addresses?


Don't think this is ready for 24.11 release. The patch fails multiple
tests, has some doc issues and would need more exmaples and support.

Since it requires an ABI change, if you want to pursue it further
send a new version and maybe it will be ready for next ABI breaking release 25.11.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* RE: [RFC v2] ethdev: an API for cache stashing hints
  2024-10-04 17:52       ` Stephen Hemminger
@ 2024-10-04 18:46         ` Wathsala Wathawana Vithanage
  0 siblings, 0 replies; 27+ messages in thread
From: Wathsala Wathawana Vithanage @ 2024-10-04 18:46 UTC (permalink / raw)
  To: Stephen Hemminger, Ferruh Yigit
  Cc: dev, thomas, Andrew Rybchenko, nd, Dhruv Tripathi,
	Honnappa Nagarahalli, Varghese, Vipin, nd

> > >>
> > >> And I can see set() has two different APIs,
> > >> 'rte_eth_dev_stashing_hints_rx' & 'rte_eth_dev_stashing_hints_tx',
> > >> is there a reason to have two separate APIs instead of having one
> > >> which gets RX & TX as argument, as done in internal device ops?
> > >
> > > Some types/hints may only apply to a single queue direction, so I
> > > thought it would be better to separate them out into separate Rx and
> > > Tx APIs for ease of comprehension/use for the developer.
> > > In fact, underneath, it uses one API for both Rx and Tx.
> > >
> >
> > Hi Wathsala,
> >
> > Do you still pursue this RFC, should we expect a new version for this
> > release?
> >
> > Did you have any change to measure the impact of the changes in this patch?
> >
> >
> > Btw, do you think the LLC aware lcore selection patch [1] can be
> > relevant or can it help for the cases this patch addresses?
> 
> 
> Don't think this is ready for 24.11 release. The patch fails multiple tests, has
> some doc issues and would need more exmaples and support.
> 
> Since it requires an ABI change, if you want to pursue it further send a new
> version and maybe it will be ready for next ABI breaking release 25.11.

I agree, I don't think it will be ready for 24.11 release. I will be sending out another version of this
in the coming days.
As we discussed in the summit, this also requires some work in the kernel to make it functional.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [RFC v3 0/2] An API for Stashing Packets into CPU caches
  2024-07-15 22:11 [RFC v2] ethdev: an API for cache stashing hints Wathsala Vithanage
                   ` (2 preceding siblings ...)
  2024-07-22 11:18 ` Ferruh Yigit
@ 2024-10-21  1:52 ` Wathsala Vithanage
  2024-10-21  1:52   ` [RFC v3 1/2] pci: introduce the PCIe TLP Processing Hints API Wathsala Vithanage
                     ` (3 more replies)
  2024-10-23 17:59 ` [RFC v2] ethdev: an API for cache stashing hints Mattias Rönnblom
  4 siblings, 4 replies; 27+ messages in thread
From: Wathsala Vithanage @ 2024-10-21  1:52 UTC (permalink / raw)
  Cc: dev, nd, Wathsala Vithanage

DPDK applications benefit from Direct Cache Access (DCA) features like
Intel DDIO and Arm's write-allocate-to-SLC. However, those features do
not allow fine-grained control of direct cache access, such as stashing
packets into upper-level caches (L2 caches) of a processor or the shared
cache of a chiplet. PCIe TLP Processing Hints (TPH) addresses this need
in a vendor-agnostic manner. TPH capability has existed since
PCI Express Base Specification revision 3.0; today, numerous Network
Interface Cards and interconnects from different vendors support TPH
capability. TPH comprises a steering tag (ST) and a processing hint
(PH). ST specifies the cache level of a CPU at which the data should be
written to (or DCAed into), while PH is a hint provided by the PCIe
requester to the completer on an upcoming traffic pattern. Some NIC
vendors bundle TPH capability with fine-grained control over the type of
objects that can be stashed into CPU caches, such as

- Rx/Tx queue descriptors
- Packet-headers
- Packet-payloads
- Data from a given offset from the start of a packet

Note that stashable object types are outside the scope of PCIe standard;
therefore, vendors could support any combination of the above items as
they see fit.

To enable TPH and fine-grained packet stashing, this API extends the
ethdev library, PCI library, and the PCI driver. In this design, the
application via the ethdev stashing API provides hints to the PMD to
indicate the underlying hardware at which processor and cache level it
prefers a packet to end up. Once the PMD receives a CPU and a
cache-level combination, it must extract the matching ST from the TPH
ACPI _DSM of the PCIe root port to which the NIC is connected. To
facilitate the extraction of STs, the PCI library and the PCI driver
APIs are extended.

PMD's implementation of eth_dev_ops stashing_rx_hints_set and
stashing_tx_hints_set function pointers are responsible for extracting
the ST. The PCI bus driver provides the generic TPH ST extraction API
that can be used by any PMD that drives a PCIe device. The extraction
process begins by calling rte_pci_extract_tph_st() function in
drivers/bus/pci/rte_bus_pci.h, which takes an initialized input object
rte_tph_acpi__dsm_args and a pointer to rte_tph_acpi__dsm_return to
store the ST returned by the TPH _DSM. rte_tph_acpi__dsm_arg and
rte_tph_acpi__dsm_return objects are defined in lib/pci/rte_pci_tph.h as
defined by the PCIe firmware specification and the associated ECN titled
"Revised _DSM for Cache Locality TPH Features". The helper function
rte_init_tph_acpi__dsm_args is used by the rte_pci_extract_tph_st() to
convert lcore_id and cache_level provided by the PMD into well-formatted
rte_tph_acpi__dsm_args. The processor or, in some cases, a container ID
(which is synonymous with a core complex of a chiplet die) and the cache
level in the rte_tph_acpi__dsm_args structure are not the same as the
lcore_id and the cache_level provided by the application to the ethdev
library, which PMD passes down to the rte_pci_extract_st() function. The
rte_init_tph_acpi__dsm_args helper converts lcore_id to an APIC
processor-id or a PPTT processor-container-id if the container of the
lcore_id was requested as the target by the application. Similarly, it
must convert cache_level to a PPTT cache-reference-id. These conversions
are possible with the hwloc library or some other library DPDK may
eventually provide. However, DPDK cannot execute the TPH _DSM directly,
as it can only be done with kernel privileges. Therefore, appropriate
mechanisms must be established in supported Operating Systems(Linux,
FreeBSD, and Windows) to expose the _DSM return for a given argument.
For instance, on Linux, this mechanism could be sysfs. Therefore, the
implementation of rte_pci_extract_tph_st() is done in OS-specific files
drivers/bus/pci/{bsd, linux, windows}/pci.c.

Once the ST is acquired from the OS-specific method described earlier,
the stashing_rx_hints_set/stashing_tx_hints_set PMD implementations are
ready to set the ST. As per PCIe specification, hints can be put on the
MSI-X tables or using a device-specific method. Considering this, many
NICs that support TPH allow setting steering tags and processing hints
on the device's MSI-X table and queue contexts. For PMDs, setting the ST
on queue contexts is the only viable method of using TPH. Therefore, the
DPDK can only support setting ST in queue contexts. An application uses
the cache stashing ethdev API by first calling the
rte_eth_dev_stashing_capabilities_get() function to find out what object
types can be stashed into a processor cache by the NIC out of the object
types in the bulleted list above. This function takes a port_id and a
pointer to a uint16_t to report back the object type flags. PMD
implements the stashing_capabilities_get function pointer in
eth_dev_ops. If the underlying platform or the NIC does not support TPH,
this function returns -ENOTSUP and the application should consider any
values stored in the objects pointer invalid.

Once the application knows the supported object types that can be
stashed, the next step is to set the steering tags for the packets
associated with Rx and Tx queues via
rte_eth_dev_stashing_rx_config_set() and
rte_eth_dev_stashing_tx_config_set() ethdev library function
respectively. These functions execute the  rte_pci_extract_tph_st() via
eth_dev_ops pointers stashing_rx_hints_set and stashing_tx_hints_set.
Both the functions have an identical signature, a port_id, a queue_id,
and a config object. The port_id and the queue-id are used to locate the
device and the queue. The config object is of type struct
rte_eth_stashing_config, which specifies the lcore_id and the
cache_level, indicating where objects from this queue should be stashed.
It also has the field 'container' to indicate if the target should be
the container of the processor specified by the lcore_id in a
chiplet-based SoC. The 'objects' field in the config sets the types of
objects the application wishes to stash based on the capabilities found
earlier. If the objects field includes the flag
RTE_ETH_DEV_STASH_OBJECT_OFFSET, the 'offset' field must be used to set
the desired offset. These functions invoke PMD implementations of the
stashing functionality via stashing_rx_hints_set and
stashing_tx_hints_set, function pointers in eth_dev_ops, respectively.


Wathsala Vithanage (2):
  pci: introduce the PCIe TLP Processing Hints API
  ethdev: introduce the cache stashing hints API

 drivers/bus/pci/bsd/pci.c     |  12 +++
 drivers/bus/pci/linux/pci.c   |  12 +++
 drivers/bus/pci/rte_bus_pci.h |  22 +++++
 drivers/bus/pci/version.map   |   3 +
 drivers/bus/pci/windows/pci.c |  14 +++
 lib/ethdev/ethdev_driver.h    |  66 ++++++++++++++
 lib/ethdev/rte_ethdev.c       | 120 ++++++++++++++++++++++++++
 lib/ethdev/rte_ethdev.h       | 156 ++++++++++++++++++++++++++++++++++
 lib/ethdev/version.map        |   4 +
 lib/pci/meson.build           |   2 +
 lib/pci/rte_pci.h             |   2 +
 lib/pci/rte_pci_tph.c         |  20 +++++
 lib/pci/rte_pci_tph.h         | 111 ++++++++++++++++++++++++
 13 files changed, 544 insertions(+)
 create mode 100644 lib/pci/rte_pci_tph.c
 create mode 100644 lib/pci/rte_pci_tph.h

-- 
2.34.1


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [RFC v3 1/2] pci: introduce the PCIe TLP Processing Hints API
  2024-10-21  1:52 ` [RFC v3 0/2] An API for Stashing Packets into CPU caches Wathsala Vithanage
@ 2024-10-21  1:52   ` Wathsala Vithanage
  2024-10-21  1:52   ` [RFC v3 2/2] ethdev: introduce the cache stashing hints API Wathsala Vithanage
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 27+ messages in thread
From: Wathsala Vithanage @ 2024-10-21  1:52 UTC (permalink / raw)
  To: Chenbo Xia, Nipun Gupta, Gaetan Rivet
  Cc: dev, nd, Wathsala Vithanage, Honnappa Nagarahalli, Dhruv Tripathi

Extend the PCI driver and the library to extract the Steering Tag (ST)
for a given Processor/Processor Container and Cache ID pair and validate
a Processing Hint from a TPH _DSM associated with a root port device.
The rte_pci_device structure passed into the rte_pci_extract_tph_st()
function could be a device or a root port. If it's a device, the
function should trace it back to the root port and use its TPH _DSM to
extract STs. The implementation of rte_pci_extract_tph_st() is dependent
on the operating system.

rte_pci_extract_tph_st() should also be supplied with a
rte_tph_acpi__dsm_args, and a rte_tph_acpi__dsm_return structures.
These two structures are defined in the PCI library and comply with the
TPH _DSM argument and return encoding specified in the PCI firmware ECN
titled "Revised _DSM for Cache Locality TPH Features.". Use of
rte_init_tph_acpi__dsm_args() is recommended for initializing the
rte_tph_acpi__dsm_args struct which is capable of converting lcore ID,
the cache level into values understood by the ACPI _DSM function.
rte_tph_acpi__dsm_return struct will be initialized with the values
returned by the TPH _DSM; it is up to the caller to use these values per
the device's capabilities.

Signed-off-by: Wathsala Vithanage <wathsala.vithanage@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Dhruv Tripathi <dhruv.tripathi@arm.com>

---
 drivers/bus/pci/bsd/pci.c     |  12 ++++
 drivers/bus/pci/linux/pci.c   |  12 ++++
 drivers/bus/pci/rte_bus_pci.h |  22 +++++++
 drivers/bus/pci/version.map   |   3 +
 drivers/bus/pci/windows/pci.c |  14 +++++
 lib/pci/meson.build           |   2 +
 lib/pci/rte_pci.h             |   2 +
 lib/pci/rte_pci_tph.c         |  21 +++++++
 lib/pci/rte_pci_tph.h         | 111 ++++++++++++++++++++++++++++++++++
 9 files changed, 199 insertions(+)
 create mode 100644 lib/pci/rte_pci_tph.c
 create mode 100644 lib/pci/rte_pci_tph.h

diff --git a/drivers/bus/pci/bsd/pci.c b/drivers/bus/pci/bsd/pci.c
index 2f88252418..a143cecf45 100644
--- a/drivers/bus/pci/bsd/pci.c
+++ b/drivers/bus/pci/bsd/pci.c
@@ -639,3 +639,15 @@ rte_pci_ioport_unmap(struct rte_pci_ioport *p)
 
 	return ret;
 }
+
+int
+rte_pci_extract_tph_st(const struct rte_pci_device *dev,
+		       const struct rte_tph_acpi__dsm_args *args,
+		       struct rte_tph_acpi__dsm_return *ret)
+{
+	RTE_SET_USED(dev);
+	RTE_SET_USED(args);
+	RTE_SET_USED(ret);
+	/* BSD doesn't support this feature yet! */
+	return -1;
+}
diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c
index 9056035b33..dffb945462 100644
--- a/drivers/bus/pci/linux/pci.c
+++ b/drivers/bus/pci/linux/pci.c
@@ -803,3 +803,15 @@ rte_pci_ioport_unmap(struct rte_pci_ioport *p)
 
 	return ret;
 }
+
+int
+rte_pci_extract_tph_st(const struct rte_pci_device *dev,
+		       const struct rte_tph_acpi__dsm_args *args,
+		       struct rte_tph_acpi__dsm_return *ret)
+{
+	RTE_SET_USED(dev);
+	RTE_SET_USED(args);
+	RTE_SET_USED(ret);
+	/* Linux doesn't support this feature yet! */
+	return -1;
+}
diff --git a/drivers/bus/pci/rte_bus_pci.h b/drivers/bus/pci/rte_bus_pci.h
index 19a7b15b99..a8167e9b4b 100644
--- a/drivers/bus/pci/rte_bus_pci.h
+++ b/drivers/bus/pci/rte_bus_pci.h
@@ -312,6 +312,28 @@ void rte_pci_ioport_read(struct rte_pci_ioport *p,
 void rte_pci_ioport_write(struct rte_pci_ioport *p,
 		const void *data, size_t len, off_t offset);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Extract steering tag from the ACPI TPH _DSM of the root port
+ * of the device is connected to.
+ *
+ * @param device
+ *   A pointer to a rte_pci_device structure describing the device
+ *   to use.
+ * @param args
+ *   An initialized args object for the _DSM.
+ * @param ret
+ *   A pointer to a _DSM return object to store the extracted steering tag.
+ * @return
+ *   0 on success, -1 on error extracting the steeting tag.
+ */
+__rte_experimental
+int rte_pci_extract_tph_st(const struct rte_pci_device *device,
+			   const struct rte_tph_acpi__dsm_args *args,
+			   struct rte_tph_acpi__dsm_return *ret);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/drivers/bus/pci/version.map b/drivers/bus/pci/version.map
index cd653de5ac..5c89f80c8e 100644
--- a/drivers/bus/pci/version.map
+++ b/drivers/bus/pci/version.map
@@ -31,6 +31,9 @@ EXPERIMENTAL {
 	rte_pci_find_capability;
 	rte_pci_find_next_capability;
 	rte_pci_has_capability_list;
+
+	# added in 24.11
+	rte_pci_extract_tph_st;
 };
 
 INTERNAL {
diff --git a/drivers/bus/pci/windows/pci.c b/drivers/bus/pci/windows/pci.c
index 36e6f89093..761f714a18 100644
--- a/drivers/bus/pci/windows/pci.c
+++ b/drivers/bus/pci/windows/pci.c
@@ -500,3 +500,17 @@ rte_pci_scan(void)
 
 	return ret;
 }
+
+
+int
+rte_pci_extract_tph_st(const struct rte_pci_device *dev,
+		       const struct rte_tph_acpi__dsm_args *args,
+		       struct rte_tph_acpi__dsm_return *ret)
+{
+	RTE_SET_USED(dev);
+	RTE_SET_USED(args);
+	RTE_SET_USED(ret);
+	/* This feature is not yet implemented for windows */
+	return -1;
+}
+
diff --git a/lib/pci/meson.build b/lib/pci/meson.build
index dd41cd5068..85e17c4257 100644
--- a/lib/pci/meson.build
+++ b/lib/pci/meson.build
@@ -3,3 +3,5 @@
 
 sources = files('rte_pci.c')
 headers = files('rte_pci.h')
+headers = files('rte_pci_tph.h')
+headers = files('rte_pci_tph.c')
diff --git a/lib/pci/rte_pci.h b/lib/pci/rte_pci.h
index 9a50a12142..b7897640f1 100644
--- a/lib/pci/rte_pci.h
+++ b/lib/pci/rte_pci.h
@@ -16,6 +16,8 @@
 #include <inttypes.h>
 #include <sys/types.h>
 
+#include <rte_pci_tph.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
diff --git a/lib/pci/rte_pci_tph.c b/lib/pci/rte_pci_tph.c
new file mode 100644
index 0000000000..3b0c7d4d97
--- /dev/null
+++ b/lib/pci/rte_pci_tph.c
@@ -0,0 +1,21 @@
+#include <errno.h>
+#include <rte_pci_tph.h>
+
+int
+rte_init_tph_acpi__dsm_args(uint16_t lcore_id, uint8_t type,
+			    uint8_t cache_level, uint8_t ph,
+			    struct rte_tph_acpi__dsm_args *args)
+{
+	RTE_SET_USED(lcore_id);
+	RTE_SET_USED(type);
+	RTE_SET_USED(cache_level);
+	RTE_SET_USED(ph);
+
+	if (!args)
+		return -EINVAL;
+	/* Use libhwloc or other mechanism provided by DPDK to
+	 * map lcore_id and cache_level to hardware IDs for
+	 * initializing args.
+	 */
+	return -ENOTSUP;
+}
diff --git a/lib/pci/rte_pci_tph.h b/lib/pci/rte_pci_tph.h
new file mode 100644
index 0000000000..df851f5744
--- /dev/null
+++ b/lib/pci/rte_pci_tph.h
@@ -0,0 +1,111 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2024 Arm Ltd.
+ */
+
+#ifndef _RTE_PCI_TPH_H_
+#define _RTE_PCI_TPH_H_
+
+/**
+ * @file
+ *
+ * RTE PCI TLP Processing Hints helpers
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include <rte_common.h>
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this structure may change, or be removed, without prior notice
+ *
+ * ACPI TPH _DSM input args structure.
+ * Refer to PCI-SIG ECN "Revised _DSM for Cache Locality TPH Features" for details.
+ */
+struct rte_tph_acpi__dsm_args {
+	uint32_t feature_id; /**< Always 0. */
+	struct {
+		/** APIC/PPTT Processor/Processor container ID. */
+		uint32_t uid;
+	} __rte_packed featureArg1; /**< 1st Arg. */
+	struct {
+		/** Intended ph bits just for validating. */
+		uint64_t ph : 2;
+		/** If type=1 uid is Processor container ID. */
+		uint64_t type :  1;
+		/** cache_reference is valid if cache_ref_valid=1. */
+		uint64_t cache_ref_valid : 1;
+		uint64_t reserved : 28;
+		/** PPTT cache ID of the desired target. */
+		uint64_t cache_refernce : 32;
+	} __rte_packed featureArg2; /**< 2ns Arg. */
+} __rte_packed;
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this structure may change, or be removed, without prior notice
+ *
+ * ACPI TPH _DSM return structure.
+ * Refer to PCI-SIG ECN "Revised _DSM for Cache Locality TPH Features" for details.
+ */
+struct rte_tph_acpi__dsm_return {
+	uint64_t vmem_st_valid : 1; /**< if set to 1, vmem_st (8-bit ST) is valid. */
+	/** if set to 1, vmem_ext_st (16-bit vmem ST) is valid. */
+	uint64_t vmem_ext_st_valid : 1;
+	/** if set to 1, ph bits in input args is valid. */
+	uint64_t vmem_ph_ignore : 1;
+	uint64_t reserved_1 : 5;
+	/** 8-bit volatile memory ST) */
+	uint64_t vmem_st : 8;
+	/** 16-bit volatile ST) */
+	uint64_t vmem_ext_st : 16;
+	uint64_t pmem_st_valid : 1;  /**< if set to 1, pmem_st (8-bit ST) is valid. */
+	/** if set to 1, pmem_ext_st (16-bit ST) is valid. */
+	uint64_t pmem_ext_st_valid : 1;
+	/** if set to 1, ph bits in input args are valid for persistent memory. */
+	uint64_t pmem_ph_ignore : 1;
+	uint64_t reserved_2 : 5;
+	/** 8-bit persistent memory ST) */
+	uint64_t pmem_st : 8;
+	/** 16-bit persistent memory ST) */
+	uint64_t pmem_ext_st : 16;
+} __rte_packed;
+
+
+/**
+ *
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Initializes stashing hints configuration with a platform specific stashing hint
+ * that matches the lcore_id and cache_level.
+ *
+ * @param lcore_id
+ *  The lcore_id of the processor of the cache stashing target. If is_container is set
+ *  the target is the processor container of the CPU specified by the lcore_id.
+ * @param type
+ *  If set to 1, the procssor container of the processor specified by lcore_id will be
+ *  used at the stashing target. If set to 0, processor specified by the lcore_id will be
+ *  used as the stashing target.
+ * @param cache_level
+ *  The cache level of the processor/container specified by the lcore_id.
+ * @param ph
+ *  TPH Processing Hints bits.
+ * @param args
+ *  ACPI TPH _DSM object arguments structure.
+ * @return
+ *  - (0) on Success.
+ *  - 0 < or 0 > on Failure.
+ */
+
+int rte_init_tph_acpi__dsm_args(uint16_t lcore_id, uint8_t type,
+				uint8_t cache_level, uint8_t ph,
+				struct rte_tph_acpi__dsm_args *args);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_PCI_TPH_H_ */
-- 
2.34.1


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [RFC v3 2/2] ethdev: introduce the cache stashing hints API
  2024-10-21  1:52 ` [RFC v3 0/2] An API for Stashing Packets into CPU caches Wathsala Vithanage
  2024-10-21  1:52   ` [RFC v3 1/2] pci: introduce the PCIe TLP Processing Hints API Wathsala Vithanage
@ 2024-10-21  1:52   ` Wathsala Vithanage
  2024-10-21  7:36     ` Morten Brørup
  2024-10-24  5:49     ` Jerin Jacob
  2024-10-21  7:35   ` [RFC v3 0/2] An API for Stashing Packets into CPU caches Chenbo Xia
  2024-10-22  1:12   ` Stephen Hemminger
  3 siblings, 2 replies; 27+ messages in thread
From: Wathsala Vithanage @ 2024-10-21  1:52 UTC (permalink / raw)
  To: Thomas Monjalon, Ferruh Yigit, Andrew Rybchenko
  Cc: dev, nd, Wathsala Vithanage, Honnappa Nagarahalli, Dhruv Tripathi

Extend the ethdev library to enable the stashing of different data
objects, such as the ones listed below, into CPU caches directly
from the NIC.

- Rx/Tx queue descriptors
- Rx packets
- Packet headers
- packet payloads
- Data of a packet at an offset from the start of the packet

The APIs are designed in a hardware/vendor agnostic manner such that
supporting PMDs could use any capabilities available in the underlying
hardware for fine-grained stashing of data objects into a CPU cache
(e.g., Steering Tags int PCIe TLP Processing Hints).

The API provides an interface to query the availability of stashing
capabilities, i.e., platform/NIC support, stashable object types, etc,
via the rte_eth_dev_stashing_capabilities_get interface.

The function pair rte_eth_dev_stashing_rx_config_set and
rte_eth_dev_stashing_tx_config_set sets the stashing hint (the CPU, 
cache level, and data object types) on the Rx and Tx queues.

PMDs that support stashing must register their implementations with the
following eth_dev_ops callbacks, which are invoked by the ethdev
functions listed above.

- stashing_capabilities_get
- stashing_rx_hints_set
- stashing_tx_hints_set

Signed-off-by: Wathsala Vithanage <wathsala.vithanage@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Dhruv Tripathi <dhruv.tripathi@arm.com>

---
 lib/ethdev/ethdev_driver.h |  66 +++++++++++++++
 lib/ethdev/rte_ethdev.c    | 120 +++++++++++++++++++++++++++
 lib/ethdev/rte_ethdev.h    | 161 +++++++++++++++++++++++++++++++++++++
 lib/ethdev/version.map     |   4 +
 4 files changed, 351 insertions(+)

diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
index 1fd4562b40..7caaea54a8 100644
--- a/lib/ethdev/ethdev_driver.h
+++ b/lib/ethdev/ethdev_driver.h
@@ -1367,6 +1367,68 @@ enum rte_eth_dev_operation {
 typedef uint64_t (*eth_get_restore_flags_t)(struct rte_eth_dev *dev,
 					    enum rte_eth_dev_operation op);
 
+/**
+ * @internal
+ * Set cache stashing hints in Rx queue.
+ *
+ * @param dev
+ *   Port (ethdev) handle.
+ * @param queue_id
+ *   Rx queue.
+ * @param config
+ *   Stashing hints configuration for the queue.
+ *
+ * @return
+ *   -ENOTSUP if the device or the platform does not support cache stashing.
+ *   -ENOSYS  if the underlying PMD hasn't implemented cache stashing feature.
+ *   -EINVAL  on invalid arguments.
+ *   0 on success.
+ */
+typedef int (*eth_stashing_rx_hints_set_t)(struct rte_eth_dev *dev, uint16_t queue_id,
+					   struct rte_eth_stashing_config *config);
+
+/**
+ * @internal
+ * Set cache stashing hints in Tx queue.
+ *
+ * @param dev
+ *   Port (ethdev) handle.
+ * @param queue_id
+ *   Tx queue.
+ * @param config
+ *   Stashing hints configuration for the queue.
+ *
+ * @return
+ *   -ENOTSUP if the device or the platform does not support cache stashing.
+ *   -ENOSYS  if the underlying PMD hasn't implemented cache stashing feature.
+ *   -EINVAL  on invalid arguments.
+ *   0 on success.
+ */
+typedef int (*eth_stashing_tx_hints_set_t)(struct rte_eth_dev *dev, uint16_t queue_id,
+					   struct rte_eth_stashing_config *config);
+
+/**
+ * @internal
+ * Get cache stashing object types supported in the ethernet device.
+ * The return value indicates availability of stashing hints support
+ * in the hardware and the PMD.
+ *
+ * @param dev
+ *   Port (ethdev) handle.
+ * @param objects
+ *   PMD sets supported bits on return.
+ *
+ * @return
+ *   -ENOTSUP if the device or the platform does not support cache stashing.
+ *   -ENOSYS  if the underlying PMD hasn't implemented cache stashing feature.
+ *   -EINVAL  on NULL values for types or hints parameters.
+ *   On return, types and hints parameters will have bits set for supported
+ *   object types and hints.
+ *   0 on success.
+ */
+typedef int (*eth_stashing_capabilities_get_t)(struct rte_eth_dev *dev,
+					     uint16_t *objects);
+
 /**
  * @internal A structure containing the functions exported by an Ethernet driver.
  */
@@ -1393,6 +1455,10 @@ struct eth_dev_ops {
 	eth_mac_addr_remove_t      mac_addr_remove; /**< Remove MAC address */
 	eth_mac_addr_add_t         mac_addr_add;  /**< Add a MAC address */
 	eth_mac_addr_set_t         mac_addr_set;  /**< Set a MAC address */
+	eth_stashing_rx_hints_set_t   stashing_rx_hints_set; /**< Set Rx cache stashing*/
+	eth_stashing_tx_hints_set_t   stashing_tx_hints_set; /**< Set Tx cache stashing*/
+	/** Get supported stashing hints*/
+	eth_stashing_capabilities_get_t stashing_capabilities_get;
 	/** Set list of multicast addresses */
 	eth_set_mc_addr_list_t     set_mc_addr_list;
 	mtu_set_t                  mtu_set;       /**< Set MTU */
diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
index 6413c54e3b..d9bcc6c13d 100644
--- a/lib/ethdev/rte_ethdev.c
+++ b/lib/ethdev/rte_ethdev.c
@@ -153,6 +153,7 @@ static const struct {
 	{RTE_ETH_DEV_CAPA_RXQ_SHARE, "RXQ_SHARE"},
 	{RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP, "FLOW_RULE_KEEP"},
 	{RTE_ETH_DEV_CAPA_FLOW_SHARED_OBJECT_KEEP, "FLOW_SHARED_OBJECT_KEEP"},
+	{RTE_ETH_DEV_CAPA_CACHE_STASHING, "CACHE_STASHING"},
 };
 
 enum {
@@ -7163,4 +7164,123 @@ int rte_eth_dev_map_aggr_tx_affinity(uint16_t port_id, uint16_t tx_queue_id,
 	return ret;
 }
 
+int
+rte_eth_dev_validate_stashing_config(uint16_t port_id, uint16_t queue_id,
+				     uint8_t queue_direction,
+				     struct rte_eth_stashing_config *config)
+{
+	struct rte_eth_dev *dev;
+	struct rte_eth_dev_info dev_info;
+	uint16_t nb_queues;
+
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
+
+	if (!config) {
+		RTE_ETHDEV_LOG_LINE(ERR, "Invalid stashing configuration");
+		return -EINVAL;
+	}
+
+	/*
+	 * Check for invalid objects
+	 */
+	if (!RTE_ETH_DEV_STASH_OBJECTS_VALID(config->objects)) {
+		RTE_ETHDEV_LOG_LINE(ERR, "Invalid stashing objects");
+		return -EINVAL;
+	}
+
+	dev = &rte_eth_devices[port_id];
+
+	nb_queues = (queue_direction == RTE_ETH_DEV_RX_QUEUE) ?
+				      dev->data->nb_rx_queues :
+				      dev->data->nb_tx_queues;
+
+	if (queue_id >= nb_queues) {
+		RTE_ETHDEV_LOG_LINE(ERR, "Invalid Rx queue_id=%u", queue_id);
+		return -EINVAL;
+	}
+
+	rte_eth_dev_info_get(port_id, &dev_info);
+
+	if ((dev_info.dev_capa & RTE_ETH_DEV_CAPA_CACHE_STASHING) !=
+	    RTE_ETH_DEV_CAPA_CACHE_STASHING)
+		return -ENOTSUP;
+
+	if (*dev->dev_ops->stashing_rx_hints_set == NULL ||
+	    *dev->dev_ops->stashing_tx_hints_set == NULL) {
+		RTE_ETHDEV_LOG_LINE(ERR, "Stashing hints are not implemented "
+				    "in %s for %s", dev_info.driver_name,
+				    dev_info.device->name);
+		return -ENOSYS;
+	}
+
+	return 0;
+}
+
+int
+rte_eth_dev_stashing_rx_config_set(uint16_t port_id, uint16_t queue_id,
+				   struct rte_eth_stashing_config *config)
+{
+	struct rte_eth_dev *dev;
+
+	int ret = rte_eth_dev_validate_stashing_config(port_id, queue_id,
+						      RTE_ETH_DEV_RX_QUEUE,
+						      config);
+	if (ret < 0)
+		return ret;
+
+	dev = &rte_eth_devices[port_id];
+
+	return eth_err(port_id,
+		       (*dev->dev_ops->stashing_rx_hints_set)(dev, queue_id,
+		       config));
+}
+
+int
+rte_eth_dev_stashing_tx_config_set(uint16_t port_id, uint16_t queue_id,
+				   struct rte_eth_stashing_config *config)
+{
+	struct rte_eth_dev *dev;
+
+	int ret = rte_eth_dev_validate_stashing_config(port_id, queue_id,
+						      RTE_ETH_DEV_TX_QUEUE,
+						      config);
+	if (ret < 0)
+		return ret;
+
+	dev = &rte_eth_devices[port_id];
+
+	return eth_err(port_id,
+		       (*dev->dev_ops->stashing_rx_hints_set) (dev, queue_id,
+		       config));
+}
+
+int
+rte_eth_dev_stashing_capabilities_get(uint16_t port_id, uint16_t *objects)
+{
+	struct rte_eth_dev *dev;
+	struct rte_eth_dev_info dev_info;
+
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
+
+	if (!objects)
+		return -EINVAL;
+
+	dev = &rte_eth_devices[port_id];
+	rte_eth_dev_info_get(port_id, &dev_info);
+
+	if ((dev_info.dev_capa & RTE_ETH_DEV_CAPA_CACHE_STASHING) !=
+	    RTE_ETH_DEV_CAPA_CACHE_STASHING)
+		return -ENOTSUP;
+
+	if (*dev->dev_ops->stashing_capabilities_get == NULL) {
+		RTE_ETHDEV_LOG_LINE(ERR, "Stashing hints are not implemented "
+				    "in %s for %s", dev_info.driver_name,
+				    dev_info.device->name);
+		return -ENOSYS;
+	}
+	return eth_err(port_id,
+		       (*dev->dev_ops->stashing_capabilities_get)
+		       (dev, objects));
+}
+
 RTE_LOG_REGISTER_DEFAULT(rte_eth_dev_logtype, INFO);
diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
index c4241d048c..c08f60ad4c 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -1653,6 +1653,9 @@ struct rte_eth_conf {
 #define RTE_ETH_DEV_CAPA_FLOW_SHARED_OBJECT_KEEP RTE_BIT64(4)
 /**@}*/
 
+/** Device supports stashing to CPU/system caches. */
+#define RTE_ETH_DEV_CAPA_CACHE_STASHING RTE_BIT64(5)
+
 /*
  * Fallback default preferred Rx/Tx port parameters.
  * These are used if an application requests default parameters
@@ -1824,6 +1827,7 @@ struct rte_eth_dev_info {
 	struct rte_eth_dev_portconf default_txportconf;
 	/** Generic device capabilities (RTE_ETH_DEV_CAPA_). */
 	uint64_t dev_capa;
+	uint16_t stashing_capa;
 	/**
 	 * Switching information for ports on a device with a
 	 * embedded managed interconnect/switch.
@@ -6115,6 +6119,163 @@ int rte_eth_cman_config_set(uint16_t port_id, const struct rte_eth_cman_config *
 __rte_experimental
 int rte_eth_cman_config_get(uint16_t port_id, struct rte_eth_cman_config *config);
 
+
+
+/** Queue type is RX. */
+#define RTE_ETH_DEV_RX_QUEUE		0
+/** Queue type is TX. */
+#define RTE_ETH_DEV_TX_QUEUE		1
+
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this structure may change, or be removed, without prior notice
+ *
+ * A structure used for configuring the cache stashing hints.
+ */
+struct rte_eth_stashing_config {
+	/** ID of the Processor/Container the stashing hints are
+	 *  applied to
+	 */
+	uint16_t	lcore_id;
+	/** Set if the target is a CPU containeri.lcore_id will be
+	 * used to derive container ID
+	 */
+	uint16_t	container : 1;
+	uint16_t	padding : 7;
+	/** Cache level of the CPU specified by the cpu_id the
+	 *  stashing hints are applied to
+	 */
+	uint16_t	cache_level : 8;
+	/** Object types the configuration is applied to
+	 */
+	uint16_t	objects;
+	/** The offset if RTE_ETH_DEV_STASH_OBJECT_OFFSET bit is set
+	 *  in objects
+	 */
+	off_t		offset;
+};
+
+/**@{@name Stashable Rx/Tx queue object types supported by the ethernet device
+ *@see rte_eth_dev_stashing_capabilities_get
+ *@see rte_eth_dev_stashing_rx_config_set
+ *@see rte_eth_dev_stashing_tx_config_set
+ */
+
+/**
+ * Apply stashing hint to data at a given offset from the start of a
+ * received packet.
+ */
+#define RTE_ETH_DEV_STASH_OBJECT_OFFSET		0x0001
+
+/** Apply stashing hint to an rx descriptor. */
+#define RTE_ETH_DEV_STASH_OBJECT_DESC		0x0002
+
+/** Apply stashing hint to a header of a received packet. */
+#define RTE_ETH_DEV_STASH_OBJECT_HEADER		0x0004
+
+/** Apply stashing hint to a payload of a received packet. */
+#define RTE_ETH_DEV_STASH_OBJECT_PAYLOAD	0x0008
+
+#define __RTE_ETH_DEV_STASH_OBJECT_MASK		0x000f
+/**@}*/
+
+#define RTE_ETH_DEV_STASH_OBJECTS_VALID(t)				\
+	((!((t) & (~__RTE_ETH_DEV_STASH_OBJECT_MASK))) && (t))
+
+/**
+ *
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * @internal
+ * Helper function to validate stashing hints configuration.
+ */
+__rte_experimental
+int rte_eth_dev_validate_stashing_config(uint16_t port_id, uint16_t queue_id,
+					 uint8_t queue_direction,
+					 struct rte_eth_stashing_config *config);
+
+/**
+ *
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Provide cache stashing hints for improved memory access latencies for
+ * packets received by the NIC.
+ * This feature is available only in supported NICs and platforms.
+ *
+ * @param port_id
+ *  The port identifier of the Ethernet device.
+ * @param queue_id
+ *  The index of the receive queue to which hints are applied.
+ * @param config
+ *  Stashing configuration.
+ * @return
+ *  - (-ENODEV) on incorrect port_ids.
+ *  - (-EINVAL) if both RX and TX object types used in conjuection in objects
+ *  parameter.
+ *  - (-EINVAL) on invalid queue_id.
+ *  - (-ENOTSUP) if RTE_ETH_DEV_CAPA_CACHE_STASHING capability is unavailable.
+ *  - (-ENOSYS) if PMD does not implement cache stashing hints.
+ *  - (0) on Success.
+ */
+__rte_experimental
+int rte_eth_dev_stashing_rx_config_set(uint16_t port_id, uint16_t queue_id,
+				   struct rte_eth_stashing_config *config);
+
+/**
+ *
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Configure cache stashing for improved memory access latencies for Tx
+ * queue completion descriptors being sent to host system by the NIC.
+ * This feature is available only in supported NICs and platforms.
+ *
+ * @param port_id
+ *  The port identifier of the Ethernet device.
+ * @param queue_id
+ *  The index of the receive queue to which hints are applied.
+ * @param config
+ *  Stashing configuration.
+ * @return
+ *  - (-ENODEV) on incorrect port_ids.
+ *  - (-EINVAL) if both RX and TX object types are used in conjuection in objects
+ *  parameter.
+ *  - (-EINVAL) if hints are incompatible with TX queues.
+ *  - (-EINVAL) on invalid queue_id.
+ *  - (-ENOTSUP) if RTE_ETH_DEV_CAPA_CACHE_STASHING capability is unavailable.
+ *  - (-ENOSYS) if PMD does not implement cache stashing hints.
+ *  - (0) on Success.
+ */
+__rte_experimental
+int rte_eth_dev_stashing_tx_config_set(uint16_t port_id, uint16_t queue_id,
+				   struct rte_eth_stashing_config *config);
+
+/**
+ *
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Discover cache stashing objects supported in the ethernet device.
+ *
+ * @param port_id
+ *  The port identifier of the Ethernet device.
+ * @param objects
+ *  Supported objects vector set by the ethernet device.
+ * @return
+ *  On return types and hints parameters will have bits set for supported
+ *  object types.
+ *  - (-ENOTSUP) if the device or the platform does not support cache stashing.
+ *  - (-ENOSYS)  if the underlying PMD hasn't implemented cache stashing
+ *  feature.
+ *  - (-EINVAL)  on NULL values for types or hints parameters.
+ *  - (0) on success.
+ */
+__rte_experimental
+int rte_eth_dev_stashing_capabilities_get(uint16_t port_id, uint16_t *objects);
+
 #include <rte_ethdev_core.h>
 
 #ifdef __cplusplus
diff --git a/lib/ethdev/version.map b/lib/ethdev/version.map
index 12f48c70a0..49c8c46a00 100644
--- a/lib/ethdev/version.map
+++ b/lib/ethdev/version.map
@@ -337,6 +337,10 @@ EXPERIMENTAL {
 	rte_eth_timesync_adjust_freq;
 	rte_flow_async_create_by_index_with_pattern;
 	rte_tm_node_query;
+	rte_eth_dev_stashing_rx_config_set;
+	rte_eth_dev_stashing_tx_config_set;
+	rte_eth_dev_stashing_capabilities_get;
+	rte_eth_dev_validate_stashing_config;
 };
 
 INTERNAL {
-- 
2.34.1


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC v3 0/2] An API for Stashing Packets into CPU caches
  2024-10-21  1:52 ` [RFC v3 0/2] An API for Stashing Packets into CPU caches Wathsala Vithanage
  2024-10-21  1:52   ` [RFC v3 1/2] pci: introduce the PCIe TLP Processing Hints API Wathsala Vithanage
  2024-10-21  1:52   ` [RFC v3 2/2] ethdev: introduce the cache stashing hints API Wathsala Vithanage
@ 2024-10-21  7:35   ` Chenbo Xia
  2024-10-21 12:01     ` Wathsala Wathawana Vithanage
  2024-10-22  1:12   ` Stephen Hemminger
  3 siblings, 1 reply; 27+ messages in thread
From: Chenbo Xia @ 2024-10-21  7:35 UTC (permalink / raw)
  To: Wathsala Vithanage; +Cc: dev, nd

Hi,

> On Oct 21, 2024, at 09:52, Wathsala Vithanage <wathsala.vithanage@arm.com> wrote:
> 
> External email: Use caution opening links or attachments
> 
> 
> DPDK applications benefit from Direct Cache Access (DCA) features like
> Intel DDIO and Arm's write-allocate-to-SLC. However, those features do
> not allow fine-grained control of direct cache access, such as stashing
> packets into upper-level caches (L2 caches) of a processor or the shared
> cache of a chiplet. PCIe TLP Processing Hints (TPH) addresses this need
> in a vendor-agnostic manner. TPH capability has existed since
> PCI Express Base Specification revision 3.0; today, numerous Network
> Interface Cards and interconnects from different vendors support TPH
> capability. TPH comprises a steering tag (ST) and a processing hint
> (PH). ST specifies the cache level of a CPU at which the data should be
> written to (or DCAed into), while PH is a hint provided by the PCIe
> requester to the completer on an upcoming traffic pattern. Some NIC
> vendors bundle TPH capability with fine-grained control over the type of
> objects that can be stashed into CPU caches, such as
> 
> - Rx/Tx queue descriptors
> - Packet-headers
> - Packet-payloads
> - Data from a given offset from the start of a packet
> 
> Note that stashable object types are outside the scope of PCIe standard;
> therefore, vendors could support any combination of the above items as
> they see fit.
> 
> To enable TPH and fine-grained packet stashing, this API extends the
> ethdev library, PCI library, and the PCI driver. In this design, the
> application via the ethdev stashing API provides hints to the PMD to
> indicate the underlying hardware at which processor and cache level it
> prefers a packet to end up. Once the PMD receives a CPU and a
> cache-level combination, it must extract the matching ST from the TPH
> ACPI _DSM of the PCIe root port to which the NIC is connected. To
> facilitate the extraction of STs, the PCI library and the PCI driver
> APIs are extended.
> 
> PMD's implementation of eth_dev_ops stashing_rx_hints_set and
> stashing_tx_hints_set function pointers are responsible for extracting
> the ST. The PCI bus driver provides the generic TPH ST extraction API
> that can be used by any PMD that drives a PCIe device. The extraction
> process begins by calling rte_pci_extract_tph_st() function in
> drivers/bus/pci/rte_bus_pci.h, which takes an initialized input object
> rte_tph_acpi__dsm_args and a pointer to rte_tph_acpi__dsm_return to
> store the ST returned by the TPH _DSM. rte_tph_acpi__dsm_arg and
> rte_tph_acpi__dsm_return objects are defined in lib/pci/rte_pci_tph.h as
> defined by the PCIe firmware specification and the associated ECN titled
> "Revised _DSM for Cache Locality TPH Features". The helper function
> rte_init_tph_acpi__dsm_args is used by the rte_pci_extract_tph_st() to
> convert lcore_id and cache_level provided by the PMD into well-formatted
> rte_tph_acpi__dsm_args. The processor or, in some cases, a container ID
> (which is synonymous with a core complex of a chiplet die) and the cache
> level in the rte_tph_acpi__dsm_args structure are not the same as the
> lcore_id and the cache_level provided by the application to the ethdev
> library, which PMD passes down to the rte_pci_extract_st() function. The
> rte_init_tph_acpi__dsm_args helper converts lcore_id to an APIC
> processor-id or a PPTT processor-container-id if the container of the
> lcore_id was requested as the target by the application. Similarly, it
> must convert cache_level to a PPTT cache-reference-id. These conversions
> are possible with the hwloc library or some other library DPDK may
> eventually provide. However, DPDK cannot execute the TPH _DSM directly,
> as it can only be done with kernel privileges. Therefore, appropriate
> mechanisms must be established in supported Operating Systems(Linux,
> FreeBSD, and Windows) to expose the _DSM return for a given argument.
> For instance, on Linux, this mechanism could be sysfs. Therefore, the
> implementation of rte_pci_extract_tph_st() is done in OS-specific files
> drivers/bus/pci/{bsd, linux, windows}/pci.c.
> 
> Once the ST is acquired from the OS-specific method described earlier,
> the stashing_rx_hints_set/stashing_tx_hints_set PMD implementations are
> ready to set the ST. As per PCIe specification, hints can be put on the
> MSI-X tables or using a device-specific method. Considering this, many
> NICs that support TPH allow setting steering tags and processing hints
> on the device's MSI-X table and queue contexts. For PMDs, setting the ST
> on queue contexts is the only viable method of using TPH. Therefore, the
> DPDK can only support setting ST in queue contexts. An application uses
> the cache stashing ethdev API by first calling the
> rte_eth_dev_stashing_capabilities_get() function to find out what object
> types can be stashed into a processor cache by the NIC out of the object
> types in the bulleted list above. This function takes a port_id and a
> pointer to a uint16_t to report back the object type flags. PMD
> implements the stashing_capabilities_get function pointer in
> eth_dev_ops. If the underlying platform or the NIC does not support TPH,
> this function returns -ENOTSUP and the application should consider any
> values stored in the objects pointer invalid.
> 
> Once the application knows the supported object types that can be
> stashed, the next step is to set the steering tags for the packets
> associated with Rx and Tx queues via
> rte_eth_dev_stashing_rx_config_set() and
> rte_eth_dev_stashing_tx_config_set() ethdev library function
> respectively. These functions execute the  rte_pci_extract_tph_st() via
> eth_dev_ops pointers stashing_rx_hints_set and stashing_tx_hints_set.
> Both the functions have an identical signature, a port_id, a queue_id,
> and a config object. The port_id and the queue-id are used to locate the
> device and the queue. The config object is of type struct
> rte_eth_stashing_config, which specifies the lcore_id and the
> cache_level, indicating where objects from this queue should be stashed.
> It also has the field 'container' to indicate if the target should be
> the container of the processor specified by the lcore_id in a
> chiplet-based SoC. The 'objects' field in the config sets the types of
> objects the application wishes to stash based on the capabilities found
> earlier. If the objects field includes the flag
> RTE_ETH_DEV_STASH_OBJECT_OFFSET, the 'offset' field must be used to set
> the desired offset. These functions invoke PMD implementations of the
> stashing functionality via stashing_rx_hints_set and
> stashing_tx_hints_set, function pointers in eth_dev_ops, respectively.
> 
> 
> Wathsala Vithanage (2):
>  pci: introduce the PCIe TLP Processing Hints API
>  ethdev: introduce the cache stashing hints API
> 
> drivers/bus/pci/bsd/pci.c     |  12 +++
> drivers/bus/pci/linux/pci.c   |  12 +++
> drivers/bus/pci/rte_bus_pci.h |  22 +++++
> drivers/bus/pci/version.map   |   3 +
> drivers/bus/pci/windows/pci.c |  14 +++
> lib/ethdev/ethdev_driver.h    |  66 ++++++++++++++
> lib/ethdev/rte_ethdev.c       | 120 ++++++++++++++++++++++++++
> lib/ethdev/rte_ethdev.h       | 156 ++++++++++++++++++++++++++++++++++
> lib/ethdev/version.map        |   4 +
> lib/pci/meson.build           |   2 +
> lib/pci/rte_pci.h             |   2 +
> lib/pci/rte_pci_tph.c         |  20 +++++
> lib/pci/rte_pci_tph.h         | 111 ++++++++++++++++++++++++
> 13 files changed, 544 insertions(+)
> create mode 100644 lib/pci/rte_pci_tph.c
> create mode 100644 lib/pci/rte_pci_tph.h
> 
> —
> 2.34.1
> 

Do you have some numbers about how much performance this feature
can improve?

/Chenbo 



^ permalink raw reply	[flat|nested] 27+ messages in thread

* RE: [RFC v3 2/2] ethdev: introduce the cache stashing hints API
  2024-10-21  1:52   ` [RFC v3 2/2] ethdev: introduce the cache stashing hints API Wathsala Vithanage
@ 2024-10-21  7:36     ` Morten Brørup
  2024-10-24  5:49     ` Jerin Jacob
  1 sibling, 0 replies; 27+ messages in thread
From: Morten Brørup @ 2024-10-21  7:36 UTC (permalink / raw)
  To: Wathsala Vithanage, Thomas Monjalon, Ferruh Yigit, Andrew Rybchenko
  Cc: dev, nd, Honnappa Nagarahalli, Dhruv Tripathi

> +/**
> + * Apply stashing hint to data at a given offset from the start of a
> + * received packet.
> + */
> +#define RTE_ETH_DEV_STASH_OBJECT_OFFSET		0x0001
> +
> +/** Apply stashing hint to an rx descriptor. */
> +#define RTE_ETH_DEV_STASH_OBJECT_DESC		0x0002
> +
> +/** Apply stashing hint to a header of a received packet. */
> +#define RTE_ETH_DEV_STASH_OBJECT_HEADER		0x0004
> +
> +/** Apply stashing hint to a payload of a received packet. */
> +#define RTE_ETH_DEV_STASH_OBJECT_PAYLOAD	0x0008
> +
> +#define __RTE_ETH_DEV_STASH_OBJECT_MASK		0x000f
> +/**@}*/

Although I agree these would be sensible parameters, they seem orthogonal to the Ethdev RX port queue capabilities/configuration.

The RTE_ETH_DEV_STASH_OBJECT_DESC aligns well; no problem there.

How much of a packet is considered the "header"? L3, L4, Outer tunnel header, or both Outer and Inner headers? If it follows the configuration already established through the existing Ethdev APIs on the RX port queue, it aligns well too.

Now, it's the data part I'm wondering about:

If an RX port queue is configured for receiving into mbufs from one mempool containing 2 KB objects (i.e. segments of 2 KB contiguous RAM), how do the OFFSET, HEADER, and PAYLOAD hints work? And what is the corresponding Ethdev RX queue configuration?

Same questions, considering an RX port queue configured for receiving the first 128 B into an mbuf from one mempool and the remaining part of the packet into 2 KB mbufs from another mempool?

Please provide some explanatory examples using these APIs along with the existing APIs for setting up the RX port queue.

Packets may be jumbo frames, scattered into multiple 2 KB mbufs. That should not make any difference; i.e. I assume the OFFSET, HEADER and PAYLOAD hints are considered at a packet level, not segment level.

I also assume IP fragments are treated like any other IP packets; basically, the only difference is the size of the header (it only has the IP header, and no following headers).


^ permalink raw reply	[flat|nested] 27+ messages in thread

* RE: [RFC v3 0/2] An API for Stashing Packets into CPU caches
  2024-10-21  7:35   ` [RFC v3 0/2] An API for Stashing Packets into CPU caches Chenbo Xia
@ 2024-10-21 12:01     ` Wathsala Wathawana Vithanage
  0 siblings, 0 replies; 27+ messages in thread
From: Wathsala Wathawana Vithanage @ 2024-10-21 12:01 UTC (permalink / raw)
  To: Chenbo Xia; +Cc: dev, nd, nd



> -----Original Message-----
> From: Chenbo Xia <chenbox@nvidia.com>
> Sent: Monday, October 21, 2024 2:35 AM
> To: Wathsala Wathawana Vithanage <wathsala.vithanage@arm.com>
> Cc: dev@dpdk.org; nd <nd@arm.com>
> Subject: Re: [RFC v3 0/2] An API for Stashing Packets into CPU caches
> 
> Hi,
> 
> > On Oct 21, 2024, at 09:52, Wathsala Vithanage
> <wathsala.vithanage@arm.com> wrote:
> >
> > External email: Use caution opening links or attachments
> >
> >
> > DPDK applications benefit from Direct Cache Access (DCA) features like
> > Intel DDIO and Arm's write-allocate-to-SLC. However, those features do
> > not allow fine-grained control of direct cache access, such as
> > stashing packets into upper-level caches (L2 caches) of a processor or
> > the shared cache of a chiplet. PCIe TLP Processing Hints (TPH)
> > addresses this need in a vendor-agnostic manner. TPH capability has
> > existed since PCI Express Base Specification revision 3.0; today,
> > numerous Network Interface Cards and interconnects from different
> > vendors support TPH capability. TPH comprises a steering tag (ST) and
> > a processing hint (PH). ST specifies the cache level of a CPU at which
> > the data should be written to (or DCAed into), while PH is a hint
> > provided by the PCIe requester to the completer on an upcoming traffic
> > pattern. Some NIC vendors bundle TPH capability with fine-grained
> > control over the type of objects that can be stashed into CPU caches,
> > such as
> >
> > - Rx/Tx queue descriptors
> > - Packet-headers
> > - Packet-payloads
> > - Data from a given offset from the start of a packet
> >
> > Note that stashable object types are outside the scope of PCIe
> > standard; therefore, vendors could support any combination of the
> > above items as they see fit.
> >
> > To enable TPH and fine-grained packet stashing, this API extends the
> > ethdev library, PCI library, and the PCI driver. In this design, the
> > application via the ethdev stashing API provides hints to the PMD to
> > indicate the underlying hardware at which processor and cache level it
> > prefers a packet to end up. Once the PMD receives a CPU and a
> > cache-level combination, it must extract the matching ST from the TPH
> > ACPI _DSM of the PCIe root port to which the NIC is connected. To
> > facilitate the extraction of STs, the PCI library and the PCI driver
> > APIs are extended.
> >
> > PMD's implementation of eth_dev_ops stashing_rx_hints_set and
> > stashing_tx_hints_set function pointers are responsible for extracting
> > the ST. The PCI bus driver provides the generic TPH ST extraction API
> > that can be used by any PMD that drives a PCIe device. The extraction
> > process begins by calling rte_pci_extract_tph_st() function in
> > drivers/bus/pci/rte_bus_pci.h, which takes an initialized input object
> > rte_tph_acpi__dsm_args and a pointer to rte_tph_acpi__dsm_return to
> > store the ST returned by the TPH _DSM. rte_tph_acpi__dsm_arg and
> > rte_tph_acpi__dsm_return objects are defined in lib/pci/rte_pci_tph.h
> > as defined by the PCIe firmware specification and the associated ECN
> > titled "Revised _DSM for Cache Locality TPH Features". The helper
> > function rte_init_tph_acpi__dsm_args is used by the
> > rte_pci_extract_tph_st() to convert lcore_id and cache_level provided
> > by the PMD into well-formatted rte_tph_acpi__dsm_args. The processor
> > or, in some cases, a container ID (which is synonymous with a core
> > complex of a chiplet die) and the cache level in the
> > rte_tph_acpi__dsm_args structure are not the same as the lcore_id and
> > the cache_level provided by the application to the ethdev library,
> > which PMD passes down to the rte_pci_extract_st() function. The
> > rte_init_tph_acpi__dsm_args helper converts lcore_id to an APIC
> > processor-id or a PPTT processor-container-id if the container of the
> > lcore_id was requested as the target by the application. Similarly, it
> > must convert cache_level to a PPTT cache-reference-id. These
> > conversions are possible with the hwloc library or some other library
> > DPDK may eventually provide. However, DPDK cannot execute the TPH
> _DSM
> > directly, as it can only be done with kernel privileges. Therefore,
> > appropriate mechanisms must be established in supported Operating
> Systems(Linux, FreeBSD, and Windows) to expose the _DSM return for a given
> argument.
> > For instance, on Linux, this mechanism could be sysfs. Therefore, the
> > implementation of rte_pci_extract_tph_st() is done in OS-specific
> > files drivers/bus/pci/{bsd, linux, windows}/pci.c.
> >
> > Once the ST is acquired from the OS-specific method described earlier,
> > the stashing_rx_hints_set/stashing_tx_hints_set PMD implementations
> > are ready to set the ST. As per PCIe specification, hints can be put
> > on the MSI-X tables or using a device-specific method. Considering
> > this, many NICs that support TPH allow setting steering tags and
> > processing hints on the device's MSI-X table and queue contexts. For
> > PMDs, setting the ST on queue contexts is the only viable method of
> > using TPH. Therefore, the DPDK can only support setting ST in queue
> > contexts. An application uses the cache stashing ethdev API by first
> > calling the
> > rte_eth_dev_stashing_capabilities_get() function to find out what
> > object types can be stashed into a processor cache by the NIC out of
> > the object types in the bulleted list above. This function takes a
> > port_id and a pointer to a uint16_t to report back the object type
> > flags. PMD implements the stashing_capabilities_get function pointer
> > in eth_dev_ops. If the underlying platform or the NIC does not support
> > TPH, this function returns -ENOTSUP and the application should
> > consider any values stored in the objects pointer invalid.
> >
> > Once the application knows the supported object types that can be
> > stashed, the next step is to set the steering tags for the packets
> > associated with Rx and Tx queues via
> > rte_eth_dev_stashing_rx_config_set() and
> > rte_eth_dev_stashing_tx_config_set() ethdev library function
> > respectively. These functions execute the  rte_pci_extract_tph_st()
> > via eth_dev_ops pointers stashing_rx_hints_set and stashing_tx_hints_set.
> > Both the functions have an identical signature, a port_id, a queue_id,
> > and a config object. The port_id and the queue-id are used to locate
> > the device and the queue. The config object is of type struct
> > rte_eth_stashing_config, which specifies the lcore_id and the
> > cache_level, indicating where objects from this queue should be stashed.
> > It also has the field 'container' to indicate if the target should be
> > the container of the processor specified by the lcore_id in a
> > chiplet-based SoC. The 'objects' field in the config sets the types of
> > objects the application wishes to stash based on the capabilities
> > found earlier. If the objects field includes the flag
> > RTE_ETH_DEV_STASH_OBJECT_OFFSET, the 'offset' field must be used to
> > set the desired offset. These functions invoke PMD implementations of
> > the stashing functionality via stashing_rx_hints_set and
> > stashing_tx_hints_set, function pointers in eth_dev_ops, respectively.
> >
> >
> > Wathsala Vithanage (2):
> >  pci: introduce the PCIe TLP Processing Hints API
> >  ethdev: introduce the cache stashing hints API
> >
> > drivers/bus/pci/bsd/pci.c     |  12 +++
> > drivers/bus/pci/linux/pci.c   |  12 +++
> > drivers/bus/pci/rte_bus_pci.h |  22 +++++
> > drivers/bus/pci/version.map   |   3 +
> > drivers/bus/pci/windows/pci.c |  14 +++
> > lib/ethdev/ethdev_driver.h    |  66 ++++++++++++++
> > lib/ethdev/rte_ethdev.c       | 120 ++++++++++++++++++++++++++
> > lib/ethdev/rte_ethdev.h       | 156
> ++++++++++++++++++++++++++++++++++
> > lib/ethdev/version.map        |   4 +
> > lib/pci/meson.build           |   2 +
> > lib/pci/rte_pci.h             |   2 +
> > lib/pci/rte_pci_tph.c         |  20 +++++
> > lib/pci/rte_pci_tph.h         | 111 ++++++++++++++++++++++++
> > 13 files changed, 544 insertions(+)
> > create mode 100644 lib/pci/rte_pci_tph.c create mode 100644
> > lib/pci/rte_pci_tph.h
> >
> > —
> > 2.34.1
> >
> 
> Do you have some numbers about how much performance this feature can
> improve?
> 

This patch requires some additional work done in the Linux kernel to get it working. 
I'm planning to test this on a supported HW platform soon by hardcoding some STs.
The TPH enablement patch in the kernel reports a significant improvement. 
https://patchew.org/linux/20240927215653.1552411-1-wei.huang2@amd.com/
I hope it will improve performance in DPDK too.

Please join the call scheduled for 10/23/24 to discuss what we need in the OS to support this feature.
https://inbox.dpdk.org/dev/PAWPR08MB890901574A3113840E7D7CCC9F472@PAWPR08MB8909.eurprd08.prod.outlook.com/

Thanks.

--wathsala



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC v3 0/2] An API for Stashing Packets into CPU caches
  2024-10-21  1:52 ` [RFC v3 0/2] An API for Stashing Packets into CPU caches Wathsala Vithanage
                     ` (2 preceding siblings ...)
  2024-10-21  7:35   ` [RFC v3 0/2] An API for Stashing Packets into CPU caches Chenbo Xia
@ 2024-10-22  1:12   ` Stephen Hemminger
  2024-10-22 18:37     ` Wathsala Wathawana Vithanage
  3 siblings, 1 reply; 27+ messages in thread
From: Stephen Hemminger @ 2024-10-22  1:12 UTC (permalink / raw)
  To: Wathsala Vithanage; +Cc: dev, nd

On Mon, 21 Oct 2024 01:52:44 +0000
Wathsala Vithanage <wathsala.vithanage@arm.com> wrote:

> DPDK applications benefit from Direct Cache Access (DCA) features like
> Intel DDIO and Arm's write-allocate-to-SLC. However, those features do
> not allow fine-grained control of direct cache access, such as stashing
> packets into upper-level caches (L2 caches) of a processor or the shared
> cache of a chiplet. PCIe TLP Processing Hints (TPH) addresses this need
> in a vendor-agnostic manner. TPH capability has existed since
> PCI Express Base Specification revision 3.0; today, numerous Network
> Interface Cards and interconnects from different vendors support TPH
> capability. TPH comprises a steering tag (ST) and a processing hint
> (PH). ST specifies the cache level of a CPU at which the data should be
> written to (or DCAed into), while PH is a hint provided by the PCIe
> requester to the completer on an upcoming traffic pattern. Some NIC
> vendors bundle TPH capability with fine-grained control over the type of
> objects that can be stashed into CPU caches, such as
> 
> - Rx/Tx queue descriptors
> - Packet-headers
> - Packet-payloads
> - Data from a given offset from the start of a packet
> 
> Note that stashable object types are outside the scope of PCIe standard;
> therefore, vendors could support any combination of the above items as
> they see fit.
> 
> To enable TPH and fine-grained packet stashing, this API extends the
> ethdev library, PCI library, and the PCI driver. In this design, the
> application via the ethdev stashing API provides hints to the PMD to
> indicate the underlying hardware at which processor and cache level it
> prefers a packet to end up. Once the PMD receives a CPU and a
> cache-level combination, it must extract the matching ST from the TPH
> ACPI _DSM of the PCIe root port to which the NIC is connected. To
> facilitate the extraction of STs, the PCI library and the PCI driver
> APIs are extended.


There is a fundamental conflict with the increasing growth of "nerd knobs"
like this in the DPDK. Users already have problems understanding DPDK
and adding more complexity does not help.

So any new feature like this should be:
  1. Just work right without any configuration. It can't suck by default.

  2. The API's should be used in the drivers and core, not exposed up
     to the application.  Most of the hot data structures are in the
     drivers now.

  3. Fit into existing API models. Like rte_prefetch().

Is the goal of DPDK enabling high speed applications, or enabling vendor
benchmarks?

^ permalink raw reply	[flat|nested] 27+ messages in thread

* RE: [RFC v3 0/2] An API for Stashing Packets into CPU caches
  2024-10-22  1:12   ` Stephen Hemminger
@ 2024-10-22 18:37     ` Wathsala Wathawana Vithanage
  2024-10-22 21:23       ` Stephen Hemminger
  0 siblings, 1 reply; 27+ messages in thread
From: Wathsala Wathawana Vithanage @ 2024-10-22 18:37 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev, nd, nd

Hi Stephen,

> There is a fundamental conflict with the increasing growth of "nerd knobs"
> like this in the DPDK. Users already have problems understanding DPDK and
> adding more complexity does not help.
> 
> So any new feature like this should be:
>   1. Just work right without any configuration. It can't suck by default.
> 
By default, this feature is disabled. It can be only enabled by calling the following 
at the queue setup time. 
rte_eth_dev_stashing_rx_config_set
rte_eth_dev_stashing_tx_config_set

It's unlikely for someone not familiar with TPH to call these functions.
The performance for them should be as good as without this feature.

>   2. The API's should be used in the drivers and core, not exposed up
>      to the application.  Most of the hot data structures are in the
>      drivers now.
> 
PMDs don't know which CPU and cache level to use with TPH.
That information needs to be conveyed to the PMD, for it to work. 
Please suggest alternatives.

>   3. Fit into existing API models. Like rte_prefetch().
> 
PCIe TPH is a hint from a PCIe device to the system interconnect
to push data into CPU caches. I cannot think of an existing API
that matches the semantics of TPH.
rte_prefetch() is a hint to the CPU from the application, something
totally different.

> Is the goal of DPDK enabling high speed applications, or enabling vendor
> benchmarks?
This is a vendor agnostic feature from the PCI-SIG implemented by almost
every hardware vendor in their NICs and SoCs.
FYI: Kernel patch - https://patchwork.kernel.org/project/linux-pci/patch/20240927215653.1552411-2-wei.huang2@amd.com/



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC v3 0/2] An API for Stashing Packets into CPU caches
  2024-10-22 18:37     ` Wathsala Wathawana Vithanage
@ 2024-10-22 21:23       ` Stephen Hemminger
  0 siblings, 0 replies; 27+ messages in thread
From: Stephen Hemminger @ 2024-10-22 21:23 UTC (permalink / raw)
  To: Wathsala Wathawana Vithanage; +Cc: dev, nd

On Tue, 22 Oct 2024 18:37:09 +0000
Wathsala Wathawana Vithanage <wathsala.vithanage@arm.com> wrote:

> >   2. The API's should be used in the drivers and core, not exposed up
> >      to the application.  Most of the hot data structures are in the
> >      drivers now.
> >   
> PMDs don't know which CPU and cache level to use with TPH.
> That information needs to be conveyed to the PMD, for it to work. 
> Please suggest alternatives.

It would be better if EAL had a representation of CPU and cache hierarchy
which it built. Then have the PMD (if it cared) be able to query the
topology. That way the application would not have to care.

The DPDK already leaks too many PCI details in the API.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC v2] ethdev: an API for cache stashing hints
  2024-07-15 22:11 [RFC v2] ethdev: an API for cache stashing hints Wathsala Vithanage
                   ` (3 preceding siblings ...)
  2024-10-21  1:52 ` [RFC v3 0/2] An API for Stashing Packets into CPU caches Wathsala Vithanage
@ 2024-10-23 17:59 ` Mattias Rönnblom
  2024-10-23 20:18   ` Stephen Hemminger
                     ` (2 more replies)
  4 siblings, 3 replies; 27+ messages in thread
From: Mattias Rönnblom @ 2024-10-23 17:59 UTC (permalink / raw)
  To: Wathsala Vithanage, dev, Thomas Monjalon, Ferruh Yigit, Andrew Rybchenko
  Cc: nd, Dhruv Tripathi

On 2024-07-16 00:11, Wathsala Vithanage wrote:
> An application provides cache stashing hints to the ethernet devices to
> improve memory access latencies from the CPU and the NIC. This patch
> introduces three distinct hints for this purpose.
> 
> The RTE_ETH_DEV_STASH_HINT_HOST_WILLNEED hint indicates that the host
> (CPU) requires the data written by the NIC immediately. This implies
> that the CPU expects to read data from its local cache rather than LLC
> or main memory if possible. This would improve memory access latency in
> the Rx path. For PCI devices with TPH capability, these hints translate
> into DWHR (Device Writes Host Reads) access pattern. This hint is only
> valid for receive queues.
> 
> The RTE_ETH_DEV_STASH_HINT_BI_DIR_DATA hint indicates that the host and
> the device access the data structure equally. Rx/Tx queue descriptors
> fit the description of such data. This hint applies to both Rx and Tx
> directions.  In the PCI TPH context, this hint translates into a
> Bi-Directional access pattern.
> 
> RTE_ETH_DEV_STASH_HINT_DEV_ONLY hint indicates that the CPU is not
> involved in a given device's receive or transmit paths. This implies
> that only devices are involved in the IO path. Depending on the
> implementation, this hint may result in data getting placed in a cache
> close to the device or not cached at all. For PCI devices with TPH
> capability, this hint translates into D*D* (DWDR, DRDW, DWDW, DRDR)
> access patterns. This is a bidirectional hint, and it can be applied to
> both Rx and Tx queues.
> 
> The RTE_ETH_DEV_STASH_HINT_HOST_DONTNEED hint indicates that the device
> reads data written by the host (CPU) that may still be in the host's
> local cache but is not required by the host anytime soon. This hint is
> intended to prevent unnecessary cache invalidations that cause
> interconnect latencies when a device writes to a buffer already in host
> cache memory. In DPDK, this could happen with the recycling of mbufs
> where a mbuf is placed in the Tx queue that then gets back into mempool
> and gets recycled back into the Rx queue, all while a copy is being held
> in the CPU's local cache unnecessarily. By using this hint on supported
> platforms, the mbuf will be invalidated after the device completes the
> buffer reading, but it will be well before the buffer gets recycled and
> updated in the Rx path. This hint is only valid for transmit queues.
> 
> Applications use three main interfaces in the ethdev library to discover
> and set cache stashing hints. rte_eth_dev_stashing_hints_tx interface is
> used to set hints on a Tx queue. rte_eth_dev_stashing_hints_rx interface
> is used to set hints on an Rx queue. Both of these functions take the
> following parameters as inputs: a port_id (the id of the ethernet
> device), a cpu_id (the target CPU), a cache_level (the level of the
> cache hierarchy the data should be stashed into), a queue_id (the queue
> the hints are applied to). In addition to the above list of parameters,
> a type parameter indicates the type of the object the application
> expects to be stashed by the hardware. Depending on the hardware, these
> may vary. Intel E810 NICs support the stashing of Rx/Tx descriptors,
> packet headers, and packet payloads. These are indicated by the macros
> RTE_ETH_DEV_STASH_TYPE_DESC, RTE_ETH_DEV_STASH_TYPE_HEADER,
> RTE_ETH_DEV_STASH_TYPE_PAYLOAD. Hardware capable of stashing data at any
> given offset into a packet can use the RTE_ETH_DEV_STASH_TYPE_OFFSET
> type. When an offset is used, the offset parameter in the above two
> functions should be set appropriately.
> 
> rte_eth_dev_stashing_hints_discover is used to discover the object types
> and hints supported in the platform and the device. The function takes
> types and hints pointers used as a bit vector to indicate hints and
> types supported by the NIC. An application that intends to use stashing
> hints should first discover supported hints and types and then use the
> functions rte_eth_dev_stashing_hints_tx and
> rte_eth_dev_stashing_hints_rx as required to set stashing hints
> accordingly. eth_dev_ops structure has been updated with two new ops
> that a PMD should implement to support cache stashing hints. A PMD that
> intends to support cache stashing hints should initialize the
> set_stashing_hints function pointer to a function that issues hints to
> the underlying hardware in compliance with platform capabilities. The
> same PMD should also implement a function that can return two-bit fields
> indicating supported types and hints and then initialize the
> discover_stashing_hints function pointer with it. If the NIC supports
> cache stashing hints, the NIC should always set the
> RTE_ETH_DEV_CAPA_CACHE_STASHING device capability.
> 
> Signed-off-by: Wathsala Vithanage <wathsala.vithanage@arm.com>
> Reviewed-by: Dhruv Tripathi <dhruv.tripathi@arm.com>
> ---
>   .mailmap                   |   1 +
>   lib/ethdev/ethdev_driver.h |  67 +++++++++++
>   lib/ethdev/rte_ethdev.c    | 153 +++++++++++++++++++++++++
>   lib/ethdev/rte_ethdev.h    | 225 +++++++++++++++++++++++++++++++++++++
>   lib/ethdev/version.map     |   6 +
>   5 files changed, 452 insertions(+)
> 
> diff --git a/.mailmap b/.mailmap
> index f1e64286a1..9c28b74655 100644
> --- a/.mailmap
> +++ b/.mailmap
> @@ -338,6 +338,7 @@ Dexia Li <dexia.li@jaguarmicro.com>
>   Dexuan Cui <decui@microsoft.com>
>   Dharmik Thakkar <dharmikjayesh.thakkar@arm.com> <dharmik.thakkar@arm.com>
>   Dheemanth Mallikarjun <dheemanthm@vmware.com>
> +Dhruv Tripathi <dhruv.tripathi@arm.com>
>   Diana Wang <na.wang@corigine.com>
>   Didier Pallard <didier.pallard@6wind.com>
>   Dilshod Urazov <dilshod.urazov@oktetlabs.ru>
> diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
> index 883e59a927..b90dc8793b 100644
> --- a/lib/ethdev/ethdev_driver.h
> +++ b/lib/ethdev/ethdev_driver.h
> @@ -1235,6 +1235,70 @@ typedef int (*eth_count_aggr_ports_t)(struct rte_eth_dev *dev);
>   typedef int (*eth_map_aggr_tx_affinity_t)(struct rte_eth_dev *dev, uint16_t tx_queue_id,
>   					  uint8_t affinity);
>   
> +/**
> + * @internal
> + * Set cache stashing hint in the ethernet device.
> + *
> + * @param dev
> + *   Port (ethdev) handle.
> + * @param cpuid
> + *   ID of the targeted CPU.
> + * @param cache_level
> + *   Level of the cache to stash data.

If we had a hwtopo API in DPDK, we could just use a node id in such a 
graph (of CPUs and caches) to describe were the data ideally would land. 
In such a case, you could have a node id for DDR as well, and thus you 
could drop the notion of "stashing". Just a "drop off the data here, 
please, if you can" API.

I don't think this API and its documentation should talk about what the 
"CPU" needs, since it's somewhat misleading.

For example, you can imagine you want the packet payload to land in the 
LLC, even though it's not for any CPU to consume, in case you know with 
some certaintly that the packet will soon be transmitted (and thus 
consumed by the NIC).

The same scenario can happen, the consumer is an accelerator (e.g., a 
crypto engine).

Likewise, you may know that the whole packet will be read by some CPU 
core, but you also know the system tends to buffer packets before they 
are being processed. In such a case, it's better to go to DRAM right 
away, to avoid trashing the LLC (or some other cache).

Also, why do you need to use the word "host"? Seems like a PCI thing. 
This may be implemented in PCI, but surely can be done (and has been 
done) without PCI.

> + * @param queue_id
> + *   List of receive queue ids used in rte_eth_rx_burst().
> + * @param queue_direction
> + *   RTE_ETH_DEV_QUEUE_TYPE_RX if queue that corresponds to queue_id is an
> + *   rx queue.
> + *   RTE_ETH_DEV_QUEUE_TYPE_TX if queue that corresponds to queue_id is a
> + *   tx queue.
> + * @param types
> + *   A vector of stashing types to apply hints on a given queue direction.
> + *   hints are applied on the types specified in types vector.
> + *   types can include queue descriptors (RTE_ETH_DEV_STASH_TYPE_DESC),
> + *   packet headers (RTE_ETH_DEV_STASH_TYPE_HEADER),
> + *   packet payloads (RTE_ETH_DEV_STASH_TYPE_PAYLOAD) or
> + *   to an offset (RTE_ETH_DEV_STASH_TYPE_OFFSET) in to a packet.
> + *   types have to be compatible with the queue_direction or an -EINVAL will
> + *   be returned.
> + * @param hints
> + *   Cache stashing hints
> + * @param offset
> + *   Offset into the packet if RTE_ETH_DEV_STASH_TYPE_OFFSET is set in hints.
> + *
> + * @return
> + *   -ENOTSUP if the device or the platform does not support cache stashing.
> + *   -ENOSYS  if the underlying PMD hasn't implemented cache stashing feature.
> + *   -EINVAL  on invalid arguments.
> + *   0 on success.
> + */
> +typedef int (*eth_set_stashing_hints_t)(struct rte_eth_dev *dev, uint16_t cpuid,
> +					uint8_t cache_level,
> +					uint16_t queue_id, uint8_t queue_direction,
> +					uint16_t types, uint8_t hints, off_t offset);
> +
> +/**
> + * @internal
> + * Discover cache stashing hints and object types supported in the ethernet device.
> + *
> + * @param dev
> + *   Port (ethdev) handle.
> + * @param types
> + *   Set bits for supported object types.
> + * @param hints
> + *   Set bits for supported stashing hints.
> + *
> + * @return
> + *   -ENOTSUP if the device or the platform does not support cache stashing.
> + *   -ENOSYS  if the underlying PMD hasn't implemented cache stashing feature.
> + *   -EINVAL  on NULL values for types or hints parameters.
> + *   On return, types and hints parameters will have bits set for supported
> + *   object types and hints.
> + *   0 on success.
> + */
> +typedef int (*eth_discover_stashing_hints_t)(struct rte_eth_dev *dev,
> +					     uint16_t *types, uint16_t *hints);
> +
>   /**
>    * @internal A structure containing the functions exported by an Ethernet driver.
>    */
> @@ -1257,6 +1321,9 @@ struct eth_dev_ops {
>   	eth_mac_addr_remove_t      mac_addr_remove; /**< Remove MAC address */
>   	eth_mac_addr_add_t         mac_addr_add;  /**< Add a MAC address */
>   	eth_mac_addr_set_t         mac_addr_set;  /**< Set a MAC address */
> +	eth_set_stashing_hints_t   set_stashing_hints; /**< Set cache stashing*/
> +	/**Discover supported stashing hints*/
> +	eth_discover_stashing_hints_t discover_stashing_hints;
>   	/** Set list of multicast addresses */
>   	eth_set_mc_addr_list_t     set_mc_addr_list;
>   	mtu_set_t                  mtu_set;       /**< Set MTU */
> diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
> index f1c658f49e..fafc94223e 100644
> --- a/lib/ethdev/rte_ethdev.c
> +++ b/lib/ethdev/rte_ethdev.c
> @@ -153,6 +153,7 @@ static const struct {
>   	{RTE_ETH_DEV_CAPA_RXQ_SHARE, "RXQ_SHARE"},
>   	{RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP, "FLOW_RULE_KEEP"},
>   	{RTE_ETH_DEV_CAPA_FLOW_SHARED_OBJECT_KEEP, "FLOW_SHARED_OBJECT_KEEP"},
> +	{RTE_ETH_DEV_CAPA_CACHE_STASHING, "CACHE_STASHING"},
>   };
>   
>   enum {
> @@ -7008,4 +7009,156 @@ int rte_eth_dev_map_aggr_tx_affinity(uint16_t port_id, uint16_t tx_queue_id,
>   	return ret;
>   }
>   
> +int
> +rte_eth_dev_validate_stashing_hints(uint16_t port_id, uint16_t queue_id,
> +				    uint8_t queue_direction, uint16_t types,
> +				    uint16_t hints)
> +{
> +	struct rte_eth_dev *dev;
> +	struct rte_eth_dev_info dev_info;
> +	uint16_t nb_queues;
> +
> +	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
> +
> +	/*
> +	 * Check for invalid types
> +	 */
> +	if (!RTE_ETH_DEV_STASH_TYPE_VALID(types)) {
> +		RTE_ETHDEV_LOG_LINE(ERR, "Invalid stashing type");
> +		return -EINVAL;
> +	}
> +
> +	/*
> +	 * Ensure that hints (HOST_DONOTNEED, HOST_WILLNEED, BI_DIR_DATA, and
> +	 * DEV_ONLY etc.) are not mixed incorrectly in the hint argument.
> +	 * Only hints of one queue direction (Rx or Tx) can be combined in the
> +	 * hint argument. If the hint argument contains hint types compatible
> +	 * with both Rx and Tx directions it can be applied to any queue of the
> +	 * two queue types.
> +	 */
> +	if (!RTE_ETH_DEV_STASH_HINT_IS_RXTX(hints)) {
> +		/*
> +		 * This is not a Rx and a Tx hint.
> +		 * Therefore it can only be applied to single queue direction.
> +		 */
> +		if (RTE_ETH_DEV_STASH_HINT_IS_TX(hints) ==
> +		    RTE_ETH_DEV_STASH_HINT_IS_RX(hints)) {
> +			RTE_ETHDEV_LOG_LINE(ERR, "This hint is not compatible "
> +					    "with both Rx and Tx paths");
> +			return -EINVAL;
> +		}
> +		/*
> +		 * Ensure that hint is compatible with the specified queue
> +		 * direction in the queue_direction argument.
> +		 */
> +		if (((queue_direction == RTE_ETH_DEV_QUEUE_TYPE_TX) &&
> +		    RTE_ETH_DEV_STASH_HINT_IS_RX(hints)) ||
> +		    ((queue_direction == RTE_ETH_DEV_QUEUE_TYPE_RX) &&
> +		    RTE_ETH_DEV_STASH_HINT_IS_TX(hints))) {
> +			RTE_ETHDEV_LOG_LINE(ERR, "Hints are not applicable to "
> +					    "this queue type");
> +			return -EINVAL;
> +		}
> +	}
> +
> +	dev = &rte_eth_devices[port_id];
> +
> +	nb_queues = (queue_direction == RTE_ETH_DEV_QUEUE_TYPE_RX) ?
> +				      dev->data->nb_rx_queues :
> +				      dev->data->nb_tx_queues;
> +
> +	if (queue_id >= nb_queues) {
> +		RTE_ETHDEV_LOG_LINE(ERR, "Invalid Rx queue_id=%u", queue_id);
> +		return -EINVAL;
> +	}
> +
> +	rte_eth_dev_info_get(port_id, &dev_info);
> +
> +	if ((dev_info.dev_capa & RTE_ETH_DEV_CAPA_CACHE_STASHING) !=
> +	    RTE_ETH_DEV_CAPA_CACHE_STASHING)
> +		return -ENOTSUP;
> +
> +	if (*dev->dev_ops->set_stashing_hints == NULL) {
> +		RTE_ETHDEV_LOG_LINE(ERR, "Stashing hints are not implemented "
> +				    "in %s for %s", dev_info.driver_name,
> +				    dev_info.device->name);
> +		return -ENOSYS;
> +	}
> +
> +	return 0;
> +}
> +
> +int
> +rte_eth_dev_stashing_hints_rx(uint16_t port_id, uint16_t cpuid,
> +			      uint8_t cache_level, uint16_t queue_id,
> +			      uint16_t types, off_t offset,
> +			      uint16_t hints)
> +{
> +	struct rte_eth_dev *dev;
> +
> +	int ret = rte_eth_dev_validate_stashing_hints(port_id, queue_id,
> +						      RTE_ETH_DEV_QUEUE_TYPE_RX,
> +						      types, hints);
> +	if (ret < 0)
> +		return ret;
> +
> +	dev = &rte_eth_devices[port_id];
> +
> +	return eth_err(port_id, (*dev->dev_ops->set_stashing_hints)(dev, cpuid,
> +		       cache_level, queue_id, RTE_ETH_DEV_QUEUE_TYPE_RX,
> +		       types, hints, offset));
> +}
> +
> +int
> +rte_eth_dev_stashing_hints_tx(uint16_t port_id, uint16_t cpuid,
> +			      uint8_t cache_level, uint16_t queue_id,
> +			      uint16_t types, off_t offset,
> +			      uint16_t hints)
> +{
> +	struct rte_eth_dev *dev;
> +
> +	int ret = rte_eth_dev_validate_stashing_hints(port_id, queue_id,
> +						      RTE_ETH_DEV_QUEUE_TYPE_TX,
> +						      types, hints);
> +	if (ret < 0)
> +		return ret;
> +
> +	dev = &rte_eth_devices[port_id];
> +
> +	return eth_err(port_id,
> +		       (*dev->dev_ops->set_stashing_hints) (dev, cpuid,
> +		       cache_level, queue_id, RTE_ETH_DEV_QUEUE_TYPE_TX, types,
> +		       hints, offset));
> +}
> +
> +int
> +rte_eth_dev_stashing_hints_discover(uint16_t port_id, uint16_t *types,
> +				    uint16_t *hints)
> +{
> +	struct rte_eth_dev *dev;
> +	struct rte_eth_dev_info dev_info;
> +
> +	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
> +
> +	if (!types || !hints)
> +		return -EINVAL;
> +
> +	dev = &rte_eth_devices[port_id];
> +	rte_eth_dev_info_get(port_id, &dev_info);
> +
> +	if ((dev_info.dev_capa & RTE_ETH_DEV_CAPA_CACHE_STASHING) !=
> +	    RTE_ETH_DEV_CAPA_CACHE_STASHING)
> +		return -ENOTSUP;
> +
> +	if (*dev->dev_ops->discover_stashing_hints == NULL) {
> +		RTE_ETHDEV_LOG_LINE(ERR, "Stashing hints are not implemented "
> +				    "in %s for %s", dev_info.driver_name,
> +				    dev_info.device->name);
> +		return -ENOSYS;
> +	}
> +	return eth_err(port_id,
> +		       (*dev->dev_ops->discover_stashing_hints)
> +		       (dev, types, hints));
> +}
> +
>   RTE_LOG_REGISTER_DEFAULT(rte_eth_dev_logtype, INFO);
> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> index 548fada1c7..a42f272885 100644
> --- a/lib/ethdev/rte_ethdev.h
> +++ b/lib/ethdev/rte_ethdev.h
> @@ -1648,6 +1648,9 @@ struct rte_eth_conf {
>   #define RTE_ETH_DEV_CAPA_FLOW_SHARED_OBJECT_KEEP RTE_BIT64(4)
>   /**@}*/
>   
> +/** Device supports stashing to CPU/system caches. */
> +#define RTE_ETH_DEV_CAPA_CACHE_STASHING RTE_BIT64(5)
> +
>   /*
>    * Fallback default preferred Rx/Tx port parameters.
>    * These are used if an application requests default parameters
> @@ -1819,6 +1822,8 @@ struct rte_eth_dev_info {
>   	struct rte_eth_dev_portconf default_txportconf;
>   	/** Generic device capabilities (RTE_ETH_DEV_CAPA_). */
>   	uint64_t dev_capa;
> +	uint16_t stashing_hints_capa;
> +	uint16_t stashing_types_capa;
>   	/**
>   	 * Switching information for ports on a device with a
>   	 * embedded managed interconnect/switch.
> @@ -5964,6 +5969,226 @@ int rte_eth_cman_config_set(uint16_t port_id, const struct rte_eth_cman_config *
>   __rte_experimental
>   int rte_eth_cman_config_get(uint16_t port_id, struct rte_eth_cman_config *config);
>   
> +
> +
> +/** Queue type is RX. */
> +#define RTE_ETH_DEV_QUEUE_TYPE_RX		0
> +/** Queue type is TX. */
> +#define RTE_ETH_DEV_QUEUE_TYPE_TX		1
> +
> +/**@{@name Ethernet device cache stashing hints
> + *@see rte_eth_dev_stashing_hints_discover
> + *@see rte_eth_dev_stashing_hints_rx
> + *@see rte_eth_dev_stashing_hints_tx
> + */
> +/**
> + * Data read by the device could still be in a CPU local cache memory but
> + * not required by the CPU before ethernet device is done with Tx.
> + * In other words CPU does not mind evicting the relevant cache line(s)
> + * from it's local cache.
> + */
> +#define RTE_ETH_DEV_STASH_HINT_HOST_DONTNEED	0x001
> +
> +/**
> + * Data is read and written equally by the CPU and the NIC.
> + */
> +#define RTE_ETH_DEV_STASH_HINT_BI_DIR_DATA	0x100
> +
> +/**
> + * Data written by the device is read by a CPU immediately. CPU prefers
> + * availability of the data in it's local cache memory by the time read
> + * takes place.
> + */
> +#define RTE_ETH_DEV_STASH_HINT_HOST_WILLNEED	0x010
> +
> +/**
> + * Data written by the device is only read by device.
> + * Host CPUs do not read this data or write to the location of the data.
> + */
> +#define RTE_ETH_DEV_STASH_HINT_DEV_ONLY		0x200
> +
> +
> +#define __RTE_ETH_DEV_STASH_HINT_TX_MASK	0x00f
> +
> +#define __RTE_ETH_DEV_STASH_HINT_RX_MASK	0x0f0
> +
> +#define __RTE_ETH_DEV_STASH_HINT_RXTX_MASK	0xf00
> +
> +
> +/**@}*/
> +
> +#define RTE_ETH_DEV_STASH_HINT_IS_TX(h)				\
> +	((!((h) & ~(__RTE_ETH_DEV_STASH_HINT_TX_MASK))) && (h))
> +
> +#define RTE_ETH_DEV_STASH_HINT_IS_RX(h)				\
> +	((!((h) & ~(__RTE_ETH_DEV_STASH_HINT_RX_MASK))) && (h))
> +
> +#define RTE_ETH_DEV_STASH_HINT_IS_RXTX(h)		\
> +	((!((h) & ~(__RTE_ETH_DEV_STASH_HINT_RXTX_MASK))) && (h))
> +
> +/**@{@name Stashable Rx/Tx queue object types supported by the ethernet device
> + *@see rte_eth_dev_stashing_hints_discover
> + *@see rte_eth_dev_stashing_hints_rx
> + *@see rte_eth_dev_stashing_hints_tx
> + */
> +
> +/**
> + * Apply stashing hint to data at a given offset from the start of a
> + * received packet.
> + */
> +#define RTE_ETH_DEV_STASH_TYPE_OFFSET	0x0001
> +
> +/** Apply stashing hint to an rx descriptor. */
> +#define RTE_ETH_DEV_STASH_TYPE_DESC	0x0002
> +
> +/** Apply stashing hint to a header of a received packet. */
> +#define RTE_ETH_DEV_STASH_TYPE_HEADER	0x0004
> +
> +/** Apply stashing hint to a payload of a received packet. */
> +#define RTE_ETH_DEV_STASH_TYPE_PAYLOAD	0x0008
> +#define __RTE_ETH_DEV_STASH_TYPE_MASK	0x000f
> +/**@}*/
> +
> +#define RTE_ETH_DEV_STASH_TYPE_VALID(t)				\
> +	((!((t) & (~__RTE_ETH_DEV_STASH_TYPE_MASK))) && (t))
> +
> +/**
> + *
> + * @warning
> + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
> + *
> + * @internal
> + * Helper function to validate stashing hints.
> + */
> +__rte_experimental
> +int rte_eth_dev_validate_stashing_hints(uint16_t port_id, uint16_t queue_id,
> +					uint8_t queue_direction, uint16_t type,
> +					uint16_t hint);
> +/**
> + *
> + * @warning
> + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
> + *
> + * Provide cache stashing hints for improved memory access latencies for
> + * packets received by the NIC. Hints the underlying hardware that CPU indicated
> + * in cpuid parameter prefers to have the data specified in the type parameter
> + * at a level in the memory hierarchy specified in cache_level parameter for
> + * access pattern(s) specified in hints parameter.
> + * This feature is available only in supported NICs and platforms.
> + *
> + * @param port_id
> + *  The port identifier of the Ethernet device.
> + * @param cpuid
> + *  ID of the targeted CPU for the hint.
> + * @param cache_level
> + *  The preferred level of the cache the packets are expected at the time of
> + *  retrieval.
> + * @param queue_id
> + *  The index of the receive queue to which hints are applied.
> + * @param types
> + *  A vector of stashing types to apply hints on receive queue.
> + *  Hints are applied on the types specified in types vector.
> + *  types can include receive queue descriptors (RTE_ETH_DEV_STASH_TYPE_DESC),
> + *  packet headers (RTE_ETH_DEV_STASH_TYPE_HEADER),
> + *  packet payloads (RTE_ETH_DEV_STASH_TYPE_PAYLOAD) or
> + *  to an offset (RTE_ETH_DEV_STASH_TYPE_OFFSET) in to packet.
> + *  Types used should be compatible with RX queues, if not -EINVAL will be
> + *  returned.
> + * @param offset
> + *  Offset into the packet if RTE_ETH_DEV_STASH_TYPE_RX_OFFSET is set in hints.
> + * @param hints
> + *  A vector of stashing hints to the device and the platform.
> + * @return
> + *  - (-ENODEV) on incorrect port_ids.
> + *  - (-EINVAL) if both RX and TX types are used in conjuection in type
> + *  parameter.
> + *  - (-EINVAL) if hints are incompatible with RX queues.
> + *  - (-EINVAL) on invalid queue_id.
> + *  - (-ENOTSUP) if RTE_ETH_DEV_CAPA_CACHE_STASHING capability is unavailable.
> + *  - (-ENOSYS) if PMD does not implement cache stashing hints.
> + *  - (0) on Success.
> + */
> +__rte_experimental
> +int rte_eth_dev_stashing_hints_rx(uint16_t port_id, uint16_t cpuid,
> +				 uint8_t cache_level, uint16_t queue_id,
> +				 uint16_t types, off_t offset, uint16_t hints);
> +
> +/**
> + *
> + * @warning
> + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
> + *
> + * Provide cache stashing hints for improved memory access latencies for
> + * packets being transmitted by the NIC. Hints the underlying hardware that CPU
> + * prefers to have the data specified in the type parameter at a level in the
> + * memory hierarchy specified in cache_level parameter for an access pattern
> + * specified in hints parameter.
> + * This feature is available only in supported NICs and platforms.
> + *
> + * @param port_id
> + *  The port identifier of the Ethernet device.
> + * @param cpuid
> + *  ID of the targeted CPU for the hint.
> + * @param cache_level
> + *  The preferred level of the cache the packets are expected at the time of
> + *  transmission.
> + * @param queue_id
> + *  The index of the transmit queue which hints are applied to.
> + * @param types
> + *  A vector of stashing types to apply hints on transmit queue.
> + *  hints are applied on types specified in types vector.
> + *  types can innclude transmit queue descriptors (RTE_ETH_DEV_STASH_TYPE_DESC),
> + *  packet headers (RTE_ETH_DEV_STASH_TYPE_HEADER),
> + *  packet payloads (RTE_ETH_DEV_STASH_TYPE_PAYLOAD) or
> + *  to an offset (RTE_ETH_DEV_STASH_TYPE_OFFSET) in to packet.
> + *  Types used should be compatible with TX queues, if not -EINVAL will be
> + *  returned.
> + * @param offset
> + *  Offset into the packet if RTE_ETH_DEV_STASH_TYPE_RX_OFFSET is set in hints.
> + * @param hints
> + *  A vector of stashing hints to the device and the platform.
> + * @return
> + *  - (-ENODEV) on incorrect port_ids.
> + *  - (-EINVAL) if both RX and TX types are used in conjuection in type
> + *  parameter.
> + *  - (-EINVAL) if hints are incompatible with TX queues.
> + *  - (-EINVAL) on invalid queue_id.
> + *  - (-ENOTSUP) if RTE_ETH_DEV_CAPA_CACHE_STASHING capability is unavailable.
> + *  - (-ENOSYS) if PMD does not implement cache stashing hints.
> + *  - (0) on Success.
> + */
> +__rte_experimental
> +int rte_eth_dev_stashing_hints_tx(uint16_t port_id, uint16_t cpuid,
> +				 uint8_t cache_level, uint16_t queue_id,
> +				 uint16_t types, off_t offset, uint16_t hints);
> +
> +/**
> + *
> + * @warning
> + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
> + *
> + * Discover cache stashing hints and object types supported in the ethernet
> + * device.
> + *

Why is this needed?

It seems to me the application should just give hints what it thinks is 
best, and then the PMD should do the best it can with that information.

No validating, no discovering.

Now you have a lot of API verbiage for a fairly esoteric function.

Just to be clear: I think an API like this is a great idea. Controlling 
how the NIC and other devices loads and stores data can be crucial.

> + * @param port_id
> + *  The port identifier of the Ethernet device.
> + * @param types
> + *  Supported types vector set by the ethernet device.
> + * @param hints
> + *  Supported hints vector set by the ethernet device.
> + * @return
> + *  On return types and hints parameters will have bits set for supported
> + *  object types.
> + *  - (-ENOTSUP) if the device or the platform does not support cache stashing.
> + *  - (-ENOSYS)  if the underlying PMD hasn't implemented cache stashing
> + *  feature.
> + *  - (-EINVAL)  on NULL values for types or hints parameters.
> + *  - (0) on success.
> + */
> +__rte_experimental
> +int rte_eth_dev_stashing_hints_discover(uint16_t port_id, uint16_t *types,
> +					uint16_t *hints);
> +
>   #include <rte_ethdev_core.h>
>   
>   /**
> diff --git a/lib/ethdev/version.map b/lib/ethdev/version.map
> index 79f6f5293b..5eef0b4540 100644
> --- a/lib/ethdev/version.map
> +++ b/lib/ethdev/version.map
> @@ -325,6 +325,12 @@ EXPERIMENTAL {
>   	rte_flow_template_table_resizable;
>   	rte_flow_template_table_resize;
>   	rte_flow_template_table_resize_complete;
> +
> +	# added in 24.07
> +	rte_eth_dev_stashing_hints_rx;
> +	rte_eth_dev_stashing_hints_tx;
> +	rte_eth_dev_stashing_hints_discover;
> +	rte_eth_dev_validate_stashing_hints;
>   };
>   
>   INTERNAL {


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC v2] ethdev: an API for cache stashing hints
  2024-10-23 17:59 ` [RFC v2] ethdev: an API for cache stashing hints Mattias Rönnblom
@ 2024-10-23 20:18   ` Stephen Hemminger
  2024-10-24 14:59   ` Wathsala Wathawana Vithanage
  2024-10-25  7:43   ` Andrew Rybchenko
  2 siblings, 0 replies; 27+ messages in thread
From: Stephen Hemminger @ 2024-10-23 20:18 UTC (permalink / raw)
  To: Mattias Rönnblom
  Cc: Wathsala Vithanage, dev, Thomas Monjalon, Ferruh Yigit,
	Andrew Rybchenko, nd, Dhruv Tripathi

On Wed, 23 Oct 2024 19:59:35 +0200
Mattias Rönnblom <hofors@lysator.liu.se> wrote:

> > diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
> > index 883e59a927..b90dc8793b 100644
> > --- a/lib/ethdev/ethdev_driver.h
> > +++ b/lib/ethdev/ethdev_driver.h
> > @@ -1235,6 +1235,70 @@ typedef int (*eth_count_aggr_ports_t)(struct rte_eth_dev *dev);
> >   typedef int (*eth_map_aggr_tx_affinity_t)(struct rte_eth_dev *dev, uint16_t tx_queue_id,
> >   					  uint8_t affinity);
> >   
> > +/**
> > + * @internal
> > + * Set cache stashing hint in the ethernet device.
> > + *
> > + * @param dev
> > + *   Port (ethdev) handle.
> > + * @param cpuid
> > + *   ID of the targeted CPU.
> > + * @param cache_level
> > + *   Level of the cache to stash data.  
> 
> If we had a hwtopo API in DPDK, we could just use a node id in such a 
> graph (of CPUs and caches) to describe were the data ideally would land. 
> In such a case, you could have a node id for DDR as well, and thus you 
> could drop the notion of "stashing". Just a "drop off the data here, 
> please, if you can" API.
> 
> I don't think this API and its documentation should talk about what the 
> "CPU" needs, since it's somewhat misleading.
> 
> For example, you can imagine you want the packet payload to land in the 
> LLC, even though it's not for any CPU to consume, in case you know with 
> some certaintly that the packet will soon be transmitted (and thus 
> consumed by the NIC).
> 
> The same scenario can happen, the consumer is an accelerator (e.g., a 
> crypto engine).
> 
> Likewise, you may know that the whole packet will be read by some CPU 
> core, but you also know the system tends to buffer packets before they 
> are being processed. In such a case, it's better to go to DRAM right 
> away, to avoid trashing the LLC (or some other cache).
> 
> Also, why do you need to use the word "host"? Seems like a PCI thing. 
> This may be implemented in PCI, but surely can be done (and has been 
> done) without PCI.

+1 for the concept of having a CPU and PCI topology map that
can be queried by drivers and application. Dumpster diving into sysfs
is hard to get right and keeps growing. I wonder if there exists an open
source library that is a good enough starting point for this already.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC v3 2/2] ethdev: introduce the cache stashing hints API
  2024-10-21  1:52   ` [RFC v3 2/2] ethdev: introduce the cache stashing hints API Wathsala Vithanage
  2024-10-21  7:36     ` Morten Brørup
@ 2024-10-24  5:49     ` Jerin Jacob
  2024-10-24  6:59       ` Morten Brørup
  2024-10-24 15:04       ` Wathsala Wathawana Vithanage
  1 sibling, 2 replies; 27+ messages in thread
From: Jerin Jacob @ 2024-10-24  5:49 UTC (permalink / raw)
  To: Wathsala Vithanage
  Cc: Thomas Monjalon, Ferruh Yigit, Andrew Rybchenko, dev, nd,
	Honnappa Nagarahalli, Dhruv Tripathi

On Mon, Oct 21, 2024 at 7:23 AM Wathsala Vithanage
<wathsala.vithanage@arm.com> wrote:
>
> Extend the ethdev library to enable the stashing of different data
> objects, such as the ones listed below, into CPU caches directly
> from the NIC.
>
> - Rx/Tx queue descriptors
> - Rx packets
> - Packet headers
> - packet payloads
> - Data of a packet at an offset from the start of the packet
>
> The APIs are designed in a hardware/vendor agnostic manner such that
> supporting PMDs could use any capabilities available in the underlying
> hardware for fine-grained stashing of data objects into a CPU cache
> (e.g., Steering Tags int PCIe TLP Processing Hints).
>
> The API provides an interface to query the availability of stashing
> capabilities, i.e., platform/NIC support, stashable object types, etc,
> via the rte_eth_dev_stashing_capabilities_get interface.
>
> The function pair rte_eth_dev_stashing_rx_config_set and
> rte_eth_dev_stashing_tx_config_set sets the stashing hint (the CPU,
> cache level, and data object types) on the Rx and Tx queues.
>
> PMDs that support stashing must register their implementations with the
> following eth_dev_ops callbacks, which are invoked by the ethdev
> functions listed above.
>
> - stashing_capabilities_get
> - stashing_rx_hints_set
> - stashing_tx_hints_set
>
> Signed-off-by: Wathsala Vithanage <wathsala.vithanage@arm.com>
> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> Reviewed-by: Dhruv Tripathi <dhruv.tripathi@arm.com>
>

> +
> +/** Queue type is RX. */
> +#define RTE_ETH_DEV_RX_QUEUE           0
> +/** Queue type is TX. */
> +#define RTE_ETH_DEV_TX_QUEUE           1
> +
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this structure may change, or be removed, without prior notice
> + *
> + * A structure used for configuring the cache stashing hints.
> + */
> +struct rte_eth_stashing_config {
> +       /** ID of the Processor/Container the stashing hints are
> +        *  applied to
> +        */
> +       uint16_t        lcore_id;
> +       /** Set if the target is a CPU containeri.lcore_id will be
> +        * used to derive container ID
> +        */
> +       uint16_t        container : 1;
> +       uint16_t        padding : 7;
> +       /** Cache level of the CPU specified by the cpu_id the
> +        *  stashing hints are applied to
> +        */
> +       uint16_t        cache_level : 8;
> +       /** Object types the configuration is applied to
> +        */
> +       uint16_t        objects;
> +       /** The offset if RTE_ETH_DEV_STASH_OBJECT_OFFSET bit is set
> +        *  in objects
> +        */
> +       off_t           offset;
> +};
> +
> +/**@{@name Stashable Rx/Tx queue object types supported by the ethernet device
> + *@see rte_eth_dev_stashing_capabilities_get
> + *@see rte_eth_dev_stashing_rx_config_set
> + *@see rte_eth_dev_stashing_tx_config_set
> + */
> +
> +/**
> + * Apply stashing hint to data at a given offset from the start of a
> + * received packet.
> + */
> +#define RTE_ETH_DEV_STASH_OBJECT_OFFSET                0x0001
> +
> +/** Apply stashing hint to an rx descriptor. */
> +#define RTE_ETH_DEV_STASH_OBJECT_DESC          0x0002
> +
> +/** Apply stashing hint to a header of a received packet. */
> +#define RTE_ETH_DEV_STASH_OBJECT_HEADER                0x0004
> +
> +/** Apply stashing hint to a payload of a received packet. */
> +#define RTE_ETH_DEV_STASH_OBJECT_PAYLOAD       0x0008
> +
> +#define __RTE_ETH_DEV_STASH_OBJECT_MASK                0x000f
> +/**@}*/
> +
> +#define RTE_ETH_DEV_STASH_OBJECTS_VALID(t)                             \
> +       ((!((t) & (~__RTE_ETH_DEV_STASH_OBJECT_MASK))) && (t))
> +


I think, at one of point of time, we need to extend this to other
device class like(cryptodev etc)
where the data needs to move over bus. In that context, all the above
symbols better to be in
EAL and the device class subsystem(example ethdev) gives PMD callback.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* RE: [RFC v3 2/2] ethdev: introduce the cache stashing hints API
  2024-10-24  5:49     ` Jerin Jacob
@ 2024-10-24  6:59       ` Morten Brørup
  2024-10-24 15:12         ` Wathsala Wathawana Vithanage
  2024-10-24 15:04       ` Wathsala Wathawana Vithanage
  1 sibling, 1 reply; 27+ messages in thread
From: Morten Brørup @ 2024-10-24  6:59 UTC (permalink / raw)
  To: Jerin Jacob, Wathsala Vithanage
  Cc: Thomas Monjalon, Ferruh Yigit, Andrew Rybchenko, dev, nd,
	Honnappa Nagarahalli, Dhruv Tripathi

> From: Jerin Jacob [mailto:jerinjacobk@gmail.com]
> Sent: Thursday, 24 October 2024 07.49
> 
> On Mon, Oct 21, 2024 at 7:23 AM Wathsala Vithanage
> <wathsala.vithanage@arm.com> wrote:
> >
> > Extend the ethdev library to enable the stashing of different data
> > objects, such as the ones listed below, into CPU caches directly
> > from the NIC.
> >
> > - Rx/Tx queue descriptors
> > - Rx packets
> > - Packet headers
> > - packet payloads
> > - Data of a packet at an offset from the start of the packet
> >
> > The APIs are designed in a hardware/vendor agnostic manner such that
> > supporting PMDs could use any capabilities available in the
> underlying
> > hardware for fine-grained stashing of data objects into a CPU cache
> > (e.g., Steering Tags int PCIe TLP Processing Hints).
> >
> > The API provides an interface to query the availability of stashing
> > capabilities, i.e., platform/NIC support, stashable object types,
> etc,
> > via the rte_eth_dev_stashing_capabilities_get interface.
> >
> > The function pair rte_eth_dev_stashing_rx_config_set and
> > rte_eth_dev_stashing_tx_config_set sets the stashing hint (the CPU,
> > cache level, and data object types) on the Rx and Tx queues.
> >
> > PMDs that support stashing must register their implementations with
> the
> > following eth_dev_ops callbacks, which are invoked by the ethdev
> > functions listed above.
> >
> > - stashing_capabilities_get
> > - stashing_rx_hints_set
> > - stashing_tx_hints_set
> >
> > Signed-off-by: Wathsala Vithanage <wathsala.vithanage@arm.com>
> > Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > Reviewed-by: Dhruv Tripathi <dhruv.tripathi@arm.com>
> >
> 
> > +
> > +/** Queue type is RX. */
> > +#define RTE_ETH_DEV_RX_QUEUE           0
> > +/** Queue type is TX. */
> > +#define RTE_ETH_DEV_TX_QUEUE           1
> > +
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this structure may change, or be removed,
> without prior notice
> > + *
> > + * A structure used for configuring the cache stashing hints.

This structure can only describe one stashing hint.
Please use singular, not plural, in its description.

> > + */
> > +struct rte_eth_stashing_config {
> > +       /** ID of the Processor/Container the stashing hints are
> > +        *  applied to
> > +        */
> > +       uint16_t        lcore_id;

The common type used for lcore_id is "unsigned int", ref. e.g. rte_lcore_id() return value.
Alternatively uint32_t, ref. LCORE_ID_ANY.

> > +       /** Set if the target is a CPU containeri.lcore_id will be
> > +        * used to derive container ID
> > +        */
> > +       uint16_t        container : 1;
> > +       uint16_t        padding : 7;
> > +       /** Cache level of the CPU specified by the cpu_id the
> > +        *  stashing hints are applied to
> > +        */
> > +       uint16_t        cache_level : 8;
> > +       /** Object types the configuration is applied to
> > +        */
> > +       uint16_t        objects;
> > +       /** The offset if RTE_ETH_DEV_STASH_OBJECT_OFFSET bit is set
> > +        *  in objects
> > +        */
> > +       off_t           offset;

off_t is for files, ptrdiff_t is for memory.

> > +};
> > +
> > +/**@{@name Stashable Rx/Tx queue object types supported by the
> ethernet device
> > + *@see rte_eth_dev_stashing_capabilities_get
> > + *@see rte_eth_dev_stashing_rx_config_set
> > + *@see rte_eth_dev_stashing_tx_config_set
> > + */
> > +
> > +/**
> > + * Apply stashing hint to data at a given offset from the start of a
> > + * received packet.
> > + */
> > +#define RTE_ETH_DEV_STASH_OBJECT_OFFSET                0x0001
> > +
> > +/** Apply stashing hint to an rx descriptor. */
> > +#define RTE_ETH_DEV_STASH_OBJECT_DESC          0x0002
> > +
> > +/** Apply stashing hint to a header of a received packet. */
> > +#define RTE_ETH_DEV_STASH_OBJECT_HEADER                0x0004
> > +
> > +/** Apply stashing hint to a payload of a received packet. */
> > +#define RTE_ETH_DEV_STASH_OBJECT_PAYLOAD       0x0008
> > +
> > +#define __RTE_ETH_DEV_STASH_OBJECT_MASK                0x000f
> > +/**@}*/
> > +
> > +#define RTE_ETH_DEV_STASH_OBJECTS_VALID(t)
> \
> > +       ((!((t) & (~__RTE_ETH_DEV_STASH_OBJECT_MASK))) && (t))
> > +
> 
> 
> I think, at one of point of time, we need to extend this to other
> device class like(cryptodev etc)
> where the data needs to move over bus. In that context, all the above
> symbols better to be in
> EAL and the device class subsystem(example ethdev) gives PMD callback.

+1

When generalizing this, perhaps "header" and "payload" should be renamed to "device-specific".

For ethdevs, the typical meaning of "device-specific" would be splitting at some header (as suggested by the "header" and "payload" enum values).

Furthermore, for ethdevs, using a "device-specific" would allow the device to split at some other point, controlled through other ethdev APIs.
E.g. the split point could be controlled by rte_flow; this would allow rte_flow to put entire packets in L2 cache for some packet types, and only the packet header in L2 cache for some other packet types. (Someone at the conference call suggested combining Steering Tags with rte_flow - this might be a way of doing it.)


^ permalink raw reply	[flat|nested] 27+ messages in thread

* RE: [RFC v2] ethdev: an API for cache stashing hints
  2024-10-23 17:59 ` [RFC v2] ethdev: an API for cache stashing hints Mattias Rönnblom
  2024-10-23 20:18   ` Stephen Hemminger
@ 2024-10-24 14:59   ` Wathsala Wathawana Vithanage
  2024-10-25  7:43   ` Andrew Rybchenko
  2 siblings, 0 replies; 27+ messages in thread
From: Wathsala Wathawana Vithanage @ 2024-10-24 14:59 UTC (permalink / raw)
  To: Mattias Rönnblom, dev, thomas, Ferruh Yigit, Andrew Rybchenko
  Cc: nd, Dhruv Tripathi, nd

> If we had a hwtopo API in DPDK, we could just use a node id in such a graph
> (of CPUs and caches) to describe were the data ideally would land.
> In such a case, you could have a node id for DDR as well, and thus you could
> drop the notion of "stashing". Just a "drop off the data here, please, if you
> can" API.
> 
> I don't think this API and its documentation should talk about what the "CPU"
> needs, since it's somewhat misleading.
> 
> For example, you can imagine you want the packet payload to land in the LLC,
> even though it's not for any CPU to consume, in case you know with some
> certaintly that the packet will soon be transmitted (and thus consumed by the
> NIC).
> 
> The same scenario can happen, the consumer is an accelerator (e.g., a crypto
> engine).
> 
> Likewise, you may know that the whole packet will be read by some CPU core,
> but you also know the system tends to buffer packets before they are being
> processed. In such a case, it's better to go to DRAM right away, to avoid
> trashing the LLC (or some other cache).
> 
> Also, why do you need to use the word "host"? Seems like a PCI thing.
> This may be implemented in PCI, but surely can be done (and has been
> done) without PCI.
> 

Thanks, Mattias. V2 is outdated, please provide feedback on V3.



^ permalink raw reply	[flat|nested] 27+ messages in thread

* RE: [RFC v3 2/2] ethdev: introduce the cache stashing hints API
  2024-10-24  5:49     ` Jerin Jacob
  2024-10-24  6:59       ` Morten Brørup
@ 2024-10-24 15:04       ` Wathsala Wathawana Vithanage
  1 sibling, 0 replies; 27+ messages in thread
From: Wathsala Wathawana Vithanage @ 2024-10-24 15:04 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: thomas, Ferruh Yigit, Andrew Rybchenko, dev, nd,
	Honnappa Nagarahalli, Dhruv Tripathi, nd



> -----Original Message-----
> From: Jerin Jacob <jerinjacobk@gmail.com>
> Sent: Thursday, October 24, 2024 12:49 AM
> To: Wathsala Wathawana Vithanage <wathsala.vithanage@arm.com>
> Cc: thomas@monjalon.net; Ferruh Yigit <ferruh.yigit@amd.com>; Andrew
> Rybchenko <andrew.rybchenko@oktetlabs.ru>; dev@dpdk.org; nd
> <nd@arm.com>; Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>;
> Dhruv Tripathi <Dhruv.Tripathi@arm.com>
> Subject: Re: [RFC v3 2/2] ethdev: introduce the cache stashing hints API
> 
> On Mon, Oct 21, 2024 at 7:23 AM Wathsala Vithanage
> <wathsala.vithanage@arm.com> wrote:
> >
> > Extend the ethdev library to enable the stashing of different data
> > objects, such as the ones listed below, into CPU caches directly from
> > the NIC.
> >
> > - Rx/Tx queue descriptors
> > - Rx packets
> > - Packet headers
> > - packet payloads
> > - Data of a packet at an offset from the start of the packet
> >
> > The APIs are designed in a hardware/vendor agnostic manner such that
> > supporting PMDs could use any capabilities available in the underlying
> > hardware for fine-grained stashing of data objects into a CPU cache
> > (e.g., Steering Tags int PCIe TLP Processing Hints).
> >
> > The API provides an interface to query the availability of stashing
> > capabilities, i.e., platform/NIC support, stashable object types, etc,
> > via the rte_eth_dev_stashing_capabilities_get interface.
> >
> > The function pair rte_eth_dev_stashing_rx_config_set and
> > rte_eth_dev_stashing_tx_config_set sets the stashing hint (the CPU,
> > cache level, and data object types) on the Rx and Tx queues.
> >
> > PMDs that support stashing must register their implementations with
> > the following eth_dev_ops callbacks, which are invoked by the ethdev
> > functions listed above.
> >
> > - stashing_capabilities_get
> > - stashing_rx_hints_set
> > - stashing_tx_hints_set
> >
> > Signed-off-by: Wathsala Vithanage <wathsala.vithanage@arm.com>
> > Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > Reviewed-by: Dhruv Tripathi <dhruv.tripathi@arm.com>
> >
> 
> > +
> > +/** Queue type is RX. */
> > +#define RTE_ETH_DEV_RX_QUEUE           0
> > +/** Queue type is TX. */
> > +#define RTE_ETH_DEV_TX_QUEUE           1
> > +
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this structure may change, or be removed, without
> > +prior notice
> > + *
> > + * A structure used for configuring the cache stashing hints.
> > + */
> > +struct rte_eth_stashing_config {
> > +       /** ID of the Processor/Container the stashing hints are
> > +        *  applied to
> > +        */
> > +       uint16_t        lcore_id;
> > +       /** Set if the target is a CPU containeri.lcore_id will be
> > +        * used to derive container ID
> > +        */
> > +       uint16_t        container : 1;
> > +       uint16_t        padding : 7;
> > +       /** Cache level of the CPU specified by the cpu_id the
> > +        *  stashing hints are applied to
> > +        */
> > +       uint16_t        cache_level : 8;
> > +       /** Object types the configuration is applied to
> > +        */
> > +       uint16_t        objects;
> > +       /** The offset if RTE_ETH_DEV_STASH_OBJECT_OFFSET bit is set
> > +        *  in objects
> > +        */
> > +       off_t           offset;
> > +};
> > +
> > +/**@{@name Stashable Rx/Tx queue object types supported by the
> > +ethernet device  *@see rte_eth_dev_stashing_capabilities_get
> > + *@see rte_eth_dev_stashing_rx_config_set
> > + *@see rte_eth_dev_stashing_tx_config_set
> > + */
> > +
> > +/**
> > + * Apply stashing hint to data at a given offset from the start of a
> > + * received packet.
> > + */
> > +#define RTE_ETH_DEV_STASH_OBJECT_OFFSET                0x0001
> > +
> > +/** Apply stashing hint to an rx descriptor. */
> > +#define RTE_ETH_DEV_STASH_OBJECT_DESC          0x0002
> > +
> > +/** Apply stashing hint to a header of a received packet. */
> > +#define RTE_ETH_DEV_STASH_OBJECT_HEADER                0x0004
> > +
> > +/** Apply stashing hint to a payload of a received packet. */
> > +#define RTE_ETH_DEV_STASH_OBJECT_PAYLOAD       0x0008
> > +
> > +#define __RTE_ETH_DEV_STASH_OBJECT_MASK                0x000f
> > +/**@}*/
> > +
> > +#define RTE_ETH_DEV_STASH_OBJECTS_VALID(t)                             \
> > +       ((!((t) & (~__RTE_ETH_DEV_STASH_OBJECT_MASK))) && (t))
> > +
> 
> 
> I think, at one of point of time, we need to extend this to other device class
> like(cryptodev etc) where the data needs to move over bus. In that context, all
> the above symbols better to be in EAL and the device class subsystem(example
> ethdev) gives PMD callback.

+1
I will make that change in the RFC v4.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* RE: [RFC v3 2/2] ethdev: introduce the cache stashing hints API
  2024-10-24  6:59       ` Morten Brørup
@ 2024-10-24 15:12         ` Wathsala Wathawana Vithanage
  0 siblings, 0 replies; 27+ messages in thread
From: Wathsala Wathawana Vithanage @ 2024-10-24 15:12 UTC (permalink / raw)
  To: Morten Brørup, Jerin Jacob
  Cc: thomas, Ferruh Yigit, Andrew Rybchenko, dev, nd,
	Honnappa Nagarahalli, Dhruv Tripathi, nd



> -----Original Message-----
> From: Morten Brørup <mb@smartsharesystems.com>
> Sent: Thursday, October 24, 2024 2:00 AM
> To: Jerin Jacob <jerinjacobk@gmail.com>; Wathsala Wathawana Vithanage
> <wathsala.vithanage@arm.com>
> Cc: thomas@monjalon.net; Ferruh Yigit <ferruh.yigit@amd.com>; Andrew
> Rybchenko <andrew.rybchenko@oktetlabs.ru>; dev@dpdk.org; nd
> <nd@arm.com>; Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>;
> Dhruv Tripathi <Dhruv.Tripathi@arm.com>
> Subject: RE: [RFC v3 2/2] ethdev: introduce the cache stashing hints API
> 
> > From: Jerin Jacob [mailto:jerinjacobk@gmail.com]
> > Sent: Thursday, 24 October 2024 07.49
> >
> > On Mon, Oct 21, 2024 at 7:23 AM Wathsala Vithanage
> > <wathsala.vithanage@arm.com> wrote:
> > >
> > > Extend the ethdev library to enable the stashing of different data
> > > objects, such as the ones listed below, into CPU caches directly
> > > from the NIC.
> > >
> > > - Rx/Tx queue descriptors
> > > - Rx packets
> > > - Packet headers
> > > - packet payloads
> > > - Data of a packet at an offset from the start of the packet
> > >
> > > The APIs are designed in a hardware/vendor agnostic manner such that
> > > supporting PMDs could use any capabilities available in the
> > underlying
> > > hardware for fine-grained stashing of data objects into a CPU cache
> > > (e.g., Steering Tags int PCIe TLP Processing Hints).
> > >
> > > The API provides an interface to query the availability of stashing
> > > capabilities, i.e., platform/NIC support, stashable object types,
> > etc,
> > > via the rte_eth_dev_stashing_capabilities_get interface.
> > >
> > > The function pair rte_eth_dev_stashing_rx_config_set and
> > > rte_eth_dev_stashing_tx_config_set sets the stashing hint (the CPU,
> > > cache level, and data object types) on the Rx and Tx queues.
> > >
> > > PMDs that support stashing must register their implementations with
> > the
> > > following eth_dev_ops callbacks, which are invoked by the ethdev
> > > functions listed above.
> > >
> > > - stashing_capabilities_get
> > > - stashing_rx_hints_set
> > > - stashing_tx_hints_set
> > >
> > > Signed-off-by: Wathsala Vithanage <wathsala.vithanage@arm.com>
> > > Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > > Reviewed-by: Dhruv Tripathi <dhruv.tripathi@arm.com>
> > >
> >
> > > +
> > > +/** Queue type is RX. */
> > > +#define RTE_ETH_DEV_RX_QUEUE           0
> > > +/** Queue type is TX. */
> > > +#define RTE_ETH_DEV_TX_QUEUE           1
> > > +
> > > +
> > > +/**
> > > + * @warning
> > > + * @b EXPERIMENTAL: this structure may change, or be removed,
> > without prior notice
> > > + *
> > > + * A structure used for configuring the cache stashing hints.
> 
> This structure can only describe one stashing hint.
> Please use singular, not plural, in its description.
> 
> > > + */
> > > +struct rte_eth_stashing_config {
> > > +       /** ID of the Processor/Container the stashing hints are
> > > +        *  applied to
> > > +        */
> > > +       uint16_t        lcore_id;
> 
> The common type used for lcore_id is "unsigned int", ref. e.g. rte_lcore_id()
> return value.
> Alternatively uint32_t, ref. LCORE_ID_ANY.
> 
+1

> > > +       /** Set if the target is a CPU containeri.lcore_id will be
> > > +        * used to derive container ID
> > > +        */
> > > +       uint16_t        container : 1;
> > > +       uint16_t        padding : 7;
> > > +       /** Cache level of the CPU specified by the cpu_id the
> > > +        *  stashing hints are applied to
> > > +        */
> > > +       uint16_t        cache_level : 8;
> > > +       /** Object types the configuration is applied to
> > > +        */
> > > +       uint16_t        objects;
> > > +       /** The offset if RTE_ETH_DEV_STASH_OBJECT_OFFSET bit is set
> > > +        *  in objects
> > > +        */
> > > +       off_t           offset;
> 
> off_t is for files, ptrdiff_t is for memory.
> 

+1

> > > +};
> > > +
> > > +/**@{@name Stashable Rx/Tx queue object types supported by the
> > ethernet device
> > > + *@see rte_eth_dev_stashing_capabilities_get
> > > + *@see rte_eth_dev_stashing_rx_config_set
> > > + *@see rte_eth_dev_stashing_tx_config_set
> > > + */
> > > +
> > > +/**
> > > + * Apply stashing hint to data at a given offset from the start of
> > > +a
> > > + * received packet.
> > > + */
> > > +#define RTE_ETH_DEV_STASH_OBJECT_OFFSET                0x0001
> > > +
> > > +/** Apply stashing hint to an rx descriptor. */
> > > +#define RTE_ETH_DEV_STASH_OBJECT_DESC          0x0002
> > > +
> > > +/** Apply stashing hint to a header of a received packet. */
> > > +#define RTE_ETH_DEV_STASH_OBJECT_HEADER                0x0004
> > > +
> > > +/** Apply stashing hint to a payload of a received packet. */
> > > +#define RTE_ETH_DEV_STASH_OBJECT_PAYLOAD       0x0008
> > > +
> > > +#define __RTE_ETH_DEV_STASH_OBJECT_MASK                0x000f
> > > +/**@}*/
> > > +
> > > +#define RTE_ETH_DEV_STASH_OBJECTS_VALID(t)
> > \
> > > +       ((!((t) & (~__RTE_ETH_DEV_STASH_OBJECT_MASK))) && (t))
> > > +
> >
> >
> > I think, at one of point of time, we need to extend this to other
> > device class like(cryptodev etc) where the data needs to move over
> > bus. In that context, all the above symbols better to be in EAL and
> > the device class subsystem(example ethdev) gives PMD callback.
> 
> +1
> 
> When generalizing this, perhaps "header" and "payload" should be renamed
> to "device-specific".
> 
> For ethdevs, the typical meaning of "device-specific" would be splitting at
> some header (as suggested by the "header" and "payload" enum values).
> 
> Furthermore, for ethdevs, using a "device-specific" would allow the device to
> split at some other point, controlled through other ethdev APIs.
> E.g. the split point could be controlled by rte_flow; this would allow rte_flow
> to put entire packets in L2 cache for some packet types, and only the packet
> header in L2 cache for some other packet types. (Someone at the conference
> call suggested combining Steering Tags with rte_flow - this might be a way of
> doing it.)

+1



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC v2] ethdev: an API for cache stashing hints
  2024-10-23 17:59 ` [RFC v2] ethdev: an API for cache stashing hints Mattias Rönnblom
  2024-10-23 20:18   ` Stephen Hemminger
  2024-10-24 14:59   ` Wathsala Wathawana Vithanage
@ 2024-10-25  7:43   ` Andrew Rybchenko
  2 siblings, 0 replies; 27+ messages in thread
From: Andrew Rybchenko @ 2024-10-25  7:43 UTC (permalink / raw)
  To: Mattias Rönnblom, Wathsala Vithanage, dev, Thomas Monjalon,
	Ferruh Yigit
  Cc: nd, Dhruv Tripathi

On 10/23/24 20:59, Mattias Rönnblom wrote:
> On 2024-07-16 00:11, Wathsala Vithanage wrote:

...

>> +/**
>> + *
>> + * @warning
>> + * @b EXPERIMENTAL: this API may change, or be removed, without prior 
>> notice
>> + *
>> + * Discover cache stashing hints and object types supported in the 
>> ethernet
>> + * device.
>> + *
> 
> Why is this needed?
> 
> It seems to me the application should just give hints what it thinks is 
> best, and then the PMD should do the best it can with that information.

+1, I think it is a very good idea

> 
> No validating, no discovering.
> 
> Now you have a lot of API verbiage for a fairly esoteric function.
> 
> Just to be clear: I think an API like this is a great idea. Controlling 
> how the NIC and other devices loads and stores data can be crucial.

+1



^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2024-10-25  7:43 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-07-15 22:11 [RFC v2] ethdev: an API for cache stashing hints Wathsala Vithanage
2024-07-17  2:27 ` Stephen Hemminger
2024-07-18 18:48   ` Wathsala Wathawana Vithanage
2024-07-20  3:05   ` Honnappa Nagarahalli
2024-07-17 10:32 ` Konstantin Ananyev
2024-07-22 11:18 ` Ferruh Yigit
2024-07-26 20:01   ` Wathsala Wathawana Vithanage
2024-09-22 21:43     ` Ferruh Yigit
2024-10-04 17:52       ` Stephen Hemminger
2024-10-04 18:46         ` Wathsala Wathawana Vithanage
2024-10-21  1:52 ` [RFC v3 0/2] An API for Stashing Packets into CPU caches Wathsala Vithanage
2024-10-21  1:52   ` [RFC v3 1/2] pci: introduce the PCIe TLP Processing Hints API Wathsala Vithanage
2024-10-21  1:52   ` [RFC v3 2/2] ethdev: introduce the cache stashing hints API Wathsala Vithanage
2024-10-21  7:36     ` Morten Brørup
2024-10-24  5:49     ` Jerin Jacob
2024-10-24  6:59       ` Morten Brørup
2024-10-24 15:12         ` Wathsala Wathawana Vithanage
2024-10-24 15:04       ` Wathsala Wathawana Vithanage
2024-10-21  7:35   ` [RFC v3 0/2] An API for Stashing Packets into CPU caches Chenbo Xia
2024-10-21 12:01     ` Wathsala Wathawana Vithanage
2024-10-22  1:12   ` Stephen Hemminger
2024-10-22 18:37     ` Wathsala Wathawana Vithanage
2024-10-22 21:23       ` Stephen Hemminger
2024-10-23 17:59 ` [RFC v2] ethdev: an API for cache stashing hints Mattias Rönnblom
2024-10-23 20:18   ` Stephen Hemminger
2024-10-24 14:59   ` Wathsala Wathawana Vithanage
2024-10-25  7:43   ` Andrew Rybchenko

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).