DPDK patches and discussions
* [PATCH v10 0/4] remove use of RTE_MARKER fields in libraries
  @ 2024-04-03 17:53  3% ` Tyler Retzlaff
  2024-04-03 17:53  2%   ` [PATCH v10 2/4] mbuf: remove rte marker fields Tyler Retzlaff
  2024-04-04 17:51  3% ` [PATCH v11 0/4] remove use of RTE_MARKER fields in libraries Tyler Retzlaff
  2024-06-19 15:01  3% ` [PATCH v12 0/4] remove use of RTE_MARKER fields in libraries David Marchand
  2 siblings, 1 reply; 200+ results
From: Tyler Retzlaff @ 2024-04-03 17:53 UTC (permalink / raw)
  To: dev
  Cc: Ajit Khaparde, Andrew Boyer, Andrew Rybchenko, Bruce Richardson,
	Chenbo Xia, Chengwen Feng, Dariusz Sosnowski, David Christensen,
	Hyong Youb Kim, Jerin Jacob, Jie Hai, Jingjing Wu, John Daley,
	Kevin Laatz, Kiran Kumar K, Konstantin Ananyev, Maciej Czekaj,
	Matan Azrad, Maxime Coquelin, Nithin Dabilpuram, Ori Kam,
	Ruifeng Wang, Satha Rao, Somnath Kotur, Suanming Mou,
	Sunil Kumar Kori, Viacheslav Ovsiienko, Yisen Zhuang,
	Yuying Zhang, mb, Tyler Retzlaff

As per the techboard meeting of 2024/03/20, adopt the hybrid proposal of
adapting descriptor fields and removing cacheline fields.

RTE_MARKER typedefs are a GCC extension unsupported by MSVC. Remove
RTE_MARKER fields.

For cacheline{0,1} fields remove fields entirely and use inline
functions to prefetch.

Provide new rearm_data and rx_descriptor_fields1 fields in anonymous
unions as single element arrays with types matching the original
markers to maintain API compatibility.

Note: diff is easier viewed with -b due to additional nesting from
      unions / structs that have been introduced.
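
The union approach described above can be sketched as follows. This is a
minimal illustration, not the actual rte_mbuf definition: struct demo_mbuf,
demo_pack() and demo_rearm() are hypothetical names, and the plain uint16_t
refcnt stands in for the RTE_ATOMIC() field. The array member overlays the
fields it used to mark, so existing code that takes the address of
rearm_data keeps working without a zero-size GCC marker.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the rearm region of struct rte_mbuf. */
struct demo_mbuf {
	void *buf_addr;
	union {
		/* The array name decays to a pointer, as the old marker did. */
		uint64_t rearm_data[1];
		struct {
			uint16_t data_off;
			uint16_t refcnt;
			uint16_t nb_segs;
			uint16_t port;
		};
	};
	uint64_t ol_flags;
};

/* The array must start at, and exactly cover, the fields it overlays. */
static_assert(offsetof(struct demo_mbuf, rearm_data) ==
	offsetof(struct demo_mbuf, data_off),
	"rearm_data must alias data_off");
static_assert(sizeof(((struct demo_mbuf *)0)->rearm_data) ==
	4 * sizeof(uint16_t),
	"rearm_data must cover the four 16-bit fields");

/* Read the four rearm fields of a template mbuf as one 8-byte word. */
static inline uint64_t
demo_pack(const struct demo_mbuf *tmpl)
{
	uint64_t word;

	memcpy(&word, tmpl->rearm_data, sizeof(word));
	return word;
}

/* Initialise all four rearm fields of an mbuf with one 8-byte copy. */
static inline void
demo_rearm(struct demo_mbuf *m, uint64_t word)
{
	memcpy(m->rearm_data, &word, sizeof(word));
}
```

A driver-style caller would build the word once from a template and then
apply it to each mbuf it rearms.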

v10:
  * move removal notices in release notes from 24.03 to 24.07

v9:
  * provide narrowest possible libabigail.abignore to suppress
    removal of fields that were agreed are not actual abi changes.

v8:
  * rx_descriptor_fields1 array is now constexpr sized to
    24 / sizeof(void *) so that the array encompasses fields
    accessed via the array.
  * add a comment to rx_descriptor_fields1 array site noting
    that void * type of elements is retained for compatibility
    with existing drivers.
  * clean up comments of fields in rte_mbuf to be before the
    field they apply to instead of after.
  * duplicate alignas(RTE_CACHE_LINE_MIN_SIZE) into both legs of
    conditional compile for first field of cacheline 1 instead of
    once before conditional compile block.

v7:
  * complete re-write of series, previous versions not noted. all
    reviewed-by and acked-by tags (if any) were removed.

Tyler Retzlaff (4):
  net/i40e: use inline prefetch function
  mbuf: remove rte marker fields
  security: remove rte marker fields
  cryptodev: remove rte marker fields

 devtools/libabigail.abignore            |   6 +
 doc/guides/rel_notes/release_24_07.rst  |   9 ++
 drivers/net/i40e/i40e_rxtx_vec_avx512.c |   2 +-
 lib/cryptodev/cryptodev_pmd.h           |   5 +-
 lib/mbuf/rte_mbuf.h                     |   4 +-
 lib/mbuf/rte_mbuf_core.h                | 200 +++++++++++++++++---------------
 lib/security/rte_security_driver.h      |   5 +-
 7 files changed, 128 insertions(+), 103 deletions(-)

-- 
1.8.3.1


^ permalink raw reply	[relevance 3%]

* [PATCH v10 2/4] mbuf: remove rte marker fields
  2024-04-03 17:53  3% ` [PATCH v10 0/4] remove use of RTE_MARKER fields in libraries Tyler Retzlaff
@ 2024-04-03 17:53  2%   ` Tyler Retzlaff
  2024-04-03 19:32  0%     ` Morten Brørup
  2024-04-03 21:49  0%     ` Stephen Hemminger
  0 siblings, 2 replies; 200+ results
From: Tyler Retzlaff @ 2024-04-03 17:53 UTC (permalink / raw)
  To: dev
  Cc: Ajit Khaparde, Andrew Boyer, Andrew Rybchenko, Bruce Richardson,
	Chenbo Xia, Chengwen Feng, Dariusz Sosnowski, David Christensen,
	Hyong Youb Kim, Jerin Jacob, Jie Hai, Jingjing Wu, John Daley,
	Kevin Laatz, Kiran Kumar K, Konstantin Ananyev, Maciej Czekaj,
	Matan Azrad, Maxime Coquelin, Nithin Dabilpuram, Ori Kam,
	Ruifeng Wang, Satha Rao, Somnath Kotur, Suanming Mou,
	Sunil Kumar Kori, Viacheslav Ovsiienko, Yisen Zhuang,
	Yuying Zhang, mb, Tyler Retzlaff

RTE_MARKER typedefs are a GCC extension unsupported by MSVC. Remove
RTE_MARKER fields from rte_mbuf struct.

Maintain alignment of fields after removed cacheline1 marker by placing
C11 alignas(RTE_CACHE_LINE_MIN_SIZE).

Provide new rearm_data and rx_descriptor_fields1 fields in anonymous
unions as single element arrays with types matching the original
markers to maintain API compatibility.

This change breaks the API for the cacheline{0,1} fields that have been
removed from rte_mbuf, but it does not break the ABI. To address the
false positives for the removed (but 0 size) fields, provide the minimum
libabigail.abignore for type = rte_mbuf.

Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 devtools/libabigail.abignore           |   6 +
 doc/guides/rel_notes/release_24_07.rst |   3 +
 lib/mbuf/rte_mbuf.h                    |   4 +-
 lib/mbuf/rte_mbuf_core.h               | 200 +++++++++++++++++----------------
 4 files changed, 115 insertions(+), 98 deletions(-)

diff --git a/devtools/libabigail.abignore b/devtools/libabigail.abignore
index 645d289..ad13179 100644
--- a/devtools/libabigail.abignore
+++ b/devtools/libabigail.abignore
@@ -37,3 +37,9 @@
 [suppress_type]
 	name = rte_eth_fp_ops
 	has_data_member_inserted_between = {offset_of(reserved2), end}
+
+[suppress_type]
+	name = rte_mbuf
+	type_kind = struct
+	has_size_change = no
+	has_data_member = {cacheline0, rearm_data, rx_descriptor_fields1, cacheline1}
diff --git a/doc/guides/rel_notes/release_24_07.rst b/doc/guides/rel_notes/release_24_07.rst
index a69f24c..b240ee5 100644
--- a/doc/guides/rel_notes/release_24_07.rst
+++ b/doc/guides/rel_notes/release_24_07.rst
@@ -68,6 +68,9 @@ Removed Items
    Also, make sure to start the actual text at the margin.
    =======================================================
 
+* mbuf: ``RTE_MARKER`` fields ``cacheline0`` and ``cacheline1``
+  have been removed from ``struct rte_mbuf``.
+
 
 API Changes
 -----------
diff --git a/lib/mbuf/rte_mbuf.h b/lib/mbuf/rte_mbuf.h
index 286b32b..4c4722e 100644
--- a/lib/mbuf/rte_mbuf.h
+++ b/lib/mbuf/rte_mbuf.h
@@ -108,7 +108,7 @@
 static inline void
 rte_mbuf_prefetch_part1(struct rte_mbuf *m)
 {
-	rte_prefetch0(&m->cacheline0);
+	rte_prefetch0(m);
 }
 
 /**
@@ -126,7 +126,7 @@
 rte_mbuf_prefetch_part2(struct rte_mbuf *m)
 {
 #if RTE_CACHE_LINE_SIZE == 64
-	rte_prefetch0(&m->cacheline1);
+	rte_prefetch0(RTE_PTR_ADD(m, RTE_CACHE_LINE_MIN_SIZE));
 #else
 	RTE_SET_USED(m);
 #endif
diff --git a/lib/mbuf/rte_mbuf_core.h b/lib/mbuf/rte_mbuf_core.h
index 9f58076..9d838b8 100644
--- a/lib/mbuf/rte_mbuf_core.h
+++ b/lib/mbuf/rte_mbuf_core.h
@@ -465,8 +465,6 @@ enum {
  * The generic rte_mbuf, containing a packet mbuf.
  */
 struct __rte_cache_aligned rte_mbuf {
-	RTE_MARKER cacheline0;
-
 	void *buf_addr;           /**< Virtual address of segment buffer. */
 #if RTE_IOVA_IN_MBUF
 	/**
@@ -488,127 +486,138 @@ struct __rte_cache_aligned rte_mbuf {
 #endif
 
 	/* next 8 bytes are initialised on RX descriptor rearm */
-	RTE_MARKER64 rearm_data;
-	uint16_t data_off;
-
-	/**
-	 * Reference counter. Its size should at least equal to the size
-	 * of port field (16 bits), to support zero-copy broadcast.
-	 * It should only be accessed using the following functions:
-	 * rte_mbuf_refcnt_update(), rte_mbuf_refcnt_read(), and
-	 * rte_mbuf_refcnt_set(). The functionality of these functions (atomic,
-	 * or non-atomic) is controlled by the RTE_MBUF_REFCNT_ATOMIC flag.
-	 */
-	RTE_ATOMIC(uint16_t) refcnt;
+	union {
+		uint64_t rearm_data[1];
+		__extension__
+		struct {
+			uint16_t data_off;
+
+			/**
+			 * Reference counter. Its size should at least equal to the size
+			 * of port field (16 bits), to support zero-copy broadcast.
+			 * It should only be accessed using the following functions:
+			 * rte_mbuf_refcnt_update(), rte_mbuf_refcnt_read(), and
+			 * rte_mbuf_refcnt_set(). The functionality of these functions (atomic,
+			 * or non-atomic) is controlled by the RTE_MBUF_REFCNT_ATOMIC flag.
+			 */
+			RTE_ATOMIC(uint16_t) refcnt;
 
-	/**
-	 * Number of segments. Only valid for the first segment of an mbuf
-	 * chain.
-	 */
-	uint16_t nb_segs;
+			/**
+			 * Number of segments. Only valid for the first segment of an mbuf
+			 * chain.
+			 */
+			uint16_t nb_segs;
 
-	/** Input port (16 bits to support more than 256 virtual ports).
-	 * The event eth Tx adapter uses this field to specify the output port.
-	 */
-	uint16_t port;
+			/** Input port (16 bits to support more than 256 virtual ports).
+			 * The event eth Tx adapter uses this field to specify the output port.
+			 */
+			uint16_t port;
+		};
+	};
 
 	uint64_t ol_flags;        /**< Offload features. */
 
-	/* remaining bytes are set on RX when pulling packet from descriptor */
-	RTE_MARKER rx_descriptor_fields1;
-
-	/*
-	 * The packet type, which is the combination of outer/inner L2, L3, L4
-	 * and tunnel types. The packet_type is about data really present in the
-	 * mbuf. Example: if vlan stripping is enabled, a received vlan packet
-	 * would have RTE_PTYPE_L2_ETHER and not RTE_PTYPE_L2_VLAN because the
-	 * vlan is stripped from the data.
-	 */
+	/* remaining 24 bytes are set on RX when pulling packet from descriptor */
 	union {
-		uint32_t packet_type; /**< L2/L3/L4 and tunnel information. */
+		/* void * type of the array elements is retained for driver compatibility. */
+		void *rx_descriptor_fields1[24 / sizeof(void *)];
 		__extension__
 		struct {
-			uint8_t l2_type:4;   /**< (Outer) L2 type. */
-			uint8_t l3_type:4;   /**< (Outer) L3 type. */
-			uint8_t l4_type:4;   /**< (Outer) L4 type. */
-			uint8_t tun_type:4;  /**< Tunnel type. */
+			/*
+			 * The packet type, which is the combination of outer/inner L2, L3, L4
+			 * and tunnel types. The packet_type is about data really present in the
+			 * mbuf. Example: if vlan stripping is enabled, a received vlan packet
+			 * would have RTE_PTYPE_L2_ETHER and not RTE_PTYPE_L2_VLAN because the
+			 * vlan is stripped from the data.
+			 */
 			union {
-				uint8_t inner_esp_next_proto;
-				/**< ESP next protocol type, valid if
-				 * RTE_PTYPE_TUNNEL_ESP tunnel type is set
-				 * on both Tx and Rx.
-				 */
+				uint32_t packet_type; /**< L2/L3/L4 and tunnel information. */
 				__extension__
 				struct {
-					uint8_t inner_l2_type:4;
-					/**< Inner L2 type. */
-					uint8_t inner_l3_type:4;
-					/**< Inner L3 type. */
+					uint8_t l2_type:4;   /**< (Outer) L2 type. */
+					uint8_t l3_type:4;   /**< (Outer) L3 type. */
+					uint8_t l4_type:4;   /**< (Outer) L4 type. */
+					uint8_t tun_type:4;  /**< Tunnel type. */
+					union {
+						/**< ESP next protocol type, valid if
+						 * RTE_PTYPE_TUNNEL_ESP tunnel type is set
+						 * on both Tx and Rx.
+						 */
+						uint8_t inner_esp_next_proto;
+						__extension__
+						struct {
+							/**< Inner L2 type. */
+							uint8_t inner_l2_type:4;
+							/**< Inner L3 type. */
+							uint8_t inner_l3_type:4;
+						};
+					};
+					uint8_t inner_l4_type:4; /**< Inner L4 type. */
 				};
 			};
-			uint8_t inner_l4_type:4; /**< Inner L4 type. */
-		};
-	};
 
-	uint32_t pkt_len;         /**< Total pkt len: sum of all segments. */
-	uint16_t data_len;        /**< Amount of data in segment buffer. */
-	/** VLAN TCI (CPU order), valid if RTE_MBUF_F_RX_VLAN is set. */
-	uint16_t vlan_tci;
+			uint32_t pkt_len;         /**< Total pkt len: sum of all segments. */
 
-	union {
-		union {
-			uint32_t rss;     /**< RSS hash result if RSS enabled */
-			struct {
+			uint16_t data_len;        /**< Amount of data in segment buffer. */
+			/** VLAN TCI (CPU order), valid if RTE_MBUF_F_RX_VLAN is set. */
+			uint16_t vlan_tci;
+
+			union {
 				union {
+					uint32_t rss;     /**< RSS hash result if RSS enabled */
 					struct {
-						uint16_t hash;
-						uint16_t id;
-					};
-					uint32_t lo;
-					/**< Second 4 flexible bytes */
-				};
-				uint32_t hi;
-				/**< First 4 flexible bytes or FD ID, dependent
-				 * on RTE_MBUF_F_RX_FDIR_* flag in ol_flags.
-				 */
-			} fdir;	/**< Filter identifier if FDIR enabled */
-			struct rte_mbuf_sched sched;
-			/**< Hierarchical scheduler : 8 bytes */
-			struct {
-				uint32_t reserved1;
-				uint16_t reserved2;
-				uint16_t txq;
-				/**< The event eth Tx adapter uses this field
-				 * to store Tx queue id.
-				 * @see rte_event_eth_tx_adapter_txq_set()
-				 */
-			} txadapter; /**< Eventdev ethdev Tx adapter */
-			uint32_t usr;
-			/**< User defined tags. See rte_distributor_process() */
-		} hash;                   /**< hash information */
-	};
+						union {
+							struct {
+								uint16_t hash;
+								uint16_t id;
+							};
+							/**< Second 4 flexible bytes */
+							uint32_t lo;
+						};
+						/**< First 4 flexible bytes or FD ID, dependent
+						 * on RTE_MBUF_F_RX_FDIR_* flag in ol_flags.
+						 */
+						uint32_t hi;
+					} fdir;	/**< Filter identifier if FDIR enabled */
+					struct rte_mbuf_sched sched;
+					/**< Hierarchical scheduler : 8 bytes */
+					struct {
+						uint32_t reserved1;
+						uint16_t reserved2;
+						/**< The event eth Tx adapter uses this field
+						 * to store Tx queue id.
+						 * @see rte_event_eth_tx_adapter_txq_set()
+						 */
+						uint16_t txq;
+					} txadapter; /**< Eventdev ethdev Tx adapter */
+					/**< User defined tags. See rte_distributor_process() */
+					uint32_t usr;
+				} hash;                   /**< hash information */
+			};
 
-	/** Outer VLAN TCI (CPU order), valid if RTE_MBUF_F_RX_QINQ is set. */
-	uint16_t vlan_tci_outer;
+			/** Outer VLAN TCI (CPU order), valid if RTE_MBUF_F_RX_QINQ is set. */
+			uint16_t vlan_tci_outer;
 
-	uint16_t buf_len;         /**< Length of segment buffer. */
+			uint16_t buf_len;         /**< Length of segment buffer. */
+		};
+	};
 
 	struct rte_mempool *pool; /**< Pool from which mbuf was allocated. */
 
 	/* second cache line - fields only used in slow path or on TX */
-	alignas(RTE_CACHE_LINE_MIN_SIZE) RTE_MARKER cacheline1;
-
 #if RTE_IOVA_IN_MBUF
 	/**
 	 * Next segment of scattered packet. Must be NULL in the last
 	 * segment or in case of non-segmented packet.
 	 */
+	alignas(RTE_CACHE_LINE_MIN_SIZE)
 	struct rte_mbuf *next;
 #else
 	/**
 	 * Reserved for dynamic fields
 	 * when the next pointer is in first cache line (i.e. RTE_IOVA_IN_MBUF is 0).
 	 */
+	alignas(RTE_CACHE_LINE_MIN_SIZE)
 	uint64_t dynfield2;
 #endif
 
@@ -617,17 +626,16 @@ struct __rte_cache_aligned rte_mbuf {
 		uint64_t tx_offload;       /**< combined for easy fetch */
 		__extension__
 		struct {
-			uint64_t l2_len:RTE_MBUF_L2_LEN_BITS;
 			/**< L2 (MAC) Header Length for non-tunneling pkt.
 			 * Outer_L4_len + ... + Inner_L2_len for tunneling pkt.
 			 */
-			uint64_t l3_len:RTE_MBUF_L3_LEN_BITS;
+			uint64_t l2_len:RTE_MBUF_L2_LEN_BITS;
 			/**< L3 (IP) Header Length. */
-			uint64_t l4_len:RTE_MBUF_L4_LEN_BITS;
+			uint64_t l3_len:RTE_MBUF_L3_LEN_BITS;
 			/**< L4 (TCP/UDP) Header Length. */
-			uint64_t tso_segsz:RTE_MBUF_TSO_SEGSZ_BITS;
+			uint64_t l4_len:RTE_MBUF_L4_LEN_BITS;
 			/**< TCP TSO segment size */
-
+			uint64_t tso_segsz:RTE_MBUF_TSO_SEGSZ_BITS;
 			/*
 			 * Fields for Tx offloading of tunnels.
 			 * These are undefined for packets which don't request
@@ -640,10 +648,10 @@ struct __rte_cache_aligned rte_mbuf {
 			 * Applications are expected to set appropriate tunnel
 			 * offload flags when they fill in these fields.
 			 */
-			uint64_t outer_l3_len:RTE_MBUF_OUTL3_LEN_BITS;
 			/**< Outer L3 (IP) Hdr Length. */
-			uint64_t outer_l2_len:RTE_MBUF_OUTL2_LEN_BITS;
+			uint64_t outer_l3_len:RTE_MBUF_OUTL3_LEN_BITS;
 			/**< Outer L2 (MAC) Hdr Length. */
+			uint64_t outer_l2_len:RTE_MBUF_OUTL2_LEN_BITS;
 
 			/* uint64_t unused:RTE_MBUF_TXOFLD_UNUSED_BITS; */
 		};
-- 
1.8.3.1


^ permalink raw reply	[relevance 2%]

* RE: [PATCH v10 2/4] mbuf: remove rte marker fields
  2024-04-03 17:53  2%   ` [PATCH v10 2/4] mbuf: remove rte marker fields Tyler Retzlaff
@ 2024-04-03 19:32  0%     ` Morten Brørup
  2024-04-03 22:45  0%       ` Tyler Retzlaff
  2024-04-03 21:49  0%     ` Stephen Hemminger
  1 sibling, 1 reply; 200+ results
From: Morten Brørup @ 2024-04-03 19:32 UTC (permalink / raw)
  To: Tyler Retzlaff, dev
  Cc: Ajit Khaparde, Andrew Boyer, Andrew Rybchenko, Bruce Richardson,
	Chenbo Xia, Chengwen Feng, Dariusz Sosnowski, David Christensen,
	Hyong Youb Kim, Jerin Jacob, Jie Hai, Jingjing Wu, John Daley,
	Kevin Laatz, Kiran Kumar K, Konstantin Ananyev, Maciej Czekaj,
	Matan Azrad, Maxime Coquelin, Nithin Dabilpuram, Ori Kam,
	Ruifeng Wang, Satha Rao, Somnath Kotur, Suanming Mou,
	Sunil Kumar Kori, Viacheslav Ovsiienko, Yisen Zhuang,
	Yuying Zhang

> From: Tyler Retzlaff [mailto:roretzla@linux.microsoft.com]
> Sent: Wednesday, 3 April 2024 19.54
> 
> RTE_MARKER typedefs are a GCC extension unsupported by MSVC. Remove
> RTE_MARKER fields from rte_mbuf struct.
> 
> Maintain alignment of fields after removed cacheline1 marker by placing
> C11 alignas(RTE_CACHE_LINE_MIN_SIZE).
> 
> Provide new rearm_data and rx_descriptor_fields1 fields in anonymous
> unions as single element arrays of with types matching the original
> markers to maintain API compatibility.
> 
> This change breaks the API for cacheline{0,1} fields that have been
> removed from rte_mbuf but it does not break the ABI, to address the
> false positives of the removed (but 0 size fields) provide the minimum
> libabigail.abignore for type = rte_mbuf.
> 
> Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
> ---

[...]

> +	/* remaining 24 bytes are set on RX when pulling packet from
> descriptor */

Good.

>  	union {
> +		/* void * type of the array elements is retained for driver
> compatibility. */
> +		void *rx_descriptor_fields1[24 / sizeof(void *)];

Good, also the description.

>  		__extension__
>  		struct {
> -			uint8_t l2_type:4;   /**< (Outer) L2 type. */
> -			uint8_t l3_type:4;   /**< (Outer) L3 type. */
> -			uint8_t l4_type:4;   /**< (Outer) L4 type. */
> -			uint8_t tun_type:4;  /**< Tunnel type. */
> +			/*
> +			 * The packet type, which is the combination of
> outer/inner L2, L3, L4
> +			 * and tunnel types. The packet_type is about data
> really present in the
> +			 * mbuf. Example: if vlan stripping is enabled, a
> received vlan packet
> +			 * would have RTE_PTYPE_L2_ETHER and not
> RTE_PTYPE_L2_VLAN because the
> +			 * vlan is stripped from the data.
> +			 */
>  			union {
> -				uint8_t inner_esp_next_proto;
> -				/**< ESP next protocol type, valid if
> -				 * RTE_PTYPE_TUNNEL_ESP tunnel type is set
> -				 * on both Tx and Rx.
> -				 */

[...]

> +						/**< ESP next protocol type, valid
> if
> +						 * RTE_PTYPE_TUNNEL_ESP tunnel type
> is set
> +						 * on both Tx and Rx.
> +						 */
> +						uint8_t inner_esp_next_proto;

Thank you for moving the comments up before the fields.

Please note that "/**<" means that the description is related to the field preceding the comment, so it should be replaced by "/**" when moving the description up above a field.

Maybe moving the descriptions as part of this patch was not a good idea after all; it doesn't improve the readability of the patch itself. I regret suggesting it.
If you leave the descriptions at their original positions (relative to the fields), we can clean up the formatting of the descriptions in a later patch.
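
For reference, the two Doxygen comment placements discussed above look like this. It is only a sketch: struct demo_fields and its members are hypothetical and are not taken from rte_mbuf.

```c
#include <assert.h>
#include <stdint.h>

/** Hypothetical struct used only to show Doxygen comment placement. */
struct demo_fields {
	/** Placed before the member it documents, so no '<' is used. */
	uint16_t before_style;

	uint16_t after_style; /**< Placed after the member it documents. */
};

/* Comment placement is documentation only and does not affect layout. */
static_assert(sizeof(struct demo_fields) == 2 * sizeof(uint16_t),
	"two 16-bit members, no padding expected");
```

Using "/**<" above a member would attach the text to whatever member precedes the comment, which is why the marker has to change when a comment is moved.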

[...]

>  	/* second cache line - fields only used in slow path or on TX */
> -	alignas(RTE_CACHE_LINE_MIN_SIZE) RTE_MARKER cacheline1;
> -
>  #if RTE_IOVA_IN_MBUF
>  	/**
>  	 * Next segment of scattered packet. Must be NULL in the last
>  	 * segment or in case of non-segmented packet.
>  	 */
> +	alignas(RTE_CACHE_LINE_MIN_SIZE)
>  	struct rte_mbuf *next;
>  #else
>  	/**
>  	 * Reserved for dynamic fields
>  	 * when the next pointer is in first cache line (i.e.
> RTE_IOVA_IN_MBUF is 0).
>  	 */
> +	alignas(RTE_CACHE_LINE_MIN_SIZE)

Good positioning of the alignas().
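
A minimal sketch of why the alignas() has to appear in both legs of the conditional: whichever member comes first after the removed marker must carry the alignment itself. The struct, the DEMO_NEXT_IN_LINE1 switch and the value 64 for DEMO_CACHE_LINE_MIN_SIZE are assumptions for illustration, not DPDK definitions.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed value for illustration; DPDK defines RTE_CACHE_LINE_MIN_SIZE. */
#define DEMO_CACHE_LINE_MIN_SIZE 64

/* Hypothetical struct; define DEMO_NEXT_IN_LINE1 to select the other leg. */
struct demo_two_lines {
	void *buf_addr;
	uint64_t ol_flags;
#ifdef DEMO_NEXT_IN_LINE1
	alignas(DEMO_CACHE_LINE_MIN_SIZE)
	struct demo_two_lines *next;
#else
	alignas(DEMO_CACHE_LINE_MIN_SIZE)
	uint64_t dynfield2;
#endif
	uint64_t tx_offload;
};

/* Offset of the first member of the second cache line, for either leg. */
static size_t
demo_second_line_offset(void)
{
#ifdef DEMO_NEXT_IN_LINE1
	return offsetof(struct demo_two_lines, next);
#else
	return offsetof(struct demo_two_lines, dynfield2);
#endif
}
```

With either leg selected, the first member of the second line starts at the cache line boundary and the struct as a whole inherits the 64-byte alignment.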

I like everything in this patch.

Please fix the descriptions preceding the fields "/**<" -> "/**" or move them back to their location after the fields; then you may add Reviewed-by: Morten Brørup <mb@smartsharesystems.com> to the next version.


^ permalink raw reply	[relevance 0%]

* Re: [PATCH v10 2/4] mbuf: remove rte marker fields
  2024-04-03 17:53  2%   ` [PATCH v10 2/4] mbuf: remove rte marker fields Tyler Retzlaff
  2024-04-03 19:32  0%     ` Morten Brørup
@ 2024-04-03 21:49  0%     ` Stephen Hemminger
  1 sibling, 0 replies; 200+ results
From: Stephen Hemminger @ 2024-04-03 21:49 UTC (permalink / raw)
  To: Tyler Retzlaff
  Cc: dev, Ajit Khaparde, Andrew Boyer, Andrew Rybchenko,
	Bruce Richardson, Chenbo Xia, Chengwen Feng, Dariusz Sosnowski,
	David Christensen, Hyong Youb Kim, Jerin Jacob, Jie Hai,
	Jingjing Wu, John Daley, Kevin Laatz, Kiran Kumar K,
	Konstantin Ananyev, Maciej Czekaj, Matan Azrad, Maxime Coquelin,
	Nithin Dabilpuram, Ori Kam, Ruifeng Wang, Satha Rao,
	Somnath Kotur, Suanming Mou, Sunil Kumar Kori,
	Viacheslav Ovsiienko, Yisen Zhuang, Yuying Zhang, mb

On Wed,  3 Apr 2024 10:53:34 -0700
Tyler Retzlaff <roretzla@linux.microsoft.com> wrote:

> RTE_MARKER typedefs are a GCC extension unsupported by MSVC. Remove
> RTE_MARKER fields from rte_mbuf struct.
> 
> Maintain alignment of fields after removed cacheline1 marker by placing
> C11 alignas(RTE_CACHE_LINE_MIN_SIZE).
> 
> Provide new rearm_data and rx_descriptor_fields1 fields in anonymous
> unions as single element arrays of with types matching the original
> markers to maintain API compatibility.
> 
> This change breaks the API for cacheline{0,1} fields that have been
> removed from rte_mbuf but it does not break the ABI, to address the
> false positives of the removed (but 0 size fields) provide the minimum
> libabigail.abignore for type = rte_mbuf.
> 
> Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>

Acked-by: Stephen Hemminger <stephen@networkplumber.org>

^ permalink raw reply	[relevance 0%]

* Re: [PATCH v10 2/4] mbuf: remove rte marker fields
  2024-04-03 19:32  0%     ` Morten Brørup
@ 2024-04-03 22:45  0%       ` Tyler Retzlaff
  0 siblings, 0 replies; 200+ results
From: Tyler Retzlaff @ 2024-04-03 22:45 UTC (permalink / raw)
  To: Morten Brørup
  Cc: dev, Ajit Khaparde, Andrew Boyer, Andrew Rybchenko,
	Bruce Richardson, Chenbo Xia, Chengwen Feng, Dariusz Sosnowski,
	David Christensen, Hyong Youb Kim, Jerin Jacob, Jie Hai,
	Jingjing Wu, John Daley, Kevin Laatz, Kiran Kumar K,
	Konstantin Ananyev, Maciej Czekaj, Matan Azrad, Maxime Coquelin,
	Nithin Dabilpuram, Ori Kam, Ruifeng Wang, Satha Rao,
	Somnath Kotur, Suanming Mou, Sunil Kumar Kori,
	Viacheslav Ovsiienko, Yisen Zhuang, Yuying Zhang

On Wed, Apr 03, 2024 at 09:32:21PM +0200, Morten Brørup wrote:
> > From: Tyler Retzlaff [mailto:roretzla@linux.microsoft.com]
> > Sent: Wednesday, 3 April 2024 19.54
> > 
> > RTE_MARKER typedefs are a GCC extension unsupported by MSVC. Remove
> > RTE_MARKER fields from rte_mbuf struct.
> > 
> > Maintain alignment of fields after removed cacheline1 marker by placing
> > C11 alignas(RTE_CACHE_LINE_MIN_SIZE).
> > 
> > Provide new rearm_data and rx_descriptor_fields1 fields in anonymous
> > unions as single element arrays of with types matching the original
> > markers to maintain API compatibility.
> > 
> > This change breaks the API for cacheline{0,1} fields that have been
> > removed from rte_mbuf but it does not break the ABI, to address the
> > false positives of the removed (but 0 size fields) provide the minimum
> > libabigail.abignore for type = rte_mbuf.
> > 
> > Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
> > ---
> 
> [...]
> 
> > +	/* remaining 24 bytes are set on RX when pulling packet from
> > descriptor */
> 
> Good.
> 
> >  	union {
> > +		/* void * type of the array elements is retained for driver
> > compatibility. */
> > +		void *rx_descriptor_fields1[24 / sizeof(void *)];
> 
> Good, also the description.
> 
> >  		__extension__
> >  		struct {
> > -			uint8_t l2_type:4;   /**< (Outer) L2 type. */
> > -			uint8_t l3_type:4;   /**< (Outer) L3 type. */
> > -			uint8_t l4_type:4;   /**< (Outer) L4 type. */
> > -			uint8_t tun_type:4;  /**< Tunnel type. */
> > +			/*
> > +			 * The packet type, which is the combination of
> > outer/inner L2, L3, L4
> > +			 * and tunnel types. The packet_type is about data
> > really present in the
> > +			 * mbuf. Example: if vlan stripping is enabled, a
> > received vlan packet
> > +			 * would have RTE_PTYPE_L2_ETHER and not
> > RTE_PTYPE_L2_VLAN because the
> > +			 * vlan is stripped from the data.
> > +			 */
> >  			union {
> > -				uint8_t inner_esp_next_proto;
> > -				/**< ESP next protocol type, valid if
> > -				 * RTE_PTYPE_TUNNEL_ESP tunnel type is set
> > -				 * on both Tx and Rx.
> > -				 */
> 
> [...]
> 
> > +						/**< ESP next protocol type, valid
> > if
> > +						 * RTE_PTYPE_TUNNEL_ESP tunnel type
> > is set
> > +						 * on both Tx and Rx.
> > +						 */
> > +						uint8_t inner_esp_next_proto;
> 
> Thank you for moving the comments up before the fields.
> 
> Please note that "/**<" means that the description is related to the field preceding the comment, so it should be replaced by "/**" when moving the description up above a field.

ooh, i'll fix it i'm not well versed in doxygen documentation.

> 
> Maybe moving the descriptions as part of this patch was not a good idea after all; it doesn't improve the readability of the patch itself. I regret suggesting it.
> If you leave the descriptions at their originals positions (relative to the fields), we can clean up the formatting of the descriptions in a later patch.

it's easy enough for me to fix the comments in place and bring in a new
version of the series, assuming other reviewers don't object i'll do that.
the diff is already kind of annoying to review in mail without -b
anyway.

> 
> [...]
> 
> >  	/* second cache line - fields only used in slow path or on TX */
> > -	alignas(RTE_CACHE_LINE_MIN_SIZE) RTE_MARKER cacheline1;
> > -
> >  #if RTE_IOVA_IN_MBUF
> >  	/**
> >  	 * Next segment of scattered packet. Must be NULL in the last
> >  	 * segment or in case of non-segmented packet.
> >  	 */
> > +	alignas(RTE_CACHE_LINE_MIN_SIZE)
> >  	struct rte_mbuf *next;
> >  #else
> >  	/**
> >  	 * Reserved for dynamic fields
> >  	 * when the next pointer is in first cache line (i.e.
> > RTE_IOVA_IN_MBUF is 0).
> >  	 */
> > +	alignas(RTE_CACHE_LINE_MIN_SIZE)
> 
> Good positioning of the alignas().
> 
> I like everything in this patch.
> 
> Please fix the descriptions preceding the fields "/**<" -> "/**" or move them back to their location after the fields; then you may add Reviewed-by: Morten Brørup <mb@smartsharesystems.com> to the next version.

ack, next rev.

^ permalink raw reply	[relevance 0%]

* Re: [PATCH] lib: add get/set link settings interface
  @ 2024-04-04  7:09  4%     ` David Marchand
  2024-04-05  0:55  0%       ` Tyler Retzlaff
  0 siblings, 1 reply; 200+ results
From: David Marchand @ 2024-04-04  7:09 UTC (permalink / raw)
  To: Tyler Retzlaff, Marek Pazdan
  Cc: Thomas Monjalon, Ferruh Yigit, Andrew Rybchenko, dev

Hello Tyler, Marek,

On Wed, Apr 3, 2024 at 6:49 PM Tyler Retzlaff
<roretzla@linux.microsoft.com> wrote:
>
> On Wed, Apr 03, 2024 at 06:40:24AM -0700, Marek Pazdan wrote:
> >  There are link settings parameters available from PMD drivers level
> >  which are currently not exposed to the user via consistent interface.
> >  When interface is available for system level those information can
> >  be acquired with 'ethtool DEVNAME' (ioctl: ETHTOOL_SLINKSETTINGS/
> >  ETHTOOL_GLINKSETTINGS). There are use cases where
> >  physical interface is passthrough to dpdk driver and is not available
> >  from system level. Information provided by ioctl carries information
> >  useful for link auto negotiation settings among others.
> >
> > Signed-off-by: Marek Pazdan <mpazdan@arista.com>
> > ---
> > diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> > index 147257d6a2..66aad925d0 100644
> > --- a/lib/ethdev/rte_ethdev.h
> > +++ b/lib/ethdev/rte_ethdev.h
> > @@ -335,7 +335,7 @@ struct rte_eth_stats {
> >  __extension__
> >  struct __rte_aligned(8) rte_eth_link { /**< aligned for atomic64 read/write */
> >       uint32_t link_speed;        /**< RTE_ETH_SPEED_NUM_ */
> > -     uint16_t link_duplex  : 1;  /**< RTE_ETH_LINK_[HALF/FULL]_DUPLEX */
> > +     uint16_t link_duplex  : 2;  /**< RTE_ETH_LINK_[HALF/FULL/UNKNOWN]_DUPLEX */
> >       uint16_t link_autoneg : 1;  /**< RTE_ETH_LINK_[AUTONEG/FIXED] */
> >       uint16_t link_status  : 1;  /**< RTE_ETH_LINK_[DOWN/UP] */
> >  };
>
> this breaks the abi. David does libabigail pick this up i wonder?
>

Yes, the CI flagged it.

Looking at the UNH report (in patchwork):
http://mails.dpdk.org/archives/test-report/2024-April/631222.html

1 function with some indirect sub-type change:

[C] 'function int rte_eth_link_get(uint16_t, rte_eth_link*)' at
rte_ethdev.c:2972:1 has some indirect sub-type changes:
parameter 2 of type 'rte_eth_link*' has sub-type changes:
in pointed to type 'struct rte_eth_link' at rte_ethdev.h:336:1:
type size hasn't changed
2 data member changes:
'uint16_t link_autoneg' offset changed from 33 to 34 (in bits) (by +1 bits)
'uint16_t link_status' offset changed from 34 to 35 (in bits) (by +1 bits)

Error: ABI issue reported for abidiff --suppr
/home-local/jenkins-local/jenkins-agent/workspace/Generic-DPDK-Compile-ABI
at 3/dpdk/devtools/libabigail.abignore --no-added-syms --headers-dir1
reference/usr/local/include --headers-dir2
build_install/usr/local/include
reference/usr/local/lib/x86_64-linux-gnu/librte_ethdev.so.24.0
build_install/usr/local/lib/x86_64-linux-gnu/librte_ethdev.so.24.2
ABIDIFF_ABI_CHANGE, this change requires a review (abidiff flagged
this as a potential issue).
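
The effect abidiff reports can be reproduced with a small sketch. The two structs below are hypothetical copies of the old and new layouts, not the real rte_eth_link, and bit-field layout is implementation-defined, so the behaviour described assumes a compiler such as GCC or Clang that allocates bit-fields sequentially within the storage unit.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical copy of the layout before the patch. */
struct demo_link_old {
	uint32_t link_speed;
	uint16_t link_duplex  : 1;
	uint16_t link_autoneg : 1;
	uint16_t link_status  : 1;
};

/* Hypothetical copy of the layout after the patch. */
struct demo_link_new {
	uint32_t link_speed;
	uint16_t link_duplex  : 2; /* widened by one bit */
	uint16_t link_autoneg : 1;
	uint16_t link_status  : 1;
};

static_assert(sizeof(struct demo_link_old) == sizeof(struct demo_link_new),
	"type size is unchanged, matching the abidiff report");

/*
 * Write link_status through the new layout, then read it back through
 * the old layout, as an application built against the old header would.
 */
static unsigned int
demo_old_reader_status(unsigned int status)
{
	struct demo_link_new n;
	struct demo_link_old o;

	memset(&n, 0, sizeof(n));
	n.link_status = status & 1;
	memcpy(&o, &n, sizeof(o));
	return o.link_status;
}
```

Under the stated assumption, an old reader no longer sees a link that the new library marks as up, because link_status has moved by one bit even though the struct size is the same.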


GHA would have caught it too, but the documentation generation failed
before reaching the ABI check.
http://mails.dpdk.org/archives/test-report/2024-April/631086.html


-- 
David Marchand


^ permalink raw reply	[relevance 4%]

* Community CI Meeting Minutes - April 4, 2024
@ 2024-04-04 16:29  3% Patrick Robb
  0 siblings, 0 replies; 200+ results
From: Patrick Robb @ 2024-04-04 16:29 UTC (permalink / raw)
  To: ci; +Cc: dev, dts

April 4, 2024

#####################################################################
Attendees
1. Patrick Robb
2. Juraj Linkeš
3. Paul Szczepanek
4. Luca Vizzarro

#####################################################################
Minutes

=====================================================================
General Announcements
* DPDK 24.03 has been released
* UNH Community Lab is experiencing power outages, and we are shutting
down testing for the day after this meeting.
   * Will put in retests once we’re back up and running
* Daylight saving time has hit North America, and will also happen in
Europe between this meeting and the next one. Should we adjust?
   * We will move the meeting 1 hour earlier
* Server Refresh:
   * GB will vote on this soon (I think over email)
   * Patrick sent Nathan some new information about the ARM Grace
server that ARM is requesting, which Nathan is passing along to GB
* UNH lab is working on updates to get_reruns.py for retests v2, and
will upstream this when ready.
   * UNH will also start pre-populating all environments with PENDING,
and then overwriting those as new results come in.
   * Reminder - Final conclusion on policy is:
      * A) If retest is requested without rebase key, then retest
"original" dpdk artifact (either by re-using the existing tarball (unh
lab) or tracking the commit from submit time and re-applying onto dpdk
at that commit (loongson)).
      * B) If rebase key is included, apply to tip of the indicated
branch. If, because the branch has changed, the patch no longer
applies, then we can report an apply failure. Then, submitter has to
refactor their patch and resubmit.
      * In either case, report the new results with an updated test
result in the email (i.e. report "_Testing PASS RETEST #1" instead of
"_Testing PASS" in the email body).
* Depends-on support: Patrick pinged Thomas about this this morning.
   * https://github.com/getpatchwork/patchwork/issues/583 and
https://github.com/getpatchwork/git-pw/issues/71
* MSVC: Tech board discussed extending the set of dpdk libraries which
compile with MSVC in CI testing, and requiring all new libraries which
will be used on Windows to compile using MSVC
   * Some members mentioned difficulty due to burden of running
Windows VM to test their patches against before CI
      * One solution is GitHub actions
      * Honnappa requested lab host a windows VM as a community
resource. Users could SSH onto the lab VPN, and use that machine.
         * Patrick Robb will follow up on the mailing list to see
whether the ci group approves of this idea.
* DPDK Summit will most likely be in Montreal
   * Once we have a date, Patrick will suggest to GB and TB that
anyone who is interested can visit the lab the day after
   * CFP:
      * Should probably give a DTS update, which can be from Patrick,
other UNH people, Honnappa, maybe Juraj (remotely)
      * UNH folks can probably do a CI testing update
         * Discuss new hardware
         * Discuss new testing
         * Discuss new reporting functionality, retests, depends-on,
other qol stuff

=====================================================================
CI Status

---------------------------------------------------------------------
UNH-IOL Community Lab
* Dodji Seketeli is requesting information about the Community Lab’s
ABI jobs to investigate an error on his patch
   * Libabigail version is 2.2.0
   * Patrick will send him the .so abi ref dirs this morning.
* Marvell CN10K:
   * TG is working, Octeon DUT can run DPDK apps and forward packets.
   * Can’t figure out how to reconfigure the link speed on the QSFP
port (want 2x100GbE not 4x 50GbE) - will ask Marvell people to SSH on
to set this
   * Also need to verify the correct meson options for native builds on the DUT
      * right now just using “meson setup -Dplatform=cn10k build” from dpdk docs
      * Juraj states that for ARM cpus (which is on this board) you
should be able to natively compile with default options
* SPDK: Working on these compile jobs
   * Currently compile with:
      * Ubuntu 22.04
      * Debian 11
      * Debian 12
      * CentOS 8
      * CentOS 9
      * Fedora 37
      * Fedora 38
      * Fedora 39
      * Opensuse-Leap 15 but with a warning
   * Cannot compile with:
      * Rhel 8
      * Rhel 9
      * SPDK docs state rhel is “best effort”
   * Questions:
      * Should we run with werror enabled?
      * What versions of SPDK do we test?
      * What versions of DPDK do we test SPDK against?
   * Unit tests pass with the distros which are compiling
* OvS DPDK testing:
   * Lab sent an email to test-report which got blocked because it
was just above 500kb, which is the limit
* Ts-factory redirect added to dpdk community lab dashboard navbar

---------------------------------------------------------------------
Intel Lab
* None

---------------------------------------------------------------------
Github Actions
* None

---------------------------------------------------------------------
Loongarch Lab
* None

---------------------------------------------------------------------
DTS Improvements & Test Development
* Nick’s hugepages patch will be submitted today (or already is).
   * Forces 2mb hugepages
* Nick is starting on porting the jumboframes testsuite now
   * Starting by manually running scapy, testpmd, tcpdump to verify
the function works, then writing the suite in DTS
* Jeremy is working on the context manager for testpmd to ensure it
closes completely before we attempt to start it again for a subsequent
testcase
* Juraj has provided an initial review of Luca’s testpmd params patch,
the implementation may need to be refactored, but the idea of
simplifying the developer user experience is a good goal
* Jeremy Spewock will write to Juraj about the capabilities patch. UNH
can test this if needed.
* Other than the testcase capabilities check patch, Juraj will be
renaming the dts execution and doing work for supporting pre-built
DPDK for the SUT
* Luca ran into what may have been a paramiko race condition from when
the interactive shell closes. We are unsure what exactly is happening
but we will probably need to hotfix this. Would likely require some
checks when closing the session.
* Luca tried to run from two intel nics, and could bind to vfio-pci,
but then timed out when trying to rebind to i40e. Left with 1
interface bound to vfio, one interface bound to i40e.
   * Can try rebinding the ports with 1 command, instead of 1 by 1
   * Maybe tried to run dpdk-devbind before all DPDK resources had
been released (just speculation)

=====================================================================
Any other business
* Next Meeting: April 20, 2024


* [PATCH v11 0/4] remove use of RTE_MARKER fields in libraries
    2024-04-03 17:53  3% ` [PATCH v10 0/4] remove use of RTE_MARKER fields in libraries Tyler Retzlaff
@ 2024-04-04 17:51  3% ` Tyler Retzlaff
  2024-04-04 17:51  2%   ` [PATCH v11 2/4] mbuf: remove rte marker fields Tyler Retzlaff
  2024-06-19 15:01  3% ` [PATCH v12 0/4] remove use of RTE_MARKER fields in libraries David Marchand
  2 siblings, 1 reply; 200+ results
From: Tyler Retzlaff @ 2024-04-04 17:51 UTC (permalink / raw)
  To: dev
  Cc: Ajit Khaparde, Andrew Boyer, Andrew Rybchenko, Bruce Richardson,
	Chenbo Xia, Chengwen Feng, Dariusz Sosnowski, David Christensen,
	Hyong Youb Kim, Jerin Jacob, Jie Hai, Jingjing Wu, John Daley,
	Kevin Laatz, Kiran Kumar K, Konstantin Ananyev, Maciej Czekaj,
	Matan Azrad, Maxime Coquelin, Nithin Dabilpuram, Ori Kam,
	Ruifeng Wang, Satha Rao, Somnath Kotur, Suanming Mou,
	Sunil Kumar Kori, Viacheslav Ovsiienko, Yisen Zhuang,
	Yuying Zhang, mb, Tyler Retzlaff

As per techboard meeting 2024/03/20 adopt hybrid proposal of adapting
descriptor fields and removing cacheline fields.

RTE_MARKER typedefs are a GCC extension unsupported by MSVC. Remove
RTE_MARKER fields.

For cacheline{0,1} fields remove fields entirely and use inline
functions to prefetch.

Provide new rearm_data and rx_descriptor_fields1 fields in anonymous
unions as single element arrays with types matching the original
markers to maintain API compatibility.
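The union approach described above can be sketched in isolation. This is
only an illustration: struct demo_mbuf and demo_rearm() are hypothetical
names, the struct is a cut-down stand-in rather than the real rte_mbuf
layout, and refcnt is a plain uint16_t here instead of RTE_ATOMIC(uint16_t).
It shows a one-element array in an anonymous union aliasing the fields
that the zero-size marker used to sit in front of, so code that used the
marker as a pointer to those 8 bytes should keep compiling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, cut-down stand-in for struct rte_mbuf (not the real layout). */
struct demo_mbuf {
	void *buf_addr;

	/* next 8 bytes are initialised on RX descriptor rearm */
	union {
		uint64_t rearm_data[1];  /* takes the place of the zero-size marker */
		struct {
			uint16_t data_off;
			uint16_t refcnt;
			uint16_t nb_segs;
			uint16_t port;
		};
	};

	uint64_t ol_flags;
};

/* The array must start at, and exactly cover, the fields it aliases. */
static_assert(offsetof(struct demo_mbuf, rearm_data) ==
	      offsetof(struct demo_mbuf, data_off), "rearm_data offset");
static_assert(sizeof(((struct demo_mbuf *)0)->rearm_data) ==
	      4 * sizeof(uint16_t), "rearm_data size");

/* Driver-style rearm: copy all four 16-bit fields with one 64-bit store. */
static inline void
demo_rearm(struct demo_mbuf *m, const struct demo_mbuf *tmpl)
{
	m->rearm_data[0] = tmpl->rearm_data[0];
}
```

Calling demo_rearm(&m, &tmpl) from a test main() copies data_off, refcnt,
nb_segs and port in one assignment. Because the array has a real size,
the static_asserts can check that it covers exactly the aliased fields,
which a zero-size marker could not express.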

Note: diff is easier viewed with -b due to additional nesting from
      unions / structs that have been introduced.

v11:
  * correct doxygen comment style for field documentation.

v10:
  * move removal notices in release notes from 24.03 to 24.07.

v9:
  * provide narrowest possible libabigail.abignore to suppress
    removal of fields that were agreed are not actual abi changes.

v8:
  * rx_descriptor_fields1 array is now constexpr sized to
    24 / sizeof(void *) so that the array encompasses fields
    accessed via the array.
  * add a comment to rx_descriptor_fields1 array site noting
    that void * type of elements is retained for compatibility
    with existing drivers.
  * clean up comments of fields in rte_mbuf to be before the
    field they apply to instead of after.
  * duplicate alignas(RTE_CACHE_LINE_MIN_SIZE) into both legs of
    conditional compile for first field of cacheline 1 instead of
    once before conditional compile block.

v7:
  * complete re-write of series, previous versions not noted. all
    reviewed-by and acked-by tags (if any) were removed.

Tyler Retzlaff (4):
  net/i40e: use inline prefetch function
  mbuf: remove rte marker fields
  security: remove rte marker fields
  cryptodev: remove rte marker fields

 devtools/libabigail.abignore            |   6 +
 doc/guides/rel_notes/release_24_07.rst  |   9 ++
 drivers/net/i40e/i40e_rxtx_vec_avx512.c |   2 +-
 lib/cryptodev/cryptodev_pmd.h           |   5 +-
 lib/mbuf/rte_mbuf.h                     |   4 +-
 lib/mbuf/rte_mbuf_core.h                | 202 +++++++++++++++++---------------
 lib/security/rte_security_driver.h      |   5 +-
 7 files changed, 129 insertions(+), 104 deletions(-)

-- 
1.8.3.1



* [PATCH v11 2/4] mbuf: remove rte marker fields
  2024-04-04 17:51  3% ` [PATCH v11 0/4] remove use of RTE_MARKER fields in libraries Tyler Retzlaff
@ 2024-04-04 17:51  2%   ` Tyler Retzlaff
  0 siblings, 0 replies; 200+ results
From: Tyler Retzlaff @ 2024-04-04 17:51 UTC (permalink / raw)
  To: dev
  Cc: Ajit Khaparde, Andrew Boyer, Andrew Rybchenko, Bruce Richardson,
	Chenbo Xia, Chengwen Feng, Dariusz Sosnowski, David Christensen,
	Hyong Youb Kim, Jerin Jacob, Jie Hai, Jingjing Wu, John Daley,
	Kevin Laatz, Kiran Kumar K, Konstantin Ananyev, Maciej Czekaj,
	Matan Azrad, Maxime Coquelin, Nithin Dabilpuram, Ori Kam,
	Ruifeng Wang, Satha Rao, Somnath Kotur, Suanming Mou,
	Sunil Kumar Kori, Viacheslav Ovsiienko, Yisen Zhuang,
	Yuying Zhang, mb, Tyler Retzlaff

RTE_MARKER typedefs are a GCC extension unsupported by MSVC. Remove
RTE_MARKER fields from rte_mbuf struct.

Maintain alignment of fields after removed cacheline1 marker by placing
C11 alignas(RTE_CACHE_LINE_MIN_SIZE).

Provide new rearm_data and rx_descriptor_fields1 fields in anonymous
unions as single element arrays with types matching the original
markers to maintain API compatibility.

This change breaks the API for the cacheline{0,1} fields that have been
removed from rte_mbuf, but it does not break the ABI. To address the
false positives for the removed (but 0-size) fields, provide the minimum
libabigail.abignore for type = rte_mbuf.

Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
Reviewed-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
---
 devtools/libabigail.abignore           |   6 +
 doc/guides/rel_notes/release_24_07.rst |   3 +
 lib/mbuf/rte_mbuf.h                    |   4 +-
 lib/mbuf/rte_mbuf_core.h               | 202 +++++++++++++++++----------------
 4 files changed, 116 insertions(+), 99 deletions(-)

diff --git a/devtools/libabigail.abignore b/devtools/libabigail.abignore
index 645d289..ad13179 100644
--- a/devtools/libabigail.abignore
+++ b/devtools/libabigail.abignore
@@ -37,3 +37,9 @@
 [suppress_type]
 	name = rte_eth_fp_ops
 	has_data_member_inserted_between = {offset_of(reserved2), end}
+
+[suppress_type]
+	name = rte_mbuf
+	type_kind = struct
+	has_size_change = no
+	has_data_member = {cacheline0, rearm_data, rx_descriptor_fields1, cacheline1}
diff --git a/doc/guides/rel_notes/release_24_07.rst b/doc/guides/rel_notes/release_24_07.rst
index a69f24c..b240ee5 100644
--- a/doc/guides/rel_notes/release_24_07.rst
+++ b/doc/guides/rel_notes/release_24_07.rst
@@ -68,6 +68,9 @@ Removed Items
    Also, make sure to start the actual text at the margin.
    =======================================================
 
+* mbuf: ``RTE_MARKER`` fields ``cacheline0`` and ``cacheline1``
+  have been removed from ``struct rte_mbuf``.
+
 
 API Changes
 -----------
diff --git a/lib/mbuf/rte_mbuf.h b/lib/mbuf/rte_mbuf.h
index 286b32b..4c4722e 100644
--- a/lib/mbuf/rte_mbuf.h
+++ b/lib/mbuf/rte_mbuf.h
@@ -108,7 +108,7 @@
 static inline void
 rte_mbuf_prefetch_part1(struct rte_mbuf *m)
 {
-	rte_prefetch0(&m->cacheline0);
+	rte_prefetch0(m);
 }
 
 /**
@@ -126,7 +126,7 @@
 rte_mbuf_prefetch_part2(struct rte_mbuf *m)
 {
 #if RTE_CACHE_LINE_SIZE == 64
-	rte_prefetch0(&m->cacheline1);
+	rte_prefetch0(RTE_PTR_ADD(m, RTE_CACHE_LINE_MIN_SIZE));
 #else
 	RTE_SET_USED(m);
 #endif
diff --git a/lib/mbuf/rte_mbuf_core.h b/lib/mbuf/rte_mbuf_core.h
index 9f58076..726c2cf 100644
--- a/lib/mbuf/rte_mbuf_core.h
+++ b/lib/mbuf/rte_mbuf_core.h
@@ -465,8 +465,6 @@ enum {
  * The generic rte_mbuf, containing a packet mbuf.
  */
 struct __rte_cache_aligned rte_mbuf {
-	RTE_MARKER cacheline0;
-
 	void *buf_addr;           /**< Virtual address of segment buffer. */
 #if RTE_IOVA_IN_MBUF
 	/**
@@ -488,127 +486,138 @@ struct __rte_cache_aligned rte_mbuf {
 #endif
 
 	/* next 8 bytes are initialised on RX descriptor rearm */
-	RTE_MARKER64 rearm_data;
-	uint16_t data_off;
-
-	/**
-	 * Reference counter. Its size should at least equal to the size
-	 * of port field (16 bits), to support zero-copy broadcast.
-	 * It should only be accessed using the following functions:
-	 * rte_mbuf_refcnt_update(), rte_mbuf_refcnt_read(), and
-	 * rte_mbuf_refcnt_set(). The functionality of these functions (atomic,
-	 * or non-atomic) is controlled by the RTE_MBUF_REFCNT_ATOMIC flag.
-	 */
-	RTE_ATOMIC(uint16_t) refcnt;
+	union {
+		uint64_t rearm_data[1];
+		__extension__
+		struct {
+			uint16_t data_off;
+
+			/**
+			 * Reference counter. Its size should at least equal to the size
+			 * of port field (16 bits), to support zero-copy broadcast.
+			 * It should only be accessed using the following functions:
+			 * rte_mbuf_refcnt_update(), rte_mbuf_refcnt_read(), and
+			 * rte_mbuf_refcnt_set(). The functionality of these functions (atomic,
+			 * or non-atomic) is controlled by the RTE_MBUF_REFCNT_ATOMIC flag.
+			 */
+			RTE_ATOMIC(uint16_t) refcnt;
 
-	/**
-	 * Number of segments. Only valid for the first segment of an mbuf
-	 * chain.
-	 */
-	uint16_t nb_segs;
+			/**
+			 * Number of segments. Only valid for the first segment of an mbuf
+			 * chain.
+			 */
+			uint16_t nb_segs;
 
-	/** Input port (16 bits to support more than 256 virtual ports).
-	 * The event eth Tx adapter uses this field to specify the output port.
-	 */
-	uint16_t port;
+			/** Input port (16 bits to support more than 256 virtual ports).
+			 * The event eth Tx adapter uses this field to specify the output port.
+			 */
+			uint16_t port;
+		};
+	};
 
 	uint64_t ol_flags;        /**< Offload features. */
 
-	/* remaining bytes are set on RX when pulling packet from descriptor */
-	RTE_MARKER rx_descriptor_fields1;
-
-	/*
-	 * The packet type, which is the combination of outer/inner L2, L3, L4
-	 * and tunnel types. The packet_type is about data really present in the
-	 * mbuf. Example: if vlan stripping is enabled, a received vlan packet
-	 * would have RTE_PTYPE_L2_ETHER and not RTE_PTYPE_L2_VLAN because the
-	 * vlan is stripped from the data.
-	 */
+	/* remaining 24 bytes are set on RX when pulling packet from descriptor */
 	union {
-		uint32_t packet_type; /**< L2/L3/L4 and tunnel information. */
+		/* void * type of the array elements is retained for driver compatibility. */
+		void *rx_descriptor_fields1[24 / sizeof(void *)];
 		__extension__
 		struct {
-			uint8_t l2_type:4;   /**< (Outer) L2 type. */
-			uint8_t l3_type:4;   /**< (Outer) L3 type. */
-			uint8_t l4_type:4;   /**< (Outer) L4 type. */
-			uint8_t tun_type:4;  /**< Tunnel type. */
+			/*
+			 * The packet type, which is the combination of outer/inner L2, L3, L4
+			 * and tunnel types. The packet_type is about data really present in the
+			 * mbuf. Example: if vlan stripping is enabled, a received vlan packet
+			 * would have RTE_PTYPE_L2_ETHER and not RTE_PTYPE_L2_VLAN because the
+			 * vlan is stripped from the data.
+			 */
 			union {
-				uint8_t inner_esp_next_proto;
-				/**< ESP next protocol type, valid if
-				 * RTE_PTYPE_TUNNEL_ESP tunnel type is set
-				 * on both Tx and Rx.
-				 */
+				uint32_t packet_type; /**< L2/L3/L4 and tunnel information. */
 				__extension__
 				struct {
-					uint8_t inner_l2_type:4;
-					/**< Inner L2 type. */
-					uint8_t inner_l3_type:4;
-					/**< Inner L3 type. */
+					uint8_t l2_type:4;   /**< (Outer) L2 type. */
+					uint8_t l3_type:4;   /**< (Outer) L3 type. */
+					uint8_t l4_type:4;   /**< (Outer) L4 type. */
+					uint8_t tun_type:4;  /**< Tunnel type. */
+					union {
+						/** ESP next protocol type, valid if
+						 * RTE_PTYPE_TUNNEL_ESP tunnel type is set
+						 * on both Tx and Rx.
+						 */
+						uint8_t inner_esp_next_proto;
+						__extension__
+						struct {
+							/** Inner L2 type. */
+							uint8_t inner_l2_type:4;
+							/** Inner L3 type. */
+							uint8_t inner_l3_type:4;
+						};
+					};
+					uint8_t inner_l4_type:4; /**< Inner L4 type. */
 				};
 			};
-			uint8_t inner_l4_type:4; /**< Inner L4 type. */
-		};
-	};
 
-	uint32_t pkt_len;         /**< Total pkt len: sum of all segments. */
-	uint16_t data_len;        /**< Amount of data in segment buffer. */
-	/** VLAN TCI (CPU order), valid if RTE_MBUF_F_RX_VLAN is set. */
-	uint16_t vlan_tci;
+			uint32_t pkt_len;         /**< Total pkt len: sum of all segments. */
 
-	union {
-		union {
-			uint32_t rss;     /**< RSS hash result if RSS enabled */
-			struct {
+			uint16_t data_len;        /**< Amount of data in segment buffer. */
+			/** VLAN TCI (CPU order), valid if RTE_MBUF_F_RX_VLAN is set. */
+			uint16_t vlan_tci;
+
+			union {
 				union {
+					uint32_t rss;     /**< RSS hash result if RSS enabled */
 					struct {
-						uint16_t hash;
-						uint16_t id;
-					};
-					uint32_t lo;
-					/**< Second 4 flexible bytes */
-				};
-				uint32_t hi;
-				/**< First 4 flexible bytes or FD ID, dependent
-				 * on RTE_MBUF_F_RX_FDIR_* flag in ol_flags.
-				 */
-			} fdir;	/**< Filter identifier if FDIR enabled */
-			struct rte_mbuf_sched sched;
-			/**< Hierarchical scheduler : 8 bytes */
-			struct {
-				uint32_t reserved1;
-				uint16_t reserved2;
-				uint16_t txq;
-				/**< The event eth Tx adapter uses this field
-				 * to store Tx queue id.
-				 * @see rte_event_eth_tx_adapter_txq_set()
-				 */
-			} txadapter; /**< Eventdev ethdev Tx adapter */
-			uint32_t usr;
-			/**< User defined tags. See rte_distributor_process() */
-		} hash;                   /**< hash information */
-	};
+						union {
+							struct {
+								uint16_t hash;
+								uint16_t id;
+							};
+							/** Second 4 flexible bytes */
+							uint32_t lo;
+						};
+						/** First 4 flexible bytes or FD ID, dependent
+						 * on RTE_MBUF_F_RX_FDIR_* flag in ol_flags.
+						 */
+						uint32_t hi;
+					} fdir;	/**< Filter identifier if FDIR enabled */
+					/** Hierarchical scheduler : 8 bytes */
+					struct rte_mbuf_sched sched;
+					struct {
+						uint32_t reserved1;
+						uint16_t reserved2;
+						/** The event eth Tx adapter uses this field
+						 * to store Tx queue id.
+						 * @see rte_event_eth_tx_adapter_txq_set()
+						 */
+						uint16_t txq;
+					} txadapter; /**< Eventdev ethdev Tx adapter */
+					/** User defined tags. See rte_distributor_process() */
+					uint32_t usr;
+				} hash;                   /**< hash information */
+			};
 
-	/** Outer VLAN TCI (CPU order), valid if RTE_MBUF_F_RX_QINQ is set. */
-	uint16_t vlan_tci_outer;
+			/** Outer VLAN TCI (CPU order), valid if RTE_MBUF_F_RX_QINQ is set. */
+			uint16_t vlan_tci_outer;
 
-	uint16_t buf_len;         /**< Length of segment buffer. */
+			uint16_t buf_len;         /**< Length of segment buffer. */
+		};
+	};
 
 	struct rte_mempool *pool; /**< Pool from which mbuf was allocated. */
 
 	/* second cache line - fields only used in slow path or on TX */
-	alignas(RTE_CACHE_LINE_MIN_SIZE) RTE_MARKER cacheline1;
-
 #if RTE_IOVA_IN_MBUF
 	/**
 	 * Next segment of scattered packet. Must be NULL in the last
 	 * segment or in case of non-segmented packet.
 	 */
+	alignas(RTE_CACHE_LINE_MIN_SIZE)
 	struct rte_mbuf *next;
 #else
 	/**
 	 * Reserved for dynamic fields
 	 * when the next pointer is in first cache line (i.e. RTE_IOVA_IN_MBUF is 0).
 	 */
+	alignas(RTE_CACHE_LINE_MIN_SIZE)
 	uint64_t dynfield2;
 #endif
 
@@ -617,17 +626,16 @@ struct __rte_cache_aligned rte_mbuf {
 		uint64_t tx_offload;       /**< combined for easy fetch */
 		__extension__
 		struct {
-			uint64_t l2_len:RTE_MBUF_L2_LEN_BITS;
-			/**< L2 (MAC) Header Length for non-tunneling pkt.
+			/** L2 (MAC) Header Length for non-tunneling pkt.
 			 * Outer_L4_len + ... + Inner_L2_len for tunneling pkt.
 			 */
+			uint64_t l2_len:RTE_MBUF_L2_LEN_BITS;
+			/** L3 (IP) Header Length. */
 			uint64_t l3_len:RTE_MBUF_L3_LEN_BITS;
-			/**< L3 (IP) Header Length. */
+			/** L4 (TCP/UDP) Header Length. */
 			uint64_t l4_len:RTE_MBUF_L4_LEN_BITS;
-			/**< L4 (TCP/UDP) Header Length. */
+			/** TCP TSO segment size */
 			uint64_t tso_segsz:RTE_MBUF_TSO_SEGSZ_BITS;
-			/**< TCP TSO segment size */
-
 			/*
 			 * Fields for Tx offloading of tunnels.
 			 * These are undefined for packets which don't request
@@ -640,10 +648,10 @@ struct __rte_cache_aligned rte_mbuf {
 			 * Applications are expected to set appropriate tunnel
 			 * offload flags when they fill in these fields.
 			 */
+			/** Outer L3 (IP) Hdr Length. */
 			uint64_t outer_l3_len:RTE_MBUF_OUTL3_LEN_BITS;
-			/**< Outer L3 (IP) Hdr Length. */
+			/** Outer L2 (MAC) Hdr Length. */
 			uint64_t outer_l2_len:RTE_MBUF_OUTL2_LEN_BITS;
-			/**< Outer L2 (MAC) Hdr Length. */
 
 			/* uint64_t unused:RTE_MBUF_TXOFLD_UNUSED_BITS; */
 		};
-- 
1.8.3.1



* Re: [PATCH v1 1/3] bbdev: new queue stat for available enqueue depth
  @ 2024-04-05  0:46  3%   ` Stephen Hemminger
  2024-04-05 15:15  3%   ` Stephen Hemminger
  2024-08-12  9:28  3%   ` Maxime Coquelin
  2 siblings, 0 replies; 200+ results
From: Stephen Hemminger @ 2024-04-05  0:46 UTC (permalink / raw)
  To: Nicolas Chautru
  Cc: dev, maxime.coquelin, hemant.agrawal, david.marchand, hernan.vargas

On Thu,  4 Apr 2024 14:04:45 -0700
Nicolas Chautru <nicolas.chautru@intel.com> wrote:

> Capturing additional queue stats counter for the
> depth of enqueue batch still available on the given
> queue. This can help application to monitor that depth
> at run time.
> 
> Signed-off-by: Nicolas Chautru <nicolas.chautru@intel.com>

Adding field is an ABI change and will have to wait until 24.11 release


* Re: [PATCH] lib: add get/set link settings interface
  2024-04-04  7:09  4%     ` David Marchand
@ 2024-04-05  0:55  0%       ` Tyler Retzlaff
  2024-04-05  0:56  0%         ` Tyler Retzlaff
  2024-04-05  8:58  0%         ` David Marchand
  0 siblings, 2 replies; 200+ results
From: Tyler Retzlaff @ 2024-04-05  0:55 UTC (permalink / raw)
  To: David Marchand
  Cc: Marek Pazdan, Thomas Monjalon, Ferruh Yigit, Andrew Rybchenko, dev

On Thu, Apr 04, 2024 at 09:09:40AM +0200, David Marchand wrote:
> Hello Tyler, Marek,
> 
> On Wed, Apr 3, 2024 at 6:49 PM Tyler Retzlaff
> <roretzla@linux.microsoft.com> wrote:
> >
> > On Wed, Apr 03, 2024 at 06:40:24AM -0700, Marek Pazdan wrote:
> > >  There are link settings parameters available from PMD drivers level
> > >  which are currently not exposed to the user via consistent interface.
> > >  When interface is available for system level those information can
> > >  be acquired with 'ethtool DEVNAME' (ioctl: ETHTOOL_SLINKSETTINGS/
> > >  ETHTOOL_GLINKSETTINGS). There are use cases where
> > >  physical interface is passthrough to dpdk driver and is not available
> > >  from system level. Information provided by ioctl carries information
> > >  useful for link auto negotiation settings among others.
> > >
> > > Signed-off-by: Marek Pazdan <mpazdan@arista.com>
> > > ---
> > > diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> > > index 147257d6a2..66aad925d0 100644
> > > --- a/lib/ethdev/rte_ethdev.h
> > > +++ b/lib/ethdev/rte_ethdev.h
> > > @@ -335,7 +335,7 @@ struct rte_eth_stats {
> > >  __extension__
> > >  struct __rte_aligned(8) rte_eth_link { /**< aligned for atomic64 read/write */
> > >       uint32_t link_speed;        /**< RTE_ETH_SPEED_NUM_ */
> > > -     uint16_t link_duplex  : 1;  /**< RTE_ETH_LINK_[HALF/FULL]_DUPLEX */
> > > +     uint16_t link_duplex  : 2;  /**< RTE_ETH_LINK_[HALF/FULL/UNKNOWN]_DUPLEX */
> > >       uint16_t link_autoneg : 1;  /**< RTE_ETH_LINK_[AUTONEG/FIXED] */
> > >       uint16_t link_status  : 1;  /**< RTE_ETH_LINK_[DOWN/UP] */
> > >  };
> >
> > this breaks the abi. David does libabigail pick this up i wonder?
> >
> 
> Yes, the CI flagged it.
> 
> Looking at the UNH report (in patchwork):
> http://mails.dpdk.org/archives/test-report/2024-April/631222.html

i'm jealous we don't have libabigail on windows, so helpfull.

> 
> 1 function with some indirect sub-type change:
> 
> [C] 'function int rte_eth_link_get(uint16_t, rte_eth_link*)' at
> rte_ethdev.c:2972:1 has some indirect sub-type changes:
> parameter 2 of type 'rte_eth_link*' has sub-type changes:
> in pointed to type 'struct rte_eth_link' at rte_ethdev.h:336:1:
> type size hasn't changed
> 2 data member changes:
> 'uint16_t link_autoneg' offset changed from 33 to 34 (in bits) (by +1 bits)
> 'uint16_t link_status' offset changed from 34 to 35 (in bits) (by +1 bits)
> 
> Error: ABI issue reported for abidiff --suppr
> /home-local/jenkins-local/jenkins-agent/workspace/Generic-DPDK-Compile-ABI
> at 3/dpdk/devtools/libabigail.abignore --no-added-syms --headers-dir1
> reference/usr/local/include --headers-dir2
> build_install/usr/local/include
> reference/usr/local/lib/x86_64-linux-gnu/librte_ethdev.so.24.0
> build_install/usr/local/lib/x86_64-linux-gnu/librte_ethdev.so.24.2
> ABIDIFF_ABI_CHANGE, this change requires a review (abidiff flagged
> this as a potential issue).
> 
> 
> GHA would have caught it too, but the documentation generation failed
> before reaching the ABI check.
> http://mails.dpdk.org/archives/test-report/2024-April/631086.html
> 
> 
> -- 
> David Marchand


* Re: [PATCH] lib: add get/set link settings interface
  2024-04-05  0:55  0%       ` Tyler Retzlaff
@ 2024-04-05  0:56  0%         ` Tyler Retzlaff
  2024-04-05  8:58  0%         ` David Marchand
  1 sibling, 0 replies; 200+ results
From: Tyler Retzlaff @ 2024-04-05  0:56 UTC (permalink / raw)
  To: David Marchand
  Cc: Marek Pazdan, Thomas Monjalon, Ferruh Yigit, Andrew Rybchenko, dev

On Thu, Apr 04, 2024 at 05:55:18PM -0700, Tyler Retzlaff wrote:
> On Thu, Apr 04, 2024 at 09:09:40AM +0200, David Marchand wrote:
> > Hello Tyler, Marek,
> > 
> > On Wed, Apr 3, 2024 at 6:49 PM Tyler Retzlaff
> > <roretzla@linux.microsoft.com> wrote:
> > >
> > > On Wed, Apr 03, 2024 at 06:40:24AM -0700, Marek Pazdan wrote:
> > > >  There are link settings parameters available from PMD drivers level
> > > >  which are currently not exposed to the user via consistent interface.
> > > >  When interface is available for system level those information can
> > > >  be acquired with 'ethtool DEVNAME' (ioctl: ETHTOOL_SLINKSETTINGS/
> > > >  ETHTOOL_GLINKSETTINGS). There are use cases where
> > > >  physical interface is passthrough to dpdk driver and is not available
> > > >  from system level. Information provided by ioctl carries information
> > > >  useful for link auto negotiation settings among others.
> > > >
> > > > Signed-off-by: Marek Pazdan <mpazdan@arista.com>
> > > > ---
> > > > diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> > > > index 147257d6a2..66aad925d0 100644
> > > > --- a/lib/ethdev/rte_ethdev.h
> > > > +++ b/lib/ethdev/rte_ethdev.h
> > > > @@ -335,7 +335,7 @@ struct rte_eth_stats {
> > > >  __extension__
> > > >  struct __rte_aligned(8) rte_eth_link { /**< aligned for atomic64 read/write */
> > > >       uint32_t link_speed;        /**< RTE_ETH_SPEED_NUM_ */
> > > > -     uint16_t link_duplex  : 1;  /**< RTE_ETH_LINK_[HALF/FULL]_DUPLEX */
> > > > +     uint16_t link_duplex  : 2;  /**< RTE_ETH_LINK_[HALF/FULL/UNKNOWN]_DUPLEX */
> > > >       uint16_t link_autoneg : 1;  /**< RTE_ETH_LINK_[AUTONEG/FIXED] */
> > > >       uint16_t link_status  : 1;  /**< RTE_ETH_LINK_[DOWN/UP] */
> > > >  };
> > >
> > > this breaks the abi. David does libabigail pick this up i wonder?
> > >
> > 
> > Yes, the CI flagged it.
> > 
> > Looking at the UNH report (in patchwork):
> > http://mails.dpdk.org/archives/test-report/2024-April/631222.html
> 
> i'm jealous we don't have libabigail on windows, so helpfull.

s/ll/l/ end of day bah.

> 
> > 
> > 1 function with some indirect sub-type change:
> > 
> > [C] 'function int rte_eth_link_get(uint16_t, rte_eth_link*)' at
> > rte_ethdev.c:2972:1 has some indirect sub-type changes:
> > parameter 2 of type 'rte_eth_link*' has sub-type changes:
> > in pointed to type 'struct rte_eth_link' at rte_ethdev.h:336:1:
> > type size hasn't changed
> > 2 data member changes:
> > 'uint16_t link_autoneg' offset changed from 33 to 34 (in bits) (by +1 bits)
> > 'uint16_t link_status' offset changed from 34 to 35 (in bits) (by +1 bits)
> > 
> > Error: ABI issue reported for abidiff --suppr
> > /home-local/jenkins-local/jenkins-agent/workspace/Generic-DPDK-Compile-ABI
> > at 3/dpdk/devtools/libabigail.abignore --no-added-syms --headers-dir1
> > reference/usr/local/include --headers-dir2
> > build_install/usr/local/include
> > reference/usr/local/lib/x86_64-linux-gnu/librte_ethdev.so.24.0
> > build_install/usr/local/lib/x86_64-linux-gnu/librte_ethdev.so.24.2
> > ABIDIFF_ABI_CHANGE, this change requires a review (abidiff flagged
> > this as a potential issue).
> > 
> > 
> > GHA would have caught it too, but the documentation generation failed
> > before reaching the ABI check.
> > http://mails.dpdk.org/archives/test-report/2024-April/631086.html
> > 
> > 
> > -- 
> > David Marchand


* Re: [PATCH] lib: add get/set link settings interface
  2024-04-05  0:55  0%       ` Tyler Retzlaff
  2024-04-05  0:56  0%         ` Tyler Retzlaff
@ 2024-04-05  8:58  0%         ` David Marchand
  1 sibling, 0 replies; 200+ results
From: David Marchand @ 2024-04-05  8:58 UTC (permalink / raw)
  To: Tyler Retzlaff, Dodji Seketeli; +Cc: Thomas Monjalon, dev

On Fri, Apr 5, 2024 at 2:55 AM Tyler Retzlaff
<roretzla@linux.microsoft.com> wrote:
> On Thu, Apr 04, 2024 at 09:09:40AM +0200, David Marchand wrote:
> > On Wed, Apr 3, 2024 at 6:49 PM Tyler Retzlaff
> > > this breaks the abi. David does libabigail pick this up i wonder?
> >
> > Yes, the CI flagged it.
> >
> > Looking at the UNH report (in patchwork):
> > http://mails.dpdk.org/archives/test-report/2024-April/631222.html
>
> i'm jealous we don't have libabigail on windows, so helpful.

libabigail is written in C++ and relies on the elfutils and libxml2 libraries.
I am unclear about what binary format is used in Windows... so I am
not sure how much work would be required to have it on Windows.

That's more something to discuss with Dodji :-).


-- 
David Marchand



* Re: [PATCH v1 1/3] bbdev: new queue stat for available enqueue depth
    2024-04-05  0:46  3%   ` Stephen Hemminger
@ 2024-04-05 15:15  3%   ` Stephen Hemminger
  2024-04-05 18:17  3%     ` Chautru, Nicolas
  2024-08-12  9:28  3%   ` Maxime Coquelin
  2 siblings, 1 reply; 200+ results
From: Stephen Hemminger @ 2024-04-05 15:15 UTC (permalink / raw)
  To: Nicolas Chautru
  Cc: dev, maxime.coquelin, hemant.agrawal, david.marchand, hernan.vargas

On Thu,  4 Apr 2024 14:04:45 -0700
Nicolas Chautru <nicolas.chautru@intel.com> wrote:

> Capturing additional queue stats counter for the
> depth of enqueue batch still available on the given
> queue. This can help application to monitor that depth
> at run time.
> 
> Signed-off-by: Nicolas Chautru <nicolas.chautru@intel.com>
> ---
>  lib/bbdev/rte_bbdev.h | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/lib/bbdev/rte_bbdev.h b/lib/bbdev/rte_bbdev.h
> index 0cbfdd1c95..25514c58ac 100644
> --- a/lib/bbdev/rte_bbdev.h
> +++ b/lib/bbdev/rte_bbdev.h
> @@ -283,6 +283,8 @@ struct rte_bbdev_stats {
>  	 *     bbdev operation
>  	 */
>  	uint64_t acc_offload_cycles;
> +	/** Available number of enqueue batch on that queue. */
> +	uint16_t enqueue_depth_avail;
>  };
>  
>  /**

Doesn't this break the ABI?


* RE: [PATCH v1 1/3] bbdev: new queue stat for available enqueue depth
  2024-04-05 15:15  3%   ` Stephen Hemminger
@ 2024-04-05 18:17  3%     ` Chautru, Nicolas
  0 siblings, 0 replies; 200+ results
From: Chautru, Nicolas @ 2024-04-05 18:17 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: dev, maxime.coquelin, hemant.agrawal, Marchand, David, Vargas, Hernan

Hi Stephen, 

It is not strictly ABI compatible since the size of the structure increases, hence only updating for 24.11. 
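
To make the size argument concrete, here is a minimal sketch. It assumes
reduced stand-in structures (stats_old and stats_new are hypothetical
names, not the real struct rte_bbdev_stats) that keep only the last
existing field and the proposed one:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reduced layout before the patch: last field only. */
struct stats_old {
	uint64_t acc_offload_cycles;
};

/* Same layout with the new 16-bit counter appended. */
struct stats_new {
	uint64_t acc_offload_cycles;
	uint16_t enqueue_depth_avail;
};

/* The new field starts exactly where the old structure ended... */
static_assert(offsetof(struct stats_new, enqueue_depth_avail) ==
	      sizeof(struct stats_old), "new field lands past the old end");
/* ...so the structure has to grow (tail padding included). */
static_assert(sizeof(struct stats_new) > sizeof(struct stats_old),
	      "structure size increases");
```

An application compiled against the old header would reserve only the
old size, so a library built with the new layout could write past the
end of the caller's buffer when filling in the new field.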


> -----Original Message-----
> From: Stephen Hemminger <stephen@networkplumber.org>
> Sent: Friday, April 5, 2024 8:15 AM
> To: Chautru, Nicolas <nicolas.chautru@intel.com>
> Cc: dev@dpdk.org; maxime.coquelin@redhat.com; hemant.agrawal@nxp.com;
> Marchand, David <david.marchand@redhat.com>; Vargas, Hernan
> <hernan.vargas@intel.com>
> Subject: Re: [PATCH v1 1/3] bbdev: new queue stat for available enqueue depth
> 
> On Thu,  4 Apr 2024 14:04:45 -0700
> Nicolas Chautru <nicolas.chautru@intel.com> wrote:
> 
> > Capturing additional queue stats counter for the depth of enqueue
> > batch still available on the given queue. This can help application to
> > monitor that depth at run time.
> >
> > Signed-off-by: Nicolas Chautru <nicolas.chautru@intel.com>
> > ---
> >  lib/bbdev/rte_bbdev.h | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/lib/bbdev/rte_bbdev.h b/lib/bbdev/rte_bbdev.h index
> > 0cbfdd1c95..25514c58ac 100644
> > --- a/lib/bbdev/rte_bbdev.h
> > +++ b/lib/bbdev/rte_bbdev.h
> > @@ -283,6 +283,8 @@ struct rte_bbdev_stats {
> >  	 *     bbdev operation
> >  	 */
> >  	uint64_t acc_offload_cycles;
> > +	/** Available number of enqueue batch on that queue. */
> > +	uint16_t enqueue_depth_avail;
> >  };
> >
> >  /**
> 
> Doesn't this break the ABI?

^ permalink raw reply	[relevance 3%]

* [PATCH v6 6/8] net/tap: rewrite the RSS BPF program
  @ 2024-04-05 21:14  2%   ` Stephen Hemminger
  0 siblings, 0 replies; 200+ results
From: Stephen Hemminger @ 2024-04-05 21:14 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

Rewrite the BPF program used to do queue based RSS.
Important changes:
	- uses newer BPF map format BTF
	- accepts key as parameter rather than constant default
	- can do L3 or L4 hashing
	- supports IPv4 options
	- supports IPv6 extension headers
	- restructured for readability

The usage of BPF is different as well:
	- the incoming configuration is looked up based on
	  class parameters rather than patching the BPF.
	- the resulting queue is placed in skb rather
	  than requiring a second pass through classifier step.

Note: This version only works with later patch to enable it on
the DPDK driver side. It is submitted as an incremental patch
to allow for easier review. Bisection still works because
the old instructions are still present for now.
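
For readers unfamiliar with the hash itself, the following is a rough
userspace sketch of a Toeplitz hash that takes the key as a parameter.
It is not the BPF code from this patch: toeplitz_be is a hypothetical
name, the loops are not unrolled, and a 64-bit window is used so that
no 32-bit value is ever shifted by 32. The key is assumed to hold at
least len + 1 words.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Toeplitz hash over len 32-bit words of host-order input.
 * For every set input bit (MSB first), XOR in the 32-bit slice of
 * the key that starts at the same bit position.
 */
static uint32_t
toeplitz_be(const uint32_t *tuple, size_t len, const uint32_t *key)
{
	uint32_t hash = 0;

	assert(tuple != NULL && key != NULL);

	for (size_t j = 0; j < len; j++) {
		/* 64-bit view of key words j and j + 1 */
		uint64_t window = ((uint64_t)key[j] << 32) | key[j + 1];

		for (unsigned int i = 0; i < 32; i++) {
			if (tuple[j] & (1U << (31 - i)))
				hash ^= (uint32_t)(window >> (32 - i));
		}
	}
	return hash;
}
```

With this formulation an all-zero tuple hashes to 0, and a tuple with
only its most significant bit set hashes to the first key word.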

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 .gitignore                            |   3 -
 drivers/net/tap/bpf/Makefile          |  19 --
 drivers/net/tap/bpf/README            |  38 ++++
 drivers/net/tap/bpf/bpf_api.h         | 276 --------------------------
 drivers/net/tap/bpf/bpf_elf.h         |  53 -----
 drivers/net/tap/bpf/bpf_extract.py    |  85 --------
 drivers/net/tap/bpf/meson.build       |  81 ++++++++
 drivers/net/tap/bpf/tap_bpf_program.c | 255 ------------------------
 drivers/net/tap/bpf/tap_rss.c         | 264 ++++++++++++++++++++++++
 9 files changed, 383 insertions(+), 691 deletions(-)
 delete mode 100644 drivers/net/tap/bpf/Makefile
 create mode 100644 drivers/net/tap/bpf/README
 delete mode 100644 drivers/net/tap/bpf/bpf_api.h
 delete mode 100644 drivers/net/tap/bpf/bpf_elf.h
 delete mode 100644 drivers/net/tap/bpf/bpf_extract.py
 create mode 100644 drivers/net/tap/bpf/meson.build
 delete mode 100644 drivers/net/tap/bpf/tap_bpf_program.c
 create mode 100644 drivers/net/tap/bpf/tap_rss.c

diff --git a/.gitignore b/.gitignore
index 3f444dcace..01a47a7606 100644
--- a/.gitignore
+++ b/.gitignore
@@ -36,9 +36,6 @@ TAGS
 # ignore python bytecode files
 *.pyc
 
-# ignore BPF programs
-drivers/net/tap/bpf/tap_bpf_program.o
-
 # DTS results
 dts/output
 
diff --git a/drivers/net/tap/bpf/Makefile b/drivers/net/tap/bpf/Makefile
deleted file mode 100644
index 9efeeb1bc7..0000000000
--- a/drivers/net/tap/bpf/Makefile
+++ /dev/null
@@ -1,19 +0,0 @@
-# SPDX-License-Identifier: BSD-3-Clause
-# This file is not built as part of normal DPDK build.
-# It is used to generate the eBPF code for TAP RSS.
-
-CLANG=clang
-CLANG_OPTS=-O2
-TARGET=../tap_bpf_insns.h
-
-all: $(TARGET)
-
-clean:
-	rm tap_bpf_program.o $(TARGET)
-
-tap_bpf_program.o: tap_bpf_program.c
-	$(CLANG) $(CLANG_OPTS) -emit-llvm -c $< -o - | \
-	llc -march=bpf -filetype=obj -o $@
-
-$(TARGET): tap_bpf_program.o
-	python3 bpf_extract.py -stap_bpf_program.c -o $@ $<
diff --git a/drivers/net/tap/bpf/README b/drivers/net/tap/bpf/README
new file mode 100644
index 0000000000..1d421ff42c
--- /dev/null
+++ b/drivers/net/tap/bpf/README
@@ -0,0 +1,38 @@
+This is the BPF program used to implement the RSS across queues flow action.
+The program is loaded when the first RSS flow rule is created and is never unloaded.
+
+Each flow rule creates a unique key (handle) and this is used as the key
+for finding the RSS information for that flow rule.
+
+This version is built with the BPF Compile Once - Run Everywhere (CO-RE)
+framework and uses libbpf and bpftool.
+
+Limitations
+-----------
+- requires libbpf to run
+- rebuilding the BPF requires Clang and bpftool.
+  Some older versions of Ubuntu do not have working bpftool package.
+  Need a version of Clang that can compile to BPF.
+- only standard Toeplitz hash with standard 40 byte key is supported
+- the number of flow rules using RSS is limited to 32
+
+Building
+--------
+During the DPDK build process the meson build file checks that
+libbpf, bpftool, and clang are available. If everything is
+there then BPF RSS is enabled.
+
+1. Use clang to compile tap_rss.c into the tap_rss.bpf.o file.
+
+2. Use bpftool to generate a skeleton header file tap_rss.skel.h from tap_rss.bpf.o.
+   This skeleton header is a large byte array which contains the
+   BPF binary and wrappers to load and use it.
+
+3. The tap flow code then compiles that BPF byte array into the PMD object.
+
+4. When needed the BPF array is loaded by libbpf.
+
+References
+----------
+BPF and XDP reference guide
+https://docs.cilium.io/en/latest/bpf/progtypes/
diff --git a/drivers/net/tap/bpf/bpf_api.h b/drivers/net/tap/bpf/bpf_api.h
deleted file mode 100644
index 4cd25fa593..0000000000
--- a/drivers/net/tap/bpf/bpf_api.h
+++ /dev/null
@@ -1,276 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 or BSD-3-Clause */
-
-#ifndef __BPF_API__
-#define __BPF_API__
-
-/* Note:
- *
- * This file can be included into eBPF kernel programs. It contains
- * a couple of useful helper functions, map/section ABI (bpf_elf.h),
- * misc macros and some eBPF specific LLVM built-ins.
- */
-
-#include <stdint.h>
-
-#include <linux/pkt_cls.h>
-#include <linux/bpf.h>
-#include <linux/filter.h>
-
-#include <asm/byteorder.h>
-
-#include "bpf_elf.h"
-
-/** libbpf pin type. */
-enum libbpf_pin_type {
-	LIBBPF_PIN_NONE,
-	/* PIN_BY_NAME: pin maps by name (in /sys/fs/bpf by default) */
-	LIBBPF_PIN_BY_NAME,
-};
-
-/** Type helper macros. */
-
-#define __uint(name, val) int (*name)[val]
-#define __type(name, val) typeof(val) *name
-#define __array(name, val) typeof(val) *name[]
-
-/** Misc macros. */
-
-#ifndef __stringify
-# define __stringify(X)		#X
-#endif
-
-#ifndef __maybe_unused
-# define __maybe_unused		__attribute__((__unused__))
-#endif
-
-#ifndef offsetof
-# define offsetof(TYPE, MEMBER)	__builtin_offsetof(TYPE, MEMBER)
-#endif
-
-#ifndef likely
-# define likely(X)		__builtin_expect(!!(X), 1)
-#endif
-
-#ifndef unlikely
-# define unlikely(X)		__builtin_expect(!!(X), 0)
-#endif
-
-#ifndef htons
-# define htons(X)		__constant_htons((X))
-#endif
-
-#ifndef ntohs
-# define ntohs(X)		__constant_ntohs((X))
-#endif
-
-#ifndef htonl
-# define htonl(X)		__constant_htonl((X))
-#endif
-
-#ifndef ntohl
-# define ntohl(X)		__constant_ntohl((X))
-#endif
-
-#ifndef __inline__
-# define __inline__		__attribute__((always_inline))
-#endif
-
-/** Section helper macros. */
-
-#ifndef __section
-# define __section(NAME)						\
-	__attribute__((section(NAME), used))
-#endif
-
-#ifndef __section_tail
-# define __section_tail(ID, KEY)					\
-	__section(__stringify(ID) "/" __stringify(KEY))
-#endif
-
-#ifndef __section_xdp_entry
-# define __section_xdp_entry						\
-	__section(ELF_SECTION_PROG)
-#endif
-
-#ifndef __section_cls_entry
-# define __section_cls_entry						\
-	__section(ELF_SECTION_CLASSIFIER)
-#endif
-
-#ifndef __section_act_entry
-# define __section_act_entry						\
-	__section(ELF_SECTION_ACTION)
-#endif
-
-#ifndef __section_lwt_entry
-# define __section_lwt_entry						\
-	__section(ELF_SECTION_PROG)
-#endif
-
-#ifndef __section_license
-# define __section_license						\
-	__section(ELF_SECTION_LICENSE)
-#endif
-
-#ifndef __section_maps
-# define __section_maps							\
-	__section(ELF_SECTION_MAPS)
-#endif
-
-/** Declaration helper macros. */
-
-#ifndef BPF_LICENSE
-# define BPF_LICENSE(NAME)						\
-	char ____license[] __section_license = NAME
-#endif
-
-/** Classifier helper */
-
-#ifndef BPF_H_DEFAULT
-# define BPF_H_DEFAULT	-1
-#endif
-
-/** BPF helper functions for tc. Individual flags are in linux/bpf.h */
-
-#ifndef __BPF_FUNC
-# define __BPF_FUNC(NAME, ...)						\
-	(* NAME)(__VA_ARGS__) __maybe_unused
-#endif
-
-#ifndef BPF_FUNC
-# define BPF_FUNC(NAME, ...)						\
-	__BPF_FUNC(NAME, __VA_ARGS__) = (void *) BPF_FUNC_##NAME
-#endif
-
-/* Map access/manipulation */
-static void *BPF_FUNC(map_lookup_elem, void *map, const void *key);
-static int BPF_FUNC(map_update_elem, void *map, const void *key,
-		    const void *value, uint32_t flags);
-static int BPF_FUNC(map_delete_elem, void *map, const void *key);
-
-/* Time access */
-static uint64_t BPF_FUNC(ktime_get_ns);
-
-/* Debugging */
-
-/* FIXME: __attribute__ ((format(printf, 1, 3))) not possible unless
- * llvm bug https://llvm.org/bugs/show_bug.cgi?id=26243 gets resolved.
- * It would require ____fmt to be made const, which generates a reloc
- * entry (non-map).
- */
-static void BPF_FUNC(trace_printk, const char *fmt, int fmt_size, ...);
-
-#ifndef printt
-# define printt(fmt, ...)						\
-	__extension__ ({						\
-		char ____fmt[] = fmt;					\
-		trace_printk(____fmt, sizeof(____fmt), ##__VA_ARGS__);	\
-	})
-#endif
-
-/* Random numbers */
-static uint32_t BPF_FUNC(get_prandom_u32);
-
-/* Tail calls */
-static void BPF_FUNC(tail_call, struct __sk_buff *skb, void *map,
-		     uint32_t index);
-
-/* System helpers */
-static uint32_t BPF_FUNC(get_smp_processor_id);
-static uint32_t BPF_FUNC(get_numa_node_id);
-
-/* Packet misc meta data */
-static uint32_t BPF_FUNC(get_cgroup_classid, struct __sk_buff *skb);
-static int BPF_FUNC(skb_under_cgroup, void *map, uint32_t index);
-
-static uint32_t BPF_FUNC(get_route_realm, struct __sk_buff *skb);
-static uint32_t BPF_FUNC(get_hash_recalc, struct __sk_buff *skb);
-static uint32_t BPF_FUNC(set_hash_invalid, struct __sk_buff *skb);
-
-/* Packet redirection */
-static int BPF_FUNC(redirect, int ifindex, uint32_t flags);
-static int BPF_FUNC(clone_redirect, struct __sk_buff *skb, int ifindex,
-		    uint32_t flags);
-
-/* Packet manipulation */
-static int BPF_FUNC(skb_load_bytes, struct __sk_buff *skb, uint32_t off,
-		    void *to, uint32_t len);
-static int BPF_FUNC(skb_store_bytes, struct __sk_buff *skb, uint32_t off,
-		    const void *from, uint32_t len, uint32_t flags);
-
-static int BPF_FUNC(l3_csum_replace, struct __sk_buff *skb, uint32_t off,
-		    uint32_t from, uint32_t to, uint32_t flags);
-static int BPF_FUNC(l4_csum_replace, struct __sk_buff *skb, uint32_t off,
-		    uint32_t from, uint32_t to, uint32_t flags);
-static int BPF_FUNC(csum_diff, const void *from, uint32_t from_size,
-		    const void *to, uint32_t to_size, uint32_t seed);
-static int BPF_FUNC(csum_update, struct __sk_buff *skb, uint32_t wsum);
-
-static int BPF_FUNC(skb_change_type, struct __sk_buff *skb, uint32_t type);
-static int BPF_FUNC(skb_change_proto, struct __sk_buff *skb, uint32_t proto,
-		    uint32_t flags);
-static int BPF_FUNC(skb_change_tail, struct __sk_buff *skb, uint32_t nlen,
-		    uint32_t flags);
-
-static int BPF_FUNC(skb_pull_data, struct __sk_buff *skb, uint32_t len);
-
-/* Event notification */
-static int __BPF_FUNC(skb_event_output, struct __sk_buff *skb, void *map,
-		      uint64_t index, const void *data, uint32_t size) =
-		      (void *) BPF_FUNC_perf_event_output;
-
-/* Packet vlan encap/decap */
-static int BPF_FUNC(skb_vlan_push, struct __sk_buff *skb, uint16_t proto,
-		    uint16_t vlan_tci);
-static int BPF_FUNC(skb_vlan_pop, struct __sk_buff *skb);
-
-/* Packet tunnel encap/decap */
-static int BPF_FUNC(skb_get_tunnel_key, struct __sk_buff *skb,
-		    struct bpf_tunnel_key *to, uint32_t size, uint32_t flags);
-static int BPF_FUNC(skb_set_tunnel_key, struct __sk_buff *skb,
-		    const struct bpf_tunnel_key *from, uint32_t size,
-		    uint32_t flags);
-
-static int BPF_FUNC(skb_get_tunnel_opt, struct __sk_buff *skb,
-		    void *to, uint32_t size);
-static int BPF_FUNC(skb_set_tunnel_opt, struct __sk_buff *skb,
-		    const void *from, uint32_t size);
-
-/** LLVM built-ins, mem*() routines work for constant size */
-
-#ifndef lock_xadd
-# define lock_xadd(ptr, val)	((void) __sync_fetch_and_add(ptr, val))
-#endif
-
-#ifndef memset
-# define memset(s, c, n)	__builtin_memset((s), (c), (n))
-#endif
-
-#ifndef memcpy
-# define memcpy(d, s, n)	__builtin_memcpy((d), (s), (n))
-#endif
-
-#ifndef memmove
-# define memmove(d, s, n)	__builtin_memmove((d), (s), (n))
-#endif
-
-/* FIXME: __builtin_memcmp() is not yet fully usable unless llvm bug
- * https://llvm.org/bugs/show_bug.cgi?id=26218 gets resolved. Also
- * this one would generate a reloc entry (non-map), otherwise.
- */
-#if 0
-#ifndef memcmp
-# define memcmp(a, b, n)	__builtin_memcmp((a), (b), (n))
-#endif
-#endif
-
-unsigned long long load_byte(void *skb, unsigned long long off)
-	asm ("llvm.bpf.load.byte");
-
-unsigned long long load_half(void *skb, unsigned long long off)
-	asm ("llvm.bpf.load.half");
-
-unsigned long long load_word(void *skb, unsigned long long off)
-	asm ("llvm.bpf.load.word");
-
-#endif /* __BPF_API__ */
diff --git a/drivers/net/tap/bpf/bpf_elf.h b/drivers/net/tap/bpf/bpf_elf.h
deleted file mode 100644
index ea8a11c95c..0000000000
--- a/drivers/net/tap/bpf/bpf_elf.h
+++ /dev/null
@@ -1,53 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 or BSD-3-Clause */
-#ifndef __BPF_ELF__
-#define __BPF_ELF__
-
-#include <asm/types.h>
-
-/* Note:
- *
- * Below ELF section names and bpf_elf_map structure definition
- * are not (!) kernel ABI. It's rather a "contract" between the
- * application and the BPF loader in tc. For compatibility, the
- * section names should stay as-is. Introduction of aliases, if
- * needed, are a possibility, though.
- */
-
-/* ELF section names, etc */
-#define ELF_SECTION_LICENSE	"license"
-#define ELF_SECTION_MAPS	"maps"
-#define ELF_SECTION_PROG	"prog"
-#define ELF_SECTION_CLASSIFIER	"classifier"
-#define ELF_SECTION_ACTION	"action"
-
-#define ELF_MAX_MAPS		64
-#define ELF_MAX_LICENSE_LEN	128
-
-/* Object pinning settings */
-#define PIN_NONE		0
-#define PIN_OBJECT_NS		1
-#define PIN_GLOBAL_NS		2
-
-/* ELF map definition */
-struct bpf_elf_map {
-	__u32 type;
-	__u32 size_key;
-	__u32 size_value;
-	__u32 max_elem;
-	__u32 flags;
-	__u32 id;
-	__u32 pinning;
-	__u32 inner_id;
-	__u32 inner_idx;
-};
-
-#define BPF_ANNOTATE_KV_PAIR(name, type_key, type_val)		\
-	struct ____btf_map_##name {				\
-		type_key key;					\
-		type_val value;					\
-	};							\
-	struct ____btf_map_##name				\
-	    __attribute__ ((section(".maps." #name), used))	\
-	    ____btf_map_##name = { }
-
-#endif /* __BPF_ELF__ */
diff --git a/drivers/net/tap/bpf/bpf_extract.py b/drivers/net/tap/bpf/bpf_extract.py
deleted file mode 100644
index 73c4dafe4e..0000000000
--- a/drivers/net/tap/bpf/bpf_extract.py
+++ /dev/null
@@ -1,85 +0,0 @@
-#!/usr/bin/env python3
-# SPDX-License-Identifier: BSD-3-Clause
-# Copyright (c) 2023 Stephen Hemminger <stephen@networkplumber.org>
-
-import argparse
-import sys
-import struct
-from tempfile import TemporaryFile
-from elftools.elf.elffile import ELFFile
-
-
-def load_sections(elffile):
-    """Get sections of interest from ELF"""
-    result = []
-    parts = [("cls_q", "cls_q_insns"), ("l3_l4", "l3_l4_hash_insns")]
-    for name, tag in parts:
-        section = elffile.get_section_by_name(name)
-        if section:
-            insns = struct.iter_unpack('<BBhL', section.data())
-            result.append([tag, insns])
-    return result
-
-
-def dump_section(name, insns, out):
-    """Dump the array of BPF instructions"""
-    print(f'\nstatic struct bpf_insn {name}[] = {{', file=out)
-    for bpf in insns:
-        code = bpf[0]
-        src = bpf[1] >> 4
-        dst = bpf[1] & 0xf
-        off = bpf[2]
-        imm = bpf[3]
-        print(f'\t{{{code:#04x}, {dst:4d}, {src:4d}, {off:8d}, {imm:#010x}}},',
-              file=out)
-    print('};', file=out)
-
-
-def parse_args():
-    """Parse command line arguments"""
-    parser = argparse.ArgumentParser()
-    parser.add_argument('-s',
-                        '--source',
-                        type=str,
-                        help="original source file")
-    parser.add_argument('-o', '--out', type=str, help="output C file path")
-    parser.add_argument("file",
-                        nargs='+',
-                        help="object file path or '-' for stdin")
-    return parser.parse_args()
-
-
-def open_input(path):
-    """Open the file or stdin"""
-    if path == "-":
-        temp = TemporaryFile()
-        temp.write(sys.stdin.buffer.read())
-        return temp
-    return open(path, 'rb')
-
-
-def write_header(out, source):
-    """Write file intro header"""
-    print("/* SPDX-License-Identifier: BSD-3-Clause", file=out)
-    if source:
-        print(f' * Auto-generated from {source}', file=out)
-    print(" * This not the original source file. Do NOT edit it.", file=out)
-    print(" */\n", file=out)
-
-
-def main():
-    '''program main function'''
-    args = parse_args()
-
-    with open(args.out, 'w',
-              encoding="utf-8") if args.out else sys.stdout as out:
-        write_header(out, args.source)
-        for path in args.file:
-            elffile = ELFFile(open_input(path))
-            sections = load_sections(elffile)
-            for name, insns in sections:
-                dump_section(name, insns, out)
-
-
-if __name__ == "__main__":
-    main()
diff --git a/drivers/net/tap/bpf/meson.build b/drivers/net/tap/bpf/meson.build
new file mode 100644
index 0000000000..f2c03a19fd
--- /dev/null
+++ b/drivers/net/tap/bpf/meson.build
@@ -0,0 +1,81 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright 2024 Stephen Hemminger <stephen@networkplumber.org>
+
+enable_tap_rss = false
+
+libbpf = dependency('libbpf', required: false, method: 'pkg-config')
+if not libbpf.found()
+    message('net/tap: no RSS support missing libbpf')
+    subdir_done()
+endif
+
+# Debian installs this in /usr/sbin which is not in $PATH
+bpftool = find_program('bpftool', '/usr/sbin/bpftool', required: false, version: '>= 5.6.0')
+if not bpftool.found()
+    message('net/tap: no RSS support missing bpftool')
+    subdir_done()
+endif
+
+clang_supports_bpf = false
+clang = find_program('clang', required: false)
+if clang.found()
+    clang_supports_bpf = run_command(clang, '-target', 'bpf', '--print-supported-cpus',
+                                     check: false).returncode() == 0
+endif
+
+if not clang_supports_bpf
+    message('net/tap: no RSS support missing clang BPF')
+    subdir_done()
+endif
+
+enable_tap_rss = true
+
+libbpf_include_dir = libbpf.get_variable(pkgconfig : 'includedir')
+
+# The include files <linux/bpf.h> and others include <asm/types.h>
+# but <asm/types.h> is not defined for multi-lib environment target.
+# Workaround by using include directory from the host build environment.
+machine_name = run_command('uname', '-m').stdout().strip()
+march_include_dir = '/usr/include/' + machine_name + '-linux-gnu'
+
+clang_flags = [
+    '-O2',
+    '-Wall',
+    '-Wextra',
+    '-target',
+    'bpf',
+    '-g',
+    '-c',
+]
+
+bpf_o_cmd = [
+    clang,
+    clang_flags,
+    '-idirafter',
+    libbpf_include_dir,
+    '-idirafter',
+    march_include_dir,
+    '@INPUT@',
+    '-o',
+    '@OUTPUT@'
+]
+
+skel_h_cmd = [
+    bpftool,
+    'gen',
+    'skeleton',
+    '@INPUT@'
+]
+
+tap_rss_o = custom_target(
+    'tap_rss.bpf.o',
+    input: 'tap_rss.c',
+    output: 'tap_rss.o',
+    command: bpf_o_cmd)
+
+tap_rss_skel_h = custom_target(
+    'tap_rss.skel.h',
+    input: tap_rss_o,
+    output: 'tap_rss.skel.h',
+    command: skel_h_cmd,
+    capture: true)
diff --git a/drivers/net/tap/bpf/tap_bpf_program.c b/drivers/net/tap/bpf/tap_bpf_program.c
deleted file mode 100644
index f05aed021c..0000000000
--- a/drivers/net/tap/bpf/tap_bpf_program.c
+++ /dev/null
@@ -1,255 +0,0 @@
-/* SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0
- * Copyright 2017 Mellanox Technologies, Ltd
- */
-
-#include <stdint.h>
-#include <stdbool.h>
-#include <sys/types.h>
-#include <sys/socket.h>
-#include <asm/types.h>
-#include <linux/in.h>
-#include <linux/if.h>
-#include <linux/if_ether.h>
-#include <linux/ip.h>
-#include <linux/ipv6.h>
-#include <linux/if_tunnel.h>
-#include <linux/filter.h>
-
-#include "bpf_api.h"
-#include "bpf_elf.h"
-#include "../tap_rss.h"
-
-/** Create IPv4 address */
-#define IPv4(a, b, c, d) ((__u32)(((a) & 0xff) << 24) | \
-		(((b) & 0xff) << 16) | \
-		(((c) & 0xff) << 8)  | \
-		((d) & 0xff))
-
-#define PORT(a, b) ((__u16)(((a) & 0xff) << 8) | \
-		((b) & 0xff))
-
-/*
- * The queue number is offset by a unique QUEUE_OFFSET, to distinguish
- * packets that have gone through this rule (skb->cb[1] != 0) from others.
- */
-#define QUEUE_OFFSET		0x7cafe800
-#define PIN_GLOBAL_NS		2
-
-#define KEY_IDX			0
-#define BPF_MAP_ID_KEY	1
-
-struct vlan_hdr {
-	__be16 proto;
-	__be16 tci;
-};
-
-struct bpf_elf_map __attribute__((section("maps"), used))
-map_keys = {
-	.type           =       BPF_MAP_TYPE_HASH,
-	.id             =       BPF_MAP_ID_KEY,
-	.size_key       =       sizeof(__u32),
-	.size_value     =       sizeof(struct rss_key),
-	.max_elem       =       256,
-	.pinning        =       PIN_GLOBAL_NS,
-};
-
-__section("cls_q") int
-match_q(struct __sk_buff *skb)
-{
-	__u32 queue = skb->cb[1];
-	/* queue is set by tap_flow_bpf_cls_q() before load */
-	volatile __u32 q = 0xdeadbeef;
-	__u32 match_queue = QUEUE_OFFSET + q;
-
-	/* printt("match_q$i() queue = %d\n", queue); */
-
-	if (queue != match_queue)
-		return TC_ACT_OK;
-
-	/* queue match */
-	skb->cb[1] = 0;
-	return TC_ACT_UNSPEC;
-}
-
-
-struct ipv4_l3_l4_tuple {
-	__u32    src_addr;
-	__u32    dst_addr;
-	__u16    dport;
-	__u16    sport;
-} __attribute__((packed));
-
-struct ipv6_l3_l4_tuple {
-	__u8        src_addr[16];
-	__u8        dst_addr[16];
-	__u16       dport;
-	__u16       sport;
-} __attribute__((packed));
-
-static const __u8 def_rss_key[TAP_RSS_HASH_KEY_SIZE] = {
-	0xd1, 0x81, 0xc6, 0x2c,
-	0xf7, 0xf4, 0xdb, 0x5b,
-	0x19, 0x83, 0xa2, 0xfc,
-	0x94, 0x3e, 0x1a, 0xdb,
-	0xd9, 0x38, 0x9e, 0x6b,
-	0xd1, 0x03, 0x9c, 0x2c,
-	0xa7, 0x44, 0x99, 0xad,
-	0x59, 0x3d, 0x56, 0xd9,
-	0xf3, 0x25, 0x3c, 0x06,
-	0x2a, 0xdc, 0x1f, 0xfc,
-};
-
-static __u32  __attribute__((always_inline))
-rte_softrss_be(const __u32 *input_tuple, const uint8_t *rss_key,
-		__u8 input_len)
-{
-	__u32 i, j, hash = 0;
-#pragma unroll
-	for (j = 0; j < input_len; j++) {
-#pragma unroll
-		for (i = 0; i < 32; i++) {
-			if (input_tuple[j] & (1U << (31 - i))) {
-				hash ^= ((const __u32 *)def_rss_key)[j] << i |
-				(__u32)((uint64_t)
-				(((const __u32 *)def_rss_key)[j + 1])
-					>> (32 - i));
-			}
-		}
-	}
-	return hash;
-}
-
-static int __attribute__((always_inline))
-rss_l3_l4(struct __sk_buff *skb)
-{
-	void *data_end = (void *)(long)skb->data_end;
-	void *data = (void *)(long)skb->data;
-	__u16 proto = (__u16)skb->protocol;
-	__u32 key_idx = 0xdeadbeef;
-	__u32 hash;
-	struct rss_key *rsskey;
-	__u64 off = ETH_HLEN;
-	int j;
-	__u8 *key = 0;
-	__u32 len;
-	__u32 queue = 0;
-	bool mf = 0;
-	__u16 frag_off = 0;
-
-	rsskey = map_lookup_elem(&map_keys, &key_idx);
-	if (!rsskey) {
-		printt("hash(): rss key is not configured\n");
-		return TC_ACT_OK;
-	}
-	key = (__u8 *)rsskey->key;
-
-	/* Get correct proto for 802.1ad */
-	if (skb->vlan_present && skb->vlan_proto == htons(ETH_P_8021AD)) {
-		if (data + ETH_ALEN * 2 + sizeof(struct vlan_hdr) +
-		    sizeof(proto) > data_end)
-			return TC_ACT_OK;
-		proto = *(__u16 *)(data + ETH_ALEN * 2 +
-				   sizeof(struct vlan_hdr));
-		off += sizeof(struct vlan_hdr);
-	}
-
-	if (proto == htons(ETH_P_IP)) {
-		if (data + off + sizeof(struct iphdr) + sizeof(__u32)
-			> data_end)
-			return TC_ACT_OK;
-
-		__u8 *src_dst_addr = data + off + offsetof(struct iphdr, saddr);
-		__u8 *frag_off_addr = data + off + offsetof(struct iphdr, frag_off);
-		__u8 *prot_addr = data + off + offsetof(struct iphdr, protocol);
-		__u8 *src_dst_port = data + off + sizeof(struct iphdr);
-		struct ipv4_l3_l4_tuple v4_tuple = {
-			.src_addr = IPv4(*(src_dst_addr + 0),
-					*(src_dst_addr + 1),
-					*(src_dst_addr + 2),
-					*(src_dst_addr + 3)),
-			.dst_addr = IPv4(*(src_dst_addr + 4),
-					*(src_dst_addr + 5),
-					*(src_dst_addr + 6),
-					*(src_dst_addr + 7)),
-			.sport = 0,
-			.dport = 0,
-		};
-		/** Fetch the L4-payer port numbers only in-case of TCP/UDP
-		 ** and also if the packet is not fragmented. Since fragmented
-		 ** chunks do not have L4 TCP/UDP header.
-		 **/
-		if (*prot_addr == IPPROTO_UDP || *prot_addr == IPPROTO_TCP) {
-			frag_off = PORT(*(frag_off_addr + 0),
-					*(frag_off_addr + 1));
-			mf = frag_off & 0x2000;
-			frag_off = frag_off & 0x1fff;
-			if (mf == 0 && frag_off == 0) {
-				v4_tuple.sport = PORT(*(src_dst_port + 0),
-						*(src_dst_port + 1));
-				v4_tuple.dport = PORT(*(src_dst_port + 2),
-						*(src_dst_port + 3));
-			}
-		}
-		__u8 input_len = sizeof(v4_tuple) / sizeof(__u32);
-		if (rsskey->hash_fields & (1 << HASH_FIELD_IPV4_L3))
-			input_len--;
-		hash = rte_softrss_be((__u32 *)&v4_tuple, key, 3);
-	} else if (proto == htons(ETH_P_IPV6)) {
-		if (data + off + sizeof(struct ipv6hdr) +
-					sizeof(__u32) > data_end)
-			return TC_ACT_OK;
-		__u8 *src_dst_addr = data + off +
-					offsetof(struct ipv6hdr, saddr);
-		__u8 *src_dst_port = data + off +
-					sizeof(struct ipv6hdr);
-		__u8 *next_hdr = data + off +
-					offsetof(struct ipv6hdr, nexthdr);
-
-		struct ipv6_l3_l4_tuple v6_tuple;
-		for (j = 0; j < 4; j++)
-			*((uint32_t *)&v6_tuple.src_addr + j) =
-				__builtin_bswap32(*((uint32_t *)
-						src_dst_addr + j));
-		for (j = 0; j < 4; j++)
-			*((uint32_t *)&v6_tuple.dst_addr + j) =
-				__builtin_bswap32(*((uint32_t *)
-						src_dst_addr + 4 + j));
-
-		/** Fetch the L4 header port-numbers only if next-header
-		 * is TCP/UDP **/
-		if (*next_hdr == IPPROTO_UDP || *next_hdr == IPPROTO_TCP) {
-			v6_tuple.sport = PORT(*(src_dst_port + 0),
-				      *(src_dst_port + 1));
-			v6_tuple.dport = PORT(*(src_dst_port + 2),
-				      *(src_dst_port + 3));
-		} else {
-			v6_tuple.sport = 0;
-			v6_tuple.dport = 0;
-		}
-
-		__u8 input_len = sizeof(v6_tuple) / sizeof(__u32);
-		if (rsskey->hash_fields & (1 << HASH_FIELD_IPV6_L3))
-			input_len--;
-		hash = rte_softrss_be((__u32 *)&v6_tuple, key, 9);
-	} else {
-		return TC_ACT_PIPE;
-	}
-
-	queue = rsskey->queues[(hash % rsskey->nb_queues) &
-				       (TAP_MAX_QUEUES - 1)];
-	skb->cb[1] = QUEUE_OFFSET + queue;
-	/* printt(">>>>> rss_l3_l4 hash=0x%x queue=%u\n", hash, queue); */
-
-	return TC_ACT_RECLASSIFY;
-}
-
-#define RSS(L)						\
-	__section(#L) int				\
-		L ## _hash(struct __sk_buff *skb)	\
-	{						\
-		return rss_ ## L (skb);			\
-	}
-
-RSS(l3_l4)
-
-BPF_LICENSE("Dual BSD/GPL");
diff --git a/drivers/net/tap/bpf/tap_rss.c b/drivers/net/tap/bpf/tap_rss.c
new file mode 100644
index 0000000000..888b3bdc24
--- /dev/null
+++ b/drivers/net/tap/bpf/tap_rss.c
@@ -0,0 +1,264 @@
+/* SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0
+ * Copyright 2017 Mellanox Technologies, Ltd
+ */
+
+#include <linux/in.h>
+#include <linux/if_ether.h>
+#include <linux/ip.h>
+#include <linux/ipv6.h>
+#include <linux/pkt_cls.h>
+#include <linux/bpf.h>
+
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_endian.h>
+
+#include "../tap_rss.h"
+
+/*
+ * This map provides configuration information about flows which need BPF RSS.
+ *
+ * The hash is indexed by the skb mark.
+ */
+struct {
+	__uint(type, BPF_MAP_TYPE_HASH);
+	__uint(key_size, sizeof(__u32));
+	__uint(value_size, sizeof(struct rss_key));
+	__uint(max_entries, TAP_RSS_MAX);
+} rss_map SEC(".maps");
+
+#define IP_MF		0x2000		/** IP header Flags **/
+#define IP_OFFSET	0x1FFF		/** IP header fragment offset **/
+
+/*
+ * Compute Toeplitz hash over the input tuple.
+ * This is same as rte_softrss_be in lib/hash
+ * but loop needs to be setup to match BPF restrictions.
+ */
+static __u32 __attribute__((always_inline))
+softrss_be(const __u32 *input_tuple, __u32 input_len, const __u32 *key)
+{
+	__u32 i, j, hash = 0;
+
+#pragma unroll
+	for (j = 0; j < input_len; j++) {
+#pragma unroll
+		for (i = 0; i < 32; i++) {
+			if (input_tuple[j] & (1U << (31 - i)))
+				hash ^= key[j] << i | key[j + 1] >> (32 - i);
+		}
+	}
+	return hash;
+}
+
+/*
+ * Compute RSS hash for IPv4 packet.
+ * Returns 0 if RSS is not specified.
+ */
+static __u32 __attribute__((always_inline))
+parse_ipv4(const struct __sk_buff *skb, __u32 hash_type, const __u32 *key)
+{
+	struct iphdr iph;
+	__u32 off = 0;
+
+	if (bpf_skb_load_bytes_relative(skb, off, &iph, sizeof(iph), BPF_HDR_START_NET))
+		return 0;	/* no IP header present */
+
+	struct {
+		__u32    src_addr;
+		__u32    dst_addr;
+		__u16    dport;
+		__u16    sport;
+	} v4_tuple = {
+		.src_addr = bpf_ntohl(iph.saddr),
+		.dst_addr = bpf_ntohl(iph.daddr),
+	};
+
+	/* If only calculating L3 hash, do it now */
+	if (hash_type & (1 << HASH_FIELD_IPV4_L3))
+		return softrss_be((__u32 *)&v4_tuple, sizeof(v4_tuple) / sizeof(__u32) - 1, key);
+
+	/* If packet is fragmented then no L4 hash is possible */
+	if ((iph.frag_off & bpf_htons(IP_MF | IP_OFFSET)) != 0)
+		return 0;
+
+	/* Do RSS on UDP or TCP protocols */
+	if (iph.protocol == IPPROTO_UDP || iph.protocol == IPPROTO_TCP) {
+		__u16 src_dst_port[2];
+
+		off += iph.ihl * 4;
+		if (bpf_skb_load_bytes_relative(skb, off, &src_dst_port, sizeof(src_dst_port),
+						BPF_HDR_START_NET))
+			return 0; /* TCP or UDP header missing */
+
+		v4_tuple.sport = bpf_ntohs(src_dst_port[0]);
+		v4_tuple.dport = bpf_ntohs(src_dst_port[1]);
+		return softrss_be((__u32 *)&v4_tuple, sizeof(v4_tuple) / sizeof(__u32), key);
+	}
+
+	/* Other protocol */
+	return 0;
+}
+
+/*
+ * Parse IPv6 extension headers, update offset and return next proto.
+ * returns next proto on success, -1 on malformed header
+ */
+static int __attribute__((always_inline))
+skip_ip6_ext(__u16 proto, const struct __sk_buff *skb, __u32 *off, int *frag)
+{
+	struct ext_hdr {
+		__u8 next_hdr;
+		__u8 len;
+	} xh;
+	unsigned int i;
+
+	*frag = 0;
+
+#define MAX_EXT_HDRS 5
+#pragma unroll
+	for (i = 0; i < MAX_EXT_HDRS; i++) {
+		switch (proto) {
+		case IPPROTO_HOPOPTS:
+		case IPPROTO_ROUTING:
+		case IPPROTO_DSTOPTS:
+			if (bpf_skb_load_bytes_relative(skb, *off, &xh, sizeof(xh),
+							BPF_HDR_START_NET))
+				return -1;
+
+			*off += (xh.len + 1) * 8;
+			proto = xh.next_hdr;
+			break;
+		case IPPROTO_FRAGMENT:
+			if (bpf_skb_load_bytes_relative(skb, *off, &xh, sizeof(xh),
+							BPF_HDR_START_NET))
+				return -1;
+
+			*off += 8;
+			proto = xh.next_hdr;
+			*frag = 1;
+			return proto; /* this is always the last ext hdr */
+		default:
+			return proto;
+		}
+	}
+
+	/* too many extension headers give up */
+	return -1;
+}
+
+/*
+ * Compute RSS hash for IPv6 packet.
+ * Returns 0 if RSS is not specified.
+ */
+static __u32 __attribute__((always_inline))
+parse_ipv6(const struct __sk_buff *skb, __u32 hash_type, const __u32 *key)
+{
+	struct {
+		__u32       src_addr[4];
+		__u32       dst_addr[4];
+		__u16       dport;
+		__u16       sport;
+	} v6_tuple = { };
+	struct ipv6hdr ip6h;
+	__u32 off = 0, j;
+	int proto, frag;
+
+	if (bpf_skb_load_bytes_relative(skb, off, &ip6h, sizeof(ip6h), BPF_HDR_START_NET))
+		return 0;	/* missing IPv6 header */
+
+#pragma unroll
+	for (j = 0; j < 4; j++) {
+		v6_tuple.src_addr[j] = bpf_ntohl(ip6h.saddr.in6_u.u6_addr32[j]);
+		v6_tuple.dst_addr[j] = bpf_ntohl(ip6h.daddr.in6_u.u6_addr32[j]);
+	}
+
+	/* If only doing L3 hash, do it now */
+	if (hash_type & (1 << HASH_FIELD_IPV6_L3))
+		return softrss_be((__u32 *)&v6_tuple, sizeof(v6_tuple) / sizeof(__u32) - 1, key);
+
+	/* Skip extension headers if present */
+	off += sizeof(ip6h);
+	proto = skip_ip6_ext(ip6h.nexthdr, skb, &off, &frag);
+	if (proto < 0)
+		return 0;
+
+	/* If packet is a fragment then no L4 hash is possible */
+	if (frag)
+		return 0;
+
+	/* Do RSS on UDP or TCP */
+	if (proto == IPPROTO_UDP || proto == IPPROTO_TCP) {
+		__u16 src_dst_port[2];
+
+		if (bpf_skb_load_bytes_relative(skb, off, &src_dst_port, sizeof(src_dst_port),
+						BPF_HDR_START_NET))
+			return 0;
+
+		v6_tuple.sport = bpf_ntohs(src_dst_port[0]);
+		v6_tuple.dport = bpf_ntohs(src_dst_port[1]);
+
+		return softrss_be((__u32 *)&v6_tuple, sizeof(v6_tuple) / sizeof(__u32), key);
+	}
+
+	return 0;
+}
+
+/*
+ * Compute RSS hash for packets.
+ * Returns 0 if no hash is possible.
+ */
+static __u32 __attribute__((always_inline))
+calculate_rss_hash(const struct __sk_buff *skb, const struct rss_key *rsskey)
+{
+	const __u32 *key = (const __u32 *)rsskey->key;
+
+	if (skb->protocol == bpf_htons(ETH_P_IP))
+		return parse_ipv4(skb, rsskey->hash_fields, key);
+	else if (skb->protocol == bpf_htons(ETH_P_IPV6))
+		return parse_ipv6(skb, rsskey->hash_fields, key);
+	else
+		return 0;
+}
+
+/*
+ * Scale value into the range [0, n).
+ * Assumes val is large (i.e. hash covers whole u32 range).
+ */
+static __u32  __attribute__((always_inline))
+reciprocal_scale(__u32 val, __u32 n)
+{
+	return (__u32)(((__u64)val * n) >> 32);
+}
+
+/*
+ * When this BPF program is run by tc from the filter classifier,
+ * it is able to read skb metadata and packet data.
+ *
+ * For packets where RSS is not possible, just return TC_ACT_OK.
+ * When RSS is desired, set skb->queue_mapping and return TC_ACT_PIPE
+ * to continue processing.
+ *
+ * This should be BPF_PROG_TYPE_SCHED_ACT so section needs to be "action"
+ */
+SEC("action") int
+rss_flow_action(struct __sk_buff *skb)
+{
+	const struct rss_key *rsskey;
+	__u32 mark = skb->mark;
+	__u32 hash;
+
+	/* Lookup RSS configuration for that BPF class */
+	rsskey = bpf_map_lookup_elem(&rss_map, &mark);
+	if (rsskey == NULL)
+		return TC_ACT_OK;
+
+	hash = calculate_rss_hash(skb, rsskey);
+	if (!hash)
+		return TC_ACT_OK;
+
+	/* Fold hash to the number of queues configured */
+	skb->queue_mapping = reciprocal_scale(hash, rsskey->nb_queues);
+	return TC_ACT_PIPE;
+}
+
+char _license[] SEC("license") = "Dual BSD/GPL";
-- 
2.43.0
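
Reviewer note (not part of the patch): the last step of rss_flow_action()
maps the 32-bit hash onto a queue index with reciprocal_scale() instead of
a modulo. The following is a minimal userspace sketch of that arithmetic,
assuming nothing beyond the one-line function shown in the patch. The names
scale_to_queue() and scale_selftest() are hypothetical and exist only for
this illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Same arithmetic as reciprocal_scale() in the patch: multiply the
 * 32-bit value by n in 64 bits and keep the upper half, which lands
 * in [0, n) without a division.
 */
static uint32_t scale_to_queue(uint32_t val, uint32_t n)
{
	return (uint32_t)(((uint64_t)val * n) >> 32);
}

/* Hypothetical self-check covering the edges of the input range. */
static void scale_selftest(void)
{
	assert(scale_to_queue(0x00000000u, 4) == 0);
	assert(scale_to_queue(0x3fffffffu, 4) == 0); /* just below 1/4 */
	assert(scale_to_queue(0x40000000u, 4) == 1); /* exactly 1/4 */
	assert(scale_to_queue(0xffffffffu, 4) == 3); /* never reaches n */
	assert(scale_to_queue(0xdeadbeefu, 1) == 0); /* one queue only */
}
```

The result is taken from the high bits of the value, so the spread across
queues is only even when the hash covers the whole 32-bit range, as the
comment in the patch states.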



* [PATCH v7 5/8] net/tap: rewrite the RSS BPF program
  @ 2024-04-08 21:18  2%   ` Stephen Hemminger
From: Stephen Hemminger @ 2024-04-08 21:18 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

Rewrite the BPF program used to do queue based RSS.
Important changes:
	- uses the newer BTF-based BPF map format
	- accepts key as parameter rather than constant default
	- can do L3 or L4 hashing
	- supports IPv4 options
	- supports IPv6 extension headers
	- restructured for readability

The usage of BPF is different as well:
	- the incoming configuration is looked up based on
	  class parameters rather than patching the BPF.
	- the resulting queue is placed in skb rather
	  than requiring a second pass through classifier step.

Note: This version only works with a later patch that enables it on
the DPDK driver side. It is submitted as an incremental patch
to allow for easier review. Bisection still works because
the old instructions are still present for now.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
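Reviewer note (not part of the commit message): the "key as parameter" and
"L3 or L4 hashing" points can be tried in userspace. The sketch below
assumes the key is passed as 32-bit words built big-endian from the key
bytes; how the driver prepares the key is an assumption here. The words in
example_key are taken from the def_rss_key bytes in the removed
tap_bpf_program.c. toeplitz_be() and toeplitz_selftest() are hypothetical
names, and the 64-bit window is a choice of this sketch, not the patch's
exact expression.

```c
#include <assert.h>
#include <stdint.h>

/* def_rss_key from the removed tap_bpf_program.c, as big-endian words. */
static const uint32_t example_key[10] = {
	0xd181c62c, 0xf7f4db5b, 0x1983a2fc, 0x943e1adb, 0xd9389e6b,
	0xd1039c2c, 0xa74499ad, 0x593d56d9, 0xf3253c06, 0x2adc1ffc,
};

/*
 * Toeplitz hash over len 32-bit words. For each set input bit, XOR in
 * the 32-bit slice of the key that starts at that bit position.
 * The key must hold at least len + 1 words.
 */
static uint32_t toeplitz_be(const uint32_t *tuple, uint32_t len,
			    const uint32_t *key)
{
	uint32_t hash = 0;

	for (uint32_t j = 0; j < len; j++) {
		/* Two adjacent key words, so shift counts stay in 1..32. */
		uint64_t window = ((uint64_t)key[j] << 32) | key[j + 1];

		for (uint32_t i = 0; i < 32; i++) {
			if (tuple[j] & (1U << (31 - i)))
				hash ^= (uint32_t)(window >> (32 - i));
		}
	}
	return hash;
}

/* Hypothetical self-check of properties that follow from the definition. */
static void toeplitz_selftest(void)
{
	/* src 192.168.0.1, dst 192.168.0.2, sport 1234, dport 80 */
	uint32_t l4[3] = { 0xc0a80001, 0xc0a80002, (1234u << 16) | 80 };
	uint32_t l3[3] = { 0xc0a80001, 0xc0a80002, 0 };
	uint32_t src[3] = { 0xc0a80001, 0, 0 };
	uint32_t dst[3] = { 0, 0xc0a80002, 0 };
	uint32_t ports[3] = { 0, 0, (1234u << 16) | 80 };
	uint32_t low = 1;

	/* Lowest bit of word 0 selects the key slice starting at bit 31. */
	assert(toeplitz_be(&low, 1, example_key) ==
	       ((example_key[0] << 31) | (example_key[1] >> 1)));
	/* Zero ports add nothing, so 3 words hash the same as 2 words. */
	assert(toeplitz_be(l3, 3, example_key) ==
	       toeplitz_be(l3, 2, example_key));
	/* The hash is linear over XOR of the inputs. */
	assert(toeplitz_be(l4, 3, example_key) ==
	       (toeplitz_be(src, 3, example_key) ^
		toeplitz_be(dst, 3, example_key) ^
		toeplitz_be(ports, 3, example_key)));
}
```

Passing 2 words gives the L3-only hash and 3 words gives the L4 hash for
IPv4, which mirrors the "sizeof(tuple) / sizeof(__u32) - 1" convention used
by parse_ipv4() and parse_ipv6() in this patch.
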
 .gitignore                            |   3 -
 drivers/net/tap/bpf/Makefile          |  19 --
 drivers/net/tap/bpf/README            |  38 ++++
 drivers/net/tap/bpf/bpf_api.h         | 276 --------------------------
 drivers/net/tap/bpf/bpf_elf.h         |  53 -----
 drivers/net/tap/bpf/bpf_extract.py    |  85 --------
 drivers/net/tap/bpf/meson.build       |  81 ++++++++
 drivers/net/tap/bpf/tap_bpf_program.c | 255 ------------------------
 drivers/net/tap/bpf/tap_rss.c         | 264 ++++++++++++++++++++++++
 9 files changed, 383 insertions(+), 691 deletions(-)
 delete mode 100644 drivers/net/tap/bpf/Makefile
 create mode 100644 drivers/net/tap/bpf/README
 delete mode 100644 drivers/net/tap/bpf/bpf_api.h
 delete mode 100644 drivers/net/tap/bpf/bpf_elf.h
 delete mode 100644 drivers/net/tap/bpf/bpf_extract.py
 create mode 100644 drivers/net/tap/bpf/meson.build
 delete mode 100644 drivers/net/tap/bpf/tap_bpf_program.c
 create mode 100644 drivers/net/tap/bpf/tap_rss.c

diff --git a/.gitignore b/.gitignore
index 3f444dcace..01a47a7606 100644
--- a/.gitignore
+++ b/.gitignore
@@ -36,9 +36,6 @@ TAGS
 # ignore python bytecode files
 *.pyc
 
-# ignore BPF programs
-drivers/net/tap/bpf/tap_bpf_program.o
-
 # DTS results
 dts/output
 
diff --git a/drivers/net/tap/bpf/Makefile b/drivers/net/tap/bpf/Makefile
deleted file mode 100644
index 9efeeb1bc7..0000000000
--- a/drivers/net/tap/bpf/Makefile
+++ /dev/null
@@ -1,19 +0,0 @@
-# SPDX-License-Identifier: BSD-3-Clause
-# This file is not built as part of normal DPDK build.
-# It is used to generate the eBPF code for TAP RSS.
-
-CLANG=clang
-CLANG_OPTS=-O2
-TARGET=../tap_bpf_insns.h
-
-all: $(TARGET)
-
-clean:
-	rm tap_bpf_program.o $(TARGET)
-
-tap_bpf_program.o: tap_bpf_program.c
-	$(CLANG) $(CLANG_OPTS) -emit-llvm -c $< -o - | \
-	llc -march=bpf -filetype=obj -o $@
-
-$(TARGET): tap_bpf_program.o
-	python3 bpf_extract.py -stap_bpf_program.c -o $@ $<
diff --git a/drivers/net/tap/bpf/README b/drivers/net/tap/bpf/README
new file mode 100644
index 0000000000..1d421ff42c
--- /dev/null
+++ b/drivers/net/tap/bpf/README
@@ -0,0 +1,38 @@
+This is the BPF program used to implement the RSS across queues flow action.
+The program is loaded when the first RSS flow rule is created and is never unloaded.
+
+Each flow rule creates a unique key (handle) and this is used as the key
+for finding the RSS information for that flow rule.
+
+This version is built with the BPF Compile Once - Run Everywhere (CO-RE)
+framework and uses libbpf and bpftool.
+
+Limitations
+-----------
+- requires libbpf to run
+- rebuilding the BPF requires Clang and bpftool.
+  Some older versions of Ubuntu do not have working bpftool package.
+  Need a version of Clang that can compile to BPF.
+- only standard Toeplitz hash with standard 40 byte key is supported
+- the number of flow rules using RSS is limited to 32
+
+Building
+--------
+During the DPDK build process the meson build file checks that
+libbpf, bpftool, and clang are available. If everything is
+there then BPF RSS is enabled.
+
+1. Use clang to compile tap_rss.c into the tap_rss.bpf.o file.
+
+2. Use bpftool to generate a skeleton header file tap_rss.skel.h from tap_rss.bpf.o.
+   This skeleton header is a large byte array which contains the
+   BPF binary and wrappers to load and use it.
+
+3. The tap flow code then compiles that BPF byte array into the PMD object.
+
+4. When needed the BPF array is loaded by libbpf.
+
+References
+----------
+BPF and XDP reference guide
+https://docs.cilium.io/en/latest/bpf/progtypes/
diff --git a/drivers/net/tap/bpf/bpf_api.h b/drivers/net/tap/bpf/bpf_api.h
deleted file mode 100644
index 4cd25fa593..0000000000
--- a/drivers/net/tap/bpf/bpf_api.h
+++ /dev/null
@@ -1,276 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 or BSD-3-Clause */
-
-#ifndef __BPF_API__
-#define __BPF_API__
-
-/* Note:
- *
- * This file can be included into eBPF kernel programs. It contains
- * a couple of useful helper functions, map/section ABI (bpf_elf.h),
- * misc macros and some eBPF specific LLVM built-ins.
- */
-
-#include <stdint.h>
-
-#include <linux/pkt_cls.h>
-#include <linux/bpf.h>
-#include <linux/filter.h>
-
-#include <asm/byteorder.h>
-
-#include "bpf_elf.h"
-
-/** libbpf pin type. */
-enum libbpf_pin_type {
-	LIBBPF_PIN_NONE,
-	/* PIN_BY_NAME: pin maps by name (in /sys/fs/bpf by default) */
-	LIBBPF_PIN_BY_NAME,
-};
-
-/** Type helper macros. */
-
-#define __uint(name, val) int (*name)[val]
-#define __type(name, val) typeof(val) *name
-#define __array(name, val) typeof(val) *name[]
-
-/** Misc macros. */
-
-#ifndef __stringify
-# define __stringify(X)		#X
-#endif
-
-#ifndef __maybe_unused
-# define __maybe_unused		__attribute__((__unused__))
-#endif
-
-#ifndef offsetof
-# define offsetof(TYPE, MEMBER)	__builtin_offsetof(TYPE, MEMBER)
-#endif
-
-#ifndef likely
-# define likely(X)		__builtin_expect(!!(X), 1)
-#endif
-
-#ifndef unlikely
-# define unlikely(X)		__builtin_expect(!!(X), 0)
-#endif
-
-#ifndef htons
-# define htons(X)		__constant_htons((X))
-#endif
-
-#ifndef ntohs
-# define ntohs(X)		__constant_ntohs((X))
-#endif
-
-#ifndef htonl
-# define htonl(X)		__constant_htonl((X))
-#endif
-
-#ifndef ntohl
-# define ntohl(X)		__constant_ntohl((X))
-#endif
-
-#ifndef __inline__
-# define __inline__		__attribute__((always_inline))
-#endif
-
-/** Section helper macros. */
-
-#ifndef __section
-# define __section(NAME)						\
-	__attribute__((section(NAME), used))
-#endif
-
-#ifndef __section_tail
-# define __section_tail(ID, KEY)					\
-	__section(__stringify(ID) "/" __stringify(KEY))
-#endif
-
-#ifndef __section_xdp_entry
-# define __section_xdp_entry						\
-	__section(ELF_SECTION_PROG)
-#endif
-
-#ifndef __section_cls_entry
-# define __section_cls_entry						\
-	__section(ELF_SECTION_CLASSIFIER)
-#endif
-
-#ifndef __section_act_entry
-# define __section_act_entry						\
-	__section(ELF_SECTION_ACTION)
-#endif
-
-#ifndef __section_lwt_entry
-# define __section_lwt_entry						\
-	__section(ELF_SECTION_PROG)
-#endif
-
-#ifndef __section_license
-# define __section_license						\
-	__section(ELF_SECTION_LICENSE)
-#endif
-
-#ifndef __section_maps
-# define __section_maps							\
-	__section(ELF_SECTION_MAPS)
-#endif
-
-/** Declaration helper macros. */
-
-#ifndef BPF_LICENSE
-# define BPF_LICENSE(NAME)						\
-	char ____license[] __section_license = NAME
-#endif
-
-/** Classifier helper */
-
-#ifndef BPF_H_DEFAULT
-# define BPF_H_DEFAULT	-1
-#endif
-
-/** BPF helper functions for tc. Individual flags are in linux/bpf.h */
-
-#ifndef __BPF_FUNC
-# define __BPF_FUNC(NAME, ...)						\
-	(* NAME)(__VA_ARGS__) __maybe_unused
-#endif
-
-#ifndef BPF_FUNC
-# define BPF_FUNC(NAME, ...)						\
-	__BPF_FUNC(NAME, __VA_ARGS__) = (void *) BPF_FUNC_##NAME
-#endif
-
-/* Map access/manipulation */
-static void *BPF_FUNC(map_lookup_elem, void *map, const void *key);
-static int BPF_FUNC(map_update_elem, void *map, const void *key,
-		    const void *value, uint32_t flags);
-static int BPF_FUNC(map_delete_elem, void *map, const void *key);
-
-/* Time access */
-static uint64_t BPF_FUNC(ktime_get_ns);
-
-/* Debugging */
-
-/* FIXME: __attribute__ ((format(printf, 1, 3))) not possible unless
- * llvm bug https://llvm.org/bugs/show_bug.cgi?id=26243 gets resolved.
- * It would require ____fmt to be made const, which generates a reloc
- * entry (non-map).
- */
-static void BPF_FUNC(trace_printk, const char *fmt, int fmt_size, ...);
-
-#ifndef printt
-# define printt(fmt, ...)						\
-	__extension__ ({						\
-		char ____fmt[] = fmt;					\
-		trace_printk(____fmt, sizeof(____fmt), ##__VA_ARGS__);	\
-	})
-#endif
-
-/* Random numbers */
-static uint32_t BPF_FUNC(get_prandom_u32);
-
-/* Tail calls */
-static void BPF_FUNC(tail_call, struct __sk_buff *skb, void *map,
-		     uint32_t index);
-
-/* System helpers */
-static uint32_t BPF_FUNC(get_smp_processor_id);
-static uint32_t BPF_FUNC(get_numa_node_id);
-
-/* Packet misc meta data */
-static uint32_t BPF_FUNC(get_cgroup_classid, struct __sk_buff *skb);
-static int BPF_FUNC(skb_under_cgroup, void *map, uint32_t index);
-
-static uint32_t BPF_FUNC(get_route_realm, struct __sk_buff *skb);
-static uint32_t BPF_FUNC(get_hash_recalc, struct __sk_buff *skb);
-static uint32_t BPF_FUNC(set_hash_invalid, struct __sk_buff *skb);
-
-/* Packet redirection */
-static int BPF_FUNC(redirect, int ifindex, uint32_t flags);
-static int BPF_FUNC(clone_redirect, struct __sk_buff *skb, int ifindex,
-		    uint32_t flags);
-
-/* Packet manipulation */
-static int BPF_FUNC(skb_load_bytes, struct __sk_buff *skb, uint32_t off,
-		    void *to, uint32_t len);
-static int BPF_FUNC(skb_store_bytes, struct __sk_buff *skb, uint32_t off,
-		    const void *from, uint32_t len, uint32_t flags);
-
-static int BPF_FUNC(l3_csum_replace, struct __sk_buff *skb, uint32_t off,
-		    uint32_t from, uint32_t to, uint32_t flags);
-static int BPF_FUNC(l4_csum_replace, struct __sk_buff *skb, uint32_t off,
-		    uint32_t from, uint32_t to, uint32_t flags);
-static int BPF_FUNC(csum_diff, const void *from, uint32_t from_size,
-		    const void *to, uint32_t to_size, uint32_t seed);
-static int BPF_FUNC(csum_update, struct __sk_buff *skb, uint32_t wsum);
-
-static int BPF_FUNC(skb_change_type, struct __sk_buff *skb, uint32_t type);
-static int BPF_FUNC(skb_change_proto, struct __sk_buff *skb, uint32_t proto,
-		    uint32_t flags);
-static int BPF_FUNC(skb_change_tail, struct __sk_buff *skb, uint32_t nlen,
-		    uint32_t flags);
-
-static int BPF_FUNC(skb_pull_data, struct __sk_buff *skb, uint32_t len);
-
-/* Event notification */
-static int __BPF_FUNC(skb_event_output, struct __sk_buff *skb, void *map,
-		      uint64_t index, const void *data, uint32_t size) =
-		      (void *) BPF_FUNC_perf_event_output;
-
-/* Packet vlan encap/decap */
-static int BPF_FUNC(skb_vlan_push, struct __sk_buff *skb, uint16_t proto,
-		    uint16_t vlan_tci);
-static int BPF_FUNC(skb_vlan_pop, struct __sk_buff *skb);
-
-/* Packet tunnel encap/decap */
-static int BPF_FUNC(skb_get_tunnel_key, struct __sk_buff *skb,
-		    struct bpf_tunnel_key *to, uint32_t size, uint32_t flags);
-static int BPF_FUNC(skb_set_tunnel_key, struct __sk_buff *skb,
-		    const struct bpf_tunnel_key *from, uint32_t size,
-		    uint32_t flags);
-
-static int BPF_FUNC(skb_get_tunnel_opt, struct __sk_buff *skb,
-		    void *to, uint32_t size);
-static int BPF_FUNC(skb_set_tunnel_opt, struct __sk_buff *skb,
-		    const void *from, uint32_t size);
-
-/** LLVM built-ins, mem*() routines work for constant size */
-
-#ifndef lock_xadd
-# define lock_xadd(ptr, val)	((void) __sync_fetch_and_add(ptr, val))
-#endif
-
-#ifndef memset
-# define memset(s, c, n)	__builtin_memset((s), (c), (n))
-#endif
-
-#ifndef memcpy
-# define memcpy(d, s, n)	__builtin_memcpy((d), (s), (n))
-#endif
-
-#ifndef memmove
-# define memmove(d, s, n)	__builtin_memmove((d), (s), (n))
-#endif
-
-/* FIXME: __builtin_memcmp() is not yet fully usable unless llvm bug
- * https://llvm.org/bugs/show_bug.cgi?id=26218 gets resolved. Also
- * this one would generate a reloc entry (non-map), otherwise.
- */
-#if 0
-#ifndef memcmp
-# define memcmp(a, b, n)	__builtin_memcmp((a), (b), (n))
-#endif
-#endif
-
-unsigned long long load_byte(void *skb, unsigned long long off)
-	asm ("llvm.bpf.load.byte");
-
-unsigned long long load_half(void *skb, unsigned long long off)
-	asm ("llvm.bpf.load.half");
-
-unsigned long long load_word(void *skb, unsigned long long off)
-	asm ("llvm.bpf.load.word");
-
-#endif /* __BPF_API__ */
diff --git a/drivers/net/tap/bpf/bpf_elf.h b/drivers/net/tap/bpf/bpf_elf.h
deleted file mode 100644
index ea8a11c95c..0000000000
--- a/drivers/net/tap/bpf/bpf_elf.h
+++ /dev/null
@@ -1,53 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 or BSD-3-Clause */
-#ifndef __BPF_ELF__
-#define __BPF_ELF__
-
-#include <asm/types.h>
-
-/* Note:
- *
- * Below ELF section names and bpf_elf_map structure definition
- * are not (!) kernel ABI. It's rather a "contract" between the
- * application and the BPF loader in tc. For compatibility, the
- * section names should stay as-is. Introduction of aliases, if
- * needed, are a possibility, though.
- */
-
-/* ELF section names, etc */
-#define ELF_SECTION_LICENSE	"license"
-#define ELF_SECTION_MAPS	"maps"
-#define ELF_SECTION_PROG	"prog"
-#define ELF_SECTION_CLASSIFIER	"classifier"
-#define ELF_SECTION_ACTION	"action"
-
-#define ELF_MAX_MAPS		64
-#define ELF_MAX_LICENSE_LEN	128
-
-/* Object pinning settings */
-#define PIN_NONE		0
-#define PIN_OBJECT_NS		1
-#define PIN_GLOBAL_NS		2
-
-/* ELF map definition */
-struct bpf_elf_map {
-	__u32 type;
-	__u32 size_key;
-	__u32 size_value;
-	__u32 max_elem;
-	__u32 flags;
-	__u32 id;
-	__u32 pinning;
-	__u32 inner_id;
-	__u32 inner_idx;
-};
-
-#define BPF_ANNOTATE_KV_PAIR(name, type_key, type_val)		\
-	struct ____btf_map_##name {				\
-		type_key key;					\
-		type_val value;					\
-	};							\
-	struct ____btf_map_##name				\
-	    __attribute__ ((section(".maps." #name), used))	\
-	    ____btf_map_##name = { }
-
-#endif /* __BPF_ELF__ */
diff --git a/drivers/net/tap/bpf/bpf_extract.py b/drivers/net/tap/bpf/bpf_extract.py
deleted file mode 100644
index 73c4dafe4e..0000000000
--- a/drivers/net/tap/bpf/bpf_extract.py
+++ /dev/null
@@ -1,85 +0,0 @@
-#!/usr/bin/env python3
-# SPDX-License-Identifier: BSD-3-Clause
-# Copyright (c) 2023 Stephen Hemminger <stephen@networkplumber.org>
-
-import argparse
-import sys
-import struct
-from tempfile import TemporaryFile
-from elftools.elf.elffile import ELFFile
-
-
-def load_sections(elffile):
-    """Get sections of interest from ELF"""
-    result = []
-    parts = [("cls_q", "cls_q_insns"), ("l3_l4", "l3_l4_hash_insns")]
-    for name, tag in parts:
-        section = elffile.get_section_by_name(name)
-        if section:
-            insns = struct.iter_unpack('<BBhL', section.data())
-            result.append([tag, insns])
-    return result
-
-
-def dump_section(name, insns, out):
-    """Dump the array of BPF instructions"""
-    print(f'\nstatic struct bpf_insn {name}[] = {{', file=out)
-    for bpf in insns:
-        code = bpf[0]
-        src = bpf[1] >> 4
-        dst = bpf[1] & 0xf
-        off = bpf[2]
-        imm = bpf[3]
-        print(f'\t{{{code:#04x}, {dst:4d}, {src:4d}, {off:8d}, {imm:#010x}}},',
-              file=out)
-    print('};', file=out)
-
-
-def parse_args():
-    """Parse command line arguments"""
-    parser = argparse.ArgumentParser()
-    parser.add_argument('-s',
-                        '--source',
-                        type=str,
-                        help="original source file")
-    parser.add_argument('-o', '--out', type=str, help="output C file path")
-    parser.add_argument("file",
-                        nargs='+',
-                        help="object file path or '-' for stdin")
-    return parser.parse_args()
-
-
-def open_input(path):
-    """Open the file or stdin"""
-    if path == "-":
-        temp = TemporaryFile()
-        temp.write(sys.stdin.buffer.read())
-        return temp
-    return open(path, 'rb')
-
-
-def write_header(out, source):
-    """Write file intro header"""
-    print("/* SPDX-License-Identifier: BSD-3-Clause", file=out)
-    if source:
-        print(f' * Auto-generated from {source}', file=out)
-    print(" * This not the original source file. Do NOT edit it.", file=out)
-    print(" */\n", file=out)
-
-
-def main():
-    '''program main function'''
-    args = parse_args()
-
-    with open(args.out, 'w',
-              encoding="utf-8") if args.out else sys.stdout as out:
-        write_header(out, args.source)
-        for path in args.file:
-            elffile = ELFFile(open_input(path))
-            sections = load_sections(elffile)
-            for name, insns in sections:
-                dump_section(name, insns, out)
-
-
-if __name__ == "__main__":
-    main()
diff --git a/drivers/net/tap/bpf/meson.build b/drivers/net/tap/bpf/meson.build
new file mode 100644
index 0000000000..f2c03a19fd
--- /dev/null
+++ b/drivers/net/tap/bpf/meson.build
@@ -0,0 +1,81 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright 2024 Stephen Hemminger <stephen@networkplumber.org>
+
+enable_tap_rss = false
+
+libbpf = dependency('libbpf', required: false, method: 'pkg-config')
+if not libbpf.found()
+    message('net/tap: no RSS support missing libbpf')
+    subdir_done()
+endif
+
+# Debian installs this in /usr/sbin which is not in $PATH
+bpftool = find_program('bpftool', '/usr/sbin/bpftool', required: false, version: '>= 5.6.0')
+if not bpftool.found()
+    message('net/tap: no RSS support missing bpftool')
+    subdir_done()
+endif
+
+clang_supports_bpf = false
+clang = find_program('clang', required: false)
+if clang.found()
+    clang_supports_bpf = run_command(clang, '-target', 'bpf', '--print-supported-cpus',
+                                     check: false).returncode() == 0
+endif
+
+if not clang_supports_bpf
+    message('net/tap: no RSS support missing clang BPF')
+    subdir_done()
+endif
+
+enable_tap_rss = true
+
+libbpf_include_dir = libbpf.get_variable(pkgconfig : 'includedir')
+
+# The include files <linux/bpf.h> and others include <asm/types.h>
+# but <asm/types.h> is not defined for multi-lib environment target.
+# Work around this by using the include directory from the host build environment.
+machine_name = run_command('uname', '-m').stdout().strip()
+march_include_dir = '/usr/include/' + machine_name + '-linux-gnu'
+
+clang_flags = [
+    '-O2',
+    '-Wall',
+    '-Wextra',
+    '-target',
+    'bpf',
+    '-g',
+    '-c',
+]
+
+bpf_o_cmd = [
+    clang,
+    clang_flags,
+    '-idirafter',
+    libbpf_include_dir,
+    '-idirafter',
+    march_include_dir,
+    '@INPUT@',
+    '-o',
+    '@OUTPUT@'
+]
+
+skel_h_cmd = [
+    bpftool,
+    'gen',
+    'skeleton',
+    '@INPUT@'
+]
+
+tap_rss_o = custom_target(
+    'tap_rss.bpf.o',
+    input: 'tap_rss.c',
+    output: 'tap_rss.o',
+    command: bpf_o_cmd)
+
+tap_rss_skel_h = custom_target(
+    'tap_rss.skel.h',
+    input: tap_rss_o,
+    output: 'tap_rss.skel.h',
+    command: skel_h_cmd,
+    capture: true)
diff --git a/drivers/net/tap/bpf/tap_bpf_program.c b/drivers/net/tap/bpf/tap_bpf_program.c
deleted file mode 100644
index f05aed021c..0000000000
--- a/drivers/net/tap/bpf/tap_bpf_program.c
+++ /dev/null
@@ -1,255 +0,0 @@
-/* SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0
- * Copyright 2017 Mellanox Technologies, Ltd
- */
-
-#include <stdint.h>
-#include <stdbool.h>
-#include <sys/types.h>
-#include <sys/socket.h>
-#include <asm/types.h>
-#include <linux/in.h>
-#include <linux/if.h>
-#include <linux/if_ether.h>
-#include <linux/ip.h>
-#include <linux/ipv6.h>
-#include <linux/if_tunnel.h>
-#include <linux/filter.h>
-
-#include "bpf_api.h"
-#include "bpf_elf.h"
-#include "../tap_rss.h"
-
-/** Create IPv4 address */
-#define IPv4(a, b, c, d) ((__u32)(((a) & 0xff) << 24) | \
-		(((b) & 0xff) << 16) | \
-		(((c) & 0xff) << 8)  | \
-		((d) & 0xff))
-
-#define PORT(a, b) ((__u16)(((a) & 0xff) << 8) | \
-		((b) & 0xff))
-
-/*
- * The queue number is offset by a unique QUEUE_OFFSET, to distinguish
- * packets that have gone through this rule (skb->cb[1] != 0) from others.
- */
-#define QUEUE_OFFSET		0x7cafe800
-#define PIN_GLOBAL_NS		2
-
-#define KEY_IDX			0
-#define BPF_MAP_ID_KEY	1
-
-struct vlan_hdr {
-	__be16 proto;
-	__be16 tci;
-};
-
-struct bpf_elf_map __attribute__((section("maps"), used))
-map_keys = {
-	.type           =       BPF_MAP_TYPE_HASH,
-	.id             =       BPF_MAP_ID_KEY,
-	.size_key       =       sizeof(__u32),
-	.size_value     =       sizeof(struct rss_key),
-	.max_elem       =       256,
-	.pinning        =       PIN_GLOBAL_NS,
-};
-
-__section("cls_q") int
-match_q(struct __sk_buff *skb)
-{
-	__u32 queue = skb->cb[1];
-	/* queue is set by tap_flow_bpf_cls_q() before load */
-	volatile __u32 q = 0xdeadbeef;
-	__u32 match_queue = QUEUE_OFFSET + q;
-
-	/* printt("match_q$i() queue = %d\n", queue); */
-
-	if (queue != match_queue)
-		return TC_ACT_OK;
-
-	/* queue match */
-	skb->cb[1] = 0;
-	return TC_ACT_UNSPEC;
-}
-
-
-struct ipv4_l3_l4_tuple {
-	__u32    src_addr;
-	__u32    dst_addr;
-	__u16    dport;
-	__u16    sport;
-} __attribute__((packed));
-
-struct ipv6_l3_l4_tuple {
-	__u8        src_addr[16];
-	__u8        dst_addr[16];
-	__u16       dport;
-	__u16       sport;
-} __attribute__((packed));
-
-static const __u8 def_rss_key[TAP_RSS_HASH_KEY_SIZE] = {
-	0xd1, 0x81, 0xc6, 0x2c,
-	0xf7, 0xf4, 0xdb, 0x5b,
-	0x19, 0x83, 0xa2, 0xfc,
-	0x94, 0x3e, 0x1a, 0xdb,
-	0xd9, 0x38, 0x9e, 0x6b,
-	0xd1, 0x03, 0x9c, 0x2c,
-	0xa7, 0x44, 0x99, 0xad,
-	0x59, 0x3d, 0x56, 0xd9,
-	0xf3, 0x25, 0x3c, 0x06,
-	0x2a, 0xdc, 0x1f, 0xfc,
-};
-
-static __u32  __attribute__((always_inline))
-rte_softrss_be(const __u32 *input_tuple, const uint8_t *rss_key,
-		__u8 input_len)
-{
-	__u32 i, j, hash = 0;
-#pragma unroll
-	for (j = 0; j < input_len; j++) {
-#pragma unroll
-		for (i = 0; i < 32; i++) {
-			if (input_tuple[j] & (1U << (31 - i))) {
-				hash ^= ((const __u32 *)def_rss_key)[j] << i |
-				(__u32)((uint64_t)
-				(((const __u32 *)def_rss_key)[j + 1])
-					>> (32 - i));
-			}
-		}
-	}
-	return hash;
-}
-
-static int __attribute__((always_inline))
-rss_l3_l4(struct __sk_buff *skb)
-{
-	void *data_end = (void *)(long)skb->data_end;
-	void *data = (void *)(long)skb->data;
-	__u16 proto = (__u16)skb->protocol;
-	__u32 key_idx = 0xdeadbeef;
-	__u32 hash;
-	struct rss_key *rsskey;
-	__u64 off = ETH_HLEN;
-	int j;
-	__u8 *key = 0;
-	__u32 len;
-	__u32 queue = 0;
-	bool mf = 0;
-	__u16 frag_off = 0;
-
-	rsskey = map_lookup_elem(&map_keys, &key_idx);
-	if (!rsskey) {
-		printt("hash(): rss key is not configured\n");
-		return TC_ACT_OK;
-	}
-	key = (__u8 *)rsskey->key;
-
-	/* Get correct proto for 802.1ad */
-	if (skb->vlan_present && skb->vlan_proto == htons(ETH_P_8021AD)) {
-		if (data + ETH_ALEN * 2 + sizeof(struct vlan_hdr) +
-		    sizeof(proto) > data_end)
-			return TC_ACT_OK;
-		proto = *(__u16 *)(data + ETH_ALEN * 2 +
-				   sizeof(struct vlan_hdr));
-		off += sizeof(struct vlan_hdr);
-	}
-
-	if (proto == htons(ETH_P_IP)) {
-		if (data + off + sizeof(struct iphdr) + sizeof(__u32)
-			> data_end)
-			return TC_ACT_OK;
-
-		__u8 *src_dst_addr = data + off + offsetof(struct iphdr, saddr);
-		__u8 *frag_off_addr = data + off + offsetof(struct iphdr, frag_off);
-		__u8 *prot_addr = data + off + offsetof(struct iphdr, protocol);
-		__u8 *src_dst_port = data + off + sizeof(struct iphdr);
-		struct ipv4_l3_l4_tuple v4_tuple = {
-			.src_addr = IPv4(*(src_dst_addr + 0),
-					*(src_dst_addr + 1),
-					*(src_dst_addr + 2),
-					*(src_dst_addr + 3)),
-			.dst_addr = IPv4(*(src_dst_addr + 4),
-					*(src_dst_addr + 5),
-					*(src_dst_addr + 6),
-					*(src_dst_addr + 7)),
-			.sport = 0,
-			.dport = 0,
-		};
-		/** Fetch the L4-payer port numbers only in-case of TCP/UDP
-		 ** and also if the packet is not fragmented. Since fragmented
-		 ** chunks do not have L4 TCP/UDP header.
-		 **/
-		if (*prot_addr == IPPROTO_UDP || *prot_addr == IPPROTO_TCP) {
-			frag_off = PORT(*(frag_off_addr + 0),
-					*(frag_off_addr + 1));
-			mf = frag_off & 0x2000;
-			frag_off = frag_off & 0x1fff;
-			if (mf == 0 && frag_off == 0) {
-				v4_tuple.sport = PORT(*(src_dst_port + 0),
-						*(src_dst_port + 1));
-				v4_tuple.dport = PORT(*(src_dst_port + 2),
-						*(src_dst_port + 3));
-			}
-		}
-		__u8 input_len = sizeof(v4_tuple) / sizeof(__u32);
-		if (rsskey->hash_fields & (1 << HASH_FIELD_IPV4_L3))
-			input_len--;
-		hash = rte_softrss_be((__u32 *)&v4_tuple, key, 3);
-	} else if (proto == htons(ETH_P_IPV6)) {
-		if (data + off + sizeof(struct ipv6hdr) +
-					sizeof(__u32) > data_end)
-			return TC_ACT_OK;
-		__u8 *src_dst_addr = data + off +
-					offsetof(struct ipv6hdr, saddr);
-		__u8 *src_dst_port = data + off +
-					sizeof(struct ipv6hdr);
-		__u8 *next_hdr = data + off +
-					offsetof(struct ipv6hdr, nexthdr);
-
-		struct ipv6_l3_l4_tuple v6_tuple;
-		for (j = 0; j < 4; j++)
-			*((uint32_t *)&v6_tuple.src_addr + j) =
-				__builtin_bswap32(*((uint32_t *)
-						src_dst_addr + j));
-		for (j = 0; j < 4; j++)
-			*((uint32_t *)&v6_tuple.dst_addr + j) =
-				__builtin_bswap32(*((uint32_t *)
-						src_dst_addr + 4 + j));
-
-		/** Fetch the L4 header port-numbers only if next-header
-		 * is TCP/UDP **/
-		if (*next_hdr == IPPROTO_UDP || *next_hdr == IPPROTO_TCP) {
-			v6_tuple.sport = PORT(*(src_dst_port + 0),
-				      *(src_dst_port + 1));
-			v6_tuple.dport = PORT(*(src_dst_port + 2),
-				      *(src_dst_port + 3));
-		} else {
-			v6_tuple.sport = 0;
-			v6_tuple.dport = 0;
-		}
-
-		__u8 input_len = sizeof(v6_tuple) / sizeof(__u32);
-		if (rsskey->hash_fields & (1 << HASH_FIELD_IPV6_L3))
-			input_len--;
-		hash = rte_softrss_be((__u32 *)&v6_tuple, key, 9);
-	} else {
-		return TC_ACT_PIPE;
-	}
-
-	queue = rsskey->queues[(hash % rsskey->nb_queues) &
-				       (TAP_MAX_QUEUES - 1)];
-	skb->cb[1] = QUEUE_OFFSET + queue;
-	/* printt(">>>>> rss_l3_l4 hash=0x%x queue=%u\n", hash, queue); */
-
-	return TC_ACT_RECLASSIFY;
-}
-
-#define RSS(L)						\
-	__section(#L) int				\
-		L ## _hash(struct __sk_buff *skb)	\
-	{						\
-		return rss_ ## L (skb);			\
-	}
-
-RSS(l3_l4)
-
-BPF_LICENSE("Dual BSD/GPL");
diff --git a/drivers/net/tap/bpf/tap_rss.c b/drivers/net/tap/bpf/tap_rss.c
new file mode 100644
index 0000000000..888b3bdc24
--- /dev/null
+++ b/drivers/net/tap/bpf/tap_rss.c
@@ -0,0 +1,264 @@
+/* SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0
+ * Copyright 2017 Mellanox Technologies, Ltd
+ */
+
+#include <linux/in.h>
+#include <linux/if_ether.h>
+#include <linux/ip.h>
+#include <linux/ipv6.h>
+#include <linux/pkt_cls.h>
+#include <linux/bpf.h>
+
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_endian.h>
+
+#include "../tap_rss.h"
+
+/*
+ * This map provides configuration information about flows which need BPF RSS.
+ *
+ * The hash is indexed by the skb mark.
+ */
+struct {
+	__uint(type, BPF_MAP_TYPE_HASH);
+	__uint(key_size, sizeof(__u32));
+	__uint(value_size, sizeof(struct rss_key));
+	__uint(max_entries, TAP_RSS_MAX);
+} rss_map SEC(".maps");
+
+#define IP_MF		0x2000		/** IP header Flags **/
+#define IP_OFFSET	0x1FFF		/** IP header fragment offset **/
+
+/*
+ * Compute Toeplitz hash over the input tuple.
+ * This is the same as rte_softrss_be in lib/hash
+ * but the loop needs to be set up to match BPF restrictions.
+ */
+static __u32 __attribute__((always_inline))
+softrss_be(const __u32 *input_tuple, __u32 input_len, const __u32 *key)
+{
+	__u32 i, j, hash = 0;
+
+#pragma unroll
+	for (j = 0; j < input_len; j++) {
+#pragma unroll
+		for (i = 0; i < 32; i++) {
+			if (input_tuple[j] & (1U << (31 - i)))
+				hash ^= key[j] << i | (__u32)((__u64)key[j + 1] >> (32 - i));
+		}
+	}
+	return hash;
+}
+
+/*
+ * Compute RSS hash for IPv4 packet.
+ * Returns 0 if RSS not specified.
+ */
+static __u32 __attribute__((always_inline))
+parse_ipv4(const struct __sk_buff *skb, __u32 hash_type, const __u32 *key)
+{
+	struct iphdr iph;
+	__u32 off = 0;
+
+	if (bpf_skb_load_bytes_relative(skb, off, &iph, sizeof(iph), BPF_HDR_START_NET))
+		return 0;	/* no IP header present */
+
+	struct {
+		__u32    src_addr;
+		__u32    dst_addr;
+		__u16    dport;
+		__u16    sport;
+	} v4_tuple = {
+		.src_addr = bpf_ntohl(iph.saddr),
+		.dst_addr = bpf_ntohl(iph.daddr),
+	};
+
+	/* If only calculating L3 hash, do it now */
+	if (hash_type & (1 << HASH_FIELD_IPV4_L3))
+		return softrss_be((__u32 *)&v4_tuple, sizeof(v4_tuple) / sizeof(__u32) - 1, key);
+
+	/* If packet is fragmented then no L4 hash is possible */
+	if ((iph.frag_off & bpf_htons(IP_MF | IP_OFFSET)) != 0)
+		return 0;
+
+	/* Do RSS on UDP or TCP protocols */
+	if (iph.protocol == IPPROTO_UDP || iph.protocol == IPPROTO_TCP) {
+		__u16 src_dst_port[2];
+
+		off += iph.ihl * 4;
+		if (bpf_skb_load_bytes_relative(skb, off, &src_dst_port, sizeof(src_dst_port),
+						BPF_HDR_START_NET))
+			return 0; /* TCP or UDP header missing */
+
+		v4_tuple.sport = bpf_ntohs(src_dst_port[0]);
+		v4_tuple.dport = bpf_ntohs(src_dst_port[1]);
+		return softrss_be((__u32 *)&v4_tuple, sizeof(v4_tuple) / sizeof(__u32), key);
+	}
+
+	/* Other protocol */
+	return 0;
+}
+
+/*
+ * Parse IPv6 extension headers, update offset and return next proto.
+ * Returns next proto on success, -1 on malformed header.
+ */
+static int __attribute__((always_inline))
+skip_ip6_ext(__u16 proto, const struct __sk_buff *skb, __u32 *off, int *frag)
+{
+	struct ext_hdr {
+		__u8 next_hdr;
+		__u8 len;
+	} xh;
+	unsigned int i;
+
+	*frag = 0;
+
+#define MAX_EXT_HDRS 5
+#pragma unroll
+	for (i = 0; i < MAX_EXT_HDRS; i++) {
+		switch (proto) {
+		case IPPROTO_HOPOPTS:
+		case IPPROTO_ROUTING:
+		case IPPROTO_DSTOPTS:
+			if (bpf_skb_load_bytes_relative(skb, *off, &xh, sizeof(xh),
+							BPF_HDR_START_NET))
+				return -1;
+
+			*off += (xh.len + 1) * 8;
+			proto = xh.next_hdr;
+			break;
+		case IPPROTO_FRAGMENT:
+			if (bpf_skb_load_bytes_relative(skb, *off, &xh, sizeof(xh),
+							BPF_HDR_START_NET))
+				return -1;
+
+			*off += 8;
+			proto = xh.next_hdr;
+			*frag = 1;
+			return proto; /* this is always the last ext hdr */
+		default:
+			return proto;
+		}
+	}
+
+	/* too many extension headers give up */
+	return -1;
+}
+
+/*
+ * Compute RSS hash for IPv6 packet.
+ * Returns 0 if RSS is not specified.
+ */
+static __u32 __attribute__((always_inline))
+parse_ipv6(const struct __sk_buff *skb, __u32 hash_type, const __u32 *key)
+{
+	struct {
+		__u32       src_addr[4];
+		__u32       dst_addr[4];
+		__u16       dport;
+		__u16       sport;
+	} v6_tuple = { };
+	struct ipv6hdr ip6h;
+	__u32 off = 0, j;
+	int proto, frag;
+
+	if (bpf_skb_load_bytes_relative(skb, off, &ip6h, sizeof(ip6h), BPF_HDR_START_NET))
+		return 0;	/* missing IPv6 header */
+
+#pragma unroll
+	for (j = 0; j < 4; j++) {
+		v6_tuple.src_addr[j] = bpf_ntohl(ip6h.saddr.in6_u.u6_addr32[j]);
+		v6_tuple.dst_addr[j] = bpf_ntohl(ip6h.daddr.in6_u.u6_addr32[j]);
+	}
+
+	/* If only doing L3 hash, do it now */
+	if (hash_type & (1 << HASH_FIELD_IPV6_L3))
+		return softrss_be((__u32 *)&v6_tuple, sizeof(v6_tuple) / sizeof(__u32) - 1, key);
+
+	/* Skip extension headers if present */
+	off += sizeof(ip6h);
+	proto = skip_ip6_ext(ip6h.nexthdr, skb, &off, &frag);
+	if (proto < 0)
+		return 0;
+
+	/* If packet is a fragment then no L4 hash is possible */
+	if (frag)
+		return 0;
+
+	/* Do RSS on UDP or TCP */
+	if (proto == IPPROTO_UDP || proto == IPPROTO_TCP) {
+		__u16 src_dst_port[2];
+
+		if (bpf_skb_load_bytes_relative(skb, off, &src_dst_port, sizeof(src_dst_port),
+						BPF_HDR_START_NET))
+			return 0;
+
+		v6_tuple.sport = bpf_ntohs(src_dst_port[0]);
+		v6_tuple.dport = bpf_ntohs(src_dst_port[1]);
+
+		return softrss_be((__u32 *)&v6_tuple, sizeof(v6_tuple) / sizeof(__u32), key);
+	}
+
+	return 0;
+}
+
+/*
+ * Compute RSS hash for packets.
+ * Returns 0 if no hash is possible.
+ */
+static __u32 __attribute__((always_inline))
+calculate_rss_hash(const struct __sk_buff *skb, const struct rss_key *rsskey)
+{
+	const __u32 *key = (const __u32 *)rsskey->key;
+
+	if (skb->protocol == bpf_htons(ETH_P_IP))
+		return parse_ipv4(skb, rsskey->hash_fields, key);
+	else if (skb->protocol == bpf_htons(ETH_P_IPV6))
+		return parse_ipv6(skb, rsskey->hash_fields, key);
+	else
+		return 0;
+}
+
+/*
+ * Scale value into the range [0, n).
+ * Assumes val is large (i.e. the hash covers the whole u32 range).
+ */
+static __u32  __attribute__((always_inline))
+reciprocal_scale(__u32 val, __u32 n)
+{
+	return (__u32)(((__u64)val * n) >> 32);
+}
+
+/*
+ * When this BPF program is run by tc from the filter classifier,
+ * it is able to read skb metadata and packet data.
+ *
+ * For packets where RSS is not possible, just return TC_ACT_OK.
+ * When RSS is desired, change skb->queue_mapping and return TC_ACT_PIPE
+ * to continue processing.
+ *
+ * This should be BPF_PROG_TYPE_SCHED_ACT so section needs to be "action"
+ */
+SEC("action") int
+rss_flow_action(struct __sk_buff *skb)
+{
+	const struct rss_key *rsskey;
+	__u32 mark = skb->mark;
+	__u32 hash;
+
+	/* Lookup RSS configuration for that BPF class */
+	rsskey = bpf_map_lookup_elem(&rss_map, &mark);
+	if (rsskey == NULL)
+		return TC_ACT_OK;
+
+	hash = calculate_rss_hash(skb, rsskey);
+	if (!hash)
+		return TC_ACT_OK;
+
+	/* Fold hash to the number of queues configured */
+	skb->queue_mapping = reciprocal_scale(hash, rsskey->nb_queues);
+	return TC_ACT_PIPE;
+}
+
+char _license[] SEC("license") = "Dual BSD/GPL";
-- 
2.43.0


^ permalink raw reply	[relevance 2%]
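
For reference, the Toeplitz loop and the queue scaling used by the BPF
program can be exercised on the host without loading anything into the
kernel. The following is a minimal sketch and not part of the patch: the
names toeplitz_be() and scale_to_queue() are hypothetical, the key is
assumed to be already converted to host-order 32-bit words, and the 64-bit
key window is an assumption made here so that the i == 0 case never shifts
a 32-bit value by 32 in plain C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Toeplitz hash over n_words 32-bit words of input (hypothetical helper).
 * key must hold at least n_words + 1 words, because every input word
 * consumes a 64-bit window of the key.
 */
static uint32_t
toeplitz_be(const uint32_t *tuple, size_t n_words,
	    const uint32_t *key, size_t key_words)
{
	uint32_t hash = 0;
	size_t i, j;

	assert(key_words > n_words);

	for (j = 0; j < n_words; j++) {
		/* key[j] in the high half, key[j + 1] in the low half */
		uint64_t window = ((uint64_t)key[j] << 32) | key[j + 1];

		for (i = 0; i < 32; i++) {
			/* walk the input bits from most to least significant */
			if (tuple[j] & (1U << (31 - i)))
				hash ^= (uint32_t)(window >> (32 - i));
		}
	}
	return hash;
}

/*
 * Map a 32-bit hash into [0, nb_queues) with a multiply and a shift
 * instead of a modulo (hypothetical helper).
 */
static uint32_t
scale_to_queue(uint32_t hash, uint32_t nb_queues)
{
	return (uint32_t)(((uint64_t)hash * nb_queues) >> 32);
}
```

A caller would fill the tuple words in host order (addresses first, then
the word holding the ports), call toeplitz_be(), and pass the result to
scale_to_queue() with the configured queue count. The multiply-and-shift
form only spreads traffic evenly when the hash uses the full 32-bit range,
which is the assumption stated for reciprocal_scale() in the patch.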

* [PATCH v8 5/8] net/tap: rewrite the RSS BPF program
  @ 2024-04-09  3:40  2%   ` Stephen Hemminger
  0 siblings, 0 replies; 200+ results
From: Stephen Hemminger @ 2024-04-09  3:40 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

Rewrite the BPF program used to do queue based RSS.
Important changes:
	- uses newer BPF map format BTF
	- accepts key as parameter rather than constant default
	- can do L3 or L4 hashing
	- supports IPv4 options
	- supports IPv6 extension headers
	- restructured for readability

The usage of BPF is different as well:
	- the incoming configuration is looked up based on
	  class parameters rather than patching the BPF.
	- the resulting queue is placed in skb rather
	  than requiring a second pass through classifier step.

Note: This version only works with a later patch that enables it on
the DPDK driver side. It is submitted as an incremental patch
to allow for easier review. Bisection still works because
the old instructions are still present for now.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 .gitignore                            |   3 -
 drivers/net/tap/bpf/Makefile          |  19 --
 drivers/net/tap/bpf/README            |  38 ++++
 drivers/net/tap/bpf/bpf_api.h         | 276 --------------------------
 drivers/net/tap/bpf/bpf_elf.h         |  53 -----
 drivers/net/tap/bpf/bpf_extract.py    |  85 --------
 drivers/net/tap/bpf/meson.build       |  81 ++++++++
 drivers/net/tap/bpf/tap_bpf_program.c | 255 ------------------------
 drivers/net/tap/bpf/tap_rss.c         | 264 ++++++++++++++++++++++++
 9 files changed, 383 insertions(+), 691 deletions(-)
 delete mode 100644 drivers/net/tap/bpf/Makefile
 create mode 100644 drivers/net/tap/bpf/README
 delete mode 100644 drivers/net/tap/bpf/bpf_api.h
 delete mode 100644 drivers/net/tap/bpf/bpf_elf.h
 delete mode 100644 drivers/net/tap/bpf/bpf_extract.py
 create mode 100644 drivers/net/tap/bpf/meson.build
 delete mode 100644 drivers/net/tap/bpf/tap_bpf_program.c
 create mode 100644 drivers/net/tap/bpf/tap_rss.c

diff --git a/.gitignore b/.gitignore
index 3f444dcace..01a47a7606 100644
--- a/.gitignore
+++ b/.gitignore
@@ -36,9 +36,6 @@ TAGS
 # ignore python bytecode files
 *.pyc
 
-# ignore BPF programs
-drivers/net/tap/bpf/tap_bpf_program.o
-
 # DTS results
 dts/output
 
diff --git a/drivers/net/tap/bpf/Makefile b/drivers/net/tap/bpf/Makefile
deleted file mode 100644
index 9efeeb1bc7..0000000000
--- a/drivers/net/tap/bpf/Makefile
+++ /dev/null
@@ -1,19 +0,0 @@
-# SPDX-License-Identifier: BSD-3-Clause
-# This file is not built as part of normal DPDK build.
-# It is used to generate the eBPF code for TAP RSS.
-
-CLANG=clang
-CLANG_OPTS=-O2
-TARGET=../tap_bpf_insns.h
-
-all: $(TARGET)
-
-clean:
-	rm tap_bpf_program.o $(TARGET)
-
-tap_bpf_program.o: tap_bpf_program.c
-	$(CLANG) $(CLANG_OPTS) -emit-llvm -c $< -o - | \
-	llc -march=bpf -filetype=obj -o $@
-
-$(TARGET): tap_bpf_program.o
-	python3 bpf_extract.py -stap_bpf_program.c -o $@ $<
diff --git a/drivers/net/tap/bpf/README b/drivers/net/tap/bpf/README
new file mode 100644
index 0000000000..1d421ff42c
--- /dev/null
+++ b/drivers/net/tap/bpf/README
@@ -0,0 +1,38 @@
+This is the BPF program used to implement the RSS across queues flow action.
+The program is loaded when the first RSS flow rule is created and is never unloaded.
+
+Each flow rule creates a unique key (handle) and this is used as the key
+for finding the RSS information for that flow rule.
+
+This version is built with the BPF Compile Once - Run Everywhere (CO-RE)
+framework and uses libbpf and bpftool.
+
+Limitations
+-----------
+- requires libbpf to run
+- rebuilding the BPF requires Clang and bpftool.
+  Some older versions of Ubuntu do not have a working bpftool package.
+  Clang must be a version that can compile to BPF.
+- only standard Toeplitz hash with standard 40 byte key is supported
+- the number of flow rules using RSS is limited to 32
+
+Building
+--------
+During the DPDK build process the meson build file checks that
+libbpf, bpftool, and clang are available. If everything is
+there then BPF RSS is enabled.
+
+1. Clang compiles tap_rss.c into the tap_rss.bpf.o file.
+
+2. bpftool generates a skeleton header file tap_rss.skel.h from tap_rss.bpf.o.
+   This skeleton header is a large byte array which contains the
+   BPF binary and wrappers to load and use it.
+
+3. The tap flow code then compiles that BPF byte array into the PMD object.
+
+4. When needed the BPF array is loaded by libbpf.
+
+References
+----------
+BPF and XDP reference guide
+https://docs.cilium.io/en/latest/bpf/progtypes/
diff --git a/drivers/net/tap/bpf/bpf_api.h b/drivers/net/tap/bpf/bpf_api.h
deleted file mode 100644
index 4cd25fa593..0000000000
--- a/drivers/net/tap/bpf/bpf_api.h
+++ /dev/null
@@ -1,276 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 or BSD-3-Clause */
-
-#ifndef __BPF_API__
-#define __BPF_API__
-
-/* Note:
- *
- * This file can be included into eBPF kernel programs. It contains
- * a couple of useful helper functions, map/section ABI (bpf_elf.h),
- * misc macros and some eBPF specific LLVM built-ins.
- */
-
-#include <stdint.h>
-
-#include <linux/pkt_cls.h>
-#include <linux/bpf.h>
-#include <linux/filter.h>
-
-#include <asm/byteorder.h>
-
-#include "bpf_elf.h"
-
-/** libbpf pin type. */
-enum libbpf_pin_type {
-	LIBBPF_PIN_NONE,
-	/* PIN_BY_NAME: pin maps by name (in /sys/fs/bpf by default) */
-	LIBBPF_PIN_BY_NAME,
-};
-
-/** Type helper macros. */
-
-#define __uint(name, val) int (*name)[val]
-#define __type(name, val) typeof(val) *name
-#define __array(name, val) typeof(val) *name[]
-
-/** Misc macros. */
-
-#ifndef __stringify
-# define __stringify(X)		#X
-#endif
-
-#ifndef __maybe_unused
-# define __maybe_unused		__attribute__((__unused__))
-#endif
-
-#ifndef offsetof
-# define offsetof(TYPE, MEMBER)	__builtin_offsetof(TYPE, MEMBER)
-#endif
-
-#ifndef likely
-# define likely(X)		__builtin_expect(!!(X), 1)
-#endif
-
-#ifndef unlikely
-# define unlikely(X)		__builtin_expect(!!(X), 0)
-#endif
-
-#ifndef htons
-# define htons(X)		__constant_htons((X))
-#endif
-
-#ifndef ntohs
-# define ntohs(X)		__constant_ntohs((X))
-#endif
-
-#ifndef htonl
-# define htonl(X)		__constant_htonl((X))
-#endif
-
-#ifndef ntohl
-# define ntohl(X)		__constant_ntohl((X))
-#endif
-
-#ifndef __inline__
-# define __inline__		__attribute__((always_inline))
-#endif
-
-/** Section helper macros. */
-
-#ifndef __section
-# define __section(NAME)						\
-	__attribute__((section(NAME), used))
-#endif
-
-#ifndef __section_tail
-# define __section_tail(ID, KEY)					\
-	__section(__stringify(ID) "/" __stringify(KEY))
-#endif
-
-#ifndef __section_xdp_entry
-# define __section_xdp_entry						\
-	__section(ELF_SECTION_PROG)
-#endif
-
-#ifndef __section_cls_entry
-# define __section_cls_entry						\
-	__section(ELF_SECTION_CLASSIFIER)
-#endif
-
-#ifndef __section_act_entry
-# define __section_act_entry						\
-	__section(ELF_SECTION_ACTION)
-#endif
-
-#ifndef __section_lwt_entry
-# define __section_lwt_entry						\
-	__section(ELF_SECTION_PROG)
-#endif
-
-#ifndef __section_license
-# define __section_license						\
-	__section(ELF_SECTION_LICENSE)
-#endif
-
-#ifndef __section_maps
-# define __section_maps							\
-	__section(ELF_SECTION_MAPS)
-#endif
-
-/** Declaration helper macros. */
-
-#ifndef BPF_LICENSE
-# define BPF_LICENSE(NAME)						\
-	char ____license[] __section_license = NAME
-#endif
-
-/** Classifier helper */
-
-#ifndef BPF_H_DEFAULT
-# define BPF_H_DEFAULT	-1
-#endif
-
-/** BPF helper functions for tc. Individual flags are in linux/bpf.h */
-
-#ifndef __BPF_FUNC
-# define __BPF_FUNC(NAME, ...)						\
-	(* NAME)(__VA_ARGS__) __maybe_unused
-#endif
-
-#ifndef BPF_FUNC
-# define BPF_FUNC(NAME, ...)						\
-	__BPF_FUNC(NAME, __VA_ARGS__) = (void *) BPF_FUNC_##NAME
-#endif
-
-/* Map access/manipulation */
-static void *BPF_FUNC(map_lookup_elem, void *map, const void *key);
-static int BPF_FUNC(map_update_elem, void *map, const void *key,
-		    const void *value, uint32_t flags);
-static int BPF_FUNC(map_delete_elem, void *map, const void *key);
-
-/* Time access */
-static uint64_t BPF_FUNC(ktime_get_ns);
-
-/* Debugging */
-
-/* FIXME: __attribute__ ((format(printf, 1, 3))) not possible unless
- * llvm bug https://llvm.org/bugs/show_bug.cgi?id=26243 gets resolved.
- * It would require ____fmt to be made const, which generates a reloc
- * entry (non-map).
- */
-static void BPF_FUNC(trace_printk, const char *fmt, int fmt_size, ...);
-
-#ifndef printt
-# define printt(fmt, ...)						\
-	__extension__ ({						\
-		char ____fmt[] = fmt;					\
-		trace_printk(____fmt, sizeof(____fmt), ##__VA_ARGS__);	\
-	})
-#endif
-
-/* Random numbers */
-static uint32_t BPF_FUNC(get_prandom_u32);
-
-/* Tail calls */
-static void BPF_FUNC(tail_call, struct __sk_buff *skb, void *map,
-		     uint32_t index);
-
-/* System helpers */
-static uint32_t BPF_FUNC(get_smp_processor_id);
-static uint32_t BPF_FUNC(get_numa_node_id);
-
-/* Packet misc meta data */
-static uint32_t BPF_FUNC(get_cgroup_classid, struct __sk_buff *skb);
-static int BPF_FUNC(skb_under_cgroup, void *map, uint32_t index);
-
-static uint32_t BPF_FUNC(get_route_realm, struct __sk_buff *skb);
-static uint32_t BPF_FUNC(get_hash_recalc, struct __sk_buff *skb);
-static uint32_t BPF_FUNC(set_hash_invalid, struct __sk_buff *skb);
-
-/* Packet redirection */
-static int BPF_FUNC(redirect, int ifindex, uint32_t flags);
-static int BPF_FUNC(clone_redirect, struct __sk_buff *skb, int ifindex,
-		    uint32_t flags);
-
-/* Packet manipulation */
-static int BPF_FUNC(skb_load_bytes, struct __sk_buff *skb, uint32_t off,
-		    void *to, uint32_t len);
-static int BPF_FUNC(skb_store_bytes, struct __sk_buff *skb, uint32_t off,
-		    const void *from, uint32_t len, uint32_t flags);
-
-static int BPF_FUNC(l3_csum_replace, struct __sk_buff *skb, uint32_t off,
-		    uint32_t from, uint32_t to, uint32_t flags);
-static int BPF_FUNC(l4_csum_replace, struct __sk_buff *skb, uint32_t off,
-		    uint32_t from, uint32_t to, uint32_t flags);
-static int BPF_FUNC(csum_diff, const void *from, uint32_t from_size,
-		    const void *to, uint32_t to_size, uint32_t seed);
-static int BPF_FUNC(csum_update, struct __sk_buff *skb, uint32_t wsum);
-
-static int BPF_FUNC(skb_change_type, struct __sk_buff *skb, uint32_t type);
-static int BPF_FUNC(skb_change_proto, struct __sk_buff *skb, uint32_t proto,
-		    uint32_t flags);
-static int BPF_FUNC(skb_change_tail, struct __sk_buff *skb, uint32_t nlen,
-		    uint32_t flags);
-
-static int BPF_FUNC(skb_pull_data, struct __sk_buff *skb, uint32_t len);
-
-/* Event notification */
-static int __BPF_FUNC(skb_event_output, struct __sk_buff *skb, void *map,
-		      uint64_t index, const void *data, uint32_t size) =
-		      (void *) BPF_FUNC_perf_event_output;
-
-/* Packet vlan encap/decap */
-static int BPF_FUNC(skb_vlan_push, struct __sk_buff *skb, uint16_t proto,
-		    uint16_t vlan_tci);
-static int BPF_FUNC(skb_vlan_pop, struct __sk_buff *skb);
-
-/* Packet tunnel encap/decap */
-static int BPF_FUNC(skb_get_tunnel_key, struct __sk_buff *skb,
-		    struct bpf_tunnel_key *to, uint32_t size, uint32_t flags);
-static int BPF_FUNC(skb_set_tunnel_key, struct __sk_buff *skb,
-		    const struct bpf_tunnel_key *from, uint32_t size,
-		    uint32_t flags);
-
-static int BPF_FUNC(skb_get_tunnel_opt, struct __sk_buff *skb,
-		    void *to, uint32_t size);
-static int BPF_FUNC(skb_set_tunnel_opt, struct __sk_buff *skb,
-		    const void *from, uint32_t size);
-
-/** LLVM built-ins, mem*() routines work for constant size */
-
-#ifndef lock_xadd
-# define lock_xadd(ptr, val)	((void) __sync_fetch_and_add(ptr, val))
-#endif
-
-#ifndef memset
-# define memset(s, c, n)	__builtin_memset((s), (c), (n))
-#endif
-
-#ifndef memcpy
-# define memcpy(d, s, n)	__builtin_memcpy((d), (s), (n))
-#endif
-
-#ifndef memmove
-# define memmove(d, s, n)	__builtin_memmove((d), (s), (n))
-#endif
-
-/* FIXME: __builtin_memcmp() is not yet fully usable unless llvm bug
- * https://llvm.org/bugs/show_bug.cgi?id=26218 gets resolved. Also
- * this one would generate a reloc entry (non-map), otherwise.
- */
-#if 0
-#ifndef memcmp
-# define memcmp(a, b, n)	__builtin_memcmp((a), (b), (n))
-#endif
-#endif
-
-unsigned long long load_byte(void *skb, unsigned long long off)
-	asm ("llvm.bpf.load.byte");
-
-unsigned long long load_half(void *skb, unsigned long long off)
-	asm ("llvm.bpf.load.half");
-
-unsigned long long load_word(void *skb, unsigned long long off)
-	asm ("llvm.bpf.load.word");
-
-#endif /* __BPF_API__ */
diff --git a/drivers/net/tap/bpf/bpf_elf.h b/drivers/net/tap/bpf/bpf_elf.h
deleted file mode 100644
index ea8a11c95c..0000000000
--- a/drivers/net/tap/bpf/bpf_elf.h
+++ /dev/null
@@ -1,53 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 or BSD-3-Clause */
-#ifndef __BPF_ELF__
-#define __BPF_ELF__
-
-#include <asm/types.h>
-
-/* Note:
- *
- * Below ELF section names and bpf_elf_map structure definition
- * are not (!) kernel ABI. It's rather a "contract" between the
- * application and the BPF loader in tc. For compatibility, the
- * section names should stay as-is. Introduction of aliases, if
- * needed, are a possibility, though.
- */
-
-/* ELF section names, etc */
-#define ELF_SECTION_LICENSE	"license"
-#define ELF_SECTION_MAPS	"maps"
-#define ELF_SECTION_PROG	"prog"
-#define ELF_SECTION_CLASSIFIER	"classifier"
-#define ELF_SECTION_ACTION	"action"
-
-#define ELF_MAX_MAPS		64
-#define ELF_MAX_LICENSE_LEN	128
-
-/* Object pinning settings */
-#define PIN_NONE		0
-#define PIN_OBJECT_NS		1
-#define PIN_GLOBAL_NS		2
-
-/* ELF map definition */
-struct bpf_elf_map {
-	__u32 type;
-	__u32 size_key;
-	__u32 size_value;
-	__u32 max_elem;
-	__u32 flags;
-	__u32 id;
-	__u32 pinning;
-	__u32 inner_id;
-	__u32 inner_idx;
-};
-
-#define BPF_ANNOTATE_KV_PAIR(name, type_key, type_val)		\
-	struct ____btf_map_##name {				\
-		type_key key;					\
-		type_val value;					\
-	};							\
-	struct ____btf_map_##name				\
-	    __attribute__ ((section(".maps." #name), used))	\
-	    ____btf_map_##name = { }
-
-#endif /* __BPF_ELF__ */
diff --git a/drivers/net/tap/bpf/bpf_extract.py b/drivers/net/tap/bpf/bpf_extract.py
deleted file mode 100644
index 73c4dafe4e..0000000000
--- a/drivers/net/tap/bpf/bpf_extract.py
+++ /dev/null
@@ -1,85 +0,0 @@
-#!/usr/bin/env python3
-# SPDX-License-Identifier: BSD-3-Clause
-# Copyright (c) 2023 Stephen Hemminger <stephen@networkplumber.org>
-
-import argparse
-import sys
-import struct
-from tempfile import TemporaryFile
-from elftools.elf.elffile import ELFFile
-
-
-def load_sections(elffile):
-    """Get sections of interest from ELF"""
-    result = []
-    parts = [("cls_q", "cls_q_insns"), ("l3_l4", "l3_l4_hash_insns")]
-    for name, tag in parts:
-        section = elffile.get_section_by_name(name)
-        if section:
-            insns = struct.iter_unpack('<BBhL', section.data())
-            result.append([tag, insns])
-    return result
-
-
-def dump_section(name, insns, out):
-    """Dump the array of BPF instructions"""
-    print(f'\nstatic struct bpf_insn {name}[] = {{', file=out)
-    for bpf in insns:
-        code = bpf[0]
-        src = bpf[1] >> 4
-        dst = bpf[1] & 0xf
-        off = bpf[2]
-        imm = bpf[3]
-        print(f'\t{{{code:#04x}, {dst:4d}, {src:4d}, {off:8d}, {imm:#010x}}},',
-              file=out)
-    print('};', file=out)
-
-
-def parse_args():
-    """Parse command line arguments"""
-    parser = argparse.ArgumentParser()
-    parser.add_argument('-s',
-                        '--source',
-                        type=str,
-                        help="original source file")
-    parser.add_argument('-o', '--out', type=str, help="output C file path")
-    parser.add_argument("file",
-                        nargs='+',
-                        help="object file path or '-' for stdin")
-    return parser.parse_args()
-
-
-def open_input(path):
-    """Open the file or stdin"""
-    if path == "-":
-        temp = TemporaryFile()
-        temp.write(sys.stdin.buffer.read())
-        return temp
-    return open(path, 'rb')
-
-
-def write_header(out, source):
-    """Write file intro header"""
-    print("/* SPDX-License-Identifier: BSD-3-Clause", file=out)
-    if source:
-        print(f' * Auto-generated from {source}', file=out)
-    print(" * This not the original source file. Do NOT edit it.", file=out)
-    print(" */\n", file=out)
-
-
-def main():
-    '''program main function'''
-    args = parse_args()
-
-    with open(args.out, 'w',
-              encoding="utf-8") if args.out else sys.stdout as out:
-        write_header(out, args.source)
-        for path in args.file:
-            elffile = ELFFile(open_input(path))
-            sections = load_sections(elffile)
-            for name, insns in sections:
-                dump_section(name, insns, out)
-
-
-if __name__ == "__main__":
-    main()
diff --git a/drivers/net/tap/bpf/meson.build b/drivers/net/tap/bpf/meson.build
new file mode 100644
index 0000000000..f2c03a19fd
--- /dev/null
+++ b/drivers/net/tap/bpf/meson.build
@@ -0,0 +1,81 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright 2024 Stephen Hemminger <stephen@networkplumber.org>
+
+enable_tap_rss = false
+
+libbpf = dependency('libbpf', required: false, method: 'pkg-config')
+if not libbpf.found()
+    message('net/tap: no RSS support missing libbpf')
+    subdir_done()
+endif
+
+# Debian installs this in /usr/sbin which is not in $PATH
+bpftool = find_program('bpftool', '/usr/sbin/bpftool', required: false, version: '>= 5.6.0')
+if not bpftool.found()
+    message('net/tap: no RSS support missing bpftool')
+    subdir_done()
+endif
+
+clang_supports_bpf = false
+clang = find_program('clang', required: false)
+if clang.found()
+    clang_supports_bpf = run_command(clang, '-target', 'bpf', '--print-supported-cpus',
+                                     check: false).returncode() == 0
+endif
+
+if not clang_supports_bpf
+    message('net/tap: no RSS support missing clang BPF')
+    subdir_done()
+endif
+
+enable_tap_rss = true
+
+libbpf_include_dir = libbpf.get_variable(pkgconfig : 'includedir')
+
+# The include files <linux/bpf.h> and others include <asm/types.h>
+# but <asm/types.h> is not defined for a multi-lib environment target.
+# Work around this by using the include directory from the host build environment.
+machine_name = run_command('uname', '-m').stdout().strip()
+march_include_dir = '/usr/include/' + machine_name + '-linux-gnu'
+
+clang_flags = [
+    '-O2',
+    '-Wall',
+    '-Wextra',
+    '-target',
+    'bpf',
+    '-g',
+    '-c',
+]
+
+bpf_o_cmd = [
+    clang,
+    clang_flags,
+    '-idirafter',
+    libbpf_include_dir,
+    '-idirafter',
+    march_include_dir,
+    '@INPUT@',
+    '-o',
+    '@OUTPUT@'
+]
+
+skel_h_cmd = [
+    bpftool,
+    'gen',
+    'skeleton',
+    '@INPUT@'
+]
+
+tap_rss_o = custom_target(
+    'tap_rss.bpf.o',
+    input: 'tap_rss.c',
+    output: 'tap_rss.o',
+    command: bpf_o_cmd)
+
+tap_rss_skel_h = custom_target(
+    'tap_rss.skel.h',
+    input: tap_rss_o,
+    output: 'tap_rss.skel.h',
+    command: skel_h_cmd,
+    capture: true)
diff --git a/drivers/net/tap/bpf/tap_bpf_program.c b/drivers/net/tap/bpf/tap_bpf_program.c
deleted file mode 100644
index f05aed021c..0000000000
--- a/drivers/net/tap/bpf/tap_bpf_program.c
+++ /dev/null
@@ -1,255 +0,0 @@
-/* SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0
- * Copyright 2017 Mellanox Technologies, Ltd
- */
-
-#include <stdint.h>
-#include <stdbool.h>
-#include <sys/types.h>
-#include <sys/socket.h>
-#include <asm/types.h>
-#include <linux/in.h>
-#include <linux/if.h>
-#include <linux/if_ether.h>
-#include <linux/ip.h>
-#include <linux/ipv6.h>
-#include <linux/if_tunnel.h>
-#include <linux/filter.h>
-
-#include "bpf_api.h"
-#include "bpf_elf.h"
-#include "../tap_rss.h"
-
-/** Create IPv4 address */
-#define IPv4(a, b, c, d) ((__u32)(((a) & 0xff) << 24) | \
-		(((b) & 0xff) << 16) | \
-		(((c) & 0xff) << 8)  | \
-		((d) & 0xff))
-
-#define PORT(a, b) ((__u16)(((a) & 0xff) << 8) | \
-		((b) & 0xff))
-
-/*
- * The queue number is offset by a unique QUEUE_OFFSET, to distinguish
- * packets that have gone through this rule (skb->cb[1] != 0) from others.
- */
-#define QUEUE_OFFSET		0x7cafe800
-#define PIN_GLOBAL_NS		2
-
-#define KEY_IDX			0
-#define BPF_MAP_ID_KEY	1
-
-struct vlan_hdr {
-	__be16 proto;
-	__be16 tci;
-};
-
-struct bpf_elf_map __attribute__((section("maps"), used))
-map_keys = {
-	.type           =       BPF_MAP_TYPE_HASH,
-	.id             =       BPF_MAP_ID_KEY,
-	.size_key       =       sizeof(__u32),
-	.size_value     =       sizeof(struct rss_key),
-	.max_elem       =       256,
-	.pinning        =       PIN_GLOBAL_NS,
-};
-
-__section("cls_q") int
-match_q(struct __sk_buff *skb)
-{
-	__u32 queue = skb->cb[1];
-	/* queue is set by tap_flow_bpf_cls_q() before load */
-	volatile __u32 q = 0xdeadbeef;
-	__u32 match_queue = QUEUE_OFFSET + q;
-
-	/* printt("match_q$i() queue = %d\n", queue); */
-
-	if (queue != match_queue)
-		return TC_ACT_OK;
-
-	/* queue match */
-	skb->cb[1] = 0;
-	return TC_ACT_UNSPEC;
-}
-
-
-struct ipv4_l3_l4_tuple {
-	__u32    src_addr;
-	__u32    dst_addr;
-	__u16    dport;
-	__u16    sport;
-} __attribute__((packed));
-
-struct ipv6_l3_l4_tuple {
-	__u8        src_addr[16];
-	__u8        dst_addr[16];
-	__u16       dport;
-	__u16       sport;
-} __attribute__((packed));
-
-static const __u8 def_rss_key[TAP_RSS_HASH_KEY_SIZE] = {
-	0xd1, 0x81, 0xc6, 0x2c,
-	0xf7, 0xf4, 0xdb, 0x5b,
-	0x19, 0x83, 0xa2, 0xfc,
-	0x94, 0x3e, 0x1a, 0xdb,
-	0xd9, 0x38, 0x9e, 0x6b,
-	0xd1, 0x03, 0x9c, 0x2c,
-	0xa7, 0x44, 0x99, 0xad,
-	0x59, 0x3d, 0x56, 0xd9,
-	0xf3, 0x25, 0x3c, 0x06,
-	0x2a, 0xdc, 0x1f, 0xfc,
-};
-
-static __u32  __attribute__((always_inline))
-rte_softrss_be(const __u32 *input_tuple, const uint8_t *rss_key,
-		__u8 input_len)
-{
-	__u32 i, j, hash = 0;
-#pragma unroll
-	for (j = 0; j < input_len; j++) {
-#pragma unroll
-		for (i = 0; i < 32; i++) {
-			if (input_tuple[j] & (1U << (31 - i))) {
-				hash ^= ((const __u32 *)def_rss_key)[j] << i |
-				(__u32)((uint64_t)
-				(((const __u32 *)def_rss_key)[j + 1])
-					>> (32 - i));
-			}
-		}
-	}
-	return hash;
-}
-
-static int __attribute__((always_inline))
-rss_l3_l4(struct __sk_buff *skb)
-{
-	void *data_end = (void *)(long)skb->data_end;
-	void *data = (void *)(long)skb->data;
-	__u16 proto = (__u16)skb->protocol;
-	__u32 key_idx = 0xdeadbeef;
-	__u32 hash;
-	struct rss_key *rsskey;
-	__u64 off = ETH_HLEN;
-	int j;
-	__u8 *key = 0;
-	__u32 len;
-	__u32 queue = 0;
-	bool mf = 0;
-	__u16 frag_off = 0;
-
-	rsskey = map_lookup_elem(&map_keys, &key_idx);
-	if (!rsskey) {
-		printt("hash(): rss key is not configured\n");
-		return TC_ACT_OK;
-	}
-	key = (__u8 *)rsskey->key;
-
-	/* Get correct proto for 802.1ad */
-	if (skb->vlan_present && skb->vlan_proto == htons(ETH_P_8021AD)) {
-		if (data + ETH_ALEN * 2 + sizeof(struct vlan_hdr) +
-		    sizeof(proto) > data_end)
-			return TC_ACT_OK;
-		proto = *(__u16 *)(data + ETH_ALEN * 2 +
-				   sizeof(struct vlan_hdr));
-		off += sizeof(struct vlan_hdr);
-	}
-
-	if (proto == htons(ETH_P_IP)) {
-		if (data + off + sizeof(struct iphdr) + sizeof(__u32)
-			> data_end)
-			return TC_ACT_OK;
-
-		__u8 *src_dst_addr = data + off + offsetof(struct iphdr, saddr);
-		__u8 *frag_off_addr = data + off + offsetof(struct iphdr, frag_off);
-		__u8 *prot_addr = data + off + offsetof(struct iphdr, protocol);
-		__u8 *src_dst_port = data + off + sizeof(struct iphdr);
-		struct ipv4_l3_l4_tuple v4_tuple = {
-			.src_addr = IPv4(*(src_dst_addr + 0),
-					*(src_dst_addr + 1),
-					*(src_dst_addr + 2),
-					*(src_dst_addr + 3)),
-			.dst_addr = IPv4(*(src_dst_addr + 4),
-					*(src_dst_addr + 5),
-					*(src_dst_addr + 6),
-					*(src_dst_addr + 7)),
-			.sport = 0,
-			.dport = 0,
-		};
-		/** Fetch the L4-payer port numbers only in-case of TCP/UDP
-		 ** and also if the packet is not fragmented. Since fragmented
-		 ** chunks do not have L4 TCP/UDP header.
-		 **/
-		if (*prot_addr == IPPROTO_UDP || *prot_addr == IPPROTO_TCP) {
-			frag_off = PORT(*(frag_off_addr + 0),
-					*(frag_off_addr + 1));
-			mf = frag_off & 0x2000;
-			frag_off = frag_off & 0x1fff;
-			if (mf == 0 && frag_off == 0) {
-				v4_tuple.sport = PORT(*(src_dst_port + 0),
-						*(src_dst_port + 1));
-				v4_tuple.dport = PORT(*(src_dst_port + 2),
-						*(src_dst_port + 3));
-			}
-		}
-		__u8 input_len = sizeof(v4_tuple) / sizeof(__u32);
-		if (rsskey->hash_fields & (1 << HASH_FIELD_IPV4_L3))
-			input_len--;
-		hash = rte_softrss_be((__u32 *)&v4_tuple, key, 3);
-	} else if (proto == htons(ETH_P_IPV6)) {
-		if (data + off + sizeof(struct ipv6hdr) +
-					sizeof(__u32) > data_end)
-			return TC_ACT_OK;
-		__u8 *src_dst_addr = data + off +
-					offsetof(struct ipv6hdr, saddr);
-		__u8 *src_dst_port = data + off +
-					sizeof(struct ipv6hdr);
-		__u8 *next_hdr = data + off +
-					offsetof(struct ipv6hdr, nexthdr);
-
-		struct ipv6_l3_l4_tuple v6_tuple;
-		for (j = 0; j < 4; j++)
-			*((uint32_t *)&v6_tuple.src_addr + j) =
-				__builtin_bswap32(*((uint32_t *)
-						src_dst_addr + j));
-		for (j = 0; j < 4; j++)
-			*((uint32_t *)&v6_tuple.dst_addr + j) =
-				__builtin_bswap32(*((uint32_t *)
-						src_dst_addr + 4 + j));
-
-		/** Fetch the L4 header port-numbers only if next-header
-		 * is TCP/UDP **/
-		if (*next_hdr == IPPROTO_UDP || *next_hdr == IPPROTO_TCP) {
-			v6_tuple.sport = PORT(*(src_dst_port + 0),
-				      *(src_dst_port + 1));
-			v6_tuple.dport = PORT(*(src_dst_port + 2),
-				      *(src_dst_port + 3));
-		} else {
-			v6_tuple.sport = 0;
-			v6_tuple.dport = 0;
-		}
-
-		__u8 input_len = sizeof(v6_tuple) / sizeof(__u32);
-		if (rsskey->hash_fields & (1 << HASH_FIELD_IPV6_L3))
-			input_len--;
-		hash = rte_softrss_be((__u32 *)&v6_tuple, key, 9);
-	} else {
-		return TC_ACT_PIPE;
-	}
-
-	queue = rsskey->queues[(hash % rsskey->nb_queues) &
-				       (TAP_MAX_QUEUES - 1)];
-	skb->cb[1] = QUEUE_OFFSET + queue;
-	/* printt(">>>>> rss_l3_l4 hash=0x%x queue=%u\n", hash, queue); */
-
-	return TC_ACT_RECLASSIFY;
-}
-
-#define RSS(L)						\
-	__section(#L) int				\
-		L ## _hash(struct __sk_buff *skb)	\
-	{						\
-		return rss_ ## L (skb);			\
-	}
-
-RSS(l3_l4)
-
-BPF_LICENSE("Dual BSD/GPL");
diff --git a/drivers/net/tap/bpf/tap_rss.c b/drivers/net/tap/bpf/tap_rss.c
new file mode 100644
index 0000000000..888b3bdc24
--- /dev/null
+++ b/drivers/net/tap/bpf/tap_rss.c
@@ -0,0 +1,264 @@
+/* SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0
+ * Copyright 2017 Mellanox Technologies, Ltd
+ */
+
+#include <linux/in.h>
+#include <linux/if_ether.h>
+#include <linux/ip.h>
+#include <linux/ipv6.h>
+#include <linux/pkt_cls.h>
+#include <linux/bpf.h>
+
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_endian.h>
+
+#include "../tap_rss.h"
+
+/*
+ * This map provides configuration information about flows which need BPF RSS.
+ *
+ * The hash is indexed by the skb mark.
+ */
+struct {
+	__uint(type, BPF_MAP_TYPE_HASH);
+	__uint(key_size, sizeof(__u32));
+	__uint(value_size, sizeof(struct rss_key));
+	__uint(max_entries, TAP_RSS_MAX);
+} rss_map SEC(".maps");
+
+#define IP_MF		0x2000		/** IP header Flags **/
+#define IP_OFFSET	0x1FFF		/** IP header fragment offset **/
+
+/*
+ * Compute Toeplitz hash over the input tuple.
+ * This is the same as rte_softrss_be in lib/hash,
+ * but the loop needs to be set up to match BPF restrictions.
+ */
+static __u32 __attribute__((always_inline))
+softrss_be(const __u32 *input_tuple, __u32 input_len, const __u32 *key)
+{
+	__u32 i, j, hash = 0;
+
+#pragma unroll
+	for (j = 0; j < input_len; j++) {
+#pragma unroll
+		for (i = 0; i < 32; i++) {
+			if (input_tuple[j] & (1U << (31 - i)))
+				hash ^= key[j] << i | key[j + 1] >> (32 - i);
+		}
+	}
+	return hash;
+}
+
+/*
+ * Compute RSS hash for IPv4 packet.
+ * return in 0 if RSS not specified
+ */
+static __u32 __attribute__((always_inline))
+parse_ipv4(const struct __sk_buff *skb, __u32 hash_type, const __u32 *key)
+{
+	struct iphdr iph;
+	__u32 off = 0;
+
+	if (bpf_skb_load_bytes_relative(skb, off, &iph, sizeof(iph), BPF_HDR_START_NET))
+		return 0;	/* no IP header present */
+
+	struct {
+		__u32    src_addr;
+		__u32    dst_addr;
+		__u16    dport;
+		__u16    sport;
+	} v4_tuple = {
+		.src_addr = bpf_ntohl(iph.saddr),
+		.dst_addr = bpf_ntohl(iph.daddr),
+	};
+
+	/* If only calculating L3 hash, do it now */
+	if (hash_type & (1 << HASH_FIELD_IPV4_L3))
+		return softrss_be((__u32 *)&v4_tuple, sizeof(v4_tuple) / sizeof(__u32) - 1, key);
+
+	/* If packet is fragmented then no L4 hash is possible */
+	if ((iph.frag_off & bpf_htons(IP_MF | IP_OFFSET)) != 0)
+		return 0;
+
+	/* Do RSS on UDP or TCP protocols */
+	if (iph.protocol == IPPROTO_UDP || iph.protocol == IPPROTO_TCP) {
+		__u16 src_dst_port[2];
+
+		off += iph.ihl * 4;
+		if (bpf_skb_load_bytes_relative(skb, off, &src_dst_port, sizeof(src_dst_port),
+						BPF_HDR_START_NET))
+			return 0; /* TCP or UDP header missing */
+
+		v4_tuple.sport = bpf_ntohs(src_dst_port[0]);
+		v4_tuple.dport = bpf_ntohs(src_dst_port[1]);
+		return softrss_be((__u32 *)&v4_tuple, sizeof(v4_tuple) / sizeof(__u32), key);
+	}
+
+	/* Other protocol */
+	return 0;
+}
+
+/*
+ * Parse Ipv6 extended headers, update offset and return next proto.
+ * returns next proto on success, -1 on malformed header
+ */
+static int __attribute__((always_inline))
+skip_ip6_ext(__u16 proto, const struct __sk_buff *skb, __u32 *off, int *frag)
+{
+	struct ext_hdr {
+		__u8 next_hdr;
+		__u8 len;
+	} xh;
+	unsigned int i;
+
+	*frag = 0;
+
+#define MAX_EXT_HDRS 5
+#pragma unroll
+	for (i = 0; i < MAX_EXT_HDRS; i++) {
+		switch (proto) {
+		case IPPROTO_HOPOPTS:
+		case IPPROTO_ROUTING:
+		case IPPROTO_DSTOPTS:
+			if (bpf_skb_load_bytes_relative(skb, *off, &xh, sizeof(xh),
+							BPF_HDR_START_NET))
+				return -1;
+
+			*off += (xh.len + 1) * 8;
+			proto = xh.next_hdr;
+			break;
+		case IPPROTO_FRAGMENT:
+			if (bpf_skb_load_bytes_relative(skb, *off, &xh, sizeof(xh),
+							BPF_HDR_START_NET))
+				return -1;
+
+			*off += 8;
+			proto = xh.next_hdr;
+			*frag = 1;
+			return proto; /* this is always the last ext hdr */
+		default:
+			return proto;
+		}
+	}
+
+	/* too many extension headers give up */
+	return -1;
+}
+
+/*
+ * Compute RSS hash for IPv6 packet.
+ * return in 0 if RSS not specified
+ */
+static __u32 __attribute__((always_inline))
+parse_ipv6(const struct __sk_buff *skb, __u32 hash_type, const __u32 *key)
+{
+	struct {
+		__u32       src_addr[4];
+		__u32       dst_addr[4];
+		__u16       dport;
+		__u16       sport;
+	} v6_tuple = { };
+	struct ipv6hdr ip6h;
+	__u32 off = 0, j;
+	int proto, frag;
+
+	if (bpf_skb_load_bytes_relative(skb, off, &ip6h, sizeof(ip6h), BPF_HDR_START_NET))
+		return 0;	/* missing IPv6 header */
+
+#pragma unroll
+	for (j = 0; j < 4; j++) {
+		v6_tuple.src_addr[j] = bpf_ntohl(ip6h.saddr.in6_u.u6_addr32[j]);
+		v6_tuple.dst_addr[j] = bpf_ntohl(ip6h.daddr.in6_u.u6_addr32[j]);
+	}
+
+	/* If only doing L3 hash, do it now */
+	if (hash_type & (1 << HASH_FIELD_IPV6_L3))
+		return softrss_be((__u32 *)&v6_tuple, sizeof(v6_tuple) / sizeof(__u32) - 1, key);
+
+	/* Skip extension headers if present */
+	off += sizeof(ip6h);
+	proto = skip_ip6_ext(ip6h.nexthdr, skb, &off, &frag);
+	if (proto < 0)
+		return 0;
+
+	/* If packet is a fragment then no L4 hash is possible */
+	if (frag)
+		return 0;
+
+	/* Do RSS on UDP or TCP */
+	if (proto == IPPROTO_UDP || proto == IPPROTO_TCP) {
+		__u16 src_dst_port[2];
+
+		if (bpf_skb_load_bytes_relative(skb, off, &src_dst_port, sizeof(src_dst_port),
+						BPF_HDR_START_NET))
+			return 0;
+
+		v6_tuple.sport = bpf_ntohs(src_dst_port[0]);
+		v6_tuple.dport = bpf_ntohs(src_dst_port[1]);
+
+		return softrss_be((__u32 *)&v6_tuple, sizeof(v6_tuple) / sizeof(__u32), key);
+	}
+
+	return 0;
+}
+
+/*
+ * Compute RSS hash for packets.
+ * Returns 0 if no hash is possible.
+ */
+static __u32 __attribute__((always_inline))
+calculate_rss_hash(const struct __sk_buff *skb, const struct rss_key *rsskey)
+{
+	const __u32 *key = (const __u32 *)rsskey->key;
+
+	if (skb->protocol == bpf_htons(ETH_P_IP))
+		return parse_ipv4(skb, rsskey->hash_fields, key);
+	else if (skb->protocol == bpf_htons(ETH_P_IPV6))
+		return parse_ipv6(skb, rsskey->hash_fields, key);
+	else
+		return 0;
+}
+
+/*
+ * Scale value to be into range [0, n)
+ * Assumes val is large (ie hash covers whole u32 range)
+ */
+static __u32  __attribute__((always_inline))
+reciprocal_scale(__u32 val, __u32 n)
+{
+	return (__u32)(((__u64)val * n) >> 32);
+}
+
+/*
+ * When this BPF program is run by tc from the filter classifier,
+ * it is able to read skb metadata and packet data.
+ *
+ * For packets where RSS is not possible, then just return TC_ACT_OK.
+ * When RSS is desired, change the skb->queue_mapping and set TC_ACT_PIPE
+ * to continue processing.
+ *
+ * This should be BPF_PROG_TYPE_SCHED_ACT so section needs to be "action"
+ */
+SEC("action") int
+rss_flow_action(struct __sk_buff *skb)
+{
+	const struct rss_key *rsskey;
+	__u32 mark = skb->mark;
+	__u32 hash;
+
+	/* Lookup RSS configuration for that BPF class */
+	rsskey = bpf_map_lookup_elem(&rss_map, &mark);
+	if (rsskey == NULL)
+		return TC_ACT_OK;
+
+	hash = calculate_rss_hash(skb, rsskey);
+	if (!hash)
+		return TC_ACT_OK;
+
+	/* Fold hash to the number of queues configured */
+	skb->queue_mapping = reciprocal_scale(hash, rsskey->nb_queues);
+	return TC_ACT_PIPE;
+}
+
+char _license[] SEC("license") = "Dual BSD/GPL";
-- 
2.43.0
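
Two helpers in the new tap_rss.c above, the Toeplitz hash loop and
reciprocal_scale(), can be tried outside BPF. The following is a minimal
userspace sketch, not the driver code: the name toeplitz_hash_be() is
hypothetical, plain stdint types replace the kernel __u32/__u64 types, and
the caller is assumed to pass a key of at least input_len + 1 words. The
i == 0 case is handled separately because shifting a 32-bit value by 32 bits
is undefined in ISO C.

```c
#include <stdint.h>

/*
 * Toeplitz hash over input_len 32-bit words (host byte order).
 * Assumption: key[] holds at least input_len + 1 words.
 */
uint32_t
toeplitz_hash_be(const uint32_t *input_tuple, uint32_t input_len,
		 const uint32_t *key)
{
	uint32_t i, j, hash = 0;

	for (j = 0; j < input_len; j++) {
		for (i = 0; i < 32; i++) {
			if (input_tuple[j] & (1U << (31 - i))) {
				/* 32-bit window of the key starting at bit i of key[j] */
				uint32_t window = key[j] << i;

				if (i != 0)
					window |= key[j + 1] >> (32 - i);
				hash ^= window;
			}
		}
	}
	return hash;
}

/*
 * Scale val into the range [0, n) with a multiply and a shift
 * instead of a modulo. Assumes val is spread over the whole u32 range.
 */
uint32_t
reciprocal_scale(uint32_t val, uint32_t n)
{
	return (uint32_t)(((uint64_t)val * n) >> 32);
}
```

With n = 4 queues, hash values in the top quarter of the 32-bit range map to
queue 3 and values in the bottom quarter map to queue 0, so the queue index
follows the high bits of the hash rather than the low bits that a modulo
would use.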



* Re: Strict aliasing problem with rte_eth_linkstatus_set()
  @ 2024-04-10 15:58  3%   ` Ferruh Yigit
    1 sibling, 0 replies; 200+ results
From: Ferruh Yigit @ 2024-04-10 15:58 UTC (permalink / raw)
  To: Stephen Hemminger, fengchengwen; +Cc: dev, Dengdui Huang

On 4/10/2024 4:27 PM, Stephen Hemminger wrote:
> On Wed, 10 Apr 2024 17:33:53 +0800
> fengchengwen <fengchengwen@huawei.com> wrote:
> 
>> Last: We think there are two ways to solve this problem.
>> 1. Add the compilation option '-fno-strict-aliasing' for the whole DPDK project.
>> 2. Use union to avoid such aliasing in rte_eth_linkstatus_set (please see above).
>> PS: We prefer first way.
>>
> 
> Please send a patch to replace alias with union.
> 

+1

I am not sure about ABI implications, as size is not changing I expect
it won't be an issue but may be good to verify with libabigail.

> PS: you can also override aliasing for a few lines of code with either pragmas
> or lots of casting. Both are messy and hard to maintain.



* Re: Strict aliasing problem with rte_eth_linkstatus_set()
  @ 2024-04-10 19:58  3%     ` Tyler Retzlaff
  2024-04-11  3:20  0%       ` fengchengwen
  0 siblings, 1 reply; 200+ results
From: Tyler Retzlaff @ 2024-04-10 19:58 UTC (permalink / raw)
  To: Morten Brørup
  Cc: Stephen Hemminger, fengchengwen, Ferruh Yigit, dev, Dengdui Huang

On Wed, Apr 10, 2024 at 07:54:27PM +0200, Morten Brørup wrote:
> > From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> > Sent: Wednesday, 10 April 2024 17.27
> > 
> > On Wed, 10 Apr 2024 17:33:53 +0800
> > fengchengwen <fengchengwen@huawei.com> wrote:
> > 
> > > Last: We think there are two ways to solve this problem.
> > > 1. Add the compilation option '-fno-strict-aliasing' for the whole DPDK
> > project.
> > > 2. Use union to avoid such aliasing in rte_eth_linkstatus_set (please
> > see above).
> > > PS: We prefer first way.
> > >
> > 
> > Please send a patch to replace alias with union.
> 
> +1
> 
> Fixing this specific bug would be good.
> 
> Instinctively, I think we should build with -fno-strict-aliasing, so the compiler doesn't make the same mistake with similar code elsewhere in DPDK. I fear there is more than this instance.
> I also wonder if -Wstrict-aliasing could help us instead, if we don't want -fno-strict-aliasing.

agree, union is the correct way to get defined behavior. there are
valuable optimizations that the compiler can make with strict aliasing
enabled so -Wstrict-aliasing is a good suggestion as opposed to
disabling it.

also the union won't break the abi if introduced correctly.
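
To illustrate the union approach discussed in this thread, here is a minimal
sketch. It is not the rte_eth_linkstatus_set() fix itself: struct
link_fields, union link_word and the helper names are hypothetical stand-ins,
and the field layout is an assumption chosen so that the struct is exactly
8 bytes. The point is only that reading the 64-bit word through a union
member is defined in C, whereas casting a struct pointer to uint64_t * and
dereferencing it violates the strict aliasing rules.

```c
#include <stdint.h>

/* Hypothetical 8-byte link record; not the real rte_eth_link layout. */
struct link_fields {
	uint32_t speed;
	uint16_t duplex;
	uint16_t status;
};

/* Overlay the record with a single 64-bit word for whole-value access. */
union link_word {
	uint64_t val64;
	struct link_fields f;
};

/* Build the 64-bit word from individual fields without pointer casts. */
uint64_t
link_pack(uint32_t speed, uint16_t duplex, uint16_t status)
{
	union link_word w = { .val64 = 0 };

	w.f.speed = speed;
	w.f.duplex = duplex;
	w.f.status = status;
	return w.val64;
}

/* Recover one field from the 64-bit word, again through the union. */
uint32_t
link_speed_of(uint64_t raw)
{
	union link_word w = { .val64 = raw };

	return w.f.speed;
}

uint16_t
link_status_of(uint64_t raw)
{
	union link_word w = { .val64 = raw };

	return w.f.status;
}
```

Because the union has the same size as the struct it wraps in this sketch,
a change of this kind leaves the object size unchanged, which is consistent
with the expectation above that the ABI is not affected when the union is
introduced carefully.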


* Re: Strict aliasing problem with rte_eth_linkstatus_set()
  2024-04-10 19:58  3%     ` Tyler Retzlaff
@ 2024-04-11  3:20  0%       ` fengchengwen
  0 siblings, 0 replies; 200+ results
From: fengchengwen @ 2024-04-11  3:20 UTC (permalink / raw)
  To: Tyler Retzlaff, Morten Brørup
  Cc: Stephen Hemminger, Ferruh Yigit, dev, Dengdui Huang

Hi All,

On 2024/4/11 3:58, Tyler Retzlaff wrote:
> On Wed, Apr 10, 2024 at 07:54:27PM +0200, Morten Brørup wrote:
>>> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
>>> Sent: Wednesday, 10 April 2024 17.27
>>>
>>> On Wed, 10 Apr 2024 17:33:53 +0800
>>> fengchengwen <fengchengwen@huawei.com> wrote:
>>>
>>>> Last: We think there are two ways to solve this problem.
>>>> 1. Add the compilation option '-fno-strict-aliasing' for the whole DPDK
>>> project.
>>>> 2. Use union to avoid such aliasing in rte_eth_linkstatus_set (please
>>> see above).
>>>> PS: We prefer first way.
>>>>
>>>
>>> Please send a patch to replace alias with union.
>>
>> +1
>>
>> Fixing this specific bug would be good.

OK for this,
and I have already sent a bugfix which uses a union.

Thanks

>>
>> Instinctively, I think we should build with -fno-strict-aliasing, so the compiler doesn't make the same mistake with similar code elsewhere in DPDK. I fear there is more than this instance.
>> I also wonder if -Wstrict-aliasing could help us instead, if we don't want -fno-strict-aliasing.
> 
> agree, union is the correct way to get defined behavior. there are
> valuable optimizations that the compiler can make with strict aliasing
> enabled so -Wstrict-aliasing is a good suggestion as opposed to
> disabling it.
> 
> also the union won't break the abi if introduced correctly.
> .
> 


* [PATCH 0/3] cryptodev: add API to get used queue pair depth
@ 2024-04-11  8:22  3% Akhil Goyal
  2024-04-12 11:57  3% ` [PATCH v2 " Akhil Goyal
  0 siblings, 1 reply; 200+ results
From: Akhil Goyal @ 2024-04-11  8:22 UTC (permalink / raw)
  To: dev
  Cc: thomas, david.marchand, hemant.agrawal, anoobj,
	pablo.de.lara.guarch, fiona.trahe, declan.doherty, matan,
	g.singh, fanzhang.oss, jianjay.zhou, asomalap, ruifeng.wang,
	konstantin.v.ananyev, radu.nicolau, ajit.khaparde, rnagadheeraj,
	ciara.power, Akhil Goyal

Added a new fast path API to get the used depth of a crypto device
queue pair at any given point.

An implementation in cnxk crypto driver is also added along with
a test case in test app.

The addition of the new API causes an ABI warning.
This is suppressed because the updated struct rte_crypto_fp_ops is
an internal structure that applications must not use directly.

Akhil Goyal (3):
  cryptodev: add API to get used queue pair depth
  crypto/cnxk: support queue pair depth API
  test/crypto: add QP depth used count case

 app/test/test_cryptodev.c                | 117 +++++++++++++++++++++++
 devtools/libabigail.abignore             |   3 +
 drivers/crypto/cnxk/cn10k_cryptodev.c    |   1 +
 drivers/crypto/cnxk/cn9k_cryptodev.c     |   2 +
 drivers/crypto/cnxk/cnxk_cryptodev_ops.c |  15 +++
 drivers/crypto/cnxk/cnxk_cryptodev_ops.h |   2 +
 lib/cryptodev/cryptodev_pmd.c            |   1 +
 lib/cryptodev/cryptodev_pmd.h            |   2 +
 lib/cryptodev/cryptodev_trace_points.c   |   3 +
 lib/cryptodev/rte_cryptodev.h            |  45 +++++++++
 lib/cryptodev/rte_cryptodev_core.h       |   7 +-
 lib/cryptodev/rte_cryptodev_trace_fp.h   |   7 ++
 12 files changed, 204 insertions(+), 1 deletion(-)

-- 
2.25.1



* [PATCH v4 3/3] dts: add API doc generation
  @ 2024-04-12 10:14  2%   ` Juraj Linkeš
  0 siblings, 0 replies; 200+ results
From: Juraj Linkeš @ 2024-04-12 10:14 UTC (permalink / raw)
  To: thomas, Honnappa.Nagarahalli, bruce.richardson, jspewock, probb,
	paul.szczepanek, Luca.Vizzarro, npratte
  Cc: dev, Juraj Linkeš

The tool used to generate DTS API docs is Sphinx, which is already in
use in DPDK. The same configuration is used to preserve style with one
DTS-specific configuration (so that the DPDK docs are unchanged) that
modifies how the sidebar displays the content.

Sphinx generates the documentation from Python docstrings. The docstring
format is the Google format [0] which requires the sphinx.ext.napoleon
extension. The other extension, sphinx.ext.intersphinx, enables linking
to objects in external documentation, such as the Python documentation.

There are two requirements for building DTS docs:
* The same Python version as DTS or higher, because Sphinx imports the
  code.
* Also the same Python packages as DTS, for the same reason.

[0] https://google.github.io/styleguide/pyguide.html#38-comments-and-docstrings

Signed-off-by: Juraj Linkeš <juraj.linkes@pantheon.tech>
Reviewed-by: Jeremy Spewock <jspewock@iol.unh.edu>
Tested-by: Nicholas Pratte <npratte@iol.unh.edu>
---
 buildtools/call-sphinx-build.py | 33 +++++++++++++++++++---------
 doc/api/doxy-api-index.md       |  3 +++
 doc/api/doxy-api.conf.in        |  2 ++
 doc/api/meson.build             | 11 +++++++---
 doc/guides/conf.py              | 39 ++++++++++++++++++++++++++++-----
 doc/guides/meson.build          |  1 +
 doc/guides/tools/dts.rst        | 34 +++++++++++++++++++++++++++-
 dts/doc/meson.build             | 27 +++++++++++++++++++++++
 dts/meson.build                 | 16 ++++++++++++++
 meson.build                     |  1 +
 10 files changed, 148 insertions(+), 19 deletions(-)
 create mode 100644 dts/doc/meson.build
 create mode 100644 dts/meson.build

diff --git a/buildtools/call-sphinx-build.py b/buildtools/call-sphinx-build.py
index 39a60d09fa..aea771a64e 100755
--- a/buildtools/call-sphinx-build.py
+++ b/buildtools/call-sphinx-build.py
@@ -3,37 +3,50 @@
 # Copyright(c) 2019 Intel Corporation
 #
 
+import argparse
 import sys
 import os
 from os.path import join
 from subprocess import run, PIPE, STDOUT
 from packaging.version import Version
 
-# assign parameters to variables
-(sphinx, version, src, dst, *extra_args) = sys.argv[1:]
+parser = argparse.ArgumentParser()
+parser.add_argument('sphinx')
+parser.add_argument('version')
+parser.add_argument('src')
+parser.add_argument('dst')
+parser.add_argument('--dts-root', default=None)
+args, extra_args = parser.parse_known_args()
 
 # set the version in environment for sphinx to pick up
-os.environ['DPDK_VERSION'] = version
+os.environ['DPDK_VERSION'] = args.version
+if args.dts_root:
+    os.environ['DTS_ROOT'] = args.dts_root
 
 # for sphinx version >= 1.7 add parallelism using "-j auto"
-ver = run([sphinx, '--version'], stdout=PIPE,
+ver = run([args.sphinx, '--version'], stdout=PIPE,
           stderr=STDOUT).stdout.decode().split()[-1]
-sphinx_cmd = [sphinx] + extra_args
+sphinx_cmd = [args.sphinx] + extra_args
 if Version(ver) >= Version('1.7'):
     sphinx_cmd += ['-j', 'auto']
 
 # find all the files sphinx will process so we can write them as dependencies
 srcfiles = []
-for root, dirs, files in os.walk(src):
+for root, dirs, files in os.walk(args.src):
     srcfiles.extend([join(root, f) for f in files])
 
+if not os.path.exists(args.dst):
+    os.makedirs(args.dst)
+
 # run sphinx, putting the html output in a "html" directory
-with open(join(dst, 'sphinx_html.out'), 'w') as out:
-    process = run(sphinx_cmd + ['-b', 'html', src, join(dst, 'html')],
-                  stdout=out)
+with open(join(args.dst, 'sphinx_html.out'), 'w') as out:
+    process = run(
+        sphinx_cmd + ['-b', 'html', args.src, join(args.dst, 'html')],
+        stdout=out
+    )
 
 # create a gcc format .d file giving all the dependencies of this doc build
-with open(join(dst, '.html.d'), 'w') as d:
+with open(join(args.dst, '.html.d'), 'w') as d:
     d.write('html: ' + ' '.join(srcfiles) + '\n')
 
 sys.exit(process.returncode)
diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
index 8c1eb8fafa..d5f823b7f0 100644
--- a/doc/api/doxy-api-index.md
+++ b/doc/api/doxy-api-index.md
@@ -243,3 +243,6 @@ The public API headers are grouped by topics:
   [experimental APIs](@ref rte_compat.h),
   [ABI versioning](@ref rte_function_versioning.h),
   [version](@ref rte_version.h)
+
+- **tests**:
+  [**DTS**](@dts_api_main_page)
diff --git a/doc/api/doxy-api.conf.in b/doc/api/doxy-api.conf.in
index 27afec8b3b..2e08c6a452 100644
--- a/doc/api/doxy-api.conf.in
+++ b/doc/api/doxy-api.conf.in
@@ -123,6 +123,8 @@ SEARCHENGINE            = YES
 SORT_MEMBER_DOCS        = NO
 SOURCE_BROWSER          = YES
 
+ALIASES                 = "dts_api_main_page=@DTS_API_MAIN_PAGE@"
+
 EXAMPLE_PATH            = @TOPDIR@/examples
 EXAMPLE_PATTERNS        = *.c
 EXAMPLE_RECURSIVE       = YES
diff --git a/doc/api/meson.build b/doc/api/meson.build
index 5b50692df9..ffc75d7b5a 100644
--- a/doc/api/meson.build
+++ b/doc/api/meson.build
@@ -1,6 +1,7 @@
 # SPDX-License-Identifier: BSD-3-Clause
 # Copyright(c) 2018 Luca Boccassi <bluca@debian.org>
 
+doc_api_build_dir = meson.current_build_dir()
 doxygen = find_program('doxygen', required: get_option('enable_docs'))
 
 if not doxygen.found()
@@ -32,14 +33,18 @@ example = custom_target('examples.dox',
 # set up common Doxygen configuration
 cdata = configuration_data()
 cdata.set('VERSION', meson.project_version())
-cdata.set('API_EXAMPLES', join_paths(dpdk_build_root, 'doc', 'api', 'examples.dox'))
-cdata.set('OUTPUT', join_paths(dpdk_build_root, 'doc', 'api'))
+cdata.set('API_EXAMPLES', join_paths(doc_api_build_dir, 'examples.dox'))
+cdata.set('OUTPUT', doc_api_build_dir)
 cdata.set('TOPDIR', dpdk_source_root)
-cdata.set('STRIP_FROM_PATH', ' '.join([dpdk_source_root, join_paths(dpdk_build_root, 'doc', 'api')]))
+cdata.set('STRIP_FROM_PATH', ' '.join([dpdk_source_root, doc_api_build_dir]))
 cdata.set('WARN_AS_ERROR', 'NO')
 if get_option('werror')
     cdata.set('WARN_AS_ERROR', 'YES')
 endif
+# A local reference must be relative to the main index.html page
+# The path below can't be taken from the DTS meson file as that would
+# require recursive subdir traversal (doc, dts, then doc again)
+cdata.set('DTS_API_MAIN_PAGE', join_paths('..', 'dts', 'html', 'index.html'))
 
 # configure HTML Doxygen run
 html_cdata = configuration_data()
diff --git a/doc/guides/conf.py b/doc/guides/conf.py
index 0f7ff5282d..b442a1f76c 100644
--- a/doc/guides/conf.py
+++ b/doc/guides/conf.py
@@ -7,10 +7,9 @@
 from sphinx import __version__ as sphinx_version
 from os import listdir
 from os import environ
-from os.path import basename
-from os.path import dirname
+from os.path import basename, dirname
 from os.path import join as path_join
-from sys import argv, stderr
+from sys import argv, stderr, path
 
 import configparser
 
@@ -24,6 +23,37 @@
           file=stderr)
     pass
 
+# Napoleon enables the Google format of Python docstrings, used in DTS
+# Intersphinx allows linking to external projects, such as Python docs, also used in DTS
+extensions = ['sphinx.ext.napoleon', 'sphinx.ext.intersphinx']
+
+# DTS Python docstring options
+autodoc_default_options = {
+    'members': True,
+    'member-order': 'bysource',
+    'show-inheritance': True,
+}
+autodoc_class_signature = 'separated'
+autodoc_typehints = 'both'
+autodoc_typehints_format = 'short'
+autodoc_typehints_description_target = 'documented'
+napoleon_numpy_docstring = False
+napoleon_attr_annotations = True
+napoleon_preprocess_types = True
+add_module_names = False
+toc_object_entries = True
+toc_object_entries_show_parents = 'hide'
+intersphinx_mapping = {'python': ('https://docs.python.org/3', None)}
+
+dts_root = environ.get('DTS_ROOT')
+if dts_root:
+    path.append(dts_root)
+    # DTS Sidebar config
+    html_theme_options = {
+        'collapse_navigation': False,
+        'navigation_depth': -1,
+    }
+
 stop_on_error = ('-W' in argv)
 
 project = 'Data Plane Development Kit'
@@ -35,8 +65,7 @@
 html_show_copyright = False
 highlight_language = 'none'
 
-release = environ.setdefault('DPDK_VERSION', "None")
-version = release
+version = environ.setdefault('DPDK_VERSION', "None")
 
 master_doc = 'index'
 
diff --git a/doc/guides/meson.build b/doc/guides/meson.build
index 51f81da2e3..8933d75f6b 100644
--- a/doc/guides/meson.build
+++ b/doc/guides/meson.build
@@ -1,6 +1,7 @@
 # SPDX-License-Identifier: BSD-3-Clause
 # Copyright(c) 2018 Intel Corporation
 
+doc_guides_source_dir = meson.current_source_dir()
 sphinx = find_program('sphinx-build', required: get_option('enable_docs'))
 
 if not sphinx.found()
diff --git a/doc/guides/tools/dts.rst b/doc/guides/tools/dts.rst
index 47b218b2c6..d1c3c2af7a 100644
--- a/doc/guides/tools/dts.rst
+++ b/doc/guides/tools/dts.rst
@@ -280,7 +280,12 @@ and try not to divert much from it.
 The :ref:`DTS developer tools <dts_dev_tools>` will issue warnings
 when some of the basics are not met.
 
-The code must be properly documented with docstrings.
+The API documentation, which is a helpful reference when developing, may be accessed
+in the code directly or generated with the :ref:`API docs build steps <building_api_docs>`.
+When adding new files or modifying the directory structure, the corresponding changes must
+be made to DTS api doc sources in ``dts/doc``.
+
+Speaking of which, the code must be properly documented with docstrings.
 The style must conform to the `Google style
 <https://google.github.io/styleguide/pyguide.html#38-comments-and-docstrings>`_.
 See an example of the style `here
@@ -415,6 +420,33 @@ the DTS code check and format script.
 Refer to the script for usage: ``devtools/dts-check-format.sh -h``.
 
 
+.. _building_api_docs:
+
+Building DTS API docs
+---------------------
+
+To build DTS API docs, install the dependencies with Poetry, then enter its shell:
+
+.. code-block:: console
+
+   poetry install --no-root --with docs
+   poetry shell
+
+The documentation is built using the standard DPDK build system. After executing the meson command
+and entering Poetry's shell, build the documentation with:
+
+.. code-block:: console
+
+   ninja -C build dts-doc
+
+The output is generated in ``build/doc/api/dts/html``.
+
+.. Note::
+
+   Make sure to fix any Sphinx warnings when adding or updating docstrings. Also make sure to run
+   the ``devtools/dts-check-format.sh`` script and address any issues it finds.
+
+
 Configuration Schema
 --------------------
 
diff --git a/dts/doc/meson.build b/dts/doc/meson.build
new file mode 100644
index 0000000000..01b7b51034
--- /dev/null
+++ b/dts/doc/meson.build
@@ -0,0 +1,27 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2023 PANTHEON.tech s.r.o.
+
+sphinx = find_program('sphinx-build', required: false)
+sphinx_apidoc = find_program('sphinx-apidoc', required: false)
+
+if not sphinx.found() or not sphinx_apidoc.found()
+    subdir_done()
+endif
+
+dts_doc_api_build_dir = join_paths(doc_api_build_dir, 'dts')
+
+extra_sphinx_args = ['-E', '-c', doc_guides_source_dir, '--dts-root', dts_dir]
+if get_option('werror')
+    extra_sphinx_args += '-W'
+endif
+
+htmldir = join_paths(get_option('datadir'), 'doc', 'dpdk', 'dts')
+dts_api_html = custom_target('dts_api_html',
+        output: 'html',
+        command: [sphinx_wrapper, sphinx, meson.project_version(),
+            meson.current_source_dir(), dts_doc_api_build_dir, extra_sphinx_args],
+        build_by_default: false,
+        install: get_option('enable_docs'),
+        install_dir: htmldir)
+doc_targets += dts_api_html
+doc_target_names += 'DTS_API_HTML'
diff --git a/dts/meson.build b/dts/meson.build
new file mode 100644
index 0000000000..e8ce0f06ac
--- /dev/null
+++ b/dts/meson.build
@@ -0,0 +1,16 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2023 PANTHEON.tech s.r.o.
+
+doc_targets = []
+doc_target_names = []
+dts_dir = meson.current_source_dir()
+
+subdir('doc')
+
+if doc_targets.length() == 0
+    message = 'No docs targets found'
+else
+    message = 'Built docs:'
+endif
+run_target('dts-doc', command: [echo, message, doc_target_names],
+    depends: doc_targets)
diff --git a/meson.build b/meson.build
index 8b248d4505..835973a0ce 100644
--- a/meson.build
+++ b/meson.build
@@ -87,6 +87,7 @@ subdir('app')
 
 # build docs
 subdir('doc')
+subdir('dts')
 
 # build any examples explicitly requested - useful for developers - and
 # install any example code into the appropriate install path
-- 
2.34.1



* [PATCH v2 0/3] cryptodev: add API to get used queue pair depth
  2024-04-11  8:22  3% [PATCH 0/3] cryptodev: add API to get used queue pair depth Akhil Goyal
@ 2024-04-12 11:57  3% ` Akhil Goyal
  2024-05-29 10:43  0%   ` Anoob Joseph
  0 siblings, 1 reply; 200+ results
From: Akhil Goyal @ 2024-04-12 11:57 UTC (permalink / raw)
  To: dev
  Cc: thomas, david.marchand, hemant.agrawal, anoobj,
	pablo.de.lara.guarch, fiona.trahe, declan.doherty, matan,
	g.singh, fanzhang.oss, jianjay.zhou, asomalap, ruifeng.wang,
	konstantin.v.ananyev, radu.nicolau, ajit.khaparde, rnagadheeraj,
	ciara.power, Akhil Goyal

Added a new fast path API to get the used depth of a crypto device
queue pair at any given point.

An implementation in cnxk crypto driver is also added along with
a test case in test app.

The addition of the new API causes an ABI warning.
This is suppressed because the updated struct rte_crypto_fp_ops is
an internal structure that applications must not use directly.

v2: fixed shared and clang build issues.

Akhil Goyal (3):
  cryptodev: add API to get used queue pair depth
  crypto/cnxk: support queue pair depth API
  test/crypto: add QP depth used count case

 app/test/test_cryptodev.c                | 117 +++++++++++++++++++++++
 devtools/libabigail.abignore             |   3 +
 drivers/crypto/cnxk/cn10k_cryptodev.c    |   1 +
 drivers/crypto/cnxk/cn9k_cryptodev.c     |   2 +
 drivers/crypto/cnxk/cnxk_cryptodev_ops.c |  16 ++++
 drivers/crypto/cnxk/cnxk_cryptodev_ops.h |   2 +
 lib/cryptodev/cryptodev_pmd.c            |   1 +
 lib/cryptodev/cryptodev_pmd.h            |   2 +
 lib/cryptodev/cryptodev_trace_points.c   |   3 +
 lib/cryptodev/rte_cryptodev.h            |  45 +++++++++
 lib/cryptodev/rte_cryptodev_core.h       |   7 +-
 lib/cryptodev/rte_cryptodev_trace_fp.h   |   7 ++
 lib/cryptodev/version.map                |   3 +
 13 files changed, 208 insertions(+), 1 deletion(-)

-- 
2.25.1



* Community CI Meeting Minutes - April 18, 2024
@ 2024-04-18 17:49  3% Patrick Robb
  0 siblings, 0 replies; 200+ results
From: Patrick Robb @ 2024-04-18 17:49 UTC (permalink / raw)
  To: ci; +Cc: dev, dts

April 18, 2024

#####################################################################
Attendees
1. Patrick Robb
2. Paul Szczepanek
3. Juraj Linkeš
4. Aaron Conole
5. Ali Alnubani

#####################################################################
Minutes

=====================================================================
General Announcements
* GB is wrapping up voting on the UNH Lab server refresh proposal -
should have more info on this by end of week
   * Patrick Robb to share the list of current servers and servers to be
acquired with Paul
* UNH lab is working on updates to get_reruns.py for retests v2, and
will upstream this when ready.
   * UNH will also start pre-populating all environments with PENDING,
and then overwriting those as new results come in.
   * Reminder - Final conclusion on policy is:
      * A) If retest is requested without rebase key, then retest
"original" dpdk artifact (either by re-using the existing tarball (unh
lab) or tracking the commit from submit time and re-applying onto dpdk
at that commit (loongson)).
      * B) If rebase key is included, apply to tip of the indicated
branch. If, because the branch has changed, the patch no longer
applies, then we can report an apply failure. Then, submitter has to
refactor their patch and resubmit.
      * In either case, report the new results with an updated test
result in the email (i.e. report "_Testing PASS RETEST #1" instead of
"_Testing PASS" in the email body).

=====================================================================
CI Status

---------------------------------------------------------------------
UNH-IOL Community Lab
* ABI binaries got sent to Dodji Seketeli after some abi fails came
in, and he confirmed moving to libabigail 2.4 resolves the issue. Cody
Cheng is working on this now.
   * To be submitted to upstream template engine:
https://git.dpdk.org/tools/dpdk-ci/tree/containers/template_engine
* SPDK: Working on these compile jobs
   * Currently compile with:
      * Ubuntu 22.04
      * Debian 11
      * Debian 12
      * CentOS 8
      * CentOS 9
      * Fedora 37
      * Fedora 38
      * Fedora 39
      * Opensuse-Leap 15 but with a warning
   * Cannot compile with:
      * Rhel 8
      * Rhel 9
      * SPDK docs state rhel is “best effort”
   * Questions:
      * Should we run with werror enabled?
      * What versions of SPDK do we test?
      * What versions of DPDK do we test SPDK against?
   * Unit tests pass with the distros which are compiling
   * UPDATE: We are polling SPDK people on their Slack, but current
plan is to bring testing online for only distros which work with
werror compile currently. So, not RHEL, no Opensuse.
* OvS DPDK testing:
   * OvS compile still passing on some distros but failing on others -
Adam is going to circle back on this when he gets time
      * Submit tickets for any outstanding issues
      * Bring ovs compile testing online
   * Plans for performance testing are still pending Aaron & David discussing
* Code coverage for fast tests is now running in CI, 1x per month. You
can download the latest reports here:
https://dpdkdashboard.iol.unh.edu/results/dashboard/code-coverage
   * Open out/coveragereport/index.html
   * Do we need code coverage reports for the other unit test suites?
(not just fast test)
      * UNH to dry run this, share results
* NVIDIA: Gal has offered to send two CX7 NICs to the UNH lab. This
should allow us to install two CX7 NICs on the DUT, and start
forwarding between the two NICs.
* Pcapng_autotest
   * UNH has some spurious failures reported to patchwork for Debian
12. Need to reconnect with Stephen to debug this further.
* Updating Coverity binaries at UNH

---------------------------------------------------------------------
Intel Lab
* Patrick pinged John M again about a lab contact

---------------------------------------------------------------------
Github Actions
* No new updates

---------------------------------------------------------------------
Loongarch Lab
* None

=====================================================================
DTS Improvements & Test Development
* API docs generation:
   * Reviews are needed for this. Need an ACK from
bruce.richardson@intel.com now that there are some new changes on the
meson side
   * Thomas wants to link DTS api docs from doxygen from dpdk docs
   * UNH folks should provide a review
* Jeremy is switching back to DTS next week, and will be working more
on the 2nd scatter case for MLNX, which will rely on the capabilities
querying (and testcase skipping) patch. Will provide feedback to Juraj
on that patch soon.
* Hugepages patch is updated based on feedback from Morten, but
essentially the same (in approach) as last week.

=====================================================================
Any other business
* DPDK Summit in Montreal will now be late September. This plan is
still being finalized.
* Next Meeting: May 1, 2024

^ permalink raw reply	[relevance 3%]

* Re: [PATCH v7 0/5] app/testpmd: support multiple process attach and detach port
  @ 2024-04-23 11:17  0%   ` lihuisong (C)
  0 siblings, 0 replies; 200+ results
From: lihuisong (C) @ 2024-04-23 11:17 UTC (permalink / raw)
  To: dev, thomas, ferruh.yigit
  Cc: andrew.rybchenko, fengchengwen, liudongdong3, liuyonglong

Hi Ferruh and Thomas,

It's been almost two years since this issue was reported.
We have discussed a lot before, and also made some progress and consensus.
Can you take a look at it again?  Looking forward to your reply.

BR/
Huisong


On 2024/1/30 14:36, Huisong Li wrote:
> This patchset fixes some bugs and supports attaching and detaching ports
> in primary and secondary processes.
>
> ---
>   -v7: fix conflicts
>   -v6: adjust rte_eth_dev_is_used position based on alphabetical order
>        in version.map
>   -v5: move 'ALLOCATED' state to the back of 'REMOVED' to avoid abi break.
>   -v4: fix a misspelling.
>   -v3:
>     #1 merge patch 1/6 and patch 2/6 into patch 1/5, and add modification
>        for other bus type.
>     #2 add a RTE_ETH_DEV_ALLOCATED state in rte_eth_dev_state to resolve
>        the problem in patch 2/5.
>   -v2: resend due to CI unexplained failure.
>
> Huisong Li (5):
>    drivers/bus: restore driver assignment at front of probing
>    ethdev: fix skip valid port in probing callback
>    app/testpmd: check the validity of the port
>    app/testpmd: add attach and detach port for multiple process
>    app/testpmd: stop forwarding in new or destroy event
>
>   app/test-pmd/testpmd.c                   | 47 +++++++++++++++---------
>   app/test-pmd/testpmd.h                   |  1 -
>   drivers/bus/auxiliary/auxiliary_common.c |  9 ++++-
>   drivers/bus/dpaa/dpaa_bus.c              |  9 ++++-
>   drivers/bus/fslmc/fslmc_bus.c            |  8 +++-
>   drivers/bus/ifpga/ifpga_bus.c            | 12 ++++--
>   drivers/bus/pci/pci_common.c             |  9 ++++-
>   drivers/bus/vdev/vdev.c                  | 10 ++++-
>   drivers/bus/vmbus/vmbus_common.c         |  9 ++++-
>   drivers/net/bnxt/bnxt_ethdev.c           |  3 +-
>   drivers/net/bonding/bonding_testpmd.c    |  1 -
>   drivers/net/mlx5/mlx5.c                  |  2 +-
>   lib/ethdev/ethdev_driver.c               | 13 +++++--
>   lib/ethdev/ethdev_driver.h               | 12 ++++++
>   lib/ethdev/ethdev_pci.h                  |  2 +-
>   lib/ethdev/rte_class_eth.c               |  2 +-
>   lib/ethdev/rte_ethdev.c                  |  4 +-
>   lib/ethdev/rte_ethdev.h                  |  4 +-
>   lib/ethdev/version.map                   |  1 +
>   19 files changed, 114 insertions(+), 44 deletions(-)
>

^ permalink raw reply	[relevance 0%]

* getting rid of type argument to rte_malloc().
@ 2024-04-24  4:08  3% Stephen Hemminger
  2024-04-24 10:29  0% ` Ferruh Yigit
  0 siblings, 1 reply; 200+ results
From: Stephen Hemminger @ 2024-04-24  4:08 UTC (permalink / raw)
  To: dev

For the 24.11 release, I want to remove the unused type string argument
that shows up in rte_malloc() and related functions, then percolates down
through.  It was an idea in the 1.0 release of DPDK, never implemented and
never removed.  Yes it will cause API breakage, a large sweeping change;
probably easily scripted with coccinelle.

Maybe doing ABI version now?

^ permalink raw reply	[relevance 3%]

* Re: getting rid of type argument to rte_malloc().
  2024-04-24  4:08  3% getting rid of type argument to rte_malloc() Stephen Hemminger
@ 2024-04-24 10:29  0% ` Ferruh Yigit
  2024-04-24 16:23  0%   ` Stephen Hemminger
  2024-04-24 19:06  0%   ` Stephen Hemminger
  0 siblings, 2 replies; 200+ results
From: Ferruh Yigit @ 2024-04-24 10:29 UTC (permalink / raw)
  To: Stephen Hemminger, dev

On 4/24/2024 5:08 AM, Stephen Hemminger wrote:
> For the 24.11 release, I want to remove the unused type string argument
> that shows up in rte_malloc() and related functions, then percolates down
> through.  It was a idea in the 1.0 release of DPDK, never implemented and
> never removed.  Yes it will cause API breakage, a large sweeping change;
> probably easily scripted with coccinelle.
> 
> Maybe doing ABI version now?
>

Won't this impact many applications? Is there a big enough motivation to
force many DPDK applications to update their code? Living with it looks
simpler.


^ permalink raw reply	[relevance 0%]

* Minutes of DPDK Technical Board Meeting, 2024-04-03
@ 2024-04-24 15:24  3% Thomas Monjalon
  2024-04-24 17:25  3% ` Morten Brørup
  0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2024-04-24 15:24 UTC (permalink / raw)
  To: dev; +Cc: techboard

Members Attending: 10/11
	- Aaron Conole
	- Bruce Richardson
	- Hemant Agrawal
	- Honnappa Nagarahalli
	- Kevin Traynor
	- Konstantin Ananyev
	- Maxime Coquelin
	- Morten Brørup
	- Stephen Hemminger
	- Thomas Monjalon (Chair)

NOTE: The Technical Board meetings take place every second Wednesday at 3 pm UTC
on https://zoom-lfx.platform.linuxfoundation.org/meeting/96459488340?password=d808f1f6-0a28-4165-929e-5a5bcae7efeb
Meetings are public, and DPDK community members are welcome to attend.
Agenda and minutes can be found at http://core.dpdk.org/techboard/minutes


1/ MSVC

Work to be able to compile DPDK with MSVC is progressing.

Regarding the tooling, UNH CI is testing MSVC in Windows Server 2022 job.
There was an ask for GHA job building with MSVC.
Example:
	https://github.com/danielzsh/spark/blob/master/.github/workflows/compile.yml

We should not break MSVC compilation for enabled libraries.
When creating a new library, we should require that it allows building with MSVC where it makes sense.
Some guidelines could be added in doc/guides/contributing/design.rst


2/ function inlining

There are pros and cons for function inlining.

There should not be inlining in control path functions.
Inlining should be avoided in public headers because of ABI compatibility issue
and structures being exported because of inline requirement.

Inlining should be used with care, with benchmarks as a proof of efficiency.
Having too much inlining will have a drawback on instruction cache,
that's why we should justify any new usage of inline.

Note that the same recommendations apply with the use of prefetch and likely/unlikely.
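
The ABI concern with inlining in public headers can be illustrated with a
small sketch. It is only an illustration under assumed names: `struct counter`,
`counter_add_inline()` and `counter_add()` are hypothetical and are not DPDK
APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical library object; not a DPDK structure. */
struct counter {
	uint64_t packets;
	uint64_t bytes;
};

/*
 * Inline variant: the body must live in the public header, so the
 * layout of struct counter is exported. The field offsets end up
 * compiled into every application that calls it.
 */
static inline void
counter_add_inline(struct counter *c, uint64_t bytes)
{
	c->packets++;
	c->bytes += bytes;
}

/*
 * Out-of-line variant: the public header would only need a forward
 * declaration of struct counter and this prototype. The body stays in
 * the library, so the layout can change without rebuilding callers.
 */
void
counter_add(struct counter *c, uint64_t bytes)
{
	assert(c != NULL);
	c->packets++;
	c->bytes += bytes;
}
```

Both variants do the same work; the difference is what the application binary
depends on. Whether the call overhead of the second form matters is the kind
of question the minutes suggest answering with a benchmark.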



^ permalink raw reply	[relevance 3%]

* Re: getting rid of type argument to rte_malloc().
  2024-04-24 10:29  0% ` Ferruh Yigit
@ 2024-04-24 16:23  0%   ` Stephen Hemminger
  2024-04-24 16:23  0%     ` Stephen Hemminger
  2024-04-24 17:09  0%     ` Morten Brørup
  2024-04-24 19:06  0%   ` Stephen Hemminger
  1 sibling, 2 replies; 200+ results
From: Stephen Hemminger @ 2024-04-24 16:23 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: dev

On Wed, 24 Apr 2024 11:29:51 +0100
Ferruh Yigit <ferruh.yigit@amd.com> wrote:

> On 4/24/2024 5:08 AM, Stephen Hemminger wrote:
> > For the 24.11 release, I want to remove the unused type string argument
> > that shows up in rte_malloc() and related functions, then percolates down
> > through.  It was a idea in the 1.0 release of DPDK, never implemented and
> > never removed.  Yes it will cause API breakage, a large sweeping change;
> > probably easily scripted with coccinelle.
> > 
> > Maybe doing ABI version now?
> >  
> 
> Won't this impact many applications, is there big enough motivation to
> force many DPDK applications to update their code, living with it looks
> simpler.
> 

Yeah, probably too big an impact but at least:
  - change the documentation to say "do not use, should be NULL"
  - add script to remove all usage inside of DPDK
  - get rid of places where useless arg is passed around inside
    of the allocator internals.

^ permalink raw reply	[relevance 0%]

* Re: getting rid of type argument to rte_malloc().
  2024-04-24 16:23  0%   ` Stephen Hemminger
@ 2024-04-24 16:23  0%     ` Stephen Hemminger
  2024-04-24 17:09  0%     ` Morten Brørup
  1 sibling, 0 replies; 200+ results
From: Stephen Hemminger @ 2024-04-24 16:23 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: dev

On Wed, 24 Apr 2024 11:29:51 +0100
Ferruh Yigit <ferruh.yigit@amd.com> wrote:

> On 4/24/2024 5:08 AM, Stephen Hemminger wrote:
> > For the 24.11 release, I want to remove the unused type string argument
> > that shows up in rte_malloc() and related functions, then percolates down
> > through.  It was a idea in the 1.0 release of DPDK, never implemented and
> > never removed.  Yes it will cause API breakage, a large sweeping change;
> > probably easily scripted with coccinelle.
> > 
> > Maybe doing ABI version now?
> >  
> 
> Won't this impact many applications, is there big enough motivation to
> force many DPDK applications to update their code, living with it looks
> simpler.
> 

Yeah, probably too big an impact but at least:
  - change the documentation to say "do not use, should be NULL"
  - add script to remove all usage inside of DPDK
  - get rid of places where useless arg is passed around inside
    of the allocator internals.

^ permalink raw reply	[relevance 0%]

* RE: getting rid of type argument to rte_malloc().
  2024-04-24 16:23  0%   ` Stephen Hemminger
  2024-04-24 16:23  0%     ` Stephen Hemminger
@ 2024-04-24 17:09  0%     ` Morten Brørup
  2024-04-24 19:05  0%       ` Stephen Hemminger
  1 sibling, 1 reply; 200+ results
From: Morten Brørup @ 2024-04-24 17:09 UTC (permalink / raw)
  To: Stephen Hemminger, Ferruh Yigit; +Cc: dev

> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> Sent: Wednesday, 24 April 2024 18.24
> 
> On Wed, 24 Apr 2024 11:29:51 +0100
> Ferruh Yigit <ferruh.yigit@amd.com> wrote:
> 
> > On 4/24/2024 5:08 AM, Stephen Hemminger wrote:
> > > For the 24.11 release, I want to remove the unused type string
> argument
> > > that shows up in rte_malloc() and related functions, then percolates
> down
> > > through.  It was a idea in the 1.0 release of DPDK, never
> implemented and
> > > never removed.  Yes it will cause API breakage, a large sweeping
> change;
> > > probably easily scripted with coccinelle.
> > >
> > > Maybe doing ABI version now?
> > >
> >
> > Won't this impact many applications, is there big enough motivation to
> > force many DPDK applications to update their code, living with it
> looks
> > simpler.
> >
> 
> Yeah, probably too big an impact but at least:
>   - change the documentation to say "do not use" should be NULL
>   - add script to remove all usage inside of DPDK
>   - get rid of places where useless arg is passed around inside
>     of the allocator internals.

For the sake of discussion:
Do we want to get rid of the "name" parameter to the memzone allocation functions too? It's somewhat weird that they differ.

Or are rte_memzone allocations considered init and control path, while rte_malloc allocations are considered fast path?


^ permalink raw reply	[relevance 0%]

* RE: Minutes of DPDK Technical Board Meeting, 2024-04-03
  2024-04-24 15:24  3% Minutes of DPDK Technical Board Meeting, 2024-04-03 Thomas Monjalon
@ 2024-04-24 17:25  3% ` Morten Brørup
  2024-04-24 19:10  0%   ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Morten Brørup @ 2024-04-24 17:25 UTC (permalink / raw)
  To: Thomas Monjalon, dev; +Cc: techboard

> Inlining should be avoided in public headers because of ABI
> compatibility issue
> and structures being exported because of inline requirement.

This sounds like a techboard decision, which I don't think it was.
Suggested wording:

A disadvantage of inlining in public headers is ABI compatibility issues and structures being exported because of inline requirement.


Perhaps I'm being paranoid, and the phrase "should be" already suffices.

Whichever wording you prefer,
ACK


^ permalink raw reply	[relevance 3%]

* Re: getting rid of type argument to rte_malloc().
  2024-04-24 17:09  0%     ` Morten Brørup
@ 2024-04-24 19:05  0%       ` Stephen Hemminger
  0 siblings, 0 replies; 200+ results
From: Stephen Hemminger @ 2024-04-24 19:05 UTC (permalink / raw)
  To: Morten Brørup; +Cc: Ferruh Yigit, dev

On Wed, 24 Apr 2024 19:09:24 +0200
Morten Brørup <mb@smartsharesystems.com> wrote:

> > From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> > Sent: Wednesday, 24 April 2024 18.24
> > 
> > On Wed, 24 Apr 2024 11:29:51 +0100
> > Ferruh Yigit <ferruh.yigit@amd.com> wrote:
> >   
> > > On 4/24/2024 5:08 AM, Stephen Hemminger wrote:  
> > > > For the 24.11 release, I want to remove the unused type string  
> > argument  
> > > > that shows up in rte_malloc() and related functions, then percolates  
> > down  
> > > > through.  It was a idea in the 1.0 release of DPDK, never  
> > implemented and  
> > > > never removed.  Yes it will cause API breakage, a large sweeping  
> > change;  
> > > > probably easily scripted with coccinelle.
> > > >
> > > > Maybe doing ABI version now?
> > > >  
> > >
> > > Won't this impact many applications, is there big enough motivation to
> > > force many DPDK applications to update their code, living with it  
> > looks  
> > > simpler.
> > >  
> > 
> > Yeah, probably too big an impact but at least:
> >   - change the documentation to say "do not use" should be NULL
> >   - add script to remove all usage inside of DPDK
> >   - get rid of places where useless arg is passed around inside
> >     of the allocator internals.  
> 
> For the sake of discussion:
> Do we want to get rid of the "name" parameter to the memzone allocation functions too? It's somewhat weird that they differ.

The name is used by memzone lookup for secondary process etc.

> 
> Or are rte_memzone allocations considered init and control path, while rte_malloc allocations are considered fast path?
> 

Not really.

^ permalink raw reply	[relevance 0%]

* Re: getting rid of type argument to rte_malloc().
  2024-04-24 10:29  0% ` Ferruh Yigit
  2024-04-24 16:23  0%   ` Stephen Hemminger
@ 2024-04-24 19:06  0%   ` Stephen Hemminger
  1 sibling, 0 replies; 200+ results
From: Stephen Hemminger @ 2024-04-24 19:06 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: dev

On Wed, 24 Apr 2024 11:29:51 +0100
Ferruh Yigit <ferruh.yigit@amd.com> wrote:

> On 4/24/2024 5:08 AM, Stephen Hemminger wrote:
> > For the 24.11 release, I want to remove the unused type string argument
> > that shows up in rte_malloc() and related functions, then percolates down
> > through.  It was a idea in the 1.0 release of DPDK, never implemented and
> > never removed.  Yes it will cause API breakage, a large sweeping change;
> > probably easily scripted with coccinelle.
> > 
> > Maybe doing ABI version now?
> >  
> 
> Won't this impact many applications, is there big enough motivation to
> force many DPDK applications to update their code, living with it looks
> simpler.
> 


Something like this script, and fix up the result.

From 13ec14dff523f6e896ab55a17a3c66b45bd90bbc Mon Sep 17 00:00:00 2001
From: Stephen Hemminger <stephen@networkplumber.org>
Date: Wed, 24 Apr 2024 09:39:27 -0700
Subject: [PATCH] devtools/cocci: add script to find unnecessary malloc type

The malloc type argument is unused and should be NULL.
This script finds and fixes those places.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 devtools/cocci/malloc-type.cocci | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)
 create mode 100644 devtools/cocci/malloc-type.cocci

diff --git a/devtools/cocci/malloc-type.cocci b/devtools/cocci/malloc-type.cocci
new file mode 100644
index 0000000000..cd74797ecb
--- /dev/null
+++ b/devtools/cocci/malloc-type.cocci
@@ -0,0 +1,29 @@
+//
+// The type argument to the malloc routines was never
+// implemented and should be NULL.
+// rte_realloc() and rte_realloc_socket() are not listed because
+// their first argument is the pointer to resize, not a type.
+//
+@@
+expression T != NULL;
+expression num, socket, size, align;
+@@
+(
+- rte_malloc(T, size, align)
++ rte_malloc(NULL, size, align)
+|
+- rte_zmalloc(T, size, align)
++ rte_zmalloc(NULL, size, align)
+|
+- rte_calloc(T, num, size, align)
++ rte_calloc(NULL, num, size, align)
+|
+- rte_malloc_socket(T, size, align, socket)
++ rte_malloc_socket(NULL, size, align, socket)
+|
+- rte_zmalloc_socket(T, size, align, socket)
++ rte_zmalloc_socket(NULL, size, align, socket)
+|
+- rte_calloc_socket(T, num, size, align, socket)
++ rte_calloc_socket(NULL, num, size, align, socket)
+)
-- 
2.43.0


^ permalink raw reply	[relevance 0%]

* Re: Minutes of DPDK Technical Board Meeting, 2024-04-03
  2024-04-24 17:25  3% ` Morten Brørup
@ 2024-04-24 19:10  0%   ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2024-04-24 19:10 UTC (permalink / raw)
  To: Morten Brørup; +Cc: dev, techboard

24/04/2024 19:25, Morten Brørup:
> > Inlining should be avoided in public headers because of ABI
> > compatibility issue
> > and structures being exported because of inline requirement.
> 
> This sounds like a techboard decision, which I don't think it was.
> Suggested wording:
> 
> A disadvantage of inlining in public headers is ABI compatibility issues and structures being exported because of inline requirement.
> 
> 
> Perhaps I'm being paranoid, and the phrase "should be" already suffices.
> 
> Whichever wording you prefer,
> ACK

This is the final report sent to dev@dpdk.org :)
Yes, I think the word "should" reflects what was said
during the meeting without any formal vote.



^ permalink raw reply	[relevance 0%]

* [PATCH v9 5/9] net/tap: rewrite the RSS BPF program
  @ 2024-04-26 15:48  2%   ` Stephen Hemminger
  0 siblings, 0 replies; 200+ results
From: Stephen Hemminger @ 2024-04-26 15:48 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

Rewrite the BPF program used to do queue based RSS.
Important changes:
	- uses newer BPF map format BTF
	- accepts key as parameter rather than constant default
	- can do L3 or L4 hashing
	- supports IPv4 options
	- supports IPv6 extension headers
	- restructured for readability

The usage of BPF is different as well:
	- the incoming configuration is looked up based on
	  class parameters rather than patching the BPF.
	- the resulting queue is placed in skb rather
	  than requiring a second pass through classifier step.
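
The README added by this patch limits the hash to the standard Toeplitz hash
with a 40 byte key. As a rough illustration of that algorithm (not the code
in tap_rss.c), a minimal C sketch follows. The function name
`toeplitz_hash` and its signature are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Generic Toeplitz hash: for every set bit of the input (most
 * significant bit first), XOR in the 32-bit window of the key that
 * starts at the same bit position.
 */
static uint32_t
toeplitz_hash(const uint8_t *key, size_t key_len,
	      const uint8_t *data, size_t data_len)
{
	uint32_t hash = 0;
	uint32_t window;
	size_t i;
	int bit;

	/* The window reads 4 bytes ahead of the current input byte. */
	assert(key_len >= data_len + 4);

	window = ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) |
		 ((uint32_t)key[2] << 8) | (uint32_t)key[3];

	for (i = 0; i < data_len; i++) {
		for (bit = 7; bit >= 0; bit--) {
			if (data[i] & (1u << bit))
				hash ^= window;
			/* Slide the window one bit along the key. */
			window <<= 1;
			if (key[i + 4] & (1u << bit))
				window |= 1;
		}
	}
	return hash;
}
```

In the usual RSS definition, the input for an L3 hash is the source address
followed by the destination address, and an L4 hash appends the source and
destination ports. With a 40 byte key this sketch accepts at most 36 bytes of
input.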

Note: This version only works with a later patch that enables it on
the DPDK driver side. It is submitted as an incremental patch
to allow for easier review. Bisection still works because
the old instructions are still present for now.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 .gitignore                            |   3 -
 drivers/net/tap/bpf/Makefile          |  19 --
 drivers/net/tap/bpf/README            |  38 ++++
 drivers/net/tap/bpf/bpf_api.h         | 276 --------------------------
 drivers/net/tap/bpf/bpf_elf.h         |  53 -----
 drivers/net/tap/bpf/bpf_extract.py    |  85 --------
 drivers/net/tap/bpf/meson.build       |  81 ++++++++
 drivers/net/tap/bpf/tap_bpf_program.c | 255 ------------------------
 drivers/net/tap/bpf/tap_rss.c         | 264 ++++++++++++++++++++++++
 9 files changed, 383 insertions(+), 691 deletions(-)
 delete mode 100644 drivers/net/tap/bpf/Makefile
 create mode 100644 drivers/net/tap/bpf/README
 delete mode 100644 drivers/net/tap/bpf/bpf_api.h
 delete mode 100644 drivers/net/tap/bpf/bpf_elf.h
 delete mode 100644 drivers/net/tap/bpf/bpf_extract.py
 create mode 100644 drivers/net/tap/bpf/meson.build
 delete mode 100644 drivers/net/tap/bpf/tap_bpf_program.c
 create mode 100644 drivers/net/tap/bpf/tap_rss.c

diff --git a/.gitignore b/.gitignore
index 3f444dcace..01a47a7606 100644
--- a/.gitignore
+++ b/.gitignore
@@ -36,9 +36,6 @@ TAGS
 # ignore python bytecode files
 *.pyc
 
-# ignore BPF programs
-drivers/net/tap/bpf/tap_bpf_program.o
-
 # DTS results
 dts/output
 
diff --git a/drivers/net/tap/bpf/Makefile b/drivers/net/tap/bpf/Makefile
deleted file mode 100644
index 9efeeb1bc7..0000000000
--- a/drivers/net/tap/bpf/Makefile
+++ /dev/null
@@ -1,19 +0,0 @@
-# SPDX-License-Identifier: BSD-3-Clause
-# This file is not built as part of normal DPDK build.
-# It is used to generate the eBPF code for TAP RSS.
-
-CLANG=clang
-CLANG_OPTS=-O2
-TARGET=../tap_bpf_insns.h
-
-all: $(TARGET)
-
-clean:
-	rm tap_bpf_program.o $(TARGET)
-
-tap_bpf_program.o: tap_bpf_program.c
-	$(CLANG) $(CLANG_OPTS) -emit-llvm -c $< -o - | \
-	llc -march=bpf -filetype=obj -o $@
-
-$(TARGET): tap_bpf_program.o
-	python3 bpf_extract.py -stap_bpf_program.c -o $@ $<
diff --git a/drivers/net/tap/bpf/README b/drivers/net/tap/bpf/README
new file mode 100644
index 0000000000..1d421ff42c
--- /dev/null
+++ b/drivers/net/tap/bpf/README
@@ -0,0 +1,38 @@
+This is the BPF program used to implement the RSS across queues flow action.
+The program is loaded when the first RSS flow rule is created and is never unloaded.
+
+Each flow rule creates a unique key (handle) and this is used as the key
+for finding the RSS information for that flow rule.
+
+This version is built with the BPF Compile Once - Run Everywhere (CO-RE)
+framework and uses libbpf and bpftool.
+
+Limitations
+-----------
+- requires libbpf to run
+- rebuilding the BPF requires Clang and bpftool.
+  Some older versions of Ubuntu do not have working bpftool package.
+  Need a version of Clang that can compile to BPF.
+- only standard Toeplitz hash with standard 40 byte key is supported
+- the number of flow rules using RSS is limited to 32
+
+Building
+--------
+During the DPDK build process the meson build file checks that
+libbpf, bpftool, and clang are available. If everything is
+there then BPF RSS is enabled.
+
+1. Use clang to compile tap_rss.c into the tap_rss.bpf.o file.
+
+2. Use bpftool to generate a skeleton header file tap_rss.skel.h from tap_rss.bpf.o.
+   This skeleton header is a large byte array which contains the
+   BPF binary and wrappers to load and use it.
+
+3. The tap flow code then compiles that BPF byte array into the PMD object.
+
+4. When needed the BPF array is loaded by libbpf.
+
+References
+----------
+BPF and XDP reference guide
+https://docs.cilium.io/en/latest/bpf/progtypes/
diff --git a/drivers/net/tap/bpf/bpf_api.h b/drivers/net/tap/bpf/bpf_api.h
deleted file mode 100644
index 4cd25fa593..0000000000
--- a/drivers/net/tap/bpf/bpf_api.h
+++ /dev/null
@@ -1,276 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 or BSD-3-Clause */
-
-#ifndef __BPF_API__
-#define __BPF_API__
-
-/* Note:
- *
- * This file can be included into eBPF kernel programs. It contains
- * a couple of useful helper functions, map/section ABI (bpf_elf.h),
- * misc macros and some eBPF specific LLVM built-ins.
- */
-
-#include <stdint.h>
-
-#include <linux/pkt_cls.h>
-#include <linux/bpf.h>
-#include <linux/filter.h>
-
-#include <asm/byteorder.h>
-
-#include "bpf_elf.h"
-
-/** libbpf pin type. */
-enum libbpf_pin_type {
-	LIBBPF_PIN_NONE,
-	/* PIN_BY_NAME: pin maps by name (in /sys/fs/bpf by default) */
-	LIBBPF_PIN_BY_NAME,
-};
-
-/** Type helper macros. */
-
-#define __uint(name, val) int (*name)[val]
-#define __type(name, val) typeof(val) *name
-#define __array(name, val) typeof(val) *name[]
-
-/** Misc macros. */
-
-#ifndef __stringify
-# define __stringify(X)		#X
-#endif
-
-#ifndef __maybe_unused
-# define __maybe_unused		__attribute__((__unused__))
-#endif
-
-#ifndef offsetof
-# define offsetof(TYPE, MEMBER)	__builtin_offsetof(TYPE, MEMBER)
-#endif
-
-#ifndef likely
-# define likely(X)		__builtin_expect(!!(X), 1)
-#endif
-
-#ifndef unlikely
-# define unlikely(X)		__builtin_expect(!!(X), 0)
-#endif
-
-#ifndef htons
-# define htons(X)		__constant_htons((X))
-#endif
-
-#ifndef ntohs
-# define ntohs(X)		__constant_ntohs((X))
-#endif
-
-#ifndef htonl
-# define htonl(X)		__constant_htonl((X))
-#endif
-
-#ifndef ntohl
-# define ntohl(X)		__constant_ntohl((X))
-#endif
-
-#ifndef __inline__
-# define __inline__		__attribute__((always_inline))
-#endif
-
-/** Section helper macros. */
-
-#ifndef __section
-# define __section(NAME)						\
-	__attribute__((section(NAME), used))
-#endif
-
-#ifndef __section_tail
-# define __section_tail(ID, KEY)					\
-	__section(__stringify(ID) "/" __stringify(KEY))
-#endif
-
-#ifndef __section_xdp_entry
-# define __section_xdp_entry						\
-	__section(ELF_SECTION_PROG)
-#endif
-
-#ifndef __section_cls_entry
-# define __section_cls_entry						\
-	__section(ELF_SECTION_CLASSIFIER)
-#endif
-
-#ifndef __section_act_entry
-# define __section_act_entry						\
-	__section(ELF_SECTION_ACTION)
-#endif
-
-#ifndef __section_lwt_entry
-# define __section_lwt_entry						\
-	__section(ELF_SECTION_PROG)
-#endif
-
-#ifndef __section_license
-# define __section_license						\
-	__section(ELF_SECTION_LICENSE)
-#endif
-
-#ifndef __section_maps
-# define __section_maps							\
-	__section(ELF_SECTION_MAPS)
-#endif
-
-/** Declaration helper macros. */
-
-#ifndef BPF_LICENSE
-# define BPF_LICENSE(NAME)						\
-	char ____license[] __section_license = NAME
-#endif
-
-/** Classifier helper */
-
-#ifndef BPF_H_DEFAULT
-# define BPF_H_DEFAULT	-1
-#endif
-
-/** BPF helper functions for tc. Individual flags are in linux/bpf.h */
-
-#ifndef __BPF_FUNC
-# define __BPF_FUNC(NAME, ...)						\
-	(* NAME)(__VA_ARGS__) __maybe_unused
-#endif
-
-#ifndef BPF_FUNC
-# define BPF_FUNC(NAME, ...)						\
-	__BPF_FUNC(NAME, __VA_ARGS__) = (void *) BPF_FUNC_##NAME
-#endif
-
-/* Map access/manipulation */
-static void *BPF_FUNC(map_lookup_elem, void *map, const void *key);
-static int BPF_FUNC(map_update_elem, void *map, const void *key,
-		    const void *value, uint32_t flags);
-static int BPF_FUNC(map_delete_elem, void *map, const void *key);
-
-/* Time access */
-static uint64_t BPF_FUNC(ktime_get_ns);
-
-/* Debugging */
-
-/* FIXME: __attribute__ ((format(printf, 1, 3))) not possible unless
- * llvm bug https://llvm.org/bugs/show_bug.cgi?id=26243 gets resolved.
- * It would require ____fmt to be made const, which generates a reloc
- * entry (non-map).
- */
-static void BPF_FUNC(trace_printk, const char *fmt, int fmt_size, ...);
-
-#ifndef printt
-# define printt(fmt, ...)						\
-	__extension__ ({						\
-		char ____fmt[] = fmt;					\
-		trace_printk(____fmt, sizeof(____fmt), ##__VA_ARGS__);	\
-	})
-#endif
-
-/* Random numbers */
-static uint32_t BPF_FUNC(get_prandom_u32);
-
-/* Tail calls */
-static void BPF_FUNC(tail_call, struct __sk_buff *skb, void *map,
-		     uint32_t index);
-
-/* System helpers */
-static uint32_t BPF_FUNC(get_smp_processor_id);
-static uint32_t BPF_FUNC(get_numa_node_id);
-
-/* Packet misc meta data */
-static uint32_t BPF_FUNC(get_cgroup_classid, struct __sk_buff *skb);
-static int BPF_FUNC(skb_under_cgroup, void *map, uint32_t index);
-
-static uint32_t BPF_FUNC(get_route_realm, struct __sk_buff *skb);
-static uint32_t BPF_FUNC(get_hash_recalc, struct __sk_buff *skb);
-static uint32_t BPF_FUNC(set_hash_invalid, struct __sk_buff *skb);
-
-/* Packet redirection */
-static int BPF_FUNC(redirect, int ifindex, uint32_t flags);
-static int BPF_FUNC(clone_redirect, struct __sk_buff *skb, int ifindex,
-		    uint32_t flags);
-
-/* Packet manipulation */
-static int BPF_FUNC(skb_load_bytes, struct __sk_buff *skb, uint32_t off,
-		    void *to, uint32_t len);
-static int BPF_FUNC(skb_store_bytes, struct __sk_buff *skb, uint32_t off,
-		    const void *from, uint32_t len, uint32_t flags);
-
-static int BPF_FUNC(l3_csum_replace, struct __sk_buff *skb, uint32_t off,
-		    uint32_t from, uint32_t to, uint32_t flags);
-static int BPF_FUNC(l4_csum_replace, struct __sk_buff *skb, uint32_t off,
-		    uint32_t from, uint32_t to, uint32_t flags);
-static int BPF_FUNC(csum_diff, const void *from, uint32_t from_size,
-		    const void *to, uint32_t to_size, uint32_t seed);
-static int BPF_FUNC(csum_update, struct __sk_buff *skb, uint32_t wsum);
-
-static int BPF_FUNC(skb_change_type, struct __sk_buff *skb, uint32_t type);
-static int BPF_FUNC(skb_change_proto, struct __sk_buff *skb, uint32_t proto,
-		    uint32_t flags);
-static int BPF_FUNC(skb_change_tail, struct __sk_buff *skb, uint32_t nlen,
-		    uint32_t flags);
-
-static int BPF_FUNC(skb_pull_data, struct __sk_buff *skb, uint32_t len);
-
-/* Event notification */
-static int __BPF_FUNC(skb_event_output, struct __sk_buff *skb, void *map,
-		      uint64_t index, const void *data, uint32_t size) =
-		      (void *) BPF_FUNC_perf_event_output;
-
-/* Packet vlan encap/decap */
-static int BPF_FUNC(skb_vlan_push, struct __sk_buff *skb, uint16_t proto,
-		    uint16_t vlan_tci);
-static int BPF_FUNC(skb_vlan_pop, struct __sk_buff *skb);
-
-/* Packet tunnel encap/decap */
-static int BPF_FUNC(skb_get_tunnel_key, struct __sk_buff *skb,
-		    struct bpf_tunnel_key *to, uint32_t size, uint32_t flags);
-static int BPF_FUNC(skb_set_tunnel_key, struct __sk_buff *skb,
-		    const struct bpf_tunnel_key *from, uint32_t size,
-		    uint32_t flags);
-
-static int BPF_FUNC(skb_get_tunnel_opt, struct __sk_buff *skb,
-		    void *to, uint32_t size);
-static int BPF_FUNC(skb_set_tunnel_opt, struct __sk_buff *skb,
-		    const void *from, uint32_t size);
-
-/** LLVM built-ins, mem*() routines work for constant size */
-
-#ifndef lock_xadd
-# define lock_xadd(ptr, val)	((void) __sync_fetch_and_add(ptr, val))
-#endif
-
-#ifndef memset
-# define memset(s, c, n)	__builtin_memset((s), (c), (n))
-#endif
-
-#ifndef memcpy
-# define memcpy(d, s, n)	__builtin_memcpy((d), (s), (n))
-#endif
-
-#ifndef memmove
-# define memmove(d, s, n)	__builtin_memmove((d), (s), (n))
-#endif
-
-/* FIXME: __builtin_memcmp() is not yet fully usable unless llvm bug
- * https://llvm.org/bugs/show_bug.cgi?id=26218 gets resolved. Also
- * this one would generate a reloc entry (non-map), otherwise.
- */
-#if 0
-#ifndef memcmp
-# define memcmp(a, b, n)	__builtin_memcmp((a), (b), (n))
-#endif
-#endif
-
-unsigned long long load_byte(void *skb, unsigned long long off)
-	asm ("llvm.bpf.load.byte");
-
-unsigned long long load_half(void *skb, unsigned long long off)
-	asm ("llvm.bpf.load.half");
-
-unsigned long long load_word(void *skb, unsigned long long off)
-	asm ("llvm.bpf.load.word");
-
-#endif /* __BPF_API__ */
diff --git a/drivers/net/tap/bpf/bpf_elf.h b/drivers/net/tap/bpf/bpf_elf.h
deleted file mode 100644
index ea8a11c95c..0000000000
--- a/drivers/net/tap/bpf/bpf_elf.h
+++ /dev/null
@@ -1,53 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 or BSD-3-Clause */
-#ifndef __BPF_ELF__
-#define __BPF_ELF__
-
-#include <asm/types.h>
-
-/* Note:
- *
- * Below ELF section names and bpf_elf_map structure definition
- * are not (!) kernel ABI. It's rather a "contract" between the
- * application and the BPF loader in tc. For compatibility, the
- * section names should stay as-is. Introduction of aliases, if
- * needed, are a possibility, though.
- */
-
-/* ELF section names, etc */
-#define ELF_SECTION_LICENSE	"license"
-#define ELF_SECTION_MAPS	"maps"
-#define ELF_SECTION_PROG	"prog"
-#define ELF_SECTION_CLASSIFIER	"classifier"
-#define ELF_SECTION_ACTION	"action"
-
-#define ELF_MAX_MAPS		64
-#define ELF_MAX_LICENSE_LEN	128
-
-/* Object pinning settings */
-#define PIN_NONE		0
-#define PIN_OBJECT_NS		1
-#define PIN_GLOBAL_NS		2
-
-/* ELF map definition */
-struct bpf_elf_map {
-	__u32 type;
-	__u32 size_key;
-	__u32 size_value;
-	__u32 max_elem;
-	__u32 flags;
-	__u32 id;
-	__u32 pinning;
-	__u32 inner_id;
-	__u32 inner_idx;
-};
-
-#define BPF_ANNOTATE_KV_PAIR(name, type_key, type_val)		\
-	struct ____btf_map_##name {				\
-		type_key key;					\
-		type_val value;					\
-	};							\
-	struct ____btf_map_##name				\
-	    __attribute__ ((section(".maps." #name), used))	\
-	    ____btf_map_##name = { }
-
-#endif /* __BPF_ELF__ */
diff --git a/drivers/net/tap/bpf/bpf_extract.py b/drivers/net/tap/bpf/bpf_extract.py
deleted file mode 100644
index 73c4dafe4e..0000000000
--- a/drivers/net/tap/bpf/bpf_extract.py
+++ /dev/null
@@ -1,85 +0,0 @@
-#!/usr/bin/env python3
-# SPDX-License-Identifier: BSD-3-Clause
-# Copyright (c) 2023 Stephen Hemminger <stephen@networkplumber.org>
-
-import argparse
-import sys
-import struct
-from tempfile import TemporaryFile
-from elftools.elf.elffile import ELFFile
-
-
-def load_sections(elffile):
-    """Get sections of interest from ELF"""
-    result = []
-    parts = [("cls_q", "cls_q_insns"), ("l3_l4", "l3_l4_hash_insns")]
-    for name, tag in parts:
-        section = elffile.get_section_by_name(name)
-        if section:
-            insns = struct.iter_unpack('<BBhL', section.data())
-            result.append([tag, insns])
-    return result
-
-
-def dump_section(name, insns, out):
-    """Dump the array of BPF instructions"""
-    print(f'\nstatic struct bpf_insn {name}[] = {{', file=out)
-    for bpf in insns:
-        code = bpf[0]
-        src = bpf[1] >> 4
-        dst = bpf[1] & 0xf
-        off = bpf[2]
-        imm = bpf[3]
-        print(f'\t{{{code:#04x}, {dst:4d}, {src:4d}, {off:8d}, {imm:#010x}}},',
-              file=out)
-    print('};', file=out)
-
-
-def parse_args():
-    """Parse command line arguments"""
-    parser = argparse.ArgumentParser()
-    parser.add_argument('-s',
-                        '--source',
-                        type=str,
-                        help="original source file")
-    parser.add_argument('-o', '--out', type=str, help="output C file path")
-    parser.add_argument("file",
-                        nargs='+',
-                        help="object file path or '-' for stdin")
-    return parser.parse_args()
-
-
-def open_input(path):
-    """Open the file or stdin"""
-    if path == "-":
-        temp = TemporaryFile()
-        temp.write(sys.stdin.buffer.read())
-        return temp
-    return open(path, 'rb')
-
-
-def write_header(out, source):
-    """Write file intro header"""
-    print("/* SPDX-License-Identifier: BSD-3-Clause", file=out)
-    if source:
-        print(f' * Auto-generated from {source}', file=out)
-    print(" * This not the original source file. Do NOT edit it.", file=out)
-    print(" */\n", file=out)
-
-
-def main():
-    '''program main function'''
-    args = parse_args()
-
-    with open(args.out, 'w',
-              encoding="utf-8") if args.out else sys.stdout as out:
-        write_header(out, args.source)
-        for path in args.file:
-            elffile = ELFFile(open_input(path))
-            sections = load_sections(elffile)
-            for name, insns in sections:
-                dump_section(name, insns, out)
-
-
-if __name__ == "__main__":
-    main()
diff --git a/drivers/net/tap/bpf/meson.build b/drivers/net/tap/bpf/meson.build
new file mode 100644
index 0000000000..f2c03a19fd
--- /dev/null
+++ b/drivers/net/tap/bpf/meson.build
@@ -0,0 +1,81 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright 2024 Stephen Hemminger <stephen@networkplumber.org>
+
+enable_tap_rss = false
+
+libbpf = dependency('libbpf', required: false, method: 'pkg-config')
+if not libbpf.found()
+    message('net/tap: no RSS support missing libbpf')
+    subdir_done()
+endif
+
+# Debian installs this in /usr/sbin, which is not in $PATH
+bpftool = find_program('bpftool', '/usr/sbin/bpftool', required: false, version: '>= 5.6.0')
+if not bpftool.found()
+    message('net/tap: no RSS support missing bpftool')
+    subdir_done()
+endif
+
+clang_supports_bpf = false
+clang = find_program('clang', required: false)
+if clang.found()
+    clang_supports_bpf = run_command(clang, '-target', 'bpf', '--print-supported-cpus',
+                                     check: false).returncode() == 0
+endif
+
+if not clang_supports_bpf
+    message('net/tap: no RSS support missing clang BPF')
+    subdir_done()
+endif
+
+enable_tap_rss = true
+
+libbpf_include_dir = libbpf.get_variable(pkgconfig : 'includedir')
+
+# The include files <linux/bpf.h> and others include <asm/types.h>
+# but <asm/types.h> is not defined for a multi-lib environment target.
+# Work around this by using the include directory from the host build environment.
+machine_name = run_command('uname', '-m').stdout().strip()
+march_include_dir = '/usr/include/' + machine_name + '-linux-gnu'
+
+clang_flags = [
+    '-O2',
+    '-Wall',
+    '-Wextra',
+    '-target',
+    'bpf',
+    '-g',
+    '-c',
+]
+
+bpf_o_cmd = [
+    clang,
+    clang_flags,
+    '-idirafter',
+    libbpf_include_dir,
+    '-idirafter',
+    march_include_dir,
+    '@INPUT@',
+    '-o',
+    '@OUTPUT@'
+]
+
+skel_h_cmd = [
+    bpftool,
+    'gen',
+    'skeleton',
+    '@INPUT@'
+]
+
+tap_rss_o = custom_target(
+    'tap_rss.bpf.o',
+    input: 'tap_rss.c',
+    output: 'tap_rss.o',
+    command: bpf_o_cmd)
+
+tap_rss_skel_h = custom_target(
+    'tap_rss.skel.h',
+    input: tap_rss_o,
+    output: 'tap_rss.skel.h',
+    command: skel_h_cmd,
+    capture: true)
diff --git a/drivers/net/tap/bpf/tap_bpf_program.c b/drivers/net/tap/bpf/tap_bpf_program.c
deleted file mode 100644
index f05aed021c..0000000000
--- a/drivers/net/tap/bpf/tap_bpf_program.c
+++ /dev/null
@@ -1,255 +0,0 @@
-/* SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0
- * Copyright 2017 Mellanox Technologies, Ltd
- */
-
-#include <stdint.h>
-#include <stdbool.h>
-#include <sys/types.h>
-#include <sys/socket.h>
-#include <asm/types.h>
-#include <linux/in.h>
-#include <linux/if.h>
-#include <linux/if_ether.h>
-#include <linux/ip.h>
-#include <linux/ipv6.h>
-#include <linux/if_tunnel.h>
-#include <linux/filter.h>
-
-#include "bpf_api.h"
-#include "bpf_elf.h"
-#include "../tap_rss.h"
-
-/** Create IPv4 address */
-#define IPv4(a, b, c, d) ((__u32)(((a) & 0xff) << 24) | \
-		(((b) & 0xff) << 16) | \
-		(((c) & 0xff) << 8)  | \
-		((d) & 0xff))
-
-#define PORT(a, b) ((__u16)(((a) & 0xff) << 8) | \
-		((b) & 0xff))
-
-/*
- * The queue number is offset by a unique QUEUE_OFFSET, to distinguish
- * packets that have gone through this rule (skb->cb[1] != 0) from others.
- */
-#define QUEUE_OFFSET		0x7cafe800
-#define PIN_GLOBAL_NS		2
-
-#define KEY_IDX			0
-#define BPF_MAP_ID_KEY	1
-
-struct vlan_hdr {
-	__be16 proto;
-	__be16 tci;
-};
-
-struct bpf_elf_map __attribute__((section("maps"), used))
-map_keys = {
-	.type           =       BPF_MAP_TYPE_HASH,
-	.id             =       BPF_MAP_ID_KEY,
-	.size_key       =       sizeof(__u32),
-	.size_value     =       sizeof(struct rss_key),
-	.max_elem       =       256,
-	.pinning        =       PIN_GLOBAL_NS,
-};
-
-__section("cls_q") int
-match_q(struct __sk_buff *skb)
-{
-	__u32 queue = skb->cb[1];
-	/* queue is set by tap_flow_bpf_cls_q() before load */
-	volatile __u32 q = 0xdeadbeef;
-	__u32 match_queue = QUEUE_OFFSET + q;
-
-	/* printt("match_q$i() queue = %d\n", queue); */
-
-	if (queue != match_queue)
-		return TC_ACT_OK;
-
-	/* queue match */
-	skb->cb[1] = 0;
-	return TC_ACT_UNSPEC;
-}
-
-
-struct ipv4_l3_l4_tuple {
-	__u32    src_addr;
-	__u32    dst_addr;
-	__u16    dport;
-	__u16    sport;
-} __attribute__((packed));
-
-struct ipv6_l3_l4_tuple {
-	__u8        src_addr[16];
-	__u8        dst_addr[16];
-	__u16       dport;
-	__u16       sport;
-} __attribute__((packed));
-
-static const __u8 def_rss_key[TAP_RSS_HASH_KEY_SIZE] = {
-	0xd1, 0x81, 0xc6, 0x2c,
-	0xf7, 0xf4, 0xdb, 0x5b,
-	0x19, 0x83, 0xa2, 0xfc,
-	0x94, 0x3e, 0x1a, 0xdb,
-	0xd9, 0x38, 0x9e, 0x6b,
-	0xd1, 0x03, 0x9c, 0x2c,
-	0xa7, 0x44, 0x99, 0xad,
-	0x59, 0x3d, 0x56, 0xd9,
-	0xf3, 0x25, 0x3c, 0x06,
-	0x2a, 0xdc, 0x1f, 0xfc,
-};
-
-static __u32  __attribute__((always_inline))
-rte_softrss_be(const __u32 *input_tuple, const uint8_t *rss_key,
-		__u8 input_len)
-{
-	__u32 i, j, hash = 0;
-#pragma unroll
-	for (j = 0; j < input_len; j++) {
-#pragma unroll
-		for (i = 0; i < 32; i++) {
-			if (input_tuple[j] & (1U << (31 - i))) {
-				hash ^= ((const __u32 *)def_rss_key)[j] << i |
-				(__u32)((uint64_t)
-				(((const __u32 *)def_rss_key)[j + 1])
-					>> (32 - i));
-			}
-		}
-	}
-	return hash;
-}
-
-static int __attribute__((always_inline))
-rss_l3_l4(struct __sk_buff *skb)
-{
-	void *data_end = (void *)(long)skb->data_end;
-	void *data = (void *)(long)skb->data;
-	__u16 proto = (__u16)skb->protocol;
-	__u32 key_idx = 0xdeadbeef;
-	__u32 hash;
-	struct rss_key *rsskey;
-	__u64 off = ETH_HLEN;
-	int j;
-	__u8 *key = 0;
-	__u32 len;
-	__u32 queue = 0;
-	bool mf = 0;
-	__u16 frag_off = 0;
-
-	rsskey = map_lookup_elem(&map_keys, &key_idx);
-	if (!rsskey) {
-		printt("hash(): rss key is not configured\n");
-		return TC_ACT_OK;
-	}
-	key = (__u8 *)rsskey->key;
-
-	/* Get correct proto for 802.1ad */
-	if (skb->vlan_present && skb->vlan_proto == htons(ETH_P_8021AD)) {
-		if (data + ETH_ALEN * 2 + sizeof(struct vlan_hdr) +
-		    sizeof(proto) > data_end)
-			return TC_ACT_OK;
-		proto = *(__u16 *)(data + ETH_ALEN * 2 +
-				   sizeof(struct vlan_hdr));
-		off += sizeof(struct vlan_hdr);
-	}
-
-	if (proto == htons(ETH_P_IP)) {
-		if (data + off + sizeof(struct iphdr) + sizeof(__u32)
-			> data_end)
-			return TC_ACT_OK;
-
-		__u8 *src_dst_addr = data + off + offsetof(struct iphdr, saddr);
-		__u8 *frag_off_addr = data + off + offsetof(struct iphdr, frag_off);
-		__u8 *prot_addr = data + off + offsetof(struct iphdr, protocol);
-		__u8 *src_dst_port = data + off + sizeof(struct iphdr);
-		struct ipv4_l3_l4_tuple v4_tuple = {
-			.src_addr = IPv4(*(src_dst_addr + 0),
-					*(src_dst_addr + 1),
-					*(src_dst_addr + 2),
-					*(src_dst_addr + 3)),
-			.dst_addr = IPv4(*(src_dst_addr + 4),
-					*(src_dst_addr + 5),
-					*(src_dst_addr + 6),
-					*(src_dst_addr + 7)),
-			.sport = 0,
-			.dport = 0,
-		};
-		/** Fetch the L4-payer port numbers only in-case of TCP/UDP
-		 ** and also if the packet is not fragmented. Since fragmented
-		 ** chunks do not have L4 TCP/UDP header.
-		 **/
-		if (*prot_addr == IPPROTO_UDP || *prot_addr == IPPROTO_TCP) {
-			frag_off = PORT(*(frag_off_addr + 0),
-					*(frag_off_addr + 1));
-			mf = frag_off & 0x2000;
-			frag_off = frag_off & 0x1fff;
-			if (mf == 0 && frag_off == 0) {
-				v4_tuple.sport = PORT(*(src_dst_port + 0),
-						*(src_dst_port + 1));
-				v4_tuple.dport = PORT(*(src_dst_port + 2),
-						*(src_dst_port + 3));
-			}
-		}
-		__u8 input_len = sizeof(v4_tuple) / sizeof(__u32);
-		if (rsskey->hash_fields & (1 << HASH_FIELD_IPV4_L3))
-			input_len--;
-		hash = rte_softrss_be((__u32 *)&v4_tuple, key, 3);
-	} else if (proto == htons(ETH_P_IPV6)) {
-		if (data + off + sizeof(struct ipv6hdr) +
-					sizeof(__u32) > data_end)
-			return TC_ACT_OK;
-		__u8 *src_dst_addr = data + off +
-					offsetof(struct ipv6hdr, saddr);
-		__u8 *src_dst_port = data + off +
-					sizeof(struct ipv6hdr);
-		__u8 *next_hdr = data + off +
-					offsetof(struct ipv6hdr, nexthdr);
-
-		struct ipv6_l3_l4_tuple v6_tuple;
-		for (j = 0; j < 4; j++)
-			*((uint32_t *)&v6_tuple.src_addr + j) =
-				__builtin_bswap32(*((uint32_t *)
-						src_dst_addr + j));
-		for (j = 0; j < 4; j++)
-			*((uint32_t *)&v6_tuple.dst_addr + j) =
-				__builtin_bswap32(*((uint32_t *)
-						src_dst_addr + 4 + j));
-
-		/** Fetch the L4 header port-numbers only if next-header
-		 * is TCP/UDP **/
-		if (*next_hdr == IPPROTO_UDP || *next_hdr == IPPROTO_TCP) {
-			v6_tuple.sport = PORT(*(src_dst_port + 0),
-				      *(src_dst_port + 1));
-			v6_tuple.dport = PORT(*(src_dst_port + 2),
-				      *(src_dst_port + 3));
-		} else {
-			v6_tuple.sport = 0;
-			v6_tuple.dport = 0;
-		}
-
-		__u8 input_len = sizeof(v6_tuple) / sizeof(__u32);
-		if (rsskey->hash_fields & (1 << HASH_FIELD_IPV6_L3))
-			input_len--;
-		hash = rte_softrss_be((__u32 *)&v6_tuple, key, 9);
-	} else {
-		return TC_ACT_PIPE;
-	}
-
-	queue = rsskey->queues[(hash % rsskey->nb_queues) &
-				       (TAP_MAX_QUEUES - 1)];
-	skb->cb[1] = QUEUE_OFFSET + queue;
-	/* printt(">>>>> rss_l3_l4 hash=0x%x queue=%u\n", hash, queue); */
-
-	return TC_ACT_RECLASSIFY;
-}
-
-#define RSS(L)						\
-	__section(#L) int				\
-		L ## _hash(struct __sk_buff *skb)	\
-	{						\
-		return rss_ ## L (skb);			\
-	}
-
-RSS(l3_l4)
-
-BPF_LICENSE("Dual BSD/GPL");
diff --git a/drivers/net/tap/bpf/tap_rss.c b/drivers/net/tap/bpf/tap_rss.c
new file mode 100644
index 0000000000..888b3bdc24
--- /dev/null
+++ b/drivers/net/tap/bpf/tap_rss.c
@@ -0,0 +1,264 @@
+/* SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0
+ * Copyright 2017 Mellanox Technologies, Ltd
+ */
+
+#include <linux/in.h>
+#include <linux/if_ether.h>
+#include <linux/ip.h>
+#include <linux/ipv6.h>
+#include <linux/pkt_cls.h>
+#include <linux/bpf.h>
+
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_endian.h>
+
+#include "../tap_rss.h"
+
+/*
+ * This map provides configuration information about flows which need BPF RSS.
+ *
+ * The hash is indexed by the skb mark.
+ */
+struct {
+	__uint(type, BPF_MAP_TYPE_HASH);
+	__uint(key_size, sizeof(__u32));
+	__uint(value_size, sizeof(struct rss_key));
+	__uint(max_entries, TAP_RSS_MAX);
+} rss_map SEC(".maps");
+
+#define IP_MF		0x2000		/** IP header Flags **/
+#define IP_OFFSET	0x1FFF		/** IP header fragment offset **/
+
+/*
+ * Compute Toeplitz hash over the input tuple.
+ * This is the same as rte_softrss_be in lib/hash,
+ * but the loop needs to be set up to match BPF restrictions.
+ */
+static __u32 __attribute__((always_inline))
+softrss_be(const __u32 *input_tuple, __u32 input_len, const __u32 *key)
+{
+	__u32 i, j, hash = 0;
+
+#pragma unroll
+	for (j = 0; j < input_len; j++) {
+#pragma unroll
+		for (i = 0; i < 32; i++) {
+			if (input_tuple[j] & (1U << (31 - i)))
+				hash ^= key[j] << i | (__u32)((__u64)key[j + 1] >> (32 - i));
+		}
+	}
+	return hash;
+}
+
+/*
+ * Compute RSS hash for IPv4 packet.
+ * Returns 0 if RSS is not specified.
+ */
+static __u32 __attribute__((always_inline))
+parse_ipv4(const struct __sk_buff *skb, __u32 hash_type, const __u32 *key)
+{
+	struct iphdr iph;
+	__u32 off = 0;
+
+	if (bpf_skb_load_bytes_relative(skb, off, &iph, sizeof(iph), BPF_HDR_START_NET))
+		return 0;	/* no IP header present */
+
+	struct {
+		__u32    src_addr;
+		__u32    dst_addr;
+		__u16    dport;
+		__u16    sport;
+	} v4_tuple = {
+		.src_addr = bpf_ntohl(iph.saddr),
+		.dst_addr = bpf_ntohl(iph.daddr),
+	};
+
+	/* If only calculating L3 hash, do it now */
+	if (hash_type & (1 << HASH_FIELD_IPV4_L3))
+		return softrss_be((__u32 *)&v4_tuple, sizeof(v4_tuple) / sizeof(__u32) - 1, key);
+
+	/* If packet is fragmented then no L4 hash is possible */
+	if ((iph.frag_off & bpf_htons(IP_MF | IP_OFFSET)) != 0)
+		return 0;
+
+	/* Do RSS on UDP or TCP protocols */
+	if (iph.protocol == IPPROTO_UDP || iph.protocol == IPPROTO_TCP) {
+		__u16 src_dst_port[2];
+
+		off += iph.ihl * 4;
+		if (bpf_skb_load_bytes_relative(skb, off, &src_dst_port, sizeof(src_dst_port),
+						BPF_HDR_START_NET))
+			return 0; /* TCP or UDP header missing */
+
+		v4_tuple.sport = bpf_ntohs(src_dst_port[0]);
+		v4_tuple.dport = bpf_ntohs(src_dst_port[1]);
+		return softrss_be((__u32 *)&v4_tuple, sizeof(v4_tuple) / sizeof(__u32), key);
+	}
+
+	/* Other protocol */
+	return 0;
+}
+
+/*
+ * Parse IPv6 extension headers and update the offset.
+ * Returns next proto on success, -1 on malformed header.
+ */
+static int __attribute__((always_inline))
+skip_ip6_ext(__u16 proto, const struct __sk_buff *skb, __u32 *off, int *frag)
+{
+	struct ext_hdr {
+		__u8 next_hdr;
+		__u8 len;
+	} xh;
+	unsigned int i;
+
+	*frag = 0;
+
+#define MAX_EXT_HDRS 5
+#pragma unroll
+	for (i = 0; i < MAX_EXT_HDRS; i++) {
+		switch (proto) {
+		case IPPROTO_HOPOPTS:
+		case IPPROTO_ROUTING:
+		case IPPROTO_DSTOPTS:
+			if (bpf_skb_load_bytes_relative(skb, *off, &xh, sizeof(xh),
+							BPF_HDR_START_NET))
+				return -1;
+
+			*off += (xh.len + 1) * 8;
+			proto = xh.next_hdr;
+			break;
+		case IPPROTO_FRAGMENT:
+			if (bpf_skb_load_bytes_relative(skb, *off, &xh, sizeof(xh),
+							BPF_HDR_START_NET))
+				return -1;
+
+			*off += 8;
+			proto = xh.next_hdr;
+			*frag = 1;
+			return proto; /* this is always the last ext hdr */
+		default:
+			return proto;
+		}
+	}
+
+	/* too many extension headers, give up */
+	return -1;
+}
+
+/*
+ * Compute RSS hash for IPv6 packet.
+ * Returns 0 if RSS is not specified.
+ */
+static __u32 __attribute__((always_inline))
+parse_ipv6(const struct __sk_buff *skb, __u32 hash_type, const __u32 *key)
+{
+	struct {
+		__u32       src_addr[4];
+		__u32       dst_addr[4];
+		__u16       dport;
+		__u16       sport;
+	} v6_tuple = { };
+	struct ipv6hdr ip6h;
+	__u32 off = 0, j;
+	int proto, frag;
+
+	if (bpf_skb_load_bytes_relative(skb, off, &ip6h, sizeof(ip6h), BPF_HDR_START_NET))
+		return 0;	/* missing IPv6 header */
+
+#pragma unroll
+	for (j = 0; j < 4; j++) {
+		v6_tuple.src_addr[j] = bpf_ntohl(ip6h.saddr.in6_u.u6_addr32[j]);
+		v6_tuple.dst_addr[j] = bpf_ntohl(ip6h.daddr.in6_u.u6_addr32[j]);
+	}
+
+	/* If only doing L3 hash, do it now */
+	if (hash_type & (1 << HASH_FIELD_IPV6_L3))
+		return softrss_be((__u32 *)&v6_tuple, sizeof(v6_tuple) / sizeof(__u32) - 1, key);
+
+	/* Skip extension headers if present */
+	off += sizeof(ip6h);
+	proto = skip_ip6_ext(ip6h.nexthdr, skb, &off, &frag);
+	if (proto < 0)
+		return 0;
+
+	/* If packet is a fragment then no L4 hash is possible */
+	if (frag)
+		return 0;
+
+	/* Do RSS on UDP or TCP */
+	if (proto == IPPROTO_UDP || proto == IPPROTO_TCP) {
+		__u16 src_dst_port[2];
+
+		if (bpf_skb_load_bytes_relative(skb, off, &src_dst_port, sizeof(src_dst_port),
+						BPF_HDR_START_NET))
+			return 0;
+
+		v6_tuple.sport = bpf_ntohs(src_dst_port[0]);
+		v6_tuple.dport = bpf_ntohs(src_dst_port[1]);
+
+		return softrss_be((__u32 *)&v6_tuple, sizeof(v6_tuple) / sizeof(__u32), key);
+	}
+
+	return 0;
+}
+
+/*
+ * Compute RSS hash for packets.
+ * Returns 0 if no hash is possible.
+ */
+static __u32 __attribute__((always_inline))
+calculate_rss_hash(const struct __sk_buff *skb, const struct rss_key *rsskey)
+{
+	const __u32 *key = (const __u32 *)rsskey->key;
+
+	if (skb->protocol == bpf_htons(ETH_P_IP))
+		return parse_ipv4(skb, rsskey->hash_fields, key);
+	else if (skb->protocol == bpf_htons(ETH_P_IPV6))
+		return parse_ipv6(skb, rsskey->hash_fields, key);
+	else
+		return 0;
+}
+
+/*
+ * Scale value into the range [0, n).
+ * Assumes val is large (i.e. hash covers the whole u32 range).
+ */
+static __u32  __attribute__((always_inline))
+reciprocal_scale(__u32 val, __u32 n)
+{
+	return (__u32)(((__u64)val * n) >> 32);
+}
+
+/*
+ * When this BPF program is run by tc from the filter classifier,
+ * it is able to read skb metadata and packet data.
+ *
+ * For packets where RSS is not possible, just return TC_ACT_OK.
+ * When RSS is desired, set skb->queue_mapping and return TC_ACT_PIPE
+ * to continue processing.
+ *
+ * This should be BPF_PROG_TYPE_SCHED_ACT so section needs to be "action"
+ */
+SEC("action") int
+rss_flow_action(struct __sk_buff *skb)
+{
+	const struct rss_key *rsskey;
+	__u32 mark = skb->mark;
+	__u32 hash;
+
+	/* Lookup RSS configuration for that BPF class */
+	rsskey = bpf_map_lookup_elem(&rss_map, &mark);
+	if (rsskey == NULL)
+		return TC_ACT_OK;
+
+	hash = calculate_rss_hash(skb, rsskey);
+	if (!hash)
+		return TC_ACT_OK;
+
+	/* Fold hash to the number of queues configured */
+	skb->queue_mapping = reciprocal_scale(hash, rsskey->nb_queues);
+	return TC_ACT_PIPE;
+}
+
+char _license[] SEC("license") = "Dual BSD/GPL";
-- 
2.43.0
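
For anyone who wants to check the hash outside the kernel, the
softrss_be() loop in tap_rss.c above can be sketched in plain userspace C.
This is only an illustration and not part of the patch: the name
toeplitz_hash() and the key_words bounds check are assumptions made here,
and the key is assumed to already be in the 32-bit word form that the
loop consumes. The sketch widens key[j + 1] to 64 bits before shifting,
because in C a 32-bit shift by 32 (the i == 0 case) is undefined.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Userspace sketch of a Toeplitz hash over 32-bit words.
 * toeplitz_hash() is a hypothetical name; it mirrors the loop structure
 * of softrss_be() but adds an explicit bound on the key length.
 */
static uint32_t
toeplitz_hash(const uint32_t *tuple, uint32_t tuple_words,
	      const uint32_t *key, uint32_t key_words)
{
	uint32_t hash = 0;

	/* Word j of the tuple reads key[j] and key[j + 1]. */
	assert(tuple_words < key_words);

	for (uint32_t j = 0; j < tuple_words; j++) {
		for (uint32_t i = 0; i < 32; i++) {
			/* Walk the input bits from most to least significant. */
			if (tuple[j] & (1U << (31 - i))) {
				/*
				 * XOR in the 32-bit key window that starts i bits
				 * into key[j]. The 64-bit intermediate makes the
				 * i == 0 case a well-defined shift that yields 0.
				 */
				hash ^= key[j] << i |
					(uint32_t)((uint64_t)key[j + 1] >> (32 - i));
			}
		}
	}
	return hash;
}
```

A quick sanity check on the shift directions: a tuple with only the top
bit of the first word set hashes to key[0], and an all-zero tuple hashes
to 0, which is why the BPF program can use 0 to mean "no hash".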



* [PATCH v10 5/9] net/tap: rewrite the RSS BPF program
  @ 2024-05-01 16:12  2%   ` Stephen Hemminger
  0 siblings, 0 replies; 200+ results
From: Stephen Hemminger @ 2024-05-01 16:12 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

Rewrite of the BPF program used to do queue based RSS.

Important changes:
	- uses the newer BTF-based BPF map format
	- accepts key as parameter rather than constant default
	- can do L3 or L4 hashing
	- supports IPv4 options
	- supports IPv6 extension headers
	- restructured for readability

The usage of BPF is different as well:
	- the incoming configuration is looked up based on
	  class parameters rather than patching the BPF code.
	- the resulting queue is placed in skb by using skb mark
	  rather than requiring a second pass through classifier step.

Note: This version only works with a later patch to enable it on
the DPDK driver side. It is submitted as an incremental patch
to allow for easier review. Bisection still works because
the old instructions are still present for now.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
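Reviewer note (not part of the commit): the final step of the new program
folds the 32-bit hash onto the configured queues with a multiply and
shift instead of a modulo. The sketch below reproduces reciprocal_scale()
from tap_rss.c in userspace; pick_queue() is a hypothetical wrapper added
here only to show the "hash of 0 means no RSS" convention, and is not a
function in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Map val, assumed to be spread over the whole u32 range, into [0, n). */
static uint32_t
reciprocal_scale(uint32_t val, uint32_t n)
{
	return (uint32_t)(((uint64_t)val * n) >> 32);
}

/*
 * Hypothetical wrapper: returns 0 and stores the queue index on success,
 * or -1 when no hash was computed or no queues are configured.
 */
static int
pick_queue(uint32_t hash, uint32_t nb_queues, uint32_t *queue)
{
	assert(queue != NULL);

	if (hash == 0 || nb_queues == 0)
		return -1;

	*queue = reciprocal_scale(hash, nb_queues);
	return 0;
}
```

With four queues, hashes in the lowest quarter of the u32 range land on
queue 0 and hashes in the highest quarter land on queue 3, so the result
depends on the high bits of the hash and needs no division.
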
 .gitignore                            |   3 -
 drivers/net/tap/bpf/Makefile          |  19 --
 drivers/net/tap/bpf/README            |  49 +++++
 drivers/net/tap/bpf/bpf_api.h         | 276 --------------------------
 drivers/net/tap/bpf/bpf_elf.h         |  53 -----
 drivers/net/tap/bpf/bpf_extract.py    |  85 --------
 drivers/net/tap/bpf/meson.build       |  81 ++++++++
 drivers/net/tap/bpf/tap_bpf_program.c | 255 ------------------------
 drivers/net/tap/bpf/tap_rss.c         | 264 ++++++++++++++++++++++++
 9 files changed, 394 insertions(+), 691 deletions(-)
 delete mode 100644 drivers/net/tap/bpf/Makefile
 create mode 100644 drivers/net/tap/bpf/README
 delete mode 100644 drivers/net/tap/bpf/bpf_api.h
 delete mode 100644 drivers/net/tap/bpf/bpf_elf.h
 delete mode 100644 drivers/net/tap/bpf/bpf_extract.py
 create mode 100644 drivers/net/tap/bpf/meson.build
 delete mode 100644 drivers/net/tap/bpf/tap_bpf_program.c
 create mode 100644 drivers/net/tap/bpf/tap_rss.c

diff --git a/.gitignore b/.gitignore
index 3f444dcace..01a47a7606 100644
--- a/.gitignore
+++ b/.gitignore
@@ -36,9 +36,6 @@ TAGS
 # ignore python bytecode files
 *.pyc
 
-# ignore BPF programs
-drivers/net/tap/bpf/tap_bpf_program.o
-
 # DTS results
 dts/output
 
diff --git a/drivers/net/tap/bpf/Makefile b/drivers/net/tap/bpf/Makefile
deleted file mode 100644
index 9efeeb1bc7..0000000000
--- a/drivers/net/tap/bpf/Makefile
+++ /dev/null
@@ -1,19 +0,0 @@
-# SPDX-License-Identifier: BSD-3-Clause
-# This file is not built as part of normal DPDK build.
-# It is used to generate the eBPF code for TAP RSS.
-
-CLANG=clang
-CLANG_OPTS=-O2
-TARGET=../tap_bpf_insns.h
-
-all: $(TARGET)
-
-clean:
-	rm tap_bpf_program.o $(TARGET)
-
-tap_bpf_program.o: tap_bpf_program.c
-	$(CLANG) $(CLANG_OPTS) -emit-llvm -c $< -o - | \
-	llc -march=bpf -filetype=obj -o $@
-
-$(TARGET): tap_bpf_program.o
-	python3 bpf_extract.py -stap_bpf_program.c -o $@ $<
diff --git a/drivers/net/tap/bpf/README b/drivers/net/tap/bpf/README
new file mode 100644
index 0000000000..181f76a134
--- /dev/null
+++ b/drivers/net/tap/bpf/README
@@ -0,0 +1,49 @@
+This is the BPF program used to implement Receive Side Scaling (RSS)
+across multiple queues if required by a flow action. The program is
+loaded into the kernel when the first RSS flow rule is created and is never unloaded.
+
+When flow rules are used with the TAP device, packets are first handled by the
+ingress queue discipline that then runs a series of classifier filter rules.
+The first stage is the flow based classifier (flower); for RSS queue
+action the second stage is the kernel skbedit action which sets
+the skb mark to a key based on the flow id; the final stage
+is this BPF program which then maps flow id and packet header
+into a queue id.
+
+This version is built with the BPF Compile Once - Run Everywhere (CO-RE)
+framework and uses libbpf and bpftool.
+
+Limitations
+-----------
+- requires libbpf to run
+
+- rebuilding the BPF requires the clang compiler with bpf available
+  as a target architecture and bpftool to convert object to headers.
+
+  Some older versions of Ubuntu do not have a working bpftool package.
+
+- only standard Toeplitz hash with standard 40 byte key is supported.
+
+- the number of flow rules using RSS is limited to 32.
+
+Building
+--------
+During the DPDK build process the meson build file checks that
+libbpf, bpftool, and clang are available. If everything works then
+BPF RSS is enabled.
+
+The steps are:
+
+1. Uses clang to compile tap_rss.c to produce tap_rss.bpf.o
+
+2. Uses bpftool to generate a skeleton header file tap_rss.skel.h
+   from tap_rss.bpf.o. This header contains wrapper functions for
+   managing the BPF and the actual BPF code as a large byte array.
+
+3. The header file is included in tap_flow.c so that it can load
+   the BPF code (via libbpf).
+
+References
+----------
+BPF and XDP reference guide
+https://docs.cilium.io/en/latest/bpf/progtypes/
diff --git a/drivers/net/tap/bpf/bpf_api.h b/drivers/net/tap/bpf/bpf_api.h
deleted file mode 100644
index 4cd25fa593..0000000000
--- a/drivers/net/tap/bpf/bpf_api.h
+++ /dev/null
@@ -1,276 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 or BSD-3-Clause */
-
-#ifndef __BPF_API__
-#define __BPF_API__
-
-/* Note:
- *
- * This file can be included into eBPF kernel programs. It contains
- * a couple of useful helper functions, map/section ABI (bpf_elf.h),
- * misc macros and some eBPF specific LLVM built-ins.
- */
-
-#include <stdint.h>
-
-#include <linux/pkt_cls.h>
-#include <linux/bpf.h>
-#include <linux/filter.h>
-
-#include <asm/byteorder.h>
-
-#include "bpf_elf.h"
-
-/** libbpf pin type. */
-enum libbpf_pin_type {
-	LIBBPF_PIN_NONE,
-	/* PIN_BY_NAME: pin maps by name (in /sys/fs/bpf by default) */
-	LIBBPF_PIN_BY_NAME,
-};
-
-/** Type helper macros. */
-
-#define __uint(name, val) int (*name)[val]
-#define __type(name, val) typeof(val) *name
-#define __array(name, val) typeof(val) *name[]
-
-/** Misc macros. */
-
-#ifndef __stringify
-# define __stringify(X)		#X
-#endif
-
-#ifndef __maybe_unused
-# define __maybe_unused		__attribute__((__unused__))
-#endif
-
-#ifndef offsetof
-# define offsetof(TYPE, MEMBER)	__builtin_offsetof(TYPE, MEMBER)
-#endif
-
-#ifndef likely
-# define likely(X)		__builtin_expect(!!(X), 1)
-#endif
-
-#ifndef unlikely
-# define unlikely(X)		__builtin_expect(!!(X), 0)
-#endif
-
-#ifndef htons
-# define htons(X)		__constant_htons((X))
-#endif
-
-#ifndef ntohs
-# define ntohs(X)		__constant_ntohs((X))
-#endif
-
-#ifndef htonl
-# define htonl(X)		__constant_htonl((X))
-#endif
-
-#ifndef ntohl
-# define ntohl(X)		__constant_ntohl((X))
-#endif
-
-#ifndef __inline__
-# define __inline__		__attribute__((always_inline))
-#endif
-
-/** Section helper macros. */
-
-#ifndef __section
-# define __section(NAME)						\
-	__attribute__((section(NAME), used))
-#endif
-
-#ifndef __section_tail
-# define __section_tail(ID, KEY)					\
-	__section(__stringify(ID) "/" __stringify(KEY))
-#endif
-
-#ifndef __section_xdp_entry
-# define __section_xdp_entry						\
-	__section(ELF_SECTION_PROG)
-#endif
-
-#ifndef __section_cls_entry
-# define __section_cls_entry						\
-	__section(ELF_SECTION_CLASSIFIER)
-#endif
-
-#ifndef __section_act_entry
-# define __section_act_entry						\
-	__section(ELF_SECTION_ACTION)
-#endif
-
-#ifndef __section_lwt_entry
-# define __section_lwt_entry						\
-	__section(ELF_SECTION_PROG)
-#endif
-
-#ifndef __section_license
-# define __section_license						\
-	__section(ELF_SECTION_LICENSE)
-#endif
-
-#ifndef __section_maps
-# define __section_maps							\
-	__section(ELF_SECTION_MAPS)
-#endif
-
-/** Declaration helper macros. */
-
-#ifndef BPF_LICENSE
-# define BPF_LICENSE(NAME)						\
-	char ____license[] __section_license = NAME
-#endif
-
-/** Classifier helper */
-
-#ifndef BPF_H_DEFAULT
-# define BPF_H_DEFAULT	-1
-#endif
-
-/** BPF helper functions for tc. Individual flags are in linux/bpf.h */
-
-#ifndef __BPF_FUNC
-# define __BPF_FUNC(NAME, ...)						\
-	(* NAME)(__VA_ARGS__) __maybe_unused
-#endif
-
-#ifndef BPF_FUNC
-# define BPF_FUNC(NAME, ...)						\
-	__BPF_FUNC(NAME, __VA_ARGS__) = (void *) BPF_FUNC_##NAME
-#endif
-
-/* Map access/manipulation */
-static void *BPF_FUNC(map_lookup_elem, void *map, const void *key);
-static int BPF_FUNC(map_update_elem, void *map, const void *key,
-		    const void *value, uint32_t flags);
-static int BPF_FUNC(map_delete_elem, void *map, const void *key);
-
-/* Time access */
-static uint64_t BPF_FUNC(ktime_get_ns);
-
-/* Debugging */
-
-/* FIXME: __attribute__ ((format(printf, 1, 3))) not possible unless
- * llvm bug https://llvm.org/bugs/show_bug.cgi?id=26243 gets resolved.
- * It would require ____fmt to be made const, which generates a reloc
- * entry (non-map).
- */
-static void BPF_FUNC(trace_printk, const char *fmt, int fmt_size, ...);
-
-#ifndef printt
-# define printt(fmt, ...)						\
-	__extension__ ({						\
-		char ____fmt[] = fmt;					\
-		trace_printk(____fmt, sizeof(____fmt), ##__VA_ARGS__);	\
-	})
-#endif
-
-/* Random numbers */
-static uint32_t BPF_FUNC(get_prandom_u32);
-
-/* Tail calls */
-static void BPF_FUNC(tail_call, struct __sk_buff *skb, void *map,
-		     uint32_t index);
-
-/* System helpers */
-static uint32_t BPF_FUNC(get_smp_processor_id);
-static uint32_t BPF_FUNC(get_numa_node_id);
-
-/* Packet misc meta data */
-static uint32_t BPF_FUNC(get_cgroup_classid, struct __sk_buff *skb);
-static int BPF_FUNC(skb_under_cgroup, void *map, uint32_t index);
-
-static uint32_t BPF_FUNC(get_route_realm, struct __sk_buff *skb);
-static uint32_t BPF_FUNC(get_hash_recalc, struct __sk_buff *skb);
-static uint32_t BPF_FUNC(set_hash_invalid, struct __sk_buff *skb);
-
-/* Packet redirection */
-static int BPF_FUNC(redirect, int ifindex, uint32_t flags);
-static int BPF_FUNC(clone_redirect, struct __sk_buff *skb, int ifindex,
-		    uint32_t flags);
-
-/* Packet manipulation */
-static int BPF_FUNC(skb_load_bytes, struct __sk_buff *skb, uint32_t off,
-		    void *to, uint32_t len);
-static int BPF_FUNC(skb_store_bytes, struct __sk_buff *skb, uint32_t off,
-		    const void *from, uint32_t len, uint32_t flags);
-
-static int BPF_FUNC(l3_csum_replace, struct __sk_buff *skb, uint32_t off,
-		    uint32_t from, uint32_t to, uint32_t flags);
-static int BPF_FUNC(l4_csum_replace, struct __sk_buff *skb, uint32_t off,
-		    uint32_t from, uint32_t to, uint32_t flags);
-static int BPF_FUNC(csum_diff, const void *from, uint32_t from_size,
-		    const void *to, uint32_t to_size, uint32_t seed);
-static int BPF_FUNC(csum_update, struct __sk_buff *skb, uint32_t wsum);
-
-static int BPF_FUNC(skb_change_type, struct __sk_buff *skb, uint32_t type);
-static int BPF_FUNC(skb_change_proto, struct __sk_buff *skb, uint32_t proto,
-		    uint32_t flags);
-static int BPF_FUNC(skb_change_tail, struct __sk_buff *skb, uint32_t nlen,
-		    uint32_t flags);
-
-static int BPF_FUNC(skb_pull_data, struct __sk_buff *skb, uint32_t len);
-
-/* Event notification */
-static int __BPF_FUNC(skb_event_output, struct __sk_buff *skb, void *map,
-		      uint64_t index, const void *data, uint32_t size) =
-		      (void *) BPF_FUNC_perf_event_output;
-
-/* Packet vlan encap/decap */
-static int BPF_FUNC(skb_vlan_push, struct __sk_buff *skb, uint16_t proto,
-		    uint16_t vlan_tci);
-static int BPF_FUNC(skb_vlan_pop, struct __sk_buff *skb);
-
-/* Packet tunnel encap/decap */
-static int BPF_FUNC(skb_get_tunnel_key, struct __sk_buff *skb,
-		    struct bpf_tunnel_key *to, uint32_t size, uint32_t flags);
-static int BPF_FUNC(skb_set_tunnel_key, struct __sk_buff *skb,
-		    const struct bpf_tunnel_key *from, uint32_t size,
-		    uint32_t flags);
-
-static int BPF_FUNC(skb_get_tunnel_opt, struct __sk_buff *skb,
-		    void *to, uint32_t size);
-static int BPF_FUNC(skb_set_tunnel_opt, struct __sk_buff *skb,
-		    const void *from, uint32_t size);
-
-/** LLVM built-ins, mem*() routines work for constant size */
-
-#ifndef lock_xadd
-# define lock_xadd(ptr, val)	((void) __sync_fetch_and_add(ptr, val))
-#endif
-
-#ifndef memset
-# define memset(s, c, n)	__builtin_memset((s), (c), (n))
-#endif
-
-#ifndef memcpy
-# define memcpy(d, s, n)	__builtin_memcpy((d), (s), (n))
-#endif
-
-#ifndef memmove
-# define memmove(d, s, n)	__builtin_memmove((d), (s), (n))
-#endif
-
-/* FIXME: __builtin_memcmp() is not yet fully usable unless llvm bug
- * https://llvm.org/bugs/show_bug.cgi?id=26218 gets resolved. Also
- * this one would generate a reloc entry (non-map), otherwise.
- */
-#if 0
-#ifndef memcmp
-# define memcmp(a, b, n)	__builtin_memcmp((a), (b), (n))
-#endif
-#endif
-
-unsigned long long load_byte(void *skb, unsigned long long off)
-	asm ("llvm.bpf.load.byte");
-
-unsigned long long load_half(void *skb, unsigned long long off)
-	asm ("llvm.bpf.load.half");
-
-unsigned long long load_word(void *skb, unsigned long long off)
-	asm ("llvm.bpf.load.word");
-
-#endif /* __BPF_API__ */
diff --git a/drivers/net/tap/bpf/bpf_elf.h b/drivers/net/tap/bpf/bpf_elf.h
deleted file mode 100644
index ea8a11c95c..0000000000
--- a/drivers/net/tap/bpf/bpf_elf.h
+++ /dev/null
@@ -1,53 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 or BSD-3-Clause */
-#ifndef __BPF_ELF__
-#define __BPF_ELF__
-
-#include <asm/types.h>
-
-/* Note:
- *
- * Below ELF section names and bpf_elf_map structure definition
- * are not (!) kernel ABI. It's rather a "contract" between the
- * application and the BPF loader in tc. For compatibility, the
- * section names should stay as-is. Introduction of aliases, if
- * needed, are a possibility, though.
- */
-
-/* ELF section names, etc */
-#define ELF_SECTION_LICENSE	"license"
-#define ELF_SECTION_MAPS	"maps"
-#define ELF_SECTION_PROG	"prog"
-#define ELF_SECTION_CLASSIFIER	"classifier"
-#define ELF_SECTION_ACTION	"action"
-
-#define ELF_MAX_MAPS		64
-#define ELF_MAX_LICENSE_LEN	128
-
-/* Object pinning settings */
-#define PIN_NONE		0
-#define PIN_OBJECT_NS		1
-#define PIN_GLOBAL_NS		2
-
-/* ELF map definition */
-struct bpf_elf_map {
-	__u32 type;
-	__u32 size_key;
-	__u32 size_value;
-	__u32 max_elem;
-	__u32 flags;
-	__u32 id;
-	__u32 pinning;
-	__u32 inner_id;
-	__u32 inner_idx;
-};
-
-#define BPF_ANNOTATE_KV_PAIR(name, type_key, type_val)		\
-	struct ____btf_map_##name {				\
-		type_key key;					\
-		type_val value;					\
-	};							\
-	struct ____btf_map_##name				\
-	    __attribute__ ((section(".maps." #name), used))	\
-	    ____btf_map_##name = { }
-
-#endif /* __BPF_ELF__ */
diff --git a/drivers/net/tap/bpf/bpf_extract.py b/drivers/net/tap/bpf/bpf_extract.py
deleted file mode 100644
index 73c4dafe4e..0000000000
--- a/drivers/net/tap/bpf/bpf_extract.py
+++ /dev/null
@@ -1,85 +0,0 @@
-#!/usr/bin/env python3
-# SPDX-License-Identifier: BSD-3-Clause
-# Copyright (c) 2023 Stephen Hemminger <stephen@networkplumber.org>
-
-import argparse
-import sys
-import struct
-from tempfile import TemporaryFile
-from elftools.elf.elffile import ELFFile
-
-
-def load_sections(elffile):
-    """Get sections of interest from ELF"""
-    result = []
-    parts = [("cls_q", "cls_q_insns"), ("l3_l4", "l3_l4_hash_insns")]
-    for name, tag in parts:
-        section = elffile.get_section_by_name(name)
-        if section:
-            insns = struct.iter_unpack('<BBhL', section.data())
-            result.append([tag, insns])
-    return result
-
-
-def dump_section(name, insns, out):
-    """Dump the array of BPF instructions"""
-    print(f'\nstatic struct bpf_insn {name}[] = {{', file=out)
-    for bpf in insns:
-        code = bpf[0]
-        src = bpf[1] >> 4
-        dst = bpf[1] & 0xf
-        off = bpf[2]
-        imm = bpf[3]
-        print(f'\t{{{code:#04x}, {dst:4d}, {src:4d}, {off:8d}, {imm:#010x}}},',
-              file=out)
-    print('};', file=out)
-
-
-def parse_args():
-    """Parse command line arguments"""
-    parser = argparse.ArgumentParser()
-    parser.add_argument('-s',
-                        '--source',
-                        type=str,
-                        help="original source file")
-    parser.add_argument('-o', '--out', type=str, help="output C file path")
-    parser.add_argument("file",
-                        nargs='+',
-                        help="object file path or '-' for stdin")
-    return parser.parse_args()
-
-
-def open_input(path):
-    """Open the file or stdin"""
-    if path == "-":
-        temp = TemporaryFile()
-        temp.write(sys.stdin.buffer.read())
-        return temp
-    return open(path, 'rb')
-
-
-def write_header(out, source):
-    """Write file intro header"""
-    print("/* SPDX-License-Identifier: BSD-3-Clause", file=out)
-    if source:
-        print(f' * Auto-generated from {source}', file=out)
-    print(" * This not the original source file. Do NOT edit it.", file=out)
-    print(" */\n", file=out)
-
-
-def main():
-    '''program main function'''
-    args = parse_args()
-
-    with open(args.out, 'w',
-              encoding="utf-8") if args.out else sys.stdout as out:
-        write_header(out, args.source)
-        for path in args.file:
-            elffile = ELFFile(open_input(path))
-            sections = load_sections(elffile)
-            for name, insns in sections:
-                dump_section(name, insns, out)
-
-
-if __name__ == "__main__":
-    main()
diff --git a/drivers/net/tap/bpf/meson.build b/drivers/net/tap/bpf/meson.build
new file mode 100644
index 0000000000..f2c03a19fd
--- /dev/null
+++ b/drivers/net/tap/bpf/meson.build
@@ -0,0 +1,81 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright 2024 Stephen Hemminger <stephen@networkplumber.org>
+
+enable_tap_rss = false
+
+libbpf = dependency('libbpf', required: false, method: 'pkg-config')
+if not libbpf.found()
+    message('net/tap: no RSS support missing libbpf')
+    subdir_done()
+endif
+
+# Debian installs this in /usr/sbin, which is not in $PATH
+bpftool = find_program('bpftool', '/usr/sbin/bpftool', required: false, version: '>= 5.6.0')
+if not bpftool.found()
+    message('net/tap: no RSS support missing bpftool')
+    subdir_done()
+endif
+
+clang_supports_bpf = false
+clang = find_program('clang', required: false)
+if clang.found()
+    clang_supports_bpf = run_command(clang, '-target', 'bpf', '--print-supported-cpus',
+                                     check: false).returncode() == 0
+endif
+
+if not clang_supports_bpf
+    message('net/tap: no RSS support missing clang BPF')
+    subdir_done()
+endif
+
+enable_tap_rss = true
+
+libbpf_include_dir = libbpf.get_variable(pkgconfig : 'includedir')
+
+# The include files <linux/bpf.h> and others include <asm/types.h>
+# but <asm/types.h> is not defined for multi-lib environment target.
+# Work around this by using the include directory from the host build environment.
+machine_name = run_command('uname', '-m').stdout().strip()
+march_include_dir = '/usr/include/' + machine_name + '-linux-gnu'
+
+clang_flags = [
+    '-O2',
+    '-Wall',
+    '-Wextra',
+    '-target',
+    'bpf',
+    '-g',
+    '-c',
+]
+
+bpf_o_cmd = [
+    clang,
+    clang_flags,
+    '-idirafter',
+    libbpf_include_dir,
+    '-idirafter',
+    march_include_dir,
+    '@INPUT@',
+    '-o',
+    '@OUTPUT@'
+]
+
+skel_h_cmd = [
+    bpftool,
+    'gen',
+    'skeleton',
+    '@INPUT@'
+]
+
+tap_rss_o = custom_target(
+    'tap_rss.bpf.o',
+    input: 'tap_rss.c',
+    output: 'tap_rss.o',
+    command: bpf_o_cmd)
+
+tap_rss_skel_h = custom_target(
+    'tap_rss.skel.h',
+    input: tap_rss_o,
+    output: 'tap_rss.skel.h',
+    command: skel_h_cmd,
+    capture: true)
diff --git a/drivers/net/tap/bpf/tap_bpf_program.c b/drivers/net/tap/bpf/tap_bpf_program.c
deleted file mode 100644
index f05aed021c..0000000000
--- a/drivers/net/tap/bpf/tap_bpf_program.c
+++ /dev/null
@@ -1,255 +0,0 @@
-/* SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0
- * Copyright 2017 Mellanox Technologies, Ltd
- */
-
-#include <stdint.h>
-#include <stdbool.h>
-#include <sys/types.h>
-#include <sys/socket.h>
-#include <asm/types.h>
-#include <linux/in.h>
-#include <linux/if.h>
-#include <linux/if_ether.h>
-#include <linux/ip.h>
-#include <linux/ipv6.h>
-#include <linux/if_tunnel.h>
-#include <linux/filter.h>
-
-#include "bpf_api.h"
-#include "bpf_elf.h"
-#include "../tap_rss.h"
-
-/** Create IPv4 address */
-#define IPv4(a, b, c, d) ((__u32)(((a) & 0xff) << 24) | \
-		(((b) & 0xff) << 16) | \
-		(((c) & 0xff) << 8)  | \
-		((d) & 0xff))
-
-#define PORT(a, b) ((__u16)(((a) & 0xff) << 8) | \
-		((b) & 0xff))
-
-/*
- * The queue number is offset by a unique QUEUE_OFFSET, to distinguish
- * packets that have gone through this rule (skb->cb[1] != 0) from others.
- */
-#define QUEUE_OFFSET		0x7cafe800
-#define PIN_GLOBAL_NS		2
-
-#define KEY_IDX			0
-#define BPF_MAP_ID_KEY	1
-
-struct vlan_hdr {
-	__be16 proto;
-	__be16 tci;
-};
-
-struct bpf_elf_map __attribute__((section("maps"), used))
-map_keys = {
-	.type           =       BPF_MAP_TYPE_HASH,
-	.id             =       BPF_MAP_ID_KEY,
-	.size_key       =       sizeof(__u32),
-	.size_value     =       sizeof(struct rss_key),
-	.max_elem       =       256,
-	.pinning        =       PIN_GLOBAL_NS,
-};
-
-__section("cls_q") int
-match_q(struct __sk_buff *skb)
-{
-	__u32 queue = skb->cb[1];
-	/* queue is set by tap_flow_bpf_cls_q() before load */
-	volatile __u32 q = 0xdeadbeef;
-	__u32 match_queue = QUEUE_OFFSET + q;
-
-	/* printt("match_q$i() queue = %d\n", queue); */
-
-	if (queue != match_queue)
-		return TC_ACT_OK;
-
-	/* queue match */
-	skb->cb[1] = 0;
-	return TC_ACT_UNSPEC;
-}
-
-
-struct ipv4_l3_l4_tuple {
-	__u32    src_addr;
-	__u32    dst_addr;
-	__u16    dport;
-	__u16    sport;
-} __attribute__((packed));
-
-struct ipv6_l3_l4_tuple {
-	__u8        src_addr[16];
-	__u8        dst_addr[16];
-	__u16       dport;
-	__u16       sport;
-} __attribute__((packed));
-
-static const __u8 def_rss_key[TAP_RSS_HASH_KEY_SIZE] = {
-	0xd1, 0x81, 0xc6, 0x2c,
-	0xf7, 0xf4, 0xdb, 0x5b,
-	0x19, 0x83, 0xa2, 0xfc,
-	0x94, 0x3e, 0x1a, 0xdb,
-	0xd9, 0x38, 0x9e, 0x6b,
-	0xd1, 0x03, 0x9c, 0x2c,
-	0xa7, 0x44, 0x99, 0xad,
-	0x59, 0x3d, 0x56, 0xd9,
-	0xf3, 0x25, 0x3c, 0x06,
-	0x2a, 0xdc, 0x1f, 0xfc,
-};
-
-static __u32  __attribute__((always_inline))
-rte_softrss_be(const __u32 *input_tuple, const uint8_t *rss_key,
-		__u8 input_len)
-{
-	__u32 i, j, hash = 0;
-#pragma unroll
-	for (j = 0; j < input_len; j++) {
-#pragma unroll
-		for (i = 0; i < 32; i++) {
-			if (input_tuple[j] & (1U << (31 - i))) {
-				hash ^= ((const __u32 *)def_rss_key)[j] << i |
-				(__u32)((uint64_t)
-				(((const __u32 *)def_rss_key)[j + 1])
-					>> (32 - i));
-			}
-		}
-	}
-	return hash;
-}
-
-static int __attribute__((always_inline))
-rss_l3_l4(struct __sk_buff *skb)
-{
-	void *data_end = (void *)(long)skb->data_end;
-	void *data = (void *)(long)skb->data;
-	__u16 proto = (__u16)skb->protocol;
-	__u32 key_idx = 0xdeadbeef;
-	__u32 hash;
-	struct rss_key *rsskey;
-	__u64 off = ETH_HLEN;
-	int j;
-	__u8 *key = 0;
-	__u32 len;
-	__u32 queue = 0;
-	bool mf = 0;
-	__u16 frag_off = 0;
-
-	rsskey = map_lookup_elem(&map_keys, &key_idx);
-	if (!rsskey) {
-		printt("hash(): rss key is not configured\n");
-		return TC_ACT_OK;
-	}
-	key = (__u8 *)rsskey->key;
-
-	/* Get correct proto for 802.1ad */
-	if (skb->vlan_present && skb->vlan_proto == htons(ETH_P_8021AD)) {
-		if (data + ETH_ALEN * 2 + sizeof(struct vlan_hdr) +
-		    sizeof(proto) > data_end)
-			return TC_ACT_OK;
-		proto = *(__u16 *)(data + ETH_ALEN * 2 +
-				   sizeof(struct vlan_hdr));
-		off += sizeof(struct vlan_hdr);
-	}
-
-	if (proto == htons(ETH_P_IP)) {
-		if (data + off + sizeof(struct iphdr) + sizeof(__u32)
-			> data_end)
-			return TC_ACT_OK;
-
-		__u8 *src_dst_addr = data + off + offsetof(struct iphdr, saddr);
-		__u8 *frag_off_addr = data + off + offsetof(struct iphdr, frag_off);
-		__u8 *prot_addr = data + off + offsetof(struct iphdr, protocol);
-		__u8 *src_dst_port = data + off + sizeof(struct iphdr);
-		struct ipv4_l3_l4_tuple v4_tuple = {
-			.src_addr = IPv4(*(src_dst_addr + 0),
-					*(src_dst_addr + 1),
-					*(src_dst_addr + 2),
-					*(src_dst_addr + 3)),
-			.dst_addr = IPv4(*(src_dst_addr + 4),
-					*(src_dst_addr + 5),
-					*(src_dst_addr + 6),
-					*(src_dst_addr + 7)),
-			.sport = 0,
-			.dport = 0,
-		};
-		/** Fetch the L4-payer port numbers only in-case of TCP/UDP
-		 ** and also if the packet is not fragmented. Since fragmented
-		 ** chunks do not have L4 TCP/UDP header.
-		 **/
-		if (*prot_addr == IPPROTO_UDP || *prot_addr == IPPROTO_TCP) {
-			frag_off = PORT(*(frag_off_addr + 0),
-					*(frag_off_addr + 1));
-			mf = frag_off & 0x2000;
-			frag_off = frag_off & 0x1fff;
-			if (mf == 0 && frag_off == 0) {
-				v4_tuple.sport = PORT(*(src_dst_port + 0),
-						*(src_dst_port + 1));
-				v4_tuple.dport = PORT(*(src_dst_port + 2),
-						*(src_dst_port + 3));
-			}
-		}
-		__u8 input_len = sizeof(v4_tuple) / sizeof(__u32);
-		if (rsskey->hash_fields & (1 << HASH_FIELD_IPV4_L3))
-			input_len--;
-		hash = rte_softrss_be((__u32 *)&v4_tuple, key, 3);
-	} else if (proto == htons(ETH_P_IPV6)) {
-		if (data + off + sizeof(struct ipv6hdr) +
-					sizeof(__u32) > data_end)
-			return TC_ACT_OK;
-		__u8 *src_dst_addr = data + off +
-					offsetof(struct ipv6hdr, saddr);
-		__u8 *src_dst_port = data + off +
-					sizeof(struct ipv6hdr);
-		__u8 *next_hdr = data + off +
-					offsetof(struct ipv6hdr, nexthdr);
-
-		struct ipv6_l3_l4_tuple v6_tuple;
-		for (j = 0; j < 4; j++)
-			*((uint32_t *)&v6_tuple.src_addr + j) =
-				__builtin_bswap32(*((uint32_t *)
-						src_dst_addr + j));
-		for (j = 0; j < 4; j++)
-			*((uint32_t *)&v6_tuple.dst_addr + j) =
-				__builtin_bswap32(*((uint32_t *)
-						src_dst_addr + 4 + j));
-
-		/** Fetch the L4 header port-numbers only if next-header
-		 * is TCP/UDP **/
-		if (*next_hdr == IPPROTO_UDP || *next_hdr == IPPROTO_TCP) {
-			v6_tuple.sport = PORT(*(src_dst_port + 0),
-				      *(src_dst_port + 1));
-			v6_tuple.dport = PORT(*(src_dst_port + 2),
-				      *(src_dst_port + 3));
-		} else {
-			v6_tuple.sport = 0;
-			v6_tuple.dport = 0;
-		}
-
-		__u8 input_len = sizeof(v6_tuple) / sizeof(__u32);
-		if (rsskey->hash_fields & (1 << HASH_FIELD_IPV6_L3))
-			input_len--;
-		hash = rte_softrss_be((__u32 *)&v6_tuple, key, 9);
-	} else {
-		return TC_ACT_PIPE;
-	}
-
-	queue = rsskey->queues[(hash % rsskey->nb_queues) &
-				       (TAP_MAX_QUEUES - 1)];
-	skb->cb[1] = QUEUE_OFFSET + queue;
-	/* printt(">>>>> rss_l3_l4 hash=0x%x queue=%u\n", hash, queue); */
-
-	return TC_ACT_RECLASSIFY;
-}
-
-#define RSS(L)						\
-	__section(#L) int				\
-		L ## _hash(struct __sk_buff *skb)	\
-	{						\
-		return rss_ ## L (skb);			\
-	}
-
-RSS(l3_l4)
-
-BPF_LICENSE("Dual BSD/GPL");
diff --git a/drivers/net/tap/bpf/tap_rss.c b/drivers/net/tap/bpf/tap_rss.c
new file mode 100644
index 0000000000..888b3bdc24
--- /dev/null
+++ b/drivers/net/tap/bpf/tap_rss.c
@@ -0,0 +1,264 @@
+/* SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0
+ * Copyright 2017 Mellanox Technologies, Ltd
+ */
+
+#include <linux/in.h>
+#include <linux/if_ether.h>
+#include <linux/ip.h>
+#include <linux/ipv6.h>
+#include <linux/pkt_cls.h>
+#include <linux/bpf.h>
+
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_endian.h>
+
+#include "../tap_rss.h"
+
+/*
+ * This map provides configuration information about flows which need BPF RSS.
+ *
+ * The hash is indexed by the skb mark.
+ */
+struct {
+	__uint(type, BPF_MAP_TYPE_HASH);
+	__uint(key_size, sizeof(__u32));
+	__uint(value_size, sizeof(struct rss_key));
+	__uint(max_entries, TAP_RSS_MAX);
+} rss_map SEC(".maps");
+
+#define IP_MF		0x2000		/** IP header Flags **/
+#define IP_OFFSET	0x1FFF		/** IP header fragment offset **/
+
+/*
+ * Compute Toeplitz hash over the input tuple.
+ * This is the same as rte_softrss_be in lib/hash,
+ * but the loop needs to be set up to match BPF restrictions.
+ */
+static __u32 __attribute__((always_inline))
+softrss_be(const __u32 *input_tuple, __u32 input_len, const __u32 *key)
+{
+	__u32 i, j, hash = 0;
+
+#pragma unroll
+	for (j = 0; j < input_len; j++) {
+#pragma unroll
+		for (i = 0; i < 32; i++) {
+			if (input_tuple[j] & (1U << (31 - i)))
+				hash ^= key[j] << i | key[j + 1] >> (32 - i);
+		}
+	}
+	return hash;
+}
+
+/*
+ * Compute RSS hash for IPv4 packet.
+ * Returns 0 if RSS is not specified.
+ */
+static __u32 __attribute__((always_inline))
+parse_ipv4(const struct __sk_buff *skb, __u32 hash_type, const __u32 *key)
+{
+	struct iphdr iph;
+	__u32 off = 0;
+
+	if (bpf_skb_load_bytes_relative(skb, off, &iph, sizeof(iph), BPF_HDR_START_NET))
+		return 0;	/* no IP header present */
+
+	struct {
+		__u32    src_addr;
+		__u32    dst_addr;
+		__u16    dport;
+		__u16    sport;
+	} v4_tuple = {
+		.src_addr = bpf_ntohl(iph.saddr),
+		.dst_addr = bpf_ntohl(iph.daddr),
+	};
+
+	/* If only calculating L3 hash, do it now */
+	if (hash_type & (1 << HASH_FIELD_IPV4_L3))
+		return softrss_be((__u32 *)&v4_tuple, sizeof(v4_tuple) / sizeof(__u32) - 1, key);
+
+	/* If packet is fragmented then no L4 hash is possible */
+	if ((iph.frag_off & bpf_htons(IP_MF | IP_OFFSET)) != 0)
+		return 0;
+
+	/* Do RSS on UDP or TCP protocols */
+	if (iph.protocol == IPPROTO_UDP || iph.protocol == IPPROTO_TCP) {
+		__u16 src_dst_port[2];
+
+		off += iph.ihl * 4;
+		if (bpf_skb_load_bytes_relative(skb, off, &src_dst_port, sizeof(src_dst_port),
+						BPF_HDR_START_NET))
+			return 0; /* TCP or UDP header missing */
+
+		v4_tuple.sport = bpf_ntohs(src_dst_port[0]);
+		v4_tuple.dport = bpf_ntohs(src_dst_port[1]);
+		return softrss_be((__u32 *)&v4_tuple, sizeof(v4_tuple) / sizeof(__u32), key);
+	}
+
+	/* Other protocol */
+	return 0;
+}
+
+/*
+ * Parse IPv6 extension headers, update offset and return next proto.
+ * returns next proto on success, -1 on malformed header
+ */
+static int __attribute__((always_inline))
+skip_ip6_ext(__u16 proto, const struct __sk_buff *skb, __u32 *off, int *frag)
+{
+	struct ext_hdr {
+		__u8 next_hdr;
+		__u8 len;
+	} xh;
+	unsigned int i;
+
+	*frag = 0;
+
+#define MAX_EXT_HDRS 5
+#pragma unroll
+	for (i = 0; i < MAX_EXT_HDRS; i++) {
+		switch (proto) {
+		case IPPROTO_HOPOPTS:
+		case IPPROTO_ROUTING:
+		case IPPROTO_DSTOPTS:
+			if (bpf_skb_load_bytes_relative(skb, *off, &xh, sizeof(xh),
+							BPF_HDR_START_NET))
+				return -1;
+
+			*off += (xh.len + 1) * 8;
+			proto = xh.next_hdr;
+			break;
+		case IPPROTO_FRAGMENT:
+			if (bpf_skb_load_bytes_relative(skb, *off, &xh, sizeof(xh),
+							BPF_HDR_START_NET))
+				return -1;
+
+			*off += 8;
+			proto = xh.next_hdr;
+			*frag = 1;
+			return proto; /* this is always the last ext hdr */
+		default:
+			return proto;
+		}
+	}
+
+	/* too many extension headers give up */
+	return -1;
+}
+
+/*
+ * Compute RSS hash for IPv6 packet.
+ * Returns 0 if RSS is not specified.
+ */
+static __u32 __attribute__((always_inline))
+parse_ipv6(const struct __sk_buff *skb, __u32 hash_type, const __u32 *key)
+{
+	struct {
+		__u32       src_addr[4];
+		__u32       dst_addr[4];
+		__u16       dport;
+		__u16       sport;
+	} v6_tuple = { };
+	struct ipv6hdr ip6h;
+	__u32 off = 0, j;
+	int proto, frag;
+
+	if (bpf_skb_load_bytes_relative(skb, off, &ip6h, sizeof(ip6h), BPF_HDR_START_NET))
+		return 0;	/* missing IPv6 header */
+
+#pragma unroll
+	for (j = 0; j < 4; j++) {
+		v6_tuple.src_addr[j] = bpf_ntohl(ip6h.saddr.in6_u.u6_addr32[j]);
+		v6_tuple.dst_addr[j] = bpf_ntohl(ip6h.daddr.in6_u.u6_addr32[j]);
+	}
+
+	/* If only doing L3 hash, do it now */
+	if (hash_type & (1 << HASH_FIELD_IPV6_L3))
+		return softrss_be((__u32 *)&v6_tuple, sizeof(v6_tuple) / sizeof(__u32) - 1, key);
+
+	/* Skip extension headers if present */
+	off += sizeof(ip6h);
+	proto = skip_ip6_ext(ip6h.nexthdr, skb, &off, &frag);
+	if (proto < 0)
+		return 0;
+
+	/* If packet is a fragment then no L4 hash is possible */
+	if (frag)
+		return 0;
+
+	/* Do RSS on UDP or TCP */
+	if (proto == IPPROTO_UDP || proto == IPPROTO_TCP) {
+		__u16 src_dst_port[2];
+
+		if (bpf_skb_load_bytes_relative(skb, off, &src_dst_port, sizeof(src_dst_port),
+						BPF_HDR_START_NET))
+			return 0;
+
+		v6_tuple.sport = bpf_ntohs(src_dst_port[0]);
+		v6_tuple.dport = bpf_ntohs(src_dst_port[1]);
+
+		return softrss_be((__u32 *)&v6_tuple, sizeof(v6_tuple) / sizeof(__u32), key);
+	}
+
+	return 0;
+}
+
+/*
+ * Compute RSS hash for packets.
+ * Returns 0 if no hash is possible.
+ */
+static __u32 __attribute__((always_inline))
+calculate_rss_hash(const struct __sk_buff *skb, const struct rss_key *rsskey)
+{
+	const __u32 *key = (const __u32 *)rsskey->key;
+
+	if (skb->protocol == bpf_htons(ETH_P_IP))
+		return parse_ipv4(skb, rsskey->hash_fields, key);
+	else if (skb->protocol == bpf_htons(ETH_P_IPV6))
+		return parse_ipv6(skb, rsskey->hash_fields, key);
+	else
+		return 0;
+}
+
+/*
+ * Scale value into the range [0, n).
+ * Assumes val is large (i.e. the hash covers the whole u32 range).
+ */
+static __u32  __attribute__((always_inline))
+reciprocal_scale(__u32 val, __u32 n)
+{
+	return (__u32)(((__u64)val * n) >> 32);
+}
+
+/*
+ * When this BPF program is run by tc from the filter classifier,
+ * it is able to read skb metadata and packet data.
+ *
+ * For packets where RSS is not possible, just return TC_ACT_OK.
+ * When RSS is desired, change the skb->queue_mapping and return TC_ACT_PIPE
+ * to continue processing.
+ *
+ * This should be BPF_PROG_TYPE_SCHED_ACT so section needs to be "action"
+ */
+SEC("action") int
+rss_flow_action(struct __sk_buff *skb)
+{
+	const struct rss_key *rsskey;
+	__u32 mark = skb->mark;
+	__u32 hash;
+
+	/* Lookup RSS configuration for that BPF class */
+	rsskey = bpf_map_lookup_elem(&rss_map, &mark);
+	if (rsskey == NULL)
+		return TC_ACT_OK;
+
+	hash = calculate_rss_hash(skb, rsskey);
+	if (!hash)
+		return TC_ACT_OK;
+
+	/* Fold hash to the number of queues configured */
+	skb->queue_mapping = reciprocal_scale(hash, rsskey->nb_queues);
+	return TC_ACT_PIPE;
+}
+
+char _license[] SEC("license") = "Dual BSD/GPL";
-- 
2.43.0
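
For readers who want to try the hash arithmetic outside the kernel, here is a
minimal userspace sketch of the two helpers in tap_rss.c above: the Toeplitz
hash and the scaling of a hash to a queue index. The names toeplitz_be and
scale_to_queues are illustrative and are not part of the patch. One assumption
differs from the patch: the sketch builds a 64-bit key window, because
shifting a 32-bit value by 32 (the i == 0 case of "32 - i") is undefined in
ISO C. Whether that matters for the BPF target is not something this sketch
settles.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Toeplitz hash over `len` 32-bit words in host byte order.
 * `key` must hold at least len + 1 words, since each input word
 * slides a 32-bit window across two adjacent key words.
 */
static uint32_t
toeplitz_be(const uint32_t *tuple, size_t len, const uint32_t *key)
{
	uint32_t hash = 0;

	for (size_t j = 0; j < len; j++) {
		/* 64-bit window: key[j] in the high half, key[j + 1] in the low half */
		uint64_t window = ((uint64_t)key[j] << 32) | key[j + 1];

		for (unsigned int i = 0; i < 32; i++) {
			/* for each set input bit, XOR in the key window shifted left by i */
			if (tuple[j] & (1U << (31 - i)))
				hash ^= (uint32_t)(window >> (32 - i));
		}
	}
	return hash;
}

/* Map a 32-bit hash into [0, n) with a multiply and shift instead of a modulo. */
static uint32_t
scale_to_queues(uint32_t hash, uint32_t n)
{
	assert(n > 0);
	return (uint32_t)(((uint64_t)hash * n) >> 32);
}
```

With four queues, a hash in the upper half of the u32 range lands on queue 2
or 3, and one in the lower half on queue 0 or 1. This is why the comment in the
patch assumes the hash covers the whole range.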


^ permalink raw reply	[relevance 2%]

* Re: [PATCH] net/af_packet: fix statistics
  @ 2024-05-01 16:44  3%   ` Stephen Hemminger
  2024-05-01 18:18  0%     ` Morten Brørup
  0 siblings, 1 reply; 200+ results
From: Stephen Hemminger @ 2024-05-01 16:44 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: dev, John W. Linville, Mattias Rönnblom

On Wed, 1 May 2024 17:25:59 +0100
Ferruh Yigit <ferruh.yigit@amd.com> wrote:

> >  - Remove the tx_error counter since it was not correct.
> >    When transmit ring is full it is not an error and
> >    the driver correctly returns only the number sent.
> >   
> 
> nack
> Transmit full is not only return case here.
> There are actual errors continue to process relying this error calculation.
> Also there are error cases like interface down.
> Those error cases should be handled individually if we remove this.
> I suggest split this change to separate patch.

I see multiple drivers have copy/pasted same code and consider
transmit full as an error. It is not.

There should be a new statistic at ethdev layer that does record
transmit full, and make it across all drivers, but that would have
to wait for ABI change.

^ permalink raw reply	[relevance 3%]

* RE: [PATCH] net/af_packet: fix statistics
  2024-05-01 16:44  3%   ` Stephen Hemminger
@ 2024-05-01 18:18  0%     ` Morten Brørup
  2024-05-02 13:47  0%       ` Ferruh Yigit
  0 siblings, 1 reply; 200+ results
From: Morten Brørup @ 2024-05-01 18:18 UTC (permalink / raw)
  To: Stephen Hemminger, Ferruh Yigit
  Cc: dev, John W. Linville, Mattias Rönnblom

> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> Sent: Wednesday, 1 May 2024 18.45
> 
> On Wed, 1 May 2024 17:25:59 +0100
> Ferruh Yigit <ferruh.yigit@amd.com> wrote:
> 
> > >  - Remove the tx_error counter since it was not correct.
> > >    When transmit ring is full it is not an error and
> > >    the driver correctly returns only the number sent.
> > >
> >
> > nack
> > Transmit full is not only return case here.
> > There are actual errors continue to process relying this error
> calculation.
> > Also there are error cases like interface down.
> > Those error cases should be handled individually if we remove this.
> > I suggest split this change to separate patch.
> 
> I see multiple drivers have copy/pasted same code and consider
> transmit full as an error. It is not.

+1
Transmit full is certainly not an error!

> 
> There should be a new statistic at ethdev layer that does record
> transmit full, and make it across all drivers, but that would have
> to wait for ABI change.

What happens to these non-transmittable packets depends on the application.
Our application discards them and count them in a (per-port, per-queue) application level counter tx_nodescr, which eventually becomes IF-MIB::ifOutDiscards in SNMP. I think many applications behave similarly, so having an ethdev layer tx_nodescr counter might be helpful.
Other applications could try to retransmit them; if there are still no TX descriptors, they will be counted again.

In case anyone gets funny ideas: The PMD should still not free those non-transmitted packet mbufs, because the application might want to treat them differently than the transmitted packets, e.g. for latency stats or packet capture.
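
A minimal sketch of the application-level accounting described above, assuming
a burst function that returns how many packets it accepted. The names
txq_stats, tx_burst_fn, ring_burst and send_and_count are hypothetical and only
stand in for the real ethdev call; ring_burst is a stub that models a TX ring
with a fixed number of free descriptors.

```c
#include <assert.h>
#include <stdint.h>

/* Per-port, per-queue application counters (hypothetical layout). */
struct txq_stats {
	uint64_t tx_pkts;    /* packets accepted by the driver */
	uint64_t tx_nodescr; /* packets not sent because the ring was full */
};

/* Hypothetical burst signature: returns the number of packets accepted. */
typedef uint16_t (*tx_burst_fn)(void *queue, void **pkts, uint16_t n);

/* Stub driver: `queue` points at the number of free TX descriptors. */
static uint16_t
ring_burst(void *queue, void **pkts, uint16_t n)
{
	uint16_t *free_slots = queue;
	uint16_t sent = n < *free_slots ? n : *free_slots;

	(void)pkts; /* the stub does not touch the packets */
	*free_slots -= sent;
	return sent;
}

/*
 * Send a burst and account for the shortfall in the application.
 * The driver does not free the unsent packets; pkts[sent..n-1] still
 * belong to the caller, which may drop, retry, or capture them.
 */
static uint16_t
send_and_count(tx_burst_fn burst, void *queue, void **pkts, uint16_t n,
	       struct txq_stats *st)
{
	uint16_t sent = burst(queue, pkts, n);

	assert(sent <= n);
	st->tx_pkts += sent;
	st->tx_nodescr += (uint16_t)(n - sent);
	return sent;
}
```

An application that retries the unsent tail would call send_and_count again
with pkts + sent, so a packet that still finds no descriptor is counted a
second time, as noted above.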


^ permalink raw reply	[relevance 0%]

* [PATCH v11 5/9] net/tap: rewrite the RSS BPF program
  @ 2024-05-02  2:49  2%   ` Stephen Hemminger
  0 siblings, 0 replies; 200+ results
From: Stephen Hemminger @ 2024-05-02  2:49 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

Rewrite of the BPF program used to do queue based RSS.

Important changes:
	- uses newer BPF map format BTF
	- accepts key as parameter rather than constant default
	- can do L3 or L4 hashing
	- supports IPv4 options
	- supports IPv6 extension headers
	- restructured for readability

The usage of BPF is different as well:
	- the incoming configuration is looked up based on
	  class parameters rather than patching the BPF code.
	- the resulting queue is placed in skb by using skb mark
	  rather than requiring a second pass through the classifier step.

Note: This version only works with a later patch that enables it on
the DPDK driver side. It is submitted as an incremental patch
to allow for easier review. Bisection still works because
the old instructions are still present for now.
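
To illustrate the IPv6 extension header support listed above, here is a
userspace sketch of the header walk over a plain byte buffer. It is not the
BPF code itself: the function name skip_ext_headers, the local enum of
protocol numbers, and the buffer-based interface are assumptions made so the
example can run outside the kernel. The limit of five headers mirrors
MAX_EXT_HDRS in tap_rss.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* IANA protocol numbers used by the walk (local names, for illustration). */
enum {
	PROTO_HOPOPTS = 0,
	PROTO_TCP = 6,
	PROTO_UDP = 17,
	PROTO_ROUTING = 43,
	PROTO_FRAGMENT = 44,
	PROTO_DSTOPTS = 60,
};

#define MAX_EXT_HDRS 5

/*
 * Walk IPv6 extension headers. `buf` starts right after the fixed
 * 40-byte IPv6 header and `proto` is that header's next-header value.
 * On success returns the upper-layer protocol and advances *off past
 * the extension headers; returns -1 on truncation or too many headers.
 */
static int
skip_ext_headers(int proto, const uint8_t *buf, size_t len, size_t *off,
		 int *frag)
{
	*frag = 0;

	for (int i = 0; i < MAX_EXT_HDRS; i++) {
		switch (proto) {
		case PROTO_HOPOPTS:
		case PROTO_ROUTING:
		case PROTO_DSTOPTS:
			if (*off + 2 > len)
				return -1;
			proto = buf[*off];
			/* length field counts 8-byte units beyond the first 8 bytes */
			*off += ((size_t)buf[*off + 1] + 1) * 8;
			break;
		case PROTO_FRAGMENT:
			if (*off + 2 > len)
				return -1;
			proto = buf[*off];
			*off += 8; /* the fragment header has a fixed size */
			*frag = 1;
			return proto;
		default:
			return proto; /* not an extension header: done */
		}
	}
	return -1; /* too many extension headers */
}
```

A caller would skip L4 hashing whenever *frag is set, since only the first
fragment carries the TCP or UDP ports.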

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 .gitignore                            |   3 -
 drivers/net/tap/bpf/Makefile          |  19 --
 drivers/net/tap/bpf/README            |  49 +++++
 drivers/net/tap/bpf/bpf_api.h         | 276 --------------------------
 drivers/net/tap/bpf/bpf_elf.h         |  53 -----
 drivers/net/tap/bpf/bpf_extract.py    |  85 --------
 drivers/net/tap/bpf/meson.build       |  81 ++++++++
 drivers/net/tap/bpf/tap_bpf_program.c | 255 ------------------------
 drivers/net/tap/bpf/tap_rss.c         | 267 +++++++++++++++++++++++++
 9 files changed, 397 insertions(+), 691 deletions(-)
 delete mode 100644 drivers/net/tap/bpf/Makefile
 create mode 100644 drivers/net/tap/bpf/README
 delete mode 100644 drivers/net/tap/bpf/bpf_api.h
 delete mode 100644 drivers/net/tap/bpf/bpf_elf.h
 delete mode 100644 drivers/net/tap/bpf/bpf_extract.py
 create mode 100644 drivers/net/tap/bpf/meson.build
 delete mode 100644 drivers/net/tap/bpf/tap_bpf_program.c
 create mode 100644 drivers/net/tap/bpf/tap_rss.c

diff --git a/.gitignore b/.gitignore
index 3f444dcace..01a47a7606 100644
--- a/.gitignore
+++ b/.gitignore
@@ -36,9 +36,6 @@ TAGS
 # ignore python bytecode files
 *.pyc
 
-# ignore BPF programs
-drivers/net/tap/bpf/tap_bpf_program.o
-
 # DTS results
 dts/output
 
diff --git a/drivers/net/tap/bpf/Makefile b/drivers/net/tap/bpf/Makefile
deleted file mode 100644
index 9efeeb1bc7..0000000000
--- a/drivers/net/tap/bpf/Makefile
+++ /dev/null
@@ -1,19 +0,0 @@
-# SPDX-License-Identifier: BSD-3-Clause
-# This file is not built as part of normal DPDK build.
-# It is used to generate the eBPF code for TAP RSS.
-
-CLANG=clang
-CLANG_OPTS=-O2
-TARGET=../tap_bpf_insns.h
-
-all: $(TARGET)
-
-clean:
-	rm tap_bpf_program.o $(TARGET)
-
-tap_bpf_program.o: tap_bpf_program.c
-	$(CLANG) $(CLANG_OPTS) -emit-llvm -c $< -o - | \
-	llc -march=bpf -filetype=obj -o $@
-
-$(TARGET): tap_bpf_program.o
-	python3 bpf_extract.py -stap_bpf_program.c -o $@ $<
diff --git a/drivers/net/tap/bpf/README b/drivers/net/tap/bpf/README
new file mode 100644
index 0000000000..6d323d2051
--- /dev/null
+++ b/drivers/net/tap/bpf/README
@@ -0,0 +1,49 @@
+This is the BPF program used to implement Receive Side Scaling (RSS)
+across multiple queues if required by a flow action. The program is
+loaded into the kernel when the first RSS flow rule is created and is never unloaded.
+
+When flow rules are used with the TAP device, packets are first handled by the
+ingress queue discipline that then runs a series of classifier filter rules.
+The first stage is the flow based classifier (flower); for RSS queue
+action the second stage is the kernel skbedit action, which sets
+the skb mark to a key based on the flow id; the final stage
+is this BPF program which then maps flow id and packet header
+into a queue id.
+
+This version is built with the BPF Compile Once - Run Everywhere (CO-RE)
+framework and uses libbpf and bpftool.
+
+Limitations
+-----------
+- requires libbpf to run
+
+- rebuilding the BPF requires the clang compiler with bpf available
+  as a target architecture and bpftool to convert object to headers.
+
+  Some older versions of Ubuntu do not have a working bpftool package.
+
+- only standard Toeplitz hash with standard 40 byte key is supported.
+
+- the number of flow rules using RSS is limited to 32.
+
+Building
+--------
+During the DPDK build process the meson build file checks that
+libbpf, bpftool, and clang are available. If everything works then
+BPF RSS is enabled.
+
+The steps are:
+
+1. Uses clang to compile tap_rss.c to produce tap_rss.bpf.o
+
+2. Uses bpftool to generate a skeleton header file tap_rss.skel.h
+   from tap_rss.bpf.o. This header contains wrapper functions for
+   managing the BPF and the actual BPF code as a large byte array.
+
+3. The header file is included in tap_flow.c so that it can load
+   the BPF code (via libbpf).
+
+References
+----------
+BPF and XDP reference guide
+https://docs.cilium.io/en/latest/bpf/progtypes/
diff --git a/drivers/net/tap/bpf/bpf_api.h b/drivers/net/tap/bpf/bpf_api.h
deleted file mode 100644
index 4cd25fa593..0000000000
--- a/drivers/net/tap/bpf/bpf_api.h
+++ /dev/null
@@ -1,276 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 or BSD-3-Clause */
-
-#ifndef __BPF_API__
-#define __BPF_API__
-
-/* Note:
- *
- * This file can be included into eBPF kernel programs. It contains
- * a couple of useful helper functions, map/section ABI (bpf_elf.h),
- * misc macros and some eBPF specific LLVM built-ins.
- */
-
-#include <stdint.h>
-
-#include <linux/pkt_cls.h>
-#include <linux/bpf.h>
-#include <linux/filter.h>
-
-#include <asm/byteorder.h>
-
-#include "bpf_elf.h"
-
-/** libbpf pin type. */
-enum libbpf_pin_type {
-	LIBBPF_PIN_NONE,
-	/* PIN_BY_NAME: pin maps by name (in /sys/fs/bpf by default) */
-	LIBBPF_PIN_BY_NAME,
-};
-
-/** Type helper macros. */
-
-#define __uint(name, val) int (*name)[val]
-#define __type(name, val) typeof(val) *name
-#define __array(name, val) typeof(val) *name[]
-
-/** Misc macros. */
-
-#ifndef __stringify
-# define __stringify(X)		#X
-#endif
-
-#ifndef __maybe_unused
-# define __maybe_unused		__attribute__((__unused__))
-#endif
-
-#ifndef offsetof
-# define offsetof(TYPE, MEMBER)	__builtin_offsetof(TYPE, MEMBER)
-#endif
-
-#ifndef likely
-# define likely(X)		__builtin_expect(!!(X), 1)
-#endif
-
-#ifndef unlikely
-# define unlikely(X)		__builtin_expect(!!(X), 0)
-#endif
-
-#ifndef htons
-# define htons(X)		__constant_htons((X))
-#endif
-
-#ifndef ntohs
-# define ntohs(X)		__constant_ntohs((X))
-#endif
-
-#ifndef htonl
-# define htonl(X)		__constant_htonl((X))
-#endif
-
-#ifndef ntohl
-# define ntohl(X)		__constant_ntohl((X))
-#endif
-
-#ifndef __inline__
-# define __inline__		__attribute__((always_inline))
-#endif
-
-/** Section helper macros. */
-
-#ifndef __section
-# define __section(NAME)						\
-	__attribute__((section(NAME), used))
-#endif
-
-#ifndef __section_tail
-# define __section_tail(ID, KEY)					\
-	__section(__stringify(ID) "/" __stringify(KEY))
-#endif
-
-#ifndef __section_xdp_entry
-# define __section_xdp_entry						\
-	__section(ELF_SECTION_PROG)
-#endif
-
-#ifndef __section_cls_entry
-# define __section_cls_entry						\
-	__section(ELF_SECTION_CLASSIFIER)
-#endif
-
-#ifndef __section_act_entry
-# define __section_act_entry						\
-	__section(ELF_SECTION_ACTION)
-#endif
-
-#ifndef __section_lwt_entry
-# define __section_lwt_entry						\
-	__section(ELF_SECTION_PROG)
-#endif
-
-#ifndef __section_license
-# define __section_license						\
-	__section(ELF_SECTION_LICENSE)
-#endif
-
-#ifndef __section_maps
-# define __section_maps							\
-	__section(ELF_SECTION_MAPS)
-#endif
-
-/** Declaration helper macros. */
-
-#ifndef BPF_LICENSE
-# define BPF_LICENSE(NAME)						\
-	char ____license[] __section_license = NAME
-#endif
-
-/** Classifier helper */
-
-#ifndef BPF_H_DEFAULT
-# define BPF_H_DEFAULT	-1
-#endif
-
-/** BPF helper functions for tc. Individual flags are in linux/bpf.h */
-
-#ifndef __BPF_FUNC
-# define __BPF_FUNC(NAME, ...)						\
-	(* NAME)(__VA_ARGS__) __maybe_unused
-#endif
-
-#ifndef BPF_FUNC
-# define BPF_FUNC(NAME, ...)						\
-	__BPF_FUNC(NAME, __VA_ARGS__) = (void *) BPF_FUNC_##NAME
-#endif
-
-/* Map access/manipulation */
-static void *BPF_FUNC(map_lookup_elem, void *map, const void *key);
-static int BPF_FUNC(map_update_elem, void *map, const void *key,
-		    const void *value, uint32_t flags);
-static int BPF_FUNC(map_delete_elem, void *map, const void *key);
-
-/* Time access */
-static uint64_t BPF_FUNC(ktime_get_ns);
-
-/* Debugging */
-
-/* FIXME: __attribute__ ((format(printf, 1, 3))) not possible unless
- * llvm bug https://llvm.org/bugs/show_bug.cgi?id=26243 gets resolved.
- * It would require ____fmt to be made const, which generates a reloc
- * entry (non-map).
- */
-static void BPF_FUNC(trace_printk, const char *fmt, int fmt_size, ...);
-
-#ifndef printt
-# define printt(fmt, ...)						\
-	__extension__ ({						\
-		char ____fmt[] = fmt;					\
-		trace_printk(____fmt, sizeof(____fmt), ##__VA_ARGS__);	\
-	})
-#endif
-
-/* Random numbers */
-static uint32_t BPF_FUNC(get_prandom_u32);
-
-/* Tail calls */
-static void BPF_FUNC(tail_call, struct __sk_buff *skb, void *map,
-		     uint32_t index);
-
-/* System helpers */
-static uint32_t BPF_FUNC(get_smp_processor_id);
-static uint32_t BPF_FUNC(get_numa_node_id);
-
-/* Packet misc meta data */
-static uint32_t BPF_FUNC(get_cgroup_classid, struct __sk_buff *skb);
-static int BPF_FUNC(skb_under_cgroup, void *map, uint32_t index);
-
-static uint32_t BPF_FUNC(get_route_realm, struct __sk_buff *skb);
-static uint32_t BPF_FUNC(get_hash_recalc, struct __sk_buff *skb);
-static uint32_t BPF_FUNC(set_hash_invalid, struct __sk_buff *skb);
-
-/* Packet redirection */
-static int BPF_FUNC(redirect, int ifindex, uint32_t flags);
-static int BPF_FUNC(clone_redirect, struct __sk_buff *skb, int ifindex,
-		    uint32_t flags);
-
-/* Packet manipulation */
-static int BPF_FUNC(skb_load_bytes, struct __sk_buff *skb, uint32_t off,
-		    void *to, uint32_t len);
-static int BPF_FUNC(skb_store_bytes, struct __sk_buff *skb, uint32_t off,
-		    const void *from, uint32_t len, uint32_t flags);
-
-static int BPF_FUNC(l3_csum_replace, struct __sk_buff *skb, uint32_t off,
-		    uint32_t from, uint32_t to, uint32_t flags);
-static int BPF_FUNC(l4_csum_replace, struct __sk_buff *skb, uint32_t off,
-		    uint32_t from, uint32_t to, uint32_t flags);
-static int BPF_FUNC(csum_diff, const void *from, uint32_t from_size,
-		    const void *to, uint32_t to_size, uint32_t seed);
-static int BPF_FUNC(csum_update, struct __sk_buff *skb, uint32_t wsum);
-
-static int BPF_FUNC(skb_change_type, struct __sk_buff *skb, uint32_t type);
-static int BPF_FUNC(skb_change_proto, struct __sk_buff *skb, uint32_t proto,
-		    uint32_t flags);
-static int BPF_FUNC(skb_change_tail, struct __sk_buff *skb, uint32_t nlen,
-		    uint32_t flags);
-
-static int BPF_FUNC(skb_pull_data, struct __sk_buff *skb, uint32_t len);
-
-/* Event notification */
-static int __BPF_FUNC(skb_event_output, struct __sk_buff *skb, void *map,
-		      uint64_t index, const void *data, uint32_t size) =
-		      (void *) BPF_FUNC_perf_event_output;
-
-/* Packet vlan encap/decap */
-static int BPF_FUNC(skb_vlan_push, struct __sk_buff *skb, uint16_t proto,
-		    uint16_t vlan_tci);
-static int BPF_FUNC(skb_vlan_pop, struct __sk_buff *skb);
-
-/* Packet tunnel encap/decap */
-static int BPF_FUNC(skb_get_tunnel_key, struct __sk_buff *skb,
-		    struct bpf_tunnel_key *to, uint32_t size, uint32_t flags);
-static int BPF_FUNC(skb_set_tunnel_key, struct __sk_buff *skb,
-		    const struct bpf_tunnel_key *from, uint32_t size,
-		    uint32_t flags);
-
-static int BPF_FUNC(skb_get_tunnel_opt, struct __sk_buff *skb,
-		    void *to, uint32_t size);
-static int BPF_FUNC(skb_set_tunnel_opt, struct __sk_buff *skb,
-		    const void *from, uint32_t size);
-
-/** LLVM built-ins, mem*() routines work for constant size */
-
-#ifndef lock_xadd
-# define lock_xadd(ptr, val)	((void) __sync_fetch_and_add(ptr, val))
-#endif
-
-#ifndef memset
-# define memset(s, c, n)	__builtin_memset((s), (c), (n))
-#endif
-
-#ifndef memcpy
-# define memcpy(d, s, n)	__builtin_memcpy((d), (s), (n))
-#endif
-
-#ifndef memmove
-# define memmove(d, s, n)	__builtin_memmove((d), (s), (n))
-#endif
-
-/* FIXME: __builtin_memcmp() is not yet fully usable unless llvm bug
- * https://llvm.org/bugs/show_bug.cgi?id=26218 gets resolved. Also
- * this one would generate a reloc entry (non-map), otherwise.
- */
-#if 0
-#ifndef memcmp
-# define memcmp(a, b, n)	__builtin_memcmp((a), (b), (n))
-#endif
-#endif
-
-unsigned long long load_byte(void *skb, unsigned long long off)
-	asm ("llvm.bpf.load.byte");
-
-unsigned long long load_half(void *skb, unsigned long long off)
-	asm ("llvm.bpf.load.half");
-
-unsigned long long load_word(void *skb, unsigned long long off)
-	asm ("llvm.bpf.load.word");
-
-#endif /* __BPF_API__ */
diff --git a/drivers/net/tap/bpf/bpf_elf.h b/drivers/net/tap/bpf/bpf_elf.h
deleted file mode 100644
index ea8a11c95c..0000000000
--- a/drivers/net/tap/bpf/bpf_elf.h
+++ /dev/null
@@ -1,53 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 or BSD-3-Clause */
-#ifndef __BPF_ELF__
-#define __BPF_ELF__
-
-#include <asm/types.h>
-
-/* Note:
- *
- * Below ELF section names and bpf_elf_map structure definition
- * are not (!) kernel ABI. It's rather a "contract" between the
- * application and the BPF loader in tc. For compatibility, the
- * section names should stay as-is. Introduction of aliases, if
- * needed, are a possibility, though.
- */
-
-/* ELF section names, etc */
-#define ELF_SECTION_LICENSE	"license"
-#define ELF_SECTION_MAPS	"maps"
-#define ELF_SECTION_PROG	"prog"
-#define ELF_SECTION_CLASSIFIER	"classifier"
-#define ELF_SECTION_ACTION	"action"
-
-#define ELF_MAX_MAPS		64
-#define ELF_MAX_LICENSE_LEN	128
-
-/* Object pinning settings */
-#define PIN_NONE		0
-#define PIN_OBJECT_NS		1
-#define PIN_GLOBAL_NS		2
-
-/* ELF map definition */
-struct bpf_elf_map {
-	__u32 type;
-	__u32 size_key;
-	__u32 size_value;
-	__u32 max_elem;
-	__u32 flags;
-	__u32 id;
-	__u32 pinning;
-	__u32 inner_id;
-	__u32 inner_idx;
-};
-
-#define BPF_ANNOTATE_KV_PAIR(name, type_key, type_val)		\
-	struct ____btf_map_##name {				\
-		type_key key;					\
-		type_val value;					\
-	};							\
-	struct ____btf_map_##name				\
-	    __attribute__ ((section(".maps." #name), used))	\
-	    ____btf_map_##name = { }
-
-#endif /* __BPF_ELF__ */
diff --git a/drivers/net/tap/bpf/bpf_extract.py b/drivers/net/tap/bpf/bpf_extract.py
deleted file mode 100644
index 73c4dafe4e..0000000000
--- a/drivers/net/tap/bpf/bpf_extract.py
+++ /dev/null
@@ -1,85 +0,0 @@
-#!/usr/bin/env python3
-# SPDX-License-Identifier: BSD-3-Clause
-# Copyright (c) 2023 Stephen Hemminger <stephen@networkplumber.org>
-
-import argparse
-import sys
-import struct
-from tempfile import TemporaryFile
-from elftools.elf.elffile import ELFFile
-
-
-def load_sections(elffile):
-    """Get sections of interest from ELF"""
-    result = []
-    parts = [("cls_q", "cls_q_insns"), ("l3_l4", "l3_l4_hash_insns")]
-    for name, tag in parts:
-        section = elffile.get_section_by_name(name)
-        if section:
-            insns = struct.iter_unpack('<BBhL', section.data())
-            result.append([tag, insns])
-    return result
-
-
-def dump_section(name, insns, out):
-    """Dump the array of BPF instructions"""
-    print(f'\nstatic struct bpf_insn {name}[] = {{', file=out)
-    for bpf in insns:
-        code = bpf[0]
-        src = bpf[1] >> 4
-        dst = bpf[1] & 0xf
-        off = bpf[2]
-        imm = bpf[3]
-        print(f'\t{{{code:#04x}, {dst:4d}, {src:4d}, {off:8d}, {imm:#010x}}},',
-              file=out)
-    print('};', file=out)
-
-
-def parse_args():
-    """Parse command line arguments"""
-    parser = argparse.ArgumentParser()
-    parser.add_argument('-s',
-                        '--source',
-                        type=str,
-                        help="original source file")
-    parser.add_argument('-o', '--out', type=str, help="output C file path")
-    parser.add_argument("file",
-                        nargs='+',
-                        help="object file path or '-' for stdin")
-    return parser.parse_args()
-
-
-def open_input(path):
-    """Open the file or stdin"""
-    if path == "-":
-        temp = TemporaryFile()
-        temp.write(sys.stdin.buffer.read())
-        return temp
-    return open(path, 'rb')
-
-
-def write_header(out, source):
-    """Write file intro header"""
-    print("/* SPDX-License-Identifier: BSD-3-Clause", file=out)
-    if source:
-        print(f' * Auto-generated from {source}', file=out)
-    print(" * This not the original source file. Do NOT edit it.", file=out)
-    print(" */\n", file=out)
-
-
-def main():
-    '''program main function'''
-    args = parse_args()
-
-    with open(args.out, 'w',
-              encoding="utf-8") if args.out else sys.stdout as out:
-        write_header(out, args.source)
-        for path in args.file:
-            elffile = ELFFile(open_input(path))
-            sections = load_sections(elffile)
-            for name, insns in sections:
-                dump_section(name, insns, out)
-
-
-if __name__ == "__main__":
-    main()
diff --git a/drivers/net/tap/bpf/meson.build b/drivers/net/tap/bpf/meson.build
new file mode 100644
index 0000000000..f2c03a19fd
--- /dev/null
+++ b/drivers/net/tap/bpf/meson.build
@@ -0,0 +1,81 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright 2024 Stephen Hemminger <stephen@networkplumber.org>
+
+enable_tap_rss = false
+
+libbpf = dependency('libbpf', required: false, method: 'pkg-config')
+if not libbpf.found()
+    message('net/tap: no RSS support missing libbpf')
+    subdir_done()
+endif
+
+# Debian installs this in /usr/sbin, which is not in $PATH
+bpftool = find_program('bpftool', '/usr/sbin/bpftool', required: false, version: '>= 5.6.0')
+if not bpftool.found()
+    message('net/tap: no RSS support missing bpftool')
+    subdir_done()
+endif
+
+clang_supports_bpf = false
+clang = find_program('clang', required: false)
+if clang.found()
+    clang_supports_bpf = run_command(clang, '-target', 'bpf', '--print-supported-cpus',
+                                     check: false).returncode() == 0
+endif
+
+if not clang_supports_bpf
+    message('net/tap: no RSS support missing clang BPF')
+    subdir_done()
+endif
+
+enable_tap_rss = true
+
+libbpf_include_dir = libbpf.get_variable(pkgconfig : 'includedir')
+
+# The include files <linux/bpf.h> and others include <asm/types.h>,
+# but <asm/types.h> is not defined for a multi-lib environment target.
+# Work around this by using the include directory from the host build environment.
+machine_name = run_command('uname', '-m').stdout().strip()
+march_include_dir = '/usr/include/' + machine_name + '-linux-gnu'
+
+clang_flags = [
+    '-O2',
+    '-Wall',
+    '-Wextra',
+    '-target',
+    'bpf',
+    '-g',
+    '-c',
+]
+
+bpf_o_cmd = [
+    clang,
+    clang_flags,
+    '-idirafter',
+    libbpf_include_dir,
+    '-idirafter',
+    march_include_dir,
+    '@INPUT@',
+    '-o',
+    '@OUTPUT@'
+]
+
+skel_h_cmd = [
+    bpftool,
+    'gen',
+    'skeleton',
+    '@INPUT@'
+]
+
+tap_rss_o = custom_target(
+    'tap_rss.bpf.o',
+    input: 'tap_rss.c',
+    output: 'tap_rss.o',
+    command: bpf_o_cmd)
+
+tap_rss_skel_h = custom_target(
+    'tap_rss.skel.h',
+    input: tap_rss_o,
+    output: 'tap_rss.skel.h',
+    command: skel_h_cmd,
+    capture: true)
diff --git a/drivers/net/tap/bpf/tap_bpf_program.c b/drivers/net/tap/bpf/tap_bpf_program.c
deleted file mode 100644
index f05aed021c..0000000000
--- a/drivers/net/tap/bpf/tap_bpf_program.c
+++ /dev/null
@@ -1,255 +0,0 @@
-/* SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0
- * Copyright 2017 Mellanox Technologies, Ltd
- */
-
-#include <stdint.h>
-#include <stdbool.h>
-#include <sys/types.h>
-#include <sys/socket.h>
-#include <asm/types.h>
-#include <linux/in.h>
-#include <linux/if.h>
-#include <linux/if_ether.h>
-#include <linux/ip.h>
-#include <linux/ipv6.h>
-#include <linux/if_tunnel.h>
-#include <linux/filter.h>
-
-#include "bpf_api.h"
-#include "bpf_elf.h"
-#include "../tap_rss.h"
-
-/** Create IPv4 address */
-#define IPv4(a, b, c, d) ((__u32)(((a) & 0xff) << 24) | \
-		(((b) & 0xff) << 16) | \
-		(((c) & 0xff) << 8)  | \
-		((d) & 0xff))
-
-#define PORT(a, b) ((__u16)(((a) & 0xff) << 8) | \
-		((b) & 0xff))
-
-/*
- * The queue number is offset by a unique QUEUE_OFFSET, to distinguish
- * packets that have gone through this rule (skb->cb[1] != 0) from others.
- */
-#define QUEUE_OFFSET		0x7cafe800
-#define PIN_GLOBAL_NS		2
-
-#define KEY_IDX			0
-#define BPF_MAP_ID_KEY	1
-
-struct vlan_hdr {
-	__be16 proto;
-	__be16 tci;
-};
-
-struct bpf_elf_map __attribute__((section("maps"), used))
-map_keys = {
-	.type           =       BPF_MAP_TYPE_HASH,
-	.id             =       BPF_MAP_ID_KEY,
-	.size_key       =       sizeof(__u32),
-	.size_value     =       sizeof(struct rss_key),
-	.max_elem       =       256,
-	.pinning        =       PIN_GLOBAL_NS,
-};
-
-__section("cls_q") int
-match_q(struct __sk_buff *skb)
-{
-	__u32 queue = skb->cb[1];
-	/* queue is set by tap_flow_bpf_cls_q() before load */
-	volatile __u32 q = 0xdeadbeef;
-	__u32 match_queue = QUEUE_OFFSET + q;
-
-	/* printt("match_q$i() queue = %d\n", queue); */
-
-	if (queue != match_queue)
-		return TC_ACT_OK;
-
-	/* queue match */
-	skb->cb[1] = 0;
-	return TC_ACT_UNSPEC;
-}
-
-
-struct ipv4_l3_l4_tuple {
-	__u32    src_addr;
-	__u32    dst_addr;
-	__u16    dport;
-	__u16    sport;
-} __attribute__((packed));
-
-struct ipv6_l3_l4_tuple {
-	__u8        src_addr[16];
-	__u8        dst_addr[16];
-	__u16       dport;
-	__u16       sport;
-} __attribute__((packed));
-
-static const __u8 def_rss_key[TAP_RSS_HASH_KEY_SIZE] = {
-	0xd1, 0x81, 0xc6, 0x2c,
-	0xf7, 0xf4, 0xdb, 0x5b,
-	0x19, 0x83, 0xa2, 0xfc,
-	0x94, 0x3e, 0x1a, 0xdb,
-	0xd9, 0x38, 0x9e, 0x6b,
-	0xd1, 0x03, 0x9c, 0x2c,
-	0xa7, 0x44, 0x99, 0xad,
-	0x59, 0x3d, 0x56, 0xd9,
-	0xf3, 0x25, 0x3c, 0x06,
-	0x2a, 0xdc, 0x1f, 0xfc,
-};
-
-static __u32  __attribute__((always_inline))
-rte_softrss_be(const __u32 *input_tuple, const uint8_t *rss_key,
-		__u8 input_len)
-{
-	__u32 i, j, hash = 0;
-#pragma unroll
-	for (j = 0; j < input_len; j++) {
-#pragma unroll
-		for (i = 0; i < 32; i++) {
-			if (input_tuple[j] & (1U << (31 - i))) {
-				hash ^= ((const __u32 *)def_rss_key)[j] << i |
-				(__u32)((uint64_t)
-				(((const __u32 *)def_rss_key)[j + 1])
-					>> (32 - i));
-			}
-		}
-	}
-	return hash;
-}
-
-static int __attribute__((always_inline))
-rss_l3_l4(struct __sk_buff *skb)
-{
-	void *data_end = (void *)(long)skb->data_end;
-	void *data = (void *)(long)skb->data;
-	__u16 proto = (__u16)skb->protocol;
-	__u32 key_idx = 0xdeadbeef;
-	__u32 hash;
-	struct rss_key *rsskey;
-	__u64 off = ETH_HLEN;
-	int j;
-	__u8 *key = 0;
-	__u32 len;
-	__u32 queue = 0;
-	bool mf = 0;
-	__u16 frag_off = 0;
-
-	rsskey = map_lookup_elem(&map_keys, &key_idx);
-	if (!rsskey) {
-		printt("hash(): rss key is not configured\n");
-		return TC_ACT_OK;
-	}
-	key = (__u8 *)rsskey->key;
-
-	/* Get correct proto for 802.1ad */
-	if (skb->vlan_present && skb->vlan_proto == htons(ETH_P_8021AD)) {
-		if (data + ETH_ALEN * 2 + sizeof(struct vlan_hdr) +
-		    sizeof(proto) > data_end)
-			return TC_ACT_OK;
-		proto = *(__u16 *)(data + ETH_ALEN * 2 +
-				   sizeof(struct vlan_hdr));
-		off += sizeof(struct vlan_hdr);
-	}
-
-	if (proto == htons(ETH_P_IP)) {
-		if (data + off + sizeof(struct iphdr) + sizeof(__u32)
-			> data_end)
-			return TC_ACT_OK;
-
-		__u8 *src_dst_addr = data + off + offsetof(struct iphdr, saddr);
-		__u8 *frag_off_addr = data + off + offsetof(struct iphdr, frag_off);
-		__u8 *prot_addr = data + off + offsetof(struct iphdr, protocol);
-		__u8 *src_dst_port = data + off + sizeof(struct iphdr);
-		struct ipv4_l3_l4_tuple v4_tuple = {
-			.src_addr = IPv4(*(src_dst_addr + 0),
-					*(src_dst_addr + 1),
-					*(src_dst_addr + 2),
-					*(src_dst_addr + 3)),
-			.dst_addr = IPv4(*(src_dst_addr + 4),
-					*(src_dst_addr + 5),
-					*(src_dst_addr + 6),
-					*(src_dst_addr + 7)),
-			.sport = 0,
-			.dport = 0,
-		};
-		/** Fetch the L4-payer port numbers only in-case of TCP/UDP
-		 ** and also if the packet is not fragmented. Since fragmented
-		 ** chunks do not have L4 TCP/UDP header.
-		 **/
-		if (*prot_addr == IPPROTO_UDP || *prot_addr == IPPROTO_TCP) {
-			frag_off = PORT(*(frag_off_addr + 0),
-					*(frag_off_addr + 1));
-			mf = frag_off & 0x2000;
-			frag_off = frag_off & 0x1fff;
-			if (mf == 0 && frag_off == 0) {
-				v4_tuple.sport = PORT(*(src_dst_port + 0),
-						*(src_dst_port + 1));
-				v4_tuple.dport = PORT(*(src_dst_port + 2),
-						*(src_dst_port + 3));
-			}
-		}
-		__u8 input_len = sizeof(v4_tuple) / sizeof(__u32);
-		if (rsskey->hash_fields & (1 << HASH_FIELD_IPV4_L3))
-			input_len--;
-		hash = rte_softrss_be((__u32 *)&v4_tuple, key, 3);
-	} else if (proto == htons(ETH_P_IPV6)) {
-		if (data + off + sizeof(struct ipv6hdr) +
-					sizeof(__u32) > data_end)
-			return TC_ACT_OK;
-		__u8 *src_dst_addr = data + off +
-					offsetof(struct ipv6hdr, saddr);
-		__u8 *src_dst_port = data + off +
-					sizeof(struct ipv6hdr);
-		__u8 *next_hdr = data + off +
-					offsetof(struct ipv6hdr, nexthdr);
-
-		struct ipv6_l3_l4_tuple v6_tuple;
-		for (j = 0; j < 4; j++)
-			*((uint32_t *)&v6_tuple.src_addr + j) =
-				__builtin_bswap32(*((uint32_t *)
-						src_dst_addr + j));
-		for (j = 0; j < 4; j++)
-			*((uint32_t *)&v6_tuple.dst_addr + j) =
-				__builtin_bswap32(*((uint32_t *)
-						src_dst_addr + 4 + j));
-
-		/** Fetch the L4 header port-numbers only if next-header
-		 * is TCP/UDP **/
-		if (*next_hdr == IPPROTO_UDP || *next_hdr == IPPROTO_TCP) {
-			v6_tuple.sport = PORT(*(src_dst_port + 0),
-				      *(src_dst_port + 1));
-			v6_tuple.dport = PORT(*(src_dst_port + 2),
-				      *(src_dst_port + 3));
-		} else {
-			v6_tuple.sport = 0;
-			v6_tuple.dport = 0;
-		}
-
-		__u8 input_len = sizeof(v6_tuple) / sizeof(__u32);
-		if (rsskey->hash_fields & (1 << HASH_FIELD_IPV6_L3))
-			input_len--;
-		hash = rte_softrss_be((__u32 *)&v6_tuple, key, 9);
-	} else {
-		return TC_ACT_PIPE;
-	}
-
-	queue = rsskey->queues[(hash % rsskey->nb_queues) &
-				       (TAP_MAX_QUEUES - 1)];
-	skb->cb[1] = QUEUE_OFFSET + queue;
-	/* printt(">>>>> rss_l3_l4 hash=0x%x queue=%u\n", hash, queue); */
-
-	return TC_ACT_RECLASSIFY;
-}
-
-#define RSS(L)						\
-	__section(#L) int				\
-		L ## _hash(struct __sk_buff *skb)	\
-	{						\
-		return rss_ ## L (skb);			\
-	}
-
-RSS(l3_l4)
-
-BPF_LICENSE("Dual BSD/GPL");
diff --git a/drivers/net/tap/bpf/tap_rss.c b/drivers/net/tap/bpf/tap_rss.c
new file mode 100644
index 0000000000..025b831b5c
--- /dev/null
+++ b/drivers/net/tap/bpf/tap_rss.c
@@ -0,0 +1,267 @@
+/* SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0
+ * Copyright 2017 Mellanox Technologies, Ltd
+ */
+
+#include <linux/in.h>
+#include <linux/if_ether.h>
+#include <linux/ip.h>
+#include <linux/ipv6.h>
+#include <linux/pkt_cls.h>
+#include <linux/bpf.h>
+
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_endian.h>
+
+#include "../tap_rss.h"
+
+/*
+ * This map provides configuration information about flows which need BPF RSS.
+ *
+ * The hash is indexed by the skb mark.
+ */
+struct {
+	__uint(type, BPF_MAP_TYPE_HASH);
+	__uint(key_size, sizeof(__u32));
+	__uint(value_size, sizeof(struct rss_key));
+	__uint(max_entries, TAP_RSS_MAX);
+} rss_map SEC(".maps");
+
+#define IP_MF		0x2000		/** IP header Flags **/
+#define IP_OFFSET	0x1FFF		/** IP header fragment offset **/
+
+/*
+ * Compute Toeplitz hash over the input tuple.
+ * This is the same as rte_softrss_be in lib/hash,
+ * but the loop needs to be set up to match BPF restrictions.
+ */
+static __always_inline __u32
+softrss_be(const __u32 *input_tuple, __u32 input_len, const __u32 *key)
+{
+	__u32 i, j, hash = 0;
+
+#pragma unroll
+	for (j = 0; j < input_len; j++) {
+#pragma unroll
+		for (i = 0; i < 32; i++) {
+			if (input_tuple[j] & (1U << (31 - i)))
+				hash ^= key[j] << i | key[j + 1] >> (32 - i);
+		}
+	}
+	return hash;
+}
+
+/*
+ * Compute RSS hash for IPv4 packet.
+ * Returns 0 if RSS is not specified.
+ */
+static __always_inline __u32
+parse_ipv4(const struct __sk_buff *skb, __u32 hash_type, const __u32 *key)
+{
+	struct iphdr iph;
+	__u32 off = 0;
+
+	if (bpf_skb_load_bytes_relative(skb, off, &iph, sizeof(iph), BPF_HDR_START_NET))
+		return 0;	/* no IP header present */
+
+	struct {
+		__u32    src_addr;
+		__u32    dst_addr;
+		__u16    dport;
+		__u16    sport;
+	} v4_tuple = {
+		.src_addr = bpf_ntohl(iph.saddr),
+		.dst_addr = bpf_ntohl(iph.daddr),
+	};
+
+	/* If only calculating L3 hash, do it now */
+	if (hash_type & (1 << HASH_FIELD_IPV4_L3))
+		return softrss_be((__u32 *)&v4_tuple, sizeof(v4_tuple) / sizeof(__u32) - 1, key);
+
+	/* If packet is fragmented then no L4 hash is possible */
+	if ((iph.frag_off & bpf_htons(IP_MF | IP_OFFSET)) != 0)
+		return 0;
+
+	/* Do RSS on UDP or TCP protocols */
+	if (iph.protocol == IPPROTO_UDP || iph.protocol == IPPROTO_TCP) {
+		__u16 src_dst_port[2];
+
+		off += iph.ihl * 4;
+		if (bpf_skb_load_bytes_relative(skb, off, &src_dst_port, sizeof(src_dst_port),
+						BPF_HDR_START_NET))
+			return 0; /* TCP or UDP header missing */
+
+		v4_tuple.sport = bpf_ntohs(src_dst_port[0]);
+		v4_tuple.dport = bpf_ntohs(src_dst_port[1]);
+		return softrss_be((__u32 *)&v4_tuple, sizeof(v4_tuple) / sizeof(__u32), key);
+	}
+
+	/* Other protocol */
+	return 0;
+}
+
+/*
+ * Parse IPv6 extension headers, update offset and return next proto.
+ * Returns next proto on success, -1 on malformed header.
+ */
+static __always_inline int
+skip_ip6_ext(__u16 proto, const struct __sk_buff *skb, __u32 *off, int *frag)
+{
+	struct ext_hdr {
+		__u8 next_hdr;
+		__u8 len;
+	} xh;
+	unsigned int i;
+
+	*frag = 0;
+
+#define MAX_EXT_HDRS 5
+#pragma unroll
+	for (i = 0; i < MAX_EXT_HDRS; i++) {
+		switch (proto) {
+		case IPPROTO_HOPOPTS:
+		case IPPROTO_ROUTING:
+		case IPPROTO_DSTOPTS:
+			if (bpf_skb_load_bytes_relative(skb, *off, &xh, sizeof(xh),
+							BPF_HDR_START_NET))
+				return -1;
+
+			*off += (xh.len + 1) * 8;
+			proto = xh.next_hdr;
+			break;
+		case IPPROTO_FRAGMENT:
+			if (bpf_skb_load_bytes_relative(skb, *off, &xh, sizeof(xh),
+							BPF_HDR_START_NET))
+				return -1;
+
+			*off += 8;
+			proto = xh.next_hdr;
+			*frag = 1;
+			return proto; /* this is always the last ext hdr */
+		default:
+			return proto;
+		}
+	}
+
+	/* too many extension headers give up */
+	return -1;
+}
+
+/*
+ * Compute RSS hash for IPv6 packet.
+ * Returns 0 if RSS is not specified.
+ */
+static __always_inline __u32
+parse_ipv6(const struct __sk_buff *skb, __u32 hash_type, const __u32 *key)
+{
+	struct {
+		__u32       src_addr[4];
+		__u32       dst_addr[4];
+		__u16       dport;
+		__u16       sport;
+	} v6_tuple = { };
+	struct ipv6hdr ip6h;
+	__u32 off = 0, j;
+	int proto, frag;
+
+	if (bpf_skb_load_bytes_relative(skb, off, &ip6h, sizeof(ip6h), BPF_HDR_START_NET))
+		return 0;	/* missing IPv6 header */
+
+#pragma unroll
+	for (j = 0; j < 4; j++) {
+		v6_tuple.src_addr[j] = bpf_ntohl(ip6h.saddr.in6_u.u6_addr32[j]);
+		v6_tuple.dst_addr[j] = bpf_ntohl(ip6h.daddr.in6_u.u6_addr32[j]);
+	}
+
+	/* If only doing L3 hash, do it now */
+	if (hash_type & (1 << HASH_FIELD_IPV6_L3))
+		return softrss_be((__u32 *)&v6_tuple, sizeof(v6_tuple) / sizeof(__u32) - 1, key);
+
+	/* Skip extension headers if present */
+	off += sizeof(ip6h);
+	proto = skip_ip6_ext(ip6h.nexthdr, skb, &off, &frag);
+	if (proto < 0)
+		return 0;
+
+	/* If packet is a fragment then no L4 hash is possible */
+	if (frag)
+		return 0;
+
+	/* Do RSS on UDP or TCP */
+	if (proto == IPPROTO_UDP || proto == IPPROTO_TCP) {
+		__u16 src_dst_port[2];
+
+		if (bpf_skb_load_bytes_relative(skb, off, &src_dst_port, sizeof(src_dst_port),
+						BPF_HDR_START_NET))
+			return 0;
+
+		v6_tuple.sport = bpf_ntohs(src_dst_port[0]);
+		v6_tuple.dport = bpf_ntohs(src_dst_port[1]);
+
+		return softrss_be((__u32 *)&v6_tuple, sizeof(v6_tuple) / sizeof(__u32), key);
+	}
+
+	return 0;
+}
+
+/*
+ * Scale value into the range [0, n).
+ * Assumes val is large (i.e. the hash covers the whole u32 range).
+ */
+static __always_inline __u32
+reciprocal_scale(__u32 val, __u32 n)
+{
+	return (__u32)(((__u64)val * n) >> 32);
+}
+
+/*
+ * When this BPF program is run by tc from the filter classifier,
+ * it is able to read skb metadata and packet data.
+ *
+ * For packets where RSS is not possible, just return TC_ACT_OK.
+ * When RSS is desired, change the skb->queue_mapping and return TC_ACT_PIPE
+ * to continue processing.
+ *
+ * This should be BPF_PROG_TYPE_SCHED_ACT so section needs to be "action"
+ */
+SEC("action") int
+rss_flow_action(struct __sk_buff *skb)
+{
+	const struct rss_key *rsskey;
+	const __u32 *key;
+	__be16 proto;
+	__u32 mark;
+	__u32 hash;
+	__u16 queue;
+
+	__builtin_preserve_access_index(({
+		mark = skb->mark;
+		proto = skb->protocol;
+	}));
+
+	/* Lookup RSS configuration for that BPF class */
+	rsskey = bpf_map_lookup_elem(&rss_map, &mark);
+	if (rsskey == NULL)
+		return TC_ACT_OK;
+
+	key = (const __u32 *)rsskey->key;
+
+	if (proto == bpf_htons(ETH_P_IP))
+		hash = parse_ipv4(skb, rsskey->hash_fields, key);
+	else if (proto == bpf_htons(ETH_P_IPV6))
+		hash = parse_ipv6(skb, rsskey->hash_fields, key);
+	else
+		hash = 0;
+
+	if (hash == 0)
+		return TC_ACT_OK;
+
+	/* Fold hash to the number of queues configured */
+	queue = reciprocal_scale(hash, rsskey->nb_queues);
+
+	__builtin_preserve_access_index(({
+		skb->queue_mapping = queue;
+	}));
+	return TC_ACT_PIPE;
+}
+
+char _license[] SEC("license") = "Dual BSD/GPL";
-- 
2.43.0
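
For readers following the arithmetic in tap_rss.c above, a minimal user-space
sketch of the two steps, the Toeplitz hash and the hash-to-queue scaling, might
look like the following. It is not the BPF program from the patch: the names
toeplitz_be and scale_to_range are hypothetical, the loops are not unrolled,
and the key window is built as a 64-bit value so that the i == 0 case never
shifts a 32-bit value by 32 bits (which C leaves undefined). The key array is
assumed to hold at least input_len + 1 words.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Toeplitz hash over an array of host-order 32-bit words.
 * Hypothetical helper: key_words must hold at least input_len + 1 words,
 * because each input word uses a sliding window over key[j] and key[j + 1].
 */
static uint32_t
toeplitz_be(const uint32_t *input, size_t input_len, const uint32_t *key_words)
{
	uint32_t hash = 0;

	for (size_t j = 0; j < input_len; j++) {
		/* key[j] in the high half, key[j + 1] in the low half */
		uint64_t window = ((uint64_t)key_words[j] << 32) | key_words[j + 1];

		for (unsigned int i = 0; i < 32; i++) {
			/* test input bits from most to least significant */
			if (input[j] & (1U << (31 - i)))
				hash ^= (uint32_t)(window >> (32 - i));
		}
	}
	return hash;
}

/*
 * Map a 32-bit value into [0, n) with a multiply and shift instead of
 * a modulo. The result is evenly spread only if val covers the whole
 * 32-bit range, as a hash normally does.
 */
static uint32_t
scale_to_range(uint32_t val, uint32_t n)
{
	return (uint32_t)(((uint64_t)val * n) >> 32);
}
```

With a two-word key { 0x12345678, 0x9abcdef0 } and a single input word
0x80000000, only the top input bit is set, so the hash is simply the first
key word. A queue index would then be scale_to_range(hash, nb_queues).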


^ permalink raw reply	[relevance 2%]

* Re: [PATCH] net/af_packet: fix statistics
  2024-05-01 18:18  0%     ` Morten Brørup
@ 2024-05-02 13:47  0%       ` Ferruh Yigit
  0 siblings, 0 replies; 200+ results
From: Ferruh Yigit @ 2024-05-02 13:47 UTC (permalink / raw)
  To: Morten Brørup, Stephen Hemminger
  Cc: dev, John W. Linville, Mattias Rönnblom

On 5/1/2024 7:18 PM, Morten Brørup wrote:
>> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
>> Sent: Wednesday, 1 May 2024 18.45
>>
>> On Wed, 1 May 2024 17:25:59 +0100
>> Ferruh Yigit <ferruh.yigit@amd.com> wrote:
>>
>>>>  - Remove the tx_error counter since it was not correct.
>>>>    When transmit ring is full it is not an error and
>>>>    the driver correctly returns only the number sent.
>>>>
>>>
>>> nack
>>> Transmit full is not only return case here.
>>> There are actual errors continue to process relying this error
>> calculation.
>>> Also there are error cases like interface down.
>>> Those error cases should be handled individually if we remove this.
>>> I suggest split this change to separate patch.
>>
>> I see multiple drivers have copy/pasted same code and consider
>> transmit full as an error. It is not.
> 
> +1
> Transmit full is certainly not an error!
> 

I am not referring to the transmit full case, there are error cases in
the driver:
- oversized packets
- vlan inserting failure

In the above cases the Tx loop continues and relies on these packets
being counted as errors at the end of the loop. We can't just remove the
error counter; we need to handle the above.


- poll on fd fails
- poll on fd returns POLLERR (if down)

In the above cases the driver Tx loop breaks and all remaining packets are
counted as errors.


- sendto() fails

All packets sent to the af_packet frame are counted as errors.


As you can see, there are real error cases which are handled in the driver.
That is why, instead of just removing the error counter, I suggest handling
it more properly in a separate patch.
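
As a rough illustration of the distinction being discussed, the sketch below
counts real errors explicitly and treats a full ring as a plain stop. It is
not the af_packet driver code: the enum, struct and function names are all
hypothetical, and per-packet outcomes are passed in as an array purely so the
accounting can be shown in isolation.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-packet outcome; not taken from the af_packet driver. */
enum tx_outcome {
	TX_SENT,
	TX_BAD_PACKET,	/* e.g. oversized packet or VLAN insertion failure */
	TX_RING_FULL,	/* no free slot: stop, but not an error */
	TX_FATAL,	/* e.g. poll() failure or POLLERR: stop, rest are errors */
};

struct tx_stats {
	uint64_t sent;
	uint64_t errors;
};

/*
 * Walk a burst and account for each packet.
 * Returns the number of packets sent in this burst.
 */
static size_t
tx_burst_sketch(const enum tx_outcome *outcomes, size_t nb_pkts,
		struct tx_stats *stats)
{
	size_t sent = 0;

	for (size_t i = 0; i < nb_pkts; i++) {
		if (outcomes[i] == TX_RING_FULL)
			break;	/* remaining packets are left uncounted */

		if (outcomes[i] == TX_FATAL) {
			/* this packet and all remaining ones are errors */
			stats->errors += nb_pkts - i;
			break;
		}

		if (outcomes[i] == TX_BAD_PACKET) {
			stats->errors++;	/* skip it and keep going */
			continue;
		}

		sent++;
	}

	stats->sent += sent;
	return sent;
}
```

The point of the sketch is only that the error counter is incremented at the
place where each failure is detected, rather than derived from the difference
between the burst size and the number sent.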

>>
>> There should be a new statistic at ethdev layer that does record
>> transmit full, and make it across all drivers, but that would have
>> to wait for ABI change.
> 
> What happens to these non-transmittable packets depend on the application.
> Our application discards them and count them in a (per-port, per-queue) application level counter tx_nodescr, which eventually becomes IF-MIB::ifOutDiscards in SNMP. I think many applications behave similarly, so having an ethdev layer tx_nodescr counter might be helpful.
> Other applications could try to retransmit them; if there are still no TX descriptors, they will be counted again.
> 
> In case anyone gets funny ideas: The PMD should still not free those non-transmitted packet mbufs, because the application might want to treat them differently than the transmitted packets, e.g. for latency stats or packet capture.
> 


^ permalink raw reply	[relevance 0%]

* [PATCH v12 06/12] net/tap: rewrite the RSS BPF program
    @ 2024-05-02 21:31  2%   ` Stephen Hemminger
    2 siblings, 0 replies; 200+ results
From: Stephen Hemminger @ 2024-05-02 21:31 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

Rewrite of the BPF program used to do queue based RSS.

Important changes:
	- uses newer BPF map format BTF
	- accepts key as parameter rather than constant default
	- can do L3 or L4 hashing
	- supports IPv4 options
	- supports IPv6 extension headers
	- restructured for readability

The usage of BPF is different as well:
	- the incoming configuration is looked up based on
	  class parameters rather than patching the BPF code.
	- the resulting queue is placed in skb by using skb mark
	  rather than requiring a second pass through classifier step.

Note: This version only works with a later patch to enable it on
the DPDK driver side. It is submitted as an incremental patch
to allow for easier review. Bisection still works because
the old instructions are still present for now.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 .gitignore                            |   3 -
 drivers/net/tap/bpf/Makefile          |  19 --
 drivers/net/tap/bpf/README            |  49 +++++
 drivers/net/tap/bpf/bpf_api.h         | 276 --------------------------
 drivers/net/tap/bpf/bpf_elf.h         |  53 -----
 drivers/net/tap/bpf/bpf_extract.py    |  85 --------
 drivers/net/tap/bpf/meson.build       |  81 ++++++++
 drivers/net/tap/bpf/tap_bpf_program.c | 255 ------------------------
 drivers/net/tap/bpf/tap_rss.c         | 267 +++++++++++++++++++++++++
 9 files changed, 397 insertions(+), 691 deletions(-)
 delete mode 100644 drivers/net/tap/bpf/Makefile
 create mode 100644 drivers/net/tap/bpf/README
 delete mode 100644 drivers/net/tap/bpf/bpf_api.h
 delete mode 100644 drivers/net/tap/bpf/bpf_elf.h
 delete mode 100644 drivers/net/tap/bpf/bpf_extract.py
 create mode 100644 drivers/net/tap/bpf/meson.build
 delete mode 100644 drivers/net/tap/bpf/tap_bpf_program.c
 create mode 100644 drivers/net/tap/bpf/tap_rss.c

diff --git a/.gitignore b/.gitignore
index 3f444dcace..01a47a7606 100644
--- a/.gitignore
+++ b/.gitignore
@@ -36,9 +36,6 @@ TAGS
 # ignore python bytecode files
 *.pyc
 
-# ignore BPF programs
-drivers/net/tap/bpf/tap_bpf_program.o
-
 # DTS results
 dts/output
 
diff --git a/drivers/net/tap/bpf/Makefile b/drivers/net/tap/bpf/Makefile
deleted file mode 100644
index 9efeeb1bc7..0000000000
--- a/drivers/net/tap/bpf/Makefile
+++ /dev/null
@@ -1,19 +0,0 @@
-# SPDX-License-Identifier: BSD-3-Clause
-# This file is not built as part of normal DPDK build.
-# It is used to generate the eBPF code for TAP RSS.
-
-CLANG=clang
-CLANG_OPTS=-O2
-TARGET=../tap_bpf_insns.h
-
-all: $(TARGET)
-
-clean:
-	rm tap_bpf_program.o $(TARGET)
-
-tap_bpf_program.o: tap_bpf_program.c
-	$(CLANG) $(CLANG_OPTS) -emit-llvm -c $< -o - | \
-	llc -march=bpf -filetype=obj -o $@
-
-$(TARGET): tap_bpf_program.o
-	python3 bpf_extract.py -stap_bpf_program.c -o $@ $<
diff --git a/drivers/net/tap/bpf/README b/drivers/net/tap/bpf/README
new file mode 100644
index 0000000000..6d323d2051
--- /dev/null
+++ b/drivers/net/tap/bpf/README
@@ -0,0 +1,49 @@
+This is the BPF program used to implement Receive Side Scaling (RSS)
+across multiple queues if required by a flow action. The program is
+loaded into the kernel when the first RSS flow rule is created and is never unloaded.
+
+When flow rules are used with the TAP device, packets are first handled by the
+ingress queue discipline that then runs a series of classifier filter rules.
+The first stage is the flow based classifier (flower); for RSS queue
+action the second stage is the kernel skbedit action which sets
+the skb mark to a key based on the flow id; the final stage
+is this BPF program which then maps flow id and packet header
+into a queue id.
+
+This version is built with the BPF Compile Once - Run Everywhere (CO-RE)
+framework and uses libbpf and bpftool.
+
+Limitations
+-----------
+- requires libbpf to run
+
+- rebuilding the BPF requires the clang compiler with bpf available
+  as a target architecture and bpftool to convert object to headers.
+
+  Some older versions of Ubuntu do not have a working bpftool package.
+
+- only standard Toeplitz hash with standard 40 byte key is supported.
+
+- the number of flow rules using RSS is limited to 32.
+
+Building
+--------
+During the DPDK build process the meson build file checks that
+libbpf, bpftool, and clang are available. If everything works then
+BPF RSS is enabled.
+
+The steps are:
+
+1. Uses clang to compile tap_rss.c to produce tap_rss.bpf.o
+
+2. Uses bpftool to generate a skeleton header file tap_rss.skel.h
+   from tap_rss.bpf.o. This header contains wrapper functions for
+   managing the BPF and the actual BPF code as a large byte array.
+
+3. The header file is included in tap_flow.c so that it can load
+   the BPF code (via libbpf).
+
+References
+----------
+BPF and XDP reference guide
+https://docs.cilium.io/en/latest/bpf/progtypes/
diff --git a/drivers/net/tap/bpf/bpf_api.h b/drivers/net/tap/bpf/bpf_api.h
deleted file mode 100644
index 4cd25fa593..0000000000
--- a/drivers/net/tap/bpf/bpf_api.h
+++ /dev/null
@@ -1,276 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 or BSD-3-Clause */
-
-#ifndef __BPF_API__
-#define __BPF_API__
-
-/* Note:
- *
- * This file can be included into eBPF kernel programs. It contains
- * a couple of useful helper functions, map/section ABI (bpf_elf.h),
- * misc macros and some eBPF specific LLVM built-ins.
- */
-
-#include <stdint.h>
-
-#include <linux/pkt_cls.h>
-#include <linux/bpf.h>
-#include <linux/filter.h>
-
-#include <asm/byteorder.h>
-
-#include "bpf_elf.h"
-
-/** libbpf pin type. */
-enum libbpf_pin_type {
-	LIBBPF_PIN_NONE,
-	/* PIN_BY_NAME: pin maps by name (in /sys/fs/bpf by default) */
-	LIBBPF_PIN_BY_NAME,
-};
-
-/** Type helper macros. */
-
-#define __uint(name, val) int (*name)[val]
-#define __type(name, val) typeof(val) *name
-#define __array(name, val) typeof(val) *name[]
-
-/** Misc macros. */
-
-#ifndef __stringify
-# define __stringify(X)		#X
-#endif
-
-#ifndef __maybe_unused
-# define __maybe_unused		__attribute__((__unused__))
-#endif
-
-#ifndef offsetof
-# define offsetof(TYPE, MEMBER)	__builtin_offsetof(TYPE, MEMBER)
-#endif
-
-#ifndef likely
-# define likely(X)		__builtin_expect(!!(X), 1)
-#endif
-
-#ifndef unlikely
-# define unlikely(X)		__builtin_expect(!!(X), 0)
-#endif
-
-#ifndef htons
-# define htons(X)		__constant_htons((X))
-#endif
-
-#ifndef ntohs
-# define ntohs(X)		__constant_ntohs((X))
-#endif
-
-#ifndef htonl
-# define htonl(X)		__constant_htonl((X))
-#endif
-
-#ifndef ntohl
-# define ntohl(X)		__constant_ntohl((X))
-#endif
-
-#ifndef __inline__
-# define __inline__		__attribute__((always_inline))
-#endif
-
-/** Section helper macros. */
-
-#ifndef __section
-# define __section(NAME)						\
-	__attribute__((section(NAME), used))
-#endif
-
-#ifndef __section_tail
-# define __section_tail(ID, KEY)					\
-	__section(__stringify(ID) "/" __stringify(KEY))
-#endif
-
-#ifndef __section_xdp_entry
-# define __section_xdp_entry						\
-	__section(ELF_SECTION_PROG)
-#endif
-
-#ifndef __section_cls_entry
-# define __section_cls_entry						\
-	__section(ELF_SECTION_CLASSIFIER)
-#endif
-
-#ifndef __section_act_entry
-# define __section_act_entry						\
-	__section(ELF_SECTION_ACTION)
-#endif
-
-#ifndef __section_lwt_entry
-# define __section_lwt_entry						\
-	__section(ELF_SECTION_PROG)
-#endif
-
-#ifndef __section_license
-# define __section_license						\
-	__section(ELF_SECTION_LICENSE)
-#endif
-
-#ifndef __section_maps
-# define __section_maps							\
-	__section(ELF_SECTION_MAPS)
-#endif
-
-/** Declaration helper macros. */
-
-#ifndef BPF_LICENSE
-# define BPF_LICENSE(NAME)						\
-	char ____license[] __section_license = NAME
-#endif
-
-/** Classifier helper */
-
-#ifndef BPF_H_DEFAULT
-# define BPF_H_DEFAULT	-1
-#endif
-
-/** BPF helper functions for tc. Individual flags are in linux/bpf.h */
-
-#ifndef __BPF_FUNC
-# define __BPF_FUNC(NAME, ...)						\
-	(* NAME)(__VA_ARGS__) __maybe_unused
-#endif
-
-#ifndef BPF_FUNC
-# define BPF_FUNC(NAME, ...)						\
-	__BPF_FUNC(NAME, __VA_ARGS__) = (void *) BPF_FUNC_##NAME
-#endif
-
-/* Map access/manipulation */
-static void *BPF_FUNC(map_lookup_elem, void *map, const void *key);
-static int BPF_FUNC(map_update_elem, void *map, const void *key,
-		    const void *value, uint32_t flags);
-static int BPF_FUNC(map_delete_elem, void *map, const void *key);
-
-/* Time access */
-static uint64_t BPF_FUNC(ktime_get_ns);
-
-/* Debugging */
-
-/* FIXME: __attribute__ ((format(printf, 1, 3))) not possible unless
- * llvm bug https://llvm.org/bugs/show_bug.cgi?id=26243 gets resolved.
- * It would require ____fmt to be made const, which generates a reloc
- * entry (non-map).
- */
-static void BPF_FUNC(trace_printk, const char *fmt, int fmt_size, ...);
-
-#ifndef printt
-# define printt(fmt, ...)						\
-	__extension__ ({						\
-		char ____fmt[] = fmt;					\
-		trace_printk(____fmt, sizeof(____fmt), ##__VA_ARGS__);	\
-	})
-#endif
-
-/* Random numbers */
-static uint32_t BPF_FUNC(get_prandom_u32);
-
-/* Tail calls */
-static void BPF_FUNC(tail_call, struct __sk_buff *skb, void *map,
-		     uint32_t index);
-
-/* System helpers */
-static uint32_t BPF_FUNC(get_smp_processor_id);
-static uint32_t BPF_FUNC(get_numa_node_id);
-
-/* Packet misc meta data */
-static uint32_t BPF_FUNC(get_cgroup_classid, struct __sk_buff *skb);
-static int BPF_FUNC(skb_under_cgroup, void *map, uint32_t index);
-
-static uint32_t BPF_FUNC(get_route_realm, struct __sk_buff *skb);
-static uint32_t BPF_FUNC(get_hash_recalc, struct __sk_buff *skb);
-static uint32_t BPF_FUNC(set_hash_invalid, struct __sk_buff *skb);
-
-/* Packet redirection */
-static int BPF_FUNC(redirect, int ifindex, uint32_t flags);
-static int BPF_FUNC(clone_redirect, struct __sk_buff *skb, int ifindex,
-		    uint32_t flags);
-
-/* Packet manipulation */
-static int BPF_FUNC(skb_load_bytes, struct __sk_buff *skb, uint32_t off,
-		    void *to, uint32_t len);
-static int BPF_FUNC(skb_store_bytes, struct __sk_buff *skb, uint32_t off,
-		    const void *from, uint32_t len, uint32_t flags);
-
-static int BPF_FUNC(l3_csum_replace, struct __sk_buff *skb, uint32_t off,
-		    uint32_t from, uint32_t to, uint32_t flags);
-static int BPF_FUNC(l4_csum_replace, struct __sk_buff *skb, uint32_t off,
-		    uint32_t from, uint32_t to, uint32_t flags);
-static int BPF_FUNC(csum_diff, const void *from, uint32_t from_size,
-		    const void *to, uint32_t to_size, uint32_t seed);
-static int BPF_FUNC(csum_update, struct __sk_buff *skb, uint32_t wsum);
-
-static int BPF_FUNC(skb_change_type, struct __sk_buff *skb, uint32_t type);
-static int BPF_FUNC(skb_change_proto, struct __sk_buff *skb, uint32_t proto,
-		    uint32_t flags);
-static int BPF_FUNC(skb_change_tail, struct __sk_buff *skb, uint32_t nlen,
-		    uint32_t flags);
-
-static int BPF_FUNC(skb_pull_data, struct __sk_buff *skb, uint32_t len);
-
-/* Event notification */
-static int __BPF_FUNC(skb_event_output, struct __sk_buff *skb, void *map,
-		      uint64_t index, const void *data, uint32_t size) =
-		      (void *) BPF_FUNC_perf_event_output;
-
-/* Packet vlan encap/decap */
-static int BPF_FUNC(skb_vlan_push, struct __sk_buff *skb, uint16_t proto,
-		    uint16_t vlan_tci);
-static int BPF_FUNC(skb_vlan_pop, struct __sk_buff *skb);
-
-/* Packet tunnel encap/decap */
-static int BPF_FUNC(skb_get_tunnel_key, struct __sk_buff *skb,
-		    struct bpf_tunnel_key *to, uint32_t size, uint32_t flags);
-static int BPF_FUNC(skb_set_tunnel_key, struct __sk_buff *skb,
-		    const struct bpf_tunnel_key *from, uint32_t size,
-		    uint32_t flags);
-
-static int BPF_FUNC(skb_get_tunnel_opt, struct __sk_buff *skb,
-		    void *to, uint32_t size);
-static int BPF_FUNC(skb_set_tunnel_opt, struct __sk_buff *skb,
-		    const void *from, uint32_t size);
-
-/** LLVM built-ins, mem*() routines work for constant size */
-
-#ifndef lock_xadd
-# define lock_xadd(ptr, val)	((void) __sync_fetch_and_add(ptr, val))
-#endif
-
-#ifndef memset
-# define memset(s, c, n)	__builtin_memset((s), (c), (n))
-#endif
-
-#ifndef memcpy
-# define memcpy(d, s, n)	__builtin_memcpy((d), (s), (n))
-#endif
-
-#ifndef memmove
-# define memmove(d, s, n)	__builtin_memmove((d), (s), (n))
-#endif
-
-/* FIXME: __builtin_memcmp() is not yet fully usable unless llvm bug
- * https://llvm.org/bugs/show_bug.cgi?id=26218 gets resolved. Also
- * this one would generate a reloc entry (non-map), otherwise.
- */
-#if 0
-#ifndef memcmp
-# define memcmp(a, b, n)	__builtin_memcmp((a), (b), (n))
-#endif
-#endif
-
-unsigned long long load_byte(void *skb, unsigned long long off)
-	asm ("llvm.bpf.load.byte");
-
-unsigned long long load_half(void *skb, unsigned long long off)
-	asm ("llvm.bpf.load.half");
-
-unsigned long long load_word(void *skb, unsigned long long off)
-	asm ("llvm.bpf.load.word");
-
-#endif /* __BPF_API__ */
diff --git a/drivers/net/tap/bpf/bpf_elf.h b/drivers/net/tap/bpf/bpf_elf.h
deleted file mode 100644
index ea8a11c95c..0000000000
--- a/drivers/net/tap/bpf/bpf_elf.h
+++ /dev/null
@@ -1,53 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 or BSD-3-Clause */
-#ifndef __BPF_ELF__
-#define __BPF_ELF__
-
-#include <asm/types.h>
-
-/* Note:
- *
- * Below ELF section names and bpf_elf_map structure definition
- * are not (!) kernel ABI. It's rather a "contract" between the
- * application and the BPF loader in tc. For compatibility, the
- * section names should stay as-is. Introduction of aliases, if
- * needed, are a possibility, though.
- */
-
-/* ELF section names, etc */
-#define ELF_SECTION_LICENSE	"license"
-#define ELF_SECTION_MAPS	"maps"
-#define ELF_SECTION_PROG	"prog"
-#define ELF_SECTION_CLASSIFIER	"classifier"
-#define ELF_SECTION_ACTION	"action"
-
-#define ELF_MAX_MAPS		64
-#define ELF_MAX_LICENSE_LEN	128
-
-/* Object pinning settings */
-#define PIN_NONE		0
-#define PIN_OBJECT_NS		1
-#define PIN_GLOBAL_NS		2
-
-/* ELF map definition */
-struct bpf_elf_map {
-	__u32 type;
-	__u32 size_key;
-	__u32 size_value;
-	__u32 max_elem;
-	__u32 flags;
-	__u32 id;
-	__u32 pinning;
-	__u32 inner_id;
-	__u32 inner_idx;
-};
-
-#define BPF_ANNOTATE_KV_PAIR(name, type_key, type_val)		\
-	struct ____btf_map_##name {				\
-		type_key key;					\
-		type_val value;					\
-	};							\
-	struct ____btf_map_##name				\
-	    __attribute__ ((section(".maps." #name), used))	\
-	    ____btf_map_##name = { }
-
-#endif /* __BPF_ELF__ */
diff --git a/drivers/net/tap/bpf/bpf_extract.py b/drivers/net/tap/bpf/bpf_extract.py
deleted file mode 100644
index 73c4dafe4e..0000000000
--- a/drivers/net/tap/bpf/bpf_extract.py
+++ /dev/null
@@ -1,85 +0,0 @@
-#!/usr/bin/env python3
-# SPDX-License-Identifier: BSD-3-Clause
-# Copyright (c) 2023 Stephen Hemminger <stephen@networkplumber.org>
-
-import argparse
-import sys
-import struct
-from tempfile import TemporaryFile
-from elftools.elf.elffile import ELFFile
-
-
-def load_sections(elffile):
-    """Get sections of interest from ELF"""
-    result = []
-    parts = [("cls_q", "cls_q_insns"), ("l3_l4", "l3_l4_hash_insns")]
-    for name, tag in parts:
-        section = elffile.get_section_by_name(name)
-        if section:
-            insns = struct.iter_unpack('<BBhL', section.data())
-            result.append([tag, insns])
-    return result
-
-
-def dump_section(name, insns, out):
-    """Dump the array of BPF instructions"""
-    print(f'\nstatic struct bpf_insn {name}[] = {{', file=out)
-    for bpf in insns:
-        code = bpf[0]
-        src = bpf[1] >> 4
-        dst = bpf[1] & 0xf
-        off = bpf[2]
-        imm = bpf[3]
-        print(f'\t{{{code:#04x}, {dst:4d}, {src:4d}, {off:8d}, {imm:#010x}}},',
-              file=out)
-    print('};', file=out)
-
-
-def parse_args():
-    """Parse command line arguments"""
-    parser = argparse.ArgumentParser()
-    parser.add_argument('-s',
-                        '--source',
-                        type=str,
-                        help="original source file")
-    parser.add_argument('-o', '--out', type=str, help="output C file path")
-    parser.add_argument("file",
-                        nargs='+',
-                        help="object file path or '-' for stdin")
-    return parser.parse_args()
-
-
-def open_input(path):
-    """Open the file or stdin"""
-    if path == "-":
-        temp = TemporaryFile()
-        temp.write(sys.stdin.buffer.read())
-        return temp
-    return open(path, 'rb')
-
-
-def write_header(out, source):
-    """Write file intro header"""
-    print("/* SPDX-License-Identifier: BSD-3-Clause", file=out)
-    if source:
-        print(f' * Auto-generated from {source}', file=out)
-    print(" * This not the original source file. Do NOT edit it.", file=out)
-    print(" */\n", file=out)
-
-
-def main():
-    '''program main function'''
-    args = parse_args()
-
-    with open(args.out, 'w',
-              encoding="utf-8") if args.out else sys.stdout as out:
-        write_header(out, args.source)
-        for path in args.file:
-            elffile = ELFFile(open_input(path))
-            sections = load_sections(elffile)
-            for name, insns in sections:
-                dump_section(name, insns, out)
-
-
-if __name__ == "__main__":
-    main()
diff --git a/drivers/net/tap/bpf/meson.build b/drivers/net/tap/bpf/meson.build
new file mode 100644
index 0000000000..f2c03a19fd
--- /dev/null
+++ b/drivers/net/tap/bpf/meson.build
@@ -0,0 +1,81 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright 2024 Stephen Hemminger <stephen@networkplumber.org>
+
+enable_tap_rss = false
+
+libbpf = dependency('libbpf', required: false, method: 'pkg-config')
+if not libbpf.found()
+    message('net/tap: no RSS support missing libbpf')
+    subdir_done()
+endif
+
+# Debian installs this in /usr/sbin which is not in $PATH
+bpftool = find_program('bpftool', '/usr/sbin/bpftool', required: false, version: '>= 5.6.0')
+if not bpftool.found()
+    message('net/tap: no RSS support missing bpftool')
+    subdir_done()
+endif
+
+clang_supports_bpf = false
+clang = find_program('clang', required: false)
+if clang.found()
+    clang_supports_bpf = run_command(clang, '-target', 'bpf', '--print-supported-cpus',
+                                     check: false).returncode() == 0
+endif
+
+if not clang_supports_bpf
+    message('net/tap: no RSS support missing clang BPF')
+    subdir_done()
+endif
+
+enable_tap_rss = true
+
+libbpf_include_dir = libbpf.get_variable(pkgconfig : 'includedir')
+
+# The include files <linux/bpf.h> and others include <asm/types.h>
+# but <asm/types.h> is not defined for multi-lib environment target.
+# Work around this by using the include directory from the host build environment.
+machine_name = run_command('uname', '-m').stdout().strip()
+march_include_dir = '/usr/include/' + machine_name + '-linux-gnu'
+
+clang_flags = [
+    '-O2',
+    '-Wall',
+    '-Wextra',
+    '-target',
+    'bpf',
+    '-g',
+    '-c',
+]
+
+bpf_o_cmd = [
+    clang,
+    clang_flags,
+    '-idirafter',
+    libbpf_include_dir,
+    '-idirafter',
+    march_include_dir,
+    '@INPUT@',
+    '-o',
+    '@OUTPUT@'
+]
+
+skel_h_cmd = [
+    bpftool,
+    'gen',
+    'skeleton',
+    '@INPUT@'
+]
+
+tap_rss_o = custom_target(
+    'tap_rss.bpf.o',
+    input: 'tap_rss.c',
+    output: 'tap_rss.o',
+    command: bpf_o_cmd)
+
+tap_rss_skel_h = custom_target(
+    'tap_rss.skel.h',
+    input: tap_rss_o,
+    output: 'tap_rss.skel.h',
+    command: skel_h_cmd,
+    capture: true)
diff --git a/drivers/net/tap/bpf/tap_bpf_program.c b/drivers/net/tap/bpf/tap_bpf_program.c
deleted file mode 100644
index f05aed021c..0000000000
--- a/drivers/net/tap/bpf/tap_bpf_program.c
+++ /dev/null
@@ -1,255 +0,0 @@
-/* SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0
- * Copyright 2017 Mellanox Technologies, Ltd
- */
-
-#include <stdint.h>
-#include <stdbool.h>
-#include <sys/types.h>
-#include <sys/socket.h>
-#include <asm/types.h>
-#include <linux/in.h>
-#include <linux/if.h>
-#include <linux/if_ether.h>
-#include <linux/ip.h>
-#include <linux/ipv6.h>
-#include <linux/if_tunnel.h>
-#include <linux/filter.h>
-
-#include "bpf_api.h"
-#include "bpf_elf.h"
-#include "../tap_rss.h"
-
-/** Create IPv4 address */
-#define IPv4(a, b, c, d) ((__u32)(((a) & 0xff) << 24) | \
-		(((b) & 0xff) << 16) | \
-		(((c) & 0xff) << 8)  | \
-		((d) & 0xff))
-
-#define PORT(a, b) ((__u16)(((a) & 0xff) << 8) | \
-		((b) & 0xff))
-
-/*
- * The queue number is offset by a unique QUEUE_OFFSET, to distinguish
- * packets that have gone through this rule (skb->cb[1] != 0) from others.
- */
-#define QUEUE_OFFSET		0x7cafe800
-#define PIN_GLOBAL_NS		2
-
-#define KEY_IDX			0
-#define BPF_MAP_ID_KEY	1
-
-struct vlan_hdr {
-	__be16 proto;
-	__be16 tci;
-};
-
-struct bpf_elf_map __attribute__((section("maps"), used))
-map_keys = {
-	.type           =       BPF_MAP_TYPE_HASH,
-	.id             =       BPF_MAP_ID_KEY,
-	.size_key       =       sizeof(__u32),
-	.size_value     =       sizeof(struct rss_key),
-	.max_elem       =       256,
-	.pinning        =       PIN_GLOBAL_NS,
-};
-
-__section("cls_q") int
-match_q(struct __sk_buff *skb)
-{
-	__u32 queue = skb->cb[1];
-	/* queue is set by tap_flow_bpf_cls_q() before load */
-	volatile __u32 q = 0xdeadbeef;
-	__u32 match_queue = QUEUE_OFFSET + q;
-
-	/* printt("match_q$i() queue = %d\n", queue); */
-
-	if (queue != match_queue)
-		return TC_ACT_OK;
-
-	/* queue match */
-	skb->cb[1] = 0;
-	return TC_ACT_UNSPEC;
-}
-
-
-struct ipv4_l3_l4_tuple {
-	__u32    src_addr;
-	__u32    dst_addr;
-	__u16    dport;
-	__u16    sport;
-} __attribute__((packed));
-
-struct ipv6_l3_l4_tuple {
-	__u8        src_addr[16];
-	__u8        dst_addr[16];
-	__u16       dport;
-	__u16       sport;
-} __attribute__((packed));
-
-static const __u8 def_rss_key[TAP_RSS_HASH_KEY_SIZE] = {
-	0xd1, 0x81, 0xc6, 0x2c,
-	0xf7, 0xf4, 0xdb, 0x5b,
-	0x19, 0x83, 0xa2, 0xfc,
-	0x94, 0x3e, 0x1a, 0xdb,
-	0xd9, 0x38, 0x9e, 0x6b,
-	0xd1, 0x03, 0x9c, 0x2c,
-	0xa7, 0x44, 0x99, 0xad,
-	0x59, 0x3d, 0x56, 0xd9,
-	0xf3, 0x25, 0x3c, 0x06,
-	0x2a, 0xdc, 0x1f, 0xfc,
-};
-
-static __u32  __attribute__((always_inline))
-rte_softrss_be(const __u32 *input_tuple, const uint8_t *rss_key,
-		__u8 input_len)
-{
-	__u32 i, j, hash = 0;
-#pragma unroll
-	for (j = 0; j < input_len; j++) {
-#pragma unroll
-		for (i = 0; i < 32; i++) {
-			if (input_tuple[j] & (1U << (31 - i))) {
-				hash ^= ((const __u32 *)def_rss_key)[j] << i |
-				(__u32)((uint64_t)
-				(((const __u32 *)def_rss_key)[j + 1])
-					>> (32 - i));
-			}
-		}
-	}
-	return hash;
-}
-
-static int __attribute__((always_inline))
-rss_l3_l4(struct __sk_buff *skb)
-{
-	void *data_end = (void *)(long)skb->data_end;
-	void *data = (void *)(long)skb->data;
-	__u16 proto = (__u16)skb->protocol;
-	__u32 key_idx = 0xdeadbeef;
-	__u32 hash;
-	struct rss_key *rsskey;
-	__u64 off = ETH_HLEN;
-	int j;
-	__u8 *key = 0;
-	__u32 len;
-	__u32 queue = 0;
-	bool mf = 0;
-	__u16 frag_off = 0;
-
-	rsskey = map_lookup_elem(&map_keys, &key_idx);
-	if (!rsskey) {
-		printt("hash(): rss key is not configured\n");
-		return TC_ACT_OK;
-	}
-	key = (__u8 *)rsskey->key;
-
-	/* Get correct proto for 802.1ad */
-	if (skb->vlan_present && skb->vlan_proto == htons(ETH_P_8021AD)) {
-		if (data + ETH_ALEN * 2 + sizeof(struct vlan_hdr) +
-		    sizeof(proto) > data_end)
-			return TC_ACT_OK;
-		proto = *(__u16 *)(data + ETH_ALEN * 2 +
-				   sizeof(struct vlan_hdr));
-		off += sizeof(struct vlan_hdr);
-	}
-
-	if (proto == htons(ETH_P_IP)) {
-		if (data + off + sizeof(struct iphdr) + sizeof(__u32)
-			> data_end)
-			return TC_ACT_OK;
-
-		__u8 *src_dst_addr = data + off + offsetof(struct iphdr, saddr);
-		__u8 *frag_off_addr = data + off + offsetof(struct iphdr, frag_off);
-		__u8 *prot_addr = data + off + offsetof(struct iphdr, protocol);
-		__u8 *src_dst_port = data + off + sizeof(struct iphdr);
-		struct ipv4_l3_l4_tuple v4_tuple = {
-			.src_addr = IPv4(*(src_dst_addr + 0),
-					*(src_dst_addr + 1),
-					*(src_dst_addr + 2),
-					*(src_dst_addr + 3)),
-			.dst_addr = IPv4(*(src_dst_addr + 4),
-					*(src_dst_addr + 5),
-					*(src_dst_addr + 6),
-					*(src_dst_addr + 7)),
-			.sport = 0,
-			.dport = 0,
-		};
-		/** Fetch the L4-payer port numbers only in-case of TCP/UDP
-		 ** and also if the packet is not fragmented. Since fragmented
-		 ** chunks do not have L4 TCP/UDP header.
-		 **/
-		if (*prot_addr == IPPROTO_UDP || *prot_addr == IPPROTO_TCP) {
-			frag_off = PORT(*(frag_off_addr + 0),
-					*(frag_off_addr + 1));
-			mf = frag_off & 0x2000;
-			frag_off = frag_off & 0x1fff;
-			if (mf == 0 && frag_off == 0) {
-				v4_tuple.sport = PORT(*(src_dst_port + 0),
-						*(src_dst_port + 1));
-				v4_tuple.dport = PORT(*(src_dst_port + 2),
-						*(src_dst_port + 3));
-			}
-		}
-		__u8 input_len = sizeof(v4_tuple) / sizeof(__u32);
-		if (rsskey->hash_fields & (1 << HASH_FIELD_IPV4_L3))
-			input_len--;
-		hash = rte_softrss_be((__u32 *)&v4_tuple, key, 3);
-	} else if (proto == htons(ETH_P_IPV6)) {
-		if (data + off + sizeof(struct ipv6hdr) +
-					sizeof(__u32) > data_end)
-			return TC_ACT_OK;
-		__u8 *src_dst_addr = data + off +
-					offsetof(struct ipv6hdr, saddr);
-		__u8 *src_dst_port = data + off +
-					sizeof(struct ipv6hdr);
-		__u8 *next_hdr = data + off +
-					offsetof(struct ipv6hdr, nexthdr);
-
-		struct ipv6_l3_l4_tuple v6_tuple;
-		for (j = 0; j < 4; j++)
-			*((uint32_t *)&v6_tuple.src_addr + j) =
-				__builtin_bswap32(*((uint32_t *)
-						src_dst_addr + j));
-		for (j = 0; j < 4; j++)
-			*((uint32_t *)&v6_tuple.dst_addr + j) =
-				__builtin_bswap32(*((uint32_t *)
-						src_dst_addr + 4 + j));
-
-		/** Fetch the L4 header port-numbers only if next-header
-		 * is TCP/UDP **/
-		if (*next_hdr == IPPROTO_UDP || *next_hdr == IPPROTO_TCP) {
-			v6_tuple.sport = PORT(*(src_dst_port + 0),
-				      *(src_dst_port + 1));
-			v6_tuple.dport = PORT(*(src_dst_port + 2),
-				      *(src_dst_port + 3));
-		} else {
-			v6_tuple.sport = 0;
-			v6_tuple.dport = 0;
-		}
-
-		__u8 input_len = sizeof(v6_tuple) / sizeof(__u32);
-		if (rsskey->hash_fields & (1 << HASH_FIELD_IPV6_L3))
-			input_len--;
-		hash = rte_softrss_be((__u32 *)&v6_tuple, key, 9);
-	} else {
-		return TC_ACT_PIPE;
-	}
-
-	queue = rsskey->queues[(hash % rsskey->nb_queues) &
-				       (TAP_MAX_QUEUES - 1)];
-	skb->cb[1] = QUEUE_OFFSET + queue;
-	/* printt(">>>>> rss_l3_l4 hash=0x%x queue=%u\n", hash, queue); */
-
-	return TC_ACT_RECLASSIFY;
-}
-
-#define RSS(L)						\
-	__section(#L) int				\
-		L ## _hash(struct __sk_buff *skb)	\
-	{						\
-		return rss_ ## L (skb);			\
-	}
-
-RSS(l3_l4)
-
-BPF_LICENSE("Dual BSD/GPL");
diff --git a/drivers/net/tap/bpf/tap_rss.c b/drivers/net/tap/bpf/tap_rss.c
new file mode 100644
index 0000000000..025b831b5c
--- /dev/null
+++ b/drivers/net/tap/bpf/tap_rss.c
@@ -0,0 +1,267 @@
+/* SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0
+ * Copyright 2017 Mellanox Technologies, Ltd
+ */
+
+#include <linux/in.h>
+#include <linux/if_ether.h>
+#include <linux/ip.h>
+#include <linux/ipv6.h>
+#include <linux/pkt_cls.h>
+#include <linux/bpf.h>
+
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_endian.h>
+
+#include "../tap_rss.h"
+
+/*
+ * This map provides configuration information about flows which need BPF RSS.
+ *
+ * The hash is indexed by the skb mark.
+ */
+struct {
+	__uint(type, BPF_MAP_TYPE_HASH);
+	__uint(key_size, sizeof(__u32));
+	__uint(value_size, sizeof(struct rss_key));
+	__uint(max_entries, TAP_RSS_MAX);
+} rss_map SEC(".maps");
+
+#define IP_MF		0x2000		/** IP header Flags **/
+#define IP_OFFSET	0x1FFF		/** IP header fragment offset **/
+
+/*
+ * Compute Toeplitz hash over the input tuple.
+ * This is the same as rte_softrss_be in lib/hash
+ * but the loop needs to be set up to match BPF restrictions.
+ */
+static __always_inline __u32
+softrss_be(const __u32 *input_tuple, __u32 input_len, const __u32 *key)
+{
+	__u32 i, j, hash = 0;
+
+#pragma unroll
+	for (j = 0; j < input_len; j++) {
+#pragma unroll
+		for (i = 0; i < 32; i++) {
+			if (input_tuple[j] & (1U << (31 - i)))
+				hash ^= key[j] << i | key[j + 1] >> (32 - i);
+		}
+	}
+	return hash;
+}
+
+/*
+ * Compute RSS hash for IPv4 packet.
+ * Returns 0 if RSS not specified
+ */
+static __always_inline __u32
+parse_ipv4(const struct __sk_buff *skb, __u32 hash_type, const __u32 *key)
+{
+	struct iphdr iph;
+	__u32 off = 0;
+
+	if (bpf_skb_load_bytes_relative(skb, off, &iph, sizeof(iph), BPF_HDR_START_NET))
+		return 0;	/* no IP header present */
+
+	struct {
+		__u32    src_addr;
+		__u32    dst_addr;
+		__u16    dport;
+		__u16    sport;
+	} v4_tuple = {
+		.src_addr = bpf_ntohl(iph.saddr),
+		.dst_addr = bpf_ntohl(iph.daddr),
+	};
+
+	/* If only calculating L3 hash, do it now */
+	if (hash_type & (1 << HASH_FIELD_IPV4_L3))
+		return softrss_be((__u32 *)&v4_tuple, sizeof(v4_tuple) / sizeof(__u32) - 1, key);
+
+	/* If packet is fragmented then no L4 hash is possible */
+	if ((iph.frag_off & bpf_htons(IP_MF | IP_OFFSET)) != 0)
+		return 0;
+
+	/* Do RSS on UDP or TCP protocols */
+	if (iph.protocol == IPPROTO_UDP || iph.protocol == IPPROTO_TCP) {
+		__u16 src_dst_port[2];
+
+		off += iph.ihl * 4;
+		if (bpf_skb_load_bytes_relative(skb, off, &src_dst_port, sizeof(src_dst_port),
+						BPF_HDR_START_NET))
+			return 0; /* TCP or UDP header missing */
+
+		v4_tuple.sport = bpf_ntohs(src_dst_port[0]);
+		v4_tuple.dport = bpf_ntohs(src_dst_port[1]);
+		return softrss_be((__u32 *)&v4_tuple, sizeof(v4_tuple) / sizeof(__u32), key);
+	}
+
+	/* Other protocol */
+	return 0;
+}
+
+/*
+ * Parse IPv6 extension headers, update offset and return next proto.
+ * returns next proto on success, -1 on malformed header
+ */
+static __always_inline int
+skip_ip6_ext(__u16 proto, const struct __sk_buff *skb, __u32 *off, int *frag)
+{
+	struct ext_hdr {
+		__u8 next_hdr;
+		__u8 len;
+	} xh;
+	unsigned int i;
+
+	*frag = 0;
+
+#define MAX_EXT_HDRS 5
+#pragma unroll
+	for (i = 0; i < MAX_EXT_HDRS; i++) {
+		switch (proto) {
+		case IPPROTO_HOPOPTS:
+		case IPPROTO_ROUTING:
+		case IPPROTO_DSTOPTS:
+			if (bpf_skb_load_bytes_relative(skb, *off, &xh, sizeof(xh),
+							BPF_HDR_START_NET))
+				return -1;
+
+			*off += (xh.len + 1) * 8;
+			proto = xh.next_hdr;
+			break;
+		case IPPROTO_FRAGMENT:
+			if (bpf_skb_load_bytes_relative(skb, *off, &xh, sizeof(xh),
+							BPF_HDR_START_NET))
+				return -1;
+
+			*off += 8;
+			proto = xh.next_hdr;
+			*frag = 1;
+			return proto; /* this is always the last ext hdr */
+		default:
+			return proto;
+		}
+	}
+
+	/* too many extension headers, give up */
+	return -1;
+}
+
+/*
+ * Compute RSS hash for IPv6 packet.
+ * Returns 0 if RSS not specified
+ */
+static __always_inline __u32
+parse_ipv6(const struct __sk_buff *skb, __u32 hash_type, const __u32 *key)
+{
+	struct {
+		__u32       src_addr[4];
+		__u32       dst_addr[4];
+		__u16       dport;
+		__u16       sport;
+	} v6_tuple = { };
+	struct ipv6hdr ip6h;
+	__u32 off = 0, j;
+	int proto, frag;
+
+	if (bpf_skb_load_bytes_relative(skb, off, &ip6h, sizeof(ip6h), BPF_HDR_START_NET))
+		return 0;	/* missing IPv6 header */
+
+#pragma unroll
+	for (j = 0; j < 4; j++) {
+		v6_tuple.src_addr[j] = bpf_ntohl(ip6h.saddr.in6_u.u6_addr32[j]);
+		v6_tuple.dst_addr[j] = bpf_ntohl(ip6h.daddr.in6_u.u6_addr32[j]);
+	}
+
+	/* If only doing L3 hash, do it now */
+	if (hash_type & (1 << HASH_FIELD_IPV6_L3))
+		return softrss_be((__u32 *)&v6_tuple, sizeof(v6_tuple) / sizeof(__u32) - 1, key);
+
+	/* Skip extension headers if present */
+	off += sizeof(ip6h);
+	proto = skip_ip6_ext(ip6h.nexthdr, skb, &off, &frag);
+	if (proto < 0)
+		return 0;
+
+	/* If packet is a fragment then no L4 hash is possible */
+	if (frag)
+		return 0;
+
+	/* Do RSS on UDP or TCP */
+	if (proto == IPPROTO_UDP || proto == IPPROTO_TCP) {
+		__u16 src_dst_port[2];
+
+		if (bpf_skb_load_bytes_relative(skb, off, &src_dst_port, sizeof(src_dst_port),
+						BPF_HDR_START_NET))
+			return 0;
+
+		v6_tuple.sport = bpf_ntohs(src_dst_port[0]);
+		v6_tuple.dport = bpf_ntohs(src_dst_port[1]);
+
+		return softrss_be((__u32 *)&v6_tuple, sizeof(v6_tuple) / sizeof(__u32), key);
+	}
+
+	return 0;
+}
+
+/*
+ * Scale value into the range [0, n)
+ * Assumes val is large (ie hash covers whole u32 range)
+ */
+static __always_inline __u32
+reciprocal_scale(__u32 val, __u32 n)
+{
+	return (__u32)(((__u64)val * n) >> 32);
+}
+
+/*
+ * When this BPF program is run by tc from the filter classifier,
+ * it is able to read skb metadata and packet data.
+ *
+ * For packets where RSS is not possible, then just return TC_ACT_OK.
+ * When RSS is desired, change the skb->queue_mapping and set TC_ACT_PIPE
+ * to continue processing.
+ *
+ * This should be BPF_PROG_TYPE_SCHED_ACT so section needs to be "action"
+ */
+SEC("action") int
+rss_flow_action(struct __sk_buff *skb)
+{
+	const struct rss_key *rsskey;
+	const __u32 *key;
+	__be16 proto;
+	__u32 mark;
+	__u32 hash;
+	__u16 queue;
+
+	__builtin_preserve_access_index(({
+		mark = skb->mark;
+		proto = skb->protocol;
+	}));
+
+	/* Lookup RSS configuration for that BPF class */
+	rsskey = bpf_map_lookup_elem(&rss_map, &mark);
+	if (rsskey == NULL)
+		return TC_ACT_OK;
+
+	key = (const __u32 *)rsskey->key;
+
+	if (proto == bpf_htons(ETH_P_IP))
+		hash = parse_ipv4(skb, rsskey->hash_fields, key);
+	else if (proto == bpf_htons(ETH_P_IPV6))
+		hash = parse_ipv6(skb, rsskey->hash_fields, key);
+	else
+		hash = 0;
+
+	if (hash == 0)
+		return TC_ACT_OK;
+
+	/* Fold hash to the number of queues configured */
+	queue = reciprocal_scale(hash, rsskey->nb_queues);
+
+	__builtin_preserve_access_index(({
+		skb->queue_mapping = queue;
+	}));
+	return TC_ACT_PIPE;
+}
+
+char _license[] SEC("license") = "Dual BSD/GPL";
-- 
2.43.0
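
For readers who want to experiment with the hashing outside the kernel, the two arithmetic helpers in tap_rss.c above can be sketched as ordinary userspace C. This is only an illustration under stated assumptions, not the code from the patch: the names toeplitz_be and scale_to_queues and the key_words parameter are hypothetical, the key is assumed to be already converted to host-order 32-bit words, and the i == 0 case is guarded explicitly because shifting a 32-bit value by 32 is undefined in ISO C.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Toeplitz hash over a tuple of host-order 32-bit words.
 * Hypothetical userspace sketch; key must hold at least
 * tuple_words + 1 words because each step looks at key[j + 1].
 */
static uint32_t
toeplitz_be(const uint32_t *tuple, uint32_t tuple_words,
	    const uint32_t *key, uint32_t key_words)
{
	uint32_t hash = 0;

	assert(tuple_words + 1 <= key_words);

	for (uint32_t j = 0; j < tuple_words; j++) {
		for (uint32_t i = 0; i < 32; i++) {
			if (tuple[j] & (1U << (31 - i))) {
				/* avoid an undefined 32-bit shift when i == 0 */
				uint32_t low = i ? key[j + 1] >> (32 - i) : 0;

				hash ^= (key[j] << i) | low;
			}
		}
	}
	return hash;
}

/*
 * Map a hash that covers the whole 32-bit range onto [0, n)
 * with a multiply and shift instead of a modulo.
 */
static uint32_t
scale_to_queues(uint32_t val, uint32_t n)
{
	return (uint32_t)(((uint64_t)val * n) >> 32);
}
```

The multiply-and-shift form keeps the result strictly below n for any 32-bit input, so no additional masking of the queue index is needed, provided the hash is spread over the full 32-bit range as the comment in the patch assumes.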



* Re: [PATCH] freebsd: Add support for multiple dpdk instances on FreeBSD
  @ 2024-05-03 13:24  3%       ` Bruce Richardson
  0 siblings, 0 replies; 200+ results
From: Bruce Richardson @ 2024-05-03 13:24 UTC (permalink / raw)
  To: Tom Jones; +Cc: dev

On Fri, May 03, 2024 at 02:12:58PM +0100, Tom Jones wrote:
> Hi Bruce,
> 
> thanks for letting me know
> 
> I'm not tied to anything particularly. This change isn't compatible with the previous API, but I'm not against making it so if that is really the best thing to do. As is, the dpdk changes and the contigmem changes need to come together because the API changes for getting the physical addresses.
> 

I don't think it's a major problem if the new kernel code doesn't work with the
older DPDK userspace code, we can apply both together in one patch.
However, it would count as an API/ABI change so would need to be deferred
for merge to 24.11 release, I think.


> It is just the sysctl paths that differ. I'm not sure what the compatibility needs to be for DPDK, for all of my usage I have built the kernel module with the package - making API changes easy.
> 
> I'm happy to follow which ever path you think is best.
> 

I'll maybe give more thoughts on this once I try the patch out. Hopefully
I'll get to test it out this afternoon. Don't bother trying to rework
anything until then! :-)

> Sorry for the patch confusion, I'll try to keep the sequence obvious going forward.
> 

No problem. Thanks for the contribution here. FreeBSD support has sadly
been lacking a number of features for some time now, so all changes to
close the feature gap vs linux are very welcome!

/Bruce


^ permalink raw reply	[relevance 3%]

* RE: [RFC v2] net/af_packet: make stats reset reliable
  @ 2024-05-07 16:00  3%           ` Morten Brørup
  2024-05-07 16:54  0%             ` Ferruh Yigit
  2024-05-08  7:48  0%             ` Mattias Rönnblom
  0 siblings, 2 replies; 200+ results
From: Morten Brørup @ 2024-05-07 16:00 UTC (permalink / raw)
  To: Stephen Hemminger, Ferruh Yigit
  Cc: Mattias Rönnblom, John W. Linville, Thomas Monjalon, dev,
	Mattias Rönnblom

> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> Sent: Tuesday, 7 May 2024 16.51

> I would prefer that the SW statistics be handled generically by ethdev
> layers and used by all such drivers.

I agree.

Please note that maintaining counters in the ethdev layer might cause more cache misses than maintaining them in the hot parts of the individual drivers' data structures, so it's not all that simple. ;-)

Until then, let's find a short term solution, viable to implement across all software NIC drivers without API/ABI breakage.

> 
> The most complete version of SW stats now is in the virtio driver.

It looks like the virtio PMD maintains the counters; they are not retrieved from the host.

Considering a DPDK application running as a virtual machine (guest) on a host server...

If the host is unable to put a packet onto the guest's virtio RX queue - like when a HW NIC is out of RX descriptors - is it counted somewhere visible to the guest?

Similarly, if the guest is unable to put a packet onto its virtio TX queue, is it counted somewhere visible to the host?

> If reset needs to be reliable (debatable), then it needs to be done without
> atomics.

Let's modify that slightly: Without performance degradation in the fast path.
I'm not sure that all atomic operations are slow.
But you are right that it needs to be done without _Atomic counters; they seem to be slow.


^ permalink raw reply	[relevance 3%]

* Re: [RFC v2] net/af_packet: make stats reset reliable
  2024-05-07 16:00  3%           ` Morten Brørup
@ 2024-05-07 16:54  0%             ` Ferruh Yigit
  2024-05-07 18:47  0%               ` Stephen Hemminger
  2024-05-08  7:48  0%             ` Mattias Rönnblom
  1 sibling, 1 reply; 200+ results
From: Ferruh Yigit @ 2024-05-07 16:54 UTC (permalink / raw)
  To: Morten Brørup, Stephen Hemminger
  Cc: Mattias Rönnblom, John W. Linville, Thomas Monjalon, dev,
	Mattias Rönnblom

On 5/7/2024 5:00 PM, Morten Brørup wrote:
>> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
>> Sent: Tuesday, 7 May 2024 16.51
> 
>> I would prefer that the SW statistics be handled generically by ethdev
>> layers and used by all such drivers.
> 
> I agree.
> 
> Please note that maintaining counters in the ethdev layer might cause more cache misses than maintaining them in the hot parts of the individual drivers' data structures, so it's not all that simple. ;-)
> 
> Until then, let's find a short term solution, viable to implement across all software NIC drivers without API/ABI breakage.
> 

I am against the ethdev layer being aware of SW drivers and behaving
differently for them.
This is dev_ops and can be managed per driver. We can add helper
functions for drivers if there is a common pattern.

>>
>> The most complete version of SW stats now is in the virtio driver.
> 
> It looks like the virtio PMD maintains the counters; they are not retrieved from the host.
> 
> Considering a DPDK application running as a virtual machine (guest) on a host server...
> 
> If the host is unable to put a packet onto the guest's virtio RX queue - like when a HW NIC is out of RX descriptors - is it counted somewhere visible to the guest?
> 
> Similarly, if the guest is unable to put a packet onto its virtio TX queue, is it counted somewhere visible to the host?
> 
>> If reset needs to be reliable (debatable), then it needs to be done without
>> atomics.
> 
> Let's modify that slightly: Without performance degradation in the fast path.
> I'm not sure that all atomic operations are slow.
> But you are right that it needs to be done without _Atomic counters; they seem to be slow.
> 


^ permalink raw reply	[relevance 0%]

* Re: [RFC v2] net/af_packet: make stats reset reliable
  2024-05-07 16:54  0%             ` Ferruh Yigit
@ 2024-05-07 18:47  0%               ` Stephen Hemminger
  0 siblings, 0 replies; 200+ results
From: Stephen Hemminger @ 2024-05-07 18:47 UTC (permalink / raw)
  To: Ferruh Yigit
  Cc: Morten Brørup, Mattias Rönnblom, John W. Linville,
	Thomas Monjalon, dev, Mattias Rönnblom

On Tue, 7 May 2024 17:54:18 +0100
Ferruh Yigit <ferruh.yigit@amd.com> wrote:

> On 5/7/2024 5:00 PM, Morten Brørup wrote:
> >> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> >> Sent: Tuesday, 7 May 2024 16.51  
> >   
> >> I would prefer that the SW statistics be handled generically by ethdev
> >> layers and used by all such drivers.  
> > 
> > I agree.
> > 
> > Please note that maintaining counters in the ethdev layer might cause more cache misses than maintaining them in the hot parts of the individual drivers' data structures, so it's not all that simple. ;-)
> > 
> > Until then, let's find a short term solution, viable to implement across all software NIC drivers without API/ABI breakage.
> >   
> 
> I am against the ethdev layer being aware of SW drivers and behaving
> differently for them.
> This is dev_ops and can be managed per driver. We can add helper
> functions for drivers if there is a common pattern.

It is more about having a set of helper routines for SW only drivers.
I have something in progress for this.

^ permalink raw reply	[relevance 0%]

* Re: [RFC v2] net/af_packet: make stats reset reliable
  2024-05-07 16:00  3%           ` Morten Brørup
  2024-05-07 16:54  0%             ` Ferruh Yigit
@ 2024-05-08  7:48  0%             ` Mattias Rönnblom
  1 sibling, 0 replies; 200+ results
From: Mattias Rönnblom @ 2024-05-08  7:48 UTC (permalink / raw)
  To: Morten Brørup, Stephen Hemminger, Ferruh Yigit
  Cc: John W. Linville, Thomas Monjalon, dev, Mattias Rönnblom

On 2024-05-07 18:00, Morten Brørup wrote:
>> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
>> Sent: Tuesday, 7 May 2024 16.51
> 
>> I would prefer that the SW statistics be handled generically by ethdev
>> layers and used by all such drivers.
> 
> I agree.
> 
> Please note that maintaining counters in the ethdev layer might cause more cache misses than maintaining them in the hot parts of the individual drivers' data structures, so it's not all that simple. ;-)
> 
> Until then, let's find a short term solution, viable to implement across all software NIC drivers without API/ABI breakage.
> 
>>
>> The most complete version of SW stats now is in the virtio driver.
> 
> It looks like the virtio PMD maintains the counters; they are not retrieved from the host.
> 
> Considering a DPDK application running as a virtual machine (guest) on a host server...
> 
> If the host is unable to put a packet onto the guest's virtio RX queue - like when a HW NIC is out of RX descriptors - is it counted somewhere visible to the guest?
> 
> Similarly, if the guest is unable to put a packet onto its virtio TX queue, is it counted somewhere visible to the host?
> 
>> If reset needs to be reliable (debatable), then it needs to be done without
>> atomics.
> 
> Let's modify that slightly: Without performance degradation in the fast path.
> I'm not sure that all atomic operations are slow.

Relaxed atomic loads from and stores to naturally aligned addresses are 
for free on ARM and x86_64 up to at least 64 bits.

"For free" is not entirely true, since both C11 relaxed stores and 
stores through volatile may prevent vectorization in GCC. I don't see 
why, but in practice that seems to be the case. That is very much a 
corner case.

Also, as mentioned before, C11 atomic store effectively has volatile 
semantics, which in turn may prevent some compiler optimizations.

On 32-bit x86, 64-bit atomic stores use xmm registers, but those are 
going to be used anyway, since you'll have a 64-bit add.

> But you are right that it needs to be done without _Atomic counters; they seem to be slow.
> 

_Atomic is not slower than atomics without _Atomic, when you actually 
need atomic operations.
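
A minimal sketch of the point above, not code from this thread: a
single-writer counter can be updated with a relaxed load followed by a
relaxed store, which avoids a locked read-modify-write in the fast path
while still letting other threads read the value without tearing. The
names sw_counter, sw_counter_add and sw_counter_read are hypothetical,
and the sketch assumes exactly one writer per counter.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical per-queue counter; assumes a single writer (the datapath lcore). */
struct sw_counter {
	_Atomic uint64_t value;
};

/* Datapath: relaxed load + relaxed store, no atomic read-modify-write. */
static inline void
sw_counter_add(struct sw_counter *c, uint64_t n)
{
	uint64_t v = atomic_load_explicit(&c->value, memory_order_relaxed);

	atomic_store_explicit(&c->value, v + n, memory_order_relaxed);
}

/* Control path: any thread may read the counter without tearing. */
static inline uint64_t
sw_counter_read(struct sw_counter *c)
{
	return atomic_load_explicit(&c->value, memory_order_relaxed);
}
```

With more than one writer per counter this pattern loses updates; in
that case an atomic add (or per-writer counters that are summed on read)
would be needed instead.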

^ permalink raw reply	[relevance 0%]

* [PATCH 1/9] doc: reword design section in contributors guidelines
  @ 2024-05-13 15:59  6% ` Nandini Persad
    1 sibling, 0 replies; 200+ results
From: Nandini Persad @ 2024-05-13 15:59 UTC (permalink / raw)
  To: dev; +Cc: Thomas Monjalon

minor editing for grammar and syntax of design section

Signed-off-by: Nandini Persad <nandinipersad361@gmail.com>
---
 .mailmap                           |  1 +
 doc/guides/contributing/design.rst | 79 ++++++++++++++----------------
 doc/guides/linux_gsg/sys_reqs.rst  |  2 +-
 3 files changed, 38 insertions(+), 44 deletions(-)

diff --git a/.mailmap b/.mailmap
index 66ebc20666..7d4929c5d1 100644
--- a/.mailmap
+++ b/.mailmap
@@ -1002,6 +1002,7 @@ Naga Suresh Somarowthu <naga.sureshx.somarowthu@intel.com>
 Nalla Pradeep <pnalla@marvell.com>
 Na Na <nana.nn@alibaba-inc.com>
 Nan Chen <whutchennan@gmail.com>
+Nandini Persad <nandinipersad361@gmail.com>
 Nannan Lu <nannan.lu@intel.com>
 Nan Zhou <zhounan14@huawei.com>
 Narcisa Vasile <navasile@linux.microsoft.com> <navasile@microsoft.com> <narcisa.vasile@microsoft.com>
diff --git a/doc/guides/contributing/design.rst b/doc/guides/contributing/design.rst
index b724177ba1..921578aec5 100644
--- a/doc/guides/contributing/design.rst
+++ b/doc/guides/contributing/design.rst
@@ -8,22 +8,26 @@ Design
 Environment or Architecture-specific Sources
 --------------------------------------------
 
-In DPDK and DPDK applications, some code is specific to an architecture (i686, x86_64) or to an executive environment (freebsd or linux) and so on.
-As far as is possible, all such instances of architecture or env-specific code should be provided via standard APIs in the EAL.
+In DPDK and DPDK applications, some code is architecture-specific (i686, x86_64) or environment-specific (FreeBSD or Linux, etc.).
+When feasible, such instances of architecture or env-specific code should be provided via standard APIs in the EAL.
 
-By convention, a file is common if it is not located in a directory indicating that it is specific.
-For instance, a file located in a subdir of "x86_64" directory is specific to this architecture.
+By convention, a file is specific if it is located in a directory indicating that it is specific. Otherwise, it is common.
+
+For example:
+
+A file located in a subdir of "x86_64" directory is specific to this architecture.
 A file located in a subdir of "linux" is specific to this execution environment.
 
 .. note::
 
    Code in DPDK libraries and applications should be generic.
-   The correct location for architecture or executive environment specific code is in the EAL.
+   The correct location for architecture or executive environment-specific code is in the EAL.
+
+When necessary, there are several ways to handle specific code:
 
-When absolutely necessary, there are several ways to handle specific code:
 
-* Use a ``#ifdef`` with a build definition macro in the C code.
-  This can be done when the differences are small and they can be embedded in the same C file:
+* When the differences are small and they can be embedded in the same C file, use a ``#ifdef`` with a build definition macro in the C code.
+
 
   .. code-block:: c
 
@@ -33,9 +37,9 @@ When absolutely necessary, there are several ways to handle specific code:
      titi();
      #endif
 
-* Use build definition macros and conditions in the Meson build file. This is done when the differences are more significant.
-  In this case, the code is split into two separate files that are architecture or environment specific.
-  This should only apply inside the EAL library.
+* When the differences are more significant, use build definition macros and conditions in the Meson build file.
+  In this case, the code is split into two separate files that are architecture or environment specific.
+  This should only apply inside the EAL library.
 
 Per Architecture Sources
 ~~~~~~~~~~~~~~~~~~~~~~~~
@@ -43,7 +47,7 @@ Per Architecture Sources
 The following macro options can be used:
 
 * ``RTE_ARCH`` is a string that contains the name of the architecture.
-* ``RTE_ARCH_I686``, ``RTE_ARCH_X86_64``, ``RTE_ARCH_X86_X32``, ``RTE_ARCH_PPC_64``, ``RTE_ARCH_RISCV``, ``RTE_ARCH_LOONGARCH``, ``RTE_ARCH_ARM``, ``RTE_ARCH_ARMv7`` or ``RTE_ARCH_ARM64`` are defined only if we are building for those architectures.
+* ``RTE_ARCH_I686``, ``RTE_ARCH_X86_64``, ``RTE_ARCH_X86_X32``, ``RTE_ARCH_PPC_64``, ``RTE_ARCH_RISCV``, ``RTE_ARCH_LOONGARCH``, ``RTE_ARCH_ARM``, ``RTE_ARCH_ARMv7`` or ``RTE_ARCH_ARM64`` are defined when building for these architectures.
 
 Per Execution Environment Sources
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -51,30 +55,21 @@ Per Execution Environment Sources
 The following macro options can be used:
 
 * ``RTE_EXEC_ENV`` is a string that contains the name of the executive environment.
-* ``RTE_EXEC_ENV_FREEBSD``, ``RTE_EXEC_ENV_LINUX`` or ``RTE_EXEC_ENV_WINDOWS`` are defined only if we are building for this execution environment.
+* ``RTE_EXEC_ENV_FREEBSD``, ``RTE_EXEC_ENV_LINUX`` or ``RTE_EXEC_ENV_WINDOWS`` are defined only when building for this execution environment.
 
 Mbuf features
 -------------
 
-The ``rte_mbuf`` structure must be kept small (128 bytes).
-
-In order to add new features without wasting buffer space for unused features,
-some fields and flags can be registered dynamically in a shared area.
-The "dynamic" mbuf area is the default choice for the new features.
-
-The "dynamic" area is eating the remaining space in mbuf,
-and some existing "static" fields may need to become "dynamic".
+A designated area in mbuf stores "dynamically" registered fields and flags. It is the default choice for accommodating new features. The "dynamic" area consumes the remaining space in the mbuf, indicating that it's being efficiently utilized. However, the ``rte_mbuf`` structure must be kept small (128 bytes).
 
-Adding a new static field or flag must be an exception matching many criteria
-like (non exhaustive): wide usage, performance, size.
+As more features are added, the space for existing "static" fields (fields that are allocated statically) may need to be reconsidered and possibly converted to "dynamic" allocation. Adding a new static field or flag should be an exception. It must meet specific criteria including widespread usage, performance impact, and size considerations. Before adding a new static feature, it must be justified by its necessity and its impact on the system's efficiency.
 
 
 Runtime Information - Logging, Tracing and Telemetry
 ----------------------------------------------------
 
-It is often desirable to provide information to the end-user
-as to what is happening to the application at runtime.
-DPDK provides a number of built-in mechanisms to provide this introspection:
+The end user may inquire as to what is happening to the application at runtime.
+DPDK provides several built-in mechanisms to provide these insights:
 
 * :ref:`Logging <dynamic_logging>`
 * :doc:`Tracing <../prog_guide/trace_lib>`
@@ -82,11 +77,11 @@ DPDK provides a number of built-in mechanisms to provide this introspection:
 
 Each of these has its own strengths and suitabilities for use within DPDK components.
 
-Below are some guidelines for when each should be used:
+Here are guidelines for when each mechanism should be used:
 
 * For reporting error conditions, or other abnormal runtime issues, *logging* should be used.
-  Depending on the severity of the issue, the appropriate log level, for example,
-  ``ERROR``, ``WARNING`` or ``NOTICE``, should be used.
+  Depending on the severity of the issue, the appropriate log level (for example,
+  ``ERROR``, ``WARNING`` or ``NOTICE``) should be used.
 
 .. note::
 
@@ -96,22 +91,21 @@ Below are some guidelines for when each should be used:
 
 * For component initialization, or other cases where a path through the code
   is only likely to be taken once,
-  either *logging* at ``DEBUG`` level or *tracing* may be used, or potentially both.
+  either *logging* at ``DEBUG`` level or *tracing* may be used, or both.
   In the latter case, tracing can provide basic information as to the code path taken,
   with debug-level logging providing additional details on internal state,
-  not possible to emit via tracing.
+  which is not possible to emit via tracing.
 
 * For a component's data-path, where a path is to be taken multiple times within a short timeframe,
   *tracing* should be used.
   Since DPDK tracing uses `Common Trace Format <https://diamon.org/ctf/>`_ for its tracing logs,
   post-analysis can be done using a range of external tools.
 
-* For numerical or statistical data generated by a component, for example, per-packet statistics,
+* For numerical or statistical data generated by a component, such as per-packet statistics,
   *telemetry* should be used.
 
-* For any data where the data may need to be gathered at any point in the execution
-  to help assess the state of the application component,
-  for example, core configuration, device information, *telemetry* should be used.
+* For any data that may need to be gathered at any point during the execution
+  to help assess the state of the application component (for example, core configuration, device information), *telemetry* should be used.
   Telemetry callbacks should not modify any program state, but be "read-only".
 
 Many libraries also include a ``rte_<libname>_dump()`` function as part of their API,
@@ -135,13 +129,12 @@ requirements for preventing ABI changes when implementing statistics.
 Mechanism to allow the application to turn library statistics on and off
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-Having runtime support for enabling/disabling library statistics is recommended,
-as build-time options should be avoided. However, if build-time options are used,
-for example as in the table library, the options can be set using c_args.
-When this flag is set, all the counters supported by current library are
+Having runtime support for enabling/disabling library statistics is recommended
+as build-time options should be avoided. However, if build-time options are used, as in the table library, the options can be set using c_args.
+When this flag is set, all the counters supported by the current library are
 collected for all the instances of every object type provided by the library.
 When this flag is cleared, none of the counters supported by the current library
-are collected for any instance of any object type provided by the library:
+are collected for any instance of any object type provided by the library.
 
 
 Prevention of ABI changes due to library statistics support
@@ -165,8 +158,8 @@ Motivation to allow the application to turn library statistics on and off
 
 It is highly recommended that each library provides statistics counters to allow
 an application to monitor the library-level run-time events. Typical counters
-are: number of packets received/dropped/transmitted, number of buffers
-allocated/freed, number of occurrences for specific events, etc.
+are: the number of packets received/dropped/transmitted, the number of buffers
+allocated/freed, the number of occurrences for specific events, etc.
 
 However, the resources consumed for library-level statistics counter collection
 have to be spent out of the application budget and the counters collected by
@@ -229,5 +222,5 @@ Developers should work with the Linux Kernel community to get the required
 functionality upstream. PF functionality should only be added to DPDK for
 testing and prototyping purposes while the kernel work is ongoing. It should
 also be marked with an "EXPERIMENTAL" tag. If the functionality isn't
-upstreamable then a case can be made to maintain the PF functionality in DPDK
+upstreamable, then a case can be made to maintain the PF functionality in DPDK
 without the EXPERIMENTAL tag.
diff --git a/doc/guides/linux_gsg/sys_reqs.rst b/doc/guides/linux_gsg/sys_reqs.rst
index 13be715933..0569c5cae6 100644
--- a/doc/guides/linux_gsg/sys_reqs.rst
+++ b/doc/guides/linux_gsg/sys_reqs.rst
@@ -99,7 +99,7 @@ e.g. :doc:`../nics/index`
 Running DPDK Applications
 -------------------------
 
-To run a DPDK application, some customization may be required on the target machine.
+To run a DPDK application, customization may be required on the target machine.
 
 System Software
 ~~~~~~~~~~~~~~~
-- 
2.34.1


^ permalink raw reply	[relevance 6%]

* Re: [PATCH v12 02/12] net/tap: do not duplicate fd's
  @ 2024-05-20 18:16  3%       ` Stephen Hemminger
  0 siblings, 0 replies; 200+ results
From: Stephen Hemminger @ 2024-05-20 18:16 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: dev

On Mon, 20 May 2024 18:46:30 +0100
Ferruh Yigit <ferruh.yigit@amd.com> wrote:

> On 5/2/2024 10:31 PM, Stephen Hemminger wrote:
> > The TAP device can use same file descriptor for both rx and tx queues.
> > This allows up to 8 queues (versus 4) to be used with secondary process.
> >   
> 
> It would be nice to briefly update where this limit comes from, as
> removing this limitation can be longer term solution for this issue.

Sure, the limit comes from a too-low value of RTE_MP_MAX_FD_NUM (8).
It should have been set to the maximum Linux supports, which is SCM_MAX_FD
(253).

But fixing it there breaks the ABI, so it needs to wait for 24.11.
Impacts AF_XDP as well.

By not duplicating we can make TAP work with 8 queues and still
get MP support. Good idea anyway not to waste fd's.
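
As a rough illustration of the arithmetic (a sketch, not code from the
patch): if one IPC message can carry at most RTE_MP_MAX_FD_NUM file
descriptors, the number of queues that can be shared with a secondary
process is that limit divided by the fds each queue needs. The helper
max_mp_queues is hypothetical, and the macro is redefined here only to
mirror the value quoted above.

```c
/* Mirrors the value quoted in this thread; the real definition lives in EAL. */
#define RTE_MP_MAX_FD_NUM 8

/* Hypothetical helper: how many queues fit in one fd-passing message. */
static unsigned int
max_mp_queues(unsigned int max_fds, unsigned int fds_per_queue)
{
	return max_fds / fds_per_queue;
}
```

With separate rx and tx descriptors (2 per queue) the result is 4
queues; with one shared descriptor per queue it is 8.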

> 
> > Bugzilla ID: 1381
> > Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> >  
> 
> Can you please move the relevant release notes update to this patch?
> So we can distribute the release notes update to patches instead of
> dedicated update for it.
> 
> Except from above change,
> Acked-by: Ferruh Yigit <ferruh.yigit@amd.com>
> 


^ permalink raw reply	[relevance 3%]

* Re: [PATCH v12 07/12] net/tap: use libbpf to load new BPF program
  @ 2024-05-20 21:42  3%         ` Luca Boccassi
  2024-05-20 22:08  0%           ` Ferruh Yigit
  0 siblings, 1 reply; 200+ results
From: Luca Boccassi @ 2024-05-20 21:42 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Ferruh Yigit, Christian Ehrhardt, Patrick Robb, dpdklab,
	Aaron Conole, dev

On Mon, 20 May 2024 at 19:43, Stephen Hemminger
<stephen@networkplumber.org> wrote:
>
> On Mon, 20 May 2024 18:49:19 +0100
> Ferruh Yigit <ferruh.yigit@amd.com> wrote:
>
> > On 5/2/2024 10:31 PM, Stephen Hemminger wrote:
> > > There were multiple issues in the RSS queue support in the TAP
> > > driver. This required extensive rework of the BPF support.
> > >
> > > Change the BPF loading to use bpftool to
> > > create a skeleton header file, and load with libbpf.
> > > The BPF is always compiled from source so less chance that
> > > source and instructions diverge. Also resolves issue where
> > > libbpf and source get out of sync. The program
> > > is only loaded once, so if multiple rules are created
> > > only one BPF program is loaded in kernel.
> > >
> > > The new BPF program only needs a single action.
> > > No need for action and re-classification step.
> > >
> > > It also fixes the missing bits from the original.
> > >     - supports setting RSS key per flow
> > >     - level of hash can be L3 or L3/L4.
> > >
> > > Bugzilla ID: 1329
> > >
> > > Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> > >
> >
> >
> > The libbpf version in my Ubuntu box, installed with package manager, is
> > 'libbpf.so.0.5.0', so it doesn't satisfy the requirement and bpf support
> > is not compiled for me.
> >
> >
> > @Christian, 'libbpf.so.0.5.0' seems old, it is from 2021. Do you know if
> > there is a reason Ubuntu sticks to this version? And can we expect an update
> > soon?
> >
> >
> > @Patrick, I assume the test environment also doesn't have 'libbpf', version:
> > '>= 1.0' which we need to test this feature.
> > Is it possible to update test environment to justify this dependency?
> >
> > I think we need to verify at least build (with and without dependency
> > met) for the set.
>
> The BPF API changed a lot, and it is not really possible to support
> both.

It can be done, but it is a _lot_ of work and requires a lot of shims,
so for something optional it's not really worth it. Given libbpf 1.0
also broke ABI, Ubuntu 22.04 and older cannot really get a new version
as it's incompatible, so this pmd will simply be skipped there. I
think it's fine. 24.04 has a new one.

^ permalink raw reply	[relevance 3%]

* Re: [PATCH v12 07/12] net/tap: use libbpf to load new BPF program
  2024-05-20 21:42  3%         ` Luca Boccassi
@ 2024-05-20 22:08  0%           ` Ferruh Yigit
  2024-05-20 22:25  0%             ` Luca Boccassi
  2024-05-20 23:20  0%             ` Stephen Hemminger
  0 siblings, 2 replies; 200+ results
From: Ferruh Yigit @ 2024-05-20 22:08 UTC (permalink / raw)
  To: Luca Boccassi, Stephen Hemminger
  Cc: Christian Ehrhardt, Patrick Robb, dpdklab, Aaron Conole, dev

On 5/20/2024 10:42 PM, Luca Boccassi wrote:
> On Mon, 20 May 2024 at 19:43, Stephen Hemminger
> <stephen@networkplumber.org> wrote:
>>
>> On Mon, 20 May 2024 18:49:19 +0100
>> Ferruh Yigit <ferruh.yigit@amd.com> wrote:
>>
>>> On 5/2/2024 10:31 PM, Stephen Hemminger wrote:
>>>> There were multiple issues in the RSS queue support in the TAP
>>>> driver. This required extensive rework of the BPF support.
>>>>
>>>> Change the BPF loading to use bpftool to
>>>> create a skeleton header file, and load with libbpf.
>>>> The BPF is always compiled from source so less chance that
>>>> source and instructions diverge. Also resolves issue where
>>>> libbpf and source get out of sync. The program
>>>> is only loaded once, so if multiple rules are created
>>>> only one BPF program is loaded in kernel.
>>>>
>>>> The new BPF program only needs a single action.
>>>> No need for action and re-classification step.
>>>>
>>>> It also fixes the missing bits from the original.
>>>>     - supports setting RSS key per flow
>>>>     - level of hash can be L3 or L3/L4.
>>>>
>>>> Bugzilla ID: 1329
>>>>
>>>> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
>>>>
>>>
>>>
>>> The libbpf version in my Ubuntu box, installed with package manager, is
>>> 'libbpf.so.0.5.0', so it doesn't satisfy the requirement and bpf support
>>> is not compiled for me.
>>>
>>>
>>> @Christian, 'libbpf.so.0.5.0' seems old, it is from 2021. Do you know if
>>> there is a reason Ubuntu sticks to this version? And can we expect an update
>>> soon?
>>>
>>>
>>> @Patrick, I assume the test environment also doesn't have 'libbpf', version:
>>> '>= 1.0' which we need to test this feature.
>>> Is it possible to update test environment to justify this dependency?
>>>
>>> I think we need to verify at least build (with and without dependency
>>> met) for the set.
>>
>> The BPF API changed a lot, and it is not really possible to support
>> both.
> 
> It can be done, but it is a _lot_ of work and requires a lot of shims,
> so for something optional it's not really worth it. Given libbpf 1.0
> also broke ABI, Ubuntu 22.04 and older cannot really get a new version
> as it's incompatible, so this pmd will simply be skipped there. I
> think it's fine. 24.04 has a new one.
>

Does Ubuntu 24.04 have libbpf >= 1.0 ?

^ permalink raw reply	[relevance 0%]

* Re: [PATCH v12 07/12] net/tap: use libbpf to load new BPF program
  2024-05-20 22:08  0%           ` Ferruh Yigit
@ 2024-05-20 22:25  0%             ` Luca Boccassi
  2024-05-20 23:20  0%             ` Stephen Hemminger
  1 sibling, 0 replies; 200+ results
From: Luca Boccassi @ 2024-05-20 22:25 UTC (permalink / raw)
  To: Ferruh Yigit
  Cc: Stephen Hemminger, Christian Ehrhardt, Patrick Robb, dpdklab,
	Aaron Conole, dev

On Mon, 20 May 2024 at 23:08, Ferruh Yigit <ferruh.yigit@amd.com> wrote:
>
> On 5/20/2024 10:42 PM, Luca Boccassi wrote:
> > On Mon, 20 May 2024 at 19:43, Stephen Hemminger
> > <stephen@networkplumber.org> wrote:
> >>
> >> On Mon, 20 May 2024 18:49:19 +0100
> >> Ferruh Yigit <ferruh.yigit@amd.com> wrote:
> >>
> >>> On 5/2/2024 10:31 PM, Stephen Hemminger wrote:
> >>>> There were multiple issues in the RSS queue support in the TAP
> >>>> driver. This required extensive rework of the BPF support.
> >>>>
> >>>> Change the BPF loading to use bpftool to
> >>>> create a skeleton header file, and load with libbpf.
> >>>> The BPF is always compiled from source so less chance that
> >>>> source and instructions diverge. Also resolves issue where
> >>>> libbpf and source get out of sync. The program
> >>>> is only loaded once, so if multiple rules are created
> >>>> only one BPF program is loaded in kernel.
> >>>>
> >>>> The new BPF program only needs a single action.
> >>>> No need for action and re-classification step.
> >>>>
> >>>> It also fixes the missing bits from the original.
> >>>>     - supports setting RSS key per flow
> >>>>     - level of hash can be L3 or L3/L4.
> >>>>
> >>>> Bugzilla ID: 1329
> >>>>
> >>>> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> >>>>
> >>>
> >>>
> >>> The libbpf version in my Ubuntu box, installed with package manager, is
> >>> 'libbpf.so.0.5.0', so it doesn't satisfy the requirement and bpf support
> >>> is not compiled for me.
> >>>
> >>>
> >>> @Christian, 'libbpf.so.0.5.0' seems old, it is from 2021. Do you know if
> >>> there is a reason Ubuntu sticks to this version? And can we expect an update
> >>> soon?
> >>>
> >>>
> >>> @Patrick, I assume the test environment also doesn't have 'libbpf', version:
> >>> '>= 1.0' which we need to test this feature.
> >>> Is it possible to update test environment to justify this dependency?
> >>>
> >>> I think we need to verify at least build (with and without dependency
> >>> met) for the set.
> >>
> >> The BPF API changed a lot, and it is not really possible to support
> >> both.
> >
> > It can be done, but it is a _lot_ of work and requires a lot of shims,
> > so for something optional it's not really worth it. Given libbpf 1.0
> > also broke ABI, Ubuntu 22.04 and older cannot really get a new version
> > as it's incompatible, so this pmd will simply be skipped there. I
> > think it's fine. 24.04 has a new one.
> >
>
> Does Ubuntu 24.04 have libbpf >= 1.0 ?

Yes:

https://packages.ubuntu.com/search?keywords=libbpf-dev&searchon=names&suite=all&section=all

^ permalink raw reply	[relevance 0%]

* Re: [PATCH v12 07/12] net/tap: use libbpf to load new BPF program
  2024-05-20 22:08  0%           ` Ferruh Yigit
  2024-05-20 22:25  0%             ` Luca Boccassi
@ 2024-05-20 23:20  0%             ` Stephen Hemminger
  1 sibling, 0 replies; 200+ results
From: Stephen Hemminger @ 2024-05-20 23:20 UTC (permalink / raw)
  To: Ferruh Yigit
  Cc: Luca Boccassi, Christian Ehrhardt, Patrick Robb, dpdklab,
	Aaron Conole, dev

On Mon, 20 May 2024 23:08:04 +0100
Ferruh Yigit <ferruh.yigit@amd.com> wrote:

> > 
> > It can be done, but it is a _lot_ of work and requires a lot of shims,
> > so for something optional it's not really worth it. Given libbpf 1.0
> > also broke ABI, Ubuntu 22.04 and older cannot really get a new version
> > as it's incompatible, so this pmd will simply be skipped there. I
> > think it's fine. 24.04 has a new one.
> >  
> 
> Does Ubuntu 24.04 have libbpf >= 1.0 ?

Yes it does.

Tried this on a 24.04 VM, needed to install pkg-config and clang.
But then it builds.

It does have some other fortify warnings (in rte_pcapng.c) but
these are unrelated and exist on main branch as well.

^ permalink raw reply	[relevance 0%]

* [PATCH v13 02/11] net/tap: do not duplicate fd's
  @ 2024-05-21  2:47  2%   ` Stephen Hemminger
  2024-05-21  2:47  2%   ` [PATCH v13 06/11] net/tap: rewrite the RSS BPF program Stephen Hemminger
  1 sibling, 0 replies; 200+ results
From: Stephen Hemminger @ 2024-05-21  2:47 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

The TAP device can use the same file descriptor for both rx and tx queues,
which reduces the number of fd's required.

MP process support passes file descriptors from the primary
to the secondary process; but because of the limit on the number of
fd's passed per message, RTE_MP_MAX_FD_NUM (8), the TAP device was
restricted to only 4 queues if using secondary.
This allows up to 8 queues (versus 4).

The restriction on max fd's should be changed in eal in
future, but it will break ABI compatibility.
The maximum Linux supports is SCM_MAX_FD (253).

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
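
The queue limit and the fd lifetime described above can be modelled in a
few lines. This is a minimal sketch, not code from the driver: every
sketch_* name and SKETCH_MAX_QUEUES are hypothetical, and the fd budget of
8 is simply the RTE_MP_MAX_FD_NUM value quoted in the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_QUEUES 16 /* hypothetical stand-in for RTE_PMD_TAP_MAX_QUEUES */

/* Number of queues whose fds fit into one IPC message. */
static unsigned int
sketch_queues_per_msg(unsigned int fd_budget, unsigned int fds_per_queue)
{
	assert(fds_per_queue > 0);
	return fd_budget / fds_per_queue;
}

/* Simplified model of a port whose rx and tx queues share one fd. */
struct sketch_port {
	int fds[SKETCH_MAX_QUEUES];        /* shared fd per queue id, -1 if unused */
	bool rx_active[SKETCH_MAX_QUEUES]; /* rx side of the queue is set up */
	bool tx_active[SKETCH_MAX_QUEUES]; /* tx side of the queue is set up */
	unsigned int closed;               /* how many fds were dropped */
};

static void
sketch_port_init(struct sketch_port *p)
{
	size_t i;

	for (i = 0; i < SKETCH_MAX_QUEUES; i++) {
		p->fds[i] = -1;
		p->rx_active[i] = false;
		p->tx_active[i] = false;
	}
	p->closed = 0;
}

/*
 * Release one direction of a queue. The shared fd is dropped only
 * when the other direction has been released as well.
 */
static void
sketch_release(struct sketch_port *p, unsigned int qid, bool is_rx)
{
	if (is_rx)
		p->rx_active[qid] = false;
	else
		p->tx_active[qid] = false;

	if (!p->rx_active[qid] && !p->tx_active[qid] && p->fds[qid] != -1) {
		/* a real driver would call close(p->fds[qid]) here */
		p->fds[qid] = -1;
		p->closed++;
	}
}
```

With a budget of 8 fds, two fds per queue allow 4 queues and one shared
fd per queue allows 8, which matches the numbers in the commit message.
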
 doc/guides/rel_notes/release_24_07.rst |   4 +
 drivers/net/tap/rte_eth_tap.c          | 192 ++++++++++---------------
 drivers/net/tap/rte_eth_tap.h          |   3 +-
 drivers/net/tap/tap_flow.c             |   3 +-
 drivers/net/tap/tap_intr.c             |   7 +-
 5 files changed, 89 insertions(+), 120 deletions(-)

diff --git a/doc/guides/rel_notes/release_24_07.rst b/doc/guides/rel_notes/release_24_07.rst
index a69f24cf99..fa9692924b 100644
--- a/doc/guides/rel_notes/release_24_07.rst
+++ b/doc/guides/rel_notes/release_24_07.rst
@@ -55,6 +55,10 @@ New Features
      Also, make sure to start the actual text at the margin.
      =======================================================
 
+* **Update Tap PMD driver.**
+
+  * Updated to support up to 8 queues when used by secondary process.
+
 
 Removed Items
 -------------
diff --git a/drivers/net/tap/rte_eth_tap.c b/drivers/net/tap/rte_eth_tap.c
index 69d9da695b..b84fc01856 100644
--- a/drivers/net/tap/rte_eth_tap.c
+++ b/drivers/net/tap/rte_eth_tap.c
@@ -124,8 +124,7 @@ enum ioctl_mode {
 /* Message header to synchronize queues via IPC */
 struct ipc_queues {
 	char port_name[RTE_DEV_NAME_MAX_LEN];
-	int rxq_count;
-	int txq_count;
+	int q_count;
 	/*
 	 * The file descriptors are in the dedicated part
 	 * of the Unix message to be translated by the kernel.
@@ -446,7 +445,7 @@ pmd_rx_burst(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 		uint16_t data_off = rte_pktmbuf_headroom(mbuf);
 		int len;
 
-		len = readv(process_private->rxq_fds[rxq->queue_id],
+		len = readv(process_private->fds[rxq->queue_id],
 			*rxq->iovecs,
 			1 + (rxq->rxmode->offloads & RTE_ETH_RX_OFFLOAD_SCATTER ?
 			     rxq->nb_rx_desc : 1));
@@ -643,7 +642,7 @@ tap_write_mbufs(struct tx_queue *txq, uint16_t num_mbufs,
 		}
 
 		/* copy the tx frame data */
-		n = writev(process_private->txq_fds[txq->queue_id], iovecs, k);
+		n = writev(process_private->fds[txq->queue_id], iovecs, k);
 		if (n <= 0)
 			return -1;
 
@@ -851,7 +850,6 @@ tap_mp_req_on_rxtx(struct rte_eth_dev *dev)
 	struct rte_mp_msg msg;
 	struct ipc_queues *request_param = (struct ipc_queues *)msg.param;
 	int err;
-	int fd_iterator = 0;
 	struct pmd_process_private *process_private = dev->process_private;
 	int i;
 
@@ -859,16 +857,13 @@ tap_mp_req_on_rxtx(struct rte_eth_dev *dev)
 	strlcpy(msg.name, TAP_MP_REQ_START_RXTX, sizeof(msg.name));
 	strlcpy(request_param->port_name, dev->data->name, sizeof(request_param->port_name));
 	msg.len_param = sizeof(*request_param);
-	for (i = 0; i < dev->data->nb_tx_queues; i++) {
-		msg.fds[fd_iterator++] = process_private->txq_fds[i];
-		msg.num_fds++;
-		request_param->txq_count++;
-	}
-	for (i = 0; i < dev->data->nb_rx_queues; i++) {
-		msg.fds[fd_iterator++] = process_private->rxq_fds[i];
-		msg.num_fds++;
-		request_param->rxq_count++;
-	}
+
+	/* rx and tx share file descriptors and nb_tx_queues == nb_rx_queues */
+	for (i = 0; i < dev->data->nb_rx_queues; i++)
+		msg.fds[i] = process_private->fds[i];
+
+	request_param->q_count = dev->data->nb_rx_queues;
+	msg.num_fds = dev->data->nb_rx_queues;
 
 	err = rte_mp_sendmsg(&msg);
 	if (err < 0) {
@@ -910,8 +905,6 @@ tap_mp_req_start_rxtx(const struct rte_mp_msg *request, __rte_unused const void
 	struct rte_eth_dev *dev;
 	const struct ipc_queues *request_param =
 		(const struct ipc_queues *)request->param;
-	int fd_iterator;
-	int queue;
 	struct pmd_process_private *process_private;
 
 	dev = rte_eth_dev_get_by_name(request_param->port_name);
@@ -920,14 +913,13 @@ tap_mp_req_start_rxtx(const struct rte_mp_msg *request, __rte_unused const void
 			request_param->port_name);
 		return -1;
 	}
+
 	process_private = dev->process_private;
-	fd_iterator = 0;
-	TAP_LOG(DEBUG, "tap_attach rx_q:%d tx_q:%d\n", request_param->rxq_count,
-		request_param->txq_count);
-	for (queue = 0; queue < request_param->txq_count; queue++)
-		process_private->txq_fds[queue] = request->fds[fd_iterator++];
-	for (queue = 0; queue < request_param->rxq_count; queue++)
-		process_private->rxq_fds[queue] = request->fds[fd_iterator++];
+	TAP_LOG(DEBUG, "tap_attach q:%d\n", request_param->q_count);
+
+	for (int q = 0; q < request_param->q_count; q++)
+		process_private->fds[q] = request->fds[q];
+
 
 	return 0;
 }
@@ -1115,13 +1107,21 @@ tap_stats_reset(struct rte_eth_dev *dev)
 	return 0;
 }
 
+static void
+tap_queue_close(struct pmd_process_private *process_private, uint16_t qid)
+{
+	if (process_private->fds[qid] != -1) {
+		close(process_private->fds[qid]);
+		process_private->fds[qid] = -1;
+	}
+}
+
 static int
 tap_dev_close(struct rte_eth_dev *dev)
 {
 	int i;
 	struct pmd_internals *internals = dev->data->dev_private;
 	struct pmd_process_private *process_private = dev->process_private;
-	struct rx_queue *rxq;
 
 	if (rte_eal_process_type() != RTE_PROC_PRIMARY) {
 		rte_free(dev->process_private);
@@ -1141,19 +1141,14 @@ tap_dev_close(struct rte_eth_dev *dev)
 	}
 
 	for (i = 0; i < RTE_PMD_TAP_MAX_QUEUES; i++) {
-		if (process_private->rxq_fds[i] != -1) {
-			rxq = &internals->rxq[i];
-			close(process_private->rxq_fds[i]);
-			process_private->rxq_fds[i] = -1;
-			tap_rxq_pool_free(rxq->pool);
-			rte_free(rxq->iovecs);
-			rxq->pool = NULL;
-			rxq->iovecs = NULL;
-		}
-		if (process_private->txq_fds[i] != -1) {
-			close(process_private->txq_fds[i]);
-			process_private->txq_fds[i] = -1;
-		}
+		struct rx_queue *rxq = &internals->rxq[i];
+
+		tap_queue_close(process_private, i);
+
+		tap_rxq_pool_free(rxq->pool);
+		rte_free(rxq->iovecs);
+		rxq->pool = NULL;
+		rxq->iovecs = NULL;
 	}
 
 	if (internals->remote_if_index) {
@@ -1206,15 +1201,16 @@ tap_rx_queue_release(struct rte_eth_dev *dev, uint16_t qid)
 
 	if (!rxq)
 		return;
+
 	process_private = rte_eth_devices[rxq->in_port].process_private;
-	if (process_private->rxq_fds[rxq->queue_id] != -1) {
-		close(process_private->rxq_fds[rxq->queue_id]);
-		process_private->rxq_fds[rxq->queue_id] = -1;
-		tap_rxq_pool_free(rxq->pool);
-		rte_free(rxq->iovecs);
-		rxq->pool = NULL;
-		rxq->iovecs = NULL;
-	}
+
+	tap_rxq_pool_free(rxq->pool);
+	rte_free(rxq->iovecs);
+	rxq->pool = NULL;
+	rxq->iovecs = NULL;
+
+	if (dev->data->tx_queues[qid] == NULL)
+		tap_queue_close(process_private, qid);
 }
 
 static void
@@ -1225,12 +1221,10 @@ tap_tx_queue_release(struct rte_eth_dev *dev, uint16_t qid)
 
 	if (!txq)
 		return;
-	process_private = rte_eth_devices[txq->out_port].process_private;
 
-	if (process_private->txq_fds[txq->queue_id] != -1) {
-		close(process_private->txq_fds[txq->queue_id]);
-		process_private->txq_fds[txq->queue_id] = -1;
-	}
+	process_private = rte_eth_devices[txq->out_port].process_private;
+	if (dev->data->rx_queues[qid] == NULL)
+		tap_queue_close(process_private, qid);
 }
 
 static int
@@ -1482,52 +1476,31 @@ tap_setup_queue(struct rte_eth_dev *dev,
 		uint16_t qid,
 		int is_rx)
 {
-	int ret;
-	int *fd;
-	int *other_fd;
-	const char *dir;
+	int fd, ret;
 	struct pmd_internals *pmd = dev->data->dev_private;
 	struct pmd_process_private *process_private = dev->process_private;
 	struct rx_queue *rx = &internals->rxq[qid];
 	struct tx_queue *tx = &internals->txq[qid];
-	struct rte_gso_ctx *gso_ctx;
+	struct rte_gso_ctx *gso_ctx = is_rx ? NULL : &tx->gso_ctx;
+	const char *dir = is_rx ? "rx" : "tx";
 
-	if (is_rx) {
-		fd = &process_private->rxq_fds[qid];
-		other_fd = &process_private->txq_fds[qid];
-		dir = "rx";
-		gso_ctx = NULL;
-	} else {
-		fd = &process_private->txq_fds[qid];
-		other_fd = &process_private->rxq_fds[qid];
-		dir = "tx";
-		gso_ctx = &tx->gso_ctx;
-	}
-	if (*fd != -1) {
+	fd = process_private->fds[qid];
+	if (fd != -1) {
 		/* fd for this queue already exists */
 		TAP_LOG(DEBUG, "%s: fd %d for %s queue qid %d exists",
-			pmd->name, *fd, dir, qid);
+			pmd->name, fd, dir, qid);
 		gso_ctx = NULL;
-	} else if (*other_fd != -1) {
-		/* Only other_fd exists. dup it */
-		*fd = dup(*other_fd);
-		if (*fd < 0) {
-			*fd = -1;
-			TAP_LOG(ERR, "%s: dup() failed.", pmd->name);
-			return -1;
-		}
-		TAP_LOG(DEBUG, "%s: dup fd %d for %s queue qid %d (%d)",
-			pmd->name, *other_fd, dir, qid, *fd);
 	} else {
-		/* Both RX and TX fds do not exist (equal -1). Create fd */
-		*fd = tun_alloc(pmd, 0, 0);
-		if (*fd < 0) {
-			*fd = -1; /* restore original value */
+		fd = tun_alloc(pmd, 0, 0);
+		if (fd < 0) {
 			TAP_LOG(ERR, "%s: tun_alloc() failed.", pmd->name);
 			return -1;
 		}
+
 		TAP_LOG(DEBUG, "%s: add %s queue for qid %d fd %d",
-			pmd->name, dir, qid, *fd);
+			pmd->name, dir, qid, fd);
+
+		process_private->fds[qid] = fd;
 	}
 
 	tx->mtu = &dev->data->mtu;
@@ -1540,7 +1513,7 @@ tap_setup_queue(struct rte_eth_dev *dev,
 
 	tx->type = pmd->type;
 
-	return *fd;
+	return fd;
 }
 
 static int
@@ -1620,7 +1593,7 @@ tap_rx_queue_setup(struct rte_eth_dev *dev,
 
 	TAP_LOG(DEBUG, "  RX TUNTAP device name %s, qid %d on fd %d",
 		internals->name, rx_queue_id,
-		process_private->rxq_fds[rx_queue_id]);
+		process_private->fds[rx_queue_id]);
 
 	return 0;
 
@@ -1664,7 +1637,7 @@ tap_tx_queue_setup(struct rte_eth_dev *dev,
 	TAP_LOG(DEBUG,
 		"  TX TUNTAP device name %s, qid %d on fd %d csum %s",
 		internals->name, tx_queue_id,
-		process_private->txq_fds[tx_queue_id],
+		process_private->fds[tx_queue_id],
 		txq->csum ? "on" : "off");
 
 	return 0;
@@ -2001,10 +1974,9 @@ eth_dev_tap_create(struct rte_vdev_device *vdev, const char *tap_name,
 	dev->intr_handle = pmd->intr_handle;
 
 	/* Presetup the fds to -1 as being not valid */
-	for (i = 0; i < RTE_PMD_TAP_MAX_QUEUES; i++) {
-		process_private->rxq_fds[i] = -1;
-		process_private->txq_fds[i] = -1;
-	}
+	for (i = 0; i < RTE_PMD_TAP_MAX_QUEUES; i++)
+		process_private->fds[i] = -1;
+
 
 	if (pmd->type == ETH_TUNTAP_TYPE_TAP) {
 		if (rte_is_zero_ether_addr(mac_addr))
@@ -2332,7 +2304,6 @@ tap_mp_attach_queues(const char *port_name, struct rte_eth_dev *dev)
 	struct ipc_queues *request_param = (struct ipc_queues *)request.param;
 	struct ipc_queues *reply_param;
 	struct pmd_process_private *process_private = dev->process_private;
-	int queue, fd_iterator;
 
 	/* Prepare the request */
 	memset(&request, 0, sizeof(request));
@@ -2352,18 +2323,17 @@ tap_mp_attach_queues(const char *port_name, struct rte_eth_dev *dev)
 	TAP_LOG(DEBUG, "Received IPC reply for %s", reply_param->port_name);
 
 	/* Attach the queues from received file descriptors */
-	if (reply_param->rxq_count + reply_param->txq_count != reply->num_fds) {
+	if (reply_param->q_count != reply->num_fds) {
 		TAP_LOG(ERR, "Unexpected number of fds received");
 		return -1;
 	}
 
-	dev->data->nb_rx_queues = reply_param->rxq_count;
-	dev->data->nb_tx_queues = reply_param->txq_count;
-	fd_iterator = 0;
-	for (queue = 0; queue < reply_param->rxq_count; queue++)
-		process_private->rxq_fds[queue] = reply->fds[fd_iterator++];
-	for (queue = 0; queue < reply_param->txq_count; queue++)
-		process_private->txq_fds[queue] = reply->fds[fd_iterator++];
+	dev->data->nb_rx_queues = reply_param->q_count;
+	dev->data->nb_tx_queues = reply_param->q_count;
+
+	for (int q = 0; q < reply_param->q_count; q++)
+		process_private->fds[q] = reply->fds[q];
+
 	free(reply);
 	return 0;
 }
@@ -2393,25 +2363,19 @@ tap_mp_sync_queues(const struct rte_mp_msg *request, const void *peer)
 
 	/* Fill file descriptors for all queues */
 	reply.num_fds = 0;
-	reply_param->rxq_count = 0;
-	if (dev->data->nb_rx_queues + dev->data->nb_tx_queues >
-			RTE_MP_MAX_FD_NUM){
-		TAP_LOG(ERR, "Number of rx/tx queues exceeds max number of fds");
+	reply_param->q_count = 0;
+
+	RTE_ASSERT(dev->data->nb_rx_queues == dev->data->nb_tx_queues);
+	if (dev->data->nb_rx_queues > RTE_MP_MAX_FD_NUM) {
+		TAP_LOG(ERR, "Number of rx/tx queues %u exceeds max number of fds %u",
+			dev->data->nb_rx_queues, RTE_MP_MAX_FD_NUM);
 		return -1;
 	}
 
 	for (queue = 0; queue < dev->data->nb_rx_queues; queue++) {
-		reply.fds[reply.num_fds++] = process_private->rxq_fds[queue];
-		reply_param->rxq_count++;
-	}
-	RTE_ASSERT(reply_param->rxq_count == dev->data->nb_rx_queues);
-
-	reply_param->txq_count = 0;
-	for (queue = 0; queue < dev->data->nb_tx_queues; queue++) {
-		reply.fds[reply.num_fds++] = process_private->txq_fds[queue];
-		reply_param->txq_count++;
+		reply.fds[reply.num_fds++] = process_private->fds[queue];
+		reply_param->q_count++;
 	}
-	RTE_ASSERT(reply_param->txq_count == dev->data->nb_tx_queues);
 
 	/* Send reply */
 	strlcpy(reply.name, request->name, sizeof(reply.name));
diff --git a/drivers/net/tap/rte_eth_tap.h b/drivers/net/tap/rte_eth_tap.h
index 5ac93f93e9..dc8201020b 100644
--- a/drivers/net/tap/rte_eth_tap.h
+++ b/drivers/net/tap/rte_eth_tap.h
@@ -96,8 +96,7 @@ struct pmd_internals {
 };
 
 struct pmd_process_private {
-	int rxq_fds[RTE_PMD_TAP_MAX_QUEUES];
-	int txq_fds[RTE_PMD_TAP_MAX_QUEUES];
+	int fds[RTE_PMD_TAP_MAX_QUEUES];
 };
 
 /* tap_intr.c */
diff --git a/drivers/net/tap/tap_flow.c b/drivers/net/tap/tap_flow.c
index 79cd6a12ca..a78fd50cd4 100644
--- a/drivers/net/tap/tap_flow.c
+++ b/drivers/net/tap/tap_flow.c
@@ -1595,8 +1595,9 @@ tap_flow_isolate(struct rte_eth_dev *dev,
 	 * If netdevice is there, setup appropriate flow rules immediately.
 	 * Otherwise it will be set when bringing up the netdevice (tun_alloc).
 	 */
-	if (process_private->rxq_fds[0] == -1)
+	if (process_private->fds[0] == -1)
 		return 0;
+
 	if (set) {
 		struct rte_flow *remote_flow;
 
diff --git a/drivers/net/tap/tap_intr.c b/drivers/net/tap/tap_intr.c
index a9097def1a..1908f71f97 100644
--- a/drivers/net/tap/tap_intr.c
+++ b/drivers/net/tap/tap_intr.c
@@ -68,9 +68,11 @@ tap_rx_intr_vec_install(struct rte_eth_dev *dev)
 	}
 	for (i = 0; i < n; i++) {
 		struct rx_queue *rxq = pmd->dev->data->rx_queues[i];
+		int fd = process_private->fds[i];
 
 		/* Skip queues that cannot request interrupts. */
-		if (!rxq || process_private->rxq_fds[i] == -1) {
+		if (!rxq || fd == -1) {
+			/* Use invalid intr_vec[] index to disable entry. */
 			/* Use invalid intr_vec[] index to disable entry. */
 			if (rte_intr_vec_list_index_set(intr_handle, i,
 			RTE_INTR_VEC_RXTX_OFFSET + RTE_MAX_RXTX_INTR_VEC_ID))
@@ -80,8 +82,7 @@ tap_rx_intr_vec_install(struct rte_eth_dev *dev)
 		if (rte_intr_vec_list_index_set(intr_handle, i,
 					RTE_INTR_VEC_RXTX_OFFSET + count))
 			return -rte_errno;
-		if (rte_intr_efds_index_set(intr_handle, count,
-						   process_private->rxq_fds[i]))
+		if (rte_intr_efds_index_set(intr_handle, count, fd))
 			return -rte_errno;
 		count++;
 	}
-- 
2.43.0


^ permalink raw reply	[relevance 2%]

* [PATCH v13 06/11] net/tap: rewrite the RSS BPF program
    2024-05-21  2:47  2%   ` [PATCH v13 02/11] net/tap: do not duplicate fd's Stephen Hemminger
@ 2024-05-21  2:47  2%   ` Stephen Hemminger
  1 sibling, 0 replies; 200+ results
From: Stephen Hemminger @ 2024-05-21  2:47 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

Rewrite of the BPF program used to do queue based RSS.

Important changes:
	- uses newer BPF map format BTF
	- accepts key as parameter rather than constant default
	- can do L3 or L4 hashing
	- supports IPv4 options
	- supports IPv6 extension headers
	- restructured for readability

The usage of BPF is different as well:
	- the incoming configuration is looked up based on
	  class parameters rather than patching the BPF code.
	- the resulting queue is placed in skb by using skb mark
	  rather than requiring a second pass through classifier step.

Note: This version only works with a later patch that enables it on
the DPDK driver side. It is submitted as an incremental patch
to allow for easier review. Bisection still works because
the old instructions are still present for now.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
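
The README added by this patch states that only the standard Toeplitz
hash with a 40 byte key is supported. For readers unfamiliar with it, a
generic Toeplitz hash can be sketched as below. This is an illustrative
assumption, not the code in tap_rss.c: sketch_toeplitz and
SKETCH_RSS_KEY_LEN are hypothetical names, and the input is taken to be
the already extracted header bytes (addresses and, for L4, ports).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_RSS_KEY_LEN 40 /* key length in bytes, as stated in the README */

/*
 * Toeplitz hash: for every set bit of the input (most significant bit
 * first), XOR in the 32-bit window of the key that starts at that bit.
 */
static uint32_t
sketch_toeplitz(const uint8_t key[SKETCH_RSS_KEY_LEN],
		const uint8_t *data, size_t len)
{
	uint32_t hash = 0;
	uint32_t window;
	size_t i;
	int bit;

	/* Longer input would need key bits beyond the end of the key. */
	assert(len <= SKETCH_RSS_KEY_LEN - 4);

	/* The window starts as the first 32 bits of the key. */
	window = ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) |
		 ((uint32_t)key[2] << 8) | (uint32_t)key[3];

	for (i = 0; i < len; i++) {
		for (bit = 7; bit >= 0; bit--) {
			if (data[i] & (1u << bit))
				hash ^= window;

			/* Slide the window left by one, pulling in the next key bit. */
			window <<= 1;
			if (key[i + 4] & (1u << bit))
				window |= 1;
		}
	}

	return hash;
}
```

Because the hash is built only from XORs of key windows, it is linear:
hashing a XOR b gives the same result as XORing the hashes of a and b,
and an all-zero input always hashes to 0.
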
 .gitignore                            |   3 -
 drivers/net/tap/bpf/Makefile          |  19 --
 drivers/net/tap/bpf/README            |  49 +++++
 drivers/net/tap/bpf/bpf_api.h         | 276 --------------------------
 drivers/net/tap/bpf/bpf_elf.h         |  53 -----
 drivers/net/tap/bpf/bpf_extract.py    |  85 --------
 drivers/net/tap/bpf/meson.build       |  81 ++++++++
 drivers/net/tap/bpf/tap_bpf_program.c | 255 ------------------------
 drivers/net/tap/bpf/tap_rss.c         | 267 +++++++++++++++++++++++++
 9 files changed, 397 insertions(+), 691 deletions(-)
 delete mode 100644 drivers/net/tap/bpf/Makefile
 create mode 100644 drivers/net/tap/bpf/README
 delete mode 100644 drivers/net/tap/bpf/bpf_api.h
 delete mode 100644 drivers/net/tap/bpf/bpf_elf.h
 delete mode 100644 drivers/net/tap/bpf/bpf_extract.py
 create mode 100644 drivers/net/tap/bpf/meson.build
 delete mode 100644 drivers/net/tap/bpf/tap_bpf_program.c
 create mode 100644 drivers/net/tap/bpf/tap_rss.c

diff --git a/.gitignore b/.gitignore
index 3f444dcace..01a47a7606 100644
--- a/.gitignore
+++ b/.gitignore
@@ -36,9 +36,6 @@ TAGS
 # ignore python bytecode files
 *.pyc
 
-# ignore BPF programs
-drivers/net/tap/bpf/tap_bpf_program.o
-
 # DTS results
 dts/output
 
diff --git a/drivers/net/tap/bpf/Makefile b/drivers/net/tap/bpf/Makefile
deleted file mode 100644
index 9efeeb1bc7..0000000000
--- a/drivers/net/tap/bpf/Makefile
+++ /dev/null
@@ -1,19 +0,0 @@
-# SPDX-License-Identifier: BSD-3-Clause
-# This file is not built as part of normal DPDK build.
-# It is used to generate the eBPF code for TAP RSS.
-
-CLANG=clang
-CLANG_OPTS=-O2
-TARGET=../tap_bpf_insns.h
-
-all: $(TARGET)
-
-clean:
-	rm tap_bpf_program.o $(TARGET)
-
-tap_bpf_program.o: tap_bpf_program.c
-	$(CLANG) $(CLANG_OPTS) -emit-llvm -c $< -o - | \
-	llc -march=bpf -filetype=obj -o $@
-
-$(TARGET): tap_bpf_program.o
-	python3 bpf_extract.py -stap_bpf_program.c -o $@ $<
diff --git a/drivers/net/tap/bpf/README b/drivers/net/tap/bpf/README
new file mode 100644
index 0000000000..6d323d2051
--- /dev/null
+++ b/drivers/net/tap/bpf/README
@@ -0,0 +1,49 @@
+This is the BPF program used to implement Receive Side Scaling (RSS)
+across multiple queues if required by a flow action. The program is
+loaded into the kernel when the first RSS flow rule is created and is never unloaded.
+
+When flow rules are used with the TAP device, packets are first handled by the
+ingress queue discipline that then runs a series of classifier filter rules.
+The first stage is the flow based classifier (flower); for RSS queue
+action the second stage is the kernel skbedit action which sets
+the skb mark to a key based on the flow id; the final stage
+is this BPF program which then maps flow id and packet header
+into a queue id.
+
+This version is built with the BPF Compile Once - Run Everywhere (CO-RE)
+framework and uses libbpf and bpftool.
+
+Limitations
+-----------
+- requires libbpf to run
+
+- rebuilding the BPF requires the clang compiler with bpf available
+  as a target architecture and bpftool to convert object to headers.
+
+  Some older versions of Ubuntu do not have a working bpftool package.
+
+- only standard Toeplitz hash with standard 40 byte key is supported.
+
+- the number of flow rules using RSS is limited to 32.
+
+Building
+--------
+During the DPDK build process the meson build file checks that
+libbpf, bpftool, and clang are available. If everything works then
+BPF RSS is enabled.
+
+The steps are:
+
+1. Uses clang to compile tap_rss.c to produce tap_rss.bpf.o
+
+2. Uses bpftool to generate a skeleton header file tap_rss.skel.h
+   from tap_rss.bpf.o. This header contains wrapper functions for
+   managing the BPF and the actual BPF code as a large byte array.
+
+3. The header file is included in tap_flow.c so that it can load
+   the BPF code (via libbpf).
+
+References
+----------
+BPF and XDP reference guide
+https://docs.cilium.io/en/latest/bpf/progtypes/
diff --git a/drivers/net/tap/bpf/bpf_api.h b/drivers/net/tap/bpf/bpf_api.h
deleted file mode 100644
index 4cd25fa593..0000000000
--- a/drivers/net/tap/bpf/bpf_api.h
+++ /dev/null
@@ -1,276 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 or BSD-3-Clause */
-
-#ifndef __BPF_API__
-#define __BPF_API__
-
-/* Note:
- *
- * This file can be included into eBPF kernel programs. It contains
- * a couple of useful helper functions, map/section ABI (bpf_elf.h),
- * misc macros and some eBPF specific LLVM built-ins.
- */
-
-#include <stdint.h>
-
-#include <linux/pkt_cls.h>
-#include <linux/bpf.h>
-#include <linux/filter.h>
-
-#include <asm/byteorder.h>
-
-#include "bpf_elf.h"
-
-/** libbpf pin type. */
-enum libbpf_pin_type {
-	LIBBPF_PIN_NONE,
-	/* PIN_BY_NAME: pin maps by name (in /sys/fs/bpf by default) */
-	LIBBPF_PIN_BY_NAME,
-};
-
-/** Type helper macros. */
-
-#define __uint(name, val) int (*name)[val]
-#define __type(name, val) typeof(val) *name
-#define __array(name, val) typeof(val) *name[]
-
-/** Misc macros. */
-
-#ifndef __stringify
-# define __stringify(X)		#X
-#endif
-
-#ifndef __maybe_unused
-# define __maybe_unused		__attribute__((__unused__))
-#endif
-
-#ifndef offsetof
-# define offsetof(TYPE, MEMBER)	__builtin_offsetof(TYPE, MEMBER)
-#endif
-
-#ifndef likely
-# define likely(X)		__builtin_expect(!!(X), 1)
-#endif
-
-#ifndef unlikely
-# define unlikely(X)		__builtin_expect(!!(X), 0)
-#endif
-
-#ifndef htons
-# define htons(X)		__constant_htons((X))
-#endif
-
-#ifndef ntohs
-# define ntohs(X)		__constant_ntohs((X))
-#endif
-
-#ifndef htonl
-# define htonl(X)		__constant_htonl((X))
-#endif
-
-#ifndef ntohl
-# define ntohl(X)		__constant_ntohl((X))
-#endif
-
-#ifndef __inline__
-# define __inline__		__attribute__((always_inline))
-#endif
-
-/** Section helper macros. */
-
-#ifndef __section
-# define __section(NAME)						\
-	__attribute__((section(NAME), used))
-#endif
-
-#ifndef __section_tail
-# define __section_tail(ID, KEY)					\
-	__section(__stringify(ID) "/" __stringify(KEY))
-#endif
-
-#ifndef __section_xdp_entry
-# define __section_xdp_entry						\
-	__section(ELF_SECTION_PROG)
-#endif
-
-#ifndef __section_cls_entry
-# define __section_cls_entry						\
-	__section(ELF_SECTION_CLASSIFIER)
-#endif
-
-#ifndef __section_act_entry
-# define __section_act_entry						\
-	__section(ELF_SECTION_ACTION)
-#endif
-
-#ifndef __section_lwt_entry
-# define __section_lwt_entry						\
-	__section(ELF_SECTION_PROG)
-#endif
-
-#ifndef __section_license
-# define __section_license						\
-	__section(ELF_SECTION_LICENSE)
-#endif
-
-#ifndef __section_maps
-# define __section_maps							\
-	__section(ELF_SECTION_MAPS)
-#endif
-
-/** Declaration helper macros. */
-
-#ifndef BPF_LICENSE
-# define BPF_LICENSE(NAME)						\
-	char ____license[] __section_license = NAME
-#endif
-
-/** Classifier helper */
-
-#ifndef BPF_H_DEFAULT
-# define BPF_H_DEFAULT	-1
-#endif
-
-/** BPF helper functions for tc. Individual flags are in linux/bpf.h */
-
-#ifndef __BPF_FUNC
-# define __BPF_FUNC(NAME, ...)						\
-	(* NAME)(__VA_ARGS__) __maybe_unused
-#endif
-
-#ifndef BPF_FUNC
-# define BPF_FUNC(NAME, ...)						\
-	__BPF_FUNC(NAME, __VA_ARGS__) = (void *) BPF_FUNC_##NAME
-#endif
-
-/* Map access/manipulation */
-static void *BPF_FUNC(map_lookup_elem, void *map, const void *key);
-static int BPF_FUNC(map_update_elem, void *map, const void *key,
-		    const void *value, uint32_t flags);
-static int BPF_FUNC(map_delete_elem, void *map, const void *key);
-
-/* Time access */
-static uint64_t BPF_FUNC(ktime_get_ns);
-
-/* Debugging */
-
-/* FIXME: __attribute__ ((format(printf, 1, 3))) not possible unless
- * llvm bug https://llvm.org/bugs/show_bug.cgi?id=26243 gets resolved.
- * It would require ____fmt to be made const, which generates a reloc
- * entry (non-map).
- */
-static void BPF_FUNC(trace_printk, const char *fmt, int fmt_size, ...);
-
-#ifndef printt
-# define printt(fmt, ...)						\
-	__extension__ ({						\
-		char ____fmt[] = fmt;					\
-		trace_printk(____fmt, sizeof(____fmt), ##__VA_ARGS__);	\
-	})
-#endif
-
-/* Random numbers */
-static uint32_t BPF_FUNC(get_prandom_u32);
-
-/* Tail calls */
-static void BPF_FUNC(tail_call, struct __sk_buff *skb, void *map,
-		     uint32_t index);
-
-/* System helpers */
-static uint32_t BPF_FUNC(get_smp_processor_id);
-static uint32_t BPF_FUNC(get_numa_node_id);
-
-/* Packet misc meta data */
-static uint32_t BPF_FUNC(get_cgroup_classid, struct __sk_buff *skb);
-static int BPF_FUNC(skb_under_cgroup, void *map, uint32_t index);
-
-static uint32_t BPF_FUNC(get_route_realm, struct __sk_buff *skb);
-static uint32_t BPF_FUNC(get_hash_recalc, struct __sk_buff *skb);
-static uint32_t BPF_FUNC(set_hash_invalid, struct __sk_buff *skb);
-
-/* Packet redirection */
-static int BPF_FUNC(redirect, int ifindex, uint32_t flags);
-static int BPF_FUNC(clone_redirect, struct __sk_buff *skb, int ifindex,
-		    uint32_t flags);
-
-/* Packet manipulation */
-static int BPF_FUNC(skb_load_bytes, struct __sk_buff *skb, uint32_t off,
-		    void *to, uint32_t len);
-static int BPF_FUNC(skb_store_bytes, struct __sk_buff *skb, uint32_t off,
-		    const void *from, uint32_t len, uint32_t flags);
-
-static int BPF_FUNC(l3_csum_replace, struct __sk_buff *skb, uint32_t off,
-		    uint32_t from, uint32_t to, uint32_t flags);
-static int BPF_FUNC(l4_csum_replace, struct __sk_buff *skb, uint32_t off,
-		    uint32_t from, uint32_t to, uint32_t flags);
-static int BPF_FUNC(csum_diff, const void *from, uint32_t from_size,
-		    const void *to, uint32_t to_size, uint32_t seed);
-static int BPF_FUNC(csum_update, struct __sk_buff *skb, uint32_t wsum);
-
-static int BPF_FUNC(skb_change_type, struct __sk_buff *skb, uint32_t type);
-static int BPF_FUNC(skb_change_proto, struct __sk_buff *skb, uint32_t proto,
-		    uint32_t flags);
-static int BPF_FUNC(skb_change_tail, struct __sk_buff *skb, uint32_t nlen,
-		    uint32_t flags);
-
-static int BPF_FUNC(skb_pull_data, struct __sk_buff *skb, uint32_t len);
-
-/* Event notification */
-static int __BPF_FUNC(skb_event_output, struct __sk_buff *skb, void *map,
-		      uint64_t index, const void *data, uint32_t size) =
-		      (void *) BPF_FUNC_perf_event_output;
-
-/* Packet vlan encap/decap */
-static int BPF_FUNC(skb_vlan_push, struct __sk_buff *skb, uint16_t proto,
-		    uint16_t vlan_tci);
-static int BPF_FUNC(skb_vlan_pop, struct __sk_buff *skb);
-
-/* Packet tunnel encap/decap */
-static int BPF_FUNC(skb_get_tunnel_key, struct __sk_buff *skb,
-		    struct bpf_tunnel_key *to, uint32_t size, uint32_t flags);
-static int BPF_FUNC(skb_set_tunnel_key, struct __sk_buff *skb,
-		    const struct bpf_tunnel_key *from, uint32_t size,
-		    uint32_t flags);
-
-static int BPF_FUNC(skb_get_tunnel_opt, struct __sk_buff *skb,
-		    void *to, uint32_t size);
-static int BPF_FUNC(skb_set_tunnel_opt, struct __sk_buff *skb,
-		    const void *from, uint32_t size);
-
-/** LLVM built-ins, mem*() routines work for constant size */
-
-#ifndef lock_xadd
-# define lock_xadd(ptr, val)	((void) __sync_fetch_and_add(ptr, val))
-#endif
-
-#ifndef memset
-# define memset(s, c, n)	__builtin_memset((s), (c), (n))
-#endif
-
-#ifndef memcpy
-# define memcpy(d, s, n)	__builtin_memcpy((d), (s), (n))
-#endif
-
-#ifndef memmove
-# define memmove(d, s, n)	__builtin_memmove((d), (s), (n))
-#endif
-
-/* FIXME: __builtin_memcmp() is not yet fully usable unless llvm bug
- * https://llvm.org/bugs/show_bug.cgi?id=26218 gets resolved. Also
- * this one would generate a reloc entry (non-map), otherwise.
- */
-#if 0
-#ifndef memcmp
-# define memcmp(a, b, n)	__builtin_memcmp((a), (b), (n))
-#endif
-#endif
-
-unsigned long long load_byte(void *skb, unsigned long long off)
-	asm ("llvm.bpf.load.byte");
-
-unsigned long long load_half(void *skb, unsigned long long off)
-	asm ("llvm.bpf.load.half");
-
-unsigned long long load_word(void *skb, unsigned long long off)
-	asm ("llvm.bpf.load.word");
-
-#endif /* __BPF_API__ */
diff --git a/drivers/net/tap/bpf/bpf_elf.h b/drivers/net/tap/bpf/bpf_elf.h
deleted file mode 100644
index ea8a11c95c..0000000000
--- a/drivers/net/tap/bpf/bpf_elf.h
+++ /dev/null
@@ -1,53 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 or BSD-3-Clause */
-#ifndef __BPF_ELF__
-#define __BPF_ELF__
-
-#include <asm/types.h>
-
-/* Note:
- *
- * Below ELF section names and bpf_elf_map structure definition
- * are not (!) kernel ABI. It's rather a "contract" between the
- * application and the BPF loader in tc. For compatibility, the
- * section names should stay as-is. Introduction of aliases, if
- * needed, are a possibility, though.
- */
-
-/* ELF section names, etc */
-#define ELF_SECTION_LICENSE	"license"
-#define ELF_SECTION_MAPS	"maps"
-#define ELF_SECTION_PROG	"prog"
-#define ELF_SECTION_CLASSIFIER	"classifier"
-#define ELF_SECTION_ACTION	"action"
-
-#define ELF_MAX_MAPS		64
-#define ELF_MAX_LICENSE_LEN	128
-
-/* Object pinning settings */
-#define PIN_NONE		0
-#define PIN_OBJECT_NS		1
-#define PIN_GLOBAL_NS		2
-
-/* ELF map definition */
-struct bpf_elf_map {
-	__u32 type;
-	__u32 size_key;
-	__u32 size_value;
-	__u32 max_elem;
-	__u32 flags;
-	__u32 id;
-	__u32 pinning;
-	__u32 inner_id;
-	__u32 inner_idx;
-};
-
-#define BPF_ANNOTATE_KV_PAIR(name, type_key, type_val)		\
-	struct ____btf_map_##name {				\
-		type_key key;					\
-		type_val value;					\
-	};							\
-	struct ____btf_map_##name				\
-	    __attribute__ ((section(".maps." #name), used))	\
-	    ____btf_map_##name = { }
-
-#endif /* __BPF_ELF__ */
diff --git a/drivers/net/tap/bpf/bpf_extract.py b/drivers/net/tap/bpf/bpf_extract.py
deleted file mode 100644
index 73c4dafe4e..0000000000
--- a/drivers/net/tap/bpf/bpf_extract.py
+++ /dev/null
@@ -1,85 +0,0 @@
-#!/usr/bin/env python3
-# SPDX-License-Identifier: BSD-3-Clause
-# Copyright (c) 2023 Stephen Hemminger <stephen@networkplumber.org>
-
-import argparse
-import sys
-import struct
-from tempfile import TemporaryFile
-from elftools.elf.elffile import ELFFile
-
-
-def load_sections(elffile):
-    """Get sections of interest from ELF"""
-    result = []
-    parts = [("cls_q", "cls_q_insns"), ("l3_l4", "l3_l4_hash_insns")]
-    for name, tag in parts:
-        section = elffile.get_section_by_name(name)
-        if section:
-            insns = struct.iter_unpack('<BBhL', section.data())
-            result.append([tag, insns])
-    return result
-
-
-def dump_section(name, insns, out):
-    """Dump the array of BPF instructions"""
-    print(f'\nstatic struct bpf_insn {name}[] = {{', file=out)
-    for bpf in insns:
-        code = bpf[0]
-        src = bpf[1] >> 4
-        dst = bpf[1] & 0xf
-        off = bpf[2]
-        imm = bpf[3]
-        print(f'\t{{{code:#04x}, {dst:4d}, {src:4d}, {off:8d}, {imm:#010x}}},',
-              file=out)
-    print('};', file=out)
-
-
-def parse_args():
-    """Parse command line arguments"""
-    parser = argparse.ArgumentParser()
-    parser.add_argument('-s',
-                        '--source',
-                        type=str,
-                        help="original source file")
-    parser.add_argument('-o', '--out', type=str, help="output C file path")
-    parser.add_argument("file",
-                        nargs='+',
-                        help="object file path or '-' for stdin")
-    return parser.parse_args()
-
-
-def open_input(path):
-    """Open the file or stdin"""
-    if path == "-":
-        temp = TemporaryFile()
-        temp.write(sys.stdin.buffer.read())
-        return temp
-    return open(path, 'rb')
-
-
-def write_header(out, source):
-    """Write file intro header"""
-    print("/* SPDX-License-Identifier: BSD-3-Clause", file=out)
-    if source:
-        print(f' * Auto-generated from {source}', file=out)
-    print(" * This not the original source file. Do NOT edit it.", file=out)
-    print(" */\n", file=out)
-
-
-def main():
-    '''program main function'''
-    args = parse_args()
-
-    with open(args.out, 'w',
-              encoding="utf-8") if args.out else sys.stdout as out:
-        write_header(out, args.source)
-        for path in args.file:
-            elffile = ELFFile(open_input(path))
-            sections = load_sections(elffile)
-            for name, insns in sections:
-                dump_section(name, insns, out)
-
-
-if __name__ == "__main__":
-    main()
diff --git a/drivers/net/tap/bpf/meson.build b/drivers/net/tap/bpf/meson.build
new file mode 100644
index 0000000000..f2c03a19fd
--- /dev/null
+++ b/drivers/net/tap/bpf/meson.build
@@ -0,0 +1,81 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright 2024 Stephen Hemminger <stephen@networkplumber.org>
+
+enable_tap_rss = false
+
+libbpf = dependency('libbpf', required: false, method: 'pkg-config')
+if not libbpf.found()
+    message('net/tap: no RSS support missing libbpf')
+    subdir_done()
+endif
+
+# Debian installs this in /usr/sbin, which is not in $PATH
+bpftool = find_program('bpftool', '/usr/sbin/bpftool', required: false, version: '>= 5.6.0')
+if not bpftool.found()
+    message('net/tap: no RSS support missing bpftool')
+    subdir_done()
+endif
+
+clang_supports_bpf = false
+clang = find_program('clang', required: false)
+if clang.found()
+    clang_supports_bpf = run_command(clang, '-target', 'bpf', '--print-supported-cpus',
+                                     check: false).returncode() == 0
+endif
+
+if not clang_supports_bpf
+    message('net/tap: no RSS support missing clang BPF')
+    subdir_done()
+endif
+
+enable_tap_rss = true
+
+libbpf_include_dir = libbpf.get_variable(pkgconfig : 'includedir')
+
+# The include files <linux/bpf.h> and others include <asm/types.h>
+# but <asm/types.h> is not defined for multi-lib environment target.
+# Work around this by using the include directory from the host build environment.
+machine_name = run_command('uname', '-m').stdout().strip()
+march_include_dir = '/usr/include/' + machine_name + '-linux-gnu'
+
+clang_flags = [
+    '-O2',
+    '-Wall',
+    '-Wextra',
+    '-target',
+    'bpf',
+    '-g',
+    '-c',
+]
+
+bpf_o_cmd = [
+    clang,
+    clang_flags,
+    '-idirafter',
+    libbpf_include_dir,
+    '-idirafter',
+    march_include_dir,
+    '@INPUT@',
+    '-o',
+    '@OUTPUT@'
+]
+
+skel_h_cmd = [
+    bpftool,
+    'gen',
+    'skeleton',
+    '@INPUT@'
+]
+
+tap_rss_o = custom_target(
+    'tap_rss.bpf.o',
+    input: 'tap_rss.c',
+    output: 'tap_rss.o',
+    command: bpf_o_cmd)
+
+tap_rss_skel_h = custom_target(
+    'tap_rss.skel.h',
+    input: tap_rss_o,
+    output: 'tap_rss.skel.h',
+    command: skel_h_cmd,
+    capture: true)
diff --git a/drivers/net/tap/bpf/tap_bpf_program.c b/drivers/net/tap/bpf/tap_bpf_program.c
deleted file mode 100644
index f05aed021c..0000000000
--- a/drivers/net/tap/bpf/tap_bpf_program.c
+++ /dev/null
@@ -1,255 +0,0 @@
-/* SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0
- * Copyright 2017 Mellanox Technologies, Ltd
- */
-
-#include <stdint.h>
-#include <stdbool.h>
-#include <sys/types.h>
-#include <sys/socket.h>
-#include <asm/types.h>
-#include <linux/in.h>
-#include <linux/if.h>
-#include <linux/if_ether.h>
-#include <linux/ip.h>
-#include <linux/ipv6.h>
-#include <linux/if_tunnel.h>
-#include <linux/filter.h>
-
-#include "bpf_api.h"
-#include "bpf_elf.h"
-#include "../tap_rss.h"
-
-/** Create IPv4 address */
-#define IPv4(a, b, c, d) ((__u32)(((a) & 0xff) << 24) | \
-		(((b) & 0xff) << 16) | \
-		(((c) & 0xff) << 8)  | \
-		((d) & 0xff))
-
-#define PORT(a, b) ((__u16)(((a) & 0xff) << 8) | \
-		((b) & 0xff))
-
-/*
- * The queue number is offset by a unique QUEUE_OFFSET, to distinguish
- * packets that have gone through this rule (skb->cb[1] != 0) from others.
- */
-#define QUEUE_OFFSET		0x7cafe800
-#define PIN_GLOBAL_NS		2
-
-#define KEY_IDX			0
-#define BPF_MAP_ID_KEY	1
-
-struct vlan_hdr {
-	__be16 proto;
-	__be16 tci;
-};
-
-struct bpf_elf_map __attribute__((section("maps"), used))
-map_keys = {
-	.type           =       BPF_MAP_TYPE_HASH,
-	.id             =       BPF_MAP_ID_KEY,
-	.size_key       =       sizeof(__u32),
-	.size_value     =       sizeof(struct rss_key),
-	.max_elem       =       256,
-	.pinning        =       PIN_GLOBAL_NS,
-};
-
-__section("cls_q") int
-match_q(struct __sk_buff *skb)
-{
-	__u32 queue = skb->cb[1];
-	/* queue is set by tap_flow_bpf_cls_q() before load */
-	volatile __u32 q = 0xdeadbeef;
-	__u32 match_queue = QUEUE_OFFSET + q;
-
-	/* printt("match_q$i() queue = %d\n", queue); */
-
-	if (queue != match_queue)
-		return TC_ACT_OK;
-
-	/* queue match */
-	skb->cb[1] = 0;
-	return TC_ACT_UNSPEC;
-}
-
-
-struct ipv4_l3_l4_tuple {
-	__u32    src_addr;
-	__u32    dst_addr;
-	__u16    dport;
-	__u16    sport;
-} __attribute__((packed));
-
-struct ipv6_l3_l4_tuple {
-	__u8        src_addr[16];
-	__u8        dst_addr[16];
-	__u16       dport;
-	__u16       sport;
-} __attribute__((packed));
-
-static const __u8 def_rss_key[TAP_RSS_HASH_KEY_SIZE] = {
-	0xd1, 0x81, 0xc6, 0x2c,
-	0xf7, 0xf4, 0xdb, 0x5b,
-	0x19, 0x83, 0xa2, 0xfc,
-	0x94, 0x3e, 0x1a, 0xdb,
-	0xd9, 0x38, 0x9e, 0x6b,
-	0xd1, 0x03, 0x9c, 0x2c,
-	0xa7, 0x44, 0x99, 0xad,
-	0x59, 0x3d, 0x56, 0xd9,
-	0xf3, 0x25, 0x3c, 0x06,
-	0x2a, 0xdc, 0x1f, 0xfc,
-};
-
-static __u32  __attribute__((always_inline))
-rte_softrss_be(const __u32 *input_tuple, const uint8_t *rss_key,
-		__u8 input_len)
-{
-	__u32 i, j, hash = 0;
-#pragma unroll
-	for (j = 0; j < input_len; j++) {
-#pragma unroll
-		for (i = 0; i < 32; i++) {
-			if (input_tuple[j] & (1U << (31 - i))) {
-				hash ^= ((const __u32 *)def_rss_key)[j] << i |
-				(__u32)((uint64_t)
-				(((const __u32 *)def_rss_key)[j + 1])
-					>> (32 - i));
-			}
-		}
-	}
-	return hash;
-}
-
-static int __attribute__((always_inline))
-rss_l3_l4(struct __sk_buff *skb)
-{
-	void *data_end = (void *)(long)skb->data_end;
-	void *data = (void *)(long)skb->data;
-	__u16 proto = (__u16)skb->protocol;
-	__u32 key_idx = 0xdeadbeef;
-	__u32 hash;
-	struct rss_key *rsskey;
-	__u64 off = ETH_HLEN;
-	int j;
-	__u8 *key = 0;
-	__u32 len;
-	__u32 queue = 0;
-	bool mf = 0;
-	__u16 frag_off = 0;
-
-	rsskey = map_lookup_elem(&map_keys, &key_idx);
-	if (!rsskey) {
-		printt("hash(): rss key is not configured\n");
-		return TC_ACT_OK;
-	}
-	key = (__u8 *)rsskey->key;
-
-	/* Get correct proto for 802.1ad */
-	if (skb->vlan_present && skb->vlan_proto == htons(ETH_P_8021AD)) {
-		if (data + ETH_ALEN * 2 + sizeof(struct vlan_hdr) +
-		    sizeof(proto) > data_end)
-			return TC_ACT_OK;
-		proto = *(__u16 *)(data + ETH_ALEN * 2 +
-				   sizeof(struct vlan_hdr));
-		off += sizeof(struct vlan_hdr);
-	}
-
-	if (proto == htons(ETH_P_IP)) {
-		if (data + off + sizeof(struct iphdr) + sizeof(__u32)
-			> data_end)
-			return TC_ACT_OK;
-
-		__u8 *src_dst_addr = data + off + offsetof(struct iphdr, saddr);
-		__u8 *frag_off_addr = data + off + offsetof(struct iphdr, frag_off);
-		__u8 *prot_addr = data + off + offsetof(struct iphdr, protocol);
-		__u8 *src_dst_port = data + off + sizeof(struct iphdr);
-		struct ipv4_l3_l4_tuple v4_tuple = {
-			.src_addr = IPv4(*(src_dst_addr + 0),
-					*(src_dst_addr + 1),
-					*(src_dst_addr + 2),
-					*(src_dst_addr + 3)),
-			.dst_addr = IPv4(*(src_dst_addr + 4),
-					*(src_dst_addr + 5),
-					*(src_dst_addr + 6),
-					*(src_dst_addr + 7)),
-			.sport = 0,
-			.dport = 0,
-		};
-		/** Fetch the L4-payer port numbers only in-case of TCP/UDP
-		 ** and also if the packet is not fragmented. Since fragmented
-		 ** chunks do not have L4 TCP/UDP header.
-		 **/
-		if (*prot_addr == IPPROTO_UDP || *prot_addr == IPPROTO_TCP) {
-			frag_off = PORT(*(frag_off_addr + 0),
-					*(frag_off_addr + 1));
-			mf = frag_off & 0x2000;
-			frag_off = frag_off & 0x1fff;
-			if (mf == 0 && frag_off == 0) {
-				v4_tuple.sport = PORT(*(src_dst_port + 0),
-						*(src_dst_port + 1));
-				v4_tuple.dport = PORT(*(src_dst_port + 2),
-						*(src_dst_port + 3));
-			}
-		}
-		__u8 input_len = sizeof(v4_tuple) / sizeof(__u32);
-		if (rsskey->hash_fields & (1 << HASH_FIELD_IPV4_L3))
-			input_len--;
-		hash = rte_softrss_be((__u32 *)&v4_tuple, key, 3);
-	} else if (proto == htons(ETH_P_IPV6)) {
-		if (data + off + sizeof(struct ipv6hdr) +
-					sizeof(__u32) > data_end)
-			return TC_ACT_OK;
-		__u8 *src_dst_addr = data + off +
-					offsetof(struct ipv6hdr, saddr);
-		__u8 *src_dst_port = data + off +
-					sizeof(struct ipv6hdr);
-		__u8 *next_hdr = data + off +
-					offsetof(struct ipv6hdr, nexthdr);
-
-		struct ipv6_l3_l4_tuple v6_tuple;
-		for (j = 0; j < 4; j++)
-			*((uint32_t *)&v6_tuple.src_addr + j) =
-				__builtin_bswap32(*((uint32_t *)
-						src_dst_addr + j));
-		for (j = 0; j < 4; j++)
-			*((uint32_t *)&v6_tuple.dst_addr + j) =
-				__builtin_bswap32(*((uint32_t *)
-						src_dst_addr + 4 + j));
-
-		/** Fetch the L4 header port-numbers only if next-header
-		 * is TCP/UDP **/
-		if (*next_hdr == IPPROTO_UDP || *next_hdr == IPPROTO_TCP) {
-			v6_tuple.sport = PORT(*(src_dst_port + 0),
-				      *(src_dst_port + 1));
-			v6_tuple.dport = PORT(*(src_dst_port + 2),
-				      *(src_dst_port + 3));
-		} else {
-			v6_tuple.sport = 0;
-			v6_tuple.dport = 0;
-		}
-
-		__u8 input_len = sizeof(v6_tuple) / sizeof(__u32);
-		if (rsskey->hash_fields & (1 << HASH_FIELD_IPV6_L3))
-			input_len--;
-		hash = rte_softrss_be((__u32 *)&v6_tuple, key, 9);
-	} else {
-		return TC_ACT_PIPE;
-	}
-
-	queue = rsskey->queues[(hash % rsskey->nb_queues) &
-				       (TAP_MAX_QUEUES - 1)];
-	skb->cb[1] = QUEUE_OFFSET + queue;
-	/* printt(">>>>> rss_l3_l4 hash=0x%x queue=%u\n", hash, queue); */
-
-	return TC_ACT_RECLASSIFY;
-}
-
-#define RSS(L)						\
-	__section(#L) int				\
-		L ## _hash(struct __sk_buff *skb)	\
-	{						\
-		return rss_ ## L (skb);			\
-	}
-
-RSS(l3_l4)
-
-BPF_LICENSE("Dual BSD/GPL");
diff --git a/drivers/net/tap/bpf/tap_rss.c b/drivers/net/tap/bpf/tap_rss.c
new file mode 100644
index 0000000000..025b831b5c
--- /dev/null
+++ b/drivers/net/tap/bpf/tap_rss.c
@@ -0,0 +1,267 @@
+/* SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0
+ * Copyright 2017 Mellanox Technologies, Ltd
+ */
+
+#include <linux/in.h>
+#include <linux/if_ether.h>
+#include <linux/ip.h>
+#include <linux/ipv6.h>
+#include <linux/pkt_cls.h>
+#include <linux/bpf.h>
+
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_endian.h>
+
+#include "../tap_rss.h"
+
+/*
+ * This map provides configuration information about flows which need BPF RSS.
+ *
+ * The hash is indexed by the skb mark.
+ */
+struct {
+	__uint(type, BPF_MAP_TYPE_HASH);
+	__uint(key_size, sizeof(__u32));
+	__uint(value_size, sizeof(struct rss_key));
+	__uint(max_entries, TAP_RSS_MAX);
+} rss_map SEC(".maps");
+
+#define IP_MF		0x2000		/** IP header Flags **/
+#define IP_OFFSET	0x1FFF		/** IP header fragment offset **/
+
+/*
+ * Compute Toeplitz hash over the input tuple.
+ * This is the same as rte_softrss_be in lib/hash,
+ * but the loop needs to be set up to match BPF restrictions.
+ */
+static __always_inline __u32
+softrss_be(const __u32 *input_tuple, __u32 input_len, const __u32 *key)
+{
+	__u32 i, j, hash = 0;
+
+#pragma unroll
+	for (j = 0; j < input_len; j++) {
+#pragma unroll
+		for (i = 0; i < 32; i++) {
+			if (input_tuple[j] & (1U << (31 - i)))
+				hash ^= key[j] << i | (__u32)((__u64)key[j + 1] >> (32 - i));
+		}
+	}
+	return hash;
+}
+
+/*
+ * Compute RSS hash for IPv4 packet.
+ * Returns 0 if RSS is not specified
+ */
+static __always_inline __u32
+parse_ipv4(const struct __sk_buff *skb, __u32 hash_type, const __u32 *key)
+{
+	struct iphdr iph;
+	__u32 off = 0;
+
+	if (bpf_skb_load_bytes_relative(skb, off, &iph, sizeof(iph), BPF_HDR_START_NET))
+		return 0;	/* no IP header present */
+
+	struct {
+		__u32    src_addr;
+		__u32    dst_addr;
+		__u16    dport;
+		__u16    sport;
+	} v4_tuple = {
+		.src_addr = bpf_ntohl(iph.saddr),
+		.dst_addr = bpf_ntohl(iph.daddr),
+	};
+
+	/* If only calculating L3 hash, do it now */
+	if (hash_type & (1 << HASH_FIELD_IPV4_L3))
+		return softrss_be((__u32 *)&v4_tuple, sizeof(v4_tuple) / sizeof(__u32) - 1, key);
+
+	/* If packet is fragmented then no L4 hash is possible */
+	if ((iph.frag_off & bpf_htons(IP_MF | IP_OFFSET)) != 0)
+		return 0;
+
+	/* Do RSS on UDP or TCP protocols */
+	if (iph.protocol == IPPROTO_UDP || iph.protocol == IPPROTO_TCP) {
+		__u16 src_dst_port[2];
+
+		off += iph.ihl * 4;
+		if (bpf_skb_load_bytes_relative(skb, off, &src_dst_port, sizeof(src_dst_port),
+						BPF_HDR_START_NET))
+			return 0; /* TCP or UDP header missing */
+
+		v4_tuple.sport = bpf_ntohs(src_dst_port[0]);
+		v4_tuple.dport = bpf_ntohs(src_dst_port[1]);
+		return softrss_be((__u32 *)&v4_tuple, sizeof(v4_tuple) / sizeof(__u32), key);
+	}
+
+	/* Other protocol */
+	return 0;
+}
+
+/*
+ * Parse IPv6 extension headers, update offset and return next proto.
+ * returns next proto on success, -1 on malformed header
+ */
+static __always_inline int
+skip_ip6_ext(__u16 proto, const struct __sk_buff *skb, __u32 *off, int *frag)
+{
+	struct ext_hdr {
+		__u8 next_hdr;
+		__u8 len;
+	} xh;
+	unsigned int i;
+
+	*frag = 0;
+
+#define MAX_EXT_HDRS 5
+#pragma unroll
+	for (i = 0; i < MAX_EXT_HDRS; i++) {
+		switch (proto) {
+		case IPPROTO_HOPOPTS:
+		case IPPROTO_ROUTING:
+		case IPPROTO_DSTOPTS:
+			if (bpf_skb_load_bytes_relative(skb, *off, &xh, sizeof(xh),
+							BPF_HDR_START_NET))
+				return -1;
+
+			*off += (xh.len + 1) * 8;
+			proto = xh.next_hdr;
+			break;
+		case IPPROTO_FRAGMENT:
+			if (bpf_skb_load_bytes_relative(skb, *off, &xh, sizeof(xh),
+							BPF_HDR_START_NET))
+				return -1;
+
+			*off += 8;
+			proto = xh.next_hdr;
+			*frag = 1;
+			return proto; /* this is always the last ext hdr */
+		default:
+			return proto;
+		}
+	}
+
+	/* too many extension headers, give up */
+	return -1;
+}
+
+/*
+ * Compute RSS hash for IPv6 packet.
+ * Returns 0 if RSS is not specified
+ */
+static __always_inline __u32
+parse_ipv6(const struct __sk_buff *skb, __u32 hash_type, const __u32 *key)
+{
+	struct {
+		__u32       src_addr[4];
+		__u32       dst_addr[4];
+		__u16       dport;
+		__u16       sport;
+	} v6_tuple = { };
+	struct ipv6hdr ip6h;
+	__u32 off = 0, j;
+	int proto, frag;
+
+	if (bpf_skb_load_bytes_relative(skb, off, &ip6h, sizeof(ip6h), BPF_HDR_START_NET))
+		return 0;	/* missing IPv6 header */
+
+#pragma unroll
+	for (j = 0; j < 4; j++) {
+		v6_tuple.src_addr[j] = bpf_ntohl(ip6h.saddr.in6_u.u6_addr32[j]);
+		v6_tuple.dst_addr[j] = bpf_ntohl(ip6h.daddr.in6_u.u6_addr32[j]);
+	}
+
+	/* If only doing L3 hash, do it now */
+	if (hash_type & (1 << HASH_FIELD_IPV6_L3))
+		return softrss_be((__u32 *)&v6_tuple, sizeof(v6_tuple) / sizeof(__u32) - 1, key);
+
+	/* Skip extension headers if present */
+	off += sizeof(ip6h);
+	proto = skip_ip6_ext(ip6h.nexthdr, skb, &off, &frag);
+	if (proto < 0)
+		return 0;
+
+	/* If packet is a fragment then no L4 hash is possible */
+	if (frag)
+		return 0;
+
+	/* Do RSS on UDP or TCP */
+	if (proto == IPPROTO_UDP || proto == IPPROTO_TCP) {
+		__u16 src_dst_port[2];
+
+		if (bpf_skb_load_bytes_relative(skb, off, &src_dst_port, sizeof(src_dst_port),
+						BPF_HDR_START_NET))
+			return 0;
+
+		v6_tuple.sport = bpf_ntohs(src_dst_port[0]);
+		v6_tuple.dport = bpf_ntohs(src_dst_port[1]);
+
+		return softrss_be((__u32 *)&v6_tuple, sizeof(v6_tuple) / sizeof(__u32), key);
+	}
+
+	return 0;
+}
+
+/*
+ * Scale value into the range [0, n)
+ * Assumes val is large (ie hash covers whole u32 range)
+ */
+static __always_inline __u32
+reciprocal_scale(__u32 val, __u32 n)
+{
+	return (__u32)(((__u64)val * n) >> 32);
+}
+
+/*
+ * When this BPF program is run by tc from the filter classifier,
+ * it is able to read skb metadata and packet data.
+ *
+ * For packets where RSS is not possible, just return TC_ACT_OK.
+ * When RSS is desired, set skb->queue_mapping and return TC_ACT_PIPE
+ * to continue processing.
+ *
+ * This should be BPF_PROG_TYPE_SCHED_ACT so section needs to be "action"
+ */
+SEC("action") int
+rss_flow_action(struct __sk_buff *skb)
+{
+	const struct rss_key *rsskey;
+	const __u32 *key;
+	__be16 proto;
+	__u32 mark;
+	__u32 hash;
+	__u16 queue;
+
+	__builtin_preserve_access_index(({
+		mark = skb->mark;
+		proto = skb->protocol;
+	}));
+
+	/* Lookup RSS configuration for that BPF class */
+	rsskey = bpf_map_lookup_elem(&rss_map, &mark);
+	if (rsskey == NULL)
+		return TC_ACT_OK;
+
+	key = (const __u32 *)rsskey->key;
+
+	if (proto == bpf_htons(ETH_P_IP))
+		hash = parse_ipv4(skb, rsskey->hash_fields, key);
+	else if (proto == bpf_htons(ETH_P_IPV6))
+		hash = parse_ipv6(skb, rsskey->hash_fields, key);
+	else
+		hash = 0;
+
+	if (hash == 0)
+		return TC_ACT_OK;
+
+	/* Fold hash to the number of queues configured */
+	queue = reciprocal_scale(hash, rsskey->nb_queues);
+
+	__builtin_preserve_access_index(({
+		skb->queue_mapping = queue;
+	}));
+	return TC_ACT_PIPE;
+}
+
+char _license[] SEC("license") = "Dual BSD/GPL";
-- 
2.43.0
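
For readers who want to experiment with the hashing outside the kernel,
here is a minimal user-space sketch of the two arithmetic steps in
tap_rss.c: the Toeplitz hash and the fold of the 32-bit hash onto a
queue index. The names toeplitz_be() and scale_to_range() are
hypothetical and not part of the patch; the sketch assumes the tuple and
key are already host-order 32-bit words and that the key holds at least
input_len + 1 words.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Toeplitz hash over host-order 32-bit words (illustrative sketch).
 * For every set bit of the input, XOR in the 32-bit window of the key
 * that starts at that bit position.
 */
static uint32_t
toeplitz_be(const uint32_t *tuple, size_t input_len, const uint32_t *key)
{
	uint32_t hash = 0;

	assert(tuple != NULL && key != NULL);
	for (size_t j = 0; j < input_len; j++) {
		for (unsigned int i = 0; i < 32; i++) {
			if (tuple[j] & (1U << (31 - i))) {
				/* 64-bit window avoids a 32-bit shift by 32 when i == 0 */
				uint64_t window = ((uint64_t)key[j] << 32) | key[j + 1];

				hash ^= (uint32_t)(window >> (32 - i));
			}
		}
	}
	return hash;
}

/* Map a value spread over the whole u32 range into [0, n) without a division. */
static uint32_t
scale_to_range(uint32_t val, uint32_t n)
{
	return (uint32_t)(((uint64_t)val * n) >> 32);
}
```

The multiply-and-shift form keeps the result strictly below n for any
32-bit input, which is why the earlier "& (TAP_MAX_QUEUES - 1)" masking
is not needed in this formulation.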



* [PATCH v14 02/11] net/tap: do not duplicate fd's
  @ 2024-05-21 17:06  2%   ` Stephen Hemminger
  2024-05-21 17:06  2%   ` [PATCH v14 06/11] net/tap: rewrite the RSS BPF program Stephen Hemminger
  1 sibling, 0 replies; 200+ results
From: Stephen Hemminger @ 2024-05-21 17:06 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

The TAP device can use the same file descriptor for both the rx and tx
queue of a pair, which reduces the number of fd's required.

MP process support passes file descriptors from the primary to the
secondary process. Because at most RTE_MP_MAX_FD_NUM (8) fd's can be
passed, the TAP device was restricted to only 4 queues when used by a
secondary process. Sharing the fd allows up to 8 queues (versus 4).

The restriction on max fd's should be changed in EAL in the future,
but doing so will break ABI compatibility.
The maximum Linux supports is SCM_MAX_FD (253).
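
The queue limits quoted above follow from a simple division, sketched
here for illustration only. The helper name queues_per_message() is
hypothetical; the values 8 and 253 are the RTE_MP_MAX_FD_NUM and
SCM_MAX_FD figures from this message.

```c
#include <assert.h>

/* Queue pairs that fit in one IPC message carrying at most fd_limit fds. */
static unsigned int
queues_per_message(unsigned int fd_limit, unsigned int fds_per_queue)
{
	assert(fds_per_queue > 0);
	return fd_limit / fds_per_queue;
}
```

With two descriptors per queue pair the limit of 8 yields 4 queues; with
one shared descriptor it yields 8, and a limit of 253 would yield 253.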

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 doc/guides/rel_notes/release_24_07.rst |   4 +
 drivers/net/tap/rte_eth_tap.c          | 192 ++++++++++---------------
 drivers/net/tap/rte_eth_tap.h          |   3 +-
 drivers/net/tap/tap_flow.c             |   3 +-
 drivers/net/tap/tap_intr.c             |   7 +-
 5 files changed, 89 insertions(+), 120 deletions(-)

diff --git a/doc/guides/rel_notes/release_24_07.rst b/doc/guides/rel_notes/release_24_07.rst
index a69f24cf99..fa9692924b 100644
--- a/doc/guides/rel_notes/release_24_07.rst
+++ b/doc/guides/rel_notes/release_24_07.rst
@@ -55,6 +55,10 @@ New Features
      Also, make sure to start the actual text at the margin.
      =======================================================
 
+* **Updated Tap PMD driver.**
+
+  * Updated to support up to 8 queues when used by secondary process.
+
 
 Removed Items
 -------------
diff --git a/drivers/net/tap/rte_eth_tap.c b/drivers/net/tap/rte_eth_tap.c
index 69d9da695b..b84fc01856 100644
--- a/drivers/net/tap/rte_eth_tap.c
+++ b/drivers/net/tap/rte_eth_tap.c
@@ -124,8 +124,7 @@ enum ioctl_mode {
 /* Message header to synchronize queues via IPC */
 struct ipc_queues {
 	char port_name[RTE_DEV_NAME_MAX_LEN];
-	int rxq_count;
-	int txq_count;
+	int q_count;
 	/*
 	 * The file descriptors are in the dedicated part
 	 * of the Unix message to be translated by the kernel.
@@ -446,7 +445,7 @@ pmd_rx_burst(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 		uint16_t data_off = rte_pktmbuf_headroom(mbuf);
 		int len;
 
-		len = readv(process_private->rxq_fds[rxq->queue_id],
+		len = readv(process_private->fds[rxq->queue_id],
 			*rxq->iovecs,
 			1 + (rxq->rxmode->offloads & RTE_ETH_RX_OFFLOAD_SCATTER ?
 			     rxq->nb_rx_desc : 1));
@@ -643,7 +642,7 @@ tap_write_mbufs(struct tx_queue *txq, uint16_t num_mbufs,
 		}
 
 		/* copy the tx frame data */
-		n = writev(process_private->txq_fds[txq->queue_id], iovecs, k);
+		n = writev(process_private->fds[txq->queue_id], iovecs, k);
 		if (n <= 0)
 			return -1;
 
@@ -851,7 +850,6 @@ tap_mp_req_on_rxtx(struct rte_eth_dev *dev)
 	struct rte_mp_msg msg;
 	struct ipc_queues *request_param = (struct ipc_queues *)msg.param;
 	int err;
-	int fd_iterator = 0;
 	struct pmd_process_private *process_private = dev->process_private;
 	int i;
 
@@ -859,16 +857,13 @@ tap_mp_req_on_rxtx(struct rte_eth_dev *dev)
 	strlcpy(msg.name, TAP_MP_REQ_START_RXTX, sizeof(msg.name));
 	strlcpy(request_param->port_name, dev->data->name, sizeof(request_param->port_name));
 	msg.len_param = sizeof(*request_param);
-	for (i = 0; i < dev->data->nb_tx_queues; i++) {
-		msg.fds[fd_iterator++] = process_private->txq_fds[i];
-		msg.num_fds++;
-		request_param->txq_count++;
-	}
-	for (i = 0; i < dev->data->nb_rx_queues; i++) {
-		msg.fds[fd_iterator++] = process_private->rxq_fds[i];
-		msg.num_fds++;
-		request_param->rxq_count++;
-	}
+
+	/* rx and tx share file descriptors and nb_tx_queues == nb_rx_queues */
+	for (i = 0; i < dev->data->nb_rx_queues; i++)
+		msg.fds[i] = process_private->fds[i];
+
+	request_param->q_count = dev->data->nb_rx_queues;
+	msg.num_fds = dev->data->nb_rx_queues;
 
 	err = rte_mp_sendmsg(&msg);
 	if (err < 0) {
@@ -910,8 +905,6 @@ tap_mp_req_start_rxtx(const struct rte_mp_msg *request, __rte_unused const void
 	struct rte_eth_dev *dev;
 	const struct ipc_queues *request_param =
 		(const struct ipc_queues *)request->param;
-	int fd_iterator;
-	int queue;
 	struct pmd_process_private *process_private;
 
 	dev = rte_eth_dev_get_by_name(request_param->port_name);
@@ -920,14 +913,13 @@ tap_mp_req_start_rxtx(const struct rte_mp_msg *request, __rte_unused const void
 			request_param->port_name);
 		return -1;
 	}
+
 	process_private = dev->process_private;
-	fd_iterator = 0;
-	TAP_LOG(DEBUG, "tap_attach rx_q:%d tx_q:%d\n", request_param->rxq_count,
-		request_param->txq_count);
-	for (queue = 0; queue < request_param->txq_count; queue++)
-		process_private->txq_fds[queue] = request->fds[fd_iterator++];
-	for (queue = 0; queue < request_param->rxq_count; queue++)
-		process_private->rxq_fds[queue] = request->fds[fd_iterator++];
+	TAP_LOG(DEBUG, "tap_attach q:%d\n", request_param->q_count);
+
+	for (int q = 0; q < request_param->q_count; q++)
+		process_private->fds[q] = request->fds[q];
+
 
 	return 0;
 }
@@ -1115,13 +1107,21 @@ tap_stats_reset(struct rte_eth_dev *dev)
 	return 0;
 }
 
+static void
+tap_queue_close(struct pmd_process_private *process_private, uint16_t qid)
+{
+	if (process_private->fds[qid] != -1) {
+		close(process_private->fds[qid]);
+		process_private->fds[qid] = -1;
+	}
+}
+
 static int
 tap_dev_close(struct rte_eth_dev *dev)
 {
 	int i;
 	struct pmd_internals *internals = dev->data->dev_private;
 	struct pmd_process_private *process_private = dev->process_private;
-	struct rx_queue *rxq;
 
 	if (rte_eal_process_type() != RTE_PROC_PRIMARY) {
 		rte_free(dev->process_private);
@@ -1141,19 +1141,14 @@ tap_dev_close(struct rte_eth_dev *dev)
 	}
 
 	for (i = 0; i < RTE_PMD_TAP_MAX_QUEUES; i++) {
-		if (process_private->rxq_fds[i] != -1) {
-			rxq = &internals->rxq[i];
-			close(process_private->rxq_fds[i]);
-			process_private->rxq_fds[i] = -1;
-			tap_rxq_pool_free(rxq->pool);
-			rte_free(rxq->iovecs);
-			rxq->pool = NULL;
-			rxq->iovecs = NULL;
-		}
-		if (process_private->txq_fds[i] != -1) {
-			close(process_private->txq_fds[i]);
-			process_private->txq_fds[i] = -1;
-		}
+		struct rx_queue *rxq = &internals->rxq[i];
+
+		tap_queue_close(process_private, i);
+
+		tap_rxq_pool_free(rxq->pool);
+		rte_free(rxq->iovecs);
+		rxq->pool = NULL;
+		rxq->iovecs = NULL;
 	}
 
 	if (internals->remote_if_index) {
@@ -1206,15 +1201,16 @@ tap_rx_queue_release(struct rte_eth_dev *dev, uint16_t qid)
 
 	if (!rxq)
 		return;
+
 	process_private = rte_eth_devices[rxq->in_port].process_private;
-	if (process_private->rxq_fds[rxq->queue_id] != -1) {
-		close(process_private->rxq_fds[rxq->queue_id]);
-		process_private->rxq_fds[rxq->queue_id] = -1;
-		tap_rxq_pool_free(rxq->pool);
-		rte_free(rxq->iovecs);
-		rxq->pool = NULL;
-		rxq->iovecs = NULL;
-	}
+
+	tap_rxq_pool_free(rxq->pool);
+	rte_free(rxq->iovecs);
+	rxq->pool = NULL;
+	rxq->iovecs = NULL;
+
+	if (dev->data->tx_queues[qid] == NULL)
+		tap_queue_close(process_private, qid);
 }
 
 static void
@@ -1225,12 +1221,10 @@ tap_tx_queue_release(struct rte_eth_dev *dev, uint16_t qid)
 
 	if (!txq)
 		return;
-	process_private = rte_eth_devices[txq->out_port].process_private;
 
-	if (process_private->txq_fds[txq->queue_id] != -1) {
-		close(process_private->txq_fds[txq->queue_id]);
-		process_private->txq_fds[txq->queue_id] = -1;
-	}
+	process_private = rte_eth_devices[txq->out_port].process_private;
+	if (dev->data->rx_queues[qid] == NULL)
+		tap_queue_close(process_private, qid);
 }
 
 static int
@@ -1482,52 +1476,31 @@ tap_setup_queue(struct rte_eth_dev *dev,
 		uint16_t qid,
 		int is_rx)
 {
-	int ret;
-	int *fd;
-	int *other_fd;
-	const char *dir;
+	int fd, ret;
 	struct pmd_internals *pmd = dev->data->dev_private;
 	struct pmd_process_private *process_private = dev->process_private;
 	struct rx_queue *rx = &internals->rxq[qid];
 	struct tx_queue *tx = &internals->txq[qid];
-	struct rte_gso_ctx *gso_ctx;
+	struct rte_gso_ctx *gso_ctx = is_rx ? NULL : &tx->gso_ctx;
+	const char *dir = is_rx ? "rx" : "tx";
 
-	if (is_rx) {
-		fd = &process_private->rxq_fds[qid];
-		other_fd = &process_private->txq_fds[qid];
-		dir = "rx";
-		gso_ctx = NULL;
-	} else {
-		fd = &process_private->txq_fds[qid];
-		other_fd = &process_private->rxq_fds[qid];
-		dir = "tx";
-		gso_ctx = &tx->gso_ctx;
-	}
-	if (*fd != -1) {
+	fd = process_private->fds[qid];
+	if (fd != -1) {
 		/* fd for this queue already exists */
 		TAP_LOG(DEBUG, "%s: fd %d for %s queue qid %d exists",
-			pmd->name, *fd, dir, qid);
+			pmd->name, fd, dir, qid);
 		gso_ctx = NULL;
-	} else if (*other_fd != -1) {
-		/* Only other_fd exists. dup it */
-		*fd = dup(*other_fd);
-		if (*fd < 0) {
-			*fd = -1;
-			TAP_LOG(ERR, "%s: dup() failed.", pmd->name);
-			return -1;
-		}
-		TAP_LOG(DEBUG, "%s: dup fd %d for %s queue qid %d (%d)",
-			pmd->name, *other_fd, dir, qid, *fd);
 	} else {
-		/* Both RX and TX fds do not exist (equal -1). Create fd */
-		*fd = tun_alloc(pmd, 0, 0);
-		if (*fd < 0) {
-			*fd = -1; /* restore original value */
+		fd = tun_alloc(pmd, 0, 0);
+		if (fd < 0) {
 			TAP_LOG(ERR, "%s: tun_alloc() failed.", pmd->name);
 			return -1;
 		}
+
 		TAP_LOG(DEBUG, "%s: add %s queue for qid %d fd %d",
-			pmd->name, dir, qid, *fd);
+			pmd->name, dir, qid, fd);
+
+		process_private->fds[qid] = fd;
 	}
 
 	tx->mtu = &dev->data->mtu;
@@ -1540,7 +1513,7 @@ tap_setup_queue(struct rte_eth_dev *dev,
 
 	tx->type = pmd->type;
 
-	return *fd;
+	return fd;
 }
 
 static int
@@ -1620,7 +1593,7 @@ tap_rx_queue_setup(struct rte_eth_dev *dev,
 
 	TAP_LOG(DEBUG, "  RX TUNTAP device name %s, qid %d on fd %d",
 		internals->name, rx_queue_id,
-		process_private->rxq_fds[rx_queue_id]);
+		process_private->fds[rx_queue_id]);
 
 	return 0;
 
@@ -1664,7 +1637,7 @@ tap_tx_queue_setup(struct rte_eth_dev *dev,
 	TAP_LOG(DEBUG,
 		"  TX TUNTAP device name %s, qid %d on fd %d csum %s",
 		internals->name, tx_queue_id,
-		process_private->txq_fds[tx_queue_id],
+		process_private->fds[tx_queue_id],
 		txq->csum ? "on" : "off");
 
 	return 0;
@@ -2001,10 +1974,9 @@ eth_dev_tap_create(struct rte_vdev_device *vdev, const char *tap_name,
 	dev->intr_handle = pmd->intr_handle;
 
 	/* Presetup the fds to -1 as being not valid */
-	for (i = 0; i < RTE_PMD_TAP_MAX_QUEUES; i++) {
-		process_private->rxq_fds[i] = -1;
-		process_private->txq_fds[i] = -1;
-	}
+	for (i = 0; i < RTE_PMD_TAP_MAX_QUEUES; i++)
+		process_private->fds[i] = -1;
+
 
 	if (pmd->type == ETH_TUNTAP_TYPE_TAP) {
 		if (rte_is_zero_ether_addr(mac_addr))
@@ -2332,7 +2304,6 @@ tap_mp_attach_queues(const char *port_name, struct rte_eth_dev *dev)
 	struct ipc_queues *request_param = (struct ipc_queues *)request.param;
 	struct ipc_queues *reply_param;
 	struct pmd_process_private *process_private = dev->process_private;
-	int queue, fd_iterator;
 
 	/* Prepare the request */
 	memset(&request, 0, sizeof(request));
@@ -2352,18 +2323,17 @@ tap_mp_attach_queues(const char *port_name, struct rte_eth_dev *dev)
 	TAP_LOG(DEBUG, "Received IPC reply for %s", reply_param->port_name);
 
 	/* Attach the queues from received file descriptors */
-	if (reply_param->rxq_count + reply_param->txq_count != reply->num_fds) {
+	if (reply_param->q_count != reply->num_fds) {
 		TAP_LOG(ERR, "Unexpected number of fds received");
 		return -1;
 	}
 
-	dev->data->nb_rx_queues = reply_param->rxq_count;
-	dev->data->nb_tx_queues = reply_param->txq_count;
-	fd_iterator = 0;
-	for (queue = 0; queue < reply_param->rxq_count; queue++)
-		process_private->rxq_fds[queue] = reply->fds[fd_iterator++];
-	for (queue = 0; queue < reply_param->txq_count; queue++)
-		process_private->txq_fds[queue] = reply->fds[fd_iterator++];
+	dev->data->nb_rx_queues = reply_param->q_count;
+	dev->data->nb_tx_queues = reply_param->q_count;
+
+	for (int q = 0; q < reply_param->q_count; q++)
+		process_private->fds[q] = reply->fds[q];
+
 	free(reply);
 	return 0;
 }
@@ -2393,25 +2363,19 @@ tap_mp_sync_queues(const struct rte_mp_msg *request, const void *peer)
 
 	/* Fill file descriptors for all queues */
 	reply.num_fds = 0;
-	reply_param->rxq_count = 0;
-	if (dev->data->nb_rx_queues + dev->data->nb_tx_queues >
-			RTE_MP_MAX_FD_NUM){
-		TAP_LOG(ERR, "Number of rx/tx queues exceeds max number of fds");
+	reply_param->q_count = 0;
+
+	RTE_ASSERT(dev->data->nb_rx_queues == dev->data->nb_tx_queues);
+	if (dev->data->nb_rx_queues > RTE_MP_MAX_FD_NUM) {
+		TAP_LOG(ERR, "Number of rx/tx queues %u exceeds max number of fds %u",
+			dev->data->nb_rx_queues, RTE_MP_MAX_FD_NUM);
 		return -1;
 	}
 
 	for (queue = 0; queue < dev->data->nb_rx_queues; queue++) {
-		reply.fds[reply.num_fds++] = process_private->rxq_fds[queue];
-		reply_param->rxq_count++;
-	}
-	RTE_ASSERT(reply_param->rxq_count == dev->data->nb_rx_queues);
-
-	reply_param->txq_count = 0;
-	for (queue = 0; queue < dev->data->nb_tx_queues; queue++) {
-		reply.fds[reply.num_fds++] = process_private->txq_fds[queue];
-		reply_param->txq_count++;
+		reply.fds[reply.num_fds++] = process_private->fds[queue];
+		reply_param->q_count++;
 	}
-	RTE_ASSERT(reply_param->txq_count == dev->data->nb_tx_queues);
 
 	/* Send reply */
 	strlcpy(reply.name, request->name, sizeof(reply.name));
diff --git a/drivers/net/tap/rte_eth_tap.h b/drivers/net/tap/rte_eth_tap.h
index 5ac93f93e9..dc8201020b 100644
--- a/drivers/net/tap/rte_eth_tap.h
+++ b/drivers/net/tap/rte_eth_tap.h
@@ -96,8 +96,7 @@ struct pmd_internals {
 };
 
 struct pmd_process_private {
-	int rxq_fds[RTE_PMD_TAP_MAX_QUEUES];
-	int txq_fds[RTE_PMD_TAP_MAX_QUEUES];
+	int fds[RTE_PMD_TAP_MAX_QUEUES];
 };
 
 /* tap_intr.c */
diff --git a/drivers/net/tap/tap_flow.c b/drivers/net/tap/tap_flow.c
index 79cd6a12ca..a78fd50cd4 100644
--- a/drivers/net/tap/tap_flow.c
+++ b/drivers/net/tap/tap_flow.c
@@ -1595,8 +1595,9 @@ tap_flow_isolate(struct rte_eth_dev *dev,
 	 * If netdevice is there, setup appropriate flow rules immediately.
 	 * Otherwise it will be set when bringing up the netdevice (tun_alloc).
 	 */
-	if (process_private->rxq_fds[0] == -1)
+	if (process_private->fds[0] == -1)
 		return 0;
+
 	if (set) {
 		struct rte_flow *remote_flow;
 
diff --git a/drivers/net/tap/tap_intr.c b/drivers/net/tap/tap_intr.c
index a9097def1a..1908f71f97 100644
--- a/drivers/net/tap/tap_intr.c
+++ b/drivers/net/tap/tap_intr.c
@@ -68,9 +68,11 @@ tap_rx_intr_vec_install(struct rte_eth_dev *dev)
 	}
 	for (i = 0; i < n; i++) {
 		struct rx_queue *rxq = pmd->dev->data->rx_queues[i];
+		int fd = process_private->fds[i];
 
 		/* Skip queues that cannot request interrupts. */
-		if (!rxq || process_private->rxq_fds[i] == -1) {
+		if (!rxq || fd == -1) {
+			/* Use invalid intr_vec[] index to disable entry. */
 			/* Use invalid intr_vec[] index to disable entry. */
 			if (rte_intr_vec_list_index_set(intr_handle, i,
 			RTE_INTR_VEC_RXTX_OFFSET + RTE_MAX_RXTX_INTR_VEC_ID))
@@ -80,8 +82,7 @@ tap_rx_intr_vec_install(struct rte_eth_dev *dev)
 		if (rte_intr_vec_list_index_set(intr_handle, i,
 					RTE_INTR_VEC_RXTX_OFFSET + count))
 			return -rte_errno;
-		if (rte_intr_efds_index_set(intr_handle, count,
-						   process_private->rxq_fds[i]))
+		if (rte_intr_efds_index_set(intr_handle, count, fd))
 			return -rte_errno;
 		count++;
 	}
-- 
2.43.0



* [PATCH v14 06/11] net/tap: rewrite the RSS BPF program
    2024-05-21 17:06  2%   ` [PATCH v14 02/11] net/tap: do not duplicate fd's Stephen Hemminger
@ 2024-05-21 17:06  2%   ` Stephen Hemminger
  1 sibling, 0 replies; 200+ results
From: Stephen Hemminger @ 2024-05-21 17:06 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

Rewrite of the BPF program used to do queue based RSS.

Important changes:
	- uses newer BPF map format BTF
	- accepts key as parameter rather than constant default
	- can do L3 or L4 hashing
	- supports IPv4 options
	- supports IPv6 extension headers
	- restructured for readability
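
As an illustration of the L3 versus L4 choice, here is a minimal host-side
sketch of the Toeplitz hash. It is not the BPF code itself. It assumes the
tuple and key are already arrays of 32-bit words in host order and that the
key holds at least len + 1 words; the name toeplitz_be is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Toeplitz hash over 'len' 32-bit words (illustrative sketch).
 * 'key' must provide at least len + 1 words.
 * A 64-bit window avoids shifting a 32-bit value by 32 bits.
 */
static uint32_t
toeplitz_be(const uint32_t *tuple, size_t len, const uint32_t *key)
{
	uint32_t hash = 0;
	size_t j;
	unsigned int i;

	for (j = 0; j < len; j++) {
		/* Key bits that can line up with input word j. */
		uint64_t window = ((uint64_t)key[j] << 32) | key[j + 1];

		for (i = 0; i < 32; i++) {
			/* Walk the input bits from most to least significant. */
			if (tuple[j] & (1U << (31 - i)))
				hash ^= (uint32_t)(window >> (32 - i));
		}
	}
	return hash;
}
```

For an IPv4 tuple { src_addr, dst_addr, ports }, toeplitz_be(tuple, 2, key)
would give the L3 hash and toeplitz_be(tuple, 3, key) the L4 hash.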

The usage of BPF is different as well:
	- the incoming configuration is looked up based on
	  class parameters rather than patching the BPF code.
	- the resulting queue is placed in skb by using skb mark
	  rather than requiring a second pass through the classifier step.
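
The lookup-then-select flow described above can be modeled on the host as
follows. This is a sketch only: struct rss_cfg, rss_lookup and select_queue
are hypothetical stand-ins for the BPF map and program, and a small linear
table replaces the kernel hash map. The scaling step mirrors the
reciprocal_scale helper in tap_rss.c.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-flow configuration, keyed by skb mark. */
struct rss_cfg {
	uint32_t mark;
	uint32_t nb_queues;
};

/* Scale a 32-bit hash into [0, n) without a division. */
static uint32_t
reciprocal_scale(uint32_t val, uint32_t n)
{
	return (uint32_t)(((uint64_t)val * n) >> 32);
}

/* Linear search standing in for the BPF hash map lookup. */
static const struct rss_cfg *
rss_lookup(const struct rss_cfg *tbl, size_t n, uint32_t mark)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (tbl[i].mark == mark)
			return &tbl[i];
	}
	return NULL;
}

/* Return the queue for this mark and hash, or -1 if no RSS applies. */
static int
select_queue(const struct rss_cfg *tbl, size_t n, uint32_t mark, uint32_t hash)
{
	const struct rss_cfg *cfg = rss_lookup(tbl, n, mark);

	if (cfg == NULL || hash == 0)
		return -1;
	return (int)reciprocal_scale(hash, cfg->nb_queues);
}
```

Because the configuration is found at run time from the mark, the same
program can serve several flow rules without being patched per rule.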

Note: This version only works with a later patch to enable it on
the DPDK driver side. It is submitted as an incremental patch
to allow for easier review. Bisection still works because
the old instructions are still present for now.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 .gitignore                            |   3 -
 drivers/net/tap/bpf/Makefile          |  19 --
 drivers/net/tap/bpf/README            |  49 +++++
 drivers/net/tap/bpf/bpf_api.h         | 276 --------------------------
 drivers/net/tap/bpf/bpf_elf.h         |  53 -----
 drivers/net/tap/bpf/bpf_extract.py    |  85 --------
 drivers/net/tap/bpf/meson.build       |  81 ++++++++
 drivers/net/tap/bpf/tap_bpf_program.c | 255 ------------------------
 drivers/net/tap/bpf/tap_rss.c         | 267 +++++++++++++++++++++++++
 9 files changed, 397 insertions(+), 691 deletions(-)
 delete mode 100644 drivers/net/tap/bpf/Makefile
 create mode 100644 drivers/net/tap/bpf/README
 delete mode 100644 drivers/net/tap/bpf/bpf_api.h
 delete mode 100644 drivers/net/tap/bpf/bpf_elf.h
 delete mode 100644 drivers/net/tap/bpf/bpf_extract.py
 create mode 100644 drivers/net/tap/bpf/meson.build
 delete mode 100644 drivers/net/tap/bpf/tap_bpf_program.c
 create mode 100644 drivers/net/tap/bpf/tap_rss.c

diff --git a/.gitignore b/.gitignore
index 3f444dcace..01a47a7606 100644
--- a/.gitignore
+++ b/.gitignore
@@ -36,9 +36,6 @@ TAGS
 # ignore python bytecode files
 *.pyc
 
-# ignore BPF programs
-drivers/net/tap/bpf/tap_bpf_program.o
-
 # DTS results
 dts/output
 
diff --git a/drivers/net/tap/bpf/Makefile b/drivers/net/tap/bpf/Makefile
deleted file mode 100644
index 9efeeb1bc7..0000000000
--- a/drivers/net/tap/bpf/Makefile
+++ /dev/null
@@ -1,19 +0,0 @@
-# SPDX-License-Identifier: BSD-3-Clause
-# This file is not built as part of normal DPDK build.
-# It is used to generate the eBPF code for TAP RSS.
-
-CLANG=clang
-CLANG_OPTS=-O2
-TARGET=../tap_bpf_insns.h
-
-all: $(TARGET)
-
-clean:
-	rm tap_bpf_program.o $(TARGET)
-
-tap_bpf_program.o: tap_bpf_program.c
-	$(CLANG) $(CLANG_OPTS) -emit-llvm -c $< -o - | \
-	llc -march=bpf -filetype=obj -o $@
-
-$(TARGET): tap_bpf_program.o
-	python3 bpf_extract.py -stap_bpf_program.c -o $@ $<
diff --git a/drivers/net/tap/bpf/README b/drivers/net/tap/bpf/README
new file mode 100644
index 0000000000..6d323d2051
--- /dev/null
+++ b/drivers/net/tap/bpf/README
@@ -0,0 +1,49 @@
+This is the BPF program used to implement Receive Side Scaling (RSS)
+across multiple queues if required by a flow action. The program is
+loaded into the kernel when the first RSS flow rule is created and is never unloaded.
+
+When flow rules are used with the TAP device, packets are first handled by the
+ingress queue discipline that then runs a series of classifier filter rules.
+The first stage is the flow based classifier (flower); for RSS queue
+action the second stage is the kernel skbedit action which sets
+the skb mark to a key based on the flow id; the final stage
+is this BPF program which then maps flow id and packet header
+into a queue id.
+
+This version is built with the BPF Compile Once - Run Everywhere (CO-RE)
+framework and uses libbpf and bpftool.
+
+Limitations
+-----------
+- requires libbpf to run
+
+- rebuilding the BPF requires the clang compiler with bpf available
+  as a target architecture and bpftool to convert object to headers.
+
+  Some older versions of Ubuntu do not have a working bpftool package.
+
+- only standard Toeplitz hash with standard 40 byte key is supported.
+
+- the number of flow rules using RSS is limited to 32.
+
+Building
+--------
+During the DPDK build process the meson build file checks that
+libbpf, bpftool, and clang are available. If everything works then
+BPF RSS is enabled.
+
+The steps are:
+
+1. Uses clang to compile tap_rss.c to produce tap_rss.bpf.o
+
+2. Uses bpftool to generate a skeleton header file tap_rss.skel.h
+   from tap_rss.bpf.o. This header contains wrapper functions for
+   managing the BPF and the actual BPF code as a large byte array.
+
+3. The header file is included in tap_flow.c so that it can load
+   the BPF code (via libbpf).
+
+References
+----------
+BPF and XDP reference guide
+https://docs.cilium.io/en/latest/bpf/progtypes/
diff --git a/drivers/net/tap/bpf/bpf_api.h b/drivers/net/tap/bpf/bpf_api.h
deleted file mode 100644
index 4cd25fa593..0000000000
--- a/drivers/net/tap/bpf/bpf_api.h
+++ /dev/null
@@ -1,276 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 or BSD-3-Clause */
-
-#ifndef __BPF_API__
-#define __BPF_API__
-
-/* Note:
- *
- * This file can be included into eBPF kernel programs. It contains
- * a couple of useful helper functions, map/section ABI (bpf_elf.h),
- * misc macros and some eBPF specific LLVM built-ins.
- */
-
-#include <stdint.h>
-
-#include <linux/pkt_cls.h>
-#include <linux/bpf.h>
-#include <linux/filter.h>
-
-#include <asm/byteorder.h>
-
-#include "bpf_elf.h"
-
-/** libbpf pin type. */
-enum libbpf_pin_type {
-	LIBBPF_PIN_NONE,
-	/* PIN_BY_NAME: pin maps by name (in /sys/fs/bpf by default) */
-	LIBBPF_PIN_BY_NAME,
-};
-
-/** Type helper macros. */
-
-#define __uint(name, val) int (*name)[val]
-#define __type(name, val) typeof(val) *name
-#define __array(name, val) typeof(val) *name[]
-
-/** Misc macros. */
-
-#ifndef __stringify
-# define __stringify(X)		#X
-#endif
-
-#ifndef __maybe_unused
-# define __maybe_unused		__attribute__((__unused__))
-#endif
-
-#ifndef offsetof
-# define offsetof(TYPE, MEMBER)	__builtin_offsetof(TYPE, MEMBER)
-#endif
-
-#ifndef likely
-# define likely(X)		__builtin_expect(!!(X), 1)
-#endif
-
-#ifndef unlikely
-# define unlikely(X)		__builtin_expect(!!(X), 0)
-#endif
-
-#ifndef htons
-# define htons(X)		__constant_htons((X))
-#endif
-
-#ifndef ntohs
-# define ntohs(X)		__constant_ntohs((X))
-#endif
-
-#ifndef htonl
-# define htonl(X)		__constant_htonl((X))
-#endif
-
-#ifndef ntohl
-# define ntohl(X)		__constant_ntohl((X))
-#endif
-
-#ifndef __inline__
-# define __inline__		__attribute__((always_inline))
-#endif
-
-/** Section helper macros. */
-
-#ifndef __section
-# define __section(NAME)						\
-	__attribute__((section(NAME), used))
-#endif
-
-#ifndef __section_tail
-# define __section_tail(ID, KEY)					\
-	__section(__stringify(ID) "/" __stringify(KEY))
-#endif
-
-#ifndef __section_xdp_entry
-# define __section_xdp_entry						\
-	__section(ELF_SECTION_PROG)
-#endif
-
-#ifndef __section_cls_entry
-# define __section_cls_entry						\
-	__section(ELF_SECTION_CLASSIFIER)
-#endif
-
-#ifndef __section_act_entry
-# define __section_act_entry						\
-	__section(ELF_SECTION_ACTION)
-#endif
-
-#ifndef __section_lwt_entry
-# define __section_lwt_entry						\
-	__section(ELF_SECTION_PROG)
-#endif
-
-#ifndef __section_license
-# define __section_license						\
-	__section(ELF_SECTION_LICENSE)
-#endif
-
-#ifndef __section_maps
-# define __section_maps							\
-	__section(ELF_SECTION_MAPS)
-#endif
-
-/** Declaration helper macros. */
-
-#ifndef BPF_LICENSE
-# define BPF_LICENSE(NAME)						\
-	char ____license[] __section_license = NAME
-#endif
-
-/** Classifier helper */
-
-#ifndef BPF_H_DEFAULT
-# define BPF_H_DEFAULT	-1
-#endif
-
-/** BPF helper functions for tc. Individual flags are in linux/bpf.h */
-
-#ifndef __BPF_FUNC
-# define __BPF_FUNC(NAME, ...)						\
-	(* NAME)(__VA_ARGS__) __maybe_unused
-#endif
-
-#ifndef BPF_FUNC
-# define BPF_FUNC(NAME, ...)						\
-	__BPF_FUNC(NAME, __VA_ARGS__) = (void *) BPF_FUNC_##NAME
-#endif
-
-/* Map access/manipulation */
-static void *BPF_FUNC(map_lookup_elem, void *map, const void *key);
-static int BPF_FUNC(map_update_elem, void *map, const void *key,
-		    const void *value, uint32_t flags);
-static int BPF_FUNC(map_delete_elem, void *map, const void *key);
-
-/* Time access */
-static uint64_t BPF_FUNC(ktime_get_ns);
-
-/* Debugging */
-
-/* FIXME: __attribute__ ((format(printf, 1, 3))) not possible unless
- * llvm bug https://llvm.org/bugs/show_bug.cgi?id=26243 gets resolved.
- * It would require ____fmt to be made const, which generates a reloc
- * entry (non-map).
- */
-static void BPF_FUNC(trace_printk, const char *fmt, int fmt_size, ...);
-
-#ifndef printt
-# define printt(fmt, ...)						\
-	__extension__ ({						\
-		char ____fmt[] = fmt;					\
-		trace_printk(____fmt, sizeof(____fmt), ##__VA_ARGS__);	\
-	})
-#endif
-
-/* Random numbers */
-static uint32_t BPF_FUNC(get_prandom_u32);
-
-/* Tail calls */
-static void BPF_FUNC(tail_call, struct __sk_buff *skb, void *map,
-		     uint32_t index);
-
-/* System helpers */
-static uint32_t BPF_FUNC(get_smp_processor_id);
-static uint32_t BPF_FUNC(get_numa_node_id);
-
-/* Packet misc meta data */
-static uint32_t BPF_FUNC(get_cgroup_classid, struct __sk_buff *skb);
-static int BPF_FUNC(skb_under_cgroup, void *map, uint32_t index);
-
-static uint32_t BPF_FUNC(get_route_realm, struct __sk_buff *skb);
-static uint32_t BPF_FUNC(get_hash_recalc, struct __sk_buff *skb);
-static uint32_t BPF_FUNC(set_hash_invalid, struct __sk_buff *skb);
-
-/* Packet redirection */
-static int BPF_FUNC(redirect, int ifindex, uint32_t flags);
-static int BPF_FUNC(clone_redirect, struct __sk_buff *skb, int ifindex,
-		    uint32_t flags);
-
-/* Packet manipulation */
-static int BPF_FUNC(skb_load_bytes, struct __sk_buff *skb, uint32_t off,
-		    void *to, uint32_t len);
-static int BPF_FUNC(skb_store_bytes, struct __sk_buff *skb, uint32_t off,
-		    const void *from, uint32_t len, uint32_t flags);
-
-static int BPF_FUNC(l3_csum_replace, struct __sk_buff *skb, uint32_t off,
-		    uint32_t from, uint32_t to, uint32_t flags);
-static int BPF_FUNC(l4_csum_replace, struct __sk_buff *skb, uint32_t off,
-		    uint32_t from, uint32_t to, uint32_t flags);
-static int BPF_FUNC(csum_diff, const void *from, uint32_t from_size,
-		    const void *to, uint32_t to_size, uint32_t seed);
-static int BPF_FUNC(csum_update, struct __sk_buff *skb, uint32_t wsum);
-
-static int BPF_FUNC(skb_change_type, struct __sk_buff *skb, uint32_t type);
-static int BPF_FUNC(skb_change_proto, struct __sk_buff *skb, uint32_t proto,
-		    uint32_t flags);
-static int BPF_FUNC(skb_change_tail, struct __sk_buff *skb, uint32_t nlen,
-		    uint32_t flags);
-
-static int BPF_FUNC(skb_pull_data, struct __sk_buff *skb, uint32_t len);
-
-/* Event notification */
-static int __BPF_FUNC(skb_event_output, struct __sk_buff *skb, void *map,
-		      uint64_t index, const void *data, uint32_t size) =
-		      (void *) BPF_FUNC_perf_event_output;
-
-/* Packet vlan encap/decap */
-static int BPF_FUNC(skb_vlan_push, struct __sk_buff *skb, uint16_t proto,
-		    uint16_t vlan_tci);
-static int BPF_FUNC(skb_vlan_pop, struct __sk_buff *skb);
-
-/* Packet tunnel encap/decap */
-static int BPF_FUNC(skb_get_tunnel_key, struct __sk_buff *skb,
-		    struct bpf_tunnel_key *to, uint32_t size, uint32_t flags);
-static int BPF_FUNC(skb_set_tunnel_key, struct __sk_buff *skb,
-		    const struct bpf_tunnel_key *from, uint32_t size,
-		    uint32_t flags);
-
-static int BPF_FUNC(skb_get_tunnel_opt, struct __sk_buff *skb,
-		    void *to, uint32_t size);
-static int BPF_FUNC(skb_set_tunnel_opt, struct __sk_buff *skb,
-		    const void *from, uint32_t size);
-
-/** LLVM built-ins, mem*() routines work for constant size */
-
-#ifndef lock_xadd
-# define lock_xadd(ptr, val)	((void) __sync_fetch_and_add(ptr, val))
-#endif
-
-#ifndef memset
-# define memset(s, c, n)	__builtin_memset((s), (c), (n))
-#endif
-
-#ifndef memcpy
-# define memcpy(d, s, n)	__builtin_memcpy((d), (s), (n))
-#endif
-
-#ifndef memmove
-# define memmove(d, s, n)	__builtin_memmove((d), (s), (n))
-#endif
-
-/* FIXME: __builtin_memcmp() is not yet fully usable unless llvm bug
- * https://llvm.org/bugs/show_bug.cgi?id=26218 gets resolved. Also
- * this one would generate a reloc entry (non-map), otherwise.
- */
-#if 0
-#ifndef memcmp
-# define memcmp(a, b, n)	__builtin_memcmp((a), (b), (n))
-#endif
-#endif
-
-unsigned long long load_byte(void *skb, unsigned long long off)
-	asm ("llvm.bpf.load.byte");
-
-unsigned long long load_half(void *skb, unsigned long long off)
-	asm ("llvm.bpf.load.half");
-
-unsigned long long load_word(void *skb, unsigned long long off)
-	asm ("llvm.bpf.load.word");
-
-#endif /* __BPF_API__ */
diff --git a/drivers/net/tap/bpf/bpf_elf.h b/drivers/net/tap/bpf/bpf_elf.h
deleted file mode 100644
index ea8a11c95c..0000000000
--- a/drivers/net/tap/bpf/bpf_elf.h
+++ /dev/null
@@ -1,53 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 or BSD-3-Clause */
-#ifndef __BPF_ELF__
-#define __BPF_ELF__
-
-#include <asm/types.h>
-
-/* Note:
- *
- * Below ELF section names and bpf_elf_map structure definition
- * are not (!) kernel ABI. It's rather a "contract" between the
- * application and the BPF loader in tc. For compatibility, the
- * section names should stay as-is. Introduction of aliases, if
- * needed, are a possibility, though.
- */
-
-/* ELF section names, etc */
-#define ELF_SECTION_LICENSE	"license"
-#define ELF_SECTION_MAPS	"maps"
-#define ELF_SECTION_PROG	"prog"
-#define ELF_SECTION_CLASSIFIER	"classifier"
-#define ELF_SECTION_ACTION	"action"
-
-#define ELF_MAX_MAPS		64
-#define ELF_MAX_LICENSE_LEN	128
-
-/* Object pinning settings */
-#define PIN_NONE		0
-#define PIN_OBJECT_NS		1
-#define PIN_GLOBAL_NS		2
-
-/* ELF map definition */
-struct bpf_elf_map {
-	__u32 type;
-	__u32 size_key;
-	__u32 size_value;
-	__u32 max_elem;
-	__u32 flags;
-	__u32 id;
-	__u32 pinning;
-	__u32 inner_id;
-	__u32 inner_idx;
-};
-
-#define BPF_ANNOTATE_KV_PAIR(name, type_key, type_val)		\
-	struct ____btf_map_##name {				\
-		type_key key;					\
-		type_val value;					\
-	};							\
-	struct ____btf_map_##name				\
-	    __attribute__ ((section(".maps." #name), used))	\
-	    ____btf_map_##name = { }
-
-#endif /* __BPF_ELF__ */
diff --git a/drivers/net/tap/bpf/bpf_extract.py b/drivers/net/tap/bpf/bpf_extract.py
deleted file mode 100644
index 73c4dafe4e..0000000000
--- a/drivers/net/tap/bpf/bpf_extract.py
+++ /dev/null
@@ -1,85 +0,0 @@
-#!/usr/bin/env python3
-# SPDX-License-Identifier: BSD-3-Clause
-# Copyright (c) 2023 Stephen Hemminger <stephen@networkplumber.org>
-
-import argparse
-import sys
-import struct
-from tempfile import TemporaryFile
-from elftools.elf.elffile import ELFFile
-
-
-def load_sections(elffile):
-    """Get sections of interest from ELF"""
-    result = []
-    parts = [("cls_q", "cls_q_insns"), ("l3_l4", "l3_l4_hash_insns")]
-    for name, tag in parts:
-        section = elffile.get_section_by_name(name)
-        if section:
-            insns = struct.iter_unpack('<BBhL', section.data())
-            result.append([tag, insns])
-    return result
-
-
-def dump_section(name, insns, out):
-    """Dump the array of BPF instructions"""
-    print(f'\nstatic struct bpf_insn {name}[] = {{', file=out)
-    for bpf in insns:
-        code = bpf[0]
-        src = bpf[1] >> 4
-        dst = bpf[1] & 0xf
-        off = bpf[2]
-        imm = bpf[3]
-        print(f'\t{{{code:#04x}, {dst:4d}, {src:4d}, {off:8d}, {imm:#010x}}},',
-              file=out)
-    print('};', file=out)
-
-
-def parse_args():
-    """Parse command line arguments"""
-    parser = argparse.ArgumentParser()
-    parser.add_argument('-s',
-                        '--source',
-                        type=str,
-                        help="original source file")
-    parser.add_argument('-o', '--out', type=str, help="output C file path")
-    parser.add_argument("file",
-                        nargs='+',
-                        help="object file path or '-' for stdin")
-    return parser.parse_args()
-
-
-def open_input(path):
-    """Open the file or stdin"""
-    if path == "-":
-        temp = TemporaryFile()
-        temp.write(sys.stdin.buffer.read())
-        return temp
-    return open(path, 'rb')
-
-
-def write_header(out, source):
-    """Write file intro header"""
-    print("/* SPDX-License-Identifier: BSD-3-Clause", file=out)
-    if source:
-        print(f' * Auto-generated from {source}', file=out)
-    print(" * This not the original source file. Do NOT edit it.", file=out)
-    print(" */\n", file=out)
-
-
-def main():
-    '''program main function'''
-    args = parse_args()
-
-    with open(args.out, 'w',
-              encoding="utf-8") if args.out else sys.stdout as out:
-        write_header(out, args.source)
-        for path in args.file:
-            elffile = ELFFile(open_input(path))
-            sections = load_sections(elffile)
-            for name, insns in sections:
-                dump_section(name, insns, out)
-
-
-if __name__ == "__main__":
-    main()
diff --git a/drivers/net/tap/bpf/meson.build b/drivers/net/tap/bpf/meson.build
new file mode 100644
index 0000000000..f2c03a19fd
--- /dev/null
+++ b/drivers/net/tap/bpf/meson.build
@@ -0,0 +1,81 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright 2024 Stephen Hemminger <stephen@networkplumber.org>
+
+enable_tap_rss = false
+
+libbpf = dependency('libbpf', required: false, method: 'pkg-config')
+if not libbpf.found()
+    message('net/tap: no RSS support missing libbpf')
+    subdir_done()
+endif
+
+# Debian installs this in /usr/sbin which is not in $PATH
+bpftool = find_program('bpftool', '/usr/sbin/bpftool', required: false, version: '>= 5.6.0')
+if not bpftool.found()
+    message('net/tap: no RSS support missing bpftool')
+    subdir_done()
+endif
+
+clang_supports_bpf = false
+clang = find_program('clang', required: false)
+if clang.found()
+    clang_supports_bpf = run_command(clang, '-target', 'bpf', '--print-supported-cpus',
+                                     check: false).returncode() == 0
+endif
+
+if not clang_supports_bpf
+    message('net/tap: no RSS support missing clang BPF')
+    subdir_done()
+endif
+
+enable_tap_rss = true
+
+libbpf_include_dir = libbpf.get_variable(pkgconfig : 'includedir')
+
+# The include files <linux/bpf.h> and others include <asm/types.h>
+# but <asm/types.h> is not defined for multi-lib environment target.
+# Work around this by using the include directory from the host build environment.
+machine_name = run_command('uname', '-m').stdout().strip()
+march_include_dir = '/usr/include/' + machine_name + '-linux-gnu'
+
+clang_flags = [
+    '-O2',
+    '-Wall',
+    '-Wextra',
+    '-target',
+    'bpf',
+    '-g',
+    '-c',
+]
+
+bpf_o_cmd = [
+    clang,
+    clang_flags,
+    '-idirafter',
+    libbpf_include_dir,
+    '-idirafter',
+    march_include_dir,
+    '@INPUT@',
+    '-o',
+    '@OUTPUT@'
+]
+
+skel_h_cmd = [
+    bpftool,
+    'gen',
+    'skeleton',
+    '@INPUT@'
+]
+
+tap_rss_o = custom_target(
+    'tap_rss.bpf.o',
+    input: 'tap_rss.c',
+    output: 'tap_rss.o',
+    command: bpf_o_cmd)
+
+tap_rss_skel_h = custom_target(
+    'tap_rss.skel.h',
+    input: tap_rss_o,
+    output: 'tap_rss.skel.h',
+    command: skel_h_cmd,
+    capture: true)
diff --git a/drivers/net/tap/bpf/tap_bpf_program.c b/drivers/net/tap/bpf/tap_bpf_program.c
deleted file mode 100644
index f05aed021c..0000000000
--- a/drivers/net/tap/bpf/tap_bpf_program.c
+++ /dev/null
@@ -1,255 +0,0 @@
-/* SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0
- * Copyright 2017 Mellanox Technologies, Ltd
- */
-
-#include <stdint.h>
-#include <stdbool.h>
-#include <sys/types.h>
-#include <sys/socket.h>
-#include <asm/types.h>
-#include <linux/in.h>
-#include <linux/if.h>
-#include <linux/if_ether.h>
-#include <linux/ip.h>
-#include <linux/ipv6.h>
-#include <linux/if_tunnel.h>
-#include <linux/filter.h>
-
-#include "bpf_api.h"
-#include "bpf_elf.h"
-#include "../tap_rss.h"
-
-/** Create IPv4 address */
-#define IPv4(a, b, c, d) ((__u32)(((a) & 0xff) << 24) | \
-		(((b) & 0xff) << 16) | \
-		(((c) & 0xff) << 8)  | \
-		((d) & 0xff))
-
-#define PORT(a, b) ((__u16)(((a) & 0xff) << 8) | \
-		((b) & 0xff))
-
-/*
- * The queue number is offset by a unique QUEUE_OFFSET, to distinguish
- * packets that have gone through this rule (skb->cb[1] != 0) from others.
- */
-#define QUEUE_OFFSET		0x7cafe800
-#define PIN_GLOBAL_NS		2
-
-#define KEY_IDX			0
-#define BPF_MAP_ID_KEY	1
-
-struct vlan_hdr {
-	__be16 proto;
-	__be16 tci;
-};
-
-struct bpf_elf_map __attribute__((section("maps"), used))
-map_keys = {
-	.type           =       BPF_MAP_TYPE_HASH,
-	.id             =       BPF_MAP_ID_KEY,
-	.size_key       =       sizeof(__u32),
-	.size_value     =       sizeof(struct rss_key),
-	.max_elem       =       256,
-	.pinning        =       PIN_GLOBAL_NS,
-};
-
-__section("cls_q") int
-match_q(struct __sk_buff *skb)
-{
-	__u32 queue = skb->cb[1];
-	/* queue is set by tap_flow_bpf_cls_q() before load */
-	volatile __u32 q = 0xdeadbeef;
-	__u32 match_queue = QUEUE_OFFSET + q;
-
-	/* printt("match_q$i() queue = %d\n", queue); */
-
-	if (queue != match_queue)
-		return TC_ACT_OK;
-
-	/* queue match */
-	skb->cb[1] = 0;
-	return TC_ACT_UNSPEC;
-}
-
-
-struct ipv4_l3_l4_tuple {
-	__u32    src_addr;
-	__u32    dst_addr;
-	__u16    dport;
-	__u16    sport;
-} __attribute__((packed));
-
-struct ipv6_l3_l4_tuple {
-	__u8        src_addr[16];
-	__u8        dst_addr[16];
-	__u16       dport;
-	__u16       sport;
-} __attribute__((packed));
-
-static const __u8 def_rss_key[TAP_RSS_HASH_KEY_SIZE] = {
-	0xd1, 0x81, 0xc6, 0x2c,
-	0xf7, 0xf4, 0xdb, 0x5b,
-	0x19, 0x83, 0xa2, 0xfc,
-	0x94, 0x3e, 0x1a, 0xdb,
-	0xd9, 0x38, 0x9e, 0x6b,
-	0xd1, 0x03, 0x9c, 0x2c,
-	0xa7, 0x44, 0x99, 0xad,
-	0x59, 0x3d, 0x56, 0xd9,
-	0xf3, 0x25, 0x3c, 0x06,
-	0x2a, 0xdc, 0x1f, 0xfc,
-};
-
-static __u32  __attribute__((always_inline))
-rte_softrss_be(const __u32 *input_tuple, const uint8_t *rss_key,
-		__u8 input_len)
-{
-	__u32 i, j, hash = 0;
-#pragma unroll
-	for (j = 0; j < input_len; j++) {
-#pragma unroll
-		for (i = 0; i < 32; i++) {
-			if (input_tuple[j] & (1U << (31 - i))) {
-				hash ^= ((const __u32 *)def_rss_key)[j] << i |
-				(__u32)((uint64_t)
-				(((const __u32 *)def_rss_key)[j + 1])
-					>> (32 - i));
-			}
-		}
-	}
-	return hash;
-}
-
-static int __attribute__((always_inline))
-rss_l3_l4(struct __sk_buff *skb)
-{
-	void *data_end = (void *)(long)skb->data_end;
-	void *data = (void *)(long)skb->data;
-	__u16 proto = (__u16)skb->protocol;
-	__u32 key_idx = 0xdeadbeef;
-	__u32 hash;
-	struct rss_key *rsskey;
-	__u64 off = ETH_HLEN;
-	int j;
-	__u8 *key = 0;
-	__u32 len;
-	__u32 queue = 0;
-	bool mf = 0;
-	__u16 frag_off = 0;
-
-	rsskey = map_lookup_elem(&map_keys, &key_idx);
-	if (!rsskey) {
-		printt("hash(): rss key is not configured\n");
-		return TC_ACT_OK;
-	}
-	key = (__u8 *)rsskey->key;
-
-	/* Get correct proto for 802.1ad */
-	if (skb->vlan_present && skb->vlan_proto == htons(ETH_P_8021AD)) {
-		if (data + ETH_ALEN * 2 + sizeof(struct vlan_hdr) +
-		    sizeof(proto) > data_end)
-			return TC_ACT_OK;
-		proto = *(__u16 *)(data + ETH_ALEN * 2 +
-				   sizeof(struct vlan_hdr));
-		off += sizeof(struct vlan_hdr);
-	}
-
-	if (proto == htons(ETH_P_IP)) {
-		if (data + off + sizeof(struct iphdr) + sizeof(__u32)
-			> data_end)
-			return TC_ACT_OK;
-
-		__u8 *src_dst_addr = data + off + offsetof(struct iphdr, saddr);
-		__u8 *frag_off_addr = data + off + offsetof(struct iphdr, frag_off);
-		__u8 *prot_addr = data + off + offsetof(struct iphdr, protocol);
-		__u8 *src_dst_port = data + off + sizeof(struct iphdr);
-		struct ipv4_l3_l4_tuple v4_tuple = {
-			.src_addr = IPv4(*(src_dst_addr + 0),
-					*(src_dst_addr + 1),
-					*(src_dst_addr + 2),
-					*(src_dst_addr + 3)),
-			.dst_addr = IPv4(*(src_dst_addr + 4),
-					*(src_dst_addr + 5),
-					*(src_dst_addr + 6),
-					*(src_dst_addr + 7)),
-			.sport = 0,
-			.dport = 0,
-		};
-		/** Fetch the L4-payer port numbers only in-case of TCP/UDP
-		 ** and also if the packet is not fragmented. Since fragmented
-		 ** chunks do not have L4 TCP/UDP header.
-		 **/
-		if (*prot_addr == IPPROTO_UDP || *prot_addr == IPPROTO_TCP) {
-			frag_off = PORT(*(frag_off_addr + 0),
-					*(frag_off_addr + 1));
-			mf = frag_off & 0x2000;
-			frag_off = frag_off & 0x1fff;
-			if (mf == 0 && frag_off == 0) {
-				v4_tuple.sport = PORT(*(src_dst_port + 0),
-						*(src_dst_port + 1));
-				v4_tuple.dport = PORT(*(src_dst_port + 2),
-						*(src_dst_port + 3));
-			}
-		}
-		__u8 input_len = sizeof(v4_tuple) / sizeof(__u32);
-		if (rsskey->hash_fields & (1 << HASH_FIELD_IPV4_L3))
-			input_len--;
-		hash = rte_softrss_be((__u32 *)&v4_tuple, key, 3);
-	} else if (proto == htons(ETH_P_IPV6)) {
-		if (data + off + sizeof(struct ipv6hdr) +
-					sizeof(__u32) > data_end)
-			return TC_ACT_OK;
-		__u8 *src_dst_addr = data + off +
-					offsetof(struct ipv6hdr, saddr);
-		__u8 *src_dst_port = data + off +
-					sizeof(struct ipv6hdr);
-		__u8 *next_hdr = data + off +
-					offsetof(struct ipv6hdr, nexthdr);
-
-		struct ipv6_l3_l4_tuple v6_tuple;
-		for (j = 0; j < 4; j++)
-			*((uint32_t *)&v6_tuple.src_addr + j) =
-				__builtin_bswap32(*((uint32_t *)
-						src_dst_addr + j));
-		for (j = 0; j < 4; j++)
-			*((uint32_t *)&v6_tuple.dst_addr + j) =
-				__builtin_bswap32(*((uint32_t *)
-						src_dst_addr + 4 + j));
-
-		/** Fetch the L4 header port-numbers only if next-header
-		 * is TCP/UDP **/
-		if (*next_hdr == IPPROTO_UDP || *next_hdr == IPPROTO_TCP) {
-			v6_tuple.sport = PORT(*(src_dst_port + 0),
-				      *(src_dst_port + 1));
-			v6_tuple.dport = PORT(*(src_dst_port + 2),
-				      *(src_dst_port + 3));
-		} else {
-			v6_tuple.sport = 0;
-			v6_tuple.dport = 0;
-		}
-
-		__u8 input_len = sizeof(v6_tuple) / sizeof(__u32);
-		if (rsskey->hash_fields & (1 << HASH_FIELD_IPV6_L3))
-			input_len--;
-		hash = rte_softrss_be((__u32 *)&v6_tuple, key, 9);
-	} else {
-		return TC_ACT_PIPE;
-	}
-
-	queue = rsskey->queues[(hash % rsskey->nb_queues) &
-				       (TAP_MAX_QUEUES - 1)];
-	skb->cb[1] = QUEUE_OFFSET + queue;
-	/* printt(">>>>> rss_l3_l4 hash=0x%x queue=%u\n", hash, queue); */
-
-	return TC_ACT_RECLASSIFY;
-}
-
-#define RSS(L)						\
-	__section(#L) int				\
-		L ## _hash(struct __sk_buff *skb)	\
-	{						\
-		return rss_ ## L (skb);			\
-	}
-
-RSS(l3_l4)
-
-BPF_LICENSE("Dual BSD/GPL");
diff --git a/drivers/net/tap/bpf/tap_rss.c b/drivers/net/tap/bpf/tap_rss.c
new file mode 100644
index 0000000000..025b831b5c
--- /dev/null
+++ b/drivers/net/tap/bpf/tap_rss.c
@@ -0,0 +1,267 @@
+/* SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0
+ * Copyright 2017 Mellanox Technologies, Ltd
+ */
+
+#include <linux/in.h>
+#include <linux/if_ether.h>
+#include <linux/ip.h>
+#include <linux/ipv6.h>
+#include <linux/pkt_cls.h>
+#include <linux/bpf.h>
+
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_endian.h>
+
+#include "../tap_rss.h"
+
+/*
+ * This map provides configuration information about flows which need BPF RSS.
+ *
+ * The hash is indexed by the skb mark.
+ */
+struct {
+	__uint(type, BPF_MAP_TYPE_HASH);
+	__uint(key_size, sizeof(__u32));
+	__uint(value_size, sizeof(struct rss_key));
+	__uint(max_entries, TAP_RSS_MAX);
+} rss_map SEC(".maps");
+
+#define IP_MF		0x2000		/** IP header Flags **/
+#define IP_OFFSET	0x1FFF		/** IP header fragment offset **/
+
+/*
+ * Compute Toeplitz hash over the input tuple.
+ * This is the same as rte_softrss_be in lib/hash,
+ * but the loop needs to be set up to match BPF restrictions.
+ */
+static __always_inline __u32
+softrss_be(const __u32 *input_tuple, __u32 input_len, const __u32 *key)
+{
+	__u32 i, j, hash = 0;
+
+#pragma unroll
+	for (j = 0; j < input_len; j++) {
+#pragma unroll
+		for (i = 0; i < 32; i++) {
+			if (input_tuple[j] & (1U << (31 - i)))
+				hash ^= key[j] << i | key[j + 1] >> (32 - i);
+		}
+	}
+	return hash;
+}
+
+/*
+ * Compute RSS hash for IPv4 packet.
+ * Returns 0 if RSS is not specified.
+ */
+static __always_inline __u32
+parse_ipv4(const struct __sk_buff *skb, __u32 hash_type, const __u32 *key)
+{
+	struct iphdr iph;
+	__u32 off = 0;
+
+	if (bpf_skb_load_bytes_relative(skb, off, &iph, sizeof(iph), BPF_HDR_START_NET))
+		return 0;	/* no IP header present */
+
+	struct {
+		__u32    src_addr;
+		__u32    dst_addr;
+		__u16    dport;
+		__u16    sport;
+	} v4_tuple = {
+		.src_addr = bpf_ntohl(iph.saddr),
+		.dst_addr = bpf_ntohl(iph.daddr),
+	};
+
+	/* If only calculating L3 hash, do it now */
+	if (hash_type & (1 << HASH_FIELD_IPV4_L3))
+		return softrss_be((__u32 *)&v4_tuple, sizeof(v4_tuple) / sizeof(__u32) - 1, key);
+
+	/* If packet is fragmented then no L4 hash is possible */
+	if ((iph.frag_off & bpf_htons(IP_MF | IP_OFFSET)) != 0)
+		return 0;
+
+	/* Do RSS on UDP or TCP protocols */
+	if (iph.protocol == IPPROTO_UDP || iph.protocol == IPPROTO_TCP) {
+		__u16 src_dst_port[2];
+
+		off += iph.ihl * 4;
+		if (bpf_skb_load_bytes_relative(skb, off, &src_dst_port, sizeof(src_dst_port),
+						BPF_HDR_START_NET))
+			return 0; /* TCP or UDP header missing */
+
+		v4_tuple.sport = bpf_ntohs(src_dst_port[0]);
+		v4_tuple.dport = bpf_ntohs(src_dst_port[1]);
+		return softrss_be((__u32 *)&v4_tuple, sizeof(v4_tuple) / sizeof(__u32), key);
+	}
+
+	/* Other protocol */
+	return 0;
+}
+
+/*
+ * Parse IPv6 extension headers, update offset and return next proto.
+ * Returns next proto on success, -1 on malformed header.
+ */
+static __always_inline int
+skip_ip6_ext(__u16 proto, const struct __sk_buff *skb, __u32 *off, int *frag)
+{
+	struct ext_hdr {
+		__u8 next_hdr;
+		__u8 len;
+	} xh;
+	unsigned int i;
+
+	*frag = 0;
+
+#define MAX_EXT_HDRS 5
+#pragma unroll
+	for (i = 0; i < MAX_EXT_HDRS; i++) {
+		switch (proto) {
+		case IPPROTO_HOPOPTS:
+		case IPPROTO_ROUTING:
+		case IPPROTO_DSTOPTS:
+			if (bpf_skb_load_bytes_relative(skb, *off, &xh, sizeof(xh),
+							BPF_HDR_START_NET))
+				return -1;
+
+			*off += (xh.len + 1) * 8;
+			proto = xh.next_hdr;
+			break;
+		case IPPROTO_FRAGMENT:
+			if (bpf_skb_load_bytes_relative(skb, *off, &xh, sizeof(xh),
+							BPF_HDR_START_NET))
+				return -1;
+
+			*off += 8;
+			proto = xh.next_hdr;
+			*frag = 1;
+			return proto; /* this is always the last ext hdr */
+		default:
+			return proto;
+		}
+	}
+
+	/* too many extension headers, give up */
+	return -1;
+}
+
+/*
+ * Compute RSS hash for IPv6 packet.
+ * Returns 0 if RSS is not specified.
+ */
+static __always_inline __u32
+parse_ipv6(const struct __sk_buff *skb, __u32 hash_type, const __u32 *key)
+{
+	struct {
+		__u32       src_addr[4];
+		__u32       dst_addr[4];
+		__u16       dport;
+		__u16       sport;
+	} v6_tuple = { };
+	struct ipv6hdr ip6h;
+	__u32 off = 0, j;
+	int proto, frag;
+
+	if (bpf_skb_load_bytes_relative(skb, off, &ip6h, sizeof(ip6h), BPF_HDR_START_NET))
+		return 0;	/* missing IPv6 header */
+
+#pragma unroll
+	for (j = 0; j < 4; j++) {
+		v6_tuple.src_addr[j] = bpf_ntohl(ip6h.saddr.in6_u.u6_addr32[j]);
+		v6_tuple.dst_addr[j] = bpf_ntohl(ip6h.daddr.in6_u.u6_addr32[j]);
+	}
+
+	/* If only doing L3 hash, do it now */
+	if (hash_type & (1 << HASH_FIELD_IPV6_L3))
+		return softrss_be((__u32 *)&v6_tuple, sizeof(v6_tuple) / sizeof(__u32) - 1, key);
+
+	/* Skip extension headers if present */
+	off += sizeof(ip6h);
+	proto = skip_ip6_ext(ip6h.nexthdr, skb, &off, &frag);
+	if (proto < 0)
+		return 0;
+
+	/* If packet is a fragment then no L4 hash is possible */
+	if (frag)
+		return 0;
+
+	/* Do RSS on UDP or TCP */
+	if (proto == IPPROTO_UDP || proto == IPPROTO_TCP) {
+		__u16 src_dst_port[2];
+
+		if (bpf_skb_load_bytes_relative(skb, off, &src_dst_port, sizeof(src_dst_port),
+						BPF_HDR_START_NET))
+			return 0;
+
+		v6_tuple.sport = bpf_ntohs(src_dst_port[0]);
+		v6_tuple.dport = bpf_ntohs(src_dst_port[1]);
+
+		return softrss_be((__u32 *)&v6_tuple, sizeof(v6_tuple) / sizeof(__u32), key);
+	}
+
+	return 0;
+}
+
+/*
+ * Scale value into the range [0, n).
+ * Assumes val is large (i.e. hash covers whole u32 range)
+ */
+static __always_inline __u32
+reciprocal_scale(__u32 val, __u32 n)
+{
+	return (__u32)(((__u64)val * n) >> 32);
+}
+
+/*
+ * When this BPF program is run by tc from the filter classifier,
+ * it is able to read skb metadata and packet data.
+ *
+ * For packets where RSS is not possible, just return TC_ACT_OK.
+ * When RSS is desired, change the skb->queue_mapping and return TC_ACT_PIPE
+ * to continue processing.
+ *
+ * This should be BPF_PROG_TYPE_SCHED_ACT so section needs to be "action"
+ */
+SEC("action") int
+rss_flow_action(struct __sk_buff *skb)
+{
+	const struct rss_key *rsskey;
+	const __u32 *key;
+	__be16 proto;
+	__u32 mark;
+	__u32 hash;
+	__u16 queue;
+
+	__builtin_preserve_access_index(({
+		mark = skb->mark;
+		proto = skb->protocol;
+	}));
+
+	/* Lookup RSS configuration for that BPF class */
+	rsskey = bpf_map_lookup_elem(&rss_map, &mark);
+	if (rsskey == NULL)
+		return TC_ACT_OK;
+
+	key = (const __u32 *)rsskey->key;
+
+	if (proto == bpf_htons(ETH_P_IP))
+		hash = parse_ipv4(skb, rsskey->hash_fields, key);
+	else if (proto == bpf_htons(ETH_P_IPV6))
+		hash = parse_ipv6(skb, rsskey->hash_fields, key);
+	else
+		hash = 0;
+
+	if (hash == 0)
+		return TC_ACT_OK;
+
+	/* Fold hash to the number of queues configured */
+	queue = reciprocal_scale(hash, rsskey->nb_queues);
+
+	__builtin_preserve_access_index(({
+		skb->queue_mapping = queue;
+	}));
+	return TC_ACT_PIPE;
+}
+
+char _license[] SEC("license") = "Dual BSD/GPL";
-- 
2.43.0


^ permalink raw reply	[relevance 2%]

* [PATCH v15 02/11] net/tap: do not duplicate fd's
  @ 2024-05-21 20:12  2%   ` Stephen Hemminger
  2024-05-21 20:12  2%   ` [PATCH v15 06/11] net/tap: rewrite the RSS BPF program Stephen Hemminger
  1 sibling, 0 replies; 200+ results
From: Stephen Hemminger @ 2024-05-21 20:12 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

The TAP device can use the same file descriptor for both rx and tx queues,
which reduces the number of fd's required.

MP process support passes file descriptors from the primary
to the secondary process; but because of the limit on the number of
fd's passed, RTE_MP_MAX_FD_NUM (8), the TAP device was restricted
to only 4 queues if using secondary.
This allows up to 8 queues (versus 4).

The restriction on max fd's should be changed in EAL in the
future, but that will break ABI compatibility.
The maximum Linux supports is SCM_MAX_FD (253).
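
The queue limits quoted above follow from simple arithmetic on the fd
budget of one IPC message. A minimal sketch, assuming the value 8 stated
above for RTE_MP_MAX_FD_NUM; MP_MAX_FD_NUM and max_queues() are
hypothetical names used only for illustration and are not part of the
driver:

```c
#include <stdio.h>

/* Value quoted in the commit message for RTE_MP_MAX_FD_NUM (assumption). */
#define MP_MAX_FD_NUM 8

/*
 * Hypothetical helper: number of queues whose file descriptors fit
 * in a single IPC message, given how many fds each queue needs.
 */
static unsigned int
max_queues(unsigned int fds_per_queue)
{
	return MP_MAX_FD_NUM / fds_per_queue;
}
```

With separate rx and tx descriptors (two per queue) max_queues(2) gives 4;
with one shared descriptor per queue max_queues(1) gives 8.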

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 doc/guides/rel_notes/release_24_07.rst |   4 +
 drivers/net/tap/rte_eth_tap.c          | 192 ++++++++++---------------
 drivers/net/tap/rte_eth_tap.h          |   3 +-
 drivers/net/tap/tap_flow.c             |   3 +-
 drivers/net/tap/tap_intr.c             |   7 +-
 5 files changed, 89 insertions(+), 120 deletions(-)

diff --git a/doc/guides/rel_notes/release_24_07.rst b/doc/guides/rel_notes/release_24_07.rst
index a69f24cf99..a6295359b1 100644
--- a/doc/guides/rel_notes/release_24_07.rst
+++ b/doc/guides/rel_notes/release_24_07.rst
@@ -55,6 +55,10 @@ New Features
      Also, make sure to start the actual text at the margin.
      =======================================================
 
+* **Update Tap PMD driver.**
+
+  * Updated to support up to 8 queues when used by secondary process.
+
 
 Removed Items
 -------------
diff --git a/drivers/net/tap/rte_eth_tap.c b/drivers/net/tap/rte_eth_tap.c
index 69d9da695b..b84fc01856 100644
--- a/drivers/net/tap/rte_eth_tap.c
+++ b/drivers/net/tap/rte_eth_tap.c
@@ -124,8 +124,7 @@ enum ioctl_mode {
 /* Message header to synchronize queues via IPC */
 struct ipc_queues {
 	char port_name[RTE_DEV_NAME_MAX_LEN];
-	int rxq_count;
-	int txq_count;
+	int q_count;
 	/*
 	 * The file descriptors are in the dedicated part
 	 * of the Unix message to be translated by the kernel.
@@ -446,7 +445,7 @@ pmd_rx_burst(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 		uint16_t data_off = rte_pktmbuf_headroom(mbuf);
 		int len;
 
-		len = readv(process_private->rxq_fds[rxq->queue_id],
+		len = readv(process_private->fds[rxq->queue_id],
 			*rxq->iovecs,
 			1 + (rxq->rxmode->offloads & RTE_ETH_RX_OFFLOAD_SCATTER ?
 			     rxq->nb_rx_desc : 1));
@@ -643,7 +642,7 @@ tap_write_mbufs(struct tx_queue *txq, uint16_t num_mbufs,
 		}
 
 		/* copy the tx frame data */
-		n = writev(process_private->txq_fds[txq->queue_id], iovecs, k);
+		n = writev(process_private->fds[txq->queue_id], iovecs, k);
 		if (n <= 0)
 			return -1;
 
@@ -851,7 +850,6 @@ tap_mp_req_on_rxtx(struct rte_eth_dev *dev)
 	struct rte_mp_msg msg;
 	struct ipc_queues *request_param = (struct ipc_queues *)msg.param;
 	int err;
-	int fd_iterator = 0;
 	struct pmd_process_private *process_private = dev->process_private;
 	int i;
 
@@ -859,16 +857,13 @@ tap_mp_req_on_rxtx(struct rte_eth_dev *dev)
 	strlcpy(msg.name, TAP_MP_REQ_START_RXTX, sizeof(msg.name));
 	strlcpy(request_param->port_name, dev->data->name, sizeof(request_param->port_name));
 	msg.len_param = sizeof(*request_param);
-	for (i = 0; i < dev->data->nb_tx_queues; i++) {
-		msg.fds[fd_iterator++] = process_private->txq_fds[i];
-		msg.num_fds++;
-		request_param->txq_count++;
-	}
-	for (i = 0; i < dev->data->nb_rx_queues; i++) {
-		msg.fds[fd_iterator++] = process_private->rxq_fds[i];
-		msg.num_fds++;
-		request_param->rxq_count++;
-	}
+
+	/* rx and tx share file descriptors and nb_tx_queues == nb_rx_queues */
+	for (i = 0; i < dev->data->nb_rx_queues; i++)
+		msg.fds[i] = process_private->fds[i];
+
+	request_param->q_count = dev->data->nb_rx_queues;
+	msg.num_fds = dev->data->nb_rx_queues;
 
 	err = rte_mp_sendmsg(&msg);
 	if (err < 0) {
@@ -910,8 +905,6 @@ tap_mp_req_start_rxtx(const struct rte_mp_msg *request, __rte_unused const void
 	struct rte_eth_dev *dev;
 	const struct ipc_queues *request_param =
 		(const struct ipc_queues *)request->param;
-	int fd_iterator;
-	int queue;
 	struct pmd_process_private *process_private;
 
 	dev = rte_eth_dev_get_by_name(request_param->port_name);
@@ -920,14 +913,13 @@ tap_mp_req_start_rxtx(const struct rte_mp_msg *request, __rte_unused const void
 			request_param->port_name);
 		return -1;
 	}
+
 	process_private = dev->process_private;
-	fd_iterator = 0;
-	TAP_LOG(DEBUG, "tap_attach rx_q:%d tx_q:%d\n", request_param->rxq_count,
-		request_param->txq_count);
-	for (queue = 0; queue < request_param->txq_count; queue++)
-		process_private->txq_fds[queue] = request->fds[fd_iterator++];
-	for (queue = 0; queue < request_param->rxq_count; queue++)
-		process_private->rxq_fds[queue] = request->fds[fd_iterator++];
+	TAP_LOG(DEBUG, "tap_attach q:%d\n", request_param->q_count);
+
+	for (int q = 0; q < request_param->q_count; q++)
+		process_private->fds[q] = request->fds[q];
+
 
 	return 0;
 }
@@ -1115,13 +1107,21 @@ tap_stats_reset(struct rte_eth_dev *dev)
 	return 0;
 }
 
+static void
+tap_queue_close(struct pmd_process_private *process_private, uint16_t qid)
+{
+	if (process_private->fds[qid] != -1) {
+		close(process_private->fds[qid]);
+		process_private->fds[qid] = -1;
+	}
+}
+
 static int
 tap_dev_close(struct rte_eth_dev *dev)
 {
 	int i;
 	struct pmd_internals *internals = dev->data->dev_private;
 	struct pmd_process_private *process_private = dev->process_private;
-	struct rx_queue *rxq;
 
 	if (rte_eal_process_type() != RTE_PROC_PRIMARY) {
 		rte_free(dev->process_private);
@@ -1141,19 +1141,14 @@ tap_dev_close(struct rte_eth_dev *dev)
 	}
 
 	for (i = 0; i < RTE_PMD_TAP_MAX_QUEUES; i++) {
-		if (process_private->rxq_fds[i] != -1) {
-			rxq = &internals->rxq[i];
-			close(process_private->rxq_fds[i]);
-			process_private->rxq_fds[i] = -1;
-			tap_rxq_pool_free(rxq->pool);
-			rte_free(rxq->iovecs);
-			rxq->pool = NULL;
-			rxq->iovecs = NULL;
-		}
-		if (process_private->txq_fds[i] != -1) {
-			close(process_private->txq_fds[i]);
-			process_private->txq_fds[i] = -1;
-		}
+		struct rx_queue *rxq = &internals->rxq[i];
+
+		tap_queue_close(process_private, i);
+
+		tap_rxq_pool_free(rxq->pool);
+		rte_free(rxq->iovecs);
+		rxq->pool = NULL;
+		rxq->iovecs = NULL;
 	}
 
 	if (internals->remote_if_index) {
@@ -1206,15 +1201,16 @@ tap_rx_queue_release(struct rte_eth_dev *dev, uint16_t qid)
 
 	if (!rxq)
 		return;
+
 	process_private = rte_eth_devices[rxq->in_port].process_private;
-	if (process_private->rxq_fds[rxq->queue_id] != -1) {
-		close(process_private->rxq_fds[rxq->queue_id]);
-		process_private->rxq_fds[rxq->queue_id] = -1;
-		tap_rxq_pool_free(rxq->pool);
-		rte_free(rxq->iovecs);
-		rxq->pool = NULL;
-		rxq->iovecs = NULL;
-	}
+
+	tap_rxq_pool_free(rxq->pool);
+	rte_free(rxq->iovecs);
+	rxq->pool = NULL;
+	rxq->iovecs = NULL;
+
+	if (dev->data->tx_queues[qid] == NULL)
+		tap_queue_close(process_private, qid);
 }
 
 static void
@@ -1225,12 +1221,10 @@ tap_tx_queue_release(struct rte_eth_dev *dev, uint16_t qid)
 
 	if (!txq)
 		return;
-	process_private = rte_eth_devices[txq->out_port].process_private;
 
-	if (process_private->txq_fds[txq->queue_id] != -1) {
-		close(process_private->txq_fds[txq->queue_id]);
-		process_private->txq_fds[txq->queue_id] = -1;
-	}
+	process_private = rte_eth_devices[txq->out_port].process_private;
+	if (dev->data->rx_queues[qid] == NULL)
+		tap_queue_close(process_private, qid);
 }
 
 static int
@@ -1482,52 +1476,31 @@ tap_setup_queue(struct rte_eth_dev *dev,
 		uint16_t qid,
 		int is_rx)
 {
-	int ret;
-	int *fd;
-	int *other_fd;
-	const char *dir;
+	int fd, ret;
 	struct pmd_internals *pmd = dev->data->dev_private;
 	struct pmd_process_private *process_private = dev->process_private;
 	struct rx_queue *rx = &internals->rxq[qid];
 	struct tx_queue *tx = &internals->txq[qid];
-	struct rte_gso_ctx *gso_ctx;
+	struct rte_gso_ctx *gso_ctx = is_rx ? NULL : &tx->gso_ctx;
+	const char *dir = is_rx ? "rx" : "tx";
 
-	if (is_rx) {
-		fd = &process_private->rxq_fds[qid];
-		other_fd = &process_private->txq_fds[qid];
-		dir = "rx";
-		gso_ctx = NULL;
-	} else {
-		fd = &process_private->txq_fds[qid];
-		other_fd = &process_private->rxq_fds[qid];
-		dir = "tx";
-		gso_ctx = &tx->gso_ctx;
-	}
-	if (*fd != -1) {
+	fd = process_private->fds[qid];
+	if (fd != -1) {
 		/* fd for this queue already exists */
 		TAP_LOG(DEBUG, "%s: fd %d for %s queue qid %d exists",
-			pmd->name, *fd, dir, qid);
+			pmd->name, fd, dir, qid);
 		gso_ctx = NULL;
-	} else if (*other_fd != -1) {
-		/* Only other_fd exists. dup it */
-		*fd = dup(*other_fd);
-		if (*fd < 0) {
-			*fd = -1;
-			TAP_LOG(ERR, "%s: dup() failed.", pmd->name);
-			return -1;
-		}
-		TAP_LOG(DEBUG, "%s: dup fd %d for %s queue qid %d (%d)",
-			pmd->name, *other_fd, dir, qid, *fd);
 	} else {
-		/* Both RX and TX fds do not exist (equal -1). Create fd */
-		*fd = tun_alloc(pmd, 0, 0);
-		if (*fd < 0) {
-			*fd = -1; /* restore original value */
+		fd = tun_alloc(pmd, 0, 0);
+		if (fd < 0) {
 			TAP_LOG(ERR, "%s: tun_alloc() failed.", pmd->name);
 			return -1;
 		}
+
 		TAP_LOG(DEBUG, "%s: add %s queue for qid %d fd %d",
-			pmd->name, dir, qid, *fd);
+			pmd->name, dir, qid, fd);
+
+		process_private->fds[qid] = fd;
 	}
 
 	tx->mtu = &dev->data->mtu;
@@ -1540,7 +1513,7 @@ tap_setup_queue(struct rte_eth_dev *dev,
 
 	tx->type = pmd->type;
 
-	return *fd;
+	return fd;
 }
 
 static int
@@ -1620,7 +1593,7 @@ tap_rx_queue_setup(struct rte_eth_dev *dev,
 
 	TAP_LOG(DEBUG, "  RX TUNTAP device name %s, qid %d on fd %d",
 		internals->name, rx_queue_id,
-		process_private->rxq_fds[rx_queue_id]);
+		process_private->fds[rx_queue_id]);
 
 	return 0;
 
@@ -1664,7 +1637,7 @@ tap_tx_queue_setup(struct rte_eth_dev *dev,
 	TAP_LOG(DEBUG,
 		"  TX TUNTAP device name %s, qid %d on fd %d csum %s",
 		internals->name, tx_queue_id,
-		process_private->txq_fds[tx_queue_id],
+		process_private->fds[tx_queue_id],
 		txq->csum ? "on" : "off");
 
 	return 0;
@@ -2001,10 +1974,9 @@ eth_dev_tap_create(struct rte_vdev_device *vdev, const char *tap_name,
 	dev->intr_handle = pmd->intr_handle;
 
 	/* Presetup the fds to -1 as being not valid */
-	for (i = 0; i < RTE_PMD_TAP_MAX_QUEUES; i++) {
-		process_private->rxq_fds[i] = -1;
-		process_private->txq_fds[i] = -1;
-	}
+	for (i = 0; i < RTE_PMD_TAP_MAX_QUEUES; i++)
+		process_private->fds[i] = -1;
+
 
 	if (pmd->type == ETH_TUNTAP_TYPE_TAP) {
 		if (rte_is_zero_ether_addr(mac_addr))
@@ -2332,7 +2304,6 @@ tap_mp_attach_queues(const char *port_name, struct rte_eth_dev *dev)
 	struct ipc_queues *request_param = (struct ipc_queues *)request.param;
 	struct ipc_queues *reply_param;
 	struct pmd_process_private *process_private = dev->process_private;
-	int queue, fd_iterator;
 
 	/* Prepare the request */
 	memset(&request, 0, sizeof(request));
@@ -2352,18 +2323,17 @@ tap_mp_attach_queues(const char *port_name, struct rte_eth_dev *dev)
 	TAP_LOG(DEBUG, "Received IPC reply for %s", reply_param->port_name);
 
 	/* Attach the queues from received file descriptors */
-	if (reply_param->rxq_count + reply_param->txq_count != reply->num_fds) {
+	if (reply_param->q_count != reply->num_fds) {
 		TAP_LOG(ERR, "Unexpected number of fds received");
 		return -1;
 	}
 
-	dev->data->nb_rx_queues = reply_param->rxq_count;
-	dev->data->nb_tx_queues = reply_param->txq_count;
-	fd_iterator = 0;
-	for (queue = 0; queue < reply_param->rxq_count; queue++)
-		process_private->rxq_fds[queue] = reply->fds[fd_iterator++];
-	for (queue = 0; queue < reply_param->txq_count; queue++)
-		process_private->txq_fds[queue] = reply->fds[fd_iterator++];
+	dev->data->nb_rx_queues = reply_param->q_count;
+	dev->data->nb_tx_queues = reply_param->q_count;
+
+	for (int q = 0; q < reply_param->q_count; q++)
+		process_private->fds[q] = reply->fds[q];
+
 	free(reply);
 	return 0;
 }
@@ -2393,25 +2363,19 @@ tap_mp_sync_queues(const struct rte_mp_msg *request, const void *peer)
 
 	/* Fill file descriptors for all queues */
 	reply.num_fds = 0;
-	reply_param->rxq_count = 0;
-	if (dev->data->nb_rx_queues + dev->data->nb_tx_queues >
-			RTE_MP_MAX_FD_NUM){
-		TAP_LOG(ERR, "Number of rx/tx queues exceeds max number of fds");
+	reply_param->q_count = 0;
+
+	RTE_ASSERT(dev->data->nb_rx_queues == dev->data->nb_tx_queues);
+	if (dev->data->nb_rx_queues > RTE_MP_MAX_FD_NUM) {
+		TAP_LOG(ERR, "Number of rx/tx queues %u exceeds max number of fds %u",
+			dev->data->nb_rx_queues, RTE_MP_MAX_FD_NUM);
 		return -1;
 	}
 
 	for (queue = 0; queue < dev->data->nb_rx_queues; queue++) {
-		reply.fds[reply.num_fds++] = process_private->rxq_fds[queue];
-		reply_param->rxq_count++;
-	}
-	RTE_ASSERT(reply_param->rxq_count == dev->data->nb_rx_queues);
-
-	reply_param->txq_count = 0;
-	for (queue = 0; queue < dev->data->nb_tx_queues; queue++) {
-		reply.fds[reply.num_fds++] = process_private->txq_fds[queue];
-		reply_param->txq_count++;
+		reply.fds[reply.num_fds++] = process_private->fds[queue];
+		reply_param->q_count++;
 	}
-	RTE_ASSERT(reply_param->txq_count == dev->data->nb_tx_queues);
 
 	/* Send reply */
 	strlcpy(reply.name, request->name, sizeof(reply.name));
diff --git a/drivers/net/tap/rte_eth_tap.h b/drivers/net/tap/rte_eth_tap.h
index 5ac93f93e9..dc8201020b 100644
--- a/drivers/net/tap/rte_eth_tap.h
+++ b/drivers/net/tap/rte_eth_tap.h
@@ -96,8 +96,7 @@ struct pmd_internals {
 };
 
 struct pmd_process_private {
-	int rxq_fds[RTE_PMD_TAP_MAX_QUEUES];
-	int txq_fds[RTE_PMD_TAP_MAX_QUEUES];
+	int fds[RTE_PMD_TAP_MAX_QUEUES];
 };
 
 /* tap_intr.c */
diff --git a/drivers/net/tap/tap_flow.c b/drivers/net/tap/tap_flow.c
index 79cd6a12ca..a78fd50cd4 100644
--- a/drivers/net/tap/tap_flow.c
+++ b/drivers/net/tap/tap_flow.c
@@ -1595,8 +1595,9 @@ tap_flow_isolate(struct rte_eth_dev *dev,
 	 * If netdevice is there, setup appropriate flow rules immediately.
 	 * Otherwise it will be set when bringing up the netdevice (tun_alloc).
 	 */
-	if (process_private->rxq_fds[0] == -1)
+	if (process_private->fds[0] == -1)
 		return 0;
+
 	if (set) {
 		struct rte_flow *remote_flow;
 
diff --git a/drivers/net/tap/tap_intr.c b/drivers/net/tap/tap_intr.c
index a9097def1a..1908f71f97 100644
--- a/drivers/net/tap/tap_intr.c
+++ b/drivers/net/tap/tap_intr.c
@@ -68,9 +68,11 @@ tap_rx_intr_vec_install(struct rte_eth_dev *dev)
 	}
 	for (i = 0; i < n; i++) {
 		struct rx_queue *rxq = pmd->dev->data->rx_queues[i];
+		int fd = process_private->fds[i];
 
 		/* Skip queues that cannot request interrupts. */
-		if (!rxq || process_private->rxq_fds[i] == -1) {
+		if (!rxq || fd == -1) {
+			/* Use invalid intr_vec[] index to disable entry. */
 			/* Use invalid intr_vec[] index to disable entry. */
 			if (rte_intr_vec_list_index_set(intr_handle, i,
 			RTE_INTR_VEC_RXTX_OFFSET + RTE_MAX_RXTX_INTR_VEC_ID))
@@ -80,8 +82,7 @@ tap_rx_intr_vec_install(struct rte_eth_dev *dev)
 		if (rte_intr_vec_list_index_set(intr_handle, i,
 					RTE_INTR_VEC_RXTX_OFFSET + count))
 			return -rte_errno;
-		if (rte_intr_efds_index_set(intr_handle, count,
-						   process_private->rxq_fds[i]))
+		if (rte_intr_efds_index_set(intr_handle, count, fd))
 			return -rte_errno;
 		count++;
 	}
-- 
2.43.0


^ permalink raw reply	[relevance 2%]

* [PATCH v15 06/11] net/tap: rewrite the RSS BPF program
    2024-05-21 20:12  2%   ` [PATCH v15 02/11] net/tap: do not duplicate fd's Stephen Hemminger
@ 2024-05-21 20:12  2%   ` Stephen Hemminger
  1 sibling, 0 replies; 200+ results
From: Stephen Hemminger @ 2024-05-21 20:12 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

Rewrite of the BPF program used to do queue based RSS.

Important changes:
	- uses the newer BTF-based BPF map format
	- accepts key as parameter rather than constant default
	- can do L3 or L4 hashing
	- supports IPv4 options
	- supports IPv6 extension headers
	- restructured for readability
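
For reviewers who want to check the extension header handling outside the
kernel, the walk can be approximated in user space. This is only a sketch
modelled on skip_ip6_ext() in this patch: it reads from a plain byte buffer
instead of calling bpf_skb_load_bytes_relative(), and skip_ext_hdrs() and
the NH_* names are hypothetical. The numeric values are the standard IPv6
next-header numbers.

```c
#include <stddef.h>
#include <stdint.h>

/* Standard IPv6 next-header numbers; the NH_* names are illustrative. */
enum {
	NH_HOPOPTS = 0,
	NH_TCP = 6,
	NH_UDP = 17,
	NH_ROUTING = 43,
	NH_FRAGMENT = 44,
	NH_DSTOPTS = 60,
};

#define MAX_EXT_HDRS 5

/*
 * Hypothetical user-space helper: walk extension headers in buf starting
 * at *off. Returns the upper-layer protocol, or -1 if the chain is
 * truncated or longer than MAX_EXT_HDRS. *frag is set to 1 when a
 * fragment header is seen.
 */
static int
skip_ext_hdrs(int proto, const uint8_t *buf, size_t len, size_t *off, int *frag)
{
	uint8_t next, hdr_len;

	*frag = 0;
	for (int i = 0; i < MAX_EXT_HDRS; i++) {
		switch (proto) {
		case NH_HOPOPTS:
		case NH_ROUTING:
		case NH_DSTOPTS:
			if (*off + 2 > len)
				return -1;
			next = buf[*off];
			hdr_len = buf[*off + 1];
			/* length field counts 8-byte units beyond the first */
			*off += ((size_t)hdr_len + 1) * 8;
			proto = next;
			break;
		case NH_FRAGMENT:
			if (*off + 2 > len)
				return -1;
			next = buf[*off];
			*off += 8;	/* fragment header has a fixed size */
			*frag = 1;
			return next;
		default:
			return proto;
		}
	}
	/* too many extension headers */
	return -1;
}
```

As in the BPF version, a fragment header ends the walk because no L4 hash
is possible on a fragment.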

The usage of BPF is different as well:
	- the incoming configuration is looked up based on
	  class parameters rather than patching the BPF code.
	- the resulting queue is placed in skb by using skb mark
	  rather than requiring a second pass through a classifier step.
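
The queue itself is chosen by folding the 32-bit hash onto the number of
configured queues. A small user-space sketch of that multiply-and-shift
step, mirroring reciprocal_scale() in tap_rss.c; the name scale_to_queue()
is hypothetical and the fixed-width types replace the kernel __u32/__u64:

```c
#include <stdint.h>

/*
 * Map a 32-bit hash onto [0, nb_queues) with a multiply and a shift,
 * avoiding a division. Works best when the hash covers the whole
 * 32-bit range.
 */
static uint32_t
scale_to_queue(uint32_t hash, uint32_t nb_queues)
{
	return (uint32_t)(((uint64_t)hash * nb_queues) >> 32);
}
```

Unlike a modulo, this uses the high-order bits of the hash, so small hash
values always land on queue 0.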

Note: This version only works with a later patch to enable it on
the DPDK driver side. It is submitted as an incremental patch
to allow for easier review. Bisection still works because
the old instructions are still present for now.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 .gitignore                            |   3 -
 drivers/net/tap/bpf/Makefile          |  19 --
 drivers/net/tap/bpf/README            |  49 +++++
 drivers/net/tap/bpf/bpf_api.h         | 276 --------------------------
 drivers/net/tap/bpf/bpf_elf.h         |  53 -----
 drivers/net/tap/bpf/bpf_extract.py    |  85 --------
 drivers/net/tap/bpf/meson.build       |  81 ++++++++
 drivers/net/tap/bpf/tap_bpf_program.c | 255 ------------------------
 drivers/net/tap/bpf/tap_rss.c         | 267 +++++++++++++++++++++++++
 9 files changed, 397 insertions(+), 691 deletions(-)
 delete mode 100644 drivers/net/tap/bpf/Makefile
 create mode 100644 drivers/net/tap/bpf/README
 delete mode 100644 drivers/net/tap/bpf/bpf_api.h
 delete mode 100644 drivers/net/tap/bpf/bpf_elf.h
 delete mode 100644 drivers/net/tap/bpf/bpf_extract.py
 create mode 100644 drivers/net/tap/bpf/meson.build
 delete mode 100644 drivers/net/tap/bpf/tap_bpf_program.c
 create mode 100644 drivers/net/tap/bpf/tap_rss.c

diff --git a/.gitignore b/.gitignore
index 3f444dcace..01a47a7606 100644
--- a/.gitignore
+++ b/.gitignore
@@ -36,9 +36,6 @@ TAGS
 # ignore python bytecode files
 *.pyc
 
-# ignore BPF programs
-drivers/net/tap/bpf/tap_bpf_program.o
-
 # DTS results
 dts/output
 
diff --git a/drivers/net/tap/bpf/Makefile b/drivers/net/tap/bpf/Makefile
deleted file mode 100644
index 9efeeb1bc7..0000000000
--- a/drivers/net/tap/bpf/Makefile
+++ /dev/null
@@ -1,19 +0,0 @@
-# SPDX-License-Identifier: BSD-3-Clause
-# This file is not built as part of normal DPDK build.
-# It is used to generate the eBPF code for TAP RSS.
-
-CLANG=clang
-CLANG_OPTS=-O2
-TARGET=../tap_bpf_insns.h
-
-all: $(TARGET)
-
-clean:
-	rm tap_bpf_program.o $(TARGET)
-
-tap_bpf_program.o: tap_bpf_program.c
-	$(CLANG) $(CLANG_OPTS) -emit-llvm -c $< -o - | \
-	llc -march=bpf -filetype=obj -o $@
-
-$(TARGET): tap_bpf_program.o
-	python3 bpf_extract.py -stap_bpf_program.c -o $@ $<
diff --git a/drivers/net/tap/bpf/README b/drivers/net/tap/bpf/README
new file mode 100644
index 0000000000..6d323d2051
--- /dev/null
+++ b/drivers/net/tap/bpf/README
@@ -0,0 +1,49 @@
+This is the BPF program used to implement Receive Side Scaling (RSS)
+across multiple queues if required by a flow action. The program is
+loaded into the kernel when the first RSS flow rule is created and is never unloaded.
+
+When flow rules are used with the TAP device, packets are first handled by the
+ingress queue discipline that then runs a series of classifier filter rules.
+The first stage is the flow based classifier (flower); for RSS queue
+action the second stage is the kernel skbedit action which sets
+the skb mark to a key based on the flow id; the final stage
+is this BPF program which then maps flow id and packet header
+into a queue id.
+
+This version is built with the BPF Compile Once - Run Everywhere (CO-RE)
+framework and uses libbpf and bpftool.
+
+Limitations
+-----------
+- requires libbpf to run
+
+- rebuilding the BPF requires the clang compiler with bpf available
+  as a target architecture and bpftool to convert object to headers.
+
+  Some older versions of Ubuntu do not have a working bpftool package.
+
+- only standard Toeplitz hash with standard 40 byte key is supported.
+
+- the number of flow rules using RSS is limited to 32.
+
+Building
+--------
+During the DPDK build process the meson build file checks that
+libbpf, bpftool, and clang are available. If everything works then
+BPF RSS is enabled.
+
+The steps are:
+
+1. Uses clang to compile tap_rss.c to produce tap_rss.bpf.o
+
+2. Uses bpftool to generate a skeleton header file tap_rss.skel.h
+   from tap_rss.bpf.o. This header contains wrapper functions for
+   managing the BPF and the actual BPF code as a large byte array.
+
+3. The header file is included in tap_flow.c so that it can load
+   the BPF code (via libbpf).
+
+References
+----------
+BPF and XDP reference guide
+https://docs.cilium.io/en/latest/bpf/progtypes/
diff --git a/drivers/net/tap/bpf/bpf_api.h b/drivers/net/tap/bpf/bpf_api.h
deleted file mode 100644
index 4cd25fa593..0000000000
--- a/drivers/net/tap/bpf/bpf_api.h
+++ /dev/null
@@ -1,276 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 or BSD-3-Clause */
-
-#ifndef __BPF_API__
-#define __BPF_API__
-
-/* Note:
- *
- * This file can be included into eBPF kernel programs. It contains
- * a couple of useful helper functions, map/section ABI (bpf_elf.h),
- * misc macros and some eBPF specific LLVM built-ins.
- */
-
-#include <stdint.h>
-
-#include <linux/pkt_cls.h>
-#include <linux/bpf.h>
-#include <linux/filter.h>
-
-#include <asm/byteorder.h>
-
-#include "bpf_elf.h"
-
-/** libbpf pin type. */
-enum libbpf_pin_type {
-	LIBBPF_PIN_NONE,
-	/* PIN_BY_NAME: pin maps by name (in /sys/fs/bpf by default) */
-	LIBBPF_PIN_BY_NAME,
-};
-
-/** Type helper macros. */
-
-#define __uint(name, val) int (*name)[val]
-#define __type(name, val) typeof(val) *name
-#define __array(name, val) typeof(val) *name[]
-
-/** Misc macros. */
-
-#ifndef __stringify
-# define __stringify(X)		#X
-#endif
-
-#ifndef __maybe_unused
-# define __maybe_unused		__attribute__((__unused__))
-#endif
-
-#ifndef offsetof
-# define offsetof(TYPE, MEMBER)	__builtin_offsetof(TYPE, MEMBER)
-#endif
-
-#ifndef likely
-# define likely(X)		__builtin_expect(!!(X), 1)
-#endif
-
-#ifndef unlikely
-# define unlikely(X)		__builtin_expect(!!(X), 0)
-#endif
-
-#ifndef htons
-# define htons(X)		__constant_htons((X))
-#endif
-
-#ifndef ntohs
-# define ntohs(X)		__constant_ntohs((X))
-#endif
-
-#ifndef htonl
-# define htonl(X)		__constant_htonl((X))
-#endif
-
-#ifndef ntohl
-# define ntohl(X)		__constant_ntohl((X))
-#endif
-
-#ifndef __inline__
-# define __inline__		__attribute__((always_inline))
-#endif
-
-/** Section helper macros. */
-
-#ifndef __section
-# define __section(NAME)						\
-	__attribute__((section(NAME), used))
-#endif
-
-#ifndef __section_tail
-# define __section_tail(ID, KEY)					\
-	__section(__stringify(ID) "/" __stringify(KEY))
-#endif
-
-#ifndef __section_xdp_entry
-# define __section_xdp_entry						\
-	__section(ELF_SECTION_PROG)
-#endif
-
-#ifndef __section_cls_entry
-# define __section_cls_entry						\
-	__section(ELF_SECTION_CLASSIFIER)
-#endif
-
-#ifndef __section_act_entry
-# define __section_act_entry						\
-	__section(ELF_SECTION_ACTION)
-#endif
-
-#ifndef __section_lwt_entry
-# define __section_lwt_entry						\
-	__section(ELF_SECTION_PROG)
-#endif
-
-#ifndef __section_license
-# define __section_license						\
-	__section(ELF_SECTION_LICENSE)
-#endif
-
-#ifndef __section_maps
-# define __section_maps							\
-	__section(ELF_SECTION_MAPS)
-#endif
-
-/** Declaration helper macros. */
-
-#ifndef BPF_LICENSE
-# define BPF_LICENSE(NAME)						\
-	char ____license[] __section_license = NAME
-#endif
-
-/** Classifier helper */
-
-#ifndef BPF_H_DEFAULT
-# define BPF_H_DEFAULT	-1
-#endif
-
-/** BPF helper functions for tc. Individual flags are in linux/bpf.h */
-
-#ifndef __BPF_FUNC
-# define __BPF_FUNC(NAME, ...)						\
-	(* NAME)(__VA_ARGS__) __maybe_unused
-#endif
-
-#ifndef BPF_FUNC
-# define BPF_FUNC(NAME, ...)						\
-	__BPF_FUNC(NAME, __VA_ARGS__) = (void *) BPF_FUNC_##NAME
-#endif
-
-/* Map access/manipulation */
-static void *BPF_FUNC(map_lookup_elem, void *map, const void *key);
-static int BPF_FUNC(map_update_elem, void *map, const void *key,
-		    const void *value, uint32_t flags);
-static int BPF_FUNC(map_delete_elem, void *map, const void *key);
-
-/* Time access */
-static uint64_t BPF_FUNC(ktime_get_ns);
-
-/* Debugging */
-
-/* FIXME: __attribute__ ((format(printf, 1, 3))) not possible unless
- * llvm bug https://llvm.org/bugs/show_bug.cgi?id=26243 gets resolved.
- * It would require ____fmt to be made const, which generates a reloc
- * entry (non-map).
- */
-static void BPF_FUNC(trace_printk, const char *fmt, int fmt_size, ...);
-
-#ifndef printt
-# define printt(fmt, ...)						\
-	__extension__ ({						\
-		char ____fmt[] = fmt;					\
-		trace_printk(____fmt, sizeof(____fmt), ##__VA_ARGS__);	\
-	})
-#endif
-
-/* Random numbers */
-static uint32_t BPF_FUNC(get_prandom_u32);
-
-/* Tail calls */
-static void BPF_FUNC(tail_call, struct __sk_buff *skb, void *map,
-		     uint32_t index);
-
-/* System helpers */
-static uint32_t BPF_FUNC(get_smp_processor_id);
-static uint32_t BPF_FUNC(get_numa_node_id);
-
-/* Packet misc meta data */
-static uint32_t BPF_FUNC(get_cgroup_classid, struct __sk_buff *skb);
-static int BPF_FUNC(skb_under_cgroup, void *map, uint32_t index);
-
-static uint32_t BPF_FUNC(get_route_realm, struct __sk_buff *skb);
-static uint32_t BPF_FUNC(get_hash_recalc, struct __sk_buff *skb);
-static uint32_t BPF_FUNC(set_hash_invalid, struct __sk_buff *skb);
-
-/* Packet redirection */
-static int BPF_FUNC(redirect, int ifindex, uint32_t flags);
-static int BPF_FUNC(clone_redirect, struct __sk_buff *skb, int ifindex,
-		    uint32_t flags);
-
-/* Packet manipulation */
-static int BPF_FUNC(skb_load_bytes, struct __sk_buff *skb, uint32_t off,
-		    void *to, uint32_t len);
-static int BPF_FUNC(skb_store_bytes, struct __sk_buff *skb, uint32_t off,
-		    const void *from, uint32_t len, uint32_t flags);
-
-static int BPF_FUNC(l3_csum_replace, struct __sk_buff *skb, uint32_t off,
-		    uint32_t from, uint32_t to, uint32_t flags);
-static int BPF_FUNC(l4_csum_replace, struct __sk_buff *skb, uint32_t off,
-		    uint32_t from, uint32_t to, uint32_t flags);
-static int BPF_FUNC(csum_diff, const void *from, uint32_t from_size,
-		    const void *to, uint32_t to_size, uint32_t seed);
-static int BPF_FUNC(csum_update, struct __sk_buff *skb, uint32_t wsum);
-
-static int BPF_FUNC(skb_change_type, struct __sk_buff *skb, uint32_t type);
-static int BPF_FUNC(skb_change_proto, struct __sk_buff *skb, uint32_t proto,
-		    uint32_t flags);
-static int BPF_FUNC(skb_change_tail, struct __sk_buff *skb, uint32_t nlen,
-		    uint32_t flags);
-
-static int BPF_FUNC(skb_pull_data, struct __sk_buff *skb, uint32_t len);
-
-/* Event notification */
-static int __BPF_FUNC(skb_event_output, struct __sk_buff *skb, void *map,
-		      uint64_t index, const void *data, uint32_t size) =
-		      (void *) BPF_FUNC_perf_event_output;
-
-/* Packet vlan encap/decap */
-static int BPF_FUNC(skb_vlan_push, struct __sk_buff *skb, uint16_t proto,
-		    uint16_t vlan_tci);
-static int BPF_FUNC(skb_vlan_pop, struct __sk_buff *skb);
-
-/* Packet tunnel encap/decap */
-static int BPF_FUNC(skb_get_tunnel_key, struct __sk_buff *skb,
-		    struct bpf_tunnel_key *to, uint32_t size, uint32_t flags);
-static int BPF_FUNC(skb_set_tunnel_key, struct __sk_buff *skb,
-		    const struct bpf_tunnel_key *from, uint32_t size,
-		    uint32_t flags);
-
-static int BPF_FUNC(skb_get_tunnel_opt, struct __sk_buff *skb,
-		    void *to, uint32_t size);
-static int BPF_FUNC(skb_set_tunnel_opt, struct __sk_buff *skb,
-		    const void *from, uint32_t size);
-
-/** LLVM built-ins, mem*() routines work for constant size */
-
-#ifndef lock_xadd
-# define lock_xadd(ptr, val)	((void) __sync_fetch_and_add(ptr, val))
-#endif
-
-#ifndef memset
-# define memset(s, c, n)	__builtin_memset((s), (c), (n))
-#endif
-
-#ifndef memcpy
-# define memcpy(d, s, n)	__builtin_memcpy((d), (s), (n))
-#endif
-
-#ifndef memmove
-# define memmove(d, s, n)	__builtin_memmove((d), (s), (n))
-#endif
-
-/* FIXME: __builtin_memcmp() is not yet fully usable unless llvm bug
- * https://llvm.org/bugs/show_bug.cgi?id=26218 gets resolved. Also
- * this one would generate a reloc entry (non-map), otherwise.
- */
-#if 0
-#ifndef memcmp
-# define memcmp(a, b, n)	__builtin_memcmp((a), (b), (n))
-#endif
-#endif
-
-unsigned long long load_byte(void *skb, unsigned long long off)
-	asm ("llvm.bpf.load.byte");
-
-unsigned long long load_half(void *skb, unsigned long long off)
-	asm ("llvm.bpf.load.half");
-
-unsigned long long load_word(void *skb, unsigned long long off)
-	asm ("llvm.bpf.load.word");
-
-#endif /* __BPF_API__ */
diff --git a/drivers/net/tap/bpf/bpf_elf.h b/drivers/net/tap/bpf/bpf_elf.h
deleted file mode 100644
index ea8a11c95c..0000000000
--- a/drivers/net/tap/bpf/bpf_elf.h
+++ /dev/null
@@ -1,53 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 or BSD-3-Clause */
-#ifndef __BPF_ELF__
-#define __BPF_ELF__
-
-#include <asm/types.h>
-
-/* Note:
- *
- * Below ELF section names and bpf_elf_map structure definition
- * are not (!) kernel ABI. It's rather a "contract" between the
- * application and the BPF loader in tc. For compatibility, the
- * section names should stay as-is. Introduction of aliases, if
- * needed, are a possibility, though.
- */
-
-/* ELF section names, etc */
-#define ELF_SECTION_LICENSE	"license"
-#define ELF_SECTION_MAPS	"maps"
-#define ELF_SECTION_PROG	"prog"
-#define ELF_SECTION_CLASSIFIER	"classifier"
-#define ELF_SECTION_ACTION	"action"
-
-#define ELF_MAX_MAPS		64
-#define ELF_MAX_LICENSE_LEN	128
-
-/* Object pinning settings */
-#define PIN_NONE		0
-#define PIN_OBJECT_NS		1
-#define PIN_GLOBAL_NS		2
-
-/* ELF map definition */
-struct bpf_elf_map {
-	__u32 type;
-	__u32 size_key;
-	__u32 size_value;
-	__u32 max_elem;
-	__u32 flags;
-	__u32 id;
-	__u32 pinning;
-	__u32 inner_id;
-	__u32 inner_idx;
-};
-
-#define BPF_ANNOTATE_KV_PAIR(name, type_key, type_val)		\
-	struct ____btf_map_##name {				\
-		type_key key;					\
-		type_val value;					\
-	};							\
-	struct ____btf_map_##name				\
-	    __attribute__ ((section(".maps." #name), used))	\
-	    ____btf_map_##name = { }
-
-#endif /* __BPF_ELF__ */
diff --git a/drivers/net/tap/bpf/bpf_extract.py b/drivers/net/tap/bpf/bpf_extract.py
deleted file mode 100644
index 73c4dafe4e..0000000000
--- a/drivers/net/tap/bpf/bpf_extract.py
+++ /dev/null
@@ -1,85 +0,0 @@
-#!/usr/bin/env python3
-# SPDX-License-Identifier: BSD-3-Clause
-# Copyright (c) 2023 Stephen Hemminger <stephen@networkplumber.org>
-
-import argparse
-import sys
-import struct
-from tempfile import TemporaryFile
-from elftools.elf.elffile import ELFFile
-
-
-def load_sections(elffile):
-    """Get sections of interest from ELF"""
-    result = []
-    parts = [("cls_q", "cls_q_insns"), ("l3_l4", "l3_l4_hash_insns")]
-    for name, tag in parts:
-        section = elffile.get_section_by_name(name)
-        if section:
-            insns = struct.iter_unpack('<BBhL', section.data())
-            result.append([tag, insns])
-    return result
-
-
-def dump_section(name, insns, out):
-    """Dump the array of BPF instructions"""
-    print(f'\nstatic struct bpf_insn {name}[] = {{', file=out)
-    for bpf in insns:
-        code = bpf[0]
-        src = bpf[1] >> 4
-        dst = bpf[1] & 0xf
-        off = bpf[2]
-        imm = bpf[3]
-        print(f'\t{{{code:#04x}, {dst:4d}, {src:4d}, {off:8d}, {imm:#010x}}},',
-              file=out)
-    print('};', file=out)
-
-
-def parse_args():
-    """Parse command line arguments"""
-    parser = argparse.ArgumentParser()
-    parser.add_argument('-s',
-                        '--source',
-                        type=str,
-                        help="original source file")
-    parser.add_argument('-o', '--out', type=str, help="output C file path")
-    parser.add_argument("file",
-                        nargs='+',
-                        help="object file path or '-' for stdin")
-    return parser.parse_args()
-
-
-def open_input(path):
-    """Open the file or stdin"""
-    if path == "-":
-        temp = TemporaryFile()
-        temp.write(sys.stdin.buffer.read())
-        return temp
-    return open(path, 'rb')
-
-
-def write_header(out, source):
-    """Write file intro header"""
-    print("/* SPDX-License-Identifier: BSD-3-Clause", file=out)
-    if source:
-        print(f' * Auto-generated from {source}', file=out)
-    print(" * This not the original source file. Do NOT edit it.", file=out)
-    print(" */\n", file=out)
-
-
-def main():
-    '''program main function'''
-    args = parse_args()
-
-    with open(args.out, 'w',
-              encoding="utf-8") if args.out else sys.stdout as out:
-        write_header(out, args.source)
-        for path in args.file:
-            elffile = ELFFile(open_input(path))
-            sections = load_sections(elffile)
-            for name, insns in sections:
-                dump_section(name, insns, out)
-
-
-if __name__ == "__main__":
-    main()
diff --git a/drivers/net/tap/bpf/meson.build b/drivers/net/tap/bpf/meson.build
new file mode 100644
index 0000000000..f2c03a19fd
--- /dev/null
+++ b/drivers/net/tap/bpf/meson.build
@@ -0,0 +1,81 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright 2024 Stephen Hemminger <stephen@networkplumber.org>
+
+enable_tap_rss = false
+
+libbpf = dependency('libbpf', required: false, method: 'pkg-config')
+if not libbpf.found()
+    message('net/tap: no RSS support, missing libbpf')
+    subdir_done()
+endif
+
+# Debian installs this in /usr/sbin, which is not in $PATH
+bpftool = find_program('bpftool', '/usr/sbin/bpftool', required: false, version: '>= 5.6.0')
+if not bpftool.found()
+    message('net/tap: no RSS support, missing bpftool')
+    subdir_done()
+endif
+
+clang_supports_bpf = false
+clang = find_program('clang', required: false)
+if clang.found()
+    clang_supports_bpf = run_command(clang, '-target', 'bpf', '--print-supported-cpus',
+                                     check: false).returncode() == 0
+endif
+
+if not clang_supports_bpf
+    message('net/tap: no RSS support, missing clang BPF')
+    subdir_done()
+endif
+
+enable_tap_rss = true
+
+libbpf_include_dir = libbpf.get_variable(pkgconfig : 'includedir')
+
+# The include files <linux/bpf.h> and others include <asm/types.h>,
+# but <asm/types.h> is not defined for a multi-lib environment target.
+# Work around this by using the include directory from the host build environment.
+machine_name = run_command('uname', '-m').stdout().strip()
+march_include_dir = '/usr/include/' + machine_name + '-linux-gnu'
+
+clang_flags = [
+    '-O2',
+    '-Wall',
+    '-Wextra',
+    '-target',
+    'bpf',
+    '-g',
+    '-c',
+]
+
+bpf_o_cmd = [
+    clang,
+    clang_flags,
+    '-idirafter',
+    libbpf_include_dir,
+    '-idirafter',
+    march_include_dir,
+    '@INPUT@',
+    '-o',
+    '@OUTPUT@'
+]
+
+skel_h_cmd = [
+    bpftool,
+    'gen',
+    'skeleton',
+    '@INPUT@'
+]
+
+tap_rss_o = custom_target(
+    'tap_rss.bpf.o',
+    input: 'tap_rss.c',
+    output: 'tap_rss.o',
+    command: bpf_o_cmd)
+
+tap_rss_skel_h = custom_target(
+    'tap_rss.skel.h',
+    input: tap_rss_o,
+    output: 'tap_rss.skel.h',
+    command: skel_h_cmd,
+    capture: true)
diff --git a/drivers/net/tap/bpf/tap_bpf_program.c b/drivers/net/tap/bpf/tap_bpf_program.c
deleted file mode 100644
index f05aed021c..0000000000
--- a/drivers/net/tap/bpf/tap_bpf_program.c
+++ /dev/null
@@ -1,255 +0,0 @@
-/* SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0
- * Copyright 2017 Mellanox Technologies, Ltd
- */
-
-#include <stdint.h>
-#include <stdbool.h>
-#include <sys/types.h>
-#include <sys/socket.h>
-#include <asm/types.h>
-#include <linux/in.h>
-#include <linux/if.h>
-#include <linux/if_ether.h>
-#include <linux/ip.h>
-#include <linux/ipv6.h>
-#include <linux/if_tunnel.h>
-#include <linux/filter.h>
-
-#include "bpf_api.h"
-#include "bpf_elf.h"
-#include "../tap_rss.h"
-
-/** Create IPv4 address */
-#define IPv4(a, b, c, d) ((__u32)(((a) & 0xff) << 24) | \
-		(((b) & 0xff) << 16) | \
-		(((c) & 0xff) << 8)  | \
-		((d) & 0xff))
-
-#define PORT(a, b) ((__u16)(((a) & 0xff) << 8) | \
-		((b) & 0xff))
-
-/*
- * The queue number is offset by a unique QUEUE_OFFSET, to distinguish
- * packets that have gone through this rule (skb->cb[1] != 0) from others.
- */
-#define QUEUE_OFFSET		0x7cafe800
-#define PIN_GLOBAL_NS		2
-
-#define KEY_IDX			0
-#define BPF_MAP_ID_KEY	1
-
-struct vlan_hdr {
-	__be16 proto;
-	__be16 tci;
-};
-
-struct bpf_elf_map __attribute__((section("maps"), used))
-map_keys = {
-	.type           =       BPF_MAP_TYPE_HASH,
-	.id             =       BPF_MAP_ID_KEY,
-	.size_key       =       sizeof(__u32),
-	.size_value     =       sizeof(struct rss_key),
-	.max_elem       =       256,
-	.pinning        =       PIN_GLOBAL_NS,
-};
-
-__section("cls_q") int
-match_q(struct __sk_buff *skb)
-{
-	__u32 queue = skb->cb[1];
-	/* queue is set by tap_flow_bpf_cls_q() before load */
-	volatile __u32 q = 0xdeadbeef;
-	__u32 match_queue = QUEUE_OFFSET + q;
-
-	/* printt("match_q$i() queue = %d\n", queue); */
-
-	if (queue != match_queue)
-		return TC_ACT_OK;
-
-	/* queue match */
-	skb->cb[1] = 0;
-	return TC_ACT_UNSPEC;
-}
-
-
-struct ipv4_l3_l4_tuple {
-	__u32    src_addr;
-	__u32    dst_addr;
-	__u16    dport;
-	__u16    sport;
-} __attribute__((packed));
-
-struct ipv6_l3_l4_tuple {
-	__u8        src_addr[16];
-	__u8        dst_addr[16];
-	__u16       dport;
-	__u16       sport;
-} __attribute__((packed));
-
-static const __u8 def_rss_key[TAP_RSS_HASH_KEY_SIZE] = {
-	0xd1, 0x81, 0xc6, 0x2c,
-	0xf7, 0xf4, 0xdb, 0x5b,
-	0x19, 0x83, 0xa2, 0xfc,
-	0x94, 0x3e, 0x1a, 0xdb,
-	0xd9, 0x38, 0x9e, 0x6b,
-	0xd1, 0x03, 0x9c, 0x2c,
-	0xa7, 0x44, 0x99, 0xad,
-	0x59, 0x3d, 0x56, 0xd9,
-	0xf3, 0x25, 0x3c, 0x06,
-	0x2a, 0xdc, 0x1f, 0xfc,
-};
-
-static __u32  __attribute__((always_inline))
-rte_softrss_be(const __u32 *input_tuple, const uint8_t *rss_key,
-		__u8 input_len)
-{
-	__u32 i, j, hash = 0;
-#pragma unroll
-	for (j = 0; j < input_len; j++) {
-#pragma unroll
-		for (i = 0; i < 32; i++) {
-			if (input_tuple[j] & (1U << (31 - i))) {
-				hash ^= ((const __u32 *)def_rss_key)[j] << i |
-				(__u32)((uint64_t)
-				(((const __u32 *)def_rss_key)[j + 1])
-					>> (32 - i));
-			}
-		}
-	}
-	return hash;
-}
-
-static int __attribute__((always_inline))
-rss_l3_l4(struct __sk_buff *skb)
-{
-	void *data_end = (void *)(long)skb->data_end;
-	void *data = (void *)(long)skb->data;
-	__u16 proto = (__u16)skb->protocol;
-	__u32 key_idx = 0xdeadbeef;
-	__u32 hash;
-	struct rss_key *rsskey;
-	__u64 off = ETH_HLEN;
-	int j;
-	__u8 *key = 0;
-	__u32 len;
-	__u32 queue = 0;
-	bool mf = 0;
-	__u16 frag_off = 0;
-
-	rsskey = map_lookup_elem(&map_keys, &key_idx);
-	if (!rsskey) {
-		printt("hash(): rss key is not configured\n");
-		return TC_ACT_OK;
-	}
-	key = (__u8 *)rsskey->key;
-
-	/* Get correct proto for 802.1ad */
-	if (skb->vlan_present && skb->vlan_proto == htons(ETH_P_8021AD)) {
-		if (data + ETH_ALEN * 2 + sizeof(struct vlan_hdr) +
-		    sizeof(proto) > data_end)
-			return TC_ACT_OK;
-		proto = *(__u16 *)(data + ETH_ALEN * 2 +
-				   sizeof(struct vlan_hdr));
-		off += sizeof(struct vlan_hdr);
-	}
-
-	if (proto == htons(ETH_P_IP)) {
-		if (data + off + sizeof(struct iphdr) + sizeof(__u32)
-			> data_end)
-			return TC_ACT_OK;
-
-		__u8 *src_dst_addr = data + off + offsetof(struct iphdr, saddr);
-		__u8 *frag_off_addr = data + off + offsetof(struct iphdr, frag_off);
-		__u8 *prot_addr = data + off + offsetof(struct iphdr, protocol);
-		__u8 *src_dst_port = data + off + sizeof(struct iphdr);
-		struct ipv4_l3_l4_tuple v4_tuple = {
-			.src_addr = IPv4(*(src_dst_addr + 0),
-					*(src_dst_addr + 1),
-					*(src_dst_addr + 2),
-					*(src_dst_addr + 3)),
-			.dst_addr = IPv4(*(src_dst_addr + 4),
-					*(src_dst_addr + 5),
-					*(src_dst_addr + 6),
-					*(src_dst_addr + 7)),
-			.sport = 0,
-			.dport = 0,
-		};
-		/** Fetch the L4-payer port numbers only in-case of TCP/UDP
-		 ** and also if the packet is not fragmented. Since fragmented
-		 ** chunks do not have L4 TCP/UDP header.
-		 **/
-		if (*prot_addr == IPPROTO_UDP || *prot_addr == IPPROTO_TCP) {
-			frag_off = PORT(*(frag_off_addr + 0),
-					*(frag_off_addr + 1));
-			mf = frag_off & 0x2000;
-			frag_off = frag_off & 0x1fff;
-			if (mf == 0 && frag_off == 0) {
-				v4_tuple.sport = PORT(*(src_dst_port + 0),
-						*(src_dst_port + 1));
-				v4_tuple.dport = PORT(*(src_dst_port + 2),
-						*(src_dst_port + 3));
-			}
-		}
-		__u8 input_len = sizeof(v4_tuple) / sizeof(__u32);
-		if (rsskey->hash_fields & (1 << HASH_FIELD_IPV4_L3))
-			input_len--;
-		hash = rte_softrss_be((__u32 *)&v4_tuple, key, 3);
-	} else if (proto == htons(ETH_P_IPV6)) {
-		if (data + off + sizeof(struct ipv6hdr) +
-					sizeof(__u32) > data_end)
-			return TC_ACT_OK;
-		__u8 *src_dst_addr = data + off +
-					offsetof(struct ipv6hdr, saddr);
-		__u8 *src_dst_port = data + off +
-					sizeof(struct ipv6hdr);
-		__u8 *next_hdr = data + off +
-					offsetof(struct ipv6hdr, nexthdr);
-
-		struct ipv6_l3_l4_tuple v6_tuple;
-		for (j = 0; j < 4; j++)
-			*((uint32_t *)&v6_tuple.src_addr + j) =
-				__builtin_bswap32(*((uint32_t *)
-						src_dst_addr + j));
-		for (j = 0; j < 4; j++)
-			*((uint32_t *)&v6_tuple.dst_addr + j) =
-				__builtin_bswap32(*((uint32_t *)
-						src_dst_addr + 4 + j));
-
-		/** Fetch the L4 header port-numbers only if next-header
-		 * is TCP/UDP **/
-		if (*next_hdr == IPPROTO_UDP || *next_hdr == IPPROTO_TCP) {
-			v6_tuple.sport = PORT(*(src_dst_port + 0),
-				      *(src_dst_port + 1));
-			v6_tuple.dport = PORT(*(src_dst_port + 2),
-				      *(src_dst_port + 3));
-		} else {
-			v6_tuple.sport = 0;
-			v6_tuple.dport = 0;
-		}
-
-		__u8 input_len = sizeof(v6_tuple) / sizeof(__u32);
-		if (rsskey->hash_fields & (1 << HASH_FIELD_IPV6_L3))
-			input_len--;
-		hash = rte_softrss_be((__u32 *)&v6_tuple, key, 9);
-	} else {
-		return TC_ACT_PIPE;
-	}
-
-	queue = rsskey->queues[(hash % rsskey->nb_queues) &
-				       (TAP_MAX_QUEUES - 1)];
-	skb->cb[1] = QUEUE_OFFSET + queue;
-	/* printt(">>>>> rss_l3_l4 hash=0x%x queue=%u\n", hash, queue); */
-
-	return TC_ACT_RECLASSIFY;
-}
-
-#define RSS(L)						\
-	__section(#L) int				\
-		L ## _hash(struct __sk_buff *skb)	\
-	{						\
-		return rss_ ## L (skb);			\
-	}
-
-RSS(l3_l4)
-
-BPF_LICENSE("Dual BSD/GPL");
diff --git a/drivers/net/tap/bpf/tap_rss.c b/drivers/net/tap/bpf/tap_rss.c
new file mode 100644
index 0000000000..025b831b5c
--- /dev/null
+++ b/drivers/net/tap/bpf/tap_rss.c
@@ -0,0 +1,267 @@
+/* SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0
+ * Copyright 2017 Mellanox Technologies, Ltd
+ */
+
+#include <linux/in.h>
+#include <linux/if_ether.h>
+#include <linux/ip.h>
+#include <linux/ipv6.h>
+#include <linux/pkt_cls.h>
+#include <linux/bpf.h>
+
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_endian.h>
+
+#include "../tap_rss.h"
+
+/*
+ * This map provides configuration information about flows which need BPF RSS.
+ *
+ * The hash is indexed by the skb mark.
+ */
+struct {
+	__uint(type, BPF_MAP_TYPE_HASH);
+	__uint(key_size, sizeof(__u32));
+	__uint(value_size, sizeof(struct rss_key));
+	__uint(max_entries, TAP_RSS_MAX);
+} rss_map SEC(".maps");
+
+#define IP_MF		0x2000		/** IP header Flags **/
+#define IP_OFFSET	0x1FFF		/** IP header fragment offset **/
+
+/*
+ * Compute Toeplitz hash over the input tuple.
+ * This is the same as rte_softrss_be in lib/hash,
+ * but the loop needs to be set up to match BPF restrictions.
+ */
+static __always_inline __u32
+softrss_be(const __u32 *input_tuple, __u32 input_len, const __u32 *key)
+{
+	__u32 i, j, hash = 0;
+
+#pragma unroll
+	for (j = 0; j < input_len; j++) {
+#pragma unroll
+		for (i = 0; i < 32; i++) {
+			if (input_tuple[j] & (1U << (31 - i)))
+				hash ^= key[j] << i | (__u32)((__u64)key[j + 1] >> (32 - i));
+		}
+	}
+	return hash;
+}
+
+/*
+ * Compute RSS hash for IPv4 packet.
+ * Returns 0 if RSS is not specified.
+ */
+static __always_inline __u32
+parse_ipv4(const struct __sk_buff *skb, __u32 hash_type, const __u32 *key)
+{
+	struct iphdr iph;
+	__u32 off = 0;
+
+	if (bpf_skb_load_bytes_relative(skb, off, &iph, sizeof(iph), BPF_HDR_START_NET))
+		return 0;	/* no IP header present */
+
+	struct {
+		__u32    src_addr;
+		__u32    dst_addr;
+		__u16    dport;
+		__u16    sport;
+	} v4_tuple = {
+		.src_addr = bpf_ntohl(iph.saddr),
+		.dst_addr = bpf_ntohl(iph.daddr),
+	};
+
+	/* If only calculating L3 hash, do it now */
+	if (hash_type & (1 << HASH_FIELD_IPV4_L3))
+		return softrss_be((__u32 *)&v4_tuple, sizeof(v4_tuple) / sizeof(__u32) - 1, key);
+
+	/* If packet is fragmented then no L4 hash is possible */
+	if ((iph.frag_off & bpf_htons(IP_MF | IP_OFFSET)) != 0)
+		return 0;
+
+	/* Do RSS on UDP or TCP protocols */
+	if (iph.protocol == IPPROTO_UDP || iph.protocol == IPPROTO_TCP) {
+		__u16 src_dst_port[2];
+
+		off += iph.ihl * 4;
+		if (bpf_skb_load_bytes_relative(skb, off, &src_dst_port, sizeof(src_dst_port),
+						BPF_HDR_START_NET))
+			return 0; /* TCP or UDP header missing */
+
+		v4_tuple.sport = bpf_ntohs(src_dst_port[0]);
+		v4_tuple.dport = bpf_ntohs(src_dst_port[1]);
+		return softrss_be((__u32 *)&v4_tuple, sizeof(v4_tuple) / sizeof(__u32), key);
+	}
+
+	/* Other protocol */
+	return 0;
+}
+
+/*
+ * Parse IPv6 extension headers, update offset and return next proto.
+ * Returns next proto on success, -1 on malformed header.
+ */
+static __always_inline int
+skip_ip6_ext(__u16 proto, const struct __sk_buff *skb, __u32 *off, int *frag)
+{
+	struct ext_hdr {
+		__u8 next_hdr;
+		__u8 len;
+	} xh;
+	unsigned int i;
+
+	*frag = 0;
+
+#define MAX_EXT_HDRS 5
+#pragma unroll
+	for (i = 0; i < MAX_EXT_HDRS; i++) {
+		switch (proto) {
+		case IPPROTO_HOPOPTS:
+		case IPPROTO_ROUTING:
+		case IPPROTO_DSTOPTS:
+			if (bpf_skb_load_bytes_relative(skb, *off, &xh, sizeof(xh),
+							BPF_HDR_START_NET))
+				return -1;
+
+			*off += (xh.len + 1) * 8;
+			proto = xh.next_hdr;
+			break;
+		case IPPROTO_FRAGMENT:
+			if (bpf_skb_load_bytes_relative(skb, *off, &xh, sizeof(xh),
+							BPF_HDR_START_NET))
+				return -1;
+
+			*off += 8;
+			proto = xh.next_hdr;
+			*frag = 1;
+			return proto; /* this is always the last ext hdr */
+		default:
+			return proto;
+		}
+	}
+
+	/* too many extension headers, give up */
+	return -1;
+}
+
+/*
+ * Compute RSS hash for IPv6 packet.
+ * Returns 0 if RSS is not specified.
+ */
+static __always_inline __u32
+parse_ipv6(const struct __sk_buff *skb, __u32 hash_type, const __u32 *key)
+{
+	struct {
+		__u32       src_addr[4];
+		__u32       dst_addr[4];
+		__u16       dport;
+		__u16       sport;
+	} v6_tuple = { };
+	struct ipv6hdr ip6h;
+	__u32 off = 0, j;
+	int proto, frag;
+
+	if (bpf_skb_load_bytes_relative(skb, off, &ip6h, sizeof(ip6h), BPF_HDR_START_NET))
+		return 0;	/* missing IPv6 header */
+
+#pragma unroll
+	for (j = 0; j < 4; j++) {
+		v6_tuple.src_addr[j] = bpf_ntohl(ip6h.saddr.in6_u.u6_addr32[j]);
+		v6_tuple.dst_addr[j] = bpf_ntohl(ip6h.daddr.in6_u.u6_addr32[j]);
+	}
+
+	/* If only doing L3 hash, do it now */
+	if (hash_type & (1 << HASH_FIELD_IPV6_L3))
+		return softrss_be((__u32 *)&v6_tuple, sizeof(v6_tuple) / sizeof(__u32) - 1, key);
+
+	/* Skip extension headers if present */
+	off += sizeof(ip6h);
+	proto = skip_ip6_ext(ip6h.nexthdr, skb, &off, &frag);
+	if (proto < 0)
+		return 0;
+
+	/* If packet is a fragment then no L4 hash is possible */
+	if (frag)
+		return 0;
+
+	/* Do RSS on UDP or TCP */
+	if (proto == IPPROTO_UDP || proto == IPPROTO_TCP) {
+		__u16 src_dst_port[2];
+
+		if (bpf_skb_load_bytes_relative(skb, off, &src_dst_port, sizeof(src_dst_port),
+						BPF_HDR_START_NET))
+			return 0;
+
+		v6_tuple.sport = bpf_ntohs(src_dst_port[0]);
+		v6_tuple.dport = bpf_ntohs(src_dst_port[1]);
+
+		return softrss_be((__u32 *)&v6_tuple, sizeof(v6_tuple) / sizeof(__u32), key);
+	}
+
+	return 0;
+}
+
+/*
+ * Scale value into the range [0, n).
+ * Assumes val is large (i.e. hash covers the whole u32 range)
+ */
+static __always_inline __u32
+reciprocal_scale(__u32 val, __u32 n)
+{
+	return (__u32)(((__u64)val * n) >> 32);
+}
+
+/*
+ * When this BPF program is run by tc from the filter classifier,
+ * it is able to read skb metadata and packet data.
+ *
+ * For packets where RSS is not possible, just return TC_ACT_OK.
+ * When RSS is desired, change skb->queue_mapping and return TC_ACT_PIPE
+ * to continue processing.
+ *
+ * This should be BPF_PROG_TYPE_SCHED_ACT so section needs to be "action"
+ */
+SEC("action") int
+rss_flow_action(struct __sk_buff *skb)
+{
+	const struct rss_key *rsskey;
+	const __u32 *key;
+	__be16 proto;
+	__u32 mark;
+	__u32 hash;
+	__u16 queue;
+
+	__builtin_preserve_access_index(({
+		mark = skb->mark;
+		proto = skb->protocol;
+	}));
+
+	/* Lookup RSS configuration for that BPF class */
+	rsskey = bpf_map_lookup_elem(&rss_map, &mark);
+	if (rsskey == NULL)
+		return TC_ACT_OK;
+
+	key = (const __u32 *)rsskey->key;
+
+	if (proto == bpf_htons(ETH_P_IP))
+		hash = parse_ipv4(skb, rsskey->hash_fields, key);
+	else if (proto == bpf_htons(ETH_P_IPV6))
+		hash = parse_ipv6(skb, rsskey->hash_fields, key);
+	else
+		hash = 0;
+
+	if (hash == 0)
+		return TC_ACT_OK;
+
+	/* Fold hash to the number of queues configured */
+	queue = reciprocal_scale(hash, rsskey->nb_queues);
+
+	__builtin_preserve_access_index(({
+		skb->queue_mapping = queue;
+	}));
+	return TC_ACT_PIPE;
+}
+
+char _license[] SEC("license") = "Dual BSD/GPL";
-- 
2.43.0
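
For readers who want to try the hash-then-scale step from tap_rss.c
outside of BPF, here is a minimal user-space sketch. It is not code from
the patch: toeplitz_be() and scale_to_queues() are hypothetical names,
plain stdint types replace the kernel __u32 types, and the key is assumed
to hold at least input_len + 1 words. The spill term is widened to 64 bits
before shifting (as the removed rte_softrss_be() did), so that the i == 0
case never shifts a 32-bit value by 32.

```c
#include <stdint.h>

/*
 * Toeplitz hash over 32-bit words, mirroring the loop shape of softrss_be().
 * Assumption: key holds at least input_len + 1 words.
 */
static uint32_t
toeplitz_be(const uint32_t *input, uint32_t input_len, const uint32_t *key)
{
	uint32_t i, j, hash = 0;

	for (j = 0; j < input_len; j++) {
		for (i = 0; i < 32; i++) {
			if (input[j] & (1U << (31 - i))) {
				/* Widen first so i == 0 shifts a 64-bit value by 32. */
				uint32_t spill = (uint32_t)((uint64_t)key[j + 1] >> (32 - i));

				hash ^= key[j] << i | spill;
			}
		}
	}
	return hash;
}

/* Map a 32-bit hash onto [0, n) with a multiply and shift, not a modulo. */
static uint32_t
scale_to_queues(uint32_t hash, uint32_t n)
{
	return (uint32_t)(((uint64_t)hash * n) >> 32);
}
```

With four queues, a hash of 0x80000000 maps to queue 2 and 0xffffffff
maps to queue 3. The scaling uses the high bits of the hash, which is why
the comment in the patch assumes the hash covers the whole u32 range.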


^ permalink raw reply	[relevance 2%]

* RE: [PATCH v2 0/3] cryptodev: add API to get used queue pair depth
  2024-04-12 11:57  3% ` [PATCH v2 " Akhil Goyal
@ 2024-05-29 10:43  0%   ` Anoob Joseph
  2024-05-30  9:19  0%     ` Akhil Goyal
  0 siblings, 1 reply; 200+ results
From: Anoob Joseph @ 2024-05-29 10:43 UTC (permalink / raw)
  To: Akhil Goyal, dev
  Cc: thomas, david.marchand, hemant.agrawal, pablo.de.lara.guarch,
	fiona.trahe, declan.doherty, matan, g.singh, fanzhang.oss,
	jianjay.zhou, asomalap, ruifeng.wang, konstantin.v.ananyev,
	radu.nicolau, ajit.khaparde, Nagadheeraj Rottela, ciara.power,
	Akhil Goyal

> 
> Added a new fast path API to get the used depth of a crypto device queue
> pair at any given point.
> 
> An implementation in cnxk crypto driver is also added along with a test case in
> test app.
> 
> The addition of new API causes an ABI warning.
> This is suppressed as the updated struct rte_crypto_fp_ops is an internal
> structure and not to be used by application directly.
> 

Series Acked-by: Anoob Joseph <anoobj@marvell.com>



^ permalink raw reply	[relevance 0%]

* Re: [PATCH v5] graph: expose node context as pointers
  @ 2024-05-29 17:54  0%   ` Nithin Dabilpuram
  2024-06-18 12:33  4%   ` David Marchand
  1 sibling, 0 replies; 200+ results
From: Nithin Dabilpuram @ 2024-05-29 17:54 UTC (permalink / raw)
  To: Robin Jarry
  Cc: dev, Jerin Jacob, Kiran Kumar K, Nithin Dabilpuram, Zhirun Yan,
	Tyler Retzlaff

Acked-by: Nithin Dabilpuram <ndabilpuram@marvell.com>

On Wed, Mar 27, 2024 at 2:47 PM Robin Jarry <rjarry@redhat.com> wrote:
>
> In some cases, the node context data is used to store two pointers
> because the data is larger than the reserved 16 bytes. Having to define
> intermediate structures just to be able to cast is tedious. And without
> intermediate structures, casting to opaque pointers is hard without
> violating strict aliasing rules.
>
> Add an unnamed union to allow storing opaque pointers in the node
> context. Unfortunately, aligning an unnamed union that contains an array
> produces inconsistent results between C and C++. To preserve ABI/API
> compatibility in both C and C++, move all fast-path area fields into an
> unnamed struct which is cache aligned. Use __rte_cache_min_aligned to
> preserve existing alignment on architectures where cache lines are 128
> bytes.
>
> Add a static assert to ensure that the unnamed union is not larger than
> the context array (RTE_NODE_CTX_SZ).
>
> Signed-off-by: Robin Jarry <rjarry@redhat.com>
> ---
>
> Notes:
>     v5:
>
>     * Helper functions to hide casting proved to be harder than expected.
>       Naive casting may even be impossible without breaking strict aliasing
>       rules. The only other option would be to use explicit memcpy calls.
>     * Unnamed union tentative again. As suggested by Tyler (thank you!),
>       using an intermediate unnamed struct to carry the alignment produces
>       consistent ABI in C and C++.
>     * Also, Tyler (thank you!) suggested that the fast path area alignment
>       size may be incorrect for architectures where the cache line is not 64
>       bytes. There will be a 64 bytes hole in the structure at the end of
>       the unnamed struct before the zero length next nodes array. Use
>       __rte_cache_min_aligned to preserve existing alignment.
>
>     v4:
>
>     * Replaced the unnamed union with helper inline functions.
>
>     v3:
>
>     * Added __extension__ to the unnamed struct inside the union.
>     * Fixed C++ header checks.
>     * Replaced alignas() with an explicit static_assert.
>
>  lib/graph/rte_graph_worker_common.h | 27 ++++++++++++++++++++-------
>  1 file changed, 20 insertions(+), 7 deletions(-)
>
> diff --git a/lib/graph/rte_graph_worker_common.h b/lib/graph/rte_graph_worker_common.h
> index 36d864e2c14e..84d4997bbbf6 100644
> --- a/lib/graph/rte_graph_worker_common.h
> +++ b/lib/graph/rte_graph_worker_common.h
> @@ -12,7 +12,9 @@
>   * process, enqueue and move streams of objects to the next nodes.
>   */
>
> +#include <assert.h>
>  #include <stdalign.h>
> +#include <stddef.h>
>
>  #include <rte_common.h>
>  #include <rte_cycles.h>
> @@ -111,14 +113,21 @@ struct __rte_cache_aligned rte_node {
>                 } dispatch;
>         };
>         /* Fast path area  */
> +       __extension__ struct __rte_cache_min_aligned {
>  #define RTE_NODE_CTX_SZ 16
> -       alignas(RTE_CACHE_LINE_SIZE) uint8_t ctx[RTE_NODE_CTX_SZ]; /**< Node Context. */
> -       uint16_t size;          /**< Total number of objects available. */
> -       uint16_t idx;           /**< Number of objects used. */
> -       rte_graph_off_t off;    /**< Offset of node in the graph reel. */
> -       uint64_t total_cycles;  /**< Cycles spent in this node. */
> -       uint64_t total_calls;   /**< Calls done to this node. */
> -       uint64_t total_objs;    /**< Objects processed by this node. */
> +               union {
> +                       uint8_t ctx[RTE_NODE_CTX_SZ];
> +                       __extension__ struct {
> +                               void *ctx_ptr;
> +                               void *ctx_ptr2;
> +                       };
> +               }; /**< Node Context. */
> +               uint16_t size;          /**< Total number of objects available. */
> +               uint16_t idx;           /**< Number of objects used. */
> +               rte_graph_off_t off;    /**< Offset of node in the graph reel. */
> +               uint64_t total_cycles;  /**< Cycles spent in this node. */
> +               uint64_t total_calls;   /**< Calls done to this node. */
> +               uint64_t total_objs;    /**< Objects processed by this node. */
>                 union {
>                         void **objs;       /**< Array of object pointers. */
>                         uint64_t objs_u64;
> @@ -127,9 +136,13 @@ struct __rte_cache_aligned rte_node {
>                         rte_node_process_t process; /**< Process function. */
>                         uint64_t process_u64;
>                 };
> +       };
>         alignas(RTE_CACHE_LINE_MIN_SIZE) struct rte_node *nodes[]; /**< Next nodes. */
>  };
>
> +static_assert(offsetof(struct rte_node, size) - offsetof(struct rte_node, ctx) == RTE_NODE_CTX_SZ,
> +       "rte_node context must be RTE_NODE_CTX_SZ bytes exactly");
> +
>  /**
>   * @internal
>   *
> --
> 2.44.0
>
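
For anyone skimming the quoted patch, the layout idea can be tried in
isolation. The following is only a sketch under stated assumptions, not
the rte_node definition: struct mini_node, NODE_CTX_SZ and
mini_node_set_ptrs() are hypothetical stand-ins, and the cache alignment
attributes are left out. It shows a C11 unnamed union giving two views of
the same 16 bytes, guarded by the same kind of static assert.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NODE_CTX_SZ 16 /* hypothetical stand-in for RTE_NODE_CTX_SZ */

/* Hypothetical miniature of the fast-path area, not the real rte_node. */
struct mini_node {
	union {
		uint8_t ctx[NODE_CTX_SZ];	/* raw context bytes */
		struct {
			void *ctx_ptr;		/* same storage seen as pointers */
			void *ctx_ptr2;
		};
	};
	uint16_t size;				/* first field after the context */
};

/* Fails to compile if the pointer view ever outgrows the byte array. */
static_assert(offsetof(struct mini_node, size) -
	      offsetof(struct mini_node, ctx) == NODE_CTX_SZ,
	      "context must be NODE_CTX_SZ bytes exactly");

/* Store two opaque pointers without casting the ctx byte array. */
static inline void
mini_node_set_ptrs(struct mini_node *n, void *a, void *b)
{
	n->ctx_ptr = a;
	n->ctx_ptr2 = b;
}
```

A node implementation can then read n->ctx_ptr directly instead of
casting n->ctx, which is the strict aliasing concern described in the
commit message.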

^ permalink raw reply	[relevance 0%]

* [PATCH v10 01/20] mbuf: replace term sanity check
  @ 2024-05-29 23:33  2% ` Stephen Hemminger
    1 sibling, 0 replies; 200+ results
From: Stephen Hemminger @ 2024-05-29 23:33 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger, Andrew Rybchenko, Morten Brørup

Replace rte_mbuf_sanity_check() with rte_mbuf_verify()
to match the similar macro RTE_VERIFY() in rte_debug.h

The term sanity check is on the Tier 2 list of words
that should be replaced.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
---
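
Notes:
    One common way to keep an old function name usable during a
    deprecation period is a thin wrapper that forwards to the new name.
    The sketch below only illustrates that pattern and is not taken from
    this series: struct pkt_buf, pkt_buf_verify() and
    pkt_buf_sanity_check() are hypothetical, and the deprecated attribute
    is the GCC/Clang spelling, so other compilers would need their own
    equivalent.

```c
#include <stddef.h>

/* Hypothetical buffer type, used only for this illustration. */
struct pkt_buf {
	unsigned int refcnt;
};

/* New name: returns 1 if the buffer looks consistent, 0 otherwise. */
static inline int
pkt_buf_verify(const struct pkt_buf *b)
{
	return b != NULL && b->refcnt > 0;
}

/* Old name kept as a forwarding wrapper; callers get a compiler warning. */
__attribute__((deprecated("use pkt_buf_verify() instead")))
static inline int
pkt_buf_sanity_check(const struct pkt_buf *b)
{
	return pkt_buf_verify(b);
}
```

    Existing callers keep compiling with a warning, while new code uses
    the new name directly.
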
 app/test/test_mbuf.c                 | 28 +++++------
 doc/guides/prog_guide/mbuf_lib.rst   |  4 +-
 doc/guides/rel_notes/deprecation.rst |  3 ++
 drivers/net/avp/avp_ethdev.c         | 18 +++----
 drivers/net/sfc/sfc_ef100_rx.c       |  6 +--
 drivers/net/sfc/sfc_ef10_essb_rx.c   |  4 +-
 drivers/net/sfc/sfc_ef10_rx.c        |  4 +-
 drivers/net/sfc/sfc_rx.c             |  2 +-
 examples/ipv4_multicast/main.c       |  2 +-
 lib/mbuf/rte_mbuf.c                  | 23 +++++----
 lib/mbuf/rte_mbuf.h                  | 71 +++++++++++++++-------------
 lib/mbuf/version.map                 |  1 +
 12 files changed, 90 insertions(+), 76 deletions(-)

diff --git a/app/test/test_mbuf.c b/app/test/test_mbuf.c
index 17be977f31..3fbb5dea8b 100644
--- a/app/test/test_mbuf.c
+++ b/app/test/test_mbuf.c
@@ -262,8 +262,8 @@ test_one_pktmbuf(struct rte_mempool *pktmbuf_pool)
 		GOTO_FAIL("Buffer should be continuous");
 	memset(hdr, 0x55, MBUF_TEST_HDR2_LEN);
 
-	rte_mbuf_sanity_check(m, 1);
-	rte_mbuf_sanity_check(m, 0);
+	rte_mbuf_verify(m, 1);
+	rte_mbuf_verify(m, 0);
 	rte_pktmbuf_dump(stdout, m, 0);
 
 	/* this prepend should fail */
@@ -1162,7 +1162,7 @@ test_refcnt_mbuf(void)
 
 #ifdef RTE_EXEC_ENV_WINDOWS
 static int
-test_failing_mbuf_sanity_check(struct rte_mempool *pktmbuf_pool)
+test_failing_mbuf_verify(struct rte_mempool *pktmbuf_pool)
 {
 	RTE_SET_USED(pktmbuf_pool);
 	return TEST_SKIPPED;
@@ -1181,12 +1181,12 @@ mbuf_check_pass(struct rte_mbuf *buf)
 }
 
 static int
-test_failing_mbuf_sanity_check(struct rte_mempool *pktmbuf_pool)
+test_failing_mbuf_verify(struct rte_mempool *pktmbuf_pool)
 {
 	struct rte_mbuf *buf;
 	struct rte_mbuf badbuf;
 
-	printf("Checking rte_mbuf_sanity_check for failure conditions\n");
+	printf("Checking rte_mbuf_verify for failure conditions\n");
 
 	/* get a good mbuf to use to make copies */
 	buf = rte_pktmbuf_alloc(pktmbuf_pool);
@@ -1708,7 +1708,7 @@ test_mbuf_validate_tx_offload(const char *test_name,
 		GOTO_FAIL("%s: mbuf allocation failed!\n", __func__);
 	if (rte_pktmbuf_pkt_len(m) != 0)
 		GOTO_FAIL("%s: Bad packet length\n", __func__);
-	rte_mbuf_sanity_check(m, 0);
+	rte_mbuf_verify(m, 0);
 	m->ol_flags = ol_flags;
 	m->tso_segsz = segsize;
 	ret = rte_validate_tx_offload(m);
@@ -1915,7 +1915,7 @@ test_pktmbuf_read(struct rte_mempool *pktmbuf_pool)
 		GOTO_FAIL("%s: mbuf allocation failed!\n", __func__);
 	if (rte_pktmbuf_pkt_len(m) != 0)
 		GOTO_FAIL("%s: Bad packet length\n", __func__);
-	rte_mbuf_sanity_check(m, 0);
+	rte_mbuf_verify(m, 0);
 
 	data = rte_pktmbuf_append(m, MBUF_TEST_DATA_LEN2);
 	if (data == NULL)
@@ -1964,7 +1964,7 @@ test_pktmbuf_read_from_offset(struct rte_mempool *pktmbuf_pool)
 
 	if (rte_pktmbuf_pkt_len(m) != 0)
 		GOTO_FAIL("%s: Bad packet length\n", __func__);
-	rte_mbuf_sanity_check(m, 0);
+	rte_mbuf_verify(m, 0);
 
 	/* prepend an ethernet header */
 	hdr = (struct ether_hdr *)rte_pktmbuf_prepend(m, hdr_len);
@@ -2109,7 +2109,7 @@ create_packet(struct rte_mempool *pktmbuf_pool,
 			GOTO_FAIL("%s: mbuf allocation failed!\n", __func__);
 		if (rte_pktmbuf_pkt_len(pkt_seg) != 0)
 			GOTO_FAIL("%s: Bad packet length\n", __func__);
-		rte_mbuf_sanity_check(pkt_seg, 0);
+		rte_mbuf_verify(pkt_seg, 0);
 		/* Add header only for the first segment */
 		if (test_data->flags == MBUF_HEADER && seg == 0) {
 			hdr_len = sizeof(struct rte_ether_hdr);
@@ -2321,7 +2321,7 @@ test_pktmbuf_ext_shinfo_init_helper(struct rte_mempool *pktmbuf_pool)
 		GOTO_FAIL("%s: mbuf allocation failed!\n", __func__);
 	if (rte_pktmbuf_pkt_len(m) != 0)
 		GOTO_FAIL("%s: Bad packet length\n", __func__);
-	rte_mbuf_sanity_check(m, 0);
+	rte_mbuf_verify(m, 0);
 
 	ext_buf_addr = rte_malloc("External buffer", buf_len,
 			RTE_CACHE_LINE_SIZE);
@@ -2482,8 +2482,8 @@ test_pktmbuf_ext_pinned_buffer(struct rte_mempool *std_pool)
 		GOTO_FAIL("%s: test_pktmbuf_copy(pinned) failed\n",
 			  __func__);
 
-	if (test_failing_mbuf_sanity_check(pinned_pool) < 0)
-		GOTO_FAIL("%s: test_failing_mbuf_sanity_check(pinned)"
+	if (test_failing_mbuf_verify(pinned_pool) < 0)
+		GOTO_FAIL("%s: test_failing_mbuf_verify(pinned)"
 			  " failed\n", __func__);
 
 	if (test_mbuf_linearize_check(pinned_pool) < 0)
@@ -2857,8 +2857,8 @@ test_mbuf(void)
 		goto err;
 	}
 
-	if (test_failing_mbuf_sanity_check(pktmbuf_pool) < 0) {
-		printf("test_failing_mbuf_sanity_check() failed\n");
+	if (test_failing_mbuf_verify(pktmbuf_pool) < 0) {
+		printf("test_failing_mbuf_verify() failed\n");
 		goto err;
 	}
 
diff --git a/doc/guides/prog_guide/mbuf_lib.rst b/doc/guides/prog_guide/mbuf_lib.rst
index 049357c755..0accb51a98 100644
--- a/doc/guides/prog_guide/mbuf_lib.rst
+++ b/doc/guides/prog_guide/mbuf_lib.rst
@@ -266,8 +266,8 @@ can be found in several of the sample applications, for example, the IPv4 Multic
 Debug
 -----
 
-In debug mode, the functions of the mbuf library perform sanity checks before any operation (such as, buffer corruption,
-bad type, and so on).
+In debug mode, the functions of the mbuf library perform consistency checks
+before any operation (such as buffer corruption, bad type, and so on).
 
 Use Cases
 ---------
diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 6948641ff6..6b4a3102ca 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -147,3 +147,6 @@ Deprecation Notices
   will be deprecated and subsequently removed in DPDK 24.11 release.
   Before this, the new port library API (functions rte_swx_port_*)
   will gradually transition from experimental to stable status.
+
+* mbuf: The function ``rte_mbuf_sanity_check`` is deprecated.
+  Use the new function ``rte_mbuf_verify`` instead.
diff --git a/drivers/net/avp/avp_ethdev.c b/drivers/net/avp/avp_ethdev.c
index 6733462c86..bafc08fd60 100644
--- a/drivers/net/avp/avp_ethdev.c
+++ b/drivers/net/avp/avp_ethdev.c
@@ -1231,7 +1231,7 @@ _avp_mac_filter(struct avp_dev *avp, struct rte_mbuf *m)
 
 #ifdef RTE_LIBRTE_AVP_DEBUG_BUFFERS
 static inline void
-__avp_dev_buffer_sanity_check(struct avp_dev *avp, struct rte_avp_desc *buf)
+__avp_dev_buffer_check(struct avp_dev *avp, struct rte_avp_desc *buf)
 {
 	struct rte_avp_desc *first_buf;
 	struct rte_avp_desc *pkt_buf;
@@ -1272,12 +1272,12 @@ __avp_dev_buffer_sanity_check(struct avp_dev *avp, struct rte_avp_desc *buf)
 			  first_buf->pkt_len, pkt_len);
 }
 
-#define avp_dev_buffer_sanity_check(a, b) \
-	__avp_dev_buffer_sanity_check((a), (b))
+#define avp_dev_buffer_check(a, b) \
+	__avp_dev_buffer_check((a), (b))
 
 #else /* RTE_LIBRTE_AVP_DEBUG_BUFFERS */
 
-#define avp_dev_buffer_sanity_check(a, b) do {} while (0)
+#define avp_dev_buffer_check(a, b) do {} while (0)
 
 #endif
 
@@ -1302,7 +1302,7 @@ avp_dev_copy_from_buffers(struct avp_dev *avp,
 	void *pkt_data;
 	unsigned int i;
 
-	avp_dev_buffer_sanity_check(avp, buf);
+	avp_dev_buffer_check(avp, buf);
 
 	/* setup the first source buffer */
 	pkt_buf = avp_dev_translate_buffer(avp, buf);
@@ -1370,7 +1370,7 @@ avp_dev_copy_from_buffers(struct avp_dev *avp,
 	rte_pktmbuf_pkt_len(m) = total_length;
 	m->vlan_tci = vlan_tci;
 
-	__rte_mbuf_sanity_check(m, 1);
+	__rte_mbuf_verify(m, 1);
 
 	return m;
 }
@@ -1614,7 +1614,7 @@ avp_dev_copy_to_buffers(struct avp_dev *avp,
 	char *pkt_data;
 	unsigned int i;
 
-	__rte_mbuf_sanity_check(mbuf, 1);
+	__rte_mbuf_verify(mbuf, 1);
 
 	m = mbuf;
 	src_offset = 0;
@@ -1680,7 +1680,7 @@ avp_dev_copy_to_buffers(struct avp_dev *avp,
 		first_buf->vlan_tci = mbuf->vlan_tci;
 	}
 
-	avp_dev_buffer_sanity_check(avp, buffers[0]);
+	avp_dev_buffer_check(avp, buffers[0]);
 
 	return total_length;
 }
@@ -1798,7 +1798,7 @@ avp_xmit_scattered_pkts(void *tx_queue,
 
 #ifdef RTE_LIBRTE_AVP_DEBUG_BUFFERS
 	for (i = 0; i < nb_pkts; i++)
-		avp_dev_buffer_sanity_check(avp, tx_bufs[i]);
+		avp_dev_buffer_check(avp, tx_bufs[i]);
 #endif
 
 	/* send the packets */
diff --git a/drivers/net/sfc/sfc_ef100_rx.c b/drivers/net/sfc/sfc_ef100_rx.c
index e283879e6b..5ebfba4dcf 100644
--- a/drivers/net/sfc/sfc_ef100_rx.c
+++ b/drivers/net/sfc/sfc_ef100_rx.c
@@ -179,7 +179,7 @@ sfc_ef100_rx_qrefill(struct sfc_ef100_rxq *rxq)
 			struct sfc_ef100_rx_sw_desc *rxd;
 			rte_iova_t dma_addr;
 
-			__rte_mbuf_raw_sanity_check(m);
+			__rte_mbuf_raw_verify(m);
 
 			dma_addr = rte_mbuf_data_iova_default(m);
 			if (rxq->flags & SFC_EF100_RXQ_NIC_DMA_MAP) {
@@ -551,7 +551,7 @@ sfc_ef100_rx_process_ready_pkts(struct sfc_ef100_rxq *rxq,
 		rxq->ready_pkts--;
 
 		pkt = sfc_ef100_rx_next_mbuf(rxq);
-		__rte_mbuf_raw_sanity_check(pkt);
+		__rte_mbuf_raw_verify(pkt);
 
 		RTE_BUILD_BUG_ON(sizeof(pkt->rearm_data[0]) !=
 				 sizeof(rxq->rearm_data));
@@ -575,7 +575,7 @@ sfc_ef100_rx_process_ready_pkts(struct sfc_ef100_rxq *rxq,
 			struct rte_mbuf *seg;
 
 			seg = sfc_ef100_rx_next_mbuf(rxq);
-			__rte_mbuf_raw_sanity_check(seg);
+			__rte_mbuf_raw_verify(seg);
 
 			seg->data_off = RTE_PKTMBUF_HEADROOM;
 
diff --git a/drivers/net/sfc/sfc_ef10_essb_rx.c b/drivers/net/sfc/sfc_ef10_essb_rx.c
index 78bd430363..74647e2792 100644
--- a/drivers/net/sfc/sfc_ef10_essb_rx.c
+++ b/drivers/net/sfc/sfc_ef10_essb_rx.c
@@ -125,7 +125,7 @@ sfc_ef10_essb_next_mbuf(const struct sfc_ef10_essb_rxq *rxq,
 	struct rte_mbuf *m;
 
 	m = (struct rte_mbuf *)((uintptr_t)mbuf + rxq->buf_stride);
-	__rte_mbuf_raw_sanity_check(m);
+	__rte_mbuf_raw_verify(m);
 	return m;
 }
 
@@ -136,7 +136,7 @@ sfc_ef10_essb_mbuf_by_index(const struct sfc_ef10_essb_rxq *rxq,
 	struct rte_mbuf *m;
 
 	m = (struct rte_mbuf *)((uintptr_t)mbuf + idx * rxq->buf_stride);
-	__rte_mbuf_raw_sanity_check(m);
+	__rte_mbuf_raw_verify(m);
 	return m;
 }
 
diff --git a/drivers/net/sfc/sfc_ef10_rx.c b/drivers/net/sfc/sfc_ef10_rx.c
index 60442930b3..f4fc815570 100644
--- a/drivers/net/sfc/sfc_ef10_rx.c
+++ b/drivers/net/sfc/sfc_ef10_rx.c
@@ -148,7 +148,7 @@ sfc_ef10_rx_qrefill(struct sfc_ef10_rxq *rxq)
 			struct sfc_ef10_rx_sw_desc *rxd;
 			rte_iova_t phys_addr;
 
-			__rte_mbuf_raw_sanity_check(m);
+			__rte_mbuf_raw_verify(m);
 
 			SFC_ASSERT((id & ~ptr_mask) == 0);
 			rxd = &rxq->sw_ring[id];
@@ -297,7 +297,7 @@ sfc_ef10_rx_process_event(struct sfc_ef10_rxq *rxq, efx_qword_t rx_ev,
 		rxd = &rxq->sw_ring[pending++ & ptr_mask];
 		m = rxd->mbuf;
 
-		__rte_mbuf_raw_sanity_check(m);
+		__rte_mbuf_raw_verify(m);
 
 		m->data_off = RTE_PKTMBUF_HEADROOM;
 		rte_pktmbuf_data_len(m) = seg_len;
diff --git a/drivers/net/sfc/sfc_rx.c b/drivers/net/sfc/sfc_rx.c
index a193229265..c885ce2b05 100644
--- a/drivers/net/sfc/sfc_rx.c
+++ b/drivers/net/sfc/sfc_rx.c
@@ -120,7 +120,7 @@ sfc_efx_rx_qrefill(struct sfc_efx_rxq *rxq)
 		     ++i, id = (id + 1) & rxq->ptr_mask) {
 			m = objs[i];
 
-			__rte_mbuf_raw_sanity_check(m);
+			__rte_mbuf_raw_verify(m);
 
 			rxd = &rxq->sw_desc[id];
 			rxd->mbuf = m;
diff --git a/examples/ipv4_multicast/main.c b/examples/ipv4_multicast/main.c
index 1eed645d02..3bfab37012 100644
--- a/examples/ipv4_multicast/main.c
+++ b/examples/ipv4_multicast/main.c
@@ -258,7 +258,7 @@ mcast_out_pkt(struct rte_mbuf *pkt, int use_clone)
 	hdr->pkt_len = (uint16_t)(hdr->data_len + pkt->pkt_len);
 	hdr->nb_segs = pkt->nb_segs + 1;
 
-	__rte_mbuf_sanity_check(hdr, 1);
+	__rte_mbuf_verify(hdr, 1);
 	return hdr;
 }
 /* >8 End of mcast_out_kt. */
diff --git a/lib/mbuf/rte_mbuf.c b/lib/mbuf/rte_mbuf.c
index 559d5ad8a7..fc5d4ba29d 100644
--- a/lib/mbuf/rte_mbuf.c
+++ b/lib/mbuf/rte_mbuf.c
@@ -367,9 +367,9 @@ rte_pktmbuf_pool_create_extbuf(const char *name, unsigned int n,
 	return mp;
 }
 
-/* do some sanity checks on a mbuf: panic if it fails */
+/* do some checks on a mbuf: panic if it fails */
 void
-rte_mbuf_sanity_check(const struct rte_mbuf *m, int is_header)
+rte_mbuf_verify(const struct rte_mbuf *m, int is_header)
 {
 	const char *reason;
 
@@ -377,6 +377,13 @@ rte_mbuf_sanity_check(const struct rte_mbuf *m, int is_header)
 		rte_panic("%s\n", reason);
 }
 
+/* For ABI compatibility, to be removed in next release */
+void
+rte_mbuf_sanity_check(const struct rte_mbuf *m, int is_header)
+{
+	rte_mbuf_verify(m, is_header);
+}
+
 int rte_mbuf_check(const struct rte_mbuf *m, int is_header,
 		   const char **reason)
 {
@@ -496,7 +503,7 @@ void rte_pktmbuf_free_bulk(struct rte_mbuf **mbufs, unsigned int count)
 		if (unlikely(m == NULL))
 			continue;
 
-		__rte_mbuf_sanity_check(m, 1);
+		__rte_mbuf_verify(m, 1);
 
 		do {
 			m_next = m->next;
@@ -546,7 +553,7 @@ rte_pktmbuf_clone(struct rte_mbuf *md, struct rte_mempool *mp)
 		return NULL;
 	}
 
-	__rte_mbuf_sanity_check(mc, 1);
+	__rte_mbuf_verify(mc, 1);
 	return mc;
 }
 
@@ -596,7 +603,7 @@ rte_pktmbuf_copy(const struct rte_mbuf *m, struct rte_mempool *mp,
 	struct rte_mbuf *mc, *m_last, **prev;
 
 	/* garbage in check */
-	__rte_mbuf_sanity_check(m, 1);
+	__rte_mbuf_verify(m, 1);
 
 	/* check for request to copy at offset past end of mbuf */
 	if (unlikely(off >= m->pkt_len))
@@ -660,7 +667,7 @@ rte_pktmbuf_copy(const struct rte_mbuf *m, struct rte_mempool *mp,
 	}
 
 	/* garbage out check */
-	__rte_mbuf_sanity_check(mc, 1);
+	__rte_mbuf_verify(mc, 1);
 	return mc;
 }
 
@@ -671,7 +678,7 @@ rte_pktmbuf_dump(FILE *f, const struct rte_mbuf *m, unsigned dump_len)
 	unsigned int len;
 	unsigned int nb_segs;
 
-	__rte_mbuf_sanity_check(m, 1);
+	__rte_mbuf_verify(m, 1);
 
 	fprintf(f, "dump mbuf at %p, iova=%#" PRIx64 ", buf_len=%u\n", m, rte_mbuf_iova_get(m),
 		m->buf_len);
@@ -689,7 +696,7 @@ rte_pktmbuf_dump(FILE *f, const struct rte_mbuf *m, unsigned dump_len)
 	nb_segs = m->nb_segs;
 
 	while (m && nb_segs != 0) {
-		__rte_mbuf_sanity_check(m, 0);
+		__rte_mbuf_verify(m, 0);
 
 		fprintf(f, "  segment at %p, data=%p, len=%u, off=%u, refcnt=%u\n",
 			m, rte_pktmbuf_mtod(m, void *),
diff --git a/lib/mbuf/rte_mbuf.h b/lib/mbuf/rte_mbuf.h
index 286b32b788..380663a089 100644
--- a/lib/mbuf/rte_mbuf.h
+++ b/lib/mbuf/rte_mbuf.h
@@ -339,13 +339,13 @@ rte_pktmbuf_priv_flags(struct rte_mempool *mp)
 
 #ifdef RTE_LIBRTE_MBUF_DEBUG
 
-/**  check mbuf type in debug mode */
-#define __rte_mbuf_sanity_check(m, is_h) rte_mbuf_sanity_check(m, is_h)
+/**  check mbuf in debug mode */
+#define __rte_mbuf_verify(m, is_h) rte_mbuf_verify(m, is_h)
 
 #else /*  RTE_LIBRTE_MBUF_DEBUG */
 
-/**  check mbuf type in debug mode */
-#define __rte_mbuf_sanity_check(m, is_h) do { } while (0)
+/**  ignore mbuf checks if not in debug mode */
+#define __rte_mbuf_verify(m, is_h) do { } while (0)
 
 #endif /*  RTE_LIBRTE_MBUF_DEBUG */
 
@@ -514,10 +514,9 @@ rte_mbuf_ext_refcnt_update(struct rte_mbuf_ext_shared_info *shinfo,
 
 
 /**
- * Sanity checks on an mbuf.
+ * Check that the mbuf is valid and panic if corrupted.
  *
- * Check the consistency of the given mbuf. The function will cause a
- * panic if corruption is detected.
+ * Acts as an assertion that the mbuf is consistent. If not, it calls rte_panic().
  *
  * @param m
  *   The mbuf to be checked.
@@ -526,13 +525,17 @@ rte_mbuf_ext_refcnt_update(struct rte_mbuf_ext_shared_info *shinfo,
  *   of a packet (in this case, some fields like nb_segs are not checked)
  */
 void
+rte_mbuf_verify(const struct rte_mbuf *m, int is_header);
+
+/* Older deprecated name for rte_mbuf_verify() */
+void __rte_deprecated
 rte_mbuf_sanity_check(const struct rte_mbuf *m, int is_header);
 
 /**
- * Sanity checks on a mbuf.
+ * Do consistency checks on a mbuf.
  *
- * Almost like rte_mbuf_sanity_check(), but this function gives the reason
- * if corruption is detected rather than panic.
+ * Check the consistency of the given mbuf and if not valid
+ * return the reason.
  *
  * @param m
  *   The mbuf to be checked.
@@ -551,7 +554,7 @@ int rte_mbuf_check(const struct rte_mbuf *m, int is_header,
 		   const char **reason);
 
 /**
- * Sanity checks on a reinitialized mbuf in debug mode.
+ * Do checks on a reinitialized mbuf in debug mode.
  *
  * Check the consistency of the given reinitialized mbuf.
  * The function will cause a panic if corruption is detected.
@@ -563,16 +566,16 @@ int rte_mbuf_check(const struct rte_mbuf *m, int is_header,
  *   The mbuf to be checked.
  */
 static __rte_always_inline void
-__rte_mbuf_raw_sanity_check(__rte_unused const struct rte_mbuf *m)
+__rte_mbuf_raw_verify(__rte_unused const struct rte_mbuf *m)
 {
 	RTE_ASSERT(rte_mbuf_refcnt_read(m) == 1);
 	RTE_ASSERT(m->next == NULL);
 	RTE_ASSERT(m->nb_segs == 1);
-	__rte_mbuf_sanity_check(m, 0);
+	__rte_mbuf_verify(m, 0);
 }
 
 /** For backwards compatibility. */
-#define MBUF_RAW_ALLOC_CHECK(m) __rte_mbuf_raw_sanity_check(m)
+#define MBUF_RAW_ALLOC_CHECK(m) __rte_mbuf_raw_verify(m)
 
 /**
  * Allocate an uninitialized mbuf from mempool *mp*.
@@ -599,7 +602,7 @@ static inline struct rte_mbuf *rte_mbuf_raw_alloc(struct rte_mempool *mp)
 
 	if (rte_mempool_get(mp, (void **)&m) < 0)
 		return NULL;
-	__rte_mbuf_raw_sanity_check(m);
+	__rte_mbuf_raw_verify(m);
 	return m;
 }
 
@@ -622,7 +625,7 @@ rte_mbuf_raw_free(struct rte_mbuf *m)
 {
 	RTE_ASSERT(!RTE_MBUF_CLONED(m) &&
 		  (!RTE_MBUF_HAS_EXTBUF(m) || RTE_MBUF_HAS_PINNED_EXTBUF(m)));
-	__rte_mbuf_raw_sanity_check(m);
+	__rte_mbuf_raw_verify(m);
 	rte_mempool_put(m->pool, m);
 }
 
@@ -885,7 +888,7 @@ static inline void rte_pktmbuf_reset(struct rte_mbuf *m)
 	rte_pktmbuf_reset_headroom(m);
 
 	m->data_len = 0;
-	__rte_mbuf_sanity_check(m, 1);
+	__rte_mbuf_verify(m, 1);
 }
 
 /**
@@ -941,22 +944,22 @@ static inline int rte_pktmbuf_alloc_bulk(struct rte_mempool *pool,
 	switch (count % 4) {
 	case 0:
 		while (idx != count) {
-			__rte_mbuf_raw_sanity_check(mbufs[idx]);
+			__rte_mbuf_raw_verify(mbufs[idx]);
 			rte_pktmbuf_reset(mbufs[idx]);
 			idx++;
 			/* fall-through */
 	case 3:
-			__rte_mbuf_raw_sanity_check(mbufs[idx]);
+			__rte_mbuf_raw_verify(mbufs[idx]);
 			rte_pktmbuf_reset(mbufs[idx]);
 			idx++;
 			/* fall-through */
 	case 2:
-			__rte_mbuf_raw_sanity_check(mbufs[idx]);
+			__rte_mbuf_raw_verify(mbufs[idx]);
 			rte_pktmbuf_reset(mbufs[idx]);
 			idx++;
 			/* fall-through */
 	case 1:
-			__rte_mbuf_raw_sanity_check(mbufs[idx]);
+			__rte_mbuf_raw_verify(mbufs[idx]);
 			rte_pktmbuf_reset(mbufs[idx]);
 			idx++;
 			/* fall-through */
@@ -1184,8 +1187,8 @@ static inline void rte_pktmbuf_attach(struct rte_mbuf *mi, struct rte_mbuf *m)
 	mi->pkt_len = mi->data_len;
 	mi->nb_segs = 1;
 
-	__rte_mbuf_sanity_check(mi, 1);
-	__rte_mbuf_sanity_check(m, 0);
+	__rte_mbuf_verify(mi, 1);
+	__rte_mbuf_verify(m, 0);
 }
 
 /**
@@ -1340,7 +1343,7 @@ static inline int __rte_pktmbuf_pinned_extbuf_decref(struct rte_mbuf *m)
 static __rte_always_inline struct rte_mbuf *
 rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
 {
-	__rte_mbuf_sanity_check(m, 0);
+	__rte_mbuf_verify(m, 0);
 
 	if (likely(rte_mbuf_refcnt_read(m) == 1)) {
 
@@ -1411,7 +1414,7 @@ static inline void rte_pktmbuf_free(struct rte_mbuf *m)
 	struct rte_mbuf *m_next;
 
 	if (m != NULL)
-		__rte_mbuf_sanity_check(m, 1);
+		__rte_mbuf_verify(m, 1);
 
 	while (m != NULL) {
 		m_next = m->next;
@@ -1492,7 +1495,7 @@ rte_pktmbuf_copy(const struct rte_mbuf *m, struct rte_mempool *mp,
  */
 static inline void rte_pktmbuf_refcnt_update(struct rte_mbuf *m, int16_t v)
 {
-	__rte_mbuf_sanity_check(m, 1);
+	__rte_mbuf_verify(m, 1);
 
 	do {
 		rte_mbuf_refcnt_update(m, v);
@@ -1509,7 +1512,7 @@ static inline void rte_pktmbuf_refcnt_update(struct rte_mbuf *m, int16_t v)
  */
 static inline uint16_t rte_pktmbuf_headroom(const struct rte_mbuf *m)
 {
-	__rte_mbuf_sanity_check(m, 0);
+	__rte_mbuf_verify(m, 0);
 	return m->data_off;
 }
 
@@ -1523,7 +1526,7 @@ static inline uint16_t rte_pktmbuf_headroom(const struct rte_mbuf *m)
  */
 static inline uint16_t rte_pktmbuf_tailroom(const struct rte_mbuf *m)
 {
-	__rte_mbuf_sanity_check(m, 0);
+	__rte_mbuf_verify(m, 0);
 	return (uint16_t)(m->buf_len - rte_pktmbuf_headroom(m) -
 			  m->data_len);
 }
@@ -1538,7 +1541,7 @@ static inline uint16_t rte_pktmbuf_tailroom(const struct rte_mbuf *m)
  */
 static inline struct rte_mbuf *rte_pktmbuf_lastseg(struct rte_mbuf *m)
 {
-	__rte_mbuf_sanity_check(m, 1);
+	__rte_mbuf_verify(m, 1);
 	while (m->next != NULL)
 		m = m->next;
 	return m;
@@ -1582,7 +1585,7 @@ static inline struct rte_mbuf *rte_pktmbuf_lastseg(struct rte_mbuf *m)
 static inline char *rte_pktmbuf_prepend(struct rte_mbuf *m,
 					uint16_t len)
 {
-	__rte_mbuf_sanity_check(m, 1);
+	__rte_mbuf_verify(m, 1);
 
 	if (unlikely(len > rte_pktmbuf_headroom(m)))
 		return NULL;
@@ -1617,7 +1620,7 @@ static inline char *rte_pktmbuf_append(struct rte_mbuf *m, uint16_t len)
 	void *tail;
 	struct rte_mbuf *m_last;
 
-	__rte_mbuf_sanity_check(m, 1);
+	__rte_mbuf_verify(m, 1);
 
 	m_last = rte_pktmbuf_lastseg(m);
 	if (unlikely(len > rte_pktmbuf_tailroom(m_last)))
@@ -1645,7 +1648,7 @@ static inline char *rte_pktmbuf_append(struct rte_mbuf *m, uint16_t len)
  */
 static inline char *rte_pktmbuf_adj(struct rte_mbuf *m, uint16_t len)
 {
-	__rte_mbuf_sanity_check(m, 1);
+	__rte_mbuf_verify(m, 1);
 
 	if (unlikely(len > m->data_len))
 		return NULL;
@@ -1677,7 +1680,7 @@ static inline int rte_pktmbuf_trim(struct rte_mbuf *m, uint16_t len)
 {
 	struct rte_mbuf *m_last;
 
-	__rte_mbuf_sanity_check(m, 1);
+	__rte_mbuf_verify(m, 1);
 
 	m_last = rte_pktmbuf_lastseg(m);
 	if (unlikely(len > m_last->data_len))
@@ -1699,7 +1702,7 @@ static inline int rte_pktmbuf_trim(struct rte_mbuf *m, uint16_t len)
  */
 static inline int rte_pktmbuf_is_contiguous(const struct rte_mbuf *m)
 {
-	__rte_mbuf_sanity_check(m, 1);
+	__rte_mbuf_verify(m, 1);
 	return m->nb_segs == 1;
 }
 
diff --git a/lib/mbuf/version.map b/lib/mbuf/version.map
index daa65e2bbd..c85370e430 100644
--- a/lib/mbuf/version.map
+++ b/lib/mbuf/version.map
@@ -31,6 +31,7 @@ DPDK_24 {
 	rte_mbuf_set_platform_mempool_ops;
 	rte_mbuf_set_user_mempool_ops;
 	rte_mbuf_user_mempool_ops;
+	rte_mbuf_verify;
 	rte_pktmbuf_clone;
 	rte_pktmbuf_copy;
 	rte_pktmbuf_dump;
-- 
2.43.0
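
The split between the panicking check and the reason-returning check, plus the
temporary wrapper kept under the old name, can be pictured with a minimal
standalone sketch. It does not use DPDK headers: struct buf_sketch, buf_check(),
buf_verify(), buf_sanity_check() and SKETCH_DEPRECATED are hypothetical
stand-ins for struct rte_mbuf, rte_mbuf_check(), rte_mbuf_verify(),
rte_mbuf_sanity_check() and __rte_deprecated, and the three tests performed are
illustrative assumptions rather than the library's actual checks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct rte_mbuf; not the DPDK layout. */
struct buf_sketch {
	unsigned int refcnt;
	unsigned int nb_segs;
	const void *pool;
};

/* Assumption: GCC/Clang attribute spelling; expands to nothing elsewhere. */
#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_DEPRECATED __attribute__((deprecated))
#else
#define SKETCH_DEPRECATED
#endif

/* Non-fatal variant: returns 0 if consistent, else -1 and sets *reason. */
int
buf_check(const struct buf_sketch *b, int is_header, const char **reason)
{
	assert(reason != NULL);

	if (b == NULL) {
		*reason = "buffer is NULL";
		return -1;
	}
	if (b->pool == NULL) {
		*reason = "bad pool";
		return -1;
	}
	if (b->refcnt == 0) {
		*reason = "bad refcnt";
		return -1;
	}
	/* Segment count is only meaningful on the first buffer of a chain. */
	if (is_header && b->nb_segs == 0) {
		*reason = "bad nb_segs";
		return -1;
	}
	return 0;
}

/* Fatal variant: behaves like an assertion and aborts on inconsistency. */
void
buf_verify(const struct buf_sketch *b, int is_header)
{
	const char *reason;

	if (buf_check(b, is_header, &reason) != 0) {
		fprintf(stderr, "%s\n", reason);
		abort();
	}
}

/* Old name, kept as a thin wrapper so existing binaries still link. */
SKETCH_DEPRECATED void
buf_sanity_check(const struct buf_sketch *b, int is_header);

void
buf_sanity_check(const struct buf_sketch *b, int is_header)
{
	buf_verify(b, is_header);
}
```

Callers that want to handle a bad buffer themselves would use the
reason-returning form; the fatal form fits debug builds where stopping at the
first inconsistency is preferable.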


^ permalink raw reply	[relevance 2%]

* RE: [PATCH v2 0/3] cryptodev: add API to get used queue pair depth
  2024-05-29 10:43  0%   ` Anoob Joseph
@ 2024-05-30  9:19  0%     ` Akhil Goyal
  0 siblings, 0 replies; 200+ results
From: Akhil Goyal @ 2024-05-30  9:19 UTC (permalink / raw)
  To: Anoob Joseph, dev
  Cc: thomas, david.marchand, hemant.agrawal, pablo.de.lara.guarch,
	fiona.trahe, declan.doherty, matan, g.singh, fanzhang.oss,
	jianjay.zhou, asomalap, ruifeng.wang, konstantin.v.ananyev,
	radu.nicolau, ajit.khaparde, Nagadheeraj Rottela, ciara.power

> Subject: RE: [PATCH v2 0/3] cryptodev: add API to get used queue pair depth
> 
> >
> > Added a new fast path API to get the number of used crypto device queue pair
> > depth at any given point.
> >
> > An implementation in cnxk crypto driver is also added along with a test case in
> > test app.
> >
> > The addition of new API causes an ABI warning.
> > This is suppressed as the updated struct rte_crypto_fp_ops is an internal
> > structure and not to be used by application directly.
> >
> 
> Series Acked-by: Anoob Joseph <anoobj@marvell.com>
> 
Applied to dpdk-next-crypto


^ permalink raw reply	[relevance 0%]

* Re: [PATCH 2/2] eal: add Arm WFET in power management intrinsics
  @ 2024-06-04 15:41  3%   ` Stephen Hemminger
  0 siblings, 0 replies; 200+ results
From: Stephen Hemminger @ 2024-06-04 15:41 UTC (permalink / raw)
  To: Wathsala Vithanage
  Cc: Thomas Monjalon, Tyler Retzlaff, Ruifeng Wang, dev, nd,
	Dhruv Tripathi, Honnappa Nagarahalli, Jack Bond-Preston,
	Nick Connolly, Vinod Krishna

On Tue,  4 Jun 2024 04:44:01 +0000
Wathsala Vithanage <wathsala.vithanage@arm.com> wrote:

> --- a/lib/eal/arm/include/rte_cpuflags_64.h
> +++ b/lib/eal/arm/include/rte_cpuflags_64.h
> @@ -35,6 +35,7 @@ enum rte_cpu_flag_t {
>  	RTE_CPUFLAG_SVEF32MM,
>  	RTE_CPUFLAG_SVEF64MM,
>  	RTE_CPUFLAG_SVEBF16,
> +	RTE_CPUFLAG_WFXT,
>  	RTE_CPUFLAG_AARCH64,
>  };

Adding a new entry in the middle of the enum will cause the ABI to change.
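
To illustrate with a minimal, hypothetical sketch (the enumerator lists below
are trimmed stand-ins, not the real rte_cpu_flag_t): inserting an entry in the
middle renumbers every later enumerator, so a binary built against the old
header passes a value that now names a different flag. Appending after the
existing entries leaves the old values intact in this sketch; whether that is
acceptable for this particular enum is a separate question.

```c
#include <assert.h>

/* Hypothetical "before" layout, as an already-built application sees it. */
enum flag_v1 { V1_SVEBF16, V1_AARCH64 };

/* New entry inserted in the middle: AARCH64 moves from 1 to 2. */
enum flag_mid { MID_SVEBF16, MID_WFXT, MID_AARCH64 };

/* New entry appended at the end: existing values keep their numbers. */
enum flag_end { END_SVEBF16, END_AARCH64, END_WFXT };

/* Entries before the insertion point are unaffected either way. */
static_assert((int)V1_SVEBF16 == (int)MID_SVEBF16, "prefix is stable");

/* Nonzero if the old value for AARCH64 still means AARCH64. */
int
aarch64_unchanged_after_mid_insert(void)
{
	return (int)V1_AARCH64 == (int)MID_AARCH64;
}

int
aarch64_unchanged_after_append(void)
{
	return (int)V1_AARCH64 == (int)END_AARCH64;
}
```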

^ permalink raw reply	[relevance 3%]

* [PATCH 1/1] net/ena: restructure the llq policy user setting
  @ 2024-06-06 13:33  3% ` shaibran
  2024-07-05 17:32  4%   ` Ferruh Yigit
  0 siblings, 1 reply; 200+ results
From: shaibran @ 2024-06-06 13:33 UTC (permalink / raw)
  To: ferruh.yigit; +Cc: dev, Shai Brandes

From: Shai Brandes <shaibran@amazon.com>

Replaced `enable_llq`, `normal_llq_hdr` and `large_llq_hdr`
devargs with a new shared devarg named `llq_policy` that
implements the same logic and accepts the following values:
0 - Disable LLQ.
    Use with extreme caution as it leads to a huge performance
    degradation on AWS instances from 6th generation onwards.
1 - Accept device recommended LLQ policy (Default).
    Device can recommend normal or large LLQ policy.
2 - Enforce normal LLQ policy.
3 - Enforce large LLQ policy.
    Required for packets with headers that exceed 96 bytes on
    AWS instances prior to 5th generation.
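
For anyone migrating existing command lines, the mapping from the three old
devargs to the new value can be sketched as below. The precedence mirrors the
ena_define_llq_hdr_policy() helper that this patch removes; the enum and
function names are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Values follow the list above; the names are illustrative only. */
enum llq_policy_sketch {
	LLQ_DISABLED = 0,
	LLQ_RECOMMENDED = 1,
	LLQ_NORMAL = 2,
	LLQ_LARGE = 3,
};

static_assert(LLQ_LARGE == 3, "values follow the commit message");

/*
 * Translate the old boolean devargs into a single llq_policy value.
 * Precedence: disabled first, then large, then normal, else recommended.
 */
int
llq_policy_from_legacy(bool enable_llq, bool normal_llq_hdr, bool large_llq_hdr)
{
	if (!enable_llq)
		return LLQ_DISABLED;
	if (large_llq_hdr)
		return LLQ_LARGE;
	if (normal_llq_hdr)
		return LLQ_NORMAL;
	return LLQ_RECOMMENDED;
}
```

Under this mapping, an old setting of large_llq_hdr=1 corresponds to
llq_policy=3, and enable_llq=0 corresponds to llq_policy=0 regardless of the
other two.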

Signed-off-by: Shai Brandes <shaibran@amazon.com>
Reviewed-by: Amit Bernstein <amitbern@amazon.com>
---
 doc/guides/nics/ena.rst                |  25 ++----
 doc/guides/rel_notes/release_24_07.rst |   8 ++
 drivers/net/ena/ena_ethdev.c           | 104 +++++++++----------------
 drivers/net/ena/ena_ethdev.h           |   3 -
 4 files changed, 51 insertions(+), 89 deletions(-)

diff --git a/doc/guides/nics/ena.rst b/doc/guides/nics/ena.rst
index 2b105834a0..8f693ac3c9 100644
--- a/doc/guides/nics/ena.rst
+++ b/doc/guides/nics/ena.rst
@@ -107,15 +107,15 @@ Configuration
 Runtime Configuration
 ^^^^^^^^^^^^^^^^^^^^^
 
-   * **large_llq_hdr** (default 0)
+   * **llq_policy** (default 1)
 
-     Enables or disables usage of large LLQ headers. This option will have
-     effect only if the device also supports large LLQ headers. Otherwise, the
-     default value will be used.
-
-   * **normal_llq_hdr** (default 0)
-
-     Enforce normal LLQ policy.
+      Controls whether to use the device-recommended header policy or override it.
+      0 - Disable LLQ.
+         **Use with extreme caution as it leads to a huge performance
+           degradation on AWS instances from 6th generation onwards.**
+      1 - Accept device recommended LLQ policy (Default).
+      2 - Enforce normal LLQ policy.
+      3 - Enforce large LLQ policy.
 
    * **miss_txc_to** (default 5)
 
@@ -126,15 +126,6 @@ Runtime Configuration
      timer service. Setting this parameter to 0 disables this feature. Maximum
      allowed value is 60 seconds.
 
-   * **enable_llq** (default 1)
-
-     Determines whenever the driver should use the LLQ (if it's available) or
-     not.
-
-     **NOTE: On the 6th generation AWS instances disabling LLQ may lead to a
-     huge performance degradation. In general disabling LLQ is highly not
-     recommended!**
-
    * **control_poll_interval** (default 0)
 
      Enable polling-based functionality of the admin queues,
diff --git a/doc/guides/rel_notes/release_24_07.rst b/doc/guides/rel_notes/release_24_07.rst
index e68a53d757..1fa678864d 100644
--- a/doc/guides/rel_notes/release_24_07.rst
+++ b/doc/guides/rel_notes/release_24_07.rst
@@ -73,6 +73,12 @@ New Features
   ``bpf_obj_get()`` for an xskmap pinned (by the AF_XDP DP) inside the
   container.
 
+* **Updated Amazon ena (Elastic Network Adapter) net driver.**
+
+  * Modified the PMD API that controls the LLQ header policy.
+    Replaced ``enable_llq``, ``normal_llq_hdr`` and ``large_llq_hdr`` devargs
+    with a new shared devarg ``llq_policy`` that keeps the same logic.
+
 * **Update Tap PMD driver.**
 
   * Updated to support up to 8 queues when used by secondary process.
@@ -117,6 +123,8 @@ API Changes
    This section is a comment. Do not overwrite or remove it.
    Also, make sure to start the actual text at the margin.
    =======================================================
+* drivers/net/ena: Removed ``enable_llq``, ``normal_llq_hdr`` and ``large_llq_hdr`` devargs
+  and replaced them with a new shared devarg ``llq_policy`` that keeps the same logic.
 
 
 ABI Changes
diff --git a/drivers/net/ena/ena_ethdev.c b/drivers/net/ena/ena_ethdev.c
index 66fc287faf..e3c2696ae1 100644
--- a/drivers/net/ena/ena_ethdev.c
+++ b/drivers/net/ena/ena_ethdev.c
@@ -81,18 +81,27 @@ struct ena_stats {
 	ENA_STAT_ENTRY(stat, srd)
 
 /* Device arguments */
-#define ENA_DEVARG_LARGE_LLQ_HDR "large_llq_hdr"
-#define ENA_DEVARG_NORMAL_LLQ_HDR "normal_llq_hdr"
+
+/* Controls whether to disable LLQ, use the device-recommended header policy,
+ * or override the device recommendation.
+ * 0 - Disable LLQ.
+ *     Use with extreme caution as it leads to a huge performance
+ *     degradation on AWS instances from 6th generation onwards.
+ * 1 - Accept device recommended LLQ policy (Default).
+ *     Device can recommend normal or large LLQ policy.
+ * 2 - Enforce normal LLQ policy.
+ * 3 - Enforce large LLQ policy.
+ *     Required for packets with headers that exceed 96 bytes on
+ *     AWS instances prior to 5th generation.
+ */
+#define ENA_DEVARG_LLQ_POLICY "llq_policy"
+
+
 /* Timeout in seconds after which a single uncompleted Tx packet should be
  * considered as a missing.
  */
 #define ENA_DEVARG_MISS_TXC_TO "miss_txc_to"
-/*
- * Controls whether LLQ should be used (if available). Enabled by default.
- * NOTE: It's highly not recommended to disable the LLQ, as it may lead to a
- * huge performance degradation on 6th generation AWS instances.
- */
-#define ENA_DEVARG_ENABLE_LLQ "enable_llq"
+
 /*
  * Controls the period of time (in milliseconds) between two consecutive inspections of
  * the control queues when the driver is in poll mode and not using interrupts.
@@ -296,9 +305,9 @@ static int ena_xstats_get_by_id(struct rte_eth_dev *dev,
 				const uint64_t *ids,
 				uint64_t *values,
 				unsigned int n);
-static int ena_process_bool_devarg(const char *key,
-				   const char *value,
-				   void *opaque);
+static int ena_process_llq_policy_devarg(const char *key,
+					 const char *value,
+					 void *opaque);
 static int ena_parse_devargs(struct ena_adapter *adapter,
 			     struct rte_devargs *devargs);
 static void ena_copy_customer_metrics(struct ena_adapter *adapter,
@@ -314,7 +323,6 @@ static int ena_rx_queue_intr_disable(struct rte_eth_dev *dev,
 static int ena_configure_aenq(struct ena_adapter *adapter);
 static int ena_mp_primary_handle(const struct rte_mp_msg *mp_msg,
 				 const void *peer);
-static ena_llq_policy ena_define_llq_hdr_policy(struct ena_adapter *adapter);
 static bool ena_use_large_llq_hdr(struct ena_adapter *adapter, uint8_t recommended_entry_size);
 
 static const struct eth_dev_ops ena_dev_ops = {
@@ -2292,9 +2300,6 @@ static int eth_ena_dev_init(struct rte_eth_dev *eth_dev)
 
 	/* Assign default devargs values */
 	adapter->missing_tx_completion_to = ENA_TX_TIMEOUT;
-	adapter->enable_llq = true;
-	adapter->use_large_llq_hdr = false;
-	adapter->use_normal_llq_hdr = false;
 
 	/* Get user bypass */
 	rc = ena_parse_devargs(adapter, pci_dev->device.devargs);
@@ -2302,7 +2307,6 @@ static int eth_ena_dev_init(struct rte_eth_dev *eth_dev)
 		PMD_INIT_LOG(CRIT, "Failed to parse devargs\n");
 		goto err;
 	}
-	adapter->llq_header_policy = ena_define_llq_hdr_policy(adapter);
 
 	rc = ena_com_allocate_customer_metrics_buffer(ena_dev);
 	if (rc != 0) {
@@ -3736,44 +3740,29 @@ static int ena_process_uint_devarg(const char *key,
 	return 0;
 }
 
-static int ena_process_bool_devarg(const char *key,
-				   const char *value,
-				   void *opaque)
+static int ena_process_llq_policy_devarg(const char *key, const char *value, void *opaque)
 {
 	struct ena_adapter *adapter = opaque;
-	bool bool_value;
+	uint32_t policy;
 
-	/* Parse the value. */
-	if (strcmp(value, "1") == 0) {
-		bool_value = true;
-	} else if (strcmp(value, "0") == 0) {
-		bool_value = false;
+	policy = strtoul(value, NULL, DECIMAL_BASE);
+	if (policy < ENA_LLQ_POLICY_LAST) {
+		adapter->llq_header_policy = policy;
 	} else {
-		PMD_INIT_LOG(ERR,
-			"Invalid value: '%s' for key '%s'. Accepted: '0' or '1'\n",
-			value, key);
+		PMD_INIT_LOG(ERR, "Invalid value: '%s' for key '%s'. valid [0-3]\n", value, key);
 		return -EINVAL;
 	}
-
-	/* Now, assign it to the proper adapter field. */
-	if (strcmp(key, ENA_DEVARG_LARGE_LLQ_HDR) == 0)
-		adapter->use_large_llq_hdr = bool_value;
-	else if (strcmp(key, ENA_DEVARG_NORMAL_LLQ_HDR) == 0)
-		adapter->use_normal_llq_hdr = bool_value;
-	else if (strcmp(key, ENA_DEVARG_ENABLE_LLQ) == 0)
-		adapter->enable_llq = bool_value;
-
+	PMD_DRV_LOG(INFO,
+		"LLQ policy is %u [0 - disabled, 1 - device recommended, 2 - normal, 3 - large]\n",
+		adapter->llq_header_policy);
 	return 0;
 }
 
-static int ena_parse_devargs(struct ena_adapter *adapter,
-			     struct rte_devargs *devargs)
+static int ena_parse_devargs(struct ena_adapter *adapter, struct rte_devargs *devargs)
 {
 	static const char * const allowed_args[] = {
-		ENA_DEVARG_LARGE_LLQ_HDR,
-		ENA_DEVARG_NORMAL_LLQ_HDR,
+		ENA_DEVARG_LLQ_POLICY,
 		ENA_DEVARG_MISS_TXC_TO,
-		ENA_DEVARG_ENABLE_LLQ,
 		ENA_DEVARG_CONTROL_PATH_POLL_INTERVAL,
 		NULL,
 	};
@@ -3785,27 +3774,17 @@ static int ena_parse_devargs(struct ena_adapter *adapter,
 
 	kvlist = rte_kvargs_parse(devargs->args, allowed_args);
 	if (kvlist == NULL) {
-		PMD_INIT_LOG(ERR, "Invalid device arguments: %s\n",
-			devargs->args);
+		PMD_INIT_LOG(ERR, "Invalid device arguments: %s\n", devargs->args);
 		return -EINVAL;
 	}
-
-	rc = rte_kvargs_process(kvlist, ENA_DEVARG_LARGE_LLQ_HDR,
-		ena_process_bool_devarg, adapter);
-	if (rc != 0)
-		goto exit;
-	rc = rte_kvargs_process(kvlist, ENA_DEVARG_NORMAL_LLQ_HDR,
-		ena_process_bool_devarg, adapter);
+	rc = rte_kvargs_process(kvlist, ENA_DEVARG_LLQ_POLICY,
+		ena_process_llq_policy_devarg, adapter);
 	if (rc != 0)
 		goto exit;
 	rc = rte_kvargs_process(kvlist, ENA_DEVARG_MISS_TXC_TO,
 		ena_process_uint_devarg, adapter);
 	if (rc != 0)
 		goto exit;
-	rc = rte_kvargs_process(kvlist, ENA_DEVARG_ENABLE_LLQ,
-		ena_process_bool_devarg, adapter);
-	if (rc != 0)
-		goto exit;
 	rc = rte_kvargs_process(kvlist, ENA_DEVARG_CONTROL_PATH_POLL_INTERVAL,
 		ena_process_uint_devarg, adapter);
 	if (rc != 0)
@@ -4029,9 +4008,7 @@ RTE_PMD_REGISTER_PCI(net_ena, rte_ena_pmd);
 RTE_PMD_REGISTER_PCI_TABLE(net_ena, pci_id_ena_map);
 RTE_PMD_REGISTER_KMOD_DEP(net_ena, "* igb_uio | uio_pci_generic | vfio-pci");
 RTE_PMD_REGISTER_PARAM_STRING(net_ena,
-	ENA_DEVARG_LARGE_LLQ_HDR "=<0|1> "
-	ENA_DEVARG_NORMAL_LLQ_HDR "=<0|1> "
-	ENA_DEVARG_ENABLE_LLQ "=<0|1> "
+	ENA_DEVARG_LLQ_POLICY "=<0|1|2|3> "
 	ENA_DEVARG_MISS_TXC_TO "=<uint>"
 	ENA_DEVARG_CONTROL_PATH_POLL_INTERVAL "=<0-1000>");
 RTE_LOG_REGISTER_SUFFIX(ena_logtype_init, init, NOTICE);
@@ -4219,17 +4196,6 @@ ena_mp_primary_handle(const struct rte_mp_msg *mp_msg, const void *peer)
 	return rte_mp_reply(&mp_rsp, peer);
 }
 
-static ena_llq_policy ena_define_llq_hdr_policy(struct ena_adapter *adapter)
-{
-	if (!adapter->enable_llq)
-		return ENA_LLQ_POLICY_DISABLED;
-	if (adapter->use_large_llq_hdr)
-		return ENA_LLQ_POLICY_LARGE;
-	if (adapter->use_normal_llq_hdr)
-		return ENA_LLQ_POLICY_NORMAL;
-	return ENA_LLQ_POLICY_RECOMMENDED;
-}
-
 static bool ena_use_large_llq_hdr(struct ena_adapter *adapter, uint8_t recommended_entry_size)
 {
 	if (adapter->llq_header_policy == ENA_LLQ_POLICY_LARGE) {
diff --git a/drivers/net/ena/ena_ethdev.h b/drivers/net/ena/ena_ethdev.h
index 7d82d222ce..fe7d4a2d65 100644
--- a/drivers/net/ena/ena_ethdev.h
+++ b/drivers/net/ena/ena_ethdev.h
@@ -337,9 +337,6 @@ struct ena_adapter {
 	uint32_t active_aenq_groups;
 
 	bool trigger_reset;
-	bool enable_llq;
-	bool use_large_llq_hdr;
-	bool use_normal_llq_hdr;
 	ena_llq_policy llq_header_policy;
 
 	uint32_t last_tx_comp_qid;
-- 
2.17.1
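
One point a reviewer might raise about the new devarg handler: strtoul() called
with a NULL end pointer returns 0 for non-numeric input, so a value such as
"abc" would be read as policy 0, which disables LLQ. A stricter parse could
look like the standalone sketch below. parse_llq_policy() and
LLQ_POLICY_LAST_SKETCH are hypothetical names, and the upper bound of 4 is an
assumption taken from the documented range 0-3; this is not the driver's code.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Assumption: one past the highest valid policy value (0..3). */
#define LLQ_POLICY_LAST_SKETCH 4UL

/*
 * Parse a decimal policy value strictly.
 * Returns 0 and stores the result in *policy, or -EINVAL when the string
 * is empty, has no digits, has trailing characters, or is out of range.
 */
int
parse_llq_policy(const char *value, unsigned int *policy)
{
	char *end = NULL;
	unsigned long v;

	assert(policy != NULL);

	if (value == NULL || *value == '\0')
		return -EINVAL;

	errno = 0;
	v = strtoul(value, &end, 10);

	/* end == value: no digits consumed; *end != '\0': trailing junk. */
	if (errno != 0 || end == value || *end != '\0')
		return -EINVAL;

	/* Range check on the full-width value, before any narrowing. */
	if (v >= LLQ_POLICY_LAST_SKETCH)
		return -EINVAL;

	*policy = (unsigned int)v;
	return 0;
}
```

Checking the range on the unsigned long result before narrowing also avoids a
very large number wrapping into the valid range when stored in a 32-bit
variable.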


^ permalink raw reply	[relevance 3%]

* [PATCH v5 1/2] eventdev/dma: reorganize event DMA ops
  @ 2024-06-07 10:36  3% ` pbhagavatula
  2024-06-08  6:16  9%   ` Jerin Jacob
  0 siblings, 1 reply; 200+ results
From: pbhagavatula @ 2024-06-07 10:36 UTC (permalink / raw)
  To: jerinj, Amit Prakash Shukla, Vamsi Attunuru; +Cc: dev, Pavan Nikhilesh

From: Pavan Nikhilesh <pbhagavatula@marvell.com>

Re-organize event DMA ops structure to allow holding
source and destination pointers without the need for
additional memory. The mempool allocating memory for
rte_event_dma_adapter_ops can size the structure to
accommodate all the needed source and destination
pointers.

Add multiple words for holding user metadata, adapter
implementation specific metadata and event metadata.
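
The sizing idea can be sketched with a flexible array member, as below. This is
a standalone illustration and not the actual rte_event_dma_adapter_op layout:
struct sge_sketch, struct dma_op_sketch and the helper functions are
hypothetical, and placing the source entries before the destination entries is
an assumption based on how the test code in this patch indexes src_dst_seg.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct rte_dma_sge. */
struct sge_sketch {
	uint64_t addr;
	uint32_t length;
};

/* Hypothetical op: fixed header followed by all segments in one element. */
struct dma_op_sketch {
	uint64_t user_meta;
	uint16_t nb_src;
	uint16_t nb_dst;
	struct sge_sketch src_dst_seg[]; /* sources first, then destinations */
};

/* Element size a pool would need for ops with this many segments. */
size_t
dma_op_elt_size(uint16_t nb_src, uint16_t nb_dst)
{
	return sizeof(struct dma_op_sketch) +
		((size_t)nb_src + nb_dst) * sizeof(struct sge_sketch);
}

/* Allocate one zeroed op; calloc stands in for a mempool get here. */
struct dma_op_sketch *
dma_op_alloc(uint16_t nb_src, uint16_t nb_dst)
{
	struct dma_op_sketch *op = calloc(1, dma_op_elt_size(nb_src, nb_dst));

	if (op != NULL) {
		op->nb_src = nb_src;
		op->nb_dst = nb_dst;
	}
	return op;
}

/* First destination segment sits right after the last source segment. */
struct sge_sketch *
dma_op_dst(struct dma_op_sketch *op)
{
	assert(op != NULL);
	return &op->src_dst_seg[op->nb_src];
}
```

With one source and one destination, the element holds the header plus two
segment entries, so no separate allocation per segment is needed.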

Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Acked-by: Amit Prakash Shukla <amitprakashs@marvell.com>
---
 v5 Changes:
 - Update release notes with Experimental API changes.
 v4 Changes:
 - Reduce unrelated driver changes and move to 2/2.
 v3 Changes:
 - Fix stdatomic compilation.
 v2 Changes:
 - Fix 32bit compilation

 app/test-eventdev/test_perf_common.c        | 26 ++++--------
 app/test/test_event_dma_adapter.c           | 20 +++------
 doc/guides/prog_guide/event_dma_adapter.rst |  2 +-
 doc/guides/rel_notes/release_24_07.rst      |  3 ++
 drivers/dma/cnxk/cnxk_dmadev_fp.c           | 20 ++++-----
 lib/eventdev/rte_event_dma_adapter.c        | 27 ++++--------
 lib/eventdev/rte_event_dma_adapter.h        | 46 +++++++++++++++------
 7 files changed, 69 insertions(+), 75 deletions(-)

diff --git a/app/test-eventdev/test_perf_common.c b/app/test-eventdev/test_perf_common.c
index 93e6132de8..db0f9c1f3b 100644
--- a/app/test-eventdev/test_perf_common.c
+++ b/app/test-eventdev/test_perf_common.c
@@ -1503,7 +1503,6 @@ perf_event_dev_port_setup(struct evt_test *test, struct evt_options *opt,
 		prod = 0;
 		for (; port < perf_nb_event_ports(opt); port++) {
 			struct prod_data *p = &t->prod[port];
-			struct rte_event *response_info;
 			uint32_t flow_id;

 			p->dev_id = opt->dev_id;
@@ -1523,13 +1522,10 @@ perf_event_dev_port_setup(struct evt_test *test, struct evt_options *opt,
 			for (flow_id = 0; flow_id < t->nb_flows; flow_id++) {
 				rte_mempool_get(t->da_op_pool, (void **)&op);

-				op->src_seg = rte_malloc(NULL, sizeof(struct rte_dma_sge), 0);
-				op->dst_seg = rte_malloc(NULL, sizeof(struct rte_dma_sge), 0);
-
-				op->src_seg->addr = rte_pktmbuf_iova(rte_pktmbuf_alloc(pool));
-				op->dst_seg->addr = rte_pktmbuf_iova(rte_pktmbuf_alloc(pool));
-				op->src_seg->length = 1024;
-				op->dst_seg->length = 1024;
+				op->src_dst_seg[0].addr = rte_pktmbuf_iova(rte_pktmbuf_alloc(pool));
+				op->src_dst_seg[1].addr = rte_pktmbuf_iova(rte_pktmbuf_alloc(pool));
+				op->src_dst_seg[0].length = 1024;
+				op->src_dst_seg[1].length = 1024;
 				op->nb_src = 1;
 				op->nb_dst = 1;
 				op->flags = RTE_DMA_OP_FLAG_SUBMIT;
@@ -1537,12 +1533,6 @@ perf_event_dev_port_setup(struct evt_test *test, struct evt_options *opt,
 				op->dma_dev_id = dma_dev_id;
 				op->vchan = vchan_id;

-				response_info = (struct rte_event *)((uint8_t *)op +
-						 sizeof(struct rte_event_dma_adapter_op));
-				response_info->queue_id = p->queue_id;
-				response_info->sched_type = RTE_SCHED_TYPE_ATOMIC;
-				response_info->flow_id = flow_id;
-
 				p->da.dma_op[flow_id] = op;
 			}

@@ -2036,7 +2026,7 @@ perf_dmadev_setup(struct evt_test *test, struct evt_options *opt)
 		return -ENODEV;
 	}

-	elt_size = sizeof(struct rte_event_dma_adapter_op) + sizeof(struct rte_event);
+	elt_size = sizeof(struct rte_event_dma_adapter_op) + (sizeof(struct rte_dma_sge) * 2);
 	t->da_op_pool = rte_mempool_create("dma_op_pool", opt->pool_sz, elt_size, 256,
 					   0, NULL, NULL, NULL, NULL, rte_socket_id(), 0);
 	if (t->da_op_pool == NULL) {
@@ -2085,10 +2075,8 @@ perf_dmadev_destroy(struct evt_test *test, struct evt_options *opt)
 		for (flow_id = 0; flow_id < t->nb_flows; flow_id++) {
 			op = p->da.dma_op[flow_id];

-			rte_pktmbuf_free((struct rte_mbuf *)(uintptr_t)op->src_seg->addr);
-			rte_pktmbuf_free((struct rte_mbuf *)(uintptr_t)op->dst_seg->addr);
-			rte_free(op->src_seg);
-			rte_free(op->dst_seg);
+			rte_pktmbuf_free((struct rte_mbuf *)(uintptr_t)op->src_dst_seg[0].addr);
+			rte_pktmbuf_free((struct rte_mbuf *)(uintptr_t)op->src_dst_seg[1].addr);
 			rte_mempool_put(op->op_mp, op);
 		}

diff --git a/app/test/test_event_dma_adapter.c b/app/test/test_event_dma_adapter.c
index 35b417b69f..d9dff4ff7d 100644
--- a/app/test/test_event_dma_adapter.c
+++ b/app/test/test_event_dma_adapter.c
@@ -235,7 +235,6 @@ test_op_forward_mode(void)
 	struct rte_mbuf *dst_mbuf[TEST_MAX_OP];
 	struct rte_event_dma_adapter_op *op;
 	struct rte_event ev[TEST_MAX_OP];
-	struct rte_event response_info;
 	int ret, i;

 	ret = rte_pktmbuf_alloc_bulk(params.src_mbuf_pool, src_mbuf, TEST_MAX_OP);
@@ -253,14 +252,11 @@ test_op_forward_mode(void)
 		rte_mempool_get(params.op_mpool, (void **)&op);
 		TEST_ASSERT_NOT_NULL(op, "Failed to allocate dma operation struct\n");

-		op->src_seg = rte_malloc(NULL, sizeof(struct rte_dma_sge), 0);
-		op->dst_seg = rte_malloc(NULL, sizeof(struct rte_dma_sge), 0);
-
 		/* Update Op */
-		op->src_seg->addr = rte_pktmbuf_iova(src_mbuf[i]);
-		op->dst_seg->addr = rte_pktmbuf_iova(dst_mbuf[i]);
-		op->src_seg->length = PACKET_LENGTH;
-		op->dst_seg->length = PACKET_LENGTH;
+		op->src_dst_seg[0].addr = rte_pktmbuf_iova(src_mbuf[i]);
+		op->src_dst_seg[1].addr = rte_pktmbuf_iova(dst_mbuf[i]);
+		op->src_dst_seg[0].length = PACKET_LENGTH;
+		op->src_dst_seg[1].length = PACKET_LENGTH;
 		op->nb_src = 1;
 		op->nb_dst = 1;
 		op->flags = RTE_DMA_OP_FLAG_SUBMIT;
@@ -268,10 +264,6 @@ test_op_forward_mode(void)
 		op->dma_dev_id = TEST_DMA_DEV_ID;
 		op->vchan = TEST_DMA_VCHAN_ID;

-		response_info.event = dma_response_info.event;
-		rte_memcpy((uint8_t *)op + sizeof(struct rte_event_dma_adapter_op), &response_info,
-			   sizeof(struct rte_event));
-
 		/* Fill in event info and update event_ptr with rte_event_dma_adapter_op */
 		memset(&ev[i], 0, sizeof(struct rte_event));
 		ev[i].event = 0;
@@ -294,8 +286,6 @@ test_op_forward_mode(void)

 		TEST_ASSERT_EQUAL(ret, 0, "Data mismatch for dma adapter\n");

-		rte_free(op->src_seg);
-		rte_free(op->dst_seg);
 		rte_mempool_put(op->op_mp, op);
 	}

@@ -400,7 +390,7 @@ configure_dmadev(void)
 						       rte_socket_id());
 	RTE_TEST_ASSERT_NOT_NULL(params.dst_mbuf_pool, "Can't create DMA_DST_MBUFPOOL\n");

-	elt_size = sizeof(struct rte_event_dma_adapter_op) + sizeof(struct rte_event);
+	elt_size = sizeof(struct rte_event_dma_adapter_op) + (sizeof(struct rte_dma_sge) * 2);
 	params.op_mpool = rte_mempool_create("EVENT_DMA_OP_POOL", DMA_OP_POOL_SIZE, elt_size, 0,
 					     0, NULL, NULL, NULL, NULL, rte_socket_id(), 0);
 	RTE_TEST_ASSERT_NOT_NULL(params.op_mpool, "Can't create DMA_OP_POOL\n");
diff --git a/doc/guides/prog_guide/event_dma_adapter.rst b/doc/guides/prog_guide/event_dma_adapter.rst
index 3443b6a803..1fb9b0a07b 100644
--- a/doc/guides/prog_guide/event_dma_adapter.rst
+++ b/doc/guides/prog_guide/event_dma_adapter.rst
@@ -144,7 +144,7 @@ on which it enqueues events towards the DMA adapter using ``rte_event_enqueue_bu
    uint32_t cap;
    int ret;

-   /* Fill in event info and update event_ptr with rte_dma_op */
+   /* Fill in event info and update event_ptr with rte_event_dma_adapter_op */
    memset(&ev, 0, sizeof(ev));
    .
    .
diff --git a/doc/guides/rel_notes/release_24_07.rst b/doc/guides/rel_notes/release_24_07.rst
index a69f24cf99..7800cb4c31 100644
--- a/doc/guides/rel_notes/release_24_07.rst
+++ b/doc/guides/rel_notes/release_24_07.rst
@@ -84,6 +84,9 @@ API Changes
    Also, make sure to start the actual text at the margin.
    =======================================================

+* eventdev: Reorganize the fastpath structure ``rte_event_dma_adapter_op``
+  to optimize the memory layout and improve performance.
+

 ABI Changes
 -----------
diff --git a/drivers/dma/cnxk/cnxk_dmadev_fp.c b/drivers/dma/cnxk/cnxk_dmadev_fp.c
index f6562b603e..8a3c0c1008 100644
--- a/drivers/dma/cnxk/cnxk_dmadev_fp.c
+++ b/drivers/dma/cnxk/cnxk_dmadev_fp.c
@@ -490,8 +490,8 @@ cn10k_dma_adapter_enqueue(void *ws, struct rte_event ev[], uint16_t nb_events)
 		hdr[1] = ((uint64_t)comp_ptr);
 		hdr[2] = cnxk_dma_adapter_format_event(rsp_info->event);

-		src = &op->src_seg[0];
-		dst = &op->dst_seg[0];
+		src = &op->src_dst_seg[0];
+		dst = &op->src_dst_seg[op->nb_src];

 		if (CNXK_TAG_IS_HEAD(work->gw_rdata) ||
 		    ((CNXK_TT_FROM_TAG(work->gw_rdata) == SSO_TT_ORDERED) &&
@@ -566,12 +566,12 @@ cn9k_dma_adapter_dual_enqueue(void *ws, struct rte_event ev[], uint16_t nb_event
 		 * For all other cases, src pointers are first pointers.
 		 */
 		if (((dpi_conf->cmd.u >> 48) & DPI_HDR_XTYPE_MASK) == DPI_XTYPE_INBOUND) {
-			fptr = &op->dst_seg[0];
-			lptr = &op->src_seg[0];
+			fptr = &op->src_dst_seg[nb_src];
+			lptr = &op->src_dst_seg[0];
 			RTE_SWAP(nb_src, nb_dst);
 		} else {
-			fptr = &op->src_seg[0];
-			lptr = &op->dst_seg[0];
+			fptr = &op->src_dst_seg[0];
+			lptr = &op->src_dst_seg[nb_src];
 		}

 		hdr[0] = ((uint64_t)nb_dst << 54) | (uint64_t)nb_src << 48;
@@ -647,12 +647,12 @@ cn9k_dma_adapter_enqueue(void *ws, struct rte_event ev[], uint16_t nb_events)
 		 * For all other cases, src pointers are first pointers.
 		 */
 		if (((dpi_conf->cmd.u >> 48) & DPI_HDR_XTYPE_MASK) == DPI_XTYPE_INBOUND) {
-			fptr = &op->dst_seg[0];
-			lptr = &op->src_seg[0];
+			fptr = &op->src_dst_seg[nb_src];
+			lptr = &op->src_dst_seg[0];
 			RTE_SWAP(nb_src, nb_dst);
 		} else {
-			fptr = &op->src_seg[0];
-			lptr = &op->dst_seg[0];
+			fptr = &op->src_dst_seg[0];
+			lptr = &op->src_dst_seg[nb_src];
 		}

 		hdr[0] = ((uint64_t)nb_dst << 54) | (uint64_t)nb_src << 48;
diff --git a/lib/eventdev/rte_event_dma_adapter.c b/lib/eventdev/rte_event_dma_adapter.c
index 24dff556db..e52ef46a1b 100644
--- a/lib/eventdev/rte_event_dma_adapter.c
+++ b/lib/eventdev/rte_event_dma_adapter.c
@@ -236,9 +236,9 @@ edma_circular_buffer_flush_to_dma_dev(struct event_dma_adapter *adapter,
 				      uint16_t vchan, uint16_t *nb_ops_flushed)
 {
 	struct rte_event_dma_adapter_op *op;
-	struct dma_vchan_info *tq;
 	uint16_t *head = &bufp->head;
 	uint16_t *tail = &bufp->tail;
+	struct dma_vchan_info *tq;
 	uint16_t n;
 	uint16_t i;
 	int ret;
@@ -257,11 +257,13 @@ edma_circular_buffer_flush_to_dma_dev(struct event_dma_adapter *adapter,
 	for (i = 0; i < n; i++)	{
 		op = bufp->op_buffer[*head];
 		if (op->nb_src == 1 && op->nb_dst == 1)
-			ret = rte_dma_copy(dma_dev_id, vchan, op->src_seg->addr, op->dst_seg->addr,
-					   op->src_seg->length, op->flags);
+			ret = rte_dma_copy(dma_dev_id, vchan, op->src_dst_seg[0].addr,
+					   op->src_dst_seg[1].addr, op->src_dst_seg[0].length,
+					   op->flags);
 		else
-			ret = rte_dma_copy_sg(dma_dev_id, vchan, op->src_seg, op->dst_seg,
-					      op->nb_src, op->nb_dst, op->flags);
+			ret = rte_dma_copy_sg(dma_dev_id, vchan, &op->src_dst_seg[0],
+					      &op->src_dst_seg[op->nb_src], op->nb_src, op->nb_dst,
+					      op->flags);
 		if (ret < 0)
 			break;

@@ -511,8 +513,7 @@ edma_enq_to_dma_dev(struct event_dma_adapter *adapter, struct rte_event *ev, uns
 		if (dma_op == NULL)
 			continue;

-		/* Expected to have response info appended to dma_op. */
-
+		dma_op->impl_opaque[0] = ev[i].event;
 		dma_dev_id = dma_op->dma_dev_id;
 		vchan = dma_op->vchan;
 		vchan_qinfo = &adapter->dma_devs[dma_dev_id].vchanq[vchan];
@@ -647,7 +648,6 @@ edma_ops_enqueue_burst(struct event_dma_adapter *adapter, struct rte_event_dma_a
 	uint8_t event_port_id = adapter->event_port_id;
 	uint8_t event_dev_id = adapter->eventdev_id;
 	struct rte_event events[DMA_BATCH_SIZE];
-	struct rte_event *response_info;
 	uint16_t nb_enqueued, nb_ev;
 	uint8_t retry;
 	uint8_t i;
@@ -659,16 +659,7 @@ edma_ops_enqueue_burst(struct event_dma_adapter *adapter, struct rte_event_dma_a
 	for (i = 0; i < num; i++) {
 		struct rte_event *ev = &events[nb_ev++];

-		/* Expected to have response info appended to dma_op. */
-		response_info = (struct rte_event *)((uint8_t *)ops[i] +
-							  sizeof(struct rte_event_dma_adapter_op));
-		if (unlikely(response_info == NULL)) {
-			if (ops[i] != NULL && ops[i]->op_mp != NULL)
-				rte_mempool_put(ops[i]->op_mp, ops[i]);
-			continue;
-		}
-
-		rte_memcpy(ev, response_info, sizeof(struct rte_event));
+		ev->event = ops[i]->impl_opaque[0];
 		ev->event_ptr = ops[i];
 		ev->event_type = RTE_EVENT_TYPE_DMADEV;
 		if (adapter->implicit_release_disabled)
diff --git a/lib/eventdev/rte_event_dma_adapter.h b/lib/eventdev/rte_event_dma_adapter.h
index e924ab673d..048ddba3f3 100644
--- a/lib/eventdev/rte_event_dma_adapter.h
+++ b/lib/eventdev/rte_event_dma_adapter.h
@@ -157,24 +157,46 @@ extern "C" {
  * instance.
  */
 struct rte_event_dma_adapter_op {
-	struct rte_dma_sge *src_seg;
-	/**< Source segments. */
-	struct rte_dma_sge *dst_seg;
-	/**< Destination segments. */
-	uint16_t nb_src;
-	/**< Number of source segments. */
-	uint16_t nb_dst;
-	/**< Number of destination segments. */
 	uint64_t flags;
 	/**< Flags related to the operation.
 	 * @see RTE_DMA_OP_FLAG_*
 	 */
-	int16_t dma_dev_id;
-	/**< DMA device ID to be used */
-	uint16_t vchan;
-	/**< DMA vchan ID to be used */
 	struct rte_mempool *op_mp;
 	/**< Mempool from which op is allocated. */
+	enum rte_dma_status_code status;
+	/**< Status code for this operation. */
+	uint32_t rsvd;
+	/**< Reserved for future use. */
+	uint64_t impl_opaque[2];
+	/**< Implementation-specific opaque data.
+	 * An dma device implementation use this field to hold
+	 * implementation specific values to share between dequeue and enqueue
+	 * operations.
+	 * The application should not modify this field.
+	 */
+	uint64_t user_meta;
+	/**<  Memory to store user specific metadata.
+	 * The dma device implementation should not modify this area.
+	 */
+	uint64_t event_meta;
+	/**< Event metadata that defines event attributes when used in OP_NEW mode.
+	 * @see rte_event_dma_adapter_mode::RTE_EVENT_DMA_ADAPTER_OP_NEW
+	 * @see struct rte_event::event
+	 */
+	int16_t dma_dev_id;
+	/**< DMA device ID to be used with OP_FORWARD mode.
+	 * @see rte_event_dma_adapter_mode::RTE_EVENT_DMA_ADAPTER_OP_FORWARD
+	 */
+	uint16_t vchan;
+	/**< DMA vchan ID to be used with OP_FORWARD mode
+	 * @see rte_event_dma_adapter_mode::RTE_EVENT_DMA_ADAPTER_OP_FORWARD
+	 */
+	uint16_t nb_src;
+	/**< Number of source segments. */
+	uint16_t nb_dst;
+	/**< Number of destination segments. */
+	struct rte_dma_sge src_dst_seg[0];
+	/**< Source and destination segments. */
 };

 /**
--
2.25.1


^ permalink raw reply	[relevance 3%]

* Re: [PATCH v5 1/2] eventdev/dma: reorganize event DMA ops
  2024-06-07 10:36  3% ` [PATCH v5 " pbhagavatula
@ 2024-06-08  6:16  9%   ` Jerin Jacob
  0 siblings, 0 replies; 200+ results
From: Jerin Jacob @ 2024-06-08  6:16 UTC (permalink / raw)
  To: pbhagavatula; +Cc: jerinj, Amit Prakash Shukla, Vamsi Attunuru, dev

On Fri, Jun 7, 2024 at 11:53 PM <pbhagavatula@marvell.com> wrote:
>
> From: Pavan Nikhilesh <pbhagavatula@marvell.com>
>
> Re-organize event DMA ops structure to allow holding
> source and destination pointers without the need for
> additional memory, the mempool allocating memory for
> rte_event_dma_adapter_ops can size the structure to
> accommodate all the needed source and destination
> pointers.
>
> Add multiple words for holding user metadata, adapter
> implementation specific metadata and event metadata.
>
> Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
> Acked-by: Amit Prakash Shukla <amitprakashs@marvell.com>
> ---
>  v5 Changes:
>  - Update release notes with Experimental API changes.
>  v4 Changes:
>  - Reduce unreleated driver changes and move to 2/2.
>  v3 Changes:
>  - Fix stdatomic compilation.
>  v2 Changes:
>  - Fix 32bit compilation
>

>     .
> diff --git a/doc/guides/rel_notes/release_24_07.rst b/doc/guides/rel_notes/release_24_07.rst
> index a69f24cf99..7800cb4c31 100644
> --- a/doc/guides/rel_notes/release_24_07.rst
> +++ b/doc/guides/rel_notes/release_24_07.rst
> @@ -84,6 +84,9 @@ API Changes

It is not an API change. Applied the following diff and applied the series to
dpdk-next-eventdev/for-main. Thanks


[for-main][dpdk-next-eventdev] $ git diff
diff --git a/doc/guides/rel_notes/release_24_07.rst b/doc/guides/rel_notes/release_24_07.rst
index 09e58dddf2..14bd5d37b1 100644
--- a/doc/guides/rel_notes/release_24_07.rst
+++ b/doc/guides/rel_notes/release_24_07.rst
@@ -91,9 +91,6 @@ API Changes
    Also, make sure to start the actual text at the margin.
    =======================================================

-* eventdev: Reorganize the fastpath structure ``rte_event_dma_adapter_op``
-  to optimize the memory layout and improve performance.
-

 ABI Changes
 -----------
@@ -112,6 +109,9 @@ ABI Changes

 * No ABI change that would break compatibility with 23.11.

+* eventdev/dma: Reorganize the experimental fastpath structure ``rte_event_dma_adapter_op``
+  to optimize the memory layout and improve performance.
+

>     Also, make sure to start the actual text at the margin.
>     =======================================================
>
> +* eventdev: Reorganize the fastpath structure ``rte_event_dma_adapter_op``
> +  to optimize the memory layout and improve performance.
> +
>
>  ABI Changes

^ permalink raw reply	[relevance 9%]

* Re: [PATCH v8 2/3] ethdev: add VXLAN last reserved field
  @ 2024-06-11 14:52  3%     ` Ferruh Yigit
  2024-06-12  1:25  0%       ` rongwei liu
  0 siblings, 1 reply; 200+ results
From: Ferruh Yigit @ 2024-06-11 14:52 UTC (permalink / raw)
  To: Rongwei Liu, dev, matan, viacheslavo, orika, suanmingm, thomas
  Cc: Dariusz Sosnowski, Aman Singh, Yuying Zhang, Andrew Rybchenko

On 6/7/2024 3:02 PM, Rongwei Liu wrote:
> diff --git a/lib/net/rte_vxlan.h b/lib/net/rte_vxlan.h
> index 997fc784fc..57300fb442 100644
> --- a/lib/net/rte_vxlan.h
> +++ b/lib/net/rte_vxlan.h
> @@ -41,7 +41,10 @@ struct rte_vxlan_hdr {
>  			uint8_t    flags;    /**< Should be 8 (I flag). */
>  			uint8_t    rsvd0[3]; /**< Reserved. */
>  			uint8_t    vni[3];   /**< VXLAN identifier. */
> -			uint8_t    rsvd1;    /**< Reserved. */
> +			union {
> +				uint8_t    rsvd1;        /**< Reserved. */
> +				uint8_t    last_rsvd;    /**< Reserved. */
> +			};
>

Is there a plan to remove 'rsvd1' in the next ABI-breaking release?
We can keep both, but I guess it is not logically necessary to keep it;
to prevent bloat over time, we can remove the old one.
If we decide to remove it, sending a 'deprecation.rst' update helps us
remember to do it.
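
For reference, the effect of the anonymous union in the quoted hunk can be
sketched as follows. The struct is a hypothetical, simplified stand-in for
the tail of the header (it is not the real struct rte_vxlan_hdr), and it
assumes a C11 compiler for anonymous unions and static_assert.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of the last 4 bytes of a VXLAN header. */
struct vxlan_tail_sketch {
	uint8_t vni[3];
	union {
		uint8_t rsvd1;     /* old name, kept for source compatibility */
		uint8_t last_rsvd; /* new name for the same byte */
	};
};

/* The layout is unchanged: both names alias the byte after vni[]. */
static_assert(sizeof(struct vxlan_tail_sketch) == 4, "size must stay 4 bytes");
static_assert(offsetof(struct vxlan_tail_sketch, rsvd1) ==
	      offsetof(struct vxlan_tail_sketch, last_rsvd),
	      "both names must share one offset");

/* Write through the new name, read back through the old one. */
static uint8_t
vxlan_tail_sketch_roundtrip(uint8_t value)
{
	struct vxlan_tail_sketch t = { .vni = {0} };

	t.last_rsvd = value;
	return t.rsvd1;
}
```

Dropping 'rsvd1' later would leave the size and offset as they are; only
code that still uses the old name would stop compiling, which is why a
deprecation notice is useful.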



^ permalink raw reply	[relevance 3%]

* Re: [RFC 0/2] ethdev: update GENEVE option item structure
  @ 2024-06-11 17:07  4% ` Ferruh Yigit
  0 siblings, 0 replies; 200+ results
From: Ferruh Yigit @ 2024-06-11 17:07 UTC (permalink / raw)
  To: Michael Baum, dev; +Cc: Dariusz Sosnowski, Thomas Monjalon, Ori Kam

On 4/17/2024 8:23 AM, Michael Baum wrote:
> The "rte_flow_item_geneve_opt" structure describes the GENEVE TLV option
> header according to RFC 8926 [1]:
> 
> struct rte_flow_item_geneve_opt {
> 	rte_be16_t option_class;
> 	uint8_t option_type;
> 	uint8_t option_len;
> 	uint32_t *data;
> };
> 
> The "option_len" field is used for two different purposes:
>  1. item field for matching with value/mask.
>  2. descriptor for data array size.
> 

For the long-run solution, we may consider adding a GENEVE option header
to net/rte_geneve.h and making "struct rte_flow_item_geneve_opt" + data size?

> Those two different purposes might limit each other. For example, when
> matching on length with full mask (0x1f), the data array in the mask
> structure might be taken as size 31 and read invalid memory.
> 
> This problem appears in conversion API. In current implementation, the
> "rte_flow_conv" API copies the "rte_flow_item_geneve_opt" structure
> without taking care about data deep-copy. The attempt to solve this
> revealed the problem in determining the size of the mask data array. To
> resolve this issue, two solutions are suggested.
> 

Are we having this problem only with geneve options because data size is
not fixed / defined for the header?

> Immediate Workaround:
> The data array size in the "mask" structure is determined by
> "option_len" field in the "spec" structure. This workaround can be
> integrated soon to avoid deep-copy missing.
> 

This requires a GENEVE-specific pointer in the item spec, which is not
really nice, although it is a temporary solution. Perhaps we can skip
this; can you please check the comment below?

> Long Run Solution:
> Add a new field into "rte_flow_item_geneve_opt" structure regardless to
> "option_len" field. This solution should wait to "24.11" version since
> it contains API change.
>

I was expecting the same, but CI seems to have passed the ABI test case [1];
it may be because the new field is appended at the end of the struct.
Can you please double check? If the ABI is not broken, we can go with this
solution directly.

[1]
https://mails.dpdk.org/archives/test-report/2024-April/643570.html

> When the API is changed, I'll take the opportunity to add documentation
> for this item in "rte_flow.rst" file and update the data type to
> "rte_be32_t".
> 

If we can go with updating the struct in this release, adding the protocol
option struct to the net library can wait for the v24.11 release.
So the "rte_be32_t" type change in this struct won't be a thing.

> [1] https://datatracker.ietf.org/doc/html/rfc8926
> 
> Michael Baum (2):
>   ethdev: fix GENEVE option item conversion
>   ethdev: add data size field to GENEVE option item
> 
> 

@Ori, Can you please help reviewing this patch?
At worst, it can be good to address the fix in this release.
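
The "immediate workaround" discussed above, where the size of the mask data
array is taken from the spec's "option_len" rather than from the mask's own
value, can be sketched as below. This is a hedged illustration only: the
struct is a hypothetical stand-in that mirrors the fields quoted in the RFC
(byte order is ignored), the helper name is invented, and it assumes
"option_len" counts 32-bit words of "data".

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in mirroring the quoted rte_flow_item_geneve_opt fields. */
struct geneve_opt_sketch {
	uint16_t option_class;
	uint8_t option_type;
	uint8_t option_len; /* assumed: number of 32-bit words in data[] */
	uint32_t *data;
};

/*
 * Bytes of option data to deep-copy for spec AND mask.
 * The length always comes from the spec, so a full mask on option_len
 * (0x1f) is never misread as a 31-word data array.
 */
static size_t
geneve_opt_sketch_data_bytes(const struct geneve_opt_sketch *spec)
{
	if (spec == NULL || spec->data == NULL)
		return 0;
	return (size_t)spec->option_len * sizeof(uint32_t);
}
```

A dedicated data-size field, as in the long-run proposal, would remove the
need to consult the spec when copying the mask.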


^ permalink raw reply	[relevance 4%]

* RE: [PATCH v8 2/3] ethdev: add VXLAN last reserved field
  2024-06-11 14:52  3%     ` Ferruh Yigit
@ 2024-06-12  1:25  0%       ` rongwei liu
  2024-06-25 14:46  0%         ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: rongwei liu @ 2024-06-12  1:25 UTC (permalink / raw)
  To: Ferruh Yigit, dev, Matan Azrad, Slava Ovsiienko, Ori Kam,
	Suanming Mou, NBU-Contact-Thomas Monjalon (EXTERNAL),
	Andrew Rybchenko
  Cc: Dariusz Sosnowski, Aman Singh, Yuying Zhang



BR
Rongwei

> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@amd.com>
> Sent: Tuesday, June 11, 2024 22:53
> To: rongwei liu <rongweil@nvidia.com>; dev@dpdk.org; Matan Azrad
> <matan@nvidia.com>; Slava Ovsiienko <viacheslavo@nvidia.com>; Ori Kam
> <orika@nvidia.com>; Suanming Mou <suanmingm@nvidia.com>; NBU-
> Contact-Thomas Monjalon (EXTERNAL) <thomas@monjalon.net>
> Cc: Dariusz Sosnowski <dsosnowski@nvidia.com>; Aman Singh
> <aman.deep.singh@intel.com>; Yuying Zhang <yuying.zhang@intel.com>;
> Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> Subject: Re: [PATCH v8 2/3] ethdev: add VXLAN last reserved field
> 
> On 6/7/2024 3:02 PM, Rongwei Liu wrote:
> > diff --git a/lib/net/rte_vxlan.h b/lib/net/rte_vxlan.h index
> > 997fc784fc..57300fb442 100644
> > --- a/lib/net/rte_vxlan.h
> > +++ b/lib/net/rte_vxlan.h
> > @@ -41,7 +41,10 @@ struct rte_vxlan_hdr {
> >                       uint8_t    flags;    /**< Should be 8 (I flag). */
> >                       uint8_t    rsvd0[3]; /**< Reserved. */
> >                       uint8_t    vni[3];   /**< VXLAN identifier. */
> > -                     uint8_t    rsvd1;    /**< Reserved. */
> > +                     union {
> > +                             uint8_t    rsvd1;        /**< Reserved. */
> > +                             uint8_t    last_rsvd;    /**< Reserved. */
> > +                     };
> >
> 
> Is there a plan to remove 'rsvd1' in next ABI break release?
> We can keep both, but I guess it is not logically necessary to keep it, to prevent
> bloat by time, we can remove the old one.
> If decided to remove, sending a 'deprecation.rst' update helps us to remember
> doing it.
> 
I think it should. @NBU-Contact-Thomas Monjalon (EXTERNAL) @Andrew Rybchenko @Ori Kam what do you think?

^ permalink raw reply	[relevance 0%]

* [PATCH v2 0/2] power: introduce PM QoS interface
  @ 2024-06-13 11:20  4% ` Huisong Li
  2024-06-13 11:20  5%   ` [PATCH v2 1/2] power: introduce PM QoS API on CPU wide Huisong Li
  2024-06-19  6:31  4% ` [PATCH v3 0/2] power: introduce PM QoS interface Huisong Li
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 200+ results
From: Huisong Li @ 2024-06-13 11:20 UTC (permalink / raw)
  To: dev
  Cc: mb, thomas, ferruh.yigit, anatoly.burakov, david.hunt,
	sivaprasad.tummala, liuyonglong, lihuisong

The deeper the idle state, the lower the power consumption, but the longer
the resume time. Some services are delay sensitive and expect a very low
resume time, such as the interrupt packet receiving mode.

The "/sys/devices/system/cpu/cpuX/power/pm_qos_resume_latency_us" sysfs
interface lets userspace set and get the resume latency limit on cpuX.
Please see the description in the kernel documentation [1].
Each cpuidle governor in Linux selects which idle state to enter based on
this CPU resume latency in its idle task.

The per-CPU PM QoS API can be used to control this CPU's idle state
selection and, by setting a strict resume latency (zero value), to limit it
to the shallowest idle state in order to lower the delay after sleep.

[1] https://www.kernel.org/doc/html/latest/admin-guide/abi-testing.html?highlight=pm_qos_resume_latency_us#abi-sys-devices-power-pm-qos-resume-latency-us
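
The value mapping that patch 1/2 applies between the API latency and the
string exchanged with the sysfs file can be sketched as below. This is a
standalone illustration, not the patch code: the `qos_sketch_*` names are
hypothetical, and the mapping simply follows the comments in
rte_power_qos.c ("n/a" for a strict zero latency, "0" for no constraint).

```c
#include <errno.h>
#include <limits.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Same value as PM_QOS_RESUME_LATENCY_NO_CONSTRAINT in the patch. */
#define QOS_SKETCH_NO_CONSTRAINT ((int)(UINT32_MAX >> 1))

/* API latency (microseconds) -> string to write to the sysfs file. */
static int
qos_sketch_format(int latency, char *buf, size_t len)
{
	int n;

	if (latency < 0)
		return -1;
	if (latency == 0)
		n = snprintf(buf, len, "n/a"); /* strict: no resume latency */
	else if (latency == QOS_SKETCH_NO_CONSTRAINT)
		n = snprintf(buf, len, "0");   /* kernel: no constraint */
	else
		n = snprintf(buf, len, "%d", latency);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* String read from the sysfs file -> API latency, or -1 on bad input. */
static int
qos_sketch_parse(const char *s)
{
	char *end;
	long v;

	/* Compare a prefix only: a sysfs read may carry a trailing newline. */
	if (strncmp(s, "n/a", 3) == 0)
		return 0;

	errno = 0;
	v = strtol(s, &end, 10);
	if (end == s || errno == ERANGE || v < 0 || v > INT_MAX)
		return -1;

	return v == 0 ? QOS_SKETCH_NO_CONSTRAINT : (int)v;
}
```

Keeping the mapping in two small pure functions like this makes the
inverted meaning of "0" and "n/a" easy to unit test without touching sysfs.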

Huisong Li (2):
  power: introduce PM QoS API on CPU wide
  examples/l3fwd-power: add PM QoS configuration

 doc/guides/prog_guide/power_man.rst    |  22 +++++
 doc/guides/rel_notes/release_24_07.rst |   4 +
 examples/l3fwd-power/main.c            |  29 +++++++
 lib/power/meson.build                  |   2 +
 lib/power/rte_power_qos.c              | 116 +++++++++++++++++++++++++
 lib/power/rte_power_qos.h              |  70 +++++++++++++++
 lib/power/version.map                  |   2 +
 7 files changed, 245 insertions(+)
 create mode 100644 lib/power/rte_power_qos.c
 create mode 100644 lib/power/rte_power_qos.h

-- 
2.22.0


^ permalink raw reply	[relevance 4%]

* [PATCH v2 1/2] power: introduce PM QoS API on CPU wide
  2024-06-13 11:20  4% ` [PATCH v2 0/2] power: " Huisong Li
@ 2024-06-13 11:20  5%   ` Huisong Li
  0 siblings, 0 replies; 200+ results
From: Huisong Li @ 2024-06-13 11:20 UTC (permalink / raw)
  To: dev
  Cc: mb, thomas, ferruh.yigit, anatoly.burakov, david.hunt,
	sivaprasad.tummala, liuyonglong, lihuisong

The deeper the idle state, the lower the power consumption, but the longer
the resume time. Some services are delay sensitive and expect a very low
resume time, such as the interrupt packet receiving mode.

The "/sys/devices/system/cpu/cpuX/power/pm_qos_resume_latency_us" sysfs
interface lets userspace set and get the resume latency limit on cpuX.
Each cpuidle governor in Linux selects which idle state to enter based on
this CPU resume latency in its idle task.

The per-CPU PM QoS API can be used to control this CPU's idle state
selection and, by setting a strict resume latency (zero value), to limit it
to the shallowest idle state in order to lower the delay after sleep.

Signed-off-by: Huisong Li <lihuisong@huawei.com>
---
 doc/guides/prog_guide/power_man.rst    |  22 +++++
 doc/guides/rel_notes/release_24_07.rst |   4 +
 lib/power/meson.build                  |   2 +
 lib/power/rte_power_qos.c              | 116 +++++++++++++++++++++++++
 lib/power/rte_power_qos.h              |  70 +++++++++++++++
 lib/power/version.map                  |   2 +
 6 files changed, 216 insertions(+)
 create mode 100644 lib/power/rte_power_qos.c
 create mode 100644 lib/power/rte_power_qos.h

diff --git a/doc/guides/prog_guide/power_man.rst b/doc/guides/prog_guide/power_man.rst
index f6674efe2d..3ff46f06c1 100644
--- a/doc/guides/prog_guide/power_man.rst
+++ b/doc/guides/prog_guide/power_man.rst
@@ -249,6 +249,28 @@ Get Num Pkgs
 Get Num Dies
   Get the number of die's on a given package.
 
+
+PM QoS
+------
+
+The deeper the idle state, the lower the power consumption, but the longer
+the resume time. Some service are delay sensitive and very except the low
+resume time, like interrupt packet receiving mode.
+
+And the "/sys/devices/system/cpu/cpuX/power/pm_qos_resume_latency_us" sysfs
+interface is used to set and get the resume latency limit on the cpuX for
+userspace. Each cpuidle governor in Linux select which idle state to enter
+based on this CPU resume latency in their idle task.
+
+The per-CPU PM QoS API can be used to set and get the CPU resume latency.
+
+The ``rte_power_qos_set_cpu_resume_latency()`` function can effect the work
+CPU's idle state selection and just allow to enter the shallowest idle state
+if set to zero (strict resume latency) for this CPU.
+
+The ``rte_power_qos_get_cpu_resume_latency()`` function can obtain the resume
+latency on specified CPU.
+
 References
 ----------
 
diff --git a/doc/guides/rel_notes/release_24_07.rst b/doc/guides/rel_notes/release_24_07.rst
index e68a53d757..7c0d36e389 100644
--- a/doc/guides/rel_notes/release_24_07.rst
+++ b/doc/guides/rel_notes/release_24_07.rst
@@ -89,6 +89,10 @@ New Features
 
   * Added SSE/NEON vector datapath.
 
+* **Introduce PM QoS interface.**
+
+  * Introduce PM QoS interface to low the delay after sleep.
+
 
 Removed Items
 -------------
diff --git a/lib/power/meson.build b/lib/power/meson.build
index b8426589b2..8222e178b0 100644
--- a/lib/power/meson.build
+++ b/lib/power/meson.build
@@ -23,12 +23,14 @@ sources = files(
         'rte_power.c',
         'rte_power_uncore.c',
         'rte_power_pmd_mgmt.c',
+        'rte_power_qos.c',
 )
 headers = files(
         'rte_power.h',
         'rte_power_guest_channel.h',
         'rte_power_pmd_mgmt.h',
         'rte_power_uncore.h',
+        'rte_power_qos.h',
 )
 if cc.has_argument('-Wno-cast-qual')
     cflags += '-Wno-cast-qual'
diff --git a/lib/power/rte_power_qos.c b/lib/power/rte_power_qos.c
new file mode 100644
index 0000000000..706f8432ee
--- /dev/null
+++ b/lib/power/rte_power_qos.c
@@ -0,0 +1,116 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2024 HiSilicon Limited
+ */
+
+#include <errno.h>
+#include <stdlib.h>
+#include <string.h>
+
+#include <rte_log.h>
+
+#include "power_common.h"
+#include "rte_power_qos.h"
+
+#define PM_QOS_RESUME_LATENCY_NO_CONSTRAINT	((int)(UINT32_MAX >> 1))
+#define PM_QOS_SYSFILE_RESUME_LATENCY_US	\
+	"/sys/devices/system/cpu/cpu%u/power/pm_qos_resume_latency_us"
+
+int
+rte_power_qos_set_cpu_resume_latency(uint16_t lcore_id, int latency)
+{
+	char buf[BUFSIZ] = {0};
+	FILE *f;
+	int ret;
+
+	if (lcore_id >= RTE_MAX_LCORE) {
+		POWER_LOG(ERR, "Lcore id %u can not exceeds %u",
+			  lcore_id, RTE_MAX_LCORE - 1U);
+		return -EINVAL;
+	}
+
+	if (latency < 0) {
+		POWER_LOG(ERR, "latency should be greater than and equal to 0");
+		return -EINVAL;
+	}
+
+	ret = open_core_sysfs_file(&f, "w", PM_QOS_SYSFILE_RESUME_LATENCY_US, lcore_id);
+	if (ret != 0) {
+		POWER_LOG(ERR, "Failed to open "PM_QOS_SYSFILE_RESUME_LATENCY_US, lcore_id);
+		return ret;
+	}
+
+	/*
+	 * Based on the sysfs interface pm_qos_resume_latency_us under
+	 * @PM_QOS_SYSFILE_RESUME_LATENCY_US directory in kernel, their meanning
+	 * is as follows for different input string.
+	 * 1> the resume latency is 0 if the input is "n/a".
+	 * 2> the resume latency is no constraint if the input is "0".
+	 * 3> the resume latency is the actual value to be set.
+	 */
+	if (latency == 0)
+		sprintf(buf, "%s", "n/a");
+	else if (latency == PM_QOS_RESUME_LATENCY_NO_CONSTRAINT)
+		sprintf(buf, "%u", 0);
+	else
+		sprintf(buf, "%u", latency);
+
+	ret = write_core_sysfs_s(f, buf);
+	if (ret != 0) {
+		POWER_LOG(ERR, "Failed to write "PM_QOS_SYSFILE_RESUME_LATENCY_US, lcore_id);
+		goto out;
+	}
+
+out:
+	if (f != NULL)
+		fclose(f);
+
+	return ret;
+}
+
+int
+rte_power_qos_get_cpu_resume_latency(uint16_t lcore_id)
+{
+	char buf[BUFSIZ];
+	int latency = -1;
+	FILE *f;
+	int ret;
+
+	if (lcore_id >= RTE_MAX_LCORE) {
+		POWER_LOG(ERR, "Lcore id %u can not exceeds %u",
+				lcore_id, RTE_MAX_LCORE - 1U);
+		return -EINVAL;
+	}
+
+	ret = open_core_sysfs_file(&f, "r", PM_QOS_SYSFILE_RESUME_LATENCY_US, lcore_id);
+	if (ret != 0) {
+		POWER_LOG(ERR, "Failed to open "PM_QOS_SYSFILE_RESUME_LATENCY_US, lcore_id);
+		return ret;
+	}
+
+	ret = read_core_sysfs_s(f, buf, sizeof(buf));
+	if (ret != 0) {
+		POWER_LOG(ERR, "Failed to read "PM_QOS_SYSFILE_RESUME_LATENCY_US, lcore_id);
+		goto out;
+	}
+
+	/*
+	 * Based on the sysfs interface pm_qos_resume_latency_us under
+	 * @PM_QOS_SYSFILE_RESUME_LATENCY_US directory in kernel, their meanning
+	 * is as follows for different output string.
+	 * 1> the resume latency is 0 if the output is "n/a".
+	 * 2> the resume latency is no constraint if the output is "0".
+	 * 3> the resume latency is the actual value in used for other string.
+	 */
+	if (strcmp(buf, "n/a") == 0)
+		latency = 0;
+	else {
+		latency = strtoul(buf, NULL, 10);
+		latency = latency == 0 ? PM_QOS_RESUME_LATENCY_NO_CONSTRAINT : latency;
+	}
+
+out:
+	if (f != NULL)
+		fclose(f);
+
+	return latency != -1 ? latency : ret;
+}
diff --git a/lib/power/rte_power_qos.h b/lib/power/rte_power_qos.h
new file mode 100644
index 0000000000..1ba9568d1b
--- /dev/null
+++ b/lib/power/rte_power_qos.h
@@ -0,0 +1,70 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2024 HiSilicon Limited
+ */
+
+#ifndef RTE_POWER_QOS_H
+#define RTE_POWER_QOS_H
+
+#include <rte_compat.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * @file rte_power_qos.h
+ *
+ * PM QoS API.
+ *
+ * The CPU-wide resume latency limit has a positive impact on this CPU's idle
+ * state selection in each cpuidle governor.
+ * Please see the description of CPU-wide PM QoS at the following link:
+ * https://www.kernel.org/doc/html/latest/admin-guide/abi-testing.html?highlight=pm_qos_resume_latency_us#abi-sys-devices-power-pm-qos-resume-latency-us
+ *
+ * The deeper the idle state, the lower the power consumption, but the
+ * longer the resume time. Some services are delay-sensitive and expect a
+ * low resume time, such as the interrupt packet receiving mode.
+ *
+ * In these cases, the per-CPU PM QoS API can be used to control this CPU's
+ * idle state selection. Setting a strict resume latency (zero value) limits
+ * the CPU to the shallowest idle state, which lowers the delay after sleep.
+ */
+
+#define PM_QOS_STRICT_LATENCY_VALUE             0
+#define PM_QOS_RESUME_LATENCY_NO_CONSTRAINT    ((int)(UINT32_MAX >> 1))
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * @param lcore_id
+ *   target logical core id
+ *
+ * @param latency
+ *   Resume latency in microseconds; must be greater than or equal to zero.
+ *
+ * @return
+ *   0 on success. Otherwise negative value is returned.
+ */
+__rte_experimental
+int rte_power_qos_set_cpu_resume_latency(uint16_t lcore_id, int latency);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Get the current resume latency of this logical core.
+ * The kernel default, if never set, is @see PM_QOS_RESUME_LATENCY_NO_CONSTRAINT.
+ *
+ * @return
+ *   Negative value on failure.
+ *   >= 0 means the actual resume latency limit on this core.
+ */
+__rte_experimental
+int rte_power_qos_get_cpu_resume_latency(uint16_t lcore_id);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* RTE_POWER_QOS_H */
diff --git a/lib/power/version.map b/lib/power/version.map
index ad92a65f91..81b8ff11b7 100644
--- a/lib/power/version.map
+++ b/lib/power/version.map
@@ -51,4 +51,6 @@ EXPERIMENTAL {
 	rte_power_set_uncore_env;
 	rte_power_uncore_freqs;
 	rte_power_unset_uncore_env;
+	rte_power_qos_set_cpu_resume_latency;
+	rte_power_qos_get_cpu_resume_latency;
 };
-- 
2.22.0


^ permalink raw reply	[relevance 5%]

* Re: [PATCH v9 4/4] hash: add SVE support for bulk key lookup
  @ 2024-06-14 13:42  4%     ` David Marchand
  0 siblings, 0 replies; 200+ results
From: David Marchand @ 2024-06-14 13:42 UTC (permalink / raw)
  To: Yoan Picchi
  Cc: Yipeng Wang, Sameh Gobriel, Bruce Richardson, Vladimir Medvedkin,
	dev, nd, Harjot Singh, Nathan Brown, Ruifeng Wang

On Tue, Apr 30, 2024 at 6:28 PM Yoan Picchi <yoan.picchi@arm.com> wrote:
>
> - Implemented SVE code for comparing signatures in bulk lookup.
> - New SVE code is ~5% slower than optimized NEON for N2 processor for
> 128b vectors.
>
> Signed-off-by: Yoan Picchi <yoan.picchi@arm.com>
> Signed-off-by: Harjot Singh <harjot.singh@arm.com>
> Reviewed-by: Nathan Brown <nathan.brown@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> ---
>  lib/hash/arch/arm/compare_signatures.h | 58 ++++++++++++++++++++++++++
>  lib/hash/rte_cuckoo_hash.c             |  7 +++-
>  lib/hash/rte_cuckoo_hash.h             |  1 +
>  3 files changed, 65 insertions(+), 1 deletion(-)
>
> diff --git a/lib/hash/arch/arm/compare_signatures.h b/lib/hash/arch/arm/compare_signatures.h
> index 72bd171484..b4b4cf04e9 100644
> --- a/lib/hash/arch/arm/compare_signatures.h
> +++ b/lib/hash/arch/arm/compare_signatures.h
> @@ -47,6 +47,64 @@ compare_signatures_dense(uint16_t *hitmask_buffer,
>                 *hitmask_buffer = vaddvq_u16(hit2);
>                 }
>                 break;
> +#endif
> +#if defined(RTE_HAS_SVE_ACLE)
> +       case RTE_HASH_COMPARE_SVE: {
> +               svuint16_t vsign, shift, sv_matches;
> +               svbool_t pred, match, bucket_wide_pred;
> +               int i = 0;
> +               uint64_t vl = svcnth();
> +
> +               vsign = svdup_u16(sig);
> +               shift = svindex_u16(0, 1);
> +
> +               if (vl >= 2 * RTE_HASH_BUCKET_ENTRIES && RTE_HASH_BUCKET_ENTRIES <= 8) {
> +                       svuint16_t primary_array_vect, secondary_array_vect;
> +                       bucket_wide_pred = svwhilelt_b16(0, RTE_HASH_BUCKET_ENTRIES);
> +                       primary_array_vect = svld1_u16(bucket_wide_pred, prim_bucket_sigs);
> +                       secondary_array_vect = svld1_u16(bucket_wide_pred, sec_bucket_sigs);
> +
> +                       /* We merged the two vectors so we can do both comparisons at once */
> +                       primary_array_vect = svsplice_u16(bucket_wide_pred,
> +                               primary_array_vect,
> +                               secondary_array_vect);
> +                       pred = svwhilelt_b16(0, 2*RTE_HASH_BUCKET_ENTRIES);
> +
> +                       /* Compare all signatures in the buckets */
> +                       match = svcmpeq_u16(pred, vsign, primary_array_vect);
> +                       if (svptest_any(svptrue_b16(), match)) {
> +                               sv_matches = svdup_u16(1);
> +                               sv_matches = svlsl_u16_z(match, sv_matches, shift);
> +                               *hitmask_buffer = svorv_u16(svptrue_b16(), sv_matches);
> +                       }
> +               } else {
> +                       do {
> +                               pred = svwhilelt_b16(i, RTE_HASH_BUCKET_ENTRIES);
> +                               uint16_t lower_half = 0;
> +                               uint16_t upper_half = 0;
> +                               /* Compare all signatures in the primary bucket */
> +                               match = svcmpeq_u16(pred, vsign, svld1_u16(pred,
> +                                                       &prim_bucket_sigs[i]));
> +                               if (svptest_any(svptrue_b16(), match)) {
> +                                       sv_matches = svdup_u16(1);
> +                                       sv_matches = svlsl_u16_z(match, sv_matches, shift);
> +                                       lower_half = svorv_u16(svptrue_b16(), sv_matches);
> +                               }
> +                               /* Compare all signatures in the secondary bucket */
> +                               match = svcmpeq_u16(pred, vsign, svld1_u16(pred,
> +                                                       &sec_bucket_sigs[i]));
> +                               if (svptest_any(svptrue_b16(), match)) {
> +                                       sv_matches = svdup_u16(1);
> +                                       sv_matches = svlsl_u16_z(match, sv_matches, shift);
> +                                       upper_half = svorv_u16(svptrue_b16(), sv_matches)
> +                                               << RTE_HASH_BUCKET_ENTRIES;
> +                               }
> +                               hitmask_buffer[i / 8] = upper_half | lower_half;
> +                               i += vl;
> +                       } while (i < RTE_HASH_BUCKET_ENTRIES);
> +               }
> +               }
> +               break;
>  #endif
>         default:
>                 for (unsigned int i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
> diff --git a/lib/hash/rte_cuckoo_hash.c b/lib/hash/rte_cuckoo_hash.c
> index 0697743cdf..75f555ba2c 100644
> --- a/lib/hash/rte_cuckoo_hash.c
> +++ b/lib/hash/rte_cuckoo_hash.c
> @@ -450,8 +450,13 @@ rte_hash_create(const struct rte_hash_parameters *params)
>                 h->sig_cmp_fn = RTE_HASH_COMPARE_SSE;
>         else
>  #elif defined(RTE_ARCH_ARM64)
> -       if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_NEON))
> +       if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_NEON)) {
>                 h->sig_cmp_fn = RTE_HASH_COMPARE_NEON;
> +#if defined(RTE_HAS_SVE_ACLE)
> +               if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_SVE))
> +                       h->sig_cmp_fn = RTE_HASH_COMPARE_SVE;
> +#endif
> +       }
>         else
>  #endif
>                 h->sig_cmp_fn = RTE_HASH_COMPARE_SCALAR;
> diff --git a/lib/hash/rte_cuckoo_hash.h b/lib/hash/rte_cuckoo_hash.h
> index a528f1d1a0..01ad01c258 100644
> --- a/lib/hash/rte_cuckoo_hash.h
> +++ b/lib/hash/rte_cuckoo_hash.h
> @@ -139,6 +139,7 @@ enum rte_hash_sig_compare_function {
>         RTE_HASH_COMPARE_SCALAR = 0,
>         RTE_HASH_COMPARE_SSE,
>         RTE_HASH_COMPARE_NEON,
> +       RTE_HASH_COMPARE_SVE,
>         RTE_HASH_COMPARE_NUM
>  };

I am surprised the ABI check does not complain over this change.
RTE_HASH_COMPARE_NUM is not used and knowing the number of compare
function implementations should not be of interest for an application.
But it still seems like an ABI breakage to me.

RTE_HASH_COMPARE_NUM can be removed in v24.11.
And ideally, sig_cmp_fn should be made opaque (or moved to an opaque
struct out of the rte_hash public struct).


-- 
David Marchand


^ permalink raw reply	[relevance 4%]

* Re: [PATCH v5] graph: expose node context as pointers
    2024-05-29 17:54  0%   ` Nithin Dabilpuram
@ 2024-06-18 12:33  4%   ` David Marchand
  2024-06-25 15:22  0%     ` Robin Jarry
  1 sibling, 1 reply; 200+ results
From: David Marchand @ 2024-06-18 12:33 UTC (permalink / raw)
  To: Robin Jarry
  Cc: dev, Jerin Jacob, Kiran Kumar K, Nithin Dabilpuram, Zhirun Yan,
	Tyler Retzlaff

Re Robin,

On Wed, Mar 27, 2024 at 10:17 AM Robin Jarry <rjarry@redhat.com> wrote:
>
> In some cases, the node context data is used to store two pointers
> because the data is larger than the reserved 16 bytes. Having to define
> intermediate structures just to be able to cast is tedious. And without
> intermediate structures, casting to opaque pointers is hard without
> violating strict aliasing rules.
>
> Add an unnamed union to allow storing opaque pointers in the node
> context. Unfortunately, aligning an unnamed union that contains an array
> produces inconsistent results between C and C++. To preserve ABI/API
> compatibility in both C and C++, move all fast-path area fields into an
> unnamed struct which is cache aligned. Use __rte_cache_min_aligned to
> preserve existing alignment on architectures where cache lines are 128
> bytes.
>
> Add a static assert to ensure that the unnamed union is not larger than
> the context array (RTE_NODE_CTX_SZ).
>
> Signed-off-by: Robin Jarry <rjarry@redhat.com>
> ---
>
> Notes:
>     v5:
>
>     * Helper functions to hide casting proved to be harder than expected.
>       Naive casting may even be impossible without breaking strict aliasing
>       rules. The only other option would be to use explicit memcpy calls.
>     * Unnamed union tentative again. As suggested by Tyler (thank you!),
>       using an intermediate unnamed struct to carry the alignment produces
>       consistent ABI in C and C++.
>     * Also, Tyler (thank you!) suggested that the fast path area alignment
>       size may be incorrect for architectures where the cache line is not 64
>       bytes. There will be a 64 bytes hole in the structure at the end of
>       the unnamed struct before the zero length next nodes array. Use
>       __rte_cache_min_aligned to preserve existing alignment.

- There is still an issue with that approach on arches with 128-byte
cache lines, like ARM.
This results in an ABI breakage:

Functions changes summary: 0 Removed, 1 Changed (9 filtered out), 0 Added functions
Variables changes summary: 0 Removed, 0 Changed, 0 Added variable

1 function with some indirect sub-type change:

  [C] 'function bool __rte_graph_mcore_dispatch_sched_node_enqueue(rte_node*, rte_graph_rq_head*)' at rte_graph_model_mcore_dispatch.c:117:1 has some indirect sub-type changes:
    parameter 1 of type 'rte_node*' has sub-type changes:
      in pointed to type 'struct rte_node' at rte_graph_worker_common.h:92:1:
        type size changed from 3072 to 2048 (in bits)
        7 data member deletions:
          'uint8_t ctx[16]', at offset 2048 (in bits) at rte_graph_worker_common.h:115:1
          'uint16_t size', at offset 2176 (in bits) at rte_graph_worker_common.h:116:1
          'uint16_t idx', at offset 2192 (in bits) at rte_graph_worker_common.h:117:1
          'rte_graph_off_t off', at offset 2208 (in bits) at rte_graph_worker_common.h:118:1
          'uint64_t total_cycles', at offset 2240 (in bits) at rte_graph_worker_common.h:119:1
          'uint64_t total_calls', at offset 2304 (in bits) at rte_graph_worker_common.h:120:1
          'uint64_t total_objs', at offset 2368 (in bits) at rte_graph_worker_common.h:121:1
        1 data member insertion:
          'struct {union {uint8_t ctx[16]; struct {void* ctx_ptr; void* ctx_ptr2;};}; uint16_t size; uint16_t idx; rte_graph_off_t off; uint64_t total_cycles; uint64_t total_calls; uint64_t total_objs; union {void** objs; uint64_t objs_u64;}; union {rte_node_process_t process; uint64_t process_u64;};}', at offset 1536 (in bits)
        1 data member changes (1 filtered):
          'rte_node* nodes[]' offset changed from 2560 to 2048 (in bits) (by -512 bits)


Before the patch, the rte_node object layout was:

struct rte_node {
...
        /* XXX 64 bytes hole, try to pack */

        /* --- cacheline 4 boundary (256 bytes) --- */
        uint8_t                    ctx[16] __attribute__((__aligned__(128))); /*   256    16 */
        uint16_t                   size;                 /*   272     2 */
        uint16_t                   idx;                  /*   274     2 */
        rte_graph_off_t            off;                  /*   276     4 */
        uint64_t                   total_cycles;         /*   280     8 */
        uint64_t                   total_calls;          /*   288     8 */
        uint64_t                   total_objs;           /*   296     8 */
        union {
                void * *           objs;                 /*   304     8 */
                uint64_t           objs_u64;             /*   304     8 */
        };                                               /*   304     8 */
        union {
                rte_node_process_t process;              /*   312     8 */
                uint64_t           process_u64;          /*   312     8 */
        };                                               /*   312     8 */
        /* --- cacheline 5 boundary (320 bytes) --- */
        struct rte_node *          nodes[] __attribute__((__aligned__(64))); /*   320     0 */

        /* size: 384, cachelines: 6, members: 20 */
        /* sum members: 250, holes: 3, sum holes: 70 */
        /* padding: 64 */
        /* forced alignments: 2, forced holes: 1, sum forced holes: 64 */
} __attribute__((__aligned__(128)));


After this patch:

struct rte_node {
...
        /* --- cacheline 3 boundary (192 bytes) --- */
        struct {
                union {
                        uint8_t    ctx[16];              /*   192    16 */
                        struct {
                                void * ctx_ptr;          /*   192     8 */
                                void * ctx_ptr2;         /*   200     8 */
                        };                               /*   192    16 */
                };                                       /*   192    16 */
                uint16_t           size;                 /*   208     2 */
                uint16_t           idx;                  /*   210     2 */
                rte_graph_off_t    off;                  /*   212     4 */
                uint64_t           total_cycles;         /*   216     8 */
                uint64_t           total_calls;          /*   224     8 */
                uint64_t           total_objs;           /*   232     8 */
                union {
                        void * *   objs;                 /*   240     8 */
                        uint64_t   objs_u64;             /*   240     8 */
                };                                       /*   240     8 */
                union {
                        rte_node_process_t process;      /*   248     8 */
                        uint64_t   process_u64;          /*   248     8 */
                };                                       /*   248     8 */
        } __attribute__((__aligned__(64))) __attribute__((__aligned__(64))); /*   192    64 */
        /* --- cacheline 4 boundary (256 bytes) --- */
        struct rte_node *          nodes[] __attribute__((__aligned__(64))); /*   256     0 */

        /* size: 256, cachelines: 4, members: 12 */
        /* sum members: 250, holes: 2, sum holes: 6 */
        /* forced alignments: 2 */
} __attribute__((__aligned__(128)));

The introduced anonymous structure gets aligned on the minimum cache
line size (64 bytes): with this change, ctx[] moves from offset 256 to
offset 192.
Similarly, nodes[] moves from offset 320 to offset 256.

As we discussed offlist, there are a few options to work around this
issue (like moving nodes[] inside the anonymous struct, though it still
results in an increased rte_node struct size, or like adding an explicit
padding field right before the newly introduced anonymous struct,
...).


- Additionally, anonymous structures are not correctly handled with
libabigail 2.4 which is the version used in the CI.
At the moment, the ABI check in GHA and UNH will fail on x86 with:

1 function with some indirect sub-type change:

  [C] 'function bool __rte_graph_mcore_dispatch_sched_node_enqueue(rte_node*, rte_graph_rq_head*)' at rte_graph_model_mcore_dispatch.c:117:1 has some indirect sub-type changes:
    parameter 1 of type 'rte_node*' has sub-type changes:
      in pointed to type 'struct rte_node' at rte_graph_worker_common.h:92:1:
        type size hasn't changed
        2 data member deletions:
          'union {void** objs; uint64_t objs_u64;}', at offset 1920 (in bits)
          'union {rte_node_process_t process; uint64_t process_u64;}', at offset 1984 (in bits)
        no data member changes (2 filtered);

On this topic, we have to either put a suppression rule on the
rte_node structure, or bump the libabigail version in UNH, GHA, and
the maintainers build env (though the latter won't happen overnight,
and we are really close to rc1).



For those two reasons, it is better to revisit this patch and have it
ready for the next release.
While at it, it may be worth cleaning up the rte_node structure in
v24.11. If so, please announce this planned ABI breakage in a
deprecation notice.


-- 
David Marchand


^ permalink raw reply	[relevance 4%]

* [PATCH v3 0/2] power: introduce PM QoS interface
    2024-06-13 11:20  4% ` [PATCH v2 0/2] power: " Huisong Li
@ 2024-06-19  6:31  4% ` Huisong Li
  2024-06-19  6:31  5%   ` [PATCH v3 1/2] power: introduce PM QoS API on CPU wide Huisong Li
  2024-06-19  6:59  0%   ` [PATCH v3 0/2] power: introduce PM QoS interface Morten Brørup
  2024-06-27  6:00  4% ` [PATCH v4 " Huisong Li
                   ` (5 subsequent siblings)
  7 siblings, 2 replies; 200+ results
From: Huisong Li @ 2024-06-19  6:31 UTC (permalink / raw)
  To: dev
  Cc: mb, thomas, ferruh.yigit, anatoly.burakov, david.hunt,
	sivaprasad.tummala, david.marchand, liuyonglong, lihuisong

The deeper the idle state, the lower the power consumption, but the longer
the resume time. Some services are delay-sensitive and expect a low resume
time, such as the interrupt packet receiving mode.

The "/sys/devices/system/cpu/cpuX/power/pm_qos_resume_latency_us" sysfs
interface lets userspace set and get the resume latency limit on cpuX.
Please see the description in the kernel documentation [1].
Each cpuidle governor in Linux selects which idle state to enter based on
this CPU resume latency in its idle task.

The per-CPU PM QoS API can be used to control this CPU's idle state
selection. Setting a strict resume latency (zero value) limits the CPU to
the shallowest idle state, which lowers the delay after sleep.

[1] https://www.kernel.org/doc/html/latest/admin-guide/abi-testing.html?highlight=pm_qos_resume_latency_us#abi-sys-devices-power-pm-qos-resume-latency-us

---
 v3:
  - add RTE_POWER_xxx prefix to some macros in the header
  - add a check of lcore_id with rte_lcore_is_enabled
 v2:
  - use CPU-wide PM QoS instead of system-wide PM QoS

Huisong Li (2):
  power: introduce PM QoS API on CPU wide
  examples/l3fwd-power: add PM QoS configuration

 doc/guides/prog_guide/power_man.rst    |  22 +++++
 doc/guides/rel_notes/release_24_07.rst |   4 +
 examples/l3fwd-power/main.c            |  29 +++++++
 lib/power/meson.build                  |   2 +
 lib/power/rte_power_qos.c              | 114 +++++++++++++++++++++++++
 lib/power/rte_power_qos.h              |  71 +++++++++++++++
 lib/power/version.map                  |   2 +
 7 files changed, 244 insertions(+)
 create mode 100644 lib/power/rte_power_qos.c
 create mode 100644 lib/power/rte_power_qos.h

-- 
2.22.0


^ permalink raw reply	[relevance 4%]

* [PATCH v3 1/2] power: introduce PM QoS API on CPU wide
  2024-06-19  6:31  4% ` [PATCH v3 0/2] power: introduce PM QoS interface Huisong Li
@ 2024-06-19  6:31  5%   ` Huisong Li
  2024-06-19 15:32  0%     ` Thomas Monjalon
  2024-06-19  6:59  0%   ` [PATCH v3 0/2] power: introduce PM QoS interface Morten Brørup
  1 sibling, 1 reply; 200+ results
From: Huisong Li @ 2024-06-19  6:31 UTC (permalink / raw)
  To: dev
  Cc: mb, thomas, ferruh.yigit, anatoly.burakov, david.hunt,
	sivaprasad.tummala, david.marchand, liuyonglong, lihuisong

The deeper the idle state, the lower the power consumption, but the longer
the resume time. Some services are delay-sensitive and expect a low resume
time, such as the interrupt packet receiving mode.

The "/sys/devices/system/cpu/cpuX/power/pm_qos_resume_latency_us" sysfs
interface lets userspace set and get the resume latency limit on cpuX.
Each cpuidle governor in Linux selects which idle state to enter based on
this CPU resume latency in its idle task.

The per-CPU PM QoS API can be used to control this CPU's idle state
selection. Setting a strict resume latency (zero value) limits the CPU to
the shallowest idle state, which lowers the delay after sleep.

Signed-off-by: Huisong Li <lihuisong@huawei.com>
---
 doc/guides/prog_guide/power_man.rst    |  22 +++++
 doc/guides/rel_notes/release_24_07.rst |   4 +
 lib/power/meson.build                  |   2 +
 lib/power/rte_power_qos.c              | 114 +++++++++++++++++++++++++
 lib/power/rte_power_qos.h              |  71 +++++++++++++++
 lib/power/version.map                  |   2 +
 6 files changed, 215 insertions(+)
 create mode 100644 lib/power/rte_power_qos.c
 create mode 100644 lib/power/rte_power_qos.h

diff --git a/doc/guides/prog_guide/power_man.rst b/doc/guides/prog_guide/power_man.rst
index f6674efe2d..3ff46f06c1 100644
--- a/doc/guides/prog_guide/power_man.rst
+++ b/doc/guides/prog_guide/power_man.rst
@@ -249,6 +249,28 @@ Get Num Pkgs
 Get Num Dies
   Get the number of die's on a given package.
 
+
+PM QoS
+------
+
+The deeper the idle state, the lower the power consumption, but the longer
+the resume time. Some services are delay-sensitive and expect a low
+resume time, such as the interrupt packet receiving mode.
+
+The "/sys/devices/system/cpu/cpuX/power/pm_qos_resume_latency_us" sysfs
+interface lets userspace set and get the resume latency limit on cpuX.
+Each cpuidle governor in Linux selects which idle state to enter
+based on this CPU resume latency in its idle task.
+
+The per-CPU PM QoS API can be used to set and get the CPU resume latency.
+
+The ``rte_power_qos_set_cpu_resume_latency()`` function affects the idle
+state selection of the given CPU. If set to zero (strict resume latency),
+the CPU is only allowed to enter the shallowest idle state.
+
+The ``rte_power_qos_get_cpu_resume_latency()`` function returns the resume
+latency of the specified CPU.
+
 References
 ----------
 
diff --git a/doc/guides/rel_notes/release_24_07.rst b/doc/guides/rel_notes/release_24_07.rst
index e68a53d757..7c0d36e389 100644
--- a/doc/guides/rel_notes/release_24_07.rst
+++ b/doc/guides/rel_notes/release_24_07.rst
@@ -89,6 +89,10 @@ New Features
 
   * Added SSE/NEON vector datapath.
 
+* **Introduced PM QoS interface.**
+
+  * Introduced PM QoS interface to lower the delay after sleep.
+
 
 Removed Items
 -------------
diff --git a/lib/power/meson.build b/lib/power/meson.build
index b8426589b2..8222e178b0 100644
--- a/lib/power/meson.build
+++ b/lib/power/meson.build
@@ -23,12 +23,14 @@ sources = files(
         'rte_power.c',
         'rte_power_uncore.c',
         'rte_power_pmd_mgmt.c',
+        'rte_power_qos.c',
 )
 headers = files(
         'rte_power.h',
         'rte_power_guest_channel.h',
         'rte_power_pmd_mgmt.h',
         'rte_power_uncore.h',
+        'rte_power_qos.h',
 )
 if cc.has_argument('-Wno-cast-qual')
     cflags += '-Wno-cast-qual'
diff --git a/lib/power/rte_power_qos.c b/lib/power/rte_power_qos.c
new file mode 100644
index 0000000000..b131cf58e7
--- /dev/null
+++ b/lib/power/rte_power_qos.c
@@ -0,0 +1,114 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2024 HiSilicon Limited
+ */
+
+#include <errno.h>
+#include <stdlib.h>
+#include <string.h>
+
+#include <rte_lcore.h>
+#include <rte_log.h>
+
+#include "power_common.h"
+#include "rte_power_qos.h"
+
+#define PM_QOS_SYSFILE_RESUME_LATENCY_US	\
+	"/sys/devices/system/cpu/cpu%u/power/pm_qos_resume_latency_us"
+
+int
+rte_power_qos_set_cpu_resume_latency(uint16_t lcore_id, int latency)
+{
+	char buf[BUFSIZ] = {0};
+	FILE *f;
+	int ret;
+
+	if (!rte_lcore_is_enabled(lcore_id)) {
+		POWER_LOG(ERR, "lcore id %u is not enabled", lcore_id);
+		return -EINVAL;
+	}
+
+	if (latency < 0) {
+		POWER_LOG(ERR, "latency should be greater than or equal to 0");
+		return -EINVAL;
+	}
+
+	ret = open_core_sysfs_file(&f, "w", PM_QOS_SYSFILE_RESUME_LATENCY_US, lcore_id);
+	if (ret != 0) {
+		POWER_LOG(ERR, "Failed to open "PM_QOS_SYSFILE_RESUME_LATENCY_US, lcore_id);
+		return ret;
+	}
+
+	/*
+	 * Based on the sysfs interface pm_qos_resume_latency_us under
+	 * @PM_QOS_SYSFILE_RESUME_LATENCY_US directory in kernel, the meaning
+	 * of each input string is as follows.
+	 * 1> the resume latency is 0 if the input is "n/a".
+	 * 2> there is no resume latency constraint if the input is "0".
+	 * 3> any other string is the actual resume latency value to set.
+	 */
+	if (latency == 0)
+		snprintf(buf, sizeof(buf), "%s", "n/a");
+	else if (latency == RTE_POWER_QOS_RESUME_LATENCY_NO_CONSTRAINT)
+		snprintf(buf, sizeof(buf), "%d", 0);
+	else
+		snprintf(buf, sizeof(buf), "%d", latency);
+
+	ret = write_core_sysfs_s(f, buf);
+	if (ret != 0) {
+		POWER_LOG(ERR, "Failed to write "PM_QOS_SYSFILE_RESUME_LATENCY_US, lcore_id);
+		goto out;
+	}
+
+out:
+	if (f != NULL)
+		fclose(f);
+
+	return ret;
+}
+
+int
+rte_power_qos_get_cpu_resume_latency(uint16_t lcore_id)
+{
+	char buf[BUFSIZ];
+	int latency = -1;
+	FILE *f;
+	int ret;
+
+	if (!rte_lcore_is_enabled(lcore_id)) {
+		POWER_LOG(ERR, "lcore id %u is not enabled", lcore_id);
+		return -EINVAL;
+	}
+
+	ret = open_core_sysfs_file(&f, "r", PM_QOS_SYSFILE_RESUME_LATENCY_US, lcore_id);
+	if (ret != 0) {
+		POWER_LOG(ERR, "Failed to open "PM_QOS_SYSFILE_RESUME_LATENCY_US, lcore_id);
+		return ret;
+	}
+
+	ret = read_core_sysfs_s(f, buf, sizeof(buf));
+	if (ret != 0) {
+		POWER_LOG(ERR, "Failed to read "PM_QOS_SYSFILE_RESUME_LATENCY_US, lcore_id);
+		goto out;
+	}
+
+	/*
+	 * Based on the sysfs interface pm_qos_resume_latency_us under
+	 * @PM_QOS_SYSFILE_RESUME_LATENCY_US directory in kernel, the meaning
+	 * of each output string is as follows.
+	 * 1> the resume latency is 0 if the output is "n/a".
+	 * 2> there is no resume latency constraint if the output is "0".
+	 * 3> any other string is the resume latency value in use.
+	 */
+	if (strcmp(buf, "n/a") == 0)
+		latency = 0;
+	else {
+		latency = strtoul(buf, NULL, 10);
+		latency = latency == 0 ? RTE_POWER_QOS_RESUME_LATENCY_NO_CONSTRAINT : latency;
+	}
+
+out:
+	if (f != NULL)
+		fclose(f);
+
+	return latency != -1 ? latency : ret;
+}
diff --git a/lib/power/rte_power_qos.h b/lib/power/rte_power_qos.h
new file mode 100644
index 0000000000..2b25d0d4c1
--- /dev/null
+++ b/lib/power/rte_power_qos.h
@@ -0,0 +1,71 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2024 HiSilicon Limited
+ */
+
+#ifndef RTE_POWER_QOS_H
+#define RTE_POWER_QOS_H
+
+#include <rte_compat.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * @file rte_power_qos.h
+ *
+ * PM QoS API.
+ *
+ * The CPU-wide resume latency limit has a positive impact on this CPU's idle
+ * state selection in each cpuidle governor.
+ * Please see the description of CPU-wide PM QoS at the following link:
+ * https://www.kernel.org/doc/html/latest/admin-guide/abi-testing.html?highlight=pm_qos_resume_latency_us#abi-sys-devices-power-pm-qos-resume-latency-us
+ *
+ * The deeper the idle state, the lower the power consumption, but the
+ * longer the resume time. Some services are delay-sensitive and expect a
+ * low resume time, such as the interrupt packet receiving mode.
+ *
+ * In these cases, the per-CPU PM QoS API can be used to control this CPU's
+ * idle state selection. Setting a strict resume latency (zero value) limits
+ * the CPU to the shallowest idle state, which lowers the delay after sleep.
+ */
+
+#define RTE_POWER_QOS_STRICT_LATENCY_VALUE             0
+#define RTE_POWER_QOS_RESUME_LATENCY_NO_CONSTRAINT    ((int)(UINT32_MAX >> 1))
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * @param lcore_id
+ *   target logical core id
+ *
+ * @param latency
+ *   Resume latency in microseconds; must be greater than or equal to zero.
+ *
+ * @return
+ *   0 on success. Otherwise negative value is returned.
+ */
+__rte_experimental
+int rte_power_qos_set_cpu_resume_latency(uint16_t lcore_id, int latency);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Get the current resume latency of this logical core.
+ * The kernel default is @see RTE_POWER_QOS_RESUME_LATENCY_NO_CONSTRAINT
+ * if it has never been set.
+ *
+ * @return
+ *   Negative value on failure.
+ *   >= 0 means the actual resume latency limit on this core.
+ */
+__rte_experimental
+int rte_power_qos_get_cpu_resume_latency(uint16_t lcore_id);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* RTE_POWER_QOS_H */
diff --git a/lib/power/version.map b/lib/power/version.map
index ad92a65f91..81b8ff11b7 100644
--- a/lib/power/version.map
+++ b/lib/power/version.map
@@ -51,4 +51,6 @@ EXPERIMENTAL {
 	rte_power_set_uncore_env;
 	rte_power_uncore_freqs;
 	rte_power_unset_uncore_env;
+	rte_power_qos_set_cpu_resume_latency;
+	rte_power_qos_get_cpu_resume_latency;
 };
-- 
2.22.0


^ permalink raw reply	[relevance 5%]

* RE: [PATCH v3 0/2] power: introduce PM QoS interface
  2024-06-19  6:31  4% ` [PATCH v3 0/2] power: introduce PM QoS interface Huisong Li
  2024-06-19  6:31  5%   ` [PATCH v3 1/2] power: introduce PM QoS API on CPU wide Huisong Li
@ 2024-06-19  6:59  0%   ` Morten Brørup
  1 sibling, 0 replies; 200+ results
From: Morten Brørup @ 2024-06-19  6:59 UTC (permalink / raw)
  To: Huisong Li, dev
  Cc: thomas, ferruh.yigit, anatoly.burakov, david.hunt,
	sivaprasad.tummala, david.marchand, liuyonglong

> From: Huisong Li [mailto:lihuisong@huawei.com]
> Sent: Wednesday, 19 June 2024 08.32
> 
> The deeper the idle state, the lower the power consumption, but the longer
> the resume time. Some service are delay sensitive and very except the low
> resume time, like interrupt packet receiving mode.
> 
> And the "/sys/devices/system/cpu/cpuX/power/pm_qos_resume_latency_us" sysfs
> interface is used to set and get the resume latency limit on the cpuX for
> userspace. Please see the description in kernel document[1].
> Each cpuidle governor in Linux select which idle state to enter based on
> this CPU resume latency in their idle task.
> 
> The per-CPU PM QoS API can be used to control this CPU's idle state
> selection and limit just enter the shallowest idle state to low the delay
> after sleep by setting strict resume latency (zero value).
> 
> [1] https://www.kernel.org/doc/html/latest/admin-guide/abi-testing.html?highlight=pm_qos_resume_latency_us#abi-sys-devices-power-pm-qos-resume-latency-us
> 
> ---
>  v3:
>   - add RTE_POWER_xxx prefix for some macro in header
>   - add the check for lcore_id with rte_lcore_is_enabled
>  v2:
>   - use PM QoS on CPU wide to replace the one on system wide

Series-acked-by: Morten Brørup <mb@smartsharesystems.com>


^ permalink raw reply	[relevance 0%]

* [PATCH v12 0/4] remove use of RTE_MARKER fields in libraries
    2024-04-03 17:53  3% ` [PATCH v10 0/4] remove use of RTE_MARKER fields in libraries Tyler Retzlaff
  2024-04-04 17:51  3% ` [PATCH v11 0/4] remove use of RTE_MARKER fields in libraries Tyler Retzlaff
@ 2024-06-19 15:01  3% ` David Marchand
  2024-06-19 15:01  6%   ` [PATCH v12 2/4] mbuf: remove marker fields David Marchand
  2 siblings, 1 reply; 200+ results
From: David Marchand @ 2024-06-19 15:01 UTC (permalink / raw)
  To: dev; +Cc: roretzla

As per techboard meeting 2024/03/20 adopt hybrid proposal of adapting
descriptor fields and removing cacheline fields.

RTE_MARKER typedefs are a GCC extension unsupported by MSVC. Remove
RTE_MARKER fields.

For cacheline{0,1} fields remove fields entirely and use inline
functions to prefetch.

Provide new rearm_data and rx_descriptor_fields1 fields in anonymous
unions as single element arrays with types matching the original
markers to maintain API compatibility.

Note: diff is easier viewed with -b due to additional nesting from
      unions / structs that have been introduced.
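
A minimal sketch of the union approach described above, assuming a
cut-down stand-in struct (demo_mbuf and demo_rearm are hypothetical
names, not DPDK API, and the layout is not the real rte_mbuf). It shows
how a single element array inside an anonymous union can stand in for a
zero-size marker: the array name still decays to a pointer at the same
offset, so code written against the marker keeps compiling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, cut-down stand-in for struct rte_mbuf; not the real layout. */
struct demo_mbuf {
	void *buf_addr;
	uint64_t buf_iova;
	union {
		/* The array name decays to a pointer, as the old marker did. */
		uint64_t rearm_data[1];
		struct {
			uint16_t data_off;
			uint16_t refcnt;
			uint16_t nb_segs;
			uint16_t port;
		};
	};
};

/* The array must start at, and cover exactly, the fields it aliases. */
static_assert(sizeof(((struct demo_mbuf *)0)->rearm_data) == 8,
	"rearm_data spans 8 bytes");
static_assert(offsetof(struct demo_mbuf, rearm_data) ==
	offsetof(struct demo_mbuf, data_off),
	"rearm_data aliases data_off");

/* Driver-style use: set the four 16-bit fields with one 64-bit store. */
static inline void
demo_rearm(struct demo_mbuf *m, uint64_t value)
{
	m->rearm_data[0] = value;
}
```

The static assertions are the part worth copying: they fail at compile
time if the array and the fields it overlays ever drift apart.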

v12:
  * rebased,
  * did some cosmetic changes,
  * and resending to double check CI (v11 had an issue in UNH Debian
    containers),

v11:
  * correct doxygen comment style for field documentation.

v10:
  * move removal notices in release notes from 24.03 to 24.07.

v9:
  * provide narrowest possible libabigail.abignore to suppress
    removal of fields that were agreed are not actual abi changes.

v8:
  * rx_descriptor_fields1 array is now constexpr sized to
    24 / sizeof(void *) so that the array encompasses fields
    accessed via the array.
  * add a comment to rx_descriptor_fields1 array site noting
    that void * type of elements is retained for compatibility
    with existing drivers.
  * clean up comments of fields in rte_mbuf to be before the
    field they apply to instead of after.
  * duplicate alignas(RTE_CACHE_LINE_MIN_SIZE) into both legs of
    conditional compile for first field of cacheline 1 instead of
    once before conditional compile block.

v7:
  * complete re-write of series, previous versions not noted. all
    reviewed-by and acked-by tags (if any) were removed.

-- 
David Marchand


Tyler Retzlaff (4):
  net/i40e: use mbuf prefetch helper
  mbuf: remove marker fields
  security: remove marker fields
  cryptodev: remove marker fields

 devtools/libabigail.abignore            |   6 +
 doc/guides/rel_notes/release_24_07.rst  |   3 +
 drivers/net/i40e/i40e_rxtx_vec_avx512.c |   2 +-
 lib/cryptodev/cryptodev_pmd.h           |   3 +-
 lib/mbuf/rte_mbuf.h                     |   4 +-
 lib/mbuf/rte_mbuf_core.h                | 188 ++++++++++++------------
 lib/security/rte_security_driver.h      |   3 +-
 7 files changed, 112 insertions(+), 97 deletions(-)

-- 
2.45.1


^ permalink raw reply	[relevance 3%]

* [PATCH v12 2/4] mbuf: remove marker fields
  2024-06-19 15:01  3% ` [PATCH v12 0/4] remove use of RTE_MARKER fields in libraries David Marchand
@ 2024-06-19 15:01  6%   ` David Marchand
  0 siblings, 0 replies; 200+ results
From: David Marchand @ 2024-06-19 15:01 UTC (permalink / raw)
  To: dev; +Cc: roretzla, Morten Brørup, Stephen Hemminger, Thomas Monjalon

From: Tyler Retzlaff <roretzla@linux.microsoft.com>

RTE_MARKER typedefs are a GCC extension unsupported by MSVC. Remove
RTE_MARKER fields from rte_mbuf struct.

Maintain alignment of fields after removed cacheline1 marker by placing
C11 alignas(RTE_CACHE_LINE_MIN_SIZE).

Provide new rearm_data and rx_descriptor_fields1 fields in anonymous
unions as single element arrays with types matching the original
markers to maintain API compatibility.

This change breaks the API for the cacheline{0,1} fields that have been
removed from rte_mbuf, but it does not break the ABI. To address the
false positives for the removed (but 0 size) fields, provide the minimum
libabigail.abignore for type = rte_mbuf.
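
A minimal sketch of the alignment point made above, assuming a
hypothetical two-field struct and an assumed cache line constant of 64
(DEMO_CACHE_LINE_MIN_SIZE, demo_two_lines and demo_second_line are
illustrative names only). Placing C11 alignas on the first field of the
second cache line keeps that field at the offset the aligned marker used
to pin, and the second line can then be addressed by pointer arithmetic
instead of by naming a marker.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

/* Assumed value, chosen to mirror RTE_CACHE_LINE_MIN_SIZE. */
#define DEMO_CACHE_LINE_MIN_SIZE 64

/* Hypothetical two-cache-line struct; not the real rte_mbuf layout. */
struct demo_two_lines {
	void *first_line_field;
	/* alignas on the field replaces the aligned zero-size marker. */
	alignas(DEMO_CACHE_LINE_MIN_SIZE) void *second_line_field;
};

static_assert(offsetof(struct demo_two_lines, second_line_field) ==
	DEMO_CACHE_LINE_MIN_SIZE,
	"second cache line starts at the expected offset");

/* Address of the second cache line, computed without a marker field. */
static inline const void *
demo_second_line(const struct demo_two_lines *s)
{
	return (const char *)s + DEMO_CACHE_LINE_MIN_SIZE;
}
```

The pointer returned by demo_second_line() is the kind of address a
prefetch helper would be given once the marker no longer exists.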

Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
Reviewed-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
---
Changes since v11:
- moved libabigail suppression,
- moved RN update to API change,
- updated one comment in rte_mbuf_core.h referring to cacheline0,
- removed (unrelated) doxygen updates,

---
 devtools/libabigail.abignore           |   6 +
 doc/guides/rel_notes/release_24_07.rst |   3 +
 lib/mbuf/rte_mbuf.h                    |   4 +-
 lib/mbuf/rte_mbuf_core.h               | 188 +++++++++++++------------
 4 files changed, 109 insertions(+), 92 deletions(-)

diff --git a/devtools/libabigail.abignore b/devtools/libabigail.abignore
index 32a2ea309e..96b16059a8 100644
--- a/devtools/libabigail.abignore
+++ b/devtools/libabigail.abignore
@@ -33,6 +33,12 @@
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 ; Temporary exceptions till next major ABI version ;
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+[suppress_type]
+	name = rte_mbuf
+	type_kind = struct
+	has_size_change = no
+	has_data_member = {cacheline0, rearm_data, rx_descriptor_fields1, cacheline1}
+
 [suppress_type]
 	name = rte_pipeline_table_entry
 
diff --git a/doc/guides/rel_notes/release_24_07.rst b/doc/guides/rel_notes/release_24_07.rst
index ccd0f8e598..7c88de381b 100644
--- a/doc/guides/rel_notes/release_24_07.rst
+++ b/doc/guides/rel_notes/release_24_07.rst
@@ -178,6 +178,9 @@ API Changes
    Also, make sure to start the actual text at the margin.
    =======================================================
 
+* mbuf: ``RTE_MARKER`` fields ``cacheline0`` and ``cacheline1``
+  have been removed from ``struct rte_mbuf``.
+
 
 ABI Changes
 -----------
diff --git a/lib/mbuf/rte_mbuf.h b/lib/mbuf/rte_mbuf.h
index 286b32b788..4c4722e002 100644
--- a/lib/mbuf/rte_mbuf.h
+++ b/lib/mbuf/rte_mbuf.h
@@ -108,7 +108,7 @@ int rte_get_tx_ol_flag_list(uint64_t mask, char *buf, size_t buflen);
 static inline void
 rte_mbuf_prefetch_part1(struct rte_mbuf *m)
 {
-	rte_prefetch0(&m->cacheline0);
+	rte_prefetch0(m);
 }
 
 /**
@@ -126,7 +126,7 @@ static inline void
 rte_mbuf_prefetch_part2(struct rte_mbuf *m)
 {
 #if RTE_CACHE_LINE_SIZE == 64
-	rte_prefetch0(&m->cacheline1);
+	rte_prefetch0(RTE_PTR_ADD(m, RTE_CACHE_LINE_MIN_SIZE));
 #else
 	RTE_SET_USED(m);
 #endif
diff --git a/lib/mbuf/rte_mbuf_core.h b/lib/mbuf/rte_mbuf_core.h
index 9f580769cf..a0df265b5d 100644
--- a/lib/mbuf/rte_mbuf_core.h
+++ b/lib/mbuf/rte_mbuf_core.h
@@ -465,8 +465,6 @@ enum {
  * The generic rte_mbuf, containing a packet mbuf.
  */
 struct __rte_cache_aligned rte_mbuf {
-	RTE_MARKER cacheline0;
-
 	void *buf_addr;           /**< Virtual address of segment buffer. */
 #if RTE_IOVA_IN_MBUF
 	/**
@@ -474,7 +472,7 @@ struct __rte_cache_aligned rte_mbuf {
 	 * This field is undefined if the build is configured to use only
 	 * virtual address as IOVA (i.e. RTE_IOVA_IN_MBUF is 0).
 	 * Force alignment to 8-bytes, so as to ensure we have the exact
-	 * same mbuf cacheline0 layout for 32-bit and 64-bit. This makes
+	 * layout for the first cache line for 32-bit and 64-bit. This makes
 	 * working on vector drivers easier.
 	 */
 	alignas(sizeof(rte_iova_t)) rte_iova_t buf_iova;
@@ -488,127 +486,137 @@ struct __rte_cache_aligned rte_mbuf {
 #endif
 
 	/* next 8 bytes are initialised on RX descriptor rearm */
-	RTE_MARKER64 rearm_data;
-	uint16_t data_off;
-
-	/**
-	 * Reference counter. Its size should at least equal to the size
-	 * of port field (16 bits), to support zero-copy broadcast.
-	 * It should only be accessed using the following functions:
-	 * rte_mbuf_refcnt_update(), rte_mbuf_refcnt_read(), and
-	 * rte_mbuf_refcnt_set(). The functionality of these functions (atomic,
-	 * or non-atomic) is controlled by the RTE_MBUF_REFCNT_ATOMIC flag.
-	 */
-	RTE_ATOMIC(uint16_t) refcnt;
+	union {
+		uint64_t rearm_data[1];
+		__extension__
+		struct {
+			uint16_t data_off;
+
+			/**
+			 * Reference counter. Its size should at least equal to the size
+			 * of port field (16 bits), to support zero-copy broadcast.
+			 * It should only be accessed using the following functions:
+			 * rte_mbuf_refcnt_update(), rte_mbuf_refcnt_read(), and
+			 * rte_mbuf_refcnt_set(). The functionality of these functions (atomic,
+			 * or non-atomic) is controlled by the RTE_MBUF_REFCNT_ATOMIC flag.
+			 */
+			RTE_ATOMIC(uint16_t) refcnt;
 
-	/**
-	 * Number of segments. Only valid for the first segment of an mbuf
-	 * chain.
-	 */
-	uint16_t nb_segs;
+			/**
+			 * Number of segments. Only valid for the first segment of an mbuf
+			 * chain.
+			 */
+			uint16_t nb_segs;
 
-	/** Input port (16 bits to support more than 256 virtual ports).
-	 * The event eth Tx adapter uses this field to specify the output port.
-	 */
-	uint16_t port;
+			/** Input port (16 bits to support more than 256 virtual ports).
+			 * The event eth Tx adapter uses this field to specify the output port.
+			 */
+			uint16_t port;
+		};
+	};
 
 	uint64_t ol_flags;        /**< Offload features. */
 
-	/* remaining bytes are set on RX when pulling packet from descriptor */
-	RTE_MARKER rx_descriptor_fields1;
-
-	/*
-	 * The packet type, which is the combination of outer/inner L2, L3, L4
-	 * and tunnel types. The packet_type is about data really present in the
-	 * mbuf. Example: if vlan stripping is enabled, a received vlan packet
-	 * would have RTE_PTYPE_L2_ETHER and not RTE_PTYPE_L2_VLAN because the
-	 * vlan is stripped from the data.
-	 */
+	/* remaining 24 bytes are set on RX when pulling packet from descriptor */
 	union {
-		uint32_t packet_type; /**< L2/L3/L4 and tunnel information. */
+		/* void * type of the array elements is retained for driver compatibility. */
+		void *rx_descriptor_fields1[24 / sizeof(void *)];
 		__extension__
 		struct {
-			uint8_t l2_type:4;   /**< (Outer) L2 type. */
-			uint8_t l3_type:4;   /**< (Outer) L3 type. */
-			uint8_t l4_type:4;   /**< (Outer) L4 type. */
-			uint8_t tun_type:4;  /**< Tunnel type. */
+			/*
+			 * The packet type, which is the combination of outer/inner L2, L3, L4
+			 * and tunnel types. The packet_type is about data really present in the
+			 * mbuf. Example: if vlan stripping is enabled, a received vlan packet
+			 * would have RTE_PTYPE_L2_ETHER and not RTE_PTYPE_L2_VLAN because the
+			 * vlan is stripped from the data.
+			 */
 			union {
-				uint8_t inner_esp_next_proto;
-				/**< ESP next protocol type, valid if
-				 * RTE_PTYPE_TUNNEL_ESP tunnel type is set
-				 * on both Tx and Rx.
-				 */
+				uint32_t packet_type; /**< L2/L3/L4 and tunnel information. */
 				__extension__
 				struct {
-					uint8_t inner_l2_type:4;
-					/**< Inner L2 type. */
-					uint8_t inner_l3_type:4;
-					/**< Inner L3 type. */
+					uint8_t l2_type:4;   /**< (Outer) L2 type. */
+					uint8_t l3_type:4;   /**< (Outer) L3 type. */
+					uint8_t l4_type:4;   /**< (Outer) L4 type. */
+					uint8_t tun_type:4;  /**< Tunnel type. */
+					union {
+						uint8_t inner_esp_next_proto;
+						/**< ESP next protocol type, valid if
+						 * RTE_PTYPE_TUNNEL_ESP tunnel type is set
+						 * on both Tx and Rx.
+						 */
+						__extension__
+						struct {
+							uint8_t inner_l2_type:4;
+							/**< Inner L2 type. */
+							uint8_t inner_l3_type:4;
+							/**< Inner L3 type. */
+						};
+					};
+					uint8_t inner_l4_type:4; /**< Inner L4 type. */
 				};
 			};
-			uint8_t inner_l4_type:4; /**< Inner L4 type. */
-		};
-	};
 
-	uint32_t pkt_len;         /**< Total pkt len: sum of all segments. */
-	uint16_t data_len;        /**< Amount of data in segment buffer. */
-	/** VLAN TCI (CPU order), valid if RTE_MBUF_F_RX_VLAN is set. */
-	uint16_t vlan_tci;
+			uint32_t pkt_len;         /**< Total pkt len: sum of all segments. */
+			uint16_t data_len;        /**< Amount of data in segment buffer. */
+			/** VLAN TCI (CPU order), valid if RTE_MBUF_F_RX_VLAN is set. */
+			uint16_t vlan_tci;
 
-	union {
-		union {
-			uint32_t rss;     /**< RSS hash result if RSS enabled */
-			struct {
+			union {
 				union {
+					uint32_t rss;     /**< RSS hash result if RSS enabled */
 					struct {
-						uint16_t hash;
-						uint16_t id;
-					};
-					uint32_t lo;
-					/**< Second 4 flexible bytes */
-				};
-				uint32_t hi;
-				/**< First 4 flexible bytes or FD ID, dependent
-				 * on RTE_MBUF_F_RX_FDIR_* flag in ol_flags.
-				 */
-			} fdir;	/**< Filter identifier if FDIR enabled */
-			struct rte_mbuf_sched sched;
-			/**< Hierarchical scheduler : 8 bytes */
-			struct {
-				uint32_t reserved1;
-				uint16_t reserved2;
-				uint16_t txq;
-				/**< The event eth Tx adapter uses this field
-				 * to store Tx queue id.
-				 * @see rte_event_eth_tx_adapter_txq_set()
-				 */
-			} txadapter; /**< Eventdev ethdev Tx adapter */
-			uint32_t usr;
-			/**< User defined tags. See rte_distributor_process() */
-		} hash;                   /**< hash information */
-	};
+						union {
+							struct {
+								uint16_t hash;
+								uint16_t id;
+							};
+							uint32_t lo;
+							/**< Second 4 flexible bytes */
+						};
+						uint32_t hi;
+						/**< First 4 flexible bytes or FD ID, dependent
+						 * on RTE_MBUF_F_RX_FDIR_* flag in ol_flags.
+						 */
+					} fdir;	/**< Filter identifier if FDIR enabled */
+					struct rte_mbuf_sched sched;
+					/**< Hierarchical scheduler : 8 bytes */
+					struct {
+						uint32_t reserved1;
+						uint16_t reserved2;
+						uint16_t txq;
+						/**< The event eth Tx adapter uses this field
+						 * to store Tx queue id.
+						 * @see rte_event_eth_tx_adapter_txq_set()
+						 */
+					} txadapter; /**< Eventdev ethdev Tx adapter */
+					uint32_t usr;
+					/**< User defined tags. See rte_distributor_process() */
+				} hash;                   /**< hash information */
+			};
 
-	/** Outer VLAN TCI (CPU order), valid if RTE_MBUF_F_RX_QINQ is set. */
-	uint16_t vlan_tci_outer;
+			/** Outer VLAN TCI (CPU order), valid if RTE_MBUF_F_RX_QINQ is set. */
+			uint16_t vlan_tci_outer;
 
-	uint16_t buf_len;         /**< Length of segment buffer. */
+			uint16_t buf_len;         /**< Length of segment buffer. */
+		};
+	};
 
 	struct rte_mempool *pool; /**< Pool from which mbuf was allocated. */
 
 	/* second cache line - fields only used in slow path or on TX */
-	alignas(RTE_CACHE_LINE_MIN_SIZE) RTE_MARKER cacheline1;
-
 #if RTE_IOVA_IN_MBUF
 	/**
 	 * Next segment of scattered packet. Must be NULL in the last
 	 * segment or in case of non-segmented packet.
 	 */
+	alignas(RTE_CACHE_LINE_MIN_SIZE)
 	struct rte_mbuf *next;
 #else
 	/**
 	 * Reserved for dynamic fields
 	 * when the next pointer is in first cache line (i.e. RTE_IOVA_IN_MBUF is 0).
 	 */
+	alignas(RTE_CACHE_LINE_MIN_SIZE)
 	uint64_t dynfield2;
 #endif
 
-- 
2.45.1


^ permalink raw reply	[relevance 6%]

* Re: [PATCH v3 1/2] power: introduce PM QoS API on CPU wide
  2024-06-19  6:31  5%   ` [PATCH v3 1/2] power: introduce PM QoS API on CPU wide Huisong Li
@ 2024-06-19 15:32  0%     ` Thomas Monjalon
  2024-06-20  2:32  0%       ` lihuisong (C)
  0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2024-06-19 15:32 UTC (permalink / raw)
  To: Huisong Li
  Cc: dev, mb, ferruh.yigit, anatoly.burakov, david.hunt,
	sivaprasad.tummala, david.marchand, liuyonglong, lihuisong

19/06/2024 08:31, Huisong Li:
> --- /dev/null
> +++ b/lib/power/rte_power_qos.h
> @@ -0,0 +1,71 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2024 HiSilicon Limited
> + */
> +
> +#ifndef RTE_POWER_QOS_H
> +#define RTE_POWER_QOS_H
> +
> +#include <rte_compat.h>
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +/**
> + * @file rte_power_qos.h
> + *
> + * PM QoS API.
> + *
> + * The CPU-wide resume latency limit has a positive impact on this CPU's idle
> + * state selection in each cpuidle governor.
> + * Please see the PM QoS on CPU wide in the following link:
> + * https://www.kernel.org/doc/html/latest/admin-guide/abi-testing.html?highlight=pm_qos_resume_latency_us#abi-sys-devices-power-pm-qos-resume-latency-us
> + *
> + * The deeper the idle state, the lower the power consumption, but the
> + * longer the resume time. Some services are delay sensitive and expect a very
> + * low resume time, such as the interrupt packet receiving mode.
> + *
> + * In these cases, the per-CPU PM QoS API can be used to control this CPU's idle
> + * state selection and restrict it to the shallowest idle state, lowering the
> + * delay after sleep, by setting a strict resume latency (zero value).
> + */
> +
> +#define RTE_POWER_QOS_STRICT_LATENCY_VALUE             0
> +#define RTE_POWER_QOS_RESUME_LATENCY_NO_CONSTRAINT    ((int)(UINT32_MAX >> 1))

stdint.h include is missing
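
A minimal sketch of the fix being asked for, assuming nothing beyond the
two macros quoted above (demo_latency_is_valid is a hypothetical helper,
not part of the patch). UINT32_MAX comes from <stdint.h>, so the header
only compiles on its own if it includes that file itself:

```c
#include <assert.h>
#include <stdint.h> /* provides UINT32_MAX, which the macro below uses */

/* Copied from the quoted patch. */
#define RTE_POWER_QOS_STRICT_LATENCY_VALUE             0
#define RTE_POWER_QOS_RESUME_LATENCY_NO_CONSTRAINT    ((int)(UINT32_MAX >> 1))

/* With the include in place the constant is usable at compile time. */
static_assert(RTE_POWER_QOS_RESUME_LATENCY_NO_CONSTRAINT == INT32_MAX,
	"no-constraint value is the largest positive 32-bit int");

/* Hypothetical range check, shown only to exercise both macros. */
static inline int
demo_latency_is_valid(int latency)
{
	return latency >= RTE_POWER_QOS_STRICT_LATENCY_VALUE &&
	       latency <= RTE_POWER_QOS_RESUME_LATENCY_NO_CONSTRAINT;
}
```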




^ permalink raw reply	[relevance 0%]

* Re: [PATCH v3 1/2] power: introduce PM QoS API on CPU wide
  2024-06-19 15:32  0%     ` Thomas Monjalon
@ 2024-06-20  2:32  0%       ` lihuisong (C)
  0 siblings, 0 replies; 200+ results
From: lihuisong (C) @ 2024-06-20  2:32 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: dev, mb, ferruh.yigit, anatoly.burakov, david.hunt,
	sivaprasad.tummala, david.marchand, liuyonglong


On 2024/6/19 23:32, Thomas Monjalon wrote:
> 19/06/2024 08:31, Huisong Li:
>> --- /dev/null
>> +++ b/lib/power/rte_power_qos.h
>> @@ -0,0 +1,71 @@
>> +/* SPDX-License-Identifier: BSD-3-Clause
>> + * Copyright(c) 2024 HiSilicon Limited
>> + */
>> +
>> +#ifndef RTE_POWER_QOS_H
>> +#define RTE_POWER_QOS_H
>> +
>> +#include <rte_compat.h>
>> +
>> +#ifdef __cplusplus
>> +extern "C" {
>> +#endif
>> +
>> +/**
>> + * @file rte_power_qos.h
>> + *
>> + * PM QoS API.
>> + *
>> + * The CPU-wide resume latency limit has a positive impact on this CPU's idle
>> + * state selection in each cpuidle governor.
>> + * Please see the PM QoS on CPU wide in the following link:
>> + * https://www.kernel.org/doc/html/latest/admin-guide/abi-testing.html?highlight=pm_qos_resume_latency_us#abi-sys-devices-power-pm-qos-resume-latency-us
>> + *
>> + * The deeper the idle state, the lower the power consumption, but the
>> + * longer the resume time. Some services are delay sensitive and expect a very
>> + * low resume time, such as the interrupt packet receiving mode.
>> + *
>> + * In these cases, the per-CPU PM QoS API can be used to control this CPU's idle
>> + * state selection and restrict it to the shallowest idle state, lowering the
>> + * delay after sleep, by setting a strict resume latency (zero value).
>> + */
>> +
>> +#define RTE_POWER_QOS_STRICT_LATENCY_VALUE             0
>> +#define RTE_POWER_QOS_RESUME_LATENCY_NO_CONSTRAINT    ((int)(UINT32_MAX >> 1))
> stdint.h include is missing
Yes, the header is not self-contained as it stands.
I will add it in the next version, thanks.
>
>
>
> .

^ permalink raw reply	[relevance 0%]

* [PATCH v2 3/9] doc: reword design section in contributors guidelines
  @ 2024-06-21  2:32  6%   ` Nandini Persad
  0 siblings, 0 replies; 200+ results
From: Nandini Persad @ 2024-06-21  2:32 UTC (permalink / raw)
  To: dev

Minor edits were made to the grammar and syntax of the design section.

Signed-off-by: Nandini Persad <nandinipersad361@gmail.com>
---
 .mailmap                           |  1 +
 doc/guides/contributing/design.rst | 86 +++++++++++++++---------------
 doc/guides/linux_gsg/sys_reqs.rst  |  2 +-
 3 files changed, 45 insertions(+), 44 deletions(-)

diff --git a/.mailmap b/.mailmap
index 66ebc20666..7d4929c5d1 100644
--- a/.mailmap
+++ b/.mailmap
@@ -1002,6 +1002,7 @@ Naga Suresh Somarowthu <naga.sureshx.somarowthu@intel.com>
 Nalla Pradeep <pnalla@marvell.com>
 Na Na <nana.nn@alibaba-inc.com>
 Nan Chen <whutchennan@gmail.com>
+Nandini Persad <nandinipersad361@gmail.com>
 Nannan Lu <nannan.lu@intel.com>
 Nan Zhou <zhounan14@huawei.com>
 Narcisa Vasile <navasile@linux.microsoft.com> <navasile@microsoft.com> <narcisa.vasile@microsoft.com>
diff --git a/doc/guides/contributing/design.rst b/doc/guides/contributing/design.rst
index b724177ba1..3d1f5aeb91 100644
--- a/doc/guides/contributing/design.rst
+++ b/doc/guides/contributing/design.rst
@@ -1,6 +1,7 @@
 ..  SPDX-License-Identifier: BSD-3-Clause
     Copyright 2018 The DPDK contributors
 
+
 Design
 ======
 
@@ -8,22 +9,26 @@ Design
 Environment or Architecture-specific Sources
 --------------------------------------------
 
-In DPDK and DPDK applications, some code is specific to an architecture (i686, x86_64) or to an executive environment (freebsd or linux) and so on.
-As far as is possible, all such instances of architecture or env-specific code should be provided via standard APIs in the EAL.
+In DPDK and DPDK applications, some code is architecture-specific (i686, x86_64) or environment-specific (FreeBSD or Linux, etc.).
+When feasible, such instances of architecture or env-specific code should be provided via standard APIs in the EAL.
+
+By convention, a file is specific if it is located in a directory whose name indicates an architecture or environment. Otherwise, it is common.
 
-By convention, a file is common if it is not located in a directory indicating that it is specific.
-For instance, a file located in a subdir of "x86_64" directory is specific to this architecture.
+For example:
+
+A file located in a subdir of "x86_64" directory is specific to this architecture.
 A file located in a subdir of "linux" is specific to this execution environment.
 
 .. note::
 
    Code in DPDK libraries and applications should be generic.
-   The correct location for architecture or executive environment specific code is in the EAL.
+   The correct location for architecture or executive environment-specific code is in the EAL.
+
+When necessary, there are several ways to handle specific code:
 
-When absolutely necessary, there are several ways to handle specific code:
 
-* Use a ``#ifdef`` with a build definition macro in the C code.
-  This can be done when the differences are small and they can be embedded in the same C file:
+* When the differences are small and they can be embedded in the same C file, use a ``#ifdef`` with a build definition macro in the C code.
+
 
   .. code-block:: c
 
@@ -33,9 +38,9 @@ When absolutely necessary, there are several ways to handle specific code:
      titi();
      #endif
 
-* Use build definition macros and conditions in the Meson build file. This is done when the differences are more significant.
-  In this case, the code is split into two separate files that are architecture or environment specific.
-  This should only apply inside the EAL library.
+
+* When the differences are more significant, use build definition macros and conditions in the Meson build file. In this case, the code is split into two separate files that are architecture or environment specific. This should only apply inside the EAL library.
+
 
 Per Architecture Sources
 ~~~~~~~~~~~~~~~~~~~~~~~~
@@ -43,7 +48,8 @@ Per Architecture Sources
 The following macro options can be used:
 
 * ``RTE_ARCH`` is a string that contains the name of the architecture.
-* ``RTE_ARCH_I686``, ``RTE_ARCH_X86_64``, ``RTE_ARCH_X86_X32``, ``RTE_ARCH_PPC_64``, ``RTE_ARCH_RISCV``, ``RTE_ARCH_LOONGARCH``, ``RTE_ARCH_ARM``, ``RTE_ARCH_ARMv7`` or ``RTE_ARCH_ARM64`` are defined only if we are building for those architectures.
+* ``RTE_ARCH_I686``, ``RTE_ARCH_X86_64``, ``RTE_ARCH_X86_X32``, ``RTE_ARCH_PPC_64``, ``RTE_ARCH_RISCV``, ``RTE_ARCH_LOONGARCH``, ``RTE_ARCH_ARM``, ``RTE_ARCH_ARMv7`` or ``RTE_ARCH_ARM64`` are defined when building for these architectures.
+
 
 Per Execution Environment Sources
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -51,30 +57,22 @@ Per Execution Environment Sources
 The following macro options can be used:
 
 * ``RTE_EXEC_ENV`` is a string that contains the name of the executive environment.
-* ``RTE_EXEC_ENV_FREEBSD``, ``RTE_EXEC_ENV_LINUX`` or ``RTE_EXEC_ENV_WINDOWS`` are defined only if we are building for this execution environment.
+* ``RTE_EXEC_ENV_FREEBSD``, ``RTE_EXEC_ENV_LINUX`` or ``RTE_EXEC_ENV_WINDOWS`` are defined only when building for this execution environment.
+
 
 Mbuf features
 -------------
 
-The ``rte_mbuf`` structure must be kept small (128 bytes).
-
-In order to add new features without wasting buffer space for unused features,
-some fields and flags can be registered dynamically in a shared area.
-The "dynamic" mbuf area is the default choice for the new features.
+A designated area in mbuf stores "dynamically" registered fields and flags. It is the default choice for accommodating new features. The "dynamic" area consumes the remaining space in the mbuf, indicating that it's being efficiently utilized. However, the ``rte_mbuf`` structure must be kept small (128 bytes).
 
-The "dynamic" area is eating the remaining space in mbuf,
-and some existing "static" fields may need to become "dynamic".
-
-Adding a new static field or flag must be an exception matching many criteria
-like (non exhaustive): wide usage, performance, size.
+As more features are added, the space for existing "static" fields (fields that are allocated statically) may need to be reconsidered and possibly converted to "dynamic" allocation. Adding a new static field or flag should be an exception. It must meet specific criteria including widespread usage, performance impact, and size considerations. Before adding a new static feature, it must be justified by its necessity and its impact on the system's efficiency.
 
 
 Runtime Information - Logging, Tracing and Telemetry
 ----------------------------------------------------
 
-It is often desirable to provide information to the end-user
-as to what is happening to the application at runtime.
-DPDK provides a number of built-in mechanisms to provide this introspection:
+The end user may inquire as to what is happening to the application at runtime.
+DPDK provides several built-in mechanisms to provide these insights:
 
 * :ref:`Logging <dynamic_logging>`
 * :doc:`Tracing <../prog_guide/trace_lib>`
@@ -82,11 +80,11 @@ DPDK provides a number of built-in mechanisms to provide this introspection:
 
 Each of these has its own strengths and suitabilities for use within DPDK components.
 
-Below are some guidelines for when each should be used:
+Here are guidelines for when each mechanism should be used:
 
 * For reporting error conditions, or other abnormal runtime issues, *logging* should be used.
-  Depending on the severity of the issue, the appropriate log level, for example,
-  ``ERROR``, ``WARNING`` or ``NOTICE``, should be used.
+  Depending on the severity of the issue, the appropriate log level
+  (for example, ``ERROR``, ``WARNING`` or ``NOTICE``) should be used.
 
 .. note::
 
@@ -96,24 +94,24 @@ Below are some guidelines for when each should be used:
 
 * For component initialization, or other cases where a path through the code
   is only likely to be taken once,
-  either *logging* at ``DEBUG`` level or *tracing* may be used, or potentially both.
+  either *logging* at ``DEBUG`` level or *tracing* may be used, or both.
   In the latter case, tracing can provide basic information as to the code path taken,
   with debug-level logging providing additional details on internal state,
-  not possible to emit via tracing.
+  which is not possible to emit via tracing.
 
 * For a component's data-path, where a path is to be taken multiple times within a short timeframe,
   *tracing* should be used.
   Since DPDK tracing uses `Common Trace Format <https://diamon.org/ctf/>`_ for its tracing logs,
   post-analysis can be done using a range of external tools.
 
-* For numerical or statistical data generated by a component, for example, per-packet statistics,
+* For numerical or statistical data generated by a component, such as per-packet statistics,
   *telemetry* should be used.
 
-* For any data where the data may need to be gathered at any point in the execution
-  to help assess the state of the application component,
-  for example, core configuration, device information, *telemetry* should be used.
+* For any data that may need to be gathered at any point during the execution
+  to help assess the state of the application component (for example, core configuration, device information), *telemetry* should be used.
   Telemetry callbacks should not modify any program state, but be "read-only".
 
+
 Many libraries also include a ``rte_<libname>_dump()`` function as part of their API,
 writing verbose internal details to a given file-handle.
 New libraries are encouraged to provide such functions where it makes sense to do so,
@@ -135,13 +133,12 @@ requirements for preventing ABI changes when implementing statistics.
 Mechanism to allow the application to turn library statistics on and off
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-Having runtime support for enabling/disabling library statistics is recommended,
-as build-time options should be avoided. However, if build-time options are used,
-for example as in the table library, the options can be set using c_args.
-When this flag is set, all the counters supported by current library are
+Having runtime support for enabling/disabling library statistics is recommended
+as build-time options should be avoided. However, if build-time options are used, as in the table library, the options can be set using c_args.
+When this flag is set, all the counters supported by the current library are
 collected for all the instances of every object type provided by the library.
 When this flag is cleared, none of the counters supported by the current library
-are collected for any instance of any object type provided by the library:
+are collected for any instance of any object type provided by the library.
 
 
 Prevention of ABI changes due to library statistics support
@@ -165,8 +162,8 @@ Motivation to allow the application to turn library statistics on and off
 
 It is highly recommended that each library provides statistics counters to allow
 an application to monitor the library-level run-time events. Typical counters
-are: number of packets received/dropped/transmitted, number of buffers
-allocated/freed, number of occurrences for specific events, etc.
+are: the number of packets received/dropped/transmitted, the number of buffers
+allocated/freed, the number of occurrences for specific events, etc.
 
 However, the resources consumed for library-level statistics counter collection
 have to be spent out of the application budget and the counters collected by
@@ -198,6 +195,7 @@ applications:
   the application may decide to turn the collection of statistics counters off for
   Library X and on for Library Y.
 
+
 The statistics collection consumes a certain amount of CPU resources (cycles,
 cache bandwidth, memory bandwidth, etc) that depends on:
 
@@ -218,6 +216,7 @@ cache bandwidth, memory bandwidth, etc) that depends on:
   validated for header integrity, counting the number of bits set in a bitmask
   might be needed.
 
+
 PF and VF Considerations
 ------------------------
 
@@ -229,5 +228,6 @@ Developers should work with the Linux Kernel community to get the required
 functionality upstream. PF functionality should only be added to DPDK for
 testing and prototyping purposes while the kernel work is ongoing. It should
 also be marked with an "EXPERIMENTAL" tag. If the functionality isn't
-upstreamable then a case can be made to maintain the PF functionality in DPDK
+upstreamable, then a case can be made to maintain the PF functionality in DPDK
 without the EXPERIMENTAL tag.
+
diff --git a/doc/guides/linux_gsg/sys_reqs.rst b/doc/guides/linux_gsg/sys_reqs.rst
index 13be715933..0569c5cae6 100644
--- a/doc/guides/linux_gsg/sys_reqs.rst
+++ b/doc/guides/linux_gsg/sys_reqs.rst
@@ -99,7 +99,7 @@ e.g. :doc:`../nics/index`
 Running DPDK Applications
 -------------------------
 
-To run a DPDK application, some customization may be required on the target machine.
+To run a DPDK application, customization may be required on the target machine.
 
 System Software
 ~~~~~~~~~~~~~~~
-- 
2.34.1


^ permalink raw reply	[relevance 6%]

* [DPDK/core Bug 1471] rte_pktmbuf_free_bulk does not respect RTE_LIBRTE_MBUF_DEBUG
@ 2024-06-21 18:33  3% bugzilla
  0 siblings, 0 replies; 200+ results
From: bugzilla @ 2024-06-21 18:33 UTC (permalink / raw)
  To: dev

[-- Attachment #1: Type: text/plain, Size: 1307 bytes --]

https://bugs.dpdk.org/show_bug.cgi?id=1471

            Bug ID: 1471
           Summary: rte_pktmbuf_free_bulk does not respect
                    RTE_LIBRTE_MBUF_DEBUG
           Product: DPDK
           Version: unspecified
          Hardware: All
                OS: All
            Status: UNCONFIRMED
          Severity: normal
          Priority: Normal
         Component: core
          Assignee: dev@dpdk.org
          Reporter: mb@smartsharesystems.com
  Target Milestone: ---

rte_pktmbuf_free_bulk() calls __rte_mbuf_sanity_check(), which behaves
differently depending on RTE_LIBRTE_MBUF_DEBUG being defined or not.

Unfortunately, rte_pktmbuf_free_bulk() is not an inline function in the header
file; it is defined in the C file.
This means that the behavior of __rte_mbuf_sanity_check() within
rte_pktmbuf_free_bulk() is controlled by the RTE_LIBRTE_MBUF_DEBUG setting when
building the DPDK library, not when building the application.

rte_pktmbuf_free_bulk() should have been inline in the header file, so the
application developer can control its __rte_mbuf_sanity_check() behavior by
using RTE_LIBRTE_MBUF_DEBUG setting when building the application.

How can this function be removed from the ABI and made inline instead?

-- 
You are receiving this mail because:
You are the assignee for the bug.

[-- Attachment #2: Type: text/html, Size: 3174 bytes --]

^ permalink raw reply	[relevance 3%]

* [PATCH v5 4/4] dts: add API doc generation
  @ 2024-06-24 13:26  2%   ` Juraj Linkeš
  0 siblings, 0 replies; 200+ results
From: Juraj Linkeš @ 2024-06-24 13:26 UTC (permalink / raw)
  To: thomas, Honnappa.Nagarahalli, bruce.richardson, jspewock, probb,
	paul.szczepanek, Luca.Vizzarro, npratte
  Cc: dev, Juraj Linkeš, Luca Vizzarro

The tool used to generate DTS API docs is Sphinx, which is already in
use in DPDK. The same configuration is used to preserve style with one
DTS-specific configuration (so that the DPDK docs are unchanged) that
modifies how the sidebar displays the content.

Sphinx generates the documentation from Python docstrings. The docstring
format is the Google format [0] which requires the sphinx.ext.napoleon
extension. The other extension, sphinx.ext.intersphinx, enables linking
to objects in external documentation, such as the Python documentation.
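
For readers unfamiliar with the Google docstring format, here is a minimal
sketch of what napoleon parses. The function below is hypothetical and is not
part of DTS; it only illustrates the Args/Returns/Raises layout described in
the style guide [0].

```python
def transmit_packets(port_id: int, count: int = 1) -> int:
    """Send packets from a port.

    Hypothetical function, used only to illustrate the docstring layout.

    Args:
        port_id: Identifier of the port to send from.
        count: Number of packets to send.

    Returns:
        The number of packets that were sent.

    Raises:
        ValueError: If `count` is negative.
    """
    if count < 0:
        raise ValueError("count must not be negative")
    # This sketch has no real hardware behind it, so it reports every
    # requested packet as sent.
    return count
```

With sphinx.ext.napoleon enabled, the Args, Returns and Raises sections are
converted to reStructuredText fields before the page is rendered.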

There are two requirements for building DTS docs:
* The same Python version as DTS or higher, because Sphinx imports the
  code.
* Also the same Python packages as DTS, for the same reason.

[0] https://google.github.io/styleguide/pyguide.html#38-comments-and-docstrings

Signed-off-by: Juraj Linkeš <juraj.linkes@pantheon.tech>
Reviewed-by: Luca Vizzarro <luca.vizzarro@arm.com>
Reviewed-by: Jeremy Spewock <jspewock@iol.unh.edu>
Tested-by: Luca Vizzarro <luca.vizzarro@arm.com>
Tested-by: Nicholas Pratte <npratte@iol.unh.edu>
---
 buildtools/call-sphinx-build.py | 31 ++++++++++++++++++--------
 doc/api/doxy-api-index.md       |  3 +++
 doc/api/doxy-api.conf.in        |  2 ++
 doc/api/meson.build             | 11 +++++++---
 doc/guides/conf.py              | 39 ++++++++++++++++++++++++++++-----
 doc/guides/meson.build          |  1 +
 doc/guides/tools/dts.rst        | 34 +++++++++++++++++++++++++++-
 dts/doc/meson.build             | 27 +++++++++++++++++++++++
 dts/meson.build                 | 16 ++++++++++++++
 meson.build                     |  1 +
 10 files changed, 147 insertions(+), 18 deletions(-)
 create mode 100644 dts/doc/meson.build
 create mode 100644 dts/meson.build

diff --git a/buildtools/call-sphinx-build.py b/buildtools/call-sphinx-build.py
index da19e950c9..dff8471560 100755
--- a/buildtools/call-sphinx-build.py
+++ b/buildtools/call-sphinx-build.py
@@ -3,31 +3,44 @@
 # Copyright(c) 2019 Intel Corporation
 #
 
+import argparse
 import sys
 import os
 from os.path import join
 from subprocess import run
 
-# assign parameters to variables
-(sphinx, version, src, dst, *extra_args) = sys.argv[1:]
+parser = argparse.ArgumentParser()
+parser.add_argument('sphinx')
+parser.add_argument('version')
+parser.add_argument('src')
+parser.add_argument('dst')
+parser.add_argument('--dts-root', default=None)
+args, extra_args = parser.parse_known_args()
 
 # set the version in environment for sphinx to pick up
-os.environ['DPDK_VERSION'] = version
+os.environ['DPDK_VERSION'] = args.version
+if args.dts_root:
+    os.environ['DTS_ROOT'] = args.dts_root
 
-sphinx_cmd = [sphinx] + extra_args
+sphinx_cmd = [args.sphinx] + extra_args
 
 # find all the files sphinx will process so we can write them as dependencies
 srcfiles = []
-for root, dirs, files in os.walk(src):
+for root, dirs, files in os.walk(args.src):
     srcfiles.extend([join(root, f) for f in files])
 
+if not os.path.exists(args.dst):
+    os.makedirs(args.dst)
+
 # run sphinx, putting the html output in a "html" directory
-with open(join(dst, 'sphinx_html.out'), 'w') as out:
-    process = run(sphinx_cmd + ['-b', 'html', src, join(dst, 'html')],
-                  stdout=out)
+with open(join(args.dst, 'sphinx_html.out'), 'w') as out:
+    process = run(
+        sphinx_cmd + ['-b', 'html', args.src, join(args.dst, 'html')],
+        stdout=out
+    )
 
 # create a gcc format .d file giving all the dependencies of this doc build
-with open(join(dst, '.html.d'), 'w') as d:
+with open(join(args.dst, '.html.d'), 'w') as d:
     d.write('html: ' + ' '.join(srcfiles) + '\n')
 
 sys.exit(process.returncode)
diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
index f9283154f8..cc214ede46 100644
--- a/doc/api/doxy-api-index.md
+++ b/doc/api/doxy-api-index.md
@@ -244,3 +244,6 @@ The public API headers are grouped by topics:
   [experimental APIs](@ref rte_compat.h),
   [ABI versioning](@ref rte_function_versioning.h),
   [version](@ref rte_version.h)
+
+- **tests**:
+  [**DTS**](@dts_api_main_page)
diff --git a/doc/api/doxy-api.conf.in b/doc/api/doxy-api.conf.in
index a8823c046f..c94f02d411 100644
--- a/doc/api/doxy-api.conf.in
+++ b/doc/api/doxy-api.conf.in
@@ -124,6 +124,8 @@ SEARCHENGINE            = YES
 SORT_MEMBER_DOCS        = NO
 SOURCE_BROWSER          = YES
 
+ALIASES                 = "dts_api_main_page=@DTS_API_MAIN_PAGE@"
+
 EXAMPLE_PATH            = @TOPDIR@/examples
 EXAMPLE_PATTERNS        = *.c
 EXAMPLE_RECURSIVE       = YES
diff --git a/doc/api/meson.build b/doc/api/meson.build
index 5b50692df9..ffc75d7b5a 100644
--- a/doc/api/meson.build
+++ b/doc/api/meson.build
@@ -1,6 +1,7 @@
 # SPDX-License-Identifier: BSD-3-Clause
 # Copyright(c) 2018 Luca Boccassi <bluca@debian.org>
 
+doc_api_build_dir = meson.current_build_dir()
 doxygen = find_program('doxygen', required: get_option('enable_docs'))
 
 if not doxygen.found()
@@ -32,14 +33,18 @@ example = custom_target('examples.dox',
 # set up common Doxygen configuration
 cdata = configuration_data()
 cdata.set('VERSION', meson.project_version())
-cdata.set('API_EXAMPLES', join_paths(dpdk_build_root, 'doc', 'api', 'examples.dox'))
-cdata.set('OUTPUT', join_paths(dpdk_build_root, 'doc', 'api'))
+cdata.set('API_EXAMPLES', join_paths(doc_api_build_dir, 'examples.dox'))
+cdata.set('OUTPUT', doc_api_build_dir)
 cdata.set('TOPDIR', dpdk_source_root)
-cdata.set('STRIP_FROM_PATH', ' '.join([dpdk_source_root, join_paths(dpdk_build_root, 'doc', 'api')]))
+cdata.set('STRIP_FROM_PATH', ' '.join([dpdk_source_root, doc_api_build_dir]))
 cdata.set('WARN_AS_ERROR', 'NO')
 if get_option('werror')
     cdata.set('WARN_AS_ERROR', 'YES')
 endif
+# A local reference must be relative to the main index.html page
+# The path below can't be taken from the DTS meson file as that would
+# require recursive subdir traversal (doc, dts, then doc again)
+cdata.set('DTS_API_MAIN_PAGE', join_paths('..', 'dts', 'html', 'index.html'))
 
 # configure HTML Doxygen run
 html_cdata = configuration_data()
diff --git a/doc/guides/conf.py b/doc/guides/conf.py
index 0f7ff5282d..b442a1f76c 100644
--- a/doc/guides/conf.py
+++ b/doc/guides/conf.py
@@ -7,10 +7,9 @@
 from sphinx import __version__ as sphinx_version
 from os import listdir
 from os import environ
-from os.path import basename
-from os.path import dirname
+from os.path import basename, dirname
 from os.path import join as path_join
-from sys import argv, stderr
+from sys import argv, stderr, path
 
 import configparser
 
@@ -24,6 +23,37 @@
           file=stderr)
     pass
 
+# Napoleon enables the Google format of Python docstrings, used in DTS
+# Intersphinx allows linking to external projects, such as Python docs, also used in DTS
+extensions = ['sphinx.ext.napoleon', 'sphinx.ext.intersphinx']
+
+# DTS Python docstring options
+autodoc_default_options = {
+    'members': True,
+    'member-order': 'bysource',
+    'show-inheritance': True,
+}
+autodoc_class_signature = 'separated'
+autodoc_typehints = 'both'
+autodoc_typehints_format = 'short'
+autodoc_typehints_description_target = 'documented'
+napoleon_numpy_docstring = False
+napoleon_attr_annotations = True
+napoleon_preprocess_types = True
+add_module_names = False
+toc_object_entries = True
+toc_object_entries_show_parents = 'hide'
+intersphinx_mapping = {'python': ('https://docs.python.org/3', None)}
+
+dts_root = environ.get('DTS_ROOT')
+if dts_root:
+    path.append(dts_root)
+    # DTS Sidebar config
+    html_theme_options = {
+        'collapse_navigation': False,
+        'navigation_depth': -1,
+    }
+
 stop_on_error = ('-W' in argv)
 
 project = 'Data Plane Development Kit'
@@ -35,8 +65,7 @@
 html_show_copyright = False
 highlight_language = 'none'
 
-release = environ.setdefault('DPDK_VERSION', "None")
-version = release
+version = environ.setdefault('DPDK_VERSION', "None")
 
 master_doc = 'index'
 
diff --git a/doc/guides/meson.build b/doc/guides/meson.build
index 51f81da2e3..8933d75f6b 100644
--- a/doc/guides/meson.build
+++ b/doc/guides/meson.build
@@ -1,6 +1,7 @@
 # SPDX-License-Identifier: BSD-3-Clause
 # Copyright(c) 2018 Intel Corporation
 
+doc_guides_source_dir = meson.current_source_dir()
 sphinx = find_program('sphinx-build', required: get_option('enable_docs'))
 
 if not sphinx.found()
diff --git a/doc/guides/tools/dts.rst b/doc/guides/tools/dts.rst
index 515b15e4d8..bd42025507 100644
--- a/doc/guides/tools/dts.rst
+++ b/doc/guides/tools/dts.rst
@@ -292,7 +292,12 @@ and try not to divert much from it.
 The :ref:`DTS developer tools <dts_dev_tools>` will issue warnings
 when some of the basics are not met.
 
-The code must be properly documented with docstrings.
+The API documentation, which is a helpful reference when developing, may be accessed
+in the code directly or generated with the :ref:`API docs build steps <building_api_docs>`.
+When adding new files or modifying the directory structure, the corresponding changes must
+be made to DTS api doc sources in ``dts/doc``.
+
+Speaking of which, the code must be properly documented with docstrings.
 The style must conform to the `Google style
 <https://google.github.io/styleguide/pyguide.html#38-comments-and-docstrings>`_.
 See an example of the style `here
@@ -427,6 +432,33 @@ the DTS code check and format script.
 Refer to the script for usage: ``devtools/dts-check-format.sh -h``.
 
 
+.. _building_api_docs:
+
+Building DTS API docs
+---------------------
+
+To build DTS API docs, install the dependencies with Poetry, then enter its shell:
+
+.. code-block:: console
+
+   poetry install --no-root --with docs
+   poetry shell
+
+The documentation is built using the standard DPDK build system. After executing the meson command
+and entering Poetry's shell, build the documentation with:
+
+.. code-block:: console
+
+   ninja -C build dts-doc
+
+The output is generated in ``build/doc/api/dts/html``.
+
+.. Note::
+
+   Make sure to fix any Sphinx warnings when adding or updating docstrings. Also make sure to run
+   the ``devtools/dts-check-format.sh`` script and address any issues it finds.
+
+
 Configuration Schema
 --------------------
 
diff --git a/dts/doc/meson.build b/dts/doc/meson.build
new file mode 100644
index 0000000000..01b7b51034
--- /dev/null
+++ b/dts/doc/meson.build
@@ -0,0 +1,27 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2023 PANTHEON.tech s.r.o.
+
+sphinx = find_program('sphinx-build', required: false)
+sphinx_apidoc = find_program('sphinx-apidoc', required: false)
+
+if not sphinx.found() or not sphinx_apidoc.found()
+    subdir_done()
+endif
+
+dts_doc_api_build_dir = join_paths(doc_api_build_dir, 'dts')
+
+extra_sphinx_args = ['-E', '-c', doc_guides_source_dir, '--dts-root', dts_dir]
+if get_option('werror')
+    extra_sphinx_args += '-W'
+endif
+
+htmldir = join_paths(get_option('datadir'), 'doc', 'dpdk', 'dts')
+dts_api_html = custom_target('dts_api_html',
+        output: 'html',
+        command: [sphinx_wrapper, sphinx, meson.project_version(),
+            meson.current_source_dir(), dts_doc_api_build_dir, extra_sphinx_args],
+        build_by_default: false,
+        install: get_option('enable_docs'),
+        install_dir: htmldir)
+doc_targets += dts_api_html
+doc_target_names += 'DTS_API_HTML'
diff --git a/dts/meson.build b/dts/meson.build
new file mode 100644
index 0000000000..e8ce0f06ac
--- /dev/null
+++ b/dts/meson.build
@@ -0,0 +1,16 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2023 PANTHEON.tech s.r.o.
+
+doc_targets = []
+doc_target_names = []
+dts_dir = meson.current_source_dir()
+
+subdir('doc')
+
+if doc_targets.length() == 0
+    message = 'No docs targets found'
+else
+    message = 'Built docs:'
+endif
+run_target('dts-doc', command: [echo, message, doc_target_names],
+    depends: doc_targets)
diff --git a/meson.build b/meson.build
index 8b248d4505..835973a0ce 100644
--- a/meson.build
+++ b/meson.build
@@ -87,6 +87,7 @@ subdir('app')
 
 # build docs
 subdir('doc')
+subdir('dts')
 
 # build any examples explicitly requested - useful for developers - and
 # install any example code into the appropriate install path
-- 
2.34.1


^ permalink raw reply	[relevance 2%]

* [PATCH v6 4/4] dts: add API doc generation
  @ 2024-06-24 13:46  2%   ` Juraj Linkeš
  2024-06-24 14:08  0%     ` Juraj Linkeš
  0 siblings, 1 reply; 200+ results
From: Juraj Linkeš @ 2024-06-24 13:46 UTC (permalink / raw)
  To: thomas, Honnappa.Nagarahalli, bruce.richardson, jspewock, probb,
	paul.szczepanek, Luca.Vizzarro, npratte
  Cc: dev, Juraj Linkeš, Luca Vizzarro

The tool used to generate DTS API docs is Sphinx, which is already in
use in DPDK. The same configuration is used to preserve style with one
DTS-specific configuration (so that the DPDK docs are unchanged) that
modifies how the sidebar displays the content.

Sphinx generates the documentation from Python docstrings. The docstring
format is the Google format [0] which requires the sphinx.ext.napoleon
extension. The other extension, sphinx.ext.intersphinx, enables linking
to objects in external documentation, such as the Python documentation.

There are two requirements for building DTS docs:
* The same Python version as DTS or higher, because Sphinx imports the
  code.
* Also the same Python packages as DTS, for the same reason.

[0] https://google.github.io/styleguide/pyguide.html#38-comments-and-docstrings

Signed-off-by: Juraj Linkeš <juraj.linkes@pantheon.tech>
Reviewed-by: Luca Vizzarro <luca.vizzarro@arm.com>
Reviewed-by: Jeremy Spewock <jspewock@iol.unh.edu>
Tested-by: Luca Vizzarro <luca.vizzarro@arm.com>
Tested-by: Nicholas Pratte <npratte@iol.unh.edu>
---
 buildtools/call-sphinx-build.py | 31 ++++++++++++++++++--------
 doc/api/doxy-api-index.md       |  3 +++
 doc/api/doxy-api.conf.in        |  2 ++
 doc/api/meson.build             | 11 +++++++---
 doc/guides/conf.py              | 39 ++++++++++++++++++++++++++++-----
 doc/guides/meson.build          |  1 +
 doc/guides/tools/dts.rst        | 34 +++++++++++++++++++++++++++-
 dts/doc/meson.build             | 27 +++++++++++++++++++++++
 dts/meson.build                 | 16 ++++++++++++++
 meson.build                     |  1 +
 10 files changed, 147 insertions(+), 18 deletions(-)
 create mode 100644 dts/doc/meson.build
 create mode 100644 dts/meson.build

diff --git a/buildtools/call-sphinx-build.py b/buildtools/call-sphinx-build.py
index da19e950c9..dff8471560 100755
--- a/buildtools/call-sphinx-build.py
+++ b/buildtools/call-sphinx-build.py
@@ -3,31 +3,44 @@
 # Copyright(c) 2019 Intel Corporation
 #
 
+import argparse
 import sys
 import os
 from os.path import join
 from subprocess import run
 
-# assign parameters to variables
-(sphinx, version, src, dst, *extra_args) = sys.argv[1:]
+parser = argparse.ArgumentParser()
+parser.add_argument('sphinx')
+parser.add_argument('version')
+parser.add_argument('src')
+parser.add_argument('dst')
+parser.add_argument('--dts-root', default=None)
+args, extra_args = parser.parse_known_args()
 
 # set the version in environment for sphinx to pick up
-os.environ['DPDK_VERSION'] = version
+os.environ['DPDK_VERSION'] = args.version
+if args.dts_root:
+    os.environ['DTS_ROOT'] = args.dts_root
 
-sphinx_cmd = [sphinx] + extra_args
+sphinx_cmd = [args.sphinx] + extra_args
 
 # find all the files sphinx will process so we can write them as dependencies
 srcfiles = []
-for root, dirs, files in os.walk(src):
+for root, dirs, files in os.walk(args.src):
     srcfiles.extend([join(root, f) for f in files])
 
+if not os.path.exists(args.dst):
+    os.makedirs(args.dst)
+
 # run sphinx, putting the html output in a "html" directory
-with open(join(dst, 'sphinx_html.out'), 'w') as out:
-    process = run(sphinx_cmd + ['-b', 'html', src, join(dst, 'html')],
-                  stdout=out)
+with open(join(args.dst, 'sphinx_html.out'), 'w') as out:
+    process = run(
+        sphinx_cmd + ['-b', 'html', args.src, join(args.dst, 'html')],
+        stdout=out
+    )
 
 # create a gcc format .d file giving all the dependencies of this doc build
-with open(join(dst, '.html.d'), 'w') as d:
+with open(join(args.dst, '.html.d'), 'w') as d:
     d.write('html: ' + ' '.join(srcfiles) + '\n')
 
 sys.exit(process.returncode)
diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
index f9283154f8..cc214ede46 100644
--- a/doc/api/doxy-api-index.md
+++ b/doc/api/doxy-api-index.md
@@ -244,3 +244,6 @@ The public API headers are grouped by topics:
   [experimental APIs](@ref rte_compat.h),
   [ABI versioning](@ref rte_function_versioning.h),
   [version](@ref rte_version.h)
+
+- **tests**:
+  [**DTS**](@dts_api_main_page)
diff --git a/doc/api/doxy-api.conf.in b/doc/api/doxy-api.conf.in
index a8823c046f..c94f02d411 100644
--- a/doc/api/doxy-api.conf.in
+++ b/doc/api/doxy-api.conf.in
@@ -124,6 +124,8 @@ SEARCHENGINE            = YES
 SORT_MEMBER_DOCS        = NO
 SOURCE_BROWSER          = YES
 
+ALIASES                 = "dts_api_main_page=@DTS_API_MAIN_PAGE@"
+
 EXAMPLE_PATH            = @TOPDIR@/examples
 EXAMPLE_PATTERNS        = *.c
 EXAMPLE_RECURSIVE       = YES
diff --git a/doc/api/meson.build b/doc/api/meson.build
index 5b50692df9..ffc75d7b5a 100644
--- a/doc/api/meson.build
+++ b/doc/api/meson.build
@@ -1,6 +1,7 @@
 # SPDX-License-Identifier: BSD-3-Clause
 # Copyright(c) 2018 Luca Boccassi <bluca@debian.org>
 
+doc_api_build_dir = meson.current_build_dir()
 doxygen = find_program('doxygen', required: get_option('enable_docs'))
 
 if not doxygen.found()
@@ -32,14 +33,18 @@ example = custom_target('examples.dox',
 # set up common Doxygen configuration
 cdata = configuration_data()
 cdata.set('VERSION', meson.project_version())
-cdata.set('API_EXAMPLES', join_paths(dpdk_build_root, 'doc', 'api', 'examples.dox'))
-cdata.set('OUTPUT', join_paths(dpdk_build_root, 'doc', 'api'))
+cdata.set('API_EXAMPLES', join_paths(doc_api_build_dir, 'examples.dox'))
+cdata.set('OUTPUT', doc_api_build_dir)
 cdata.set('TOPDIR', dpdk_source_root)
-cdata.set('STRIP_FROM_PATH', ' '.join([dpdk_source_root, join_paths(dpdk_build_root, 'doc', 'api')]))
+cdata.set('STRIP_FROM_PATH', ' '.join([dpdk_source_root, doc_api_build_dir]))
 cdata.set('WARN_AS_ERROR', 'NO')
 if get_option('werror')
     cdata.set('WARN_AS_ERROR', 'YES')
 endif
+# A local reference must be relative to the main index.html page
+# The path below can't be taken from the DTS meson file as that would
+# require recursive subdir traversal (doc, dts, then doc again)
+cdata.set('DTS_API_MAIN_PAGE', join_paths('..', 'dts', 'html', 'index.html'))
 
 # configure HTML Doxygen run
 html_cdata = configuration_data()
diff --git a/doc/guides/conf.py b/doc/guides/conf.py
index 0f7ff5282d..b442a1f76c 100644
--- a/doc/guides/conf.py
+++ b/doc/guides/conf.py
@@ -7,10 +7,9 @@
 from sphinx import __version__ as sphinx_version
 from os import listdir
 from os import environ
-from os.path import basename
-from os.path import dirname
+from os.path import basename, dirname
 from os.path import join as path_join
-from sys import argv, stderr
+from sys import argv, stderr, path
 
 import configparser
 
@@ -24,6 +23,37 @@
           file=stderr)
     pass
 
+# Napoleon enables the Google format of Python docstrings, used in DTS
+# Intersphinx allows linking to external projects, such as Python docs, also used in DTS
+extensions = ['sphinx.ext.napoleon', 'sphinx.ext.intersphinx']
+
+# DTS Python docstring options
+autodoc_default_options = {
+    'members': True,
+    'member-order': 'bysource',
+    'show-inheritance': True,
+}
+autodoc_class_signature = 'separated'
+autodoc_typehints = 'both'
+autodoc_typehints_format = 'short'
+autodoc_typehints_description_target = 'documented'
+napoleon_numpy_docstring = False
+napoleon_attr_annotations = True
+napoleon_preprocess_types = True
+add_module_names = False
+toc_object_entries = True
+toc_object_entries_show_parents = 'hide'
+intersphinx_mapping = {'python': ('https://docs.python.org/3', None)}
+
+dts_root = environ.get('DTS_ROOT')
+if dts_root:
+    path.append(dts_root)
+    # DTS Sidebar config
+    html_theme_options = {
+        'collapse_navigation': False,
+        'navigation_depth': -1,
+    }
+
 stop_on_error = ('-W' in argv)
 
 project = 'Data Plane Development Kit'
@@ -35,8 +65,7 @@
 html_show_copyright = False
 highlight_language = 'none'
 
-release = environ.setdefault('DPDK_VERSION', "None")
-version = release
+version = environ.setdefault('DPDK_VERSION', "None")
 
 master_doc = 'index'
 
diff --git a/doc/guides/meson.build b/doc/guides/meson.build
index 51f81da2e3..8933d75f6b 100644
--- a/doc/guides/meson.build
+++ b/doc/guides/meson.build
@@ -1,6 +1,7 @@
 # SPDX-License-Identifier: BSD-3-Clause
 # Copyright(c) 2018 Intel Corporation
 
+doc_guides_source_dir = meson.current_source_dir()
 sphinx = find_program('sphinx-build', required: get_option('enable_docs'))
 
 if not sphinx.found()
diff --git a/doc/guides/tools/dts.rst b/doc/guides/tools/dts.rst
index 515b15e4d8..bd42025507 100644
--- a/doc/guides/tools/dts.rst
+++ b/doc/guides/tools/dts.rst
@@ -292,7 +292,12 @@ and try not to divert much from it.
 The :ref:`DTS developer tools <dts_dev_tools>` will issue warnings
 when some of the basics are not met.
 
-The code must be properly documented with docstrings.
+The API documentation, which is a helpful reference when developing, may be accessed
+in the code directly or generated with the :ref:`API docs build steps <building_api_docs>`.
+When adding new files or modifying the directory structure, the corresponding changes must
+be made to DTS api doc sources in ``dts/doc``.
+
+Speaking of which, the code must be properly documented with docstrings.
 The style must conform to the `Google style
 <https://google.github.io/styleguide/pyguide.html#38-comments-and-docstrings>`_.
 See an example of the style `here
@@ -427,6 +432,33 @@ the DTS code check and format script.
 Refer to the script for usage: ``devtools/dts-check-format.sh -h``.
 
 
+.. _building_api_docs:
+
+Building DTS API docs
+---------------------
+
+To build DTS API docs, install the dependencies with Poetry, then enter its shell:
+
+.. code-block:: console
+
+   poetry install --no-root --with docs
+   poetry shell
+
+The documentation is built using the standard DPDK build system. After executing the meson command
+and entering Poetry's shell, build the documentation with:
+
+.. code-block:: console
+
+   ninja -C build dts-doc
+
+The output is generated in ``build/doc/api/dts/html``.
+
+.. Note::
+
+   Make sure to fix any Sphinx warnings when adding or updating docstrings. Also make sure to run
+   the ``devtools/dts-check-format.sh`` script and address any issues it finds.
+
+
 Configuration Schema
 --------------------
 
diff --git a/dts/doc/meson.build b/dts/doc/meson.build
new file mode 100644
index 0000000000..01b7b51034
--- /dev/null
+++ b/dts/doc/meson.build
@@ -0,0 +1,27 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2023 PANTHEON.tech s.r.o.
+
+sphinx = find_program('sphinx-build', required: false)
+sphinx_apidoc = find_program('sphinx-apidoc', required: false)
+
+if not sphinx.found() or not sphinx_apidoc.found()
+    subdir_done()
+endif
+
+dts_doc_api_build_dir = join_paths(doc_api_build_dir, 'dts')
+
+extra_sphinx_args = ['-E', '-c', doc_guides_source_dir, '--dts-root', dts_dir]
+if get_option('werror')
+    extra_sphinx_args += '-W'
+endif
+
+htmldir = join_paths(get_option('datadir'), 'doc', 'dpdk', 'dts')
+dts_api_html = custom_target('dts_api_html',
+        output: 'html',
+        command: [sphinx_wrapper, sphinx, meson.project_version(),
+            meson.current_source_dir(), dts_doc_api_build_dir, extra_sphinx_args],
+        build_by_default: false,
+        install: get_option('enable_docs'),
+        install_dir: htmldir)
+doc_targets += dts_api_html
+doc_target_names += 'DTS_API_HTML'
diff --git a/dts/meson.build b/dts/meson.build
new file mode 100644
index 0000000000..e8ce0f06ac
--- /dev/null
+++ b/dts/meson.build
@@ -0,0 +1,16 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2023 PANTHEON.tech s.r.o.
+
+doc_targets = []
+doc_target_names = []
+dts_dir = meson.current_source_dir()
+
+subdir('doc')
+
+if doc_targets.length() == 0
+    message = 'No docs targets found'
+else
+    message = 'Built docs:'
+endif
+run_target('dts-doc', command: [echo, message, doc_target_names],
+    depends: doc_targets)
diff --git a/meson.build b/meson.build
index 8b248d4505..835973a0ce 100644
--- a/meson.build
+++ b/meson.build
@@ -87,6 +87,7 @@ subdir('app')
 
 # build docs
 subdir('doc')
+subdir('dts')
 
 # build any examples explicitly requested - useful for developers - and
 # install any example code into the appropriate install path
-- 
2.34.1


^ permalink raw reply	[relevance 2%]

* Re: [PATCH v6 4/4] dts: add API doc generation
  2024-06-24 13:46  2%   ` [PATCH v6 4/4] dts: add API doc generation Juraj Linkeš
@ 2024-06-24 14:08  0%     ` Juraj Linkeš
  0 siblings, 0 replies; 200+ results
From: Juraj Linkeš @ 2024-06-24 14:08 UTC (permalink / raw)
  To: thomas
  Cc: dev, Honnappa.Nagarahalli, bruce.richardson, jspewock, probb,
	paul.szczepanek, Luca.Vizzarro, npratte

Hi Thomas,

I believe the only open question in this patch set is the linking of DTS 
API docs on the main doxygen page. I've left only the parts relevant to 
the question so that it's easier for us to address it.

> diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
> index f9283154f8..cc214ede46 100644
> --- a/doc/api/doxy-api-index.md
> +++ b/doc/api/doxy-api-index.md
> @@ -244,3 +244,6 @@ The public API headers are grouped by topics:
>     [experimental APIs](@ref rte_compat.h),
>     [ABI versioning](@ref rte_function_versioning.h),
>     [version](@ref rte_version.h)
> +
> +- **tests**:
> +  [**DTS**](@dts_api_main_page)

> diff --git a/doc/api/doxy-api.conf.in b/doc/api/doxy-api.conf.in
> index a8823c046f..c94f02d411 100644
> --- a/doc/api/doxy-api.conf.in
> +++ b/doc/api/doxy-api.conf.in
> @@ -124,6 +124,8 @@ SEARCHENGINE            = YES
>   SORT_MEMBER_DOCS        = NO
>   SOURCE_BROWSER          = YES
>   
> +ALIASES                 = "dts_api_main_page=@DTS_API_MAIN_PAGE@"
> +
>   EXAMPLE_PATH            = @TOPDIR@/examples
>   EXAMPLE_PATTERNS        = *.c
>   EXAMPLE_RECURSIVE       = YES

> diff --git a/doc/api/meson.build b/doc/api/meson.build
> index 5b50692df9..ffc75d7b5a 100644
> --- a/doc/api/meson.build
> +++ b/doc/api/meson.build

> @@ -32,14 +33,18 @@ example = custom_target('examples.dox',
>   # set up common Doxygen configuration
>   cdata = configuration_data()
>   cdata.set('VERSION', meson.project_version())
> -cdata.set('API_EXAMPLES', join_paths(dpdk_build_root, 'doc', 'api', 'examples.dox'))
> -cdata.set('OUTPUT', join_paths(dpdk_build_root, 'doc', 'api'))
> +cdata.set('API_EXAMPLES', join_paths(doc_api_build_dir, 'examples.dox'))
> +cdata.set('OUTPUT', doc_api_build_dir)
>   cdata.set('TOPDIR', dpdk_source_root)
> -cdata.set('STRIP_FROM_PATH', ' '.join([dpdk_source_root, join_paths(dpdk_build_root, 'doc', 'api')]))
> +cdata.set('STRIP_FROM_PATH', ' '.join([dpdk_source_root, doc_api_build_dir]))

These three changes are here only for context; they're not relevant to 
the linking question.

>   cdata.set('WARN_AS_ERROR', 'NO')
>   if get_option('werror')
>       cdata.set('WARN_AS_ERROR', 'YES')
>   endif
> +# A local reference must be relative to the main index.html page
> +# The path below can't be taken from the DTS meson file as that would
> +# require recursive subdir traversal (doc, dts, then doc again)
> +cdata.set('DTS_API_MAIN_PAGE', join_paths('..', 'dts', 'html', 'index.html'))

This is where the path is actually set.
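
To make the relative link concrete, here is a small sketch of how it resolves.
It assumes the Doxygen main page ends up in build/doc/api/html/index.html;
that location is my assumption for the example and is not taken from the
patch.

```python
import posixpath

# Assumed location of the Doxygen main page inside the build directory.
doxygen_index = 'build/doc/api/html/index.html'

# The value set for DTS_API_MAIN_PAGE in doc/api/meson.build.
dts_api_main_page = posixpath.join('..', 'dts', 'html', 'index.html')

# A relative link resolves against the directory of the page containing it.
resolved = posixpath.normpath(
    posixpath.join(posixpath.dirname(doxygen_index), dts_api_main_page))

print(resolved)  # build/doc/api/dts/html/index.html
```

Under that assumption the link lands in build/doc/api/dts/html, which is the
output directory named in the dts.rst change.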

^ permalink raw reply	[relevance 0%]

* [PATCH v7 4/4] dts: add API doc generation
  @ 2024-06-24 14:25  2%   ` Juraj Linkeš
  0 siblings, 0 replies; 200+ results
From: Juraj Linkeš @ 2024-06-24 14:25 UTC (permalink / raw)
  To: thomas, Honnappa.Nagarahalli, bruce.richardson, jspewock, probb,
	paul.szczepanek, Luca.Vizzarro, npratte
  Cc: dev, Juraj Linkeš, Luca Vizzarro

The tool used to generate DTS API docs is Sphinx, which is already in
use in DPDK. The same configuration is used to preserve style with one
DTS-specific configuration (so that the DPDK docs are unchanged) that
modifies how the sidebar displays the content.
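
The DTS-specific configuration reaches conf.py through a --dts-root option on
the wrapper script, which exports it as DTS_ROOT. Below is a minimal sketch of
how parse_known_args() keeps that option for the wrapper and leaves everything
else for sphinx-build. The parser mirrors the one in the diff; the command
line values are made up for illustration.

```python
import argparse

# Same arguments as the parser in buildtools/call-sphinx-build.py.
parser = argparse.ArgumentParser()
parser.add_argument('sphinx')
parser.add_argument('version')
parser.add_argument('src')
parser.add_argument('dst')
parser.add_argument('--dts-root', default=None)

# Hypothetical command line, similar in shape to what meson would pass.
argv = ['sphinx-build', '24.07.0', 'dts/doc', 'build/doc/api/dts',
        '-E', '-c', 'doc/guides', '--dts-root', 'dts', '-W']

# Known arguments end up in args; anything the parser does not recognize
# is returned untouched in extra_args and can be forwarded to sphinx-build.
args, extra_args = parser.parse_known_args(argv)

print(args.dts_root)
print(extra_args)
```

The wrapper consumes --dts-root itself, so sphinx-build never sees an option
it does not understand.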

Sphinx generates the documentation from Python docstrings. The docstring
format is the Google format [0] which requires the sphinx.ext.napoleon
extension. The other extension, sphinx.ext.intersphinx, enables linking
to objects in external documentation, such as the Python documentation.

There are two requirements for building DTS docs:
* The same Python version as DTS or higher, because Sphinx imports the
  code.
* Also the same Python packages as DTS, for the same reason.

[0] https://google.github.io/styleguide/pyguide.html#38-comments-and-docstrings

Signed-off-by: Juraj Linkeš <juraj.linkes@pantheon.tech>
Reviewed-by: Luca Vizzarro <luca.vizzarro@arm.com>
Reviewed-by: Jeremy Spewock <jspewock@iol.unh.edu>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Tested-by: Luca Vizzarro <luca.vizzarro@arm.com>
Tested-by: Nicholas Pratte <npratte@iol.unh.edu>
---
 buildtools/call-sphinx-build.py | 31 ++++++++++++++++++--------
 doc/api/doxy-api-index.md       |  3 +++
 doc/api/doxy-api.conf.in        |  2 ++
 doc/api/meson.build             | 11 +++++++---
 doc/guides/conf.py              | 39 ++++++++++++++++++++++++++++-----
 doc/guides/meson.build          |  1 +
 doc/guides/tools/dts.rst        | 34 +++++++++++++++++++++++++++-
 dts/doc/meson.build             | 27 +++++++++++++++++++++++
 dts/meson.build                 | 16 ++++++++++++++
 meson.build                     |  1 +
 10 files changed, 147 insertions(+), 18 deletions(-)
 create mode 100644 dts/doc/meson.build
 create mode 100644 dts/meson.build

diff --git a/buildtools/call-sphinx-build.py b/buildtools/call-sphinx-build.py
index da19e950c9..dff8471560 100755
--- a/buildtools/call-sphinx-build.py
+++ b/buildtools/call-sphinx-build.py
@@ -3,31 +3,44 @@
 # Copyright(c) 2019 Intel Corporation
 #
 
+import argparse
 import sys
 import os
 from os.path import join
 from subprocess import run
 
-# assign parameters to variables
-(sphinx, version, src, dst, *extra_args) = sys.argv[1:]
+parser = argparse.ArgumentParser()
+parser.add_argument('sphinx')
+parser.add_argument('version')
+parser.add_argument('src')
+parser.add_argument('dst')
+parser.add_argument('--dts-root', default=None)
+args, extra_args = parser.parse_known_args()
 
 # set the version in environment for sphinx to pick up
-os.environ['DPDK_VERSION'] = version
+os.environ['DPDK_VERSION'] = args.version
+if args.dts_root:
+    os.environ['DTS_ROOT'] = args.dts_root
 
-sphinx_cmd = [sphinx] + extra_args
+sphinx_cmd = [args.sphinx] + extra_args
 
 # find all the files sphinx will process so we can write them as dependencies
 srcfiles = []
-for root, dirs, files in os.walk(src):
+for root, dirs, files in os.walk(args.src):
     srcfiles.extend([join(root, f) for f in files])
 
+if not os.path.exists(args.dst):
+    os.makedirs(args.dst)
+
 # run sphinx, putting the html output in a "html" directory
-with open(join(dst, 'sphinx_html.out'), 'w') as out:
-    process = run(sphinx_cmd + ['-b', 'html', src, join(dst, 'html')],
-                  stdout=out)
+with open(join(args.dst, 'sphinx_html.out'), 'w') as out:
+    process = run(
+        sphinx_cmd + ['-b', 'html', args.src, join(args.dst, 'html')],
+        stdout=out
+    )
 
 # create a gcc format .d file giving all the dependencies of this doc build
-with open(join(dst, '.html.d'), 'w') as d:
+with open(join(args.dst, '.html.d'), 'w') as d:
     d.write('html: ' + ' '.join(srcfiles) + '\n')
 
 sys.exit(process.returncode)
diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
index f9283154f8..cc214ede46 100644
--- a/doc/api/doxy-api-index.md
+++ b/doc/api/doxy-api-index.md
@@ -244,3 +244,6 @@ The public API headers are grouped by topics:
   [experimental APIs](@ref rte_compat.h),
   [ABI versioning](@ref rte_function_versioning.h),
   [version](@ref rte_version.h)
+
+- **tests**:
+  [**DTS**](@dts_api_main_page)
diff --git a/doc/api/doxy-api.conf.in b/doc/api/doxy-api.conf.in
index a8823c046f..c94f02d411 100644
--- a/doc/api/doxy-api.conf.in
+++ b/doc/api/doxy-api.conf.in
@@ -124,6 +124,8 @@ SEARCHENGINE            = YES
 SORT_MEMBER_DOCS        = NO
 SOURCE_BROWSER          = YES
 
+ALIASES                 = "dts_api_main_page=@DTS_API_MAIN_PAGE@"
+
 EXAMPLE_PATH            = @TOPDIR@/examples
 EXAMPLE_PATTERNS        = *.c
 EXAMPLE_RECURSIVE       = YES
diff --git a/doc/api/meson.build b/doc/api/meson.build
index 5b50692df9..ffc75d7b5a 100644
--- a/doc/api/meson.build
+++ b/doc/api/meson.build
@@ -1,6 +1,7 @@
 # SPDX-License-Identifier: BSD-3-Clause
 # Copyright(c) 2018 Luca Boccassi <bluca@debian.org>
 
+doc_api_build_dir = meson.current_build_dir()
 doxygen = find_program('doxygen', required: get_option('enable_docs'))
 
 if not doxygen.found()
@@ -32,14 +33,18 @@ example = custom_target('examples.dox',
 # set up common Doxygen configuration
 cdata = configuration_data()
 cdata.set('VERSION', meson.project_version())
-cdata.set('API_EXAMPLES', join_paths(dpdk_build_root, 'doc', 'api', 'examples.dox'))
-cdata.set('OUTPUT', join_paths(dpdk_build_root, 'doc', 'api'))
+cdata.set('API_EXAMPLES', join_paths(doc_api_build_dir, 'examples.dox'))
+cdata.set('OUTPUT', doc_api_build_dir)
 cdata.set('TOPDIR', dpdk_source_root)
-cdata.set('STRIP_FROM_PATH', ' '.join([dpdk_source_root, join_paths(dpdk_build_root, 'doc', 'api')]))
+cdata.set('STRIP_FROM_PATH', ' '.join([dpdk_source_root, doc_api_build_dir]))
 cdata.set('WARN_AS_ERROR', 'NO')
 if get_option('werror')
     cdata.set('WARN_AS_ERROR', 'YES')
 endif
+# A local reference must be relative to the main index.html page
+# The path below can't be taken from the DTS meson file as that would
+# require recursive subdir traversal (doc, dts, then doc again)
+cdata.set('DTS_API_MAIN_PAGE', join_paths('..', 'dts', 'html', 'index.html'))
 
 # configure HTML Doxygen run
 html_cdata = configuration_data()
diff --git a/doc/guides/conf.py b/doc/guides/conf.py
index 0f7ff5282d..b442a1f76c 100644
--- a/doc/guides/conf.py
+++ b/doc/guides/conf.py
@@ -7,10 +7,9 @@
 from sphinx import __version__ as sphinx_version
 from os import listdir
 from os import environ
-from os.path import basename
-from os.path import dirname
+from os.path import basename, dirname
 from os.path import join as path_join
-from sys import argv, stderr
+from sys import argv, stderr, path
 
 import configparser
 
@@ -24,6 +23,37 @@
           file=stderr)
     pass
 
+# Napoleon enables the Google format of Python docstrings, used in DTS
+# Intersphinx allows linking to external projects, such as Python docs, also used in DTS
+extensions = ['sphinx.ext.napoleon', 'sphinx.ext.intersphinx']
+
+# DTS Python docstring options
+autodoc_default_options = {
+    'members': True,
+    'member-order': 'bysource',
+    'show-inheritance': True,
+}
+autodoc_class_signature = 'separated'
+autodoc_typehints = 'both'
+autodoc_typehints_format = 'short'
+autodoc_typehints_description_target = 'documented'
+napoleon_numpy_docstring = False
+napoleon_attr_annotations = True
+napoleon_preprocess_types = True
+add_module_names = False
+toc_object_entries = True
+toc_object_entries_show_parents = 'hide'
+intersphinx_mapping = {'python': ('https://docs.python.org/3', None)}
+
+dts_root = environ.get('DTS_ROOT')
+if dts_root:
+    path.append(dts_root)
+    # DTS Sidebar config
+    html_theme_options = {
+        'collapse_navigation': False,
+        'navigation_depth': -1,
+    }
+
 stop_on_error = ('-W' in argv)
 
 project = 'Data Plane Development Kit'
@@ -35,8 +65,7 @@
 html_show_copyright = False
 highlight_language = 'none'
 
-release = environ.setdefault('DPDK_VERSION', "None")
-version = release
+version = environ.setdefault('DPDK_VERSION', "None")
 
 master_doc = 'index'
 
diff --git a/doc/guides/meson.build b/doc/guides/meson.build
index 51f81da2e3..8933d75f6b 100644
--- a/doc/guides/meson.build
+++ b/doc/guides/meson.build
@@ -1,6 +1,7 @@
 # SPDX-License-Identifier: BSD-3-Clause
 # Copyright(c) 2018 Intel Corporation
 
+doc_guides_source_dir = meson.current_source_dir()
 sphinx = find_program('sphinx-build', required: get_option('enable_docs'))
 
 if not sphinx.found()
diff --git a/doc/guides/tools/dts.rst b/doc/guides/tools/dts.rst
index 515b15e4d8..77df7a0378 100644
--- a/doc/guides/tools/dts.rst
+++ b/doc/guides/tools/dts.rst
@@ -292,7 +292,12 @@ and try not to divert much from it.
 The :ref:`DTS developer tools <dts_dev_tools>` will issue warnings
 when some of the basics are not met.
 
-The code must be properly documented with docstrings.
+The API documentation, which is a helpful reference when developing, may be accessed
+in the code directly or generated with the :ref:`API docs build steps <building_api_docs>`.
+When adding new files or modifying the directory structure,
+the corresponding changes must be made to DTS API doc sources in ``dts/doc``.
+
+Speaking of which, the code must be properly documented with docstrings.
 The style must conform to the `Google style
 <https://google.github.io/styleguide/pyguide.html#38-comments-and-docstrings>`_.
 See an example of the style `here
@@ -427,6 +432,33 @@ the DTS code check and format script.
 Refer to the script for usage: ``devtools/dts-check-format.sh -h``.
 
 
+.. _building_api_docs:
+
+Building DTS API docs
+---------------------
+
+To build DTS API docs, install the dependencies with Poetry, then enter its shell:
+
+.. code-block:: console
+
+   poetry install --no-root --with docs
+   poetry shell
+
+The documentation is built using the standard DPDK build system.
+After executing the meson command and entering Poetry's shell, build the documentation with:
+
+.. code-block:: console
+
+   ninja -C build dts-doc
+
+The output is generated in ``build/doc/api/dts/html``.
+
+.. Note::
+
+   Make sure to fix any Sphinx warnings when adding or updating docstrings,
+   and also run the ``devtools/dts-check-format.sh`` script and address any issues it finds.
+
+
 Configuration Schema
 --------------------
 
diff --git a/dts/doc/meson.build b/dts/doc/meson.build
new file mode 100644
index 0000000000..01b7b51034
--- /dev/null
+++ b/dts/doc/meson.build
@@ -0,0 +1,27 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2023 PANTHEON.tech s.r.o.
+
+sphinx = find_program('sphinx-build', required: false)
+sphinx_apidoc = find_program('sphinx-apidoc', required: false)
+
+if not sphinx.found() or not sphinx_apidoc.found()
+    subdir_done()
+endif
+
+dts_doc_api_build_dir = join_paths(doc_api_build_dir, 'dts')
+
+extra_sphinx_args = ['-E', '-c', doc_guides_source_dir, '--dts-root', dts_dir]
+if get_option('werror')
+    extra_sphinx_args += '-W'
+endif
+
+htmldir = join_paths(get_option('datadir'), 'doc', 'dpdk', 'dts')
+dts_api_html = custom_target('dts_api_html',
+        output: 'html',
+        command: [sphinx_wrapper, sphinx, meson.project_version(),
+            meson.current_source_dir(), dts_doc_api_build_dir, extra_sphinx_args],
+        build_by_default: false,
+        install: get_option('enable_docs'),
+        install_dir: htmldir)
+doc_targets += dts_api_html
+doc_target_names += 'DTS_API_HTML'
diff --git a/dts/meson.build b/dts/meson.build
new file mode 100644
index 0000000000..e8ce0f06ac
--- /dev/null
+++ b/dts/meson.build
@@ -0,0 +1,16 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2023 PANTHEON.tech s.r.o.
+
+doc_targets = []
+doc_target_names = []
+dts_dir = meson.current_source_dir()
+
+subdir('doc')
+
+if doc_targets.length() == 0
+    message = 'No docs targets found'
+else
+    message = 'Built docs:'
+endif
+run_target('dts-doc', command: [echo, message, doc_target_names],
+    depends: doc_targets)
diff --git a/meson.build b/meson.build
index 8b248d4505..835973a0ce 100644
--- a/meson.build
+++ b/meson.build
@@ -87,6 +87,7 @@ subdir('app')
 
 # build docs
 subdir('doc')
+subdir('dts')
 
 # build any examples explicitly requested - useful for developers - and
 # install any example code into the appropriate install path
-- 
2.34.1
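
For readers following the call-sphinx-build.py change in the patch above,
here is a minimal sketch of how argparse's parse_known_args() separates the
wrapper's own arguments from the ones forwarded to sphinx-build. The
function name parse_wrapper_args and all paths are placeholders chosen for
illustration; only the argument names come from the patch.

```python
import argparse


def parse_wrapper_args(argv):
    """Split the wrapper's arguments from those passed on to sphinx-build."""
    parser = argparse.ArgumentParser()
    parser.add_argument('sphinx')
    parser.add_argument('version')
    parser.add_argument('src')
    parser.add_argument('dst')
    parser.add_argument('--dts-root', default=None)
    # Arguments the parser does not know are returned in a separate list
    # instead of causing an error.
    return parser.parse_known_args(argv)


# Hypothetical invocation; the paths are placeholders.
args, extra_args = parse_wrapper_args(
    ['sphinx-build', '24.07.0', 'dts/doc', 'build/doc/api/dts',
     '-E', '-c', 'doc/guides', '--dts-root', 'dts']
)
print(args.dts_root)
print(extra_args)
```

The unknown options (-E, -c and its value) end up in extra_args in their
original order, while --dts-root is consumed by the wrapper itself.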



* Re: [PATCH v2] bus/vmbus: add device_order field to rte_vmbus_dev
  @ 2024-06-24 15:13  3% ` Stephen Hemminger
  2024-06-25 12:01  3%   ` David Marchand
  0 siblings, 1 reply; 200+ results
From: Stephen Hemminger @ 2024-06-24 15:13 UTC (permalink / raw)
  To: Vladimir Ratnikov; +Cc: longli, dev

On Mon, 24 Jun 2024 11:04:15 +0000
Vladimir Ratnikov <vratnikov@netgate.com> wrote:

> diff --git a/drivers/bus/vmbus/bus_vmbus_driver.h b/drivers/bus/vmbus/bus_vmbus_driver.h
> index e2475a642d..6b010cbe41 100644
> --- a/drivers/bus/vmbus/bus_vmbus_driver.h
> +++ b/drivers/bus/vmbus/bus_vmbus_driver.h
> @@ -37,6 +37,7 @@ struct rte_vmbus_device {
>  	rte_uuid_t device_id;		       /**< VMBUS device id */
>  	rte_uuid_t class_id;		       /**< VMBUS device type */
>  	uint32_t relid;			       /**< id for primary */
> +	uint16_t device_order;		       /**< Device order after probing */
>  	uint8_t monitor_id;		       /**< monitor page */
>  	int uio_num;			       /**< UIO device number */
>  	uint32_t *int_page;		       /**< VMBUS interrupt page */
> diff --git a/drivers/bus/vmbus/vmbus_common.c b/drivers/bus/vmbus/vmbus_common.c

Is this an ABI change?


* Re: [PATCH v2] bus/vmbus: add device_order field to rte_vmbus_dev
  2024-06-24 15:13  3% ` Stephen Hemminger
@ 2024-06-25 12:01  3%   ` David Marchand
  0 siblings, 0 replies; 200+ results
From: David Marchand @ 2024-06-25 12:01 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Vladimir Ratnikov, longli, dev

On Mon, Jun 24, 2024 at 5:14 PM Stephen Hemminger
<stephen@networkplumber.org> wrote:
>
> On Mon, 24 Jun 2024 11:04:15 +0000
> Vladimir Ratnikov <vratnikov@netgate.com> wrote:
>
> > diff --git a/drivers/bus/vmbus/bus_vmbus_driver.h b/drivers/bus/vmbus/bus_vmbus_driver.h
> > index e2475a642d..6b010cbe41 100644
> > --- a/drivers/bus/vmbus/bus_vmbus_driver.h
> > +++ b/drivers/bus/vmbus/bus_vmbus_driver.h
> > @@ -37,6 +37,7 @@ struct rte_vmbus_device {
> >       rte_uuid_t device_id;                  /**< VMBUS device id */
> >       rte_uuid_t class_id;                   /**< VMBUS device type */
> >       uint32_t relid;                        /**< id for primary */
> > +     uint16_t device_order;                 /**< Device order after probing */
> >       uint8_t monitor_id;                    /**< monitor page */
> >       int uio_num;                           /**< UIO device number */
> >       uint32_t *int_page;                    /**< VMBUS interrupt page */
> > diff --git a/drivers/bus/vmbus/vmbus_common.c b/drivers/bus/vmbus/vmbus_common.c
>
> Is this an ABI change?

drivers/bus/vmbus/meson.build:driver_sdk_headers = files('bus_vmbus_driver.h')

Only drivers of this bus know the rte_vmbus_device object.
So this patch does not impact the public ABI.
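
To illustrate why the question is relevant for any code that does see the
structure, here is a minimal sketch using Python's ctypes. It models only a
simplified subset of the fields from the quoted hunk, assumes native
alignment, and uses hypothetical class names.

```python
import ctypes


class DeviceBefore(ctypes.Structure):
    # Simplified subset of the fields shown in the quoted hunk.
    _fields_ = [
        ("relid", ctypes.c_uint32),
        ("monitor_id", ctypes.c_uint8),
        ("uio_num", ctypes.c_int),
    ]


class DeviceAfter(ctypes.Structure):
    _fields_ = [
        ("relid", ctypes.c_uint32),
        ("device_order", ctypes.c_uint16),  # field added by the patch
        ("monitor_id", ctypes.c_uint8),
        ("uio_num", ctypes.c_int),
    ]


for name in ("relid", "monitor_id", "uio_num"):
    before = getattr(DeviceBefore, name).offset
    after = getattr(DeviceAfter, name).offset
    print(f"{name}: offset {before} -> {after}")
```

In this subset the new field fits into existing padding, so the total size
is unchanged, but monitor_id still moves. Code built against the old layout
would read the wrong byte, which is why such a change matters whenever the
header is part of a stable ABI.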

Yet, I fail to see what this patch is trying to achieve.


-- 
David Marchand



* Re: [PATCH v8 2/3] ethdev: add VXLAN last reserved field
  2024-06-12  1:25  0%       ` rongwei liu
@ 2024-06-25 14:46  0%         ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2024-06-25 14:46 UTC (permalink / raw)
  To: Ferruh Yigit, rongwei liu
  Cc: dev, Matan Azrad, Slava Ovsiienko, Ori Kam, Suanming Mou,
	Andrew Rybchenko, Dariusz Sosnowski, Aman Singh, Yuying Zhang,
	jerin.jacob, bruce.richardson, david.marchand, ajit.khaparde

12/06/2024 03:25, rongwei liu:
> From: Ferruh Yigit <ferruh.yigit@amd.com>
> > On 6/7/2024 3:02 PM, Rongwei Liu wrote:
> > > @@ -41,7 +41,10 @@ struct rte_vxlan_hdr {
> > >                       uint8_t    flags;    /**< Should be 8 (I flag). */
> > >                       uint8_t    rsvd0[3]; /**< Reserved. */
> > >                       uint8_t    vni[3];   /**< VXLAN identifier. */
> > > -                     uint8_t    rsvd1;    /**< Reserved. */
> > > +                     union {
> > > +                             uint8_t    rsvd1;        /**< Reserved. */
> > > +                             uint8_t    last_rsvd;    /**< Reserved. */
> > > +                     };
> > >
> > 
> > Is there a plan to remove 'rsvd1' in next ABI break release?
> > We can keep both, but I guess it is not logically necessary to keep it, to prevent
> > bloat by time, we can remove the old one.
> > If decided to remove, sending a 'deprecation.rst' update helps us to remember
> > doing it.
> > 
> I think it should. Thomas Monjalon, Andrew Rybchenko, Ori Kam, what do you think?

From user perspective, there is no benefit in removing an aliased field,
except for simplicity.
The drawback is a potential API compatibility breakage.
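
As a rough illustration of what "aliased field" means here, the following
sketch models the quoted header layout with Python's ctypes. The class names
are hypothetical; the field names come from the quoted hunk.

```python
import ctypes


class _LastByte(ctypes.Union):
    # Both names refer to the same single byte.
    _fields_ = [
        ("rsvd1", ctypes.c_uint8),
        ("last_rsvd", ctypes.c_uint8),
    ]


class VxlanHdr(ctypes.Structure):
    # The anonymous union makes rsvd1 and last_rsvd direct members.
    _anonymous_ = ("_last",)
    _fields_ = [
        ("flags", ctypes.c_uint8),
        ("rsvd0", ctypes.c_uint8 * 3),
        ("vni", ctypes.c_uint8 * 3),
        ("_last", _LastByte),
    ]


hdr = VxlanHdr()
hdr.last_rsvd = 0x5A
print(hex(hdr.rsvd1), ctypes.sizeof(VxlanHdr))
```

Because both names share storage, keeping the old name costs no space;
removing it only affects source code that still spells rsvd1.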

We may mark it as deprecated in the comment and plan for removal in a long time, let's say 25.11?
Is anyone against removing "rsvd1" from the VXLAN header, for compatibility reasons?




* Re: [PATCH v5] graph: expose node context as pointers
  2024-06-18 12:33  4%   ` David Marchand
@ 2024-06-25 15:22  0%     ` Robin Jarry
  2024-06-26 11:30  0%       ` Jerin Jacob
  0 siblings, 1 reply; 200+ results
From: Robin Jarry @ 2024-06-25 15:22 UTC (permalink / raw)
  To: David Marchand
  Cc: dev, Jerin Jacob, Kiran Kumar K, Nithin Dabilpuram, Zhirun Yan,
	Tyler Retzlaff

Sad :(

> The introduced anonymous structure gets aligned on the minimum cache
> line size (64 bytes): with this change, ctx[] move from offset 256, to
> offset 192.
> Similarly, nodes[] moves from offset 320 to offset 256.
>
> As we discussed offlist, there are a few options to workaround this
> issue (like moving nodes[] inside the anonymous struct though it still
> results in an increased rte_node struct, or like adding an explicit
> padding field right before the newly introduced anonymous struct,
> ...).
[snip]
> For those two reasons, it is better to revisit this patch and have it
> ready for the next release.
> While at it, it may be worth cleaning up the rte_node structure in
> v24.11, if so, please announce in a deprecation notice for this
> planned ABI breakage.

Jerin, wouldn't it be better if we managed to fill in that 64 bytes 
hole?

I don't know what to announce precisely about the breakage nature.



* Re: [PATCH v5] graph: expose node context as pointers
  2024-06-25 15:22  0%     ` Robin Jarry
@ 2024-06-26 11:30  0%       ` Jerin Jacob
  0 siblings, 0 replies; 200+ results
From: Jerin Jacob @ 2024-06-26 11:30 UTC (permalink / raw)
  To: Robin Jarry
  Cc: David Marchand, dev, Jerin Jacob, Kiran Kumar K,
	Nithin Dabilpuram, Zhirun Yan, Tyler Retzlaff

On Tue, Jun 25, 2024 at 9:02 PM Robin Jarry <rjarry@redhat.com> wrote:
>
> Sad :(
>
> > The introduced anonymous structure gets aligned on the minimum cache
> > line size (64 bytes): with this change, ctx[] move from offset 256, to
> > offset 192.
> > Similarly, nodes[] moves from offset 320 to offset 256.
> >
> > As we discussed offlist, there are a few options to workaround this
> > issue (like moving nodes[] inside the anonymous struct though it still
> > results in an increased rte_node struct, or like adding an explicit
> > padding field right before the newly introduced anonymous struct,
> > ...).
> [snip]
> > For those two reasons, it is better to revisit this patch and have it
> > ready for the next release.
> > While at it, it may be worth cleaning up the rte_node structure in
> > v24.11, if so, please announce in a deprecation notice for this
> > planned ABI breakage.
>
> Jerin, wouldn't it be better if we managed to fill in that 64 bytes
> hole?

It would be available only on systems with 128B cache lines, so it may not make sense.
I think the following change will resolve the issue in your patch.

From

 __extension__ struct __rte_cache_min_aligned {
 #define RTE_NODE_CTX_SZ 16

To

 __extension__ struct __rte_cache_aligned {
 #define RTE_NODE_CTX_SZ 16
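
The effect of the two alignment attributes can be sketched with simple
arithmetic. This is only an illustration: the end offset of the preceding
members (prev_end) is a hypothetical value picked so that the results match
the offsets quoted above, and a 128B cache line system is assumed.

```python
def align_up(offset: int, alignment: int) -> int:
    """Round offset up to the next multiple of alignment (a power of two)."""
    return (offset + alignment - 1) & ~(alignment - 1)


RTE_CACHE_LINE_MIN_SIZE = 64  # minimum cache line size named in the thread
CACHE_LINE_SIZE_128 = 128     # cache line size assumed for a 128B system

# Hypothetical end offset of the members preceding the aligned struct.
prev_end = 130

min_aligned = align_up(prev_end, RTE_CACHE_LINE_MIN_SIZE)
cache_aligned = align_up(prev_end, CACHE_LINE_SIZE_128)
print(min_aligned, cache_aligned)
```

Under these assumptions, minimum cache line alignment places the struct at
192 and full cache line alignment keeps it at 256, the original offset.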





>
> I don't know what to announce precisely about the breakage nature.
>


* [PATCH v4 0/2] power: introduce PM QoS interface
    2024-06-13 11:20  4% ` [PATCH v2 0/2] power: " Huisong Li
  2024-06-19  6:31  4% ` [PATCH v3 0/2] power: introduce PM QoS interface Huisong Li
@ 2024-06-27  6:00  4% ` Huisong Li
  2024-06-27  6:00  5%   ` [PATCH v4 1/2] power: introduce PM QoS API on CPU wide Huisong Li
  2024-07-02  3:50  4% ` [PATCH v5 0/2] power: introduce PM QoS interface Huisong Li
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 200+ results
From: Huisong Li @ 2024-06-27  6:00 UTC (permalink / raw)
  To: dev
  Cc: mb, thomas, ferruh.yigit, anatoly.burakov, david.hunt,
	sivaprasad.tummala, stephen, david.marchand, liuyonglong,
	lihuisong

The deeper the idle state, the lower the power consumption, but the longer
the resume time. Some services are delay-sensitive and expect a low resume
time, for example the interrupt packet receiving mode.

The "/sys/devices/system/cpu/cpuX/power/pm_qos_resume_latency_us" sysfs
interface lets userspace set and get the resume latency limit of cpuX.
Please see the description in the kernel documentation [1].
Each cpuidle governor in Linux selects which idle state to enter based on
this CPU resume latency in its idle task.

The per-CPU PM QoS API can be used to control this CPU's idle state
selection. Setting a strict resume latency (zero value) limits the CPU to
the shallowest idle state, which lowers the delay after sleep.

[1] https://www.kernel.org/doc/html/latest/admin-guide/abi-testing.html?highlight=pm_qos_resume_latency_us#abi-sys-devices-power-pm-qos-resume-latency-us
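
The mapping between API latency values and the strings exchanged with the
sysfs file can be sketched as follows. This is only an illustration of the
translation done in patch 1/2, not the library code: the function names are
hypothetical, and NO_CONSTRAINT mirrors the value of
RTE_POWER_QOS_RESUME_LATENCY_NO_CONSTRAINT from the new header.

```python
# Mirrors ((int)(UINT32_MAX >> 1)) from rte_power_qos.h.
NO_CONSTRAINT = (2**32 - 1) >> 1


def latency_to_sysfs(latency: int) -> str:
    """Translate an API latency value into the string written to sysfs."""
    if latency < 0:
        raise ValueError("latency must be greater than or equal to 0")
    if latency == 0:
        return "n/a"      # strict: no resume latency is acceptable
    if latency == NO_CONSTRAINT:
        return "0"        # the kernel treats "0" as no constraint
    return str(latency)   # any other value is a limit in microseconds


def sysfs_to_latency(text: str) -> int:
    """Translate a string read from sysfs back into an API latency value."""
    text = text.strip()
    if text == "n/a":
        return 0
    value = int(text)
    return NO_CONSTRAINT if value == 0 else value


for latency in (0, 20, NO_CONSTRAINT):
    print(latency, "->", latency_to_sysfs(latency))
```

Note the inversion: the API value 0 means "strict" and is written as "n/a",
while the sysfs string "0" means "no constraint".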

---
 v4:
  - fix some comments based on Stephen's review
  - add stdint.h include
  - add Acked-by Morten Brørup <mb@smartsharesystems.com>
 v3:
  - add RTE_POWER_xxx prefix for some macro in header
  - add the check for lcore_id with rte_lcore_is_enabled
 v2:
  - use CPU-wide PM QoS instead of system-wide PM QoS

Huisong Li (2):
  power: introduce PM QoS API on CPU wide
  examples/l3fwd-power: add PM QoS configuration

 doc/guides/prog_guide/power_man.rst    |  24 ++++++
 doc/guides/rel_notes/release_24_07.rst |   4 +
 examples/l3fwd-power/main.c            |  28 ++++++
 lib/power/meson.build                  |   2 +
 lib/power/rte_power_qos.c              | 114 +++++++++++++++++++++++++
 lib/power/rte_power_qos.h              |  73 ++++++++++++++++
 lib/power/version.map                  |   2 +
 7 files changed, 247 insertions(+)
 create mode 100644 lib/power/rte_power_qos.c
 create mode 100644 lib/power/rte_power_qos.h

-- 
2.22.0



* [PATCH v4 1/2] power: introduce PM QoS API on CPU wide
  2024-06-27  6:00  4% ` [PATCH v4 " Huisong Li
@ 2024-06-27  6:00  5%   ` Huisong Li
  0 siblings, 0 replies; 200+ results
From: Huisong Li @ 2024-06-27  6:00 UTC (permalink / raw)
  To: dev
  Cc: mb, thomas, ferruh.yigit, anatoly.burakov, david.hunt,
	sivaprasad.tummala, stephen, david.marchand, liuyonglong,
	lihuisong

The deeper the idle state, the lower the power consumption, but the longer
the resume time. Some services are delay-sensitive and expect a low resume
time, for example the interrupt packet receiving mode.

The "/sys/devices/system/cpu/cpuX/power/pm_qos_resume_latency_us" sysfs
interface lets userspace set and get the resume latency limit of cpuX.
Each cpuidle governor in Linux selects which idle state to enter based on
this CPU resume latency in its idle task.

The per-CPU PM QoS API can be used to control this CPU's idle state
selection. Setting a strict resume latency (zero value) limits the CPU to
the shallowest idle state, which lowers the delay after sleep.

Signed-off-by: Huisong Li <lihuisong@huawei.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
---
 doc/guides/prog_guide/power_man.rst    |  24 ++++++
 doc/guides/rel_notes/release_24_07.rst |   4 +
 lib/power/meson.build                  |   2 +
 lib/power/rte_power_qos.c              | 114 +++++++++++++++++++++++++
 lib/power/rte_power_qos.h              |  73 ++++++++++++++++
 lib/power/version.map                  |   2 +
 6 files changed, 219 insertions(+)
 create mode 100644 lib/power/rte_power_qos.c
 create mode 100644 lib/power/rte_power_qos.h

diff --git a/doc/guides/prog_guide/power_man.rst b/doc/guides/prog_guide/power_man.rst
index f6674efe2d..faa32b4320 100644
--- a/doc/guides/prog_guide/power_man.rst
+++ b/doc/guides/prog_guide/power_man.rst
@@ -249,6 +249,30 @@ Get Num Pkgs
 Get Num Dies
   Get the number of die's on a given package.
 
+
+PM QoS
+------
+
+The deeper the idle state, the lower the power consumption, but the longer
+the resume time. Some services are delay-sensitive and expect a low resume
+time, for example the interrupt packet receiving mode.
+
+The "/sys/devices/system/cpu/cpuX/power/pm_qos_resume_latency_us" sysfs
+interface lets userspace set and get the resume latency limit of cpuX.
+Each cpuidle governor in Linux selects which idle state to enter based on
+this CPU resume latency in its idle task.
+
+The per-CPU PM QoS API can be used to set and get the CPU resume latency based
+on this sysfs interface.
+
+The ``rte_power_qos_set_cpu_resume_latency()`` function controls the CPU's
+idle state selection in Linux. Setting a strict resume latency (zero value)
+limits the CPU to the shallowest idle state, which lowers the delay of
+resuming service after sleeping.
+
+The ``rte_power_qos_get_cpu_resume_latency()`` function gets the resume
+latency of the specified CPU.
+
 References
 ----------
 
diff --git a/doc/guides/rel_notes/release_24_07.rst b/doc/guides/rel_notes/release_24_07.rst
index e68a53d757..4de96f60ac 100644
--- a/doc/guides/rel_notes/release_24_07.rst
+++ b/doc/guides/rel_notes/release_24_07.rst
@@ -89,6 +89,10 @@ New Features
 
   * Added SSE/NEON vector datapath.
 
+* **Introduced PM QoS interface.**
+
+  * Introduced per-CPU PM QoS interface to lower the delay after sleep.
+
 
 Removed Items
 -------------
diff --git a/lib/power/meson.build b/lib/power/meson.build
index b8426589b2..8222e178b0 100644
--- a/lib/power/meson.build
+++ b/lib/power/meson.build
@@ -23,12 +23,14 @@ sources = files(
         'rte_power.c',
         'rte_power_uncore.c',
         'rte_power_pmd_mgmt.c',
+        'rte_power_qos.c',
 )
 headers = files(
         'rte_power.h',
         'rte_power_guest_channel.h',
         'rte_power_pmd_mgmt.h',
         'rte_power_uncore.h',
+        'rte_power_qos.h',
 )
 if cc.has_argument('-Wno-cast-qual')
     cflags += '-Wno-cast-qual'
diff --git a/lib/power/rte_power_qos.c b/lib/power/rte_power_qos.c
new file mode 100644
index 0000000000..b131cf58e7
--- /dev/null
+++ b/lib/power/rte_power_qos.c
@@ -0,0 +1,114 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2024 HiSilicon Limited
+ */
+
+#include <errno.h>
+#include <stdlib.h>
+#include <string.h>
+
+#include <rte_lcore.h>
+#include <rte_log.h>
+
+#include "power_common.h"
+#include "rte_power_qos.h"
+
+#define PM_QOS_SYSFILE_RESUME_LATENCY_US	\
+	"/sys/devices/system/cpu/cpu%u/power/pm_qos_resume_latency_us"
+
+int
+rte_power_qos_set_cpu_resume_latency(uint16_t lcore_id, int latency)
+{
+	char buf[BUFSIZ] = {0};
+	FILE *f;
+	int ret;
+
+	if (!rte_lcore_is_enabled(lcore_id)) {
+		POWER_LOG(ERR, "lcore id %u is not enabled", lcore_id);
+		return -EINVAL;
+	}
+
+	if (latency < 0) {
+		POWER_LOG(ERR, "latency should be greater than or equal to 0");
+		return -EINVAL;
+	}
+
+	ret = open_core_sysfs_file(&f, "w", PM_QOS_SYSFILE_RESUME_LATENCY_US, lcore_id);
+	if (ret != 0) {
+		POWER_LOG(ERR, "Failed to open "PM_QOS_SYSFILE_RESUME_LATENCY_US, lcore_id);
+		return ret;
+	}
+
+	/*
+	 * For the kernel sysfs interface pm_qos_resume_latency_us at
+	 * @PM_QOS_SYSFILE_RESUME_LATENCY_US, the meaning of each input string
+	 * is as follows.
+	 * 1> the resume latency is 0 if the input is "n/a".
+	 * 2> the resume latency is no constraint if the input is "0".
+	 * 3> the resume latency is the actual value to be set.
+	 */
+	if (latency == 0)
+		snprintf(buf, sizeof(buf), "%s", "n/a");
+	else if (latency == RTE_POWER_QOS_RESUME_LATENCY_NO_CONSTRAINT)
+		snprintf(buf, sizeof(buf), "%d", 0);
+	else
+		snprintf(buf, sizeof(buf), "%d", latency);
+
+	ret = write_core_sysfs_s(f, buf);
+	if (ret != 0) {
+		POWER_LOG(ERR, "Failed to write "PM_QOS_SYSFILE_RESUME_LATENCY_US, lcore_id);
+		goto out;
+	}
+
+out:
+	if (f != NULL)
+		fclose(f);
+
+	return ret;
+}
+
+int
+rte_power_qos_get_cpu_resume_latency(uint16_t lcore_id)
+{
+	char buf[BUFSIZ];
+	int latency = -1;
+	FILE *f;
+	int ret;
+
+	if (!rte_lcore_is_enabled(lcore_id)) {
+		POWER_LOG(ERR, "lcore id %u is not enabled", lcore_id);
+		return -EINVAL;
+	}
+
+	ret = open_core_sysfs_file(&f, "r", PM_QOS_SYSFILE_RESUME_LATENCY_US, lcore_id);
+	if (ret != 0) {
+		POWER_LOG(ERR, "Failed to open "PM_QOS_SYSFILE_RESUME_LATENCY_US, lcore_id);
+		return ret;
+	}
+
+	ret = read_core_sysfs_s(f, buf, sizeof(buf));
+	if (ret != 0) {
+		POWER_LOG(ERR, "Failed to read "PM_QOS_SYSFILE_RESUME_LATENCY_US, lcore_id);
+		goto out;
+	}
+
+	/*
+	 * For the kernel sysfs interface pm_qos_resume_latency_us at
+	 * @PM_QOS_SYSFILE_RESUME_LATENCY_US, the meaning of each output string
+	 * is as follows.
+	 * 1> the resume latency is 0 if the output is "n/a".
+	 * 2> the resume latency is no constraint if the output is "0".
+	 * 3> the resume latency is the actual value in use for any other string.
+	 */
+	if (strcmp(buf, "n/a") == 0)
+		latency = 0;
+	else {
+		latency = strtoul(buf, NULL, 10);
+		latency = latency == 0 ? RTE_POWER_QOS_RESUME_LATENCY_NO_CONSTRAINT : latency;
+	}
+
+out:
+	if (f != NULL)
+		fclose(f);
+
+	return latency != -1 ? latency : ret;
+}
diff --git a/lib/power/rte_power_qos.h b/lib/power/rte_power_qos.h
new file mode 100644
index 0000000000..990c488373
--- /dev/null
+++ b/lib/power/rte_power_qos.h
@@ -0,0 +1,73 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2024 HiSilicon Limited
+ */
+
+#ifndef RTE_POWER_QOS_H
+#define RTE_POWER_QOS_H
+
+#include <stdint.h>
+
+#include <rte_compat.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * @file rte_power_qos.h
+ *
+ * PM QoS API.
+ *
+ * The CPU-wide resume latency limit influences this CPU's idle state
+ * selection in each cpuidle governor.
+ * Please see the description of CPU-wide PM QoS at the following link:
+ * https://www.kernel.org/doc/html/latest/admin-guide/abi-testing.html?highlight=pm_qos_resume_latency_us#abi-sys-devices-power-pm-qos-resume-latency-us
+ *
+ * The deeper the idle state, the lower the power consumption, but the
+ * longer the resume time. Some services are delay-sensitive and expect a
+ * low resume time, for example the interrupt packet receiving mode.
+ *
+ * In these cases, the per-CPU PM QoS API can be used to control this CPU's
+ * idle state selection. Setting a strict resume latency (zero value) limits
+ * the CPU to the shallowest idle state, which lowers the delay after sleep.
+ */
+
+#define RTE_POWER_QOS_STRICT_LATENCY_VALUE             0
+#define RTE_POWER_QOS_RESUME_LATENCY_NO_CONSTRAINT    ((int)(UINT32_MAX >> 1))
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * @param lcore_id
+ *   target logical core id
+ *
+ * @param latency
+ *   The latency, in microseconds, should be greater than or equal to zero.
+ *
+ * @return
+ *   0 on success. Otherwise negative value is returned.
+ */
+__rte_experimental
+int rte_power_qos_set_cpu_resume_latency(uint16_t lcore_id, int latency);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Get the current resume latency of this logical core.
+ * If it has not been set, the kernel default is
+ * @see RTE_POWER_QOS_RESUME_LATENCY_NO_CONSTRAINT.
+ *
+ * @return
+ *   Negative value on failure.
+ *   >= 0 means the actual resume latency limit on this core.
+ */
+__rte_experimental
+int rte_power_qos_get_cpu_resume_latency(uint16_t lcore_id);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* RTE_POWER_QOS_H */
diff --git a/lib/power/version.map b/lib/power/version.map
index ad92a65f91..81b8ff11b7 100644
--- a/lib/power/version.map
+++ b/lib/power/version.map
@@ -51,4 +51,6 @@ EXPERIMENTAL {
 	rte_power_set_uncore_env;
 	rte_power_uncore_freqs;
 	rte_power_unset_uncore_env;
+	rte_power_qos_set_cpu_resume_latency;
+	rte_power_qos_get_cpu_resume_latency;
 };
-- 
2.22.0



* Re: [PATCH] bpf: don't verify classic bpfs
  @ 2024-06-27 15:36  3%     ` Thomas Monjalon
  2024-06-27 18:14  0%       ` Konstantin Ananyev
  0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2024-06-27 15:36 UTC (permalink / raw)
  To: Stephen Hemminger, Yoav Winstein, Konstantin Ananyev; +Cc: dev

16/05/2024 11:36, Konstantin Ananyev:
> 
> > On Sun, 12 May 2024 08:55:45 +0300
> > Yoav Winstein <yoav.w@claroty.com> wrote:
> > 
> > > When classic BPFs with lots of branching instructions are compiled,
> > > __rte_bpf_bpf_validate runs way too slow. A simple bpf such as:
> > > 'ether host a0:38:6d:af:17:eb or b3:a3:ff:b6:c1:ef or ...' 12 times
> > >
> > > results in ~1 minute of bpf validation.
> > > This patch makes __rte_bpf_bpf_validate be aware of bpf_prm originating
> > > from classic BPF, allowing to safely skip over the validation.
> > >
> > > Signed-off-by: Yoav Winstein <yoav.w@claroty.com>
> > > ---
> > 
> > No.
> > Wallpapering over a performance bug in the BPF library is not
> > the best way to handle this. Please analyze the problem in the BPF
> > library; it should be fixed there.
> 
> +1
> Blindly disabling verification for all cBPFs is the worst possible option here.
> We need at least try to understand what exactly causing such slowdown.

+1

You didn't mention it is also breaking ABI compatibility.





* RE: [PATCH] bpf: don't verify classic bpfs
  2024-06-27 15:36  3%     ` Thomas Monjalon
@ 2024-06-27 18:14  0%       ` Konstantin Ananyev
  0 siblings, 0 replies; 200+ results
From: Konstantin Ananyev @ 2024-06-27 18:14 UTC (permalink / raw)
  To: Thomas Monjalon, Stephen Hemminger, Yoav Winstein, Konstantin Ananyev; +Cc: dev


> > > > When classic BPFs with lots of branching instructions are compiled,
> > > > __rte_bpf_bpf_validate runs way too slow. A simple bpf such as:
> > > > 'ether host a0:38:6d:af:17:eb or b3:a3:ff:b6:c1:ef or ...' 12 times
> > > >
> > > > results in ~1 minute of bpf validation.
> > > > This patch makes __rte_bpf_bpf_validate be aware of bpf_prm originating
> > > > from classic BPF, allowing to safely skip over the validation.
> > > >
> > > > Signed-off-by: Yoav Winstein <yoav.w@claroty.com>
> > > > ---
> > >
> > > No.
> > > Wallpapering over a performance bug in the BPF library is not
> > > the best way to handle this. Please analyze the problem in the BPF
> > > library; it should be fixed there.
> >
> > +1
> > Blindly disabling verification for all cBPFs is the worst possible option here.
> > We need to at least try to understand what exactly is causing such a slowdown.
> 
> +1
> 
> You didn't mention it is also breaking ABI compatibility.

Yep, it does, thanks Thomas for highlighting it.

Yoav, can I ask you to check whether
https://patchwork.dpdk.org/project/dpdk/list/?series=32321
fixes the problem you are facing?
I think that the root cause is the same.
 

^ permalink raw reply	[relevance 0%]

* Community CI Meeting Minutes - June 27, 2024
@ 2024-06-27 20:52  2% Patrick Robb
  0 siblings, 0 replies; 200+ results
From: Patrick Robb @ 2024-06-27 20:52 UTC (permalink / raw)
  To: ci; +Cc: dev, dts

#####################################################################
Attendees
1. Patrick Robb
2. Paul Szczepanek
3. Luca Vizzarro
4. Nicholas Pratte
5. Aaron Conole
6. Dean Marx
7. Jeremy Spewock
8. Juraj Linkeš
9. Manit Mahajan
10. Tomas Durovec
11. Adam Hassick

#####################################################################
Minutes

=====================================================================
General Announcements
* DPDK Summit in Montreal will be September 24-25:
https://www.dpdk.org/event/dpdk-summit-2024/
   * CFP closes July 21
   * Tech board voted yesterday to allow remote presentations at
Montreal (with lower priority)
   * Luca will make a submission for a remote DTS talk
   * David commented last week stating that there could be a section
for how to setup DTS, run the hello world testsuite
* Nathan Southern set up a call with some folks from AWS next Monday
to discuss testing on cloud infrastructure. Email Nathan if you want
to join this call.
* David indicated there will be a vote over email for a DTS branch for
framework patches

=====================================================================
CI Status

---------------------------------------------------------------------
UNH-IOL Community Lab
* David noted this week that the template engine is out of date
(the UNH-IOL fork has some updates from the past months). UNH now has a
60-day reminder to aggregate all commits made to our fork and upstream
them.
* New Servers have arrived at UNH-IOL. Getting these mounted onto our
2nd DPDK Rack, setting up the associated infrastructure etc.
   * Setting up the UPS, ToR switch, etc. for DPDK rack 2.
* Pending:
   * Pending emails are going out, but the checks are not being
written to the API
   * Emailed Ali - will have to debug with him
   * We only are running this for ABI testing right now, but as soon
as the behavior with the PW API looks good, we can turn this on for
all the other labels (the PR for pending for all testing is ready)
* Depends-on support: Adam has submitted a patch series to the PW
project which adds the changes to the Django models. It is under review.
   * Github PR: https://github.com/getpatchwork/patchwork/pull/590
   * Has put together the corresponding changes to git-pw (client side)
      * Some overlap between the PW server and git-pw client - the pw
server maintainer is aware of this feature being added for git-pw
which will pair up with the dashboard updates
* SPDK: Submitted a patch fixing a malloc error which affected Fedora
40. This is now merged, so we added Fedora 40 coverage in our lab.
* We increased the retest limit per patchseries to 3 (was previously
1) due to a submitter who needed to retest multiple sets of contexts.

---------------------------------------------------------------------
Intel Lab
* None

---------------------------------------------------------------------
Github Actions
* None, the Robot is running smoothly.
* GitHub itself had an outage a few weeks ago, but anyone who was
affected would have been able to request a retest.

---------------------------------------------------------------------
Loongarch Lab
* None

=====================================================================
DTS Improvements & Test Development
* Jumboframes testsuite: MTU behavior on different NIC drivers.
   * Within each driver, there are variables set for taking off
ethernet overhead when setting max packet length.
      * MLNX subtracts 18 bytes
      * Intel/Broadcom subtract 26 bytes.
         * But from testing it appears that you can only send packets
of MTU + 22 bytes, not 26?
      * These variables are not common across drivers… so basically
MTU as defined by different drivers is not the same
      * When we build scapy packets, the ethernet overhead is 14 bytes
(destination MAC address, source MAC address, EtherType), so we can
actually increase the L3 packet above the given MTU and still send
packets
   * Juraj: Important to make sure we are running from the latest
firmware/drivers on each device
   * Firmware driver versions for devices are published per DPDK release
      * Patrick Robb should set up a 4-month reminder (at the beginning
of the DPDK release cycle) to update all firmware to whatever was
published as being supported for the release which just came out - use
this version for all testing for the upcoming release
* Mac Filter Testsuite: Submitted, getting reviews on the mailing
list. Nick will respond to Jeremy’s comments today and submit a new
version.
* VLAN Filter: Bugzilla ticket is submitted for the VLAN filtering
bug. David requested some verbose logs, so Dean redid the test and
attached those to the ticket.
* Queue Start/Stop and Dynamic Queue:
   * "show port info" exposes a capability for whether you can stop or
start the queue.
* There is another bugzilla ticket out for --max-pkt-len. It does
not update the MTU if you are using a kernel driver.
   * https://bugs.dpdk.org/show_bug.cgi?id=1470
   * Need to double check this from the Jumboframes testsuite
* Paul: but essentially we're moving on from vff to l2fwd due to vff
requiring qemu, otherwise all normal, blocklist submitted
   * UNH people need to test blocklist on our hardware
* Luca will be on vacation for the next two weeks
* Juraj will be on vacation next week.
* Capabilities patch: Working on an updated version, it might not be
ready before Juraj goes on vacation next week.
   * Adds the conditional capabilities, for cases where some staging
is needed in order to report the capability. This is the case with
scatter (the capability will only report for some devices if MTU is
increased from the standard size)
   * After the capability setup function runs and the capability is
checked, everything is cleaned up and the previous state is restored.
      * For the capability, use a decorator to associate a testpmd
function which will do the pre and post configuration for the
capability check
   * Looking to add support for all the capabilities testpmd exposes
* XML-RPC server replacement: new version submitted and awaiting reviews
   * Extended usage of kwargs in the new version makes the code a
little more confusing. Right now the arguments are of type any, as
opposed to processing the output via a typed dict.
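
The overhead figures in the Jumboframes item above can be sketched as simple arithmetic. This is only an illustration under stated assumptions: the minutes do not break the numbers down, so reading 18 bytes as Ethernet header plus CRC, and 22 or 26 bytes as that plus one or two VLAN tags, is an assumption, and max_frame_len() is a hypothetical helper, not a DPDK or DTS function.

```c
#include <stdint.h>

#define ETH_HDR_LEN  14u /* dst MAC (6) + src MAC (6) + EtherType (2) */
#define ETH_CRC_LEN   4u /* frame check sequence */
#define VLAN_TAG_LEN  4u /* one 802.1Q tag */

/*
 * Hypothetical helper: largest frame length for a given MTU, assuming the
 * driver accounts for header, CRC and `vlan_tags` VLAN tags as overhead.
 */
static uint32_t
max_frame_len(uint32_t mtu, unsigned int vlan_tags)
{
	return mtu + ETH_HDR_LEN + ETH_CRC_LEN + vlan_tags * VLAN_TAG_LEN;
}
```

Under these assumptions, zero, one and two VLAN tags give overheads of 18, 22 and 26 bytes, which would line up with the three numbers mentioned in the discussion.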

=====================================================================
Any other business
* Patrick Robb should check with folks over email whether it would be
okay to reschedule the DTS calls from Wednesdays to Thursdays (same
time as CI calls, on the off weeks)
* Next Meeting: July 11, 2024

^ permalink raw reply	[relevance 2%]

* [PATCH v5 0/2] power: introduce PM QoS interface
                     ` (2 preceding siblings ...)
  2024-06-27  6:00  4% ` [PATCH v4 " Huisong Li
@ 2024-07-02  3:50  4% ` Huisong Li
  2024-07-02  3:50  5%   ` [PATCH v5 1/2] power: introduce PM QoS API on CPU wide Huisong Li
  2024-07-09  2:29  4% ` [PATCH v6 0/2] power: introduce PM QoS interface Huisong Li
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 200+ results
From: Huisong Li @ 2024-07-02  3:50 UTC (permalink / raw)
  To: dev
  Cc: mb, thomas, ferruh.yigit, anatoly.burakov, david.hunt,
	sivaprasad.tummala, stephen, david.marchand, liuyonglong,
	lihuisong

The deeper the idle state, the lower the power consumption, but the longer
the resume time. Some services are delay sensitive and expect a low
resume time, such as interrupt packet receiving mode.

The "/sys/devices/system/cpu/cpuX/power/pm_qos_resume_latency_us" sysfs
interface lets userspace set and get the resume latency limit of cpuX.
Please see the description in the kernel documentation [1].
Each cpuidle governor in Linux selects which idle state to enter based on
this CPU resume latency in its idle task.

The per-CPU PM QoS API can be used to control this CPU's idle state
selection. Setting a strict resume latency (zero value) limits the CPU to
the shallowest idle state, which lowers the delay after sleep.

[1] https://www.kernel.org/doc/html/latest/admin-guide/abi-testing.html?highlight=pm_qos_resume_latency_us#abi-sys-devices-power-pm-qos-resume-latency-us
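
The mapping between the API latency value and the string stored in the sysfs file can be sketched as below. This is a minimal standalone sketch that mirrors the logic of patch 1/2 and is not the library code itself: qos_latency_to_sysfs(), qos_sysfs_to_latency() and QOS_NO_CONSTRAINT are hypothetical names used only for illustration.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Same value as RTE_POWER_QOS_RESUME_LATENCY_NO_CONSTRAINT in the patch. */
#define QOS_NO_CONSTRAINT ((int)(UINT32_MAX >> 1))

/* Hypothetical helper: API latency value -> string written to sysfs. */
static void
qos_latency_to_sysfs(int latency, char *buf, size_t len)
{
	if (latency == 0)
		snprintf(buf, len, "n/a");         /* strict: zero latency */
	else if (latency == QOS_NO_CONSTRAINT)
		snprintf(buf, len, "0");           /* no constraint */
	else
		snprintf(buf, len, "%d", latency); /* limit in microseconds */
}

/* Hypothetical helper: string read from sysfs -> API latency value. */
static int
qos_sysfs_to_latency(const char *buf)
{
	long val;

	if (strcmp(buf, "n/a") == 0)
		return 0;
	val = strtol(buf, NULL, 10);
	return val == 0 ? QOS_NO_CONSTRAINT : (int)val;
}
```

The inversion is the point to notice: an API value of 0 means "strict" and is written as "n/a", while the string "0" in sysfs means "no constraint".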

---
 v5:
  - use LINE_MAX to replace BUFSIZ, and use snprintf to replace sprintf.
 v4:
  - fix some comments based on Stephen's review
  - add stdint.h include
  - add Acked-by Morten Brørup <mb@smartsharesystems.com>
 v3:
  - add RTE_POWER_xxx prefix for some macro in header
  - add the check for lcore_id with rte_lcore_is_enabled
 v2:
  - use PM QoS on CPU wide to replace the one on system wide

Huisong Li (2):
  power: introduce PM QoS API on CPU wide
  examples/l3fwd-power: add PM QoS configuration

 doc/guides/prog_guide/power_man.rst    |  24 ++++++
 doc/guides/rel_notes/release_24_07.rst |   4 +
 examples/l3fwd-power/main.c            |  28 ++++++
 lib/power/meson.build                  |   2 +
 lib/power/rte_power_qos.c              | 114 +++++++++++++++++++++++++
 lib/power/rte_power_qos.h              |  73 ++++++++++++++++
 lib/power/version.map                  |   2 +
 7 files changed, 247 insertions(+)
 create mode 100644 lib/power/rte_power_qos.c
 create mode 100644 lib/power/rte_power_qos.h

-- 
2.22.0


^ permalink raw reply	[relevance 4%]

* [PATCH v5 1/2] power: introduce PM QoS API on CPU wide
  2024-07-02  3:50  4% ` [PATCH v5 0/2] power: introduce PM QoS interface Huisong Li
@ 2024-07-02  3:50  5%   ` Huisong Li
  0 siblings, 0 replies; 200+ results
From: Huisong Li @ 2024-07-02  3:50 UTC (permalink / raw)
  To: dev
  Cc: mb, thomas, ferruh.yigit, anatoly.burakov, david.hunt,
	sivaprasad.tummala, stephen, david.marchand, liuyonglong,
	lihuisong

The deeper the idle state, the lower the power consumption, but the longer
the resume time. Some services are delay sensitive and expect a low
resume time, such as interrupt packet receiving mode.

The "/sys/devices/system/cpu/cpuX/power/pm_qos_resume_latency_us" sysfs
interface lets userspace set and get the resume latency limit of cpuX.
Each cpuidle governor in Linux selects which idle state to enter
based on this CPU resume latency in its idle task.

The per-CPU PM QoS API can be used to control this CPU's idle state
selection. Setting a strict resume latency (zero value) limits the CPU to
the shallowest idle state, which lowers the delay after sleep.

Signed-off-by: Huisong Li <lihuisong@huawei.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
---
 doc/guides/prog_guide/power_man.rst    |  24 ++++++
 doc/guides/rel_notes/release_24_07.rst |   4 +
 lib/power/meson.build                  |   2 +
 lib/power/rte_power_qos.c              | 114 +++++++++++++++++++++++++
 lib/power/rte_power_qos.h              |  73 ++++++++++++++++
 lib/power/version.map                  |   2 +
 6 files changed, 219 insertions(+)
 create mode 100644 lib/power/rte_power_qos.c
 create mode 100644 lib/power/rte_power_qos.h

diff --git a/doc/guides/prog_guide/power_man.rst b/doc/guides/prog_guide/power_man.rst
index f6674efe2d..faa32b4320 100644
--- a/doc/guides/prog_guide/power_man.rst
+++ b/doc/guides/prog_guide/power_man.rst
@@ -249,6 +249,30 @@ Get Num Pkgs
 Get Num Dies
   Get the number of die's on a given package.
 
+
+PM QoS
+------
+
+The deeper the idle state, the lower the power consumption, but the longer
+the resume time. Some services are delay sensitive and expect a low
+resume time, such as interrupt packet receiving mode.
+
+The "/sys/devices/system/cpu/cpuX/power/pm_qos_resume_latency_us" sysfs
+interface lets userspace set and get the resume latency limit of cpuX.
+Each cpuidle governor in Linux selects which idle state to enter
+based on this CPU resume latency in its idle task.
+
+The per-CPU PM QoS API can be used to set and get the CPU resume latency based
+on this sysfs interface.
+
+The ``rte_power_qos_set_cpu_resume_latency()`` function can control the CPU's
+idle state selection in Linux. Setting a strict resume latency (zero value)
+limits the CPU to the shallowest idle state, which lowers the delay of
+resuming service after sleeping.
+
+The ``rte_power_qos_get_cpu_resume_latency()`` function gets the resume
+latency of the specified CPU.
+
 References
 ----------
 
diff --git a/doc/guides/rel_notes/release_24_07.rst b/doc/guides/rel_notes/release_24_07.rst
index e68a53d757..4de96f60ac 100644
--- a/doc/guides/rel_notes/release_24_07.rst
+++ b/doc/guides/rel_notes/release_24_07.rst
@@ -89,6 +89,10 @@ New Features
 
   * Added SSE/NEON vector datapath.
 
+* **Introduce PM QoS interface.**
+
+  * Introduce per-CPU PM QoS interface to lower the delay after sleep.
+
 
 Removed Items
 -------------
diff --git a/lib/power/meson.build b/lib/power/meson.build
index b8426589b2..8222e178b0 100644
--- a/lib/power/meson.build
+++ b/lib/power/meson.build
@@ -23,12 +23,14 @@ sources = files(
         'rte_power.c',
         'rte_power_uncore.c',
         'rte_power_pmd_mgmt.c',
+        'rte_power_qos.c',
 )
 headers = files(
         'rte_power.h',
         'rte_power_guest_channel.h',
         'rte_power_pmd_mgmt.h',
         'rte_power_uncore.h',
+        'rte_power_qos.h',
 )
 if cc.has_argument('-Wno-cast-qual')
     cflags += '-Wno-cast-qual'
diff --git a/lib/power/rte_power_qos.c b/lib/power/rte_power_qos.c
new file mode 100644
index 0000000000..375746f832
--- /dev/null
+++ b/lib/power/rte_power_qos.c
@@ -0,0 +1,114 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2024 HiSilicon Limited
+ */
+
+#include <errno.h>
+#include <stdlib.h>
+#include <string.h>
+
+#include <rte_lcore.h>
+#include <rte_log.h>
+
+#include "power_common.h"
+#include "rte_power_qos.h"
+
+#define PM_QOS_SYSFILE_RESUME_LATENCY_US	\
+	"/sys/devices/system/cpu/cpu%u/power/pm_qos_resume_latency_us"
+
+int
+rte_power_qos_set_cpu_resume_latency(uint16_t lcore_id, int latency)
+{
+	char buf[LINE_MAX];
+	FILE *f;
+	int ret;
+
+	if (!rte_lcore_is_enabled(lcore_id)) {
+		POWER_LOG(ERR, "lcore id %u is not enabled", lcore_id);
+		return -EINVAL;
+	}
+
+	if (latency < 0) {
+		POWER_LOG(ERR, "latency should be greater than or equal to 0");
+		return -EINVAL;
+	}
+
+	ret = open_core_sysfs_file(&f, "w", PM_QOS_SYSFILE_RESUME_LATENCY_US, lcore_id);
+	if (ret != 0) {
+		POWER_LOG(ERR, "Failed to open "PM_QOS_SYSFILE_RESUME_LATENCY_US, lcore_id);
+		return ret;
+	}
+
+	/*
+	 * For the kernel sysfs interface pm_qos_resume_latency_us at
+	 * @PM_QOS_SYSFILE_RESUME_LATENCY_US, the meaning of each input
+	 * string is as follows.
+	 * 1> the resume latency is 0 if the input is "n/a".
+	 * 2> the resume latency is no constraint if the input is "0".
+	 * 3> the resume latency is the actual value to be set.
+	 */
+	if (latency == 0)
+		snprintf(buf, sizeof(buf), "%s", "n/a");
+	else if (latency == RTE_POWER_QOS_RESUME_LATENCY_NO_CONSTRAINT)
+		snprintf(buf, sizeof(buf), "%u", 0);
+	else
+		snprintf(buf, sizeof(buf), "%u", latency);
+
+	ret = write_core_sysfs_s(f, buf);
+	if (ret != 0) {
+		POWER_LOG(ERR, "Failed to write "PM_QOS_SYSFILE_RESUME_LATENCY_US, lcore_id);
+		goto out;
+	}
+
+out:
+	if (f != NULL)
+		fclose(f);
+
+	return ret;
+}
+
+int
+rte_power_qos_get_cpu_resume_latency(uint16_t lcore_id)
+{
+	char buf[LINE_MAX];
+	int latency = -1;
+	FILE *f;
+	int ret;
+
+	if (!rte_lcore_is_enabled(lcore_id)) {
+		POWER_LOG(ERR, "lcore id %u is not enabled", lcore_id);
+		return -EINVAL;
+	}
+
+	ret = open_core_sysfs_file(&f, "r", PM_QOS_SYSFILE_RESUME_LATENCY_US, lcore_id);
+	if (ret != 0) {
+		POWER_LOG(ERR, "Failed to open "PM_QOS_SYSFILE_RESUME_LATENCY_US, lcore_id);
+		return ret;
+	}
+
+	ret = read_core_sysfs_s(f, buf, sizeof(buf));
+	if (ret != 0) {
+		POWER_LOG(ERR, "Failed to read "PM_QOS_SYSFILE_RESUME_LATENCY_US, lcore_id);
+		goto out;
+	}
+
+	/*
+	 * For the kernel sysfs interface pm_qos_resume_latency_us at
+	 * @PM_QOS_SYSFILE_RESUME_LATENCY_US, the meaning of each output
+	 * string is as follows.
+	 * 1> the resume latency is 0 if the output is "n/a".
+	 * 2> the resume latency is no constraint if the output is "0".
+	 * 3> the resume latency is the actual value in use for any other string.
+	 */
+	if (strcmp(buf, "n/a") == 0)
+		latency = 0;
+	else {
+		latency = strtoul(buf, NULL, 10);
+		latency = latency == 0 ? RTE_POWER_QOS_RESUME_LATENCY_NO_CONSTRAINT : latency;
+	}
+
+out:
+	if (f != NULL)
+		fclose(f);
+
+	return latency != -1 ? latency : ret;
+}
diff --git a/lib/power/rte_power_qos.h b/lib/power/rte_power_qos.h
new file mode 100644
index 0000000000..990c488373
--- /dev/null
+++ b/lib/power/rte_power_qos.h
@@ -0,0 +1,73 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2024 HiSilicon Limited
+ */
+
+#ifndef RTE_POWER_QOS_H
+#define RTE_POWER_QOS_H
+
+#include <stdint.h>
+
+#include <rte_compat.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * @file rte_power_qos.h
+ *
+ * PM QoS API.
+ *
+ * The CPU-wide resume latency limit has a positive impact on this CPU's idle
+ * state selection in each cpuidle governor.
+ * Please see the PM QoS on CPU wide in the following link:
+ * https://www.kernel.org/doc/html/latest/admin-guide/abi-testing.html?highlight=pm_qos_resume_latency_us#abi-sys-devices-power-pm-qos-resume-latency-us
+ *
+ * The deeper the idle state, the lower the power consumption, but the
+ * longer the resume time. Some services are delay sensitive and expect a
+ * low resume time, such as interrupt packet receiving mode.
+ *
+ * In these cases, the per-CPU PM QoS API can be used to control this CPU's
+ * idle state selection: setting a strict resume latency (zero value) limits
+ * the CPU to the shallowest idle state, which lowers the delay after sleep.
+ */
+
+#define RTE_POWER_QOS_STRICT_LATENCY_VALUE             0
+#define RTE_POWER_QOS_RESUME_LATENCY_NO_CONSTRAINT    ((int)(UINT32_MAX >> 1))
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * @param lcore_id
+ *   target logical core id
+ *
+ * @param latency
+ *   Resume latency in microseconds; must be greater than or equal to zero.
+ *
+ * @return
+ *   0 on success. Otherwise negative value is returned.
+ */
+__rte_experimental
+int rte_power_qos_set_cpu_resume_latency(uint16_t lcore_id, int latency);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Get the current resume latency of this logical core.
+ * The default value in the kernel is @see RTE_POWER_QOS_RESUME_LATENCY_NO_CONSTRAINT
+ * if it has not been set.
+ *
+ * @return
+ *   Negative value on failure.
+ *   >= 0 means the actual resume latency limit on this core.
+ */
+__rte_experimental
+int rte_power_qos_get_cpu_resume_latency(uint16_t lcore_id);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* RTE_POWER_QOS_H */
diff --git a/lib/power/version.map b/lib/power/version.map
index ad92a65f91..81b8ff11b7 100644
--- a/lib/power/version.map
+++ b/lib/power/version.map
@@ -51,4 +51,6 @@ EXPERIMENTAL {
 	rte_power_set_uncore_env;
 	rte_power_uncore_freqs;
 	rte_power_unset_uncore_env;
+	rte_power_qos_set_cpu_resume_latency;
+	rte_power_qos_get_cpu_resume_latency;
 };
-- 
2.22.0


^ permalink raw reply	[relevance 5%]

* Re: [PATCH v5] bitmap: add scan from offset function
  @ 2024-07-03 13:42  0%     ` Volodymyr Fialko
  0 siblings, 0 replies; 200+ results
From: Volodymyr Fialko @ 2024-07-03 13:42 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev, cristian.dumitrescu, Jerin Jacob, Anoob Joseph

> From: Thomas Monjalon <thomas@monjalon.net>
> Sent: Wednesday, July 3, 2024 2:50 PM
> To: Volodymyr Fialko
> Cc: dev@dpdk.org; cristian.dumitrescu@intel.com; Jerin Jacob; Anoob Joseph
> Subject: Re: [PATCH v5] bitmap: add scan from offset function
>
> 03/07/2023 14:39, Volodymyr Fialko:
> > Currently, in the case when we search for a bit set after a particular
> > value, the bitmap has to be scanned from the beginning and
> > rte_bitmap_scan() has to be called multiple times until we hit the value.
> >
> > Add a new rte_bitmap_scan_from_offset() function to initialize scan
> > state at the given offset and perform scan, this will allow getting
> > the next set bit after certain offset within one scan call.
> >
> > Signed-off-by: Volodymyr Fialko <vfialko@marvell.com>
> > ---
> > v2:
> >  - added rte_bitmap_scan_from_offset
> > v3:
> >  - added note for internal use only for init_at function
> > v4:
> >  - marked init_at function as __rte_internal
> > v5:
> >  - removed __rte_internal due to build errors
> 
> What was the build error?
> 
> You should not add an internal function in the public header file.
> At least, it should be experimental.
> 

From our discussion of the previous versions (v3, v4), it looks like we
agreed to remove both markers.

> From: Thomas Monjalon <thomas@monjalon.net>
> Sent: Monday, July 3, 2023 2:17 PM
> To: Dumitrescu, Cristian; Volodymyr Fialko
> Cc: dev@dpdk.org; Jerin Jacob Kollanukkaran; Anoob Joseph
> Subject: Re: [PATCH v3] bitmap: add scan from offset function
> 
> > ----------------------------------------------------------------------
> > 03/07/2023 12:56, Volodymyr Fialko:
> > > Since it's a header-only library, there is an issue with using __rte_internal (appeared in v4).
> > 
> > What is the issue?
> 
> From the v4 CI build failure (http://mails.dpdk.org/archives/test-report/2023-July/421235.html):
> 	In file included from ../examples/ipsec-secgw/event_helper.c:6:
> 	../lib/eal/include/rte_bitmap.h:645:2: error: Symbol is not public ABI
> 	        __rte_bitmap_scan_init_at(bmp, offset);
>  	       ^
> 	../lib/eal/include/rte_bitmap.h:150:1: note: from 'diagnose_if' attribute on '__rte_bitmap_scan_init_at':
> 	__rte_internal
> 	^~~~~~~~~~~~~~
> 	../lib/eal/include/rte_compat.h:42:16: note: expanded from macro '__rte_internal'	
> 	__attribute__((diagnose_if(1, "Symbol is not public ABI", "error"), \
>               		^           ~
> 	1 error generated.

> OK I see.
> So we should give up with __rte_internal for inline functions.
> As it is not supposed to be exposed to the applications,
> I think we can skip the __rte_experimental flag.

/Volodymyr

^ permalink raw reply	[relevance 0%]

* Re: [PATCH v10 1/4] hash: pack the hitmask for hash in bulk lookup
  @ 2024-07-04 20:31  3%     ` David Marchand
  0 siblings, 0 replies; 200+ results
From: David Marchand @ 2024-07-04 20:31 UTC (permalink / raw)
  To: Yoan Picchi
  Cc: Thomas Monjalon, Yipeng Wang, Sameh Gobriel, Bruce Richardson,
	Vladimir Medvedkin, dev, nd, Ruifeng Wang, Nathan Brown

Hello Yoan,

On Wed, Jul 3, 2024 at 7:13 PM Yoan Picchi <yoan.picchi@arm.com> wrote:
>
> Current hitmask includes padding due to Intel's SIMD
> implementation detail. This patch allows non Intel SIMD
> implementations to benefit from a dense hitmask.
> In addition, the new dense hitmask interweave the primary
> and secondary matches which allow a better cache usage and
> enable future improvements for the SIMD implementations
> The default non SIMD path now use this dense mask.
>
> Signed-off-by: Yoan Picchi <yoan.picchi@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> Reviewed-by: Nathan Brown <nathan.brown@arm.com>

This patch does too many things at the same time.
There is code movement and behavior modifications all mixed in.

As there was still no review from the lib maintainer... I am going a
bit more in depth this time.
Please split this patch to make it less hard to understand.

I can see the need for at least one patch for isolating the change on
sig_cmp_fn from the exposed API, then one patch for moving the code to
per arch headers with *no behavior change*, and one patch for
introducing/switching to "dense hitmask".

More comments below.


> ---
>  .mailmap                                  |   1 +
>  lib/hash/compare_signatures_arm_pvt.h     |  60 +++++++
>  lib/hash/compare_signatures_generic_pvt.h |  37 +++++
>  lib/hash/compare_signatures_x86_pvt.h     |  49 ++++++
>  lib/hash/hash_sig_cmp_func_pvt.h          |  20 +++
>  lib/hash/rte_cuckoo_hash.c                | 190 +++++++++++-----------
>  lib/hash/rte_cuckoo_hash.h                |  10 +-
>  7 files changed, 267 insertions(+), 100 deletions(-)
>  create mode 100644 lib/hash/compare_signatures_arm_pvt.h
>  create mode 100644 lib/hash/compare_signatures_generic_pvt.h
>  create mode 100644 lib/hash/compare_signatures_x86_pvt.h
>  create mode 100644 lib/hash/hash_sig_cmp_func_pvt.h
>
> diff --git a/.mailmap b/.mailmap
> index f76037213d..ec525981fe 100644
> --- a/.mailmap
> +++ b/.mailmap
> @@ -1661,6 +1661,7 @@ Yixue Wang <yixue.wang@intel.com>
>  Yi Yang <yangyi01@inspur.com> <yi.y.yang@intel.com>
>  Yi Zhang <zhang.yi75@zte.com.cn>
>  Yoann Desmouceaux <ydesmouc@cisco.com>
> +Yoan Picchi <yoan.picchi@arm.com>
>  Yogesh Jangra <yogesh.jangra@intel.com>
>  Yogev Chaimovich <yogev@cgstowernetworks.com>
>  Yongjie Gu <yongjiex.gu@intel.com>
> diff --git a/lib/hash/compare_signatures_arm_pvt.h b/lib/hash/compare_signatures_arm_pvt.h
> new file mode 100644
> index 0000000000..e83bae9912
> --- /dev/null
> +++ b/lib/hash/compare_signatures_arm_pvt.h

I guess pvt stands for private.
No need for such suffix, this header won't be exported in any case.


> @@ -0,0 +1,60 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2010-2016 Intel Corporation
> + * Copyright(c) 2018-2024 Arm Limited
> + */
> +
> +/*
> + * Arm's version uses a densely packed hitmask buffer:
> + * Every bit is in use.
> + */

Please put a header guard.

#ifndef <UPPERCASE_HEADER_NAME>_H
#define <UPPERCASE_HEADER_NAME>_H

> +
> +#include <inttypes.h>
> +#include <rte_common.h>
> +#include <rte_vect.h>
> +
> +#include "rte_cuckoo_hash.h"
> +#include "hash_sig_cmp_func_pvt.h"
> +
> +#define DENSE_HASH_BULK_LOOKUP 1
> +
> +static inline void
> +compare_signatures_dense(uint16_t *hitmask_buffer,
> +                       const uint16_t *prim_bucket_sigs,
> +                       const uint16_t *sec_bucket_sigs,
> +                       uint16_t sig,
> +                       enum rte_hash_sig_compare_function sig_cmp_fn)
> +{
> +
> +       static_assert(sizeof(*hitmask_buffer) >= 2 * (RTE_HASH_BUCKET_ENTRIES / 8),
> +               "hitmask_buffer must be wide enough to fit a dense hitmask");
> +
> +       /* For match mask every bits indicates the match */
> +       switch (sig_cmp_fn) {
> +#if RTE_HASH_BUCKET_ENTRIES <= 8
> +       case RTE_HASH_COMPARE_NEON: {
> +               uint16x8_t vmat, vsig, x;
> +               int16x8_t shift = {0, 1, 2, 3, 4, 5, 6, 7};
> +               uint16_t low, high;
> +
> +               vsig = vld1q_dup_u16((uint16_t const *)&sig);
> +               /* Compare all signatures in the primary bucket */
> +               vmat = vceqq_u16(vsig, vld1q_u16((uint16_t const *)prim_bucket_sigs));
> +               x = vshlq_u16(vandq_u16(vmat, vdupq_n_u16(0x0001)), shift);
> +               low = (uint16_t)(vaddvq_u16(x));
> +               /* Compare all signatures in the secondary bucket */
> +               vmat = vceqq_u16(vsig, vld1q_u16((uint16_t const *)sec_bucket_sigs));
> +               x = vshlq_u16(vandq_u16(vmat, vdupq_n_u16(0x0001)), shift);
> +               high = (uint16_t)(vaddvq_u16(x));
> +               *hitmask_buffer = low | high << RTE_HASH_BUCKET_ENTRIES;
> +
> +               }
> +               break;
> +#endif
> +       default:
> +               for (unsigned int i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
> +                       *hitmask_buffer |= (sig == prim_bucket_sigs[i]) << i;
> +                       *hitmask_buffer |=
> +                               ((sig == sec_bucket_sigs[i]) << i) << RTE_HASH_BUCKET_ENTRIES;
> +               }
> +       }
> +}

IIRC, this code is copied in all three headers.
It is a common scalar version, so the ARM code could simply call the
"generic" implementation rather than copy/paste.

[snip]
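
The shared scalar fallback suggested above could look roughly like the sketch below. It is only an illustration under stated assumptions: compare_signatures_dense_scalar() is a hypothetical name, BUCKET_ENTRIES stands in for RTE_HASH_BUCKET_ENTRIES with an assumed value of 8, and the result is returned rather than OR-ed into a caller buffer as in the quoted patch. The bit layout follows the quoted default branch: bit i for a primary match, bit i + BUCKET_ENTRIES for a secondary match.

```c
#include <stdint.h>

#define BUCKET_ENTRIES 8 /* assumed stand-in for RTE_HASH_BUCKET_ENTRIES */

/*
 * Hypothetical shared scalar helper: builds a dense hitmask where
 * bit i marks a match in the primary bucket and bit (i + BUCKET_ENTRIES)
 * marks a match in the secondary bucket.
 */
static inline uint16_t
compare_signatures_dense_scalar(const uint16_t *prim_sigs,
				const uint16_t *sec_sigs, uint16_t sig)
{
	uint16_t hitmask = 0;

	for (unsigned int i = 0; i < BUCKET_ENTRIES; i++) {
		hitmask |= (uint16_t)((sig == prim_sigs[i]) << i);
		hitmask |= (uint16_t)((sig == sec_sigs[i]) << (i + BUCKET_ENTRIES));
	}
	return hitmask;
}
```

With a single helper like this, each per-arch header would only need to provide its SIMD case and fall through to the common code otherwise.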

> diff --git a/lib/hash/compare_signatures_x86_pvt.h b/lib/hash/compare_signatures_x86_pvt.h
> new file mode 100644
> index 0000000000..932912ba19
> --- /dev/null
> +++ b/lib/hash/compare_signatures_x86_pvt.h
> @@ -0,0 +1,49 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2010-2016 Intel Corporation
> + * Copyright(c) 2018-2024 Arm Limited
> + */
> +
> +/*
> + * x86's version uses a sparsely packed hitmask buffer:
> + * Every other bit is padding.
> + */
> +
> +#include <inttypes.h>
> +#include <rte_common.h>
> +#include <rte_vect.h>
> +
> +#include "rte_cuckoo_hash.h"
> +#include "hash_sig_cmp_func_pvt.h"
> +
> +#define DENSE_HASH_BULK_LOOKUP 0
> +
> +static inline void
> +compare_signatures_sparse(uint32_t *prim_hash_matches, uint32_t *sec_hash_matches,
> +                       const struct rte_hash_bucket *prim_bkt,
> +                       const struct rte_hash_bucket *sec_bkt,
> +                       uint16_t sig,
> +                       enum rte_hash_sig_compare_function sig_cmp_fn)
> +{
> +       /* For match mask the first bit of every two bits indicates the match */
> +       switch (sig_cmp_fn) {
> +#if defined(__SSE2__) && RTE_HASH_BUCKET_ENTRIES <= 8

The check on RTE_HASH_BUCKET_ENTRIES <= 8 seems new.
It was not present in the previous implementation for SSE2, and this
difference is not explained.
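
One possible reading of the bound, offered as an assumption rather than the author's stated reason: a single 128-bit load covers exactly eight 16-bit signatures, and the byte-wise movemask then yields two bits per signature, of which the 0x5555 mask keeps the even ones. A scalar equivalent of that sparse layout is sketched below; sparse_hitmask_scalar() is a hypothetical name and BUCKET_ENTRIES is an assumed stand-in for RTE_HASH_BUCKET_ENTRIES.

```c
#include <stdint.h>

#define BUCKET_ENTRIES 8 /* assumed stand-in for RTE_HASH_BUCKET_ENTRIES */

/*
 * Hypothetical scalar equivalent of the sparse layout: entry i is
 * reported on bit 2*i, so odd bits are always zero (padding).
 */
static inline uint32_t
sparse_hitmask_scalar(const uint16_t *sigs, uint16_t sig)
{
	uint32_t mask = 0;

	for (unsigned int i = 0; i < BUCKET_ENTRIES; i++)
		mask |= (uint32_t)(sig == sigs[i]) << (i << 1);
	return mask;
}
```

With eight entries the highest bit used is bit 14, so the result always fits within the 0x5555 pattern; more entries would not fit in one 128-bit compare.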


> +       case RTE_HASH_COMPARE_SSE:
> +               /* Compare all signatures in the bucket */
> +               *prim_hash_matches = _mm_movemask_epi8(_mm_cmpeq_epi16(_mm_load_si128(
> +                       (__m128i const *)prim_bkt->sig_current), _mm_set1_epi16(sig)));
> +               /* Extract the even-index bits only */
> +               *prim_hash_matches &= 0x5555;
> +               /* Compare all signatures in the bucket */
> +               *sec_hash_matches = _mm_movemask_epi8(_mm_cmpeq_epi16(_mm_load_si128(
> +                       (__m128i const *)sec_bkt->sig_current), _mm_set1_epi16(sig)));
> +               /* Extract the even-index bits only */
> +               *sec_hash_matches &= 0x5555;
> +               break;
> +#endif /* defined(__SSE2__) */
> +       default:
> +               for (unsigned int i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
> +                       *prim_hash_matches |= (sig == prim_bkt->sig_current[i]) << (i << 1);
> +                       *sec_hash_matches |= (sig == sec_bkt->sig_current[i]) << (i << 1);
> +               }
> +       }
> +}
> diff --git a/lib/hash/hash_sig_cmp_func_pvt.h b/lib/hash/hash_sig_cmp_func_pvt.h
> new file mode 100644
> index 0000000000..d8d2fbffaf
> --- /dev/null
> +++ b/lib/hash/hash_sig_cmp_func_pvt.h

Please rename as compare_signatures.h or maybe a simpler option is to
move this enum declaration in rte_cuckoo_hash.c before including the
per arch headers.

> @@ -0,0 +1,20 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2024 Arm Limited
> + */
> +
> +#ifndef _SIG_CMP_FUNC_H_
> +#define _SIG_CMP_FUNC_H_

If keeping a header, this guard must reflect the file name.

> +
> +/** Enum used to select the implementation of the signature comparison function to use

/* is enough, doxygen only parses public headers.


> + * eg: A system supporting SVE might want to use a NEON implementation.
> + * Those may change and are for internal use only
> + */
> +enum rte_hash_sig_compare_function {
> +       RTE_HASH_COMPARE_SCALAR = 0,
> +       RTE_HASH_COMPARE_SSE,
> +       RTE_HASH_COMPARE_NEON,
> +       RTE_HASH_COMPARE_SVE,
> +       RTE_HASH_COMPARE_NUM
> +};
> +
> +#endif

[snip]

> diff --git a/lib/hash/rte_cuckoo_hash.h b/lib/hash/rte_cuckoo_hash.h
> index a528f1d1a0..26a992419a 100644
> --- a/lib/hash/rte_cuckoo_hash.h
> +++ b/lib/hash/rte_cuckoo_hash.h
> @@ -134,14 +134,6 @@ struct rte_hash_key {
>         char key[0];
>  };
>
> -/* All different signature compare functions */
> -enum rte_hash_sig_compare_function {
> -       RTE_HASH_COMPARE_SCALAR = 0,
> -       RTE_HASH_COMPARE_SSE,
> -       RTE_HASH_COMPARE_NEON,
> -       RTE_HASH_COMPARE_NUM
> -};
> -
>  /** Bucket structure */
>  struct __rte_cache_aligned rte_hash_bucket {
>         uint16_t sig_current[RTE_HASH_BUCKET_ENTRIES];
> @@ -199,7 +191,7 @@ struct __rte_cache_aligned rte_hash {
>         /**< Custom function used to compare keys. */
>         enum cmp_jump_table_case cmp_jump_table_idx;
>         /**< Indicates which compare function to use. */
> -       enum rte_hash_sig_compare_function sig_cmp_fn;
> +       unsigned int sig_cmp_fn;

From an ABI perspective, it looks ok.
We may be breaking users that would inspect this public object, but I
think it is ok.

In any case, put this change in a separate patch so it is more visible.
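
For illustration only (not part of the patch): a minimal sketch of why
swapping the enum-typed field for an unsigned int can be layout-neutral.
It assumes a GCC/Clang-style ABI where a small enum is int-sized (for
example, no -fshort-enums); the demo_* names are hypothetical stand-ins,
not DPDK definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the enum being made private. */
enum demo_sig_cmp {
	DEMO_COMPARE_SCALAR = 0,
	DEMO_COMPARE_SSE,
	DEMO_COMPARE_NEON,
	DEMO_COMPARE_NUM
};

/* Layout before the change: the field is typed with the public enum. */
struct demo_hash_old {
	uint32_t before;
	enum demo_sig_cmp sig_cmp_fn;
	uint32_t bucket_bitmask;
};

/* Layout after the change: same slot, enum type no longer exposed. */
struct demo_hash_new {
	uint32_t before;
	unsigned int sig_cmp_fn;
	uint32_t bucket_bitmask;
};

/* Compile-time checks: the build fails if the swap would move anything. */
static_assert(sizeof(enum demo_sig_cmp) == sizeof(unsigned int),
	"enum must be int-sized for the swap to be layout-neutral");
static_assert(offsetof(struct demo_hash_old, bucket_bitmask) ==
	offsetof(struct demo_hash_new, bucket_bitmask),
	"fields after sig_cmp_fn must not move");
static_assert(sizeof(struct demo_hash_old) == sizeof(struct demo_hash_new),
	"overall size must not change");
```

Users that read the field through the enum type would still need a source
change, which is the API-level concern raised above.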


>         /**< Indicates which signature compare function to use. */
>         uint32_t bucket_bitmask;
>         /**< Bitmask for getting bucket index from hash signature. */
> --
> 2.25.1
>


-- 
David Marchand


^ permalink raw reply	[relevance 3%]

* [PATCH v6] graph: expose node context as pointers
@ 2024-07-05 14:52  4% Robin Jarry
  2024-07-12 11:39  0% ` [EXTERNAL] " Kiran Kumar Kokkilagadda
  0 siblings, 1 reply; 200+ results
From: Robin Jarry @ 2024-07-05 14:52 UTC (permalink / raw)
  To: dev, Jerin Jacob, Kiran Kumar K, Nithin Dabilpuram, Zhirun Yan

In some cases, the node context data is used to store two pointers
because the data is larger than the reserved 16 bytes. Having to define
intermediate structures just to be able to cast is tedious. And without
intermediate structures, casting to opaque pointers is hard without
violating strict aliasing rules.

Add an unnamed union to allow storing opaque pointers in the node
context. Unfortunately, aligning an unnamed union that contains an array
produces inconsistent results between C and C++. To preserve ABI/API
compatibility in both C and C++, move all fast-path area fields into an
unnamed struct which is itself cache aligned. Use __rte_cache_aligned to
preserve existing alignment on architectures where cache lines are 128
bytes.

Add a static assert to ensure that the fast path area does not grow
beyond a 64-byte cache line.
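
For illustration only: a reduced sketch of the layout described above,
assuming a GCC/Clang-style aligned attribute. The demo_* names and the
slow-path field are hypothetical; only the shape (an aligned unnamed
struct holding a ctx union, plus an offset-based static assert) mirrors
the patch.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_CTX_SZ 16
#define DEMO_FAST_PATH_SZ 64

struct demo_node {
	uint64_t slow_path_state; /* hypothetical slow-path field */
	/* Unnamed struct carrying the alignment of the whole fast-path area. */
	struct __attribute__((aligned(DEMO_FAST_PATH_SZ))) {
		union {
			uint8_t ctx[DEMO_CTX_SZ];  /* raw context bytes */
			struct {
				void *ctx_ptr;     /* same storage, viewed as pointers */
				void *ctx_ptr2;
			};
		};
		uint16_t size;
		uint16_t idx;
		uint32_t off;
		uint64_t total_cycles;
		uint64_t total_calls;
		uint64_t total_objs;
		union { void **objs; uint64_t objs_u64; };
		union { void (*process)(void); uint64_t process_u64; };
		/* Starts the next 64-byte block, right after the fast-path area. */
		alignas(DEMO_FAST_PATH_SZ) struct demo_node *nodes[1];
	};
};

/* The fast-path fields must fill exactly one 64-byte block. */
static_assert(offsetof(struct demo_node, nodes) - offsetof(struct demo_node, ctx)
	== DEMO_FAST_PATH_SZ, "demo fast path area must fit in 64 bytes");

/* Store two opaque pointers in the context without any cast. */
static inline void
demo_node_set_ctx(struct demo_node *n, void *a, void *b)
{
	n->ctx_ptr = a;
	n->ctx_ptr2 = b;
}
```

Because the pointers are ordinary union members, a node can read them back
as n->ctx_ptr and n->ctx_ptr2 without an intermediate structure.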

Signed-off-by: Robin Jarry <rjarry@redhat.com>
---

Notes:
    v6:
    
    * Fix ABI breakage on arm64 (and all platforms that have RTE_CACHE_LINE_SIZE=128).
    * This patch will cause CI failures without libabigail 2.5. See this commit
      https://sourceware.org/git/?p=libabigail.git;a=commitdiff;h=f821c2be3fff2047ef8fc436f6f02301812d166f
      for more details.
    
    v5:
    
    * Helper functions to hide casting proved to be harder than expected.
      Naive casting may even be impossible without breaking strict aliasing
      rules. The only other option would be to use explicit memcpy calls.
    * Unnamed union tentative again. As suggested by Tyler (thank you!),
      using an intermediate unnamed struct to carry the alignment produces
      consistent ABI in C and C++.
    * Also, Tyler (thank you!) suggested that the fast path area alignment
      size may be incorrect for architectures where the cache line is not 64
      bytes. There will be a 64 bytes hole in the structure at the end of
      the unnamed struct before the zero length next nodes array. Use
      __rte_cache_min_aligned to preserve existing alignment.
    
    v4:
    
    * Replaced the unnamed union with helper inline functions.
    
    v3:
    
    * Added __extension__ to the unnamed struct inside the union.
    * Fixed C++ header checks.
    * Replaced alignas() with an explicit static_assert.

 lib/graph/rte_graph_worker_common.h | 29 +++++++++++++++++++++--------
 1 file changed, 21 insertions(+), 8 deletions(-)

diff --git a/lib/graph/rte_graph_worker_common.h b/lib/graph/rte_graph_worker_common.h
index 36d864e2c14e..8d8956fdddda 100644
--- a/lib/graph/rte_graph_worker_common.h
+++ b/lib/graph/rte_graph_worker_common.h
@@ -12,7 +12,9 @@
  * process, enqueue and move streams of objects to the next nodes.
  */
 
+#include <assert.h>
 #include <stdalign.h>
+#include <stddef.h>
 
 #include <rte_common.h>
 #include <rte_cycles.h>
@@ -111,14 +113,21 @@ struct __rte_cache_aligned rte_node {
 		} dispatch;
 	};
 	/* Fast path area  */
+	__extension__ struct __rte_cache_aligned {
 #define RTE_NODE_CTX_SZ 16
-	alignas(RTE_CACHE_LINE_SIZE) uint8_t ctx[RTE_NODE_CTX_SZ]; /**< Node Context. */
-	uint16_t size;		/**< Total number of objects available. */
-	uint16_t idx;		/**< Number of objects used. */
-	rte_graph_off_t off;	/**< Offset of node in the graph reel. */
-	uint64_t total_cycles;	/**< Cycles spent in this node. */
-	uint64_t total_calls;	/**< Calls done to this node. */
-	uint64_t total_objs;	/**< Objects processed by this node. */
+		union {
+			uint8_t ctx[RTE_NODE_CTX_SZ];
+			__extension__ struct {
+				void *ctx_ptr;
+				void *ctx_ptr2;
+			};
+		}; /**< Node Context. */
+		uint16_t size;		/**< Total number of objects available. */
+		uint16_t idx;		/**< Number of objects used. */
+		rte_graph_off_t off;	/**< Offset of node in the graph reel. */
+		uint64_t total_cycles;	/**< Cycles spent in this node. */
+		uint64_t total_calls;	/**< Calls done to this node. */
+		uint64_t total_objs;	/**< Objects processed by this node. */
 		union {
 			void **objs;	   /**< Array of object pointers. */
 			uint64_t objs_u64;
@@ -127,9 +136,13 @@ struct __rte_cache_aligned rte_node {
 			rte_node_process_t process; /**< Process function. */
 			uint64_t process_u64;
 		};
-	alignas(RTE_CACHE_LINE_MIN_SIZE) struct rte_node *nodes[]; /**< Next nodes. */
+		alignas(RTE_CACHE_LINE_MIN_SIZE) struct rte_node *nodes[]; /**< Next nodes. */
+	};
 };
 
+static_assert(offsetof(struct rte_node, nodes) - offsetof(struct rte_node, ctx)
+	== RTE_CACHE_LINE_MIN_SIZE, "rte_node fast path area must fit in 64 bytes");
+
 /**
  * @internal
  *
-- 
2.45.2


^ permalink raw reply	[relevance 4%]

* Re: [PATCH 1/1] net/ena: restructure the llq policy user setting
  2024-06-06 13:33  3% ` [PATCH 1/1] net/ena: restructure the llq policy user setting shaibran
@ 2024-07-05 17:32  4%   ` Ferruh Yigit
  2024-07-06  4:59  4%     ` Brandes, Shai
  0 siblings, 1 reply; 200+ results
From: Ferruh Yigit @ 2024-07-05 17:32 UTC (permalink / raw)
  To: shaibran; +Cc: dev

On 6/6/2024 2:33 PM, shaibran@amazon.com wrote:
> From: Shai Brandes <shaibran@amazon.com>
> 
> Replaced `enable_llq`, `normal_llq_hdr` and `large_llq_hdr`
> devargs with a new shared devarg named `llq_policy` that
> implements the same logic and accepts the following values:
> 0 - Disable LLQ.
>     Use with extreme caution as it leads to a huge performance
>     degradation on AWS instances from 6th generation onwards.
> 1 - Accept device recommended LLQ policy (Default).
>     Device can recommend normal or large LLQ policy.
> 2 - Enforce normal LLQ policy.
> 3 - Enforce large LLQ policy.
>     Required for packets with header that exceed 96 bytes on
>     AWS instances prior to 5th generation.
> 
> Signed-off-by: Shai Brandes <shaibran@amazon.com>
> Reviewed-by: Amit Bernstein <amitbern@amazon.com>
>

Hi Shai,

This patch changes device parameters and impacts the end user.
Although this is not part of the ABI policy, and we don't have an explicit
policy around it, it may impact the end user experience. Would you be OK
to postpone this patch to the v24.11 release, where an ABI break is planned?




^ permalink raw reply	[relevance 4%]

* [PATCH v11 1/7] hash: make compare signature function enum private
  2024-07-05 17:45  3% ` [PATCH v11 0/7] hash: add SVE support for bulk key lookup Yoan Picchi
@ 2024-07-05 17:45  3%   ` Yoan Picchi
  0 siblings, 0 replies; 200+ results
From: Yoan Picchi @ 2024-07-05 17:45 UTC (permalink / raw)
  To: Yipeng Wang, Sameh Gobriel, Bruce Richardson, Vladimir Medvedkin
  Cc: dev, nd, Yoan Picchi

enum rte_hash_sig_compare_function is only used internally. This
patch moves it out of the public ABI and into the C file.

Signed-off-by: Yoan Picchi <yoan.picchi@arm.com>
---
 lib/hash/rte_cuckoo_hash.c | 10 ++++++++++
 lib/hash/rte_cuckoo_hash.h | 10 +---------
 2 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/lib/hash/rte_cuckoo_hash.c b/lib/hash/rte_cuckoo_hash.c
index d87aa52b5b..e1d50e7d40 100644
--- a/lib/hash/rte_cuckoo_hash.c
+++ b/lib/hash/rte_cuckoo_hash.c
@@ -33,6 +33,16 @@ RTE_LOG_REGISTER_DEFAULT(hash_logtype, INFO);
 
 #include "rte_cuckoo_hash.h"
 
+/* Enum used to select the implementation of the signature comparison function to use
+ * eg: A system supporting SVE might want to use a NEON or scalar implementation.
+ */
+enum rte_hash_sig_compare_function {
+	RTE_HASH_COMPARE_SCALAR = 0,
+	RTE_HASH_COMPARE_SSE,
+	RTE_HASH_COMPARE_NEON,
+	RTE_HASH_COMPARE_NUM
+};
+
 /* Mask of all flags supported by this version */
 #define RTE_HASH_EXTRA_FLAGS_MASK (RTE_HASH_EXTRA_FLAGS_TRANS_MEM_SUPPORT | \
 				   RTE_HASH_EXTRA_FLAGS_MULTI_WRITER_ADD | \
diff --git a/lib/hash/rte_cuckoo_hash.h b/lib/hash/rte_cuckoo_hash.h
index a528f1d1a0..26a992419a 100644
--- a/lib/hash/rte_cuckoo_hash.h
+++ b/lib/hash/rte_cuckoo_hash.h
@@ -134,14 +134,6 @@ struct rte_hash_key {
 	char key[0];
 };
 
-/* All different signature compare functions */
-enum rte_hash_sig_compare_function {
-	RTE_HASH_COMPARE_SCALAR = 0,
-	RTE_HASH_COMPARE_SSE,
-	RTE_HASH_COMPARE_NEON,
-	RTE_HASH_COMPARE_NUM
-};
-
 /** Bucket structure */
 struct __rte_cache_aligned rte_hash_bucket {
 	uint16_t sig_current[RTE_HASH_BUCKET_ENTRIES];
@@ -199,7 +191,7 @@ struct __rte_cache_aligned rte_hash {
 	/**< Custom function used to compare keys. */
 	enum cmp_jump_table_case cmp_jump_table_idx;
 	/**< Indicates which compare function to use. */
-	enum rte_hash_sig_compare_function sig_cmp_fn;
+	unsigned int sig_cmp_fn;
 	/**< Indicates which signature compare function to use. */
 	uint32_t bucket_bitmask;
 	/**< Bitmask for getting bucket index from hash signature. */
-- 
2.34.1


^ permalink raw reply	[relevance 3%]

* [PATCH v11 0/7] hash: add SVE support for bulk key lookup
      @ 2024-07-05 17:45  3% ` Yoan Picchi
  2024-07-05 17:45  3%   ` [PATCH v11 1/7] hash: make compare signature function enum private Yoan Picchi
  2024-07-08 12:14  3% ` [PATCH v12 0/7] hash: add SVE support for bulk key lookup Yoan Picchi
  3 siblings, 1 reply; 200+ results
From: Yoan Picchi @ 2024-07-05 17:45 UTC (permalink / raw)
  Cc: dev, nd, Yoan Picchi

This patchset adds SVE support for the signature comparison in the cuckoo
hash lookup and improves the existing NEON implementation. These
optimizations required changes to the data format and signature of the
relevant functions to support dense hitmasks (no padding) and having the
primary and secondary hitmasks interleaved instead of being in their own
array each.
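
For illustration only: a scalar sketch contrasting the sparse hitmask used
by the existing scalar fallback (entry i reports in bit 2*i) with a dense
hitmask (entry i reports in bit i). The dense layout shown here is an
assumption based on the "no padding" description; the interleaving of
primary and secondary hitmasks is not modelled, and the demo_* names are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_BUCKET_ENTRIES 8

/* Sparse layout: entry i sets bit 2*i, odd bits stay zero. */
static uint32_t
demo_sparse_hitmask(const uint16_t sigs[DEMO_BUCKET_ENTRIES], uint16_t sig)
{
	uint32_t mask = 0;

	for (unsigned int i = 0; i < DEMO_BUCKET_ENTRIES; i++)
		mask |= (uint32_t)(sig == sigs[i]) << (i << 1);
	return mask;
}

/* Dense layout (assumed): entry i sets bit i, no padding bits. */
static uint32_t
demo_dense_hitmask(const uint16_t sigs[DEMO_BUCKET_ENTRIES], uint16_t sig)
{
	uint32_t mask = 0;

	for (unsigned int i = 0; i < DEMO_BUCKET_ENTRIES; i++)
		mask |= (uint32_t)(sig == sigs[i]) << i;
	return mask;
}

/* Drop the padding bits of a sparse mask to obtain the dense form. */
static uint32_t
demo_sparse_to_dense(uint32_t sparse)
{
	uint32_t dense = 0;

	for (unsigned int i = 0; i < DEMO_BUCKET_ENTRIES; i++)
		dense |= ((sparse >> (i << 1)) & 1u) << i;
	return dense;
}
```

With signatures {1, 2, 3, 2, 5, 6, 7, 2} and a lookup signature of 2, the
matching entries are 1, 3 and 7, so the sparse mask is 0x4044 and the dense
mask is 0x8a.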

Benchmarking the cuckoo hash perf test, I observed this effect on speed:
  There are no significant changes on Intel (run on Sapphire Rapids)
  NEON is up to 7-10% faster (run on Ampere Altra)
  128b SVE is about 3-5% slower than the optimized NEON (run on a Graviton
    3 cloud instance)
  256b SVE is about 0-3% slower than the optimized NEON (run on a Graviton
    3 cloud instance)

V2->V3:
  Remove a redundant if in the test
  Change a couple int to uint16_t in compare_signatures_dense
  Several codding-style fix

V3->V4:
  Rebase

V4->V5:
  Commit message

V5->V6:
  Move the arch-specific code into new arch-specific files
  Isolate the data structure refactor from adding SVE

V6->V7:
  Commit message
  Moved RTE_HASH_COMPARE_SVE to the last commit of the chain

V7->V8:
  Commit message
  Typos and missing spaces

V8->V9:
  Use __rte_unused instead of (void)
  Fix an indentation mistake

V9->V10:
  Fix more formatting and indentation
  Move the new compare signature file directly in hash instead of being
    in a new subdir
  Re-order includes
  Remove duplicated static check
  Move rte_hash_sig_compare_function's definition into a private header

V10->V11:
  Split the "pack the hitmask" commit into four commits:
    Move the compare function enum out of the ABI
    Move the compare function implementations into arch-specific files
    Add a missing check on RTE_HASH_BUCKET_ENTRIES in case we change it
      in the future
    Implement the dense hitmask
  Add missing header guards
  Move compare function enum into cuckoo_hash.c instead of its own header.

Yoan Picchi (7):
  hash: make compare signature function enum private
  hash: split compare signature into arch-specific files
  hash: add a check on hash entry max size
  hash: pack the hitmask for hash in bulk lookup
  hash: optimize compare signature for NEON
  test/hash: check bulk lookup of keys after collision
  hash: add SVE support for bulk key lookup

 .mailmap                                  |   2 +
 app/test/test_hash.c                      |  99 ++++++++---
 lib/hash/compare_signatures_arm_pvt.h     | 121 +++++++++++++
 lib/hash/compare_signatures_generic_pvt.h |  40 +++++
 lib/hash/compare_signatures_x86_pvt.h     |  55 ++++++
 lib/hash/rte_cuckoo_hash.c                | 207 ++++++++++++----------
 lib/hash/rte_cuckoo_hash.h                |  10 +-
 7 files changed, 410 insertions(+), 124 deletions(-)
 create mode 100644 lib/hash/compare_signatures_arm_pvt.h
 create mode 100644 lib/hash/compare_signatures_generic_pvt.h
 create mode 100644 lib/hash/compare_signatures_x86_pvt.h

-- 
2.34.1


^ permalink raw reply	[relevance 3%]

* RE: [PATCH 1/1] net/ena: restructure the llq policy user setting
  2024-07-05 17:32  4%   ` Ferruh Yigit
@ 2024-07-06  4:59  4%     ` Brandes, Shai
  0 siblings, 0 replies; 200+ results
From: Brandes, Shai @ 2024-07-06  4:59 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: dev

[-- Attachment #1: Type: text/plain, Size: 1398 bytes --]

Sure, thanks!

On July 5, 2024 at 20:32, Ferruh Yigit <ferruh.yigit@amd.com> wrote:



On 6/6/2024 2:33 PM, shaibran@amazon.com wrote:
> From: Shai Brandes <shaibran@amazon.com>
>
> Replaced `enable_llq`, `normal_llq_hdr` and `large_llq_hdr`
> devargs with a new shared devarg named `llq_policy` that
> implements the same logic and accepts the following values:
> 0 - Disable LLQ.
>     Use with extreme caution as it leads to a huge performance
>     degradation on AWS instances from 6th generation onwards.
> 1 - Accept device recommended LLQ policy (Default).
>     Device can recommend normal or large LLQ policy.
> 2 - Enforce normal LLQ policy.
> 3 - Enforce large LLQ policy.
>     Required for packets with header that exceed 96 bytes on
>     AWS instances prior to 5th generation.
>
> Signed-off-by: Shai Brandes <shaibran@amazon.com>
> Reviewed-by: Amit Bernstein <amitbern@amazon.com>
>

Hi Shai,

This patch changes device parameters and impacts the end user.
Although this is not part of the ABI policy, and we don't have an explicit
policy around it, it may impact the end user experience. Would you be OK
to postpone this patch to the v24.11 release, where an ABI break is planned?




[-- Attachment #2: Type: text/html, Size: 2262 bytes --]

^ permalink raw reply	[relevance 4%]

* [PATCH] net/mlx5: fix compilation warning in GCC-9.1
@ 2024-07-07  9:57  4% Gregory Etelson
  2024-07-18  7:24  4% ` Raslan Darawsheh
  0 siblings, 1 reply; 200+ results
From: Gregory Etelson @ 2024-07-07  9:57 UTC (permalink / raw)
  To: dev
  Cc: getelson, mkashani, rasland, stable, Dariusz Sosnowski,
	Viacheslav Ovsiienko, Bing Zhao, Ori Kam, Suanming Mou,
	Matan Azrad

GCC 9.1 introduced a bugfix that changed the GCC ABI on Arm targets:
https://gcc.gnu.org/gcc-9/changes.html
```
On Arm targets (arm*-*-*), a bug in the implementation of the
procedure call standard (AAPCS) in the GCC 6, 7 and 8 releases
has been fixed: a structure containing a bit-field based on a 64-bit
integral type and where no other element in a structure required
64-bit alignment could be passed incorrectly to functions.
This is an ABI change. If the option -Wpsabi is enabled
(on by default) the compiler will emit a diagnostic note for code
that might be affected.
```

The patch fixes PMD compilation in the INTEGRITY flow item.
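
For illustration only: a reduced sketch of the shape of this workaround.
The structure below is a hypothetical stand-in for a flow item that holds
bit-fields based on a 64-bit type; the helper receives it as an opaque
pointer and converts it inside the function, as the diff below does. The
exact condition that triggers the -Wpsabi note in the driver is not
reproduced here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in: bit-fields based on a 64-bit integral type. */
struct demo_integrity {
	uint32_t level;
	union {
		struct {
			uint64_t packet_ok:1;
			uint64_t l2_ok:1;
			uint64_t l3_ok:1;
			uint64_t l4_ok:1;
			uint64_t reserved:60;
		};
		uint64_t value;
	};
};

/*
 * The argument is taken as an opaque pointer and converted locally, so the
 * bit-field structure type does not appear in the function signature.
 * Returns 0 when every requested check has a matching layer, -1 otherwise.
 */
static int
demo_validate_bits(const void *arg, uint64_t pattern_flags,
		   uint64_t l3_flags, uint64_t l4_flags)
{
	const struct demo_integrity *mask = arg;

	if (mask->l3_ok && !(pattern_flags & l3_flags))
		return -1;
	if (mask->l4_ok && !(pattern_flags & l4_flags))
		return -1;
	return 0;
}
```

Callers keep passing the address of the item mask; only the parameter type
seen by the compiler changes.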

Fixes: 23b0a8b298b1 ("net/mlx5: fix integrity item validation and translation")

Cc: stable@dpdk.org

Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
---
 drivers/net/mlx5/mlx5_flow_dv.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index 8a0d58cb05..89057edbcf 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -7396,11 +7396,13 @@ flow_dv_validate_attributes(struct rte_eth_dev *dev,
 }
 
 static int
-validate_integrity_bits(const struct rte_flow_item_integrity *mask,
+validate_integrity_bits(const void *arg,
 			int64_t pattern_flags, uint64_t l3_flags,
 			uint64_t l4_flags, uint64_t ip4_flag,
 			struct rte_flow_error *error)
 {
+	const struct rte_flow_item_integrity *mask = arg;
+
 	if (mask->l3_ok && !(pattern_flags & l3_flags))
 		return rte_flow_error_set(error, EINVAL,
 					  RTE_FLOW_ERROR_TYPE_ITEM,
-- 
2.43.0


^ permalink raw reply	[relevance 4%]

* [PATCH v12 0/7] hash: add SVE support for bulk key lookup
                     ` (2 preceding siblings ...)
  2024-07-05 17:45  3% ` [PATCH v11 0/7] hash: add SVE support for bulk key lookup Yoan Picchi
@ 2024-07-08 12:14  3% ` Yoan Picchi
  2024-07-08 12:14  3%   ` [PATCH v12 1/7] hash: make compare signature function enum private Yoan Picchi
  2024-07-09  4:48  0%   ` [PATCH v12 0/7] hash: add SVE support for bulk key lookup David Marchand
  3 siblings, 2 replies; 200+ results
From: Yoan Picchi @ 2024-07-08 12:14 UTC (permalink / raw)
  Cc: dev, nd, Yoan Picchi

This patchset adds SVE support for the signature comparison in the cuckoo
hash lookup and improves the existing NEON implementation. These
optimizations required changes to the data format and signature of the
relevant functions to support dense hitmasks (no padding) and having the
primary and secondary hitmasks interleaved instead of being in their own
array each.

Benchmarking the cuckoo hash perf test, I observed this effect on speed:
  There are no significant changes on Intel (run on Sapphire Rapids)
  NEON is up to 7-10% faster (run on Ampere Altra)
  128b SVE is about 3-5% slower than the optimized NEON (run on a Graviton
    3 cloud instance)
  256b SVE is about 0-3% slower than the optimized NEON (run on a Graviton
    3 cloud instance)

V2->V3:
  Remove a redundant if in the test
  Change a couple int to uint16_t in compare_signatures_dense
  Several coding-style fixes

V3->V4:
  Rebase

V4->V5:
  Commit message

V5->V6:
  Move the arch-specific code into new arch-specific files
  Isolate the data structure refactor from adding SVE

V6->V7:
  Commit message
  Moved RTE_HASH_COMPARE_SVE to the last commit of the chain

V7->V8:
  Commit message
  Typos and missing spaces

V8->V9:
  Use __rte_unused instead of (void)
  Fix an indentation mistake

V9->V10:
  Fix more formatting and indentation
  Move the new compare signature file directly in hash instead of being
    in a new subdir
  Re-order includes
  Remove duplicated static check
  Move rte_hash_sig_compare_function's definition into a private header

V10->V11:
  Split the "pack the hitmask" commit into four commits:
    Move the compare function enum out of the ABI
    Move the compare function implementations into arch-specific files
    Add a missing check on RTE_HASH_BUCKET_ENTRIES in case we change it
      in the future
    Implement the dense hitmask
  Add missing header guards
  Move compare function enum into cuckoo_hash.c instead of its own header.

V11->V12:
  Change the name of the compare function file (remove the _pvt suffix)

Yoan Picchi (7):
  hash: make compare signature function enum private
  hash: split compare signature into arch-specific files
  hash: add a check on hash entry max size
  hash: pack the hitmask for hash in bulk lookup
  hash: optimize compare signature for NEON
  test/hash: check bulk lookup of keys after collision
  hash: add SVE support for bulk key lookup

 .mailmap                              |   2 +
 app/test/test_hash.c                  |  99 +++++++++---
 lib/hash/compare_signatures_arm.h     | 121 +++++++++++++++
 lib/hash/compare_signatures_generic.h |  40 +++++
 lib/hash/compare_signatures_x86.h     |  55 +++++++
 lib/hash/rte_cuckoo_hash.c            | 207 ++++++++++++++------------
 lib/hash/rte_cuckoo_hash.h            |  10 +-
 7 files changed, 410 insertions(+), 124 deletions(-)
 create mode 100644 lib/hash/compare_signatures_arm.h
 create mode 100644 lib/hash/compare_signatures_generic.h
 create mode 100644 lib/hash/compare_signatures_x86.h

-- 
2.25.1


^ permalink raw reply	[relevance 3%]

* [PATCH v12 1/7] hash: make compare signature function enum private
  2024-07-08 12:14  3% ` [PATCH v12 0/7] hash: add SVE support for bulk key lookup Yoan Picchi
@ 2024-07-08 12:14  3%   ` Yoan Picchi
  2024-07-09  4:48  0%   ` [PATCH v12 0/7] hash: add SVE support for bulk key lookup David Marchand
  1 sibling, 0 replies; 200+ results
From: Yoan Picchi @ 2024-07-08 12:14 UTC (permalink / raw)
  To: Yipeng Wang, Sameh Gobriel, Bruce Richardson, Vladimir Medvedkin
  Cc: dev, nd, Yoan Picchi

enum rte_hash_sig_compare_function is only used internally. This
patch moves it out of the public ABI and into the C file.

Signed-off-by: Yoan Picchi <yoan.picchi@arm.com>
---
 lib/hash/rte_cuckoo_hash.c | 10 ++++++++++
 lib/hash/rte_cuckoo_hash.h | 10 +---------
 2 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/lib/hash/rte_cuckoo_hash.c b/lib/hash/rte_cuckoo_hash.c
index d87aa52b5b..e1d50e7d40 100644
--- a/lib/hash/rte_cuckoo_hash.c
+++ b/lib/hash/rte_cuckoo_hash.c
@@ -33,6 +33,16 @@ RTE_LOG_REGISTER_DEFAULT(hash_logtype, INFO);
 
 #include "rte_cuckoo_hash.h"
 
+/* Enum used to select the implementation of the signature comparison function to use
+ * eg: A system supporting SVE might want to use a NEON or scalar implementation.
+ */
+enum rte_hash_sig_compare_function {
+	RTE_HASH_COMPARE_SCALAR = 0,
+	RTE_HASH_COMPARE_SSE,
+	RTE_HASH_COMPARE_NEON,
+	RTE_HASH_COMPARE_NUM
+};
+
 /* Mask of all flags supported by this version */
 #define RTE_HASH_EXTRA_FLAGS_MASK (RTE_HASH_EXTRA_FLAGS_TRANS_MEM_SUPPORT | \
 				   RTE_HASH_EXTRA_FLAGS_MULTI_WRITER_ADD | \
diff --git a/lib/hash/rte_cuckoo_hash.h b/lib/hash/rte_cuckoo_hash.h
index a528f1d1a0..26a992419a 100644
--- a/lib/hash/rte_cuckoo_hash.h
+++ b/lib/hash/rte_cuckoo_hash.h
@@ -134,14 +134,6 @@ struct rte_hash_key {
 	char key[0];
 };
 
-/* All different signature compare functions */
-enum rte_hash_sig_compare_function {
-	RTE_HASH_COMPARE_SCALAR = 0,
-	RTE_HASH_COMPARE_SSE,
-	RTE_HASH_COMPARE_NEON,
-	RTE_HASH_COMPARE_NUM
-};
-
 /** Bucket structure */
 struct __rte_cache_aligned rte_hash_bucket {
 	uint16_t sig_current[RTE_HASH_BUCKET_ENTRIES];
@@ -199,7 +191,7 @@ struct __rte_cache_aligned rte_hash {
 	/**< Custom function used to compare keys. */
 	enum cmp_jump_table_case cmp_jump_table_idx;
 	/**< Indicates which compare function to use. */
-	enum rte_hash_sig_compare_function sig_cmp_fn;
+	unsigned int sig_cmp_fn;
 	/**< Indicates which signature compare function to use. */
 	uint32_t bucket_bitmask;
 	/**< Bitmask for getting bucket index from hash signature. */
-- 
2.25.1


^ permalink raw reply	[relevance 3%]

* [PATCH v6 0/2] power: introduce PM QoS interface
                     ` (3 preceding siblings ...)
  2024-07-02  3:50  4% ` [PATCH v5 0/2] power: introduce PM QoS interface Huisong Li
@ 2024-07-09  2:29  4% ` Huisong Li
  2024-07-09  2:29  5%   ` [PATCH v6 1/2] power: introduce PM QoS API on CPU wide Huisong Li
  2024-07-09  6:31  4% ` [PATCH v7 0/2] power: introduce PM QoS interface Huisong Li
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 200+ results
From: Huisong Li @ 2024-07-09  2:29 UTC (permalink / raw)
  To: dev
  Cc: mb, thomas, ferruh.yigit, anatoly.burakov, david.hunt,
	sivaprasad.tummala, stephen, david.marchand, liuyonglong,
	lihuisong

The deeper the idle state, the lower the power consumption, but the longer
the resume time. Some services are delay sensitive and expect a very low
resume time, for example when receiving packets in interrupt mode.

The "/sys/devices/system/cpu/cpuX/power/pm_qos_resume_latency_us" sysfs
interface is used by userspace to set and get the resume latency limit on
cpuX. Please see the description in the kernel documentation[1].
Each cpuidle governor in Linux selects which idle state to enter based on
this CPU resume latency in its idle task.

The per-CPU PM QoS API can be used to control this CPU's idle state
selection. Setting a strict resume latency (zero value) restricts the CPU
to the shallowest idle state, which lowers the delay after sleep.

[1] https://www.kernel.org/doc/html/latest/admin-guide/abi-testing.html?highlight=pm_qos_resume_latency_us#abi-sys-devices-power-pm-qos-resume-latency-us
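
For illustration only: a small sketch of how a latency value could be
turned into the string written to the sysfs file, following the mapping
described in the patch comments (strict zero latency is written as "n/a",
no constraint is written as "0"). The demo_* helpers and the sentinel
value are hypothetical; the real macro's value is not shown in this
thread excerpt.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sentinel meaning "no constraint" (an assumption). */
#define DEMO_RESUME_LATENCY_NO_CONSTRAINT INT_MAX

/* Build the string to write for a requested resume latency in microseconds. */
static int
demo_format_resume_latency(int latency, char *buf, size_t len)
{
	if (latency < 0)
		return -EINVAL;

	if (latency == 0)
		snprintf(buf, len, "%s", "n/a"); /* strict: zero resume latency */
	else if (latency == DEMO_RESUME_LATENCY_NO_CONSTRAINT)
		snprintf(buf, len, "%d", 0);     /* "0" means no constraint */
	else
		snprintf(buf, len, "%d", latency);
	return 0;
}

/* Build the per-CPU sysfs path named in the cover letter. */
static int
demo_resume_latency_path(unsigned int cpu, char *buf, size_t len)
{
	int n = snprintf(buf, len,
		"/sys/devices/system/cpu/cpu%u/power/pm_qos_resume_latency_us",
		cpu);

	return (n < 0 || (size_t)n >= len) ? -ENAMETOOLONG : 0;
}
```

Keeping the formatting separate from the file I/O makes the inverted
meaning of "0" and "n/a" easy to check in isolation.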

---
 v6:
  - update release_24_07.rst based on dpdk repo to resolve CI warning.
 v5:
  - use LINE_MAX to replace BUFSIZ, and use snprintf to replace sprintf.
 v4:
  - fix some comments basd on Stephen
  - add stdint.h include
  - add Acked-by Morten Brørup <mb@smartsharesystems.com>
 v3:
  - add RTE_POWER_xxx prefix for some macro in header
  - add the check for lcore_id with rte_lcore_is_enabled
 v2:
  - use PM QoS on CPU wide to replace the one on system wide

Huisong Li (2):
  power: introduce PM QoS API on CPU wide
  examples/l3fwd-power: add PM QoS configuration

 doc/guides/prog_guide/power_man.rst    |  24 ++++++
 doc/guides/rel_notes/release_24_07.rst |   4 +
 examples/l3fwd-power/main.c            |  28 ++++++
 lib/power/meson.build                  |   2 +
 lib/power/rte_power_qos.c              | 114 +++++++++++++++++++++++++
 lib/power/rte_power_qos.h              |  73 ++++++++++++++++
 lib/power/version.map                  |   2 +
 7 files changed, 247 insertions(+)
 create mode 100644 lib/power/rte_power_qos.c
 create mode 100644 lib/power/rte_power_qos.h

-- 
2.22.0


^ permalink raw reply	[relevance 4%]

* [PATCH v6 1/2] power: introduce PM QoS API on CPU wide
  2024-07-09  2:29  4% ` [PATCH v6 0/2] power: introduce PM QoS interface Huisong Li
@ 2024-07-09  2:29  5%   ` Huisong Li
  0 siblings, 0 replies; 200+ results
From: Huisong Li @ 2024-07-09  2:29 UTC (permalink / raw)
  To: dev
  Cc: mb, thomas, ferruh.yigit, anatoly.burakov, david.hunt,
	sivaprasad.tummala, stephen, david.marchand, liuyonglong,
	lihuisong

The deeper the idle state, the lower the power consumption, but the longer
the resume time. Some services are delay sensitive and expect a very low
resume time, for example when receiving packets in interrupt mode.

The "/sys/devices/system/cpu/cpuX/power/pm_qos_resume_latency_us" sysfs
interface is used by userspace to set and get the resume latency limit on
cpuX. Each cpuidle governor in Linux selects which idle state to enter
based on this CPU resume latency in its idle task.

The per-CPU PM QoS API can be used to control this CPU's idle state
selection. Setting a strict resume latency (zero value) restricts the CPU
to the shallowest idle state, which lowers the delay after sleep.

Signed-off-by: Huisong Li <lihuisong@huawei.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
---
 doc/guides/prog_guide/power_man.rst    |  24 ++++++
 doc/guides/rel_notes/release_24_07.rst |   4 +
 lib/power/meson.build                  |   2 +
 lib/power/rte_power_qos.c              | 114 +++++++++++++++++++++++++
 lib/power/rte_power_qos.h              |  73 ++++++++++++++++
 lib/power/version.map                  |   2 +
 6 files changed, 219 insertions(+)
 create mode 100644 lib/power/rte_power_qos.c
 create mode 100644 lib/power/rte_power_qos.h

diff --git a/doc/guides/prog_guide/power_man.rst b/doc/guides/prog_guide/power_man.rst
index f6674efe2d..faa32b4320 100644
--- a/doc/guides/prog_guide/power_man.rst
+++ b/doc/guides/prog_guide/power_man.rst
@@ -249,6 +249,30 @@ Get Num Pkgs
 Get Num Dies
   Get the number of die's on a given package.
 
+
+PM QoS
+------
+
+The deeper the idle state, the lower the power consumption, but the longer
+the resume time. Some services are delay sensitive and expect a very low
+resume time, for example when receiving packets in interrupt mode.
+
+The "/sys/devices/system/cpu/cpuX/power/pm_qos_resume_latency_us" sysfs
+interface is used by userspace to set and get the resume latency limit on
+cpuX. Each cpuidle governor in Linux selects which idle state to enter
+based on this CPU resume latency in its idle task.
+
+The per-CPU PM QoS API can be used to set and get the CPU resume latency based
+on this sysfs interface.
+
+The ``rte_power_qos_set_cpu_resume_latency()`` function can control the CPU's
+idle state selection in Linux. Setting a strict resume latency (zero value)
+restricts the CPU to the shallowest idle state, which lowers the delay of
+resuming service after sleeping.
+
+The ``rte_power_qos_get_cpu_resume_latency()`` function can get the resume
+latency on the specified CPU.
+
 References
 ----------
 
diff --git a/doc/guides/rel_notes/release_24_07.rst b/doc/guides/rel_notes/release_24_07.rst
index 1dd842df3a..af6fd82a3c 100644
--- a/doc/guides/rel_notes/release_24_07.rst
+++ b/doc/guides/rel_notes/release_24_07.rst
@@ -155,6 +155,10 @@ New Features
 
   Added an API that allows the user to reclaim the defer queue with RCU.
 
+* **Introduce per-CPU PM QoS interface.**
+
+  * Introduce per-CPU PM QoS interface to lower the delay after sleep.
+
 
 Removed Items
 -------------
diff --git a/lib/power/meson.build b/lib/power/meson.build
index b8426589b2..8222e178b0 100644
--- a/lib/power/meson.build
+++ b/lib/power/meson.build
@@ -23,12 +23,14 @@ sources = files(
         'rte_power.c',
         'rte_power_uncore.c',
         'rte_power_pmd_mgmt.c',
+        'rte_power_qos.c',
 )
 headers = files(
         'rte_power.h',
         'rte_power_guest_channel.h',
         'rte_power_pmd_mgmt.h',
         'rte_power_uncore.h',
+        'rte_power_qos.h',
 )
 if cc.has_argument('-Wno-cast-qual')
     cflags += '-Wno-cast-qual'
diff --git a/lib/power/rte_power_qos.c b/lib/power/rte_power_qos.c
new file mode 100644
index 0000000000..375746f832
--- /dev/null
+++ b/lib/power/rte_power_qos.c
@@ -0,0 +1,114 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2024 HiSilicon Limited
+ */
+
+#include <errno.h>
+#include <stdlib.h>
+#include <string.h>
+
+#include <rte_lcore.h>
+#include <rte_log.h>
+
+#include "power_common.h"
+#include "rte_power_qos.h"
+
+#define PM_QOS_SYSFILE_RESUME_LATENCY_US	\
+	"/sys/devices/system/cpu/cpu%u/power/pm_qos_resume_latency_us"
+
+int
+rte_power_qos_set_cpu_resume_latency(uint16_t lcore_id, int latency)
+{
+	char buf[LINE_MAX];
+	FILE *f;
+	int ret;
+
+	if (!rte_lcore_is_enabled(lcore_id)) {
+		POWER_LOG(ERR, "lcore id %u is not enabled", lcore_id);
+		return -EINVAL;
+	}
+
+	if (latency < 0) {
+		POWER_LOG(ERR, "latency should be greater than or equal to 0");
+		return -EINVAL;
+	}
+
+	ret = open_core_sysfs_file(&f, "w", PM_QOS_SYSFILE_RESUME_LATENCY_US, lcore_id);
+	if (ret != 0) {
+		POWER_LOG(ERR, "Failed to open "PM_QOS_SYSFILE_RESUME_LATENCY_US, lcore_id);
+		return ret;
+	}
+
+	/*
+	 * The kernel sysfs file pm_qos_resume_latency_us (see
+	 * @PM_QOS_SYSFILE_RESUME_LATENCY_US) interprets the written string
+	 * as follows:
+	 * 1> "n/a" sets a resume latency of 0.
+	 * 2> "0" sets no resume latency constraint.
+	 * 3> any other number sets that value as the resume latency.
+	 */
+	if (latency == 0)
+		snprintf(buf, sizeof(buf), "%s", "n/a");
+	else if (latency == RTE_POWER_QOS_RESUME_LATENCY_NO_CONSTRAINT)
+		snprintf(buf, sizeof(buf), "%u", 0);
+	else
+		snprintf(buf, sizeof(buf), "%u", latency);
+
+	ret = write_core_sysfs_s(f, buf);
+	if (ret != 0) {
+		POWER_LOG(ERR, "Failed to write "PM_QOS_SYSFILE_RESUME_LATENCY_US, lcore_id);
+		goto out;
+	}
+
+out:
+	if (f != NULL)
+		fclose(f);
+
+	return ret;
+}
+
+int
+rte_power_qos_get_cpu_resume_latency(uint16_t lcore_id)
+{
+	char buf[LINE_MAX];
+	int latency = -1;
+	FILE *f;
+	int ret;
+
+	if (!rte_lcore_is_enabled(lcore_id)) {
+		POWER_LOG(ERR, "lcore id %u is not enabled", lcore_id);
+		return -EINVAL;
+	}
+
+	ret = open_core_sysfs_file(&f, "r", PM_QOS_SYSFILE_RESUME_LATENCY_US, lcore_id);
+	if (ret != 0) {
+		POWER_LOG(ERR, "Failed to open "PM_QOS_SYSFILE_RESUME_LATENCY_US, lcore_id);
+		return ret;
+	}
+
+	ret = read_core_sysfs_s(f, buf, sizeof(buf));
+	if (ret != 0) {
+		POWER_LOG(ERR, "Failed to read "PM_QOS_SYSFILE_RESUME_LATENCY_US, lcore_id);
+		goto out;
+	}
+
+	/*
+	 * The kernel sysfs file pm_qos_resume_latency_us (see
+	 * @PM_QOS_SYSFILE_RESUME_LATENCY_US) reports the resume latency
+	 * as one of the following strings:
+	 * 1> "n/a" means the resume latency is 0.
+	 * 2> "0" means there is no resume latency constraint.
+	 * 3> any other string is the resume latency value in use.
+	 */
+	if (strcmp(buf, "n/a") == 0)
+		latency = 0;
+	else {
+		latency = strtoul(buf, NULL, 10);
+		latency = latency == 0 ? RTE_POWER_QOS_RESUME_LATENCY_NO_CONSTRAINT : latency;
+	}
+
+out:
+	if (f != NULL)
+		fclose(f);
+
+	return latency != -1 ? latency : ret;
+}
diff --git a/lib/power/rte_power_qos.h b/lib/power/rte_power_qos.h
new file mode 100644
index 0000000000..990c488373
--- /dev/null
+++ b/lib/power/rte_power_qos.h
@@ -0,0 +1,73 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2024 HiSilicon Limited
+ */
+
+#ifndef RTE_POWER_QOS_H
+#define RTE_POWER_QOS_H
+
+#include <stdint.h>
+
+#include <rte_compat.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * @file rte_power_qos.h
+ *
+ * PM QoS API.
+ *
+ * The per-CPU resume latency limit constrains this CPU's idle state
+ * selection in each cpuidle governor.
+ * See the kernel documentation of the per-CPU PM QoS interface:
+ * https://www.kernel.org/doc/html/latest/admin-guide/abi-testing.html?highlight=pm_qos_resume_latency_us#abi-sys-devices-power-pm-qos-resume-latency-us
+ *
+ * The deeper the idle state, the lower the power consumption, but the
+ * longer the resume time. Some services are delay-sensitive and expect a
+ * low resume time, for example interrupt packet receiving mode.
+ *
+ * In these cases, the per-CPU PM QoS API can control this CPU's idle state
+ * selection: setting a strict resume latency (zero value) restricts the CPU
+ * to the shallowest idle state and so lowers the delay after sleep.
+ */
+
+#define RTE_POWER_QOS_STRICT_LATENCY_VALUE             0
+#define RTE_POWER_QOS_RESUME_LATENCY_NO_CONSTRAINT    ((int)(UINT32_MAX >> 1))
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * @param lcore_id
+ *   target logical core id
+ *
+ * @param latency
+ *   Resume latency in microseconds; must be greater than or equal to zero.
+ *
+ * @return
+ *   0 on success. Otherwise negative value is returned.
+ */
+__rte_experimental
+int rte_power_qos_set_cpu_resume_latency(uint16_t lcore_id, int latency);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Get the current resume latency of this logical core.
+ * If it has never been set, the kernel default is
+ * @see RTE_POWER_QOS_RESUME_LATENCY_NO_CONSTRAINT.
+ *
+ * @return
+ *   Negative value on failure.
+ *   >= 0 means the actual resume latency limit on this core.
+ */
+__rte_experimental
+int rte_power_qos_get_cpu_resume_latency(uint16_t lcore_id);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* RTE_POWER_QOS_H */
diff --git a/lib/power/version.map b/lib/power/version.map
index ad92a65f91..81b8ff11b7 100644
--- a/lib/power/version.map
+++ b/lib/power/version.map
@@ -51,4 +51,6 @@ EXPERIMENTAL {
 	rte_power_set_uncore_env;
 	rte_power_uncore_freqs;
 	rte_power_unset_uncore_env;
+	rte_power_qos_set_cpu_resume_latency;
+	rte_power_qos_get_cpu_resume_latency;
 };
-- 
2.22.0


^ permalink raw reply	[relevance 5%]

* Re: [PATCH v12 0/7] hash: add SVE support for bulk key lookup
  2024-07-08 12:14  3% ` [PATCH v12 0/7] hash: add SVE support for bulk key lookup Yoan Picchi
  2024-07-08 12:14  3%   ` [PATCH v12 1/7] hash: make compare signature function enum private Yoan Picchi
@ 2024-07-09  4:48  0%   ` David Marchand
  1 sibling, 0 replies; 200+ results
From: David Marchand @ 2024-07-09  4:48 UTC (permalink / raw)
  To: Yoan Picchi; +Cc: dev, nd

On Mon, Jul 8, 2024 at 2:14 PM Yoan Picchi <yoan.picchi@arm.com> wrote:
>
> This patchset adds SVE support for the signature comparison in the cuckoo
> hash lookup and improves the existing NEON implementation. These
> optimizations required changes to the data format and signature of the
> relevant functions to support dense hitmasks (no padding) and having the
> primary and secondary hitmasks interleaved instead of being in their own
> array each.
>
> Benchmarking the cuckoo hash perf test, I observed this effect on speed:
>   There are no significant changes on Intel (ran on Sapphire Rapids)
>   Neon is up to 7-10% faster (ran on ampere altra)
>   128b SVE is about 3-5% slower than the optimized neon (ran on a graviton
>     3 cloud instance)
>   256b SVE is about 0-3% slower than the optimized neon (ran on a graviton
>     3 cloud instance)
>
> V2->V3:
>   Remove a redundant if in the test
>   Change a couple int to uint16_t in compare_signatures_dense
>   Several coding-style fixes
>
> V3->V4:
>   Rebase
>
> V4->V5:
>   Commit message
>
> V5->V6:
>   Move the arch-specific code into new arch-specific files
>   Isolate the data structure refactor from adding SVE
>
> V6->V7:
>   Commit message
>   Moved RTE_HASH_COMPARE_SVE to the last commit of the chain
>
> V7->V8:
>   Commit message
>   Typos and missing spaces
>
> V8->V9:
>   Use __rte_unused instead of (void)
>   Fix an indentation mistake
>
> V9->V10:
>   Fix more formatting and indentation
>   Move the new compare signature file directly in hash instead of being
>     in a new subdir
>   Re-order includes
>   Remove duplicated static check
>   Move rte_hash_sig_compare_function's definition into a private header
>
> V10->V11:
>   Split the "pack the hitmask" commit into four commits:
>     Move the compare function enum out of the ABI
>     Move the compare function implementations into arch-specific files
>     Add a missing check on RTE_HASH_BUCKET_ENTRIES in case we change it
>       in the future
>     Implement the dense hitmask
>   Add missing header guards
>   Move compare function enum into cuckoo_hash.c instead of its own header.
>
> V11->V12:
>   Change the name of the compare function file (remove the _pvt suffix)
>
> Yoan Picchi (7):
>   hash: make compare signature function enum private
>   hash: split compare signature into arch-specific files
>   hash: add a check on hash entry max size
>   hash: pack the hitmask for hash in bulk lookup
>   hash: optimize compare signature for NEON
>   test/hash: check bulk lookup of keys after collision
>   hash: add SVE support for bulk key lookup
>
>  .mailmap                              |   2 +
>  app/test/test_hash.c                  |  99 +++++++++---
>  lib/hash/compare_signatures_arm.h     | 121 +++++++++++++++
>  lib/hash/compare_signatures_generic.h |  40 +++++
>  lib/hash/compare_signatures_x86.h     |  55 +++++++
>  lib/hash/rte_cuckoo_hash.c            | 207 ++++++++++++++------------
>  lib/hash/rte_cuckoo_hash.h            |  10 +-
>  7 files changed, 410 insertions(+), 124 deletions(-)
>  create mode 100644 lib/hash/compare_signatures_arm.h
>  create mode 100644 lib/hash/compare_signatures_generic.h
>  create mode 100644 lib/hash/compare_signatures_x86.h

I added RN updates, reformatted commit logs, fixed header guards and
removed some pvt leftovers.
Series applied, thanks.


-- 
David Marchand


^ permalink raw reply	[relevance 0%]

* [PATCH v7 0/2] power: introduce PM QoS interface
                     ` (4 preceding siblings ...)
  2024-07-09  2:29  4% ` [PATCH v6 0/2] power: introduce PM QoS interface Huisong Li
@ 2024-07-09  6:31  4% ` Huisong Li
  2024-07-09  6:31  5%   ` [PATCH v7 1/2] power: introduce PM QoS API on CPU wide Huisong Li
  2024-07-09  7:25  4% ` [PATCH v8 0/2] power: introduce PM QoS interface Huisong Li
  2024-08-09  9:50  4% ` [PATCH v9 0/2] power: introduce PM QoS interface Huisong Li
  7 siblings, 1 reply; 200+ results
From: Huisong Li @ 2024-07-09  6:31 UTC (permalink / raw)
  To: dev
  Cc: mb, thomas, ferruh.yigit, anatoly.burakov, david.hunt,
	sivaprasad.tummala, stephen, david.marchand, liuyonglong,
	lihuisong

The deeper the idle state, the lower the power consumption, but the longer
the resume time. Some services are delay-sensitive and expect a low resume
time, for example interrupt packet receiving mode.

The "/sys/devices/system/cpu/cpuX/power/pm_qos_resume_latency_us" sysfs
interface lets userspace set and get the resume latency limit of cpuX.
See the description in the kernel documentation [1].
Each cpuidle governor in Linux selects which idle state to enter based on
this CPU resume latency limit in its idle task.

The per-CPU PM QoS API can be used to control this CPU's idle state
selection. Setting a strict resume latency (zero value) restricts the CPU
to the shallowest idle state, which lowers the delay after sleep.

[1] https://www.kernel.org/doc/html/latest/admin-guide/abi-testing.html?highlight=pm_qos_resume_latency_us#abi-sys-devices-power-pm-qos-resume-latency-us
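
For illustration, the mapping between the integer latency passed to the
set function and the string written to the sysfs file could be sketched
as below. This is a minimal standalone sketch and not the patch code:
encode_resume_latency() and NO_CONSTRAINT are hypothetical names, and
NO_CONSTRAINT is assumed to have the same value as
RTE_POWER_QOS_RESUME_LATENCY_NO_CONSTRAINT in this series.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for RTE_POWER_QOS_RESUME_LATENCY_NO_CONSTRAINT. */
#define NO_CONSTRAINT ((int)(UINT32_MAX >> 1))

/*
 * Encode a latency in microseconds as the string the sysfs file expects.
 * Returns 0 on success, -1 on a negative latency or a too-small buffer.
 */
static int
encode_resume_latency(int latency, char *buf, size_t len)
{
	int n;

	if (latency < 0)
		return -1;

	if (latency == 0)
		n = snprintf(buf, len, "n/a");	/* strict: zero resume latency */
	else if (latency == NO_CONSTRAINT)
		n = snprintf(buf, len, "0");	/* "0" means no constraint */
	else
		n = snprintf(buf, len, "%d", latency);

	/* snprintf reports the length it wanted; reject truncated output. */
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Keeping the encoding in one small function makes the inverted meaning of
"0" and "n/a" easy to check in isolation from the file I/O.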

---
 v7:
  - remove a dead code rte_lcore_is_enabled in patch[2/2]
 v6:
  - update release_24_07.rst based on dpdk repo to resolve CI warning.
 v5:
  - use LINE_MAX to replace BUFSIZ, and use snprintf to replace sprintf.
 v4:
  - fix some comments based on Stephen's review
  - add stdint.h include
  - add Acked-by Morten Brørup <mb@smartsharesystems.com>
 v3:
  - add RTE_POWER_xxx prefix for some macro in header
  - add the check for lcore_id with rte_lcore_is_enabled
 v2:
  - use PM QoS on CPU wide to replace the one on system wide

Huisong Li (2):
  power: introduce PM QoS API on CPU wide
  examples/l3fwd-power: add PM QoS configuration

 doc/guides/prog_guide/power_man.rst    |  24 ++++++
 doc/guides/rel_notes/release_24_07.rst |   4 +
 examples/l3fwd-power/main.c            |  24 ++++++
 lib/power/meson.build                  |   2 +
 lib/power/rte_power_qos.c              | 114 +++++++++++++++++++++++++
 lib/power/rte_power_qos.h              |  73 ++++++++++++++++
 lib/power/version.map                  |   2 +
 7 files changed, 243 insertions(+)
 create mode 100644 lib/power/rte_power_qos.c
 create mode 100644 lib/power/rte_power_qos.h

-- 
2.22.0


^ permalink raw reply	[relevance 4%]

* [PATCH v7 1/2] power: introduce PM QoS API on CPU wide
  2024-07-09  6:31  4% ` [PATCH v7 0/2] power: introduce PM QoS interface Huisong Li
@ 2024-07-09  6:31  5%   ` Huisong Li
  0 siblings, 0 replies; 200+ results
From: Huisong Li @ 2024-07-09  6:31 UTC (permalink / raw)
  To: dev
  Cc: mb, thomas, ferruh.yigit, anatoly.burakov, david.hunt,
	sivaprasad.tummala, stephen, david.marchand, liuyonglong,
	lihuisong

The deeper the idle state, the lower the power consumption, but the longer
the resume time. Some services are delay-sensitive and expect a low resume
time, for example interrupt packet receiving mode.

The "/sys/devices/system/cpu/cpuX/power/pm_qos_resume_latency_us" sysfs
interface lets userspace set and get the resume latency limit of cpuX.
Each cpuidle governor in Linux selects which idle state to enter based on
this CPU resume latency limit in its idle task.

The per-CPU PM QoS API can be used to control this CPU's idle state
selection. Setting a strict resume latency (zero value) restricts the CPU
to the shallowest idle state, which lowers the delay after sleep.

Signed-off-by: Huisong Li <lihuisong@huawei.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
---
 doc/guides/prog_guide/power_man.rst    |  24 ++++++
 doc/guides/rel_notes/release_24_07.rst |   4 +
 lib/power/meson.build                  |   2 +
 lib/power/rte_power_qos.c              | 114 +++++++++++++++++++++++++
 lib/power/rte_power_qos.h              |  73 ++++++++++++++++
 lib/power/version.map                  |   2 +
 6 files changed, 219 insertions(+)
 create mode 100644 lib/power/rte_power_qos.c
 create mode 100644 lib/power/rte_power_qos.h

diff --git a/doc/guides/prog_guide/power_man.rst b/doc/guides/prog_guide/power_man.rst
index f6674efe2d..faa32b4320 100644
--- a/doc/guides/prog_guide/power_man.rst
+++ b/doc/guides/prog_guide/power_man.rst
@@ -249,6 +249,30 @@ Get Num Pkgs
 Get Num Dies
   Get the number of die's on a given package.
 
+
+PM QoS
+------
+
+The deeper the idle state, the lower the power consumption, but the longer
+the resume time. Some services are delay-sensitive and expect a low resume
+time, for example interrupt packet receiving mode.
+
+The "/sys/devices/system/cpu/cpuX/power/pm_qos_resume_latency_us" sysfs
+interface lets userspace set and get the resume latency limit of cpuX.
+Each cpuidle governor in Linux selects which idle state to enter based on
+this CPU resume latency limit in its idle task.
+
+The per-CPU PM QoS API sets and gets the CPU resume latency through this
+sysfs interface.
+
+The ``rte_power_qos_set_cpu_resume_latency()`` function controls the CPU's
+idle state selection in Linux. Setting a strict resume latency (zero value)
+restricts the CPU to the shallowest idle state, which lowers the delay of
+resuming service after sleeping.
+
+The ``rte_power_qos_get_cpu_resume_latency()`` function returns the resume
+latency of the specified CPU.
+
 References
 ----------
 
diff --git a/doc/guides/rel_notes/release_24_07.rst b/doc/guides/rel_notes/release_24_07.rst
index 1dd842df3a..af6fd82a3c 100644
--- a/doc/guides/rel_notes/release_24_07.rst
+++ b/doc/guides/rel_notes/release_24_07.rst
@@ -155,6 +155,10 @@ New Features
 
   Added an API that allows the user to reclaim the defer queue with RCU.
 
+* **Added per-CPU PM QoS interface.**
+
+  * Added per-CPU PM QoS interface to lower the delay after sleep.
+
 
 Removed Items
 -------------
diff --git a/lib/power/meson.build b/lib/power/meson.build
index b8426589b2..8222e178b0 100644
--- a/lib/power/meson.build
+++ b/lib/power/meson.build
@@ -23,12 +23,14 @@ sources = files(
         'rte_power.c',
         'rte_power_uncore.c',
         'rte_power_pmd_mgmt.c',
+        'rte_power_qos.c',
 )
 headers = files(
         'rte_power.h',
         'rte_power_guest_channel.h',
         'rte_power_pmd_mgmt.h',
         'rte_power_uncore.h',
+        'rte_power_qos.h',
 )
 if cc.has_argument('-Wno-cast-qual')
     cflags += '-Wno-cast-qual'
diff --git a/lib/power/rte_power_qos.c b/lib/power/rte_power_qos.c
new file mode 100644
index 0000000000..375746f832
--- /dev/null
+++ b/lib/power/rte_power_qos.c
@@ -0,0 +1,114 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2024 HiSilicon Limited
+ */
+
+#include <errno.h>
+#include <stdlib.h>
+#include <string.h>
+
+#include <rte_lcore.h>
+#include <rte_log.h>
+
+#include "power_common.h"
+#include "rte_power_qos.h"
+
+#define PM_QOS_SYSFILE_RESUME_LATENCY_US	\
+	"/sys/devices/system/cpu/cpu%u/power/pm_qos_resume_latency_us"
+
+int
+rte_power_qos_set_cpu_resume_latency(uint16_t lcore_id, int latency)
+{
+	char buf[LINE_MAX];
+	FILE *f;
+	int ret;
+
+	if (!rte_lcore_is_enabled(lcore_id)) {
+		POWER_LOG(ERR, "lcore id %u is not enabled", lcore_id);
+		return -EINVAL;
+	}
+
+	if (latency < 0) {
+		POWER_LOG(ERR, "latency should be greater than or equal to 0");
+		return -EINVAL;
+	}
+
+	ret = open_core_sysfs_file(&f, "w", PM_QOS_SYSFILE_RESUME_LATENCY_US, lcore_id);
+	if (ret != 0) {
+		POWER_LOG(ERR, "Failed to open "PM_QOS_SYSFILE_RESUME_LATENCY_US, lcore_id);
+		return ret;
+	}
+
+	/*
+	 * The kernel sysfs file pm_qos_resume_latency_us (see
+	 * @PM_QOS_SYSFILE_RESUME_LATENCY_US) interprets the written string
+	 * as follows:
+	 * 1> "n/a" sets a resume latency of 0.
+	 * 2> "0" sets no resume latency constraint.
+	 * 3> any other number sets that value as the resume latency.
+	 */
+	if (latency == 0)
+		snprintf(buf, sizeof(buf), "%s", "n/a");
+	else if (latency == RTE_POWER_QOS_RESUME_LATENCY_NO_CONSTRAINT)
+		snprintf(buf, sizeof(buf), "%u", 0);
+	else
+		snprintf(buf, sizeof(buf), "%u", latency);
+
+	ret = write_core_sysfs_s(f, buf);
+	if (ret != 0) {
+		POWER_LOG(ERR, "Failed to write "PM_QOS_SYSFILE_RESUME_LATENCY_US, lcore_id);
+		goto out;
+	}
+
+out:
+	if (f != NULL)
+		fclose(f);
+
+	return ret;
+}
+
+int
+rte_power_qos_get_cpu_resume_latency(uint16_t lcore_id)
+{
+	char buf[LINE_MAX];
+	int latency = -1;
+	FILE *f;
+	int ret;
+
+	if (!rte_lcore_is_enabled(lcore_id)) {
+		POWER_LOG(ERR, "lcore id %u is not enabled", lcore_id);
+		return -EINVAL;
+	}
+
+	ret = open_core_sysfs_file(&f, "r", PM_QOS_SYSFILE_RESUME_LATENCY_US, lcore_id);
+	if (ret != 0) {
+		POWER_LOG(ERR, "Failed to open "PM_QOS_SYSFILE_RESUME_LATENCY_US, lcore_id);
+		return ret;
+	}
+
+	ret = read_core_sysfs_s(f, buf, sizeof(buf));
+	if (ret != 0) {
+		POWER_LOG(ERR, "Failed to read "PM_QOS_SYSFILE_RESUME_LATENCY_US, lcore_id);
+		goto out;
+	}
+
+	/*
+	 * The kernel sysfs file pm_qos_resume_latency_us (see
+	 * @PM_QOS_SYSFILE_RESUME_LATENCY_US) reports the resume latency
+	 * as one of the following strings:
+	 * 1> "n/a" means the resume latency is 0.
+	 * 2> "0" means there is no resume latency constraint.
+	 * 3> any other string is the resume latency value in use.
+	 */
+	if (strcmp(buf, "n/a") == 0)
+		latency = 0;
+	else {
+		latency = strtoul(buf, NULL, 10);
+		latency = latency == 0 ? RTE_POWER_QOS_RESUME_LATENCY_NO_CONSTRAINT : latency;
+	}
+
+out:
+	if (f != NULL)
+		fclose(f);
+
+	return latency != -1 ? latency : ret;
+}
diff --git a/lib/power/rte_power_qos.h b/lib/power/rte_power_qos.h
new file mode 100644
index 0000000000..990c488373
--- /dev/null
+++ b/lib/power/rte_power_qos.h
@@ -0,0 +1,73 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2024 HiSilicon Limited
+ */
+
+#ifndef RTE_POWER_QOS_H
+#define RTE_POWER_QOS_H
+
+#include <stdint.h>
+
+#include <rte_compat.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * @file rte_power_qos.h
+ *
+ * PM QoS API.
+ *
+ * The per-CPU resume latency limit constrains this CPU's idle state
+ * selection in each cpuidle governor.
+ * See the kernel documentation of the per-CPU PM QoS interface:
+ * https://www.kernel.org/doc/html/latest/admin-guide/abi-testing.html?highlight=pm_qos_resume_latency_us#abi-sys-devices-power-pm-qos-resume-latency-us
+ *
+ * The deeper the idle state, the lower the power consumption, but the
+ * longer the resume time. Some services are delay-sensitive and expect a
+ * low resume time, for example interrupt packet receiving mode.
+ *
+ * In these cases, the per-CPU PM QoS API can control this CPU's idle state
+ * selection: setting a strict resume latency (zero value) restricts the CPU
+ * to the shallowest idle state and so lowers the delay after sleep.
+ */
+
+#define RTE_POWER_QOS_STRICT_LATENCY_VALUE             0
+#define RTE_POWER_QOS_RESUME_LATENCY_NO_CONSTRAINT    ((int)(UINT32_MAX >> 1))
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * @param lcore_id
+ *   target logical core id
+ *
+ * @param latency
+ *   Resume latency in microseconds; must be greater than or equal to zero.
+ *
+ * @return
+ *   0 on success. Otherwise negative value is returned.
+ */
+__rte_experimental
+int rte_power_qos_set_cpu_resume_latency(uint16_t lcore_id, int latency);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Get the current resume latency of this logical core.
+ * If it has never been set, the kernel default is
+ * @see RTE_POWER_QOS_RESUME_LATENCY_NO_CONSTRAINT.
+ *
+ * @return
+ *   Negative value on failure.
+ *   >= 0 means the actual resume latency limit on this core.
+ */
+__rte_experimental
+int rte_power_qos_get_cpu_resume_latency(uint16_t lcore_id);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* RTE_POWER_QOS_H */
diff --git a/lib/power/version.map b/lib/power/version.map
index ad92a65f91..81b8ff11b7 100644
--- a/lib/power/version.map
+++ b/lib/power/version.map
@@ -51,4 +51,6 @@ EXPERIMENTAL {
 	rte_power_set_uncore_env;
 	rte_power_uncore_freqs;
 	rte_power_unset_uncore_env;
+	rte_power_qos_set_cpu_resume_latency;
+	rte_power_qos_get_cpu_resume_latency;
 };
-- 
2.22.0


^ permalink raw reply	[relevance 5%]

* [PATCH v8 1/2] power: introduce PM QoS API on CPU wide
  2024-07-09  7:25  4% ` [PATCH v8 0/2] power: introduce PM QoS interface Huisong Li
@ 2024-07-09  7:25  5%   ` Huisong Li
  0 siblings, 0 replies; 200+ results
From: Huisong Li @ 2024-07-09  7:25 UTC (permalink / raw)
  To: dev
  Cc: mb, thomas, ferruh.yigit, anatoly.burakov, david.hunt,
	sivaprasad.tummala, stephen, david.marchand, liuyonglong,
	lihuisong

The deeper the idle state, the lower the power consumption, but the longer
the resume time. Some services are delay-sensitive and expect a low resume
time, for example interrupt packet receiving mode.

The "/sys/devices/system/cpu/cpuX/power/pm_qos_resume_latency_us" sysfs
interface lets userspace set and get the resume latency limit of cpuX.
Each cpuidle governor in Linux selects which idle state to enter based on
this CPU resume latency limit in its idle task.

The per-CPU PM QoS API can be used to control this CPU's idle state
selection. Setting a strict resume latency (zero value) restricts the CPU
to the shallowest idle state, which lowers the delay after sleep.

Signed-off-by: Huisong Li <lihuisong@huawei.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
---
 doc/guides/prog_guide/power_man.rst    |  24 ++++++
 doc/guides/rel_notes/release_24_07.rst |   4 +
 lib/power/meson.build                  |   2 +
 lib/power/rte_power_qos.c              | 114 +++++++++++++++++++++++++
 lib/power/rte_power_qos.h              |  73 ++++++++++++++++
 lib/power/version.map                  |   2 +
 6 files changed, 219 insertions(+)
 create mode 100644 lib/power/rte_power_qos.c
 create mode 100644 lib/power/rte_power_qos.h

diff --git a/doc/guides/prog_guide/power_man.rst b/doc/guides/prog_guide/power_man.rst
index f6674efe2d..faa32b4320 100644
--- a/doc/guides/prog_guide/power_man.rst
+++ b/doc/guides/prog_guide/power_man.rst
@@ -249,6 +249,30 @@ Get Num Pkgs
 Get Num Dies
   Get the number of die's on a given package.
 
+
+PM QoS
+------
+
+The deeper the idle state, the lower the power consumption, but the longer
+the resume time. Some services are delay-sensitive and expect a low resume
+time, for example interrupt packet receiving mode.
+
+The "/sys/devices/system/cpu/cpuX/power/pm_qos_resume_latency_us" sysfs
+interface lets userspace set and get the resume latency limit of cpuX.
+Each cpuidle governor in Linux selects which idle state to enter based on
+this CPU resume latency limit in its idle task.
+
+The per-CPU PM QoS API sets and gets the CPU resume latency through this
+sysfs interface.
+
+The ``rte_power_qos_set_cpu_resume_latency()`` function controls the CPU's
+idle state selection in Linux. Setting a strict resume latency (zero value)
+restricts the CPU to the shallowest idle state, which lowers the delay of
+resuming service after sleeping.
+
+The ``rte_power_qos_get_cpu_resume_latency()`` function returns the resume
+latency of the specified CPU.
+
 References
 ----------
 
diff --git a/doc/guides/rel_notes/release_24_07.rst b/doc/guides/rel_notes/release_24_07.rst
index 50ffc1f74a..e771868d9f 100644
--- a/doc/guides/rel_notes/release_24_07.rst
+++ b/doc/guides/rel_notes/release_24_07.rst
@@ -156,6 +156,10 @@ New Features
   * Added defer queue reclamation via RCU.
   * Added SVE support for bulk lookup.
 
+* **Added per-CPU PM QoS interface.**
+
+  * Added per-CPU PM QoS interface to lower the delay after sleep.
+
 
 Removed Items
 -------------
diff --git a/lib/power/meson.build b/lib/power/meson.build
index b8426589b2..8222e178b0 100644
--- a/lib/power/meson.build
+++ b/lib/power/meson.build
@@ -23,12 +23,14 @@ sources = files(
         'rte_power.c',
         'rte_power_uncore.c',
         'rte_power_pmd_mgmt.c',
+        'rte_power_qos.c',
 )
 headers = files(
         'rte_power.h',
         'rte_power_guest_channel.h',
         'rte_power_pmd_mgmt.h',
         'rte_power_uncore.h',
+        'rte_power_qos.h',
 )
 if cc.has_argument('-Wno-cast-qual')
     cflags += '-Wno-cast-qual'
diff --git a/lib/power/rte_power_qos.c b/lib/power/rte_power_qos.c
new file mode 100644
index 0000000000..375746f832
--- /dev/null
+++ b/lib/power/rte_power_qos.c
@@ -0,0 +1,114 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2024 HiSilicon Limited
+ */
+
+#include <errno.h>
+#include <stdlib.h>
+#include <string.h>
+
+#include <rte_lcore.h>
+#include <rte_log.h>
+
+#include "power_common.h"
+#include "rte_power_qos.h"
+
+#define PM_QOS_SYSFILE_RESUME_LATENCY_US	\
+	"/sys/devices/system/cpu/cpu%u/power/pm_qos_resume_latency_us"
+
+int
+rte_power_qos_set_cpu_resume_latency(uint16_t lcore_id, int latency)
+{
+	char buf[LINE_MAX];
+	FILE *f;
+	int ret;
+
+	if (!rte_lcore_is_enabled(lcore_id)) {
+		POWER_LOG(ERR, "lcore id %u is not enabled", lcore_id);
+		return -EINVAL;
+	}
+
+	if (latency < 0) {
+		POWER_LOG(ERR, "latency should be greater than or equal to 0");
+		return -EINVAL;
+	}
+
+	ret = open_core_sysfs_file(&f, "w", PM_QOS_SYSFILE_RESUME_LATENCY_US, lcore_id);
+	if (ret != 0) {
+		POWER_LOG(ERR, "Failed to open "PM_QOS_SYSFILE_RESUME_LATENCY_US, lcore_id);
+		return ret;
+	}
+
+	/*
+	 * The kernel sysfs file pm_qos_resume_latency_us (see
+	 * @PM_QOS_SYSFILE_RESUME_LATENCY_US) interprets the written string
+	 * as follows:
+	 * 1> "n/a" sets a resume latency of 0.
+	 * 2> "0" sets no resume latency constraint.
+	 * 3> any other number sets that value as the resume latency.
+	 */
+	if (latency == 0)
+		snprintf(buf, sizeof(buf), "%s", "n/a");
+	else if (latency == RTE_POWER_QOS_RESUME_LATENCY_NO_CONSTRAINT)
+		snprintf(buf, sizeof(buf), "%u", 0);
+	else
+		snprintf(buf, sizeof(buf), "%u", latency);
+
+	ret = write_core_sysfs_s(f, buf);
+	if (ret != 0) {
+		POWER_LOG(ERR, "Failed to write "PM_QOS_SYSFILE_RESUME_LATENCY_US, lcore_id);
+		goto out;
+	}
+
+out:
+	if (f != NULL)
+		fclose(f);
+
+	return ret;
+}
+
+int
+rte_power_qos_get_cpu_resume_latency(uint16_t lcore_id)
+{
+	char buf[LINE_MAX];
+	int latency = -1;
+	FILE *f;
+	int ret;
+
+	if (!rte_lcore_is_enabled(lcore_id)) {
+		POWER_LOG(ERR, "lcore id %u is not enabled", lcore_id);
+		return -EINVAL;
+	}
+
+	ret = open_core_sysfs_file(&f, "r", PM_QOS_SYSFILE_RESUME_LATENCY_US, lcore_id);
+	if (ret != 0) {
+		POWER_LOG(ERR, "Failed to open "PM_QOS_SYSFILE_RESUME_LATENCY_US, lcore_id);
+		return ret;
+	}
+
+	ret = read_core_sysfs_s(f, buf, sizeof(buf));
+	if (ret != 0) {
+		POWER_LOG(ERR, "Failed to read "PM_QOS_SYSFILE_RESUME_LATENCY_US, lcore_id);
+		goto out;
+	}
+
+	/*
+	 * The kernel sysfs file pm_qos_resume_latency_us (see
+	 * @PM_QOS_SYSFILE_RESUME_LATENCY_US) reports the resume latency
+	 * as one of the following strings:
+	 * 1> "n/a" means the resume latency is 0.
+	 * 2> "0" means there is no resume latency constraint.
+	 * 3> any other string is the resume latency value in use.
+	 */
+	if (strcmp(buf, "n/a") == 0)
+		latency = 0;
+	else {
+		latency = strtoul(buf, NULL, 10);
+		latency = latency == 0 ? RTE_POWER_QOS_RESUME_LATENCY_NO_CONSTRAINT : latency;
+	}
+
+out:
+	if (f != NULL)
+		fclose(f);
+
+	return latency != -1 ? latency : ret;
+}
diff --git a/lib/power/rte_power_qos.h b/lib/power/rte_power_qos.h
new file mode 100644
index 0000000000..990c488373
--- /dev/null
+++ b/lib/power/rte_power_qos.h
@@ -0,0 +1,73 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2024 HiSilicon Limited
+ */
+
+#ifndef RTE_POWER_QOS_H
+#define RTE_POWER_QOS_H
+
+#include <stdint.h>
+
+#include <rte_compat.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * @file rte_power_qos.h
+ *
+ * PM QoS API.
+ *
+ * The per-CPU resume latency limit constrains this CPU's idle state
+ * selection in each cpuidle governor.
+ * See the kernel documentation of the per-CPU PM QoS interface:
+ * https://www.kernel.org/doc/html/latest/admin-guide/abi-testing.html?highlight=pm_qos_resume_latency_us#abi-sys-devices-power-pm-qos-resume-latency-us
+ *
+ * The deeper the idle state, the lower the power consumption, but the
+ * longer the resume time. Some services are delay-sensitive and expect a
+ * low resume time, for example interrupt packet receiving mode.
+ *
+ * In these cases, the per-CPU PM QoS API can control this CPU's idle state
+ * selection: setting a strict resume latency (zero value) restricts the CPU
+ * to the shallowest idle state and so lowers the delay after sleep.
+ */
+
+#define RTE_POWER_QOS_STRICT_LATENCY_VALUE             0
+#define RTE_POWER_QOS_RESUME_LATENCY_NO_CONSTRAINT    ((int)(UINT32_MAX >> 1))
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * @param lcore_id
+ *   target logical core id
+ *
+ * @param latency
+ *   Resume latency in microseconds; must be greater than or equal to zero.
+ *
+ * @return
+ *   0 on success. Otherwise negative value is returned.
+ */
+__rte_experimental
+int rte_power_qos_set_cpu_resume_latency(uint16_t lcore_id, int latency);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Get the current resume latency of this logical core.
+ * If it has never been set, the kernel default is
+ * @see RTE_POWER_QOS_RESUME_LATENCY_NO_CONSTRAINT.
+ *
+ * @return
+ *   Negative value on failure.
+ *   >= 0 means the actual resume latency limit on this core.
+ */
+__rte_experimental
+int rte_power_qos_get_cpu_resume_latency(uint16_t lcore_id);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* RTE_POWER_QOS_H */
diff --git a/lib/power/version.map b/lib/power/version.map
index ad92a65f91..81b8ff11b7 100644
--- a/lib/power/version.map
+++ b/lib/power/version.map
@@ -51,4 +51,6 @@ EXPERIMENTAL {
 	rte_power_set_uncore_env;
 	rte_power_uncore_freqs;
 	rte_power_unset_uncore_env;
+	rte_power_qos_set_cpu_resume_latency;
+	rte_power_qos_get_cpu_resume_latency;
 };
-- 
2.22.0


^ permalink raw reply	[relevance 5%]

* [PATCH v8 0/2] power: introduce PM QoS interface
                     ` (5 preceding siblings ...)
  2024-07-09  6:31  4% ` [PATCH v7 0/2] power: introduce PM QoS interface Huisong Li
@ 2024-07-09  7:25  4% ` Huisong Li
  2024-07-09  7:25  5%   ` [PATCH v8 1/2] power: introduce PM QoS API on CPU wide Huisong Li
  2024-08-09  9:50  4% ` [PATCH v9 0/2] power: introduce PM QoS interface Huisong Li
  7 siblings, 1 reply; 200+ results
From: Huisong Li @ 2024-07-09  7:25 UTC (permalink / raw)
  To: dev
  Cc: mb, thomas, ferruh.yigit, anatoly.burakov, david.hunt,
	sivaprasad.tummala, stephen, david.marchand, liuyonglong,
	lihuisong

The deeper the idle state, the lower the power consumption, but the longer
the resume time. Some services are delay sensitive and expect a very low
resume time, for example interrupt packet receiving mode.

The "/sys/devices/system/cpu/cpuX/power/pm_qos_resume_latency_us" sysfs
interface lets userspace set and get the resume latency limit on cpuX.
Please see the description in the kernel document[1].
Each cpuidle governor in Linux selects which idle state to enter based on
this CPU resume latency in its idle task.

The per-CPU PM QoS API can be used to control this CPU's idle state
selection and restrict it to the shallowest idle state, lowering the delay
after sleep, by setting a strict resume latency (zero value).

[1] https://www.kernel.org/doc/html/latest/admin-guide/abi-testing.html?highlight=pm_qos_resume_latency_us#abi-sys-devices-power-pm-qos-resume-latency-us
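
For illustration only, here is a minimal sketch of driving that sysfs file
directly from C. It is not the code in patch 1/2. The helper names are
hypothetical, and the value mapping ("n/a" for a strict zero latency, "0" for
no constraint) is my reading of the kernel document[1] and should be checked
there before relying on it. The two macro values are copied from the header in
patch 1/2.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Values copied from rte_power_qos.h in patch 1/2. */
#define RTE_POWER_QOS_STRICT_LATENCY_VALUE          0
#define RTE_POWER_QOS_RESUME_LATENCY_NO_CONSTRAINT  ((int)(UINT32_MAX >> 1))

/* Hypothetical helper: build the per-CPU sysfs path named in the cover letter. */
int
qos_sysfs_path(char *buf, size_t len, unsigned int cpu)
{
	int n = snprintf(buf, len,
		"/sys/devices/system/cpu/cpu%u/power/pm_qos_resume_latency_us", cpu);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Hypothetical helper: translate an API latency value into the string to
 * write. Assumption (kernel ABI document): "n/a" means no resume latency is
 * acceptable, "0" means no constraint, anything else is a limit in us.
 */
int
qos_latency_to_str(char *buf, size_t len, int latency)
{
	int n;

	if (latency < 0)
		return -1;
	if (latency == RTE_POWER_QOS_STRICT_LATENCY_VALUE)
		n = snprintf(buf, len, "n/a");
	else if (latency == RTE_POWER_QOS_RESUME_LATENCY_NO_CONSTRAINT)
		n = snprintf(buf, len, "0");
	else
		n = snprintf(buf, len, "%d", latency);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write the limit for one CPU; returns 0 on success, -1 on any failure. */
int
qos_set_resume_latency(unsigned int cpu, int latency)
{
	char path[128], val[16];
	FILE *f;
	int ret;

	if (qos_sysfs_path(path, sizeof(path), cpu) != 0 ||
	    qos_latency_to_str(val, sizeof(val), latency) != 0)
		return -1;
	f = fopen(path, "w");
	if (f == NULL)
		return -1;
	ret = (fputs(val, f) == EOF) ? -1 : 0;
	if (fclose(f) != 0)
		ret = -1;
	return ret;
}
```

Writing the file normally needs root privileges, so qos_set_resume_latency()
is expected to fail with -1 for an unprivileged process.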

---
 v8:
  - update the latest code to resolve CI warning
 v7:
  - remove a dead code rte_lcore_is_enabled in patch[2/2]
 v6:
  - update release_24_07.rst based on dpdk repo to resolve CI warning.
 v5:
  - use LINE_MAX to replace BUFSIZ, and use snprintf to replace sprintf.
 v4:
  - fix some comments based on Stephen's feedback
  - add stdint.h include
  - add Acked-by Morten Brørup <mb@smartsharesystems.com>
 v3:
  - add RTE_POWER_xxx prefix for some macro in header
  - add the check for lcore_id with rte_lcore_is_enabled
 v2:
  - use PM QoS on CPU wide to replace the one on system wide

Huisong Li (2):
  power: introduce PM QoS API on CPU wide
  examples/l3fwd-power: add PM QoS configuration

 doc/guides/prog_guide/power_man.rst    |  24 ++++++
 doc/guides/rel_notes/release_24_07.rst |   4 +
 examples/l3fwd-power/main.c            |  24 ++++++
 lib/power/meson.build                  |   2 +
 lib/power/rte_power_qos.c              | 114 +++++++++++++++++++++++++
 lib/power/rte_power_qos.h              |  73 ++++++++++++++++
 lib/power/version.map                  |   2 +
 7 files changed, 243 insertions(+)
 create mode 100644 lib/power/rte_power_qos.c
 create mode 100644 lib/power/rte_power_qos.h

-- 
2.22.0



* Re: [PATCH v4] ethdev: Add link_speed lanes support
  @ 2024-07-09 11:10  4%   ` Ferruh Yigit
  2024-07-09 21:20  0%     ` Damodharam Ammepalli
  0 siblings, 1 reply; 200+ results
From: Ferruh Yigit @ 2024-07-09 11:10 UTC (permalink / raw)
  To: Damodharam Ammepalli
  Cc: ajit.khaparde, dev, huangdengdui, kalesh-anakkur.purayil

On 7/9/2024 12:22 AM, Damodharam Ammepalli wrote:
> Update the eth_dev_ops structure with new function vectors
> to get, get capabilities and set ethernet link speed lanes.
> Update the testpmd to provide required config and information
> display infrastructure.
> 
> The supporting ethernet controller driver will register callbacks
> to avail link speed lanes config and get services. This lanes
> configuration is applicable only when the nic is forced to fixed
> speeds. In Autonegotiation mode, the hardware automatically
> negotiates the number of lanes.
> 
> These are the new commands.
> 
> testpmd> show port 0 speed_lanes capabilities
> 
>  Supported speeds         Valid lanes
> -----------------------------------
>  10 Gbps                  1
>  25 Gbps                  1
>  40 Gbps                  4
>  50 Gbps                  1 2
>  100 Gbps                 1 2 4
>  200 Gbps                 2 4
>  400 Gbps                 4 8
> testpmd>
> 
> testpmd>
> testpmd> port stop 0
> testpmd> port config 0 speed_lanes 4
> testpmd> port config 0 speed 200000 duplex full
>

Is there a requirement to set speed before speed_lane?
I expect the driver to verify whether a speed_lane value is valid for a
specific speed value. In the above usage, the driver will verify against
the existing speed, whatever it is, so changing the speed later may leave
an invalid speed_lane configuration.
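
To make the concern concrete, here is a minimal sketch of a per-speed lane
check. The struct and function names are hypothetical and this is not how any
driver implements it. The table values are copied from the sample
"speed_lanes capabilities" output quoted above, with speeds in Mbps as in the
"port config 0 speed 200000" command.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical capability entry: speed in Mbps and the lane counts allowed. */
struct lane_capa {
	uint32_t speed_mbps;
	uint32_t lanes[4];  /* unused trailing slots are zero */
};

/* Values taken from the sample "speed_lanes capabilities" output above. */
static const struct lane_capa capa_table[] = {
	{  10000, {1} },
	{  25000, {1} },
	{  40000, {4} },
	{  50000, {1, 2} },
	{ 100000, {1, 2, 4} },
	{ 200000, {2, 4} },
	{ 400000, {4, 8} },
};

/* Return 1 if the lane count is valid for the speed, 0 otherwise. */
int
lanes_valid_for_speed(uint32_t speed_mbps, uint32_t lanes)
{
	size_t i, j;

	for (i = 0; i < sizeof(capa_table) / sizeof(capa_table[0]); i++) {
		if (capa_table[i].speed_mbps != speed_mbps)
			continue;
		for (j = 0; j < 4 && capa_table[i].lanes[j] != 0; j++) {
			if (capa_table[i].lanes[j] == lanes)
				return 1;
		}
		return 0; /* speed known, lane count not listed */
	}
	return 0; /* unknown speed */
}
```

With this table, 4 lanes pass the check at 200 Gbps but fail at 50 Gbps, so a
lane value accepted against the old speed can become invalid once the speed is
changed afterwards.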


> testpmd> port start 0
> testpmd>
> testpmd> show port info 0
> 
> ********************* Infos for port 0  *********************
> MAC address: 14:23:F2:C3:BA:D2
> Device name: 0000:b1:00.0
> Driver name: net_bnxt
> Firmware-version: 228.9.115.0
> Connect to socket: 2
> memory allocation on the socket: 2
> Link status: up
> Link speed: 200 Gbps
> Active Lanes: 4
> Link duplex: full-duplex
> Autoneg status: Off
> 
> Signed-off-by: Damodharam Ammepalli <damodharam.ammepalli@broadcom.com>
> ---
> v2->v3 Consolidating the testpmd and rtelib patches into a single patch
> as requested.
> v3->v4 Addressed comments and fix help string and documentation.
> 
>  app/test-pmd/cmdline.c     | 230 +++++++++++++++++++++++++++++++++++++
>  app/test-pmd/config.c      |  69 ++++++++++-
>  app/test-pmd/testpmd.h     |   4 +
>  lib/ethdev/ethdev_driver.h |  77 +++++++++++++
>  lib/ethdev/rte_ethdev.c    |  51 ++++++++
>  lib/ethdev/rte_ethdev.h    |  92 +++++++++++++++
>  lib/ethdev/version.map     |   5 +
>  7 files changed, 526 insertions(+), 2 deletions(-)
> 
> diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
> index b7759e38a8..a507df31d8 100644
> --- a/app/test-pmd/cmdline.c
> +++ b/app/test-pmd/cmdline.c
> @@ -284,6 +284,9 @@ static void cmd_help_long_parsed(void *parsed_result,
>  
>  			"dump_log_types\n"
>  			"    Dumps the log level for all the dpdk modules\n\n"
> +
> +			"show port (port_id) speed_lanes capabilities"
> +			"	Show speed lanes capabilities of a port.\n\n"
>  		);
>  	}
>  
> @@ -823,6 +826,9 @@ static void cmd_help_long_parsed(void *parsed_result,
>  			"port config (port_id) txq (queue_id) affinity (value)\n"
>  			"    Map a Tx queue with an aggregated port "
>  			"of the DPDK port\n\n"
> +
> +			"port config (port_id|all) speed_lanes (0|1|4|8)\n"
> +			"    Set number of lanes for all ports or port_id for a forced speed\n\n"
>  		);
>  	}
>  
> @@ -1560,6 +1566,110 @@ static cmdline_parse_inst_t cmd_config_speed_specific = {
>  	},
>  };
>  
> +static int
> +parse_speed_lanes_cfg(portid_t pid, uint32_t lanes)
> +{
> +	int ret;
> +	uint32_t lanes_capa;
> +
> +	ret = parse_speed_lanes(lanes, &lanes_capa);
> +	if (ret < 0) {
> +		fprintf(stderr, "Unknown speed lane value: %d for port %d\n", lanes, pid);
> +		return -1;
> +	}
> +
> +	ret = rte_eth_speed_lanes_set(pid, lanes_capa);
> +	if (ret == -ENOTSUP) {
> +		fprintf(stderr, "Function not implemented\n");
> +		return -1;
> +	} else if (ret < 0) {
> +		fprintf(stderr, "Set speed lanes failed\n");
> +		return -1;
> +	}
> +
> +	return 0;
> +}
> +
> +/* *** display speed lanes per port capabilities *** */
> +struct cmd_show_speed_lanes_result {
> +	cmdline_fixed_string_t cmd_show;
> +	cmdline_fixed_string_t cmd_port;
> +	cmdline_fixed_string_t cmd_keyword;
> +	portid_t cmd_pid;
> +};
> +
> +static void
> +cmd_show_speed_lanes_parsed(void *parsed_result,
> +			    __rte_unused struct cmdline *cl,
> +			    __rte_unused void *data)
> +{
> +	struct cmd_show_speed_lanes_result *res = parsed_result;
> +	struct rte_eth_speed_lanes_capa *speed_lanes_capa;
> +	unsigned int num;
> +	int ret;
> +
> +	if (!rte_eth_dev_is_valid_port(res->cmd_pid)) {
> +		fprintf(stderr, "Invalid port id %u\n", res->cmd_pid);
> +		return;
> +	}
> +
> +	ret = rte_eth_speed_lanes_get_capability(res->cmd_pid, NULL, 0);
> +	if (ret == -ENOTSUP) {
> +		fprintf(stderr, "Function not implemented\n");
> +		return;
> +	} else if (ret < 0) {
> +		fprintf(stderr, "Get speed lanes capability failed: %d\n", ret);
> +		return;
> +	}
> +
> +	num = (unsigned int)ret;
> +	speed_lanes_capa = calloc(num, sizeof(*speed_lanes_capa));
> +	if (speed_lanes_capa == NULL) {
> +		fprintf(stderr, "Failed to alloc speed lanes capability buffer\n");
> +		return;
> +	}
> +
> +	ret = rte_eth_speed_lanes_get_capability(res->cmd_pid, speed_lanes_capa, num);
> +	if (ret < 0) {
> +		fprintf(stderr, "Error getting speed lanes capability: %d\n", ret);
> +		goto out;
> +	}
> +
> +	show_speed_lanes_capability(num, speed_lanes_capa);
> +out:
> +	free(speed_lanes_capa);
> +}
> +
> +static cmdline_parse_token_string_t cmd_show_speed_lanes_show =
> +	TOKEN_STRING_INITIALIZER(struct cmd_show_speed_lanes_result,
> +				 cmd_show, "show");
> +static cmdline_parse_token_string_t cmd_show_speed_lanes_port =
> +	TOKEN_STRING_INITIALIZER(struct cmd_show_speed_lanes_result,
> +				 cmd_port, "port");
> +static cmdline_parse_token_num_t cmd_show_speed_lanes_pid =
> +	TOKEN_NUM_INITIALIZER(struct cmd_show_speed_lanes_result,
> +			      cmd_pid, RTE_UINT16);
> +static cmdline_parse_token_string_t cmd_show_speed_lanes_keyword =
> +	TOKEN_STRING_INITIALIZER(struct cmd_show_speed_lanes_result,
> +				 cmd_keyword, "speed_lanes");
> +static cmdline_parse_token_string_t cmd_show_speed_lanes_cap_keyword =
> +	TOKEN_STRING_INITIALIZER(struct cmd_show_speed_lanes_result,
> +				 cmd_keyword, "capabilities");
> +
> +static cmdline_parse_inst_t cmd_show_speed_lanes = {
> +	.f = cmd_show_speed_lanes_parsed,
> +	.data = NULL,
> +	.help_str = "show port <port_id> speed_lanes capabilities",
> +	.tokens = {
> +		(void *)&cmd_show_speed_lanes_show,
> +		(void *)&cmd_show_speed_lanes_port,
> +		(void *)&cmd_show_speed_lanes_pid,
> +		(void *)&cmd_show_speed_lanes_keyword,
> +		(void *)&cmd_show_speed_lanes_cap_keyword,
> +		NULL,
> +	},
> +};
> +
>  /* *** configure loopback for all ports *** */
>  struct cmd_config_loopback_all {
>  	cmdline_fixed_string_t port;
> @@ -1676,6 +1786,123 @@ static cmdline_parse_inst_t cmd_config_loopback_specific = {
>  	},
>  };
>  
> +/* *** configure speed_lanes for all ports *** */
> +struct cmd_config_speed_lanes_all {
> +	cmdline_fixed_string_t port;
> +	cmdline_fixed_string_t keyword;
> +	cmdline_fixed_string_t all;
> +	cmdline_fixed_string_t item;
> +	uint32_t lanes;
> +};
> +
> +static void
> +cmd_config_speed_lanes_all_parsed(void *parsed_result,
> +				  __rte_unused struct cmdline *cl,
> +				  __rte_unused void *data)
> +{
> +	struct cmd_config_speed_lanes_all *res = parsed_result;
> +	portid_t pid;
> +
> +	if (!all_ports_stopped()) {
> +		fprintf(stderr, "Please stop all ports first\n");
> +		return;
> +	}
> +
> +	RTE_ETH_FOREACH_DEV(pid) {
> +		if (parse_speed_lanes_cfg(pid, res->lanes))
> +			return;
> +	}
> +
> +	cmd_reconfig_device_queue(RTE_PORT_ALL, 1, 1);
> +}
> +
> +static cmdline_parse_token_string_t cmd_config_speed_lanes_all_port =
> +	TOKEN_STRING_INITIALIZER(struct cmd_config_speed_lanes_all, port, "port");
> +static cmdline_parse_token_string_t cmd_config_speed_lanes_all_keyword =
> +	TOKEN_STRING_INITIALIZER(struct cmd_config_speed_lanes_all, keyword,
> +				 "config");
> +static cmdline_parse_token_string_t cmd_config_speed_lanes_all_all =
> +	TOKEN_STRING_INITIALIZER(struct cmd_config_speed_lanes_all, all, "all");
> +static cmdline_parse_token_string_t cmd_config_speed_lanes_all_item =
> +	TOKEN_STRING_INITIALIZER(struct cmd_config_speed_lanes_all, item,
> +				 "speed_lanes");
> +static cmdline_parse_token_num_t cmd_config_speed_lanes_all_lanes =
> +	TOKEN_NUM_INITIALIZER(struct cmd_config_speed_lanes_all, lanes, RTE_UINT32);
> +
> +static cmdline_parse_inst_t cmd_config_speed_lanes_all = {
> +	.f = cmd_config_speed_lanes_all_parsed,
> +	.data = NULL,
> +	.help_str = "port config all speed_lanes <value>",
> +	.tokens = {
> +		(void *)&cmd_config_speed_lanes_all_port,
> +		(void *)&cmd_config_speed_lanes_all_keyword,
> +		(void *)&cmd_config_speed_lanes_all_all,
> +		(void *)&cmd_config_speed_lanes_all_item,
> +		(void *)&cmd_config_speed_lanes_all_lanes,
> +		NULL,
> +	},
> +};
> +
> +/* *** configure speed_lanes for specific port *** */
> +struct cmd_config_speed_lanes_specific {
> +	cmdline_fixed_string_t port;
> +	cmdline_fixed_string_t keyword;
> +	uint16_t port_id;
> +	cmdline_fixed_string_t item;
> +	uint32_t lanes;
> +};
> +
> +static void
> +cmd_config_speed_lanes_specific_parsed(void *parsed_result,
> +				       __rte_unused struct cmdline *cl,
> +				       __rte_unused void *data)
> +{
> +	struct cmd_config_speed_lanes_specific *res = parsed_result;
> +
> +	if (port_id_is_invalid(res->port_id, ENABLED_WARN))
> +		return;
> +
> +	if (!port_is_stopped(res->port_id)) {
> +		fprintf(stderr, "Please stop port %u first\n", res->port_id);
> +		return;
> +	}
>

There is a requirement here that the port must be stopped before
calling rte_eth_speed_lanes_set().
Is this requirement documented in the API documentation?


> +
> +	if (parse_speed_lanes_cfg(res->port_id, res->lanes))
> +		return;
> +
> +	cmd_reconfig_device_queue(res->port_id, 1, 1);
> +}
> +
> +static cmdline_parse_token_string_t cmd_config_speed_lanes_specific_port =
> +	TOKEN_STRING_INITIALIZER(struct cmd_config_speed_lanes_specific, port,
> +				 "port");
> +static cmdline_parse_token_string_t cmd_config_speed_lanes_specific_keyword =
> +	TOKEN_STRING_INITIALIZER(struct cmd_config_speed_lanes_specific, keyword,
> +				 "config");
> +static cmdline_parse_token_num_t cmd_config_speed_lanes_specific_id =
> +	TOKEN_NUM_INITIALIZER(struct cmd_config_speed_lanes_specific, port_id,
> +			      RTE_UINT16);
> +static cmdline_parse_token_string_t cmd_config_speed_lanes_specific_item =
> +	TOKEN_STRING_INITIALIZER(struct cmd_config_speed_lanes_specific, item,
> +				 "speed_lanes");
> +static cmdline_parse_token_num_t cmd_config_speed_lanes_specific_lanes =
> +	TOKEN_NUM_INITIALIZER(struct cmd_config_speed_lanes_specific, lanes,
> +			      RTE_UINT32);
> +
> +static cmdline_parse_inst_t cmd_config_speed_lanes_specific = {
> +	.f = cmd_config_speed_lanes_specific_parsed,
> +	.data = NULL,
> +	.help_str = "port config <port_id> speed_lanes <value>",
> +	.tokens = {
> +		(void *)&cmd_config_speed_lanes_specific_port,
> +		(void *)&cmd_config_speed_lanes_specific_keyword,
> +		(void *)&cmd_config_speed_lanes_specific_id,
> +		(void *)&cmd_config_speed_lanes_specific_item,
> +		(void *)&cmd_config_speed_lanes_specific_lanes,
> +		NULL,
> +	},
> +};
> +
>  /* *** configure txq/rxq, txd/rxd *** */
>  struct cmd_config_rx_tx {
>  	cmdline_fixed_string_t port;
> @@ -13238,6 +13465,9 @@ static cmdline_parse_ctx_t builtin_ctx[] = {
>  	(cmdline_parse_inst_t *)&cmd_set_port_setup_on,
>  	(cmdline_parse_inst_t *)&cmd_config_speed_all,
>  	(cmdline_parse_inst_t *)&cmd_config_speed_specific,
> +	(cmdline_parse_inst_t *)&cmd_config_speed_lanes_all,
> +	(cmdline_parse_inst_t *)&cmd_config_speed_lanes_specific,
> +	(cmdline_parse_inst_t *)&cmd_show_speed_lanes,
>  	(cmdline_parse_inst_t *)&cmd_config_loopback_all,
>  	(cmdline_parse_inst_t *)&cmd_config_loopback_specific,
>  	(cmdline_parse_inst_t *)&cmd_config_rx_tx,
> diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
> index 66c3a68c1d..498a7db467 100644
> --- a/app/test-pmd/config.c
> +++ b/app/test-pmd/config.c
> @@ -207,6 +207,32 @@ static const struct {
>  	{"gtpu", RTE_ETH_FLOW_GTPU},
>  };
>  
> +static const struct {
> +	enum rte_eth_speed_lanes lane;
> +	const uint32_t value;
> +} speed_lane_name[] = {
> +	{
> +		.lane = RTE_ETH_SPEED_LANE_UNKNOWN,
> +		.value = 0,
> +	},
> +	{
> +		.lane = RTE_ETH_SPEED_LANE_1,
> +		.value = 1,
> +	},
> +	{
> +		.lane = RTE_ETH_SPEED_LANE_2,
> +		.value = 2,
> +	},
> +	{
> +		.lane = RTE_ETH_SPEED_LANE_4,
> +		.value = 4,
> +	},
> +	{
> +		.lane = RTE_ETH_SPEED_LANE_8,
> +		.value = 8,
> +	},
> +};
> +
>  static void
>  print_ethaddr(const char *name, struct rte_ether_addr *eth_addr)
>  {
> @@ -786,6 +812,7 @@ port_infos_display(portid_t port_id)
>  	char name[RTE_ETH_NAME_MAX_LEN];
>  	int ret;
>  	char fw_version[ETHDEV_FWVERS_LEN];
> +	uint32_t lanes;
>  
>  	if (port_id_is_invalid(port_id, ENABLED_WARN)) {
>  		print_valid_ports();
> @@ -828,6 +855,12 @@ port_infos_display(portid_t port_id)
>  
>  	printf("\nLink status: %s\n", (link.link_status) ? ("up") : ("down"));
>  	printf("Link speed: %s\n", rte_eth_link_speed_to_str(link.link_speed));
> +	if (rte_eth_speed_lanes_get(port_id, &lanes) == 0) {
> +		if (lanes > 0)
> +			printf("Active Lanes: %d\n", lanes);
> +		else
> +			printf("Active Lanes: %s\n", "Unknown");
>

What can the 'else' case be?
As 'lanes' is unsigned, the only option is zero. Is the API allowed to
return zero as the lane number?


> +	}
>  	printf("Link duplex: %s\n", (link.link_duplex == RTE_ETH_LINK_FULL_DUPLEX) ?
>  	       ("full-duplex") : ("half-duplex"));
>  	printf("Autoneg status: %s\n", (link.link_autoneg == RTE_ETH_LINK_AUTONEG) ?
> @@ -962,7 +995,7 @@ port_summary_header_display(void)
>  
>  	port_number = rte_eth_dev_count_avail();
>  	printf("Number of available ports: %i\n", port_number);
> -	printf("%-4s %-17s %-12s %-14s %-8s %s\n", "Port", "MAC Address", "Name",
> +	printf("%-4s %-17s %-12s %-14s %-8s %-8s\n", "Port", "MAC Address", "Name",
>  			"Driver", "Status", "Link");
>  }
>  
> @@ -993,7 +1026,7 @@ port_summary_display(portid_t port_id)
>  	if (ret != 0)
>  		return;
>  
> -	printf("%-4d " RTE_ETHER_ADDR_PRT_FMT " %-12s %-14s %-8s %s\n",
> +	printf("%-4d " RTE_ETHER_ADDR_PRT_FMT " %-12s %-14s %-8s %-8s\n",
>

The summary updates are unrelated to this patch, can you please drop them?


>  		port_id, RTE_ETHER_ADDR_BYTES(&mac_addr), name,
>  		dev_info.driver_name, (link.link_status) ? ("up") : ("down"),
>  		rte_eth_link_speed_to_str(link.link_speed));
> @@ -7244,3 +7277,35 @@ show_mcast_macs(portid_t port_id)
>  		printf("  %s\n", buf);
>  	}
>  }
> +
> +int
> +parse_speed_lanes(uint32_t lane, uint32_t *speed_lane)
> +{
> +	uint8_t i;
> +
> +	for (i = 0; i < RTE_DIM(speed_lane_name); i++) {
> +		if (speed_lane_name[i].value == lane) {
> +			*speed_lane = lane;
>

This converts from 8 -> 8, 4 -> 4 ....

Why not completely eliminate this function? See below.

> +			return 0;
> +		}
> +	}
> +	return -1;
> +}
> +
> +void
> +show_speed_lanes_capability(unsigned int num, struct rte_eth_speed_lanes_capa *speed_lanes_capa)
> +{
> +	unsigned int i, j;
> +
> +	printf("\n%-15s %-10s", "Supported-speeds", "Valid-lanes");
> +	printf("\n-----------------------------------\n");
> +	for (i = 0; i < num; i++) {
> +		printf("%-17s ", rte_eth_link_speed_to_str(speed_lanes_capa[i].speed));
> +
> +		for (j = 0; j < RTE_ETH_SPEED_LANE_MAX; j++) {
> +			if (RTE_ETH_SPEED_LANES_TO_CAPA(j) & speed_lanes_capa[i].capa)
> +				printf("%-2d ", speed_lane_name[j].value);
> +		}

To eliminate both RTE_ETH_SPEED_LANE_MAX & speed_lane_name, what do you
think about:

capa = speed_lanes_capa[i].capa;
int s = 0;
while (capa) {
    if (capa & 0x1)
        printf("%-2d ", 1 << s);
    s++;
    capa = capa >> 1;
}
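
A self-contained version of the same loop, for anyone who wants to try it
outside testpmd. This is only a sketch: format_lanes() is a hypothetical name,
it writes into a buffer instead of calling printf() so it can be checked, and
it assumes bit s of capa means 2^s lanes, which is not the bit layout the
current patch uses.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Assumption: bit s of capa set means 2^s lanes are supported
 * (bit 0 -> 1 lane, bit 1 -> 2 lanes, bit 2 -> 4 lanes, ...).
 * Appends each lane count to buf using "%-2u " and returns how many
 * lane counts were written.
 */
int
format_lanes(uint32_t capa, char *buf, size_t len)
{
	int s = 0, count = 0;
	size_t off = 0;

	if (len == 0)
		return 0;
	buf[0] = '\0';
	while (capa) {
		if (capa & 0x1) {
			int n = snprintf(buf + off, len - off, "%-2u ", 1u << s);

			if (n < 0 || (size_t)n >= len - off) {
				buf[off] = '\0'; /* drop the partial entry */
				break;
			}
			off += (size_t)n;
			count++;
		}
		s++;
		capa >>= 1;
	}
	return count;
}
```

No lookup table and no MAX value are needed, since the loop stops as soon as
no bits are left in capa.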

> +		printf("\n");
> +	}
> +}
> diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
> index 9facd7f281..fb9ef05cc5 100644
> --- a/app/test-pmd/testpmd.h
> +++ b/app/test-pmd/testpmd.h
> @@ -1253,6 +1253,10 @@ extern int flow_parse(const char *src, void *result, unsigned int size,
>  		      struct rte_flow_item **pattern,
>  		      struct rte_flow_action **actions);
>  
> +void show_speed_lanes_capability(uint32_t num,
> +				 struct rte_eth_speed_lanes_capa *speed_lanes_capa);
> +int parse_speed_lanes(uint32_t lane, uint32_t *speed_lane);
> +
>

These functions are only called in 'test-pmd/cmdline.c', what do you think
about moving them to that file and making them static?


>  uint64_t str_to_rsstypes(const char *str);
>  const char *rsstypes_to_str(uint64_t rss_type);
>  
> diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
> index 883e59a927..0f10aec3a1 100644
> --- a/lib/ethdev/ethdev_driver.h
> +++ b/lib/ethdev/ethdev_driver.h
> @@ -1179,6 +1179,79 @@ typedef int (*eth_rx_descriptor_dump_t)(const struct rte_eth_dev *dev,
>  					uint16_t queue_id, uint16_t offset,
>  					uint16_t num, FILE *file);
>  
> +/**
> + * @internal
> + * Get number of current active lanes
> + *
> + * @param dev
> + *   ethdev handle of port.
> + * @param speed_lanes
> + *   Number of active lanes that the link is trained up.
> + * @return
> + *   Negative errno value on error, 0 on success.
> + *
> + * @retval 0
> + *   Success, get speed_lanes data success.
> + * @retval -ENOTSUP
> + *   Operation is not supported.
> + * @retval -EIO
> + *   Device is removed.
>

Are the above '-ENOTSUP' & '-EIO' return values valid?
Normally we expect those two from the ethdev API, not from dev_ops.
In which case is a dev_ops expected to return these?

Same comment for all three new APIs.


> + */
> +typedef int (*eth_speed_lanes_get_t)(struct rte_eth_dev *dev, uint32_t *speed_lanes);
> +
> +/**
> + * @internal
> + * Set speed lanes
> + *
> + * @param dev
> + *   ethdev handle of port.
> + * @param speed_lanes
> + *   Non-negative number of lanes
> + *
> + * @return
> + *   Negative errno value on error, 0 on success.
> + *
> + * @retval 0
> + *   Success, set lanes success.
> + * @retval -ENOTSUP
> + *   Operation is not supported.
> + * @retval -EINVAL
> + *   Unsupported mode requested.
> + * @retval -EIO
> + *   Device is removed.
> + */
> +typedef int (*eth_speed_lanes_set_t)(struct rte_eth_dev *dev, uint32_t speed_lanes);
> +
> +/**
> + * @internal
> + * Get supported link speed lanes capability
> + *
> + * @param speed_lanes_capa
> + *   speed_lanes_capa is out only with per-speed capabilities.
>

I can understand what above says but I think it can be clarified more,
what do you think?

> + * @param num
> + *   a number of elements in an speed_speed_lanes_capa array.
>

'a number of elements' or 'number of elements' ?

> + *
> + * @return
> + *   Negative errno value on error, positive value on success.
> + *
> + * @retval positive value
> + *   A non-negative value lower or equal to num: success. The return value
> + *   is the number of entries filled in the speed lanes array.
> + *   A non-negative value higher than num: error, the given speed lanes capa array
> + *   is too small. The return value corresponds to the num that should
> + *   be given to succeed. The entries in the speed lanes capa array are not valid
> + *   and shall not be used by the caller.
> + * @retval -ENOTSUP
> + *   Operation is not supported.
> + * @retval -EIO
> + *   Device is removed.
> + * @retval -EINVAL
> + *   *num* or *speed_lanes_capa* invalid.
> + */
> +typedef int (*eth_speed_lanes_get_capability_t)(struct rte_eth_dev *dev,
> +						struct rte_eth_speed_lanes_capa *speed_lanes_capa,
> +						unsigned int num);
> +
>

These new dev_ops are placed right between the existing dev_ops
'eth_rx_descriptor_dump_t' and 'eth_tx_descriptor_dump_t'.
If you were looking at this header file as a whole, what would you think
about its quality?

Please group new dev_ops below link related ones.


>  /**
>   * @internal
>   * Dump Tx descriptor info to a file.
> @@ -1247,6 +1320,10 @@ struct eth_dev_ops {
>  	eth_dev_close_t            dev_close;     /**< Close device */
>  	eth_dev_reset_t		   dev_reset;	  /**< Reset device */
>  	eth_link_update_t          link_update;   /**< Get device link state */
> +	eth_speed_lanes_get_t	   speed_lanes_get;	  /**<Get link speed active lanes */
> +	eth_speed_lanes_set_t      speed_lanes_set;	  /**<set the link speeds supported lanes */
> +	/** Get link speed lanes capability */
> +	eth_speed_lanes_get_capability_t speed_lanes_get_capa;
>  	/** Check if the device was physically removed */
>  	eth_is_removed_t           is_removed;
>  
> diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
> index f1c658f49e..07cefea307 100644
> --- a/lib/ethdev/rte_ethdev.c
> +++ b/lib/ethdev/rte_ethdev.c
> @@ -7008,4 +7008,55 @@ int rte_eth_dev_map_aggr_tx_affinity(uint16_t port_id, uint16_t tx_queue_id,
>  	return ret;
>  }
>  
> +int
> +rte_eth_speed_lanes_get(uint16_t port_id, uint32_t *lane)
> +{
> +	struct rte_eth_dev *dev;
> +
> +	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
> +	dev = &rte_eth_devices[port_id];
> +
> +	if (*dev->dev_ops->speed_lanes_get == NULL)
> +		return -ENOTSUP;
> +	return eth_err(port_id, (*dev->dev_ops->speed_lanes_get)(dev, lane));
> +}
> +
> +int
> +rte_eth_speed_lanes_get_capability(uint16_t port_id,
> +				   struct rte_eth_speed_lanes_capa *speed_lanes_capa,
> +				   unsigned int num)
> +{
> +	struct rte_eth_dev *dev;
> +	int ret;
> +
> +	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
> +	dev = &rte_eth_devices[port_id];
> +
> +	if (speed_lanes_capa == NULL && num > 0) {
> +		RTE_ETHDEV_LOG_LINE(ERR,
> +				    "Cannot get ethdev port %u speed lanes capability to NULL when array size is non zero",
> +				    port_id);
> +		return -EINVAL;
> +	}
>

According to the above check, "speed_lanes_capa == NULL && num == 0" is a valid
input. I assume this is useful to get the expected size of the
'speed_lanes_capa' array, but this is not mentioned in the API
documentation. Can you please update the API doxygen comment to cover this case?
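
The usage I have in mind is the usual two-call pattern, the same shape as the
testpmd code earlier in this patch. A minimal sketch follows, using a mock in
place of the real API: mock_get_capability(), fetch_all() and struct lanes_capa
are hypothetical names, and the table contents are arbitrary sample values.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

struct lanes_capa { uint32_t speed; uint32_t capa; };

/* Hypothetical stand-in for the capability call; sample data only. */
int
mock_get_capability(struct lanes_capa *out, unsigned int num)
{
	static const struct lanes_capa table[] = {
		{ 100000, 0x7 }, { 200000, 0x6 }, { 400000, 0xc },
	};
	const unsigned int total = sizeof(table) / sizeof(table[0]);
	unsigned int i;

	if (out == NULL && num > 0)
		return -EINVAL;
	if (out == NULL || num < total)
		return (int)total; /* size query or short array: report needed size */
	for (i = 0; i < total; i++)
		out[i] = table[i];
	return (int)total;
}

/* Caller side: query the size, allocate, then fetch. Returns count or <0. */
int
fetch_all(struct lanes_capa **out)
{
	int n = mock_get_capability(NULL, 0);
	struct lanes_capa *arr;

	*out = NULL;
	if (n <= 0)
		return n;
	arr = calloc((size_t)n, sizeof(*arr));
	if (arr == NULL)
		return -ENOMEM;
	n = mock_get_capability(arr, (unsigned int)n);
	if (n < 0) {
		free(arr);
		return n;
	}
	*out = arr;
	return n;
}
```

The first call with (NULL, 0) only reports how many entries are needed, which
is the behaviour the doxygen comment should spell out.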


> +
> +	if (*dev->dev_ops->speed_lanes_get_capa == NULL)
> +		return -ENOTSUP;
>

About the order of the checks, should we first check that the dev_ops
exists, before validating the input arguments?
If the dev_ops is not available, the input variables don't matter anyway.

> +	ret = (*dev->dev_ops->speed_lanes_get_capa)(dev, speed_lanes_capa, num);
> +
> +	return ret;
>

The API returns -EIO only if the result is passed through 'eth_err()', which
covers the hot remove case. That call is missing in this function.


> +}
> +
> +int
> +rte_eth_speed_lanes_set(uint16_t port_id, uint32_t speed_lanes_capa)
> +{
> +	struct rte_eth_dev *dev;
> +
> +	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
> +	dev = &rte_eth_devices[port_id];
> +
> +	if (*dev->dev_ops->speed_lanes_set == NULL)
> +		return -ENOTSUP;
> +	return eth_err(port_id, (*dev->dev_ops->speed_lanes_set)(dev, speed_lanes_capa));
> +}
>

Similar location comment to the header one: instead of adding new APIs
at the very bottom of the file, can you please group them just below the
link related APIs?

> +
>  RTE_LOG_REGISTER_DEFAULT(rte_eth_dev_logtype, INFO);
> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> index 548fada1c7..35d0b81452 100644
> --- a/lib/ethdev/rte_ethdev.h
> +++ b/lib/ethdev/rte_ethdev.h
> @@ -357,6 +357,30 @@ struct rte_eth_link {
>  #define RTE_ETH_LINK_MAX_STR_LEN 40 /**< Max length of default link string. */
>  /**@}*/
>  
> +/**
> + * This enum indicates the possible link speed lanes of an ethdev port.
> + */
> +enum rte_eth_speed_lanes {
> +	RTE_ETH_SPEED_LANE_UNKNOWN = 0,  /**< speed lanes unsupported mode or default */
> +	RTE_ETH_SPEED_LANE_1 = 1,        /**< Link speed lane  1 */
> +	RTE_ETH_SPEED_LANE_2 = 2,        /**< Link speed lanes 2 */
> +	RTE_ETH_SPEED_LANE_4 = 4,        /**< Link speed lanes 4 */
> +	RTE_ETH_SPEED_LANE_8 = 8,        /**< Link speed lanes 8 */
>

Do we really need enum for the lane number? Why not use it as just number?
As far as I can see APIs get "uint32 lanes" parameter anyway.


> +	RTE_ETH_SPEED_LANE_MAX,
>

This kind of MAX enum usage causes trouble when we want to extend
the support in the future.
For example, when 16 lanes are required, adding it changes the value of MAX,
and as this is a public structure, the change causes an ABI break, making us
wait until the next ABI break release.
So it is better if we can avoid MAX enum usage.
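
A small sketch of the problem, with hypothetical enum and struct names shaped
like the one in this patch. The "v2" enum is my assumption of what a later
release adding 16 lanes would look like.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical "release N" enum, shaped like the one in the patch. */
enum lanes_v1 {
	V1_LANE_UNKNOWN = 0,
	V1_LANE_1 = 1,
	V1_LANE_2 = 2,
	V1_LANE_4 = 4,
	V1_LANE_8 = 8,
	V1_LANE_MAX,        /* implicitly 9 */
};

/* Hypothetical "release N+1" enum after adding 16 lanes. */
enum lanes_v2 {
	V2_LANE_UNKNOWN = 0,
	V2_LANE_1 = 1,
	V2_LANE_2 = 2,
	V2_LANE_4 = 4,
	V2_LANE_8 = 8,
	V2_LANE_16 = 16,
	V2_LANE_MAX,        /* implicitly 17 */
};

/* An application table sized by MAX changes size between the two releases. */
struct app_table_v1 { uint8_t seen[V1_LANE_MAX]; };
struct app_table_v2 { uint8_t seen[V2_LANE_MAX]; };
```

An application built against the first enum and run against a library built
with the second one disagrees on both the MAX value and the size of anything
dimensioned by it.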

> +};
> +
> +/* Translate from link speed lanes to speed lanes capa */
> +#define RTE_ETH_SPEED_LANES_TO_CAPA(x) RTE_BIT32(x)
> +
> +/* This macro indicates link speed lanes capa mask */
> +#define RTE_ETH_SPEED_LANES_CAPA_MASK(x) RTE_BIT32(RTE_ETH_SPEED_ ## x)
>

Why is above macro needed?


> +
> +/* A structure used to get and set lanes capabilities per link speed */
> +struct rte_eth_speed_lanes_capa {
> +	uint32_t speed;
> +	uint32_t capa;
> +};
> +
>  /**
>   * A structure used to configure the ring threshold registers of an Rx/Tx
>   * queue for an Ethernet port.
> @@ -6922,6 +6946,74 @@ rte_eth_tx_queue_count(uint16_t port_id, uint16_t queue_id)
>  	return rc;
>  }
>  
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
> + *
> + * Get Active lanes.
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param lanes
> + *   driver populates a active lanes value whether link is Autonegotiated or Fixed speed.
>

As these doxygen comments are API documentation, can you please form
them as proper sentences: start with uppercase, end with '.', etc.
Same comment for all APIs.

> + *
> + * @return
> + *   - (0) if successful.
> + *   - (-ENOTSUP) if underlying hardware OR driver doesn't support.
> + *     that operation.
> + *   - (-EIO) if device is removed.
> + *   - (-ENODEV)  if *port_id* invalid.
> + */
> +__rte_experimental
> +int rte_eth_speed_lanes_get(uint16_t port_id, uint32_t *lanes);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
> + *
> + * Set speed lanes supported by the NIC.
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param speed_lanes
> + *   speed_lanes a non-zero value of number lanes for this speeds.
>

'this speeds' ?


> + *
> + * @return
> + *   - (0) if successful.
> + *   - (-ENOTSUP) if underlying hardware OR driver doesn't support.
> + *     that operation.
> + *   - (-EIO) if device is removed.
> + *   - (-ENODEV)  if *port_id* invalid.
> + */
> +__rte_experimental
> +int rte_eth_speed_lanes_set(uint16_t port_id, uint32_t speed_lanes);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
> + *
> + * Get speed lanes supported by the NIC.
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param speed_lanes_capa
> + *   speed_lanes_capa int array with valid lanes per speed.
> + * @param num
> + *   size of the speed_lanes_capa array.
> + *
> + * @return
> + *   - (0) if successful.
> + *   - (-ENOTSUP) if underlying hardware OR driver doesn't support.
> + *     that operation.
> + *   - (-EIO) if device is removed.
> + *   - (-ENODEV)  if *port_id* invalid.
> + *   - (-EINVAL)  if *speed_lanes* invalid
> + */
> +__rte_experimental
> +int rte_eth_speed_lanes_get_capability(uint16_t port_id,
> +				       struct rte_eth_speed_lanes_capa *speed_lanes_capa,
> +				       unsigned int num);
> +
>

The bottom of the header file is for static inline functions.
Instead of adding these new APIs at the very bottom of the header, can
you please group them just below the link speed related APIs?


>  #ifdef __cplusplus
>  }
>  #endif
> diff --git a/lib/ethdev/version.map b/lib/ethdev/version.map
> index 79f6f5293b..db9261946f 100644
> --- a/lib/ethdev/version.map
> +++ b/lib/ethdev/version.map
> @@ -325,6 +325,11 @@ EXPERIMENTAL {
>  	rte_flow_template_table_resizable;
>  	rte_flow_template_table_resize;
>  	rte_flow_template_table_resize_complete;
> +
> +	# added in 24.07
> +	rte_eth_speed_lanes_get;
> +	rte_eth_speed_lanes_get_capability;
> +	rte_eth_speed_lanes_set;
>  };
>  
>  INTERNAL {



* Re: [PATCH v4] ethdev: Add link_speed lanes support
  2024-07-09 11:10  4%   ` Ferruh Yigit
@ 2024-07-09 21:20  0%     ` Damodharam Ammepalli
  0 siblings, 0 replies; 200+ results
From: Damodharam Ammepalli @ 2024-07-09 21:20 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: ajit.khaparde, dev, huangdengdui, kalesh-anakkur.purayil

On Tue, Jul 9, 2024 at 4:10 AM Ferruh Yigit <ferruh.yigit@amd.com> wrote:
>
> On 7/9/2024 12:22 AM, Damodharam Ammepalli wrote:
> > Update the eth_dev_ops structure with new function vectors
> > to get, get capabilities and set ethernet link speed lanes.
> > Update the testpmd to provide required config and information
> > display infrastructure.
> >
> > The supporting ethernet controller driver will register callbacks
> > to avail link speed lanes config and get services. This lanes
> > configuration is applicable only when the nic is forced to fixed
> > speeds. In Autonegotiation mode, the hardware automatically
> > negotiates the number of lanes.
> >
> > These are the new commands.
> >
> > testpmd> show port 0 speed_lanes capabilities
> >
> >  Supported speeds         Valid lanes
> > -----------------------------------
> >  10 Gbps                  1
> >  25 Gbps                  1
> >  40 Gbps                  4
> >  50 Gbps                  1 2
> >  100 Gbps                 1 2 4
> >  200 Gbps                 2 4
> >  400 Gbps                 4 8
> > testpmd>
> >
> > testpmd>
> > testpmd> port stop 0
> > testpmd> port config 0 speed_lanes 4
> > testpmd> port config 0 speed 200000 duplex full
> >
>
> Is there a requirement to set speed before speed_lane?
> Because I expect the driver will verify whether a speed_lane value is valid
> for a specific speed value. In the above usage, the driver will verify based
> on the existing speed, whatever it is, so changing the speed later may cause
> an invalid speed_lane configuration.
>
>
There is no requirement to set the speed before speed_lanes.
If the controller supports lane configuration and no lanes are given
(that is, 0), the driver will pick the lowest speed (e.g. 200 Gbps with
NRZ mode), provided a fixed speed already exists or is configured in
tandem with speed_lanes. If the speed is already Auto, testpmd's
speed_lanes config is ignored.
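
As a rough illustration of the behaviour described above, here is a
standalone sketch that checks a lanes request against the capability table
shown in the commit message. It is not bnxt or ethdev code: the names
speed_lanes_row, capa_table and lanes_request_ok() are hypothetical, and
accepting 0 lanes and any value under autoneg is an assumption taken from
the paragraph above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_LANE_OPTS 3

/* Hypothetical row: one fixed speed and the lane counts valid for it. */
struct speed_lanes_row {
	uint32_t speed_gbps;
	uint32_t valid[MAX_LANE_OPTS]; /* unused slots are 0 */
};

/* Values copied from the "show port 0 speed_lanes capabilities" output. */
static const struct speed_lanes_row capa_table[] = {
	{  10, {1, 0, 0} },
	{  25, {1, 0, 0} },
	{  40, {4, 0, 0} },
	{  50, {1, 2, 0} },
	{ 100, {1, 2, 4} },
	{ 200, {2, 4, 0} },
	{ 400, {4, 8, 0} },
};
#define CAPA_ROWS (sizeof(capa_table) / sizeof(capa_table[0]))

/*
 * Return 1 if the request is acceptable, 0 otherwise.
 * Assumption: lanes == 0 means "let the driver choose", and under
 * autoneg the lanes setting is ignored, so both are always accepted.
 */
static int
lanes_request_ok(const struct speed_lanes_row *tbl, size_t rows,
		 uint32_t speed_gbps, int autoneg, uint32_t lanes)
{
	size_t i, j;

	assert(tbl != NULL);
	if (autoneg || lanes == 0)
		return 1;

	for (i = 0; i < rows; i++) {
		if (tbl[i].speed_gbps != speed_gbps)
			continue;
		for (j = 0; j < MAX_LANE_OPTS; j++) {
			if (tbl[i].valid[j] == lanes)
				return 1;
		}
	}
	return 0;
}
```

With this model, 4 lanes at 200 Gbps is accepted while 8 lanes at 200 Gbps
is rejected, which is the kind of mismatch the review comment is worried
about when the speed is changed after the lanes.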

> > testpmd> port start 0
> > testpmd>
> > testpmd> show port info 0
> >
> > ********************* Infos for port 0  *********************
> > MAC address: 14:23:F2:C3:BA:D2
> > Device name: 0000:b1:00.0
> > Driver name: net_bnxt
> > Firmware-version: 228.9.115.0
> > Connect to socket: 2
> > memory allocation on the socket: 2
> > Link status: up
> > Link speed: 200 Gbps
> > Active Lanes: 4
> > Link duplex: full-duplex
> > Autoneg status: Off
> >
> > Signed-off-by: Damodharam Ammepalli <damodharam.ammepalli@broadcom.com>
> > ---
> > v2->v3 Consolidating the testpmd and rtelib patches into a single patch
> > as requested.
> > v3->v4 Addressed comments and fix help string and documentation.
> >
> >  app/test-pmd/cmdline.c     | 230 +++++++++++++++++++++++++++++++++++++
> >  app/test-pmd/config.c      |  69 ++++++++++-
> >  app/test-pmd/testpmd.h     |   4 +
> >  lib/ethdev/ethdev_driver.h |  77 +++++++++++++
> >  lib/ethdev/rte_ethdev.c    |  51 ++++++++
> >  lib/ethdev/rte_ethdev.h    |  92 +++++++++++++++
> >  lib/ethdev/version.map     |   5 +
> >  7 files changed, 526 insertions(+), 2 deletions(-)
> >
> > diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
> > index b7759e38a8..a507df31d8 100644
> > --- a/app/test-pmd/cmdline.c
> > +++ b/app/test-pmd/cmdline.c
> > @@ -284,6 +284,9 @@ static void cmd_help_long_parsed(void *parsed_result,
> >
> >                       "dump_log_types\n"
> >                       "    Dumps the log level for all the dpdk modules\n\n"
> > +
> > +                     "show port (port_id) speed_lanes capabilities"
> > +                     "       Show speed lanes capabilities of a port.\n\n"
> >               );
> >       }
> >
> > @@ -823,6 +826,9 @@ static void cmd_help_long_parsed(void *parsed_result,
> >                       "port config (port_id) txq (queue_id) affinity (value)\n"
> >                       "    Map a Tx queue with an aggregated port "
> >                       "of the DPDK port\n\n"
> > +
> > +                     "port config (port_id|all) speed_lanes (0|1|4|8)\n"
> > +                     "    Set number of lanes for all ports or port_id for a forced speed\n\n"
> >               );
> >       }
> >
> > @@ -1560,6 +1566,110 @@ static cmdline_parse_inst_t cmd_config_speed_specific = {
> >       },
> >  };
> >
> > +static int
> > +parse_speed_lanes_cfg(portid_t pid, uint32_t lanes)
> > +{
> > +     int ret;
> > +     uint32_t lanes_capa;
> > +
> > +     ret = parse_speed_lanes(lanes, &lanes_capa);
> > +     if (ret < 0) {
> > +             fprintf(stderr, "Unknown speed lane value: %d for port %d\n", lanes, pid);
> > +             return -1;
> > +     }
> > +
> > +     ret = rte_eth_speed_lanes_set(pid, lanes_capa);
> > +     if (ret == -ENOTSUP) {
> > +             fprintf(stderr, "Function not implemented\n");
> > +             return -1;
> > +     } else if (ret < 0) {
> > +             fprintf(stderr, "Set speed lanes failed\n");
> > +             return -1;
> > +     }
> > +
> > +     return 0;
> > +}
> > +
> > +/* *** display speed lanes per port capabilities *** */
> > +struct cmd_show_speed_lanes_result {
> > +     cmdline_fixed_string_t cmd_show;
> > +     cmdline_fixed_string_t cmd_port;
> > +     cmdline_fixed_string_t cmd_keyword;
> > +     portid_t cmd_pid;
> > +};
> > +
> > +static void
> > +cmd_show_speed_lanes_parsed(void *parsed_result,
> > +                         __rte_unused struct cmdline *cl,
> > +                         __rte_unused void *data)
> > +{
> > +     struct cmd_show_speed_lanes_result *res = parsed_result;
> > +     struct rte_eth_speed_lanes_capa *speed_lanes_capa;
> > +     unsigned int num;
> > +     int ret;
> > +
> > +     if (!rte_eth_dev_is_valid_port(res->cmd_pid)) {
> > +             fprintf(stderr, "Invalid port id %u\n", res->cmd_pid);
> > +             return;
> > +     }
> > +
> > +     ret = rte_eth_speed_lanes_get_capability(res->cmd_pid, NULL, 0);
> > +     if (ret == -ENOTSUP) {
> > +             fprintf(stderr, "Function not implemented\n");
> > +             return;
> > +     } else if (ret < 0) {
> > +             fprintf(stderr, "Get speed lanes capability failed: %d\n", ret);
> > +             return;
> > +     }
> > +
> > +     num = (unsigned int)ret;
> > +     speed_lanes_capa = calloc(num, sizeof(*speed_lanes_capa));
> > +     if (speed_lanes_capa == NULL) {
> > +             fprintf(stderr, "Failed to alloc speed lanes capability buffer\n");
> > +             return;
> > +     }
> > +
> > +     ret = rte_eth_speed_lanes_get_capability(res->cmd_pid, speed_lanes_capa, num);
> > +     if (ret < 0) {
> > +             fprintf(stderr, "Error getting speed lanes capability: %d\n", ret);
> > +             goto out;
> > +     }
> > +
> > +     show_speed_lanes_capability(num, speed_lanes_capa);
> > +out:
> > +     free(speed_lanes_capa);
> > +}
> > +
> > +static cmdline_parse_token_string_t cmd_show_speed_lanes_show =
> > +     TOKEN_STRING_INITIALIZER(struct cmd_show_speed_lanes_result,
> > +                              cmd_show, "show");
> > +static cmdline_parse_token_string_t cmd_show_speed_lanes_port =
> > +     TOKEN_STRING_INITIALIZER(struct cmd_show_speed_lanes_result,
> > +                              cmd_port, "port");
> > +static cmdline_parse_token_num_t cmd_show_speed_lanes_pid =
> > +     TOKEN_NUM_INITIALIZER(struct cmd_show_speed_lanes_result,
> > +                           cmd_pid, RTE_UINT16);
> > +static cmdline_parse_token_string_t cmd_show_speed_lanes_keyword =
> > +     TOKEN_STRING_INITIALIZER(struct cmd_show_speed_lanes_result,
> > +                              cmd_keyword, "speed_lanes");
> > +static cmdline_parse_token_string_t cmd_show_speed_lanes_cap_keyword =
> > +     TOKEN_STRING_INITIALIZER(struct cmd_show_speed_lanes_result,
> > +                              cmd_keyword, "capabilities");
> > +
> > +static cmdline_parse_inst_t cmd_show_speed_lanes = {
> > +     .f = cmd_show_speed_lanes_parsed,
> > +     .data = NULL,
> > +     .help_str = "show port <port_id> speed_lanes capabilities",
> > +     .tokens = {
> > +             (void *)&cmd_show_speed_lanes_show,
> > +             (void *)&cmd_show_speed_lanes_port,
> > +             (void *)&cmd_show_speed_lanes_pid,
> > +             (void *)&cmd_show_speed_lanes_keyword,
> > +             (void *)&cmd_show_speed_lanes_cap_keyword,
> > +             NULL,
> > +     },
> > +};
> > +
> >  /* *** configure loopback for all ports *** */
> >  struct cmd_config_loopback_all {
> >       cmdline_fixed_string_t port;
> > @@ -1676,6 +1786,123 @@ static cmdline_parse_inst_t cmd_config_loopback_specific = {
> >       },
> >  };
> >
> > +/* *** configure speed_lanes for all ports *** */
> > +struct cmd_config_speed_lanes_all {
> > +     cmdline_fixed_string_t port;
> > +     cmdline_fixed_string_t keyword;
> > +     cmdline_fixed_string_t all;
> > +     cmdline_fixed_string_t item;
> > +     uint32_t lanes;
> > +};
> > +
> > +static void
> > +cmd_config_speed_lanes_all_parsed(void *parsed_result,
> > +                               __rte_unused struct cmdline *cl,
> > +                               __rte_unused void *data)
> > +{
> > +     struct cmd_config_speed_lanes_all *res = parsed_result;
> > +     portid_t pid;
> > +
> > +     if (!all_ports_stopped()) {
> > +             fprintf(stderr, "Please stop all ports first\n");
> > +             return;
> > +     }
> > +
> > +     RTE_ETH_FOREACH_DEV(pid) {
> > +             if (parse_speed_lanes_cfg(pid, res->lanes))
> > +                     return;
> > +     }
> > +
> > +     cmd_reconfig_device_queue(RTE_PORT_ALL, 1, 1);
> > +}
> > +
> > +static cmdline_parse_token_string_t cmd_config_speed_lanes_all_port =
> > +     TOKEN_STRING_INITIALIZER(struct cmd_config_speed_lanes_all, port, "port");
> > +static cmdline_parse_token_string_t cmd_config_speed_lanes_all_keyword =
> > +     TOKEN_STRING_INITIALIZER(struct cmd_config_speed_lanes_all, keyword,
> > +                              "config");
> > +static cmdline_parse_token_string_t cmd_config_speed_lanes_all_all =
> > +     TOKEN_STRING_INITIALIZER(struct cmd_config_speed_lanes_all, all, "all");
> > +static cmdline_parse_token_string_t cmd_config_speed_lanes_all_item =
> > +     TOKEN_STRING_INITIALIZER(struct cmd_config_speed_lanes_all, item,
> > +                              "speed_lanes");
> > +static cmdline_parse_token_num_t cmd_config_speed_lanes_all_lanes =
> > +     TOKEN_NUM_INITIALIZER(struct cmd_config_speed_lanes_all, lanes, RTE_UINT32);
> > +
> > +static cmdline_parse_inst_t cmd_config_speed_lanes_all = {
> > +     .f = cmd_config_speed_lanes_all_parsed,
> > +     .data = NULL,
> > +     .help_str = "port config all speed_lanes <value>",
> > +     .tokens = {
> > +             (void *)&cmd_config_speed_lanes_all_port,
> > +             (void *)&cmd_config_speed_lanes_all_keyword,
> > +             (void *)&cmd_config_speed_lanes_all_all,
> > +             (void *)&cmd_config_speed_lanes_all_item,
> > +             (void *)&cmd_config_speed_lanes_all_lanes,
> > +             NULL,
> > +     },
> > +};
> > +
> > +/* *** configure speed_lanes for specific port *** */
> > +struct cmd_config_speed_lanes_specific {
> > +     cmdline_fixed_string_t port;
> > +     cmdline_fixed_string_t keyword;
> > +     uint16_t port_id;
> > +     cmdline_fixed_string_t item;
> > +     uint32_t lanes;
> > +};
> > +
> > +static void
> > +cmd_config_speed_lanes_specific_parsed(void *parsed_result,
> > +                                    __rte_unused struct cmdline *cl,
> > +                                    __rte_unused void *data)
> > +{
> > +     struct cmd_config_speed_lanes_specific *res = parsed_result;
> > +
> > +     if (port_id_is_invalid(res->port_id, ENABLED_WARN))
> > +             return;
> > +
> > +     if (!port_is_stopped(res->port_id)) {
> > +             fprintf(stderr, "Please stop port %u first\n", res->port_id);
> > +             return;
> > +     }
> >
>
> There is a requirement here, that port needs to be stopped before
> calling the rte_eth_speed_lanes_set(),
> is this requirement documented in the API documentation?
>
>
Changing the link speed mode needs a PHY reset, hence stopping the port is a
requirement. I will add this to the documentation in the next patch.

> > +
> > +     if (parse_speed_lanes_cfg(res->port_id, res->lanes))
> > +             return;
> > +
> > +     cmd_reconfig_device_queue(res->port_id, 1, 1);
> > +}
> > +
> > +static cmdline_parse_token_string_t cmd_config_speed_lanes_specific_port =
> > +     TOKEN_STRING_INITIALIZER(struct cmd_config_speed_lanes_specific, port,
> > +                              "port");
> > +static cmdline_parse_token_string_t cmd_config_speed_lanes_specific_keyword =
> > +     TOKEN_STRING_INITIALIZER(struct cmd_config_speed_lanes_specific, keyword,
> > +                              "config");
> > +static cmdline_parse_token_num_t cmd_config_speed_lanes_specific_id =
> > +     TOKEN_NUM_INITIALIZER(struct cmd_config_speed_lanes_specific, port_id,
> > +                           RTE_UINT16);
> > +static cmdline_parse_token_string_t cmd_config_speed_lanes_specific_item =
> > +     TOKEN_STRING_INITIALIZER(struct cmd_config_speed_lanes_specific, item,
> > +                              "speed_lanes");
> > +static cmdline_parse_token_num_t cmd_config_speed_lanes_specific_lanes =
> > +     TOKEN_NUM_INITIALIZER(struct cmd_config_speed_lanes_specific, lanes,
> > +                           RTE_UINT32);
> > +
> > +static cmdline_parse_inst_t cmd_config_speed_lanes_specific = {
> > +     .f = cmd_config_speed_lanes_specific_parsed,
> > +     .data = NULL,
> > +     .help_str = "port config <port_id> speed_lanes <value>",
> > +     .tokens = {
> > +             (void *)&cmd_config_speed_lanes_specific_port,
> > +             (void *)&cmd_config_speed_lanes_specific_keyword,
> > +             (void *)&cmd_config_speed_lanes_specific_id,
> > +             (void *)&cmd_config_speed_lanes_specific_item,
> > +             (void *)&cmd_config_speed_lanes_specific_lanes,
> > +             NULL,
> > +     },
> > +};
> > +
> >  /* *** configure txq/rxq, txd/rxd *** */
> >  struct cmd_config_rx_tx {
> >       cmdline_fixed_string_t port;
> > @@ -13238,6 +13465,9 @@ static cmdline_parse_ctx_t builtin_ctx[] = {
> >       (cmdline_parse_inst_t *)&cmd_set_port_setup_on,
> >       (cmdline_parse_inst_t *)&cmd_config_speed_all,
> >       (cmdline_parse_inst_t *)&cmd_config_speed_specific,
> > +     (cmdline_parse_inst_t *)&cmd_config_speed_lanes_all,
> > +     (cmdline_parse_inst_t *)&cmd_config_speed_lanes_specific,
> > +     (cmdline_parse_inst_t *)&cmd_show_speed_lanes,
> >       (cmdline_parse_inst_t *)&cmd_config_loopback_all,
> >       (cmdline_parse_inst_t *)&cmd_config_loopback_specific,
> >       (cmdline_parse_inst_t *)&cmd_config_rx_tx,
> > diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
> > index 66c3a68c1d..498a7db467 100644
> > --- a/app/test-pmd/config.c
> > +++ b/app/test-pmd/config.c
> > @@ -207,6 +207,32 @@ static const struct {
> >       {"gtpu", RTE_ETH_FLOW_GTPU},
> >  };
> >
> > +static const struct {
> > +     enum rte_eth_speed_lanes lane;
> > +     const uint32_t value;
> > +} speed_lane_name[] = {
> > +     {
> > +             .lane = RTE_ETH_SPEED_LANE_UNKNOWN,
> > +             .value = 0,
> > +     },
> > +     {
> > +             .lane = RTE_ETH_SPEED_LANE_1,
> > +             .value = 1,
> > +     },
> > +     {
> > +             .lane = RTE_ETH_SPEED_LANE_2,
> > +             .value = 2,
> > +     },
> > +     {
> > +             .lane = RTE_ETH_SPEED_LANE_4,
> > +             .value = 4,
> > +     },
> > +     {
> > +             .lane = RTE_ETH_SPEED_LANE_8,
> > +             .value = 8,
> > +     },
> > +};
> > +
> >  static void
> >  print_ethaddr(const char *name, struct rte_ether_addr *eth_addr)
> >  {
> > @@ -786,6 +812,7 @@ port_infos_display(portid_t port_id)
> >       char name[RTE_ETH_NAME_MAX_LEN];
> >       int ret;
> >       char fw_version[ETHDEV_FWVERS_LEN];
> > +     uint32_t lanes;
> >
> >       if (port_id_is_invalid(port_id, ENABLED_WARN)) {
> >               print_valid_ports();
> > @@ -828,6 +855,12 @@ port_infos_display(portid_t port_id)
> >
> >       printf("\nLink status: %s\n", (link.link_status) ? ("up") : ("down"));
> >       printf("Link speed: %s\n", rte_eth_link_speed_to_str(link.link_speed));
> > +     if (rte_eth_speed_lanes_get(port_id, &lanes) == 0) {
> > +             if (lanes > 0)
> > +                     printf("Active Lanes: %d\n", lanes);
> > +             else
> > +                     printf("Active Lanes: %s\n", "Unknown");
> >
>
> What can be the 'else' case?
> As 'lanes' is unsigned, only option is it being zero. Is API allowed to
> return zero as lane number?
>
>
Yes. If the link is down but the controller supports the speed_lanes
capability, we can show "Unknown".
Other cases from the Broadcom spec:
1Gb link speed: <no lane info> (theoretically it can't be zero, but we
need to show what the controller provides in the query).
10Gb link speed: NRZ, 10G per lane, 1 lane.
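
To make the zero case concrete, a small standalone sketch of the display
logic follows. The helper name format_active_lanes() is hypothetical (it is
not part of testpmd), and it uses %u because the lane count is unsigned.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format the "Active Lanes" line into buf.
 * A lane count of 0 is treated as "no lane info from the controller".
 * Returns the snprintf() result (length of the formatted text).
 */
static int
format_active_lanes(uint32_t lanes, char *buf, size_t len)
{
	assert(buf != NULL && len > 0);

	if (lanes > 0)
		return snprintf(buf, len, "Active Lanes: %u",
				(unsigned int)lanes);
	return snprintf(buf, len, "Active Lanes: %s", "Unknown");
}
```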

> > +     }
> >       printf("Link duplex: %s\n", (link.link_duplex == RTE_ETH_LINK_FULL_DUPLEX) ?
> >              ("full-duplex") : ("half-duplex"));
> >       printf("Autoneg status: %s\n", (link.link_autoneg == RTE_ETH_LINK_AUTONEG) ?
> > @@ -962,7 +995,7 @@ port_summary_header_display(void)
> >
> >       port_number = rte_eth_dev_count_avail();
> >       printf("Number of available ports: %i\n", port_number);
> > -     printf("%-4s %-17s %-12s %-14s %-8s %s\n", "Port", "MAC Address", "Name",
> > +     printf("%-4s %-17s %-12s %-14s %-8s %-8s\n", "Port", "MAC Address", "Name",
> >                       "Driver", "Status", "Link");
> >  }
> >
> > @@ -993,7 +1026,7 @@ port_summary_display(portid_t port_id)
> >       if (ret != 0)
> >               return;
> >
> > -     printf("%-4d " RTE_ETHER_ADDR_PRT_FMT " %-12s %-14s %-8s %s\n",
> > +     printf("%-4d " RTE_ETHER_ADDR_PRT_FMT " %-12s %-14s %-8s %-8s\n",
> >
>
> Summary updates are irrelevant in the patch, can you please drop them.
>
>
Sure I will.

> >               port_id, RTE_ETHER_ADDR_BYTES(&mac_addr), name,
> >               dev_info.driver_name, (link.link_status) ? ("up") : ("down"),
> >               rte_eth_link_speed_to_str(link.link_speed));
> > @@ -7244,3 +7277,35 @@ show_mcast_macs(portid_t port_id)
> >               printf("  %s\n", buf);
> >       }
> >  }
> > +
> > +int
> > +parse_speed_lanes(uint32_t lane, uint32_t *speed_lane)
> > +{
> > +     uint8_t i;
> > +
> > +     for (i = 0; i < RTE_DIM(speed_lane_name); i++) {
> > +             if (speed_lane_name[i].value == lane) {
> > +                     *speed_lane = lane;
> >
>
> This converts from 8 -> 8, 4 -> 4 ....
>
> Why not completely eliminate this function? See below.
>
Sure, will evaluate and do the needful.

> > +                     return 0;
> > +             }
> > +     }
> > +     return -1;
> > +}
> > +
> > +void
> > +show_speed_lanes_capability(unsigned int num, struct rte_eth_speed_lanes_capa *speed_lanes_capa)
> > +{
> > +     unsigned int i, j;
> > +
> > +     printf("\n%-15s %-10s", "Supported-speeds", "Valid-lanes");
> > +     printf("\n-----------------------------------\n");
> > +     for (i = 0; i < num; i++) {
> > +             printf("%-17s ", rte_eth_link_speed_to_str(speed_lanes_capa[i].speed));
> > +
> > +             for (j = 0; j < RTE_ETH_SPEED_LANE_MAX; j++) {
> > +                     if (RTE_ETH_SPEED_LANES_TO_CAPA(j) & speed_lanes_capa[i].capa)
> > +                             printf("%-2d ", speed_lane_name[j].value);
> > +             }
>
> To eliminate both RTE_ETH_SPEED_LANE_MAX & speed_lane_name, what do you
> think about:
>
> capa = speed_lanes_capa[i].capa;
> int s = 0;
> while (capa) {
>     if (capa & 0x1)
>         printf("%-2d ", 1 << s);
>     s++;
>     capa = capa >> 1;
> }
>
I am new to the DPDK world and followed the FEC driver conventions for
consistency. Your suggestion makes sense; I will update it accordingly.
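
For reference, the suggested bit-walk can be tried standalone as in the
sketch below. It writes into a buffer instead of calling printf so it is
easy to check; lanes_capa_to_str() is a hypothetical name, and the encoding
(bit s set means 1 << s lanes) is an assumption taken from the loop in the
review comment, not from the final API.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Walk the capability bitmap from bit 0 upwards and append the lane
 * count for every set bit, using the same "%-2d " style column as the
 * suggested loop. Returns the number of characters written, or -1 if
 * the buffer is too small.
 */
static int
lanes_capa_to_str(uint32_t capa, char *buf, size_t len)
{
	size_t off = 0;
	int s = 0;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';

	while (capa) {
		if (capa & 0x1) {
			int n = snprintf(buf + off, len - off, "%-2u ",
					 1u << s);

			if (n < 0 || (size_t)n >= len - off)
				return -1; /* output would be truncated */
			off += (size_t)n;
		}
		s++;
		capa >>= 1;
	}
	return (int)off;
}
```

Under that assumed encoding, a bitmap of 0x7 yields the lane counts 1, 2
and 4, and neither a MAX enum value nor the speed_lane_name table is needed.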

> > +             printf("\n");
> > +     }
> > +}
> > diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
> > index 9facd7f281..fb9ef05cc5 100644
> > --- a/app/test-pmd/testpmd.h
> > +++ b/app/test-pmd/testpmd.h
> > @@ -1253,6 +1253,10 @@ extern int flow_parse(const char *src, void *result, unsigned int size,
> >                     struct rte_flow_item **pattern,
> >                     struct rte_flow_action **actions);
> >
> > +void show_speed_lanes_capability(uint32_t num,
> > +                              struct rte_eth_speed_lanes_capa *speed_lanes_capa);
> > +int parse_speed_lanes(uint32_t lane, uint32_t *speed_lane);
> > +
> >
>
> These functions are only called in 'test-pmd/cmdline.c', what do you think
> about moving them to that file and making them static?
>
>
Ack

> >  uint64_t str_to_rsstypes(const char *str);
> >  const char *rsstypes_to_str(uint64_t rss_type);
> >
> > diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
> > index 883e59a927..0f10aec3a1 100644
> > --- a/lib/ethdev/ethdev_driver.h
> > +++ b/lib/ethdev/ethdev_driver.h
> > @@ -1179,6 +1179,79 @@ typedef int (*eth_rx_descriptor_dump_t)(const struct rte_eth_dev *dev,
> >                                       uint16_t queue_id, uint16_t offset,
> >                                       uint16_t num, FILE *file);
> >
> > +/**
> > + * @internal
> > + * Get number of current active lanes
> > + *
> > + * @param dev
> > + *   ethdev handle of port.
> > + * @param speed_lanes
> > + *   Number of active lanes that the link is trained up.
> > + * @return
> > + *   Negative errno value on error, 0 on success.
> > + *
> > + * @retval 0
> > + *   Success, get speed_lanes data success.
> > + * @retval -ENOTSUP
> > + *   Operation is not supported.
> > + * @retval -EIO
> > + *   Device is removed.
> >
>
> Are the above '-ENOTSUP' & '-EIO' return values valid?
> Normally we expect those two from the ethdev API, not from dev_ops.
> In which case is a dev_ops expected to return these?
>
> Same comment for all three new APIs.
>
>
Code snippet from our driver:

.speed_lanes_get = bnxt_speed_lanes_get,

static int bnxt_speed_lanes_get(struct rte_eth_dev *dev,
                                 uint32_t *speed_lanes)
{
        struct bnxt *bp = dev->data->dev_private;

        if (!BNXT_LINK_SPEEDS_V2(bp))
                return -ENOTSUP;
        [...]

The dev_ops returns -ENOTSUP if the capability is not supported. Is this ok?

-EIO: will remove.
I will check and update the other functions also.
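
To make the two -ENOTSUP paths explicit, here is a standalone sketch with
mock types. Nothing in it is the real ethdev or bnxt code: mock_dev,
mock_drv_speed_lanes_get() and mock_speed_lanes_get() are hypothetical
stand-ins, and hw_supports_lanes only plays the role of a hardware
capability check.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

struct mock_dev;
typedef int (*mock_speed_lanes_get_t)(struct mock_dev *dev, uint32_t *lanes);

/* Hypothetical device: a callback slot plus a capability flag. */
struct mock_dev {
	mock_speed_lanes_get_t speed_lanes_get; /* NULL: no driver callback */
	int hw_supports_lanes;                  /* stand-in capability bit */
	uint32_t active_lanes;
};

/* Driver side: the callback exists, but this device may lack the feature. */
static int
mock_drv_speed_lanes_get(struct mock_dev *dev, uint32_t *lanes)
{
	if (!dev->hw_supports_lanes)
		return -ENOTSUP;
	*lanes = dev->active_lanes;
	return 0;
}

/* Library side: -ENOTSUP when the driver registered no callback at all. */
static int
mock_speed_lanes_get(struct mock_dev *dev, uint32_t *lanes)
{
	assert(dev != NULL && lanes != NULL);

	if (dev->speed_lanes_get == NULL)
		return -ENOTSUP;
	return dev->speed_lanes_get(dev, lanes);
}
```

In this model the caller sees the same -ENOTSUP whether the driver has no
callback or the callback finds that the hardware lacks the capability.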

> > + */
> > +typedef int (*eth_speed_lanes_get_t)(struct rte_eth_dev *dev, uint32_t *speed_lanes);
> > +
> > +/**
> > + * @internal
> > + * Set speed lanes
> > + *
> > + * @param dev
> > + *   ethdev handle of port.
> > + * @param speed_lanes
> > + *   Non-negative number of lanes
> > + *
> > + * @return
> > + *   Negative errno value on error, 0 on success.
> > + *
> > + * @retval 0
> > + *   Success, set lanes success.
> > + * @retval -ENOTSUP
> > + *   Operation is not supported.
> > + * @retval -EINVAL
> > + *   Unsupported mode requested.
> > + * @retval -EIO
> > + *   Device is removed.
> > + */
> > +typedef int (*eth_speed_lanes_set_t)(struct rte_eth_dev *dev, uint32_t speed_lanes);
> > +
> > +/**
> > + * @internal
> > + * Get supported link speed lanes capability
> > + *
> > + * @param speed_lanes_capa
> > + *   speed_lanes_capa is out only with per-speed capabilities.
> >
>
> I can understand what above says but I think it can be clarified more,
> what do you think?
>
Ack

> > + * @param num
> > + *   a number of elements in an speed_speed_lanes_capa array.
> >
>
> 'a number of elements' or 'number of elements' ?
>
Ack

> > + *
> > + * @return
> > + *   Negative errno value on error, positive value on success.
> > + *
> > + * @retval positive value
> > + *   A non-negative value lower or equal to num: success. The return value
> > + *   is the number of entries filled in the speed lanes array.
> > + *   A non-negative value higher than num: error, the given speed lanes capa array
> > + *   is too small. The return value corresponds to the num that should
> > + *   be given to succeed. The entries in the speed lanes capa array are not valid
> > + *   and shall not be used by the caller.
> > + * @retval -ENOTSUP
> > + *   Operation is not supported.
> > + * @retval -EIO
> > + *   Device is removed.
> > + * @retval -EINVAL
> > + *   *num* or *speed_lanes_capa* invalid.
> > + */
> > +typedef int (*eth_speed_lanes_get_capability_t)(struct rte_eth_dev *dev,
> > +                                             struct rte_eth_speed_lanes_capa *speed_lanes_capa,
> > +                                             unsigned int num);
> > +
> >
>
> These new dev_ops are placed just in between the existing dev_ops
> 'eth_rx_descriptor_dump_t' and 'eth_tx_descriptor_dump_t',
> if you were looking at this header file as a whole, what would you think
> about its quality?
>
> Please group new dev_ops below link related ones.
>
>
Ack

> >  /**
> >   * @internal
> >   * Dump Tx descriptor info to a file.
> > @@ -1247,6 +1320,10 @@ struct eth_dev_ops {
> >       eth_dev_close_t            dev_close;     /**< Close device */
> >       eth_dev_reset_t            dev_reset;     /**< Reset device */
> >       eth_link_update_t          link_update;   /**< Get device link state */
> > +     eth_speed_lanes_get_t      speed_lanes_get;       /**<Get link speed active lanes */
> > +     eth_speed_lanes_set_t      speed_lanes_set;       /**<set the link speeds supported lanes */
> > +     /** Get link speed lanes capability */
> > +     eth_speed_lanes_get_capability_t speed_lanes_get_capa;
> >       /** Check if the device was physically removed */
> >       eth_is_removed_t           is_removed;
> >
> > diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
> > index f1c658f49e..07cefea307 100644
> > --- a/lib/ethdev/rte_ethdev.c
> > +++ b/lib/ethdev/rte_ethdev.c
> > @@ -7008,4 +7008,55 @@ int rte_eth_dev_map_aggr_tx_affinity(uint16_t port_id, uint16_t tx_queue_id,
> >       return ret;
> >  }
> >
> > +int
> > +rte_eth_speed_lanes_get(uint16_t port_id, uint32_t *lane)
> > +{
> > +     struct rte_eth_dev *dev;
> > +
> > +     RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
> > +     dev = &rte_eth_devices[port_id];
> > +
> > +     if (*dev->dev_ops->speed_lanes_get == NULL)
> > +             return -ENOTSUP;
> > +     return eth_err(port_id, (*dev->dev_ops->speed_lanes_get)(dev, lane));
> > +}
> > +
> > +int
> > +rte_eth_speed_lanes_get_capability(uint16_t port_id,
> > +                                struct rte_eth_speed_lanes_capa *speed_lanes_capa,
> > +                                unsigned int num)
> > +{
> > +     struct rte_eth_dev *dev;
> > +     int ret;
> > +
> > +     RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
> > +     dev = &rte_eth_devices[port_id];
> > +
> > +     if (speed_lanes_capa == NULL && num > 0) {
> > +             RTE_ETHDEV_LOG_LINE(ERR,
> > +                                 "Cannot get ethdev port %u speed lanes capability to NULL when array size is non zero",
> > +                                 port_id);
> > +             return -EINVAL;
> > +     }
> >
>
> According to the above check, "speed_lanes_capa == NULL && num == 0" is a valid
> input, I assume this is useful to get expected size of the
> 'speed_lanes_capa' array, but this is not mentioned in the API
> documentation, can you please update API doxygen comment to cover this case.
>
>
Ack
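
The size-query usage that the doxygen comment should cover can be sketched
standalone as follows. The getter here is a mock with a fixed three-entry
table, not the ethdev API; mock_lanes_capa, mock_get_capability() and
fetch_all_capa() are hypothetical names, and the capa values assume that
bit s stands for 1 << s lanes.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

struct mock_lanes_capa {
	uint32_t speed; /* link speed in Mbps */
	uint32_t capa;  /* assumed encoding: bit s set means 1 << s lanes */
};

/*
 * Mock getter. NULL with num == 0 is a size query; a too-small array
 * returns the required count without filling anything.
 */
static int
mock_get_capability(struct mock_lanes_capa *out, unsigned int num)
{
	static const struct mock_lanes_capa table[] = {
		{ 100000, 0x7 }, { 200000, 0x6 }, { 400000, 0xC },
	};
	const unsigned int total = sizeof(table) / sizeof(table[0]);
	unsigned int i;

	if (out == NULL && num > 0)
		return -EINVAL;
	if (out == NULL || num < total)
		return (int)total;
	for (i = 0; i < total; i++)
		out[i] = table[i];
	return (int)total;
}

/* Caller side: query the size, allocate, fetch. The caller frees *out. */
static int
fetch_all_capa(struct mock_lanes_capa **out)
{
	struct mock_lanes_capa *arr;
	int ret;

	assert(out != NULL);
	*out = NULL;

	ret = mock_get_capability(NULL, 0);
	if (ret <= 0)
		return ret;

	arr = calloc((size_t)ret, sizeof(*arr));
	if (arr == NULL)
		return -ENOMEM;

	ret = mock_get_capability(arr, (unsigned int)ret);
	if (ret < 0) {
		free(arr);
		return ret;
	}
	*out = arr;
	return ret;
}
```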

> > +
> > +     if (*dev->dev_ops->speed_lanes_get_capa == NULL)
> > +             return -ENOTSUP;
> >
>
> About the order of the checks, should we first check if the dev_ops
> exists before validating the input arguments?
> If the dev_ops is not available, the input variables don't matter anyway.
>
Ack

> > +     ret = (*dev->dev_ops->speed_lanes_get_capa)(dev, speed_lanes_capa, num);
> > +
> > +     return ret;
> >
>
> API returns -EIO only if it is returned with 'eth_err()', that is to
> cover the hot remove case. It is missing in this function.
>
>
Ack
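
For context, the error-mapping idea behind that comment can be modelled
standalone as below. This is only my understanding of the pattern, written
with mock types: mock_port and mock_eth_err() are hypothetical, and the
removed flag stands in for whatever check detects a hot-removed device.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical port state; "removed" models a hot-unplugged device. */
struct mock_port {
	int removed;
};

/*
 * Pass success through unchanged. For any non-zero driver result,
 * report -EIO if the device is gone, otherwise keep the driver's value.
 */
static int
mock_eth_err(const struct mock_port *port, int ret)
{
	assert(port != NULL);

	if (ret == 0)
		return 0;
	if (port->removed)
		return -EIO;
	return ret;
}
```

Note that with this shape a positive count returned by a capability query
would also be turned into -EIO once the device is removed, since only 0 is
passed through unconditionally.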

> > +}
> > +
> > +int
> > +rte_eth_speed_lanes_set(uint16_t port_id, uint32_t speed_lanes_capa)
> > +{
> > +     struct rte_eth_dev *dev;
> > +
> > +     RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
> > +     dev = &rte_eth_devices[port_id];
> > +
> > +     if (*dev->dev_ops->speed_lanes_set == NULL)
> > +             return -ENOTSUP;
> > +     return eth_err(port_id, (*dev->dev_ops->speed_lanes_set)(dev, speed_lanes_capa));
> > +}
> >
>
> Similar location comment to the header one: instead of adding new APIs
> to the very bottom of the file, can you please group them just below the
> link related APIs?
>
Ack

> > +
> >  RTE_LOG_REGISTER_DEFAULT(rte_eth_dev_logtype, INFO);
> > diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> > index 548fada1c7..35d0b81452 100644
> > --- a/lib/ethdev/rte_ethdev.h
> > +++ b/lib/ethdev/rte_ethdev.h
> > @@ -357,6 +357,30 @@ struct rte_eth_link {
> >  #define RTE_ETH_LINK_MAX_STR_LEN 40 /**< Max length of default link string. */
> >  /**@}*/
> >
> > +/**
> > + * This enum indicates the possible link speed lanes of an ethdev port.
> > + */
> > +enum rte_eth_speed_lanes {
> > +     RTE_ETH_SPEED_LANE_UNKNOWN = 0,  /**< speed lanes unsupported mode or default */
> > +     RTE_ETH_SPEED_LANE_1 = 1,        /**< Link speed lane  1 */
> > +     RTE_ETH_SPEED_LANE_2 = 2,        /**< Link speed lanes 2 */
> > +     RTE_ETH_SPEED_LANE_4 = 4,        /**< Link speed lanes 4 */
> > +     RTE_ETH_SPEED_LANE_8 = 8,        /**< Link speed lanes 8 */
> >
>
> Do we really need enum for the lane number? Why not use it as just number?
> As far as I can see APIs get "uint32 lanes" parameter anyway.
>
>
Ack

> > +     RTE_ETH_SPEED_LANE_MAX,
> >
>
Will take care in the upcoming new patch

> This kind of MAX enum usage causes trouble when we want to extend
> the support in the future.
> For example, when 16 lanes are required, adding it changes the value of MAX,
> and as this is a public structure, the change causes an ABI break, making us
> wait until the next ABI-breaking release.
> So it is better if we can avoid MAX enum usage.
>
Makes sense. Ack

> > +};
> > +
> > +/* Translate from link speed lanes to speed lanes capa */
> > +#define RTE_ETH_SPEED_LANES_TO_CAPA(x) RTE_BIT32(x)
> > +
> > +/* This macro indicates link speed lanes capa mask */
> > +#define RTE_ETH_SPEED_LANES_CAPA_MASK(x) RTE_BIT32(RTE_ETH_SPEED_ ## x)
> >
>
> Why is above macro needed?
>
>
It was meant to be used in parse_speed_lanes to validate user input. It is
no longer used in the new patches, so I will remove it.

> > +
> > +/* A structure used to get and set lanes capabilities per link speed */
> > +struct rte_eth_speed_lanes_capa {
> > +     uint32_t speed;
> > +     uint32_t capa;
> > +};
> > +
> >  /**
> >   * A structure used to configure the ring threshold registers of an Rx/Tx
> >   * queue for an Ethernet port.
> > @@ -6922,6 +6946,74 @@ rte_eth_tx_queue_count(uint16_t port_id, uint16_t queue_id)
> >       return rc;
> >  }
> >
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
> > + *
> > + * Get Active lanes.
> > + *
> > + * @param port_id
> > + *   The port identifier of the Ethernet device.
> > + * @param lanes
> > + *   driver populates a active lanes value whether link is Autonegotiated or Fixed speed.
> >
>
> As these doxygen comments are API documentation, can you please form
> them as proper sentences, like start with uppercase, end with '.', etc...
> Same comment for all APIs.
>
Ack

> > + *
> > + * @return
> > + *   - (0) if successful.
> > + *   - (-ENOTSUP) if underlying hardware OR driver doesn't support.
> > + *     that operation.
> > + *   - (-EIO) if device is removed.
> > + *   - (-ENODEV)  if *port_id* invalid.
> > + */
> > +__rte_experimental
> > +int rte_eth_speed_lanes_get(uint16_t port_id, uint32_t *lanes);
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
> > + *
> > + * Set speed lanes supported by the NIC.
> > + *
> > + * @param port_id
> > + *   The port identifier of the Ethernet device.
> > + * @param speed_lanes
> > + *   speed_lanes a non-zero value of number lanes for this speeds.
> >
>
> 'this speeds' ?
>
>
Ack. "number of lanes for current speed"

> > + *
> > + * @return
> > + *   - (0) if successful.
> > + *   - (-ENOTSUP) if underlying hardware OR driver doesn't support.
> > + *     that operation.
> > + *   - (-EIO) if device is removed.
> > + *   - (-ENODEV)  if *port_id* invalid.
> > + */
> > +__rte_experimental
> > +int rte_eth_speed_lanes_set(uint16_t port_id, uint32_t speed_lanes);
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
> > + *
> > + * Get speed lanes supported by the NIC.
> > + *
> > + * @param port_id
> > + *   The port identifier of the Ethernet device.
> > + * @param speed_lanes_capa
> > + *   speed_lanes_capa int array with valid lanes per speed.
> > + * @param num
> > + *   size of the speed_lanes_capa array.
> > + *
> > + * @return
> > + *   - (0) if successful.
> > + *   - (-ENOTSUP) if underlying hardware OR driver doesn't support.
> > + *     that operation.
> > + *   - (-EIO) if device is removed.
> > + *   - (-ENODEV)  if *port_id* invalid.
> > + *   - (-EINVAL)  if *speed_lanes* invalid
> > + */
> > +__rte_experimental
> > +int rte_eth_speed_lanes_get_capability(uint16_t port_id,
> > +                                    struct rte_eth_speed_lanes_capa *speed_lanes_capa,
> > +                                    unsigned int num);
> > +
> >
>
> The bottom of the header file is for static inline functions.
> Instead of adding these new APIs at the very bottom of the header, can
> you please group them just below the link speed related APIs?
>
>
Ack

> >  #ifdef __cplusplus
> >  }
> >  #endif
> > diff --git a/lib/ethdev/version.map b/lib/ethdev/version.map
> > index 79f6f5293b..db9261946f 100644
> > --- a/lib/ethdev/version.map
> > +++ b/lib/ethdev/version.map
> > @@ -325,6 +325,11 @@ EXPERIMENTAL {
> >       rte_flow_template_table_resizable;
> >       rte_flow_template_table_resize;
> >       rte_flow_template_table_resize_complete;
> > +
> > +     # added in 24.07
> > +     rte_eth_speed_lanes_get;
> > +     rte_eth_speed_lanes_get_capability;
> > +     rte_eth_speed_lanes_set;
> >  };
> >
> >  INTERNAL {
>
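
The return codes listed in the quoted doxygen comments (0, -ENOTSUP, -EIO,
-ENODEV, and -EINVAL for the capability call) can be handled in one place by
a caller. Below is a minimal sketch, assuming a helper named
speed_lanes_rc_str() that is hypothetical and not part of DPDK; it only maps
the documented codes to short messages and does not call the ethdev API.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/*
 * Hypothetical helper (not part of DPDK): translate the return codes
 * documented for the rte_eth_speed_lanes_*() calls into short messages.
 */
static const char *
speed_lanes_rc_str(int rc)
{
	switch (rc) {
	case 0:
		return "success";
	case -ENOTSUP:
		return "not supported by hardware or driver";
	case -EIO:
		return "device removed";
	case -ENODEV:
		return "invalid port_id";
	case -EINVAL:
		return "invalid argument";
	default:
		return "unexpected error";
	}
}
```

An application would pass the value returned by a speed lanes call to this
helper when logging a failure.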


^ permalink raw reply	[relevance 0%]

* [PATCH v8 5/5] dts: add API doc generation
  @ 2024-07-12  8:57  3%   ` Juraj Linkeš
  2024-07-30 13:51  0%     ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Juraj Linkeš @ 2024-07-12  8:57 UTC (permalink / raw)
  To: thomas, Honnappa.Nagarahalli, bruce.richardson, jspewock, probb,
	paul.szczepanek, Luca.Vizzarro, npratte
  Cc: dev, Juraj Linkeš, Luca Vizzarro

The tool used to generate DTS API docs is Sphinx, which is already in
use in DPDK. The same configuration is used to preserve style with one
DTS-specific configuration (so that the DPDK docs are unchanged) that
modifies how the sidebar displays the content.

Sphinx generates the documentation from Python docstrings. The docstring
format is the Google format [0] which requires the sphinx.ext.napoleon
extension. The other extension, sphinx.ext.intersphinx, enables linking
to objects in external documentations, such as the Python documentation.

There are two requirements for building DTS docs:
* The same Python version as DTS or higher, because Sphinx imports the
  code.
* Also the same Python packages as DTS, for the same reason.

[0] https://google.github.io/styleguide/pyguide.html#38-comments-and-docstrings

Signed-off-by: Juraj Linkeš <juraj.linkes@pantheon.tech>
Reviewed-by: Luca Vizzarro <luca.vizzarro@arm.com>
Reviewed-by: Jeremy Spewock <jspewock@iol.unh.edu>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Tested-by: Luca Vizzarro <luca.vizzarro@arm.com>
Tested-by: Nicholas Pratte <npratte@iol.unh.edu>
---
 buildtools/call-sphinx-build.py |  3 +++
 doc/api/doxy-api-index.md       |  3 +++
 doc/api/doxy-api.conf.in        |  2 ++
 doc/api/meson.build             |  4 ++++
 doc/guides/conf.py              | 33 +++++++++++++++++++++++++++++++-
 doc/guides/meson.build          |  1 +
 doc/guides/tools/dts.rst        | 34 ++++++++++++++++++++++++++++++++-
 dts/doc/meson.build             | 27 ++++++++++++++++++++++++++
 dts/meson.build                 | 16 ++++++++++++++++
 meson.build                     |  1 +
 10 files changed, 122 insertions(+), 2 deletions(-)
 create mode 100644 dts/doc/meson.build
 create mode 100644 dts/meson.build

diff --git a/buildtools/call-sphinx-build.py b/buildtools/call-sphinx-build.py
index 693274da4e..dff8471560 100755
--- a/buildtools/call-sphinx-build.py
+++ b/buildtools/call-sphinx-build.py
@@ -14,10 +14,13 @@
 parser.add_argument('version')
 parser.add_argument('src')
 parser.add_argument('dst')
+parser.add_argument('--dts-root', default=None)
 args, extra_args = parser.parse_known_args()
 
 # set the version in environment for sphinx to pick up
 os.environ['DPDK_VERSION'] = args.version
+if args.dts_root:
+    os.environ['DTS_ROOT'] = args.dts_root
 
 sphinx_cmd = [args.sphinx] + extra_args
 
diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
index f9283154f8..cc214ede46 100644
--- a/doc/api/doxy-api-index.md
+++ b/doc/api/doxy-api-index.md
@@ -244,3 +244,6 @@ The public API headers are grouped by topics:
   [experimental APIs](@ref rte_compat.h),
   [ABI versioning](@ref rte_function_versioning.h),
   [version](@ref rte_version.h)
+
+- **tests**:
+  [**DTS**](@dts_api_main_page)
diff --git a/doc/api/doxy-api.conf.in b/doc/api/doxy-api.conf.in
index a8823c046f..c94f02d411 100644
--- a/doc/api/doxy-api.conf.in
+++ b/doc/api/doxy-api.conf.in
@@ -124,6 +124,8 @@ SEARCHENGINE            = YES
 SORT_MEMBER_DOCS        = NO
 SOURCE_BROWSER          = YES
 
+ALIASES                 = "dts_api_main_page=@DTS_API_MAIN_PAGE@"
+
 EXAMPLE_PATH            = @TOPDIR@/examples
 EXAMPLE_PATTERNS        = *.c
 EXAMPLE_RECURSIVE       = YES
diff --git a/doc/api/meson.build b/doc/api/meson.build
index b828b1ed66..ffc75d7b5a 100644
--- a/doc/api/meson.build
+++ b/doc/api/meson.build
@@ -41,6 +41,10 @@ cdata.set('WARN_AS_ERROR', 'NO')
 if get_option('werror')
     cdata.set('WARN_AS_ERROR', 'YES')
 endif
+# A local reference must be relative to the main index.html page
+# The path below can't be taken from the DTS meson file as that would
+# require recursive subdir traversal (doc, dts, then doc again)
+cdata.set('DTS_API_MAIN_PAGE', join_paths('..', 'dts', 'html', 'index.html'))
 
 # configure HTML Doxygen run
 html_cdata = configuration_data()
diff --git a/doc/guides/conf.py b/doc/guides/conf.py
index 8b440fb2a9..b442a1f76c 100644
--- a/doc/guides/conf.py
+++ b/doc/guides/conf.py
@@ -9,7 +9,7 @@
 from os import environ
 from os.path import basename, dirname
 from os.path import join as path_join
-from sys import argv, stderr
+from sys import argv, stderr, path
 
 import configparser
 
@@ -23,6 +23,37 @@
           file=stderr)
     pass
 
+# Napoleon enables the Google format of Python docstrings, used in DTS
+# Intersphinx allows linking to external projects, such as Python docs, also used in DTS
+extensions = ['sphinx.ext.napoleon', 'sphinx.ext.intersphinx']
+
+# DTS Python docstring options
+autodoc_default_options = {
+    'members': True,
+    'member-order': 'bysource',
+    'show-inheritance': True,
+}
+autodoc_class_signature = 'separated'
+autodoc_typehints = 'both'
+autodoc_typehints_format = 'short'
+autodoc_typehints_description_target = 'documented'
+napoleon_numpy_docstring = False
+napoleon_attr_annotations = True
+napoleon_preprocess_types = True
+add_module_names = False
+toc_object_entries = True
+toc_object_entries_show_parents = 'hide'
+intersphinx_mapping = {'python': ('https://docs.python.org/3', None)}
+
+dts_root = environ.get('DTS_ROOT')
+if dts_root:
+    path.append(dts_root)
+    # DTS Sidebar config
+    html_theme_options = {
+        'collapse_navigation': False,
+        'navigation_depth': -1,
+    }
+
 stop_on_error = ('-W' in argv)
 
 project = 'Data Plane Development Kit'
diff --git a/doc/guides/meson.build b/doc/guides/meson.build
index 51f81da2e3..8933d75f6b 100644
--- a/doc/guides/meson.build
+++ b/doc/guides/meson.build
@@ -1,6 +1,7 @@
 # SPDX-License-Identifier: BSD-3-Clause
 # Copyright(c) 2018 Intel Corporation
 
+doc_guides_source_dir = meson.current_source_dir()
 sphinx = find_program('sphinx-build', required: get_option('enable_docs'))
 
 if not sphinx.found()
diff --git a/doc/guides/tools/dts.rst b/doc/guides/tools/dts.rst
index 515b15e4d8..77df7a0378 100644
--- a/doc/guides/tools/dts.rst
+++ b/doc/guides/tools/dts.rst
@@ -292,7 +292,12 @@ and try not to divert much from it.
 The :ref:`DTS developer tools <dts_dev_tools>` will issue warnings
 when some of the basics are not met.
 
-The code must be properly documented with docstrings.
+The API documentation, which is a helpful reference when developing, may be accessed
+in the code directly or generated with the :ref:`API docs build steps <building_api_docs>`.
+When adding new files or modifying the directory structure,
+the corresponding changes must be made to DTS api doc sources in ``dts/doc``.
+
+Speaking of which, the code must be properly documented with docstrings.
 The style must conform to the `Google style
 <https://google.github.io/styleguide/pyguide.html#38-comments-and-docstrings>`_.
 See an example of the style `here
@@ -427,6 +432,33 @@ the DTS code check and format script.
 Refer to the script for usage: ``devtools/dts-check-format.sh -h``.
 
 
+.. _building_api_docs:
+
+Building DTS API docs
+---------------------
+
+To build DTS API docs, install the dependencies with Poetry, then enter its shell:
+
+.. code-block:: console
+
+   poetry install --no-root --with docs
+   poetry shell
+
+The documentation is built using the standard DPDK build system.
+After executing the meson command and entering Poetry's shell, build the documentation with:
+
+.. code-block:: console
+
+   ninja -C build dts-doc
+
+The output is generated in ``build/doc/api/dts/html``.
+
+.. Note::
+
+   Make sure to fix any Sphinx warnings when adding or updating docstrings,
+   and also run the ``devtools/dts-check-format.sh`` script and address any issues it finds.
+
+
 Configuration Schema
 --------------------
 
diff --git a/dts/doc/meson.build b/dts/doc/meson.build
new file mode 100644
index 0000000000..01b7b51034
--- /dev/null
+++ b/dts/doc/meson.build
@@ -0,0 +1,27 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2023 PANTHEON.tech s.r.o.
+
+sphinx = find_program('sphinx-build', required: false)
+sphinx_apidoc = find_program('sphinx-apidoc', required: false)
+
+if not sphinx.found() or not sphinx_apidoc.found()
+    subdir_done()
+endif
+
+dts_doc_api_build_dir = join_paths(doc_api_build_dir, 'dts')
+
+extra_sphinx_args = ['-E', '-c', doc_guides_source_dir, '--dts-root', dts_dir]
+if get_option('werror')
+    extra_sphinx_args += '-W'
+endif
+
+htmldir = join_paths(get_option('datadir'), 'doc', 'dpdk', 'dts')
+dts_api_html = custom_target('dts_api_html',
+        output: 'html',
+        command: [sphinx_wrapper, sphinx, meson.project_version(),
+            meson.current_source_dir(), dts_doc_api_build_dir, extra_sphinx_args],
+        build_by_default: false,
+        install: get_option('enable_docs'),
+        install_dir: htmldir)
+doc_targets += dts_api_html
+doc_target_names += 'DTS_API_HTML'
diff --git a/dts/meson.build b/dts/meson.build
new file mode 100644
index 0000000000..e8ce0f06ac
--- /dev/null
+++ b/dts/meson.build
@@ -0,0 +1,16 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2023 PANTHEON.tech s.r.o.
+
+doc_targets = []
+doc_target_names = []
+dts_dir = meson.current_source_dir()
+
+subdir('doc')
+
+if doc_targets.length() == 0
+    message = 'No docs targets found'
+else
+    message = 'Built docs:'
+endif
+run_target('dts-doc', command: [echo, message, doc_target_names],
+    depends: doc_targets)
diff --git a/meson.build b/meson.build
index 8b248d4505..835973a0ce 100644
--- a/meson.build
+++ b/meson.build
@@ -87,6 +87,7 @@ subdir('app')
 
 # build docs
 subdir('doc')
+subdir('dts')
 
 # build any examples explicitly requested - useful for developers - and
 # install any example code into the appropriate install path
-- 
2.34.1


^ permalink raw reply	[relevance 3%]

* RE: [EXTERNAL] [PATCH v6] graph: expose node context as pointers
  2024-07-05 14:52  4% [PATCH v6] graph: expose node context as pointers Robin Jarry
@ 2024-07-12 11:39  0% ` Kiran Kumar Kokkilagadda
  0 siblings, 0 replies; 200+ results
From: Kiran Kumar Kokkilagadda @ 2024-07-12 11:39 UTC (permalink / raw)
  To: Robin Jarry, dev, Jerin Jacob, Nithin Kumar Dabilpuram, Zhirun Yan



> -----Original Message-----
> From: Robin Jarry <rjarry@redhat.com>
> Sent: Friday, July 5, 2024 8:23 PM
> To: dev@dpdk.org; Jerin Jacob <jerinj@marvell.com>; Kiran Kumar
> Kokkilagadda <kirankumark@marvell.com>; Nithin Kumar Dabilpuram
> <ndabilpuram@marvell.com>; Zhirun Yan <yanzhirun_163@163.com>
> Subject: [EXTERNAL] [PATCH v6] graph: expose node context as pointers
> 
> In some cases, the node context data is used to store two pointers because
> the data is larger than the reserved 16 bytes. Having to define intermediate
> structures just to be able to cast is tedious. And without intermediate
> structures, casting to opaque pointers is hard without violating strict aliasing
> rules.
> 
> Add an unnamed union to allow storing opaque pointers in the node
> context. Unfortunately, aligning an unnamed union that contains an array
> produces inconsistent results between C and C++. To preserve ABI/API
> compatibility in both C and C++, move all fast-path area fields into an
> unnamed struct which is itself cache aligned. Use __rte_cache_aligned to
> preserve existing alignment on architectures where cache lines are 128 bytes.
> 
> Add a static assert to ensure that the fast path area does not grow beyond a
> 64 bytes cache line.
> 
> Signed-off-by: Robin Jarry <rjarry@redhat.com>
> ---

Acked-by: Kiran Kumar Kokkilagadda <kirankumark@marvell.com>

> 
> Notes:
>     v6:
> 
>     * Fix ABI breakage on arm64 (and all platforms that have RTE_CACHE_LINE_SIZE=128).
>     * This patch will cause CI failures without libabigail 2.5. See this commit
>       https://sourceware.org/git/?p=libabigail.git;a=commitdiff;h=f821c2be3fff2047ef8fc436f6f02301812d166f
>       for more details.
> 
>     v5:
> 
>     * Helper functions to hide casting proved to be harder than expected.
>       Naive casting may even be impossible without breaking strict aliasing
>       rules. The only other option would be to use explicit memcpy calls.
>     * Unnamed union tentative again. As suggested by Tyler (thank you!),
>       using an intermediate unnamed struct to carry the alignment produces
>       consistent ABI in C and C++.
>     * Also, Tyler (thank you!) suggested that the fast path area alignment
>       size may be incorrect for architectures where the cache line is not 64
>       bytes. There will be a 64 bytes hole in the structure at the end of
>       the unnamed struct before the zero length next nodes array. Use
>       __rte_cache_min_aligned to preserve existing alignment.
> 
>     v4:
> 
>     * Replaced the unnamed union with helper inline functions.
> 
>     v3:
> 
>     * Added __extension__ to the unnamed struct inside the union.
>     * Fixed C++ header checks.
>     * Replaced alignas() with an explicit static_assert.
> 
>  lib/graph/rte_graph_worker_common.h | 29 +++++++++++++++++++++--------
>  1 file changed, 21 insertions(+), 8 deletions(-)
> 
> diff --git a/lib/graph/rte_graph_worker_common.h b/lib/graph/rte_graph_worker_common.h
> index 36d864e2c14e..8d8956fdddda 100644
> --- a/lib/graph/rte_graph_worker_common.h
> +++ b/lib/graph/rte_graph_worker_common.h
> @@ -12,7 +12,9 @@
>   * process, enqueue and move streams of objects to the next nodes.
>   */
> 
> +#include <assert.h>
>  #include <stdalign.h>
> +#include <stddef.h>
> 
>  #include <rte_common.h>
>  #include <rte_cycles.h>
> @@ -111,14 +113,21 @@ struct __rte_cache_aligned rte_node {
>  		} dispatch;
>  	};
>  	/* Fast path area  */
> +	__extension__ struct __rte_cache_aligned {
>  #define RTE_NODE_CTX_SZ 16
> -	alignas(RTE_CACHE_LINE_SIZE) uint8_t ctx[RTE_NODE_CTX_SZ]; /**< Node Context. */
> -	uint16_t size;		/**< Total number of objects available. */
> -	uint16_t idx;		/**< Number of objects used. */
> -	rte_graph_off_t off;	/**< Offset of node in the graph reel. */
> -	uint64_t total_cycles;	/**< Cycles spent in this node. */
> -	uint64_t total_calls;	/**< Calls done to this node. */
> -	uint64_t total_objs;	/**< Objects processed by this node. */
> +		union {
> +			uint8_t ctx[RTE_NODE_CTX_SZ];
> +			__extension__ struct {
> +				void *ctx_ptr;
> +				void *ctx_ptr2;
> +			};
> +		}; /**< Node Context. */
> +		uint16_t size;		/**< Total number of objects available. */
> +		uint16_t idx;		/**< Number of objects used. */
> +		rte_graph_off_t off;	/**< Offset of node in the graph reel. */
> +		uint64_t total_cycles;	/**< Cycles spent in this node. */
> +		uint64_t total_calls;	/**< Calls done to this node. */
> +		uint64_t total_objs;	/**< Objects processed by this node. */
>  		union {
>  			void **objs;	   /**< Array of object pointers. */
>  			uint64_t objs_u64;
> @@ -127,9 +136,13 @@ struct __rte_cache_aligned rte_node {
>  			rte_node_process_t process; /**< Process function. */
>  			uint64_t process_u64;
>  		};
> -	alignas(RTE_CACHE_LINE_MIN_SIZE) struct rte_node *nodes[]; /**< Next nodes. */
> +		alignas(RTE_CACHE_LINE_MIN_SIZE) struct rte_node *nodes[]; /**< Next nodes. */
> +	};
>  };
> 
> +static_assert(offsetof(struct rte_node, nodes) - offsetof(struct rte_node, ctx)
> +	== RTE_CACHE_LINE_MIN_SIZE, "rte_node fast path area must fit in 64 bytes");
> +
>  /**
>   * @internal
>   *
> --
> 2.45.2
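
The layout idea in the quoted patch (an unnamed union that overlays the
16-byte context with two opaque pointers, wrapped in an unnamed struct that
carries the alignment, plus a static assert on the fast path size) can be
sketched outside DPDK. This is a simplified stand-in, not the real
struct rte_node: every demo_* and DEMO_* name is hypothetical, the alignment
uses the GCC/Clang aligned attribute in place of __rte_cache_aligned, and a
single aligned pointer replaces the flexible nodes[] array.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_NODE_CTX_SZ 16     /* mirrors RTE_NODE_CTX_SZ in the patch */
#define DEMO_FAST_PATH_ALIGN 64 /* assumed stand-in for the cache line size */

/* Simplified, hypothetical stand-in for struct rte_node. */
struct demo_node {
	uint64_t slow_path; /* placeholder for the slow-path fields */
	/* Fast path area: the unnamed struct carries the alignment. */
	struct __attribute__((aligned(DEMO_FAST_PATH_ALIGN))) {
		union {
			uint8_t ctx[DEMO_NODE_CTX_SZ];
			struct {
				void *ctx_ptr;
				void *ctx_ptr2;
			};
		};
		uint16_t size;
		uint16_t idx;
		uint32_t off;
		uint64_t total_cycles;
		uint64_t total_calls;
		uint64_t total_objs;
		union {
			void **objs;
			uint64_t objs_u64;
		};
		union {
			void (*process)(void);
			uint64_t process_u64;
		};
		/* Stand-in for the aligned nodes[] array that ends the area. */
		alignas(DEMO_FAST_PATH_ALIGN) void *next_nodes;
	};
};

/* Two pointers must fit in the context bytes they overlay. */
static_assert(2 * sizeof(void *) <= DEMO_NODE_CTX_SZ,
	"two pointers must fit in the node context");

/* Same kind of check as the patch: the fast path area is one 64-byte line. */
static_assert(offsetof(struct demo_node, next_nodes) -
	offsetof(struct demo_node, ctx) == DEMO_FAST_PATH_ALIGN,
	"fast path area must fit in 64 bytes");

/* Distance in bytes from the context to the end of the fast path area. */
static size_t
demo_fast_path_size(void)
{
	return offsetof(struct demo_node, next_nodes) -
		offsetof(struct demo_node, ctx);
}

/* Store two pointers through the union and read them back without casts. */
static int
demo_ctx_roundtrip(void)
{
	struct demo_node n;
	int a = 1, b = 2;

	n.ctx_ptr = &a;
	n.ctx_ptr2 = &b;
	return (void *)n.ctx == (void *)&n.ctx_ptr &&
		n.ctx_ptr == &a && *(int *)n.ctx_ptr2 == 2;
}
```

The point of the union is that node code can assign ctx_ptr and ctx_ptr2
directly instead of casting the ctx byte array to a pointer type.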


^ permalink raw reply	[relevance 0%]

* Re: [RFC v2] ethdev: an API for cache stashing hints
  @ 2024-07-17  2:27  3% ` Stephen Hemminger
  2024-07-18 18:48  0%   ` Wathsala Wathawana Vithanage
  2024-07-20  3:05  3%   ` Honnappa Nagarahalli
  0 siblings, 2 replies; 200+ results
From: Stephen Hemminger @ 2024-07-17  2:27 UTC (permalink / raw)
  To: Wathsala Vithanage
  Cc: dev, Thomas Monjalon, Ferruh Yigit, Andrew Rybchenko, nd, Dhruv Tripathi

On Mon, 15 Jul 2024 22:11:41 +0000
Wathsala Vithanage <wathsala.vithanage@arm.com> wrote:

> An application provides cache stashing hints to the ethernet devices to
> improve memory access latencies from the CPU and the NIC. This patch
> introduces three distinct hints for this purpose.
> 
> The RTE_ETH_DEV_STASH_HINT_HOST_WILLNEED hint indicates that the host
> (CPU) requires the data written by the NIC immediately. This implies
> that the CPU expects to read data from its local cache rather than LLC
> or main memory if possible. This would improve memory access latency in
> the Rx path. For PCI devices with TPH capability, these hints translate
> into DWHR (Device Writes Host Reads) access pattern. This hint is only
> valid for receive queues.
> 
> The RTE_ETH_DEV_STASH_HINT_BI_DIR_DATA hint indicates that the host and
> the device access the data structure equally. Rx/Tx queue descriptors
> fit the description of such data. This hint applies to both Rx and Tx
> directions.  In the PCI TPH context, this hint translates into a
> Bi-Directional access pattern.
> 
> RTE_ETH_DEV_STASH_HINT_DEV_ONLY hint indicates that the CPU is not
> involved in a given device's receive or transmit paths. This implies
> that only devices are involved in the IO path. Depending on the
> implementation, this hint may result in data getting placed in a cache
> close to the device or not cached at all. For PCI devices with TPH
> capability, this hint translates into D*D* (DWDR, DRDW, DWDW, DRDR)
> access patterns. This is a bidirectional hint, and it can be applied to
> both Rx and Tx queues.  
> 
> The RTE_ETH_DEV_STASH_HINT_HOST_DONTNEED hint indicates that the device
> reads data written by the host (CPU) that may still be in the host's
> local cache but is not required by the host anytime soon. This hint is
> intended to prevent unnecessary cache invalidations that cause
> interconnect latencies when a device writes to a buffer already in host
> cache memory. In DPDK, this could happen with the recycling of mbufs
> where a mbuf is placed in the Tx queue that then gets back into mempool
> and gets recycled back into the Rx queue, all while a copy is being held
> in the CPU's local cache unnecessarily. By using this hint on supported
> platforms, the mbuf will be invalidated after the device completes the
> buffer reading, but it will be well before the buffer gets recycled and
> updated in the Rx path. This hint is only valid for transmit queues. 
> 
> Applications use three main interfaces in the ethdev library to discover
> and set cache stashing hints. rte_eth_dev_stashing_hints_tx interface is
> used to set hints on a Tx queue. rte_eth_dev_stashing_hints_rx interface
> is used to set hints on an Rx queue. Both of these functions take the
> following parameters as inputs: a port_id (the id of the ethernet
> device), a cpu_id (the target CPU), a cache_level (the level of the
> cache hierarchy the data should be stashed into), a queue_id (the queue
> the hints are applied to). In addition to the above list of parameters,
> a type parameter indicates the type of the object the application
> expects to be stashed by the hardware. Depending on the hardware, these
> may vary. Intel E810 NICs support the stashing of Rx/Tx descriptors,
> packet headers, and packet payloads. These are indicated by the macros
> RTE_ETH_DEV_STASH_TYPE_DESC, RTE_ETH_DEV_STASH_TYPE_HEADER,
> RTE_ETH_DEV_STASH_TYPE_PAYLOAD. Hardware capable of stashing data at any
> given offset into a packet can use the RTE_ETH_DEV_STASH_TYPE_OFFSET
> type. When an offset is used, the offset parameter in the above two
> functions should be set appropriately.
> 
> rte_eth_dev_stashing_hints_discover is used to discover the object types
> and hints supported in the platform and the device. The function takes
> types and hints pointers used as a bit vector to indicate hints and
> types supported by the NIC. An application that intends to use stashing
> hints should first discover supported hints and types and then use the
> functions rte_eth_dev_stashing_hints_tx and
> rte_eth_dev_stashing_hints_rx as required to set stashing hints
> accordingly. eth_dev_ops structure has been updated with two new ops
> that a PMD should implement to support cache stashing hints. A PMD that
> intends to support cache stashing hints should initialize the
> set_stashing_hints function pointer to a function that issues hints to
> the underlying hardware in compliance with platform capabilities. The
> same PMD should also implement a function that can return two-bit fields
> indicating supported types and hints and then initialize the
> discover_stashing_hints function pointer with it. If the NIC supports
> cache stashing hints, the NIC should always set the
> RTE_ETH_DEV_CAPA_CACHE_STASHING device capability.
> 
> Signed-off-by: Wathsala Vithanage <wathsala.vithanage@arm.com>
> Reviewed-by: Dhruv Tripathi <dhruv.tripathi@arm.com>

My initial reaction is negative on this. The DPDK does not need more nerd knobs
for performance. If it is a performance win, it should be automatic and handled
by the driver.

If you absolutely have to have another flag, then it should be in existing config
(yes, extend the ABI) rather than adding more flags and calls in ethdev.
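
For readers following the RFC text quoted above, the "discover first, then
set" flow and the per-direction validity of each hint can be sketched as
below. This is only an illustration under assumptions: the DEMO_* names
stand in for the RFC's RTE_ETH_DEV_STASH_HINT_* names, the bit positions are
invented because the quoted text does not give them, and no ethdev function
is called.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Stand-ins for the RFC's RTE_ETH_DEV_STASH_HINT_* names.
 * The bit positions are assumptions; the quoted RFC text does not give them.
 */
#define DEMO_STASH_HINT_BI_DIR_DATA   (UINT16_C(1) << 0)
#define DEMO_STASH_HINT_DEV_ONLY      (UINT16_C(1) << 1)
#define DEMO_STASH_HINT_HOST_WILLNEED (UINT16_C(1) << 2)
#define DEMO_STASH_HINT_HOST_DONTNEED (UINT16_C(1) << 3)

enum demo_queue_dir { DEMO_QUEUE_RX, DEMO_QUEUE_TX };

/*
 * Hints the RFC describes as valid for a queue direction:
 * HOST_WILLNEED is Rx only, HOST_DONTNEED is Tx only,
 * BI_DIR_DATA and DEV_ONLY apply to both.
 */
static uint16_t
demo_hints_valid_for(enum demo_queue_dir dir)
{
	uint16_t both = (uint16_t)(DEMO_STASH_HINT_BI_DIR_DATA |
		DEMO_STASH_HINT_DEV_ONLY);

	if (dir == DEMO_QUEUE_RX)
		return (uint16_t)(both | DEMO_STASH_HINT_HOST_WILLNEED);
	return (uint16_t)(both | DEMO_STASH_HINT_HOST_DONTNEED);
}

/*
 * Keep only the wanted hints that the device reported as supported
 * (the result of a discover step) and that are valid for the direction.
 */
static uint16_t
demo_usable_hints(enum demo_queue_dir dir, uint16_t supported, uint16_t wanted)
{
	return (uint16_t)(wanted & supported & demo_hints_valid_for(dir));
}
```

In the flow the RFC describes, the supported mask would come from the
discover call and the filtered result would be passed to the Rx or Tx
set call.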

^ permalink raw reply	[relevance 3%]

* [DPDK/eventdev Bug 1497] [dpdk-24.07] [ABI][meson test] driver-tests/event_dma_adapter_autotest test hang when do ABI testing
@ 2024-07-18  3:42 10% bugzilla
  0 siblings, 0 replies; 200+ results
From: bugzilla @ 2024-07-18  3:42 UTC (permalink / raw)
  To: dev

[-- Attachment #1: Type: text/plain, Size: 4656 bytes --]

https://bugs.dpdk.org/show_bug.cgi?id=1497

            Bug ID: 1497
           Summary: [dpdk-24.07] [ABI][meson test]
                    driver-tests/event_dma_adapter_autotest test hang when
                    do ABI testing
           Product: DPDK
           Version: 24.07
          Hardware: All
                OS: All
            Status: UNCONFIRMED
          Severity: normal
          Priority: Normal
         Component: eventdev
          Assignee: dev@dpdk.org
          Reporter: yux.jiang@intel.com
  Target Milestone: ---

[Environment]

DPDK version: 24.07.0-rc2
DPDK ABI version: 23.11.0
OS: RHEL9.0/5.14.0-70.13.1.el9_0.x86_64
Compiler: gcc version 11.2.1 20220127 (Red Hat 11.2.1-9)
Hardware platform: Intel(R) Xeon(R) Platinum 8180 CPU @ 2.50GHz


[Test Setup]
Steps to reproduce
List the steps to reproduce the issue.

1, Build latest main dpdk24.03-rc1
rm -rf x86_64-native-linuxapp-gcc
CC=gcc meson -Denable_kmods=True -Dlibdir=lib  --default-library=shared
x86_64-native-linuxapp-gcc
ninja -C x86_64-native-linuxapp-gcc
rm -rf /root/tmp/dpdk_share_lib /root/shared_lib_dpdk
DESTDIR=/root/tmp/dpdk_share_lib ninja -C x86_64-native-linuxapp-gcc -j 110
install
mv /root/tmp/dpdk_share_lib/usr/local/lib /root/shared_lib_dpdk
ll /root/shared_lib_dpdk
cat /root/.bashrc | grep LD_LIBRARY_PATH
sed -i 's#export LD_LIBRARY_PATH=.*#export
LD_LIBRARY_PATH=/root/shared_lib_dpdk#g' /root/.bashrc

2, Build LTS dpdk23.11.0
rm /root/dpdk
tar zxvf dpdk_abi.tar.gz -C ~
cd ~/dpdk/
rm -rf x86_64-native-linuxapp-gcc
CC=gcc meson -Denable_kmods=True -Dlibdir=lib  --default-library=shared
x86_64-native-linuxapp-gcc
ninja -C x86_64-native-linuxapp-gcc
rm -rf x86_64-native-linuxapp-gcc/lib
rm -rf x86_64-native-linuxapp-gcc/drivers

3, Launch dpdk-test and run event_dma_adapter_autotest
MALLOC_PERTURB_=132 DPDK_TEST=event_dma_adapter_autotest
/root/dpdk/x86_64-native-linuxapp-gcc/app/dpdk-test -c 0xff -d
/root/shared_lib_dpdk --vdev=dma_skeleton


Show the output from the previous commands.
[root@ABI-80 dpdk]# MALLOC_PERTURB_=132 DPDK_TEST=event_dma_adapter_autotest
/root/dpdk/x86_64-native-linuxapp-gcc/app/dpdk-test -c 0xff -d
/root/shared_lib_dpdk --vdev=dma_skeleton
EAL: Detected CPU lcores: 112
EAL: Detected NUMA nodes: 2
EAL: Detected shared linkage of DPDK
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'VA'
EAL: VFIO support initialized
skeldma_probe(): Create dma_skeleton dmadev with lcore-id -1
APP: HPET is not enabled, using TSC as default timer
RTE>>event_dma_adapter_autotest
 + ------------------------------------------------------- +
 + Test Suite : Event dma adapter test suite
 + ------------------------------------------------------- +
 + TestCase [ 0] : test_dma_adapter_create succeeded
 + TestCase [ 1] : test_dma_adapter_vchan_add_del succeeded
 +------------------------------------------------------+
 + DMA adapter stats for instance 0:
 + Event port poll count         0x0
 + Event dequeue count           0x0
 + DMA dev enqueue count         0x0
 + DMA dev enqueue failed count  0x0
 + DMA dev dequeue count         0x0
 + Event enqueue count           0x0
 + Event enqueue retry count     0x0
 + Event enqueue fail count      0x0
 +------------------------------------------------------+
 + TestCase [ 2] : test_dma_adapter_stats succeeded
 + TestCase [ 3] : test_dma_adapter_params succeeded

[Expected Result]
Test ok.

[Regression]
Is this issue a regression: (Y/N) Y
The first bad commit:
commit 588dcac2361011556934166d93da62dae712ce69
Author: Pavan Nikhilesh <pbhagavatula@marvell.com>
Date:   Fri Jun 7 16:06:25 2024 +0530

    eventdev/dma: reorganize event DMA ops

    Re-organize event DMA ops structure to allow holding
    source and destination pointers without the need for
    additional memory, the mempool allocating memory for
    rte_event_dma_adapter_ops can size the structure to
    accommodate all the needed source and destination
    pointers.

    Add multiple words for holding user metadata, adapter
    implementation specific metadata and event metadata.

    Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
    Acked-by: Amit Prakash Shukla <amitprakashs@marvell.com>

-----------Note---------
Based on dpdk24.07-rc2, which includes the fix patch for
https://bugs.dpdk.org/show_bug.cgi?id=1469, the test still hangs.

Please confirm whether this needs to be fixed for ABI compatibility testing, or
whether this case does not need to be run for ABI compatibility testing. Thanks.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[-- Attachment #2: Type: text/html, Size: 6939 bytes --]

^ permalink raw reply	[relevance 10%]

* Re: [PATCH] net/mlx5: fix compilation warning in GCC-9.1
  2024-07-07  9:57  4% [PATCH] net/mlx5: fix compilation warning in GCC-9.1 Gregory Etelson
@ 2024-07-18  7:24  4% ` Raslan Darawsheh
  0 siblings, 0 replies; 200+ results
From: Raslan Darawsheh @ 2024-07-18  7:24 UTC (permalink / raw)
  To: Gregory Etelson, dev
  Cc: Maayan Kashani, stable, Dariusz Sosnowski, Slava Ovsiienko,
	Bing Zhao, Ori Kam, Suanming Mou, Matan Azrad

Hi,

From: Gregory Etelson <getelson@nvidia.com>
Sent: Sunday, July 7, 2024 12:57 PM
To: dev@dpdk.org
Cc: Gregory Etelson; Maayan Kashani; Raslan Darawsheh; stable@dpdk.org; Dariusz Sosnowski; Slava Ovsiienko; Bing Zhao; Ori Kam; Suanming Mou; Matan Azrad
Subject: [PATCH] net/mlx5: fix compilation warning in GCC-9.1

GCC has introduced a bugfix in 9.1 that changed GCC ABI in ARM setups:
https://gcc.gnu.org/gcc-9/changes.html
```
On Arm targets (arm*-*-*), a bug in the implementation of the
procedure call standard (AAPCS) in the GCC 6, 7 and 8 releases
has been fixed: a structure containing a bit-field based on a 64-bit
integral type and where no other element in a structure required
64-bit alignment could be passed incorrectly to functions.
This is an ABI change. If the option -Wpsabi is enabled
(on by default) the compiler will emit a diagnostic note for code
that might be affected.
```

The patch fixes PMD compilation in the INTEGRITY flow item.

Fixes: 23b0a8b298b1 ("net/mlx5: fix integrity item validation and translation")

Cc: stable@dpdk.org

Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Dariusz Sosnowski <dsosnowski@nvidia.com>

Patch applied to next-net-mlx,

Kindest regards,
Raslan Darawsheh
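
The kind of structure the quoted GCC 9 note refers to can be illustrated
with a small sketch. The struct and function names below are hypothetical
and are not taken from the mlx5 PMD, and passing by pointer is shown only as
one common way to avoid by-value passing; this thread does not show what the
applied patch actually changed.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical structure of the kind the GCC 9 note describes: bit-fields
 * based on a 64-bit integral type, and no other member that requires
 * 64-bit alignment.
 */
struct demo_item_flags {
	uint64_t l3_ok : 1;
	uint64_t l4_ok : 1;
	uint32_t level;
};

/* By-value parameter: on Arm targets GCC >= 9.1 may emit a -Wpsabi note. */
static uint32_t
demo_level_by_value(struct demo_item_flags f)
{
	return f.l3_ok ? f.level : 0;
}

/* By-pointer parameter: the structure itself is not passed by value. */
static uint32_t
demo_level_by_ptr(const struct demo_item_flags *f)
{
	return f->l3_ok ? f->level : 0;
}

/* Both variants compute the same result; only the calling convention differs. */
static int
demo_variants_agree(void)
{
	struct demo_item_flags f = { .l3_ok = 1, .l4_ok = 0, .level = 7 };

	return demo_level_by_value(f) == 7 && demo_level_by_ptr(&f) == 7;
}
```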


^ permalink raw reply	[relevance 4%]

* RE: [RFC v2] ethdev: an API for cache stashing hints
  2024-07-17  2:27  3% ` Stephen Hemminger
@ 2024-07-18 18:48  0%   ` Wathsala Wathawana Vithanage
  2024-07-20  3:05  3%   ` Honnappa Nagarahalli
  1 sibling, 0 replies; 200+ results
From: Wathsala Wathawana Vithanage @ 2024-07-18 18:48 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: dev, thomas, Ferruh Yigit, Andrew Rybchenko, nd, Dhruv Tripathi,
	Honnappa Nagarahalli, nd

> 
> My initial reaction is negative on this. The DPDK does not need more nerd
> knobs for performance. If it is a performance win, it should be automatic and
> handled by the driver.
> 
> If you absolutely have to have another flag, then it should be in existing config
> (yes, extend the ABI) rather than adding more flags and calls in ethdev.


Thanks, Steve, for the feedback. My thesis is that in a DPDK-based packet processing system,
the application is more knowledgeable of memory buffer (packets) usage than the generic
underlying hardware or the PMD (I have provided some examples below with the hint they
would map into). Recognizing such cases, PCI SIG introduced TLP Packet Processing Hints (TPH).
Consequently, many interconnect designers enabled support for TPH in their interconnects so
that based on steering tags provided by an application to a NIC, which sets them in the TLP
header, memory buffers can be targeted toward a CPU at the desired level in the cache hierarchy.
With this proposed API, applications provide cache-stashing hints to ethernet devices to improve
memory access latencies from the CPU and the NIC to improve system performance.

Listed below are some use cases.

- A run-to-completion application may not need the next packet immediately in L1D. It may rather
issue a prefetch and do other work with packet and application data already in L1D before it needs
the next packet. A generic PMD will not know such subtleties in the application endpoint, and it
would resort to stashing buffers into the L1D indiscriminately or not at all. But, with a hint from
the application, packet buffers can be stashed at a cache level suitable for the
application. (like UNIX MADV_DONTNEED but for mbufs at cache line granularity)

- Similarly, a pipelined application may use a hint that advises that the buffers are needed in L1D as soon
as they arrive. (parallels MADV_WILLNEED)

- Let's call the time between a mbuf being allocated into an Rx queue, freed back into mempool in
the Tx path, and once again reallocated back in the same Rx queue the "buffer recycle window".
The length of the buffer recycle window is a function of the application in question; the PMD or the
NIC has no prior knowledge of this property of an application. A buffer may stay in the L1D of a CPU
throughout the entire recycle window if the window is short enough for that application.
An application with a short buffer recycle window may hint to the platform that the Tx buffer is not
needed anytime soon in the CPU cache via a hint to avoid unnecessary cache invalidations when
the buffer gets written by the Rx packet for the second time. (parallels MADV_DONTNEED)

^ permalink raw reply	[relevance 0%]

* Re: IPv6 APIs rework
  @ 2024-07-18 21:34  3%     ` Robin Jarry
  2024-07-19  8:25  0%       ` Konstantin Ananyev
  2024-07-19  9:12  0%       ` Morten Brørup
  0 siblings, 2 replies; 200+ results
From: Robin Jarry @ 2024-07-18 21:34 UTC (permalink / raw)
  To: Vladimir Medvedkin, Morten Brørup
  Cc: dev, Sunil Kumar Kori, Rakesh Kudurumalla, Vladimir Medvedkin,
	Wisam Jaddo, Cristian Dumitrescu, Konstantin Ananyev,
	Akhil Goyal, Fan Zhang, Bruce Richardson, Yipeng Wang,
	Sameh Gobriel, Nithin Dabilpuram, Kiran Kumar K, Satha Rao,
	Harman Kalra, Ankur Dwivedi, Anoob Joseph, Tejasree Kondoj,
	Gagandeep Singh, Hemant Agrawal, Ajit Khaparde, Somnath Kotur,
	Chas Williams, Min Hu (Connor),
	Potnuri Bharat Teja, Sachin Saxena, Ziyang Xuan, Xiaoyun Wang,
	Jie Hai, Yisen Zhuang, Jingjing Wu, Dariusz Sosnowski,
	Viacheslav Ovsiienko, Bing Zhao, Ori Kam, Suanming Mou,
	Matan Azrad, Chaoyong He, Devendra Singh Rawat, Alok Prasad,
	Andrew Rybchenko, Stephen Hemminger, Jiawen Wu, Jian Wang,
	Thomas Monjalon, Ferruh Yigit, Jiayu Hu, Pavan Nikhilesh,
	Maxime Coquelin, Chenbo Xia

Vladimir Medvedkin, Jul 18, 2024 at 23:25:
> I think alignment should be 1 since in FIB6 users usually don't copy IPv6
> address and just provide a pointer to the memory inside the packet. Current
> vector implementation loads IPv6 addresses using unaligned access (
> _mm512_loadu_si512) so it doesn't rely on alignment.

Yes, my intention was exactly that, being able to map that structure 
directly in packets without copying them on the stack.

> > 2. In the IPv6 packet header, the IPv6 addresses are not 16 byte aligned,
> > they are 8 byte aligned. So we cannot make the IPv6 address type 16 byte
> > aligned.

> Not necessary, if Ethernet frame in mbuf starts on 8b aligned address, then
> IPv6 is aligned only by 2 bytes.

We probably could safely say that aligning on 2 bytes would be OK. But 
is there any benefit, performance wise, in doing so? Keeping the same 
alignment as before the change would at least make it ABI compatible.
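
To make the 2-byte figure concrete, here is a small sketch of the offset
arithmetic. It assumes an untagged 14-byte Ethernet header and a frame that
starts on an 8-byte-aligned address; the helper names (max_alignment,
ipv6_src_offset, ipv6_dst_offset) are made up for this illustration and are
not DPDK functions.

```c
#include <stddef.h>

/* Header sizes; assumes no VLAN tag in front of the IPv6 header. */
#define ETH_HDR_LEN      14u  /* dst MAC + src MAC + EtherType */
#define IPV6_SRC_OFFSET   8u  /* vtc_flow(4) + payload_len(2) + proto(1) + hop_limit(1) */
#define IPV6_ADDR_LEN    16u

/*
 * Largest power-of-two alignment (capped at base_align) that is guaranteed
 * for a field located 'offset' bytes from a base aligned on 'base_align'
 * bytes. base_align is expected to be a power of two.
 */
static size_t
max_alignment(size_t base_align, size_t offset)
{
	size_t align = base_align;

	while (align > 1 && (offset % align) != 0)
		align /= 2;
	return align;
}

/* Offsets of the two addresses from the start of the Ethernet frame. */
static size_t ipv6_src_offset(void) { return ETH_HDR_LEN + IPV6_SRC_OFFSET; }
static size_t ipv6_dst_offset(void) { return ipv6_src_offset() + IPV6_ADDR_LEN; }
```

With these numbers the source address sits at byte 22 and the destination at
byte 38 of the frame, so neither is guaranteed more than 2-byte alignment.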


^ permalink raw reply	[relevance 3%]

* RE: IPv6 APIs rework
  2024-07-18 21:34  3%     ` Robin Jarry
  2024-07-19  8:25  0%       ` Konstantin Ananyev
@ 2024-07-19  9:12  0%       ` Morten Brørup
  2024-07-19 10:41  0%         ` Medvedkin, Vladimir
  1 sibling, 1 reply; 200+ results
From: Morten Brørup @ 2024-07-19  9:12 UTC (permalink / raw)
  To: Robin Jarry, Vladimir Medvedkin, stephen
  Cc: dev, Sunil Kumar Kori, Rakesh Kudurumalla, Vladimir Medvedkin,
	Wisam Jaddo, Cristian Dumitrescu, Konstantin Ananyev,
	Akhil Goyal, Fan Zhang, Bruce Richardson, Yipeng Wang,
	Sameh Gobriel, Nithin Dabilpuram, Kiran Kumar K, Satha Rao,
	Harman Kalra, Ankur Dwivedi, Anoob Joseph, Tejasree Kondoj,
	Gagandeep Singh, Hemant Agrawal, Ajit Khaparde, Somnath Kotur,
	Chas Williams, Min Hu (Connor),
	Potnuri Bharat Teja, Sachin Saxena, Ziyang Xuan, Xiaoyun Wang,
	Jie Hai, Yisen Zhuang, Jingjing Wu, Dariusz Sosnowski,
	Viacheslav Ovsiienko, Bing Zhao, Ori Kam, Suanming Mou,
	Matan Azrad, Chaoyong He, Devendra Singh Rawat, Alok Prasad,
	Andrew Rybchenko, Stephen Hemminger, Jiawen Wu, Jian Wang,
	Thomas Monjalon, Ferruh Yigit, Jiayu Hu, Pavan Nikhilesh,
	Maxime Coquelin, Chenbo Xia

> From: Robin Jarry [mailto:rjarry@redhat.com]
> 
> Vladimir Medvedkin, Jul 18, 2024 at 23:25:
> > I think alignment should be 1 since in FIB6 users usually don't copy
> IPv6
> > address and just provide a pointer to the memory inside the packet.

How can they do that? The bulk lookup function takes an array of IPv6 addresses, not an array of pointers to IPv6 addresses.

What you are suggesting only works with single lookup, not bulk lookup.

> Current
> > vector implementation loads IPv6 addresses using unaligned access (
> > _mm512_loadu_si512) so it doesn't rely on alignment.
> 
> Yes, my intention was exactly that, being able to map that structure
> directly in packets without copying them on the stack.

This would require changing the bulk lookup API to take an array of pointers instead of an array of IPv6 addresses.

It would be acceptable to introduce a new single address lookup function, taking a pointer to an unaligned (or 2 byte aligned) IPv6 address for the single lookup use cases mentioned above.

> 
> > > 2. In the IPv6 packet header, the IPv6 addresses are not 16 byte
> aligned,
> > > they are 8 byte aligned. So we cannot make the IPv6 address type 16
> byte
> > > aligned.
> 
> > Not necessary, if Ethernet frame in mbuf starts on 8b aligned address,
> then
> > IPv6 is aligned only by 2 bytes.
> 
> We probably could safely say that aligning on 2 bytes would be OK. But
> is there any benefit, performance wise, in doing so? Keeping the same
> alignment as before the change would at least make it ABI compatible.

I'm not worried about the IPv6 FIB functions. This proposal introduces a generic IPv6 address type for *all of DPDK*, so you need to consider *all* aspects, not just one library!

There may be current or future CPUs, where alignment makes a performance difference. Do all architectures support unaligned 128 bit access at 100 % similar performance to aligned 128 bit access? I think not!
E.g. on X86 architecture, load/store across a cache boundary has a performance impact. If the type is explicitly unaligned, an instance on the stack (i.e. a local variable holding an IPv6 address) might cross a cache boundary, whereas an 128 bit aligned instance on the stack is guaranteed not to cross a cache boundary.

The generic IPv4 address type is natively aligned (i.e. 4 byte). When accessing an IPv4 address in an IPv4 header following an Ethernet header, it is not 4 byte aligned, so this is an *exception* from the general case, and must be treated as such. You don't want to make the general type unaligned (and thus inefficient) everywhere it is being used, only because a few use cases require the unaligned form.

The same principle must apply to the IPv6 address type. Let's make the generic type natively aligned (16 byte). And you might also offer an explicitly unaligned type for the exception use cases requiring unaligned access.
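
As a rough sketch of what I mean (type and function names here are
hypothetical, not taken from the DPDK tree): a natively aligned general type,
plus a byte-aligned variant for overlaying packet data, with an explicit copy
between the two.

```c
#include <stdalign.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names; neither type exists in DPDK under this name. */

/* General-purpose type: natively aligned on 16 bytes. */
struct ipv6_addr {
	alignas(16) uint8_t a[16];
};

/* Exception type: a plain byte array, so its alignment is 1 and it can
 * be mapped directly onto packet memory. */
struct ipv6_addr_unaligned {
	uint8_t a[16];
};

/* Copy an address out of packet memory into the aligned general type. */
static struct ipv6_addr
ipv6_addr_load(const struct ipv6_addr_unaligned *src)
{
	struct ipv6_addr dst;

	memcpy(dst.a, src->a, sizeof(dst.a));
	return dst;
}
```

Code that only reads addresses in place would use the unaligned type; anything
that stores or passes addresses around would use the aligned one.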


^ permalink raw reply	[relevance 0%]

* Re: IPv6 APIs rework
  2024-07-19  9:12  0%       ` Morten Brørup
@ 2024-07-19 10:41  0%         ` Medvedkin, Vladimir
  0 siblings, 0 replies; 200+ results
From: Medvedkin, Vladimir @ 2024-07-19 10:41 UTC (permalink / raw)
  To: Morten Brørup, Robin Jarry, Vladimir Medvedkin, stephen
  Cc: dev, Sunil Kumar Kori, Rakesh Kudurumalla, Wisam Jaddo,
	Cristian Dumitrescu, Konstantin Ananyev, Akhil Goyal, Fan Zhang,
	Bruce Richardson, Yipeng Wang, Sameh Gobriel, Nithin Dabilpuram,
	Kiran Kumar K, Satha Rao, Harman Kalra, Ankur Dwivedi,
	Anoob Joseph, Tejasree Kondoj, Gagandeep Singh, Hemant Agrawal,
	Ajit Khaparde, Somnath Kotur, Chas Williams, Min Hu (Connor),
	Potnuri Bharat Teja, Sachin Saxena, Ziyang Xuan, Xiaoyun Wang,
	Jie Hai, Yisen Zhuang, Jingjing Wu, Dariusz Sosnowski,
	Viacheslav Ovsiienko, Bing Zhao, Ori Kam, Suanming Mou,
	Matan Azrad, Chaoyong He, Devendra Singh Rawat, Alok Prasad,
	Andrew Rybchenko, Jiawen Wu, Jian Wang, Thomas Monjalon,
	Ferruh Yigit, Jiayu Hu, Pavan Nikhilesh, Maxime Coquelin,
	Chenbo Xia

Hi Morten,

On 19/07/2024 10:12, Morten Brørup wrote:
>> From: Robin Jarry [mailto:rjarry@redhat.com]
>>
>> Vladimir Medvedkin, Jul 18, 2024 at 23:25:
>>> I think alignment should be 1 since in FIB6 users usually don't copy
>> IPv6
>>> address and just provide a pointer to the memory inside the packet.
> How can they do that? The bulk lookup function takes an array of IPv6 addresses, not an array of pointers to IPv6 addresses.
>
> What you are suggesting only works with single lookup, not bulk lookup.

You're right, sorry. I confused it with an internal implementation that
passes an array of pointers.


>> Current
>>> vector implementation loads IPv6 addresses using unaligned access (
>>> _mm512_loadu_si512) so it doesn't rely on alignment.
>> Yes, my intention was exactly that, being able to map that structure
>> directly in packets without copying them on the stack.
> This would require changing the bulk lookup API to take an array of pointers instead of an array of IPv6 addresses.
>
> It would be acceptable to introduce a new single address lookup function, taking a pointer to an unaligned (or 2 byte aligned) IPv6 address for the single lookup use cases mentioned above.
>
>>>> 2. In the IPv6 packet header, the IPv6 addresses are not 16 byte
>> aligned,
>>>> they are 8 byte aligned. So we cannot make the IPv6 address type 16
>> byte
>>>> aligned.
>>> Not necessary, if Ethernet frame in mbuf starts on 8b aligned address,
>> then
>>> IPv6 is aligned only by 2 bytes.
>> We probably could safely say that aligning on 2 bytes would be OK. But
>> is there any benefit, performance wise, in doing so? Keeping the same
>> alignment as before the change would at least make it ABI compatible.
> I'm not worried about the IPv6 FIB functions. This proposal introduces a generic IPv6 address type for *all of DPDK*, so you need to consider *all* aspects, not just one library!
>
> There may be current or future CPUs, where alignment makes a performance difference. Do all architectures support unaligned 128 bit access at 100 % similar performance to aligned 128 bit access? I think not!
> E.g. on X86 architecture, load/store across a cache boundary has a performance impact. If the type is explicitly unaligned, an instance on the stack (i.e. a local variable holding an IPv6 address) might cross a cache boundary, whereas an 128 bit aligned instance on the stack is guaranteed not to cross a cache boundary.
>
> The generic IPv4 address type is natively aligned (i.e. 4 byte). When accessing an IPv4 address in an IPv4 header following an Ethernet header, it is not 4 byte aligned, so this is an *exception* from the general case, and must be treated as such. You don't want to make the general type unaligned (and thus inefficient) everywhere it is being used, only because a few use cases require the unaligned form.
>
> The same principle must apply to the IPv6 address type. Let's make the generic type natively aligned (16 byte). And you might also offer an explicitly unaligned type for the exception use cases requiring unaligned access.
>
-- 
Regards,
Vladimir


^ permalink raw reply	[relevance 0%]

* Re: [RFC v2] ethdev: an API for cache stashing hints
  2024-07-17  2:27  3% ` Stephen Hemminger
  2024-07-18 18:48  0%   ` Wathsala Wathawana Vithanage
@ 2024-07-20  3:05  3%   ` Honnappa Nagarahalli
  1 sibling, 0 replies; 200+ results
From: Honnappa Nagarahalli @ 2024-07-20  3:05 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Wathsala Wathawana Vithanage, dev, thomas, Ferruh Yigit,
	Andrew Rybchenko, nd, Dhruv Tripathi



> On Jul 16, 2024, at 9:27 PM, Stephen Hemminger <stephen@networkplumber.org> wrote:
> 
> On Mon, 15 Jul 2024 22:11:41 +0000
> Wathsala Vithanage <wathsala.vithanage@arm.com> wrote:
> 
>> An application provides cache stashing hints to the ethernet devices to
>> improve memory access latencies from the CPU and the NIC. This patch
>> introduces three distinct hints for this purpose.
>> 
>> The RTE_ETH_DEV_STASH_HINT_HOST_WILLNEED hint indicates that the host
>> (CPU) requires the data written by the NIC immediately. This implies
>> that the CPU expects to read data from its local cache rather than LLC
>> or main memory if possible. This would improve memory access latency in
>> the Rx path. For PCI devices with TPH capability, these hints translate
>> into DWHR (Device Writes Host Reads) access pattern. This hint is only
>> valid for receive queues.
>> 
>> The RTE_ETH_DEV_STASH_HINT_BI_DIR_DATA hint indicates that the host and
>> the device access the data structure equally. Rx/Tx queue descriptors
>> fit the description of such data. This hint applies to both Rx and Tx
>> directions.  In the PCI TPH context, this hint translates into a
>> Bi-Directional access pattern.
>> 
>> RTE_ETH_DEV_STASH_HINT_DEV_ONLY hint indicates that the CPU is not
>> involved in a given device's receive or transmit paths. This implies
>> that only devices are involved in the IO path. Depending on the
>> implementation, this hint may result in data getting placed in a cache
>> close to the device or not cached at all. For PCI devices with TPH
>> capability, this hint translates into D*D* (DWDR, DRDW, DWDW, DRDR)
>> access patterns. This is a bidirectional hint, and it can be applied to
>> both Rx and Tx queues.  
>> 
>> The RTE_ETH_DEV_STASH_HINT_HOST_DONTNEED hint indicates that the device
>> reads data written by the host (CPU) that may still be in the host's
>> local cache but is not required by the host anytime soon. This hint is
>> intended to prevent unnecessary cache invalidations that cause
>> interconnect latencies when a device writes to a buffer already in host
>> cache memory. In DPDK, this could happen with the recycling of mbufs
>> where a mbuf is placed in the Tx queue that then gets back into mempool
>> and gets recycled back into the Rx queue, all while a copy is being held
>> in the CPU's local cache unnecessarily. By using this hint on supported
>> platforms, the mbuf will be invalidated after the device completes the
>> buffer reading, but it will be well before the buffer gets recycled and
>> updated in the Rx path. This hint is only valid for transmit queues. 
>> 
>> Applications use three main interfaces in the ethdev library to discover
>> and set cache stashing hints. rte_eth_dev_stashing_hints_tx interface is
>> used to set hints on a Tx queue. rte_eth_dev_stashing_hints_rx interface
>> is used to set hints on an Rx queue. Both of these functions take the
>> following parameters as inputs: a port_id (the id of the ethernet
>> device), a cpu_id (the target CPU), a cache_level (the level of the
>> cache hierarchy the data should be stashed into), a queue_id (the queue
>> the hints are applied to). In addition to the above list of parameters,
>> a type parameter indicates the type of the object the application
>> expects to be stashed by the hardware. Depending on the hardware, these
>> may vary. Intel E810 NICs support the stashing of Rx/Tx descriptors,
>> packet headers, and packet payloads. These are indicated by the macros
>> RTE_ETH_DEV_STASH_TYPE_DESC, RTE_ETH_DEV_STASH_TYPE_HEADER,
>> RTE_ETH_DEV_STASH_TYPE_PAYLOAD. Hardware capable of stashing data at any
>> given offset into a packet can use the RTE_ETH_DEV_STASH_TYPE_OFFSET
>> type. When an offset is used, the offset parameter in the above two
>> functions should be set appropriately.
>> 
>> rte_eth_dev_stashing_hints_discover is used to discover the object types
>> and hints supported in the platform and the device. The function takes
>> types and hints pointers used as a bit vector to indicate hints and
>> types supported by the NIC. An application that intends to use stashing
>> hints should first discover supported hints and types and then use the
>> functions rte_eth_dev_stashing_hints_tx and
>> rte_eth_dev_stashing_hints_rx as required to set stashing hints
>> accordingly. eth_dev_ops structure has been updated with two new ops
>> that a PMD should implement to support cache stashing hints. A PMD that
>> intends to support cache stashing hints should initialize the
>> set_stashing_hints function pointer to a function that issues hints to
>> the underlying hardware in compliance with platform capabilities. The
>> same PMD should also implement a function that can return two-bit fields
>> indicating supported types and hints and then initialize the
>> discover_stashing_hints function pointer with it. If the NIC supports
>> cache stashing hints, the NIC should always set the
>> RTE_ETH_DEV_CAPA_CACHE_STASHING device capability.
>> 
>> Signed-off-by: Wathsala Vithanage <wathsala.vithanage@arm.com>
>> Reviewed-by: Dhruv Tripathi <dhruv.tripathi@arm.com>
> 
> My initial reaction is negative on this. The DPDK does not need more nerd knobs
> for performance. If it is a performance win, it should be automatic and handled
> by the driver.
> 
IMO, DPDK provides low-level APIs, and they should give users the flexibility to control which part of the data from the NIC is stashed where. For example, currently available systems across multiple architectures provide system-wide configuration to control stashing data from the NIC to the system cache. That configuration allows either all of the data from the NIC to be stashed or none of it, whereas some applications need access to just the headers and others need access to all the packet data.

> If you absolutely have to have another flag, then it should be in existing config
> (yes, extend the ABI) rather than adding more flags and calls in ethdev.
Agree. Extending the ABI would result in a better solution rather than another set of APIs.
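
For reference, the discover-then-set flow described in the quoted RFC text
could be checked along these lines. This is only a sketch: the bit values,
the stash_caps struct and stash_check_rx() are assumptions for illustration,
and the real signatures are whatever the RFC patch defines.

```c
#include <errno.h>
#include <stdint.h>

/* Stand-ins for the hint and type bits named in the RFC; values are assumed. */
#define STASH_HINT_HOST_WILLNEED  (UINT16_C(1) << 0)  /* Rx only */
#define STASH_HINT_BI_DIR_DATA    (UINT16_C(1) << 1)  /* Rx and Tx */
#define STASH_HINT_DEV_ONLY       (UINT16_C(1) << 2)  /* Rx and Tx */
#define STASH_HINT_HOST_DONTNEED  (UINT16_C(1) << 3)  /* Tx only */

#define STASH_TYPE_DESC     (UINT16_C(1) << 0)
#define STASH_TYPE_HEADER   (UINT16_C(1) << 1)
#define STASH_TYPE_PAYLOAD  (UINT16_C(1) << 2)

/* What a (mock) device reports from its discover op. */
struct stash_caps {
	uint16_t hints;
	uint16_t types;
};

/*
 * Validate an Rx request against the discovered capabilities.
 * Returns 0 if the request may be passed on to the driver,
 * -EINVAL for a hint that is not valid on Rx, and -ENOTSUP if the
 * device did not advertise the hint or the object type.
 */
static int
stash_check_rx(const struct stash_caps *caps, uint16_t hint, uint16_t type)
{
	if (hint == STASH_HINT_HOST_DONTNEED)
		return -EINVAL;
	if ((caps->hints & hint) != hint || (caps->types & type) != type)
		return -ENOTSUP;
	return 0;
}
```

An application would run the discover step once per port and only then issue
the per-queue hint calls for combinations that pass this kind of check.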


^ permalink raw reply	[relevance 3%]

* [PATCH v6 1/8] ethdev: support report register names and filter
  @ 2024-07-22  6:58  8%   ` Jie Hai
  0 siblings, 0 replies; 200+ results
From: Jie Hai @ 2024-07-22  6:58 UTC (permalink / raw)
  To: dev, Thomas Monjalon, Ferruh Yigit, Andrew Rybchenko
  Cc: lihuisong, fengchengwen

This patch adds "filter" and "names" fields to "rte_dev_reg_info"
structure. Names of registers in data fields can be reported and
the registers can be filtered by their module names.

The new API rte_eth_dev_get_reg_info_ext() is added to support
reporting names and filtering by modules. And the original API
rte_eth_dev_get_reg_info() does not use the names and filter fields.
A local variable is used in rte_eth_dev_get_reg_info for
compatibility. If a driver does not report the names, they are set
to "index_XXX", where XXX is the location in the register table.

Signed-off-by: Jie Hai <haijie1@huawei.com>
Acked-by: Huisong Li <lihuisong@huawei.com>
Acked-by: Chengwen Feng <fengchengwen@huawei.com>
---
 doc/guides/rel_notes/release_24_07.rst |  8 ++++++
 lib/ethdev/ethdev_trace.h              |  2 ++
 lib/ethdev/rte_dev_info.h              | 11 ++++++++
 lib/ethdev/rte_ethdev.c                | 38 ++++++++++++++++++++++++++
 lib/ethdev/rte_ethdev.h                | 29 ++++++++++++++++++++
 lib/ethdev/version.map                 |  3 ++
 6 files changed, 91 insertions(+)

diff --git a/doc/guides/rel_notes/release_24_07.rst b/doc/guides/rel_notes/release_24_07.rst
index 058609b0f36b..b0bb49c8f29e 100644
--- a/doc/guides/rel_notes/release_24_07.rst
+++ b/doc/guides/rel_notes/release_24_07.rst
@@ -186,6 +186,12 @@ New Features
   * Added defer queue reclamation via RCU.
   * Added SVE support for bulk lookup.
 
+* **Added support for dumping registers with names and filtering by modules.**
+
+  * Added new API function ``rte_eth_dev_get_reg_info_ext()`` to filter the
+    registers by module names and get the information (names, values and other
+    attributes) of the filtered registers.
+
 
 Removed Items
 -------------
@@ -241,6 +247,8 @@ ABI Changes
    This section is a comment. Do not overwrite or remove it.
    Also, make sure to start the actual text at the margin.
    =======================================================
+   * ethdev: Added ``filter`` and ``names`` fields to ``rte_dev_reg_info``
+     structure for filtering by modules and reporting names of registers.
 
 * No ABI change that would break compatibility with 23.11.
 
diff --git a/lib/ethdev/ethdev_trace.h b/lib/ethdev/ethdev_trace.h
index 3bec87bfdb70..0c4780a09ef5 100644
--- a/lib/ethdev/ethdev_trace.h
+++ b/lib/ethdev/ethdev_trace.h
@@ -1152,6 +1152,8 @@ RTE_TRACE_POINT(
 	rte_trace_point_emit_u32(info->length);
 	rte_trace_point_emit_u32(info->width);
 	rte_trace_point_emit_u32(info->version);
+	rte_trace_point_emit_ptr(info->names);
+	rte_trace_point_emit_ptr(info->filter);
 	rte_trace_point_emit_int(ret);
 )
 
diff --git a/lib/ethdev/rte_dev_info.h b/lib/ethdev/rte_dev_info.h
index 67cf0ae52668..26b777f9836e 100644
--- a/lib/ethdev/rte_dev_info.h
+++ b/lib/ethdev/rte_dev_info.h
@@ -11,6 +11,11 @@ extern "C" {
 
 #include <stdint.h>
 
+#define RTE_ETH_REG_NAME_SIZE 64
+struct rte_eth_reg_name {
+	char name[RTE_ETH_REG_NAME_SIZE];
+};
+
 /*
  * Placeholder for accessing device registers
  */
@@ -20,6 +25,12 @@ struct rte_dev_reg_info {
 	uint32_t length; /**< Number of registers to fetch */
 	uint32_t width; /**< Size of device register */
 	uint32_t version; /**< Device version */
+	/**
+	 * Name of target module, filter for target subset of registers.
+	 * This field could affect register selection for data/length/names.
+	 */
+	const char *filter;
+	struct rte_eth_reg_name *names; /**< Registers name saver */
 };
 
 /*
diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
index f1c658f49e80..30ca4a0043c5 100644
--- a/lib/ethdev/rte_ethdev.c
+++ b/lib/ethdev/rte_ethdev.c
@@ -6388,8 +6388,37 @@ rte_eth_read_clock(uint16_t port_id, uint64_t *clock)
 
 int
 rte_eth_dev_get_reg_info(uint16_t port_id, struct rte_dev_reg_info *info)
+{
+	struct rte_dev_reg_info reg_info = { 0 };
+	int ret;
+
+	if (info == NULL) {
+		RTE_ETHDEV_LOG_LINE(ERR,
+			"Cannot get ethdev port %u register info to NULL",
+			port_id);
+		return -EINVAL;
+	}
+
+	reg_info.length = info->length;
+	reg_info.data = info->data;
+
+	ret = rte_eth_dev_get_reg_info_ext(port_id, &reg_info);
+	if (ret != 0)
+		return ret;
+
+	info->length = reg_info.length;
+	info->width = reg_info.width;
+	info->version = reg_info.version;
+	info->offset = reg_info.offset;
+
+	return 0;
+}
+
+int
+rte_eth_dev_get_reg_info_ext(uint16_t port_id, struct rte_dev_reg_info *info)
 {
 	struct rte_eth_dev *dev;
+	uint32_t i;
 	int ret;
 
 	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
@@ -6402,12 +6431,21 @@ rte_eth_dev_get_reg_info(uint16_t port_id, struct rte_dev_reg_info *info)
 		return -EINVAL;
 	}
 
+	if (info->names != NULL && info->length != 0)
+		memset(info->names, 0, sizeof(struct rte_eth_reg_name) * info->length);
+
 	if (*dev->dev_ops->get_reg == NULL)
 		return -ENOTSUP;
 	ret = eth_err(port_id, (*dev->dev_ops->get_reg)(dev, info));
 
 	rte_ethdev_trace_get_reg_info(port_id, info, ret);
 
+	/* Report default names if the driver did not report any. */
+	if (ret == 0 && info->names != NULL && strlen(info->names[0].name) == 0) {
+		for (i = 0; i < info->length; i++)
+			snprintf(info->names[i].name, RTE_ETH_REG_NAME_SIZE,
+				"index_%u", info->offset + i);
+	}
 	return ret;
 }
 
diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
index 548fada1c7ad..02cb3c07f742 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -5071,6 +5071,35 @@ __rte_experimental
 int rte_eth_get_monitor_addr(uint16_t port_id, uint16_t queue_id,
 		struct rte_power_monitor_cond *pmc);
 
+/**
+ * Retrieve the filtered device registers (values and names) and
+ * register attributes (number of registers and register size)
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param info
+ *   Pointer to rte_dev_reg_info structure to fill in.
+ *   - If info->filter is NULL, return info for all registers (seen as filter
+ *     none).
+ *   - If info->filter is not NULL, return error if the driver does not support
+ *     filter. Fill the length field with filtered register number.
+ *   - If info->data is NULL, the function fills in the width and length fields.
+ *   - If info->data is not NULL, ethdev considers there are enough spaces to
+ *     store the registers, and the values of registers with the filter string
+ *     as the module name are put into the buffer pointed at by info->data.
+ *   - If info->names is not NULL, drivers should fill it or the ethdev fills it
+ *     with default names.
+ * @return
+ *   - (0) if successful.
+ *   - (-ENOTSUP) if hardware doesn't support.
+ *   - (-EINVAL) if bad parameter.
+ *   - (-ENODEV) if *port_id* invalid.
+ *   - (-EIO) if device is removed.
+ *   - others depends on the specific operations implementation.
+ */
+__rte_experimental
+int rte_eth_dev_get_reg_info_ext(uint16_t port_id, struct rte_dev_reg_info *info);
+
 /**
  * Retrieve device registers and register attributes (number of registers and
  * register size)
diff --git a/lib/ethdev/version.map b/lib/ethdev/version.map
index 79f6f5293b5c..e3289e999382 100644
--- a/lib/ethdev/version.map
+++ b/lib/ethdev/version.map
@@ -325,6 +325,9 @@ EXPERIMENTAL {
 	rte_flow_template_table_resizable;
 	rte_flow_template_table_resize;
 	rte_flow_template_table_resize_complete;
+
+	# added in 24.07
+	rte_eth_dev_get_reg_info_ext;
 };
 
 INTERNAL {
-- 
2.33.0
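
The default-name fallback in this patch can be tried in isolation with the
standalone sketch below. It does not link against DPDK: struct reg_name and
fill_default_names() are local stand-ins for struct rte_eth_reg_name and the
loop added to rte_eth_dev_get_reg_info_ext(). The length == 0 guard is an
addition of this sketch and is not part of the patch.

```c
#include <stdint.h>
#include <stdio.h>

#define REG_NAME_SIZE 64  /* same size as RTE_ETH_REG_NAME_SIZE in the patch */

struct reg_name {
	char name[REG_NAME_SIZE];
};

/*
 * Mirror of the fallback in the patch: if the driver left the first name
 * empty, label every register "index_<offset + i>". Names already filled
 * in by the driver are left untouched.
 */
static void
fill_default_names(struct reg_name *names, uint32_t length, uint32_t offset)
{
	uint32_t i;

	if (names == NULL || length == 0 || names[0].name[0] != '\0')
		return;
	for (i = 0; i < length; i++)
		snprintf(names[i].name, REG_NAME_SIZE, "index_%u",
			(unsigned int)(offset + i));
}
```

Note that, as in the patch, only the first entry is inspected to decide
whether the driver reported names.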


^ permalink raw reply	[relevance 8%]

* RE: IPv6 APIs rework
  2024-07-18 21:34  3%     ` Robin Jarry
@ 2024-07-19  8:25  0%       ` Konstantin Ananyev
  2024-07-19  9:12  0%       ` Morten Brørup
  1 sibling, 0 replies; 200+ results
From: Konstantin Ananyev @ 2024-07-19  8:25 UTC (permalink / raw)




> Vladimir Medvedkin, Jul 18, 2024 at 23:25:
> > I think alignment should be 1 since in FIB6 users usually don't copy IPv6
> > address and just provide a pointer to the memory inside the packet. Current
> > vector implementation loads IPv6 addresses using unaligned access (
> > _mm512_loadu_si512) so it doesn't rely on alignment.
> 
> Yes, my intention was exactly that, being able to map that structure
> directly in packets without copying them on the stack.
> 
> > > 2. In the IPv6 packet header, the IPv6 addresses are not 16 byte aligned,
> > > they are 8 byte aligned. So we cannot make the IPv6 address type 16 byte
> > > aligned.
> 
> > Not necessary, if Ethernet frame in mbuf starts on 8b aligned address, then
> > IPv6 is aligned only by 2 bytes.
> 
> We probably could safely say that aligning on 2 bytes would be OK. But
> is there any benefit, performance wise, in doing so? Keeping the same
> alignment as before the change would at least make it ABI compatible.

I am also not sure that this extra alignment (2B or 4B) here will give us any benefit,
while it most likely will introduce extra restrictions.
AFAIK, right now we do have ipv6 as an array of plain chars, and there were not many
complaints about it.
So I am for keeping it 1B aligned.
Overall the proposal looks reasonable to me... maybe 24.11 is a good opportunity for such a change.
Konstantin

^ permalink raw reply	[relevance 0%]

* [PATCH] doc: announce cryptodev change to support EDDSA
@ 2024-07-22 14:53  8% Gowrishankar Muthukrishnan
  2024-07-24  5:07  0% ` Anoob Joseph
                   ` (2 more replies)
  0 siblings, 3 replies; 200+ results
From: Gowrishankar Muthukrishnan @ 2024-07-22 14:53 UTC (permalink / raw)
  To: dev, bruce.richardson, ciara.power, jerinj, fanzhang.oss,
	arkadiuszx.kusztal, kai.ji, jack.bond-preston, david.marchand,
	hemant.agrawal, pablo.de.lara.guarch, fiona.trahe,
	declan.doherty, matan, ruifeng.wang, abhinandan.gujjar,
	maxime.coquelin, chenbox, sunilprakashrao.uttarwar, andrew.boyer,
	ajit.khaparde, raveendra.padasalagi, vikas.gupta, g.singh,
	jianjay.zhou, lee.daly
  Cc: Anoob Joseph, zhangfei.gao, Gowrishankar Muthukrishnan

Announce the additions in cryptodev ABI to support EDDSA algorithm.

Signed-off-by: Gowrishankar Muthukrishnan <gmuthukrishn@marvell.com>
--
RFC:
  https://patches.dpdk.org/project/dpdk/patch/0ae6a1afadac64050d80b0fd7712c4a6a8599e2c.1701273963.git.gmuthukrishn@marvell.com/
---
 doc/guides/rel_notes/deprecation.rst | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 6948641ff6..fcbec965b1 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -147,3 +147,7 @@ Deprecation Notices
   will be deprecated and subsequently removed in DPDK 24.11 release.
   Before this, the new port library API (functions rte_swx_port_*)
   will gradually transition from experimental to stable status.
+
+* cryptodev: The enum ``rte_crypto_asym_xform_type`` and struct ``rte_crypto_asym_op``
+  will be extended to include new values to support EDDSA. This will break
+  ABI compatibility with existing applications that use these data types.
-- 
2.21.0


^ permalink raw reply	[relevance 8%]

* [PATCH] doc: announce cryptodev changes to offload RSA in VirtIO
@ 2024-07-22 14:55  5% Gowrishankar Muthukrishnan
  2024-07-24  6:49  0% ` [EXTERNAL] " Akhil Goyal
  2024-07-25  9:48  0% ` Kusztal, ArkadiuszX
  0 siblings, 2 replies; 200+ results
From: Gowrishankar Muthukrishnan @ 2024-07-22 14:55 UTC (permalink / raw)
  To: dev, Anoob Joseph, bruce.richardson, ciara.power, jerinj,
	fanzhang.oss, arkadiuszx.kusztal, kai.ji, jack.bond-preston,
	david.marchand, hemant.agrawal, pablo.de.lara.guarch,
	fiona.trahe, declan.doherty, matan, ruifeng.wang,
	abhinandan.gujjar, maxime.coquelin, chenbox,
	sunilprakashrao.uttarwar, andrew.boyer, ajit.khaparde,
	raveendra.padasalagi, vikas.gupta, zhangfei.gao, g.singh,
	jianjay.zhou, lee.daly
  Cc: Gowrishankar Muthukrishnan

Announce cryptodev changes to offload RSA asymmetric operation in
VirtIO PMD.

Signed-off-by: Gowrishankar Muthukrishnan <gmuthukrishn@marvell.com>
--
RFC:
  https://patches.dpdk.org/project/dpdk/patch/20230928095300.1353-2-gmuthukrishn@marvell.com/
  https://patches.dpdk.org/project/dpdk/patch/20230928095300.1353-3-gmuthukrishn@marvell.com/
---
 doc/guides/rel_notes/deprecation.rst | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 6948641ff6..26fec84aba 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -147,3 +147,14 @@ Deprecation Notices
   will be deprecated and subsequently removed in DPDK 24.11 release.
   Before this, the new port library API (functions rte_swx_port_*)
   will gradually transition from experimental to stable status.
+
+* cryptodev: The struct rte_crypto_rsa_padding will be moved from
+  rte_crypto_rsa_op_param struct to rte_crypto_rsa_xform struct,
+  breaking ABI. The new location is recommended to comply with
+  virtio-crypto specification. Applications and drivers using
+  this struct will be updated.
+
+* cryptodev: The rte_crypto_rsa_xform struct member to hold private key
+  in either exponent or quintuple format is changed from union to struct
+  data type. This change is to support ASN.1 syntax (RFC 3447 Appendix A.1.2).
+  This change will not break existing applications.
-- 
2.21.0


^ permalink raw reply	[relevance 5%]

* [PATCH] doc: announce vhost changes to support asymmetric operation
@ 2024-07-22 14:56  8% Gowrishankar Muthukrishnan
  2024-07-23 18:30  4% ` Jerin Jacob
  0 siblings, 1 reply; 200+ results
From: Gowrishankar Muthukrishnan @ 2024-07-22 14:56 UTC (permalink / raw)
  To: dev, Anoob Joseph, bruce.richardson, ciara.power, jerinj,
	fanzhang.oss, arkadiuszx.kusztal, kai.ji, jack.bond-preston,
	david.marchand, hemant.agrawal, pablo.de.lara.guarch,
	fiona.trahe, declan.doherty, matan, ruifeng.wang,
	abhinandan.gujjar, maxime.coquelin, chenbox,
	sunilprakashrao.uttarwar, andrew.boyer, ajit.khaparde,
	raveendra.padasalagi, vikas.gupta, zhangfei.gao, g.singh,
	jianjay.zhou, lee.daly
  Cc: Gowrishankar Muthukrishnan

Announce vhost ABI changes to modify few functions to support
asymmetric crypto operation.

Signed-off-by: Gowrishankar Muthukrishnan <gmuthukrishn@marvell.com>
--
RFC:
  https://patches.dpdk.org/project/dpdk/patch/20230928095300.1353-4-gmuthukrishn@marvell.com/
---
 doc/guides/rel_notes/deprecation.rst | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 6948641ff6..2f5c2c5a34 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -147,3 +147,10 @@ Deprecation Notices
   will be deprecated and subsequently removed in DPDK 24.11 release.
   Before this, the new port library API (functions rte_swx_port_*)
   will gradually transition from experimental to stable status.
+
+* vhost: The function ``rte_vhost_crypto_create`` will accept a new parameter
+  to specify rte_mempool for asymmetric crypto session. The function
+  ``rte_vhost_crypto_finalize_requests`` will accept two new parameters,
+  where the first one is to specify vhost device id and other one is to specify
+  the virtio queue index. These two modifications are required to support
+  asymmetric crypto operation in vhost crypto and will break ABI.
-- 
2.21.0


^ permalink raw reply	[relevance 8%]

* RE: release candidate 24.07-rc2
  @ 2024-07-23  2:14  4% ` Xu, HailinX
  0 siblings, 0 replies; 200+ results
From: Xu, HailinX @ 2024-07-23  2:14 UTC (permalink / raw)
  To: Marchand, David, dev
  Cc: Kovacevic, Marko, Mcnamara, John, Richardson, Bruce,
	Ferruh Yigit, Puttaswamy, Rajesh T

> -----Original Message-----
> From: David Marchand <david.marchand@redhat.com>
> Sent: Saturday, July 13, 2024 2:25 AM
> To: announce@dpdk.org
> Cc: Thomas Monjalon <thomas@monjalon.net>
> Subject: release candidate 24.07-rc2
> 
> A new DPDK release candidate is ready for testing:
>         https://git.dpdk.org/dpdk/tag/?id=v24.07-rc2
> 
> There are 461 new patches in this snapshot.
> 
> Release notes:
>         https://doc.dpdk.org/guides/rel_notes/release_24_07.html
> 
> Highlights of 24.07-rc2:
>         - SVE support in the hash library
>         - FEC support in net/i40e and net/ice
>         - log cleanups in drivers
>         - various driver fixes and updates
> 
> Please test and report issues on bugs.dpdk.org.
> 
> DPDK 24.07-rc3 is expected in approximately one week.
> 
> Thank you everyone
> 
> --
> David Marchand
Update on the test status for the Intel part: all dpdk24.07-rc2 testing is done. Four new issues were found.

New issues:
1. Bug 1497 - [dpdk-24.07] [ABI][meson test] driver-tests/event_dma_adapter_autotest test hang when do ABI testing    -> not fixed yet
2. ipsec-secgw tests fail    -> under investigation by Intel dev
3. ice_rx_timestamp/single_queue_with_timestamp: obtained unexpected timestamp    -> under investigation by Intel dev
4. cryptodev_cpu_aesni_mb_autotest is failing    -> under investigation by Intel dev

# Basic Intel(R) NIC testing
* Build or compile:
    * Build: covers the build test combinations with the latest GCC/Clang versions and popular OS revisions such as Ubuntu22.04.4, Ubuntu24.04, Fedora40, RHEL9.3, RHEL9.4, FreeBSD14.0, SUSE15.5, OpenAnolis8.8, CBL-Mariner2.0, etc.
              - All tests passed.
    * Compile: covers CFLAGS (O0/O1/O2/O3) on popular OSes such as Ubuntu24.04 and RHEL9.4.
              - All tests passed with the latest DPDK.
* PF/VF(i40e, ixgbe): test scenarios including PF/VF-RTE_FLOW/TSO/Jumboframe/checksum offload/VLAN/VXLAN, etc.
	- All test cases are done. No new issues found.
* PF/VF(ice): test scenarios including Switch features/Package Management/Flow Director/Advanced Tx/Advanced RSS/ACL/DCF/Flexible Descriptor, etc.
	- Execution is complete. Found issue 3.
* Intel NIC single core/NIC performance: test scenarios including PF/VF single core performance test, RFC2544 Zero packet loss performance test, etc.
	- Execution is complete. No new issues found.
* Power and IPsec:
    * Power: test scenarios including bi-direction/Telemetry/Empty Poll Lib/Priority Base Frequency, etc.
	- Execution is complete. No new issues found.
    * IPsec: test scenarios including ipsec/ipsec-gw/ipsec library basic test - QAT&SW/FIB library, etc.
	- Execution is complete. Found issue 2.
# Basic cryptodev and virtio testing
* Virtio: both function and performance tests are covered, such as PVP/Virtio_loopback/virtio-user loopback/virtio-net VM2VM perf testing/VMware ESXi 8.0U1, etc.
	- Execution is complete. No new issues found.
* Cryptodev:
    * Function test: test scenarios including Cryptodev API testing/CompressDev ISA-L/QAT/ZLIB PMD Testing/FIPS, etc.
	- Execution is complete. Found issue 4.
    * Performance test: test scenarios including Throughput Performance/Cryptodev Latency, etc.
	- Execution is complete. No performance drop.

Regards,
Xu, Hailin

^ permalink raw reply	[relevance 4%]

* Re: [PATCH] doc: announce vhost changes to support asymmetric operation
  2024-07-22 14:56  8% [PATCH] doc: announce vhost changes to support asymmetric operation Gowrishankar Muthukrishnan
@ 2024-07-23 18:30  4% ` Jerin Jacob
  2024-07-25  9:29  4%   ` [EXTERNAL] " Gowrishankar Muthukrishnan
  0 siblings, 1 reply; 200+ results
From: Jerin Jacob @ 2024-07-23 18:30 UTC (permalink / raw)
  To: Gowrishankar Muthukrishnan
  Cc: dev, Anoob Joseph, bruce.richardson, ciara.power, jerinj,
	fanzhang.oss, arkadiuszx.kusztal, kai.ji, jack.bond-preston,
	david.marchand, hemant.agrawal, pablo.de.lara.guarch,
	fiona.trahe, declan.doherty, matan, ruifeng.wang,
	abhinandan.gujjar, maxime.coquelin, chenbox,
	sunilprakashrao.uttarwar, andrew.boyer, ajit.khaparde,
	raveendra.padasalagi, vikas.gupta, zhangfei.gao, g.singh,
	jianjay.zhou, lee.daly

On Mon, Jul 22, 2024 at 8:33 PM Gowrishankar Muthukrishnan
<gmuthukrishn@marvell.com> wrote:
>
> Announce vhost ABI changes to modify few functions to support
> asymmetric crypto operation.
>
> Signed-off-by: Gowrishankar Muthukrishnan <gmuthukrishn@marvell.com>
> --
> RFC:
>   https://patches.dpdk.org/project/dpdk/patch/20230928095300.1353-4-gmuthukrishn@marvell.com/

It looks like this case is adding new arguments to a function. Could you
check whether ABI versioning helps here? It seems like it could be easily
managed with ABI versioning.

https://doc.dpdk.org/guides/contributing/abi_versioning.html
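
A minimal sketch of the idea, assuming hypothetical function names rather than the real vhost crypto signatures: the old entry point stays available and forwards to the new one with a default for the added argument. In DPDK the two would be bound to versioned symbols using the macros described in the guide linked above; that binding is left out here.

```c
#include <stddef.h>

/* Hypothetical names throughout; this is not the vhost crypto API. */
struct pool { int id; };

/* New ABI: takes an extra pool for asymmetric crypto sessions. */
int crypto_create_v2(int vid, struct pool *sym_pool, struct pool *asym_pool)
{
	if (vid < 0 || sym_pool == NULL)
		return -1;
	/* Asymmetric support is simply absent when no pool is given. */
	return asym_pool != NULL ? 2 : 1;
}

/* Old ABI: unchanged signature, forwarded to the new implementation. */
int crypto_create_v1(int vid, struct pool *sym_pool)
{
	return crypto_create_v2(vid, sym_pool, NULL);
}
```

Applications built against the old signature keep resolving to the v1 wrapper, so only callers that want asymmetric sessions need to be rebuilt.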

> ---
>  doc/guides/rel_notes/deprecation.rst | 7 +++++++
>  1 file changed, 7 insertions(+)
>
> diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
> index 6948641ff6..2f5c2c5a34 100644
> --- a/doc/guides/rel_notes/deprecation.rst
> +++ b/doc/guides/rel_notes/deprecation.rst
> @@ -147,3 +147,10 @@ Deprecation Notices
>    will be deprecated and subsequently removed in DPDK 24.11 release.
>    Before this, the new port library API (functions rte_swx_port_*)
>    will gradually transition from experimental to stable status.
> +
> +* vhost: The function ``rte_vhost_crypto_create`` will accept a new parameter
> +  to specify rte_mempool for asymmetric crypto session. The function
> +  ``rte_vhost_crypto_finalize_requests`` will accept two new parameters,
> +  where the first one is to specify vhost device id and other one is to specify
> +  the virtio queue index. These two modifications are required to support
> +  asymmetric crypto operation in vhost crypto and will break ABI.
> --
> 2.21.0
>

^ permalink raw reply	[relevance 4%]

* RE: [PATCH] doc: announce cryptodev change to support EDDSA
  2024-07-22 14:53  8% [PATCH] doc: announce cryptodev change to support EDDSA Gowrishankar Muthukrishnan
@ 2024-07-24  5:07  0% ` Anoob Joseph
  2024-07-24  6:46  0% ` [EXTERNAL] " Akhil Goyal
  2024-07-25 15:01  0% ` Kusztal, ArkadiuszX
  2 siblings, 0 replies; 200+ results
From: Anoob Joseph @ 2024-07-24  5:07 UTC (permalink / raw)
  To: Gowrishankar Muthukrishnan, dev, bruce.richardson, ciara.power,
	Jerin Jacob, fanzhang.oss, arkadiuszx.kusztal, kai.ji,
	jack.bond-preston, david.marchand, hemant.agrawal,
	pablo.de.lara.guarch, fiona.trahe, declan.doherty, matan,
	ruifeng.wang, abhinandan.gujjar, maxime.coquelin, chenbox,
	sunilprakashrao.uttarwar, andrew.boyer, ajit.khaparde,
	raveendra.padasalagi, vikas.gupta, g.singh, jianjay.zhou,
	lee.daly
  Cc: zhangfei.gao, Gowrishankar Muthukrishnan

> Subject: [PATCH] doc: announce cryptodev change to support EDDSA
> 
> Announce the additions in cryptodev ABI to support EDDSA algorithm.
> 
> Signed-off-by: Gowrishankar Muthukrishnan <gmuthukrishn@marvell.com>

Acked-by: Anoob Joseph <anoobj@marvell.com>

^ permalink raw reply	[relevance 0%]

* RE: [EXTERNAL] [PATCH] doc: announce cryptodev change to support EDDSA
  2024-07-22 14:53  8% [PATCH] doc: announce cryptodev change to support EDDSA Gowrishankar Muthukrishnan
  2024-07-24  5:07  0% ` Anoob Joseph
@ 2024-07-24  6:46  0% ` Akhil Goyal
  2024-07-25 15:01  0% ` Kusztal, ArkadiuszX
  2 siblings, 0 replies; 200+ results
From: Akhil Goyal @ 2024-07-24  6:46 UTC (permalink / raw)
  To: Gowrishankar Muthukrishnan, dev, bruce.richardson, ciara.power,
	Jerin Jacob, fanzhang.oss, arkadiuszx.kusztal, kai.ji,
	jack.bond-preston, david.marchand, hemant.agrawal,
	pablo.de.lara.guarch, fiona.trahe, declan.doherty, matan,
	ruifeng.wang, abhinandan.gujjar, maxime.coquelin, chenbox,
	sunilprakashrao.uttarwar, andrew.boyer, ajit.khaparde,
	raveendra.padasalagi, vikas.gupta, g.singh, jianjay.zhou,
	lee.daly
  Cc: Anoob Joseph, zhangfei.gao, Gowrishankar Muthukrishnan

> Announce the additions in cryptodev ABI to support EDDSA algorithm.
> 
> Signed-off-by: Gowrishankar Muthukrishnan <gmuthukrishn@marvell.com>
> --
Acked-by: Akhil Goyal <gakhil@marvell.com>

> RFC:
>   https://patches.dpdk.org/project/dpdk/patch/0ae6a1afadac64050d80b0fd7712c4a6a8599e2c.1701273963.git.gmuthukrishn@marvell.com/
> ---
>  doc/guides/rel_notes/deprecation.rst | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
> index 6948641ff6..fcbec965b1 100644
> --- a/doc/guides/rel_notes/deprecation.rst
> +++ b/doc/guides/rel_notes/deprecation.rst
> @@ -147,3 +147,7 @@ Deprecation Notices
>    will be deprecated and subsequently removed in DPDK 24.11 release.
>    Before this, the new port library API (functions rte_swx_port_*)
>    will gradually transition from experimental to stable status.
> +
> +* cryptodev: The enum ``rte_crypto_asym_xform_type`` and struct ``rte_crypto_asym_op``
> +  will be extended to include new values to support EDDSA. This will break
> +  ABI compatibility with existing applications that use these data types.
> --
> 2.21.0


^ permalink raw reply	[relevance 0%]

* RE: [EXTERNAL] [PATCH] doc: announce cryptodev changes to offload RSA in VirtIO
  2024-07-22 14:55  5% [PATCH] doc: announce cryptodev changes to offload RSA in VirtIO Gowrishankar Muthukrishnan
@ 2024-07-24  6:49  0% ` Akhil Goyal
  2024-07-25  9:48  0% ` Kusztal, ArkadiuszX
  1 sibling, 0 replies; 200+ results
From: Akhil Goyal @ 2024-07-24  6:49 UTC (permalink / raw)
  To: Gowrishankar Muthukrishnan, dev, Anoob Joseph, bruce.richardson,
	ciara.power, Jerin Jacob, fanzhang.oss, arkadiuszx.kusztal,
	kai.ji, jack.bond-preston, david.marchand, hemant.agrawal,
	pablo.de.lara.guarch, fiona.trahe, declan.doherty, matan,
	ruifeng.wang, abhinandan.gujjar, maxime.coquelin, chenbox,
	sunilprakashrao.uttarwar, andrew.boyer, ajit.khaparde,
	raveendra.padasalagi, vikas.gupta, zhangfei.gao, g.singh,
	jianjay.zhou, lee.daly
  Cc: Gowrishankar Muthukrishnan

> Announce cryptodev changes to offload RSA asymmetric operation in
> VirtIO PMD.
> 
> Signed-off-by: Gowrishankar Muthukrishnan <gmuthukrishn@marvell.com>
> --
> RFC:
>   https://patches.dpdk.org/project/dpdk/patch/20230928095300.1353-2-gmuthukrishn@marvell.com/
>   https://patches.dpdk.org/project/dpdk/patch/20230928095300.1353-3-gmuthukrishn@marvell.com/
> ---
Acked-by: Akhil Goyal <gakhil@marvell.com>

>  doc/guides/rel_notes/deprecation.rst | 11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
> index 6948641ff6..26fec84aba 100644
> --- a/doc/guides/rel_notes/deprecation.rst
> +++ b/doc/guides/rel_notes/deprecation.rst
> @@ -147,3 +147,14 @@ Deprecation Notices
>    will be deprecated and subsequently removed in DPDK 24.11 release.
>    Before this, the new port library API (functions rte_swx_port_*)
>    will gradually transition from experimental to stable status.
> +
> +* cryptodev: The struct rte_crypto_rsa_padding will be moved from
> +  rte_crypto_rsa_op_param struct to rte_crypto_rsa_xform struct,
> +  breaking ABI. The new location is recommended to comply with
> +  virtio-crypto specification. Applications and drivers using
> +  this struct will be updated.
> +
> +* cryptodev: The rte_crypto_rsa_xform struct member to hold private key
> +  in either exponent or quintuple format is changed from union to struct
> +  data type. This change is to support ASN.1 syntax (RFC 3447 Appendix A.1.2).
> +  This change will not break existing applications.
> --
> 2.21.0


^ permalink raw reply	[relevance 0%]

* [PATCH v5 5/6] ci: test compiler memcpy
  @ 2024-07-24  7:53  5%   ` Mattias Rönnblom
  0 siblings, 0 replies; 200+ results
From: Mattias Rönnblom @ 2024-07-24  7:53 UTC (permalink / raw)
  To: dev
  Cc: Mattias Rönnblom, Morten Brørup, Stephen Hemminger,
	David Marchand, Pavan Nikhilesh, Bruce Richardson,
	Mattias Rönnblom

Add compilation tests for the use_cc_memcpy build option.

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
---
 .ci/linux-build.sh            | 5 +++++
 .github/workflows/build.yml   | 7 +++++++
 devtools/test-meson-builds.sh | 4 +++-
 3 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/.ci/linux-build.sh b/.ci/linux-build.sh
index 15ed51e4c1..a873f83d09 100755
--- a/.ci/linux-build.sh
+++ b/.ci/linux-build.sh
@@ -98,6 +98,11 @@ if [ "$STDATOMIC" = "true" ]; then
 else
 	OPTS="$OPTS -Dcheck_includes=true"
 fi
+if [ "$CCMEMCPY" = "true" ]; then
+	OPTS="$OPTS -Duse_cc_memcpy=true"
+else
+	OPTS="$OPTS -Duse_cc_memcpy=false"
+fi
 if [ "$MINI" = "true" ]; then
     OPTS="$OPTS -Denable_drivers=net/null"
     OPTS="$OPTS -Ddisable_libs=*"
diff --git a/.github/workflows/build.yml b/.github/workflows/build.yml
index dbf25626d4..cd45d6c6c1 100644
--- a/.github/workflows/build.yml
+++ b/.github/workflows/build.yml
@@ -31,6 +31,7 @@ jobs:
       RISCV64: ${{ matrix.config.cross == 'riscv64' }}
       RUN_TESTS: ${{ contains(matrix.config.checks, 'tests') }}
       STDATOMIC: ${{ contains(matrix.config.checks, 'stdatomic') }}
+      CCMEMCPY: ${{ contains(matrix.config.checks, 'ccmemcpy') }}
 
     strategy:
       fail-fast: false
@@ -45,6 +46,12 @@ jobs:
           - os: ubuntu-22.04
             compiler: clang
             checks: stdatomic
+          - os: ubuntu-22.04
+            compiler: gcc
+            checks: ccmemcpy
+          - os: ubuntu-22.04
+            compiler: clang
+            checks: ccmemcpy
           - os: ubuntu-22.04
             compiler: gcc
             checks: abi+debug+doc+examples+tests
diff --git a/devtools/test-meson-builds.sh b/devtools/test-meson-builds.sh
index d71bb1ded0..e72146be3b 100755
--- a/devtools/test-meson-builds.sh
+++ b/devtools/test-meson-builds.sh
@@ -228,12 +228,14 @@ for c in gcc clang ; do
 		if [ $s = shared ] ; then
 			abicheck=ABI
 			stdatomic=-Denable_stdatomic=true
+			ccmemcpy=-Duse_cc_memcpy=true
 		else
 			abicheck=skipABI # save time and disk space
 			stdatomic=-Denable_stdatomic=false
+			ccmemcpy=-Duse_cc_memcpy=false
 		fi
 		export CC="$CCACHE $c"
-		build build-$c-$s $c $abicheck $stdatomic --default-library=$s
+		build build-$c-$s $c $abicheck $stdatomic $ccmemcpy --default-library=$s
 		unset CC
 	done
 done
-- 
2.34.1


^ permalink raw reply	[relevance 5%]

* RE: [EXTERNAL] Re: [PATCH] doc: announce vhost changes to support asymmetric operation
  2024-07-23 18:30  4% ` Jerin Jacob
@ 2024-07-25  9:29  4%   ` Gowrishankar Muthukrishnan
  0 siblings, 0 replies; 200+ results
From: Gowrishankar Muthukrishnan @ 2024-07-25  9:29 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: dev, Anoob Joseph, bruce.richardson, ciara.power, Jerin Jacob,
	fanzhang.oss, arkadiuszx.kusztal, kai.ji, jack.bond-preston,
	david.marchand, hemant.agrawal, pablo.de.lara.guarch,
	fiona.trahe, declan.doherty, matan, ruifeng.wang,
	abhinandan.gujjar, maxime.coquelin, chenbox,
	sunilprakashrao.uttarwar, andrew.boyer, ajit.khaparde,
	raveendra.padasalagi, vikas.gupta, zhangfei.gao, g.singh,
	jianjay.zhou, lee.daly

[-- Attachment #1: Type: text/plain, Size: 255 bytes --]

Sure Jerin. I’ll drop this proposal as ABI versioning could help. Thanks.

> It looks like this case is adding new arguments to a function. Could you
> check whether ABI versioning helps here? It seems like it could be easily
> managed with ABI versioning.

[-- Attachment #2: Type: text/html, Size: 2978 bytes --]

^ permalink raw reply	[relevance 4%]

* RE: [PATCH] doc: announce cryptodev changes to offload RSA in VirtIO
  2024-07-22 14:55  5% [PATCH] doc: announce cryptodev changes to offload RSA in VirtIO Gowrishankar Muthukrishnan
  2024-07-24  6:49  0% ` [EXTERNAL] " Akhil Goyal
@ 2024-07-25  9:48  0% ` Kusztal, ArkadiuszX
  2024-07-25 15:53  0%   ` Gowrishankar Muthukrishnan
  2024-07-25 16:00  0%   ` Gowrishankar Muthukrishnan
  1 sibling, 2 replies; 200+ results
From: Kusztal, ArkadiuszX @ 2024-07-25  9:48 UTC (permalink / raw)
  To: Gowrishankar Muthukrishnan, dev, Anoob Joseph, Richardson, Bruce,
	ciara.power, jerinj, fanzhang.oss, Ji, Kai, jack.bond-preston,
	Marchand, David, hemant.agrawal, De Lara Guarch, Pablo, Trahe,
	Fiona, Doherty, Declan, matan, ruifeng.wang, Gujjar,
	Abhinandan S, maxime.coquelin, chenbox, sunilprakashrao.uttarwar,
	andrew.boyer, ajit.khaparde, raveendra.padasalagi, vikas.gupta,
	zhangfei.gao, g.singh, jianjay.zhou, Daly, Lee

Hi Gowrishankar,

> -----Original Message-----
> From: Gowrishankar Muthukrishnan <gmuthukrishn@marvell.com>
> Sent: Monday, July 22, 2024 4:56 PM
> To: dev@dpdk.org; Anoob Joseph <anoobj@marvell.com>; Richardson, Bruce
> <bruce.richardson@intel.com>; ciara.power@intel.com; jerinj@marvell.com;
> fanzhang.oss@gmail.com; Kusztal, ArkadiuszX <arkadiuszx.kusztal@intel.com>;
> Ji, Kai <kai.ji@intel.com>; jack.bond-preston@foss.arm.com; Marchand, David
> <david.marchand@redhat.com>; hemant.agrawal@nxp.com; De Lara Guarch,
> Pablo <pablo.de.lara.guarch@intel.com>; Trahe, Fiona
> <fiona.trahe@intel.com>; Doherty, Declan <declan.doherty@intel.com>;
> matan@nvidia.com; ruifeng.wang@arm.com; Gujjar, Abhinandan S
> <abhinandan.gujjar@intel.com>; maxime.coquelin@redhat.com;
> chenbox@nvidia.com; sunilprakashrao.uttarwar@amd.com;
> andrew.boyer@amd.com; ajit.khaparde@broadcom.com;
> raveendra.padasalagi@broadcom.com; vikas.gupta@broadcom.com;
> zhangfei.gao@linaro.org; g.singh@nxp.com; jianjay.zhou@huawei.com; Daly,
> Lee <lee.daly@intel.com>
> Cc: Gowrishankar Muthukrishnan <gmuthukrishn@marvell.com>
> Subject: [PATCH] doc: announce cryptodev changes to offload RSA in VirtIO
> 
> Announce cryptodev changes to offload RSA asymmetric operation in VirtIO
> PMD.
> 
> Signed-off-by: Gowrishankar Muthukrishnan <gmuthukrishn@marvell.com>
> --
> RFC:
>   https://patches.dpdk.org/project/dpdk/patch/20230928095300.1353-2-gmuthukrishn@marvell.com/
>   https://patches.dpdk.org/project/dpdk/patch/20230928095300.1353-3-gmuthukrishn@marvell.com/
> ---
>  doc/guides/rel_notes/deprecation.rst | 11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
> index 6948641ff6..26fec84aba 100644
> --- a/doc/guides/rel_notes/deprecation.rst
> +++ b/doc/guides/rel_notes/deprecation.rst
> @@ -147,3 +147,14 @@ Deprecation Notices
>    will be deprecated and subsequently removed in DPDK 24.11 release.
>    Before this, the new port library API (functions rte_swx_port_*)
>    will gradually transition from experimental to stable status.
> +
> +* cryptodev: The struct rte_crypto_rsa_padding will be moved from
> +  rte_crypto_rsa_op_param struct to rte_crypto_rsa_xform struct,
> +  breaking ABI. The new location is recommended to comply with
> +  virtio-crypto specification. Applications and drivers using
> +  this struct will be updated.
> +

The problem I see here is that there is one private key but multiple padding combinations.
Therefore, for every padding variation, we would need to copy the same private key anew, duplicating it in memory.
This was exactly my only reason for keeping a session-like struct in asymmetric crypto.
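
A rough sketch of the trade-off, assuming simplified stand-in types (these are not the real rte_crypto_rsa_* structures, and the 256-byte key size is only an example):

```c
#include <stddef.h>
#include <stdint.h>

#define KEY_BYTES 256 /* assumed: a 2048-bit private exponent */

enum pad { PAD_NONE, PAD_PKCS1_5, PAD_OAEP, PAD_PSS };

/* Layout A: padding is chosen per operation, one session per key. */
struct session_a { uint8_t priv_key[KEY_BYTES]; };
struct op_a { const struct session_a *sess; enum pad pad; };

/* Layout B: padding is fixed in the session, one session per (key, padding). */
struct session_b { uint8_t priv_key[KEY_BYTES]; enum pad pad; };

/* Key bytes held in memory when one key is used with n_pads paddings. */
size_t key_bytes_a(size_t n_pads) { (void)n_pads; return KEY_BYTES; }
size_t key_bytes_b(size_t n_pads) { return n_pads * KEY_BYTES; }
```

With three padding schemes in use, layout A holds the key once while layout B holds three copies of it.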

> +* cryptodev: The rte_crypto_rsa_xform struct member to hold private key
> +  in either exponent or quintuple format is changed from union to struct
> +  data type. This change is to support ASN.1 syntax (RFC 3447 Appendix A.1.2).
> +  This change will not break existing applications.
I agree with this one. RFC 8017 obsoletes RFC 3447.
> --
> 2.21.0


^ permalink raw reply	[relevance 0%]

* RE: [PATCH] doc: announce cryptodev change to support EDDSA
  2024-07-22 14:53  8% [PATCH] doc: announce cryptodev change to support EDDSA Gowrishankar Muthukrishnan
  2024-07-24  5:07  0% ` Anoob Joseph
  2024-07-24  6:46  0% ` [EXTERNAL] " Akhil Goyal
@ 2024-07-25 15:01  0% ` Kusztal, ArkadiuszX
  2024-07-31 12:57  3%   ` Thomas Monjalon
  2 siblings, 1 reply; 200+ results
From: Kusztal, ArkadiuszX @ 2024-07-25 15:01 UTC (permalink / raw)
  To: Gowrishankar Muthukrishnan, dev, Richardson, Bruce, ciara.power,
	jerinj, fanzhang.oss, Ji,  Kai, jack.bond-preston, Marchand,
	David, hemant.agrawal, De Lara Guarch, Pablo, Trahe, Fiona,
	Doherty, Declan, matan, ruifeng.wang, Gujjar, Abhinandan S,
	maxime.coquelin, chenbox, sunilprakashrao.uttarwar, andrew.boyer,
	ajit.khaparde, raveendra.padasalagi, vikas.gupta, g.singh,
	jianjay.zhou, Daly, Lee
  Cc: Anoob Joseph, zhangfei.gao

> Announce the additions in cryptodev ABI to support EDDSA algorithm.
> 
> Signed-off-by: Gowrishankar Muthukrishnan <gmuthukrishn@marvell.com>

Acked-by: Arkadiusz Kusztal <arkadiuszx.kusztal@intel.com>

^ permalink raw reply	[relevance 0%]

* RE: [PATCH] doc: announce cryptodev changes to offload RSA in VirtIO
  2024-07-25  9:48  0% ` Kusztal, ArkadiuszX
@ 2024-07-25 15:53  0%   ` Gowrishankar Muthukrishnan
  2024-07-30 14:39  0%     ` Gowrishankar Muthukrishnan
  2024-07-25 16:00  0%   ` Gowrishankar Muthukrishnan
  1 sibling, 1 reply; 200+ results
From: Gowrishankar Muthukrishnan @ 2024-07-25 15:53 UTC (permalink / raw)
  To: Kusztal, ArkadiuszX, dev, Anoob Joseph, Richardson, Bruce,
	ciara.power, Jerin Jacob, fanzhang.oss, Ji, Kai,
	jack.bond-preston, Marchand, David, hemant.agrawal,
	De Lara Guarch, Pablo, Trahe, Fiona, Doherty, Declan, matan,
	ruifeng.wang, Gujjar, Abhinandan S, maxime.coquelin, chenbox,
	sunilprakashrao.uttarwar, andrew.boyer, ajit.khaparde,
	raveendra.padasalagi, vikas.gupta, zhangfei.gao, g.singh,
	jianjay.zhou, Daly, Lee

[-- Attachment #1: Type: text/plain, Size: 1788 bytes --]

> > +* cryptodev: The struct rte_crypto_rsa_padding will be moved from
> > +  rte_crypto_rsa_op_param struct to rte_crypto_rsa_xform struct,
> > +  breaking ABI. The new location is recommended to comply with
> > +  virtio-crypto specification. Applications and drivers using
> > +  this struct will be updated.
> > +
>
> The problem here, I see is that there is one private key but multiple combinations of padding.
> Therefore, for every padding variation, we need to copy the same private key anew, duplicating it in memory.
> The only reason for me to keep a session-like struct in asymmetric crypto was exactly this.

Each padding scheme in RSA has its own pros and cons (in terms of implementation as well).
When the same private key for signing (or its public key, in the case of encryption) is shared between
multiple crypto ops that differ in padding scheme, an attack against one vulnerable scheme
could expose the private key used in the session and so compromise
the other crypto operations.

I think this could be one reason why the VirtIO spec mandates padding info as a session parameter.
So although the key is duplicated in memory, the private and public keys are better protected, and in a catastrophe
only the affected session needs to be destroyed.

Thanks,
Gowrishankar

Though padding schemes could be same

> > +* cryptodev: The rte_crypto_rsa_xform struct member to hold private key
> > +  in either exponent or quintuple format is changed from union to struct
> > +  data type. This change is to support ASN.1 syntax (RFC 3447 Appendix A.1.2).
> > +  This change will not break existing applications.
>
> This one I agree. RFC 8017 obsoletes RFC 3447.

Thanks,
Gowrishankar

> > --
> > 2.21.0

[-- Attachment #2: Type: text/html, Size: 7504 bytes --]

^ permalink raw reply	[relevance 0%]

* RE: [PATCH] doc: announce cryptodev changes to offload RSA in VirtIO
  2024-07-25  9:48  0% ` Kusztal, ArkadiuszX
  2024-07-25 15:53  0%   ` Gowrishankar Muthukrishnan
@ 2024-07-25 16:00  0%   ` Gowrishankar Muthukrishnan
  1 sibling, 0 replies; 200+ results
From: Gowrishankar Muthukrishnan @ 2024-07-25 16:00 UTC (permalink / raw)
  To: Kusztal, ArkadiuszX, dev, Anoob Joseph, Richardson, Bruce,
	ciara.power, Jerin Jacob, fanzhang.oss, Ji, Kai,
	jack.bond-preston, Marchand, David, hemant.agrawal,
	De Lara Guarch, Pablo, Trahe, Fiona, Doherty, Declan, matan,
	ruifeng.wang, Gujjar, Abhinandan S, maxime.coquelin, chenbox,
	sunilprakashrao.uttarwar, andrew.boyer, ajit.khaparde,
	raveendra.padasalagi, vikas.gupta, zhangfei.gao, g.singh,
	jianjay.zhou, Daly, Lee

[-- Attachment #1: Type: text/plain, Size: 1783 bytes --]

Hi ArkadiuszX,

> > +
> > +* cryptodev: The struct rte_crypto_rsa_padding will be moved from
> > +  rte_crypto_rsa_op_param struct to rte_crypto_rsa_xform struct,
> > +  breaking ABI. The new location is recommended to comply with
> > +  virtio-crypto specification. Applications and drivers using
> > +  this struct will be updated.
> > +
>
> The problem here, I see is that there is one private key but multiple combinations of padding.
> Therefore, for every padding variation, we need to copy the same private key anew, duplicating it in memory.
> The only reason for me to keep a session-like struct in asymmetric crypto was exactly this.

Each padding scheme in RSA has its own pros and cons (in terms of implementation as well).
When the same private key for signing (or its public key, in the case of encryption) is shared between
multiple crypto ops that differ in padding scheme, an attack against one vulnerable scheme
could expose the private key used in the session and so compromise
the other crypto operations.

I think this could be one reason why the VirtIO spec mandates padding info as a session parameter.
So although the key is duplicated in memory, the private and public keys are better protected, and in a catastrophe
only the affected session needs to be destroyed.

Please share your thoughts.

> > +* cryptodev: The rte_crypto_rsa_xform struct member to hold private key
> > +  in either exponent or quintuple format is changed from union to struct
> > +  data type. This change is to support ASN.1 syntax (RFC 3447 Appendix A.1.2).
> > +  This change will not break existing applications.
>
> This one I agree. RFC 8017 obsoletes RFC 3447.

Thanks,
Gowrishankar

> > --
> > 2.21.0

[-- Attachment #2: Type: text/html, Size: 7769 bytes --]

^ permalink raw reply	[relevance 0%]

* Re: [PATCH] doc: announce dmadev new capability addition
  @ 2024-07-29 15:20  3% ` Jerin Jacob
  2024-07-29 17:17  0%   ` Morten Brørup
  2024-07-31 10:24  0%   ` Thomas Monjalon
  0 siblings, 2 replies; 200+ results
From: Jerin Jacob @ 2024-07-29 15:20 UTC (permalink / raw)
  To: Vamsi Attunuru, Morten Brørup
  Cc: fengchengwen, dev, kevin.laatz, bruce.richardson, jerinj, anoobj

On Mon, Jul 29, 2024 at 6:19 PM Vamsi Attunuru <vattunuru@marvell.com> wrote:
>
> Announce addition of new capability flag and fields in

The new capability flag won't break ABI. We can mention only the new
fields in the rte_dma_info and rte_dma_conf structures.
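
A small sketch of the difference, assuming hypothetical stand-in definitions (not the real dmadev ones) and a structure with no spare padding or reserved space for the new field to occupy:

```c
#include <stdint.h>

/* Hypothetical capability bits in an existing 64-bit word. */
#define CAPA_EXISTING (UINT64_C(1) << 0)
#define CAPA_QOS      (UINT64_C(1) << 1) /* new flag: takes a spare bit */

/* Hypothetical info structure before and after adding a field. */
struct info_old { uint64_t dev_capa; uint64_t other; };
struct info_new { uint64_t dev_capa; uint64_t other; uint16_t nb_priorities; };

/* The flag is read from the same word; the layout is untouched. */
int has_qos(const struct info_old *info)
{
	return (info->dev_capa & CAPA_QOS) != 0;
}
```

The flag only changes the value stored in dev_capa, whereas the new field changes the structure size, which is what an application compiled against the old layout would get wrong.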

Another option is a new set of APIs for priority enablement. The downside
is more code. All, opinions?


> rte_dma_info and rte_dma_conf structures.
>
> Signed-off-by: Vamsi Attunuru <vattunuru@marvell.com>
> ---
> RFC:
> https://patchwork.dpdk.org/project/dpdk/patch/20240729115558.263574-1-vattunuru@marvell.com/
>
>  doc/guides/rel_notes/deprecation.rst | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
> index 6948641ff6..05d28473c0 100644
> --- a/doc/guides/rel_notes/deprecation.rst
> +++ b/doc/guides/rel_notes/deprecation.rst
> @@ -147,3 +147,8 @@ Deprecation Notices
>    will be deprecated and subsequently removed in DPDK 24.11 release.
>    Before this, the new port library API (functions rte_swx_port_*)
>    will gradually transition from experimental to stable status.
> +
> +* dmadev: A new flag ``RTE_DMA_CAPA_QOS`` will be introduced to advertise
> +  dma device's QoS capability. Also new fields will be added in ``rte_dma_info``
> +  and ``rte_dma_conf`` structures to get device supported priority levels
> +  and to configure the required priority level.
> --
> 2.25.1
>

^ permalink raw reply	[relevance 3%]

* RE: [PATCH] doc: announce dmadev new capability addition
  2024-07-29 15:20  3% ` Jerin Jacob
@ 2024-07-29 17:17  0%   ` Morten Brørup
  2024-07-31 10:24  0%   ` Thomas Monjalon
  1 sibling, 0 replies; 200+ results
From: Morten Brørup @ 2024-07-29 17:17 UTC (permalink / raw)
  To: Jerin Jacob, Vamsi Attunuru
  Cc: fengchengwen, dev, kevin.laatz, bruce.richardson, jerinj, anoobj

> From: Jerin Jacob [mailto:jerinjacobk@gmail.com]
> Sent: Monday, 29 July 2024 17.20
> 
> On Mon, Jul 29, 2024 at 6:19 PM Vamsi Attunuru <vattunuru@marvell.com>
> wrote:
> >
> > Announce addition of new capability flag and fields in
> > rte_dma_info and rte_dma_conf structures.
> 
> The new capability flag won't break ABI. We can mention only fields
> update rte_dma_info and rte_dma_conf structures.
> 
> Another option is new set APIs for priority enablement.  The downside
> is more code. All, opinions?

I think that this feature should be simple enough to expand the rte_dma_info and rte_dma_conf structures with a few new fields, rather than adding a new set of APIs for it.

It seems to become 1-level weighted priority scheduling of a few QoS classes, not hierarchical or anything complex enough to justify a new set of APIs. Just a simple array of per-class properties.

The max possible number of QoS classes (i.e. the array size) should be build time configurable. Considering Marvell hardware, it seems 4 would be a good default.
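
Something along these lines, as a sketch only: all names here are hypothetical, the per-class property is reduced to a single weight, and rejecting a zero weight is an assumption of this example rather than anything agreed in the thread.

```c
#include <stdint.h>

/* Build-time limit on QoS classes; 4 is the default suggested above. */
#ifndef DMA_QOS_CLASSES_MAX
#define DMA_QOS_CLASSES_MAX 4
#endif

/* Per-class properties; only a scheduling weight in this sketch. */
struct dma_qos_class { uint16_t weight; };

/* Hypothetical additions to the info and conf structures. */
struct dma_info_qos { uint16_t nb_classes; };
struct dma_conf_qos {
	uint16_t nb_classes;
	struct dma_qos_class classes[DMA_QOS_CLASSES_MAX];
};

/* Return 0 if conf fits the device and build limits, -1 otherwise. */
int dma_qos_conf_check(const struct dma_info_qos *info,
		       const struct dma_conf_qos *conf)
{
	if (conf->nb_classes > info->nb_classes ||
	    conf->nb_classes > DMA_QOS_CLASSES_MAX)
		return -1;
	for (uint16_t i = 0; i < conf->nb_classes; i++)
		if (conf->classes[i].weight == 0)
			return -1; /* assumed: every used class needs a weight */
	return 0;
}
```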


^ permalink raw reply	[relevance 0%]

* Re: [PATCH v8 5/5] dts: add API doc generation
  2024-07-12  8:57  3%   ` [PATCH v8 5/5] dts: add API doc generation Juraj Linkeš
@ 2024-07-30 13:51  0%     ` Thomas Monjalon
  2024-08-01 13:03  0%       ` Juraj Linkeš
  0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2024-07-30 13:51 UTC (permalink / raw)
  To: Juraj Linkeš
  Cc: Honnappa.Nagarahalli, bruce.richardson, jspewock, probb,
	paul.szczepanek, Luca.Vizzarro, npratte, dev, Luca Vizzarro

12/07/2024 10:57, Juraj Linkeš:
> The tool used to generate DTS API docs is Sphinx, which is already in
> use in DPDK. The same configuration is used to preserve style with one
> DTS-specific configuration (so that the DPDK docs are unchanged) that
> modifies how the sidebar displays the content.

What is changed in the sidebar?


> --- a/doc/api/doxy-api-index.md
> +++ b/doc/api/doxy-api-index.md
> @@ -244,3 +244,6 @@ The public API headers are grouped by topics:
>    [experimental APIs](@ref rte_compat.h),
>    [ABI versioning](@ref rte_function_versioning.h),
>    [version](@ref rte_version.h)
> +
> +- **tests**:
> +  [**DTS**](@dts_api_main_page)

OK looks good


> --- a/doc/api/doxy-api.conf.in
> +++ b/doc/api/doxy-api.conf.in
> @@ -124,6 +124,8 @@ SEARCHENGINE            = YES
>  SORT_MEMBER_DOCS        = NO
>  SOURCE_BROWSER          = YES
>  
> +ALIASES                 = "dts_api_main_page=@DTS_API_MAIN_PAGE@"

Why is it needed?
That's the only way to reference it in doxy-api-index.md?
Would be nice to explain in the commit log.

> --- a/doc/api/meson.build
> +++ b/doc/api/meson.build
> +# A local reference must be relative to the main index.html page
> +# The path below can't be taken from the DTS meson file as that would
> +# require recursive subdir traversal (doc, dts, then doc again)

This comment is really obscure.

> +cdata.set('DTS_API_MAIN_PAGE', join_paths('..', 'dts', 'html', 'index.html'))

Oh I think I get it:
	- DTS_API_MAIN_PAGE is the Meson variable
	- dts_api_main_page is the Doxygen variable
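
The two-step substitution can be sketched roughly as below. This is only a stand-in written for illustration: `configure()` and `expand_alias()` are hypothetical helpers that imitate what Meson's configuration data and Doxygen's alias expansion do, not code from either tool.

```python
import posixpath
import re

# Step 1 (Meson): cdata.set('DTS_API_MAIN_PAGE', join_paths(...)) in doc/api/meson.build.
cdata = {'DTS_API_MAIN_PAGE': posixpath.join('..', 'dts', 'html', 'index.html')}


def configure(template, cdata):
    """Hypothetical stand-in for Meson substituting @VAR@ placeholders."""
    return re.sub(r'@(\w+)@', lambda m: cdata[m.group(1)], template)


def expand_alias(text, name, value):
    """Hypothetical stand-in for Doxygen expanding an alias: @name becomes value."""
    return text.replace('@' + name, value)


# The line added to doxy-api.conf.in, before and after the Meson step.
conf_in = 'ALIASES = "dts_api_main_page=@DTS_API_MAIN_PAGE@"'
conf = configure(conf_in, cdata)

# Step 2 (Doxygen): the alias definition is "name=value".
name, value = conf.split('"')[1].split('=', 1)

# The link added to doxy-api-index.md, as Doxygen would see it after expansion.
link = expand_alias('[**DTS**](@dts_api_main_page)', name, value)
print(link)
```

The uppercase name only exists while Meson generates the Doxygen configuration; the lowercase name is what the Markdown index refers to.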


> +# Napoleon enables the Google format of Python doscstrings, used in DTS
> +# Intersphinx allows linking to external projects, such as Python docs, also used in DTS

Close sentences with a dot, it is easier to read.

> +extensions = ['sphinx.ext.napoleon', 'sphinx.ext.intersphinx']
> +
> +# DTS Python docstring options
> +autodoc_default_options = {
> +    'members': True,
> +    'member-order': 'bysource',
> +    'show-inheritance': True,
> +}
> +autodoc_class_signature = 'separated'
> +autodoc_typehints = 'both'
> +autodoc_typehints_format = 'short'
> +autodoc_typehints_description_target = 'documented'
> +napoleon_numpy_docstring = False
> +napoleon_attr_annotations = True
> +napoleon_preprocess_types = True
> +add_module_names = False
> +toc_object_entries = True
> +toc_object_entries_show_parents = 'hide'
> +intersphinx_mapping = {'python': ('https://docs.python.org/3', None)}
> +
> +dts_root = environ.get('DTS_ROOT')

Why does it need to be passed as an environment variable?
Isn't it a fixed absolute path?
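
For reference, the flow in the patch can be sketched as follows. This is a simplified sketch under assumptions: `wrapper()` and `conf_py()` are hypothetical function names that condense buildtools/call-sphinx-build.py and doc/guides/conf.py, the positional arguments of the real wrapper are omitted, and the path is a placeholder.

```python
import argparse
import os
import sys


def wrapper(argv):
    """Condensed stand-in for call-sphinx-build.py argument handling."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--dts-root', default=None)
    # Options the wrapper does not know are kept for sphinx-build itself.
    args, extra_args = parser.parse_known_args(argv)
    if args.dts_root:
        # conf.py runs inside sphinx-build, so the value travels via the environment.
        os.environ['DTS_ROOT'] = args.dts_root
    return extra_args


def conf_py():
    """Condensed stand-in for the DTS part of doc/guides/conf.py."""
    dts_root = os.environ.get('DTS_ROOT')
    if dts_root:
        # Sphinx imports the DTS code, so its root must be importable.
        sys.path.append(dts_root)
    return dts_root


# Placeholder path, for illustration only.
extra = wrapper(['-E', '--dts-root', '/path/to/dpdk/dts'])
root = conf_py()
```

When `--dts-root` is not passed, `DTS_ROOT` stays unset and the DTS-specific settings are skipped, which is how the same conf.py can serve both the DPDK guides and the DTS API docs.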

> +if dts_root:
> +    path.append(dts_root)
> +    # DTS Sidebar config
> +    html_theme_options = {
> +        'collapse_navigation': False,
> +        'navigation_depth': -1,
> +    }

[...]

> +To build DTS API docs, install the dependencies with Poetry, then enter its shell:

I don't plan to use Poetry on my machine.
Can we simply describe the dependencies even if the versions are not specified?

> +
> +.. code-block:: console
> +
> +   poetry install --no-root --with docs
> +   poetry shell
> +
> +The documentation is built using the standard DPDK build system.
> +After executing the meson command and entering Poetry's shell, build the documentation with:
> +
> +.. code-block:: console
> +
> +   ninja -C build dts-doc

Don't we rely on the Meson option "enable_docs"?
> +
> +The output is generated in ``build/doc/api/dts/html``.
> +
> +.. Note::

In general the RST expressions are lowercase.

> +
> +   Make sure to fix any Sphinx warnings when adding or updating docstrings,
> +   and also run the ``devtools/dts-check-format.sh`` script and address any issues it finds.

It looks like something to write in the contributing guide.


> +++ b/dts/doc/meson.build
> @@ -0,0 +1,27 @@
> +# SPDX-License-Identifier: BSD-3-Clause
> +# Copyright(c) 2023 PANTHEON.tech s.r.o.
> +
> +sphinx = find_program('sphinx-build', required: false)
> +sphinx_apidoc = find_program('sphinx-apidoc', required: false)
> +
> +if not sphinx.found() or not sphinx_apidoc.found()

You should include the option "enable_docs" here.

> +    subdir_done()
> +endif




^ permalink raw reply	[relevance 0%]

* RE: [PATCH] doc: announce cryptodev changes to offload RSA in VirtIO
  2024-07-25 15:53  0%   ` Gowrishankar Muthukrishnan
@ 2024-07-30 14:39  0%     ` Gowrishankar Muthukrishnan
  2024-07-31 12:51  0%       ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Gowrishankar Muthukrishnan @ 2024-07-30 14:39 UTC (permalink / raw)
  To: Gowrishankar Muthukrishnan, Kusztal, ArkadiuszX, dev,
	Anoob Joseph, Richardson, Bruce, ciara.power, Jerin Jacob,
	fanzhang.oss, Ji, Kai, jack.bond-preston, Marchand, David,
	hemant.agrawal, De Lara Guarch, Pablo, Trahe, Fiona, Doherty,
	Declan, matan, ruifeng.wang, Gujjar, Abhinandan S,
	maxime.coquelin, chenbox, sunilprakashrao.uttarwar, andrew.boyer,
	ajit.khaparde, raveendra.padasalagi, vikas.gupta, zhangfei.gao,
	g.singh, jianjay.zhou, Daly, Lee

[-- Attachment #1: Type: text/plain, Size: 2294 bytes --]

Hi,
We need to fix padding info in DPDK as per the VirtIO specification in order to support RSA in virtio devices. The VirtIO-crypto and DPDK specifications differ in the way padding is handled.
With the current DPDK and virtio specifications, it is impossible to support RSA in virtio-crypto. If you think the DPDK spec should not be modified, we will try to amend the virtIO spec to match DPDK, but since we do not know if the virtIO community would accept that, can we merge the deprecation notice?

Thanks,
Gowrishankar

>>> +* cryptodev: The struct rte_crypto_rsa_padding will be moved from
>>> +  rte_crypto_rsa_op_param struct to rte_crypto_rsa_xform struct,
>>> +  breaking ABI. The new location is recommended to comply with
>>> +  virtio-crypto specification. Applications and drivers using
>>> +  this struct will be updated.
>>> +

>> The problem here, I see is that there is one private key but multiple combinations of padding.
>> Therefore, for every padding variation, we need to copy the same private key anew, duplicating it in memory.
>> The only reason for me to keep a session-like struct in asymmetric crypto was exactly this.

> Each padding scheme in RSA has its own pros and cons (in terms of implementations as well).
> When we share the same private key for Sign (and its public key in case of Encryption) between
> multiple crypto ops (varying by padding schemes among cops), a vulnerable attack against one scheme
> could potentially open door to used private key in the session and hence take advantage
> on other crypto operations.

> I think, this could be one reason for why VirtIO spec mandates padding info as session parameter.
> Hence, more than duplicating in memory, private and public keys are secured and in catastrophe,
> only that session could be destroyed.

>>> +* cryptodev: The rte_crypto_rsa_xform struct member to hold private key
>>> +  in either exponent or quintuple format is changed from union to struct
>>> +  data type. This change is to support ASN.1 syntax (RFC 3447 Appendix A.1.2).
>>> +  This change will not break existing applications.

>> This one I agree. RFC 8017 obsoletes RFC 3447.

> Thanks,
> Gowrishankar

[-- Attachment #2: Type: text/html, Size: 8293 bytes --]

^ permalink raw reply	[relevance 0%]

* Re: [PATCH] doc: announce dmadev new capability addition
  2024-07-29 15:20  3% ` Jerin Jacob
  2024-07-29 17:17  0%   ` Morten Brørup
@ 2024-07-31 10:24  0%   ` Thomas Monjalon
  1 sibling, 0 replies; 200+ results
From: Thomas Monjalon @ 2024-07-31 10:24 UTC (permalink / raw)
  To: Vamsi Attunuru, Morten Brørup, dev
  Cc: fengchengwen, kevin.laatz, bruce.richardson, jerinj, anoobj, Jerin Jacob

29/07/2024 17:20, Jerin Jacob:
> On Mon, Jul 29, 2024 at 6:19 PM Vamsi Attunuru <vattunuru@marvell.com> wrote:
> >
> > Announce addition of new capability flag and fields in
> 
> The new capability flag won't break ABI. We can mention only fields
> update rte_dma_info and rte_dma_conf structures.
> 
> Another option is new set APIs for priority enablement.  The downside
> is more code. All, opinions?
> 
> > rte_dma_info and rte_dma_conf structures.

I'm fine with just updating these structs.

> > Signed-off-by: Vamsi Attunuru <vattunuru@marvell.com>

Acked-by: Thomas Monjalon <thomas@monjalon.net>

Any other opinions?



^ permalink raw reply	[relevance 0%]

* Re: [PATCH] doc: announce cryptodev changes to offload RSA in VirtIO
  2024-07-30 14:39  0%     ` Gowrishankar Muthukrishnan
@ 2024-07-31 12:51  0%       ` Thomas Monjalon
  2024-07-31 14:26  0%         ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2024-07-31 12:51 UTC (permalink / raw)
  To: Gowrishankar Muthukrishnan
  Cc: Kusztal, ArkadiuszX, dev, Anoob Joseph, Richardson, Bruce,
	ciara.power, Jerin Jacob, fanzhang.oss, Ji, Kai,
	jack.bond-preston, Marchand, David, hemant.agrawal,
	De Lara Guarch, Pablo, Trahe, Fiona, Doherty, Declan, matan,
	ruifeng.wang, Gujjar, Abhinandan S, maxime.coquelin, chenbox,
	sunilprakashrao.uttarwar, andrew.boyer, ajit.khaparde,
	raveendra.padasalagi, vikas.gupta, zhangfei.gao, g.singh,
	jianjay.zhou, Daly, Lee

30/07/2024 16:39, Gowrishankar Muthukrishnan:
> Hi,
> We need to fix padding info in DPDK as per VirtIO specification in order to support RSA in virtio devices. VirtIO-crypto specification and DPDK specification differs in the way padding is handled.
> With current DPDK & virtio specification, it is impossible to support RSA in virtio-crypto. If you think DPDK spec should not be modified, we will try to amend the virtIO spec to match DPDK, but since we do not know if the virtIO community would accept, can we merge the deprecation notice?

There is a long list of Cc but I see no support outside of Marvell.



> >>> +* cryptodev: The struct rte_crypto_rsa_padding will be moved from
> >>> +  rte_crypto_rsa_op_param struct to rte_crypto_rsa_xform struct,
> >>> +  breaking ABI. The new location is recommended to comply with
> >>> +  virtio-crypto specification. Applications and drivers using
> >>> +  this struct will be updated.
> >>> +
> 
> 
> >> The problem here, I see is that there is one private key but multiple combinations of padding.
> >> Therefore, for every padding variation, we need to copy the same private key anew, duplicating it in memory.
> >> The only reason for me to keep a session-like struct in asymmetric crypto was exactly this.
> > 
> > Each padding scheme in RSA has its own pros and cons (in terms of implementations as well).
> > When we share the same private key for Sign (and its public key in case of Encryption) between
> > multiple crypto ops (varying by padding schemes among cops), a vulnerable attack against one scheme
> > could potentially open door to used private key in the session and hence take advantage
> > on other crypto operations.
> > 
> > I think, this could be one reason for why VirtIO spec mandates padding info as session parameter.
> > Hence, more than duplicating in memory, private and public keys are secured and in catastrophe,
> > only that session could be destroyed.
> 
> 
> >>> +* cryptodev: The rte_crypto_rsa_xform struct member to hold private key
> >>> +  in either exponent or quintuple format is changed from union to
> >>> +struct
> >>> +  data type. This change is to support ASN.1 syntax (RFC 3447 Appendix A.1.2).
> >>> +  This change will not break existing applications.
> > >
> > > This one I agree. RFC 8017 obsoletes RFC 3447.




^ permalink raw reply	[relevance 0%]

* Re: [PATCH] doc: announce cryptodev change to support EDDSA
  2024-07-25 15:01  0% ` Kusztal, ArkadiuszX
@ 2024-07-31 12:57  3%   ` Thomas Monjalon
  2024-08-07 17:21  0%     ` [EXTERNAL] " Gowrishankar Muthukrishnan
  0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2024-07-31 12:57 UTC (permalink / raw)
  To: Gowrishankar Muthukrishnan, dev, Richardson, Bruce, ciara.power,
	jerinj, fanzhang.oss, Ji, Kai, jack.bond-preston, Marchand,
	David, hemant.agrawal, De Lara Guarch, Pablo, Trahe, Fiona,
	Doherty, Declan, matan, ruifeng.wang, Gujjar, Abhinandan S,
	maxime.coquelin, chenbox, sunilprakashrao.uttarwar, andrew.boyer,
	ajit.khaparde, raveendra.padasalagi, vikas.gupta, g.singh,
	jianjay.zhou, Daly, Lee
  Cc: Anoob Joseph, zhangfei.gao, Kusztal, ArkadiuszX

25/07/2024 17:01, Kusztal, ArkadiuszX:
> > Announce the additions in cryptodev ABI to support EDDSA algorithm.
> > 
> > Signed-off-by: Gowrishankar Muthukrishnan <gmuthukrishn@marvell.com>
> 
> Acked-by: Arkadiusz Kusztal <arkadiuszx.kusztal@intel.com>

Acked-by: Anoob Joseph <anoobj@marvell.com>
Acked-by: Akhil Goyal <gakhil@marvell.com>

Applied, thanks.

It means we are not able to add an algo without breaking ABI.
Is it something we can improve?



^ permalink raw reply	[relevance 3%]

* Re: [PATCH] doc: announce cryptodev changes to offload RSA in VirtIO
  2024-07-31 12:51  0%       ` Thomas Monjalon
@ 2024-07-31 14:26  0%         ` Thomas Monjalon
  2024-08-07 13:31  0%           ` Kusztal, ArkadiuszX
  0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2024-07-31 14:26 UTC (permalink / raw)
  To: Gowrishankar Muthukrishnan
  Cc: dev, Kusztal, ArkadiuszX, dev, Anoob Joseph, Richardson, Bruce,
	ciara.power, Jerin Jacob, fanzhang.oss, Ji, Kai,
	jack.bond-preston, Marchand, David, hemant.agrawal,
	De Lara Guarch, Pablo, Trahe, Fiona, Doherty, Declan, matan,
	ruifeng.wang, Gujjar, Abhinandan S, maxime.coquelin, chenbox,
	sunilprakashrao.uttarwar, andrew.boyer, ajit.khaparde,
	raveendra.padasalagi, vikas.gupta, zhangfei.gao, g.singh,
	jianjay.zhou, Daly, Lee

I'm not sure why we don't have a consensus on an idea proposed as RFC in September 2023.

Because there is not enough involvement outside of the Marvell team,
I will keep a vague announcement for the first item:

cryptodev: Some changes may happen to manage RSA padding for virtio-crypto.

The second item is applied verbatim, thanks.


31/07/2024 14:51, Thomas Monjalon:
> 30/07/2024 16:39, Gowrishankar Muthukrishnan:
> > Hi,
> > We need to fix padding info in DPDK as per VirtIO specification in order to support RSA in virtio devices. VirtIO-crypto specification and DPDK specification differs in the way padding is handled.
> > With current DPDK & virtio specification, it is impossible to support RSA in virtio-crypto. If you think DPDK spec should not be modified, we will try to amend the virtIO spec to match DPDK, but since we do not know if the virtIO community would accept, can we merge the deprecation notice?
> 
> There is a long list of Cc but I see no support outside of Marvell.
> 
> 
> 
> > >>> +* cryptodev: The struct rte_crypto_rsa_padding will be moved from
> > >>> +  rte_crypto_rsa_op_param struct to rte_crypto_rsa_xform struct,
> > >>> +  breaking ABI. The new location is recommended to comply with
> > >>> +  virtio-crypto specification. Applications and drivers using
> > >>> +  this struct will be updated.
> > >>> +
> > 
> > 
> > >> The problem here, I see is that there is one private key but multiple combinations of padding.
> > >> Therefore, for every padding variation, we need to copy the same private key anew, duplicating it in memory.
> > >> The only reason for me to keep a session-like struct in asymmetric crypto was exactly this.
> > > 
> > > Each padding scheme in RSA has its own pros and cons (in terms of implementations as well).
> > > When we share the same private key for Sign (and its public key in case of Encryption) between
> > > multiple crypto ops (varying by padding schemes among cops), a vulnerable attack against one scheme
> > > could potentially open door to used private key in the session and hence take advantage
> > > on other crypto operations.
> > > 
> > > I think, this could be one reason for why VirtIO spec mandates padding info as session parameter.
> > > Hence, more than duplicating in memory, private and public keys are secured and in catastrophe,
> > > only that session could be destroyed.
> > 
> > 
> > >>> +* cryptodev: The rte_crypto_rsa_xform struct member to hold private key
> > >>> +  in either exponent or quintuple format is changed from union to
> > >>> +struct
> > >>> +  data type. This change is to support ASN.1 syntax (RFC 3447 Appendix A.1.2).
> > >>> +  This change will not break existing applications.
> > > >
> > > > This one I agree. RFC 8017 obsoletes RFC 3447.




^ permalink raw reply	[relevance 0%]

* Re: [PATCH v4] doc: announce changes to dma device structures
  @ 2024-07-31 16:06  3%     ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2024-07-31 16:06 UTC (permalink / raw)
  To: Amit Prakash Shukla
  Cc: dev, fengchengwen, kevin.laatz, bruce.richardson, conor.walsh,
	gmuthukrishn, vvelumuri, g.singh, sachin.saxena, hemant.agrawal,
	dev, jerinj, vattunuru, anoobj, mb, Jerin Jacob

31/07/2024 13:01, Thomas Monjalon:
> 30/07/2024 19:27, Jerin Jacob:
> > On Tue, Jul 30, 2024 at 8:25 PM Amit Prakash Shukla
> > <amitprakashs@marvell.com> wrote:
> > >
> > > A new flag RTE_DMA_CAPA_QOS will be introduced to advertise dma
> > > device's QoS capability. In order to support the parameters for this
> > > flag, new fields will be added in rte_dma_info and rte_dma_conf
> > > structures to get device supported priority levels and to configure the
> > > required priority level.
> > >
> > > Signed-off-by: Vamsi Attunuru <vattunuru@marvell.com>
> > > Signed-off-by: Amit Prakash Shukla <amitprakashs@marvell.com>
> > 
> > 
> > Acked-by: Jerin Jacob <jerinj@marvell.com>
> 
> Acked-by: Thomas Monjalon <thomas@monjalon.net>

The RFC and the deprecation notices are sent a bit late.
We cannot conclude there is consensus.

I propose to raise it to the techboard if an ABI breakage is still required for 24.11.
As dmadev is quite new, I don't think it is a big issue.




^ permalink raw reply	[relevance 3%]

* DPDK 24.07 released
@ 2024-07-31 19:22  3% Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2024-07-31 19:22 UTC (permalink / raw)
  To: announce

A new major release is available:
	https://fast.dpdk.org/rel/dpdk-24.07.tar.xz

This is the work achieved during the last months:
	954 commits from 191 authors
	1631 files changed, 80005 insertions(+), 25069 deletions(-)


It is not planned to start a maintenance branch for 24.07.
This version is ABI-compatible with 23.11 and 24.03.

Below are some new features:
	- pointer compression library
	- SVE support in the hash library
	- FEC support in Intel i40e and ice
	- Intel E830 support in ice driver
	- Napatech ntnic driver initialization
	- AMD Pensando ionic crypto driver
	- UADK compress driver
	- Marvell Odyssey ODM DMA driver
	- log cleanups in drivers
	- more cleanups to prepare MSVC build

More details in the release notes:
	https://doc.dpdk.org/guides/rel_notes/release_24_07.html


There are 49 new contributors (including authors, reviewers and testers).
Welcome to Adrian Pielech, Alessio Igor Bogani, Alex Chapman,
Alexander Skorichenko, Anthony Harivel, Aviraj CJ, Barbara Skobiej,
Chenming Chang, Daniel Gregory, Daniil Ushkov, Dean Marx, Eryk Rybak,
Francis Racicot, Haoqian He, Harjot Singh, Hongbo Li, Igor Gutorov,
Jan Sokolowski, Jedrzej Jagielski, Jiang Yu, Joel Kavanagh, Jun Wang,
Kiran Vedere, Mahmoud Maatuq, Marcin Jurczak, Marek Mical,
Marek Zalfresso-jundzillo, Michael Theodore Stolarchuk, Pabitra Dalai,
Pawel Sobczyk, Piotr Raczynski, Potnuri Bharat Teja,
Prathisna Padmasanan, Przemek Kitszel, Radoslaw Tyl, Remigiusz Konca,
Serhii Iliushyk, Shaiq Wani, Shreesh Adiga, Shuo Li, Soumyadeep Hore,
Sriram Yagnaraman, Tathagat Priyadarshi, Tomasz Wakula, Varun Sethi,
Vipin Padmam Ramesh, Waldemar Dworakowski, Yoan Picchi, Yochai Hagvi, 
and Yuan Zhiyuan.


Below is the number of commits per employer (with authors count):
	247     Intel (77)
	135     NVIDIA (23)
	111     Corigine (5)
	 91     Marvell (18)
	 54     Red Hat (5)
	 46     AMD (5)
	 40     Arm (5)
	 39     networkplumber.org (1)
	 36     Microsoft (3)
	 23     NXP (7)
	 22     Napatech (2)
	 21     Trustnet (1)
	 18     Huawei (5)
	 15     Amazon (1)
	        ...

A big thank you to all the courageous people who took on the unrewarding task
of reviewing others' work.
Based on Reviewed-by and Acked-by tags, the top non-PMD reviewers are:
	 35     Morten Brørup <mb@smartsharesystems.com>
	 27     Juraj Linkeš <juraj.linkes@pantheon.tech>
	 25     Ferruh Yigit <ferruh.yigit@amd.com>
	 24     Jeremy Spewock <jspewock@iol.unh.edu>
	 24     Akhil Goyal <gakhil@marvell.com>
	 22     Paul Szczepanek <paul.szczepanek@arm.com>
	 19     Bruce Richardson <bruce.richardson@intel.com>
	 18     Stephen Hemminger <stephen@networkplumber.org>


The next version will be 24.11 in November.
The new features for 24.11 can be submitted during the next 5 weeks:
	http://core.dpdk.org/roadmap#dates
Please share your roadmap.


Don't forget to register for the DPDK Summit in September:
	https://events.linuxfoundation.org/dpdk-summit/

Thanks everyone, see you in Montreal



^ permalink raw reply	[relevance 3%]

* [PATCH v9 5/5] dts: add API doc generation
  @ 2024-08-01  9:18  3%   ` Juraj Linkeš
  0 siblings, 0 replies; 200+ results
From: Juraj Linkeš @ 2024-08-01  9:18 UTC (permalink / raw)
  To: thomas, Honnappa.Nagarahalli, bruce.richardson, jspewock, probb,
	paul.szczepanek, Luca.Vizzarro, npratte
  Cc: dev, Juraj Linkeš, Luca Vizzarro

The tool used to generate DTS API docs is Sphinx, which is already in
use in DPDK. The same configuration is used to preserve style with one
DTS-specific configuration (so that the DPDK docs are unchanged) that
modifies how the sidebar displays the content.

Sphinx generates the documentation from Python docstrings. The docstring
format is the Google format [0] which requires the sphinx.ext.napoleon
extension. The other extension, sphinx.ext.intersphinx, enables linking
to objects in external documentations, such as the Python documentation.
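
As an illustration of the docstring format, a Google-style docstring looks roughly like the sketch below. The function `scale_mtu` is hypothetical and not part of DTS; it only shows the `Args`, `Returns` and `Raises` sections that sphinx.ext.napoleon understands.

```python
def scale_mtu(mtu: int, factor: float = 1.0) -> int:
    """Return the MTU scaled by a factor.

    Args:
        mtu: The base MTU in bytes.
        factor: The multiplier applied to `mtu`.

    Returns:
        The scaled MTU, rounded down to a whole number of bytes.

    Raises:
        ValueError: If `mtu` is not positive.
    """
    if mtu <= 0:
        raise ValueError("mtu must be positive")
    return int(mtu * factor)
```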

There are two requirements for building DTS docs:
* The same Python version as DTS or higher, because Sphinx imports the
  code.
* Also the same Python packages as DTS, for the same reason.

[0] https://google.github.io/styleguide/pyguide.html#38-comments-and-docstrings

Signed-off-by: Juraj Linkeš <juraj.linkes@pantheon.tech>
Reviewed-by: Luca Vizzarro <luca.vizzarro@arm.com>
Reviewed-by: Jeremy Spewock <jspewock@iol.unh.edu>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Tested-by: Luca Vizzarro <luca.vizzarro@arm.com>
Tested-by: Nicholas Pratte <npratte@iol.unh.edu>
---
 buildtools/call-sphinx-build.py |  3 +++
 doc/api/doxy-api-index.md       |  3 +++
 doc/api/doxy-api.conf.in        |  2 ++
 doc/api/meson.build             |  4 ++++
 doc/guides/conf.py              | 33 +++++++++++++++++++++++++++++++-
 doc/guides/meson.build          |  1 +
 doc/guides/tools/dts.rst        | 34 ++++++++++++++++++++++++++++++++-
 dts/doc/meson.build             | 27 ++++++++++++++++++++++++++
 dts/meson.build                 | 16 ++++++++++++++++
 meson.build                     |  1 +
 10 files changed, 122 insertions(+), 2 deletions(-)
 create mode 100644 dts/doc/meson.build
 create mode 100644 dts/meson.build

diff --git a/buildtools/call-sphinx-build.py b/buildtools/call-sphinx-build.py
index 2034160049..102f496599 100755
--- a/buildtools/call-sphinx-build.py
+++ b/buildtools/call-sphinx-build.py
@@ -16,10 +16,13 @@
 parser.add_argument('version')
 parser.add_argument('src')
 parser.add_argument('dst')
+parser.add_argument('--dts-root', default=None)
 args, extra_args = parser.parse_known_args()
 
 # set the version in environment for sphinx to pick up
 os.environ['DPDK_VERSION'] = args.version
+if args.dts_root:
+    os.environ['DTS_ROOT'] = args.dts_root
 
 sphinx_cmd = [args.sphinx] + extra_args
 
diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
index f9f0300126..ab223bcdf7 100644
--- a/doc/api/doxy-api-index.md
+++ b/doc/api/doxy-api-index.md
@@ -245,3 +245,6 @@ The public API headers are grouped by topics:
   [experimental APIs](@ref rte_compat.h),
   [ABI versioning](@ref rte_function_versioning.h),
   [version](@ref rte_version.h)
+
+- **tests**:
+  [**DTS**](@dts_api_main_page)
diff --git a/doc/api/doxy-api.conf.in b/doc/api/doxy-api.conf.in
index a8823c046f..c94f02d411 100644
--- a/doc/api/doxy-api.conf.in
+++ b/doc/api/doxy-api.conf.in
@@ -124,6 +124,8 @@ SEARCHENGINE            = YES
 SORT_MEMBER_DOCS        = NO
 SOURCE_BROWSER          = YES
 
+ALIASES                 = "dts_api_main_page=@DTS_API_MAIN_PAGE@"
+
 EXAMPLE_PATH            = @TOPDIR@/examples
 EXAMPLE_PATTERNS        = *.c
 EXAMPLE_RECURSIVE       = YES
diff --git a/doc/api/meson.build b/doc/api/meson.build
index b828b1ed66..ffc75d7b5a 100644
--- a/doc/api/meson.build
+++ b/doc/api/meson.build
@@ -41,6 +41,10 @@ cdata.set('WARN_AS_ERROR', 'NO')
 if get_option('werror')
     cdata.set('WARN_AS_ERROR', 'YES')
 endif
+# A local reference must be relative to the main index.html page
+# The path below can't be taken from the DTS meson file as that would
+# require recursive subdir traversal (doc, dts, then doc again)
+cdata.set('DTS_API_MAIN_PAGE', join_paths('..', 'dts', 'html', 'index.html'))
 
 # configure HTML Doxygen run
 html_cdata = configuration_data()
diff --git a/doc/guides/conf.py b/doc/guides/conf.py
index 8b440fb2a9..b442a1f76c 100644
--- a/doc/guides/conf.py
+++ b/doc/guides/conf.py
@@ -9,7 +9,7 @@
 from os import environ
 from os.path import basename, dirname
 from os.path import join as path_join
-from sys import argv, stderr
+from sys import argv, stderr, path
 
 import configparser
 
@@ -23,6 +23,37 @@
           file=stderr)
     pass
 
+# Napoleon enables the Google format of Python docstrings, used in DTS
+# Intersphinx allows linking to external projects, such as Python docs, also used in DTS
+extensions = ['sphinx.ext.napoleon', 'sphinx.ext.intersphinx']
+
+# DTS Python docstring options
+autodoc_default_options = {
+    'members': True,
+    'member-order': 'bysource',
+    'show-inheritance': True,
+}
+autodoc_class_signature = 'separated'
+autodoc_typehints = 'both'
+autodoc_typehints_format = 'short'
+autodoc_typehints_description_target = 'documented'
+napoleon_numpy_docstring = False
+napoleon_attr_annotations = True
+napoleon_preprocess_types = True
+add_module_names = False
+toc_object_entries = True
+toc_object_entries_show_parents = 'hide'
+intersphinx_mapping = {'python': ('https://docs.python.org/3', None)}
+
+dts_root = environ.get('DTS_ROOT')
+if dts_root:
+    path.append(dts_root)
+    # DTS Sidebar config
+    html_theme_options = {
+        'collapse_navigation': False,
+        'navigation_depth': -1,
+    }
+
 stop_on_error = ('-W' in argv)
 
 project = 'Data Plane Development Kit'
diff --git a/doc/guides/meson.build b/doc/guides/meson.build
index f8bbfba9f5..b34b7b8eb0 100644
--- a/doc/guides/meson.build
+++ b/doc/guides/meson.build
@@ -1,6 +1,7 @@
 # SPDX-License-Identifier: BSD-3-Clause
 # Copyright(c) 2018 Intel Corporation
 
+doc_guides_source_dir = meson.current_source_dir()
 sphinx = find_program('sphinx-build', required: get_option('enable_docs'))
 
 if not sphinx.found()
diff --git a/doc/guides/tools/dts.rst b/doc/guides/tools/dts.rst
index 515b15e4d8..77df7a0378 100644
--- a/doc/guides/tools/dts.rst
+++ b/doc/guides/tools/dts.rst
@@ -292,7 +292,12 @@ and try not to divert much from it.
 The :ref:`DTS developer tools <dts_dev_tools>` will issue warnings
 when some of the basics are not met.
 
-The code must be properly documented with docstrings.
+The API documentation, which is a helpful reference when developing, may be accessed
+in the code directly or generated with the :ref:`API docs build steps <building_api_docs>`.
+When adding new files or modifying the directory structure,
+the corresponding changes must be made to DTS api doc sources in ``dts/doc``.
+
+Speaking of which, the code must be properly documented with docstrings.
 The style must conform to the `Google style
 <https://google.github.io/styleguide/pyguide.html#38-comments-and-docstrings>`_.
 See an example of the style `here
@@ -427,6 +432,33 @@ the DTS code check and format script.
 Refer to the script for usage: ``devtools/dts-check-format.sh -h``.
 
 
+.. _building_api_docs:
+
+Building DTS API docs
+---------------------
+
+To build DTS API docs, install the dependencies with Poetry, then enter its shell:
+
+.. code-block:: console
+
+   poetry install --no-root --with docs
+   poetry shell
+
+The documentation is built using the standard DPDK build system.
+After executing the meson command and entering Poetry's shell, build the documentation with:
+
+.. code-block:: console
+
+   ninja -C build dts-doc
+
+The output is generated in ``build/doc/api/dts/html``.
+
+.. Note::
+
+   Make sure to fix any Sphinx warnings when adding or updating docstrings,
+   and also run the ``devtools/dts-check-format.sh`` script and address any issues it finds.
+
+
 Configuration Schema
 --------------------
 
diff --git a/dts/doc/meson.build b/dts/doc/meson.build
new file mode 100644
index 0000000000..01b7b51034
--- /dev/null
+++ b/dts/doc/meson.build
@@ -0,0 +1,27 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2023 PANTHEON.tech s.r.o.
+
+sphinx = find_program('sphinx-build', required: false)
+sphinx_apidoc = find_program('sphinx-apidoc', required: false)
+
+if not sphinx.found() or not sphinx_apidoc.found()
+    subdir_done()
+endif
+
+dts_doc_api_build_dir = join_paths(doc_api_build_dir, 'dts')
+
+extra_sphinx_args = ['-E', '-c', doc_guides_source_dir, '--dts-root', dts_dir]
+if get_option('werror')
+    extra_sphinx_args += '-W'
+endif
+
+htmldir = join_paths(get_option('datadir'), 'doc', 'dpdk', 'dts')
+dts_api_html = custom_target('dts_api_html',
+        output: 'html',
+        command: [sphinx_wrapper, sphinx, meson.project_version(),
+            meson.current_source_dir(), dts_doc_api_build_dir, extra_sphinx_args],
+        build_by_default: false,
+        install: get_option('enable_docs'),
+        install_dir: htmldir)
+doc_targets += dts_api_html
+doc_target_names += 'DTS_API_HTML'
diff --git a/dts/meson.build b/dts/meson.build
new file mode 100644
index 0000000000..e8ce0f06ac
--- /dev/null
+++ b/dts/meson.build
@@ -0,0 +1,16 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2023 PANTHEON.tech s.r.o.
+
+doc_targets = []
+doc_target_names = []
+dts_dir = meson.current_source_dir()
+
+subdir('doc')
+
+if doc_targets.length() == 0
+    message = 'No docs targets found'
+else
+    message = 'Built docs:'
+endif
+run_target('dts-doc', command: [echo, message, doc_target_names],
+    depends: doc_targets)
diff --git a/meson.build b/meson.build
index 8b248d4505..835973a0ce 100644
--- a/meson.build
+++ b/meson.build
@@ -87,6 +87,7 @@ subdir('app')
 
 # build docs
 subdir('doc')
+subdir('dts')
 
 # build any examples explicitly requested - useful for developers - and
 # install any example code into the appropriate install path
-- 
2.34.1


^ permalink raw reply	[relevance 3%]

* [PATCH v10 5/5] dts: add API doc generation
  @ 2024-08-01  9:37  3%   ` Juraj Linkeš
  0 siblings, 0 replies; 200+ results
From: Juraj Linkeš @ 2024-08-01  9:37 UTC (permalink / raw)
  To: thomas, Honnappa.Nagarahalli, bruce.richardson, jspewock, probb,
	paul.szczepanek, Luca.Vizzarro, npratte
  Cc: dev, Juraj Linkeš, Luca Vizzarro

The tool used to generate DTS API docs is Sphinx, which is already in
use in DPDK. The same configuration is used to preserve style with one
DTS-specific configuration (so that the DPDK docs are unchanged) that
modifies how the sidebar displays the content.

Sphinx generates the documentation from Python docstrings. The docstring
format is the Google format [0] which requires the sphinx.ext.napoleon
extension. The other extension, sphinx.ext.intersphinx, enables linking
to objects in external documentation, such as the Python documentation.
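
For illustration only, here is a minimal sketch of a Google-format docstring of the kind sphinx.ext.napoleon parses. The function name and its behaviour are hypothetical and not part of DTS; only the section layout (Args, Returns, Raises) follows the Google style guide referenced in [0].

```python
def scale_packet_count(packets: int, factor: float = 1.0) -> int:
    """Scale a packet count by a factor.

    Hypothetical helper, not part of DTS; it only shows the docstring layout.

    Args:
        packets: The number of packets to scale.
        factor: The multiplier applied to `packets`.

    Returns:
        The scaled packet count, truncated to an integer.

    Raises:
        ValueError: If `packets` is negative.
    """
    if packets < 0:
        raise ValueError("packets must not be negative")
    return int(packets * factor)
```

With napoleon enabled, the Args, Returns and Raises sections are converted to reStructuredText fields before autodoc renders them.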

There are two requirements for building DTS docs:
* The same Python version as DTS or higher, because Sphinx imports the
  code.
* Also the same Python packages as DTS, for the same reason.

[0] https://google.github.io/styleguide/pyguide.html#38-comments-and-docstrings

Signed-off-by: Juraj Linkeš <juraj.linkes@pantheon.tech>
Reviewed-by: Luca Vizzarro <luca.vizzarro@arm.com>
Reviewed-by: Jeremy Spewock <jspewock@iol.unh.edu>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Tested-by: Luca Vizzarro <luca.vizzarro@arm.com>
Tested-by: Nicholas Pratte <npratte@iol.unh.edu>
---
 buildtools/call-sphinx-build.py |  5 ++++-
 doc/api/doxy-api-index.md       |  3 +++
 doc/api/doxy-api.conf.in        |  2 ++
 doc/api/meson.build             |  4 ++++
 doc/guides/conf.py              | 33 +++++++++++++++++++++++++++++++-
 doc/guides/meson.build          |  1 +
 doc/guides/tools/dts.rst        | 34 ++++++++++++++++++++++++++++++++-
 dts/doc/meson.build             | 27 ++++++++++++++++++++++++++
 dts/meson.build                 | 16 ++++++++++++++++
 meson.build                     |  1 +
 10 files changed, 123 insertions(+), 3 deletions(-)
 create mode 100644 dts/doc/meson.build
 create mode 100644 dts/meson.build

diff --git a/buildtools/call-sphinx-build.py b/buildtools/call-sphinx-build.py
index 2034160049..c55d4f6bc9 100755
--- a/buildtools/call-sphinx-build.py
+++ b/buildtools/call-sphinx-build.py
@@ -16,10 +16,13 @@
 parser.add_argument('version')
 parser.add_argument('src')
 parser.add_argument('dst')
+parser.add_argument('--dts-root', default=None)
 args, extra_args = parser.parse_known_args()
 
 # set the version in environment for sphinx to pick up
 os.environ['DPDK_VERSION'] = args.version
+if args.dts_root:
+    os.environ['DTS_ROOT'] = args.dts_root
 
 sphinx_cmd = [args.sphinx] + extra_args
 
@@ -46,7 +49,7 @@
 css = 'custom.css'
 src_css = join(args.src, css)
 dst_css = join(args.dst, 'html', '_static', 'css', css)
-if not os.path.exists(dst_css) or not filecmp.cmp(src_css, dst_css):
+if os.path.exists(src_css) and (not os.path.exists(dst_css) or not filecmp.cmp(src_css, dst_css)):
     os.makedirs(os.path.dirname(dst_css), exist_ok=True)
     shutil.copyfile(src_css, dst_css)
 
diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
index f9f0300126..ab223bcdf7 100644
--- a/doc/api/doxy-api-index.md
+++ b/doc/api/doxy-api-index.md
@@ -245,3 +245,6 @@ The public API headers are grouped by topics:
   [experimental APIs](@ref rte_compat.h),
   [ABI versioning](@ref rte_function_versioning.h),
   [version](@ref rte_version.h)
+
+- **tests**:
+  [**DTS**](@dts_api_main_page)
diff --git a/doc/api/doxy-api.conf.in b/doc/api/doxy-api.conf.in
index a8823c046f..c94f02d411 100644
--- a/doc/api/doxy-api.conf.in
+++ b/doc/api/doxy-api.conf.in
@@ -124,6 +124,8 @@ SEARCHENGINE            = YES
 SORT_MEMBER_DOCS        = NO
 SOURCE_BROWSER          = YES
 
+ALIASES                 = "dts_api_main_page=@DTS_API_MAIN_PAGE@"
+
 EXAMPLE_PATH            = @TOPDIR@/examples
 EXAMPLE_PATTERNS        = *.c
 EXAMPLE_RECURSIVE       = YES
diff --git a/doc/api/meson.build b/doc/api/meson.build
index b828b1ed66..ffc75d7b5a 100644
--- a/doc/api/meson.build
+++ b/doc/api/meson.build
@@ -41,6 +41,10 @@ cdata.set('WARN_AS_ERROR', 'NO')
 if get_option('werror')
     cdata.set('WARN_AS_ERROR', 'YES')
 endif
+# A local reference must be relative to the main index.html page
+# The path below can't be taken from the DTS meson file as that would
+# require recursive subdir traversal (doc, dts, then doc again)
+cdata.set('DTS_API_MAIN_PAGE', join_paths('..', 'dts', 'html', 'index.html'))
 
 # configure HTML Doxygen run
 html_cdata = configuration_data()
diff --git a/doc/guides/conf.py b/doc/guides/conf.py
index 8b440fb2a9..b442a1f76c 100644
--- a/doc/guides/conf.py
+++ b/doc/guides/conf.py
@@ -9,7 +9,7 @@
 from os import environ
 from os.path import basename, dirname
 from os.path import join as path_join
-from sys import argv, stderr
+from sys import argv, stderr, path
 
 import configparser
 
@@ -23,6 +23,37 @@
           file=stderr)
     pass
 
+# Napoleon enables the Google format of Python docstrings, used in DTS
+# Intersphinx allows linking to external projects, such as Python docs, also used in DTS
+extensions = ['sphinx.ext.napoleon', 'sphinx.ext.intersphinx']
+
+# DTS Python docstring options
+autodoc_default_options = {
+    'members': True,
+    'member-order': 'bysource',
+    'show-inheritance': True,
+}
+autodoc_class_signature = 'separated'
+autodoc_typehints = 'both'
+autodoc_typehints_format = 'short'
+autodoc_typehints_description_target = 'documented'
+napoleon_numpy_docstring = False
+napoleon_attr_annotations = True
+napoleon_preprocess_types = True
+add_module_names = False
+toc_object_entries = True
+toc_object_entries_show_parents = 'hide'
+intersphinx_mapping = {'python': ('https://docs.python.org/3', None)}
+
+dts_root = environ.get('DTS_ROOT')
+if dts_root:
+    path.append(dts_root)
+    # DTS Sidebar config
+    html_theme_options = {
+        'collapse_navigation': False,
+        'navigation_depth': -1,
+    }
+
 stop_on_error = ('-W' in argv)
 
 project = 'Data Plane Development Kit'
diff --git a/doc/guides/meson.build b/doc/guides/meson.build
index f8bbfba9f5..b34b7b8eb0 100644
--- a/doc/guides/meson.build
+++ b/doc/guides/meson.build
@@ -1,6 +1,7 @@
 # SPDX-License-Identifier: BSD-3-Clause
 # Copyright(c) 2018 Intel Corporation
 
+doc_guides_source_dir = meson.current_source_dir()
 sphinx = find_program('sphinx-build', required: get_option('enable_docs'))
 
 if not sphinx.found()
diff --git a/doc/guides/tools/dts.rst b/doc/guides/tools/dts.rst
index 515b15e4d8..77df7a0378 100644
--- a/doc/guides/tools/dts.rst
+++ b/doc/guides/tools/dts.rst
@@ -292,7 +292,12 @@ and try not to divert much from it.
 The :ref:`DTS developer tools <dts_dev_tools>` will issue warnings
 when some of the basics are not met.
 
-The code must be properly documented with docstrings.
+The API documentation, which is a helpful reference when developing, may be accessed
+in the code directly or generated with the :ref:`API docs build steps <building_api_docs>`.
+When adding new files or modifying the directory structure,
+the corresponding changes must be made to DTS API doc sources in ``dts/doc``.
+
+Speaking of which, the code must be properly documented with docstrings.
 The style must conform to the `Google style
 <https://google.github.io/styleguide/pyguide.html#38-comments-and-docstrings>`_.
 See an example of the style `here
@@ -427,6 +432,33 @@ the DTS code check and format script.
 Refer to the script for usage: ``devtools/dts-check-format.sh -h``.
 
 
+.. _building_api_docs:
+
+Building DTS API docs
+---------------------
+
+To build DTS API docs, install the dependencies with Poetry, then enter its shell:
+
+.. code-block:: console
+
+   poetry install --no-root --with docs
+   poetry shell
+
+The documentation is built using the standard DPDK build system.
+After executing the meson command and entering Poetry's shell, build the documentation with:
+
+.. code-block:: console
+
+   ninja -C build dts-doc
+
+The output is generated in ``build/doc/api/dts/html``.
+
+.. Note::
+
+   Make sure to fix any Sphinx warnings when adding or updating docstrings,
+   and also run the ``devtools/dts-check-format.sh`` script and address any issues it finds.
+
+
 Configuration Schema
 --------------------
 
diff --git a/dts/doc/meson.build b/dts/doc/meson.build
new file mode 100644
index 0000000000..01b7b51034
--- /dev/null
+++ b/dts/doc/meson.build
@@ -0,0 +1,27 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2023 PANTHEON.tech s.r.o.
+
+sphinx = find_program('sphinx-build', required: false)
+sphinx_apidoc = find_program('sphinx-apidoc', required: false)
+
+if not sphinx.found() or not sphinx_apidoc.found()
+    subdir_done()
+endif
+
+dts_doc_api_build_dir = join_paths(doc_api_build_dir, 'dts')
+
+extra_sphinx_args = ['-E', '-c', doc_guides_source_dir, '--dts-root', dts_dir]
+if get_option('werror')
+    extra_sphinx_args += '-W'
+endif
+
+htmldir = join_paths(get_option('datadir'), 'doc', 'dpdk', 'dts')
+dts_api_html = custom_target('dts_api_html',
+        output: 'html',
+        command: [sphinx_wrapper, sphinx, meson.project_version(),
+            meson.current_source_dir(), dts_doc_api_build_dir, extra_sphinx_args],
+        build_by_default: false,
+        install: get_option('enable_docs'),
+        install_dir: htmldir)
+doc_targets += dts_api_html
+doc_target_names += 'DTS_API_HTML'
diff --git a/dts/meson.build b/dts/meson.build
new file mode 100644
index 0000000000..e8ce0f06ac
--- /dev/null
+++ b/dts/meson.build
@@ -0,0 +1,16 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2023 PANTHEON.tech s.r.o.
+
+doc_targets = []
+doc_target_names = []
+dts_dir = meson.current_source_dir()
+
+subdir('doc')
+
+if doc_targets.length() == 0
+    message = 'No docs targets found'
+else
+    message = 'Built docs:'
+endif
+run_target('dts-doc', command: [echo, message, doc_target_names],
+    depends: doc_targets)
diff --git a/meson.build b/meson.build
index 8b248d4505..835973a0ce 100644
--- a/meson.build
+++ b/meson.build
@@ -87,6 +87,7 @@ subdir('app')
 
 # build docs
 subdir('doc')
+subdir('dts')
 
 # build any examples explicitly requested - useful for developers - and
 # install any example code into the appropriate install path
-- 
2.34.1


^ permalink raw reply	[relevance 3%]

* Re: [PATCH v8 5/5] dts: add API doc generation
  2024-07-30 13:51  0%     ` Thomas Monjalon
@ 2024-08-01 13:03  0%       ` Juraj Linkeš
  0 siblings, 0 replies; 200+ results
From: Juraj Linkeš @ 2024-08-01 13:03 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: Honnappa.Nagarahalli, bruce.richardson, jspewock, probb,
	paul.szczepanek, Luca.Vizzarro, npratte, dev



On 30. 7. 2024 15:51, Thomas Monjalon wrote:
> 12/07/2024 10:57, Juraj Linkeš:
>> The tool used to generate DTS API docs is Sphinx, which is already in
>> use in DPDK. The same configuration is used to preserve style with one
>> DTS-specific configuration (so that the DPDK docs are unchanged) that
>> modifies how the sidebar displays the content.
> 
> What is changed in the sidebar?
> 

These are the two changes:
html_theme_options = {
     'collapse_navigation': False,
     'navigation_depth': -1,
}

The first allows you to explore the structure without needing to enter 
any specific section - it puts the + at each section so everything is 
expandable.
The second just means that each section can be fully expanded (there's 
no limit).

> 
>> --- a/doc/api/doxy-api-index.md
>> +++ b/doc/api/doxy-api-index.md
>> @@ -244,3 +244,6 @@ The public API headers are grouped by topics:
>>     [experimental APIs](@ref rte_compat.h),
>>     [ABI versioning](@ref rte_function_versioning.h),
>>     [version](@ref rte_version.h)
>> +
>> +- **tests**:
>> +  [**DTS**](@dts_api_main_page)
> 
> OK looks good
> 
> 
>> --- a/doc/api/doxy-api.conf.in
>> +++ b/doc/api/doxy-api.conf.in
>> @@ -124,6 +124,8 @@ SEARCHENGINE            = YES
>>   SORT_MEMBER_DOCS        = NO
>>   SOURCE_BROWSER          = YES
>>   
>> +ALIASES                 = "dts_api_main_page=@DTS_API_MAIN_PAGE@"
> 
> Why is it needed?
> That's the only way to reference it in doxy-api-index.md?
> Would be nice to explain in the commit log.
> 

I can add something to the commit log. The questions are answered below, 
in your other related comment.

>> --- a/doc/api/meson.build
>> +++ b/doc/api/meson.build
>> +# A local reference must be relative to the main index.html page
>> +# The path below can't be taken from the DTS meson file as that would
>> +# require recursive subdir traversal (doc, dts, then doc again)
> 
> This comment is really obscure.
> 

I guess it is. I just wanted to explain that there's no way to do this 
without spelling out the path this way. At least I didn't find a way.
Should I remove the comment or reword it?

>> +cdata.set('DTS_API_MAIN_PAGE', join_paths('..', 'dts', 'html', 'index.html'))
> 
> Oh I think I get it:
> 	- DTS_API_MAIN_PAGE is the Meson variable
> 	- dts_api_main_page is the Doxygen variable
> 

Yes, this is a way to make it work. Maybe there's something else (I'm 
not that familiar with Doxygen), but from what I can tell, there wasn't 
a command line option that would set a variable (passing the path from 
Meson to Doxygen) and nothing else I found worked.

Is this solution ok? If we want to explore something else, is there 
someone with more experience with Doxygen who could help?
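
To make the two steps concrete, here is a rough Python simulation of what happens to the link. It is only a sketch: configure() and expand_alias() are hypothetical stand-ins for Meson's @VAR@ substitution and Doxygen's alias expansion, not the real implementations.

```python
import posixpath

# Value set in doc/api/meson.build via cdata.set('DTS_API_MAIN_PAGE', ...).
dts_api_main_page = posixpath.join('..', 'dts', 'html', 'index.html')


def configure(template: str, values: dict) -> str:
    """Stand-in for Meson replacing @VAR@ placeholders in a .in file."""
    for name, value in values.items():
        template = template.replace('@' + name + '@', value)
    return template


def expand_alias(text: str, alias_line: str) -> str:
    """Stand-in for Doxygen expanding a parameterless alias written as @name."""
    # Take the part after 'ALIASES =' and drop the surrounding quotes.
    definition = alias_line.split('=', 1)[1].strip().strip('"')
    name, value = definition.split('=', 1)
    return text.replace('@' + name, value)


conf_in = 'ALIASES                 = "dts_api_main_page=@DTS_API_MAIN_PAGE@"'
conf = configure(conf_in, {'DTS_API_MAIN_PAGE': dts_api_main_page})
link = expand_alias('[**DTS**](@dts_api_main_page)', conf)
print(link)
```

So the uppercase name only exists on the Meson side, and the lowercase alias is what the Markdown index refers to.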

> 
>> +# Napoleon enables the Google format of Python docstrings, used in DTS
>> +# Intersphinx allows linking to external projects, such as Python docs, also used in DTS
> 
> Close sentences with a dot, it is easier to read.
> 

Ack.

>> +extensions = ['sphinx.ext.napoleon', 'sphinx.ext.intersphinx']
>> +
>> +# DTS Python docstring options
>> +autodoc_default_options = {
>> +    'members': True,
>> +    'member-order': 'bysource',
>> +    'show-inheritance': True,
>> +}
>> +autodoc_class_signature = 'separated'
>> +autodoc_typehints = 'both'
>> +autodoc_typehints_format = 'short'
>> +autodoc_typehints_description_target = 'documented'
>> +napoleon_numpy_docstring = False
>> +napoleon_attr_annotations = True
>> +napoleon_preprocess_types = True
>> +add_module_names = False
>> +toc_object_entries = True
>> +toc_object_entries_show_parents = 'hide'
>> +intersphinx_mapping = {'python': ('https://docs.python.org/3', None)}
>> +
>> +dts_root = environ.get('DTS_ROOT')
> 
> Why does it need to be passed as an environment variable?
> Isn't it a fixed absolute path?
> 

The path to DTS needs to be passed in some way (and added to sys.path) 
so that Sphinx knows where the sources are in order to import them.
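
As a sketch of that mechanism (the helper name and the duplicate check are assumptions for illustration; the patch itself just reads DTS_ROOT and appends it):

```python
import os
import sys


def add_dts_root_to_path(environ=os.environ, search_path=sys.path):
    """Append DTS_ROOT to the import search path if the variable is set.

    Returns the value of DTS_ROOT, or None when it is not set.
    """
    dts_root = environ.get('DTS_ROOT')
    # Skip the append when the variable is unset or the path is already there.
    if dts_root and dts_root not in search_path:
        search_path.append(dts_root)
    return dts_root
```

Once the directory is on sys.path, autodoc can import the DTS modules by name; without DTS_ROOT the regular DPDK guides build is left unchanged.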

Do you want us to not pass the path, but just hardcode it here? I didn't 
really think about that, maybe that could work.

>> +if dts_root:
>> +    path.append(dts_root)
>> +    # DTS Sidebar config
>> +    html_theme_options = {
>> +        'collapse_navigation': False,
>> +        'navigation_depth': -1,
>> +    }
> 
> [...]
> 
>> +To build DTS API docs, install the dependencies with Poetry, then enter its shell:
> 
> I don't plan to use Poetry on my machine.
> Can we simply describe the dependencies even if the versions are not specified?
> 

The reason we don't list the dependencies anywhere is that doing it with 
Poetry is much easier (and a bit safer, as Poetry is going to install 
tested versions).

But I can add references to the two relevant sections of 
dts/pyproject.toml which contain the dependencies with a note that they 
can be installed with pip (and I guess that would be another 
dependency), but at that point it's not that much different from using 
Poetry.

>> +
>> +.. code-block:: console
>> +
>> +   poetry install --no-root --with docs
>> +   poetry shell
>> +
>> +The documentation is built using the standard DPDK build system.
>> +After executing the meson command and entering Poetry's shell, build the documentation with:
>> +
>> +.. code-block:: console
>> +
>> +   ninja -C build dts-doc
> 
> Don't we rely on the Meson option "enable_docs"?

I had a discussion about this with Bruce, but I can't find it anywhere, 
so here's what I remember:
1. We didn't want to tie the dts api doc build to dpdk doc build because 
of the dependencies.
2. There's a way to build docs without the enable_docs option (running 
ninja with the target), which is what we added for dts. This doesn't tie 
the dts api doc build to the dpdk doc build.
3. We had an "enable_dts_docs" Meson option in the past (to keep it 
separate from dpdk doc build), but decided to drop it. My memory is hazy 
on this, but I think it was, again, because of the additional steps 
needed to bring up the dependency (poetry shell) - at that point, 
supporting just the ninja build way is sufficient. Bruce may shed more 
light on this.

>> +
>> +The output is generated in ``build/doc/api/dts/html``.
>> +
>> +.. Note::
> 
> In general the RST expressions are lowercase.
> 

Ack.

>> +
>> +   Make sure to fix any Sphinx warnings when adding or updating docstrings,
>> +   and also run the ``devtools/dts-check-format.sh`` script and address any issues it finds.
> 
> It looks like something to write in the contributing guide.
> 

I could add it there, where is the right place? In patches.rst, section 
"Checking the Patches"?

> 
>> +++ b/dts/doc/meson.build
>> @@ -0,0 +1,27 @@
>> +# SPDX-License-Identifier: BSD-3-Clause
>> +# Copyright(c) 2023 PANTHEON.tech s.r.o.
>> +
>> +sphinx = find_program('sphinx-build', required: false)
>> +sphinx_apidoc = find_program('sphinx-apidoc', required: false)
>> +
>> +if not sphinx.found() or not sphinx_apidoc.found()
> 
> You should include the option "enable_docs" here.
> 
>> +    subdir_done()
>> +endif
> 
> 
> 

^ permalink raw reply	[relevance 0%]

* [PATCH v11 1/7] mbuf: replace term sanity check
  @ 2024-08-01 15:46  2%   ` Stephen Hemminger
  0 siblings, 0 replies; 200+ results
From: Stephen Hemminger @ 2024-08-01 15:46 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger, Andrew Rybchenko, Morten Brørup

Replace rte_mbuf_sanity_check() with rte_mbuf_verify()
to match the similar macro RTE_VERIFY() in rte_debug.h.

The term sanity check is on the Tier 2 list of words
that should be replaced.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
---
 app/test/test_mbuf.c                 | 28 +++++------
 doc/guides/prog_guide/mbuf_lib.rst   |  4 +-
 doc/guides/rel_notes/deprecation.rst |  3 ++
 drivers/net/avp/avp_ethdev.c         | 18 +++----
 drivers/net/sfc/sfc_ef100_rx.c       |  6 +--
 drivers/net/sfc/sfc_ef10_essb_rx.c   |  4 +-
 drivers/net/sfc/sfc_ef10_rx.c        |  4 +-
 drivers/net/sfc/sfc_rx.c             |  2 +-
 examples/ipv4_multicast/main.c       |  2 +-
 lib/mbuf/rte_mbuf.c                  | 23 +++++----
 lib/mbuf/rte_mbuf.h                  | 71 +++++++++++++++-------------
 lib/mbuf/version.map                 |  1 +
 12 files changed, 90 insertions(+), 76 deletions(-)

diff --git a/app/test/test_mbuf.c b/app/test/test_mbuf.c
index 17be977f31..3fbb5dea8b 100644
--- a/app/test/test_mbuf.c
+++ b/app/test/test_mbuf.c
@@ -262,8 +262,8 @@ test_one_pktmbuf(struct rte_mempool *pktmbuf_pool)
 		GOTO_FAIL("Buffer should be continuous");
 	memset(hdr, 0x55, MBUF_TEST_HDR2_LEN);
 
-	rte_mbuf_sanity_check(m, 1);
-	rte_mbuf_sanity_check(m, 0);
+	rte_mbuf_verify(m, 1);
+	rte_mbuf_verify(m, 0);
 	rte_pktmbuf_dump(stdout, m, 0);
 
 	/* this prepend should fail */
@@ -1162,7 +1162,7 @@ test_refcnt_mbuf(void)
 
 #ifdef RTE_EXEC_ENV_WINDOWS
 static int
-test_failing_mbuf_sanity_check(struct rte_mempool *pktmbuf_pool)
+test_failing_mbuf_verify(struct rte_mempool *pktmbuf_pool)
 {
 	RTE_SET_USED(pktmbuf_pool);
 	return TEST_SKIPPED;
@@ -1181,12 +1181,12 @@ mbuf_check_pass(struct rte_mbuf *buf)
 }
 
 static int
-test_failing_mbuf_sanity_check(struct rte_mempool *pktmbuf_pool)
+test_failing_mbuf_verify(struct rte_mempool *pktmbuf_pool)
 {
 	struct rte_mbuf *buf;
 	struct rte_mbuf badbuf;
 
-	printf("Checking rte_mbuf_sanity_check for failure conditions\n");
+	printf("Checking rte_mbuf_verify for failure conditions\n");
 
 	/* get a good mbuf to use to make copies */
 	buf = rte_pktmbuf_alloc(pktmbuf_pool);
@@ -1708,7 +1708,7 @@ test_mbuf_validate_tx_offload(const char *test_name,
 		GOTO_FAIL("%s: mbuf allocation failed!\n", __func__);
 	if (rte_pktmbuf_pkt_len(m) != 0)
 		GOTO_FAIL("%s: Bad packet length\n", __func__);
-	rte_mbuf_sanity_check(m, 0);
+	rte_mbuf_verify(m, 0);
 	m->ol_flags = ol_flags;
 	m->tso_segsz = segsize;
 	ret = rte_validate_tx_offload(m);
@@ -1915,7 +1915,7 @@ test_pktmbuf_read(struct rte_mempool *pktmbuf_pool)
 		GOTO_FAIL("%s: mbuf allocation failed!\n", __func__);
 	if (rte_pktmbuf_pkt_len(m) != 0)
 		GOTO_FAIL("%s: Bad packet length\n", __func__);
-	rte_mbuf_sanity_check(m, 0);
+	rte_mbuf_verify(m, 0);
 
 	data = rte_pktmbuf_append(m, MBUF_TEST_DATA_LEN2);
 	if (data == NULL)
@@ -1964,7 +1964,7 @@ test_pktmbuf_read_from_offset(struct rte_mempool *pktmbuf_pool)
 
 	if (rte_pktmbuf_pkt_len(m) != 0)
 		GOTO_FAIL("%s: Bad packet length\n", __func__);
-	rte_mbuf_sanity_check(m, 0);
+	rte_mbuf_verify(m, 0);
 
 	/* prepend an ethernet header */
 	hdr = (struct ether_hdr *)rte_pktmbuf_prepend(m, hdr_len);
@@ -2109,7 +2109,7 @@ create_packet(struct rte_mempool *pktmbuf_pool,
 			GOTO_FAIL("%s: mbuf allocation failed!\n", __func__);
 		if (rte_pktmbuf_pkt_len(pkt_seg) != 0)
 			GOTO_FAIL("%s: Bad packet length\n", __func__);
-		rte_mbuf_sanity_check(pkt_seg, 0);
+		rte_mbuf_verify(pkt_seg, 0);
 		/* Add header only for the first segment */
 		if (test_data->flags == MBUF_HEADER && seg == 0) {
 			hdr_len = sizeof(struct rte_ether_hdr);
@@ -2321,7 +2321,7 @@ test_pktmbuf_ext_shinfo_init_helper(struct rte_mempool *pktmbuf_pool)
 		GOTO_FAIL("%s: mbuf allocation failed!\n", __func__);
 	if (rte_pktmbuf_pkt_len(m) != 0)
 		GOTO_FAIL("%s: Bad packet length\n", __func__);
-	rte_mbuf_sanity_check(m, 0);
+	rte_mbuf_verify(m, 0);
 
 	ext_buf_addr = rte_malloc("External buffer", buf_len,
 			RTE_CACHE_LINE_SIZE);
@@ -2482,8 +2482,8 @@ test_pktmbuf_ext_pinned_buffer(struct rte_mempool *std_pool)
 		GOTO_FAIL("%s: test_pktmbuf_copy(pinned) failed\n",
 			  __func__);
 
-	if (test_failing_mbuf_sanity_check(pinned_pool) < 0)
-		GOTO_FAIL("%s: test_failing_mbuf_sanity_check(pinned)"
+	if (test_failing_mbuf_verify(pinned_pool) < 0)
+		GOTO_FAIL("%s: test_failing_mbuf_verify(pinned)"
 			  " failed\n", __func__);
 
 	if (test_mbuf_linearize_check(pinned_pool) < 0)
@@ -2857,8 +2857,8 @@ test_mbuf(void)
 		goto err;
 	}
 
-	if (test_failing_mbuf_sanity_check(pktmbuf_pool) < 0) {
-		printf("test_failing_mbuf_sanity_check() failed\n");
+	if (test_failing_mbuf_verify(pktmbuf_pool) < 0) {
+		printf("test_failing_mbuf_verify() failed\n");
 		goto err;
 	}
 
diff --git a/doc/guides/prog_guide/mbuf_lib.rst b/doc/guides/prog_guide/mbuf_lib.rst
index 749f9c97a8..0a197437a0 100644
--- a/doc/guides/prog_guide/mbuf_lib.rst
+++ b/doc/guides/prog_guide/mbuf_lib.rst
@@ -266,8 +266,8 @@ can be found in several of the sample applications, for example, the IPv4 Multic
 Debug
 -----
 
-In debug mode, the functions of the mbuf library perform sanity checks before any operation (such as, buffer corruption,
-bad type, and so on).
+In debug mode, the functions of the mbuf library perform consistency checks
+(for buffer corruption, bad type, and so on) before any operation.
 
 Use Cases
 ---------
diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 211f59fdc9..17f08500aa 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -230,3 +230,6 @@ Deprecation Notices
   The structures ``rte_node``, ``rte_node_register``
   and ``rte_graph_cluster_node_stats`` will be extended
   to include node error counters and error description.
+
+* mbuf: The function ``rte_mbuf_sanity_check`` is deprecated.
+  Use the new function ``rte_mbuf_verify`` instead.
diff --git a/drivers/net/avp/avp_ethdev.c b/drivers/net/avp/avp_ethdev.c
index 6733462c86..bafc08fd60 100644
--- a/drivers/net/avp/avp_ethdev.c
+++ b/drivers/net/avp/avp_ethdev.c
@@ -1231,7 +1231,7 @@ _avp_mac_filter(struct avp_dev *avp, struct rte_mbuf *m)
 
 #ifdef RTE_LIBRTE_AVP_DEBUG_BUFFERS
 static inline void
-__avp_dev_buffer_sanity_check(struct avp_dev *avp, struct rte_avp_desc *buf)
+__avp_dev_buffer_check(struct avp_dev *avp, struct rte_avp_desc *buf)
 {
 	struct rte_avp_desc *first_buf;
 	struct rte_avp_desc *pkt_buf;
@@ -1272,12 +1272,12 @@ __avp_dev_buffer_sanity_check(struct avp_dev *avp, struct rte_avp_desc *buf)
 			  first_buf->pkt_len, pkt_len);
 }
 
-#define avp_dev_buffer_sanity_check(a, b) \
-	__avp_dev_buffer_sanity_check((a), (b))
+#define avp_dev_buffer_check(a, b) \
+	__avp_dev_buffer_check((a), (b))
 
 #else /* RTE_LIBRTE_AVP_DEBUG_BUFFERS */
 
-#define avp_dev_buffer_sanity_check(a, b) do {} while (0)
+#define avp_dev_buffer_check(a, b) do {} while (0)
 
 #endif
 
@@ -1302,7 +1302,7 @@ avp_dev_copy_from_buffers(struct avp_dev *avp,
 	void *pkt_data;
 	unsigned int i;
 
-	avp_dev_buffer_sanity_check(avp, buf);
+	avp_dev_buffer_check(avp, buf);
 
 	/* setup the first source buffer */
 	pkt_buf = avp_dev_translate_buffer(avp, buf);
@@ -1370,7 +1370,7 @@ avp_dev_copy_from_buffers(struct avp_dev *avp,
 	rte_pktmbuf_pkt_len(m) = total_length;
 	m->vlan_tci = vlan_tci;
 
-	__rte_mbuf_sanity_check(m, 1);
+	__rte_mbuf_verify(m, 1);
 
 	return m;
 }
@@ -1614,7 +1614,7 @@ avp_dev_copy_to_buffers(struct avp_dev *avp,
 	char *pkt_data;
 	unsigned int i;
 
-	__rte_mbuf_sanity_check(mbuf, 1);
+	__rte_mbuf_verify(mbuf, 1);
 
 	m = mbuf;
 	src_offset = 0;
@@ -1680,7 +1680,7 @@ avp_dev_copy_to_buffers(struct avp_dev *avp,
 		first_buf->vlan_tci = mbuf->vlan_tci;
 	}
 
-	avp_dev_buffer_sanity_check(avp, buffers[0]);
+	avp_dev_buffer_check(avp, buffers[0]);
 
 	return total_length;
 }
@@ -1798,7 +1798,7 @@ avp_xmit_scattered_pkts(void *tx_queue,
 
 #ifdef RTE_LIBRTE_AVP_DEBUG_BUFFERS
 	for (i = 0; i < nb_pkts; i++)
-		avp_dev_buffer_sanity_check(avp, tx_bufs[i]);
+		avp_dev_buffer_check(avp, tx_bufs[i]);
 #endif
 
 	/* send the packets */
diff --git a/drivers/net/sfc/sfc_ef100_rx.c b/drivers/net/sfc/sfc_ef100_rx.c
index e283879e6b..5ebfba4dcf 100644
--- a/drivers/net/sfc/sfc_ef100_rx.c
+++ b/drivers/net/sfc/sfc_ef100_rx.c
@@ -179,7 +179,7 @@ sfc_ef100_rx_qrefill(struct sfc_ef100_rxq *rxq)
 			struct sfc_ef100_rx_sw_desc *rxd;
 			rte_iova_t dma_addr;
 
-			__rte_mbuf_raw_sanity_check(m);
+			__rte_mbuf_raw_verify(m);
 
 			dma_addr = rte_mbuf_data_iova_default(m);
 			if (rxq->flags & SFC_EF100_RXQ_NIC_DMA_MAP) {
@@ -551,7 +551,7 @@ sfc_ef100_rx_process_ready_pkts(struct sfc_ef100_rxq *rxq,
 		rxq->ready_pkts--;
 
 		pkt = sfc_ef100_rx_next_mbuf(rxq);
-		__rte_mbuf_raw_sanity_check(pkt);
+		__rte_mbuf_raw_verify(pkt);
 
 		RTE_BUILD_BUG_ON(sizeof(pkt->rearm_data[0]) !=
 				 sizeof(rxq->rearm_data));
@@ -575,7 +575,7 @@ sfc_ef100_rx_process_ready_pkts(struct sfc_ef100_rxq *rxq,
 			struct rte_mbuf *seg;
 
 			seg = sfc_ef100_rx_next_mbuf(rxq);
-			__rte_mbuf_raw_sanity_check(seg);
+			__rte_mbuf_raw_verify(seg);
 
 			seg->data_off = RTE_PKTMBUF_HEADROOM;
 
diff --git a/drivers/net/sfc/sfc_ef10_essb_rx.c b/drivers/net/sfc/sfc_ef10_essb_rx.c
index 78bd430363..74647e2792 100644
--- a/drivers/net/sfc/sfc_ef10_essb_rx.c
+++ b/drivers/net/sfc/sfc_ef10_essb_rx.c
@@ -125,7 +125,7 @@ sfc_ef10_essb_next_mbuf(const struct sfc_ef10_essb_rxq *rxq,
 	struct rte_mbuf *m;
 
 	m = (struct rte_mbuf *)((uintptr_t)mbuf + rxq->buf_stride);
-	__rte_mbuf_raw_sanity_check(m);
+	__rte_mbuf_raw_verify(m);
 	return m;
 }
 
@@ -136,7 +136,7 @@ sfc_ef10_essb_mbuf_by_index(const struct sfc_ef10_essb_rxq *rxq,
 	struct rte_mbuf *m;
 
 	m = (struct rte_mbuf *)((uintptr_t)mbuf + idx * rxq->buf_stride);
-	__rte_mbuf_raw_sanity_check(m);
+	__rte_mbuf_raw_verify(m);
 	return m;
 }
 
diff --git a/drivers/net/sfc/sfc_ef10_rx.c b/drivers/net/sfc/sfc_ef10_rx.c
index 60442930b3..f4fc815570 100644
--- a/drivers/net/sfc/sfc_ef10_rx.c
+++ b/drivers/net/sfc/sfc_ef10_rx.c
@@ -148,7 +148,7 @@ sfc_ef10_rx_qrefill(struct sfc_ef10_rxq *rxq)
 			struct sfc_ef10_rx_sw_desc *rxd;
 			rte_iova_t phys_addr;
 
-			__rte_mbuf_raw_sanity_check(m);
+			__rte_mbuf_raw_verify(m);
 
 			SFC_ASSERT((id & ~ptr_mask) == 0);
 			rxd = &rxq->sw_ring[id];
@@ -297,7 +297,7 @@ sfc_ef10_rx_process_event(struct sfc_ef10_rxq *rxq, efx_qword_t rx_ev,
 		rxd = &rxq->sw_ring[pending++ & ptr_mask];
 		m = rxd->mbuf;
 
-		__rte_mbuf_raw_sanity_check(m);
+		__rte_mbuf_raw_verify(m);
 
 		m->data_off = RTE_PKTMBUF_HEADROOM;
 		rte_pktmbuf_data_len(m) = seg_len;
diff --git a/drivers/net/sfc/sfc_rx.c b/drivers/net/sfc/sfc_rx.c
index a193229265..c885ce2b05 100644
--- a/drivers/net/sfc/sfc_rx.c
+++ b/drivers/net/sfc/sfc_rx.c
@@ -120,7 +120,7 @@ sfc_efx_rx_qrefill(struct sfc_efx_rxq *rxq)
 		     ++i, id = (id + 1) & rxq->ptr_mask) {
 			m = objs[i];
 
-			__rte_mbuf_raw_sanity_check(m);
+			__rte_mbuf_raw_verify(m);
 
 			rxd = &rxq->sw_desc[id];
 			rxd->mbuf = m;
diff --git a/examples/ipv4_multicast/main.c b/examples/ipv4_multicast/main.c
index 1eed645d02..3bfab37012 100644
--- a/examples/ipv4_multicast/main.c
+++ b/examples/ipv4_multicast/main.c
@@ -258,7 +258,7 @@ mcast_out_pkt(struct rte_mbuf *pkt, int use_clone)
 	hdr->pkt_len = (uint16_t)(hdr->data_len + pkt->pkt_len);
 	hdr->nb_segs = pkt->nb_segs + 1;
 
-	__rte_mbuf_sanity_check(hdr, 1);
+	__rte_mbuf_verify(hdr, 1);
 	return hdr;
 }
 /* >8 End of mcast_out_kt. */
diff --git a/lib/mbuf/rte_mbuf.c b/lib/mbuf/rte_mbuf.c
index 559d5ad8a7..fc5d4ba29d 100644
--- a/lib/mbuf/rte_mbuf.c
+++ b/lib/mbuf/rte_mbuf.c
@@ -367,9 +367,9 @@ rte_pktmbuf_pool_create_extbuf(const char *name, unsigned int n,
 	return mp;
 }
 
-/* do some sanity checks on a mbuf: panic if it fails */
+/* do some checks on a mbuf: panic if it fails */
 void
-rte_mbuf_sanity_check(const struct rte_mbuf *m, int is_header)
+rte_mbuf_verify(const struct rte_mbuf *m, int is_header)
 {
 	const char *reason;
 
@@ -377,6 +377,13 @@ rte_mbuf_sanity_check(const struct rte_mbuf *m, int is_header)
 		rte_panic("%s\n", reason);
 }
 
+/* For ABI compatibility, to be removed in next release */
+void
+rte_mbuf_sanity_check(const struct rte_mbuf *m, int is_header)
+{
+	rte_mbuf_verify(m, is_header);
+}
+
 int rte_mbuf_check(const struct rte_mbuf *m, int is_header,
 		   const char **reason)
 {
@@ -496,7 +503,7 @@ void rte_pktmbuf_free_bulk(struct rte_mbuf **mbufs, unsigned int count)
 		if (unlikely(m == NULL))
 			continue;
 
-		__rte_mbuf_sanity_check(m, 1);
+		__rte_mbuf_verify(m, 1);
 
 		do {
 			m_next = m->next;
@@ -546,7 +553,7 @@ rte_pktmbuf_clone(struct rte_mbuf *md, struct rte_mempool *mp)
 		return NULL;
 	}
 
-	__rte_mbuf_sanity_check(mc, 1);
+	__rte_mbuf_verify(mc, 1);
 	return mc;
 }
 
@@ -596,7 +603,7 @@ rte_pktmbuf_copy(const struct rte_mbuf *m, struct rte_mempool *mp,
 	struct rte_mbuf *mc, *m_last, **prev;
 
 	/* garbage in check */
-	__rte_mbuf_sanity_check(m, 1);
+	__rte_mbuf_verify(m, 1);
 
 	/* check for request to copy at offset past end of mbuf */
 	if (unlikely(off >= m->pkt_len))
@@ -660,7 +667,7 @@ rte_pktmbuf_copy(const struct rte_mbuf *m, struct rte_mempool *mp,
 	}
 
 	/* garbage out check */
-	__rte_mbuf_sanity_check(mc, 1);
+	__rte_mbuf_verify(mc, 1);
 	return mc;
 }
 
@@ -671,7 +678,7 @@ rte_pktmbuf_dump(FILE *f, const struct rte_mbuf *m, unsigned dump_len)
 	unsigned int len;
 	unsigned int nb_segs;
 
-	__rte_mbuf_sanity_check(m, 1);
+	__rte_mbuf_verify(m, 1);
 
 	fprintf(f, "dump mbuf at %p, iova=%#" PRIx64 ", buf_len=%u\n", m, rte_mbuf_iova_get(m),
 		m->buf_len);
@@ -689,7 +696,7 @@ rte_pktmbuf_dump(FILE *f, const struct rte_mbuf *m, unsigned dump_len)
 	nb_segs = m->nb_segs;
 
 	while (m && nb_segs != 0) {
-		__rte_mbuf_sanity_check(m, 0);
+		__rte_mbuf_verify(m, 0);
 
 		fprintf(f, "  segment at %p, data=%p, len=%u, off=%u, refcnt=%u\n",
 			m, rte_pktmbuf_mtod(m, void *),
diff --git a/lib/mbuf/rte_mbuf.h b/lib/mbuf/rte_mbuf.h
index babe16c72c..35e89e60e2 100644
--- a/lib/mbuf/rte_mbuf.h
+++ b/lib/mbuf/rte_mbuf.h
@@ -339,13 +339,13 @@ rte_pktmbuf_priv_flags(struct rte_mempool *mp)
 
 #ifdef RTE_LIBRTE_MBUF_DEBUG
 
-/**  check mbuf type in debug mode */
-#define __rte_mbuf_sanity_check(m, is_h) rte_mbuf_sanity_check(m, is_h)
+/**  do mbuf checks in debug mode */
+#define __rte_mbuf_verify(m, is_h) rte_mbuf_verify(m, is_h)
 
 #else /*  RTE_LIBRTE_MBUF_DEBUG */
 
-/**  check mbuf type in debug mode */
-#define __rte_mbuf_sanity_check(m, is_h) do { } while (0)
+/**  ignore mbuf checks if not in debug mode */
+#define __rte_mbuf_verify(m, is_h) do { } while (0)
 
 #endif /*  RTE_LIBRTE_MBUF_DEBUG */
 
@@ -514,10 +514,9 @@ rte_mbuf_ext_refcnt_update(struct rte_mbuf_ext_shared_info *shinfo,
 
 
 /**
- * Sanity checks on an mbuf.
+ * Check that the mbuf is valid and panic if corrupted.
  *
- * Check the consistency of the given mbuf. The function will cause a
- * panic if corruption is detected.
+ * Acts as an assertion that the mbuf is consistent. If not, it calls rte_panic().
  *
  * @param m
  *   The mbuf to be checked.
@@ -526,13 +525,17 @@ rte_mbuf_ext_refcnt_update(struct rte_mbuf_ext_shared_info *shinfo,
  *   of a packet (in this case, some fields like nb_segs are not checked)
  */
 void
+rte_mbuf_verify(const struct rte_mbuf *m, int is_header);
+
+/* Older deprecated name for rte_mbuf_verify() */
+void __rte_deprecated
 rte_mbuf_sanity_check(const struct rte_mbuf *m, int is_header);
 
 /**
- * Sanity checks on a mbuf.
+ * Do consistency checks on an mbuf.
  *
- * Almost like rte_mbuf_sanity_check(), but this function gives the reason
- * if corruption is detected rather than panic.
+ * Check the consistency of the given mbuf and if not valid
+ * return the reason.
  *
  * @param m
  *   The mbuf to be checked.
@@ -551,7 +554,7 @@ int rte_mbuf_check(const struct rte_mbuf *m, int is_header,
 		   const char **reason);
 
 /**
- * Sanity checks on a reinitialized mbuf in debug mode.
+ * Do checks on a reinitialized mbuf in debug mode.
  *
  * Check the consistency of the given reinitialized mbuf.
  * The function will cause a panic if corruption is detected.
@@ -563,16 +566,16 @@ int rte_mbuf_check(const struct rte_mbuf *m, int is_header,
  *   The mbuf to be checked.
  */
 static __rte_always_inline void
-__rte_mbuf_raw_sanity_check(__rte_unused const struct rte_mbuf *m)
+__rte_mbuf_raw_verify(__rte_unused const struct rte_mbuf *m)
 {
 	RTE_ASSERT(rte_mbuf_refcnt_read(m) == 1);
 	RTE_ASSERT(m->next == NULL);
 	RTE_ASSERT(m->nb_segs == 1);
-	__rte_mbuf_sanity_check(m, 0);
+	__rte_mbuf_verify(m, 0);
 }
 
 /** For backwards compatibility. */
-#define MBUF_RAW_ALLOC_CHECK(m) __rte_mbuf_raw_sanity_check(m)
+#define MBUF_RAW_ALLOC_CHECK(m) __rte_mbuf_raw_verify(m)
 
 /**
  * Allocate an uninitialized mbuf from mempool *mp*.
@@ -599,7 +602,7 @@ static inline struct rte_mbuf *rte_mbuf_raw_alloc(struct rte_mempool *mp)
 
 	if (rte_mempool_get(mp, (void **)&m) < 0)
 		return NULL;
-	__rte_mbuf_raw_sanity_check(m);
+	__rte_mbuf_raw_verify(m);
 	return m;
 }
 
@@ -622,7 +625,7 @@ rte_mbuf_raw_free(struct rte_mbuf *m)
 {
 	RTE_ASSERT(!RTE_MBUF_CLONED(m) &&
 		  (!RTE_MBUF_HAS_EXTBUF(m) || RTE_MBUF_HAS_PINNED_EXTBUF(m)));
-	__rte_mbuf_raw_sanity_check(m);
+	__rte_mbuf_raw_verify(m);
 	rte_mempool_put(m->pool, m);
 }
 
@@ -885,7 +888,7 @@ static inline void rte_pktmbuf_reset(struct rte_mbuf *m)
 	rte_pktmbuf_reset_headroom(m);
 
 	m->data_len = 0;
-	__rte_mbuf_sanity_check(m, 1);
+	__rte_mbuf_verify(m, 1);
 }
 
 /**
@@ -941,22 +944,22 @@ static inline int rte_pktmbuf_alloc_bulk(struct rte_mempool *pool,
 	switch (count % 4) {
 	case 0:
 		while (idx != count) {
-			__rte_mbuf_raw_sanity_check(mbufs[idx]);
+			__rte_mbuf_raw_verify(mbufs[idx]);
 			rte_pktmbuf_reset(mbufs[idx]);
 			idx++;
 			/* fall-through */
 	case 3:
-			__rte_mbuf_raw_sanity_check(mbufs[idx]);
+			__rte_mbuf_raw_verify(mbufs[idx]);
 			rte_pktmbuf_reset(mbufs[idx]);
 			idx++;
 			/* fall-through */
 	case 2:
-			__rte_mbuf_raw_sanity_check(mbufs[idx]);
+			__rte_mbuf_raw_verify(mbufs[idx]);
 			rte_pktmbuf_reset(mbufs[idx]);
 			idx++;
 			/* fall-through */
 	case 1:
-			__rte_mbuf_raw_sanity_check(mbufs[idx]);
+			__rte_mbuf_raw_verify(mbufs[idx]);
 			rte_pktmbuf_reset(mbufs[idx]);
 			idx++;
 			/* fall-through */
@@ -1187,8 +1190,8 @@ static inline void rte_pktmbuf_attach(struct rte_mbuf *mi, struct rte_mbuf *m)
 	mi->pkt_len = mi->data_len;
 	mi->nb_segs = 1;
 
-	__rte_mbuf_sanity_check(mi, 1);
-	__rte_mbuf_sanity_check(m, 0);
+	__rte_mbuf_verify(mi, 1);
+	__rte_mbuf_verify(m, 0);
 }
 
 /**
@@ -1343,7 +1346,7 @@ static inline int __rte_pktmbuf_pinned_extbuf_decref(struct rte_mbuf *m)
 static __rte_always_inline struct rte_mbuf *
 rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
 {
-	__rte_mbuf_sanity_check(m, 0);
+	__rte_mbuf_verify(m, 0);
 
 	if (likely(rte_mbuf_refcnt_read(m) == 1)) {
 
@@ -1414,7 +1417,7 @@ static inline void rte_pktmbuf_free(struct rte_mbuf *m)
 	struct rte_mbuf *m_next;
 
 	if (m != NULL)
-		__rte_mbuf_sanity_check(m, 1);
+		__rte_mbuf_verify(m, 1);
 
 	while (m != NULL) {
 		m_next = m->next;
@@ -1495,7 +1498,7 @@ rte_pktmbuf_copy(const struct rte_mbuf *m, struct rte_mempool *mp,
  */
 static inline void rte_pktmbuf_refcnt_update(struct rte_mbuf *m, int16_t v)
 {
-	__rte_mbuf_sanity_check(m, 1);
+	__rte_mbuf_verify(m, 1);
 
 	do {
 		rte_mbuf_refcnt_update(m, v);
@@ -1512,7 +1515,7 @@ static inline void rte_pktmbuf_refcnt_update(struct rte_mbuf *m, int16_t v)
  */
 static inline uint16_t rte_pktmbuf_headroom(const struct rte_mbuf *m)
 {
-	__rte_mbuf_sanity_check(m, 0);
+	__rte_mbuf_verify(m, 0);
 	return m->data_off;
 }
 
@@ -1526,7 +1529,7 @@ static inline uint16_t rte_pktmbuf_headroom(const struct rte_mbuf *m)
  */
 static inline uint16_t rte_pktmbuf_tailroom(const struct rte_mbuf *m)
 {
-	__rte_mbuf_sanity_check(m, 0);
+	__rte_mbuf_verify(m, 0);
 	return (uint16_t)(m->buf_len - rte_pktmbuf_headroom(m) -
 			  m->data_len);
 }
@@ -1541,7 +1544,7 @@ static inline uint16_t rte_pktmbuf_tailroom(const struct rte_mbuf *m)
  */
 static inline struct rte_mbuf *rte_pktmbuf_lastseg(struct rte_mbuf *m)
 {
-	__rte_mbuf_sanity_check(m, 1);
+	__rte_mbuf_verify(m, 1);
 	while (m->next != NULL)
 		m = m->next;
 	return m;
@@ -1585,7 +1588,7 @@ static inline struct rte_mbuf *rte_pktmbuf_lastseg(struct rte_mbuf *m)
 static inline char *rte_pktmbuf_prepend(struct rte_mbuf *m,
 					uint16_t len)
 {
-	__rte_mbuf_sanity_check(m, 1);
+	__rte_mbuf_verify(m, 1);
 
 	if (unlikely(len > rte_pktmbuf_headroom(m)))
 		return NULL;
@@ -1620,7 +1623,7 @@ static inline char *rte_pktmbuf_append(struct rte_mbuf *m, uint16_t len)
 	void *tail;
 	struct rte_mbuf *m_last;
 
-	__rte_mbuf_sanity_check(m, 1);
+	__rte_mbuf_verify(m, 1);
 
 	m_last = rte_pktmbuf_lastseg(m);
 	if (unlikely(len > rte_pktmbuf_tailroom(m_last)))
@@ -1648,7 +1651,7 @@ static inline char *rte_pktmbuf_append(struct rte_mbuf *m, uint16_t len)
  */
 static inline char *rte_pktmbuf_adj(struct rte_mbuf *m, uint16_t len)
 {
-	__rte_mbuf_sanity_check(m, 1);
+	__rte_mbuf_verify(m, 1);
 
 	if (unlikely(len > m->data_len))
 		return NULL;
@@ -1680,7 +1683,7 @@ static inline int rte_pktmbuf_trim(struct rte_mbuf *m, uint16_t len)
 {
 	struct rte_mbuf *m_last;
 
-	__rte_mbuf_sanity_check(m, 1);
+	__rte_mbuf_verify(m, 1);
 
 	m_last = rte_pktmbuf_lastseg(m);
 	if (unlikely(len > m_last->data_len))
@@ -1702,7 +1705,7 @@ static inline int rte_pktmbuf_trim(struct rte_mbuf *m, uint16_t len)
  */
 static inline int rte_pktmbuf_is_contiguous(const struct rte_mbuf *m)
 {
-	__rte_mbuf_sanity_check(m, 1);
+	__rte_mbuf_verify(m, 1);
 	return m->nb_segs == 1;
 }
 
diff --git a/lib/mbuf/version.map b/lib/mbuf/version.map
index daa65e2bbd..c85370e430 100644
--- a/lib/mbuf/version.map
+++ b/lib/mbuf/version.map
@@ -31,6 +31,7 @@ DPDK_24 {
 	rte_mbuf_set_platform_mempool_ops;
 	rte_mbuf_set_user_mempool_ops;
 	rte_mbuf_user_mempool_ops;
+	rte_mbuf_verify;
 	rte_pktmbuf_clone;
 	rte_pktmbuf_copy;
 	rte_pktmbuf_dump;
-- 
2.43.0



* [PATCH v11 5/5] dts: add API doc generation
  @ 2024-08-05 13:59  2%   ` Juraj Linkeš
  0 siblings, 0 replies; 200+ results
From: Juraj Linkeš @ 2024-08-05 13:59 UTC (permalink / raw)
  To: thomas, Honnappa.Nagarahalli, bruce.richardson, jspewock, probb,
	paul.szczepanek, Luca.Vizzarro, npratte
  Cc: dev, Juraj Linkeš

The tool used to generate DTS API docs is Sphinx, which is already in
use in DPDK. The same configuration is reused to preserve style, with one
DTS-specific setting (so that the DPDK docs are unchanged) that
modifies how the sidebar displays the content.

Sphinx generates the documentation from Python docstrings. The docstring
format is the Google format [0] which requires the sphinx.ext.napoleon
extension. The other extension, sphinx.ext.intersphinx, enables linking
to objects in external documentation, such as the Python documentation.
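
As an illustration of the docstring layout that napoleon parses, here is
a minimal sketch of a Google-style docstring. The function name and its
parameters are hypothetical and are not taken from the DTS code; only
the section headings (Args, Returns, Raises) follow the Google format.

```python
def clamp_queue_count(requested: int, available: int) -> int:
    """Limit a requested queue count to what a port offers.

    Hypothetical helper, used only to illustrate the docstring layout.

    Args:
        requested: The number of queues the caller asks for.
        available: The number of queues the port supports.

    Returns:
        The smaller of the two values.

    Raises:
        ValueError: If either value is negative.
    """
    if requested < 0 or available < 0:
        raise ValueError("queue counts must not be negative")
    return min(requested, available)
```

With napoleon enabled, the indented sections should be rendered as
parameter, return value and exception fields in the generated page.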

There is one requirement for building DTS docs - the same Python version
as DTS or higher, because Sphinx's autodoc extension imports the code.

The dependencies needed to import the code don't have to be satisfied,
as the autodoc extension allows us to mock the imports. The missing
packages are taken from the DTS pyproject.toml file.
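
The idea of mocking whatever is not installed can be sketched as
follows. This is a simplified illustration, not the script from this
patch: find_missing is a hypothetical name and the package name passed
to it is made up so that the lookup fails.

```python
import importlib.metadata


def find_missing(packages):
    """Return import-style names of packages that are not installed."""
    missing = []
    for name in packages:
        try:
            importlib.metadata.version(name)
        except importlib.metadata.PackageNotFoundError:
            # Distribution names may contain dashes; module names cannot.
            missing.append(name.lower().replace("-", "_"))
    return missing


# Hypothetical package name, chosen because it is very unlikely to exist.
autodoc_mock_imports = find_missing(["no-such-dts-dependency"])
print(autodoc_mock_imports)
```

Deriving the module name from the distribution name is a heuristic.
The two can differ (PyYAML is imported as yaml), so a mapping table may
be needed for such packages.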

[0] https://google.github.io/styleguide/pyguide.html#38-comments-and-docstrings

Signed-off-by: Juraj Linkeš <juraj.linkes@pantheon.tech>
---
 buildtools/call-sphinx-build.py           | 10 ++-
 buildtools/get-dts-deps.py                | 78 +++++++++++++++++++++++
 buildtools/meson.build                    |  1 +
 doc/api/doxy-api-index.md                 |  3 +
 doc/api/doxy-api.conf.in                  |  2 +
 doc/api/meson.build                       |  1 +
 doc/guides/conf.py                        | 41 +++++++++++-
 doc/guides/contributing/documentation.rst |  2 +
 doc/guides/contributing/patches.rst       |  4 ++
 doc/guides/meson.build                    |  1 +
 doc/guides/tools/dts.rst                  | 39 +++++++++++-
 dts/doc/meson.build                       | 30 +++++++++
 dts/meson.build                           | 15 +++++
 meson.build                               |  1 +
 14 files changed, 225 insertions(+), 3 deletions(-)
 create mode 100755 buildtools/get-dts-deps.py
 create mode 100644 dts/doc/meson.build
 create mode 100644 dts/meson.build

diff --git a/buildtools/call-sphinx-build.py b/buildtools/call-sphinx-build.py
index 623e7363ee..5dd59907cd 100755
--- a/buildtools/call-sphinx-build.py
+++ b/buildtools/call-sphinx-build.py
@@ -15,6 +15,11 @@
 
 # set the version in environment for sphinx to pick up
 os.environ['DPDK_VERSION'] = version
+conf_src = src
+if src.find('dts') != -1:
+    if '-c' in extra_args:
+        conf_src = extra_args[extra_args.index('-c') + 1]
+    os.environ['DTS_BUILD'] = "y"
 
 sphinx_cmd = [sphinx] + extra_args
 
@@ -23,6 +28,9 @@
 for root, dirs, files in os.walk(src):
     srcfiles.extend([join(root, f) for f in files])
 
+if not os.path.exists(dst):
+    os.makedirs(dst)
+
 # run sphinx, putting the html output in a "html" directory
 with open(join(dst, 'sphinx_html.out'), 'w') as out:
     process = run(sphinx_cmd + ['-b', 'html', src, join(dst, 'html')],
@@ -34,7 +42,7 @@
 
 # copy custom CSS file
 css = 'custom.css'
-src_css = join(src, css)
+src_css = join(conf_src, css)
 dst_css = join(dst, 'html', '_static', 'css', css)
 if not os.path.exists(dst_css) or not filecmp.cmp(src_css, dst_css):
     os.makedirs(os.path.dirname(dst_css), exist_ok=True)
diff --git a/buildtools/get-dts-deps.py b/buildtools/get-dts-deps.py
new file mode 100755
index 0000000000..7114aeb710
--- /dev/null
+++ b/buildtools/get-dts-deps.py
@@ -0,0 +1,78 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2024 PANTHEON.tech s.r.o.
+#
+
+"""Utilities for DTS dependencies.
+
+The module can be used as an executable script,
+which verifies that the running Python version meets the version requirement of DTS.
+The script returns the standard exit codes in this mode (0 is success, 1 is failure).
+
+The module also contains a function, get_missing_imports,
+which looks for runtime and doc generation dependencies in the DTS pyproject.toml file
+and returns a list of module names used in an import statement that are missing.
+"""
+
+import configparser
+import importlib.metadata
+import importlib.util
+import os.path
+import platform
+
+_VERSION_COMPARISON_CHARS = '^<>='
+_EXTRA_DEPS = {'invoke': '>=1.3'}
+_DPDK_ROOT = os.path.dirname(os.path.dirname(__file__))
+_DTS_DEP_FILE_PATH = os.path.join(_DPDK_ROOT, 'dts', 'pyproject.toml')
+
+
+def _get_version_tuple(version_str):
+    return tuple(map(int, version_str.split(".")))
+
+
+def _get_dependencies(cfg_file_path):
+    cfg = configparser.ConfigParser()
+    with open(cfg_file_path) as f:
+        dts_deps_file_str = f.read()
+        dts_deps_file_str = dts_deps_file_str.replace("\n]", "]")
+        cfg.read_string(dts_deps_file_str)
+
+    deps_section = cfg['tool.poetry.dependencies']
+    deps = {dep: deps_section[dep].strip('"\'') for dep in deps_section}
+    doc_deps_section = cfg['tool.poetry.group.docs.dependencies']
+    doc_deps = {dep: doc_deps_section[dep].strip("\"'") for dep in doc_deps_section}
+
+    return deps | doc_deps
+
+
+def get_missing_imports():
+    missing_imports = []
+    req_deps = _get_dependencies(_DTS_DEP_FILE_PATH)
+    req_deps.pop('python')
+
+    for req_dep, req_ver in (req_deps | _EXTRA_DEPS).items():
+        try:
+            req_ver = _get_version_tuple(req_ver.strip(_VERSION_COMPARISON_CHARS))
+            found_dep_ver = _get_version_tuple(importlib.metadata.version(req_dep))
+            if found_dep_ver < req_ver:
+                print(
+                    f'The version "{found_dep_ver}" of package "{req_dep}" '
+                    f'is lower than required "{req_ver}".'
+                )
+        except importlib.metadata.PackageNotFoundError:
+            print(f'Package "{req_dep}" not found.')
+            missing_imports.append(req_dep.lower().replace('-', '_'))
+
+    return missing_imports
+
+
+if __name__ == '__main__':
+    python_version = _get_dependencies(_DTS_DEP_FILE_PATH).pop('python')
+    if python_version:
+        sys_ver = _get_version_tuple(platform.python_version())
+        req_ver = _get_version_tuple(python_version.strip(_VERSION_COMPARISON_CHARS))
+        if sys_ver < req_ver:
+            print(
+                f'The available Python version "{sys_ver}" is lower than required "{req_ver}".'
+            )
+            exit(1)
diff --git a/buildtools/meson.build b/buildtools/meson.build
index 3adf34e1a8..599653bea4 100644
--- a/buildtools/meson.build
+++ b/buildtools/meson.build
@@ -24,6 +24,7 @@ get_numa_count_cmd = py3 + files('get-numa-count.py')
 get_test_suites_cmd = py3 + files('get-test-suites.py')
 has_hugepages_cmd = py3 + files('has-hugepages.py')
 cmdline_gen_cmd = py3 + files('dpdk-cmdline-gen.py')
+get_dts_deps = py3 + files('get-dts-deps.py')
 
 # install any build tools that end-users might want also
 install_data([
diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
index f9f0300126..ab223bcdf7 100644
--- a/doc/api/doxy-api-index.md
+++ b/doc/api/doxy-api-index.md
@@ -245,3 +245,6 @@ The public API headers are grouped by topics:
   [experimental APIs](@ref rte_compat.h),
   [ABI versioning](@ref rte_function_versioning.h),
   [version](@ref rte_version.h)
+
+- **tests**:
+  [**DTS**](@dts_api_main_page)
diff --git a/doc/api/doxy-api.conf.in b/doc/api/doxy-api.conf.in
index a8823c046f..c94f02d411 100644
--- a/doc/api/doxy-api.conf.in
+++ b/doc/api/doxy-api.conf.in
@@ -124,6 +124,8 @@ SEARCHENGINE            = YES
 SORT_MEMBER_DOCS        = NO
 SOURCE_BROWSER          = YES
 
+ALIASES                 = "dts_api_main_page=@DTS_API_MAIN_PAGE@"
+
 EXAMPLE_PATH            = @TOPDIR@/examples
 EXAMPLE_PATTERNS        = *.c
 EXAMPLE_RECURSIVE       = YES
diff --git a/doc/api/meson.build b/doc/api/meson.build
index b828b1ed66..b893931b92 100644
--- a/doc/api/meson.build
+++ b/doc/api/meson.build
@@ -41,6 +41,7 @@ cdata.set('WARN_AS_ERROR', 'NO')
 if get_option('werror')
     cdata.set('WARN_AS_ERROR', 'YES')
 endif
+cdata.set('DTS_API_MAIN_PAGE', join_paths('..', 'dts', 'html', 'index.html'))
 
 # configure HTML Doxygen run
 html_cdata = configuration_data()
diff --git a/doc/guides/conf.py b/doc/guides/conf.py
index 0f7ff5282d..eab3387874 100644
--- a/doc/guides/conf.py
+++ b/doc/guides/conf.py
@@ -10,7 +10,7 @@
 from os.path import basename
 from os.path import dirname
 from os.path import join as path_join
-from sys import argv, stderr
+from sys import argv, stderr, path
 
 import configparser
 
@@ -24,6 +24,45 @@
           file=stderr)
     pass
 
+# Napoleon enables the Google format of Python docstrings, used in DTS.
+# Intersphinx allows linking to external projects, such as Python docs, also used in DTS.
+extensions = ['sphinx.ext.napoleon', 'sphinx.ext.intersphinx']
+
+# DTS Python docstring options.
+autodoc_default_options = {
+    'members': True,
+    'member-order': 'bysource',
+    'show-inheritance': True,
+}
+autodoc_class_signature = 'separated'
+autodoc_typehints = 'both'
+autodoc_typehints_format = 'short'
+autodoc_typehints_description_target = 'documented'
+napoleon_numpy_docstring = False
+napoleon_attr_annotations = True
+napoleon_preprocess_types = True
+add_module_names = False
+toc_object_entries = True
+toc_object_entries_show_parents = 'hide'
+intersphinx_mapping = {'python': ('https://docs.python.org/3', None)}
+
+if environ.get('DTS_BUILD'):
+    # Add path to DTS sources so that Sphinx can find them.
+    dpdk_root = dirname(dirname(dirname(__file__)))
+    path.append(path_join(dpdk_root, 'dts'))
+
+    # Get missing DTS dependencies. Add path to buildtools to find the get_missing_imports function.
+    path.append(path_join(dpdk_root, 'buildtools'))
+    import importlib
+    # Ignore missing imports from DTS dependencies.
+    autodoc_mock_imports = importlib.import_module('get-dts-deps').get_missing_imports()
+
+    # DTS Sidebar config.
+    html_theme_options = {
+        'collapse_navigation': False,
+        'navigation_depth': -1,  # unlimited depth
+    }
+
 stop_on_error = ('-W' in argv)
 
 project = 'Data Plane Development Kit'
diff --git a/doc/guides/contributing/documentation.rst b/doc/guides/contributing/documentation.rst
index 68454ae0d5..7b287ce631 100644
--- a/doc/guides/contributing/documentation.rst
+++ b/doc/guides/contributing/documentation.rst
@@ -133,6 +133,8 @@ added to by the developer.
 Building the Documentation
 --------------------------
 
+.. _doc_dependencies:
+
 Dependencies
 ~~~~~~~~~~~~
 
diff --git a/doc/guides/contributing/patches.rst b/doc/guides/contributing/patches.rst
index 04c66bebc4..6629928bee 100644
--- a/doc/guides/contributing/patches.rst
+++ b/doc/guides/contributing/patches.rst
@@ -499,6 +499,10 @@ The script usage is::
 For both of the above scripts, the -n option is used to specify a number of commits from HEAD,
 and the -r option allows the user specify a ``git log`` range.
 
+Additionally, when contributing to the DTS tool, patches should also be checked using
+the ``dts-check-format.sh`` script in the ``devtools`` directory of the DPDK repo.
+To run the script, extra :ref:`Python dependencies <dts_deps>` are needed.
+
 .. _contrib_check_compilation:
 
 Checking Compilation
diff --git a/doc/guides/meson.build b/doc/guides/meson.build
index f8bbfba9f5..b34b7b8eb0 100644
--- a/doc/guides/meson.build
+++ b/doc/guides/meson.build
@@ -1,6 +1,7 @@
 # SPDX-License-Identifier: BSD-3-Clause
 # Copyright(c) 2018 Intel Corporation
 
+doc_guides_source_dir = meson.current_source_dir()
 sphinx = find_program('sphinx-build', required: get_option('enable_docs'))
 
 if not sphinx.found()
diff --git a/doc/guides/tools/dts.rst b/doc/guides/tools/dts.rst
index 515b15e4d8..bd715f8072 100644
--- a/doc/guides/tools/dts.rst
+++ b/doc/guides/tools/dts.rst
@@ -54,6 +54,7 @@ DTS uses Poetry as its Python dependency management.
 Python build/development and runtime environments are the same and DTS development environment,
 DTS runtime environment or just plain DTS environment are used interchangeably.
 
+.. _dts_deps:
 
 Setting up DTS environment
 ~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -291,8 +292,15 @@ When adding code to the DTS framework, pay attention to the rest of the code
 and try not to divert much from it.
 The :ref:`DTS developer tools <dts_dev_tools>` will issue warnings
 when some of the basics are not met.
+You should also build the :ref:`API documentation <building_api_docs>`
+to address any issues found during the build.
 
-The code must be properly documented with docstrings.
+The API documentation, which is a helpful reference when developing, may be accessed
+in the code directly or generated with the :ref:`API docs build steps <building_api_docs>`.
+When adding new files or modifying the directory structure,
+the corresponding changes must be made to DTS API doc sources in ``dts/doc``.
+
+Speaking of which, the code must be properly documented with docstrings.
 The style must conform to the `Google style
 <https://google.github.io/styleguide/pyguide.html#38-comments-and-docstrings>`_.
 See an example of the style `here
@@ -427,6 +435,35 @@ the DTS code check and format script.
 Refer to the script for usage: ``devtools/dts-check-format.sh -h``.
 
 
+.. _building_api_docs:
+
+Building DTS API docs
+---------------------
+
+The documentation is built using the standard DPDK build system.
+See :doc:`../linux_gsg/build_dpdk` for more details on compiling DPDK with meson.
+
+The :ref:`doc build dependencies <doc_dependencies>` may be installed with Poetry:
+
+.. code-block:: console
+
+   poetry install --no-root --only docs
+   poetry install --no-root --with docs  # an alternative that will also install DTS dependencies
+   poetry shell
+
+After executing the meson command, build the documentation with:
+
+.. code-block:: console
+
+   ninja -C build dts-doc
+
+The output is generated in ``build/doc/api/dts/html``.
+
+.. note::
+
+   Make sure to fix any Sphinx warnings when adding or updating docstrings.
+
+
 Configuration Schema
 --------------------
 
diff --git a/dts/doc/meson.build b/dts/doc/meson.build
new file mode 100644
index 0000000000..c2df99bbc6
--- /dev/null
+++ b/dts/doc/meson.build
@@ -0,0 +1,30 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2023 PANTHEON.tech s.r.o.
+
+sphinx = find_program('sphinx-build', required: get_option('enable_docs'))
+if not sphinx.found()
+    subdir_done()
+endif
+
+python_ver_satisfied = run_command(get_dts_deps).returncode()
+if python_ver_satisfied != 0
+    subdir_done()
+endif
+
+dts_doc_api_build_dir = join_paths(doc_api_build_dir, 'dts')
+
+extra_sphinx_args = ['-E', '-c', doc_guides_source_dir]
+if get_option('werror')
+    extra_sphinx_args += '-W'
+endif
+
+htmldir = join_paths(get_option('datadir'), 'doc', 'dpdk', 'dts')
+dts_api_html = custom_target('dts_api_html',
+        output: 'html',
+        command: [sphinx_wrapper, sphinx, meson.project_version(),
+            meson.current_source_dir(), dts_doc_api_build_dir, extra_sphinx_args],
+        build_by_default: get_option('enable_docs'),
+        install: get_option('enable_docs'),
+        install_dir: htmldir)
+doc_targets += dts_api_html
+doc_target_names += 'DTS_API_HTML'
diff --git a/dts/meson.build b/dts/meson.build
new file mode 100644
index 0000000000..6ed3c93fe1
--- /dev/null
+++ b/dts/meson.build
@@ -0,0 +1,15 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2023 PANTHEON.tech s.r.o.
+
+doc_targets = []
+doc_target_names = []
+
+subdir('doc')
+
+if doc_targets.length() == 0
+    message = 'No docs targets found'
+else
+    message = 'Built docs:'
+endif
+run_target('dts-doc', command: [echo, message, doc_target_names],
+    depends: doc_targets)
diff --git a/meson.build b/meson.build
index 8b248d4505..835973a0ce 100644
--- a/meson.build
+++ b/meson.build
@@ -87,6 +87,7 @@ subdir('app')
 
 # build docs
 subdir('doc')
+subdir('dts')
 
 # build any examples explicitly requested - useful for developers - and
 # install any example code into the appropriate install path
-- 
2.34.1



* [PATCH v12 5/5] dts: add API doc generation
  @ 2024-08-06  6:14  2%   ` Juraj Linkeš
  0 siblings, 0 replies; 200+ results
From: Juraj Linkeš @ 2024-08-06  6:14 UTC (permalink / raw)
  To: thomas, Honnappa.Nagarahalli, bruce.richardson, jspewock, probb,
	paul.szczepanek, Luca.Vizzarro, npratte
  Cc: dev, Juraj Linkeš

The tool used to generate DTS API docs is Sphinx, which is already in
use in DPDK. The same configuration is reused to preserve style, with one
DTS-specific setting (so that the DPDK docs are unchanged) that
modifies how the sidebar displays the content.

Sphinx generates the documentation from Python docstrings. The docstring
format is the Google format [0] which requires the sphinx.ext.napoleon
extension. The other extension, sphinx.ext.intersphinx, enables linking
to objects in external documentation, such as the Python documentation.

There is one requirement for building DTS docs - the same Python version
as DTS or higher, because Sphinx's autodoc extension imports the code.
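
A minimal sketch of such a version check is shown below. It assumes
the requirement is a single lower bound with purely numeric components,
such as "^3.10", and it reads sys.version_info rather than parsing a
version string, which is a deliberate simplification compared to the
script in this patch. The function names are hypothetical.

```python
import sys

_COMPARISON_CHARS = "^<>="


def version_tuple(version_str):
    """Turn "3.10.4" into (3, 10, 4); assumes numeric components only."""
    return tuple(int(part) for part in version_str.split("."))


def python_is_new_enough(requirement):
    """Check the running interpreter against a bound such as "^3.10"."""
    required = version_tuple(requirement.strip(_COMPARISON_CHARS))
    # Tuples compare element by element, so (3, 12, 1) >= (3, 10) holds.
    return tuple(sys.version_info[:3]) >= required
```

The caret operator also implies an upper bound in Poetry, which this
sketch ignores. Suffixes such as "rc1" or "post1" would make int() fail,
so they would need separate handling.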

The dependencies needed to import the code don't have to be satisfied,
as the autodoc extension allows us to mock the imports. The missing
packages are taken from the DTS pyproject.toml file.
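
Reading a TOML file with configparser relies on the dependency
sections looking like INI sections. The sketch below shows the idea on
a made-up excerpt: the sample content and the read_deps name are
assumptions for illustration, and the real DTS file has more content.

```python
import configparser

# Hypothetical pyproject.toml excerpt.
SAMPLE = '''
[tool.poetry]
authors = [
    "Jane Doe <jane@example.com>",
    "John Doe <john@example.com>"
]

[tool.poetry.dependencies]
python = "^3.10"

[tool.poetry.group.docs.dependencies]
sphinx = "<=7"
'''


def read_deps(text):
    """Collect runtime and docs dependencies from TOML-like text."""
    cfg = configparser.ConfigParser()
    # A "]" alone at column 0 is neither a section header nor a
    # key/value pair, so fold it into the preceding continuation line.
    cfg.read_string(text.replace("\n]", "]"))
    deps = {}
    for section in ("tool.poetry.dependencies",
                    "tool.poetry.group.docs.dependencies"):
        for name, value in cfg[section].items():
            # configparser keeps the TOML quotes, so strip them.
            deps[name] = value.strip("\"'")
    return deps


print(read_deps(SAMPLE))
```

This only works while the file stays within the subset of TOML that
configparser tolerates. Note that configparser lowercases key names by
default.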

[0] https://google.github.io/styleguide/pyguide.html#38-comments-and-docstrings

Signed-off-by: Juraj Linkeš <juraj.linkes@pantheon.tech>
---
 buildtools/call-sphinx-build.py           | 10 ++-
 buildtools/get-dts-deps.py                | 78 +++++++++++++++++++++++
 buildtools/meson.build                    |  1 +
 doc/api/doxy-api-index.md                 |  3 +
 doc/api/doxy-api.conf.in                  |  2 +
 doc/api/meson.build                       |  1 +
 doc/guides/conf.py                        | 41 +++++++++++-
 doc/guides/contributing/documentation.rst |  2 +
 doc/guides/contributing/patches.rst       |  4 ++
 doc/guides/meson.build                    |  1 +
 doc/guides/tools/dts.rst                  | 39 +++++++++++-
 dts/doc/meson.build                       | 30 +++++++++
 dts/meson.build                           | 15 +++++
 meson.build                               |  1 +
 14 files changed, 225 insertions(+), 3 deletions(-)
 create mode 100755 buildtools/get-dts-deps.py
 create mode 100644 dts/doc/meson.build
 create mode 100644 dts/meson.build

diff --git a/buildtools/call-sphinx-build.py b/buildtools/call-sphinx-build.py
index 623e7363ee..5dd59907cd 100755
--- a/buildtools/call-sphinx-build.py
+++ b/buildtools/call-sphinx-build.py
@@ -15,6 +15,11 @@
 
 # set the version in environment for sphinx to pick up
 os.environ['DPDK_VERSION'] = version
+conf_src = src
+if src.find('dts') != -1:
+    if '-c' in extra_args:
+        conf_src = extra_args[extra_args.index('-c') + 1]
+    os.environ['DTS_BUILD'] = "y"
 
 sphinx_cmd = [sphinx] + extra_args
 
@@ -23,6 +28,9 @@
 for root, dirs, files in os.walk(src):
     srcfiles.extend([join(root, f) for f in files])
 
+if not os.path.exists(dst):
+    os.makedirs(dst)
+
 # run sphinx, putting the html output in a "html" directory
 with open(join(dst, 'sphinx_html.out'), 'w') as out:
     process = run(sphinx_cmd + ['-b', 'html', src, join(dst, 'html')],
@@ -34,7 +42,7 @@
 
 # copy custom CSS file
 css = 'custom.css'
-src_css = join(src, css)
+src_css = join(conf_src, css)
 dst_css = join(dst, 'html', '_static', 'css', css)
 if not os.path.exists(dst_css) or not filecmp.cmp(src_css, dst_css):
     os.makedirs(os.path.dirname(dst_css), exist_ok=True)
diff --git a/buildtools/get-dts-deps.py b/buildtools/get-dts-deps.py
new file mode 100755
index 0000000000..309b83cb5c
--- /dev/null
+++ b/buildtools/get-dts-deps.py
@@ -0,0 +1,78 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2024 PANTHEON.tech s.r.o.
+#
+
+"""Utilities for DTS dependencies.
+
+The module can be used as an executable script,
+which verifies that the running Python version meets the version requirement of DTS.
+The script returns the standard exit codes in this mode (0 is success, 1 is failure).
+
+The module also contains a function, get_missing_imports,
+which looks for runtime and doc generation dependencies in the DTS pyproject.toml file
+and returns a list of module names used in an import statement that are missing.
+"""
+
+import configparser
+import importlib.metadata
+import importlib.util
+import os.path
+import platform
+
+_VERSION_COMPARISON_CHARS = '^<>='
+_EXTRA_DEPS = {'invoke': '>=1.3', 'paramiko': '>=2.4'}
+_DPDK_ROOT = os.path.dirname(os.path.dirname(__file__))
+_DTS_DEP_FILE_PATH = os.path.join(_DPDK_ROOT, 'dts', 'pyproject.toml')
+
+
+def _get_version_tuple(version_str):
+    return tuple(map(int, version_str.split(".")))
+
+
+def _get_dependencies(cfg_file_path):
+    cfg = configparser.ConfigParser()
+    with open(cfg_file_path) as f:
+        dts_deps_file_str = f.read()
+        dts_deps_file_str = dts_deps_file_str.replace("\n]", "]")
+        cfg.read_string(dts_deps_file_str)
+
+    deps_section = cfg['tool.poetry.dependencies']
+    deps = {dep: deps_section[dep].strip('"\'') for dep in deps_section}
+    doc_deps_section = cfg['tool.poetry.group.docs.dependencies']
+    doc_deps = {dep: doc_deps_section[dep].strip("\"'") for dep in doc_deps_section}
+
+    return deps | doc_deps
+
+
+def get_missing_imports():
+    missing_imports = []
+    req_deps = _get_dependencies(_DTS_DEP_FILE_PATH)
+    req_deps.pop('python')
+
+    for req_dep, req_ver in (req_deps | _EXTRA_DEPS).items():
+        try:
+            req_ver = _get_version_tuple(req_ver.strip(_VERSION_COMPARISON_CHARS))
+            found_dep_ver = _get_version_tuple(importlib.metadata.version(req_dep))
+            if found_dep_ver < req_ver:
+                print(
+                    f'The version "{found_dep_ver}" of package "{req_dep}" '
+                    f'is lower than required "{req_ver}".'
+                )
+        except importlib.metadata.PackageNotFoundError:
+            print(f'Package "{req_dep}" not found.')
+            missing_imports.append(req_dep.lower().replace('-', '_'))
+
+    return missing_imports
+
+
+if __name__ == '__main__':
+    python_version = _get_dependencies(_DTS_DEP_FILE_PATH).pop('python')
+    if python_version:
+        sys_ver = _get_version_tuple(platform.python_version())
+        req_ver = _get_version_tuple(python_version.strip(_VERSION_COMPARISON_CHARS))
+        if sys_ver < req_ver:
+            print(
+                f'The available Python version "{sys_ver}" is lower than required "{req_ver}".'
+            )
+            exit(1)
diff --git a/buildtools/meson.build b/buildtools/meson.build
index 3adf34e1a8..599653bea4 100644
--- a/buildtools/meson.build
+++ b/buildtools/meson.build
@@ -24,6 +24,7 @@ get_numa_count_cmd = py3 + files('get-numa-count.py')
 get_test_suites_cmd = py3 + files('get-test-suites.py')
 has_hugepages_cmd = py3 + files('has-hugepages.py')
 cmdline_gen_cmd = py3 + files('dpdk-cmdline-gen.py')
+get_dts_deps = py3 + files('get-dts-deps.py')
 
 # install any build tools that end-users might want also
 install_data([
diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
index f9f0300126..ab223bcdf7 100644
--- a/doc/api/doxy-api-index.md
+++ b/doc/api/doxy-api-index.md
@@ -245,3 +245,6 @@ The public API headers are grouped by topics:
   [experimental APIs](@ref rte_compat.h),
   [ABI versioning](@ref rte_function_versioning.h),
   [version](@ref rte_version.h)
+
+- **tests**:
+  [**DTS**](@dts_api_main_page)
diff --git a/doc/api/doxy-api.conf.in b/doc/api/doxy-api.conf.in
index a8823c046f..c94f02d411 100644
--- a/doc/api/doxy-api.conf.in
+++ b/doc/api/doxy-api.conf.in
@@ -124,6 +124,8 @@ SEARCHENGINE            = YES
 SORT_MEMBER_DOCS        = NO
 SOURCE_BROWSER          = YES
 
+ALIASES                 = "dts_api_main_page=@DTS_API_MAIN_PAGE@"
+
 EXAMPLE_PATH            = @TOPDIR@/examples
 EXAMPLE_PATTERNS        = *.c
 EXAMPLE_RECURSIVE       = YES
diff --git a/doc/api/meson.build b/doc/api/meson.build
index b828b1ed66..b893931b92 100644
--- a/doc/api/meson.build
+++ b/doc/api/meson.build
@@ -41,6 +41,7 @@ cdata.set('WARN_AS_ERROR', 'NO')
 if get_option('werror')
     cdata.set('WARN_AS_ERROR', 'YES')
 endif
+cdata.set('DTS_API_MAIN_PAGE', join_paths('..', 'dts', 'html', 'index.html'))
 
 # configure HTML Doxygen run
 html_cdata = configuration_data()
diff --git a/doc/guides/conf.py b/doc/guides/conf.py
index 0f7ff5282d..eab3387874 100644
--- a/doc/guides/conf.py
+++ b/doc/guides/conf.py
@@ -10,7 +10,7 @@
 from os.path import basename
 from os.path import dirname
 from os.path import join as path_join
-from sys import argv, stderr
+from sys import argv, stderr, path
 
 import configparser
 
@@ -24,6 +24,45 @@
           file=stderr)
     pass
 
+# Napoleon enables the Google format of Python docstrings, used in DTS.
+# Intersphinx allows linking to external projects, such as Python docs, also used in DTS.
+extensions = ['sphinx.ext.napoleon', 'sphinx.ext.intersphinx']
+
+# DTS Python docstring options.
+autodoc_default_options = {
+    'members': True,
+    'member-order': 'bysource',
+    'show-inheritance': True,
+}
+autodoc_class_signature = 'separated'
+autodoc_typehints = 'both'
+autodoc_typehints_format = 'short'
+autodoc_typehints_description_target = 'documented'
+napoleon_numpy_docstring = False
+napoleon_attr_annotations = True
+napoleon_preprocess_types = True
+add_module_names = False
+toc_object_entries = True
+toc_object_entries_show_parents = 'hide'
+intersphinx_mapping = {'python': ('https://docs.python.org/3', None)}
+
+if environ.get('DTS_BUILD'):
+    # Add path to DTS sources so that Sphinx can find them.
+    dpdk_root = dirname(dirname(dirname(__file__)))
+    path.append(path_join(dpdk_root, 'dts'))
+
+    # Get missing DTS dependencies. Add path to buildtools to find the get_missing_imports function.
+    path.append(path_join(dpdk_root, 'buildtools'))
+    import importlib
+    # Ignore missing imports from DTS dependencies.
+    autodoc_mock_imports = importlib.import_module('get-dts-deps').get_missing_imports()
+
+    # DTS Sidebar config.
+    html_theme_options = {
+        'collapse_navigation': False,
+        'navigation_depth': -1,  # unlimited depth
+    }
+
 stop_on_error = ('-W' in argv)
 
 project = 'Data Plane Development Kit'
diff --git a/doc/guides/contributing/documentation.rst b/doc/guides/contributing/documentation.rst
index 68454ae0d5..7b287ce631 100644
--- a/doc/guides/contributing/documentation.rst
+++ b/doc/guides/contributing/documentation.rst
@@ -133,6 +133,8 @@ added to by the developer.
 Building the Documentation
 --------------------------
 
+.. _doc_dependencies:
+
 Dependencies
 ~~~~~~~~~~~~
 
diff --git a/doc/guides/contributing/patches.rst b/doc/guides/contributing/patches.rst
index 04c66bebc4..6629928bee 100644
--- a/doc/guides/contributing/patches.rst
+++ b/doc/guides/contributing/patches.rst
@@ -499,6 +499,10 @@ The script usage is::
 For both of the above scripts, the -n option is used to specify a number of commits from HEAD,
 and the -r option allows the user specify a ``git log`` range.
 
+Additionally, when contributing to the DTS tool, patches should also be checked using
+the ``dts-check-format.sh`` script in the ``devtools`` directory of the DPDK repo.
+To run the script, extra :ref:`Python dependencies <dts_deps>` are needed.
+
 .. _contrib_check_compilation:
 
 Checking Compilation
diff --git a/doc/guides/meson.build b/doc/guides/meson.build
index f8bbfba9f5..b34b7b8eb0 100644
--- a/doc/guides/meson.build
+++ b/doc/guides/meson.build
@@ -1,6 +1,7 @@
 # SPDX-License-Identifier: BSD-3-Clause
 # Copyright(c) 2018 Intel Corporation
 
+doc_guides_source_dir = meson.current_source_dir()
 sphinx = find_program('sphinx-build', required: get_option('enable_docs'))
 
 if not sphinx.found()
diff --git a/doc/guides/tools/dts.rst b/doc/guides/tools/dts.rst
index 515b15e4d8..bd715f8072 100644
--- a/doc/guides/tools/dts.rst
+++ b/doc/guides/tools/dts.rst
@@ -54,6 +54,7 @@ DTS uses Poetry as its Python dependency management.
 Python build/development and runtime environments are the same and DTS development environment,
 DTS runtime environment or just plain DTS environment are used interchangeably.
 
+.. _dts_deps:
 
 Setting up DTS environment
 ~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -291,8 +292,15 @@ When adding code to the DTS framework, pay attention to the rest of the code
 and try not to divert much from it.
 The :ref:`DTS developer tools <dts_dev_tools>` will issue warnings
 when some of the basics are not met.
+You should also build the :ref:`API documentation <building_api_docs>`
+to address any issues found during the build.
 
-The code must be properly documented with docstrings.
+The API documentation, which is a helpful reference when developing, may be accessed
+in the code directly or generated with the :ref:`API docs build steps <building_api_docs>`.
+When adding new files or modifying the directory structure,
+the corresponding changes must be made to the DTS API doc sources in ``dts/doc``.
+
+Speaking of which, the code must be properly documented with docstrings.
 The style must conform to the `Google style
 <https://google.github.io/styleguide/pyguide.html#38-comments-and-docstrings>`_.
 See an example of the style `here
@@ -427,6 +435,35 @@ the DTS code check and format script.
 Refer to the script for usage: ``devtools/dts-check-format.sh -h``.
 
 
+.. _building_api_docs:
+
+Building DTS API docs
+---------------------
+
+The documentation is built using the standard DPDK build system.
+See :doc:`../linux_gsg/build_dpdk` for more details on compiling DPDK with meson.
+
+The :ref:`doc build dependencies <doc_dependencies>` may be installed with Poetry:
+
+.. code-block:: console
+
+   poetry install --no-root --only docs
+   poetry install --no-root --with docs  # an alternative that will also install DTS dependencies
+   poetry shell
+
+After executing the meson command, build the documentation with:
+
+.. code-block:: console
+
+   ninja -C build dts-doc
+
+The output is generated in ``build/doc/api/dts/html``.
+
+.. note::
+
+   Make sure to fix any Sphinx warnings when adding or updating docstrings.
+
+
 Configuration Schema
 --------------------
 
diff --git a/dts/doc/meson.build b/dts/doc/meson.build
new file mode 100644
index 0000000000..c2df99bbc6
--- /dev/null
+++ b/dts/doc/meson.build
@@ -0,0 +1,30 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2023 PANTHEON.tech s.r.o.
+
+sphinx = find_program('sphinx-build', required: get_option('enable_docs'))
+if not sphinx.found()
+    subdir_done()
+endif
+
+python_ver_satisfied = run_command(get_dts_deps, check: false).returncode()
+if python_ver_satisfied != 0
+    subdir_done()
+endif
+
+dts_doc_api_build_dir = join_paths(doc_api_build_dir, 'dts')
+
+extra_sphinx_args = ['-E', '-c', doc_guides_source_dir]
+if get_option('werror')
+    extra_sphinx_args += '-W'
+endif
+
+htmldir = join_paths(get_option('datadir'), 'doc', 'dpdk', 'dts')
+dts_api_html = custom_target('dts_api_html',
+        output: 'html',
+        command: [sphinx_wrapper, sphinx, meson.project_version(),
+            meson.current_source_dir(), dts_doc_api_build_dir, extra_sphinx_args],
+        build_by_default: get_option('enable_docs'),
+        install: get_option('enable_docs'),
+        install_dir: htmldir)
+doc_targets += dts_api_html
+doc_target_names += 'DTS_API_HTML'
diff --git a/dts/meson.build b/dts/meson.build
new file mode 100644
index 0000000000..6ed3c93fe1
--- /dev/null
+++ b/dts/meson.build
@@ -0,0 +1,15 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2023 PANTHEON.tech s.r.o.
+
+doc_targets = []
+doc_target_names = []
+
+subdir('doc')
+
+if doc_targets.length() == 0
+    message = 'No docs targets found'
+else
+    message = 'Built docs:'
+endif
+run_target('dts-doc', command: [echo, message, doc_target_names],
+    depends: doc_targets)
diff --git a/meson.build b/meson.build
index 8b248d4505..835973a0ce 100644
--- a/meson.build
+++ b/meson.build
@@ -87,6 +87,7 @@ subdir('app')
 
 # build docs
 subdir('doc')
+subdir('dts')
 
 # build any examples explicitly requested - useful for developers - and
 # install any example code into the appropriate install path
-- 
2.34.1



* [PATCH v13 6/6] dts: add API doc generation
  @ 2024-08-06  8:46  2%   ` Juraj Linkeš
  0 siblings, 0 replies; 200+ results
From: Juraj Linkeš @ 2024-08-06  8:46 UTC (permalink / raw)
  To: thomas, Honnappa.Nagarahalli, bruce.richardson, jspewock, probb,
	paul.szczepanek, Luca.Vizzarro, npratte
  Cc: dev, Juraj Linkeš

DTS API docs are generated with Sphinx, which is already in use in DPDK.
The same configuration is used to preserve the style, with one
DTS-specific setting (leaving the DPDK docs unchanged) that modifies how
the sidebar displays the content.

Sphinx generates the documentation from Python docstrings. The docstring
format is the Google format [0] which requires the sphinx.ext.napoleon
extension. The other extension, sphinx.ext.intersphinx, enables linking
to objects in external documentation, such as the Python documentation.

There is one requirement for building DTS docs - the same Python version
as DTS or higher, because Sphinx's autodoc extension imports the code.

The dependencies needed to import the code don't have to be satisfied,
as the autodoc extension allows us to mock the imports. The missing
packages are taken from the DTS pyproject.toml file.

[0] https://google.github.io/styleguide/pyguide.html#38-comments-and-docstrings

Signed-off-by: Juraj Linkeš <juraj.linkes@pantheon.tech>
---
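Illustrative note, not part of the patch: the import mocking described in
the commit message can be sketched roughly as below. The helper name
find_missing_imports and the sample package name are hypothetical; the
patch's own logic lives in buildtools/get-dts-deps.py. The sketch assumes
that a distribution name maps to its module name by lowercasing and
replacing '-' with '_', which does not hold for every package.

```python
import importlib.metadata


def find_missing_imports(package_names):
    """Return import-style names of the packages that are not installed."""
    missing = []
    for name in package_names:
        try:
            # Raises PackageNotFoundError when the distribution is absent.
            importlib.metadata.version(name)
        except importlib.metadata.PackageNotFoundError:
            # Assumption: module name == distribution name with '-' -> '_'.
            missing.append(name.lower().replace('-', '_'))
    return missing


# In a Sphinx conf.py the result could feed autodoc's mock list, so that
# autodoc can import the documented code without the real packages.
# 'hypothetical-missing-pkg-xyz' is a made-up name used for illustration.
autodoc_mock_imports = find_missing_imports(['hypothetical-missing-pkg-xyz'])
```

Only packages that are absent end up in the list, so installed
dependencies are still imported for real and documented with their actual
types.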
 buildtools/call-sphinx-build.py           | 10 ++-
 buildtools/get-dts-deps.py                | 78 +++++++++++++++++++++++
 buildtools/meson.build                    |  1 +
 doc/api/doxy-api-index.md                 |  3 +
 doc/api/doxy-api.conf.in                  |  2 +
 doc/api/meson.build                       |  1 +
 doc/guides/conf.py                        | 41 +++++++++++-
 doc/guides/contributing/documentation.rst |  2 +
 doc/guides/contributing/patches.rst       |  4 ++
 doc/guides/meson.build                    |  1 +
 doc/guides/tools/dts.rst                  | 39 +++++++++++-
 dts/doc/meson.build                       | 30 +++++++++
 dts/meson.build                           | 15 +++++
 meson.build                               |  1 +
 14 files changed, 225 insertions(+), 3 deletions(-)
 create mode 100755 buildtools/get-dts-deps.py
 create mode 100644 dts/doc/meson.build
 create mode 100644 dts/meson.build

diff --git a/buildtools/call-sphinx-build.py b/buildtools/call-sphinx-build.py
index 623e7363ee..5dd59907cd 100755
--- a/buildtools/call-sphinx-build.py
+++ b/buildtools/call-sphinx-build.py
@@ -15,6 +15,11 @@
 
 # set the version in environment for sphinx to pick up
 os.environ['DPDK_VERSION'] = version
+conf_src = src
+if src.find('dts') != -1:
+    if '-c' in extra_args:
+        conf_src = extra_args[extra_args.index('-c') + 1]
+    os.environ['DTS_BUILD'] = "y"
 
 sphinx_cmd = [sphinx] + extra_args
 
@@ -23,6 +28,9 @@
 for root, dirs, files in os.walk(src):
     srcfiles.extend([join(root, f) for f in files])
 
+if not os.path.exists(dst):
+    os.makedirs(dst)
+
 # run sphinx, putting the html output in a "html" directory
 with open(join(dst, 'sphinx_html.out'), 'w') as out:
     process = run(sphinx_cmd + ['-b', 'html', src, join(dst, 'html')],
@@ -34,7 +42,7 @@
 
 # copy custom CSS file
 css = 'custom.css'
-src_css = join(src, css)
+src_css = join(conf_src, css)
 dst_css = join(dst, 'html', '_static', 'css', css)
 if not os.path.exists(dst_css) or not filecmp.cmp(src_css, dst_css):
     os.makedirs(os.path.dirname(dst_css), exist_ok=True)
diff --git a/buildtools/get-dts-deps.py b/buildtools/get-dts-deps.py
new file mode 100755
index 0000000000..309b83cb5c
--- /dev/null
+++ b/buildtools/get-dts-deps.py
@@ -0,0 +1,78 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2024 PANTHEON.tech s.r.o.
+#
+
+"""Utilities for DTS dependencies.
+
+The module can be used as an executable script,
+which verifies that the running Python version meets the version requirement of DTS.
+The script returns the standard exit codes in this mode (0 is success, 1 is failure).
+
+The module also contains a function, get_missing_imports,
+which looks for runtime and doc generation dependencies in the DTS pyproject.toml file
+and returns a list of module names used in import statements that are missing.
+"""
+
+import configparser
+import importlib.metadata
+import importlib.util
+import os.path
+import platform
+
+_VERSION_COMPARISON_CHARS = '^<>='
+_EXTRA_DEPS = {'invoke': '>=1.3', 'paramiko': '>=2.4'}
+_DPDK_ROOT = os.path.dirname(os.path.dirname(__file__))
+_DTS_DEP_FILE_PATH = os.path.join(_DPDK_ROOT, 'dts', 'pyproject.toml')
+
+
+def _get_version_tuple(version_str):
+    return tuple(map(int, version_str.split(".")))
+
+
+def _get_dependencies(cfg_file_path):
+    cfg = configparser.ConfigParser()
+    with open(cfg_file_path) as f:
+        dts_deps_file_str = f.read()
+        dts_deps_file_str = dts_deps_file_str.replace("\n]", "]")
+        cfg.read_string(dts_deps_file_str)
+
+    deps_section = cfg['tool.poetry.dependencies']
+    deps = {dep: deps_section[dep].strip('"\'') for dep in deps_section}
+    doc_deps_section = cfg['tool.poetry.group.docs.dependencies']
+    doc_deps = {dep: doc_deps_section[dep].strip("\"'") for dep in doc_deps_section}
+
+    return deps | doc_deps
+
+
+def get_missing_imports():
+    missing_imports = []
+    req_deps = _get_dependencies(_DTS_DEP_FILE_PATH)
+    req_deps.pop('python')
+
+    for req_dep, req_ver in (req_deps | _EXTRA_DEPS).items():
+        try:
+            req_ver = _get_version_tuple(req_ver.strip(_VERSION_COMPARISON_CHARS))
+            found_dep_ver = _get_version_tuple(importlib.metadata.version(req_dep))
+            if found_dep_ver < req_ver:
+                print(
+                    f'The version "{found_dep_ver}" of package "{req_dep}" '
+                    f'is lower than required "{req_ver}".'
+                )
+        except importlib.metadata.PackageNotFoundError:
+            print(f'Package "{req_dep}" not found.')
+            missing_imports.append(req_dep.lower().replace('-', '_'))
+
+    return missing_imports
+
+
+if __name__ == '__main__':
+    python_version = _get_dependencies(_DTS_DEP_FILE_PATH).pop('python')
+    if python_version:
+        sys_ver = _get_version_tuple(platform.python_version())
+        req_ver = _get_version_tuple(python_version.strip(_VERSION_COMPARISON_CHARS))
+        if sys_ver < req_ver:
+            print(
+                f'The available Python version "{sys_ver}" is lower than required "{req_ver}".'
+            )
+            exit(1)
diff --git a/buildtools/meson.build b/buildtools/meson.build
index 3adf34e1a8..599653bea4 100644
--- a/buildtools/meson.build
+++ b/buildtools/meson.build
@@ -24,6 +24,7 @@ get_numa_count_cmd = py3 + files('get-numa-count.py')
 get_test_suites_cmd = py3 + files('get-test-suites.py')
 has_hugepages_cmd = py3 + files('has-hugepages.py')
 cmdline_gen_cmd = py3 + files('dpdk-cmdline-gen.py')
+get_dts_deps = py3 + files('get-dts-deps.py')
 
 # install any build tools that end-users might want also
 install_data([
diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
index f9f0300126..ab223bcdf7 100644
--- a/doc/api/doxy-api-index.md
+++ b/doc/api/doxy-api-index.md
@@ -245,3 +245,6 @@ The public API headers are grouped by topics:
   [experimental APIs](@ref rte_compat.h),
   [ABI versioning](@ref rte_function_versioning.h),
   [version](@ref rte_version.h)
+
+- **tests**:
+  [**DTS**](@dts_api_main_page)
diff --git a/doc/api/doxy-api.conf.in b/doc/api/doxy-api.conf.in
index a8823c046f..c94f02d411 100644
--- a/doc/api/doxy-api.conf.in
+++ b/doc/api/doxy-api.conf.in
@@ -124,6 +124,8 @@ SEARCHENGINE            = YES
 SORT_MEMBER_DOCS        = NO
 SOURCE_BROWSER          = YES
 
+ALIASES                 = "dts_api_main_page=@DTS_API_MAIN_PAGE@"
+
 EXAMPLE_PATH            = @TOPDIR@/examples
 EXAMPLE_PATTERNS        = *.c
 EXAMPLE_RECURSIVE       = YES
diff --git a/doc/api/meson.build b/doc/api/meson.build
index b828b1ed66..b893931b92 100644
--- a/doc/api/meson.build
+++ b/doc/api/meson.build
@@ -41,6 +41,7 @@ cdata.set('WARN_AS_ERROR', 'NO')
 if get_option('werror')
     cdata.set('WARN_AS_ERROR', 'YES')
 endif
+cdata.set('DTS_API_MAIN_PAGE', join_paths('..', 'dts', 'html', 'index.html'))
 
 # configure HTML Doxygen run
 html_cdata = configuration_data()
diff --git a/doc/guides/conf.py b/doc/guides/conf.py
index 0f7ff5282d..eab3387874 100644
--- a/doc/guides/conf.py
+++ b/doc/guides/conf.py
@@ -10,7 +10,7 @@
 from os.path import basename
 from os.path import dirname
 from os.path import join as path_join
-from sys import argv, stderr
+from sys import argv, stderr, path
 
 import configparser
 
@@ -24,6 +24,45 @@
           file=stderr)
     pass
 
+# Napoleon enables the Google format of Python docstrings, used in DTS.
+# Intersphinx allows linking to external projects, such as Python docs, also used in DTS.
+extensions = ['sphinx.ext.napoleon', 'sphinx.ext.intersphinx']
+
+# DTS Python docstring options.
+autodoc_default_options = {
+    'members': True,
+    'member-order': 'bysource',
+    'show-inheritance': True,
+}
+autodoc_class_signature = 'separated'
+autodoc_typehints = 'both'
+autodoc_typehints_format = 'short'
+autodoc_typehints_description_target = 'documented'
+napoleon_numpy_docstring = False
+napoleon_attr_annotations = True
+napoleon_preprocess_types = True
+add_module_names = False
+toc_object_entries = True
+toc_object_entries_show_parents = 'hide'
+intersphinx_mapping = {'python': ('https://docs.python.org/3', None)}
+
+if environ.get('DTS_BUILD'):
+    # Add path to DTS sources so that Sphinx can find them.
+    dpdk_root = dirname(dirname(dirname(__file__)))
+    path.append(path_join(dpdk_root, 'dts'))
+
+    # Get missing DTS dependencies. Add path to buildtools to find the get_missing_imports function.
+    path.append(path_join(dpdk_root, 'buildtools'))
+    import importlib
+    # Ignore missing imports from DTS dependencies.
+    autodoc_mock_imports = importlib.import_module('get-dts-deps').get_missing_imports()
+
+    # DTS Sidebar config.
+    html_theme_options = {
+        'collapse_navigation': False,
+        'navigation_depth': -1,  # unlimited depth
+    }
+
 stop_on_error = ('-W' in argv)
 
 project = 'Data Plane Development Kit'
diff --git a/doc/guides/contributing/documentation.rst b/doc/guides/contributing/documentation.rst
index 68454ae0d5..7b287ce631 100644
--- a/doc/guides/contributing/documentation.rst
+++ b/doc/guides/contributing/documentation.rst
@@ -133,6 +133,8 @@ added to by the developer.
 Building the Documentation
 --------------------------
 
+.. _doc_dependencies:
+
 Dependencies
 ~~~~~~~~~~~~
 
diff --git a/doc/guides/contributing/patches.rst b/doc/guides/contributing/patches.rst
index 04c66bebc4..6629928bee 100644
--- a/doc/guides/contributing/patches.rst
+++ b/doc/guides/contributing/patches.rst
@@ -499,6 +499,10 @@ The script usage is::
 For both of the above scripts, the -n option is used to specify a number of commits from HEAD,
 and the -r option allows the user specify a ``git log`` range.
 
+Additionally, when contributing to the DTS tool, patches should also be checked using
+the ``dts-check-format.sh`` script in the ``devtools`` directory of the DPDK repo.
+To run the script, extra :ref:`Python dependencies <dts_deps>` are needed.
+
 .. _contrib_check_compilation:
 
 Checking Compilation
diff --git a/doc/guides/meson.build b/doc/guides/meson.build
index f8bbfba9f5..b34b7b8eb0 100644
--- a/doc/guides/meson.build
+++ b/doc/guides/meson.build
@@ -1,6 +1,7 @@
 # SPDX-License-Identifier: BSD-3-Clause
 # Copyright(c) 2018 Intel Corporation
 
+doc_guides_source_dir = meson.current_source_dir()
 sphinx = find_program('sphinx-build', required: get_option('enable_docs'))
 
 if not sphinx.found()
diff --git a/doc/guides/tools/dts.rst b/doc/guides/tools/dts.rst
index 515b15e4d8..bd715f8072 100644
--- a/doc/guides/tools/dts.rst
+++ b/doc/guides/tools/dts.rst
@@ -54,6 +54,7 @@ DTS uses Poetry as its Python dependency management.
 Python build/development and runtime environments are the same and DTS development environment,
 DTS runtime environment or just plain DTS environment are used interchangeably.
 
+.. _dts_deps:
 
 Setting up DTS environment
 ~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -291,8 +292,15 @@ When adding code to the DTS framework, pay attention to the rest of the code
 and try not to divert much from it.
 The :ref:`DTS developer tools <dts_dev_tools>` will issue warnings
 when some of the basics are not met.
+You should also build the :ref:`API documentation <building_api_docs>`
+to address any issues found during the build.
 
-The code must be properly documented with docstrings.
+The API documentation, which is a helpful reference when developing, may be accessed
+in the code directly or generated with the :ref:`API docs build steps <building_api_docs>`.
+When adding new files or modifying the directory structure,
+the corresponding changes must be made to the DTS API doc sources in ``dts/doc``.
+
+Speaking of which, the code must be properly documented with docstrings.
 The style must conform to the `Google style
 <https://google.github.io/styleguide/pyguide.html#38-comments-and-docstrings>`_.
 See an example of the style `here
@@ -427,6 +435,35 @@ the DTS code check and format script.
 Refer to the script for usage: ``devtools/dts-check-format.sh -h``.
 
 
+.. _building_api_docs:
+
+Building DTS API docs
+---------------------
+
+The documentation is built using the standard DPDK build system.
+See :doc:`../linux_gsg/build_dpdk` for more details on compiling DPDK with meson.
+
+The :ref:`doc build dependencies <doc_dependencies>` may be installed with Poetry:
+
+.. code-block:: console
+
+   poetry install --no-root --only docs
+   poetry install --no-root --with docs  # an alternative that will also install DTS dependencies
+   poetry shell
+
+After executing the meson command, build the documentation with:
+
+.. code-block:: console
+
+   ninja -C build dts-doc
+
+The output is generated in ``build/doc/api/dts/html``.
+
+.. note::
+
+   Make sure to fix any Sphinx warnings when adding or updating docstrings.
+
+
 Configuration Schema
 --------------------
 
diff --git a/dts/doc/meson.build b/dts/doc/meson.build
new file mode 100644
index 0000000000..c2df99bbc6
--- /dev/null
+++ b/dts/doc/meson.build
@@ -0,0 +1,30 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2023 PANTHEON.tech s.r.o.
+
+sphinx = find_program('sphinx-build', required: get_option('enable_docs'))
+if not sphinx.found()
+    subdir_done()
+endif
+
+python_ver_satisfied = run_command(get_dts_deps, check: false).returncode()
+if python_ver_satisfied != 0
+    subdir_done()
+endif
+
+dts_doc_api_build_dir = join_paths(doc_api_build_dir, 'dts')
+
+extra_sphinx_args = ['-E', '-c', doc_guides_source_dir]
+if get_option('werror')
+    extra_sphinx_args += '-W'
+endif
+
+htmldir = join_paths(get_option('datadir'), 'doc', 'dpdk', 'dts')
+dts_api_html = custom_target('dts_api_html',
+        output: 'html',
+        command: [sphinx_wrapper, sphinx, meson.project_version(),
+            meson.current_source_dir(), dts_doc_api_build_dir, extra_sphinx_args],
+        build_by_default: get_option('enable_docs'),
+        install: get_option('enable_docs'),
+        install_dir: htmldir)
+doc_targets += dts_api_html
+doc_target_names += 'DTS_API_HTML'
diff --git a/dts/meson.build b/dts/meson.build
new file mode 100644
index 0000000000..6ed3c93fe1
--- /dev/null
+++ b/dts/meson.build
@@ -0,0 +1,15 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2023 PANTHEON.tech s.r.o.
+
+doc_targets = []
+doc_target_names = []
+
+subdir('doc')
+
+if doc_targets.length() == 0
+    message = 'No docs targets found'
+else
+    message = 'Built docs:'
+endif
+run_target('dts-doc', command: [echo, message, doc_target_names],
+    depends: doc_targets)
diff --git a/meson.build b/meson.build
index 8b248d4505..835973a0ce 100644
--- a/meson.build
+++ b/meson.build
@@ -87,6 +87,7 @@ subdir('app')
 
 # build docs
 subdir('doc')
+subdir('dts')
 
 # build any examples explicitly requested - useful for developers - and
 # install any example code into the appropriate install path
-- 
2.34.1



* [PATCH v14 6/6] dts: add API doc generation
  @ 2024-08-06 11:17  2%   ` Juraj Linkeš
  0 siblings, 0 replies; 200+ results
From: Juraj Linkeš @ 2024-08-06 11:17 UTC (permalink / raw)
  To: thomas, Honnappa.Nagarahalli, bruce.richardson, jspewock, probb,
	paul.szczepanek, Luca.Vizzarro, npratte
  Cc: dev, Juraj Linkeš

DTS API docs are generated with Sphinx, which is already in use in DPDK.
The same configuration is used to preserve the style, with one
DTS-specific setting (leaving the DPDK docs unchanged) that modifies how
the sidebar displays the content.

Sphinx generates the documentation from Python docstrings. The docstring
format is the Google format [0] which requires the sphinx.ext.napoleon
extension. The other extension, sphinx.ext.intersphinx, enables linking
to objects in external documentation, such as the Python documentation.

There is one requirement for building DTS docs - the same Python version
as DTS or higher, because Sphinx's autodoc extension imports the code.

The dependencies needed to import the code don't have to be satisfied,
as the autodoc extension allows us to mock the imports. The missing
packages are taken from the DTS pyproject.toml file.

The generated DTS API docs are linked with the DPDK API docs according
to their placement after installing them with 'meson install'. However,
the build path differs from the install path, requiring a symlink from
DPDK API doc build path to DTS API build path to produce the proper link
in the build directory.

[0] https://google.github.io/styleguide/pyguide.html#38-comments-and-docstrings

Signed-off-by: Juraj Linkeš <juraj.linkes@pantheon.tech>
---
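Illustrative note, not part of the patch: the Python version requirement
mentioned in the commit message amounts to a tuple comparison, sketched
below. The names version_tuple and python_is_new_enough are hypothetical,
and the sketch assumes a single-bound requirement string such as "^3.10";
compound constraints like ">=3.10,<4" are not handled. It reads the running
version from sys.version_info rather than parsing a version string, as an
assumption-free way to avoid pre-release suffixes.

```python
import sys

# Characters that may prefix a version constraint (assumed set).
_COMPARISON_CHARS = '^<>=~'


def version_tuple(version_str):
    """Convert a plain dotted version such as '3.10' into (3, 10)."""
    return tuple(int(part) for part in version_str.split('.'))


def python_is_new_enough(requirement, current=None):
    """Check a single-bound requirement such as '^3.10' against a version."""
    required = version_tuple(requirement.strip(_COMPARISON_CHARS))
    if current is None:
        # Use the interpreter that is running this code.
        current = tuple(sys.version_info[:3])
    return current >= required


if __name__ == '__main__':
    # Exit code 0 when the requirement is met, 1 otherwise.
    sys.exit(0 if python_is_new_enough('^3.10') else 1)
```

Comparing integer tuples avoids the string-comparison pitfall where "3.9"
would sort after "3.10".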
 buildtools/call-sphinx-build.py           | 10 ++-
 buildtools/get-dts-deps.py                | 78 +++++++++++++++++++++++
 buildtools/meson.build                    |  1 +
 doc/api/doxy-api-index.md                 |  3 +
 doc/api/doxy-api.conf.in                  |  2 +
 doc/api/meson.build                       |  1 +
 doc/guides/conf.py                        | 41 +++++++++++-
 doc/guides/contributing/documentation.rst |  2 +
 doc/guides/contributing/patches.rst       |  4 ++
 doc/guides/meson.build                    |  1 +
 doc/guides/tools/dts.rst                  | 39 +++++++++++-
 dts/doc/meson.build                       | 43 +++++++++++++
 dts/meson.build                           | 15 +++++
 meson.build                               |  1 +
 14 files changed, 238 insertions(+), 3 deletions(-)
 create mode 100755 buildtools/get-dts-deps.py
 create mode 100644 dts/doc/meson.build
 create mode 100644 dts/meson.build

diff --git a/buildtools/call-sphinx-build.py b/buildtools/call-sphinx-build.py
index 623e7363ee..5dd59907cd 100755
--- a/buildtools/call-sphinx-build.py
+++ b/buildtools/call-sphinx-build.py
@@ -15,6 +15,11 @@
 
 # set the version in environment for sphinx to pick up
 os.environ['DPDK_VERSION'] = version
+conf_src = src
+if src.find('dts') != -1:
+    if '-c' in extra_args:
+        conf_src = extra_args[extra_args.index('-c') + 1]
+    os.environ['DTS_BUILD'] = "y"
 
 sphinx_cmd = [sphinx] + extra_args
 
@@ -23,6 +28,9 @@
 for root, dirs, files in os.walk(src):
     srcfiles.extend([join(root, f) for f in files])
 
+if not os.path.exists(dst):
+    os.makedirs(dst)
+
 # run sphinx, putting the html output in a "html" directory
 with open(join(dst, 'sphinx_html.out'), 'w') as out:
     process = run(sphinx_cmd + ['-b', 'html', src, join(dst, 'html')],
@@ -34,7 +42,7 @@
 
 # copy custom CSS file
 css = 'custom.css'
-src_css = join(src, css)
+src_css = join(conf_src, css)
 dst_css = join(dst, 'html', '_static', 'css', css)
 if not os.path.exists(dst_css) or not filecmp.cmp(src_css, dst_css):
     os.makedirs(os.path.dirname(dst_css), exist_ok=True)
diff --git a/buildtools/get-dts-deps.py b/buildtools/get-dts-deps.py
new file mode 100755
index 0000000000..309b83cb5c
--- /dev/null
+++ b/buildtools/get-dts-deps.py
@@ -0,0 +1,78 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2024 PANTHEON.tech s.r.o.
+#
+
+"""Utilities for DTS dependencies.
+
+The module can be used as an executable script,
+which verifies that the running Python version meets the version requirement of DTS.
+The script returns the standard exit codes in this mode (0 is success, 1 is failure).
+
+The module also contains a function, get_missing_imports,
+which looks for runtime and doc generation dependencies in the DTS pyproject.toml file
+and returns a list of module names used in import statements that are missing.
+"""
+
+import configparser
+import importlib.metadata
+import importlib.util
+import os.path
+import platform
+
+_VERSION_COMPARISON_CHARS = '^<>='
+_EXTRA_DEPS = {'invoke': '>=1.3', 'paramiko': '>=2.4'}
+_DPDK_ROOT = os.path.dirname(os.path.dirname(__file__))
+_DTS_DEP_FILE_PATH = os.path.join(_DPDK_ROOT, 'dts', 'pyproject.toml')
+
+
+def _get_version_tuple(version_str):
+    return tuple(map(int, version_str.split(".")))
+
+
+def _get_dependencies(cfg_file_path):
+    cfg = configparser.ConfigParser()
+    with open(cfg_file_path) as f:
+        dts_deps_file_str = f.read()
+        dts_deps_file_str = dts_deps_file_str.replace("\n]", "]")
+        cfg.read_string(dts_deps_file_str)
+
+    deps_section = cfg['tool.poetry.dependencies']
+    deps = {dep: deps_section[dep].strip('"\'') for dep in deps_section}
+    doc_deps_section = cfg['tool.poetry.group.docs.dependencies']
+    doc_deps = {dep: doc_deps_section[dep].strip("\"'") for dep in doc_deps_section}
+
+    return deps | doc_deps
+
+
+def get_missing_imports():
+    missing_imports = []
+    req_deps = _get_dependencies(_DTS_DEP_FILE_PATH)
+    req_deps.pop('python')
+
+    for req_dep, req_ver in (req_deps | _EXTRA_DEPS).items():
+        try:
+            req_ver = _get_version_tuple(req_ver.strip(_VERSION_COMPARISON_CHARS))
+            found_dep_ver = _get_version_tuple(importlib.metadata.version(req_dep))
+            if found_dep_ver < req_ver:
+                print(
+                    f'The version "{found_dep_ver}" of package "{req_dep}" '
+                    f'is lower than required "{req_ver}".'
+                )
+        except importlib.metadata.PackageNotFoundError:
+            print(f'Package "{req_dep}" not found.')
+            missing_imports.append(req_dep.lower().replace('-', '_'))
+
+    return missing_imports
+
+
+if __name__ == '__main__':
+    python_version = _get_dependencies(_DTS_DEP_FILE_PATH).pop('python')
+    if python_version:
+        sys_ver = _get_version_tuple(platform.python_version())
+        req_ver = _get_version_tuple(python_version.strip(_VERSION_COMPARISON_CHARS))
+        if sys_ver < req_ver:
+            print(
+                f'The available Python version "{sys_ver}" is lower than required "{req_ver}".'
+            )
+            exit(1)
diff --git a/buildtools/meson.build b/buildtools/meson.build
index 3adf34e1a8..599653bea4 100644
--- a/buildtools/meson.build
+++ b/buildtools/meson.build
@@ -24,6 +24,7 @@ get_numa_count_cmd = py3 + files('get-numa-count.py')
 get_test_suites_cmd = py3 + files('get-test-suites.py')
 has_hugepages_cmd = py3 + files('has-hugepages.py')
 cmdline_gen_cmd = py3 + files('dpdk-cmdline-gen.py')
+get_dts_deps = py3 + files('get-dts-deps.py')
 
 # install any build tools that end-users might want also
 install_data([
diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
index f9f0300126..ab223bcdf7 100644
--- a/doc/api/doxy-api-index.md
+++ b/doc/api/doxy-api-index.md
@@ -245,3 +245,6 @@ The public API headers are grouped by topics:
   [experimental APIs](@ref rte_compat.h),
   [ABI versioning](@ref rte_function_versioning.h),
   [version](@ref rte_version.h)
+
+- **tests**:
+  [**DTS**](@dts_api_main_page)
diff --git a/doc/api/doxy-api.conf.in b/doc/api/doxy-api.conf.in
index a8823c046f..c94f02d411 100644
--- a/doc/api/doxy-api.conf.in
+++ b/doc/api/doxy-api.conf.in
@@ -124,6 +124,8 @@ SEARCHENGINE            = YES
 SORT_MEMBER_DOCS        = NO
 SOURCE_BROWSER          = YES
 
+ALIASES                 = "dts_api_main_page=@DTS_API_MAIN_PAGE@"
+
 EXAMPLE_PATH            = @TOPDIR@/examples
 EXAMPLE_PATTERNS        = *.c
 EXAMPLE_RECURSIVE       = YES
diff --git a/doc/api/meson.build b/doc/api/meson.build
index b828b1ed66..b893931b92 100644
--- a/doc/api/meson.build
+++ b/doc/api/meson.build
@@ -41,6 +41,7 @@ cdata.set('WARN_AS_ERROR', 'NO')
 if get_option('werror')
     cdata.set('WARN_AS_ERROR', 'YES')
 endif
+cdata.set('DTS_API_MAIN_PAGE', join_paths('..', 'dts', 'html', 'index.html'))
 
 # configure HTML Doxygen run
 html_cdata = configuration_data()
diff --git a/doc/guides/conf.py b/doc/guides/conf.py
index 0f7ff5282d..eab3387874 100644
--- a/doc/guides/conf.py
+++ b/doc/guides/conf.py
@@ -10,7 +10,7 @@
 from os.path import basename
 from os.path import dirname
 from os.path import join as path_join
-from sys import argv, stderr
+from sys import argv, stderr, path
 
 import configparser
 
@@ -24,6 +24,45 @@
           file=stderr)
     pass
 
+# Napoleon enables the Google format of Python docstrings, used in DTS.
+# Intersphinx allows linking to external projects, such as Python docs, also used in DTS.
+extensions = ['sphinx.ext.napoleon', 'sphinx.ext.intersphinx']
+
+# DTS Python docstring options.
+autodoc_default_options = {
+    'members': True,
+    'member-order': 'bysource',
+    'show-inheritance': True,
+}
+autodoc_class_signature = 'separated'
+autodoc_typehints = 'both'
+autodoc_typehints_format = 'short'
+autodoc_typehints_description_target = 'documented'
+napoleon_numpy_docstring = False
+napoleon_attr_annotations = True
+napoleon_preprocess_types = True
+add_module_names = False
+toc_object_entries = True
+toc_object_entries_show_parents = 'hide'
+intersphinx_mapping = {'python': ('https://docs.python.org/3', None)}
+
+if environ.get('DTS_BUILD'):
+    # Add path to DTS sources so that Sphinx can find them.
+    dpdk_root = dirname(dirname(dirname(__file__)))
+    path.append(path_join(dpdk_root, 'dts'))
+
+    # Get missing DTS dependencies. Add path to buildtools to find the get_missing_imports function.
+    path.append(path_join(dpdk_root, 'buildtools'))
+    import importlib
+    # Ignore missing imports from DTS dependencies.
+    autodoc_mock_imports = importlib.import_module('get-dts-deps').get_missing_imports()
+
+    # DTS Sidebar config.
+    html_theme_options = {
+        'collapse_navigation': False,
+        'navigation_depth': -1,  # unlimited depth
+    }
+
 stop_on_error = ('-W' in argv)
 
 project = 'Data Plane Development Kit'
diff --git a/doc/guides/contributing/documentation.rst b/doc/guides/contributing/documentation.rst
index 68454ae0d5..7b287ce631 100644
--- a/doc/guides/contributing/documentation.rst
+++ b/doc/guides/contributing/documentation.rst
@@ -133,6 +133,8 @@ added to by the developer.
 Building the Documentation
 --------------------------
 
+.. _doc_dependencies:
+
 Dependencies
 ~~~~~~~~~~~~
 
diff --git a/doc/guides/contributing/patches.rst b/doc/guides/contributing/patches.rst
index 04c66bebc4..6629928bee 100644
--- a/doc/guides/contributing/patches.rst
+++ b/doc/guides/contributing/patches.rst
@@ -499,6 +499,10 @@ The script usage is::
 For both of the above scripts, the -n option is used to specify a number of commits from HEAD,
 and the -r option allows the user specify a ``git log`` range.
 
+Additionally, when contributing to the DTS tool, patches should also be checked using
+the ``dts-check-format.sh`` script in the ``devtools`` directory of the DPDK repo.
+To run the script, extra :ref:`Python dependencies <dts_deps>` are needed.
+
 .. _contrib_check_compilation:
 
 Checking Compilation
diff --git a/doc/guides/meson.build b/doc/guides/meson.build
index f8bbfba9f5..b34b7b8eb0 100644
--- a/doc/guides/meson.build
+++ b/doc/guides/meson.build
@@ -1,6 +1,7 @@
 # SPDX-License-Identifier: BSD-3-Clause
 # Copyright(c) 2018 Intel Corporation
 
+doc_guides_source_dir = meson.current_source_dir()
 sphinx = find_program('sphinx-build', required: get_option('enable_docs'))
 
 if not sphinx.found()
diff --git a/doc/guides/tools/dts.rst b/doc/guides/tools/dts.rst
index 515b15e4d8..bd715f8072 100644
--- a/doc/guides/tools/dts.rst
+++ b/doc/guides/tools/dts.rst
@@ -54,6 +54,7 @@ DTS uses Poetry as its Python dependency management.
 Python build/development and runtime environments are the same and DTS development environment,
 DTS runtime environment or just plain DTS environment are used interchangeably.
 
+.. _dts_deps:
 
 Setting up DTS environment
 ~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -291,8 +292,15 @@ When adding code to the DTS framework, pay attention to the rest of the code
 and try not to divert much from it.
 The :ref:`DTS developer tools <dts_dev_tools>` will issue warnings
 when some of the basics are not met.
+You should also build the :ref:`API documentation <building_api_docs>`
+to address any issues found during the build.
 
-The code must be properly documented with docstrings.
+The API documentation, which is a helpful reference when developing, may be accessed
+in the code directly or generated with the :ref:`API docs build steps <building_api_docs>`.
+When adding new files or modifying the directory structure,
+the corresponding changes must be made to DTS api doc sources in ``dts/doc``.
+
+Speaking of which, the code must be properly documented with docstrings.
 The style must conform to the `Google style
 <https://google.github.io/styleguide/pyguide.html#38-comments-and-docstrings>`_.
 See an example of the style `here
@@ -427,6 +435,35 @@ the DTS code check and format script.
 Refer to the script for usage: ``devtools/dts-check-format.sh -h``.
 
 
+.. _building_api_docs:
+
+Building DTS API docs
+---------------------
+
+The documentation is built using the standard DPDK build system.
+See :doc:`../linux_gsg/build_dpdk` for more details on compiling DPDK with meson.
+
+The :ref:`doc build dependencies <doc_dependencies>` may be installed with Poetry:
+
+.. code-block:: console
+
+   poetry install --no-root --only docs
+   poetry install --no-root --with docs  # an alternative that will also install DTS dependencies
+   poetry shell
+
+After executing the meson command, build the documentation with:
+
+.. code-block:: console
+
+   ninja -C build dts-doc
+
+The output is generated in ``build/doc/api/dts/html``.
+
+.. note::
+
+   Make sure to fix any Sphinx warnings when adding or updating docstrings.
+
+
 Configuration Schema
 --------------------
 
diff --git a/dts/doc/meson.build b/dts/doc/meson.build
new file mode 100644
index 0000000000..d48a7f2003
--- /dev/null
+++ b/dts/doc/meson.build
@@ -0,0 +1,43 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2023 PANTHEON.tech s.r.o.
+
+sphinx = find_program('sphinx-build', required: get_option('enable_docs'))
+if not sphinx.found()
+    subdir_done()
+endif
+
+python_ver_satisfied = run_command(get_dts_deps).returncode()
+if python_ver_satisfied != 0
+    subdir_done()
+endif
+
+dts_doc_api_build_dir = join_paths(doc_api_build_dir, 'dts')
+
+extra_sphinx_args = ['-E', '-c', doc_guides_source_dir]
+if get_option('werror')
+    extra_sphinx_args += '-W'
+endif
+
+htmldir = join_paths(get_option('datadir'), 'doc', 'dpdk', 'dts')
+dts_api_html = custom_target('dts_api_html',
+        output: 'html',
+        command: [sphinx_wrapper, sphinx, meson.project_version(),
+            meson.current_source_dir(), meson.current_build_dir(), extra_sphinx_args],
+        build_by_default: get_option('enable_docs'),
+        install: get_option('enable_docs'),
+        install_dir: htmldir)
+
+doc_targets += dts_api_html
+doc_target_names += 'DTS_API_HTML'
+
+build_html_dir = join_paths(meson.current_build_dir(), 'html')
+dts_api_symlink = custom_target('dts_api_symlink',
+        output: 'symlink',
+        depends: dts_api_html,
+        command: ['mkdir', '-p', dts_doc_api_build_dir, '&&',
+            'ln', '-sf', build_html_dir, dts_doc_api_build_dir],
+        build_by_default: get_option('enable_docs'),
+        install: false)
+
+doc_targets += dts_api_symlink
+doc_target_names += 'DTS_API_SYMLINK'
diff --git a/dts/meson.build b/dts/meson.build
new file mode 100644
index 0000000000..6ed3c93fe1
--- /dev/null
+++ b/dts/meson.build
@@ -0,0 +1,15 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2023 PANTHEON.tech s.r.o.
+
+doc_targets = []
+doc_target_names = []
+
+subdir('doc')
+
+if doc_targets.length() == 0
+    message = 'No docs targets found'
+else
+    message = 'Built docs:'
+endif
+run_target('dts-doc', command: [echo, message, doc_target_names],
+    depends: doc_targets)
diff --git a/meson.build b/meson.build
index 8b248d4505..835973a0ce 100644
--- a/meson.build
+++ b/meson.build
@@ -87,6 +87,7 @@ subdir('app')
 
 # build docs
 subdir('doc')
+subdir('dts')
 
 # build any examples explicitly requested - useful for developers - and
 # install any example code into the appropriate install path
-- 
2.34.1



* [PATCH v15 5/5] dts: add API doc generation
  @ 2024-08-06 15:19  2%   ` Juraj Linkeš
  0 siblings, 0 replies; 200+ results
From: Juraj Linkeš @ 2024-08-06 15:19 UTC (permalink / raw)
  To: thomas, Honnappa.Nagarahalli, bruce.richardson, jspewock, probb,
	paul.szczepanek, Luca.Vizzarro, npratte
  Cc: dev, Juraj Linkeš

The tool used to generate DTS API docs is Sphinx, which is already in
use in DPDK. The same configuration is used to preserve style with one
DTS-specific configuration (so that the DPDK docs are unchanged) that
modifies how the sidebar displays the content.

Sphinx generates the documentation from Python docstrings. The docstring
format is the Google format [0] which requires the sphinx.ext.napoleon
extension. The other extension, sphinx.ext.intersphinx, enables linking
to objects in external documentation, such as the Python documentation.

There is one requirement for building DTS docs - the same Python version
as DTS or higher, because Sphinx's autodoc extension imports the code.

The dependencies needed to import the code don't have to be satisfied,
as the autodoc extension allows us to mock the imports. The missing
packages are taken from the DTS pyproject.toml file.
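
As a rough illustration of the two checks described above (not the code
from this patch), the sketch below compares versions as integer tuples
and collects the import names of packages that are not installed. The
helper names and the package name 'hypothetical-missing-pkg' are made up
for the example, and version components are assumed to be purely numeric.

```python
import importlib.metadata

_VERSION_COMPARISON_CHARS = '^<>='


def version_tuple(version_str):
    """Turn '3.10.4' into (3, 10, 4); assumes purely numeric components."""
    return tuple(int(part) for part in version_str.split('.'))


def missing_imports(requirements):
    """Return import-style names of required packages that are not installed."""
    missing = []
    for package in requirements:
        try:
            importlib.metadata.version(package)
        except importlib.metadata.PackageNotFoundError:
            # Distribution names may use '-', module names use '_'.
            missing.append(package.lower().replace('-', '_'))
    return missing


# Strip the constraint operator, then compare as tuples rather than strings.
required = '^3.10'.strip(_VERSION_COMPARISON_CHARS)
print(version_tuple('3.9.18') < version_tuple(required))

# A package that is assumed not to exist on any system.
print(missing_imports({'hypothetical-missing-pkg': '>=1.0'}))
```

Comparing tuples instead of strings matters here: as strings, '3.9.18'
sorts after '3.10', which would wrongly accept an older interpreter.
A list such as the one returned by missing_imports() is the kind of
value that can be assigned to Sphinx's autodoc_mock_imports.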

[0] https://google.github.io/styleguide/pyguide.html#38-comments-and-docstrings

Signed-off-by: Juraj Linkeš <juraj.linkes@pantheon.tech>
---
 buildtools/call-sphinx-build.py           |  2 +
 buildtools/get-dts-deps.py                | 78 +++++++++++++++++++++++
 buildtools/meson.build                    |  1 +
 doc/api/doxy-api-index.md                 |  3 +
 doc/api/doxy-api.conf.in                  |  2 +
 doc/api/dts/custom.css                    |  1 +
 doc/api/dts/meson.build                   | 29 +++++++++
 doc/api/meson.build                       | 13 ++++
 doc/guides/conf.py                        | 41 +++++++++++-
 doc/guides/contributing/documentation.rst |  2 +
 doc/guides/contributing/patches.rst       |  4 ++
 doc/guides/tools/dts.rst                  | 39 +++++++++++-
 doc/meson.build                           |  1 +
 13 files changed, 214 insertions(+), 2 deletions(-)
 create mode 100755 buildtools/get-dts-deps.py
 create mode 120000 doc/api/dts/custom.css
 create mode 100644 doc/api/dts/meson.build

diff --git a/buildtools/call-sphinx-build.py b/buildtools/call-sphinx-build.py
index 623e7363ee..45724ffcd4 100755
--- a/buildtools/call-sphinx-build.py
+++ b/buildtools/call-sphinx-build.py
@@ -15,6 +15,8 @@
 
 # set the version in environment for sphinx to pick up
 os.environ['DPDK_VERSION'] = version
+if src.find('dts') != -1:
+    os.environ['DTS_BUILD'] = "y"
 
 sphinx_cmd = [sphinx] + extra_args
 
diff --git a/buildtools/get-dts-deps.py b/buildtools/get-dts-deps.py
new file mode 100755
index 0000000000..309b83cb5c
--- /dev/null
+++ b/buildtools/get-dts-deps.py
@@ -0,0 +1,78 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2024 PANTHEON.tech s.r.o.
+#
+
+"""Utilities for DTS dependencies.
+
+The module can be used as an executable script,
+which verifies that the running Python version meets the version requirement of DTS.
+The script returns the standard exit codes in this mode (0 is success, 1 is failure).
+
+The module also contains a function, get_missing_imports,
+which looks for runtime and doc generation dependencies in the DTS pyproject.toml file
+and returns a list of the missing module names, as used in import statements.
+"""
+
+import configparser
+import importlib.metadata
+import importlib.util
+import os.path
+import platform
+
+_VERSION_COMPARISON_CHARS = '^<>='
+_EXTRA_DEPS = {'invoke': '>=1.3', 'paramiko': '>=2.4'}
+_DPDK_ROOT = os.path.dirname(os.path.dirname(__file__))
+_DTS_DEP_FILE_PATH = os.path.join(_DPDK_ROOT, 'dts', 'pyproject.toml')
+
+
+def _get_version_tuple(version_str):
+    return tuple(map(int, version_str.split(".")))
+
+
+def _get_dependencies(cfg_file_path):
+    cfg = configparser.ConfigParser()
+    with open(cfg_file_path) as f:
+        dts_deps_file_str = f.read()
+        dts_deps_file_str = dts_deps_file_str.replace("\n]", "]")
+        cfg.read_string(dts_deps_file_str)
+
+    deps_section = cfg['tool.poetry.dependencies']
+    deps = {dep: deps_section[dep].strip('"\'') for dep in deps_section}
+    doc_deps_section = cfg['tool.poetry.group.docs.dependencies']
+    doc_deps = {dep: doc_deps_section[dep].strip("\"'") for dep in doc_deps_section}
+
+    return deps | doc_deps
+
+
+def get_missing_imports():
+    missing_imports = []
+    req_deps = _get_dependencies(_DTS_DEP_FILE_PATH)
+    req_deps.pop('python')
+
+    for req_dep, req_ver in (req_deps | _EXTRA_DEPS).items():
+        try:
+            req_ver = _get_version_tuple(req_ver.strip(_VERSION_COMPARISON_CHARS))
+            found_dep_ver = _get_version_tuple(importlib.metadata.version(req_dep))
+            if found_dep_ver < req_ver:
+                print(
+                    f'The version "{found_dep_ver}" of package "{req_dep}" '
+                    f'is lower than required "{req_ver}".'
+                )
+        except importlib.metadata.PackageNotFoundError:
+            print(f'Package "{req_dep}" not found.')
+            missing_imports.append(req_dep.lower().replace('-', '_'))
+
+    return missing_imports
+
+
+if __name__ == '__main__':
+    python_version = _get_dependencies(_DTS_DEP_FILE_PATH).pop('python')
+    if python_version:
+        sys_ver = _get_version_tuple(platform.python_version())
+        req_ver = _get_version_tuple(python_version.strip(_VERSION_COMPARISON_CHARS))
+        if sys_ver < req_ver:
+            print(
+                f'The available Python version "{sys_ver}" is lower than required "{req_ver}".'
+            )
+            exit(1)
diff --git a/buildtools/meson.build b/buildtools/meson.build
index 3adf34e1a8..599653bea4 100644
--- a/buildtools/meson.build
+++ b/buildtools/meson.build
@@ -24,6 +24,7 @@ get_numa_count_cmd = py3 + files('get-numa-count.py')
 get_test_suites_cmd = py3 + files('get-test-suites.py')
 has_hugepages_cmd = py3 + files('has-hugepages.py')
 cmdline_gen_cmd = py3 + files('dpdk-cmdline-gen.py')
+get_dts_deps = py3 + files('get-dts-deps.py')
 
 # install any build tools that end-users might want also
 install_data([
diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
index f9f0300126..ab223bcdf7 100644
--- a/doc/api/doxy-api-index.md
+++ b/doc/api/doxy-api-index.md
@@ -245,3 +245,6 @@ The public API headers are grouped by topics:
   [experimental APIs](@ref rte_compat.h),
   [ABI versioning](@ref rte_function_versioning.h),
   [version](@ref rte_version.h)
+
+- **tests**:
+  [**DTS**](@dts_api_main_page)
diff --git a/doc/api/doxy-api.conf.in b/doc/api/doxy-api.conf.in
index a8823c046f..c94f02d411 100644
--- a/doc/api/doxy-api.conf.in
+++ b/doc/api/doxy-api.conf.in
@@ -124,6 +124,8 @@ SEARCHENGINE            = YES
 SORT_MEMBER_DOCS        = NO
 SOURCE_BROWSER          = YES
 
+ALIASES                 = "dts_api_main_page=@DTS_API_MAIN_PAGE@"
+
 EXAMPLE_PATH            = @TOPDIR@/examples
 EXAMPLE_PATTERNS        = *.c
 EXAMPLE_RECURSIVE       = YES
diff --git a/doc/api/dts/custom.css b/doc/api/dts/custom.css
new file mode 120000
index 0000000000..3c9480c4a0
--- /dev/null
+++ b/doc/api/dts/custom.css
@@ -0,0 +1 @@
+../../guides/custom.css
\ No newline at end of file
diff --git a/doc/api/dts/meson.build b/doc/api/dts/meson.build
new file mode 100644
index 0000000000..329b60cb1f
--- /dev/null
+++ b/doc/api/dts/meson.build
@@ -0,0 +1,29 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2023 PANTHEON.tech s.r.o.
+
+sphinx = find_program('sphinx-build', required: get_option('enable_docs'))
+if not sphinx.found()
+    subdir_done()
+endif
+
+python_ver_satisfied = run_command(get_dts_deps).returncode()
+if python_ver_satisfied != 0
+    subdir_done()
+endif
+
+extra_sphinx_args = ['-E', '-c', join_paths(doc_source_dir, 'guides')]
+if get_option('werror')
+    extra_sphinx_args += '-W'
+endif
+
+htmldir = join_paths(get_option('datadir'), 'doc', 'dpdk', 'dts')
+dts_api_html = custom_target('dts_api_html',
+        output: 'html',
+        command: [sphinx_wrapper, sphinx, meson.project_version(),
+            meson.current_source_dir(), meson.current_build_dir(), extra_sphinx_args],
+        build_by_default: get_option('enable_docs'),
+        install: get_option('enable_docs'),
+        install_dir: htmldir)
+
+dts_doc_targets += dts_api_html
+dts_doc_target_names += 'DTS_API_HTML'
diff --git a/doc/api/meson.build b/doc/api/meson.build
index 5b50692df9..788129336b 100644
--- a/doc/api/meson.build
+++ b/doc/api/meson.build
@@ -1,6 +1,18 @@
 # SPDX-License-Identifier: BSD-3-Clause
 # Copyright(c) 2018 Luca Boccassi <bluca@debian.org>
 
+dts_doc_targets = []
+dts_doc_target_names = []
+subdir('dts')
+
+if dts_doc_targets.length() == 0
+    dts_message = 'No DTS docs targets found'
+else
+    dts_message = 'Building DTS docs:'
+endif
+run_target('dts-doc', command: [echo, dts_message, dts_doc_target_names],
+    depends: dts_doc_targets)
+
 doxygen = find_program('doxygen', required: get_option('enable_docs'))
 
 if not doxygen.found()
@@ -40,6 +52,7 @@ cdata.set('WARN_AS_ERROR', 'NO')
 if get_option('werror')
     cdata.set('WARN_AS_ERROR', 'YES')
 endif
+cdata.set('DTS_API_MAIN_PAGE', join_paths('..', 'dts', 'html', 'index.html'))
 
 # configure HTML Doxygen run
 html_cdata = configuration_data()
diff --git a/doc/guides/conf.py b/doc/guides/conf.py
index 0f7ff5282d..eab3387874 100644
--- a/doc/guides/conf.py
+++ b/doc/guides/conf.py
@@ -10,7 +10,7 @@
 from os.path import basename
 from os.path import dirname
 from os.path import join as path_join
-from sys import argv, stderr
+from sys import argv, stderr, path
 
 import configparser
 
@@ -24,6 +24,45 @@
           file=stderr)
     pass
 
+# Napoleon enables the Google format of Python docstrings, used in DTS.
+# Intersphinx allows linking to external projects, such as Python docs, also used in DTS.
+extensions = ['sphinx.ext.napoleon', 'sphinx.ext.intersphinx']
+
+# DTS Python docstring options.
+autodoc_default_options = {
+    'members': True,
+    'member-order': 'bysource',
+    'show-inheritance': True,
+}
+autodoc_class_signature = 'separated'
+autodoc_typehints = 'both'
+autodoc_typehints_format = 'short'
+autodoc_typehints_description_target = 'documented'
+napoleon_numpy_docstring = False
+napoleon_attr_annotations = True
+napoleon_preprocess_types = True
+add_module_names = False
+toc_object_entries = True
+toc_object_entries_show_parents = 'hide'
+intersphinx_mapping = {'python': ('https://docs.python.org/3', None)}
+
+if environ.get('DTS_BUILD'):
+    # Add path to DTS sources so that Sphinx can find them.
+    dpdk_root = dirname(dirname(dirname(__file__)))
+    path.append(path_join(dpdk_root, 'dts'))
+
+    # Get missing DTS dependencies. Add path to buildtools to find the get_missing_imports function.
+    path.append(path_join(dpdk_root, 'buildtools'))
+    import importlib
+    # Ignore missing imports from DTS dependencies.
+    autodoc_mock_imports = importlib.import_module('get-dts-deps').get_missing_imports()
+
+    # DTS Sidebar config.
+    html_theme_options = {
+        'collapse_navigation': False,
+        'navigation_depth': -1,  # unlimited depth
+    }
+
 stop_on_error = ('-W' in argv)
 
 project = 'Data Plane Development Kit'
diff --git a/doc/guides/contributing/documentation.rst b/doc/guides/contributing/documentation.rst
index 68454ae0d5..7b287ce631 100644
--- a/doc/guides/contributing/documentation.rst
+++ b/doc/guides/contributing/documentation.rst
@@ -133,6 +133,8 @@ added to by the developer.
 Building the Documentation
 --------------------------
 
+.. _doc_dependencies:
+
 Dependencies
 ~~~~~~~~~~~~
 
diff --git a/doc/guides/contributing/patches.rst b/doc/guides/contributing/patches.rst
index 04c66bebc4..6629928bee 100644
--- a/doc/guides/contributing/patches.rst
+++ b/doc/guides/contributing/patches.rst
@@ -499,6 +499,10 @@ The script usage is::
 For both of the above scripts, the -n option is used to specify a number of commits from HEAD,
 and the -r option allows the user specify a ``git log`` range.
 
+Additionally, when contributing to the DTS tool, patches should also be checked using
+the ``dts-check-format.sh`` script in the ``devtools`` directory of the DPDK repo.
+To run the script, extra :ref:`Python dependencies <dts_deps>` are needed.
+
 .. _contrib_check_compilation:
 
 Checking Compilation
diff --git a/doc/guides/tools/dts.rst b/doc/guides/tools/dts.rst
index 515b15e4d8..18cc7908cf 100644
--- a/doc/guides/tools/dts.rst
+++ b/doc/guides/tools/dts.rst
@@ -54,6 +54,7 @@ DTS uses Poetry as its Python dependency management.
 Python build/development and runtime environments are the same and DTS development environment,
 DTS runtime environment or just plain DTS environment are used interchangeably.
 
+.. _dts_deps:
 
 Setting up DTS environment
 ~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -291,8 +292,15 @@ When adding code to the DTS framework, pay attention to the rest of the code
 and try not to divert much from it.
 The :ref:`DTS developer tools <dts_dev_tools>` will issue warnings
 when some of the basics are not met.
+You should also build the :ref:`API documentation <building_api_docs>`
+to address any issues found during the build.
 
-The code must be properly documented with docstrings.
+The API documentation, which is a helpful reference when developing, may be accessed
+in the code directly or generated with the :ref:`API docs build steps <building_api_docs>`.
+When adding new files or modifying the directory structure,
+the corresponding changes must be made to DTS api doc sources in ``doc/api/dts``.
+
+Speaking of which, the code must be properly documented with docstrings.
 The style must conform to the `Google style
 <https://google.github.io/styleguide/pyguide.html#38-comments-and-docstrings>`_.
 See an example of the style `here
@@ -427,6 +435,35 @@ the DTS code check and format script.
 Refer to the script for usage: ``devtools/dts-check-format.sh -h``.
 
 
+.. _building_api_docs:
+
+Building DTS API docs
+---------------------
+
+The documentation is built using the standard DPDK build system.
+See :doc:`../linux_gsg/build_dpdk` for more details on compiling DPDK with meson.
+
+The :ref:`doc build dependencies <doc_dependencies>` may be installed with Poetry:
+
+.. code-block:: console
+
+   poetry install --no-root --only docs
+   poetry install --no-root --with docs  # an alternative that will also install DTS dependencies
+   poetry shell
+
+After executing the meson command, build the documentation with:
+
+.. code-block:: console
+
+   ninja -C build dts-doc
+
+The output is generated in ``build/doc/api/dts/html``.
+
+.. note::
+
+   Make sure to fix any Sphinx warnings when adding or updating docstrings.
+
+
 Configuration Schema
 --------------------
 
diff --git a/doc/meson.build b/doc/meson.build
index 6f74706aa2..1e0cfa4127 100644
--- a/doc/meson.build
+++ b/doc/meson.build
@@ -1,6 +1,7 @@
 # SPDX-License-Identifier: BSD-3-Clause
 # Copyright(c) 2018 Luca Boccassi <bluca@debian.org>
 
+doc_source_dir = meson.current_source_dir()
 doc_targets = []
 doc_target_names = []
 subdir('api')
-- 
2.34.1



* RE: [PATCH] doc: announce cryptodev changes to offload RSA in VirtIO
  2024-07-31 14:26  0%         ` Thomas Monjalon
@ 2024-08-07 13:31  0%           ` Kusztal, ArkadiuszX
  0 siblings, 0 replies; 200+ results
From: Kusztal, ArkadiuszX @ 2024-08-07 13:31 UTC (permalink / raw)
  To: Thomas Monjalon, Gowrishankar Muthukrishnan
  Cc: dev, dev, Anoob Joseph, Richardson, Bruce, ciara.power,
	Jerin Jacob, fanzhang.oss, Ji,  Kai, jack.bond-preston, Marchand,
	David, hemant.agrawal, De Lara Guarch, Pablo, Trahe, Fiona,
	Doherty, Declan, matan, ruifeng.wang, Gujjar, Abhinandan S,
	maxime.coquelin, chenbox, sunilprakashrao.uttarwar, andrew.boyer,
	ajit.khaparde, raveendra.padasalagi, vikas.gupta, zhangfei.gao,
	g.singh, jianjay.zhou, Daly, Lee



> -----Original Message-----
> From: Thomas Monjalon <thomas@monjalon.net>
> Sent: Wednesday, July 31, 2024 4:27 PM
> To: Gowrishankar Muthukrishnan <gmuthukrishn@marvell.com>
> Cc: dev@dpdk.org; Kusztal, ArkadiuszX <arkadiuszx.kusztal@intel.com>;
> dev@dpdk.org; Anoob Joseph <anoobj@marvell.com>; Richardson, Bruce
> <bruce.richardson@intel.com>; ciara.power@intel.com; Jerin Jacob
> <jerinj@marvell.com>; fanzhang.oss@gmail.com; Ji, Kai <kai.ji@intel.com>;
> jack.bond-preston@foss.arm.com; Marchand, David
> <david.marchand@redhat.com>; hemant.agrawal@nxp.com; De Lara Guarch,
> Pablo <pablo.de.lara.guarch@intel.com>; Trahe, Fiona
> <fiona.trahe@intel.com>; Doherty, Declan <declan.doherty@intel.com>;
> matan@nvidia.com; ruifeng.wang@arm.com; Gujjar, Abhinandan S
> <abhinandan.gujjar@intel.com>; maxime.coquelin@redhat.com;
> chenbox@nvidia.com; sunilprakashrao.uttarwar@amd.com;
> andrew.boyer@amd.com; ajit.khaparde@broadcom.com;
> raveendra.padasalagi@broadcom.com; vikas.gupta@broadcom.com;
> zhangfei.gao@linaro.org; g.singh@nxp.com; jianjay.zhou@huawei.com; Daly,
> Lee <lee.daly@intel.com>
> Subject: Re: [PATCH] doc: announce cryptodev changes to offload RSA in VirtIO
> 
> I'm not sure why we don't have a consensus on an idea proposed as RFC in
> September 2023.
> 
> Because there is not enough involvement outside of the Marvell team, I will
> keep a vague announce for the first item:
> 
> cryptodev: Some changes may happen to manage RSA padding for virtio-crypto.
> 
> The second item is applied verbatim, thanks.
> 
> 
> 31/07/2024 14:51, Thomas Monjalon:
> > 30/07/2024 16:39, Gowrishankar Muthukrishnan:
> > > Hi,
> > > We need to fix padding info in DPDK as per VirtIO specification in order to
> support RSA in virtio devices. VirtIO-crypto specification and DPDK specification
> differs in the way padding is handled.
> > > With current DPDK & virtio specification, it is impossible to support RSA in
> virtio-crypto. If you think DPDK spec should not be modified, we will try to
> amend the virtIO spec to match DPDK, but since we do not know if the virtIO
> community would accept, can we merge the deprecation notice?
> >
> > There is a long list of Cc but I see no support outside of Marvell.
> >
> >
> >
> > > >>> +* cryptodev: The struct rte_crypto_rsa_padding will be moved
> > > >>> +from
> > > >>> +  rte_crypto_rsa_op_param struct to rte_crypto_rsa_xform
> > > >>> +struct,
> > > >>> +  breaking ABI. The new location is recommended to comply with
> > > >>> +  virtio-crypto specification. Applications and drivers using
> > > >>> +  this struct will be updated.
> > > >>> +
> > >
> > >
> > > >> The problem here, I see is that there is one private key but multiple
> combinations of padding.
> > > >> Therefore, for every padding variation, we need to copy the same private
> key anew, duplicating it in memory.
> > > >> The only reason for me to keep a session-like struct in asymmetric crypto
> was exactly this.
> > > >
> > > > Each padding scheme in RSA has its own pros and cons (in terms of
> implementations as well).
> > > > When we share the same private key for Sign (and its public key in
> > > > case of Encryption) between multiple crypto ops (varying by
> > > > padding schemes among cops), a vulnerable attack against one
> > > > scheme could potentially open door to used private key in the session and
> hence take advantage on other crypto operations.
> > > >
> > > > I think, this could be one reason for why VirtIO spec mandates padding info
> as session parameter.
> > > > Hence, more than duplicating in memory, private and public keys
> > > > are secured and in catastrophe, only that session could be destroyed.

Hi Gowrishankar,

Sorry for the delayed response.

I do not have any particular security issue in mind here, and if a PMD needs to copy keys internally, for alignment or padding purposes, the redundancy problem can be overcome. My concern was that the current approach is the more natural way of handling the API: we have one key and multiple padding schemes, so we reflect this logic in the API.

Both options are widely used: libcrypto, for example, sets the padding within the session, while languages such as Go and Rust take it as an argument to a method of the key struct.

If this is that problematic with VirtIO compatibility, I say this change is okay.
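
To make the trade-off concrete, here is a small Python sketch of the two API shapes. It is only a model: the class and function names are hypothetical and do not correspond to DPDK or virtio-crypto structures, and the key is a placeholder string rather than real key material.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RsaKey:
    """Placeholder for private key material (hypothetical, not a real key)."""
    label: str


@dataclass(frozen=True)
class SessionWithPadding:
    """Padding fixed at session setup, as in the virtio-crypto model."""
    key: RsaKey
    padding: str


@dataclass(frozen=True)
class SessionKeyOnly:
    """Session holds only the key; padding is chosen per operation."""
    key: RsaKey


def sessions_with_padding_in_session(key, paddings):
    # One session per padding scheme, each referring to (or copying) the key.
    return [SessionWithPadding(key, padding) for padding in paddings]


def sessions_with_padding_per_op(key, paddings):
    # A single session serves every padding scheme.
    return [SessionKeyOnly(key)]


key = RsaKey("example")
paddings = ["PKCS1-v1_5", "OAEP", "PSS"]
print(len(sessions_with_padding_in_session(key, paddings)))
print(len(sessions_with_padding_per_op(key, paddings)))
```

With padding in the session, the number of sessions grows with the number of padding schemes used with a key; with padding per operation, one session is enough. Whether each session actually duplicates the key in memory depends on the implementation.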

> > >
> > >
> > > >>> +* cryptodev: The rte_crypto_rsa_xform struct member to hold
> > > >>> +private key
> > > >>> +  in either exponent or quintuple format is changed from union
> > > >>> +to struct
> > > >>> +  data type. This change is to support ASN.1 syntax (RFC 3447 Appendix
> A.1.2).
> > > >>> +  This change will not break existing applications.
> > > > >
> > > > > This one I agree. RFC 8017 obsoletes RFC 3447.
> 
> 



* RE: [EXTERNAL] Re: [PATCH] doc: announce cryptodev change to support EDDSA
  2024-07-31 12:57  3%   ` Thomas Monjalon
@ 2024-08-07 17:21  0%     ` Gowrishankar Muthukrishnan
  0 siblings, 0 replies; 200+ results
From: Gowrishankar Muthukrishnan @ 2024-08-07 17:21 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: Anoob Joseph, zhangfei.gao, Kusztal, ArkadiuszX, dev, Richardson,
	Bruce, ciara.power, Jerin Jacob, fanzhang.oss, Ji, Kai,
	jack.bond-preston, Marchand, David, hemant.agrawal,
	De Lara Guarch, Pablo, Trahe, Fiona, Doherty, Declan, matan,
	ruifeng.wang, Gujjar, Abhinandan S, maxime.coquelin, chenbox,
	sunilprakashrao.uttarwar, andrew.boyer, ajit.khaparde,
	raveendra.padasalagi, vikas.gupta, g.singh, jianjay.zhou, Daly,
	Lee

> It means we are not able to add an algo without breaking ABI.
> Is it something we can improve?
> 

Sure Thomas, we will address it in our patch, ensuring the long term solution as well.

Regards,
Gowrishankar


* [PATCH] version: 24.11-rc0
@ 2024-08-08  8:03 12% David Marchand
  2024-08-08 12:00  0% ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: David Marchand @ 2024-08-08  8:03 UTC (permalink / raw)
  To: dev; +Cc: thomas

Start a new release cycle with empty release notes.

The ABI version becomes 25.0.
The map files are updated to the new ABI major number (25).
The ABI exceptions are dropped and CI ABI checks are disabled because
compatibility is not preserved.

Signed-off-by: David Marchand <david.marchand@redhat.com>
---
 .github/workflows/build.yml                |   4 +-
 ABI_VERSION                                |   2 +-
 VERSION                                    |   2 +-
 devtools/libabigail.abignore               |  19 ---
 doc/guides/rel_notes/index.rst             |   1 +
 doc/guides/rel_notes/release_24_11.rst     | 136 +++++++++++++++++++++
 drivers/baseband/acc/version.map           |   2 +-
 drivers/baseband/fpga_5gnr_fec/version.map |   2 +-
 drivers/baseband/fpga_lte_fec/version.map  |   2 +-
 drivers/bus/fslmc/version.map              |   2 +-
 drivers/bus/pci/version.map                |   2 +-
 drivers/bus/platform/version.map           |   2 +-
 drivers/bus/vdev/version.map               |   2 +-
 drivers/bus/vmbus/version.map              |   2 +-
 drivers/crypto/octeontx/version.map        |   2 +-
 drivers/crypto/scheduler/version.map       |   2 +-
 drivers/dma/dpaa2/version.map              |   2 +-
 drivers/event/cnxk/version.map             |   2 +-
 drivers/event/dlb2/version.map             |   2 +-
 drivers/mempool/cnxk/version.map           |   2 +-
 drivers/mempool/dpaa2/version.map          |   2 +-
 drivers/net/atlantic/version.map           |   2 +-
 drivers/net/bnxt/version.map               |   2 +-
 drivers/net/bonding/version.map            |   2 +-
 drivers/net/cnxk/version.map               |   2 +-
 drivers/net/dpaa/version.map               |   2 +-
 drivers/net/dpaa2/version.map              |   2 +-
 drivers/net/i40e/version.map               |   2 +-
 drivers/net/iavf/version.map               |   2 +-
 drivers/net/ice/version.map                |   2 +-
 drivers/net/ipn3ke/version.map             |   2 +-
 drivers/net/ixgbe/version.map              |   2 +-
 drivers/net/mlx5/version.map               |   2 +-
 drivers/net/octeontx/version.map           |   2 +-
 drivers/net/ring/version.map               |   2 +-
 drivers/net/softnic/version.map            |   2 +-
 drivers/net/vhost/version.map              |   2 +-
 drivers/raw/ifpga/version.map              |   2 +-
 drivers/version.map                        |   2 +-
 lib/acl/version.map                        |   2 +-
 lib/bbdev/version.map                      |   2 +-
 lib/bitratestats/version.map               |   2 +-
 lib/bpf/version.map                        |   2 +-
 lib/cfgfile/version.map                    |   2 +-
 lib/cmdline/version.map                    |   2 +-
 lib/compressdev/version.map                |   2 +-
 lib/cryptodev/version.map                  |   2 +-
 lib/distributor/version.map                |   2 +-
 lib/dmadev/version.map                     |   2 +-
 lib/eal/version.map                        |   2 +-
 lib/efd/version.map                        |   2 +-
 lib/ethdev/version.map                     |   2 +-
 lib/eventdev/version.map                   |   2 +-
 lib/fib/version.map                        |   2 +-
 lib/graph/version.map                      |   2 +-
 lib/gro/version.map                        |   2 +-
 lib/gso/version.map                        |   2 +-
 lib/hash/version.map                       |  14 +--
 lib/ip_frag/version.map                    |   2 +-
 lib/ipsec/version.map                      |   2 +-
 lib/jobstats/version.map                   |   2 +-
 lib/kvargs/version.map                     |   2 +-
 lib/latencystats/version.map               |   2 +-
 lib/log/version.map                        |   4 +-
 lib/lpm/version.map                        |   2 +-
 lib/mbuf/version.map                       |   2 +-
 lib/member/version.map                     |   2 +-
 lib/mempool/version.map                    |   2 +-
 lib/meter/version.map                      |   2 +-
 lib/metrics/version.map                    |   2 +-
 lib/net/version.map                        |   2 +-
 lib/node/version.map                       |   2 +-
 lib/pcapng/version.map                     |   2 +-
 lib/pci/version.map                        |   2 +-
 lib/pdump/version.map                      |   2 +-
 lib/pipeline/version.map                   |   2 +-
 lib/port/version.map                       |   2 +-
 lib/power/version.map                      |   2 +-
 lib/rawdev/version.map                     |   2 +-
 lib/rcu/version.map                        |   2 +-
 lib/reorder/version.map                    |   2 +-
 lib/rib/version.map                        |   2 +-
 lib/ring/version.map                       |   2 +-
 lib/sched/version.map                      |   2 +-
 lib/security/version.map                   |   2 +-
 lib/stack/version.map                      |   2 +-
 lib/table/version.map                      |   2 +-
 lib/telemetry/version.map                  |   2 +-
 lib/timer/version.map                      |   2 +-
 lib/vhost/version.map                      |   2 +-
 90 files changed, 232 insertions(+), 114 deletions(-)
 create mode 100644 doc/guides/rel_notes/release_24_11.rst

diff --git a/.github/workflows/build.yml b/.github/workflows/build.yml
index dbf25626d4..f7d3affbaa 100644
--- a/.github/workflows/build.yml
+++ b/.github/workflows/build.yml
@@ -27,7 +27,7 @@ jobs:
       MINGW: ${{ matrix.config.cross == 'mingw' }}
       MINI: ${{ matrix.config.mini != '' }}
       PPC64LE: ${{ matrix.config.cross == 'ppc64le' }}
-      REF_GIT_TAG: v24.03
+      REF_GIT_TAG: none
       RISCV64: ${{ matrix.config.cross == 'riscv64' }}
       RUN_TESTS: ${{ contains(matrix.config.checks, 'tests') }}
       STDATOMIC: ${{ contains(matrix.config.checks, 'stdatomic') }}
@@ -47,7 +47,7 @@ jobs:
             checks: stdatomic
           - os: ubuntu-22.04
             compiler: gcc
-            checks: abi+debug+doc+examples+tests
+            checks: debug+doc+examples+tests
           - os: ubuntu-22.04
             compiler: clang
             checks: asan+doc+tests
diff --git a/ABI_VERSION b/ABI_VERSION
index 9dc0ade502..be8e64f5a3 100644
--- a/ABI_VERSION
+++ b/ABI_VERSION
@@ -1 +1 @@
-24.2
+25.0
diff --git a/VERSION b/VERSION
index 7af777b08b..7491a6c168 100644
--- a/VERSION
+++ b/VERSION
@@ -1 +1 @@
-24.07.0
+24.11.0-rc0
diff --git a/devtools/libabigail.abignore b/devtools/libabigail.abignore
index 96b16059a8..21b8cd6113 100644
--- a/devtools/libabigail.abignore
+++ b/devtools/libabigail.abignore
@@ -33,22 +33,3 @@
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 ; Temporary exceptions till next major ABI version ;
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
-[suppress_type]
-	name = rte_mbuf
-	type_kind = struct
-	has_size_change = no
-	has_data_member = {cacheline0, rearm_data, rx_descriptor_fields1, cacheline1}
-
-[suppress_type]
-	name = rte_pipeline_table_entry
-
-[suppress_type]
-	name = rte_rcu_qsbr
-
-[suppress_type]
-	name = rte_eth_fp_ops
-	has_data_member_inserted_between = {offset_of(reserved2), end}
-
-[suppress_type]
-	name = rte_crypto_fp_ops
-	has_data_member_inserted_between = {offset_of(reserved), end}
diff --git a/doc/guides/rel_notes/index.rst b/doc/guides/rel_notes/index.rst
index 77a92b308f..74ddae3e81 100644
--- a/doc/guides/rel_notes/index.rst
+++ b/doc/guides/rel_notes/index.rst
@@ -8,6 +8,7 @@ Release Notes
     :maxdepth: 1
     :numbered:
 
+    release_24_11
     release_24_07
     release_24_03
     release_23_11
diff --git a/doc/guides/rel_notes/release_24_11.rst b/doc/guides/rel_notes/release_24_11.rst
new file mode 100644
index 0000000000..0ff70d9057
--- /dev/null
+++ b/doc/guides/rel_notes/release_24_11.rst
@@ -0,0 +1,136 @@
+.. SPDX-License-Identifier: BSD-3-Clause
+   Copyright 2024 The DPDK contributors
+
+.. include:: <isonum.txt>
+
+DPDK Release 24.11
+==================
+
+.. **Read this first.**
+
+   The text in the sections below explains how to update the release notes.
+
+   Use proper spelling, capitalization and punctuation in all sections.
+
+   Variable and config names should be quoted as fixed width text:
+   ``LIKE_THIS``.
+
+   Build the docs and view the output file to ensure the changes are correct::
+
+      ninja -C build doc
+      xdg-open build/doc/guides/html/rel_notes/release_24_11.html
+
+
+New Features
+------------
+
+.. This section should contain new features added in this release.
+   Sample format:
+
+   * **Add a title in the past tense with a full stop.**
+
+     Add a short 1-2 sentence description in the past tense.
+     The description should be enough to allow someone scanning
+     the release notes to understand the new feature.
+
+     If the feature adds a lot of sub-features you can use a bullet list
+     like this:
+
+     * Added feature foo to do something.
+     * Enhanced feature bar to do something else.
+
+     Refer to the previous release notes for examples.
+
+     Suggested order in release notes items:
+     * Core libs (EAL, mempool, ring, mbuf, buses)
+     * Device abstraction libs and PMDs (ordered alphabetically by vendor name)
+       - ethdev (lib, PMDs)
+       - cryptodev (lib, PMDs)
+       - eventdev (lib, PMDs)
+       - etc
+     * Other libs
+     * Apps, Examples, Tools (if significant)
+
+     This section is a comment. Do not overwrite or remove it.
+     Also, make sure to start the actual text at the margin.
+     =======================================================
+
+
+Removed Items
+-------------
+
+.. This section should contain removed items in this release. Sample format:
+
+   * Add a short 1-2 sentence description of the removed item
+     in the past tense.
+
+   This section is a comment. Do not overwrite or remove it.
+   Also, make sure to start the actual text at the margin.
+   =======================================================
+
+
+API Changes
+-----------
+
+.. This section should contain API changes. Sample format:
+
+   * sample: Add a short 1-2 sentence description of the API change
+     which was announced in the previous releases and made in this release.
+     Start with a scope label like "ethdev:".
+     Use fixed width quotes for ``function_names`` or ``struct_names``.
+     Use the past tense.
+
+   This section is a comment. Do not overwrite or remove it.
+   Also, make sure to start the actual text at the margin.
+   =======================================================
+
+
+ABI Changes
+-----------
+
+.. This section should contain ABI changes. Sample format:
+
+   * sample: Add a short 1-2 sentence description of the ABI change
+     which was announced in the previous releases and made in this release.
+     Start with a scope label like "ethdev:".
+     Use fixed width quotes for ``function_names`` or ``struct_names``.
+     Use the past tense.
+
+   This section is a comment. Do not overwrite or remove it.
+   Also, make sure to start the actual text at the margin.
+   =======================================================
+
+
+Known Issues
+------------
+
+.. This section should contain new known issues in this release. Sample format:
+
+   * **Add title in present tense with full stop.**
+
+     Add a short 1-2 sentence description of the known issue
+     in the present tense. Add information on any known workarounds.
+
+   This section is a comment. Do not overwrite or remove it.
+   Also, make sure to start the actual text at the margin.
+   =======================================================
+
+
+Tested Platforms
+----------------
+
+.. This section should contain a list of platforms that were tested
+   with this release.
+
+   The format is:
+
+   * <vendor> platform with <vendor> <type of devices> combinations
+
+     * List of CPU
+     * List of OS
+     * List of devices
+     * Other relevant details...
+
+   This section is a comment. Do not overwrite or remove it.
+   Also, make sure to start the actual text at the margin.
+   =======================================================
diff --git a/drivers/baseband/acc/version.map b/drivers/baseband/acc/version.map
index fa39a63f0f..3f427caf67 100644
--- a/drivers/baseband/acc/version.map
+++ b/drivers/baseband/acc/version.map
@@ -1,4 +1,4 @@
-DPDK_24 {
+DPDK_25 {
 	local: *;
 };
 
diff --git a/drivers/baseband/fpga_5gnr_fec/version.map b/drivers/baseband/fpga_5gnr_fec/version.map
index 855ce55703..fb32805028 100644
--- a/drivers/baseband/fpga_5gnr_fec/version.map
+++ b/drivers/baseband/fpga_5gnr_fec/version.map
@@ -1,4 +1,4 @@
-DPDK_24 {
+DPDK_25 {
 	local: *;
 };
 
diff --git a/drivers/baseband/fpga_lte_fec/version.map b/drivers/baseband/fpga_lte_fec/version.map
index 2c8e60375d..f6b2961ba2 100644
--- a/drivers/baseband/fpga_lte_fec/version.map
+++ b/drivers/baseband/fpga_lte_fec/version.map
@@ -1,4 +1,4 @@
-DPDK_24 {
+DPDK_25 {
 	local: *;
 };
 
diff --git a/drivers/bus/fslmc/version.map b/drivers/bus/fslmc/version.map
index f6bdf877bf..e19b8d1f6b 100644
--- a/drivers/bus/fslmc/version.map
+++ b/drivers/bus/fslmc/version.map
@@ -1,4 +1,4 @@
-DPDK_24 {
+DPDK_25 {
 	global:
 
 	rte_fslmc_vfio_mem_dmamap;
diff --git a/drivers/bus/pci/version.map b/drivers/bus/pci/version.map
index 5d9dced5b2..cd653de5ac 100644
--- a/drivers/bus/pci/version.map
+++ b/drivers/bus/pci/version.map
@@ -1,4 +1,4 @@
-DPDK_24 {
+DPDK_25 {
 	global:
 
 	rte_pci_dump;
diff --git a/drivers/bus/platform/version.map b/drivers/bus/platform/version.map
index 9e7111dd38..37c4a74f82 100644
--- a/drivers/bus/platform/version.map
+++ b/drivers/bus/platform/version.map
@@ -1,4 +1,4 @@
-DPDK_24 {
+DPDK_25 {
 	local: *;
 };
 
diff --git a/drivers/bus/vdev/version.map b/drivers/bus/vdev/version.map
index 16f187734b..51a0f1d5e1 100644
--- a/drivers/bus/vdev/version.map
+++ b/drivers/bus/vdev/version.map
@@ -1,4 +1,4 @@
-DPDK_24 {
+DPDK_25 {
 	global:
 
 	rte_vdev_add_custom_scan;
diff --git a/drivers/bus/vmbus/version.map b/drivers/bus/vmbus/version.map
index 08b008b311..365f71529f 100644
--- a/drivers/bus/vmbus/version.map
+++ b/drivers/bus/vmbus/version.map
@@ -1,4 +1,4 @@
-DPDK_24 {
+DPDK_25 {
 	global:
 
 	rte_vmbus_chan_close;
diff --git a/drivers/crypto/octeontx/version.map b/drivers/crypto/octeontx/version.map
index 54a0912e76..8803f974b2 100644
--- a/drivers/crypto/octeontx/version.map
+++ b/drivers/crypto/octeontx/version.map
@@ -1,4 +1,4 @@
-DPDK_24 {
+DPDK_25 {
 	local: *;
 };
 
diff --git a/drivers/crypto/scheduler/version.map b/drivers/crypto/scheduler/version.map
index 23380fb3c5..d7ba3874f2 100644
--- a/drivers/crypto/scheduler/version.map
+++ b/drivers/crypto/scheduler/version.map
@@ -1,4 +1,4 @@
-DPDK_24 {
+DPDK_25 {
 	global:
 
 	rte_cryptodev_scheduler_load_user_scheduler;
diff --git a/drivers/dma/dpaa2/version.map b/drivers/dma/dpaa2/version.map
index 713ed41f0c..fc16517f7a 100644
--- a/drivers/dma/dpaa2/version.map
+++ b/drivers/dma/dpaa2/version.map
@@ -1,4 +1,4 @@
-DPDK_24 {
+DPDK_25 {
 	local: *;
 };
 
diff --git a/drivers/event/cnxk/version.map b/drivers/event/cnxk/version.map
index 3dd9a8fdd1..a275ec2977 100644
--- a/drivers/event/cnxk/version.map
+++ b/drivers/event/cnxk/version.map
@@ -1,4 +1,4 @@
-DPDK_24 {
+DPDK_25 {
 	local: *;
 };
 
diff --git a/drivers/event/dlb2/version.map b/drivers/event/dlb2/version.map
index 1d0a0a75d7..c37d2302cd 100644
--- a/drivers/event/dlb2/version.map
+++ b/drivers/event/dlb2/version.map
@@ -1,4 +1,4 @@
-DPDK_24 {
+DPDK_25 {
 	local: *;
 };
 
diff --git a/drivers/mempool/cnxk/version.map b/drivers/mempool/cnxk/version.map
index 8249417527..c2905a610e 100644
--- a/drivers/mempool/cnxk/version.map
+++ b/drivers/mempool/cnxk/version.map
@@ -1,4 +1,4 @@
-DPDK_24 {
+DPDK_25 {
 	local: *;
 };
 
diff --git a/drivers/mempool/dpaa2/version.map b/drivers/mempool/dpaa2/version.map
index b2bf63eb79..c1acfc0c64 100644
--- a/drivers/mempool/dpaa2/version.map
+++ b/drivers/mempool/dpaa2/version.map
@@ -1,4 +1,4 @@
-DPDK_24 {
+DPDK_25 {
 	global:
 
 	rte_dpaa2_mbuf_from_buf_addr;
diff --git a/drivers/net/atlantic/version.map b/drivers/net/atlantic/version.map
index cbe9ee9263..5644e150ac 100644
--- a/drivers/net/atlantic/version.map
+++ b/drivers/net/atlantic/version.map
@@ -1,4 +1,4 @@
-DPDK_24 {
+DPDK_25 {
 	local: *;
 };
 
diff --git a/drivers/net/bnxt/version.map b/drivers/net/bnxt/version.map
index ff82396ca1..d29521f990 100644
--- a/drivers/net/bnxt/version.map
+++ b/drivers/net/bnxt/version.map
@@ -1,4 +1,4 @@
-DPDK_24 {
+DPDK_25 {
 	global:
 
 	rte_pmd_bnxt_get_vf_rx_status;
diff --git a/drivers/net/bonding/version.map b/drivers/net/bonding/version.map
index 09ee21c55f..a309469b1f 100644
--- a/drivers/net/bonding/version.map
+++ b/drivers/net/bonding/version.map
@@ -1,4 +1,4 @@
-DPDK_24 {
+DPDK_25 {
 	global:
 
 	rte_eth_bond_8023ad_agg_selection_get;
diff --git a/drivers/net/cnxk/version.map b/drivers/net/cnxk/version.map
index 77f574bb16..1ad0616bdf 100644
--- a/drivers/net/cnxk/version.map
+++ b/drivers/net/cnxk/version.map
@@ -1,4 +1,4 @@
-DPDK_24 {
+DPDK_25 {
 	local: *;
 };
 
diff --git a/drivers/net/dpaa/version.map b/drivers/net/dpaa/version.map
index c06f4a56de..3fdb63caf3 100644
--- a/drivers/net/dpaa/version.map
+++ b/drivers/net/dpaa/version.map
@@ -1,4 +1,4 @@
-DPDK_24 {
+DPDK_25 {
 	global:
 
 	rte_pmd_dpaa_set_tx_loopback;
diff --git a/drivers/net/dpaa2/version.map b/drivers/net/dpaa2/version.map
index 283bcb42c1..ba756d26bd 100644
--- a/drivers/net/dpaa2/version.map
+++ b/drivers/net/dpaa2/version.map
@@ -1,4 +1,4 @@
-DPDK_24 {
+DPDK_25 {
 	global:
 
 	rte_pmd_dpaa2_mux_flow_create;
diff --git a/drivers/net/i40e/version.map b/drivers/net/i40e/version.map
index 52b7a3269a..e5d20fee71 100644
--- a/drivers/net/i40e/version.map
+++ b/drivers/net/i40e/version.map
@@ -1,4 +1,4 @@
-DPDK_24 {
+DPDK_25 {
 	global:
 
 	rte_pmd_i40e_add_vf_mac_addr;
diff --git a/drivers/net/iavf/version.map b/drivers/net/iavf/version.map
index 135a4ccd3d..98de64cca2 100644
--- a/drivers/net/iavf/version.map
+++ b/drivers/net/iavf/version.map
@@ -1,4 +1,4 @@
-DPDK_24 {
+DPDK_25 {
 	local: *;
 };
 
diff --git a/drivers/net/ice/version.map b/drivers/net/ice/version.map
index 8449e98aba..0052043264 100644
--- a/drivers/net/ice/version.map
+++ b/drivers/net/ice/version.map
@@ -1,4 +1,4 @@
-DPDK_24 {
+DPDK_25 {
 	local: *;
 };
 
diff --git a/drivers/net/ipn3ke/version.map b/drivers/net/ipn3ke/version.map
index 4a8f5e499a..e10d44858f 100644
--- a/drivers/net/ipn3ke/version.map
+++ b/drivers/net/ipn3ke/version.map
@@ -1,4 +1,4 @@
-DPDK_24 {
+DPDK_25 {
 	local: *;
 };
 
diff --git a/drivers/net/ixgbe/version.map b/drivers/net/ixgbe/version.map
index 9a6ef29b1d..8c4c0ca542 100644
--- a/drivers/net/ixgbe/version.map
+++ b/drivers/net/ixgbe/version.map
@@ -1,4 +1,4 @@
-DPDK_24 {
+DPDK_25 {
 	global:
 
 	rte_pmd_ixgbe_bypass_event_show;
diff --git a/drivers/net/mlx5/version.map b/drivers/net/mlx5/version.map
index 104fa53df6..560f7ef79b 100644
--- a/drivers/net/mlx5/version.map
+++ b/drivers/net/mlx5/version.map
@@ -1,4 +1,4 @@
-DPDK_24 {
+DPDK_25 {
 	local: *;
 };
 
diff --git a/drivers/net/octeontx/version.map b/drivers/net/octeontx/version.map
index 219933550d..861dd3450e 100644
--- a/drivers/net/octeontx/version.map
+++ b/drivers/net/octeontx/version.map
@@ -1,4 +1,4 @@
-DPDK_24 {
+DPDK_25 {
 	global:
 
 	rte_octeontx_pchan_map;
diff --git a/drivers/net/ring/version.map b/drivers/net/ring/version.map
index 62d9a77f9c..3b408c8ba5 100644
--- a/drivers/net/ring/version.map
+++ b/drivers/net/ring/version.map
@@ -1,4 +1,4 @@
-DPDK_24 {
+DPDK_25 {
 	global:
 
 	rte_eth_from_ring;
diff --git a/drivers/net/softnic/version.map b/drivers/net/softnic/version.map
index f67475684c..15daeceb73 100644
--- a/drivers/net/softnic/version.map
+++ b/drivers/net/softnic/version.map
@@ -1,4 +1,4 @@
-DPDK_24 {
+DPDK_25 {
 	global:
 
 	rte_pmd_softnic_manage;
diff --git a/drivers/net/vhost/version.map b/drivers/net/vhost/version.map
index 4825afd411..63890911d8 100644
--- a/drivers/net/vhost/version.map
+++ b/drivers/net/vhost/version.map
@@ -1,4 +1,4 @@
-DPDK_24 {
+DPDK_25 {
 	global:
 
 	rte_eth_vhost_get_queue_event;
diff --git a/drivers/raw/ifpga/version.map b/drivers/raw/ifpga/version.map
index 7fc1b5e8ae..ebe50925a8 100644
--- a/drivers/raw/ifpga/version.map
+++ b/drivers/raw/ifpga/version.map
@@ -1,4 +1,4 @@
-DPDK_24 {
+DPDK_25 {
 	global:
 
 	rte_pmd_ifpga_cleanup;
diff --git a/drivers/version.map b/drivers/version.map
index 5535c79061..17cc97bda6 100644
--- a/drivers/version.map
+++ b/drivers/version.map
@@ -1,3 +1,3 @@
-DPDK_24 {
+DPDK_25 {
 	local: *;
 };
diff --git a/lib/acl/version.map b/lib/acl/version.map
index fe3127a3a9..782b1fe464 100644
--- a/lib/acl/version.map
+++ b/lib/acl/version.map
@@ -1,4 +1,4 @@
-DPDK_24 {
+DPDK_25 {
 	global:
 
 	rte_acl_add_rules;
diff --git a/lib/bbdev/version.map b/lib/bbdev/version.map
index 1840d2b2a4..e0d82ff752 100644
--- a/lib/bbdev/version.map
+++ b/lib/bbdev/version.map
@@ -1,4 +1,4 @@
-DPDK_24 {
+DPDK_25 {
 	global:
 
 	rte_bbdev_allocate;
diff --git a/lib/bitratestats/version.map b/lib/bitratestats/version.map
index 08831a62f4..edda26d552 100644
--- a/lib/bitratestats/version.map
+++ b/lib/bitratestats/version.map
@@ -1,4 +1,4 @@
-DPDK_24 {
+DPDK_25 {
 	global:
 
 	rte_stats_bitrate_calc;
diff --git a/lib/bpf/version.map b/lib/bpf/version.map
index 2e957494e9..239c62a96c 100644
--- a/lib/bpf/version.map
+++ b/lib/bpf/version.map
@@ -1,4 +1,4 @@
-DPDK_24 {
+DPDK_25 {
 	global:
 
 	rte_bpf_convert;
diff --git a/lib/cfgfile/version.map b/lib/cfgfile/version.map
index a3fe9b62f3..927b4822fe 100644
--- a/lib/cfgfile/version.map
+++ b/lib/cfgfile/version.map
@@ -1,4 +1,4 @@
-DPDK_24 {
+DPDK_25 {
 	global:
 
 	rte_cfgfile_add_entry;
diff --git a/lib/cmdline/version.map b/lib/cmdline/version.map
index c2a4f95e5b..6bcfebfcec 100644
--- a/lib/cmdline/version.map
+++ b/lib/cmdline/version.map
@@ -1,4 +1,4 @@
-DPDK_24 {
+DPDK_25 {
 	global:
 
 	cirbuf_add_buf_head;
diff --git a/lib/compressdev/version.map b/lib/compressdev/version.map
index 2461b087b5..3849ae2740 100644
--- a/lib/compressdev/version.map
+++ b/lib/compressdev/version.map
@@ -1,4 +1,4 @@
-DPDK_24 {
+DPDK_25 {
 	global:
 
 	rte_comp_get_feature_name;
diff --git a/lib/cryptodev/version.map b/lib/cryptodev/version.map
index fdac0d876e..594c501855 100644
--- a/lib/cryptodev/version.map
+++ b/lib/cryptodev/version.map
@@ -1,4 +1,4 @@
-DPDK_24 {
+DPDK_25 {
 	global:
 
 	__rte_cryptodev_trace_dequeue_burst;
diff --git a/lib/distributor/version.map b/lib/distributor/version.map
index 2670c4201c..b5ec7dfaca 100644
--- a/lib/distributor/version.map
+++ b/lib/distributor/version.map
@@ -1,4 +1,4 @@
-DPDK_24 {
+DPDK_25 {
 	global:
 
 	rte_distributor_clear_returns;
diff --git a/lib/dmadev/version.map b/lib/dmadev/version.map
index f2df3025fc..822aaa2d3b 100644
--- a/lib/dmadev/version.map
+++ b/lib/dmadev/version.map
@@ -1,4 +1,4 @@
-DPDK_24 {
+DPDK_25 {
 	global:
 
 	rte_dma_close;
diff --git a/lib/eal/version.map b/lib/eal/version.map
index 3df50c3fbb..e3ff412683 100644
--- a/lib/eal/version.map
+++ b/lib/eal/version.map
@@ -1,4 +1,4 @@
-DPDK_24 {
+DPDK_25 {
 	global:
 
 	__rte_panic;
diff --git a/lib/efd/version.map b/lib/efd/version.map
index baac60f7bc..354c7f88bd 100644
--- a/lib/efd/version.map
+++ b/lib/efd/version.map
@@ -1,4 +1,4 @@
-DPDK_24 {
+DPDK_25 {
 	global:
 
 	rte_efd_create;
diff --git a/lib/ethdev/version.map b/lib/ethdev/version.map
index 79f6f5293b..1669055ca5 100644
--- a/lib/ethdev/version.map
+++ b/lib/ethdev/version.map
@@ -1,4 +1,4 @@
-DPDK_24 {
+DPDK_25 {
 	global:
 
 	rte_eth_add_first_rx_callback;
diff --git a/lib/eventdev/version.map b/lib/eventdev/version.map
index 520b190bb8..4947bb4ec6 100644
--- a/lib/eventdev/version.map
+++ b/lib/eventdev/version.map
@@ -1,4 +1,4 @@
-DPDK_24 {
+DPDK_25 {
 	global:
 
 	__rte_eventdev_trace_crypto_adapter_enqueue;
diff --git a/lib/fib/version.map b/lib/fib/version.map
index 62dbada6bc..c6d2769611 100644
--- a/lib/fib/version.map
+++ b/lib/fib/version.map
@@ -1,4 +1,4 @@
-DPDK_24 {
+DPDK_25 {
 	global:
 
 	rte_fib6_add;
diff --git a/lib/graph/version.map b/lib/graph/version.map
index c84446cdba..2c83425ddc 100644
--- a/lib/graph/version.map
+++ b/lib/graph/version.map
@@ -1,4 +1,4 @@
-DPDK_24 {
+DPDK_25 {
 	global:
 
 	__rte_graph_mcore_dispatch_sched_node_enqueue;
diff --git a/lib/gro/version.map b/lib/gro/version.map
index 13803ec814..c21c137fcd 100644
--- a/lib/gro/version.map
+++ b/lib/gro/version.map
@@ -1,4 +1,4 @@
-DPDK_24 {
+DPDK_25 {
 	global:
 
 	rte_gro_ctx_create;
diff --git a/lib/gso/version.map b/lib/gso/version.map
index f159b3f199..815baeb3e5 100644
--- a/lib/gso/version.map
+++ b/lib/gso/version.map
@@ -1,4 +1,4 @@
-DPDK_24 {
+DPDK_25 {
 	global:
 
 	rte_gso_segment;
diff --git a/lib/hash/version.map b/lib/hash/version.map
index d348dd9196..11a5394a45 100644
--- a/lib/hash/version.map
+++ b/lib/hash/version.map
@@ -1,4 +1,4 @@
-DPDK_24 {
+DPDK_25 {
 	global:
 
 	rte_fbk_hash_create;
@@ -47,16 +47,16 @@ DPDK_24 {
 	local: *;
 };
 
-INTERNAL {
+EXPERIMENTAL {
 	global:
 
-	rte_thash_gfni_stub;
-	rte_thash_gfni_bulk_stub;
+	# added in 24.07
+	rte_hash_rcu_qsbr_dq_reclaim;
 };
 
-EXPERIMENTAL {
+INTERNAL {
 	global:
 
-	# added in 24.07
-	rte_hash_rcu_qsbr_dq_reclaim;
+	rte_thash_gfni_stub;
+	rte_thash_gfni_bulk_stub;
 };
diff --git a/lib/ip_frag/version.map b/lib/ip_frag/version.map
index 3e7e573dc4..0c001c7bd5 100644
--- a/lib/ip_frag/version.map
+++ b/lib/ip_frag/version.map
@@ -1,4 +1,4 @@
-DPDK_24 {
+DPDK_25 {
 	global:
 
 	rte_ip_frag_free_death_row;
diff --git a/lib/ipsec/version.map b/lib/ipsec/version.map
index 9d01ebeadc..308f9d2e0d 100644
--- a/lib/ipsec/version.map
+++ b/lib/ipsec/version.map
@@ -1,4 +1,4 @@
-DPDK_24 {
+DPDK_25 {
 	global:
 
 	rte_ipsec_pkt_crypto_group;
diff --git a/lib/jobstats/version.map b/lib/jobstats/version.map
index 3b8f9d6ac4..55100e0699 100644
--- a/lib/jobstats/version.map
+++ b/lib/jobstats/version.map
@@ -1,4 +1,4 @@
-DPDK_24 {
+DPDK_25 {
 	global:
 
 	rte_jobstats_abort;
diff --git a/lib/kvargs/version.map b/lib/kvargs/version.map
index cda85d171f..b50f1a97a1 100644
--- a/lib/kvargs/version.map
+++ b/lib/kvargs/version.map
@@ -1,4 +1,4 @@
-DPDK_24 {
+DPDK_25 {
 	global:
 
 	rte_kvargs_count;
diff --git a/lib/latencystats/version.map b/lib/latencystats/version.map
index 86ded322cb..e8806c0046 100644
--- a/lib/latencystats/version.map
+++ b/lib/latencystats/version.map
@@ -1,4 +1,4 @@
-DPDK_24 {
+DPDK_25 {
 	global:
 
 	rte_latencystats_get;
diff --git a/lib/log/version.map b/lib/log/version.map
index 0648f8831a..19d7f9cdb6 100644
--- a/lib/log/version.map
+++ b/lib/log/version.map
@@ -1,10 +1,10 @@
-DPDK_24 {
+DPDK_25 {
 	global:
 
 	rte_log;
+	rte_log_can_log;
 	rte_log_cur_msg_loglevel;
 	rte_log_cur_msg_logtype;
-	rte_log_can_log;
 	rte_log_dump;
 	rte_log_get_global_level;
 	rte_log_get_level;
diff --git a/lib/lpm/version.map b/lib/lpm/version.map
index b6bee8c18b..29d577c24b 100644
--- a/lib/lpm/version.map
+++ b/lib/lpm/version.map
@@ -1,4 +1,4 @@
-DPDK_24 {
+DPDK_25 {
 	global:
 
 	rte_lpm6_add;
diff --git a/lib/mbuf/version.map b/lib/mbuf/version.map
index daa65e2bbd..76f1832924 100644
--- a/lib/mbuf/version.map
+++ b/lib/mbuf/version.map
@@ -1,4 +1,4 @@
-DPDK_24 {
+DPDK_25 {
 	global:
 
 	__rte_pktmbuf_linearize;
diff --git a/lib/member/version.map b/lib/member/version.map
index 3aeba8826b..fdc7adacf9 100644
--- a/lib/member/version.map
+++ b/lib/member/version.map
@@ -1,4 +1,4 @@
-DPDK_24 {
+DPDK_25 {
 	global:
 
 	rte_member_add;
diff --git a/lib/mempool/version.map b/lib/mempool/version.map
index 5e303e5d5f..6f16d417ae 100644
--- a/lib/mempool/version.map
+++ b/lib/mempool/version.map
@@ -1,4 +1,4 @@
-DPDK_24 {
+DPDK_25 {
 	global:
 
 	rte_mempool_audit;
diff --git a/lib/meter/version.map b/lib/meter/version.map
index 9628bd8cd9..ae434f34b5 100644
--- a/lib/meter/version.map
+++ b/lib/meter/version.map
@@ -1,4 +1,4 @@
-DPDK_24 {
+DPDK_25 {
 	global:
 
 	rte_meter_srtcm_config;
diff --git a/lib/metrics/version.map b/lib/metrics/version.map
index 9766a1af5b..f9c1996a7d 100644
--- a/lib/metrics/version.map
+++ b/lib/metrics/version.map
@@ -1,4 +1,4 @@
-DPDK_24 {
+DPDK_25 {
 	global:
 
 	rte_metrics_deinit;
diff --git a/lib/net/version.map b/lib/net/version.map
index 3e293c4715..bec4ce23ea 100644
--- a/lib/net/version.map
+++ b/lib/net/version.map
@@ -1,4 +1,4 @@
-DPDK_24 {
+DPDK_25 {
 	global:
 
 	rte_eth_random_addr;
diff --git a/lib/node/version.map b/lib/node/version.map
index 6bdb944c4c..a402182fbe 100644
--- a/lib/node/version.map
+++ b/lib/node/version.map
@@ -1,4 +1,4 @@
-DPDK_24 {
+DPDK_25 {
 	global:
 
 	rte_node_eth_config;
diff --git a/lib/pcapng/version.map b/lib/pcapng/version.map
index 81c9652ad6..9f634b653e 100644
--- a/lib/pcapng/version.map
+++ b/lib/pcapng/version.map
@@ -1,4 +1,4 @@
-DPDK_24 {
+DPDK_25 {
 	global:
 
 	rte_pcapng_add_interface;
diff --git a/lib/pci/version.map b/lib/pci/version.map
index aeca8a1c9e..f0f6ffef9f 100644
--- a/lib/pci/version.map
+++ b/lib/pci/version.map
@@ -1,4 +1,4 @@
-DPDK_24 {
+DPDK_25 {
 	global:
 
 	rte_pci_addr_cmp;
diff --git a/lib/pdump/version.map b/lib/pdump/version.map
index ea5bd157cd..6eea4c1530 100644
--- a/lib/pdump/version.map
+++ b/lib/pdump/version.map
@@ -1,4 +1,4 @@
-DPDK_24 {
+DPDK_25 {
 	global:
 
 	rte_pdump_disable;
diff --git a/lib/pipeline/version.map b/lib/pipeline/version.map
index 6997b69340..b56d022664 100644
--- a/lib/pipeline/version.map
+++ b/lib/pipeline/version.map
@@ -1,4 +1,4 @@
-DPDK_24 {
+DPDK_25 {
 	global:
 
 	rte_pipeline_ah_packet_drop;
diff --git a/lib/port/version.map b/lib/port/version.map
index fefcf29063..98fe0b08ab 100644
--- a/lib/port/version.map
+++ b/lib/port/version.map
@@ -1,4 +1,4 @@
-DPDK_24 {
+DPDK_25 {
 	global:
 
 	rte_port_ethdev_reader_ops;
diff --git a/lib/power/version.map b/lib/power/version.map
index ad92a65f91..c9a226614e 100644
--- a/lib/power/version.map
+++ b/lib/power/version.map
@@ -1,4 +1,4 @@
-DPDK_24 {
+DPDK_25 {
 	global:
 
 	rte_power_check_env_supported;
diff --git a/lib/rawdev/version.map b/lib/rawdev/version.map
index 21064a889b..f95d5dabae 100644
--- a/lib/rawdev/version.map
+++ b/lib/rawdev/version.map
@@ -1,4 +1,4 @@
-DPDK_24 {
+DPDK_25 {
 	global:
 
 	rte_rawdev_close;
diff --git a/lib/rcu/version.map b/lib/rcu/version.map
index 982ffd59d9..d96c4c4109 100644
--- a/lib/rcu/version.map
+++ b/lib/rcu/version.map
@@ -1,4 +1,4 @@
-DPDK_24 {
+DPDK_25 {
 	global:
 
 	rte_rcu_log_type;
diff --git a/lib/reorder/version.map b/lib/reorder/version.map
index 5baeab56f8..18e97942e1 100644
--- a/lib/reorder/version.map
+++ b/lib/reorder/version.map
@@ -1,4 +1,4 @@
-DPDK_24 {
+DPDK_25 {
 	global:
 
 	rte_reorder_create;
diff --git a/lib/rib/version.map b/lib/rib/version.map
index 39da637f75..145d9c2602 100644
--- a/lib/rib/version.map
+++ b/lib/rib/version.map
@@ -1,4 +1,4 @@
-DPDK_24 {
+DPDK_25 {
 	global:
 
 	rte_rib6_create;
diff --git a/lib/ring/version.map b/lib/ring/version.map
index 9eb6e254c8..8da094a69a 100644
--- a/lib/ring/version.map
+++ b/lib/ring/version.map
@@ -1,4 +1,4 @@
-DPDK_24 {
+DPDK_25 {
 	global:
 
 	rte_ring_create;
diff --git a/lib/sched/version.map b/lib/sched/version.map
index be1decaeee..a6ca9ee1ad 100644
--- a/lib/sched/version.map
+++ b/lib/sched/version.map
@@ -1,4 +1,4 @@
-DPDK_24 {
+DPDK_25 {
 	global:
 
 	rte_approx;
diff --git a/lib/security/version.map b/lib/security/version.map
index 7709ef41a3..2a4795f31d 100644
--- a/lib/security/version.map
+++ b/lib/security/version.map
@@ -1,4 +1,4 @@
-DPDK_24 {
+DPDK_25 {
 	global:
 
 	__rte_security_set_pkt_metadata;
diff --git a/lib/stack/version.map b/lib/stack/version.map
index d191ef7791..53c7d3d1c5 100644
--- a/lib/stack/version.map
+++ b/lib/stack/version.map
@@ -1,4 +1,4 @@
-DPDK_24 {
+DPDK_25 {
 	global:
 
 	rte_stack_create;
diff --git a/lib/table/version.map b/lib/table/version.map
index 6c89910732..718138554e 100644
--- a/lib/table/version.map
+++ b/lib/table/version.map
@@ -1,4 +1,4 @@
-DPDK_24 {
+DPDK_25 {
 	global:
 
 	rte_table_acl_ops;
diff --git a/lib/telemetry/version.map b/lib/telemetry/version.map
index 7d12c92905..2907d28aa0 100644
--- a/lib/telemetry/version.map
+++ b/lib/telemetry/version.map
@@ -1,4 +1,4 @@
-DPDK_24 {
+DPDK_25 {
 	global:
 
 	rte_tel_data_add_array_container;
diff --git a/lib/timer/version.map b/lib/timer/version.map
index b180708e24..3f19be22d3 100644
--- a/lib/timer/version.map
+++ b/lib/timer/version.map
@@ -1,4 +1,4 @@
-DPDK_24 {
+DPDK_25 {
 	global:
 
 	rte_timer_alt_dump_stats;
diff --git a/lib/vhost/version.map b/lib/vhost/version.map
index 25b52e47d2..30bc312262 100644
--- a/lib/vhost/version.map
+++ b/lib/vhost/version.map
@@ -1,4 +1,4 @@
-DPDK_24 {
+DPDK_25 {
 	global:
 
 	rte_vdpa_find_device_by_name;
-- 
2.45.2


^ permalink raw reply	[relevance 12%]

* [PATCH v16 5/5] dts: add API doc generation
  @ 2024-08-08  8:54  2%   ` Juraj Linkeš
  0 siblings, 0 replies; 200+ results
From: Juraj Linkeš @ 2024-08-08  8:54 UTC (permalink / raw)
  To: thomas, Honnappa.Nagarahalli, bruce.richardson, jspewock, probb,
	paul.szczepanek, Luca.Vizzarro, npratte
  Cc: dev, Juraj Linkeš

The tool used to generate DTS API docs is Sphinx, which is already in
use in DPDK. The same configuration is reused to preserve the style, with
one DTS-specific setting (leaving the DPDK docs unchanged) that modifies
how the sidebar displays the content. Other Sphinx configuration related
to Python docstrings does not affect the DPDK doc build. All new
configuration is in a conditional block, applied only when the DTS API
docs are built, so that it does not interfere with the DPDK doc build.

Sphinx generates the documentation from Python docstrings. The docstring
format is the Google format [0] which requires the sphinx.ext.napoleon
extension. The other extension, sphinx.ext.intersphinx, enables linking
to objects in external documentation, such as the Python documentation.

There is one requirement for building DTS docs - the same Python version
as DTS or higher, because Sphinx's autodoc extension imports the code.

The dependencies needed to import the code don't have to be satisfied,
as the autodoc extension allows us to mock the imports. The missing
packages are taken from the DTS pyproject.toml file.
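
The mocking idea can be sketched as follows. This is an illustrative
simplification, not the code from this patch: it checks whether a module
can be found with importlib.util.find_spec, whereas the patch queries
installed package metadata. The helper names and the sample package list
are assumptions, and mapping a package name to an import name by
lowercasing and replacing dashes is only a heuristic.

```python
import importlib.util


def module_name(package):
    """Map a package name to the name typically used in an import."""
    return package.lower().replace('-', '_')


def missing_imports(packages):
    """Return the import names of the packages that cannot be found."""
    return [
        module_name(pkg)
        for pkg in packages
        if importlib.util.find_spec(module_name(pkg)) is None
    ]


# In a Sphinx conf.py, the result could feed autodoc's mock list, so that
# importing the documented code does not fail on absent dependencies.
autodoc_mock_imports = missing_imports(['json', 'no-such-package-xyz'])
```

Here 'json' is part of the standard library and is found, while the
made-up package is reported as missing and would be mocked by autodoc.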

And finally, the DTS API docs can be accessed from the DPDK API doxygen
page.

[0] https://google.github.io/styleguide/pyguide.html#38-comments-and-docstrings

Signed-off-by: Juraj Linkeš <juraj.linkes@pantheon.tech>
---
 buildtools/call-sphinx-build.py           |  2 +
 buildtools/get-dts-runtime-deps.py        | 72 +++++++++++++++++++++++
 buildtools/meson.build                    |  1 +
 doc/api/doxy-api-index.md                 |  3 +
 doc/api/doxy-api.conf.in                  |  2 +
 doc/api/dts/custom.css                    |  1 +
 doc/api/dts/meson.build                   | 29 +++++++++
 doc/api/meson.build                       | 13 ++++
 doc/guides/conf.py                        | 44 +++++++++++++-
 doc/guides/contributing/documentation.rst |  2 +
 doc/guides/contributing/patches.rst       |  4 ++
 doc/guides/tools/dts.rst                  | 39 +++++++++++-
 doc/meson.build                           |  1 +
 13 files changed, 211 insertions(+), 2 deletions(-)
 create mode 100755 buildtools/get-dts-runtime-deps.py
 create mode 120000 doc/api/dts/custom.css
 create mode 100644 doc/api/dts/meson.build

diff --git a/buildtools/call-sphinx-build.py b/buildtools/call-sphinx-build.py
index 623e7363ee..45724ffcd4 100755
--- a/buildtools/call-sphinx-build.py
+++ b/buildtools/call-sphinx-build.py
@@ -15,6 +15,8 @@
 
 # set the version in environment for sphinx to pick up
 os.environ['DPDK_VERSION'] = version
+if src.find('dts') != -1:
+    os.environ['DTS_BUILD'] = "y"
 
 sphinx_cmd = [sphinx] + extra_args
 
diff --git a/buildtools/get-dts-runtime-deps.py b/buildtools/get-dts-runtime-deps.py
new file mode 100755
index 0000000000..5d629dd09d
--- /dev/null
+++ b/buildtools/get-dts-runtime-deps.py
@@ -0,0 +1,72 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2024 PANTHEON.tech s.r.o.
+#
+
+"""Utilities for DTS dependencies.
+
+The module can be used as an executable script,
+which verifies that the running Python version meets the version requirement of DTS.
+The script exits with the standard exit codes in this mode (0 is success, 1 is failure).
+
+The module also contains a function, get_missing_imports,
+which looks for runtime and doc generation dependencies in the DTS pyproject.toml file
+and returns a list of module names used in an import statement that are missing.
+"""
+
+import configparser
+import importlib.metadata
+import importlib.util
+import os.path
+import platform
+
+from packaging.version import Version
+
+_VERSION_COMPARISON_CHARS = '^<>='
+_EXTRA_DEPS = {'invoke': '>=1.3', 'paramiko': '>=2.4'}
+_DPDK_ROOT = os.path.dirname(os.path.dirname(__file__))
+_DTS_DEP_FILE_PATH = os.path.join(_DPDK_ROOT, 'dts', 'pyproject.toml')
+
+
+def _get_dependencies(cfg_file_path):
+    cfg = configparser.ConfigParser()
+    with open(cfg_file_path) as f:
+        dts_deps_file_str = f.read()
+        dts_deps_file_str = dts_deps_file_str.replace("\n]", "]")
+        cfg.read_string(dts_deps_file_str)
+
+    deps_section = cfg['tool.poetry.dependencies']
+    return {dep: deps_section[dep].strip('"\'') for dep in deps_section}
+
+
+def get_missing_imports():
+    missing_imports = []
+    req_deps = _get_dependencies(_DTS_DEP_FILE_PATH)
+    req_deps.pop('python')
+
+    for req_dep, req_ver in (req_deps | _EXTRA_DEPS).items():
+        try:
+            req_ver = Version(req_ver.strip(_VERSION_COMPARISON_CHARS))
+            found_dep_ver = Version(importlib.metadata.version(req_dep))
+            if found_dep_ver < req_ver:
+                print(
+                    f'The version "{found_dep_ver}" of package "{req_dep}" '
+                    f'is lower than required "{req_ver}".'
+                )
+        except importlib.metadata.PackageNotFoundError:
+            print(f'Package "{req_dep}" not found.')
+            missing_imports.append(req_dep.lower().replace('-', '_'))
+
+    return missing_imports
+
+
+if __name__ == '__main__':
+    python_version = _get_dependencies(_DTS_DEP_FILE_PATH).pop('python')
+    if python_version:
+        sys_ver = Version(platform.python_version())
+        req_ver = Version(python_version.strip(_VERSION_COMPARISON_CHARS))
+        if sys_ver < req_ver:
+            print(
+                f'The available Python version "{sys_ver}" is lower than required "{req_ver}".'
+            )
+            exit(1)
diff --git a/buildtools/meson.build b/buildtools/meson.build
index 3adf34e1a8..6b938d767c 100644
--- a/buildtools/meson.build
+++ b/buildtools/meson.build
@@ -24,6 +24,7 @@ get_numa_count_cmd = py3 + files('get-numa-count.py')
 get_test_suites_cmd = py3 + files('get-test-suites.py')
 has_hugepages_cmd = py3 + files('has-hugepages.py')
 cmdline_gen_cmd = py3 + files('dpdk-cmdline-gen.py')
+get_dts_runtime_deps = py3 + files('get-dts-runtime-deps.py')
 
 # install any build tools that end-users might want also
 install_data([
diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
index f9f0300126..ab223bcdf7 100644
--- a/doc/api/doxy-api-index.md
+++ b/doc/api/doxy-api-index.md
@@ -245,3 +245,6 @@ The public API headers are grouped by topics:
   [experimental APIs](@ref rte_compat.h),
   [ABI versioning](@ref rte_function_versioning.h),
   [version](@ref rte_version.h)
+
+- **tests**:
+  [**DTS**](@dts_api_main_page)
diff --git a/doc/api/doxy-api.conf.in b/doc/api/doxy-api.conf.in
index a8823c046f..c94f02d411 100644
--- a/doc/api/doxy-api.conf.in
+++ b/doc/api/doxy-api.conf.in
@@ -124,6 +124,8 @@ SEARCHENGINE            = YES
 SORT_MEMBER_DOCS        = NO
 SOURCE_BROWSER          = YES
 
+ALIASES                 = "dts_api_main_page=@DTS_API_MAIN_PAGE@"
+
 EXAMPLE_PATH            = @TOPDIR@/examples
 EXAMPLE_PATTERNS        = *.c
 EXAMPLE_RECURSIVE       = YES
diff --git a/doc/api/dts/custom.css b/doc/api/dts/custom.css
new file mode 120000
index 0000000000..3c9480c4a0
--- /dev/null
+++ b/doc/api/dts/custom.css
@@ -0,0 +1 @@
+../../guides/custom.css
\ No newline at end of file
diff --git a/doc/api/dts/meson.build b/doc/api/dts/meson.build
new file mode 100644
index 0000000000..b4b6f9d269
--- /dev/null
+++ b/doc/api/dts/meson.build
@@ -0,0 +1,29 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2023 PANTHEON.tech s.r.o.
+
+sphinx = find_program('sphinx-build', required: get_option('enable_docs'))
+if not sphinx.found()
+    subdir_done()
+endif
+
+python_ver_satisfied = run_command(get_dts_runtime_deps, check: false).returncode()
+if python_ver_satisfied != 0
+    subdir_done()
+endif
+
+extra_sphinx_args = ['-E', '-c', join_paths(doc_source_dir, 'guides')]
+if get_option('werror')
+    extra_sphinx_args += '-W'
+endif
+
+htmldir = join_paths(get_option('datadir'), 'doc', 'dpdk', 'dts')
+dts_api_html = custom_target('dts_api_html',
+        output: 'html',
+        command: [sphinx_wrapper, sphinx, meson.project_version(),
+            meson.current_source_dir(), meson.current_build_dir(), extra_sphinx_args],
+        build_by_default: get_option('enable_docs'),
+        install: get_option('enable_docs'),
+        install_dir: htmldir)
+
+dts_doc_targets += dts_api_html
+dts_doc_target_names += 'DTS_API_HTML'
diff --git a/doc/api/meson.build b/doc/api/meson.build
index 5b50692df9..788129336b 100644
--- a/doc/api/meson.build
+++ b/doc/api/meson.build
@@ -1,6 +1,18 @@
 # SPDX-License-Identifier: BSD-3-Clause
 # Copyright(c) 2018 Luca Boccassi <bluca@debian.org>
 
+dts_doc_targets = []
+dts_doc_target_names = []
+subdir('dts')
+
+if dts_doc_targets.length() == 0
+    dts_message = 'No DTS docs targets found'
+else
+    dts_message = 'Building DTS docs:'
+endif
+run_target('dts-doc', command: [echo, dts_message, dts_doc_target_names],
+    depends: dts_doc_targets)
+
 doxygen = find_program('doxygen', required: get_option('enable_docs'))
 
 if not doxygen.found()
@@ -40,6 +52,7 @@ cdata.set('WARN_AS_ERROR', 'NO')
 if get_option('werror')
     cdata.set('WARN_AS_ERROR', 'YES')
 endif
+cdata.set('DTS_API_MAIN_PAGE', join_paths('..', 'dts', 'html', 'index.html'))
 
 # configure HTML Doxygen run
 html_cdata = configuration_data()
diff --git a/doc/guides/conf.py b/doc/guides/conf.py
index 0f7ff5282d..d7f3030838 100644
--- a/doc/guides/conf.py
+++ b/doc/guides/conf.py
@@ -10,7 +10,7 @@
 from os.path import basename
 from os.path import dirname
 from os.path import join as path_join
-from sys import argv, stderr
+from sys import argv, stderr, path
 
 import configparser
 
@@ -58,6 +58,48 @@
              ("tools/devbind", "dpdk-devbind",
               "check device status and bind/unbind them from drivers", "", 8)]
 
+# DTS API docs additional configuration
+if environ.get('DTS_BUILD'):
+    extensions = ['sphinx.ext.napoleon', 'sphinx.ext.autodoc', 'sphinx.ext.intersphinx']
+    # Napoleon enables the Google format of Python docstrings.
+    napoleon_numpy_docstring = False
+    napoleon_attr_annotations = True
+    napoleon_preprocess_types = True
+
+    # Autodoc pulls documentation from code.
+    autodoc_default_options = {
+        'members': True,
+        'member-order': 'bysource',
+        'show-inheritance': True,
+    }
+    autodoc_class_signature = 'separated'
+    autodoc_typehints = 'both'
+    autodoc_typehints_format = 'short'
+    autodoc_typehints_description_target = 'documented'
+
+    # Intersphinx allows linking to external projects, such as Python docs.
+    intersphinx_mapping = {'python': ('https://docs.python.org/3', None)}
+
+    # DTS docstring options.
+    add_module_names = False
+    toc_object_entries = True
+    toc_object_entries_show_parents = 'hide'
+    # DTS Sidebar config.
+    html_theme_options = {
+        'collapse_navigation': False,
+        'navigation_depth': -1,  # unlimited depth
+    }
+
+    # Add path to DTS sources so that Sphinx can find them.
+    dpdk_root = dirname(dirname(dirname(__file__)))
+    path.append(path_join(dpdk_root, 'dts'))
+
+    # Get missing DTS dependencies. Add path to buildtools to find the get_missing_imports function.
+    path.append(path_join(dpdk_root, 'buildtools'))
+    import importlib
+    # Ignore missing imports from DTS dependencies.
+    autodoc_mock_imports = importlib.import_module('get-dts-runtime-deps').get_missing_imports()
+
 
 # ####### :numref: fallback ########
 # The following hook functions add some simple handling for the :numref:
diff --git a/doc/guides/contributing/documentation.rst b/doc/guides/contributing/documentation.rst
index 68454ae0d5..7b287ce631 100644
--- a/doc/guides/contributing/documentation.rst
+++ b/doc/guides/contributing/documentation.rst
@@ -133,6 +133,8 @@ added to by the developer.
 Building the Documentation
 --------------------------
 
+.. _doc_dependencies:
+
 Dependencies
 ~~~~~~~~~~~~
 
diff --git a/doc/guides/contributing/patches.rst b/doc/guides/contributing/patches.rst
index 04c66bebc4..6629928bee 100644
--- a/doc/guides/contributing/patches.rst
+++ b/doc/guides/contributing/patches.rst
@@ -499,6 +499,10 @@ The script usage is::
 For both of the above scripts, the -n option is used to specify a number of commits from HEAD,
 and the -r option allows the user specify a ``git log`` range.
 
+Additionally, when contributing to the DTS tool, patches should also be checked using
+the ``dts-check-format.sh`` script in the ``devtools`` directory of the DPDK repo.
+To run the script, extra :ref:`Python dependencies <dts_deps>` are needed.
+
 .. _contrib_check_compilation:
 
 Checking Compilation
diff --git a/doc/guides/tools/dts.rst b/doc/guides/tools/dts.rst
index 515b15e4d8..18cc7908cf 100644
--- a/doc/guides/tools/dts.rst
+++ b/doc/guides/tools/dts.rst
@@ -54,6 +54,7 @@ DTS uses Poetry as its Python dependency management.
 Python build/development and runtime environments are the same and DTS development environment,
 DTS runtime environment or just plain DTS environment are used interchangeably.
 
+.. _dts_deps:
 
 Setting up DTS environment
 ~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -291,8 +292,15 @@ When adding code to the DTS framework, pay attention to the rest of the code
 and try not to divert much from it.
 The :ref:`DTS developer tools <dts_dev_tools>` will issue warnings
 when some of the basics are not met.
+You should also build the :ref:`API documentation <building_api_docs>`
+to address any issues found during the build.
 
-The code must be properly documented with docstrings.
+The API documentation, which is a helpful reference when developing, may be accessed
+in the code directly or generated with the :ref:`API docs build steps <building_api_docs>`.
+When adding new files or modifying the directory structure,
+the corresponding changes must be made to DTS API doc sources in ``doc/api/dts``.
+
+Speaking of which, the code must be properly documented with docstrings.
 The style must conform to the `Google style
 <https://google.github.io/styleguide/pyguide.html#38-comments-and-docstrings>`_.
 See an example of the style `here
@@ -427,6 +435,35 @@ the DTS code check and format script.
 Refer to the script for usage: ``devtools/dts-check-format.sh -h``.
 
 
+.. _building_api_docs:
+
+Building DTS API docs
+---------------------
+
+The documentation is built using the standard DPDK build system.
+See :doc:`../linux_gsg/build_dpdk` for more details on compiling DPDK with meson.
+
+The :ref:`doc build dependencies <doc_dependencies>` may be installed with Poetry:
+
+.. code-block:: console
+
+   poetry install --no-root --only docs
+   poetry install --no-root --with docs  # an alternative that will also install DTS dependencies
+   poetry shell
+
+After executing the meson command, build the documentation with:
+
+.. code-block:: console
+
+   ninja -C build dts-doc
+
+The output is generated in ``build/doc/api/dts/html``.
+
+.. note::
+
+   Make sure to fix any Sphinx warnings when adding or updating docstrings.
+
+
 Configuration Schema
 --------------------
 
diff --git a/doc/meson.build b/doc/meson.build
index 6f74706aa2..1e0cfa4127 100644
--- a/doc/meson.build
+++ b/doc/meson.build
@@ -1,6 +1,7 @@
 # SPDX-License-Identifier: BSD-3-Clause
 # Copyright(c) 2018 Luca Boccassi <bluca@debian.org>
 
+doc_source_dir = meson.current_source_dir()
 doc_targets = []
 doc_target_names = []
 subdir('api')
-- 
2.34.1



* Re: [PATCH] version: 24.11-rc0
  2024-08-08  8:03 12% [PATCH] version: 24.11-rc0 David Marchand
@ 2024-08-08 12:00  0% ` Thomas Monjalon
  2024-08-08 12:55  3%   ` David Marchand
  0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2024-08-08 12:00 UTC (permalink / raw)
  To: David Marchand; +Cc: dev

08/08/2024 10:03, David Marchand:
> Start a new release cycle with empty release notes.
> 
> The ABI version becomes 25.0.
> The map files are updated to the new ABI major number (25).
> The ABI exceptions are dropped and CI ABI checks are disabled because
> compatibility is not preserved.
> 
> Signed-off-by: David Marchand <david.marchand@redhat.com>

Acked-by: Thomas Monjalon <thomas@monjalon.net>
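
For readers unfamiliar with the map file update mentioned in the quoted
commit message, the renaming of the stable ABI node can be sketched as
below. This is a hypothetical helper written for illustration; how the
actual patch was produced is not stated in this thread.

```python
import re


def bump_abi_version(map_text, old_major, new_major):
    """Rename the stable ABI node, e.g. "DPDK_24 {" becomes "DPDK_25 {"."""
    pattern = re.compile(rf'^DPDK_{old_major}(\s*\{{)', re.MULTILINE)
    return pattern.sub(rf'DPDK_{new_major}\g<1>', map_text)


# A made-up, shortened version.map used only to exercise the helper.
sample_map = (
    "DPDK_24 {\n"
    "\tglobal:\n"
    "\n"
    "\trte_stack_create;\n"
    "};\n"
    "\n"
    "EXPERIMENTAL {\n"
    "\tglobal:\n"
    "};\n"
)

bumped_map = bump_abi_version(sample_map, 24, 25)
```

Only the versioned node name changes; the EXPERIMENTAL block and the
symbol lists are left untouched.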






* Re: [PATCH] version: 24.11-rc0
  2024-08-08 12:00  0% ` Thomas Monjalon
@ 2024-08-08 12:55  3%   ` David Marchand
  2024-08-08 21:45  3%     ` [OS-Team] [dpdklab] " Patrick Robb
  0 siblings, 1 reply; 200+ results
From: David Marchand @ 2024-08-08 12:55 UTC (permalink / raw)
  To: David Marchand; +Cc: dev, Thomas Monjalon, ci, dpdklab

On Thu, Aug 8, 2024 at 2:00 PM Thomas Monjalon <thomas@monjalon.net> wrote:
>
> 08/08/2024 10:03, David Marchand:
> > Start a new release cycle with empty release notes.
> >
> > The ABI version becomes 25.0.
> > The map files are updated to the new ABI major number (25).
> > The ABI exceptions are dropped and CI ABI checks are disabled because
> > compatibility is not preserved.
> >
> > Signed-off-by: David Marchand <david.marchand@redhat.com>
> Acked-by: Thomas Monjalon <thomas@monjalon.net>

Applied, thanks.

Heads up to CI: ABI checks must be disabled during v24.11 release.


-- 
David Marchand



* Re: [OS-Team] [dpdklab] Re: [PATCH] version: 24.11-rc0
  2024-08-08 12:55  3%   ` David Marchand
@ 2024-08-08 21:45  3%     ` Patrick Robb
  0 siblings, 0 replies; 200+ results
From: Patrick Robb @ 2024-08-08 21:45 UTC (permalink / raw)
  To: David Marchand; +Cc: dev, Thomas Monjalon, ci, dpdklab

Thanks, I will merge the change disabling ABI checks at UNH at the start
of the day tomorrow.

On Thu, Aug 8, 2024 at 8:56 AM David Marchand <david.marchand@redhat.com> wrote:
>
> On Thu, Aug 8, 2024 at 2:00 PM Thomas Monjalon <thomas@monjalon.net> wrote:
> >
> > 08/08/2024 10:03, David Marchand:
> > > Start a new release cycle with empty release notes.
> > >
> > > The ABI version becomes 25.0.
> > > The map files are updated to the new ABI major number (25).
> > > The ABI exceptions are dropped and CI ABI checks are disabled because
> > > compatibility is not preserved.
> > >
> > > Signed-off-by: David Marchand <david.marchand@redhat.com>
> > Acked-by: Thomas Monjalon <thomas@monjalon.net>
>
> Applied, thanks.
>
> Heads up to CI: ABI checks must be disabled during v24.11 release.
>
>
> --
> David Marchand
>


* [PATCH v9 0/2] power: introduce PM QoS interface
                     ` (6 preceding siblings ...)
  2024-07-09  7:25  4% ` [PATCH v8 0/2] power: introduce PM QoS interface Huisong Li
@ 2024-08-09  9:50  4% ` Huisong Li
  2024-08-09  9:50  5%   ` [PATCH v9 1/2] power: introduce PM QoS API on CPU wide Huisong Li
  7 siblings, 1 reply; 200+ results
From: Huisong Li @ 2024-08-09  9:50 UTC (permalink / raw)
  To: dev
  Cc: mb, thomas, ferruh.yigit, anatoly.burakov, david.hunt,
	sivaprasad.tummala, stephen, david.marchand, liuyonglong,
	lihuisong

The deeper the idle state, the lower the power consumption, but the longer
the resume time. Some services are delay sensitive and expect a low resume
time, for example in interrupt packet receiving mode.

The "/sys/devices/system/cpu/cpuX/power/pm_qos_resume_latency_us" sysfs
interface is used to set and get the resume latency limit on cpuX from
userspace. Please see the description in the kernel documentation [1].
Each cpuidle governor in Linux selects which idle state to enter based on
this CPU resume latency in its idle task.

The per-CPU PM QoS API can be used to control this CPU's idle state
selection and, by setting a strict resume latency (zero value), restrict it
to the shallowest idle state to lower the delay after sleep.

[1] https://www.kernel.org/doc/html/latest/admin-guide/abi-testing.html?highlight=pm_qos_resume_latency_us#abi-sys-devices-power-pm-qos-resume-latency-us

---
 v9:
  - move new feature description from release_24_07.rst to release_24_11.rst.
 v8:
  - update the latest code to resolve CI warning
 v7:
  - remove dead code (rte_lcore_is_enabled) in patch [2/2]
 v6:
  - update release_24_07.rst based on dpdk repo to resolve CI warning.
 v5:
  - use LINE_MAX to replace BUFSIZ, and use snprintf to replace sprintf.
 v4:
  - fix some comments based on Stephen's feedback
  - add stdint.h include
  - add Acked-by Morten Brørup <mb@smartsharesystems.com>
 v3:
  - add RTE_POWER_xxx prefix for some macro in header
  - add the check for lcore_id with rte_lcore_is_enabled
 v2:
  - use PM QoS on CPU wide to replace the one on system wide

Huisong Li (2):
  power: introduce PM QoS API on CPU wide
  examples/l3fwd-power: add PM QoS configuration

 doc/guides/prog_guide/power_man.rst    |  24 ++++++
 doc/guides/rel_notes/release_24_11.rst |   5 ++
 examples/l3fwd-power/main.c            |  24 ++++++
 lib/power/meson.build                  |   2 +
 lib/power/rte_power_qos.c              | 114 +++++++++++++++++++++++++
 lib/power/rte_power_qos.h              |  73 ++++++++++++++++
 lib/power/version.map                  |   4 +
 7 files changed, 246 insertions(+)
 create mode 100644 lib/power/rte_power_qos.c
 create mode 100644 lib/power/rte_power_qos.h

-- 
2.22.0



* [PATCH v9 1/2] power: introduce PM QoS API on CPU wide
  2024-08-09  9:50  4% ` [PATCH v9 0/2] power: introduce PM QoS interface Huisong Li
@ 2024-08-09  9:50  5%   ` Huisong Li
  0 siblings, 0 replies; 200+ results
From: Huisong Li @ 2024-08-09  9:50 UTC (permalink / raw)
  To: dev
  Cc: mb, thomas, ferruh.yigit, anatoly.burakov, david.hunt,
	sivaprasad.tummala, stephen, david.marchand, liuyonglong,
	lihuisong

The deeper the idle state, the lower the power consumption, but the longer
the resume time. Some services are delay sensitive and expect a low resume
time, for example in interrupt packet receiving mode.

The "/sys/devices/system/cpu/cpuX/power/pm_qos_resume_latency_us" sysfs
interface is used to set and get the resume latency limit on cpuX from
userspace. Each cpuidle governor in Linux selects which idle state to enter
based on this CPU resume latency in its idle task.

The per-CPU PM QoS API can be used to control this CPU's idle state
selection and, by setting a strict resume latency (zero value), restrict it
to the shallowest idle state to lower the delay after sleep.
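
The translation between API values and sysfs strings described in the
code comments of this patch can be summarized with a small sketch. It is
written in Python purely for illustration (the patch itself is C), and
the function names are made up for this example. NO_CONSTRAINT mirrors
RTE_POWER_QOS_RESUME_LATENCY_NO_CONSTRAINT, defined in the patch as
(int)(UINT32_MAX >> 1).

```python
# Mirrors RTE_POWER_QOS_RESUME_LATENCY_NO_CONSTRAINT from the patch.
NO_CONSTRAINT = 0xFFFFFFFF >> 1


def latency_to_sysfs(latency):
    """Return the string to write to pm_qos_resume_latency_us."""
    if latency < 0:
        raise ValueError('latency must be greater than or equal to 0')
    if latency == 0:
        # Strict latency: the kernel expects "n/a" for a zero limit.
        return 'n/a'
    if latency == NO_CONSTRAINT:
        # In the sysfs file, "0" means that there is no constraint.
        return '0'
    return str(latency)


def sysfs_to_latency(text):
    """Convert a string read from pm_qos_resume_latency_us to an API value."""
    text = text.strip()
    if text == 'n/a':
        return 0
    value = int(text)
    return NO_CONSTRAINT if value == 0 else value
```

The point worth noticing is the inversion: an API value of 0 means the
strictest limit and is written as "n/a", while the sysfs string "0" means
no limit at all.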

Signed-off-by: Huisong Li <lihuisong@huawei.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
---
 doc/guides/prog_guide/power_man.rst    |  24 ++++++
 doc/guides/rel_notes/release_24_11.rst |   5 ++
 lib/power/meson.build                  |   2 +
 lib/power/rte_power_qos.c              | 114 +++++++++++++++++++++++++
 lib/power/rte_power_qos.h              |  73 ++++++++++++++++
 lib/power/version.map                  |   4 +
 6 files changed, 222 insertions(+)
 create mode 100644 lib/power/rte_power_qos.c
 create mode 100644 lib/power/rte_power_qos.h

diff --git a/doc/guides/prog_guide/power_man.rst b/doc/guides/prog_guide/power_man.rst
index f6674efe2d..faa32b4320 100644
--- a/doc/guides/prog_guide/power_man.rst
+++ b/doc/guides/prog_guide/power_man.rst
@@ -249,6 +249,30 @@ Get Num Pkgs
 Get Num Dies
   Get the number of die's on a given package.
 
+
+PM QoS
+------
+
+The deeper the idle state, the lower the power consumption, but the longer
+the resume time. Some services are delay sensitive and expect a low resume
+time, for example in interrupt packet receiving mode.
+
+The "/sys/devices/system/cpu/cpuX/power/pm_qos_resume_latency_us" sysfs
+interface is used to set and get the resume latency limit on cpuX from
+userspace. Each cpuidle governor in Linux selects which idle state to enter
+based on this CPU resume latency in its idle task.
+
+The per-CPU PM QoS API can be used to set and get the CPU resume latency based
+on this sysfs interface.
+
+The ``rte_power_qos_set_cpu_resume_latency()`` function can control the CPU's
+idle state selection in Linux. Setting a strict resume latency (zero value)
+restricts the CPU to the shallowest idle state, which lowers the delay of
+resuming service after sleeping.
+
+The ``rte_power_qos_get_cpu_resume_latency()`` function gets the resume
+latency on the specified CPU.
+
 References
 ----------
 
diff --git a/doc/guides/rel_notes/release_24_11.rst b/doc/guides/rel_notes/release_24_11.rst
index 0ff70d9057..bd72d0a595 100644
--- a/doc/guides/rel_notes/release_24_11.rst
+++ b/doc/guides/rel_notes/release_24_11.rst
@@ -55,6 +55,11 @@ New Features
      Also, make sure to start the actual text at the margin.
      =======================================================
 
+* **Introduce per-CPU PM QoS interface.**
+
+  * Add per-CPU PM QoS interface to lower the delay after sleep by controlling
+    CPU idle state selection.
+
 
 Removed Items
 -------------
diff --git a/lib/power/meson.build b/lib/power/meson.build
index b8426589b2..8222e178b0 100644
--- a/lib/power/meson.build
+++ b/lib/power/meson.build
@@ -23,12 +23,14 @@ sources = files(
         'rte_power.c',
         'rte_power_uncore.c',
         'rte_power_pmd_mgmt.c',
+        'rte_power_qos.c',
 )
 headers = files(
         'rte_power.h',
         'rte_power_guest_channel.h',
         'rte_power_pmd_mgmt.h',
         'rte_power_uncore.h',
+        'rte_power_qos.h',
 )
 if cc.has_argument('-Wno-cast-qual')
     cflags += '-Wno-cast-qual'
diff --git a/lib/power/rte_power_qos.c b/lib/power/rte_power_qos.c
new file mode 100644
index 0000000000..375746f832
--- /dev/null
+++ b/lib/power/rte_power_qos.c
@@ -0,0 +1,114 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2024 HiSilicon Limited
+ */
+
+#include <errno.h>
+#include <stdlib.h>
+#include <string.h>
+
+#include <rte_lcore.h>
+#include <rte_log.h>
+
+#include "power_common.h"
+#include "rte_power_qos.h"
+
+#define PM_QOS_SYSFILE_RESUME_LATENCY_US	\
+	"/sys/devices/system/cpu/cpu%u/power/pm_qos_resume_latency_us"
+
+int
+rte_power_qos_set_cpu_resume_latency(uint16_t lcore_id, int latency)
+{
+	char buf[LINE_MAX];
+	FILE *f;
+	int ret;
+
+	if (!rte_lcore_is_enabled(lcore_id)) {
+		POWER_LOG(ERR, "lcore id %u is not enabled", lcore_id);
+		return -EINVAL;
+	}
+
+	if (latency < 0) {
+		POWER_LOG(ERR, "latency should be greater than or equal to 0");
+		return -EINVAL;
+	}
+
+	ret = open_core_sysfs_file(&f, "w", PM_QOS_SYSFILE_RESUME_LATENCY_US, lcore_id);
+	if (ret != 0) {
+		POWER_LOG(ERR, "Failed to open "PM_QOS_SYSFILE_RESUME_LATENCY_US, lcore_id);
+		return ret;
+	}
+
+	/*
+	 * Based on the sysfs interface pm_qos_resume_latency_us under
+	 * @PM_QOS_SYSFILE_RESUME_LATENCY_US directory in kernel, the meaning
+	 * of each input string is as follows.
+	 * 1> the resume latency is 0 if the input is "n/a".
+	 * 2> the resume latency is no constraint if the input is "0".
+	 * 3> the resume latency is the actual value to be set.
+	 */
+	if (latency == 0)
+		snprintf(buf, sizeof(buf), "%s", "n/a");
+	else if (latency == RTE_POWER_QOS_RESUME_LATENCY_NO_CONSTRAINT)
+		snprintf(buf, sizeof(buf), "%u", 0);
+	else
+		snprintf(buf, sizeof(buf), "%u", latency);
+
+	ret = write_core_sysfs_s(f, buf);
+	if (ret != 0) {
+		POWER_LOG(ERR, "Failed to write "PM_QOS_SYSFILE_RESUME_LATENCY_US, lcore_id);
+		goto out;
+	}
+
+out:
+	if (f != NULL)
+		fclose(f);
+
+	return ret;
+}
+
+int
+rte_power_qos_get_cpu_resume_latency(uint16_t lcore_id)
+{
+	char buf[LINE_MAX];
+	int latency = -1;
+	FILE *f;
+	int ret;
+
+	if (!rte_lcore_is_enabled(lcore_id)) {
+		POWER_LOG(ERR, "lcore id %u is not enabled", lcore_id);
+		return -EINVAL;
+	}
+
+	ret = open_core_sysfs_file(&f, "r", PM_QOS_SYSFILE_RESUME_LATENCY_US, lcore_id);
+	if (ret != 0) {
+		POWER_LOG(ERR, "Failed to open "PM_QOS_SYSFILE_RESUME_LATENCY_US, lcore_id);
+		return ret;
+	}
+
+	ret = read_core_sysfs_s(f, buf, sizeof(buf));
+	if (ret != 0) {
+		POWER_LOG(ERR, "Failed to read "PM_QOS_SYSFILE_RESUME_LATENCY_US, lcore_id);
+		goto out;
+	}
+
+	/*
+	 * Based on the sysfs interface pm_qos_resume_latency_us under
+	 * @PM_QOS_SYSFILE_RESUME_LATENCY_US directory in kernel, the meaning
+	 * of each output string is as follows.
+	 * 1> the resume latency is 0 if the output is "n/a".
+	 * 2> the resume latency is no constraint if the output is "0".
+	 * 3> the resume latency is the actual value in use for any other string.
+	 */
+	if (strcmp(buf, "n/a") == 0)
+		latency = 0;
+	else {
+		latency = strtoul(buf, NULL, 10);
+		latency = latency == 0 ? RTE_POWER_QOS_RESUME_LATENCY_NO_CONSTRAINT : latency;
+	}
+
+out:
+	if (f != NULL)
+		fclose(f);
+
+	return latency != -1 ? latency : ret;
+}
diff --git a/lib/power/rte_power_qos.h b/lib/power/rte_power_qos.h
new file mode 100644
index 0000000000..990c488373
--- /dev/null
+++ b/lib/power/rte_power_qos.h
@@ -0,0 +1,73 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2024 HiSilicon Limited
+ */
+
+#ifndef RTE_POWER_QOS_H
+#define RTE_POWER_QOS_H
+
+#include <stdint.h>
+
+#include <rte_compat.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * @file rte_power_qos.h
+ *
+ * PM QoS API.
+ *
+ * The CPU-wide resume latency limit has a positive impact on this CPU's idle
+ * state selection in each cpuidle governor.
+ * Please see the PM QoS on CPU wide in the following link:
+ * https://www.kernel.org/doc/html/latest/admin-guide/abi-testing.html?highlight=pm_qos_resume_latency_us#abi-sys-devices-power-pm-qos-resume-latency-us
+ *
+ * The deeper the idle state, the lower the power consumption, but the
+ * longer the resume time. Some services are delay sensitive and expect a
+ * low resume time, for example in interrupt packet receiving mode.
+ *
+ * In these cases, the per-CPU PM QoS API can be used to control this CPU's
+ * idle state selection and, by setting a strict resume latency (zero value),
+ * restrict it to the shallowest idle state to lower the delay after sleep.
+ */
+
+#define RTE_POWER_QOS_STRICT_LATENCY_VALUE             0
+#define RTE_POWER_QOS_RESUME_LATENCY_NO_CONSTRAINT    ((int)(UINT32_MAX >> 1))
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * @param lcore_id
+ *   target logical core id
+ *
+ * @param latency
+ *   The latency in microseconds; it should be greater than or equal to zero.
+ *
+ * @return
+ *   0 on success. Otherwise negative value is returned.
+ */
+__rte_experimental
+int rte_power_qos_set_cpu_resume_latency(uint16_t lcore_id, int latency);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Get the current resume latency of this logical core.
+ * The default value in kernel is @see RTE_POWER_QOS_RESUME_LATENCY_NO_CONSTRAINT
+ * if it has not been set.
+ *
+ * @return
+ *   Negative value on failure.
+ *   >= 0 means the actual resume latency limit on this core.
+ */
+__rte_experimental
+int rte_power_qos_get_cpu_resume_latency(uint16_t lcore_id);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* RTE_POWER_QOS_H */
diff --git a/lib/power/version.map b/lib/power/version.map
index c9a226614e..4e4955a4cf 100644
--- a/lib/power/version.map
+++ b/lib/power/version.map
@@ -51,4 +51,8 @@ EXPERIMENTAL {
 	rte_power_set_uncore_env;
 	rte_power_uncore_freqs;
 	rte_power_unset_uncore_env;
+
+	# added in 24.11
+	rte_power_qos_set_cpu_resume_latency;
+	rte_power_qos_get_cpu_resume_latency;
 };
-- 
2.22.0



* Re: [PATCH v1 1/3] bbdev: new queue stat for available enqueue depth
    2024-04-05  0:46  3%   ` Stephen Hemminger
  2024-04-05 15:15  3%   ` Stephen Hemminger
@ 2024-08-12  9:28  3%   ` Maxime Coquelin
  2024-08-12  9:56  0%     ` Maxime Coquelin
  2024-08-12 17:27  0%     ` Chautru, Nicolas
  2 siblings, 2 replies; 200+ results
From: Maxime Coquelin @ 2024-08-12  9:28 UTC (permalink / raw)
  To: Nicolas Chautru, dev; +Cc: hemant.agrawal, david.marchand, hernan.vargas

Hi Nicolas,

On 4/4/24 23:04, Nicolas Chautru wrote:
> Capturing additional queue stats counter for the
> depth of enqueue batch still available on the given
> queue. This can help application to monitor that depth
> at run time.
> 
> Signed-off-by: Nicolas Chautru <nicolas.chautru@intel.com>
> ---
>   lib/bbdev/rte_bbdev.h | 2 ++
>   1 file changed, 2 insertions(+)
> 
> diff --git a/lib/bbdev/rte_bbdev.h b/lib/bbdev/rte_bbdev.h
> index 0cbfdd1c95..25514c58ac 100644
> --- a/lib/bbdev/rte_bbdev.h
> +++ b/lib/bbdev/rte_bbdev.h
> @@ -283,6 +283,8 @@ struct rte_bbdev_stats {
>   	 *     bbdev operation
>   	 */
>   	uint64_t acc_offload_cycles;
> +	/** Available number of enqueue batch on that queue. */
> +	uint16_t enqueue_depth_avail;
>   };
>   
>   /**

I think it needs to be documented in the ABI change section.

With that done, feel free to add:

Maxime Coquelin <maxime.coquelin@redhat.com>

Thanks,
Maxime


^ permalink raw reply	[relevance 3%]

* Re: [PATCH v1 1/3] bbdev: new queue stat for available enqueue depth
  2024-08-12  9:28  3%   ` Maxime Coquelin
@ 2024-08-12  9:56  0%     ` Maxime Coquelin
  2024-08-12 17:27  0%     ` Chautru, Nicolas
  1 sibling, 0 replies; 200+ results
From: Maxime Coquelin @ 2024-08-12  9:56 UTC (permalink / raw)
  To: Nicolas Chautru, dev; +Cc: hemant.agrawal, david.marchand, hernan.vargas



On 8/12/24 11:28, Maxime Coquelin wrote:
> Hi Nicolas,
> 
> On 4/4/24 23:04, Nicolas Chautru wrote:
>> Capturing additional queue stats counter for the
>> depth of enqueue batch still available on the given
>> queue. This can help application to monitor that depth
>> at run time.
>>
>> Signed-off-by: Nicolas Chautru <nicolas.chautru@intel.com>
>> ---
>>   lib/bbdev/rte_bbdev.h | 2 ++
>>   1 file changed, 2 insertions(+)
>>
>> diff --git a/lib/bbdev/rte_bbdev.h b/lib/bbdev/rte_bbdev.h
>> index 0cbfdd1c95..25514c58ac 100644
>> --- a/lib/bbdev/rte_bbdev.h
>> +++ b/lib/bbdev/rte_bbdev.h
>> @@ -283,6 +283,8 @@ struct rte_bbdev_stats {
>>        *     bbdev operation
>>        */
>>       uint64_t acc_offload_cycles;
>> +    /** Available number of enqueue batch on that queue. */
>> +    uint16_t enqueue_depth_avail;
>>   };
>>   /**
> 
> I think it needs to be documented in the ABI change section.
> 
> With that done, feel free to add:
> 
> Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> 
> Thanks,
> Maxime


^ permalink raw reply	[relevance 0%]

* RE: [PATCH v1 1/3] bbdev: new queue stat for available enqueue depth
  2024-08-12  9:28  3%   ` Maxime Coquelin
  2024-08-12  9:56  0%     ` Maxime Coquelin
@ 2024-08-12 17:27  0%     ` Chautru, Nicolas
  2024-08-12 19:44  0%       ` Maxime Coquelin
  1 sibling, 1 reply; 200+ results
From: Chautru, Nicolas @ 2024-08-12 17:27 UTC (permalink / raw)
  To: Maxime Coquelin, dev; +Cc: hemant.agrawal, Marchand, David, Vargas, Hernan

Hi Maxime, 

The branch origin/next-baseband-for-main doesn’t yet have the updates from main, such as the ./doc/rel_notes for 24.11.
Can you refresh this? Or let me know how best to proceed.
Thanks, 
Nic

> -----Original Message-----
> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> Sent: Monday, August 12, 2024 2:29 AM
> To: Chautru, Nicolas <nicolas.chautru@intel.com>; dev@dpdk.org
> Cc: hemant.agrawal@nxp.com; Marchand, David
> <david.marchand@redhat.com>; Vargas, Hernan
> <hernan.vargas@intel.com>
> Subject: Re: [PATCH v1 1/3] bbdev: new queue stat for available enqueue
> depth
> 
> Hi Nicolas,
> 
> On 4/4/24 23:04, Nicolas Chautru wrote:
> > Capturing additional queue stats counter for the depth of enqueue
> > batch still available on the given queue. This can help application to
> > monitor that depth at run time.
> >
> > Signed-off-by: Nicolas Chautru <nicolas.chautru@intel.com>
> > ---
> >   lib/bbdev/rte_bbdev.h | 2 ++
> >   1 file changed, 2 insertions(+)
> >
> > diff --git a/lib/bbdev/rte_bbdev.h b/lib/bbdev/rte_bbdev.h index
> > 0cbfdd1c95..25514c58ac 100644
> > --- a/lib/bbdev/rte_bbdev.h
> > +++ b/lib/bbdev/rte_bbdev.h
> > @@ -283,6 +283,8 @@ struct rte_bbdev_stats {
> >   	 *     bbdev operation
> >   	 */
> >   	uint64_t acc_offload_cycles;
> > +	/** Available number of enqueue batch on that queue. */
> > +	uint16_t enqueue_depth_avail;
> >   };
> >
> >   /**
> 
> I think it needs to be documented in the ABI change section.
> 
> With that done, feel free to add:
> 
> Maxime Coquelin <maxime.coquelin@redhat.com>
> 
> Thanks,
> Maxime


^ permalink raw reply	[relevance 0%]

* Re: [PATCH v1 1/3] bbdev: new queue stat for available enqueue depth
  2024-08-12 17:27  0%     ` Chautru, Nicolas
@ 2024-08-12 19:44  0%       ` Maxime Coquelin
  0 siblings, 0 replies; 200+ results
From: Maxime Coquelin @ 2024-08-12 19:44 UTC (permalink / raw)
  To: Chautru, Nicolas, dev; +Cc: hemant.agrawal, Marchand, David, Vargas, Hernan

Hi Nicolas,

On 8/12/24 19:27, Chautru, Nicolas wrote:
> Hi Maxime,
> 
> The branch origin/next-baseband-for-main doesn’t have yet the updates from main, such as the ./doc/rel_notes for 24.11.
> Can you refresh this? Or let me know how best to proceed.

I rebased the branches to latest main, you can proceed.

Thanks,
Maxime

> Thanks,
> Nic
> 
>> -----Original Message-----
>> From: Maxime Coquelin <maxime.coquelin@redhat.com>
>> Sent: Monday, August 12, 2024 2:29 AM
>> To: Chautru, Nicolas <nicolas.chautru@intel.com>; dev@dpdk.org
>> Cc: hemant.agrawal@nxp.com; Marchand, David
>> <david.marchand@redhat.com>; Vargas, Hernan
>> <hernan.vargas@intel.com>
>> Subject: Re: [PATCH v1 1/3] bbdev: new queue stat for available enqueue
>> depth
>>
>> Hi Nicolas,
>>
>> On 4/4/24 23:04, Nicolas Chautru wrote:
>>> Capturing additional queue stats counter for the depth of enqueue
>>> batch still available on the given queue. This can help application to
>>> monitor that depth at run time.
>>>
>>> Signed-off-by: Nicolas Chautru <nicolas.chautru@intel.com>
>>> ---
>>>    lib/bbdev/rte_bbdev.h | 2 ++
>>>    1 file changed, 2 insertions(+)
>>>
>>> diff --git a/lib/bbdev/rte_bbdev.h b/lib/bbdev/rte_bbdev.h index
>>> 0cbfdd1c95..25514c58ac 100644
>>> --- a/lib/bbdev/rte_bbdev.h
>>> +++ b/lib/bbdev/rte_bbdev.h
>>> @@ -283,6 +283,8 @@ struct rte_bbdev_stats {
>>>    	 *     bbdev operation
>>>    	 */
>>>    	uint64_t acc_offload_cycles;
>>> +	/** Available number of enqueue batch on that queue. */
>>> +	uint16_t enqueue_depth_avail;
>>>    };
>>>
>>>    /**
>>
>> I think it needs to be documented in the ABI change section.
>>
>> With that done, feel free to add:
>>
>> Maxime Coquelin <maxime.coquelin@redhat.com>
>>
>> Thanks,
>> Maxime
> 


^ permalink raw reply	[relevance 0%]

* [PATCH v2 0/3] bbdev: additional queue stats
@ 2024-08-12 23:41  3% Nicolas Chautru
  2024-08-12 23:41  6% ` [PATCH v2 1/3] bbdev: new queue stat for available enqueue depth Nicolas Chautru
  2024-08-13  7:22  0% ` [PATCH v2 0/3] bbdev: additional queue stats Hemant Agrawal
  0 siblings, 2 replies; 200+ results
From: Nicolas Chautru @ 2024-08-12 23:41 UTC (permalink / raw)
  To: dev, maxime.coquelin
  Cc: hemant.agrawal, david.marchand, hernan.vargas, Nicolas Chautru

v2: update to the ABI doc as suggested by Maxime.

This series introduces a new parameter in the queue stats
which can be used to monitor the number of enqueues
still possible.
The acc PMD is then refactored to use a set of common functions
to update several queue status parameters, including the new one.
The test application is also updated.
Thanks
Nic

Nicolas Chautru (3):
  bbdev: new queue stat for available enqueue depth
  baseband/acc: refactor queue status update
  test/bbdev: update for queue stats

 app/test-bbdev/test_bbdev_perf.c       |  1 +
 doc/guides/rel_notes/release_24_11.rst |  3 ++
 drivers/baseband/acc/acc_common.h      | 18 ++++++++
 drivers/baseband/acc/rte_acc100_pmd.c  | 45 ++++++-------------
 drivers/baseband/acc/rte_vrb_pmd.c     | 61 ++++++++------------------
 lib/bbdev/rte_bbdev.h                  |  2 +
 6 files changed, 56 insertions(+), 74 deletions(-)

-- 
2.34.1


^ permalink raw reply	[relevance 3%]

* [PATCH v2 1/3] bbdev: new queue stat for available enqueue depth
  2024-08-12 23:41  3% [PATCH v2 0/3] bbdev: additional queue stats Nicolas Chautru
@ 2024-08-12 23:41  6% ` Nicolas Chautru
  2024-08-13  7:22  0% ` [PATCH v2 0/3] bbdev: additional queue stats Hemant Agrawal
  1 sibling, 0 replies; 200+ results
From: Nicolas Chautru @ 2024-08-12 23:41 UTC (permalink / raw)
  To: dev, maxime.coquelin
  Cc: hemant.agrawal, david.marchand, hernan.vargas, Nicolas Chautru

Capture an additional queue stats counter for the
number of enqueue batches still available on the given
queue. This helps applications monitor that depth
at run time.

Signed-off-by: Nicolas Chautru <nicolas.chautru@intel.com>
---
 doc/guides/rel_notes/release_24_11.rst | 3 +++
 lib/bbdev/rte_bbdev.h                  | 2 ++
 2 files changed, 5 insertions(+)

diff --git a/doc/guides/rel_notes/release_24_11.rst b/doc/guides/rel_notes/release_24_11.rst
index 0ff70d9057..a45b9b2dc6 100644
--- a/doc/guides/rel_notes/release_24_11.rst
+++ b/doc/guides/rel_notes/release_24_11.rst
@@ -88,6 +88,9 @@ API Changes
 ABI Changes
 -----------
 
+  * bbdev: Structure ``rte_bbdev_stats`` was updated to add a new parameter,
+    ``enqueue_depth_avail``, to optionally report the number of enqueue batches available.
+
 .. This section should contain ABI changes. Sample format:
 
    * sample: Add a short 1-2 sentence description of the ABI change
diff --git a/lib/bbdev/rte_bbdev.h b/lib/bbdev/rte_bbdev.h
index 0cbfdd1c95..25514c58ac 100644
--- a/lib/bbdev/rte_bbdev.h
+++ b/lib/bbdev/rte_bbdev.h
@@ -283,6 +283,8 @@ struct rte_bbdev_stats {
 	 *     bbdev operation
 	 */
 	uint64_t acc_offload_cycles;
+	/** Number of enqueue batches still available on that queue. */
+	uint16_t enqueue_depth_avail;
 };
 
 /**
-- 
2.34.1


^ permalink raw reply	[relevance 6%]
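
As a rough illustration of why appending ``enqueue_depth_avail`` is listed under ABI changes, the sketch below models a cut-down version of the structure with Python's ctypes. The class names StatsBefore and StatsAfter are hypothetical, and only the last pre-existing field from the hunk above is modelled; the real ``rte_bbdev_stats`` has more members, so the absolute sizes here are not the real ones.

```python
import ctypes


class StatsBefore(ctypes.Structure):
    # Hypothetical stand-in: only the final pre-patch field is modelled.
    _fields_ = [("acc_offload_cycles", ctypes.c_uint64)]


class StatsAfter(ctypes.Structure):
    # Same layout with the new 16-bit counter appended at the end.
    _fields_ = [
        ("acc_offload_cycles", ctypes.c_uint64),
        ("enqueue_depth_avail", ctypes.c_uint16),
    ]


size_before = ctypes.sizeof(StatsBefore)
size_after = ctypes.sizeof(StatsAfter)
new_field_offset = StatsAfter.enqueue_depth_avail.offset

# The new member starts exactly where the old structure ended,
# so existing field offsets are unchanged but the total size grows.
print(f"size before: {size_before}, size after: {size_after}")
print(f"offset of enqueue_depth_avail: {new_field_offset}")
```

Existing members keep their offsets, but the structure grows, so a caller built against the older header would likely pass a buffer that is too small for a library that fills in the new field.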

* Re: [PATCH v2 0/3] bbdev: additional queue stats
  2024-08-12 23:41  3% [PATCH v2 0/3] bbdev: additional queue stats Nicolas Chautru
  2024-08-12 23:41  6% ` [PATCH v2 1/3] bbdev: new queue stat for available enqueue depth Nicolas Chautru
@ 2024-08-13  7:22  0% ` Hemant Agrawal
  1 sibling, 0 replies; 200+ results
From: Hemant Agrawal @ 2024-08-13  7:22 UTC (permalink / raw)
  To: dev

Series-

Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>


On 13-08-2024 05:11, Nicolas Chautru wrote:
> v2: update to the ABI doc as suggested by Maxime.
>
> This series introduces a new parameter in the queue stats
> which can be used to monitor the number of enqueues
> still possible.
> The acc PMD is then refactored to use a set of common functions
> to update several queue status parameters, including the new one.
> The test application is also updated.
> Thanks
> Nic
>
> Nicolas Chautru (3):
>    bbdev: new queue stat for available enqueue depth
>    baseband/acc: refactor queue status update
>    test/bbdev: update for queue stats
>
>   app/test-bbdev/test_bbdev_perf.c       |  1 +
>   doc/guides/rel_notes/release_24_11.rst |  3 ++
>   drivers/baseband/acc/acc_common.h      | 18 ++++++++
>   drivers/baseband/acc/rte_acc100_pmd.c  | 45 ++++++-------------
>   drivers/baseband/acc/rte_vrb_pmd.c     | 61 ++++++++------------------
>   lib/bbdev/rte_bbdev.h                  |  2 +
>   6 files changed, 56 insertions(+), 74 deletions(-)
>

^ permalink raw reply	[relevance 0%]

* [RFC PATCH v2 26/26] config: add computed max queues define for compatibility
  @ 2024-08-13 16:00  5%   ` Bruce Richardson
  0 siblings, 0 replies; 200+ results
From: Bruce Richardson @ 2024-08-13 16:00 UTC (permalink / raw)
  To: dev; +Cc: ferruh.yigit, thomas, mb, Bruce Richardson, Bruce Richardson

End applications may use the RTE_MAX_QUEUES_PER_PORT define in their
structure definitions, so keep a define present in DPDK for backward
compatibility. Rather than having a hard-coded value, we can use the
maximum of the Rx and Tx values as the overall max value. Rather than
using a macro which does the MAX() calculation inside it, we can compute
the actual value at configuration time and write it using meson.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 config/meson.build                     |  8 ++++++++
 doc/guides/rel_notes/deprecation.rst   | 11 +++++++++++
 doc/guides/rel_notes/release_24_11.rst |  8 +++++++-
 3 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/config/meson.build b/config/meson.build
index fc41354c53..9677636754 100644
--- a/config/meson.build
+++ b/config/meson.build
@@ -372,6 +372,14 @@ if get_option('mbuf_refcnt_atomic')
 endif
 dpdk_conf.set10('RTE_IOVA_IN_MBUF', get_option('enable_iova_as_pa'))
 
+# set old MAX_QUEUES_PER_PORT option for compatibility. Compute
+# value as max of Rx and Tx counts
+if get_option('max_ethport_rx_queues') > get_option('max_ethport_tx_queues')
+    dpdk_conf.set('RTE_MAX_QUEUES_PER_PORT', get_option('max_ethport_rx_queues'))
+else
+    dpdk_conf.set('RTE_MAX_QUEUES_PER_PORT', get_option('max_ethport_tx_queues'))
+endif
+
 compile_time_cpuflags = []
 subdir(arch_subdir)
 dpdk_conf.set('RTE_COMPILE_TIME_CPUFLAGS', ','.join(compile_time_cpuflags))
diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 211f59fdc9..e4ba00040f 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -17,6 +17,17 @@ Other API and ABI deprecation notices are to be posted below.
 Deprecation Notices
 -------------------
 
+* config: The define ``RTE_MAX_QUEUES_PER_PORT`` should be considered deprecated
+  and may be removed in a future release.
+  Its use in apps should be replaced by ``RTE_MAX_ETHPORT_RX_QUEUES`` or ``RTE_MAX_ETHPORT_TX_QUEUES``,
+  as appropriate.
+
+* config: The ``RTE_MAX_QUEUES_PER_PORT`` value is no longer hard-coded to 1024.
+  Its value is now computed at configuration time to be the maximum of the configured max Rx and Tx queue values,
+  given by the meson options ``max_ethport_rx_queues`` and ``max_ethport_tx_queues``.
+  If these are unmodified from the defaults,
+  the value of ``RTE_MAX_QUEUES_PER_PORT`` will be 256.
+
 * build: The ``enable_kmods`` option is deprecated and will be removed in a future release.
   Setting/clearing the option has no impact on the build.
   Instead, kernel modules will be always built for OS's where out-of-tree kernel modules
diff --git a/doc/guides/rel_notes/release_24_11.rst b/doc/guides/rel_notes/release_24_11.rst
index 825cc0fad9..130564d38e 100644
--- a/doc/guides/rel_notes/release_24_11.rst
+++ b/doc/guides/rel_notes/release_24_11.rst
@@ -67,9 +67,15 @@ New Features
 
    The default max values for Rx and Tx queue limits are reduced from 1024, in previous releases,
    to 256 in this release.
-   For application that require large numbers of queues,
+   For applications that require large numbers of queues,
    these defaults can be changed via the meson configuration options described above.
 
+.. note::
+
+   The define ``RTE_MAX_QUEUES_PER_PORT`` is kept for backward compatibility.
+   Its value is no longer hard-coded,
+   but is set, at configuration time, to the maximum of the configured max Rx and Tx queue values.
+
 
 Removed Items
 -------------
-- 
2.43.0


^ permalink raw reply	[relevance 5%]
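
The configuration-time computation in the config/meson.build hunk above can be sketched in a few lines of Python. This is only a model of the if/else in the patch, not part of the build: the function name max_queues_per_port is hypothetical, and the default of 256 for both options is taken from the release note text in the same patch.

```python
def max_queues_per_port(max_rx_queues: int, max_tx_queues: int) -> int:
    """Mirror the meson if/else: pick the larger of the Rx and Tx limits."""
    if max_rx_queues > max_tx_queues:
        return max_rx_queues
    return max_tx_queues


# Defaults described in the release notes: both limits are 256.
default_value = max_queues_per_port(256, 256)

# A build that raises only the Rx limit gets the larger value.
raised_rx_value = max_queues_per_port(1024, 256)

print("RTE_MAX_QUEUES_PER_PORT (defaults):", default_value)
print("RTE_MAX_QUEUES_PER_PORT (rx=1024, tx=256):", raised_rx_value)
```

Because the value is resolved once at configuration time, applications that size arrays with ``RTE_MAX_QUEUES_PER_PORT`` see a plain constant rather than a MAX() expression.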

* [PATCH v3 26/26] config: add computed max queues define for compatibility
  @ 2024-08-14 10:49  5%   ` Bruce Richardson
  0 siblings, 0 replies; 200+ results
From: Bruce Richardson @ 2024-08-14 10:49 UTC (permalink / raw)
  To: dev; +Cc: ferruh.yigit, thomas, mb, Bruce Richardson

End applications may use the RTE_MAX_QUEUES_PER_PORT define in their
structure definitions, so keep a define present in DPDK for backward
compatibility. Rather than having a hard-coded value, we can use the
maximum of the Rx and Tx values as the overall max value. Rather than
using a macro which does the MAX() calculation inside it, we can compute
the actual value at configuration time and write it using meson.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
---
 config/meson.build                     |  8 ++++++++
 doc/guides/rel_notes/deprecation.rst   | 11 +++++++++++
 doc/guides/rel_notes/release_24_11.rst |  8 +++++++-
 3 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/config/meson.build b/config/meson.build
index fc41354c53..9677636754 100644
--- a/config/meson.build
+++ b/config/meson.build
@@ -372,6 +372,14 @@ if get_option('mbuf_refcnt_atomic')
 endif
 dpdk_conf.set10('RTE_IOVA_IN_MBUF', get_option('enable_iova_as_pa'))
 
+# set old MAX_QUEUES_PER_PORT option for compatibility. Compute
+# value as max of Rx and Tx counts
+if get_option('max_ethport_rx_queues') > get_option('max_ethport_tx_queues')
+    dpdk_conf.set('RTE_MAX_QUEUES_PER_PORT', get_option('max_ethport_rx_queues'))
+else
+    dpdk_conf.set('RTE_MAX_QUEUES_PER_PORT', get_option('max_ethport_tx_queues'))
+endif
+
 compile_time_cpuflags = []
 subdir(arch_subdir)
 dpdk_conf.set('RTE_COMPILE_TIME_CPUFLAGS', ','.join(compile_time_cpuflags))
diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 211f59fdc9..e4ba00040f 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -17,6 +17,17 @@ Other API and ABI deprecation notices are to be posted below.
 Deprecation Notices
 -------------------
 
+* config: The define ``RTE_MAX_QUEUES_PER_PORT`` should be considered deprecated
+  and may be removed in a future release.
+  Its use in apps should be replaced by ``RTE_MAX_ETHPORT_RX_QUEUES`` or ``RTE_MAX_ETHPORT_TX_QUEUES``,
+  as appropriate.
+
+* config: The ``RTE_MAX_QUEUES_PER_PORT`` value is no longer hard-coded to 1024.
+  Its value is now computed at configuration time to be the maximum of the configured max Rx and Tx queue values,
+  given by the meson options ``max_ethport_rx_queues`` and ``max_ethport_tx_queues``.
+  If these are unmodified from the defaults,
+  the value of ``RTE_MAX_QUEUES_PER_PORT`` will be 256.
+
 * build: The ``enable_kmods`` option is deprecated and will be removed in a future release.
   Setting/clearing the option has no impact on the build.
   Instead, kernel modules will be always built for OS's where out-of-tree kernel modules
diff --git a/doc/guides/rel_notes/release_24_11.rst b/doc/guides/rel_notes/release_24_11.rst
index 825cc0fad9..130564d38e 100644
--- a/doc/guides/rel_notes/release_24_11.rst
+++ b/doc/guides/rel_notes/release_24_11.rst
@@ -67,9 +67,15 @@ New Features
 
    The default max values for Rx and Tx queue limits are reduced from 1024, in previous releases,
    to 256 in this release.
-   For application that require large numbers of queues,
+   For applications that require large numbers of queues,
    these defaults can be changed via the meson configuration options described above.
 
+.. note::
+
+   The define ``RTE_MAX_QUEUES_PER_PORT`` is kept for backward compatibility.
+   Its value is no longer hard-coded,
+   but is set, at configuration time, to the maximum of the configured max Rx and Tx queue values.
+
 
 Removed Items
 -------------
-- 
2.43.0


^ permalink raw reply	[relevance 5%]

* [PATCH v17 5/5] dts: add API doc generation
  @ 2024-08-14 15:05  2%   ` Juraj Linkeš
  0 siblings, 0 replies; 200+ results
From: Juraj Linkeš @ 2024-08-14 15:05 UTC (permalink / raw)
  To: thomas, Honnappa.Nagarahalli, bruce.richardson, jspewock, probb,
	paul.szczepanek, Luca.Vizzarro, npratte
  Cc: dev, Juraj Linkeš

The tool used to generate DTS API docs is Sphinx, which is already in
use in DPDK. The same configuration is used to preserve style with one
DTS-specific configuration (so that the DPDK docs are unchanged) that
modifies how the sidebar displays the content. There's other Sphinx
configuration related to Python docstrings which doesn't affect DPDK doc
build. All new configuration is in a conditional block, applied only
when DTS API docs are built to not interfere with DPDK doc build.

Sphinx generates the documentation from Python docstrings. The docstring
format is the Google format [0] which requires the sphinx.ext.napoleon
extension. The other extension, sphinx.ext.intersphinx, enables linking
to objects in external documentation, such as the Python documentation.

There is one requirement for building DTS docs - the same Python version
as DTS or higher, because Sphinx's autodoc extension imports the code.

The dependencies needed to import the code don't have to be satisfied,
as the autodoc extension allows us to mock the imports. The missing
packages are taken from the DTS pyproject.toml file.

And finally, the DTS API docs can be accessed from the DPDK API doxygen
page.

[0] https://google.github.io/styleguide/pyguide.html#38-comments-and-docstrings

Signed-off-by: Juraj Linkeš <juraj.linkes@pantheon.tech>
---
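
For readers who want to see the version requirement described above in isolation, here is a minimal sketch of the idea: read the dependency table from an INI-like pyproject.toml fragment with configparser, strip the comparison characters, and compare versions numerically. The sample text, read_deps and version_tuple are hypothetical; the patch itself uses packaging.version.Version, which this sketch swaps for a plain integer tuple to stay within the standard library.

```python
import configparser

# Hypothetical fragment in the shape of a Poetry dependency table.
SAMPLE = """\
[tool.poetry.dependencies]
python = "^3.10"
PyYAML = "^6.0"
"""


def read_deps(text):
    """Parse the dependency table; configparser lowercases option names."""
    cfg = configparser.ConfigParser()
    cfg.read_string(text)
    section = cfg["tool.poetry.dependencies"]
    return {name: section[name].strip("\"'") for name in section}


def version_tuple(spec):
    """Drop leading comparison characters and split into integers."""
    return tuple(int(part) for part in spec.strip("^<>=").split("."))


deps = read_deps(SAMPLE)
required = version_tuple(deps["python"])

# Numeric comparison matters: as strings, "3.9" would sort after "3.10".
too_old = (3, 9) < required
new_enough = (3, 12) >= required
print(deps, required, too_old, new_enough)
```

The tuple comparison only handles purely numeric versions; pre-release or local version strings are one reason a real parser such as packaging's Version is the safer choice.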
 buildtools/call-sphinx-build.py           |  2 +
 buildtools/get-dts-runtime-deps.py        | 84 +++++++++++++++++++++++
 buildtools/meson.build                    |  1 +
 doc/api/doxy-api-index.md                 |  3 +
 doc/api/doxy-api.conf.in                  |  2 +
 doc/api/dts/custom.css                    |  1 +
 doc/api/dts/meson.build                   | 31 +++++++++
 doc/api/meson.build                       |  6 +-
 doc/guides/conf.py                        | 44 +++++++++++-
 doc/guides/contributing/documentation.rst |  2 +
 doc/guides/contributing/patches.rst       |  4 ++
 doc/guides/tools/dts.rst                  | 39 ++++++++++-
 doc/meson.build                           |  1 +
 13 files changed, 217 insertions(+), 3 deletions(-)
 create mode 100755 buildtools/get-dts-runtime-deps.py
 create mode 120000 doc/api/dts/custom.css
 create mode 100644 doc/api/dts/meson.build

diff --git a/buildtools/call-sphinx-build.py b/buildtools/call-sphinx-build.py
index 623e7363ee..154e9f907b 100755
--- a/buildtools/call-sphinx-build.py
+++ b/buildtools/call-sphinx-build.py
@@ -15,6 +15,8 @@
 
 # set the version in environment for sphinx to pick up
 os.environ['DPDK_VERSION'] = version
+if 'dts' in src:
+    os.environ['DTS_BUILD'] = "y"
 
 sphinx_cmd = [sphinx] + extra_args
 
diff --git a/buildtools/get-dts-runtime-deps.py b/buildtools/get-dts-runtime-deps.py
new file mode 100755
index 0000000000..68244480a3
--- /dev/null
+++ b/buildtools/get-dts-runtime-deps.py
@@ -0,0 +1,84 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2024 PANTHEON.tech s.r.o.
+#
+
+"""Utilities for DTS dependencies.
+
+The module can be used as an executable script,
+which verifies that the running Python version meets the version requirement of DTS.
+The script exits with the standard exit codes in this mode (0 is success, 1 is failure).
+
+The module also contains a function, get_missing_imports,
+which looks for runtime dependencies in the DTS pyproject.toml file
+and returns a list of module names used in an import statement (import packages) that are missing.
+This function is not used when the module is run as a script and is available to be imported.
+"""
+
+import configparser
+import importlib.metadata
+import importlib.util
+import os.path
+import platform
+
+from packaging.version import Version
+
+_VERSION_COMPARISON_CHARS = '^<>='
+_EXTRA_DEPS = {'invoke': '>=1.3', 'paramiko': '>=2.4'}
+_DPDK_ROOT = os.path.dirname(os.path.dirname(__file__))
+_DTS_DEP_FILE_PATH = os.path.join(_DPDK_ROOT, 'dts', 'pyproject.toml')
+
+
+def _get_dependencies(cfg_file_path):
+    cfg = configparser.ConfigParser()
+    with open(cfg_file_path) as f:
+        dts_deps_file_str = f.read()
+        dts_deps_file_str = dts_deps_file_str.replace("\n]", "]")
+        cfg.read_string(dts_deps_file_str)
+
+    deps_section = cfg['tool.poetry.dependencies']
+    return {dep: deps_section[dep].strip('"\'') for dep in deps_section}
+
+
+def get_missing_imports():
+    """Get missing DTS import packages from third party libraries.
+
+    Scan the DTS pyproject.toml file for dependencies and find those that are not installed.
+    The dependencies in pyproject.toml are listed by their distribution package names,
+    but the function finds the associated import packages - those used in import statements.
+
+    The function is not used when the module is run as a script. It should be imported.
+
+    Returns:
+        A list of missing import packages.
+    """
+    missing_imports = []
+    req_deps = _get_dependencies(_DTS_DEP_FILE_PATH)
+    req_deps.pop('python')
+
+    for req_dep, req_ver in (req_deps | _EXTRA_DEPS).items():
+        try:
+            req_ver = Version(req_ver.strip(_VERSION_COMPARISON_CHARS))
+            found_dep_ver = Version(importlib.metadata.version(req_dep))
+            if found_dep_ver < req_ver:
+                print(
+                    f'The version "{found_dep_ver}" of package "{req_dep}" '
+                    f'is lower than required "{req_ver}".'
+                )
+        except importlib.metadata.PackageNotFoundError:
+            print(f'Package "{req_dep}" not found.')
+            missing_imports.append(req_dep.lower().replace('-', '_'))
+
+    return missing_imports
+
+
+if __name__ == '__main__':
+    python_version = _get_dependencies(_DTS_DEP_FILE_PATH).pop('python')
+    if python_version:
+        sys_ver = Version(platform.python_version())
+        req_ver = Version(python_version.strip(_VERSION_COMPARISON_CHARS))
+        if sys_ver < req_ver:
+            print(
+                f'The available Python version "{sys_ver}" is lower than required "{req_ver}".'
+            )
+            exit(1)
diff --git a/buildtools/meson.build b/buildtools/meson.build
index 3adf34e1a8..6b938d767c 100644
--- a/buildtools/meson.build
+++ b/buildtools/meson.build
@@ -24,6 +24,7 @@ get_numa_count_cmd = py3 + files('get-numa-count.py')
 get_test_suites_cmd = py3 + files('get-test-suites.py')
 has_hugepages_cmd = py3 + files('has-hugepages.py')
 cmdline_gen_cmd = py3 + files('dpdk-cmdline-gen.py')
+get_dts_runtime_deps = py3 + files('get-dts-runtime-deps.py')
 
 # install any build tools that end-users might want also
 install_data([
diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
index f9f0300126..ab223bcdf7 100644
--- a/doc/api/doxy-api-index.md
+++ b/doc/api/doxy-api-index.md
@@ -245,3 +245,6 @@ The public API headers are grouped by topics:
   [experimental APIs](@ref rte_compat.h),
   [ABI versioning](@ref rte_function_versioning.h),
   [version](@ref rte_version.h)
+
+- **tests**:
+  [**DTS**](@dts_api_main_page)
diff --git a/doc/api/doxy-api.conf.in b/doc/api/doxy-api.conf.in
index a8823c046f..c94f02d411 100644
--- a/doc/api/doxy-api.conf.in
+++ b/doc/api/doxy-api.conf.in
@@ -124,6 +124,8 @@ SEARCHENGINE            = YES
 SORT_MEMBER_DOCS        = NO
 SOURCE_BROWSER          = YES
 
+ALIASES                 = "dts_api_main_page=@DTS_API_MAIN_PAGE@"
+
 EXAMPLE_PATH            = @TOPDIR@/examples
 EXAMPLE_PATTERNS        = *.c
 EXAMPLE_RECURSIVE       = YES
diff --git a/doc/api/dts/custom.css b/doc/api/dts/custom.css
new file mode 120000
index 0000000000..3c9480c4a0
--- /dev/null
+++ b/doc/api/dts/custom.css
@@ -0,0 +1 @@
+../../guides/custom.css
\ No newline at end of file
diff --git a/doc/api/dts/meson.build b/doc/api/dts/meson.build
new file mode 100644
index 0000000000..f338eb69bf
--- /dev/null
+++ b/doc/api/dts/meson.build
@@ -0,0 +1,31 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2023 PANTHEON.tech s.r.o.
+
+sphinx = find_program('sphinx-build', required: get_option('enable_docs'))
+if not sphinx.found()
+    subdir_done()
+endif
+
+python_ver_satisfied = run_command(get_dts_runtime_deps, check: false).returncode()
+if python_ver_satisfied != 0
+    subdir_done()
+endif
+
+cdata.set('DTS_API_MAIN_PAGE', join_paths('..', 'dts', 'html', 'index.html'))
+
+extra_sphinx_args = ['-E', '-c', join_paths(doc_source_dir, 'guides')]
+if get_option('werror')
+    extra_sphinx_args += '-W'
+endif
+
+htmldir = join_paths(get_option('datadir'), 'doc', 'dpdk', 'dts')
+dts_api_html = custom_target('dts_api_html',
+        output: 'html',
+        command: [sphinx_wrapper, sphinx, meson.project_version(),
+            meson.current_source_dir(), meson.current_build_dir(), extra_sphinx_args],
+        build_by_default: get_option('enable_docs'),
+        install: get_option('enable_docs'),
+        install_dir: htmldir)
+
+doc_targets += dts_api_html
+doc_target_names += 'DTS_API_HTML'
diff --git a/doc/api/meson.build b/doc/api/meson.build
index 5b50692df9..71b861e42b 100644
--- a/doc/api/meson.build
+++ b/doc/api/meson.build
@@ -1,6 +1,11 @@
 # SPDX-License-Identifier: BSD-3-Clause
 # Copyright(c) 2018 Luca Boccassi <bluca@debian.org>
 
+# initialize common Doxygen configuration
+cdata = configuration_data()
+
+subdir('dts')
+
 doxygen = find_program('doxygen', required: get_option('enable_docs'))
 
 if not doxygen.found()
@@ -30,7 +35,6 @@ example = custom_target('examples.dox',
         build_by_default: get_option('enable_docs'))
 
 # set up common Doxygen configuration
-cdata = configuration_data()
 cdata.set('VERSION', meson.project_version())
 cdata.set('API_EXAMPLES', join_paths(dpdk_build_root, 'doc', 'api', 'examples.dox'))
 cdata.set('OUTPUT', join_paths(dpdk_build_root, 'doc', 'api'))
diff --git a/doc/guides/conf.py b/doc/guides/conf.py
index 0f7ff5282d..d7f3030838 100644
--- a/doc/guides/conf.py
+++ b/doc/guides/conf.py
@@ -10,7 +10,7 @@
 from os.path import basename
 from os.path import dirname
 from os.path import join as path_join
-from sys import argv, stderr
+from sys import argv, stderr, path
 
 import configparser
 
@@ -58,6 +58,48 @@
              ("tools/devbind", "dpdk-devbind",
               "check device status and bind/unbind them from drivers", "", 8)]
 
+# DTS API docs additional configuration
+if environ.get('DTS_BUILD'):
+    extensions = ['sphinx.ext.napoleon', 'sphinx.ext.autodoc', 'sphinx.ext.intersphinx']
+    # Napoleon enables the Google format of Python docstrings.
+    napoleon_numpy_docstring = False
+    napoleon_attr_annotations = True
+    napoleon_preprocess_types = True
+
+    # Autodoc pulls documentation from code.
+    autodoc_default_options = {
+        'members': True,
+        'member-order': 'bysource',
+        'show-inheritance': True,
+    }
+    autodoc_class_signature = 'separated'
+    autodoc_typehints = 'both'
+    autodoc_typehints_format = 'short'
+    autodoc_typehints_description_target = 'documented'
+
+    # Intersphinx allows linking to external projects, such as Python docs.
+    intersphinx_mapping = {'python': ('https://docs.python.org/3', None)}
+
+    # DTS docstring options.
+    add_module_names = False
+    toc_object_entries = True
+    toc_object_entries_show_parents = 'hide'
+    # DTS Sidebar config.
+    html_theme_options = {
+        'collapse_navigation': False,
+        'navigation_depth': -1,  # unlimited depth
+    }
+
+    # Add path to DTS sources so that Sphinx can find them.
+    dpdk_root = dirname(dirname(dirname(__file__)))
+    path.append(path_join(dpdk_root, 'dts'))
+
+    # Get missing DTS dependencies. Add path to buildtools to find the get_missing_imports function.
+    path.append(path_join(dpdk_root, 'buildtools'))
+    import importlib
+    # Ignore missing imports from DTS dependencies.
+    autodoc_mock_imports = importlib.import_module('get-dts-runtime-deps').get_missing_imports()
+
 
 # ####### :numref: fallback ########
 # The following hook functions add some simple handling for the :numref:
diff --git a/doc/guides/contributing/documentation.rst b/doc/guides/contributing/documentation.rst
index 68454ae0d5..7b287ce631 100644
--- a/doc/guides/contributing/documentation.rst
+++ b/doc/guides/contributing/documentation.rst
@@ -133,6 +133,8 @@ added to by the developer.
 Building the Documentation
 --------------------------
 
+.. _doc_dependencies:
+
 Dependencies
 ~~~~~~~~~~~~
 
diff --git a/doc/guides/contributing/patches.rst b/doc/guides/contributing/patches.rst
index 04c66bebc4..6629928bee 100644
--- a/doc/guides/contributing/patches.rst
+++ b/doc/guides/contributing/patches.rst
@@ -499,6 +499,10 @@ The script usage is::
 For both of the above scripts, the -n option is used to specify a number of commits from HEAD,
 and the -r option allows the user specify a ``git log`` range.
 
+Additionally, when contributing to the DTS tool, patches should also be checked using
+the ``dts-check-format.sh`` script in the ``devtools`` directory of the DPDK repo.
+To run the script, extra :ref:`Python dependencies <dts_deps>` are needed.
+
 .. _contrib_check_compilation:
 
 Checking Compilation
diff --git a/doc/guides/tools/dts.rst b/doc/guides/tools/dts.rst
index 515b15e4d8..9e8929f567 100644
--- a/doc/guides/tools/dts.rst
+++ b/doc/guides/tools/dts.rst
@@ -54,6 +54,7 @@ DTS uses Poetry as its Python dependency management.
 Python build/development and runtime environments are the same and DTS development environment,
 DTS runtime environment or just plain DTS environment are used interchangeably.
 
+.. _dts_deps:
 
 Setting up DTS environment
 ~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -291,8 +292,15 @@ When adding code to the DTS framework, pay attention to the rest of the code
 and try not to divert much from it.
 The :ref:`DTS developer tools <dts_dev_tools>` will issue warnings
 when some of the basics are not met.
+You should also build the :ref:`API documentation <building_api_docs>`
+to address any issues found during the build.
 
-The code must be properly documented with docstrings.
+The API documentation, which is a helpful reference when developing, may be accessed
+in the code directly or generated with the :ref:`API docs build steps <building_api_docs>`.
+When adding new files or modifying the directory structure,
+the corresponding changes must be made to DTS API doc sources in ``doc/api/dts``.
+
+Speaking of which, the code must be properly documented with docstrings.
 The style must conform to the `Google style
 <https://google.github.io/styleguide/pyguide.html#38-comments-and-docstrings>`_.
 See an example of the style `here
@@ -427,6 +435,35 @@ the DTS code check and format script.
 Refer to the script for usage: ``devtools/dts-check-format.sh -h``.
 
 
+.. _building_api_docs:
+
+Building DTS API docs
+---------------------
+
+The documentation is built using the standard DPDK build system.
+See :doc:`../linux_gsg/build_dpdk` for more details on compiling DPDK with meson.
+
+The :ref:`doc build dependencies <doc_dependencies>` may be installed with Poetry:
+
+.. code-block:: console
+
+   poetry install --no-root --only docs
+   poetry install --no-root --with docs  # an alternative that will also install DTS dependencies
+   poetry shell
+
+After executing the meson command, build the documentation with:
+
+.. code-block:: console
+
+   ninja -C build doc
+
+The output is generated in ``build/doc/api/dts/html``.
+
+.. note::
+
+   Make sure to fix any Sphinx warnings when adding or updating docstrings.
+
+
 Configuration Schema
 --------------------
 
diff --git a/doc/meson.build b/doc/meson.build
index 6f74706aa2..1e0cfa4127 100644
--- a/doc/meson.build
+++ b/doc/meson.build
@@ -1,6 +1,7 @@
 # SPDX-License-Identifier: BSD-3-Clause
 # Copyright(c) 2018 Luca Boccassi <bluca@debian.org>
 
+doc_source_dir = meson.current_source_dir()
 doc_targets = []
 doc_target_names = []
 subdir('api')
-- 
2.34.1



* [RFC 6/6] ring: minimize reads of the counterpart cache-line
  @ 2024-08-15  8:53  2% ` Konstantin Ananyev
From: Konstantin Ananyev @ 2024-08-15  8:53 UTC (permalink / raw)
  To: dev
  Cc: honnappa.nagarahalli, jerinj, hemant.agrawal, bruce.richardson,
	drc, ruifeng.wang, mb, Konstantin Ananyev

From: Konstantin Ananyev <konstantin.ananyev@huawei.com>

Note upfront: this change shouldn't affect the rte_ring public API.
However, as the layout of public structures has changed, it is an ABI
breakage.

This is an attempt to implement the rte_ring optimization
that was suggested by Morten and discussed on this mailing
list a while ago.
The idea is to optimize MP/SP & MC/SC ring enqueue/dequeue ops
by storing, along with the head, its Cached Foreign Tail (CFT) value.
I.e. for the producer we cache the consumer tail value and vice versa.
To avoid races, the head and CFT values are read/written using atomic
64-bit ops.
In theory, that might help by reducing the number of times the producer
needs to access the consumer's cache-line and vice versa.
In practice, I haven't seen any impressive boost so far:
 - ring_perf_autotest micro-bench: results are a mixed bag,
   some are a bit better, some are worse.
 - [so]ring_stress_autotest micro-benchmarks: ~10-15% improvement
 - l3fwd in wqorder/wqundorder (see previous patch for details):
   no real difference.
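
The head/CFT idea described above can be modelled in a few lines of
Python. This is only a single-threaded sketch for illustration, not the
code from the patch: the helper names (pack, unpack, move_head) are
hypothetical, the atomic 64-bit load and compare-exchange are replaced
by plain integers, only the RTE_RING_QUEUE_FIXED behaviour is modelled,
and placing pos in the low 32 bits assumes a little-endian layout.

```python
MASK32 = 0xFFFFFFFF


def pack(pos, cft):
    """Pack head position (low 32 bits) and cached foreign tail (high 32 bits)."""
    return ((cft & MASK32) << 32) | (pos & MASK32)


def unpack(raw):
    """Return (pos, cft) from a packed 64-bit value."""
    return raw & MASK32, (raw >> 32) & MASK32


def move_head(raw_head, foreign_tail, capacity, n):
    """Model one producer head move with fixed-size behaviour.

    Returns (new_raw_head, n_reserved, foreign_tail_was_read).
    """
    old_head, tail = unpack(raw_head)

    # Attempt 1: only the cached foreign tail is used, so the
    # counterpart's cache line is not touched.
    entries = (capacity + tail - old_head) & MASK32
    touched = False

    if n > entries:
        # Attempt 2: the cached value may be stale, read the real tail.
        tail = foreign_tail & MASK32
        entries = (capacity + tail - old_head) & MASK32
        touched = True
        if n > entries:
            n = 0

    if n == 0:
        return raw_head, 0, touched

    # New head position and (possibly refreshed) CFT are stored together.
    return pack(old_head + n, tail), n, touched
```

In this model the foreign tail is read only when the cached value does
not leave enough room, which is the case the patch tries to make rare.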

So far my testing scope has been quite limited: I tried it only
on x86 machines. So can I ask all interested parties,
i.e. different platform vendors (ARM, PPC, etc.)
and people who use rte_ring extensively, to give it a try and come back
with feedback.

If there are no real performance improvements on
any platform we support, or if some problems are encountered,
I am OK with dropping this patch.

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@huawei.com>
---
 drivers/net/mlx5/mlx5_hws_cnt.h   |  5 ++--
 drivers/net/ring/rte_eth_ring.c   |  2 +-
 lib/ring/rte_ring.c               |  6 ++--
 lib/ring/rte_ring_core.h          | 12 +++++++-
 lib/ring/rte_ring_generic_pvt.h   | 46 +++++++++++++++++++++----------
 lib/ring/rte_ring_peek_elem_pvt.h |  4 +--
 lib/ring/soring.c                 | 31 +++++++++++++++------
 lib/ring/soring.h                 |  4 +--
 8 files changed, 77 insertions(+), 33 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_hws_cnt.h b/drivers/net/mlx5/mlx5_hws_cnt.h
index 996ac8dd9a..663146563c 100644
--- a/drivers/net/mlx5/mlx5_hws_cnt.h
+++ b/drivers/net/mlx5/mlx5_hws_cnt.h
@@ -388,11 +388,12 @@ __mlx5_hws_cnt_pool_enqueue_revert(struct rte_ring *r, unsigned int n,
 
 	MLX5_ASSERT(r->prod.sync_type == RTE_RING_SYNC_ST);
 	MLX5_ASSERT(r->cons.sync_type == RTE_RING_SYNC_ST);
-	current_head = rte_atomic_load_explicit(&r->prod.head, rte_memory_order_relaxed);
+	current_head = rte_atomic_load_explicit(&r->prod.head.val.pos,
+			rte_memory_order_relaxed);
 	MLX5_ASSERT(n <= r->capacity);
 	MLX5_ASSERT(n <= rte_ring_count(r));
 	revert2head = current_head - n;
-	r->prod.head = revert2head; /* This ring should be SP. */
+	r->prod.head.val.pos = revert2head; /* This ring should be SP. */
 	__rte_ring_get_elem_addr(r, revert2head, sizeof(cnt_id_t), n,
 			&zcd->ptr1, &zcd->n1, &zcd->ptr2);
 	/* Update tail */
diff --git a/drivers/net/ring/rte_eth_ring.c b/drivers/net/ring/rte_eth_ring.c
index 1346a0dba3..31009e90d2 100644
--- a/drivers/net/ring/rte_eth_ring.c
+++ b/drivers/net/ring/rte_eth_ring.c
@@ -325,7 +325,7 @@ eth_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 	 */
 	pmc->addr = &rng->prod.head;
 	pmc->size = sizeof(rng->prod.head);
-	pmc->opaque[0] = rng->prod.head;
+	pmc->opaque[0] = rng->prod.head.val.pos;
 	pmc->fn = ring_monitor_callback;
 	return 0;
 }
diff --git a/lib/ring/rte_ring.c b/lib/ring/rte_ring.c
index aebb6d6728..cb2c39c7ad 100644
--- a/lib/ring/rte_ring.c
+++ b/lib/ring/rte_ring.c
@@ -102,7 +102,7 @@ reset_headtail(void *p)
 	switch (ht->sync_type) {
 	case RTE_RING_SYNC_MT:
 	case RTE_RING_SYNC_ST:
-		ht->head = 0;
+		ht->head.raw = 0;
 		ht->tail = 0;
 		break;
 	case RTE_RING_SYNC_MT_RTS:
@@ -373,9 +373,9 @@ rte_ring_dump(FILE *f, const struct rte_ring *r)
 	fprintf(f, "  size=%"PRIu32"\n", r->size);
 	fprintf(f, "  capacity=%"PRIu32"\n", r->capacity);
 	fprintf(f, "  ct=%"PRIu32"\n", r->cons.tail);
-	fprintf(f, "  ch=%"PRIu32"\n", r->cons.head);
+	fprintf(f, "  ch=%"PRIu32"\n", r->cons.head.val.pos);
 	fprintf(f, "  pt=%"PRIu32"\n", r->prod.tail);
-	fprintf(f, "  ph=%"PRIu32"\n", r->prod.head);
+	fprintf(f, "  ph=%"PRIu32"\n", r->prod.head.val.pos);
 	fprintf(f, "  used=%u\n", rte_ring_count(r));
 	fprintf(f, "  avail=%u\n", rte_ring_free_count(r));
 }
diff --git a/lib/ring/rte_ring_core.h b/lib/ring/rte_ring_core.h
index 270869d214..b88a1bc352 100644
--- a/lib/ring/rte_ring_core.h
+++ b/lib/ring/rte_ring_core.h
@@ -66,8 +66,17 @@ enum rte_ring_sync_type {
  * Depending on sync_type format of that structure might be different,
  * but offset for *sync_type* and *tail* values should remain the same.
  */
+union __rte_ring_head_cft {
+	/** raw 8B value to read/write *pos* and *cft* as one atomic op */
+	alignas(sizeof(uint64_t)) RTE_ATOMIC(uint64_t) raw;
+	struct {
+		uint32_t pos; /**< head position */
+		uint32_t cft; /**< cached foreign tail value*/
+	} val;
+};
+
 struct rte_ring_headtail {
-	volatile RTE_ATOMIC(uint32_t) head;      /**< prod/consumer head. */
+	uint32_t __unused;
 	volatile RTE_ATOMIC(uint32_t) tail;      /**< prod/consumer tail. */
 	union {
 		/** sync type of prod/cons */
@@ -75,6 +84,7 @@ struct rte_ring_headtail {
 		/** deprecated -  True if single prod/cons */
 		uint32_t single;
 	};
+	union __rte_ring_head_cft head;
 };
 
 union __rte_ring_rts_poscnt {
diff --git a/lib/ring/rte_ring_generic_pvt.h b/lib/ring/rte_ring_generic_pvt.h
index 12f3595926..e70f4ff32c 100644
--- a/lib/ring/rte_ring_generic_pvt.h
+++ b/lib/ring/rte_ring_generic_pvt.h
@@ -38,17 +38,18 @@ __rte_ring_headtail_move_head(struct rte_ring_headtail *d,
 {
 	unsigned int max = n;
 	int success;
+	uint32_t tail;
+	union __rte_ring_head_cft nh, oh;
+
+	oh.raw = rte_atomic_load_explicit(&d->head.raw,
+			rte_memory_order_acquire);
 
 	do {
 		/* Reset n to the initial burst count */
 		n = max;
 
-		*old_head = d->head;
-
-		/* add rmb barrier to avoid load/load reorder in weak
-		 * memory model. It is noop on x86
-		 */
-		rte_smp_rmb();
+		*old_head = oh.val.pos;
+		tail = oh.val.cft;
 
 		/*
 		 *  The subtraction is done between two unsigned 32bits value
@@ -56,24 +57,41 @@ __rte_ring_headtail_move_head(struct rte_ring_headtail *d,
 		 * *old_head > s->tail). So 'free_entries' is always between 0
 		 * and capacity (which is < size).
 		 */
-		*entries = (capacity + s->tail - *old_head);
+		*entries = (capacity + tail - *old_head);
 
-		/* check that we have enough room in ring */
-		if (unlikely(n > *entries))
-			n = (behavior == RTE_RING_QUEUE_FIXED) ?
+		/* attempt #1: check that we have enough room with
+		 * cached-foreign-tail value.
+		 * Note that actual tail value can go forward till we cached
+		 * it, in that case we might have to update our cached value.
+		 */
+		if (unlikely(n > *entries)) {
+
+			tail = rte_atomic_load_explicit(&s->tail,
+				rte_memory_order_relaxed);
+			*entries = (capacity + tail - *old_head);
+
+			/* attempt #2: check that we have enough room in ring */
+			if (unlikely(n > *entries))
+				n = (behavior == RTE_RING_QUEUE_FIXED) ?
 					0 : *entries;
+		}
 
 		if (n == 0)
 			return 0;
 
 		*new_head = *old_head + n;
+		nh.val.pos = *new_head;
+		nh.val.cft = tail;
+
 		if (is_st) {
-			d->head = *new_head;
+			d->head.raw = nh.raw;
 			success = 1;
 		} else
-			success = rte_atomic32_cmpset(
-					(uint32_t *)(uintptr_t)&d->head,
-					*old_head, *new_head);
+			success = rte_atomic_compare_exchange_strong_explicit(
+				&d->head.raw, (uint64_t *)(uintptr_t)&oh.raw,
+				nh.raw, rte_memory_order_acquire,
+				rte_memory_order_acquire);
+
 	} while (unlikely(success == 0));
 	return n;
 }
diff --git a/lib/ring/rte_ring_peek_elem_pvt.h b/lib/ring/rte_ring_peek_elem_pvt.h
index b5f0822b7e..e4dd0ae094 100644
--- a/lib/ring/rte_ring_peek_elem_pvt.h
+++ b/lib/ring/rte_ring_peek_elem_pvt.h
@@ -33,7 +33,7 @@ __rte_ring_st_get_tail(struct rte_ring_headtail *ht, uint32_t *tail,
 {
 	uint32_t h, n, t;
 
-	h = ht->head;
+	h = ht->head.val.pos;
 	t = ht->tail;
 	n = h - t;
 
@@ -58,7 +58,7 @@ __rte_ring_st_set_head_tail(struct rte_ring_headtail *ht, uint32_t tail,
 	RTE_SET_USED(enqueue);
 
 	pos = tail + num;
-	ht->head = pos;
+	ht->head.val.pos = pos;
 	rte_atomic_store_explicit(&ht->tail, pos, rte_memory_order_release);
 }
 
diff --git a/lib/ring/soring.c b/lib/ring/soring.c
index 929bde9697..baa449c872 100644
--- a/lib/ring/soring.c
+++ b/lib/ring/soring.c
@@ -90,7 +90,8 @@ __rte_soring_stage_finalize(struct soring_stage_headtail *sht,
 	 * already finished.
 	 */
 
-	head = rte_atomic_load_explicit(&sht->head, rte_memory_order_relaxed);
+	head = rte_atomic_load_explicit(&sht->head.val.pos,
+			rte_memory_order_relaxed);
 	n = RTE_MIN(head - ot.pos, maxn);
 	for (i = 0, tail = ot.pos; i < n; i += k, tail += k) {
 
@@ -213,22 +214,36 @@ __rte_soring_stage_move_head(struct soring_stage_headtail *d,
 	uint32_t *old_head, uint32_t *new_head, uint32_t *avail)
 {
 	uint32_t n, tail;
+	union __rte_ring_head_cft nh, oh;
 
-	*old_head = rte_atomic_load_explicit(&d->head,
+	oh.raw = rte_atomic_load_explicit(&d->head.raw,
 			rte_memory_order_acquire);
 
 	do {
 		n = num;
-		tail = rte_atomic_load_explicit(&s->tail,
-				rte_memory_order_relaxed);
+		*old_head = oh.val.pos;
+		tail = oh.val.cft;
 		*avail = capacity + tail - *old_head;
-		if (n > *avail)
-			n = (behavior == RTE_RING_QUEUE_FIXED) ? 0 : *avail;
+
+		if (n > *avail) {
+			tail = rte_atomic_load_explicit(&s->tail,
+				rte_memory_order_relaxed);
+			*avail = capacity + tail - *old_head;
+
+			if (n > *avail)
+				n = (behavior == RTE_RING_QUEUE_FIXED) ?
+					0 : *avail;
+		}
+
 		if (n == 0)
 			return 0;
+
 		*new_head = *old_head + n;
-	} while (rte_atomic_compare_exchange_strong_explicit(&d->head,
-			old_head, *new_head, rte_memory_order_acquire,
+		nh.val.pos = *new_head;
+		nh.val.cft = tail;
+
+	} while (rte_atomic_compare_exchange_strong_explicit(&d->head.raw,
+			&oh.raw, nh.raw, rte_memory_order_acquire,
 			rte_memory_order_acquire) == 0);
 
 	return n;
diff --git a/lib/ring/soring.h b/lib/ring/soring.h
index 3a3f6efa76..0fb333aa71 100644
--- a/lib/ring/soring.h
+++ b/lib/ring/soring.h
@@ -60,8 +60,8 @@ union soring_stage_tail {
 
 struct soring_stage_headtail {
 	volatile union soring_stage_tail tail;
-	enum rte_ring_sync_type unused;  /**< unused */
-	volatile RTE_ATOMIC(uint32_t) head;
+	enum rte_ring_sync_type __unused;  /**< unused */
+	union __rte_ring_head_cft head;
 };
 
 /**
-- 
2.35.3



* Re: [dpdk-dev] [PATCH v3 5/5] devtools: test different build types
  @ 2024-08-15 16:26  0%     ` Stephen Hemminger
From: Stephen Hemminger @ 2024-08-15 16:26 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev, bruce.richardson, david.marchand, Andrew Rybchenko

On Sun,  8 Aug 2021 14:51:38 +0200
Thomas Monjalon <thomas@monjalon.net> wrote:

> All builds were of type debugoptimized.
> It is kept only for builds having an ABI check.
> Others will have the default build type (release),
> except if specified differently as in the x86 generic build
> which will be a test of the non-optimized debug build type.
> Some static builds will test the minsize build type.
> 
> Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
> Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> 
> ---
> 
> This patch cannot be merged now because it makes clang 11.1.0 crashing.
> ---

Dropping this patch from patchwork because of the clang crash.


* [PATCH v18 5/5] dts: add API doc generation
  @ 2024-08-20 13:18  2%   ` Juraj Linkeš
From: Juraj Linkeš @ 2024-08-20 13:18 UTC (permalink / raw)
  To: thomas, Honnappa.Nagarahalli, bruce.richardson, jspewock, probb,
	paul.szczepanek, Luca.Vizzarro, npratte
  Cc: dev, Juraj Linkeš, Dean Marx

The tool used to generate DTS API docs is Sphinx, which is already in
use in DPDK. The same configuration is used to preserve style with one
DTS-specific configuration (so that the DPDK docs are unchanged) that
modifies how the sidebar displays the content. There's other Sphinx
configuration related to Python docstrings which doesn't affect DPDK doc
build. All new configuration is in a conditional block, applied only
when DTS API docs are built to not interfere with DPDK doc build.

Sphinx generates the documentation from Python docstrings. The docstring
format is the Google format [0] which requires the sphinx.ext.napoleon
extension. The other extension, sphinx.ext.intersphinx, enables linking
to objects in external documentation, such as the Python documentation.
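
For readers unfamiliar with the Google docstring format mentioned above,
a minimal example follows. The function split_version is hypothetical
and exists only to show the Args/Returns/Raises sections that
sphinx.ext.napoleon understands; it is not part of DTS.

```python
def split_version(version: str, sep: str = ".") -> list[int]:
    """Split a version string into its numeric components.

    Args:
        version: The version string, for example ``"24.07"``.
        sep: The separator between components.

    Returns:
        The components converted to integers, in their original order.

    Raises:
        ValueError: If a component is not a decimal number.
    """
    return [int(part) for part in version.split(sep)]
```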

There is one requirement for building DTS docs - the same Python version
as DTS or higher, because Sphinx's autodoc extension imports the code.
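
A rough sketch of such a Python version check is shown below. It is a
simplification that uses only the standard library, whereas the script
in this patch relies on packaging.version; the helper names
parse_version and python_is_new_enough are hypothetical. As in the
patch, the requirement string is treated purely as a lower bound after
its comparison characters are stripped.

```python
import itertools
import platform

_VERSION_COMPARISON_CHARS = "^<>="


def parse_version(text):
    """Turn '3.10.4' into (3, 10, 4); pad short versions with zeros."""
    parts = []
    for part in text.split("."):
        # Keep only the leading digits, so '0rc1' becomes 0.
        digits = "".join(itertools.takewhile(str.isdigit, part))
        if not digits:
            break
        parts.append(int(digits))
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def python_is_new_enough(requirement, running=None):
    """Check the running Python against a requirement such as '^3.10'."""
    if running is None:
        running = platform.python_version()
    minimum = parse_version(requirement.strip(_VERSION_COMPARISON_CHARS + " "))
    return parse_version(running) >= minimum
```

Called with only the requirement string, the function checks the
interpreter that is actually running the doc build.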

The dependencies needed to import the code don't have to be satisfied,
as the autodoc extension allows us to mock the imports. The missing
packages are taken from the DTS pyproject.toml file.

And finally, the DTS API docs can be accessed from the DPDK API doxygen
page.

[0] https://google.github.io/styleguide/pyguide.html#38-comments-and-docstrings

Signed-off-by: Juraj Linkeš <juraj.linkes@pantheon.tech>
Reviewed-by: Jeremy Spewock <jspewock@iol.unh.edu>
Reviewed-by: Dean Marx <dmarx@iol.unh.edu>
---
 buildtools/call-sphinx-build.py           |  2 +
 buildtools/get-dts-runtime-deps.py        | 95 +++++++++++++++++++++++
 buildtools/meson.build                    |  1 +
 doc/api/doxy-api-index.md                 |  3 +
 doc/api/doxy-api.conf.in                  |  2 +
 doc/api/dts/custom.css                    |  1 +
 doc/api/dts/meson.build                   | 31 ++++++++
 doc/api/meson.build                       |  6 +-
 doc/guides/conf.py                        | 44 ++++++++++-
 doc/guides/contributing/documentation.rst |  2 +
 doc/guides/contributing/patches.rst       |  4 +
 doc/guides/tools/dts.rst                  | 39 +++++++++-
 doc/meson.build                           |  1 +
 13 files changed, 228 insertions(+), 3 deletions(-)
 create mode 100755 buildtools/get-dts-runtime-deps.py
 create mode 120000 doc/api/dts/custom.css
 create mode 100644 doc/api/dts/meson.build

diff --git a/buildtools/call-sphinx-build.py b/buildtools/call-sphinx-build.py
index 623e7363ee..154e9f907b 100755
--- a/buildtools/call-sphinx-build.py
+++ b/buildtools/call-sphinx-build.py
@@ -15,6 +15,8 @@
 
 # set the version in environment for sphinx to pick up
 os.environ['DPDK_VERSION'] = version
+if 'dts' in src:
+    os.environ['DTS_BUILD'] = "y"
 
 sphinx_cmd = [sphinx] + extra_args
 
diff --git a/buildtools/get-dts-runtime-deps.py b/buildtools/get-dts-runtime-deps.py
new file mode 100755
index 0000000000..6f4d3def29
--- /dev/null
+++ b/buildtools/get-dts-runtime-deps.py
@@ -0,0 +1,95 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2024 PANTHEON.tech s.r.o.
+#
+
+"""Utilities for DTS dependencies.
+
+The module can be used as an executable script,
+which verifies that the running Python version meets the version requirement of DTS.
+The script exits with the standard exit codes in this mode (0 is success, 1 is failure).
+
+The module also contains a function, get_missing_imports,
+which looks for runtime dependencies in the DTS pyproject.toml file
+and returns a list of module names used in an import statement (import packages) that are missing.
+This function is not used when the module is run as a script and is available to be imported.
+"""
+
+import configparser
+import importlib.metadata
+import importlib.util
+import os.path
+import platform
+
+from packaging.version import Version
+
+_VERSION_COMPARISON_CHARS = '^<>='
+_EXTRA_DEPS = {
+    'invoke': {'version': '>=1.3'},
+    'paramiko': {'version': '>=2.4'},
+    'PyYAML': {'version': '^6.0', 'import_package': 'yaml'}
+}
+_DPDK_ROOT = os.path.dirname(os.path.dirname(__file__))
+_DTS_DEP_FILE_PATH = os.path.join(_DPDK_ROOT, 'dts', 'pyproject.toml')
+
+
+def _get_dependencies(cfg_file_path):
+    cfg = configparser.ConfigParser()
+    with open(cfg_file_path) as f:
+        dts_deps_file_str = f.read()
+        dts_deps_file_str = dts_deps_file_str.replace("\n]", "]")
+        cfg.read_string(dts_deps_file_str)
+
+    deps_section = cfg['tool.poetry.dependencies']
+    return {dep: {'version': deps_section[dep].strip('"\'')} for dep in deps_section}
+
+
+def get_missing_imports():
+    """Get missing DTS import packages from third party libraries.
+
+    Scan the DTS pyproject.toml file for dependencies and find those that are not installed.
+    The dependencies in pyproject.toml are listed by their distribution package names,
+    but the function finds the associated import packages - those used in import statements.
+
+    The function is not used when the module is run as a script. It should be imported.
+
+    Returns:
+        A list of missing import packages.
+    """
+    missing_imports = []
+    req_deps = _get_dependencies(_DTS_DEP_FILE_PATH)
+    req_deps.pop('python')
+
+    for req_dep, dep_data in (req_deps | _EXTRA_DEPS).items():
+        req_ver = dep_data['version']
+        try:
+            import_package = dep_data['import_package']
+        except KeyError:
+            import_package = req_dep
+        import_package = import_package.lower().replace('-', '_')
+
+        try:
+            req_ver = Version(req_ver.strip(_VERSION_COMPARISON_CHARS))
+            found_dep_ver = Version(importlib.metadata.version(req_dep))
+            if found_dep_ver < req_ver:
+                print(
+                    f'The version "{found_dep_ver}" of package "{req_dep}" '
+                    f'is lower than required "{req_ver}".'
+                )
+        except importlib.metadata.PackageNotFoundError:
+            print(f'Package "{req_dep}" not found.')
+            missing_imports.append(import_package)
+
+    return missing_imports
+
+
+if __name__ == '__main__':
+    python_version = _get_dependencies(_DTS_DEP_FILE_PATH).pop('python')
+    if python_version:
+        sys_ver = Version(platform.python_version())
+        req_ver = Version(python_version.strip(_VERSION_COMPARISON_CHARS))
+        if sys_ver < req_ver:
+            print(
+                f'The available Python version "{sys_ver}" is lower than required "{req_ver}".'
+            )
+            exit(1)
diff --git a/buildtools/meson.build b/buildtools/meson.build
index 3adf34e1a8..6b938d767c 100644
--- a/buildtools/meson.build
+++ b/buildtools/meson.build
@@ -24,6 +24,7 @@ get_numa_count_cmd = py3 + files('get-numa-count.py')
 get_test_suites_cmd = py3 + files('get-test-suites.py')
 has_hugepages_cmd = py3 + files('has-hugepages.py')
 cmdline_gen_cmd = py3 + files('dpdk-cmdline-gen.py')
+get_dts_runtime_deps = py3 + files('get-dts-runtime-deps.py')
 
 # install any build tools that end-users might want also
 install_data([
diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
index f9f0300126..ab223bcdf7 100644
--- a/doc/api/doxy-api-index.md
+++ b/doc/api/doxy-api-index.md
@@ -245,3 +245,6 @@ The public API headers are grouped by topics:
   [experimental APIs](@ref rte_compat.h),
   [ABI versioning](@ref rte_function_versioning.h),
   [version](@ref rte_version.h)
+
+- **tests**:
+  [**DTS**](@dts_api_main_page)
diff --git a/doc/api/doxy-api.conf.in b/doc/api/doxy-api.conf.in
index a8823c046f..c94f02d411 100644
--- a/doc/api/doxy-api.conf.in
+++ b/doc/api/doxy-api.conf.in
@@ -124,6 +124,8 @@ SEARCHENGINE            = YES
 SORT_MEMBER_DOCS        = NO
 SOURCE_BROWSER          = YES
 
+ALIASES                 = "dts_api_main_page=@DTS_API_MAIN_PAGE@"
+
 EXAMPLE_PATH            = @TOPDIR@/examples
 EXAMPLE_PATTERNS        = *.c
 EXAMPLE_RECURSIVE       = YES
diff --git a/doc/api/dts/custom.css b/doc/api/dts/custom.css
new file mode 120000
index 0000000000..3c9480c4a0
--- /dev/null
+++ b/doc/api/dts/custom.css
@@ -0,0 +1 @@
+../../guides/custom.css
\ No newline at end of file
diff --git a/doc/api/dts/meson.build b/doc/api/dts/meson.build
new file mode 100644
index 0000000000..f338eb69bf
--- /dev/null
+++ b/doc/api/dts/meson.build
@@ -0,0 +1,31 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2023 PANTHEON.tech s.r.o.
+
+sphinx = find_program('sphinx-build', required: get_option('enable_docs'))
+if not sphinx.found()
+    subdir_done()
+endif
+
+python_ver_satisfied = run_command(get_dts_runtime_deps, check: false).returncode()
+if python_ver_satisfied != 0
+    subdir_done()
+endif
+
+cdata.set('DTS_API_MAIN_PAGE', join_paths('..', 'dts', 'html', 'index.html'))
+
+extra_sphinx_args = ['-E', '-c', join_paths(doc_source_dir, 'guides')]
+if get_option('werror')
+    extra_sphinx_args += '-W'
+endif
+
+htmldir = join_paths(get_option('datadir'), 'doc', 'dpdk', 'dts')
+dts_api_html = custom_target('dts_api_html',
+        output: 'html',
+        command: [sphinx_wrapper, sphinx, meson.project_version(),
+            meson.current_source_dir(), meson.current_build_dir(), extra_sphinx_args],
+        build_by_default: get_option('enable_docs'),
+        install: get_option('enable_docs'),
+        install_dir: htmldir)
+
+doc_targets += dts_api_html
+doc_target_names += 'DTS_API_HTML'
diff --git a/doc/api/meson.build b/doc/api/meson.build
index 5b50692df9..71b861e42b 100644
--- a/doc/api/meson.build
+++ b/doc/api/meson.build
@@ -1,6 +1,11 @@
 # SPDX-License-Identifier: BSD-3-Clause
 # Copyright(c) 2018 Luca Boccassi <bluca@debian.org>
 
+# initialize common Doxygen configuration
+cdata = configuration_data()
+
+subdir('dts')
+
 doxygen = find_program('doxygen', required: get_option('enable_docs'))
 
 if not doxygen.found()
@@ -30,7 +35,6 @@ example = custom_target('examples.dox',
         build_by_default: get_option('enable_docs'))
 
 # set up common Doxygen configuration
-cdata = configuration_data()
 cdata.set('VERSION', meson.project_version())
 cdata.set('API_EXAMPLES', join_paths(dpdk_build_root, 'doc', 'api', 'examples.dox'))
 cdata.set('OUTPUT', join_paths(dpdk_build_root, 'doc', 'api'))
diff --git a/doc/guides/conf.py b/doc/guides/conf.py
index 0f7ff5282d..d7f3030838 100644
--- a/doc/guides/conf.py
+++ b/doc/guides/conf.py
@@ -10,7 +10,7 @@
 from os.path import basename
 from os.path import dirname
 from os.path import join as path_join
-from sys import argv, stderr
+from sys import argv, stderr, path
 
 import configparser
 
@@ -58,6 +58,48 @@
              ("tools/devbind", "dpdk-devbind",
               "check device status and bind/unbind them from drivers", "", 8)]
 
+# DTS API docs additional configuration
+if environ.get('DTS_BUILD'):
+    extensions = ['sphinx.ext.napoleon', 'sphinx.ext.autodoc', 'sphinx.ext.intersphinx']
+    # Napoleon enables the Google format of Python docstrings.
+    napoleon_numpy_docstring = False
+    napoleon_attr_annotations = True
+    napoleon_preprocess_types = True
+
+    # Autodoc pulls documentation from code.
+    autodoc_default_options = {
+        'members': True,
+        'member-order': 'bysource',
+        'show-inheritance': True,
+    }
+    autodoc_class_signature = 'separated'
+    autodoc_typehints = 'both'
+    autodoc_typehints_format = 'short'
+    autodoc_typehints_description_target = 'documented'
+
+    # Intersphinx allows linking to external projects, such as Python docs.
+    intersphinx_mapping = {'python': ('https://docs.python.org/3', None)}
+
+    # DTS docstring options.
+    add_module_names = False
+    toc_object_entries = True
+    toc_object_entries_show_parents = 'hide'
+    # DTS Sidebar config.
+    html_theme_options = {
+        'collapse_navigation': False,
+        'navigation_depth': -1,  # unlimited depth
+    }
+
+    # Add path to DTS sources so that Sphinx can find them.
+    dpdk_root = dirname(dirname(dirname(__file__)))
+    path.append(path_join(dpdk_root, 'dts'))
+
+    # Get missing DTS dependencies. Add path to buildtools to find the get_missing_imports function.
+    path.append(path_join(dpdk_root, 'buildtools'))
+    import importlib
+    # Ignore missing imports from DTS dependencies.
+    autodoc_mock_imports = importlib.import_module('get-dts-runtime-deps').get_missing_imports()
+
 
 # ####### :numref: fallback ########
 # The following hook functions add some simple handling for the :numref:
diff --git a/doc/guides/contributing/documentation.rst b/doc/guides/contributing/documentation.rst
index 68454ae0d5..7b287ce631 100644
--- a/doc/guides/contributing/documentation.rst
+++ b/doc/guides/contributing/documentation.rst
@@ -133,6 +133,8 @@ added to by the developer.
 Building the Documentation
 --------------------------
 
+.. _doc_dependencies:
+
 Dependencies
 ~~~~~~~~~~~~
 
diff --git a/doc/guides/contributing/patches.rst b/doc/guides/contributing/patches.rst
index 04c66bebc4..6629928bee 100644
--- a/doc/guides/contributing/patches.rst
+++ b/doc/guides/contributing/patches.rst
@@ -499,6 +499,10 @@ The script usage is::
 For both of the above scripts, the -n option is used to specify a number of commits from HEAD,
 and the -r option allows the user specify a ``git log`` range.
 
+Additionally, when contributing to the DTS tool, patches should also be checked using
+the ``dts-check-format.sh`` script in the ``devtools`` directory of the DPDK repo.
+To run the script, extra :ref:`Python dependencies <dts_deps>` are needed.
+
 .. _contrib_check_compilation:
 
 Checking Compilation
diff --git a/doc/guides/tools/dts.rst b/doc/guides/tools/dts.rst
index 515b15e4d8..9e8929f567 100644
--- a/doc/guides/tools/dts.rst
+++ b/doc/guides/tools/dts.rst
@@ -54,6 +54,7 @@ DTS uses Poetry as its Python dependency management.
 Python build/development and runtime environments are the same and DTS development environment,
 DTS runtime environment or just plain DTS environment are used interchangeably.
 
+.. _dts_deps:
 
 Setting up DTS environment
 ~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -291,8 +292,15 @@ When adding code to the DTS framework, pay attention to the rest of the code
 and try not to divert much from it.
 The :ref:`DTS developer tools <dts_dev_tools>` will issue warnings
 when some of the basics are not met.
+You should also build the :ref:`API documentation <building_api_docs>`
+to address any issues found during the build.
 
-The code must be properly documented with docstrings.
+The API documentation, which is a helpful reference when developing, may be accessed
+in the code directly or generated with the :ref:`API docs build steps <building_api_docs>`.
+When adding new files or modifying the directory structure,
+the corresponding changes must be made to DTS API doc sources in ``doc/api/dts``.
+
+Speaking of which, the code must be properly documented with docstrings.
 The style must conform to the `Google style
 <https://google.github.io/styleguide/pyguide.html#38-comments-and-docstrings>`_.
 See an example of the style `here
@@ -427,6 +435,35 @@ the DTS code check and format script.
 Refer to the script for usage: ``devtools/dts-check-format.sh -h``.
 
 
+.. _building_api_docs:
+
+Building DTS API docs
+---------------------
+
+The documentation is built using the standard DPDK build system.
+See :doc:`../linux_gsg/build_dpdk` for more details on compiling DPDK with meson.
+
+The :ref:`doc build dependencies <doc_dependencies>` may be installed with Poetry:
+
+.. code-block:: console
+
+   poetry install --no-root --only docs
+   poetry install --no-root --with docs  # an alternative that will also install DTS dependencies
+   poetry shell
+
+After executing the meson command, build the documentation with:
+
+.. code-block:: console
+
+   ninja -C build doc
+
+The output is generated in ``build/doc/api/dts/html``.
+
+.. note::
+
+   Make sure to fix any Sphinx warnings when adding or updating docstrings.
+
+
 Configuration Schema
 --------------------
 
diff --git a/doc/meson.build b/doc/meson.build
index 6f74706aa2..1e0cfa4127 100644
--- a/doc/meson.build
+++ b/doc/meson.build
@@ -1,6 +1,7 @@
 # SPDX-License-Identifier: BSD-3-Clause
 # Copyright(c) 2018 Luca Boccassi <bluca@debian.org>
 
+doc_source_dir = meson.current_source_dir()
 doc_targets = []
 doc_target_names = []
 subdir('api')
-- 
2.34.1



* [PATCH v19 5/5] dts: add API doc generation
  @ 2024-08-21 15:02  2%   ` Juraj Linkeš
From: Juraj Linkeš @ 2024-08-21 15:02 UTC (permalink / raw)
  To: thomas, Honnappa.Nagarahalli, jspewock, probb, paul.szczepanek,
	Luca.Vizzarro, npratte, dmarx, alex.chapman
  Cc: dev, Juraj Linkeš, Bruce Richardson

The tool used to generate DTS API docs is Sphinx, which is already in
use in DPDK. The same configuration is used to preserve style with one
DTS-specific configuration (so that the DPDK docs are unchanged) that
modifies how the sidebar displays the content. There's other Sphinx
configuration related to Python docstrings which doesn't affect DPDK doc
build. All new configuration is in a conditional block, applied only
when DTS API docs are built to not interfere with DPDK doc build.
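
The gating described above can be sketched roughly as follows. This is
a standalone illustration, not the code from the patch: the helper names
build_sphinx_env and select_extensions are made up, and the empty list
stands in for whatever the regular DPDK doc build configures.

```python
import os


def build_sphinx_env(src_dir, version, base_env=None):
    """Return the environment a wrapper could hand to sphinx-build."""
    env = dict(os.environ if base_env is None else base_env)
    env["DPDK_VERSION"] = version
    # Same substring test as the patch: any source path containing "dts" matches.
    if "dts" in src_dir:
        env["DTS_BUILD"] = "y"
    return env


def select_extensions(env):
    """Mimic the conditional block in conf.py: DTS-only settings are opt-in."""
    if env.get("DTS_BUILD"):
        return ["sphinx.ext.napoleon", "sphinx.ext.autodoc", "sphinx.ext.intersphinx"]
    return []  # placeholder: the regular DPDK doc build keeps its own settings
```

Since the check is a plain substring match, a source tree checked out
under a directory whose name contains "dts" would presumably enable the
DTS configuration for every Sphinx target, which may be worth keeping in
mind.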

Sphinx generates the documentation from Python docstrings. The docstring
format is the Google format [0] which requires the sphinx.ext.napoleon
extension. The other extension, sphinx.ext.intersphinx, enables linking
to objects in external documentation, such as the Python documentation.

There is one requirement for building DTS docs - the same Python version
as DTS or higher, because Sphinx's autodoc extension imports the code.
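
A minimal sketch of such a version check, assuming a Poetry-style
constraint like "^3.10". Unlike the script in this patch, which compares
packaging.version.Version objects, this sketch compares plain integer
tuples to stay within the standard library, so it does not understand
pre-release suffixes. The function names are illustrative only.

```python
import sys

_VERSION_COMPARISON_CHARS = "^<>="


def parse_min_version(spec):
    """Turn a constraint such as '^3.10' into a tuple such as (3, 10)."""
    bare = spec.strip(_VERSION_COMPARISON_CHARS)
    return tuple(int(part) for part in bare.split("."))


def python_is_new_enough(spec, running=None):
    """Check the running (or a given) Python version against the constraint."""
    if running is None:
        running = sys.version_info[:3]
    # Tuple comparison: (3, 10, 0) >= (3, 10) holds, (3, 9, 18) >= (3, 10) does not.
    return tuple(running) >= parse_min_version(spec)
```

Only the lower bound is checked here, which matches the "same version
or higher" requirement stated above.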

The dependencies needed to import the code don't have to be satisfied,
as the autodoc extension allows us to mock the imports. The missing
packages are taken from the DTS pyproject.toml file.
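
As a rough illustration of how the list of packages to mock could be
derived, the sketch below reads a pyproject-like string with
configparser, using the same newline-before-bracket replacement as the
patch, and then looks for the import packages. The sample file content
and the helper names are made up, and importlib.util.find_spec is used
in place of the importlib.metadata.version lookup from the patch.

```python
import configparser
import importlib.util

# Hypothetical stand-in for dts/pyproject.toml; names and versions are made up.
SAMPLE_PYPROJECT = """\
[tool.poetry]
name = "example"
authors = [
    "First Author <first@example.com>",
    "Second Author <second@example.com>"
]

[tool.poetry.dependencies]
python = "^3.10"
some-package = "^1.2"
"""


def read_dependencies(toml_text):
    """Read [tool.poetry.dependencies] with configparser, as the patch does."""
    cfg = configparser.ConfigParser()
    # Fold a closing "]" on its own line into the previous (indented) line,
    # so that configparser sees it as part of a multi-line value.
    cfg.read_string(toml_text.replace("\n]", "]"))
    section = cfg["tool.poetry.dependencies"]
    return {name: section[name].strip("\"'") for name in section}


def to_import_name(dist_name):
    """Guess the import package name from a distribution package name."""
    return dist_name.lower().replace("-", "_")


def missing_imports(dist_names):
    """Return import names that cannot be found on the current interpreter."""
    names = [to_import_name(name) for name in dist_names]
    return [name for name in names if importlib.util.find_spec(name) is None]


deps = read_dependencies(SAMPLE_PYPROJECT)
deps.pop("python")
mock_list = missing_imports(deps)
```

A list like mock_list is the kind of value that would be assigned to
autodoc_mock_imports. Note that configparser lowercases option names by
default, and that the lowercase-and-underscore guess does not cover
packages such as PyYAML, whose import package is yaml, which is why the
patch carries an explicit import_package entry for it.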

And finally, the DTS API docs can be accessed from the DPDK API doxygen
page.

[0] https://google.github.io/styleguide/pyguide.html#38-comments-and-docstrings

Cc: Bruce Richardson <bruce.richardson@intel.com>

Signed-off-by: Juraj Linkeš <juraj.linkes@pantheon.tech>
Reviewed-by: Jeremy Spewock <jspewock@iol.unh.edu>
Reviewed-by: Dean Marx <dmarx@iol.unh.edu>
---
 buildtools/call-sphinx-build.py           |  2 +
 buildtools/get-dts-runtime-deps.py        | 95 +++++++++++++++++++++++
 buildtools/meson.build                    |  1 +
 doc/api/doxy-api-index.md                 |  3 +
 doc/api/doxy-api.conf.in                  |  2 +
 doc/api/dts/custom.css                    |  1 +
 doc/api/dts/meson.build                   | 31 ++++++++
 doc/api/meson.build                       |  6 +-
 doc/guides/conf.py                        | 44 ++++++++++-
 doc/guides/contributing/documentation.rst |  2 +
 doc/guides/contributing/patches.rst       |  4 +
 doc/guides/tools/dts.rst                  | 39 +++++++++-
 doc/meson.build                           |  1 +
 13 files changed, 228 insertions(+), 3 deletions(-)
 create mode 100755 buildtools/get-dts-runtime-deps.py
 create mode 120000 doc/api/dts/custom.css
 create mode 100644 doc/api/dts/meson.build

diff --git a/buildtools/call-sphinx-build.py b/buildtools/call-sphinx-build.py
index 623e7363ee..154e9f907b 100755
--- a/buildtools/call-sphinx-build.py
+++ b/buildtools/call-sphinx-build.py
@@ -15,6 +15,8 @@
 
 # set the version in environment for sphinx to pick up
 os.environ['DPDK_VERSION'] = version
+if 'dts' in src:
+    os.environ['DTS_BUILD'] = "y"
 
 sphinx_cmd = [sphinx] + extra_args
 
diff --git a/buildtools/get-dts-runtime-deps.py b/buildtools/get-dts-runtime-deps.py
new file mode 100755
index 0000000000..1636a6dbf5
--- /dev/null
+++ b/buildtools/get-dts-runtime-deps.py
@@ -0,0 +1,95 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2024 PANTHEON.tech s.r.o.
+#
+
+"""Utilities for DTS dependencies.
+
+The module can be used as an executable script,
+which verifies that the running Python version meets the version requirement of DTS.
+The script exits with the standard exit codes in this mode (0 is success, 1 is failure).
+
+The module also contains a function, get_missing_imports,
+which looks for runtime dependencies in the DTS pyproject.toml file
+and returns a list of module names used in an import statement (import packages) that are missing.
+This function is not used when the module is run as a script and is available to be imported.
+"""
+
+import configparser
+import importlib.metadata
+import importlib.util
+import os.path
+import platform
+
+from packaging.version import Version
+
+_VERSION_COMPARISON_CHARS = '^<>='
+_EXTRA_DEPS = {
+    'invoke': {'version': '>=1.3'},
+    'paramiko': {'version': '>=2.4'},
+    'PyYAML': {'version': '^6.0', 'import_package': 'yaml'}
+}
+_DPDK_ROOT = os.path.dirname(os.path.dirname(__file__))
+_DTS_DEP_FILE_PATH = os.path.join(_DPDK_ROOT, 'dts', 'pyproject.toml')
+
+
+def _get_dependencies(cfg_file_path):
+    cfg = configparser.ConfigParser()
+    with open(cfg_file_path) as f:
+        dts_deps_file_str = f.read()
+        dts_deps_file_str = dts_deps_file_str.replace("\n]", "]")
+        cfg.read_string(dts_deps_file_str)
+
+    deps_section = cfg['tool.poetry.dependencies']
+    return {dep: {'version': deps_section[dep].strip('"\'')} for dep in deps_section}
+
+
+def get_missing_imports():
+    """Get missing DTS import packages from third party libraries.
+
+    Scan the DTS pyproject.toml file for dependencies and find those that are not installed.
+    The dependencies in pyproject.toml are listed by their distribution package names,
+    but the function finds the associated import packages - those used in import statements.
+
+    The function is not used when the module is run as a script. It should be imported.
+
+    Returns:
+        A list of missing import packages.
+    """
+    missing_imports = []
+    req_deps = _get_dependencies(_DTS_DEP_FILE_PATH)
+    req_deps.pop('python')
+
+    for req_dep, dep_data in (req_deps | _EXTRA_DEPS).items():
+        req_ver = dep_data['version']
+        try:
+            import_package = dep_data['import_package']
+        except KeyError:
+            import_package = req_dep
+        import_package = import_package.lower().replace('-', '_')
+
+        try:
+            req_ver = Version(req_ver.strip(_VERSION_COMPARISON_CHARS))
+            found_dep_ver = Version(importlib.metadata.version(req_dep))
+            if found_dep_ver < req_ver:
+                print(
+                    f'The version "{found_dep_ver}" of package "{req_dep}" '
+                    f'is lower than required "{req_ver}".'
+                )
+        except importlib.metadata.PackageNotFoundError:
+            print(f'Package "{req_dep}" not found.')
+            missing_imports.append(import_package)
+
+    return missing_imports
+
+
+if __name__ == '__main__':
+    python_version = _get_dependencies(_DTS_DEP_FILE_PATH).pop('python')
+    if python_version:
+        sys_ver = Version(platform.python_version())
+        req_ver = Version(python_version['version'].strip(_VERSION_COMPARISON_CHARS))
+        if sys_ver < req_ver:
+            print(
+                f'The available Python version "{sys_ver}" is lower than required "{req_ver}".'
+            )
+            exit(1)
diff --git a/buildtools/meson.build b/buildtools/meson.build
index 3adf34e1a8..6b938d767c 100644
--- a/buildtools/meson.build
+++ b/buildtools/meson.build
@@ -24,6 +24,7 @@ get_numa_count_cmd = py3 + files('get-numa-count.py')
 get_test_suites_cmd = py3 + files('get-test-suites.py')
 has_hugepages_cmd = py3 + files('has-hugepages.py')
 cmdline_gen_cmd = py3 + files('dpdk-cmdline-gen.py')
+get_dts_runtime_deps = py3 + files('get-dts-runtime-deps.py')
 
 # install any build tools that end-users might want also
 install_data([
diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
index f9f0300126..ab223bcdf7 100644
--- a/doc/api/doxy-api-index.md
+++ b/doc/api/doxy-api-index.md
@@ -245,3 +245,6 @@ The public API headers are grouped by topics:
   [experimental APIs](@ref rte_compat.h),
   [ABI versioning](@ref rte_function_versioning.h),
   [version](@ref rte_version.h)
+
+- **tests**:
+  [**DTS**](@dts_api_main_page)
diff --git a/doc/api/doxy-api.conf.in b/doc/api/doxy-api.conf.in
index a8823c046f..c94f02d411 100644
--- a/doc/api/doxy-api.conf.in
+++ b/doc/api/doxy-api.conf.in
@@ -124,6 +124,8 @@ SEARCHENGINE            = YES
 SORT_MEMBER_DOCS        = NO
 SOURCE_BROWSER          = YES
 
+ALIASES                 = "dts_api_main_page=@DTS_API_MAIN_PAGE@"
+
 EXAMPLE_PATH            = @TOPDIR@/examples
 EXAMPLE_PATTERNS        = *.c
 EXAMPLE_RECURSIVE       = YES
diff --git a/doc/api/dts/custom.css b/doc/api/dts/custom.css
new file mode 120000
index 0000000000..3c9480c4a0
--- /dev/null
+++ b/doc/api/dts/custom.css
@@ -0,0 +1 @@
+../../guides/custom.css
\ No newline at end of file
diff --git a/doc/api/dts/meson.build b/doc/api/dts/meson.build
new file mode 100644
index 0000000000..f338eb69bf
--- /dev/null
+++ b/doc/api/dts/meson.build
@@ -0,0 +1,31 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2023 PANTHEON.tech s.r.o.
+
+sphinx = find_program('sphinx-build', required: get_option('enable_docs'))
+if not sphinx.found()
+    subdir_done()
+endif
+
+python_ver_satisfied = run_command(get_dts_runtime_deps, check: false).returncode()
+if python_ver_satisfied != 0
+    subdir_done()
+endif
+
+cdata.set('DTS_API_MAIN_PAGE', join_paths('..', 'dts', 'html', 'index.html'))
+
+extra_sphinx_args = ['-E', '-c', join_paths(doc_source_dir, 'guides')]
+if get_option('werror')
+    extra_sphinx_args += '-W'
+endif
+
+htmldir = join_paths(get_option('datadir'), 'doc', 'dpdk', 'dts')
+dts_api_html = custom_target('dts_api_html',
+        output: 'html',
+        command: [sphinx_wrapper, sphinx, meson.project_version(),
+            meson.current_source_dir(), meson.current_build_dir(), extra_sphinx_args],
+        build_by_default: get_option('enable_docs'),
+        install: get_option('enable_docs'),
+        install_dir: htmldir)
+
+doc_targets += dts_api_html
+doc_target_names += 'DTS_API_HTML'
diff --git a/doc/api/meson.build b/doc/api/meson.build
index 5b50692df9..71b861e42b 100644
--- a/doc/api/meson.build
+++ b/doc/api/meson.build
@@ -1,6 +1,11 @@
 # SPDX-License-Identifier: BSD-3-Clause
 # Copyright(c) 2018 Luca Boccassi <bluca@debian.org>
 
+# initialize common Doxygen configuration
+cdata = configuration_data()
+
+subdir('dts')
+
 doxygen = find_program('doxygen', required: get_option('enable_docs'))
 
 if not doxygen.found()
@@ -30,7 +35,6 @@ example = custom_target('examples.dox',
         build_by_default: get_option('enable_docs'))
 
 # set up common Doxygen configuration
-cdata = configuration_data()
 cdata.set('VERSION', meson.project_version())
 cdata.set('API_EXAMPLES', join_paths(dpdk_build_root, 'doc', 'api', 'examples.dox'))
 cdata.set('OUTPUT', join_paths(dpdk_build_root, 'doc', 'api'))
diff --git a/doc/guides/conf.py b/doc/guides/conf.py
index 0f7ff5282d..d7f3030838 100644
--- a/doc/guides/conf.py
+++ b/doc/guides/conf.py
@@ -10,7 +10,7 @@
 from os.path import basename
 from os.path import dirname
 from os.path import join as path_join
-from sys import argv, stderr
+from sys import argv, stderr, path
 
 import configparser
 
@@ -58,6 +58,48 @@
              ("tools/devbind", "dpdk-devbind",
               "check device status and bind/unbind them from drivers", "", 8)]
 
+# DTS API docs additional configuration
+if environ.get('DTS_BUILD'):
+    extensions = ['sphinx.ext.napoleon', 'sphinx.ext.autodoc', 'sphinx.ext.intersphinx']
+    # Napoleon enables the Google format of Python docstrings.
+    napoleon_numpy_docstring = False
+    napoleon_attr_annotations = True
+    napoleon_preprocess_types = True
+
+    # Autodoc pulls documentation from code.
+    autodoc_default_options = {
+        'members': True,
+        'member-order': 'bysource',
+        'show-inheritance': True,
+    }
+    autodoc_class_signature = 'separated'
+    autodoc_typehints = 'both'
+    autodoc_typehints_format = 'short'
+    autodoc_typehints_description_target = 'documented'
+
+    # Intersphinx allows linking to external projects, such as Python docs.
+    intersphinx_mapping = {'python': ('https://docs.python.org/3', None)}
+
+    # DTS docstring options.
+    add_module_names = False
+    toc_object_entries = True
+    toc_object_entries_show_parents = 'hide'
+    # DTS Sidebar config.
+    html_theme_options = {
+        'collapse_navigation': False,
+        'navigation_depth': -1,  # unlimited depth
+    }
+
+    # Add path to DTS sources so that Sphinx can find them.
+    dpdk_root = dirname(dirname(dirname(__file__)))
+    path.append(path_join(dpdk_root, 'dts'))
+
+    # Get missing DTS dependencies. Add path to buildtools to find the get_missing_imports function.
+    path.append(path_join(dpdk_root, 'buildtools'))
+    import importlib
+    # Ignore missing imports from DTS dependencies.
+    autodoc_mock_imports = importlib.import_module('get-dts-runtime-deps').get_missing_imports()
+
 
 # ####### :numref: fallback ########
 # The following hook functions add some simple handling for the :numref:
diff --git a/doc/guides/contributing/documentation.rst b/doc/guides/contributing/documentation.rst
index 68454ae0d5..7b287ce631 100644
--- a/doc/guides/contributing/documentation.rst
+++ b/doc/guides/contributing/documentation.rst
@@ -133,6 +133,8 @@ added to by the developer.
 Building the Documentation
 --------------------------
 
+.. _doc_dependencies:
+
 Dependencies
 ~~~~~~~~~~~~
 
diff --git a/doc/guides/contributing/patches.rst b/doc/guides/contributing/patches.rst
index 04c66bebc4..6629928bee 100644
--- a/doc/guides/contributing/patches.rst
+++ b/doc/guides/contributing/patches.rst
@@ -499,6 +499,10 @@ The script usage is::
 For both of the above scripts, the -n option is used to specify a number of commits from HEAD,
 and the -r option allows the user specify a ``git log`` range.
 
+Additionally, when contributing to the DTS tool, patches should also be checked using
+the ``dts-check-format.sh`` script in the ``devtools`` directory of the DPDK repo.
+To run the script, extra :ref:`Python dependencies <dts_deps>` are needed.
+
 .. _contrib_check_compilation:
 
 Checking Compilation
diff --git a/doc/guides/tools/dts.rst b/doc/guides/tools/dts.rst
index 515b15e4d8..9e8929f567 100644
--- a/doc/guides/tools/dts.rst
+++ b/doc/guides/tools/dts.rst
@@ -54,6 +54,7 @@ DTS uses Poetry as its Python dependency management.
 Python build/development and runtime environments are the same and DTS development environment,
 DTS runtime environment or just plain DTS environment are used interchangeably.
 
+.. _dts_deps:
 
 Setting up DTS environment
 ~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -291,8 +292,15 @@ When adding code to the DTS framework, pay attention to the rest of the code
 and try not to divert much from it.
 The :ref:`DTS developer tools <dts_dev_tools>` will issue warnings
 when some of the basics are not met.
+You should also build the :ref:`API documentation <building_api_docs>`
+to address any issues found during the build.
 
-The code must be properly documented with docstrings.
+The API documentation, which is a helpful reference when developing, may be accessed
+in the code directly or generated with the :ref:`API docs build steps <building_api_docs>`.
+When adding new files or modifying the directory structure,
+the corresponding changes must be made to DTS API doc sources in ``doc/api/dts``.
+
+Speaking of which, the code must be properly documented with docstrings.
 The style must conform to the `Google style
 <https://google.github.io/styleguide/pyguide.html#38-comments-and-docstrings>`_.
 See an example of the style `here
@@ -427,6 +435,35 @@ the DTS code check and format script.
 Refer to the script for usage: ``devtools/dts-check-format.sh -h``.
 
 
+.. _building_api_docs:
+
+Building DTS API docs
+---------------------
+
+The documentation is built using the standard DPDK build system.
+See :doc:`../linux_gsg/build_dpdk` for more details on compiling DPDK with meson.
+
+The :ref:`doc build dependencies <doc_dependencies>` may be installed with Poetry:
+
+.. code-block:: console
+
+   poetry install --no-root --only docs
+   poetry install --no-root --with docs  # an alternative that will also install DTS dependencies
+   poetry shell
+
+After executing the meson command, build the documentation with:
+
+.. code-block:: console
+
+   ninja -C build doc
+
+The output is generated in ``build/doc/api/dts/html``.
+
+.. note::
+
+   Make sure to fix any Sphinx warnings when adding or updating docstrings.
+
+
 Configuration Schema
 --------------------
 
diff --git a/doc/meson.build b/doc/meson.build
index 6f74706aa2..1e0cfa4127 100644
--- a/doc/meson.build
+++ b/doc/meson.build
@@ -1,6 +1,7 @@
 # SPDX-License-Identifier: BSD-3-Clause
 # Copyright(c) 2018 Luca Boccassi <bluca@debian.org>
 
+doc_source_dir = meson.current_source_dir()
 doc_targets = []
 doc_target_names = []
 subdir('api')
-- 
2.34.1


^ permalink raw reply	[relevance 2%]

* Re: [PATCH v2] app/testpmd: show output of commands read from file
  @ 2024-08-22 17:14  3%     ` Bruce Richardson
  2024-08-22 17:18  0%       ` Bruce Richardson
  0 siblings, 1 reply; 200+ results
From: Bruce Richardson @ 2024-08-22 17:14 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: dev

On Thu, Aug 22, 2024 at 05:53:27PM +0100, Ferruh Yigit wrote:
> On 8/22/2024 11:41 AM, Bruce Richardson wrote:
> > Testpmd supports the "--cmdline-file" parameter to read a set of initial
> > commands from a file. However, the only indication that this has been
> > done successfully on startup is a single-line message, no output from
> > the commands is seen.
> > 
> 
> For user I think it makes sense to see the command [1], only concern is
> if someone parsing testpmd output may be impacted on this, although I
> expect it should be trivial to update the relevant parsing.
> 
> [1]
> Btw, I can still see the command output, I assume because command does
> the printf itself, for example for 'show port summary 0' command:
> - before patch:
> ...
> Number of available ports: 2
> Port MAC Address       Name         Driver         Status   Link
> 0    xx:xx:xx:xx:xx:xx xxxx:xx:xx.x aaaaaaaa       up       xxx Gbps
> ...
> 
> - after patch
> ...
> testpmd> show port summary 0
> Number of available ports: 2
> Port MAC Address       Name         Driver         Status   Link
> 0    xx:xx:xx:xx:xx:xx xxxx:xx:xx.x aaaaaaaa       up       xxx Gbps
> ...
> 
> Only difference above is, after patch the command itself also printed.
> 
> 

That's because the function uses printf itself, which is actually wrong.
Any output from a cmdline function should use the "cmdline_printf" call
which outputs to the proper cmdline filehandle.

> > To improve usability here, we can use cmdline_new rather than
> > cmdline_file_new and have the output from the various commands sent to
> > stdout, allowing the user to see better what is happening.
> > 
> > Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> > 
> > ---
> > v2: use STDOUT_FILENO in place of hard-coded "1"
> > ---
> >  app/test-pmd/cmdline.c | 14 +++++++++++++-
> >  1 file changed, 13 insertions(+), 1 deletion(-)
> > 
> > diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
> > index b7759e38a8..52e64430d9 100644
> > --- a/app/test-pmd/cmdline.c
> > +++ b/app/test-pmd/cmdline.c
> > @@ -6,6 +6,7 @@
> >  #include <ctype.h>
> >  #include <stdarg.h>
> >  #include <errno.h>
> > +#include <fcntl.h>
> >  #include <stdio.h>
> >  #include <stdint.h>
> >  #include <stdlib.h>
> > @@ -13431,7 +13432,18 @@ cmdline_read_from_file(const char *filename)
> >  {
> >  	struct cmdline *cl;
> >  
> > -	cl = cmdline_file_new(main_ctx, "testpmd> ", filename);
> > +	/* cmdline_file_new does not produce any output which is not ideal here.
> > +	 * Much better to show output of the commands, so we open filename directly
> > +	 * and then pass that to cmdline_new with stdout as the output path.
> > +	 */
> > +	int fd = open(filename, O_RDONLY);
> > +	if (fd < 0) {
> > +		fprintf(stderr, "Failed to open file %s: %s\n",
> > +			filename, strerror(errno));
> > +		return;
> > +	}
> > +
> > +	cl = cmdline_new(main_ctx, "testpmd> ", fd, STDOUT_FILENO);
> >
> 
> Above is almost the same as the 'cmdline_file_new()' function, with the only
> difference being that it uses '-1' for 's_out'.
> 
> If this use case may be required by others, do you think there is value in
> passing 's_out' to 'cmdline_file_new()', or in having a new version of the
> API that accepts 's_out' as a parameter?
> 

Yes, I thought about this, and actually started implementing a new API for
the cmdline library to do that. However, I decided that, given the complexity
here, it's not really necessary - especially as there is no clear way
to do things. The options are:

* extend cmdline_file_new to have a flag to echo to stdout (which would be
  the very common case here).
* extend cmdline_file_new to take a file handle - this is more flexible,
  but slightly less usable.
* add a new cmdline_file_<something>_new function to echo to stdout.
* add a new cmdline_file_<something>_new function to write to a filehandle.

I don't like breaking the cmdline API (and ABI), so I didn't want to do
either #1 or #2, which would be the cleanest solutions. For #3 and #4,
naming is hard, and deciding between them is even harder. Given the choice,
I prefer #3, as I can't see #4 being very common and we always have
cmdline_new as a fallback anyway.

Overall, though, I threw away that work, because it didn't seem worth it
just to spare the user an extra "open" call.

> btw, I recognized that 'cmdline' library doesn't have doxygen
> documentation, which is a gap to address. Next time when someone asks
> for entry level task, we can point this one.
> 

Yep, good idea.

/Bruce

^ permalink raw reply	[relevance 3%]

* Re: [PATCH v2] app/testpmd: show output of commands read from file
  2024-08-22 17:14  3%     ` Bruce Richardson
@ 2024-08-22 17:18  0%       ` Bruce Richardson
  2024-08-22 21:09  3%         ` Ferruh Yigit
  0 siblings, 1 reply; 200+ results
From: Bruce Richardson @ 2024-08-22 17:18 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: dev

On Thu, Aug 22, 2024 at 06:14:55PM +0100, Bruce Richardson wrote:
> On Thu, Aug 22, 2024 at 05:53:27PM +0100, Ferruh Yigit wrote:
> > On 8/22/2024 11:41 AM, Bruce Richardson wrote:
> > > Testpmd supports the "--cmdline-file" parameter to read a set of initial
> > > commands from a file. However, the only indication that this has been
> > > done successfully on startup is a single-line message, no output from
> > > the commands is seen.
> > > 
> > 
> > For user I think it makes sense to see the command [1], only concern is
> > if someone parsing testpmd output may be impacted on this, although I
> > expect it should be trivial to update the relevant parsing.
> > 
> > [1]
> > Btw, I can still see the command output, I assume because command does
> > the printf itself, for example for 'show port summary 0' command:
> > - before patch:
> > ...
> > Number of available ports: 2
> > Port MAC Address       Name         Driver         Status   Link
> > 0    xx:xx:xx:xx:xx:xx xxxx:xx:xx.x aaaaaaaa       up       xxx Gbps
> > ...
> > 
> > - after patch
> > ...
> > testpmd> show port summary 0
> > Number of available ports: 2
> > Port MAC Address       Name         Driver         Status   Link
> > 0    xx:xx:xx:xx:xx:xx xxxx:xx:xx.x aaaaaaaa       up       xxx Gbps
> > ...
> > 
> > Only difference above is, after patch the command itself also printed.
> > 
> > 
> 
> That's because the function uses printf itself, which is actually wrong.
> Any output from a cmdline function should use the "cmdline_printf" call
> which outputs to the proper cmdline filehandle.
> 
> > > To improve usability here, we can use cmdline_new rather than
> > > cmdline_file_new and have the output from the various commands sent to
> > > stdout, allowing the user to see better what is happening.
> > > 
> > > Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> > > 
> > > ---
> > > v2: use STDOUT_FILENO in place of hard-coded "1"
> > > ---
> > >  app/test-pmd/cmdline.c | 14 +++++++++++++-
> > >  1 file changed, 13 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
> > > index b7759e38a8..52e64430d9 100644
> > > --- a/app/test-pmd/cmdline.c
> > > +++ b/app/test-pmd/cmdline.c
> > > @@ -6,6 +6,7 @@
> > >  #include <ctype.h>
> > >  #include <stdarg.h>
> > >  #include <errno.h>
> > > +#include <fcntl.h>
> > >  #include <stdio.h>
> > >  #include <stdint.h>
> > >  #include <stdlib.h>
> > > @@ -13431,7 +13432,18 @@ cmdline_read_from_file(const char *filename)
> > >  {
> > >  	struct cmdline *cl;
> > >  
> > > -	cl = cmdline_file_new(main_ctx, "testpmd> ", filename);
> > > +	/* cmdline_file_new does not produce any output which is not ideal here.
> > > +	 * Much better to show output of the commands, so we open filename directly
> > > +	 * and then pass that to cmdline_new with stdout as the output path.
> > > +	 */
> > > +	int fd = open(filename, O_RDONLY);
> > > +	if (fd < 0) {
> > > +		fprintf(stderr, "Failed to open file %s: %s\n",
> > > +			filename, strerror(errno));
> > > +		return;
> > > +	}
> > > +
> > > +	cl = cmdline_new(main_ctx, "testpmd> ", fd, STDOUT_FILENO);
> > >
> > 
> > Above is almost the same as the 'cmdline_file_new()' function, with the only
> > difference being that it uses '-1' for 's_out'.
> > 
> > If this use case may be required by others, do you think there is value in
> > passing 's_out' to 'cmdline_file_new()', or in having a new version of the
> > API that accepts 's_out' as a parameter?
> > 
> 
> Yes, I thought about this, and actually started implementing a new API for
> the cmdline library to do that. However, I decided that, given the complexity
> here, it's not really necessary - especially as there is no clear way
> to do things. The options are:
> 
> * extend cmdline_file_new to have a flag to echo to stdout (which would be
>   the very common case here).
> * extend cmdline_file_new to take a file handle - this is more flexible,
>   but slightly less usable.
> * add a new cmdline_file_<something>_new function to echo to stdout.
> * add a new cmdline_file_<something>_new function to write to a filehandle.
> 
> I don't like breaking the cmdline API (and ABI), so I didn't want to do
> either #1 or #2, which would be the cleanest solutions. For #3 and #4,
> naming is hard, and deciding between them is even harder. Given the choice,
> I prefer #3, as I can't see #4 being very common and we always have
> cmdline_new as a fallback anyway.
> 
> Overall, though, I threw away that work, because it didn't seem worth it
> just to spare the user an extra "open" call.
> 

And also to add:
If there is clear consensus on what the correct option for this case is,
I'm happy enough to go back and extend the cmdline library as agreed.
:-)

^ permalink raw reply	[relevance 0%]

* Re: [PATCH v2] app/testpmd: show output of commands read from file
  2024-08-22 17:18  0%       ` Bruce Richardson
@ 2024-08-22 21:09  3%         ` Ferruh Yigit
  2024-08-23  9:12  0%           ` Bruce Richardson
  0 siblings, 1 reply; 200+ results
From: Ferruh Yigit @ 2024-08-22 21:09 UTC (permalink / raw)
  To: Bruce Richardson; +Cc: dev

On 8/22/2024 6:18 PM, Bruce Richardson wrote:
> On Thu, Aug 22, 2024 at 06:14:55PM +0100, Bruce Richardson wrote:
>> On Thu, Aug 22, 2024 at 05:53:27PM +0100, Ferruh Yigit wrote:
>>> On 8/22/2024 11:41 AM, Bruce Richardson wrote:
>>>> Testpmd supports the "--cmdline-file" parameter to read a set of initial
>>>> commands from a file. However, the only indication that this has been
>>>> done successfully on startup is a single-line message, no output from
>>>> the commands is seen.
>>>>
>>>
>>> For user I think it makes sense to see the command [1], only concern is
>>> if someone parsing testpmd output may be impacted on this, although I
>>> expect it should be trivial to update the relevant parsing.
>>>
>>> [1]
>>> Btw, I can still see the command output, I assume because command does
>>> the printf itself, for example for 'show port summary 0' command:
>>> - before patch:
>>> ...
>>> Number of available ports: 2
>>> Port MAC Address       Name         Driver         Status   Link
>>> 0    xx:xx:xx:xx:xx:xx xxxx:xx:xx.x aaaaaaaa       up       xxx Gbps
>>> ...
>>>
>>> - after patch
>>> ...
>>> testpmd> show port summary 0
>>> Number of available ports: 2
>>> Port MAC Address       Name         Driver         Status   Link
>>> 0    xx:xx:xx:xx:xx:xx xxxx:xx:xx.x aaaaaaaa       up       xxx Gbps
>>> ...
>>>
>>> Only difference above is, after patch the command itself also printed.
>>>
>>>
>>
>> That's because the function uses printf itself, which is actually wrong.
>> Any output from a cmdline function should use the "cmdline_printf" call
>> which outputs to the proper cmdline filehandle.
>>

Got it.
But in existing testpmd code, only a handful of cmdline functions use
'cmdline_printf', and most of them are in the same help function.
At this stage I think there is no need to update them. There is already some
confusion on testpmd logging between printf & TESTPMD_LOG().

>>>> To improve usability here, we can use cmdline_new rather than
>>>> cmdline_file_new and have the output from the various commands sent to
>>>> stdout, allowing the user to see better what is happening.
>>>>
>>>> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
>>>>
>>>> ---
>>>> v2: use STDOUT_FILENO in place of hard-coded "1"
>>>> ---
>>>>  app/test-pmd/cmdline.c | 14 +++++++++++++-
>>>>  1 file changed, 13 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
>>>> index b7759e38a8..52e64430d9 100644
>>>> --- a/app/test-pmd/cmdline.c
>>>> +++ b/app/test-pmd/cmdline.c
>>>> @@ -6,6 +6,7 @@
>>>>  #include <ctype.h>
>>>>  #include <stdarg.h>
>>>>  #include <errno.h>
>>>> +#include <fcntl.h>
>>>>  #include <stdio.h>
>>>>  #include <stdint.h>
>>>>  #include <stdlib.h>
>>>> @@ -13431,7 +13432,18 @@ cmdline_read_from_file(const char *filename)
>>>>  {
>>>>  	struct cmdline *cl;
>>>>  
>>>> -	cl = cmdline_file_new(main_ctx, "testpmd> ", filename);
>>>> +	/* cmdline_file_new does not produce any output which is not ideal here.
>>>> +	 * Much better to show output of the commands, so we open filename directly
>>>> +	 * and then pass that to cmdline_new with stdout as the output path.
>>>> +	 */
>>>> +	int fd = open(filename, O_RDONLY);
>>>> +	if (fd < 0) {
>>>> +		fprintf(stderr, "Failed to open file %s: %s\n",
>>>> +			filename, strerror(errno));
>>>> +		return;
>>>> +	}
>>>> +
>>>> +	cl = cmdline_new(main_ctx, "testpmd> ", fd, STDOUT_FILENO);
>>>>
>>>
>>> Above is almost the same as the 'cmdline_file_new()' function, with the only
>>> difference being that it uses '-1' for 's_out'.
>>>
>>> If this use case may be required by others, do you think there is value in
>>> passing 's_out' to 'cmdline_file_new()', or in having a new version of the
>>> API that accepts 's_out' as a parameter?
>>>
>>
>> Yes, I thought about this, and actually started implementing a new API for
>> the cmdline library to do that. However, I decided that, given the complexity
>> here, it's not really necessary - especially as there is no clear way
>> to do things. The options are:
>>
>> * extend cmdline_file_new to have a flag to echo to stdout (which would be
>>   the very common case here).
>> * extend cmdline_file_new to take a file handle - this is more flexible,
>>   but slightly less usable.
>> * add a new cmdline_file_<something>_new function to echo to stdout.
>> * add a new cmdline_file_<something>_new function to write to a filehandle.
>>
>> I don't like breaking the cmdline API (and ABI), so I didn't want to do
>> either #1 or #2, which would be the cleanest solutions. For #3 and #4,
>> naming is hard, and deciding between them is even harder. Given the choice,
>> I prefer #3, as I can't see #4 being very common and we always have
>> cmdline_new as a fallback anyway.
>>
>> Overall, though, I threw away that work, because it didn't seem worth it
>> just to spare the user an extra "open" call.
>>
> 

I vote for option #1, but agree that it may not be worth breaking the API and ABI.

Is 'cmdline_file_new_v2()' too bad a name? Perhaps it is better to go with
the testpmd implementation, as you did in this patch.

> And also to add:
> If there is clear consensus on what the correct option for this case is,
> I'm happy enough to go back and extend the cmdline library as agreed.
> :-)


^ permalink raw reply	[relevance 3%]

* Re: [PATCH v2] app/testpmd: show output of commands read from file
  2024-08-22 21:09  3%         ` Ferruh Yigit
@ 2024-08-23  9:12  0%           ` Bruce Richardson
  0 siblings, 0 replies; 200+ results
From: Bruce Richardson @ 2024-08-23  9:12 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: dev

On Thu, Aug 22, 2024 at 10:09:09PM +0100, Ferruh Yigit wrote:
> On 8/22/2024 6:18 PM, Bruce Richardson wrote:
> > On Thu, Aug 22, 2024 at 06:14:55PM +0100, Bruce Richardson wrote:
> >> On Thu, Aug 22, 2024 at 05:53:27PM +0100, Ferruh Yigit wrote:
> >>> On 8/22/2024 11:41 AM, Bruce Richardson wrote:
> >>>> Testpmd supports the "--cmdline-file" parameter to read a set of initial
> >>>> commands from a file. However, the only indication that this has been
> >>>> done successfully on startup is a single-line message, no output from
> >>>> the commands is seen.
> >>>>
> >>>
> >>> For user I think it makes sense to see the command [1], only concern is
> >>> if someone parsing testpmd output may be impacted on this, although I
> >>> expect it should be trivial to update the relevant parsing.
> >>>
> >>> [1]
> >>> Btw, I can still see the command output, I assume because command does
> >>> the printf itself, for example for 'show port summary 0' command:
> >>> - before patch:
> >>> ...
> >>> Number of available ports: 2
> >>> Port MAC Address       Name         Driver         Status   Link
> >>> 0    xx:xx:xx:xx:xx:xx xxxx:xx:xx.x aaaaaaaa       up       xxx Gbps
> >>> ...
> >>>
> >>> - after patch
> >>> ...
> >>> testpmd> show port summary 0
> >>> Number of available ports: 2
> >>> Port MAC Address       Name         Driver         Status   Link
> >>> 0    xx:xx:xx:xx:xx:xx xxxx:xx:xx.x aaaaaaaa       up       xxx Gbps
> >>> ...
> >>>
> >>> Only difference above is, after patch the command itself also printed.
> >>>
> >>>
> >>
> >> That's because the function uses printf itself, which is actually wrong.
> >> Any output from a cmdline function should use the "cmdline_printf" call
> >> which outputs to the proper cmdline filehandle.
> >>
> 
> Got it.
> But in existing testpmd code, only a handful cmdline functions use the
> 'cmdline_printf' and most of them are in the same help function.
> At this stage I think no need to update them. There is already some
> confusion on testpmd logging between printf & TESTPMD_LOG().
> 

Agree. No point in updating the existing functions to use cmdline_printf vs
printf.

One other point related to echoing commands: there are also testpmd
commands that produce no output (the commands for configuring rte_tm
being examples right now), and having those echoed to screen when read from
a file is the only way to know what is actually happening.
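
To illustrate the difference being discussed, here is a minimal standalone
sketch, not DPDK code. A hypothetical replay_commands() helper reads commands
from one file descriptor and echoes each one, prefixed by the prompt, to an
output descriptor. Passing -1 as the output descriptor mimics the silent
behaviour of cmdline_file_new; passing STDOUT_FILENO mimics what the patch
does with cmdline_new. The function names and details are assumptions made
for illustration only.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Echo "prompt + command" to fd_out; a negative fd_out means silent mode. */
static void
echo_line(int fd_out, const char *prompt, const char *cmd)
{
	if (fd_out < 0)
		return;
	if (write(fd_out, prompt, strlen(prompt)) < 0 ||
	    write(fd_out, cmd, strlen(cmd)) < 0 ||
	    write(fd_out, "\n", 1) < 0)
		perror("write");
}

/*
 * Hypothetical stand-in for a command reader: reads newline-terminated
 * commands from fd_in and echoes each non-empty one to fd_out.
 * Returns the number of commands seen.
 */
static int
replay_commands(int fd_in, int fd_out, const char *prompt)
{
	char line[256];
	size_t len = 0;
	int count = 0;
	char c;

	assert(prompt != NULL);

	while (read(fd_in, &c, 1) == 1) {
		if (c != '\n') {
			/* overlong commands are truncated in this sketch */
			if (len < sizeof(line) - 1)
				line[len++] = c;
			continue;
		}
		if (len > 0) {
			line[len] = '\0';
			echo_line(fd_out, prompt, line);
			count++; /* a real reader would execute the command here */
		}
		len = 0;
	}
	if (len > 0) {
		/* last command without a trailing newline */
		line[len] = '\0';
		echo_line(fd_out, prompt, line);
		count++;
	}
	return count;
}
```

With fd_out set to STDOUT_FILENO, a file containing "show port summary 0"
would be shown as "testpmd> show port summary 0" ahead of whatever the
command prints itself.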

> >>>> To improve usability here, we can use cmdline_new rather than
> >>>> cmdline_file_new and have the output from the various commands sent to
> >>>> stdout, allowing the user to see better what is happening.
> >>>>
> >>>> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> >>>>
> >>>> ---
> >>>> v2: use STDOUT_FILENO in place of hard-coded "1"
> >>>> ---
> >>>>  app/test-pmd/cmdline.c | 14 +++++++++++++-
> >>>>  1 file changed, 13 insertions(+), 1 deletion(-)
> >>>>
> >>>> diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
> >>>> index b7759e38a8..52e64430d9 100644
> >>>> --- a/app/test-pmd/cmdline.c
> >>>> +++ b/app/test-pmd/cmdline.c
> >>>> @@ -6,6 +6,7 @@
> >>>>  #include <ctype.h>
> >>>>  #include <stdarg.h>
> >>>>  #include <errno.h>
> >>>> +#include <fcntl.h>
> >>>>  #include <stdio.h>
> >>>>  #include <stdint.h>
> >>>>  #include <stdlib.h>
> >>>> @@ -13431,7 +13432,18 @@ cmdline_read_from_file(const char *filename)
> >>>>  {
> >>>>  	struct cmdline *cl;
> >>>>  
> >>>> -	cl = cmdline_file_new(main_ctx, "testpmd> ", filename);
> >>>> +	/* cmdline_file_new does not produce any output which is not ideal here.
> >>>> +	 * Much better to show output of the commands, so we open filename directly
> >>>> +	 * and then pass that to cmdline_new with stdout as the output path.
> >>>> +	 */
> >>>> +	int fd = open(filename, O_RDONLY);
> >>>> +	if (fd < 0) {
> >>>> +		fprintf(stderr, "Failed to open file %s: %s\n",
> >>>> +			filename, strerror(errno));
> >>>> +		return;
> >>>> +	}
> >>>> +
> >>>> +	cl = cmdline_new(main_ctx, "testpmd> ", fd, STDOUT_FILENO);
> >>>>
> >>>
> >>> Above is almost save as 'cmdline_file_new()' function with only
> >>> difference that it uses '-1' for 's_out'.
> >>>
> >>> If this usecase may be required by others, do you think does it have a
> >>> value to pass 's_out' to 'cmdline_file_new()' or have a new version of
> >>> API that accepts 's_out' as parameter?
> >>>
> >>
> >> Yes, I thought about this, and actually started implementing a new API for
> >> cmdline library to that. However, I decided that, given the complexity
> >> here, that it's not really necessary - especially as there is no clear way
> >> to do things. The options are:
> >>
> >> * extend cmdline_file_new to have a flag to echo to stdout (which would be
> >>   the very common case here).
> >> * extend cmdline_file_new to take a file handle - this is more flexible,
> >>   but slightly less usable.
> >> * add a new cmdline_file_<something>_new function to echo to stdout.
> >> * add a new cmdline_file_<something>_new function to write to a filehandle.
> >>
> >> I don't like breaking the cmdline API (and ABI), so I didn't want to do
> >> either #1 or #2, which would be the cleanest solutions. For #3 and #4,
> >> naming is hard, and deciding between them is even harder. Given the choice,
> >> I prefer #3, as I can't see #4 being very common and we always have
> >> cmdline_new as a fallback anyway.
> >>
> >> Overall, though, I threw away that work, because it didn't seem worth it,
> >> for the sake of having the user to do an extra "open" call.
> >>
> > 
> 
> I vote to option #1, but agree that it may not worth breaking API and ABI.
> 
> Is 'cmdline_file_new_v2()' too bad a name, perhaps better to go with
> testpmd implementation, as you did in this patch.
> 

Let's see what others think. I'm fine to implement this as a cmdline lib
change or a testpmd-local change only, whatever the community prefers.

/Bruce

^ permalink raw reply	[relevance 0%]

* Re: [PATCH v2 2/4] power: refactor uncore power management library
  @ 2024-08-27 13:02  3%     ` lihuisong (C)
  0 siblings, 0 replies; 200+ results
From: lihuisong (C) @ 2024-08-27 13:02 UTC (permalink / raw)
  To: Sivaprasad Tummala
  Cc: dev, david.hunt, anatoly.burakov, radu.nicolau, jerinj,
	cristian.dumitrescu, konstantin.ananyev, ferruh.yigit, gakhil

Hi Sivaprasad,

I suggest splitting this patch into two patches to make it easier to review:
patch-1: abstract a file for the uncore DVFS core level, namely the
rte_power_uncore_ops.c you did.
patch-2: move and rename, lib/power/power_intel_uncore.c =>
drivers/power/intel_uncore/intel_uncore.c

patch[1/4] is also too big and hard to review.

In addition, I have some questions and am not sure whether we can adjust the
uncore init process.

/Huisong


On 2024/8/26 21:06, Sivaprasad Tummala wrote:
> This patch refactors the power management library, addressing uncore
> power management. The primary changes involve the creation of dedicated
> directories for each driver within 'drivers/power/uncore/*'. The
> adjustment of meson.build files enables the selective activation
> of individual drivers.
>
> This refactor significantly improves code organization, enhances
> clarity and boosts maintainability. It lays the foundation for more
> focused development on individual drivers and facilitates seamless
> integration of future enhancements, particularly the AMD uncore driver.
>
> Signed-off-by: Sivaprasad Tummala <sivaprasad.tummala@amd.com>
> ---
>   .../power/intel_uncore/intel_uncore.c         |  18 +-
>   .../power/intel_uncore/intel_uncore.h         |   8 +-
>   drivers/power/intel_uncore/meson.build        |   6 +
>   drivers/power/meson.build                     |   3 +-
>   lib/power/meson.build                         |   2 +-
>   lib/power/rte_power_uncore.c                  | 205 ++++++---------
>   lib/power/rte_power_uncore.h                  |  87 ++++---
>   lib/power/rte_power_uncore_ops.h              | 239 ++++++++++++++++++
>   lib/power/version.map                         |   1 +
>   9 files changed, 405 insertions(+), 164 deletions(-)
>   rename lib/power/power_intel_uncore.c => drivers/power/intel_uncore/intel_uncore.c (95%)
>   rename lib/power/power_intel_uncore.h => drivers/power/intel_uncore/intel_uncore.h (97%)
>   create mode 100644 drivers/power/intel_uncore/meson.build
>   create mode 100644 lib/power/rte_power_uncore_ops.h
>
> diff --git a/lib/power/power_intel_uncore.c b/drivers/power/intel_uncore/intel_uncore.c
> similarity index 95%
> rename from lib/power/power_intel_uncore.c
> rename to drivers/power/intel_uncore/intel_uncore.c
> index 4eb9c5900a..804ad5d755 100644
> --- a/lib/power/power_intel_uncore.c
> +++ b/drivers/power/intel_uncore/intel_uncore.c
> @@ -8,7 +8,7 @@
>   
>   #include <rte_memcpy.h>
>   
> -#include "power_intel_uncore.h"
> +#include "intel_uncore.h"
>   #include "power_common.h"
>   
>   #define MAX_NUMA_DIE 8
> @@ -475,3 +475,19 @@ power_intel_uncore_get_num_dies(unsigned int pkg)
>   
>   	return count;
>   }
<...>
>   
> -#endif /* POWER_INTEL_UNCORE_H */
> +#endif /* INTEL_UNCORE_H */
> diff --git a/drivers/power/intel_uncore/meson.build b/drivers/power/intel_uncore/meson.build
> new file mode 100644
> index 0000000000..876df8ad14
> --- /dev/null
> +++ b/drivers/power/intel_uncore/meson.build
> @@ -0,0 +1,6 @@
> +# SPDX-License-Identifier: BSD-3-Clause
> +# Copyright(c) 2017 Intel Corporation
> +# Copyright(c) 2024 Advanced Micro Devices, Inc.
> +
> +sources = files('intel_uncore.c')
> +deps += ['power']
> diff --git a/drivers/power/meson.build b/drivers/power/meson.build
> index 8c7215c639..c83047af94 100644
> --- a/drivers/power/meson.build
> +++ b/drivers/power/meson.build
> @@ -6,7 +6,8 @@ drivers = [
>           'amd_pstate',
>           'cppc',
>           'kvm_vm',
> -        'pstate'
> +        'pstate',
> +        'intel_uncore'
cppc, amd_pstate and so on belong to the cpufreq scope, while intel_uncore
belongs to the uncore DVFS scope. They are not at the same level, so I
propose that we create a directory called something like cpufreq or core.
The name 'intel_uncore' doesn't seem appropriate. What do you think of the
following directory structure:
drivers/power/uncore/intel_uncore.c
drivers/power/uncore/amd_uncore.c (according to patch[4/4]).
>   ]
>   std_deps = ['power']
> diff --git a/lib/power/meson.build b/lib/power/meson.build
> index f3e3451cdc..9b13d98810 100644
> --- a/lib/power/meson.build
> +++ b/lib/power/meson.build
> @@ -13,7 +13,6 @@ if not is_linux
>   endif
>   sources = files(
>           'power_common.c',
> -        'power_intel_uncore.c',
>           'rte_power.c',
>           'rte_power_uncore.c',
>           'rte_power_pmd_mgmt.c',
> @@ -24,6 +23,7 @@ headers = files(
>           'rte_power_guest_channel.h',
>           'rte_power_pmd_mgmt.h',
>           'rte_power_uncore.h',
> +        'rte_power_uncore_ops.h',
>   )
>   if cc.has_argument('-Wno-cast-qual')
>       cflags += '-Wno-cast-qual'
> diff --git a/lib/power/rte_power_uncore.c b/lib/power/rte_power_uncore.c
> index 48c75a5da0..9f8771224f 100644
> --- a/lib/power/rte_power_uncore.c
> +++ b/lib/power/rte_power_uncore.c
> @@ -1,6 +1,7 @@
>   /* SPDX-License-Identifier: BSD-3-Clause
>    * Copyright(c) 2010-2014 Intel Corporation
>    * Copyright(c) 2023 AMD Corporation
> + * Copyright(c) 2024 Advanced Micro Devices, Inc.
>    */
>   
>   #include <errno.h>
> @@ -12,98 +13,50 @@
>   #include "rte_power_uncore.h"
>   #include "power_intel_uncore.h"
>   
> -enum rte_uncore_power_mgmt_env default_uncore_env = RTE_UNCORE_PM_ENV_NOT_SET;
> +static enum rte_uncore_power_mgmt_env global_uncore_env = RTE_UNCORE_PM_ENV_NOT_SET;
> +static struct rte_power_uncore_ops *global_uncore_ops;
>   
>   static rte_spinlock_t global_env_cfg_lock = RTE_SPINLOCK_INITIALIZER;
> +static RTE_TAILQ_HEAD(, rte_power_uncore_ops) uncore_ops_list =
> +			TAILQ_HEAD_INITIALIZER(uncore_ops_list);
>   
> -static uint32_t
> -power_get_dummy_uncore_freq(unsigned int pkg __rte_unused,
> -	       unsigned int die __rte_unused)
> -{
> -	return 0;
> -}
> -
> -static int
> -power_set_dummy_uncore_freq(unsigned int pkg __rte_unused,
> -	       unsigned int die __rte_unused, uint32_t index __rte_unused)
> -{
> -	return 0;
> -}
> +const char *uncore_env_str[] = {
> +	"not set",
> +	"auto-detect",
> +	"intel-uncore",
> +	"amd-hsmp"
> +};
Why expose the "auto-detect" mode to the user?
Why not set this automatically at framework initialization?
After all, the uncore driver is fixed for a given platform.
>   
> -static int
> -power_dummy_uncore_freq_max(unsigned int pkg __rte_unused,
> -	       unsigned int die __rte_unused)
> -{
> -	return 0;
> -}
> -
<...>
> -static int
> -power_dummy_uncore_get_num_freqs(unsigned int pkg __rte_unused,
> -	       unsigned int die __rte_unused)
> +/* register the ops struct in rte_power_uncore_ops, return 0 on success. */
> +int
> +rte_power_register_uncore_ops(struct rte_power_uncore_ops *driver_ops)
>   {
> -	return 0;
> -}
> +	if (!driver_ops->init || !driver_ops->exit || !driver_ops->get_num_pkgs ||
> +		!driver_ops->get_num_dies || !driver_ops->get_num_freqs ||
> +		!driver_ops->get_avail_freqs || !driver_ops->get_freq ||
> +		!driver_ops->set_freq || !driver_ops->freq_max ||
> +		!driver_ops->freq_min) {
> +		POWER_LOG(ERR, "Missing callbacks while registering power ops");
> +		return -1;
> +	}
> +	if (driver_ops->cb)
> +		driver_ops->cb();
>   
> -static unsigned int
> -power_dummy_uncore_get_num_pkgs(void)
> -{
> -	return 0;
> -}
> +	TAILQ_INSERT_TAIL(&uncore_ops_list, driver_ops, next);
>   
> -static unsigned int
> -power_dummy_uncore_get_num_dies(unsigned int pkg __rte_unused)
> -{
>   	return 0;
>   }
> -
> -/* function pointers */
> -rte_power_get_uncore_freq_t rte_power_get_uncore_freq = power_get_dummy_uncore_freq;
> -rte_power_set_uncore_freq_t rte_power_set_uncore_freq = power_set_dummy_uncore_freq;
> -rte_power_uncore_freq_change_t rte_power_uncore_freq_max = power_dummy_uncore_freq_max;
> -rte_power_uncore_freq_change_t rte_power_uncore_freq_min = power_dummy_uncore_freq_min;
> -rte_power_uncore_freqs_t rte_power_uncore_freqs = power_dummy_uncore_freqs;
> -rte_power_uncore_get_num_freqs_t rte_power_uncore_get_num_freqs = power_dummy_uncore_get_num_freqs;
> -rte_power_uncore_get_num_pkgs_t rte_power_uncore_get_num_pkgs = power_dummy_uncore_get_num_pkgs;
> -rte_power_uncore_get_num_dies_t rte_power_uncore_get_num_dies = power_dummy_uncore_get_num_dies;
> -
> -static void
> -reset_power_uncore_function_ptrs(void)
> -{
> -	rte_power_get_uncore_freq = power_get_dummy_uncore_freq;
> -	rte_power_set_uncore_freq = power_set_dummy_uncore_freq;
> -	rte_power_uncore_freq_max = power_dummy_uncore_freq_max;
> -	rte_power_uncore_freq_min = power_dummy_uncore_freq_min;
> -	rte_power_uncore_freqs  = power_dummy_uncore_freqs;
> -	rte_power_uncore_get_num_freqs = power_dummy_uncore_get_num_freqs;
> -	rte_power_uncore_get_num_pkgs = power_dummy_uncore_get_num_pkgs;
> -	rte_power_uncore_get_num_dies = power_dummy_uncore_get_num_dies;
> -}
> -
>   int
>   rte_power_set_uncore_env(enum rte_uncore_power_mgmt_env env)
>   {
> -	int ret;
> +	int ret = -1;
> +	struct rte_power_uncore_ops *ops;
>   
>   	rte_spinlock_lock(&global_env_cfg_lock);
>   
> -	if (default_uncore_env != RTE_UNCORE_PM_ENV_NOT_SET) {
> +	if (global_uncore_env != RTE_UNCORE_PM_ENV_NOT_SET) {
>   		POWER_LOG(ERR, "Uncore Power Management Env already set.");
> -		rte_spinlock_unlock(&global_env_cfg_lock);
> -		return -1;
> +		goto out;
>   	}
>   
<...>
> +	if (env <= RTE_DIM(uncore_env_str)) {
> +		RTE_TAILQ_FOREACH(ops, &uncore_ops_list, next)
> +			if (strncmp(ops->name, uncore_env_str[env],
> +				RTE_POWER_UNCORE_DRIVER_NAMESZ) == 0) {
> +				global_uncore_env = env;
> +				global_uncore_ops = ops;
> +				ret = 0;
> +				goto out;
> +			}
> +		POWER_LOG(ERR, "Power Management (%s) not supported",
> +				uncore_env_str[env]);
> +	} else
> +		POWER_LOG(ERR, "Invalid Power Management Environment");
>   
> -	default_uncore_env = env;
>   out:
>   	rte_spinlock_unlock(&global_env_cfg_lock);
>   	return ret;
> @@ -139,15 +89,22 @@ void
>   rte_power_unset_uncore_env(void)
>   {
>   	rte_spinlock_lock(&global_env_cfg_lock);
> -	default_uncore_env = RTE_UNCORE_PM_ENV_NOT_SET;
> -	reset_power_uncore_function_ptrs();
> +	global_uncore_env = RTE_UNCORE_PM_ENV_NOT_SET;
>   	rte_spinlock_unlock(&global_env_cfg_lock);
>   }
>   

How about abstracting an ABI interface to initialize or set the uncore
driver for the platform automatically?

And later do power_intel_uncore_init_on_die() for each die on different
packages.
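
One way to picture the automatic selection suggested here is sketched below,
under assumptions: each driver registers an ops structure carrying a
hypothetical probe() callback, and the framework picks the first driver whose
probe() reports that the platform is supported. The structure, the callback
and all names are illustrative and are not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/queue.h>

/* Hypothetical, simplified ops structure; a real one has more callbacks. */
struct uncore_ops {
	TAILQ_ENTRY(uncore_ops) next;
	const char *name;
	int (*probe)(void); /* assumed: returns non-zero if platform is supported */
};

TAILQ_HEAD(uncore_ops_head, uncore_ops);
static struct uncore_ops_head ops_list = TAILQ_HEAD_INITIALIZER(ops_list);
static struct uncore_ops *active_ops;

/* Add a driver to the list; returns 0 on success, -1 on missing fields. */
static int
uncore_register(struct uncore_ops *ops)
{
	if (ops == NULL || ops->name == NULL || ops->probe == NULL)
		return -1;
	TAILQ_INSERT_TAIL(&ops_list, ops, next);
	return 0;
}

/* Pick the first registered driver whose probe() accepts this platform. */
static int
uncore_auto_select(void)
{
	struct uncore_ops *ops;

	TAILQ_FOREACH(ops, &ops_list, next) {
		if (ops->probe()) {
			active_ops = ops;
			return 0;
		}
	}
	return -1;
}

/* Name of the selected driver; only valid after a successful selection. */
static const char *
uncore_active_name(void)
{
	assert(active_ops != NULL);
	return active_ops->name;
}

/* Stub drivers used only for the demonstration. */
static int probe_never(void) { return 0; }
static int probe_always(void) { return 1; }
static struct uncore_ops demo_a = { .name = "demo-a", .probe = probe_never };
static struct uncore_ops demo_b = { .name = "demo-b", .probe = probe_always };
```

With something like this, the application would not need to name an
environment at all; the per-die init could then run on whichever driver was
selected.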

>   enum rte_uncore_power_mgmt_env
>   rte_power_get_uncore_env(void)
>   {
> -	return default_uncore_env;
> +	return global_uncore_env;
> +}
> +
> +struct rte_power_uncore_ops *
> +rte_power_get_uncore_ops(void)
> +{
> +	RTE_ASSERT(global_uncore_ops != NULL);
> +
> +	return global_uncore_ops;
>   }
>   
>   int
> @@ -155,27 +112,29 @@ rte_power_uncore_init(unsigned int pkg, unsigned int die)
This pkg means the socket id on the platform, right?
If so, I am not sure that the
uncore_info[RTE_MAX_NUMA_NODES][MAX_NUMA_DIE] used in the uncore lib is
universal for all uncore drivers.
For example, an uncore driver may only support uncore DVFS at socket
granularity.
What should we do about this? We may need to think twice.
>   {
>   	int ret = -1;
>   
<...>

^ permalink raw reply	[relevance 3%]

* [PATCH v2 0/1] bbdev: removing unnecessaray symbols
@ 2024-08-27 23:03  3% Nicolas Chautru
  2024-08-27 23:03  3% ` [PATCH v2 1/1] bbdev: removing unnecessaray symbols from version map Nicolas Chautru
  0 siblings, 1 reply; 200+ results
From: Nicolas Chautru @ 2024-08-27 23:03 UTC (permalink / raw)
  To: dev, maxime.coquelin
  Cc: hemant.agrawal, david.marchand, hernan.vargas, Nicolas Chautru

v2: Actually several functions can be removed from the bbdev version map
since they are inline and hence ABI versioning is not relevant.
I checked with other libs (cryptodev/ethdev) and the same guideline
is followed, with inline functions not part of version.map. Similarly
the script that is part of CI/CD does not enforce versioning for inline
functions either. Discussed a bit with Maxime off list.
Any thoughts? Good to clean it up now.
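
As a rough illustration of why inline functions need no version map entry,
here is a simplified sketch with hypothetical names (demo_dev, demo_devices,
demo_enqueue_ops), not the bbdev code itself. A 'static inline' wrapper is
compiled into each caller and exports no symbol, whereas the device array it
dereferences is a real exported object, which is presumably why
rte_bbdev_devices stays in the map.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_DEVS 4

/* Hypothetical device structure holding a driver-provided callback. */
struct demo_dev {
	uint16_t (*enqueue)(uint16_t queue_id, void **ops, uint16_t num_ops);
};

/*
 * In a real library this array would be defined in a .c file and listed in
 * version.map, because every caller of the inline wrapper references it.
 */
struct demo_dev demo_devices[DEMO_MAX_DEVS];

/*
 * Header-style wrapper: 'static inline' gives it internal linkage, so each
 * caller gets its own copy and the library exports no symbol for it.
 */
static inline uint16_t
demo_enqueue_ops(uint16_t dev_id, uint16_t queue_id, void **ops,
		uint16_t num_ops)
{
	struct demo_dev *dev;

	assert(dev_id < DEMO_MAX_DEVS);
	dev = &demo_devices[dev_id];
	assert(dev->enqueue != NULL);
	return dev->enqueue(queue_id, ops, num_ops);
}

/* Stub driver callback used only for the demonstration. */
static uint16_t
demo_accept_all(uint16_t queue_id, void **ops, uint16_t num_ops)
{
	(void)queue_id;
	(void)ops;
	return num_ops;
}
```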

v1: A few functions were somehow missing for the last few years
in the version map file.


Nicolas Chautru (1):
  bbdev: removing unnecessaray symbols from version map

 lib/bbdev/rte_bbdev.h    |  1 -
 lib/bbdev/rte_bbdev_op.h |  2 --
 lib/bbdev/version.map    | 24 +-----------------------
 3 files changed, 1 insertion(+), 26 deletions(-)

-- 
2.34.1


^ permalink raw reply	[relevance 3%]

* [PATCH v2 1/1] bbdev: removing unnecessaray symbols from version map
  2024-08-27 23:03  3% [PATCH v2 0/1] bbdev: removing unnecessaray symbols Nicolas Chautru
@ 2024-08-27 23:03  3% ` Nicolas Chautru
  0 siblings, 0 replies; 200+ results
From: Nicolas Chautru @ 2024-08-27 23:03 UTC (permalink / raw)
  To: dev, maxime.coquelin
  Cc: hemant.agrawal, david.marchand, hernan.vargas, Nicolas Chautru

A number of inline functions should not be in the
version map since ABI versionning would be irrelevant.

Signed-off-by: Nicolas Chautru <nicolas.chautru@intel.com>
---
 lib/bbdev/rte_bbdev.h    |  1 -
 lib/bbdev/rte_bbdev_op.h |  2 --
 lib/bbdev/version.map    | 24 +-----------------------
 3 files changed, 1 insertion(+), 26 deletions(-)

diff --git a/lib/bbdev/rte_bbdev.h b/lib/bbdev/rte_bbdev.h
index 25514c58ac..bd49a0a304 100644
--- a/lib/bbdev/rte_bbdev.h
+++ b/lib/bbdev/rte_bbdev.h
@@ -897,7 +897,6 @@ rte_bbdev_dequeue_fft_ops(uint16_t dev_id, uint16_t queue_id,
  *   The number of operations actually dequeued (this is the number of entries
  *   copied into the @p ops array).
  */
-__rte_experimental
 static inline uint16_t
 rte_bbdev_dequeue_mldts_ops(uint16_t dev_id, uint16_t queue_id,
 		struct rte_bbdev_mldts_op **ops, uint16_t num_ops)
diff --git a/lib/bbdev/rte_bbdev_op.h b/lib/bbdev/rte_bbdev_op.h
index 459631d0d0..5b862c13a6 100644
--- a/lib/bbdev/rte_bbdev_op.h
+++ b/lib/bbdev/rte_bbdev_op.h
@@ -1159,7 +1159,6 @@ rte_bbdev_fft_op_alloc_bulk(struct rte_mempool *mempool,
  *   - 0 on success.
  *   - EINVAL if invalid mempool is provided.
  */
-__rte_experimental
 static inline int
 rte_bbdev_mldts_op_alloc_bulk(struct rte_mempool *mempool,
 		struct rte_bbdev_mldts_op **ops, uint16_t num_ops)
@@ -1236,7 +1235,6 @@ rte_bbdev_fft_op_free_bulk(struct rte_bbdev_fft_op **ops, unsigned int num_ops)
  * @param num_ops
  *   Number of structures
  */
-__rte_experimental
 static inline void
 rte_bbdev_mldts_op_free_bulk(struct rte_bbdev_mldts_op **ops, unsigned int num_ops)
 {
diff --git a/lib/bbdev/version.map b/lib/bbdev/version.map
index e0d82ff752..2a5baacd90 100644
--- a/lib/bbdev/version.map
+++ b/lib/bbdev/version.map
@@ -6,21 +6,9 @@ DPDK_25 {
 	rte_bbdev_callback_unregister;
 	rte_bbdev_close;
 	rte_bbdev_count;
-	rte_bbdev_dec_op_alloc_bulk;
-	rte_bbdev_dec_op_free_bulk;
-	rte_bbdev_dequeue_dec_ops;
-	rte_bbdev_dequeue_enc_ops;
-	rte_bbdev_dequeue_fft_ops;
 	rte_bbdev_device_status_str;
 	rte_bbdev_devices;
-	rte_bbdev_enc_op_alloc_bulk;
-	rte_bbdev_enc_op_free_bulk;
-	rte_bbdev_enqueue_dec_ops;
-	rte_bbdev_enqueue_enc_ops;
-	rte_bbdev_enqueue_fft_ops;
 	rte_bbdev_enqueue_status_str;
-	rte_bbdev_fft_op_alloc_bulk;
-	rte_bbdev_fft_op_free_bulk;
 	rte_bbdev_find_next;
 	rte_bbdev_get_named_dev;
 	rte_bbdev_info_get;
@@ -44,14 +32,4 @@ DPDK_25 {
 	rte_bbdev_stop;
 
 	local: *;
-};
-
-EXPERIMENTAL {
-	global:
-
-	# added in 23.11
-	rte_bbdev_dequeue_mldts_ops;
-	rte_bbdev_enqueue_mldts_ops;
-	rte_bbdev_mldts_op_alloc_bulk;
-	rte_bbdev_mldts_op_free_bulk;
-};
+};
\ No newline at end of file
-- 
2.34.1


^ permalink raw reply	[relevance 3%]

* [PATCH v3 1/1] bbdev: removing unnecessary symbols from version map
  2024-08-27 23:06  3% [PATCH v3 0/1] bbdev: removing unnecessaray symbols Nicolas Chautru
@ 2024-08-27 23:06  3% ` Nicolas Chautru
  0 siblings, 0 replies; 200+ results
From: Nicolas Chautru @ 2024-08-27 23:06 UTC (permalink / raw)
  To: dev, maxime.coquelin
  Cc: hemant.agrawal, david.marchand, hernan.vargas, Nicolas Chautru

A number of inline functions should not be in the
version map since ABI versioning would be irrelevant.

Signed-off-by: Nicolas Chautru <nicolas.chautru@intel.com>
---
 lib/bbdev/rte_bbdev.h    |  1 -
 lib/bbdev/rte_bbdev_op.h |  2 --
 lib/bbdev/version.map    | 24 +-----------------------
 3 files changed, 1 insertion(+), 26 deletions(-)

diff --git a/lib/bbdev/rte_bbdev.h b/lib/bbdev/rte_bbdev.h
index 25514c58ac..bd49a0a304 100644
--- a/lib/bbdev/rte_bbdev.h
+++ b/lib/bbdev/rte_bbdev.h
@@ -897,7 +897,6 @@ rte_bbdev_dequeue_fft_ops(uint16_t dev_id, uint16_t queue_id,
  *   The number of operations actually dequeued (this is the number of entries
  *   copied into the @p ops array).
  */
-__rte_experimental
 static inline uint16_t
 rte_bbdev_dequeue_mldts_ops(uint16_t dev_id, uint16_t queue_id,
 		struct rte_bbdev_mldts_op **ops, uint16_t num_ops)
diff --git a/lib/bbdev/rte_bbdev_op.h b/lib/bbdev/rte_bbdev_op.h
index 459631d0d0..5b862c13a6 100644
--- a/lib/bbdev/rte_bbdev_op.h
+++ b/lib/bbdev/rte_bbdev_op.h
@@ -1159,7 +1159,6 @@ rte_bbdev_fft_op_alloc_bulk(struct rte_mempool *mempool,
  *   - 0 on success.
  *   - EINVAL if invalid mempool is provided.
  */
-__rte_experimental
 static inline int
 rte_bbdev_mldts_op_alloc_bulk(struct rte_mempool *mempool,
 		struct rte_bbdev_mldts_op **ops, uint16_t num_ops)
@@ -1236,7 +1235,6 @@ rte_bbdev_fft_op_free_bulk(struct rte_bbdev_fft_op **ops, unsigned int num_ops)
  * @param num_ops
  *   Number of structures
  */
-__rte_experimental
 static inline void
 rte_bbdev_mldts_op_free_bulk(struct rte_bbdev_mldts_op **ops, unsigned int num_ops)
 {
diff --git a/lib/bbdev/version.map b/lib/bbdev/version.map
index e0d82ff752..2a5baacd90 100644
--- a/lib/bbdev/version.map
+++ b/lib/bbdev/version.map
@@ -6,21 +6,9 @@ DPDK_25 {
 	rte_bbdev_callback_unregister;
 	rte_bbdev_close;
 	rte_bbdev_count;
-	rte_bbdev_dec_op_alloc_bulk;
-	rte_bbdev_dec_op_free_bulk;
-	rte_bbdev_dequeue_dec_ops;
-	rte_bbdev_dequeue_enc_ops;
-	rte_bbdev_dequeue_fft_ops;
 	rte_bbdev_device_status_str;
 	rte_bbdev_devices;
-	rte_bbdev_enc_op_alloc_bulk;
-	rte_bbdev_enc_op_free_bulk;
-	rte_bbdev_enqueue_dec_ops;
-	rte_bbdev_enqueue_enc_ops;
-	rte_bbdev_enqueue_fft_ops;
 	rte_bbdev_enqueue_status_str;
-	rte_bbdev_fft_op_alloc_bulk;
-	rte_bbdev_fft_op_free_bulk;
 	rte_bbdev_find_next;
 	rte_bbdev_get_named_dev;
 	rte_bbdev_info_get;
@@ -44,14 +32,4 @@ DPDK_25 {
 	rte_bbdev_stop;
 
 	local: *;
-};
-
-EXPERIMENTAL {
-	global:
-
-	# added in 23.11
-	rte_bbdev_dequeue_mldts_ops;
-	rte_bbdev_enqueue_mldts_ops;
-	rte_bbdev_mldts_op_alloc_bulk;
-	rte_bbdev_mldts_op_free_bulk;
-};
+};
\ No newline at end of file
-- 
2.34.1


^ permalink raw reply	[relevance 3%]

* [PATCH v3 0/1] bbdev: removing unnecessaray symbols
@ 2024-08-27 23:06  3% Nicolas Chautru
  2024-08-27 23:06  3% ` [PATCH v3 1/1] bbdev: removing unnecessary symbols from version map Nicolas Chautru
  0 siblings, 1 reply; 200+ results
From: Nicolas Chautru @ 2024-08-27 23:06 UTC (permalink / raw)
  To: dev, maxime.coquelin
  Cc: hemant.agrawal, david.marchand, hernan.vargas, Nicolas Chautru

v3: typo fixes.

v2: Actually several functions can be removed from the bbdev version map
since they are inline and hence ABI versioning is not relevant.
I checked with other libs (cryptodev/ethdev) and the same guideline is
followed, with inline functions not part of version.map. Similarly the
script that is part of CI/CD does not enforce versioning for inline functions
either. Discussed a bit with Maxime off list.
Any thoughts? Good to clean it up now.

v1: A few functions were somehow missing for the last few years in the
version map file.



Nicolas Chautru (1):
  bbdev: removing unnecessary symbols from version map

 lib/bbdev/rte_bbdev.h    |  1 -
 lib/bbdev/rte_bbdev_op.h |  2 --
 lib/bbdev/version.map    | 24 +-----------------------
 3 files changed, 1 insertion(+), 26 deletions(-)

-- 
2.34.1


^ permalink raw reply	[relevance 3%]

* Re: [RFC 1/2] eal: add llc aware functions
  @ 2024-09-02  1:20  4%     ` Varghese, Vipin
  0 siblings, 0 replies; 200+ results
From: Varghese, Vipin @ 2024-09-02  1:20 UTC (permalink / raw)
  To: Wathsala Wathawana Vithanage, ferruh.yigit, dev; +Cc: nd

[-- Attachment #1: Type: text/plain, Size: 1706 bytes --]

<Snipped>
>> -unsigned int rte_get_next_lcore(unsigned int i, int skip_main, int wrap)
>> +#define LCORE_GET_LLC   \
>> +             "ls -d /sys/bus/cpu/devices/cpu%u/cache/index[0-9] | sort  -r
>> | grep -m1 index[0-9] | awk -F '[x]' '{print $2}' "
>>
> This won't work for some SOCs.

Thank you for your response. Please find our response and queries below.

> How to ensure the index you got is for an LLC?

We referred to the following:

* "How CPU topology info is exported via sysfs", The Linux Kernel
  documentation
  <https://www.kernel.org/doc/html/latest/admin-guide/cputopology.html>
* Documentation/ABI/stable/sysfs-devices-system-cpu in torvalds/linux
  <https://github.com/torvalds/linux/blob/master/Documentation/ABI/stable/sysfs-devices-system-cpu>
* "Get Cache Info in Linux on ARMv8 64-bit Platform"
  <https://zhiyisun.github.io/2016/06/25/Get-Cache-Info-in-Linux-on-ARMv8-64-bit-Platform.html>

Based on my current understanding, on a bare-metal 64-bit Linux OS (which is
supported by most distros), the cache topology is populated into sysfs.

>   Some SOCs may only show upper-level caches here, therefore cannot be use blindly without knowing the SOC.
Can you please help us understand:

1. Are there specific SoCs which do not populate the information at
all? If yes, are they in DTS?

2. Are there specific SoCs which do not export it to a hypervisor like
QEMU or Xen?


We can work together to make it compatible.

> Also, unacceptable to execute a shell script, consider implementing in C.
As the intention of the RFC is to share a possible API and macro, we
welcome suggestions on the implementation, as agreed with Stephen.
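
As a possible starting point for a C implementation, the sketch below walks
the sysfs cache directory of one CPU and picks the entry with the highest
value in its "level" file, instead of relying on the highest index number.
The function names are hypothetical, and it only reports what sysfs exposes,
so it would not address SoCs where the last-level cache is missing from
sysfs.

```c
#include <assert.h>
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Parse "indexN" and return N, or -1 if the name has another form. */
static int
parse_cache_index(const char *name)
{
	char *end;
	long val;

	assert(name != NULL);
	if (strncmp(name, "index", 5) != 0 || !isdigit((unsigned char)name[5]))
		return -1;
	val = strtol(name + 5, &end, 10);
	if (*end != '\0' || val > 255)
		return -1;
	return (int)val;
}

/*
 * Scan <cache_dir>/indexN/level and return the N with the highest level,
 * or -1 if nothing usable is found. cache_dir would typically be
 * /sys/bus/cpu/devices/cpu<id>/cache.
 */
static int
find_llc_index(const char *cache_dir)
{
	struct dirent *ent;
	int best_idx = -1, best_level = -1;
	DIR *dir = opendir(cache_dir);

	if (dir == NULL)
		return -1;

	while ((ent = readdir(dir)) != NULL) {
		char path[512];
		int idx, level;
		FILE *f;

		idx = parse_cache_index(ent->d_name);
		if (idx < 0)
			continue;
		snprintf(path, sizeof(path), "%s/%s/level", cache_dir,
			 ent->d_name);
		f = fopen(path, "r");
		if (f == NULL)
			continue;
		if (fscanf(f, "%d", &level) == 1 && level > best_level) {
			best_level = level;
			best_idx = idx;
		}
		fclose(f);
	}
	closedir(dir);
	return best_idx;
}
```

A caller would format the per-lcore path and pass it in, for example
find_llc_index("/sys/bus/cpu/devices/cpu0/cache"), and treat -1 as "cache
topology not available".
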
>
> --wathsala
>
>

[-- Attachment #2: Type: text/html, Size: 3373 bytes --]

^ permalink raw reply	[relevance 4%]

* [PATCH 2/4] net/nfp: refactor the firmware version logic
  @ 2024-09-03  1:41  2% ` Chaoyong He
  2024-09-03  1:41  2% ` [PATCH 3/4] net/nfp: support different configuration BAR size Chaoyong He
  1 sibling, 0 replies; 200+ results
From: Chaoyong He @ 2024-09-03  1:41 UTC (permalink / raw)
  To: dev; +Cc: oss-drivers, Chaoyong He, Long Wu

Move the 'ver' data field from 'struct nfp_net_hw' into
'struct nfp_pf_dev', also modify the related logic.

Signed-off-by: Chaoyong He <chaoyong.he@corigine.com>
Reviewed-by: Long Wu <long.wu@corigine.com>
---
 drivers/net/nfp/flower/nfp_flower.c      | 30 ++++++-------
 drivers/net/nfp/flower/nfp_flower_ctrl.c |  8 ++--
 drivers/net/nfp/flower/nfp_flower_ctrl.h |  2 +-
 drivers/net/nfp/nfd3/nfp_nfd3_dp.c       |  2 +-
 drivers/net/nfp/nfdk/nfp_nfdk_dp.c       |  2 +-
 drivers/net/nfp/nfp_ethdev.c             | 29 +++++++-----
 drivers/net/nfp/nfp_ethdev_vf.c          | 30 ++++++++++---
 drivers/net/nfp/nfp_net_common.c         | 56 ++++++++++++++----------
 drivers/net/nfp/nfp_net_common.h         | 16 ++++---
 drivers/net/nfp/nfp_net_meta.c           |  5 ++-
 drivers/net/nfp/nfp_net_meta.h           |  5 ++-
 drivers/net/nfp/nfp_rxtx.c               | 10 ++---
 12 files changed, 118 insertions(+), 77 deletions(-)

diff --git a/drivers/net/nfp/flower/nfp_flower.c b/drivers/net/nfp/flower/nfp_flower.c
index 8d781658ea..4d91d548f7 100644
--- a/drivers/net/nfp/flower/nfp_flower.c
+++ b/drivers/net/nfp/flower/nfp_flower.c
@@ -166,15 +166,15 @@ nfp_flower_pf_nfdk_xmit_pkts(void *tx_queue,
 }
 
 static void
-nfp_flower_pf_xmit_pkts_register(struct nfp_app_fw_flower *app_fw_flower)
+nfp_flower_pf_xmit_pkts_register(struct nfp_pf_dev *pf_dev)
 {
-	struct nfp_net_hw *hw;
 	struct nfp_flower_nfd_func *nfd_func;
+	struct nfp_app_fw_flower *app_fw_flower;
 
-	hw = app_fw_flower->pf_hw;
+	app_fw_flower = pf_dev->app_fw_priv;
 	nfd_func = &app_fw_flower->nfd_func;
 
-	if (hw->ver.extend == NFP_NET_CFG_VERSION_DP_NFD3)
+	if (pf_dev->ver.extend == NFP_NET_CFG_VERSION_DP_NFD3)
 		nfd_func->pf_xmit_t = nfp_flower_pf_nfd3_xmit_pkts;
 	else
 		nfd_func->pf_xmit_t = nfp_flower_pf_nfdk_xmit_pkts;
@@ -204,14 +204,12 @@ nfp_flower_init_vnic_common(struct nfp_net_hw_priv *hw_priv,
 	uint64_t rx_bar_off;
 	uint64_t tx_bar_off;
 	struct nfp_pf_dev *pf_dev;
-	struct rte_pci_device *pci_dev;
 
 	pf_dev = hw_priv->pf_dev;
-	pci_dev = pf_dev->pci_dev;
 
 	PMD_INIT_LOG(DEBUG, "%s vNIC ctrl bar: %p", vnic_type, hw->super.ctrl_bar);
 
-	err = nfp_net_common_init(pci_dev, hw);
+	err = nfp_net_common_init(pf_dev, hw);
 	if (err != 0)
 		return err;
 
@@ -612,15 +610,15 @@ nfp_flower_start_ctrl_vnic(struct nfp_app_fw_flower *app_fw_flower)
 }
 
 static void
-nfp_flower_pkt_add_metadata_register(struct nfp_app_fw_flower *app_fw_flower)
+nfp_flower_pkt_add_metadata_register(struct nfp_pf_dev *pf_dev)
 {
-	struct nfp_net_hw *hw;
 	struct nfp_flower_nfd_func *nfd_func;
+	struct nfp_app_fw_flower *app_fw_flower;
 
-	hw = app_fw_flower->pf_hw;
+	app_fw_flower = pf_dev->app_fw_priv;
 	nfd_func = &app_fw_flower->nfd_func;
 
-	if (hw->ver.extend == NFP_NET_CFG_VERSION_DP_NFD3)
+	if (pf_dev->ver.extend == NFP_NET_CFG_VERSION_DP_NFD3)
 		nfd_func->pkt_add_metadata_t = nfp_flower_nfd3_pkt_add_metadata;
 	else
 		nfd_func->pkt_add_metadata_t = nfp_flower_nfdk_pkt_add_metadata;
@@ -635,11 +633,11 @@ nfp_flower_pkt_add_metadata(struct nfp_app_fw_flower *app_fw_flower,
 }
 
 static void
-nfp_flower_nfd_func_register(struct nfp_app_fw_flower *app_fw_flower)
+nfp_flower_nfd_func_register(struct nfp_pf_dev *pf_dev)
 {
-	nfp_flower_pkt_add_metadata_register(app_fw_flower);
-	nfp_flower_ctrl_vnic_xmit_register(app_fw_flower);
-	nfp_flower_pf_xmit_pkts_register(app_fw_flower);
+	nfp_flower_pkt_add_metadata_register(pf_dev);
+	nfp_flower_ctrl_vnic_xmit_register(pf_dev);
+	nfp_flower_pf_xmit_pkts_register(pf_dev);
 }
 
 int
@@ -730,7 +728,7 @@ nfp_init_app_fw_flower(struct nfp_net_hw_priv *hw_priv)
 		goto pf_cpp_area_cleanup;
 	}
 
-	nfp_flower_nfd_func_register(app_fw_flower);
+	nfp_flower_nfd_func_register(pf_dev);
 
 	/* The ctrl vNIC struct comes directly after the PF one */
 	app_fw_flower->ctrl_hw = pf_hw + 1;
diff --git a/drivers/net/nfp/flower/nfp_flower_ctrl.c b/drivers/net/nfp/flower/nfp_flower_ctrl.c
index a46b849d1b..9b957e1f1e 100644
--- a/drivers/net/nfp/flower/nfp_flower_ctrl.c
+++ b/drivers/net/nfp/flower/nfp_flower_ctrl.c
@@ -343,15 +343,15 @@ nfp_flower_ctrl_vnic_nfdk_xmit(struct nfp_app_fw_flower *app_fw_flower,
 }
 
 void
-nfp_flower_ctrl_vnic_xmit_register(struct nfp_app_fw_flower *app_fw_flower)
+nfp_flower_ctrl_vnic_xmit_register(struct nfp_pf_dev *pf_dev)
 {
-	struct nfp_net_hw *hw;
 	struct nfp_flower_nfd_func *nfd_func;
+	struct nfp_app_fw_flower *app_fw_flower;
 
-	hw = app_fw_flower->pf_hw;
+	app_fw_flower = pf_dev->app_fw_priv;
 	nfd_func = &app_fw_flower->nfd_func;
 
-	if (hw->ver.extend == NFP_NET_CFG_VERSION_DP_NFD3)
+	if (pf_dev->ver.extend == NFP_NET_CFG_VERSION_DP_NFD3)
 		nfd_func->ctrl_vnic_xmit_t = nfp_flower_ctrl_vnic_nfd3_xmit;
 	else
 		nfd_func->ctrl_vnic_xmit_t = nfp_flower_ctrl_vnic_nfdk_xmit;
diff --git a/drivers/net/nfp/flower/nfp_flower_ctrl.h b/drivers/net/nfp/flower/nfp_flower_ctrl.h
index b5d0036c01..f18c8f4095 100644
--- a/drivers/net/nfp/flower/nfp_flower_ctrl.h
+++ b/drivers/net/nfp/flower/nfp_flower_ctrl.h
@@ -11,6 +11,6 @@
 void nfp_flower_ctrl_vnic_process(struct nfp_net_hw_priv *hw_priv);
 uint16_t nfp_flower_ctrl_vnic_xmit(struct nfp_app_fw_flower *app_fw_flower,
 		struct rte_mbuf *mbuf);
-void nfp_flower_ctrl_vnic_xmit_register(struct nfp_app_fw_flower *app_fw_flower);
+void nfp_flower_ctrl_vnic_xmit_register(struct nfp_pf_dev *pf_dev);
 
 #endif /* __NFP_FLOWER_CTRL_H__ */
diff --git a/drivers/net/nfp/nfd3/nfp_nfd3_dp.c b/drivers/net/nfp/nfd3/nfp_nfd3_dp.c
index ee96cd8e46..4ff1ae63b0 100644
--- a/drivers/net/nfp/nfd3/nfp_nfd3_dp.c
+++ b/drivers/net/nfp/nfd3/nfp_nfd3_dp.c
@@ -390,7 +390,7 @@ nfp_net_nfd3_tx_queue_setup(struct rte_eth_dev *dev,
 	hw = nfp_net_get_hw(dev);
 	hw_priv = dev->process_private;
 
-	nfp_net_tx_desc_limits(hw, hw_priv, &min_tx_desc, &max_tx_desc);
+	nfp_net_tx_desc_limits(hw_priv, &min_tx_desc, &max_tx_desc);
 
 	/* Validating number of descriptors */
 	tx_desc_sz = nb_desc * sizeof(struct nfp_net_nfd3_tx_desc);
diff --git a/drivers/net/nfp/nfdk/nfp_nfdk_dp.c b/drivers/net/nfp/nfdk/nfp_nfdk_dp.c
index 2cea5688b3..68fcbe93da 100644
--- a/drivers/net/nfp/nfdk/nfp_nfdk_dp.c
+++ b/drivers/net/nfp/nfdk/nfp_nfdk_dp.c
@@ -424,7 +424,7 @@ nfp_net_nfdk_tx_queue_setup(struct rte_eth_dev *dev,
 	hw = nfp_net_get_hw(dev);
 	hw_priv = dev->process_private;
 
-	nfp_net_tx_desc_limits(hw, hw_priv, &min_tx_desc, &max_tx_desc);
+	nfp_net_tx_desc_limits(hw_priv, &min_tx_desc, &max_tx_desc);
 
 	/* Validating number of descriptors */
 	tx_desc_sz = nb_desc * sizeof(struct nfp_net_nfdk_tx_desc);
diff --git a/drivers/net/nfp/nfp_ethdev.c b/drivers/net/nfp/nfp_ethdev.c
index 181fd74efe..d85993f70c 100644
--- a/drivers/net/nfp/nfp_ethdev.c
+++ b/drivers/net/nfp/nfp_ethdev.c
@@ -959,10 +959,10 @@ static const struct eth_dev_ops nfp_net_eth_dev_ops = {
 };
 
 static inline void
-nfp_net_ethdev_ops_mount(struct nfp_net_hw *hw,
+nfp_net_ethdev_ops_mount(struct nfp_pf_dev *pf_dev,
 		struct rte_eth_dev *eth_dev)
 {
-	if (hw->ver.extend == NFP_NET_CFG_VERSION_DP_NFD3)
+	if (pf_dev->ver.extend == NFP_NET_CFG_VERSION_DP_NFD3)
 		eth_dev->tx_pkt_burst = nfp_net_nfd3_xmit_pkts;
 	else
 		nfp_net_nfdk_xmit_pkts_set(eth_dev);
@@ -1030,7 +1030,7 @@ nfp_net_init(struct rte_eth_dev *eth_dev,
 	PMD_INIT_LOG(DEBUG, "ctrl bar: %p", hw->ctrl_bar);
 	PMD_INIT_LOG(DEBUG, "MAC stats: %p", net_hw->mac_stats);
 
-	err = nfp_net_common_init(pci_dev, net_hw);
+	err = nfp_net_common_init(pf_dev, net_hw);
 	if (err != 0)
 		return err;
 
@@ -1046,7 +1046,7 @@ nfp_net_init(struct rte_eth_dev *eth_dev,
 		return err;
 	}
 
-	nfp_net_ethdev_ops_mount(net_hw, eth_dev);
+	nfp_net_ethdev_ops_mount(pf_dev, eth_dev);
 
 	net_hw->eth_xstats_base = rte_malloc("rte_eth_xstat", sizeof(struct rte_eth_xstat) *
 			nfp_net_xstats_size(eth_dev), 0);
@@ -1074,7 +1074,7 @@ nfp_net_init(struct rte_eth_dev *eth_dev,
 	if ((hw->cap & NFP_NET_CFG_CTRL_LSO2) != 0)
 		hw->cap &= ~NFP_NET_CFG_CTRL_TXVLAN;
 
-	nfp_net_log_device_information(net_hw);
+	nfp_net_log_device_information(net_hw, pf_dev);
 
 	/* Initializing spinlock for reconfigs */
 	rte_spinlock_init(&hw->reconfig_lock);
@@ -1552,9 +1552,6 @@ nfp_enable_multi_pf(struct nfp_pf_dev *pf_dev)
 	struct nfp_cpp_area *area;
 	char name[RTE_ETH_NAME_MAX_LEN];
 
-	if (!pf_dev->multi_pf.enabled)
-		return 0;
-
 	memset(&net_hw, 0, sizeof(struct nfp_net_hw));
 
 	/* Map the symbol table */
@@ -1570,6 +1567,16 @@ nfp_enable_multi_pf(struct nfp_pf_dev *pf_dev)
 	hw = &net_hw.super;
 	hw->ctrl_bar = ctrl_bar;
 
+	/* Check the version from firmware */
+	if (!nfp_net_version_check(hw, pf_dev)) {
+		PMD_INIT_LOG(ERR, "Not the valid version.");
+		err = -EINVAL;
+		goto end;
+	}
+
+	if (!pf_dev->multi_pf.enabled)
+		goto end;
+
 	cap_extend = nn_cfg_readl(hw, NFP_NET_CFG_CAP_WORD1);
 	if ((cap_extend & NFP_NET_CFG_CTRL_MULTI_PF) == 0) {
 		PMD_INIT_LOG(ERR, "Loaded firmware doesn't support multiple PF");
@@ -2358,10 +2365,10 @@ static int
 nfp_secondary_net_init(struct rte_eth_dev *eth_dev,
 		void *para)
 {
-	struct nfp_net_hw *net_hw;
+	struct nfp_net_hw_priv *hw_priv;
 
-	net_hw = eth_dev->data->dev_private;
-	nfp_net_ethdev_ops_mount(net_hw, eth_dev);
+	hw_priv = para;
+	nfp_net_ethdev_ops_mount(hw_priv->pf_dev, eth_dev);
 
 	eth_dev->process_private = para;
 
diff --git a/drivers/net/nfp/nfp_ethdev_vf.c b/drivers/net/nfp/nfp_ethdev_vf.c
index cdf5da3af7..2e581c7e45 100644
--- a/drivers/net/nfp/nfp_ethdev_vf.c
+++ b/drivers/net/nfp/nfp_ethdev_vf.c
@@ -235,10 +235,10 @@ static const struct eth_dev_ops nfp_netvf_eth_dev_ops = {
 };
 
 static inline void
-nfp_netvf_ethdev_ops_mount(struct nfp_net_hw *hw,
+nfp_netvf_ethdev_ops_mount(struct nfp_pf_dev *pf_dev,
 		struct rte_eth_dev *eth_dev)
 {
-	if (hw->ver.extend == NFP_NET_CFG_VERSION_DP_NFD3)
+	if (pf_dev->ver.extend == NFP_NET_CFG_VERSION_DP_NFD3)
 		eth_dev->tx_pkt_burst = nfp_net_nfd3_xmit_pkts;
 	else
 		nfp_net_nfdk_xmit_pkts_set(eth_dev);
@@ -256,6 +256,7 @@ nfp_netvf_init(struct rte_eth_dev *eth_dev)
 	uint32_t start_q;
 	struct nfp_hw *hw;
 	struct nfp_net_hw *net_hw;
+	struct nfp_pf_dev *pf_dev;
 	uint64_t tx_bar_off = 0;
 	uint64_t rx_bar_off = 0;
 	struct rte_pci_device *pci_dev;
@@ -280,13 +281,27 @@ nfp_netvf_init(struct rte_eth_dev *eth_dev)
 		return -ENODEV;
 	}
 
+	pf_dev = rte_zmalloc(NULL, sizeof(*pf_dev), 0);
+	if (pf_dev == NULL) {
+		PMD_INIT_LOG(ERR, "Can not allocate memory for the PF device.");
+		return -ENOMEM;
+	}
+
+	pf_dev->pci_dev = pci_dev;
+
+	/* Check the version from firmware */
+	if (!nfp_net_version_check(hw, pf_dev)) {
+		err = -EINVAL;
+		goto pf_dev_free;
+	}
+
 	PMD_INIT_LOG(DEBUG, "ctrl bar: %p", hw->ctrl_bar);
 
-	err = nfp_net_common_init(pci_dev, net_hw);
+	err = nfp_net_common_init(pf_dev, net_hw);
 	if (err != 0)
-		return err;
+		goto pf_dev_free;
 
-	nfp_netvf_ethdev_ops_mount(net_hw, eth_dev);
+	nfp_netvf_ethdev_ops_mount(pf_dev, eth_dev);
 
 	hw_priv = rte_zmalloc(NULL, sizeof(*hw_priv), 0);
 	if (hw_priv == NULL) {
@@ -296,6 +311,7 @@ nfp_netvf_init(struct rte_eth_dev *eth_dev)
 	}
 
 	hw_priv->dev_info = dev_info;
+	hw_priv->pf_dev = pf_dev;
 
 	eth_dev->process_private = hw_priv;
 
@@ -330,7 +346,7 @@ nfp_netvf_init(struct rte_eth_dev *eth_dev)
 	if ((hw->cap & NFP_NET_CFG_CTRL_LSO2) != 0)
 		hw->cap &= ~NFP_NET_CFG_CTRL_TXVLAN;
 
-	nfp_net_log_device_information(net_hw);
+	nfp_net_log_device_information(net_hw, pf_dev);
 
 	/* Initializing spinlock for reconfigs */
 	rte_spinlock_init(&hw->reconfig_lock);
@@ -381,6 +397,8 @@ nfp_netvf_init(struct rte_eth_dev *eth_dev)
 	rte_free(net_hw->eth_xstats_base);
 hw_priv_free:
 	rte_free(hw_priv);
+pf_dev_free:
+	rte_free(pf_dev);
 
 	return err;
 }
diff --git a/drivers/net/nfp/nfp_net_common.c b/drivers/net/nfp/nfp_net_common.c
index b471fd032a..e4e01d8c79 100644
--- a/drivers/net/nfp/nfp_net_common.c
+++ b/drivers/net/nfp/nfp_net_common.c
@@ -349,13 +349,14 @@ nfp_net_configure(struct rte_eth_dev *dev)
 }
 
 void
-nfp_net_log_device_information(const struct nfp_net_hw *hw)
+nfp_net_log_device_information(const struct nfp_net_hw *hw,
+		struct nfp_pf_dev *pf_dev)
 {
 	uint32_t cap = hw->super.cap;
 	uint32_t cap_ext = hw->super.cap_ext;
 
 	PMD_INIT_LOG(INFO, "VER: %u.%u, Maximum supported MTU: %d",
-			hw->ver.major, hw->ver.minor, hw->max_mtu);
+			pf_dev->ver.major, pf_dev->ver.minor, hw->max_mtu);
 
 	PMD_INIT_LOG(INFO, "CAP: %#x", cap);
 	PMD_INIT_LOG(INFO, "%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s",
@@ -1235,14 +1236,13 @@ nfp_net_rx_desc_limits(struct nfp_net_hw_priv *hw_priv,
 }
 
 void
-nfp_net_tx_desc_limits(struct nfp_net_hw *hw,
-		struct nfp_net_hw_priv *hw_priv,
+nfp_net_tx_desc_limits(struct nfp_net_hw_priv *hw_priv,
 		uint16_t *min_tx_desc,
 		uint16_t *max_tx_desc)
 {
 	uint16_t tx_dpp;
 
-	if (hw->ver.extend == NFP_NET_CFG_VERSION_DP_NFD3)
+	if (hw_priv->pf_dev->ver.extend == NFP_NET_CFG_VERSION_DP_NFD3)
 		tx_dpp = NFD3_TX_DESC_PER_PKT;
 	else
 		tx_dpp = NFDK_TX_DESC_PER_SIMPLE_PKT;
@@ -1269,7 +1269,7 @@ nfp_net_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 		return -EINVAL;
 
 	nfp_net_rx_desc_limits(hw_priv, &min_rx_desc, &max_rx_desc);
-	nfp_net_tx_desc_limits(hw, hw_priv, &min_tx_desc, &max_tx_desc);
+	nfp_net_tx_desc_limits(hw_priv, &min_tx_desc, &max_tx_desc);
 
 	dev_info->max_rx_queues = (uint16_t)hw->max_rx_queues;
 	dev_info->max_tx_queues = (uint16_t)hw->max_tx_queues;
@@ -1373,11 +1373,13 @@ nfp_net_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 }
 
 int
-nfp_net_common_init(struct rte_pci_device *pci_dev,
+nfp_net_common_init(struct nfp_pf_dev *pf_dev,
 		struct nfp_net_hw *hw)
 {
 	const int stride = 4;
+	struct rte_pci_device *pci_dev;
 
+	pci_dev = pf_dev->pci_dev;
 	hw->device_id = pci_dev->id.device_id;
 	hw->vendor_id = pci_dev->id.vendor_id;
 	hw->subsystem_device_id = pci_dev->id.subsystem_device_id;
@@ -1391,11 +1393,7 @@ nfp_net_common_init(struct rte_pci_device *pci_dev,
 		return -ENODEV;
 	}
 
-	nfp_net_cfg_read_version(hw);
-	if (!nfp_net_is_valid_nfd_version(hw->ver))
-		return -EINVAL;
-
-	if (nfp_net_check_dma_mask(hw, pci_dev->name) != 0)
+	if (nfp_net_check_dma_mask(pf_dev, pci_dev->name) != 0)
 		return -ENODEV;
 
 	/* Get some of the read-only fields from the config BAR */
@@ -1404,10 +1402,10 @@ nfp_net_common_init(struct rte_pci_device *pci_dev,
 	hw->max_mtu = nn_cfg_readl(&hw->super, NFP_NET_CFG_MAX_MTU);
 	hw->flbufsz = DEFAULT_FLBUF_SIZE;
 
-	nfp_net_meta_init_format(hw);
+	nfp_net_meta_init_format(hw, pf_dev);
 
 	/* Read the Rx offset configured from firmware */
-	if (hw->ver.major < 2)
+	if (pf_dev->ver.major < 2)
 		hw->rx_offset = NFP_NET_RX_OFFSET;
 	else
 		hw->rx_offset = nn_cfg_readl(&hw->super, NFP_NET_CFG_RX_OFFSET);
@@ -2118,10 +2116,10 @@ nfp_net_set_vxlan_port(struct nfp_net_hw *net_hw,
  * than 40 bits.
  */
 int
-nfp_net_check_dma_mask(struct nfp_net_hw *hw,
+nfp_net_check_dma_mask(struct nfp_pf_dev *pf_dev,
 		char *name)
 {
-	if (hw->ver.extend == NFP_NET_CFG_VERSION_DP_NFD3 &&
+	if (pf_dev->ver.extend == NFP_NET_CFG_VERSION_DP_NFD3 &&
 			rte_mem_check_dma_mask(40) != 0) {
 		PMD_DRV_LOG(ERR, "Device %s can't be used: restricted dma mask to 40 bits!",
 				name);
@@ -2165,16 +2163,28 @@ nfp_net_txrwb_free(struct rte_eth_dev *eth_dev)
 	net_hw->txrwb_mz = NULL;
 }
 
-void
-nfp_net_cfg_read_version(struct nfp_net_hw *hw)
+static void
+nfp_net_cfg_read_version(struct nfp_hw *hw,
+		struct nfp_pf_dev *pf_dev)
 {
 	union {
 		uint32_t whole;
 		struct nfp_net_fw_ver split;
 	} version;
 
-	version.whole = nn_cfg_readl(&hw->super, NFP_NET_CFG_VERSION);
-	hw->ver = version.split;
+	version.whole = nn_cfg_readl(hw, NFP_NET_CFG_VERSION);
+	pf_dev->ver = version.split;
+}
+
+bool
+nfp_net_version_check(struct nfp_hw *hw,
+		struct nfp_pf_dev *pf_dev)
+{
+	nfp_net_cfg_read_version(hw, pf_dev);
+	if (!nfp_net_is_valid_nfd_version(pf_dev->ver))
+		return false;
+
+	return true;
 }
 
 static void
@@ -2249,6 +2259,7 @@ nfp_net_firmware_version_get(struct rte_eth_dev *dev,
 		size_t fw_size)
 {
 	struct nfp_net_hw *hw;
+	struct nfp_pf_dev *pf_dev;
 	struct nfp_net_hw_priv *hw_priv;
 	char app_name[FW_VER_LEN] = {0};
 	char mip_name[FW_VER_LEN] = {0};
@@ -2260,6 +2271,7 @@ nfp_net_firmware_version_get(struct rte_eth_dev *dev,
 
 	hw = nfp_net_get_hw(dev);
 	hw_priv = dev->process_private;
+	pf_dev = hw_priv->pf_dev;
 
 	if (hw->fw_version[0] != 0) {
 		snprintf(fw_version, FW_VER_LEN, "%s", hw->fw_version);
@@ -2268,8 +2280,8 @@ nfp_net_firmware_version_get(struct rte_eth_dev *dev,
 
 	if (!rte_eth_dev_is_repr(dev)) {
 		snprintf(vnic_version, FW_VER_LEN, "%d.%d.%d.%d",
-			hw->ver.extend, hw->ver.class,
-			hw->ver.major, hw->ver.minor);
+			pf_dev->ver.extend, pf_dev->ver.class,
+			pf_dev->ver.major, pf_dev->ver.minor);
 	} else {
 		snprintf(vnic_version, FW_VER_LEN, "*");
 	}
diff --git a/drivers/net/nfp/nfp_net_common.h b/drivers/net/nfp/nfp_net_common.h
index 67ec5a2d89..8d0922d48c 100644
--- a/drivers/net/nfp/nfp_net_common.h
+++ b/drivers/net/nfp/nfp_net_common.h
@@ -108,6 +108,8 @@ struct nfp_pf_dev {
 
 	enum nfp_app_fw_id app_fw_id;
 
+	struct nfp_net_fw_ver ver;
+
 	/** Pointer to the app running on the PF */
 	void *app_fw_priv;
 
@@ -219,7 +221,6 @@ struct nfp_net_hw {
 	const struct rte_memzone *txrwb_mz;
 
 	/** Info from the firmware */
-	struct nfp_net_fw_ver ver;
 	uint32_t max_mtu;
 	uint32_t mtu;
 	uint32_t rx_offset;
@@ -276,8 +277,9 @@ nfp_qcp_queue_offset(const struct nfp_dev_info *dev_info,
 /* Prototypes for common NFP functions */
 int nfp_net_mbox_reconfig(struct nfp_net_hw *hw, uint32_t mbox_cmd);
 int nfp_net_configure(struct rte_eth_dev *dev);
-int nfp_net_common_init(struct rte_pci_device *pci_dev, struct nfp_net_hw *hw);
-void nfp_net_log_device_information(const struct nfp_net_hw *hw);
+int nfp_net_common_init(struct nfp_pf_dev *pf_dev, struct nfp_net_hw *hw);
+void nfp_net_log_device_information(const struct nfp_net_hw *hw,
+		struct nfp_pf_dev *pf_dev);
 void nfp_net_enable_queues(struct rte_eth_dev *dev);
 void nfp_net_disable_queues(struct rte_eth_dev *dev);
 void nfp_net_params_setup(struct nfp_net_hw *hw);
@@ -345,12 +347,10 @@ int nfp_net_set_vxlan_port(struct nfp_net_hw *hw, size_t idx, uint16_t port);
 void nfp_net_rx_desc_limits(struct nfp_net_hw_priv *hw_priv,
 		uint16_t *min_rx_desc,
 		uint16_t *max_rx_desc);
-void nfp_net_tx_desc_limits(struct nfp_net_hw *hw,
-		struct nfp_net_hw_priv *hw_priv,
+void nfp_net_tx_desc_limits(struct nfp_net_hw_priv *hw_priv,
 		uint16_t *min_tx_desc,
 		uint16_t *max_tx_desc);
-int nfp_net_check_dma_mask(struct nfp_net_hw *hw, char *name);
-void nfp_net_cfg_read_version(struct nfp_net_hw *hw);
+int nfp_net_check_dma_mask(struct nfp_pf_dev *pf_dev, char *name);
 int nfp_net_firmware_version_get(struct rte_eth_dev *dev, char *fw_version, size_t fw_size);
 bool nfp_net_is_valid_nfd_version(struct nfp_net_fw_ver version);
 struct nfp_net_hw *nfp_net_get_hw(const struct rte_eth_dev *dev);
@@ -377,6 +377,8 @@ uint8_t nfp_function_id_get(const struct nfp_pf_dev *pf_dev,
 		uint8_t port_id);
 int nfp_net_vf_config_app_init(struct nfp_net_hw *net_hw,
 		struct nfp_pf_dev *pf_dev);
+bool nfp_net_version_check(struct nfp_hw *hw,
+		struct nfp_pf_dev *pf_dev);
 
 #define NFP_PRIV_TO_APP_FW_NIC(app_fw_priv)\
 	((struct nfp_app_fw_nic *)app_fw_priv)
diff --git a/drivers/net/nfp/nfp_net_meta.c b/drivers/net/nfp/nfp_net_meta.c
index 07c6758d33..5a67f87bee 100644
--- a/drivers/net/nfp/nfp_net_meta.c
+++ b/drivers/net/nfp/nfp_net_meta.c
@@ -269,14 +269,15 @@ nfp_net_meta_parse(struct nfp_net_rx_desc *rxds,
 }
 
 void
-nfp_net_meta_init_format(struct nfp_net_hw *hw)
+nfp_net_meta_init_format(struct nfp_net_hw *hw,
+		struct nfp_pf_dev *pf_dev)
 {
 	/*
 	 * ABI 4.x and ctrl vNIC always use chained metadata, in other cases we allow use of
 	 * single metadata if only RSS(v1) is supported by hw capability, and RSS(v2)
 	 * also indicate that we are using chained metadata.
 	 */
-	if (hw->ver.major == 4) {
+	if (pf_dev->ver.major == 4) {
 		hw->meta_format = NFP_NET_METAFORMAT_CHAINED;
 	} else if ((hw->super.cap & NFP_NET_CFG_CTRL_CHAIN_META) != 0) {
 		hw->meta_format = NFP_NET_METAFORMAT_CHAINED;
diff --git a/drivers/net/nfp/nfp_net_meta.h b/drivers/net/nfp/nfp_net_meta.h
index 69d08cf3a7..c3d84dff60 100644
--- a/drivers/net/nfp/nfp_net_meta.h
+++ b/drivers/net/nfp/nfp_net_meta.h
@@ -89,7 +89,10 @@ struct nfp_net_meta_parsed {
 	} vlan[NFP_NET_META_MAX_VLANS];
 };
 
-void nfp_net_meta_init_format(struct nfp_net_hw *hw);
+struct nfp_pf_dev;
+
+void nfp_net_meta_init_format(struct nfp_net_hw *hw,
+		struct nfp_pf_dev *pf_dev);
 void nfp_net_meta_parse(struct nfp_net_rx_desc *rxds,
 		struct nfp_net_rxq *rxq,
 		struct nfp_net_hw *hw,
diff --git a/drivers/net/nfp/nfp_rxtx.c b/drivers/net/nfp/nfp_rxtx.c
index 05218537f7..d101477161 100644
--- a/drivers/net/nfp/nfp_rxtx.c
+++ b/drivers/net/nfp/nfp_rxtx.c
@@ -816,11 +816,11 @@ nfp_net_tx_queue_setup(struct rte_eth_dev *dev,
 		unsigned int socket_id,
 		const struct rte_eth_txconf *tx_conf)
 {
-	struct nfp_net_hw *hw;
+	struct nfp_net_hw_priv *hw_priv;
 
-	hw = nfp_net_get_hw(dev);
+	hw_priv = dev->process_private;
 
-	if (hw->ver.extend == NFP_NET_CFG_VERSION_DP_NFD3)
+	if (hw_priv->pf_dev->ver.extend == NFP_NET_CFG_VERSION_DP_NFD3)
 		return nfp_net_nfd3_tx_queue_setup(dev, queue_idx,
 				nb_desc, socket_id, tx_conf);
 	else
@@ -852,10 +852,10 @@ nfp_net_tx_queue_info_get(struct rte_eth_dev *dev,
 		struct rte_eth_txq_info *info)
 {
 	struct rte_eth_dev_info dev_info;
-	struct nfp_net_hw *hw = nfp_net_get_hw(dev);
+	struct nfp_net_hw_priv *hw_priv = dev->process_private;
 	struct nfp_net_txq *txq = dev->data->tx_queues[queue_id];
 
-	if (hw->ver.extend == NFP_NET_CFG_VERSION_DP_NFD3)
+	if (hw_priv->pf_dev->ver.extend == NFP_NET_CFG_VERSION_DP_NFD3)
 		info->nb_desc = txq->tx_count / NFD3_TX_DESC_PER_PKT;
 	else
 		info->nb_desc = txq->tx_count / NFDK_TX_DESC_PER_SIMPLE_PKT;
-- 
2.39.1



* [PATCH 3/4] net/nfp: support different configuration BAR size
    2024-09-03  1:41  2% ` [PATCH 2/4] net/nfp: refactor the firmware version logic Chaoyong He
@ 2024-09-03  1:41  2% ` Chaoyong He
  1 sibling, 0 replies; 200+ results
From: Chaoyong He @ 2024-09-03  1:41 UTC (permalink / raw)
  To: dev; +Cc: oss-drivers, Chaoyong He, Long Wu

A new NIC is introduced whose configuration BAR size is not 32K.
The size is selected by the class field of the application
firmware's version ABI: 32K for the generic class, 8K otherwise.
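
For readers who want to try the selection logic outside the driver, here is a
minimal standalone sketch. It is not code from this patch. The sketch_* names
and the cut-down structures are hypothetical, the class member is renamed
ver_class for the sketch, and the byte layout of the version word (minor in
the lowest byte, then major, class, extend) is an assumption that matches the
order of struct nfp_net_fw_ver on a little-endian host. Only the macro values
and the class-to-size rule are taken from the diff below.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values copied from the patch. */
#define NFP_NET_CFG_BAR_SZ_32K                (32 * 1024)
#define NFP_NET_CFG_BAR_SZ_8K                 (8 * 1024)
#define NFP_NET_CFG_VERSION_CLASS_GENERIC     0
#define NFP_NET_CFG_VERSION_CLASS_NO_EMEM     1

/* Hypothetical, cut-down stand-in for struct nfp_net_fw_ver. */
struct sketch_fw_ver {
	uint8_t minor;
	uint8_t major;
	uint8_t ver_class; /* named "class" in the driver */
	uint8_t extend;
};

/* Hypothetical, cut-down stand-in for struct nfp_pf_dev. */
struct sketch_pf_dev {
	struct sketch_fw_ver ver;
	uint32_t ctrl_bar_size;
};

/*
 * Split a 32-bit version word into its four bytes.
 * Assumption: minor is the least significant byte.
 */
static struct sketch_fw_ver
sketch_ver_decode(uint32_t word)
{
	struct sketch_fw_ver ver;

	ver.minor = (uint8_t)(word & 0xff);
	ver.major = (uint8_t)((word >> 8) & 0xff);
	ver.ver_class = (uint8_t)((word >> 16) & 0xff);
	ver.extend = (uint8_t)((word >> 24) & 0xff);

	return ver;
}

/* Same rule as nfp_net_ctrl_bar_size_set() in the patch. */
static void
sketch_ctrl_bar_size_set(struct sketch_pf_dev *pf_dev)
{
	assert(pf_dev != NULL);

	if (pf_dev->ver.ver_class == NFP_NET_CFG_VERSION_CLASS_GENERIC)
		pf_dev->ctrl_bar_size = NFP_NET_CFG_BAR_SZ_32K;
	else
		pf_dev->ctrl_bar_size = NFP_NET_CFG_BAR_SZ_8K;
}

/* Byte offset of one port's config area, as computed in nfp_net_init(). */
static size_t
sketch_port_ctrl_offset(const struct sketch_pf_dev *pf_dev, uint8_t port)
{
	return (size_t)port * pf_dev->ctrl_bar_size;
}
```

With the assumed layout, a version word of 0x00010203 decodes to class 1
(NO_EMEM), so the ctrl BAR size becomes 8K and port 2 starts at offset 16384.
This is also why the patch maps the BAR with NFP_NET_CFG_BAR_SZ_MIN first: the
class is only known after the version register has been read.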

Signed-off-by: Chaoyong He <chaoyong.he@corigine.com>
Reviewed-by: Long Wu <long.wu@corigine.com>
---
 drivers/common/nfp/nfp_common_ctrl.h | 10 +++++++++-
 drivers/net/nfp/flower/nfp_flower.c  |  4 ++--
 drivers/net/nfp/nfp_ethdev.c         | 24 ++++++++++++++----------
 drivers/net/nfp/nfp_ethdev_vf.c      |  3 +++
 drivers/net/nfp/nfp_net_common.c     | 25 +++++++++++++++++++++++++
 drivers/net/nfp/nfp_net_common.h     |  3 +++
 drivers/net/nfp/nfp_net_ctrl.c       |  4 +++-
 7 files changed, 59 insertions(+), 14 deletions(-)

diff --git a/drivers/common/nfp/nfp_common_ctrl.h b/drivers/common/nfp/nfp_common_ctrl.h
index 69596dd6f5..711577dc76 100644
--- a/drivers/common/nfp/nfp_common_ctrl.h
+++ b/drivers/common/nfp/nfp_common_ctrl.h
@@ -11,7 +11,9 @@
  *
  * On the NFP6000, due to THB-350, the configuration BAR is 32K in size.
  */
-#define NFP_NET_CFG_BAR_SZ              (32 * 1024)
+#define NFP_NET_CFG_BAR_SZ_32K          (32 * 1024)
+#define NFP_NET_CFG_BAR_SZ_8K           (8 * 1024)
+#define NFP_NET_CFG_BAR_SZ_MIN          NFP_NET_CFG_BAR_SZ_8K
 
 /*
  * Configuration sriov VF.
@@ -121,6 +123,10 @@
 struct nfp_net_fw_ver {
 	uint8_t minor;
 	uint8_t major;
+	/**
+	 * BIT0: class, refer NFP_NET_CFG_VERSION_CLASS_*
+	 * BIT[7:1]: reserved
+	 */
 	uint8_t class;
 	/**
 	 * This byte can be extended for more use.
@@ -147,6 +153,8 @@ struct nfp_net_fw_ver {
 #define NFP_NET_CFG_VERSION             0x0030
 #define   NFP_NET_CFG_VERSION_DP_NFD3   0
 #define   NFP_NET_CFG_VERSION_DP_NFDK   1
+#define   NFP_NET_CFG_VERSION_CLASS_GENERIC    0
+#define   NFP_NET_CFG_VERSION_CLASS_NO_EMEM    1
 #define NFP_NET_CFG_STS                 0x0034
 #define   NFP_NET_CFG_STS_LINK            (0x1 << 0) /* Link up or down */
 /* Link rate */
diff --git a/drivers/net/nfp/flower/nfp_flower.c b/drivers/net/nfp/flower/nfp_flower.c
index 4d91d548f7..c1a3532c11 100644
--- a/drivers/net/nfp/flower/nfp_flower.c
+++ b/drivers/net/nfp/flower/nfp_flower.c
@@ -692,7 +692,7 @@ nfp_init_app_fw_flower(struct nfp_net_hw_priv *hw_priv)
 	/* Map the PF ctrl bar */
 	snprintf(bar_name, sizeof(bar_name), "_pf%u_net_bar0", id);
 	pf_dev->ctrl_bar = nfp_rtsym_map(pf_dev->sym_tbl, bar_name,
-			NFP_NET_CFG_BAR_SZ, &pf_dev->ctrl_area);
+			pf_dev->ctrl_bar_size, &pf_dev->ctrl_area);
 	if (pf_dev->ctrl_bar == NULL) {
 		PMD_INIT_LOG(ERR, "Cloud not map the PF vNIC ctrl bar");
 		ret = -ENODEV;
@@ -737,7 +737,7 @@ nfp_init_app_fw_flower(struct nfp_net_hw_priv *hw_priv)
 	/* Map the ctrl vNIC ctrl bar */
 	snprintf(ctrl_name, sizeof(ctrl_name), "_pf%u_net_ctrl_bar", id);
 	ctrl_hw->super.ctrl_bar = nfp_rtsym_map(pf_dev->sym_tbl, ctrl_name,
-			NFP_NET_CFG_BAR_SZ, &ctrl_hw->ctrl_area);
+			pf_dev->ctrl_bar_size, &ctrl_hw->ctrl_area);
 	if (ctrl_hw->super.ctrl_bar == NULL) {
 		PMD_INIT_LOG(ERR, "Cloud not map the ctrl vNIC ctrl bar");
 		ret = -ENODEV;
diff --git a/drivers/net/nfp/nfp_ethdev.c b/drivers/net/nfp/nfp_ethdev.c
index d85993f70c..a09bbe52ca 100644
--- a/drivers/net/nfp/nfp_ethdev.c
+++ b/drivers/net/nfp/nfp_ethdev.c
@@ -1022,7 +1022,7 @@ nfp_net_init(struct rte_eth_dev *eth_dev,
 	if (pf_dev->multi_pf.enabled)
 		hw->ctrl_bar = pf_dev->ctrl_bar;
 	else
-		hw->ctrl_bar = pf_dev->ctrl_bar + (port * NFP_NET_CFG_BAR_SZ);
+		hw->ctrl_bar = pf_dev->ctrl_bar + (port * pf_dev->ctrl_bar_size);
 
 	net_hw->mac_stats = pf_dev->mac_stats_bar +
 				(net_hw->nfp_idx * NFP_MAC_STATS_SIZE);
@@ -1555,9 +1555,10 @@ nfp_enable_multi_pf(struct nfp_pf_dev *pf_dev)
 	memset(&net_hw, 0, sizeof(struct nfp_net_hw));
 
 	/* Map the symbol table */
+	pf_dev->ctrl_bar_size = NFP_NET_CFG_BAR_SZ_MIN;
 	snprintf(name, sizeof(name), "_pf%u_net_bar0",
 			pf_dev->multi_pf.function_id);
-	ctrl_bar = nfp_rtsym_map(pf_dev->sym_tbl, name, NFP_NET_CFG_BAR_SZ,
+	ctrl_bar = nfp_rtsym_map(pf_dev->sym_tbl, name, pf_dev->ctrl_bar_size,
 			&area);
 	if (ctrl_bar == NULL) {
 		PMD_INIT_LOG(ERR, "Failed to find data vNIC memory symbol");
@@ -1574,6 +1575,9 @@ nfp_enable_multi_pf(struct nfp_pf_dev *pf_dev)
 		goto end;
 	}
 
+	/* Set the ctrl bar size */
+	nfp_net_ctrl_bar_size_set(pf_dev);
+
 	if (!pf_dev->multi_pf.enabled)
 		goto end;
 
@@ -1670,7 +1674,7 @@ nfp_init_app_fw_nic(struct nfp_net_hw_priv *hw_priv)
 	/* Map the symbol table */
 	snprintf(bar_name, sizeof(bar_name), "_pf%u_net_bar0", id);
 	pf_dev->ctrl_bar = nfp_rtsym_map(pf_dev->sym_tbl, bar_name,
-			pf_dev->total_phyports * NFP_NET_CFG_BAR_SZ,
+			pf_dev->total_phyports * pf_dev->ctrl_bar_size,
 			&pf_dev->ctrl_area);
 	if (pf_dev->ctrl_bar == NULL) {
 		PMD_INIT_LOG(ERR, "nfp_rtsym_map fails for %s", bar_name);
@@ -2065,11 +2069,11 @@ nfp_net_vf_config_init(struct nfp_pf_dev *pf_dev)
 	if (pf_dev->sriov_vf == 0)
 		return 0;
 
-	min_size = NFP_NET_CFG_BAR_SZ * pf_dev->sriov_vf;
+	min_size = pf_dev->ctrl_bar_size * pf_dev->sriov_vf;
 	snprintf(vf_bar_name, sizeof(vf_bar_name), "_pf%d_net_vf_bar",
 			pf_dev->multi_pf.function_id);
 	pf_dev->vf_bar = nfp_rtsym_map_offset(pf_dev->sym_tbl, vf_bar_name,
-			NFP_NET_CFG_BAR_SZ * pf_dev->vf_base_id,
+			pf_dev->ctrl_bar_size * pf_dev->vf_base_id,
 			min_size, &pf_dev->vf_area);
 	if (pf_dev->vf_bar == NULL) {
 		PMD_INIT_LOG(ERR, "Failed to get vf cfg.");
@@ -2295,15 +2299,15 @@ nfp_pf_init(struct rte_pci_device *pci_dev)
 		goto hwqueues_cleanup;
 	}
 
+	ret = nfp_enable_multi_pf(pf_dev);
+	if (ret != 0)
+		goto mac_stats_cleanup;
+
 	ret = nfp_net_vf_config_init(pf_dev);
 	if (ret != 0) {
 		PMD_INIT_LOG(ERR, "Failed to init VF config.");
-		goto mac_stats_cleanup;
-	}
-
-	ret = nfp_enable_multi_pf(pf_dev);
-	if (ret != 0)
 		goto vf_cfg_tbl_cleanup;
+	}
 
 	hw_priv->is_pf = true;
 	hw_priv->pf_dev = pf_dev;
diff --git a/drivers/net/nfp/nfp_ethdev_vf.c b/drivers/net/nfp/nfp_ethdev_vf.c
index 2e581c7e45..ab413a2c5a 100644
--- a/drivers/net/nfp/nfp_ethdev_vf.c
+++ b/drivers/net/nfp/nfp_ethdev_vf.c
@@ -295,6 +295,9 @@ nfp_netvf_init(struct rte_eth_dev *eth_dev)
 		goto pf_dev_free;
 	}
 
+	/* Set the ctrl bar size */
+	nfp_net_ctrl_bar_size_set(pf_dev);
+
 	PMD_INIT_LOG(DEBUG, "ctrl bar: %p", hw->ctrl_bar);
 
 	err = nfp_net_common_init(pf_dev, net_hw);
diff --git a/drivers/net/nfp/nfp_net_common.c b/drivers/net/nfp/nfp_net_common.c
index e4e01d8c79..5f92c2c31d 100644
--- a/drivers/net/nfp/nfp_net_common.c
+++ b/drivers/net/nfp/nfp_net_common.c
@@ -2184,6 +2184,9 @@ nfp_net_version_check(struct nfp_hw *hw,
 	if (!nfp_net_is_valid_nfd_version(pf_dev->ver))
 		return false;
 
+	if (!nfp_net_is_valid_version_class(pf_dev->ver))
+		return false;
+
 	return true;
 }
 
@@ -2325,6 +2328,28 @@ nfp_net_is_valid_nfd_version(struct nfp_net_fw_ver version)
 	return false;
 }
 
+bool
+nfp_net_is_valid_version_class(struct nfp_net_fw_ver version)
+{
+	switch (version.class) {
+	case NFP_NET_CFG_VERSION_CLASS_GENERIC:
+		return true;
+	case NFP_NET_CFG_VERSION_CLASS_NO_EMEM:
+		return true;
+	default:
+		return false;
+	}
+}
+
+void
+nfp_net_ctrl_bar_size_set(struct nfp_pf_dev *pf_dev)
+{
+	if (pf_dev->ver.class == NFP_NET_CFG_VERSION_CLASS_GENERIC)
+		pf_dev->ctrl_bar_size = NFP_NET_CFG_BAR_SZ_32K;
+	else
+		pf_dev->ctrl_bar_size = NFP_NET_CFG_BAR_SZ_8K;
+}
+
 /* Disable rx and tx functions to allow for reconfiguring. */
 int
 nfp_net_stop(struct rte_eth_dev *dev)
diff --git a/drivers/net/nfp/nfp_net_common.h b/drivers/net/nfp/nfp_net_common.h
index 8d0922d48c..2c54815fc9 100644
--- a/drivers/net/nfp/nfp_net_common.h
+++ b/drivers/net/nfp/nfp_net_common.h
@@ -117,6 +117,7 @@ struct nfp_pf_dev {
 	struct nfp_eth_table *nfp_eth_table;
 
 	uint8_t *ctrl_bar;
+	uint32_t ctrl_bar_size;
 
 	struct nfp_cpp *cpp;
 	struct nfp_cpp_area *ctrl_area;
@@ -353,6 +354,7 @@ void nfp_net_tx_desc_limits(struct nfp_net_hw_priv *hw_priv,
 int nfp_net_check_dma_mask(struct nfp_pf_dev *pf_dev, char *name);
 int nfp_net_firmware_version_get(struct rte_eth_dev *dev, char *fw_version, size_t fw_size);
 bool nfp_net_is_valid_nfd_version(struct nfp_net_fw_ver version);
+bool nfp_net_is_valid_version_class(struct nfp_net_fw_ver version);
 struct nfp_net_hw *nfp_net_get_hw(const struct rte_eth_dev *dev);
 int nfp_net_stop(struct rte_eth_dev *dev);
 int nfp_net_flow_ctrl_get(struct rte_eth_dev *dev,
@@ -379,6 +381,7 @@ int nfp_net_vf_config_app_init(struct nfp_net_hw *net_hw,
 		struct nfp_pf_dev *pf_dev);
 bool nfp_net_version_check(struct nfp_hw *hw,
 		struct nfp_pf_dev *pf_dev);
+void nfp_net_ctrl_bar_size_set(struct nfp_pf_dev *pf_dev);
 
 #define NFP_PRIV_TO_APP_FW_NIC(app_fw_priv)\
 	((struct nfp_app_fw_nic *)app_fw_priv)
diff --git a/drivers/net/nfp/nfp_net_ctrl.c b/drivers/net/nfp/nfp_net_ctrl.c
index ea14b98924..b34d8f140f 100644
--- a/drivers/net/nfp/nfp_net_ctrl.c
+++ b/drivers/net/nfp/nfp_net_ctrl.c
@@ -30,13 +30,15 @@ nfp_net_tlv_caps_parse(struct rte_eth_dev *dev)
 	uint32_t tlv_type;
 	struct nfp_net_hw *net_hw;
 	struct nfp_net_tlv_caps *caps;
+	struct nfp_net_hw_priv *hw_priv;
 
 	net_hw = dev->data->dev_private;
+	hw_priv = dev->process_private;
 	caps = &net_hw->tlv_caps;
 	nfp_net_tlv_caps_reset(caps);
 
 	data = net_hw->super.ctrl_bar + NFP_NET_CFG_TLV_BASE;
-	end = net_hw->super.ctrl_bar + NFP_NET_CFG_BAR_SZ;
+	end = net_hw->super.ctrl_bar + hw_priv->pf_dev->ctrl_bar_size;
 
 	hdr = rte_read32(data);
 	if (hdr == 0) {
-- 
2.39.1



* [RFC PATCH] devtools/test-meson-builds: use cross file for 32bit build
@ 2024-09-04 14:03 13% Bruce Richardson
  0 siblings, 0 replies; 200+ results
From: Bruce Richardson @ 2024-09-04 14:03 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson

When testing the 32-bit x86 build on Debian or Ubuntu Linux systems, use
the cross-file rather than using args and pkgconfig environment
variable. The advantage of using the cross-file is that the paths are
saved across runs. While the '-m32' args settings are preserved in the
current setup, the PKG_CONFIG_LIBDIR value from environment is not,
which can cause rebuilds of the build-32b directory to fail if meson
needs to do a reconfiguration first.
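
To illustrate the point about saved paths, a Meson cross file can carry both
the '-m32' flags and the pkg-config search path, so a later reconfiguration
sees the same values without any environment variable. The fragment below is
a hypothetical sketch only. It is not the contents of
config/x86/cross-debian-32bit, and the compiler names and cpu value are
assumptions.

```ini
# Hypothetical 32-bit x86 cross file (illustration only).
[binaries]
c = 'cc'
cpp = 'c++'
ar = 'ar'
strip = 'strip'

[built-in options]
# Same flags the script passes with -Dc_args and friends.
c_args = ['-m32']
c_link_args = ['-m32']
cpp_args = ['-m32']
cpp_link_args = ['-m32']

[properties]
# Replaces the exported PKG_CONFIG_LIBDIR environment variable.
pkg_config_libdir = ['/usr/lib/i386-linux-gnu/pkgconfig']

[host_machine]
system = 'linux'
cpu_family = 'x86'
cpu = 'i686'
endian = 'little'
```

Because Meson records the cross file in the build directory, rebuilding
build-32b after the script has unset its environment should still find the
32-bit .pc files.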

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 devtools/test-meson-builds.sh | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/devtools/test-meson-builds.sh b/devtools/test-meson-builds.sh
index d71bb1ded0..1d9d04ce7c 100755
--- a/devtools/test-meson-builds.sh
+++ b/devtools/test-meson-builds.sh
@@ -253,21 +253,24 @@ build build-x86-generic cc skipABI -Dcheck_includes=true \
 
 # 32-bit with default compiler
 if check_cc_flags '-m32' ; then
+	target_override='i386-pc-linux-gnu'
 	if [ -d '/usr/lib/i386-linux-gnu' ] ; then
-		# 32-bit pkgconfig on Debian/Ubuntu
-		export PKG_CONFIG_LIBDIR='/usr/lib/i386-linux-gnu/pkgconfig'
+		# 32-bit pkgconfig on Debian/Ubuntu, use cross file
+		build build-32b $srcdir/config/x86/cross-debian-32bit ABI
 	elif [ -d '/usr/lib32' ] ; then
 		# 32-bit pkgconfig on Arch
 		export PKG_CONFIG_LIBDIR='/usr/lib32/pkgconfig'
+		build build-32b cc ABI -Dc_args='-m32' -Dc_link_args='-m32' \
+				-Dcpp_args='-m32' -Dcpp_link_args='-m32'
+		unset PKG_CONFIG_LIBDIR
 	else
 		# 32-bit pkgconfig on RHEL/Fedora (lib vs lib64)
 		export PKG_CONFIG_LIBDIR='/usr/lib/pkgconfig'
+		build build-32b cc ABI -Dc_args='-m32' -Dc_link_args='-m32' \
+				-Dcpp_args='-m32' -Dcpp_link_args='-m32'
+		unset PKG_CONFIG_LIBDIR
 	fi
-	target_override='i386-pc-linux-gnu'
-	build build-32b cc ABI -Dc_args='-m32' -Dc_link_args='-m32' \
-			-Dcpp_args='-m32' -Dcpp_link_args='-m32'
 	target_override=
-	unset PKG_CONFIG_LIBDIR
 fi
 
 # x86 MinGW
-- 
2.43.0



-- links below jump to the message on this page --
2021-04-12 21:53     [dpdk-dev] [PATCH] devtools: test different build types Thomas Monjalon
2021-08-08 12:51     ` [dpdk-dev] [PATCH v3 0/5] more build tests Thomas Monjalon
2021-08-08 12:51       ` [dpdk-dev] [PATCH v3 5/5] devtools: test different build types Thomas Monjalon
2024-08-15 16:26  0%     ` Stephen Hemminger
2023-07-03  9:31     [PATCH v4] bitmap: add scan from offset function Volodymyr Fialko
2023-07-03 12:39     ` [PATCH v5] " Volodymyr Fialko
2024-07-03 12:50       ` Thomas Monjalon
2024-07-03 13:42  0%     ` Volodymyr Fialko
2023-10-20 16:51     [PATCH v2 0/4] hash: add SVE support for bulk key lookup Yoan Picchi
2024-04-30 16:27     ` [PATCH v9 " Yoan Picchi
2024-04-30 16:27       ` [PATCH v9 4/4] " Yoan Picchi
2024-06-14 13:42  4%     ` David Marchand
2024-07-03 17:13     ` [PATCH v10 0/4] " Yoan Picchi
2024-07-03 17:13       ` [PATCH v10 1/4] hash: pack the hitmask for hash in bulk lookup Yoan Picchi
2024-07-04 20:31  3%     ` David Marchand
2024-07-05 17:45  3% ` [PATCH v11 0/7] hash: add SVE support for bulk key lookup Yoan Picchi
2024-07-05 17:45  3%   ` [PATCH v11 1/7] hash: make compare signature function enum private Yoan Picchi
2024-07-08 12:14  3% ` [PATCH v12 0/7] hash: add SVE support for bulk key lookup Yoan Picchi
2024-07-08 12:14  3%   ` [PATCH v12 1/7] hash: make compare signature function enum private Yoan Picchi
2024-07-09  4:48  0%   ` [PATCH v12 0/7] hash: add SVE support for bulk key lookup David Marchand
2023-11-15 13:36     [PATCH v1 0/2] dts: api docs generation Juraj Linkeš
2024-04-12 10:14     ` [PATCH v4 0/3] dts: API " Juraj Linkeš
2024-04-12 10:14  2%   ` [PATCH v4 3/3] dts: add API doc generation Juraj Linkeš
2024-06-24 13:26     ` [PATCH v5 0/4] dts: API docs generation Juraj Linkeš
2024-06-24 13:26  2%   ` [PATCH v5 4/4] dts: add API doc generation Juraj Linkeš
2024-06-24 13:45     ` [PATCH v6 0/4] dts: API docs generation Juraj Linkeš
2024-06-24 13:46  2%   ` [PATCH v6 4/4] dts: add API doc generation Juraj Linkeš
2024-06-24 14:08  0%     ` Juraj Linkeš
2024-06-24 14:25     ` [PATCH v7 0/4] dts: API docs generation Juraj Linkeš
2024-06-24 14:25  2%   ` [PATCH v7 4/4] dts: add API doc generation Juraj Linkeš
2024-07-12  8:57     ` [PATCH v8 0/5] dts: API docs generation Juraj Linkeš
2024-07-12  8:57  3%   ` [PATCH v8 5/5] dts: add API doc generation Juraj Linkeš
2024-07-30 13:51  0%     ` Thomas Monjalon
2024-08-01 13:03  0%       ` Juraj Linkeš
2024-08-01  9:18     ` [PATCH v9 0/5] dts: API docs generation Juraj Linkeš
2024-08-01  9:18  3%   ` [PATCH v9 5/5] dts: add API doc generation Juraj Linkeš
2024-08-01  9:37     ` [PATCH v10 0/5] dts: API docs generation Juraj Linkeš
2024-08-01  9:37  3%   ` [PATCH v10 5/5] dts: add API doc generation Juraj Linkeš
2024-08-05 13:59     ` [PATCH v11 0/5] dts: API docs generation Juraj Linkeš
2024-08-05 13:59  2%   ` [PATCH v11 5/5] dts: add API doc generation Juraj Linkeš
2024-08-06  6:13     ` [PATCH v12 0/5] dts: API docs generation Juraj Linkeš
2024-08-06  6:14  2%   ` [PATCH v12 5/5] dts: add API doc generation Juraj Linkeš
2024-08-06  8:46     ` [PATCH v13 0/6] API docs generation Juraj Linkeš
2024-08-06  8:46  2%   ` [PATCH v13 6/6] dts: add API doc generation Juraj Linkeš
2024-08-06 11:17     ` [PATCH v14 0/6] API docs generation Juraj Linkeš
2024-08-06 11:17  2%   ` [PATCH v14 6/6] dts: add API doc generation Juraj Linkeš
2024-08-06 15:19     ` [PATCH v15 0/5] API docs generation Juraj Linkeš
2024-08-06 15:19  2%   ` [PATCH v15 5/5] dts: add API doc generation Juraj Linkeš
2024-08-08  8:54     ` [PATCH v16 0/5] API docs generation Juraj Linkeš
2024-08-08  8:54  2%   ` [PATCH v16 5/5] dts: add API doc generation Juraj Linkeš
2024-08-14 15:05     ` [PATCH v17 0/5] API docs generation Juraj Linkeš
2024-08-14 15:05  2%   ` [PATCH v17 5/5] dts: add API doc generation Juraj Linkeš
2024-08-20 13:18     ` [PATCH v18 0/5] API docs generation Juraj Linkeš
2024-08-20 13:18  2%   ` [PATCH v18 5/5] dts: add API doc generation Juraj Linkeš
2024-08-21 15:02     ` [PATCH v19 0/5] DTS API docs generation Juraj Linkeš
2024-08-21 15:02  2%   ` [PATCH v19 5/5] dts: add API doc generation Juraj Linkeš
2023-12-14  1:56     [PATCH] ethdev: add dump regs for telemetry Jie Hai
2024-07-22  6:58     ` [PATCH v6 0/8] support dump reigser names and filter Jie Hai
2024-07-22  6:58  8%   ` [PATCH v6 1/8] ethdev: support report register " Jie Hai
2024-01-30  3:46     [RFC 0/2] net/tap RSS BPF rewrite Stephen Hemminger
2024-04-05 21:14     ` [PATCH v6 0/8] net/tap: cleanup and fix BPF flow support Stephen Hemminger
2024-04-05 21:14  2%   ` [PATCH v6 6/8] net/tap: rewrite the RSS BPF program Stephen Hemminger
2024-04-08 21:18     ` [PATCH v7 0/8] net/tap: cleanups and fix BPF support Stephen Hemminger
2024-04-08 21:18  2%   ` [PATCH v7 5/8] net/tap: rewrite the RSS BPF program Stephen Hemminger
2024-04-09  3:40     ` [PATCH v8 0/8] net/tap: cleanups and fix BPF support Stephen Hemminger
2024-04-09  3:40  2%   ` [PATCH v8 5/8] net/tap: rewrite the RSS BPF program Stephen Hemminger
2024-04-26 15:48     ` [PATCH v9 0/9] net/tap: fix RSS (BPF) support Stephen Hemminger
2024-04-26 15:48  2%   ` [PATCH v9 5/9] net/tap: rewrite the RSS BPF program Stephen Hemminger
2024-05-01 16:11     ` [PATCH v10 0/9] net/tap: fix RSS (BPF) flow support Stephen Hemminger
2024-05-01 16:12  2%   ` [PATCH v10 5/9] net/tap: rewrite the RSS BPF program Stephen Hemminger
2024-05-02  2:49     ` [PATCH v11 0/9] net/tap fix RSS (BPF) flow support Stephen Hemminger
2024-05-02  2:49  2%   ` [PATCH v11 5/9] net/tap: rewrite the RSS BPF program Stephen Hemminger
2024-05-02 21:31     ` [PATCH v12 00/12] net/tap: RSS and other fixes Stephen Hemminger
2024-05-02 21:31       ` [PATCH v12 02/12] net/tap: do not duplicate fd's Stephen Hemminger
2024-05-20 17:46         ` Ferruh Yigit
2024-05-20 18:16  3%       ` Stephen Hemminger
2024-05-02 21:31  2%   ` [PATCH v12 06/12] net/tap: rewrite the RSS BPF program Stephen Hemminger
2024-05-02 21:31       ` [PATCH v12 07/12] net/tap: use libbpf to load new " Stephen Hemminger
2024-05-20 17:49         ` Ferruh Yigit
2024-05-20 18:18           ` Stephen Hemminger
2024-05-20 21:42  3%         ` Luca Boccassi
2024-05-20 22:08  0%           ` Ferruh Yigit
2024-05-20 22:25  0%             ` Luca Boccassi
2024-05-20 23:20  0%             ` Stephen Hemminger
2024-05-21  2:47     ` [PATCH v13 00/11] net/tap: make RSS work again Stephen Hemminger
2024-05-21  2:47  2%   ` [PATCH v13 02/11] net/tap: do not duplicate fd's Stephen Hemminger
2024-05-21  2:47  2%   ` [PATCH v13 06/11] net/tap: rewrite the RSS BPF program Stephen Hemminger
2024-05-21 17:06     ` [PATCH v14 00/11] net/tap: make RSS work again Stephen Hemminger
2024-05-21 17:06  2%   ` [PATCH v14 02/11] net/tap: do not duplicate fd's Stephen Hemminger
2024-05-21 17:06  2%   ` [PATCH v14 06/11] net/tap: rewrite the RSS BPF program Stephen Hemminger
2024-05-21 20:12     ` [PATCH v15 00/11] net/tap: make RSS work again Stephen Hemminger
2024-05-21 20:12  2%   ` [PATCH v15 02/11] net/tap: do not duplicate fd's Stephen Hemminger
2024-05-21 20:12  2%   ` [PATCH v15 06/11] net/tap: rewrite the RSS BPF program Stephen Hemminger
2024-01-30 23:26     [PATCH] replace GCC marker extension with C11 anonymous unions Tyler Retzlaff
2024-04-03 17:53  3% ` [PATCH v10 0/4] remove use of RTE_MARKER fields in libraries Tyler Retzlaff
2024-04-03 17:53  2%   ` [PATCH v10 2/4] mbuf: remove rte marker fields Tyler Retzlaff
2024-04-03 19:32  0%     ` Morten Brørup
2024-04-03 22:45  0%       ` Tyler Retzlaff
2024-04-03 21:49  0%     ` Stephen Hemminger
2024-04-04 17:51  3% ` [PATCH v11 0/4] remove use of RTE_MARKER fields in libraries Tyler Retzlaff
2024-04-04 17:51  2%   ` [PATCH v11 2/4] mbuf: remove rte marker fields Tyler Retzlaff
2024-06-19 15:01  3% ` [PATCH v12 0/4] remove use of RTE_MARKER fields in libraries David Marchand
2024-06-19 15:01  6%   ` [PATCH v12 2/4] mbuf: remove marker fields David Marchand
2024-03-20 10:55     [PATCH 0/2] introduce PM QoS interface Huisong Li
2024-06-13 11:20  4% ` [PATCH v2 0/2] power: " Huisong Li
2024-06-13 11:20  5%   ` [PATCH v2 1/2] power: introduce PM QoS API on CPU wide Huisong Li
2024-06-19  6:31  4% ` [PATCH v3 0/2] power: introduce PM QoS interface Huisong Li
2024-06-19  6:31  5%   ` [PATCH v3 1/2] power: introduce PM QoS API on CPU wide Huisong Li
2024-06-19 15:32  0%     ` Thomas Monjalon
2024-06-20  2:32  0%       ` lihuisong (C)
2024-06-19  6:59  0%   ` [PATCH v3 0/2] power: introduce PM QoS interface Morten Brørup
2024-06-27  6:00  4% ` [PATCH v4 " Huisong Li
2024-06-27  6:00  5%   ` [PATCH v4 1/2] power: introduce PM QoS API on CPU wide Huisong Li
2024-07-02  3:50  4% ` [PATCH v5 0/2] power: introduce PM QoS interface Huisong Li
2024-07-02  3:50  5%   ` [PATCH v5 1/2] power: introduce PM QoS API on CPU wide Huisong Li
2024-07-09  2:29  4% ` [PATCH v6 0/2] power: introduce PM QoS interface Huisong Li
2024-07-09  2:29  5%   ` [PATCH v6 1/2] power: introduce PM QoS API on CPU wide Huisong Li
2024-07-09  6:31  4% ` [PATCH v7 0/2] power: introduce PM QoS interface Huisong Li
2024-07-09  6:31  5%   ` [PATCH v7 1/2] power: introduce PM QoS API on CPU wide Huisong Li
2024-07-09  7:25  4% ` [PATCH v8 0/2] power: introduce PM QoS interface Huisong Li
2024-07-09  7:25  5%   ` [PATCH v8 1/2] power: introduce PM QoS API on CPU wide Huisong Li
2024-08-09  9:50  4% ` [PATCH v9 0/2] power: introduce PM QoS interface Huisong Li
2024-08-09  9:50  5%   ` [PATCH v9 1/2] power: introduce PM QoS API on CPU wide Huisong Li
2024-03-25 10:05     [PATCH v3] graph: expose node context as pointers Robin Jarry
2024-03-27  9:14     ` [PATCH v5] " Robin Jarry
2024-05-29 17:54  0%   ` Nithin Dabilpuram
2024-06-18 12:33  4%   ` David Marchand
2024-06-25 15:22  0%     ` Robin Jarry
2024-06-26 11:30  0%       ` Jerin Jacob
2024-03-26 23:59     [PATCH] igc/ixgbe: add get/set link settings interface Marek Pazdan
2024-04-03 13:40     ` [PATCH] lib: " Marek Pazdan
2024-04-03 16:49       ` Tyler Retzlaff
2024-04-04  7:09  4%     ` David Marchand
2024-04-05  0:55  0%       ` Tyler Retzlaff
2024-04-05  0:56  0%         ` Tyler Retzlaff
2024-04-05  8:58  0%         ` David Marchand
2024-04-04 16:29  3% Community CI Meeting Minutes - April 4, 2024 Patrick Robb
2024-04-04 21:04     [PATCH v1 0/3] Additional queue stats Nicolas Chautru
2024-04-04 21:04     ` [PATCH v1 1/3] bbdev: new queue stat for available enqueue depth Nicolas Chautru
2024-04-05  0:46  3%   ` Stephen Hemminger
2024-04-05 15:15  3%   ` Stephen Hemminger
2024-04-05 18:17  3%     ` Chautru, Nicolas
2024-08-12  9:28  3%   ` Maxime Coquelin
2024-08-12  9:56  0%     ` Maxime Coquelin
2024-08-12 17:27  0%     ` Chautru, Nicolas
2024-08-12 19:44  0%       ` Maxime Coquelin
2024-04-10  9:33     Strict aliasing problem with rte_eth_linkstatus_set() fengchengwen
2024-04-10 15:27     ` Stephen Hemminger
2024-04-10 15:58  3%   ` Ferruh Yigit
2024-04-10 17:54       ` Morten Brørup
2024-04-10 19:58  3%     ` Tyler Retzlaff
2024-04-11  3:20  0%       ` fengchengwen
2024-04-11  8:22  3% [PATCH 0/3] cryptodev: add API to get used queue pair depth Akhil Goyal
2024-04-12 11:57  3% ` [PATCH v2 " Akhil Goyal
2024-05-29 10:43  0%   ` Anoob Joseph
2024-05-30  9:19  0%     ` Akhil Goyal
2024-04-17  7:23     [RFC 0/2] ethdev: update GENEVE option item structure Michael Baum
2024-06-11 17:07  4% ` Ferruh Yigit
2024-04-18 17:49  3% Community CI Meeting Minutes - April 18, 2024 Patrick Robb
     [not found]     <20220825024425.10534-1-lihuisong@huawei.com>
2024-01-30  6:36     ` [PATCH v7 0/5] app/testpmd: support multiple process attach and detach port Huisong Li
2024-04-23 11:17  0%   ` lihuisong (C)
2024-04-24  4:08  3% getting rid of type argument to rte_malloc() Stephen Hemminger
2024-04-24 10:29  0% ` Ferruh Yigit
2024-04-24 16:23  0%   ` Stephen Hemminger
2024-04-24 16:23  0%     ` Stephen Hemminger
2024-04-24 17:09  0%     ` Morten Brørup
2024-04-24 19:05  0%       ` Stephen Hemminger
2024-04-24 19:06  0%   ` Stephen Hemminger
2024-04-24 15:24  3% Minutes of DPDK Technical Board Meeting, 2024-04-03 Thomas Monjalon
2024-04-24 17:25  3% ` Morten Brørup
2024-04-24 19:10  0%   ` Thomas Monjalon
2024-04-25 17:46     [RFC] net/af_packet: make stats reset reliable Ferruh Yigit
2024-04-26 14:38     ` [RFC v2] " Ferruh Yigit
2024-04-28 15:11       ` Mattias Rönnblom
2024-05-07  7:23         ` Mattias Rönnblom
2024-05-07 13:49           ` Ferruh Yigit
2024-05-07 14:51             ` Stephen Hemminger
2024-05-07 16:00  3%           ` Morten Brørup
2024-05-07 16:54  0%             ` Ferruh Yigit
2024-05-07 18:47  0%               ` Stephen Hemminger
2024-05-08  7:48  0%             ` Mattias Rönnblom
2024-04-30 15:39     [PATCH] net/af_packet: fix statistics Stephen Hemminger
2024-05-01 16:25     ` Ferruh Yigit
2024-05-01 16:44  3%   ` Stephen Hemminger
2024-05-01 18:18  0%     ` Morten Brørup
2024-05-02 13:47  0%       ` Ferruh Yigit
2024-05-02 13:55     [PATCH] freebsd: Add support for multiple dpdk instances on FreeBSD Tom Jones
2024-05-03  9:46     ` Tom Jones
2024-05-03 13:03       ` Bruce Richardson
2024-05-03 13:12         ` Tom Jones
2024-05-03 13:24  3%       ` Bruce Richardson
2024-05-12  5:55     [PATCH] bpf: don't verify classic bpfs Yoav Winstein
2024-05-12 16:03     ` Stephen Hemminger
2024-05-16  9:36       ` Konstantin Ananyev
2024-06-27 15:36  3%     ` Thomas Monjalon
2024-06-27 18:14  0%       ` Konstantin Ananyev
2024-05-13 15:59     [PATCH 0/9] reowrd in prog guide Nandini Persad
2024-05-13 15:59  6% ` [PATCH 1/9] doc: reword design section in contributors guidelines Nandini Persad
2024-06-21  2:32     ` [PATCH v2 1/9] doc: reword pmd section in prog guide Nandini Persad
2024-06-21  2:32  6%   ` [PATCH v2 3/9] doc: reword design section in contributors guidelines Nandini Persad
2024-05-29 23:33     [PATCH v10 00/20] Remove use of noninclusive term sanity Stephen Hemminger
2024-05-29 23:33  2% ` [PATCH v10 01/20] mbuf: replace term sanity check Stephen Hemminger
2024-08-01 15:46     ` [PATCH v11 0/7] Remove usage of wording " Stephen Hemminger
2024-08-01 15:46  2%   ` [PATCH v11 1/7] mbuf: replace term " Stephen Hemminger
2024-05-30 12:44     [PATCH v4 1/2] eventdev/dma: reorganize event DMA ops pbhagavatula
2024-06-07 10:36  3% ` [PATCH v5 " pbhagavatula
2024-06-08  6:16  9%   ` Jerin Jacob
2024-06-04  4:44     [PATCH 1/2] config/arm: adds Arm Neoverse N3 SoC Wathsala Vithanage
2024-06-04  4:44     ` [PATCH 2/2] eal: add Arm WFET in power management intrinsics Wathsala Vithanage
2024-06-04 15:41  3%   ` Stephen Hemminger
2024-06-06 13:32     [PATCH 0/1] net/ena: devargs api change shaibran
2024-06-06 13:33  3% ` [PATCH 1/1] net/ena: restructure the llq policy user setting shaibran
2024-07-05 17:32  4%   ` Ferruh Yigit
2024-07-06  4:59  4%     ` Brandes, Shai
2024-06-07 13:25     [PATCH v7 2/3] ethdev: add VXLAN last reserved field Thomas Monjalon
2024-06-07 14:02     ` [PATCH v8 0/3] support VXLAN last reserved byte modification Rongwei Liu
2024-06-07 14:02       ` [PATCH v8 2/3] ethdev: add VXLAN last reserved field Rongwei Liu
2024-06-11 14:52  3%     ` Ferruh Yigit
2024-06-12  1:25  0%       ` rongwei liu
2024-06-25 14:46  0%         ` Thomas Monjalon
2024-06-20 17:57     [PATCH v4 01/13] net/i40e: add missing vector API header include Mattias Rönnblom
2024-07-24  7:53     ` [PATCH v5 0/6] Optionally have rte_memcpy delegate to compiler memcpy Mattias Rönnblom
2024-07-24  7:53  5%   ` [PATCH v5 5/6] ci: test " Mattias Rönnblom
2024-06-21 18:33  3% [DPDK/core Bug 1471] rte_pktmbuf_free_bulk does not respect RTE_LIBRTE_MBUF_DEBUG bugzilla
2024-06-24 11:04     [PATCH v2] bus/vmbus: add device_order field to rte_vmbus_dev Vladimir Ratnikov
2024-06-24 15:13  3% ` Stephen Hemminger
2024-06-25 12:01  3%   ` David Marchand
2024-06-27 20:52  2% Community CI Meeting Minutes - June 27, 2024 Patrick Robb
2024-07-05 14:52  4% [PATCH v6] graph: expose node context as pointers Robin Jarry
2024-07-12 11:39  0% ` [EXTERNAL] " Kiran Kumar Kokkilagadda
2024-07-07  9:57  4% [PATCH] net/mlx5: fix compilation warning in GCC-9.1 Gregory Etelson
2024-07-18  7:24  4% ` Raslan Darawsheh
2024-07-08 20:35     [PATCH v3] ethdev: Add link_speed lanes support Damodharam Ammepalli
2024-07-08 23:22     ` [PATCH v4] " Damodharam Ammepalli
2024-07-09 11:10  4%   ` Ferruh Yigit
2024-07-09 21:20  0%     ` Damodharam Ammepalli
2024-07-12 18:25     release candidate 24.07-rc2 David Marchand
2024-07-23  2:14  4% ` Xu, HailinX
2024-07-15 22:11     [RFC v2] ethdev: an API for cache stashing hints Wathsala Vithanage
2024-07-17  2:27  3% ` Stephen Hemminger
2024-07-18 18:48  0%   ` Wathsala Wathawana Vithanage
2024-07-20  3:05  3%   ` Honnappa Nagarahalli
2024-07-18  3:42 10% [DPDK/eventdev Bug 1497] [dpdk-24.07] [ABI][meson test] driver-tests/event_dma_adapter_autotest test hang when do ABI testing bugzilla
2024-07-18 15:03     IPv6 APIs rework Robin Jarry
2024-07-18 20:27     ` Morten Brørup
2024-07-18 21:25       ` Vladimir Medvedkin
2024-07-18 21:34  3%     ` Robin Jarry
2024-07-19  8:25  0%       ` Konstantin Ananyev
2024-07-19  9:12  0%       ` Morten Brørup
2024-07-19 10:41  0%         ` Medvedkin, Vladimir
2024-07-20 16:50     [PATCH v1 0/4] power: refactor power management library Sivaprasad Tummala
2024-08-26 13:06     ` [PATCH v2 " Sivaprasad Tummala
2024-08-26 13:06       ` [PATCH v2 2/4] power: refactor uncore " Sivaprasad Tummala
2024-08-27 13:02  3%     ` lihuisong (C)
2024-07-22 14:53  8% [PATCH] doc: announce cryptodev change to support EDDSA Gowrishankar Muthukrishnan
2024-07-24  5:07  0% ` Anoob Joseph
2024-07-24  6:46  0% ` [EXTERNAL] " Akhil Goyal
2024-07-25 15:01  0% ` Kusztal, ArkadiuszX
2024-07-31 12:57  3%   ` Thomas Monjalon
2024-08-07 17:21  0%     ` [EXTERNAL] " Gowrishankar Muthukrishnan
2024-07-22 14:55  5% [PATCH] doc: announce cryptodev changes to offload RSA in VirtIO Gowrishankar Muthukrishnan
2024-07-24  6:49  0% ` [EXTERNAL] " Akhil Goyal
2024-07-25  9:48  0% ` Kusztal, ArkadiuszX
2024-07-25 15:53  0%   ` Gowrishankar Muthukrishnan
2024-07-30 14:39  0%     ` Gowrishankar Muthukrishnan
2024-07-31 12:51  0%       ` Thomas Monjalon
2024-07-31 14:26  0%         ` Thomas Monjalon
2024-08-07 13:31  0%           ` Kusztal, ArkadiuszX
2024-07-25 16:00  0%   ` Gowrishankar Muthukrishnan
2024-07-22 14:56  8% [PATCH] doc: announce vhost changes to support asymmetric operation Gowrishankar Muthukrishnan
2024-07-23 18:30  4% ` Jerin Jacob
2024-07-25  9:29  4%   ` [EXTERNAL] " Gowrishankar Muthukrishnan
2024-07-29 12:49     [PATCH] doc: announce dmadev new capability addition Vamsi Attunuru
2024-07-29 15:20  3% ` Jerin Jacob
2024-07-29 17:17  0%   ` Morten Brørup
2024-07-31 10:24  0%   ` Thomas Monjalon
2024-07-30 12:53     [PATCH v3] doc: announce changes to dma device structures Amit Prakash Shukla
2024-07-30 17:27     ` [PATCH v4] " Jerin Jacob
2024-07-31 11:01       ` Thomas Monjalon
2024-07-31 16:06  3%     ` Thomas Monjalon
2024-07-31 19:22  3% DPDK 24.07 released Thomas Monjalon
2024-08-08  8:03 12% [PATCH] version: 24.11-rc0 David Marchand
2024-08-08 12:00  0% ` Thomas Monjalon
2024-08-08 12:55  3%   ` David Marchand
2024-08-08 21:45  3%     ` [OS-Team] [dpdklab] " Patrick Robb
2024-08-12 13:29     [RFC PATCH] config: make queues per port a meson config option Bruce Richardson
2024-08-13 15:59     ` [RFC PATCH v2 00/26] add meson config options for queues per port Bruce Richardson
2024-08-13 16:00  5%   ` [RFC PATCH v2 26/26] config: add computed max queues define for compatibility Bruce Richardson
2024-08-14 10:49     ` [PATCH v3 00/26] add meson config options for queues per port Bruce Richardson
2024-08-14 10:49  5%   ` [PATCH v3 26/26] config: add computed max queues define for compatibility Bruce Richardson
2024-08-12 23:41  3% [PATCH v2 0/3] bbdev: sdditional queue stats Nicolas Chautru
2024-08-12 23:41  6% ` [PATCH v2 1/3] bbdev: new queue stat for available enqueue depth Nicolas Chautru
2024-08-13  7:22  0% ` [PATCH v2 0/3] bbdev: sdditional queue stats Hemant Agrawal
2024-08-15  8:53     [RFC 0/6] Stage-Ordered API and other extensions for ring library Konstantin Ananyev
2024-08-15  8:53  2% ` [RFC 6/6] ring: minimize reads of the counterpart cache-line Konstantin Ananyev
2024-08-22 10:36     [PATCH] app/testpmd: show output of commands read from file Bruce Richardson
2024-08-22 10:41     ` [PATCH v2] " Bruce Richardson
2024-08-22 16:53       ` Ferruh Yigit
2024-08-22 17:14  3%     ` Bruce Richardson
2024-08-22 17:18  0%       ` Bruce Richardson
2024-08-22 21:09  3%         ` Ferruh Yigit
2024-08-23  9:12  0%           ` Bruce Richardson
2024-08-27 15:10     [RFC 0/2] introduce LLC aware functions Vipin Varghese
2024-08-27 15:10     ` [RFC 1/2] eal: add llc " Vipin Varghese
2024-08-27 20:56       ` Wathsala Wathawana Vithanage
2024-09-02  1:20  4%     ` Varghese, Vipin
2024-08-27 23:03  3% [PATCH v2 0/1] bbdev: removing unnecessaray symbols Nicolas Chautru
2024-08-27 23:03  3% ` [PATCH v2 1/1] bbdev: removing unnecessaray symbols from version map Nicolas Chautru
2024-08-27 23:06  3% [PATCH v3 0/1] bbdev: removing unnecessaray symbols Nicolas Chautru
2024-08-27 23:06  3% ` [PATCH v3 1/1] bbdev: removing unnecessary symbols from version map Nicolas Chautru
2024-09-03  1:41     [PATCH 0/4] Support new card using NFP 3800 chip Chaoyong He
2024-09-03  1:41  2% ` [PATCH 2/4] net/nfp: refactor the firmware version logic Chaoyong He
2024-09-03  1:41  2% ` [PATCH 3/4] net/nfp: support different configuration BAR size Chaoyong He
2024-09-04 14:03 13% [RFC PATCH] devtools/test-meson-builds: use cross file for 32bit build Bruce Richardson
