DPDK patches and discussions
* [PATCH 0/4] support protocol based buffer split
@ 2022-08-12 18:15 Yuan Wang
  2022-08-12 18:15 ` [PATCH 1/4] ethdev: introduce protocol header API Yuan Wang
                   ` (15 more replies)
  0 siblings, 16 replies; 72+ messages in thread
From: Yuan Wang @ 2022-08-12 18:15 UTC (permalink / raw)
  To: thomas, andrew.rybchenko, xiaoyun.li, ferruh.yigit,
	aman.deep.singh, yuying.zhang, qi.z.zhang, jerinjacobk,
	viacheslavo, mdr
  Cc: stephen, xuan.ding, wenxuanx.wu, dev, Yuan Wang

Protocol type based buffer split consists of splitting a received packet
into several separate segments based on its protocol headers. It is
useful in some scenarios, such as GPU acceleration. The splitting helps
to enable true zero copy and hence improves performance significantly.

This patchset aims to support protocol header based split on top of the
current buffer split. When an Rx queue is configured with the
RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT offload and a corresponding protocol,
received packets will be split directly into different mempools.
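
As a rough usage sketch (illustrative only: port_id, the descriptor
count and the hdr_pool/pay_pool mempools are placeholders assumed to be
set up beforehand with rte_pktmbuf_pool_create()), an application could
request a header/payload split after the UDP header like this:

/* Sketch only; needs <rte_ethdev.h>. */
union rte_eth_rxseg rx_useg[2];
struct rte_eth_rxconf rxconf;
int ret;

memset(rx_useg, 0, sizeof(rx_useg));
memset(&rxconf, 0, sizeof(rxconf));

rx_useg[0].split.mp = hdr_pool;                 /* mempool for the header segment */
rx_useg[0].split.proto_hdr = RTE_PTYPE_L4_UDP;  /* split right after the UDP header */
rx_useg[1].split.mp = pay_pool;                 /* mempool for the remaining payload */

rxconf.offloads = RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT;
rxconf.rx_seg = rx_useg;
rxconf.rx_nseg = 2;

/* mb_pool is NULL because the segment descriptions carry the mempools. */
ret = rte_eth_rx_queue_setup(port_id, 0, 1024, rte_socket_id(), &rxconf, NULL);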

Yuan Wang (4):
  ethdev: introduce protocol header API
  ethdev: introduce protocol hdr based buffer split
  app/testpmd: add rxhdrs commands and parameters
  net/ice: support buffer split in Rx path

 app/test-pmd/cmdline.c                 | 123 +++++++++++++-
 app/test-pmd/config.c                  |  70 ++++++++
 app/test-pmd/parameters.c              |  16 +-
 app/test-pmd/testpmd.c                 |   6 +-
 app/test-pmd/testpmd.h                 |   6 +
 doc/guides/rel_notes/release_22_11.rst |  14 ++
 drivers/net/ice/ice_ethdev.c           |  35 +++-
 drivers/net/ice/ice_rxtx.c             | 220 +++++++++++++++++++++----
 drivers/net/ice/ice_rxtx.h             |  16 ++
 drivers/net/ice/ice_rxtx_vec_common.h  |   3 +
 lib/ethdev/ethdev_driver.h             |  15 ++
 lib/ethdev/rte_ethdev.c                |  88 ++++++++--
 lib/ethdev/rte_ethdev.h                |  41 ++++-
 lib/ethdev/version.map                 |   3 +
 14 files changed, 606 insertions(+), 50 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 72+ messages in thread

* [PATCH 1/4] ethdev: introduce protocol header API
  2022-08-12 18:15 [PATCH 0/4] support protocol based buffer split Yuan Wang
@ 2022-08-12 18:15 ` Yuan Wang
  2022-08-12 18:15 ` [PATCH 2/4] ethdev: introduce protocol hdr based buffer split Yuan Wang
                   ` (14 subsequent siblings)
  15 siblings, 0 replies; 72+ messages in thread
From: Yuan Wang @ 2022-08-12 18:15 UTC (permalink / raw)
  To: thomas, andrew.rybchenko, xiaoyun.li, ferruh.yigit,
	aman.deep.singh, yuying.zhang, qi.z.zhang, jerinjacobk,
	viacheslavo, mdr
  Cc: stephen, xuan.ding, wenxuanx.wu, dev, Yuan Wang

Add a new ethdev API to retrieve supported protocol headers
of a PMD, which helps to configure protocol header based buffer split.
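
For example, an application could discover the list with the usual
two-call pattern (an illustrative sketch only; error handling and
includes are trimmed):

int i, num;
uint32_t *ptypes;
char name[256];

/* A first call with a NULL array returns the number of supported ptypes. */
num = rte_eth_buffer_split_get_supported_hdr_ptypes(port_id, NULL, 0);
if (num > 0) {
        ptypes = malloc(num * sizeof(*ptypes));
        num = rte_eth_buffer_split_get_supported_hdr_ptypes(port_id, ptypes, num);
        for (i = 0; i < num; i++) {
                rte_get_ptype_name(ptypes[i], name, sizeof(name));
                printf("buffer split supported after %s\n", name);
        }
        free(ptypes);
}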

Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
---
 doc/guides/rel_notes/release_22_11.rst |  5 ++++
 lib/ethdev/ethdev_driver.h             | 15 ++++++++++++
 lib/ethdev/rte_ethdev.c                | 33 ++++++++++++++++++++++++++
 lib/ethdev/rte_ethdev.h                | 24 +++++++++++++++++++
 lib/ethdev/version.map                 |  3 +++
 5 files changed, 80 insertions(+)

diff --git a/doc/guides/rel_notes/release_22_11.rst b/doc/guides/rel_notes/release_22_11.rst
index 8c021cf050..4d90514a9a 100644
--- a/doc/guides/rel_notes/release_22_11.rst
+++ b/doc/guides/rel_notes/release_22_11.rst
@@ -55,6 +55,11 @@ New Features
      Also, make sure to start the actual text at the margin.
      =======================================================
 
+* **Added new ethdev API for PMD to get buffer split supported protocol types.**
+
+  Added ``rte_eth_buffer_split_get_supported_hdr_ptypes()``, to get supported
+  header protocols of a PMD to split.
+
 
 Removed Items
 -------------
diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
index 5101868ea7..f64ceb9907 100644
--- a/lib/ethdev/ethdev_driver.h
+++ b/lib/ethdev/ethdev_driver.h
@@ -1054,6 +1054,18 @@ typedef int (*eth_ip_reassembly_conf_get_t)(struct rte_eth_dev *dev,
 typedef int (*eth_ip_reassembly_conf_set_t)(struct rte_eth_dev *dev,
 		const struct rte_eth_ip_reassembly_params *conf);
 
+/**
+ * @internal
+ * Get supported header protocols of a PMD to split.
+ *
+ * @param dev
+ *   Ethdev handle of port.
+ *
+ * @return
+ *   An array pointer to store supported protocol headers.
+ */
+typedef const uint32_t *(*eth_buffer_split_supported_hdr_ptypes_get_t)(struct rte_eth_dev *dev);
+
 /**
  * @internal
  * Dump private info from device to a file.
@@ -1301,6 +1313,9 @@ struct eth_dev_ops {
 	/** Set IP reassembly configuration */
 	eth_ip_reassembly_conf_set_t ip_reassembly_conf_set;
 
+	/** Get supported header ptypes to split */
+	eth_buffer_split_supported_hdr_ptypes_get_t buffer_split_supported_hdr_ptypes_get;
+
 	/** Dump private info from device */
 	eth_dev_priv_dump_t eth_dev_priv_dump;
 
diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
index 1979dc0850..093c577add 100644
--- a/lib/ethdev/rte_ethdev.c
+++ b/lib/ethdev/rte_ethdev.c
@@ -5917,6 +5917,39 @@ rte_eth_dev_priv_dump(uint16_t port_id, FILE *file)
 	return eth_err(port_id, (*dev->dev_ops->eth_dev_priv_dump)(dev, file));
 }
 
+int
+rte_eth_buffer_split_get_supported_hdr_ptypes(uint16_t port_id, uint32_t *ptypes, int num)
+{
+	int i, j;
+	struct rte_eth_dev *dev;
+	const uint32_t *all_types;
+
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
+	dev = &rte_eth_devices[port_id];
+
+	if (ptypes == NULL && num > 0) {
+		RTE_ETHDEV_LOG(ERR,
+			"Cannot get ethdev port %u supported header protocol types to NULL "
+			"when array size is non zero\n",
+			port_id);
+		return -EINVAL;
+	}
+
+	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->buffer_split_supported_hdr_ptypes_get, -ENOTSUP);
+	all_types = (*dev->dev_ops->buffer_split_supported_hdr_ptypes_get)(dev);
+
+	if (!all_types)
+		return 0;
+
+	for (i = 0, j = 0; all_types[i] != RTE_PTYPE_UNKNOWN; ++i) {
+		if (j < num)
+			ptypes[j] = all_types[i];
+		j++;
+	}
+
+	return j;
+}
+
 RTE_LOG_REGISTER_DEFAULT(rte_eth_dev_logtype, INFO);
 
 RTE_INIT(ethdev_init_telemetry)
diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
index de9e970d4d..c58c908c3a 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -6206,6 +6206,30 @@ rte_eth_tx_buffer(uint16_t port_id, uint16_t queue_id,
 	return rte_eth_tx_buffer_flush(port_id, queue_id, buffer);
 }
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Get supported header protocols to split on Rx.
+ *
+ * @param port_id
+ *   The port identifier of the device.
+ * @param[out] ptypes
+ *   An array pointer to store supported protocol headers, allocated by caller.
+ *   These ptypes are composed with RTE_PTYPE_*.
+ * @param num
+ *   Size of the array pointed by param ptypes.
+ * @return
+ *   - (>=0) Number of supported ptypes. If the number of types exceeds num,
+ *           only num entries will be filled into the ptypes array, but the full
+ *           count of supported ptypes will be returned.
+ *   - (-ENOTSUP) if header protocol is not supported by device.
+ *   - (-ENODEV) if *port_id* invalid.
+ *   - (-EINVAL) if bad parameter.
+ */
+__rte_experimental
+int rte_eth_buffer_split_get_supported_hdr_ptypes(uint16_t port_id, uint32_t *ptypes, int num);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/ethdev/version.map b/lib/ethdev/version.map
index 03f52fee91..e496c8d938 100644
--- a/lib/ethdev/version.map
+++ b/lib/ethdev/version.map
@@ -285,6 +285,9 @@ EXPERIMENTAL {
 	rte_mtr_color_in_protocol_priority_get;
 	rte_mtr_color_in_protocol_set;
 	rte_mtr_meter_vlan_table_update;
+
+	# added in 22.11
+	rte_eth_buffer_split_get_supported_hdr_ptypes;
 };
 
 INTERNAL {
-- 
2.25.1


^ permalink raw reply	[flat|nested] 72+ messages in thread

* [PATCH 2/4] ethdev: introduce protocol hdr based buffer split
  2022-08-12 18:15 [PATCH 0/4] support protocol based buffer split Yuan Wang
  2022-08-12 18:15 ` [PATCH 1/4] ethdev: introduce protocol header API Yuan Wang
@ 2022-08-12 18:15 ` Yuan Wang
  2022-08-12 18:15 ` [PATCH 3/4] app/testpmd: add rxhdrs commands and parameters Yuan Wang
                   ` (13 subsequent siblings)
  15 siblings, 0 replies; 72+ messages in thread
From: Yuan Wang @ 2022-08-12 18:15 UTC (permalink / raw)
  To: thomas, andrew.rybchenko, xiaoyun.li, ferruh.yigit,
	aman.deep.singh, yuying.zhang, qi.z.zhang, jerinjacobk,
	viacheslavo, mdr
  Cc: stephen, xuan.ding, wenxuanx.wu, dev, Yuan Wang

Currently, Rx buffer split supports length based split. With Rx queue
offload RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT enabled and Rx packet segment
configured, PMD will be able to split the received packets into
multiple segments.

However, length based buffer split is not suitable for NICs that do the
split based on protocol headers. Given an arbitrarily variable length in
an Rx packet segment, it is almost impossible to pass a fixed protocol
header to the driver. Besides, tunneling makes the composition of a
packet variable, which makes the situation even worse.

This patch extends the current buffer split to support protocol header
based buffer split. A new proto_hdr field is introduced in the reserved
field of the rte_eth_rxseg_split structure to specify the protocol
header. The proto_hdr field defines the split position of a packet:
splitting always happens after the protocol header defined in the Rx
packet segment. When the Rx queue offload RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT
is enabled and the corresponding protocol header is configured, the
driver will split the ingress packets into multiple segments.

struct rte_eth_rxseg_split {
        struct rte_mempool *mp; /* memory pools to allocate segment from */
        uint16_t length; /* segment maximal data length,
                            configures split point */
        uint16_t offset; /* data offset from beginning
                            of mbuf data buffer */
        uint32_t proto_hdr; /* supported ptype of a specific pmd,
                               configures split point.
			       It should be defined by RTE_PTYPE_*
			     */
};

If protocol header split is supported by a PMD, the
rte_eth_buffer_split_get_supported_hdr_ptypes function can be used to
obtain the list of supported protocol headers.

For example, let's suppose we configured the Rx queue with the
following segments:
        seg0 - pool0, proto_hdr0=RTE_PTYPE_L3_IPV4, off0=2B
        seg1 - pool1, proto_hdr1=RTE_PTYPE_L4_UDP, off1=128B
        seg2 - pool2, off2=0B

A packet consisting of MAC_IPV4_UDP_PAYLOAD will be split as
follows:
        seg0 - ipv4 header @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from pool0
        seg1 - udp header @ 128 in mbuf from pool1
        seg2 - payload @ 0 in mbuf from pool2

Note: The NIC will only do the split when a packet exactly matches all
the protocol headers in the segments. For example, if ARP packets are
received with the above configuration, the NIC won't split them, since
they contain neither an IPv4 header nor a UDP header.

Buffer split can now be configured in two modes. For length based buffer
split, the mp, length and offset fields in the Rx packet segment should
be configured, while the proto_hdr field will be ignored.
For protocol header based buffer split, the mp, offset and proto_hdr
fields in the Rx packet segment should be configured, while the length
field will be ignored.

The split limitations imposed by the underlying driver are reported in
the rte_eth_dev_info->rx_seg_capa field. The memory attributes of the
split parts may also differ, e.g. DPDK memory and external memory,
respectively.
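
Expressed in code, the example segment configuration above could look
roughly like this (an illustrative sketch; pool0/pool1/pool2 are
pre-created mempools and rx_conf is the rte_eth_rxconf later passed to
rte_eth_rx_queue_setup()):

union rte_eth_rxseg rx_useg[3];

memset(rx_useg, 0, sizeof(rx_useg));

rx_useg[0].split.mp = pool0;
rx_useg[0].split.proto_hdr = RTE_PTYPE_L3_IPV4;  /* seg0: split after the IPv4 header */
rx_useg[0].split.offset = 2;

rx_useg[1].split.mp = pool1;
rx_useg[1].split.proto_hdr = RTE_PTYPE_L4_UDP;   /* seg1: split after the UDP header */
rx_useg[1].split.offset = 128;

rx_useg[2].split.mp = pool2;                     /* seg2: remaining payload, offset 0 */

rx_conf.offloads |= RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT;
rx_conf.rx_seg = rx_useg;
rx_conf.rx_nseg = 3;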

Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
---
 doc/guides/rel_notes/release_22_11.rst |  5 +++
 lib/ethdev/rte_ethdev.c                | 55 ++++++++++++++++++++------
 lib/ethdev/rte_ethdev.h                | 17 +++++++-
 3 files changed, 65 insertions(+), 12 deletions(-)

diff --git a/doc/guides/rel_notes/release_22_11.rst b/doc/guides/rel_notes/release_22_11.rst
index 4d90514a9a..f3b58c7895 100644
--- a/doc/guides/rel_notes/release_22_11.rst
+++ b/doc/guides/rel_notes/release_22_11.rst
@@ -60,6 +60,11 @@ New Features
   Added ``rte_eth_buffer_split_get_supported_hdr_ptypes()``, to get supported
   header protocols of a PMD to split.
 
+* **Added protocol header based buffer split.**
+  Ethdev: The ``reserved`` field in the  ``rte_eth_rxseg_split`` structure is
+  replaced with ``proto_hdr`` to support protocol header based buffer split.
+  User can choose length or protocol header to configure buffer split
+  according to NIC's capability.
 
 Removed Items
 -------------
diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
index 093c577add..dfceb723ee 100644
--- a/lib/ethdev/rte_ethdev.c
+++ b/lib/ethdev/rte_ethdev.c
@@ -1635,9 +1635,10 @@ rte_eth_dev_is_removed(uint16_t port_id)
 }
 
 static int
-rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
-			     uint16_t n_seg, uint32_t *mbp_buf_size,
-			     const struct rte_eth_dev_info *dev_info)
+rte_eth_rx_queue_check_split(uint16_t port_id,
+			const struct rte_eth_rxseg_split *rx_seg,
+			uint16_t n_seg, uint32_t *mbp_buf_size,
+			const struct rte_eth_dev_info *dev_info)
 {
 	const struct rte_eth_rxseg_capa *seg_capa = &dev_info->rx_seg_capa;
 	struct rte_mempool *mp_first;
@@ -1660,6 +1661,7 @@ rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
 		struct rte_mempool *mpl = rx_seg[seg_idx].mp;
 		uint32_t length = rx_seg[seg_idx].length;
 		uint32_t offset = rx_seg[seg_idx].offset;
+		uint32_t proto_hdr = rx_seg[seg_idx].proto_hdr;
 
 		if (mpl == NULL) {
 			RTE_ETHDEV_LOG(ERR, "null mempool pointer\n");
@@ -1693,13 +1695,44 @@ rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
 		}
 		offset += seg_idx != 0 ? 0 : RTE_PKTMBUF_HEADROOM;
 		*mbp_buf_size = rte_pktmbuf_data_room_size(mpl);
-		length = length != 0 ? length : *mbp_buf_size;
-		if (*mbp_buf_size < length + offset) {
-			RTE_ETHDEV_LOG(ERR,
-				       "%s mbuf_data_room_size %u < %u (segment length=%u + segment offset=%u)\n",
-				       mpl->name, *mbp_buf_size,
-				       length + offset, length, offset);
-			return -EINVAL;
+
+		int ret = rte_eth_buffer_split_get_supported_hdr_ptypes(port_id, NULL, 0);
+		if (ret <= 0) {
+			/* Split at fixed length. */
+			length = length != 0 ? length : *mbp_buf_size;
+			if (*mbp_buf_size < length + offset) {
+				RTE_ETHDEV_LOG(ERR,
+					"%s mbuf_data_room_size %u < %u (segment length=%u + segment offset=%u)\n",
+					mpl->name, *mbp_buf_size,
+					length + offset, length, offset);
+				return -EINVAL;
+			}
+		} else {
+			/* Split after specified protocol header. */
+			uint32_t ptypes[ret];
+			int i;
+
+			ret = rte_eth_buffer_split_get_supported_hdr_ptypes(port_id, ptypes, ret);
+			for (i = 0; i < ret; i++)
+				if (ptypes[i] & proto_hdr)
+					break;
+
+			if (i == ret) {
+#define PTYPE_NAMESIZE	256
+				char ptype_name[PTYPE_NAMESIZE];
+				rte_get_ptype_name(proto_hdr, ptype_name, sizeof(ptype_name));
+				RTE_ETHDEV_LOG(ERR,
+					"Protocol header %s is not supported.\n",
+					ptype_name);
+				return -EINVAL;
+			}
+			if (*mbp_buf_size < offset) {
+				RTE_ETHDEV_LOG(ERR,
+						"%s mbuf_data_room_size %u < %u segment offset)\n",
+						mpl->name, *mbp_buf_size,
+						offset);
+				return -EINVAL;
+			}
 		}
 	}
 	return 0;
@@ -1778,7 +1811,7 @@ rte_eth_rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
 		n_seg = rx_conf->rx_nseg;
 
 		if (rx_conf->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) {
-			ret = rte_eth_rx_queue_check_split(rx_seg, n_seg,
+			ret = rte_eth_rx_queue_check_split(port_id, rx_seg, n_seg,
 							   &mbp_buf_size,
 							   &dev_info);
 			if (ret != 0)
diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
index c58c908c3a..410fba5eab 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -1175,6 +1175,9 @@ struct rte_eth_txmode {
  *   specified in the first array element, the second buffer, from the
  *   pool in the second element, and so on.
  *
+ * - The proto_hdrs in the elements define the split position of
+ *   received packets.
+ *
  * - The offsets from the segment description elements specify
  *   the data offset from the buffer beginning except the first mbuf.
  *   The first segment offset is added with RTE_PKTMBUF_HEADROOM.
@@ -1196,12 +1199,24 @@ struct rte_eth_txmode {
  *     - pool from the last valid element
  *     - the buffer size from this pool
  *     - zero offset
+ *
+ * - Length based buffer split:
+ *     - mp, length, offset should be configured.
+ *     - The proto_hdr field will be ignored.
+ *
+ * - Protocol header based buffer split:
+ *     - mp, offset, proto_hdr should be configured.
+ *     - The length field will be ignored.
  */
 struct rte_eth_rxseg_split {
 	struct rte_mempool *mp; /**< Memory pool to allocate segment from. */
 	uint16_t length; /**< Segment data length, configures split point. */
 	uint16_t offset; /**< Data offset from beginning of mbuf data buffer. */
-	uint32_t reserved; /**< Reserved field. */
+	/**
+	 * Supported ptype of a specific pmd, configures split point.
+	 * It should be defined by RTE_PTYPE_*.
+	 */
+	uint32_t proto_hdr;
 };
 
 /**
-- 
2.25.1


^ permalink raw reply	[flat|nested] 72+ messages in thread

* [PATCH 3/4] app/testpmd: add rxhdrs commands and parameters
  2022-08-12 18:15 [PATCH 0/4] support protocol based buffer split Yuan Wang
  2022-08-12 18:15 ` [PATCH 1/4] ethdev: introduce protocol header API Yuan Wang
  2022-08-12 18:15 ` [PATCH 2/4] ethdev: introduce protocol hdr based buffer split Yuan Wang
@ 2022-08-12 18:15 ` Yuan Wang
  2022-08-12 18:15 ` [PATCH 4/4] net/ice: support buffer split in Rx path Yuan Wang
                   ` (12 subsequent siblings)
  15 siblings, 0 replies; 72+ messages in thread
From: Yuan Wang @ 2022-08-12 18:15 UTC (permalink / raw)
  To: thomas, andrew.rybchenko, xiaoyun.li, ferruh.yigit,
	aman.deep.singh, yuying.zhang, qi.z.zhang, jerinjacobk,
	viacheslavo, mdr
  Cc: stephen, xuan.ding, wenxuanx.wu, dev, Yuan Wang

Add command line parameter:
--rxhdrs=mac,[ipv4,udp]

Set the protocol_hdr of segments to scatter packets on receiving if the
split feature is engaged. It affects only the queues configured with the
BUFFER_SPLIT flag.

Add interactive mode command:
testpmd>set rxhdrs mac,ipv4,tcp,udp,sctp
(protocol sequence should be valid)

The protocol split feature is off by default. To enable protocol split,
you need:
1. Start testpmd with multiple mempools. E.g. --mbuf-size=2048,2048
2. Configure Rx queue with rx_offload buffer split on.
3. Set the protocol type of buffer split. E.g. set rxhdrs mac,ipv4
        (default protocols of testpmd : mac|icmp|ipv4|ipv6|tcp|udp|
                              sctp|inner_mac|inner_ipv4|inner_ipv6|
                              inner_tcp|inner_udp|inner_sctp)
The above protocols can be configured in testpmd, but the configuration
is only applied when it is supported by the specific PMD.

Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
---
 app/test-pmd/cmdline.c    | 123 +++++++++++++++++++++++++++++++++++++-
 app/test-pmd/config.c     |  70 ++++++++++++++++++++++
 app/test-pmd/parameters.c |  16 ++++-
 app/test-pmd/testpmd.c    |   6 +-
 app/test-pmd/testpmd.h    |   6 ++
 5 files changed, 215 insertions(+), 6 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index b4fe9dfb17..f00b7bc6a4 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -183,7 +183,7 @@ static void cmd_help_long_parsed(void *parsed_result,
 			"show (rxq|txq) info (port_id) (queue_id)\n"
 			"    Display information for configured RX/TX queue.\n\n"
 
-			"show config (rxtx|cores|fwd|rxoffs|rxpkts|txpkts)\n"
+			"show config (rxtx|cores|fwd|rxoffs|rxpkts|rxhdrs|txpkts)\n"
 			"    Display the given configuration.\n\n"
 
 			"read rxd (port_id) (queue_id) (rxd_id)\n"
@@ -307,6 +307,14 @@ static void cmd_help_long_parsed(void *parsed_result,
 			" Affects only the queues configured with split"
 			" offloads.\n\n"
 
+			"set rxhdrs (mac[,ipv4])*\n"
+			"	Set the protocol hdr of each segment to scatter"
+			" packets on receiving if split feature is engaged."
+			" Affects only the queues configured with split"
+			" offloads.\n\n"
+			"	Supported proto header: mac|ipv4|ipv6|tcp|udp|sctp|"
+			"inner_mac|inner_ipv4|inner_ipv6|inner_tcp|inner_udp|inner_sctp\n"
+
 			"set txpkts (x[,y]*)\n"
 			"    Set the length of each segment of TXONLY"
 			" and optionally CSUM packets.\n\n"
@@ -3456,6 +3464,68 @@ static cmdline_parse_inst_t cmd_stop = {
 	},
 };
 
+static unsigned int
+get_ptype(char *value)
+{
+	uint32_t protocol;
+
+	if (!strcmp(value, "mac"))
+		protocol = RTE_PTYPE_L2_ETHER;
+	else if (!strcmp(value, "ipv4"))
+		protocol = RTE_PTYPE_L3_IPV4;
+	else if (!strcmp(value, "ipv6"))
+		protocol = RTE_PTYPE_L3_IPV6;
+	else if (!strcmp(value, "tcp"))
+		protocol = RTE_PTYPE_L4_TCP;
+	else if (!strcmp(value, "udp"))
+		protocol = RTE_PTYPE_L4_UDP;
+	else if (!strcmp(value, "sctp"))
+		protocol = RTE_PTYPE_L4_SCTP;
+	else if (!strcmp(value, "inner_mac"))
+		protocol = RTE_PTYPE_INNER_L2_ETHER;
+	else if (!strcmp(value, "inner_ipv4"))
+		protocol = RTE_PTYPE_INNER_L3_IPV4;
+	else if (!strcmp(value, "inner_ipv6"))
+		protocol = RTE_PTYPE_INNER_L3_IPV6;
+	else if (!strcmp(value, "inner_tcp"))
+		protocol = RTE_PTYPE_INNER_L4_TCP;
+	else if (!strcmp(value, "inner_udp"))
+		protocol = RTE_PTYPE_INNER_L4_UDP;
+	else if (!strcmp(value, "inner_sctp"))
+		protocol = RTE_PTYPE_INNER_L4_SCTP;
+	else {
+		fprintf(stderr, "Unsupported protocol: %s\n", value);
+		protocol = RTE_PTYPE_UNKNOWN;
+	}
+
+	return protocol;
+}
+/* *** SET RXHDRSLIST *** */
+
+unsigned int
+parse_hdrs_list(const char *str, const char *item_name, unsigned int max_items,
+				unsigned int *parsed_items, int check_hdrs_sequence)
+{
+	unsigned int nb_item;
+	char *cur;
+	char *tmp;
+
+	nb_item = 0;
+	char *str2 = strdup(str);
+	cur = strtok_r(str2, ",", &tmp);
+	while (cur != NULL) {
+		parsed_items[nb_item] = get_ptype(cur);
+		cur = strtok_r(NULL, ",", &tmp);
+		nb_item++;
+	}
+	if (nb_item > max_items)
+		fprintf(stderr, "Number of %s = %u > %u (maximum items)\n",
+			item_name, nb_item + 1, max_items);
+	free(str2);
+	if (!check_hdrs_sequence)
+		return nb_item;
+	return nb_item;
+}
 /* *** SET CORELIST and PORTLIST CONFIGURATION *** */
 
 unsigned int
@@ -3825,6 +3895,50 @@ static cmdline_parse_inst_t cmd_set_rxpkts = {
 	},
 };
 
+/* *** SET SEGMENT HEADERS OF RX PACKETS SPLIT *** */
+struct cmd_set_rxhdrs_result {
+	cmdline_fixed_string_t set;
+	cmdline_fixed_string_t rxhdrs;
+	cmdline_fixed_string_t values;
+};
+
+static void
+cmd_set_rxhdrs_parsed(void *parsed_result,
+		      __rte_unused struct cmdline *cl,
+		      __rte_unused void *data)
+{
+	struct cmd_set_rxhdrs_result *res;
+	unsigned int seg_hdrs[MAX_SEGS_BUFFER_SPLIT];
+	unsigned int nb_segs;
+
+	res = parsed_result;
+	nb_segs = parse_hdrs_list(res->values, "segment hdrs",
+				  MAX_SEGS_BUFFER_SPLIT, seg_hdrs, 0);
+	if (nb_segs > 0)
+		set_rx_pkt_hdrs(seg_hdrs, nb_segs);
+	cmd_reconfig_device_queue(RTE_PORT_ALL, 0, 1);
+}
+static cmdline_parse_token_string_t cmd_set_rxhdrs_set =
+	TOKEN_STRING_INITIALIZER(struct cmd_set_rxhdrs_result,
+				set, "set");
+static cmdline_parse_token_string_t cmd_set_rxhdrs_rxhdrs =
+	TOKEN_STRING_INITIALIZER(struct cmd_set_rxhdrs_result,
+				rxhdrs, "rxhdrs");
+static cmdline_parse_token_string_t cmd_set_rxhdrs_values =
+	TOKEN_STRING_INITIALIZER(struct cmd_set_rxhdrs_result,
+				values, NULL);
+
+static cmdline_parse_inst_t cmd_set_rxhdrs = {
+	.f = cmd_set_rxhdrs_parsed,
+	.data = NULL,
+	.help_str = "set rxhdrs <mac[,ipv4]*>",
+	.tokens = {
+		(void *)&cmd_set_rxhdrs_set,
+		(void *)&cmd_set_rxhdrs_rxhdrs,
+		(void *)&cmd_set_rxhdrs_values,
+		NULL,
+	},
+};
 /* *** SET SEGMENT LENGTHS OF TXONLY PACKETS *** */
 
 struct cmd_set_txpkts_result {
@@ -6918,6 +7032,8 @@ static void cmd_showcfg_parsed(void *parsed_result,
 		show_rx_pkt_offsets();
 	else if (!strcmp(res->what, "rxpkts"))
 		show_rx_pkt_segments();
+	else if (!strcmp(res->what, "rxhdrs"))
+		show_rx_pkt_hdrs();
 	else if (!strcmp(res->what, "txpkts"))
 		show_tx_pkt_segments();
 	else if (!strcmp(res->what, "txtimes"))
@@ -6930,12 +7046,12 @@ static cmdline_parse_token_string_t cmd_showcfg_port =
 	TOKEN_STRING_INITIALIZER(struct cmd_showcfg_result, cfg, "config");
 static cmdline_parse_token_string_t cmd_showcfg_what =
 	TOKEN_STRING_INITIALIZER(struct cmd_showcfg_result, what,
-				 "rxtx#cores#fwd#rxoffs#rxpkts#txpkts#txtimes");
+				 "rxtx#cores#fwd#rxoffs#rxpkts#rxhdrs#txpkts#txtimes");
 
 static cmdline_parse_inst_t cmd_showcfg = {
 	.f = cmd_showcfg_parsed,
 	.data = NULL,
-	.help_str = "show config rxtx|cores|fwd|rxoffs|rxpkts|txpkts|txtimes",
+	.help_str = "show config rxtx|cores|fwd|rxoffs|rxpkts|rxhdrs|txpkts|txtimes",
 	.tokens = {
 		(void *)&cmd_showcfg_show,
 		(void *)&cmd_showcfg_port,
@@ -14166,6 +14282,7 @@ static cmdline_parse_ctx_t builtin_ctx[] = {
 	(cmdline_parse_inst_t *)&cmd_set_log,
 	(cmdline_parse_inst_t *)&cmd_set_rxoffs,
 	(cmdline_parse_inst_t *)&cmd_set_rxpkts,
+	(cmdline_parse_inst_t *)&cmd_set_rxhdrs,
 	(cmdline_parse_inst_t *)&cmd_set_txpkts,
 	(cmdline_parse_inst_t *)&cmd_set_txsplit,
 	(cmdline_parse_inst_t *)&cmd_set_txtimes,
diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index a2939867c4..4102ebb3f9 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -4940,6 +4940,76 @@ show_rx_pkt_segments(void)
 	}
 }
 
+static const char *get_ptype_str(uint32_t ptype)
+{
+	switch (ptype) {
+	case RTE_PTYPE_L2_ETHER:
+		return "outer_mac";
+	case RTE_PTYPE_L3_IPV4:
+		return "ipv4";
+	case RTE_PTYPE_L3_IPV6:
+		return "ipv6";
+	case RTE_PTYPE_L4_TCP:
+		return "tcp";
+	case RTE_PTYPE_L4_UDP:
+		return "udp";
+	case RTE_PTYPE_L4_SCTP:
+		return "sctp";
+	case RTE_PTYPE_INNER_L2_ETHER:
+		return "inner_mac";
+	case RTE_PTYPE_INNER_L3_IPV4:
+		return "inner_ipv4";
+	case RTE_PTYPE_INNER_L3_IPV6:
+		return "inner_ipv6";
+	case RTE_PTYPE_INNER_L4_TCP:
+		return "inner_tcp";
+	case RTE_PTYPE_INNER_L4_UDP:
+		return "inner_udp";
+	case RTE_PTYPE_INNER_L4_SCTP:
+		return "inner_sctp";
+	default:
+		return "unsupported";
+	}
+}
+
+void
+show_rx_pkt_hdrs(void)
+{
+	uint32_t i, n;
+
+	n = rx_pkt_nb_segs;
+	printf("Number of segments: %u\n", n);
+	if (n) {
+		printf("Packet segs: ");
+		for (i = 0; i < n - 1; i++)
+			printf("%s, ", get_ptype_str(rx_pkt_hdr_protos[i]));
+		printf("payload\n");
+	}
+}
+
+void
+set_rx_pkt_hdrs(unsigned int *seg_hdrs, unsigned int nb_segs)
+{
+	unsigned int i;
+
+	if (nb_segs + 1 > MAX_SEGS_BUFFER_SPLIT) {
+		printf("nb segments per RX packets=%u > "
+		       "MAX_SEGS_BUFFER_SPLIT - ignored\n", nb_segs + 1);
+		return;
+	}
+
+	memset(rx_pkt_hdr_protos, 0, sizeof(rx_pkt_hdr_protos));
+
+	for (i = 0; i < nb_segs; i++)
+		rx_pkt_hdr_protos[i] = (uint32_t)seg_hdrs[i];
+	rx_pkt_hdr_protos[nb_segs] = RTE_PTYPE_ALL_MASK;
+	/*
+	 * We calculate the number of hdrs, but payload is not included,
+	 * so rx_pkt_nb_segs would increase 1.
+	 */
+	rx_pkt_nb_segs = nb_segs + 1;
+}
+
 void
 set_rx_pkt_segments(unsigned int *seg_lengths, unsigned int nb_segs)
 {
diff --git a/app/test-pmd/parameters.c b/app/test-pmd/parameters.c
index e3c9757f3f..206194c385 100644
--- a/app/test-pmd/parameters.c
+++ b/app/test-pmd/parameters.c
@@ -164,6 +164,7 @@ usage(char* progname)
 	       " Used mainly with PCAP drivers.\n");
 	printf("  --rxoffs=X[,Y]*: set RX segment offsets for split.\n");
 	printf("  --rxpkts=X[,Y]*: set RX segment sizes to split.\n");
+	printf("  --rxhdrs=mac[,ipv4]*: set RX segment protocol to split.\n");
 	printf("  --txpkts=X[,Y]*: set TX segment sizes"
 		" or total packet length.\n");
 	printf("  --txonly-multi-flow: generate multiple flows in txonly mode\n");
@@ -676,6 +677,7 @@ launch_args_parse(int argc, char** argv)
 		{ "flow-isolate-all",	        0, 0, 0 },
 		{ "rxoffs",			1, 0, 0 },
 		{ "rxpkts",			1, 0, 0 },
+		{ "rxhdrs",			1, 0, 0 },
 		{ "txpkts",			1, 0, 0 },
 		{ "txonly-multi-flow",		0, 0, 0 },
 		{ "rxq-share",			2, 0, 0 },
@@ -1331,7 +1333,6 @@ launch_args_parse(int argc, char** argv)
 			if (!strcmp(lgopts[opt_idx].name, "rxpkts")) {
 				unsigned int seg_len[MAX_SEGS_BUFFER_SPLIT];
 				unsigned int nb_segs;
-
 				nb_segs = parse_item_list
 						(optarg, "rxpkt segments",
 						 MAX_SEGS_BUFFER_SPLIT,
@@ -1341,6 +1342,19 @@ launch_args_parse(int argc, char** argv)
 				else
 					rte_exit(EXIT_FAILURE, "bad rxpkts\n");
 			}
+			if (!strcmp(lgopts[opt_idx].name, "rxhdrs")) {
+				unsigned int seg_hdrs[MAX_SEGS_BUFFER_SPLIT];
+				unsigned int nb_segs;
+
+				nb_segs = parse_hdrs_list
+						(optarg, "rxpkt segments",
+						MAX_SEGS_BUFFER_SPLIT,
+						seg_hdrs, 0);
+				if (nb_segs >= 1)
+					set_rx_pkt_hdrs(seg_hdrs, nb_segs);
+				else
+					rte_exit(EXIT_FAILURE, "bad rxpkts\n");
+			}
 			if (!strcmp(lgopts[opt_idx].name, "txpkts")) {
 				unsigned seg_lengths[RTE_MAX_SEGS_PER_PKT];
 				unsigned int nb_segs;
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index addcbcac85..039e2edfca 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -247,6 +247,7 @@ uint16_t rx_pkt_seg_lengths[MAX_SEGS_BUFFER_SPLIT];
 uint8_t  rx_pkt_nb_segs; /**< Number of segments to split */
 uint16_t rx_pkt_seg_offsets[MAX_SEGS_BUFFER_SPLIT];
 uint8_t  rx_pkt_nb_offs; /**< Number of specified offsets */
+uint32_t rx_pkt_hdr_protos[MAX_SEGS_BUFFER_SPLIT];
 
 /*
  * Configuration of packet segments used by the "txonly" processing engine.
@@ -2680,11 +2681,12 @@ rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
 		mpx = mbuf_pool_find(socket_id, mp_n);
 		/* Handle zero as mbuf data buffer size. */
 		rx_seg->length = rx_pkt_seg_lengths[i] ?
-				   rx_pkt_seg_lengths[i] :
-				   mbuf_data_size[mp_n];
+					rx_pkt_seg_lengths[i] :
+					mbuf_data_size[mp_n];
 		rx_seg->offset = i < rx_pkt_nb_offs ?
 				   rx_pkt_seg_offsets[i] : 0;
 		rx_seg->mp = mpx ? mpx : mp;
+		rx_seg->proto_hdr = rx_pkt_hdr_protos[i];
 	}
 	rx_conf->rx_nseg = rx_pkt_nb_segs;
 	rx_conf->rx_seg = rx_useg;
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index fb2f5195d3..de992c9416 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -562,6 +562,7 @@ extern uint32_t max_rx_pkt_len;
  * Configuration of packet segments used to scatter received packets
  * if some of split features is configured.
  */
+extern uint32_t rx_pkt_hdr_protos[MAX_SEGS_BUFFER_SPLIT];
 extern uint16_t rx_pkt_seg_lengths[MAX_SEGS_BUFFER_SPLIT];
 extern uint8_t  rx_pkt_nb_segs; /**< Number of segments to split */
 extern uint16_t rx_pkt_seg_offsets[MAX_SEGS_BUFFER_SPLIT];
@@ -892,6 +893,9 @@ inc_tx_burst_stats(struct fwd_stream *fs, uint16_t nb_tx)
 unsigned int parse_item_list(const char *str, const char *item_name,
 			unsigned int max_items,
 			unsigned int *parsed_items, int check_unique_values);
+unsigned int parse_hdrs_list(const char *str, const char *item_name,
+			unsigned int max_item,
+			unsigned int *parsed_items, int check_unique_values);
 void launch_args_parse(int argc, char** argv);
 void cmdline_read_from_file(const char *filename);
 int init_cmdline(void);
@@ -1049,6 +1053,8 @@ void set_record_core_cycles(uint8_t on_off);
 void set_record_burst_stats(uint8_t on_off);
 void set_verbose_level(uint16_t vb_level);
 void set_rx_pkt_segments(unsigned int *seg_lengths, unsigned int nb_segs);
+void set_rx_pkt_hdrs(unsigned int *seg_protos, unsigned int nb_segs);
+void show_rx_pkt_hdrs(void);
 void show_rx_pkt_segments(void);
 void set_rx_pkt_offsets(unsigned int *seg_offsets, unsigned int nb_offs);
 void show_rx_pkt_offsets(void);
-- 
2.25.1


^ permalink raw reply	[flat|nested] 72+ messages in thread

* [PATCH 4/4] net/ice: support buffer split in Rx path
  2022-08-12 18:15 [PATCH 0/4] support protocol based buffer split Yuan Wang
                   ` (2 preceding siblings ...)
  2022-08-12 18:15 ` [PATCH 3/4] app/testpmd: add rxhdrs commands and parameters Yuan Wang
@ 2022-08-12 18:15 ` Yuan Wang
  2022-09-01 22:33 ` [PATCH v2 0/4] support protocol based buffer split Yuan Wang
                   ` (11 subsequent siblings)
  15 siblings, 0 replies; 72+ messages in thread
From: Yuan Wang @ 2022-08-12 18:15 UTC (permalink / raw)
  To: thomas, andrew.rybchenko, xiaoyun.li, ferruh.yigit,
	aman.deep.singh, yuying.zhang, qi.z.zhang, jerinjacobk,
	viacheslavo, mdr
  Cc: stephen, xuan.ding, wenxuanx.wu, dev, Yuan Wang

This patch adds support for protocol based buffer split in the normal Rx
data path. When the Rx queue is configured with a specific protocol
type, received packets will be split directly into protocol header and
payload parts, within the limitations of the ice PMD, and the two parts
will be put into different mempools.

Currently, protocol based buffer split is not supported in vectorized
paths.

A new API, ice_buffer_split_supported_hdr_ptypes_get(), has been
introduced; it returns to the application the header protocols the ice
PMD supports for splitting.
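
As an illustrative sketch of what the application then sees (assuming a
two-segment header/payload configuration such as the one in the cover
letter; process_packet() is a placeholder for application logic), each
received packet arrives as a two-segment mbuf chain:

struct rte_mbuf *pkts[32];
uint16_t i, nb;

nb = rte_eth_rx_burst(port_id, 0, pkts, 32);
for (i = 0; i < nb; i++) {
        struct rte_mbuf *hdr = pkts[i];   /* protocol header, from the header mempool */
        struct rte_mbuf *pay = hdr->next; /* payload, from the payload mempool */

        /* hdr->data_len holds the header length, pay->data_len the
         * payload length, and hdr->pkt_len covers both segments. */
        process_packet(hdr, pay);
}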

Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
Reviewed-by: Qi Zhang <qi.z.zhang@intel.com>
---
 doc/guides/rel_notes/release_22_11.rst |   4 +
 drivers/net/ice/ice_ethdev.c           |  35 +++-
 drivers/net/ice/ice_rxtx.c             | 220 +++++++++++++++++++++----
 drivers/net/ice/ice_rxtx.h             |  16 ++
 drivers/net/ice/ice_rxtx_vec_common.h  |   3 +
 5 files changed, 246 insertions(+), 32 deletions(-)

diff --git a/doc/guides/rel_notes/release_22_11.rst b/doc/guides/rel_notes/release_22_11.rst
index f3b58c7895..99af35714d 100644
--- a/doc/guides/rel_notes/release_22_11.rst
+++ b/doc/guides/rel_notes/release_22_11.rst
@@ -66,6 +66,10 @@ New Features
   User can choose length or protocol header to configure buffer split
   according to NIC's capability.
 
+* **Updated Intel ice driver.**
+
+  Added protocol based buffer split support in scalar path.
+
 Removed Items
 -------------
 
diff --git a/drivers/net/ice/ice_ethdev.c b/drivers/net/ice/ice_ethdev.c
index b2300790ae..b5ccda74b8 100644
--- a/drivers/net/ice/ice_ethdev.c
+++ b/drivers/net/ice/ice_ethdev.c
@@ -169,6 +169,7 @@ static int ice_timesync_read_time(struct rte_eth_dev *dev,
 static int ice_timesync_write_time(struct rte_eth_dev *dev,
 				   const struct timespec *timestamp);
 static int ice_timesync_disable(struct rte_eth_dev *dev);
+static const uint32_t *ice_buffer_split_supported_hdr_ptypes_get(struct rte_eth_dev *dev);
 
 static const struct rte_pci_id pci_id_ice_map[] = {
 	{ RTE_PCI_DEVICE(ICE_INTEL_VENDOR_ID, ICE_DEV_ID_E823L_BACKPLANE) },
@@ -280,6 +281,7 @@ static const struct eth_dev_ops ice_eth_dev_ops = {
 	.timesync_write_time          = ice_timesync_write_time,
 	.timesync_disable             = ice_timesync_disable,
 	.tm_ops_get                   = ice_tm_ops_get,
+	.buffer_split_supported_hdr_ptypes_get = ice_buffer_split_supported_hdr_ptypes_get,
 };
 
 /* store statistics names and its offset in stats structure */
@@ -3749,7 +3751,8 @@ ice_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 			RTE_ETH_RX_OFFLOAD_OUTER_IPV4_CKSUM |
 			RTE_ETH_RX_OFFLOAD_VLAN_EXTEND |
 			RTE_ETH_RX_OFFLOAD_RSS_HASH |
-			RTE_ETH_RX_OFFLOAD_TIMESTAMP;
+			RTE_ETH_RX_OFFLOAD_TIMESTAMP |
+			RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT;
 		dev_info->tx_offload_capa |=
 			RTE_ETH_TX_OFFLOAD_QINQ_INSERT |
 			RTE_ETH_TX_OFFLOAD_IPV4_CKSUM |
@@ -3761,7 +3764,7 @@ ice_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 		dev_info->flow_type_rss_offloads |= ICE_RSS_OFFLOAD_ALL;
 	}
 
-	dev_info->rx_queue_offload_capa = 0;
+	dev_info->rx_queue_offload_capa = RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT;
 	dev_info->tx_queue_offload_capa = RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE;
 
 	dev_info->reta_size = pf->hash_lut_size;
@@ -3830,6 +3833,11 @@ ice_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 	dev_info->default_rxportconf.ring_size = ICE_BUF_SIZE_MIN;
 	dev_info->default_txportconf.ring_size = ICE_BUF_SIZE_MIN;
 
+	dev_info->rx_seg_capa.max_nseg = ICE_RX_MAX_NSEG;
+	dev_info->rx_seg_capa.multi_pools = 1;
+	dev_info->rx_seg_capa.offset_allowed = 0;
+	dev_info->rx_seg_capa.offset_align_log2 = 0;
+
 	return 0;
 }
 
@@ -5886,6 +5894,29 @@ ice_timesync_disable(struct rte_eth_dev *dev)
 	return 0;
 }
 
+static const uint32_t *
+ice_buffer_split_supported_hdr_ptypes_get(struct rte_eth_dev *dev __rte_unused)
+{
+/* Buffer split protocol header capability. */
+	static const uint32_t ptypes[] = {
+		RTE_PTYPE_L2_ETHER,
+		RTE_PTYPE_L3_IPV4,
+		RTE_PTYPE_L3_IPV6,
+		RTE_PTYPE_L4_TCP,
+		RTE_PTYPE_L4_UDP,
+		RTE_PTYPE_L4_SCTP,
+		RTE_PTYPE_INNER_L2_ETHER,
+		RTE_PTYPE_INNER_L3_IPV4,
+		RTE_PTYPE_INNER_L3_IPV6,
+		RTE_PTYPE_INNER_L4_TCP,
+		RTE_PTYPE_INNER_L4_UDP,
+		RTE_PTYPE_INNER_L4_SCTP,
+		RTE_PTYPE_UNKNOWN
+	};
+
+	return ptypes;
+}
+
 static int
 ice_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
 	      struct rte_pci_device *pci_dev)
diff --git a/drivers/net/ice/ice_rxtx.c b/drivers/net/ice/ice_rxtx.c
index bfb3a16ae2..96c4cecf56 100644
--- a/drivers/net/ice/ice_rxtx.c
+++ b/drivers/net/ice/ice_rxtx.c
@@ -282,7 +282,6 @@ ice_program_hw_rx_queue(struct ice_rx_queue *rxq)
 	/* Set buffer size as the head split is disabled. */
 	buf_size = (uint16_t)(rte_pktmbuf_data_room_size(rxq->mp) -
 			      RTE_PKTMBUF_HEADROOM);
-	rxq->rx_hdr_len = 0;
 	rxq->rx_buf_len = RTE_ALIGN(buf_size, (1 << ICE_RLAN_CTX_DBUF_S));
 	rxq->max_pkt_len =
 		RTE_MIN((uint32_t)ICE_SUPPORT_CHAIN_NUM * rxq->rx_buf_len,
@@ -311,11 +310,53 @@ ice_program_hw_rx_queue(struct ice_rx_queue *rxq)
 
 	memset(&rx_ctx, 0, sizeof(rx_ctx));
 
+	if (rxq->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) {
+		switch (rxq->rxseg[0].proto_hdr) {
+		case RTE_PTYPE_L2_ETHER:
+			rx_ctx.dtype = ICE_RX_DTYPE_HEADER_SPLIT;
+			rx_ctx.hsplit_1 = ICE_RLAN_RX_HSPLIT_1_SPLIT_L2;
+			break;
+		case RTE_PTYPE_INNER_L2_ETHER:
+			rx_ctx.dtype = ICE_RX_DTYPE_HEADER_SPLIT;
+			rx_ctx.hsplit_0 = ICE_RLAN_RX_HSPLIT_0_SPLIT_L2;
+			break;
+		case RTE_PTYPE_L3_IPV4:
+		case RTE_PTYPE_L3_IPV6:
+		case RTE_PTYPE_INNER_L3_IPV4:
+		case RTE_PTYPE_INNER_L3_IPV6:
+		case RTE_PTYPE_L3_IPV4 | RTE_PTYPE_L3_IPV6:
+			rx_ctx.dtype = ICE_RX_DTYPE_HEADER_SPLIT;
+			rx_ctx.hsplit_0 = ICE_RLAN_RX_HSPLIT_0_SPLIT_IP;
+			break;
+		case RTE_PTYPE_L4_TCP:
+		case RTE_PTYPE_L4_UDP:
+		case RTE_PTYPE_INNER_L4_TCP:
+		case RTE_PTYPE_INNER_L4_UDP:
+			rx_ctx.dtype = ICE_RX_DTYPE_HEADER_SPLIT;
+			rx_ctx.hsplit_0 = ICE_RLAN_RX_HSPLIT_0_SPLIT_TCP_UDP;
+			break;
+		case RTE_PTYPE_L4_SCTP:
+		case RTE_PTYPE_INNER_L4_SCTP:
+			rx_ctx.dtype = ICE_RX_DTYPE_HEADER_SPLIT;
+			rx_ctx.hsplit_0 = ICE_RLAN_RX_HSPLIT_0_SPLIT_SCTP;
+			break;
+		case RTE_PTYPE_UNKNOWN:
+			PMD_DRV_LOG(ERR, "Buffer split protocol must be configured");
+			return -EINVAL;
+		default:
+			PMD_DRV_LOG(ERR, "Buffer split protocol is not supported");
+			return -EINVAL;
+		}
+		rxq->rx_hdr_len = ICE_RX_HDR_BUF_SIZE;
+	} else {
+		rxq->rx_hdr_len = 0;
+		rx_ctx.dtype = 0; /* No Protocol Based Buffer Split mode */
+	}
+
 	rx_ctx.base = rxq->rx_ring_dma / ICE_QUEUE_BASE_ADDR_UNIT;
 	rx_ctx.qlen = rxq->nb_rx_desc;
 	rx_ctx.dbuf = rxq->rx_buf_len >> ICE_RLAN_CTX_DBUF_S;
 	rx_ctx.hbuf = rxq->rx_hdr_len >> ICE_RLAN_CTX_HBUF_S;
-	rx_ctx.dtype = 0; /* No Header Split mode */
 #ifndef RTE_LIBRTE_ICE_16BYTE_RX_DESC
 	rx_ctx.dsize = 1; /* 32B descriptors */
 #endif
@@ -401,6 +442,7 @@ ice_alloc_rx_queue_mbufs(struct ice_rx_queue *rxq)
 
 	for (i = 0; i < rxq->nb_rx_desc; i++) {
 		volatile union ice_rx_flex_desc *rxd;
+		rxd = &rxq->rx_ring[i];
 		struct rte_mbuf *mbuf = rte_mbuf_raw_alloc(rxq->mp);
 
 		if (unlikely(!mbuf)) {
@@ -408,8 +450,6 @@ ice_alloc_rx_queue_mbufs(struct ice_rx_queue *rxq)
 			return -ENOMEM;
 		}
 
-		rte_mbuf_refcnt_set(mbuf, 1);
-		mbuf->next = NULL;
 		mbuf->data_off = RTE_PKTMBUF_HEADROOM;
 		mbuf->nb_segs = 1;
 		mbuf->port = rxq->port_id;
@@ -417,9 +457,33 @@ ice_alloc_rx_queue_mbufs(struct ice_rx_queue *rxq)
 		dma_addr =
 			rte_cpu_to_le_64(rte_mbuf_data_iova_default(mbuf));
 
-		rxd = &rxq->rx_ring[i];
-		rxd->read.pkt_addr = dma_addr;
-		rxd->read.hdr_addr = 0;
+		if (rxq->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) {
+			struct rte_mbuf *mbuf_pay;
+			mbuf_pay = rte_mbuf_raw_alloc(rxq->rxseg[1].mp);
+			if (unlikely(!mbuf_pay)) {
+				PMD_DRV_LOG(ERR, "Failed to allocate payload mbuf for RX");
+				return -ENOMEM;
+			}
+
+			mbuf_pay->next = NULL;
+			mbuf_pay->data_off = RTE_PKTMBUF_HEADROOM;
+			mbuf_pay->nb_segs = 1;
+			mbuf_pay->port = rxq->port_id;
+			mbuf->next = mbuf_pay;
+
+			rxd->read.hdr_addr = dma_addr;
+			/* The LS bit should be set to zero regardless of
+			 * buffer split enablement.
+			 */
+			rxd->read.pkt_addr = rte_cpu_to_le_64(rte_mbuf_data_iova_default(mbuf_pay));
+
+		} else {
+			rte_mbuf_refcnt_set(mbuf, 1);
+			mbuf->next = NULL;
+			rxd->read.hdr_addr = 0;
+			rxd->read.pkt_addr = dma_addr;
+		}
+
 #ifndef RTE_LIBRTE_ICE_16BYTE_RX_DESC
 		rxd->read.rsvd1 = 0;
 		rxd->read.rsvd2 = 0;
@@ -443,14 +507,14 @@ _ice_rx_queue_release_mbufs(struct ice_rx_queue *rxq)
 
 	for (i = 0; i < rxq->nb_rx_desc; i++) {
 		if (rxq->sw_ring[i].mbuf) {
-			rte_pktmbuf_free_seg(rxq->sw_ring[i].mbuf);
+			rte_pktmbuf_free(rxq->sw_ring[i].mbuf);
 			rxq->sw_ring[i].mbuf = NULL;
 		}
 	}
 	if (rxq->rx_nb_avail == 0)
 		return;
 	for (i = 0; i < rxq->rx_nb_avail; i++)
-		rte_pktmbuf_free_seg(rxq->rx_stage[rxq->rx_next_avail + i]);
+		rte_pktmbuf_free(rxq->rx_stage[rxq->rx_next_avail + i]);
 
 	rxq->rx_nb_avail = 0;
 }
@@ -742,7 +806,7 @@ ice_fdir_program_hw_rx_queue(struct ice_rx_queue *rxq)
 	rx_ctx.qlen = rxq->nb_rx_desc;
 	rx_ctx.dbuf = rxq->rx_buf_len >> ICE_RLAN_CTX_DBUF_S;
 	rx_ctx.hbuf = rxq->rx_hdr_len >> ICE_RLAN_CTX_HBUF_S;
-	rx_ctx.dtype = 0; /* No Header Split mode */
+	rx_ctx.dtype = 0; /* No Buffer Split mode */
 	rx_ctx.dsize = 1; /* 32B descriptors */
 	rx_ctx.rxmax = ICE_ETH_MAX_LEN;
 	/* TPH: Transaction Layer Packet (TLP) processing hints */
@@ -1076,6 +1140,7 @@ ice_rx_queue_setup(struct rte_eth_dev *dev,
 	uint16_t len;
 	int use_def_burst_func = 1;
 	uint64_t offloads;
+	uint16_t n_seg = rx_conf->rx_nseg;
 
 	if (nb_desc % ICE_ALIGN_RING_DESC != 0 ||
 	    nb_desc > ICE_MAX_RING_DESC ||
@@ -1087,6 +1152,17 @@ ice_rx_queue_setup(struct rte_eth_dev *dev,
 
 	offloads = rx_conf->offloads | dev->data->dev_conf.rxmode.offloads;
 
+	if (mp)
+		n_seg = 1;
+
+	if (n_seg > 1) {
+		if (!(offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT)) {
+			PMD_INIT_LOG(ERR, "port %u queue index %u split offload not configured",
+					dev->data->port_id, queue_idx);
+			return -EINVAL;
+		}
+	}
+
 	/* Free memory if needed */
 	if (dev->data->rx_queues[queue_idx]) {
 		ice_rx_queue_release(dev->data->rx_queues[queue_idx]);
@@ -1098,12 +1174,22 @@ ice_rx_queue_setup(struct rte_eth_dev *dev,
 				 sizeof(struct ice_rx_queue),
 				 RTE_CACHE_LINE_SIZE,
 				 socket_id);
+
 	if (!rxq) {
 		PMD_INIT_LOG(ERR, "Failed to allocate memory for "
 			     "rx queue data structure");
 		return -ENOMEM;
 	}
-	rxq->mp = mp;
+
+	rxq->rxseg_nb = n_seg;
+	if (n_seg > 1) {
+		rte_memcpy(rxq->rxseg, rx_conf->rx_seg,
+			sizeof(struct rte_eth_rxseg_split) * n_seg);
+		rxq->mp = rxq->rxseg[0].mp;
+	} else {
+		rxq->mp = mp;
+	}
+
 	rxq->nb_rx_desc = nb_desc;
 	rxq->rx_free_thresh = rx_conf->rx_free_thresh;
 	rxq->queue_id = queue_idx;
@@ -1570,7 +1656,7 @@ ice_rx_scan_hw_ring(struct ice_rx_queue *rxq)
 	struct ice_rx_entry *rxep;
 	struct rte_mbuf *mb;
 	uint16_t stat_err0;
-	uint16_t pkt_len;
+	uint16_t pkt_len, hdr_len;
 	int32_t s[ICE_LOOK_AHEAD], nb_dd;
 	int32_t i, j, nb_rx = 0;
 	uint64_t pkt_flags = 0;
@@ -1625,6 +1711,24 @@ ice_rx_scan_hw_ring(struct ice_rx_queue *rxq)
 				   ICE_RX_FLX_DESC_PKT_LEN_M) - rxq->crc_len;
 			mb->data_len = pkt_len;
 			mb->pkt_len = pkt_len;
+
+			if (rxq->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) {
+				mb->nb_segs = (uint16_t)(mb->nb_segs + mb->next->nb_segs);
+				mb->next->next = NULL;
+				hdr_len = rte_le_to_cpu_16(rxdp[j].wb.hdr_len_sph_flex_flags1) &
+						ICE_RX_FLEX_DESC_HEADER_LEN_M;
+				pkt_len = (rte_le_to_cpu_16(rxdp[j].wb.pkt_len) &
+					ICE_RX_FLX_DESC_PKT_LEN_M) - rxq->crc_len;
+				mb->data_len = hdr_len;
+				mb->pkt_len = hdr_len + pkt_len;
+				mb->next->data_len = pkt_len;
+			} else {
+				pkt_len = (rte_le_to_cpu_16(rxdp[j].wb.pkt_len) &
+					ICE_RX_FLX_DESC_PKT_LEN_M) - rxq->crc_len;
+				mb->data_len = pkt_len;
+				mb->pkt_len = pkt_len;
+			}
+
 			mb->ol_flags = 0;
 			stat_err0 = rte_le_to_cpu_16(rxdp[j].wb.status_error0);
 			pkt_flags = ice_rxd_error_to_pkt_flags(stat_err0);
@@ -1716,7 +1820,9 @@ ice_rx_alloc_bufs(struct ice_rx_queue *rxq)
 	struct rte_mbuf *mb;
 	uint16_t alloc_idx, i;
 	uint64_t dma_addr;
-	int diag;
+	int diag, diag_pay;
+	uint64_t pay_addr;
+	struct rte_mbuf *mbufs_pay[rxq->rx_free_thresh];
 
 	/* Allocate buffers in bulk */
 	alloc_idx = (uint16_t)(rxq->rx_free_trigger -
@@ -1729,6 +1835,15 @@ ice_rx_alloc_bufs(struct ice_rx_queue *rxq)
 		return -ENOMEM;
 	}
 
+	if (rxq->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) {
+		diag_pay = rte_mempool_get_bulk(rxq->rxseg[1].mp,
+				(void *)mbufs_pay, rxq->rx_free_thresh);
+		if (unlikely(diag_pay != 0)) {
+			PMD_RX_LOG(ERR, "Failed to get payload mbufs in bulk");
+			return -ENOMEM;
+		}
+	}
+
 	rxdp = &rxq->rx_ring[alloc_idx];
 	for (i = 0; i < rxq->rx_free_thresh; i++) {
 		if (likely(i < (rxq->rx_free_thresh - 1)))
@@ -1737,13 +1852,21 @@ ice_rx_alloc_bufs(struct ice_rx_queue *rxq)
 
 		mb = rxep[i].mbuf;
 		rte_mbuf_refcnt_set(mb, 1);
-		mb->next = NULL;
 		mb->data_off = RTE_PKTMBUF_HEADROOM;
 		mb->nb_segs = 1;
 		mb->port = rxq->port_id;
 		dma_addr = rte_cpu_to_le_64(rte_mbuf_data_iova_default(mb));
-		rxdp[i].read.hdr_addr = 0;
-		rxdp[i].read.pkt_addr = dma_addr;
+
+		if (rxq->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) {
+			mb->next = mbufs_pay[i];
+			pay_addr = rte_cpu_to_le_64(rte_mbuf_data_iova_default(mbufs_pay[i]));
+			rxdp[i].read.hdr_addr = dma_addr;
+			rxdp[i].read.pkt_addr = pay_addr;
+		} else {
+			mb->next = NULL;
+			rxdp[i].read.hdr_addr = 0;
+			rxdp[i].read.pkt_addr = dma_addr;
+		}
 	}
 
 	/* Update Rx tail register */
@@ -2352,11 +2475,13 @@ ice_recv_pkts(void *rx_queue,
 	struct ice_rx_entry *sw_ring = rxq->sw_ring;
 	struct ice_rx_entry *rxe;
 	struct rte_mbuf *nmb; /* new allocated mbuf */
+	struct rte_mbuf *nmb_pay; /* new allocated payload mbuf */
 	struct rte_mbuf *rxm; /* pointer to store old mbuf in SW ring */
 	uint16_t rx_id = rxq->rx_tail;
 	uint16_t nb_rx = 0;
 	uint16_t nb_hold = 0;
 	uint16_t rx_packet_len;
+	uint16_t rx_header_len;
 	uint16_t rx_stat_err0;
 	uint64_t dma_addr;
 	uint64_t pkt_flags;
@@ -2384,12 +2509,16 @@ ice_recv_pkts(void *rx_queue,
 		if (!(rx_stat_err0 & (1 << ICE_RX_FLEX_DESC_STATUS0_DD_S)))
 			break;
 
-		/* allocate mbuf */
+		if (rx_stat_err0 & (1 << ICE_RX_FLEX_DESC_STATUS0_HBO_S))
+			break;
+
+		/* allocate header mbuf */
 		nmb = rte_mbuf_raw_alloc(rxq->mp);
 		if (unlikely(!nmb)) {
 			rxq->vsi->adapter->pf.dev_data->rx_mbuf_alloc_failed++;
 			break;
 		}
+
 		rxd = *rxdp; /* copy descriptor in ring to temp variable*/
 
 		nb_hold++;
@@ -2402,24 +2531,55 @@ ice_recv_pkts(void *rx_queue,
 		dma_addr =
 			rte_cpu_to_le_64(rte_mbuf_data_iova_default(nmb));
 
-		/**
-		 * fill the read format of descriptor with physic address in
-		 * new allocated mbuf: nmb
-		 */
-		rxdp->read.hdr_addr = 0;
-		rxdp->read.pkt_addr = dma_addr;
+		if (rxq->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) {
+			/* allocate payload mbuf */
+			nmb_pay = rte_mbuf_raw_alloc(rxq->rxseg[1].mp);
+			if (unlikely(!nmb_pay)) {
+				rxq->vsi->adapter->pf.dev_data->rx_mbuf_alloc_failed++;
+				break;
+			}
+
+			nmb->next = nmb_pay;
+			nmb_pay->next = NULL;
 
-		/* calculate rx_packet_len of the received pkt */
-		rx_packet_len = (rte_le_to_cpu_16(rxd.wb.pkt_len) &
-				 ICE_RX_FLX_DESC_PKT_LEN_M) - rxq->crc_len;
+			/**
+			 * fill the read format of descriptor with physic address in
+			 * new allocated mbuf: nmb
+			 */
+			rxdp->read.hdr_addr = dma_addr;
+			rxdp->read.pkt_addr = rte_cpu_to_le_64(rte_mbuf_data_iova_default(nmb_pay));
+		} else {
+			/**
+			 * fill the read format of descriptor with physic address in
+			 * new allocated mbuf: nmb
+			 */
+			rxdp->read.hdr_addr = 0;
+			rxdp->read.pkt_addr = dma_addr;
+		}
 
 		/* fill old mbuf with received descriptor: rxd */
 		rxm->data_off = RTE_PKTMBUF_HEADROOM;
 		rte_prefetch0(RTE_PTR_ADD(rxm->buf_addr, RTE_PKTMBUF_HEADROOM));
-		rxm->nb_segs = 1;
-		rxm->next = NULL;
-		rxm->pkt_len = rx_packet_len;
-		rxm->data_len = rx_packet_len;
+		if (rxq->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) {
+			rxm->nb_segs = (uint16_t)(rxm->nb_segs + rxm->next->nb_segs);
+			rxm->next->next = NULL;
+			/* calculate rx_packet_len of the received pkt */
+			rx_header_len = rte_le_to_cpu_16(rxd.wb.hdr_len_sph_flex_flags1) &
+					ICE_RX_FLEX_DESC_HEADER_LEN_M;
+			rx_packet_len = (rte_le_to_cpu_16(rxd.wb.pkt_len) &
+					ICE_RX_FLX_DESC_PKT_LEN_M) - rxq->crc_len;
+			rxm->data_len = rx_header_len;
+			rxm->pkt_len = rx_header_len + rx_packet_len;
+			rxm->next->data_len = rx_packet_len;
+		} else {
+			rxm->nb_segs = 1;
+			rxm->next = NULL;
+			/* calculate rx_packet_len of the received pkt */
+			rx_packet_len = (rte_le_to_cpu_16(rxd.wb.pkt_len) &
+					ICE_RX_FLX_DESC_PKT_LEN_M) - rxq->crc_len;
+			rxm->data_len = rx_packet_len;
+			rxm->pkt_len = rx_packet_len;
+		}
 		rxm->port = rxq->port_id;
 		rxm->packet_type = ptype_tbl[ICE_RX_FLEX_DESC_PTYPE_M &
 			rte_le_to_cpu_16(rxd.wb.ptype_flex_flags0)];
diff --git a/drivers/net/ice/ice_rxtx.h b/drivers/net/ice/ice_rxtx.h
index f5337d5284..d44bde3710 100644
--- a/drivers/net/ice/ice_rxtx.h
+++ b/drivers/net/ice/ice_rxtx.h
@@ -16,6 +16,9 @@
 #define ICE_RX_MAX_BURST 32
 #define ICE_TX_MAX_BURST 32
 
+/* Maximal number of segments to split. */
+#define ICE_RX_MAX_NSEG 2
+
 #define ICE_CHK_Q_ENA_COUNT        100
 #define ICE_CHK_Q_ENA_INTERVAL_US  100
 
@@ -43,6 +46,11 @@
 extern uint64_t ice_timestamp_dynflag;
 extern int ice_timestamp_dynfield_offset;
 
+/* Max header size can be 2K - 64 bytes */
+#define ICE_RX_HDR_BUF_SIZE    (2048 - 64)
+
+#define ICE_HEADER_SPLIT_ENA   BIT(0)
+
 typedef void (*ice_rx_release_mbufs_t)(struct ice_rx_queue *rxq);
 typedef void (*ice_tx_release_mbufs_t)(struct ice_tx_queue *txq);
 typedef void (*ice_rxd_to_pkt_fields_t)(struct ice_rx_queue *rxq,
@@ -53,6 +61,12 @@ struct ice_rx_entry {
 	struct rte_mbuf *mbuf;
 };
 
+enum ice_rx_dtype {
+	ICE_RX_DTYPE_NO_SPLIT       = 0,
+	ICE_RX_DTYPE_HEADER_SPLIT   = 1,
+	ICE_RX_DTYPE_SPLIT_ALWAYS   = 2,
+};
+
 struct ice_rx_queue {
 	struct rte_mempool *mp; /* mbuf pool to populate RX ring */
 	volatile union ice_rx_flex_desc *rx_ring;/* RX ring virtual address */
@@ -98,6 +112,8 @@ struct ice_rx_queue {
 	uint32_t hw_time_high; /* high 32 bits of timestamp */
 	uint32_t hw_time_low; /* low 32 bits of timestamp */
 	uint64_t hw_time_update; /* SW time of HW record updating */
+	struct rte_eth_rxseg_split rxseg[ICE_RX_MAX_NSEG];
+	uint32_t rxseg_nb;
 };
 
 struct ice_tx_entry {
diff --git a/drivers/net/ice/ice_rxtx_vec_common.h b/drivers/net/ice/ice_rxtx_vec_common.h
index 2dd2d83650..eec6ea2134 100644
--- a/drivers/net/ice/ice_rxtx_vec_common.h
+++ b/drivers/net/ice/ice_rxtx_vec_common.h
@@ -291,6 +291,9 @@ ice_rx_vec_queue_default(struct ice_rx_queue *rxq)
 	if (rxq->offloads & RTE_ETH_RX_OFFLOAD_TIMESTAMP)
 		return -1;
 
+	if (rxq->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT)
+		return -1;
+
 	if (rxq->offloads & ICE_RX_VECTOR_OFFLOAD)
 		return ICE_VECTOR_OFFLOAD_PATH;
 
-- 
2.25.1


^ permalink raw reply	[flat|nested] 72+ messages in thread

* [PATCH v2 0/4] support protocol based buffer split
  2022-08-12 18:15 [PATCH 0/4] support protocol based buffer split Yuan Wang
                   ` (3 preceding siblings ...)
  2022-08-12 18:15 ` [PATCH 4/4] net/ice: support buffer split in Rx path Yuan Wang
@ 2022-09-01 22:33 ` Yuan Wang
  2022-09-01 22:34 ` [PATCH v2 1/4] ethdev: introduce protocol header API Yuan Wang
                   ` (10 subsequent siblings)
  15 siblings, 0 replies; 72+ messages in thread
From: Yuan Wang @ 2022-09-01 22:33 UTC (permalink / raw)
  To: thomas, andrew.rybchenko, xiaoyun.li, ferruh.yigit,
	aman.deep.singh, yuying.zhang, qi.z.zhang, qiming.yang, mdr
  Cc: jerinjacobk, viacheslavo, stephen, xuan.ding, dev, hpothula,
	yaqi.tang, Yuan Wang

Protocol type based buffer split consists of splitting a received packet
into several separate segments based on its protocol headers. It is
useful in some scenarios, such as GPU acceleration. The splitting helps
to enable true zero copy and hence improves performance significantly.

This patchset aims to support protocol header based split on top of the
current buffer split. When an Rx queue is configured with the
RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT offload and a corresponding protocol,
received packets will be split directly into different mempools.

Change log:
v2:
Add mbuf dump to the driver's buffer split path. 
Add buffer split to the driver feature list.
Remove unsupported header protocols from the driver.

Yuan Wang (4):
  ethdev: introduce protocol header API
  ethdev: introduce protocol hdr based buffer split
  app/testpmd: add rxhdrs commands and parameters
  net/ice: support buffer split in Rx path

 app/test-pmd/cmdline.c                 | 123 +++++++++++++-
 app/test-pmd/config.c                  |  70 ++++++++
 app/test-pmd/parameters.c              |  16 +-
 app/test-pmd/testpmd.c                 |   2 +
 app/test-pmd/testpmd.h                 |   6 +
 doc/guides/nics/features/ice.ini       |   1 +
 doc/guides/rel_notes/release_22_11.rst |  14 ++
 drivers/net/ice/ice_ethdev.c           |  30 +++-
 drivers/net/ice/ice_rxtx.c             | 220 +++++++++++++++++++++----
 drivers/net/ice/ice_rxtx.h             |  16 ++
 drivers/net/ice/ice_rxtx_vec_common.h  |   3 +
 lib/ethdev/ethdev_driver.h             |  15 ++
 lib/ethdev/rte_ethdev.c                |  88 ++++++++--
 lib/ethdev/rte_ethdev.h                |  41 ++++-
 lib/ethdev/version.map                 |   3 +
 15 files changed, 600 insertions(+), 48 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 72+ messages in thread

* [PATCH v2 1/4] ethdev: introduce protocol header API
  2022-08-12 18:15 [PATCH 0/4] support protocol based buffer split Yuan Wang
                   ` (4 preceding siblings ...)
  2022-09-01 22:33 ` [PATCH v2 0/4] support protocol based buffer split Yuan Wang
@ 2022-09-01 22:34 ` Yuan Wang
  2022-09-01 22:35 ` [PATCH v2 2/4] ethdev: introduce protocol hdr based buffer split Yuan Wang
                   ` (9 subsequent siblings)
  15 siblings, 0 replies; 72+ messages in thread
From: Yuan Wang @ 2022-09-01 22:34 UTC (permalink / raw)
  To: thomas, andrew.rybchenko, ferruh.yigit, mdr
  Cc: xiaoyun.li, aman.deep.singh, yuying.zhang, qi.z.zhang,
	qiming.yang, jerinjacobk, viacheslavo, stephen, xuan.ding, dev,
	hpothula, yaqi.tang, Yuan Wang, Wenxuan Wu

Add a new ethdev API to retrieve supported protocol headers
of a PMD, which helps to configure protocol header based buffer split.

Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
---
 doc/guides/rel_notes/release_22_11.rst |  5 ++++
 lib/ethdev/ethdev_driver.h             | 15 ++++++++++++
 lib/ethdev/rte_ethdev.c                | 33 ++++++++++++++++++++++++++
 lib/ethdev/rte_ethdev.h                | 24 +++++++++++++++++++
 lib/ethdev/version.map                 |  3 +++
 5 files changed, 80 insertions(+)

diff --git a/doc/guides/rel_notes/release_22_11.rst b/doc/guides/rel_notes/release_22_11.rst
index 8c021cf050..4d90514a9a 100644
--- a/doc/guides/rel_notes/release_22_11.rst
+++ b/doc/guides/rel_notes/release_22_11.rst
@@ -55,6 +55,11 @@ New Features
      Also, make sure to start the actual text at the margin.
      =======================================================
 
+* **Added new ethdev API for PMD to get buffer split supported protocol types.**
+
+  Added ``rte_eth_buffer_split_get_supported_hdr_ptypes()``, to get supported
+  header protocols of a PMD to split.
+
 
 Removed Items
 -------------
diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
index 5101868ea7..f64ceb9907 100644
--- a/lib/ethdev/ethdev_driver.h
+++ b/lib/ethdev/ethdev_driver.h
@@ -1054,6 +1054,18 @@ typedef int (*eth_ip_reassembly_conf_get_t)(struct rte_eth_dev *dev,
 typedef int (*eth_ip_reassembly_conf_set_t)(struct rte_eth_dev *dev,
 		const struct rte_eth_ip_reassembly_params *conf);
 
+/**
+ * @internal
+ * Get supported header protocols of a PMD to split.
+ *
+ * @param dev
+ *   Ethdev handle of port.
+ *
+ * @return
+ *   An array pointer to store supported protocol headers.
+ */
+typedef const uint32_t *(*eth_buffer_split_supported_hdr_ptypes_get_t)(struct rte_eth_dev *dev);
+
 /**
  * @internal
  * Dump private info from device to a file.
@@ -1301,6 +1313,9 @@ struct eth_dev_ops {
 	/** Set IP reassembly configuration */
 	eth_ip_reassembly_conf_set_t ip_reassembly_conf_set;
 
+	/** Get supported header ptypes to split */
+	eth_buffer_split_supported_hdr_ptypes_get_t buffer_split_supported_hdr_ptypes_get;
+
 	/** Dump private info from device */
 	eth_dev_priv_dump_t eth_dev_priv_dump;
 
diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
index 1979dc0850..093c577add 100644
--- a/lib/ethdev/rte_ethdev.c
+++ b/lib/ethdev/rte_ethdev.c
@@ -5917,6 +5917,39 @@ rte_eth_dev_priv_dump(uint16_t port_id, FILE *file)
 	return eth_err(port_id, (*dev->dev_ops->eth_dev_priv_dump)(dev, file));
 }
 
+int
+rte_eth_buffer_split_get_supported_hdr_ptypes(uint16_t port_id, uint32_t *ptypes, int num)
+{
+	int i, j;
+	struct rte_eth_dev *dev;
+	const uint32_t *all_types;
+
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
+	dev = &rte_eth_devices[port_id];
+
+	if (ptypes == NULL && num > 0) {
+		RTE_ETHDEV_LOG(ERR,
+			"Cannot get ethdev port %u supported header protocol types to NULL "
+			"when array size is non zero\n",
+			port_id);
+		return -EINVAL;
+	}
+
+	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->buffer_split_supported_hdr_ptypes_get, -ENOTSUP);
+	all_types = (*dev->dev_ops->buffer_split_supported_hdr_ptypes_get)(dev);
+
+	if (!all_types)
+		return 0;
+
+	for (i = 0, j = 0; all_types[i] != RTE_PTYPE_UNKNOWN; ++i) {
+		if (j < num)
+			ptypes[j] = all_types[i];
+		j++;
+	}
+
+	return j;
+}
+
 RTE_LOG_REGISTER_DEFAULT(rte_eth_dev_logtype, INFO);
 
 RTE_INIT(ethdev_init_telemetry)
diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
index de9e970d4d..c58c908c3a 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -6206,6 +6206,30 @@ rte_eth_tx_buffer(uint16_t port_id, uint16_t queue_id,
 	return rte_eth_tx_buffer_flush(port_id, queue_id, buffer);
 }
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Get supported header protocols to split on Rx.
+ *
+ * @param port_id
+ *   The port identifier of the device.
+ * @param[out] ptypes
+ *   An array pointer to store supported protocol headers, allocated by caller.
+ *   These ptypes are composed with RTE_PTYPE_*.
+ * @param num
+ *   Size of the array pointed by param ptypes.
+ * @return
+ *   - (>=0) Number of supported ptypes. If the number of types exceeds num,
+ *           only num entries will be filled into the ptypes array, but the full
+ *           count of supported ptypes will be returned.
+ *   - (-ENOTSUP) if header protocol is not supported by device.
+ *   - (-ENODEV) if *port_id* invalid.
+ *   - (-EINVAL) if bad parameter.
+ */
+__rte_experimental
+int rte_eth_buffer_split_get_supported_hdr_ptypes(uint16_t port_id, uint32_t *ptypes, int num);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/ethdev/version.map b/lib/ethdev/version.map
index 03f52fee91..e496c8d938 100644
--- a/lib/ethdev/version.map
+++ b/lib/ethdev/version.map
@@ -285,6 +285,9 @@ EXPERIMENTAL {
 	rte_mtr_color_in_protocol_priority_get;
 	rte_mtr_color_in_protocol_set;
 	rte_mtr_meter_vlan_table_update;
+
+	# added in 22.11
+	rte_eth_buffer_split_get_supported_hdr_ptypes;
 };
 
 INTERNAL {
-- 
2.25.1


^ permalink raw reply	[flat|nested] 72+ messages in thread

* [PATCH v2 2/4] ethdev: introduce protocol hdr based buffer split
  2022-08-12 18:15 [PATCH 0/4] support protocol based buffer split Yuan Wang
                   ` (5 preceding siblings ...)
  2022-09-01 22:34 ` [PATCH v2 1/4] ethdev: introduce protocol header API Yuan Wang
@ 2022-09-01 22:35 ` Yuan Wang
  2022-09-01 22:36 ` [PATCH v2 3/4] app/testpmd: add rxhdrs commands and parameters Yuan Wang
                   ` (8 subsequent siblings)
  15 siblings, 0 replies; 72+ messages in thread
From: Yuan Wang @ 2022-09-01 22:35 UTC (permalink / raw)
  To: thomas, andrew.rybchenko, ferruh.yigit
  Cc: mdr, xiaoyun.li, aman.deep.singh, yuying.zhang, qi.z.zhang,
	qiming.yang, jerinjacobk, viacheslavo, stephen, xuan.ding, dev,
	hpothula, yaqi.tang, Yuan Wang, Wenxuan Wu

Currently, Rx buffer split supports length based split. With the Rx queue
offload RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT enabled and Rx packet segments
configured, the PMD is able to split the received packets into
multiple segments.

However, length based buffer split is not suitable for NICs that split
based on protocol headers. Given an arbitrarily variable length in an Rx
packet segment, it is almost impossible to pass a fixed protocol header to
the driver. Besides, tunneling makes the composition of a packet variable,
which makes the situation even worse.

This patch extends the current buffer split to support protocol header based
buffer split. A new proto_hdr field is introduced in the reserved field
of the rte_eth_rxseg_split structure to specify the protocol header. The
proto_hdr field defines the split position of a packet; splitting always
happens after the protocol header defined in the Rx packet segment. When the
Rx queue offload RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT is enabled and the
corresponding protocol header is configured, the driver will split the
ingress packets into multiple segments.

struct rte_eth_rxseg_split {
        struct rte_mempool *mp; /* memory pools to allocate segment from */
        uint16_t length; /* segment maximal data length,
                            configures split point */
        uint16_t offset; /* data offset from beginning
                            of mbuf data buffer */
        uint32_t proto_hdr; /* supported ptype of a specific pmd,
                               configures split point.
			       It should be defined by RTE_PTYPE_*
			     */
};

If protocol header split is supported by a PMD, the
rte_eth_buffer_split_get_supported_hdr_ptypes() function can
be used to obtain the list of supported protocol headers.

For example, let's suppose we configured the Rx queue with the
following segments:
        seg0 - pool0, proto_hdr0=RTE_PTYPE_L3_IPV4, off0=2B
        seg1 - pool1, proto_hdr1=RTE_PTYPE_L4_UDP, off1=128B
        seg2 - pool2, off2=0B

A packet consisting of MAC_IPV4_UDP_PAYLOAD will be split as
follows:
        seg0 - ipv4 header @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from pool0
        seg1 - udp header @ 128 in mbuf from pool1
        seg2 - payload @ 0 in mbuf from pool2

Note: the NIC will only split a packet when it exactly matches all the
protocol headers in the segments. For example, if ARP packets are received
with the above config, the NIC won't split them since they contain
neither an IPv4 header nor a UDP header.

Now buffer split can be configured in two modes. For length based
buffer split, the mp, length and offset fields in the Rx packet segment
should be configured, while the proto_hdr field will be ignored.
For protocol header based buffer split, the mp, offset and proto_hdr fields
in the Rx packet segment should be configured, while the length field will
be ignored.

The split limitations imposed by the underlying driver are reported in the
rte_eth_dev_info->rx_seg_capa field. The memory attributes of the split
parts may also differ, e.g. one part may reside in DPDK memory and another
in external memory.
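
As an illustration only (not part of this patch), a simpler two-segment
variant of the example above could be configured roughly as follows;
hdr_pool, pay_pool, the descriptor count and the port/queue ids are
placeholders:

	#include <rte_ethdev.h>
	#include <rte_mbuf_ptype.h>

	/* Sketch: split received packets after the outer IPv4 header,
	 * putting the header into hdr_pool and the payload into pay_pool.
	 */
	static int
	setup_proto_split_queue(uint16_t port_id, uint16_t queue_id,
				struct rte_mempool *hdr_pool,
				struct rte_mempool *pay_pool)
	{
		union rte_eth_rxseg rx_seg[2] = {
			{ .split = { .mp = hdr_pool,
				     .proto_hdr = RTE_PTYPE_L3_IPV4 } },
			{ .split = { .mp = pay_pool } }, /* rest of the packet */
		};
		struct rte_eth_rxconf rxconf = {
			.offloads = RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT,
			.rx_seg = rx_seg,
			.rx_nseg = 2,
		};

		/* mb_pool is NULL: buffers come from the per-segment pools. */
		return rte_eth_rx_queue_setup(port_id, queue_id, 1024,
					      rte_eth_dev_socket_id(port_id),
					      &rxconf, NULL);
	}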

Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
---
 doc/guides/rel_notes/release_22_11.rst |  5 +++
 lib/ethdev/rte_ethdev.c                | 55 ++++++++++++++++++++------
 lib/ethdev/rte_ethdev.h                | 17 +++++++-
 3 files changed, 65 insertions(+), 12 deletions(-)

diff --git a/doc/guides/rel_notes/release_22_11.rst b/doc/guides/rel_notes/release_22_11.rst
index 4d90514a9a..f3b58c7895 100644
--- a/doc/guides/rel_notes/release_22_11.rst
+++ b/doc/guides/rel_notes/release_22_11.rst
@@ -60,6 +60,11 @@ New Features
   Added ``rte_eth_buffer_split_get_supported_hdr_ptypes()``, to get supported
   header protocols of a PMD to split.
 
+* **Added protocol header based buffer split.**
+  Ethdev: The ``reserved`` field in the  ``rte_eth_rxseg_split`` structure is
+  replaced with ``proto_hdr`` to support protocol header based buffer split.
+  User can choose length or protocol header to configure buffer split
+  according to NIC's capability.
 
 Removed Items
 -------------
diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
index 093c577add..dfceb723ee 100644
--- a/lib/ethdev/rte_ethdev.c
+++ b/lib/ethdev/rte_ethdev.c
@@ -1635,9 +1635,10 @@ rte_eth_dev_is_removed(uint16_t port_id)
 }
 
 static int
-rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
-			     uint16_t n_seg, uint32_t *mbp_buf_size,
-			     const struct rte_eth_dev_info *dev_info)
+rte_eth_rx_queue_check_split(uint16_t port_id,
+			const struct rte_eth_rxseg_split *rx_seg,
+			uint16_t n_seg, uint32_t *mbp_buf_size,
+			const struct rte_eth_dev_info *dev_info)
 {
 	const struct rte_eth_rxseg_capa *seg_capa = &dev_info->rx_seg_capa;
 	struct rte_mempool *mp_first;
@@ -1660,6 +1661,7 @@ rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
 		struct rte_mempool *mpl = rx_seg[seg_idx].mp;
 		uint32_t length = rx_seg[seg_idx].length;
 		uint32_t offset = rx_seg[seg_idx].offset;
+		uint32_t proto_hdr = rx_seg[seg_idx].proto_hdr;
 
 		if (mpl == NULL) {
 			RTE_ETHDEV_LOG(ERR, "null mempool pointer\n");
@@ -1693,13 +1695,44 @@ rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
 		}
 		offset += seg_idx != 0 ? 0 : RTE_PKTMBUF_HEADROOM;
 		*mbp_buf_size = rte_pktmbuf_data_room_size(mpl);
-		length = length != 0 ? length : *mbp_buf_size;
-		if (*mbp_buf_size < length + offset) {
-			RTE_ETHDEV_LOG(ERR,
-				       "%s mbuf_data_room_size %u < %u (segment length=%u + segment offset=%u)\n",
-				       mpl->name, *mbp_buf_size,
-				       length + offset, length, offset);
-			return -EINVAL;
+
+		int ret = rte_eth_buffer_split_get_supported_hdr_ptypes(port_id, NULL, 0);
+		if (ret <= 0) {
+			/* Split at fixed length. */
+			length = length != 0 ? length : *mbp_buf_size;
+			if (*mbp_buf_size < length + offset) {
+				RTE_ETHDEV_LOG(ERR,
+					"%s mbuf_data_room_size %u < %u (segment length=%u + segment offset=%u)\n",
+					mpl->name, *mbp_buf_size,
+					length + offset, length, offset);
+				return -EINVAL;
+			}
+		} else {
+			/* Split after specified protocol header. */
+			uint32_t ptypes[ret];
+			int i;
+
+			ret = rte_eth_buffer_split_get_supported_hdr_ptypes(port_id, ptypes, ret);
+			for (i = 0; i < ret; i++)
+				if (ptypes[i] & proto_hdr)
+					break;
+
+			if (i == ret) {
+#define PTYPE_NAMESIZE	256
+				char ptype_name[PTYPE_NAMESIZE];
+				rte_get_ptype_name(proto_hdr, ptype_name, sizeof(ptype_name));
+				RTE_ETHDEV_LOG(ERR,
+					"Protocol header %s is not supported.\n",
+					ptype_name);
+				return -EINVAL;
+			}
+			if (*mbp_buf_size < offset) {
+				RTE_ETHDEV_LOG(ERR,
+						"%s mbuf_data_room_size %u < %u segment offset)\n",
+						mpl->name, *mbp_buf_size,
+						offset);
+				return -EINVAL;
+			}
 		}
 	}
 	return 0;
@@ -1778,7 +1811,7 @@ rte_eth_rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
 		n_seg = rx_conf->rx_nseg;
 
 		if (rx_conf->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) {
-			ret = rte_eth_rx_queue_check_split(rx_seg, n_seg,
+			ret = rte_eth_rx_queue_check_split(port_id, rx_seg, n_seg,
 							   &mbp_buf_size,
 							   &dev_info);
 			if (ret != 0)
diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
index c58c908c3a..410fba5eab 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -1175,6 +1175,9 @@ struct rte_eth_txmode {
  *   specified in the first array element, the second buffer, from the
  *   pool in the second element, and so on.
  *
+ * - The proto_hdrs in the elements define the split position of
+ *   received packets.
+ *
  * - The offsets from the segment description elements specify
  *   the data offset from the buffer beginning except the first mbuf.
  *   The first segment offset is added with RTE_PKTMBUF_HEADROOM.
@@ -1196,12 +1199,24 @@ struct rte_eth_txmode {
  *     - pool from the last valid element
  *     - the buffer size from this pool
  *     - zero offset
+ *
+ * - Length based buffer split:
+ *     - mp, length, offset should be configured.
+ *     - The proto_hdr field will be ignored.
+ *
+ * - Protocol header based buffer split:
+ *     - mp, offset, proto_hdr should be configured.
+ *     - The length field will be ignored.
  */
 struct rte_eth_rxseg_split {
 	struct rte_mempool *mp; /**< Memory pool to allocate segment from. */
 	uint16_t length; /**< Segment data length, configures split point. */
 	uint16_t offset; /**< Data offset from beginning of mbuf data buffer. */
-	uint32_t reserved; /**< Reserved field. */
+	/**
+	 * Supported ptype of a specific pmd, configures split point.
+	 * It should be defined by RTE_PTYPE_*.
+	 */
+	uint32_t proto_hdr;
 };
 
 /**
-- 
2.25.1


^ permalink raw reply	[flat|nested] 72+ messages in thread

* [PATCH v2 3/4] app/testpmd: add rxhdrs commands and parameters
  2022-08-12 18:15 [PATCH 0/4] support protocol based buffer split Yuan Wang
                   ` (6 preceding siblings ...)
  2022-09-01 22:35 ` [PATCH v2 2/4] ethdev: introduce protocol hdr based buffer split Yuan Wang
@ 2022-09-01 22:36 ` Yuan Wang
  2022-09-01 22:37 ` [PATCH v2 4/4] net/ice: support buffer split in Rx path Yuan Wang
                   ` (7 subsequent siblings)
  15 siblings, 0 replies; 72+ messages in thread
From: Yuan Wang @ 2022-09-01 22:36 UTC (permalink / raw)
  To: aman.deep.singh, yuying.zhang
  Cc: thomas, andrew.rybchenko, ferruh.yigit, mdr, xiaoyun.li,
	qi.z.zhang, qiming.yang, jerinjacobk, viacheslavo, stephen,
	xuan.ding, dev, hpothula, yaqi.tang, Yuan Wang, Wenxuan Wu

Add command line parameter:
--rxhdrs=mac,[ipv4,udp]

Set the proto_hdr of segments to scatter packets on receiving if
the split feature is engaged. It affects only the queues configured
with the BUFFER_SPLIT offload flag.

Add interactive mode command:
testpmd>set rxhdrs mac,ipv4,tcp,udp,sctp
(protocol sequence should be valid)

The protocol split feature is off by default. To enable protocol split,
you need:
1. Start testpmd with multiple mempools. E.g. --mbuf-size=2048,2048
2. Configure Rx queue with rx_offload buffer split on.
3. Set the protocol type of buffer split. E.g. set rxhdrs mac,ipv4
        (default protocols of testpmd : mac|icmp|ipv4|ipv6|tcp|udp|
                              sctp|inner_mac|inner_ipv4|inner_ipv6|
                              inner_tcp|inner_udp|inner_sctp)
The above protocols can be configured in testpmd, but the configuration can
only be applied when it is supported by the specific PMD.
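
For illustration only, a session could look like the following (the PCI
address, core list and port number are placeholders, and the rx_offload
keyword for enabling buffer split is assumed to be "buffer_split"; check
the testpmd documentation of your DPDK version):

	dpdk-testpmd -l 0-3 -a 0000:18:00.0 -- -i --mbuf-size=2048,2048 \
		--rxhdrs=mac,ipv4

	testpmd> port stop 0
	testpmd> port config 0 rx_offload buffer_split on
	testpmd> set rxhdrs mac,ipv4,udp
	testpmd> port start 0
	testpmd> show config rxhdrs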

Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
---
 app/test-pmd/cmdline.c    | 123 +++++++++++++++++++++++++++++++++++++-
 app/test-pmd/config.c     |  70 ++++++++++++++++++++++
 app/test-pmd/parameters.c |  16 ++++-
 app/test-pmd/testpmd.c    |   2 +
 app/test-pmd/testpmd.h    |   6 ++
 5 files changed, 213 insertions(+), 4 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index b4fe9dfb17..f00b7bc6a4 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -183,7 +183,7 @@ static void cmd_help_long_parsed(void *parsed_result,
 			"show (rxq|txq) info (port_id) (queue_id)\n"
 			"    Display information for configured RX/TX queue.\n\n"
 
-			"show config (rxtx|cores|fwd|rxoffs|rxpkts|txpkts)\n"
+			"show config (rxtx|cores|fwd|rxoffs|rxpkts|rxhdrs|txpkts)\n"
 			"    Display the given configuration.\n\n"
 
 			"read rxd (port_id) (queue_id) (rxd_id)\n"
@@ -307,6 +307,14 @@ static void cmd_help_long_parsed(void *parsed_result,
 			" Affects only the queues configured with split"
 			" offloads.\n\n"
 
+			"set rxhdrs (mac[,ipv4])*\n"
+			"	Set the protocol hdr of each segment to scatter"
+			" packets on receiving if split feature is engaged."
+			" Affects only the queues configured with split"
+			" offloads.\n\n"
+			"	Supported proto header: mac|ipv4|ipv6|tcp|udp|sctp|"
+			"inner_mac|inner_ipv4|inner_ipv6|inner_tcp|inner_udp|inner_sctp\n"
+
 			"set txpkts (x[,y]*)\n"
 			"    Set the length of each segment of TXONLY"
 			" and optionally CSUM packets.\n\n"
@@ -3456,6 +3464,68 @@ static cmdline_parse_inst_t cmd_stop = {
 	},
 };
 
+static unsigned int
+get_ptype(char *value)
+{
+	uint32_t protocol;
+
+	if (!strcmp(value, "mac"))
+		protocol = RTE_PTYPE_L2_ETHER;
+	else if (!strcmp(value, "ipv4"))
+		protocol = RTE_PTYPE_L3_IPV4;
+	else if (!strcmp(value, "ipv6"))
+		protocol = RTE_PTYPE_L3_IPV6;
+	else if (!strcmp(value, "tcp"))
+		protocol = RTE_PTYPE_L4_TCP;
+	else if (!strcmp(value, "udp"))
+		protocol = RTE_PTYPE_L4_UDP;
+	else if (!strcmp(value, "sctp"))
+		protocol = RTE_PTYPE_L4_SCTP;
+	else if (!strcmp(value, "inner_mac"))
+		protocol = RTE_PTYPE_INNER_L2_ETHER;
+	else if (!strcmp(value, "inner_ipv4"))
+		protocol = RTE_PTYPE_INNER_L3_IPV4;
+	else if (!strcmp(value, "inner_ipv6"))
+		protocol = RTE_PTYPE_INNER_L3_IPV6;
+	else if (!strcmp(value, "inner_tcp"))
+		protocol = RTE_PTYPE_INNER_L4_TCP;
+	else if (!strcmp(value, "inner_udp"))
+		protocol = RTE_PTYPE_INNER_L4_UDP;
+	else if (!strcmp(value, "inner_sctp"))
+		protocol = RTE_PTYPE_INNER_L4_SCTP;
+	else {
+		fprintf(stderr, "Unsupported protocol: %s\n", value);
+		protocol = RTE_PTYPE_UNKNOWN;
+	}
+
+	return protocol;
+}
+/* *** SET RXHDRSLIST *** */
+
+unsigned int
+parse_hdrs_list(const char *str, const char *item_name, unsigned int max_items,
+				unsigned int *parsed_items, int check_hdrs_sequence)
+{
+	unsigned int nb_item;
+	char *cur;
+	char *tmp;
+
+	nb_item = 0;
+	char *str2 = strdup(str);
+	cur = strtok_r(str2, ",", &tmp);
+	while (cur != NULL) {
+		parsed_items[nb_item] = get_ptype(cur);
+		cur = strtok_r(NULL, ",", &tmp);
+		nb_item++;
+	}
+	if (nb_item > max_items)
+		fprintf(stderr, "Number of %s = %u > %u (maximum items)\n",
+			item_name, nb_item + 1, max_items);
+	free(str2);
+	if (!check_hdrs_sequence)
+		return nb_item;
+	return nb_item;
+}
 /* *** SET CORELIST and PORTLIST CONFIGURATION *** */
 
 unsigned int
@@ -3825,6 +3895,50 @@ static cmdline_parse_inst_t cmd_set_rxpkts = {
 	},
 };
 
+/* *** SET SEGMENT HEADERS OF RX PACKETS SPLIT *** */
+struct cmd_set_rxhdrs_result {
+	cmdline_fixed_string_t set;
+	cmdline_fixed_string_t rxhdrs;
+	cmdline_fixed_string_t values;
+};
+
+static void
+cmd_set_rxhdrs_parsed(void *parsed_result,
+		      __rte_unused struct cmdline *cl,
+		      __rte_unused void *data)
+{
+	struct cmd_set_rxhdrs_result *res;
+	unsigned int seg_hdrs[MAX_SEGS_BUFFER_SPLIT];
+	unsigned int nb_segs;
+
+	res = parsed_result;
+	nb_segs = parse_hdrs_list(res->values, "segment hdrs",
+				  MAX_SEGS_BUFFER_SPLIT, seg_hdrs, 0);
+	if (nb_segs > 0)
+		set_rx_pkt_hdrs(seg_hdrs, nb_segs);
+	cmd_reconfig_device_queue(RTE_PORT_ALL, 0, 1);
+}
+static cmdline_parse_token_string_t cmd_set_rxhdrs_set =
+	TOKEN_STRING_INITIALIZER(struct cmd_set_rxhdrs_result,
+				set, "set");
+static cmdline_parse_token_string_t cmd_set_rxhdrs_rxhdrs =
+	TOKEN_STRING_INITIALIZER(struct cmd_set_rxhdrs_result,
+				rxhdrs, "rxhdrs");
+static cmdline_parse_token_string_t cmd_set_rxhdrs_values =
+	TOKEN_STRING_INITIALIZER(struct cmd_set_rxhdrs_result,
+				values, NULL);
+
+static cmdline_parse_inst_t cmd_set_rxhdrs = {
+	.f = cmd_set_rxhdrs_parsed,
+	.data = NULL,
+	.help_str = "set rxhdrs <mac[,ipv4]*>",
+	.tokens = {
+		(void *)&cmd_set_rxhdrs_set,
+		(void *)&cmd_set_rxhdrs_rxhdrs,
+		(void *)&cmd_set_rxhdrs_values,
+		NULL,
+	},
+};
 /* *** SET SEGMENT LENGTHS OF TXONLY PACKETS *** */
 
 struct cmd_set_txpkts_result {
@@ -6918,6 +7032,8 @@ static void cmd_showcfg_parsed(void *parsed_result,
 		show_rx_pkt_offsets();
 	else if (!strcmp(res->what, "rxpkts"))
 		show_rx_pkt_segments();
+	else if (!strcmp(res->what, "rxhdrs"))
+		show_rx_pkt_hdrs();
 	else if (!strcmp(res->what, "txpkts"))
 		show_tx_pkt_segments();
 	else if (!strcmp(res->what, "txtimes"))
@@ -6930,12 +7046,12 @@ static cmdline_parse_token_string_t cmd_showcfg_port =
 	TOKEN_STRING_INITIALIZER(struct cmd_showcfg_result, cfg, "config");
 static cmdline_parse_token_string_t cmd_showcfg_what =
 	TOKEN_STRING_INITIALIZER(struct cmd_showcfg_result, what,
-				 "rxtx#cores#fwd#rxoffs#rxpkts#txpkts#txtimes");
+				 "rxtx#cores#fwd#rxoffs#rxpkts#rxhdrs#txpkts#txtimes");
 
 static cmdline_parse_inst_t cmd_showcfg = {
 	.f = cmd_showcfg_parsed,
 	.data = NULL,
-	.help_str = "show config rxtx|cores|fwd|rxoffs|rxpkts|txpkts|txtimes",
+	.help_str = "show config rxtx|cores|fwd|rxoffs|rxpkts|rxhdrs|txpkts|txtimes",
 	.tokens = {
 		(void *)&cmd_showcfg_show,
 		(void *)&cmd_showcfg_port,
@@ -14166,6 +14282,7 @@ static cmdline_parse_ctx_t builtin_ctx[] = {
 	(cmdline_parse_inst_t *)&cmd_set_log,
 	(cmdline_parse_inst_t *)&cmd_set_rxoffs,
 	(cmdline_parse_inst_t *)&cmd_set_rxpkts,
+	(cmdline_parse_inst_t *)&cmd_set_rxhdrs,
 	(cmdline_parse_inst_t *)&cmd_set_txpkts,
 	(cmdline_parse_inst_t *)&cmd_set_txsplit,
 	(cmdline_parse_inst_t *)&cmd_set_txtimes,
diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index a2939867c4..4102ebb3f9 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -4940,6 +4940,76 @@ show_rx_pkt_segments(void)
 	}
 }
 
+static const char *get_ptype_str(uint32_t ptype)
+{
+	switch (ptype) {
+	case RTE_PTYPE_L2_ETHER:
+		return "outer_mac";
+	case RTE_PTYPE_L3_IPV4:
+		return "ipv4";
+	case RTE_PTYPE_L3_IPV6:
+		return "ipv6";
+	case RTE_PTYPE_L4_TCP:
+		return "tcp";
+	case RTE_PTYPE_L4_UDP:
+		return "udp";
+	case RTE_PTYPE_L4_SCTP:
+		return "sctp";
+	case RTE_PTYPE_INNER_L2_ETHER:
+		return "inner_mac";
+	case RTE_PTYPE_INNER_L3_IPV4:
+		return "inner_ipv4";
+	case RTE_PTYPE_INNER_L3_IPV6:
+		return "inner_ipv6";
+	case RTE_PTYPE_INNER_L4_TCP:
+		return "inner_tcp";
+	case RTE_PTYPE_INNER_L4_UDP:
+		return "inner_udp";
+	case RTE_PTYPE_INNER_L4_SCTP:
+		return "inner_sctp";
+	default:
+		return "unsupported";
+	}
+}
+
+void
+show_rx_pkt_hdrs(void)
+{
+	uint32_t i, n;
+
+	n = rx_pkt_nb_segs;
+	printf("Number of segments: %u\n", n);
+	if (n) {
+		printf("Packet segs: ");
+		for (i = 0; i < n - 1; i++)
+			printf("%s, ", get_ptype_str(rx_pkt_hdr_protos[i]));
+		printf("payload\n");
+	}
+}
+
+void
+set_rx_pkt_hdrs(unsigned int *seg_hdrs, unsigned int nb_segs)
+{
+	unsigned int i;
+
+	if (nb_segs + 1 > MAX_SEGS_BUFFER_SPLIT) {
+		printf("nb segments per RX packets=%u > "
+		       "MAX_SEGS_BUFFER_SPLIT - ignored\n", nb_segs + 1);
+		return;
+	}
+
+	memset(rx_pkt_hdr_protos, 0, sizeof(rx_pkt_hdr_protos));
+
+	for (i = 0; i < nb_segs; i++)
+		rx_pkt_hdr_protos[i] = (uint32_t)seg_hdrs[i];
+	rx_pkt_hdr_protos[nb_segs] = RTE_PTYPE_ALL_MASK;
+	/*
+	 * We calculate the number of hdrs, but payload is not included,
+	 * so rx_pkt_nb_segs would increase 1.
+	 */
+	rx_pkt_nb_segs = nb_segs + 1;
+}
+
 void
 set_rx_pkt_segments(unsigned int *seg_lengths, unsigned int nb_segs)
 {
diff --git a/app/test-pmd/parameters.c b/app/test-pmd/parameters.c
index e3c9757f3f..206194c385 100644
--- a/app/test-pmd/parameters.c
+++ b/app/test-pmd/parameters.c
@@ -164,6 +164,7 @@ usage(char* progname)
 	       " Used mainly with PCAP drivers.\n");
 	printf("  --rxoffs=X[,Y]*: set RX segment offsets for split.\n");
 	printf("  --rxpkts=X[,Y]*: set RX segment sizes to split.\n");
+	printf("  --rxhdrs=mac[,ipv4]*: set RX segment protocol to split.\n");
 	printf("  --txpkts=X[,Y]*: set TX segment sizes"
 		" or total packet length.\n");
 	printf("  --txonly-multi-flow: generate multiple flows in txonly mode\n");
@@ -676,6 +677,7 @@ launch_args_parse(int argc, char** argv)
 		{ "flow-isolate-all",	        0, 0, 0 },
 		{ "rxoffs",			1, 0, 0 },
 		{ "rxpkts",			1, 0, 0 },
+		{ "rxhdrs",			1, 0, 0 },
 		{ "txpkts",			1, 0, 0 },
 		{ "txonly-multi-flow",		0, 0, 0 },
 		{ "rxq-share",			2, 0, 0 },
@@ -1331,7 +1333,6 @@ launch_args_parse(int argc, char** argv)
 			if (!strcmp(lgopts[opt_idx].name, "rxpkts")) {
 				unsigned int seg_len[MAX_SEGS_BUFFER_SPLIT];
 				unsigned int nb_segs;
-
 				nb_segs = parse_item_list
 						(optarg, "rxpkt segments",
 						 MAX_SEGS_BUFFER_SPLIT,
@@ -1341,6 +1342,19 @@ launch_args_parse(int argc, char** argv)
 				else
 					rte_exit(EXIT_FAILURE, "bad rxpkts\n");
 			}
+			if (!strcmp(lgopts[opt_idx].name, "rxhdrs")) {
+				unsigned int seg_hdrs[MAX_SEGS_BUFFER_SPLIT];
+				unsigned int nb_segs;
+
+				nb_segs = parse_hdrs_list
+						(optarg, "rxpkt segments",
+						MAX_SEGS_BUFFER_SPLIT,
+						seg_hdrs, 0);
+				if (nb_segs >= 1)
+					set_rx_pkt_hdrs(seg_hdrs, nb_segs);
+				else
+					rte_exit(EXIT_FAILURE, "bad rxpkts\n");
+			}
 			if (!strcmp(lgopts[opt_idx].name, "txpkts")) {
 				unsigned seg_lengths[RTE_MAX_SEGS_PER_PKT];
 				unsigned int nb_segs;
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index addcbcac85..157db60cd4 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -247,6 +247,7 @@ uint16_t rx_pkt_seg_lengths[MAX_SEGS_BUFFER_SPLIT];
 uint8_t  rx_pkt_nb_segs; /**< Number of segments to split */
 uint16_t rx_pkt_seg_offsets[MAX_SEGS_BUFFER_SPLIT];
 uint8_t  rx_pkt_nb_offs; /**< Number of specified offsets */
+uint32_t rx_pkt_hdr_protos[MAX_SEGS_BUFFER_SPLIT];
 
 /*
  * Configuration of packet segments used by the "txonly" processing engine.
@@ -2685,6 +2686,7 @@ rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
 		rx_seg->offset = i < rx_pkt_nb_offs ?
 				   rx_pkt_seg_offsets[i] : 0;
 		rx_seg->mp = mpx ? mpx : mp;
+		rx_seg->proto_hdr = rx_pkt_hdr_protos[i];
 	}
 	rx_conf->rx_nseg = rx_pkt_nb_segs;
 	rx_conf->rx_seg = rx_useg;
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index fb2f5195d3..de992c9416 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -562,6 +562,7 @@ extern uint32_t max_rx_pkt_len;
  * Configuration of packet segments used to scatter received packets
  * if some of split features is configured.
  */
+extern uint32_t rx_pkt_hdr_protos[MAX_SEGS_BUFFER_SPLIT];
 extern uint16_t rx_pkt_seg_lengths[MAX_SEGS_BUFFER_SPLIT];
 extern uint8_t  rx_pkt_nb_segs; /**< Number of segments to split */
 extern uint16_t rx_pkt_seg_offsets[MAX_SEGS_BUFFER_SPLIT];
@@ -892,6 +893,9 @@ inc_tx_burst_stats(struct fwd_stream *fs, uint16_t nb_tx)
 unsigned int parse_item_list(const char *str, const char *item_name,
 			unsigned int max_items,
 			unsigned int *parsed_items, int check_unique_values);
+unsigned int parse_hdrs_list(const char *str, const char *item_name,
+			unsigned int max_item,
+			unsigned int *parsed_items, int check_unique_values);
 void launch_args_parse(int argc, char** argv);
 void cmdline_read_from_file(const char *filename);
 int init_cmdline(void);
@@ -1049,6 +1053,8 @@ void set_record_core_cycles(uint8_t on_off);
 void set_record_burst_stats(uint8_t on_off);
 void set_verbose_level(uint16_t vb_level);
 void set_rx_pkt_segments(unsigned int *seg_lengths, unsigned int nb_segs);
+void set_rx_pkt_hdrs(unsigned int *seg_protos, unsigned int nb_segs);
+void show_rx_pkt_hdrs(void);
 void show_rx_pkt_segments(void);
 void set_rx_pkt_offsets(unsigned int *seg_offsets, unsigned int nb_offs);
 void show_rx_pkt_offsets(void);
-- 
2.25.1


^ permalink raw reply	[flat|nested] 72+ messages in thread

* [PATCH v2 4/4] net/ice: support buffer split in Rx path
  2022-08-12 18:15 [PATCH 0/4] support protocol based buffer split Yuan Wang
                   ` (7 preceding siblings ...)
  2022-09-01 22:36 ` [PATCH v2 3/4] app/testpmd: add rxhdrs commands and parameters Yuan Wang
@ 2022-09-01 22:37 ` Yuan Wang
  2022-09-02 19:10 ` [PATCH v3 0/4] support protocol based buffer split Yuan Wang
                   ` (6 subsequent siblings)
  15 siblings, 0 replies; 72+ messages in thread
From: Yuan Wang @ 2022-09-01 22:37 UTC (permalink / raw)
  To: qi.z.zhang, qiming.yang
  Cc: thomas, andrew.rybchenko, ferruh.yigit, mdr, xiaoyun.li,
	aman.deep.singh, yuying.zhang, jerinjacobk, viacheslavo, stephen,
	xuan.ding, dev, hpothula, yaqi.tang, Yuan Wang, Wenxuan Wu

This patch adds support for protocol based buffer split in the normal Rx
data paths. When the Rx queue is configured with a specific protocol type,
received packets will be directly split into protocol header and payload
parts (the ice PMD is limited to this two-part split), and the two parts
will be put into different mempools.

Currently, protocol based buffer split is not supported in vectorized
paths.

A new API, ice_buffer_split_supported_hdr_ptypes_get(), has been
introduced; it returns the header protocols that the ice PMD supports
splitting on to the application.

Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
Reviewed-by: Qi Zhang <qi.z.zhang@intel.com>
---
 doc/guides/nics/features/ice.ini       |   1 +
 doc/guides/rel_notes/release_22_11.rst |   4 +
 drivers/net/ice/ice_ethdev.c           |  30 +++-
 drivers/net/ice/ice_rxtx.c             | 220 +++++++++++++++++++++----
 drivers/net/ice/ice_rxtx.h             |  16 ++
 drivers/net/ice/ice_rxtx_vec_common.h  |   3 +
 6 files changed, 242 insertions(+), 32 deletions(-)

diff --git a/doc/guides/nics/features/ice.ini b/doc/guides/nics/features/ice.ini
index 7861790a51..bf978ab7f5 100644
--- a/doc/guides/nics/features/ice.ini
+++ b/doc/guides/nics/features/ice.ini
@@ -7,6 +7,7 @@
 ; is selected.
 ;
 [Features]
+Buffer split         = P
 Speed capabilities   = Y
 Link status          = Y
 Link status event    = Y
diff --git a/doc/guides/rel_notes/release_22_11.rst b/doc/guides/rel_notes/release_22_11.rst
index f3b58c7895..99af35714d 100644
--- a/doc/guides/rel_notes/release_22_11.rst
+++ b/doc/guides/rel_notes/release_22_11.rst
@@ -66,6 +66,10 @@ New Features
   User can choose length or protocol header to configure buffer split
   according to NIC's capability.
 
+* **Updated Intel ice driver.**
+
+  Added protocol based buffer split support in scalar path.
+
 Removed Items
 -------------
 
diff --git a/drivers/net/ice/ice_ethdev.c b/drivers/net/ice/ice_ethdev.c
index b2300790ae..3e140439ff 100644
--- a/drivers/net/ice/ice_ethdev.c
+++ b/drivers/net/ice/ice_ethdev.c
@@ -169,6 +169,7 @@ static int ice_timesync_read_time(struct rte_eth_dev *dev,
 static int ice_timesync_write_time(struct rte_eth_dev *dev,
 				   const struct timespec *timestamp);
 static int ice_timesync_disable(struct rte_eth_dev *dev);
+static const uint32_t *ice_buffer_split_supported_hdr_ptypes_get(struct rte_eth_dev *dev);
 
 static const struct rte_pci_id pci_id_ice_map[] = {
 	{ RTE_PCI_DEVICE(ICE_INTEL_VENDOR_ID, ICE_DEV_ID_E823L_BACKPLANE) },
@@ -280,6 +281,7 @@ static const struct eth_dev_ops ice_eth_dev_ops = {
 	.timesync_write_time          = ice_timesync_write_time,
 	.timesync_disable             = ice_timesync_disable,
 	.tm_ops_get                   = ice_tm_ops_get,
+	.buffer_split_supported_hdr_ptypes_get = ice_buffer_split_supported_hdr_ptypes_get,
 };
 
 /* store statistics names and its offset in stats structure */
@@ -3749,7 +3751,8 @@ ice_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 			RTE_ETH_RX_OFFLOAD_OUTER_IPV4_CKSUM |
 			RTE_ETH_RX_OFFLOAD_VLAN_EXTEND |
 			RTE_ETH_RX_OFFLOAD_RSS_HASH |
-			RTE_ETH_RX_OFFLOAD_TIMESTAMP;
+			RTE_ETH_RX_OFFLOAD_TIMESTAMP |
+			RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT;
 		dev_info->tx_offload_capa |=
 			RTE_ETH_TX_OFFLOAD_QINQ_INSERT |
 			RTE_ETH_TX_OFFLOAD_IPV4_CKSUM |
@@ -3761,7 +3764,7 @@ ice_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 		dev_info->flow_type_rss_offloads |= ICE_RSS_OFFLOAD_ALL;
 	}
 
-	dev_info->rx_queue_offload_capa = 0;
+	dev_info->rx_queue_offload_capa = RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT;
 	dev_info->tx_queue_offload_capa = RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE;
 
 	dev_info->reta_size = pf->hash_lut_size;
@@ -3830,6 +3833,11 @@ ice_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 	dev_info->default_rxportconf.ring_size = ICE_BUF_SIZE_MIN;
 	dev_info->default_txportconf.ring_size = ICE_BUF_SIZE_MIN;
 
+	dev_info->rx_seg_capa.max_nseg = ICE_RX_MAX_NSEG;
+	dev_info->rx_seg_capa.multi_pools = 1;
+	dev_info->rx_seg_capa.offset_allowed = 0;
+	dev_info->rx_seg_capa.offset_align_log2 = 0;
+
 	return 0;
 }
 
@@ -5886,6 +5894,24 @@ ice_timesync_disable(struct rte_eth_dev *dev)
 	return 0;
 }
 
+static const uint32_t *
+ice_buffer_split_supported_hdr_ptypes_get(struct rte_eth_dev *dev __rte_unused)
+{
+/* Buffer split protocol header capability. */
+	static const uint32_t ptypes[] = {
+		RTE_PTYPE_L2_ETHER,
+		RTE_PTYPE_INNER_L2_ETHER,
+		RTE_PTYPE_INNER_L3_IPV4,
+		RTE_PTYPE_INNER_L3_IPV6,
+		RTE_PTYPE_INNER_L4_TCP,
+		RTE_PTYPE_INNER_L4_UDP,
+		RTE_PTYPE_INNER_L4_SCTP,
+		RTE_PTYPE_UNKNOWN
+	};
+
+	return ptypes;
+}
+
 static int
 ice_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
 	      struct rte_pci_device *pci_dev)
diff --git a/drivers/net/ice/ice_rxtx.c b/drivers/net/ice/ice_rxtx.c
index bfb3a16ae2..94804b9945 100644
--- a/drivers/net/ice/ice_rxtx.c
+++ b/drivers/net/ice/ice_rxtx.c
@@ -282,7 +282,6 @@ ice_program_hw_rx_queue(struct ice_rx_queue *rxq)
 	/* Set buffer size as the head split is disabled. */
 	buf_size = (uint16_t)(rte_pktmbuf_data_room_size(rxq->mp) -
 			      RTE_PKTMBUF_HEADROOM);
-	rxq->rx_hdr_len = 0;
 	rxq->rx_buf_len = RTE_ALIGN(buf_size, (1 << ICE_RLAN_CTX_DBUF_S));
 	rxq->max_pkt_len =
 		RTE_MIN((uint32_t)ICE_SUPPORT_CHAIN_NUM * rxq->rx_buf_len,
@@ -311,11 +310,47 @@ ice_program_hw_rx_queue(struct ice_rx_queue *rxq)
 
 	memset(&rx_ctx, 0, sizeof(rx_ctx));
 
+	if (rxq->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) {
+		switch (rxq->rxseg[0].proto_hdr) {
+		case RTE_PTYPE_L2_ETHER:
+			rx_ctx.dtype = ICE_RX_DTYPE_HEADER_SPLIT;
+			rx_ctx.hsplit_1 = ICE_RLAN_RX_HSPLIT_1_SPLIT_L2;
+			break;
+		case RTE_PTYPE_INNER_L2_ETHER:
+			rx_ctx.dtype = ICE_RX_DTYPE_HEADER_SPLIT;
+			rx_ctx.hsplit_0 = ICE_RLAN_RX_HSPLIT_0_SPLIT_L2;
+			break;
+		case RTE_PTYPE_INNER_L3_IPV4:
+		case RTE_PTYPE_INNER_L3_IPV6:
+			rx_ctx.dtype = ICE_RX_DTYPE_HEADER_SPLIT;
+			rx_ctx.hsplit_0 = ICE_RLAN_RX_HSPLIT_0_SPLIT_IP;
+			break;
+		case RTE_PTYPE_INNER_L4_TCP:
+		case RTE_PTYPE_INNER_L4_UDP:
+			rx_ctx.dtype = ICE_RX_DTYPE_HEADER_SPLIT;
+			rx_ctx.hsplit_0 = ICE_RLAN_RX_HSPLIT_0_SPLIT_TCP_UDP;
+			break;
+		case RTE_PTYPE_INNER_L4_SCTP:
+			rx_ctx.dtype = ICE_RX_DTYPE_HEADER_SPLIT;
+			rx_ctx.hsplit_0 = ICE_RLAN_RX_HSPLIT_0_SPLIT_SCTP;
+			break;
+		case RTE_PTYPE_UNKNOWN:
+			PMD_DRV_LOG(ERR, "Buffer split protocol must be configured");
+			return -EINVAL;
+		default:
+			PMD_DRV_LOG(ERR, "Buffer split protocol is not supported");
+			return -EINVAL;
+		}
+		rxq->rx_hdr_len = ICE_RX_HDR_BUF_SIZE;
+	} else {
+		rxq->rx_hdr_len = 0;
+		rx_ctx.dtype = 0; /* No Protocol Based Buffer Split mode */
+	}
+
 	rx_ctx.base = rxq->rx_ring_dma / ICE_QUEUE_BASE_ADDR_UNIT;
 	rx_ctx.qlen = rxq->nb_rx_desc;
 	rx_ctx.dbuf = rxq->rx_buf_len >> ICE_RLAN_CTX_DBUF_S;
 	rx_ctx.hbuf = rxq->rx_hdr_len >> ICE_RLAN_CTX_HBUF_S;
-	rx_ctx.dtype = 0; /* No Header Split mode */
 #ifndef RTE_LIBRTE_ICE_16BYTE_RX_DESC
 	rx_ctx.dsize = 1; /* 32B descriptors */
 #endif
@@ -401,6 +436,7 @@ ice_alloc_rx_queue_mbufs(struct ice_rx_queue *rxq)
 
 	for (i = 0; i < rxq->nb_rx_desc; i++) {
 		volatile union ice_rx_flex_desc *rxd;
+		rxd = &rxq->rx_ring[i];
 		struct rte_mbuf *mbuf = rte_mbuf_raw_alloc(rxq->mp);
 
 		if (unlikely(!mbuf)) {
@@ -408,8 +444,6 @@ ice_alloc_rx_queue_mbufs(struct ice_rx_queue *rxq)
 			return -ENOMEM;
 		}
 
-		rte_mbuf_refcnt_set(mbuf, 1);
-		mbuf->next = NULL;
 		mbuf->data_off = RTE_PKTMBUF_HEADROOM;
 		mbuf->nb_segs = 1;
 		mbuf->port = rxq->port_id;
@@ -417,9 +451,32 @@ ice_alloc_rx_queue_mbufs(struct ice_rx_queue *rxq)
 		dma_addr =
 			rte_cpu_to_le_64(rte_mbuf_data_iova_default(mbuf));
 
-		rxd = &rxq->rx_ring[i];
-		rxd->read.pkt_addr = dma_addr;
-		rxd->read.hdr_addr = 0;
+		if (!(rxq->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT)) {
+			rte_mbuf_refcnt_set(mbuf, 1);
+			mbuf->next = NULL;
+			rxd->read.hdr_addr = 0;
+			rxd->read.pkt_addr = dma_addr;
+		} else {
+			struct rte_mbuf *mbuf_pay;
+			mbuf_pay = rte_mbuf_raw_alloc(rxq->rxseg[1].mp);
+			if (unlikely(!mbuf_pay)) {
+				PMD_DRV_LOG(ERR, "Failed to allocate payload mbuf for RX");
+				return -ENOMEM;
+			}
+
+			mbuf_pay->next = NULL;
+			mbuf_pay->data_off = RTE_PKTMBUF_HEADROOM;
+			mbuf_pay->nb_segs = 1;
+			mbuf_pay->port = rxq->port_id;
+			mbuf->next = mbuf_pay;
+
+			rxd->read.hdr_addr = dma_addr;
+			/* The LS bit should be set to zero regardless of
+			 * buffer split enablement.
+			 */
+			rxd->read.pkt_addr = rte_cpu_to_le_64(rte_mbuf_data_iova_default(mbuf_pay));
+		}
+
 #ifndef RTE_LIBRTE_ICE_16BYTE_RX_DESC
 		rxd->read.rsvd1 = 0;
 		rxd->read.rsvd2 = 0;
@@ -443,14 +500,14 @@ _ice_rx_queue_release_mbufs(struct ice_rx_queue *rxq)
 
 	for (i = 0; i < rxq->nb_rx_desc; i++) {
 		if (rxq->sw_ring[i].mbuf) {
-			rte_pktmbuf_free_seg(rxq->sw_ring[i].mbuf);
+			rte_pktmbuf_free(rxq->sw_ring[i].mbuf);
 			rxq->sw_ring[i].mbuf = NULL;
 		}
 	}
 	if (rxq->rx_nb_avail == 0)
 		return;
 	for (i = 0; i < rxq->rx_nb_avail; i++)
-		rte_pktmbuf_free_seg(rxq->rx_stage[rxq->rx_next_avail + i]);
+		rte_pktmbuf_free(rxq->rx_stage[rxq->rx_next_avail + i]);
 
 	rxq->rx_nb_avail = 0;
 }
@@ -742,7 +799,7 @@ ice_fdir_program_hw_rx_queue(struct ice_rx_queue *rxq)
 	rx_ctx.qlen = rxq->nb_rx_desc;
 	rx_ctx.dbuf = rxq->rx_buf_len >> ICE_RLAN_CTX_DBUF_S;
 	rx_ctx.hbuf = rxq->rx_hdr_len >> ICE_RLAN_CTX_HBUF_S;
-	rx_ctx.dtype = 0; /* No Header Split mode */
+	rx_ctx.dtype = 0; /* No Buffer Split mode */
 	rx_ctx.dsize = 1; /* 32B descriptors */
 	rx_ctx.rxmax = ICE_ETH_MAX_LEN;
 	/* TPH: Transaction Layer Packet (TLP) processing hints */
@@ -1076,6 +1133,7 @@ ice_rx_queue_setup(struct rte_eth_dev *dev,
 	uint16_t len;
 	int use_def_burst_func = 1;
 	uint64_t offloads;
+	uint16_t n_seg = rx_conf->rx_nseg;
 
 	if (nb_desc % ICE_ALIGN_RING_DESC != 0 ||
 	    nb_desc > ICE_MAX_RING_DESC ||
@@ -1087,6 +1145,15 @@ ice_rx_queue_setup(struct rte_eth_dev *dev,
 
 	offloads = rx_conf->offloads | dev->data->dev_conf.rxmode.offloads;
 
+	if (mp)
+		n_seg = 1;
+
+	if (n_seg > 1 && !(offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT)) {
+		PMD_INIT_LOG(ERR, "port %u queue index %u split offload not configured",
+				dev->data->port_id, queue_idx);
+		return -EINVAL;
+	}
+
 	/* Free memory if needed */
 	if (dev->data->rx_queues[queue_idx]) {
 		ice_rx_queue_release(dev->data->rx_queues[queue_idx]);
@@ -1098,12 +1165,23 @@ ice_rx_queue_setup(struct rte_eth_dev *dev,
 				 sizeof(struct ice_rx_queue),
 				 RTE_CACHE_LINE_SIZE,
 				 socket_id);
+
 	if (!rxq) {
 		PMD_INIT_LOG(ERR, "Failed to allocate memory for "
 			     "rx queue data structure");
 		return -ENOMEM;
 	}
-	rxq->mp = mp;
+
+	rxq->rxseg_nb = n_seg;
+	if (n_seg > 1) {
+		RTE_ASSERT(n_seg <= ICE_RX_MAX_NSEG);
+		rte_memcpy(rxq->rxseg, rx_conf->rx_seg,
+			sizeof(struct rte_eth_rxseg_split) * n_seg);
+		rxq->mp = rxq->rxseg[0].mp;
+	} else {
+		rxq->mp = mp;
+	}
+
 	rxq->nb_rx_desc = nb_desc;
 	rxq->rx_free_thresh = rx_conf->rx_free_thresh;
 	rxq->queue_id = queue_idx;
@@ -1570,7 +1648,7 @@ ice_rx_scan_hw_ring(struct ice_rx_queue *rxq)
 	struct ice_rx_entry *rxep;
 	struct rte_mbuf *mb;
 	uint16_t stat_err0;
-	uint16_t pkt_len;
+	uint16_t pkt_len, hdr_len;
 	int32_t s[ICE_LOOK_AHEAD], nb_dd;
 	int32_t i, j, nb_rx = 0;
 	uint64_t pkt_flags = 0;
@@ -1625,6 +1703,27 @@ ice_rx_scan_hw_ring(struct ice_rx_queue *rxq)
 				   ICE_RX_FLX_DESC_PKT_LEN_M) - rxq->crc_len;
 			mb->data_len = pkt_len;
 			mb->pkt_len = pkt_len;
+
+			if (!(rxq->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT)) {
+				pkt_len = (rte_le_to_cpu_16(rxdp[j].wb.pkt_len) &
+					ICE_RX_FLX_DESC_PKT_LEN_M) - rxq->crc_len;
+				mb->data_len = pkt_len;
+				mb->pkt_len = pkt_len;
+			} else {
+				mb->nb_segs = (uint16_t)(mb->nb_segs + mb->next->nb_segs);
+				mb->next->next = NULL;
+				hdr_len = rte_le_to_cpu_16(rxdp[j].wb.hdr_len_sph_flex_flags1) &
+						ICE_RX_FLEX_DESC_HEADER_LEN_M;
+				pkt_len = (rte_le_to_cpu_16(rxdp[j].wb.pkt_len) &
+					ICE_RX_FLX_DESC_PKT_LEN_M) - rxq->crc_len;
+				mb->data_len = hdr_len;
+				mb->pkt_len = hdr_len + pkt_len;
+				mb->next->data_len = pkt_len;
+#ifdef RTE_ETHDEV_DEBUG_RX
+				rte_pktmbuf_dump(stdout, mb, rte_pktmbuf_pkt_len(mb));
+#endif
+			}
+
 			mb->ol_flags = 0;
 			stat_err0 = rte_le_to_cpu_16(rxdp[j].wb.status_error0);
 			pkt_flags = ice_rxd_error_to_pkt_flags(stat_err0);
@@ -1716,7 +1815,9 @@ ice_rx_alloc_bufs(struct ice_rx_queue *rxq)
 	struct rte_mbuf *mb;
 	uint16_t alloc_idx, i;
 	uint64_t dma_addr;
-	int diag;
+	int diag, diag_pay;
+	uint64_t pay_addr;
+	struct rte_mbuf *mbufs_pay[rxq->rx_free_thresh];
 
 	/* Allocate buffers in bulk */
 	alloc_idx = (uint16_t)(rxq->rx_free_trigger -
@@ -1729,6 +1830,15 @@ ice_rx_alloc_bufs(struct ice_rx_queue *rxq)
 		return -ENOMEM;
 	}
 
+	if (rxq->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) {
+		diag_pay = rte_mempool_get_bulk(rxq->rxseg[1].mp,
+				(void *)mbufs_pay, rxq->rx_free_thresh);
+		if (unlikely(diag_pay != 0)) {
+			PMD_RX_LOG(ERR, "Failed to get payload mbufs in bulk");
+			return -ENOMEM;
+		}
+	}
+
 	rxdp = &rxq->rx_ring[alloc_idx];
 	for (i = 0; i < rxq->rx_free_thresh; i++) {
 		if (likely(i < (rxq->rx_free_thresh - 1)))
@@ -1737,13 +1847,21 @@ ice_rx_alloc_bufs(struct ice_rx_queue *rxq)
 
 		mb = rxep[i].mbuf;
 		rte_mbuf_refcnt_set(mb, 1);
-		mb->next = NULL;
 		mb->data_off = RTE_PKTMBUF_HEADROOM;
 		mb->nb_segs = 1;
 		mb->port = rxq->port_id;
 		dma_addr = rte_cpu_to_le_64(rte_mbuf_data_iova_default(mb));
-		rxdp[i].read.hdr_addr = 0;
-		rxdp[i].read.pkt_addr = dma_addr;
+
+		if (!(rxq->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT)) {
+			mb->next = NULL;
+			rxdp[i].read.hdr_addr = 0;
+			rxdp[i].read.pkt_addr = dma_addr;
+		} else {
+			mb->next = mbufs_pay[i];
+			pay_addr = rte_cpu_to_le_64(rte_mbuf_data_iova_default(mbufs_pay[i]));
+			rxdp[i].read.hdr_addr = dma_addr;
+			rxdp[i].read.pkt_addr = pay_addr;
+		}
 	}
 
 	/* Update Rx tail register */
@@ -2352,11 +2470,13 @@ ice_recv_pkts(void *rx_queue,
 	struct ice_rx_entry *sw_ring = rxq->sw_ring;
 	struct ice_rx_entry *rxe;
 	struct rte_mbuf *nmb; /* new allocated mbuf */
+	struct rte_mbuf *nmb_pay; /* new allocated payload mbuf */
 	struct rte_mbuf *rxm; /* pointer to store old mbuf in SW ring */
 	uint16_t rx_id = rxq->rx_tail;
 	uint16_t nb_rx = 0;
 	uint16_t nb_hold = 0;
 	uint16_t rx_packet_len;
+	uint16_t rx_header_len;
 	uint16_t rx_stat_err0;
 	uint64_t dma_addr;
 	uint64_t pkt_flags;
@@ -2384,12 +2504,16 @@ ice_recv_pkts(void *rx_queue,
 		if (!(rx_stat_err0 & (1 << ICE_RX_FLEX_DESC_STATUS0_DD_S)))
 			break;
 
-		/* allocate mbuf */
+		if (rx_stat_err0 & (1 << ICE_RX_FLEX_DESC_STATUS0_HBO_S))
+			break;
+
+		/* allocate header mbuf */
 		nmb = rte_mbuf_raw_alloc(rxq->mp);
 		if (unlikely(!nmb)) {
 			rxq->vsi->adapter->pf.dev_data->rx_mbuf_alloc_failed++;
 			break;
 		}
+
 		rxd = *rxdp; /* copy descriptor in ring to temp variable*/
 
 		nb_hold++;
@@ -2402,24 +2526,60 @@ ice_recv_pkts(void *rx_queue,
 		dma_addr =
 			rte_cpu_to_le_64(rte_mbuf_data_iova_default(nmb));
 
-		/**
-		 * fill the read format of descriptor with physic address in
-		 * new allocated mbuf: nmb
-		 */
-		rxdp->read.hdr_addr = 0;
-		rxdp->read.pkt_addr = dma_addr;
+		if (!(rxq->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT)) {
+			/**
+			 * fill the read format of descriptor with physic address in
+			 * new allocated mbuf: nmb
+			 */
+			rxdp->read.hdr_addr = 0;
+			rxdp->read.pkt_addr = dma_addr;
+		} else {
+			/* allocate payload mbuf */
+			nmb_pay = rte_mbuf_raw_alloc(rxq->rxseg[1].mp);
+			if (unlikely(!nmb_pay)) {
+				rxq->vsi->adapter->pf.dev_data->rx_mbuf_alloc_failed++;
+				break;
+			}
 
-		/* calculate rx_packet_len of the received pkt */
-		rx_packet_len = (rte_le_to_cpu_16(rxd.wb.pkt_len) &
-				 ICE_RX_FLX_DESC_PKT_LEN_M) - rxq->crc_len;
+			nmb->next = nmb_pay;
+			nmb_pay->next = NULL;
+
+			/**
+			 * fill the read format of descriptor with physic address in
+			 * new allocated mbuf: nmb
+			 */
+			rxdp->read.hdr_addr = dma_addr;
+			rxdp->read.pkt_addr = rte_cpu_to_le_64(rte_mbuf_data_iova_default(nmb_pay));
+		}
 
 		/* fill old mbuf with received descriptor: rxd */
 		rxm->data_off = RTE_PKTMBUF_HEADROOM;
 		rte_prefetch0(RTE_PTR_ADD(rxm->buf_addr, RTE_PKTMBUF_HEADROOM));
-		rxm->nb_segs = 1;
-		rxm->next = NULL;
-		rxm->pkt_len = rx_packet_len;
-		rxm->data_len = rx_packet_len;
+		if (!(rxq->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT)) {
+			rxm->nb_segs = 1;
+			rxm->next = NULL;
+			/* calculate rx_packet_len of the received pkt */
+			rx_packet_len = (rte_le_to_cpu_16(rxd.wb.pkt_len) &
+					ICE_RX_FLX_DESC_PKT_LEN_M) - rxq->crc_len;
+			rxm->data_len = rx_packet_len;
+			rxm->pkt_len = rx_packet_len;
+		} else {
+			rxm->nb_segs = (uint16_t)(rxm->nb_segs + rxm->next->nb_segs);
+			rxm->next->next = NULL;
+			/* calculate rx_packet_len of the received pkt */
+			rx_header_len = rte_le_to_cpu_16(rxd.wb.hdr_len_sph_flex_flags1) &
+					ICE_RX_FLEX_DESC_HEADER_LEN_M;
+			rx_packet_len = (rte_le_to_cpu_16(rxd.wb.pkt_len) &
+					ICE_RX_FLX_DESC_PKT_LEN_M) - rxq->crc_len;
+			rxm->data_len = rx_header_len;
+			rxm->pkt_len = rx_header_len + rx_packet_len;
+			rxm->next->data_len = rx_packet_len;
+
+#ifdef RTE_ETHDEV_DEBUG_RX
+			rte_pktmbuf_dump(stdout, rxm, rte_pktmbuf_pkt_len(rxm));
+#endif
+		}
+
 		rxm->port = rxq->port_id;
 		rxm->packet_type = ptype_tbl[ICE_RX_FLEX_DESC_PTYPE_M &
 			rte_le_to_cpu_16(rxd.wb.ptype_flex_flags0)];
diff --git a/drivers/net/ice/ice_rxtx.h b/drivers/net/ice/ice_rxtx.h
index f5337d5284..d44bde3710 100644
--- a/drivers/net/ice/ice_rxtx.h
+++ b/drivers/net/ice/ice_rxtx.h
@@ -16,6 +16,9 @@
 #define ICE_RX_MAX_BURST 32
 #define ICE_TX_MAX_BURST 32
 
+/* Maximal number of segments to split. */
+#define ICE_RX_MAX_NSEG 2
+
 #define ICE_CHK_Q_ENA_COUNT        100
 #define ICE_CHK_Q_ENA_INTERVAL_US  100
 
@@ -43,6 +46,11 @@
 extern uint64_t ice_timestamp_dynflag;
 extern int ice_timestamp_dynfield_offset;
 
+/* Max header size can be 2K - 64 bytes */
+#define ICE_RX_HDR_BUF_SIZE    (2048 - 64)
+
+#define ICE_HEADER_SPLIT_ENA   BIT(0)
+
 typedef void (*ice_rx_release_mbufs_t)(struct ice_rx_queue *rxq);
 typedef void (*ice_tx_release_mbufs_t)(struct ice_tx_queue *txq);
 typedef void (*ice_rxd_to_pkt_fields_t)(struct ice_rx_queue *rxq,
@@ -53,6 +61,12 @@ struct ice_rx_entry {
 	struct rte_mbuf *mbuf;
 };
 
+enum ice_rx_dtype {
+	ICE_RX_DTYPE_NO_SPLIT       = 0,
+	ICE_RX_DTYPE_HEADER_SPLIT   = 1,
+	ICE_RX_DTYPE_SPLIT_ALWAYS   = 2,
+};
+
 struct ice_rx_queue {
 	struct rte_mempool *mp; /* mbuf pool to populate RX ring */
 	volatile union ice_rx_flex_desc *rx_ring;/* RX ring virtual address */
@@ -98,6 +112,8 @@ struct ice_rx_queue {
 	uint32_t hw_time_high; /* high 32 bits of timestamp */
 	uint32_t hw_time_low; /* low 32 bits of timestamp */
 	uint64_t hw_time_update; /* SW time of HW record updating */
+	struct rte_eth_rxseg_split rxseg[ICE_RX_MAX_NSEG];
+	uint32_t rxseg_nb;
 };
 
 struct ice_tx_entry {
diff --git a/drivers/net/ice/ice_rxtx_vec_common.h b/drivers/net/ice/ice_rxtx_vec_common.h
index 2dd2d83650..eec6ea2134 100644
--- a/drivers/net/ice/ice_rxtx_vec_common.h
+++ b/drivers/net/ice/ice_rxtx_vec_common.h
@@ -291,6 +291,9 @@ ice_rx_vec_queue_default(struct ice_rx_queue *rxq)
 	if (rxq->offloads & RTE_ETH_RX_OFFLOAD_TIMESTAMP)
 		return -1;
 
+	if (rxq->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT)
+		return -1;
+
 	if (rxq->offloads & ICE_RX_VECTOR_OFFLOAD)
 		return ICE_VECTOR_OFFLOAD_PATH;
 
-- 
2.25.1


^ permalink raw reply	[flat|nested] 72+ messages in thread

* [PATCH v3 0/4] support protocol based buffer split
  2022-08-12 18:15 [PATCH 0/4] support protocol based buffer split Yuan Wang
                   ` (8 preceding siblings ...)
  2022-09-01 22:37 ` [PATCH v2 4/4] net/ice: support buffer split in Rx path Yuan Wang
@ 2022-09-02 19:10 ` Yuan Wang
  2022-09-02 19:10   ` [PATCH v3 1/4] ethdev: introduce protocol header API Yuan Wang
                     ` (3 more replies)
  2022-09-20 11:12 ` [PATCH v4 0/4] support protocol based buffer split Yuan Wang
                   ` (5 subsequent siblings)
  15 siblings, 4 replies; 72+ messages in thread
From: Yuan Wang @ 2022-09-02 19:10 UTC (permalink / raw)
  To: dev
  Cc: thomas, andrew.rybchenko, ferruh.yigit, mdr, xiaoyun.li,
	aman.deep.singh, yuying.zhang, qi.z.zhang, qiming.yang,
	jerinjacobk, viacheslavo, stephen, xuan.ding, hpothula,
	yaqi.tang, Yuan Wang

Protocol type based buffer split consists of splitting a received packet
into several separate segments based on the packet content. It is useful
in some scenarios, such as GPU acceleration. The splitting will help to
enable true zero copy and hence improve the performance significantly.

This patchset aims to support protocol header split based on current buffer
split. When Rx queue is configured with RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT
offload and corresponding protocol, packets received will be directly split
into different mempools.

Change log:
v3:
Fix mail thread.

v2:
Add mbuf dump to the driver's buffer split path.
Add buffer split to the driver feature list.
Remove unsupported header protocols from the driver.

Yuan Wang (4):
  ethdev: introduce protocol header API
  ethdev: introduce protocol hdr based buffer split
  app/testpmd: add rxhdrs commands and parameters
  net/ice: support buffer split in Rx path

 app/test-pmd/cmdline.c                 | 123 +++++++++++++-
 app/test-pmd/config.c                  |  70 ++++++++
 app/test-pmd/parameters.c              |  16 +-
 app/test-pmd/testpmd.c                 |   2 +
 app/test-pmd/testpmd.h                 |   6 +
 doc/guides/nics/features/ice.ini       |   1 +
 doc/guides/rel_notes/release_22_11.rst |  14 ++
 drivers/net/ice/ice_ethdev.c           |  30 +++-
 drivers/net/ice/ice_rxtx.c             | 220 +++++++++++++++++++++----
 drivers/net/ice/ice_rxtx.h             |  16 ++
 drivers/net/ice/ice_rxtx_vec_common.h  |   3 +
 lib/ethdev/ethdev_driver.h             |  15 ++
 lib/ethdev/rte_ethdev.c                |  88 ++++++++--
 lib/ethdev/rte_ethdev.h                |  41 ++++-
 lib/ethdev/version.map                 |   3 +
 15 files changed, 600 insertions(+), 48 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 72+ messages in thread

* [PATCH v3 1/4] ethdev: introduce protocol header API
  2022-09-02 19:10 ` [PATCH v3 0/4] support protocol based buffer split Yuan Wang
@ 2022-09-02 19:10   ` Yuan Wang
  2022-09-12 11:24     ` Andrew Rybchenko
  2022-09-02 19:10   ` [PATCH v3 2/4] ethdev: introduce protocol hdr based buffer split Yuan Wang
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 72+ messages in thread
From: Yuan Wang @ 2022-09-02 19:10 UTC (permalink / raw)
  To: dev, Thomas Monjalon, Ferruh Yigit, Andrew Rybchenko, Ray Kinsella
  Cc: xiaoyun.li, aman.deep.singh, yuying.zhang, qi.z.zhang,
	qiming.yang, jerinjacobk, viacheslavo, stephen, xuan.ding,
	hpothula, yaqi.tang, Yuan Wang, Wenxuan Wu

Add a new ethdev API to retrieve supported protocol headers
of a PMD, which helps to configure protocol header based buffer split.

Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
---
 doc/guides/rel_notes/release_22_11.rst |  5 ++++
 lib/ethdev/ethdev_driver.h             | 15 ++++++++++++
 lib/ethdev/rte_ethdev.c                | 33 ++++++++++++++++++++++++++
 lib/ethdev/rte_ethdev.h                | 24 +++++++++++++++++++
 lib/ethdev/version.map                 |  3 +++
 5 files changed, 80 insertions(+)

diff --git a/doc/guides/rel_notes/release_22_11.rst b/doc/guides/rel_notes/release_22_11.rst
index 8c021cf050..4d90514a9a 100644
--- a/doc/guides/rel_notes/release_22_11.rst
+++ b/doc/guides/rel_notes/release_22_11.rst
@@ -55,6 +55,11 @@ New Features
      Also, make sure to start the actual text at the margin.
      =======================================================
 
+* **Added new ethdev API for PMD to get buffer split supported protocol types.**
+
+  Added ``rte_eth_buffer_split_get_supported_hdr_ptypes()``, to get supported
+  header protocols of a PMD to split.
+
 
 Removed Items
 -------------
diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
index 5101868ea7..f64ceb9907 100644
--- a/lib/ethdev/ethdev_driver.h
+++ b/lib/ethdev/ethdev_driver.h
@@ -1054,6 +1054,18 @@ typedef int (*eth_ip_reassembly_conf_get_t)(struct rte_eth_dev *dev,
 typedef int (*eth_ip_reassembly_conf_set_t)(struct rte_eth_dev *dev,
 		const struct rte_eth_ip_reassembly_params *conf);
 
+/**
+ * @internal
+ * Get supported header protocols of a PMD to split.
+ *
+ * @param dev
+ *   Ethdev handle of port.
+ *
+ * @return
+ *   An array pointer to store supported protocol headers.
+ */
+typedef const uint32_t *(*eth_buffer_split_supported_hdr_ptypes_get_t)(struct rte_eth_dev *dev);
+
 /**
  * @internal
  * Dump private info from device to a file.
@@ -1301,6 +1313,9 @@ struct eth_dev_ops {
 	/** Set IP reassembly configuration */
 	eth_ip_reassembly_conf_set_t ip_reassembly_conf_set;
 
+	/** Get supported header ptypes to split */
+	eth_buffer_split_supported_hdr_ptypes_get_t buffer_split_supported_hdr_ptypes_get;
+
 	/** Dump private info from device */
 	eth_dev_priv_dump_t eth_dev_priv_dump;
 
diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
index 1979dc0850..093c577add 100644
--- a/lib/ethdev/rte_ethdev.c
+++ b/lib/ethdev/rte_ethdev.c
@@ -5917,6 +5917,39 @@ rte_eth_dev_priv_dump(uint16_t port_id, FILE *file)
 	return eth_err(port_id, (*dev->dev_ops->eth_dev_priv_dump)(dev, file));
 }
 
+int
+rte_eth_buffer_split_get_supported_hdr_ptypes(uint16_t port_id, uint32_t *ptypes, int num)
+{
+	int i, j;
+	struct rte_eth_dev *dev;
+	const uint32_t *all_types;
+
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
+	dev = &rte_eth_devices[port_id];
+
+	if (ptypes == NULL && num > 0) {
+		RTE_ETHDEV_LOG(ERR,
+			"Cannot get ethdev port %u supported header protocol types to NULL "
+			"when array size is non zero\n",
+			port_id);
+		return -EINVAL;
+	}
+
+	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->buffer_split_supported_hdr_ptypes_get, -ENOTSUP);
+	all_types = (*dev->dev_ops->buffer_split_supported_hdr_ptypes_get)(dev);
+
+	if (!all_types)
+		return 0;
+
+	for (i = 0, j = 0; all_types[i] != RTE_PTYPE_UNKNOWN; ++i) {
+		if (j < num)
+			ptypes[j] = all_types[i];
+		j++;
+	}
+
+	return j;
+}
+
 RTE_LOG_REGISTER_DEFAULT(rte_eth_dev_logtype, INFO);
 
 RTE_INIT(ethdev_init_telemetry)
diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
index de9e970d4d..c58c908c3a 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -6206,6 +6206,30 @@ rte_eth_tx_buffer(uint16_t port_id, uint16_t queue_id,
 	return rte_eth_tx_buffer_flush(port_id, queue_id, buffer);
 }
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Get supported header protocols to split on Rx.
+ *
+ * @param port_id
+ *   The port identifier of the device.
+ * @param[out] ptypes
+ *   An array pointer to store supported protocol headers, allocated by caller.
+ *   These ptypes are composed with RTE_PTYPE_*.
+ * @param num
+ *   Size of the array pointed by param ptypes.
+ * @return
+ *   - (>=0) Number of supported ptypes. If the number of types exceeds num,
+ *           only num entries will be filled into the ptypes array, but the full
+ *           count of supported ptypes will be returned.
+ *   - (-ENOTSUP) if header protocol is not supported by device.
+ *   - (-ENODEV) if *port_id* invalid.
+ *   - (-EINVAL) if bad parameter.
+ */
+__rte_experimental
+int rte_eth_buffer_split_get_supported_hdr_ptypes(uint16_t port_id, uint32_t *ptypes, int num);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/ethdev/version.map b/lib/ethdev/version.map
index 03f52fee91..e496c8d938 100644
--- a/lib/ethdev/version.map
+++ b/lib/ethdev/version.map
@@ -285,6 +285,9 @@ EXPERIMENTAL {
 	rte_mtr_color_in_protocol_priority_get;
 	rte_mtr_color_in_protocol_set;
 	rte_mtr_meter_vlan_table_update;
+
+	# added in 22.11
+	rte_eth_buffer_split_get_supported_hdr_ptypes;
 };
 
 INTERNAL {
-- 
2.25.1


^ permalink raw reply	[flat|nested] 72+ messages in thread

* [PATCH v3 2/4] ethdev: introduce protocol hdr based buffer split
  2022-09-02 19:10 ` [PATCH v3 0/4] support protocol based buffer split Yuan Wang
  2022-09-02 19:10   ` [PATCH v3 1/4] ethdev: introduce protocol header API Yuan Wang
@ 2022-09-02 19:10   ` Yuan Wang
  2022-09-12 11:47     ` Andrew Rybchenko
  2022-09-13  7:56     ` Suanming Mou
  2022-09-02 19:10   ` [PATCH v3 3/4] app/testpmd: add rxhdrs commands and parameters Yuan Wang
  2022-09-02 19:10   ` [PATCH v3 4/4] net/ice: support buffer split in Rx path Yuan Wang
  3 siblings, 2 replies; 72+ messages in thread
From: Yuan Wang @ 2022-09-02 19:10 UTC (permalink / raw)
  To: dev, Thomas Monjalon, Ferruh Yigit, Andrew Rybchenko
  Cc: mdr, xiaoyun.li, aman.deep.singh, yuying.zhang, qi.z.zhang,
	qiming.yang, jerinjacobk, viacheslavo, stephen, xuan.ding,
	hpothula, yaqi.tang, Yuan Wang, Wenxuan Wu

Currently, Rx buffer split supports length based split. With Rx queue
offload RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT enabled and Rx packet segment
configured, PMD will be able to split the received packets into
multiple segments.

However, length based buffer split is not suitable for NICs that do split
based on protocol headers. Given an arbitrarily variable length in an Rx
packet segment, it is almost impossible to pass a fixed protocol header to
the driver. Besides, tunneling makes the composition of a packet variable,
which makes the situation even worse.

This patch extends the current buffer split to support protocol header
based buffer split. A new proto_hdr field is introduced in the reserved
field of the rte_eth_rxseg_split structure to specify the protocol header.
The proto_hdr field defines the split position of a packet: splitting
always happens after the protocol header defined in the Rx packet segment.
When the Rx queue offload RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT is enabled and
the corresponding protocol header is configured, the driver will split the
ingress packets into multiple segments.

struct rte_eth_rxseg_split {
        struct rte_mempool *mp; /* memory pools to allocate segment from */
        uint16_t length; /* segment maximal data length,
                            configures split point */
        uint16_t offset; /* data offset from beginning
                            of mbuf data buffer */
        uint32_t proto_hdr; /* supported ptype of a specific pmd,
                               configures split point.
			       It should be defined by RTE_PTYPE_*
			     */
};

If protocol header split is supported by a PMD, the
rte_eth_buffer_split_get_supported_hdr_ptypes function can
be used to obtain the list of supported protocol headers.

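For reference, a minimal usage sketch of the two-call pattern (port_id is
assumed to be a valid port; headers and error handling are trimmed):

        uint32_t ptypes[32]; /* assumed large enough for this sketch */
        int i, num;

        num = rte_eth_buffer_split_get_supported_hdr_ptypes(port_id, NULL, 0);
        if (num > 0 && num <= (int)RTE_DIM(ptypes)) {
                num = rte_eth_buffer_split_get_supported_hdr_ptypes(port_id,
                                                                ptypes, num);
                for (i = 0; i < num; i++)
                        printf("supported split point: 0x%08x\n", ptypes[i]);
        }
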
For example, let's suppose we configured the Rx queue with the
following segments:
        seg0 - pool0, proto_hdr0=RTE_PTYPE_L3_IPV4, off0=2B
        seg1 - pool1, proto_hdr1=RTE_PTYPE_L4_UDP, off1=128B
        seg2 - pool2, off2=0B

A packet consisting of MAC_IPV4_UDP_PAYLOAD will be split as
follows:
        seg0 - ipv4 header @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from pool0
        seg1 - udp header @ 128 in mbuf from pool1
        seg2 - payload @ 0 in mbuf from pool2

Note: the NIC will only do the split when the packets exactly match all
the protocol headers in the segments. For example, if ARP packets are
received with the above config, the NIC won't do the split for ARP packets
since they do not contain an ipv4 header and a udp header.

Now buffer split can be configured in two modes. For length based
buffer split, the mp, length and offset fields in the Rx packet segment
should be configured, while the proto_hdr field will be ignored.
For protocol header based buffer split, the mp, offset and proto_hdr
fields in the Rx packet segment should be configured, while the length
field will be ignored.

The split limitations imposed by the underlying driver are reported in
the rte_eth_dev_info->rx_seg_capa field. The memory attributes of the
split parts may also differ, e.g. one part in DPDK memory and another in
external memory.

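As an illustration only (not part of this patch), the example above could
be programmed roughly as follows, assuming pool0/pool1/pool2 already exist
and the port has been configured:

        union rte_eth_rxseg rx_seg[3] = {
                { .split = { .mp = pool0, .offset = 2,
                             .proto_hdr = RTE_PTYPE_L3_IPV4 } },
                { .split = { .mp = pool1, .offset = 128,
                             .proto_hdr = RTE_PTYPE_L4_UDP } },
                { .split = { .mp = pool2 } }, /* payload, no further split */
        };
        struct rte_eth_rxconf rxconf = {
                .offloads = RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT,
                .rx_seg = rx_seg,
                .rx_nseg = 3,
        };
        int ret;

        /* mb_pool argument is NULL: each segment carries its own mempool */
        ret = rte_eth_rx_queue_setup(port_id, 0, 512, rte_socket_id(),
                                     &rxconf, NULL);
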
Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
---
 doc/guides/rel_notes/release_22_11.rst |  5 +++
 lib/ethdev/rte_ethdev.c                | 55 ++++++++++++++++++++------
 lib/ethdev/rte_ethdev.h                | 17 +++++++-
 3 files changed, 65 insertions(+), 12 deletions(-)

diff --git a/doc/guides/rel_notes/release_22_11.rst b/doc/guides/rel_notes/release_22_11.rst
index 4d90514a9a..f3b58c7895 100644
--- a/doc/guides/rel_notes/release_22_11.rst
+++ b/doc/guides/rel_notes/release_22_11.rst
@@ -60,6 +60,11 @@ New Features
   Added ``rte_eth_buffer_split_get_supported_hdr_ptypes()``, to get supported
   header protocols of a PMD to split.
 
+* **Added protocol header based buffer split.**
+  Ethdev: The ``reserved`` field in the  ``rte_eth_rxseg_split`` structure is
+  replaced with ``proto_hdr`` to support protocol header based buffer split.
+  User can choose length or protocol header to configure buffer split
+  according to NIC's capability.
 
 Removed Items
 -------------
diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
index 093c577add..dfceb723ee 100644
--- a/lib/ethdev/rte_ethdev.c
+++ b/lib/ethdev/rte_ethdev.c
@@ -1635,9 +1635,10 @@ rte_eth_dev_is_removed(uint16_t port_id)
 }
 
 static int
-rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
-			     uint16_t n_seg, uint32_t *mbp_buf_size,
-			     const struct rte_eth_dev_info *dev_info)
+rte_eth_rx_queue_check_split(uint16_t port_id,
+			const struct rte_eth_rxseg_split *rx_seg,
+			uint16_t n_seg, uint32_t *mbp_buf_size,
+			const struct rte_eth_dev_info *dev_info)
 {
 	const struct rte_eth_rxseg_capa *seg_capa = &dev_info->rx_seg_capa;
 	struct rte_mempool *mp_first;
@@ -1660,6 +1661,7 @@ rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
 		struct rte_mempool *mpl = rx_seg[seg_idx].mp;
 		uint32_t length = rx_seg[seg_idx].length;
 		uint32_t offset = rx_seg[seg_idx].offset;
+		uint32_t proto_hdr = rx_seg[seg_idx].proto_hdr;
 
 		if (mpl == NULL) {
 			RTE_ETHDEV_LOG(ERR, "null mempool pointer\n");
@@ -1693,13 +1695,44 @@ rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
 		}
 		offset += seg_idx != 0 ? 0 : RTE_PKTMBUF_HEADROOM;
 		*mbp_buf_size = rte_pktmbuf_data_room_size(mpl);
-		length = length != 0 ? length : *mbp_buf_size;
-		if (*mbp_buf_size < length + offset) {
-			RTE_ETHDEV_LOG(ERR,
-				       "%s mbuf_data_room_size %u < %u (segment length=%u + segment offset=%u)\n",
-				       mpl->name, *mbp_buf_size,
-				       length + offset, length, offset);
-			return -EINVAL;
+
+		int ret = rte_eth_buffer_split_get_supported_hdr_ptypes(port_id, NULL, 0);
+		if (ret <= 0) {
+			/* Split at fixed length. */
+			length = length != 0 ? length : *mbp_buf_size;
+			if (*mbp_buf_size < length + offset) {
+				RTE_ETHDEV_LOG(ERR,
+					"%s mbuf_data_room_size %u < %u (segment length=%u + segment offset=%u)\n",
+					mpl->name, *mbp_buf_size,
+					length + offset, length, offset);
+				return -EINVAL;
+			}
+		} else {
+			/* Split after specified protocol header. */
+			uint32_t ptypes[ret];
+			int i;
+
+			ret = rte_eth_buffer_split_get_supported_hdr_ptypes(port_id, ptypes, ret);
+			for (i = 0; i < ret; i++)
+				if (ptypes[i] & proto_hdr)
+					break;
+
+			if (i == ret) {
+#define PTYPE_NAMESIZE	256
+				char ptype_name[PTYPE_NAMESIZE];
+				rte_get_ptype_name(proto_hdr, ptype_name, sizeof(ptype_name));
+				RTE_ETHDEV_LOG(ERR,
+					"Protocol header %s is not supported.\n",
+					ptype_name);
+				return -EINVAL;
+			}
+			if (*mbp_buf_size < offset) {
+				RTE_ETHDEV_LOG(ERR,
+						"%s mbuf_data_room_size %u < %u segment offset)\n",
+						mpl->name, *mbp_buf_size,
+						offset);
+				return -EINVAL;
+			}
 		}
 	}
 	return 0;
@@ -1778,7 +1811,7 @@ rte_eth_rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
 		n_seg = rx_conf->rx_nseg;
 
 		if (rx_conf->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) {
-			ret = rte_eth_rx_queue_check_split(rx_seg, n_seg,
+			ret = rte_eth_rx_queue_check_split(port_id, rx_seg, n_seg,
 							   &mbp_buf_size,
 							   &dev_info);
 			if (ret != 0)
diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
index c58c908c3a..410fba5eab 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -1175,6 +1175,9 @@ struct rte_eth_txmode {
  *   specified in the first array element, the second buffer, from the
  *   pool in the second element, and so on.
  *
+ * - The proto_hdrs in the elements define the split position of
+ *   received packets.
+ *
  * - The offsets from the segment description elements specify
  *   the data offset from the buffer beginning except the first mbuf.
  *   The first segment offset is added with RTE_PKTMBUF_HEADROOM.
@@ -1196,12 +1199,24 @@ struct rte_eth_txmode {
  *     - pool from the last valid element
  *     - the buffer size from this pool
  *     - zero offset
+ *
+ * - Length based buffer split:
+ *     - mp, length, offset should be configured.
+ *     - The proto_hdr field will be ignored.
+ *
+ * - Protocol header based buffer split:
+ *     - mp, offset, proto_hdr should be configured.
+ *     - The length field will be ignored.
  */
 struct rte_eth_rxseg_split {
 	struct rte_mempool *mp; /**< Memory pool to allocate segment from. */
 	uint16_t length; /**< Segment data length, configures split point. */
 	uint16_t offset; /**< Data offset from beginning of mbuf data buffer. */
-	uint32_t reserved; /**< Reserved field. */
+	/**
+	 * Supported ptype of a specific pmd, configures split point.
+	 * It should be defined by RTE_PTYPE_*.
+	 */
+	uint32_t proto_hdr;
 };
 
 /**
-- 
2.25.1


^ permalink raw reply	[flat|nested] 72+ messages in thread

* [PATCH v3 3/4] app/testpmd: add rxhdrs commands and parameters
  2022-09-02 19:10 ` [PATCH v3 0/4] support protocol based buffer split Yuan Wang
  2022-09-02 19:10   ` [PATCH v3 1/4] ethdev: introduce protocol header API Yuan Wang
  2022-09-02 19:10   ` [PATCH v3 2/4] ethdev: introduce protocol hdr based buffer split Yuan Wang
@ 2022-09-02 19:10   ` Yuan Wang
  2022-09-02 19:10   ` [PATCH v3 4/4] net/ice: support buffer split in Rx path Yuan Wang
  3 siblings, 0 replies; 72+ messages in thread
From: Yuan Wang @ 2022-09-02 19:10 UTC (permalink / raw)
  To: dev, Aman Singh, Yuying Zhang
  Cc: thomas, andrew.rybchenko, ferruh.yigit, mdr, xiaoyun.li,
	qi.z.zhang, qiming.yang, jerinjacobk, viacheslavo, stephen,
	xuan.ding, hpothula, yaqi.tang, Yuan Wang, Wenxuan Wu

Add command line parameter:
--rxhdrs=mac[,ipv4,udp]

Set the proto_hdr of each segment used to scatter packets on receiving
if the split feature is engaged, for the queues configured with the
BUFFER_SPLIT offload flag.

Add interactive mode command:
testpmd>set rxhdrs mac,ipv4,tcp,udp,sctp
(protocol sequence should be valid)

The protocol split feature is off by default. To enable protocol split,
you need:
1. Start testpmd with multiple mempools. E.g. --mbuf-size=2048,2048
2. Configure Rx queue with rx_offload buffer split on.
3. Set the protocol type of buffer split. E.g. set rxhdrs mac,ipv4
        (default protocols of testpmd : mac|icmp|ipv4|ipv6|tcp|udp|
                              sctp|inner_mac|inner_ipv4|inner_ipv6|
                              inner_tcp|inner_udp|inner_sctp)
The above protocols can be configured in testpmd, but the configuration
only takes effect when it is supported by the specific PMD.

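Put together, an end-to-end session might look like the following; the
per-port rx_offload toggle is the standard testpmd command and is an
assumption here, not something added by this patch:

    ./dpdk-testpmd -l 0-3 -n 4 -- -i --mbuf-size=2048,2048 --rxhdrs=mac,ipv4
    testpmd> port stop all
    testpmd> port config 0 rx_offload buffer_split on
    testpmd> port start all
    testpmd> show config rxhdrs
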
Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
---
 app/test-pmd/cmdline.c    | 123 +++++++++++++++++++++++++++++++++++++-
 app/test-pmd/config.c     |  70 ++++++++++++++++++++++
 app/test-pmd/parameters.c |  16 ++++-
 app/test-pmd/testpmd.c    |   2 +
 app/test-pmd/testpmd.h    |   6 ++
 5 files changed, 213 insertions(+), 4 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index b4fe9dfb17..f00b7bc6a4 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -183,7 +183,7 @@ static void cmd_help_long_parsed(void *parsed_result,
 			"show (rxq|txq) info (port_id) (queue_id)\n"
 			"    Display information for configured RX/TX queue.\n\n"
 
-			"show config (rxtx|cores|fwd|rxoffs|rxpkts|txpkts)\n"
+			"show config (rxtx|cores|fwd|rxoffs|rxpkts|rxhdrs|txpkts)\n"
 			"    Display the given configuration.\n\n"
 
 			"read rxd (port_id) (queue_id) (rxd_id)\n"
@@ -307,6 +307,14 @@ static void cmd_help_long_parsed(void *parsed_result,
 			" Affects only the queues configured with split"
 			" offloads.\n\n"
 
+			"set rxhdrs (mac[,ipv4])*\n"
+			"	Set the protocol hdr of each segment to scatter"
+			" packets on receiving if split feature is engaged."
+			" Affects only the queues configured with split"
+			" offloads.\n\n"
+			"	Supported proto header: mac|ipv4|ipv6|tcp|udp|sctp|"
+			"inner_mac|inner_ipv4|inner_ipv6|inner_tcp|inner_udp|inner_sctp\n"
+
 			"set txpkts (x[,y]*)\n"
 			"    Set the length of each segment of TXONLY"
 			" and optionally CSUM packets.\n\n"
@@ -3456,6 +3464,68 @@ static cmdline_parse_inst_t cmd_stop = {
 	},
 };
 
+static unsigned int
+get_ptype(char *value)
+{
+	uint32_t protocol;
+
+	if (!strcmp(value, "mac"))
+		protocol = RTE_PTYPE_L2_ETHER;
+	else if (!strcmp(value, "ipv4"))
+		protocol = RTE_PTYPE_L3_IPV4;
+	else if (!strcmp(value, "ipv6"))
+		protocol = RTE_PTYPE_L3_IPV6;
+	else if (!strcmp(value, "tcp"))
+		protocol = RTE_PTYPE_L4_TCP;
+	else if (!strcmp(value, "udp"))
+		protocol = RTE_PTYPE_L4_UDP;
+	else if (!strcmp(value, "sctp"))
+		protocol = RTE_PTYPE_L4_SCTP;
+	else if (!strcmp(value, "inner_mac"))
+		protocol = RTE_PTYPE_INNER_L2_ETHER;
+	else if (!strcmp(value, "inner_ipv4"))
+		protocol = RTE_PTYPE_INNER_L3_IPV4;
+	else if (!strcmp(value, "inner_ipv6"))
+		protocol = RTE_PTYPE_INNER_L3_IPV6;
+	else if (!strcmp(value, "inner_tcp"))
+		protocol = RTE_PTYPE_INNER_L4_TCP;
+	else if (!strcmp(value, "inner_udp"))
+		protocol = RTE_PTYPE_INNER_L4_UDP;
+	else if (!strcmp(value, "inner_sctp"))
+		protocol = RTE_PTYPE_INNER_L4_SCTP;
+	else {
+		fprintf(stderr, "Unsupported protocol: %s\n", value);
+		protocol = RTE_PTYPE_UNKNOWN;
+	}
+
+	return protocol;
+}
+/* *** SET RXHDRSLIST *** */
+
+unsigned int
+parse_hdrs_list(const char *str, const char *item_name, unsigned int max_items,
+				unsigned int *parsed_items, int check_hdrs_sequence)
+{
+	unsigned int nb_item;
+	char *cur;
+	char *tmp;
+
+	nb_item = 0;
+	char *str2 = strdup(str);
+	cur = strtok_r(str2, ",", &tmp);
+	while (cur != NULL) {
+		parsed_items[nb_item] = get_ptype(cur);
+		cur = strtok_r(NULL, ",", &tmp);
+		nb_item++;
+	}
+	if (nb_item > max_items)
+		fprintf(stderr, "Number of %s = %u > %u (maximum items)\n",
+			item_name, nb_item + 1, max_items);
+	free(str2);
+	if (!check_hdrs_sequence)
+		return nb_item;
+	return nb_item;
+}
 /* *** SET CORELIST and PORTLIST CONFIGURATION *** */
 
 unsigned int
@@ -3825,6 +3895,50 @@ static cmdline_parse_inst_t cmd_set_rxpkts = {
 	},
 };
 
+/* *** SET SEGMENT HEADERS OF RX PACKETS SPLIT *** */
+struct cmd_set_rxhdrs_result {
+	cmdline_fixed_string_t set;
+	cmdline_fixed_string_t rxhdrs;
+	cmdline_fixed_string_t values;
+};
+
+static void
+cmd_set_rxhdrs_parsed(void *parsed_result,
+		      __rte_unused struct cmdline *cl,
+		      __rte_unused void *data)
+{
+	struct cmd_set_rxhdrs_result *res;
+	unsigned int seg_hdrs[MAX_SEGS_BUFFER_SPLIT];
+	unsigned int nb_segs;
+
+	res = parsed_result;
+	nb_segs = parse_hdrs_list(res->values, "segment hdrs",
+				  MAX_SEGS_BUFFER_SPLIT, seg_hdrs, 0);
+	if (nb_segs > 0)
+		set_rx_pkt_hdrs(seg_hdrs, nb_segs);
+	cmd_reconfig_device_queue(RTE_PORT_ALL, 0, 1);
+}
+static cmdline_parse_token_string_t cmd_set_rxhdrs_set =
+	TOKEN_STRING_INITIALIZER(struct cmd_set_rxhdrs_result,
+				set, "set");
+static cmdline_parse_token_string_t cmd_set_rxhdrs_rxhdrs =
+	TOKEN_STRING_INITIALIZER(struct cmd_set_rxhdrs_result,
+				rxhdrs, "rxhdrs");
+static cmdline_parse_token_string_t cmd_set_rxhdrs_values =
+	TOKEN_STRING_INITIALIZER(struct cmd_set_rxhdrs_result,
+				values, NULL);
+
+static cmdline_parse_inst_t cmd_set_rxhdrs = {
+	.f = cmd_set_rxhdrs_parsed,
+	.data = NULL,
+	.help_str = "set rxhdrs <mac[,ipv4]*>",
+	.tokens = {
+		(void *)&cmd_set_rxhdrs_set,
+		(void *)&cmd_set_rxhdrs_rxhdrs,
+		(void *)&cmd_set_rxhdrs_values,
+		NULL,
+	},
+};
 /* *** SET SEGMENT LENGTHS OF TXONLY PACKETS *** */
 
 struct cmd_set_txpkts_result {
@@ -6918,6 +7032,8 @@ static void cmd_showcfg_parsed(void *parsed_result,
 		show_rx_pkt_offsets();
 	else if (!strcmp(res->what, "rxpkts"))
 		show_rx_pkt_segments();
+	else if (!strcmp(res->what, "rxhdrs"))
+		show_rx_pkt_hdrs();
 	else if (!strcmp(res->what, "txpkts"))
 		show_tx_pkt_segments();
 	else if (!strcmp(res->what, "txtimes"))
@@ -6930,12 +7046,12 @@ static cmdline_parse_token_string_t cmd_showcfg_port =
 	TOKEN_STRING_INITIALIZER(struct cmd_showcfg_result, cfg, "config");
 static cmdline_parse_token_string_t cmd_showcfg_what =
 	TOKEN_STRING_INITIALIZER(struct cmd_showcfg_result, what,
-				 "rxtx#cores#fwd#rxoffs#rxpkts#txpkts#txtimes");
+				 "rxtx#cores#fwd#rxoffs#rxpkts#rxhdrs#txpkts#txtimes");
 
 static cmdline_parse_inst_t cmd_showcfg = {
 	.f = cmd_showcfg_parsed,
 	.data = NULL,
-	.help_str = "show config rxtx|cores|fwd|rxoffs|rxpkts|txpkts|txtimes",
+	.help_str = "show config rxtx|cores|fwd|rxoffs|rxpkts|rxhdrs|txpkts|txtimes",
 	.tokens = {
 		(void *)&cmd_showcfg_show,
 		(void *)&cmd_showcfg_port,
@@ -14166,6 +14282,7 @@ static cmdline_parse_ctx_t builtin_ctx[] = {
 	(cmdline_parse_inst_t *)&cmd_set_log,
 	(cmdline_parse_inst_t *)&cmd_set_rxoffs,
 	(cmdline_parse_inst_t *)&cmd_set_rxpkts,
+	(cmdline_parse_inst_t *)&cmd_set_rxhdrs,
 	(cmdline_parse_inst_t *)&cmd_set_txpkts,
 	(cmdline_parse_inst_t *)&cmd_set_txsplit,
 	(cmdline_parse_inst_t *)&cmd_set_txtimes,
diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index a2939867c4..4102ebb3f9 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -4940,6 +4940,76 @@ show_rx_pkt_segments(void)
 	}
 }
 
+static const char *get_ptype_str(uint32_t ptype)
+{
+	switch (ptype) {
+	case RTE_PTYPE_L2_ETHER:
+		return "outer_mac";
+	case RTE_PTYPE_L3_IPV4:
+		return "ipv4";
+	case RTE_PTYPE_L3_IPV6:
+		return "ipv6";
+	case RTE_PTYPE_L4_TCP:
+		return "tcp";
+	case RTE_PTYPE_L4_UDP:
+		return "udp";
+	case RTE_PTYPE_L4_SCTP:
+		return "sctp";
+	case RTE_PTYPE_INNER_L2_ETHER:
+		return "inner_mac";
+	case RTE_PTYPE_INNER_L3_IPV4:
+		return "inner_ipv4";
+	case RTE_PTYPE_INNER_L3_IPV6:
+		return "inner_ipv6";
+	case RTE_PTYPE_INNER_L4_TCP:
+		return "inner_tcp";
+	case RTE_PTYPE_INNER_L4_UDP:
+		return "inner_udp";
+	case RTE_PTYPE_INNER_L4_SCTP:
+		return "inner_sctp";
+	default:
+		return "unsupported";
+	}
+}
+
+void
+show_rx_pkt_hdrs(void)
+{
+	uint32_t i, n;
+
+	n = rx_pkt_nb_segs;
+	printf("Number of segments: %u\n", n);
+	if (n) {
+		printf("Packet segs: ");
+		for (i = 0; i < n - 1; i++)
+			printf("%s, ", get_ptype_str(rx_pkt_hdr_protos[i]));
+		printf("payload\n");
+	}
+}
+
+void
+set_rx_pkt_hdrs(unsigned int *seg_hdrs, unsigned int nb_segs)
+{
+	unsigned int i;
+
+	if (nb_segs + 1 > MAX_SEGS_BUFFER_SPLIT) {
+		printf("nb segments per RX packets=%u > "
+		       "MAX_SEGS_BUFFER_SPLIT - ignored\n", nb_segs + 1);
+		return;
+	}
+
+	memset(rx_pkt_hdr_protos, 0, sizeof(rx_pkt_hdr_protos));
+
+	for (i = 0; i < nb_segs; i++)
+		rx_pkt_hdr_protos[i] = (uint32_t)seg_hdrs[i];
+	rx_pkt_hdr_protos[nb_segs] = RTE_PTYPE_ALL_MASK;
+	/*
+	 * We calculate the number of hdrs, but payload is not included,
+	 * so rx_pkt_nb_segs would increase 1.
+	 */
+	rx_pkt_nb_segs = nb_segs + 1;
+}
+
 void
 set_rx_pkt_segments(unsigned int *seg_lengths, unsigned int nb_segs)
 {
diff --git a/app/test-pmd/parameters.c b/app/test-pmd/parameters.c
index e3c9757f3f..206194c385 100644
--- a/app/test-pmd/parameters.c
+++ b/app/test-pmd/parameters.c
@@ -164,6 +164,7 @@ usage(char* progname)
 	       " Used mainly with PCAP drivers.\n");
 	printf("  --rxoffs=X[,Y]*: set RX segment offsets for split.\n");
 	printf("  --rxpkts=X[,Y]*: set RX segment sizes to split.\n");
+	printf("  --rxhdrs=mac[,ipv4]*: set RX segment protocol to split.\n");
 	printf("  --txpkts=X[,Y]*: set TX segment sizes"
 		" or total packet length.\n");
 	printf("  --txonly-multi-flow: generate multiple flows in txonly mode\n");
@@ -676,6 +677,7 @@ launch_args_parse(int argc, char** argv)
 		{ "flow-isolate-all",	        0, 0, 0 },
 		{ "rxoffs",			1, 0, 0 },
 		{ "rxpkts",			1, 0, 0 },
+		{ "rxhdrs",			1, 0, 0 },
 		{ "txpkts",			1, 0, 0 },
 		{ "txonly-multi-flow",		0, 0, 0 },
 		{ "rxq-share",			2, 0, 0 },
@@ -1331,7 +1333,6 @@ launch_args_parse(int argc, char** argv)
 			if (!strcmp(lgopts[opt_idx].name, "rxpkts")) {
 				unsigned int seg_len[MAX_SEGS_BUFFER_SPLIT];
 				unsigned int nb_segs;
-
 				nb_segs = parse_item_list
 						(optarg, "rxpkt segments",
 						 MAX_SEGS_BUFFER_SPLIT,
@@ -1341,6 +1342,19 @@ launch_args_parse(int argc, char** argv)
 				else
 					rte_exit(EXIT_FAILURE, "bad rxpkts\n");
 			}
+			if (!strcmp(lgopts[opt_idx].name, "rxhdrs")) {
+				unsigned int seg_hdrs[MAX_SEGS_BUFFER_SPLIT];
+				unsigned int nb_segs;
+
+				nb_segs = parse_hdrs_list
+						(optarg, "rxpkt segments",
+						MAX_SEGS_BUFFER_SPLIT,
+						seg_hdrs, 0);
+				if (nb_segs >= 1)
+					set_rx_pkt_hdrs(seg_hdrs, nb_segs);
+				else
+					rte_exit(EXIT_FAILURE, "bad rxpkts\n");
+			}
 			if (!strcmp(lgopts[opt_idx].name, "txpkts")) {
 				unsigned seg_lengths[RTE_MAX_SEGS_PER_PKT];
 				unsigned int nb_segs;
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index addcbcac85..157db60cd4 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -247,6 +247,7 @@ uint16_t rx_pkt_seg_lengths[MAX_SEGS_BUFFER_SPLIT];
 uint8_t  rx_pkt_nb_segs; /**< Number of segments to split */
 uint16_t rx_pkt_seg_offsets[MAX_SEGS_BUFFER_SPLIT];
 uint8_t  rx_pkt_nb_offs; /**< Number of specified offsets */
+uint32_t rx_pkt_hdr_protos[MAX_SEGS_BUFFER_SPLIT];
 
 /*
  * Configuration of packet segments used by the "txonly" processing engine.
@@ -2685,6 +2686,7 @@ rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
 		rx_seg->offset = i < rx_pkt_nb_offs ?
 				   rx_pkt_seg_offsets[i] : 0;
 		rx_seg->mp = mpx ? mpx : mp;
+		rx_seg->proto_hdr = rx_pkt_hdr_protos[i];
 	}
 	rx_conf->rx_nseg = rx_pkt_nb_segs;
 	rx_conf->rx_seg = rx_useg;
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index fb2f5195d3..de992c9416 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -562,6 +562,7 @@ extern uint32_t max_rx_pkt_len;
  * Configuration of packet segments used to scatter received packets
  * if some of split features is configured.
  */
+extern uint32_t rx_pkt_hdr_protos[MAX_SEGS_BUFFER_SPLIT];
 extern uint16_t rx_pkt_seg_lengths[MAX_SEGS_BUFFER_SPLIT];
 extern uint8_t  rx_pkt_nb_segs; /**< Number of segments to split */
 extern uint16_t rx_pkt_seg_offsets[MAX_SEGS_BUFFER_SPLIT];
@@ -892,6 +893,9 @@ inc_tx_burst_stats(struct fwd_stream *fs, uint16_t nb_tx)
 unsigned int parse_item_list(const char *str, const char *item_name,
 			unsigned int max_items,
 			unsigned int *parsed_items, int check_unique_values);
+unsigned int parse_hdrs_list(const char *str, const char *item_name,
+			unsigned int max_item,
+			unsigned int *parsed_items, int check_unique_values);
 void launch_args_parse(int argc, char** argv);
 void cmdline_read_from_file(const char *filename);
 int init_cmdline(void);
@@ -1049,6 +1053,8 @@ void set_record_core_cycles(uint8_t on_off);
 void set_record_burst_stats(uint8_t on_off);
 void set_verbose_level(uint16_t vb_level);
 void set_rx_pkt_segments(unsigned int *seg_lengths, unsigned int nb_segs);
+void set_rx_pkt_hdrs(unsigned int *seg_protos, unsigned int nb_segs);
+void show_rx_pkt_hdrs(void);
 void show_rx_pkt_segments(void);
 void set_rx_pkt_offsets(unsigned int *seg_offsets, unsigned int nb_offs);
 void show_rx_pkt_offsets(void);
-- 
2.25.1


^ permalink raw reply	[flat|nested] 72+ messages in thread

* [PATCH v3 4/4] net/ice: support buffer split in Rx path
  2022-09-02 19:10 ` [PATCH v3 0/4] support protocol based buffer split Yuan Wang
                     ` (2 preceding siblings ...)
  2022-09-02 19:10   ` [PATCH v3 3/4] app/testpmd: add rxhdrs commands and parameters Yuan Wang
@ 2022-09-02 19:10   ` Yuan Wang
  3 siblings, 0 replies; 72+ messages in thread
From: Yuan Wang @ 2022-09-02 19:10 UTC (permalink / raw)
  To: dev, Qiming Yang, Qi Zhang
  Cc: thomas, andrew.rybchenko, ferruh.yigit, mdr, xiaoyun.li,
	aman.deep.singh, yuying.zhang, jerinjacobk, viacheslavo, stephen,
	xuan.ding, hpothula, yaqi.tang, Yuan Wang, Wenxuan Wu

This patch adds support for protocol based buffer split in the normal Rx
data paths. When the Rx queue is configured with a specific protocol type,
received packets will be directly split into a protocol header part and a
payload part, subject to the limitations of the ice PMD, and the two parts
will be put into different mempools.

Currently, protocol based buffer split is not supported in vectorized
paths.

A new callback, ice_buffer_split_supported_hdr_ptypes_get(), has been
introduced; it returns to the application the header protocols that the
ice PMD supports splitting on.

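For illustration only (not part of this patch), an application sees the
split result as a two-segment mbuf chain; pkts[] and i are assumed to come
from the application's rte_eth_rx_burst() loop:

        struct rte_mbuf *m = pkts[i];   /* first mbuf, from the header pool */
        struct rte_mbuf *pay = m->next; /* second mbuf, from the payload pool */
        uint16_t hdr_len = m->data_len; /* protocol header length */
        uint16_t pay_len = pay != NULL ? pay->data_len : 0; /* payload length */
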
Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
Reviewed-by: Qi Zhang <qi.z.zhang@intel.com>
---
 doc/guides/nics/features/ice.ini       |   1 +
 doc/guides/rel_notes/release_22_11.rst |   4 +
 drivers/net/ice/ice_ethdev.c           |  30 +++-
 drivers/net/ice/ice_rxtx.c             | 220 +++++++++++++++++++++----
 drivers/net/ice/ice_rxtx.h             |  16 ++
 drivers/net/ice/ice_rxtx_vec_common.h  |   3 +
 6 files changed, 242 insertions(+), 32 deletions(-)

diff --git a/doc/guides/nics/features/ice.ini b/doc/guides/nics/features/ice.ini
index 7861790a51..bf978ab7f5 100644
--- a/doc/guides/nics/features/ice.ini
+++ b/doc/guides/nics/features/ice.ini
@@ -7,6 +7,7 @@
 ; is selected.
 ;
 [Features]
+Buffer split         = P
 Speed capabilities   = Y
 Link status          = Y
 Link status event    = Y
diff --git a/doc/guides/rel_notes/release_22_11.rst b/doc/guides/rel_notes/release_22_11.rst
index f3b58c7895..99af35714d 100644
--- a/doc/guides/rel_notes/release_22_11.rst
+++ b/doc/guides/rel_notes/release_22_11.rst
@@ -66,6 +66,10 @@ New Features
   User can choose length or protocol header to configure buffer split
   according to NIC's capability.
 
+* **Updated Intel ice driver.**
+
+  Added protocol based buffer split support in scalar path.
+
 Removed Items
 -------------
 
diff --git a/drivers/net/ice/ice_ethdev.c b/drivers/net/ice/ice_ethdev.c
index b2300790ae..3e140439ff 100644
--- a/drivers/net/ice/ice_ethdev.c
+++ b/drivers/net/ice/ice_ethdev.c
@@ -169,6 +169,7 @@ static int ice_timesync_read_time(struct rte_eth_dev *dev,
 static int ice_timesync_write_time(struct rte_eth_dev *dev,
 				   const struct timespec *timestamp);
 static int ice_timesync_disable(struct rte_eth_dev *dev);
+static const uint32_t *ice_buffer_split_supported_hdr_ptypes_get(struct rte_eth_dev *dev);
 
 static const struct rte_pci_id pci_id_ice_map[] = {
 	{ RTE_PCI_DEVICE(ICE_INTEL_VENDOR_ID, ICE_DEV_ID_E823L_BACKPLANE) },
@@ -280,6 +281,7 @@ static const struct eth_dev_ops ice_eth_dev_ops = {
 	.timesync_write_time          = ice_timesync_write_time,
 	.timesync_disable             = ice_timesync_disable,
 	.tm_ops_get                   = ice_tm_ops_get,
+	.buffer_split_supported_hdr_ptypes_get = ice_buffer_split_supported_hdr_ptypes_get,
 };
 
 /* store statistics names and its offset in stats structure */
@@ -3749,7 +3751,8 @@ ice_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 			RTE_ETH_RX_OFFLOAD_OUTER_IPV4_CKSUM |
 			RTE_ETH_RX_OFFLOAD_VLAN_EXTEND |
 			RTE_ETH_RX_OFFLOAD_RSS_HASH |
-			RTE_ETH_RX_OFFLOAD_TIMESTAMP;
+			RTE_ETH_RX_OFFLOAD_TIMESTAMP |
+			RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT;
 		dev_info->tx_offload_capa |=
 			RTE_ETH_TX_OFFLOAD_QINQ_INSERT |
 			RTE_ETH_TX_OFFLOAD_IPV4_CKSUM |
@@ -3761,7 +3764,7 @@ ice_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 		dev_info->flow_type_rss_offloads |= ICE_RSS_OFFLOAD_ALL;
 	}
 
-	dev_info->rx_queue_offload_capa = 0;
+	dev_info->rx_queue_offload_capa = RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT;
 	dev_info->tx_queue_offload_capa = RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE;
 
 	dev_info->reta_size = pf->hash_lut_size;
@@ -3830,6 +3833,11 @@ ice_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 	dev_info->default_rxportconf.ring_size = ICE_BUF_SIZE_MIN;
 	dev_info->default_txportconf.ring_size = ICE_BUF_SIZE_MIN;
 
+	dev_info->rx_seg_capa.max_nseg = ICE_RX_MAX_NSEG;
+	dev_info->rx_seg_capa.multi_pools = 1;
+	dev_info->rx_seg_capa.offset_allowed = 0;
+	dev_info->rx_seg_capa.offset_align_log2 = 0;
+
 	return 0;
 }
 
@@ -5886,6 +5894,24 @@ ice_timesync_disable(struct rte_eth_dev *dev)
 	return 0;
 }
 
+static const uint32_t *
+ice_buffer_split_supported_hdr_ptypes_get(struct rte_eth_dev *dev __rte_unused)
+{
+/* Buffer split protocol header capability. */
+	static const uint32_t ptypes[] = {
+		RTE_PTYPE_L2_ETHER,
+		RTE_PTYPE_INNER_L2_ETHER,
+		RTE_PTYPE_INNER_L3_IPV4,
+		RTE_PTYPE_INNER_L3_IPV6,
+		RTE_PTYPE_INNER_L4_TCP,
+		RTE_PTYPE_INNER_L4_UDP,
+		RTE_PTYPE_INNER_L4_SCTP,
+		RTE_PTYPE_UNKNOWN
+	};
+
+	return ptypes;
+}
+
 static int
 ice_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
 	      struct rte_pci_device *pci_dev)
diff --git a/drivers/net/ice/ice_rxtx.c b/drivers/net/ice/ice_rxtx.c
index bfb3a16ae2..94804b9945 100644
--- a/drivers/net/ice/ice_rxtx.c
+++ b/drivers/net/ice/ice_rxtx.c
@@ -282,7 +282,6 @@ ice_program_hw_rx_queue(struct ice_rx_queue *rxq)
 	/* Set buffer size as the head split is disabled. */
 	buf_size = (uint16_t)(rte_pktmbuf_data_room_size(rxq->mp) -
 			      RTE_PKTMBUF_HEADROOM);
-	rxq->rx_hdr_len = 0;
 	rxq->rx_buf_len = RTE_ALIGN(buf_size, (1 << ICE_RLAN_CTX_DBUF_S));
 	rxq->max_pkt_len =
 		RTE_MIN((uint32_t)ICE_SUPPORT_CHAIN_NUM * rxq->rx_buf_len,
@@ -311,11 +310,47 @@ ice_program_hw_rx_queue(struct ice_rx_queue *rxq)
 
 	memset(&rx_ctx, 0, sizeof(rx_ctx));
 
+	if (rxq->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) {
+		switch (rxq->rxseg[0].proto_hdr) {
+		case RTE_PTYPE_L2_ETHER:
+			rx_ctx.dtype = ICE_RX_DTYPE_HEADER_SPLIT;
+			rx_ctx.hsplit_1 = ICE_RLAN_RX_HSPLIT_1_SPLIT_L2;
+			break;
+		case RTE_PTYPE_INNER_L2_ETHER:
+			rx_ctx.dtype = ICE_RX_DTYPE_HEADER_SPLIT;
+			rx_ctx.hsplit_0 = ICE_RLAN_RX_HSPLIT_0_SPLIT_L2;
+			break;
+		case RTE_PTYPE_INNER_L3_IPV4:
+		case RTE_PTYPE_INNER_L3_IPV6:
+			rx_ctx.dtype = ICE_RX_DTYPE_HEADER_SPLIT;
+			rx_ctx.hsplit_0 = ICE_RLAN_RX_HSPLIT_0_SPLIT_IP;
+			break;
+		case RTE_PTYPE_INNER_L4_TCP:
+		case RTE_PTYPE_INNER_L4_UDP:
+			rx_ctx.dtype = ICE_RX_DTYPE_HEADER_SPLIT;
+			rx_ctx.hsplit_0 = ICE_RLAN_RX_HSPLIT_0_SPLIT_TCP_UDP;
+			break;
+		case RTE_PTYPE_INNER_L4_SCTP:
+			rx_ctx.dtype = ICE_RX_DTYPE_HEADER_SPLIT;
+			rx_ctx.hsplit_0 = ICE_RLAN_RX_HSPLIT_0_SPLIT_SCTP;
+			break;
+		case RTE_PTYPE_UNKNOWN:
+			PMD_DRV_LOG(ERR, "Buffer split protocol must be configured");
+			return -EINVAL;
+		default:
+			PMD_DRV_LOG(ERR, "Buffer split protocol is not supported");
+			return -EINVAL;
+		}
+		rxq->rx_hdr_len = ICE_RX_HDR_BUF_SIZE;
+	} else {
+		rxq->rx_hdr_len = 0;
+		rx_ctx.dtype = 0; /* No Protocol Based Buffer Split mode */
+	}
+
 	rx_ctx.base = rxq->rx_ring_dma / ICE_QUEUE_BASE_ADDR_UNIT;
 	rx_ctx.qlen = rxq->nb_rx_desc;
 	rx_ctx.dbuf = rxq->rx_buf_len >> ICE_RLAN_CTX_DBUF_S;
 	rx_ctx.hbuf = rxq->rx_hdr_len >> ICE_RLAN_CTX_HBUF_S;
-	rx_ctx.dtype = 0; /* No Header Split mode */
 #ifndef RTE_LIBRTE_ICE_16BYTE_RX_DESC
 	rx_ctx.dsize = 1; /* 32B descriptors */
 #endif
@@ -401,6 +436,7 @@ ice_alloc_rx_queue_mbufs(struct ice_rx_queue *rxq)
 
 	for (i = 0; i < rxq->nb_rx_desc; i++) {
 		volatile union ice_rx_flex_desc *rxd;
+		rxd = &rxq->rx_ring[i];
 		struct rte_mbuf *mbuf = rte_mbuf_raw_alloc(rxq->mp);
 
 		if (unlikely(!mbuf)) {
@@ -408,8 +444,6 @@ ice_alloc_rx_queue_mbufs(struct ice_rx_queue *rxq)
 			return -ENOMEM;
 		}
 
-		rte_mbuf_refcnt_set(mbuf, 1);
-		mbuf->next = NULL;
 		mbuf->data_off = RTE_PKTMBUF_HEADROOM;
 		mbuf->nb_segs = 1;
 		mbuf->port = rxq->port_id;
@@ -417,9 +451,32 @@ ice_alloc_rx_queue_mbufs(struct ice_rx_queue *rxq)
 		dma_addr =
 			rte_cpu_to_le_64(rte_mbuf_data_iova_default(mbuf));
 
-		rxd = &rxq->rx_ring[i];
-		rxd->read.pkt_addr = dma_addr;
-		rxd->read.hdr_addr = 0;
+		if (!(rxq->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT)) {
+			rte_mbuf_refcnt_set(mbuf, 1);
+			mbuf->next = NULL;
+			rxd->read.hdr_addr = 0;
+			rxd->read.pkt_addr = dma_addr;
+		} else {
+			struct rte_mbuf *mbuf_pay;
+			mbuf_pay = rte_mbuf_raw_alloc(rxq->rxseg[1].mp);
+			if (unlikely(!mbuf_pay)) {
+				PMD_DRV_LOG(ERR, "Failed to allocate payload mbuf for RX");
+				return -ENOMEM;
+			}
+
+			mbuf_pay->next = NULL;
+			mbuf_pay->data_off = RTE_PKTMBUF_HEADROOM;
+			mbuf_pay->nb_segs = 1;
+			mbuf_pay->port = rxq->port_id;
+			mbuf->next = mbuf_pay;
+
+			rxd->read.hdr_addr = dma_addr;
+			/* The LS bit should be set to zero regardless of
+			 * buffer split enablement.
+			 */
+			rxd->read.pkt_addr = rte_cpu_to_le_64(rte_mbuf_data_iova_default(mbuf_pay));
+		}
+
 #ifndef RTE_LIBRTE_ICE_16BYTE_RX_DESC
 		rxd->read.rsvd1 = 0;
 		rxd->read.rsvd2 = 0;
@@ -443,14 +500,14 @@ _ice_rx_queue_release_mbufs(struct ice_rx_queue *rxq)
 
 	for (i = 0; i < rxq->nb_rx_desc; i++) {
 		if (rxq->sw_ring[i].mbuf) {
-			rte_pktmbuf_free_seg(rxq->sw_ring[i].mbuf);
+			rte_pktmbuf_free(rxq->sw_ring[i].mbuf);
 			rxq->sw_ring[i].mbuf = NULL;
 		}
 	}
 	if (rxq->rx_nb_avail == 0)
 		return;
 	for (i = 0; i < rxq->rx_nb_avail; i++)
-		rte_pktmbuf_free_seg(rxq->rx_stage[rxq->rx_next_avail + i]);
+		rte_pktmbuf_free(rxq->rx_stage[rxq->rx_next_avail + i]);
 
 	rxq->rx_nb_avail = 0;
 }
@@ -742,7 +799,7 @@ ice_fdir_program_hw_rx_queue(struct ice_rx_queue *rxq)
 	rx_ctx.qlen = rxq->nb_rx_desc;
 	rx_ctx.dbuf = rxq->rx_buf_len >> ICE_RLAN_CTX_DBUF_S;
 	rx_ctx.hbuf = rxq->rx_hdr_len >> ICE_RLAN_CTX_HBUF_S;
-	rx_ctx.dtype = 0; /* No Header Split mode */
+	rx_ctx.dtype = 0; /* No Buffer Split mode */
 	rx_ctx.dsize = 1; /* 32B descriptors */
 	rx_ctx.rxmax = ICE_ETH_MAX_LEN;
 	/* TPH: Transaction Layer Packet (TLP) processing hints */
@@ -1076,6 +1133,7 @@ ice_rx_queue_setup(struct rte_eth_dev *dev,
 	uint16_t len;
 	int use_def_burst_func = 1;
 	uint64_t offloads;
+	uint16_t n_seg = rx_conf->rx_nseg;
 
 	if (nb_desc % ICE_ALIGN_RING_DESC != 0 ||
 	    nb_desc > ICE_MAX_RING_DESC ||
@@ -1087,6 +1145,15 @@ ice_rx_queue_setup(struct rte_eth_dev *dev,
 
 	offloads = rx_conf->offloads | dev->data->dev_conf.rxmode.offloads;
 
+	if (mp)
+		n_seg = 1;
+
+	if (n_seg > 1 && !(offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT)) {
+		PMD_INIT_LOG(ERR, "port %u queue index %u split offload not configured",
+				dev->data->port_id, queue_idx);
+		return -EINVAL;
+	}
+
 	/* Free memory if needed */
 	if (dev->data->rx_queues[queue_idx]) {
 		ice_rx_queue_release(dev->data->rx_queues[queue_idx]);
@@ -1098,12 +1165,23 @@ ice_rx_queue_setup(struct rte_eth_dev *dev,
 				 sizeof(struct ice_rx_queue),
 				 RTE_CACHE_LINE_SIZE,
 				 socket_id);
+
 	if (!rxq) {
 		PMD_INIT_LOG(ERR, "Failed to allocate memory for "
 			     "rx queue data structure");
 		return -ENOMEM;
 	}
-	rxq->mp = mp;
+
+	rxq->rxseg_nb = n_seg;
+	if (n_seg > 1) {
+		RTE_ASSERT(n_seg <= ICE_RX_MAX_NSEG);
+		rte_memcpy(rxq->rxseg, rx_conf->rx_seg,
+			sizeof(struct rte_eth_rxseg_split) * n_seg);
+		rxq->mp = rxq->rxseg[0].mp;
+	} else {
+		rxq->mp = mp;
+	}
+
 	rxq->nb_rx_desc = nb_desc;
 	rxq->rx_free_thresh = rx_conf->rx_free_thresh;
 	rxq->queue_id = queue_idx;
@@ -1570,7 +1648,7 @@ ice_rx_scan_hw_ring(struct ice_rx_queue *rxq)
 	struct ice_rx_entry *rxep;
 	struct rte_mbuf *mb;
 	uint16_t stat_err0;
-	uint16_t pkt_len;
+	uint16_t pkt_len, hdr_len;
 	int32_t s[ICE_LOOK_AHEAD], nb_dd;
 	int32_t i, j, nb_rx = 0;
 	uint64_t pkt_flags = 0;
@@ -1625,6 +1703,27 @@ ice_rx_scan_hw_ring(struct ice_rx_queue *rxq)
 				   ICE_RX_FLX_DESC_PKT_LEN_M) - rxq->crc_len;
 			mb->data_len = pkt_len;
 			mb->pkt_len = pkt_len;
+
+			if (!(rxq->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT)) {
+				pkt_len = (rte_le_to_cpu_16(rxdp[j].wb.pkt_len) &
+					ICE_RX_FLX_DESC_PKT_LEN_M) - rxq->crc_len;
+				mb->data_len = pkt_len;
+				mb->pkt_len = pkt_len;
+			} else {
+				mb->nb_segs = (uint16_t)(mb->nb_segs + mb->next->nb_segs);
+				mb->next->next = NULL;
+				hdr_len = rte_le_to_cpu_16(rxdp[j].wb.hdr_len_sph_flex_flags1) &
+						ICE_RX_FLEX_DESC_HEADER_LEN_M;
+				pkt_len = (rte_le_to_cpu_16(rxdp[j].wb.pkt_len) &
+					ICE_RX_FLX_DESC_PKT_LEN_M) - rxq->crc_len;
+				mb->data_len = hdr_len;
+				mb->pkt_len = hdr_len + pkt_len;
+				mb->next->data_len = pkt_len;
+#ifdef RTE_ETHDEV_DEBUG_RX
+				rte_pktmbuf_dump(stdout, mb, rte_pktmbuf_pkt_len(mb));
+#endif
+			}
+
 			mb->ol_flags = 0;
 			stat_err0 = rte_le_to_cpu_16(rxdp[j].wb.status_error0);
 			pkt_flags = ice_rxd_error_to_pkt_flags(stat_err0);
@@ -1716,7 +1815,9 @@ ice_rx_alloc_bufs(struct ice_rx_queue *rxq)
 	struct rte_mbuf *mb;
 	uint16_t alloc_idx, i;
 	uint64_t dma_addr;
-	int diag;
+	int diag, diag_pay;
+	uint64_t pay_addr;
+	struct rte_mbuf *mbufs_pay[rxq->rx_free_thresh];
 
 	/* Allocate buffers in bulk */
 	alloc_idx = (uint16_t)(rxq->rx_free_trigger -
@@ -1729,6 +1830,15 @@ ice_rx_alloc_bufs(struct ice_rx_queue *rxq)
 		return -ENOMEM;
 	}
 
+	if (rxq->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) {
+		diag_pay = rte_mempool_get_bulk(rxq->rxseg[1].mp,
+				(void *)mbufs_pay, rxq->rx_free_thresh);
+		if (unlikely(diag_pay != 0)) {
+			PMD_RX_LOG(ERR, "Failed to get payload mbufs in bulk");
+			return -ENOMEM;
+		}
+	}
+
 	rxdp = &rxq->rx_ring[alloc_idx];
 	for (i = 0; i < rxq->rx_free_thresh; i++) {
 		if (likely(i < (rxq->rx_free_thresh - 1)))
@@ -1737,13 +1847,21 @@ ice_rx_alloc_bufs(struct ice_rx_queue *rxq)
 
 		mb = rxep[i].mbuf;
 		rte_mbuf_refcnt_set(mb, 1);
-		mb->next = NULL;
 		mb->data_off = RTE_PKTMBUF_HEADROOM;
 		mb->nb_segs = 1;
 		mb->port = rxq->port_id;
 		dma_addr = rte_cpu_to_le_64(rte_mbuf_data_iova_default(mb));
-		rxdp[i].read.hdr_addr = 0;
-		rxdp[i].read.pkt_addr = dma_addr;
+
+		if (!(rxq->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT)) {
+			mb->next = NULL;
+			rxdp[i].read.hdr_addr = 0;
+			rxdp[i].read.pkt_addr = dma_addr;
+		} else {
+			mb->next = mbufs_pay[i];
+			pay_addr = rte_cpu_to_le_64(rte_mbuf_data_iova_default(mbufs_pay[i]));
+			rxdp[i].read.hdr_addr = dma_addr;
+			rxdp[i].read.pkt_addr = pay_addr;
+		}
 	}
 
 	/* Update Rx tail register */
@@ -2352,11 +2470,13 @@ ice_recv_pkts(void *rx_queue,
 	struct ice_rx_entry *sw_ring = rxq->sw_ring;
 	struct ice_rx_entry *rxe;
 	struct rte_mbuf *nmb; /* new allocated mbuf */
+	struct rte_mbuf *nmb_pay; /* new allocated payload mbuf */
 	struct rte_mbuf *rxm; /* pointer to store old mbuf in SW ring */
 	uint16_t rx_id = rxq->rx_tail;
 	uint16_t nb_rx = 0;
 	uint16_t nb_hold = 0;
 	uint16_t rx_packet_len;
+	uint16_t rx_header_len;
 	uint16_t rx_stat_err0;
 	uint64_t dma_addr;
 	uint64_t pkt_flags;
@@ -2384,12 +2504,16 @@ ice_recv_pkts(void *rx_queue,
 		if (!(rx_stat_err0 & (1 << ICE_RX_FLEX_DESC_STATUS0_DD_S)))
 			break;
 
-		/* allocate mbuf */
+		if (rx_stat_err0 & (1 << ICE_RX_FLEX_DESC_STATUS0_HBO_S))
+			break;
+
+		/* allocate header mbuf */
 		nmb = rte_mbuf_raw_alloc(rxq->mp);
 		if (unlikely(!nmb)) {
 			rxq->vsi->adapter->pf.dev_data->rx_mbuf_alloc_failed++;
 			break;
 		}
+
 		rxd = *rxdp; /* copy descriptor in ring to temp variable*/
 
 		nb_hold++;
@@ -2402,24 +2526,60 @@ ice_recv_pkts(void *rx_queue,
 		dma_addr =
 			rte_cpu_to_le_64(rte_mbuf_data_iova_default(nmb));
 
-		/**
-		 * fill the read format of descriptor with physic address in
-		 * new allocated mbuf: nmb
-		 */
-		rxdp->read.hdr_addr = 0;
-		rxdp->read.pkt_addr = dma_addr;
+		if (!(rxq->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT)) {
+			/**
+			 * fill the read format of descriptor with physic address in
+			 * new allocated mbuf: nmb
+			 */
+			rxdp->read.hdr_addr = 0;
+			rxdp->read.pkt_addr = dma_addr;
+		} else {
+			/* allocate payload mbuf */
+			nmb_pay = rte_mbuf_raw_alloc(rxq->rxseg[1].mp);
+			if (unlikely(!nmb_pay)) {
+				rxq->vsi->adapter->pf.dev_data->rx_mbuf_alloc_failed++;
+				break;
+			}
 
-		/* calculate rx_packet_len of the received pkt */
-		rx_packet_len = (rte_le_to_cpu_16(rxd.wb.pkt_len) &
-				 ICE_RX_FLX_DESC_PKT_LEN_M) - rxq->crc_len;
+			nmb->next = nmb_pay;
+			nmb_pay->next = NULL;
+
+			/**
+			 * fill the read format of descriptor with physic address in
+			 * new allocated mbuf: nmb
+			 */
+			rxdp->read.hdr_addr = dma_addr;
+			rxdp->read.pkt_addr = rte_cpu_to_le_64(rte_mbuf_data_iova_default(nmb_pay));
+		}
 
 		/* fill old mbuf with received descriptor: rxd */
 		rxm->data_off = RTE_PKTMBUF_HEADROOM;
 		rte_prefetch0(RTE_PTR_ADD(rxm->buf_addr, RTE_PKTMBUF_HEADROOM));
-		rxm->nb_segs = 1;
-		rxm->next = NULL;
-		rxm->pkt_len = rx_packet_len;
-		rxm->data_len = rx_packet_len;
+		if (!(rxq->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT)) {
+			rxm->nb_segs = 1;
+			rxm->next = NULL;
+			/* calculate rx_packet_len of the received pkt */
+			rx_packet_len = (rte_le_to_cpu_16(rxd.wb.pkt_len) &
+					ICE_RX_FLX_DESC_PKT_LEN_M) - rxq->crc_len;
+			rxm->data_len = rx_packet_len;
+			rxm->pkt_len = rx_packet_len;
+		} else {
+			rxm->nb_segs = (uint16_t)(rxm->nb_segs + rxm->next->nb_segs);
+			rxm->next->next = NULL;
+			/* calculate rx_packet_len of the received pkt */
+			rx_header_len = rte_le_to_cpu_16(rxd.wb.hdr_len_sph_flex_flags1) &
+					ICE_RX_FLEX_DESC_HEADER_LEN_M;
+			rx_packet_len = (rte_le_to_cpu_16(rxd.wb.pkt_len) &
+					ICE_RX_FLX_DESC_PKT_LEN_M) - rxq->crc_len;
+			rxm->data_len = rx_header_len;
+			rxm->pkt_len = rx_header_len + rx_packet_len;
+			rxm->next->data_len = rx_packet_len;
+
+#ifdef RTE_ETHDEV_DEBUG_RX
+			rte_pktmbuf_dump(stdout, rxm, rte_pktmbuf_pkt_len(rxm));
+#endif
+		}
+
 		rxm->port = rxq->port_id;
 		rxm->packet_type = ptype_tbl[ICE_RX_FLEX_DESC_PTYPE_M &
 			rte_le_to_cpu_16(rxd.wb.ptype_flex_flags0)];
diff --git a/drivers/net/ice/ice_rxtx.h b/drivers/net/ice/ice_rxtx.h
index f5337d5284..d44bde3710 100644
--- a/drivers/net/ice/ice_rxtx.h
+++ b/drivers/net/ice/ice_rxtx.h
@@ -16,6 +16,9 @@
 #define ICE_RX_MAX_BURST 32
 #define ICE_TX_MAX_BURST 32
 
+/* Maximal number of segments to split. */
+#define ICE_RX_MAX_NSEG 2
+
 #define ICE_CHK_Q_ENA_COUNT        100
 #define ICE_CHK_Q_ENA_INTERVAL_US  100
 
@@ -43,6 +46,11 @@
 extern uint64_t ice_timestamp_dynflag;
 extern int ice_timestamp_dynfield_offset;
 
+/* Max header size can be 2K - 64 bytes */
+#define ICE_RX_HDR_BUF_SIZE    (2048 - 64)
+
+#define ICE_HEADER_SPLIT_ENA   BIT(0)
+
 typedef void (*ice_rx_release_mbufs_t)(struct ice_rx_queue *rxq);
 typedef void (*ice_tx_release_mbufs_t)(struct ice_tx_queue *txq);
 typedef void (*ice_rxd_to_pkt_fields_t)(struct ice_rx_queue *rxq,
@@ -53,6 +61,12 @@ struct ice_rx_entry {
 	struct rte_mbuf *mbuf;
 };
 
+enum ice_rx_dtype {
+	ICE_RX_DTYPE_NO_SPLIT       = 0,
+	ICE_RX_DTYPE_HEADER_SPLIT   = 1,
+	ICE_RX_DTYPE_SPLIT_ALWAYS   = 2,
+};
+
 struct ice_rx_queue {
 	struct rte_mempool *mp; /* mbuf pool to populate RX ring */
 	volatile union ice_rx_flex_desc *rx_ring;/* RX ring virtual address */
@@ -98,6 +112,8 @@ struct ice_rx_queue {
 	uint32_t hw_time_high; /* high 32 bits of timestamp */
 	uint32_t hw_time_low; /* low 32 bits of timestamp */
 	uint64_t hw_time_update; /* SW time of HW record updating */
+	struct rte_eth_rxseg_split rxseg[ICE_RX_MAX_NSEG];
+	uint32_t rxseg_nb;
 };
 
 struct ice_tx_entry {
diff --git a/drivers/net/ice/ice_rxtx_vec_common.h b/drivers/net/ice/ice_rxtx_vec_common.h
index 2dd2d83650..eec6ea2134 100644
--- a/drivers/net/ice/ice_rxtx_vec_common.h
+++ b/drivers/net/ice/ice_rxtx_vec_common.h
@@ -291,6 +291,9 @@ ice_rx_vec_queue_default(struct ice_rx_queue *rxq)
 	if (rxq->offloads & RTE_ETH_RX_OFFLOAD_TIMESTAMP)
 		return -1;
 
+	if (rxq->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT)
+		return -1;
+
 	if (rxq->offloads & ICE_RX_VECTOR_OFFLOAD)
 		return ICE_VECTOR_OFFLOAD_PATH;
 
-- 
2.25.1


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v3 1/4] ethdev: introduce protocol header API
  2022-09-02 19:10   ` [PATCH v3 1/4] ethdev: introduce protocol header API Yuan Wang
@ 2022-09-12 11:24     ` Andrew Rybchenko
  2022-09-16  8:34       ` Wang, YuanX
  0 siblings, 1 reply; 72+ messages in thread
From: Andrew Rybchenko @ 2022-09-12 11:24 UTC (permalink / raw)
  To: Yuan Wang, dev, Thomas Monjalon, Ferruh Yigit, Ray Kinsella
  Cc: xiaoyun.li, aman.deep.singh, yuying.zhang, qi.z.zhang,
	qiming.yang, jerinjacobk, viacheslavo, stephen, xuan.ding,
	hpothula, yaqi.tang, Wenxuan Wu

On 9/2/22 22:10, Yuan Wang wrote:
> Add a new ethdev API to retrieve supported protocol headers
> of a PMD, which helps to configure protocol header based buffer split.
> 
> Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
> Signed-off-by: Xuan Ding <xuan.ding@intel.com>
> Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>

Nit below. Other than that:
Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>

> diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
> index 1979dc0850..093c577add 100644
> --- a/lib/ethdev/rte_ethdev.c
> +++ b/lib/ethdev/rte_ethdev.c
> @@ -5917,6 +5917,39 @@ rte_eth_dev_priv_dump(uint16_t port_id, FILE *file)
>   	return eth_err(port_id, (*dev->dev_ops->eth_dev_priv_dump)(dev, file));
>   }
>   
> +int
> +rte_eth_buffer_split_get_supported_hdr_ptypes(uint16_t port_id, uint32_t *ptypes, int num)
> +{
> +	int i, j;
> +	struct rte_eth_dev *dev;
> +	const uint32_t *all_types;
> +
> +	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
> +	dev = &rte_eth_devices[port_id];
> +
> +	if (ptypes == NULL && num > 0) {
> +		RTE_ETHDEV_LOG(ERR,
> +			"Cannot get ethdev port %u supported header protocol types to NULL "
> +			"when array size is non zero\n",

Do not split log message across many lines. Too long line is a
less evil which is accepted by checkpatches.

> +			port_id);
> +		return -EINVAL;
> +	}

[snip]

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v3 2/4] ethdev: introduce protocol hdr based buffer split
  2022-09-02 19:10   ` [PATCH v3 2/4] ethdev: introduce protocol hdr based buffer split Yuan Wang
@ 2022-09-12 11:47     ` Andrew Rybchenko
  2022-09-16  8:38       ` Wang, YuanX
  2022-09-13  7:56     ` Suanming Mou
  1 sibling, 1 reply; 72+ messages in thread
From: Andrew Rybchenko @ 2022-09-12 11:47 UTC (permalink / raw)
  To: Yuan Wang, dev, Thomas Monjalon, Ferruh Yigit
  Cc: mdr, xiaoyun.li, aman.deep.singh, yuying.zhang, qi.z.zhang,
	qiming.yang, jerinjacobk, viacheslavo, stephen, xuan.ding,
	hpothula, yaqi.tang, Wenxuan Wu

On 9/2/22 22:10, Yuan Wang wrote:
> Currently, Rx buffer split supports length based split. With Rx queue
> offload RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT enabled and Rx packet segment
> configured, PMD will be able to split the received packets into
> multiple segments.
> 
> However, length based buffer split is not suitable for NICs that do split
> based on protocol headers. Given an arbitrarily variable length in Rx
> packet segment, it is almost impossible to pass a fixed protocol header to
> driver. Besides, the existence of tunneling results in the composition of
> a packet is various, which makes the situation even worse.
> 
> This patch extends current buffer split to support protocol header based
> buffer split. A new proto_hdr field is introduced in the reserved field
> of rte_eth_rxseg_split structure to specify protocol header. The proto_hdr
> field defines the split position of packet, splitting will always happens
> after the protocol header defined in the Rx packet segment. When Rx queue
> offload RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT is enabled and corresponding
> protocol header is configured, driver will split the ingress packets into
> multiple segments.
> 
> struct rte_eth_rxseg_split {
>          struct rte_mempool *mp; /* memory pools to allocate segment from */
>          uint16_t length; /* segment maximal data length,
>                              configures split point */
>          uint16_t offset; /* data offset from beginning
>                              of mbuf data buffer */
>          uint32_t proto_hdr; /* supported ptype of a specific pmd,
>                                 configures split point.
> 			       It should be defined by RTE_PTYPE_*

If I understand correctly, the statement is a bit misleading
since it should be a bit mask of RTE_PTYPE_* defines. Not
exactly one RTE_PTYPE_*.

> 			     */
> };
> 
> If protocol header split can be supported by a PMD. The
> rte_eth_buffer_split_get_supported_hdr_ptypes function can
> be use to obtain a list of these protocol headers.
> 
> For example, let's suppose we configured the Rx queue with the
> following segments:
>          seg0 - pool0, proto_hdr0=RTE_PTYPE_L3_IPV4, off0=2B
>          seg1 - pool1, proto_hdr1=RTE_PTYPE_L4_UDP, off1=128B
>          seg2 - pool2, off1=0B
> 
> The packet consists of MAC_IPV4_UDP_PAYLOAD will be split like

What is MAC_IPV4_UDP_PAYLOAD? Do you mean ETH_IPV4_UDP_PAYLOAD?

> following:
>          seg0 - ipv4 header @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from pool0
>          seg1 - udp header @ 128 in mbuf from pool1
>          seg2 - payload @ 0 in mbuf from pool2
> 
> Note: NIC will only do split when the packets exactly match all the
> protocol headers in the segments. For example, if ARP packets received
> with above config, the NIC won't do split for ARP packets since
> it does not contains ipv4 header and udp header.

You must define which mempool is used in that case.

> 
> Now buffer split can be configured in two modes. For length based
> buffer split, the mp, length, offset field in Rx packet segment should
> be configured, while the proto_hdr field will be ignored.
> For protocol header based buffer split, the mp, offset, proto_hdr field
> in Rx packet segment should be configured, while the length field will
> be ignored.
> 
> The split limitations imposed by underlying driver is reported in the
> rte_eth_dev_info->rx_seg_capa field. The memory attributes for the split
> parts may differ either, dpdk memory and external memory, respectively.
> 
> Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
> Signed-off-by: Xuan Ding <xuan.ding@intel.com>
> Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
> ---
>   doc/guides/rel_notes/release_22_11.rst |  5 +++
>   lib/ethdev/rte_ethdev.c                | 55 ++++++++++++++++++++------
>   lib/ethdev/rte_ethdev.h                | 17 +++++++-
>   3 files changed, 65 insertions(+), 12 deletions(-)
> 
> diff --git a/doc/guides/rel_notes/release_22_11.rst b/doc/guides/rel_notes/release_22_11.rst
> index 4d90514a9a..f3b58c7895 100644
> --- a/doc/guides/rel_notes/release_22_11.rst
> +++ b/doc/guides/rel_notes/release_22_11.rst
> @@ -60,6 +60,11 @@ New Features
>     Added ``rte_eth_buffer_split_get_supported_hdr_ptypes()``, to get supported
>     header protocols of a PMD to split.
>   
> +* **Added protocol header based buffer split.**
> +  Ethdev: The ``reserved`` field in the  ``rte_eth_rxseg_split`` structure is
> +  replaced with ``proto_hdr`` to support protocol header based buffer split.
> +  User can choose length or protocol header to configure buffer split
> +  according to NIC's capability.

Add one more empty line to have two before the next section.

>   
>   Removed Items
>   -------------
> diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
> index 093c577add..dfceb723ee 100644
> --- a/lib/ethdev/rte_ethdev.c
> +++ b/lib/ethdev/rte_ethdev.c
> @@ -1635,9 +1635,10 @@ rte_eth_dev_is_removed(uint16_t port_id)
>   }
>   
>   static int
> -rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
> -			     uint16_t n_seg, uint32_t *mbp_buf_size,
> -			     const struct rte_eth_dev_info *dev_info)
> +rte_eth_rx_queue_check_split(uint16_t port_id,
> +			const struct rte_eth_rxseg_split *rx_seg,
> +			uint16_t n_seg, uint32_t *mbp_buf_size,
> +			const struct rte_eth_dev_info *dev_info)
>   {
>   	const struct rte_eth_rxseg_capa *seg_capa = &dev_info->rx_seg_capa;
>   	struct rte_mempool *mp_first;
> @@ -1660,6 +1661,7 @@ rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
>   		struct rte_mempool *mpl = rx_seg[seg_idx].mp;
>   		uint32_t length = rx_seg[seg_idx].length;
>   		uint32_t offset = rx_seg[seg_idx].offset;
> +		uint32_t proto_hdr = rx_seg[seg_idx].proto_hdr;
>   
>   		if (mpl == NULL) {
>   			RTE_ETHDEV_LOG(ERR, "null mempool pointer\n");
> @@ -1693,13 +1695,44 @@ rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
>   		}
>   		offset += seg_idx != 0 ? 0 : RTE_PKTMBUF_HEADROOM;
>   		*mbp_buf_size = rte_pktmbuf_data_room_size(mpl);
> -		length = length != 0 ? length : *mbp_buf_size;
> -		if (*mbp_buf_size < length + offset) {
> -			RTE_ETHDEV_LOG(ERR,
> -				       "%s mbuf_data_room_size %u < %u (segment length=%u + segment offset=%u)\n",
> -				       mpl->name, *mbp_buf_size,
> -				       length + offset, length, offset);
> -			return -EINVAL;
> +
> +		int ret = rte_eth_buffer_split_get_supported_hdr_ptypes(port_id, NULL, 0);

Do not mix variable declaration and code.
It is better to give the variable some sensible name.
Otherwise else branch code is hard to read.

> +		if (ret <= 0) {

Maybe I'm missing something, but nothing prevents a driver/HW
from supporting both protocol-based and fixed-length split.
So, the ability to support protocol based split should not, by itself,
be treated as a request to do it. The request must be based on the
rx_seg->proto_hdr content (for all segments).

Also, nothing should prevent mixing protocol and fixed-length
split, i.e. split just after UDP in the first segment,
40 bytes in the second segment, everything else in the third.

> +			/* Split at fixed length. */
> +			length = length != 0 ? length : *mbp_buf_size;
> +			if (*mbp_buf_size < length + offset) {
> +				RTE_ETHDEV_LOG(ERR,
> +					"%s mbuf_data_room_size %u < %u (segment length=%u + segment offset=%u)\n",
> +					mpl->name, *mbp_buf_size,
> +					length + offset, length, offset);
> +				return -EINVAL;
> +			}
> +		} else {
> +			/* Split after specified protocol header. */
> +			uint32_t ptypes[ret];
> +			int i;
> +
> +			ret = rte_eth_buffer_split_get_supported_hdr_ptypes(port_id, ptypes, ret);

In theory, the function could fail since the input arguments
differ. So, it should be handled.

> +			for (i = 0; i < ret; i++)
> +				if (ptypes[i] & proto_hdr)

IMHO it should be ==, not &. I think that
rte_eth_buffer_split_get_supported_hdr_ptypes() should define
points at which split could happen and we should match the
point exactly.

> +					break;
> +
> +			if (i == ret) {
> +#define PTYPE_NAMESIZE	256

Why? It looks really strange that it is defined here.

> +				char ptype_name[PTYPE_NAMESIZE];
> +				rte_get_ptype_name(proto_hdr, ptype_name, sizeof(ptype_name));
> +				RTE_ETHDEV_LOG(ERR,
> +					"Protocol header %s is not supported.\n",
> +					ptype_name);
> +				return -EINVAL;
> +			}
> +			if (*mbp_buf_size < offset) {
> +				RTE_ETHDEV_LOG(ERR,
> +						"%s mbuf_data_room_size %u < %u segment offset)\n",
> +						mpl->name, *mbp_buf_size,
> +						offset);
> +				return -EINVAL;
> +			}
>   		}
>   	}
>   	return 0;
> @@ -1778,7 +1811,7 @@ rte_eth_rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
>   		n_seg = rx_conf->rx_nseg;
>   
>   		if (rx_conf->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) {
> -			ret = rte_eth_rx_queue_check_split(rx_seg, n_seg,
> +			ret = rte_eth_rx_queue_check_split(port_id, rx_seg, n_seg,
>   							   &mbp_buf_size,
>   							   &dev_info);
>   			if (ret != 0)
> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> index c58c908c3a..410fba5eab 100644
> --- a/lib/ethdev/rte_ethdev.h
> +++ b/lib/ethdev/rte_ethdev.h
> @@ -1175,6 +1175,9 @@ struct rte_eth_txmode {
>    *   specified in the first array element, the second buffer, from the
>    *   pool in the second element, and so on.
>    *
> + * - The proto_hdrs in the elements define the split position of
> + *   received packets.
> + *
>    * - The offsets from the segment description elements specify
>    *   the data offset from the buffer beginning except the first mbuf.
>    *   The first segment offset is added with RTE_PKTMBUF_HEADROOM.
> @@ -1196,12 +1199,24 @@ struct rte_eth_txmode {
>    *     - pool from the last valid element
>    *     - the buffer size from this pool
>    *     - zero offset
> + *
> + * - Length based buffer split:
> + *     - mp, length, offset should be configured.
> + *     - The proto_hdr field will be ignored.
> + *
> + * - Protocol header based buffer split:
> + *     - mp, offset, proto_hdr should be configured.
> + *     - The length field will be ignored.
>    */
>   struct rte_eth_rxseg_split {
>   	struct rte_mempool *mp; /**< Memory pool to allocate segment from. */
>   	uint16_t length; /**< Segment data length, configures split point. */
>   	uint16_t offset; /**< Data offset from beginning of mbuf data buffer. */
> -	uint32_t reserved; /**< Reserved field. */
> +	/**
> +	 * Supported ptype of a specific pmd, configures split point.
> +	 * It should be defined by RTE_PTYPE_*.
> +	 */
> +	uint32_t proto_hdr;
>   };
>   
>   /**


^ permalink raw reply	[flat|nested] 72+ messages in thread

* RE: [PATCH v3 2/4] ethdev: introduce protocol hdr based buffer split
  2022-09-02 19:10   ` [PATCH v3 2/4] ethdev: introduce protocol hdr based buffer split Yuan Wang
  2022-09-12 11:47     ` Andrew Rybchenko
@ 2022-09-13  7:56     ` Suanming Mou
  2022-09-16  8:39       ` Wang, YuanX
  1 sibling, 1 reply; 72+ messages in thread
From: Suanming Mou @ 2022-09-13  7:56 UTC (permalink / raw)
  To: Yuan Wang, dev, NBU-Contact-Thomas Monjalon (EXTERNAL),
	Ferruh Yigit, Andrew Rybchenko
  Cc: mdr, xiaoyun.li, aman.deep.singh, yuying.zhang, qi.z.zhang,
	qiming.yang, jerinjacobk, Slava Ovsiienko, stephen, xuan.ding,
	hpothula, yaqi.tang, Wenxuan Wu

Hi

> -----Original Message-----
> From: Yuan Wang <yuanx.wang@intel.com>
> Sent: Saturday, September 3, 2022 3:10 AM
> To: dev@dpdk.org; NBU-Contact-Thomas Monjalon (EXTERNAL)
> <thomas@monjalon.net>; Ferruh Yigit <ferruh.yigit@xilinx.com>; Andrew
> Rybchenko <andrew.rybchenko@oktetlabs.ru>
> Cc: mdr@ashroe.eu; xiaoyun.li@intel.com; aman.deep.singh@intel.com;
> yuying.zhang@intel.com; qi.z.zhang@intel.com; qiming.yang@intel.com;
> jerinjacobk@gmail.com; Slava Ovsiienko <viacheslavo@nvidia.com>;
> stephen@networkplumber.org; xuan.ding@intel.com; hpothula@marvell.com;
> yaqi.tang@intel.com; Yuan Wang <yuanx.wang@intel.com>; Wenxuan Wu
> <wenxuanx.wu@intel.com>
> Subject: [PATCH v3 2/4] ethdev: introduce protocol hdr based buffer split
> 

snip

> @@ -1693,13 +1695,44 @@ rte_eth_rx_queue_check_split(const struct
> rte_eth_rxseg_split *rx_seg,
>  		}
>  		offset += seg_idx != 0 ? 0 : RTE_PKTMBUF_HEADROOM;
>  		*mbp_buf_size = rte_pktmbuf_data_room_size(mpl);
> -		length = length != 0 ? length : *mbp_buf_size;
> -		if (*mbp_buf_size < length + offset) {
> -			RTE_ETHDEV_LOG(ERR,
> -				       "%s mbuf_data_room_size %u < %u
> (segment length=%u + segment offset=%u)\n",
> -				       mpl->name, *mbp_buf_size,
> -				       length + offset, length, offset);
> -			return -EINVAL;
> +
> +		int ret =
> rte_eth_buffer_split_get_supported_hdr_ptypes(port_id, NULL, 0);

One small question: since ptypes == NULL and num == 0, I assume ret will always be <= 0, right?

> +		if (ret <= 0) {
> +			/* Split at fixed length. */
> +			length = length != 0 ? length : *mbp_buf_size;
> +			if (*mbp_buf_size < length + offset) {
> +				RTE_ETHDEV_LOG(ERR,
> +					"%s mbuf_data_room_size %u < %u
> (segment length=%u + segment offset=%u)\n",
> +					mpl->name, *mbp_buf_size,
> +					length + offset, length, offset);
> +				return -EINVAL;
> +			}
> +		} else {
> +			/* Split after specified protocol header. */
> +			uint32_t ptypes[ret];
> +			int i;
> +
> +			ret =
> rte_eth_buffer_split_get_supported_hdr_ptypes(port_id, ptypes, ret);
> +			for (i = 0; i < ret; i++)
> +				if (ptypes[i] & proto_hdr)
> +					break;
> +

snip

^ permalink raw reply	[flat|nested] 72+ messages in thread

* RE: [PATCH v3 1/4] ethdev: introduce protocol header API
  2022-09-12 11:24     ` Andrew Rybchenko
@ 2022-09-16  8:34       ` Wang, YuanX
  0 siblings, 0 replies; 72+ messages in thread
From: Wang, YuanX @ 2022-09-16  8:34 UTC (permalink / raw)
  To: Andrew Rybchenko, dev, Thomas Monjalon, Ferruh Yigit, Ray Kinsella
  Cc: Li, Xiaoyun, Singh, Aman Deep, Zhang, Yuying, Zhang, Qi Z, Yang,
	Qiming, jerinjacobk, viacheslavo, stephen, Ding, Xuan, hpothula,
	Tang, Yaqi, Wenxuan Wu

Hi Andrew,

> -----Original Message-----
> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> Sent: Monday, September 12, 2022 7:25 PM
> To: Wang, YuanX <yuanx.wang@intel.com>; dev@dpdk.org; Thomas
> Monjalon <thomas@monjalon.net>; Ferruh Yigit <ferruh.yigit@xilinx.com>;
> Ray Kinsella <mdr@ashroe.eu>
> Cc: Li, Xiaoyun <xiaoyun.li@intel.com>; Singh, Aman Deep
> <aman.deep.singh@intel.com>; Zhang, Yuying <yuying.zhang@intel.com>;
> Zhang, Qi Z <qi.z.zhang@intel.com>; Yang, Qiming <qiming.yang@intel.com>;
> jerinjacobk@gmail.com; viacheslavo@nvidia.com;
> stephen@networkplumber.org; Ding, Xuan <xuan.ding@intel.com>;
> hpothula@marvell.com; Tang, Yaqi <yaqi.tang@intel.com>; Wenxuan Wu
> <wenxuanx.wu@intel.com>
> Subject: Re: [PATCH v3 1/4] ethdev: introduce protocol header API
> 
> On 9/2/22 22:10, Yuan Wang wrote:
> > Add a new ethdev API to retrieve supported protocol headers of a PMD,
> > which helps to configure protocol header based buffer split.
> >
> > Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
> > Signed-off-by: Xuan Ding <xuan.ding@intel.com>
> > Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
> 
> Nit below. Other than that:
> Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> 
> > diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c index
> > 1979dc0850..093c577add 100644
> > --- a/lib/ethdev/rte_ethdev.c
> > +++ b/lib/ethdev/rte_ethdev.c
> > @@ -5917,6 +5917,39 @@ rte_eth_dev_priv_dump(uint16_t port_id, FILE
> *file)
> >   	return eth_err(port_id, (*dev->dev_ops->eth_dev_priv_dump)(dev,
> file));
> >   }
> >
> > +int
> > +rte_eth_buffer_split_get_supported_hdr_ptypes(uint16_t port_id,
> > +uint32_t *ptypes, int num) {
> > +	int i, j;
> > +	struct rte_eth_dev *dev;
> > +	const uint32_t *all_types;
> > +
> > +	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
> > +	dev = &rte_eth_devices[port_id];
> > +
> > +	if (ptypes == NULL && num > 0) {
> > +		RTE_ETHDEV_LOG(ERR,
> > +			"Cannot get ethdev port %u supported header
> protocol types to NULL "
> > +			"when array size is non zero\n",
> 
> Do not split log message across many lines. Too long line is a less evil which is
> accepted by checkpatches.

Thanks, will update in v4.

Thanks,
Yuan

> 
> > +			port_id);
> > +		return -EINVAL;
> > +	}
> 
> [snip]

^ permalink raw reply	[flat|nested] 72+ messages in thread

* RE: [PATCH v3 2/4] ethdev: introduce protocol hdr based buffer split
  2022-09-12 11:47     ` Andrew Rybchenko
@ 2022-09-16  8:38       ` Wang, YuanX
  2022-09-20  5:35         ` Andrew Rybchenko
  0 siblings, 1 reply; 72+ messages in thread
From: Wang, YuanX @ 2022-09-16  8:38 UTC (permalink / raw)
  To: Andrew Rybchenko, dev, Thomas Monjalon, Ferruh Yigit
  Cc: mdr, Li, Xiaoyun, Singh, Aman Deep, Zhang, Yuying, Zhang, Qi Z,
	Yang, Qiming, jerinjacobk, viacheslavo, stephen, Ding, Xuan,
	hpothula, Tang, Yaqi, Wenxuan Wu

Hi Andrew,

> -----Original Message-----
> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> Sent: Monday, September 12, 2022 7:47 PM
> To: Wang, YuanX <yuanx.wang@intel.com>; dev@dpdk.org; Thomas
> Monjalon <thomas@monjalon.net>; Ferruh Yigit <ferruh.yigit@xilinx.com>
> Cc: mdr@ashroe.eu; Li, Xiaoyun <xiaoyun.li@intel.com>; Singh, Aman Deep
> <aman.deep.singh@intel.com>; Zhang, Yuying <yuying.zhang@intel.com>;
> Zhang, Qi Z <qi.z.zhang@intel.com>; Yang, Qiming <qiming.yang@intel.com>;
> jerinjacobk@gmail.com; viacheslavo@nvidia.com;
> stephen@networkplumber.org; Ding, Xuan <xuan.ding@intel.com>;
> hpothula@marvell.com; Tang, Yaqi <yaqi.tang@intel.com>; Wenxuan Wu
> <wenxuanx.wu@intel.com>
> Subject: Re: [PATCH v3 2/4] ethdev: introduce protocol hdr based buffer split
> 
> On 9/2/22 22:10, Yuan Wang wrote:
> > Currently, Rx buffer split supports length based split. With Rx queue
> > offload RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT enabled and Rx packet
> segment
> > configured, PMD will be able to split the received packets into
> > multiple segments.
> >
> > However, length based buffer split is not suitable for NICs that do
> > split based on protocol headers. Given an arbitrarily variable length
> > in Rx packet segment, it is almost impossible to pass a fixed protocol
> > header to driver. Besides, the existence of tunneling results in the
> > composition of a packet is various, which makes the situation even worse.
> >
> > This patch extends current buffer split to support protocol header
> > based buffer split. A new proto_hdr field is introduced in the
> > reserved field of rte_eth_rxseg_split structure to specify protocol
> > header. The proto_hdr field defines the split position of packet,
> > splitting will always happens after the protocol header defined in the
> > Rx packet segment. When Rx queue offload
> > RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT is enabled and corresponding
> protocol
> > header is configured, driver will split the ingress packets into multiple
> segments.
> >
> > struct rte_eth_rxseg_split {
> >          struct rte_mempool *mp; /* memory pools to allocate segment from
> */
> >          uint16_t length; /* segment maximal data length,
> >                              configures split point */
> >          uint16_t offset; /* data offset from beginning
> >                              of mbuf data buffer */
> >          uint32_t proto_hdr; /* supported ptype of a specific pmd,
> >                                 configures split point.
> > 			       It should be defined by RTE_PTYPE_*
> 
> If I understand correctly, the statement is a bit misleading since it should be a
> bit mask of RTE_PTYPE_* defines. Not exactly one RTE_PTYPE_*.

Do you mean that a segment should support multiple protocol headers, such as splitting both tcp and udp headers?

> 
> > 			     */
> > };
> >
> > If protocol header split can be supported by a PMD. The
> > rte_eth_buffer_split_get_supported_hdr_ptypes function can be use to
> > obtain a list of these protocol headers.
> >
> > For example, let's suppose we configured the Rx queue with the
> > following segments:
> >          seg0 - pool0, proto_hdr0=RTE_PTYPE_L3_IPV4, off0=2B
> >          seg1 - pool1, proto_hdr1=RTE_PTYPE_L4_UDP, off1=128B
> >          seg2 - pool2, off1=0B
> >
> > The packet consists of MAC_IPV4_UDP_PAYLOAD will be split like
> 
> What is MAC_IPV4_UDP_PAYLOAD? Do you mean ETH_IPV4_UDP_PAYLOAD?

Thanks for your correction, it should be ETH_IPV4_UDP_PAYLOAD.

> 
> > following:
> >          seg0 - ipv4 header @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from
> pool0
> >          seg1 - udp header @ 128 in mbuf from pool1
> >          seg2 - payload @ 0 in mbuf from pool2
> >
> > Note: NIC will only do split when the packets exactly match all the
> > protocol headers in the segments. For example, if ARP packets received
> > with above config, the NIC won't do split for ARP packets since it
> > does not contains ipv4 header and udp header.
> 
> You must define which mempool is used in the case.

IMHO we can't define which mempool to use, it depends on NIC behavior.
For our NIC, packets that cannot be split will be put into the last valid pool, with zero offset.
So here we would like to define that these packets are put into the last valid mempool, with zero offset.

> 
> >
> > Now buffer split can be configured in two modes. For length based
> > buffer split, the mp, length, offset field in Rx packet segment should
> > be configured, while the proto_hdr field will be ignored.
> > For protocol header based buffer split, the mp, offset, proto_hdr
> > field in Rx packet segment should be configured, while the length
> > field will be ignored.
> >
> > The split limitations imposed by underlying driver is reported in the
> > rte_eth_dev_info->rx_seg_capa field. The memory attributes for the
> > split parts may differ either, dpdk memory and external memory,
> respectively.
> >
> > Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
> > Signed-off-by: Xuan Ding <xuan.ding@intel.com>
> > Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
> > ---
> >   doc/guides/rel_notes/release_22_11.rst |  5 +++
> >   lib/ethdev/rte_ethdev.c                | 55 ++++++++++++++++++++------
> >   lib/ethdev/rte_ethdev.h                | 17 +++++++-
> >   3 files changed, 65 insertions(+), 12 deletions(-)
> >
> > diff --git a/doc/guides/rel_notes/release_22_11.rst
> > b/doc/guides/rel_notes/release_22_11.rst
> > index 4d90514a9a..f3b58c7895 100644
> > --- a/doc/guides/rel_notes/release_22_11.rst
> > +++ b/doc/guides/rel_notes/release_22_11.rst
> > @@ -60,6 +60,11 @@ New Features
> >     Added ``rte_eth_buffer_split_get_supported_hdr_ptypes()``, to get
> supported
> >     header protocols of a PMD to split.
> >
> > +* **Added protocol header based buffer split.**
> > +  Ethdev: The ``reserved`` field in the  ``rte_eth_rxseg_split``
> > +structure is
> > +  replaced with ``proto_hdr`` to support protocol header based buffer split.
> > +  User can choose length or protocol header to configure buffer split
> > +  according to NIC's capability.
> 
> Add one more empty line to have two before the next sectoin.

Thanks for your catch.

> 
> >
> >   Removed Items
> >   -------------
> > diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c index
> > 093c577add..dfceb723ee 100644
> > --- a/lib/ethdev/rte_ethdev.c
> > +++ b/lib/ethdev/rte_ethdev.c
> > @@ -1635,9 +1635,10 @@ rte_eth_dev_is_removed(uint16_t port_id)
> >   }
> >
> >   static int
> > -rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
> > -			     uint16_t n_seg, uint32_t *mbp_buf_size,
> > -			     const struct rte_eth_dev_info *dev_info)
> > +rte_eth_rx_queue_check_split(uint16_t port_id,
> > +			const struct rte_eth_rxseg_split *rx_seg,
> > +			uint16_t n_seg, uint32_t *mbp_buf_size,
> > +			const struct rte_eth_dev_info *dev_info)
> >   {
> >   	const struct rte_eth_rxseg_capa *seg_capa = &dev_info-
> >rx_seg_capa;
> >   	struct rte_mempool *mp_first;
> > @@ -1660,6 +1661,7 @@ rte_eth_rx_queue_check_split(const struct
> rte_eth_rxseg_split *rx_seg,
> >   		struct rte_mempool *mpl = rx_seg[seg_idx].mp;
> >   		uint32_t length = rx_seg[seg_idx].length;
> >   		uint32_t offset = rx_seg[seg_idx].offset;
> > +		uint32_t proto_hdr = rx_seg[seg_idx].proto_hdr;
> >
> >   		if (mpl == NULL) {
> >   			RTE_ETHDEV_LOG(ERR, "null mempool pointer\n");
> @@ -1693,13
> > +1695,44 @@ rte_eth_rx_queue_check_split(const struct
> rte_eth_rxseg_split *rx_seg,
> >   		}
> >   		offset += seg_idx != 0 ? 0 : RTE_PKTMBUF_HEADROOM;
> >   		*mbp_buf_size = rte_pktmbuf_data_room_size(mpl);
> > -		length = length != 0 ? length : *mbp_buf_size;
> > -		if (*mbp_buf_size < length + offset) {
> > -			RTE_ETHDEV_LOG(ERR,
> > -				       "%s mbuf_data_room_size %u < %u
> (segment length=%u + segment offset=%u)\n",
> > -				       mpl->name, *mbp_buf_size,
> > -				       length + offset, length, offset);
> > -			return -EINVAL;
> > +
> > +		int ret =
> rte_eth_buffer_split_get_supported_hdr_ptypes(port_id,
> > +NULL, 0);
> 
> Do not mix variable declaration and code.
> It is better to give the variable some sensible name.
> Otherwise else branch code is hard to read.

Thanks for the suggestion, will take care of naming.

> 
> > +		if (ret <= 0) {
> 
> May be I'm missing something, but nothing prevetns a driver/HW to support
> both protocol-based and fixed-length split.
> So, ability to support protocol based split should be treated as a request to
> do it. It must be based on rx_seg->proto_hdr content (for all segments).
> 
> Also nothing should prevent to mix protocol and fixed-length split. I.e. split
> just after UDP in the first segment,
> 40 bytes in the second segment, everything else in the third.

Mixed mode is an interesting idea. Currently testpmd and the driver do not support mixed mode, but that does not prevent the library from supporting this mode.

> 
> > +			/* Split at fixed length. */
> > +			length = length != 0 ? length : *mbp_buf_size;
> > +			if (*mbp_buf_size < length + offset) {
> > +				RTE_ETHDEV_LOG(ERR,
> > +					"%s mbuf_data_room_size %u < %u
> (segment length=%u + segment offset=%u)\n",
> > +					mpl->name, *mbp_buf_size,
> > +					length + offset, length, offset);
> > +				return -EINVAL;
> > +			}
> > +		} else {
> > +			/* Split after specified protocol header. */
> > +			uint32_t ptypes[ret];
> > +			int i;
> > +
> > +			ret =
> rte_eth_buffer_split_get_supported_hdr_ptypes(port_id,
> > +ptypes, ret);
> 
> In theory, the funciton could fail since input arguments differ. So, it should be
> handled.

Thanks for your catch, will fix in the next version.

> 
> > +			for (i = 0; i < ret; i++)
> > +				if (ptypes[i] & proto_hdr)
> 
> IMHO it should be ==, not &. I think that
> rte_eth_buffer_split_get_supported_hdr_ptypes() should define points at
> which split could happen and we should match the point exactly.

Sure, == is better. Thanks for the suggestion.

> 
> > +					break;
> > +
> > +			if (i == ret) {
> > +#define PTYPE_NAMESIZE	256
> 
> Why? It is looks really strange that it is defined here.

I intend to display the protocol name in the log, but if the proto_hdr is a bit mask, can I just show the number?
Please see v4 for this modification.

> 
> > +				char ptype_name[PTYPE_NAMESIZE];
> > +				rte_get_ptype_name(proto_hdr,
> ptype_name, sizeof(ptype_name));
> > +				RTE_ETHDEV_LOG(ERR,
> > +					"Protocol header %s is not
> supported.\n",
> > +					ptype_name);
> > +				return -EINVAL;
> > +			}
> > +			if (*mbp_buf_size < offset) {
> > +				RTE_ETHDEV_LOG(ERR,
> > +						"%s
> mbuf_data_room_size %u < %u segment offset)\n",
> > +						mpl->name, *mbp_buf_size,
> > +						offset);
> > +				return -EINVAL;
> > +			}
> >   		}
> >   	}
> >   	return 0;
> > @@ -1778,7 +1811,7 @@ rte_eth_rx_queue_setup(uint16_t port_id,
> uint16_t rx_queue_id,
> >   		n_seg = rx_conf->rx_nseg;
> >
> >   		if (rx_conf->offloads &
> RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) {
> > -			ret = rte_eth_rx_queue_check_split(rx_seg, n_seg,
> > +			ret = rte_eth_rx_queue_check_split(port_id, rx_seg,
> n_seg,
> >   							   &mbp_buf_size,
> >   							   &dev_info);
> >   			if (ret != 0)
> > diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h index
> > c58c908c3a..410fba5eab 100644
> > --- a/lib/ethdev/rte_ethdev.h
> > +++ b/lib/ethdev/rte_ethdev.h
> > @@ -1175,6 +1175,9 @@ struct rte_eth_txmode {
> >    *   specified in the first array element, the second buffer, from the
> >    *   pool in the second element, and so on.
> >    *
> > + * - The proto_hdrs in the elements define the split position of
> > + *   received packets.
> > + *
> >    * - The offsets from the segment description elements specify
> >    *   the data offset from the buffer beginning except the first mbuf.
> >    *   The first segment offset is added with RTE_PKTMBUF_HEADROOM.
> > @@ -1196,12 +1199,24 @@ struct rte_eth_txmode {
> >    *     - pool from the last valid element
> >    *     - the buffer size from this pool
> >    *     - zero offset
> > + *
> > + * - Length based buffer split:
> > + *     - mp, length, offset should be configured.
> > + *     - The proto_hdr field will be ignored.
> > + *
> > + * - Protocol header based buffer split:
> > + *     - mp, offset, proto_hdr should be configured.
> > + *     - The length field will be ignored.
> >    */
> >   struct rte_eth_rxseg_split {
> >   	struct rte_mempool *mp; /**< Memory pool to allocate segment
> from. */
> >   	uint16_t length; /**< Segment data length, configures split point. */
> >   	uint16_t offset; /**< Data offset from beginning of mbuf data buffer.
> */
> > -	uint32_t reserved; /**< Reserved field. */
> > +	/**
> > +	 * Supported ptype of a specific pmd, configures split point.
> > +	 * It should be defined by RTE_PTYPE_*.
> > +	 */
> > +	uint32_t proto_hdr;
> >   };
> >
> >   /**


^ permalink raw reply	[flat|nested] 72+ messages in thread

* RE: [PATCH v3 2/4] ethdev: introduce protocol hdr based buffer split
  2022-09-13  7:56     ` Suanming Mou
@ 2022-09-16  8:39       ` Wang, YuanX
  0 siblings, 0 replies; 72+ messages in thread
From: Wang, YuanX @ 2022-09-16  8:39 UTC (permalink / raw)
  To: Suanming Mou, dev, NBU-Contact-Thomas Monjalon (EXTERNAL),
	Ferruh Yigit, Andrew Rybchenko
  Cc: mdr, Li, Xiaoyun, Singh, Aman Deep, Zhang, Yuying, Zhang, Qi Z,
	Yang, Qiming, jerinjacobk, Slava Ovsiienko, stephen, Ding, Xuan,
	hpothula, Tang, Yaqi, Wenxuan Wu

Hi 

> -----Original Message-----
> From: Suanming Mou <suanmingm@nvidia.com>
> Sent: Tuesday, September 13, 2022 3:57 PM
> To: Wang, YuanX <yuanx.wang@intel.com>; dev@dpdk.org; NBU-Contact-
> Thomas Monjalon (EXTERNAL) <thomas@monjalon.net>; Ferruh Yigit
> <ferruh.yigit@xilinx.com>; Andrew Rybchenko
> <andrew.rybchenko@oktetlabs.ru>
> Cc: mdr@ashroe.eu; Li, Xiaoyun <xiaoyun.li@intel.com>; Singh, Aman Deep
> <aman.deep.singh@intel.com>; Zhang, Yuying <yuying.zhang@intel.com>;
> Zhang, Qi Z <qi.z.zhang@intel.com>; Yang, Qiming <qiming.yang@intel.com>;
> jerinjacobk@gmail.com; Slava Ovsiienko <viacheslavo@nvidia.com>;
> stephen@networkplumber.org; Ding, Xuan <xuan.ding@intel.com>;
> hpothula@marvell.com; Tang, Yaqi <yaqi.tang@intel.com>; Wenxuan Wu
> <wenxuanx.wu@intel.com>
> Subject: RE: [PATCH v3 2/4] ethdev: introduce protocol hdr based buffer split
> 
> Hi
> 
> > -----Original Message-----
> > From: Yuan Wang <yuanx.wang@intel.com>
> > Sent: Saturday, September 3, 2022 3:10 AM
> > To: dev@dpdk.org; NBU-Contact-Thomas Monjalon (EXTERNAL)
> > <thomas@monjalon.net>; Ferruh Yigit <ferruh.yigit@xilinx.com>; Andrew
> > Rybchenko <andrew.rybchenko@oktetlabs.ru>
> > Cc: mdr@ashroe.eu; xiaoyun.li@intel.com; aman.deep.singh@intel.com;
> > yuying.zhang@intel.com; qi.z.zhang@intel.com; qiming.yang@intel.com;
> > jerinjacobk@gmail.com; Slava Ovsiienko <viacheslavo@nvidia.com>;
> > stephen@networkplumber.org; xuan.ding@intel.com;
> hpothula@marvell.com;
> > yaqi.tang@intel.com; Yuan Wang <yuanx.wang@intel.com>; Wenxuan Wu
> > <wenxuanx.wu@intel.com>
> > Subject: [PATCH v3 2/4] ethdev: introduce protocol hdr based buffer
> > split
> >
> 
> snip
> 
> > @@ -1693,13 +1695,44 @@ rte_eth_rx_queue_check_split(const struct
> > rte_eth_rxseg_split *rx_seg,
> >  		}
> >  		offset += seg_idx != 0 ? 0 : RTE_PKTMBUF_HEADROOM;
> >  		*mbp_buf_size = rte_pktmbuf_data_room_size(mpl);
> > -		length = length != 0 ? length : *mbp_buf_size;
> > -		if (*mbp_buf_size < length + offset) {
> > -			RTE_ETHDEV_LOG(ERR,
> > -				       "%s mbuf_data_room_size %u < %u
> > (segment length=%u + segment offset=%u)\n",
> > -				       mpl->name, *mbp_buf_size,
> > -				       length + offset, length, offset);
> > -			return -EINVAL;
> > +
> > +		int ret =
> > rte_eth_buffer_split_get_supported_hdr_ptypes(port_id, NULL, 0);
> 
> One small question, since the  ptypes == NULL and num == 0, I assume ret
> will always be <=0, right?

The usage of rte_eth_buffer_split_get_supported_hdr_ptypes is the same as rte_eth_dev_get_supported_ptypes.
In this scenario (ptypes == NULL and num == 0), the function returns the total number of supported ptypes, or a negative error code.
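
For reference, a minimal sketch of that two-call usage (error handling trimmed):

	int cnt = rte_eth_buffer_split_get_supported_hdr_ptypes(port_id, NULL, 0);

	if (cnt > 0) {
		uint32_t ptypes[cnt];

		/* the second call can still fail, so a negative return should be checked */
		cnt = rte_eth_buffer_split_get_supported_hdr_ptypes(port_id, ptypes, cnt);
	}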

> 
> > +		if (ret <= 0) {
> > +			/* Split at fixed length. */
> > +			length = length != 0 ? length : *mbp_buf_size;
> > +			if (*mbp_buf_size < length + offset) {
> > +				RTE_ETHDEV_LOG(ERR,
> > +					"%s mbuf_data_room_size %u < %u
> > (segment length=%u + segment offset=%u)\n",
> > +					mpl->name, *mbp_buf_size,
> > +					length + offset, length, offset);
> > +				return -EINVAL;
> > +			}
> > +		} else {
> > +			/* Split after specified protocol header. */
> > +			uint32_t ptypes[ret];
> > +			int i;
> > +
> > +			ret =
> > rte_eth_buffer_split_get_supported_hdr_ptypes(port_id, ptypes, ret);
> > +			for (i = 0; i < ret; i++)
> > +				if (ptypes[i] & proto_hdr)
> > +					break;
> > +
> 
> snip

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v3 2/4] ethdev: introduce protocol hdr based buffer split
  2022-09-16  8:38       ` Wang, YuanX
@ 2022-09-20  5:35         ` Andrew Rybchenko
  2022-09-22  3:13           ` Wang, YuanX
  0 siblings, 1 reply; 72+ messages in thread
From: Andrew Rybchenko @ 2022-09-20  5:35 UTC (permalink / raw)
  To: Wang, YuanX, dev, Thomas Monjalon, Ferruh Yigit
  Cc: mdr, Li, Xiaoyun, Singh, Aman Deep, Zhang, Yuying, Zhang, Qi Z,
	Yang, Qiming, jerinjacobk, viacheslavo, stephen, Ding, Xuan,
	hpothula, Tang, Yaqi, Wenxuan Wu

On 9/16/22 11:38, Wang, YuanX wrote:
> Hi Andrew,
> 
>> -----Original Message-----
>> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
>> Sent: Monday, September 12, 2022 7:47 PM
>> To: Wang, YuanX <yuanx.wang@intel.com>; dev@dpdk.org; Thomas
>> Monjalon <thomas@monjalon.net>; Ferruh Yigit <ferruh.yigit@xilinx.com>
>> Cc: mdr@ashroe.eu; Li, Xiaoyun <xiaoyun.li@intel.com>; Singh, Aman Deep
>> <aman.deep.singh@intel.com>; Zhang, Yuying <yuying.zhang@intel.com>;
>> Zhang, Qi Z <qi.z.zhang@intel.com>; Yang, Qiming <qiming.yang@intel.com>;
>> jerinjacobk@gmail.com; viacheslavo@nvidia.com;
>> stephen@networkplumber.org; Ding, Xuan <xuan.ding@intel.com>;
>> hpothula@marvell.com; Tang, Yaqi <yaqi.tang@intel.com>; Wenxuan Wu
>> <wenxuanx.wu@intel.com>
>> Subject: Re: [PATCH v3 2/4] ethdev: introduce protocol hdr based buffer split
>>
>> On 9/2/22 22:10, Yuan Wang wrote:
>>> Currently, Rx buffer split supports length based split. With Rx queue
>>> offload RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT enabled and Rx packet
>> segment
>>> configured, PMD will be able to split the received packets into
>>> multiple segments.
>>>
>>> However, length based buffer split is not suitable for NICs that do
>>> split based on protocol headers. Given an arbitrarily variable length
>>> in Rx packet segment, it is almost impossible to pass a fixed protocol
>>> header to driver. Besides, the existence of tunneling results in the
>>> composition of a packet is various, which makes the situation even worse.
>>>
>>> This patch extends current buffer split to support protocol header
>>> based buffer split. A new proto_hdr field is introduced in the
>>> reserved field of rte_eth_rxseg_split structure to specify protocol
>>> header. The proto_hdr field defines the split position of packet,
>>> splitting will always happens after the protocol header defined in the
>>> Rx packet segment. When Rx queue offload
>>> RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT is enabled and corresponding
>> protocol
>>> header is configured, driver will split the ingress packets into multiple
>> segments.
>>>
>>> struct rte_eth_rxseg_split {
>>>           struct rte_mempool *mp; /* memory pools to allocate segment from
>> */
>>>           uint16_t length; /* segment maximal data length,
>>>                               configures split point */
>>>           uint16_t offset; /* data offset from beginning
>>>                               of mbuf data buffer */
>>>           uint32_t proto_hdr; /* supported ptype of a specific pmd,
>>>                                  configures split point.
>>> 			       It should be defined by RTE_PTYPE_*
>>
>> If I understand correctly, the statement is a bit misleading since it should be a
>> bit mask of RTE_PTYPE_* defines. Not exactly one RTE_PTYPE_*.
> 
> Do you mean that a segment should support multiple protocol headers, such as splitting both tcp and udp headers?

No-no. Look. In order to split after some protocol, for example
UDP, the NIC should recognize all previous protocols. Moreover, in
the case of a tunnel, UDP could be inner and outer. At which
point would you like to split if the NIC supports both?
Another example: if the NIC supports split after Eth-IPv4-UDP and
after Eth-IPv6-UDP, how do you request a split just after
Eth-IPv4-UDP, but not Eth-IPv6-UDP?
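
To make it concrete, the supported split points would need to be full
combinations (the values below are purely illustrative), e.g.:

	RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4 | RTE_PTYPE_L4_UDP
	RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV6 | RTE_PTYPE_L4_UDP
	RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L3_IPV4 | RTE_PTYPE_INNER_L4_UDP

and a bare RTE_PTYPE_L4_UDP cannot select between them.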

> 
>>
>>> 			     */
>>> };
>>>
>>> If protocol header split can be supported by a PMD. The
>>> rte_eth_buffer_split_get_supported_hdr_ptypes function can be use to
>>> obtain a list of these protocol headers.
>>>
>>> For example, let's suppose we configured the Rx queue with the
>>> following segments:
>>>           seg0 - pool0, proto_hdr0=RTE_PTYPE_L3_IPV4, off0=2B
>>>           seg1 - pool1, proto_hdr1=RTE_PTYPE_L4_UDP, off1=128B
>>>           seg2 - pool2, off1=0B
>>>
>>> The packet consists of MAC_IPV4_UDP_PAYLOAD will be split like
>>
>> What is MAC_IPV4_UDP_PAYLOAD? Do you mean ETH_IPV4_UDP_PAYLOAD?
> 
> Thanks for your correction, it should be ETH_IPV4_UDP_PAYLOAD.
> 
>>
>>> following:
>>>           seg0 - ipv4 header @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from
>> pool0
>>>           seg1 - udp header @ 128 in mbuf from pool1
>>>           seg2 - payload @ 0 in mbuf from pool2
>>>
>>> Note: NIC will only do split when the packets exactly match all the
>>> protocol headers in the segments. For example, if ARP packets received
>>> with above config, the NIC won't do split for ARP packets since it
>>> does not contains ipv4 header and udp header.
>>
>> You must define which mempool is used in the case.
> 
> IMHO I don't think we can define which mempool to use, it depends on NIC behavior.
> For our NIC, packets that are unable to split will be put into the last valid pool, with zero offset.
> So here we would like to define to put these packets into the last valid mempool, with zero offset.


Anyway, the API should not be silent about this case since it is the
first question, at least in my head. IMHO the last segment is
the only sensible option since it will typically be big
enough. The other mempools, for protocol headers, are likely to be
small. So, I suggest defining it this way.

>>
>>>
>>> Now buffer split can be configured in two modes. For length based
>>> buffer split, the mp, length, offset field in Rx packet segment should
>>> be configured, while the proto_hdr field will be ignored.
>>> For protocol header based buffer split, the mp, offset, proto_hdr
>>> field in Rx packet segment should be configured, while the length
>>> field will be ignored.
>>>
>>> The split limitations imposed by underlying driver is reported in the
>>> rte_eth_dev_info->rx_seg_capa field. The memory attributes for the
>>> split parts may differ either, dpdk memory and external memory,
>> respectively.
>>>
>>> Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
>>> Signed-off-by: Xuan Ding <xuan.ding@intel.com>
>>> Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
>>> ---
>>>    doc/guides/rel_notes/release_22_11.rst |  5 +++
>>>    lib/ethdev/rte_ethdev.c                | 55 ++++++++++++++++++++------
>>>    lib/ethdev/rte_ethdev.h                | 17 +++++++-
>>>    3 files changed, 65 insertions(+), 12 deletions(-)
>>>
>>> diff --git a/doc/guides/rel_notes/release_22_11.rst
>>> b/doc/guides/rel_notes/release_22_11.rst
>>> index 4d90514a9a..f3b58c7895 100644
>>> --- a/doc/guides/rel_notes/release_22_11.rst
>>> +++ b/doc/guides/rel_notes/release_22_11.rst
>>> @@ -60,6 +60,11 @@ New Features
>>>      Added ``rte_eth_buffer_split_get_supported_hdr_ptypes()``, to get
>> supported
>>>      header protocols of a PMD to split.
>>>
>>> +* **Added protocol header based buffer split.**
>>> +  Ethdev: The ``reserved`` field in the  ``rte_eth_rxseg_split``
>>> +structure is
>>> +  replaced with ``proto_hdr`` to support protocol header based buffer split.
>>> +  User can choose length or protocol header to configure buffer split
>>> +  according to NIC's capability.
>>
>> Add one more empty line to have two before the next sectoin.
> 
> Thanks for your catch.
> 
>>
>>>
>>>    Removed Items
>>>    -------------
>>> diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c index
>>> 093c577add..dfceb723ee 100644
>>> --- a/lib/ethdev/rte_ethdev.c
>>> +++ b/lib/ethdev/rte_ethdev.c
>>> @@ -1635,9 +1635,10 @@ rte_eth_dev_is_removed(uint16_t port_id)
>>>    }
>>>
>>>    static int
>>> -rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
>>> -			     uint16_t n_seg, uint32_t *mbp_buf_size,
>>> -			     const struct rte_eth_dev_info *dev_info)
>>> +rte_eth_rx_queue_check_split(uint16_t port_id,
>>> +			const struct rte_eth_rxseg_split *rx_seg,
>>> +			uint16_t n_seg, uint32_t *mbp_buf_size,
>>> +			const struct rte_eth_dev_info *dev_info)
>>>    {
>>>    	const struct rte_eth_rxseg_capa *seg_capa = &dev_info-
>>> rx_seg_capa;
>>>    	struct rte_mempool *mp_first;
>>> @@ -1660,6 +1661,7 @@ rte_eth_rx_queue_check_split(const struct
>> rte_eth_rxseg_split *rx_seg,
>>>    		struct rte_mempool *mpl = rx_seg[seg_idx].mp;
>>>    		uint32_t length = rx_seg[seg_idx].length;
>>>    		uint32_t offset = rx_seg[seg_idx].offset;
>>> +		uint32_t proto_hdr = rx_seg[seg_idx].proto_hdr;
>>>
>>>    		if (mpl == NULL) {
>>>    			RTE_ETHDEV_LOG(ERR, "null mempool pointer\n");
>> @@ -1693,13
>>> +1695,44 @@ rte_eth_rx_queue_check_split(const struct
>> rte_eth_rxseg_split *rx_seg,
>>>    		}
>>>    		offset += seg_idx != 0 ? 0 : RTE_PKTMBUF_HEADROOM;
>>>    		*mbp_buf_size = rte_pktmbuf_data_room_size(mpl);
>>> -		length = length != 0 ? length : *mbp_buf_size;
>>> -		if (*mbp_buf_size < length + offset) {
>>> -			RTE_ETHDEV_LOG(ERR,
>>> -				       "%s mbuf_data_room_size %u < %u
>> (segment length=%u + segment offset=%u)\n",
>>> -				       mpl->name, *mbp_buf_size,
>>> -				       length + offset, length, offset);
>>> -			return -EINVAL;
>>> +
>>> +		int ret =
>> rte_eth_buffer_split_get_supported_hdr_ptypes(port_id,
>>> +NULL, 0);
>>
>> Do not mix variable declaration and code.
>> It is better to give the variable some sensible name.
>> Otherwise else branch code is hard to read.
> 
> Thanks for the suggestion, will  take care of naming.
> 
>>
>>> +		if (ret <= 0) {
>>
>> May be I'm missing something, but nothing prevetns a driver/HW to support
>> both protocol-based and fixed-length split.
>> So, ability to support protocol based split should be treated as a request to
>> do it. It must be based on rx_seg->proto_hdr content (for all segments).
>>
>> Also nothing should prevent to mix protocol and fixed-length split. I.e. split
>> just after UDP in the first segment,
>> 40 bytes in the second segment, everything else in the third.
> 
> Mix mode is an interesting idea. Currently testpmd and driver do not support mixed mode, but it does not affect the library to support this mode.

That's OK.

> 
>>
>>> +			/* Split at fixed length. */
>>> +			length = length != 0 ? length : *mbp_buf_size;
>>> +			if (*mbp_buf_size < length + offset) {
>>> +				RTE_ETHDEV_LOG(ERR,
>>> +					"%s mbuf_data_room_size %u < %u
>> (segment length=%u + segment offset=%u)\n",
>>> +					mpl->name, *mbp_buf_size,
>>> +					length + offset, length, offset);
>>> +				return -EINVAL;
>>> +			}
>>> +		} else {
>>> +			/* Split after specified protocol header. */
>>> +			uint32_t ptypes[ret];
>>> +			int i;
>>> +
>>> +			ret =
>> rte_eth_buffer_split_get_supported_hdr_ptypes(port_id,
>>> +ptypes, ret);
>>
>> In theory, the funciton could fail since input arguments differ. So, it should be
>> handled.
> 
> Thanks for your catch, will fix in the next version.
> 
>>
>>> +			for (i = 0; i < ret; i++)
>>> +				if (ptypes[i] & proto_hdr)
>>
>> IMHO it should be ==, not &. I think that
>> rte_eth_buffer_split_get_supported_hdr_ptypes() should define points at
>> which split could happen and we should match the point exactly.
> 
> Sure, == is better. Thanks for the suggestion.
> 
>>
>>> +					break;
>>> +
>>> +			if (i == ret) {
>>> +#define PTYPE_NAMESIZE	256
>>
>> Why? It is looks really strange that it is defined here.
> 
> I intend to display the protocol name in the log, but if the proto_hdr is a bit mask, can I just show the number?
> Please see v4 for this modification.
> 
>>
>>> +				char ptype_name[PTYPE_NAMESIZE];
>>> +				rte_get_ptype_name(proto_hdr,
>> ptype_name, sizeof(ptype_name));
>>> +				RTE_ETHDEV_LOG(ERR,
>>> +					"Protocol header %s is not
>> supported.\n",
>>> +					ptype_name);
>>> +				return -EINVAL;
>>> +			}
>>> +			if (*mbp_buf_size < offset) {
>>> +				RTE_ETHDEV_LOG(ERR,
>>> +						"%s
>> mbuf_data_room_size %u < %u segment offset)\n",
>>> +						mpl->name, *mbp_buf_size,
>>> +						offset);
>>> +				return -EINVAL;
>>> +			}
>>>    		}
>>>    	}
>>>    	return 0;
>>> @@ -1778,7 +1811,7 @@ rte_eth_rx_queue_setup(uint16_t port_id,
>> uint16_t rx_queue_id,
>>>    		n_seg = rx_conf->rx_nseg;
>>>
>>>    		if (rx_conf->offloads &
>> RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) {
>>> -			ret = rte_eth_rx_queue_check_split(rx_seg, n_seg,
>>> +			ret = rte_eth_rx_queue_check_split(port_id, rx_seg,
>> n_seg,
>>>    							   &mbp_buf_size,
>>>    							   &dev_info);
>>>    			if (ret != 0)
>>> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h index
>>> c58c908c3a..410fba5eab 100644
>>> --- a/lib/ethdev/rte_ethdev.h
>>> +++ b/lib/ethdev/rte_ethdev.h
>>> @@ -1175,6 +1175,9 @@ struct rte_eth_txmode {
>>>     *   specified in the first array element, the second buffer, from the
>>>     *   pool in the second element, and so on.
>>>     *
>>> + * - The proto_hdrs in the elements define the split position of
>>> + *   received packets.
>>> + *
>>>     * - The offsets from the segment description elements specify
>>>     *   the data offset from the buffer beginning except the first mbuf.
>>>     *   The first segment offset is added with RTE_PKTMBUF_HEADROOM.
>>> @@ -1196,12 +1199,24 @@ struct rte_eth_txmode {
>>>     *     - pool from the last valid element
>>>     *     - the buffer size from this pool
>>>     *     - zero offset
>>> + *
>>> + * - Length based buffer split:
>>> + *     - mp, length, offset should be configured.
>>> + *     - The proto_hdr field will be ignored.
>>> + *
>>> + * - Protocol header based buffer split:
>>> + *     - mp, offset, proto_hdr should be configured.
>>> + *     - The length field will be ignored.
>>>     */
>>>    struct rte_eth_rxseg_split {
>>>    	struct rte_mempool *mp; /**< Memory pool to allocate segment
>> from. */
>>>    	uint16_t length; /**< Segment data length, configures split point. */
>>>    	uint16_t offset; /**< Data offset from beginning of mbuf data buffer.
>> */
>>> -	uint32_t reserved; /**< Reserved field. */
>>> +	/**
>>> +	 * Supported ptype of a specific pmd, configures split point.
>>> +	 * It should be defined by RTE_PTYPE_*.
>>> +	 */
>>> +	uint32_t proto_hdr;
>>>    };
>>>
>>>    /**
> 


^ permalink raw reply	[flat|nested] 72+ messages in thread

* [PATCH v4 0/4] support protocol based buffer split
  2022-08-12 18:15 [PATCH 0/4] support protocol based buffer split Yuan Wang
                   ` (9 preceding siblings ...)
  2022-09-02 19:10 ` [PATCH v3 0/4] support protocol based buffer split Yuan Wang
@ 2022-09-20 11:12 ` Yuan Wang
  2022-09-20 11:12   ` [PATCH v4 1/4] ethdev: introduce protocol header API Yuan Wang
                     ` (3 more replies)
  2022-09-26  9:40 ` [PATCH v5 0/4] support protocol based buffer split Yuan Wang
                   ` (4 subsequent siblings)
  15 siblings, 4 replies; 72+ messages in thread
From: Yuan Wang @ 2022-09-20 11:12 UTC (permalink / raw)
  To: dev
  Cc: thomas, andrew.rybchenko, ferruh.yigit, mdr, xiaoyun.li,
	aman.deep.singh, yuying.zhang, qi.z.zhang, qiming.yang,
	jerinjacobk, viacheslavo, stephen, xuan.ding, hpothula,
	yaqi.tang, Yuan Wang

Protocol type based buffer split consists of splitting a received packet
into several separate segments based on the packet content. It is useful
in some scenarios, such as GPU acceleration. The splitting will help to
enable true zero copy and hence improve the performance significantly.

This patchset aims to support protocol header split based on current buffer
split. When Rx queue is configured with RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT
offload and corresponding protocol, packets received will be directly split
into different mempools.

Change log:
v4:
Change proto_hdr to a bit mask of RTE_PTYPE_*.
Add the description of where unsplit packets are put.
Use proto_hdr to determine whether to use protocol based split.

v3:
Fix mail thread.

v2:
Add mbuf dump to the driver's buffer split path.
Add buffer split to the driver feature list.
Remove unsupported header protocols from the driver.

Yuan Wang (4):
  ethdev: introduce protocol header API
  ethdev: introduce protocol hdr based buffer split
  app/testpmd: add rxhdrs commands and parameters
  net/ice: support buffer split in Rx path

 app/test-pmd/cmdline.c                 | 124 +++++++++++++-
 app/test-pmd/config.c                  |  70 ++++++++
 app/test-pmd/parameters.c              |  16 +-
 app/test-pmd/testpmd.c                 |   2 +
 app/test-pmd/testpmd.h                 |   6 +
 doc/guides/rel_notes/release_22_11.rst |  16 ++
 drivers/net/ice/ice_ethdev.c           |  32 +++-
 drivers/net/ice/ice_rxtx.c             | 215 +++++++++++++++++++++----
 drivers/net/ice/ice_rxtx.h             |  16 ++
 drivers/net/ice/ice_rxtx_vec_common.h  |   3 +
 lib/ethdev/ethdev_driver.h             |  15 ++
 lib/ethdev/rte_ethdev.c                | 106 ++++++++++--
 lib/ethdev/rte_ethdev.h                |  49 +++++-
 lib/ethdev/version.map                 |   3 +
 14 files changed, 625 insertions(+), 48 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 72+ messages in thread

* [PATCH v4 1/4] ethdev: introduce protocol header API
  2022-09-20 11:12 ` [PATCH v4 0/4] support protocol based buffer split Yuan Wang
@ 2022-09-20 11:12   ` Yuan Wang
  2022-09-20 11:12   ` [PATCH v4 2/4] ethdev: introduce protocol hdr based buffer split Yuan Wang
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 72+ messages in thread
From: Yuan Wang @ 2022-09-20 11:12 UTC (permalink / raw)
  To: dev, Thomas Monjalon, Ferruh Yigit, Andrew Rybchenko, Ray Kinsella
  Cc: xiaoyun.li, aman.deep.singh, yuying.zhang, qi.z.zhang,
	qiming.yang, jerinjacobk, viacheslavo, stephen, xuan.ding,
	hpothula, yaqi.tang, Yuan Wang, Wenxuan Wu

Add a new ethdev API to retrieve supported protocol headers
of a PMD, which helps to configure protocol header based buffer split.

Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
---
 doc/guides/rel_notes/release_22_11.rst |  5 ++++
 lib/ethdev/ethdev_driver.h             | 15 ++++++++++++
 lib/ethdev/rte_ethdev.c                | 32 ++++++++++++++++++++++++++
 lib/ethdev/rte_ethdev.h                | 24 +++++++++++++++++++
 lib/ethdev/version.map                 |  3 +++
 5 files changed, 79 insertions(+)

diff --git a/doc/guides/rel_notes/release_22_11.rst b/doc/guides/rel_notes/release_22_11.rst
index 8c021cf050..4d90514a9a 100644
--- a/doc/guides/rel_notes/release_22_11.rst
+++ b/doc/guides/rel_notes/release_22_11.rst
@@ -55,6 +55,11 @@ New Features
      Also, make sure to start the actual text at the margin.
      =======================================================
 
+* **Added new ethdev API for PMD to get buffer split supported protocol types.**
+
+  Added ``rte_eth_buffer_split_get_supported_hdr_ptypes()``, to get supported
+  header protocols of a PMD to split.
+
 
 Removed Items
 -------------
diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
index 5101868ea7..f64ceb9907 100644
--- a/lib/ethdev/ethdev_driver.h
+++ b/lib/ethdev/ethdev_driver.h
@@ -1054,6 +1054,18 @@ typedef int (*eth_ip_reassembly_conf_get_t)(struct rte_eth_dev *dev,
 typedef int (*eth_ip_reassembly_conf_set_t)(struct rte_eth_dev *dev,
 		const struct rte_eth_ip_reassembly_params *conf);
 
+/**
+ * @internal
+ * Get supported header protocols of a PMD to split.
+ *
+ * @param dev
+ *   Ethdev handle of port.
+ *
+ * @return
+ *   An array pointer to store supported protocol headers.
+ */
+typedef const uint32_t *(*eth_buffer_split_supported_hdr_ptypes_get_t)(struct rte_eth_dev *dev);
+
 /**
  * @internal
  * Dump private info from device to a file.
@@ -1301,6 +1313,9 @@ struct eth_dev_ops {
 	/** Set IP reassembly configuration */
 	eth_ip_reassembly_conf_set_t ip_reassembly_conf_set;
 
+	/** Get supported header ptypes to split */
+	eth_buffer_split_supported_hdr_ptypes_get_t buffer_split_supported_hdr_ptypes_get;
+
 	/** Dump private info from device */
 	eth_dev_priv_dump_t eth_dev_priv_dump;
 
diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
index 1979dc0850..38689387a5 100644
--- a/lib/ethdev/rte_ethdev.c
+++ b/lib/ethdev/rte_ethdev.c
@@ -5917,6 +5917,38 @@ rte_eth_dev_priv_dump(uint16_t port_id, FILE *file)
 	return eth_err(port_id, (*dev->dev_ops->eth_dev_priv_dump)(dev, file));
 }
 
+int
+rte_eth_buffer_split_get_supported_hdr_ptypes(uint16_t port_id, uint32_t *ptypes, int num)
+{
+	int i, j;
+	struct rte_eth_dev *dev;
+	const uint32_t *all_types;
+
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
+	dev = &rte_eth_devices[port_id];
+
+	if (ptypes == NULL && num > 0) {
+		RTE_ETHDEV_LOG(ERR,
+			"Cannot get ethdev port %u supported header protocol types to NULL when array size is non zero\n",
+			port_id);
+		return -EINVAL;
+	}
+
+	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->buffer_split_supported_hdr_ptypes_get, -ENOTSUP);
+	all_types = (*dev->dev_ops->buffer_split_supported_hdr_ptypes_get)(dev);
+
+	if (!all_types)
+		return 0;
+
+	for (i = 0, j = 0; all_types[i] != RTE_PTYPE_UNKNOWN; ++i) {
+		if (j < num)
+			ptypes[j] = all_types[i];
+		j++;
+	}
+
+	return j;
+}
+
 RTE_LOG_REGISTER_DEFAULT(rte_eth_dev_logtype, INFO);
 
 RTE_INIT(ethdev_init_telemetry)
diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
index de9e970d4d..c58c908c3a 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -6206,6 +6206,30 @@ rte_eth_tx_buffer(uint16_t port_id, uint16_t queue_id,
 	return rte_eth_tx_buffer_flush(port_id, queue_id, buffer);
 }
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Get supported header protocols to split on Rx.
+ *
+ * @param port_id
+ *   The port identifier of the device.
+ * @param[out] ptypes
+ *   An array pointer to store supported protocol headers, allocated by caller.
+ *   These ptypes are composed with RTE_PTYPE_*.
+ * @param num
+ *   Size of the array pointed by param ptypes.
+ * @return
+ *   - (>=0) Number of supported ptypes. If the number of types exceeds num,
+ *           only num entries will be filled into the ptypes array, but the full
+ *           count of supported ptypes will be returned.
+ *   - (-ENOTSUP) if header protocol is not supported by device.
+ *   - (-ENODEV) if *port_id* invalid.
+ *   - (-EINVAL) if bad parameter.
+ */
+__rte_experimental
+int rte_eth_buffer_split_get_supported_hdr_ptypes(uint16_t port_id, uint32_t *ptypes, int num);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/ethdev/version.map b/lib/ethdev/version.map
index 03f52fee91..e496c8d938 100644
--- a/lib/ethdev/version.map
+++ b/lib/ethdev/version.map
@@ -285,6 +285,9 @@ EXPERIMENTAL {
 	rte_mtr_color_in_protocol_priority_get;
 	rte_mtr_color_in_protocol_set;
 	rte_mtr_meter_vlan_table_update;
+
+	# added in 22.11
+	rte_eth_buffer_split_get_supported_hdr_ptypes;
 };
 
 INTERNAL {
-- 
2.25.1


^ permalink raw reply	[flat|nested] 72+ messages in thread

* [PATCH v4 2/4] ethdev: introduce protocol hdr based buffer split
  2022-09-20 11:12 ` [PATCH v4 0/4] support protocol based buffer split Yuan Wang
  2022-09-20 11:12   ` [PATCH v4 1/4] ethdev: introduce protocol header API Yuan Wang
@ 2022-09-20 11:12   ` Yuan Wang
  2022-09-20 11:12   ` [PATCH v4 3/4] app/testpmd: add rxhdrs commands and parameters Yuan Wang
  2022-09-20 11:12   ` [PATCH v4 4/4] net/ice: support buffer split in Rx path Yuan Wang
  3 siblings, 0 replies; 72+ messages in thread
From: Yuan Wang @ 2022-09-20 11:12 UTC (permalink / raw)
  To: dev, Thomas Monjalon, Ferruh Yigit, Andrew Rybchenko
  Cc: mdr, xiaoyun.li, aman.deep.singh, yuying.zhang, qi.z.zhang,
	qiming.yang, jerinjacobk, viacheslavo, stephen, xuan.ding,
	hpothula, yaqi.tang, Yuan Wang, Wenxuan Wu

Currently, Rx buffer split supports length based split. With Rx queue
offload RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT enabled and Rx packet segment
configured, PMD will be able to split the received packets into
multiple segments.

However, length based buffer split is not suitable for NICs that do split
based on protocol headers. Given an arbitrarily variable length in an Rx
packet segment, it is almost impossible to pass a fixed protocol header to
the driver. Besides, the existence of tunneling means the composition of
a packet varies, which makes the situation even worse.

This patch extends current buffer split to support protocol header based
buffer split. A new proto_hdr field is introduced in the reserved field
of rte_eth_rxseg_split structure to specify protocol header. The proto_hdr
field defines the split position of packet, splitting will always happen
after the protocol header defined in the Rx packet segment. When Rx queue
offload RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT is enabled and corresponding
protocol header is configured, driver will split the ingress packets into
multiple segments.

struct rte_eth_rxseg_split {
        struct rte_mempool *mp; /* memory pools to allocate segment from */
        uint16_t length; /* segment maximal data length,
                            configures split point */
        uint16_t offset; /* data offset from beginning
                            of mbuf data buffer */
        uint32_t proto_hdr; /* supported ptype of a specific pmd,
                               configures split point.
			       It should be a bit mask of RTE_PTYPE_* defines
			     */
};

If protocol header split is supported by a PMD, the
rte_eth_buffer_split_get_supported_hdr_ptypes function can
be used to obtain a list of these protocol headers.

For example, let's suppose we configured the Rx queue with the
following segments:
        seg0 - pool0, proto_hdr0=RTE_PTYPE_L3_IPV4, off0=2B
        seg1 - pool1, proto_hdr1=RTE_PTYPE_L4_UDP, off1=128B
        seg2 - pool2, off1=0B

A packet consisting of ETH_IPV4_UDP_PAYLOAD will be split as
follows:
        seg0 - ipv4 header @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from pool0
        seg1 - udp header @ 128 in mbuf from pool1
        seg2 - payload @ 0 in mbuf from pool2
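
A rough sketch of a queue setup matching this example (pool0/pool1/pool2,
port_id, nb_rxd and socket_id are assumed to exist; in real code rxconf
would start from the device defaults and errors would be checked):

	union rte_eth_rxseg segs[3];
	struct rte_eth_rxconf rxconf = {0};

	memset(segs, 0, sizeof(segs));	/* needs <string.h> */
	segs[0].split.mp = pool0;
	segs[0].split.proto_hdr = RTE_PTYPE_L3_IPV4;
	segs[0].split.offset = 2;
	segs[1].split.mp = pool1;
	segs[1].split.proto_hdr = RTE_PTYPE_L4_UDP;
	segs[1].split.offset = 128;
	segs[2].split.mp = pool2;	/* payload, zero offset */

	rxconf.offloads = RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT;
	rxconf.rx_seg = segs;
	rxconf.rx_nseg = 3;
	/* mempool argument is NULL, the pools come from the segment descriptions */
	rte_eth_rx_queue_setup(port_id, 0, nb_rxd, socket_id, &rxconf, NULL);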

Note: NIC will only do split when the packets exactly match all the
protocol headers in the segments. For example, if ARP packets are received
with the above config, the NIC won't do split for ARP packets since
they do not contain an ipv4 header and udp header. These packets will be put
into the last valid mempool, with zero offset.

Now buffer split can be configured in two modes. For length based
buffer split, the mp, length, offset field in Rx packet segment should
be configured, while the proto_hdr field will be ignored.
For protocol header based buffer split, the mp, offset, proto_hdr field
in Rx packet segment should be configured, while the length field will
be ignored.

The split limitations imposed by the underlying driver are reported in the
rte_eth_dev_info->rx_seg_capa field. The memory attributes for the split
parts may differ as well, e.g. dpdk memory and external memory, respectively.
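
For example, an application could check those limitations roughly like this
(a sketch only, error handling omitted):

	struct rte_eth_dev_info dev_info;

	rte_eth_dev_info_get(port_id, &dev_info);
	if ((dev_info.rx_offload_capa & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) == 0 ||
	    dev_info.rx_seg_capa.max_nseg < 3)
		printf("3-segment buffer split is not usable on port %u\n", port_id);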

Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
---
 doc/guides/rel_notes/release_22_11.rst |  7 +++
 lib/ethdev/rte_ethdev.c                | 74 ++++++++++++++++++++++----
 lib/ethdev/rte_ethdev.h                | 25 ++++++++-
 3 files changed, 94 insertions(+), 12 deletions(-)

diff --git a/doc/guides/rel_notes/release_22_11.rst b/doc/guides/rel_notes/release_22_11.rst
index 4d90514a9a..600b65221a 100644
--- a/doc/guides/rel_notes/release_22_11.rst
+++ b/doc/guides/rel_notes/release_22_11.rst
@@ -60,6 +60,13 @@ New Features
   Added ``rte_eth_buffer_split_get_supported_hdr_ptypes()``, to get supported
   header protocols of a PMD to split.
 
+* **Added protocol header based buffer split.**
+
+  Ethdev: The ``reserved`` field in the ``rte_eth_rxseg_split`` structure is
+  replaced with ``proto_hdr`` to support protocol header based buffer split.
+  User can choose length or protocol header to configure buffer split
+  according to NIC's capability.
+
 
 Removed Items
 -------------
diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
index 38689387a5..c3bdcdd89b 100644
--- a/lib/ethdev/rte_ethdev.c
+++ b/lib/ethdev/rte_ethdev.c
@@ -1635,9 +1635,10 @@ rte_eth_dev_is_removed(uint16_t port_id)
 }
 
 static int
-rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
-			     uint16_t n_seg, uint32_t *mbp_buf_size,
-			     const struct rte_eth_dev_info *dev_info)
+rte_eth_rx_queue_check_split(uint16_t port_id,
+			const struct rte_eth_rxseg_split *rx_seg,
+			uint16_t n_seg, uint32_t *mbp_buf_size,
+			const struct rte_eth_dev_info *dev_info)
 {
 	const struct rte_eth_rxseg_capa *seg_capa = &dev_info->rx_seg_capa;
 	struct rte_mempool *mp_first;
@@ -1660,6 +1661,7 @@ rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
 		struct rte_mempool *mpl = rx_seg[seg_idx].mp;
 		uint32_t length = rx_seg[seg_idx].length;
 		uint32_t offset = rx_seg[seg_idx].offset;
+		uint32_t proto_hdr = rx_seg[seg_idx].proto_hdr;
 
 		if (mpl == NULL) {
 			RTE_ETHDEV_LOG(ERR, "null mempool pointer\n");
@@ -1693,13 +1695,63 @@ rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
 		}
 		offset += seg_idx != 0 ? 0 : RTE_PKTMBUF_HEADROOM;
 		*mbp_buf_size = rte_pktmbuf_data_room_size(mpl);
-		length = length != 0 ? length : *mbp_buf_size;
-		if (*mbp_buf_size < length + offset) {
-			RTE_ETHDEV_LOG(ERR,
-				       "%s mbuf_data_room_size %u < %u (segment length=%u + segment offset=%u)\n",
-				       mpl->name, *mbp_buf_size,
-				       length + offset, length, offset);
-			return -EINVAL;
+
+		if (proto_hdr > 0) {
+			/* Split based on protocol headers. */
+
+			/* skip the payload */
+			if (proto_hdr == RTE_PTYPE_ALL_MASK)
+				continue;
+
+			int ptype_cnt;
+
+			ptype_cnt = rte_eth_buffer_split_get_supported_hdr_ptypes(port_id, NULL, 0);
+			if (ptype_cnt <= 0) {
+				RTE_ETHDEV_LOG(ERR,
+					"Port %u failed to get supported buffer split header protocols\n",
+					port_id);
+				return -EINVAL;
+			}
+
+			uint32_t ptypes[ptype_cnt];
+			int i;
+
+			ptype_cnt = rte_eth_buffer_split_get_supported_hdr_ptypes(port_id,
+										ptypes, ptype_cnt);
+			if (ptype_cnt < 0) {
+				RTE_ETHDEV_LOG(ERR,
+					"Port %u failed to get supported buffer split header protocols\n",
+					port_id);
+				return -EINVAL;
+			}
+
+			for (i = 0; i < ptype_cnt; i++)
+				if (ptypes[i] == proto_hdr)
+					break;
+			if (i == ptype_cnt) {
+				RTE_ETHDEV_LOG(ERR,
+					"Requested Rx split header protocols 0x%x is not supported.\n",
+					proto_hdr);
+				return -EINVAL;
+			}
+
+			if (*mbp_buf_size < offset) {
+				RTE_ETHDEV_LOG(ERR,
+						"%s mbuf_data_room_size %u < %u segment offset)\n",
+						mpl->name, *mbp_buf_size,
+						offset);
+				return -EINVAL;
+			}
+		} else {
+			/* Split at fixed length. */
+			length = length != 0 ? length : *mbp_buf_size;
+			if (*mbp_buf_size < length + offset) {
+				RTE_ETHDEV_LOG(ERR,
+					"%s mbuf_data_room_size %u < %u (segment length=%u + segment offset=%u)\n",
+					mpl->name, *mbp_buf_size,
+					length + offset, length, offset);
+				return -EINVAL;
+			}
 		}
 	}
 	return 0;
@@ -1778,7 +1830,7 @@ rte_eth_rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
 		n_seg = rx_conf->rx_nseg;
 
 		if (rx_conf->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) {
-			ret = rte_eth_rx_queue_check_split(rx_seg, n_seg,
+			ret = rte_eth_rx_queue_check_split(port_id, rx_seg, n_seg,
 							   &mbp_buf_size,
 							   &dev_info);
 			if (ret != 0)
diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
index c58c908c3a..28cc520321 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -1175,6 +1175,9 @@ struct rte_eth_txmode {
  *   specified in the first array element, the second buffer, from the
  *   pool in the second element, and so on.
  *
+ * - The proto_hdrs in the elements define the split position of
+ *   received packets.
+ *
  * - The offsets from the segment description elements specify
  *   the data offset from the buffer beginning except the first mbuf.
  *   The first segment offset is added with RTE_PKTMBUF_HEADROOM.
@@ -1196,12 +1199,32 @@ struct rte_eth_txmode {
  *     - pool from the last valid element
  *     - the buffer size from this pool
  *     - zero offset
+ *
+ * - Length based buffer split:
+ *     - mp, length, offset should be configured.
+ *     - The proto_hdr field will be ignored.
+ *
+ * - Protocol header based buffer split:
+ *     - mp, offset, proto_hdr should be configured.
+ *     - The length field will be ignored.
+ *
+ * - For Protocol header based buffer split, if the received packets
+ *   don't exactly match all protocol headers in the elements, packets
+ *   will not be split.
+ *   These packets will be put into:
+ *     - pool from the last valid element
+ *     - the buffer size from this pool
+ *     - zero offset
  */
 struct rte_eth_rxseg_split {
 	struct rte_mempool *mp; /**< Memory pool to allocate segment from. */
 	uint16_t length; /**< Segment data length, configures split point. */
 	uint16_t offset; /**< Data offset from beginning of mbuf data buffer. */
-	uint32_t reserved; /**< Reserved field. */
+	/**
+	 * Supported ptype of a specific pmd, configures split point.
+	 * It should be a bit mask of RTE_PTYPE_* defines.
+	 */
+	uint32_t proto_hdr;
 };
 
 /**
-- 
2.25.1


^ permalink raw reply	[flat|nested] 72+ messages in thread

* [PATCH v4 3/4] app/testpmd: add rxhdrs commands and parameters
  2022-09-20 11:12 ` [PATCH v4 0/4] support protocol based buffer split Yuan Wang
  2022-09-20 11:12   ` [PATCH v4 1/4] ethdev: introduce protocol header API Yuan Wang
  2022-09-20 11:12   ` [PATCH v4 2/4] ethdev: introduce protocol hdr based buffer split Yuan Wang
@ 2022-09-20 11:12   ` Yuan Wang
  2022-09-20 11:12   ` [PATCH v4 4/4] net/ice: support buffer split in Rx path Yuan Wang
  3 siblings, 0 replies; 72+ messages in thread
From: Yuan Wang @ 2022-09-20 11:12 UTC (permalink / raw)
  To: dev, Aman Singh, Yuying Zhang
  Cc: thomas, andrew.rybchenko, ferruh.yigit, mdr, xiaoyun.li,
	qi.z.zhang, qiming.yang, jerinjacobk, viacheslavo, stephen,
	xuan.ding, hpothula, yaqi.tang, Yuan Wang, Wenxuan Wu

Add command line parameter:
--rxhdrs=eth[,ipv4,udp]

Set the protocol header of each segment to scatter packets on receiving
if the split feature is engaged. It affects only the queues configured
with the BUFFER_SPLIT offload.

Add interactive mode command:
testpmd>set rxhdrs eth,ipv4,udp
(protocol sequence should be valid)

The protocol split feature is off by default. To enable protocol split,
you need:
1. Start testpmd with multiple mempools. E.g. --mbuf-size=2048,2048
2. Configure Rx queue with rx_offload buffer split on.
3. Set the protocol type of buffer split. E.g. set rxhdrs eth,ipv4
        (protocols supported by testpmd: eth|ipv4|ipv6|tcp|udp|sctp|
                              inner_eth|inner_ipv4|inner_ipv6|
                              inner_tcp|inner_udp|inner_sctp)
The above protocols can be configured in testpmd, but the configuration
only takes effect when it is supported by the specific PMD.
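
A minimal interactive session might look like the following (the EAL
core/memory options and the chosen protocols are illustrative and depend
on the target system):

    ./dpdk-testpmd -l 0-3 -n 4 -- -i --mbuf-size=2048,2048 \
                   --rxhdrs=eth,ipv4,udp
    testpmd> show config rxhdrs
    testpmd> set rxhdrs eth,ipv4
    testpmd> show config rxhdrs

The BUFFER_SPLIT Rx offload still has to be enabled on the queues as
described in step 2 above.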

Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
---
 app/test-pmd/cmdline.c    | 124 +++++++++++++++++++++++++++++++++++++-
 app/test-pmd/config.c     |  70 +++++++++++++++++++++
 app/test-pmd/parameters.c |  16 ++++-
 app/test-pmd/testpmd.c    |   2 +
 app/test-pmd/testpmd.h    |   6 ++
 5 files changed, 214 insertions(+), 4 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index b4fe9dfb17..17392f1d2d 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -183,7 +183,7 @@ static void cmd_help_long_parsed(void *parsed_result,
 			"show (rxq|txq) info (port_id) (queue_id)\n"
 			"    Display information for configured RX/TX queue.\n\n"
 
-			"show config (rxtx|cores|fwd|rxoffs|rxpkts|txpkts)\n"
+			"show config (rxtx|cores|fwd|rxoffs|rxpkts|rxhdrs|txpkts)\n"
 			"    Display the given configuration.\n\n"
 
 			"read rxd (port_id) (queue_id) (rxd_id)\n"
@@ -307,6 +307,15 @@ static void cmd_help_long_parsed(void *parsed_result,
 			" Affects only the queues configured with split"
 			" offloads.\n\n"
 
+			"set rxhdrs (eth[,ipv4])*\n"
+			"    Set the protocol hdr of each segment to scatter"
+			" packets on receiving if split feature is engaged."
+			" Affects only the queues configured with split"
+			" offloads.\n"
+			"    Supported values: eth|ipv4|ipv6|tcp|udp|sctp|"
+			"inner_eth|inner_ipv4|inner_ipv6|inner_tcp|inner_udp|"
+			"inner_sctp\n\n"
+
 			"set txpkts (x[,y]*)\n"
 			"    Set the length of each segment of TXONLY"
 			" and optionally CSUM packets.\n\n"
@@ -3456,6 +3465,68 @@ static cmdline_parse_inst_t cmd_stop = {
 	},
 };
 
+static unsigned int
+get_ptype(char *value)
+{
+	uint32_t protocol;
+
+	if (!strcmp(value, "eth"))
+		protocol = RTE_PTYPE_L2_ETHER;
+	else if (!strcmp(value, "ipv4"))
+		protocol = RTE_PTYPE_L3_IPV4;
+	else if (!strcmp(value, "ipv6"))
+		protocol = RTE_PTYPE_L3_IPV6;
+	else if (!strcmp(value, "tcp"))
+		protocol = RTE_PTYPE_L4_TCP;
+	else if (!strcmp(value, "udp"))
+		protocol = RTE_PTYPE_L4_UDP;
+	else if (!strcmp(value, "sctp"))
+		protocol = RTE_PTYPE_L4_SCTP;
+	else if (!strcmp(value, "inner_eth"))
+		protocol = RTE_PTYPE_INNER_L2_ETHER;
+	else if (!strcmp(value, "inner_ipv4"))
+		protocol = RTE_PTYPE_INNER_L3_IPV4;
+	else if (!strcmp(value, "inner_ipv6"))
+		protocol = RTE_PTYPE_INNER_L3_IPV6;
+	else if (!strcmp(value, "inner_tcp"))
+		protocol = RTE_PTYPE_INNER_L4_TCP;
+	else if (!strcmp(value, "inner_udp"))
+		protocol = RTE_PTYPE_INNER_L4_UDP;
+	else if (!strcmp(value, "inner_sctp"))
+		protocol = RTE_PTYPE_INNER_L4_SCTP;
+	else {
+		fprintf(stderr, "Unsupported protocol: %s\n", value);
+		protocol = RTE_PTYPE_UNKNOWN;
+	}
+
+	return protocol;
+}
+/* *** SET RXHDRSLIST *** */
+
+unsigned int
+parse_hdrs_list(const char *str, const char *item_name, unsigned int max_items,
+				unsigned int *parsed_items, int check_hdrs_sequence)
+{
+	unsigned int nb_item;
+	char *cur;
+	char *tmp;
+
+	nb_item = 0;
+	char *str2 = strdup(str);
+	cur = strtok_r(str2, ",", &tmp);
+	while (cur != NULL) {
+		parsed_items[nb_item] = get_ptype(cur);
+		cur = strtok_r(NULL, ",", &tmp);
+		nb_item++;
+	}
+	if (nb_item > max_items)
+		fprintf(stderr, "Number of %s = %u > %u (maximum items)\n",
+			item_name, nb_item + 1, max_items);
+	free(str2);
+	if (!check_hdrs_sequence)
+		return nb_item;
+	return nb_item;
+}
 /* *** SET CORELIST and PORTLIST CONFIGURATION *** */
 
 unsigned int
@@ -3825,6 +3896,50 @@ static cmdline_parse_inst_t cmd_set_rxpkts = {
 	},
 };
 
+/* *** SET SEGMENT HEADERS OF RX PACKETS SPLIT *** */
+struct cmd_set_rxhdrs_result {
+	cmdline_fixed_string_t set;
+	cmdline_fixed_string_t rxhdrs;
+	cmdline_fixed_string_t values;
+};
+
+static void
+cmd_set_rxhdrs_parsed(void *parsed_result,
+		      __rte_unused struct cmdline *cl,
+		      __rte_unused void *data)
+{
+	struct cmd_set_rxhdrs_result *res;
+	unsigned int seg_hdrs[MAX_SEGS_BUFFER_SPLIT];
+	unsigned int nb_segs;
+
+	res = parsed_result;
+	nb_segs = parse_hdrs_list(res->values, "segment hdrs",
+				  MAX_SEGS_BUFFER_SPLIT, seg_hdrs, 0);
+	if (nb_segs > 0)
+		set_rx_pkt_hdrs(seg_hdrs, nb_segs);
+	cmd_reconfig_device_queue(RTE_PORT_ALL, 0, 1);
+}
+static cmdline_parse_token_string_t cmd_set_rxhdrs_set =
+	TOKEN_STRING_INITIALIZER(struct cmd_set_rxhdrs_result,
+				set, "set");
+static cmdline_parse_token_string_t cmd_set_rxhdrs_rxhdrs =
+	TOKEN_STRING_INITIALIZER(struct cmd_set_rxhdrs_result,
+				rxhdrs, "rxhdrs");
+static cmdline_parse_token_string_t cmd_set_rxhdrs_values =
+	TOKEN_STRING_INITIALIZER(struct cmd_set_rxhdrs_result,
+				values, NULL);
+
+static cmdline_parse_inst_t cmd_set_rxhdrs = {
+	.f = cmd_set_rxhdrs_parsed,
+	.data = NULL,
+	.help_str = "set rxhdrs <eth[,ipv4]*>",
+	.tokens = {
+		(void *)&cmd_set_rxhdrs_set,
+		(void *)&cmd_set_rxhdrs_rxhdrs,
+		(void *)&cmd_set_rxhdrs_values,
+		NULL,
+	},
+};
 /* *** SET SEGMENT LENGTHS OF TXONLY PACKETS *** */
 
 struct cmd_set_txpkts_result {
@@ -6918,6 +7033,8 @@ static void cmd_showcfg_parsed(void *parsed_result,
 		show_rx_pkt_offsets();
 	else if (!strcmp(res->what, "rxpkts"))
 		show_rx_pkt_segments();
+	else if (!strcmp(res->what, "rxhdrs"))
+		show_rx_pkt_hdrs();
 	else if (!strcmp(res->what, "txpkts"))
 		show_tx_pkt_segments();
 	else if (!strcmp(res->what, "txtimes"))
@@ -6930,12 +7047,12 @@ static cmdline_parse_token_string_t cmd_showcfg_port =
 	TOKEN_STRING_INITIALIZER(struct cmd_showcfg_result, cfg, "config");
 static cmdline_parse_token_string_t cmd_showcfg_what =
 	TOKEN_STRING_INITIALIZER(struct cmd_showcfg_result, what,
-				 "rxtx#cores#fwd#rxoffs#rxpkts#txpkts#txtimes");
+				 "rxtx#cores#fwd#rxoffs#rxpkts#rxhdrs#txpkts#txtimes");
 
 static cmdline_parse_inst_t cmd_showcfg = {
 	.f = cmd_showcfg_parsed,
 	.data = NULL,
-	.help_str = "show config rxtx|cores|fwd|rxoffs|rxpkts|txpkts|txtimes",
+	.help_str = "show config rxtx|cores|fwd|rxoffs|rxpkts|rxhdrs|txpkts|txtimes",
 	.tokens = {
 		(void *)&cmd_showcfg_show,
 		(void *)&cmd_showcfg_port,
@@ -14166,6 +14283,7 @@ static cmdline_parse_ctx_t builtin_ctx[] = {
 	(cmdline_parse_inst_t *)&cmd_set_log,
 	(cmdline_parse_inst_t *)&cmd_set_rxoffs,
 	(cmdline_parse_inst_t *)&cmd_set_rxpkts,
+	(cmdline_parse_inst_t *)&cmd_set_rxhdrs,
 	(cmdline_parse_inst_t *)&cmd_set_txpkts,
 	(cmdline_parse_inst_t *)&cmd_set_txsplit,
 	(cmdline_parse_inst_t *)&cmd_set_txtimes,
diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index a2939867c4..7380ba9c25 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -4940,6 +4940,76 @@ show_rx_pkt_segments(void)
 	}
 }
 
+static const char *get_ptype_str(uint32_t ptype)
+{
+	switch (ptype) {
+	case RTE_PTYPE_L2_ETHER:
+		return "eth";
+	case RTE_PTYPE_L3_IPV4:
+		return "ipv4";
+	case RTE_PTYPE_L3_IPV6:
+		return "ipv6";
+	case RTE_PTYPE_L4_TCP:
+		return "tcp";
+	case RTE_PTYPE_L4_UDP:
+		return "udp";
+	case RTE_PTYPE_L4_SCTP:
+		return "sctp";
+	case RTE_PTYPE_INNER_L2_ETHER:
+		return "inner_eth";
+	case RTE_PTYPE_INNER_L3_IPV4:
+		return "inner_ipv4";
+	case RTE_PTYPE_INNER_L3_IPV6:
+		return "inner_ipv6";
+	case RTE_PTYPE_INNER_L4_TCP:
+		return "inner_tcp";
+	case RTE_PTYPE_INNER_L4_UDP:
+		return "inner_udp";
+	case RTE_PTYPE_INNER_L4_SCTP:
+		return "inner_sctp";
+	default:
+		return "unsupported";
+	}
+}
+
+void
+show_rx_pkt_hdrs(void)
+{
+	uint32_t i, n;
+
+	n = rx_pkt_nb_segs;
+	printf("Number of segments: %u\n", n);
+	if (n) {
+		printf("Packet segs: ");
+		for (i = 0; i < n - 1; i++)
+			printf("%s, ", get_ptype_str(rx_pkt_hdr_protos[i]));
+		printf("payload\n");
+	}
+}
+
+void
+set_rx_pkt_hdrs(unsigned int *seg_hdrs, unsigned int nb_segs)
+{
+	unsigned int i;
+
+	if (nb_segs + 1 > MAX_SEGS_BUFFER_SPLIT) {
+		printf("nb segments per RX packets=%u > "
+		       "MAX_SEGS_BUFFER_SPLIT - ignored\n", nb_segs + 1);
+		return;
+	}
+
+	memset(rx_pkt_hdr_protos, 0, sizeof(rx_pkt_hdr_protos));
+
+	for (i = 0; i < nb_segs; i++)
+		rx_pkt_hdr_protos[i] = (uint32_t)seg_hdrs[i];
+	rx_pkt_hdr_protos[nb_segs] = RTE_PTYPE_ALL_MASK;
+	/*
+	 * nb_segs counts only the protocol headers; the payload segment is
+	 * appended after them, so rx_pkt_nb_segs becomes nb_segs + 1.
+	 */
+	rx_pkt_nb_segs = nb_segs + 1;
+}
+
 void
 set_rx_pkt_segments(unsigned int *seg_lengths, unsigned int nb_segs)
 {
diff --git a/app/test-pmd/parameters.c b/app/test-pmd/parameters.c
index e3c9757f3f..87aa0a6589 100644
--- a/app/test-pmd/parameters.c
+++ b/app/test-pmd/parameters.c
@@ -164,6 +164,7 @@ usage(char* progname)
 	       " Used mainly with PCAP drivers.\n");
 	printf("  --rxoffs=X[,Y]*: set RX segment offsets for split.\n");
 	printf("  --rxpkts=X[,Y]*: set RX segment sizes to split.\n");
+	printf("  --rxhdrs=eth[,ipv4]*: set RX segment protocol to split.\n");
 	printf("  --txpkts=X[,Y]*: set TX segment sizes"
 		" or total packet length.\n");
 	printf("  --txonly-multi-flow: generate multiple flows in txonly mode\n");
@@ -676,6 +677,7 @@ launch_args_parse(int argc, char** argv)
 		{ "flow-isolate-all",	        0, 0, 0 },
 		{ "rxoffs",			1, 0, 0 },
 		{ "rxpkts",			1, 0, 0 },
+		{ "rxhdrs",			1, 0, 0 },
 		{ "txpkts",			1, 0, 0 },
 		{ "txonly-multi-flow",		0, 0, 0 },
 		{ "rxq-share",			2, 0, 0 },
@@ -1331,7 +1333,6 @@ launch_args_parse(int argc, char** argv)
 			if (!strcmp(lgopts[opt_idx].name, "rxpkts")) {
 				unsigned int seg_len[MAX_SEGS_BUFFER_SPLIT];
 				unsigned int nb_segs;
-
 				nb_segs = parse_item_list
 						(optarg, "rxpkt segments",
 						 MAX_SEGS_BUFFER_SPLIT,
@@ -1341,6 +1342,19 @@ launch_args_parse(int argc, char** argv)
 				else
 					rte_exit(EXIT_FAILURE, "bad rxpkts\n");
 			}
+			if (!strcmp(lgopts[opt_idx].name, "rxhdrs")) {
+				unsigned int seg_hdrs[MAX_SEGS_BUFFER_SPLIT];
+				unsigned int nb_segs;
+
+				nb_segs = parse_hdrs_list
+						(optarg, "rxhdr segments",
+						MAX_SEGS_BUFFER_SPLIT,
+						seg_hdrs, 0);
+				if (nb_segs > 0)
+					set_rx_pkt_hdrs(seg_hdrs, nb_segs);
+				else
+					rte_exit(EXIT_FAILURE, "bad rxhdrs\n");
+			}
 			if (!strcmp(lgopts[opt_idx].name, "txpkts")) {
 				unsigned seg_lengths[RTE_MAX_SEGS_PER_PKT];
 				unsigned int nb_segs;
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index addcbcac85..157db60cd4 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -247,6 +247,7 @@ uint16_t rx_pkt_seg_lengths[MAX_SEGS_BUFFER_SPLIT];
 uint8_t  rx_pkt_nb_segs; /**< Number of segments to split */
 uint16_t rx_pkt_seg_offsets[MAX_SEGS_BUFFER_SPLIT];
 uint8_t  rx_pkt_nb_offs; /**< Number of specified offsets */
+uint32_t rx_pkt_hdr_protos[MAX_SEGS_BUFFER_SPLIT];
 
 /*
  * Configuration of packet segments used by the "txonly" processing engine.
@@ -2685,6 +2686,7 @@ rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
 		rx_seg->offset = i < rx_pkt_nb_offs ?
 				   rx_pkt_seg_offsets[i] : 0;
 		rx_seg->mp = mpx ? mpx : mp;
+		rx_seg->proto_hdr = rx_pkt_hdr_protos[i];
 	}
 	rx_conf->rx_nseg = rx_pkt_nb_segs;
 	rx_conf->rx_seg = rx_useg;
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index fb2f5195d3..de992c9416 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -562,6 +562,7 @@ extern uint32_t max_rx_pkt_len;
  * Configuration of packet segments used to scatter received packets
  * if some of split features is configured.
  */
+extern uint32_t rx_pkt_hdr_protos[MAX_SEGS_BUFFER_SPLIT];
 extern uint16_t rx_pkt_seg_lengths[MAX_SEGS_BUFFER_SPLIT];
 extern uint8_t  rx_pkt_nb_segs; /**< Number of segments to split */
 extern uint16_t rx_pkt_seg_offsets[MAX_SEGS_BUFFER_SPLIT];
@@ -892,6 +893,9 @@ inc_tx_burst_stats(struct fwd_stream *fs, uint16_t nb_tx)
 unsigned int parse_item_list(const char *str, const char *item_name,
 			unsigned int max_items,
 			unsigned int *parsed_items, int check_unique_values);
+unsigned int parse_hdrs_list(const char *str, const char *item_name,
+			unsigned int max_item,
+			unsigned int *parsed_items, int check_unique_values);
 void launch_args_parse(int argc, char** argv);
 void cmdline_read_from_file(const char *filename);
 int init_cmdline(void);
@@ -1049,6 +1053,8 @@ void set_record_core_cycles(uint8_t on_off);
 void set_record_burst_stats(uint8_t on_off);
 void set_verbose_level(uint16_t vb_level);
 void set_rx_pkt_segments(unsigned int *seg_lengths, unsigned int nb_segs);
+void set_rx_pkt_hdrs(unsigned int *seg_protos, unsigned int nb_segs);
+void show_rx_pkt_hdrs(void);
 void show_rx_pkt_segments(void);
 void set_rx_pkt_offsets(unsigned int *seg_offsets, unsigned int nb_offs);
 void show_rx_pkt_offsets(void);
-- 
2.25.1


^ permalink raw reply	[flat|nested] 72+ messages in thread

* [PATCH v4 4/4] net/ice: support buffer split in Rx path
  2022-09-20 11:12 ` [PATCH v4 0/4] support protocol based buffer split Yuan Wang
                     ` (2 preceding siblings ...)
  2022-09-20 11:12   ` [PATCH v4 3/4] app/testpmd: add rxhdrs commands and parameters Yuan Wang
@ 2022-09-20 11:12   ` Yuan Wang
  3 siblings, 0 replies; 72+ messages in thread
From: Yuan Wang @ 2022-09-20 11:12 UTC (permalink / raw)
  To: dev, Qiming Yang, Qi Zhang
  Cc: thomas, andrew.rybchenko, ferruh.yigit, mdr, xiaoyun.li,
	aman.deep.singh, yuying.zhang, jerinjacobk, viacheslavo, stephen,
	xuan.ding, hpothula, yaqi.tang, Yuan Wang, Wenxuan Wu

This patch adds support for protocol based buffer split in the normal Rx
data paths. When the Rx queue is configured with a specific protocol type,
received packets will be directly split into a protocol header part and a
payload part, within the limitations of the PMD, and the two parts will
be put into different mempools.

Currently, protocol based buffer split is not supported in vectorized
paths.

A new API, ice_buffer_split_supported_hdr_ptypes_get(), has been
introduced; it returns the header protocols supported for splitting by
the ice PMD to the application.
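
For reference, below is a minimal sketch of how an application could set
up such a split on one Rx queue (port_id, queue index and descriptor count
are illustrative, hdr_pool/pay_pool are mempools created beforehand, error
handling is omitted, and the usual DPDK headers such as <rte_ethdev.h> plus
<string.h> are assumed):

	struct rte_eth_dev_info dev_info;
	union rte_eth_rxseg rx_useg[2];
	struct rte_eth_rxconf rxconf;

	rte_eth_dev_info_get(port_id, &dev_info);
	memset(rx_useg, 0, sizeof(rx_useg));

	/* Header segment: split right after the outer Ethernet header. */
	rx_useg[0].split.mp = hdr_pool;
	rx_useg[0].split.proto_hdr = RTE_PTYPE_L2_ETHER;
	/* Payload segment: the rest of the packet, zero offset. */
	rx_useg[1].split.mp = pay_pool;

	rxconf = dev_info.default_rxconf;
	rxconf.offloads = RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT;
	rxconf.rx_seg = rx_useg;
	rxconf.rx_nseg = 2;
	rte_eth_rx_queue_setup(port_id, 0, 1024, rte_socket_id(),
			       &rxconf, NULL);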

Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
---
 doc/guides/rel_notes/release_22_11.rst |   4 +
 drivers/net/ice/ice_ethdev.c           |  32 +++-
 drivers/net/ice/ice_rxtx.c             | 215 +++++++++++++++++++++----
 drivers/net/ice/ice_rxtx.h             |  16 ++
 drivers/net/ice/ice_rxtx_vec_common.h  |   3 +
 5 files changed, 238 insertions(+), 32 deletions(-)

diff --git a/doc/guides/rel_notes/release_22_11.rst b/doc/guides/rel_notes/release_22_11.rst
index 600b65221a..e254e7bed7 100644
--- a/doc/guides/rel_notes/release_22_11.rst
+++ b/doc/guides/rel_notes/release_22_11.rst
@@ -67,6 +67,10 @@ New Features
   User can choose length or protocol header to configure buffer split
   according to NIC's capability.
 
+* **Updated Intel ice driver.**
+
+  Added protocol based buffer split support in scalar path.
+
 
 Removed Items
 -------------
diff --git a/drivers/net/ice/ice_ethdev.c b/drivers/net/ice/ice_ethdev.c
index b2300790ae..633317985e 100644
--- a/drivers/net/ice/ice_ethdev.c
+++ b/drivers/net/ice/ice_ethdev.c
@@ -169,6 +169,7 @@ static int ice_timesync_read_time(struct rte_eth_dev *dev,
 static int ice_timesync_write_time(struct rte_eth_dev *dev,
 				   const struct timespec *timestamp);
 static int ice_timesync_disable(struct rte_eth_dev *dev);
+static const uint32_t *ice_buffer_split_supported_hdr_ptypes_get(struct rte_eth_dev *dev);
 
 static const struct rte_pci_id pci_id_ice_map[] = {
 	{ RTE_PCI_DEVICE(ICE_INTEL_VENDOR_ID, ICE_DEV_ID_E823L_BACKPLANE) },
@@ -280,6 +281,7 @@ static const struct eth_dev_ops ice_eth_dev_ops = {
 	.timesync_write_time          = ice_timesync_write_time,
 	.timesync_disable             = ice_timesync_disable,
 	.tm_ops_get                   = ice_tm_ops_get,
+	.buffer_split_supported_hdr_ptypes_get = ice_buffer_split_supported_hdr_ptypes_get,
 };
 
 /* store statistics names and its offset in stats structure */
@@ -3749,7 +3751,8 @@ ice_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 			RTE_ETH_RX_OFFLOAD_OUTER_IPV4_CKSUM |
 			RTE_ETH_RX_OFFLOAD_VLAN_EXTEND |
 			RTE_ETH_RX_OFFLOAD_RSS_HASH |
-			RTE_ETH_RX_OFFLOAD_TIMESTAMP;
+			RTE_ETH_RX_OFFLOAD_TIMESTAMP |
+			RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT;
 		dev_info->tx_offload_capa |=
 			RTE_ETH_TX_OFFLOAD_QINQ_INSERT |
 			RTE_ETH_TX_OFFLOAD_IPV4_CKSUM |
@@ -3761,7 +3764,7 @@ ice_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 		dev_info->flow_type_rss_offloads |= ICE_RSS_OFFLOAD_ALL;
 	}
 
-	dev_info->rx_queue_offload_capa = 0;
+	dev_info->rx_queue_offload_capa = RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT;
 	dev_info->tx_queue_offload_capa = RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE;
 
 	dev_info->reta_size = pf->hash_lut_size;
@@ -3830,6 +3833,11 @@ ice_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 	dev_info->default_rxportconf.ring_size = ICE_BUF_SIZE_MIN;
 	dev_info->default_txportconf.ring_size = ICE_BUF_SIZE_MIN;
 
+	dev_info->rx_seg_capa.max_nseg = ICE_RX_MAX_NSEG;
+	dev_info->rx_seg_capa.multi_pools = 1;
+	dev_info->rx_seg_capa.offset_allowed = 0;
+	dev_info->rx_seg_capa.offset_align_log2 = 0;
+
 	return 0;
 }
 
@@ -5886,6 +5894,26 @@ ice_timesync_disable(struct rte_eth_dev *dev)
 	return 0;
 }
 
+static const uint32_t *
+ice_buffer_split_supported_hdr_ptypes_get(struct rte_eth_dev *dev __rte_unused)
+{
+/* Buffer split protocol header capability. */
+	static const uint32_t ptypes[] = {
+		RTE_PTYPE_L2_ETHER,
+		RTE_PTYPE_INNER_L2_ETHER,
+		RTE_PTYPE_INNER_L3_IPV4,
+		RTE_PTYPE_INNER_L3_IPV6,
+		RTE_PTYPE_INNER_L3_IPV4 | RTE_PTYPE_INNER_L3_IPV6,
+		RTE_PTYPE_INNER_L4_TCP,
+		RTE_PTYPE_INNER_L4_UDP,
+		RTE_PTYPE_INNER_L4_TCP | RTE_PTYPE_INNER_L4_UDP,
+		RTE_PTYPE_INNER_L4_SCTP,
+		RTE_PTYPE_UNKNOWN
+	};
+
+	return ptypes;
+}
+
 static int
 ice_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
 	      struct rte_pci_device *pci_dev)
diff --git a/drivers/net/ice/ice_rxtx.c b/drivers/net/ice/ice_rxtx.c
index bfb3a16ae2..481b60fa06 100644
--- a/drivers/net/ice/ice_rxtx.c
+++ b/drivers/net/ice/ice_rxtx.c
@@ -282,7 +282,6 @@ ice_program_hw_rx_queue(struct ice_rx_queue *rxq)
 	/* Set buffer size as the head split is disabled. */
 	buf_size = (uint16_t)(rte_pktmbuf_data_room_size(rxq->mp) -
 			      RTE_PKTMBUF_HEADROOM);
-	rxq->rx_hdr_len = 0;
 	rxq->rx_buf_len = RTE_ALIGN(buf_size, (1 << ICE_RLAN_CTX_DBUF_S));
 	rxq->max_pkt_len =
 		RTE_MIN((uint32_t)ICE_SUPPORT_CHAIN_NUM * rxq->rx_buf_len,
@@ -311,11 +310,42 @@ ice_program_hw_rx_queue(struct ice_rx_queue *rxq)
 
 	memset(&rx_ctx, 0, sizeof(rx_ctx));
 
+	if (rxq->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) {
+		if (rxq->rxseg[0].proto_hdr == RTE_PTYPE_UNKNOWN) {
+			PMD_DRV_LOG(ERR, "Buffer split protocol must be configured");
+			return -EINVAL;
+		}
+		if (rxq->rxseg[0].proto_hdr & RTE_PTYPE_L2_ETHER) {
+			rx_ctx.dtype = ICE_RX_DTYPE_HEADER_SPLIT;
+			rx_ctx.hsplit_1 = ICE_RLAN_RX_HSPLIT_1_SPLIT_L2;
+		} else if (rxq->rxseg[0].proto_hdr & RTE_PTYPE_INNER_L2_ETHER) {
+			rx_ctx.dtype = ICE_RX_DTYPE_HEADER_SPLIT;
+			rx_ctx.hsplit_0 = ICE_RLAN_RX_HSPLIT_0_SPLIT_L2;
+		} else if (rxq->rxseg[0].proto_hdr & RTE_PTYPE_INNER_L3_IPV4 ||
+			rxq->rxseg[0].proto_hdr & RTE_PTYPE_INNER_L3_IPV6) {
+			rx_ctx.dtype = ICE_RX_DTYPE_HEADER_SPLIT;
+			rx_ctx.hsplit_0 = ICE_RLAN_RX_HSPLIT_0_SPLIT_IP;
+		} else if (rxq->rxseg[0].proto_hdr & RTE_PTYPE_INNER_L4_TCP ||
+			rxq->rxseg[0].proto_hdr & RTE_PTYPE_INNER_L4_UDP) {
+			rx_ctx.dtype = ICE_RX_DTYPE_HEADER_SPLIT;
+			rx_ctx.hsplit_0 = ICE_RLAN_RX_HSPLIT_0_SPLIT_TCP_UDP;
+		} else if (rxq->rxseg[0].proto_hdr & RTE_PTYPE_INNER_L4_SCTP) {
+			rx_ctx.dtype = ICE_RX_DTYPE_HEADER_SPLIT;
+			rx_ctx.hsplit_0 = ICE_RLAN_RX_HSPLIT_0_SPLIT_SCTP;
+		} else {
+			PMD_DRV_LOG(ERR, "Buffer split protocol is not supported");
+			return -EINVAL;
+		}
+		rxq->rx_hdr_len = ICE_RX_HDR_BUF_SIZE;
+	} else {
+		rxq->rx_hdr_len = 0;
+		rx_ctx.dtype = 0; /* No Protocol Based Buffer Split mode */
+	}
+
 	rx_ctx.base = rxq->rx_ring_dma / ICE_QUEUE_BASE_ADDR_UNIT;
 	rx_ctx.qlen = rxq->nb_rx_desc;
 	rx_ctx.dbuf = rxq->rx_buf_len >> ICE_RLAN_CTX_DBUF_S;
 	rx_ctx.hbuf = rxq->rx_hdr_len >> ICE_RLAN_CTX_HBUF_S;
-	rx_ctx.dtype = 0; /* No Header Split mode */
 #ifndef RTE_LIBRTE_ICE_16BYTE_RX_DESC
 	rx_ctx.dsize = 1; /* 32B descriptors */
 #endif
@@ -401,6 +431,7 @@ ice_alloc_rx_queue_mbufs(struct ice_rx_queue *rxq)
 
 	for (i = 0; i < rxq->nb_rx_desc; i++) {
 		volatile union ice_rx_flex_desc *rxd;
+		rxd = &rxq->rx_ring[i];
 		struct rte_mbuf *mbuf = rte_mbuf_raw_alloc(rxq->mp);
 
 		if (unlikely(!mbuf)) {
@@ -408,8 +439,6 @@ ice_alloc_rx_queue_mbufs(struct ice_rx_queue *rxq)
 			return -ENOMEM;
 		}
 
-		rte_mbuf_refcnt_set(mbuf, 1);
-		mbuf->next = NULL;
 		mbuf->data_off = RTE_PKTMBUF_HEADROOM;
 		mbuf->nb_segs = 1;
 		mbuf->port = rxq->port_id;
@@ -417,9 +446,32 @@ ice_alloc_rx_queue_mbufs(struct ice_rx_queue *rxq)
 		dma_addr =
 			rte_cpu_to_le_64(rte_mbuf_data_iova_default(mbuf));
 
-		rxd = &rxq->rx_ring[i];
-		rxd->read.pkt_addr = dma_addr;
-		rxd->read.hdr_addr = 0;
+		if (!(rxq->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT)) {
+			rte_mbuf_refcnt_set(mbuf, 1);
+			mbuf->next = NULL;
+			rxd->read.hdr_addr = 0;
+			rxd->read.pkt_addr = dma_addr;
+		} else {
+			struct rte_mbuf *mbuf_pay;
+			mbuf_pay = rte_mbuf_raw_alloc(rxq->rxseg[1].mp);
+			if (unlikely(!mbuf_pay)) {
+				PMD_DRV_LOG(ERR, "Failed to allocate payload mbuf for RX");
+				return -ENOMEM;
+			}
+
+			mbuf_pay->next = NULL;
+			mbuf_pay->data_off = RTE_PKTMBUF_HEADROOM;
+			mbuf_pay->nb_segs = 1;
+			mbuf_pay->port = rxq->port_id;
+			mbuf->next = mbuf_pay;
+
+			rxd->read.hdr_addr = dma_addr;
+			/* The LS bit should be set to zero regardless of
+			 * buffer split enablement.
+			 */
+			rxd->read.pkt_addr = rte_cpu_to_le_64(rte_mbuf_data_iova_default(mbuf_pay));
+		}
+
 #ifndef RTE_LIBRTE_ICE_16BYTE_RX_DESC
 		rxd->read.rsvd1 = 0;
 		rxd->read.rsvd2 = 0;
@@ -443,14 +495,14 @@ _ice_rx_queue_release_mbufs(struct ice_rx_queue *rxq)
 
 	for (i = 0; i < rxq->nb_rx_desc; i++) {
 		if (rxq->sw_ring[i].mbuf) {
-			rte_pktmbuf_free_seg(rxq->sw_ring[i].mbuf);
+			rte_pktmbuf_free(rxq->sw_ring[i].mbuf);
 			rxq->sw_ring[i].mbuf = NULL;
 		}
 	}
 	if (rxq->rx_nb_avail == 0)
 		return;
 	for (i = 0; i < rxq->rx_nb_avail; i++)
-		rte_pktmbuf_free_seg(rxq->rx_stage[rxq->rx_next_avail + i]);
+		rte_pktmbuf_free(rxq->rx_stage[rxq->rx_next_avail + i]);
 
 	rxq->rx_nb_avail = 0;
 }
@@ -742,7 +794,7 @@ ice_fdir_program_hw_rx_queue(struct ice_rx_queue *rxq)
 	rx_ctx.qlen = rxq->nb_rx_desc;
 	rx_ctx.dbuf = rxq->rx_buf_len >> ICE_RLAN_CTX_DBUF_S;
 	rx_ctx.hbuf = rxq->rx_hdr_len >> ICE_RLAN_CTX_HBUF_S;
-	rx_ctx.dtype = 0; /* No Header Split mode */
+	rx_ctx.dtype = 0; /* No Buffer Split mode */
 	rx_ctx.dsize = 1; /* 32B descriptors */
 	rx_ctx.rxmax = ICE_ETH_MAX_LEN;
 	/* TPH: Transaction Layer Packet (TLP) processing hints */
@@ -1076,6 +1128,8 @@ ice_rx_queue_setup(struct rte_eth_dev *dev,
 	uint16_t len;
 	int use_def_burst_func = 1;
 	uint64_t offloads;
+	uint16_t n_seg = rx_conf->rx_nseg;
+	const struct rte_eth_rxseg_split *rx_seg;
 
 	if (nb_desc % ICE_ALIGN_RING_DESC != 0 ||
 	    nb_desc > ICE_MAX_RING_DESC ||
@@ -1087,6 +1141,15 @@ ice_rx_queue_setup(struct rte_eth_dev *dev,
 
 	offloads = rx_conf->offloads | dev->data->dev_conf.rxmode.offloads;
 
+	if (mp)
+		n_seg = 1;
+
+	if (n_seg > 1 && !(offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT)) {
+		PMD_INIT_LOG(ERR, "port %u queue index %u split offload not configured",
+				dev->data->port_id, queue_idx);
+		return -EINVAL;
+	}
+
 	/* Free memory if needed */
 	if (dev->data->rx_queues[queue_idx]) {
 		ice_rx_queue_release(dev->data->rx_queues[queue_idx]);
@@ -1098,12 +1161,22 @@ ice_rx_queue_setup(struct rte_eth_dev *dev,
 				 sizeof(struct ice_rx_queue),
 				 RTE_CACHE_LINE_SIZE,
 				 socket_id);
+
 	if (!rxq) {
 		PMD_INIT_LOG(ERR, "Failed to allocate memory for "
 			     "rx queue data structure");
 		return -ENOMEM;
 	}
-	rxq->mp = mp;
+
+	rxq->rxseg_nb = n_seg;
+	if (n_seg > 1) {
+		rx_seg = (const struct rte_eth_rxseg_split *)rx_conf->rx_seg;
+		rte_memcpy(rxq->rxseg, rx_seg, sizeof(struct rte_eth_rxseg_split) * n_seg);
+		rxq->mp = rxq->rxseg[0].mp;
+	} else {
+		rxq->mp = mp;
+	}
+
 	rxq->nb_rx_desc = nb_desc;
 	rxq->rx_free_thresh = rx_conf->rx_free_thresh;
 	rxq->queue_id = queue_idx;
@@ -1570,7 +1643,7 @@ ice_rx_scan_hw_ring(struct ice_rx_queue *rxq)
 	struct ice_rx_entry *rxep;
 	struct rte_mbuf *mb;
 	uint16_t stat_err0;
-	uint16_t pkt_len;
+	uint16_t pkt_len, hdr_len;
 	int32_t s[ICE_LOOK_AHEAD], nb_dd;
 	int32_t i, j, nb_rx = 0;
 	uint64_t pkt_flags = 0;
@@ -1625,6 +1698,27 @@ ice_rx_scan_hw_ring(struct ice_rx_queue *rxq)
 				   ICE_RX_FLX_DESC_PKT_LEN_M) - rxq->crc_len;
 			mb->data_len = pkt_len;
 			mb->pkt_len = pkt_len;
+
+			if (!(rxq->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT)) {
+				pkt_len = (rte_le_to_cpu_16(rxdp[j].wb.pkt_len) &
+					ICE_RX_FLX_DESC_PKT_LEN_M) - rxq->crc_len;
+				mb->data_len = pkt_len;
+				mb->pkt_len = pkt_len;
+			} else {
+				mb->nb_segs = (uint16_t)(mb->nb_segs + mb->next->nb_segs);
+				mb->next->next = NULL;
+				hdr_len = rte_le_to_cpu_16(rxdp[j].wb.hdr_len_sph_flex_flags1) &
+						ICE_RX_FLEX_DESC_HEADER_LEN_M;
+				pkt_len = (rte_le_to_cpu_16(rxdp[j].wb.pkt_len) &
+					ICE_RX_FLX_DESC_PKT_LEN_M) - rxq->crc_len;
+				mb->data_len = hdr_len;
+				mb->pkt_len = hdr_len + pkt_len;
+				mb->next->data_len = pkt_len;
+#ifdef RTE_ETHDEV_DEBUG_RX
+				rte_pktmbuf_dump(stdout, mb, rte_pktmbuf_pkt_len(mb));
+#endif
+			}
+
 			mb->ol_flags = 0;
 			stat_err0 = rte_le_to_cpu_16(rxdp[j].wb.status_error0);
 			pkt_flags = ice_rxd_error_to_pkt_flags(stat_err0);
@@ -1716,7 +1810,9 @@ ice_rx_alloc_bufs(struct ice_rx_queue *rxq)
 	struct rte_mbuf *mb;
 	uint16_t alloc_idx, i;
 	uint64_t dma_addr;
-	int diag;
+	int diag, diag_pay;
+	uint64_t pay_addr;
+	struct rte_mbuf *mbufs_pay[rxq->rx_free_thresh];
 
 	/* Allocate buffers in bulk */
 	alloc_idx = (uint16_t)(rxq->rx_free_trigger -
@@ -1729,6 +1825,15 @@ ice_rx_alloc_bufs(struct ice_rx_queue *rxq)
 		return -ENOMEM;
 	}
 
+	if (rxq->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) {
+		diag_pay = rte_mempool_get_bulk(rxq->rxseg[1].mp,
+				(void *)mbufs_pay, rxq->rx_free_thresh);
+		if (unlikely(diag_pay != 0)) {
+			PMD_RX_LOG(ERR, "Failed to get payload mbufs in bulk");
+			return -ENOMEM;
+		}
+	}
+
 	rxdp = &rxq->rx_ring[alloc_idx];
 	for (i = 0; i < rxq->rx_free_thresh; i++) {
 		if (likely(i < (rxq->rx_free_thresh - 1)))
@@ -1737,13 +1842,21 @@ ice_rx_alloc_bufs(struct ice_rx_queue *rxq)
 
 		mb = rxep[i].mbuf;
 		rte_mbuf_refcnt_set(mb, 1);
-		mb->next = NULL;
 		mb->data_off = RTE_PKTMBUF_HEADROOM;
 		mb->nb_segs = 1;
 		mb->port = rxq->port_id;
 		dma_addr = rte_cpu_to_le_64(rte_mbuf_data_iova_default(mb));
-		rxdp[i].read.hdr_addr = 0;
-		rxdp[i].read.pkt_addr = dma_addr;
+
+		if (!(rxq->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT)) {
+			mb->next = NULL;
+			rxdp[i].read.hdr_addr = 0;
+			rxdp[i].read.pkt_addr = dma_addr;
+		} else {
+			mb->next = mbufs_pay[i];
+			pay_addr = rte_cpu_to_le_64(rte_mbuf_data_iova_default(mbufs_pay[i]));
+			rxdp[i].read.hdr_addr = dma_addr;
+			rxdp[i].read.pkt_addr = pay_addr;
+		}
 	}
 
 	/* Update Rx tail register */
@@ -2352,11 +2465,13 @@ ice_recv_pkts(void *rx_queue,
 	struct ice_rx_entry *sw_ring = rxq->sw_ring;
 	struct ice_rx_entry *rxe;
 	struct rte_mbuf *nmb; /* new allocated mbuf */
+	struct rte_mbuf *nmb_pay; /* new allocated payload mbuf */
 	struct rte_mbuf *rxm; /* pointer to store old mbuf in SW ring */
 	uint16_t rx_id = rxq->rx_tail;
 	uint16_t nb_rx = 0;
 	uint16_t nb_hold = 0;
 	uint16_t rx_packet_len;
+	uint16_t rx_header_len;
 	uint16_t rx_stat_err0;
 	uint64_t dma_addr;
 	uint64_t pkt_flags;
@@ -2384,12 +2499,16 @@ ice_recv_pkts(void *rx_queue,
 		if (!(rx_stat_err0 & (1 << ICE_RX_FLEX_DESC_STATUS0_DD_S)))
 			break;
 
-		/* allocate mbuf */
+		if (rx_stat_err0 & (1 << ICE_RX_FLEX_DESC_STATUS0_HBO_S))
+			break;
+
+		/* allocate header mbuf */
 		nmb = rte_mbuf_raw_alloc(rxq->mp);
 		if (unlikely(!nmb)) {
 			rxq->vsi->adapter->pf.dev_data->rx_mbuf_alloc_failed++;
 			break;
 		}
+
 		rxd = *rxdp; /* copy descriptor in ring to temp variable*/
 
 		nb_hold++;
@@ -2402,24 +2521,60 @@ ice_recv_pkts(void *rx_queue,
 		dma_addr =
 			rte_cpu_to_le_64(rte_mbuf_data_iova_default(nmb));
 
-		/**
-		 * fill the read format of descriptor with physic address in
-		 * new allocated mbuf: nmb
-		 */
-		rxdp->read.hdr_addr = 0;
-		rxdp->read.pkt_addr = dma_addr;
+		if (!(rxq->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT)) {
+			/**
+			 * fill the read format of descriptor with physic address in
+			 * new allocated mbuf: nmb
+			 */
+			rxdp->read.hdr_addr = 0;
+			rxdp->read.pkt_addr = dma_addr;
+		} else {
+			/* allocate payload mbuf */
+			nmb_pay = rte_mbuf_raw_alloc(rxq->rxseg[1].mp);
+			if (unlikely(!nmb_pay)) {
+				rxq->vsi->adapter->pf.dev_data->rx_mbuf_alloc_failed++;
+				break;
+			}
 
-		/* calculate rx_packet_len of the received pkt */
-		rx_packet_len = (rte_le_to_cpu_16(rxd.wb.pkt_len) &
-				 ICE_RX_FLX_DESC_PKT_LEN_M) - rxq->crc_len;
+			nmb->next = nmb_pay;
+			nmb_pay->next = NULL;
+
+			/**
+			 * fill the read format of descriptor with physic address in
+			 * new allocated mbuf: nmb
+			 */
+			rxdp->read.hdr_addr = dma_addr;
+			rxdp->read.pkt_addr = rte_cpu_to_le_64(rte_mbuf_data_iova_default(nmb_pay));
+		}
 
 		/* fill old mbuf with received descriptor: rxd */
 		rxm->data_off = RTE_PKTMBUF_HEADROOM;
 		rte_prefetch0(RTE_PTR_ADD(rxm->buf_addr, RTE_PKTMBUF_HEADROOM));
-		rxm->nb_segs = 1;
-		rxm->next = NULL;
-		rxm->pkt_len = rx_packet_len;
-		rxm->data_len = rx_packet_len;
+		if (!(rxq->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT)) {
+			rxm->nb_segs = 1;
+			rxm->next = NULL;
+			/* calculate rx_packet_len of the received pkt */
+			rx_packet_len = (rte_le_to_cpu_16(rxd.wb.pkt_len) &
+					ICE_RX_FLX_DESC_PKT_LEN_M) - rxq->crc_len;
+			rxm->data_len = rx_packet_len;
+			rxm->pkt_len = rx_packet_len;
+		} else {
+			rxm->nb_segs = (uint16_t)(rxm->nb_segs + rxm->next->nb_segs);
+			rxm->next->next = NULL;
+			/* calculate rx_packet_len of the received pkt */
+			rx_header_len = rte_le_to_cpu_16(rxd.wb.hdr_len_sph_flex_flags1) &
+					ICE_RX_FLEX_DESC_HEADER_LEN_M;
+			rx_packet_len = (rte_le_to_cpu_16(rxd.wb.pkt_len) &
+					ICE_RX_FLX_DESC_PKT_LEN_M) - rxq->crc_len;
+			rxm->data_len = rx_header_len;
+			rxm->pkt_len = rx_header_len + rx_packet_len;
+			rxm->next->data_len = rx_packet_len;
+
+#ifdef RTE_ETHDEV_DEBUG_RX
+			rte_pktmbuf_dump(stdout, rxm, rte_pktmbuf_pkt_len(rxm));
+#endif
+		}
+
 		rxm->port = rxq->port_id;
 		rxm->packet_type = ptype_tbl[ICE_RX_FLEX_DESC_PTYPE_M &
 			rte_le_to_cpu_16(rxd.wb.ptype_flex_flags0)];
diff --git a/drivers/net/ice/ice_rxtx.h b/drivers/net/ice/ice_rxtx.h
index f5337d5284..d44bde3710 100644
--- a/drivers/net/ice/ice_rxtx.h
+++ b/drivers/net/ice/ice_rxtx.h
@@ -16,6 +16,9 @@
 #define ICE_RX_MAX_BURST 32
 #define ICE_TX_MAX_BURST 32
 
+/* Maximal number of segments to split. */
+#define ICE_RX_MAX_NSEG 2
+
 #define ICE_CHK_Q_ENA_COUNT        100
 #define ICE_CHK_Q_ENA_INTERVAL_US  100
 
@@ -43,6 +46,11 @@
 extern uint64_t ice_timestamp_dynflag;
 extern int ice_timestamp_dynfield_offset;
 
+/* Max header size can be 2K - 64 bytes */
+#define ICE_RX_HDR_BUF_SIZE    (2048 - 64)
+
+#define ICE_HEADER_SPLIT_ENA   BIT(0)
+
 typedef void (*ice_rx_release_mbufs_t)(struct ice_rx_queue *rxq);
 typedef void (*ice_tx_release_mbufs_t)(struct ice_tx_queue *txq);
 typedef void (*ice_rxd_to_pkt_fields_t)(struct ice_rx_queue *rxq,
@@ -53,6 +61,12 @@ struct ice_rx_entry {
 	struct rte_mbuf *mbuf;
 };
 
+enum ice_rx_dtype {
+	ICE_RX_DTYPE_NO_SPLIT       = 0,
+	ICE_RX_DTYPE_HEADER_SPLIT   = 1,
+	ICE_RX_DTYPE_SPLIT_ALWAYS   = 2,
+};
+
 struct ice_rx_queue {
 	struct rte_mempool *mp; /* mbuf pool to populate RX ring */
 	volatile union ice_rx_flex_desc *rx_ring;/* RX ring virtual address */
@@ -98,6 +112,8 @@ struct ice_rx_queue {
 	uint32_t hw_time_high; /* high 32 bits of timestamp */
 	uint32_t hw_time_low; /* low 32 bits of timestamp */
 	uint64_t hw_time_update; /* SW time of HW record updating */
+	struct rte_eth_rxseg_split rxseg[ICE_RX_MAX_NSEG];
+	uint32_t rxseg_nb;
 };
 
 struct ice_tx_entry {
diff --git a/drivers/net/ice/ice_rxtx_vec_common.h b/drivers/net/ice/ice_rxtx_vec_common.h
index 2dd2d83650..eec6ea2134 100644
--- a/drivers/net/ice/ice_rxtx_vec_common.h
+++ b/drivers/net/ice/ice_rxtx_vec_common.h
@@ -291,6 +291,9 @@ ice_rx_vec_queue_default(struct ice_rx_queue *rxq)
 	if (rxq->offloads & RTE_ETH_RX_OFFLOAD_TIMESTAMP)
 		return -1;
 
+	if (rxq->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT)
+		return -1;
+
 	if (rxq->offloads & ICE_RX_VECTOR_OFFLOAD)
 		return ICE_VECTOR_OFFLOAD_PATH;
 
-- 
2.25.1


^ permalink raw reply	[flat|nested] 72+ messages in thread

* RE: [PATCH v3 2/4] ethdev: introduce protocol hdr based buffer split
  2022-09-20  5:35         ` Andrew Rybchenko
@ 2022-09-22  3:13           ` Wang, YuanX
  0 siblings, 0 replies; 72+ messages in thread
From: Wang, YuanX @ 2022-09-22  3:13 UTC (permalink / raw)
  To: Andrew Rybchenko, dev, Thomas Monjalon, Ferruh Yigit
  Cc: mdr, Li, Xiaoyun, Singh, Aman Deep, Zhang, Yuying, Zhang, Qi Z,
	Yang, Qiming, jerinjacobk, viacheslavo, stephen, Ding, Xuan,
	hpothula, Tang, Yaqi, Wenxuan Wu

Hi Andrew,

> -----Original Message-----
> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> Sent: Tuesday, September 20, 2022 1:35 PM
> To: Wang, YuanX <yuanx.wang@intel.com>; dev@dpdk.org; Thomas
> Monjalon <thomas@monjalon.net>; Ferruh Yigit <ferruh.yigit@xilinx.com>
> Cc: mdr@ashroe.eu; Li, Xiaoyun <xiaoyun.li@intel.com>; Singh, Aman Deep
> <aman.deep.singh@intel.com>; Zhang, Yuying <yuying.zhang@intel.com>;
> Zhang, Qi Z <qi.z.zhang@intel.com>; Yang, Qiming <qiming.yang@intel.com>;
> jerinjacobk@gmail.com; viacheslavo@nvidia.com;
> stephen@networkplumber.org; Ding, Xuan <xuan.ding@intel.com>;
> hpothula@marvell.com; Tang, Yaqi <yaqi.tang@intel.com>; Wenxuan Wu
> <wenxuanx.wu@intel.com>
> Subject: Re: [PATCH v3 2/4] ethdev: introduce protocol hdr based buffer split
> 
> On 9/16/22 11:38, Wang, YuanX wrote:
> > Hi Andrew,
> >
> >> -----Original Message-----
> >> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> >> Sent: Monday, September 12, 2022 7:47 PM
> >> To: Wang, YuanX <yuanx.wang@intel.com>; dev@dpdk.org; Thomas
> Monjalon
> >> <thomas@monjalon.net>; Ferruh Yigit <ferruh.yigit@xilinx.com>
> >> Cc: mdr@ashroe.eu; Li, Xiaoyun <xiaoyun.li@intel.com>; Singh, Aman
> >> Deep <aman.deep.singh@intel.com>; Zhang, Yuying
> >> <yuying.zhang@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>; Yang,
> >> Qiming <qiming.yang@intel.com>; jerinjacobk@gmail.com;
> >> viacheslavo@nvidia.com; stephen@networkplumber.org; Ding, Xuan
> >> <xuan.ding@intel.com>; hpothula@marvell.com; Tang, Yaqi
> >> <yaqi.tang@intel.com>; Wenxuan Wu <wenxuanx.wu@intel.com>
> >> Subject: Re: [PATCH v3 2/4] ethdev: introduce protocol hdr based
> >> buffer split
> >>
> >> On 9/2/22 22:10, Yuan Wang wrote:
> >>> Currently, Rx buffer split supports length based split. With Rx
> >>> queue offload RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT enabled and Rx
> packet
> >> segment
> >>> configured, PMD will be able to split the received packets into
> >>> multiple segments.
> >>>
> >>> However, length based buffer split is not suitable for NICs that do
> >>> split based on protocol headers. Given an arbitrarily variable
> >>> length in Rx packet segment, it is almost impossible to pass a fixed
> >>> protocol header to driver. Besides, the existence of tunneling
> >>> results in the composition of a packet is various, which makes the
> situation even worse.
> >>>
> >>> This patch extends current buffer split to support protocol header
> >>> based buffer split. A new proto_hdr field is introduced in the
> >>> reserved field of rte_eth_rxseg_split structure to specify protocol
> >>> header. The proto_hdr field defines the split position of packet,
> >>> splitting will always happens after the protocol header defined in
> >>> the Rx packet segment. When Rx queue offload
> >>> RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT is enabled and corresponding
> >> protocol
> >>> header is configured, driver will split the ingress packets into
> >>> multiple
> >> segments.
> >>>
> >>> struct rte_eth_rxseg_split {
> >>>           struct rte_mempool *mp; /* memory pools to allocate
> >>> segment from
> >> */
> >>>           uint16_t length; /* segment maximal data length,
> >>>                               configures split point */
> >>>           uint16_t offset; /* data offset from beginning
> >>>                               of mbuf data buffer */
> >>>           uint32_t proto_hdr; /* supported ptype of a specific pmd,
> >>>                                  configures split point.
> >>> 			       It should be defined by RTE_PTYPE_*
> >>
> >> If I understand correctly, the statement is a bit misleading since it
> >> should be a bit mask of RTE_PTYPE_* defines. Not exactly one
> RTE_PTYPE_*.
> >
> > Do you mean that a segment should support multiple protocol headers,
> such as splitting both tcp and udp headers?
> 
> No-no. Look. In order to split after some protocol, for example UDP, NIC
> should recognice all previous protocols. Moreover, in the case of a tunenl
> UDP could be inner and outer. At which point would you like to split if NIC
> supports both?
> Another example, if NIC support stplit after Eth-IPv4-UDP and after Eth-IPv6-
> UDP, how to request to split just after Eth-IPv4-UDP, but not Eth-IPv6-UDP?

Thank you for your patience. 

We have a proposal to solve this problem. 
We define the proto_hdr as a bit mask. Each mask should contain the composition of the packet, but inner and outer are written separately (to avoid unnecessary trouble).
We use the highest RTE_PTYPE* in the mask to define the split position.

For the first example, since the ptype definitions already distinguish between outer and inner, the outer UDP case can simply be written as RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_L4_UDP.
The inner UDP case can be written as RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_UDP.

 For the second example, Eth-IPv4-UDP can be written as RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_L4_UDP.
 Eth-IPv6-UDP can be written as RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN | RTE_PTYPE_L4_UDP.
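
Expressed as values to be assigned to the proposed rte_eth_rxseg_split.proto_hdr
field (the variable names are purely illustrative), the second example would be:

    uint32_t eth_ipv4_udp = RTE_PTYPE_L2_ETHER |
                            RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
                            RTE_PTYPE_L4_UDP;   /* split after Eth-IPv4-UDP */
    uint32_t eth_ipv6_udp = RTE_PTYPE_L2_ETHER |
                            RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
                            RTE_PTYPE_L4_UDP;   /* split after Eth-IPv6-UDP */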

What do you think?

Thanks,
Yuan

> 
> >
> >>
> >>> 			     */
> >>> };
> >>>
> >>> If protocol header split can be supported by a PMD. The
> >>> rte_eth_buffer_split_get_supported_hdr_ptypes function can be use
> to
> >>> obtain a list of these protocol headers.
> >>>
> >>> For example, let's suppose we configured the Rx queue with the
> >>> following segments:
> >>>           seg0 - pool0, proto_hdr0=RTE_PTYPE_L3_IPV4, off0=2B
> >>>           seg1 - pool1, proto_hdr1=RTE_PTYPE_L4_UDP, off1=128B
> >>>           seg2 - pool2, off1=0B
> >>>
> >>> The packet consists of MAC_IPV4_UDP_PAYLOAD will be split like
> >>
> >> What is MAC_IPV4_UDP_PAYLOAD? Do you mean
> ETH_IPV4_UDP_PAYLOAD?
> >
> > Thanks for your correction, it should be ETH_IPV4_UDP_PAYLOAD.
> >
> >>
> >>> following:
> >>>           seg0 - ipv4 header @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from
> >> pool0
> >>>           seg1 - udp header @ 128 in mbuf from pool1
> >>>           seg2 - payload @ 0 in mbuf from pool2
> >>>
> >>> Note: NIC will only do split when the packets exactly match all the
> >>> protocol headers in the segments. For example, if ARP packets
> >>> received with above config, the NIC won't do split for ARP packets
> >>> since it does not contains ipv4 header and udp header.
> >>
> >> You must define which mempool is used in the case.
> >
> > IMHO I don't think we can define which mempool to use, it depends on NIC
> behavior.
> > For our NIC, packets that are unable to split will be put into the last valid
> pool, with zero offset.
> > So here we would like to define to put these packets into the last valid
> mempool, with zero offset.
> 
> 
> Anyway API should not be silent about the case since it is the first question at
> least in my head. IMHO the last segment is the only sensible option since it is
> typically will be big enough. Other mempool for protocol headers are likely to
> be small. So, I suggest to define this way.
> 
> >>
> >>>
> >>> Now buffer split can be configured in two modes. For length based
> >>> buffer split, the mp, length, offset field in Rx packet segment
> >>> should be configured, while the proto_hdr field will be ignored.
> >>> For protocol header based buffer split, the mp, offset, proto_hdr
> >>> field in Rx packet segment should be configured, while the length
> >>> field will be ignored.
> >>>
> >>> The split limitations imposed by underlying driver is reported in
> >>> the rte_eth_dev_info->rx_seg_capa field. The memory attributes for
> >>> the split parts may differ either, dpdk memory and external memory,
> >> respectively.
> >>>
> >>> Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
> >>> Signed-off-by: Xuan Ding <xuan.ding@intel.com>
> >>> Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
> >>> ---
> >>>    doc/guides/rel_notes/release_22_11.rst |  5 +++
> >>>    lib/ethdev/rte_ethdev.c                | 55 ++++++++++++++++++++------
> >>>    lib/ethdev/rte_ethdev.h                | 17 +++++++-
> >>>    3 files changed, 65 insertions(+), 12 deletions(-)
> >>>
> >>> diff --git a/doc/guides/rel_notes/release_22_11.rst
> >>> b/doc/guides/rel_notes/release_22_11.rst
> >>> index 4d90514a9a..f3b58c7895 100644
> >>> --- a/doc/guides/rel_notes/release_22_11.rst
> >>> +++ b/doc/guides/rel_notes/release_22_11.rst
> >>> @@ -60,6 +60,11 @@ New Features
> >>>      Added ``rte_eth_buffer_split_get_supported_hdr_ptypes()``, to
> >>> get
> >> supported
> >>>      header protocols of a PMD to split.
> >>>
> >>> +* **Added protocol header based buffer split.**
> >>> +  Ethdev: The ``reserved`` field in the  ``rte_eth_rxseg_split``
> >>> +structure is
> >>> +  replaced with ``proto_hdr`` to support protocol header based buffer
> split.
> >>> +  User can choose length or protocol header to configure buffer
> >>> +split
> >>> +  according to NIC's capability.
> >>
> >> Add one more empty line to have two before the next sectoin.
> >
> > Thanks for your catch.
> >
> >>
> >>>
> >>>    Removed Items
> >>>    -------------
> >>> diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c index
> >>> 093c577add..dfceb723ee 100644
> >>> --- a/lib/ethdev/rte_ethdev.c
> >>> +++ b/lib/ethdev/rte_ethdev.c
> >>> @@ -1635,9 +1635,10 @@ rte_eth_dev_is_removed(uint16_t port_id)
> >>>    }
> >>>
> >>>    static int
> >>> -rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split
> *rx_seg,
> >>> -			     uint16_t n_seg, uint32_t *mbp_buf_size,
> >>> -			     const struct rte_eth_dev_info *dev_info)
> >>> +rte_eth_rx_queue_check_split(uint16_t port_id,
> >>> +			const struct rte_eth_rxseg_split *rx_seg,
> >>> +			uint16_t n_seg, uint32_t *mbp_buf_size,
> >>> +			const struct rte_eth_dev_info *dev_info)
> >>>    {
> >>>    	const struct rte_eth_rxseg_capa *seg_capa = &dev_info-
> >>> rx_seg_capa;
> >>>    	struct rte_mempool *mp_first;
> >>> @@ -1660,6 +1661,7 @@ rte_eth_rx_queue_check_split(const struct
> >> rte_eth_rxseg_split *rx_seg,
> >>>    		struct rte_mempool *mpl = rx_seg[seg_idx].mp;
> >>>    		uint32_t length = rx_seg[seg_idx].length;
> >>>    		uint32_t offset = rx_seg[seg_idx].offset;
> >>> +		uint32_t proto_hdr = rx_seg[seg_idx].proto_hdr;
> >>>
> >>>    		if (mpl == NULL) {
> >>>    			RTE_ETHDEV_LOG(ERR, "null mempool pointer\n");
> >> @@ -1693,13
> >>> +1695,44 @@ rte_eth_rx_queue_check_split(const struct
> >> rte_eth_rxseg_split *rx_seg,
> >>>    		}
> >>>    		offset += seg_idx != 0 ? 0 : RTE_PKTMBUF_HEADROOM;
> >>>    		*mbp_buf_size = rte_pktmbuf_data_room_size(mpl);
> >>> -		length = length != 0 ? length : *mbp_buf_size;
> >>> -		if (*mbp_buf_size < length + offset) {
> >>> -			RTE_ETHDEV_LOG(ERR,
> >>> -				       "%s mbuf_data_room_size %u < %u
> >> (segment length=%u + segment offset=%u)\n",
> >>> -				       mpl->name, *mbp_buf_size,
> >>> -				       length + offset, length, offset);
> >>> -			return -EINVAL;
> >>> +
> >>> +		int ret =
> >> rte_eth_buffer_split_get_supported_hdr_ptypes(port_id,
> >>> +NULL, 0);
> >>
> >> Do not mix variable declaration and code.
> >> It is better to give the variable some sensible name.
> >> Otherwise else branch code is hard to read.
> >
> > Thanks for the suggestion, will  take care of naming.
> >
> >>
> >>> +		if (ret <= 0) {
> >>
> >> May be I'm missing something, but nothing prevetns a driver/HW to
> >> support both protocol-based and fixed-length split.
> >> So, ability to support protocol based split should be treated as a
> >> request to do it. It must be based on rx_seg->proto_hdr content (for all
> segments).
> >>
> >> Also nothing should prevent to mix protocol and fixed-length split.
> >> I.e. split just after UDP in the first segment,
> >> 40 bytes in the second segment, everything else in the third.
> >
> > Mix mode is an interesting idea. Currently testpmd and driver do not
> support mixed mode, but it does not affect the library to support this mode.
> 
> That's OK.
> 
> >
> >>
> >>> +			/* Split at fixed length. */
> >>> +			length = length != 0 ? length : *mbp_buf_size;
> >>> +			if (*mbp_buf_size < length + offset) {
> >>> +				RTE_ETHDEV_LOG(ERR,
> >>> +					"%s mbuf_data_room_size %u < %u
> >> (segment length=%u + segment offset=%u)\n",
> >>> +					mpl->name, *mbp_buf_size,
> >>> +					length + offset, length, offset);
> >>> +				return -EINVAL;
> >>> +			}
> >>> +		} else {
> >>> +			/* Split after specified protocol header. */
> >>> +			uint32_t ptypes[ret];
> >>> +			int i;
> >>> +
> >>> +			ret =
> >> rte_eth_buffer_split_get_supported_hdr_ptypes(port_id,
> >>> +ptypes, ret);
> >>
> >> In theory, the funciton could fail since input arguments differ. So,
> >> it should be handled.
> >
> > Thanks for your catch, will fix in the next version.
> >
> >>
> >>> +			for (i = 0; i < ret; i++)
> >>> +				if (ptypes[i] & proto_hdr)
> >>
> >> IMHO it should be ==, not &. I think that
> >> rte_eth_buffer_split_get_supported_hdr_ptypes() should define points
> >> at which split could happen and we should match the point exactly.
> >
> > Sure, == is better. Thanks for the suggestion.
> >
> >>
> >>> +					break;
> >>> +
> >>> +			if (i == ret) {
> >>> +#define PTYPE_NAMESIZE	256
> >>
> >> Why? It is looks really strange that it is defined here.
> >
> > I intend to display the protocol name in the log, but if the proto_hdr is a bit
> mask, can I just show the number?
> > Please see v4 for this modification.
> >
> >>
> >>> +				char ptype_name[PTYPE_NAMESIZE];
> >>> +				rte_get_ptype_name(proto_hdr,
> >> ptype_name, sizeof(ptype_name));
> >>> +				RTE_ETHDEV_LOG(ERR,
> >>> +					"Protocol header %s is not
> >> supported.\n",
> >>> +					ptype_name);
> >>> +				return -EINVAL;
> >>> +			}
> >>> +			if (*mbp_buf_size < offset) {
> >>> +				RTE_ETHDEV_LOG(ERR,
> >>> +						"%s
> >> mbuf_data_room_size %u < %u segment offset)\n",
> >>> +						mpl->name, *mbp_buf_size,
> >>> +						offset);
> >>> +				return -EINVAL;
> >>> +			}
> >>>    		}
> >>>    	}
> >>>    	return 0;
> >>> @@ -1778,7 +1811,7 @@ rte_eth_rx_queue_setup(uint16_t port_id,
> >> uint16_t rx_queue_id,
> >>>    		n_seg = rx_conf->rx_nseg;
> >>>
> >>>    		if (rx_conf->offloads &
> >> RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) {
> >>> -			ret = rte_eth_rx_queue_check_split(rx_seg, n_seg,
> >>> +			ret = rte_eth_rx_queue_check_split(port_id, rx_seg,
> >> n_seg,
> >>>    							   &mbp_buf_size,
> >>>    							   &dev_info);
> >>>    			if (ret != 0)
> >>> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h index
> >>> c58c908c3a..410fba5eab 100644
> >>> --- a/lib/ethdev/rte_ethdev.h
> >>> +++ b/lib/ethdev/rte_ethdev.h
> >>> @@ -1175,6 +1175,9 @@ struct rte_eth_txmode {
> >>>     *   specified in the first array element, the second buffer, from the
> >>>     *   pool in the second element, and so on.
> >>>     *
> >>> + * - The proto_hdrs in the elements define the split position of
> >>> + *   received packets.
> >>> + *
> >>>     * - The offsets from the segment description elements specify
> >>>     *   the data offset from the buffer beginning except the first mbuf.
> >>>     *   The first segment offset is added with RTE_PKTMBUF_HEADROOM.
> >>> @@ -1196,12 +1199,24 @@ struct rte_eth_txmode {
> >>>     *     - pool from the last valid element
> >>>     *     - the buffer size from this pool
> >>>     *     - zero offset
> >>> + *
> >>> + * - Length based buffer split:
> >>> + *     - mp, length, offset should be configured.
> >>> + *     - The proto_hdr field will be ignored.
> >>> + *
> >>> + * - Protocol header based buffer split:
> >>> + *     - mp, offset, proto_hdr should be configured.
> >>> + *     - The length field will be ignored.
> >>>     */
> >>>    struct rte_eth_rxseg_split {
> >>>    	struct rte_mempool *mp; /**< Memory pool to allocate segment
> >> from. */
> >>>    	uint16_t length; /**< Segment data length, configures split point. */
> >>>    	uint16_t offset; /**< Data offset from beginning of mbuf data buffer.
> >> */
> >>> -	uint32_t reserved; /**< Reserved field. */
> >>> +	/**
> >>> +	 * Supported ptype of a specific pmd, configures split point.
> >>> +	 * It should be defined by RTE_PTYPE_*.
> >>> +	 */
> >>> +	uint32_t proto_hdr;
> >>>    };
> >>>
> >>>    /**
> >


^ permalink raw reply	[flat|nested] 72+ messages in thread

* [PATCH v5 0/4] support protocol based buffer split
  2022-08-12 18:15 [PATCH 0/4] support protocol based buffer split Yuan Wang
                   ` (10 preceding siblings ...)
  2022-09-20 11:12 ` [PATCH v4 0/4] support protocol based buffer split Yuan Wang
@ 2022-09-26  9:40 ` Yuan Wang
  2022-09-26  9:40   ` [PATCH v5 1/4] ethdev: introduce protocol header API Yuan Wang
                     ` (3 more replies)
  2022-09-29 18:59 ` [PATCH v6 0/4] support protocol based buffer split Yuan Wang
                   ` (3 subsequent siblings)
  15 siblings, 4 replies; 72+ messages in thread
From: Yuan Wang @ 2022-09-26  9:40 UTC (permalink / raw)
  To: dev
  Cc: thomas, andrew.rybchenko, ferruh.yigit, mdr, xiaoyun.li,
	aman.deep.singh, yuying.zhang, qi.z.zhang, qiming.yang,
	jerinjacobk, viacheslavo, stephen, xuan.ding, hpothula,
	yaqi.tang, Yuan Wang

Protocol type based buffer split consists of splitting a received packet
into several separate segments based on the packet content. It is useful
in some scenarios, such as GPU acceleration. The splitting will help to
enable true zero copy and hence improve the performance significantly.

This patchset aims to support protocol header split based on current buffer
split. When Rx queue is configured with RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT
offload and corresponding protocol, packets received will be directly split
into different mempools.

Change log:
v5:
Define proto_hdr to use mask instead of single protocol type.
Define PMD to return protocol header mask.
Refine the doc and commit log.
Remove deprecated RTE_FUNC_PTR_OR_ERR_RET.

v4:
Change proto_hdr to a bit mask of RTE_PTYPE_*.
Add the description on how to put the unsplit packages.
Use proto_hdr to determine whether to use protocol based split.

v3:
Fix mail thread.

v2:
Add mbuf dump to the driver's buffer split path.
Add buffer split to the driver feature list.
Remove unsupported header protocols from the driver.

Yuan Wang (4):
  ethdev: introduce protocol header API
  ethdev: introduce protocol hdr based buffer split
  app/testpmd: add rxhdrs commands and parameters
  net/ice: support buffer split in Rx path

 app/test-pmd/cmdline.c                 | 146 ++++++++++++++++-
 app/test-pmd/config.c                  |  88 ++++++++++
 app/test-pmd/parameters.c              |  16 +-
 app/test-pmd/testpmd.c                 |   2 +
 app/test-pmd/testpmd.h                 |   6 +
 doc/guides/rel_notes/release_22_11.rst |  16 ++
 drivers/net/ice/ice_ethdev.c           |  55 ++++++-
 drivers/net/ice/ice_rxtx.c             | 218 +++++++++++++++++++++----
 drivers/net/ice/ice_rxtx.h             |  16 ++
 drivers/net/ice/ice_rxtx_vec_common.h  |   3 +
 lib/ethdev/ethdev_driver.h             |  15 ++
 lib/ethdev/rte_ethdev.c                | 107 ++++++++++--
 lib/ethdev/rte_ethdev.h                |  59 ++++++-
 lib/ethdev/version.map                 |   3 +
 14 files changed, 702 insertions(+), 48 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 72+ messages in thread

* [PATCH v5 1/4] ethdev: introduce protocol header API
  2022-09-26  9:40 ` [PATCH v5 0/4] support protocol based buffer split Yuan Wang
@ 2022-09-26  9:40   ` Yuan Wang
  2022-09-26  9:40   ` [PATCH v5 2/4] ethdev: introduce protocol hdr based buffer split Yuan Wang
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 72+ messages in thread
From: Yuan Wang @ 2022-09-26  9:40 UTC (permalink / raw)
  To: dev, Thomas Monjalon, Ferruh Yigit, Andrew Rybchenko, Ray Kinsella
  Cc: xiaoyun.li, aman.deep.singh, yuying.zhang, qi.z.zhang,
	qiming.yang, jerinjacobk, viacheslavo, stephen, xuan.ding,
	hpothula, yaqi.tang, Yuan Wang, Wenxuan Wu

Add a new ethdev API to retrieve supported protocol headers
of a PMD, which helps to configure protocol header based buffer split.
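
A minimal usage sketch of the new API follows (port_id is illustrative;
it assumes <stdio.h>, <stdlib.h>, <inttypes.h> and <rte_ethdev.h>):

	uint32_t *ptypes;
	int cnt, i;

	/* First call with a NULL array to query the number of entries. */
	cnt = rte_eth_buffer_split_get_supported_hdr_ptypes(port_id, NULL, 0);
	if (cnt <= 0)
		return cnt; /* buffer split not supported, or error */

	ptypes = malloc(sizeof(*ptypes) * cnt);
	cnt = rte_eth_buffer_split_get_supported_hdr_ptypes(port_id,
							    ptypes, cnt);
	for (i = 0; i < cnt; i++)
		printf("supported split ptype: 0x%" PRIx32 "\n", ptypes[i]);
	free(ptypes);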

Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
---
 doc/guides/rel_notes/release_22_11.rst |  5 ++++
 lib/ethdev/ethdev_driver.h             | 15 ++++++++++++
 lib/ethdev/rte_ethdev.c                | 33 ++++++++++++++++++++++++++
 lib/ethdev/rte_ethdev.h                | 30 +++++++++++++++++++++++
 lib/ethdev/version.map                 |  3 +++
 5 files changed, 86 insertions(+)

diff --git a/doc/guides/rel_notes/release_22_11.rst b/doc/guides/rel_notes/release_22_11.rst
index 235ac9bf94..8e5bdde46a 100644
--- a/doc/guides/rel_notes/release_22_11.rst
+++ b/doc/guides/rel_notes/release_22_11.rst
@@ -59,6 +59,11 @@ New Features
 
   * Added support to set device link down/up.
 
+* **Added new ethdev API for PMD to get buffer split supported protocol types.**
+
+  Added ``rte_eth_buffer_split_get_supported_hdr_ptypes()``, to get supported
+  header protocols of a PMD to split.
+
 
 Removed Items
 -------------
diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
index 8cd8eb8685..791b264610 100644
--- a/lib/ethdev/ethdev_driver.h
+++ b/lib/ethdev/ethdev_driver.h
@@ -1055,6 +1055,18 @@ typedef int (*eth_ip_reassembly_conf_get_t)(struct rte_eth_dev *dev,
 typedef int (*eth_ip_reassembly_conf_set_t)(struct rte_eth_dev *dev,
 		const struct rte_eth_ip_reassembly_params *conf);
 
+/**
+ * @internal
+ * Get supported header protocols of a PMD to split.
+ *
+ * @param dev
+ *   Ethdev handle of port.
+ *
+ * @return
+ *   An array pointer to store supported protocol headers.
+ */
+typedef const uint32_t *(*eth_buffer_split_supported_hdr_ptypes_get_t)(struct rte_eth_dev *dev);
+
 /**
  * @internal
  * Dump private info from device to a file.
@@ -1302,6 +1314,9 @@ struct eth_dev_ops {
 	/** Set IP reassembly configuration */
 	eth_ip_reassembly_conf_set_t ip_reassembly_conf_set;
 
+	/** Get supported header ptypes to split */
+	eth_buffer_split_supported_hdr_ptypes_get_t buffer_split_supported_hdr_ptypes_get;
+
 	/** Dump private info from device */
 	eth_dev_priv_dump_t eth_dev_priv_dump;
 
diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
index 0c2c1088c0..1f0a7f8f3f 100644
--- a/lib/ethdev/rte_ethdev.c
+++ b/lib/ethdev/rte_ethdev.c
@@ -6002,6 +6002,39 @@ rte_eth_dev_priv_dump(uint16_t port_id, FILE *file)
 	return eth_err(port_id, (*dev->dev_ops->eth_dev_priv_dump)(dev, file));
 }
 
+int
+rte_eth_buffer_split_get_supported_hdr_ptypes(uint16_t port_id, uint32_t *ptypes, int num)
+{
+	int i, j;
+	struct rte_eth_dev *dev;
+	const uint32_t *all_types;
+
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
+	dev = &rte_eth_devices[port_id];
+
+	if (ptypes == NULL && num > 0) {
+		RTE_ETHDEV_LOG(ERR,
+			"Cannot get ethdev port %u supported header protocol types to NULL when array size is non zero\n",
+			port_id);
+		return -EINVAL;
+	}
+
+	if (*dev->dev_ops->buffer_split_supported_hdr_ptypes_get == NULL)
+		return -ENOTSUP;
+	all_types = (*dev->dev_ops->buffer_split_supported_hdr_ptypes_get)(dev);
+
+	if (!all_types)
+		return 0;
+
+	for (i = 0, j = 0; all_types[i] != RTE_PTYPE_UNKNOWN; ++i) {
+		if (j < num)
+			ptypes[j] = all_types[i];
+		j++;
+	}
+
+	return j;
+}
+
 RTE_LOG_REGISTER_DEFAULT(rte_eth_dev_logtype, INFO);
 
 RTE_INIT(ethdev_init_telemetry)
diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
index 45d17ddd13..c440e3863a 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -5924,6 +5924,36 @@ rte_eth_tx_buffer(uint16_t port_id, uint16_t queue_id,
 	return rte_eth_tx_buffer_flush(port_id, queue_id, buffer);
 }
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Get supported header protocols to split on Rx.
+ *
+ * When a packet type is announced to be split, it *must* be supported by
+ * the PMD. For instance, if eth-ipv4, eth-ipv4-udp is announced, the PMD must
+ * return the following packet types for these packets:
+ * - Ether/IPv4             -> RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4
+ * - Ether/IPv4/UDP         -> RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4 | RTE_PTYPE_L4_UDP
+ *
+ * @param port_id
+ *   The port identifier of the device.
+ * @param[out] ptypes
+ *   An array pointer to store supported protocol headers, allocated by caller.
+ *   These ptypes are composed with RTE_PTYPE_*.
+ * @param num
+ *   Size of the array pointed by param ptypes.
+ * @return
+ *   - (>=0) Number of supported ptypes. If the number of types exceeds num,
+ *           only num entries will be filled into the ptypes array, but the full
+ *           count of supported ptypes will be returned.
+ *   - (-ENOTSUP) if header protocol is not supported by device.
+ *   - (-ENODEV) if *port_id* invalid.
+ *   - (-EINVAL) if bad parameter.
+ */
+__rte_experimental
+int rte_eth_buffer_split_get_supported_hdr_ptypes(uint16_t port_id, uint32_t *ptypes, int num);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/ethdev/version.map b/lib/ethdev/version.map
index 03f52fee91..e496c8d938 100644
--- a/lib/ethdev/version.map
+++ b/lib/ethdev/version.map
@@ -285,6 +285,9 @@ EXPERIMENTAL {
 	rte_mtr_color_in_protocol_priority_get;
 	rte_mtr_color_in_protocol_set;
 	rte_mtr_meter_vlan_table_update;
+
+	# added in 22.11
+	rte_eth_buffer_split_get_supported_hdr_ptypes;
 };
 
 INTERNAL {
-- 
2.25.1


^ permalink raw reply	[flat|nested] 72+ messages in thread

* [PATCH v5 2/4] ethdev: introduce protocol hdr based buffer split
  2022-09-26  9:40 ` [PATCH v5 0/4] support protocol based buffer split Yuan Wang
  2022-09-26  9:40   ` [PATCH v5 1/4] ethdev: introduce protocol header API Yuan Wang
@ 2022-09-26  9:40   ` Yuan Wang
  2022-09-28 15:42     ` Wang, YuanX
  2022-09-26  9:40   ` [PATCH v5 3/4] app/testpmd: add rxhdrs commands and parameters Yuan Wang
  2022-09-26  9:40   ` [PATCH v5 4/4] net/ice: support buffer split in Rx path Yuan Wang
  3 siblings, 1 reply; 72+ messages in thread
From: Yuan Wang @ 2022-09-26  9:40 UTC (permalink / raw)
  To: dev, Thomas Monjalon, Ferruh Yigit, Andrew Rybchenko
  Cc: mdr, xiaoyun.li, aman.deep.singh, yuying.zhang, qi.z.zhang,
	qiming.yang, jerinjacobk, viacheslavo, stephen, xuan.ding,
	hpothula, yaqi.tang, Yuan Wang, Wenxuan Wu

Currently, Rx buffer split supports length based split. With Rx queue
offload RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT enabled and Rx packet segment
configured, PMD will be able to split the received packets into
multiple segments.

However, length based buffer split is not suitable for NICs that do split
based on protocol headers. Given an arbitrarily variable length in an Rx
packet segment, it is almost impossible to pass a fixed protocol header to
the driver. Besides, tunneling makes the composition of a packet highly
variable, which makes the situation even worse.

This patch extends current buffer split to support protocol header based
buffer split. A new proto_hdr field is introduced in the reserved field
of rte_eth_rxseg_split structure to specify protocol header. The proto_hdr
field defines the split position of packet, splitting will always happen
after the protocol header defined in the Rx packet segment. When Rx queue
offload RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT is enabled and corresponding
protocol header is configured, driver will split the ingress packets into
multiple segments.

Examples of proto_hdr field definitions:
To split after ETH-IPV4-UDP, it should be defined as
RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_L4_UDP

For inner ETH-IPV4-UDP, it should be defined as
RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER |
RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_UDP

struct rte_eth_rxseg_split {
        struct rte_mempool *mp; /* memory pools to allocate segment from */
        uint16_t length; /* segment maximal data length,
                            configures split point */
        uint16_t offset; /* data offset from beginning
                            of mbuf data buffer */
        /**
         * Proto_hdr defines a bit mask of the protocol sequence as
         * RTE_PTYPE_*, configures split point. The last RTE_PTYPE*
         * in the mask indicates the split position.
         * For non-tunneling packets, the complete protocol sequence
         * should be defined.
         * For tunneling packets, for simplicity, only the tunnel and
         * inner protocol sequence should be defined.
         */
        uint32_t proto_hdr;
};

If protocol header split is supported by a PMD, the
rte_eth_buffer_split_get_supported_hdr_ptypes function can
be used to obtain a list of these protocol headers.

For example, let's suppose we configured the Rx queue with the
following segments:
        seg0 - pool0, proto_hdr0=RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4,
               off0=2B
        seg1 - pool1, proto_hdr1=RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4
               | RTE_PTYPE_L4_UDP, off1=128B
        seg2 - pool2, off2=0B

The packet consists of ETH_IPV4_UDP_PAYLOAD will be split like
following:
        seg0 - ipv4 header @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from pool0
        seg1 - udp header @ 128 in mbuf from pool1
        seg2 - payload @ 0 in mbuf from pool2

Note: the NIC will only do the split when the packets exactly match all the
protocol headers in the segments. For example, if ARP packets are received
with the above config, the NIC won't split them, since they contain
neither an ipv4 header nor a udp header. These packets will be put
into the last valid mempool, with zero offset.

Now buffer split can be configured in two modes. For length based
buffer split, the mp, length, offset field in Rx packet segment should
be configured, while the proto_hdr field will be ignored.
For protocol header based buffer split, the mp, offset, proto_hdr field
in Rx packet segment should be configured, while the length field will
be ignored.

The split limitations imposed by the underlying driver are reported in the
rte_eth_dev_info->rx_seg_capa field. The memory attributes of the split
parts may also differ, e.g. DPDK memory for one part and external memory
for another.
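
Purely as an illustrative sketch (not part of the patch), the example
segment configuration above could be expressed roughly as below. The pool
pointers, port/queue ids, descriptor count and base rxconf are
placeholders, and the exact RTE_PTYPE_* combinations must be ones the PMD
reports through rte_eth_buffer_split_get_supported_hdr_ptypes():

#include <rte_common.h>
#include <rte_mbuf_ptype.h>
#include <rte_ethdev.h>

static int
setup_proto_split_queue(uint16_t port_id, uint16_t queue_id, uint16_t nb_rxd,
                        unsigned int socket_id, struct rte_eth_rxconf rxconf,
                        struct rte_mempool *pool0, struct rte_mempool *pool1,
                        struct rte_mempool *pool2)
{
        union rte_eth_rxseg rx_seg[3] = {
                { .split = { .mp = pool0, .offset = 2,
                             .proto_hdr = RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4 } },
                { .split = { .mp = pool1, .offset = 128,
                             .proto_hdr = RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4 |
                                          RTE_PTYPE_L4_UDP } },
                /* Last element receives the remaining payload. */
                { .split = { .mp = pool2, .offset = 0,
                             .proto_hdr = RTE_PTYPE_ALL_MASK } },
        };

        rxconf.rx_seg = rx_seg;
        rxconf.rx_nseg = RTE_DIM(rx_seg);
        rxconf.offloads |= RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT;
        /* mb_pool argument is NULL: the mempools come from rx_seg[] instead. */
        return rte_eth_rx_queue_setup(port_id, queue_id, nb_rxd, socket_id,
                                      &rxconf, NULL);
}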

Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
---
 doc/guides/rel_notes/release_22_11.rst |  7 +++
 lib/ethdev/rte_ethdev.c                | 74 ++++++++++++++++++++++----
 lib/ethdev/rte_ethdev.h                | 29 +++++++++-
 3 files changed, 98 insertions(+), 12 deletions(-)

diff --git a/doc/guides/rel_notes/release_22_11.rst b/doc/guides/rel_notes/release_22_11.rst
index 8e5bdde46a..cce1f6e50c 100644
--- a/doc/guides/rel_notes/release_22_11.rst
+++ b/doc/guides/rel_notes/release_22_11.rst
@@ -64,6 +64,13 @@ New Features
   Added ``rte_eth_buffer_split_get_supported_hdr_ptypes()``, to get supported
   header protocols of a PMD to split.
 
+* **Added protocol header based buffer split.**
+
+  Ethdev: The ``reserved`` field in the ``rte_eth_rxseg_split`` structure is
+  replaced with ``proto_hdr`` to support protocol header based buffer split.
+  User can choose length or protocol header to configure buffer split
+  according to NIC's capability.
+
 
 Removed Items
 -------------
diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
index 1f0a7f8f3f..27ec19faed 100644
--- a/lib/ethdev/rte_ethdev.c
+++ b/lib/ethdev/rte_ethdev.c
@@ -1649,9 +1649,10 @@ rte_eth_dev_is_removed(uint16_t port_id)
 }
 
 static int
-rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
-			     uint16_t n_seg, uint32_t *mbp_buf_size,
-			     const struct rte_eth_dev_info *dev_info)
+rte_eth_rx_queue_check_split(uint16_t port_id,
+			const struct rte_eth_rxseg_split *rx_seg,
+			uint16_t n_seg, uint32_t *mbp_buf_size,
+			const struct rte_eth_dev_info *dev_info)
 {
 	const struct rte_eth_rxseg_capa *seg_capa = &dev_info->rx_seg_capa;
 	struct rte_mempool *mp_first;
@@ -1674,6 +1675,7 @@ rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
 		struct rte_mempool *mpl = rx_seg[seg_idx].mp;
 		uint32_t length = rx_seg[seg_idx].length;
 		uint32_t offset = rx_seg[seg_idx].offset;
+		uint32_t proto_hdr = rx_seg[seg_idx].proto_hdr;
 
 		if (mpl == NULL) {
 			RTE_ETHDEV_LOG(ERR, "null mempool pointer\n");
@@ -1707,13 +1709,63 @@ rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
 		}
 		offset += seg_idx != 0 ? 0 : RTE_PKTMBUF_HEADROOM;
 		*mbp_buf_size = rte_pktmbuf_data_room_size(mpl);
-		length = length != 0 ? length : *mbp_buf_size;
-		if (*mbp_buf_size < length + offset) {
-			RTE_ETHDEV_LOG(ERR,
-				       "%s mbuf_data_room_size %u < %u (segment length=%u + segment offset=%u)\n",
-				       mpl->name, *mbp_buf_size,
-				       length + offset, length, offset);
-			return -EINVAL;
+
+		if (proto_hdr > 0) {
+			/* Split based on protocol headers. */
+
+			/* skip the payload */
+			if (proto_hdr == RTE_PTYPE_ALL_MASK)
+				continue;
+
+			int ptype_cnt;
+
+			ptype_cnt = rte_eth_buffer_split_get_supported_hdr_ptypes(port_id, NULL, 0);
+			if (ptype_cnt <= 0) {
+				RTE_ETHDEV_LOG(ERR,
+					"Port %u failed to supported buffer split header protocols\n",
+					port_id);
+				return -EINVAL;
+			}
+
+			uint32_t ptypes[ptype_cnt];
+			int i;
+
+			ptype_cnt = rte_eth_buffer_split_get_supported_hdr_ptypes(port_id,
+										ptypes, ptype_cnt);
+			if (ptype_cnt < 0) {
+				RTE_ETHDEV_LOG(ERR,
+					"Port %u failed to supported buffer split header protocols\n",
+					port_id);
+				return -EINVAL;
+			}
+
+			for (i = 0; i < ptype_cnt; i++)
+				if (ptypes[i] == proto_hdr)
+					break;
+			if (i == ptype_cnt) {
+				RTE_ETHDEV_LOG(ERR,
+					"Requested Rx split header protocols 0x%x is not supported.\n",
+					proto_hdr);
+				return -EINVAL;
+			}
+
+			if (*mbp_buf_size < offset) {
+				RTE_ETHDEV_LOG(ERR,
+						"%s mbuf_data_room_size %u < %u segment offset)\n",
+						mpl->name, *mbp_buf_size,
+						offset);
+				return -EINVAL;
+			}
+		} else {
+			/* Split at fixed length. */
+			length = length != 0 ? length : *mbp_buf_size;
+			if (*mbp_buf_size < length + offset) {
+				RTE_ETHDEV_LOG(ERR,
+					"%s mbuf_data_room_size %u < %u (segment length=%u + segment offset=%u)\n",
+					mpl->name, *mbp_buf_size,
+					length + offset, length, offset);
+				return -EINVAL;
+			}
 		}
 	}
 	return 0;
@@ -1793,7 +1845,7 @@ rte_eth_rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
 		n_seg = rx_conf->rx_nseg;
 
 		if (rx_conf->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) {
-			ret = rte_eth_rx_queue_check_split(rx_seg, n_seg,
+			ret = rte_eth_rx_queue_check_split(port_id, rx_seg, n_seg,
 							   &mbp_buf_size,
 							   &dev_info);
 			if (ret != 0)
diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
index c440e3863a..ba7c11f735 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -994,6 +994,9 @@ struct rte_eth_txmode {
  *   specified in the first array element, the second buffer, from the
  *   pool in the second element, and so on.
  *
+ * - The proto_hdrs in the elements define the split position of
+ *   received packets.
+ *
  * - The offsets from the segment description elements specify
  *   the data offset from the buffer beginning except the first mbuf.
  *   The first segment offset is added with RTE_PKTMBUF_HEADROOM.
@@ -1015,12 +1018,36 @@ struct rte_eth_txmode {
  *     - pool from the last valid element
  *     - the buffer size from this pool
  *     - zero offset
+ *
+ * - Length based buffer split:
+ *     - mp, length, offset should be configured.
+ *     - The proto_hdr field will be ignored.
+ *
+ * - Protocol header based buffer split:
+ *     - mp, offset, proto_hdr should be configured.
+ *     - The length field will be ignored.
+ *
+ * - For Protocol header based buffer split, if the received packets
+ *   don't exactly match all protocol headers in the elements, packets
+ *   will not be split.
+ *   These packets will be put into:
+ *     - pool from the last valid element
+ *     - the buffer size from this pool
+ *     - zero offset
  */
 struct rte_eth_rxseg_split {
 	struct rte_mempool *mp; /**< Memory pool to allocate segment from. */
 	uint16_t length; /**< Segment data length, configures split point. */
 	uint16_t offset; /**< Data offset from beginning of mbuf data buffer. */
-	uint32_t reserved; /**< Reserved field. */
+	/**
+	 * Proto_hdr defines a bit mask of the protocol sequence as RTE_PTYPE_*,
+	 * configures split point. The last RTE_PTYPE* in the mask indicates the
+	 * split position.
+	 * For non-tunneling packets, the complete protocol sequence should be defined.
+	 * For tunneling packets, for simplicity, only the tunnel and inner
+	 * protocol sequence should be defined.
+	 */
+	uint32_t proto_hdr;
 };
 
 /**
-- 
2.25.1


^ permalink raw reply	[flat|nested] 72+ messages in thread

* [PATCH v5 3/4] app/testpmd: add rxhdrs commands and parameters
  2022-09-26  9:40 ` [PATCH v5 0/4] support protocol based buffer split Yuan Wang
  2022-09-26  9:40   ` [PATCH v5 1/4] ethdev: introduce protocol header API Yuan Wang
  2022-09-26  9:40   ` [PATCH v5 2/4] ethdev: introduce protocol hdr based buffer split Yuan Wang
@ 2022-09-26  9:40   ` Yuan Wang
  2022-09-26  9:40   ` [PATCH v5 4/4] net/ice: support buffer split in Rx path Yuan Wang
  3 siblings, 0 replies; 72+ messages in thread
From: Yuan Wang @ 2022-09-26  9:40 UTC (permalink / raw)
  To: dev, Aman Singh, Yuying Zhang
  Cc: thomas, andrew.rybchenko, ferruh.yigit, mdr, xiaoyun.li,
	qi.z.zhang, qiming.yang, jerinjacobk, viacheslavo, stephen,
	xuan.ding, hpothula, yaqi.tang, Yuan Wang, Wenxuan Wu

Add command line parameter:
--rxhdrs=eth,[eth-ipv4,eth-ipv4-udp]

Set the protocol_hdr of segments to scatter packets on receiving if
the split feature is engaged. It affects only the queues configured
with the BUFFER_SPLIT offload flag.

Add interactive mode command:
testpmd>set rxhdrs eth,eth-ipv4,eth-ipv4-udp
(protocol sequence should be valid)

The protocol split feature is off by default. To enable protocol split,
you need:
1. Start testpmd with multiple mempools. E.g. --mbuf-size=2048,2048
2. Configure Rx queue with rx_offload buffer split on.
3. Set the protocol type of buffer split. E.g. set rxhdrs eth,eth-ipv4
        (default protocols of testpmd : eth|eth-ipv4|eth-ipv6|
         eth-ipv4-tcp|eth-ipv6-tcp|eth-ipv4-udp|eth-ipv6-udp|
         eth-ipv4-sctp|eth-ipv6-sctp|grenat-eth|grenat-eth-ipv4|
         grenat-eth-ipv6|grenat-eth-ipv4-tcp|grenat-eth-ipv6-tcp|
         grenat-eth-ipv4-udp|grenat-eth-ipv6-udp|grenat-eth-ipv4-sctp|
         grenat-eth-ipv6-sctp)
The above protocols can be configured in testpmd, but the configuration
only takes effect when it is supported by the specific PMD. An example
invocation following these steps is sketched below.
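
For illustration only (not part of the patch), a session following the
steps above might look as follows; the port id is a placeholder and the
rx_offload toggle command shown is an assumption about existing testpmd
syntax rather than something added by this patch:

  dpdk-testpmd -l 0-3 -n 4 -- -i --mbuf-size=2048,2048 --rxhdrs=eth,eth-ipv4
  testpmd> port stop 0
  testpmd> port config 0 rx_offload buffer_split on
  testpmd> port start 0
  testpmd> show config rxhdrs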

Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
---
 app/test-pmd/cmdline.c    | 146 +++++++++++++++++++++++++++++++++++++-
 app/test-pmd/config.c     |  88 +++++++++++++++++++++++
 app/test-pmd/parameters.c |  16 ++++-
 app/test-pmd/testpmd.c    |   2 +
 app/test-pmd/testpmd.h    |   6 ++
 5 files changed, 254 insertions(+), 4 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index ba749f588a..00c7d167ce 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -181,7 +181,7 @@ static void cmd_help_long_parsed(void *parsed_result,
 			"show (rxq|txq) info (port_id) (queue_id)\n"
 			"    Display information for configured RX/TX queue.\n\n"
 
-			"show config (rxtx|cores|fwd|rxoffs|rxpkts|txpkts)\n"
+			"show config (rxtx|cores|fwd|rxoffs|rxpkts|rxhdrs|txpkts)\n"
 			"    Display the given configuration.\n\n"
 
 			"read rxd (port_id) (queue_id) (rxd_id)\n"
@@ -305,6 +305,17 @@ static void cmd_help_long_parsed(void *parsed_result,
 			" Affects only the queues configured with split"
 			" offloads.\n\n"
 
+			"set rxhdrs (eth[,eth-ipv4])*\n"
+			"    Set the protocol hdr of each segment to scatter"
+			" packets on receiving if split feature is engaged."
+			" Affects only the queues configured with split"
+			" offloads.\n"
+			"    Supported values: eth|eth-ipv4|eth-ipv6|eth-ipv4-tcp|eth-ipv6-tcp|"
+			"eth-ipv4-udp|eth-ipv6-udp|eth-ipv4-sctp|eth-ipv6-sctp|"
+			"grenat-eth|grenat-eth-ipv4|grenat-eth-ipv6|grenat-eth-ipv4-tcp|"
+			"grenat-eth-ipv6-tcp|grenat-eth-ipv4-udp|grenat-eth-ipv6-udp|"
+			"grenat-eth-ipv4-sctp|grenat-eth-ipv6-sctp\n\n"
+
 			"set txpkts (x[,y]*)\n"
 			"    Set the length of each segment of TXONLY"
 			" and optionally CSUM packets.\n\n"
@@ -3366,6 +3377,88 @@ static cmdline_parse_inst_t cmd_stop = {
 	},
 };
 
+static unsigned int
+get_ptype(char *value)
+{
+	uint32_t protocol;
+
+	if (!strcmp(value, "eth"))
+		protocol = RTE_PTYPE_L2_ETHER;
+	else if (!strcmp(value, "eth-ipv4"))
+		protocol = RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN;
+	else if (!strcmp(value, "eth-ipv6"))
+		protocol = RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN;
+	else if (!strcmp(value, "eth-ipv4-tcp"))
+		protocol = RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_L4_TCP;
+	else if (!strcmp(value, "eth-ipv6-tcp"))
+		protocol = RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN | RTE_PTYPE_L4_TCP;
+	else if (!strcmp(value, "eth-ipv4-udp"))
+		protocol = RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_L4_UDP;
+	else if (!strcmp(value, "eth-ipv6-udp"))
+		protocol = RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN | RTE_PTYPE_L4_UDP;
+	else if (!strcmp(value, "eth-ipv4-sctp"))
+		protocol = RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_L4_SCTP;
+	else if (!strcmp(value, "eth-ipv6-sctp"))
+		protocol = RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN | RTE_PTYPE_L4_SCTP;
+	else if (!strcmp(value, "grenat-eth"))
+		protocol = RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER;
+	else if (!strcmp(value, "grenat-eth-ipv4"))
+		protocol = RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER |
+				RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN;
+	else if (!strcmp(value, "grenat-eth-ipv6"))
+		protocol = RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN;
+	else if (!strcmp(value, "grenat-eth-ipv4-tcp"))
+		protocol = RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_TCP;
+	else if (!strcmp(value, "grenat-eth-ipv6-tcp"))
+		protocol = RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_TCP;
+	else if (!strcmp(value, "grenat-eth-ipv4-udp"))
+		protocol = RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_UDP;
+	else if (!strcmp(value, "grenat-eth-ipv6-udp"))
+		protocol = RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_UDP;
+	else if (!strcmp(value, "grenat-eth-ipv4-sctp"))
+		protocol = RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_SCTP;
+	else if (!strcmp(value, "grenat-eth-ipv6-sctp"))
+		protocol = RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_SCTP;
+	else {
+		fprintf(stderr, "Unsupported protocol: %s\n", value);
+		protocol = RTE_PTYPE_UNKNOWN;
+	}
+
+	return protocol;
+}
+/* *** SET RXHDRSLIST *** */
+
+unsigned int
+parse_hdrs_list(const char *str, const char *item_name, unsigned int max_items,
+				unsigned int *parsed_items, int check_hdrs_sequence)
+{
+	unsigned int nb_item;
+	char *cur;
+	char *tmp;
+
+	nb_item = 0;
+	char *str2 = strdup(str);
+	cur = strtok_r(str2, ",", &tmp);
+	while (cur != NULL) {
+		parsed_items[nb_item] = get_ptype(cur);
+		cur = strtok_r(NULL, ",", &tmp);
+		nb_item++;
+	}
+	if (nb_item > max_items)
+		fprintf(stderr, "Number of %s = %u > %u (maximum items)\n",
+			item_name, nb_item + 1, max_items);
+	free(str2);
+	if (!check_hdrs_sequence)
+		return nb_item;
+	return nb_item;
+}
 /* *** SET CORELIST and PORTLIST CONFIGURATION *** */
 
 unsigned int
@@ -3735,6 +3828,50 @@ static cmdline_parse_inst_t cmd_set_rxpkts = {
 	},
 };
 
+/* *** SET SEGMENT HEADERS OF RX PACKETS SPLIT *** */
+struct cmd_set_rxhdrs_result {
+	cmdline_fixed_string_t set;
+	cmdline_fixed_string_t rxhdrs;
+	cmdline_fixed_string_t values;
+};
+
+static void
+cmd_set_rxhdrs_parsed(void *parsed_result,
+		      __rte_unused struct cmdline *cl,
+		      __rte_unused void *data)
+{
+	struct cmd_set_rxhdrs_result *res;
+	unsigned int seg_hdrs[MAX_SEGS_BUFFER_SPLIT];
+	unsigned int nb_segs;
+
+	res = parsed_result;
+	nb_segs = parse_hdrs_list(res->values, "segment hdrs",
+				  MAX_SEGS_BUFFER_SPLIT, seg_hdrs, 0);
+	if (nb_segs > 0)
+		set_rx_pkt_hdrs(seg_hdrs, nb_segs);
+	cmd_reconfig_device_queue(RTE_PORT_ALL, 0, 1);
+}
+static cmdline_parse_token_string_t cmd_set_rxhdrs_set =
+	TOKEN_STRING_INITIALIZER(struct cmd_set_rxhdrs_result,
+				set, "set");
+static cmdline_parse_token_string_t cmd_set_rxhdrs_rxhdrs =
+	TOKEN_STRING_INITIALIZER(struct cmd_set_rxhdrs_result,
+				rxhdrs, "rxhdrs");
+static cmdline_parse_token_string_t cmd_set_rxhdrs_values =
+	TOKEN_STRING_INITIALIZER(struct cmd_set_rxhdrs_result,
+				values, NULL);
+
+static cmdline_parse_inst_t cmd_set_rxhdrs = {
+	.f = cmd_set_rxhdrs_parsed,
+	.data = NULL,
+	.help_str = "set rxhdrs <eth[,eth-ipv4]*>",
+	.tokens = {
+		(void *)&cmd_set_rxhdrs_set,
+		(void *)&cmd_set_rxhdrs_rxhdrs,
+		(void *)&cmd_set_rxhdrs_values,
+		NULL,
+	},
+};
 /* *** SET SEGMENT LENGTHS OF TXONLY PACKETS *** */
 
 struct cmd_set_txpkts_result {
@@ -6487,6 +6624,8 @@ static void cmd_showcfg_parsed(void *parsed_result,
 		show_rx_pkt_offsets();
 	else if (!strcmp(res->what, "rxpkts"))
 		show_rx_pkt_segments();
+	else if (!strcmp(res->what, "rxhdrs"))
+		show_rx_pkt_hdrs();
 	else if (!strcmp(res->what, "txpkts"))
 		show_tx_pkt_segments();
 	else if (!strcmp(res->what, "txtimes"))
@@ -6499,12 +6638,12 @@ static cmdline_parse_token_string_t cmd_showcfg_port =
 	TOKEN_STRING_INITIALIZER(struct cmd_showcfg_result, cfg, "config");
 static cmdline_parse_token_string_t cmd_showcfg_what =
 	TOKEN_STRING_INITIALIZER(struct cmd_showcfg_result, what,
-				 "rxtx#cores#fwd#rxoffs#rxpkts#txpkts#txtimes");
+				 "rxtx#cores#fwd#rxoffs#rxpkts#rxhdrs#txpkts#txtimes");
 
 static cmdline_parse_inst_t cmd_showcfg = {
 	.f = cmd_showcfg_parsed,
 	.data = NULL,
-	.help_str = "show config rxtx|cores|fwd|rxoffs|rxpkts|txpkts|txtimes",
+	.help_str = "show config rxtx|cores|fwd|rxoffs|rxpkts|rxhdrs|txpkts|txtimes",
 	.tokens = {
 		(void *)&cmd_showcfg_show,
 		(void *)&cmd_showcfg_port,
@@ -12455,6 +12594,7 @@ static cmdline_parse_ctx_t builtin_ctx[] = {
 	(cmdline_parse_inst_t *)&cmd_set_log,
 	(cmdline_parse_inst_t *)&cmd_set_rxoffs,
 	(cmdline_parse_inst_t *)&cmd_set_rxpkts,
+	(cmdline_parse_inst_t *)&cmd_set_rxhdrs,
 	(cmdline_parse_inst_t *)&cmd_set_txpkts,
 	(cmdline_parse_inst_t *)&cmd_set_txsplit,
 	(cmdline_parse_inst_t *)&cmd_set_txtimes,
diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index 568b0881d4..d3e95e40da 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -4746,6 +4746,94 @@ show_rx_pkt_segments(void)
 	}
 }
 
+static const char *get_ptype_str(uint32_t ptype)
+{
+	switch (ptype) {
+	case RTE_PTYPE_L2_ETHER:
+		return "eth";
+	case RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN:
+		return "eth-ipv4";
+	case RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN:
+		return "eth-ipv6";
+	case RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_L4_TCP:
+		return "eth-ipv4-tcp";
+	case RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN | RTE_PTYPE_L4_TCP:
+		return "eth-ipv6-tcp";
+	case RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_L4_UDP:
+		return "eth-ipv4-udp";
+	case RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN | RTE_PTYPE_L4_UDP:
+		return "eth-ipv6-udp";
+	case RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_L4_SCTP:
+		return "eth-ipv4-sctp";
+	case RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN | RTE_PTYPE_L4_SCTP:
+		return "eth-ipv6-sctp";
+	case RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER:
+		return "grenat-eth";
+	case RTE_PTYPE_TUNNEL_GRENAT|RTE_PTYPE_INNER_L2_ETHER|RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN:
+		return "grenat-eth-ipv4";
+	case RTE_PTYPE_TUNNEL_GRENAT|RTE_PTYPE_INNER_L2_ETHER|RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN:
+		return "grenat-eth-ipv6";
+	case RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER |
+		RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_TCP:
+		return "grenat-eth-ipv4-tcp";
+	case RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER |
+		RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_TCP:
+		return "grenat-eth-ipv6-tcp";
+	case RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER |
+		RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_UDP:
+		return "grenat-eth-ipv4-udp";
+	case RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER |
+		RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_UDP:
+		return "grenat-eth-ipv6-udp";
+	case RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER |
+		RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_SCTP:
+		return "grenat-eth-ipv4-sctp";
+	case RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER |
+		RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_SCTP:
+		return "grenat-eth-ipv6-sctp";
+	default:
+		return "unsupported";
+	}
+}
+
+void
+show_rx_pkt_hdrs(void)
+{
+	uint32_t i, n;
+
+	n = rx_pkt_nb_segs;
+	printf("Number of segments: %u\n", n);
+	if (n) {
+		printf("Packet segs: ");
+		for (i = 0; i < n - 1; i++)
+			printf("%s, ", get_ptype_str(rx_pkt_hdr_protos[i]));
+		printf("payload\n");
+	}
+}
+
+void
+set_rx_pkt_hdrs(unsigned int *seg_hdrs, unsigned int nb_segs)
+{
+	unsigned int i;
+
+	if (nb_segs + 1 > MAX_SEGS_BUFFER_SPLIT) {
+		printf("nb segments per RX packets=%u > "
+		       "MAX_SEGS_BUFFER_SPLIT - ignored\n", nb_segs + 1);
+		return;
+	}
+
+	memset(rx_pkt_hdr_protos, 0, sizeof(rx_pkt_hdr_protos));
+
+	for (i = 0; i < nb_segs; i++)
+		rx_pkt_hdr_protos[i] = (uint32_t)seg_hdrs[i];
+	rx_pkt_hdr_protos[nb_segs] = RTE_PTYPE_ALL_MASK;
+	/*
+	 * We calculate the number of hdrs, but payload is not included,
+	 * so rx_pkt_nb_segs would increase 1.
+	 */
+	rx_pkt_nb_segs = nb_segs + 1;
+}
+
 void
 set_rx_pkt_segments(unsigned int *seg_lengths, unsigned int nb_segs)
 {
diff --git a/app/test-pmd/parameters.c b/app/test-pmd/parameters.c
index 1024b5419c..5bf4219c46 100644
--- a/app/test-pmd/parameters.c
+++ b/app/test-pmd/parameters.c
@@ -152,6 +152,7 @@ usage(char* progname)
 	       " Used mainly with PCAP drivers.\n");
 	printf("  --rxoffs=X[,Y]*: set RX segment offsets for split.\n");
 	printf("  --rxpkts=X[,Y]*: set RX segment sizes to split.\n");
+	printf("  --rxhdrs=eth[,ipv4]*: set RX segment protocol to split.\n");
 	printf("  --txpkts=X[,Y]*: set TX segment sizes"
 		" or total packet length.\n");
 	printf("  --txonly-multi-flow: generate multiple flows in txonly mode\n");
@@ -660,6 +661,7 @@ launch_args_parse(int argc, char** argv)
 		{ "flow-isolate-all",	        0, 0, 0 },
 		{ "rxoffs",			1, 0, 0 },
 		{ "rxpkts",			1, 0, 0 },
+		{ "rxhdrs",			1, 0, 0 },
 		{ "txpkts",			1, 0, 0 },
 		{ "txonly-multi-flow",		0, 0, 0 },
 		{ "rxq-share",			2, 0, 0 },
@@ -1254,7 +1256,6 @@ launch_args_parse(int argc, char** argv)
 			if (!strcmp(lgopts[opt_idx].name, "rxpkts")) {
 				unsigned int seg_len[MAX_SEGS_BUFFER_SPLIT];
 				unsigned int nb_segs;
-
 				nb_segs = parse_item_list
 						(optarg, "rxpkt segments",
 						 MAX_SEGS_BUFFER_SPLIT,
@@ -1264,6 +1265,19 @@ launch_args_parse(int argc, char** argv)
 				else
 					rte_exit(EXIT_FAILURE, "bad rxpkts\n");
 			}
+			if (!strcmp(lgopts[opt_idx].name, "rxhdrs")) {
+				unsigned int seg_hdrs[MAX_SEGS_BUFFER_SPLIT];
+				unsigned int nb_segs;
+
+				nb_segs = parse_hdrs_list
+						(optarg, "rxpkt segments",
+						MAX_SEGS_BUFFER_SPLIT,
+						seg_hdrs, 0);
+				if (nb_segs > 0)
+					set_rx_pkt_hdrs(seg_hdrs, nb_segs);
+				else
+					rte_exit(EXIT_FAILURE, "bad rxpkts\n");
+			}
 			if (!strcmp(lgopts[opt_idx].name, "txpkts")) {
 				unsigned seg_lengths[RTE_MAX_SEGS_PER_PKT];
 				unsigned int nb_segs;
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index de6ad00138..bb2a969559 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -247,6 +247,7 @@ uint16_t rx_pkt_seg_lengths[MAX_SEGS_BUFFER_SPLIT];
 uint8_t  rx_pkt_nb_segs; /**< Number of segments to split */
 uint16_t rx_pkt_seg_offsets[MAX_SEGS_BUFFER_SPLIT];
 uint8_t  rx_pkt_nb_offs; /**< Number of specified offsets */
+uint32_t rx_pkt_hdr_protos[MAX_SEGS_BUFFER_SPLIT];
 
 /*
  * Configuration of packet segments used by the "txonly" processing engine.
@@ -2652,6 +2653,7 @@ rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
 		rx_seg->offset = i < rx_pkt_nb_offs ?
 				   rx_pkt_seg_offsets[i] : 0;
 		rx_seg->mp = mpx ? mpx : mp;
+		rx_seg->proto_hdr = rx_pkt_hdr_protos[i];
 	}
 	rx_conf->rx_nseg = rx_pkt_nb_segs;
 	rx_conf->rx_seg = rx_useg;
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index 21c5632aec..0e5e94423a 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -554,6 +554,7 @@ extern uint32_t max_rx_pkt_len;
  * Configuration of packet segments used to scatter received packets
  * if some of split features is configured.
  */
+extern uint32_t rx_pkt_hdr_protos[MAX_SEGS_BUFFER_SPLIT];
 extern uint16_t rx_pkt_seg_lengths[MAX_SEGS_BUFFER_SPLIT];
 extern uint8_t  rx_pkt_nb_segs; /**< Number of segments to split */
 extern uint16_t rx_pkt_seg_offsets[MAX_SEGS_BUFFER_SPLIT];
@@ -825,6 +826,9 @@ inc_tx_burst_stats(struct fwd_stream *fs, uint16_t nb_tx)
 unsigned int parse_item_list(const char *str, const char *item_name,
 			unsigned int max_items,
 			unsigned int *parsed_items, int check_unique_values);
+unsigned int parse_hdrs_list(const char *str, const char *item_name,
+			unsigned int max_item,
+			unsigned int *parsed_items, int check_unique_values);
 void launch_args_parse(int argc, char** argv);
 void cmd_reconfig_device_queue(portid_t id, uint8_t dev, uint8_t queue);
 void cmdline_read_from_file(const char *filename);
@@ -974,6 +978,8 @@ void set_record_core_cycles(uint8_t on_off);
 void set_record_burst_stats(uint8_t on_off);
 void set_verbose_level(uint16_t vb_level);
 void set_rx_pkt_segments(unsigned int *seg_lengths, unsigned int nb_segs);
+void set_rx_pkt_hdrs(unsigned int *seg_protos, unsigned int nb_segs);
+void show_rx_pkt_hdrs(void);
 void show_rx_pkt_segments(void);
 void set_rx_pkt_offsets(unsigned int *seg_offsets, unsigned int nb_offs);
 void show_rx_pkt_offsets(void);
-- 
2.25.1


^ permalink raw reply	[flat|nested] 72+ messages in thread

* [PATCH v5 4/4] net/ice: support buffer split in Rx path
  2022-09-26  9:40 ` [PATCH v5 0/4] support protocol based buffer split Yuan Wang
                     ` (2 preceding siblings ...)
  2022-09-26  9:40   ` [PATCH v5 3/4] app/testpmd: add rxhdrs commands and parameters Yuan Wang
@ 2022-09-26  9:40   ` Yuan Wang
  3 siblings, 0 replies; 72+ messages in thread
From: Yuan Wang @ 2022-09-26  9:40 UTC (permalink / raw)
  To: dev, Qiming Yang, Qi Zhang
  Cc: thomas, andrew.rybchenko, ferruh.yigit, mdr, xiaoyun.li,
	aman.deep.singh, yuying.zhang, jerinjacobk, viacheslavo, stephen,
	xuan.ding, hpothula, yaqi.tang, Yuan Wang, Wenxuan Wu

This patch adds support for protocol based buffer split in the normal Rx
data paths. When the Rx queue is configured with a specific protocol type,
received packets will be directly split into a protocol header part and a
payload part, within the limitations of the PMD, and the two parts will be
put into different mempools.

Currently, protocol based buffer split is not supported in vectorized
paths.

A new driver op, ice_buffer_split_supported_hdr_ptypes_get(), has been
added; it reports to the application the header protocols that the ice
PMD can split on.
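
As an illustrative sketch only (not part of the patch), an application
could verify the capabilities advertised here before requesting a
two-segment protocol based split; the function name, "port_id" and
"wanted_offset" below are placeholders:

#include <errno.h>
#include <rte_ethdev.h>

static int
check_buffer_split_capa(uint16_t port_id, uint16_t wanted_offset)
{
        struct rte_eth_dev_info dev_info;
        int ret = rte_eth_dev_info_get(port_id, &dev_info);

        if (ret != 0)
                return ret;
        if (!(dev_info.rx_offload_capa & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT))
                return -ENOTSUP;        /* buffer split not offered at all */
        if (dev_info.rx_seg_capa.max_nseg < 2)
                return -ENOTSUP;        /* need header + payload segments */
        if (wanted_offset != 0 && !dev_info.rx_seg_capa.offset_allowed)
                return -ENOTSUP;        /* ice advertises offset_allowed = 0 */
        return 0;
}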

Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
---
 doc/guides/rel_notes/release_22_11.rst |   4 +
 drivers/net/ice/ice_ethdev.c           |  55 ++++++-
 drivers/net/ice/ice_rxtx.c             | 218 +++++++++++++++++++++----
 drivers/net/ice/ice_rxtx.h             |  16 ++
 drivers/net/ice/ice_rxtx_vec_common.h  |   3 +
 5 files changed, 264 insertions(+), 32 deletions(-)

diff --git a/doc/guides/rel_notes/release_22_11.rst b/doc/guides/rel_notes/release_22_11.rst
index cce1f6e50c..f11bbbdc1f 100644
--- a/doc/guides/rel_notes/release_22_11.rst
+++ b/doc/guides/rel_notes/release_22_11.rst
@@ -71,6 +71,10 @@ New Features
   User can choose length or protocol header to configure buffer split
   according to NIC's capability.
 
+* **Updated Intel ice driver.**
+
+  Added protocol based buffer split support in scalar path.
+
 
 Removed Items
 -------------
diff --git a/drivers/net/ice/ice_ethdev.c b/drivers/net/ice/ice_ethdev.c
index e8304a1f2b..23f0f2140c 100644
--- a/drivers/net/ice/ice_ethdev.c
+++ b/drivers/net/ice/ice_ethdev.c
@@ -170,6 +170,7 @@ static int ice_timesync_read_time(struct rte_eth_dev *dev,
 static int ice_timesync_write_time(struct rte_eth_dev *dev,
 				   const struct timespec *timestamp);
 static int ice_timesync_disable(struct rte_eth_dev *dev);
+static const uint32_t *ice_buffer_split_supported_hdr_ptypes_get(struct rte_eth_dev *dev);
 
 static const struct rte_pci_id pci_id_ice_map[] = {
 	{ RTE_PCI_DEVICE(ICE_INTEL_VENDOR_ID, ICE_DEV_ID_E823L_BACKPLANE) },
@@ -281,6 +282,7 @@ static const struct eth_dev_ops ice_eth_dev_ops = {
 	.timesync_write_time          = ice_timesync_write_time,
 	.timesync_disable             = ice_timesync_disable,
 	.tm_ops_get                   = ice_tm_ops_get,
+	.buffer_split_supported_hdr_ptypes_get = ice_buffer_split_supported_hdr_ptypes_get,
 };
 
 /* store statistics names and its offset in stats structure */
@@ -3750,7 +3752,8 @@ ice_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 			RTE_ETH_RX_OFFLOAD_OUTER_IPV4_CKSUM |
 			RTE_ETH_RX_OFFLOAD_VLAN_EXTEND |
 			RTE_ETH_RX_OFFLOAD_RSS_HASH |
-			RTE_ETH_RX_OFFLOAD_TIMESTAMP;
+			RTE_ETH_RX_OFFLOAD_TIMESTAMP |
+			RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT;
 		dev_info->tx_offload_capa |=
 			RTE_ETH_TX_OFFLOAD_QINQ_INSERT |
 			RTE_ETH_TX_OFFLOAD_IPV4_CKSUM |
@@ -3762,7 +3765,7 @@ ice_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 		dev_info->flow_type_rss_offloads |= ICE_RSS_OFFLOAD_ALL;
 	}
 
-	dev_info->rx_queue_offload_capa = 0;
+	dev_info->rx_queue_offload_capa = RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT;
 	dev_info->tx_queue_offload_capa = RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE;
 
 	dev_info->reta_size = pf->hash_lut_size;
@@ -3831,6 +3834,11 @@ ice_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 	dev_info->default_rxportconf.ring_size = ICE_BUF_SIZE_MIN;
 	dev_info->default_txportconf.ring_size = ICE_BUF_SIZE_MIN;
 
+	dev_info->rx_seg_capa.max_nseg = ICE_RX_MAX_NSEG;
+	dev_info->rx_seg_capa.multi_pools = 1;
+	dev_info->rx_seg_capa.offset_allowed = 0;
+	dev_info->rx_seg_capa.offset_align_log2 = 0;
+
 	return 0;
 }
 
@@ -5887,6 +5895,49 @@ ice_timesync_disable(struct rte_eth_dev *dev)
 	return 0;
 }
 
+static const uint32_t *
+ice_buffer_split_supported_hdr_ptypes_get(struct rte_eth_dev *dev __rte_unused)
+{
+	/* Buffer split protocol header capability. */
+	static const uint32_t ptypes[] = {
+		/* Non tunneled */
+		RTE_PTYPE_L2_ETHER,
+		RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN,
+		RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_L4_UDP,
+		RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_L4_TCP,
+		RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_L4_SCTP,
+		RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN,
+		RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN | RTE_PTYPE_L4_UDP,
+		RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN | RTE_PTYPE_L4_TCP,
+		RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN | RTE_PTYPE_L4_SCTP,
+
+		/* Tunneled */
+		RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER,
+		RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER |
+		RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN,
+		RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER |
+		RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_UDP,
+		RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER |
+		RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_TCP,
+		RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER |
+		RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_SCTP,
+
+		RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER,
+		RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER |
+		RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN,
+		RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER |
+		RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_UDP,
+		RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER |
+		RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_TCP,
+		RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER |
+		RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_SCTP,
+
+		RTE_PTYPE_UNKNOWN
+	};
+
+	return ptypes;
+}
+
 static int
 ice_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
 	      struct rte_pci_device *pci_dev)
diff --git a/drivers/net/ice/ice_rxtx.c b/drivers/net/ice/ice_rxtx.c
index bfb3a16ae2..022b241f3f 100644
--- a/drivers/net/ice/ice_rxtx.c
+++ b/drivers/net/ice/ice_rxtx.c
@@ -282,7 +282,6 @@ ice_program_hw_rx_queue(struct ice_rx_queue *rxq)
 	/* Set buffer size as the head split is disabled. */
 	buf_size = (uint16_t)(rte_pktmbuf_data_room_size(rxq->mp) -
 			      RTE_PKTMBUF_HEADROOM);
-	rxq->rx_hdr_len = 0;
 	rxq->rx_buf_len = RTE_ALIGN(buf_size, (1 << ICE_RLAN_CTX_DBUF_S));
 	rxq->max_pkt_len =
 		RTE_MIN((uint32_t)ICE_SUPPORT_CHAIN_NUM * rxq->rx_buf_len,
@@ -311,11 +310,45 @@ ice_program_hw_rx_queue(struct ice_rx_queue *rxq)
 
 	memset(&rx_ctx, 0, sizeof(rx_ctx));
 
+	if (rxq->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) {
+		uint32_t proto_hdr;
+		proto_hdr = rxq->rxseg[0].proto_hdr;
+
+		if (proto_hdr == RTE_PTYPE_UNKNOWN) {
+			PMD_DRV_LOG(ERR, "Buffer split protocol must be configured");
+			return -EINVAL;
+		}
+		if ((proto_hdr & RTE_PTYPE_L2_MASK) == RTE_PTYPE_L2_ETHER) {
+			rx_ctx.dtype = ICE_RX_DTYPE_HEADER_SPLIT;
+			rx_ctx.hsplit_1 = ICE_RLAN_RX_HSPLIT_1_SPLIT_L2;
+		} else if ((proto_hdr & RTE_PTYPE_INNER_L4_MASK) == RTE_PTYPE_INNER_L4_TCP ||
+			(proto_hdr & RTE_PTYPE_INNER_L4_MASK) == RTE_PTYPE_INNER_L4_UDP) {
+			rx_ctx.dtype = ICE_RX_DTYPE_HEADER_SPLIT;
+			rx_ctx.hsplit_0 = ICE_RLAN_RX_HSPLIT_0_SPLIT_TCP_UDP;
+		} else if ((proto_hdr & RTE_PTYPE_INNER_L4_MASK) == RTE_PTYPE_INNER_L4_SCTP) {
+			rx_ctx.dtype = ICE_RX_DTYPE_HEADER_SPLIT;
+			rx_ctx.hsplit_0 = ICE_RLAN_RX_HSPLIT_0_SPLIT_SCTP;
+		} else if ((proto_hdr & RTE_PTYPE_INNER_L3_MASK) == RTE_PTYPE_INNER_L3_IPV4 ||
+			(proto_hdr & RTE_PTYPE_INNER_L3_MASK) == RTE_PTYPE_INNER_L3_IPV6) {
+			rx_ctx.dtype = ICE_RX_DTYPE_HEADER_SPLIT;
+			rx_ctx.hsplit_0 = ICE_RLAN_RX_HSPLIT_0_SPLIT_IP;
+		} else if ((proto_hdr & RTE_PTYPE_INNER_L2_MASK) == RTE_PTYPE_INNER_L2_ETHER) {
+			rx_ctx.dtype = ICE_RX_DTYPE_HEADER_SPLIT;
+			rx_ctx.hsplit_0 = ICE_RLAN_RX_HSPLIT_0_SPLIT_L2;
+		} else {
+			PMD_DRV_LOG(ERR, "Buffer split protocol is not supported");
+			return -EINVAL;
+		}
+		rxq->rx_hdr_len = ICE_RX_HDR_BUF_SIZE;
+	} else {
+		rxq->rx_hdr_len = 0;
+		rx_ctx.dtype = 0; /* No Protocol Based Buffer Split mode */
+	}
+
 	rx_ctx.base = rxq->rx_ring_dma / ICE_QUEUE_BASE_ADDR_UNIT;
 	rx_ctx.qlen = rxq->nb_rx_desc;
 	rx_ctx.dbuf = rxq->rx_buf_len >> ICE_RLAN_CTX_DBUF_S;
 	rx_ctx.hbuf = rxq->rx_hdr_len >> ICE_RLAN_CTX_HBUF_S;
-	rx_ctx.dtype = 0; /* No Header Split mode */
 #ifndef RTE_LIBRTE_ICE_16BYTE_RX_DESC
 	rx_ctx.dsize = 1; /* 32B descriptors */
 #endif
@@ -401,6 +434,7 @@ ice_alloc_rx_queue_mbufs(struct ice_rx_queue *rxq)
 
 	for (i = 0; i < rxq->nb_rx_desc; i++) {
 		volatile union ice_rx_flex_desc *rxd;
+		rxd = &rxq->rx_ring[i];
 		struct rte_mbuf *mbuf = rte_mbuf_raw_alloc(rxq->mp);
 
 		if (unlikely(!mbuf)) {
@@ -408,8 +442,6 @@ ice_alloc_rx_queue_mbufs(struct ice_rx_queue *rxq)
 			return -ENOMEM;
 		}
 
-		rte_mbuf_refcnt_set(mbuf, 1);
-		mbuf->next = NULL;
 		mbuf->data_off = RTE_PKTMBUF_HEADROOM;
 		mbuf->nb_segs = 1;
 		mbuf->port = rxq->port_id;
@@ -417,9 +449,32 @@ ice_alloc_rx_queue_mbufs(struct ice_rx_queue *rxq)
 		dma_addr =
 			rte_cpu_to_le_64(rte_mbuf_data_iova_default(mbuf));
 
-		rxd = &rxq->rx_ring[i];
-		rxd->read.pkt_addr = dma_addr;
-		rxd->read.hdr_addr = 0;
+		if (!(rxq->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT)) {
+			rte_mbuf_refcnt_set(mbuf, 1);
+			mbuf->next = NULL;
+			rxd->read.hdr_addr = 0;
+			rxd->read.pkt_addr = dma_addr;
+		} else {
+			struct rte_mbuf *mbuf_pay;
+			mbuf_pay = rte_mbuf_raw_alloc(rxq->rxseg[1].mp);
+			if (unlikely(!mbuf_pay)) {
+				PMD_DRV_LOG(ERR, "Failed to allocate payload mbuf for RX");
+				return -ENOMEM;
+			}
+
+			mbuf_pay->next = NULL;
+			mbuf_pay->data_off = RTE_PKTMBUF_HEADROOM;
+			mbuf_pay->nb_segs = 1;
+			mbuf_pay->port = rxq->port_id;
+			mbuf->next = mbuf_pay;
+
+			rxd->read.hdr_addr = dma_addr;
+			/* The LS bit should be set to zero regardless of
+			 * buffer split enablement.
+			 */
+			rxd->read.pkt_addr = rte_cpu_to_le_64(rte_mbuf_data_iova_default(mbuf_pay));
+		}
+
 #ifndef RTE_LIBRTE_ICE_16BYTE_RX_DESC
 		rxd->read.rsvd1 = 0;
 		rxd->read.rsvd2 = 0;
@@ -443,14 +498,14 @@ _ice_rx_queue_release_mbufs(struct ice_rx_queue *rxq)
 
 	for (i = 0; i < rxq->nb_rx_desc; i++) {
 		if (rxq->sw_ring[i].mbuf) {
-			rte_pktmbuf_free_seg(rxq->sw_ring[i].mbuf);
+			rte_pktmbuf_free(rxq->sw_ring[i].mbuf);
 			rxq->sw_ring[i].mbuf = NULL;
 		}
 	}
 	if (rxq->rx_nb_avail == 0)
 		return;
 	for (i = 0; i < rxq->rx_nb_avail; i++)
-		rte_pktmbuf_free_seg(rxq->rx_stage[rxq->rx_next_avail + i]);
+		rte_pktmbuf_free(rxq->rx_stage[rxq->rx_next_avail + i]);
 
 	rxq->rx_nb_avail = 0;
 }
@@ -742,7 +797,7 @@ ice_fdir_program_hw_rx_queue(struct ice_rx_queue *rxq)
 	rx_ctx.qlen = rxq->nb_rx_desc;
 	rx_ctx.dbuf = rxq->rx_buf_len >> ICE_RLAN_CTX_DBUF_S;
 	rx_ctx.hbuf = rxq->rx_hdr_len >> ICE_RLAN_CTX_HBUF_S;
-	rx_ctx.dtype = 0; /* No Header Split mode */
+	rx_ctx.dtype = 0; /* No Buffer Split mode */
 	rx_ctx.dsize = 1; /* 32B descriptors */
 	rx_ctx.rxmax = ICE_ETH_MAX_LEN;
 	/* TPH: Transaction Layer Packet (TLP) processing hints */
@@ -1076,6 +1131,8 @@ ice_rx_queue_setup(struct rte_eth_dev *dev,
 	uint16_t len;
 	int use_def_burst_func = 1;
 	uint64_t offloads;
+	uint16_t n_seg = rx_conf->rx_nseg;
+	const struct rte_eth_rxseg_split *rx_seg;
 
 	if (nb_desc % ICE_ALIGN_RING_DESC != 0 ||
 	    nb_desc > ICE_MAX_RING_DESC ||
@@ -1087,6 +1144,15 @@ ice_rx_queue_setup(struct rte_eth_dev *dev,
 
 	offloads = rx_conf->offloads | dev->data->dev_conf.rxmode.offloads;
 
+	if (mp)
+		n_seg = 1;
+
+	if (n_seg > 1 && !(offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT)) {
+		PMD_INIT_LOG(ERR, "port %u queue index %u split offload not configured",
+				dev->data->port_id, queue_idx);
+		return -EINVAL;
+	}
+
 	/* Free memory if needed */
 	if (dev->data->rx_queues[queue_idx]) {
 		ice_rx_queue_release(dev->data->rx_queues[queue_idx]);
@@ -1098,12 +1164,22 @@ ice_rx_queue_setup(struct rte_eth_dev *dev,
 				 sizeof(struct ice_rx_queue),
 				 RTE_CACHE_LINE_SIZE,
 				 socket_id);
+
 	if (!rxq) {
 		PMD_INIT_LOG(ERR, "Failed to allocate memory for "
 			     "rx queue data structure");
 		return -ENOMEM;
 	}
-	rxq->mp = mp;
+
+	rxq->rxseg_nb = n_seg;
+	if (n_seg > 1) {
+		rx_seg = (const struct rte_eth_rxseg_split *)rx_conf->rx_seg;
+		rte_memcpy(rxq->rxseg, rx_seg, sizeof(struct rte_eth_rxseg_split) * n_seg);
+		rxq->mp = rxq->rxseg[0].mp;
+	} else {
+		rxq->mp = mp;
+	}
+
 	rxq->nb_rx_desc = nb_desc;
 	rxq->rx_free_thresh = rx_conf->rx_free_thresh;
 	rxq->queue_id = queue_idx;
@@ -1570,7 +1646,7 @@ ice_rx_scan_hw_ring(struct ice_rx_queue *rxq)
 	struct ice_rx_entry *rxep;
 	struct rte_mbuf *mb;
 	uint16_t stat_err0;
-	uint16_t pkt_len;
+	uint16_t pkt_len, hdr_len;
 	int32_t s[ICE_LOOK_AHEAD], nb_dd;
 	int32_t i, j, nb_rx = 0;
 	uint64_t pkt_flags = 0;
@@ -1625,6 +1701,27 @@ ice_rx_scan_hw_ring(struct ice_rx_queue *rxq)
 				   ICE_RX_FLX_DESC_PKT_LEN_M) - rxq->crc_len;
 			mb->data_len = pkt_len;
 			mb->pkt_len = pkt_len;
+
+			if (!(rxq->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT)) {
+				pkt_len = (rte_le_to_cpu_16(rxdp[j].wb.pkt_len) &
+					ICE_RX_FLX_DESC_PKT_LEN_M) - rxq->crc_len;
+				mb->data_len = pkt_len;
+				mb->pkt_len = pkt_len;
+			} else {
+				mb->nb_segs = (uint16_t)(mb->nb_segs + mb->next->nb_segs);
+				mb->next->next = NULL;
+				hdr_len = rte_le_to_cpu_16(rxdp[j].wb.hdr_len_sph_flex_flags1) &
+						ICE_RX_FLEX_DESC_HEADER_LEN_M;
+				pkt_len = (rte_le_to_cpu_16(rxdp[j].wb.pkt_len) &
+					ICE_RX_FLX_DESC_PKT_LEN_M) - rxq->crc_len;
+				mb->data_len = hdr_len;
+				mb->pkt_len = hdr_len + pkt_len;
+				mb->next->data_len = pkt_len;
+#ifdef RTE_ETHDEV_DEBUG_RX
+				rte_pktmbuf_dump(stdout, mb, rte_pktmbuf_pkt_len(mb));
+#endif
+			}
+
 			mb->ol_flags = 0;
 			stat_err0 = rte_le_to_cpu_16(rxdp[j].wb.status_error0);
 			pkt_flags = ice_rxd_error_to_pkt_flags(stat_err0);
@@ -1716,7 +1813,9 @@ ice_rx_alloc_bufs(struct ice_rx_queue *rxq)
 	struct rte_mbuf *mb;
 	uint16_t alloc_idx, i;
 	uint64_t dma_addr;
-	int diag;
+	int diag, diag_pay;
+	uint64_t pay_addr;
+	struct rte_mbuf *mbufs_pay[rxq->rx_free_thresh];
 
 	/* Allocate buffers in bulk */
 	alloc_idx = (uint16_t)(rxq->rx_free_trigger -
@@ -1729,6 +1828,15 @@ ice_rx_alloc_bufs(struct ice_rx_queue *rxq)
 		return -ENOMEM;
 	}
 
+	if (rxq->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) {
+		diag_pay = rte_mempool_get_bulk(rxq->rxseg[1].mp,
+				(void *)mbufs_pay, rxq->rx_free_thresh);
+		if (unlikely(diag_pay != 0)) {
+			PMD_RX_LOG(ERR, "Failed to get payload mbufs in bulk");
+			return -ENOMEM;
+		}
+	}
+
 	rxdp = &rxq->rx_ring[alloc_idx];
 	for (i = 0; i < rxq->rx_free_thresh; i++) {
 		if (likely(i < (rxq->rx_free_thresh - 1)))
@@ -1737,13 +1845,21 @@ ice_rx_alloc_bufs(struct ice_rx_queue *rxq)
 
 		mb = rxep[i].mbuf;
 		rte_mbuf_refcnt_set(mb, 1);
-		mb->next = NULL;
 		mb->data_off = RTE_PKTMBUF_HEADROOM;
 		mb->nb_segs = 1;
 		mb->port = rxq->port_id;
 		dma_addr = rte_cpu_to_le_64(rte_mbuf_data_iova_default(mb));
-		rxdp[i].read.hdr_addr = 0;
-		rxdp[i].read.pkt_addr = dma_addr;
+
+		if (!(rxq->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT)) {
+			mb->next = NULL;
+			rxdp[i].read.hdr_addr = 0;
+			rxdp[i].read.pkt_addr = dma_addr;
+		} else {
+			mb->next = mbufs_pay[i];
+			pay_addr = rte_cpu_to_le_64(rte_mbuf_data_iova_default(mbufs_pay[i]));
+			rxdp[i].read.hdr_addr = dma_addr;
+			rxdp[i].read.pkt_addr = pay_addr;
+		}
 	}
 
 	/* Update Rx tail register */
@@ -2352,11 +2468,13 @@ ice_recv_pkts(void *rx_queue,
 	struct ice_rx_entry *sw_ring = rxq->sw_ring;
 	struct ice_rx_entry *rxe;
 	struct rte_mbuf *nmb; /* new allocated mbuf */
+	struct rte_mbuf *nmb_pay; /* new allocated payload mbuf */
 	struct rte_mbuf *rxm; /* pointer to store old mbuf in SW ring */
 	uint16_t rx_id = rxq->rx_tail;
 	uint16_t nb_rx = 0;
 	uint16_t nb_hold = 0;
 	uint16_t rx_packet_len;
+	uint16_t rx_header_len;
 	uint16_t rx_stat_err0;
 	uint64_t dma_addr;
 	uint64_t pkt_flags;
@@ -2384,12 +2502,16 @@ ice_recv_pkts(void *rx_queue,
 		if (!(rx_stat_err0 & (1 << ICE_RX_FLEX_DESC_STATUS0_DD_S)))
 			break;
 
-		/* allocate mbuf */
+		if (rx_stat_err0 & (1 << ICE_RX_FLEX_DESC_STATUS0_HBO_S))
+			break;
+
+		/* allocate header mbuf */
 		nmb = rte_mbuf_raw_alloc(rxq->mp);
 		if (unlikely(!nmb)) {
 			rxq->vsi->adapter->pf.dev_data->rx_mbuf_alloc_failed++;
 			break;
 		}
+
 		rxd = *rxdp; /* copy descriptor in ring to temp variable*/
 
 		nb_hold++;
@@ -2402,24 +2524,60 @@ ice_recv_pkts(void *rx_queue,
 		dma_addr =
 			rte_cpu_to_le_64(rte_mbuf_data_iova_default(nmb));
 
-		/**
-		 * fill the read format of descriptor with physic address in
-		 * new allocated mbuf: nmb
-		 */
-		rxdp->read.hdr_addr = 0;
-		rxdp->read.pkt_addr = dma_addr;
+		if (!(rxq->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT)) {
+			/**
+			 * fill the read format of descriptor with physic address in
+			 * new allocated mbuf: nmb
+			 */
+			rxdp->read.hdr_addr = 0;
+			rxdp->read.pkt_addr = dma_addr;
+		} else {
+			/* allocate payload mbuf */
+			nmb_pay = rte_mbuf_raw_alloc(rxq->rxseg[1].mp);
+			if (unlikely(!nmb_pay)) {
+				rxq->vsi->adapter->pf.dev_data->rx_mbuf_alloc_failed++;
+				break;
+			}
 
-		/* calculate rx_packet_len of the received pkt */
-		rx_packet_len = (rte_le_to_cpu_16(rxd.wb.pkt_len) &
-				 ICE_RX_FLX_DESC_PKT_LEN_M) - rxq->crc_len;
+			nmb->next = nmb_pay;
+			nmb_pay->next = NULL;
+
+			/**
+			 * fill the read format of descriptor with physic address in
+			 * new allocated mbuf: nmb
+			 */
+			rxdp->read.hdr_addr = dma_addr;
+			rxdp->read.pkt_addr = rte_cpu_to_le_64(rte_mbuf_data_iova_default(nmb_pay));
+		}
 
 		/* fill old mbuf with received descriptor: rxd */
 		rxm->data_off = RTE_PKTMBUF_HEADROOM;
 		rte_prefetch0(RTE_PTR_ADD(rxm->buf_addr, RTE_PKTMBUF_HEADROOM));
-		rxm->nb_segs = 1;
-		rxm->next = NULL;
-		rxm->pkt_len = rx_packet_len;
-		rxm->data_len = rx_packet_len;
+		if (!(rxq->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT)) {
+			rxm->nb_segs = 1;
+			rxm->next = NULL;
+			/* calculate rx_packet_len of the received pkt */
+			rx_packet_len = (rte_le_to_cpu_16(rxd.wb.pkt_len) &
+					ICE_RX_FLX_DESC_PKT_LEN_M) - rxq->crc_len;
+			rxm->data_len = rx_packet_len;
+			rxm->pkt_len = rx_packet_len;
+		} else {
+			rxm->nb_segs = (uint16_t)(rxm->nb_segs + rxm->next->nb_segs);
+			rxm->next->next = NULL;
+			/* calculate rx_packet_len of the received pkt */
+			rx_header_len = rte_le_to_cpu_16(rxd.wb.hdr_len_sph_flex_flags1) &
+					ICE_RX_FLEX_DESC_HEADER_LEN_M;
+			rx_packet_len = (rte_le_to_cpu_16(rxd.wb.pkt_len) &
+					ICE_RX_FLX_DESC_PKT_LEN_M) - rxq->crc_len;
+			rxm->data_len = rx_header_len;
+			rxm->pkt_len = rx_header_len + rx_packet_len;
+			rxm->next->data_len = rx_packet_len;
+
+#ifdef RTE_ETHDEV_DEBUG_RX
+			rte_pktmbuf_dump(stdout, rxm, rte_pktmbuf_pkt_len(rxm));
+#endif
+		}
+
 		rxm->port = rxq->port_id;
 		rxm->packet_type = ptype_tbl[ICE_RX_FLEX_DESC_PTYPE_M &
 			rte_le_to_cpu_16(rxd.wb.ptype_flex_flags0)];
diff --git a/drivers/net/ice/ice_rxtx.h b/drivers/net/ice/ice_rxtx.h
index f5337d5284..d44bde3710 100644
--- a/drivers/net/ice/ice_rxtx.h
+++ b/drivers/net/ice/ice_rxtx.h
@@ -16,6 +16,9 @@
 #define ICE_RX_MAX_BURST 32
 #define ICE_TX_MAX_BURST 32
 
+/* Maximal number of segments to split. */
+#define ICE_RX_MAX_NSEG 2
+
 #define ICE_CHK_Q_ENA_COUNT        100
 #define ICE_CHK_Q_ENA_INTERVAL_US  100
 
@@ -43,6 +46,11 @@
 extern uint64_t ice_timestamp_dynflag;
 extern int ice_timestamp_dynfield_offset;
 
+/* Max header size can be 2K - 64 bytes */
+#define ICE_RX_HDR_BUF_SIZE    (2048 - 64)
+
+#define ICE_HEADER_SPLIT_ENA   BIT(0)
+
 typedef void (*ice_rx_release_mbufs_t)(struct ice_rx_queue *rxq);
 typedef void (*ice_tx_release_mbufs_t)(struct ice_tx_queue *txq);
 typedef void (*ice_rxd_to_pkt_fields_t)(struct ice_rx_queue *rxq,
@@ -53,6 +61,12 @@ struct ice_rx_entry {
 	struct rte_mbuf *mbuf;
 };
 
+enum ice_rx_dtype {
+	ICE_RX_DTYPE_NO_SPLIT       = 0,
+	ICE_RX_DTYPE_HEADER_SPLIT   = 1,
+	ICE_RX_DTYPE_SPLIT_ALWAYS   = 2,
+};
+
 struct ice_rx_queue {
 	struct rte_mempool *mp; /* mbuf pool to populate RX ring */
 	volatile union ice_rx_flex_desc *rx_ring;/* RX ring virtual address */
@@ -98,6 +112,8 @@ struct ice_rx_queue {
 	uint32_t hw_time_high; /* high 32 bits of timestamp */
 	uint32_t hw_time_low; /* low 32 bits of timestamp */
 	uint64_t hw_time_update; /* SW time of HW record updating */
+	struct rte_eth_rxseg_split rxseg[ICE_RX_MAX_NSEG];
+	uint32_t rxseg_nb;
 };
 
 struct ice_tx_entry {
diff --git a/drivers/net/ice/ice_rxtx_vec_common.h b/drivers/net/ice/ice_rxtx_vec_common.h
index 2dd2d83650..eec6ea2134 100644
--- a/drivers/net/ice/ice_rxtx_vec_common.h
+++ b/drivers/net/ice/ice_rxtx_vec_common.h
@@ -291,6 +291,9 @@ ice_rx_vec_queue_default(struct ice_rx_queue *rxq)
 	if (rxq->offloads & RTE_ETH_RX_OFFLOAD_TIMESTAMP)
 		return -1;
 
+	if (rxq->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT)
+		return -1;
+
 	if (rxq->offloads & ICE_RX_VECTOR_OFFLOAD)
 		return ICE_VECTOR_OFFLOAD_PATH;
 
-- 
2.25.1


^ permalink raw reply	[flat|nested] 72+ messages in thread

* RE: [PATCH v5 2/4] ethdev: introduce protocol hdr based buffer split
  2022-09-26  9:40   ` [PATCH v5 2/4] ethdev: introduce protocol hdr based buffer split Yuan Wang
@ 2022-09-28 15:42     ` Wang, YuanX
  0 siblings, 0 replies; 72+ messages in thread
From: Wang, YuanX @ 2022-09-28 15:42 UTC (permalink / raw)
  To: dev, Thomas Monjalon, Ferruh Yigit, Andrew Rybchenko
  Cc: mdr, Li, Xiaoyun, Singh, Aman Deep, Zhang, Yuying, Zhang, Qi Z,
	Yang, Qiming, jerinjacobk, viacheslavo, stephen, Ding, Xuan,
	hpothula, Tang, Yaqi

Hi Andrew,

Did you get a chance to review this patch? Please let me know your thoughts on it.

Regards,
Yuan

> -----Original Message-----
> From: Wang, YuanX <yuanx.wang@intel.com>
> Sent: Monday, September 26, 2022 5:41 PM
> To: dev@dpdk.org; Thomas Monjalon <thomas@monjalon.net>; Ferruh Yigit
> <ferruh.yigit@xilinx.com>; Andrew Rybchenko
> <andrew.rybchenko@oktetlabs.ru>
> Cc: mdr@ashroe.eu; Li, Xiaoyun <xiaoyun.li@intel.com>; Singh, Aman Deep
> <aman.deep.singh@intel.com>; Zhang, Yuying <yuying.zhang@intel.com>;
> Zhang, Qi Z <qi.z.zhang@intel.com>; Yang, Qiming <qiming.yang@intel.com>;
> jerinjacobk@gmail.com; viacheslavo@nvidia.com;
> stephen@networkplumber.org; Ding, Xuan <xuan.ding@intel.com>;
> hpothula@marvell.com; Tang, Yaqi <yaqi.tang@intel.com>; Wang, YuanX
> <yuanx.wang@intel.com>; Wenxuan Wu <wenxuanx.wu@intel.com>
> Subject: [PATCH v5 2/4] ethdev: introduce protocol hdr based buffer split
> 
> Currently, Rx buffer split supports length based split. With Rx queue offload
> RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT enabled and Rx packet segment
> configured, PMD will be able to split the received packets into multiple
> segments.
> 
> However, length based buffer split is not suitable for NICs that do split based
> on protocol headers. Given an arbitrarily variable length in an Rx packet
> segment, it is almost impossible to pass a fixed protocol header to the driver.
> Besides, tunneling makes the composition of a packet highly variable, which
> makes the situation even worse.
> 
> This patch extends current buffer split to support protocol header based
> buffer split. A new proto_hdr field is introduced in the reserved field of
> rte_eth_rxseg_split structure to specify protocol header. The proto_hdr field
> defines the split position of packet, splitting will always happen after the
> protocol header defined in the Rx packet segment. When Rx queue offload
> RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT is enabled and corresponding
> protocol header is configured, driver will split the ingress packets into
> multiple segments.
> 
> Examples for proto_hdr field defines:
> To split after ETH-IPV4-UDP, it should be defined as RTE_PTYPE_L2_ETHER |
> RTE_PTYPE_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_L4_UDP
> 
> For inner ETH-IPV4-UDP, it should be defined as
> RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER |
> RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_UDP
> 
> struct rte_eth_rxseg_split {
>         struct rte_mempool *mp; /* memory pools to allocate segment from */
>         uint16_t length; /* segment maximal data length,
>                             configures split point */
>         uint16_t offset; /* data offset from beginning
>                             of mbuf data buffer */
>         /**
> 	 * Proto_hdr defines a bit mask of the protocol sequence as
>          * RTE_PTYPE_*, configures split point. The last RTE_PTYPE*
>          * in the mask indicates the split position.
> 	 * For non-tunneling packets, the complete protocol sequence
>          * should be defined.
> 	 * For tunneling packets, for simplicity, only the tunnel and
>          * inner protocol sequence should be defined.
> 	 */
>         uint32_t proto_hdr;
> };
> 
> If protocol header split can be supported by a PMD, the
> rte_eth_buffer_split_get_supported_hdr_ptypes function can be used to
> obtain a list of these protocol headers.
> 
> For example, let's suppose we configured the Rx queue with the following
> segments:
>         seg0 - pool0, proto_hdr0=RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4,
>                off0=2B
>         seg1 - pool1, proto_hdr1=RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4
>                | RTE_PTYPE_L4_UDP, off1=128B
>         seg2 - pool2, off1=0B
> 
> A packet consisting of ETH_IPV4_UDP_PAYLOAD will be split as
> follows:
>         seg0 - ipv4 header @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from pool0
>         seg1 - udp header @ 128 in mbuf from pool1
>         seg2 - payload @ 0 in mbuf from pool2
> 
> Note: the NIC will only do the split when the packets exactly match all the
> protocol headers in the segments. For example, if ARP packets are received with
> the above config, the NIC won't do the split for ARP packets since they do not
> contain an ipv4 header and a udp header. These packets will be put into the last
> valid mempool, with zero offset.
> 
> Now buffer split can be configured in two modes. For length based buffer
> split, the mp, length, offset field in Rx packet segment should be configured,
> while the proto_hdr field will be ignored.
> For protocol header based buffer split, the mp, offset, proto_hdr field in Rx
> packet segment should be configured, while the length field will be ignored.
> 
> The split limitations imposed by the underlying driver are reported in the
> rte_eth_dev_info->rx_seg_capa field. The memory attributes of the split
> parts may differ as well, e.g. dpdk memory and external memory, respectively.
> 
> Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
> Signed-off-by: Xuan Ding <xuan.ding@intel.com>
> Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
> ---
>  doc/guides/rel_notes/release_22_11.rst |  7 +++
>  lib/ethdev/rte_ethdev.c                | 74 ++++++++++++++++++++++----
>  lib/ethdev/rte_ethdev.h                | 29 +++++++++-
>  3 files changed, 98 insertions(+), 12 deletions(-)
> 
> diff --git a/doc/guides/rel_notes/release_22_11.rst
> b/doc/guides/rel_notes/release_22_11.rst
> index 8e5bdde46a..cce1f6e50c 100644
> --- a/doc/guides/rel_notes/release_22_11.rst
> +++ b/doc/guides/rel_notes/release_22_11.rst
> @@ -64,6 +64,13 @@ New Features
>    Added ``rte_eth_buffer_split_get_supported_hdr_ptypes()``, to get
> supported
>    header protocols of a PMD to split.
> 
> +* **Added protocol header based buffer split.**
> +
> +  Ethdev: The ``reserved`` field in the ``rte_eth_rxseg_split``
> + structure is  replaced with ``proto_hdr`` to support protocol header based
> buffer split.
> +  User can choose length or protocol header to configure buffer split
> + according to NIC's capability.
> +
> 
>  Removed Items
>  -------------
> diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c index
> 1f0a7f8f3f..27ec19faed 100644
> --- a/lib/ethdev/rte_ethdev.c
> +++ b/lib/ethdev/rte_ethdev.c
> @@ -1649,9 +1649,10 @@ rte_eth_dev_is_removed(uint16_t port_id)  }
> 
>  static int
> -rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
> -			     uint16_t n_seg, uint32_t *mbp_buf_size,
> -			     const struct rte_eth_dev_info *dev_info)
> +rte_eth_rx_queue_check_split(uint16_t port_id,
> +			const struct rte_eth_rxseg_split *rx_seg,
> +			uint16_t n_seg, uint32_t *mbp_buf_size,
> +			const struct rte_eth_dev_info *dev_info)
>  {
>  	const struct rte_eth_rxseg_capa *seg_capa = &dev_info-
> >rx_seg_capa;
>  	struct rte_mempool *mp_first;
> @@ -1674,6 +1675,7 @@ rte_eth_rx_queue_check_split(const struct
> rte_eth_rxseg_split *rx_seg,
>  		struct rte_mempool *mpl = rx_seg[seg_idx].mp;
>  		uint32_t length = rx_seg[seg_idx].length;
>  		uint32_t offset = rx_seg[seg_idx].offset;
> +		uint32_t proto_hdr = rx_seg[seg_idx].proto_hdr;
> 
>  		if (mpl == NULL) {
>  			RTE_ETHDEV_LOG(ERR, "null mempool pointer\n");
> @@ -1707,13 +1709,63 @@ rte_eth_rx_queue_check_split(const struct
> rte_eth_rxseg_split *rx_seg,
>  		}
>  		offset += seg_idx != 0 ? 0 : RTE_PKTMBUF_HEADROOM;
>  		*mbp_buf_size = rte_pktmbuf_data_room_size(mpl);
> -		length = length != 0 ? length : *mbp_buf_size;
> -		if (*mbp_buf_size < length + offset) {
> -			RTE_ETHDEV_LOG(ERR,
> -				       "%s mbuf_data_room_size %u < %u
> (segment length=%u + segment offset=%u)\n",
> -				       mpl->name, *mbp_buf_size,
> -				       length + offset, length, offset);
> -			return -EINVAL;
> +
> +		if (proto_hdr > 0) {
> +			/* Split based on protocol headers. */
> +
> +			/* skip the payload */
> +			if (proto_hdr == RTE_PTYPE_ALL_MASK)
> +				continue;
> +
> +			int ptype_cnt;
> +
> +			ptype_cnt =
> rte_eth_buffer_split_get_supported_hdr_ptypes(port_id, NULL, 0);
> +			if (ptype_cnt <= 0) {
> +				RTE_ETHDEV_LOG(ERR,
> +					"Port %u failed to supported buffer
> split header protocols\n",
> +					port_id);
> +				return -EINVAL;
> +			}
> +
> +			uint32_t ptypes[ptype_cnt];
> +			int i;
> +
> +			ptype_cnt =
> rte_eth_buffer_split_get_supported_hdr_ptypes(port_id,
> +
> 	ptypes, ptype_cnt);
> +			if (ptype_cnt < 0) {
> +				RTE_ETHDEV_LOG(ERR,
> +					"Port %u failed to supported buffer
> split header protocols\n",
> +					port_id);
> +				return -EINVAL;
> +			}
> +
> +			for (i = 0; i < ptype_cnt; i++)
> +				if (ptypes[i] == proto_hdr)
> +					break;
> +			if (i == ptype_cnt) {
> +				RTE_ETHDEV_LOG(ERR,
> +					"Requested Rx split header protocols
> 0x%x is not supported.\n",
> +					proto_hdr);
> +				return -EINVAL;
> +			}
> +
> +			if (*mbp_buf_size < offset) {
> +				RTE_ETHDEV_LOG(ERR,
> +						"%s
> mbuf_data_room_size %u < %u segment offset)\n",
> +						mpl->name, *mbp_buf_size,
> +						offset);
> +				return -EINVAL;
> +			}
> +		} else {
> +			/* Split at fixed length. */
> +			length = length != 0 ? length : *mbp_buf_size;
> +			if (*mbp_buf_size < length + offset) {
> +				RTE_ETHDEV_LOG(ERR,
> +					"%s mbuf_data_room_size %u < %u
> (segment length=%u + segment offset=%u)\n",
> +					mpl->name, *mbp_buf_size,
> +					length + offset, length, offset);
> +				return -EINVAL;
> +			}
>  		}
>  	}
>  	return 0;
> @@ -1793,7 +1845,7 @@ rte_eth_rx_queue_setup(uint16_t port_id,
> uint16_t rx_queue_id,
>  		n_seg = rx_conf->rx_nseg;
> 
>  		if (rx_conf->offloads &
> RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) {
> -			ret = rte_eth_rx_queue_check_split(rx_seg, n_seg,
> +			ret = rte_eth_rx_queue_check_split(port_id, rx_seg,
> n_seg,
>  							   &mbp_buf_size,
>  							   &dev_info);
>  			if (ret != 0)
> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h index
> c440e3863a..ba7c11f735 100644
> --- a/lib/ethdev/rte_ethdev.h
> +++ b/lib/ethdev/rte_ethdev.h
> @@ -994,6 +994,9 @@ struct rte_eth_txmode {
>   *   specified in the first array element, the second buffer, from the
>   *   pool in the second element, and so on.
>   *
> + * - The proto_hdrs in the elements define the split position of
> + *   received packets.
> + *
>   * - The offsets from the segment description elements specify
>   *   the data offset from the buffer beginning except the first mbuf.
>   *   The first segment offset is added with RTE_PKTMBUF_HEADROOM.
> @@ -1015,12 +1018,36 @@ struct rte_eth_txmode {
>   *     - pool from the last valid element
>   *     - the buffer size from this pool
>   *     - zero offset
> + *
> + * - Length based buffer split:
> + *     - mp, length, offset should be configured.
> + *     - The proto_hdr field will be ignored.
> + *
> + * - Protocol header based buffer split:
> + *     - mp, offset, proto_hdr should be configured.
> + *     - The length field will be ignored.
> + *
> + * - For Protocol header based buffer split, if the received packets
> + *   don't exactly match all protocol headers in the elements, packets
> + *   will not be split.
> + *   These packets will be put into:
> + *     - pool from the last valid element
> + *     - the buffer size from this pool
> + *     - zero offset
>   */
>  struct rte_eth_rxseg_split {
>  	struct rte_mempool *mp; /**< Memory pool to allocate segment
> from. */
>  	uint16_t length; /**< Segment data length, configures split point. */
>  	uint16_t offset; /**< Data offset from beginning of mbuf data buffer.
> */
> -	uint32_t reserved; /**< Reserved field. */
> +	/**
> +	 * Proto_hdr defines a bit mask of the protocol sequence as
> RTE_PTYPE_*,
> +	 * configures split point. The last RTE_PTYPE* in the mask indicates
> the
> +	 * split position.
> +	 * For non-tunneling packets, the complete protocol sequence
> should be defined.
> +	 * For tunneling packets, for simplicity, only the tunnel and inner
> +	 * protocol sequence should be defined.
> +	 */
> +	uint32_t proto_hdr;
>  };
> 
>  /**
> --
> 2.25.1


^ permalink raw reply	[flat|nested] 72+ messages in thread

* [PATCH v6 0/4] support protocol based buffer split
  2022-08-12 18:15 [PATCH 0/4] support protocol based buffer split Yuan Wang
                   ` (11 preceding siblings ...)
  2022-09-26  9:40 ` [PATCH v5 0/4] support protocol based buffer split Yuan Wang
@ 2022-09-29 18:59 ` Yuan Wang
  2022-09-29 18:59   ` [PATCH v6 1/4] ethdev: introduce protocol header API Yuan Wang
                     ` (3 more replies)
  2022-10-01 21:05 ` [PATCH v7 0/4] support protocol based buffer split Yuan Wang
                   ` (2 subsequent siblings)
  15 siblings, 4 replies; 72+ messages in thread
From: Yuan Wang @ 2022-09-29 18:59 UTC (permalink / raw)
  To: dev
  Cc: thomas, andrew.rybchenko, ferruh.yigit, mdr, xiaoyun.li,
	aman.deep.singh, yuying.zhang, qi.z.zhang, qiming.yang,
	jerinjacobk, viacheslavo, stephen, xuan.ding, hpothula,
	yaqi.tang, Yuan Wang

Protocol type based buffer split consists of splitting a received packet
into several separate segments based on the packet content. It is useful
in some scenarios, such as GPU acceleration. The splitting will help to
enable true zero copy and hence improve the performance significantly.

This patchset aims to support protocol header split based on current buffer
split. When Rx queue is configured with RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT
offload and corresponding protocol, packets received will be directly split
into different mempools.

Change log:
v6:
ice: Fix proto_hdr mappings to NIC configuration.

v5:
Define proto_hdr to use mask instead of single protocol type.
Define PMD to return protocol header mask.
Refine the doc and commit log.
Remove deprecated RTE_FUNC_PTR_OR_ERR_RET.

v4:
Change proto_hdr to a bit mask of RTE_PTYPE_*.
Add the description on how to put the unsplit packets.
Use proto_hdr to determine whether to use protocol based split.

v3:
Fix mail thread.

v2:
Add mbuf dump to the driver's buffer split path.
Add buffer split to the driver feature list.
Remove unsupported header protocols from the driver.

Yuan Wang (4):
  ethdev: introduce protocol header API
  ethdev: introduce protocol hdr based buffer split
  app/testpmd: add rxhdrs commands and parameters
  net/ice: support buffer split in Rx path

 app/test-pmd/cmdline.c                 | 146 +++++++++++++-
 app/test-pmd/config.c                  |  88 +++++++++
 app/test-pmd/parameters.c              |  16 +-
 app/test-pmd/testpmd.c                 |   2 +
 app/test-pmd/testpmd.h                 |   6 +
 doc/guides/rel_notes/release_22_11.rst |  16 ++
 drivers/net/ice/ice_ethdev.c           |  55 +++++-
 drivers/net/ice/ice_rxtx.c             | 257 ++++++++++++++++++++++---
 drivers/net/ice/ice_rxtx.h             |  16 ++
 drivers/net/ice/ice_rxtx_vec_common.h  |   3 +
 lib/ethdev/ethdev_driver.h             |  15 ++
 lib/ethdev/rte_ethdev.c                | 107 ++++++++--
 lib/ethdev/rte_ethdev.h                |  59 +++++-
 lib/ethdev/version.map                 |   3 +
 14 files changed, 741 insertions(+), 48 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 72+ messages in thread

* [PATCH v6 1/4] ethdev: introduce protocol header API
  2022-09-29 18:59 ` [PATCH v6 0/4] support protocol based buffer split Yuan Wang
@ 2022-09-29 18:59   ` Yuan Wang
  2022-09-29 18:59   ` [PATCH v6 2/4] ethdev: introduce protocol hdr based buffer split Yuan Wang
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 72+ messages in thread
From: Yuan Wang @ 2022-09-29 18:59 UTC (permalink / raw)
  To: dev, Thomas Monjalon, Ferruh Yigit, Andrew Rybchenko, Ray Kinsella
  Cc: ferruh.yigit, xiaoyun.li, aman.deep.singh, yuying.zhang,
	qi.z.zhang, qiming.yang, jerinjacobk, viacheslavo, stephen,
	xuan.ding, hpothula, yaqi.tang, Yuan Wang, Wenxuan Wu

Add a new ethdev API to retrieve supported protocol headers
of a PMD, which helps to configure protocol header based buffer split.
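
For illustration only (not part of this patch), an application is expected
to use the usual two-call pattern: query the number of supported header
protocols first, then fetch the list. The port_id variable and the target
ptype below are placeholders:

        int cnt, i;

        cnt = rte_eth_buffer_split_get_supported_hdr_ptypes(port_id, NULL, 0);
        if (cnt <= 0)
                return; /* protocol based split not supported, or error */

        uint32_t ptypes[cnt];

        cnt = rte_eth_buffer_split_get_supported_hdr_ptypes(port_id, ptypes, cnt);
        for (i = 0; i < cnt; i++)
                if (ptypes[i] == (RTE_PTYPE_L2_ETHER |
                                  RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
                                  RTE_PTYPE_L4_UDP))
                        break; /* splitting after ETH-IPV4-UDP is supported */
        if (i == cnt)
                return; /* ETH-IPV4-UDP split point not offered by the PMD */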

Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
---
 doc/guides/rel_notes/release_22_11.rst |  5 ++++
 lib/ethdev/ethdev_driver.h             | 15 ++++++++++++
 lib/ethdev/rte_ethdev.c                | 33 ++++++++++++++++++++++++++
 lib/ethdev/rte_ethdev.h                | 30 +++++++++++++++++++++++
 lib/ethdev/version.map                 |  3 +++
 5 files changed, 86 insertions(+)

diff --git a/doc/guides/rel_notes/release_22_11.rst b/doc/guides/rel_notes/release_22_11.rst
index 0231959874..6a7474a3d6 100644
--- a/doc/guides/rel_notes/release_22_11.rst
+++ b/doc/guides/rel_notes/release_22_11.rst
@@ -96,6 +96,11 @@ New Features
   * Added ``rte_event_eth_tx_adapter_queue_stop`` to stop the Tx Adapter
     from enqueueing any packets to the Tx queue.
 
+* **Added new ethdev API for PMD to get buffer split supported protocol types.**
+
+  * Added ``rte_eth_buffer_split_get_supported_hdr_ptypes()``, to get supported
+    header protocols of a PMD to split.
+
 
 Removed Items
 -------------
diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
index 8cd8eb8685..791b264610 100644
--- a/lib/ethdev/ethdev_driver.h
+++ b/lib/ethdev/ethdev_driver.h
@@ -1055,6 +1055,18 @@ typedef int (*eth_ip_reassembly_conf_get_t)(struct rte_eth_dev *dev,
 typedef int (*eth_ip_reassembly_conf_set_t)(struct rte_eth_dev *dev,
 		const struct rte_eth_ip_reassembly_params *conf);
 
+/**
+ * @internal
+ * Get supported header protocols of a PMD to split.
+ *
+ * @param dev
+ *   Ethdev handle of port.
+ *
+ * @return
+ *   An array pointer to store supported protocol headers.
+ */
+typedef const uint32_t *(*eth_buffer_split_supported_hdr_ptypes_get_t)(struct rte_eth_dev *dev);
+
 /**
  * @internal
  * Dump private info from device to a file.
@@ -1302,6 +1314,9 @@ struct eth_dev_ops {
 	/** Set IP reassembly configuration */
 	eth_ip_reassembly_conf_set_t ip_reassembly_conf_set;
 
+	/** Get supported header ptypes to split */
+	eth_buffer_split_supported_hdr_ptypes_get_t buffer_split_supported_hdr_ptypes_get;
+
 	/** Dump private info from device */
 	eth_dev_priv_dump_t eth_dev_priv_dump;
 
diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
index 0c2c1088c0..1f0a7f8f3f 100644
--- a/lib/ethdev/rte_ethdev.c
+++ b/lib/ethdev/rte_ethdev.c
@@ -6002,6 +6002,39 @@ rte_eth_dev_priv_dump(uint16_t port_id, FILE *file)
 	return eth_err(port_id, (*dev->dev_ops->eth_dev_priv_dump)(dev, file));
 }
 
+int
+rte_eth_buffer_split_get_supported_hdr_ptypes(uint16_t port_id, uint32_t *ptypes, int num)
+{
+	int i, j;
+	struct rte_eth_dev *dev;
+	const uint32_t *all_types;
+
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
+	dev = &rte_eth_devices[port_id];
+
+	if (ptypes == NULL && num > 0) {
+		RTE_ETHDEV_LOG(ERR,
+			"Cannot get ethdev port %u supported header protocol types to NULL when array size is non zero\n",
+			port_id);
+		return -EINVAL;
+	}
+
+	if (*dev->dev_ops->buffer_split_supported_hdr_ptypes_get == NULL)
+		return -ENOTSUP;
+	all_types = (*dev->dev_ops->buffer_split_supported_hdr_ptypes_get)(dev);
+
+	if (!all_types)
+		return 0;
+
+	for (i = 0, j = 0; all_types[i] != RTE_PTYPE_UNKNOWN; ++i) {
+		if (j < num)
+			ptypes[j] = all_types[i];
+		j++;
+	}
+
+	return j;
+}
+
 RTE_LOG_REGISTER_DEFAULT(rte_eth_dev_logtype, INFO);
 
 RTE_INIT(ethdev_init_telemetry)
diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
index 12535c703e..cf14e04010 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -6017,6 +6017,36 @@ rte_eth_tx_buffer(uint16_t port_id, uint16_t queue_id,
 	return rte_eth_tx_buffer_flush(port_id, queue_id, buffer);
 }
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Get supported header protocols to split on Rx.
+ *
+ * When a packet type is announced to be split, it *must* be supported by
+ * the PMD. For instance, if eth-ipv4, eth-ipv4-udp is announced, the PMD must
+ * return the following packet types for these packets:
+ * - Ether/IPv4             -> RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4
+ * - Ether/IPv4/UDP         -> RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4 | RTE_PTYPE_L4_UDP
+ *
+ * @param port_id
+ *   The port identifier of the device.
+ * @param[out] ptypes
+ *   An array pointer to store supported protocol headers, allocated by caller.
+ *   These ptypes are composed with RTE_PTYPE_*.
+ * @param num
+ *   Size of the array pointed by param ptypes.
+ * @return
+ *   - (>=0) Number of supported ptypes. If the number of types exceeds num,
+ *           only num entries will be filled into the ptypes array, but the full
+ *           count of supported ptypes will be returned.
+ *   - (-ENOTSUP) if header protocol is not supported by device.
+ *   - (-ENODEV) if *port_id* invalid.
+ *   - (-EINVAL) if bad parameter.
+ */
+__rte_experimental
+int rte_eth_buffer_split_get_supported_hdr_ptypes(uint16_t port_id, uint32_t *ptypes, int num);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/ethdev/version.map b/lib/ethdev/version.map
index 25e54f9d3e..50f5814a48 100644
--- a/lib/ethdev/version.map
+++ b/lib/ethdev/version.map
@@ -285,6 +285,9 @@ EXPERIMENTAL {
 	rte_mtr_color_in_protocol_priority_get;
 	rte_mtr_color_in_protocol_set;
 	rte_mtr_meter_vlan_table_update;
+
+	# added in 22.11
+	rte_eth_buffer_split_get_supported_hdr_ptypes;
 };
 
 INTERNAL {
-- 
2.25.1


^ permalink raw reply	[flat|nested] 72+ messages in thread

* [PATCH v6 2/4] ethdev: introduce protocol hdr based buffer split
  2022-09-29 18:59 ` [PATCH v6 0/4] support protocol based buffer split Yuan Wang
  2022-09-29 18:59   ` [PATCH v6 1/4] ethdev: introduce protocol header API Yuan Wang
@ 2022-09-29 18:59   ` Yuan Wang
  2022-09-29 18:59   ` [PATCH v6 3/4] app/testpmd: add rxhdrs commands and parameters Yuan Wang
  2022-09-29 18:59   ` [PATCH v6 4/4] net/ice: support buffer split in Rx path Yuan Wang
  3 siblings, 0 replies; 72+ messages in thread
From: Yuan Wang @ 2022-09-29 18:59 UTC (permalink / raw)
  To: dev, Thomas Monjalon, Ferruh Yigit, Andrew Rybchenko
  Cc: ferruh.yigit, mdr, xiaoyun.li, aman.deep.singh, yuying.zhang,
	qi.z.zhang, qiming.yang, jerinjacobk, viacheslavo, stephen,
	xuan.ding, hpothula, yaqi.tang, Yuan Wang, Wenxuan Wu

Currently, Rx buffer split supports length based split. With Rx queue
offload RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT enabled and Rx packet segment
configured, PMD will be able to split the received packets into
multiple segments.

However, length based buffer split is not suitable for NICs that do split
based on protocol headers. Given an arbitrarily variable length in an Rx
packet segment, it is almost impossible to pass a fixed protocol header to
the driver. Besides, with tunneling the composition of a packet varies,
which makes the situation even worse.

This patch extends the current buffer split to support protocol header based
buffer split. A new proto_hdr field is introduced in the reserved field
of the rte_eth_rxseg_split structure to specify the protocol header. The
proto_hdr field defines the split position of the packet; splitting will
always happen after the protocol header defined in the Rx packet segment.
When the Rx queue offload RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT is enabled and the
corresponding protocol header is configured, the driver will split the
ingress packets into multiple segments.

Examples for proto_hdr field defines:
To split after ETH-IPV4-UDP, it should be defined as
RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_L4_UDP

For inner ETH-IPV4-UDP, it should be defined as
RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER |
RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_UDP

struct rte_eth_rxseg_split {
        struct rte_mempool *mp; /* memory pools to allocate segment from */
        uint16_t length; /* segment maximal data length,
                            configures split point */
        uint16_t offset; /* data offset from beginning
                            of mbuf data buffer */
        /**
	 * Proto_hdr defines a bit mask of the protocol sequence as
         * RTE_PTYPE_*, configures split point. The last RTE_PTYPE*
         * in the mask indicates the split position.
	 * For non-tunneling packets, the complete protocol sequence
         * should be defined.
	 * For tunneling packets, for simplicity, only the tunnel and
         * inner protocol sequence should be defined.
	 */
        uint32_t proto_hdr;
};

If protocol header split can be supported by a PMD, the
rte_eth_buffer_split_get_supported_hdr_ptypes function can
be used to obtain a list of these protocol headers.

For example, let's suppose we configured the Rx queue with the
following segments:
        seg0 - pool0, proto_hdr0=RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4,
               off0=2B
        seg1 - pool1, proto_hdr1=RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4
               | RTE_PTYPE_L4_UDP, off1=128B
        seg2 - pool2, off1=0B

A packet consisting of ETH_IPV4_UDP_PAYLOAD will be split as
follows:
        seg0 - ipv4 header @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from pool0
        seg1 - udp header @ 128 in mbuf from pool1
        seg2 - payload @ 0 in mbuf from pool2

Note: the NIC will only do the split when the packets exactly match all the
protocol headers in the segments. For example, if ARP packets are received
with the above config, the NIC won't do the split for ARP packets since they
do not contain an ipv4 header and a udp header. These packets will be put
into the last valid mempool, with zero offset.

Now buffer split can be configured in two modes. For length based
buffer split, the mp, length and offset fields in the Rx packet segment
should be configured, while the proto_hdr field will be ignored.
For protocol header based buffer split, the mp, offset and proto_hdr fields
in the Rx packet segment should be configured, while the length field will
be ignored.

The split limitations imposed by the underlying driver are reported in the
rte_eth_dev_info->rx_seg_capa field. The memory attributes of the split
parts may differ as well, e.g. dpdk memory and external memory, respectively.
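
As a minimal illustration (not part of this patch), the segment layout from
the example above could be programmed roughly as below; port_id, queue_id,
nb_rxd, socket_id and pool0/pool1/pool2 are placeholders and error handling
is omitted:

        union rte_eth_rxseg rx_useg[3] = {
                /* seg0: split after ETH-IPV4, 2B data offset, mbufs from pool0 */
                { .split = { .mp = pool0, .offset = 2,
                             .proto_hdr = RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4 } },
                /* seg1: split after ETH-IPV4-UDP, 128B data offset, mbufs from pool1 */
                { .split = { .mp = pool1, .offset = 128,
                             .proto_hdr = RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4 |
                                          RTE_PTYPE_L4_UDP } },
                /* seg2: payload, mbufs from pool2 (testpmd marks it with RTE_PTYPE_ALL_MASK) */
                { .split = { .mp = pool2, .proto_hdr = RTE_PTYPE_ALL_MASK } },
        };
        struct rte_eth_rxconf rxconf = {
                .offloads = RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT,
                .rx_seg = rx_useg,
                .rx_nseg = 3,
        };

        /* mb_pool argument is NULL because the pools come from the segments */
        rte_eth_rx_queue_setup(port_id, queue_id, nb_rxd, socket_id, &rxconf, NULL);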

Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
---
 doc/guides/rel_notes/release_22_11.rst |  7 +++
 lib/ethdev/rte_ethdev.c                | 74 ++++++++++++++++++++++----
 lib/ethdev/rte_ethdev.h                | 29 +++++++++-
 3 files changed, 98 insertions(+), 12 deletions(-)

diff --git a/doc/guides/rel_notes/release_22_11.rst b/doc/guides/rel_notes/release_22_11.rst
index 6a7474a3d6..510869c73a 100644
--- a/doc/guides/rel_notes/release_22_11.rst
+++ b/doc/guides/rel_notes/release_22_11.rst
@@ -101,6 +101,13 @@ New Features
   * Added ``rte_eth_buffer_split_get_supported_hdr_ptypes()``, to get supported
     header protocols of a PMD to split.
 
+* **Added protocol header based buffer split.**
+
+  * Ethdev: The ``reserved`` field in the ``rte_eth_rxseg_split`` structure is
+    replaced with ``proto_hdr`` to support protocol header based buffer split.
+    User can choose length or protocol header to configure buffer split
+    according to NIC's capability.
+
 
 Removed Items
 -------------
diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
index 1f0a7f8f3f..27ec19faed 100644
--- a/lib/ethdev/rte_ethdev.c
+++ b/lib/ethdev/rte_ethdev.c
@@ -1649,9 +1649,10 @@ rte_eth_dev_is_removed(uint16_t port_id)
 }
 
 static int
-rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
-			     uint16_t n_seg, uint32_t *mbp_buf_size,
-			     const struct rte_eth_dev_info *dev_info)
+rte_eth_rx_queue_check_split(uint16_t port_id,
+			const struct rte_eth_rxseg_split *rx_seg,
+			uint16_t n_seg, uint32_t *mbp_buf_size,
+			const struct rte_eth_dev_info *dev_info)
 {
 	const struct rte_eth_rxseg_capa *seg_capa = &dev_info->rx_seg_capa;
 	struct rte_mempool *mp_first;
@@ -1674,6 +1675,7 @@ rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
 		struct rte_mempool *mpl = rx_seg[seg_idx].mp;
 		uint32_t length = rx_seg[seg_idx].length;
 		uint32_t offset = rx_seg[seg_idx].offset;
+		uint32_t proto_hdr = rx_seg[seg_idx].proto_hdr;
 
 		if (mpl == NULL) {
 			RTE_ETHDEV_LOG(ERR, "null mempool pointer\n");
@@ -1707,13 +1709,63 @@ rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
 		}
 		offset += seg_idx != 0 ? 0 : RTE_PKTMBUF_HEADROOM;
 		*mbp_buf_size = rte_pktmbuf_data_room_size(mpl);
-		length = length != 0 ? length : *mbp_buf_size;
-		if (*mbp_buf_size < length + offset) {
-			RTE_ETHDEV_LOG(ERR,
-				       "%s mbuf_data_room_size %u < %u (segment length=%u + segment offset=%u)\n",
-				       mpl->name, *mbp_buf_size,
-				       length + offset, length, offset);
-			return -EINVAL;
+
+		if (proto_hdr > 0) {
+			/* Split based on protocol headers. */
+
+			/* skip the payload */
+			if (proto_hdr == RTE_PTYPE_ALL_MASK)
+				continue;
+
+			int ptype_cnt;
+
+			ptype_cnt = rte_eth_buffer_split_get_supported_hdr_ptypes(port_id, NULL, 0);
+			if (ptype_cnt <= 0) {
+				RTE_ETHDEV_LOG(ERR,
+					"Port %u failed to supported buffer split header protocols\n",
+					port_id);
+				return -EINVAL;
+			}
+
+			uint32_t ptypes[ptype_cnt];
+			int i;
+
+			ptype_cnt = rte_eth_buffer_split_get_supported_hdr_ptypes(port_id,
+										ptypes, ptype_cnt);
+			if (ptype_cnt < 0) {
+				RTE_ETHDEV_LOG(ERR,
+					"Port %u failed to supported buffer split header protocols\n",
+					port_id);
+				return -EINVAL;
+			}
+
+			for (i = 0; i < ptype_cnt; i++)
+				if (ptypes[i] == proto_hdr)
+					break;
+			if (i == ptype_cnt) {
+				RTE_ETHDEV_LOG(ERR,
+					"Requested Rx split header protocols 0x%x is not supported.\n",
+					proto_hdr);
+				return -EINVAL;
+			}
+
+			if (*mbp_buf_size < offset) {
+				RTE_ETHDEV_LOG(ERR,
+						"%s mbuf_data_room_size %u < %u segment offset)\n",
+						mpl->name, *mbp_buf_size,
+						offset);
+				return -EINVAL;
+			}
+		} else {
+			/* Split at fixed length. */
+			length = length != 0 ? length : *mbp_buf_size;
+			if (*mbp_buf_size < length + offset) {
+				RTE_ETHDEV_LOG(ERR,
+					"%s mbuf_data_room_size %u < %u (segment length=%u + segment offset=%u)\n",
+					mpl->name, *mbp_buf_size,
+					length + offset, length, offset);
+				return -EINVAL;
+			}
 		}
 	}
 	return 0;
@@ -1793,7 +1845,7 @@ rte_eth_rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
 		n_seg = rx_conf->rx_nseg;
 
 		if (rx_conf->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) {
-			ret = rte_eth_rx_queue_check_split(rx_seg, n_seg,
+			ret = rte_eth_rx_queue_check_split(port_id, rx_seg, n_seg,
 							   &mbp_buf_size,
 							   &dev_info);
 			if (ret != 0)
diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
index cf14e04010..a5f9647bd3 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -994,6 +994,9 @@ struct rte_eth_txmode {
  *   specified in the first array element, the second buffer, from the
  *   pool in the second element, and so on.
  *
+ * - The proto_hdrs in the elements define the split position of
+ *   received packets.
+ *
  * - The offsets from the segment description elements specify
  *   the data offset from the buffer beginning except the first mbuf.
  *   The first segment offset is added with RTE_PKTMBUF_HEADROOM.
@@ -1015,12 +1018,36 @@ struct rte_eth_txmode {
  *     - pool from the last valid element
  *     - the buffer size from this pool
  *     - zero offset
+ *
+ * - Length based buffer split:
+ *     - mp, length, offset should be configured.
+ *     - The proto_hdr field will be ignored.
+ *
+ * - Protocol header based buffer split:
+ *     - mp, offset, proto_hdr should be configured.
+ *     - The length field will be ignored.
+ *
+ * - For Protocol header based buffer split, if the received packets
+ *   don't exactly match all protocol headers in the elements, packets
+ *   will not be split.
+ *   These packets will be put into:
+ *     - pool from the last valid element
+ *     - the buffer size from this pool
+ *     - zero offset
  */
 struct rte_eth_rxseg_split {
 	struct rte_mempool *mp; /**< Memory pool to allocate segment from. */
 	uint16_t length; /**< Segment data length, configures split point. */
 	uint16_t offset; /**< Data offset from beginning of mbuf data buffer. */
-	uint32_t reserved; /**< Reserved field. */
+	/**
+	 * Proto_hdr defines a bit mask of the protocol sequence as RTE_PTYPE_*,
+	 * configures split point. The last RTE_PTYPE* in the mask indicates the
+	 * split position.
+	 * For non-tunneling packets, the complete protocol sequence should be defined.
+	 * For tunneling packets, for simplicity, only the tunnel and inner
+	 * protocol sequence should be defined.
+	 */
+	uint32_t proto_hdr;
 };
 
 /**
-- 
2.25.1


^ permalink raw reply	[flat|nested] 72+ messages in thread

* [PATCH v6 3/4] app/testpmd: add rxhdrs commands and parameters
  2022-09-29 18:59 ` [PATCH v6 0/4] support protocol based buffer split Yuan Wang
  2022-09-29 18:59   ` [PATCH v6 1/4] ethdev: introduce protocol header API Yuan Wang
  2022-09-29 18:59   ` [PATCH v6 2/4] ethdev: introduce protocol hdr based buffer split Yuan Wang
@ 2022-09-29 18:59   ` Yuan Wang
  2022-09-29 18:59   ` [PATCH v6 4/4] net/ice: support buffer split in Rx path Yuan Wang
  3 siblings, 0 replies; 72+ messages in thread
From: Yuan Wang @ 2022-09-29 18:59 UTC (permalink / raw)
  To: dev, Aman Singh, Yuying Zhang
  Cc: thomas, andrew.rybchenko, ferruh.yigit, mdr, xiaoyun.li,
	qi.z.zhang, qiming.yang, jerinjacobk, viacheslavo, stephen,
	xuan.ding, hpothula, yaqi.tang, Yuan Wang, Wenxuan Wu

Add command line parameter:
--rxhdrs=eth,[eth-ipv4,eth-ipv4-udp]

Set the protocol header of the segments to scatter packets on receiving if
the split feature is engaged. It affects only the queues configured with the
BUFFER_SPLIT offload flag.

Add interactive mode command:
testpmd>set rxhdrs eth,eth-ipv4,eth-ipv4-udp
(protocol sequence should be valid)

The protocol split feature is off by default. To enable protocol split,
you need:
1. Start testpmd with multiple mempools. E.g. --mbuf-size=2048,2048
2. Configure Rx queue with rx_offload buffer split on.
3. Set the protocol type of buffer split. E.g. set rxhdrs eth,eth-ipv4
        (default protocols of testpmd : eth|eth-ipv4|eth-ipv6|
         eth-ipv4-tcp|eth-ipv6-tcp|eth-ipv4-udp|eth-ipv6-udp|
         eth-ipv4-sctp|eth-ipv6-sctp|grenat-eth|grenat-eth-ipv4|
         grenat-eth-ipv6|grenat-eth-ipv4-tcp|grenat-eth-ipv6-tcp|
         grenat-eth-ipv4-udp|grenat-eth-ipv6-udp|grenat-eth-ipv4-sctp|
         grenat-eth-ipv6-sctp)
The above protocols can be configured in testpmd, but the configuration can
only be applied when it is supported by the specific PMD.
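
For illustration only, a session combining the above steps could look like
the following; the EAL core/memory-channel options are placeholders, and the
"buffer_split" rx_offload name used to enable the offload is an assumption
here rather than something introduced by this patch:

        ./dpdk-testpmd -l 0-1 -n 4 -- -i --mbuf-size=2048,2048 --rxhdrs=eth,eth-ipv4
        testpmd> port stop all
        testpmd> port config all rx_offload buffer_split on
        testpmd> port start all
        testpmd> show config rxhdrs
        testpmd> set rxhdrs eth,eth-ipv4,eth-ipv4-udp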

Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
---
 app/test-pmd/cmdline.c    | 146 +++++++++++++++++++++++++++++++++++++-
 app/test-pmd/config.c     |  88 +++++++++++++++++++++++
 app/test-pmd/parameters.c |  16 ++++-
 app/test-pmd/testpmd.c    |   2 +
 app/test-pmd/testpmd.h    |   6 ++
 5 files changed, 254 insertions(+), 4 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index ba749f588a..00c7d167ce 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -181,7 +181,7 @@ static void cmd_help_long_parsed(void *parsed_result,
 			"show (rxq|txq) info (port_id) (queue_id)\n"
 			"    Display information for configured RX/TX queue.\n\n"
 
-			"show config (rxtx|cores|fwd|rxoffs|rxpkts|txpkts)\n"
+			"show config (rxtx|cores|fwd|rxoffs|rxpkts|rxhdrs|txpkts)\n"
 			"    Display the given configuration.\n\n"
 
 			"read rxd (port_id) (queue_id) (rxd_id)\n"
@@ -305,6 +305,17 @@ static void cmd_help_long_parsed(void *parsed_result,
 			" Affects only the queues configured with split"
 			" offloads.\n\n"
 
+			"set rxhdrs (eth[,eth-ipv4])*\n"
+			"    Set the protocol hdr of each segment to scatter"
+			" packets on receiving if split feature is engaged."
+			" Affects only the queues configured with split"
+			" offloads.\n"
+			"    Supported values: eth|eth-ipv4|eth-ipv6|eth-ipv4-tcp|eth-ipv6-tcp|"
+			"eth-ipv4-udp|eth-ipv6-udp|eth-ipv4-sctp|eth-ipv6-sctp|"
+			"grenat-eth|grenat-eth-ipv4|grenat-eth-ipv6|grenat-eth-ipv4-tcp|"
+			"grenat-eth-ipv6-tcp|grenat-eth-ipv4-udp|grenat-eth-ipv6-udp|"
+			"grenat-eth-ipv4-sctp|grenat-eth-ipv6-sctp\n\n"
+
 			"set txpkts (x[,y]*)\n"
 			"    Set the length of each segment of TXONLY"
 			" and optionally CSUM packets.\n\n"
@@ -3366,6 +3377,88 @@ static cmdline_parse_inst_t cmd_stop = {
 	},
 };
 
+static unsigned int
+get_ptype(char *value)
+{
+	uint32_t protocol;
+
+	if (!strcmp(value, "eth"))
+		protocol = RTE_PTYPE_L2_ETHER;
+	else if (!strcmp(value, "eth-ipv4"))
+		protocol = RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN;
+	else if (!strcmp(value, "eth-ipv6"))
+		protocol = RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN;
+	else if (!strcmp(value, "eth-ipv4-tcp"))
+		protocol = RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_L4_TCP;
+	else if (!strcmp(value, "eth-ipv6-tcp"))
+		protocol = RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN | RTE_PTYPE_L4_TCP;
+	else if (!strcmp(value, "eth-ipv4-udp"))
+		protocol = RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_L4_UDP;
+	else if (!strcmp(value, "eth-ipv6-udp"))
+		protocol = RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN | RTE_PTYPE_L4_UDP;
+	else if (!strcmp(value, "eth-ipv4-sctp"))
+		protocol = RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_L4_SCTP;
+	else if (!strcmp(value, "eth-ipv6-sctp"))
+		protocol = RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN | RTE_PTYPE_L4_SCTP;
+	else if (!strcmp(value, "grenat-eth"))
+		protocol = RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER;
+	else if (!strcmp(value, "grenat-eth-ipv4"))
+		protocol = RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER |
+				RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN;
+	else if (!strcmp(value, "grenat-eth-ipv6"))
+		protocol = RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN;
+	else if (!strcmp(value, "grenat-eth-ipv4-tcp"))
+		protocol = RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_TCP;
+	else if (!strcmp(value, "grenat-eth-ipv6-tcp"))
+		protocol = RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_TCP;
+	else if (!strcmp(value, "grenat-eth-ipv4-udp"))
+		protocol = RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_UDP;
+	else if (!strcmp(value, "grenat-eth-ipv6-udp"))
+		protocol = RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_UDP;
+	else if (!strcmp(value, "grenat-eth-ipv4-sctp"))
+		protocol = RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_SCTP;
+	else if (!strcmp(value, "grenat-eth-ipv6-sctp"))
+		protocol = RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_SCTP;
+	else {
+		fprintf(stderr, "Unsupported protocol: %s\n", value);
+		protocol = RTE_PTYPE_UNKNOWN;
+	}
+
+	return protocol;
+}
+/* *** SET RXHDRSLIST *** */
+
+unsigned int
+parse_hdrs_list(const char *str, const char *item_name, unsigned int max_items,
+				unsigned int *parsed_items, int check_hdrs_sequence)
+{
+	unsigned int nb_item;
+	char *cur;
+	char *tmp;
+
+	nb_item = 0;
+	char *str2 = strdup(str);
+	cur = strtok_r(str2, ",", &tmp);
+	while (cur != NULL) {
+		parsed_items[nb_item] = get_ptype(cur);
+		cur = strtok_r(NULL, ",", &tmp);
+		nb_item++;
+	}
+	if (nb_item > max_items)
+		fprintf(stderr, "Number of %s = %u > %u (maximum items)\n",
+			item_name, nb_item + 1, max_items);
+	free(str2);
+	if (!check_hdrs_sequence)
+		return nb_item;
+	return nb_item;
+}
 /* *** SET CORELIST and PORTLIST CONFIGURATION *** */
 
 unsigned int
@@ -3735,6 +3828,50 @@ static cmdline_parse_inst_t cmd_set_rxpkts = {
 	},
 };
 
+/* *** SET SEGMENT HEADERS OF RX PACKETS SPLIT *** */
+struct cmd_set_rxhdrs_result {
+	cmdline_fixed_string_t set;
+	cmdline_fixed_string_t rxhdrs;
+	cmdline_fixed_string_t values;
+};
+
+static void
+cmd_set_rxhdrs_parsed(void *parsed_result,
+		      __rte_unused struct cmdline *cl,
+		      __rte_unused void *data)
+{
+	struct cmd_set_rxhdrs_result *res;
+	unsigned int seg_hdrs[MAX_SEGS_BUFFER_SPLIT];
+	unsigned int nb_segs;
+
+	res = parsed_result;
+	nb_segs = parse_hdrs_list(res->values, "segment hdrs",
+				  MAX_SEGS_BUFFER_SPLIT, seg_hdrs, 0);
+	if (nb_segs > 0)
+		set_rx_pkt_hdrs(seg_hdrs, nb_segs);
+	cmd_reconfig_device_queue(RTE_PORT_ALL, 0, 1);
+}
+static cmdline_parse_token_string_t cmd_set_rxhdrs_set =
+	TOKEN_STRING_INITIALIZER(struct cmd_set_rxhdrs_result,
+				set, "set");
+static cmdline_parse_token_string_t cmd_set_rxhdrs_rxhdrs =
+	TOKEN_STRING_INITIALIZER(struct cmd_set_rxhdrs_result,
+				rxhdrs, "rxhdrs");
+static cmdline_parse_token_string_t cmd_set_rxhdrs_values =
+	TOKEN_STRING_INITIALIZER(struct cmd_set_rxhdrs_result,
+				values, NULL);
+
+static cmdline_parse_inst_t cmd_set_rxhdrs = {
+	.f = cmd_set_rxhdrs_parsed,
+	.data = NULL,
+	.help_str = "set rxhdrs <eth[,eth-ipv4]*>",
+	.tokens = {
+		(void *)&cmd_set_rxhdrs_set,
+		(void *)&cmd_set_rxhdrs_rxhdrs,
+		(void *)&cmd_set_rxhdrs_values,
+		NULL,
+	},
+};
 /* *** SET SEGMENT LENGTHS OF TXONLY PACKETS *** */
 
 struct cmd_set_txpkts_result {
@@ -6487,6 +6624,8 @@ static void cmd_showcfg_parsed(void *parsed_result,
 		show_rx_pkt_offsets();
 	else if (!strcmp(res->what, "rxpkts"))
 		show_rx_pkt_segments();
+	else if (!strcmp(res->what, "rxhdrs"))
+		show_rx_pkt_hdrs();
 	else if (!strcmp(res->what, "txpkts"))
 		show_tx_pkt_segments();
 	else if (!strcmp(res->what, "txtimes"))
@@ -6499,12 +6638,12 @@ static cmdline_parse_token_string_t cmd_showcfg_port =
 	TOKEN_STRING_INITIALIZER(struct cmd_showcfg_result, cfg, "config");
 static cmdline_parse_token_string_t cmd_showcfg_what =
 	TOKEN_STRING_INITIALIZER(struct cmd_showcfg_result, what,
-				 "rxtx#cores#fwd#rxoffs#rxpkts#txpkts#txtimes");
+				 "rxtx#cores#fwd#rxoffs#rxpkts#rxhdrs#txpkts#txtimes");
 
 static cmdline_parse_inst_t cmd_showcfg = {
 	.f = cmd_showcfg_parsed,
 	.data = NULL,
-	.help_str = "show config rxtx|cores|fwd|rxoffs|rxpkts|txpkts|txtimes",
+	.help_str = "show config rxtx|cores|fwd|rxoffs|rxpkts|rxhdrs|txpkts|txtimes",
 	.tokens = {
 		(void *)&cmd_showcfg_show,
 		(void *)&cmd_showcfg_port,
@@ -12455,6 +12594,7 @@ static cmdline_parse_ctx_t builtin_ctx[] = {
 	(cmdline_parse_inst_t *)&cmd_set_log,
 	(cmdline_parse_inst_t *)&cmd_set_rxoffs,
 	(cmdline_parse_inst_t *)&cmd_set_rxpkts,
+	(cmdline_parse_inst_t *)&cmd_set_rxhdrs,
 	(cmdline_parse_inst_t *)&cmd_set_txpkts,
 	(cmdline_parse_inst_t *)&cmd_set_txsplit,
 	(cmdline_parse_inst_t *)&cmd_set_txtimes,
diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index 568b0881d4..d3e95e40da 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -4746,6 +4746,94 @@ show_rx_pkt_segments(void)
 	}
 }
 
+static const char *get_ptype_str(uint32_t ptype)
+{
+	switch (ptype) {
+	case RTE_PTYPE_L2_ETHER:
+		return "eth";
+	case RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN:
+		return "eth-ipv4";
+	case RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN:
+		return "eth-ipv6";
+	case RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_L4_TCP:
+		return "eth-ipv4-tcp";
+	case RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN | RTE_PTYPE_L4_TCP:
+		return "eth-ipv6-tcp";
+	case RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_L4_UDP:
+		return "eth-ipv4-udp";
+	case RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN | RTE_PTYPE_L4_UDP:
+		return "eth-ipv6-udp";
+	case RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_L4_SCTP:
+		return "eth-ipv4-sctp";
+	case RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN | RTE_PTYPE_L4_SCTP:
+		return "eth-ipv6-sctp";
+	case RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER:
+		return "grenat-eth";
+	case RTE_PTYPE_TUNNEL_GRENAT|RTE_PTYPE_INNER_L2_ETHER|RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN:
+		return "grenat-eth-ipv4";
+	case RTE_PTYPE_TUNNEL_GRENAT|RTE_PTYPE_INNER_L2_ETHER|RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN:
+		return "grenat-eth-ipv6";
+	case RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER |
+		RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_TCP:
+		return "grenat-eth-ipv4-tcp";
+	case RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER |
+		RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_TCP:
+		return "grenat-eth-ipv6-tcp";
+	case RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER |
+		RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_UDP:
+		return "grenat-eth-ipv4-udp";
+	case RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER |
+		RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_UDP:
+		return "grenat-eth-ipv6-udp";
+	case RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER |
+		RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_SCTP:
+		return "grenat-eth-ipv4-sctp";
+	case RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER |
+		RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_SCTP:
+		return "grenat-eth-ipv6-sctp";
+	default:
+		return "unsupported";
+	}
+}
+
+void
+show_rx_pkt_hdrs(void)
+{
+	uint32_t i, n;
+
+	n = rx_pkt_nb_segs;
+	printf("Number of segments: %u\n", n);
+	if (n) {
+		printf("Packet segs: ");
+		for (i = 0; i < n - 1; i++)
+			printf("%s, ", get_ptype_str(rx_pkt_hdr_protos[i]));
+		printf("payload\n");
+	}
+}
+
+void
+set_rx_pkt_hdrs(unsigned int *seg_hdrs, unsigned int nb_segs)
+{
+	unsigned int i;
+
+	if (nb_segs + 1 > MAX_SEGS_BUFFER_SPLIT) {
+		printf("nb segments per RX packets=%u > "
+		       "MAX_SEGS_BUFFER_SPLIT - ignored\n", nb_segs + 1);
+		return;
+	}
+
+	memset(rx_pkt_hdr_protos, 0, sizeof(rx_pkt_hdr_protos));
+
+	for (i = 0; i < nb_segs; i++)
+		rx_pkt_hdr_protos[i] = (uint32_t)seg_hdrs[i];
+	rx_pkt_hdr_protos[nb_segs] = RTE_PTYPE_ALL_MASK;
+	/*
+	 * We calculate the number of hdrs, but payload is not included,
+	 * so rx_pkt_nb_segs would increase 1.
+	 */
+	rx_pkt_nb_segs = nb_segs + 1;
+}
+
 void
 set_rx_pkt_segments(unsigned int *seg_lengths, unsigned int nb_segs)
 {
diff --git a/app/test-pmd/parameters.c b/app/test-pmd/parameters.c
index 1024b5419c..5bf4219c46 100644
--- a/app/test-pmd/parameters.c
+++ b/app/test-pmd/parameters.c
@@ -152,6 +152,7 @@ usage(char* progname)
 	       " Used mainly with PCAP drivers.\n");
 	printf("  --rxoffs=X[,Y]*: set RX segment offsets for split.\n");
 	printf("  --rxpkts=X[,Y]*: set RX segment sizes to split.\n");
+	printf("  --rxhdrs=eth[,ipv4]*: set RX segment protocol to split.\n");
 	printf("  --txpkts=X[,Y]*: set TX segment sizes"
 		" or total packet length.\n");
 	printf("  --txonly-multi-flow: generate multiple flows in txonly mode\n");
@@ -660,6 +661,7 @@ launch_args_parse(int argc, char** argv)
 		{ "flow-isolate-all",	        0, 0, 0 },
 		{ "rxoffs",			1, 0, 0 },
 		{ "rxpkts",			1, 0, 0 },
+		{ "rxhdrs",			1, 0, 0 },
 		{ "txpkts",			1, 0, 0 },
 		{ "txonly-multi-flow",		0, 0, 0 },
 		{ "rxq-share",			2, 0, 0 },
@@ -1254,7 +1256,6 @@ launch_args_parse(int argc, char** argv)
 			if (!strcmp(lgopts[opt_idx].name, "rxpkts")) {
 				unsigned int seg_len[MAX_SEGS_BUFFER_SPLIT];
 				unsigned int nb_segs;
-
 				nb_segs = parse_item_list
 						(optarg, "rxpkt segments",
 						 MAX_SEGS_BUFFER_SPLIT,
@@ -1264,6 +1265,19 @@ launch_args_parse(int argc, char** argv)
 				else
 					rte_exit(EXIT_FAILURE, "bad rxpkts\n");
 			}
+			if (!strcmp(lgopts[opt_idx].name, "rxhdrs")) {
+				unsigned int seg_hdrs[MAX_SEGS_BUFFER_SPLIT];
+				unsigned int nb_segs;
+
+				nb_segs = parse_hdrs_list
+						(optarg, "rxpkt segments",
+						MAX_SEGS_BUFFER_SPLIT,
+						seg_hdrs, 0);
+				if (nb_segs > 0)
+					set_rx_pkt_hdrs(seg_hdrs, nb_segs);
+				else
+					rte_exit(EXIT_FAILURE, "bad rxpkts\n");
+			}
 			if (!strcmp(lgopts[opt_idx].name, "txpkts")) {
 				unsigned seg_lengths[RTE_MAX_SEGS_PER_PKT];
 				unsigned int nb_segs;
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index de6ad00138..bb2a969559 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -247,6 +247,7 @@ uint16_t rx_pkt_seg_lengths[MAX_SEGS_BUFFER_SPLIT];
 uint8_t  rx_pkt_nb_segs; /**< Number of segments to split */
 uint16_t rx_pkt_seg_offsets[MAX_SEGS_BUFFER_SPLIT];
 uint8_t  rx_pkt_nb_offs; /**< Number of specified offsets */
+uint32_t rx_pkt_hdr_protos[MAX_SEGS_BUFFER_SPLIT];
 
 /*
  * Configuration of packet segments used by the "txonly" processing engine.
@@ -2652,6 +2653,7 @@ rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
 		rx_seg->offset = i < rx_pkt_nb_offs ?
 				   rx_pkt_seg_offsets[i] : 0;
 		rx_seg->mp = mpx ? mpx : mp;
+		rx_seg->proto_hdr = rx_pkt_hdr_protos[i];
 	}
 	rx_conf->rx_nseg = rx_pkt_nb_segs;
 	rx_conf->rx_seg = rx_useg;
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index 21c5632aec..0e5e94423a 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -554,6 +554,7 @@ extern uint32_t max_rx_pkt_len;
  * Configuration of packet segments used to scatter received packets
  * if some of split features is configured.
  */
+extern uint32_t rx_pkt_hdr_protos[MAX_SEGS_BUFFER_SPLIT];
 extern uint16_t rx_pkt_seg_lengths[MAX_SEGS_BUFFER_SPLIT];
 extern uint8_t  rx_pkt_nb_segs; /**< Number of segments to split */
 extern uint16_t rx_pkt_seg_offsets[MAX_SEGS_BUFFER_SPLIT];
@@ -825,6 +826,9 @@ inc_tx_burst_stats(struct fwd_stream *fs, uint16_t nb_tx)
 unsigned int parse_item_list(const char *str, const char *item_name,
 			unsigned int max_items,
 			unsigned int *parsed_items, int check_unique_values);
+unsigned int parse_hdrs_list(const char *str, const char *item_name,
+			unsigned int max_item,
+			unsigned int *parsed_items, int check_unique_values);
 void launch_args_parse(int argc, char** argv);
 void cmd_reconfig_device_queue(portid_t id, uint8_t dev, uint8_t queue);
 void cmdline_read_from_file(const char *filename);
@@ -974,6 +978,8 @@ void set_record_core_cycles(uint8_t on_off);
 void set_record_burst_stats(uint8_t on_off);
 void set_verbose_level(uint16_t vb_level);
 void set_rx_pkt_segments(unsigned int *seg_lengths, unsigned int nb_segs);
+void set_rx_pkt_hdrs(unsigned int *seg_protos, unsigned int nb_segs);
+void show_rx_pkt_hdrs(void);
 void show_rx_pkt_segments(void);
 void set_rx_pkt_offsets(unsigned int *seg_offsets, unsigned int nb_offs);
 void show_rx_pkt_offsets(void);
-- 
2.25.1


^ permalink raw reply	[flat|nested] 72+ messages in thread

* [PATCH v6 4/4] net/ice: support buffer split in Rx path
  2022-09-29 18:59 ` [PATCH v6 0/4] support protocol based buffer split Yuan Wang
                     ` (2 preceding siblings ...)
  2022-09-29 18:59   ` [PATCH v6 3/4] app/testpmd: add rxhdrs commands and parameters Yuan Wang
@ 2022-09-29 18:59   ` Yuan Wang
  2022-09-30  6:45     ` Tang, Yaqi
  3 siblings, 1 reply; 72+ messages in thread
From: Yuan Wang @ 2022-09-29 18:59 UTC (permalink / raw)
  To: dev, Qiming Yang, Qi Zhang
  Cc: thomas, andrew.rybchenko, ferruh.yigit, mdr, xiaoyun.li,
	aman.deep.singh, yuying.zhang, jerinjacobk, viacheslavo, stephen,
	xuan.ding, hpothula, yaqi.tang, Yuan Wang, Wenxuan Wu

This patch adds support for protocol based buffer split in the normal Rx
data paths. When the Rx queue is configured with a specific protocol type,
received packets will be directly split into a protocol header part and a
payload part, within the limitations of the PMD, and the two parts will be
put into different mempools.

Currently, protocol based buffer split is not supported in vectorized
paths.

A new API, ice_buffer_split_supported_hdr_ptypes_get(), has been
introduced; it returns to the application the header protocols that the
ice PMD supports for splitting.

Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
---
 doc/guides/rel_notes/release_22_11.rst |   4 +
 drivers/net/ice/ice_ethdev.c           |  55 +++++-
 drivers/net/ice/ice_rxtx.c             | 257 ++++++++++++++++++++++---
 drivers/net/ice/ice_rxtx.h             |  16 ++
 drivers/net/ice/ice_rxtx_vec_common.h  |   3 +
 5 files changed, 303 insertions(+), 32 deletions(-)

diff --git a/doc/guides/rel_notes/release_22_11.rst b/doc/guides/rel_notes/release_22_11.rst
index 510869c73a..3fa5377d96 100644
--- a/doc/guides/rel_notes/release_22_11.rst
+++ b/doc/guides/rel_notes/release_22_11.rst
@@ -108,6 +108,10 @@ New Features
     User can choose length or protocol header to configure buffer split
     according to NIC's capability.
 
+* **Updated Intel ice driver.**
+
+  * Added protocol based buffer split support in scalar path.
+
 
 Removed Items
 -------------
diff --git a/drivers/net/ice/ice_ethdev.c b/drivers/net/ice/ice_ethdev.c
index 71302b03d8..709e7df408 100644
--- a/drivers/net/ice/ice_ethdev.c
+++ b/drivers/net/ice/ice_ethdev.c
@@ -159,6 +159,7 @@ static int ice_timesync_read_time(struct rte_eth_dev *dev,
 static int ice_timesync_write_time(struct rte_eth_dev *dev,
 				   const struct timespec *timestamp);
 static int ice_timesync_disable(struct rte_eth_dev *dev);
+static const uint32_t *ice_buffer_split_supported_hdr_ptypes_get(struct rte_eth_dev *dev);
 
 static const struct rte_pci_id pci_id_ice_map[] = {
 	{ RTE_PCI_DEVICE(ICE_INTEL_VENDOR_ID, ICE_DEV_ID_E823L_BACKPLANE) },
@@ -270,6 +271,7 @@ static const struct eth_dev_ops ice_eth_dev_ops = {
 	.timesync_write_time          = ice_timesync_write_time,
 	.timesync_disable             = ice_timesync_disable,
 	.tm_ops_get                   = ice_tm_ops_get,
+	.buffer_split_supported_hdr_ptypes_get = ice_buffer_split_supported_hdr_ptypes_get,
 };
 
 /* store statistics names and its offset in stats structure */
@@ -3797,7 +3799,8 @@ ice_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 			RTE_ETH_RX_OFFLOAD_OUTER_IPV4_CKSUM |
 			RTE_ETH_RX_OFFLOAD_VLAN_EXTEND |
 			RTE_ETH_RX_OFFLOAD_RSS_HASH |
-			RTE_ETH_RX_OFFLOAD_TIMESTAMP;
+			RTE_ETH_RX_OFFLOAD_TIMESTAMP |
+			RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT;
 		dev_info->tx_offload_capa |=
 			RTE_ETH_TX_OFFLOAD_QINQ_INSERT |
 			RTE_ETH_TX_OFFLOAD_IPV4_CKSUM |
@@ -3809,7 +3812,7 @@ ice_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 		dev_info->flow_type_rss_offloads |= ICE_RSS_OFFLOAD_ALL;
 	}
 
-	dev_info->rx_queue_offload_capa = 0;
+	dev_info->rx_queue_offload_capa = RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT;
 	dev_info->tx_queue_offload_capa = RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE;
 
 	dev_info->reta_size = pf->hash_lut_size;
@@ -3878,6 +3881,11 @@ ice_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 	dev_info->default_rxportconf.ring_size = ICE_BUF_SIZE_MIN;
 	dev_info->default_txportconf.ring_size = ICE_BUF_SIZE_MIN;
 
+	dev_info->rx_seg_capa.max_nseg = ICE_RX_MAX_NSEG;
+	dev_info->rx_seg_capa.multi_pools = 1;
+	dev_info->rx_seg_capa.offset_allowed = 0;
+	dev_info->rx_seg_capa.offset_align_log2 = 0;
+
 	return 0;
 }
 
@@ -5932,6 +5940,49 @@ ice_timesync_disable(struct rte_eth_dev *dev)
 	return 0;
 }
 
+static const uint32_t *
+ice_buffer_split_supported_hdr_ptypes_get(struct rte_eth_dev *dev __rte_unused)
+{
+	/* Buffer split protocol header capability. */
+	static const uint32_t ptypes[] = {
+		/* Non tunneled */
+		RTE_PTYPE_L2_ETHER,
+		RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN,
+		RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_L4_UDP,
+		RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_L4_TCP,
+		RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_L4_SCTP,
+		RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN,
+		RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN | RTE_PTYPE_L4_UDP,
+		RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN | RTE_PTYPE_L4_TCP,
+		RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN | RTE_PTYPE_L4_SCTP,
+
+		/* Tunneled */
+		RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER,
+		RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER |
+		RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN,
+		RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER |
+		RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_UDP,
+		RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER |
+		RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_TCP,
+		RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER |
+		RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_SCTP,
+
+		RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER,
+		RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER |
+		RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN,
+		RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER |
+		RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_UDP,
+		RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER |
+		RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_TCP,
+		RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER |
+		RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_SCTP,
+
+		RTE_PTYPE_UNKNOWN
+	};
+
+	return ptypes;
+}
+
 static int
 ice_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
 	      struct rte_pci_device *pci_dev)
diff --git a/drivers/net/ice/ice_rxtx.c b/drivers/net/ice/ice_rxtx.c
index 5af7c0c8f6..62f02dfd87 100644
--- a/drivers/net/ice/ice_rxtx.c
+++ b/drivers/net/ice/ice_rxtx.c
@@ -259,7 +259,6 @@ ice_program_hw_rx_queue(struct ice_rx_queue *rxq)
 	/* Set buffer size as the head split is disabled. */
 	buf_size = (uint16_t)(rte_pktmbuf_data_room_size(rxq->mp) -
 			      RTE_PKTMBUF_HEADROOM);
-	rxq->rx_hdr_len = 0;
 	rxq->rx_buf_len = RTE_ALIGN(buf_size, (1 << ICE_RLAN_CTX_DBUF_S));
 	rxq->max_pkt_len =
 		RTE_MIN((uint32_t)ICE_SUPPORT_CHAIN_NUM * rxq->rx_buf_len,
@@ -288,11 +287,84 @@ ice_program_hw_rx_queue(struct ice_rx_queue *rxq)
 
 	memset(&rx_ctx, 0, sizeof(rx_ctx));
 
+	if (rxq->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) {
+		uint32_t proto_hdr;
+		proto_hdr = rxq->rxseg[0].proto_hdr;
+
+		if (proto_hdr == RTE_PTYPE_UNKNOWN) {
+			PMD_DRV_LOG(ERR, "Buffer split protocol must be configured");
+			return -EINVAL;
+		}
+
+		switch (proto_hdr & RTE_PTYPE_L4_MASK) {
+		case RTE_PTYPE_L4_TCP:
+		case RTE_PTYPE_L4_UDP:
+			rx_ctx.dtype = ICE_RX_DTYPE_HEADER_SPLIT;
+			rx_ctx.hsplit_0 = ICE_RLAN_RX_HSPLIT_0_SPLIT_TCP_UDP;
+			goto set_hsplit_finish;
+		case RTE_PTYPE_L4_SCTP:
+			rx_ctx.dtype = ICE_RX_DTYPE_HEADER_SPLIT;
+			rx_ctx.hsplit_0 = ICE_RLAN_RX_HSPLIT_0_SPLIT_SCTP;
+			goto set_hsplit_finish;
+		}
+
+		switch (proto_hdr & RTE_PTYPE_L3_MASK) {
+		case RTE_PTYPE_L3_IPV4_EXT_UNKNOWN:
+		case RTE_PTYPE_L3_IPV6_EXT_UNKNOWN:
+			rx_ctx.dtype = ICE_RX_DTYPE_HEADER_SPLIT;
+			rx_ctx.hsplit_0 = ICE_RLAN_RX_HSPLIT_0_SPLIT_IP;
+			goto set_hsplit_finish;
+		}
+
+		switch (proto_hdr & RTE_PTYPE_L2_MASK) {
+		case RTE_PTYPE_L2_ETHER:
+			rx_ctx.dtype = ICE_RX_DTYPE_HEADER_SPLIT;
+			rx_ctx.hsplit_0 = ICE_RLAN_RX_HSPLIT_0_SPLIT_L2;
+			rx_ctx.hsplit_1 = ICE_RLAN_RX_HSPLIT_1_SPLIT_L2;
+			goto set_hsplit_finish;
+		}
+
+		switch (proto_hdr & RTE_PTYPE_INNER_L4_MASK) {
+		case RTE_PTYPE_INNER_L4_TCP:
+		case RTE_PTYPE_INNER_L4_UDP:
+			rx_ctx.dtype = ICE_RX_DTYPE_HEADER_SPLIT;
+			rx_ctx.hsplit_0 = ICE_RLAN_RX_HSPLIT_0_SPLIT_TCP_UDP;
+			goto set_hsplit_finish;
+		case RTE_PTYPE_INNER_L4_SCTP:
+			rx_ctx.dtype = ICE_RX_DTYPE_HEADER_SPLIT;
+			rx_ctx.hsplit_0 = ICE_RLAN_RX_HSPLIT_0_SPLIT_SCTP;
+			goto set_hsplit_finish;
+		}
+
+		switch (proto_hdr & RTE_PTYPE_INNER_L3_MASK) {
+		case RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN:
+		case RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN:
+			rx_ctx.dtype = ICE_RX_DTYPE_HEADER_SPLIT;
+			rx_ctx.hsplit_0 = ICE_RLAN_RX_HSPLIT_0_SPLIT_IP;
+			goto set_hsplit_finish;
+		}
+
+		switch (proto_hdr & RTE_PTYPE_INNER_L2_MASK) {
+		case RTE_PTYPE_INNER_L2_ETHER:
+			rx_ctx.dtype = ICE_RX_DTYPE_HEADER_SPLIT;
+			rx_ctx.hsplit_0 = ICE_RLAN_RX_HSPLIT_0_SPLIT_L2;
+			goto set_hsplit_finish;
+		}
+
+		PMD_DRV_LOG(ERR, "Buffer split protocol is not supported");
+		return -EINVAL;
+
+set_hsplit_finish:
+		rxq->rx_hdr_len = ICE_RX_HDR_BUF_SIZE;
+	} else {
+		rxq->rx_hdr_len = 0;
+		rx_ctx.dtype = 0; /* No Protocol Based Buffer Split mode */
+	}
+
 	rx_ctx.base = rxq->rx_ring_dma / ICE_QUEUE_BASE_ADDR_UNIT;
 	rx_ctx.qlen = rxq->nb_rx_desc;
 	rx_ctx.dbuf = rxq->rx_buf_len >> ICE_RLAN_CTX_DBUF_S;
 	rx_ctx.hbuf = rxq->rx_hdr_len >> ICE_RLAN_CTX_HBUF_S;
-	rx_ctx.dtype = 0; /* No Header Split mode */
 #ifndef RTE_LIBRTE_ICE_16BYTE_RX_DESC
 	rx_ctx.dsize = 1; /* 32B descriptors */
 #endif
@@ -378,6 +450,7 @@ ice_alloc_rx_queue_mbufs(struct ice_rx_queue *rxq)
 
 	for (i = 0; i < rxq->nb_rx_desc; i++) {
 		volatile union ice_rx_flex_desc *rxd;
+		rxd = &rxq->rx_ring[i];
 		struct rte_mbuf *mbuf = rte_mbuf_raw_alloc(rxq->mp);
 
 		if (unlikely(!mbuf)) {
@@ -385,8 +458,6 @@ ice_alloc_rx_queue_mbufs(struct ice_rx_queue *rxq)
 			return -ENOMEM;
 		}
 
-		rte_mbuf_refcnt_set(mbuf, 1);
-		mbuf->next = NULL;
 		mbuf->data_off = RTE_PKTMBUF_HEADROOM;
 		mbuf->nb_segs = 1;
 		mbuf->port = rxq->port_id;
@@ -394,9 +465,32 @@ ice_alloc_rx_queue_mbufs(struct ice_rx_queue *rxq)
 		dma_addr =
 			rte_cpu_to_le_64(rte_mbuf_data_iova_default(mbuf));
 
-		rxd = &rxq->rx_ring[i];
-		rxd->read.pkt_addr = dma_addr;
-		rxd->read.hdr_addr = 0;
+		if (!(rxq->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT)) {
+			rte_mbuf_refcnt_set(mbuf, 1);
+			mbuf->next = NULL;
+			rxd->read.hdr_addr = 0;
+			rxd->read.pkt_addr = dma_addr;
+		} else {
+			struct rte_mbuf *mbuf_pay;
+			mbuf_pay = rte_mbuf_raw_alloc(rxq->rxseg[1].mp);
+			if (unlikely(!mbuf_pay)) {
+				PMD_DRV_LOG(ERR, "Failed to allocate payload mbuf for RX");
+				return -ENOMEM;
+			}
+
+			mbuf_pay->next = NULL;
+			mbuf_pay->data_off = RTE_PKTMBUF_HEADROOM;
+			mbuf_pay->nb_segs = 1;
+			mbuf_pay->port = rxq->port_id;
+			mbuf->next = mbuf_pay;
+
+			rxd->read.hdr_addr = dma_addr;
+			/* The LS bit should be set to zero regardless of
+			 * buffer split enablement.
+			 */
+			rxd->read.pkt_addr = rte_cpu_to_le_64(rte_mbuf_data_iova_default(mbuf_pay));
+		}
+
 #ifndef RTE_LIBRTE_ICE_16BYTE_RX_DESC
 		rxd->read.rsvd1 = 0;
 		rxd->read.rsvd2 = 0;
@@ -420,14 +514,14 @@ _ice_rx_queue_release_mbufs(struct ice_rx_queue *rxq)
 
 	for (i = 0; i < rxq->nb_rx_desc; i++) {
 		if (rxq->sw_ring[i].mbuf) {
-			rte_pktmbuf_free_seg(rxq->sw_ring[i].mbuf);
+			rte_pktmbuf_free(rxq->sw_ring[i].mbuf);
 			rxq->sw_ring[i].mbuf = NULL;
 		}
 	}
 	if (rxq->rx_nb_avail == 0)
 		return;
 	for (i = 0; i < rxq->rx_nb_avail; i++)
-		rte_pktmbuf_free_seg(rxq->rx_stage[rxq->rx_next_avail + i]);
+		rte_pktmbuf_free(rxq->rx_stage[rxq->rx_next_avail + i]);
 
 	rxq->rx_nb_avail = 0;
 }
@@ -719,7 +813,7 @@ ice_fdir_program_hw_rx_queue(struct ice_rx_queue *rxq)
 	rx_ctx.qlen = rxq->nb_rx_desc;
 	rx_ctx.dbuf = rxq->rx_buf_len >> ICE_RLAN_CTX_DBUF_S;
 	rx_ctx.hbuf = rxq->rx_hdr_len >> ICE_RLAN_CTX_HBUF_S;
-	rx_ctx.dtype = 0; /* No Header Split mode */
+	rx_ctx.dtype = 0; /* No Buffer Split mode */
 	rx_ctx.dsize = 1; /* 32B descriptors */
 	rx_ctx.rxmax = ICE_ETH_MAX_LEN;
 	/* TPH: Transaction Layer Packet (TLP) processing hints */
@@ -1053,6 +1147,8 @@ ice_rx_queue_setup(struct rte_eth_dev *dev,
 	uint16_t len;
 	int use_def_burst_func = 1;
 	uint64_t offloads;
+	uint16_t n_seg = rx_conf->rx_nseg;
+	const struct rte_eth_rxseg_split *rx_seg;
 
 	if (nb_desc % ICE_ALIGN_RING_DESC != 0 ||
 	    nb_desc > ICE_MAX_RING_DESC ||
@@ -1064,6 +1160,15 @@ ice_rx_queue_setup(struct rte_eth_dev *dev,
 
 	offloads = rx_conf->offloads | dev->data->dev_conf.rxmode.offloads;
 
+	if (mp)
+		n_seg = 1;
+
+	if (n_seg > 1 && !(offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT)) {
+		PMD_INIT_LOG(ERR, "port %u queue index %u split offload not configured",
+				dev->data->port_id, queue_idx);
+		return -EINVAL;
+	}
+
 	/* Free memory if needed */
 	if (dev->data->rx_queues[queue_idx]) {
 		ice_rx_queue_release(dev->data->rx_queues[queue_idx]);
@@ -1075,12 +1180,22 @@ ice_rx_queue_setup(struct rte_eth_dev *dev,
 				 sizeof(struct ice_rx_queue),
 				 RTE_CACHE_LINE_SIZE,
 				 socket_id);
+
 	if (!rxq) {
 		PMD_INIT_LOG(ERR, "Failed to allocate memory for "
 			     "rx queue data structure");
 		return -ENOMEM;
 	}
-	rxq->mp = mp;
+
+	rxq->rxseg_nb = n_seg;
+	if (n_seg > 1) {
+		rx_seg = (const struct rte_eth_rxseg_split *)rx_conf->rx_seg;
+		rte_memcpy(rxq->rxseg, rx_seg, sizeof(struct rte_eth_rxseg_split) * n_seg);
+		rxq->mp = rxq->rxseg[0].mp;
+	} else {
+		rxq->mp = mp;
+	}
+
 	rxq->nb_rx_desc = nb_desc;
 	rxq->rx_free_thresh = rx_conf->rx_free_thresh;
 	rxq->queue_id = queue_idx;
@@ -1551,7 +1666,7 @@ ice_rx_scan_hw_ring(struct ice_rx_queue *rxq)
 	struct ice_rx_entry *rxep;
 	struct rte_mbuf *mb;
 	uint16_t stat_err0;
-	uint16_t pkt_len;
+	uint16_t pkt_len, hdr_len;
 	int32_t s[ICE_LOOK_AHEAD], nb_dd;
 	int32_t i, j, nb_rx = 0;
 	uint64_t pkt_flags = 0;
@@ -1606,6 +1721,27 @@ ice_rx_scan_hw_ring(struct ice_rx_queue *rxq)
 				   ICE_RX_FLX_DESC_PKT_LEN_M) - rxq->crc_len;
 			mb->data_len = pkt_len;
 			mb->pkt_len = pkt_len;
+
+			if (!(rxq->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT)) {
+				pkt_len = (rte_le_to_cpu_16(rxdp[j].wb.pkt_len) &
+					ICE_RX_FLX_DESC_PKT_LEN_M) - rxq->crc_len;
+				mb->data_len = pkt_len;
+				mb->pkt_len = pkt_len;
+			} else {
+				mb->nb_segs = (uint16_t)(mb->nb_segs + mb->next->nb_segs);
+				mb->next->next = NULL;
+				hdr_len = rte_le_to_cpu_16(rxdp[j].wb.hdr_len_sph_flex_flags1) &
+						ICE_RX_FLEX_DESC_HEADER_LEN_M;
+				pkt_len = (rte_le_to_cpu_16(rxdp[j].wb.pkt_len) &
+					ICE_RX_FLX_DESC_PKT_LEN_M) - rxq->crc_len;
+				mb->data_len = hdr_len;
+				mb->pkt_len = hdr_len + pkt_len;
+				mb->next->data_len = pkt_len;
+#ifdef RTE_ETHDEV_DEBUG_RX
+				rte_pktmbuf_dump(stdout, mb, rte_pktmbuf_pkt_len(mb));
+#endif
+			}
+
 			mb->ol_flags = 0;
 			stat_err0 = rte_le_to_cpu_16(rxdp[j].wb.status_error0);
 			pkt_flags = ice_rxd_error_to_pkt_flags(stat_err0);
@@ -1697,7 +1833,9 @@ ice_rx_alloc_bufs(struct ice_rx_queue *rxq)
 	struct rte_mbuf *mb;
 	uint16_t alloc_idx, i;
 	uint64_t dma_addr;
-	int diag;
+	int diag, diag_pay;
+	uint64_t pay_addr;
+	struct rte_mbuf *mbufs_pay[rxq->rx_free_thresh];
 
 	/* Allocate buffers in bulk */
 	alloc_idx = (uint16_t)(rxq->rx_free_trigger -
@@ -1710,6 +1848,15 @@ ice_rx_alloc_bufs(struct ice_rx_queue *rxq)
 		return -ENOMEM;
 	}
 
+	if (rxq->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) {
+		diag_pay = rte_mempool_get_bulk(rxq->rxseg[1].mp,
+				(void *)mbufs_pay, rxq->rx_free_thresh);
+		if (unlikely(diag_pay != 0)) {
+			PMD_RX_LOG(ERR, "Failed to get payload mbufs in bulk");
+			return -ENOMEM;
+		}
+	}
+
 	rxdp = &rxq->rx_ring[alloc_idx];
 	for (i = 0; i < rxq->rx_free_thresh; i++) {
 		if (likely(i < (rxq->rx_free_thresh - 1)))
@@ -1718,13 +1865,21 @@ ice_rx_alloc_bufs(struct ice_rx_queue *rxq)
 
 		mb = rxep[i].mbuf;
 		rte_mbuf_refcnt_set(mb, 1);
-		mb->next = NULL;
 		mb->data_off = RTE_PKTMBUF_HEADROOM;
 		mb->nb_segs = 1;
 		mb->port = rxq->port_id;
 		dma_addr = rte_cpu_to_le_64(rte_mbuf_data_iova_default(mb));
-		rxdp[i].read.hdr_addr = 0;
-		rxdp[i].read.pkt_addr = dma_addr;
+
+		if (!(rxq->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT)) {
+			mb->next = NULL;
+			rxdp[i].read.hdr_addr = 0;
+			rxdp[i].read.pkt_addr = dma_addr;
+		} else {
+			mb->next = mbufs_pay[i];
+			pay_addr = rte_cpu_to_le_64(rte_mbuf_data_iova_default(mbufs_pay[i]));
+			rxdp[i].read.hdr_addr = dma_addr;
+			rxdp[i].read.pkt_addr = pay_addr;
+		}
 	}
 
 	/* Update Rx tail register */
@@ -2333,11 +2488,13 @@ ice_recv_pkts(void *rx_queue,
 	struct ice_rx_entry *sw_ring = rxq->sw_ring;
 	struct ice_rx_entry *rxe;
 	struct rte_mbuf *nmb; /* new allocated mbuf */
+	struct rte_mbuf *nmb_pay; /* new allocated payload mbuf */
 	struct rte_mbuf *rxm; /* pointer to store old mbuf in SW ring */
 	uint16_t rx_id = rxq->rx_tail;
 	uint16_t nb_rx = 0;
 	uint16_t nb_hold = 0;
 	uint16_t rx_packet_len;
+	uint16_t rx_header_len;
 	uint16_t rx_stat_err0;
 	uint64_t dma_addr;
 	uint64_t pkt_flags;
@@ -2365,12 +2522,16 @@ ice_recv_pkts(void *rx_queue,
 		if (!(rx_stat_err0 & (1 << ICE_RX_FLEX_DESC_STATUS0_DD_S)))
 			break;
 
-		/* allocate mbuf */
+		if (rx_stat_err0 & (1 << ICE_RX_FLEX_DESC_STATUS0_HBO_S))
+			break;
+
+		/* allocate header mbuf */
 		nmb = rte_mbuf_raw_alloc(rxq->mp);
 		if (unlikely(!nmb)) {
 			rxq->vsi->adapter->pf.dev_data->rx_mbuf_alloc_failed++;
 			break;
 		}
+
 		rxd = *rxdp; /* copy descriptor in ring to temp variable*/
 
 		nb_hold++;
@@ -2383,24 +2544,60 @@ ice_recv_pkts(void *rx_queue,
 		dma_addr =
 			rte_cpu_to_le_64(rte_mbuf_data_iova_default(nmb));
 
-		/**
-		 * fill the read format of descriptor with physic address in
-		 * new allocated mbuf: nmb
-		 */
-		rxdp->read.hdr_addr = 0;
-		rxdp->read.pkt_addr = dma_addr;
+		if (!(rxq->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT)) {
+			/**
+			 * fill the read format of descriptor with physic address in
+			 * new allocated mbuf: nmb
+			 */
+			rxdp->read.hdr_addr = 0;
+			rxdp->read.pkt_addr = dma_addr;
+		} else {
+			/* allocate payload mbuf */
+			nmb_pay = rte_mbuf_raw_alloc(rxq->rxseg[1].mp);
+			if (unlikely(!nmb_pay)) {
+				rxq->vsi->adapter->pf.dev_data->rx_mbuf_alloc_failed++;
+				break;
+			}
+
+			nmb->next = nmb_pay;
+			nmb_pay->next = NULL;
 
-		/* calculate rx_packet_len of the received pkt */
-		rx_packet_len = (rte_le_to_cpu_16(rxd.wb.pkt_len) &
-				 ICE_RX_FLX_DESC_PKT_LEN_M) - rxq->crc_len;
+			/**
+			 * fill the read format of descriptor with physic address in
+			 * new allocated mbuf: nmb
+			 */
+			rxdp->read.hdr_addr = dma_addr;
+			rxdp->read.pkt_addr = rte_cpu_to_le_64(rte_mbuf_data_iova_default(nmb_pay));
+		}
 
 		/* fill old mbuf with received descriptor: rxd */
 		rxm->data_off = RTE_PKTMBUF_HEADROOM;
 		rte_prefetch0(RTE_PTR_ADD(rxm->buf_addr, RTE_PKTMBUF_HEADROOM));
-		rxm->nb_segs = 1;
-		rxm->next = NULL;
-		rxm->pkt_len = rx_packet_len;
-		rxm->data_len = rx_packet_len;
+		if (!(rxq->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT)) {
+			rxm->nb_segs = 1;
+			rxm->next = NULL;
+			/* calculate rx_packet_len of the received pkt */
+			rx_packet_len = (rte_le_to_cpu_16(rxd.wb.pkt_len) &
+					ICE_RX_FLX_DESC_PKT_LEN_M) - rxq->crc_len;
+			rxm->data_len = rx_packet_len;
+			rxm->pkt_len = rx_packet_len;
+		} else {
+			rxm->nb_segs = (uint16_t)(rxm->nb_segs + rxm->next->nb_segs);
+			rxm->next->next = NULL;
+			/* calculate rx_packet_len of the received pkt */
+			rx_header_len = rte_le_to_cpu_16(rxd.wb.hdr_len_sph_flex_flags1) &
+					ICE_RX_FLEX_DESC_HEADER_LEN_M;
+			rx_packet_len = (rte_le_to_cpu_16(rxd.wb.pkt_len) &
+					ICE_RX_FLX_DESC_PKT_LEN_M) - rxq->crc_len;
+			rxm->data_len = rx_header_len;
+			rxm->pkt_len = rx_header_len + rx_packet_len;
+			rxm->next->data_len = rx_packet_len;
+
+#ifdef RTE_ETHDEV_DEBUG_RX
+			rte_pktmbuf_dump(stdout, rxm, rte_pktmbuf_pkt_len(rxm));
+#endif
+		}
+
 		rxm->port = rxq->port_id;
 		rxm->packet_type = ptype_tbl[ICE_RX_FLEX_DESC_PTYPE_M &
 			rte_le_to_cpu_16(rxd.wb.ptype_flex_flags0)];
diff --git a/drivers/net/ice/ice_rxtx.h b/drivers/net/ice/ice_rxtx.h
index 6c08c175dc..0cfc3ca57d 100644
--- a/drivers/net/ice/ice_rxtx.h
+++ b/drivers/net/ice/ice_rxtx.h
@@ -16,6 +16,9 @@
 #define ICE_RX_MAX_BURST 32
 #define ICE_TX_MAX_BURST 32
 
+/* Maximal number of segments to split. */
+#define ICE_RX_MAX_NSEG 2
+
 #define ICE_CHK_Q_ENA_COUNT        100
 #define ICE_CHK_Q_ENA_INTERVAL_US  100
 
@@ -43,6 +46,11 @@
 extern uint64_t ice_timestamp_dynflag;
 extern int ice_timestamp_dynfield_offset;
 
+/* Max header size can be 2K - 64 bytes */
+#define ICE_RX_HDR_BUF_SIZE    (2048 - 64)
+
+#define ICE_HEADER_SPLIT_ENA   BIT(0)
+
 typedef void (*ice_rx_release_mbufs_t)(struct ice_rx_queue *rxq);
 typedef void (*ice_tx_release_mbufs_t)(struct ice_tx_queue *txq);
 typedef void (*ice_rxd_to_pkt_fields_t)(struct ice_rx_queue *rxq,
@@ -53,6 +61,12 @@ struct ice_rx_entry {
 	struct rte_mbuf *mbuf;
 };
 
+enum ice_rx_dtype {
+	ICE_RX_DTYPE_NO_SPLIT       = 0,
+	ICE_RX_DTYPE_HEADER_SPLIT   = 1,
+	ICE_RX_DTYPE_SPLIT_ALWAYS   = 2,
+};
+
 struct ice_rx_queue {
 	struct rte_mempool *mp; /* mbuf pool to populate RX ring */
 	volatile union ice_rx_flex_desc *rx_ring;/* RX ring virtual address */
@@ -99,6 +113,8 @@ struct ice_rx_queue {
 	uint32_t hw_time_high; /* high 32 bits of timestamp */
 	uint32_t hw_time_low; /* low 32 bits of timestamp */
 	uint64_t hw_time_update; /* SW time of HW record updating */
+	struct rte_eth_rxseg_split rxseg[ICE_RX_MAX_NSEG];
+	uint32_t rxseg_nb;
 };
 
 struct ice_tx_entry {
diff --git a/drivers/net/ice/ice_rxtx_vec_common.h b/drivers/net/ice/ice_rxtx_vec_common.h
index 2dd2d83650..eec6ea2134 100644
--- a/drivers/net/ice/ice_rxtx_vec_common.h
+++ b/drivers/net/ice/ice_rxtx_vec_common.h
@@ -291,6 +291,9 @@ ice_rx_vec_queue_default(struct ice_rx_queue *rxq)
 	if (rxq->offloads & RTE_ETH_RX_OFFLOAD_TIMESTAMP)
 		return -1;
 
+	if (rxq->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT)
+		return -1;
+
 	if (rxq->offloads & ICE_RX_VECTOR_OFFLOAD)
 		return ICE_VECTOR_OFFLOAD_PATH;
 
-- 
2.25.1


^ permalink raw reply	[flat|nested] 72+ messages in thread

* RE: [PATCH v6 4/4] net/ice: support buffer split in Rx path
  2022-09-29 18:59   ` [PATCH v6 4/4] net/ice: support buffer split in Rx path Yuan Wang
@ 2022-09-30  6:45     ` Tang, Yaqi
  0 siblings, 0 replies; 72+ messages in thread
From: Tang, Yaqi @ 2022-09-30  6:45 UTC (permalink / raw)
  To: Wang, YuanX, dev, Yang, Qiming, Zhang, Qi Z
  Cc: thomas, andrew.rybchenko, ferruh.yigit, mdr, Li, Xiaoyun, Singh,
	Aman Deep, Zhang, Yuying, jerinjacobk, viacheslavo, stephen,
	Ding, Xuan, hpothula, Wenxuan Wu


> -----Original Message-----
> From: Wang, YuanX <yuanx.wang@intel.com>
> Sent: Friday, September 30, 2022 2:59 AM
> To: dev@dpdk.org; Yang, Qiming <qiming.yang@intel.com>; Zhang, Qi Z
> <qi.z.zhang@intel.com>
> Cc: thomas@monjalon.net; andrew.rybchenko@oktetlabs.ru;
> ferruh.yigit@xilinx.com; mdr@ashroe.eu; Li, Xiaoyun <xiaoyun.li@intel.com>;
> Singh, Aman Deep <aman.deep.singh@intel.com>; Zhang, Yuying
> <yuying.zhang@intel.com>; jerinjacobk@gmail.com;
> viacheslavo@nvidia.com; stephen@networkplumber.org; Ding, Xuan
> <xuan.ding@intel.com>; hpothula@marvell.com; Tang, Yaqi
> <yaqi.tang@intel.com>; Wang, YuanX <yuanx.wang@intel.com>; Wenxuan
> Wu <wenxuanx.wu@intel.com>
> Subject: [PATCH v6 4/4] net/ice: support buffer split in Rx path
> 
> This patch adds support for protocol based buffer split in normal Rx data
> paths. When the Rx queue is configured with specific protocol type, packets
> received will be directly split into protocol header and payload parts
> limitation of pmd. And the two parts will be put into different mempools.
> 
> Currently, protocol based buffer split is not supported in vectorized paths.
> 
> A new api ice_buffer_split_supported_hdr_ptypes_get() has been introduced,
> it will return the supported header protocols of ice PMD to app for splitting.
> 
> Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
> Signed-off-by: Xuan Ding <xuan.ding@intel.com>
> Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
> ---

Tested-by: Yaqi Tang <yaqi.tang@intel.com>

^ permalink raw reply	[flat|nested] 72+ messages in thread

* [PATCH v7 0/4] support protocol based buffer split
  2022-08-12 18:15 [PATCH 0/4] support protocol based buffer split Yuan Wang
                   ` (12 preceding siblings ...)
  2022-09-29 18:59 ` [PATCH v6 0/4] support protocol based buffer split Yuan Wang
@ 2022-10-01 21:05 ` Yuan Wang
  2022-10-01 21:05   ` [PATCH v7 1/4] ethdev: introduce protocol header API Yuan Wang
                     ` (3 more replies)
  2022-10-05 23:18 ` [PATCH v8 0/4] support protocol based buffer split Yuan Wang
  2022-10-09 20:25 ` [PATCH v9 " Yuan Wang
  15 siblings, 4 replies; 72+ messages in thread
From: Yuan Wang @ 2022-10-01 21:05 UTC (permalink / raw)
  To: dev
  Cc: thomas, andrew.rybchenko, ferruh.yigit, mdr, xiaoyun.li,
	aman.deep.singh, yuying.zhang, qi.z.zhang, qiming.yang,
	jerinjacobk, viacheslavo, stephen, xuan.ding, hpothula,
	yaqi.tang, Yuan Wang

Protocol type based buffer split consists of splitting a received packet
into several separate segments based on the packet content. It is useful
in some scenarios, such as GPU acceleration. The splitting will help to
enable true zero copy and hence improve the performance significantly.

This patchset aims to support protocol header split based on current buffer
split. When Rx queue is configured with RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT
offload and corresponding protocol, packets received will be directly split
into different mempools.
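
To illustrate the intended usage from the application side, a minimal sketch
(not part of this series) is shown below. The port/queue numbers, the ring
size and the two mempools hdr_pool/pay_pool are placeholders, error handling
is trimmed, and the chosen proto_hdr must of course be among those reported
by the PMD via the new query API.

#include <string.h>
#include <rte_ethdev.h>
#include <rte_mbuf.h>

/* Sketch: split after the Ether/IPv4/UDP headers; the headers land in
 * hdr_pool, the payload in pay_pool. Both pools are assumed to have been
 * created with rte_pktmbuf_pool_create() and the port to be configured.
 */
static int
setup_hdr_split_queue(uint16_t port_id, uint16_t queue_id,
		      struct rte_mempool *hdr_pool,
		      struct rte_mempool *pay_pool)
{
	struct rte_eth_dev_info dev_info;
	struct rte_eth_rxconf rxconf;
	union rte_eth_rxseg rx_seg[2];
	int ret;

	ret = rte_eth_dev_info_get(port_id, &dev_info);
	if (ret != 0)
		return ret;

	memset(rx_seg, 0, sizeof(rx_seg));
	/* Segment 0: everything up to and including the UDP header. */
	rx_seg[0].split.mp = hdr_pool;
	rx_seg[0].split.proto_hdr = RTE_PTYPE_L2_ETHER |
				    RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
				    RTE_PTYPE_L4_UDP;
	/* Segment 1: the remaining payload. */
	rx_seg[1].split.mp = pay_pool;
	rx_seg[1].split.proto_hdr = RTE_PTYPE_ALL_MASK;

	rxconf = dev_info.default_rxconf;
	rxconf.offloads |= RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT;
	rxconf.rx_nseg = RTE_DIM(rx_seg);
	rxconf.rx_seg = rx_seg;

	/* The mb_pool argument is NULL: the pools come from rx_seg instead. */
	return rte_eth_rx_queue_setup(port_id, queue_id, 1024,
				      rte_eth_dev_socket_id(port_id),
				      &rxconf, NULL);
}
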

Change log:
v7:
ice: Fix CI issue.

v6:
ice: Fix proto_hdr mappings to NIC configuration.

v5:
Define proto_hdr to use mask instead of single protocol type.
Define PMD to return protocol header mask.
Refine the doc and commit log.
Remove deprecated RTE_FUNC_PTR_OR_ERR_RET.

v4:
Change proto_hdr to a bit mask of RTE_PTYPE_*.
Add the description of how to put the unsplit packets.
Use proto_hdr to determine whether to use protocol based split.

v3:
Fix mail thread.

v2:
Add mbuf dump to the driver's buffer split path.
Add buffer split to the driver feature list.
Remove unsupported header protocols from the driver.

Yuan Wang (4):
  ethdev: introduce protocol header API
  ethdev: introduce protocol hdr based buffer split
  app/testpmd: add rxhdrs commands and parameters
  net/ice: support buffer split in Rx path

 app/test-pmd/cmdline.c                 | 146 +++++++++++++-
 app/test-pmd/config.c                  |  88 +++++++++
 app/test-pmd/parameters.c              |  16 +-
 app/test-pmd/testpmd.c                 |   2 +
 app/test-pmd/testpmd.h                 |   6 +
 doc/guides/rel_notes/release_22_11.rst |  16 ++
 drivers/net/ice/ice_ethdev.c           |  55 +++++-
 drivers/net/ice/ice_rxtx.c             | 259 ++++++++++++++++++++++---
 drivers/net/ice/ice_rxtx.h             |  16 ++
 drivers/net/ice/ice_rxtx_vec_common.h  |   3 +
 lib/ethdev/ethdev_driver.h             |  15 ++
 lib/ethdev/rte_ethdev.c                | 107 ++++++++--
 lib/ethdev/rte_ethdev.h                |  59 +++++-
 lib/ethdev/version.map                 |   3 +
 14 files changed, 743 insertions(+), 48 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 72+ messages in thread

* [PATCH v7 1/4] ethdev: introduce protocol header API
  2022-10-01 21:05 ` [PATCH v7 0/4] support protocol based buffer split Yuan Wang
@ 2022-10-01 21:05   ` Yuan Wang
  2022-10-03  7:04     ` Andrew Rybchenko
  2022-10-01 21:05   ` [PATCH v7 2/4] ethdev: introduce protocol hdr based buffer split Yuan Wang
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 72+ messages in thread
From: Yuan Wang @ 2022-10-01 21:05 UTC (permalink / raw)
  To: dev, Thomas Monjalon, Ferruh Yigit, Andrew Rybchenko, Ray Kinsella
  Cc: ferruh.yigit, xiaoyun.li, aman.deep.singh, yuying.zhang,
	qi.z.zhang, qiming.yang, jerinjacobk, viacheslavo, stephen,
	xuan.ding, hpothula, yaqi.tang, Yuan Wang, Wenxuan Wu

Add a new ethdev API to retrieve supported protocol headers
of a PMD, which helps to configure protocol header based buffer split.
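
For illustration only (not part of the patch), an application could consume
the new API with the documented two-call pattern, first calling with a NULL
array to size the buffer; rte_get_ptype_name() is used here just for
printing.

#include <stdio.h>
#include <stdlib.h>
#include <rte_ethdev.h>
#include <rte_mbuf.h>

/* Sketch: print every header ptype a port can split on. */
static void
dump_split_hdr_ptypes(uint16_t port_id)
{
	char name[128];
	uint32_t *ptypes;
	int cnt, i;

	/* First call sizes the array, second call fills it. */
	cnt = rte_eth_buffer_split_get_supported_hdr_ptypes(port_id, NULL, 0);
	if (cnt <= 0) {
		printf("port %u: no buffer split header ptypes (%d)\n",
		       port_id, cnt);
		return;
	}

	ptypes = malloc(sizeof(*ptypes) * cnt);
	if (ptypes == NULL)
		return;

	cnt = rte_eth_buffer_split_get_supported_hdr_ptypes(port_id, ptypes, cnt);
	for (i = 0; i < cnt; i++) {
		rte_get_ptype_name(ptypes[i], name, sizeof(name));
		printf("  0x%08x: %s\n", ptypes[i], name);
	}

	free(ptypes);
}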

Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
---
 doc/guides/rel_notes/release_22_11.rst |  5 ++++
 lib/ethdev/ethdev_driver.h             | 15 ++++++++++++
 lib/ethdev/rte_ethdev.c                | 33 ++++++++++++++++++++++++++
 lib/ethdev/rte_ethdev.h                | 30 +++++++++++++++++++++++
 lib/ethdev/version.map                 |  3 +++
 5 files changed, 86 insertions(+)

diff --git a/doc/guides/rel_notes/release_22_11.rst b/doc/guides/rel_notes/release_22_11.rst
index 0231959874..6a7474a3d6 100644
--- a/doc/guides/rel_notes/release_22_11.rst
+++ b/doc/guides/rel_notes/release_22_11.rst
@@ -96,6 +96,11 @@ New Features
   * Added ``rte_event_eth_tx_adapter_queue_stop`` to stop the Tx Adapter
     from enqueueing any packets to the Tx queue.
 
+* **Added new ethdev API for PMD to get buffer split supported protocol types.**
+
+  * Added ``rte_eth_buffer_split_get_supported_hdr_ptypes()``, to get supported
+    header protocols of a PMD to split.
+
 
 Removed Items
 -------------
diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
index 8cd8eb8685..791b264610 100644
--- a/lib/ethdev/ethdev_driver.h
+++ b/lib/ethdev/ethdev_driver.h
@@ -1055,6 +1055,18 @@ typedef int (*eth_ip_reassembly_conf_get_t)(struct rte_eth_dev *dev,
 typedef int (*eth_ip_reassembly_conf_set_t)(struct rte_eth_dev *dev,
 		const struct rte_eth_ip_reassembly_params *conf);
 
+/**
+ * @internal
+ * Get supported header protocols of a PMD to split.
+ *
+ * @param dev
+ *   Ethdev handle of port.
+ *
+ * @return
+ *   An array pointer to store supported protocol headers.
+ */
+typedef const uint32_t *(*eth_buffer_split_supported_hdr_ptypes_get_t)(struct rte_eth_dev *dev);
+
 /**
  * @internal
  * Dump private info from device to a file.
@@ -1302,6 +1314,9 @@ struct eth_dev_ops {
 	/** Set IP reassembly configuration */
 	eth_ip_reassembly_conf_set_t ip_reassembly_conf_set;
 
+	/** Get supported header ptypes to split */
+	eth_buffer_split_supported_hdr_ptypes_get_t buffer_split_supported_hdr_ptypes_get;
+
 	/** Dump private info from device */
 	eth_dev_priv_dump_t eth_dev_priv_dump;
 
diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
index 0c2c1088c0..1f0a7f8f3f 100644
--- a/lib/ethdev/rte_ethdev.c
+++ b/lib/ethdev/rte_ethdev.c
@@ -6002,6 +6002,39 @@ rte_eth_dev_priv_dump(uint16_t port_id, FILE *file)
 	return eth_err(port_id, (*dev->dev_ops->eth_dev_priv_dump)(dev, file));
 }
 
+int
+rte_eth_buffer_split_get_supported_hdr_ptypes(uint16_t port_id, uint32_t *ptypes, int num)
+{
+	int i, j;
+	struct rte_eth_dev *dev;
+	const uint32_t *all_types;
+
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
+	dev = &rte_eth_devices[port_id];
+
+	if (ptypes == NULL && num > 0) {
+		RTE_ETHDEV_LOG(ERR,
+			"Cannot get ethdev port %u supported header protocol types to NULL when array size is non zero\n",
+			port_id);
+		return -EINVAL;
+	}
+
+	if (*dev->dev_ops->buffer_split_supported_hdr_ptypes_get == NULL)
+		return -ENOTSUP;
+	all_types = (*dev->dev_ops->buffer_split_supported_hdr_ptypes_get)(dev);
+
+	if (!all_types)
+		return 0;
+
+	for (i = 0, j = 0; all_types[i] != RTE_PTYPE_UNKNOWN; ++i) {
+		if (j < num)
+			ptypes[j] = all_types[i];
+		j++;
+	}
+
+	return j;
+}
+
 RTE_LOG_REGISTER_DEFAULT(rte_eth_dev_logtype, INFO);
 
 RTE_INIT(ethdev_init_telemetry)
diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
index 12535c703e..cf14e04010 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -6017,6 +6017,36 @@ rte_eth_tx_buffer(uint16_t port_id, uint16_t queue_id,
 	return rte_eth_tx_buffer_flush(port_id, queue_id, buffer);
 }
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Get supported header protocols to split on Rx.
+ *
+ * When a packet type is announced to be split, it *must* be supported by
+ * the PMD. For instance, if eth-ipv4, eth-ipv4-udp is announced, the PMD must
+ * return the following packet types for these packets:
+ * - Ether/IPv4             -> RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4
+ * - Ether/IPv4/UDP         -> RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4 | RTE_PTYPE_L4_UDP
+ *
+ * @param port_id
+ *   The port identifier of the device.
+ * @param[out] ptypes
+ *   An array pointer to store supported protocol headers, allocated by caller.
+ *   These ptypes are composed with RTE_PTYPE_*.
+ * @param num
+ *   Size of the array pointed by param ptypes.
+ * @return
+ *   - (>=0) Number of supported ptypes. If the number of types exceeds num,
+ *           only num entries will be filled into the ptypes array, but the full
+ *           count of supported ptypes will be returned.
+ *   - (-ENOTSUP) if header protocol is not supported by device.
+ *   - (-ENODEV) if *port_id* invalid.
+ *   - (-EINVAL) if bad parameter.
+ */
+__rte_experimental
+int rte_eth_buffer_split_get_supported_hdr_ptypes(uint16_t port_id, uint32_t *ptypes, int num);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/ethdev/version.map b/lib/ethdev/version.map
index 25e54f9d3e..50f5814a48 100644
--- a/lib/ethdev/version.map
+++ b/lib/ethdev/version.map
@@ -285,6 +285,9 @@ EXPERIMENTAL {
 	rte_mtr_color_in_protocol_priority_get;
 	rte_mtr_color_in_protocol_set;
 	rte_mtr_meter_vlan_table_update;
+
+	# added in 22.11
+	rte_eth_buffer_split_get_supported_hdr_ptypes;
 };
 
 INTERNAL {
-- 
2.25.1


^ permalink raw reply	[flat|nested] 72+ messages in thread

* [PATCH v7 2/4] ethdev: introduce protocol hdr based buffer split
  2022-10-01 21:05 ` [PATCH v7 0/4] support protocol based buffer split Yuan Wang
  2022-10-01 21:05   ` [PATCH v7 1/4] ethdev: introduce protocol header API Yuan Wang
@ 2022-10-01 21:05   ` Yuan Wang
  2022-10-02  4:01     ` Wang, YuanX
  2022-10-03  7:47     ` Andrew Rybchenko
  2022-10-01 21:05   ` [PATCH v7 3/4] app/testpmd: add rxhdrs commands and parameters Yuan Wang
  2022-10-01 21:05   ` [PATCH v7 4/4] net/ice: support buffer split in Rx path Yuan Wang
  3 siblings, 2 replies; 72+ messages in thread
From: Yuan Wang @ 2022-10-01 21:05 UTC (permalink / raw)
  To: dev, Thomas Monjalon, Ferruh Yigit, Andrew Rybchenko
  Cc: ferruh.yigit, mdr, xiaoyun.li, aman.deep.singh, yuying.zhang,
	qi.z.zhang, qiming.yang, jerinjacobk, viacheslavo, stephen,
	xuan.ding, hpothula, yaqi.tang, Yuan Wang, Wenxuan Wu

Currently, Rx buffer split supports length based split. With Rx queue
offload RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT enabled and Rx packet segment
configured, PMD will be able to split the received packets into
multiple segments.

However, length based buffer split is not suitable for NICs that do split
based on protocol headers. Given an arbitrarily variable length in the Rx
packet segment, it is almost impossible to pass a fixed protocol header to
the driver. Besides, the existence of tunneling means the composition of a
packet varies, which makes the situation even worse.

This patch extends current buffer split to support protocol header based
buffer split. A new proto_hdr field is introduced in the reserved field
of rte_eth_rxseg_split structure to specify protocol header. The proto_hdr
field defines the split position of packet, splitting will always happen
after the protocol header defined in the Rx packet segment. When Rx queue
offload RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT is enabled and corresponding
protocol header is configured, driver will split the ingress packets into
multiple segments.

Examples of proto_hdr field definitions:
To split after ETH-IPV4-UDP, it should be defined as
RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_L4_UDP

For inner ETH-IPV4-UDP, it should be defined as
RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER |
RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_UDP

struct rte_eth_rxseg_split {
        struct rte_mempool *mp; /* memory pools to allocate segment from */
        uint16_t length; /* segment maximal data length,
                            configures split point */
        uint16_t offset; /* data offset from beginning
                            of mbuf data buffer */
        /**
         * Proto_hdr defines a bit mask of the protocol sequence as
         * RTE_PTYPE_*, configures split point. The last RTE_PTYPE*
         * in the mask indicates the split position.
         * For non-tunneling packets, the complete protocol sequence
         * should be defined.
         * For tunneling packets, for simplicity, only the tunnel and
         * inner protocol sequence should be defined.
         */
        uint32_t proto_hdr;
};

If protocol header split can be supported by a PMD, the
rte_eth_buffer_split_get_supported_hdr_ptypes function can
be used to obtain a list of these protocol headers.

For example, let's suppose we configured the Rx queue with the
following segments:
        seg0 - pool0, proto_hdr0=RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4,
               off0=2B
        seg1 - pool1, proto_hdr1=RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4
               | RTE_PTYPE_L4_UDP, off1=128B
        seg2 - pool2, off2=0B

A packet consisting of ETH_IPV4_UDP_PAYLOAD will be split as
follows:
        seg0 - ipv4 header @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from pool0
        seg1 - udp header @ 128 in mbuf from pool1
        seg2 - payload @ 0 in mbuf from pool2

Note: the NIC will only do the split when the packets exactly match all the
protocol headers in the segments. For example, if ARP packets are received
with the above config, the NIC won't split them, since they do not contain
an IPv4 header and a UDP header. These packets will be put into the last
valid mempool, with zero offset.

Now buffer split can be configured in two modes. For length based
buffer split, the mp, length and offset fields in the Rx packet segment
should be configured, while the proto_hdr field will be ignored.
For protocol header based buffer split, the mp, offset and proto_hdr
fields in the Rx packet segment should be configured, while the length
field will be ignored.

The split limitations imposed by the underlying driver are reported in the
rte_eth_dev_info->rx_seg_capa field. The memory attributes for the split
parts may also differ, e.g. DPDK memory for one part and external memory
for the other.
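
As a sketch only (not part of the patch), the example above could be
expressed with the following segment descriptors before calling
rte_eth_rx_queue_setup(). pool0, pool1 and pool2 are assumed mempools, and
the payload element uses RTE_PTYPE_ALL_MASK, mirroring the testpmd usage
later in this series.

/* Sketch of the three segment descriptors for the example above. */
union rte_eth_rxseg rx_useg[3];

memset(rx_useg, 0, sizeof(rx_useg));
rx_useg[0].split.mp = pool0;			/* seg0: Ether + IPv4 headers */
rx_useg[0].split.offset = 2;
rx_useg[0].split.proto_hdr = RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4;

rx_useg[1].split.mp = pool1;			/* seg1: UDP header */
rx_useg[1].split.offset = 128;
rx_useg[1].split.proto_hdr = RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4 |
			     RTE_PTYPE_L4_UDP;

rx_useg[2].split.mp = pool2;			/* seg2: payload */
rx_useg[2].split.proto_hdr = RTE_PTYPE_ALL_MASK;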

Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
---
 doc/guides/rel_notes/release_22_11.rst |  7 +++
 lib/ethdev/rte_ethdev.c                | 74 ++++++++++++++++++++++----
 lib/ethdev/rte_ethdev.h                | 29 +++++++++-
 3 files changed, 98 insertions(+), 12 deletions(-)

diff --git a/doc/guides/rel_notes/release_22_11.rst b/doc/guides/rel_notes/release_22_11.rst
index 6a7474a3d6..510869c73a 100644
--- a/doc/guides/rel_notes/release_22_11.rst
+++ b/doc/guides/rel_notes/release_22_11.rst
@@ -101,6 +101,13 @@ New Features
   * Added ``rte_eth_buffer_split_get_supported_hdr_ptypes()``, to get supported
     header protocols of a PMD to split.
 
+* **Added protocol header based buffer split.**
+
+  * Ethdev: The ``reserved`` field in the ``rte_eth_rxseg_split`` structure is
+    replaced with ``proto_hdr`` to support protocol header based buffer split.
+    User can choose length or protocol header to configure buffer split
+    according to NIC's capability.
+
 
 Removed Items
 -------------
diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
index 1f0a7f8f3f..27ec19faed 100644
--- a/lib/ethdev/rte_ethdev.c
+++ b/lib/ethdev/rte_ethdev.c
@@ -1649,9 +1649,10 @@ rte_eth_dev_is_removed(uint16_t port_id)
 }
 
 static int
-rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
-			     uint16_t n_seg, uint32_t *mbp_buf_size,
-			     const struct rte_eth_dev_info *dev_info)
+rte_eth_rx_queue_check_split(uint16_t port_id,
+			const struct rte_eth_rxseg_split *rx_seg,
+			uint16_t n_seg, uint32_t *mbp_buf_size,
+			const struct rte_eth_dev_info *dev_info)
 {
 	const struct rte_eth_rxseg_capa *seg_capa = &dev_info->rx_seg_capa;
 	struct rte_mempool *mp_first;
@@ -1674,6 +1675,7 @@ rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
 		struct rte_mempool *mpl = rx_seg[seg_idx].mp;
 		uint32_t length = rx_seg[seg_idx].length;
 		uint32_t offset = rx_seg[seg_idx].offset;
+		uint32_t proto_hdr = rx_seg[seg_idx].proto_hdr;
 
 		if (mpl == NULL) {
 			RTE_ETHDEV_LOG(ERR, "null mempool pointer\n");
@@ -1707,13 +1709,63 @@ rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
 		}
 		offset += seg_idx != 0 ? 0 : RTE_PKTMBUF_HEADROOM;
 		*mbp_buf_size = rte_pktmbuf_data_room_size(mpl);
-		length = length != 0 ? length : *mbp_buf_size;
-		if (*mbp_buf_size < length + offset) {
-			RTE_ETHDEV_LOG(ERR,
-				       "%s mbuf_data_room_size %u < %u (segment length=%u + segment offset=%u)\n",
-				       mpl->name, *mbp_buf_size,
-				       length + offset, length, offset);
-			return -EINVAL;
+
+		if (proto_hdr > 0) {
+			/* Split based on protocol headers. */
+
+			/* skip the payload */
+			if (proto_hdr == RTE_PTYPE_ALL_MASK)
+				continue;
+
+			int ptype_cnt;
+
+			ptype_cnt = rte_eth_buffer_split_get_supported_hdr_ptypes(port_id, NULL, 0);
+			if (ptype_cnt <= 0) {
+				RTE_ETHDEV_LOG(ERR,
+					"Port %u failed to get supported buffer split header protocols\n",
+					port_id);
+				return -EINVAL;
+			}
+
+			uint32_t ptypes[ptype_cnt];
+			int i;
+
+			ptype_cnt = rte_eth_buffer_split_get_supported_hdr_ptypes(port_id,
+										ptypes, ptype_cnt);
+			if (ptype_cnt < 0) {
+				RTE_ETHDEV_LOG(ERR,
+					"Port %u failed to get supported buffer split header protocols\n",
+					port_id);
+				return -EINVAL;
+			}
+
+			for (i = 0; i < ptype_cnt; i++)
+				if (ptypes[i] == proto_hdr)
+					break;
+			if (i == ptype_cnt) {
+				RTE_ETHDEV_LOG(ERR,
+					"Requested Rx split header protocols 0x%x is not supported.\n",
+					proto_hdr);
+				return -EINVAL;
+			}
+
+			if (*mbp_buf_size < offset) {
+				RTE_ETHDEV_LOG(ERR,
+						"%s mbuf_data_room_size %u < %u (segment offset)\n",
+						mpl->name, *mbp_buf_size,
+						offset);
+				return -EINVAL;
+			}
+		} else {
+			/* Split at fixed length. */
+			length = length != 0 ? length : *mbp_buf_size;
+			if (*mbp_buf_size < length + offset) {
+				RTE_ETHDEV_LOG(ERR,
+					"%s mbuf_data_room_size %u < %u (segment length=%u + segment offset=%u)\n",
+					mpl->name, *mbp_buf_size,
+					length + offset, length, offset);
+				return -EINVAL;
+			}
 		}
 	}
 	return 0;
@@ -1793,7 +1845,7 @@ rte_eth_rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
 		n_seg = rx_conf->rx_nseg;
 
 		if (rx_conf->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) {
-			ret = rte_eth_rx_queue_check_split(rx_seg, n_seg,
+			ret = rte_eth_rx_queue_check_split(port_id, rx_seg, n_seg,
 							   &mbp_buf_size,
 							   &dev_info);
 			if (ret != 0)
diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
index cf14e04010..a5f9647bd3 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -994,6 +994,9 @@ struct rte_eth_txmode {
  *   specified in the first array element, the second buffer, from the
  *   pool in the second element, and so on.
  *
+ * - The proto_hdrs in the elements define the split position of
+ *   received packets.
+ *
  * - The offsets from the segment description elements specify
  *   the data offset from the buffer beginning except the first mbuf.
  *   The first segment offset is added with RTE_PKTMBUF_HEADROOM.
@@ -1015,12 +1018,36 @@ struct rte_eth_txmode {
  *     - pool from the last valid element
  *     - the buffer size from this pool
  *     - zero offset
+ *
+ * - Length based buffer split:
+ *     - mp, length, offset should be configured.
+ *     - The proto_hdr field will be ignored.
+ *
+ * - Protocol header based buffer split:
+ *     - mp, offset, proto_hdr should be configured.
+ *     - The length field will be ignored.
+ *
+ * - For Protocol header based buffer split, if the received packets
+ *   don't exactly match all protocol headers in the elements, packets
+ *   will not be split.
+ *   These packets will be put into:
+ *     - pool from the last valid element
+ *     - the buffer size from this pool
+ *     - zero offset
  */
 struct rte_eth_rxseg_split {
 	struct rte_mempool *mp; /**< Memory pool to allocate segment from. */
 	uint16_t length; /**< Segment data length, configures split point. */
 	uint16_t offset; /**< Data offset from beginning of mbuf data buffer. */
-	uint32_t reserved; /**< Reserved field. */
+	/**
+	 * Proto_hdr defines a bit mask of the protocol sequence as RTE_PTYPE_*,
+	 * configures split point. The last RTE_PTYPE* in the mask indicates the
+	 * split position.
+	 * For non-tunneling packets, the complete protocol sequence should be defined.
+	 * For tunneling packets, for simplicity, only the tunnel and inner
+	 * protocol sequence should be defined.
+	 */
+	uint32_t proto_hdr;
 };
 
 /**
-- 
2.25.1


^ permalink raw reply	[flat|nested] 72+ messages in thread

* [PATCH v7 3/4] app/testpmd: add rxhdrs commands and parameters
  2022-10-01 21:05 ` [PATCH v7 0/4] support protocol based buffer split Yuan Wang
  2022-10-01 21:05   ` [PATCH v7 1/4] ethdev: introduce protocol header API Yuan Wang
  2022-10-01 21:05   ` [PATCH v7 2/4] ethdev: introduce protocol hdr based buffer split Yuan Wang
@ 2022-10-01 21:05   ` Yuan Wang
  2022-10-01 21:05   ` [PATCH v7 4/4] net/ice: support buffer split in Rx path Yuan Wang
  3 siblings, 0 replies; 72+ messages in thread
From: Yuan Wang @ 2022-10-01 21:05 UTC (permalink / raw)
  To: dev, Aman Singh, Yuying Zhang
  Cc: thomas, andrew.rybchenko, ferruh.yigit, mdr, xiaoyun.li,
	qi.z.zhang, qiming.yang, jerinjacobk, viacheslavo, stephen,
	xuan.ding, hpothula, yaqi.tang, Yuan Wang, Wenxuan Wu

Add command line parameter:
--rxhdrs=eth[,eth-ipv4,eth-ipv4-udp]

Set the protocol_hdr of segments to scatter packets on receiving if
the split feature is engaged. Only the queues configured with the
BUFFER_SPLIT offload are affected.

Add interactive mode command:
testpmd>set rxhdrs eth,eth-ipv4,eth-ipv4-udp
(protocol sequence should be valid)

The protocol split feature is off by default. To enable protocol split,
you need:
1. Start testpmd with multiple mempools. E.g. --mbuf-size=2048,2048
2. Configure Rx queue with rx_offload buffer split on.
3. Set the protocol type of buffer split. E.g. set rxhdrs eth,eth-ipv4
        (default protocols of testpmd : eth|eth-ipv4|eth-ipv6|
         eth-ipv4-tcp|eth-ipv6-tcp|eth-ipv4-udp|eth-ipv6-udp|
         eth-ipv4-sctp|eth-ipv6-sctp|grenat-eth|grenat-eth-ipv4|
         grenat-eth-ipv6|grenat-eth-ipv4-tcp|grenat-eth-ipv6-tcp|
         grenat-eth-ipv4-udp|grenat-eth-ipv6-udp|grenat-eth-ipv4-sctp|
         grenat-eth-ipv6-sctp)
The above protocols can be configured in testpmd, but the configuration
only takes effect when it is supported by the specific PMD.
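
Putting the three steps above together, an assumed interactive session could
look like the following; the exact spelling of the per-port offload toggle
command is from memory and may differ, it is shown only to illustrate the
flow:

dpdk-testpmd -l 0-2 -n 4 -- -i --mbuf-size=2048,2048
testpmd> port stop all
testpmd> port config all rx_offload buffer_split on
testpmd> set rxhdrs eth,eth-ipv4
testpmd> port start all
testpmd> start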

Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
---
 app/test-pmd/cmdline.c    | 146 +++++++++++++++++++++++++++++++++++++-
 app/test-pmd/config.c     |  88 +++++++++++++++++++++++
 app/test-pmd/parameters.c |  16 ++++-
 app/test-pmd/testpmd.c    |   2 +
 app/test-pmd/testpmd.h    |   6 ++
 5 files changed, 254 insertions(+), 4 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index ba749f588a..00c7d167ce 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -181,7 +181,7 @@ static void cmd_help_long_parsed(void *parsed_result,
 			"show (rxq|txq) info (port_id) (queue_id)\n"
 			"    Display information for configured RX/TX queue.\n\n"
 
-			"show config (rxtx|cores|fwd|rxoffs|rxpkts|txpkts)\n"
+			"show config (rxtx|cores|fwd|rxoffs|rxpkts|rxhdrs|txpkts)\n"
 			"    Display the given configuration.\n\n"
 
 			"read rxd (port_id) (queue_id) (rxd_id)\n"
@@ -305,6 +305,17 @@ static void cmd_help_long_parsed(void *parsed_result,
 			" Affects only the queues configured with split"
 			" offloads.\n\n"
 
+			"set rxhdrs (eth[,eth-ipv4])*\n"
+			"    Set the protocol hdr of each segment to scatter"
+			" packets on receiving if split feature is engaged."
+			" Affects only the queues configured with split"
+			" offloads.\n"
+			"    Supported values: eth|eth-ipv4|eth-ipv6|eth-ipv4-tcp|eth-ipv6-tcp|"
+			"eth-ipv4-udp|eth-ipv6-udp|eth-ipv4-sctp|eth-ipv6-sctp|"
+			"grenat-eth|grenat-eth-ipv4|grenat-eth-ipv6|grenat-eth-ipv4-tcp|"
+			"grenat-eth-ipv6-tcp|grenat-eth-ipv4-udp|grenat-eth-ipv6-udp|"
+			"grenat-eth-ipv4-sctp|grenat-eth-ipv6-sctp\n\n"
+
 			"set txpkts (x[,y]*)\n"
 			"    Set the length of each segment of TXONLY"
 			" and optionally CSUM packets.\n\n"
@@ -3366,6 +3377,88 @@ static cmdline_parse_inst_t cmd_stop = {
 	},
 };
 
+static unsigned int
+get_ptype(char *value)
+{
+	uint32_t protocol;
+
+	if (!strcmp(value, "eth"))
+		protocol = RTE_PTYPE_L2_ETHER;
+	else if (!strcmp(value, "eth-ipv4"))
+		protocol = RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN;
+	else if (!strcmp(value, "eth-ipv6"))
+		protocol = RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN;
+	else if (!strcmp(value, "eth-ipv4-tcp"))
+		protocol = RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_L4_TCP;
+	else if (!strcmp(value, "eth-ipv6-tcp"))
+		protocol = RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN | RTE_PTYPE_L4_TCP;
+	else if (!strcmp(value, "eth-ipv4-udp"))
+		protocol = RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_L4_UDP;
+	else if (!strcmp(value, "eth-ipv6-udp"))
+		protocol = RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN | RTE_PTYPE_L4_UDP;
+	else if (!strcmp(value, "eth-ipv4-sctp"))
+		protocol = RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_L4_SCTP;
+	else if (!strcmp(value, "eth-ipv6-sctp"))
+		protocol = RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN | RTE_PTYPE_L4_SCTP;
+	else if (!strcmp(value, "grenat-eth"))
+		protocol = RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER;
+	else if (!strcmp(value, "grenat-eth-ipv4"))
+		protocol = RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER |
+				RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN;
+	else if (!strcmp(value, "grenat-eth-ipv6"))
+		protocol = RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN;
+	else if (!strcmp(value, "grenat-eth-ipv4-tcp"))
+		protocol = RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_TCP;
+	else if (!strcmp(value, "grenat-eth-ipv6-tcp"))
+		protocol = RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_TCP;
+	else if (!strcmp(value, "grenat-eth-ipv4-udp"))
+		protocol = RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_UDP;
+	else if (!strcmp(value, "grenat-eth-ipv6-udp"))
+		protocol = RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_UDP;
+	else if (!strcmp(value, "grenat-eth-ipv4-sctp"))
+		protocol = RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_SCTP;
+	else if (!strcmp(value, "grenat-eth-ipv6-sctp"))
+		protocol = RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_SCTP;
+	else {
+		fprintf(stderr, "Unsupported protocol: %s\n", value);
+		protocol = RTE_PTYPE_UNKNOWN;
+	}
+
+	return protocol;
+}
+/* *** SET RXHDRSLIST *** */
+
+unsigned int
+parse_hdrs_list(const char *str, const char *item_name, unsigned int max_items,
+				unsigned int *parsed_items, int check_hdrs_sequence)
+{
+	unsigned int nb_item;
+	char *cur;
+	char *tmp;
+
+	nb_item = 0;
+	char *str2 = strdup(str);
+	cur = strtok_r(str2, ",", &tmp);
+	while (cur != NULL) {
+		parsed_items[nb_item] = get_ptype(cur);
+		cur = strtok_r(NULL, ",", &tmp);
+		nb_item++;
+	}
+	if (nb_item > max_items)
+		fprintf(stderr, "Number of %s = %u > %u (maximum items)\n",
+			item_name, nb_item + 1, max_items);
+	free(str2);
+	if (!check_hdrs_sequence)
+		return nb_item;
+	return nb_item;
+}
 /* *** SET CORELIST and PORTLIST CONFIGURATION *** */
 
 unsigned int
@@ -3735,6 +3828,50 @@ static cmdline_parse_inst_t cmd_set_rxpkts = {
 	},
 };
 
+/* *** SET SEGMENT HEADERS OF RX PACKETS SPLIT *** */
+struct cmd_set_rxhdrs_result {
+	cmdline_fixed_string_t set;
+	cmdline_fixed_string_t rxhdrs;
+	cmdline_fixed_string_t values;
+};
+
+static void
+cmd_set_rxhdrs_parsed(void *parsed_result,
+		      __rte_unused struct cmdline *cl,
+		      __rte_unused void *data)
+{
+	struct cmd_set_rxhdrs_result *res;
+	unsigned int seg_hdrs[MAX_SEGS_BUFFER_SPLIT];
+	unsigned int nb_segs;
+
+	res = parsed_result;
+	nb_segs = parse_hdrs_list(res->values, "segment hdrs",
+				  MAX_SEGS_BUFFER_SPLIT, seg_hdrs, 0);
+	if (nb_segs > 0)
+		set_rx_pkt_hdrs(seg_hdrs, nb_segs);
+	cmd_reconfig_device_queue(RTE_PORT_ALL, 0, 1);
+}
+static cmdline_parse_token_string_t cmd_set_rxhdrs_set =
+	TOKEN_STRING_INITIALIZER(struct cmd_set_rxhdrs_result,
+				set, "set");
+static cmdline_parse_token_string_t cmd_set_rxhdrs_rxhdrs =
+	TOKEN_STRING_INITIALIZER(struct cmd_set_rxhdrs_result,
+				rxhdrs, "rxhdrs");
+static cmdline_parse_token_string_t cmd_set_rxhdrs_values =
+	TOKEN_STRING_INITIALIZER(struct cmd_set_rxhdrs_result,
+				values, NULL);
+
+static cmdline_parse_inst_t cmd_set_rxhdrs = {
+	.f = cmd_set_rxhdrs_parsed,
+	.data = NULL,
+	.help_str = "set rxhdrs <eth[,eth-ipv4]*>",
+	.tokens = {
+		(void *)&cmd_set_rxhdrs_set,
+		(void *)&cmd_set_rxhdrs_rxhdrs,
+		(void *)&cmd_set_rxhdrs_values,
+		NULL,
+	},
+};
 /* *** SET SEGMENT LENGTHS OF TXONLY PACKETS *** */
 
 struct cmd_set_txpkts_result {
@@ -6487,6 +6624,8 @@ static void cmd_showcfg_parsed(void *parsed_result,
 		show_rx_pkt_offsets();
 	else if (!strcmp(res->what, "rxpkts"))
 		show_rx_pkt_segments();
+	else if (!strcmp(res->what, "rxhdrs"))
+		show_rx_pkt_hdrs();
 	else if (!strcmp(res->what, "txpkts"))
 		show_tx_pkt_segments();
 	else if (!strcmp(res->what, "txtimes"))
@@ -6499,12 +6638,12 @@ static cmdline_parse_token_string_t cmd_showcfg_port =
 	TOKEN_STRING_INITIALIZER(struct cmd_showcfg_result, cfg, "config");
 static cmdline_parse_token_string_t cmd_showcfg_what =
 	TOKEN_STRING_INITIALIZER(struct cmd_showcfg_result, what,
-				 "rxtx#cores#fwd#rxoffs#rxpkts#txpkts#txtimes");
+				 "rxtx#cores#fwd#rxoffs#rxpkts#rxhdrs#txpkts#txtimes");
 
 static cmdline_parse_inst_t cmd_showcfg = {
 	.f = cmd_showcfg_parsed,
 	.data = NULL,
-	.help_str = "show config rxtx|cores|fwd|rxoffs|rxpkts|txpkts|txtimes",
+	.help_str = "show config rxtx|cores|fwd|rxoffs|rxpkts|rxhdrs|txpkts|txtimes",
 	.tokens = {
 		(void *)&cmd_showcfg_show,
 		(void *)&cmd_showcfg_port,
@@ -12455,6 +12594,7 @@ static cmdline_parse_ctx_t builtin_ctx[] = {
 	(cmdline_parse_inst_t *)&cmd_set_log,
 	(cmdline_parse_inst_t *)&cmd_set_rxoffs,
 	(cmdline_parse_inst_t *)&cmd_set_rxpkts,
+	(cmdline_parse_inst_t *)&cmd_set_rxhdrs,
 	(cmdline_parse_inst_t *)&cmd_set_txpkts,
 	(cmdline_parse_inst_t *)&cmd_set_txsplit,
 	(cmdline_parse_inst_t *)&cmd_set_txtimes,
diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index 568b0881d4..d3e95e40da 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -4746,6 +4746,94 @@ show_rx_pkt_segments(void)
 	}
 }
 
+static const char *get_ptype_str(uint32_t ptype)
+{
+	switch (ptype) {
+	case RTE_PTYPE_L2_ETHER:
+		return "eth";
+	case RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN:
+		return "eth-ipv4";
+	case RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN:
+		return "eth-ipv6";
+	case RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_L4_TCP:
+		return "eth-ipv4-tcp";
+	case RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN | RTE_PTYPE_L4_TCP:
+		return "eth-ipv6-tcp";
+	case RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_L4_UDP:
+		return "eth-ipv4-udp";
+	case RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN | RTE_PTYPE_L4_UDP:
+		return "eth-ipv6-udp";
+	case RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_L4_SCTP:
+		return "eth-ipv4-sctp";
+	case RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN | RTE_PTYPE_L4_SCTP:
+		return "eth-ipv6-sctp";
+	case RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER:
+		return "grenat-eth";
+	case RTE_PTYPE_TUNNEL_GRENAT|RTE_PTYPE_INNER_L2_ETHER|RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN:
+		return "grenat-eth-ipv4";
+	case RTE_PTYPE_TUNNEL_GRENAT|RTE_PTYPE_INNER_L2_ETHER|RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN:
+		return "grenat-eth-ipv6";
+	case RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER |
+		RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_TCP:
+		return "grenat-eth-ipv4-tcp";
+	case RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER |
+		RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_TCP:
+		return "grenat-eth-ipv6-tcp";
+	case RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER |
+		RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_UDP:
+		return "grenat-eth-ipv4-udp";
+	case RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER |
+		RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_UDP:
+		return "grenat-eth-ipv6-udp";
+	case RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER |
+		RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_SCTP:
+		return "grenat-eth-ipv4-sctp";
+	case RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER |
+		RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_SCTP:
+		return "grenat-eth-ipv6-sctp";
+	default:
+		return "unsupported";
+	}
+}
+
+void
+show_rx_pkt_hdrs(void)
+{
+	uint32_t i, n;
+
+	n = rx_pkt_nb_segs;
+	printf("Number of segments: %u\n", n);
+	if (n) {
+		printf("Packet segs: ");
+		for (i = 0; i < n - 1; i++)
+			printf("%s, ", get_ptype_str(rx_pkt_hdr_protos[i]));
+		printf("payload\n");
+	}
+}
+
+void
+set_rx_pkt_hdrs(unsigned int *seg_hdrs, unsigned int nb_segs)
+{
+	unsigned int i;
+
+	if (nb_segs + 1 > MAX_SEGS_BUFFER_SPLIT) {
+		printf("nb segments per RX packets=%u > "
+		       "MAX_SEGS_BUFFER_SPLIT - ignored\n", nb_segs + 1);
+		return;
+	}
+
+	memset(rx_pkt_hdr_protos, 0, sizeof(rx_pkt_hdr_protos));
+
+	for (i = 0; i < nb_segs; i++)
+		rx_pkt_hdr_protos[i] = (uint32_t)seg_hdrs[i];
+	rx_pkt_hdr_protos[nb_segs] = RTE_PTYPE_ALL_MASK;
+	/*
+	 * We calculate the number of hdrs, but payload is not included,
+	 * so rx_pkt_nb_segs would increase 1.
+	 */
+	rx_pkt_nb_segs = nb_segs + 1;
+}
+
 void
 set_rx_pkt_segments(unsigned int *seg_lengths, unsigned int nb_segs)
 {
diff --git a/app/test-pmd/parameters.c b/app/test-pmd/parameters.c
index 1024b5419c..5bf4219c46 100644
--- a/app/test-pmd/parameters.c
+++ b/app/test-pmd/parameters.c
@@ -152,6 +152,7 @@ usage(char* progname)
 	       " Used mainly with PCAP drivers.\n");
 	printf("  --rxoffs=X[,Y]*: set RX segment offsets for split.\n");
 	printf("  --rxpkts=X[,Y]*: set RX segment sizes to split.\n");
+	printf("  --rxhdrs=eth[,ipv4]*: set RX segment protocol to split.\n");
 	printf("  --txpkts=X[,Y]*: set TX segment sizes"
 		" or total packet length.\n");
 	printf("  --txonly-multi-flow: generate multiple flows in txonly mode\n");
@@ -660,6 +661,7 @@ launch_args_parse(int argc, char** argv)
 		{ "flow-isolate-all",	        0, 0, 0 },
 		{ "rxoffs",			1, 0, 0 },
 		{ "rxpkts",			1, 0, 0 },
+		{ "rxhdrs",			1, 0, 0 },
 		{ "txpkts",			1, 0, 0 },
 		{ "txonly-multi-flow",		0, 0, 0 },
 		{ "rxq-share",			2, 0, 0 },
@@ -1254,7 +1256,6 @@ launch_args_parse(int argc, char** argv)
 			if (!strcmp(lgopts[opt_idx].name, "rxpkts")) {
 				unsigned int seg_len[MAX_SEGS_BUFFER_SPLIT];
 				unsigned int nb_segs;
-
 				nb_segs = parse_item_list
 						(optarg, "rxpkt segments",
 						 MAX_SEGS_BUFFER_SPLIT,
@@ -1264,6 +1265,19 @@ launch_args_parse(int argc, char** argv)
 				else
 					rte_exit(EXIT_FAILURE, "bad rxpkts\n");
 			}
+			if (!strcmp(lgopts[opt_idx].name, "rxhdrs")) {
+				unsigned int seg_hdrs[MAX_SEGS_BUFFER_SPLIT];
+				unsigned int nb_segs;
+
+				nb_segs = parse_hdrs_list
+						(optarg, "rxpkt segments",
+						MAX_SEGS_BUFFER_SPLIT,
+						seg_hdrs, 0);
+				if (nb_segs > 0)
+					set_rx_pkt_hdrs(seg_hdrs, nb_segs);
+				else
+					rte_exit(EXIT_FAILURE, "bad rxpkts\n");
+			}
 			if (!strcmp(lgopts[opt_idx].name, "txpkts")) {
 				unsigned seg_lengths[RTE_MAX_SEGS_PER_PKT];
 				unsigned int nb_segs;
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index de6ad00138..bb2a969559 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -247,6 +247,7 @@ uint16_t rx_pkt_seg_lengths[MAX_SEGS_BUFFER_SPLIT];
 uint8_t  rx_pkt_nb_segs; /**< Number of segments to split */
 uint16_t rx_pkt_seg_offsets[MAX_SEGS_BUFFER_SPLIT];
 uint8_t  rx_pkt_nb_offs; /**< Number of specified offsets */
+uint32_t rx_pkt_hdr_protos[MAX_SEGS_BUFFER_SPLIT];
 
 /*
  * Configuration of packet segments used by the "txonly" processing engine.
@@ -2652,6 +2653,7 @@ rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
 		rx_seg->offset = i < rx_pkt_nb_offs ?
 				   rx_pkt_seg_offsets[i] : 0;
 		rx_seg->mp = mpx ? mpx : mp;
+		rx_seg->proto_hdr = rx_pkt_hdr_protos[i];
 	}
 	rx_conf->rx_nseg = rx_pkt_nb_segs;
 	rx_conf->rx_seg = rx_useg;
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index 21c5632aec..0e5e94423a 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -554,6 +554,7 @@ extern uint32_t max_rx_pkt_len;
  * Configuration of packet segments used to scatter received packets
  * if some of split features is configured.
  */
+extern uint32_t rx_pkt_hdr_protos[MAX_SEGS_BUFFER_SPLIT];
 extern uint16_t rx_pkt_seg_lengths[MAX_SEGS_BUFFER_SPLIT];
 extern uint8_t  rx_pkt_nb_segs; /**< Number of segments to split */
 extern uint16_t rx_pkt_seg_offsets[MAX_SEGS_BUFFER_SPLIT];
@@ -825,6 +826,9 @@ inc_tx_burst_stats(struct fwd_stream *fs, uint16_t nb_tx)
 unsigned int parse_item_list(const char *str, const char *item_name,
 			unsigned int max_items,
 			unsigned int *parsed_items, int check_unique_values);
+unsigned int parse_hdrs_list(const char *str, const char *item_name,
+			unsigned int max_item,
+			unsigned int *parsed_items, int check_unique_values);
 void launch_args_parse(int argc, char** argv);
 void cmd_reconfig_device_queue(portid_t id, uint8_t dev, uint8_t queue);
 void cmdline_read_from_file(const char *filename);
@@ -974,6 +978,8 @@ void set_record_core_cycles(uint8_t on_off);
 void set_record_burst_stats(uint8_t on_off);
 void set_verbose_level(uint16_t vb_level);
 void set_rx_pkt_segments(unsigned int *seg_lengths, unsigned int nb_segs);
+void set_rx_pkt_hdrs(unsigned int *seg_protos, unsigned int nb_segs);
+void show_rx_pkt_hdrs(void);
 void show_rx_pkt_segments(void);
 void set_rx_pkt_offsets(unsigned int *seg_offsets, unsigned int nb_offs);
 void show_rx_pkt_offsets(void);
-- 
2.25.1


^ permalink raw reply	[flat|nested] 72+ messages in thread

* [PATCH v7 4/4] net/ice: support buffer split in Rx path
  2022-10-01 21:05 ` [PATCH v7 0/4] support protocol based buffer split Yuan Wang
                     ` (2 preceding siblings ...)
  2022-10-01 21:05   ` [PATCH v7 3/4] app/testpmd: add rxhdrs commands and parameters Yuan Wang
@ 2022-10-01 21:05   ` Yuan Wang
  3 siblings, 0 replies; 72+ messages in thread
From: Yuan Wang @ 2022-10-01 21:05 UTC (permalink / raw)
  To: dev, Qiming Yang, Qi Zhang
  Cc: thomas, andrew.rybchenko, ferruh.yigit, mdr, xiaoyun.li,
	aman.deep.singh, yuying.zhang, jerinjacobk, viacheslavo, stephen,
	xuan.ding, hpothula, yaqi.tang, Yuan Wang, Wenxuan Wu

This patch adds support for protocol based buffer split in normal Rx
data paths. When the Rx queue is configured with a specific protocol type,
received packets will be directly split into protocol header and payload
parts, subject to the limitations of the PMD. The two parts will be
put into different mempools.

Currently, protocol based buffer split is not supported in vectorized
paths.

A new API, ice_buffer_split_supported_hdr_ptypes_get(), has been
introduced; it returns the header protocols that the ice PMD supports
for splitting to the application.
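
For reference, a minimal application-side capability check before enabling
the split on an ice port could look as follows. This is an illustrative
sketch only: the target ptype and the local array size are placeholders.

#include <stdbool.h>
#include <rte_ethdev.h>
#include <rte_mbuf.h>

/* Sketch: verify the port advertises buffer split, accepts at least two
 * segments, and supports splitting after the outer UDP header.
 */
static bool
port_supports_udp_hdr_split(uint16_t port_id)
{
	struct rte_eth_dev_info dev_info;
	uint32_t ptypes[32];
	const uint32_t want = RTE_PTYPE_L2_ETHER |
			      RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
			      RTE_PTYPE_L4_UDP;
	int cnt, i;

	if (rte_eth_dev_info_get(port_id, &dev_info) != 0)
		return false;
	if (!(dev_info.rx_offload_capa & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT))
		return false;
	if (dev_info.rx_seg_capa.max_nseg < 2)
		return false;

	cnt = rte_eth_buffer_split_get_supported_hdr_ptypes(port_id, ptypes,
							     RTE_DIM(ptypes));
	for (i = 0; i < cnt && i < (int)RTE_DIM(ptypes); i++)
		if (ptypes[i] == want)
			return true;

	return false;
}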

Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
Tested-by: Yaqi Tang <yaqi.tang@intel.com>
---
 doc/guides/rel_notes/release_22_11.rst |   4 +
 drivers/net/ice/ice_ethdev.c           |  55 +++++-
 drivers/net/ice/ice_rxtx.c             | 259 ++++++++++++++++++++++---
 drivers/net/ice/ice_rxtx.h             |  16 ++
 drivers/net/ice/ice_rxtx_vec_common.h  |   3 +
 5 files changed, 305 insertions(+), 32 deletions(-)

diff --git a/doc/guides/rel_notes/release_22_11.rst b/doc/guides/rel_notes/release_22_11.rst
index 510869c73a..3fa5377d96 100644
--- a/doc/guides/rel_notes/release_22_11.rst
+++ b/doc/guides/rel_notes/release_22_11.rst
@@ -108,6 +108,10 @@ New Features
     User can choose length or protocol header to configure buffer split
     according to NIC's capability.
 
+* **Updated Intel ice driver.**
+
+  * Added protocol based buffer split support in scalar path.
+
 
 Removed Items
 -------------
diff --git a/drivers/net/ice/ice_ethdev.c b/drivers/net/ice/ice_ethdev.c
index 71302b03d8..709e7df408 100644
--- a/drivers/net/ice/ice_ethdev.c
+++ b/drivers/net/ice/ice_ethdev.c
@@ -159,6 +159,7 @@ static int ice_timesync_read_time(struct rte_eth_dev *dev,
 static int ice_timesync_write_time(struct rte_eth_dev *dev,
 				   const struct timespec *timestamp);
 static int ice_timesync_disable(struct rte_eth_dev *dev);
+static const uint32_t *ice_buffer_split_supported_hdr_ptypes_get(struct rte_eth_dev *dev);
 
 static const struct rte_pci_id pci_id_ice_map[] = {
 	{ RTE_PCI_DEVICE(ICE_INTEL_VENDOR_ID, ICE_DEV_ID_E823L_BACKPLANE) },
@@ -270,6 +271,7 @@ static const struct eth_dev_ops ice_eth_dev_ops = {
 	.timesync_write_time          = ice_timesync_write_time,
 	.timesync_disable             = ice_timesync_disable,
 	.tm_ops_get                   = ice_tm_ops_get,
+	.buffer_split_supported_hdr_ptypes_get = ice_buffer_split_supported_hdr_ptypes_get,
 };
 
 /* store statistics names and its offset in stats structure */
@@ -3797,7 +3799,8 @@ ice_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 			RTE_ETH_RX_OFFLOAD_OUTER_IPV4_CKSUM |
 			RTE_ETH_RX_OFFLOAD_VLAN_EXTEND |
 			RTE_ETH_RX_OFFLOAD_RSS_HASH |
-			RTE_ETH_RX_OFFLOAD_TIMESTAMP;
+			RTE_ETH_RX_OFFLOAD_TIMESTAMP |
+			RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT;
 		dev_info->tx_offload_capa |=
 			RTE_ETH_TX_OFFLOAD_QINQ_INSERT |
 			RTE_ETH_TX_OFFLOAD_IPV4_CKSUM |
@@ -3809,7 +3812,7 @@ ice_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 		dev_info->flow_type_rss_offloads |= ICE_RSS_OFFLOAD_ALL;
 	}
 
-	dev_info->rx_queue_offload_capa = 0;
+	dev_info->rx_queue_offload_capa = RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT;
 	dev_info->tx_queue_offload_capa = RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE;
 
 	dev_info->reta_size = pf->hash_lut_size;
@@ -3878,6 +3881,11 @@ ice_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 	dev_info->default_rxportconf.ring_size = ICE_BUF_SIZE_MIN;
 	dev_info->default_txportconf.ring_size = ICE_BUF_SIZE_MIN;
 
+	dev_info->rx_seg_capa.max_nseg = ICE_RX_MAX_NSEG;
+	dev_info->rx_seg_capa.multi_pools = 1;
+	dev_info->rx_seg_capa.offset_allowed = 0;
+	dev_info->rx_seg_capa.offset_align_log2 = 0;
+
 	return 0;
 }
 
@@ -5932,6 +5940,49 @@ ice_timesync_disable(struct rte_eth_dev *dev)
 	return 0;
 }
 
+static const uint32_t *
+ice_buffer_split_supported_hdr_ptypes_get(struct rte_eth_dev *dev __rte_unused)
+{
+	/* Buffer split protocol header capability. */
+	static const uint32_t ptypes[] = {
+		/* Non tunneled */
+		RTE_PTYPE_L2_ETHER,
+		RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN,
+		RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_L4_UDP,
+		RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_L4_TCP,
+		RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_L4_SCTP,
+		RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN,
+		RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN | RTE_PTYPE_L4_UDP,
+		RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN | RTE_PTYPE_L4_TCP,
+		RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN | RTE_PTYPE_L4_SCTP,
+
+		/* Tunneled */
+		RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER,
+		RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER |
+		RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN,
+		RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER |
+		RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_UDP,
+		RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER |
+		RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_TCP,
+		RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER |
+		RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_SCTP,
+
+		RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER,
+		RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER |
+		RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN,
+		RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER |
+		RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_UDP,
+		RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER |
+		RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_TCP,
+		RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER |
+		RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_SCTP,
+
+		RTE_PTYPE_UNKNOWN
+	};
+
+	return ptypes;
+}
+
 static int
 ice_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
 	      struct rte_pci_device *pci_dev)
diff --git a/drivers/net/ice/ice_rxtx.c b/drivers/net/ice/ice_rxtx.c
index 5af7c0c8f6..36d1193cec 100644
--- a/drivers/net/ice/ice_rxtx.c
+++ b/drivers/net/ice/ice_rxtx.c
@@ -259,7 +259,6 @@ ice_program_hw_rx_queue(struct ice_rx_queue *rxq)
 	/* Set buffer size as the head split is disabled. */
 	buf_size = (uint16_t)(rte_pktmbuf_data_room_size(rxq->mp) -
 			      RTE_PKTMBUF_HEADROOM);
-	rxq->rx_hdr_len = 0;
 	rxq->rx_buf_len = RTE_ALIGN(buf_size, (1 << ICE_RLAN_CTX_DBUF_S));
 	rxq->max_pkt_len =
 		RTE_MIN((uint32_t)ICE_SUPPORT_CHAIN_NUM * rxq->rx_buf_len,
@@ -288,11 +287,84 @@ ice_program_hw_rx_queue(struct ice_rx_queue *rxq)
 
 	memset(&rx_ctx, 0, sizeof(rx_ctx));
 
+	if (rxq->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) {
+		uint32_t proto_hdr;
+		proto_hdr = rxq->rxseg[0].proto_hdr;
+
+		if (proto_hdr == RTE_PTYPE_UNKNOWN) {
+			PMD_DRV_LOG(ERR, "Buffer split protocol must be configured");
+			return -EINVAL;
+		}
+
+		switch (proto_hdr & RTE_PTYPE_L4_MASK) {
+		case RTE_PTYPE_L4_TCP:
+		case RTE_PTYPE_L4_UDP:
+			rx_ctx.dtype = ICE_RX_DTYPE_HEADER_SPLIT;
+			rx_ctx.hsplit_0 = ICE_RLAN_RX_HSPLIT_0_SPLIT_TCP_UDP;
+			goto set_hsplit_finish;
+		case RTE_PTYPE_L4_SCTP:
+			rx_ctx.dtype = ICE_RX_DTYPE_HEADER_SPLIT;
+			rx_ctx.hsplit_0 = ICE_RLAN_RX_HSPLIT_0_SPLIT_SCTP;
+			goto set_hsplit_finish;
+		}
+
+		switch (proto_hdr & RTE_PTYPE_L3_MASK) {
+		case RTE_PTYPE_L3_IPV4_EXT_UNKNOWN:
+		case RTE_PTYPE_L3_IPV6_EXT_UNKNOWN:
+			rx_ctx.dtype = ICE_RX_DTYPE_HEADER_SPLIT;
+			rx_ctx.hsplit_0 = ICE_RLAN_RX_HSPLIT_0_SPLIT_IP;
+			goto set_hsplit_finish;
+		}
+
+		switch (proto_hdr & RTE_PTYPE_L2_MASK) {
+		case RTE_PTYPE_L2_ETHER:
+			rx_ctx.dtype = ICE_RX_DTYPE_HEADER_SPLIT;
+			rx_ctx.hsplit_0 = ICE_RLAN_RX_HSPLIT_0_SPLIT_L2;
+			rx_ctx.hsplit_1 = ICE_RLAN_RX_HSPLIT_1_SPLIT_L2;
+			goto set_hsplit_finish;
+		}
+
+		switch (proto_hdr & RTE_PTYPE_INNER_L4_MASK) {
+		case RTE_PTYPE_INNER_L4_TCP:
+		case RTE_PTYPE_INNER_L4_UDP:
+			rx_ctx.dtype = ICE_RX_DTYPE_HEADER_SPLIT;
+			rx_ctx.hsplit_0 = ICE_RLAN_RX_HSPLIT_0_SPLIT_TCP_UDP;
+			goto set_hsplit_finish;
+		case RTE_PTYPE_INNER_L4_SCTP:
+			rx_ctx.dtype = ICE_RX_DTYPE_HEADER_SPLIT;
+			rx_ctx.hsplit_0 = ICE_RLAN_RX_HSPLIT_0_SPLIT_SCTP;
+			goto set_hsplit_finish;
+		}
+
+		switch (proto_hdr & RTE_PTYPE_INNER_L3_MASK) {
+		case RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN:
+		case RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN:
+			rx_ctx.dtype = ICE_RX_DTYPE_HEADER_SPLIT;
+			rx_ctx.hsplit_0 = ICE_RLAN_RX_HSPLIT_0_SPLIT_IP;
+			goto set_hsplit_finish;
+		}
+
+		switch (proto_hdr & RTE_PTYPE_INNER_L2_MASK) {
+		case RTE_PTYPE_INNER_L2_ETHER:
+			rx_ctx.dtype = ICE_RX_DTYPE_HEADER_SPLIT;
+			rx_ctx.hsplit_0 = ICE_RLAN_RX_HSPLIT_0_SPLIT_L2;
+			goto set_hsplit_finish;
+		}
+
+		PMD_DRV_LOG(ERR, "Buffer split protocol is not supported");
+		return -EINVAL;
+
+set_hsplit_finish:
+		rxq->rx_hdr_len = ICE_RX_HDR_BUF_SIZE;
+	} else {
+		rxq->rx_hdr_len = 0;
+		rx_ctx.dtype = 0; /* No Protocol Based Buffer Split mode */
+	}
+
 	rx_ctx.base = rxq->rx_ring_dma / ICE_QUEUE_BASE_ADDR_UNIT;
 	rx_ctx.qlen = rxq->nb_rx_desc;
 	rx_ctx.dbuf = rxq->rx_buf_len >> ICE_RLAN_CTX_DBUF_S;
 	rx_ctx.hbuf = rxq->rx_hdr_len >> ICE_RLAN_CTX_HBUF_S;
-	rx_ctx.dtype = 0; /* No Header Split mode */
 #ifndef RTE_LIBRTE_ICE_16BYTE_RX_DESC
 	rx_ctx.dsize = 1; /* 32B descriptors */
 #endif
@@ -378,6 +450,7 @@ ice_alloc_rx_queue_mbufs(struct ice_rx_queue *rxq)
 
 	for (i = 0; i < rxq->nb_rx_desc; i++) {
 		volatile union ice_rx_flex_desc *rxd;
+		rxd = &rxq->rx_ring[i];
 		struct rte_mbuf *mbuf = rte_mbuf_raw_alloc(rxq->mp);
 
 		if (unlikely(!mbuf)) {
@@ -385,8 +458,6 @@ ice_alloc_rx_queue_mbufs(struct ice_rx_queue *rxq)
 			return -ENOMEM;
 		}
 
-		rte_mbuf_refcnt_set(mbuf, 1);
-		mbuf->next = NULL;
 		mbuf->data_off = RTE_PKTMBUF_HEADROOM;
 		mbuf->nb_segs = 1;
 		mbuf->port = rxq->port_id;
@@ -394,9 +465,32 @@ ice_alloc_rx_queue_mbufs(struct ice_rx_queue *rxq)
 		dma_addr =
 			rte_cpu_to_le_64(rte_mbuf_data_iova_default(mbuf));
 
-		rxd = &rxq->rx_ring[i];
-		rxd->read.pkt_addr = dma_addr;
-		rxd->read.hdr_addr = 0;
+		if (!(rxq->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT)) {
+			rte_mbuf_refcnt_set(mbuf, 1);
+			mbuf->next = NULL;
+			rxd->read.hdr_addr = 0;
+			rxd->read.pkt_addr = dma_addr;
+		} else {
+			struct rte_mbuf *mbuf_pay;
+			mbuf_pay = rte_mbuf_raw_alloc(rxq->rxseg[1].mp);
+			if (unlikely(!mbuf_pay)) {
+				PMD_DRV_LOG(ERR, "Failed to allocate payload mbuf for RX");
+				return -ENOMEM;
+			}
+
+			mbuf_pay->next = NULL;
+			mbuf_pay->data_off = RTE_PKTMBUF_HEADROOM;
+			mbuf_pay->nb_segs = 1;
+			mbuf_pay->port = rxq->port_id;
+			mbuf->next = mbuf_pay;
+
+			rxd->read.hdr_addr = dma_addr;
+			/* The LS bit should be set to zero regardless of
+			 * buffer split enablement.
+			 */
+			rxd->read.pkt_addr = rte_cpu_to_le_64(rte_mbuf_data_iova_default(mbuf_pay));
+		}
+
 #ifndef RTE_LIBRTE_ICE_16BYTE_RX_DESC
 		rxd->read.rsvd1 = 0;
 		rxd->read.rsvd2 = 0;
@@ -420,14 +514,14 @@ _ice_rx_queue_release_mbufs(struct ice_rx_queue *rxq)
 
 	for (i = 0; i < rxq->nb_rx_desc; i++) {
 		if (rxq->sw_ring[i].mbuf) {
-			rte_pktmbuf_free_seg(rxq->sw_ring[i].mbuf);
+			rte_pktmbuf_free(rxq->sw_ring[i].mbuf);
 			rxq->sw_ring[i].mbuf = NULL;
 		}
 	}
 	if (rxq->rx_nb_avail == 0)
 		return;
 	for (i = 0; i < rxq->rx_nb_avail; i++)
-		rte_pktmbuf_free_seg(rxq->rx_stage[rxq->rx_next_avail + i]);
+		rte_pktmbuf_free(rxq->rx_stage[rxq->rx_next_avail + i]);
 
 	rxq->rx_nb_avail = 0;
 }
@@ -719,7 +813,7 @@ ice_fdir_program_hw_rx_queue(struct ice_rx_queue *rxq)
 	rx_ctx.qlen = rxq->nb_rx_desc;
 	rx_ctx.dbuf = rxq->rx_buf_len >> ICE_RLAN_CTX_DBUF_S;
 	rx_ctx.hbuf = rxq->rx_hdr_len >> ICE_RLAN_CTX_HBUF_S;
-	rx_ctx.dtype = 0; /* No Header Split mode */
+	rx_ctx.dtype = 0; /* No Buffer Split mode */
 	rx_ctx.dsize = 1; /* 32B descriptors */
 	rx_ctx.rxmax = ICE_ETH_MAX_LEN;
 	/* TPH: Transaction Layer Packet (TLP) processing hints */
@@ -1053,6 +1147,8 @@ ice_rx_queue_setup(struct rte_eth_dev *dev,
 	uint16_t len;
 	int use_def_burst_func = 1;
 	uint64_t offloads;
+	uint16_t n_seg = rx_conf->rx_nseg;
+	uint16_t i;
 
 	if (nb_desc % ICE_ALIGN_RING_DESC != 0 ||
 	    nb_desc > ICE_MAX_RING_DESC ||
@@ -1064,6 +1160,15 @@ ice_rx_queue_setup(struct rte_eth_dev *dev,
 
 	offloads = rx_conf->offloads | dev->data->dev_conf.rxmode.offloads;
 
+	if (mp)
+		n_seg = 1;
+
+	if (n_seg > 1 && !(offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT)) {
+		PMD_INIT_LOG(ERR, "port %u queue index %u split offload not configured",
+				dev->data->port_id, queue_idx);
+		return -EINVAL;
+	}
+
 	/* Free memory if needed */
 	if (dev->data->rx_queues[queue_idx]) {
 		ice_rx_queue_release(dev->data->rx_queues[queue_idx]);
@@ -1075,12 +1180,24 @@ ice_rx_queue_setup(struct rte_eth_dev *dev,
 				 sizeof(struct ice_rx_queue),
 				 RTE_CACHE_LINE_SIZE,
 				 socket_id);
+
 	if (!rxq) {
 		PMD_INIT_LOG(ERR, "Failed to allocate memory for "
 			     "rx queue data structure");
 		return -ENOMEM;
 	}
-	rxq->mp = mp;
+
+	rxq->rxseg_nb = n_seg;
+	if (n_seg > 1) {
+		for (i = 0; i < n_seg; i++)
+			memcpy(&rxq->rxseg[i], &rx_conf->rx_seg[i].split,
+				sizeof(struct rte_eth_rxseg_split));
+
+		rxq->mp = rxq->rxseg[0].mp;
+	} else {
+		rxq->mp = mp;
+	}
+
 	rxq->nb_rx_desc = nb_desc;
 	rxq->rx_free_thresh = rx_conf->rx_free_thresh;
 	rxq->queue_id = queue_idx;
@@ -1551,7 +1668,7 @@ ice_rx_scan_hw_ring(struct ice_rx_queue *rxq)
 	struct ice_rx_entry *rxep;
 	struct rte_mbuf *mb;
 	uint16_t stat_err0;
-	uint16_t pkt_len;
+	uint16_t pkt_len, hdr_len;
 	int32_t s[ICE_LOOK_AHEAD], nb_dd;
 	int32_t i, j, nb_rx = 0;
 	uint64_t pkt_flags = 0;
@@ -1606,6 +1723,27 @@ ice_rx_scan_hw_ring(struct ice_rx_queue *rxq)
 				   ICE_RX_FLX_DESC_PKT_LEN_M) - rxq->crc_len;
 			mb->data_len = pkt_len;
 			mb->pkt_len = pkt_len;
+
+			if (!(rxq->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT)) {
+				pkt_len = (rte_le_to_cpu_16(rxdp[j].wb.pkt_len) &
+					ICE_RX_FLX_DESC_PKT_LEN_M) - rxq->crc_len;
+				mb->data_len = pkt_len;
+				mb->pkt_len = pkt_len;
+			} else {
+				mb->nb_segs = (uint16_t)(mb->nb_segs + mb->next->nb_segs);
+				mb->next->next = NULL;
+				hdr_len = rte_le_to_cpu_16(rxdp[j].wb.hdr_len_sph_flex_flags1) &
+						ICE_RX_FLEX_DESC_HEADER_LEN_M;
+				pkt_len = (rte_le_to_cpu_16(rxdp[j].wb.pkt_len) &
+					ICE_RX_FLX_DESC_PKT_LEN_M) - rxq->crc_len;
+				mb->data_len = hdr_len;
+				mb->pkt_len = hdr_len + pkt_len;
+				mb->next->data_len = pkt_len;
+#ifdef RTE_ETHDEV_DEBUG_RX
+				rte_pktmbuf_dump(stdout, mb, rte_pktmbuf_pkt_len(mb));
+#endif
+			}
+
 			mb->ol_flags = 0;
 			stat_err0 = rte_le_to_cpu_16(rxdp[j].wb.status_error0);
 			pkt_flags = ice_rxd_error_to_pkt_flags(stat_err0);
@@ -1697,7 +1835,9 @@ ice_rx_alloc_bufs(struct ice_rx_queue *rxq)
 	struct rte_mbuf *mb;
 	uint16_t alloc_idx, i;
 	uint64_t dma_addr;
-	int diag;
+	int diag, diag_pay;
+	uint64_t pay_addr;
+	struct rte_mbuf *mbufs_pay[rxq->rx_free_thresh];
 
 	/* Allocate buffers in bulk */
 	alloc_idx = (uint16_t)(rxq->rx_free_trigger -
@@ -1710,6 +1850,15 @@ ice_rx_alloc_bufs(struct ice_rx_queue *rxq)
 		return -ENOMEM;
 	}
 
+	if (rxq->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) {
+		diag_pay = rte_mempool_get_bulk(rxq->rxseg[1].mp,
+				(void *)mbufs_pay, rxq->rx_free_thresh);
+		if (unlikely(diag_pay != 0)) {
+			PMD_RX_LOG(ERR, "Failed to get payload mbufs in bulk");
+			return -ENOMEM;
+		}
+	}
+
 	rxdp = &rxq->rx_ring[alloc_idx];
 	for (i = 0; i < rxq->rx_free_thresh; i++) {
 		if (likely(i < (rxq->rx_free_thresh - 1)))
@@ -1718,13 +1867,21 @@ ice_rx_alloc_bufs(struct ice_rx_queue *rxq)
 
 		mb = rxep[i].mbuf;
 		rte_mbuf_refcnt_set(mb, 1);
-		mb->next = NULL;
 		mb->data_off = RTE_PKTMBUF_HEADROOM;
 		mb->nb_segs = 1;
 		mb->port = rxq->port_id;
 		dma_addr = rte_cpu_to_le_64(rte_mbuf_data_iova_default(mb));
-		rxdp[i].read.hdr_addr = 0;
-		rxdp[i].read.pkt_addr = dma_addr;
+
+		if (!(rxq->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT)) {
+			mb->next = NULL;
+			rxdp[i].read.hdr_addr = 0;
+			rxdp[i].read.pkt_addr = dma_addr;
+		} else {
+			mb->next = mbufs_pay[i];
+			pay_addr = rte_cpu_to_le_64(rte_mbuf_data_iova_default(mbufs_pay[i]));
+			rxdp[i].read.hdr_addr = dma_addr;
+			rxdp[i].read.pkt_addr = pay_addr;
+		}
 	}
 
 	/* Update Rx tail register */
@@ -2333,11 +2490,13 @@ ice_recv_pkts(void *rx_queue,
 	struct ice_rx_entry *sw_ring = rxq->sw_ring;
 	struct ice_rx_entry *rxe;
 	struct rte_mbuf *nmb; /* new allocated mbuf */
+	struct rte_mbuf *nmb_pay; /* new allocated payload mbuf */
 	struct rte_mbuf *rxm; /* pointer to store old mbuf in SW ring */
 	uint16_t rx_id = rxq->rx_tail;
 	uint16_t nb_rx = 0;
 	uint16_t nb_hold = 0;
 	uint16_t rx_packet_len;
+	uint16_t rx_header_len;
 	uint16_t rx_stat_err0;
 	uint64_t dma_addr;
 	uint64_t pkt_flags;
@@ -2365,12 +2524,16 @@ ice_recv_pkts(void *rx_queue,
 		if (!(rx_stat_err0 & (1 << ICE_RX_FLEX_DESC_STATUS0_DD_S)))
 			break;
 
-		/* allocate mbuf */
+		if (rx_stat_err0 & (1 << ICE_RX_FLEX_DESC_STATUS0_HBO_S))
+			break;
+
+		/* allocate header mbuf */
 		nmb = rte_mbuf_raw_alloc(rxq->mp);
 		if (unlikely(!nmb)) {
 			rxq->vsi->adapter->pf.dev_data->rx_mbuf_alloc_failed++;
 			break;
 		}
+
 		rxd = *rxdp; /* copy descriptor in ring to temp variable*/
 
 		nb_hold++;
@@ -2383,24 +2546,60 @@ ice_recv_pkts(void *rx_queue,
 		dma_addr =
 			rte_cpu_to_le_64(rte_mbuf_data_iova_default(nmb));
 
-		/**
-		 * fill the read format of descriptor with physic address in
-		 * new allocated mbuf: nmb
-		 */
-		rxdp->read.hdr_addr = 0;
-		rxdp->read.pkt_addr = dma_addr;
+		if (!(rxq->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT)) {
+			/**
+			 * fill the read format of descriptor with physic address in
+			 * new allocated mbuf: nmb
+			 */
+			rxdp->read.hdr_addr = 0;
+			rxdp->read.pkt_addr = dma_addr;
+		} else {
+			/* allocate payload mbuf */
+			nmb_pay = rte_mbuf_raw_alloc(rxq->rxseg[1].mp);
+			if (unlikely(!nmb_pay)) {
+				rxq->vsi->adapter->pf.dev_data->rx_mbuf_alloc_failed++;
+				break;
+			}
 
-		/* calculate rx_packet_len of the received pkt */
-		rx_packet_len = (rte_le_to_cpu_16(rxd.wb.pkt_len) &
-				 ICE_RX_FLX_DESC_PKT_LEN_M) - rxq->crc_len;
+			nmb->next = nmb_pay;
+			nmb_pay->next = NULL;
+
+			/**
+			 * fill the read format of descriptor with physic address in
+			 * new allocated mbuf: nmb
+			 */
+			rxdp->read.hdr_addr = dma_addr;
+			rxdp->read.pkt_addr = rte_cpu_to_le_64(rte_mbuf_data_iova_default(nmb_pay));
+		}
 
 		/* fill old mbuf with received descriptor: rxd */
 		rxm->data_off = RTE_PKTMBUF_HEADROOM;
 		rte_prefetch0(RTE_PTR_ADD(rxm->buf_addr, RTE_PKTMBUF_HEADROOM));
-		rxm->nb_segs = 1;
-		rxm->next = NULL;
-		rxm->pkt_len = rx_packet_len;
-		rxm->data_len = rx_packet_len;
+		if (!(rxq->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT)) {
+			rxm->nb_segs = 1;
+			rxm->next = NULL;
+			/* calculate rx_packet_len of the received pkt */
+			rx_packet_len = (rte_le_to_cpu_16(rxd.wb.pkt_len) &
+					ICE_RX_FLX_DESC_PKT_LEN_M) - rxq->crc_len;
+			rxm->data_len = rx_packet_len;
+			rxm->pkt_len = rx_packet_len;
+		} else {
+			rxm->nb_segs = (uint16_t)(rxm->nb_segs + rxm->next->nb_segs);
+			rxm->next->next = NULL;
+			/* calculate rx_packet_len of the received pkt */
+			rx_header_len = rte_le_to_cpu_16(rxd.wb.hdr_len_sph_flex_flags1) &
+					ICE_RX_FLEX_DESC_HEADER_LEN_M;
+			rx_packet_len = (rte_le_to_cpu_16(rxd.wb.pkt_len) &
+					ICE_RX_FLX_DESC_PKT_LEN_M) - rxq->crc_len;
+			rxm->data_len = rx_header_len;
+			rxm->pkt_len = rx_header_len + rx_packet_len;
+			rxm->next->data_len = rx_packet_len;
+
+#ifdef RTE_ETHDEV_DEBUG_RX
+			rte_pktmbuf_dump(stdout, rxm, rte_pktmbuf_pkt_len(rxm));
+#endif
+		}
+
 		rxm->port = rxq->port_id;
 		rxm->packet_type = ptype_tbl[ICE_RX_FLEX_DESC_PTYPE_M &
 			rte_le_to_cpu_16(rxd.wb.ptype_flex_flags0)];
diff --git a/drivers/net/ice/ice_rxtx.h b/drivers/net/ice/ice_rxtx.h
index 6c08c175dc..0cfc3ca57d 100644
--- a/drivers/net/ice/ice_rxtx.h
+++ b/drivers/net/ice/ice_rxtx.h
@@ -16,6 +16,9 @@
 #define ICE_RX_MAX_BURST 32
 #define ICE_TX_MAX_BURST 32
 
+/* Maximal number of segments to split. */
+#define ICE_RX_MAX_NSEG 2
+
 #define ICE_CHK_Q_ENA_COUNT        100
 #define ICE_CHK_Q_ENA_INTERVAL_US  100
 
@@ -43,6 +46,11 @@
 extern uint64_t ice_timestamp_dynflag;
 extern int ice_timestamp_dynfield_offset;
 
+/* Max header size can be 2K - 64 bytes */
+#define ICE_RX_HDR_BUF_SIZE    (2048 - 64)
+
+#define ICE_HEADER_SPLIT_ENA   BIT(0)
+
 typedef void (*ice_rx_release_mbufs_t)(struct ice_rx_queue *rxq);
 typedef void (*ice_tx_release_mbufs_t)(struct ice_tx_queue *txq);
 typedef void (*ice_rxd_to_pkt_fields_t)(struct ice_rx_queue *rxq,
@@ -53,6 +61,12 @@ struct ice_rx_entry {
 	struct rte_mbuf *mbuf;
 };
 
+enum ice_rx_dtype {
+	ICE_RX_DTYPE_NO_SPLIT       = 0,
+	ICE_RX_DTYPE_HEADER_SPLIT   = 1,
+	ICE_RX_DTYPE_SPLIT_ALWAYS   = 2,
+};
+
 struct ice_rx_queue {
 	struct rte_mempool *mp; /* mbuf pool to populate RX ring */
 	volatile union ice_rx_flex_desc *rx_ring;/* RX ring virtual address */
@@ -99,6 +113,8 @@ struct ice_rx_queue {
 	uint32_t hw_time_high; /* high 32 bits of timestamp */
 	uint32_t hw_time_low; /* low 32 bits of timestamp */
 	uint64_t hw_time_update; /* SW time of HW record updating */
+	struct rte_eth_rxseg_split rxseg[ICE_RX_MAX_NSEG];
+	uint32_t rxseg_nb;
 };
 
 struct ice_tx_entry {
diff --git a/drivers/net/ice/ice_rxtx_vec_common.h b/drivers/net/ice/ice_rxtx_vec_common.h
index 2dd2d83650..eec6ea2134 100644
--- a/drivers/net/ice/ice_rxtx_vec_common.h
+++ b/drivers/net/ice/ice_rxtx_vec_common.h
@@ -291,6 +291,9 @@ ice_rx_vec_queue_default(struct ice_rx_queue *rxq)
 	if (rxq->offloads & RTE_ETH_RX_OFFLOAD_TIMESTAMP)
 		return -1;
 
+	if (rxq->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT)
+		return -1;
+
 	if (rxq->offloads & ICE_RX_VECTOR_OFFLOAD)
 		return ICE_VECTOR_OFFLOAD_PATH;
 
-- 
2.25.1


^ permalink raw reply	[flat|nested] 72+ messages in thread

* RE: [PATCH v7 2/4] ethdev: introduce protocol hdr based buffer split
  2022-10-01 21:05   ` [PATCH v7 2/4] ethdev: introduce protocol hdr based buffer split Yuan Wang
@ 2022-10-02  4:01     ` Wang, YuanX
  2022-10-03  7:47     ` Andrew Rybchenko
  1 sibling, 0 replies; 72+ messages in thread
From: Wang, YuanX @ 2022-10-02  4:01 UTC (permalink / raw)
  To: Andrew Rybchenko, Thomas Monjalon, Ferruh Yigit
  Cc: ferruh.yigit, mdr, Li, Xiaoyun, Singh, Aman Deep, Zhang, Yuying,
	Zhang, Qi Z, Yang, Qiming, jerinjacobk, viacheslavo, stephen,
	Ding, Xuan, hpothula, Tang, Yaqi, Wenxuan Wu, dev

Hi All,

Could you please review it and provide suggestions, if any.

Thanks,
Yuan

> -----Original Message-----
> From: Wang, YuanX <yuanx.wang@intel.com>
> Sent: Sunday, October 2, 2022 5:05 AM
> To: dev@dpdk.org; Thomas Monjalon <thomas@monjalon.net>; Ferruh Yigit
> <ferruh.yigit@amd.com>; Andrew Rybchenko
> <andrew.rybchenko@oktetlabs.ru>
> Cc: ferruh.yigit@xilinx.com; mdr@ashroe.eu; Li, Xiaoyun
> <xiaoyun.li@intel.com>; Singh, Aman Deep <aman.deep.singh@intel.com>;
> Zhang, Yuying <yuying.zhang@intel.com>; Zhang, Qi Z
> <qi.z.zhang@intel.com>; Yang, Qiming <qiming.yang@intel.com>;
> jerinjacobk@gmail.com; viacheslavo@nvidia.com;
> stephen@networkplumber.org; Ding, Xuan <xuan.ding@intel.com>;
> hpothula@marvell.com; Tang, Yaqi <yaqi.tang@intel.com>; Wang, YuanX
> <yuanx.wang@intel.com>; Wenxuan Wu <wenxuanx.wu@intel.com>
> Subject: [PATCH v7 2/4] ethdev: introduce protocol hdr based buffer split
> 
> Currently, Rx buffer split supports length based split. With Rx queue offload
> RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT enabled and Rx packet segment
> configured, PMD will be able to split the received packets into multiple
> segments.
> 
> However, length based buffer split is not suitable for NICs that do split based
> on protocol headers. Given an arbitrarily variable length in Rx packet
> segment, it is almost impossible to pass a fixed protocol header to driver.
> Besides, the existence of tunneling results in the composition of a packet is
> various, which makes the situation even worse.
> 
> This patch extends current buffer split to support protocol header based
> buffer split. A new proto_hdr field is introduced in the reserved field of
> rte_eth_rxseg_split structure to specify protocol header. The proto_hdr field
> defines the split position of packet, splitting will always happen after the
> protocol header defined in the Rx packet segment. When Rx queue offload
> RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT is enabled and corresponding
> protocol header is configured, driver will split the ingress packets into
> multiple segments.
> 
> Examples for proto_hdr field defines:
> To split after ETH-IPV4-UDP, it should be defined as RTE_PTYPE_L2_ETHER |
> RTE_PTYPE_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_L4_UDP
> 
> For inner ETH-IPV4-UDP, it should be defined as
> RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER |
> RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_UDP
> 
> struct rte_eth_rxseg_split {
>         struct rte_mempool *mp; /* memory pools to allocate segment from */
>         uint16_t length; /* segment maximal data length,
>                             configures split point */
>         uint16_t offset; /* data offset from beginning
>                             of mbuf data buffer */
>         /**
> 	 * Proto_hdr defines a bit mask of the protocol sequence as
>          * RTE_PTYPE_*, configures split point. The last RTE_PTYPE*
>          * in the mask indicates the split position.
> 	 * For non-tunneling packets, the complete protocol sequence
>          * should be defined.
> 	 * For tunneling packets, for simplicity, only the tunnel and
>          * inner protocol sequence should be defined.
> 	 */
>         uint32_t proto_hdr;
> };
> 
> If protocol header split can be supported by a PMD, the
> rte_eth_buffer_split_get_supported_hdr_ptypes function can be use to
> obtain a list of these protocol headers.
> 
> For example, let's suppose we configured the Rx queue with the following
> segments:
>         seg0 - pool0, proto_hdr0=RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4,
>                off0=2B
>         seg1 - pool1, proto_hdr1=RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4
>                | RTE_PTYPE_L4_UDP, off1=128B
>         seg2 - pool2, off1=0B
> 
> The packet consists of ETH_IPV4_UDP_PAYLOAD will be split like
> following:
>         seg0 - ipv4 header @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from
> pool0
>         seg1 - udp header @ 128 in mbuf from pool1
>         seg2 - payload @ 0 in mbuf from pool2
> 
> Note: NIC will only do split when the packets exactly match all the protocol
> headers in the segments. For example, if ARP packets received with above
> config, the NIC won't do split for ARP packets since it does not contains ipv4
> header and udp header. These packets will be put into the last valid
> mempool, with zero offset.
> 
> Now buffer split can be configured in two modes. For length based buffer
> split, the mp, length, offset field in Rx packet segment should be configured,
> while the proto_hdr field will be ignored.
> For protocol header based buffer split, the mp, offset, proto_hdr field in Rx
> packet segment should be configured, while the length field will be ignored.
> 
> The split limitations imposed by underlying driver is reported in the
> rte_eth_dev_info->rx_seg_capa field. The memory attributes for the split
> parts may differ either, dpdk memory and external memory, respectively.
> 
> Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
> Signed-off-by: Xuan Ding <xuan.ding@intel.com>
> Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
> ---
>  doc/guides/rel_notes/release_22_11.rst |  7 +++
>  lib/ethdev/rte_ethdev.c                | 74 ++++++++++++++++++++++----
>  lib/ethdev/rte_ethdev.h                | 29 +++++++++-
>  3 files changed, 98 insertions(+), 12 deletions(-)
> 
> diff --git a/doc/guides/rel_notes/release_22_11.rst
> b/doc/guides/rel_notes/release_22_11.rst
> index 6a7474a3d6..510869c73a 100644
> --- a/doc/guides/rel_notes/release_22_11.rst
> +++ b/doc/guides/rel_notes/release_22_11.rst
> @@ -101,6 +101,13 @@ New Features
>    * Added ``rte_eth_buffer_split_get_supported_hdr_ptypes()``, to get
> supported
>      header protocols of a PMD to split.
> 
> +* **Added protocol header based buffer split.**
> +
> +  * Ethdev: The ``reserved`` field in the ``rte_eth_rxseg_split`` structure is
> +    replaced with ``proto_hdr`` to support protocol header based buffer split.
> +    User can choose length or protocol header to configure buffer split
> +    according to NIC's capability.
> +
> 
>  Removed Items
>  -------------
> diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c index
> 1f0a7f8f3f..27ec19faed 100644
> --- a/lib/ethdev/rte_ethdev.c
> +++ b/lib/ethdev/rte_ethdev.c
> @@ -1649,9 +1649,10 @@ rte_eth_dev_is_removed(uint16_t port_id)  }
> 
>  static int
> -rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
> -			     uint16_t n_seg, uint32_t *mbp_buf_size,
> -			     const struct rte_eth_dev_info *dev_info)
> +rte_eth_rx_queue_check_split(uint16_t port_id,
> +			const struct rte_eth_rxseg_split *rx_seg,
> +			uint16_t n_seg, uint32_t *mbp_buf_size,
> +			const struct rte_eth_dev_info *dev_info)
>  {
>  	const struct rte_eth_rxseg_capa *seg_capa = &dev_info-
> >rx_seg_capa;
>  	struct rte_mempool *mp_first;
> @@ -1674,6 +1675,7 @@ rte_eth_rx_queue_check_split(const struct
> rte_eth_rxseg_split *rx_seg,
>  		struct rte_mempool *mpl = rx_seg[seg_idx].mp;
>  		uint32_t length = rx_seg[seg_idx].length;
>  		uint32_t offset = rx_seg[seg_idx].offset;
> +		uint32_t proto_hdr = rx_seg[seg_idx].proto_hdr;
> 
>  		if (mpl == NULL) {
>  			RTE_ETHDEV_LOG(ERR, "null mempool pointer\n");
> @@ -1707,13 +1709,63 @@ rte_eth_rx_queue_check_split(const struct
> rte_eth_rxseg_split *rx_seg,
>  		}
>  		offset += seg_idx != 0 ? 0 : RTE_PKTMBUF_HEADROOM;
>  		*mbp_buf_size = rte_pktmbuf_data_room_size(mpl);
> -		length = length != 0 ? length : *mbp_buf_size;
> -		if (*mbp_buf_size < length + offset) {
> -			RTE_ETHDEV_LOG(ERR,
> -				       "%s mbuf_data_room_size %u < %u
> (segment length=%u + segment offset=%u)\n",
> -				       mpl->name, *mbp_buf_size,
> -				       length + offset, length, offset);
> -			return -EINVAL;
> +
> +		if (proto_hdr > 0) {
> +			/* Split based on protocol headers. */
> +
> +			/* skip the payload */
> +			if (proto_hdr == RTE_PTYPE_ALL_MASK)
> +				continue;
> +
> +			int ptype_cnt;
> +
> +			ptype_cnt =
> rte_eth_buffer_split_get_supported_hdr_ptypes(port_id, NULL, 0);
> +			if (ptype_cnt <= 0) {
> +				RTE_ETHDEV_LOG(ERR,
> +					"Port %u failed to supported buffer
> split header protocols\n",
> +					port_id);
> +				return -EINVAL;
> +			}
> +
> +			uint32_t ptypes[ptype_cnt];
> +			int i;
> +
> +			ptype_cnt =
> rte_eth_buffer_split_get_supported_hdr_ptypes(port_id,
> +
> 	ptypes, ptype_cnt);
> +			if (ptype_cnt < 0) {
> +				RTE_ETHDEV_LOG(ERR,
> +					"Port %u failed to supported buffer
> split header protocols\n",
> +					port_id);
> +				return -EINVAL;
> +			}
> +
> +			for (i = 0; i < ptype_cnt; i++)
> +				if (ptypes[i] == proto_hdr)
> +					break;
> +			if (i == ptype_cnt) {
> +				RTE_ETHDEV_LOG(ERR,
> +					"Requested Rx split header protocols
> 0x%x is not supported.\n",
> +					proto_hdr);
> +				return -EINVAL;
> +			}
> +
> +			if (*mbp_buf_size < offset) {
> +				RTE_ETHDEV_LOG(ERR,
> +						"%s
> mbuf_data_room_size %u < %u segment offset)\n",
> +						mpl->name, *mbp_buf_size,
> +						offset);
> +				return -EINVAL;
> +			}
> +		} else {
> +			/* Split at fixed length. */
> +			length = length != 0 ? length : *mbp_buf_size;
> +			if (*mbp_buf_size < length + offset) {
> +				RTE_ETHDEV_LOG(ERR,
> +					"%s mbuf_data_room_size %u < %u
> (segment length=%u + segment offset=%u)\n",
> +					mpl->name, *mbp_buf_size,
> +					length + offset, length, offset);
> +				return -EINVAL;
> +			}
>  		}
>  	}
>  	return 0;
> @@ -1793,7 +1845,7 @@ rte_eth_rx_queue_setup(uint16_t port_id,
> uint16_t rx_queue_id,
>  		n_seg = rx_conf->rx_nseg;
> 
>  		if (rx_conf->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT)
> {
> -			ret = rte_eth_rx_queue_check_split(rx_seg, n_seg,
> +			ret = rte_eth_rx_queue_check_split(port_id, rx_seg,
> n_seg,
>  							   &mbp_buf_size,
>  							   &dev_info);
>  			if (ret != 0)
> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h index
> cf14e04010..a5f9647bd3 100644
> --- a/lib/ethdev/rte_ethdev.h
> +++ b/lib/ethdev/rte_ethdev.h
> @@ -994,6 +994,9 @@ struct rte_eth_txmode {
>   *   specified in the first array element, the second buffer, from the
>   *   pool in the second element, and so on.
>   *
> + * - The proto_hdrs in the elements define the split position of
> + *   received packets.
> + *
>   * - The offsets from the segment description elements specify
>   *   the data offset from the buffer beginning except the first mbuf.
>   *   The first segment offset is added with RTE_PKTMBUF_HEADROOM.
> @@ -1015,12 +1018,36 @@ struct rte_eth_txmode {
>   *     - pool from the last valid element
>   *     - the buffer size from this pool
>   *     - zero offset
> + *
> + * - Length based buffer split:
> + *     - mp, length, offset should be configured.
> + *     - The proto_hdr field will be ignored.
> + *
> + * - Protocol header based buffer split:
> + *     - mp, offset, proto_hdr should be configured.
> + *     - The length field will be ignored.
> + *
> + * - For Protocol header based buffer split, if the received packets
> + *   don't exactly match all protocol headers in the elements, packets
> + *   will not be split.
> + *   These packets will be put into:
> + *     - pool from the last valid element
> + *     - the buffer size from this pool
> + *     - zero offset
>   */
>  struct rte_eth_rxseg_split {
>  	struct rte_mempool *mp; /**< Memory pool to allocate segment
> from. */
>  	uint16_t length; /**< Segment data length, configures split point. */
>  	uint16_t offset; /**< Data offset from beginning of mbuf data buffer.
> */
> -	uint32_t reserved; /**< Reserved field. */
> +	/**
> +	 * Proto_hdr defines a bit mask of the protocol sequence as
> RTE_PTYPE_*,
> +	 * configures split point. The last RTE_PTYPE* in the mask indicates
> the
> +	 * split position.
> +	 * For non-tunneling packets, the complete protocol sequence should
> be defined.
> +	 * For tunneling packets, for simplicity, only the tunnel and inner
> +	 * protocol sequence should be defined.
> +	 */
> +	uint32_t proto_hdr;
>  };
> 
>  /**
> --
> 2.25.1


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v7 1/4] ethdev: introduce protocol header API
  2022-10-01 21:05   ` [PATCH v7 1/4] ethdev: introduce protocol header API Yuan Wang
@ 2022-10-03  7:04     ` Andrew Rybchenko
  2022-10-04  2:21       ` Wang, YuanX
  0 siblings, 1 reply; 72+ messages in thread
From: Andrew Rybchenko @ 2022-10-03  7:04 UTC (permalink / raw)
  To: Yuan Wang, dev, Thomas Monjalon, Ferruh Yigit, Ray Kinsella
  Cc: ferruh.yigit, xiaoyun.li, aman.deep.singh, yuying.zhang,
	qi.z.zhang, qiming.yang, jerinjacobk, viacheslavo, stephen,
	xuan.ding, hpothula, yaqi.tang, Wenxuan Wu

On 10/2/22 00:05, Yuan Wang wrote:
> Add a new ethdev API to retrieve supported protocol headers
> of a PMD, which helps to configure protocol header based buffer split.
> 
> Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
> Signed-off-by: Xuan Ding <xuan.ding@intel.com>
> Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
> Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> ---
>   doc/guides/rel_notes/release_22_11.rst |  5 ++++
>   lib/ethdev/ethdev_driver.h             | 15 ++++++++++++
>   lib/ethdev/rte_ethdev.c                | 33 ++++++++++++++++++++++++++
>   lib/ethdev/rte_ethdev.h                | 30 +++++++++++++++++++++++
>   lib/ethdev/version.map                 |  3 +++
>   5 files changed, 86 insertions(+)
> 
> diff --git a/doc/guides/rel_notes/release_22_11.rst b/doc/guides/rel_notes/release_22_11.rst
> index 0231959874..6a7474a3d6 100644
> --- a/doc/guides/rel_notes/release_22_11.rst
> +++ b/doc/guides/rel_notes/release_22_11.rst
> @@ -96,6 +96,11 @@ New Features
>     * Added ``rte_event_eth_tx_adapter_queue_stop`` to stop the Tx Adapter
>       from enqueueing any packets to the Tx queue.
>   
> +* **Added new ethdev API for PMD to get buffer split supported protocol types.**
> +
> +  * Added ``rte_eth_buffer_split_get_supported_hdr_ptypes()``, to get supported
> +    header protocols of a PMD to split.
> +

ethdev features should be grouped together in release notes.
I'll fix it on applying if a new version is not required.

[snip]

> diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
> index 0c2c1088c0..1f0a7f8f3f 100644
> --- a/lib/ethdev/rte_ethdev.c
> +++ b/lib/ethdev/rte_ethdev.c
> @@ -6002,6 +6002,39 @@ rte_eth_dev_priv_dump(uint16_t port_id, FILE *file)
>   	return eth_err(port_id, (*dev->dev_ops->eth_dev_priv_dump)(dev, file));
>   }
>   
> +int
> +rte_eth_buffer_split_get_supported_hdr_ptypes(uint16_t port_id, uint32_t *ptypes, int num)
> +{
> +	int i, j;
> +	struct rte_eth_dev *dev;
> +	const uint32_t *all_types;
> +
> +	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
> +	dev = &rte_eth_devices[port_id];
> +
> +	if (ptypes == NULL && num > 0) {
> +		RTE_ETHDEV_LOG(ERR,
> +			"Cannot get ethdev port %u supported header protocol types to NULL when array size is non zero\n",
> +			port_id);
> +		return -EINVAL;
> +	}
> +
> +	if (*dev->dev_ops->buffer_split_supported_hdr_ptypes_get == NULL)
> +		return -ENOTSUP;
> +	all_types = (*dev->dev_ops->buffer_split_supported_hdr_ptypes_get)(dev);
> +
> +	if (!all_types)

It should be compared with NULL explicitly, as the coding standard
says. I can fix it on applying as well.

[snip]

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v7 2/4] ethdev: introduce protocol hdr based buffer split
  2022-10-01 21:05   ` [PATCH v7 2/4] ethdev: introduce protocol hdr based buffer split Yuan Wang
  2022-10-02  4:01     ` Wang, YuanX
@ 2022-10-03  7:47     ` Andrew Rybchenko
  2022-10-04  2:48       ` Wang, YuanX
  1 sibling, 1 reply; 72+ messages in thread
From: Andrew Rybchenko @ 2022-10-03  7:47 UTC (permalink / raw)
  To: Yuan Wang, dev, Thomas Monjalon, Ferruh Yigit
  Cc: ferruh.yigit, mdr, xiaoyun.li, aman.deep.singh, yuying.zhang,
	qi.z.zhang, qiming.yang, jerinjacobk, viacheslavo, stephen,
	xuan.ding, hpothula, yaqi.tang, Wenxuan Wu

On 10/2/22 00:05, Yuan Wang wrote:
> Currently, Rx buffer split supports length based split. With Rx queue
> offload RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT enabled and Rx packet segment
> configured, PMD will be able to split the received packets into
> multiple segments.
> 
> However, length based buffer split is not suitable for NICs that do split
> based on protocol headers. Given an arbitrarily variable length in Rx
> packet segment, it is almost impossible to pass a fixed protocol header to
> driver. Besides, the existence of tunneling results in the composition of
> a packet is various, which makes the situation even worse.
> 
> This patch extends current buffer split to support protocol header based
> buffer split. A new proto_hdr field is introduced in the reserved field
> of rte_eth_rxseg_split structure to specify protocol header. The proto_hdr
> field defines the split position of packet, splitting will always happen
> after the protocol header defined in the Rx packet segment. When Rx queue
> offload RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT is enabled and corresponding
> protocol header is configured, driver will split the ingress packets into
> multiple segments.
> 
> Examples for proto_hdr field defines:
> To split after ETH-IPV4-UDP, it should be defined as
> RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_L4_UDP
> 
> For inner ETH-IPV4-UDP, it should be defined as
> RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER |
> RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_UDP
> 
> struct rte_eth_rxseg_split {
>          struct rte_mempool *mp; /* memory pools to allocate segment from */
>          uint16_t length; /* segment maximal data length,
>                              configures split point */
>          uint16_t offset; /* data offset from beginning
>                              of mbuf data buffer */
>          /**
> 	 * Proto_hdr defines a bit mask of the protocol sequence as
>           * RTE_PTYPE_*, configures split point. The last RTE_PTYPE*
>           * in the mask indicates the split position.
> 	 * For non-tunneling packets, the complete protocol sequence
>           * should be defined.
> 	 * For tunneling packets, for simplicity, only the tunnel and
>           * inner protocol sequence should be defined.
> 	 */
>          uint32_t proto_hdr;
> };
> 
> If protocol header split can be supported by a PMD, the
> rte_eth_buffer_split_get_supported_hdr_ptypes function can
> be use to obtain a list of these protocol headers.
> 
> For example, let's suppose we configured the Rx queue with the
> following segments:
>          seg0 - pool0, proto_hdr0=RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4,
>                 off0=2B
>          seg1 - pool1, proto_hdr1=RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4
>                 | RTE_PTYPE_L4_UDP, off1=128B
>          seg2 - pool2, off1=0B
> 
> The packet consists of ETH_IPV4_UDP_PAYLOAD will be split like
> following:
>          seg0 - ipv4 header @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from pool0
>          seg1 - udp header @ 128 in mbuf from pool1
>          seg2 - payload @ 0 in mbuf from pool2
> 
> Note: NIC will only do split when the packets exactly match all the
> protocol headers in the segments. For example, if ARP packets received
> with above config, the NIC won't do split for ARP packets since
> it does not contains ipv4 header and udp header. These packets will be put
> into the last valid mempool, with zero offset.
> 
> Now buffer split can be configured in two modes. For length based
> buffer split, the mp, length, offset field in Rx packet segment should
> be configured, while the proto_hdr field will be ignored.
> For protocol header based buffer split, the mp, offset, proto_hdr field
> in Rx packet segment should be configured, while the length field will
> be ignored.
> 
> The split limitations imposed by underlying driver is reported in the
> rte_eth_dev_info->rx_seg_capa field. The memory attributes for the split
> parts may differ either, dpdk memory and external memory, respectively.
> 
> Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
> Signed-off-by: Xuan Ding <xuan.ding@intel.com>
> Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>

I apologize for the delay with the review. Overall LGTM now. See a
few notes below.

> ---
>   doc/guides/rel_notes/release_22_11.rst |  7 +++
>   lib/ethdev/rte_ethdev.c                | 74 ++++++++++++++++++++++----
>   lib/ethdev/rte_ethdev.h                | 29 +++++++++-
>   3 files changed, 98 insertions(+), 12 deletions(-)
> 
> diff --git a/doc/guides/rel_notes/release_22_11.rst b/doc/guides/rel_notes/release_22_11.rst
> index 6a7474a3d6..510869c73a 100644
> --- a/doc/guides/rel_notes/release_22_11.rst
> +++ b/doc/guides/rel_notes/release_22_11.rst
> @@ -101,6 +101,13 @@ New Features
>     * Added ``rte_eth_buffer_split_get_supported_hdr_ptypes()``, to get supported
>       header protocols of a PMD to split.
>   
> +* **Added protocol header based buffer split.**
> +
> +  * Ethdev: The ``reserved`` field in the ``rte_eth_rxseg_split`` structure is
> +    replaced with ``proto_hdr`` to support protocol header based buffer split.
> +    User can choose length or protocol header to configure buffer split
> +    according to NIC's capability.
> +

It should be grouped together with other ethdev features.

>   
>   Removed Items
>   -------------
> diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
> index 1f0a7f8f3f..27ec19faed 100644
> --- a/lib/ethdev/rte_ethdev.c
> +++ b/lib/ethdev/rte_ethdev.c
> @@ -1649,9 +1649,10 @@ rte_eth_dev_is_removed(uint16_t port_id)
>   }
>   
>   static int
> -rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
> -			     uint16_t n_seg, uint32_t *mbp_buf_size,
> -			     const struct rte_eth_dev_info *dev_info)
> +rte_eth_rx_queue_check_split(uint16_t port_id,
> +			const struct rte_eth_rxseg_split *rx_seg,
> +			uint16_t n_seg, uint32_t *mbp_buf_size,
> +			const struct rte_eth_dev_info *dev_info)
>   {
>   	const struct rte_eth_rxseg_capa *seg_capa = &dev_info->rx_seg_capa;
>   	struct rte_mempool *mp_first;
> @@ -1674,6 +1675,7 @@ rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
>   		struct rte_mempool *mpl = rx_seg[seg_idx].mp;
>   		uint32_t length = rx_seg[seg_idx].length;
>   		uint32_t offset = rx_seg[seg_idx].offset;
> +		uint32_t proto_hdr = rx_seg[seg_idx].proto_hdr;
>   
>   		if (mpl == NULL) {
>   			RTE_ETHDEV_LOG(ERR, "null mempool pointer\n");
> @@ -1707,13 +1709,63 @@ rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
>   		}
>   		offset += seg_idx != 0 ? 0 : RTE_PKTMBUF_HEADROOM;
>   		*mbp_buf_size = rte_pktmbuf_data_room_size(mpl);
> -		length = length != 0 ? length : *mbp_buf_size;
> -		if (*mbp_buf_size < length + offset) {
> -			RTE_ETHDEV_LOG(ERR,
> -				       "%s mbuf_data_room_size %u < %u (segment length=%u + segment offset=%u)\n",
> -				       mpl->name, *mbp_buf_size,
> -				       length + offset, length, offset);
> -			return -EINVAL;
> +
> +		if (proto_hdr > 0) {
> +			/* Split based on protocol headers. */

Isn't it safer here to ensure that the segment length is set to 0?
Just to protect against misuse, etc.

> +
> +			/* skip the payload */

Sorry, it is confusing. What do you mean here?

> +			if (proto_hdr == RTE_PTYPE_ALL_MASK)
> +				continue;
> +
> +			int ptype_cnt;
> +
> +			ptype_cnt = rte_eth_buffer_split_get_supported_hdr_ptypes(port_id, NULL, 0);
> +			if (ptype_cnt <= 0) {
> +				RTE_ETHDEV_LOG(ERR,
> +					"Port %u failed to supported buffer split header protocols\n",
> +					port_id);
> +				return -EINVAL;
> +			}
> +
> +			uint32_t ptypes[ptype_cnt];
> +			int i;

First of all, do not mix code and variable declarations.
It significantly complicates code reading.
Second, creating an array on the stack based on a function
return value is very dangerous from a security point of
view - potential stack overflow and corresponding
vulnerabilities.
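
For illustration only, one bounded alternative (with the maximum
count chosen arbitrarily here) could look like:

	#define MAX_SPLIT_HDR_PTYPES 64 /* assumed upper bound */
	uint32_t ptypes[MAX_SPLIT_HDR_PTYPES];
	int ptype_cnt;

	/* first query the number of supported header ptypes */
	ptype_cnt = rte_eth_buffer_split_get_supported_hdr_ptypes(port_id,
								   NULL, 0);
	if (ptype_cnt <= 0 || ptype_cnt > MAX_SPLIT_HDR_PTYPES)
		return -EINVAL;

	/* then fill the fixed-size array, never exceeding its bound */
	ptype_cnt = rte_eth_buffer_split_get_supported_hdr_ptypes(port_id,
							ptypes, ptype_cnt);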

> +
> +			ptype_cnt = rte_eth_buffer_split_get_supported_hdr_ptypes(port_id,
> +										ptypes, ptype_cnt);
> +			if (ptype_cnt < 0) {
> +				RTE_ETHDEV_LOG(ERR,
> +					"Port %u failed to supported buffer split header protocols\n",
> +					port_id);
> +				return -EINVAL;
> +			}
> +
> +			for (i = 0; i < ptype_cnt; i++)
> +				if (ptypes[i] == proto_hdr)
> +					break;
> +			if (i == ptype_cnt) {
> +				RTE_ETHDEV_LOG(ERR,
> +					"Requested Rx split header protocols 0x%x is not supported.\n",
> +					proto_hdr);
> +				return -EINVAL;
> +			}
> +
> +			if (*mbp_buf_size < offset) {

The check is obviously insufficient, but I agree that it should
be the driver's responsibility to do extra checks for the required
space in the mbuf.

> +				RTE_ETHDEV_LOG(ERR,
> +						"%s mbuf_data_room_size %u < %u segment offset)\n",
> +						mpl->name, *mbp_buf_size,
> +						offset);
> +				return -EINVAL;
> +			}
> +		} else {
> +			/* Split at fixed length. */
> +			length = length != 0 ? length : *mbp_buf_size;
> +			if (*mbp_buf_size < length + offset) {
> +				RTE_ETHDEV_LOG(ERR,
> +					"%s mbuf_data_room_size %u < %u (segment length=%u + segment offset=%u)\n",
> +					mpl->name, *mbp_buf_size,
> +					length + offset, length, offset);
> +				return -EINVAL;
> +			}
>   		}
>   	}
>   	return 0;
> @@ -1793,7 +1845,7 @@ rte_eth_rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
>   		n_seg = rx_conf->rx_nseg;
>   
>   		if (rx_conf->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) {
> -			ret = rte_eth_rx_queue_check_split(rx_seg, n_seg,
> +			ret = rte_eth_rx_queue_check_split(port_id, rx_seg, n_seg,
>   							   &mbp_buf_size,
>   							   &dev_info);
>   			if (ret != 0)
> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> index cf14e04010..a5f9647bd3 100644
> --- a/lib/ethdev/rte_ethdev.h
> +++ b/lib/ethdev/rte_ethdev.h
> @@ -994,6 +994,9 @@ struct rte_eth_txmode {
>    *   specified in the first array element, the second buffer, from the
>    *   pool in the second element, and so on.
>    *
> + * - The proto_hdrs in the elements define the split position of
> + *   received packets.
> + *
>    * - The offsets from the segment description elements specify
>    *   the data offset from the buffer beginning except the first mbuf.
>    *   The first segment offset is added with RTE_PKTMBUF_HEADROOM.
> @@ -1015,12 +1018,36 @@ struct rte_eth_txmode {
>    *     - pool from the last valid element
>    *     - the buffer size from this pool
>    *     - zero offset
> + *
> + * - Length based buffer split:
> + *     - mp, length, offset should be configured.
> + *     - The proto_hdr field will be ignored.

Looking at the code above I think proto_hdr must be 0.

> + *
> + * - Protocol header based buffer split:
> + *     - mp, offset, proto_hdr should be configured.
> + *     - The length field will be ignored.

I'd require length to be 0 to avoid misuse of the API.

> + *
> + * - For Protocol header based buffer split, if the received packets
> + *   don't exactly match all protocol headers in the elements, packets
> + *   will not be split.
> + *   These packets will be put into:
> + *     - pool from the last valid element
> + *     - the buffer size from this pool
> + *     - zero offset

Shouldn't there be a check that the data room in the last segment
mempool is sufficient for up to an MTU-sized packet if Rx scatter is
disabled?

>    */
>   struct rte_eth_rxseg_split {
>   	struct rte_mempool *mp; /**< Memory pool to allocate segment from. */
>   	uint16_t length; /**< Segment data length, configures split point. */
>   	uint16_t offset; /**< Data offset from beginning of mbuf data buffer. */
> -	uint32_t reserved; /**< Reserved field. */
> +	/**
> +	 * Proto_hdr defines a bit mask of the protocol sequence as RTE_PTYPE_*,
> +	 * configures split point. The last RTE_PTYPE* in the mask indicates the
> +	 * split position.
> +	 * For non-tunneling packets, the complete protocol sequence should be defined.
> +	 * For tunneling packets, for simplicity, only the tunnel and inner
> +	 * protocol sequence should be defined.
> +	 */
> +	uint32_t proto_hdr;
>   };
>   
>   /**


^ permalink raw reply	[flat|nested] 72+ messages in thread

* RE: [PATCH v7 1/4] ethdev: introduce protocol header API
  2022-10-03  7:04     ` Andrew Rybchenko
@ 2022-10-04  2:21       ` Wang, YuanX
  2022-10-04  7:52         ` Andrew Rybchenko
  0 siblings, 1 reply; 72+ messages in thread
From: Wang, YuanX @ 2022-10-04  2:21 UTC (permalink / raw)
  To: Andrew Rybchenko, dev, Thomas Monjalon, Ferruh Yigit, Ray Kinsella
  Cc: ferruh.yigit, Li, Xiaoyun, Singh, Aman Deep, Zhang, Yuying,
	Zhang, Qi Z, Yang, Qiming, jerinjacobk, viacheslavo, stephen,
	Ding, Xuan, hpothula, Tang, Yaqi, Wenxuan Wu

Hi Andrew,

> -----Original Message-----
> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> Sent: Monday, October 3, 2022 3:04 PM
> To: Wang, YuanX <yuanx.wang@intel.com>; dev@dpdk.org; Thomas
> Monjalon <thomas@monjalon.net>; Ferruh Yigit <ferruh.yigit@amd.com>;
> Ray Kinsella <mdr@ashroe.eu>
> Cc: ferruh.yigit@xilinx.com; Li, Xiaoyun <xiaoyun.li@intel.com>; Singh, Aman
> Deep <aman.deep.singh@intel.com>; Zhang, Yuying
> <yuying.zhang@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>; Yang,
> Qiming <qiming.yang@intel.com>; jerinjacobk@gmail.com;
> viacheslavo@nvidia.com; stephen@networkplumber.org; Ding, Xuan
> <xuan.ding@intel.com>; hpothula@marvell.com; Tang, Yaqi
> <yaqi.tang@intel.com>; Wenxuan Wu <wenxuanx.wu@intel.com>
> Subject: Re: [PATCH v7 1/4] ethdev: introduce protocol header API
> 
> On 10/2/22 00:05, Yuan Wang wrote:
> > Add a new ethdev API to retrieve supported protocol headers of a PMD,
> > which helps to configure protocol header based buffer split.
> >
> > Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
> > Signed-off-by: Xuan Ding <xuan.ding@intel.com>
> > Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
> > Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> > ---
> >   doc/guides/rel_notes/release_22_11.rst |  5 ++++
> >   lib/ethdev/ethdev_driver.h             | 15 ++++++++++++
> >   lib/ethdev/rte_ethdev.c                | 33 ++++++++++++++++++++++++++
> >   lib/ethdev/rte_ethdev.h                | 30 +++++++++++++++++++++++
> >   lib/ethdev/version.map                 |  3 +++
> >   5 files changed, 86 insertions(+)
> >
> > diff --git a/doc/guides/rel_notes/release_22_11.rst
> > b/doc/guides/rel_notes/release_22_11.rst
> > index 0231959874..6a7474a3d6 100644
> > --- a/doc/guides/rel_notes/release_22_11.rst
> > +++ b/doc/guides/rel_notes/release_22_11.rst
> > @@ -96,6 +96,11 @@ New Features
> >     * Added ``rte_event_eth_tx_adapter_queue_stop`` to stop the Tx
> Adapter
> >       from enqueueing any packets to the Tx queue.
> >
> > +* **Added new ethdev API for PMD to get buffer split supported
> > +protocol types.**
> > +
> > +  * Added ``rte_eth_buffer_split_get_supported_hdr_ptypes()``, to get
> supported
> > +    header protocols of a PMD to split.
> > +
> 
> ethdev features should be grouped together in release notes.
> I'll fix it on applying if a new version is not required.

We will send a new version. For the doc changes, I don't understand your point very well.
Since there will be no new changes to the code within this patch, could you help to adjust the doc?
Thanks very much.

> 
> [snip]
> 
> > diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c index
> > 0c2c1088c0..1f0a7f8f3f 100644
> > --- a/lib/ethdev/rte_ethdev.c
> > +++ b/lib/ethdev/rte_ethdev.c
> > @@ -6002,6 +6002,39 @@ rte_eth_dev_priv_dump(uint16_t port_id, FILE
> *file)
> >   	return eth_err(port_id, (*dev->dev_ops->eth_dev_priv_dump)(dev,
> file));
> >   }
> >
> > +int
> > +rte_eth_buffer_split_get_supported_hdr_ptypes(uint16_t port_id,
> > +uint32_t *ptypes, int num) {
> > +	int i, j;
> > +	struct rte_eth_dev *dev;
> > +	const uint32_t *all_types;
> > +
> > +	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
> > +	dev = &rte_eth_devices[port_id];
> > +
> > +	if (ptypes == NULL && num > 0) {
> > +		RTE_ETHDEV_LOG(ERR,
> > +			"Cannot get ethdev port %u supported header
> protocol types to NULL when array size is non zero\n",
> > +			port_id);
> > +		return -EINVAL;
> > +	}
> > +
> > +	if (*dev->dev_ops->buffer_split_supported_hdr_ptypes_get == NULL)
> > +		return -ENOTSUP;
> > +	all_types =
> > +(*dev->dev_ops->buffer_split_supported_hdr_ptypes_get)(dev);
> > +
> > +	if (!all_types)
> 
> Should be compared with NULL explicitly as coding standard says. I can fix it
> on applying as well.

Sure, I will fix it in v8.

> 
> [snip]

^ permalink raw reply	[flat|nested] 72+ messages in thread

* RE: [PATCH v7 2/4] ethdev: introduce protocol hdr based buffer split
  2022-10-03  7:47     ` Andrew Rybchenko
@ 2022-10-04  2:48       ` Wang, YuanX
  2022-10-04  8:22         ` Andrew Rybchenko
  0 siblings, 1 reply; 72+ messages in thread
From: Wang, YuanX @ 2022-10-04  2:48 UTC (permalink / raw)
  To: Andrew Rybchenko, dev, Thomas Monjalon, Ferruh Yigit
  Cc: ferruh.yigit, mdr, Li, Xiaoyun, Singh, Aman Deep, Zhang, Yuying,
	Zhang, Qi Z, Yang, Qiming, jerinjacobk, viacheslavo, stephen,
	Ding, Xuan, hpothula, Tang, Yaqi

Hi Andrew,

> -----Original Message-----
> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> Sent: Monday, October 3, 2022 3:47 PM
> To: Wang, YuanX <yuanx.wang@intel.com>; dev@dpdk.org; Thomas
> Monjalon <thomas@monjalon.net>; Ferruh Yigit <ferruh.yigit@amd.com>
> Cc: ferruh.yigit@xilinx.com; mdr@ashroe.eu; Li, Xiaoyun
> <xiaoyun.li@intel.com>; Singh, Aman Deep <aman.deep.singh@intel.com>;
> Zhang, Yuying <yuying.zhang@intel.com>; Zhang, Qi Z
> <qi.z.zhang@intel.com>; Yang, Qiming <qiming.yang@intel.com>;
> jerinjacobk@gmail.com; viacheslavo@nvidia.com;
> stephen@networkplumber.org; Ding, Xuan <xuan.ding@intel.com>;
> hpothula@marvell.com; Tang, Yaqi <yaqi.tang@intel.com>; Wenxuan Wu
> <wenxuanx.wu@intel.com>
> Subject: Re: [PATCH v7 2/4] ethdev: introduce protocol hdr based buffer split
> 
> On 10/2/22 00:05, Yuan Wang wrote:
> > Currently, Rx buffer split supports length based split. With Rx queue
> > offload RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT enabled and Rx packet
> segment
> > configured, PMD will be able to split the received packets into
> > multiple segments.
> >
> > However, length based buffer split is not suitable for NICs that do
> > split based on protocol headers. Given an arbitrarily variable length
> > in Rx packet segment, it is almost impossible to pass a fixed protocol
> > header to driver. Besides, the existence of tunneling results in the
> > composition of a packet is various, which makes the situation even worse.
> >
> > This patch extends current buffer split to support protocol header
> > based buffer split. A new proto_hdr field is introduced in the
> > reserved field of rte_eth_rxseg_split structure to specify protocol
> > header. The proto_hdr field defines the split position of packet,
> > splitting will always happen after the protocol header defined in the
> > Rx packet segment. When Rx queue offload
> > RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT is enabled and corresponding
> protocol
> > header is configured, driver will split the ingress packets into multiple
> segments.
> >
> > Examples for proto_hdr field defines:
> > To split after ETH-IPV4-UDP, it should be defined as
> > RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
> RTE_PTYPE_L4_UDP
> >
> > For inner ETH-IPV4-UDP, it should be defined as
> > RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER |
> > RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_UDP
> >
> > struct rte_eth_rxseg_split {
> >          struct rte_mempool *mp; /* memory pools to allocate segment from
> */
> >          uint16_t length; /* segment maximal data length,
> >                              configures split point */
> >          uint16_t offset; /* data offset from beginning
> >                              of mbuf data buffer */
> >          /**
> > 	 * Proto_hdr defines a bit mask of the protocol sequence as
> >           * RTE_PTYPE_*, configures split point. The last RTE_PTYPE*
> >           * in the mask indicates the split position.
> > 	 * For non-tunneling packets, the complete protocol sequence
> >           * should be defined.
> > 	 * For tunneling packets, for simplicity, only the tunnel and
> >           * inner protocol sequence should be defined.
> > 	 */
> >          uint32_t proto_hdr;
> > };
> >
> > If protocol header split can be supported by a PMD, the
> > rte_eth_buffer_split_get_supported_hdr_ptypes function can be use to
> > obtain a list of these protocol headers.
> >
> > For example, let's suppose we configured the Rx queue with the
> > following segments:
> >          seg0 - pool0, proto_hdr0=RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4,
> >                 off0=2B
> >          seg1 - pool1, proto_hdr1=RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4
> >                 | RTE_PTYPE_L4_UDP, off1=128B
> >          seg2 - pool2, off1=0B
> >
> > The packet consists of ETH_IPV4_UDP_PAYLOAD will be split like
> > following:
> >          seg0 - ipv4 header @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from
> pool0
> >          seg1 - udp header @ 128 in mbuf from pool1
> >          seg2 - payload @ 0 in mbuf from pool2
> >
> > Note: NIC will only do split when the packets exactly match all the
> > protocol headers in the segments. For example, if ARP packets received
> > with above config, the NIC won't do split for ARP packets since it
> > does not contains ipv4 header and udp header. These packets will be
> > put into the last valid mempool, with zero offset.
> >
> > Now buffer split can be configured in two modes. For length based
> > buffer split, the mp, length, offset field in Rx packet segment should
> > be configured, while the proto_hdr field will be ignored.
> > For protocol header based buffer split, the mp, offset, proto_hdr
> > field in Rx packet segment should be configured, while the length
> > field will be ignored.
> >
> > The split limitations imposed by underlying driver is reported in the
> > rte_eth_dev_info->rx_seg_capa field. The memory attributes for the
> > split parts may differ either, dpdk memory and external memory,
> respectively.
> >
> > Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
> > Signed-off-by: Xuan Ding <xuan.ding@intel.com>
> > Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
> 
> I apologize for delay with review. Overall LGTM now. See few notes below.

Thanks so much for your time and patience for this patch series.

> 
> > ---
> >   doc/guides/rel_notes/release_22_11.rst |  7 +++
> >   lib/ethdev/rte_ethdev.c                | 74 ++++++++++++++++++++++----
> >   lib/ethdev/rte_ethdev.h                | 29 +++++++++-
> >   3 files changed, 98 insertions(+), 12 deletions(-)
> >
> > diff --git a/doc/guides/rel_notes/release_22_11.rst
> > b/doc/guides/rel_notes/release_22_11.rst
> > index 6a7474a3d6..510869c73a 100644
> > --- a/doc/guides/rel_notes/release_22_11.rst
> > +++ b/doc/guides/rel_notes/release_22_11.rst
> > @@ -101,6 +101,13 @@ New Features
> >     * Added ``rte_eth_buffer_split_get_supported_hdr_ptypes()``, to get
> supported
> >       header protocols of a PMD to split.
> >
> > +* **Added protocol header based buffer split.**
> > +
> > +  * Ethdev: The ``reserved`` field in the ``rte_eth_rxseg_split`` structure is
> > +    replaced with ``proto_hdr`` to support protocol header based buffer
> split.
> > +    User can choose length or protocol header to configure buffer split
> > +    according to NIC's capability.
> > +
> 
> It should be grouped together with other ethdev features.

We will send a new version. For the doc changes, as with patch 1, could you help to adjust the doc?
Thanks very much.

> 
> >
> >   Removed Items
> >   -------------
> > diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c index
> > 1f0a7f8f3f..27ec19faed 100644
> > --- a/lib/ethdev/rte_ethdev.c
> > +++ b/lib/ethdev/rte_ethdev.c
> > @@ -1649,9 +1649,10 @@ rte_eth_dev_is_removed(uint16_t port_id)
> >   }
> >
> >   static int
> > -rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
> > -			     uint16_t n_seg, uint32_t *mbp_buf_size,
> > -			     const struct rte_eth_dev_info *dev_info)
> > +rte_eth_rx_queue_check_split(uint16_t port_id,
> > +			const struct rte_eth_rxseg_split *rx_seg,
> > +			uint16_t n_seg, uint32_t *mbp_buf_size,
> > +			const struct rte_eth_dev_info *dev_info)
> >   {
> >   	const struct rte_eth_rxseg_capa *seg_capa = &dev_info-
> >rx_seg_capa;
> >   	struct rte_mempool *mp_first;
> > @@ -1674,6 +1675,7 @@ rte_eth_rx_queue_check_split(const struct
> rte_eth_rxseg_split *rx_seg,
> >   		struct rte_mempool *mpl = rx_seg[seg_idx].mp;
> >   		uint32_t length = rx_seg[seg_idx].length;
> >   		uint32_t offset = rx_seg[seg_idx].offset;
> > +		uint32_t proto_hdr = rx_seg[seg_idx].proto_hdr;
> >
> >   		if (mpl == NULL) {
> >   			RTE_ETHDEV_LOG(ERR, "null mempool pointer\n");
> @@ -1707,13
> > +1709,63 @@ rte_eth_rx_queue_check_split(const struct
> rte_eth_rxseg_split *rx_seg,
> >   		}
> >   		offset += seg_idx != 0 ? 0 : RTE_PKTMBUF_HEADROOM;
> >   		*mbp_buf_size = rte_pktmbuf_data_room_size(mpl);
> > -		length = length != 0 ? length : *mbp_buf_size;
> > -		if (*mbp_buf_size < length + offset) {
> > -			RTE_ETHDEV_LOG(ERR,
> > -				       "%s mbuf_data_room_size %u < %u
> (segment length=%u + segment offset=%u)\n",
> > -				       mpl->name, *mbp_buf_size,
> > -				       length + offset, length, offset);
> > -			return -EINVAL;
> > +
> > +		if (proto_hdr > 0) {
> > +			/* Split based on protocol headers. */
> 
> Isn't safer here to ensure that segment length is set to 0?
> Just to protect agains misusage etc.

It's a reasonable suggestion and I will take it; please see v8.

> 
> > +
> > +			/* skip the payload */
> 
> Sorry, it is confusing. What do you mean here?

Because setting n proto_hdrs will generate (n+1) segments, if we want to split the packet into n segments, we only need to check the first (n-1) proto_hdrs.
For example, for ETH-IPV4-UDP-PAYLOAD, if we want to split after the UDP header, we only need to set and check the UDP header in the first segment.

Maybe a mask is not a good way to express this, so we will use an index to skip the proto_hdr check for the last segment.
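
As an illustration only (the mempool names below are hypothetical, not part of
the patch), that two-segment case could be described to ethdev roughly as:

    struct rte_eth_rxseg_split segs[2] = {
        {   /* split point: after the ETH-IPV4-UDP headers */
            .mp = hdr_pool,    /* hypothetical header mempool */
            .proto_hdr = RTE_PTYPE_L2_ETHER |
                         RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
                         RTE_PTYPE_L4_UDP,
        },
        {   /* last segment: payload, proto_hdr left as 0 */
            .mp = pay_pool,    /* hypothetical payload mempool */
        },
    };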

> 
> > +			if (proto_hdr == RTE_PTYPE_ALL_MASK)
> > +				continue;
> > +
> > +			int ptype_cnt;
> > +
> > +			ptype_cnt =
> rte_eth_buffer_split_get_supported_hdr_ptypes(port_id, NULL, 0);
> > +			if (ptype_cnt <= 0) {
> > +				RTE_ETHDEV_LOG(ERR,
> > +					"Port %u failed to supported buffer
> split header protocols\n",
> > +					port_id);
> > +				return -EINVAL;
> > +			}
> > +
> > +			uint32_t ptypes[ptype_cnt];
> > +			int i;
> 
> First of all do no mix code and variable declaration.
> It significantly complicates code reading.

Thanks, the code and variable declaration will be separated.

> Second creation of an array on stack based on function return value is very
> dangerours from security point of view - potential stack overflow and
> corresponding vulnerabilities.

The return value is used to determine how much space is needed to store the ptypes. Thanks for pointing out the stack overflow risk; we will use the heap instead.
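
Roughly, the intent is the usual two-call pattern (sketch only, with error
handling trimmed):

    int cnt;
    uint32_t *ptypes;

    cnt = rte_eth_buffer_split_get_supported_hdr_ptypes(port_id, NULL, 0);
    if (cnt <= 0)
        return -EINVAL;
    ptypes = malloc(sizeof(uint32_t) * cnt);
    if (ptypes == NULL)
        return -ENOMEM;
    cnt = rte_eth_buffer_split_get_supported_hdr_ptypes(port_id, ptypes, cnt);
    /* ... look up the requested proto_hdr in ptypes[] ... */
    free(ptypes);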

> 
> > +
> > +			ptype_cnt =
> rte_eth_buffer_split_get_supported_hdr_ptypes(port_id,
> > +
> 	ptypes, ptype_cnt);
> > +			if (ptype_cnt < 0) {
> > +				RTE_ETHDEV_LOG(ERR,
> > +					"Port %u failed to supported buffer
> split header protocols\n",
> > +					port_id);
> > +				return -EINVAL;
> > +			}
> > +
> > +			for (i = 0; i < ptype_cnt; i++)
> > +				if (ptypes[i] == proto_hdr)
> > +					break;
> > +			if (i == ptype_cnt) {
> > +				RTE_ETHDEV_LOG(ERR,
> > +					"Requested Rx split header protocols
> 0x%x is not supported.\n",
> > +					proto_hdr);
> > +				return -EINVAL;
> > +			}
> > +
> > +			if (*mbp_buf_size < offset) {
> 
> The check is obviously insufficient, but I agree that it should be driver
> reponsibility to do extra checks for required space in mbuf.
> 
> > +				RTE_ETHDEV_LOG(ERR,
> > +						"%s
> mbuf_data_room_size %u < %u segment offset)\n",
> > +						mpl->name, *mbp_buf_size,
> > +						offset);
> > +				return -EINVAL;
> > +			}
> > +		} else {
> > +			/* Split at fixed length. */
> > +			length = length != 0 ? length : *mbp_buf_size;
> > +			if (*mbp_buf_size < length + offset) {
> > +				RTE_ETHDEV_LOG(ERR,
> > +					"%s mbuf_data_room_size %u < %u
> (segment length=%u + segment offset=%u)\n",
> > +					mpl->name, *mbp_buf_size,
> > +					length + offset, length, offset);
> > +				return -EINVAL;
> > +			}
> >   		}
> >   	}
> >   	return 0;
> > @@ -1793,7 +1845,7 @@ rte_eth_rx_queue_setup(uint16_t port_id,
> uint16_t rx_queue_id,
> >   		n_seg = rx_conf->rx_nseg;
> >
> >   		if (rx_conf->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT)
> {
> > -			ret = rte_eth_rx_queue_check_split(rx_seg, n_seg,
> > +			ret = rte_eth_rx_queue_check_split(port_id, rx_seg,
> n_seg,
> >   							   &mbp_buf_size,
> >   							   &dev_info);
> >   			if (ret != 0)
> > diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h index
> > cf14e04010..a5f9647bd3 100644
> > --- a/lib/ethdev/rte_ethdev.h
> > +++ b/lib/ethdev/rte_ethdev.h
> > @@ -994,6 +994,9 @@ struct rte_eth_txmode {
> >    *   specified in the first array element, the second buffer, from the
> >    *   pool in the second element, and so on.
> >    *
> > + * - The proto_hdrs in the elements define the split position of
> > + *   received packets.
> > + *
> >    * - The offsets from the segment description elements specify
> >    *   the data offset from the buffer beginning except the first mbuf.
> >    *   The first segment offset is added with RTE_PKTMBUF_HEADROOM.
> > @@ -1015,12 +1018,36 @@ struct rte_eth_txmode {
> >    *     - pool from the last valid element
> >    *     - the buffer size from this pool
> >    *     - zero offset
> > + *
> > + * - Length based buffer split:
> > + *     - mp, length, offset should be configured.
> > + *     - The proto_hdr field will be ignored.
> 
> Looking at the code above I think proto_hdr must be 0.
> 
> > + *
> > + * - Protocol header based buffer split:
> > + *     - mp, offset, proto_hdr should be configured.
> > + *     - The length field will be ignored.
> 
> I'd require length to be 0 to avoid misusage of the API.

Sure, we will fix them in v8.

> 
> > + *
> > + * - For Protocol header based buffer split, if the received packets
> > + *   don't exactly match all protocol headers in the elements, packets
> > + *   will not be split.
> > + *   These packets will be put into:
> > + *     - pool from the last valid element
> > + *     - the buffer size from this pool
> > + *     - zero offset
> 
> Shoundl't be check that dataroom in the last segment mempool is sufficient
> for up to MTU packet if Rx scatter is disabled?

Yes, we will add this check in the last segment.

Thanks,
Yuan

> 
> >    */
> >   struct rte_eth_rxseg_split {
> >   	struct rte_mempool *mp; /**< Memory pool to allocate segment
> from. */
> >   	uint16_t length; /**< Segment data length, configures split point. */
> >   	uint16_t offset; /**< Data offset from beginning of mbuf data buffer.
> */
> > -	uint32_t reserved; /**< Reserved field. */
> > +	/**
> > +	 * Proto_hdr defines a bit mask of the protocol sequence as
> RTE_PTYPE_*,
> > +	 * configures split point. The last RTE_PTYPE* in the mask indicates
> the
> > +	 * split position.
> > +	 * For non-tunneling packets, the complete protocol sequence should
> be defined.
> > +	 * For tunneling packets, for simplicity, only the tunnel and inner
> > +	 * protocol sequence should be defined.
> > +	 */
> > +	uint32_t proto_hdr;
> >   };
> >
> >   /**


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v7 1/4] ethdev: introduce protocol header API
  2022-10-04  2:21       ` Wang, YuanX
@ 2022-10-04  7:52         ` Andrew Rybchenko
  2022-10-04 15:00           ` Wang, YuanX
  0 siblings, 1 reply; 72+ messages in thread
From: Andrew Rybchenko @ 2022-10-04  7:52 UTC (permalink / raw)
  To: Wang, YuanX, dev, Thomas Monjalon, Ferruh Yigit, Ray Kinsella
  Cc: ferruh.yigit, Li, Xiaoyun, Singh, Aman Deep, Zhang, Yuying,
	Zhang, Qi Z, Yang, Qiming, jerinjacobk, viacheslavo, stephen,
	Ding, Xuan, hpothula, Tang, Yaqi, Wenxuan Wu

On 10/4/22 05:21, Wang, YuanX wrote:
> Hi Andrew,
> 
>> -----Original Message-----
>> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
>> Sent: Monday, October 3, 2022 3:04 PM
>> To: Wang, YuanX <yuanx.wang@intel.com>; dev@dpdk.org; Thomas
>> Monjalon <thomas@monjalon.net>; Ferruh Yigit <ferruh.yigit@amd.com>;
>> Ray Kinsella <mdr@ashroe.eu>
>> Cc: ferruh.yigit@xilinx.com; Li, Xiaoyun <xiaoyun.li@intel.com>; Singh, Aman
>> Deep <aman.deep.singh@intel.com>; Zhang, Yuying
>> <yuying.zhang@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>; Yang,
>> Qiming <qiming.yang@intel.com>; jerinjacobk@gmail.com;
>> viacheslavo@nvidia.com; stephen@networkplumber.org; Ding, Xuan
>> <xuan.ding@intel.com>; hpothula@marvell.com; Tang, Yaqi
>> <yaqi.tang@intel.com>; Wenxuan Wu <wenxuanx.wu@intel.com>
>> Subject: Re: [PATCH v7 1/4] ethdev: introduce protocol header API
>>
>> On 10/2/22 00:05, Yuan Wang wrote:
>>> Add a new ethdev API to retrieve supported protocol headers of a PMD,
>>> which helps to configure protocol header based buffer split.
>>>
>>> Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
>>> Signed-off-by: Xuan Ding <xuan.ding@intel.com>
>>> Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
>>> Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
>>> ---
>>>    doc/guides/rel_notes/release_22_11.rst |  5 ++++
>>>    lib/ethdev/ethdev_driver.h             | 15 ++++++++++++
>>>    lib/ethdev/rte_ethdev.c                | 33 ++++++++++++++++++++++++++
>>>    lib/ethdev/rte_ethdev.h                | 30 +++++++++++++++++++++++
>>>    lib/ethdev/version.map                 |  3 +++
>>>    5 files changed, 86 insertions(+)
>>>
>>> diff --git a/doc/guides/rel_notes/release_22_11.rst
>>> b/doc/guides/rel_notes/release_22_11.rst
>>> index 0231959874..6a7474a3d6 100644
>>> --- a/doc/guides/rel_notes/release_22_11.rst
>>> +++ b/doc/guides/rel_notes/release_22_11.rst
>>> @@ -96,6 +96,11 @@ New Features
>>>      * Added ``rte_event_eth_tx_adapter_queue_stop`` to stop the Tx
>> Adapter
>>>        from enqueueing any packets to the Tx queue.
>>>
>>> +* **Added new ethdev API for PMD to get buffer split supported
>>> +protocol types.**
>>> +
>>> +  * Added ``rte_eth_buffer_split_get_supported_hdr_ptypes()``, to get
>> supported
>>> +    header protocols of a PMD to split.
>>> +
>>
>> ethdev features should be grouped together in release notes.
>> I'll fix it on applying if a new version is not required.
> 
> We will send a new version. For the doc changes, I don't understand your point very well.
> Since will be no new changes to the code within this patch, could you help to adjust the doc?
> Thanks very much.

Please, read a comment just after 'New Features' section start.
Hopefully it will make my note clearer.
Anyway, don't worry about it a lot. I can easily fix it on
applying.

> 
>>
>> [snip]
>>
>>> diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c index
>>> 0c2c1088c0..1f0a7f8f3f 100644
>>> --- a/lib/ethdev/rte_ethdev.c
>>> +++ b/lib/ethdev/rte_ethdev.c
>>> @@ -6002,6 +6002,39 @@ rte_eth_dev_priv_dump(uint16_t port_id, FILE
>> *file)
>>>    	return eth_err(port_id, (*dev->dev_ops->eth_dev_priv_dump)(dev,
>> file));
>>>    }
>>>
>>> +int
>>> +rte_eth_buffer_split_get_supported_hdr_ptypes(uint16_t port_id,
>>> +uint32_t *ptypes, int num) {
>>> +	int i, j;
>>> +	struct rte_eth_dev *dev;
>>> +	const uint32_t *all_types;
>>> +
>>> +	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
>>> +	dev = &rte_eth_devices[port_id];
>>> +
>>> +	if (ptypes == NULL && num > 0) {
>>> +		RTE_ETHDEV_LOG(ERR,
>>> +			"Cannot get ethdev port %u supported header
>> protocol types to NULL when array size is non zero\n",
>>> +			port_id);
>>> +		return -EINVAL;
>>> +	}
>>> +
>>> +	if (*dev->dev_ops->buffer_split_supported_hdr_ptypes_get == NULL)
>>> +		return -ENOTSUP;
>>> +	all_types =
>>> +(*dev->dev_ops->buffer_split_supported_hdr_ptypes_get)(dev);
>>> +
>>> +	if (!all_types)
>>
>> Should be compared with NULL explicitly as coding standard says. I can fix it
>> on applying as well.
> 
> Sure, I will fix in v8.
> 
>>
>> [snip]


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v7 2/4] ethdev: introduce protocol hdr based buffer split
  2022-10-04  2:48       ` Wang, YuanX
@ 2022-10-04  8:22         ` Andrew Rybchenko
  2022-10-04 15:01           ` Wang, YuanX
  0 siblings, 1 reply; 72+ messages in thread
From: Andrew Rybchenko @ 2022-10-04  8:22 UTC (permalink / raw)
  To: Wang, YuanX, dev, Thomas Monjalon, Ferruh Yigit
  Cc: ferruh.yigit, mdr, Li, Xiaoyun, Singh, Aman Deep, Zhang, Yuying,
	Zhang, Qi Z, Yang, Qiming, jerinjacobk, viacheslavo, stephen,
	Ding, Xuan, hpothula, Tang, Yaqi

On 10/4/22 05:48, Wang, YuanX wrote:
> Hi Andrew,
> 
>> -----Original Message-----
>> On 10/2/22 00:05, Yuan Wang wrote:
>>> +
>>> +			/* skip the payload */
>>
>> Sorry, it is confusing. What do you mean here?
> 
> Because setting n proto_hdr will generate (n+1) segments. If we want to split the packet into n segments, we only need to check the first (n-1) proto_hdr.
> For example, for ETH-IPV4-UDP-PAYLOAD, if we want to split after the UDP header, we only need to set and check the UDP header in the first segment.
> 
> Maybe mask is not a good way, so we will use index to filter out the check of proto_hdr inside the last segment.

I see your point and understand the problem now.
Thinking a bit more about it I realize that consistency check
here should be more sophisticated.
It should not allow:
  - seg1 - length-based, seg2 - proto-based, seg3 - payload
  - seg1 - proto-based, seg2 - length-based, seg3 - proto-based, seg4 - payload
I.e. no protocol-based split after length-based.
But should allow:
  - seg1 - proto-based, seg2 - length-based, seg3 - payload
I.e. length-based split after protocol-based.

Taking the last point above into account, proto_hdr in the last
segment should be 0 like in length-based split (not
RTE_PTYPE_ALL_MASK).

It is an interesting question how to request:
  - seg1 - ETH, seg2 - IPv4, seg3 - UDP, seg4 - payload
Should we really repeat ETH in seg2->proto_hdr and
seg3->proto_hdr, and IPv4 in seg3->proto_hdr again?
I tend to say no, since when the packet comes to seg2 it already
has no ETH header.

If so, how to handle the configuration when ETH is repeated in seg2?
For example,
   - seg1 ETH+IPv4+UDP
   - seg2 ETH+IPv6+UDP
   - seg3 0
Should we deny it, or should we define behaviour like:
If a packet does not match segX proto_hdr, the segment is
skipped and segX+1 considered.
Of course, not all drivers/HW supports it. If so, such
configuration should be just discarded by the driver itself.
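
For illustration, the ordering rule above could be expressed in
rte_eth_rx_queue_check_split() by something like the following sketch (not
the wording of any posted patch; variable names follow the existing code):

    /* illustration only: forbid protocol-based split after length-based */
    bool length_based_seen = false;
    uint16_t seg_idx;

    for (seg_idx = 0; seg_idx < n_seg; seg_idx++) {
        uint32_t proto_hdr = rx_seg[seg_idx].proto_hdr;

        if (proto_hdr != 0 && length_based_seen)
            return -EINVAL;   /* proto-based follows length-based */
        if (proto_hdr == 0 && seg_idx != n_seg - 1)
            length_based_seen = true;   /* last (payload) seg excluded */
    }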


^ permalink raw reply	[flat|nested] 72+ messages in thread

* RE: [PATCH v7 1/4] ethdev: introduce protocol header API
  2022-10-04  7:52         ` Andrew Rybchenko
@ 2022-10-04 15:00           ` Wang, YuanX
  0 siblings, 0 replies; 72+ messages in thread
From: Wang, YuanX @ 2022-10-04 15:00 UTC (permalink / raw)
  To: Andrew Rybchenko, dev, Thomas Monjalon, Ferruh Yigit, Ray Kinsella
  Cc: ferruh.yigit, Li, Xiaoyun, Singh, Aman Deep, Zhang, Yuying,
	Zhang, Qi Z, Yang, Qiming, jerinjacobk, viacheslavo, stephen,
	Ding, Xuan, hpothula, Tang, Yaqi, Wenxuan Wu

Hi Andrew,

> -----Original Message-----
> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> Sent: Tuesday, October 4, 2022 3:53 PM
> To: Wang, YuanX <yuanx.wang@intel.com>; dev@dpdk.org; Thomas
> Monjalon <thomas@monjalon.net>; Ferruh Yigit <ferruh.yigit@amd.com>;
> Ray Kinsella <mdr@ashroe.eu>
> Cc: ferruh.yigit@xilinx.com; Li, Xiaoyun <xiaoyun.li@intel.com>; Singh, Aman
> Deep <aman.deep.singh@intel.com>; Zhang, Yuying
> <yuying.zhang@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>; Yang,
> Qiming <qiming.yang@intel.com>; jerinjacobk@gmail.com;
> viacheslavo@nvidia.com; stephen@networkplumber.org; Ding, Xuan
> <xuan.ding@intel.com>; hpothula@marvell.com; Tang, Yaqi
> <yaqi.tang@intel.com>; Wenxuan Wu <wenxuanx.wu@intel.com>
> Subject: Re: [PATCH v7 1/4] ethdev: introduce protocol header API
> 
> On 10/4/22 05:21, Wang, YuanX wrote:
> > Hi Andrew,
> >
> >> -----Original Message-----
> >> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> >> Sent: Monday, October 3, 2022 3:04 PM
> >> To: Wang, YuanX <yuanx.wang@intel.com>; dev@dpdk.org; Thomas
> Monjalon
> >> <thomas@monjalon.net>; Ferruh Yigit <ferruh.yigit@amd.com>; Ray
> >> Kinsella <mdr@ashroe.eu>
> >> Cc: ferruh.yigit@xilinx.com; Li, Xiaoyun <xiaoyun.li@intel.com>;
> >> Singh, Aman Deep <aman.deep.singh@intel.com>; Zhang, Yuying
> >> <yuying.zhang@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>; Yang,
> >> Qiming <qiming.yang@intel.com>; jerinjacobk@gmail.com;
> >> viacheslavo@nvidia.com; stephen@networkplumber.org; Ding, Xuan
> >> <xuan.ding@intel.com>; hpothula@marvell.com; Tang, Yaqi
> >> <yaqi.tang@intel.com>; Wenxuan Wu <wenxuanx.wu@intel.com>
> >> Subject: Re: [PATCH v7 1/4] ethdev: introduce protocol header API
> >>
> >> On 10/2/22 00:05, Yuan Wang wrote:
> >>> Add a new ethdev API to retrieve supported protocol headers of a
> >>> PMD, which helps to configure protocol header based buffer split.
> >>>
> >>> Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
> >>> Signed-off-by: Xuan Ding <xuan.ding@intel.com>
> >>> Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
> >>> Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> >>> ---
> >>>    doc/guides/rel_notes/release_22_11.rst |  5 ++++
> >>>    lib/ethdev/ethdev_driver.h             | 15 ++++++++++++
> >>>    lib/ethdev/rte_ethdev.c                | 33 ++++++++++++++++++++++++++
> >>>    lib/ethdev/rte_ethdev.h                | 30 +++++++++++++++++++++++
> >>>    lib/ethdev/version.map                 |  3 +++
> >>>    5 files changed, 86 insertions(+)
> >>>
> >>> diff --git a/doc/guides/rel_notes/release_22_11.rst
> >>> b/doc/guides/rel_notes/release_22_11.rst
> >>> index 0231959874..6a7474a3d6 100644
> >>> --- a/doc/guides/rel_notes/release_22_11.rst
> >>> +++ b/doc/guides/rel_notes/release_22_11.rst
> >>> @@ -96,6 +96,11 @@ New Features
> >>>      * Added ``rte_event_eth_tx_adapter_queue_stop`` to stop the Tx
> >> Adapter
> >>>        from enqueueing any packets to the Tx queue.
> >>>
> >>> +* **Added new ethdev API for PMD to get buffer split supported
> >>> +protocol types.**
> >>> +
> >>> +  * Added ``rte_eth_buffer_split_get_supported_hdr_ptypes()``, to
> >>> + get
> >> supported
> >>> +    header protocols of a PMD to split.
> >>> +
> >>
> >> ethdev features should be grouped together in release notes.
> >> I'll fix it on applying if a new version is not required.
> >
> > We will send a new version. For the doc changes, I don't understand your
> point very well.
> > Since will be no new changes to the code within this patch, could you help
> to adjust the doc?
> > Thanks very much.
> 
> Please, read a comment just after 'New Features' section start.
> Hopefully it will make my note clearer.
> Anyway, don't worry about it a lot. I can easily fix it on applying.

Is it written like the following? If it is not correct, please help to fix it.

* **Added protocol header based buffer split.**

  * Added ``rte_eth_buffer_split_get_supported_hdr_ptypes()``, to get supported
    header protocols of a PMD to split.
  * Ethdev: The ``reserved`` field in the ``rte_eth_rxseg_split`` structure is
    replaced with ``proto_hdr`` to support protocol header based buffer split.
    User can choose length or protocol header to configure buffer split
    according to NIC's capability.

Thanks,
Yuan

 [snip]


^ permalink raw reply	[flat|nested] 72+ messages in thread

* RE: [PATCH v7 2/4] ethdev: introduce protocol hdr based buffer split
  2022-10-04  8:22         ` Andrew Rybchenko
@ 2022-10-04 15:01           ` Wang, YuanX
  0 siblings, 0 replies; 72+ messages in thread
From: Wang, YuanX @ 2022-10-04 15:01 UTC (permalink / raw)
  To: Andrew Rybchenko, dev, Thomas Monjalon, Ferruh Yigit
  Cc: ferruh.yigit, mdr, Li, Xiaoyun, Singh, Aman Deep, Zhang, Yuying,
	Zhang, Qi Z, Yang, Qiming, jerinjacobk, viacheslavo, stephen,
	Ding, Xuan, hpothula, Tang, Yaqi

Hi Andrew,

> -----Original Message-----
> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> Sent: Tuesday, October 4, 2022 4:23 PM
> To: Wang, YuanX <yuanx.wang@intel.com>; dev@dpdk.org; Thomas
> Monjalon <thomas@monjalon.net>; Ferruh Yigit <ferruh.yigit@amd.com>
> Cc: ferruh.yigit@xilinx.com; mdr@ashroe.eu; Li, Xiaoyun
> <xiaoyun.li@intel.com>; Singh, Aman Deep <aman.deep.singh@intel.com>;
> Zhang, Yuying <yuying.zhang@intel.com>; Zhang, Qi Z
> <qi.z.zhang@intel.com>; Yang, Qiming <qiming.yang@intel.com>;
> jerinjacobk@gmail.com; viacheslavo@nvidia.com;
> stephen@networkplumber.org; Ding, Xuan <xuan.ding@intel.com>;
> hpothula@marvell.com; Tang, Yaqi <yaqi.tang@intel.com>
> Subject: Re: [PATCH v7 2/4] ethdev: introduce protocol hdr based buffer split
> 
> On 10/4/22 05:48, Wang, YuanX wrote:
> > Hi Andrew,
> >
> >> -----Original Message-----
> >> On 10/2/22 00:05, Yuan Wang wrote:
> >>> +
> >>> +			/* skip the payload */
> >>
> >> Sorry, it is confusing. What do you mean here?
> >
> > Because setting n proto_hdr will generate (n+1) segments. If we want to
> split the packet into n segments, we only need to check the first (n-1)
> proto_hdr.
> > For example, for ETH-IPV4-UDP-PAYLOAD, if we want to split after the UDP
> header, we only need to set and check the UDP header in the first segment.
> >
> > Maybe mask is not a good way, so we will use index to filter out the check
> of proto_hdr inside the last segment.
> 
> I see your point and understand the problem now.
> Thinking a bit more about it I realize that consistency check here should be
> more sophisticated.
> It should not allow:
>   - seg1 - length-based, seg2 - proto-based, seg3 - payload
>   - seg1 - proto-based, seg2 - legnth-based, seg3 - proto-based, seg4 - payload
> I.e. no protocol-based split after length-based.
> But should allow:
>   - seg1 - proto-based, seg2 - legnth-based, seg3 - payload I.e. length based
> split after protocol-based.
> 
> Taking the last point above into account, proto_hdr in the last segment
> should be 0 like in length-based split (not RTE_PTYPE_ALL_MASK).

Just to confirm, do you mean that the payload as the last segment should be treated as a length-based split (proto_hdr == 0)?
If so, regarding your earlier question 'check that dataroom in the last segment mempool is sufficient for up to MTU packet if Rx scatter is disabled':
is it then unnecessary to compare the MTU size and mbuf_size, since the check in the length-based split path is sufficient? We will send v8 soon with the above in mind; please help to check.

> 
> It is an interesting question how to request:
>   - seg1 - ETH, seg2 - IPv4, seg3 - UDP, seg4 - payload Should we really repeat
> ETH in seg2->proto_hdr and
> seg3->proto_hdr header and IPv4 in seg3->proto_hdr again?
> I tend to say no since when packet comes to seg2 it already has no ETH
> header.
> 
> If so, how to handle configuration when ETH is repeat in seg2?
> For example,
>    - seg1 ETH+IPv4+UDP
>    - seg2 ETH+IPv6+UDP
>    - seg2 0
> Should we deny it or should we define behaviour like.
> If a packet does not match segX proto_hdr, the segment is skipped and
> segX+1 considered.
> Of course, not all drivers/HW supports it. If so, such configuration should be
> just discarded by the driver itself.

Here is a question that needs to be clarified: whether the segments are sequential or independent. I prefer the former because it's more readable. Furthermore, it is consistent with length-based split, which also configures the lengths sequentially. In this case, the following situation does not exist:
- seg1 ETH+IPv4+UDP
- seg2 ETH+IPv6+UDP
- seg3 0

For the case of repeating ETH, such as seg1 - ETH, seg2 - IPv4, seg3 - UDP, seg4 - payload, as you suggested we can omit ETH in the following segment. But IPV4-UDP and IPV6-UDP still need to be distinguished, following our previous discussion (the user wants to split at IPV4-UDP rather than IPV6-UDP although the driver supports both). In this case, for seg1 - ETH, seg2 - IPv4, seg3 - UDP, seg4 - payload,
we set the proto_hdr fields as follows (a rough sketch in code is given after the list):
seg1 proto_hdr1=RTE_PTYPE_L2_ETHER
seg2 proto_hdr2=RTE_PTYPE_L3_IPV4
seg3 proto_hdr3=RTE_PTYPE_L3_IPV4 | RTE_PTYPE_L4_UDP
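
Expressed as segment descriptors, that would be roughly the following (the
mempool names are placeholders):

    struct rte_eth_rxseg_split segs[4] = {
        { .mp = pool_eth, .proto_hdr = RTE_PTYPE_L2_ETHER },
        { .mp = pool_ip,  .proto_hdr = RTE_PTYPE_L3_IPV4 },
        { .mp = pool_udp, .proto_hdr = RTE_PTYPE_L3_IPV4 | RTE_PTYPE_L4_UDP },
        { .mp = pool_pay, .proto_hdr = 0 },    /* seg4 - payload */
    };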

Thanks,
Yuan


^ permalink raw reply	[flat|nested] 72+ messages in thread

* [PATCH v8 0/4] support protocol based buffer split
  2022-08-12 18:15 [PATCH 0/4] support protocol based buffer split Yuan Wang
                   ` (13 preceding siblings ...)
  2022-10-01 21:05 ` [PATCH v7 0/4] support protocol based buffer split Yuan Wang
@ 2022-10-05 23:18 ` Yuan Wang
  2022-10-05 23:18   ` [PATCH v8 1/4] ethdev: introduce protocol header API Yuan Wang
                     ` (4 more replies)
  2022-10-09 20:25 ` [PATCH v9 " Yuan Wang
  15 siblings, 5 replies; 72+ messages in thread
From: Yuan Wang @ 2022-10-05 23:18 UTC (permalink / raw)
  To: dev
  Cc: thomas, andrew.rybchenko, ferruh.yigit, mdr, xiaoyun.li,
	aman.deep.singh, yuying.zhang, qi.z.zhang, qiming.yang,
	jerinjacobk, viacheslavo, stephen, xuan.ding, hpothula,
	yaqi.tang, Yuan Wang

Protocol type based buffer split consists of splitting a received packet
into several separate segments based on the packet content. It is useful
in some scenarios, such as GPU acceleration. The splitting will help to
enable true zero copy and hence improve the performance significantly.

This patchset aims to support protocol header split based on current buffer
split. When Rx queue is configured with RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT
offload and corresponding protocol, packets received will be directly split
into different mempools.

Change log:
v8:
Restrict length to 0 in protocol-based split and proto_hdr to 0 in length-based split.
Add check for proto_hdr == 0 in last segment.
Use heap instead of stack for array to avoid vulnerabilities.
Define the proto_hdr for two segments and multi-segments, respectively.
Separate variable definition and code.
Refine the doc and commit log.

v7:
ice: Fix CI issue.

v6:
ice: Fix proto_hdr mappings to NIC configuration.

v5:
Define proto_hdr to use mask instead of single protocol type.
Define PMD to return protocol header mask.
Refine the doc and commit log.
Remove deprecated RTE_FUNC_PTR_OR_ERR_RET.

v4:
Change proto_hdr to a bit mask of RTE_PTYPE_*.
Add the description on how to put the unsplit packages.
Use proto_hdr to determine whether to use protocol based split.

v3:
Fix mail thread.

v2:
Add mbuf dump to the driver's buffer split path.
Add buffer split to the driver feature list.
Remove unsupported header protocols from the driver.

Yuan Wang (4):
  ethdev: introduce protocol header API
  ethdev: introduce protocol hdr based buffer split
  app/testpmd: add rxhdrs commands and parameters
  net/ice: support buffer split in Rx path

 app/test-pmd/cmdline.c                 | 152 +++++++++++++-
 app/test-pmd/config.c                  |  95 +++++++++
 app/test-pmd/parameters.c              |  16 +-
 app/test-pmd/testpmd.c                 |  11 +-
 app/test-pmd/testpmd.h                 |   6 +
 doc/guides/rel_notes/release_22_11.rst |  13 ++
 drivers/net/ice/ice_ethdev.c           |  97 ++++++++-
 drivers/net/ice/ice_rxtx.c             | 266 ++++++++++++++++++++++---
 drivers/net/ice/ice_rxtx.h             |  16 ++
 drivers/net/ice/ice_rxtx_vec_common.h  |   3 +
 lib/ethdev/ethdev_driver.h             |  15 ++
 lib/ethdev/rte_ethdev.c                | 122 +++++++++++-
 lib/ethdev/rte_ethdev.h                |  64 +++++-
 lib/ethdev/version.map                 |   1 +
 14 files changed, 826 insertions(+), 51 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 72+ messages in thread

* [PATCH v8 1/4] ethdev: introduce protocol header API
  2022-10-05 23:18 ` [PATCH v8 0/4] support protocol based buffer split Yuan Wang
@ 2022-10-05 23:18   ` Yuan Wang
  2022-10-06 10:11     ` Andrew Rybchenko
  2022-10-05 23:18   ` [PATCH v8 2/4] ethdev: introduce protocol hdr based buffer split Yuan Wang
                     ` (3 subsequent siblings)
  4 siblings, 1 reply; 72+ messages in thread
From: Yuan Wang @ 2022-10-05 23:18 UTC (permalink / raw)
  To: dev, Thomas Monjalon, Ferruh Yigit, Andrew Rybchenko, Ray Kinsella
  Cc: ferruh.yigit, xiaoyun.li, aman.deep.singh, yuying.zhang,
	qi.z.zhang, qiming.yang, jerinjacobk, viacheslavo, stephen,
	xuan.ding, hpothula, yaqi.tang, Yuan Wang, Wenxuan Wu

Add a new ethdev API to retrieve supported protocol headers
of a PMD, which helps to configure protocol header based buffer split.

Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
---
 doc/guides/rel_notes/release_22_11.rst |  5 ++++
 lib/ethdev/ethdev_driver.h             | 15 ++++++++++++
 lib/ethdev/rte_ethdev.c                | 33 ++++++++++++++++++++++++++
 lib/ethdev/rte_ethdev.h                | 30 +++++++++++++++++++++++
 lib/ethdev/version.map                 |  1 +
 5 files changed, 84 insertions(+)

diff --git a/doc/guides/rel_notes/release_22_11.rst b/doc/guides/rel_notes/release_22_11.rst
index ac67e7e710..141fd9442b 100644
--- a/doc/guides/rel_notes/release_22_11.rst
+++ b/doc/guides/rel_notes/release_22_11.rst
@@ -123,6 +123,11 @@ New Features
   into single event containing ``rte_event_vector``
   whose event type is ``RTE_EVENT_TYPE_CRYPTODEV_VECTOR``.
 
+* **Added protocol header based buffer split.**
+
+  * Added ``rte_eth_buffer_split_get_supported_hdr_ptypes()``, to get supported
+    header protocols of a PMD to split.
+
 
 Removed Items
 -------------
diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
index 8cd8eb8685..791b264610 100644
--- a/lib/ethdev/ethdev_driver.h
+++ b/lib/ethdev/ethdev_driver.h
@@ -1055,6 +1055,18 @@ typedef int (*eth_ip_reassembly_conf_get_t)(struct rte_eth_dev *dev,
 typedef int (*eth_ip_reassembly_conf_set_t)(struct rte_eth_dev *dev,
 		const struct rte_eth_ip_reassembly_params *conf);
 
+/**
+ * @internal
+ * Get supported header protocols of a PMD to split.
+ *
+ * @param dev
+ *   Ethdev handle of port.
+ *
+ * @return
+ *   An array pointer to store supported protocol headers.
+ */
+typedef const uint32_t *(*eth_buffer_split_supported_hdr_ptypes_get_t)(struct rte_eth_dev *dev);
+
 /**
  * @internal
  * Dump private info from device to a file.
@@ -1302,6 +1314,9 @@ struct eth_dev_ops {
 	/** Set IP reassembly configuration */
 	eth_ip_reassembly_conf_set_t ip_reassembly_conf_set;
 
+	/** Get supported header ptypes to split */
+	eth_buffer_split_supported_hdr_ptypes_get_t buffer_split_supported_hdr_ptypes_get;
+
 	/** Dump private info from device */
 	eth_dev_priv_dump_t eth_dev_priv_dump;
 
diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
index 2821770e2d..ee3b490889 100644
--- a/lib/ethdev/rte_ethdev.c
+++ b/lib/ethdev/rte_ethdev.c
@@ -6045,6 +6045,39 @@ rte_eth_dev_priv_dump(uint16_t port_id, FILE *file)
 	return eth_err(port_id, (*dev->dev_ops->eth_dev_priv_dump)(dev, file));
 }
 
+int
+rte_eth_buffer_split_get_supported_hdr_ptypes(uint16_t port_id, uint32_t *ptypes, int num)
+{
+	int i, j;
+	struct rte_eth_dev *dev;
+	const uint32_t *all_types;
+
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
+	dev = &rte_eth_devices[port_id];
+
+	if (ptypes == NULL && num > 0) {
+		RTE_ETHDEV_LOG(ERR,
+			"Cannot get ethdev port %u supported header protocol types to NULL when array size is non zero\n",
+			port_id);
+		return -EINVAL;
+	}
+
+	if (*dev->dev_ops->buffer_split_supported_hdr_ptypes_get == NULL)
+		return -ENOTSUP;
+	all_types = (*dev->dev_ops->buffer_split_supported_hdr_ptypes_get)(dev);
+
+	if (all_types == NULL)
+		return 0;
+
+	for (i = 0, j = 0; all_types[i] != RTE_PTYPE_UNKNOWN; ++i) {
+		if (j < num)
+			ptypes[j] = all_types[i];
+		j++;
+	}
+
+	return j;
+}
+
 RTE_LOG_REGISTER_DEFAULT(rte_eth_dev_logtype, INFO);
 
 RTE_INIT(ethdev_init_telemetry)
diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
index a21f58b9cd..c51c1f3fa0 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -6025,6 +6025,36 @@ rte_eth_tx_buffer(uint16_t port_id, uint16_t queue_id,
 	return rte_eth_tx_buffer_flush(port_id, queue_id, buffer);
 }
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Get supported header protocols to split on Rx.
+ *
+ * When a packet type is announced to be split, it *must* be supported by
+ * the PMD. For instance, if eth-ipv4, eth-ipv4-udp is announced, the PMD must
+ * return the following packet types for these packets:
+ * - Ether/IPv4             -> RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4
+ * - Ether/IPv4/UDP         -> RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4 | RTE_PTYPE_L4_UDP
+ *
+ * @param port_id
+ *   The port identifier of the device.
+ * @param[out] ptypes
+ *   An array pointer to store supported protocol headers, allocated by caller.
+ *   These ptypes are composed with RTE_PTYPE_*.
+ * @param num
+ *   Size of the array pointed by param ptypes.
+ * @return
+ *   - (>=0) Number of supported ptypes. If the number of types exceeds num,
+ *           only num entries will be filled into the ptypes array, but the full
+ *           count of supported ptypes will be returned.
+ *   - (-ENOTSUP) if header protocol is not supported by device.
+ *   - (-ENODEV) if *port_id* invalid.
+ *   - (-EINVAL) if bad parameter.
+ */
+__rte_experimental
+int rte_eth_buffer_split_get_supported_hdr_ptypes(uint16_t port_id, uint32_t *ptypes, int num);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/ethdev/version.map b/lib/ethdev/version.map
index 3def7bfd24..87f06d4ea6 100644
--- a/lib/ethdev/version.map
+++ b/lib/ethdev/version.map
@@ -288,6 +288,7 @@ EXPERIMENTAL {
 
 	# added in 22.11
 	rte_flow_async_action_handle_query;
+	rte_eth_buffer_split_get_supported_hdr_ptypes;
 };
 
 INTERNAL {
-- 
2.25.1


^ permalink raw reply	[flat|nested] 72+ messages in thread

* [PATCH v8 2/4] ethdev: introduce protocol hdr based buffer split
  2022-10-05 23:18 ` [PATCH v8 0/4] support protocol based buffer split Yuan Wang
  2022-10-05 23:18   ` [PATCH v8 1/4] ethdev: introduce protocol header API Yuan Wang
@ 2022-10-05 23:18   ` Yuan Wang
  2022-10-06 10:11     ` Andrew Rybchenko
  2022-10-05 23:18   ` [PATCH v8 3/4] app/testpmd: add rxhdrs commands and parameters Yuan Wang
                     ` (2 subsequent siblings)
  4 siblings, 1 reply; 72+ messages in thread
From: Yuan Wang @ 2022-10-05 23:18 UTC (permalink / raw)
  To: dev, Thomas Monjalon, Ferruh Yigit, Andrew Rybchenko
  Cc: ferruh.yigit, mdr, xiaoyun.li, aman.deep.singh, yuying.zhang,
	qi.z.zhang, qiming.yang, jerinjacobk, viacheslavo, stephen,
	xuan.ding, hpothula, yaqi.tang, Yuan Wang, Wenxuan Wu

Currently, Rx buffer split supports length based split. With Rx queue
offload RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT enabled and Rx packet segment
configured, PMD will be able to split the received packets into
multiple segments.

However, length based buffer split is not suitable for NICs that do split
based on protocol headers. Given an arbitrarily variable length in Rx
packet segment, it is almost impossible to pass a fixed protocol header to
driver. Besides, the existence of tunneling means that the composition of
a packet varies, which makes the situation even worse.

This patch extends current buffer split to support protocol header based
buffer split. A new proto_hdr field is introduced in the reserved field
of rte_eth_rxseg_split structure to specify protocol header. The proto_hdr
field defines the split position of packet, splitting will always happen
after the protocol header defined in the Rx packet segment. When Rx queue
offload RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT is enabled and corresponding
protocol header is configured, driver will split the ingress packets into
multiple segments.

Examples for proto_hdr field defines:
To split after ETH-IPV4-UDP, it should be defined as
proto_hdr = RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
            RTE_PTYPE_L4_UDP

For inner ETH-IPV4-UDP, it should be defined as
proto_hdr = RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER |
            RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_UDP

If the protocol header is repeated with the previously defined one,
the repeated part can be omitted. For example, split after ETH, ETH-IPV4
and ETH-IPV4-UDP, it should be defined as
proto_hdr0 = RTE_PTYPE_L2_ETHER
proto_hdr1 = RTE_PTYPE_L3_IPV4_EXT_UNKNOWN
proto_hdr2 = RTE_PTYPE_L4_UDP

struct rte_eth_rxseg_split {
        struct rte_mempool *mp; /* memory pools to allocate segment from */
        uint16_t length; /* segment maximal data length,
                            configures split point */
        uint16_t offset; /* data offset from beginning
                            of mbuf data buffer */
        /**
	 * Proto_hdr defines a bit mask of the protocol sequence as
         * RTE_PTYPE_*, configures split point. The last RTE_PTYPE*
         * in the mask indicates the split position.
         * If one protocol header is defined to split packets into two
         * segments, for non-tunneling packets, the complete protocol
         * sequence should be defined.
         * For tunneling packets, for simplicity,
         * only the tunnel and inner part of the complete protocol sequence
         * is required.
         * If several protocol headers are defined to split packets into
         * multi-segments, the repeated parts of adjacent segments
         * should be omitted.
	 */
        uint32_t proto_hdr;
};

If protocol header split can be supported by a PMD, the
rte_eth_buffer_split_get_supported_hdr_ptypes function can
be use to obtain a list of these protocol headers.

For example, let's suppose we configured the Rx queue with the
following segments:
        seg0 - pool0, proto_hdr0=RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4,
               off0=2B
        seg1 - pool1, proto_hdr1=RTE_PTYPE_L4_UDP, off1=128B
        seg2 - pool2, off1=0B

The packet consists of ETH_IPV4_UDP_PAYLOAD will be split like
following:
        seg0 - ipv4 header @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from pool0
        seg1 - udp header @ 128 in mbuf from pool1
        seg2 - payload @ 0 in mbuf from pool2

Note: NIC will only do split when the packets exactly match all the
protocol headers in the segments. For example, if ARP packets are received
with the above config, the NIC won't do split for ARP packets since
they do not contain an ipv4 header and udp header. These packets will be put
into the last valid mempool, with zero offset.
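
For reference, a minimal queue setup matching that example might look like
the sketch below; pool0/pool1/pool2, nb_rxd, socket_id and port_rxconf (the
device's default Rx conf) are placeholders, not part of this patch:

    union rte_eth_rxseg rx_useg[3] = {
        { .split = { .mp = pool0, .offset = 2,
                     .proto_hdr = RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4 } },
        { .split = { .mp = pool1, .offset = 128,
                     .proto_hdr = RTE_PTYPE_L4_UDP } },
        { .split = { .mp = pool2 } },            /* payload, proto_hdr = 0 */
    };
    struct rte_eth_rxconf rxconf = port_rxconf;  /* start from dev defaults */

    rxconf.rx_nseg = 3;
    rxconf.rx_seg = rx_useg;
    rxconf.offloads |= RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT;
    ret = rte_eth_rx_queue_setup(port_id, 0, nb_rxd, socket_id,
                                 &rxconf, NULL);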

Now buffer split can be configured in two modes. For length based
buffer split, the mp, length, offset field in Rx packet segment should
be configured, while the proto_hdr field will be ignored.
For protocol header based buffer split, the mp, offset, proto_hdr field
in Rx packet segment should be configured, while the length field will
be ignored.

The split limitations imposed by the underlying driver are reported in the
rte_eth_dev_info->rx_seg_capa field. The memory attributes for the split
parts may differ as well, e.g. one part in dpdk memory and another in
external memory.

Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
---
 doc/guides/rel_notes/release_22_11.rst |  4 ++
 lib/ethdev/rte_ethdev.c                | 89 ++++++++++++++++++++++----
 lib/ethdev/rte_ethdev.h                | 34 +++++++++-
 3 files changed, 115 insertions(+), 12 deletions(-)

diff --git a/doc/guides/rel_notes/release_22_11.rst b/doc/guides/rel_notes/release_22_11.rst
index 141fd9442b..4c3a7f8b8b 100644
--- a/doc/guides/rel_notes/release_22_11.rst
+++ b/doc/guides/rel_notes/release_22_11.rst
@@ -127,6 +127,10 @@ New Features
 
   * Added ``rte_eth_buffer_split_get_supported_hdr_ptypes()``, to get supported
     header protocols of a PMD to split.
+  * Ethdev: The ``reserved`` field in the ``rte_eth_rxseg_split`` structure is
+    replaced with ``proto_hdr`` to support protocol header based buffer split.
+    User can choose length or protocol header to configure buffer split
+    according to NIC's capability.
 
 
 Removed Items
diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
index ee3b490889..60fe6eb2bd 100644
--- a/lib/ethdev/rte_ethdev.c
+++ b/lib/ethdev/rte_ethdev.c
@@ -1650,14 +1650,18 @@ rte_eth_dev_is_removed(uint16_t port_id)
 }
 
 static int
-rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
-			     uint16_t n_seg, uint32_t *mbp_buf_size,
-			     const struct rte_eth_dev_info *dev_info)
+rte_eth_rx_queue_check_split(uint16_t port_id,
+			const struct rte_eth_rxseg_split *rx_seg,
+			uint16_t n_seg, uint32_t *mbp_buf_size,
+			const struct rte_eth_dev_info *dev_info)
 {
 	const struct rte_eth_rxseg_capa *seg_capa = &dev_info->rx_seg_capa;
 	struct rte_mempool *mp_first;
 	uint32_t offset_mask;
 	uint16_t seg_idx;
+	int ptype_cnt;
+	uint32_t *ptypes;
+	int i;
 
 	if (n_seg > seg_capa->max_nseg) {
 		RTE_ETHDEV_LOG(ERR,
@@ -1675,6 +1679,7 @@ rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
 		struct rte_mempool *mpl = rx_seg[seg_idx].mp;
 		uint32_t length = rx_seg[seg_idx].length;
 		uint32_t offset = rx_seg[seg_idx].offset;
+		uint32_t proto_hdr = rx_seg[seg_idx].proto_hdr;
 
 		if (mpl == NULL) {
 			RTE_ETHDEV_LOG(ERR, "null mempool pointer\n");
@@ -1708,13 +1713,75 @@ rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
 		}
 		offset += seg_idx != 0 ? 0 : RTE_PKTMBUF_HEADROOM;
 		*mbp_buf_size = rte_pktmbuf_data_room_size(mpl);
-		length = length != 0 ? length : *mbp_buf_size;
-		if (*mbp_buf_size < length + offset) {
-			RTE_ETHDEV_LOG(ERR,
-				       "%s mbuf_data_room_size %u < %u (segment length=%u + segment offset=%u)\n",
-				       mpl->name, *mbp_buf_size,
-				       length + offset, length, offset);
-			return -EINVAL;
+
+		if (proto_hdr > 0) {
+			/* Split based on protocol headers. */
+			if (length != 0) {
+				RTE_ETHDEV_LOG(ERR,
+					"Do not set length split and protocol split within a segment\n"
+					);
+				return -EINVAL;
+			}
+
+			if (seg_idx == n_seg - 1) {
+				RTE_ETHDEV_LOG(ERR,
+					"The proto_hdr in the last segment should be 0\n"
+					);
+				return -EINVAL;
+			}
+
+			if (*mbp_buf_size < offset) {
+				RTE_ETHDEV_LOG(ERR,
+						"%s mbuf_data_room_size %u < %u segment offset)\n",
+						mpl->name, *mbp_buf_size,
+						offset);
+				return -EINVAL;
+			}
+
+			ptype_cnt = rte_eth_buffer_split_get_supported_hdr_ptypes(port_id, NULL, 0);
+			if (ptype_cnt <= 0) {
+				RTE_ETHDEV_LOG(ERR,
+					"Port %u failed to get supported buffer split header protocols\n",
+					port_id);
+				return -EINVAL;
+			}
+
+			ptypes = malloc(sizeof(uint32_t) * ptype_cnt);
+			if (ptypes == NULL)
+				return -ENOMEM;
+
+			ptype_cnt = rte_eth_buffer_split_get_supported_hdr_ptypes(port_id,
+										ptypes, ptype_cnt);
+			if (ptype_cnt < 0) {
+				RTE_ETHDEV_LOG(ERR,
+					"Port %u failed to get supported buffer split header protocols\n",
+					port_id);
+				free(ptypes);
+				return -EINVAL;
+			}
+
+			for (i = 0; i < ptype_cnt; i++)
+				if (ptypes[i] == proto_hdr)
+					break;
+
+			free(ptypes);
+
+			if (i == ptype_cnt) {
+				RTE_ETHDEV_LOG(ERR,
+					"Requested Rx split header protocol 0x%x is not supported.\n",
+					proto_hdr);
+				return -EINVAL;
+			}
+		} else {
+			/* Split at fixed length. */
+			length = length != 0 ? length : *mbp_buf_size;
+			if (*mbp_buf_size < length + offset) {
+				RTE_ETHDEV_LOG(ERR,
+					"%s mbuf_data_room_size %u < %u (segment length=%u + segment offset=%u)\n",
+					mpl->name, *mbp_buf_size,
+					length + offset, length, offset);
+				return -EINVAL;
+			}
 		}
 	}
 	return 0;
@@ -1794,7 +1861,7 @@ rte_eth_rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
 		n_seg = rx_conf->rx_nseg;
 
 		if (rx_conf->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) {
-			ret = rte_eth_rx_queue_check_split(rx_seg, n_seg,
+			ret = rte_eth_rx_queue_check_split(port_id, rx_seg, n_seg,
 							   &mbp_buf_size,
 							   &dev_info);
 			if (ret != 0)
diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
index c51c1f3fa0..4c9b121355 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -994,6 +994,9 @@ struct rte_eth_txmode {
  *   specified in the first array element, the second buffer, from the
  *   pool in the second element, and so on.
  *
+ * - The proto_hdrs in the elements define the split position of
+ *   received packets.
+ *
  * - The offsets from the segment description elements specify
  *   the data offset from the buffer beginning except the first mbuf.
  *   The first segment offset is added with RTE_PKTMBUF_HEADROOM.
@@ -1015,12 +1018,41 @@ struct rte_eth_txmode {
  *     - pool from the last valid element
  *     - the buffer size from this pool
  *     - zero offset
+ *
+ * - Length based buffer split:
+ *     - mp, length, offset should be configured.
+ *     - The proto_hdr field must be 0.
+ *
+ * - Protocol header based buffer split:
+ *     - mp, offset, proto_hdr should be configured.
+ *     - The length field must be 0.
+ *     - The proto_hdr field in the last segment should be 0.
+ *
+ * - For Protocol header based buffer split, if the received packets
+ *   don't exactly match all protocol headers in the elements, packets
+ *   will not be split.
+ *   These packets will be put into:
+ *     - pool from the last valid element
+ *     - the buffer size from this pool
+ *     - zero offset
  */
 struct rte_eth_rxseg_split {
 	struct rte_mempool *mp; /**< Memory pool to allocate segment from. */
 	uint16_t length; /**< Segment data length, configures split point. */
 	uint16_t offset; /**< Data offset from beginning of mbuf data buffer. */
-	uint32_t reserved; /**< Reserved field. */
+	/**
+	 * Proto_hdr defines a bit mask of the protocol sequence as RTE_PTYPE_*,
+	 * configures split point. The last RTE_PTYPE* in the mask indicates the
+	 * split position.
+	 *
+	 * If one protocol header is defined to split packets into two segments,
+	 * for non-tunneling packets, the complete protocol sequence should be defined.
+	 * For tunneling packets, for simplicity, only the tunnel and inner part of
+	 * the complete protocol sequence is required.
+	 * If several protocol headers are defined to split packets into multi-segments,
+	 * the repeated parts of adjacent segments should be omitted.
+	 */
+	uint32_t proto_hdr;
 };
 
 /**
-- 
2.25.1


^ permalink raw reply	[flat|nested] 72+ messages in thread

* [PATCH v8 3/4] app/testpmd: add rxhdrs commands and parameters
  2022-10-05 23:18 ` [PATCH v8 0/4] support protocol based buffer split Yuan Wang
  2022-10-05 23:18   ` [PATCH v8 1/4] ethdev: introduce protocol header API Yuan Wang
  2022-10-05 23:18   ` [PATCH v8 2/4] ethdev: introduce protocol hdr based buffer split Yuan Wang
@ 2022-10-05 23:18   ` Yuan Wang
  2022-10-06 10:12     ` Andrew Rybchenko
  2022-10-05 23:18   ` [PATCH v8 4/4] net/ice: support buffer split in Rx path Yuan Wang
  2022-10-06 10:13   ` [PATCH v8 0/4] support protocol based buffer split Andrew Rybchenko
  4 siblings, 1 reply; 72+ messages in thread
From: Yuan Wang @ 2022-10-05 23:18 UTC (permalink / raw)
  To: dev, Aman Singh, Yuying Zhang
  Cc: thomas, andrew.rybchenko, ferruh.yigit, mdr, xiaoyun.li,
	qi.z.zhang, qiming.yang, jerinjacobk, viacheslavo, stephen,
	xuan.ding, hpothula, yaqi.tang, Yuan Wang, Wenxuan Wu

Add command line parameter:
--rxhdrs=eth[,ipv4,udp]

Set the protocol_hdr of segments to scatter packets on receiving if
the split feature is engaged, for the queues configured with the
BUFFER_SPLIT flag.

Add interactive mode command:
testpmd>set rxhdrs eth,ipv4,udp
(protocol sequence should be valid)

The protocol split feature is off by default. To enable protocol split,
you need:
1. Start testpmd with multiple mempools. E.g. --mbuf-size=2048,2048
2. Configure Rx queue with rx_offload buffer split on.
3. Set the protocol type of buffer split. E.g. set rxhdrs eth,eth-ipv4
        (default protocols of testpmd : eth|ipv4|ipv6|ipv4-tcp|ipv6-tcp|
         ipv4-udp|ipv6-udp|ipv4-sctp|ipv6-sctp|grenat|inner-eth|
         inner-ipv4|inner-ipv6|inner-ipv4-tcp|inner-ipv6-tcp|
         inner-ipv4-udp|inner-ipv6-udp|inner-ipv4-sctp|inner-ipv6-sctp)
The above protocols can be configured in testpmd, but the configuration can
only be applied when it is supported by the specific PMD.
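
As an illustration (the EAL core/memory options and the protocol list are
arbitrary here), a run combining the steps above could look like:

  ./dpdk-testpmd -l 0-2 -n 4 -- -i --mbuf-size=2048,2048 --rxhdrs=eth,ipv4,udp
  testpmd> show config rxhdrs
  testpmd> set rxhdrs eth,ipv4-udp
  testpmd> start

The buffer split Rx offload itself (step 2) still has to be enabled on the
queues for the configured protocol headers to take effect.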

Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
---
 app/test-pmd/cmdline.c    | 152 +++++++++++++++++++++++++++++++++++++-
 app/test-pmd/config.c     |  95 ++++++++++++++++++++++++
 app/test-pmd/parameters.c |  16 +++-
 app/test-pmd/testpmd.c    |  11 ++-
 app/test-pmd/testpmd.h    |   6 ++
 5 files changed, 273 insertions(+), 7 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index ba749f588a..49e0321786 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -181,7 +181,7 @@ static void cmd_help_long_parsed(void *parsed_result,
 			"show (rxq|txq) info (port_id) (queue_id)\n"
 			"    Display information for configured RX/TX queue.\n\n"
 
-			"show config (rxtx|cores|fwd|rxoffs|rxpkts|txpkts)\n"
+			"show config (rxtx|cores|fwd|rxoffs|rxpkts|rxhdrs|txpkts)\n"
 			"    Display the given configuration.\n\n"
 
 			"read rxd (port_id) (queue_id) (rxd_id)\n"
@@ -305,6 +305,17 @@ static void cmd_help_long_parsed(void *parsed_result,
 			" Affects only the queues configured with split"
 			" offloads.\n\n"
 
+			"set rxhdrs (eth[,ipv4,udp])*\n"
+			"    Set the protocol hdr of each segment to scatter"
+			" packets on receiving if split feature is engaged."
+			" Affects only the queues configured with split"
+			" offloads.\n"
+			"    Supported values: eth|ipv4|ipv6|ipv4-tcp|ipv6-tcp|"
+			"ipv4-udp|ipv6-udp|ipv4-sctp|ipv6-sctp|"
+			"grenat|inner-eth|inner-ipv4|inner-ipv6|inner-ipv4-tcp|"
+			"inner-ipv6-tcp|inner-ipv4-udp|inner-ipv6-udp|"
+			"inner-ipv4-sctp|inner-ipv6-sctp\n\n"
+
 			"set txpkts (x[,y]*)\n"
 			"    Set the length of each segment of TXONLY"
 			" and optionally CSUM packets.\n\n"
@@ -3366,6 +3377,94 @@ static cmdline_parse_inst_t cmd_stop = {
 	},
 };
 
+static unsigned int
+get_ptype(char *value)
+{
+	uint32_t protocol;
+
+	if (!strcmp(value, "eth"))
+		protocol = RTE_PTYPE_L2_ETHER;
+	else if (!strcmp(value, "ipv4"))
+		protocol = RTE_PTYPE_L3_IPV4_EXT_UNKNOWN;
+	else if (!strcmp(value, "ipv6"))
+		protocol = RTE_PTYPE_L3_IPV6_EXT_UNKNOWN;
+	else if (!strcmp(value, "ipv4-tcp"))
+		protocol = RTE_PTYPE_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_L4_TCP;
+	else if (!strcmp(value, "ipv6-tcp"))
+		protocol = RTE_PTYPE_L3_IPV6_EXT_UNKNOWN | RTE_PTYPE_L4_TCP;
+	else if (!strcmp(value, "ipv4-udp"))
+		protocol = RTE_PTYPE_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_L4_UDP;
+	else if (!strcmp(value, "ipv6-udp"))
+		protocol = RTE_PTYPE_L3_IPV6_EXT_UNKNOWN | RTE_PTYPE_L4_UDP;
+	else if (!strcmp(value, "ipv4-sctp"))
+		protocol = RTE_PTYPE_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_L4_SCTP;
+	else if (!strcmp(value, "ipv6-sctp"))
+		protocol = RTE_PTYPE_L3_IPV6_EXT_UNKNOWN | RTE_PTYPE_L4_SCTP;
+	else if (!strcmp(value, "tcp"))
+		protocol = RTE_PTYPE_L4_TCP;
+	else if (!strcmp(value, "udp"))
+		protocol = RTE_PTYPE_L4_UDP;
+	else if (!strcmp(value, "sctp"))
+		protocol = RTE_PTYPE_L4_SCTP;
+	else if (!strcmp(value, "grenat"))
+		protocol = RTE_PTYPE_TUNNEL_GRENAT;
+	else if (!strcmp(value, "inner-eth"))
+		protocol = RTE_PTYPE_INNER_L2_ETHER;
+	else if (!strcmp(value, "inner-ipv4"))
+		protocol = RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN;
+	else if (!strcmp(value, "inner-ipv6"))
+		protocol = RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN;
+	else if (!strcmp(value, "inner-ipv4-tcp"))
+		protocol = RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_TCP;
+	else if (!strcmp(value, "inner-ipv6-tcp"))
+		protocol = RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_TCP;
+	else if (!strcmp(value, "inner-ipv4-udp"))
+		protocol = RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_UDP;
+	else if (!strcmp(value, "inner-ipv6-udp"))
+		protocol = RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_UDP;
+	else if (!strcmp(value, "inner-ipv4-sctp"))
+		protocol = RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_SCTP;
+	else if (!strcmp(value, "inner-ipv6-sctp"))
+		protocol = RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_SCTP;
+	else if (!strcmp(value, "inner-tcp"))
+		protocol = RTE_PTYPE_INNER_L4_TCP;
+	else if (!strcmp(value, "inner-udp"))
+		protocol = RTE_PTYPE_INNER_L4_UDP;
+	else if (!strcmp(value, "inner-sctp"))
+		protocol = RTE_PTYPE_INNER_L4_SCTP;
+	else {
+		fprintf(stderr, "Unsupported protocol: %s\n", value);
+		protocol = RTE_PTYPE_UNKNOWN;
+	}
+
+	return protocol;
+}
+/* *** SET RXHDRSLIST *** */
+
+unsigned int
+parse_hdrs_list(const char *str, const char *item_name, unsigned int max_items,
+				unsigned int *parsed_items, int check_hdrs_sequence)
+{
+	unsigned int nb_item;
+	char *cur;
+	char *tmp;
+
+	nb_item = 0;
+	char *str2 = strdup(str);
+	cur = strtok_r(str2, ",", &tmp);
+	while (cur != NULL) {
+		parsed_items[nb_item] = get_ptype(cur);
+		cur = strtok_r(NULL, ",", &tmp);
+		nb_item++;
+	}
+	if (nb_item > max_items)
+		fprintf(stderr, "Number of %s = %u > %u (maximum items)\n",
+			item_name, nb_item, max_items);
+	free(str2);
+	if (!check_hdrs_sequence)
+		return nb_item;
+	return nb_item;
+}
 /* *** SET CORELIST and PORTLIST CONFIGURATION *** */
 
 unsigned int
@@ -3735,6 +3834,50 @@ static cmdline_parse_inst_t cmd_set_rxpkts = {
 	},
 };
 
+/* *** SET SEGMENT HEADERS OF RX PACKETS SPLIT *** */
+struct cmd_set_rxhdrs_result {
+	cmdline_fixed_string_t set;
+	cmdline_fixed_string_t rxhdrs;
+	cmdline_fixed_string_t values;
+};
+
+static void
+cmd_set_rxhdrs_parsed(void *parsed_result,
+		      __rte_unused struct cmdline *cl,
+		      __rte_unused void *data)
+{
+	struct cmd_set_rxhdrs_result *res;
+	unsigned int seg_hdrs[MAX_SEGS_BUFFER_SPLIT];
+	unsigned int nb_segs;
+
+	res = parsed_result;
+	nb_segs = parse_hdrs_list(res->values, "segment hdrs",
+				  MAX_SEGS_BUFFER_SPLIT, seg_hdrs, 0);
+	if (nb_segs > 0)
+		set_rx_pkt_hdrs(seg_hdrs, nb_segs);
+	cmd_reconfig_device_queue(RTE_PORT_ALL, 0, 1);
+}
+static cmdline_parse_token_string_t cmd_set_rxhdrs_set =
+	TOKEN_STRING_INITIALIZER(struct cmd_set_rxhdrs_result,
+				set, "set");
+static cmdline_parse_token_string_t cmd_set_rxhdrs_rxhdrs =
+	TOKEN_STRING_INITIALIZER(struct cmd_set_rxhdrs_result,
+				rxhdrs, "rxhdrs");
+static cmdline_parse_token_string_t cmd_set_rxhdrs_values =
+	TOKEN_STRING_INITIALIZER(struct cmd_set_rxhdrs_result,
+				values, NULL);
+
+static cmdline_parse_inst_t cmd_set_rxhdrs = {
+	.f = cmd_set_rxhdrs_parsed,
+	.data = NULL,
+	.help_str = "set rxhdrs <eth[,ipv4,udp]*>",
+	.tokens = {
+		(void *)&cmd_set_rxhdrs_set,
+		(void *)&cmd_set_rxhdrs_rxhdrs,
+		(void *)&cmd_set_rxhdrs_values,
+		NULL,
+	},
+};
 /* *** SET SEGMENT LENGTHS OF TXONLY PACKETS *** */
 
 struct cmd_set_txpkts_result {
@@ -6487,6 +6630,8 @@ static void cmd_showcfg_parsed(void *parsed_result,
 		show_rx_pkt_offsets();
 	else if (!strcmp(res->what, "rxpkts"))
 		show_rx_pkt_segments();
+	else if (!strcmp(res->what, "rxhdrs"))
+		show_rx_pkt_hdrs();
 	else if (!strcmp(res->what, "txpkts"))
 		show_tx_pkt_segments();
 	else if (!strcmp(res->what, "txtimes"))
@@ -6499,12 +6644,12 @@ static cmdline_parse_token_string_t cmd_showcfg_port =
 	TOKEN_STRING_INITIALIZER(struct cmd_showcfg_result, cfg, "config");
 static cmdline_parse_token_string_t cmd_showcfg_what =
 	TOKEN_STRING_INITIALIZER(struct cmd_showcfg_result, what,
-				 "rxtx#cores#fwd#rxoffs#rxpkts#txpkts#txtimes");
+				 "rxtx#cores#fwd#rxoffs#rxpkts#rxhdrs#txpkts#txtimes");
 
 static cmdline_parse_inst_t cmd_showcfg = {
 	.f = cmd_showcfg_parsed,
 	.data = NULL,
-	.help_str = "show config rxtx|cores|fwd|rxoffs|rxpkts|txpkts|txtimes",
+	.help_str = "show config rxtx|cores|fwd|rxoffs|rxpkts|rxhdrs|txpkts|txtimes",
 	.tokens = {
 		(void *)&cmd_showcfg_show,
 		(void *)&cmd_showcfg_port,
@@ -12455,6 +12600,7 @@ static cmdline_parse_ctx_t builtin_ctx[] = {
 	(cmdline_parse_inst_t *)&cmd_set_log,
 	(cmdline_parse_inst_t *)&cmd_set_rxoffs,
 	(cmdline_parse_inst_t *)&cmd_set_rxpkts,
+	(cmdline_parse_inst_t *)&cmd_set_rxhdrs,
 	(cmdline_parse_inst_t *)&cmd_set_txpkts,
 	(cmdline_parse_inst_t *)&cmd_set_txsplit,
 	(cmdline_parse_inst_t *)&cmd_set_txtimes,
diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index b57fb97f2e..802f555fcc 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -4844,6 +4844,101 @@ show_rx_pkt_segments(void)
 	}
 }
 
+static const char *get_ptype_str(uint32_t ptype)
+{
+	switch (ptype) {
+	case RTE_PTYPE_L2_ETHER:
+		return "eth";
+	case RTE_PTYPE_L3_IPV4_EXT_UNKNOWN:
+		return "ipv4";
+	case RTE_PTYPE_L3_IPV6_EXT_UNKNOWN:
+		return "ipv6";
+	case RTE_PTYPE_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_L4_TCP:
+		return "ipv4-tcp";
+	case RTE_PTYPE_L3_IPV6_EXT_UNKNOWN | RTE_PTYPE_L4_TCP:
+		return "ipv6-tcp";
+	case RTE_PTYPE_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_L4_UDP:
+		return "ipv4-udp";
+	case RTE_PTYPE_L3_IPV6_EXT_UNKNOWN | RTE_PTYPE_L4_UDP:
+		return "ipv6-udp";
+	case RTE_PTYPE_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_L4_SCTP:
+		return "ipv4-sctp";
+	case RTE_PTYPE_L3_IPV6_EXT_UNKNOWN | RTE_PTYPE_L4_SCTP:
+		return "ipv6-sctp";
+	case RTE_PTYPE_L4_TCP:
+		return "tcp";
+	case RTE_PTYPE_L4_UDP:
+		return "udp";
+	case RTE_PTYPE_L4_SCTP:
+		return "sctp";
+	case RTE_PTYPE_TUNNEL_GRENAT:
+		return "grenat";
+	case RTE_PTYPE_INNER_L2_ETHER:
+		return "inner-eth";
+	case RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN:
+		return "inner-ipv4";
+	case RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN:
+		return "inner-ipv6";
+	case RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_TCP:
+		return "inner-ipv4-tcp";
+	case RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_TCP:
+		return "inner-ipv6-tcp";
+	case RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_UDP:
+		return "inner-ipv4-udp";
+	case RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_UDP:
+		return "inner-ipv6-udp";
+	case RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_SCTP:
+		return "inner-ipv4-sctp";
+	case RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_SCTP:
+		return "inner-ipv6-sctp";
+	case RTE_PTYPE_INNER_L4_TCP:
+		return "inner-tcp";
+	case RTE_PTYPE_INNER_L4_UDP:
+		return "inner-udp";
+	case RTE_PTYPE_INNER_L4_SCTP:
+		return "inner-sctp";
+	default:
+		return "unsupported";
+	}
+}
+
+void
+show_rx_pkt_hdrs(void)
+{
+	uint32_t i, n;
+
+	n = rx_pkt_nb_segs;
+	printf("Number of segments: %u\n", n);
+	if (n) {
+		printf("Packet segs: ");
+		for (i = 0; i < n - 1; i++)
+			printf("%s, ", get_ptype_str(rx_pkt_hdr_protos[i]));
+		printf("payload\n");
+	}
+}
+
+void
+set_rx_pkt_hdrs(unsigned int *seg_hdrs, unsigned int nb_segs)
+{
+	unsigned int i;
+
+	if (nb_segs + 1 > MAX_SEGS_BUFFER_SPLIT) {
+		printf("nb segments per RX packets=%u > "
+		       "MAX_SEGS_BUFFER_SPLIT - ignored\n", nb_segs + 1);
+		return;
+	}
+
+	memset(rx_pkt_hdr_protos, 0, sizeof(rx_pkt_hdr_protos));
+
+	for (i = 0; i < nb_segs; i++)
+		rx_pkt_hdr_protos[i] = (uint32_t)seg_hdrs[i];
+	/*
+	 * The number of hdrs does not include the payload segment,
+	 * so rx_pkt_nb_segs is nb_segs plus one.
+	 */
+	rx_pkt_nb_segs = nb_segs + 1;
+}
+
 void
 set_rx_pkt_segments(unsigned int *seg_lengths, unsigned int nb_segs)
 {
diff --git a/app/test-pmd/parameters.c b/app/test-pmd/parameters.c
index 1024b5419c..5bf4219c46 100644
--- a/app/test-pmd/parameters.c
+++ b/app/test-pmd/parameters.c
@@ -152,6 +152,7 @@ usage(char* progname)
 	       " Used mainly with PCAP drivers.\n");
 	printf("  --rxoffs=X[,Y]*: set RX segment offsets for split.\n");
 	printf("  --rxpkts=X[,Y]*: set RX segment sizes to split.\n");
+	printf("  --rxhdrs=eth[,ipv4]*: set RX segment protocol to split.\n");
 	printf("  --txpkts=X[,Y]*: set TX segment sizes"
 		" or total packet length.\n");
 	printf("  --txonly-multi-flow: generate multiple flows in txonly mode\n");
@@ -660,6 +661,7 @@ launch_args_parse(int argc, char** argv)
 		{ "flow-isolate-all",	        0, 0, 0 },
 		{ "rxoffs",			1, 0, 0 },
 		{ "rxpkts",			1, 0, 0 },
+		{ "rxhdrs",			1, 0, 0 },
 		{ "txpkts",			1, 0, 0 },
 		{ "txonly-multi-flow",		0, 0, 0 },
 		{ "rxq-share",			2, 0, 0 },
@@ -1254,7 +1256,6 @@ launch_args_parse(int argc, char** argv)
 			if (!strcmp(lgopts[opt_idx].name, "rxpkts")) {
 				unsigned int seg_len[MAX_SEGS_BUFFER_SPLIT];
 				unsigned int nb_segs;
-
 				nb_segs = parse_item_list
 						(optarg, "rxpkt segments",
 						 MAX_SEGS_BUFFER_SPLIT,
@@ -1264,6 +1265,19 @@ launch_args_parse(int argc, char** argv)
 				else
 					rte_exit(EXIT_FAILURE, "bad rxpkts\n");
 			}
+			if (!strcmp(lgopts[opt_idx].name, "rxhdrs")) {
+				unsigned int seg_hdrs[MAX_SEGS_BUFFER_SPLIT];
+				unsigned int nb_segs;
+
+				nb_segs = parse_hdrs_list
+						(optarg, "rxhdr segments",
+						MAX_SEGS_BUFFER_SPLIT,
+						seg_hdrs, 0);
+				if (nb_segs > 0)
+					set_rx_pkt_hdrs(seg_hdrs, nb_segs);
+				else
+					rte_exit(EXIT_FAILURE, "bad rxhdrs\n");
+			}
 			if (!strcmp(lgopts[opt_idx].name, "txpkts")) {
 				unsigned seg_lengths[RTE_MAX_SEGS_PER_PKT];
 				unsigned int nb_segs;
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index de6ad00138..f5b7ee6e27 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -247,6 +247,7 @@ uint16_t rx_pkt_seg_lengths[MAX_SEGS_BUFFER_SPLIT];
 uint8_t  rx_pkt_nb_segs; /**< Number of segments to split */
 uint16_t rx_pkt_seg_offsets[MAX_SEGS_BUFFER_SPLIT];
 uint8_t  rx_pkt_nb_offs; /**< Number of specified offsets */
+uint32_t rx_pkt_hdr_protos[MAX_SEGS_BUFFER_SPLIT];
 
 /*
  * Configuration of packet segments used by the "txonly" processing engine.
@@ -2646,12 +2647,16 @@ rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
 		mp_n = (i >= mbuf_data_size_n) ? mbuf_data_size_n - 1 : i;
 		mpx = mbuf_pool_find(socket_id, mp_n);
 		/* Handle zero as mbuf data buffer size. */
-		rx_seg->length = rx_pkt_seg_lengths[i] ?
-				   rx_pkt_seg_lengths[i] :
-				   mbuf_data_size[mp_n];
 		rx_seg->offset = i < rx_pkt_nb_offs ?
 				   rx_pkt_seg_offsets[i] : 0;
 		rx_seg->mp = mpx ? mpx : mp;
+		if (rx_pkt_hdr_protos[i] != 0 && rx_pkt_seg_lengths[i] == 0) {
+			rx_seg->proto_hdr = rx_pkt_hdr_protos[i];
+		} else {
+			rx_seg->length = rx_pkt_seg_lengths[i] ?
+					rx_pkt_seg_lengths[i] :
+					mbuf_data_size[mp_n];
+		}
 	}
 	rx_conf->rx_nseg = rx_pkt_nb_segs;
 	rx_conf->rx_seg = rx_useg;
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index a7b8565a6d..f58a7b4d5d 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -580,6 +580,7 @@ extern uint32_t max_rx_pkt_len;
  * Configuration of packet segments used to scatter received packets
  * if some of split features is configured.
  */
+extern uint32_t rx_pkt_hdr_protos[MAX_SEGS_BUFFER_SPLIT];
 extern uint16_t rx_pkt_seg_lengths[MAX_SEGS_BUFFER_SPLIT];
 extern uint8_t  rx_pkt_nb_segs; /**< Number of segments to split */
 extern uint16_t rx_pkt_seg_offsets[MAX_SEGS_BUFFER_SPLIT];
@@ -851,6 +852,9 @@ inc_tx_burst_stats(struct fwd_stream *fs, uint16_t nb_tx)
 unsigned int parse_item_list(const char *str, const char *item_name,
 			unsigned int max_items,
 			unsigned int *parsed_items, int check_unique_values);
+unsigned int parse_hdrs_list(const char *str, const char *item_name,
+			unsigned int max_item,
+			unsigned int *parsed_items, int check_unique_values);
 void launch_args_parse(int argc, char** argv);
 void cmd_reconfig_device_queue(portid_t id, uint8_t dev, uint8_t queue);
 void cmdline_read_from_file(const char *filename);
@@ -1002,6 +1006,8 @@ void set_record_core_cycles(uint8_t on_off);
 void set_record_burst_stats(uint8_t on_off);
 void set_verbose_level(uint16_t vb_level);
 void set_rx_pkt_segments(unsigned int *seg_lengths, unsigned int nb_segs);
+void set_rx_pkt_hdrs(unsigned int *seg_protos, unsigned int nb_segs);
+void show_rx_pkt_hdrs(void);
 void show_rx_pkt_segments(void);
 void set_rx_pkt_offsets(unsigned int *seg_offsets, unsigned int nb_offs);
 void show_rx_pkt_offsets(void);
-- 
2.25.1


^ permalink raw reply	[flat|nested] 72+ messages in thread

* [PATCH v8 4/4] net/ice: support buffer split in Rx path
  2022-10-05 23:18 ` [PATCH v8 0/4] support protocol based buffer split Yuan Wang
                     ` (2 preceding siblings ...)
  2022-10-05 23:18   ` [PATCH v8 3/4] app/testpmd: add rxhdrs commands and parameters Yuan Wang
@ 2022-10-05 23:18   ` Yuan Wang
  2022-10-06 10:12     ` Andrew Rybchenko
  2022-10-06 10:13   ` [PATCH v8 0/4] support protocol based buffer split Andrew Rybchenko
  4 siblings, 1 reply; 72+ messages in thread
From: Yuan Wang @ 2022-10-05 23:18 UTC (permalink / raw)
  To: dev, Qiming Yang, Qi Zhang
  Cc: thomas, andrew.rybchenko, ferruh.yigit, mdr, xiaoyun.li,
	aman.deep.singh, yuying.zhang, jerinjacobk, viacheslavo, stephen,
	xuan.ding, hpothula, yaqi.tang, Yuan Wang, Wenxuan Wu

This patch adds support for protocol based buffer split in normal Rx
data paths. When the Rx queue is configured with specific protocol type,
packets received will be directly split into protocol header and
payload parts within the limitation of pmd. And the two parts will be
put into different mempools.

Currently, protocol based buffer split is not supported in vectorized
paths.

A new api ice_buffer_split_supported_hdr_ptypes_get() has been
introduced; it returns the supported header protocols of the ice PMD
to the application for splitting.
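
For illustration only, a minimal sketch of how an application could use
the new API together with protocol based buffer split (the pool names,
port id and descriptor count below are placeholders, not part of this
patch):

    #include <errno.h>
    #include <string.h>
    #include <rte_ethdev.h>
    #include <rte_lcore.h>
    #include <rte_mbuf_ptype.h>

    /* Sketch: split after the outer ETH-IPv4-UDP headers into hdr_pool,
     * put the payload into pay_pool. Error handling is intentionally
     * minimal.
     */
    static int
    setup_proto_based_split(uint16_t port_id, uint16_t nb_rxd,
                            struct rte_mempool *hdr_pool,
                            struct rte_mempool *pay_pool)
    {
        uint32_t ptypes[32];
        union rte_eth_rxseg rx_seg[2];
        struct rte_eth_rxconf rxconf;

        /* the PMD must report supported split protocol headers */
        if (rte_eth_buffer_split_get_supported_hdr_ptypes(port_id, ptypes,
                                                           RTE_DIM(ptypes)) <= 0)
            return -ENOTSUP;

        memset(rx_seg, 0, sizeof(rx_seg));
        rx_seg[0].split.mp = hdr_pool;
        rx_seg[0].split.proto_hdr = RTE_PTYPE_L2_ETHER |
                RTE_PTYPE_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_L4_UDP;
        rx_seg[1].split.mp = pay_pool; /* payload, proto_hdr stays 0 */

        memset(&rxconf, 0, sizeof(rxconf));
        rxconf.offloads = RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT;
        rxconf.rx_seg = rx_seg;
        rxconf.rx_nseg = 2;

        return rte_eth_rx_queue_setup(port_id, 0, nb_rxd, rte_socket_id(),
                                      &rxconf, NULL);
    }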

Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
---
 doc/guides/rel_notes/release_22_11.rst |   4 +
 drivers/net/ice/ice_ethdev.c           |  97 ++++++++-
 drivers/net/ice/ice_rxtx.c             | 266 ++++++++++++++++++++++---
 drivers/net/ice/ice_rxtx.h             |  16 ++
 drivers/net/ice/ice_rxtx_vec_common.h  |   3 +
 5 files changed, 354 insertions(+), 32 deletions(-)

diff --git a/doc/guides/rel_notes/release_22_11.rst b/doc/guides/rel_notes/release_22_11.rst
index 4c3a7f8b8b..513db6476b 100644
--- a/doc/guides/rel_notes/release_22_11.rst
+++ b/doc/guides/rel_notes/release_22_11.rst
@@ -132,6 +132,10 @@ New Features
     User can choose length or protocol header to configure buffer split
     according to NIC's capability.
 
+* **Updated Intel ice driver.**
+
+  * Added protocol based buffer split support in scalar path.
+
 
 Removed Items
 -------------
diff --git a/drivers/net/ice/ice_ethdev.c b/drivers/net/ice/ice_ethdev.c
index 8aa37722c3..320e92d760 100644
--- a/drivers/net/ice/ice_ethdev.c
+++ b/drivers/net/ice/ice_ethdev.c
@@ -161,6 +161,7 @@ static int ice_timesync_read_time(struct rte_eth_dev *dev,
 static int ice_timesync_write_time(struct rte_eth_dev *dev,
 				   const struct timespec *timestamp);
 static int ice_timesync_disable(struct rte_eth_dev *dev);
+static const uint32_t *ice_buffer_split_supported_hdr_ptypes_get(struct rte_eth_dev *dev);
 
 static const struct rte_pci_id pci_id_ice_map[] = {
 	{ RTE_PCI_DEVICE(ICE_INTEL_VENDOR_ID, ICE_DEV_ID_E823L_BACKPLANE) },
@@ -275,6 +276,7 @@ static const struct eth_dev_ops ice_eth_dev_ops = {
 	.timesync_write_time          = ice_timesync_write_time,
 	.timesync_disable             = ice_timesync_disable,
 	.tm_ops_get                   = ice_tm_ops_get,
+	.buffer_split_supported_hdr_ptypes_get = ice_buffer_split_supported_hdr_ptypes_get,
 };
 
 /* store statistics names and its offset in stats structure */
@@ -3802,7 +3804,8 @@ ice_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 			RTE_ETH_RX_OFFLOAD_OUTER_IPV4_CKSUM |
 			RTE_ETH_RX_OFFLOAD_VLAN_EXTEND |
 			RTE_ETH_RX_OFFLOAD_RSS_HASH |
-			RTE_ETH_RX_OFFLOAD_TIMESTAMP;
+			RTE_ETH_RX_OFFLOAD_TIMESTAMP |
+			RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT;
 		dev_info->tx_offload_capa |=
 			RTE_ETH_TX_OFFLOAD_QINQ_INSERT |
 			RTE_ETH_TX_OFFLOAD_IPV4_CKSUM |
@@ -3814,7 +3817,7 @@ ice_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 		dev_info->flow_type_rss_offloads |= ICE_RSS_OFFLOAD_ALL;
 	}
 
-	dev_info->rx_queue_offload_capa = 0;
+	dev_info->rx_queue_offload_capa = RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT;
 	dev_info->tx_queue_offload_capa = RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE;
 
 	dev_info->reta_size = pf->hash_lut_size;
@@ -3883,6 +3886,11 @@ ice_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 	dev_info->default_rxportconf.ring_size = ICE_BUF_SIZE_MIN;
 	dev_info->default_txportconf.ring_size = ICE_BUF_SIZE_MIN;
 
+	dev_info->rx_seg_capa.max_nseg = ICE_RX_MAX_NSEG;
+	dev_info->rx_seg_capa.multi_pools = 1;
+	dev_info->rx_seg_capa.offset_allowed = 0;
+	dev_info->rx_seg_capa.offset_align_log2 = 0;
+
 	return 0;
 }
 
@@ -5960,6 +5968,91 @@ ice_timesync_disable(struct rte_eth_dev *dev)
 	return 0;
 }
 
+static const uint32_t *
+ice_buffer_split_supported_hdr_ptypes_get(struct rte_eth_dev *dev __rte_unused)
+{
+	/* Buffer split protocol header capability. */
+	static const uint32_t ptypes[] = {
+		/* Non tunneled */
+		RTE_PTYPE_L2_ETHER,
+		RTE_PTYPE_L3_IPV4_EXT_UNKNOWN,
+		RTE_PTYPE_L3_IPV6_EXT_UNKNOWN,
+		RTE_PTYPE_L4_UDP,
+		RTE_PTYPE_L4_TCP,
+		RTE_PTYPE_L4_SCTP,
+		RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN,
+		RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_L4_UDP,
+		RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_L4_TCP,
+		RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_L4_SCTP,
+		RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN,
+		RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN | RTE_PTYPE_L4_UDP,
+		RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN | RTE_PTYPE_L4_TCP,
+		RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN | RTE_PTYPE_L4_SCTP,
+		RTE_PTYPE_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_L4_UDP,
+		RTE_PTYPE_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_L4_TCP,
+		RTE_PTYPE_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_L4_SCTP,
+		RTE_PTYPE_L3_IPV6_EXT_UNKNOWN | RTE_PTYPE_L4_UDP,
+		RTE_PTYPE_L3_IPV6_EXT_UNKNOWN | RTE_PTYPE_L4_TCP,
+		RTE_PTYPE_L3_IPV6_EXT_UNKNOWN | RTE_PTYPE_L4_SCTP,
+
+		/* Tunneled */
+		RTE_PTYPE_TUNNEL_GRENAT,
+		RTE_PTYPE_INNER_L2_ETHER,
+		RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN,
+		RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN,
+		RTE_PTYPE_INNER_L4_UDP,
+		RTE_PTYPE_INNER_L4_TCP,
+		RTE_PTYPE_INNER_L4_SCTP,
+
+		RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER,
+
+		RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER |
+		RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN,
+		RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER |
+		RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN,
+
+		RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER |
+		RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_UDP,
+		RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER |
+		RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_TCP,
+		RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER |
+		RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_SCTP,
+
+		RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER |
+		RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_UDP,
+		RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER |
+		RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_TCP,
+		RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER |
+		RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_SCTP,
+
+		RTE_PTYPE_INNER_L2_ETHER |
+		RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_UDP,
+		RTE_PTYPE_INNER_L2_ETHER |
+		RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_TCP,
+		RTE_PTYPE_INNER_L2_ETHER |
+		RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_SCTP,
+
+		RTE_PTYPE_INNER_L2_ETHER |
+		RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_UDP,
+		RTE_PTYPE_INNER_L2_ETHER |
+		RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_TCP,
+		RTE_PTYPE_INNER_L2_ETHER |
+		RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_SCTP,
+
+		RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_UDP,
+		RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_TCP,
+		RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_SCTP,
+
+		RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_UDP,
+		RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_TCP,
+		RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_SCTP,
+
+		RTE_PTYPE_UNKNOWN
+	};
+
+	return ptypes;
+}
+
 static int
 ice_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
 	      struct rte_pci_device *pci_dev)
diff --git a/drivers/net/ice/ice_rxtx.c b/drivers/net/ice/ice_rxtx.c
index d1e1fadf9d..ceaad15c78 100644
--- a/drivers/net/ice/ice_rxtx.c
+++ b/drivers/net/ice/ice_rxtx.c
@@ -259,7 +259,6 @@ ice_program_hw_rx_queue(struct ice_rx_queue *rxq)
 	/* Set buffer size as the head split is disabled. */
 	buf_size = (uint16_t)(rte_pktmbuf_data_room_size(rxq->mp) -
 			      RTE_PKTMBUF_HEADROOM);
-	rxq->rx_hdr_len = 0;
 	rxq->rx_buf_len = RTE_ALIGN(buf_size, (1 << ICE_RLAN_CTX_DBUF_S));
 	rxq->max_pkt_len =
 		RTE_MIN((uint32_t)ICE_SUPPORT_CHAIN_NUM * rxq->rx_buf_len,
@@ -288,11 +287,91 @@ ice_program_hw_rx_queue(struct ice_rx_queue *rxq)
 
 	memset(&rx_ctx, 0, sizeof(rx_ctx));
 
+	if (rxq->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) {
+		uint32_t proto_hdr;
+		proto_hdr = rxq->rxseg[0].proto_hdr;
+
+		if (proto_hdr == RTE_PTYPE_UNKNOWN) {
+			PMD_DRV_LOG(ERR, "Buffer split protocol must be configured");
+			return -EINVAL;
+		}
+
+		switch (proto_hdr & RTE_PTYPE_L4_MASK) {
+		case RTE_PTYPE_L4_TCP:
+		case RTE_PTYPE_L4_UDP:
+			rx_ctx.dtype = ICE_RX_DTYPE_HEADER_SPLIT;
+			rx_ctx.hsplit_0 = ICE_RLAN_RX_HSPLIT_0_SPLIT_TCP_UDP;
+			goto set_hsplit_finish;
+		case RTE_PTYPE_L4_SCTP:
+			rx_ctx.dtype = ICE_RX_DTYPE_HEADER_SPLIT;
+			rx_ctx.hsplit_0 = ICE_RLAN_RX_HSPLIT_0_SPLIT_SCTP;
+			goto set_hsplit_finish;
+		}
+
+		switch (proto_hdr & RTE_PTYPE_L3_MASK) {
+		case RTE_PTYPE_L3_IPV4_EXT_UNKNOWN:
+		case RTE_PTYPE_L3_IPV6_EXT_UNKNOWN:
+			rx_ctx.dtype = ICE_RX_DTYPE_HEADER_SPLIT;
+			rx_ctx.hsplit_0 = ICE_RLAN_RX_HSPLIT_0_SPLIT_IP;
+			goto set_hsplit_finish;
+		}
+
+		switch (proto_hdr & RTE_PTYPE_L2_MASK) {
+		case RTE_PTYPE_L2_ETHER:
+			rx_ctx.dtype = ICE_RX_DTYPE_HEADER_SPLIT;
+			rx_ctx.hsplit_0 = ICE_RLAN_RX_HSPLIT_0_SPLIT_L2;
+			rx_ctx.hsplit_1 = ICE_RLAN_RX_HSPLIT_1_SPLIT_L2;
+			goto set_hsplit_finish;
+		}
+
+		switch (proto_hdr & RTE_PTYPE_TUNNEL_MASK) {
+		case RTE_PTYPE_TUNNEL_GRENAT:
+			rx_ctx.dtype = ICE_RX_DTYPE_HEADER_SPLIT;
+			rx_ctx.hsplit_1 = ICE_RLAN_RX_HSPLIT_1_SPLIT_ALWAYS;
+			goto set_hsplit_finish;
+		}
+
+		switch (proto_hdr & RTE_PTYPE_INNER_L4_MASK) {
+		case RTE_PTYPE_INNER_L4_TCP:
+		case RTE_PTYPE_INNER_L4_UDP:
+			rx_ctx.dtype = ICE_RX_DTYPE_HEADER_SPLIT;
+			rx_ctx.hsplit_0 = ICE_RLAN_RX_HSPLIT_0_SPLIT_TCP_UDP;
+			goto set_hsplit_finish;
+		case RTE_PTYPE_INNER_L4_SCTP:
+			rx_ctx.dtype = ICE_RX_DTYPE_HEADER_SPLIT;
+			rx_ctx.hsplit_0 = ICE_RLAN_RX_HSPLIT_0_SPLIT_SCTP;
+			goto set_hsplit_finish;
+		}
+
+		switch (proto_hdr & RTE_PTYPE_INNER_L3_MASK) {
+		case RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN:
+		case RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN:
+			rx_ctx.dtype = ICE_RX_DTYPE_HEADER_SPLIT;
+			rx_ctx.hsplit_0 = ICE_RLAN_RX_HSPLIT_0_SPLIT_IP;
+			goto set_hsplit_finish;
+		}
+
+		switch (proto_hdr & RTE_PTYPE_INNER_L2_MASK) {
+		case RTE_PTYPE_INNER_L2_ETHER:
+			rx_ctx.dtype = ICE_RX_DTYPE_HEADER_SPLIT;
+			rx_ctx.hsplit_0 = ICE_RLAN_RX_HSPLIT_0_SPLIT_L2;
+			goto set_hsplit_finish;
+		}
+
+		PMD_DRV_LOG(ERR, "Buffer split protocol is not supported");
+		return -EINVAL;
+
+set_hsplit_finish:
+		rxq->rx_hdr_len = ICE_RX_HDR_BUF_SIZE;
+	} else {
+		rxq->rx_hdr_len = 0;
+		rx_ctx.dtype = 0; /* No Protocol Based Buffer Split mode */
+	}
+
 	rx_ctx.base = rxq->rx_ring_dma / ICE_QUEUE_BASE_ADDR_UNIT;
 	rx_ctx.qlen = rxq->nb_rx_desc;
 	rx_ctx.dbuf = rxq->rx_buf_len >> ICE_RLAN_CTX_DBUF_S;
 	rx_ctx.hbuf = rxq->rx_hdr_len >> ICE_RLAN_CTX_HBUF_S;
-	rx_ctx.dtype = 0; /* No Header Split mode */
 #ifndef RTE_LIBRTE_ICE_16BYTE_RX_DESC
 	rx_ctx.dsize = 1; /* 32B descriptors */
 #endif
@@ -378,6 +457,7 @@ ice_alloc_rx_queue_mbufs(struct ice_rx_queue *rxq)
 
 	for (i = 0; i < rxq->nb_rx_desc; i++) {
 		volatile union ice_rx_flex_desc *rxd;
+		rxd = &rxq->rx_ring[i];
 		struct rte_mbuf *mbuf = rte_mbuf_raw_alloc(rxq->mp);
 
 		if (unlikely(!mbuf)) {
@@ -385,8 +465,6 @@ ice_alloc_rx_queue_mbufs(struct ice_rx_queue *rxq)
 			return -ENOMEM;
 		}
 
-		rte_mbuf_refcnt_set(mbuf, 1);
-		mbuf->next = NULL;
 		mbuf->data_off = RTE_PKTMBUF_HEADROOM;
 		mbuf->nb_segs = 1;
 		mbuf->port = rxq->port_id;
@@ -394,9 +472,32 @@ ice_alloc_rx_queue_mbufs(struct ice_rx_queue *rxq)
 		dma_addr =
 			rte_cpu_to_le_64(rte_mbuf_data_iova_default(mbuf));
 
-		rxd = &rxq->rx_ring[i];
-		rxd->read.pkt_addr = dma_addr;
-		rxd->read.hdr_addr = 0;
+		if (!(rxq->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT)) {
+			rte_mbuf_refcnt_set(mbuf, 1);
+			mbuf->next = NULL;
+			rxd->read.hdr_addr = 0;
+			rxd->read.pkt_addr = dma_addr;
+		} else {
+			struct rte_mbuf *mbuf_pay;
+			mbuf_pay = rte_mbuf_raw_alloc(rxq->rxseg[1].mp);
+			if (unlikely(!mbuf_pay)) {
+				PMD_DRV_LOG(ERR, "Failed to allocate payload mbuf for RX");
+				return -ENOMEM;
+			}
+
+			mbuf_pay->next = NULL;
+			mbuf_pay->data_off = RTE_PKTMBUF_HEADROOM;
+			mbuf_pay->nb_segs = 1;
+			mbuf_pay->port = rxq->port_id;
+			mbuf->next = mbuf_pay;
+
+			rxd->read.hdr_addr = dma_addr;
+			/* The LS bit should be set to zero regardless of
+			 * buffer split enablement.
+			 */
+			rxd->read.pkt_addr = rte_cpu_to_le_64(rte_mbuf_data_iova_default(mbuf_pay));
+		}
+
 #ifndef RTE_LIBRTE_ICE_16BYTE_RX_DESC
 		rxd->read.rsvd1 = 0;
 		rxd->read.rsvd2 = 0;
@@ -420,14 +521,14 @@ _ice_rx_queue_release_mbufs(struct ice_rx_queue *rxq)
 
 	for (i = 0; i < rxq->nb_rx_desc; i++) {
 		if (rxq->sw_ring[i].mbuf) {
-			rte_pktmbuf_free_seg(rxq->sw_ring[i].mbuf);
+			rte_pktmbuf_free(rxq->sw_ring[i].mbuf);
 			rxq->sw_ring[i].mbuf = NULL;
 		}
 	}
 	if (rxq->rx_nb_avail == 0)
 		return;
 	for (i = 0; i < rxq->rx_nb_avail; i++)
-		rte_pktmbuf_free_seg(rxq->rx_stage[rxq->rx_next_avail + i]);
+		rte_pktmbuf_free(rxq->rx_stage[rxq->rx_next_avail + i]);
 
 	rxq->rx_nb_avail = 0;
 }
@@ -719,7 +820,7 @@ ice_fdir_program_hw_rx_queue(struct ice_rx_queue *rxq)
 	rx_ctx.qlen = rxq->nb_rx_desc;
 	rx_ctx.dbuf = rxq->rx_buf_len >> ICE_RLAN_CTX_DBUF_S;
 	rx_ctx.hbuf = rxq->rx_hdr_len >> ICE_RLAN_CTX_HBUF_S;
-	rx_ctx.dtype = 0; /* No Header Split mode */
+	rx_ctx.dtype = 0; /* No Buffer Split mode */
 	rx_ctx.dsize = 1; /* 32B descriptors */
 	rx_ctx.rxmax = ICE_ETH_MAX_LEN;
 	/* TPH: Transaction Layer Packet (TLP) processing hints */
@@ -1053,6 +1154,8 @@ ice_rx_queue_setup(struct rte_eth_dev *dev,
 	uint16_t len;
 	int use_def_burst_func = 1;
 	uint64_t offloads;
+	uint16_t n_seg = rx_conf->rx_nseg;
+	uint16_t i;
 
 	if (nb_desc % ICE_ALIGN_RING_DESC != 0 ||
 	    nb_desc > ICE_MAX_RING_DESC ||
@@ -1064,6 +1167,15 @@ ice_rx_queue_setup(struct rte_eth_dev *dev,
 
 	offloads = rx_conf->offloads | dev->data->dev_conf.rxmode.offloads;
 
+	if (mp)
+		n_seg = 1;
+
+	if (n_seg > 1 && !(offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT)) {
+		PMD_INIT_LOG(ERR, "port %u queue index %u split offload not configured",
+				dev->data->port_id, queue_idx);
+		return -EINVAL;
+	}
+
 	/* Free memory if needed */
 	if (dev->data->rx_queues[queue_idx]) {
 		ice_rx_queue_release(dev->data->rx_queues[queue_idx]);
@@ -1075,12 +1187,24 @@ ice_rx_queue_setup(struct rte_eth_dev *dev,
 				 sizeof(struct ice_rx_queue),
 				 RTE_CACHE_LINE_SIZE,
 				 socket_id);
+
 	if (!rxq) {
 		PMD_INIT_LOG(ERR, "Failed to allocate memory for "
 			     "rx queue data structure");
 		return -ENOMEM;
 	}
-	rxq->mp = mp;
+
+	rxq->rxseg_nb = n_seg;
+	if (n_seg > 1) {
+		for (i = 0; i < n_seg; i++)
+			memcpy(&rxq->rxseg[i], &rx_conf->rx_seg[i].split,
+				sizeof(struct rte_eth_rxseg_split));
+
+		rxq->mp = rxq->rxseg[0].mp;
+	} else {
+		rxq->mp = mp;
+	}
+
 	rxq->nb_rx_desc = nb_desc;
 	rxq->rx_free_thresh = rx_conf->rx_free_thresh;
 	rxq->queue_id = queue_idx;
@@ -1551,7 +1675,7 @@ ice_rx_scan_hw_ring(struct ice_rx_queue *rxq)
 	struct ice_rx_entry *rxep;
 	struct rte_mbuf *mb;
 	uint16_t stat_err0;
-	uint16_t pkt_len;
+	uint16_t pkt_len, hdr_len;
 	int32_t s[ICE_LOOK_AHEAD], nb_dd;
 	int32_t i, j, nb_rx = 0;
 	uint64_t pkt_flags = 0;
@@ -1606,6 +1730,27 @@ ice_rx_scan_hw_ring(struct ice_rx_queue *rxq)
 				   ICE_RX_FLX_DESC_PKT_LEN_M) - rxq->crc_len;
 			mb->data_len = pkt_len;
 			mb->pkt_len = pkt_len;
+
+			if (!(rxq->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT)) {
+				pkt_len = (rte_le_to_cpu_16(rxdp[j].wb.pkt_len) &
+					ICE_RX_FLX_DESC_PKT_LEN_M) - rxq->crc_len;
+				mb->data_len = pkt_len;
+				mb->pkt_len = pkt_len;
+			} else {
+				mb->nb_segs = (uint16_t)(mb->nb_segs + mb->next->nb_segs);
+				mb->next->next = NULL;
+				hdr_len = rte_le_to_cpu_16(rxdp[j].wb.hdr_len_sph_flex_flags1) &
+						ICE_RX_FLEX_DESC_HEADER_LEN_M;
+				pkt_len = (rte_le_to_cpu_16(rxdp[j].wb.pkt_len) &
+					ICE_RX_FLX_DESC_PKT_LEN_M) - rxq->crc_len;
+				mb->data_len = hdr_len;
+				mb->pkt_len = hdr_len + pkt_len;
+				mb->next->data_len = pkt_len;
+#ifdef RTE_ETHDEV_DEBUG_RX
+				rte_pktmbuf_dump(stdout, mb, rte_pktmbuf_pkt_len(mb));
+#endif
+			}
+
 			mb->ol_flags = 0;
 			stat_err0 = rte_le_to_cpu_16(rxdp[j].wb.status_error0);
 			pkt_flags = ice_rxd_error_to_pkt_flags(stat_err0);
@@ -1697,7 +1842,9 @@ ice_rx_alloc_bufs(struct ice_rx_queue *rxq)
 	struct rte_mbuf *mb;
 	uint16_t alloc_idx, i;
 	uint64_t dma_addr;
-	int diag;
+	int diag, diag_pay;
+	uint64_t pay_addr;
+	struct rte_mbuf *mbufs_pay[rxq->rx_free_thresh];
 
 	/* Allocate buffers in bulk */
 	alloc_idx = (uint16_t)(rxq->rx_free_trigger -
@@ -1710,6 +1857,15 @@ ice_rx_alloc_bufs(struct ice_rx_queue *rxq)
 		return -ENOMEM;
 	}
 
+	if (rxq->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) {
+		diag_pay = rte_mempool_get_bulk(rxq->rxseg[1].mp,
+				(void *)mbufs_pay, rxq->rx_free_thresh);
+		if (unlikely(diag_pay != 0)) {
+			PMD_RX_LOG(ERR, "Failed to get payload mbufs in bulk");
+			return -ENOMEM;
+		}
+	}
+
 	rxdp = &rxq->rx_ring[alloc_idx];
 	for (i = 0; i < rxq->rx_free_thresh; i++) {
 		if (likely(i < (rxq->rx_free_thresh - 1)))
@@ -1718,13 +1874,21 @@ ice_rx_alloc_bufs(struct ice_rx_queue *rxq)
 
 		mb = rxep[i].mbuf;
 		rte_mbuf_refcnt_set(mb, 1);
-		mb->next = NULL;
 		mb->data_off = RTE_PKTMBUF_HEADROOM;
 		mb->nb_segs = 1;
 		mb->port = rxq->port_id;
 		dma_addr = rte_cpu_to_le_64(rte_mbuf_data_iova_default(mb));
-		rxdp[i].read.hdr_addr = 0;
-		rxdp[i].read.pkt_addr = dma_addr;
+
+		if (!(rxq->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT)) {
+			mb->next = NULL;
+			rxdp[i].read.hdr_addr = 0;
+			rxdp[i].read.pkt_addr = dma_addr;
+		} else {
+			mb->next = mbufs_pay[i];
+			pay_addr = rte_cpu_to_le_64(rte_mbuf_data_iova_default(mbufs_pay[i]));
+			rxdp[i].read.hdr_addr = dma_addr;
+			rxdp[i].read.pkt_addr = pay_addr;
+		}
 	}
 
 	/* Update Rx tail register */
@@ -2333,11 +2497,13 @@ ice_recv_pkts(void *rx_queue,
 	struct ice_rx_entry *sw_ring = rxq->sw_ring;
 	struct ice_rx_entry *rxe;
 	struct rte_mbuf *nmb; /* new allocated mbuf */
+	struct rte_mbuf *nmb_pay; /* new allocated payload mbuf */
 	struct rte_mbuf *rxm; /* pointer to store old mbuf in SW ring */
 	uint16_t rx_id = rxq->rx_tail;
 	uint16_t nb_rx = 0;
 	uint16_t nb_hold = 0;
 	uint16_t rx_packet_len;
+	uint16_t rx_header_len;
 	uint16_t rx_stat_err0;
 	uint64_t dma_addr;
 	uint64_t pkt_flags;
@@ -2365,12 +2531,16 @@ ice_recv_pkts(void *rx_queue,
 		if (!(rx_stat_err0 & (1 << ICE_RX_FLEX_DESC_STATUS0_DD_S)))
 			break;
 
-		/* allocate mbuf */
+		if (rx_stat_err0 & (1 << ICE_RX_FLEX_DESC_STATUS0_HBO_S))
+			break;
+
+		/* allocate header mbuf */
 		nmb = rte_mbuf_raw_alloc(rxq->mp);
 		if (unlikely(!nmb)) {
 			rxq->vsi->adapter->pf.dev_data->rx_mbuf_alloc_failed++;
 			break;
 		}
+
 		rxd = *rxdp; /* copy descriptor in ring to temp variable*/
 
 		nb_hold++;
@@ -2383,24 +2553,60 @@ ice_recv_pkts(void *rx_queue,
 		dma_addr =
 			rte_cpu_to_le_64(rte_mbuf_data_iova_default(nmb));
 
-		/**
-		 * fill the read format of descriptor with physic address in
-		 * new allocated mbuf: nmb
-		 */
-		rxdp->read.hdr_addr = 0;
-		rxdp->read.pkt_addr = dma_addr;
+		if (!(rxq->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT)) {
+			/**
+			 * fill the read format of descriptor with physic address in
+			 * new allocated mbuf: nmb
+			 */
+			rxdp->read.hdr_addr = 0;
+			rxdp->read.pkt_addr = dma_addr;
+		} else {
+			/* allocate payload mbuf */
+			nmb_pay = rte_mbuf_raw_alloc(rxq->rxseg[1].mp);
+			if (unlikely(!nmb_pay)) {
+				rxq->vsi->adapter->pf.dev_data->rx_mbuf_alloc_failed++;
+				break;
+			}
 
-		/* calculate rx_packet_len of the received pkt */
-		rx_packet_len = (rte_le_to_cpu_16(rxd.wb.pkt_len) &
-				 ICE_RX_FLX_DESC_PKT_LEN_M) - rxq->crc_len;
+			nmb->next = nmb_pay;
+			nmb_pay->next = NULL;
+
+			/**
+			 * fill the read format of descriptor with physic address in
+			 * new allocated mbuf: nmb
+			 */
+			rxdp->read.hdr_addr = dma_addr;
+			rxdp->read.pkt_addr = rte_cpu_to_le_64(rte_mbuf_data_iova_default(nmb_pay));
+		}
 
 		/* fill old mbuf with received descriptor: rxd */
 		rxm->data_off = RTE_PKTMBUF_HEADROOM;
 		rte_prefetch0(RTE_PTR_ADD(rxm->buf_addr, RTE_PKTMBUF_HEADROOM));
-		rxm->nb_segs = 1;
-		rxm->next = NULL;
-		rxm->pkt_len = rx_packet_len;
-		rxm->data_len = rx_packet_len;
+		if (!(rxq->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT)) {
+			rxm->nb_segs = 1;
+			rxm->next = NULL;
+			/* calculate rx_packet_len of the received pkt */
+			rx_packet_len = (rte_le_to_cpu_16(rxd.wb.pkt_len) &
+					ICE_RX_FLX_DESC_PKT_LEN_M) - rxq->crc_len;
+			rxm->data_len = rx_packet_len;
+			rxm->pkt_len = rx_packet_len;
+		} else {
+			rxm->nb_segs = (uint16_t)(rxm->nb_segs + rxm->next->nb_segs);
+			rxm->next->next = NULL;
+			/* calculate rx_packet_len of the received pkt */
+			rx_header_len = rte_le_to_cpu_16(rxd.wb.hdr_len_sph_flex_flags1) &
+					ICE_RX_FLEX_DESC_HEADER_LEN_M;
+			rx_packet_len = (rte_le_to_cpu_16(rxd.wb.pkt_len) &
+					ICE_RX_FLX_DESC_PKT_LEN_M) - rxq->crc_len;
+			rxm->data_len = rx_header_len;
+			rxm->pkt_len = rx_header_len + rx_packet_len;
+			rxm->next->data_len = rx_packet_len;
+
+#ifdef RTE_ETHDEV_DEBUG_RX
+			rte_pktmbuf_dump(stdout, rxm, rte_pktmbuf_pkt_len(rxm));
+#endif
+		}
+
 		rxm->port = rxq->port_id;
 		rxm->packet_type = ptype_tbl[ICE_RX_FLEX_DESC_PTYPE_M &
 			rte_le_to_cpu_16(rxd.wb.ptype_flex_flags0)];
diff --git a/drivers/net/ice/ice_rxtx.h b/drivers/net/ice/ice_rxtx.h
index e1d4fe8e47..4947d5c25f 100644
--- a/drivers/net/ice/ice_rxtx.h
+++ b/drivers/net/ice/ice_rxtx.h
@@ -16,6 +16,9 @@
 #define ICE_RX_MAX_BURST 32
 #define ICE_TX_MAX_BURST 32
 
+/* Maximal number of segments to split. */
+#define ICE_RX_MAX_NSEG 2
+
 #define ICE_CHK_Q_ENA_COUNT        100
 #define ICE_CHK_Q_ENA_INTERVAL_US  100
 
@@ -45,6 +48,11 @@
 extern uint64_t ice_timestamp_dynflag;
 extern int ice_timestamp_dynfield_offset;
 
+/* Max header size can be 2K - 64 bytes */
+#define ICE_RX_HDR_BUF_SIZE    (2048 - 64)
+
+#define ICE_HEADER_SPLIT_ENA   BIT(0)
+
 typedef void (*ice_rx_release_mbufs_t)(struct ice_rx_queue *rxq);
 typedef void (*ice_tx_release_mbufs_t)(struct ice_tx_queue *txq);
 typedef void (*ice_rxd_to_pkt_fields_t)(struct ice_rx_queue *rxq,
@@ -55,6 +63,12 @@ struct ice_rx_entry {
 	struct rte_mbuf *mbuf;
 };
 
+enum ice_rx_dtype {
+	ICE_RX_DTYPE_NO_SPLIT       = 0,
+	ICE_RX_DTYPE_HEADER_SPLIT   = 1,
+	ICE_RX_DTYPE_SPLIT_ALWAYS   = 2,
+};
+
 struct ice_rx_queue {
 	struct rte_mempool *mp; /* mbuf pool to populate RX ring */
 	volatile union ice_rx_flex_desc *rx_ring;/* RX ring virtual address */
@@ -101,6 +115,8 @@ struct ice_rx_queue {
 	uint32_t hw_time_high; /* high 32 bits of timestamp */
 	uint32_t hw_time_low; /* low 32 bits of timestamp */
 	uint64_t hw_time_update; /* SW time of HW record updating */
+	struct rte_eth_rxseg_split rxseg[ICE_RX_MAX_NSEG];
+	uint32_t rxseg_nb;
 };
 
 struct ice_tx_entry {
diff --git a/drivers/net/ice/ice_rxtx_vec_common.h b/drivers/net/ice/ice_rxtx_vec_common.h
index 2dd2d83650..eec6ea2134 100644
--- a/drivers/net/ice/ice_rxtx_vec_common.h
+++ b/drivers/net/ice/ice_rxtx_vec_common.h
@@ -291,6 +291,9 @@ ice_rx_vec_queue_default(struct ice_rx_queue *rxq)
 	if (rxq->offloads & RTE_ETH_RX_OFFLOAD_TIMESTAMP)
 		return -1;
 
+	if (rxq->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT)
+		return -1;
+
 	if (rxq->offloads & ICE_RX_VECTOR_OFFLOAD)
 		return ICE_VECTOR_OFFLOAD_PATH;
 
-- 
2.25.1


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v8 1/4] ethdev: introduce protocol header API
  2022-10-05 23:18   ` [PATCH v8 1/4] ethdev: introduce protocol header API Yuan Wang
@ 2022-10-06 10:11     ` Andrew Rybchenko
  0 siblings, 0 replies; 72+ messages in thread
From: Andrew Rybchenko @ 2022-10-06 10:11 UTC (permalink / raw)
  To: Yuan Wang, dev, Thomas Monjalon, Ferruh Yigit, Ray Kinsella
  Cc: ferruh.yigit, xiaoyun.li, aman.deep.singh, yuying.zhang,
	qi.z.zhang, qiming.yang, jerinjacobk, viacheslavo, stephen,
	xuan.ding, hpothula, yaqi.tang, Wenxuan Wu

On 10/6/22 02:18, Yuan Wang wrote:
> Add a new ethdev API to retrieve supported protocol headers
> of a PMD, which helps to configure protocol header based buffer split.
> 
> Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
> Signed-off-by: Xuan Ding <xuan.ding@intel.com>
> Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
> Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>

The new function should be mentioned in
doc/guides/nics/features.rst, in the Buffer Split on Rx feature
definition.


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v8 2/4] ethdev: introduce protocol hdr based buffer split
  2022-10-05 23:18   ` [PATCH v8 2/4] ethdev: introduce protocol hdr based buffer split Yuan Wang
@ 2022-10-06 10:11     ` Andrew Rybchenko
  2022-10-08 14:30       ` Ding, Xuan
  0 siblings, 1 reply; 72+ messages in thread
From: Andrew Rybchenko @ 2022-10-06 10:11 UTC (permalink / raw)
  To: Yuan Wang, dev, Thomas Monjalon, Ferruh Yigit
  Cc: ferruh.yigit, mdr, xiaoyun.li, aman.deep.singh, yuying.zhang,
	qi.z.zhang, qiming.yang, jerinjacobk, viacheslavo, stephen,
	xuan.ding, hpothula, yaqi.tang, Wenxuan Wu

On 10/6/22 02:18, Yuan Wang wrote:
> Currently, Rx buffer split supports length based split. With Rx queue
> offload RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT enabled and Rx packet segment
> configured, PMD will be able to split the received packets into
> multiple segments.
> 
> However, length based buffer split is not suitable for NICs that do split
> based on protocol headers. Given an arbitrarily variable length in Rx
> packet segment, it is almost impossible to pass a fixed protocol header to
> driver. Besides, the existence of tunneling results in the composition of
> a packet is various, which makes the situation even worse.
> 
> This patch extends current buffer split to support protocol header based
> buffer split. A new proto_hdr field is introduced in the reserved field
> of rte_eth_rxseg_split structure to specify protocol header. The proto_hdr
> field defines the split position of packet, splitting will always happen
> after the protocol header defined in the Rx packet segment. When Rx queue
> offload RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT is enabled and corresponding
> protocol header is configured, driver will split the ingress packets into
> multiple segments.
> 
> Examples for proto_hdr field defines:
> To split after ETH-IPV4-UDP, it should be defined as
> proto_hdr = RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
>              RTE_PTYPE_L4_UDP
> 
> For inner ETH-IPV4-UDP, it should be defined as
> proto_hdr = RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER |
>              RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_UDP
> 
> If the protocol header is repeated with the previously defined one,
> the repeated part can be omitted. For example, split after ETH, ETH-IPV4
> and ETH-IPV4-UDP, it should be defined as
> proto_hdr0 = RTE_PTYPE_L2_ETHER
> proto_hdr1 = RTE_PTYPE_L3_IPV4_EXT_UNKNOWN
> proto_hdr2 = RTE_PTYPE_L4_UDP

Ack

> 
> struct rte_eth_rxseg_split {
>          struct rte_mempool *mp; /* memory pools to allocate segment from */
>          uint16_t length; /* segment maximal data length,
>                              configures split point */
>          uint16_t offset; /* data offset from beginning
>                              of mbuf data buffer */
>          /**
> 	 * Proto_hdr defines a bit mask of the protocol sequence as
>           * RTE_PTYPE_*, configures split point. The last RTE_PTYPE*
>           * in the mask indicates the split position.
>           * If one protocol header is defined to split packets into two
>           * segments, for non-tunneling packets, the complete protocol
>           * sequence should be defined.
>           * For tunneling packets, for simplicity,
>           * only the tunnel and inner part of comple protocol sequence
>           * is required.
>           * If several protocol headers are defined to split packets into
>           * multi-segments, the repeated parts of adjacent segments
>           * should be omitted.
> 	 */
>          uint32_t proto_hdr;
> };

Sorry, but I see no reason to repeat it in the description.
What is the purpose of the duplication?

> 
> If protocol header split can be supported by a PMD, the
> rte_eth_buffer_split_get_supported_hdr_ptypes function can
> be use to obtain a list of these protocol headers.
> 
> For example, let's suppose we configured the Rx queue with the
> following segments:
>          seg0 - pool0, proto_hdr0=RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4,
>                 off0=2B
>          seg1 - pool1, proto_hdr1=RTE_PTYPE_L4_UDP, off1=128B
>          seg2 - pool2, off1=0B
> 
> The packet consists of ETH_IPV4_UDP_PAYLOAD will be split like
> following:
>          seg0 - ipv4 header @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from pool0
>          seg1 - udp header @ 128 in mbuf from pool1
>          seg2 - payload @ 0 in mbuf from pool2
> 
> Note: NIC will only do split when the packets exactly match all the
> protocol headers in the segments. For example, if ARP packets received
> with above config, the NIC won't do split for ARP packets since
> it does not contains ipv4 header and udp header. These packets will be put

ipv4 -> IPv4, udp -> UDP.

> into the last valid mempool, with zero offset.

What should happen if we have seg1 -> ETH, seg2 -> IPv4, seg3 ->
remaining, and we receive ARP? Will we see the ETH header split into
seg1 and everything else in seg3? I would say yes.

Maybe we should define the intended behavior using pseudo-code?
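
For instance, roughly (an untested sketch of the behaviour I would
expect, per received packet):

	for (i = 0; i < rx_nseg - 1; i++) {
		if (packet does not match rx_seg[i].proto_hdr)
			break;
		/* put data up to and including rx_seg[i].proto_hdr
		 * into a mbuf from rx_seg[i].mp
		 */
	}
	/* put everything that is left into a mbuf from the mempool
	 * of the last valid segment, at zero offset
	 */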

> 
> Now buffer split can be configured in two modes. For length based
> buffer split, the mp, length, offset field in Rx packet segment should
> be configured, while the proto_hdr field will be ignored.
> For protocol header based buffer split, the mp, offset, proto_hdr field
> in Rx packet segment should be configured, while the length field will
> be ignored.
> 
> The split limitations imposed by underlying driver is reported in the
> rte_eth_dev_info->rx_seg_capa field. The memory attributes for the split
> parts may differ either, dpdk memory and external memory, respectively.
> 
> Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
> Signed-off-by: Xuan Ding <xuan.ding@intel.com>
> Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
> ---
>   doc/guides/rel_notes/release_22_11.rst |  4 ++
>   lib/ethdev/rte_ethdev.c                | 89 ++++++++++++++++++++++----
>   lib/ethdev/rte_ethdev.h                | 34 +++++++++-
>   3 files changed, 115 insertions(+), 12 deletions(-)
> 
> diff --git a/doc/guides/rel_notes/release_22_11.rst b/doc/guides/rel_notes/release_22_11.rst
> index 141fd9442b..4c3a7f8b8b 100644
> --- a/doc/guides/rel_notes/release_22_11.rst
> +++ b/doc/guides/rel_notes/release_22_11.rst
> @@ -127,6 +127,10 @@ New Features
>   
>     * Added ``rte_eth_buffer_split_get_supported_hdr_ptypes()``, to get supported
>       header protocols of a PMD to split.
> +  * Ethdev: The ``reserved`` field in the ``rte_eth_rxseg_split`` structure is
> +    replaced with ``proto_hdr`` to support protocol header based buffer split.
> +    User can choose length or protocol header to configure buffer split
> +    according to NIC's capability.

It sounds like it should be mentioned in API change section as
well. Here I'd concentrate on top level feature overview only.
I.e. Supported protocol-based buffer split using added
``proto_hdr`` in structure ``rte_eth_rxseg_split``.

>   
>   
>   Removed Items
> diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
> index ee3b490889..60fe6eb2bd 100644
> --- a/lib/ethdev/rte_ethdev.c
> +++ b/lib/ethdev/rte_ethdev.c
> @@ -1650,14 +1650,18 @@ rte_eth_dev_is_removed(uint16_t port_id)
>   }
>   
>   static int
> -rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
> -			     uint16_t n_seg, uint32_t *mbp_buf_size,
> -			     const struct rte_eth_dev_info *dev_info)
> +rte_eth_rx_queue_check_split(uint16_t port_id,
> +			const struct rte_eth_rxseg_split *rx_seg,
> +			uint16_t n_seg, uint32_t *mbp_buf_size,
> +			const struct rte_eth_dev_info *dev_info)
>   {
>   	const struct rte_eth_rxseg_capa *seg_capa = &dev_info->rx_seg_capa;
>   	struct rte_mempool *mp_first;
>   	uint32_t offset_mask;
>   	uint16_t seg_idx;
> +	int ptype_cnt;
> +	uint32_t *ptypes;
> +	int i;
>   
>   	if (n_seg > seg_capa->max_nseg) {
>   		RTE_ETHDEV_LOG(ERR,
> @@ -1675,6 +1679,7 @@ rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
>   		struct rte_mempool *mpl = rx_seg[seg_idx].mp;
>   		uint32_t length = rx_seg[seg_idx].length;
>   		uint32_t offset = rx_seg[seg_idx].offset;
> +		uint32_t proto_hdr = rx_seg[seg_idx].proto_hdr;
>   
>   		if (mpl == NULL) {
>   			RTE_ETHDEV_LOG(ERR, "null mempool pointer\n");
> @@ -1708,13 +1713,75 @@ rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
>   		}
>   		offset += seg_idx != 0 ? 0 : RTE_PKTMBUF_HEADROOM;
>   		*mbp_buf_size = rte_pktmbuf_data_room_size(mpl);
> -		length = length != 0 ? length : *mbp_buf_size;
> -		if (*mbp_buf_size < length + offset) {
> -			RTE_ETHDEV_LOG(ERR,
> -				       "%s mbuf_data_room_size %u < %u (segment length=%u + segment offset=%u)\n",
> -				       mpl->name, *mbp_buf_size,
> -				       length + offset, length, offset);
> -			return -EINVAL;
> +
> +		if (proto_hdr > 0) {

proto_hdr != 0, please. I know that it is the same, but != 0
raises fewer questions about whether the field is signed or unsigned.

As the first condition here we should check if protocol-based
split is supported at all (see note about separate helper
function below).

> +			/* Split based on protocol headers. */
> +			if (length != 0) {
> +				RTE_ETHDEV_LOG(ERR,
> +					"Do not set length split and protocol split within a segment\n"
> +					);
> +				return -EINVAL;
> +			}
> +
> +			if (seg_idx == n_seg - 1) {
> +				RTE_ETHDEV_LOG(ERR,
> +					"The proto_hdr in the last segment should be 0\n"
> +					);
> +				return -EINVAL;
> +			}

I think here we should check if we have seen any segment
with proto_hdr == 0 before. If so, we can't do protocol
based split any more. Since we need to collect already
split protocols (prev_proto_hdrs), I would use the variable
as a marker and set it to an all-ones mask as soon as
proto_hdr == 0 is met.

So, the condition will be
if ((proto_hdr & prev_proto_hdrs) != 0)

So, it will check both: no repeat of previous
protocol headers which are already split, and no
proto-split after length-based split.
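
Something along these lines (an untested sketch, the variable names
are only suggestions):

	uint32_t prev_proto_hdrs = 0;

	for (seg_idx = 0; seg_idx < n_seg; seg_idx++) {
		uint32_t proto_hdr = rx_seg[seg_idx].proto_hdr;

		if (proto_hdr != 0) {
			/* no repeat of already split headers and no
			 * proto-split after a length-based split
			 */
			if ((proto_hdr & prev_proto_hdrs) != 0)
				return -EINVAL;
			/* ... check (prev_proto_hdrs | proto_hdr) against
			 * the ptypes reported by the driver ...
			 */
			prev_proto_hdrs |= proto_hdr;
		} else {
			/* length-based split from now on */
			prev_proto_hdrs = RTE_PTYPE_ALL_MASK;
		}
	}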

> +
> +			if (*mbp_buf_size < offset) {
> +				RTE_ETHDEV_LOG(ERR,
> +						"%s mbuf_data_room_size %u < %u segment offset)\n",
> +						mpl->name, *mbp_buf_size,
> +						offset);
> +				return -EINVAL;
> +			}
> +

(separate helper function starts here)

> +			ptype_cnt = rte_eth_buffer_split_get_supported_hdr_ptypes(port_id, NULL, 0);

There is no point in doing it in a loop. It should be done
outside. Moreover, it should be a helper function
which does it, to keep this function short.
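
E.g. something like this (an untested sketch, the helper name is just
a suggestion):

	static int
	eth_dev_buffer_split_get_supported_hdrs_helper(uint16_t port_id,
						       uint32_t **ptypes)
	{
		int cnt;

		*ptypes = NULL;
		cnt = rte_eth_buffer_split_get_supported_hdr_ptypes(port_id,
								    NULL, 0);
		if (cnt <= 0)
			return cnt;

		*ptypes = malloc(sizeof(uint32_t) * cnt);
		if (*ptypes == NULL)
			return -ENOMEM;

		return rte_eth_buffer_split_get_supported_hdr_ptypes(port_id,
								     *ptypes, cnt);
	}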

> +			if (ptype_cnt <= 0) {
> +				RTE_ETHDEV_LOG(ERR,
> +					"Port %u failed to supported buffer split header protocols\n",
> +					port_id);
> +				return -EINVAL;
> +			}
> +
> +			ptypes = malloc(sizeof(uint32_t) * ptype_cnt);
> +			if (ptypes == NULL)
> +				return -ENOMEM;
> +
> +			ptype_cnt = rte_eth_buffer_split_get_supported_hdr_ptypes(port_id,
> +										ptypes, ptype_cnt);
> +			if (ptype_cnt < 0) {
> +				RTE_ETHDEV_LOG(ERR,
> +					"Port %u failed to supported buffer split header protocols\n",
> +					port_id);
> +				free(ptypes);
> +				return -EINVAL;
> +			}

(separate helper function ends here)

> +
> +			for (i = 0; i < ptype_cnt; i++)
> +				if (ptypes[i] == proto_hdr)

It should be if ((prev_proto_hdrs | proto_hdr) == ptypes[i])

> +					break;
> +
> +			free(ptypes);
> +
> +			if (i == ptype_cnt) {
> +				RTE_ETHDEV_LOG(ERR,
> +					"Requested Rx split header protocols 0x%x is not supported.\n",
> +					proto_hdr);
> +				return -EINVAL;
> +			}

prev_proto_hdrs |= proto_hdr;

> +		} else {

NOTE: If the driver does not support length-based split,
it should reject such a configuration itself.
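
I.e. roughly, in the PMD Rx queue setup (an untested sketch, names for
illustration only):

	/* if the PMD supports only protocol based split */
	for (i = 0; i < rx_conf->rx_nseg; i++) {
		if (rx_conf->rx_seg[i].split.length != 0) {
			PMD_INIT_LOG(ERR,
				     "length based buffer split is not supported");
			return -ENOTSUP;
		}
	}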

> +			/* Split at fixed length. */
> +			length = length != 0 ? length : *mbp_buf_size;
> +			if (*mbp_buf_size < length + offset) {
> +				RTE_ETHDEV_LOG(ERR,
> +					"%s mbuf_data_room_size %u < %u (segment length=%u + segment offset=%u)\n",
> +					mpl->name, *mbp_buf_size,
> +					length + offset, length, offset);
> +				return -EINVAL;
> +			}

prev_proto_hdrs = RTE_PTYPE_ALL_MASK;

>   		}
>   	}
>   	return 0;
> @@ -1794,7 +1861,7 @@ rte_eth_rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
>   		n_seg = rx_conf->rx_nseg;
>   
>   		if (rx_conf->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) {
> -			ret = rte_eth_rx_queue_check_split(rx_seg, n_seg,
> +			ret = rte_eth_rx_queue_check_split(port_id, rx_seg, n_seg,
>   							   &mbp_buf_size,
>   							   &dev_info);
>   			if (ret != 0)
> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> index c51c1f3fa0..4c9b121355 100644
> --- a/lib/ethdev/rte_ethdev.h
> +++ b/lib/ethdev/rte_ethdev.h
> @@ -994,6 +994,9 @@ struct rte_eth_txmode {
>    *   specified in the first array element, the second buffer, from the
>    *   pool in the second element, and so on.
>    *
> + * - The proto_hdrs in the elements define the split position of
> + *   received packets.
> + *
>    * - The offsets from the segment description elements specify
>    *   the data offset from the buffer beginning except the first mbuf.
>    *   The first segment offset is added with RTE_PKTMBUF_HEADROOM.
> @@ -1015,12 +1018,41 @@ struct rte_eth_txmode {
>    *     - pool from the last valid element
>    *     - the buffer size from this pool
>    *     - zero offset
> + *
> + * - Length based buffer split:
> + *     - mp, length, offset should be configured.
> + *     - The proto_hdr field must be 0.
> + *
> + * - Protocol header based buffer split:
> + *     - mp, offset, proto_hdr should be configured.
> + *     - The length field must be 0.
> + *     - The proto_hdr field in the last segment should be 0.
> + *
> + * - For Protocol header based buffer split, if the received packets
> + *   don't exactly match all protocol headers in the elements, packets
> + *   will not be split.
> + *   These packets will be put into:
> + *     - pool from the last valid element
> + *     - the buffer size from this pool
> + *     - zero offset
>    */
>   struct rte_eth_rxseg_split {
>   	struct rte_mempool *mp; /**< Memory pool to allocate segment from. */
>   	uint16_t length; /**< Segment data length, configures split point. */
>   	uint16_t offset; /**< Data offset from beginning of mbuf data buffer. */
> -	uint32_t reserved; /**< Reserved field. */
> +	/**
> +	 * Proto_hdr defines a bit mask of the protocol sequence as RTE_PTYPE_*,
> +	 * configures split point. The last RTE_PTYPE* in the mask indicates the
> +	 * split position.
> +	 *
> +	 * If one protocol header is defined to split packets into two segments,
> +	 * for non-tunneling packets, the complete protocol sequence should be defined.
> +	 * For tunneling packets, for simplicity, only the tunnel and inner part of
> +	 * comple protocol sequence is required.
> +	 * If several protocol headers are defined to split packets into multi-segments,
> +	 * the repeated parts of adjacent segments should be omitted.
> +	 */
> +	uint32_t proto_hdr;
>   };
>   
>   /**


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v8 3/4] app/testpmd: add rxhdrs commands and parameters
  2022-10-05 23:18   ` [PATCH v8 3/4] app/testpmd: add rxhdrs commands and parameters Yuan Wang
@ 2022-10-06 10:12     ` Andrew Rybchenko
  0 siblings, 0 replies; 72+ messages in thread
From: Andrew Rybchenko @ 2022-10-06 10:12 UTC (permalink / raw)
  To: Yuan Wang, dev, Aman Singh, Yuying Zhang
  Cc: thomas, ferruh.yigit, mdr, xiaoyun.li, qi.z.zhang, qiming.yang,
	jerinjacobk, viacheslavo, stephen, xuan.ding, hpothula,
	yaqi.tang, Wenxuan Wu

On 10/6/22 02:18, Yuan Wang wrote:
> Add command line parameter:
> --rxhdrs=eth[,ipv4,udp]
> 
> Set the protocol_hdr of segments to scatter packets on receiving if
> split feature is engaged. And the queues with BUFFER_SPLIT flag.
> 
> Add interactive mode command:
> testpmd>set rxhdrs eth,ipv4,udp
> (protocol sequence should be valid)
> 
> The protocol split feature is off by default. To enable protocol split,
> you need:
> 1. Start testpmd with multiple mempools. E.g. --mbuf-size=2048,2048
> 2. Configure Rx queue with rx_offload buffer split on.
> 3. Set the protocol type of buffer split. E.g. set rxhdrs eth,eth-ipv4
>          (default protocols of testpmd : eth|ipv4|ipv6|ipv4-tcp|ipv6-tcp|
>           ipv4-udp|ipv6-udp|ipv4-sctp|ipv6-sctp|grenat|inner-eth|
>           inner-ipv4|inner-ipv6|inner-ipv4-tcp|inner-ipv6-tcp|
>           inner-ipv4-udp|inner-ipv6-udp|inner-ipv4-sctp|inner-ipv6-sctp)
> Above protocols can be configured in testpmd. But the configuration can
> only be applied when it is supported by specific pmd.
> 
> Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
> Signed-off-by: Xuan Ding <xuan.ding@intel.com>
> Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
> ---
>   app/test-pmd/cmdline.c    | 152 +++++++++++++++++++++++++++++++++++++-
>   app/test-pmd/config.c     |  95 ++++++++++++++++++++++++
>   app/test-pmd/parameters.c |  16 +++-
>   app/test-pmd/testpmd.c    |  11 ++-
>   app/test-pmd/testpmd.h    |   6 ++
>   5 files changed, 273 insertions(+), 7 deletions(-)

testpmd documentation must be updated:
doc/guides/testpmd_app_ug/testpmd_funcs.rst


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v8 4/4] net/ice: support buffer split in Rx path
  2022-10-05 23:18   ` [PATCH v8 4/4] net/ice: support buffer split in Rx path Yuan Wang
@ 2022-10-06 10:12     ` Andrew Rybchenko
  0 siblings, 0 replies; 72+ messages in thread
From: Andrew Rybchenko @ 2022-10-06 10:12 UTC (permalink / raw)
  To: Yuan Wang, dev, Qiming Yang, Qi Zhang
  Cc: thomas, ferruh.yigit, mdr, xiaoyun.li, aman.deep.singh,
	yuying.zhang, jerinjacobk, viacheslavo, stephen, xuan.ding,
	hpothula, yaqi.tang, Wenxuan Wu

On 10/6/22 02:18, Yuan Wang wrote:
> This patch adds support for protocol based buffer split in normal Rx

"This patch adds" -> "Add"

> data paths. When the Rx queue is configured with specific protocol type,
> packets received will be directly split into protocol header and
> payload parts limitation of pmd. And the two parts will be

pmd -> PMD

> put into different mempools.
> 
> Currently, protocol based buffer split is not supported in vectorized
> paths.
> 
> A new api ice_buffer_split_supported_hdr_ptypes_get() has been

api -> API

> introduced, it will return the supported header protocols of ice PMD
> to app for splitting.
> 
> Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
> Signed-off-by: Xuan Ding <xuan.ding@intel.com>
> Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>

The patch should update doc/guides/nics/features/ice.ini to
list Buffer Split on Rx feature.



^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v8 0/4] support protocol based buffer split
  2022-10-05 23:18 ` [PATCH v8 0/4] support protocol based buffer split Yuan Wang
                     ` (3 preceding siblings ...)
  2022-10-05 23:18   ` [PATCH v8 4/4] net/ice: support buffer split in Rx path Yuan Wang
@ 2022-10-06 10:13   ` Andrew Rybchenko
  4 siblings, 0 replies; 72+ messages in thread
From: Andrew Rybchenko @ 2022-10-06 10:13 UTC (permalink / raw)
  To: Yuan Wang, dev
  Cc: thomas, ferruh.yigit, mdr, xiaoyun.li, aman.deep.singh,
	yuying.zhang, qi.z.zhang, qiming.yang, jerinjacobk, viacheslavo,
	stephen, xuan.ding, hpothula, yaqi.tang

On 10/6/22 02:18, Yuan Wang wrote:
> Protocol type based buffer split consists of splitting a received packet
> into several separate segments based on the packet content. It is useful
> in some scenarios, such as GPU acceleration. The splitting will help to
> enable true zero copy and hence improve the performance significantly.
> 
> This patchset aims to support protocol header split based on current buffer
> split. When Rx queue is configured with RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT
> offload and corresponding protocol, packets received will be directly split
> into different mempools.
> 
> Change log:
> v8:
> Restrict length == 0 and proto_hdr == 0 in another buffer split.
> Add check for proto_hdr == 0 in last segment.
> Use heap instead of stack for array to avoid vulnerabilities.
> Define the proto_hdr for two segments and multi-segments, respectively.
> Separate variable definition and code.
> Refine the doc and commit log.

Overall LGTM, please, process review notes and I think it will
be ready to be accepted.


^ permalink raw reply	[flat|nested] 72+ messages in thread

* RE: [PATCH v8 2/4] ethdev: introduce protocol hdr based buffer split
  2022-10-06 10:11     ` Andrew Rybchenko
@ 2022-10-08 14:30       ` Ding, Xuan
  0 siblings, 0 replies; 72+ messages in thread
From: Ding, Xuan @ 2022-10-08 14:30 UTC (permalink / raw)
  To: Andrew Rybchenko, Wang, YuanX, dev, Thomas Monjalon, Ferruh Yigit
  Cc: ferruh.yigit, mdr, Li, Xiaoyun, Singh, Aman Deep, Zhang, Yuying,
	Zhang, Qi Z, Yang, Qiming, jerinjacobk, viacheslavo, stephen,
	hpothula, Tang, Yaqi, Wenxuan Wu

Hi Andrew,

> -----Original Message-----
> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> Sent: Thursday, October 6, 2022 6:12 PM
> To: Wang, YuanX <yuanx.wang@intel.com>; dev@dpdk.org; Thomas
> Monjalon <thomas@monjalon.net>; Ferruh Yigit <ferruh.yigit@amd.com>
> Cc: ferruh.yigit@xilinx.com; mdr@ashroe.eu; Li, Xiaoyun
> <xiaoyun.li@intel.com>; Singh, Aman Deep <aman.deep.singh@intel.com>;
> Zhang, Yuying <yuying.zhang@intel.com>; Zhang, Qi Z
> <qi.z.zhang@intel.com>; Yang, Qiming <qiming.yang@intel.com>;
> jerinjacobk@gmail.com; viacheslavo@nvidia.com;
> stephen@networkplumber.org; Ding, Xuan <xuan.ding@intel.com>;
> hpothula@marvell.com; Tang, Yaqi <yaqi.tang@intel.com>; Wenxuan Wu
> <wenxuanx.wu@intel.com>
> Subject: Re: [PATCH v8 2/4] ethdev: introduce protocol hdr based buffer split
> 
> On 10/6/22 02:18, Yuan Wang wrote:
> > Currently, Rx buffer split supports length based split. With Rx queue
> > offload RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT enabled and Rx packet
> segment
> > configured, PMD will be able to split the received packets into
> > multiple segments.
> >
> > However, length based buffer split is not suitable for NICs that do
> > split based on protocol headers. Given an arbitrarily variable length
> > in Rx packet segment, it is almost impossible to pass a fixed protocol
> > header to driver. Besides, the existence of tunneling results in the
> > composition of a packet is various, which makes the situation even worse.
> >
> > This patch extends current buffer split to support protocol header
> > based buffer split. A new proto_hdr field is introduced in the
> > reserved field of rte_eth_rxseg_split structure to specify protocol
> > header. The proto_hdr field defines the split position of packet,
> > splitting will always happen after the protocol header defined in the
> > Rx packet segment. When Rx queue offload
> > RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT is enabled and corresponding
> protocol
> > header is configured, driver will split the ingress packets into multiple
> segments.
> >
> > Examples for proto_hdr field defines:
> > To split after ETH-IPV4-UDP, it should be defined as proto_hdr =
> > RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
> >              RTE_PTYPE_L4_UDP
> >
> > For inner ETH-IPV4-UDP, it should be defined as proto_hdr =
> > RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER |
> >              RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
> > RTE_PTYPE_INNER_L4_UDP
> >
> > If the protocol header is repeated with the previously defined one,
> > the repeated part can be omitted. For example, split after ETH,
> > ETH-IPV4 and ETH-IPV4-UDP, it should be defined as
> > proto_hdr0 = RTE_PTYPE_L2_ETHER
> > proto_hdr1 = RTE_PTYPE_L3_IPV4_EXT_UNKNOWN
> > proto_hdr2 = RTE_PTYPE_L4_UDP
> 
> Ack
> 
> >
> > struct rte_eth_rxseg_split {
> >          struct rte_mempool *mp; /* memory pools to allocate segment from
> */
> >          uint16_t length; /* segment maximal data length,
> >                              configures split point */
> >          uint16_t offset; /* data offset from beginning
> >                              of mbuf data buffer */
> >          /**
> > 	 * Proto_hdr defines a bit mask of the protocol sequence as
> >           * RTE_PTYPE_*, configures split point. The last RTE_PTYPE*
> >           * in the mask indicates the split position.
> >           * If one protocol header is defined to split packets into two
> >           * segments, for non-tunneling packets, the complete protocol
> >           * sequence should be defined.
> >           * For tunneling packets, for simplicity,
> >           * only the tunnel and inner part of comple protocol sequence
> >           * is required.
> >           * If several protocol headers are defined to split packets into
> >           * multi-segments, the repeated parts of adjacent segments
> >           * should be omitted.
> > 	 */
> >          uint32_t proto_hdr;
> > };
> 
> Sorry, but I see no reason to repeat it in the description.
> What is the purpose of the duplication?

The intention of repeating it here is to make the commit log more comprehensive.
We can remove these lines to make the log cleaner.

> 
> >
> > If protocol header split can be supported by a PMD, the
> > rte_eth_buffer_split_get_supported_hdr_ptypes function can be use to
> > obtain a list of these protocol headers.
> >
> > For example, let's suppose we configured the Rx queue with the
> > following segments:
> >          seg0 - pool0, proto_hdr0=RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4,
> >                 off0=2B
> >          seg1 - pool1, proto_hdr1=RTE_PTYPE_L4_UDP, off1=128B
> >          seg2 - pool2, off1=0B
> >
> > The packet consists of ETH_IPV4_UDP_PAYLOAD will be split like
> > following:
> >          seg0 - ipv4 header @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from
> pool0
> >          seg1 - udp header @ 128 in mbuf from pool1
> >          seg2 - payload @ 0 in mbuf from pool2
> >
> > Note: NIC will only do split when the packets exactly match all the
> > protocol headers in the segments. For example, if ARP packets received
> > with above config, the NIC won't do split for ARP packets since it
> > does not contains ipv4 header and udp header. These packets will be
> > put
> 
> ipv4 -> IPv4, udp -> UDP.
> 
> > into the last valid mempool, with zero offset.
> 
> What should happen if we have seg1 -> ETH, seg2 -> IPv4, seg3 - remaining
> and receive ARP? Will we see ETH header split in seg1 and everything else in
> the seg3? I would say yes.
> 
> May be we should define intended behavior using pseudo-code?

When the NIC receives such packets (like ARP), we think the expected split behavior is not decided by the library, but by the driver itself.
It is possible for the NIC to do the split in either the exact match or the longest match case.

Exact match means the NIC only does the split when the packets exactly match all the protocol headers in the segments.
Otherwise, the packets won't be split and the whole packet will be put into the last valid mempool; that is what we defined.
Longest match means the NIC will split as long as the packets match the leading protocol headers in the segments.

Since both cases are possible, IMO both scenarios should be defined. The final split result will always be one of them.
Hope to get your insights.

Attached pseudo-code for two cases below:
Exact match:
FOR each seg in segs except last one
    IF proto_hdr is not matched THEN
        BREAK
    END IF
END FOR
IF loop was broken THEN
    put whole pkt in last seg
ELSE
    put protocol header in each seg
    put everything else in last seg
END IF

Longest match:
FOR each seg in segs except last one
    IF proto_hdr is matched THEN
        put protocol header in seg
    ELSE
        BREAK
    END IF
END FOR
put everything else in last seg
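
Just to illustrate the difference in C terms, here is a hypothetical
helper (not code from any PMD): given how many of the configured
proto_hdr masks a received packet satisfies, it returns the number of
header segments that actually get filled; the rest of the packet always
goes to the last segment.

#include <stdbool.h>
#include <stdint.h>
#include <rte_common.h> /* RTE_MIN */

/* Hypothetical helper, for illustration only. */
static inline uint16_t
buffer_split_nb_hdr_segs(uint16_t n_seg, uint16_t n_matched, bool exact)
{
	if (exact)
		/* Exact match: all header segments or nothing. */
		return n_matched == (uint16_t)(n_seg - 1) ? n_seg - 1 : 0;
	/* Longest match: split as far as the headers keep matching. */
	return RTE_MIN(n_matched, (uint16_t)(n_seg - 1));
}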

> 
> >
> > Now buffer split can be configured in two modes. For length based
> > buffer split, the mp, length, offset field in Rx packet segment should
> > be configured, while the proto_hdr field will be ignored.
> > For protocol header based buffer split, the mp, offset, proto_hdr
> > field in Rx packet segment should be configured, while the length
> > field will be ignored.
> >
> > The split limitations imposed by underlying driver is reported in the
> > rte_eth_dev_info->rx_seg_capa field. The memory attributes for the
> > split parts may differ either, dpdk memory and external memory,
> respectively.
> >
> > Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
> > Signed-off-by: Xuan Ding <xuan.ding@intel.com>
> > Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
> > ---
> >   doc/guides/rel_notes/release_22_11.rst |  4 ++
> >   lib/ethdev/rte_ethdev.c                | 89 ++++++++++++++++++++++----
> >   lib/ethdev/rte_ethdev.h                | 34 +++++++++-
> >   3 files changed, 115 insertions(+), 12 deletions(-)
> >
> > diff --git a/doc/guides/rel_notes/release_22_11.rst
> > b/doc/guides/rel_notes/release_22_11.rst
> > index 141fd9442b..4c3a7f8b8b 100644
> > --- a/doc/guides/rel_notes/release_22_11.rst
> > +++ b/doc/guides/rel_notes/release_22_11.rst
> > @@ -127,6 +127,10 @@ New Features
> >
> >     * Added ``rte_eth_buffer_split_get_supported_hdr_ptypes()``, to get
> supported
> >       header protocols of a PMD to split.
> > +  * Ethdev: The ``reserved`` field in the ``rte_eth_rxseg_split`` structure is
> > +    replaced with ``proto_hdr`` to support protocol header based buffer
> split.
> > +    User can choose length or protocol header to configure buffer split
> > +    according to NIC's capability.
> 
> It sounds like it should be mentioned in API change section as well. Here I'd
> concentrate on top level feature overview only.
> I.e. Supported protocol-based buffer split using added ``proto_hdr`` in
> structure ``rte_eth_rxseg_split``.

Thanks for your suggestion, please see next version.

> 
> >
> >
> >   Removed Items
> > diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c index
> > ee3b490889..60fe6eb2bd 100644
> > --- a/lib/ethdev/rte_ethdev.c
> > +++ b/lib/ethdev/rte_ethdev.c
> > @@ -1650,14 +1650,18 @@ rte_eth_dev_is_removed(uint16_t port_id)
> >   }
> >
> >   static int
> > -rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
> > -			     uint16_t n_seg, uint32_t *mbp_buf_size,
> > -			     const struct rte_eth_dev_info *dev_info)
> > +rte_eth_rx_queue_check_split(uint16_t port_id,
> > +			const struct rte_eth_rxseg_split *rx_seg,
> > +			uint16_t n_seg, uint32_t *mbp_buf_size,
> > +			const struct rte_eth_dev_info *dev_info)
> >   {
> >   	const struct rte_eth_rxseg_capa *seg_capa = &dev_info-
> >rx_seg_capa;
> >   	struct rte_mempool *mp_first;
> >   	uint32_t offset_mask;
> >   	uint16_t seg_idx;
> > +	int ptype_cnt;
> > +	uint32_t *ptypes;
> > +	int i;
> >
> >   	if (n_seg > seg_capa->max_nseg) {
> >   		RTE_ETHDEV_LOG(ERR,
> > @@ -1675,6 +1679,7 @@ rte_eth_rx_queue_check_split(const struct
> rte_eth_rxseg_split *rx_seg,
> >   		struct rte_mempool *mpl = rx_seg[seg_idx].mp;
> >   		uint32_t length = rx_seg[seg_idx].length;
> >   		uint32_t offset = rx_seg[seg_idx].offset;
> > +		uint32_t proto_hdr = rx_seg[seg_idx].proto_hdr;
> >
> >   		if (mpl == NULL) {
> >   			RTE_ETHDEV_LOG(ERR, "null mempool pointer\n");
> @@ -1708,13
> > +1713,75 @@ rte_eth_rx_queue_check_split(const struct
> rte_eth_rxseg_split *rx_seg,
> >   		}
> >   		offset += seg_idx != 0 ? 0 : RTE_PKTMBUF_HEADROOM;
> >   		*mbp_buf_size = rte_pktmbuf_data_room_size(mpl);
> > -		length = length != 0 ? length : *mbp_buf_size;
> > -		if (*mbp_buf_size < length + offset) {
> > -			RTE_ETHDEV_LOG(ERR,
> > -				       "%s mbuf_data_room_size %u < %u
> (segment length=%u + segment offset=%u)\n",
> > -				       mpl->name, *mbp_buf_size,
> > -				       length + offset, length, offset);
> > -			return -EINVAL;
> > +
> > +		if (proto_hdr > 0) {
> 
> proto_hdr != 0, please. I know that it is the same, but != 0 raises fewer
> questions about whether the field is signed or unsigned.

Get it.

> 
> As the first condition here we should check if protocol-based split is
> supported at all (see note about separate helper function below).
> 
> > +			/* Split based on protocol headers. */
> > +			if (length != 0) {
> > +				RTE_ETHDEV_LOG(ERR,
> > +					"Do not set length split and protocol
> split within a segment\n"
> > +					);
> > +				return -EINVAL;
> > +			}
> > +
> > +			if (seg_idx == n_seg - 1) {
> > +				RTE_ETHDEV_LOG(ERR,
> > +					"The proto_hdr in the last segment
> should be 0\n"
> > +					);
> > +				return -EINVAL;
> > +			}
> 
> I think here we should check if we have seen any segment with proto_hdr ==
> 0 before. If so, we can't do protocol based split any more. Since we need to
> collect already split protocols (prev_proto_hdrs), I would use the variable as a
> marker and set it to an all 1's mask as soon as
> proto_hdr == 0 is met.
> 
> So, the condition will be
> if ((proto_hdr & prev_proto_hdrs) != 0)
> 
> So, it will check two things: no repeat of previous protocol headers which are
> already split, and no proto-split after length-based split.

Thanks for your suggestion.
The introduction of prev_proto_hdrs is a good idea, which helps to solve the two issues above.
We will adopt this implementation, please see next version.

> 
> > +
> > +			if (*mbp_buf_size < offset) {
> > +				RTE_ETHDEV_LOG(ERR,
> > +						"%s
> mbuf_data_room_size %u < %u segment offset)\n",
> > +						mpl->name, *mbp_buf_size,
> > +						offset);
> > +				return -EINVAL;
> > +			}
> > +
> 
> (separate helper function starts here)
> 
> > +			ptype_cnt =
> rte_eth_buffer_split_get_supported_hdr_ptypes(port_id,
> > +NULL, 0);
> 
> There is no point in doing it in a loop. It should be done outside. Moreover, it
> should be a helper function which does it to make this function short.

A new helper function eth_dev_buffer_split_get_supported_hdrs_helper() will be added in the next version.

> 
> > +			if (ptype_cnt <= 0) {
> > +				RTE_ETHDEV_LOG(ERR,
> > +					"Port %u failed to supported buffer
> split header protocols\n",
> > +					port_id);
> > +				return -EINVAL;
> > +			}
> > +
> > +			ptypes = malloc(sizeof(uint32_t) * ptype_cnt);
> > +			if (ptypes == NULL)
> > +				return -ENOMEM;
> > +
> > +			ptype_cnt =
> rte_eth_buffer_split_get_supported_hdr_ptypes(port_id,
> > +
> 	ptypes, ptype_cnt);
> > +			if (ptype_cnt < 0) {
> > +				RTE_ETHDEV_LOG(ERR,
> > +					"Port %u failed to supported buffer
> split header protocols\n",
> > +					port_id);
> > +				free(ptypes);
> > +				return -EINVAL;
> > +			}
> 
> (separate helper function ends here)
> 
> > +
> > +			for (i = 0; i < ptype_cnt; i++)
> > +				if (ptypes[i] == proto_hdr)
> 
> It should be if ((prev_proto_hdrs | proto_hdr) == ptypes[i])
> 
> > +					break;
> > +
> > +			free(ptypes);
> > +
> > +			if (i == ptype_cnt) {
> > +				RTE_ETHDEV_LOG(ERR,
> > +					"Requested Rx split header protocols
> 0x%x is not supported.\n",
> > +					proto_hdr);
> > +				return -EINVAL;
> > +			}
> 
> prev_proto_hdrs |= proto_hdr;
> 
> > +		} else {
> 
> NOTE: If the driver does not support length-based split, it should reject such a
> configuration itself.

Here we have a question. From the code perspective, how can we know
whether length-based split is supported by the driver, so that such a configuration can be rejected?
We have the get_support_ptypes() API for the driver to report proto-based split, but no API for length-based split.
Or are you referring to a doc update?

Thanks,
Xuan

> 
> > +			/* Split at fixed length. */
> > +			length = length != 0 ? length : *mbp_buf_size;
> > +			if (*mbp_buf_size < length + offset) {
> > +				RTE_ETHDEV_LOG(ERR,
> > +					"%s mbuf_data_room_size %u < %u
> (segment length=%u + segment offset=%u)\n",
> > +					mpl->name, *mbp_buf_size,
> > +					length + offset, length, offset);
> > +				return -EINVAL;
> > +			}
> 
> prev_proto_hdrs = RTE_PTYPE_ALL_MASK;
> 
> >   		}
> >   	}
> >   	return 0;
> > @@ -1794,7 +1861,7 @@ rte_eth_rx_queue_setup(uint16_t port_id,
> uint16_t rx_queue_id,
> >   		n_seg = rx_conf->rx_nseg;
> >
> >   		if (rx_conf->offloads &
> RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) {
> > -			ret = rte_eth_rx_queue_check_split(rx_seg, n_seg,
> > +			ret = rte_eth_rx_queue_check_split(port_id, rx_seg,
> n_seg,
> >   							   &mbp_buf_size,
> >   							   &dev_info);
> >   			if (ret != 0)
> > diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h index
> > c51c1f3fa0..4c9b121355 100644
> > --- a/lib/ethdev/rte_ethdev.h
> > +++ b/lib/ethdev/rte_ethdev.h
> > @@ -994,6 +994,9 @@ struct rte_eth_txmode {
> >    *   specified in the first array element, the second buffer, from the
> >    *   pool in the second element, and so on.
> >    *
> > + * - The proto_hdrs in the elements define the split position of
> > + *   received packets.
> > + *
> >    * - The offsets from the segment description elements specify
> >    *   the data offset from the buffer beginning except the first mbuf.
> >    *   The first segment offset is added with RTE_PKTMBUF_HEADROOM.
> > @@ -1015,12 +1018,41 @@ struct rte_eth_txmode {
> >    *     - pool from the last valid element
> >    *     - the buffer size from this pool
> >    *     - zero offset
> > + *
> > + * - Length based buffer split:
> > + *     - mp, length, offset should be configured.
> > + *     - The proto_hdr field must be 0.
> > + *
> > + * - Protocol header based buffer split:
> > + *     - mp, offset, proto_hdr should be configured.
> > + *     - The length field must be 0.
> > + *     - The proto_hdr field in the last segment should be 0.
> > + *
> > + * - For Protocol header based buffer split, if the received packets
> > + *   don't exactly match all protocol headers in the elements, packets
> > + *   will not be split.
> > + *   These packets will be put into:
> > + *     - pool from the last valid element
> > + *     - the buffer size from this pool
> > + *     - zero offset
> >    */
> >   struct rte_eth_rxseg_split {
> >   	struct rte_mempool *mp; /**< Memory pool to allocate segment
> from. */
> >   	uint16_t length; /**< Segment data length, configures split point. */
> >   	uint16_t offset; /**< Data offset from beginning of mbuf data buffer.
> */
> > -	uint32_t reserved; /**< Reserved field. */
> > +	/**
> > +	 * Proto_hdr defines a bit mask of the protocol sequence as
> RTE_PTYPE_*,
> > +	 * configures split point. The last RTE_PTYPE* in the mask indicates
> the
> > +	 * split position.
> > +	 *
> > +	 * If one protocol header is defined to split packets into two
> segments,
> > +	 * for non-tunneling packets, the complete protocol sequence should
> be defined.
> > +	 * For tunneling packets, for simplicity, only the tunnel and inner part
> of
> > +	 * comple protocol sequence is required.
> > +	 * If several protocol headers are defined to split packets into multi-
> segments,
> > +	 * the repeated parts of adjacent segments should be omitted.
> > +	 */
> > +	uint32_t proto_hdr;
> >   };
> >
> >   /**


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v9 0/4] support protocol based buffer split
  2022-10-09 20:25 ` [PATCH v9 " Yuan Wang
@ 2022-10-09 14:58   ` Andrew Rybchenko
  2022-10-10  2:45     ` Ding, Xuan
  2022-10-09 20:25   ` [PATCH v9 1/4] ethdev: introduce protocol header API Yuan Wang
                     ` (3 subsequent siblings)
  4 siblings, 1 reply; 72+ messages in thread
From: Andrew Rybchenko @ 2022-10-09 14:58 UTC (permalink / raw)
  To: Yuan Wang, dev
  Cc: thomas, ferruh.yigit, mdr, xiaoyun.li, aman.deep.singh,
	yuying.zhang, qi.z.zhang, qiming.yang, jerinjacobk, viacheslavo,
	stephen, xuan.ding, hpothula, yaqi.tang

On 10/9/22 23:25, Yuan Wang wrote:
> Protocol type based buffer split consists of splitting a received packet
> into several separate segments based on the packet content. It is useful
> in some scenarios, such as GPU acceleration. The splitting will help to
> enable true zero copy and hence improve the performance significantly.
> 
> This patchset aims to support protocol header split based on current buffer
> split. When Rx queue is configured with RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT
> offload and corresponding protocol, packets received will be directly split
> into different mempools.
> 
> Change log:
> v9:
> Define the intended behaviors for exact match and longest match.
> Add protocol headers repeat check.
> Add no proto-split after length-based split check.
> Add a helper function to shorten the check function.
> Refine the doc and commit log.

With a few minor fixes, applied to dpdk-next-net/main, thanks.



^ permalink raw reply	[flat|nested] 72+ messages in thread

* [PATCH v9 0/4] support protocol based buffer split
  2022-08-12 18:15 [PATCH 0/4] support protocol based buffer split Yuan Wang
                   ` (14 preceding siblings ...)
  2022-10-05 23:18 ` [PATCH v8 0/4] support protocol based buffer split Yuan Wang
@ 2022-10-09 20:25 ` Yuan Wang
  2022-10-09 14:58   ` Andrew Rybchenko
                     ` (4 more replies)
  15 siblings, 5 replies; 72+ messages in thread
From: Yuan Wang @ 2022-10-09 20:25 UTC (permalink / raw)
  To: dev
  Cc: thomas, andrew.rybchenko, ferruh.yigit, mdr, xiaoyun.li,
	aman.deep.singh, yuying.zhang, qi.z.zhang, qiming.yang,
	jerinjacobk, viacheslavo, stephen, xuan.ding, hpothula,
	yaqi.tang, Yuan Wang

Protocol type based buffer split consists of splitting a received packet
into several separate segments based on the packet content. It is useful
in some scenarios, such as GPU acceleration. The splitting will help to
enable true zero copy and hence improve the performance significantly.

This patchset aims to support protocol header split based on current buffer
split. When Rx queue is configured with RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT
offload and corresponding protocol, packets received will be directly split
into different mempools.

Change log:
v9:
Define the intended behaviors for exact match and longest match.
Add protocol headers repeat check.
Add no proto-split after length-based split check.
Add a helper function to shorten the check function.
Refine the doc and commit log.

v8:
Restrict length == 0 and proto_hdr == 0 in another buffer split.
Add check for proto_hdr == 0 in last segment.
Use heap instead of stack for array to avoid vulnerabilities.
Define the proto_hdr for two segments and multi-segments, respectively.
Separate variable definition and code.
Refine the doc and commit log.

v7:
ice: Fix CI issue.

v6:
ice: Fix proto_hdr mappings to NIC configuration.

v5:
Define proto_hdr to use mask instead of single protocol type.
Define PMD to return protocol header mask.
Refine the doc and commit log.
Remove deprecated RTE_FUNC_PTR_OR_ERR_RET.

v4:
Change proto_hdr to a bit mask of RTE_PTYPE_*.
Add the description on how to put the unsplit packets.
Use proto_hdr to determine whether to use protocol based split.

v3:
Fix mail thread.

v2:
Add mbuf dump to the driver's buffer split path.
Add buffer split to the driver feature list.
Remove unsupported header protocols from the driver.

Yuan Wang (4):
  ethdev: introduce protocol header API
  ethdev: introduce protocol hdr based buffer split
  app/testpmd: add rxhdrs commands and parameters
  net/ice: support buffer split in Rx path

 app/test-pmd/cmdline.c                      | 152 ++++++++++-
 app/test-pmd/config.c                       | 108 ++++++++
 app/test-pmd/parameters.c                   |  16 +-
 app/test-pmd/testpmd.c                      |  11 +-
 app/test-pmd/testpmd.h                      |   6 +
 doc/guides/nics/features.rst                |   2 +-
 doc/guides/nics/features/default.ini        |   1 +
 doc/guides/nics/features/ice.ini            |   1 +
 doc/guides/rel_notes/release_22_11.rst      |  16 ++
 doc/guides/testpmd_app_ug/testpmd_funcs.rst |  19 +-
 drivers/net/ice/ice_ethdev.c                |  58 ++++-
 drivers/net/ice/ice_rxtx.c                  | 263 +++++++++++++++++---
 drivers/net/ice/ice_rxtx.h                  |  16 ++
 drivers/net/ice/ice_rxtx_vec_common.h       |   3 +
 lib/ethdev/ethdev_driver.h                  |  15 ++
 lib/ethdev/rte_ethdev.c                     | 128 +++++++++-
 lib/ethdev/rte_ethdev.h                     |  67 ++++-
 lib/ethdev/version.map                      |   1 +
 18 files changed, 829 insertions(+), 54 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 72+ messages in thread

* [PATCH v9 1/4] ethdev: introduce protocol header API
  2022-10-09 20:25 ` [PATCH v9 " Yuan Wang
  2022-10-09 14:58   ` Andrew Rybchenko
@ 2022-10-09 20:25   ` Yuan Wang
  2022-10-09 20:25   ` [PATCH v9 2/4] ethdev: introduce protocol hdr based buffer split Yuan Wang
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 72+ messages in thread
From: Yuan Wang @ 2022-10-09 20:25 UTC (permalink / raw)
  To: dev, Thomas Monjalon, Ferruh Yigit, Andrew Rybchenko, Ray Kinsella
  Cc: ferruh.yigit, xiaoyun.li, aman.deep.singh, yuying.zhang,
	qi.z.zhang, qiming.yang, jerinjacobk, viacheslavo, stephen,
	xuan.ding, hpothula, yaqi.tang, Yuan Wang, Wenxuan Wu

Add a new ethdev API to retrieve supported protocol headers
of a PMD, which helps to configure protocol header based buffer split.
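
For illustration, a minimal application-side sketch of the intended
usage (hypothetical function name; it simply mirrors the two-call
pattern also used by the ethdev layer later in this series):

#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <rte_ethdev.h>

/* Query the header ptypes a port can split on: first call with num = 0
 * to get the count, then again with a caller-allocated array. */
static int
get_split_hdr_ptypes(uint16_t port_id, uint32_t **ptypes)
{
	int cnt;

	cnt = rte_eth_buffer_split_get_supported_hdr_ptypes(port_id, NULL, 0);
	if (cnt <= 0)
		return cnt;

	*ptypes = malloc(sizeof(uint32_t) * cnt);
	if (*ptypes == NULL)
		return -ENOMEM;

	return rte_eth_buffer_split_get_supported_hdr_ptypes(port_id,
							      *ptypes, cnt);
}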

Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
---
 doc/guides/nics/features.rst           |  2 +-
 doc/guides/rel_notes/release_22_11.rst |  5 ++++
 lib/ethdev/ethdev_driver.h             | 15 ++++++++++++
 lib/ethdev/rte_ethdev.c                | 33 ++++++++++++++++++++++++++
 lib/ethdev/rte_ethdev.h                | 30 +++++++++++++++++++++++
 lib/ethdev/version.map                 |  1 +
 6 files changed, 85 insertions(+), 1 deletion(-)

diff --git a/doc/guides/nics/features.rst b/doc/guides/nics/features.rst
index 6aa1085c5b..fea604e77f 100644
--- a/doc/guides/nics/features.rst
+++ b/doc/guides/nics/features.rst
@@ -183,7 +183,7 @@ Scatters the packets being received on specified boundaries to segmented mbufs.
 * **[uses]       rte_eth_rxconf**: ``rx_conf.rx_seg, rx_conf.rx_nseg``.
 * **[implements] datapath**: ``Buffer Split functionality``.
 * **[provides]   rte_eth_dev_info**: ``rx_offload_capa:RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT``.
-* **[related] API**: ``rte_eth_rx_queue_setup()``.
+* **[related] API**: ``rte_eth_rx_queue_setup()``, ``rte_eth_buffer_split_get_supported_hdr_ptypes()``.
 
 
 .. _nic_features_lro:
diff --git a/doc/guides/rel_notes/release_22_11.rst b/doc/guides/rel_notes/release_22_11.rst
index c560dbdab7..16aca14bab 100644
--- a/doc/guides/rel_notes/release_22_11.rst
+++ b/doc/guides/rel_notes/release_22_11.rst
@@ -189,6 +189,11 @@ New Features
   into single event containing ``rte_event_vector``
   whose event type is ``RTE_EVENT_TYPE_CRYPTODEV_VECTOR``.
 
+* **Added protocol header based buffer split.**
+
+  * Added ``rte_eth_buffer_split_get_supported_hdr_ptypes()``, to get supported
+    header protocols of a PMD to split.
+
 
 Removed Items
 -------------
diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
index e2bd4642b9..1300acc95d 100644
--- a/lib/ethdev/ethdev_driver.h
+++ b/lib/ethdev/ethdev_driver.h
@@ -1055,6 +1055,18 @@ typedef int (*eth_ip_reassembly_conf_get_t)(struct rte_eth_dev *dev,
 typedef int (*eth_ip_reassembly_conf_set_t)(struct rte_eth_dev *dev,
 		const struct rte_eth_ip_reassembly_params *conf);
 
+/**
+ * @internal
+ * Get supported header protocols of a PMD to split.
+ *
+ * @param dev
+ *   Ethdev handle of port.
+ *
+ * @return
+ *   An array pointer to store supported protocol headers.
+ */
+typedef const uint32_t *(*eth_buffer_split_supported_hdr_ptypes_get_t)(struct rte_eth_dev *dev);
+
 /**
  * @internal
  * Dump private info from device to a file.
@@ -1366,6 +1378,9 @@ struct eth_dev_ops {
 	/** Set IP reassembly configuration */
 	eth_ip_reassembly_conf_set_t ip_reassembly_conf_set;
 
+	/** Get supported header ptypes to split */
+	eth_buffer_split_supported_hdr_ptypes_get_t buffer_split_supported_hdr_ptypes_get;
+
 	/** Dump private info from device */
 	eth_dev_priv_dump_t eth_dev_priv_dump;
 
diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
index 4703ab0caf..79d1f9b993 100644
--- a/lib/ethdev/rte_ethdev.c
+++ b/lib/ethdev/rte_ethdev.c
@@ -6209,6 +6209,39 @@ rte_eth_tx_descriptor_dump(uint16_t port_id, uint16_t queue_id,
 						queue_id, offset, num, file));
 }
 
+int
+rte_eth_buffer_split_get_supported_hdr_ptypes(uint16_t port_id, uint32_t *ptypes, int num)
+{
+	int i, j;
+	struct rte_eth_dev *dev;
+	const uint32_t *all_types;
+
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
+	dev = &rte_eth_devices[port_id];
+
+	if (ptypes == NULL && num > 0) {
+		RTE_ETHDEV_LOG(ERR,
+			"Cannot get ethdev port %u supported header protocol types to NULL when array size is non zero\n",
+			port_id);
+		return -EINVAL;
+	}
+
+	if (*dev->dev_ops->buffer_split_supported_hdr_ptypes_get == NULL)
+		return -ENOTSUP;
+	all_types = (*dev->dev_ops->buffer_split_supported_hdr_ptypes_get)(dev);
+
+	if (all_types == NULL)
+		return 0;
+
+	for (i = 0, j = 0; all_types[i] != RTE_PTYPE_UNKNOWN; ++i) {
+		if (j < num)
+			ptypes[j] = all_types[i];
+		j++;
+	}
+
+	return j;
+}
+
 RTE_LOG_REGISTER_DEFAULT(rte_eth_dev_logtype, INFO);
 
 RTE_INIT(ethdev_init_telemetry)
diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
index 8c4a35cc1f..f9da569179 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -6337,6 +6337,36 @@ rte_eth_tx_buffer(uint16_t port_id, uint16_t queue_id,
 	return rte_eth_tx_buffer_flush(port_id, queue_id, buffer);
 }
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Get supported header protocols to split on Rx.
+ *
+ * When a packet type is announced to be split, it *must* be supported by
+ * the PMD. For instance, if eth-ipv4, eth-ipv4-udp is announced, the PMD must
+ * return the following packet types for these packets:
+ * - Ether/IPv4             -> RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4
+ * - Ether/IPv4/UDP         -> RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4 | RTE_PTYPE_L4_UDP
+ *
+ * @param port_id
+ *   The port identifier of the device.
+ * @param[out] ptypes
+ *   An array pointer to store supported protocol headers, allocated by caller.
+ *   These ptypes are composed with RTE_PTYPE_*.
+ * @param num
+ *   Size of the array pointed by param ptypes.
+ * @return
+ *   - (>=0) Number of supported ptypes. If the number of types exceeds num,
+ *           only num entries will be filled into the ptypes array, but the full
+ *           count of supported ptypes will be returned.
+ *   - (-ENOTSUP) if header protocol is not supported by device.
+ *   - (-ENODEV) if *port_id* invalid.
+ *   - (-EINVAL) if bad parameter.
+ */
+__rte_experimental
+int rte_eth_buffer_split_get_supported_hdr_ptypes(uint16_t port_id, uint32_t *ptypes, int num);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/ethdev/version.map b/lib/ethdev/version.map
index 3205556ce7..30b067e0b6 100644
--- a/lib/ethdev/version.map
+++ b/lib/ethdev/version.map
@@ -296,6 +296,7 @@ EXPERIMENTAL {
 	rte_flow_async_action_handle_query;
 	rte_mtr_meter_policy_get;
 	rte_mtr_meter_profile_get;
+	rte_eth_buffer_split_get_supported_hdr_ptypes;
 };
 
 INTERNAL {
-- 
2.25.1


^ permalink raw reply	[flat|nested] 72+ messages in thread

* [PATCH v9 2/4] ethdev: introduce protocol hdr based buffer split
  2022-10-09 20:25 ` [PATCH v9 " Yuan Wang
  2022-10-09 14:58   ` Andrew Rybchenko
  2022-10-09 20:25   ` [PATCH v9 1/4] ethdev: introduce protocol header API Yuan Wang
@ 2022-10-09 20:25   ` Yuan Wang
  2022-10-09 20:25   ` [PATCH v9 3/4] app/testpmd: add rxhdrs commands and parameters Yuan Wang
  2022-10-09 20:25   ` [PATCH v9 4/4] net/ice: support buffer split in Rx path Yuan Wang
  4 siblings, 0 replies; 72+ messages in thread
From: Yuan Wang @ 2022-10-09 20:25 UTC (permalink / raw)
  To: dev, Thomas Monjalon, Ferruh Yigit, Andrew Rybchenko
  Cc: ferruh.yigit, mdr, xiaoyun.li, aman.deep.singh, yuying.zhang,
	qi.z.zhang, qiming.yang, jerinjacobk, viacheslavo, stephen,
	xuan.ding, hpothula, yaqi.tang, Yuan Wang, Wenxuan Wu

Currently, Rx buffer split supports length based split. With Rx queue
offload RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT enabled and Rx packet segment
configured, PMD will be able to split the received packets into
multiple segments.

However, length based buffer split is not suitable for NICs that do the split
based on protocol headers. Given an arbitrarily variable length in an Rx
packet segment, it is almost impossible to pass a fixed protocol header to
the driver. Besides, the existence of tunneling makes the composition of
a packet variable, which makes the situation even worse.

This patch extends current buffer split to support protocol header based
buffer split. A new proto_hdr field is introduced in the reserved field
of rte_eth_rxseg_split structure to specify protocol header. The proto_hdr
field defines the split position of packet, splitting will always happen
after the protocol header defined in the Rx packet segment. When Rx queue
offload RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT is enabled and corresponding
protocol header is configured, driver will split the ingress packets into
multiple segments.

Examples for proto_hdr field defines:
To split after ETH-IPV4-UDP, it should be defined as
proto_hdr = RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
            RTE_PTYPE_L4_UDP

For inner ETH-IPV4-UDP, it should be defined as
proto_hdr = RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER |
            RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_UDP

If the protocol header is repeated with the previously defined one,
the repeated part should be omitted. For example, split after ETH, ETH-IPV4
and ETH-IPV4-UDP, it should be defined as
proto_hdr0 = RTE_PTYPE_L2_ETHER
proto_hdr1 = RTE_PTYPE_L3_IPV4_EXT_UNKNOWN
proto_hdr2 = RTE_PTYPE_L4_UDP

struct rte_eth_rxseg_split {
        struct rte_mempool *mp;
        uint16_t length;
        uint16_t offset;
        uint32_t proto_hdr;
};

If protocol header split can be supported by a PMD, the
rte_eth_buffer_split_get_supported_hdr_ptypes function can
be used to obtain a list of these protocol headers.

For example, let's suppose we configured the Rx queue with the
following segments:
        seg0 - pool0, proto_hdr0=RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4,
               off0=2B
        seg1 - pool1, proto_hdr1=RTE_PTYPE_L4_UDP, off1=128B
        seg2 - pool2, proto_hdr2=0, off1=0B

The packet consists of ETH_IPV4_UDP_PAYLOAD will be split like
following:
        seg0 - ipv4 header @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from pool0
        seg1 - udp header @ 128 in mbuf from pool1
        seg2 - payload @ 0 in mbuf from pool2

Now buffer split can be configured in two modes. The user can choose length
or protocol header to configure buffer split according to the NIC's
capability. For length based buffer split, the mp, length and offset fields
in the Rx packet segment should be configured, while the proto_hdr field
must be 0. For protocol header based buffer split, the mp, offset and
proto_hdr fields in the Rx packet segment should be configured, while the
length field must be 0.
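
As an illustration only (not part of this patch), the ETH-IPV4 / UDP /
payload example above could be configured roughly as below, assuming
pool0/pool1/pool2 are pre-created mempools and the port reports these
proto_hdr values as supported:

#include <rte_ethdev.h>
#include <rte_mbuf_ptype.h>

/* Sketch: three-segment protocol based split on one Rx queue.
 * Real applications usually start from dev_info.default_rxconf. */
static int
setup_proto_split_queue(uint16_t port_id, uint16_t queue_id,
			struct rte_mempool *pool0, struct rte_mempool *pool1,
			struct rte_mempool *pool2)
{
	union rte_eth_rxseg rx_seg[3] = {
		{ .split = { .mp = pool0, .offset = 2,
			     .proto_hdr = RTE_PTYPE_L2_ETHER |
					  RTE_PTYPE_L3_IPV4 } },
		{ .split = { .mp = pool1, .offset = 128,
			     .proto_hdr = RTE_PTYPE_L4_UDP } },
		{ .split = { .mp = pool2, .offset = 0,
			     .proto_hdr = 0 } }, /* payload, last segment */
	};
	struct rte_eth_rxconf rxconf = {
		.offloads = RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT,
		.rx_seg = rx_seg,
		.rx_nseg = 3,
	};

	/* mb_pool is NULL: the per-segment descriptions carry the pools. */
	return rte_eth_rx_queue_setup(port_id, queue_id, 1024,
				      rte_eth_dev_socket_id(port_id),
				      &rxconf, NULL);
}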

Note: When protocol header split is enabled, the NIC may receive packets
which do not match all the protocol headers within the Rx segments.
At this point, the NIC will have two possible split behaviors according to
the matching results: one is exact match, the other is longest match.
The split result of the NIC must belong to one of them.

Exact match means the NIC only does the split when the packets exactly match
all the protocol headers in the segments. Otherwise, the whole packet will be
put into the last valid mempool. Longest match means the NIC will do the split
until the packets mismatch a protocol header in the segments. The rest will
be put into the last valid pool.

Pseudo-code for exact match:
FOR each seg in segs except last one
    IF proto_hdr is not matched THEN
        BREAK
    END IF
END FOR
IF loop was broken THEN
    put whole pkt in last seg
ELSE
    put protocol header in each seg
    put everything else in last seg
END IF

Pseudo-code for longest match:
FOR each seg in segs except last one
    IF proto_hdr is matched THEN
        put protocol header in seg
    ELSE
        BREAK
    END IF
END FOR
put everything else in last seg

The split limitations imposed by the underlying driver are reported in the
rte_eth_dev_info->rx_seg_capa field. The memory attributes for the split
parts may also differ, e.g. DPDK memory for one part and external memory
for another.

Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
---
 doc/guides/rel_notes/release_22_11.rst |  7 ++
 lib/ethdev/rte_ethdev.c                | 95 +++++++++++++++++++++++---
 lib/ethdev/rte_ethdev.h                | 37 +++++++++-
 3 files changed, 127 insertions(+), 12 deletions(-)

diff --git a/doc/guides/rel_notes/release_22_11.rst b/doc/guides/rel_notes/release_22_11.rst
index 16aca14bab..b4329d4cb0 100644
--- a/doc/guides/rel_notes/release_22_11.rst
+++ b/doc/guides/rel_notes/release_22_11.rst
@@ -193,6 +193,8 @@ New Features
 
   * Added ``rte_eth_buffer_split_get_supported_hdr_ptypes()``, to get supported
     header protocols of a PMD to split.
+  * Supported protocol-based buffer split using added ``proto_hdr``
+    in structure ``rte_eth_rxseg_split``.
 
 
 Removed Items
@@ -338,6 +340,11 @@ API Changes
   for per-queue packet split offload,
   which is configured by ``rte_eth_rxseg_split``.
 
+* ethdev: The ``reserved`` field in the ``rte_eth_rxseg_split`` structure is
+  replaced with ``proto_hdr`` to support protocol header based buffer split.
+  User can choose length or protocol header to configure buffer split
+  according to NIC's capability.
+
 * ethdev: Changed the type of the parameter ``rate`` of the function
   ``rte_eth_set_queue_rate_limit()`` from ``uint16_t`` to ``uint32_t``
   to support more than 64 Gbps.
diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
index 79d1f9b993..3696d4f044 100644
--- a/lib/ethdev/rte_ethdev.c
+++ b/lib/ethdev/rte_ethdev.c
@@ -1687,15 +1687,38 @@ rte_eth_check_rx_mempool(struct rte_mempool *mp, uint16_t offset,
 }
 
 static int
-rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
-			     uint16_t n_seg, uint32_t *mbp_buf_size,
-			     const struct rte_eth_dev_info *dev_info)
+eth_dev_buffer_split_get_supported_hdrs_helper(uint16_t port_id, uint32_t **ptypes)
+{
+	int cnt;
+
+	cnt = rte_eth_buffer_split_get_supported_hdr_ptypes(port_id, NULL, 0);
+	if (cnt <= 0)
+		return cnt;
+
+	*ptypes = malloc(sizeof(uint32_t) * cnt);
+	if (*ptypes == NULL)
+		return -ENOMEM;
+
+	return rte_eth_buffer_split_get_supported_hdr_ptypes(port_id, *ptypes, cnt);
+}
+
+static int
+rte_eth_rx_queue_check_split(uint16_t port_id,
+			const struct rte_eth_rxseg_split *rx_seg,
+			uint16_t n_seg, uint32_t *mbp_buf_size,
+			const struct rte_eth_dev_info *dev_info)
 {
 	const struct rte_eth_rxseg_capa *seg_capa = &dev_info->rx_seg_capa;
 	struct rte_mempool *mp_first;
 	uint32_t offset_mask;
 	uint16_t seg_idx;
 	int ret;
+	int ptype_cnt;
+	uint32_t *ptypes, prev_proto_hdrs;
+	int i;
+
+	ret = 0;
+	prev_proto_hdrs = RTE_PTYPE_UNKNOWN;
 
 	if (n_seg > seg_capa->max_nseg) {
 		RTE_ETHDEV_LOG(ERR,
@@ -1709,42 +1732,92 @@ rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
 	 */
 	mp_first = rx_seg[0].mp;
 	offset_mask = RTE_BIT32(seg_capa->offset_align_log2) - 1;
+
+	ptypes = NULL;
+	ptype_cnt = eth_dev_buffer_split_get_supported_hdrs_helper(port_id, &ptypes);
+
 	for (seg_idx = 0; seg_idx < n_seg; seg_idx++) {
 		struct rte_mempool *mpl = rx_seg[seg_idx].mp;
 		uint32_t length = rx_seg[seg_idx].length;
 		uint32_t offset = rx_seg[seg_idx].offset;
+		uint32_t proto_hdr = rx_seg[seg_idx].proto_hdr;
 
 		if (mpl == NULL) {
 			RTE_ETHDEV_LOG(ERR, "null mempool pointer\n");
-			return -EINVAL;
+			ret = -EINVAL;
+			goto out;
 		}
 		if (seg_idx != 0 && mp_first != mpl &&
 		    seg_capa->multi_pools == 0) {
 			RTE_ETHDEV_LOG(ERR, "Receiving to multiple pools is not supported\n");
-			return -ENOTSUP;
+			ret = -ENOTSUP;
+			goto out;
 		}
 		if (offset != 0) {
 			if (seg_capa->offset_allowed == 0) {
 				RTE_ETHDEV_LOG(ERR, "Rx segmentation with offset is not supported\n");
-				return -ENOTSUP;
+				ret = -ENOTSUP;
+				goto out;
 			}
 			if (offset & offset_mask) {
 				RTE_ETHDEV_LOG(ERR, "Rx segmentation invalid offset alignment %u, %u\n",
 					       offset,
 					       seg_capa->offset_align_log2);
-				return -EINVAL;
+				ret = -EINVAL;
+				goto out;
 			}
 		}
 
 		offset += seg_idx != 0 ? 0 : RTE_PKTMBUF_HEADROOM;
 		*mbp_buf_size = rte_pktmbuf_data_room_size(mpl);
-		length = length != 0 ? length : *mbp_buf_size;
+		if (proto_hdr != 0) {
+			/* Split based on protocol headers. */
+			if (length != 0) {
+				RTE_ETHDEV_LOG(ERR,
+					"Do not set length split and protocol split within a segment\n"
+					);
+				ret = -EINVAL;
+				goto out;
+			}
+			if ((proto_hdr & prev_proto_hdrs) != 0) {
+				RTE_ETHDEV_LOG(ERR,
+					"Repeat with previous protocol headers or proto-split after length-based split\n"
+					);
+				ret = -EINVAL;
+				goto out;
+			}
+			if (ptype_cnt <= 0) {
+				RTE_ETHDEV_LOG(ERR,
+					"Port %u failed to get supported buffer split header protocols\n",
+					port_id);
+				ret = -ENOTSUP;
+				goto out;
+			}
+			for (i = 0; i < ptype_cnt; i++) {
+				if ((prev_proto_hdrs | proto_hdr) == ptypes[i])
+					break;
+			}
+			if (i == ptype_cnt) {
+				RTE_ETHDEV_LOG(ERR,
+					"Requested Rx split header protocols 0x%x is not supported.\n",
+					proto_hdr);
+				ret = -EINVAL;
+				goto out;
+			}
+			prev_proto_hdrs |= proto_hdr;
+		} else {
+			/* Split at fixed length. */
+			length = length != 0 ? length : *mbp_buf_size;
+			prev_proto_hdrs = RTE_PTYPE_ALL_MASK;
+		}
 
 		ret = rte_eth_check_rx_mempool(mpl, offset, length);
 		if (ret != 0)
-			return ret;
+			goto out;
 	}
-	return 0;
+out:
+	free(ptypes);
+	return ret;
 }
 
 static int
@@ -1846,7 +1919,7 @@ rte_eth_rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
 		n_seg = rx_conf->rx_nseg;
 
 		if (rx_offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) {
-			ret = rte_eth_rx_queue_check_split(rx_seg, n_seg,
+			ret = rte_eth_rx_queue_check_split(port_id, rx_seg, n_seg,
 							   &mbp_buf_size,
 							   &dev_info);
 			if (ret != 0)
diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
index f9da569179..811c029bf8 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -994,6 +994,9 @@ struct rte_eth_txmode {
  *   specified in the first array element, the second buffer, from the
  *   pool in the second element, and so on.
  *
+ * - The proto_hdrs in the elements define the split position of
+ *   received packets.
+ *
  * - The offsets from the segment description elements specify
  *   the data offset from the buffer beginning except the first mbuf.
  *   The first segment offset is added with RTE_PKTMBUF_HEADROOM.
@@ -1015,12 +1018,44 @@ struct rte_eth_txmode {
  *     - pool from the last valid element
  *     - the buffer size from this pool
  *     - zero offset
+ *
+ * - Length based buffer split:
+ *     - mp, length, offset should be configured.
+ *     - The proto_hdr field must be 0.
+ *
+ * - Protocol header based buffer split:
+ *     - mp, offset, proto_hdr should be configured.
+ *     - The length field must be 0.
+ *     - The proto_hdr field in the last segment should be 0.
+ *
+ * - When protocol header split is enabled, NIC may receive packets
+ *   which do not match all the protocol headers within the Rx segments.
+ *   At this point, NIC will have two possible split behaviors according to
+ *   matching results, one is exact match, another is longest match.
+ *   The split result of NIC must belong to one of them.
+ *   The exact match means NIC only do split when the packets exactly match all
+ *   the protocol headers in the segments. Otherwise, the whole packet will be
+ *   put into the last valid mempool. The longest match means NIC will do split
+ *   until packets mismatch the protocol header in the segments. The rest will
+ *   be put into the last valid pool.
  */
 struct rte_eth_rxseg_split {
 	struct rte_mempool *mp; /**< Memory pool to allocate segment from. */
 	uint16_t length; /**< Segment data length, configures split point. */
 	uint16_t offset; /**< Data offset from beginning of mbuf data buffer. */
-	uint32_t reserved; /**< Reserved field. */
+	/**
+	 * Proto_hdr defines a bit mask of the protocol sequence as RTE_PTYPE_*,
+	 * configures split point. The last RTE_PTYPE* in the mask indicates the
+	 * split position.
+	 *
+	 * If one protocol header is defined to split packets into two segments,
+	 * for non-tunneling packets, the complete protocol sequence should be defined.
+	 * For tunneling packets, for simplicity, only the tunnel and inner part of
+	 * complete protocol sequence is required.
+	 * If several protocol headers are defined to split packets into multi-segments,
+	 * the repeated parts of adjacent segments should be omitted.
+	 */
+	uint32_t proto_hdr;
 };
 
 /**
-- 
2.25.1


^ permalink raw reply	[flat|nested] 72+ messages in thread

* [PATCH v9 3/4] app/testpmd: add rxhdrs commands and parameters
  2022-10-09 20:25 ` [PATCH v9 " Yuan Wang
                     ` (2 preceding siblings ...)
  2022-10-09 20:25   ` [PATCH v9 2/4] ethdev: introduce protocol hdr based buffer split Yuan Wang
@ 2022-10-09 20:25   ` Yuan Wang
  2022-10-09 20:25   ` [PATCH v9 4/4] net/ice: support buffer split in Rx path Yuan Wang
  4 siblings, 0 replies; 72+ messages in thread
From: Yuan Wang @ 2022-10-09 20:25 UTC (permalink / raw)
  To: dev, Aman Singh, Yuying Zhang
  Cc: thomas, andrew.rybchenko, ferruh.yigit, mdr, xiaoyun.li,
	qi.z.zhang, qiming.yang, jerinjacobk, viacheslavo, stephen,
	xuan.ding, hpothula, yaqi.tang, Yuan Wang, Wenxuan Wu

Add command line parameter:
--rxhdrs=eth[,ipv4]

Set the protocol_hdr of segments to scatter packets on receiving if the
split feature is engaged. It only affects the queues configured with the
BUFFER_SPLIT flag.

Add interactive mode command:
testpmd>set rxhdrs eth,ipv4,ipv4-udp
(protocol sequence should be valid)

The protocol split feature is off by default. To enable protocol split,
you need:
1. Start testpmd with multiple mempools. E.g. --mbuf-size=2048,2048
2. Configure Rx queue with rx_offload buffer split on.
3. Set the protocol type of buffer split. E.g. set rxhdrs eth,eth-ipv4
        (default protocols of testpmd : eth|ipv4|ipv6|ipv4-tcp|ipv6-tcp|
         ipv4-udp|ipv6-udp|ipv4-sctp|ipv6-sctp|grenat|inner-eth|
         inner-ipv4|inner-ipv6|inner-ipv4-tcp|inner-ipv6-tcp|
         inner-ipv4-udp|inner-ipv6-udp|inner-ipv4-sctp|inner-ipv6-sctp)
The above protocols can be configured in testpmd, but the configuration can
only be applied when it is supported by a specific PMD.

Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
---
 app/test-pmd/cmdline.c                      | 152 +++++++++++++++++++-
 app/test-pmd/config.c                       | 108 ++++++++++++++
 app/test-pmd/parameters.c                   |  16 ++-
 app/test-pmd/testpmd.c                      |  11 +-
 app/test-pmd/testpmd.h                      |   6 +
 doc/guides/testpmd_app_ug/testpmd_funcs.rst |  19 ++-
 6 files changed, 303 insertions(+), 9 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 4565a3953a..57ac6828d0 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -181,7 +181,7 @@ static void cmd_help_long_parsed(void *parsed_result,
 			"show (rxq|txq) info (port_id) (queue_id)\n"
 			"    Display information for configured RX/TX queue.\n\n"
 
-			"show config (rxtx|cores|fwd|rxoffs|rxpkts|txpkts)\n"
+			"show config (rxtx|cores|fwd|rxoffs|rxpkts|rxhdrs|txpkts)\n"
 			"    Display the given configuration.\n\n"
 
 			"read rxd (port_id) (queue_id) (rxd_id)\n"
@@ -305,6 +305,17 @@ static void cmd_help_long_parsed(void *parsed_result,
 			" Affects only the queues configured with split"
 			" offloads.\n\n"
 
+			"set rxhdrs (eth[,ipv4])*\n"
+			"    Set the protocol hdr of each segment to scatter"
+			" packets on receiving if split feature is engaged."
+			" Affects only the queues configured with split"
+			" offloads.\n"
+			"    Supported values: eth|ipv4|ipv6|ipv4-tcp|ipv6-tcp|"
+			"ipv4-udp|ipv6-udp|ipv4-sctp|ipv6-sctp|"
+			"grenat|inner-eth|inner-ipv4|inner-ipv6|inner-ipv4-tcp|"
+			"inner-ipv6-tcp|inner-ipv4-udp|inner-ipv6-udp|"
+			"inner-ipv4-sctp|inner-ipv6-sctp\n\n"
+
 			"set txpkts (x[,y]*)\n"
 			"    Set the length of each segment of TXONLY"
 			" and optionally CSUM packets.\n\n"
@@ -3366,6 +3377,94 @@ static cmdline_parse_inst_t cmd_stop = {
 	},
 };
 
+static unsigned int
+get_ptype(char *value)
+{
+	uint32_t protocol;
+
+	if (!strcmp(value, "eth"))
+		protocol = RTE_PTYPE_L2_ETHER;
+	else if (!strcmp(value, "ipv4"))
+		protocol = RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN;
+	else if (!strcmp(value, "ipv6"))
+		protocol = RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN;
+	else if (!strcmp(value, "ipv4-tcp"))
+		protocol = RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_L4_TCP;
+	else if (!strcmp(value, "ipv4-udp"))
+		protocol = RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_L4_UDP;
+	else if (!strcmp(value, "ipv4-sctp"))
+		protocol = RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_L4_SCTP;
+	else if (!strcmp(value, "ipv6-tcp"))
+		protocol = RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN | RTE_PTYPE_L4_TCP;
+	else if (!strcmp(value, "ipv6-udp"))
+		protocol = RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN | RTE_PTYPE_L4_UDP;
+	else if (!strcmp(value, "ipv6-sctp"))
+		protocol = RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN | RTE_PTYPE_L4_SCTP;
+	else if (!strcmp(value, "grenat"))
+		protocol = RTE_PTYPE_TUNNEL_GRENAT;
+	else if (!strcmp(value, "inner-eth"))
+		protocol = RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER;
+	else if (!strcmp(value, "inner-ipv4"))
+		protocol = RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER |
+				RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN;
+	else if (!strcmp(value, "inner-ipv6"))
+		protocol = RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER |
+				RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN;
+	else if (!strcmp(value, "inner-ipv4-tcp"))
+		protocol = RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER |
+				RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_TCP;
+	else if (!strcmp(value, "inner-ipv4-udp"))
+		protocol = RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER |
+				RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_UDP;
+	else if (!strcmp(value, "inner-ipv4-sctp"))
+		protocol = RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER |
+				RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_SCTP;
+	else if (!strcmp(value, "inner-ipv6-tcp"))
+		protocol = RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER |
+				RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_TCP;
+	else if (!strcmp(value, "inner-ipv6-udp"))
+		protocol = RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER |
+				RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_UDP;
+	else if (!strcmp(value, "inner-ipv6-sctp"))
+		protocol = RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER |
+				RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_SCTP;
+	else {
+		fprintf(stderr, "Unsupported protocol: %s\n", value);
+		protocol = RTE_PTYPE_UNKNOWN;
+	}
+
+	return protocol;
+}
+/* *** SET RXHDRSLIST *** */
+
+unsigned int
+parse_hdrs_list(const char *str, const char *item_name, unsigned int max_items,
+				unsigned int *parsed_items, int check_hdrs_sequence)
+{
+	unsigned int nb_item;
+	char *cur;
+	char *tmp;
+	unsigned int cur_item, prev_items = 0;
+
+	nb_item = 0;
+	char *str2 = strdup(str);
+	cur = strtok_r(str2, ",", &tmp);
+	while (cur != NULL) {
+		cur_item = get_ptype(cur);
+		cur_item &= ~prev_items;
+		parsed_items[nb_item] = cur_item;
+		cur = strtok_r(NULL, ",", &tmp);
+		nb_item++;
+		prev_items |= cur_item;
+	}
+	if (nb_item > max_items)
+		fprintf(stderr, "Number of %s = %u > %u (maximum items)\n",
+			item_name, nb_item + 1, max_items);
+	free(str2);
+	if (!check_hdrs_sequence)
+		return nb_item;
+	return nb_item;
+}
 /* *** SET CORELIST and PORTLIST CONFIGURATION *** */
 
 unsigned int
@@ -3735,6 +3834,50 @@ static cmdline_parse_inst_t cmd_set_rxpkts = {
 	},
 };
 
+/* *** SET SEGMENT HEADERS OF RX PACKETS SPLIT *** */
+struct cmd_set_rxhdrs_result {
+	cmdline_fixed_string_t set;
+	cmdline_fixed_string_t rxhdrs;
+	cmdline_fixed_string_t values;
+};
+
+static void
+cmd_set_rxhdrs_parsed(void *parsed_result,
+		      __rte_unused struct cmdline *cl,
+		      __rte_unused void *data)
+{
+	struct cmd_set_rxhdrs_result *res;
+	unsigned int seg_hdrs[MAX_SEGS_BUFFER_SPLIT];
+	unsigned int nb_segs;
+
+	res = parsed_result;
+	nb_segs = parse_hdrs_list(res->values, "segment hdrs",
+				  MAX_SEGS_BUFFER_SPLIT, seg_hdrs, 0);
+	if (nb_segs > 0)
+		set_rx_pkt_hdrs(seg_hdrs, nb_segs);
+	cmd_reconfig_device_queue(RTE_PORT_ALL, 0, 1);
+}
+static cmdline_parse_token_string_t cmd_set_rxhdrs_set =
+	TOKEN_STRING_INITIALIZER(struct cmd_set_rxhdrs_result,
+				set, "set");
+static cmdline_parse_token_string_t cmd_set_rxhdrs_rxhdrs =
+	TOKEN_STRING_INITIALIZER(struct cmd_set_rxhdrs_result,
+				rxhdrs, "rxhdrs");
+static cmdline_parse_token_string_t cmd_set_rxhdrs_values =
+	TOKEN_STRING_INITIALIZER(struct cmd_set_rxhdrs_result,
+				values, NULL);
+
+static cmdline_parse_inst_t cmd_set_rxhdrs = {
+	.f = cmd_set_rxhdrs_parsed,
+	.data = NULL,
+	.help_str = "set rxhdrs <eth[,ipv4]*>",
+	.tokens = {
+		(void *)&cmd_set_rxhdrs_set,
+		(void *)&cmd_set_rxhdrs_rxhdrs,
+		(void *)&cmd_set_rxhdrs_values,
+		NULL,
+	},
+};
 /* *** SET SEGMENT LENGTHS OF TXONLY PACKETS *** */
 
 struct cmd_set_txpkts_result {
@@ -6487,6 +6630,8 @@ static void cmd_showcfg_parsed(void *parsed_result,
 		show_rx_pkt_offsets();
 	else if (!strcmp(res->what, "rxpkts"))
 		show_rx_pkt_segments();
+	else if (!strcmp(res->what, "rxhdrs"))
+		show_rx_pkt_hdrs();
 	else if (!strcmp(res->what, "txpkts"))
 		show_tx_pkt_segments();
 	else if (!strcmp(res->what, "txtimes"))
@@ -6499,12 +6644,12 @@ static cmdline_parse_token_string_t cmd_showcfg_port =
 	TOKEN_STRING_INITIALIZER(struct cmd_showcfg_result, cfg, "config");
 static cmdline_parse_token_string_t cmd_showcfg_what =
 	TOKEN_STRING_INITIALIZER(struct cmd_showcfg_result, what,
-				 "rxtx#cores#fwd#rxoffs#rxpkts#txpkts#txtimes");
+				 "rxtx#cores#fwd#rxoffs#rxpkts#rxhdrs#txpkts#txtimes");
 
 static cmdline_parse_inst_t cmd_showcfg = {
 	.f = cmd_showcfg_parsed,
 	.data = NULL,
-	.help_str = "show config rxtx|cores|fwd|rxoffs|rxpkts|txpkts|txtimes",
+	.help_str = "show config rxtx|cores|fwd|rxoffs|rxpkts|rxhdrs|txpkts|txtimes",
 	.tokens = {
 		(void *)&cmd_showcfg_show,
 		(void *)&cmd_showcfg_port,
@@ -12455,6 +12600,7 @@ static cmdline_parse_ctx_t builtin_ctx[] = {
 	(cmdline_parse_inst_t *)&cmd_set_log,
 	(cmdline_parse_inst_t *)&cmd_set_rxoffs,
 	(cmdline_parse_inst_t *)&cmd_set_rxpkts,
+	(cmdline_parse_inst_t *)&cmd_set_rxhdrs,
 	(cmdline_parse_inst_t *)&cmd_set_txpkts,
 	(cmdline_parse_inst_t *)&cmd_set_txsplit,
 	(cmdline_parse_inst_t *)&cmd_set_txtimes,
diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index 841e8efe78..dec16a9049 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -4889,6 +4889,114 @@ show_rx_pkt_segments(void)
 	}
 }
 
+static const char *get_ptype_str(uint32_t ptype)
+{
+	if ((ptype & (RTE_PTYPE_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_L4_TCP)) ==
+		(RTE_PTYPE_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_L4_TCP))
+		return "ipv4-tcp";
+	else if ((ptype & (RTE_PTYPE_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_L4_UDP)) ==
+		(RTE_PTYPE_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_L4_UDP))
+		return "ipv4-udp";
+	else if ((ptype & (RTE_PTYPE_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_L4_SCTP)) ==
+		(RTE_PTYPE_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_L4_SCTP))
+		return "ipv4-sctp";
+	else if ((ptype & (RTE_PTYPE_L3_IPV6_EXT_UNKNOWN | RTE_PTYPE_L4_TCP)) ==
+		(RTE_PTYPE_L3_IPV6_EXT_UNKNOWN | RTE_PTYPE_L4_TCP))
+		return "ipv6-tcp";
+	else if ((ptype & (RTE_PTYPE_L3_IPV6_EXT_UNKNOWN | RTE_PTYPE_L4_UDP)) ==
+		(RTE_PTYPE_L3_IPV6_EXT_UNKNOWN | RTE_PTYPE_L4_UDP))
+		return "ipv6-udp";
+	else if ((ptype & (RTE_PTYPE_L3_IPV6_EXT_UNKNOWN | RTE_PTYPE_L4_SCTP)) ==
+		(RTE_PTYPE_L3_IPV6_EXT_UNKNOWN | RTE_PTYPE_L4_SCTP))
+		return "ipv6-sctp";
+	else if ((ptype & RTE_PTYPE_L4_TCP) == RTE_PTYPE_L4_TCP)
+		return "tcp";
+	else if ((ptype & RTE_PTYPE_L4_UDP) == RTE_PTYPE_L4_UDP)
+		return "udp";
+	else if ((ptype & RTE_PTYPE_L4_SCTP) == RTE_PTYPE_L4_SCTP)
+		return "sctp";
+	else if ((ptype & RTE_PTYPE_L3_IPV4_EXT_UNKNOWN) == RTE_PTYPE_L3_IPV4_EXT_UNKNOWN)
+		return "ipv4";
+	else if ((ptype & RTE_PTYPE_L3_IPV6_EXT_UNKNOWN) == RTE_PTYPE_L3_IPV6_EXT_UNKNOWN)
+		return "ipv6";
+	else if ((ptype & RTE_PTYPE_L2_ETHER) == RTE_PTYPE_L2_ETHER)
+		return "eth";
+
+	else if ((ptype & (RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_TCP)) ==
+		(RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_TCP))
+		return "inner-ipv4-tcp";
+	else if ((ptype & (RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_UDP)) ==
+		(RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_UDP))
+		return "inner-ipv4-udp";
+	else if ((ptype & (RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_SCTP)) ==
+		(RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_SCTP))
+		return "inner-ipv4-sctp";
+	else if ((ptype & (RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_TCP)) ==
+		(RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_TCP))
+		return "inner-ipv6-tcp";
+	else if ((ptype & (RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_UDP)) ==
+		(RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_UDP))
+		return "inner-ipv6-udp";
+	else if ((ptype & (RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_SCTP)) ==
+		(RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_SCTP))
+		return "inner-ipv6-sctp";
+	else if ((ptype & RTE_PTYPE_INNER_L4_TCP) == RTE_PTYPE_INNER_L4_TCP)
+		return "inner-tcp";
+	else if ((ptype & RTE_PTYPE_INNER_L4_UDP) == RTE_PTYPE_INNER_L4_UDP)
+		return "inner-udp";
+	else if ((ptype & RTE_PTYPE_INNER_L4_SCTP) == RTE_PTYPE_INNER_L4_SCTP)
+		return "inner-sctp";
+	else if ((ptype & RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN) ==
+		RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN)
+		return "inner-ipv4";
+	else if ((ptype & RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN) ==
+		RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN)
+		return "inner-ipv6";
+	else if ((ptype & RTE_PTYPE_INNER_L2_ETHER) == RTE_PTYPE_INNER_L2_ETHER)
+		return "inner-eth";
+	else if ((ptype & RTE_PTYPE_TUNNEL_GRENAT) == RTE_PTYPE_TUNNEL_GRENAT)
+		return "grenat";
+	else
+		return "unsupported";
+}
+
+void
+show_rx_pkt_hdrs(void)
+{
+	uint32_t i, n;
+
+	n = rx_pkt_nb_segs;
+	printf("Number of segments: %u\n", n);
+	if (n) {
+		printf("Packet segs: ");
+		for (i = 0; i < n - 1; i++)
+			printf("%s, ", get_ptype_str(rx_pkt_hdr_protos[i]));
+		printf("payload\n");
+	}
+}
+
+void
+set_rx_pkt_hdrs(unsigned int *seg_hdrs, unsigned int nb_segs)
+{
+	unsigned int i;
+
+	if (nb_segs + 1 > MAX_SEGS_BUFFER_SPLIT) {
+		printf("nb segments per RX packet=%u > "
+		       "MAX_SEGS_BUFFER_SPLIT - ignored\n", nb_segs + 1);
+		return;
+	}
+
+	memset(rx_pkt_hdr_protos, 0, sizeof(rx_pkt_hdr_protos));
+
+	for (i = 0; i < nb_segs; i++)
+		rx_pkt_hdr_protos[i] = (uint32_t)seg_hdrs[i];
+	/*
+	 * The list covers only the header segments; the payload segment is
+	 * implicit, so rx_pkt_nb_segs is set to nb_segs plus one.
+	 */
+	rx_pkt_nb_segs = nb_segs + 1;
+}
+
 void
 set_rx_pkt_segments(unsigned int *seg_lengths, unsigned int nb_segs)
 {
diff --git a/app/test-pmd/parameters.c b/app/test-pmd/parameters.c
index 14752f9571..ff760460ec 100644
--- a/app/test-pmd/parameters.c
+++ b/app/test-pmd/parameters.c
@@ -152,6 +152,7 @@ usage(char* progname)
 	       " Used mainly with PCAP drivers.\n");
 	printf("  --rxoffs=X[,Y]*: set RX segment offsets for split.\n");
 	printf("  --rxpkts=X[,Y]*: set RX segment sizes to split.\n");
+	printf("  --rxhdrs=eth[,ipv4]*: set RX segment protocol headers to split.\n");
 	printf("  --txpkts=X[,Y]*: set TX segment sizes"
 		" or total packet length.\n");
 	printf("  --txonly-multi-flow: generate multiple flows in txonly mode\n");
@@ -660,6 +661,7 @@ launch_args_parse(int argc, char** argv)
 		{ "flow-isolate-all",	        0, 0, 0 },
 		{ "rxoffs",			1, 0, 0 },
 		{ "rxpkts",			1, 0, 0 },
+		{ "rxhdrs",			1, 0, 0 },
 		{ "txpkts",			1, 0, 0 },
 		{ "txonly-multi-flow",		0, 0, 0 },
 		{ "rxq-share",			2, 0, 0 },
@@ -1254,7 +1256,6 @@ launch_args_parse(int argc, char** argv)
 			if (!strcmp(lgopts[opt_idx].name, "rxpkts")) {
 				unsigned int seg_len[MAX_SEGS_BUFFER_SPLIT];
 				unsigned int nb_segs;
-
 				nb_segs = parse_item_list
 						(optarg, "rxpkt segments",
 						 MAX_SEGS_BUFFER_SPLIT,
@@ -1264,6 +1265,19 @@ launch_args_parse(int argc, char** argv)
 				else
 					rte_exit(EXIT_FAILURE, "bad rxpkts\n");
 			}
+			if (!strcmp(lgopts[opt_idx].name, "rxhdrs")) {
+				unsigned int seg_hdrs[MAX_SEGS_BUFFER_SPLIT];
+				unsigned int nb_segs;
+
+				nb_segs = parse_hdrs_list
+						(optarg, "rxhdr segments",
+						MAX_SEGS_BUFFER_SPLIT,
+						seg_hdrs, 0);
+				if (nb_segs > 0)
+					set_rx_pkt_hdrs(seg_hdrs, nb_segs);
+				else
+					rte_exit(EXIT_FAILURE, "bad rxhdrs\n");
+			}
 			if (!strcmp(lgopts[opt_idx].name, "txpkts")) {
 				unsigned seg_lengths[RTE_MAX_SEGS_PER_PKT];
 				unsigned int nb_segs;
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index bb1c901742..5b0f0838dc 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -247,6 +247,7 @@ uint16_t rx_pkt_seg_lengths[MAX_SEGS_BUFFER_SPLIT];
 uint8_t  rx_pkt_nb_segs; /**< Number of segments to split */
 uint16_t rx_pkt_seg_offsets[MAX_SEGS_BUFFER_SPLIT];
 uint8_t  rx_pkt_nb_offs; /**< Number of specified offsets */
+uint32_t rx_pkt_hdr_protos[MAX_SEGS_BUFFER_SPLIT];
 
 /*
  * Configuration of packet segments used by the "txonly" processing engine.
@@ -2668,12 +2669,16 @@ rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
 		mp_n = (i >= mbuf_data_size_n) ? mbuf_data_size_n - 1 : i;
 		mpx = mbuf_pool_find(socket_id, mp_n);
 		/* Handle zero as mbuf data buffer size. */
-		rx_seg->length = rx_pkt_seg_lengths[i] ?
-				   rx_pkt_seg_lengths[i] :
-				   mbuf_data_size[mp_n];
 		rx_seg->offset = i < rx_pkt_nb_offs ?
 				   rx_pkt_seg_offsets[i] : 0;
 		rx_seg->mp = mpx ? mpx : mp;
+		if (rx_pkt_hdr_protos[i] != 0 && rx_pkt_seg_lengths[i] == 0) {
+			rx_seg->proto_hdr = rx_pkt_hdr_protos[i];
+		} else {
+			rx_seg->length = rx_pkt_seg_lengths[i] ?
+					rx_pkt_seg_lengths[i] :
+					mbuf_data_size[mp_n];
+		}
 	}
 	rx_conf->rx_nseg = rx_pkt_nb_segs;
 	rx_conf->rx_seg = rx_useg;
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index ca2408cb6b..e65be323b8 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -580,6 +580,7 @@ extern uint32_t max_rx_pkt_len;
  * Configuration of packet segments used to scatter received packets
  * if some of split features is configured.
  */
+extern uint32_t rx_pkt_hdr_protos[MAX_SEGS_BUFFER_SPLIT];
 extern uint16_t rx_pkt_seg_lengths[MAX_SEGS_BUFFER_SPLIT];
 extern uint8_t  rx_pkt_nb_segs; /**< Number of segments to split */
 extern uint16_t rx_pkt_seg_offsets[MAX_SEGS_BUFFER_SPLIT];
@@ -851,6 +852,9 @@ inc_tx_burst_stats(struct fwd_stream *fs, uint16_t nb_tx)
 unsigned int parse_item_list(const char *str, const char *item_name,
 			unsigned int max_items,
 			unsigned int *parsed_items, int check_unique_values);
+unsigned int parse_hdrs_list(const char *str, const char *item_name,
+			unsigned int max_item,
+			unsigned int *parsed_items, int check_unique_values);
 void launch_args_parse(int argc, char** argv);
 void cmd_reconfig_device_queue(portid_t id, uint8_t dev, uint8_t queue);
 void cmdline_read_from_file(const char *filename);
@@ -1006,6 +1010,8 @@ void set_record_core_cycles(uint8_t on_off);
 void set_record_burst_stats(uint8_t on_off);
 void set_verbose_level(uint16_t vb_level);
 void set_rx_pkt_segments(unsigned int *seg_lengths, unsigned int nb_segs);
+void set_rx_pkt_hdrs(unsigned int *seg_protos, unsigned int nb_segs);
+void show_rx_pkt_hdrs(void);
 void show_rx_pkt_segments(void);
 void set_rx_pkt_offsets(unsigned int *seg_offsets, unsigned int nb_offs);
 void show_rx_pkt_offsets(void);
diff --git a/doc/guides/testpmd_app_ug/testpmd_funcs.rst b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
index 1cf814ae89..fdad100944 100644
--- a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
+++ b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
@@ -278,7 +278,7 @@ show config
 Displays the configuration of the application.
 The configuration comes from the command-line, the runtime or the application defaults::
 
-   testpmd> show config (rxtx|cores|fwd|rxoffs|rxpkts|txpkts|txtimes)
+   testpmd> show config (rxtx|cores|fwd|rxoffs|rxpkts|rxhdrs|txpkts|txtimes)
 
 The available information categories are:
 
@@ -290,7 +290,9 @@ The available information categories are:
 
 * ``rxoffs``: Packet offsets for RX split.
 
-* ``rxpkts``: Packets to RX split configuration.
+* ``rxpkts``: Packets to RX length-based split configuration.
+
+* ``rxhdrs``: Packets to RX proto-based split configuration.
 
 * ``txpkts``: Packets to TX configuration.
 
@@ -799,6 +801,19 @@ mbuf for remaining segments will be allocated from the last valid pool).
 Where x[,y]* represents a CSV list of values, without white space. Zero value
 means to use the corresponding memory pool data buffer size.
 
+set rxhdrs
+~~~~~~~~~~
+
+Set the protocol headers of segments used to scatter packets on receiving if
+the split feature is engaged. Affects only the queues configured with split
+offloads (currently only RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT is supported)::
+
+   testpmd> set rxhdrs (eth[,ipv4]*)
+
+Where eth[,ipv4]* represents a CSV list of protocol header values, without white
+space. If the list of protocol headers is shorter than the list of segments, the
+zero (no-split) protocol header will be used for the remaining segments.
+
 set txpkts
 ~~~~~~~~~~
 
-- 
2.25.1


^ permalink raw reply	[flat|nested] 72+ messages in thread

* [PATCH v9 4/4] net/ice: support buffer split in Rx path
  2022-10-09 20:25 ` [PATCH v9 " Yuan Wang
                     ` (3 preceding siblings ...)
  2022-10-09 20:25   ` [PATCH v9 3/4] app/testpmd: add rxhdrs commands and parameters Yuan Wang
@ 2022-10-09 20:25   ` Yuan Wang
  4 siblings, 0 replies; 72+ messages in thread
From: Yuan Wang @ 2022-10-09 20:25 UTC (permalink / raw)
  To: dev, Ferruh Yigit, Qiming Yang, Qi Zhang
  Cc: thomas, andrew.rybchenko, ferruh.yigit, mdr, xiaoyun.li,
	aman.deep.singh, yuying.zhang, jerinjacobk, viacheslavo, stephen,
	xuan.ding, hpothula, yaqi.tang, Yuan Wang, Wenxuan Wu

Add support for protocol based buffer split in the normal Rx
data paths. When the Rx queue is configured with a specific protocol type,
received packets are split into a protocol header part and a payload part,
and the two parts are put into different mempools.

Currently, protocol based buffer split is not supported in vectorized
paths.

A new API, ice_buffer_split_supported_hdr_ptypes_get(), has been
introduced; it reports to the application the header protocols that the
ice PMD supports for splitting.
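
For illustration only (this snippet is not part of the patch), a minimal
sketch of how an application could request the split on one ice Rx queue.
It assumes hdr_pool and pay_pool are pre-created mempools, the usual
port_id/queue_id/nb_rxd/socket_id values, and that the port advertises
RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT as a per-queue Rx offload:

    #include <string.h>
    #include <rte_ethdev.h>

    /* Hypothetical helper: one Rx queue split after the outer IPv4 header. */
    static int
    setup_hdr_split_queue(uint16_t port_id, uint16_t queue_id, uint16_t nb_rxd,
                          unsigned int socket_id, struct rte_mempool *hdr_pool,
                          struct rte_mempool *pay_pool)
    {
            struct rte_eth_dev_info dev_info;
            union rte_eth_rxseg rx_segs[2];
            struct rte_eth_rxconf rxconf;

            memset(rx_segs, 0, sizeof(rx_segs));
            rte_eth_dev_info_get(port_id, &dev_info);
            rxconf = dev_info.default_rxconf;

            /* Segment 0: protocol headers, taken from the header mempool. */
            rx_segs[0].split.mp = hdr_pool;
            rx_segs[0].split.proto_hdr = RTE_PTYPE_L2_ETHER |
                                         RTE_PTYPE_L3_IPV4_EXT_UNKNOWN;
            /* Segment 1: the remaining payload goes to the second mempool. */
            rx_segs[1].split.mp = pay_pool;

            rxconf.offloads |= RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT;
            rxconf.rx_nseg = 2;
            rxconf.rx_seg = rx_segs;

            /* The mempool argument is NULL when rx_seg/rx_nseg describe pools. */
            return rte_eth_rx_queue_setup(port_id, queue_id, nb_rxd,
                                          socket_id, &rxconf, NULL);
    }

With such a configuration each received packet is delivered as a
two-segment mbuf chain: the first segment (from hdr_pool) carries the
parsed protocol headers and the second one (from pay_pool) carries the
payload, matching the data_len/pkt_len handling added to ice_recv_pkts()
and ice_rx_scan_hw_ring() below.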

Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
---
 doc/guides/nics/features/default.ini   |   1 +
 doc/guides/nics/features/ice.ini       |   1 +
 doc/guides/rel_notes/release_22_11.rst |   4 +
 drivers/net/ice/ice_ethdev.c           |  58 +++++-
 drivers/net/ice/ice_rxtx.c             | 263 ++++++++++++++++++++++---
 drivers/net/ice/ice_rxtx.h             |  16 ++
 drivers/net/ice/ice_rxtx_vec_common.h  |   3 +
 7 files changed, 314 insertions(+), 32 deletions(-)

diff --git a/doc/guides/nics/features/default.ini b/doc/guides/nics/features/default.ini
index 05e47d7552..1c736ca1aa 100644
--- a/doc/guides/nics/features/default.ini
+++ b/doc/guides/nics/features/default.ini
@@ -7,6 +7,7 @@
 ; string should not exceed feature_str_len defined in conf.py.
 ;
 [Features]
+Buffer Split on Rx   =
 Speed capabilities   =
 Link status          =
 Link status event    =
diff --git a/doc/guides/nics/features/ice.ini b/doc/guides/nics/features/ice.ini
index 2f4a5a9a30..b72e83e42e 100644
--- a/doc/guides/nics/features/ice.ini
+++ b/doc/guides/nics/features/ice.ini
@@ -7,6 +7,7 @@
 ; is selected.
 ;
 [Features]
+Buffer Split on Rx   = P
 Speed capabilities   = Y
 Link status          = Y
 Link status event    = Y
diff --git a/doc/guides/rel_notes/release_22_11.rst b/doc/guides/rel_notes/release_22_11.rst
index b4329d4cb0..537cdeee61 100644
--- a/doc/guides/rel_notes/release_22_11.rst
+++ b/doc/guides/rel_notes/release_22_11.rst
@@ -196,6 +196,10 @@ New Features
   * Supported protocol-based buffer split using added ``proto_hdr``
     in structure ``rte_eth_rxseg_split``.
 
+* **Updated Intel ice driver.**
+
+  * Added protocol based buffer split support in scalar path.
+
 
 Removed Items
 -------------
diff --git a/drivers/net/ice/ice_ethdev.c b/drivers/net/ice/ice_ethdev.c
index 6e21c38152..8618a3e6b7 100644
--- a/drivers/net/ice/ice_ethdev.c
+++ b/drivers/net/ice/ice_ethdev.c
@@ -161,6 +161,7 @@ static int ice_timesync_read_time(struct rte_eth_dev *dev,
 static int ice_timesync_write_time(struct rte_eth_dev *dev,
 				   const struct timespec *timestamp);
 static int ice_timesync_disable(struct rte_eth_dev *dev);
+static const uint32_t *ice_buffer_split_supported_hdr_ptypes_get(struct rte_eth_dev *dev);
 
 static const struct rte_pci_id pci_id_ice_map[] = {
 	{ RTE_PCI_DEVICE(ICE_INTEL_VENDOR_ID, ICE_DEV_ID_E823L_BACKPLANE) },
@@ -275,6 +276,7 @@ static const struct eth_dev_ops ice_eth_dev_ops = {
 	.timesync_write_time          = ice_timesync_write_time,
 	.timesync_disable             = ice_timesync_disable,
 	.tm_ops_get                   = ice_tm_ops_get,
+	.buffer_split_supported_hdr_ptypes_get = ice_buffer_split_supported_hdr_ptypes_get,
 };
 
 /* store statistics names and its offset in stats structure */
@@ -3802,7 +3804,8 @@ ice_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 			RTE_ETH_RX_OFFLOAD_OUTER_IPV4_CKSUM |
 			RTE_ETH_RX_OFFLOAD_VLAN_EXTEND |
 			RTE_ETH_RX_OFFLOAD_RSS_HASH |
-			RTE_ETH_RX_OFFLOAD_TIMESTAMP;
+			RTE_ETH_RX_OFFLOAD_TIMESTAMP |
+			RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT;
 		dev_info->tx_offload_capa |=
 			RTE_ETH_TX_OFFLOAD_QINQ_INSERT |
 			RTE_ETH_TX_OFFLOAD_IPV4_CKSUM |
@@ -3814,7 +3817,7 @@ ice_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 		dev_info->flow_type_rss_offloads |= ICE_RSS_OFFLOAD_ALL;
 	}
 
-	dev_info->rx_queue_offload_capa = 0;
+	dev_info->rx_queue_offload_capa = RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT;
 	dev_info->tx_queue_offload_capa = RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE;
 
 	dev_info->reta_size = pf->hash_lut_size;
@@ -3883,6 +3886,11 @@ ice_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 	dev_info->default_rxportconf.ring_size = ICE_BUF_SIZE_MIN;
 	dev_info->default_txportconf.ring_size = ICE_BUF_SIZE_MIN;
 
+	dev_info->rx_seg_capa.max_nseg = ICE_RX_MAX_NSEG;
+	dev_info->rx_seg_capa.multi_pools = 1;
+	dev_info->rx_seg_capa.offset_allowed = 0;
+	dev_info->rx_seg_capa.offset_align_log2 = 0;
+
 	return 0;
 }
 
@@ -5960,6 +5968,52 @@ ice_timesync_disable(struct rte_eth_dev *dev)
 	return 0;
 }
 
+static const uint32_t *
+ice_buffer_split_supported_hdr_ptypes_get(struct rte_eth_dev *dev __rte_unused)
+{
+	/* Buffer split protocol header capability. */
+	static const uint32_t ptypes[] = {
+		/* Non tunneled */
+		RTE_PTYPE_L2_ETHER,
+		RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN,
+		RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_L4_UDP,
+		RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_L4_TCP,
+		RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_L4_SCTP,
+		RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN,
+		RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN | RTE_PTYPE_L4_UDP,
+		RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN | RTE_PTYPE_L4_TCP,
+		RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN | RTE_PTYPE_L4_SCTP,
+
+		/* Tunneled */
+		RTE_PTYPE_TUNNEL_GRENAT,
+
+		RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER,
+
+		RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER |
+		RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN,
+		RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER |
+		RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN,
+
+		RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER |
+		RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_UDP,
+		RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER |
+		RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_TCP,
+		RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER |
+		RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_SCTP,
+
+		RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER |
+		RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_UDP,
+		RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER |
+		RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_TCP,
+		RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER |
+		RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_SCTP,
+
+		RTE_PTYPE_UNKNOWN
+	};
+
+	return ptypes;
+}
+
 static int
 ice_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
 	      struct rte_pci_device *pci_dev)
diff --git a/drivers/net/ice/ice_rxtx.c b/drivers/net/ice/ice_rxtx.c
index d1e1fadf9d..697251c603 100644
--- a/drivers/net/ice/ice_rxtx.c
+++ b/drivers/net/ice/ice_rxtx.c
@@ -259,7 +259,6 @@ ice_program_hw_rx_queue(struct ice_rx_queue *rxq)
 	/* Set buffer size as the head split is disabled. */
 	buf_size = (uint16_t)(rte_pktmbuf_data_room_size(rxq->mp) -
 			      RTE_PKTMBUF_HEADROOM);
-	rxq->rx_hdr_len = 0;
 	rxq->rx_buf_len = RTE_ALIGN(buf_size, (1 << ICE_RLAN_CTX_DBUF_S));
 	rxq->max_pkt_len =
 		RTE_MIN((uint32_t)ICE_SUPPORT_CHAIN_NUM * rxq->rx_buf_len,
@@ -288,11 +287,91 @@ ice_program_hw_rx_queue(struct ice_rx_queue *rxq)
 
 	memset(&rx_ctx, 0, sizeof(rx_ctx));
 
+	if (rxq->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) {
+		uint32_t proto_hdr;
+		proto_hdr = rxq->rxseg[0].proto_hdr;
+
+		if (proto_hdr == RTE_PTYPE_UNKNOWN) {
+			PMD_DRV_LOG(ERR, "Buffer split protocol must be configured");
+			return -EINVAL;
+		}
+
+		switch (proto_hdr & RTE_PTYPE_L4_MASK) {
+		case RTE_PTYPE_L4_TCP:
+		case RTE_PTYPE_L4_UDP:
+			rx_ctx.dtype = ICE_RX_DTYPE_HEADER_SPLIT;
+			rx_ctx.hsplit_0 = ICE_RLAN_RX_HSPLIT_0_SPLIT_TCP_UDP;
+			goto set_hsplit_finish;
+		case RTE_PTYPE_L4_SCTP:
+			rx_ctx.dtype = ICE_RX_DTYPE_HEADER_SPLIT;
+			rx_ctx.hsplit_0 = ICE_RLAN_RX_HSPLIT_0_SPLIT_SCTP;
+			goto set_hsplit_finish;
+		}
+
+		switch (proto_hdr & RTE_PTYPE_L3_MASK) {
+		case RTE_PTYPE_L3_IPV4_EXT_UNKNOWN:
+		case RTE_PTYPE_L3_IPV6_EXT_UNKNOWN:
+			rx_ctx.dtype = ICE_RX_DTYPE_HEADER_SPLIT;
+			rx_ctx.hsplit_0 = ICE_RLAN_RX_HSPLIT_0_SPLIT_IP;
+			goto set_hsplit_finish;
+		}
+
+		switch (proto_hdr & RTE_PTYPE_L2_MASK) {
+		case RTE_PTYPE_L2_ETHER:
+			rx_ctx.dtype = ICE_RX_DTYPE_HEADER_SPLIT;
+			rx_ctx.hsplit_0 = ICE_RLAN_RX_HSPLIT_0_SPLIT_L2;
+			rx_ctx.hsplit_1 = ICE_RLAN_RX_HSPLIT_1_SPLIT_L2;
+			goto set_hsplit_finish;
+		}
+
+		switch (proto_hdr & RTE_PTYPE_TUNNEL_MASK) {
+		case RTE_PTYPE_TUNNEL_GRENAT:
+			rx_ctx.dtype = ICE_RX_DTYPE_HEADER_SPLIT;
+			rx_ctx.hsplit_1 = ICE_RLAN_RX_HSPLIT_1_SPLIT_ALWAYS;
+			goto set_hsplit_finish;
+		}
+
+		switch (proto_hdr & RTE_PTYPE_INNER_L4_MASK) {
+		case RTE_PTYPE_INNER_L4_TCP:
+		case RTE_PTYPE_INNER_L4_UDP:
+			rx_ctx.dtype = ICE_RX_DTYPE_HEADER_SPLIT;
+			rx_ctx.hsplit_0 = ICE_RLAN_RX_HSPLIT_0_SPLIT_TCP_UDP;
+			goto set_hsplit_finish;
+		case RTE_PTYPE_INNER_L4_SCTP:
+			rx_ctx.dtype = ICE_RX_DTYPE_HEADER_SPLIT;
+			rx_ctx.hsplit_0 = ICE_RLAN_RX_HSPLIT_0_SPLIT_SCTP;
+			goto set_hsplit_finish;
+		}
+
+		switch (proto_hdr & RTE_PTYPE_INNER_L3_MASK) {
+		case RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN:
+		case RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN:
+			rx_ctx.dtype = ICE_RX_DTYPE_HEADER_SPLIT;
+			rx_ctx.hsplit_0 = ICE_RLAN_RX_HSPLIT_0_SPLIT_IP;
+			goto set_hsplit_finish;
+		}
+
+		switch (proto_hdr & RTE_PTYPE_INNER_L2_MASK) {
+		case RTE_PTYPE_INNER_L2_ETHER:
+			rx_ctx.dtype = ICE_RX_DTYPE_HEADER_SPLIT;
+			rx_ctx.hsplit_0 = ICE_RLAN_RX_HSPLIT_0_SPLIT_L2;
+			goto set_hsplit_finish;
+		}
+
+		PMD_DRV_LOG(ERR, "Buffer split protocol is not supported");
+		return -EINVAL;
+
+set_hsplit_finish:
+		rxq->rx_hdr_len = ICE_RX_HDR_BUF_SIZE;
+	} else {
+		rxq->rx_hdr_len = 0;
+		rx_ctx.dtype = 0; /* No Protocol Based Buffer Split mode */
+	}
+
 	rx_ctx.base = rxq->rx_ring_dma / ICE_QUEUE_BASE_ADDR_UNIT;
 	rx_ctx.qlen = rxq->nb_rx_desc;
 	rx_ctx.dbuf = rxq->rx_buf_len >> ICE_RLAN_CTX_DBUF_S;
 	rx_ctx.hbuf = rxq->rx_hdr_len >> ICE_RLAN_CTX_HBUF_S;
-	rx_ctx.dtype = 0; /* No Header Split mode */
 #ifndef RTE_LIBRTE_ICE_16BYTE_RX_DESC
 	rx_ctx.dsize = 1; /* 32B descriptors */
 #endif
@@ -378,6 +457,7 @@ ice_alloc_rx_queue_mbufs(struct ice_rx_queue *rxq)
 
 	for (i = 0; i < rxq->nb_rx_desc; i++) {
 		volatile union ice_rx_flex_desc *rxd;
+		rxd = &rxq->rx_ring[i];
 		struct rte_mbuf *mbuf = rte_mbuf_raw_alloc(rxq->mp);
 
 		if (unlikely(!mbuf)) {
@@ -385,8 +465,6 @@ ice_alloc_rx_queue_mbufs(struct ice_rx_queue *rxq)
 			return -ENOMEM;
 		}
 
-		rte_mbuf_refcnt_set(mbuf, 1);
-		mbuf->next = NULL;
 		mbuf->data_off = RTE_PKTMBUF_HEADROOM;
 		mbuf->nb_segs = 1;
 		mbuf->port = rxq->port_id;
@@ -394,9 +472,32 @@ ice_alloc_rx_queue_mbufs(struct ice_rx_queue *rxq)
 		dma_addr =
 			rte_cpu_to_le_64(rte_mbuf_data_iova_default(mbuf));
 
-		rxd = &rxq->rx_ring[i];
-		rxd->read.pkt_addr = dma_addr;
-		rxd->read.hdr_addr = 0;
+		if (!(rxq->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT)) {
+			rte_mbuf_refcnt_set(mbuf, 1);
+			mbuf->next = NULL;
+			rxd->read.hdr_addr = 0;
+			rxd->read.pkt_addr = dma_addr;
+		} else {
+			struct rte_mbuf *mbuf_pay;
+			mbuf_pay = rte_mbuf_raw_alloc(rxq->rxseg[1].mp);
+			if (unlikely(!mbuf_pay)) {
+				PMD_DRV_LOG(ERR, "Failed to allocate payload mbuf for RX");
+				return -ENOMEM;
+			}
+
+			mbuf_pay->next = NULL;
+			mbuf_pay->data_off = RTE_PKTMBUF_HEADROOM;
+			mbuf_pay->nb_segs = 1;
+			mbuf_pay->port = rxq->port_id;
+			mbuf->next = mbuf_pay;
+
+			rxd->read.hdr_addr = dma_addr;
+			/* The LS bit should be set to zero regardless of
+			 * buffer split enablement.
+			 */
+			rxd->read.pkt_addr = rte_cpu_to_le_64(rte_mbuf_data_iova_default(mbuf_pay));
+		}
+
 #ifndef RTE_LIBRTE_ICE_16BYTE_RX_DESC
 		rxd->read.rsvd1 = 0;
 		rxd->read.rsvd2 = 0;
@@ -420,14 +521,14 @@ _ice_rx_queue_release_mbufs(struct ice_rx_queue *rxq)
 
 	for (i = 0; i < rxq->nb_rx_desc; i++) {
 		if (rxq->sw_ring[i].mbuf) {
-			rte_pktmbuf_free_seg(rxq->sw_ring[i].mbuf);
+			rte_pktmbuf_free(rxq->sw_ring[i].mbuf);
 			rxq->sw_ring[i].mbuf = NULL;
 		}
 	}
 	if (rxq->rx_nb_avail == 0)
 		return;
 	for (i = 0; i < rxq->rx_nb_avail; i++)
-		rte_pktmbuf_free_seg(rxq->rx_stage[rxq->rx_next_avail + i]);
+		rte_pktmbuf_free(rxq->rx_stage[rxq->rx_next_avail + i]);
 
 	rxq->rx_nb_avail = 0;
 }
@@ -719,7 +820,7 @@ ice_fdir_program_hw_rx_queue(struct ice_rx_queue *rxq)
 	rx_ctx.qlen = rxq->nb_rx_desc;
 	rx_ctx.dbuf = rxq->rx_buf_len >> ICE_RLAN_CTX_DBUF_S;
 	rx_ctx.hbuf = rxq->rx_hdr_len >> ICE_RLAN_CTX_HBUF_S;
-	rx_ctx.dtype = 0; /* No Header Split mode */
+	rx_ctx.dtype = 0; /* No Buffer Split mode */
 	rx_ctx.dsize = 1; /* 32B descriptors */
 	rx_ctx.rxmax = ICE_ETH_MAX_LEN;
 	/* TPH: Transaction Layer Packet (TLP) processing hints */
@@ -1053,6 +1154,8 @@ ice_rx_queue_setup(struct rte_eth_dev *dev,
 	uint16_t len;
 	int use_def_burst_func = 1;
 	uint64_t offloads;
+	uint16_t n_seg = rx_conf->rx_nseg;
+	uint16_t i;
 
 	if (nb_desc % ICE_ALIGN_RING_DESC != 0 ||
 	    nb_desc > ICE_MAX_RING_DESC ||
@@ -1064,6 +1167,15 @@ ice_rx_queue_setup(struct rte_eth_dev *dev,
 
 	offloads = rx_conf->offloads | dev->data->dev_conf.rxmode.offloads;
 
+	if (mp)
+		n_seg = 1;
+
+	if (n_seg > 1 && !(offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT)) {
+		PMD_INIT_LOG(ERR, "port %u queue index %u split offload not configured",
+				dev->data->port_id, queue_idx);
+		return -EINVAL;
+	}
+
 	/* Free memory if needed */
 	if (dev->data->rx_queues[queue_idx]) {
 		ice_rx_queue_release(dev->data->rx_queues[queue_idx]);
@@ -1075,12 +1187,24 @@ ice_rx_queue_setup(struct rte_eth_dev *dev,
 				 sizeof(struct ice_rx_queue),
 				 RTE_CACHE_LINE_SIZE,
 				 socket_id);
+
 	if (!rxq) {
 		PMD_INIT_LOG(ERR, "Failed to allocate memory for "
 			     "rx queue data structure");
 		return -ENOMEM;
 	}
-	rxq->mp = mp;
+
+	rxq->rxseg_nb = n_seg;
+	if (n_seg > 1) {
+		for (i = 0; i < n_seg; i++)
+			memcpy(&rxq->rxseg[i], &rx_conf->rx_seg[i].split,
+				sizeof(struct rte_eth_rxseg_split));
+
+		rxq->mp = rxq->rxseg[0].mp;
+	} else {
+		rxq->mp = mp;
+	}
+
 	rxq->nb_rx_desc = nb_desc;
 	rxq->rx_free_thresh = rx_conf->rx_free_thresh;
 	rxq->queue_id = queue_idx;
@@ -1551,7 +1675,7 @@ ice_rx_scan_hw_ring(struct ice_rx_queue *rxq)
 	struct ice_rx_entry *rxep;
 	struct rte_mbuf *mb;
 	uint16_t stat_err0;
-	uint16_t pkt_len;
+	uint16_t pkt_len, hdr_len;
 	int32_t s[ICE_LOOK_AHEAD], nb_dd;
 	int32_t i, j, nb_rx = 0;
 	uint64_t pkt_flags = 0;
@@ -1606,6 +1730,27 @@ ice_rx_scan_hw_ring(struct ice_rx_queue *rxq)
 				   ICE_RX_FLX_DESC_PKT_LEN_M) - rxq->crc_len;
 			mb->data_len = pkt_len;
 			mb->pkt_len = pkt_len;
+
+			if (!(rxq->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT)) {
+				pkt_len = (rte_le_to_cpu_16(rxdp[j].wb.pkt_len) &
+					ICE_RX_FLX_DESC_PKT_LEN_M) - rxq->crc_len;
+				mb->data_len = pkt_len;
+				mb->pkt_len = pkt_len;
+			} else {
+				mb->nb_segs = (uint16_t)(mb->nb_segs + mb->next->nb_segs);
+				mb->next->next = NULL;
+				hdr_len = rte_le_to_cpu_16(rxdp[j].wb.hdr_len_sph_flex_flags1) &
+						ICE_RX_FLEX_DESC_HEADER_LEN_M;
+				pkt_len = (rte_le_to_cpu_16(rxdp[j].wb.pkt_len) &
+					ICE_RX_FLX_DESC_PKT_LEN_M) - rxq->crc_len;
+				mb->data_len = hdr_len;
+				mb->pkt_len = hdr_len + pkt_len;
+				mb->next->data_len = pkt_len;
+#ifdef RTE_ETHDEV_DEBUG_RX
+				rte_pktmbuf_dump(stdout, mb, rte_pktmbuf_pkt_len(mb));
+#endif
+			}
+
 			mb->ol_flags = 0;
 			stat_err0 = rte_le_to_cpu_16(rxdp[j].wb.status_error0);
 			pkt_flags = ice_rxd_error_to_pkt_flags(stat_err0);
@@ -1697,7 +1842,9 @@ ice_rx_alloc_bufs(struct ice_rx_queue *rxq)
 	struct rte_mbuf *mb;
 	uint16_t alloc_idx, i;
 	uint64_t dma_addr;
-	int diag;
+	int diag, diag_pay;
+	uint64_t pay_addr;
+	struct rte_mbuf *mbufs_pay[rxq->rx_free_thresh];
 
 	/* Allocate buffers in bulk */
 	alloc_idx = (uint16_t)(rxq->rx_free_trigger -
@@ -1710,6 +1857,15 @@ ice_rx_alloc_bufs(struct ice_rx_queue *rxq)
 		return -ENOMEM;
 	}
 
+	if (rxq->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) {
+		diag_pay = rte_mempool_get_bulk(rxq->rxseg[1].mp,
+				(void *)mbufs_pay, rxq->rx_free_thresh);
+		if (unlikely(diag_pay != 0)) {
+			PMD_RX_LOG(ERR, "Failed to get payload mbufs in bulk");
+			return -ENOMEM;
+		}
+	}
+
 	rxdp = &rxq->rx_ring[alloc_idx];
 	for (i = 0; i < rxq->rx_free_thresh; i++) {
 		if (likely(i < (rxq->rx_free_thresh - 1)))
@@ -1718,13 +1874,21 @@ ice_rx_alloc_bufs(struct ice_rx_queue *rxq)
 
 		mb = rxep[i].mbuf;
 		rte_mbuf_refcnt_set(mb, 1);
-		mb->next = NULL;
 		mb->data_off = RTE_PKTMBUF_HEADROOM;
 		mb->nb_segs = 1;
 		mb->port = rxq->port_id;
 		dma_addr = rte_cpu_to_le_64(rte_mbuf_data_iova_default(mb));
-		rxdp[i].read.hdr_addr = 0;
-		rxdp[i].read.pkt_addr = dma_addr;
+
+		if (!(rxq->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT)) {
+			mb->next = NULL;
+			rxdp[i].read.hdr_addr = 0;
+			rxdp[i].read.pkt_addr = dma_addr;
+		} else {
+			mb->next = mbufs_pay[i];
+			pay_addr = rte_cpu_to_le_64(rte_mbuf_data_iova_default(mbufs_pay[i]));
+			rxdp[i].read.hdr_addr = dma_addr;
+			rxdp[i].read.pkt_addr = pay_addr;
+		}
 	}
 
 	/* Update Rx tail register */
@@ -2333,11 +2497,13 @@ ice_recv_pkts(void *rx_queue,
 	struct ice_rx_entry *sw_ring = rxq->sw_ring;
 	struct ice_rx_entry *rxe;
 	struct rte_mbuf *nmb; /* new allocated mbuf */
+	struct rte_mbuf *nmb_pay; /* new allocated payload mbuf */
 	struct rte_mbuf *rxm; /* pointer to store old mbuf in SW ring */
 	uint16_t rx_id = rxq->rx_tail;
 	uint16_t nb_rx = 0;
 	uint16_t nb_hold = 0;
 	uint16_t rx_packet_len;
+	uint16_t rx_header_len;
 	uint16_t rx_stat_err0;
 	uint64_t dma_addr;
 	uint64_t pkt_flags;
@@ -2365,12 +2531,13 @@ ice_recv_pkts(void *rx_queue,
 		if (!(rx_stat_err0 & (1 << ICE_RX_FLEX_DESC_STATUS0_DD_S)))
 			break;
 
-		/* allocate mbuf */
+		/* allocate header mbuf */
 		nmb = rte_mbuf_raw_alloc(rxq->mp);
 		if (unlikely(!nmb)) {
 			rxq->vsi->adapter->pf.dev_data->rx_mbuf_alloc_failed++;
 			break;
 		}
+
 		rxd = *rxdp; /* copy descriptor in ring to temp variable*/
 
 		nb_hold++;
@@ -2383,24 +2550,60 @@ ice_recv_pkts(void *rx_queue,
 		dma_addr =
 			rte_cpu_to_le_64(rte_mbuf_data_iova_default(nmb));
 
-		/**
-		 * fill the read format of descriptor with physic address in
-		 * new allocated mbuf: nmb
-		 */
-		rxdp->read.hdr_addr = 0;
-		rxdp->read.pkt_addr = dma_addr;
+		if (!(rxq->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT)) {
+			/**
+			 * fill the read format of descriptor with physic address in
+			 * new allocated mbuf: nmb
+			 */
+			rxdp->read.hdr_addr = 0;
+			rxdp->read.pkt_addr = dma_addr;
+		} else {
+			/* allocate payload mbuf */
+			nmb_pay = rte_mbuf_raw_alloc(rxq->rxseg[1].mp);
+			if (unlikely(!nmb_pay)) {
+				rxq->vsi->adapter->pf.dev_data->rx_mbuf_alloc_failed++;
+				break;
+			}
 
-		/* calculate rx_packet_len of the received pkt */
-		rx_packet_len = (rte_le_to_cpu_16(rxd.wb.pkt_len) &
-				 ICE_RX_FLX_DESC_PKT_LEN_M) - rxq->crc_len;
+			nmb->next = nmb_pay;
+			nmb_pay->next = NULL;
+
+			/**
+			 * fill the read format of descriptor with physic address in
+			 * new allocated mbuf: nmb
+			 */
+			rxdp->read.hdr_addr = dma_addr;
+			rxdp->read.pkt_addr = rte_cpu_to_le_64(rte_mbuf_data_iova_default(nmb_pay));
+		}
 
 		/* fill old mbuf with received descriptor: rxd */
 		rxm->data_off = RTE_PKTMBUF_HEADROOM;
 		rte_prefetch0(RTE_PTR_ADD(rxm->buf_addr, RTE_PKTMBUF_HEADROOM));
-		rxm->nb_segs = 1;
-		rxm->next = NULL;
-		rxm->pkt_len = rx_packet_len;
-		rxm->data_len = rx_packet_len;
+		if (!(rxq->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT)) {
+			rxm->nb_segs = 1;
+			rxm->next = NULL;
+			/* calculate rx_packet_len of the received pkt */
+			rx_packet_len = (rte_le_to_cpu_16(rxd.wb.pkt_len) &
+					ICE_RX_FLX_DESC_PKT_LEN_M) - rxq->crc_len;
+			rxm->data_len = rx_packet_len;
+			rxm->pkt_len = rx_packet_len;
+		} else {
+			rxm->nb_segs = (uint16_t)(rxm->nb_segs + rxm->next->nb_segs);
+			rxm->next->next = NULL;
+			/* calculate rx_packet_len of the received pkt */
+			rx_header_len = rte_le_to_cpu_16(rxd.wb.hdr_len_sph_flex_flags1) &
+					ICE_RX_FLEX_DESC_HEADER_LEN_M;
+			rx_packet_len = (rte_le_to_cpu_16(rxd.wb.pkt_len) &
+					ICE_RX_FLX_DESC_PKT_LEN_M) - rxq->crc_len;
+			rxm->data_len = rx_header_len;
+			rxm->pkt_len = rx_header_len + rx_packet_len;
+			rxm->next->data_len = rx_packet_len;
+
+#ifdef RTE_ETHDEV_DEBUG_RX
+			rte_pktmbuf_dump(stdout, rxm, rte_pktmbuf_pkt_len(rxm));
+#endif
+		}
+
 		rxm->port = rxq->port_id;
 		rxm->packet_type = ptype_tbl[ICE_RX_FLEX_DESC_PTYPE_M &
 			rte_le_to_cpu_16(rxd.wb.ptype_flex_flags0)];
diff --git a/drivers/net/ice/ice_rxtx.h b/drivers/net/ice/ice_rxtx.h
index e1d4fe8e47..4947d5c25f 100644
--- a/drivers/net/ice/ice_rxtx.h
+++ b/drivers/net/ice/ice_rxtx.h
@@ -16,6 +16,9 @@
 #define ICE_RX_MAX_BURST 32
 #define ICE_TX_MAX_BURST 32
 
+/* Maximal number of segments to split. */
+#define ICE_RX_MAX_NSEG 2
+
 #define ICE_CHK_Q_ENA_COUNT        100
 #define ICE_CHK_Q_ENA_INTERVAL_US  100
 
@@ -45,6 +48,11 @@
 extern uint64_t ice_timestamp_dynflag;
 extern int ice_timestamp_dynfield_offset;
 
+/* Max header size can be 2K - 64 bytes */
+#define ICE_RX_HDR_BUF_SIZE    (2048 - 64)
+
+#define ICE_HEADER_SPLIT_ENA   BIT(0)
+
 typedef void (*ice_rx_release_mbufs_t)(struct ice_rx_queue *rxq);
 typedef void (*ice_tx_release_mbufs_t)(struct ice_tx_queue *txq);
 typedef void (*ice_rxd_to_pkt_fields_t)(struct ice_rx_queue *rxq,
@@ -55,6 +63,12 @@ struct ice_rx_entry {
 	struct rte_mbuf *mbuf;
 };
 
+enum ice_rx_dtype {
+	ICE_RX_DTYPE_NO_SPLIT       = 0,
+	ICE_RX_DTYPE_HEADER_SPLIT   = 1,
+	ICE_RX_DTYPE_SPLIT_ALWAYS   = 2,
+};
+
 struct ice_rx_queue {
 	struct rte_mempool *mp; /* mbuf pool to populate RX ring */
 	volatile union ice_rx_flex_desc *rx_ring;/* RX ring virtual address */
@@ -101,6 +115,8 @@ struct ice_rx_queue {
 	uint32_t hw_time_high; /* high 32 bits of timestamp */
 	uint32_t hw_time_low; /* low 32 bits of timestamp */
 	uint64_t hw_time_update; /* SW time of HW record updating */
+	struct rte_eth_rxseg_split rxseg[ICE_RX_MAX_NSEG];
+	uint32_t rxseg_nb;
 };
 
 struct ice_tx_entry {
diff --git a/drivers/net/ice/ice_rxtx_vec_common.h b/drivers/net/ice/ice_rxtx_vec_common.h
index 2dd2d83650..eec6ea2134 100644
--- a/drivers/net/ice/ice_rxtx_vec_common.h
+++ b/drivers/net/ice/ice_rxtx_vec_common.h
@@ -291,6 +291,9 @@ ice_rx_vec_queue_default(struct ice_rx_queue *rxq)
 	if (rxq->offloads & RTE_ETH_RX_OFFLOAD_TIMESTAMP)
 		return -1;
 
+	if (rxq->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT)
+		return -1;
+
 	if (rxq->offloads & ICE_RX_VECTOR_OFFLOAD)
 		return ICE_VECTOR_OFFLOAD_PATH;
 
-- 
2.25.1


^ permalink raw reply	[flat|nested] 72+ messages in thread

* RE: [PATCH v9 0/4] support protocol based buffer split
  2022-10-09 14:58   ` Andrew Rybchenko
@ 2022-10-10  2:45     ` Ding, Xuan
  0 siblings, 0 replies; 72+ messages in thread
From: Ding, Xuan @ 2022-10-10  2:45 UTC (permalink / raw)
  To: Andrew Rybchenko, Wang, YuanX, dev
  Cc: thomas, ferruh.yigit, mdr, Li, Xiaoyun, Singh, Aman Deep, Zhang,
	Yuying, Zhang, Qi Z, Yang, Qiming, jerinjacobk, viacheslavo,
	stephen, hpothula, Tang, Yaqi

> -----Original Message-----
> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> Sent: Sunday, October 9, 2022 10:59 PM
> To: Wang, YuanX <yuanx.wang@intel.com>; dev@dpdk.org
> Cc: thomas@monjalon.net; ferruh.yigit@xilinx.com; mdr@ashroe.eu; Li,
> Xiaoyun <xiaoyun.li@intel.com>; Singh, Aman Deep
> <aman.deep.singh@intel.com>; Zhang, Yuying <yuying.zhang@intel.com>;
> Zhang, Qi Z <qi.z.zhang@intel.com>; Yang, Qiming <qiming.yang@intel.com>;
> jerinjacobk@gmail.com; viacheslavo@nvidia.com;
> stephen@networkplumber.org; Ding, Xuan <xuan.ding@intel.com>;
> hpothula@marvell.com; Tang, Yaqi <yaqi.tang@intel.com>
> Subject: Re: [PATCH v9 0/4] support protocol based buffer split
> 
> On 10/9/22 23:25, Yuan Wang wrote:
> > Protocol type based buffer split consists of splitting a received
> > packet into several separate segments based on the packet content. It
> > is useful in some scenarios, such as GPU acceleration. The splitting
> > will help to enable true zero copy and hence improve the performance
> significantly.
> >
> > This patchset aims to support protocol header split based on current
> > buffer split. When Rx queue is configured with
> > RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT offload and corresponding protocol,
> > packets received will be directly split into different mempools.
> >
> > Change log:
> > v9:
> > Define the intend behaviors for exact match and longest match.
> > Add protocol headers repeat check.
> > Add no proto-split after length-based split check.
> > Add a helper function to short the check function.
> > Refine the doc and commit log.
> 
> With few minor fixes, applied to dpdk-next-net/main, thanks.

Sincere thanks to all the reviewers who provided valuable comments on this patch series.

Regards,
Xuan

> 


^ permalink raw reply	[flat|nested] 72+ messages in thread

end of thread, other threads:[~2022-10-10  2:45 UTC | newest]

Thread overview: 72+ messages (download: mbox.gz / follow: Atom feed)
2022-08-12 18:15 [PATCH 0/4] support protocol based buffer split Yuan Wang
2022-08-12 18:15 ` [PATCH 1/4] ethdev: introduce protocol header API Yuan Wang
2022-08-12 18:15 ` [PATCH 2/4] ethdev: introduce protocol hdr based buffer split Yuan Wang
2022-08-12 18:15 ` [PATCH 3/4] app/testpmd: add rxhdrs commands and parameters Yuan Wang
2022-08-12 18:15 ` [PATCH 4/4] net/ice: support buffer split in Rx path Yuan Wang
2022-09-01 22:33 ` [PATCH v2 0/4] support protocol based buffer split Yuan Wang
2022-09-01 22:34 ` [PATCH v2 1/4] ethdev: introduce protocol header API Yuan Wang
2022-09-01 22:35 ` [PATCH v2 2/4] ethdev: introduce protocol hdr based buffer split Yuan Wang
2022-09-01 22:36 ` [PATCH v2 3/4] app/testpmd: add rxhdrs commands and parameters Yuan Wang
2022-09-01 22:37 ` [PATCH v2 4/4] net/ice: support buffer split in Rx path Yuan Wang
2022-09-02 19:10 ` [PATCH v3 0/4] support protocol based buffer split Yuan Wang
2022-09-02 19:10   ` [PATCH v3 1/4] ethdev: introduce protocol header API Yuan Wang
2022-09-12 11:24     ` Andrew Rybchenko
2022-09-16  8:34       ` Wang, YuanX
2022-09-02 19:10   ` [PATCH v3 2/4] ethdev: introduce protocol hdr based buffer split Yuan Wang
2022-09-12 11:47     ` Andrew Rybchenko
2022-09-16  8:38       ` Wang, YuanX
2022-09-20  5:35         ` Andrew Rybchenko
2022-09-22  3:13           ` Wang, YuanX
2022-09-13  7:56     ` Suanming Mou
2022-09-16  8:39       ` Wang, YuanX
2022-09-02 19:10   ` [PATCH v3 3/4] app/testpmd: add rxhdrs commands and parameters Yuan Wang
2022-09-02 19:10   ` [PATCH v3 4/4] net/ice: support buffer split in Rx path Yuan Wang
2022-09-20 11:12 ` [PATCH v4 0/4] support protocol based buffer split Yuan Wang
2022-09-20 11:12   ` [PATCH v4 1/4] ethdev: introduce protocol header API Yuan Wang
2022-09-20 11:12   ` [PATCH v4 2/4] ethdev: introduce protocol hdr based buffer split Yuan Wang
2022-09-20 11:12   ` [PATCH v4 3/4] app/testpmd: add rxhdrs commands and parameters Yuan Wang
2022-09-20 11:12   ` [PATCH v4 4/4] net/ice: support buffer split in Rx path Yuan Wang
2022-09-26  9:40 ` [PATCH v5 0/4] support protocol based buffer split Yuan Wang
2022-09-26  9:40   ` [PATCH v5 1/4] ethdev: introduce protocol header API Yuan Wang
2022-09-26  9:40   ` [PATCH v5 2/4] ethdev: introduce protocol hdr based buffer split Yuan Wang
2022-09-28 15:42     ` Wang, YuanX
2022-09-26  9:40   ` [PATCH v5 3/4] app/testpmd: add rxhdrs commands and parameters Yuan Wang
2022-09-26  9:40   ` [PATCH v5 4/4] net/ice: support buffer split in Rx path Yuan Wang
2022-09-29 18:59 ` [PATCH v6 0/4] support protocol based buffer split Yuan Wang
2022-09-29 18:59   ` [PATCH v6 1/4] ethdev: introduce protocol header API Yuan Wang
2022-09-29 18:59   ` [PATCH v6 2/4] ethdev: introduce protocol hdr based buffer split Yuan Wang
2022-09-29 18:59   ` [PATCH v6 3/4] app/testpmd: add rxhdrs commands and parameters Yuan Wang
2022-09-29 18:59   ` [PATCH v6 4/4] net/ice: support buffer split in Rx path Yuan Wang
2022-09-30  6:45     ` Tang, Yaqi
2022-10-01 21:05 ` [PATCH v7 0/4] support protocol based buffer split Yuan Wang
2022-10-01 21:05   ` [PATCH v7 1/4] ethdev: introduce protocol header API Yuan Wang
2022-10-03  7:04     ` Andrew Rybchenko
2022-10-04  2:21       ` Wang, YuanX
2022-10-04  7:52         ` Andrew Rybchenko
2022-10-04 15:00           ` Wang, YuanX
2022-10-01 21:05   ` [PATCH v7 2/4] ethdev: introduce protocol hdr based buffer split Yuan Wang
2022-10-02  4:01     ` Wang, YuanX
2022-10-03  7:47     ` Andrew Rybchenko
2022-10-04  2:48       ` Wang, YuanX
2022-10-04  8:22         ` Andrew Rybchenko
2022-10-04 15:01           ` Wang, YuanX
2022-10-01 21:05   ` [PATCH v7 3/4] app/testpmd: add rxhdrs commands and parameters Yuan Wang
2022-10-01 21:05   ` [PATCH v7 4/4] net/ice: support buffer split in Rx path Yuan Wang
2022-10-05 23:18 ` [PATCH v8 0/4] support protocol based buffer split Yuan Wang
2022-10-05 23:18   ` [PATCH v8 1/4] ethdev: introduce protocol header API Yuan Wang
2022-10-06 10:11     ` Andrew Rybchenko
2022-10-05 23:18   ` [PATCH v8 2/4] ethdev: introduce protocol hdr based buffer split Yuan Wang
2022-10-06 10:11     ` Andrew Rybchenko
2022-10-08 14:30       ` Ding, Xuan
2022-10-05 23:18   ` [PATCH v8 3/4] app/testpmd: add rxhdrs commands and parameters Yuan Wang
2022-10-06 10:12     ` Andrew Rybchenko
2022-10-05 23:18   ` [PATCH v8 4/4] net/ice: support buffer split in Rx path Yuan Wang
2022-10-06 10:12     ` Andrew Rybchenko
2022-10-06 10:13   ` [PATCH v8 0/4] support protocol based buffer split Andrew Rybchenko
2022-10-09 20:25 ` [PATCH v9 " Yuan Wang
2022-10-09 14:58   ` Andrew Rybchenko
2022-10-10  2:45     ` Ding, Xuan
2022-10-09 20:25   ` [PATCH v9 1/4] ethdev: introduce protocol header API Yuan Wang
2022-10-09 20:25   ` [PATCH v9 2/4] ethdev: introduce protocol hdr based buffer split Yuan Wang
2022-10-09 20:25   ` [PATCH v9 3/4] app/testpmd: add rxhdrs commands and parameters Yuan Wang
2022-10-09 20:25   ` [PATCH v9 4/4] net/ice: support buffer split in Rx path Yuan Wang
