DPDK patches and discussions
* [RFC] ethdev: introduce protocol type based header split
@ 2022-03-03  6:01 xuan.ding
  2022-03-03  8:55 ` Thomas Monjalon
                   ` (10 more replies)
  0 siblings, 11 replies; 88+ messages in thread
From: xuan.ding @ 2022-03-03  6:01 UTC (permalink / raw)
  To: thomas, ferruh.yigit, andrew.rybchenko
  Cc: dev, viacheslavo, qi.z.zhang, ping.yu, Xuan Ding, Yuan Wang

From: Xuan Ding <xuan.ding@intel.com>

Header split consists of splitting a received packet into two separate
regions based on the packet content. The split is usually between the
packet header, which can be posted to a dedicated buffer, and the packet
payload, which can be posted to a different buffer. This kind of split
is useful in use cases such as GPU acceleration, where the GPU can
process the payload directly and improve performance significantly.

Currently, Rx buffer split supports length and offset based packet split.
This is not suitable for NICs that split based on protocol types, since
tunneling makes the conversion from offset to protocol inaccurate.

This patch extends the current buffer split to support protocol based
header split. A new proto field, carved out of the reserved field in the
rte_eth_rxseg_split structure, specifies the header split type.

With the Rx offload flag RTE_ETH_RX_OFFLOAD_HEADER_SPLIT enabled and a
protocol type configured, the PMD will split ingress packets into two
separate regions. Currently, L2/L3/L4 level header split is supported.
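
To illustrate the intended usage, here is a minimal application-side
sketch (hypothetical: hdr_pool, pay_pool, port_id and nb_rxd are assumed
to be set up already; error handling omitted; the proto value comes from
the enum added below; requires <rte_ethdev.h>):

    struct rte_eth_rxseg_split segs[2] = {
        { .mp = hdr_pool, .proto = RTE_ETH_RX_HEADER_SPLIT_TCP_UDP },
        { .mp = pay_pool }, /* payload; length/offset left at zero */
    };
    struct rte_eth_rxconf rxconf = {
        .offloads = RTE_ETH_RX_OFFLOAD_HEADER_SPLIT,
        .rx_nseg = 2,
        .rx_seg = (union rte_eth_rxseg *)segs,
    };
    /* mb_pool is NULL: buffers come from the per-segment mempools */
    int ret = rte_eth_rx_queue_setup(port_id, 0, nb_rxd,
                                     rte_socket_id(), &rxconf, NULL);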

Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
---
 lib/ethdev/rte_ethdev.c |  2 +-
 lib/ethdev/rte_ethdev.h | 17 ++++++++++++++++-
 2 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
index 70c850a2f1..d37c8f9d7e 100644
--- a/lib/ethdev/rte_ethdev.c
+++ b/lib/ethdev/rte_ethdev.c
@@ -1784,7 +1784,7 @@ rte_eth_rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
 							   &dev_info);
 			if (ret != 0)
 				return ret;
-		} else {
+		} else if (!(rx_conf->offloads & RTE_ETH_RX_OFFLOAD_HEADER_SPLIT)) {
 			RTE_ETHDEV_LOG(ERR, "No Rx segmentation offload configured\n");
 			return -EINVAL;
 		}
diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
index c2d1f9a972..6743648c22 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -1202,7 +1202,8 @@ struct rte_eth_rxseg_split {
 	struct rte_mempool *mp; /**< Memory pool to allocate segment from. */
 	uint16_t length; /**< Segment data length, configures split point. */
 	uint16_t offset; /**< Data offset from beginning of mbuf data buffer. */
-	uint32_t reserved; /**< Reserved field. */
+	uint16_t proto;
+	uint16_t reserved; /**< Reserved field. */
 };
 
 /**
@@ -1664,6 +1665,20 @@ struct rte_eth_conf {
 			     RTE_ETH_RX_OFFLOAD_QINQ_STRIP)
 #define DEV_RX_OFFLOAD_VLAN RTE_DEPRECATED(DEV_RX_OFFLOAD_VLAN) RTE_ETH_RX_OFFLOAD_VLAN
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this structure may change without prior notice.
+ * This enum indicates the header split protocol type
+ */
+enum rte_eth_rx_header_split_protocol_type {
+	RTE_ETH_RX_HEADER_SPLIT_DEFAULT = 0,
+	RTE_ETH_RX_HEADER_SPLIT_INNER_L2,
+	RTE_ETH_RX_HEADER_SPLIT_OUTER_L2,
+	RTE_ETH_RX_HEADER_SPLIT_IP,
+	RTE_ETH_RX_HEADER_SPLIT_TCP_UDP,
+	RTE_ETH_RX_HEADER_SPLIT_SCTP
+};
+
 /*
  * If new Rx offload capabilities are defined, they also must be
  * mentioned in rte_rx_offload_names in rte_ethdev.c file.
-- 
2.17.1



* Re: [RFC] ethdev: introduce protocol type based header split
  2022-03-03  6:01 [RFC] ethdev: introduce protocol type based header split xuan.ding
@ 2022-03-03  8:55 ` Thomas Monjalon
  2022-03-08  7:48   ` Ding, Xuan
  2022-03-03 16:15 ` Stephen Hemminger
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 88+ messages in thread
From: Thomas Monjalon @ 2022-03-03  8:55 UTC (permalink / raw)
  To: xuan.ding
  Cc: ferruh.yigit, andrew.rybchenko, dev, viacheslavo, qi.z.zhang,
	ping.yu, Xuan Ding, Yuan Wang, ajit.khaparde, jerinj

03/03/2022 07:01, xuan.ding@intel.com:
> From: Xuan Ding <xuan.ding@intel.com>
> 
> Header split consists of splitting a received packet into two separate
> regions based on the packet content. The split is usually between the
> packet header, which can be posted to a dedicated buffer, and the packet
> payload, which can be posted to a different buffer. This kind of split
> is useful in use cases such as GPU acceleration, where the GPU can
> process the payload directly and improve performance significantly.
>
> Currently, Rx buffer split supports length and offset based packet split.
> This is not suitable for NICs that split based on protocol types, since
> tunneling makes the conversion from offset to protocol inaccurate.
>
> This patch extends the current buffer split to support protocol based
> header split. A new proto field, carved out of the reserved field in the
> rte_eth_rxseg_split structure, specifies the header split type.
>
> With the Rx offload flag RTE_ETH_RX_OFFLOAD_HEADER_SPLIT enabled and a
> protocol type configured, the PMD will split ingress packets into two
> separate regions. Currently, L2/L3/L4 level header split is supported.
> 
> Signed-off-by: Xuan Ding <xuan.ding@intel.com>
> Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
> ---
>  lib/ethdev/rte_ethdev.c |  2 +-
>  lib/ethdev/rte_ethdev.h | 17 ++++++++++++++++-
>  2 files changed, 17 insertions(+), 2 deletions(-)
> 
> diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
> index 70c850a2f1..d37c8f9d7e 100644
> --- a/lib/ethdev/rte_ethdev.c
> +++ b/lib/ethdev/rte_ethdev.c
> @@ -1784,7 +1784,7 @@ rte_eth_rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
>  							   &dev_info);
>  			if (ret != 0)
>  				return ret;
> -		} else {
> +		} else if (!(rx_conf->offloads & RTE_ETH_RX_OFFLOAD_HEADER_SPLIT)) {
>  			RTE_ETHDEV_LOG(ERR, "No Rx segmentation offload configured\n");
>  			return -EINVAL;
>  		}
> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> index c2d1f9a972..6743648c22 100644
> --- a/lib/ethdev/rte_ethdev.h
> +++ b/lib/ethdev/rte_ethdev.h
> @@ -1202,7 +1202,8 @@ struct rte_eth_rxseg_split {
>  	struct rte_mempool *mp; /**< Memory pool to allocate segment from. */
>  	uint16_t length; /**< Segment data length, configures split point. */
>  	uint16_t offset; /**< Data offset from beginning of mbuf data buffer. */
> -	uint32_t reserved; /**< Reserved field. */
> +	uint16_t proto;

If it is not explicitly documented, it cannot be accepted.

What happens if we have a non-0 proto and length/offset defined?

> +	uint16_t reserved; /**< Reserved field. */
>  };
[...]
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this structure may change without prior notice.

This is not a structure.

> + * This enum indicates the header split protocol type
> + */
> +enum rte_eth_rx_header_split_protocol_type {
> +	RTE_ETH_RX_HEADER_SPLIT_DEFAULT = 0,
> +	RTE_ETH_RX_HEADER_SPLIT_INNER_L2,
> +	RTE_ETH_RX_HEADER_SPLIT_OUTER_L2,
> +	RTE_ETH_RX_HEADER_SPLIT_IP,
> +	RTE_ETH_RX_HEADER_SPLIT_TCP_UDP,
> +	RTE_ETH_RX_HEADER_SPLIT_SCTP
> +};

Lack of documentation.
Where should the split happen? Before or after the header?
What does DEFAULT mean?
What do IP, TCP_UDP and SCTP mean? Inner or outer?




* Re: [RFC] ethdev: introduce protocol type based header split
  2022-03-03  6:01 [RFC] ethdev: introduce protocol type based header split xuan.ding
  2022-03-03  8:55 ` Thomas Monjalon
@ 2022-03-03 16:15 ` Stephen Hemminger
  2022-03-04  9:58   ` Zhang, Qi Z
  2022-03-22  3:56 ` [RFC,v2 0/3] " xuan.ding
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 88+ messages in thread
From: Stephen Hemminger @ 2022-03-03 16:15 UTC (permalink / raw)
  To: xuan.ding
  Cc: thomas, ferruh.yigit, andrew.rybchenko, dev, viacheslavo,
	qi.z.zhang, ping.yu, Yuan Wang

On Thu,  3 Mar 2022 06:01:36 +0000
xuan.ding@intel.com wrote:

> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> index c2d1f9a972..6743648c22 100644
> --- a/lib/ethdev/rte_ethdev.h
> +++ b/lib/ethdev/rte_ethdev.h
> @@ -1202,7 +1202,8 @@ struct rte_eth_rxseg_split {
>  	struct rte_mempool *mp; /**< Memory pool to allocate segment from. */
>  	uint16_t length; /**< Segment data length, configures split point. */
>  	uint16_t offset; /**< Data offset from beginning of mbuf data buffer. */
> -	uint32_t reserved; /**< Reserved field. */
> +	uint16_t proto;
> +	uint16_t reserved; /**< Reserved field. */
>  };

This feature suffers from a common bad design pattern.
You can't just start using reserved fields unless the previous versions
enforced that the field was a particular value (usually zero).

There is no guarantee that applications will initialize these reserved
fields, and using them now risks breaking the API/ABI. It looks like

rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg, ...)

would have had to enforce that in a previous release.

This probably has to wait until 22.11, the next API release.


* RE: [RFC] ethdev: introduce protocol type based header split
  2022-03-03 16:15 ` Stephen Hemminger
@ 2022-03-04  9:58   ` Zhang, Qi Z
  2022-03-04 11:54     ` Morten Brørup
  2022-03-04 17:32     ` Stephen Hemminger
  0 siblings, 2 replies; 88+ messages in thread
From: Zhang, Qi Z @ 2022-03-04  9:58 UTC (permalink / raw)
  To: Stephen Hemminger, Ding, Xuan
  Cc: thomas, Yigit, Ferruh, andrew.rybchenko, dev, viacheslavo, Yu,
	Ping, Wang, YuanX



> -----Original Message-----
> From: Stephen Hemminger <stephen@networkplumber.org>
> Sent: Friday, March 4, 2022 12:16 AM
> To: Ding, Xuan <xuan.ding@intel.com>
> Cc: thomas@monjalon.net; Yigit, Ferruh <ferruh.yigit@intel.com>;
> andrew.rybchenko@oktetlabs.ru; dev@dpdk.org; viacheslavo@nvidia.com;
> Zhang, Qi Z <qi.z.zhang@intel.com>; Yu, Ping <ping.yu@intel.com>; Wang,
> YuanX <yuanx.wang@intel.com>
> Subject: Re: [RFC] ethdev: introduce protocol type based header split
> 
> On Thu,  3 Mar 2022 06:01:36 +0000
> xuan.ding@intel.com wrote:
> 
> > diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> > index c2d1f9a972..6743648c22 100644
> > --- a/lib/ethdev/rte_ethdev.h
> > +++ b/lib/ethdev/rte_ethdev.h
> > @@ -1202,7 +1202,8 @@ struct rte_eth_rxseg_split {
> >  	struct rte_mempool *mp; /**< Memory pool to allocate segment from. */
> >  	uint16_t length; /**< Segment data length, configures split point. */
> >  	uint16_t offset; /**< Data offset from beginning of mbuf data buffer. */
> > -	uint32_t reserved; /**< Reserved field. */
> > +	uint16_t proto;
> > +	uint16_t reserved; /**< Reserved field. */
> >  };
> 
> This feature suffers from a common bad design pattern.
> You can't just start using reserved fields unless the previous versions enforced
> that the field was a particular value (usually zero).

Yes, agree. That's a mistake: there is no documentation for the reserved field in the previous release, and usually it should be zero.
I think one of the typical purposes of a reserved field is to make it easy to add new features without breaking the ABI.
So, should we just take the risk, as I guess it might not be a big deal in real cases?

Thanks
Qi



> 
> There is no guarantee that application will initialize these reserved fields and
> now using them risks breaking the API/ABI. It looks like
> 
> rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
> 
> Would have had to check in previous release.
> 
> This probably has to wait until 22.11 next API release.


* RE: [RFC] ethdev: introduce protocol type based header split
  2022-03-04  9:58   ` Zhang, Qi Z
@ 2022-03-04 11:54     ` Morten Brørup
  2022-03-04 17:32     ` Stephen Hemminger
  1 sibling, 0 replies; 88+ messages in thread
From: Morten Brørup @ 2022-03-04 11:54 UTC (permalink / raw)
  To: Zhang, Qi Z, Stephen Hemminger, Ding, Xuan
  Cc: thomas, Yigit, Ferruh, andrew.rybchenko, dev, viacheslavo, Yu,
	Ping, Wang, YuanX

> From: Zhang, Qi Z [mailto:qi.z.zhang@intel.com]
> Sent: Friday, 4 March 2022 10.58
> 
> > From: Stephen Hemminger <stephen@networkplumber.org>
> > Sent: Friday, March 4, 2022 12:16 AM
> >
> > On Thu,  3 Mar 2022 06:01:36 +0000
> > xuan.ding@intel.com wrote:
> >
> > > diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> > > index c2d1f9a972..6743648c22 100644
> > > --- a/lib/ethdev/rte_ethdev.h
> > > +++ b/lib/ethdev/rte_ethdev.h
> > > @@ -1202,7 +1202,8 @@ struct rte_eth_rxseg_split {
> > >  	struct rte_mempool *mp; /**< Memory pool to allocate segment from. */
> > >  	uint16_t length; /**< Segment data length, configures split point. */
> > >  	uint16_t offset; /**< Data offset from beginning of mbuf data buffer. */
> > > -	uint32_t reserved; /**< Reserved field. */
> > > +	uint16_t proto;
> > > +	uint16_t reserved; /**< Reserved field. */
> > >  };
> >
> > This feature suffers from a common bad design pattern.
> > You can't just start using reserved fields unless the previous versions
> > enforced that the field was a particular value (usually zero).
> 
> Yes, agree. That's a mistake: there is no documentation for the reserved
> field in the previous release, and usually it should be zero.
> I think one of the typical purposes of a reserved field is to make it
> easy to add new features without breaking the ABI.
> So, should we just take the risk, as I guess it might not be a big deal
> in real cases?
> 

In this specific case, I think it can be done with very low risk in real cases.

Assuming that splitting based on fixed length and splitting based on protocol header parsing are mutually exclusive, the PMDs can simply ignore the "proto" field (and log a warning about it) if the length field is non-zero. This will provide backwards compatibility with applications that do not zero out the 32 bit "reserved" field.
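
A minimal sketch of that check, as it could look inside
rte_eth_rx_queue_check_split() (hypothetical code following this
suggestion, not part of any submitted patch; rx_seg, seg_idx and proto
as in the RFC):

    /* Treat length-based and proto-based split as mutually exclusive:
     * a non-zero length wins, and a stale proto value left over in the
     * old 32-bit reserved field is ignored with a warning.
     */
    if (rx_seg[seg_idx].length != 0 && rx_seg[seg_idx].proto != 0) {
        RTE_ETHDEV_LOG(WARNING,
            "proto ignored for segment %u: length-based split configured\n",
            seg_idx);
        proto = 0;
    }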

> Thanks
> Qi
> 
> 
> 
> >
> > There is no guarantee that applications will initialize these reserved
> > fields, and using them now risks breaking the API/ABI. It looks like
> >
> > rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg, ...)
> >
> > would have had to enforce that in a previous release.
> >
> > This probably has to wait until 22.11, the next API release.



* Re: [RFC] ethdev: introduce protocol type based header split
  2022-03-04  9:58   ` Zhang, Qi Z
  2022-03-04 11:54     ` Morten Brørup
@ 2022-03-04 17:32     ` Stephen Hemminger
  1 sibling, 0 replies; 88+ messages in thread
From: Stephen Hemminger @ 2022-03-04 17:32 UTC (permalink / raw)
  To: Zhang, Qi Z
  Cc: Ding, Xuan, thomas, Yigit, Ferruh, andrew.rybchenko, dev,
	viacheslavo, Yu, Ping, Wang, YuanX

On Fri, 4 Mar 2022 09:58:11 +0000
"Zhang, Qi Z" <qi.z.zhang@intel.com> wrote:

> > -----Original Message-----
> > From: Stephen Hemminger <stephen@networkplumber.org>
> > Sent: Friday, March 4, 2022 12:16 AM
> > To: Ding, Xuan <xuan.ding@intel.com>
> > Cc: thomas@monjalon.net; Yigit, Ferruh <ferruh.yigit@intel.com>;
> > andrew.rybchenko@oktetlabs.ru; dev@dpdk.org; viacheslavo@nvidia.com;
> > Zhang, Qi Z <qi.z.zhang@intel.com>; Yu, Ping <ping.yu@intel.com>; Wang,
> > YuanX <yuanx.wang@intel.com>
> > Subject: Re: [RFC] ethdev: introduce protocol type based header split
> > 
> > On Thu,  3 Mar 2022 06:01:36 +0000
> > xuan.ding@intel.com wrote:
> >   
> > > diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> > > index c2d1f9a972..6743648c22 100644
> > > --- a/lib/ethdev/rte_ethdev.h
> > > +++ b/lib/ethdev/rte_ethdev.h
> > > @@ -1202,7 +1202,8 @@ struct rte_eth_rxseg_split {
> > >  	struct rte_mempool *mp; /**< Memory pool to allocate segment from. */
> > >  	uint16_t length; /**< Segment data length, configures split point. */
> > >  	uint16_t offset; /**< Data offset from beginning of mbuf data buffer. */
> > > -	uint32_t reserved; /**< Reserved field. */
> > > +	uint16_t proto;
> > > +	uint16_t reserved; /**< Reserved field. */
> > >  };  
> > 
> > This feature suffers from a common bad design pattern.
> > You can't just start using reserved fields unless the previous versions enforced
> > that the field was a particular value (usually zero).  
> 
> Yes, agree. That's a mistake: there is no documentation for the reserved field in the previous release, and usually it should be zero.
> I think one of the typical purposes of a reserved field is to make it easy to add new features without breaking the ABI.
> So, should we just take the risk, as I guess it might not be a big deal in real cases?

There is a cost/benefit tradeoff here. Although HW vendors would like to enable
more features, it really is not that much of an impact for users to wait until
the next LTS.

Yes, the API/ABI rules are restrictive, but IMHO it is about learning how to
handle SW upgrades in a more user friendly manner. It was hard for the Linux
kernel to learn how to do this, but after 10 years they mostly have it right.

If this were a bug (especially a security bug), then the rules could be lifted.


* RE: [RFC] ethdev: introduce protocol type based header split
  2022-03-03  8:55 ` Thomas Monjalon
@ 2022-03-08  7:48   ` Ding, Xuan
  0 siblings, 0 replies; 88+ messages in thread
From: Ding, Xuan @ 2022-03-08  7:48 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: Yigit, Ferruh, andrew.rybchenko, dev, viacheslavo, Zhang, Qi Z,
	Yu, Ping, Wang, YuanX, ajit.khaparde, jerinj, mb,
	Stephen Hemminger

Hi Thomas,

> -----Original Message-----
> From: Thomas Monjalon <thomas@monjalon.net>
> Sent: 2022年3月3日 16:55
> To: Ding, Xuan <xuan.ding@intel.com>
> Cc: Yigit, Ferruh <ferruh.yigit@intel.com>; andrew.rybchenko@oktetlabs.ru;
> dev@dpdk.org; viacheslavo@nvidia.com; Zhang, Qi Z <qi.z.zhang@intel.com>;
> Yu, Ping <ping.yu@intel.com>; Ding, Xuan <xuan.ding@intel.com>; Wang,
> YuanX <yuanx.wang@intel.com>; ajit.khaparde@broadcom.com;
> jerinj@marvell.com
> Subject: Re: [RFC] ethdev: introduce protocol type based header split
> 
> 03/03/2022 07:01, xuan.ding@intel.com:
> > From: Xuan Ding <xuan.ding@intel.com>
> >
> > Header split consists of splitting a received packet into two separate
> > regions based on the packet content. The split is usually between the
> > packet header, which can be posted to a dedicated buffer, and the
> > packet payload, which can be posted to a different buffer. This kind
> > of split is useful in use cases such as GPU acceleration, where the
> > GPU can process the payload directly and improve performance
> > significantly.
> >
> > Currently, Rx buffer split supports length and offset based packet split.
> > This is not suitable for NICs that split based on protocol types, since
> > tunneling makes the conversion from offset to protocol inaccurate.
> >
> > This patch extends the current buffer split to support protocol based
> > header split. A new proto field, carved out of the reserved field in
> > the rte_eth_rxseg_split structure, specifies the header split type.
> >
> > With the Rx offload flag RTE_ETH_RX_OFFLOAD_HEADER_SPLIT enabled and a
> > protocol type configured, the PMD will split ingress packets into two
> > separate regions. Currently, L2/L3/L4 level header split is supported.
> >
> > Signed-off-by: Xuan Ding <xuan.ding@intel.com>
> > Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
> > ---
> >  lib/ethdev/rte_ethdev.c |  2 +-
> >  lib/ethdev/rte_ethdev.h | 17 ++++++++++++++++-
> >  2 files changed, 17 insertions(+), 2 deletions(-)
> >
> > diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
> > index 70c850a2f1..d37c8f9d7e 100644
> > --- a/lib/ethdev/rte_ethdev.c
> > +++ b/lib/ethdev/rte_ethdev.c
> > @@ -1784,7 +1784,7 @@ rte_eth_rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
> >  							   &dev_info);
> >  			if (ret != 0)
> >  				return ret;
> > -		} else {
> > +		} else if (!(rx_conf->offloads & RTE_ETH_RX_OFFLOAD_HEADER_SPLIT)) {
> >  			RTE_ETHDEV_LOG(ERR, "No Rx segmentation offload configured\n");
> >  			return -EINVAL;
> >  		}
> > diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> > index c2d1f9a972..6743648c22 100644
> > --- a/lib/ethdev/rte_ethdev.h
> > +++ b/lib/ethdev/rte_ethdev.h
> > @@ -1202,7 +1202,8 @@ struct rte_eth_rxseg_split {
> >  	struct rte_mempool *mp; /**< Memory pool to allocate segment from. */
> >  	uint16_t length; /**< Segment data length, configures split point. */
> >  	uint16_t offset; /**< Data offset from beginning of mbuf data buffer. */
> > -	uint32_t reserved; /**< Reserved field. */
> > +	uint16_t proto;
> 
> If it is not explicitly documented, it cannot be accepted.

Thanks for your suggestion. The documentation will be enriched in next version.
Let me give a brief introduction here.

> 
> What happens if we have a non-0 proto and length/offset defined?
 
As Morten said, the proto field is mutually exclusive with the length
field here.
For buffer split, the length/offset is needed.
For header split, the proto is needed.
As for the offset field in header split, it is zero by default, but it can
also be configured to decide the beginning of the mbuf data buffer.

In conclusion, a non-zero proto indicates the PMD should do header split,
while a non-zero length indicates buffer split.

> 
> > +	uint16_t reserved; /**< Reserved field. */
> >  };
> [...]
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this structure may change without prior notice.
> 
> This is not a structure.
> 
> > + * This enum indicates the header split protocol type
> > + */
> > +enum rte_eth_rx_header_split_protocol_type {
> > +	RTE_ETH_RX_HEADER_SPLIT_DEFAULT = 0,
> > +	RTE_ETH_RX_HEADER_SPLIT_INNER_L2,
> > +	RTE_ETH_RX_HEADER_SPLIT_OUTER_L2,
> > +	RTE_ETH_RX_HEADER_SPLIT_IP,
> > +	RTE_ETH_RX_HEADER_SPLIT_TCP_UDP,
> > +	RTE_ETH_RX_HEADER_SPLIT_SCTP
> > +};
> 
> Lack of documentation.
> Where should the split happen? Before or after the header?

When header split is configured, the split happens at the boundary between
header and payload: after the header and before the payload.

> What does DEFAULT mean?

DEFAULT means no header split protocol type was defined.
In this case, even if the header split offload is configured on the Rx
queue, the PMD won't do header split. Actually, using NONE is more accurate.

> What do IP, TCP_UDP and SCTP mean? Inner or outer?

Since header split happens after the header, IP/TCP/UDP/SCTP define the
header type.
When taking inner and outer headers into consideration, the definition
should be refined here.
For example:
rte_eth_rx_header_split_protocol_type {
	RTE_ETH_RX_HEADER_SPLIT_NONE = 0,
	RTE_ETH_RX_HEADER_SPLIT_MAC,
	RTE_ETH_RX_HEADER_SPLIT_IPV4,
	RTE_ETH_RX_HEADER_SPLIT_IPV6,
	RTE_ETH_RX_HEADER_SPLIT_L3,
	RTE_ETH_RX_HEADER_SPLIT_TCP,
	RTE_ETH_RX_HEADER_SPLIT_UDP,
	RTE_ETH_RX_HEADER_SPLIT_SCTP,
	RTE_ETH_RX_HEADER_SPLIT_L4,
	RTE_ETH_RX_HEADER_SPLIT_INNER_MAC,
	RTE_ETH_RX_HEADER_SPLIT_INNER_IPV4,
	RTE_ETH_RX_HEADER_SPLIT_INNER_IPV6,
	RTE_ETH_RX_HEADER_SPLIT_INNER_L3,
	RTE_ETH_RX_HEADER_SPLIT_INNER_TCP,
	RTE_ETH_RX_HEADER_SPLIT_INNER_UDP,
	RTE_ETH_RX_HEADER_SPLIT_INNER_SCTP,
	RTE_ETH_RX_HEADER_SPLIT_INNER_L4,
};

Considering some NICs don't distinguish between L2/L3/L4 headers in header
split, separate L2/L3/L4 values are also defined.

Thanks,
Xuan


> 



* [RFC,v2 0/3] ethdev: introduce protocol type based header split
  2022-03-03  6:01 [RFC] ethdev: introduce protocol type based header split xuan.ding
  2022-03-03  8:55 ` Thomas Monjalon
  2022-03-03 16:15 ` Stephen Hemminger
@ 2022-03-22  3:56 ` xuan.ding
  2022-03-22  3:56   ` [RFC,v2 1/3] " xuan.ding
                     ` (2 more replies)
  2022-03-29  6:49 ` [RFC,v3 0/3] ethdev: introduce protocol type based header split xuan.ding
                   ` (7 subsequent siblings)
  10 siblings, 3 replies; 88+ messages in thread
From: xuan.ding @ 2022-03-22  3:56 UTC (permalink / raw)
  To: thomas, ferruh.yigit, andrew.rybchenko
  Cc: dev, stephen, mb, viacheslavo, qi.z.zhang, ping.yu, wenxuanx.wu,
	Xuan Ding

From: Xuan Ding <xuan.ding@intel.com>

Header split consists of splitting a received packet into two separate
regions based on the packet content. It is useful in some scenarios,
such as GPU acceleration. The splitting will help to enable true zero
copy and hence improve the performance significantly.

This patchset extends the current buffer split to support protocol based
header split. When an Rx queue is configured with the header split feature,
received packets will be split directly into two different mempools.

Xuan Ding (3):
  ethdev: introduce protocol type based header split
  app/testpmd: add header split configuration
  net/ice: support header split in Rx data path

 app/test-pmd/cmdline.c                | 117 ++++++++++++++
 app/test-pmd/testpmd.c                |   6 +-
 app/test-pmd/testpmd.h                |   2 +
 drivers/net/ice/ice_ethdev.c          |  10 +-
 drivers/net/ice/ice_rxtx.c            | 220 ++++++++++++++++++++++----
 drivers/net/ice/ice_rxtx.h            |  16 ++
 drivers/net/ice/ice_rxtx_vec_common.h |   3 +
 lib/ethdev/rte_ethdev.c               |  24 +--
 lib/ethdev/rte_ethdev.h               |  43 ++++-
 9 files changed, 397 insertions(+), 44 deletions(-)

-- 
2.17.1



* [RFC,v2 1/3] ethdev: introduce protocol type based header split
  2022-03-22  3:56 ` [RFC,v2 0/3] " xuan.ding
@ 2022-03-22  3:56   ` xuan.ding
  2022-03-22  7:14     ` Zhang, Qi Z
  2022-03-22  3:56   ` [RFC,v2 2/3] app/testpmd: add header split configuration xuan.ding
  2022-03-22  3:56   ` [RFC,v2 3/3] net/ice: support header split in Rx data path xuan.ding
  2 siblings, 1 reply; 88+ messages in thread
From: xuan.ding @ 2022-03-22  3:56 UTC (permalink / raw)
  To: thomas, ferruh.yigit, andrew.rybchenko
  Cc: dev, stephen, mb, viacheslavo, qi.z.zhang, ping.yu, wenxuanx.wu,
	Xuan Ding, Yuan Wang

From: Xuan Ding <xuan.ding@intel.com>

Header split consists of splitting a received packet into two separate
regions based on the packet content. The split happens after the
packet header and before the packet payload. The packet header can be
posted to a dedicated buffer and the packet payload to a different
buffer.

Currently, Rx buffer split supports length and offset based packet split.
Although header split is a subset of buffer split, configuring buffer
split based on length and offset is not suitable for NICs that split
based on header protocol types, and tunneling makes the conversion from
offset to protocol impossible.

This patch extends the current buffer split to support protocol based
header split. A new proto field, carved out of the reserved field in the
rte_eth_rxseg_split structure, specifies the header protocol type. With
the Rx offload flag RTE_ETH_RX_OFFLOAD_HEADER_SPLIT enabled and a protocol
type configured, the PMD will split ingress packets into two separate
regions. Currently, both inner and outer L2/L3/L4 level header split can
be supported.

For example, let's suppose we configured the Rx queue with the
following segments:
    seg0 - pool0
    seg1 - pool1

With the header split type configured to RTE_ETH_RX_HEADER_SPLIT_UDP,
a packet consisting of MAC_IP_UDP_PAYLOAD will be split as follows:
    seg0 - pool0, udp_header
    seg1 - pool1, payload

The memory attributes of the split parts may differ as well - for example,
mempool0 and mempool1 may belong to DPDK memory and external memory,
respectively.
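
As an illustration of what the application sees afterwards, a minimal
sketch of walking one received packet (standard mbuf accessors; pkts[]
is assumed to come from rte_eth_rx_burst()):

    struct rte_mbuf *pkt = pkts[i];
    /* seg0 (from pool0): the MAC/IP/UDP headers */
    void *hdrs = rte_pktmbuf_mtod(pkt, void *);
    /* seg1 (from pool1): the payload, e.g. handed off to a GPU */
    struct rte_mbuf *pay = pkt->next;
    void *payload = rte_pktmbuf_mtod(pay, void *);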

Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
---
 lib/ethdev/rte_ethdev.c | 24 +++++++++++++----------
 lib/ethdev/rte_ethdev.h | 43 +++++++++++++++++++++++++++++++++++++++--
 2 files changed, 55 insertions(+), 12 deletions(-)

diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
index 70c850a2f1..49c8fef1c3 100644
--- a/lib/ethdev/rte_ethdev.c
+++ b/lib/ethdev/rte_ethdev.c
@@ -1661,6 +1661,7 @@ rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
 		struct rte_mempool *mpl = rx_seg[seg_idx].mp;
 		uint32_t length = rx_seg[seg_idx].length;
 		uint32_t offset = rx_seg[seg_idx].offset;
+		uint16_t proto = rx_seg[seg_idx].proto;
 
 		if (mpl == NULL) {
 			RTE_ETHDEV_LOG(ERR, "null mempool pointer\n");
@@ -1692,15 +1693,17 @@ rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
 					(struct rte_pktmbuf_pool_private));
 			return -ENOSPC;
 		}
-		offset += seg_idx != 0 ? 0 : RTE_PKTMBUF_HEADROOM;
-		*mbp_buf_size = rte_pktmbuf_data_room_size(mpl);
-		length = length != 0 ? length : *mbp_buf_size;
-		if (*mbp_buf_size < length + offset) {
-			RTE_ETHDEV_LOG(ERR,
-				       "%s mbuf_data_room_size %u < %u (segment length=%u + segment offset=%u)\n",
-				       mpl->name, *mbp_buf_size,
-				       length + offset, length, offset);
-			return -EINVAL;
+		if (proto == 0) {
+			offset += seg_idx != 0 ? 0 : RTE_PKTMBUF_HEADROOM;
+			*mbp_buf_size = rte_pktmbuf_data_room_size(mpl);
+			length = length != 0 ? length : *mbp_buf_size;
+			if (*mbp_buf_size < length + offset) {
+				RTE_ETHDEV_LOG(ERR,
+					"%s mbuf_data_room_size %u < %u (segment length=%u + segment offset=%u)\n",
+					mpl->name, *mbp_buf_size,
+					length + offset, length, offset);
+				return -EINVAL;
+			}
 		}
 	}
 	return 0;
@@ -1778,7 +1781,8 @@ rte_eth_rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
 		rx_seg = (const struct rte_eth_rxseg_split *)rx_conf->rx_seg;
 		n_seg = rx_conf->rx_nseg;
 
-		if (rx_conf->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) {
+		if (rx_conf->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT ||
+			rx_conf->offloads & RTE_ETH_RX_OFFLOAD_HEADER_SPLIT) {
 			ret = rte_eth_rx_queue_check_split(rx_seg, n_seg,
 							   &mbp_buf_size,
 							   &dev_info);
diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
index c2d1f9a972..6d66de316c 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -1197,12 +1197,26 @@ struct rte_eth_txmode {
  *     - pool from the last valid element
  *     - the buffer size from this pool
  *     - zero offset
+ *
+ * Header split is a subset of buffer split. The split happens after the
+ * packet header and before the packet payload. For PMDs that do not
+ * support header split configuration by length and offset, the location
+ * of the split is specified by the header protocol type. For buffer
+ * split, this field should not be configured.
+ *
+ * If RTE_ETH_RX_OFFLOAD_HEADER_SPLIT flag is set in offloads field,
+ * the PMD will split the received packets into two separate regions:
+ * - The header buffer will be allocated from the memory pool,
+ *   specified in the first array element, the second buffer, from the
+ *   pool in the second element.
+ * - The length and offset do not need to be configured in header split.
  */
 struct rte_eth_rxseg_split {
 	struct rte_mempool *mp; /**< Memory pool to allocate segment from. */
 	uint16_t length; /**< Segment data length, configures split point. */
 	uint16_t offset; /**< Data offset from beginning of mbuf data buffer. */
-	uint32_t reserved; /**< Reserved field. */
+	uint16_t proto; /**< header protocol type, configures header split point. */
+	uint16_t reserved; /**< Reserved field. */
 };
 
 /**
@@ -1212,7 +1226,7 @@ struct rte_eth_rxseg_split {
  * A common structure used to describe Rx packet segment properties.
  */
 union rte_eth_rxseg {
-	/* The settings for buffer split offload. */
+	/* The settings for buffer split and header split offload. */
 	struct rte_eth_rxseg_split split;
 	/* The other features settings should be added here. */
 };
@@ -1664,6 +1678,31 @@ struct rte_eth_conf {
 			     RTE_ETH_RX_OFFLOAD_QINQ_STRIP)
 #define DEV_RX_OFFLOAD_VLAN RTE_DEPRECATED(DEV_RX_OFFLOAD_VLAN) RTE_ETH_RX_OFFLOAD_VLAN
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this enumeration may change without prior notice.
+ * This enum indicates the header split protocol type.
+ */
+enum rte_eth_rx_header_split_protocol_type {
+	RTE_ETH_RX_HEADER_SPLIT_NONE = 0,
+	RTE_ETH_RX_HEADER_SPLIT_MAC,
+	RTE_ETH_RX_HEADER_SPLIT_IPV4,
+	RTE_ETH_RX_HEADER_SPLIT_IPV6,
+	RTE_ETH_RX_HEADER_SPLIT_L3,
+	RTE_ETH_RX_HEADER_SPLIT_TCP,
+	RTE_ETH_RX_HEADER_SPLIT_UDP,
+	RTE_ETH_RX_HEADER_SPLIT_SCTP,
+	RTE_ETH_RX_HEADER_SPLIT_L4,
+	RTE_ETH_RX_HEADER_SPLIT_INNER_MAC,
+	RTE_ETH_RX_HEADER_SPLIT_INNER_IPV4,
+	RTE_ETH_RX_HEADER_SPLIT_INNER_IPV6,
+	RTE_ETH_RX_HEADER_SPLIT_INNER_L3,
+	RTE_ETH_RX_HEADER_SPLIT_INNER_TCP,
+	RTE_ETH_RX_HEADER_SPLIT_INNER_UDP,
+	RTE_ETH_RX_HEADER_SPLIT_INNER_SCTP,
+	RTE_ETH_RX_HEADER_SPLIT_INNER_L4,
+};
+
 /*
  * If new Rx offload capabilities are defined, they also must be
  * mentioned in rte_rx_offload_names in rte_ethdev.c file.
-- 
2.17.1



* [RFC,v2 2/3] app/testpmd: add header split configuration
  2022-03-22  3:56 ` [RFC,v2 0/3] " xuan.ding
  2022-03-22  3:56   ` [RFC,v2 1/3] " xuan.ding
@ 2022-03-22  3:56   ` xuan.ding
  2022-03-22  3:56   ` [RFC,v2 3/3] net/ice: support header split in Rx data path xuan.ding
  2 siblings, 0 replies; 88+ messages in thread
From: xuan.ding @ 2022-03-22  3:56 UTC (permalink / raw)
  To: thomas, ferruh.yigit, andrew.rybchenko
  Cc: dev, stephen, mb, viacheslavo, qi.z.zhang, ping.yu, wenxuanx.wu,
	Xuan Ding, Yuan Wang

From: Xuan Ding <xuan.ding@intel.com>

This patch adds header split configuration in testpmd. The header split
feature is off by default. To enable header split, you need to:
1. Configure the Rx queue with the header_split rx_offload enabled.
2. Set the protocol type of header split.

Command to set the header split protocol type:
testpmd> port config <port_id> header_split mac|ipv4|ipv6|l3|tcp|udp|sctp|
		    l4|inner_mac|inner_ipv4|inner_ipv6|inner_l3|inner_tcp|
		    inner_udp|inner_sctp|inner_l4
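
For illustration, a possible session enabling UDP header split on port 0
could look like this (assuming the port reports the header_split Rx
offload capability and the split mempools are already configured):

    testpmd> port stop 0
    testpmd> port config 0 rx_offload header_split on
    testpmd> port config 0 header_split udp
    testpmd> port start 0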

Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
---
 app/test-pmd/cmdline.c | 117 +++++++++++++++++++++++++++++++++++++++++
 app/test-pmd/testpmd.c |   6 ++-
 app/test-pmd/testpmd.h |   2 +
 3 files changed, 124 insertions(+), 1 deletion(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index b4ba8da2b0..73257ddb38 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -866,6 +866,12 @@ static void cmd_help_long_parsed(void *parsed_result,
 			"     Enable or disable a per port Rx offloading"
 			" on all Rx queues of a port\n\n"
 
+			"port config <port_id> header_split mac|ipv4|ipv6|l3|tcp|udp|sctp|l4|"
+			"inner_mac|inner_ipv4|inner_ipv6|inner_l3|inner_tcp|"
+			"inner_udp|inner_sctp|inner_l4\n"
+			"     Configure protocol for header split"
+			" on all Rx queues of a port\n\n"
+
 			"port (port_id) rxq (queue_id) rx_offload vlan_strip|"
 			"ipv4_cksum|udp_cksum|tcp_cksum|tcp_lro|qinq_strip|"
 			"outer_ipv4_cksum|macsec_strip|header_split|"
@@ -16352,6 +16358,116 @@ cmdline_parse_inst_t cmd_config_per_port_rx_offload = {
 	}
 };
 
+/* config a per port header split protocol */
+struct cmd_config_per_port_headersplit_protocol_result {
+	cmdline_fixed_string_t port;
+	cmdline_fixed_string_t config;
+	uint16_t port_id;
+	cmdline_fixed_string_t headersplit;
+	cmdline_fixed_string_t protocol;
+};
+
+cmdline_parse_token_string_t cmd_config_per_port_headersplit_protocol_result_port =
+	TOKEN_STRING_INITIALIZER
+		(struct cmd_config_per_port_headersplit_protocol_result,
+		 port, "port");
+cmdline_parse_token_string_t cmd_config_per_port_headersplit_protocol_result_config =
+	TOKEN_STRING_INITIALIZER
+		(struct cmd_config_per_port_headersplit_protocol_result,
+		 config, "config");
+cmdline_parse_token_num_t cmd_config_per_port_headersplit_protocol_result_port_id =
+	TOKEN_NUM_INITIALIZER
+		(struct cmd_config_per_port_headersplit_protocol_result,
+		 port_id, RTE_UINT16);
+cmdline_parse_token_string_t cmd_config_per_port_headersplit_protocol_result_headersplit =
+	TOKEN_STRING_INITIALIZER
+		(struct cmd_config_per_port_headersplit_protocol_result,
+		 headersplit, "header_split");
+cmdline_parse_token_string_t cmd_config_per_port_headersplit_protocol_result_protocol =
+	TOKEN_STRING_INITIALIZER
+		(struct cmd_config_per_port_headersplit_protocol_result,
+		 protocol, "mac#ipv4#ipv6#l3#tcp#udp#sctp#l4#"
+			   "inner_mac#inner_ipv4#inner_ipv6#inner_l3#inner_tcp#"
+			   "inner_udp#inner_sctp#inner_l4");
+
+static void
+cmd_config_per_port_headersplit_protocol_parsed(void *parsed_result,
+				__rte_unused struct cmdline *cl,
+				__rte_unused void *data)
+{
+	struct cmd_config_per_port_headersplit_protocol_result *res = parsed_result;
+	portid_t port_id = res->port_id;
+	struct rte_port *port = &ports[port_id];
+	uint16_t protocol;
+
+	if (port_id_is_invalid(port_id, ENABLED_WARN))
+		return;
+
+	if (port->port_status != RTE_PORT_STOPPED) {
+		fprintf(stderr,
+			"Error: Can't config offload when Port %d is not stopped\n",
+			port_id);
+		return;
+	}
+
+	if (!strcmp(res->protocol, "mac"))
+		protocol = RTE_ETH_RX_HEADER_SPLIT_MAC;
+	else if (!strcmp(res->protocol, "ipv4"))
+		protocol = RTE_ETH_RX_HEADER_SPLIT_IPV4;
+	else if (!strcmp(res->protocol, "ipv6"))
+		protocol = RTE_ETH_RX_HEADER_SPLIT_IPV6;
+	else if (!strcmp(res->protocol, "l3"))
+		protocol = RTE_ETH_RX_HEADER_SPLIT_L3;
+	else if (!strcmp(res->protocol, "tcp"))
+		protocol = RTE_ETH_RX_HEADER_SPLIT_TCP;
+	else if (!strcmp(res->protocol, "udp"))
+		protocol = RTE_ETH_RX_HEADER_SPLIT_UDP;
+	else if (!strcmp(res->protocol, "sctp"))
+		protocol = RTE_ETH_RX_HEADER_SPLIT_SCTP;
+	else if (!strcmp(res->protocol, "l4"))
+		protocol = RTE_ETH_RX_HEADER_SPLIT_L4;
+	else if (!strcmp(res->protocol, "inner_mac"))
+		protocol = RTE_ETH_RX_HEADER_SPLIT_INNER_MAC;
+	else if (!strcmp(res->protocol, "inner_ipv4"))
+		protocol = RTE_ETH_RX_HEADER_SPLIT_INNER_IPV4;
+	else if (!strcmp(res->protocol, "inner_ipv6"))
+		protocol = RTE_ETH_RX_HEADER_SPLIT_INNER_IPV6;
+	else if (!strcmp(res->protocol, "inner_l3"))
+		protocol = RTE_ETH_RX_HEADER_SPLIT_INNER_L3;
+	else if (!strcmp(res->protocol, "inner_tcp"))
+		protocol = RTE_ETH_RX_HEADER_SPLIT_INNER_TCP;
+	else if (!strcmp(res->protocol, "inner_udp"))
+		protocol = RTE_ETH_RX_HEADER_SPLIT_INNER_UDP;
+	else if (!strcmp(res->protocol, "inner_sctp"))
+		protocol = RTE_ETH_RX_HEADER_SPLIT_INNER_SCTP;
+	else if (!strcmp(res->protocol, "inner_l4"))
+		protocol = RTE_ETH_RX_HEADER_SPLIT_INNER_L4;
+	else {
+		fprintf(stderr, "Unknown protocol name: %s\n", res->protocol);
+		return;
+	}
+
+	rx_pkt_header_split_proto = protocol;
+
+	cmd_reconfig_device_queue(port_id, 1, 1);
+}
+
+cmdline_parse_inst_t cmd_config_per_port_headersplit_protocol = {
+	.f = cmd_config_per_port_headersplit_protocol_parsed,
+	.data = NULL,
+	.help_str = "port config <port_id> header_split mac|ipv4|ipv6|l3|tcp|udp|sctp|l4|"
+		    "inner_mac|inner_ipv4|inner_ipv6|inner_l3|inner_tcp|"
+		    "inner_udp|inner_sctp|inner_l4",
+	.tokens = {
+		(void *)&cmd_config_per_port_headersplit_protocol_result_port,
+		(void *)&cmd_config_per_port_headersplit_protocol_result_config,
+		(void *)&cmd_config_per_port_headersplit_protocol_result_port_id,
+		(void *)&cmd_config_per_port_headersplit_protocol_result_headersplit,
+		(void *)&cmd_config_per_port_headersplit_protocol_result_protocol,
+		NULL,
+	}
+};
+
 /* Enable/Disable a per queue offloading */
 struct cmd_config_per_queue_rx_offload_result {
 	cmdline_fixed_string_t port;
@@ -18070,6 +18186,7 @@ cmdline_parse_ctx_t main_ctx[] = {
 	(cmdline_parse_inst_t *)&cmd_rx_offload_get_capa,
 	(cmdline_parse_inst_t *)&cmd_rx_offload_get_configuration,
 	(cmdline_parse_inst_t *)&cmd_config_per_port_rx_offload,
+	(cmdline_parse_inst_t *)&cmd_config_per_port_headersplit_protocol,
 	(cmdline_parse_inst_t *)&cmd_config_per_queue_rx_offload,
 	(cmdline_parse_inst_t *)&cmd_tx_offload_get_capa,
 	(cmdline_parse_inst_t *)&cmd_tx_offload_get_configuration,
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 6d2e52c790..4aba8e4ac4 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -253,6 +253,8 @@ uint8_t  tx_pkt_nb_segs = 1; /**< Number of segments in TXONLY packets */
 enum tx_pkt_split tx_pkt_split = TX_PKT_SPLIT_OFF;
 /**< Split policy for packets to TX. */
 
+uint8_t rx_pkt_header_split_proto;
+
 uint8_t txonly_multi_flow;
 /**< Whether multiple flows are generated in TXONLY mode. */
 
@@ -2568,7 +2570,8 @@ rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
 	int ret;
 
 	if (rx_pkt_nb_segs <= 1 ||
-	    (rx_conf->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) == 0) {
+	    (((rx_conf->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) == 0) &&
+	     ((rx_conf->offloads & RTE_ETH_RX_OFFLOAD_HEADER_SPLIT) == 0))) {
 		rx_conf->rx_seg = NULL;
 		rx_conf->rx_nseg = 0;
 		ret = rte_eth_rx_queue_setup(port_id, rx_queue_id,
@@ -2592,6 +2595,7 @@ rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
 		rx_seg->offset = i < rx_pkt_nb_offs ?
 				   rx_pkt_seg_offsets[i] : 0;
 		rx_seg->mp = mpx ? mpx : mp;
+		rx_seg->proto = rx_pkt_header_split_proto;
 	}
 	rx_conf->rx_nseg = rx_pkt_nb_segs;
 	rx_conf->rx_seg = rx_useg;
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index 9967825044..a9681372a4 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -531,6 +531,8 @@ enum tx_pkt_split {
 
 extern enum tx_pkt_split tx_pkt_split;
 
+extern uint8_t rx_pkt_header_split_proto;
+
 extern uint8_t txonly_multi_flow;
 
 extern uint32_t rxq_share;
-- 
2.17.1



* [RFC,v2 3/3] net/ice: support header split in Rx data path
  2022-03-22  3:56 ` [RFC,v2 0/3] " xuan.ding
  2022-03-22  3:56   ` [RFC,v2 1/3] " xuan.ding
  2022-03-22  3:56   ` [RFC,v2 2/3] app/testpmd: add header split configuration xuan.ding
@ 2022-03-22  3:56   ` xuan.ding
  2 siblings, 0 replies; 88+ messages in thread
From: xuan.ding @ 2022-03-22  3:56 UTC (permalink / raw)
  To: thomas, ferruh.yigit, andrew.rybchenko
  Cc: dev, stephen, mb, viacheslavo, qi.z.zhang, ping.yu, wenxuanx.wu,
	Xuan Ding, Yuan Wang

From: Xuan Ding <xuan.ding@intel.com>

This patch adds support for header split in the normal Rx data paths.
When the Rx queue is configured with header split for a specific
protocol type, received packets will be split directly into
header and payload parts, and the two parts will be put into
different mempools.

Currently, header split is not supported in vectorized paths.
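
Conceptually, each Rx descriptor is now armed with two buffers instead
of one. A simplified sketch of the refill logic from the patch below
(error handling omitted):

    /* header mbuf from the queue's first (header) mempool,
     * payload mbuf from the second (payload) mempool
     */
    struct rte_mbuf *hdr = rte_mbuf_raw_alloc(rxq->mp);
    struct rte_mbuf *pay = rte_mbuf_raw_alloc(rxq->rxseg[1].mp);

    /* pre-link the chain; completion only fixes up the lengths */
    hdr->next = pay;
    rxd->read.hdr_addr = rte_cpu_to_le_64(rte_mbuf_data_iova_default(hdr));
    rxd->read.pkt_addr = rte_cpu_to_le_64(rte_mbuf_data_iova_default(pay));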

Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
---
 drivers/net/ice/ice_ethdev.c          |  10 +-
 drivers/net/ice/ice_rxtx.c            | 220 ++++++++++++++++++++++----
 drivers/net/ice/ice_rxtx.h            |  16 ++
 drivers/net/ice/ice_rxtx_vec_common.h |   3 +
 4 files changed, 218 insertions(+), 31 deletions(-)

diff --git a/drivers/net/ice/ice_ethdev.c b/drivers/net/ice/ice_ethdev.c
index 1a469afeac..c9762d810d 100644
--- a/drivers/net/ice/ice_ethdev.c
+++ b/drivers/net/ice/ice_ethdev.c
@@ -3707,7 +3707,8 @@ ice_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 			RTE_ETH_RX_OFFLOAD_OUTER_IPV4_CKSUM |
 			RTE_ETH_RX_OFFLOAD_VLAN_EXTEND |
 			RTE_ETH_RX_OFFLOAD_RSS_HASH |
-			RTE_ETH_RX_OFFLOAD_TIMESTAMP;
+			RTE_ETH_RX_OFFLOAD_TIMESTAMP |
+			RTE_ETH_RX_OFFLOAD_HEADER_SPLIT;
 		dev_info->tx_offload_capa |=
 			RTE_ETH_TX_OFFLOAD_QINQ_INSERT |
 			RTE_ETH_TX_OFFLOAD_IPV4_CKSUM |
@@ -3719,7 +3720,7 @@ ice_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 		dev_info->flow_type_rss_offloads |= ICE_RSS_OFFLOAD_ALL;
 	}
 
-	dev_info->rx_queue_offload_capa = 0;
+	dev_info->rx_queue_offload_capa = RTE_ETH_RX_OFFLOAD_HEADER_SPLIT;
 	dev_info->tx_queue_offload_capa = RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE;
 
 	dev_info->reta_size = pf->hash_lut_size;
@@ -3788,6 +3789,11 @@ ice_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 	dev_info->default_rxportconf.ring_size = ICE_BUF_SIZE_MIN;
 	dev_info->default_txportconf.ring_size = ICE_BUF_SIZE_MIN;
 
+	dev_info->rx_seg_capa.max_nseg = ICE_RX_MAX_NSEG;
+	dev_info->rx_seg_capa.multi_pools = 1;
+	dev_info->rx_seg_capa.offset_allowed = 0;
+	dev_info->rx_seg_capa.offset_align_log2 = 0;
+
 	return 0;
 }
 
diff --git a/drivers/net/ice/ice_rxtx.c b/drivers/net/ice/ice_rxtx.c
index 4f218bcd0d..fbc88c7473 100644
--- a/drivers/net/ice/ice_rxtx.c
+++ b/drivers/net/ice/ice_rxtx.c
@@ -282,7 +282,6 @@ ice_program_hw_rx_queue(struct ice_rx_queue *rxq)
 	/* Set buffer size as the head split is disabled. */
 	buf_size = (uint16_t)(rte_pktmbuf_data_room_size(rxq->mp) -
 			      RTE_PKTMBUF_HEADROOM);
-	rxq->rx_hdr_len = 0;
 	rxq->rx_buf_len = RTE_ALIGN(buf_size, (1 << ICE_RLAN_CTX_DBUF_S));
 	rxq->max_pkt_len =
 		RTE_MIN((uint32_t)ICE_SUPPORT_CHAIN_NUM * rxq->rx_buf_len,
@@ -311,11 +310,51 @@ ice_program_hw_rx_queue(struct ice_rx_queue *rxq)
 
 	memset(&rx_ctx, 0, sizeof(rx_ctx));
 
+	if (rxq->offloads & RTE_ETH_RX_OFFLOAD_HEADER_SPLIT) {
+		switch (rxq->rxseg[0].proto) {
+		case RTE_ETH_RX_HEADER_SPLIT_MAC:
+			rx_ctx.dtype = ICE_RX_DTYPE_HEADER_SPLIT;
+			rx_ctx.hsplit_1 = ICE_RLAN_RX_HSPLIT_1_SPLIT_L2;
+			break;
+		case RTE_ETH_RX_HEADER_SPLIT_INNER_MAC:
+			rx_ctx.dtype = ICE_RX_DTYPE_HEADER_SPLIT;
+			rx_ctx.hsplit_0 = ICE_RLAN_RX_HSPLIT_0_SPLIT_L2;
+			break;
+		case RTE_ETH_RX_HEADER_SPLIT_INNER_IPV4:
+		case RTE_ETH_RX_HEADER_SPLIT_INNER_IPV6:
+		case RTE_ETH_RX_HEADER_SPLIT_INNER_L3:
+			rx_ctx.dtype = ICE_RX_DTYPE_HEADER_SPLIT;
+			rx_ctx.hsplit_0 = ICE_RLAN_RX_HSPLIT_0_SPLIT_IP;
+			break;
+		case RTE_ETH_RX_HEADER_SPLIT_TCP:
+		case RTE_ETH_RX_HEADER_SPLIT_UDP:
+		case RTE_ETH_RX_HEADER_SPLIT_INNER_TCP:
+		case RTE_ETH_RX_HEADER_SPLIT_INNER_UDP:
+			rx_ctx.dtype = ICE_RX_DTYPE_HEADER_SPLIT;
+			rx_ctx.hsplit_0 = ICE_RLAN_RX_HSPLIT_0_SPLIT_TCP_UDP;
+			break;
+		case RTE_ETH_RX_HEADER_SPLIT_SCTP:
+		case RTE_ETH_RX_HEADER_SPLIT_INNER_SCTP:
+			rx_ctx.dtype = ICE_RX_DTYPE_HEADER_SPLIT;
+			rx_ctx.hsplit_0 = ICE_RLAN_RX_HSPLIT_0_SPLIT_SCTP;
+			break;
+		case RTE_ETH_RX_HEADER_SPLIT_NONE:
+			PMD_DRV_LOG(ERR, "Header split protocol must be configured");
+			return -EINVAL;
+		default:
+			PMD_DRV_LOG(ERR, "Header split protocol is not supported");
+			return -EINVAL;
+		}
+		rxq->rx_hdr_len = ICE_RX_HDR_BUF_SIZE;
+	} else {
+		rxq->rx_hdr_len = 0;
+		rx_ctx.dtype = 0; /* No Header Split mode */
+	}
+
 	rx_ctx.base = rxq->rx_ring_dma / ICE_QUEUE_BASE_ADDR_UNIT;
 	rx_ctx.qlen = rxq->nb_rx_desc;
 	rx_ctx.dbuf = rxq->rx_buf_len >> ICE_RLAN_CTX_DBUF_S;
 	rx_ctx.hbuf = rxq->rx_hdr_len >> ICE_RLAN_CTX_HBUF_S;
-	rx_ctx.dtype = 0; /* No Header Split mode */
 #ifndef RTE_LIBRTE_ICE_16BYTE_RX_DESC
 	rx_ctx.dsize = 1; /* 32B descriptors */
 #endif
@@ -401,6 +440,7 @@ ice_alloc_rx_queue_mbufs(struct ice_rx_queue *rxq)
 
 	for (i = 0; i < rxq->nb_rx_desc; i++) {
 		volatile union ice_rx_flex_desc *rxd;
+		rxd = &rxq->rx_ring[i];
 		struct rte_mbuf *mbuf = rte_mbuf_raw_alloc(rxq->mp);
 
 		if (unlikely(!mbuf)) {
@@ -408,8 +448,6 @@ ice_alloc_rx_queue_mbufs(struct ice_rx_queue *rxq)
 			return -ENOMEM;
 		}
 
-		rte_mbuf_refcnt_set(mbuf, 1);
-		mbuf->next = NULL;
 		mbuf->data_off = RTE_PKTMBUF_HEADROOM;
 		mbuf->nb_segs = 1;
 		mbuf->port = rxq->port_id;
@@ -417,9 +455,32 @@ ice_alloc_rx_queue_mbufs(struct ice_rx_queue *rxq)
 		dma_addr =
 			rte_cpu_to_le_64(rte_mbuf_data_iova_default(mbuf));
 
-		rxd = &rxq->rx_ring[i];
-		rxd->read.pkt_addr = dma_addr;
-		rxd->read.hdr_addr = 0;
+		if (rxq->offloads & RTE_ETH_RX_OFFLOAD_HEADER_SPLIT) {
+			struct rte_mbuf *mbuf_pay;
+			mbuf_pay = rte_mbuf_raw_alloc(rxq->rxseg[1].mp);
+			if (unlikely(!mbuf_pay)) {
+				PMD_DRV_LOG(ERR, "Failed to allocate payload mbuf for RX");
+				return -ENOMEM;
+			}
+
+			mbuf_pay->next = NULL;
+			mbuf_pay->data_off = RTE_PKTMBUF_HEADROOM;
+			mbuf_pay->nb_segs = 1;
+			mbuf_pay->port = rxq->port_id;
+			mbuf->next = mbuf_pay;
+
+			rxd->read.hdr_addr = dma_addr;
+			/* The LS bit should be set to zero regardless of
+			 * header split enablement.
+			 */
+			rxd->read.pkt_addr = rte_cpu_to_le_64(rte_mbuf_data_iova_default(mbuf_pay));
+		} else {
+			rte_mbuf_refcnt_set(mbuf, 1);
+			mbuf->next = NULL;
+			rxd->read.hdr_addr = 0;
+			rxd->read.pkt_addr = dma_addr;
+		}
+
 #ifndef RTE_LIBRTE_ICE_16BYTE_RX_DESC
 		rxd->read.rsvd1 = 0;
 		rxd->read.rsvd2 = 0;
@@ -443,14 +504,14 @@ _ice_rx_queue_release_mbufs(struct ice_rx_queue *rxq)
 
 	for (i = 0; i < rxq->nb_rx_desc; i++) {
 		if (rxq->sw_ring[i].mbuf) {
-			rte_pktmbuf_free_seg(rxq->sw_ring[i].mbuf);
+			rte_pktmbuf_free(rxq->sw_ring[i].mbuf);
 			rxq->sw_ring[i].mbuf = NULL;
 		}
 	}
 	if (rxq->rx_nb_avail == 0)
 		return;
 	for (i = 0; i < rxq->rx_nb_avail; i++)
-		rte_pktmbuf_free_seg(rxq->rx_stage[rxq->rx_next_avail + i]);
+		rte_pktmbuf_free(rxq->rx_stage[rxq->rx_next_avail + i]);
 
 	rxq->rx_nb_avail = 0;
 }
@@ -1076,6 +1137,7 @@ ice_rx_queue_setup(struct rte_eth_dev *dev,
 	uint16_t len;
 	int use_def_burst_func = 1;
 	uint64_t offloads;
+	uint16_t n_seg = rx_conf->rx_nseg;
 
 	if (nb_desc % ICE_ALIGN_RING_DESC != 0 ||
 	    nb_desc > ICE_MAX_RING_DESC ||
@@ -1087,6 +1149,22 @@ ice_rx_queue_setup(struct rte_eth_dev *dev,
 
 	offloads = rx_conf->offloads | dev->data->dev_conf.rxmode.offloads;
 
+	if (mp)
+		n_seg = 1;
+
+	if (n_seg > 1) {
+		if (!(offloads & RTE_ETH_RX_OFFLOAD_HEADER_SPLIT)) {
+			PMD_INIT_LOG(ERR, "port %u queue index %u split offload not configured",
+					dev->data->port_id, queue_idx);
+			return -EINVAL;
+		}
+		if (n_seg > ICE_RX_MAX_NSEG) {
+			PMD_INIT_LOG(ERR, "port %u queue index %u split seg exceed maximum",
+					dev->data->port_id, queue_idx);
+			return -EINVAL;
+		}
+	}
+
 	/* Free memory if needed */
 	if (dev->data->rx_queues[queue_idx]) {
 		ice_rx_queue_release(dev->data->rx_queues[queue_idx]);
@@ -1098,12 +1176,22 @@ ice_rx_queue_setup(struct rte_eth_dev *dev,
 				 sizeof(struct ice_rx_queue),
 				 RTE_CACHE_LINE_SIZE,
 				 socket_id);
+
 	if (!rxq) {
 		PMD_INIT_LOG(ERR, "Failed to allocate memory for "
 			     "rx queue data structure");
 		return -ENOMEM;
 	}
-	rxq->mp = mp;
+
+	rxq->rxseg_nb = n_seg;
+	if (n_seg > 1) {
+		rte_memcpy(rxq->rxseg, rx_conf->rx_seg,
+			sizeof(struct rte_eth_rxseg_split) * n_seg);
+		rxq->mp = rxq->rxseg[0].mp;
+	} else {
+		rxq->mp = mp;
+	}
+
 	rxq->nb_rx_desc = nb_desc;
 	rxq->rx_free_thresh = rx_conf->rx_free_thresh;
 	rxq->queue_id = queue_idx;
@@ -1568,7 +1656,7 @@ ice_rx_scan_hw_ring(struct ice_rx_queue *rxq)
 	struct ice_rx_entry *rxep;
 	struct rte_mbuf *mb;
 	uint16_t stat_err0;
-	uint16_t pkt_len;
+	uint16_t pkt_len, hdr_len;
 	int32_t s[ICE_LOOK_AHEAD], nb_dd;
 	int32_t i, j, nb_rx = 0;
 	uint64_t pkt_flags = 0;
@@ -1616,6 +1704,24 @@ ice_rx_scan_hw_ring(struct ice_rx_queue *rxq)
 				   ICE_RX_FLX_DESC_PKT_LEN_M) - rxq->crc_len;
 			mb->data_len = pkt_len;
 			mb->pkt_len = pkt_len;
+
+			if (rxq->offloads & RTE_ETH_RX_OFFLOAD_HEADER_SPLIT) {
+				mb->nb_segs = (uint16_t)(mb->nb_segs + mb->next->nb_segs);
+				mb->next->next = NULL;
+				hdr_len = rte_le_to_cpu_16(rxdp[j].wb.hdr_len_sph_flex_flags1) &
+						ICE_RX_FLEX_DESC_HEADER_LEN_M;
+				pkt_len = (rte_le_to_cpu_16(rxdp[j].wb.pkt_len) &
+					ICE_RX_FLX_DESC_PKT_LEN_M) - rxq->crc_len;
+				mb->data_len = hdr_len;
+				mb->pkt_len = hdr_len + pkt_len;
+				mb->next->data_len = pkt_len;
+			} else {
+				pkt_len = (rte_le_to_cpu_16(rxdp[j].wb.pkt_len) &
+					ICE_RX_FLX_DESC_PKT_LEN_M) - rxq->crc_len;
+				mb->data_len = pkt_len;
+				mb->pkt_len = pkt_len;
+			}
+
 			mb->ol_flags = 0;
 			stat_err0 = rte_le_to_cpu_16(rxdp[j].wb.status_error0);
 			pkt_flags = ice_rxd_error_to_pkt_flags(stat_err0);
@@ -1695,7 +1801,9 @@ ice_rx_alloc_bufs(struct ice_rx_queue *rxq)
 	struct rte_mbuf *mb;
 	uint16_t alloc_idx, i;
 	uint64_t dma_addr;
-	int diag;
+	int diag, diag_pay;
+	uint64_t pay_addr;
+	struct rte_mbuf *mbufs_pay[rxq->rx_free_thresh];
 
 	/* Allocate buffers in bulk */
 	alloc_idx = (uint16_t)(rxq->rx_free_trigger -
@@ -1708,6 +1816,15 @@ ice_rx_alloc_bufs(struct ice_rx_queue *rxq)
 		return -ENOMEM;
 	}
 
+	if (rxq->offloads & RTE_ETH_RX_OFFLOAD_HEADER_SPLIT) {
+		diag_pay = rte_mempool_get_bulk(rxq->rxseg[1].mp,
+				(void *)mbufs_pay, rxq->rx_free_thresh);
+		if (unlikely(diag_pay != 0)) {
+			PMD_RX_LOG(ERR, "Failed to get payload mbufs in bulk");
+			return -ENOMEM;
+		}
+	}
+
 	rxdp = &rxq->rx_ring[alloc_idx];
 	for (i = 0; i < rxq->rx_free_thresh; i++) {
 		if (likely(i < (rxq->rx_free_thresh - 1)))
@@ -1716,13 +1833,21 @@ ice_rx_alloc_bufs(struct ice_rx_queue *rxq)
 
 		mb = rxep[i].mbuf;
 		rte_mbuf_refcnt_set(mb, 1);
-		mb->next = NULL;
 		mb->data_off = RTE_PKTMBUF_HEADROOM;
 		mb->nb_segs = 1;
 		mb->port = rxq->port_id;
 		dma_addr = rte_cpu_to_le_64(rte_mbuf_data_iova_default(mb));
-		rxdp[i].read.hdr_addr = 0;
-		rxdp[i].read.pkt_addr = dma_addr;
+
+		if (rxq->offloads & RTE_ETH_RX_OFFLOAD_HEADER_SPLIT) {
+			mb->next = mbufs_pay[i];
+			pay_addr = rte_mbuf_data_iova_default(mbufs_pay[i]);
+			rxdp[i].read.hdr_addr = dma_addr;
+			rxdp[i].read.pkt_addr = rte_cpu_to_le_64(pay_addr);
+		} else {
+			mb->next = NULL;
+			rxdp[i].read.hdr_addr = 0;
+			rxdp[i].read.pkt_addr = dma_addr;
+		}
 	}
 
 	/* Update Rx tail register */
@@ -2315,11 +2440,13 @@ ice_recv_pkts(void *rx_queue,
 	struct ice_rx_entry *sw_ring = rxq->sw_ring;
 	struct ice_rx_entry *rxe;
 	struct rte_mbuf *nmb; /* new allocated mbuf */
+	struct rte_mbuf *nmb_pay; /* new allocated payload mbuf */
 	struct rte_mbuf *rxm; /* pointer to store old mbuf in SW ring */
 	uint16_t rx_id = rxq->rx_tail;
 	uint16_t nb_rx = 0;
 	uint16_t nb_hold = 0;
 	uint16_t rx_packet_len;
+	uint16_t rx_header_len;
 	uint16_t rx_stat_err0;
 	uint64_t dma_addr;
 	uint64_t pkt_flags;
@@ -2342,12 +2469,16 @@ ice_recv_pkts(void *rx_queue,
 		if (!(rx_stat_err0 & (1 << ICE_RX_FLEX_DESC_STATUS0_DD_S)))
 			break;
 
-		/* allocate mbuf */
+		if (rx_stat_err0 & (1 << ICE_RX_FLEX_DESC_STATUS0_HBO_S))
+			break;
+
+		/* allocate header mbuf */
 		nmb = rte_mbuf_raw_alloc(rxq->mp);
 		if (unlikely(!nmb)) {
 			rxq->vsi->adapter->pf.dev_data->rx_mbuf_alloc_failed++;
 			break;
 		}
+
 		rxd = *rxdp; /* copy descriptor in ring to temp variable*/
 
 		nb_hold++;
@@ -2360,24 +2491,55 @@ ice_recv_pkts(void *rx_queue,
 		dma_addr =
 			rte_cpu_to_le_64(rte_mbuf_data_iova_default(nmb));
 
-		/**
-		 * fill the read format of descriptor with physic address in
-		 * new allocated mbuf: nmb
-		 */
-		rxdp->read.hdr_addr = 0;
-		rxdp->read.pkt_addr = dma_addr;
+		if (rxq->offloads & RTE_ETH_RX_OFFLOAD_HEADER_SPLIT) {
+			/* allocate payload mbuf */
+			nmb_pay = rte_mbuf_raw_alloc(rxq->rxseg[1].mp);
+			if (unlikely(!nmb_pay)) {
+				rxq->vsi->adapter->pf.dev_data->rx_mbuf_alloc_failed++;
+				break;
+			}
+
+			nmb->next = nmb_pay;
+			nmb_pay->next = NULL;
 
-		/* calculate rx_packet_len of the received pkt */
-		rx_packet_len = (rte_le_to_cpu_16(rxd.wb.pkt_len) &
-				 ICE_RX_FLX_DESC_PKT_LEN_M) - rxq->crc_len;
+			/**
+			 * fill the read format of descriptor with physical address in
+			 * newly allocated mbuf: nmb
+			 */
+			rxdp->read.hdr_addr = dma_addr;
+			rxdp->read.pkt_addr = rte_cpu_to_le_64(rte_mbuf_data_iova_default(nmb_pay));
+		} else {
+			/**
+			 * fill the read format of descriptor with physical address in
+			 * newly allocated mbuf: nmb
+			 */
+			rxdp->read.hdr_addr = 0;
+			rxdp->read.pkt_addr = dma_addr;
+		}
 
 		/* fill old mbuf with received descriptor: rxd */
 		rxm->data_off = RTE_PKTMBUF_HEADROOM;
 		rte_prefetch0(RTE_PTR_ADD(rxm->buf_addr, RTE_PKTMBUF_HEADROOM));
-		rxm->nb_segs = 1;
-		rxm->next = NULL;
-		rxm->pkt_len = rx_packet_len;
-		rxm->data_len = rx_packet_len;
+		if (rxq->offloads & RTE_ETH_RX_OFFLOAD_HEADER_SPLIT) {
+			rxm->nb_segs = (uint16_t)(rxm->nb_segs + rxm->next->nb_segs);
+			rxm->next->next = NULL;
+			/* calculate rx_packet_len of the received pkt */
+			rx_header_len = rte_le_to_cpu_16(rxd.wb.hdr_len_sph_flex_flags1) &
+					ICE_RX_FLEX_DESC_HEADER_LEN_M;
+			rx_packet_len = (rte_le_to_cpu_16(rxd.wb.pkt_len) &
+					ICE_RX_FLX_DESC_PKT_LEN_M) - rxq->crc_len;
+			rxm->data_len = rx_header_len;
+			rxm->pkt_len = rx_header_len + rx_packet_len;
+			rxm->next->data_len = rx_packet_len;
+		} else {
+			rxm->nb_segs = 1;
+			rxm->next = NULL;
+			/* calculate rx_packet_len of the received pkt */
+			rx_packet_len = (rte_le_to_cpu_16(rxd.wb.pkt_len) &
+					ICE_RX_FLX_DESC_PKT_LEN_M) - rxq->crc_len;
+			rxm->data_len = rx_packet_len;
+			rxm->pkt_len = rx_packet_len;
+		}
 		rxm->port = rxq->port_id;
 		rxm->packet_type = ptype_tbl[ICE_RX_FLEX_DESC_PTYPE_M &
 			rte_le_to_cpu_16(rxd.wb.ptype_flex_flags0)];
diff --git a/drivers/net/ice/ice_rxtx.h b/drivers/net/ice/ice_rxtx.h
index bb18a01951..611dbc8503 100644
--- a/drivers/net/ice/ice_rxtx.h
+++ b/drivers/net/ice/ice_rxtx.h
@@ -16,6 +16,9 @@
 #define ICE_RX_MAX_BURST 32
 #define ICE_TX_MAX_BURST 32
 
+/* Maximal number of segments to split. */
+#define ICE_RX_MAX_NSEG 2
+
 #define ICE_CHK_Q_ENA_COUNT        100
 #define ICE_CHK_Q_ENA_INTERVAL_US  100
 
@@ -43,6 +46,11 @@
 extern uint64_t ice_timestamp_dynflag;
 extern int ice_timestamp_dynfield_offset;
 
+/* Max header size can be 2K - 64 bytes */
+#define ICE_RX_HDR_BUF_SIZE    (2048 - 64)
+
+#define ICE_HEADER_SPLIT_ENA   BIT(0)
+
 typedef void (*ice_rx_release_mbufs_t)(struct ice_rx_queue *rxq);
 typedef void (*ice_tx_release_mbufs_t)(struct ice_tx_queue *txq);
 typedef void (*ice_rxd_to_pkt_fields_t)(struct ice_rx_queue *rxq,
@@ -53,6 +61,12 @@ struct ice_rx_entry {
 	struct rte_mbuf *mbuf;
 };
 
+enum ice_rx_dtype {
+	ICE_RX_DTYPE_NO_SPLIT       = 0,
+	ICE_RX_DTYPE_HEADER_SPLIT   = 1,
+	ICE_RX_DTYPE_SPLIT_ALWAYS   = 2,
+};
+
 struct ice_rx_queue {
 	struct rte_mempool *mp; /* mbuf pool to populate RX ring */
 	volatile union ice_rx_flex_desc *rx_ring;/* RX ring virtual address */
@@ -95,6 +109,8 @@ struct ice_rx_queue {
 	uint32_t time_high;
 	uint32_t hw_register_set;
 	const struct rte_memzone *mz;
+	struct rte_eth_rxseg_split rxseg[ICE_RX_MAX_NSEG];
+	uint32_t rxseg_nb;
 };
 
 struct ice_tx_entry {
diff --git a/drivers/net/ice/ice_rxtx_vec_common.h b/drivers/net/ice/ice_rxtx_vec_common.h
index 8ff01046e1..db394ceca8 100644
--- a/drivers/net/ice/ice_rxtx_vec_common.h
+++ b/drivers/net/ice/ice_rxtx_vec_common.h
@@ -290,6 +290,9 @@ ice_rx_vec_queue_default(struct ice_rx_queue *rxq)
 	if (rxq->offloads & RTE_ETH_RX_OFFLOAD_TIMESTAMP)
 		return -1;
 
+	if (rxq->offloads & RTE_ETH_RX_OFFLOAD_HEADER_SPLIT)
+		return -1;
+
 	if (rxq->offloads & ICE_RX_VECTOR_OFFLOAD)
 		return ICE_VECTOR_OFFLOAD_PATH;
 
-- 
2.17.1


^ permalink raw reply	[flat|nested] 88+ messages in thread

* RE: [RFC,v2 1/3] ethdev: introduce protocol type based header split
  2022-03-22  3:56   ` [RFC,v2 1/3] " xuan.ding
@ 2022-03-22  7:14     ` Zhang, Qi Z
  2022-03-22  7:43       ` Ding, Xuan
  0 siblings, 1 reply; 88+ messages in thread
From: Zhang, Qi Z @ 2022-03-22  7:14 UTC (permalink / raw)
  To: Ding, Xuan, thomas, Yigit, Ferruh, andrew.rybchenko
  Cc: dev, stephen, mb, viacheslavo, Yu, Ping, Wu, WenxuanX, Wang, YuanX



> -----Original Message-----
> From: Ding, Xuan <xuan.ding@intel.com>
> Sent: Tuesday, March 22, 2022 11:56 AM
> To: thomas@monjalon.net; Yigit, Ferruh <ferruh.yigit@intel.com>;
> andrew.rybchenko@oktetlabs.ru
> Cc: dev@dpdk.org; stephen@networkplumber.org;
> mb@smartsharesystems.com; viacheslavo@nvidia.com; Zhang, Qi Z
> <qi.z.zhang@intel.com>; Yu, Ping <ping.yu@intel.com>; Wu, WenxuanX
> <wenxuanx.wu@intel.com>; Ding, Xuan <xuan.ding@intel.com>; Wang,
> YuanX <yuanx.wang@intel.com>
> Subject: [RFC,v2 1/3] ethdev: introduce protocol type based header split
> 
> From: Xuan Ding <xuan.ding@intel.com>
> 
> Header split consists of splitting a received packet into two separate regions
> based on the packet content. The split happens after the packet header and
> before the packet payload. Splitting is usually between the packet header
> that can be posted to a dedicated buffer and the packet payload that can be
> posted to a different buffer.
> 
> Currently, Rx buffer split supports length and offset based packet split.
> Although header split is a subset of buffer split, configure buffer split based
> on length and offset is not suitable for NICs that do split based on header
> protocol types. And tunneling makes the conversion from offset to protocol
> impossible.
> 
> This patch extends the current buffer split to support protocol based header
> split. A new proto field is introduced in the rte_eth_rxseg_split structure
> reserved field to specify header protocol type. With Rx offload flag
> RTE_ETH_RX_OFFLOAD_HEADER_SPLIT enabled and protocol type
> configured, PMD will split the ingress packets into two separate regions.
> Currently, both inner and outer L2/L3/L4 level header split can be supported.
> 
> For example, let's suppose we configured the Rx queue with the following
> segments:
>     seg0 - pool0
>     seg1 - pool1
> 
> With header split type configured with RTE_ETH_RX_HEADER_SPLIT_UDP,
> the packet consists of MAC_IP_UDP_PAYLOAD will be split like following:
>     seg0 - pool0, udp_header
>     seg1 - pool1, payload
> 
> The memory attributes for the split parts may differ either - for example the
> mempool0 and mempool1 belong to dpdk memory and external memory,
> respectively.
> 
> Signed-off-by: Xuan Ding <xuan.ding@intel.com>
> Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
> ---
>  lib/ethdev/rte_ethdev.c | 24 +++++++++++++----------
>  lib/ethdev/rte_ethdev.h | 43 +++++++++++++++++++++++++++++++++++++++--
>  2 files changed, 55 insertions(+), 12 deletions(-)
> 
> diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
> index 70c850a2f1..49c8fef1c3 100644
> --- a/lib/ethdev/rte_ethdev.c
> +++ b/lib/ethdev/rte_ethdev.c
> @@ -1661,6 +1661,7 @@ rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
>  		struct rte_mempool *mpl = rx_seg[seg_idx].mp;
>  		uint32_t length = rx_seg[seg_idx].length;
>  		uint32_t offset = rx_seg[seg_idx].offset;
> +		uint16_t proto = rx_seg[seg_idx].proto;
> 
>  		if (mpl == NULL) {
>  			RTE_ETHDEV_LOG(ERR, "null mempool pointer\n");
> @@ -1692,15 +1693,17 @@ rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
>  					(struct rte_pktmbuf_pool_private));
>  			return -ENOSPC;
>  		}
> -		offset += seg_idx != 0 ? 0 : RTE_PKTMBUF_HEADROOM;
> -		*mbp_buf_size = rte_pktmbuf_data_room_size(mpl);
> -		length = length != 0 ? length : *mbp_buf_size;
> -		if (*mbp_buf_size < length + offset) {
> -			RTE_ETHDEV_LOG(ERR,
> -				       "%s mbuf_data_room_size %u < %u (segment length=%u + segment offset=%u)\n",
> -				       mpl->name, *mbp_buf_size,
> -				       length + offset, length, offset);
> -			return -EINVAL;
> +		if (proto == 0) {
> +			offset += seg_idx != 0 ? 0 : RTE_PKTMBUF_HEADROOM;
> +			*mbp_buf_size = rte_pktmbuf_data_room_size(mpl);
> +			length = length != 0 ? length : *mbp_buf_size;
> +			if (*mbp_buf_size < length + offset) {
> +				RTE_ETHDEV_LOG(ERR,
> +					"%s mbuf_data_room_size %u < %u (segment length=%u + segment offset=%u)\n",
> +					mpl->name, *mbp_buf_size,
> +					length + offset, length, offset);
> +				return -EINVAL;
> +			}
>  		}

As length and proto are mutually exclusive, it is better to also check the length when proto != 0.
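
Something along these lines (untested sketch, mirroring the surrounding checks):

	} else {
		/* proto != 0: length must not be set for header split */
		if (length != 0) {
			RTE_ETHDEV_LOG(ERR,
				"segment length must be zero in header split\n");
			return -EINVAL;
		}
	}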

.....

> @@ -1197,12 +1197,26 @@ struct rte_eth_txmode {
>   *     - pool from the last valid element
>   *     - the buffer size from this pool
>   *     - zero offset
> + *
> + * Header split is a subset of buffer split. The split happens after the
> + * packet header and before the packet payload. For PMDs that do not
> + * support header split configuration by length and offset, the location
> + * of the split needs to be specified by the header protocol type. While
> + * for buffer split, this field should not be configured.
> + *
> + * If RTE_ETH_RX_OFFLOAD_HEADER_SPLIT flag is set in offloads field,
> + * the PMD will split the received packets into two separate regions:
> + * - The header buffer will be allocated from the memory pool,
> + *   specified in the first array element, the second buffer, from the
> + *   pool in the second element.
> + * - The length and offset do not need to be configured in header split.

We do not necessarily have to ignore the offset configuration for header split,
as there is no conflict: a driver can still support copying a split header to a
specific mbuf offset. And if we support offset with header split, an offset
boundary check can also be considered in rte_eth_rx_queue_check_split.
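
For instance, a rough sketch of such a check (assuming the offset field stays
meaningful when proto != 0):

	if (*mbp_buf_size < offset) {
		RTE_ETHDEV_LOG(ERR,
			"%s mbuf_data_room_size %u < segment offset %u\n",
			mpl->name, *mbp_buf_size, offset);
		return -EINVAL;
	}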

Regards
Qi



^ permalink raw reply	[flat|nested] 88+ messages in thread

* RE: [RFC,v2 1/3] ethdev: introduce protocol type based header split
  2022-03-22  7:14     ` Zhang, Qi Z
@ 2022-03-22  7:43       ` Ding, Xuan
  0 siblings, 0 replies; 88+ messages in thread
From: Ding, Xuan @ 2022-03-22  7:43 UTC (permalink / raw)
  To: Zhang, Qi Z, thomas, Yigit, Ferruh, andrew.rybchenko
  Cc: dev, stephen, mb, viacheslavo, Yu, Ping, Wu, WenxuanX, Wang, YuanX

Hi Qi,

> -----Original Message-----
> From: Zhang, Qi Z <qi.z.zhang@intel.com>
> Sent: Tuesday, March 22, 2022 3:14 PM
> To: Ding, Xuan <xuan.ding@intel.com>; thomas@monjalon.net; Yigit, Ferruh
> <ferruh.yigit@intel.com>; andrew.rybchenko@oktetlabs.ru
> Cc: dev@dpdk.org; stephen@networkplumber.org;
> mb@smartsharesystems.com; viacheslavo@nvidia.com; Yu, Ping
> <ping.yu@intel.com>; Wu, WenxuanX <wenxuanx.wu@intel.com>; Wang,
> YuanX <yuanx.wang@intel.com>
> Subject: RE: [RFC,v2 1/3] ethdev: introduce protocol type based header split
> 
> 
> 
> > -----Original Message-----
> > From: Ding, Xuan <xuan.ding@intel.com>
> > Sent: Tuesday, March 22, 2022 11:56 AM
> > To: thomas@monjalon.net; Yigit, Ferruh <ferruh.yigit@intel.com>;
> > andrew.rybchenko@oktetlabs.ru
> > Cc: dev@dpdk.org; stephen@networkplumber.org;
> > mb@smartsharesystems.com; viacheslavo@nvidia.com; Zhang, Qi Z
> > <qi.z.zhang@intel.com>; Yu, Ping <ping.yu@intel.com>; Wu, WenxuanX
> > <wenxuanx.wu@intel.com>; Ding, Xuan <xuan.ding@intel.com>; Wang,
> YuanX
> > <yuanx.wang@intel.com>
> > Subject: [RFC,v2 1/3] ethdev: introduce protocol type based header
> > split
> >
> > From: Xuan Ding <xuan.ding@intel.com>
> >
> > Header split consists of splitting a received packet into two separate
> > regions based on the packet content. The split happens after the
> > packet header and before the packet payload. Splitting is usually
> > between the packet header that can be posted to a dedicated buffer and
> > the packet payload that can be posted to a different buffer.
> >
> > Currently, Rx buffer split supports length and offset based packet split.
> > Although header split is a subset of buffer split, configure buffer
> > split based on length and offset is not suitable for NICs that do
> > split based on header protocol types. And tunneling makes the
> > conversion from offset to protocol impossible.
> >
> > This patch extends the current buffer split to support protocol based
> > header split. A new proto field is introduced in the
> > rte_eth_rxseg_split structure reserved field to specify header
> > protocol type. With Rx offload flag RTE_ETH_RX_OFFLOAD_HEADER_SPLIT
> > enabled and protocol type configured, PMD will split the ingress packets
> into two separate regions.
> > Currently, both inner and outer L2/L3/L4 level header split can be
> supported.
> >
> > For example, let's suppose we configured the Rx queue with the
> > following
> > segments:
> >     seg0 - pool0
> >     seg1 - pool1
> >
> > With header split type configured with RTE_ETH_RX_HEADER_SPLIT_UDP,
> > the packet consists of MAC_IP_UDP_PAYLOAD will be split like following:
> >     seg0 - pool0, udp_header
> >     seg1 - pool1, payload
> >
> > The memory attributes for the split parts may differ either - for
> > example the
> > mempool0 and mempool1 belong to dpdk memory and external memory,
> > respectively.
> >
> > Signed-off-by: Xuan Ding <xuan.ding@intel.com>
> > Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
> > ---
> >  lib/ethdev/rte_ethdev.c | 24 +++++++++++++----------
> >  lib/ethdev/rte_ethdev.h | 43 +++++++++++++++++++++++++++++++++++++++--
> >  2 files changed, 55 insertions(+), 12 deletions(-)
> >
> > diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
> > index 70c850a2f1..49c8fef1c3 100644
> > --- a/lib/ethdev/rte_ethdev.c
> > +++ b/lib/ethdev/rte_ethdev.c
> > @@ -1661,6 +1661,7 @@ rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
> >  		struct rte_mempool *mpl = rx_seg[seg_idx].mp;
> >  		uint32_t length = rx_seg[seg_idx].length;
> >  		uint32_t offset = rx_seg[seg_idx].offset;
> > +		uint16_t proto = rx_seg[seg_idx].proto;
> >
> >  		if (mpl == NULL) {
> >  			RTE_ETHDEV_LOG(ERR, "null mempool pointer\n");
> > @@ -1692,15 +1693,17 @@ rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
> >  					(struct rte_pktmbuf_pool_private));
> >  			return -ENOSPC;
> >  		}
> > -		offset += seg_idx != 0 ? 0 : RTE_PKTMBUF_HEADROOM;
> > -		*mbp_buf_size = rte_pktmbuf_data_room_size(mpl);
> > -		length = length != 0 ? length : *mbp_buf_size;
> > -		if (*mbp_buf_size < length + offset) {
> > -			RTE_ETHDEV_LOG(ERR,
> > -				       "%s mbuf_data_room_size %u < %u (segment length=%u + segment offset=%u)\n",
> > -				       mpl->name, *mbp_buf_size,
> > -				       length + offset, length, offset);
> > -			return -EINVAL;
> > +		if (proto == 0) {
> > +			offset += seg_idx != 0 ? 0 : RTE_PKTMBUF_HEADROOM;
> > +			*mbp_buf_size = rte_pktmbuf_data_room_size(mpl);
> > +			length = length != 0 ? length : *mbp_buf_size;
> > +			if (*mbp_buf_size < length + offset) {
> > +				RTE_ETHDEV_LOG(ERR,
> > +					"%s mbuf_data_room_size %u < %u (segment length=%u + segment offset=%u)\n",
> > +					mpl->name, *mbp_buf_size,
> > +					length + offset, length, offset);
> > +				return -EINVAL;
> > +			}
> >  		}
> 
> As length and proto are mutually exclusive, it is better to also check the
> length when proto != 0.

Thanks for your comments, I will fix it in the next version.

> 
> .....
> 
> > @@ -1197,12 +1197,26 @@ struct rte_eth_txmode {
> >   *     - pool from the last valid element
> >   *     - the buffer size from this pool
> >   *     - zero offset
> > + *
> > + * Header split is a subset of buffer split. The split happens after the
> > + * packet header and before the packet payload. For PMDs that do not
> > + * support header split configuration by length and offset, the location
> > + * of the split needs to be specified by the header protocol type. While
> > + * for buffer split, this field should not be configured.
> > + *
> > + * If RTE_ETH_RX_OFFLOAD_HEADER_SPLIT flag is set in offloads field,
> > + * the PMD will split the received packets into two separate regions:
> > + * - The header buffer will be allocated from the memory pool,
> > + *   specified in the first array element, the second buffer, from the
> > + *   pool in the second element.
> > + * - The length and offset do not need to be configured in header split.
> 
> We do not necessarily have to ignore the offset configuration for header
> split, as there is no conflict: a driver can still support copying a split
> header to a specific mbuf offset. And if we support offset with header split,
> an offset boundary check can also be considered in rte_eth_rx_queue_check_split.

You are right. Only length and proto are mutually exclusive between buffer
split and header split.
I will update it in the next version.

Thanks,
Xuan

> 
> Regards
> Qi
> 
> 


^ permalink raw reply	[flat|nested] 88+ messages in thread

* [RFC,v3 0/3] ethdev: introduce protocol type based header split
  2022-03-03  6:01 [RFC] ethdev: introduce protocol type based header split xuan.ding
                   ` (2 preceding siblings ...)
  2022-03-22  3:56 ` [RFC,v2 0/3] " xuan.ding
@ 2022-03-29  6:49 ` xuan.ding
  2022-03-29  6:49   ` [RFC,v3 1/3] " xuan.ding
                     ` (2 more replies)
  2022-04-02 10:41 ` [v4 0/3] ethdev: introduce protocol type based header split wenxuanx.wu
                   ` (6 subsequent siblings)
  10 siblings, 3 replies; 88+ messages in thread
From: xuan.ding @ 2022-03-29  6:49 UTC (permalink / raw)
  To: thomas, ferruh.yigit, andrew.rybchenko
  Cc: dev, stephen, mb, viacheslavo, qi.z.zhang, ping.yu, wenxuanx.wu,
	Xuan Ding

From: Xuan Ding <xuan.ding@intel.com>

Header split consists of splitting a received packet into two separate
regions based on the packet content. It is useful in some scenarios,
such as GPU acceleration. The splitting helps to enable true zero copy
and hence improves the performance significantly.

This patchset extends the current buffer split to support protocol based
header split. When an Rx queue is configured with the header split
feature, received packets are directly split into two different mempools.

rfc v2->v3:
* Fix a PMD bug.
* Add rx queue header split check.
* Revise the log and doc.

rfc v1->v2:
* Add support for all header split protocol types.

Xuan Ding (3):
  ethdev: introduce protocol type based header split
  app/testpmd: add header split configuration
  net/ice: support header split in Rx data path

 app/test-pmd/cmdline.c                | 117 ++++++++++++++
 app/test-pmd/testpmd.c                |   6 +-
 app/test-pmd/testpmd.h                |   2 +
 drivers/net/ice/ice_ethdev.c          |  10 +-
 drivers/net/ice/ice_rxtx.c            | 223 ++++++++++++++++++++++----
 drivers/net/ice/ice_rxtx.h            |  16 ++
 drivers/net/ice/ice_rxtx_vec_common.h |   3 +
 lib/ethdev/rte_ethdev.c               |  34 +++-
 lib/ethdev/rte_ethdev.h               |  48 +++++-
 9 files changed, 417 insertions(+), 42 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 88+ messages in thread

* [RFC,v3 1/3] ethdev: introduce protocol type based header split
  2022-03-29  6:49 ` [RFC,v3 0/3] ethdev: introduce protocol type based header split xuan.ding
@ 2022-03-29  6:49   ` xuan.ding
  2022-03-29  7:56     ` Zhang, Qi Z
  2022-03-29  6:49   ` [RFC,v3 2/3] app/testpmd: add header split configuration xuan.ding
  2022-03-29  6:49   ` [RFC,v3 3/3] net/ice: support header split in Rx data path xuan.ding
  2 siblings, 1 reply; 88+ messages in thread
From: xuan.ding @ 2022-03-29  6:49 UTC (permalink / raw)
  To: thomas, ferruh.yigit, andrew.rybchenko
  Cc: dev, stephen, mb, viacheslavo, qi.z.zhang, ping.yu, wenxuanx.wu,
	Xuan Ding, Yuan Wang

From: Xuan Ding <xuan.ding@intel.com>

Header split consists of splitting a received packet into two separate
regions based on the packet content. The split happens after the
packet header and before the packet payload. Splitting is usually between
the packet header that can be posted to a dedicated buffer and the packet
payload that can be posted to a different buffer.

Currently, Rx buffer split supports length and offset based packet split.
Although header split is a subset of buffer split, configuring buffer
split based on length is not suitable for NICs that split based on header
protocol types, because tunneling makes the conversion from length to
protocol type impossible.

This patch extends the current buffer split to support protocol type and
offset based header split. A new proto field is introduced in the
rte_eth_rxseg_split structure reserved field to specify header protocol
type. With Rx offload flag RTE_ETH_RX_OFFLOAD_HEADER_SPLIT enabled and
protocol type configured, PMD will split the ingress packets into two
separate regions. Currently, both inner and outer L2/L3/L4 level header
split can be supported.

For example, let's suppose we configured the Rx queue with the
following segments:
    seg0 - pool0, off0=2B
    seg1 - pool1, off1=128B

With the header split type configured to RTE_ETH_RX_HEADER_SPLIT_UDP,
a packet consisting of MAC_IP_UDP_PAYLOAD will be split as follows:
    seg0 - udp header @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from pool0
    seg1 - payload @ 128 in mbuf from pool1

The memory attributes of the split parts may also differ - for example,
mempool0 and mempool1 could belong to DPDK memory and external memory,
respectively.
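
For illustration, a minimal setup sketch (not part of this patch; hdr_pool,
pay_pool, port_id and socket_id are assumed to exist, and error handling is
omitted):

	union rte_eth_rxseg rx_useg[2];
	struct rte_eth_rxconf rxconf;

	memset(rx_useg, 0, sizeof(rx_useg));
	memset(&rxconf, 0, sizeof(rxconf));

	/* First segment: UDP header from hdr_pool, split by protocol type. */
	rx_useg[0].split.mp = hdr_pool;
	rx_useg[0].split.proto = RTE_ETH_RX_HEADER_SPLIT_UDP;
	/* Second segment: payload from pay_pool. */
	rx_useg[1].split.mp = pay_pool;

	rxconf.offloads = RTE_ETH_RX_OFFLOAD_HEADER_SPLIT;
	rxconf.rx_seg = rx_useg;
	rxconf.rx_nseg = 2;
	rte_eth_rx_queue_setup(port_id, 0, 1024, socket_id, &rxconf, NULL);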

Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
---
 lib/ethdev/rte_ethdev.c | 34 ++++++++++++++++++++++-------
 lib/ethdev/rte_ethdev.h | 48 +++++++++++++++++++++++++++++++++++++++--
 2 files changed, 72 insertions(+), 10 deletions(-)

diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
index 29a3d80466..144a43588c 100644
--- a/lib/ethdev/rte_ethdev.c
+++ b/lib/ethdev/rte_ethdev.c
@@ -1661,6 +1661,7 @@ rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
 		struct rte_mempool *mpl = rx_seg[seg_idx].mp;
 		uint32_t length = rx_seg[seg_idx].length;
 		uint32_t offset = rx_seg[seg_idx].offset;
+		uint16_t proto = rx_seg[seg_idx].proto;
 
 		if (mpl == NULL) {
 			RTE_ETHDEV_LOG(ERR, "null mempool pointer\n");
@@ -1694,13 +1695,29 @@ rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
 		}
 		offset += seg_idx != 0 ? 0 : RTE_PKTMBUF_HEADROOM;
 		*mbp_buf_size = rte_pktmbuf_data_room_size(mpl);
-		length = length != 0 ? length : *mbp_buf_size;
-		if (*mbp_buf_size < length + offset) {
-			RTE_ETHDEV_LOG(ERR,
-				       "%s mbuf_data_room_size %u < %u (segment length=%u + segment offset=%u)\n",
-				       mpl->name, *mbp_buf_size,
-				       length + offset, length, offset);
-			return -EINVAL;
+		if (proto == 0) {
+			/* Check buffer split. */
+			length = length != 0 ? length : *mbp_buf_size;
+			if (*mbp_buf_size < length + offset) {
+				RTE_ETHDEV_LOG(ERR,
+					"%s mbuf_data_room_size %u < %u (segment length=%u + segment offset=%u)\n",
+					mpl->name, *mbp_buf_size,
+					length + offset, length, offset);
+				return -EINVAL;
+			}
+		} else {
+			/* Check header split. */
+			if (length != 0) {
+				RTE_ETHDEV_LOG(ERR, "segment length should be set to zero in header split\n");
+				return -EINVAL;
+			}
+			if (*mbp_buf_size < offset) {
+				RTE_ETHDEV_LOG(ERR,
+					"%s mbuf_data_room_size %u < %u (segment offset)\n",
+					mpl->name, *mbp_buf_size,
+					offset);
+				return -EINVAL;
+			}
 		}
 	}
 	return 0;
@@ -1778,7 +1795,8 @@ rte_eth_rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
 		rx_seg = (const struct rte_eth_rxseg_split *)rx_conf->rx_seg;
 		n_seg = rx_conf->rx_nseg;
 
-		if (rx_conf->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) {
+		if (rx_conf->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT ||
+			rx_conf->offloads & RTE_ETH_RX_OFFLOAD_HEADER_SPLIT) {
 			ret = rte_eth_rx_queue_check_split(rx_seg, n_seg,
 							   &mbp_buf_size,
 							   &dev_info);
diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
index 04cff8ee10..e8371b98ed 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -1197,12 +1197,31 @@ struct rte_eth_txmode {
  *     - pool from the last valid element
  *     - the buffer size from this pool
  *     - zero offset
+ *
+ * Header split is a subset of buffer split. The split happens after the
+ * packet header and before the packet payload. For PMDs that do not
+ * support header split configuration by length, the location of the split
+ * must be specified by the header protocol type, while for buffer split
+ * this field should not be configured.
+ *
+ * If RTE_ETH_RX_OFFLOAD_HEADER_SPLIT flag is set in offloads field,
+ * the PMD will split the received packets into two separate regions:
+ * - The header buffer will be allocated from the memory pool,
+ *   specified in the first array element, the second buffer, from the
+ *   pool in the second element.
+ *
+ * - The lengths do not need to be configured in header split.
+ *
+ * - The offsets from the segment description elements specify
+ *   the data offset from the buffer beginning. For the first segment,
+ *   RTE_PKTMBUF_HEADROOM is added to the configured offset.
  */
 struct rte_eth_rxseg_split {
 	struct rte_mempool *mp; /**< Memory pool to allocate segment from. */
 	uint16_t length; /**< Segment data length, configures split point. */
 	uint16_t offset; /**< Data offset from beginning of mbuf data buffer. */
-	uint32_t reserved; /**< Reserved field. */
+	uint16_t proto; /**< Header protocol type, configures header split point. */
+	uint16_t reserved; /**< Reserved field. */
 };
 
 /**
@@ -1212,7 +1231,7 @@ struct rte_eth_rxseg_split {
  * A common structure used to describe Rx packet segment properties.
  */
 union rte_eth_rxseg {
-	/* The settings for buffer split offload. */
+	/* The settings for buffer split and header split offload. */
 	struct rte_eth_rxseg_split split;
 	/* The other features settings should be added here. */
 };
@@ -1664,6 +1683,31 @@ struct rte_eth_conf {
 			     RTE_ETH_RX_OFFLOAD_QINQ_STRIP)
 #define DEV_RX_OFFLOAD_VLAN RTE_DEPRECATED(DEV_RX_OFFLOAD_VLAN) RTE_ETH_RX_OFFLOAD_VLAN
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this enum may change without prior notice.
+ * This enum indicates the header split protocol type.
+ */
+enum rte_eth_rx_header_split_protocol_type {
+	RTE_ETH_RX_HEADER_SPLIT_NONE = 0,
+	RTE_ETH_RX_HEADER_SPLIT_MAC,
+	RTE_ETH_RX_HEADER_SPLIT_IPV4,
+	RTE_ETH_RX_HEADER_SPLIT_IPV6,
+	RTE_ETH_RX_HEADER_SPLIT_L3,
+	RTE_ETH_RX_HEADER_SPLIT_TCP,
+	RTE_ETH_RX_HEADER_SPLIT_UDP,
+	RTE_ETH_RX_HEADER_SPLIT_SCTP,
+	RTE_ETH_RX_HEADER_SPLIT_L4,
+	RTE_ETH_RX_HEADER_SPLIT_INNER_MAC,
+	RTE_ETH_RX_HEADER_SPLIT_INNER_IPV4,
+	RTE_ETH_RX_HEADER_SPLIT_INNER_IPV6,
+	RTE_ETH_RX_HEADER_SPLIT_INNER_L3,
+	RTE_ETH_RX_HEADER_SPLIT_INNER_TCP,
+	RTE_ETH_RX_HEADER_SPLIT_INNER_UDP,
+	RTE_ETH_RX_HEADER_SPLIT_INNER_SCTP,
+	RTE_ETH_RX_HEADER_SPLIT_INNER_L4,
+};
+
 /*
  * If new Rx offload capabilities are defined, they also must be
  * mentioned in rte_rx_offload_names in rte_ethdev.c file.
-- 
2.17.1


^ permalink raw reply	[flat|nested] 88+ messages in thread

* [RFC,v3 2/3] app/testpmd: add header split configuration
  2022-03-29  6:49 ` [RFC,v3 0/3] ethdev: introduce protocol type based header split xuan.ding
  2022-03-29  6:49   ` [RFC,v3 1/3] " xuan.ding
@ 2022-03-29  6:49   ` xuan.ding
  2022-03-29  6:49   ` [RFC,v3 3/3] net/ice: support header split in Rx data path xuan.ding
  2 siblings, 0 replies; 88+ messages in thread
From: xuan.ding @ 2022-03-29  6:49 UTC (permalink / raw)
  To: thomas, ferruh.yigit, andrew.rybchenko
  Cc: dev, stephen, mb, viacheslavo, qi.z.zhang, ping.yu, wenxuanx.wu,
	Xuan Ding, Yuan Wang

From: Xuan Ding <xuan.ding@intel.com>

This patch adds header split configuration in testpmd. The header split
feature is off by default. To enable header split, you need to:
1. Configure the Rx queue with the header_split Rx offload turned on.
2. Set the protocol type for header split.

Command to set the header split protocol type:
testpmd> port config <port_id> header_split mac|ipv4|ipv6|l3|tcp|udp|sctp|
		    l4|inner_mac|inner_ipv4|inner_ipv6|inner_l3|inner_tcp|
		    inner_udp|inner_sctp|inner_l4
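
A typical session might then look like this (sketch; the port must be
stopped before reconfiguring, and the rx_offload command refers to the
existing per-port Rx offload configuration):

testpmd> port stop 0
testpmd> port config 0 rx_offload header_split on
testpmd> port config 0 header_split udp
testpmd> port start 0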

Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
---
 app/test-pmd/cmdline.c | 117 +++++++++++++++++++++++++++++++++++++++++
 app/test-pmd/testpmd.c |   6 ++-
 app/test-pmd/testpmd.h |   2 +
 3 files changed, 124 insertions(+), 1 deletion(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 6ffea8e21a..abda81b4bc 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -866,6 +866,12 @@ static void cmd_help_long_parsed(void *parsed_result,
 			"     Enable or disable a per port Rx offloading"
 			" on all Rx queues of a port\n\n"
 
+			"port config <port_id> header_split mac|ipv4|ipv6|l3|tcp|udp|sctp|l4|"
+			"inner_mac|inner_ipv4|inner_ipv6|inner_l3|inner_tcp|"
+			"inner_udp|inner_sctp|inner_l4\n"
+			"     Configure protocol for header split"
+			" on all Rx queues of a port\n\n"
+
 			"port (port_id) rxq (queue_id) rx_offload vlan_strip|"
 			"ipv4_cksum|udp_cksum|tcp_cksum|tcp_lro|qinq_strip|"
 			"outer_ipv4_cksum|macsec_strip|header_split|"
@@ -16353,6 +16359,116 @@ cmdline_parse_inst_t cmd_config_per_port_rx_offload = {
 	}
 };
 
+/* config a per port header split protocol */
+struct cmd_config_per_port_headersplit_protocol_result {
+	cmdline_fixed_string_t port;
+	cmdline_fixed_string_t config;
+	uint16_t port_id;
+	cmdline_fixed_string_t headersplit;
+	cmdline_fixed_string_t protocol;
+};
+
+cmdline_parse_token_string_t cmd_config_per_port_headersplit_protocol_result_port =
+	TOKEN_STRING_INITIALIZER
+		(struct cmd_config_per_port_headersplit_protocol_result,
+		 port, "port");
+cmdline_parse_token_string_t cmd_config_per_port_headersplit_protocol_result_config =
+	TOKEN_STRING_INITIALIZER
+		(struct cmd_config_per_port_headersplit_protocol_result,
+		 config, "config");
+cmdline_parse_token_num_t cmd_config_per_port_headersplit_protocol_result_port_id =
+	TOKEN_NUM_INITIALIZER
+		(struct cmd_config_per_port_headersplit_protocol_result,
+		 port_id, RTE_UINT16);
+cmdline_parse_token_string_t cmd_config_per_port_headersplit_protocol_result_headersplit =
+	TOKEN_STRING_INITIALIZER
+		(struct cmd_config_per_port_headersplit_protocol_result,
+		 headersplit, "header_split");
+cmdline_parse_token_string_t cmd_config_per_port_headersplit_protocol_result_protocol =
+	TOKEN_STRING_INITIALIZER
+		(struct cmd_config_per_port_headersplit_protocol_result,
+		 protocol, "mac#ipv4#ipv6#l3#tcp#udp#sctp#l4#"
+			   "inner_mac#inner_ipv4#inner_ipv6#inner_l3#inner_tcp#"
+			   "inner_udp#inner_sctp#inner_l4");
+
+static void
+cmd_config_per_port_headersplit_protocol_parsed(void *parsed_result,
+				__rte_unused struct cmdline *cl,
+				__rte_unused void *data)
+{
+	struct cmd_config_per_port_headersplit_protocol_result *res = parsed_result;
+	portid_t port_id = res->port_id;
+	struct rte_port *port = &ports[port_id];
+	uint16_t protocol;
+
+	if (port_id_is_invalid(port_id, ENABLED_WARN))
+		return;
+
+	if (port->port_status != RTE_PORT_STOPPED) {
+		fprintf(stderr,
+			"Error: Can't config offload when Port %d is not stopped\n",
+			port_id);
+		return;
+	}
+
+	if (!strcmp(res->protocol, "mac"))
+		protocol = RTE_ETH_RX_HEADER_SPLIT_MAC;
+	else if (!strcmp(res->protocol, "ipv4"))
+		protocol = RTE_ETH_RX_HEADER_SPLIT_IPV4;
+	else if (!strcmp(res->protocol, "ipv6"))
+		protocol = RTE_ETH_RX_HEADER_SPLIT_IPV6;
+	else if (!strcmp(res->protocol, "l3"))
+		protocol = RTE_ETH_RX_HEADER_SPLIT_L3;
+	else if (!strcmp(res->protocol, "tcp"))
+		protocol = RTE_ETH_RX_HEADER_SPLIT_TCP;
+	else if (!strcmp(res->protocol, "udp"))
+		protocol = RTE_ETH_RX_HEADER_SPLIT_UDP;
+	else if (!strcmp(res->protocol, "sctp"))
+		protocol = RTE_ETH_RX_HEADER_SPLIT_SCTP;
+	else if (!strcmp(res->protocol, "l4"))
+		protocol = RTE_ETH_RX_HEADER_SPLIT_L4;
+	else if (!strcmp(res->protocol, "inner_mac"))
+		protocol = RTE_ETH_RX_HEADER_SPLIT_INNER_MAC;
+	else if (!strcmp(res->protocol, "inner_ipv4"))
+		protocol = RTE_ETH_RX_HEADER_SPLIT_INNER_IPV4;
+	else if (!strcmp(res->protocol, "inner_ipv6"))
+		protocol = RTE_ETH_RX_HEADER_SPLIT_INNER_IPV6;
+	else if (!strcmp(res->protocol, "inner_l3"))
+		protocol = RTE_ETH_RX_HEADER_SPLIT_INNER_L3;
+	else if (!strcmp(res->protocol, "inner_tcp"))
+		protocol = RTE_ETH_RX_HEADER_SPLIT_INNER_TCP;
+	else if (!strcmp(res->protocol, "inner_udp"))
+		protocol = RTE_ETH_RX_HEADER_SPLIT_INNER_UDP;
+	else if (!strcmp(res->protocol, "inner_sctp"))
+		protocol = RTE_ETH_RX_HEADER_SPLIT_INNER_SCTP;
+	else if (!strcmp(res->protocol, "inner_l4"))
+		protocol = RTE_ETH_RX_HEADER_SPLIT_INNER_L4;
+	else {
+		fprintf(stderr, "Unknown protocol name: %s\n", res->protocol);
+		return;
+	}
+
+	rx_pkt_header_split_proto = protocol;
+
+	cmd_reconfig_device_queue(port_id, 1, 1);
+}
+
+cmdline_parse_inst_t cmd_config_per_port_headersplit_protocol = {
+	.f = cmd_config_per_port_headersplit_protocol_parsed,
+	.data = NULL,
+	.help_str = "port config <port_id> header_split mac|ipv4|ipv6|l3|tcp|udp|sctp|l4|"
+		    "inner_mac|inner_ipv4|inner_ipv6|inner_l3|inner_tcp|"
+		    "inner_udp|inner_sctp|inner_l4",
+	.tokens = {
+		(void *)&cmd_config_per_port_headersplit_protocol_result_port,
+		(void *)&cmd_config_per_port_headersplit_protocol_result_config,
+		(void *)&cmd_config_per_port_headersplit_protocol_result_port_id,
+		(void *)&cmd_config_per_port_headersplit_protocol_result_headersplit,
+		(void *)&cmd_config_per_port_headersplit_protocol_result_protocol,
+		NULL,
+	}
+};
+
 /* Enable/Disable a per queue offloading */
 struct cmd_config_per_queue_rx_offload_result {
 	cmdline_fixed_string_t port;
@@ -18071,6 +18187,7 @@ cmdline_parse_ctx_t main_ctx[] = {
 	(cmdline_parse_inst_t *)&cmd_rx_offload_get_capa,
 	(cmdline_parse_inst_t *)&cmd_rx_offload_get_configuration,
 	(cmdline_parse_inst_t *)&cmd_config_per_port_rx_offload,
+	(cmdline_parse_inst_t *)&cmd_config_per_port_headersplit_protocol,
 	(cmdline_parse_inst_t *)&cmd_config_per_queue_rx_offload,
 	(cmdline_parse_inst_t *)&cmd_tx_offload_get_capa,
 	(cmdline_parse_inst_t *)&cmd_tx_offload_get_configuration,
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index fe2ce19f99..a00fa0e236 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -253,6 +253,8 @@ uint8_t  tx_pkt_nb_segs = 1; /**< Number of segments in TXONLY packets */
 enum tx_pkt_split tx_pkt_split = TX_PKT_SPLIT_OFF;
 /**< Split policy for packets to TX. */
 
+uint8_t rx_pkt_header_split_proto;
+
 uint8_t txonly_multi_flow;
 /**< Whether multiple flows are generated in TXONLY mode. */
 
@@ -2568,7 +2570,8 @@ rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
 	int ret;
 
 	if (rx_pkt_nb_segs <= 1 ||
-	    (rx_conf->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) == 0) {
+	    (((rx_conf->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) == 0) &&
+	     ((rx_conf->offloads & RTE_ETH_RX_OFFLOAD_HEADER_SPLIT) == 0))) {
 		rx_conf->rx_seg = NULL;
 		rx_conf->rx_nseg = 0;
 		ret = rte_eth_rx_queue_setup(port_id, rx_queue_id,
@@ -2592,6 +2595,7 @@ rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
 		rx_seg->offset = i < rx_pkt_nb_offs ?
 				   rx_pkt_seg_offsets[i] : 0;
 		rx_seg->mp = mpx ? mpx : mp;
+		rx_seg->proto = rx_pkt_header_split_proto;
 	}
 	rx_conf->rx_nseg = rx_pkt_nb_segs;
 	rx_conf->rx_seg = rx_useg;
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index 31f766c965..021e2768be 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -557,6 +557,8 @@ enum tx_pkt_split {
 
 extern enum tx_pkt_split tx_pkt_split;
 
+extern uint8_t rx_pkt_header_split_proto;
+
 extern uint8_t txonly_multi_flow;
 
 extern uint32_t rxq_share;
-- 
2.17.1


^ permalink raw reply	[flat|nested] 88+ messages in thread

* [RFC,v3 3/3] net/ice: support header split in Rx data path
  2022-03-29  6:49 ` [RFC,v3 0/3] ethdev: introduce protocol type based header split xuan.ding
  2022-03-29  6:49   ` [RFC,v3 1/3] " xuan.ding
  2022-03-29  6:49   ` [RFC,v3 2/3] app/testpmd: add header split configuration xuan.ding
@ 2022-03-29  6:49   ` xuan.ding
  2 siblings, 0 replies; 88+ messages in thread
From: xuan.ding @ 2022-03-29  6:49 UTC (permalink / raw)
  To: thomas, ferruh.yigit, andrew.rybchenko
  Cc: dev, stephen, mb, viacheslavo, qi.z.zhang, ping.yu, wenxuanx.wu,
	Xuan Ding, Yuan Wang

From: Xuan Ding <xuan.ding@intel.com>

This patch adds support for header split in normal Rx data paths.
When the Rx queue is configured with header split for a specific
protocol type, received packets are directly split into header and
payload parts, and the two parts are put into different mempools.

Currently, header split is not supported in vectorized paths.
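
After Rx, the header lands in the first mbuf segment and the payload in the
chained second segment. For illustration, a consuming sketch (port_id is
assumed; process_header() and process_payload() are hypothetical application
callbacks):

	struct rte_mbuf *pkts[32];
	uint16_t i, nb;

	nb = rte_eth_rx_burst(port_id, 0, pkts, 32);
	for (i = 0; i < nb; i++) {
		struct rte_mbuf *hdr = pkts[i];   /* mbuf from the header pool */
		struct rte_mbuf *pay = hdr->next; /* mbuf from the payload pool */

		/* data_len of the first segment is the header length
		 * reported by the hardware.
		 */
		process_header(rte_pktmbuf_mtod(hdr, void *), hdr->data_len);
		if (pay != NULL)
			process_payload(rte_pktmbuf_mtod(pay, void *),
					pay->data_len);

		rte_pktmbuf_free(hdr); /* frees the whole segment chain */
	}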

Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
---
 drivers/net/ice/ice_ethdev.c          |  10 +-
 drivers/net/ice/ice_rxtx.c            | 223 ++++++++++++++++++++++----
 drivers/net/ice/ice_rxtx.h            |  16 ++
 drivers/net/ice/ice_rxtx_vec_common.h |   3 +
 4 files changed, 221 insertions(+), 31 deletions(-)

diff --git a/drivers/net/ice/ice_ethdev.c b/drivers/net/ice/ice_ethdev.c
index 13adcf90ed..cb32265dbe 100644
--- a/drivers/net/ice/ice_ethdev.c
+++ b/drivers/net/ice/ice_ethdev.c
@@ -3713,7 +3713,8 @@ ice_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 			RTE_ETH_RX_OFFLOAD_OUTER_IPV4_CKSUM |
 			RTE_ETH_RX_OFFLOAD_VLAN_EXTEND |
 			RTE_ETH_RX_OFFLOAD_RSS_HASH |
-			RTE_ETH_RX_OFFLOAD_TIMESTAMP;
+			RTE_ETH_RX_OFFLOAD_TIMESTAMP |
+			RTE_ETH_RX_OFFLOAD_HEADER_SPLIT;
 		dev_info->tx_offload_capa |=
 			RTE_ETH_TX_OFFLOAD_QINQ_INSERT |
 			RTE_ETH_TX_OFFLOAD_IPV4_CKSUM |
@@ -3725,7 +3726,7 @@ ice_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 		dev_info->flow_type_rss_offloads |= ICE_RSS_OFFLOAD_ALL;
 	}
 
-	dev_info->rx_queue_offload_capa = 0;
+	dev_info->rx_queue_offload_capa = RTE_ETH_RX_OFFLOAD_HEADER_SPLIT;
 	dev_info->tx_queue_offload_capa = RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE;
 
 	dev_info->reta_size = pf->hash_lut_size;
@@ -3794,6 +3795,11 @@ ice_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 	dev_info->default_rxportconf.ring_size = ICE_BUF_SIZE_MIN;
 	dev_info->default_txportconf.ring_size = ICE_BUF_SIZE_MIN;
 
+	dev_info->rx_seg_capa.max_nseg = ICE_RX_MAX_NSEG;
+	dev_info->rx_seg_capa.multi_pools = 1;
+	dev_info->rx_seg_capa.offset_allowed = 0;
+	dev_info->rx_seg_capa.offset_align_log2 = 0;
+
 	return 0;
 }
 
diff --git a/drivers/net/ice/ice_rxtx.c b/drivers/net/ice/ice_rxtx.c
index 041f4bc91f..1f245c853b 100644
--- a/drivers/net/ice/ice_rxtx.c
+++ b/drivers/net/ice/ice_rxtx.c
@@ -282,7 +282,6 @@ ice_program_hw_rx_queue(struct ice_rx_queue *rxq)
 	/* Set buffer size as the head split is disabled. */
 	buf_size = (uint16_t)(rte_pktmbuf_data_room_size(rxq->mp) -
 			      RTE_PKTMBUF_HEADROOM);
-	rxq->rx_hdr_len = 0;
 	rxq->rx_buf_len = RTE_ALIGN(buf_size, (1 << ICE_RLAN_CTX_DBUF_S));
 	rxq->max_pkt_len =
 		RTE_MIN((uint32_t)ICE_SUPPORT_CHAIN_NUM * rxq->rx_buf_len,
@@ -311,11 +310,54 @@ ice_program_hw_rx_queue(struct ice_rx_queue *rxq)
 
 	memset(&rx_ctx, 0, sizeof(rx_ctx));
 
+	if (rxq->offloads & RTE_ETH_RX_OFFLOAD_HEADER_SPLIT) {
+		switch (rxq->rxseg[0].proto) {
+		case RTE_ETH_RX_HEADER_SPLIT_MAC:
+			rx_ctx.dtype = ICE_RX_DTYPE_HEADER_SPLIT;
+			rx_ctx.hsplit_1 = ICE_RLAN_RX_HSPLIT_1_SPLIT_L2;
+			break;
+		case RTE_ETH_RX_HEADER_SPLIT_INNER_MAC:
+			rx_ctx.dtype = ICE_RX_DTYPE_HEADER_SPLIT;
+			rx_ctx.hsplit_0 = ICE_RLAN_RX_HSPLIT_0_SPLIT_L2;
+			break;
+		case RTE_ETH_RX_HEADER_SPLIT_IPV4:
+		case RTE_ETH_RX_HEADER_SPLIT_IPV6:
+		case RTE_ETH_RX_HEADER_SPLIT_L3:
+		case RTE_ETH_RX_HEADER_SPLIT_INNER_IPV4:
+		case RTE_ETH_RX_HEADER_SPLIT_INNER_IPV6:
+		case RTE_ETH_RX_HEADER_SPLIT_INNER_L3:
+			rx_ctx.dtype = ICE_RX_DTYPE_HEADER_SPLIT;
+			rx_ctx.hsplit_0 = ICE_RLAN_RX_HSPLIT_0_SPLIT_IP;
+			break;
+		case RTE_ETH_RX_HEADER_SPLIT_TCP:
+		case RTE_ETH_RX_HEADER_SPLIT_UDP:
+		case RTE_ETH_RX_HEADER_SPLIT_INNER_TCP:
+		case RTE_ETH_RX_HEADER_SPLIT_INNER_UDP:
+			rx_ctx.dtype = ICE_RX_DTYPE_HEADER_SPLIT;
+			rx_ctx.hsplit_0 = ICE_RLAN_RX_HSPLIT_0_SPLIT_TCP_UDP;
+			break;
+		case RTE_ETH_RX_HEADER_SPLIT_SCTP:
+		case RTE_ETH_RX_HEADER_SPLIT_INNER_SCTP:
+			rx_ctx.dtype = ICE_RX_DTYPE_HEADER_SPLIT;
+			rx_ctx.hsplit_0 = ICE_RLAN_RX_HSPLIT_0_SPLIT_SCTP;
+			break;
+		case RTE_ETH_RX_HEADER_SPLIT_NONE:
+			PMD_DRV_LOG(ERR, "Header split protocol must be configured");
+			return -EINVAL;
+		default:
+			PMD_DRV_LOG(ERR, "Header split protocol is not supported");
+			return -EINVAL;
+		}
+		rxq->rx_hdr_len = ICE_RX_HDR_BUF_SIZE;
+	} else {
+		rxq->rx_hdr_len = 0;
+		rx_ctx.dtype = 0; /* No Header Split mode */
+	}
+
 	rx_ctx.base = rxq->rx_ring_dma / ICE_QUEUE_BASE_ADDR_UNIT;
 	rx_ctx.qlen = rxq->nb_rx_desc;
 	rx_ctx.dbuf = rxq->rx_buf_len >> ICE_RLAN_CTX_DBUF_S;
 	rx_ctx.hbuf = rxq->rx_hdr_len >> ICE_RLAN_CTX_HBUF_S;
-	rx_ctx.dtype = 0; /* No Header Split mode */
 #ifndef RTE_LIBRTE_ICE_16BYTE_RX_DESC
 	rx_ctx.dsize = 1; /* 32B descriptors */
 #endif
@@ -401,6 +443,7 @@ ice_alloc_rx_queue_mbufs(struct ice_rx_queue *rxq)
 
 	for (i = 0; i < rxq->nb_rx_desc; i++) {
 		volatile union ice_rx_flex_desc *rxd;
+		rxd = &rxq->rx_ring[i];
 		struct rte_mbuf *mbuf = rte_mbuf_raw_alloc(rxq->mp);
 
 		if (unlikely(!mbuf)) {
@@ -408,8 +451,6 @@ ice_alloc_rx_queue_mbufs(struct ice_rx_queue *rxq)
 			return -ENOMEM;
 		}
 
-		rte_mbuf_refcnt_set(mbuf, 1);
-		mbuf->next = NULL;
 		mbuf->data_off = RTE_PKTMBUF_HEADROOM;
 		mbuf->nb_segs = 1;
 		mbuf->port = rxq->port_id;
@@ -417,9 +458,32 @@ ice_alloc_rx_queue_mbufs(struct ice_rx_queue *rxq)
 		dma_addr =
 			rte_cpu_to_le_64(rte_mbuf_data_iova_default(mbuf));
 
-		rxd = &rxq->rx_ring[i];
-		rxd->read.pkt_addr = dma_addr;
-		rxd->read.hdr_addr = 0;
+		if (rxq->offloads & RTE_ETH_RX_OFFLOAD_HEADER_SPLIT) {
+			struct rte_mbuf *mbuf_pay;
+			mbuf_pay = rte_mbuf_raw_alloc(rxq->rxseg[1].mp);
+			if (unlikely(!mbuf_pay)) {
+				PMD_DRV_LOG(ERR, "Failed to allocate payload mbuf for RX");
+				return -ENOMEM;
+			}
+
+			mbuf_pay->next = NULL;
+			mbuf_pay->data_off = RTE_PKTMBUF_HEADROOM;
+			mbuf_pay->nb_segs = 1;
+			mbuf_pay->port = rxq->port_id;
+			mbuf->next = mbuf_pay;
+
+			rxd->read.hdr_addr = dma_addr;
+			/* The LS bit should be set to zero regardless of
+			 * header split enablement.
+			 */
+			rxd->read.pkt_addr = rte_cpu_to_le_64(rte_mbuf_data_iova_default(mbuf_pay));
+		} else {
+			rte_mbuf_refcnt_set(mbuf, 1);
+			mbuf->next = NULL;
+			rxd->read.hdr_addr = 0;
+			rxd->read.pkt_addr = dma_addr;
+		}
+
 #ifndef RTE_LIBRTE_ICE_16BYTE_RX_DESC
 		rxd->read.rsvd1 = 0;
 		rxd->read.rsvd2 = 0;
@@ -443,14 +507,14 @@ _ice_rx_queue_release_mbufs(struct ice_rx_queue *rxq)
 
 	for (i = 0; i < rxq->nb_rx_desc; i++) {
 		if (rxq->sw_ring[i].mbuf) {
-			rte_pktmbuf_free_seg(rxq->sw_ring[i].mbuf);
+			rte_pktmbuf_free(rxq->sw_ring[i].mbuf);
 			rxq->sw_ring[i].mbuf = NULL;
 		}
 	}
 	if (rxq->rx_nb_avail == 0)
 		return;
 	for (i = 0; i < rxq->rx_nb_avail; i++)
-		rte_pktmbuf_free_seg(rxq->rx_stage[rxq->rx_next_avail + i]);
+		rte_pktmbuf_free(rxq->rx_stage[rxq->rx_next_avail + i]);
 
 	rxq->rx_nb_avail = 0;
 }
@@ -1076,6 +1140,7 @@ ice_rx_queue_setup(struct rte_eth_dev *dev,
 	uint16_t len;
 	int use_def_burst_func = 1;
 	uint64_t offloads;
+	uint16_t n_seg = rx_conf->rx_nseg;
 
 	if (nb_desc % ICE_ALIGN_RING_DESC != 0 ||
 	    nb_desc > ICE_MAX_RING_DESC ||
@@ -1087,6 +1152,22 @@ ice_rx_queue_setup(struct rte_eth_dev *dev,
 
 	offloads = rx_conf->offloads | dev->data->dev_conf.rxmode.offloads;
 
+	if (mp)
+		n_seg = 1;
+
+	if (n_seg > 1) {
+		if (!(offloads & RTE_ETH_RX_OFFLOAD_HEADER_SPLIT)) {
+			PMD_INIT_LOG(ERR, "port %u queue index %u split offload not configured",
+					dev->data->port_id, queue_idx);
+			return -EINVAL;
+		}
+		if (n_seg > ICE_RX_MAX_NSEG) {
+			PMD_INIT_LOG(ERR, "port %u queue index %u split seg exceeds maximum",
+					dev->data->port_id, queue_idx);
+			return -EINVAL;
+		}
+	}
+
 	/* Free memory if needed */
 	if (dev->data->rx_queues[queue_idx]) {
 		ice_rx_queue_release(dev->data->rx_queues[queue_idx]);
@@ -1098,12 +1179,22 @@ ice_rx_queue_setup(struct rte_eth_dev *dev,
 				 sizeof(struct ice_rx_queue),
 				 RTE_CACHE_LINE_SIZE,
 				 socket_id);
+
 	if (!rxq) {
 		PMD_INIT_LOG(ERR, "Failed to allocate memory for "
 			     "rx queue data structure");
 		return -ENOMEM;
 	}
-	rxq->mp = mp;
+
+	rxq->rxseg_nb = n_seg;
+	if (n_seg > 1) {
+		rte_memcpy(rxq->rxseg, rx_conf->rx_seg,
+			sizeof(struct rte_eth_rxseg_split) * n_seg);
+		rxq->mp = rxq->rxseg[0].mp;
+	} else {
+		rxq->mp = mp;
+	}
+
 	rxq->nb_rx_desc = nb_desc;
 	rxq->rx_free_thresh = rx_conf->rx_free_thresh;
 	rxq->queue_id = queue_idx;
@@ -1568,7 +1659,7 @@ ice_rx_scan_hw_ring(struct ice_rx_queue *rxq)
 	struct ice_rx_entry *rxep;
 	struct rte_mbuf *mb;
 	uint16_t stat_err0;
-	uint16_t pkt_len;
+	uint16_t pkt_len, hdr_len;
 	int32_t s[ICE_LOOK_AHEAD], nb_dd;
 	int32_t i, j, nb_rx = 0;
 	uint64_t pkt_flags = 0;
@@ -1616,6 +1707,24 @@ ice_rx_scan_hw_ring(struct ice_rx_queue *rxq)
 				   ICE_RX_FLX_DESC_PKT_LEN_M) - rxq->crc_len;
 			mb->data_len = pkt_len;
 			mb->pkt_len = pkt_len;
+
+			if (rxq->offloads & RTE_ETH_RX_OFFLOAD_HEADER_SPLIT) {
+				mb->nb_segs = (uint16_t)(mb->nb_segs + mb->next->nb_segs);
+				mb->next->next = NULL;
+				hdr_len = rte_le_to_cpu_16(rxdp[j].wb.hdr_len_sph_flex_flags1) &
+						ICE_RX_FLEX_DESC_HEADER_LEN_M;
+				pkt_len = (rte_le_to_cpu_16(rxdp[j].wb.pkt_len) &
+					ICE_RX_FLX_DESC_PKT_LEN_M) - rxq->crc_len;
+				mb->data_len = hdr_len;
+				mb->pkt_len = hdr_len + pkt_len;
+				mb->next->data_len = pkt_len;
+			} else {
+				pkt_len = (rte_le_to_cpu_16(rxdp[j].wb.pkt_len) &
+					ICE_RX_FLX_DESC_PKT_LEN_M) - rxq->crc_len;
+				mb->data_len = pkt_len;
+				mb->pkt_len = pkt_len;
+			}
+
 			mb->ol_flags = 0;
 			stat_err0 = rte_le_to_cpu_16(rxdp[j].wb.status_error0);
 			pkt_flags = ice_rxd_error_to_pkt_flags(stat_err0);
@@ -1695,7 +1804,9 @@ ice_rx_alloc_bufs(struct ice_rx_queue *rxq)
 	struct rte_mbuf *mb;
 	uint16_t alloc_idx, i;
 	uint64_t dma_addr;
-	int diag;
+	int diag, diag_pay;
+	uint64_t pay_addr;
+	struct rte_mbuf *mbufs_pay[rxq->rx_free_thresh];
 
 	/* Allocate buffers in bulk */
 	alloc_idx = (uint16_t)(rxq->rx_free_trigger -
@@ -1708,6 +1819,15 @@ ice_rx_alloc_bufs(struct ice_rx_queue *rxq)
 		return -ENOMEM;
 	}
 
+	if (rxq->offloads & RTE_ETH_RX_OFFLOAD_HEADER_SPLIT) {
+		diag_pay = rte_mempool_get_bulk(rxq->rxseg[1].mp,
+				(void *)mbufs_pay, rxq->rx_free_thresh);
+		if (unlikely(diag_pay != 0)) {
+			PMD_RX_LOG(ERR, "Failed to get payload mbufs in bulk");
+			return -ENOMEM;
+		}
+	}
+
 	rxdp = &rxq->rx_ring[alloc_idx];
 	for (i = 0; i < rxq->rx_free_thresh; i++) {
 		if (likely(i < (rxq->rx_free_thresh - 1)))
@@ -1716,13 +1836,21 @@ ice_rx_alloc_bufs(struct ice_rx_queue *rxq)
 
 		mb = rxep[i].mbuf;
 		rte_mbuf_refcnt_set(mb, 1);
-		mb->next = NULL;
 		mb->data_off = RTE_PKTMBUF_HEADROOM;
 		mb->nb_segs = 1;
 		mb->port = rxq->port_id;
 		dma_addr = rte_cpu_to_le_64(rte_mbuf_data_iova_default(mb));
-		rxdp[i].read.hdr_addr = 0;
-		rxdp[i].read.pkt_addr = dma_addr;
+
+		if (rxq->offloads & RTE_ETH_RX_OFFLOAD_HEADER_SPLIT) {
+			mb->next = mbufs_pay[i];
+			pay_addr = rte_mbuf_data_iova_default(mbufs_pay[i]);
+			rxdp[i].read.hdr_addr = dma_addr;
+			rxdp[i].read.pkt_addr = rte_cpu_to_le_64(pay_addr);
+		} else {
+			mb->next = NULL;
+			rxdp[i].read.hdr_addr = 0;
+			rxdp[i].read.pkt_addr = dma_addr;
+		}
 	}
 
 	/* Update Rx tail register */
@@ -2315,11 +2443,13 @@ ice_recv_pkts(void *rx_queue,
 	struct ice_rx_entry *sw_ring = rxq->sw_ring;
 	struct ice_rx_entry *rxe;
 	struct rte_mbuf *nmb; /* new allocated mbuf */
+	struct rte_mbuf *nmb_pay; /* new allocated payload mbuf */
 	struct rte_mbuf *rxm; /* pointer to store old mbuf in SW ring */
 	uint16_t rx_id = rxq->rx_tail;
 	uint16_t nb_rx = 0;
 	uint16_t nb_hold = 0;
 	uint16_t rx_packet_len;
+	uint16_t rx_header_len;
 	uint16_t rx_stat_err0;
 	uint64_t dma_addr;
 	uint64_t pkt_flags;
@@ -2342,12 +2472,16 @@ ice_recv_pkts(void *rx_queue,
 		if (!(rx_stat_err0 & (1 << ICE_RX_FLEX_DESC_STATUS0_DD_S)))
 			break;
 
-		/* allocate mbuf */
+		if (rx_stat_err0 & (1 << ICE_RX_FLEX_DESC_STATUS0_HBO_S))
+			break;
+
+		/* allocate header mbuf */
 		nmb = rte_mbuf_raw_alloc(rxq->mp);
 		if (unlikely(!nmb)) {
 			rxq->vsi->adapter->pf.dev_data->rx_mbuf_alloc_failed++;
 			break;
 		}
+
 		rxd = *rxdp; /* copy descriptor in ring to temp variable*/
 
 		nb_hold++;
@@ -2360,24 +2494,55 @@ ice_recv_pkts(void *rx_queue,
 		dma_addr =
 			rte_cpu_to_le_64(rte_mbuf_data_iova_default(nmb));
 
-		/**
-		 * fill the read format of descriptor with physic address in
-		 * new allocated mbuf: nmb
-		 */
-		rxdp->read.hdr_addr = 0;
-		rxdp->read.pkt_addr = dma_addr;
+		if (rxq->offloads & RTE_ETH_RX_OFFLOAD_HEADER_SPLIT) {
+			/* allocate payload mbuf */
+			nmb_pay = rte_mbuf_raw_alloc(rxq->rxseg[1].mp);
+			if (unlikely(!nmb_pay)) {
+				rxq->vsi->adapter->pf.dev_data->rx_mbuf_alloc_failed++;
+				break;
+			}
+
+			nmb->next = nmb_pay;
+			nmb_pay->next = NULL;
 
-		/* calculate rx_packet_len of the received pkt */
-		rx_packet_len = (rte_le_to_cpu_16(rxd.wb.pkt_len) &
-				 ICE_RX_FLX_DESC_PKT_LEN_M) - rxq->crc_len;
+			/**
+			 * fill the descriptor with the physical addresses
+			 * of the new mbufs: nmb (header), nmb_pay (payload)
+			 */
+			rxdp->read.hdr_addr = dma_addr;
+			rxdp->read.pkt_addr = rte_cpu_to_le_64(rte_mbuf_data_iova_default(nmb_pay));
+		} else {
+			/**
+			 * fill the read format of the descriptor with the
+			 * physical address of the new allocated mbuf: nmb
+			 */
+			rxdp->read.hdr_addr = 0;
+			rxdp->read.pkt_addr = dma_addr;
+		}
 
 		/* fill old mbuf with received descriptor: rxd */
 		rxm->data_off = RTE_PKTMBUF_HEADROOM;
 		rte_prefetch0(RTE_PTR_ADD(rxm->buf_addr, RTE_PKTMBUF_HEADROOM));
-		rxm->nb_segs = 1;
-		rxm->next = NULL;
-		rxm->pkt_len = rx_packet_len;
-		rxm->data_len = rx_packet_len;
+		if (rxq->offloads & RTE_ETH_RX_OFFLOAD_HEADER_SPLIT) {
+			rxm->nb_segs = (uint16_t)(rxm->nb_segs + rxm->next->nb_segs);
+			rxm->next->next = NULL;
+			/* calculate header and packet length of the received pkt */
+			rx_header_len = rte_le_to_cpu_16(rxd.wb.hdr_len_sph_flex_flags1) &
+					ICE_RX_FLEX_DESC_HEADER_LEN_M;
+			rx_packet_len = (rte_le_to_cpu_16(rxd.wb.pkt_len) &
+					ICE_RX_FLX_DESC_PKT_LEN_M) - rxq->crc_len;
+			rxm->data_len = rx_header_len;
+			rxm->pkt_len = rx_header_len + rx_packet_len;
+			rxm->next->data_len = rx_packet_len;
+		} else {
+			rxm->nb_segs = 1;
+			rxm->next = NULL;
+			/* calculate rx_packet_len of the received pkt */
+			rx_packet_len = (rte_le_to_cpu_16(rxd.wb.pkt_len) &
+					ICE_RX_FLX_DESC_PKT_LEN_M) - rxq->crc_len;
+			rxm->data_len = rx_packet_len;
+			rxm->pkt_len = rx_packet_len;
+		}
 		rxm->port = rxq->port_id;
 		rxm->packet_type = ptype_tbl[ICE_RX_FLEX_DESC_PTYPE_M &
 			rte_le_to_cpu_16(rxd.wb.ptype_flex_flags0)];
diff --git a/drivers/net/ice/ice_rxtx.h b/drivers/net/ice/ice_rxtx.h
index bb18a01951..611dbc8503 100644
--- a/drivers/net/ice/ice_rxtx.h
+++ b/drivers/net/ice/ice_rxtx.h
@@ -16,6 +16,9 @@
 #define ICE_RX_MAX_BURST 32
 #define ICE_TX_MAX_BURST 32
 
+/* Maximal number of segments to split. */
+#define ICE_RX_MAX_NSEG 2
+
 #define ICE_CHK_Q_ENA_COUNT        100
 #define ICE_CHK_Q_ENA_INTERVAL_US  100
 
@@ -43,6 +46,11 @@
 extern uint64_t ice_timestamp_dynflag;
 extern int ice_timestamp_dynfield_offset;
 
+/* Max header size can be 2K - 64 bytes */
+#define ICE_RX_HDR_BUF_SIZE    (2048 - 64)
+
+#define ICE_HEADER_SPLIT_ENA   BIT(0)
+
 typedef void (*ice_rx_release_mbufs_t)(struct ice_rx_queue *rxq);
 typedef void (*ice_tx_release_mbufs_t)(struct ice_tx_queue *txq);
 typedef void (*ice_rxd_to_pkt_fields_t)(struct ice_rx_queue *rxq,
@@ -53,6 +61,12 @@ struct ice_rx_entry {
 	struct rte_mbuf *mbuf;
 };
 
+enum ice_rx_dtype {
+	ICE_RX_DTYPE_NO_SPLIT       = 0,
+	ICE_RX_DTYPE_HEADER_SPLIT   = 1,
+	ICE_RX_DTYPE_SPLIT_ALWAYS   = 2,
+};
+
 struct ice_rx_queue {
 	struct rte_mempool *mp; /* mbuf pool to populate RX ring */
 	volatile union ice_rx_flex_desc *rx_ring;/* RX ring virtual address */
@@ -95,6 +109,8 @@ struct ice_rx_queue {
 	uint32_t time_high;
 	uint32_t hw_register_set;
 	const struct rte_memzone *mz;
+	struct rte_eth_rxseg_split rxseg[ICE_RX_MAX_NSEG];
+	uint32_t rxseg_nb;
 };
 
 struct ice_tx_entry {
diff --git a/drivers/net/ice/ice_rxtx_vec_common.h b/drivers/net/ice/ice_rxtx_vec_common.h
index 2dd2d83650..7a155a66f2 100644
--- a/drivers/net/ice/ice_rxtx_vec_common.h
+++ b/drivers/net/ice/ice_rxtx_vec_common.h
@@ -291,6 +291,9 @@ ice_rx_vec_queue_default(struct ice_rx_queue *rxq)
 	if (rxq->offloads & RTE_ETH_RX_OFFLOAD_TIMESTAMP)
 		return -1;
 
+	if (rxq->offloads & RTE_ETH_RX_OFFLOAD_HEADER_SPLIT)
+		return -1;
+
 	if (rxq->offloads & ICE_RX_VECTOR_OFFLOAD)
 		return ICE_VECTOR_OFFLOAD_PATH;
 
-- 
2.17.1


^ permalink raw reply	[flat|nested] 88+ messages in thread

* RE: [RFC,v3 1/3] ethdev: introduce protocol type based header split
  2022-03-29  6:49   ` [RFC,v3 1/3] " xuan.ding
@ 2022-03-29  7:56     ` Zhang, Qi Z
  2022-03-29  8:18       ` Ding, Xuan
  0 siblings, 1 reply; 88+ messages in thread
From: Zhang, Qi Z @ 2022-03-29  7:56 UTC (permalink / raw)
  To: Ding, Xuan, thomas, Yigit, Ferruh, andrew.rybchenko
  Cc: dev, stephen, mb, viacheslavo, Yu, Ping, Wu, WenxuanX, Wang, YuanX



> -----Original Message-----
> From: Ding, Xuan <xuan.ding@intel.com>
> Sent: Tuesday, March 29, 2022 2:50 PM
> To: thomas@monjalon.net; Yigit, Ferruh <ferruh.yigit@intel.com>;
> andrew.rybchenko@oktetlabs.ru
> Cc: dev@dpdk.org; stephen@networkplumber.org;
> mb@smartsharesystems.com; viacheslavo@nvidia.com; Zhang, Qi Z
> <qi.z.zhang@intel.com>; Yu, Ping <ping.yu@intel.com>; Wu, WenxuanX
> <wenxuanx.wu@intel.com>; Ding, Xuan <xuan.ding@intel.com>; Wang,
> YuanX <yuanx.wang@intel.com>
> Subject: [RFC,v3 1/3] ethdev: introduce protocol type based header split
> 
> From: Xuan Ding <xuan.ding@intel.com>
> 
> Header split consists of splitting a received packet into two separate regions
> based on the packet content. The split happens after the packet header and
> before the packet payload. Splitting is usually between the packet header
> that can be posted to a dedicated buffer and the packet payload that can be
> posted to a different buffer.
> 
> Currently, Rx buffer split supports length and offset based packet split.
> Although header split is a subset of buffer split, configuring buffer split based
> on length is not suitable for NICs that do split based on header protocol types.
> Because tunneling makes the conversion from length to protocol type
> impossible.
> 
> This patch extends the current buffer split to support protocol type and
> offset based header split. A new proto field is introduced in the
> rte_eth_rxseg_split structure reserved field to specify header protocol type.
> With Rx offload flag RTE_ETH_RX_OFFLOAD_HEADER_SPLIT enabled and
> protocol type configured, PMD will split the ingress packets into two separate
> regions. Currently, both inner and outer L2/L3/L4 level header split can be
> supported.
> 
> For example, let's suppose we configured the Rx queue with the following
> segments:
>     seg0 - pool0, off0=2B
>     seg1 - pool1, off1=128B
> 
> With header split type configured with RTE_ETH_RX_HEADER_SPLIT_UDP,
> the packet consists of MAC_IP_UDP_PAYLOAD will be split like following:
>     seg0 - udp header @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from pool0
>     seg1 - payload @ 128 in mbuf from pool1
> 
> The memory attributes for the split parts may differ either - for example the
> mempool0 and mempool1 belong to dpdk memory and external memory,
> respectively.
> 
> Signed-off-by: Xuan Ding <xuan.ding@intel.com>
> Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
> ---
>  lib/ethdev/rte_ethdev.c | 34 ++++++++++++++++++++++-------
>  lib/ethdev/rte_ethdev.h | 48 +++++++++++++++++++++++++++++++++++++++--
>  2 files changed, 72 insertions(+), 10 deletions(-)
> 
> diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
> index 29a3d80466..144a43588c 100644
> --- a/lib/ethdev/rte_ethdev.c
> +++ b/lib/ethdev/rte_ethdev.c
> @@ -1661,6 +1661,7 @@ rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
>  		struct rte_mempool *mpl = rx_seg[seg_idx].mp;
>  		uint32_t length = rx_seg[seg_idx].length;
>  		uint32_t offset = rx_seg[seg_idx].offset;
> +		uint16_t proto = rx_seg[seg_idx].proto;
> 
>  		if (mpl == NULL) {
>  			RTE_ETHDEV_LOG(ERR, "null mempool pointer\n");
> @@ -1694,13 +1695,29 @@ rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
>  		}
>  		offset += seg_idx != 0 ? 0 : RTE_PKTMBUF_HEADROOM;
>  		*mbp_buf_size = rte_pktmbuf_data_room_size(mpl);
> -		length = length != 0 ? length : *mbp_buf_size;
> -		if (*mbp_buf_size < length + offset) {
> -			RTE_ETHDEV_LOG(ERR,
> -				       "%s mbuf_data_room_size %u < %u (segment length=%u + segment offset=%u)\n",
> -				       mpl->name, *mbp_buf_size,
> -				       length + offset, length, offset);
> -			return -EINVAL;
> +		if (proto == 0) {

 use RTE_ETH_RX_HEADER_SPLIT_NONE looks better?
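
i.e. something like:

	if (proto == RTE_ETH_RX_HEADER_SPLIT_NONE) {
		/* Check buffer split. */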

Reviewed-by: Qi Zhang <qi.z.zhang@intel.com>




^ permalink raw reply	[flat|nested] 88+ messages in thread

* RE: [RFC,v3 1/3] ethdev: introduce protocol type based header split
  2022-03-29  7:56     ` Zhang, Qi Z
@ 2022-03-29  8:18       ` Ding, Xuan
  0 siblings, 0 replies; 88+ messages in thread
From: Ding, Xuan @ 2022-03-29  8:18 UTC (permalink / raw)
  To: Zhang, Qi Z, thomas, andrew.rybchenko
  Cc: dev, stephen, mb, viacheslavo, Yu, Ping, Wu, WenxuanX, Wang, YuanX

Hi Qi,

> -----Original Message-----
> From: Zhang, Qi Z <qi.z.zhang@intel.com>
> Sent: Tuesday, March 29, 2022 3:56 PM
> To: Ding, Xuan <xuan.ding@intel.com>; thomas@monjalon.net; Yigit, Ferruh
> <ferruh.yigit@intel.com>; andrew.rybchenko@oktetlabs.ru
> Cc: dev@dpdk.org; stephen@networkplumber.org;
> mb@smartsharesystems.com; viacheslavo@nvidia.com; Yu, Ping
> <ping.yu@intel.com>; Wu, WenxuanX <wenxuanx.wu@intel.com>; Wang,
> YuanX <yuanx.wang@intel.com>
> Subject: RE: [RFC,v3 1/3] ethdev: introduce protocol type based header split
> 
> 
> 
> > -----Original Message-----
> > From: Ding, Xuan <xuan.ding@intel.com>
> > Sent: Tuesday, March 29, 2022 2:50 PM
> > To: thomas@monjalon.net; Yigit, Ferruh <ferruh.yigit@intel.com>;
> > andrew.rybchenko@oktetlabs.ru
> > Cc: dev@dpdk.org; stephen@networkplumber.org;
> > mb@smartsharesystems.com; viacheslavo@nvidia.com; Zhang, Qi Z
> > <qi.z.zhang@intel.com>; Yu, Ping <ping.yu@intel.com>; Wu, WenxuanX
> > <wenxuanx.wu@intel.com>; Ding, Xuan <xuan.ding@intel.com>; Wang,
> YuanX
> > <yuanx.wang@intel.com>
> > Subject: [RFC,v3 1/3] ethdev: introduce protocol type based header
> > split
> >
> > From: Xuan Ding <xuan.ding@intel.com>
> >
> > Header split consists of splitting a received packet into two separate
> > regions based on the packet content. The split happens after the
> > packet header and before the packet payload. Splitting is usually
> > between the packet header that can be posted to a dedicated buffer and
> > the packet payload that can be posted to a different buffer.
> >
> > Currently, Rx buffer split supports length and offset based packet split.
> > Although header split is a subset of buffer split, configuring buffer
> > split based on length is not suitable for NICs that do split based on
> > header protocol types, because tunneling makes the conversion from
> > length to protocol type impossible.
> >
> > This patch extends the current buffer split to support protocol type
> > and offset based header split. A new proto field is introduced in the
> > rte_eth_rxseg_split structure reserved field to specify header protocol type.
> > With Rx offload flag RTE_ETH_RX_OFFLOAD_HEADER_SPLIT enabled and
> > protocol type configured, PMD will split the ingress packets into two
> > separate regions. Currently, both inner and outer L2/L3/L4 level
> > header split can be supported.
> >
> > For example, let's suppose we configured the Rx queue with the
> > following segments:
> >     seg0 - pool0, off0=2B
> >     seg1 - pool1, off1=128B
> >
> > With header split type configured with RTE_ETH_RX_HEADER_SPLIT_UDP,
> > a packet consisting of MAC_IP_UDP_PAYLOAD will be split as follows:
> >     seg0 - udp header @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from pool0
> >     seg1 - payload @ 128 in mbuf from pool1
> >
> > The memory attributes of the split parts may also differ - for example,
> > mempool0 and mempool1 may belong to DPDK memory and external memory,
> > respectively.
> >
> > Signed-off-by: Xuan Ding <xuan.ding@intel.com>
> > Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
> > ---
> >  lib/ethdev/rte_ethdev.c | 34 ++++++++++++++++++++++-------
> >  lib/ethdev/rte_ethdev.h | 48 +++++++++++++++++++++++++++++++++++++++--
> >  2 files changed, 72 insertions(+), 10 deletions(-)
> >
> > diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
> > index 29a3d80466..144a43588c 100644
> > --- a/lib/ethdev/rte_ethdev.c
> > +++ b/lib/ethdev/rte_ethdev.c
> > @@ -1661,6 +1661,7 @@ rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
> >  		struct rte_mempool *mpl = rx_seg[seg_idx].mp;
> >  		uint32_t length = rx_seg[seg_idx].length;
> >  		uint32_t offset = rx_seg[seg_idx].offset;
> > +		uint16_t proto = rx_seg[seg_idx].proto;
> >
> >  		if (mpl == NULL) {
> >  			RTE_ETHDEV_LOG(ERR, "null mempool pointer\n");
> > @@ -1694,13 +1695,29 @@ rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
> >  		}
> >  		offset += seg_idx != 0 ? 0 : RTE_PKTMBUF_HEADROOM;
> >  		*mbp_buf_size = rte_pktmbuf_data_room_size(mpl);
> > -		length = length != 0 ? length : *mbp_buf_size;
> > -		if (*mbp_buf_size < length + offset) {
> > -			RTE_ETHDEV_LOG(ERR,
> > -				       "%s mbuf_data_room_size %u < %u (segment length=%u + segment offset=%u)\n",
> > -				       mpl->name, *mbp_buf_size,
> > -				       length + offset, length, offset);
> > -			return -EINVAL;
> > +		if (proto == 0) {
> 
>  Would using RTE_ETH_RX_HEADER_SPLIT_NONE look better?

Yes, it is better to use RTE_ETH_RX_HEADER_SPLIT_NONE here.
Will fix it in the next version.

Thanks,
Xuan

> 
> Reviewed-by: Qi Zhang <qi.z.zhang@intel.com>
> 
> 
> 


^ permalink raw reply	[flat|nested] 88+ messages in thread

* [v4 0/3] ethdev: introduce protocol type based header split
  2022-03-03  6:01 [RFC] ethdev: introduce protocol type based header split xuan.ding
                   ` (3 preceding siblings ...)
  2022-03-29  6:49 ` [RFC,v3 0/3] ethdev: introduce protocol type based header split xuan.ding
@ 2022-04-02 10:41 ` wenxuanx.wu
  2022-04-02 10:41   ` [v4 1/3] " wenxuanx.wu
                     ` (2 more replies)
  2022-05-27  7:54 ` [PATCH v6] ethdev: introduce protocol header based buffer split xuan.ding
                   ` (5 subsequent siblings)
  10 siblings, 3 replies; 88+ messages in thread
From: wenxuanx.wu @ 2022-04-02 10:41 UTC (permalink / raw)
  To: thomas, andrew.rybchenko, xiaoyun.li, aman.deep.singh,
	yuying.zhang, qi.z.zhang
  Cc: dev, stephen, mb, viacheslavo, ping.yu, Xuan Ding

From: Xuan Ding <xuan.ding@intel.com>

Header split consists of splitting a received packet into two separate
regions based on the packet content. It is useful in some scenarios,
such as GPU acceleration. The splitting will help to enable true zero
copy and hence improve the performance significantly.

This patchset extends the current buffer split to support protocol based
header split. When an Rx queue is configured with the header split feature,
received packets will be split directly into two different mempools.
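
Below is a minimal sketch of how an application could configure an Rx
queue with the API proposed in this patchset. The mempool arguments, the
descriptor count and the choice of UDP as the split point are assumptions
for illustration only:

    #include <string.h>
    #include <rte_lcore.h>
    #include <rte_ethdev.h>

    /* Split at the UDP header: headers go to hdr_mp (seg0), payloads
     * to pay_mp (seg1). */
    static int
    setup_hdr_split_queue(uint16_t port_id, uint16_t queue_id,
                          struct rte_mempool *hdr_mp,
                          struct rte_mempool *pay_mp)
    {
        union rte_eth_rxseg rx_useg[2];
        struct rte_eth_rxconf rxconf;

        memset(rx_useg, 0, sizeof(rx_useg));
        memset(&rxconf, 0, sizeof(rxconf));

        rx_useg[0].split.mp = hdr_mp;
        rx_useg[0].split.proto = RTE_ETH_RX_HEADER_SPLIT_UDP;
        rx_useg[1].split.mp = pay_mp;   /* length/proto stay zero */

        rxconf.rx_seg = rx_useg;
        rxconf.rx_nseg = 2;
        rxconf.offloads = RTE_ETH_RX_OFFLOAD_HEADER_SPLIT;

        /* mb_pool must be NULL when rx_seg/rx_nseg are used */
        return rte_eth_rx_queue_setup(port_id, queue_id, 512,
                                      rte_socket_id(), &rxconf, NULL);
    }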

v3->v4:
* Use RTE_ETH_RX_HEADER_SPLIT_NONE instead of 0.

v2->v3:
* Fix a PMD bug.
* Add rx queue header split check.
* Revise the log and doc.

v1->v2:
* Add support for all header split protocol types.

Xuan Ding (3):
  ethdev: introduce protocol type based header split
  app/testpmd: add header split configuration
  net/ice: support header split in Rx data path

 app/test-pmd/cmdline.c                | 117 ++++++++++++++
 app/test-pmd/testpmd.c                |   6 +-
 app/test-pmd/testpmd.h                |   2 +
 drivers/net/ice/ice_ethdev.c          |  10 +-
 drivers/net/ice/ice_rxtx.c            | 223 ++++++++++++++++++++++----
 drivers/net/ice/ice_rxtx.h            |  16 ++
 drivers/net/ice/ice_rxtx_vec_common.h |   3 +
 lib/ethdev/rte_ethdev.c               |  34 +++-
 lib/ethdev/rte_ethdev.h               |  48 +++++-
 9 files changed, 417 insertions(+), 42 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 88+ messages in thread

* [v4 1/3] ethdev: introduce protocol type based header split
  2022-04-02 10:41 ` [v4 0/3] ethdev: introduce protocol type based header split wenxuanx.wu
@ 2022-04-02 10:41   ` wenxuanx.wu
  2022-04-07 10:47     ` Andrew Rybchenko
                       ` (2 more replies)
  2022-04-02 10:41   ` [v4 2/3] app/testpmd: add header split configuration wenxuanx.wu
  2022-04-02 10:41   ` [v4 3/3] net/ice: support header split in Rx data path wenxuanx.wu
  2 siblings, 3 replies; 88+ messages in thread
From: wenxuanx.wu @ 2022-04-02 10:41 UTC (permalink / raw)
  To: thomas, andrew.rybchenko, xiaoyun.li, aman.deep.singh,
	yuying.zhang, qi.z.zhang
  Cc: dev, stephen, mb, viacheslavo, ping.yu, Xuan Ding, Yuan Wang, Wenxuan Wu

From: Xuan Ding <xuan.ding@intel.com>

Header split consists of splitting a received packet into two separate
regions based on the packet content. The split happens after the
packet header and before the packet payload. Splitting is usually between
the packet header that can be posted to a dedicated buffer and the packet
payload that can be posted to a different buffer.

Currently, Rx buffer split supports length and offset based packet split.
Although header split is a subset of buffer split, configuring buffer
split based on length is not suitable for NICs that do split based on
header protocol types, because tunneling makes the conversion from length
to protocol type impossible.

This patch extends the current buffer split to support protocol type and
offset based header split. A new proto field is introduced in the
rte_eth_rxseg_split structure reserved field to specify header protocol
type. With Rx offload flag RTE_ETH_RX_OFFLOAD_HEADER_SPLIT enabled and
protocol type configured, PMD will split the ingress packets into two
separate regions. Currently, both inner and outer L2/L3/L4 level header
split can be supported.

For example, let's suppose we configured the Rx queue with the
following segments:
    seg0 - pool0, off0=2B
    seg1 - pool1, off1=128B

With header split type configured with RTE_ETH_RX_HEADER_SPLIT_UDP,
a packet consisting of MAC_IP_UDP_PAYLOAD will be split as follows:
    seg0 - udp header @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from pool0
    seg1 - payload @ 128 in mbuf from pool1

The memory attributes of the split parts may also differ - for example,
mempool0 and mempool1 may belong to DPDK memory and external memory,
respectively.
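
As an illustrative sketch of consuming such a header-split packet (the
process_header()/process_payload() hooks are hypothetical application
code, not part of this patch):

    struct rte_mbuf *hdr = pkts[i];    /* seg0: header, from pool0 */
    struct rte_mbuf *pay = hdr->next;  /* seg1: payload, from pool1 */

    /* The PMD sets hdr->data_len to the parsed header length and
     * pay->data_len to the payload length; hdr->pkt_len is the sum. */
    process_header(rte_pktmbuf_mtod(hdr, void *), hdr->data_len);
    process_payload(rte_pktmbuf_mtod(pay, void *), pay->data_len);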

Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
Reviewed-by: Qi Zhang <qi.z.zhang@intel.com>
---
 lib/ethdev/rte_ethdev.c | 34 ++++++++++++++++++++++-------
 lib/ethdev/rte_ethdev.h | 48 +++++++++++++++++++++++++++++++++++++++--
 2 files changed, 72 insertions(+), 10 deletions(-)

diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
index 29a3d80466..29adcdc2f0 100644
--- a/lib/ethdev/rte_ethdev.c
+++ b/lib/ethdev/rte_ethdev.c
@@ -1661,6 +1661,7 @@ rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
 		struct rte_mempool *mpl = rx_seg[seg_idx].mp;
 		uint32_t length = rx_seg[seg_idx].length;
 		uint32_t offset = rx_seg[seg_idx].offset;
+		uint16_t proto = rx_seg[seg_idx].proto;
 
 		if (mpl == NULL) {
 			RTE_ETHDEV_LOG(ERR, "null mempool pointer\n");
@@ -1694,13 +1695,29 @@ rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
 		}
 		offset += seg_idx != 0 ? 0 : RTE_PKTMBUF_HEADROOM;
 		*mbp_buf_size = rte_pktmbuf_data_room_size(mpl);
-		length = length != 0 ? length : *mbp_buf_size;
-		if (*mbp_buf_size < length + offset) {
-			RTE_ETHDEV_LOG(ERR,
-				       "%s mbuf_data_room_size %u < %u (segment length=%u + segment offset=%u)\n",
-				       mpl->name, *mbp_buf_size,
-				       length + offset, length, offset);
-			return -EINVAL;
+		if (proto == RTE_ETH_RX_HEADER_SPLIT_NONE) {
+			/* Check buffer split. */
+			length = length != 0 ? length : *mbp_buf_size;
+			if (*mbp_buf_size < length + offset) {
+				RTE_ETHDEV_LOG(ERR,
+					"%s mbuf_data_room_size %u < %u (segment length=%u + segment offset=%u)\n",
+					mpl->name, *mbp_buf_size,
+					length + offset, length, offset);
+				return -EINVAL;
+			}
+		} else {
+			/* Check header split. */
+			if (length != 0) {
+				RTE_ETHDEV_LOG(ERR, "segment length should be set to zero in header split\n");
+				return -EINVAL;
+			}
+			if (*mbp_buf_size < offset) {
+				RTE_ETHDEV_LOG(ERR,
+					"%s mbuf_data_room_size %u < %u (segment offset)\n",
+					mpl->name, *mbp_buf_size,
+					offset);
+				return -EINVAL;
+			}
 		}
 	}
 	return 0;
@@ -1778,7 +1795,8 @@ rte_eth_rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
 		rx_seg = (const struct rte_eth_rxseg_split *)rx_conf->rx_seg;
 		n_seg = rx_conf->rx_nseg;
 
-		if (rx_conf->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) {
+		if (rx_conf->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT ||
+			rx_conf->offloads & RTE_ETH_RX_OFFLOAD_HEADER_SPLIT) {
 			ret = rte_eth_rx_queue_check_split(rx_seg, n_seg,
 							   &mbp_buf_size,
 							   &dev_info);
diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
index 04cff8ee10..e8371b98ed 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -1197,12 +1197,31 @@ struct rte_eth_txmode {
  *     - pool from the last valid element
  *     - the buffer size from this pool
  *     - zero offset
+ *
+ * Header split is a subset of buffer split. The split happens after the
+ * packet header and before the packet payload. For PMDs that do not
+ * support header split configuration by length, the location of the split
+ * needs to be specified by the header protocol type. For buffer split,
+ * this field should not be configured.
+ *
+ * If RTE_ETH_RX_OFFLOAD_HEADER_SPLIT flag is set in offloads field,
+ * the PMD will split the received packets into two separate regions:
+ * - The header buffer will be allocated from the memory pool specified
+ *   in the first array element, and the payload buffer from the pool
+ *   in the second element.
+ *
+ * - The lengths do not need to be configured in header split.
+ *
+ * - The offsets from the segment description elements specify
+ *   the data offset from the buffer beginning, except for the first mbuf,
+ *   whose offset has RTE_PKTMBUF_HEADROOM added to it.
  */
 struct rte_eth_rxseg_split {
 	struct rte_mempool *mp; /**< Memory pool to allocate segment from. */
 	uint16_t length; /**< Segment data length, configures split point. */
 	uint16_t offset; /**< Data offset from beginning of mbuf data buffer. */
-	uint32_t reserved; /**< Reserved field. */
+	uint16_t proto; /**< header protocol type, configures header split point. */
+	uint16_t reserved; /**< Reserved field. */
 };
 
 /**
@@ -1212,7 +1231,7 @@ struct rte_eth_rxseg_split {
  * A common structure used to describe Rx packet segment properties.
  */
 union rte_eth_rxseg {
-	/* The settings for buffer split offload. */
+	/* The settings for buffer split and header split offload. */
 	struct rte_eth_rxseg_split split;
 	/* The other features settings should be added here. */
 };
@@ -1664,6 +1683,31 @@ struct rte_eth_conf {
 			     RTE_ETH_RX_OFFLOAD_QINQ_STRIP)
 #define DEV_RX_OFFLOAD_VLAN RTE_DEPRECATED(DEV_RX_OFFLOAD_VLAN) RTE_ETH_RX_OFFLOAD_VLAN
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this enum may change without prior notice.
+ * This enum indicates the header split protocol type
+ */
+enum rte_eth_rx_header_split_protocol_type {
+	RTE_ETH_RX_HEADER_SPLIT_NONE = 0,
+	RTE_ETH_RX_HEADER_SPLIT_MAC,
+	RTE_ETH_RX_HEADER_SPLIT_IPV4,
+	RTE_ETH_RX_HEADER_SPLIT_IPV6,
+	RTE_ETH_RX_HEADER_SPLIT_L3,
+	RTE_ETH_RX_HEADER_SPLIT_TCP,
+	RTE_ETH_RX_HEADER_SPLIT_UDP,
+	RTE_ETH_RX_HEADER_SPLIT_SCTP,
+	RTE_ETH_RX_HEADER_SPLIT_L4,
+	RTE_ETH_RX_HEADER_SPLIT_INNER_MAC,
+	RTE_ETH_RX_HEADER_SPLIT_INNER_IPV4,
+	RTE_ETH_RX_HEADER_SPLIT_INNER_IPV6,
+	RTE_ETH_RX_HEADER_SPLIT_INNER_L3,
+	RTE_ETH_RX_HEADER_SPLIT_INNER_TCP,
+	RTE_ETH_RX_HEADER_SPLIT_INNER_UDP,
+	RTE_ETH_RX_HEADER_SPLIT_INNER_SCTP,
+	RTE_ETH_RX_HEADER_SPLIT_INNER_L4,
+};
+
 /*
  * If new Rx offload capabilities are defined, they also must be
  * mentioned in rte_rx_offload_names in rte_ethdev.c file.
-- 
2.17.1


^ permalink raw reply	[flat|nested] 88+ messages in thread

* [v4 2/3] app/testpmd: add header split configuration
  2022-04-02 10:41 ` [v4 0/3] ethdev: introduce protocol type based header split wenxuanx.wu
  2022-04-02 10:41   ` [v4 1/3] " wenxuanx.wu
@ 2022-04-02 10:41   ` wenxuanx.wu
  2022-04-02 10:41   ` [v4 3/3] net/ice: support header split in Rx data path wenxuanx.wu
  2 siblings, 0 replies; 88+ messages in thread
From: wenxuanx.wu @ 2022-04-02 10:41 UTC (permalink / raw)
  To: thomas, andrew.rybchenko, xiaoyun.li, aman.deep.singh,
	yuying.zhang, qi.z.zhang
  Cc: dev, stephen, mb, viacheslavo, ping.yu, Xuan Ding, Yuan Wang, Wenxuan Wu

From: Xuan Ding <xuan.ding@intel.com>

This patch adds header split configuration in testpmd. The header split
feature is off by default. To enable header split, you need to:
1. Configure the Rx queue with rx_offload header_split on.
2. Set the protocol type of header split.

Command to set the header split protocol type:
testpmd> port config <port_id> header_split mac|ipv4|ipv6|l3|tcp|udp|sctp|
		    l4|inner_mac|inner_ipv4|inner_ipv6|inner_l3|inner_tcp|
		    inner_udp|inner_sctp|inner_l4
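
For example, a possible interactive sequence to enable UDP header split
on port 0 (the multi-segment companion command is an assumption and may
differ with the testpmd version):

    testpmd> port stop 0
    testpmd> port config 0 rx_offload header_split on
    testpmd> set rxpkts 0,0
    testpmd> port config 0 header_split udp
    testpmd> port start 0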

Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
---
 app/test-pmd/cmdline.c | 117 +++++++++++++++++++++++++++++++++++++++++
 app/test-pmd/testpmd.c |   6 ++-
 app/test-pmd/testpmd.h |   2 +
 3 files changed, 124 insertions(+), 1 deletion(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 6ffea8e21a..abda81b4bc 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -866,6 +866,12 @@ static void cmd_help_long_parsed(void *parsed_result,
 			"     Enable or disable a per port Rx offloading"
 			" on all Rx queues of a port\n\n"
 
+			"port config <port_id> header_split mac|ipv4|ipv6|l3|tcp|udp|sctp|l4|"
+			"inner_mac|inner_ipv4|inner_ipv6|inner_l3|inner_tcp|"
+			"inner_udp|inner_sctp|inner_l4\n"
+			"     Configure protocol for header split"
+			" on all Rx queues of a port\n\n"
+
 			"port (port_id) rxq (queue_id) rx_offload vlan_strip|"
 			"ipv4_cksum|udp_cksum|tcp_cksum|tcp_lro|qinq_strip|"
 			"outer_ipv4_cksum|macsec_strip|header_split|"
@@ -16353,6 +16359,116 @@ cmdline_parse_inst_t cmd_config_per_port_rx_offload = {
 	}
 };
 
+/* config a per port header split protocol */
+struct cmd_config_per_port_headersplit_protocol_result {
+	cmdline_fixed_string_t port;
+	cmdline_fixed_string_t config;
+	uint16_t port_id;
+	cmdline_fixed_string_t headersplit;
+	cmdline_fixed_string_t protocol;
+};
+
+cmdline_parse_token_string_t cmd_config_per_port_headersplit_protocol_result_port =
+	TOKEN_STRING_INITIALIZER
+		(struct cmd_config_per_port_headersplit_protocol_result,
+		 port, "port");
+cmdline_parse_token_string_t cmd_config_per_port_headersplit_protocol_result_config =
+	TOKEN_STRING_INITIALIZER
+		(struct cmd_config_per_port_headersplit_protocol_result,
+		 config, "config");
+cmdline_parse_token_num_t cmd_config_per_port_headersplit_protocol_result_port_id =
+	TOKEN_NUM_INITIALIZER
+		(struct cmd_config_per_port_headersplit_protocol_result,
+		 port_id, RTE_UINT16);
+cmdline_parse_token_string_t cmd_config_per_port_headersplit_protocol_result_headersplit =
+	TOKEN_STRING_INITIALIZER
+		(struct cmd_config_per_port_headersplit_protocol_result,
+		 headersplit, "header_split");
+cmdline_parse_token_string_t cmd_config_per_port_headersplit_protocol_result_protocol =
+	TOKEN_STRING_INITIALIZER
+		(struct cmd_config_per_port_headersplit_protocol_result,
+		 protocol, "mac#ipv4#ipv6#l3#tcp#udp#sctp#l4#"
+			   "inner_mac#inner_ipv4#inner_ipv6#inner_l3#inner_tcp#"
+			   "inner_udp#inner_sctp#inner_l4");
+
+static void
+cmd_config_per_port_headersplit_protocol_parsed(void *parsed_result,
+				__rte_unused struct cmdline *cl,
+				__rte_unused void *data)
+{
+	struct cmd_config_per_port_headersplit_protocol_result *res = parsed_result;
+	portid_t port_id = res->port_id;
+	struct rte_port *port = &ports[port_id];
+	uint16_t protocol;
+
+	if (port_id_is_invalid(port_id, ENABLED_WARN))
+		return;
+
+	if (port->port_status != RTE_PORT_STOPPED) {
+		fprintf(stderr,
+			"Error: Can't config offload when Port %d is not stopped\n",
+			port_id);
+		return;
+	}
+
+	if (!strcmp(res->protocol, "mac"))
+		protocol = RTE_ETH_RX_HEADER_SPLIT_MAC;
+	else if (!strcmp(res->protocol, "ipv4"))
+		protocol = RTE_ETH_RX_HEADER_SPLIT_IPV4;
+	else if (!strcmp(res->protocol, "ipv6"))
+		protocol = RTE_ETH_RX_HEADER_SPLIT_IPV6;
+	else if (!strcmp(res->protocol, "l3"))
+		protocol = RTE_ETH_RX_HEADER_SPLIT_L3;
+	else if (!strcmp(res->protocol, "tcp"))
+		protocol = RTE_ETH_RX_HEADER_SPLIT_TCP;
+	else if (!strcmp(res->protocol, "udp"))
+		protocol = RTE_ETH_RX_HEADER_SPLIT_UDP;
+	else if (!strcmp(res->protocol, "sctp"))
+		protocol = RTE_ETH_RX_HEADER_SPLIT_SCTP;
+	else if (!strcmp(res->protocol, "l4"))
+		protocol = RTE_ETH_RX_HEADER_SPLIT_L4;
+	else if (!strcmp(res->protocol, "inner_mac"))
+		protocol = RTE_ETH_RX_HEADER_SPLIT_INNER_MAC;
+	else if (!strcmp(res->protocol, "inner_ipv4"))
+		protocol = RTE_ETH_RX_HEADER_SPLIT_INNER_IPV4;
+	else if (!strcmp(res->protocol, "inner_ipv6"))
+		protocol = RTE_ETH_RX_HEADER_SPLIT_INNER_IPV6;
+	else if (!strcmp(res->protocol, "inner_l3"))
+		protocol = RTE_ETH_RX_HEADER_SPLIT_INNER_L3;
+	else if (!strcmp(res->protocol, "inner_tcp"))
+		protocol = RTE_ETH_RX_HEADER_SPLIT_INNER_TCP;
+	else if (!strcmp(res->protocol, "inner_udp"))
+		protocol = RTE_ETH_RX_HEADER_SPLIT_INNER_UDP;
+	else if (!strcmp(res->protocol, "inner_sctp"))
+		protocol = RTE_ETH_RX_HEADER_SPLIT_INNER_SCTP;
+	else if (!strcmp(res->protocol, "inner_l4"))
+		protocol = RTE_ETH_RX_HEADER_SPLIT_INNER_L4;
+	else {
+		fprintf(stderr, "Unknown protocol name: %s\n", res->protocol);
+		return;
+	}
+
+	rx_pkt_header_split_proto = protocol;
+
+	cmd_reconfig_device_queue(port_id, 1, 1);
+}
+
+cmdline_parse_inst_t cmd_config_per_port_headersplit_protocol = {
+	.f = cmd_config_per_port_headersplit_protocol_parsed,
+	.data = NULL,
+	.help_str = "port config <port_id> header_split mac|ipv4|ipv6|l3|tcp|udp|sctp|l4|"
+		    "inner_mac|inner_ipv4|inner_ipv6|inner_l3|inner_tcp|"
+		    "inner_udp|inner_sctp|inner_l4",
+	.tokens = {
+		(void *)&cmd_config_per_port_headersplit_protocol_result_port,
+		(void *)&cmd_config_per_port_headersplit_protocol_result_config,
+		(void *)&cmd_config_per_port_headersplit_protocol_result_port_id,
+		(void *)&cmd_config_per_port_headersplit_protocol_result_headersplit,
+		(void *)&cmd_config_per_port_headersplit_protocol_result_protocol,
+		NULL,
+	}
+};
+
 /* Enable/Disable a per queue offloading */
 struct cmd_config_per_queue_rx_offload_result {
 	cmdline_fixed_string_t port;
@@ -18071,6 +18187,7 @@ cmdline_parse_ctx_t main_ctx[] = {
 	(cmdline_parse_inst_t *)&cmd_rx_offload_get_capa,
 	(cmdline_parse_inst_t *)&cmd_rx_offload_get_configuration,
 	(cmdline_parse_inst_t *)&cmd_config_per_port_rx_offload,
+	(cmdline_parse_inst_t *)&cmd_config_per_port_headersplit_protocol,
 	(cmdline_parse_inst_t *)&cmd_config_per_queue_rx_offload,
 	(cmdline_parse_inst_t *)&cmd_tx_offload_get_capa,
 	(cmdline_parse_inst_t *)&cmd_tx_offload_get_configuration,
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index fe2ce19f99..a00fa0e236 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -253,6 +253,8 @@ uint8_t  tx_pkt_nb_segs = 1; /**< Number of segments in TXONLY packets */
 enum tx_pkt_split tx_pkt_split = TX_PKT_SPLIT_OFF;
 /**< Split policy for packets to TX. */
 
+uint8_t rx_pkt_header_split_proto;
+
 uint8_t txonly_multi_flow;
 /**< Whether multiple flows are generated in TXONLY mode. */
 
@@ -2568,7 +2570,8 @@ rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
 	int ret;
 
 	if (rx_pkt_nb_segs <= 1 ||
-	    (rx_conf->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) == 0) {
+	    (((rx_conf->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) == 0) &&
+	     ((rx_conf->offloads & RTE_ETH_RX_OFFLOAD_HEADER_SPLIT) == 0))) {
 		rx_conf->rx_seg = NULL;
 		rx_conf->rx_nseg = 0;
 		ret = rte_eth_rx_queue_setup(port_id, rx_queue_id,
@@ -2592,6 +2595,7 @@ rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
 		rx_seg->offset = i < rx_pkt_nb_offs ?
 				   rx_pkt_seg_offsets[i] : 0;
 		rx_seg->mp = mpx ? mpx : mp;
+		rx_seg->proto = rx_pkt_header_split_proto;
 	}
 	rx_conf->rx_nseg = rx_pkt_nb_segs;
 	rx_conf->rx_seg = rx_useg;
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index 31f766c965..021e2768be 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -557,6 +557,8 @@ enum tx_pkt_split {
 
 extern enum tx_pkt_split tx_pkt_split;
 
+extern uint8_t rx_pkt_header_split_proto;
+
 extern uint8_t txonly_multi_flow;
 
 extern uint32_t rxq_share;
-- 
2.17.1


^ permalink raw reply	[flat|nested] 88+ messages in thread

* [v4 3/3] net/ice: support header split in Rx data path
  2022-04-02 10:41 ` [v4 0/3] ethdev: introduce protocol type based header split wenxuanx.wu
  2022-04-02 10:41   ` [v4 1/3] " wenxuanx.wu
  2022-04-02 10:41   ` [v4 2/3] app/testpmd: add header split configuration wenxuanx.wu
@ 2022-04-02 10:41   ` wenxuanx.wu
  2 siblings, 0 replies; 88+ messages in thread
From: wenxuanx.wu @ 2022-04-02 10:41 UTC (permalink / raw)
  To: thomas, andrew.rybchenko, xiaoyun.li, aman.deep.singh,
	yuying.zhang, qi.z.zhang
  Cc: dev, stephen, mb, viacheslavo, ping.yu, Xuan Ding, Yuan Wang, Wenxuan Wu

From: Xuan Ding <xuan.ding@intel.com>

This patch adds support for header split in normal Rx data paths.
When the Rx queue is configured with header split for a specific
protocol type, received packets will be split directly into header
and payload parts, and the two parts will be put into different
mempools.

Currently, header split is not supported in vectorized paths.
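
A minimal sketch of creating the two mempools such a queue draws from
(element counts and buffer sizes are illustrative only; on ice the
header buffer is bounded by ICE_RX_HDR_BUF_SIZE):

    struct rte_mempool *hdr_mp, *pay_mp;

    hdr_mp = rte_pktmbuf_pool_create("hdr_mp", 4096, 256, 0,
            RTE_PKTMBUF_HEADROOM + 1024, rte_socket_id());
    pay_mp = rte_pktmbuf_pool_create("pay_mp", 4096, 256, 0,
            RTE_MBUF_DEFAULT_BUF_SIZE, rte_socket_id());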

Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
Reviewed-by: Qi Zhang <qi.z.zhang@intel.com>
---
 drivers/net/ice/ice_ethdev.c          |  10 +-
 drivers/net/ice/ice_rxtx.c            | 223 ++++++++++++++++++++++----
 drivers/net/ice/ice_rxtx.h            |  16 ++
 drivers/net/ice/ice_rxtx_vec_common.h |   3 +
 4 files changed, 221 insertions(+), 31 deletions(-)

diff --git a/drivers/net/ice/ice_ethdev.c b/drivers/net/ice/ice_ethdev.c
index 13adcf90ed..cb32265dbe 100644
--- a/drivers/net/ice/ice_ethdev.c
+++ b/drivers/net/ice/ice_ethdev.c
@@ -3713,7 +3713,8 @@ ice_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 			RTE_ETH_RX_OFFLOAD_OUTER_IPV4_CKSUM |
 			RTE_ETH_RX_OFFLOAD_VLAN_EXTEND |
 			RTE_ETH_RX_OFFLOAD_RSS_HASH |
-			RTE_ETH_RX_OFFLOAD_TIMESTAMP;
+			RTE_ETH_RX_OFFLOAD_TIMESTAMP |
+			RTE_ETH_RX_OFFLOAD_HEADER_SPLIT;
 		dev_info->tx_offload_capa |=
 			RTE_ETH_TX_OFFLOAD_QINQ_INSERT |
 			RTE_ETH_TX_OFFLOAD_IPV4_CKSUM |
@@ -3725,7 +3726,7 @@ ice_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 		dev_info->flow_type_rss_offloads |= ICE_RSS_OFFLOAD_ALL;
 	}
 
-	dev_info->rx_queue_offload_capa = 0;
+	dev_info->rx_queue_offload_capa = RTE_ETH_RX_OFFLOAD_HEADER_SPLIT;
 	dev_info->tx_queue_offload_capa = RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE;
 
 	dev_info->reta_size = pf->hash_lut_size;
@@ -3794,6 +3795,11 @@ ice_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 	dev_info->default_rxportconf.ring_size = ICE_BUF_SIZE_MIN;
 	dev_info->default_txportconf.ring_size = ICE_BUF_SIZE_MIN;
 
+	dev_info->rx_seg_capa.max_nseg = ICE_RX_MAX_NSEG;
+	dev_info->rx_seg_capa.multi_pools = 1;
+	dev_info->rx_seg_capa.offset_allowed = 0;
+	dev_info->rx_seg_capa.offset_align_log2 = 0;
+
 	return 0;
 }
 
diff --git a/drivers/net/ice/ice_rxtx.c b/drivers/net/ice/ice_rxtx.c
index 041f4bc91f..1f245c853b 100644
--- a/drivers/net/ice/ice_rxtx.c
+++ b/drivers/net/ice/ice_rxtx.c
@@ -282,7 +282,6 @@ ice_program_hw_rx_queue(struct ice_rx_queue *rxq)
 	/* Set buffer size as the head split is disabled. */
 	buf_size = (uint16_t)(rte_pktmbuf_data_room_size(rxq->mp) -
 			      RTE_PKTMBUF_HEADROOM);
-	rxq->rx_hdr_len = 0;
 	rxq->rx_buf_len = RTE_ALIGN(buf_size, (1 << ICE_RLAN_CTX_DBUF_S));
 	rxq->max_pkt_len =
 		RTE_MIN((uint32_t)ICE_SUPPORT_CHAIN_NUM * rxq->rx_buf_len,
@@ -311,11 +310,54 @@ ice_program_hw_rx_queue(struct ice_rx_queue *rxq)
 
 	memset(&rx_ctx, 0, sizeof(rx_ctx));
 
+	if (rxq->offloads & RTE_ETH_RX_OFFLOAD_HEADER_SPLIT) {
+		switch (rxq->rxseg[0].proto) {
+		case RTE_ETH_RX_HEADER_SPLIT_MAC:
+			rx_ctx.dtype = ICE_RX_DTYPE_HEADER_SPLIT;
+			rx_ctx.hsplit_1 = ICE_RLAN_RX_HSPLIT_1_SPLIT_L2;
+			break;
+		case RTE_ETH_RX_HEADER_SPLIT_INNER_MAC:
+			rx_ctx.dtype = ICE_RX_DTYPE_HEADER_SPLIT;
+			rx_ctx.hsplit_0 = ICE_RLAN_RX_HSPLIT_0_SPLIT_L2;
+			break;
+		case RTE_ETH_RX_HEADER_SPLIT_IPV4:
+		case RTE_ETH_RX_HEADER_SPLIT_IPV6:
+		case RTE_ETH_RX_HEADER_SPLIT_L3:
+		case RTE_ETH_RX_HEADER_SPLIT_INNER_IPV4:
+		case RTE_ETH_RX_HEADER_SPLIT_INNER_IPV6:
+		case RTE_ETH_RX_HEADER_SPLIT_INNER_L3:
+			rx_ctx.dtype = ICE_RX_DTYPE_HEADER_SPLIT;
+			rx_ctx.hsplit_0 = ICE_RLAN_RX_HSPLIT_0_SPLIT_IP;
+			break;
+		case RTE_ETH_RX_HEADER_SPLIT_TCP:
+		case RTE_ETH_RX_HEADER_SPLIT_UDP:
+		case RTE_ETH_RX_HEADER_SPLIT_INNER_TCP:
+		case RTE_ETH_RX_HEADER_SPLIT_INNER_UDP:
+			rx_ctx.dtype = ICE_RX_DTYPE_HEADER_SPLIT;
+			rx_ctx.hsplit_0 = ICE_RLAN_RX_HSPLIT_0_SPLIT_TCP_UDP;
+			break;
+		case RTE_ETH_RX_HEADER_SPLIT_SCTP:
+		case RTE_ETH_RX_HEADER_SPLIT_INNER_SCTP:
+			rx_ctx.dtype = ICE_RX_DTYPE_HEADER_SPLIT;
+			rx_ctx.hsplit_0 = ICE_RLAN_RX_HSPLIT_0_SPLIT_SCTP;
+			break;
+		case RTE_ETH_RX_HEADER_SPLIT_NONE:
+			PMD_DRV_LOG(ERR, "Header split protocol must be configured");
+			return -EINVAL;
+		default:
+			PMD_DRV_LOG(ERR, "Header split protocol is not supported");
+			return -EINVAL;
+		}
+		rxq->rx_hdr_len = ICE_RX_HDR_BUF_SIZE;
+	} else {
+		rxq->rx_hdr_len = 0;
+		rx_ctx.dtype = 0; /* No Header Split mode */
+	}
+
 	rx_ctx.base = rxq->rx_ring_dma / ICE_QUEUE_BASE_ADDR_UNIT;
 	rx_ctx.qlen = rxq->nb_rx_desc;
 	rx_ctx.dbuf = rxq->rx_buf_len >> ICE_RLAN_CTX_DBUF_S;
 	rx_ctx.hbuf = rxq->rx_hdr_len >> ICE_RLAN_CTX_HBUF_S;
-	rx_ctx.dtype = 0; /* No Header Split mode */
 #ifndef RTE_LIBRTE_ICE_16BYTE_RX_DESC
 	rx_ctx.dsize = 1; /* 32B descriptors */
 #endif
@@ -401,6 +443,7 @@ ice_alloc_rx_queue_mbufs(struct ice_rx_queue *rxq)
 
 	for (i = 0; i < rxq->nb_rx_desc; i++) {
 		volatile union ice_rx_flex_desc *rxd;
+		rxd = &rxq->rx_ring[i];
 		struct rte_mbuf *mbuf = rte_mbuf_raw_alloc(rxq->mp);
 
 		if (unlikely(!mbuf)) {
@@ -408,8 +451,6 @@ ice_alloc_rx_queue_mbufs(struct ice_rx_queue *rxq)
 			return -ENOMEM;
 		}
 
-		rte_mbuf_refcnt_set(mbuf, 1);
-		mbuf->next = NULL;
 		mbuf->data_off = RTE_PKTMBUF_HEADROOM;
 		mbuf->nb_segs = 1;
 		mbuf->port = rxq->port_id;
@@ -417,9 +458,32 @@ ice_alloc_rx_queue_mbufs(struct ice_rx_queue *rxq)
 		dma_addr =
 			rte_cpu_to_le_64(rte_mbuf_data_iova_default(mbuf));
 
-		rxd = &rxq->rx_ring[i];
-		rxd->read.pkt_addr = dma_addr;
-		rxd->read.hdr_addr = 0;
+		if (rxq->offloads & RTE_ETH_RX_OFFLOAD_HEADER_SPLIT) {
+			struct rte_mbuf *mbuf_pay;
+			mbuf_pay = rte_mbuf_raw_alloc(rxq->rxseg[1].mp);
+			if (unlikely(!mbuf_pay)) {
+				PMD_DRV_LOG(ERR, "Failed to allocate payload mbuf for RX");
+				return -ENOMEM;
+			}
+
+			mbuf_pay->next = NULL;
+			mbuf_pay->data_off = RTE_PKTMBUF_HEADROOM;
+			mbuf_pay->nb_segs = 1;
+			mbuf_pay->port = rxq->port_id;
+			mbuf->next = mbuf_pay;
+
+			rxd->read.hdr_addr = dma_addr;
+			/* The LS bit should be set to zero regardless of
+			 * header split enablement.
+			 */
+			rxd->read.pkt_addr = rte_cpu_to_le_64(rte_mbuf_data_iova_default(mbuf_pay));
+		} else {
+			rte_mbuf_refcnt_set(mbuf, 1);
+			mbuf->next = NULL;
+			rxd->read.hdr_addr = 0;
+			rxd->read.pkt_addr = dma_addr;
+		}
+
 #ifndef RTE_LIBRTE_ICE_16BYTE_RX_DESC
 		rxd->read.rsvd1 = 0;
 		rxd->read.rsvd2 = 0;
@@ -443,14 +507,14 @@ _ice_rx_queue_release_mbufs(struct ice_rx_queue *rxq)
 
 	for (i = 0; i < rxq->nb_rx_desc; i++) {
 		if (rxq->sw_ring[i].mbuf) {
-			rte_pktmbuf_free_seg(rxq->sw_ring[i].mbuf);
+			rte_pktmbuf_free(rxq->sw_ring[i].mbuf);
 			rxq->sw_ring[i].mbuf = NULL;
 		}
 	}
 	if (rxq->rx_nb_avail == 0)
 		return;
 	for (i = 0; i < rxq->rx_nb_avail; i++)
-		rte_pktmbuf_free_seg(rxq->rx_stage[rxq->rx_next_avail + i]);
+		rte_pktmbuf_free(rxq->rx_stage[rxq->rx_next_avail + i]);
 
 	rxq->rx_nb_avail = 0;
 }
@@ -1076,6 +1140,7 @@ ice_rx_queue_setup(struct rte_eth_dev *dev,
 	uint16_t len;
 	int use_def_burst_func = 1;
 	uint64_t offloads;
+	uint16_t n_seg = rx_conf->rx_nseg;
 
 	if (nb_desc % ICE_ALIGN_RING_DESC != 0 ||
 	    nb_desc > ICE_MAX_RING_DESC ||
@@ -1087,6 +1152,22 @@ ice_rx_queue_setup(struct rte_eth_dev *dev,
 
 	offloads = rx_conf->offloads | dev->data->dev_conf.rxmode.offloads;
 
+	if (mp)
+		n_seg = 1;
+
+	if (n_seg > 1) {
+		if (!(offloads & RTE_ETH_RX_OFFLOAD_HEADER_SPLIT)) {
+			PMD_INIT_LOG(ERR, "port %u queue index %u split offload not configured",
+					dev->data->port_id, queue_idx);
+			return -EINVAL;
+		}
+		if (n_seg > ICE_RX_MAX_NSEG) {
+			PMD_INIT_LOG(ERR, "port %u queue index %u split seg exceed maximum",
+					dev->data->port_id, queue_idx);
+			return -EINVAL;
+		}
+	}
+
 	/* Free memory if needed */
 	if (dev->data->rx_queues[queue_idx]) {
 		ice_rx_queue_release(dev->data->rx_queues[queue_idx]);
@@ -1098,12 +1179,22 @@ ice_rx_queue_setup(struct rte_eth_dev *dev,
 				 sizeof(struct ice_rx_queue),
 				 RTE_CACHE_LINE_SIZE,
 				 socket_id);
+
 	if (!rxq) {
 		PMD_INIT_LOG(ERR, "Failed to allocate memory for "
 			     "rx queue data structure");
 		return -ENOMEM;
 	}
-	rxq->mp = mp;
+
+	rxq->rxseg_nb = n_seg;
+	if (n_seg > 1) {
+		rte_memcpy(rxq->rxseg, rx_conf->rx_seg,
+			sizeof(struct rte_eth_rxseg_split) * n_seg);
+		rxq->mp = rxq->rxseg[0].mp;
+	} else {
+		rxq->mp = mp;
+	}
+
 	rxq->nb_rx_desc = nb_desc;
 	rxq->rx_free_thresh = rx_conf->rx_free_thresh;
 	rxq->queue_id = queue_idx;
@@ -1568,7 +1659,7 @@ ice_rx_scan_hw_ring(struct ice_rx_queue *rxq)
 	struct ice_rx_entry *rxep;
 	struct rte_mbuf *mb;
 	uint16_t stat_err0;
-	uint16_t pkt_len;
+	uint16_t pkt_len, hdr_len;
 	int32_t s[ICE_LOOK_AHEAD], nb_dd;
 	int32_t i, j, nb_rx = 0;
 	uint64_t pkt_flags = 0;
@@ -1616,6 +1707,24 @@ ice_rx_scan_hw_ring(struct ice_rx_queue *rxq)
 				   ICE_RX_FLX_DESC_PKT_LEN_M) - rxq->crc_len;
 			mb->data_len = pkt_len;
 			mb->pkt_len = pkt_len;
+
+			if (rxq->offloads & RTE_ETH_RX_OFFLOAD_HEADER_SPLIT) {
+				mb->nb_segs = (uint16_t)(mb->nb_segs + mb->next->nb_segs);
+				mb->next->next = NULL;
+				hdr_len = rte_le_to_cpu_16(rxdp[j].wb.hdr_len_sph_flex_flags1) &
+						ICE_RX_FLEX_DESC_HEADER_LEN_M;
+				pkt_len = (rte_le_to_cpu_16(rxdp[j].wb.pkt_len) &
+					ICE_RX_FLX_DESC_PKT_LEN_M) - rxq->crc_len;
+				mb->data_len = hdr_len;
+				mb->pkt_len = hdr_len + pkt_len;
+				mb->next->data_len = pkt_len;
+			} else {
+				pkt_len = (rte_le_to_cpu_16(rxdp[j].wb.pkt_len) &
+					ICE_RX_FLX_DESC_PKT_LEN_M) - rxq->crc_len;
+				mb->data_len = pkt_len;
+				mb->pkt_len = pkt_len;
+			}
+
 			mb->ol_flags = 0;
 			stat_err0 = rte_le_to_cpu_16(rxdp[j].wb.status_error0);
 			pkt_flags = ice_rxd_error_to_pkt_flags(stat_err0);
@@ -1695,7 +1804,9 @@ ice_rx_alloc_bufs(struct ice_rx_queue *rxq)
 	struct rte_mbuf *mb;
 	uint16_t alloc_idx, i;
 	uint64_t dma_addr;
-	int diag;
+	int diag, diag_pay;
+	uint64_t pay_addr;
+	struct rte_mbuf *mbufs_pay[rxq->rx_free_thresh];
 
 	/* Allocate buffers in bulk */
 	alloc_idx = (uint16_t)(rxq->rx_free_trigger -
@@ -1708,6 +1819,15 @@ ice_rx_alloc_bufs(struct ice_rx_queue *rxq)
 		return -ENOMEM;
 	}
 
+	if (rxq->offloads & RTE_ETH_RX_OFFLOAD_HEADER_SPLIT) {
+		diag_pay = rte_mempool_get_bulk(rxq->rxseg[1].mp,
+				(void *)mbufs_pay, rxq->rx_free_thresh);
+		if (unlikely(diag_pay != 0)) {
+			PMD_RX_LOG(ERR, "Failed to get payload mbufs in bulk");
+			return -ENOMEM;
+		}
+	}
+
 	rxdp = &rxq->rx_ring[alloc_idx];
 	for (i = 0; i < rxq->rx_free_thresh; i++) {
 		if (likely(i < (rxq->rx_free_thresh - 1)))
@@ -1716,13 +1836,21 @@ ice_rx_alloc_bufs(struct ice_rx_queue *rxq)
 
 		mb = rxep[i].mbuf;
 		rte_mbuf_refcnt_set(mb, 1);
-		mb->next = NULL;
 		mb->data_off = RTE_PKTMBUF_HEADROOM;
 		mb->nb_segs = 1;
 		mb->port = rxq->port_id;
 		dma_addr = rte_cpu_to_le_64(rte_mbuf_data_iova_default(mb));
-		rxdp[i].read.hdr_addr = 0;
-		rxdp[i].read.pkt_addr = dma_addr;
+
+		if (rxq->offloads & RTE_ETH_RX_OFFLOAD_HEADER_SPLIT) {
+			mb->next = mbufs_pay[i];
+			pay_addr = rte_mbuf_data_iova_default(mbufs_pay[i]);
+			rxdp[i].read.hdr_addr = dma_addr;
+			rxdp[i].read.pkt_addr = rte_cpu_to_le_64(pay_addr);
+		} else {
+			mb->next = NULL;
+			rxdp[i].read.hdr_addr = 0;
+			rxdp[i].read.pkt_addr = dma_addr;
+		}
 	}
 
 	/* Update Rx tail register */
@@ -2315,11 +2443,13 @@ ice_recv_pkts(void *rx_queue,
 	struct ice_rx_entry *sw_ring = rxq->sw_ring;
 	struct ice_rx_entry *rxe;
 	struct rte_mbuf *nmb; /* new allocated mbuf */
+	struct rte_mbuf *nmb_pay; /* new allocated payload mbuf */
 	struct rte_mbuf *rxm; /* pointer to store old mbuf in SW ring */
 	uint16_t rx_id = rxq->rx_tail;
 	uint16_t nb_rx = 0;
 	uint16_t nb_hold = 0;
 	uint16_t rx_packet_len;
+	uint16_t rx_header_len;
 	uint16_t rx_stat_err0;
 	uint64_t dma_addr;
 	uint64_t pkt_flags;
@@ -2342,12 +2472,16 @@ ice_recv_pkts(void *rx_queue,
 		if (!(rx_stat_err0 & (1 << ICE_RX_FLEX_DESC_STATUS0_DD_S)))
 			break;
 
-		/* allocate mbuf */
+		if (rx_stat_err0 & (1 << ICE_RX_FLEX_DESC_STATUS0_HBO_S))
+			break;
+
+		/* allocate header mbuf */
 		nmb = rte_mbuf_raw_alloc(rxq->mp);
 		if (unlikely(!nmb)) {
 			rxq->vsi->adapter->pf.dev_data->rx_mbuf_alloc_failed++;
 			break;
 		}
+
 		rxd = *rxdp; /* copy descriptor in ring to temp variable*/
 
 		nb_hold++;
@@ -2360,24 +2494,55 @@ ice_recv_pkts(void *rx_queue,
 		dma_addr =
 			rte_cpu_to_le_64(rte_mbuf_data_iova_default(nmb));
 
-		/**
-		 * fill the read format of descriptor with physic address in
-		 * new allocated mbuf: nmb
-		 */
-		rxdp->read.hdr_addr = 0;
-		rxdp->read.pkt_addr = dma_addr;
+		if (rxq->offloads & RTE_ETH_RX_OFFLOAD_HEADER_SPLIT) {
+			/* allocate payload mbuf */
+			nmb_pay = rte_mbuf_raw_alloc(rxq->rxseg[1].mp);
+			if (unlikely(!nmb_pay)) {
+				rxq->vsi->adapter->pf.dev_data->rx_mbuf_alloc_failed++;
+				break;
+			}
+
+			nmb->next = nmb_pay;
+			nmb_pay->next = NULL;
 
-		/* calculate rx_packet_len of the received pkt */
-		rx_packet_len = (rte_le_to_cpu_16(rxd.wb.pkt_len) &
-				 ICE_RX_FLX_DESC_PKT_LEN_M) - rxq->crc_len;
+			/**
+			 * fill the read format of descriptor with physic address in
+			 * new allocated mbuf: nmb
+			 */
+			rxdp->read.hdr_addr = dma_addr;
+			rxdp->read.pkt_addr = rte_cpu_to_le_64(rte_mbuf_data_iova_default(nmb_pay));
+		} else {
+			/**
+			 * fill the read format of descriptor with physic address in
+			 * new allocated mbuf: nmb
+			 */
+			rxdp->read.hdr_addr = 0;
+			rxdp->read.pkt_addr = dma_addr;
+		}
 
 		/* fill old mbuf with received descriptor: rxd */
 		rxm->data_off = RTE_PKTMBUF_HEADROOM;
 		rte_prefetch0(RTE_PTR_ADD(rxm->buf_addr, RTE_PKTMBUF_HEADROOM));
-		rxm->nb_segs = 1;
-		rxm->next = NULL;
-		rxm->pkt_len = rx_packet_len;
-		rxm->data_len = rx_packet_len;
+		if (rxq->offloads & RTE_ETH_RX_OFFLOAD_HEADER_SPLIT) {
+			rxm->nb_segs = (uint16_t)(rxm->nb_segs + rxm->next->nb_segs);
+			rxm->next->next = NULL;
+			/* calculate rx_packet_len of the received pkt */
+			rx_header_len = rte_le_to_cpu_16(rxd.wb.hdr_len_sph_flex_flags1) &
+					ICE_RX_FLEX_DESC_HEADER_LEN_M;
+			rx_packet_len = (rte_le_to_cpu_16(rxd.wb.pkt_len) &
+					ICE_RX_FLX_DESC_PKT_LEN_M) - rxq->crc_len;
+			rxm->data_len = rx_header_len;
+			rxm->pkt_len = rx_header_len + rx_packet_len;
+			rxm->next->data_len = rx_packet_len;
+		} else {
+			rxm->nb_segs = 1;
+			rxm->next = NULL;
+			/* calculate rx_packet_len of the received pkt */
+			rx_packet_len = (rte_le_to_cpu_16(rxd.wb.pkt_len) &
+					ICE_RX_FLX_DESC_PKT_LEN_M) - rxq->crc_len;
+			rxm->data_len = rx_packet_len;
+			rxm->pkt_len = rx_packet_len;
+		}
 		rxm->port = rxq->port_id;
 		rxm->packet_type = ptype_tbl[ICE_RX_FLEX_DESC_PTYPE_M &
 			rte_le_to_cpu_16(rxd.wb.ptype_flex_flags0)];
diff --git a/drivers/net/ice/ice_rxtx.h b/drivers/net/ice/ice_rxtx.h
index bb18a01951..611dbc8503 100644
--- a/drivers/net/ice/ice_rxtx.h
+++ b/drivers/net/ice/ice_rxtx.h
@@ -16,6 +16,9 @@
 #define ICE_RX_MAX_BURST 32
 #define ICE_TX_MAX_BURST 32
 
+/* Maximal number of segments to split. */
+#define ICE_RX_MAX_NSEG 2
+
 #define ICE_CHK_Q_ENA_COUNT        100
 #define ICE_CHK_Q_ENA_INTERVAL_US  100
 
@@ -43,6 +46,11 @@
 extern uint64_t ice_timestamp_dynflag;
 extern int ice_timestamp_dynfield_offset;
 
+/* Max header size can be 2K - 64 bytes */
+#define ICE_RX_HDR_BUF_SIZE    (2048 - 64)
+
+#define ICE_HEADER_SPLIT_ENA   BIT(0)
+
 typedef void (*ice_rx_release_mbufs_t)(struct ice_rx_queue *rxq);
 typedef void (*ice_tx_release_mbufs_t)(struct ice_tx_queue *txq);
 typedef void (*ice_rxd_to_pkt_fields_t)(struct ice_rx_queue *rxq,
@@ -53,6 +61,12 @@ struct ice_rx_entry {
 	struct rte_mbuf *mbuf;
 };
 
+enum ice_rx_dtype {
+	ICE_RX_DTYPE_NO_SPLIT       = 0,
+	ICE_RX_DTYPE_HEADER_SPLIT   = 1,
+	ICE_RX_DTYPE_SPLIT_ALWAYS   = 2,
+};
+
 struct ice_rx_queue {
 	struct rte_mempool *mp; /* mbuf pool to populate RX ring */
 	volatile union ice_rx_flex_desc *rx_ring;/* RX ring virtual address */
@@ -95,6 +109,8 @@ struct ice_rx_queue {
 	uint32_t time_high;
 	uint32_t hw_register_set;
 	const struct rte_memzone *mz;
+	struct rte_eth_rxseg_split rxseg[ICE_RX_MAX_NSEG];
+	uint32_t rxseg_nb;
 };
 
 struct ice_tx_entry {
diff --git a/drivers/net/ice/ice_rxtx_vec_common.h b/drivers/net/ice/ice_rxtx_vec_common.h
index 2dd2d83650..7a155a66f2 100644
--- a/drivers/net/ice/ice_rxtx_vec_common.h
+++ b/drivers/net/ice/ice_rxtx_vec_common.h
@@ -291,6 +291,9 @@ ice_rx_vec_queue_default(struct ice_rx_queue *rxq)
 	if (rxq->offloads & RTE_ETH_RX_OFFLOAD_TIMESTAMP)
 		return -1;
 
+	if (rxq->offloads & RTE_ETH_RX_OFFLOAD_HEADER_SPLIT)
+		return -1;
+
 	if (rxq->offloads & ICE_RX_VECTOR_OFFLOAD)
 		return ICE_VECTOR_OFFLOAD_PATH;
 
-- 
2.17.1


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [v4 1/3] ethdev: introduce protocol type based header split
  2022-04-02 10:41   ` [v4 1/3] " wenxuanx.wu
@ 2022-04-07 10:47     ` Andrew Rybchenko
  2022-04-12 16:15       ` Ding, Xuan
  2022-04-07 13:26     ` Jerin Jacob
  2022-04-26 11:13     ` [PATCH v5 0/3] ethdev: introduce protocol based buffer split wenxuanx.wu
  2 siblings, 1 reply; 88+ messages in thread
From: Andrew Rybchenko @ 2022-04-07 10:47 UTC (permalink / raw)
  To: wenxuanx.wu, thomas, xiaoyun.li, aman.deep.singh, yuying.zhang,
	qi.z.zhang
  Cc: dev, stephen, mb, viacheslavo, ping.yu, Xuan Ding, Yuan Wang,
	david.marchand, Ferruh Yigit

On 4/2/22 13:41, wenxuanx.wu@intel.com wrote:
> From: Xuan Ding <xuan.ding@intel.com>
> 
> Header split consists of splitting a received packet into two separate
> regions based on the packet content. The split happens after the
> packet header and before the packet payload. Splitting is usually between
> the packet header that can be posted to a dedicated buffer and the packet
> payload that can be posted to a different buffer.
> 
> Currently, Rx buffer split supports length and offset based packet split.
> Although header split is a subset of buffer split, configuring buffer
> split based on length is not suitable for NICs that do split based on
> header protocol types, because tunneling makes the conversion from length
> to protocol type impossible.
> 
> This patch extends the current buffer split to support protocol type and
> offset based header split. A new proto field is introduced in the
> rte_eth_rxseg_split structure reserved field to specify header protocol
> type. With Rx offload flag RTE_ETH_RX_OFFLOAD_HEADER_SPLIT enabled and
> protocol type configured, PMD will split the ingress packets into two
> separate regions. Currently, both inner and outer L2/L3/L4 level header
> split can be supported.

RTE_ETH_RX_OFFLOAD_HEADER_SPLIT offload was introduced some
time ago to substitute bit-field header_split in struct
rte_eth_rxmode. It allows enabling header split offload with
the header size controlled using split_hdr_size in the same
structure.

Right now I see no single PMD which actually supports
RTE_ETH_RX_OFFLOAD_HEADER_SPLIT with above definition.
Many examples and test apps initialize the field to 0
explicitly. Most drivers simply ignore split_hdr_size
since the offload is not advertised, but some double-check
that its value is 0.

I think that it means that the field should be removed on
the next LTS, and I'd say, together with the
RTE_ETH_RX_OFFLOAD_HEADER_SPLIT offload bit.

We should not redefine the offload meaning.


> 
> For example, let's suppose we configured the Rx queue with the
> following segments:
>      seg0 - pool0, off0=2B
>      seg1 - pool1, off1=128B

The corresponding feature is named Rx buffer split.
Does it mean that protocol type based header split
requires the Rx buffer split feature to be supported?

> 
> With header split type configured with RTE_ETH_RX_HEADER_SPLIT_UDP,
> a packet consisting of MAC_IP_UDP_PAYLOAD will be split as follows:
>      seg0 - udp header @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from pool0
>      seg1 - payload @ 128 in mbuf from pool1

Is it always outermost UDP? Does it require both UDP over IPv4
and UDP over IPv6 to be supported? What will happen if only one
is supported? How can an application find out which protocol stacks
are supported?

> 
> The memory attributes of the split parts may also differ - for example,
> mempool0 and mempool1 may belong to DPDK memory and external memory,
> respectively.
> 
> Signed-off-by: Xuan Ding <xuan.ding@intel.com>
> Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
> Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
> Reviewed-by: Qi Zhang <qi.z.zhang@intel.com>
> ---
>   lib/ethdev/rte_ethdev.c | 34 ++++++++++++++++++++++-------
>   lib/ethdev/rte_ethdev.h | 48 +++++++++++++++++++++++++++++++++++++++--
>   2 files changed, 72 insertions(+), 10 deletions(-)
> 
> diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
> index 29a3d80466..29adcdc2f0 100644
> --- a/lib/ethdev/rte_ethdev.c
> +++ b/lib/ethdev/rte_ethdev.c
> @@ -1661,6 +1661,7 @@ rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
>   		struct rte_mempool *mpl = rx_seg[seg_idx].mp;
>   		uint32_t length = rx_seg[seg_idx].length;
>   		uint32_t offset = rx_seg[seg_idx].offset;
> +		uint16_t proto = rx_seg[seg_idx].proto;
>   
>   		if (mpl == NULL) {
>   			RTE_ETHDEV_LOG(ERR, "null mempool pointer\n");
> @@ -1694,13 +1695,29 @@ rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
>   		}
>   		offset += seg_idx != 0 ? 0 : RTE_PKTMBUF_HEADROOM;
>   		*mbp_buf_size = rte_pktmbuf_data_room_size(mpl);
> -		length = length != 0 ? length : *mbp_buf_size;
> -		if (*mbp_buf_size < length + offset) {
> -			RTE_ETHDEV_LOG(ERR,
> -				       "%s mbuf_data_room_size %u < %u (segment length=%u + segment offset=%u)\n",
> -				       mpl->name, *mbp_buf_size,
> -				       length + offset, length, offset);
> -			return -EINVAL;
> +		if (proto == RTE_ETH_RX_HEADER_SPLIT_NONE) {
> +			/* Check buffer split. */
> +			length = length != 0 ? length : *mbp_buf_size;
> +			if (*mbp_buf_size < length + offset) {
> +				RTE_ETHDEV_LOG(ERR,
> +					"%s mbuf_data_room_size %u < %u (segment length=%u + segment offset=%u)\n",
> +					mpl->name, *mbp_buf_size,
> +					length + offset, length, offset);
> +				return -EINVAL;
> +			}
> +		} else {
> +			/* Check header split. */
> +			if (length != 0) {
> +				RTE_ETHDEV_LOG(ERR, "segment length should be set to zero in header split\n");
> +				return -EINVAL;
> +			}
> +			if (*mbp_buf_size < offset) {
> +				RTE_ETHDEV_LOG(ERR,
> +					"%s mbuf_data_room_size %u < %u segment offset)\n",
> +					mpl->name, *mbp_buf_size,
> +					offset);
> +				return -EINVAL;
> +			}
>   		}
>   	}
>   	return 0;
> @@ -1778,7 +1795,8 @@ rte_eth_rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
>   		rx_seg = (const struct rte_eth_rxseg_split *)rx_conf->rx_seg;
>   		n_seg = rx_conf->rx_nseg;
>   
> -		if (rx_conf->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) {
> +		if (rx_conf->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT ||
> +			rx_conf->offloads & RTE_ETH_RX_OFFLOAD_HEADER_SPLIT) {
>   			ret = rte_eth_rx_queue_check_split(rx_seg, n_seg,
>   							   &mbp_buf_size,
>   							   &dev_info);
> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> index 04cff8ee10..e8371b98ed 100644
> --- a/lib/ethdev/rte_ethdev.h
> +++ b/lib/ethdev/rte_ethdev.h
> @@ -1197,12 +1197,31 @@ struct rte_eth_txmode {
>    *     - pool from the last valid element
>    *     - the buffer size from this pool
>    *     - zero offset
> + *
> + * Header split is a subset of buffer split. The split happens after the
> + * packet header and before the packet payload. For PMDs that do not
> + * support header split configuration by length, the location of the split
> + * needs to be specified by the header protocol type. For buffer split,
> + * this field should not be configured.
> + *
> + * If RTE_ETH_RX_OFFLOAD_HEADER_SPLIT flag is set in offloads field,
> + * the PMD will split the received packets into two separate regions:
> + * - The header buffer will be allocated from the memory pool specified
> + *   in the first array element, and the payload buffer from the pool
> + *   in the second element.
> + *
> + * - The lengths do not need to be configured in header split.
> + *
> + * - The offsets from the segment description elements specify
> + *   the data offset from the buffer beginning, except for the first mbuf,
> + *   whose offset has RTE_PKTMBUF_HEADROOM added to it.
>    */
>   struct rte_eth_rxseg_split {
>   	struct rte_mempool *mp; /**< Memory pool to allocate segment from. */
>   	uint16_t length; /**< Segment data length, configures split point. */
>   	uint16_t offset; /**< Data offset from beginning of mbuf data buffer. */
> -	uint32_t reserved; /**< Reserved field. */
> +	uint16_t proto; /**< header protocol type, configures header split point. */

I realize that you don't want to use the enum defined above here to
save some reserved space, but the description must refer to the
enum rte_eth_rx_header_split_protocol_type.

> +	uint16_t reserved; /**< Reserved field. */

As far as I can see the structure is experimental, so it
should not be a problem to extend it, but it is a really
good question raised by Stephen in the RFC v1 discussion.
Shouldn't we require that all reserved fields are initialized
to zero and ignored on processing? Frankly speaking I always
thought so, but failed to find the place where it is documented.

@Thomas, @David, @Ferruh?

>   };
>   
>   /**
> @@ -1212,7 +1231,7 @@ struct rte_eth_rxseg_split {
>    * A common structure used to describe Rx packet segment properties.
>    */
>   union rte_eth_rxseg {
> -	/* The settings for buffer split offload. */
> +	/* The settings for buffer split and header split offload. */
>   	struct rte_eth_rxseg_split split;
>   	/* The other features settings should be added here. */
>   };
> @@ -1664,6 +1683,31 @@ struct rte_eth_conf {
>   			     RTE_ETH_RX_OFFLOAD_QINQ_STRIP)
>   #define DEV_RX_OFFLOAD_VLAN RTE_DEPRECATED(DEV_RX_OFFLOAD_VLAN) RTE_ETH_RX_OFFLOAD_VLAN
>   
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this enum may change without prior notice.
> + * This enum indicates the header split protocol type
> + */
> +enum rte_eth_rx_header_split_protocol_type {
> +	RTE_ETH_RX_HEADER_SPLIT_NONE = 0,
> +	RTE_ETH_RX_HEADER_SPLIT_MAC,
> +	RTE_ETH_RX_HEADER_SPLIT_IPV4,
> +	RTE_ETH_RX_HEADER_SPLIT_IPV6,
> +	RTE_ETH_RX_HEADER_SPLIT_L3,
> +	RTE_ETH_RX_HEADER_SPLIT_TCP,
> +	RTE_ETH_RX_HEADER_SPLIT_UDP,
> +	RTE_ETH_RX_HEADER_SPLIT_SCTP,
> +	RTE_ETH_RX_HEADER_SPLIT_L4,
> +	RTE_ETH_RX_HEADER_SPLIT_INNER_MAC,
> +	RTE_ETH_RX_HEADER_SPLIT_INNER_IPV4,
> +	RTE_ETH_RX_HEADER_SPLIT_INNER_IPV6,
> +	RTE_ETH_RX_HEADER_SPLIT_INNER_L3,
> +	RTE_ETH_RX_HEADER_SPLIT_INNER_TCP,
> +	RTE_ETH_RX_HEADER_SPLIT_INNER_UDP,
> +	RTE_ETH_RX_HEADER_SPLIT_INNER_SCTP,
> +	RTE_ETH_RX_HEADER_SPLIT_INNER_L4,

Enumeration members should be documented. See my question
in the patch description.

> +};
> +
>   /*
>    * If new Rx offload capabilities are defined, they also must be
>    * mentioned in rte_rx_offload_names in rte_ethdev.c file.


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [v4 1/3] ethdev: introduce protocol type based header split
  2022-04-02 10:41   ` [v4 1/3] " wenxuanx.wu
  2022-04-07 10:47     ` Andrew Rybchenko
@ 2022-04-07 13:26     ` Jerin Jacob
  2022-04-12 16:40       ` Ding, Xuan
  2022-04-26 11:13     ` [PATCH v5 0/3] ethdev: introduce protocol based buffer split wenxuanx.wu
  2 siblings, 1 reply; 88+ messages in thread
From: Jerin Jacob @ 2022-04-07 13:26 UTC (permalink / raw)
  To: wenxuanx.wu
  Cc: Thomas Monjalon, Andrew Rybchenko, Xiaoyun Li, aman.deep.singh,
	yuying.zhang, Qi Zhang, dpdk-dev, Stephen Hemminger,
	Morten Brørup, Viacheslav Ovsiienko, ping.yu, Xuan Ding,
	Yuan Wang

On Sat, Apr 2, 2022 at 4:33 PM <wenxuanx.wu@intel.com> wrote:
>
> From: Xuan Ding <xuan.ding@intel.com>
>
> Header split consists of splitting a received packet into two separate
> regions based on the packet content. The split happens after the
> packet header and before the packet payload. Splitting is usually between
> the packet header that can be posted to a dedicated buffer and the packet
> payload that can be posted to a different buffer.
>
> Currently, Rx buffer split supports length and offset based packet split.
> Although header split is a subset of buffer split, configuring buffer
> split based on length is not suitable for NICs that do split based on
> header protocol types, because tunneling makes the conversion from length
> to protocol type impossible.
>
> This patch extends the current buffer split to support protocol type and
> offset based header split. A new proto field is introduced in the
> rte_eth_rxseg_split structure reserved field to specify header protocol
> type. With Rx offload flag RTE_ETH_RX_OFFLOAD_HEADER_SPLIT enabled and
> protocol type configured, PMD will split the ingress packets into two
> separate regions. Currently, both inner and outer L2/L3/L4 level header
> split can be supported.
>
> For example, let's suppose we configured the Rx queue with the
> following segments:
>     seg0 - pool0, off0=2B
>     seg1 - pool1, off1=128B
>
> With header split type configured with RTE_ETH_RX_HEADER_SPLIT_UDP,
> a packet consisting of MAC_IP_UDP_PAYLOAD will be split as follows:
>     seg0 - udp header @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from pool0

If we set rte_eth_rxseg_split::proto = RTE_ETH_RX_HEADER_SPLIT_UDP and
rte_eth_rxseg_split::offset = 2, what will be the content of seg0?
Will it be:
- offset: starts at the UDP header
- segment size: MAX(size of UDP header + 2, 128) (as seg1 starts from 128)?
Right? If not, please describe.

Also, I don't think we need to duplicate
rte_eth_rx_header_split_protocol_type; instead we can
reuse the existing RTE_PTYPE_* flags, as sketched below.
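
For instance (illustrative only; note that the RTE_PTYPE_INNER_* values
do not fit the proposed 16-bit proto field, so the field would need to
be widened):

    #include <rte_mbuf_ptype.h>

    rx_seg->proto = RTE_PTYPE_L4_UDP;        /* outer UDP header */
    rx_seg->proto = RTE_PTYPE_INNER_L4_UDP;  /* inner UDP header */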


>     seg1 - payload @ 128 in mbuf from pool1
>
> The memory attributes of the split parts may also differ - for example,
> mempool0 and mempool1 may belong to DPDK memory and external memory,
> respectively.

^ permalink raw reply	[flat|nested] 88+ messages in thread

* RE: [v4 1/3] ethdev: introduce protocol type based header split
  2022-04-07 10:47     ` Andrew Rybchenko
@ 2022-04-12 16:15       ` Ding, Xuan
  2022-04-20 15:48         ` Andrew Rybchenko
  2022-04-21 10:27         ` Thomas Monjalon
  0 siblings, 2 replies; 88+ messages in thread
From: Ding, Xuan @ 2022-04-12 16:15 UTC (permalink / raw)
  To: Andrew Rybchenko, Wu, WenxuanX, thomas, Li, Xiaoyun, Singh,
	Aman Deep, Zhang, Yuying, Zhang, Qi Z
  Cc: dev, stephen, mb, viacheslavo, Yu, Ping, Wang, YuanX,
	david.marchand, Ferruh Yigit

Hi Andrew,

> -----Original Message-----
> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> Sent: Thursday, April 7, 2022 6:48 PM
> To: Wu, WenxuanX <wenxuanx.wu@intel.com>; thomas@monjalon.net; Li,
> Xiaoyun <xiaoyun.li@intel.com>; Singh, Aman Deep
> <aman.deep.singh@intel.com>; Zhang, Yuying <yuying.zhang@intel.com>;
> Zhang, Qi Z <qi.z.zhang@intel.com>
> Cc: dev@dpdk.org; stephen@networkplumber.org;
> mb@smartsharesystems.com; viacheslavo@nvidia.com; Yu, Ping
> <ping.yu@intel.com>; Ding, Xuan <xuan.ding@intel.com>; Wang, YuanX
> <yuanx.wang@intel.com>; david.marchand@redhat.com; Ferruh Yigit
> <ferruhy@xilinx.com>
> Subject: Re: [v4 1/3] ethdev: introduce protocol type based header split
> 
> On 4/2/22 13:41, wenxuanx.wu@intel.com wrote:
> > From: Xuan Ding <xuan.ding@intel.com>
> >
> > Header split consists of splitting a received packet into two separate
> > regions based on the packet content. The split happens after the
> > packet header and before the packet payload. Splitting is usually
> > between the packet header that can be posted to a dedicated buffer and
> > the packet payload that can be posted to a different buffer.
> >
> > Currently, Rx buffer split supports length and offset based packet split.
> > Although header split is a subset of buffer split, configuring buffer
> > split based on length is not suitable for NICs that do split based on
> > header protocol types, because tunneling makes the conversion from
> > length to protocol type impossible.
> >
> > This patch extends the current buffer split to support protocol type
> > and offset based header split. A new proto field is introduced in the
> > rte_eth_rxseg_split structure reserved field to specify header
> > protocol type. With Rx offload flag RTE_ETH_RX_OFFLOAD_HEADER_SPLIT
> > enabled and protocol type configured, PMD will split the ingress
> > packets into two separate regions. Currently, both inner and outer
> > L2/L3/L4 level header split can be supported.
> 
> RTE_ETH_RX_OFFLOAD_HEADER_SPLIT offload was introduced some time
> ago to substitute bit-field header_split in struct rte_eth_rxmode. It allows to
> enable header split offload with the header size controlled using
> split_hdr_size in the same structure.
> 
> Right now I see no single PMD which actually supports
> RTE_ETH_RX_OFFLOAD_HEADER_SPLIT with above definition.
> Many examples and test apps initialize the field to 0 explicitly. The most of
> drivers simply ignore split_hdr_size since the offload is not advertised, but
> some double-check that its value is 0.
> 
> I think that it means that the field should be removed on the next LTS, and I'd
> say, together with the RTE_ETH_RX_OFFLOAD_HEADER_SPLIT offload bit.
> 
> We should not redefine the offload meaning.

Yes, you are right. No single PMD supports RTE_ETH_RX_OFFLOAD_HEADER_SPLIT now.
Previously, I used this flag to distinguish buffer split and header split.
The former supports multi-segment split by length and offset.
The latter supports a two-segment split by proto and offset.
At this level, header split is a subset of buffer split.

Since we shouldn't redefine the meaning of this offload,
I will use the RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT flag.
The existence of tunnels requires a proto field in buffer split,
because some PMDs do not support split based on length and offset.

> 
> >
> > For example, let's suppose we configured the Rx queue with the
> > following segments:
> >      seg0 - pool0, off0=2B
> >      seg1 - pool1, off1=128B
> 
> Corresponding feature is named Rx buffer split.
> Does it mean that protocol type based header split requires Rx buffer split
> feature to be supported?

Protocol type based header split does not require Rx buffer split.
In the previous design, header split and buffer split are exclusive,
because we only configure one split offload per Rx queue.

> 
> >
> > With header split type configured with RTE_ETH_RX_HEADER_SPLIT_UDP,
> > the packet consists of MAC_IP_UDP_PAYLOAD will be split like following:
> >      seg0 - udp header @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from
> pool0
> >      seg1 - payload @ 128 in mbuf from pool1
> 
> Is it always outermost UDP? Does it require both UDP over IPv4 and UDP over
> IPv6 to be supported? What will happen if only one is supported? How
> application can find out which protocol stack are supported?

Both inner and outer UDP are considered.
The current design does not distinguish UDP over IPv4 from UDP over IPv6.
If we want finer granularity, such as supporting only IPv4 or only IPv6,
the user needs to add more configuration.

If an application wants to find out which protocol stacks are supported,
one way I can think of is to expose the protocol stacks supported by the driver through dev_info.
Any thoughts are welcome :)

> 
> >
> > The memory attributes for the split parts may differ either - for
> > example the mempool0 and mempool1 belong to dpdk memory and
> external
> > memory, respectively.
> >
> > Signed-off-by: Xuan Ding <xuan.ding@intel.com>
> > Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
> > Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
> > Reviewed-by: Qi Zhang <qi.z.zhang@intel.com>
> > ---
> >   lib/ethdev/rte_ethdev.c | 34 ++++++++++++++++++++++-------
> >   lib/ethdev/rte_ethdev.h | 48
> +++++++++++++++++++++++++++++++++++++++--
> >   2 files changed, 72 insertions(+), 10 deletions(-)
> >
> > diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c index
> > 29a3d80466..29adcdc2f0 100644
> > --- a/lib/ethdev/rte_ethdev.c
> > +++ b/lib/ethdev/rte_ethdev.c
> > @@ -1661,6 +1661,7 @@ rte_eth_rx_queue_check_split(const struct
> rte_eth_rxseg_split *rx_seg,
> >   		struct rte_mempool *mpl = rx_seg[seg_idx].mp;
> >   		uint32_t length = rx_seg[seg_idx].length;
> >   		uint32_t offset = rx_seg[seg_idx].offset;
> > +		uint16_t proto = rx_seg[seg_idx].proto;
> >
> >   		if (mpl == NULL) {
> >   			RTE_ETHDEV_LOG(ERR, "null mempool pointer\n");
> @@ -1694,13
> > +1695,29 @@ rte_eth_rx_queue_check_split(const struct
> rte_eth_rxseg_split *rx_seg,
> >   		}
> >   		offset += seg_idx != 0 ? 0 : RTE_PKTMBUF_HEADROOM;
> >   		*mbp_buf_size = rte_pktmbuf_data_room_size(mpl);
> > -		length = length != 0 ? length : *mbp_buf_size;
> > -		if (*mbp_buf_size < length + offset) {
> > -			RTE_ETHDEV_LOG(ERR,
> > -				       "%s mbuf_data_room_size %u < %u
> (segment length=%u + segment offset=%u)\n",
> > -				       mpl->name, *mbp_buf_size,
> > -				       length + offset, length, offset);
> > -			return -EINVAL;
> > +		if (proto == RTE_ETH_RX_HEADER_SPLIT_NONE) {
> > +			/* Check buffer split. */
> > +			length = length != 0 ? length : *mbp_buf_size;
> > +			if (*mbp_buf_size < length + offset) {
> > +				RTE_ETHDEV_LOG(ERR,
> > +					"%s mbuf_data_room_size %u < %u
> (segment length=%u + segment offset=%u)\n",
> > +					mpl->name, *mbp_buf_size,
> > +					length + offset, length, offset);
> > +				return -EINVAL;
> > +			}
> > +		} else {
> > +			/* Check header split. */
> > +			if (length != 0) {
> > +				RTE_ETHDEV_LOG(ERR, "segment length
> should be set to zero in header split\n");
> > +				return -EINVAL;
> > +			}
> > +			if (*mbp_buf_size < offset) {
> > +				RTE_ETHDEV_LOG(ERR,
> > +					"%s mbuf_data_room_size %u < %u
> segment offset)\n",
> > +					mpl->name, *mbp_buf_size,
> > +					offset);
> > +				return -EINVAL;
> > +			}
> >   		}
> >   	}
> >   	return 0;
> > @@ -1778,7 +1795,8 @@ rte_eth_rx_queue_setup(uint16_t port_id,
> uint16_t rx_queue_id,
> >   		rx_seg = (const struct rte_eth_rxseg_split *)rx_conf->rx_seg;
> >   		n_seg = rx_conf->rx_nseg;
> >
> > -		if (rx_conf->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT)
> {
> > +		if (rx_conf->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT
> ||
> > +			rx_conf->offloads &
> RTE_ETH_RX_OFFLOAD_HEADER_SPLIT) {
> >   			ret = rte_eth_rx_queue_check_split(rx_seg, n_seg,
> >   							   &mbp_buf_size,
> >   							   &dev_info);
> > diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h index
> > 04cff8ee10..e8371b98ed 100644
> > --- a/lib/ethdev/rte_ethdev.h
> > +++ b/lib/ethdev/rte_ethdev.h
> > @@ -1197,12 +1197,31 @@ struct rte_eth_txmode {
> >    *     - pool from the last valid element
> >    *     - the buffer size from this pool
> >    *     - zero offset
> > + *
> > + * Header split is a subset of buffer split. The split happens after
> > + the
> > + * packet header and before the packet payload. For PMDs that do not
> > + * support header split configuration by length, the location of the
> > + split
> > + * needs to be specified by the header protocol type. While for
> > + buffer split,
> > + * this field should not be configured.
> > + *
> > + * If RTE_ETH_RX_OFFLOAD_HEADER_SPLIT flag is set in offloads field,
> > + * the PMD will split the received packets into two separate regions:
> > + * - The header buffer will be allocated from the memory pool,
> > + *   specified in the first array element, the second buffer, from the
> > + *   pool in the second element.
> > + *
> > + * - The lengths do not need to be configured in header split.
> > + *
> > + * - The offsets from the segment description elements specify
> > + *   the data offset from the buffer beginning except the first mbuf.
> > + *   The first segment offset is added with RTE_PKTMBUF_HEADROOM.
> >    */
> >   struct rte_eth_rxseg_split {
> >   	struct rte_mempool *mp; /**< Memory pool to allocate segment
> from. */
> >   	uint16_t length; /**< Segment data length, configures split point. */
> >   	uint16_t offset; /**< Data offset from beginning of mbuf data buffer.
> */
> > -	uint32_t reserved; /**< Reserved field. */
> > +	uint16_t proto; /**< header protocol type, configures header split
> > +point. */
> 
> I realize that you don't want to use here enum defined above to save some
> reserved space, but description must refer to the enum
> rte_eth_rx_header_split_protocol_type.

Thanks for your suggestion, will fix it in next version.

> 
> > +	uint16_t reserved; /**< Reserved field. */
> 
> As far as I can see the structure is experimental. So, it should not be the
> problem to extend it, but it is a really good question raised by Stephen in RFC
> v1 discussion.
> Shouldn't we require that all reserved fields are initialized to zero and
> ignored on processing? Frankly speaking I always thought so, but failed to
> find the place where it is documented.

Yes, it can be documented. By default it should be zero, and we can configure
it to enable protocol type based buffer split.

> 
> @Thomas, @David, @Ferruh?
> 
> >   };
> >
> >   /**
> > @@ -1212,7 +1231,7 @@ struct rte_eth_rxseg_split {
> >    * A common structure used to describe Rx packet segment properties.
> >    */
> >   union rte_eth_rxseg {
> > -	/* The settings for buffer split offload. */
> > +	/* The settings for buffer split and header split offload. */
> >   	struct rte_eth_rxseg_split split;
> >   	/* The other features settings should be added here. */
> >   };
> > @@ -1664,6 +1683,31 @@ struct rte_eth_conf {
> >   			     RTE_ETH_RX_OFFLOAD_QINQ_STRIP)
> >   #define DEV_RX_OFFLOAD_VLAN
> RTE_DEPRECATED(DEV_RX_OFFLOAD_VLAN)
> > RTE_ETH_RX_OFFLOAD_VLAN
> >
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this enum may change without prior notice.
> > + * This enum indicates the header split protocol type  */ enum
> > +rte_eth_rx_header_split_protocol_type {
> > +	RTE_ETH_RX_HEADER_SPLIT_NONE = 0,
> > +	RTE_ETH_RX_HEADER_SPLIT_MAC,
> > +	RTE_ETH_RX_HEADER_SPLIT_IPV4,
> > +	RTE_ETH_RX_HEADER_SPLIT_IPV6,
> > +	RTE_ETH_RX_HEADER_SPLIT_L3,
> > +	RTE_ETH_RX_HEADER_SPLIT_TCP,
> > +	RTE_ETH_RX_HEADER_SPLIT_UDP,
> > +	RTE_ETH_RX_HEADER_SPLIT_SCTP,
> > +	RTE_ETH_RX_HEADER_SPLIT_L4,
> > +	RTE_ETH_RX_HEADER_SPLIT_INNER_MAC,
> > +	RTE_ETH_RX_HEADER_SPLIT_INNER_IPV4,
> > +	RTE_ETH_RX_HEADER_SPLIT_INNER_IPV6,
> > +	RTE_ETH_RX_HEADER_SPLIT_INNER_L3,
> > +	RTE_ETH_RX_HEADER_SPLIT_INNER_TCP,
> > +	RTE_ETH_RX_HEADER_SPLIT_INNER_UDP,
> > +	RTE_ETH_RX_HEADER_SPLIT_INNER_SCTP,
> > +	RTE_ETH_RX_HEADER_SPLIT_INNER_L4,
> 
> Enumeration members should be documented. See my question in the patch
> description.

Thanks for your detailed comments, questions are answered accordingly.

Best Regards,
Xuan

> 
> > +};
> > +
> >   /*
> >    * If new Rx offload capabilities are defined, they also must be
> >    * mentioned in rte_rx_offload_names in rte_ethdev.c file.


^ permalink raw reply	[flat|nested] 88+ messages in thread

* RE: [v4 1/3] ethdev: introduce protocol type based header split
  2022-04-07 13:26     ` Jerin Jacob
@ 2022-04-12 16:40       ` Ding, Xuan
  2022-04-20 14:39         ` Andrew Rybchenko
  0 siblings, 1 reply; 88+ messages in thread
From: Ding, Xuan @ 2022-04-12 16:40 UTC (permalink / raw)
  To: Jerin Jacob, Wu, WenxuanX
  Cc: Thomas Monjalon, Andrew Rybchenko, Li, Xiaoyun, Singh, Aman Deep,
	Zhang, Yuying, Zhang, Qi Z, dpdk-dev, Stephen Hemminger,
	Morten Brørup, Viacheslav Ovsiienko, Yu, Ping, Wang, YuanX

Hi Jacob,

> -----Original Message-----
> From: Jerin Jacob <jerinjacobk@gmail.com>
> Sent: Thursday, April 7, 2022 9:27 PM
> To: Wu, WenxuanX <wenxuanx.wu@intel.com>
> Cc: Thomas Monjalon <thomas@monjalon.net>; Andrew Rybchenko
> <andrew.rybchenko@oktetlabs.ru>; Li, Xiaoyun <xiaoyun.li@intel.com>;
> Singh, Aman Deep <aman.deep.singh@intel.com>; Zhang, Yuying
> <yuying.zhang@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>; dpdk-dev
> <dev@dpdk.org>; Stephen Hemminger <stephen@networkplumber.org>;
> Morten Brørup <mb@smartsharesystems.com>; Viacheslav Ovsiienko
> <viacheslavo@nvidia.com>; Yu, Ping <ping.yu@intel.com>; Ding, Xuan
> <xuan.ding@intel.com>; Wang, YuanX <yuanx.wang@intel.com>
> Subject: Re: [v4 1/3] ethdev: introduce protocol type based header split
> 
> On Sat, Apr 2, 2022 at 4:33 PM <wenxuanx.wu@intel.com> wrote:
> >
> > From: Xuan Ding <xuan.ding@intel.com>
> >
> > Header split consists of splitting a received packet into two separate
> > regions based on the packet content. The split happens after the
> > packet header and before the packet payload. Splitting is usually
> > between the packet header that can be posted to a dedicated buffer and
> > the packet payload that can be posted to a different buffer.
> >
> > Currently, Rx buffer split supports length and offset based packet split.
> > Although header split is a subset of buffer split, configuring buffer
> > split based on length is not suitable for NICs that do split based on
> > header protocol types. Because tunneling makes the conversion from
> > length to protocol type impossible.
> >
> > This patch extends the current buffer split to support protocol type
> > and offset based header split. A new proto field is introduced in the
> > rte_eth_rxseg_split structure reserved field to specify header
> > protocol type. With Rx offload flag RTE_ETH_RX_OFFLOAD_HEADER_SPLIT
> > enabled and protocol type configured, PMD will split the ingress
> > packets into two separate regions. Currently, both inner and outer
> > L2/L3/L4 level header split can be supported.
> >
> > For example, let's suppose we configured the Rx queue with the
> > following segments:
> >     seg0 - pool0, off0=2B
> >     seg1 - pool1, off1=128B
> >
> > With header split type configured with RTE_ETH_RX_HEADER_SPLIT_UDP,
> > the packet consists of MAC_IP_UDP_PAYLOAD will be split like following:
> >     seg0 - udp header @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from pool0
> 
> If we set rte_eth_rxseg_split::proto = RTE_ETH_RX_HEADER_SPLIT_UDP and
> rte_eth_rxseg_split.offset = 2, what will be the content of seg0? Will it be:
> - offset: starts at the UDP header
> - segment size: MAX(size of UDP header + 2, 128) (as seg1 starts from 128)?
> Right? If not, please describe.

Proto defines the split location in the packet.
Offset defines the data offset from the beginning of the mbuf data buffer; it can be zero.
With proto and offset configured, received packets will be split into two segments.

So in this configuration, the seg0 content is the UDP header and the seg1 content is the payload.
The size of seg0 is the size of the UDP header, and the size of seg1 is the size of the payload.
rte_eth_rxseg_split.offset = 2/128 decides the mbuf offset, not the segment size.
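
As an illustration, a receiver could then access the two parts like this
(a sketch, assuming the packet was successfully split):

#include <rte_mbuf.h>
#include <rte_udp.h>

/* Sketch: seg0 (first mbuf) holds the UDP header, seg1 (chained mbuf)
 * holds the payload. */
static void
consume_split_pkt(struct rte_mbuf *pkt)
{
	struct rte_udp_hdr *udp =
		rte_pktmbuf_mtod(pkt, struct rte_udp_hdr *);
	struct rte_mbuf *payload_seg = pkt->next;
	char *payload = rte_pktmbuf_mtod(payload_seg, char *);

	/* process header and payload here */
	(void)udp;
	(void)payload;
}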

> 
> Also, I don't think we need to duplicate
> rte_eth_rx_header_split_protocol_type; instead we can reuse the existing
> RTE_PTYPE_* flags.

That's a good idea. Yes, I can use RTE_PTYPE_* here. My only
concern is that the 32-bit RTE_PTYPE_* values will use up the 32 bits of reserved fields.
If this proposal is agreed on, I will use RTE_PTYPE_* instead of rte_eth_rx_header_split_protocol_type.
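
As a rough sketch (assuming the proto field were widened to 32 bits to
hold the full packet type space), the configuration would become:

	/* Hypothetical: reuse packet type flags from rte_mbuf_ptype.h
	 * as split points (not part of the current API). */
	rx_useg[0].split.proto = RTE_PTYPE_L4_UDP;	 /* after outer UDP */
	/* or, for tunneled traffic: */
	rx_useg[0].split.proto = RTE_PTYPE_INNER_L4_UDP; /* after inner UDP */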

Best Regards,
Xuan

> 
> 
> >     seg1 - payload @ 128 in mbuf from pool1
> >
> > The memory attributes for the split parts may differ either - for
> > example the mempool0 and mempool1 belong to dpdk memory and
> external
> > memory, respectively.

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [v4 1/3] ethdev: introduce protocol type based header split
  2022-04-12 16:40       ` Ding, Xuan
@ 2022-04-20 14:39         ` Andrew Rybchenko
  2022-04-21 10:36           ` Thomas Monjalon
  2022-04-25  9:23           ` Ding, Xuan
  0 siblings, 2 replies; 88+ messages in thread
From: Andrew Rybchenko @ 2022-04-20 14:39 UTC (permalink / raw)
  To: Ding, Xuan, Jerin Jacob, Wu, WenxuanX
  Cc: Thomas Monjalon, Li, Xiaoyun, Singh, Aman Deep, Zhang, Yuying,
	Zhang, Qi Z, dpdk-dev, Stephen Hemminger, Morten Brørup,
	Viacheslav Ovsiienko, Yu, Ping, Wang, YuanX

On 4/12/22 19:40, Ding, Xuan wrote:
> Hi Jacob,
> 
>> -----Original Message-----
>> From: Jerin Jacob <jerinjacobk@gmail.com>
>> Sent: Thursday, April 7, 2022 9:27 PM
>> To: Wu, WenxuanX <wenxuanx.wu@intel.com>
>> Cc: Thomas Monjalon <thomas@monjalon.net>; Andrew Rybchenko
>> <andrew.rybchenko@oktetlabs.ru>; Li, Xiaoyun <xiaoyun.li@intel.com>;
>> Singh, Aman Deep <aman.deep.singh@intel.com>; Zhang, Yuying
>> <yuying.zhang@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>; dpdk-dev
>> <dev@dpdk.org>; Stephen Hemminger <stephen@networkplumber.org>;
>> Morten Brørup <mb@smartsharesystems.com>; Viacheslav Ovsiienko
>> <viacheslavo@nvidia.com>; Yu, Ping <ping.yu@intel.com>; Ding, Xuan
>> <xuan.ding@intel.com>; Wang, YuanX <yuanx.wang@intel.com>
>> Subject: Re: [v4 1/3] ethdev: introduce protocol type based header split
>>
>> On Sat, Apr 2, 2022 at 4:33 PM <wenxuanx.wu@intel.com> wrote:
>>>
>>> From: Xuan Ding <xuan.ding@intel.com>
>>>
>>> Header split consists of splitting a received packet into two separate
>>> regions based on the packet content. The split happens after the
>>> packet header and before the packet payload. Splitting is usually
>>> between the packet header that can be posted to a dedicated buffer and
>>> the packet payload that can be posted to a different buffer.
>>>
>>> Currently, Rx buffer split supports length and offset based packet split.
>>> Although header split is a subset of buffer split, configuring buffer
>>> split based on length is not suitable for NICs that do split based on
>>> header protocol types. Because tunneling makes the conversion from
>>> length to protocol type impossible.
>>>
>>> This patch extends the current buffer split to support protocol type
>>> and offset based header split. A new proto field is introduced in the
>>> rte_eth_rxseg_split structure reserved field to specify header
>>> protocol type. With Rx offload flag RTE_ETH_RX_OFFLOAD_HEADER_SPLIT
>>> enabled and protocol type configured, PMD will split the ingress
>>> packets into two separate regions. Currently, both inner and outer
>>> L2/L3/L4 level header split can be supported.
>>>
>>> For example, let's suppose we configured the Rx queue with the
>>> following segments:
>>>      seg0 - pool0, off0=2B
>>>      seg1 - pool1, off1=128B
>>>
>>> With header split type configured with RTE_ETH_RX_HEADER_SPLIT_UDP,
>>> the packet consists of MAC_IP_UDP_PAYLOAD will be split like following:
>>>      seg0 - udp header @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from pool0
>>
>> If we set rte_eth_rxseg_split::proto = RTE_ETH_RX_HEADER_SPLIT_UDP and
>> rte_eth_rxseg_split.offset = 2, what will be the content of seg0? Will it be:
>> - offset: starts at the UDP header
>> - segment size: MAX(size of UDP header + 2, 128) (as seg1 starts from 128)?
>> Right? If not, please describe.
> 
> Proto defines the split location in the packet.
> Offset defines the data offset from the beginning of the mbuf data buffer; it can be zero.
> With proto and offset configured, received packets will be split into two segments.
> 
> So in this configuration, the seg0 content is the UDP header and the seg1 content is the payload.
> The size of seg0 is the size of the UDP header, and the size of seg1 is the size of the payload.
> rte_eth_rxseg_split.offset = 2/128 decides the mbuf offset, not the segment size.

The above discussion proves that the definition of struct
rte_eth_rxseg_split is misleading. It is hard to tell
from the naming that length defines the maximum data amount
to be copied, while offset is an offset in the destination
mbuf. The structure is still experimental and I think
we should improve the naming: offset -> mbuf_offset?

> 
>>
>> Also, I don't think we need to duplicate
>> rte_eth_rx_header_split_protocol_type; instead we can reuse the existing
>> RTE_PTYPE_* flags.
> 
> That's a good idea. Yes, I can use RTE_PTYPE_* here. My only
> concern is that the 32-bit RTE_PTYPE_* values will use up the 32 bits of reserved fields.
> If this proposal is agreed on, I will use RTE_PTYPE_* instead of rte_eth_rx_header_split_protocol_type.
> 
> Best Regards,
> Xuan
> 
>>
>>
>>>      seg1 - payload @ 128 in mbuf from pool1
>>>
>>> The memory attributes for the split parts may differ either - for
>>> example the mempool0 and mempool1 belong to dpdk memory and
>> external
>>> memory, respectively.


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [v4 1/3] ethdev: introduce protocol type based header split
  2022-04-12 16:15       ` Ding, Xuan
@ 2022-04-20 15:48         ` Andrew Rybchenko
  2022-04-25 14:57           ` Ding, Xuan
  2022-04-21 10:27         ` Thomas Monjalon
  1 sibling, 1 reply; 88+ messages in thread
From: Andrew Rybchenko @ 2022-04-20 15:48 UTC (permalink / raw)
  To: Ding, Xuan, Wu, WenxuanX, thomas, Li, Xiaoyun, Singh, Aman Deep,
	Zhang, Yuying, Zhang, Qi Z
  Cc: dev, stephen, mb, viacheslavo, Yu, Ping, Wang, YuanX,
	david.marchand, Ferruh Yigit

Hi Xuan,

On 4/12/22 19:15, Ding, Xuan wrote:
> Hi Andrew,
> 
>> -----Original Message-----
>> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
>> Sent: Thursday, April 7, 2022 6:48 PM
>> To: Wu, WenxuanX <wenxuanx.wu@intel.com>; thomas@monjalon.net; Li,
>> Xiaoyun <xiaoyun.li@intel.com>; Singh, Aman Deep
>> <aman.deep.singh@intel.com>; Zhang, Yuying <yuying.zhang@intel.com>;
>> Zhang, Qi Z <qi.z.zhang@intel.com>
>> Cc: dev@dpdk.org; stephen@networkplumber.org;
>> mb@smartsharesystems.com; viacheslavo@nvidia.com; Yu, Ping
>> <ping.yu@intel.com>; Ding, Xuan <xuan.ding@intel.com>; Wang, YuanX
>> <yuanx.wang@intel.com>; david.marchand@redhat.com; Ferruh Yigit
>> <ferruhy@xilinx.com>
>> Subject: Re: [v4 1/3] ethdev: introduce protocol type based header split
>>
>> On 4/2/22 13:41, wenxuanx.wu@intel.com wrote:
>>> From: Xuan Ding <xuan.ding@intel.com>
>>>
>>> Header split consists of splitting a received packet into two separate
>>> regions based on the packet content. The split happens after the
>>> packet header and before the packet payload. Splitting is usually
>>> between the packet header that can be posted to a dedicated buffer and
>>> the packet payload that can be posted to a different buffer.
>>>
>>> Currently, Rx buffer split supports length and offset based packet split.
>>> Although header split is a subset of buffer split, configuring buffer
>>> split based on length is not suitable for NICs that do split based on
>>> header protocol types. Because tunneling makes the conversion from
>>> length to protocol type impossible.
>>>
>>> This patch extends the current buffer split to support protocol type
>>> and offset based header split. A new proto field is introduced in the
>>> rte_eth_rxseg_split structure reserved field to specify header
>>> protocol type. With Rx offload flag RTE_ETH_RX_OFFLOAD_HEADER_SPLIT
>>> enabled and protocol type configured, PMD will split the ingress
>>> packets into two separate regions. Currently, both inner and outer
>>> L2/L3/L4 level header split can be supported.
>>
>> RTE_ETH_RX_OFFLOAD_HEADER_SPLIT offload was introduced some time
>> ago to substitute bit-field header_split in struct rte_eth_rxmode. It allows to
>> enable header split offload with the header size controlled using
>> split_hdr_size in the same structure.
>>
>> Right now I see no single PMD which actually supports
>> RTE_ETH_RX_OFFLOAD_HEADER_SPLIT with above definition.
>> Many examples and test apps initialize the field to 0 explicitly. Most of the
>> drivers simply ignore split_hdr_size since the offload is not advertised, but
>> some double-check that its value is 0.
>>
>> I think that it means that the field should be removed on the next LTS, and I'd
>> say, together with the RTE_ETH_RX_OFFLOAD_HEADER_SPLIT offload bit.
>>
>> We should not redefine the offload meaning.
> 
> Yes, you are right. No single PMD supports RTE_ETH_RX_OFFLOAD_HEADER_SPLIT now.
> Previously, I used this flag to distinguish buffer split from header split.
> The former supports multi-segment split by length and offset.

offset is misleading here, since split offset is derived from
segment lengths. Offset specified in segments is a different
thing.

> The latter supports a two-segment split by proto and offset.
> At this level, header split is a subset of buffer split.

IMHO, a generic definition of header split should not limit
it to just two segments.

> 
> Since we shouldn't redefine the meaning of this offload,
> I will use the RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT flag.
> The existence of tunnels makes it necessary to define a proto field in buffer split,
> because some PMDs cannot do the split based on length and offset alone.

Not sure that I fully understand, but I'm looking forward
to reviewing v5.

>>>
>>> For example, let's suppose we configured the Rx queue with the
>>> following segments:
>>>       seg0 - pool0, off0=2B
>>>       seg1 - pool1, off1=128B
>>
>> Corresponding feature is named Rx buffer split.
>> Does it mean that protocol type based header split requires Rx buffer split
>> feature to be supported?
> 
> Protocol type based header split does not require Rx buffer split.
> In the previous design, header split and buffer split are exclusive,
> because we only configure one split offload per Rx queue.
> 
>>
>>>
>>> With header split type configured with RTE_ETH_RX_HEADER_SPLIT_UDP,
>>> the packet consists of MAC_IP_UDP_PAYLOAD will be split like following:
>>>       seg0 - udp header @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from
>> pool0
>>>       seg1 - payload @ 128 in mbuf from pool1
>>
>> Is it always outermost UDP? Does it require both UDP over IPv4 and UDP over
>> IPv6 to be supported? What will happen if only one is supported? How
>> application can find out which protocol stack are supported?
> 
> Both inner and outer UDP are considered.
> The current design does not distinguish UDP over IPv4 from UDP over IPv6.
> If we want finer granularity, such as supporting only IPv4 or only IPv6,
> the user needs to add more configuration.

You should make it clear to the application how to use it.
What happens if an unsupported packet is received on an RxQ
configured to do header split?

> 
> If an application wants to find out which protocol stacks are supported,
> one way I can think of is to expose the protocol stacks supported by the driver through dev_info.
> Any thoughts are welcome :)

dev_info is nice, but very heavily overloaded. We can start
from dev_info and understand if it should be factored out
to a separate API, or if it is OK to keep it in dev_info when
it is just a few simple fields.
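
For example, a sketch of such a capability check, assuming a hypothetical
new dev_info field rx_seg_split_proto_capa (a bitmask of supported split
protocols; not an existing field):

	struct rte_eth_dev_info dev_info;

	rte_eth_dev_info_get(port_id, &dev_info);
	/* rx_seg_split_proto_capa is hypothetical, for illustration only */
	if ((dev_info.rx_seg_split_proto_capa & RTE_PTYPE_L4_UDP) == 0)
		printf("UDP-based split not supported on port %u\n", port_id);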

>>
>>>
>>> The memory attributes for the split parts may differ either - for
>>> example the mempool0 and mempool1 belong to dpdk memory and
>> external
>>> memory, respectively.
>>>
>>> Signed-off-by: Xuan Ding <xuan.ding@intel.com>
>>> Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
>>> Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
>>> Reviewed-by: Qi Zhang <qi.z.zhang@intel.com>
>>> ---
>>>    lib/ethdev/rte_ethdev.c | 34 ++++++++++++++++++++++-------
>>>    lib/ethdev/rte_ethdev.h | 48
>> +++++++++++++++++++++++++++++++++++++++--
>>>    2 files changed, 72 insertions(+), 10 deletions(-)
>>>
>>> diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c index
>>> 29a3d80466..29adcdc2f0 100644
>>> --- a/lib/ethdev/rte_ethdev.c
>>> +++ b/lib/ethdev/rte_ethdev.c
>>> @@ -1661,6 +1661,7 @@ rte_eth_rx_queue_check_split(const struct
>> rte_eth_rxseg_split *rx_seg,
>>>    		struct rte_mempool *mpl = rx_seg[seg_idx].mp;
>>>    		uint32_t length = rx_seg[seg_idx].length;
>>>    		uint32_t offset = rx_seg[seg_idx].offset;
>>> +		uint16_t proto = rx_seg[seg_idx].proto;
>>>
>>>    		if (mpl == NULL) {
>>>    			RTE_ETHDEV_LOG(ERR, "null mempool pointer\n");
>> @@ -1694,13
>>> +1695,29 @@ rte_eth_rx_queue_check_split(const struct
>> rte_eth_rxseg_split *rx_seg,
>>>    		}
>>>    		offset += seg_idx != 0 ? 0 : RTE_PKTMBUF_HEADROOM;
>>>    		*mbp_buf_size = rte_pktmbuf_data_room_size(mpl);
>>> -		length = length != 0 ? length : *mbp_buf_size;
>>> -		if (*mbp_buf_size < length + offset) {
>>> -			RTE_ETHDEV_LOG(ERR,
>>> -				       "%s mbuf_data_room_size %u < %u
>> (segment length=%u + segment offset=%u)\n",
>>> -				       mpl->name, *mbp_buf_size,
>>> -				       length + offset, length, offset);
>>> -			return -EINVAL;
>>> +		if (proto == RTE_ETH_RX_HEADER_SPLIT_NONE) {
>>> +			/* Check buffer split. */
>>> +			length = length != 0 ? length : *mbp_buf_size;
>>> +			if (*mbp_buf_size < length + offset) {
>>> +				RTE_ETHDEV_LOG(ERR,
>>> +					"%s mbuf_data_room_size %u < %u
>> (segment length=%u + segment offset=%u)\n",
>>> +					mpl->name, *mbp_buf_size,
>>> +					length + offset, length, offset);
>>> +				return -EINVAL;
>>> +			}
>>> +		} else {
>>> +			/* Check header split. */
>>> +			if (length != 0) {
>>> +				RTE_ETHDEV_LOG(ERR, "segment length
>> should be set to zero in header split\n");
>>> +				return -EINVAL;
>>> +			}
>>> +			if (*mbp_buf_size < offset) {
>>> +				RTE_ETHDEV_LOG(ERR,
>>> +					"%s mbuf_data_room_size %u < %u
>> segment offset)\n",
>>> +					mpl->name, *mbp_buf_size,
>>> +					offset);
>>> +				return -EINVAL;
>>> +			}
>>>    		}
>>>    	}
>>>    	return 0;
>>> @@ -1778,7 +1795,8 @@ rte_eth_rx_queue_setup(uint16_t port_id,
>> uint16_t rx_queue_id,
>>>    		rx_seg = (const struct rte_eth_rxseg_split *)rx_conf->rx_seg;
>>>    		n_seg = rx_conf->rx_nseg;
>>>
>>> -		if (rx_conf->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT)
>> {
>>> +		if (rx_conf->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT
>> ||
>>> +			rx_conf->offloads &
>> RTE_ETH_RX_OFFLOAD_HEADER_SPLIT) {
>>>    			ret = rte_eth_rx_queue_check_split(rx_seg, n_seg,
>>>    							   &mbp_buf_size,
>>>    							   &dev_info);
>>> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h index
>>> 04cff8ee10..e8371b98ed 100644
>>> --- a/lib/ethdev/rte_ethdev.h
>>> +++ b/lib/ethdev/rte_ethdev.h
>>> @@ -1197,12 +1197,31 @@ struct rte_eth_txmode {
>>>     *     - pool from the last valid element
>>>     *     - the buffer size from this pool
>>>     *     - zero offset
>>> + *
>>> + * Header split is a subset of buffer split. The split happens after
>>> + the
>>> + * packet header and before the packet payload. For PMDs that do not
>>> + * support header split configuration by length, the location of the
>>> + split
>>> + * needs to be specified by the header protocol type. While for
>>> + buffer split,
>>> + * this field should not be configured.
>>> + *
>>> + * If RTE_ETH_RX_OFFLOAD_HEADER_SPLIT flag is set in offloads field,
>>> + * the PMD will split the received packets into two separate regions:
>>> + * - The header buffer will be allocated from the memory pool,
>>> + *   specified in the first array element, the second buffer, from the
>>> + *   pool in the second element.
>>> + *
>>> + * - The lengths do not need to be configured in header split.
>>> + *
>>> + * - The offsets from the segment description elements specify
>>> + *   the data offset from the buffer beginning except the first mbuf.
>>> + *   The first segment offset is added with RTE_PKTMBUF_HEADROOM.
>>>     */
>>>    struct rte_eth_rxseg_split {
>>>    	struct rte_mempool *mp; /**< Memory pool to allocate segment
>> from. */
>>>    	uint16_t length; /**< Segment data length, configures split point. */
>>>    	uint16_t offset; /**< Data offset from beginning of mbuf data buffer.
>> */
>>> -	uint32_t reserved; /**< Reserved field. */
>>> +	uint16_t proto; /**< header protocol type, configures header split
>>> +point. */
>>
>> I realize that you don't want to use here enum defined above to save some
>> reserved space, but description must refer to the enum
>> rte_eth_rx_header_split_protocol_type.
> 
> Thanks for your suggestion, will fix it in next version.
> 
>>
>>> +	uint16_t reserved; /**< Reserved field. */
>>
>> As far as I can see the structure is experimental. So, it should not be the
>> problem to extend it, but it is a really good question raised by Stephen in RFC
>> v1 discussion.
>> Shouldn't we require that all reserved fields are initialized to zero and
>> ignored on processing? Frankly speaking I always thought so, but failed to
>> find the place where it is documented.
> 
> Yes, it can be documented. By default it should be zero, and we can configure
> it to enable protocol type based buffer split.
> 
>>
>> @Thomas, @David, @Ferruh?
>>
>>>    };
>>>
>>>    /**
>>> @@ -1212,7 +1231,7 @@ struct rte_eth_rxseg_split {
>>>     * A common structure used to describe Rx packet segment properties.
>>>     */
>>>    union rte_eth_rxseg {
>>> -	/* The settings for buffer split offload. */
>>> +	/* The settings for buffer split and header split offload. */
>>>    	struct rte_eth_rxseg_split split;
>>>    	/* The other features settings should be added here. */
>>>    };
>>> @@ -1664,6 +1683,31 @@ struct rte_eth_conf {
>>>    			     RTE_ETH_RX_OFFLOAD_QINQ_STRIP)
>>>    #define DEV_RX_OFFLOAD_VLAN
>> RTE_DEPRECATED(DEV_RX_OFFLOAD_VLAN)
>>> RTE_ETH_RX_OFFLOAD_VLAN
>>>
>>> +/**
>>> + * @warning
>>> + * @b EXPERIMENTAL: this enum may change without prior notice.
>>> + * This enum indicates the header split protocol type  */ enum
>>> +rte_eth_rx_header_split_protocol_type {
>>> +	RTE_ETH_RX_HEADER_SPLIT_NONE = 0,
>>> +	RTE_ETH_RX_HEADER_SPLIT_MAC,
>>> +	RTE_ETH_RX_HEADER_SPLIT_IPV4,
>>> +	RTE_ETH_RX_HEADER_SPLIT_IPV6,
>>> +	RTE_ETH_RX_HEADER_SPLIT_L3,
>>> +	RTE_ETH_RX_HEADER_SPLIT_TCP,
>>> +	RTE_ETH_RX_HEADER_SPLIT_UDP,
>>> +	RTE_ETH_RX_HEADER_SPLIT_SCTP,
>>> +	RTE_ETH_RX_HEADER_SPLIT_L4,
>>> +	RTE_ETH_RX_HEADER_SPLIT_INNER_MAC,
>>> +	RTE_ETH_RX_HEADER_SPLIT_INNER_IPV4,
>>> +	RTE_ETH_RX_HEADER_SPLIT_INNER_IPV6,
>>> +	RTE_ETH_RX_HEADER_SPLIT_INNER_L3,
>>> +	RTE_ETH_RX_HEADER_SPLIT_INNER_TCP,
>>> +	RTE_ETH_RX_HEADER_SPLIT_INNER_UDP,
>>> +	RTE_ETH_RX_HEADER_SPLIT_INNER_SCTP,
>>> +	RTE_ETH_RX_HEADER_SPLIT_INNER_L4,
>>
>> Enumeration members should be documented. See my question in the patch
>> description.
> 
> Thanks for your detailed comments, questions are answered accordingly.
> 
> Best Regards,
> Xuan
> 
>>
>>> +};
>>> +
>>>    /*
>>>     * If new Rx offload capabilities are defined, they also must be
>>>     * mentioned in rte_rx_offload_names in rte_ethdev.c file.
> 


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [v4 1/3] ethdev: introduce protocol type based header split
  2022-04-12 16:15       ` Ding, Xuan
  2022-04-20 15:48         ` Andrew Rybchenko
@ 2022-04-21 10:27         ` Thomas Monjalon
  2022-04-25 15:05           ` Ding, Xuan
  1 sibling, 1 reply; 88+ messages in thread
From: Thomas Monjalon @ 2022-04-21 10:27 UTC (permalink / raw)
  To: Andrew Rybchenko, Wu, WenxuanX, Li, Xiaoyun, Singh, Aman Deep,
	Zhang, Yuying, Zhang, Qi Z, dev
  Cc: dev, stephen, mb, viacheslavo, Yu, Ping, Wang, YuanX,
	david.marchand, Ferruh Yigit, Ding, Xuan

12/04/2022 18:15, Ding, Xuan:
> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> > On 4/2/22 13:41, wenxuanx.wu@intel.com wrote:
> > > From: Xuan Ding <xuan.ding@intel.com>
> > >
> > > Header split consists of splitting a received packet into two separate
> > > regions based on the packet content. The split happens after the
> > > packet header and before the packet payload. Splitting is usually
> > > between the packet header that can be posted to a dedicated buffer and
> > > the packet payload that can be posted to a different buffer.
> > >
> > > Currently, Rx buffer split supports length and offset based packet split.
> > > Although header split is a subset of buffer split, configuring buffer
> > > split based on length is not suitable for NICs that do split based on
> > > header protocol types. Because tunneling makes the conversion from
> > > length to protocol type impossible.
> > >
> > > This patch extends the current buffer split to support protocol type
> > > and offset based header split. A new proto field is introduced in the
> > > rte_eth_rxseg_split structure reserved field to specify header
> > > protocol type. With Rx offload flag RTE_ETH_RX_OFFLOAD_HEADER_SPLIT
> > > enabled and protocol type configured, PMD will split the ingress
> > > packets into two separate regions. Currently, both inner and outer
> > > L2/L3/L4 level header split can be supported.
> > 
> > RTE_ETH_RX_OFFLOAD_HEADER_SPLIT offload was introduced some time
> > ago to substitute bit-field header_split in struct rte_eth_rxmode. It allows to
> > enable header split offload with the header size controlled using
> > split_hdr_size in the same structure.
> > 
> > Right now I see no single PMD which actually supports
> > RTE_ETH_RX_OFFLOAD_HEADER_SPLIT with above definition.
> > Many examples and test apps initialize the field to 0 explicitly. Most of the
> > drivers simply ignore split_hdr_size since the offload is not advertised, but
> > some double-check that its value is 0.
> > 
> > I think that it means that the field should be removed on the next LTS, and I'd
> > say, together with the RTE_ETH_RX_OFFLOAD_HEADER_SPLIT offload bit.
> > 
> > We should not redefine the offload meaning.
> 
> Yes, you are right. No single PMD supports RTE_ETH_RX_OFFLOAD_HEADER_SPLIT now.
> Previously, I used this flag to distinguish buffer split from header split.
> The former supports multi-segment split by length and offset.
> The latter supports a two-segment split by proto and offset.
> At this level, header split is a subset of buffer split.
> 
> Since we shouldn't redefine the meaning of this offload,
> I will use the RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT flag.
> The existence of tunnels makes it necessary to define a proto field in buffer split,
> because some PMDs cannot do the split based on length and offset alone.

Before doing anything, the first patch of this series should make
the current status clearer.
For example, this line does not explain what it does:
	uint16_t split_hdr_size;  /**< hdr buf size (header_split enabled).*/
And header_split has been removed in ab3ce1e0c193 ("ethdev: remove old offload API")

If RTE_ETH_RX_OFFLOAD_HEADER_SPLIT is not needed,
let's add a comment to start a deprecation.
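
For instance, the deprecation could start with a doxygen note like this
(wording illustrative only):

	uint16_t split_hdr_size;
	/**< Header buffer size when header split is enabled.
	 * @deprecated Not implemented by any PMD; planned for removal
	 * in the next LTS together with RTE_ETH_RX_OFFLOAD_HEADER_SPLIT.
	 */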

> > > For example, let's suppose we configured the Rx queue with the
> > > following segments:
> > >      seg0 - pool0, off0=2B
> > >      seg1 - pool1, off1=128B
> > 
> > Corresponding feature is named Rx buffer split.
> > Does it mean that protocol type based header split requires Rx buffer split
> > feature to be supported?
> 
> Protocol type based header split does not require Rx buffer split.
> In the previous design, header split and buffer split are exclusive,
> because we only configure one split offload per Rx queue.

Things must be made clear and documented.

> > > With header split type configured with RTE_ETH_RX_HEADER_SPLIT_UDP,
> > > the packet consists of MAC_IP_UDP_PAYLOAD will be split like following:
> > >      seg0 - udp header @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from
> > pool0
> > >      seg1 - payload @ 128 in mbuf from pool1
> > 
> > Is it always outermost UDP? Does it require both UDP over IPv4 and UDP over
> > IPv6 to be supported? What will happen if only one is supported? How
> > application can find out which protocol stack are supported?
> 
> Both inner and outer UDP are considered.
> The current design does not distinguish UDP over IPv4 from UDP over IPv6.
> If we want finer granularity, such as supporting only IPv4 or only IPv6,
> the user needs to add more configuration.
> 
> If an application wants to find out which protocol stacks are supported,
> one way I can think of is to expose the protocol stacks supported by the driver through dev_info.
> Any thoughts are welcome :)
[...]
> > > +	uint16_t reserved; /**< Reserved field. */
> > 
> > As far as I can see the structure is experimental. So, it should not be the
> > problem to extend it, but it is a really good question raised by Stephen in RFC
> > v1 discussion.
> > Shouldn't we require that all reserved fields are initialized to zero and
> > ignored on processing? Frankly speaking I always thought so, but failed to
> > find the place where it is documented.
> 
> Yes, it can be documented. By default it should be zero, and we can configure
> it to enable protocol type based buffer split.
> 
> > @Thomas, @David, @Ferruh?

Yes, it's very important to have a clear state for the reserved fields.
A value must be set and documented.




^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [v4 1/3] ethdev: introduce protocol type based header split
  2022-04-20 14:39         ` Andrew Rybchenko
@ 2022-04-21 10:36           ` Thomas Monjalon
  2022-04-25  9:23           ` Ding, Xuan
  1 sibling, 0 replies; 88+ messages in thread
From: Thomas Monjalon @ 2022-04-21 10:36 UTC (permalink / raw)
  To: Ding, Xuan, Jerin Jacob, Wu, WenxuanX, dev
  Cc: Li, Xiaoyun, Singh, Aman Deep, Zhang, Yuying, Zhang, Qi Z,
	dpdk-dev, Stephen Hemminger, Morten Brørup,
	Viacheslav Ovsiienko, Yu, Ping, Wang, YuanX, Andrew Rybchenko

20/04/2022 16:39, Andrew Rybchenko:
> On 4/12/22 19:40, Ding, Xuan wrote:
> > From: Jerin Jacob <jerinjacobk@gmail.com>
> >> On Sat, Apr 2, 2022 at 4:33 PM <wenxuanx.wu@intel.com> wrote:
> >>> From: Xuan Ding <xuan.ding@intel.com>
> >>>
> >>> Header split consists of splitting a received packet into two separate
> >>> regions based on the packet content. The split happens after the
> >>> packet header and before the packet payload. Splitting is usually
> >>> between the packet header that can be posted to a dedicated buffer and
> >>> the packet payload that can be posted to a different buffer.
> >>>
> >>> Currently, Rx buffer split supports length and offset based packet split.
> >>> Although header split is a subset of buffer split, configuring buffer
> >>> split based on length is not suitable for NICs that do split based on
> >>> header protocol types. Because tunneling makes the conversion from
> >>> length to protocol type impossible.
> >>>
> >>> This patch extends the current buffer split to support protocol type
> >>> and offset based header split. A new proto field is introduced in the
> >>> rte_eth_rxseg_split structure reserved field to specify header
> >>> protocol type. With Rx offload flag RTE_ETH_RX_OFFLOAD_HEADER_SPLIT
> >>> enabled and protocol type configured, PMD will split the ingress
> >>> packets into two separate regions. Currently, both inner and outer
> >>> L2/L3/L4 level header split can be supported.
> >>>
> >>> For example, let's suppose we configured the Rx queue with the
> >>> following segments:
> >>>      seg0 - pool0, off0=2B
> >>>      seg1 - pool1, off1=128B
> >>>
> >>> With header split type configured with RTE_ETH_RX_HEADER_SPLIT_UDP,
> >>> the packet consists of MAC_IP_UDP_PAYLOAD will be split like following:
> >>>      seg0 - udp header @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from pool0
> >>
> >> If we set rte_eth_rxseg_split::proto = RTE_ETH_RX_HEADER_SPLIT_UDP and
> >> rte_eth_rxseg_split.offset = 2, what will be the content of seg0? Will it be:
> >> - offset: starts at the UDP header
> >> - segment size: MAX(size of UDP header + 2, 128) (as seg1 starts from 128)?
> >> Right? If not, please describe.
> > 
> > Proto defines the split location in the packet.
> > Offset defines the data offset from the beginning of the mbuf data buffer; it can be zero.
> > With proto and offset configured, received packets will be split into two segments.
> > 
> > So in this configuration, the seg0 content is the UDP header and the seg1 content is the payload.
> > The size of seg0 is the size of the UDP header, and the size of seg1 is the size of the payload.
> > rte_eth_rxseg_split.offset = 2/128 decides the mbuf offset, not the segment size.
> 
> The above discussion proves that the definition of struct
> rte_eth_rxseg_split is misleading. It is hard to tell
> from the naming that length defines the maximum data amount
> to be copied, while offset is an offset in the destination
> mbuf. The structure is still experimental and I think
> we should improve the naming: offset -> mbuf_offset?

I agree it is confusing.
mbuf_offset could be a better name.
length could be renamed as well. Is data_length better?

But the most important thing is to have a clear description
in the doxygen comment of each field.
We must specify the starting point and the "end" for those fields.
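
As one possible clarified wording (illustrative only, with the proposed
renaming applied):

	struct rte_eth_rxseg_split {
		struct rte_mempool *mp; /**< Mempool to allocate segment from. */
		uint16_t length; /**< Maximum data bytes received into this
				  * segment, counted from the split point;
				  * 0 means the pool buffer size. */
		uint16_t mbuf_offset; /**< Data offset from the start of the
				       * mbuf data buffer; RTE_PKTMBUF_HEADROOM
				       * is added for the first segment only. */
		uint16_t proto; /**< Protocol ending the split-off header part;
				 * 0 disables protocol based split. */
		uint16_t reserved; /**< Reserved, must be set to 0. */
	};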

> >> Also, I don't think we need to duplicate
> >> rte_eth_rx_header_split_protocol_type; instead we can reuse the existing
> >> RTE_PTYPE_* flags.
> > 
> > That's a good idea. Yes, I can use RTE_PTYPE_* here. My only
> > concern is that the 32-bit RTE_PTYPE_* values will use up the 32 bits of reserved fields.
> > If this proposal is agreed on, I will use RTE_PTYPE_* instead of rte_eth_rx_header_split_protocol_type.

Yes I think RTE_PTYPE_* is appropriate.




^ permalink raw reply	[flat|nested] 88+ messages in thread

* RE: [v4 1/3] ethdev: introduce protocol type based header split
  2022-04-20 14:39         ` Andrew Rybchenko
  2022-04-21 10:36           ` Thomas Monjalon
@ 2022-04-25  9:23           ` Ding, Xuan
  1 sibling, 0 replies; 88+ messages in thread
From: Ding, Xuan @ 2022-04-25  9:23 UTC (permalink / raw)
  To: Andrew Rybchenko, Jerin Jacob, Wu, WenxuanX
  Cc: Thomas Monjalon, Li, Xiaoyun, Singh, Aman Deep, Zhang, Yuying,
	Zhang, Qi Z, dpdk-dev, Stephen Hemminger, Morten Brørup,
	Viacheslav Ovsiienko, Yu, Ping, Wang, YuanX

Hi Andrew,

> -----Original Message-----
> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> Sent: Wednesday, April 20, 2022 10:40 PM
> To: Ding, Xuan <xuan.ding@intel.com>; Jerin Jacob <jerinjacobk@gmail.com>;
> Wu, WenxuanX <wenxuanx.wu@intel.com>
> Cc: Thomas Monjalon <thomas@monjalon.net>; Li, Xiaoyun
> <xiaoyun.li@intel.com>; Singh, Aman Deep <aman.deep.singh@intel.com>;
> Zhang, Yuying <yuying.zhang@intel.com>; Zhang, Qi Z
> <qi.z.zhang@intel.com>; dpdk-dev <dev@dpdk.org>; Stephen Hemminger
> <stephen@networkplumber.org>; Morten Brørup
> <mb@smartsharesystems.com>; Viacheslav Ovsiienko
> <viacheslavo@nvidia.com>; Yu, Ping <ping.yu@intel.com>; Wang, YuanX
> <yuanx.wang@intel.com>
> Subject: Re: [v4 1/3] ethdev: introduce protocol type based header split
> 
> On 4/12/22 19:40, Ding, Xuan wrote:
> > Hi Jacob,
> >
> >> -----Original Message-----
> >> From: Jerin Jacob <jerinjacobk@gmail.com>
> >> Sent: Thursday, April 7, 2022 9:27 PM
> >> To: Wu, WenxuanX <wenxuanx.wu@intel.com>
> >> Cc: Thomas Monjalon <thomas@monjalon.net>; Andrew Rybchenko
> >> <andrew.rybchenko@oktetlabs.ru>; Li, Xiaoyun <xiaoyun.li@intel.com>;
> >> Singh, Aman Deep <aman.deep.singh@intel.com>; Zhang, Yuying
> >> <yuying.zhang@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>;
> >> dpdk-dev <dev@dpdk.org>; Stephen Hemminger
> >> <stephen@networkplumber.org>; Morten Brørup
> >> <mb@smartsharesystems.com>; Viacheslav Ovsiienko
> >> <viacheslavo@nvidia.com>; Yu, Ping <ping.yu@intel.com>; Ding, Xuan
> >> <xuan.ding@intel.com>; Wang, YuanX <yuanx.wang@intel.com>
> >> Subject: Re: [v4 1/3] ethdev: introduce protocol type based header
> >> split
> >>
> >> On Sat, Apr 2, 2022 at 4:33 PM <wenxuanx.wu@intel.com> wrote:
> >>>
> >>> From: Xuan Ding <xuan.ding@intel.com>
> >>>
> >>> Header split consists of splitting a received packet into two
> >>> separate regions based on the packet content. The split happens
> >>> after the packet header and before the packet payload. Splitting is
> >>> usually between the packet header that can be posted to a dedicated
> >>> buffer and the packet payload that can be posted to a different buffer.
> >>>
> >>> Currently, Rx buffer split supports length and offset based packet split.
> >>> Although header split is a subset of buffer split, configuring
> >>> buffer split based on length is not suitable for NICs that do split
> >>> based on header protocol types. Because tunneling makes the
> >>> conversion from length to protocol type impossible.
> >>>
> >>> This patch extends the current buffer split to support protocol type
> >>> and offset based header split. A new proto field is introduced in
> >>> the rte_eth_rxseg_split structure reserved field to specify header
> >>> protocol type. With Rx offload flag
> RTE_ETH_RX_OFFLOAD_HEADER_SPLIT
> >>> enabled and protocol type configured, PMD will split the ingress
> >>> packets into two separate regions. Currently, both inner and outer
> >>> L2/L3/L4 level header split can be supported.
> >>>
> >>> For example, let's suppose we configured the Rx queue with the
> >>> following segments:
> >>>      seg0 - pool0, off0=2B
> >>>      seg1 - pool1, off1=128B
> >>>
> >>> With header split type configured with RTE_ETH_RX_HEADER_SPLIT_UDP,
> >>> the packet consists of MAC_IP_UDP_PAYLOAD will be split like following:
> >>>      seg0 - udp header @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from
> pool0
> >>
> >> If we set rte_eth_rxseg_split::proto = RTE_ETH_RX_HEADER_SPLIT_UDP
> >> and rte_eth_rxseg_split.offset = 2, what will be the content of
> >> seg0? Will it be:
> >> - offset: starts at the UDP header
> >> - segment size: MAX(size of UDP header + 2, 128) (as seg1 starts
> >> from 128)?
> >> Right? If not, please describe.
> >
> > Proto defines the split location in the packet.
> > Offset defines the data offset from the beginning of the mbuf data
> > buffer; it can be zero.
> > With proto and offset configured, received packets will be split into
> > two segments.
> >
> > So in this configuration, the seg0 content is the UDP header and the
> > seg1 content is the payload.
> > The size of seg0 is the size of the UDP header, and the size of seg1
> > is the size of the payload.
> > rte_eth_rxseg_split.offset = 2/128 decides the mbuf offset, not the
> > segment size.
> 
> The above discussion proves that the definition of struct rte_eth_rxseg_split
> is misleading. It is hard to tell from the naming that length defines the
> maximum data amount to be copied, while offset is an offset in the destination
> mbuf. The structure is still experimental and I think we should improve the
> naming: offset -> mbuf_offset?

Yes, you are right. In the rte_eth_rxseg_split structure, even though length and offset
are documented, they are hard to understand just from the naming.

Thanks,
Xuan

> 
> >
> >>
> >> Also, I don't think we need to duplicate
> >> rte_eth_rx_header_split_protocol_type; instead we can reuse the
> >> existing RTE_PTYPE_* flags.
> >
> > That's a good idea. Yes, I can use RTE_PTYPE_* here. My only
> > concern is that the 32-bit RTE_PTYPE_* values will use up the 32 bits
> > of reserved fields.
> > If this proposal is agreed on, I will use RTE_PTYPE_* instead of
> > rte_eth_rx_header_split_protocol_type.
> >
> > Best Regards,
> > Xuan
> >
> >>
> >>
> >>>      seg1 - payload @ 128 in mbuf from pool1
> >>>
> >>> The memory attributes for the split parts may differ either - for
> >>> example the mempool0 and mempool1 belong to dpdk memory and
> >> external
> >>> memory, respectively.


^ permalink raw reply	[flat|nested] 88+ messages in thread

* RE: [v4 1/3] ethdev: introduce protocol type based header split
  2022-04-20 15:48         ` Andrew Rybchenko
@ 2022-04-25 14:57           ` Ding, Xuan
  0 siblings, 0 replies; 88+ messages in thread
From: Ding, Xuan @ 2022-04-25 14:57 UTC (permalink / raw)
  To: Andrew Rybchenko, Wu, WenxuanX, thomas, Li, Xiaoyun, Singh,
	Aman Deep, Zhang, Yuying, Zhang, Qi Z
  Cc: dev, stephen, mb, viacheslavo, Yu, Ping, Wang, YuanX,
	david.marchand, Ferruh Yigit

Hi Andrew,

> -----Original Message-----
> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> Sent: Wednesday, April 20, 2022 11:48 PM
> To: Ding, Xuan <xuan.ding@intel.com>; Wu, WenxuanX
> <wenxuanx.wu@intel.com>; thomas@monjalon.net; Li, Xiaoyun
> <xiaoyun.li@intel.com>; Singh, Aman Deep <aman.deep.singh@intel.com>;
> Zhang, Yuying <yuying.zhang@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>
> Cc: dev@dpdk.org; stephen@networkplumber.org;
> mb@smartsharesystems.com; viacheslavo@nvidia.com; Yu, Ping
> <ping.yu@intel.com>; Wang, YuanX <yuanx.wang@intel.com>;
> david.marchand@redhat.com; Ferruh Yigit <ferruhy@xilinx.com>
> Subject: Re: [v4 1/3] ethdev: introduce protocol type based header split
> 
> Hi Xuan,
> 
> On 4/12/22 19:15, Ding, Xuan wrote:
> > Hi Andrew,
> >
> >> -----Original Message-----
> >> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> >> Sent: Thursday, April 7, 2022 6:48 PM
> >> To: Wu, WenxuanX <wenxuanx.wu@intel.com>; thomas@monjalon.net;
> Li,
> >> Xiaoyun <xiaoyun.li@intel.com>; Singh, Aman Deep
> >> <aman.deep.singh@intel.com>; Zhang, Yuying <yuying.zhang@intel.com>;
> >> Zhang, Qi Z <qi.z.zhang@intel.com>
> >> Cc: dev@dpdk.org; stephen@networkplumber.org;
> >> mb@smartsharesystems.com; viacheslavo@nvidia.com; Yu, Ping
> >> <ping.yu@intel.com>; Ding, Xuan <xuan.ding@intel.com>; Wang, YuanX
> >> <yuanx.wang@intel.com>; david.marchand@redhat.com; Ferruh Yigit
> >> <ferruhy@xilinx.com>
> >> Subject: Re: [v4 1/3] ethdev: introduce protocol type based header
> >> split
> >>
> >> On 4/2/22 13:41, wenxuanx.wu@intel.com wrote:
> >>> From: Xuan Ding <xuan.ding@intel.com>
> >>>
> >>> Header split consists of splitting a received packet into two
> >>> separate regions based on the packet content. The split happens
> >>> after the packet header and before the packet payload. Splitting is
> >>> usually between the packet header that can be posted to a dedicated
> >>> buffer and the packet payload that can be posted to a different buffer.
> >>>
> >>> Currently, Rx buffer split supports length and offset based packet split.
> >>> Although header split is a subset of buffer split, configuring
> >>> buffer split based on length is not suitable for NICs that do split
> >>> based on header protocol types. Because tunneling makes the
> >>> conversion from length to protocol type impossible.
> >>>
> >>> This patch extends the current buffer split to support protocol type
> >>> and offset based header split. A new proto field is introduced in
> >>> the rte_eth_rxseg_split structure reserved field to specify header
> >>> protocol type. With Rx offload flag
> RTE_ETH_RX_OFFLOAD_HEADER_SPLIT
> >>> enabled and protocol type configured, PMD will split the ingress
> >>> packets into two separate regions. Currently, both inner and outer
> >>> L2/L3/L4 level header split can be supported.
> >>
> >> RTE_ETH_RX_OFFLOAD_HEADER_SPLIT offload was introduced some time
> ago
> >> to substitute bit-field header_split in struct rte_eth_rxmode. It
> >> allows to enable header split offload with the header size controlled
> >> using split_hdr_size in the same structure.
> >>
> >> Right now I see no single PMD which actually supports
> >> RTE_ETH_RX_OFFLOAD_HEADER_SPLIT with above definition.
> >> Many examples and test apps initialize the field to 0 explicitly. Most
> >> of the drivers simply ignore split_hdr_size since the offload is not
> >> advertised, but some double-check that its value is 0.
> >>
> >> I think that it means that the field should be removed on the next
> >> LTS, and I'd say, together with the RTE_ETH_RX_OFFLOAD_HEADER_SPLIT
> offload bit.
> >>
> >> We should not redefine the offload meaning.
> >
> > Yes, you are right. No single PMD supports
> RTE_ETH_RX_OFFLOAD_HEADER_SPLIT now.
> > Previously, I used this flag to distinguish buffer split from header split.
> > The former supports multi-segment split by length and offset.
> 
> offset is misleading here, since split offset is derived from segment lengths.
> Offset specified in segments is a different thing.

Yes, the length defines the segment length, and the offset defines the data offset in the mbuf.
The usage of length and offset is explained in the comments, but they are somewhat misleading
just from the names.

> 
> > The latter supports a two-segment split by proto and offset.
> > At this level, header split is a subset of buffer split.
> 
> IMHO, a generic definition of header split should not limit it to just two
> segments.

Does the header split here refer to the traditional header split?
If so, since you mentioned before that we should not redefine the offload meaning,
I will use protocol and mbuf_offset based buffer split in the next version.

It is worth noting that the purpose of specifying the split location by protocol is
to divide a packet into two segments. If you want to divide a packet into multiple
segments, the split points should still be specified by length.

> 
> >
> > Since we shouldn't redefine the meaning of this offload, I will use
> > the RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT flag.
> > The existence of tunnels makes it necessary to define a proto field in buffer split,
> > because some PMDs cannot do the split based on length and offset alone.
> 
> Not sure that I fully understand, but I'm looking forward to reviewing v5.

Thanks for your comments, I will send a v5, including these main changes:
1. Use protocol and mbuf_offset based buffer split instead of header split.
2. Use RTE_PTYPE_* instead of enum rte_eth_rx_header_split_protocol_type.
3. Improve the description of rte_eth_rxseg_split.proto.

Your comments are welcome. 😊

> 
> >>>
> >>> For example, let's suppose we configured the Rx queue with the
> >>> following segments:
> >>>       seg0 - pool0, off0=2B
> >>>       seg1 - pool1, off1=128B
> >>
> >> Corresponding feature is named Rx buffer split.
> >> Does it mean that protocol type based header split requires Rx buffer
> >> split feature to be supported?
> >
> > Protocol type based header split does not require Rx buffer split.
> > In the previous design, header split and buffer split are exclusive,
> > because we only configure one split offload per Rx queue.
> >
> >>
> >>>
> >>> With header split type configured with RTE_ETH_RX_HEADER_SPLIT_UDP,
> >>> the packet consists of MAC_IP_UDP_PAYLOAD will be split like following:
> >>>       seg0 - udp header @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from
> >> pool0
> >>>       seg1 - payload @ 128 in mbuf from pool1
> >>
> >> Is it always outermost UDP? Does it require both UDP over IPv4 and
> >> UDP over
> >> IPv6 to be supported? What will happen if only one is supported? How
> >> application can find out which protocol stack are supported?
> >
> > Both inner and outer UDP are considered.
> > The current design does not distinguish UDP over IPv4 from UDP over IPv6.
> > If we want finer granularity, such as supporting only IPv4 or only IPv6,
> > the user needs to add more configuration.

Thanks for your suggestion.
I will improve the documentation about the usage of proto-based buffer split.

> 
> You should make it clear to the application how to use it.
> What happens if an unsupported packet is received on an RxQ configured to do
> header split?


In fact, buffer split and rte_flow are used in combination. It is expected that
the received packets will be steered to the Rx queue configured with the buffer
split offload, so there won't be unsupported packets received on that queue.
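
For example, a sketch of steering all IPv4/UDP traffic to queue 0 (the
queue configured with the buffer split offload), using the standard
rte_flow API:

#include <rte_flow.h>

static struct rte_flow *
steer_udp_to_split_queue(uint16_t port_id)
{
	struct rte_flow_attr attr = { .ingress = 1 };
	struct rte_flow_item pattern[] = {
		{ .type = RTE_FLOW_ITEM_TYPE_ETH },
		{ .type = RTE_FLOW_ITEM_TYPE_IPV4 },
		{ .type = RTE_FLOW_ITEM_TYPE_UDP },
		{ .type = RTE_FLOW_ITEM_TYPE_END },
	};
	struct rte_flow_action_queue queue_act = { .index = 0 };
	struct rte_flow_action actions[] = {
		{ .type = RTE_FLOW_ACTION_TYPE_QUEUE, .conf = &queue_act },
		{ .type = RTE_FLOW_ACTION_TYPE_END },
	};
	struct rte_flow_error err;

	/* NULL on failure; err.message then holds the driver's reason */
	return rte_flow_create(port_id, &attr, pattern, actions, &err);
}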

> 
> >
> > If an application wants to find out which protocol stacks are supported, one
> > way I can think of is to expose the protocol stacks supported by the driver
> > through dev_info.
> > Any thoughts are welcome :)
> 
> dev_info is nice, but very heavily overloaded. We can start from dev_info
> and understand if it should be factored out to a separate API, or if it is OK
> to keep it in dev_info when it is just a few simple fields.

I also think that exposing the protocol stacks through dev_info is heavy.
Alternatively, we can let the application configure any protocol, while the driver
supports only part of the stacks. For protocols the driver does not support, it can return an error.
What do you think of this design?
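
As a sketch of that idea (the function name and the supported set below are
made up for illustration), the PMD would validate the proto at queue setup:

	static int
	example_pmd_check_split_proto(uint32_t proto)
	{
		switch (proto) {
		case RTE_PTYPE_L2_ETHER:
		case RTE_PTYPE_L4_TCP:
		case RTE_PTYPE_L4_UDP:
			return 0;		/* supported by this PMD */
		default:
			/* queue setup fails, application can fall back
			 * to length based buffer split */
			return -ENOTSUP;
		}
	}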

Regards,
Xuan

> 
> >>
> >>>
> >>>    The memory attributes of the split parts may also differ - for
> >>> example, mempool0 and mempool1 may belong to DPDK memory and
> >>> external memory, respectively.
> >>>
> >>> Signed-off-by: Xuan Ding <xuan.ding@intel.com>
> >>> Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
> >>> Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
> >>> Reviewed-by: Qi Zhang <qi.z.zhang@intel.com>
> >>> ---
> >>>    lib/ethdev/rte_ethdev.c | 34 ++++++++++++++++++++++-------
> >>>    lib/ethdev/rte_ethdev.h | 48
> >> +++++++++++++++++++++++++++++++++++++++--
> >>>    2 files changed, 72 insertions(+), 10 deletions(-)
> >>>
> >>> diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c index
> >>> 29a3d80466..29adcdc2f0 100644
> >>> --- a/lib/ethdev/rte_ethdev.c
> >>> +++ b/lib/ethdev/rte_ethdev.c
> >>> @@ -1661,6 +1661,7 @@ rte_eth_rx_queue_check_split(const struct
> >> rte_eth_rxseg_split *rx_seg,
> >>>    		struct rte_mempool *mpl = rx_seg[seg_idx].mp;
> >>>    		uint32_t length = rx_seg[seg_idx].length;
> >>>    		uint32_t offset = rx_seg[seg_idx].offset;
> >>> +		uint16_t proto = rx_seg[seg_idx].proto;
> >>>
> >>>    		if (mpl == NULL) {
> >>>    			RTE_ETHDEV_LOG(ERR, "null mempool pointer\n");
> >> @@ -1694,13
> >>> +1695,29 @@ rte_eth_rx_queue_check_split(const struct
> >> rte_eth_rxseg_split *rx_seg,
> >>>    		}
> >>>    		offset += seg_idx != 0 ? 0 : RTE_PKTMBUF_HEADROOM;
> >>>    		*mbp_buf_size = rte_pktmbuf_data_room_size(mpl);
> >>> -		length = length != 0 ? length : *mbp_buf_size;
> >>> -		if (*mbp_buf_size < length + offset) {
> >>> -			RTE_ETHDEV_LOG(ERR,
> >>> -				       "%s mbuf_data_room_size %u < %u
> >> (segment length=%u + segment offset=%u)\n",
> >>> -				       mpl->name, *mbp_buf_size,
> >>> -				       length + offset, length, offset);
> >>> -			return -EINVAL;
> >>> +		if (proto == RTE_ETH_RX_HEADER_SPLIT_NONE) {
> >>> +			/* Check buffer split. */
> >>> +			length = length != 0 ? length : *mbp_buf_size;
> >>> +			if (*mbp_buf_size < length + offset) {
> >>> +				RTE_ETHDEV_LOG(ERR,
> >>> +					"%s mbuf_data_room_size %u < %u
> >> (segment length=%u + segment offset=%u)\n",
> >>> +					mpl->name, *mbp_buf_size,
> >>> +					length + offset, length, offset);
> >>> +				return -EINVAL;
> >>> +			}
> >>> +		} else {
> >>> +			/* Check header split. */
> >>> +			if (length != 0) {
> >>> +				RTE_ETHDEV_LOG(ERR, "segment length
> >> should be set to zero in header split\n");
> >>> +				return -EINVAL;
> >>> +			}
> >>> +			if (*mbp_buf_size < offset) {
> >>> +				RTE_ETHDEV_LOG(ERR,
> >>> +					"%s mbuf_data_room_size %u < %u
> >> (segment offset)\n",
> >>> +					mpl->name, *mbp_buf_size,
> >>> +					offset);
> >>> +				return -EINVAL;
> >>> +			}
> >>>    		}
> >>>    	}
> >>>    	return 0;
> >>> @@ -1778,7 +1795,8 @@ rte_eth_rx_queue_setup(uint16_t port_id,
> >> uint16_t rx_queue_id,
> >>>    		rx_seg = (const struct rte_eth_rxseg_split *)rx_conf->rx_seg;
> >>>    		n_seg = rx_conf->rx_nseg;
> >>>
> >>> -		if (rx_conf->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT)
> >> {
> >>> +		if (rx_conf->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT
> >> ||
> >>> +			rx_conf->offloads &
> >> RTE_ETH_RX_OFFLOAD_HEADER_SPLIT) {
> >>>    			ret = rte_eth_rx_queue_check_split(rx_seg, n_seg,
> >>>    							   &mbp_buf_size,
> >>>    							   &dev_info);
> >>> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h index
> >>> 04cff8ee10..e8371b98ed 100644
> >>> --- a/lib/ethdev/rte_ethdev.h
> >>> +++ b/lib/ethdev/rte_ethdev.h
> >>> @@ -1197,12 +1197,31 @@ struct rte_eth_txmode {
> >>>     *     - pool from the last valid element
> >>>     *     - the buffer size from this pool
> >>>     *     - zero offset
> >>> + *
> >>> + * Header split is a subset of buffer split. The split happens
> >>> + after the
> >>> + * packet header and before the packet payload. For PMDs that do
> >>> + not
> >>> + * support header split configuration by length, the location of
> >>> + the split
> >>> + * needs to be specified by the header protocol type. While for
> >>> + buffer split,
> >>> + * this field should not be configured.
> >>> + *
> >>> + * If RTE_ETH_RX_OFFLOAD_HEADER_SPLIT flag is set in offloads
> >>> + field,
> >>> + * the PMD will split the received packets into two separate regions:
> >>> + * - The header buffer will be allocated from the memory pool,
> >>> + *   specified in the first array element, the second buffer, from the
> >>> + *   pool in the second element.
> >>> + *
> >>> + * - The lengths do not need to be configured in header split.
> >>> + *
> >>> + * - The offsets from the segment description elements specify
> >>> + *   the data offset from the buffer beginning except the first mbuf.
> >>> + *   The first segment offset is added with RTE_PKTMBUF_HEADROOM.
> >>>     */
> >>>    struct rte_eth_rxseg_split {
> >>>    	struct rte_mempool *mp; /**< Memory pool to allocate segment
> >> from. */
> >>>    	uint16_t length; /**< Segment data length, configures split point. */
> >>>    	uint16_t offset; /**< Data offset from beginning of mbuf data buffer.
> >> */
> >>> -	uint32_t reserved; /**< Reserved field. */
> >>> +	uint16_t proto; /**< header protocol type, configures header split
> >>> +point. */
> >>
> >> I realize that you don't want to use the enum defined above here to save
> >> some reserved space, but the description must refer to the enum
> >> rte_eth_rx_header_split_protocol_type.
> >
> > Thanks for your suggestion, will fix it in next version.
> >
> >>
> >>> +	uint16_t reserved; /**< Reserved field. */
> >>
> >> As far as I can see the structure is experimental. So, it should not
> >> be a problem to extend it, but it is a really good question raised
> >> by Stephen in RFC
> >> v1 discussion.
> >> Shouldn't we require that all reserved fields are initialized to zero
> >> and ignored on processing? Frankly speaking I always thought so, but
> >> failed to find the place where it is documented.
> >
> > Yes, it can be documented. By default it should be zero, and we can
> > configure it to enable protocol type based buffer split.
> >
> >>
> >> @Thomas, @David, @Ferruh?
> >>
> >>>    };
> >>>
> >>>    /**
> >>> @@ -1212,7 +1231,7 @@ struct rte_eth_rxseg_split {
> >>>     * A common structure used to describe Rx packet segment properties.
> >>>     */
> >>>    union rte_eth_rxseg {
> >>> -	/* The settings for buffer split offload. */
> >>> +	/* The settings for buffer split and header split offload. */
> >>>    	struct rte_eth_rxseg_split split;
> >>>    	/* The other features settings should be added here. */
> >>>    };
> >>> @@ -1664,6 +1683,31 @@ struct rte_eth_conf {
> >>>    			     RTE_ETH_RX_OFFLOAD_QINQ_STRIP)
> >>>    #define DEV_RX_OFFLOAD_VLAN
> >> RTE_DEPRECATED(DEV_RX_OFFLOAD_VLAN)
> >>> RTE_ETH_RX_OFFLOAD_VLAN
> >>>
> >>> +/**
> >>> + * @warning
> >>> + * @b EXPERIMENTAL: this enum may change without prior notice.
> >>> + * This enum indicates the header split protocol type  */ enum
> >>> +rte_eth_rx_header_split_protocol_type {
> >>> +	RTE_ETH_RX_HEADER_SPLIT_NONE = 0,
> >>> +	RTE_ETH_RX_HEADER_SPLIT_MAC,
> >>> +	RTE_ETH_RX_HEADER_SPLIT_IPV4,
> >>> +	RTE_ETH_RX_HEADER_SPLIT_IPV6,
> >>> +	RTE_ETH_RX_HEADER_SPLIT_L3,
> >>> +	RTE_ETH_RX_HEADER_SPLIT_TCP,
> >>> +	RTE_ETH_RX_HEADER_SPLIT_UDP,
> >>> +	RTE_ETH_RX_HEADER_SPLIT_SCTP,
> >>> +	RTE_ETH_RX_HEADER_SPLIT_L4,
> >>> +	RTE_ETH_RX_HEADER_SPLIT_INNER_MAC,
> >>> +	RTE_ETH_RX_HEADER_SPLIT_INNER_IPV4,
> >>> +	RTE_ETH_RX_HEADER_SPLIT_INNER_IPV6,
> >>> +	RTE_ETH_RX_HEADER_SPLIT_INNER_L3,
> >>> +	RTE_ETH_RX_HEADER_SPLIT_INNER_TCP,
> >>> +	RTE_ETH_RX_HEADER_SPLIT_INNER_UDP,
> >>> +	RTE_ETH_RX_HEADER_SPLIT_INNER_SCTP,
> >>> +	RTE_ETH_RX_HEADER_SPLIT_INNER_L4,
> >>
> >> Enumeration members should be documented. See my question in the
> >> patch description.
> >
> > Thanks for your detailed comments, questions are answered accordingly.
> >
> > Best Regards,
> > Xuan
> >
> >>
> >>> +};
> >>> +
> >>>    /*
> >>>     * If new Rx offload capabilities are defined, they also must be
> >>>     * mentioned in rte_rx_offload_names in rte_ethdev.c file.
> >


^ permalink raw reply	[flat|nested] 88+ messages in thread

* RE: [v4 1/3] ethdev: introduce protocol type based header split
  2022-04-21 10:27         ` Thomas Monjalon
@ 2022-04-25 15:05           ` Ding, Xuan
  0 siblings, 0 replies; 88+ messages in thread
From: Ding, Xuan @ 2022-04-25 15:05 UTC (permalink / raw)
  To: Thomas Monjalon, Andrew Rybchenko, Wu, WenxuanX, Li,  Xiaoyun,
	Singh, Aman Deep, Zhang, Yuying, Zhang, Qi Z, dev
  Cc: dev, stephen, mb, viacheslavo, Yu, Ping, Wang, YuanX,
	david.marchand, Ferruh Yigit

Hi Thomas,

> -----Original Message-----
> From: Thomas Monjalon <thomas@monjalon.net>
> Sent: Thursday, April 21, 2022 6:28 PM
> To: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>; Wu, WenxuanX
> <wenxuanx.wu@intel.com>; Li, Xiaoyun <xiaoyun.li@intel.com>; Singh,
> Aman Deep <aman.deep.singh@intel.com>; Zhang, Yuying
> <yuying.zhang@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>;
> dev@dpdk.org
> Cc: dev@dpdk.org; stephen@networkplumber.org;
> mb@smartsharesystems.com; viacheslavo@nvidia.com; Yu, Ping
> <ping.yu@intel.com>; Wang, YuanX <yuanx.wang@intel.com>;
> david.marchand@redhat.com; Ferruh Yigit <ferruhy@xilinx.com>; Ding, Xuan
> <xuan.ding@intel.com>
> Subject: Re: [v4 1/3] ethdev: introduce protocol type based header split
> 
> 12/04/2022 18:15, Ding, Xuan:
> > From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> > > On 4/2/22 13:41, wenxuanx.wu@intel.com wrote:
> > > > From: Xuan Ding <xuan.ding@intel.com>
> > > >
> > > > Header split consists of splitting a received packet into two
> > > > separate regions based on the packet content. The split happens
> > > > after the packet header and before the packet payload. Splitting
> > > > is usually between the packet header that can be posted to a
> > > > dedicated buffer and the packet payload that can be posted to a
> different buffer.
> > > >
> > > > Currently, Rx buffer split supports length and offset based packet split.
> > > > Although header split is a subset of buffer split, configuring
> > > > buffer split based on length is not suitable for NICs that do
> > > > split based on header protocol types. Because tunneling makes the
> > > > conversion from length to protocol type impossible.
> > > >
> > > > This patch extends the current buffer split to support protocol
> > > > type and offset based header split. A new proto field is
> > > > introduced in the rte_eth_rxseg_split structure reserved field to
> > > > specify header protocol type. With Rx offload flag
> > > > RTE_ETH_RX_OFFLOAD_HEADER_SPLIT enabled and protocol type
> > > > configured, PMD will split the ingress packets into two separate
> > > > regions. Currently, both inner and outer
> > > > L2/L3/L4 level header split can be supported.
> > >
> > > RTE_ETH_RX_OFFLOAD_HEADER_SPLIT offload was introduced some
> time ago
> > > to substitute bit-field header_split in struct rte_eth_rxmode. It
> > > allows to enable header split offload with the header size
> > > controlled using split_hdr_size in the same structure.
> > >
> > > Right now I see no single PMD which actually supports
> > > RTE_ETH_RX_OFFLOAD_HEADER_SPLIT with above definition.
> > > Many examples and test apps initialize the field to 0 explicitly.
> > > Most of the drivers simply ignore split_hdr_size since the offload
> > > is not advertised, but some double-check that its value is 0.
> > >
> > > I think that it means that the field should be removed on the next
> > > LTS, and I'd say, together with the
> RTE_ETH_RX_OFFLOAD_HEADER_SPLIT offload bit.
> > >
> > > We should not redefine the offload meaning.
> >
> > Yes, you are right. No single PMD supports
> RTE_ETH_RX_OFFLOAD_HEADER_SPLIT now.
> > Previously, I used this flag to distinguish buffer split and header split.
> > The former supports multi-segment split by length and offset.
> > The latter supports two-segment split by proto and offset.
> > At this level, header split is a subset of buffer split.
> >
> > Since we shouldn't redefine the meaning of this offload, I will use
> > the RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT flag.
> > The existence of tunneling requires defining a proto field in buffer split,
> > because some PMDs do not support split based on length and offset.
> 
> Before doing anything, the first patch of this series should make the current
> status clearer.
> Example, this line does not explain what it does:
> 	uint16_t split_hdr_size;  /**< hdr buf size (header_split enabled).*/
> And header_split has been removed in ab3ce1e0c193 ("ethdev: remove old
> offload API")
> 
> If RTE_ETH_RX_OFFLOAD_HEADER_SPLIT is not needed, let's add a comment
> to start a deprecation.

Agreed, I discussed with Andrew before that RTE_ETH_RX_OFFLOAD_HEADER_SPLIT
is no longer supported by any PMD.

I can send a separate patch with a header split deprecation notice in 22.07,
and start removing the code in 22.11. What do you think?
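
For reference, the notice could look like this in
doc/guides/rel_notes/deprecation.rst (the wording is only a draft):

  * ethdev: The ``RTE_ETH_RX_OFFLOAD_HEADER_SPLIT`` offload flag and the
    ``split_hdr_size`` field of ``struct rte_eth_rxmode`` will be removed
    in 22.11, since no driver implements this offload anymore.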

> 
> > > > For example, let's suppose we configured the Rx queue with the
> > > > following segments:
> > > >      seg0 - pool0, off0=2B
> > > >      seg1 - pool1, off1=128B
> > >
> > > Corresponding feature is named Rx buffer split.
> > > Does it mean that protocol type based header split requires Rx
> > > buffer split feature to be supported?
> >
> > Protocol type based header split does not require Rx buffer split.
> > In the previous design, header split and buffer split are exclusive,
> > because we only configure one split offload for one Rx queue.
> 
> Things must be made clear and documented.

Thanks for your suggestion, the documentation will be improved in v5.

> 
> > > > With header split type configured with
> > > > RTE_ETH_RX_HEADER_SPLIT_UDP, the packet consists of
> MAC_IP_UDP_PAYLOAD will be split like following:
> > > >      seg0 - udp header @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from
> > > pool0
> > > >      seg1 - payload @ 128 in mbuf from pool1
> > >
> > > Is it always outermost UDP? Does it require both UDP over IPv4 and
> > > UDP over
> > > IPv6 to be supported? What will happen if only one is supported? How
> > > application can find out which protocol stack are supported?
> >
> > Both inner and outer UDP are considered.
> > The current design does not distinguish UDP over IPv4 from UDP over IPv6.
> > If we want to support finer granularity, such as only IPv4 or only IPv6,
> > the user needs to add more configuration.
> >
> > If the application wants to find out which protocol stack is supported, one
> > way I think is to expose the protocol stacks supported by the driver through
> dev_info.
> > Any thoughts are welcomed :)
> [...]
> > > > +	uint16_t reserved; /**< Reserved field. */
> > >
> > > As far as I can see the structure is experimental. So, it should not
> > > be a problem to extend it, but it is a really good question raised
> > > by Stephen in RFC
> > > v1 discussion.
> > > Shouldn't we require that all reserved fields are initialized to
> > > zero and ignored on processing? Frankly speaking I always thought
> > > so, but failed to find the place where it is documented.
> >
> > Yes, it can be documented. By default it should be zero, and we can
> > configure it to enable protocol type based buffer split.
> >
> > > @Thomas, @David, @Ferruh?
> 
> Yes that's very important to have a clear state of the reserved fields.
> A value must be set and documented.

Ditto, thanks for your comments. :)

Regards,
Xuan

> 
> 


^ permalink raw reply	[flat|nested] 88+ messages in thread

* [PATCH v5 0/3] ethdev: introduce protocol based buffer split
  2022-04-02 10:41   ` [v4 1/3] " wenxuanx.wu
  2022-04-07 10:47     ` Andrew Rybchenko
  2022-04-07 13:26     ` Jerin Jacob
@ 2022-04-26 11:13     ` wenxuanx.wu
  2022-04-26 11:13       ` [PATCH v5 1/4] lib/ethdev: introduce protocol type " wenxuanx.wu
                         ` (2 more replies)
  2 siblings, 3 replies; 88+ messages in thread
From: wenxuanx.wu @ 2022-04-26 11:13 UTC (permalink / raw)
  To: thomas, andrew.rybchenko, xiaoyun.li, ferruh.yigit,
	aman.deep.singh, dev, yuying.zhang, qi.z.zhang, jerinjacobk
  Cc: stephen, mb, viacheslavo, ping.yu, xuan.ding, yuanx.wang, wenxuanx.wu

From: Wenxuan Wu <wenxuanx.wu@intel.com>

Protocol based buffer split consists of splitting a received packet into
two separate regions based on the packet content. It is useful in some
scenarios, such as GPU acceleration. The splitting will help to enable
true zero copy and hence improve the performance significantly.

This patchset aims to support protocol split based on the current buffer split.
When an Rx queue is configured with the RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT
offload and a corresponding protocol, received packets will be split directly
into two different mempools.
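
A minimal usage sketch (the helper name, ring size and mempools are
application choices, not part of this patchset):

	#include <string.h>
	#include <rte_ethdev.h>
	#include <rte_mbuf_ptype.h>

	static int
	setup_proto_split_rxq(uint16_t port_id, uint16_t qid, unsigned int socket,
			      struct rte_mempool *hdr_mp, struct rte_mempool *pay_mp)
	{
		union rte_eth_rxseg rx_useg[2];
		struct rte_eth_rxconf rxconf;

		memset(rx_useg, 0, sizeof(rx_useg));
		memset(&rxconf, 0, sizeof(rxconf));

		rx_useg[0].split.mp = hdr_mp;			/* header part */
		rx_useg[0].split.proto = RTE_PTYPE_L4_UDP;	/* split after UDP */
		rx_useg[1].split.mp = pay_mp;			/* payload part */

		rxconf.offloads = RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT;
		rxconf.rx_seg = rx_useg;
		rxconf.rx_nseg = 2;

		/* mb_pool is NULL since each segment carries its own pool */
		return rte_eth_rx_queue_setup(port_id, qid, 1024, socket,
					      &rxconf, NULL);
	}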

v4->v5:
* Use protocol and mbuf_offset based buffer split instead of header split.
* Use RTE_PTYPE* instead of enum rte_eth_rx_header_split_protocol_type.
* Improve the description of rte_eth_rxseg_split.proto.

v3->v4:
* Use RTE_ETH_RX_HEADER_SPLIT_NONE instead of 0.

v2->v3:
* Fix a PMD bug.
* Add rx queue header split check.
* Revise the log and doc.

v1->v2:
* Add support for all header split protocol types.

Wenxuan Wu (3):
  ethdev: introduce protocol type based buffer split
  app/testpmd: add proto based buffer split config
  net/ice: support proto based buf split in Rx path

 app/test-pmd/cmdline.c                | 118 ++++++++++++++
 app/test-pmd/testpmd.c                |   7 +-
 app/test-pmd/testpmd.h                |   2 +
 drivers/net/ice/ice_ethdev.c          |  10 +-
 drivers/net/ice/ice_rxtx.c            | 217 ++++++++++++++++++++++----
 drivers/net/ice/ice_rxtx.h            |  16 ++
 drivers/net/ice/ice_rxtx_vec_common.h |   3 +
 lib/ethdev/rte_ethdev.c               |  36 ++++-
 lib/ethdev/rte_ethdev.h               |  21 ++-
 9 files changed, 388 insertions(+), 42 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 88+ messages in thread

* [PATCH v5 1/4] lib/ethdev: introduce protocol type based buffer split
  2022-04-26 11:13     ` [PATCH v5 0/3] ethdev: introduce protocol based buffer split wenxuanx.wu
@ 2022-04-26 11:13       ` wenxuanx.wu
  2022-05-17 21:12         ` Thomas Monjalon
  2022-04-26 11:13       ` [PATCH v5 2/4] app/testpmd: add proto based buffer split config wenxuanx.wu
  2022-04-26 11:13       ` [PATCH v5 3/4] net/ice: support proto based buf split in Rx path wenxuanx.wu
  2 siblings, 1 reply; 88+ messages in thread
From: wenxuanx.wu @ 2022-04-26 11:13 UTC (permalink / raw)
  To: thomas, andrew.rybchenko, xiaoyun.li, ferruh.yigit,
	aman.deep.singh, dev, yuying.zhang, qi.z.zhang, jerinjacobk
  Cc: stephen, mb, viacheslavo, ping.yu, xuan.ding, yuanx.wang, wenxuanx.wu

From: Wenxuan Wu <wenxuanx.wu@intel.com>

Protocol based buffer split consists of splitting a received packet into two
separate regions based on its content. The split happens after the packet
protocol header and before the packet payload. Splitting is usually between
the packet protocol header that can be posted to a dedicated buffer and the
packet payload that can be posted to a different buffer.

Currently, Rx buffer split supports length and offset based packet split.
Protocol split is based on buffer split, but configuring the length of buffer split
is not suitable for NICs that do split based on protocol types. Because
tunneling makes the conversion from length to protocol type impossible.

This patch extends the current buffer split to support protocol and offset
based buffer split. A new proto field is introduced in the rte_eth_rxseg_split
structure reserved field to specify header protocol type. With Rx queue
offload RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT enabled and corresponding protocol
type configured, the PMD will split the ingress packets into two separate regions.
Currently, both inner and outer L2/L3/L4 level protocol based buffer split
can be supported.

For example, let's suppose we configured the Rx queue with the
following segments:
    seg0 - pool0, off0=2B
    seg1 - pool1, off1=128B

With the protocol split type configured as RTE_PTYPE_L4_UDP, a packet
consisting of MAC_IP_UDP_PAYLOAD will be split as follows:
    seg0 - udp header @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from pool0
    seg1 - payload @ 128 in mbuf from pool1

The memory attributes of the split parts may also differ - for example,
mempool0 and mempool1 may belong to DPDK memory and external memory,
respectively.
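
In code, the segment configuration above would look like this (the pool
names are placeholders; the headroom rule matches
rte_eth_rx_queue_check_split()):

	struct rte_eth_rxseg_split segs[2] = {
		{ .mp = pool0, .offset = 2, .proto = RTE_PTYPE_L4_UDP },
		{ .mp = pool1, .offset = 128 },
	};
	/* RTE_PKTMBUF_HEADROOM is added to the offset of the first
	 * segment only, hence header @ RTE_PKTMBUF_HEADROOM + 2 and
	 * payload @ 128. */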

Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
Reviewed-by: Qi Zhang <qi.z.zhang@intel.com>
---
 lib/ethdev/rte_ethdev.c | 36 +++++++++++++++++++++++++++++-------
 lib/ethdev/rte_ethdev.h | 15 ++++++++++++++-
 2 files changed, 43 insertions(+), 8 deletions(-)

diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
index 29a3d80466..1a2bc172ab 100644
--- a/lib/ethdev/rte_ethdev.c
+++ b/lib/ethdev/rte_ethdev.c
@@ -1661,6 +1661,7 @@ rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
 		struct rte_mempool *mpl = rx_seg[seg_idx].mp;
 		uint32_t length = rx_seg[seg_idx].length;
 		uint32_t offset = rx_seg[seg_idx].offset;
+		uint32_t proto = rx_seg[seg_idx].proto;
 
 		if (mpl == NULL) {
 			RTE_ETHDEV_LOG(ERR, "null mempool pointer\n");
@@ -1694,13 +1695,34 @@ rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
 		}
 		offset += seg_idx != 0 ? 0 : RTE_PKTMBUF_HEADROOM;
 		*mbp_buf_size = rte_pktmbuf_data_room_size(mpl);
-		length = length != 0 ? length : *mbp_buf_size;
-		if (*mbp_buf_size < length + offset) {
-			RTE_ETHDEV_LOG(ERR,
-				       "%s mbuf_data_room_size %u < %u (segment length=%u + segment offset=%u)\n",
-				       mpl->name, *mbp_buf_size,
-				       length + offset, length, offset);
-			return -EINVAL;
+		if (proto == 0) {
+			length = length != 0 ? length : *mbp_buf_size;
+			if (*mbp_buf_size < length + offset) {
+				RTE_ETHDEV_LOG(ERR,
+					"%s mbuf_data_room_size %u < %u (segment length=%u + segment offset=%u)\n",
+					mpl->name, *mbp_buf_size,
+					length + offset, length, offset);
+				return -EINVAL;
+			}
+		} else {
+			/* Ensure n_seg is 2 in protocol based buffer split. */
+			if (n_seg != 2)	{
+				RTE_ETHDEV_LOG(ERR, "number of buffer split protocol segments should be 2.\n");
+				return -EINVAL;
+			}
+			/* Length and protocol are exclusive here, so make sure
+			 * length is 0 in protocol based buffer split. */
+			if (length != 0) {
+				RTE_ETHDEV_LOG(ERR, "segment length should be set to zero in buffer split\n");
+				return -EINVAL;
+			}
+			if (*mbp_buf_size < offset) {
+				RTE_ETHDEV_LOG(ERR,
+						"%s mbuf_data_room_size %u < %u (segment offset)\n",
+						mpl->name, *mbp_buf_size,
+						offset);
+				return -EINVAL;
+			}
 		}
 	}
 	return 0;
diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
index 04cff8ee10..ef7f59aae6 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -1187,6 +1187,9 @@ struct rte_eth_txmode {
  *   mbuf) the following data will be pushed to the next segment
  *   up to its own length, and so on.
  *
+ *
+ * - The proto in the elements defines the split position of received packets.
+ *
  * - If the length in the segment description element is zero
  *   the actual buffer size will be deduced from the appropriate
  *   memory pool properties.
@@ -1197,12 +1200,21 @@ struct rte_eth_txmode {
  *     - pool from the last valid element
  *     - the buffer size from this pool
  *     - zero offset
+ *
+ * - Length based buffer split:
+ *     - mp, length, offset should be configured.
+ *     - The proto should not be configured in length split. Zero default.
+ *
+ * - Protocol based buffer split:
+ *     - mp, offset, proto should be configured.
+ *     - The length should not be configured in protocol split. Zero default.
+ *
  */
 struct rte_eth_rxseg_split {
 	struct rte_mempool *mp; /**< Memory pool to allocate segment from. */
 	uint16_t length; /**< Segment data length, configures split point. */
 	uint16_t offset; /**< Data offset from beginning of mbuf data buffer. */
-	uint32_t reserved; /**< Reserved field. */
+	uint32_t proto; /**< Protocol of buffer split, determines protocol split point. */
 };
 
 /**
@@ -1664,6 +1676,7 @@ struct rte_eth_conf {
 			     RTE_ETH_RX_OFFLOAD_QINQ_STRIP)
 #define DEV_RX_OFFLOAD_VLAN RTE_DEPRECATED(DEV_RX_OFFLOAD_VLAN) RTE_ETH_RX_OFFLOAD_VLAN
 
+
 /*
  * If new Rx offload capabilities are defined, they also must be
  * mentioned in rte_rx_offload_names in rte_ethdev.c file.
-- 
2.25.1


^ permalink raw reply	[flat|nested] 88+ messages in thread

* [PATCH v5 2/4] app/testpmd: add proto based buffer split config
  2022-04-26 11:13     ` [PATCH v5 0/3] ethdev: introduce protocol based buffer split wenxuanx.wu
  2022-04-26 11:13       ` [PATCH v5 1/4] lib/ethdev: introduce protocol type " wenxuanx.wu
@ 2022-04-26 11:13       ` wenxuanx.wu
  2022-04-26 11:13       ` [PATCH v5 3/4] net/ice: support proto based buf split in Rx path wenxuanx.wu
  2 siblings, 0 replies; 88+ messages in thread
From: wenxuanx.wu @ 2022-04-26 11:13 UTC (permalink / raw)
  To: thomas, andrew.rybchenko, xiaoyun.li, ferruh.yigit,
	aman.deep.singh, dev, yuying.zhang, qi.z.zhang, jerinjacobk
  Cc: stephen, mb, viacheslavo, ping.yu, xuan.ding, yuanx.wang, wenxuanx.wu

From: Wenxuan Wu <wenxuanx.wu@intel.com>

This patch adds protocol based buffer split configuration in testpmd.
The protocol split feature is off by default. To enable protocol split,
you need:
1. Start testpmd with two mempools. e.g. --mbuf-size=2048,2048
2. Configure Rx queue with rx_offload buffer split on.
3. Set the protocol type of buffer split.

Testpmd View:
testpmd>port config <port_id> rx_offload buffer_split on
testpmd>port config <port_id> buffer_split mac|ipv4|ipv6|l3|tcp|udp|sctp|
                    l4|inner_mac|inner_ipv4|inner_ipv6|inner_l3|inner_tcp|
                    inner_udp|inner_sctp|inner_l4
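
An illustrative session (the PCI address and EAL options are placeholders;
the port must be stopped while reconfiguring):

./dpdk-testpmd -l 0-3 -n 4 -a 0000:18:00.0 -- -i --mbuf-size=2048,2048
testpmd> port stop 0
testpmd> port config 0 rx_offload buffer_split on
testpmd> port config 0 buffer_split udp
testpmd> port start 0
testpmd> start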

Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
Reviewed-by: Qi Zhang <qi.z.zhang@intel.com>
---
 app/test-pmd/cmdline.c | 118 +++++++++++++++++++++++++++++++++++++++++
 app/test-pmd/testpmd.c |   7 +--
 app/test-pmd/testpmd.h |   2 +
 3 files changed, 124 insertions(+), 3 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 6ffea8e21a..5cd4beca95 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -866,6 +866,12 @@ static void cmd_help_long_parsed(void *parsed_result,
 			"     Enable or disable a per port Rx offloading"
 			" on all Rx queues of a port\n\n"
 
+			"port config <port_id> buffer_split mac|ipv4|ipv6|l3|tcp|udp|sctp|l4|"
+			"inner_mac|inner_ipv4|inner_ipv6|inner_l3|inner_tcp|"
+			"inner_udp|inner_sctp|inner_l4\n"
+			"     Configure protocol type for buffer split"
+			" on all Rx queues of a port\n\n"
+
 			"port (port_id) rxq (queue_id) rx_offload vlan_strip|"
 			"ipv4_cksum|udp_cksum|tcp_cksum|tcp_lro|qinq_strip|"
 			"outer_ipv4_cksum|macsec_strip|header_split|"
@@ -16353,6 +16359,117 @@ cmdline_parse_inst_t cmd_config_per_port_rx_offload = {
 	}
 };
 
+/* config a per port buffer split protocol */
+struct cmd_config_per_port_buffer_split_protocol_result {
+	cmdline_fixed_string_t port;
+	cmdline_fixed_string_t config;
+	uint16_t port_id;
+	cmdline_fixed_string_t buffer_split;
+	cmdline_fixed_string_t protocol;
+};
+
+cmdline_parse_token_string_t cmd_config_per_port_buffer_split_protocol_result_port =
+	TOKEN_STRING_INITIALIZER
+		(struct cmd_config_per_port_buffer_split_protocol_result,
+		 port, "port");
+cmdline_parse_token_string_t cmd_config_per_port_buffer_split_protocol_result_config =
+	TOKEN_STRING_INITIALIZER
+		(struct cmd_config_per_port_buffer_split_protocol_result,
+		 config, "config");
+cmdline_parse_token_num_t cmd_config_per_port_buffer_split_protocol_result_port_id =
+	TOKEN_NUM_INITIALIZER
+		(struct cmd_config_per_port_buffer_split_protocol_result,
+		 port_id, RTE_UINT16);
+cmdline_parse_token_string_t cmd_config_per_port_buffer_split_protocol_result_buffer_split =
+	TOKEN_STRING_INITIALIZER
+		(struct cmd_config_per_port_buffer_split_protocol_result,
+		 buffer_split, "buffer_split");
+cmdline_parse_token_string_t cmd_config_per_port_buffer_split_protocol_result_protocol =
+	TOKEN_STRING_INITIALIZER
+		(struct cmd_config_per_port_buffer_split_protocol_result,
+		 protocol, "mac#ipv4#ipv6#l3#tcp#udp#sctp#l4#"
+			   "inner_mac#inner_ipv4#inner_ipv6#inner_l3#inner_tcp#"
+			   "inner_udp#inner_sctp#inner_l4");
+
+static void
+cmd_config_per_port_buffer_split_protocol_parsed(void *parsed_result,
+				__rte_unused struct cmdline *cl,
+				__rte_unused void *data)
+{
+	struct cmd_config_per_port_buffer_split_protocol_result *res = parsed_result;
+	portid_t port_id = res->port_id;
+	struct rte_port *port = &ports[port_id];
+	uint32_t protocol;
+
+	if (port_id_is_invalid(port_id, ENABLED_WARN))
+		return;
+
+	if (port->port_status != RTE_PORT_STOPPED) {
+		fprintf(stderr,
+			"Error: Can't config offload when Port %d is not stopped\n",
+			port_id);
+		return;
+	}
+
+	if (!strcmp(res->protocol, "mac"))
+		protocol = RTE_PTYPE_L2_ETHER;
+	else if (!strcmp(res->protocol, "ipv4"))
+		protocol = RTE_PTYPE_L3_IPV4;
+	else if (!strcmp(res->protocol, "ipv6"))
+		protocol = RTE_PTYPE_L3_IPV6;
+	else if (!strcmp(res->protocol, "l3"))
+		protocol = RTE_PTYPE_L3_IPV4|RTE_PTYPE_L3_IPV6;
+	else if (!strcmp(res->protocol, "tcp"))
+		protocol = RTE_PTYPE_L4_TCP;
+	else if (!strcmp(res->protocol, "udp"))
+		protocol = RTE_PTYPE_L4_UDP;
+	else if (!strcmp(res->protocol, "sctp"))
+		protocol = RTE_PTYPE_L4_SCTP;
+	else if (!strcmp(res->protocol, "l4"))
+		protocol = RTE_PTYPE_L4_TCP|RTE_PTYPE_L4_UDP|RTE_PTYPE_L4_SCTP;
+	else if (!strcmp(res->protocol, "inner_mac"))
+		protocol = RTE_PTYPE_INNER_L2_ETHER;
+	else if (!strcmp(res->protocol, "inner_ipv4"))
+		protocol = RTE_PTYPE_INNER_L3_IPV4;
+	else if (!strcmp(res->protocol, "inner_ipv6"))
+		protocol = RTE_PTYPE_INNER_L3_IPV6;
+	else if (!strcmp(res->protocol, "inner_l3"))
+		protocol = RTE_PTYPE_INNER_L3_IPV4|RTE_PTYPE_INNER_L3_IPV6;
+	else if (!strcmp(res->protocol, "inner_tcp"))
+		protocol = RTE_PTYPE_INNER_L4_TCP;
+	else if (!strcmp(res->protocol, "inner_udp"))
+		protocol = RTE_PTYPE_INNER_L4_UDP;
+	else if (!strcmp(res->protocol, "inner_sctp"))
+		protocol = RTE_PTYPE_INNER_L4_SCTP;
+	else if (!strcmp(res->protocol, "inner_l4"))
+		protocol = RTE_PTYPE_INNER_L4_TCP|RTE_PTYPE_INNER_L4_UDP|RTE_PTYPE_INNER_L4_SCTP;
+	else {
+		fprintf(stderr, "Unknown protocol name: %s\n", res->protocol);
+		return;
+	}
+
+	rx_pkt_buffer_split_proto = protocol;
+	rx_pkt_nb_segs = 2;
+
+	cmd_reconfig_device_queue(port_id, 1, 1);
+}
+
+cmdline_parse_inst_t cmd_config_per_port_buffer_split_protocol = {
+	.f = cmd_config_per_port_buffer_split_protocol_parsed,
+	.data = NULL,
+	.help_str = "port config <port_id> buffer_split mac|ipv4|ipv6|l3|tcp|udp|sctp|l4|"
+		    "inner_mac|inner_ipv4|inner_ipv6|inner_l3|inner_tcp|"
+		    "inner_udp|inner_sctp|inner_l4",
+	.tokens = {
+		(void *)&cmd_config_per_port_buffer_split_protocol_result_port,
+		(void *)&cmd_config_per_port_buffer_split_protocol_result_config,
+		(void *)&cmd_config_per_port_buffer_split_protocol_result_port_id,
+		(void *)&cmd_config_per_port_buffer_split_protocol_result_buffer_split,
+		(void *)&cmd_config_per_port_buffer_split_protocol_result_protocol,
+		NULL,
+	}
+};
+
 /* Enable/Disable a per queue offloading */
 struct cmd_config_per_queue_rx_offload_result {
 	cmdline_fixed_string_t port;
@@ -18071,6 +18188,7 @@ cmdline_parse_ctx_t main_ctx[] = {
 	(cmdline_parse_inst_t *)&cmd_rx_offload_get_capa,
 	(cmdline_parse_inst_t *)&cmd_rx_offload_get_configuration,
 	(cmdline_parse_inst_t *)&cmd_config_per_port_rx_offload,
+	(cmdline_parse_inst_t *)&cmd_config_per_port_buffer_split_protocol,
 	(cmdline_parse_inst_t *)&cmd_config_per_queue_rx_offload,
 	(cmdline_parse_inst_t *)&cmd_tx_offload_get_capa,
 	(cmdline_parse_inst_t *)&cmd_tx_offload_get_configuration,
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index fe2ce19f99..bd77d6bf10 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -253,6 +253,8 @@ uint8_t  tx_pkt_nb_segs = 1; /**< Number of segments in TXONLY packets */
 enum tx_pkt_split tx_pkt_split = TX_PKT_SPLIT_OFF;
 /**< Split policy for packets to TX. */
 
+uint32_t rx_pkt_buffer_split_proto;
+
 uint8_t txonly_multi_flow;
 /**< Whether multiple flows are generated in TXONLY mode. */
 
@@ -2586,12 +2588,11 @@ rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
 		mp_n = (i > mbuf_data_size_n) ? mbuf_data_size_n - 1 : i;
 		mpx = mbuf_pool_find(socket_id, mp_n);
 		/* Handle zero as mbuf data buffer size. */
-		rx_seg->length = rx_pkt_seg_lengths[i] ?
-				   rx_pkt_seg_lengths[i] :
-				   mbuf_data_size[mp_n];
+		rx_seg->length = rx_pkt_seg_lengths[i];
 		rx_seg->offset = i < rx_pkt_nb_offs ?
 				   rx_pkt_seg_offsets[i] : 0;
 		rx_seg->mp = mpx ? mpx : mp;
+		rx_seg->proto = rx_pkt_buffer_split_proto;
 	}
 	rx_conf->rx_nseg = rx_pkt_nb_segs;
 	rx_conf->rx_seg = rx_useg;
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index 31f766c965..707e1781d4 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -557,6 +557,8 @@ enum tx_pkt_split {
 
 extern enum tx_pkt_split tx_pkt_split;
 
+extern uint32_t rx_pkt_buffer_split_proto;
+
 extern uint8_t txonly_multi_flow;
 
 extern uint32_t rxq_share;
-- 
2.25.1


^ permalink raw reply	[flat|nested] 88+ messages in thread

* [PATCH v5 3/4] net/ice: support proto based buf split in Rx path
  2022-04-26 11:13     ` [PATCH v5 0/3] ethdev: introduce protocol based buffer split wenxuanx.wu
  2022-04-26 11:13       ` [PATCH v5 1/4] lib/ethdev: introduce protocol type " wenxuanx.wu
  2022-04-26 11:13       ` [PATCH v5 2/4] app/testpmd: add proto based buffer split config wenxuanx.wu
@ 2022-04-26 11:13       ` wenxuanx.wu
  2 siblings, 0 replies; 88+ messages in thread
From: wenxuanx.wu @ 2022-04-26 11:13 UTC (permalink / raw)
  To: thomas, andrew.rybchenko, xiaoyun.li, ferruh.yigit,
	aman.deep.singh, dev, yuying.zhang, qi.z.zhang, jerinjacobk
  Cc: stephen, mb, viacheslavo, ping.yu, xuan.ding, yuanx.wang, wenxuanx.wu

From: Wenxuan Wu <wenxuanx.wu@intel.com>

This patch adds support for proto based buffer split in normal Rx data
paths. When the Rx queue is configured with a specific protocol type,
received packets will be split directly into protocol header and
payload parts, and the two parts will be put into different mempools.

Currently, protocol based buffer split is not supported in vectorized
paths.
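
From the application point of view, each received packet is then a
two-segment chain; a consumption sketch (consume_header()/consume_payload()
are hypothetical application functions):

	struct rte_mbuf *pkts[32];
	uint16_t i, nb = rte_eth_rx_burst(port_id, queue_id, pkts, 32);

	for (i = 0; i < nb; i++) {
		struct rte_mbuf *hdr = pkts[i];		/* from rxseg[0].mp */
		struct rte_mbuf *pay = hdr->next;	/* from rxseg[1].mp */

		/* hdr->data_len is the header length written back by HW,
		 * hdr->pkt_len covers header plus payload. */
		consume_header(rte_pktmbuf_mtod(hdr, void *), hdr->data_len);
		consume_payload(rte_pktmbuf_mtod(pay, void *), pay->data_len);
	}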

Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
Reviewed-by: Qi Zhang <qi.z.zhang@intel.com>
---
 drivers/net/ice/ice_ethdev.c          |  10 +-
 drivers/net/ice/ice_rxtx.c            | 219 ++++++++++++++++++++++----
 drivers/net/ice/ice_rxtx.h            |  16 ++
 drivers/net/ice/ice_rxtx_vec_common.h |   3 +
 4 files changed, 216 insertions(+), 32 deletions(-)

diff --git a/drivers/net/ice/ice_ethdev.c b/drivers/net/ice/ice_ethdev.c
index 73e550f5fb..ce3f49c863 100644
--- a/drivers/net/ice/ice_ethdev.c
+++ b/drivers/net/ice/ice_ethdev.c
@@ -3713,7 +3713,8 @@ ice_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 			RTE_ETH_RX_OFFLOAD_OUTER_IPV4_CKSUM |
 			RTE_ETH_RX_OFFLOAD_VLAN_EXTEND |
 			RTE_ETH_RX_OFFLOAD_RSS_HASH |
-			RTE_ETH_RX_OFFLOAD_TIMESTAMP;
+			RTE_ETH_RX_OFFLOAD_TIMESTAMP |
+			RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT;
 		dev_info->tx_offload_capa |=
 			RTE_ETH_TX_OFFLOAD_QINQ_INSERT |
 			RTE_ETH_TX_OFFLOAD_IPV4_CKSUM |
@@ -3725,7 +3726,7 @@ ice_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 		dev_info->flow_type_rss_offloads |= ICE_RSS_OFFLOAD_ALL;
 	}
 
-	dev_info->rx_queue_offload_capa = 0;
+	dev_info->rx_queue_offload_capa = RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT;
 	dev_info->tx_queue_offload_capa = RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE;
 
 	dev_info->reta_size = pf->hash_lut_size;
@@ -3794,6 +3795,11 @@ ice_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 	dev_info->default_rxportconf.ring_size = ICE_BUF_SIZE_MIN;
 	dev_info->default_txportconf.ring_size = ICE_BUF_SIZE_MIN;
 
+	dev_info->rx_seg_capa.max_nseg = ICE_RX_MAX_NSEG;
+	dev_info->rx_seg_capa.multi_pools = 1;
+	dev_info->rx_seg_capa.offset_allowed = 0;
+	dev_info->rx_seg_capa.offset_align_log2 = 0;
+
 	return 0;
 }
 
diff --git a/drivers/net/ice/ice_rxtx.c b/drivers/net/ice/ice_rxtx.c
index 2dd2637fbb..8cbcee3543 100644
--- a/drivers/net/ice/ice_rxtx.c
+++ b/drivers/net/ice/ice_rxtx.c
@@ -282,7 +282,6 @@ ice_program_hw_rx_queue(struct ice_rx_queue *rxq)
 	/* Set buffer size as the head split is disabled. */
 	buf_size = (uint16_t)(rte_pktmbuf_data_room_size(rxq->mp) -
 			      RTE_PKTMBUF_HEADROOM);
-	rxq->rx_hdr_len = 0;
 	rxq->rx_buf_len = RTE_ALIGN(buf_size, (1 << ICE_RLAN_CTX_DBUF_S));
 	rxq->max_pkt_len =
 		RTE_MIN((uint32_t)ICE_SUPPORT_CHAIN_NUM * rxq->rx_buf_len,
@@ -311,11 +310,52 @@ ice_program_hw_rx_queue(struct ice_rx_queue *rxq)
 
 	memset(&rx_ctx, 0, sizeof(rx_ctx));
 
+	if (rxq->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) {
+		switch (rxq->rxseg[0].proto) {
+		case RTE_PTYPE_L2_ETHER:
+			rx_ctx.dtype = ICE_RX_DTYPE_HEADER_SPLIT;
+			rx_ctx.hsplit_1 = ICE_RLAN_RX_HSPLIT_1_SPLIT_L2;
+			break;
+		case RTE_PTYPE_INNER_L2_ETHER:
+			rx_ctx.dtype = ICE_RX_DTYPE_HEADER_SPLIT;
+			rx_ctx.hsplit_0 = ICE_RLAN_RX_HSPLIT_0_SPLIT_L2;
+			break;
+		case RTE_PTYPE_L3_IPV4:
+		case RTE_PTYPE_L3_IPV6:
+		case RTE_PTYPE_INNER_L3_IPV4:
+		case RTE_PTYPE_INNER_L3_IPV6:
+			rx_ctx.dtype = ICE_RX_DTYPE_HEADER_SPLIT;
+			rx_ctx.hsplit_0 = ICE_RLAN_RX_HSPLIT_0_SPLIT_IP;
+			break;
+		case RTE_PTYPE_L4_TCP:
+		case RTE_PTYPE_L4_UDP:
+		case RTE_PTYPE_INNER_L4_TCP:
+		case RTE_PTYPE_INNER_L4_UDP:
+			rx_ctx.dtype = ICE_RX_DTYPE_HEADER_SPLIT;
+			rx_ctx.hsplit_0 = ICE_RLAN_RX_HSPLIT_0_SPLIT_TCP_UDP;
+			break;
+		case RTE_PTYPE_L4_SCTP:
+		case RTE_PTYPE_INNER_L4_SCTP:
+			rx_ctx.dtype = ICE_RX_DTYPE_HEADER_SPLIT;
+			rx_ctx.hsplit_0 = ICE_RLAN_RX_HSPLIT_0_SPLIT_SCTP;
+			break;
+		case 0:
+			PMD_DRV_LOG(ERR, "Buffer split protocol must be configured");
+			return -EINVAL;
+		default:
+			PMD_DRV_LOG(ERR, "Buffer split protocol is not supported");
+			return -EINVAL;
+		}
+		rxq->rx_hdr_len = ICE_RX_HDR_BUF_SIZE;
+	} else {
+		rxq->rx_hdr_len = 0;
+		rx_ctx.dtype = 0; /* No Protocol Based Buffer Split mode */
+	}
+
 	rx_ctx.base = rxq->rx_ring_dma / ICE_QUEUE_BASE_ADDR_UNIT;
 	rx_ctx.qlen = rxq->nb_rx_desc;
 	rx_ctx.dbuf = rxq->rx_buf_len >> ICE_RLAN_CTX_DBUF_S;
 	rx_ctx.hbuf = rxq->rx_hdr_len >> ICE_RLAN_CTX_HBUF_S;
-	rx_ctx.dtype = 0; /* No Header Split mode */
 #ifndef RTE_LIBRTE_ICE_16BYTE_RX_DESC
 	rx_ctx.dsize = 1; /* 32B descriptors */
 #endif
@@ -401,6 +441,7 @@ ice_alloc_rx_queue_mbufs(struct ice_rx_queue *rxq)
 
 	for (i = 0; i < rxq->nb_rx_desc; i++) {
 		volatile union ice_rx_flex_desc *rxd;
+		rxd = &rxq->rx_ring[i];
 		struct rte_mbuf *mbuf = rte_mbuf_raw_alloc(rxq->mp);
 
 		if (unlikely(!mbuf)) {
@@ -408,8 +449,6 @@ ice_alloc_rx_queue_mbufs(struct ice_rx_queue *rxq)
 			return -ENOMEM;
 		}
 
-		rte_mbuf_refcnt_set(mbuf, 1);
-		mbuf->next = NULL;
 		mbuf->data_off = RTE_PKTMBUF_HEADROOM;
 		mbuf->nb_segs = 1;
 		mbuf->port = rxq->port_id;
@@ -417,9 +456,33 @@ ice_alloc_rx_queue_mbufs(struct ice_rx_queue *rxq)
 		dma_addr =
 			rte_cpu_to_le_64(rte_mbuf_data_iova_default(mbuf));
 
-		rxd = &rxq->rx_ring[i];
-		rxd->read.pkt_addr = dma_addr;
-		rxd->read.hdr_addr = 0;
+		if (rxq->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) {
+			struct rte_mbuf *mbuf_pay;
+			mbuf_pay = rte_mbuf_raw_alloc(rxq->rxseg[1].mp);
+			if (unlikely(!mbuf_pay)) {
+				PMD_DRV_LOG(ERR, "Failed to allocate payload mbuf for RX");
+				return -ENOMEM;
+			}
+
+			mbuf_pay->next = NULL;
+			mbuf_pay->data_off = RTE_PKTMBUF_HEADROOM;
+			mbuf_pay->nb_segs = 1;
+			mbuf_pay->port = rxq->port_id;
+			mbuf->next = mbuf_pay;
+
+			rxd->read.hdr_addr = dma_addr;
+			/* The LS bit should be set to zero regardless of
+			 * buffer split enablement.
+			 */
+			rxd->read.pkt_addr = rte_cpu_to_le_64(rte_mbuf_data_iova_default(mbuf_pay));
+
+		} else {
+			rte_mbuf_refcnt_set(mbuf, 1);
+			mbuf->next = NULL;
+			rxd->read.hdr_addr = 0;
+			rxd->read.pkt_addr = dma_addr;
+		}
+
 #ifndef RTE_LIBRTE_ICE_16BYTE_RX_DESC
 		rxd->read.rsvd1 = 0;
 		rxd->read.rsvd2 = 0;
@@ -443,14 +506,14 @@ _ice_rx_queue_release_mbufs(struct ice_rx_queue *rxq)
 
 	for (i = 0; i < rxq->nb_rx_desc; i++) {
 		if (rxq->sw_ring[i].mbuf) {
-			rte_pktmbuf_free_seg(rxq->sw_ring[i].mbuf);
+			rte_pktmbuf_free(rxq->sw_ring[i].mbuf);
 			rxq->sw_ring[i].mbuf = NULL;
 		}
 	}
 	if (rxq->rx_nb_avail == 0)
 		return;
 	for (i = 0; i < rxq->rx_nb_avail; i++)
-		rte_pktmbuf_free_seg(rxq->rx_stage[rxq->rx_next_avail + i]);
+		rte_pktmbuf_free(rxq->rx_stage[rxq->rx_next_avail + i]);
 
 	rxq->rx_nb_avail = 0;
 }
@@ -742,7 +805,7 @@ ice_fdir_program_hw_rx_queue(struct ice_rx_queue *rxq)
 	rx_ctx.qlen = rxq->nb_rx_desc;
 	rx_ctx.dbuf = rxq->rx_buf_len >> ICE_RLAN_CTX_DBUF_S;
 	rx_ctx.hbuf = rxq->rx_hdr_len >> ICE_RLAN_CTX_HBUF_S;
-	rx_ctx.dtype = 0; /* No Header Split mode */
+	rx_ctx.dtype = 0; /* No Buffer Split mode */
 	rx_ctx.dsize = 1; /* 32B descriptors */
 	rx_ctx.rxmax = ICE_ETH_MAX_LEN;
 	/* TPH: Transaction Layer Packet (TLP) processing hints */
@@ -1076,6 +1139,7 @@ ice_rx_queue_setup(struct rte_eth_dev *dev,
 	uint16_t len;
 	int use_def_burst_func = 1;
 	uint64_t offloads;
+	uint16_t n_seg = rx_conf->rx_nseg;
 
 	if (nb_desc % ICE_ALIGN_RING_DESC != 0 ||
 	    nb_desc > ICE_MAX_RING_DESC ||
@@ -1087,6 +1151,17 @@ ice_rx_queue_setup(struct rte_eth_dev *dev,
 
 	offloads = rx_conf->offloads | dev->data->dev_conf.rxmode.offloads;
 
+	if (mp)
+		n_seg = 1;
+
+	if (n_seg > 1) {
+		if (!(offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT)) {
+			PMD_INIT_LOG(ERR, "port %u queue index %u split offload not configured",
+					dev->data->port_id, queue_idx);
+			return -EINVAL;
+		}
+	}
+
 	/* Free memory if needed */
 	if (dev->data->rx_queues[queue_idx]) {
 		ice_rx_queue_release(dev->data->rx_queues[queue_idx]);
@@ -1098,12 +1173,22 @@ ice_rx_queue_setup(struct rte_eth_dev *dev,
 				 sizeof(struct ice_rx_queue),
 				 RTE_CACHE_LINE_SIZE,
 				 socket_id);
+
 	if (!rxq) {
 		PMD_INIT_LOG(ERR, "Failed to allocate memory for "
 			     "rx queue data structure");
 		return -ENOMEM;
 	}
-	rxq->mp = mp;
+
+	rxq->rxseg_nb = n_seg;
+	if (n_seg > 1) {
+		rte_memcpy(rxq->rxseg, rx_conf->rx_seg,
+			sizeof(struct rte_eth_rxseg_split) * n_seg);
+		rxq->mp = rxq->rxseg[0].mp;
+	} else {
+		rxq->mp = mp;
+	}
+
 	rxq->nb_rx_desc = nb_desc;
 	rxq->rx_free_thresh = rx_conf->rx_free_thresh;
 	rxq->queue_id = queue_idx;
@@ -1568,7 +1653,7 @@ ice_rx_scan_hw_ring(struct ice_rx_queue *rxq)
 	struct ice_rx_entry *rxep;
 	struct rte_mbuf *mb;
 	uint16_t stat_err0;
-	uint16_t pkt_len;
+	uint16_t pkt_len, hdr_len;
 	int32_t s[ICE_LOOK_AHEAD], nb_dd;
 	int32_t i, j, nb_rx = 0;
 	uint64_t pkt_flags = 0;
@@ -1623,6 +1708,24 @@ ice_rx_scan_hw_ring(struct ice_rx_queue *rxq)
 				   ICE_RX_FLX_DESC_PKT_LEN_M) - rxq->crc_len;
 			mb->data_len = pkt_len;
 			mb->pkt_len = pkt_len;
+
+			if (rxq->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) {
+				mb->nb_segs = (uint16_t)(mb->nb_segs + mb->next->nb_segs);
+				mb->next->next = NULL;
+				hdr_len = rte_le_to_cpu_16(rxdp[j].wb.hdr_len_sph_flex_flags1) &
+						ICE_RX_FLEX_DESC_HEADER_LEN_M;
+				pkt_len = (rte_le_to_cpu_16(rxdp[j].wb.pkt_len) &
+					ICE_RX_FLX_DESC_PKT_LEN_M) - rxq->crc_len;
+				mb->data_len = hdr_len;
+				mb->pkt_len = hdr_len + pkt_len;
+				mb->next->data_len = pkt_len;
+			} else {
+				pkt_len = (rte_le_to_cpu_16(rxdp[j].wb.pkt_len) &
+					ICE_RX_FLX_DESC_PKT_LEN_M) - rxq->crc_len;
+				mb->data_len = pkt_len;
+				mb->pkt_len = pkt_len;
+			}
+
 			mb->ol_flags = 0;
 			stat_err0 = rte_le_to_cpu_16(rxdp[j].wb.status_error0);
 			pkt_flags = ice_rxd_error_to_pkt_flags(stat_err0);
@@ -1714,7 +1817,9 @@ ice_rx_alloc_bufs(struct ice_rx_queue *rxq)
 	struct rte_mbuf *mb;
 	uint16_t alloc_idx, i;
 	uint64_t dma_addr;
-	int diag;
+	int diag, diag_pay;
+	uint64_t pay_addr;
+	struct rte_mbuf *mbufs_pay[rxq->rx_free_thresh];
 
 	/* Allocate buffers in bulk */
 	alloc_idx = (uint16_t)(rxq->rx_free_trigger -
@@ -1727,6 +1832,15 @@ ice_rx_alloc_bufs(struct ice_rx_queue *rxq)
 		return -ENOMEM;
 	}
 
+	if (rxq->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) {
+		diag_pay = rte_mempool_get_bulk(rxq->rxseg[1].mp,
+				(void *)mbufs_pay, rxq->rx_free_thresh);
+		if (unlikely(diag_pay != 0)) {
+			PMD_RX_LOG(ERR, "Failed to get payload mbufs in bulk");
+			return -ENOMEM;
+		}
+	}
+
 	rxdp = &rxq->rx_ring[alloc_idx];
 	for (i = 0; i < rxq->rx_free_thresh; i++) {
 		if (likely(i < (rxq->rx_free_thresh - 1)))
@@ -1735,13 +1849,21 @@ ice_rx_alloc_bufs(struct ice_rx_queue *rxq)
 
 		mb = rxep[i].mbuf;
 		rte_mbuf_refcnt_set(mb, 1);
-		mb->next = NULL;
 		mb->data_off = RTE_PKTMBUF_HEADROOM;
 		mb->nb_segs = 1;
 		mb->port = rxq->port_id;
 		dma_addr = rte_cpu_to_le_64(rte_mbuf_data_iova_default(mb));
-		rxdp[i].read.hdr_addr = 0;
-		rxdp[i].read.pkt_addr = dma_addr;
+
+		if (rxq->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) {
+			mb->next = mbufs_pay[i];
+			pay_addr = rte_mbuf_data_iova_default(mbufs_pay[i]);
+			rxdp[i].read.hdr_addr = dma_addr;
+			rxdp[i].read.pkt_addr = rte_cpu_to_le_64(pay_addr);
+		} else {
+			mb->next = NULL;
+			rxdp[i].read.hdr_addr = 0;
+			rxdp[i].read.pkt_addr = dma_addr;
+		}
 	}
 
 	/* Update Rx tail register */
@@ -2350,11 +2472,13 @@ ice_recv_pkts(void *rx_queue,
 	struct ice_rx_entry *sw_ring = rxq->sw_ring;
 	struct ice_rx_entry *rxe;
 	struct rte_mbuf *nmb; /* new allocated mbuf */
+	struct rte_mbuf *nmb_pay; /* new allocated payload mbuf */
 	struct rte_mbuf *rxm; /* pointer to store old mbuf in SW ring */
 	uint16_t rx_id = rxq->rx_tail;
 	uint16_t nb_rx = 0;
 	uint16_t nb_hold = 0;
 	uint16_t rx_packet_len;
+	uint16_t rx_header_len;
 	uint16_t rx_stat_err0;
 	uint64_t dma_addr;
 	uint64_t pkt_flags;
@@ -2382,12 +2506,16 @@ ice_recv_pkts(void *rx_queue,
 		if (!(rx_stat_err0 & (1 << ICE_RX_FLEX_DESC_STATUS0_DD_S)))
 			break;
 
-		/* allocate mbuf */
+		if (rx_stat_err0 & (1 << ICE_RX_FLEX_DESC_STATUS0_HBO_S))
+			break;
+
+		/* allocate header mbuf */
 		nmb = rte_mbuf_raw_alloc(rxq->mp);
 		if (unlikely(!nmb)) {
 			rxq->vsi->adapter->pf.dev_data->rx_mbuf_alloc_failed++;
 			break;
 		}
+
 		rxd = *rxdp; /* copy descriptor in ring to temp variable*/
 
 		nb_hold++;
@@ -2400,24 +2528,55 @@ ice_recv_pkts(void *rx_queue,
 		dma_addr =
 			rte_cpu_to_le_64(rte_mbuf_data_iova_default(nmb));
 
-		/**
-		 * fill the read format of descriptor with physic address in
-		 * new allocated mbuf: nmb
-		 */
-		rxdp->read.hdr_addr = 0;
-		rxdp->read.pkt_addr = dma_addr;
+		if (rxq->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) {
+			/* allocate payload mbuf */
+			nmb_pay = rte_mbuf_raw_alloc(rxq->rxseg[1].mp);
+			if (unlikely(!nmb_pay)) {
+				rxq->vsi->adapter->pf.dev_data->rx_mbuf_alloc_failed++;
+				break;
+			}
+
+			nmb->next = nmb_pay;
+			nmb_pay->next = NULL;
 
-		/* calculate rx_packet_len of the received pkt */
-		rx_packet_len = (rte_le_to_cpu_16(rxd.wb.pkt_len) &
-				 ICE_RX_FLX_DESC_PKT_LEN_M) - rxq->crc_len;
+			/**
+			 * fill the read format of descriptor with physic address in
+			 * new allocated mbuf: nmb
+			 */
+			rxdp->read.hdr_addr = dma_addr;
+			rxdp->read.pkt_addr = rte_cpu_to_le_64(rte_mbuf_data_iova_default(nmb_pay));
+		} else {
+			/**
+			 * fill the read format of descriptor with physic address in
+			 * new allocated mbuf: nmb
+			 */
+			rxdp->read.hdr_addr = 0;
+			rxdp->read.pkt_addr = dma_addr;
+		}
 
 		/* fill old mbuf with received descriptor: rxd */
 		rxm->data_off = RTE_PKTMBUF_HEADROOM;
 		rte_prefetch0(RTE_PTR_ADD(rxm->buf_addr, RTE_PKTMBUF_HEADROOM));
-		rxm->nb_segs = 1;
-		rxm->next = NULL;
-		rxm->pkt_len = rx_packet_len;
-		rxm->data_len = rx_packet_len;
+		if (rxq->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) {
+			rxm->nb_segs = (uint16_t)(rxm->nb_segs + rxm->next->nb_segs);
+			rxm->next->next = NULL;
+			/* calculate rx_packet_len of the received pkt */
+			rx_header_len = rte_le_to_cpu_16(rxd.wb.hdr_len_sph_flex_flags1) &
+					ICE_RX_FLEX_DESC_HEADER_LEN_M;
+			rx_packet_len = (rte_le_to_cpu_16(rxd.wb.pkt_len) &
+					ICE_RX_FLX_DESC_PKT_LEN_M) - rxq->crc_len;
+			rxm->data_len = rx_header_len;
+			rxm->pkt_len = rx_header_len + rx_packet_len;
+			rxm->next->data_len = rx_packet_len;
+		} else {
+			rxm->nb_segs = 1;
+			rxm->next = NULL;
+			/* calculate rx_packet_len of the received pkt */
+			rx_packet_len = (rte_le_to_cpu_16(rxd.wb.pkt_len) &
+					ICE_RX_FLX_DESC_PKT_LEN_M) - rxq->crc_len;
+			rxm->data_len = rx_packet_len;
+			rxm->pkt_len = rx_packet_len;
+		}
 		rxm->port = rxq->port_id;
 		rxm->packet_type = ptype_tbl[ICE_RX_FLEX_DESC_PTYPE_M &
 			rte_le_to_cpu_16(rxd.wb.ptype_flex_flags0)];
diff --git a/drivers/net/ice/ice_rxtx.h b/drivers/net/ice/ice_rxtx.h
index bb18a01951..611dbc8503 100644
--- a/drivers/net/ice/ice_rxtx.h
+++ b/drivers/net/ice/ice_rxtx.h
@@ -16,6 +16,9 @@
 #define ICE_RX_MAX_BURST 32
 #define ICE_TX_MAX_BURST 32
 
+/* Maximal number of segments to split. */
+#define ICE_RX_MAX_NSEG 2
+
 #define ICE_CHK_Q_ENA_COUNT        100
 #define ICE_CHK_Q_ENA_INTERVAL_US  100
 
@@ -43,6 +46,11 @@
 extern uint64_t ice_timestamp_dynflag;
 extern int ice_timestamp_dynfield_offset;
 
+/* Max header size can be 2K - 64 bytes */
+#define ICE_RX_HDR_BUF_SIZE    (2048 - 64)
+
+#define ICE_HEADER_SPLIT_ENA   BIT(0)
+
 typedef void (*ice_rx_release_mbufs_t)(struct ice_rx_queue *rxq);
 typedef void (*ice_tx_release_mbufs_t)(struct ice_tx_queue *txq);
 typedef void (*ice_rxd_to_pkt_fields_t)(struct ice_rx_queue *rxq,
@@ -53,6 +61,12 @@ struct ice_rx_entry {
 	struct rte_mbuf *mbuf;
 };
 
+enum ice_rx_dtype {
+	ICE_RX_DTYPE_NO_SPLIT       = 0,
+	ICE_RX_DTYPE_HEADER_SPLIT   = 1,
+	ICE_RX_DTYPE_SPLIT_ALWAYS   = 2,
+};
+
 struct ice_rx_queue {
 	struct rte_mempool *mp; /* mbuf pool to populate RX ring */
 	volatile union ice_rx_flex_desc *rx_ring;/* RX ring virtual address */
@@ -95,6 +109,8 @@ struct ice_rx_queue {
 	uint32_t time_high;
 	uint32_t hw_register_set;
 	const struct rte_memzone *mz;
+	struct rte_eth_rxseg_split rxseg[ICE_RX_MAX_NSEG];
+	uint32_t rxseg_nb;
 };
 
 struct ice_tx_entry {
diff --git a/drivers/net/ice/ice_rxtx_vec_common.h b/drivers/net/ice/ice_rxtx_vec_common.h
index 2dd2d83650..eec6ea2134 100644
--- a/drivers/net/ice/ice_rxtx_vec_common.h
+++ b/drivers/net/ice/ice_rxtx_vec_common.h
@@ -291,6 +291,9 @@ ice_rx_vec_queue_default(struct ice_rx_queue *rxq)
 	if (rxq->offloads & RTE_ETH_RX_OFFLOAD_TIMESTAMP)
 		return -1;
 
+	if (rxq->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT)
+		return -1;
+
 	if (rxq->offloads & ICE_RX_VECTOR_OFFLOAD)
 		return ICE_VECTOR_OFFLOAD_PATH;
 
-- 
2.25.1


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v5 1/4] lib/ethdev: introduce protocol type based buffer split
  2022-04-26 11:13       ` [PATCH v5 1/4] lib/ethdev: introduce protocol type " wenxuanx.wu
@ 2022-05-17 21:12         ` Thomas Monjalon
  2022-05-19 14:40           ` Ding, Xuan
  0 siblings, 1 reply; 88+ messages in thread
From: Thomas Monjalon @ 2022-05-17 21:12 UTC (permalink / raw)
  To: xuan.ding, yuanx.wang, wenxuanx.wu
  Cc: andrew.rybchenko, xiaoyun.li, ferruh.yigit, aman.deep.singh, dev,
	yuying.zhang, qi.z.zhang, jerinjacobk, stephen, mb, viacheslavo,
	ping.yu

Hello,

It seems you didn't try to address my main comment on v4:
"
Before doing anything, the first patch of this series should make
the current status clearer.
Example, this line does not explain what it does:
        uint16_t split_hdr_size;  /**< hdr buf size (header_split enabled).*/
And header_split has been removed in ab3ce1e0c193 ("ethdev: remove old offload API")

If RTE_ETH_RX_OFFLOAD_HEADER_SPLIT is not needed,
let's add a comment to start a deprecation.
"

Also, the comment from Andrew about removing the limitation to two segments
is not addressed.

The whole part about the protocol capabilities is missing here.

It is not encouraging.

26/04/2022 13:13, wenxuanx.wu@intel.com:
> From: Wenxuan Wu <wenxuanx.wu@intel.com>
> 
> Protocol based buffer split consists of splitting a received packet into two
> separate regions based on its content. The split happens after the packet
> protocol header and before the packet payload. Splitting is usually between
> the packet protocol header that can be posted to a dedicated buffer and the
> packet payload that can be posted to a different buffer.
> 
> Currently, Rx buffer split supports length and offset based packet split.
> Protocol split is based on buffer split, but configuring the length of buffer split
> is not suitable for NICs that do split based on protocol types.

Why? Is it impossible to support length split on Intel NIC?

> Because tunneling makes the conversion from length
> to protocol type impossible.

This is not a HW issue.
I agree on the need, but that is a different usage than length split.

> This patch extends the current buffer split to support protocol and offset
> based buffer split. A new proto field is introduced in the rte_eth_rxseg_split
> structure reserved field to specify header protocol type. With Rx queue
> offload RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT enabled and corresponding protocol
> type configured, the PMD will split the ingress packets into two separate regions.
> Currently, both inner and outer L2/L3/L4 level protocol based buffer split
> can be supported.
> 
> For example, let's suppose we configured the Rx queue with the
> following segments:
>     seg0 - pool0, off0=2B
>     seg1 - pool1, off1=128B
> 
> With the protocol split type configured as RTE_PTYPE_L4_UDP, a packet
> consisting of MAC_IP_UDP_PAYLOAD will be split as follows:
>     seg0 - udp header @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from pool0
>     seg1 - payload @ 128 in mbuf from pool1

Not clear what is the calculation.

> The memory attributes of the split parts may also differ - for example,
> mempool0 and mempool1 may belong to DPDK memory and external memory,
> respectively.
> 
> Signed-off-by: Xuan Ding <xuan.ding@intel.com>
> Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
> Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
> Reviewed-by: Qi Zhang <qi.z.zhang@intel.com>
> ---
>  lib/ethdev/rte_ethdev.c | 36 +++++++++++++++++++++++++++++-------
>  lib/ethdev/rte_ethdev.h | 15 ++++++++++++++-
>  2 files changed, 43 insertions(+), 8 deletions(-)
> 
> diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
> index 29a3d80466..1a2bc172ab 100644
> --- a/lib/ethdev/rte_ethdev.c
> +++ b/lib/ethdev/rte_ethdev.c
> @@ -1661,6 +1661,7 @@ rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
>  		struct rte_mempool *mpl = rx_seg[seg_idx].mp;
>  		uint32_t length = rx_seg[seg_idx].length;
>  		uint32_t offset = rx_seg[seg_idx].offset;
> +		uint32_t proto = rx_seg[seg_idx].proto;
>  
>  		if (mpl == NULL) {
>  			RTE_ETHDEV_LOG(ERR, "null mempool pointer\n");
> @@ -1694,13 +1695,34 @@ rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
>  		}
>  		offset += seg_idx != 0 ? 0 : RTE_PKTMBUF_HEADROOM;
>  		*mbp_buf_size = rte_pktmbuf_data_room_size(mpl);
> -		length = length != 0 ? length : *mbp_buf_size;
> -		if (*mbp_buf_size < length + offset) {
> -			RTE_ETHDEV_LOG(ERR,
> -				       "%s mbuf_data_room_size %u < %u (segment length=%u + segment offset=%u)\n",
> -				       mpl->name, *mbp_buf_size,
> -				       length + offset, length, offset);
> -			return -EINVAL;
> +		if (proto == 0) {

Add a comment here, /* split at fixed length */

> +			length = length != 0 ? length : *mbp_buf_size;
> +			if (*mbp_buf_size < length + offset) {
> +				RTE_ETHDEV_LOG(ERR,
> +					"%s mbuf_data_room_size %u < %u (segment length=%u + segment offset=%u)\n",
> +					mpl->name, *mbp_buf_size,
> +					length + offset, length, offset);
> +				return -EINVAL;
> +			}
> +		} else {

Add a comment here, /* split after specified protocol header */

> +			/* Ensure n_seg is 2 in protocol based buffer split. */
> +			if (n_seg != 2)	{

(should be a space, not a tab before brace)

Why do you limit the feature to 2 segments only?

> +				RTE_ETHDEV_LOG(ERR, "number of buffer split protocol segments should be 2.\n");
> +				return -EINVAL;
> +			}
> +			/* Length and protocol are exclusive here, so make sure
> +			 * length is 0 in protocol based buffer split. */
> +			if (length != 0) {
> +				RTE_ETHDEV_LOG(ERR, "segment length should be set to zero in buffer split\n");
> +				return -EINVAL;
> +			}
> +			if (*mbp_buf_size < offset) {
> +				RTE_ETHDEV_LOG(ERR,
> +						"%s mbuf_data_room_size %u < %u segment offset)\n",
> +						mpl->name, *mbp_buf_size,
> +						offset);
> +				return -EINVAL;
[...]
> + * - The proto in the elements defines the split position of received packets.
> + *
>   * - If the length in the segment description element is zero
>   *   the actual buffer size will be deduced from the appropriate
>   *   memory pool properties.
> @@ -1197,12 +1200,21 @@ struct rte_eth_txmode {
>   *     - pool from the last valid element
>   *     - the buffer size from this pool
>   *     - zero offset
> + *
> + * - Length based buffer split:
> + *     - mp, length, offset should be configured.
> + *     - The proto should not be configured in length split. Zero default.
> + *
> + * - Protocol based buffer split:
> + *     - mp, offset, proto should be configured.
> + *     - The length should not be configured in protocol split. Zero default.

What does "Zero default" mean?
You should just ignore the non-relevant field.

>  struct rte_eth_rxseg_split {
>  	struct rte_mempool *mp; /**< Memory pool to allocate segment from. */
>  	uint16_t length; /**< Segment data length, configures split point. */
>  	uint16_t offset; /**< Data offset from beginning of mbuf data buffer. */
> -	uint32_t reserved; /**< Reserved field. */

How do you manage ABI compatibility?
Was the reserved field initialized to 0 in previous versions?

> +	uint32_t proto; /**< Protocol of buffer split, determines protocol split point. */

What are the values for "proto"?

> @@ -1664,6 +1676,7 @@ struct rte_eth_conf {
>  			     RTE_ETH_RX_OFFLOAD_QINQ_STRIP)
>  #define DEV_RX_OFFLOAD_VLAN RTE_DEPRECATED(DEV_RX_OFFLOAD_VLAN) RTE_ETH_RX_OFFLOAD_VLAN
>  
> +

It looks to be a useless change.

>  /*
>   * If new Rx offload capabilities are defined, they also must be
>   * mentioned in rte_rx_offload_names in rte_ethdev.c file.
> 






^ permalink raw reply	[flat|nested] 88+ messages in thread

* RE: [PATCH v5 1/4] lib/ethdev: introduce protocol type based buffer split
  2022-05-17 21:12         ` Thomas Monjalon
@ 2022-05-19 14:40           ` Ding, Xuan
  2022-05-26 14:58             ` Ding, Xuan
  0 siblings, 1 reply; 88+ messages in thread
From: Ding, Xuan @ 2022-05-19 14:40 UTC (permalink / raw)
  To: Thomas Monjalon, Wang, YuanX, Wu, WenxuanX
  Cc: andrew.rybchenko, Li, Xiaoyun, ferruh.yigit, Singh, Aman Deep,
	dev, Zhang, Yuying, Zhang, Qi Z, jerinjacobk, stephen, mb,
	viacheslavo, Yu, Ping

Hi Thomas,

> -----Original Message-----
> From: Thomas Monjalon <thomas@monjalon.net>
> Sent: Wednesday, May 18, 2022 5:12 AM
> To: Ding, Xuan <xuan.ding@intel.com>; Wang, YuanX
> <yuanx.wang@intel.com>; Wu, WenxuanX <wenxuanx.wu@intel.com>
> Cc: andrew.rybchenko@oktetlabs.ru; Li, Xiaoyun <xiaoyun.li@intel.com>;
> ferruh.yigit@xilinx.com; Singh, Aman Deep <aman.deep.singh@intel.com>;
> dev@dpdk.org; Zhang, Yuying <yuying.zhang@intel.com>; Zhang, Qi Z
> <qi.z.zhang@intel.com>; jerinjacobk@gmail.com;
> stephen@networkplumber.org; mb@smartsharesystems.com;
> viacheslavo@nvidia.com; Yu, Ping <ping.yu@intel.com>
> Subject: Re: [PATCH v5 1/4] lib/ethdev: introduce protocol type based buffer
> split
> 
> Hello,
> 
> It seems you didn't try to address my main comment on v4:
> "
> Before doing anything, the first patch of this series should make the current
> status clearer.
> Example, this line does not explain what it does:
>         uint16_t split_hdr_size;  /**< hdr buf size (header_split enabled).*/ And
> header_split has been removed in ab3ce1e0c193 ("ethdev: remove old
> offload API")
> 
> If RTE_ETH_RX_OFFLOAD_HEADER_SPLIT is not needed, let's add a comment
> to start a deprecation.
> "

Thank you for the detailed review.

First of all, I agree that header split should be deprecated.
Since it is unrelated to buffer split, I was planning to send the deprecation notice
sometime later in 22.07 and start the deprecation in 22.11.

If you think it should be the first step, I will send the deprecation notice first.

> 
> Also the comment from Andrew about removing limitation to 2 packets is not
> addressed.

Secondly, the commit message says that protocol based buffer split divides
the packet into two segments, because I thought it would only be used to
split between header and payload.

In fact, protocol based buffer split can support multi-segment split.
That is to say, just as length-based buffer split defines a series of
lengths, we define a series of protos as the split points, and this series
of protos, like the lengths, indicates the split locations.

For example, consider a packet consisting of MAC/IPV4/UDP/payload.
If we define the buffer split protos as IPV4 and UDP, the packet will be
split into three segments:
seg0: MAC and IPV4 header, 34 bytes
seg1: UDP header, 8 bytes
seg2: Payload, the actual payload size
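
To make this concrete, here is a minimal sketch of how an application could
express the three-segment configuration above (it assumes the proto field
proposed in this series; pool0/pool1/pool2 and the other variable names are
hypothetical):

    union rte_eth_rxseg rx_useg[3];
    struct rte_eth_rxconf rxconf = {0};
    int ret;

    memset(rx_useg, 0, sizeof(rx_useg));
    /* seg0: MAC + IPv4 header, split after the IPv4 header */
    rx_useg[0].split.mp = pool0;
    rx_useg[0].split.proto = RTE_PTYPE_L3_IPV4;
    /* seg1: UDP header, split after the UDP header */
    rx_useg[1].split.mp = pool1;
    rx_useg[1].split.proto = RTE_PTYPE_L4_UDP;
    /* seg2: the remaining payload, no further split point */
    rx_useg[2].split.mp = pool2;

    rxconf.offloads = RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT;
    rxconf.rx_seg = rx_useg;
    rxconf.rx_nseg = 3;
    /* The mb_pool argument is NULL: the segment descriptions carry the pools. */
    ret = rte_eth_rx_queue_setup(port_id, queue_id, nb_rxd,
                                 rte_eth_dev_socket_id(port_id),
                                 &rxconf, NULL);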

What do you think of this design?

> 
> All the part about the protocols capability is missing here.

Yes, I missed the protocols capability with RTE_PTYPE*.
I will update the doc with supported protocol capability in v6.

> 
> It is not encouraging.
> 
> 26/04/2022 13:13, wenxuanx.wu@intel.com:
> > From: Wenxuan Wu <wenxuanx.wu@intel.com>
> >
> > Protocol based buffer split consists of splitting a received packet
> > into two separate regions based on its content. The split happens
> > after the packet protocol header and before the packet payload.
> > Splitting is usually between the packet protocol header that can be
> > posted to a dedicated buffer and the packet payload that can be posted to
> a different buffer.
> >
> > Currently, Rx buffer split supports length and offset based packet split.
> > protocol split is based on buffer split, configuring length of buffer
> > split is not suitable for NICs that do split based on protocol types.
> 
> Why? Is it impossible to support length split on Intel NIC?

Yes, our NIC supports split based on protocol types, and I think other
vendors' NICs do too.
Because of tunneling, the composition of a packet varies; given only a
changeable length, it is impossible to tell the driver a fixed protocol type.

> 
> > Because tunneling makes the conversion from length to protocol type
> > impossible.
> 
> This is not a HW issue.
> I agree on the need but that a different usage than length split.

I think the new usage can solve this problem, so that length split
and proto split can have the same result.

> 
> > This patch extends the current buffer split to support protocol and
> > offset based buffer split. A new proto field is introduced in the
> > rte_eth_rxseg_split structure reserved field to specify header
> > protocol type. With Rx queue offload
> RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT
> > enabled and corresponding protocol type configured. PMD will split the
> ingress packets into two separate regions.
> > Currently, both inner and outer L2/L3/L4 level protocol based buffer
> > split can be supported.
> >
> > For example, let's suppose we configured the Rx queue with the
> > following segments:
> >     seg0 - pool0, off0=2B
> >     seg1 - pool1, off1=128B
> >
> > With protocol split type configured with RTE_PTYPE_L4_UDP. The packet
> > consists of MAC_IP_UDP_PAYLOAD will be splitted like following:
> >     seg0 - udp header @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from pool0
> >     seg1 - payload @ 128 in mbuf from pool1
> 
> Not clear what is the calculation.

The previous usage of protocol based split is the split between header and
payload. Here, for a packet composed of MAC_IP_UDP_PAYLOAD, configuring the
protocol split type RTE_PTYPE_L4_UDP means splitting between the UDP header
and the payload. Expressed as a length configuration, proto = RTE_PTYPE_L4_UDP
is equivalent to length = 42B, i.e. everything up to and including the UDP header.
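
For reference, the 42B split point is simply the sum of the standard header
sizes (assuming an untagged Ethernet frame and an IPv4 header without options):

    Ethernet (MAC) header : 14B
    IPv4 header           : 20B
    UDP header            :  8B
    ---------------------------
    split point           : 42B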

> 
> > The memory attributes for the split parts may differ either - for
> > example the mempool0 and mempool1 belong to dpdk memory and
> external
> > memory, respectively.
> >
> > Signed-off-by: Xuan Ding <xuan.ding@intel.com>
> > Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
> > Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
> > Reviewed-by: Qi Zhang <qi.z.zhang@intel.com>
> > ---
> >  lib/ethdev/rte_ethdev.c | 36 +++++++++++++++++++++++++++++-------
> >  lib/ethdev/rte_ethdev.h | 15 ++++++++++++++-
> >  2 files changed, 43 insertions(+), 8 deletions(-)
> >
> > diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c index
> > 29a3d80466..1a2bc172ab 100644
> > --- a/lib/ethdev/rte_ethdev.c
> > +++ b/lib/ethdev/rte_ethdev.c
> > @@ -1661,6 +1661,7 @@ rte_eth_rx_queue_check_split(const struct
> rte_eth_rxseg_split *rx_seg,
> >  		struct rte_mempool *mpl = rx_seg[seg_idx].mp;
> >  		uint32_t length = rx_seg[seg_idx].length;
> >  		uint32_t offset = rx_seg[seg_idx].offset;
> > +		uint32_t proto = rx_seg[seg_idx].proto;
> >
> >  		if (mpl == NULL) {
> >  			RTE_ETHDEV_LOG(ERR, "null mempool pointer\n");
> @@ -1694,13
> > +1695,34 @@ rte_eth_rx_queue_check_split(const struct
> rte_eth_rxseg_split *rx_seg,
> >  		}
> >  		offset += seg_idx != 0 ? 0 : RTE_PKTMBUF_HEADROOM;
> >  		*mbp_buf_size = rte_pktmbuf_data_room_size(mpl);
> > -		length = length != 0 ? length : *mbp_buf_size;
> > -		if (*mbp_buf_size < length + offset) {
> > -			RTE_ETHDEV_LOG(ERR,
> > -				       "%s mbuf_data_room_size %u < %u
> (segment length=%u + segment offset=%u)\n",
> > -				       mpl->name, *mbp_buf_size,
> > -				       length + offset, length, offset);
> > -			return -EINVAL;
> > +		if (proto == 0) {
> 
> Add a comment here, /* split at fixed length */

Thanks for the suggestion, will add it in next version.

> 
> > +			length = length != 0 ? length : *mbp_buf_size;
> > +			if (*mbp_buf_size < length + offset) {
> > +				RTE_ETHDEV_LOG(ERR,
> > +					"%s mbuf_data_room_size %u < %u
> (segment length=%u + segment offset=%u)\n",
> > +					mpl->name, *mbp_buf_size,
> > +					length + offset, length, offset);
> > +				return -EINVAL;
> > +			}
> > +		} else {
> 
> Add a comment here, /* split after specified protocol header */

Thanks for the suggestion, will add it in next version.

> 
> > +			/* Ensure n_seg is 2 in protocol based buffer split. */
> > +			if (n_seg != 2)	{
> 
> (should be a space, not a tab before brace)

Got it.

> 
> Why do you limit the feature to 2 segments only?

Please see the new usage explained above.

> 
> > +				RTE_ETHDEV_LOG(ERR, "number of buffer
> split protocol segments should be 2.\n");
> > +				return -EINVAL;
> > +			}
> > +			/* Length and protocol are exclusive here, so make
> sure length is 0 in protocol
> > +			based buffer split. */
> > +			if (length != 0) {
> > +				RTE_ETHDEV_LOG(ERR, "segment length
> should be set to zero in buffer split\n");
> > +				return -EINVAL;
> > +			}
> > +			if (*mbp_buf_size < offset) {
> > +				RTE_ETHDEV_LOG(ERR,
> > +						"%s
> mbuf_data_room_size %u < %u segment offset)\n",
> > +						mpl->name, *mbp_buf_size,
> > +						offset);
> > +				return -EINVAL;
> [...]
> > + * - The proto in the elements defines the split position of received packets.
> > + *
> >   * - If the length in the segment description element is zero
> >   *   the actual buffer size will be deduced from the appropriate
> >   *   memory pool properties.
> > @@ -1197,12 +1200,21 @@ struct rte_eth_txmode {
> >   *     - pool from the last valid element
> >   *     - the buffer size from this pool
> >   *     - zero offset
> > + *
> > + * - Length based buffer split:
> > + *     - mp, length, offset should be configured.
> > + *     - The proto should not be configured in length split. Zero default.
> > + *
> > + * - Protocol based buffer split:
> > + *     - mp, offset, proto should be configured.
> > + *     - The length should not be configured in protocol split. Zero default.
> 
> What means "Zero default"?
> You should just ignore the non relevant field.

Yes, you are right, the non-relevant field should just be ignored.
I will update the doc in v6.

> 
> >  struct rte_eth_rxseg_split {
> >  	struct rte_mempool *mp; /**< Memory pool to allocate segment
> from. */
> >  	uint16_t length; /**< Segment data length, configures split point. */
> >  	uint16_t offset; /**< Data offset from beginning of mbuf data buffer.
> */
> > -	uint32_t reserved; /**< Reserved field. */
> 
> How do you manage ABI compatibility?
> Was the reserved field initialized to 0 in previous versions?

I think we reached an agreement in RFC v1. The reserved field was not documented
in the previous release, and it is always initialized to zero in real cases.
And since splitting based on fixed length and splitting based on protocol header
parsing are now exclusive, we can ignore the non-relevant field.
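
To illustrate the compatibility point with a sketch (hdr_pool/pay_pool are
hypothetical mempool names): existing applications typically zero the whole
segment description, so the former reserved field already holds 0 and keeps
selecting the length based mode:

    union rte_eth_rxseg rx_useg[2];

    memset(rx_useg, 0, sizeof(rx_useg)); /* reserved (now proto) stays 0 */
    rx_useg[0].split.mp = hdr_pool;
    rx_useg[0].split.length = 64;        /* fixed-length split point */
    rx_useg[1].split.mp = pay_pool;      /* rest of the packet */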

> 
> > +	uint32_t proto; /**< Protocol of buffer split, determines protocol
> > +split point. */
> 
> What are the values for "proto"?

Yes, I missed the protocol capability here, will fix it in v6.

> 
> > @@ -1664,6 +1676,7 @@ struct rte_eth_conf {
> >  			     RTE_ETH_RX_OFFLOAD_QINQ_STRIP)  #define
> DEV_RX_OFFLOAD_VLAN
> > RTE_DEPRECATED(DEV_RX_OFFLOAD_VLAN) RTE_ETH_RX_OFFLOAD_VLAN
> >
> > +
> 
> It looks to be an useless change.

Thanks for the catch, will fix it in next version.

Thanks again for your time.

Regards,
Xuan

> 
> >  /*
> >   * If new Rx offload capabilities are defined, they also must be
> >   * mentioned in rte_rx_offload_names in rte_ethdev.c file.
> >
> 
> 
> 
> 


^ permalink raw reply	[flat|nested] 88+ messages in thread

* RE: [PATCH v5 1/4] lib/ethdev: introduce protocol type based buffer split
  2022-05-19 14:40           ` Ding, Xuan
@ 2022-05-26 14:58             ` Ding, Xuan
  0 siblings, 0 replies; 88+ messages in thread
From: Ding, Xuan @ 2022-05-26 14:58 UTC (permalink / raw)
  To: Ding, Xuan, Thomas Monjalon, Wang, YuanX, Wu, WenxuanX
  Cc: andrew.rybchenko, Li, Xiaoyun, ferruh.yigit, Singh, Aman Deep,
	dev, Zhang, Yuying, Zhang, Qi Z, jerinjacobk, stephen, mb,
	viacheslavo, Yu, Ping

Hi,

> -----Original Message-----
> From: Ding, Xuan <xuan.ding@intel.com>
> Sent: Thursday, May 19, 2022 10:40 PM
> To: Thomas Monjalon <thomas@monjalon.net>; Wang, YuanX
> <yuanx.wang@intel.com>; Wu, WenxuanX <wenxuanx.wu@intel.com>
> Cc: andrew.rybchenko@oktetlabs.ru; Li, Xiaoyun <xiaoyun.li@intel.com>;
> ferruh.yigit@xilinx.com; Singh, Aman Deep <aman.deep.singh@intel.com>;
> dev@dpdk.org; Zhang, Yuying <yuying.zhang@intel.com>; Zhang, Qi Z
> <qi.z.zhang@intel.com>; jerinjacobk@gmail.com;
> stephen@networkplumber.org; mb@smartsharesystems.com;
> viacheslavo@nvidia.com; Yu, Ping <ping.yu@intel.com>
> Subject: RE: [PATCH v5 1/4] lib/ethdev: introduce protocol type based buffer
> split
> 
> Hi Thomas,
> 
> > -----Original Message-----
> > From: Thomas Monjalon <thomas@monjalon.net>
> > Sent: Wednesday, May 18, 2022 5:12 AM
> > To: Ding, Xuan <xuan.ding@intel.com>; Wang, YuanX
> > <yuanx.wang@intel.com>; Wu, WenxuanX <wenxuanx.wu@intel.com>
> > Cc: andrew.rybchenko@oktetlabs.ru; Li, Xiaoyun <xiaoyun.li@intel.com>;
> > ferruh.yigit@xilinx.com; Singh, Aman Deep <aman.deep.singh@intel.com>;
> > dev@dpdk.org; Zhang, Yuying <yuying.zhang@intel.com>; Zhang, Qi Z
> > <qi.z.zhang@intel.com>; jerinjacobk@gmail.com;
> > stephen@networkplumber.org; mb@smartsharesystems.com;
> > viacheslavo@nvidia.com; Yu, Ping <ping.yu@intel.com>
> > Subject: Re: [PATCH v5 1/4] lib/ethdev: introduce protocol type based
> > buffer split
> >
> > Hello,
> >
> > It seems you didn't try to address my main comment on v4:
> > "
> > Before doing anything, the first patch of this series should make the
> > current status clearer.
> > Example, this line does not explain what it does:
> >         uint16_t split_hdr_size;  /**< hdr buf size (header_split
> > enabled).*/ And header_split has been removed in ab3ce1e0c193
> > ("ethdev: remove old offload API")
> >
> > If RTE_ETH_RX_OFFLOAD_HEADER_SPLIT is not needed, let's add a
> comment
> > to start a deprecation.
> > "
> 
> Thank you for the detailed review.
> 
> First of all, I agree that header split should be deprecated.
> Since it is unrelated with buffer split, I was planning to send the deprecation
> notice in 22.07 sometime later and start the deprecation in 22.11.
> 
> If you think it should be the first step, I will send the deprecation notice first.
> 
> >
> > Also the comment from Andrew about removing limitation to 2 packets is
> > not addressed.
> 
> Secondly, it is said that the protocol based buffer split will divide the packet
> into two segments.
> Because I thought it will only be used in the split between header and
> payload.
> 
> In fact, protocol based buffer split can support multi-segment split.
> That is to say, like length-based buffer split, we define a series of protos, as
> the split point of protocol based buffer split. And this series of protos, like
> lengths, indicate the split location.
> 
> For example, a packet consists of MAC/IPV4/UDP/payload.
> If we define the buffer split proto with IPV4, and UDP, the packet will be split
> into three segments:
> seg0: MAC and IPV4 header, 34 bytes
> seg1: UDP header, 8 bytes
> seg2: Payload, the actual payload size
> 
> What do you think of this design?
> 
> >
> > All the part about the protocols capability is missing here.
> 
> Yes, I missed the protocols capability with RTE_PTYPE* now.
> I will update the doc with supported protocol capability in v6.
> 
> >
> > It is not encouraging.
> >
> > 26/04/2022 13:13, wenxuanx.wu@intel.com:
> > > From: Wenxuan Wu <wenxuanx.wu@intel.com>
> > >
> > > Protocol based buffer split consists of splitting a received packet
> > > into two separate regions based on its content. The split happens
> > > after the packet protocol header and before the packet payload.
> > > Splitting is usually between the packet protocol header that can be
> > > posted to a dedicated buffer and the packet payload that can be
> > > posted to
> > a different buffer.
> > >
> > > Currently, Rx buffer split supports length and offset based packet split.
> > > protocol split is based on buffer split, configuring length of
> > > buffer split is not suitable for NICs that do split based on protocol types.
> >
> > Why? Is it impossible to support length split on Intel NIC?
> 
> Yes, our NIC supports split based on protocol types. And I think there are
> other vendors too.
> The existence of tunneling results in the composition of a packet is various.
> Given a changeable length, it is impossible to tell the driver a fixed protocol
> type.
> 
> >
> > > Because tunneling makes the conversion from length to protocol type
> > > impossible.
> >
> > This is not a HW issue.
> > I agree on the need but that a different usage than length split.
> 
> I think the new usage can solve this problem, so that length split and proto
> split can have the same result.
> 
> >
> > > This patch extends the current buffer split to support protocol and
> > > offset based buffer split. A new proto field is introduced in the
> > > rte_eth_rxseg_split structure reserved field to specify header
> > > protocol type. With Rx queue offload
> > RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT
> > > enabled and corresponding protocol type configured. PMD will split
> > > the
> > ingress packets into two separate regions.
> > > Currently, both inner and outer L2/L3/L4 level protocol based buffer
> > > split can be supported.
> > >
> > > For example, let's suppose we configured the Rx queue with the
> > > following segments:
> > >     seg0 - pool0, off0=2B
> > >     seg1 - pool1, off1=128B
> > >
> > > With protocol split type configured with RTE_PTYPE_L4_UDP. The
> > > packet consists of MAC_IP_UDP_PAYLOAD will be splitted like following:
> > >     seg0 - udp header @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from
> pool0
> > >     seg1 - payload @ 128 in mbuf from pool1
> >
> > Not clear what is the calculation.
> 
> The previous usage of protocol based split is the split between header and
> payload.
> Here for a packet composed of MAC_IP_UDP_PAYLOAD, with protocol split
> type RTE_PTYPE_L4_UDP configured, it means split between the UDP header
> and payload.
> In length configuration, the proto = RTE_PTYPE_L4_UDP means length = 42B.
> 
> >
> > > The memory attributes for the split parts may differ either - for
> > > example the mempool0 and mempool1 belong to dpdk memory and
> > external
> > > memory, respectively.
> > >
> > > Signed-off-by: Xuan Ding <xuan.ding@intel.com>
> > > Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
> > > Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
> > > Reviewed-by: Qi Zhang <qi.z.zhang@intel.com>
> > > ---
> > >  lib/ethdev/rte_ethdev.c | 36 +++++++++++++++++++++++++++++-------
> > >  lib/ethdev/rte_ethdev.h | 15 ++++++++++++++-
> > >  2 files changed, 43 insertions(+), 8 deletions(-)
> > >
> > > diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c index
> > > 29a3d80466..1a2bc172ab 100644
> > > --- a/lib/ethdev/rte_ethdev.c
> > > +++ b/lib/ethdev/rte_ethdev.c
> > > @@ -1661,6 +1661,7 @@ rte_eth_rx_queue_check_split(const struct
> > rte_eth_rxseg_split *rx_seg,
> > >  		struct rte_mempool *mpl = rx_seg[seg_idx].mp;
> > >  		uint32_t length = rx_seg[seg_idx].length;
> > >  		uint32_t offset = rx_seg[seg_idx].offset;
> > > +		uint32_t proto = rx_seg[seg_idx].proto;
> > >
> > >  		if (mpl == NULL) {
> > >  			RTE_ETHDEV_LOG(ERR, "null mempool pointer\n");
> > @@ -1694,13
> > > +1695,34 @@ rte_eth_rx_queue_check_split(const struct
> > rte_eth_rxseg_split *rx_seg,
> > >  		}
> > >  		offset += seg_idx != 0 ? 0 : RTE_PKTMBUF_HEADROOM;
> > >  		*mbp_buf_size = rte_pktmbuf_data_room_size(mpl);
> > > -		length = length != 0 ? length : *mbp_buf_size;
> > > -		if (*mbp_buf_size < length + offset) {
> > > -			RTE_ETHDEV_LOG(ERR,
> > > -				       "%s mbuf_data_room_size %u < %u
> > (segment length=%u + segment offset=%u)\n",
> > > -				       mpl->name, *mbp_buf_size,
> > > -				       length + offset, length, offset);
> > > -			return -EINVAL;
> > > +		if (proto == 0) {
> >
> > Add a comment here, /* split at fixed length */
> 
> Thanks for the suggestion, will add it in next version.
> 
> >
> > > +			length = length != 0 ? length : *mbp_buf_size;
> > > +			if (*mbp_buf_size < length + offset) {
> > > +				RTE_ETHDEV_LOG(ERR,
> > > +					"%s mbuf_data_room_size %u < %u
> > (segment length=%u + segment offset=%u)\n",
> > > +					mpl->name, *mbp_buf_size,
> > > +					length + offset, length, offset);
> > > +				return -EINVAL;
> > > +			}
> > > +		} else {
> >
> > Add a comment here, /* split after specified protocol header */
> 
> Thanks for the suggestion, will add it in next version.
> 
> >
> > > +			/* Ensure n_seg is 2 in protocol based buffer split. */
> > > +			if (n_seg != 2)	{
> >
> > (should be a space, not a tab before brace)
> 
> Got it.
> 
> >
> > Why do you limit the feature to 2 segments only?
> 
> Please see the new usage explained above.
> 
> >
> > > +				RTE_ETHDEV_LOG(ERR, "number of buffer
> > split protocol segments should be 2.\n");
> > > +				return -EINVAL;
> > > +			}
> > > +			/* Length and protocol are exclusive here, so make
> > sure length is 0 in protocol
> > > +			based buffer split. */
> > > +			if (length != 0) {
> > > +				RTE_ETHDEV_LOG(ERR, "segment length
> > should be set to zero in buffer split\n");
> > > +				return -EINVAL;
> > > +			}
> > > +			if (*mbp_buf_size < offset) {
> > > +				RTE_ETHDEV_LOG(ERR,
> > > +						"%s
> > mbuf_data_room_size %u < %u segment offset)\n",
> > > +						mpl->name, *mbp_buf_size,
> > > +						offset);
> > > +				return -EINVAL;
> > [...]
> > > + * - The proto in the elements defines the split position of received
> packets.
> > > + *
> > >   * - If the length in the segment description element is zero
> > >   *   the actual buffer size will be deduced from the appropriate
> > >   *   memory pool properties.
> > > @@ -1197,12 +1200,21 @@ struct rte_eth_txmode {
> > >   *     - pool from the last valid element
> > >   *     - the buffer size from this pool
> > >   *     - zero offset
> > > + *
> > > + * - Length based buffer split:
> > > + *     - mp, length, offset should be configured.
> > > + *     - The proto should not be configured in length split. Zero default.
> > > + *
> > > + * - Protocol based buffer split:
> > > + *     - mp, offset, proto should be configured.
> > > + *     - The length should not be configured in protocol split. Zero default.
> >
> > What means "Zero default"?
> > You should just ignore the non relevant field.
> 
> Yes, you are right, the non-relevant field should just be ignored.
> I will update the doc in v6.

Sorry for replying to myself.
After consideration, I found it is hard to simply ignore the non-relevant field:
when both length and proto are set, it is hard to decide whether length based
or protocol based buffer split is intended, which affects the check in
rte_eth_rx_queue_check_split().

So I would like to keep the current design in v6. When choosing one mode of
buffer split, the non-relevant field should not be configured.

Hope to get your feedback. :)

Regards,
Xuan

> 
> >
> > >  struct rte_eth_rxseg_split {
> > >  	struct rte_mempool *mp; /**< Memory pool to allocate segment
> > from. */
> > >  	uint16_t length; /**< Segment data length, configures split point. */
> > >  	uint16_t offset; /**< Data offset from beginning of mbuf data buffer.
> > */
> > > -	uint32_t reserved; /**< Reserved field. */
> >
> > How do you manage ABI compatibility?
> > Was the reserved field initialized to 0 in previous versions?
> 
> I think we reached an agreement in RFC v1. There is no document for the
> reserved field in the previous release. And it is always initialized to zero in
> real cases.
> And now splitting based on fixed length and protocol header parsing is
> exclusive, we can ignore the none relevant field.
> 
> >
> > > +	uint32_t proto; /**< Protocol of buffer split, determines protocol
> > > +split point. */
> >
> > What are the values for "proto"?
> 
> Yes, I missed the protocol capability here, will fix it in v6.
> 
> >
> > > @@ -1664,6 +1676,7 @@ struct rte_eth_conf {
> > >  			     RTE_ETH_RX_OFFLOAD_QINQ_STRIP)  #define
> > DEV_RX_OFFLOAD_VLAN
> > > RTE_DEPRECATED(DEV_RX_OFFLOAD_VLAN)
> RTE_ETH_RX_OFFLOAD_VLAN
> > >
> > > +
> >
> > It looks to be an useless change.
> 
> Thanks for the catch, will fix it in next version.
> 
> Thanks again for your time.
> 
> Regards,
> Xuan
> 
> >
> > >  /*
> > >   * If new Rx offload capabilities are defined, they also must be
> > >   * mentioned in rte_rx_offload_names in rte_ethdev.c file.
> > >
> >
> >
> >
> >


^ permalink raw reply	[flat|nested] 88+ messages in thread

* [PATCH v6] ethdev: introduce protocol header based buffer split
  2022-03-03  6:01 [RFC] ethdev: introduce protocol type based header split xuan.ding
                   ` (4 preceding siblings ...)
  2022-04-02 10:41 ` [v4 0/3] ethdev: introduce protocol type based header split wenxuanx.wu
@ 2022-05-27  7:54 ` xuan.ding
  2022-05-27  8:14 ` [PATCH v6 0/1] ethdev: introduce protocol " xuan.ding
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 88+ messages in thread
From: xuan.ding @ 2022-05-27  7:54 UTC (permalink / raw)
  To: thomas, andrew.rybchenko, mdr
  Cc: stephen, mb, dev, qi.z.zhang, Wenxuan Wu, Xuan Ding, Yuan Wang

From: Wenxuan Wu <wenxuanx.wu@intel.com>

Currently, Rx buffer split supports length based split. With Rx queue
offload RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT enabled and Rx packet segment
configured, PMD will be able to split the received packets into
multiple segments.

However, length based buffer split is not suitable for NICs that do split
based on protocol headers. Given the arbitrarily variable lengths in an Rx
packet segment, it is almost impossible to pass a fixed protocol header to
the PMD. Besides, tunneling makes the composition of a packet variable,
which makes the situation even worse.

This patch extends the current buffer split to support protocol header based
buffer split. A new proto_hdr field is introduced in the reserved field
of the rte_eth_rxseg_split structure to specify the protocol header. The
proto_hdr field defines the split position of a packet: splitting always
happens after the protocol header defined in the Rx packet segment. When the
Rx queue offload RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT is enabled and a
corresponding protocol header is configured, the PMD will split the ingress
packets into multiple segments.

struct rte_eth_rxseg_split {

        struct rte_mempool *mp; /* memory pools to allocate segment from */
        uint16_t length; /* segment maximal data length,
                            configures "split point" */
        uint16_t offset; /* data offset from beginning
                            of mbuf data buffer */
        uint32_t proto_hdr; /* inner/outer L2/L3/L4 protocol header,
			       configures "split point" */
    };

Both inner and outer L2/L3/L4 level protocol header split can be supported.
The corresponding protocol header capabilities are RTE_PTYPE_L2_ETHER,
RTE_PTYPE_L3_IPV4, RTE_PTYPE_L3_IPV6, RTE_PTYPE_L4_TCP, RTE_PTYPE_L4_UDP,
RTE_PTYPE_L4_SCTP, RTE_PTYPE_INNER_L2_ETHER, RTE_PTYPE_INNER_L3_IPV4,
RTE_PTYPE_INNER_L3_IPV6, RTE_PTYPE_INNER_L4_TCP, RTE_PTYPE_INNER_L4_UDP,
RTE_PTYPE_INNER_L4_SCTP.

For example, let's suppose we configured the Rx queue with the
following segments:
    seg0 - pool0, proto_hdr0=RTE_PTYPE_L3_IPV4, off0=2B
    seg1 - pool1, proto_hdr1=RTE_PTYPE_L4_UDP, off1=128B
    seg2 - pool2, off2=0B

A packet consisting of MAC_IPV4_UDP_PAYLOAD will be split as
follows:
    seg0 - ipv4 header @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from pool0
    seg1 - udp header @ 128 in mbuf from pool1
    seg2 - payload @ 0 in mbuf from pool2

Now buffer split can be configured in two modes. For length based
buffer split, the mp, length and offset fields in the Rx packet segment
should be configured, while the proto_hdr field should not be.
For protocol header based buffer split, the mp, offset and proto_hdr
fields in the Rx packet segment should be configured, while the length
field should not be.
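
As an illustration, the first segment descriptor would be filled in one of
the two following ways (a sketch only; pool0 is a hypothetical mempool):

    /* Length based buffer split (all segments use length/offset): */
    rx_useg[0].split.mp = pool0;
    rx_useg[0].split.length = 64;
    rx_useg[0].split.offset = 0;
    /* proto_hdr is left at 0, i.e. RTE_PTYPE_UNKNOWN */

    /* Protocol header based buffer split (all segments use proto_hdr): */
    rx_useg[0].split.mp = pool0;
    rx_useg[0].split.offset = 0;
    rx_useg[0].split.proto_hdr = RTE_PTYPE_L3_IPV4;
    /* length is left at 0 */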

The split limitations imposed by the underlying PMD are reported in the
rte_eth_dev_info->rx_seg_capa field. The memory attributes for the split
parts may differ as well - for example, one mempool may belong to DPDK
memory and another to external memory.

Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
Reviewed-by: Qi Zhang <qi.z.zhang@intel.com>
---
 lib/ethdev/rte_ethdev.c | 40 +++++++++++++++++++++++++++++++++-------
 lib/ethdev/rte_ethdev.h | 28 +++++++++++++++++++++++++++-
 2 files changed, 60 insertions(+), 8 deletions(-)

diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
index 8520aec561..20c1c246ce 100644
--- a/lib/ethdev/rte_ethdev.c
+++ b/lib/ethdev/rte_ethdev.c
@@ -1661,6 +1661,7 @@ rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
 		struct rte_mempool *mpl = rx_seg[seg_idx].mp;
 		uint32_t length = rx_seg[seg_idx].length;
 		uint32_t offset = rx_seg[seg_idx].offset;
+		uint32_t proto_hdr = rx_seg[seg_idx].proto_hdr;
 
 		if (mpl == NULL) {
 			RTE_ETHDEV_LOG(ERR, "null mempool pointer\n");
@@ -1694,13 +1695,38 @@ rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
 		}
 		offset += seg_idx != 0 ? 0 : RTE_PKTMBUF_HEADROOM;
 		*mbp_buf_size = rte_pktmbuf_data_room_size(mpl);
-		length = length != 0 ? length : *mbp_buf_size;
-		if (*mbp_buf_size < length + offset) {
-			RTE_ETHDEV_LOG(ERR,
-				       "%s mbuf_data_room_size %u < %u (segment length=%u + segment offset=%u)\n",
-				       mpl->name, *mbp_buf_size,
-				       length + offset, length, offset);
-			return -EINVAL;
+		if (proto_hdr == RTE_PTYPE_UNKNOWN) {
+			/* Split at fixed length. */
+			length = length != 0 ? length : *mbp_buf_size;
+			if (*mbp_buf_size < length + offset) {
+				RTE_ETHDEV_LOG(ERR,
+					"%s mbuf_data_room_size %u < %u (segment length=%u + segment offset=%u)\n",
+					mpl->name, *mbp_buf_size,
+					length + offset, length, offset);
+				return -EINVAL;
+			}
+		} else {
+			/* Split after specified protocol header. */
+			if (proto_hdr & RTE_BUFFER_SPLIT_PROTO_HDR_MASK) {
+				RTE_ETHDEV_LOG(ERR,
+					"Protocol header %u not supported)\n",
+					proto_hdr);
+				return -EINVAL;
+			}
+
+			if (length != 0) {
+				RTE_ETHDEV_LOG(ERR, "segment length should be set to zero in protocol header "
+					       "based buffer split\n");
+				return -EINVAL;
+			}
+
+			if (*mbp_buf_size < offset) {
+				RTE_ETHDEV_LOG(ERR,
+						"%s mbuf_data_room_size %u < %u segment offset)\n",
+						mpl->name, *mbp_buf_size,
+						offset);
+				return -EINVAL;
+			}
 		}
 	}
 	return 0;
diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
index 04cff8ee10..0cd9dd6cc0 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -1187,6 +1187,9 @@ struct rte_eth_txmode {
  *   mbuf) the following data will be pushed to the next segment
  *   up to its own length, and so on.
  *
+ * - The proto_hdrs in the elements define the split position of
+ *   received packets.
+ *
  * - If the length in the segment description element is zero
  *   the actual buffer size will be deduced from the appropriate
  *   memory pool properties.
@@ -1197,14 +1200,37 @@ struct rte_eth_txmode {
  *     - pool from the last valid element
  *     - the buffer size from this pool
  *     - zero offset
+ *
+ * - Length based buffer split:
+ *     - mp, length, offset should be configured.
+ *     - The proto_hdr field should not be configured.
+ *
+ * - Protocol header based buffer split:
+ *     - mp, offset, proto_hdr should be configured.
+ *     - The length field should not be configured.
  */
 struct rte_eth_rxseg_split {
 	struct rte_mempool *mp; /**< Memory pool to allocate segment from. */
 	uint16_t length; /**< Segment data length, configures split point. */
 	uint16_t offset; /**< Data offset from beginning of mbuf data buffer. */
-	uint32_t reserved; /**< Reserved field. */
+	uint32_t proto_hdr; /**< Inner/outer L2/L3/L4 protocol header, configures split point. */
 };
 
+/* Buffer split protocol header capability. */
+#define RTE_BUFFER_SPLIT_PROTO_HDR_MASK ( \
+	RTE_PTYPE_L2_ETHER | \
+	RTE_PTYPE_L3_IPV4 | \
+	RTE_PTYPE_L3_IPV6 | \
+	RTE_PTYPE_L4_TCP | \
+	RTE_PTYPE_L4_UDP | \
+	RTE_PTYPE_L4_SCTP | \
+	RTE_PTYPE_INNER_L2_ETHER | \
+	RTE_PTYPE_INNER_L3_IPV4 | \
+	RTE_PTYPE_INNER_L3_IPV6 | \
+	RTE_PTYPE_INNER_L4_TCP | \
+	RTE_PTYPE_INNER_L4_UDP | \
+	RTE_PTYPE_INNER_L4_SCTP)
+
 /**
  * @warning
  * @b EXPERIMENTAL: this structure may change without prior notice.
-- 
2.17.1


^ permalink raw reply	[flat|nested] 88+ messages in thread

* [PATCH v6 0/1] ethdev: introduce protocol based buffer split
  2022-03-03  6:01 [RFC] ethdev: introduce protocol type based header split xuan.ding
                   ` (5 preceding siblings ...)
  2022-05-27  7:54 ` [PATCH v6] ethdev: introduce protocol header based buffer split xuan.ding
@ 2022-05-27  8:14 ` xuan.ding
  2022-05-27  8:14   ` [PATCH v6 1/1] ethdev: introduce protocol header " xuan.ding
  2022-06-01 13:06 ` [PATCH v7 0/3] ethdev: introduce protocol type based header split wenxuanx.wu
                   ` (3 subsequent siblings)
  10 siblings, 1 reply; 88+ messages in thread
From: xuan.ding @ 2022-05-27  8:14 UTC (permalink / raw)
  To: thomas, andrew.rybchenko, mdr; +Cc: stephen, mb, dev, qi.z.zhang, Xuan Ding

From: Xuan Ding <xuan.ding@intel.com>

Protocol based buffer split consists of splitting a received packet into
several separate segments based on the packet content. It is useful in some
scenarios, such as GPU acceleration. The splitting will help to enable
true zero copy and hence improve the performance significantly.

This patchset aims to support protocol header split based on current buffer
split. When Rx queue is configured with RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT
offload and corresponding protocol, packets received will be directly split
into different mempools.

v5->v6:
* The header split deprecation notice is sent.
* Refine the documents, protocol header based buffer split can actually
support multi-segment split.
* Add buffer split protocol header capability.
* Fix some format issues.

v4->v5:
* Use protocol and mbuf_offset based buffer split instead of header split.
* Use RTE_PTYPE* instead of enum rte_eth_rx_header_split_protocol_type.
* Improve the description of rte_eth_rxseg_split.proto.

v3->v4:
* Use RTE_ETH_RX_HEADER_SPLIT_NONE instead of 0.

v2->v3:
* Fix a PMD bug.
* Add rx queue header split check.
* Revise the log and doc.

v1->v2:
* Add support for all header split protocol types.

Wenxuan Wu (1):
  ethdev: introduce protocol header based buffer split

 lib/ethdev/rte_ethdev.c | 40 +++++++++++++++++++++++++++++++++-------
 lib/ethdev/rte_ethdev.h | 28 +++++++++++++++++++++++++++-
 2 files changed, 60 insertions(+), 8 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 88+ messages in thread

* [PATCH v6 1/1] ethdev: introduce protocol header based buffer split
  2022-05-27  8:14 ` [PATCH v6 0/1] ethdev: introduce protocol " xuan.ding
@ 2022-05-27  8:14   ` xuan.ding
  2022-05-30  9:43     ` Ray Kinsella
  0 siblings, 1 reply; 88+ messages in thread
From: xuan.ding @ 2022-05-27  8:14 UTC (permalink / raw)
  To: thomas, andrew.rybchenko, mdr
  Cc: stephen, mb, dev, qi.z.zhang, Wenxuan Wu, Xuan Ding, Yuan Wang

From: Wenxuan Wu <wenxuanx.wu@intel.com>

Currently, Rx buffer split supports length based split. With Rx queue
offload RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT enabled and Rx packet segment
configured, PMD will be able to split the received packets into
multiple segments.

However, length based buffer split is not suitable for NICs that do split
based on protocol headers. Given the arbitrarily variable lengths in an Rx
packet segment, it is almost impossible to pass a fixed protocol header to
the PMD. Besides, tunneling makes the composition of a packet variable,
which makes the situation even worse.

This patch extends the current buffer split to support protocol header based
buffer split. A new proto_hdr field is introduced in the reserved field
of the rte_eth_rxseg_split structure to specify the protocol header. The
proto_hdr field defines the split position of a packet: splitting always
happens after the protocol header defined in the Rx packet segment. When the
Rx queue offload RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT is enabled and a
corresponding protocol header is configured, the PMD will split the ingress
packets into multiple segments.

struct rte_eth_rxseg_split {

        struct rte_mempool *mp; /* memory pools to allocate segment from */
        uint16_t length; /* segment maximal data length,
                            configures "split point" */
        uint16_t offset; /* data offset from beginning
                            of mbuf data buffer */
        uint32_t proto_hdr; /* inner/outer L2/L3/L4 protocol header,
			       configures "split point" */
    };

Both inner and outer L2/L3/L4 level protocol header split can be supported.
The corresponding protocol header capabilities are RTE_PTYPE_L2_ETHER,
RTE_PTYPE_L3_IPV4, RTE_PTYPE_L3_IPV6, RTE_PTYPE_L4_TCP, RTE_PTYPE_L4_UDP,
RTE_PTYPE_L4_SCTP, RTE_PTYPE_INNER_L2_ETHER, RTE_PTYPE_INNER_L3_IPV4,
RTE_PTYPE_INNER_L3_IPV6, RTE_PTYPE_INNER_L4_TCP, RTE_PTYPE_INNER_L4_UDP,
RTE_PTYPE_INNER_L4_SCTP.

For example, let's suppose we configured the Rx queue with the
following segments:
    seg0 - pool0, proto_hdr0=RTE_PTYPE_L3_IPV4, off0=2B
    seg1 - pool1, proto_hdr1=RTE_PTYPE_L4_UDP, off1=128B
    seg2 - pool2, off2=0B

A packet consisting of MAC_IPV4_UDP_PAYLOAD will be split as
follows:
    seg0 - ipv4 header @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from pool0
    seg1 - udp header @ 128 in mbuf from pool1
    seg2 - payload @ 0 in mbuf from pool2

Now buffer split can be configured in two modes. For length based
buffer split, the mp, length and offset fields in the Rx packet segment
should be configured, while the proto_hdr field should not be.
For protocol header based buffer split, the mp, offset and proto_hdr
fields in the Rx packet segment should be configured, while the length
field should not be.

The split limitations imposed by the underlying PMD are reported in the
rte_eth_dev_info->rx_seg_capa field. The memory attributes for the split
parts may differ as well - for example, one mempool may belong to DPDK
memory and another to external memory.

Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
Reviewed-by: Qi Zhang <qi.z.zhang@intel.com>
---
 lib/ethdev/rte_ethdev.c | 40 +++++++++++++++++++++++++++++++++-------
 lib/ethdev/rte_ethdev.h | 28 +++++++++++++++++++++++++++-
 2 files changed, 60 insertions(+), 8 deletions(-)

diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
index 8520aec561..20c1c246ce 100644
--- a/lib/ethdev/rte_ethdev.c
+++ b/lib/ethdev/rte_ethdev.c
@@ -1661,6 +1661,7 @@ rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
 		struct rte_mempool *mpl = rx_seg[seg_idx].mp;
 		uint32_t length = rx_seg[seg_idx].length;
 		uint32_t offset = rx_seg[seg_idx].offset;
+		uint32_t proto_hdr = rx_seg[seg_idx].proto_hdr;
 
 		if (mpl == NULL) {
 			RTE_ETHDEV_LOG(ERR, "null mempool pointer\n");
@@ -1694,13 +1695,38 @@ rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
 		}
 		offset += seg_idx != 0 ? 0 : RTE_PKTMBUF_HEADROOM;
 		*mbp_buf_size = rte_pktmbuf_data_room_size(mpl);
-		length = length != 0 ? length : *mbp_buf_size;
-		if (*mbp_buf_size < length + offset) {
-			RTE_ETHDEV_LOG(ERR,
-				       "%s mbuf_data_room_size %u < %u (segment length=%u + segment offset=%u)\n",
-				       mpl->name, *mbp_buf_size,
-				       length + offset, length, offset);
-			return -EINVAL;
+		if (proto_hdr == RTE_PTYPE_UNKNOWN) {
+			/* Split at fixed length. */
+			length = length != 0 ? length : *mbp_buf_size;
+			if (*mbp_buf_size < length + offset) {
+				RTE_ETHDEV_LOG(ERR,
+					"%s mbuf_data_room_size %u < %u (segment length=%u + segment offset=%u)\n",
+					mpl->name, *mbp_buf_size,
+					length + offset, length, offset);
+				return -EINVAL;
+			}
+		} else {
+			/* Split after specified protocol header. */
+			if (proto_hdr & RTE_BUFFER_SPLIT_PROTO_HDR_MASK) {
+				RTE_ETHDEV_LOG(ERR,
+					"Protocol header %u not supported)\n",
+					proto_hdr);
+				return -EINVAL;
+			}
+
+			if (length != 0) {
+				RTE_ETHDEV_LOG(ERR, "segment length should be set to zero in protocol header "
+					       "based buffer split\n");
+				return -EINVAL;
+			}
+
+			if (*mbp_buf_size < offset) {
+				RTE_ETHDEV_LOG(ERR,
+						"%s mbuf_data_room_size %u < %u segment offset)\n",
+						mpl->name, *mbp_buf_size,
+						offset);
+				return -EINVAL;
+			}
 		}
 	}
 	return 0;
diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
index 04cff8ee10..0cd9dd6cc0 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -1187,6 +1187,9 @@ struct rte_eth_txmode {
  *   mbuf) the following data will be pushed to the next segment
  *   up to its own length, and so on.
  *
+ * - The proto_hdrs in the elements define the split position of
+ *   received packets.
+ *
  * - If the length in the segment description element is zero
  *   the actual buffer size will be deduced from the appropriate
  *   memory pool properties.
@@ -1197,14 +1200,37 @@ struct rte_eth_txmode {
  *     - pool from the last valid element
  *     - the buffer size from this pool
  *     - zero offset
+ *
+ * - Length based buffer split:
+ *     - mp, length, offset should be configured.
+ *     - The proto_hdr field should not be configured.
+ *
+ * - Protocol header based buffer split:
+ *     - mp, offset, proto_hdr should be configured.
+ *     - The length field should not be configured.
  */
 struct rte_eth_rxseg_split {
 	struct rte_mempool *mp; /**< Memory pool to allocate segment from. */
 	uint16_t length; /**< Segment data length, configures split point. */
 	uint16_t offset; /**< Data offset from beginning of mbuf data buffer. */
-	uint32_t reserved; /**< Reserved field. */
+	uint32_t proto_hdr; /**< Inner/outer L2/L3/L4 protocol header, configures split point. */
 };
 
+/* Buffer split protocol header capability. */
+#define RTE_BUFFER_SPLIT_PROTO_HDR_MASK ( \
+	RTE_PTYPE_L2_ETHER | \
+	RTE_PTYPE_L3_IPV4 | \
+	RTE_PTYPE_L3_IPV6 | \
+	RTE_PTYPE_L4_TCP | \
+	RTE_PTYPE_L4_UDP | \
+	RTE_PTYPE_L4_SCTP | \
+	RTE_PTYPE_INNER_L2_ETHER | \
+	RTE_PTYPE_INNER_L3_IPV4 | \
+	RTE_PTYPE_INNER_L3_IPV6 | \
+	RTE_PTYPE_INNER_L4_TCP | \
+	RTE_PTYPE_INNER_L4_UDP | \
+	RTE_PTYPE_INNER_L4_SCTP)
+
 /**
  * @warning
  * @b EXPERIMENTAL: this structure may change without prior notice.
-- 
2.17.1


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v6 1/1] ethdev: introduce protocol header based buffer split
  2022-05-27  8:14   ` [PATCH v6 1/1] ethdev: introduce protocol header " xuan.ding
@ 2022-05-30  9:43     ` Ray Kinsella
  0 siblings, 0 replies; 88+ messages in thread
From: Ray Kinsella @ 2022-05-30  9:43 UTC (permalink / raw)
  To: xuan.ding
  Cc: thomas, andrew.rybchenko, stephen, mb, dev, qi.z.zhang,
	Wenxuan Wu, Yuan Wang


xuan.ding@intel.com writes:

> From: Wenxuan Wu <wenxuanx.wu@intel.com>
>
> Currently, Rx buffer split supports length based split. With Rx queue
> offload RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT enabled and Rx packet segment
> configured, PMD will be able to split the received packets into
> multiple segments.
>
> However, length based buffer split is not suitable for NICs that do split
> based on protocol headers. Given a arbitrarily variable length in Rx packet
> segment, it is almost impossible to pass a fixed protocol header to PMD.
> Besides, the existence of tunneling results in the composition of a packet
> is various, which makes the situation even worse.
>
> This patch extends current buffer split to support protocol header based
> buffer split. A new proto_hdr field is introduced in the reserved field
> of rte_eth_rxseg_split structure to specify protocol header. The proto_hdr
> field defines the split position of packet, splitting will always happens
> after the protocol header defined in the Rx packet segment. When Rx queue
> offload RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT is enabled and corresponding
> protocol header is configured, PMD will split the ingress packets into
> multiple segments.
>
> struct rte_eth_rxseg_split {
>
>         struct rte_mempool *mp; /* memory pools to allocate segment from */
>         uint16_t length; /* segment maximal data length,
>                             configures "split point" */
>         uint16_t offset; /* data offset from beginning
>                             of mbuf data buffer */
>         uint32_t proto_hdr; /* inner/outer L2/L3/L4 protocol header,
> 			       configures "split point" */
>     };
>
> Both inner and outer L2/L3/L4 level protocol header split can be supported.
> Corresponding protocol header capability is RTE_PTYPE_L2_ETHER,
> RTE_PTYPE_L3_IPV4, RTE_PTYPE_L3_IPV6, RTE_PTYPE_L4_TCP, RTE_PTYPE_L4_UDP,
> RTE_PTYPE_L4_SCTP, RTE_PTYPE_INNER_L2_ETHER, RTE_PTYPE_INNER_L3_IPV4,
> RTE_PTYPE_INNER_L3_IPV6, RTE_PTYPE_INNER_L4_TCP, RTE_PTYPE_INNER_L4_UDP,
> RTE_PTYPE_INNER_L4_SCTP.
>
> For example, let's suppose we configured the Rx queue with the
> following segments:
>     seg0 - pool0, proto_hdr0=RTE_PTYPE_L3_IPV4, off0=2B
>     seg1 - pool1, proto_hdr1=RTE_PTYPE_L4_UDP, off1=128B
> >     seg2 - pool2, off2=0B
>
> The packet consists of MAC_IPV4_UDP_PAYLOAD will be split like
> following:
>     seg0 - ipv4 header @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from pool0
>     seg1 - udp header @ 128 in mbuf from pool1
>     seg2 - payload @ 0 in mbuf from pool2
>
> Now buffer split can be configured in two modes. For length based
> buffer split, the mp, length, offset field in Rx packet segment should
> be configured, while the proto_hdr field should not be configured.
> For protocol header based buffer split, the mp, offset, proto_hdr field
> in Rx packet segment should be configured, while the length field should
> not be configured.
>
> The split limitations imposed by underlying PMD is reported in the
> rte_eth_dev_info->rx_seg_capa field. The memory attributes for the split
> parts may differ either, dpdk memory and external memory, respectively.
>
> Signed-off-by: Xuan Ding <xuan.ding@intel.com>
> Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
> Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
> Reviewed-by: Qi Zhang <qi.z.zhang@intel.com>
> ---
>  lib/ethdev/rte_ethdev.c | 40 +++++++++++++++++++++++++++++++++-------
>  lib/ethdev/rte_ethdev.h | 28 +++++++++++++++++++++++++++-
>  2 files changed, 60 insertions(+), 8 deletions(-)
>

Acked-by: Ray Kinsella <mdr@ashroe.eu>

^ permalink raw reply	[flat|nested] 88+ messages in thread

* [PATCH v7 0/3] ethdev: introduce protocol type based header split
  2022-03-03  6:01 [RFC] ethdev: introduce protocol type based header split xuan.ding
                   ` (6 preceding siblings ...)
  2022-05-27  8:14 ` [PATCH v6 0/1] ethdev: introduce protocol " xuan.ding
@ 2022-06-01 13:06 ` wenxuanx.wu
  2022-06-01 13:06   ` [PATCH v7 1/3] ethdev: introduce protocol header based buffer split wenxuanx.wu
                     ` (2 more replies)
  2022-06-01 13:22 ` [PATCH v7 0/3] ethdev: introduce protocol type based header split wenxuanx.wu
                   ` (2 subsequent siblings)
  10 siblings, 3 replies; 88+ messages in thread
From: wenxuanx.wu @ 2022-06-01 13:06 UTC (permalink / raw)
  To: dev, qi.z.zhang, jerinjacobk, xiaoyun.li, aman.deep.singh, yuying.zhang
  Cc: stephen, mb, viacheslavo, ping.yu, xuan.ding, yuanx.wang, wenxuanx.wu

From: Wenxuan Wu <wenxuanx.wu@intel.com>

Buffer split consists of splitting a received packet into two separate
regions based on the packet content. It is useful in some scenarios,
such as GPU acceleration. The splitting will help to enable true zero
copy and hence improve the performance significantly.

This patchset extends the current buffer split to support split based on
multiple protocol headers. When an Rx queue is configured with the buffer
split feature, received packets will be split directly into two different
mempools.

v6->v7:
* Fix supported header protocol check.
* Add rxhdrs commands and parameters.
v5->v6:
* Change doc and design of struct rte_eth_rxseg_split to support
multi-segment proto_hdr configuration.
v4->v5:
* Use protocol and mbuf_offset based buffer split instead of header split.
* Use RTE_PTYPE* instead of enum rte_eth_rx_header_split_protocol_type.
* Improve the description of rte_eth_rxseg_split.proto.

v3->v4:
* Use RTE_ETH_RX_HEADER_SPLIT_NONE instead of 0.

v2->v3:
* Fix a PMD bug.
* Add rx queue header split check.
* Revise the log and doc.

v1->v2:
* Add support for all header split protocol types.

Wenxuan Wu (3):
  ethdev: introduce protocol header based buffer split
  net/ice: support buffer split in Rx path
  app/testpmd: add rxhdrs commands and parameters

 app/test-pmd/cmdline.c                | 127 ++++++++++++++-
 app/test-pmd/config.c                 |  81 ++++++++++
 app/test-pmd/parameters.c             |  15 +-
 app/test-pmd/testpmd.c                |   6 +-
 app/test-pmd/testpmd.h                |   6 +
 drivers/net/ice/ice_ethdev.c          |  10 +-
 drivers/net/ice/ice_rxtx.c            | 220 ++++++++++++++++++++++----
 drivers/net/ice/ice_rxtx.h            |  16 ++
 drivers/net/ice/ice_rxtx_vec_common.h |   3 +
 lib/ethdev/rte_ethdev.c               |  40 ++++-
 lib/ethdev/rte_ethdev.h               |  28 +++-
 11 files changed, 505 insertions(+), 47 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 88+ messages in thread

* [PATCH v7 1/3] ethdev: introduce protocol header based buffer split
  2022-06-01 13:06 ` [PATCH v7 0/3] ethdev: introduce protocol type based header split wenxuanx.wu
@ 2022-06-01 13:06   ` wenxuanx.wu
  2022-06-01 13:06   ` [PATCH v7 2/3] net/ice: support buffer split in Rx path wenxuanx.wu
  2022-06-01 13:06   ` [PATCH v7 3/3] app/testpmd: add rxhdrs commands and parameters wenxuanx.wu
  2 siblings, 0 replies; 88+ messages in thread
From: wenxuanx.wu @ 2022-06-01 13:06 UTC (permalink / raw)
  To: dev, qi.z.zhang, jerinjacobk, xiaoyun.li, aman.deep.singh, yuying.zhang
  Cc: stephen, mb, viacheslavo, ping.yu, xuan.ding, yuanx.wang,
	wenxuanx.wu, Ray Kinsella

From: Wenxuan Wu <wenxuanx.wu@intel.com>

Currently, Rx buffer split supports length based split. With Rx queue
offload RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT enabled and Rx packet segment
configured, PMD will be able to split the received packets into
multiple segments.

However, length based buffer split is not suitable for NICs that do split
based on protocol headers. Given the arbitrarily variable lengths in an Rx
packet segment, it is almost impossible to pass a fixed protocol header to
the PMD. Besides, tunneling makes the composition of a packet variable,
which makes the situation even worse.

This patch extends the current buffer split to support protocol header based
buffer split. A new proto_hdr field is introduced in the reserved field
of the rte_eth_rxseg_split structure to specify the protocol header. The
proto_hdr field defines the split position of a packet: splitting always
happens after the protocol header defined in the Rx packet segment. When the
Rx queue offload RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT is enabled and a
corresponding protocol header is configured, the PMD will split the ingress
packets into multiple segments.

struct rte_eth_rxseg_split {

        struct rte_mempool *mp; /* memory pools to allocate segment from */
        uint16_t length; /* segment maximal data length,
                            configures "split point" */
        uint16_t offset; /* data offset from beginning
                            of mbuf data buffer */
        uint32_t proto_hdr; /* inner/outer L2/L3/L4 protocol header,
			       configures "split point" */
    };

Both inner and outer L2/L3/L4 level protocol header split can be supported.
The corresponding protocol header capabilities are RTE_PTYPE_L2_ETHER,
RTE_PTYPE_L3_IPV4, RTE_PTYPE_L3_IPV6, RTE_PTYPE_L4_TCP, RTE_PTYPE_L4_UDP,
RTE_PTYPE_L4_SCTP, RTE_PTYPE_INNER_L2_ETHER, RTE_PTYPE_INNER_L3_IPV4,
RTE_PTYPE_INNER_L3_IPV6, RTE_PTYPE_INNER_L4_TCP, RTE_PTYPE_INNER_L4_UDP,
RTE_PTYPE_INNER_L4_SCTP.

For example, let's suppose we configured the Rx queue with the
following segments:
    seg0 - pool0, proto_hdr0=RTE_PTYPE_L3_IPV4, off0=2B
    seg1 - pool1, proto_hdr1=RTE_PTYPE_L4_UDP, off1=128B
    seg2 - pool2, off2=0B

A packet consisting of MAC_IPV4_UDP_PAYLOAD will be split as
follows:
    seg0 - ipv4 header @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from pool0
    seg1 - udp header @ 128 in mbuf from pool1
    seg2 - payload @ 0 in mbuf from pool2

Now buffer split can be configured in two modes. For length based
buffer split, the mp, length and offset fields in the Rx packet segment
should be configured, while the proto_hdr field should not be.
For protocol header based buffer split, the mp, offset and proto_hdr
fields in the Rx packet segment should be configured, while the length
field should not be.

The split limitations imposed by the underlying PMD are reported in the
rte_eth_dev_info->rx_seg_capa field. The memory attributes for the split
parts may differ as well - for example, one mempool may belong to DPDK
memory and another to external memory.

Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
Reviewed-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
---
 lib/ethdev/rte_ethdev.c | 40 +++++++++++++++++++++++++++++++++-------
 lib/ethdev/rte_ethdev.h | 28 +++++++++++++++++++++++++++-
 2 files changed, 60 insertions(+), 8 deletions(-)

diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
index 29a3d80466..be161ff999 100644
--- a/lib/ethdev/rte_ethdev.c
+++ b/lib/ethdev/rte_ethdev.c
@@ -1661,6 +1661,7 @@ rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
 		struct rte_mempool *mpl = rx_seg[seg_idx].mp;
 		uint32_t length = rx_seg[seg_idx].length;
 		uint32_t offset = rx_seg[seg_idx].offset;
+		uint32_t proto_hdr = rx_seg[seg_idx].proto_hdr;
 
 		if (mpl == NULL) {
 			RTE_ETHDEV_LOG(ERR, "null mempool pointer\n");
@@ -1694,13 +1695,38 @@ rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
 		}
 		offset += seg_idx != 0 ? 0 : RTE_PKTMBUF_HEADROOM;
 		*mbp_buf_size = rte_pktmbuf_data_room_size(mpl);
-		length = length != 0 ? length : *mbp_buf_size;
-		if (*mbp_buf_size < length + offset) {
-			RTE_ETHDEV_LOG(ERR,
-				       "%s mbuf_data_room_size %u < %u (segment length=%u + segment offset=%u)\n",
-				       mpl->name, *mbp_buf_size,
-				       length + offset, length, offset);
-			return -EINVAL;
+		if (proto_hdr == RTE_PTYPE_UNKNOWN) {
+			/* Split at fixed length. */
+			length = length != 0 ? length : *mbp_buf_size;
+			if (*mbp_buf_size < length + offset) {
+				RTE_ETHDEV_LOG(ERR,
+					"%s mbuf_data_room_size %u < %u (segment length=%u + segment offset=%u)\n",
+					mpl->name, *mbp_buf_size,
+					length + offset, length, offset);
+				return -EINVAL;
+			}
+		} else {
+			/* Split after specified protocol header. */
+			if (proto_hdr & RTE_BUFFER_SPLIT_PROTO_HDR_MASK) {
+				RTE_ETHDEV_LOG(ERR,
+					"Protocol header %u not supported)\n",
+					proto_hdr);
+				return -EINVAL;
+			}
+
+			if (length != 0) {
+				RTE_ETHDEV_LOG(ERR, "segment length should be set to zero in protocol header "
+					       "based buffer split\n");
+				return -EINVAL;
+			}
+
+			if (*mbp_buf_size < offset) {
+				RTE_ETHDEV_LOG(ERR,
+						"%s mbuf_data_room_size %u < %u segment offset)\n",
+						mpl->name, *mbp_buf_size,
+						offset);
+				return -EINVAL;
+			}
 		}
 	}
 	return 0;
diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
index 04cff8ee10..0cd9dd6cc0 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -1187,6 +1187,9 @@ struct rte_eth_txmode {
  *   mbuf) the following data will be pushed to the next segment
  *   up to its own length, and so on.
  *
+ * - The proto_hdrs in the elements define the split position of
+ *   received packets.
+ *
  * - If the length in the segment description element is zero
  *   the actual buffer size will be deduced from the appropriate
  *   memory pool properties.
@@ -1197,14 +1200,37 @@ struct rte_eth_txmode {
  *     - pool from the last valid element
  *     - the buffer size from this pool
  *     - zero offset
+ *
+ * - Length based buffer split:
+ *     - mp, length, offset should be configured.
+ *     - The proto_hdr field should not be configured.
+ *
+ * - Protocol header based buffer split:
+ *     - mp, offset, proto_hdr should be configured.
+ *     - The length field should not be configured.
  */
 struct rte_eth_rxseg_split {
 	struct rte_mempool *mp; /**< Memory pool to allocate segment from. */
 	uint16_t length; /**< Segment data length, configures split point. */
 	uint16_t offset; /**< Data offset from beginning of mbuf data buffer. */
-	uint32_t reserved; /**< Reserved field. */
+	uint32_t proto_hdr; /**< Inner/outer L2/L3/L4 protocol header, configures split point. */
 };
 
+/* Buffer split protocol header capability. */
+#define RTE_BUFFER_SPLIT_PROTO_HDR_MASK ( \
+	RTE_PTYPE_L2_ETHER | \
+	RTE_PTYPE_L3_IPV4 | \
+	RTE_PTYPE_L3_IPV6 | \
+	RTE_PTYPE_L4_TCP | \
+	RTE_PTYPE_L4_UDP | \
+	RTE_PTYPE_L4_SCTP | \
+	RTE_PTYPE_INNER_L2_ETHER | \
+	RTE_PTYPE_INNER_L3_IPV4 | \
+	RTE_PTYPE_INNER_L3_IPV6 | \
+	RTE_PTYPE_INNER_L4_TCP | \
+	RTE_PTYPE_INNER_L4_UDP | \
+	RTE_PTYPE_INNER_L4_SCTP)
+
 /**
  * @warning
  * @b EXPERIMENTAL: this structure may change without prior notice.
-- 
2.25.1


^ permalink raw reply	[flat|nested] 88+ messages in thread

* [PATCH v7 2/3] net/ice: support buffer split in Rx path
  2022-06-01 13:06 ` [PATCH v7 0/3] ethdev: introduce protocol type based header split wenxuanx.wu
  2022-06-01 13:06   ` [PATCH v7 1/3] ethdev: introduce protocol header based buffer split wenxuanx.wu
@ 2022-06-01 13:06   ` wenxuanx.wu
  2022-06-01 13:06   ` [PATCH v7 3/3] app/testpmd: add rxhdrs commands and parameters wenxuanx.wu
  2 siblings, 0 replies; 88+ messages in thread
From: wenxuanx.wu @ 2022-06-01 13:06 UTC (permalink / raw)
  To: dev, qi.z.zhang, jerinjacobk, xiaoyun.li, aman.deep.singh, yuying.zhang
  Cc: stephen, mb, viacheslavo, ping.yu, xuan.ding, yuanx.wang, wenxuanx.wu

From: Wenxuan Wu <wenxuanx.wu@intel.com>

This patch adds support for protocol based buffer split in the normal
Rx data paths. When the Rx queue is configured with a specific protocol
type, received packets are split into protocol header and payload
parts, and the two parts are put into different mempools.

Currently, protocol based buffer split is not supported in vectorized
paths.
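
For illustration only (not from this patch), an application receiving
on such a queue sees each packet as a two-segment mbuf chain;
process_header()/process_payload() are hypothetical callbacks and
port_id/queue_id are assumed:

    struct rte_mbuf *pkts[32];
    uint16_t i, nb;

    nb = rte_eth_rx_burst(port_id, queue_id, pkts, 32);
    for (i = 0; i < nb; i++) {
        struct rte_mbuf *hdr = pkts[i];   /* from the header mempool  */
        struct rte_mbuf *pay = hdr->next; /* from the payload mempool */

        process_header(rte_pktmbuf_mtod(hdr, void *), hdr->data_len);
        if (pay != NULL)
            process_payload(rte_pktmbuf_mtod(pay, void *), pay->data_len);

        rte_pktmbuf_free(hdr); /* frees the whole chain */
    }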

Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
Reviewed-by: Qi Zhang <qi.z.zhang@intel.com>
---
 drivers/net/ice/ice_ethdev.c          |  10 +-
 drivers/net/ice/ice_rxtx.c            | 220 ++++++++++++++++++++++----
 drivers/net/ice/ice_rxtx.h            |  16 ++
 drivers/net/ice/ice_rxtx_vec_common.h |   3 +
 lib/ethdev/rte_ethdev.c               |   2 +-
 5 files changed, 218 insertions(+), 33 deletions(-)

diff --git a/drivers/net/ice/ice_ethdev.c b/drivers/net/ice/ice_ethdev.c
index 73e550f5fb..ce3f49c863 100644
--- a/drivers/net/ice/ice_ethdev.c
+++ b/drivers/net/ice/ice_ethdev.c
@@ -3713,7 +3713,8 @@ ice_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 			RTE_ETH_RX_OFFLOAD_OUTER_IPV4_CKSUM |
 			RTE_ETH_RX_OFFLOAD_VLAN_EXTEND |
 			RTE_ETH_RX_OFFLOAD_RSS_HASH |
-			RTE_ETH_RX_OFFLOAD_TIMESTAMP;
+			RTE_ETH_RX_OFFLOAD_TIMESTAMP |
+			RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT;
 		dev_info->tx_offload_capa |=
 			RTE_ETH_TX_OFFLOAD_QINQ_INSERT |
 			RTE_ETH_TX_OFFLOAD_IPV4_CKSUM |
@@ -3725,7 +3726,7 @@ ice_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 		dev_info->flow_type_rss_offloads |= ICE_RSS_OFFLOAD_ALL;
 	}
 
-	dev_info->rx_queue_offload_capa = 0;
+	dev_info->rx_queue_offload_capa = RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT;
 	dev_info->tx_queue_offload_capa = RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE;
 
 	dev_info->reta_size = pf->hash_lut_size;
@@ -3794,6 +3795,11 @@ ice_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 	dev_info->default_rxportconf.ring_size = ICE_BUF_SIZE_MIN;
 	dev_info->default_txportconf.ring_size = ICE_BUF_SIZE_MIN;
 
+	dev_info->rx_seg_capa.max_nseg = ICE_RX_MAX_NSEG;
+	dev_info->rx_seg_capa.multi_pools = 1;
+	dev_info->rx_seg_capa.offset_allowed = 0;
+	dev_info->rx_seg_capa.offset_align_log2 = 0;
+
 	return 0;
 }
 
diff --git a/drivers/net/ice/ice_rxtx.c b/drivers/net/ice/ice_rxtx.c
index 2dd2637fbb..77ab258f7f 100644
--- a/drivers/net/ice/ice_rxtx.c
+++ b/drivers/net/ice/ice_rxtx.c
@@ -282,7 +282,6 @@ ice_program_hw_rx_queue(struct ice_rx_queue *rxq)
 	/* Set buffer size as the head split is disabled. */
 	buf_size = (uint16_t)(rte_pktmbuf_data_room_size(rxq->mp) -
 			      RTE_PKTMBUF_HEADROOM);
-	rxq->rx_hdr_len = 0;
 	rxq->rx_buf_len = RTE_ALIGN(buf_size, (1 << ICE_RLAN_CTX_DBUF_S));
 	rxq->max_pkt_len =
 		RTE_MIN((uint32_t)ICE_SUPPORT_CHAIN_NUM * rxq->rx_buf_len,
@@ -311,11 +310,53 @@ ice_program_hw_rx_queue(struct ice_rx_queue *rxq)
 
 	memset(&rx_ctx, 0, sizeof(rx_ctx));
 
+	if (rxq->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) {
+		switch (rxq->rxseg[0].proto_hdr) {
+		case RTE_PTYPE_L2_ETHER:
+			rx_ctx.dtype = ICE_RX_DTYPE_HEADER_SPLIT;
+			rx_ctx.hsplit_1 = ICE_RLAN_RX_HSPLIT_1_SPLIT_L2;
+			break;
+		case RTE_PTYPE_INNER_L2_ETHER:
+			rx_ctx.dtype = ICE_RX_DTYPE_HEADER_SPLIT;
+			rx_ctx.hsplit_0 = ICE_RLAN_RX_HSPLIT_0_SPLIT_L2;
+			break;
+		case RTE_PTYPE_L3_IPV4:
+		case RTE_PTYPE_L3_IPV6:
+		case RTE_PTYPE_INNER_L3_IPV4:
+		case RTE_PTYPE_INNER_L3_IPV6:
+		case RTE_PTYPE_L3_IPV4 | RTE_PTYPE_L3_IPV6:
+			rx_ctx.dtype = ICE_RX_DTYPE_HEADER_SPLIT;
+			rx_ctx.hsplit_0 = ICE_RLAN_RX_HSPLIT_0_SPLIT_IP;
+			break;
+		case RTE_PTYPE_L4_TCP:
+		case RTE_PTYPE_L4_UDP:
+		case RTE_PTYPE_INNER_L4_TCP:
+		case RTE_PTYPE_INNER_L4_UDP:
+			rx_ctx.dtype = ICE_RX_DTYPE_HEADER_SPLIT;
+			rx_ctx.hsplit_0 = ICE_RLAN_RX_HSPLIT_0_SPLIT_TCP_UDP;
+			break;
+		case RTE_PTYPE_L4_SCTP:
+		case RTE_PTYPE_INNER_L4_SCTP:
+			rx_ctx.dtype = ICE_RX_DTYPE_HEADER_SPLIT;
+			rx_ctx.hsplit_0 = ICE_RLAN_RX_HSPLIT_0_SPLIT_SCTP;
+			break;
+		case 0:
+			PMD_DRV_LOG(ERR, "Buffer split protocol must be configured");
+			return -EINVAL;
+		default:
+			PMD_DRV_LOG(ERR, "Buffer split protocol is not supported");
+			return -EINVAL;
+		}
+		rxq->rx_hdr_len = ICE_RX_HDR_BUF_SIZE;
+	} else {
+		rxq->rx_hdr_len = 0;
+		rx_ctx.dtype = 0; /* No Protocol Based Buffer Split mode */
+	}
+
 	rx_ctx.base = rxq->rx_ring_dma / ICE_QUEUE_BASE_ADDR_UNIT;
 	rx_ctx.qlen = rxq->nb_rx_desc;
 	rx_ctx.dbuf = rxq->rx_buf_len >> ICE_RLAN_CTX_DBUF_S;
 	rx_ctx.hbuf = rxq->rx_hdr_len >> ICE_RLAN_CTX_HBUF_S;
-	rx_ctx.dtype = 0; /* No Header Split mode */
 #ifndef RTE_LIBRTE_ICE_16BYTE_RX_DESC
 	rx_ctx.dsize = 1; /* 32B descriptors */
 #endif
@@ -401,6 +442,7 @@ ice_alloc_rx_queue_mbufs(struct ice_rx_queue *rxq)
 
 	for (i = 0; i < rxq->nb_rx_desc; i++) {
 		volatile union ice_rx_flex_desc *rxd;
+		rxd = &rxq->rx_ring[i];
 		struct rte_mbuf *mbuf = rte_mbuf_raw_alloc(rxq->mp);
 
 		if (unlikely(!mbuf)) {
@@ -408,8 +450,6 @@ ice_alloc_rx_queue_mbufs(struct ice_rx_queue *rxq)
 			return -ENOMEM;
 		}
 
-		rte_mbuf_refcnt_set(mbuf, 1);
-		mbuf->next = NULL;
 		mbuf->data_off = RTE_PKTMBUF_HEADROOM;
 		mbuf->nb_segs = 1;
 		mbuf->port = rxq->port_id;
@@ -417,9 +457,33 @@ ice_alloc_rx_queue_mbufs(struct ice_rx_queue *rxq)
 		dma_addr =
 			rte_cpu_to_le_64(rte_mbuf_data_iova_default(mbuf));
 
-		rxd = &rxq->rx_ring[i];
-		rxd->read.pkt_addr = dma_addr;
-		rxd->read.hdr_addr = 0;
+		if (rxq->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) {
+			struct rte_mbuf *mbuf_pay;
+			mbuf_pay = rte_mbuf_raw_alloc(rxq->rxseg[1].mp);
+			if (unlikely(!mbuf_pay)) {
+				PMD_DRV_LOG(ERR, "Failed to allocate payload mbuf for RX");
+				return -ENOMEM;
+			}
+
+			mbuf_pay->next = NULL;
+			mbuf_pay->data_off = RTE_PKTMBUF_HEADROOM;
+			mbuf_pay->nb_segs = 1;
+			mbuf_pay->port = rxq->port_id;
+			mbuf->next = mbuf_pay;
+
+			rxd->read.hdr_addr = dma_addr;
+			/* The LS bit should be set to zero regardless of
+			 * buffer split enablement.
+			 */
+			rxd->read.pkt_addr = rte_cpu_to_le_64(rte_mbuf_data_iova_default(mbuf_pay));
+
+		} else {
+			rte_mbuf_refcnt_set(mbuf, 1);
+			mbuf->next = NULL;
+			rxd->read.hdr_addr = 0;
+			rxd->read.pkt_addr = dma_addr;
+		}
+
 #ifndef RTE_LIBRTE_ICE_16BYTE_RX_DESC
 		rxd->read.rsvd1 = 0;
 		rxd->read.rsvd2 = 0;
@@ -443,14 +507,14 @@ _ice_rx_queue_release_mbufs(struct ice_rx_queue *rxq)
 
 	for (i = 0; i < rxq->nb_rx_desc; i++) {
 		if (rxq->sw_ring[i].mbuf) {
-			rte_pktmbuf_free_seg(rxq->sw_ring[i].mbuf);
+			rte_pktmbuf_free(rxq->sw_ring[i].mbuf);
 			rxq->sw_ring[i].mbuf = NULL;
 		}
 	}
 	if (rxq->rx_nb_avail == 0)
 		return;
 	for (i = 0; i < rxq->rx_nb_avail; i++)
-		rte_pktmbuf_free_seg(rxq->rx_stage[rxq->rx_next_avail + i]);
+		rte_pktmbuf_free(rxq->rx_stage[rxq->rx_next_avail + i]);
 
 	rxq->rx_nb_avail = 0;
 }
@@ -742,7 +806,7 @@ ice_fdir_program_hw_rx_queue(struct ice_rx_queue *rxq)
 	rx_ctx.qlen = rxq->nb_rx_desc;
 	rx_ctx.dbuf = rxq->rx_buf_len >> ICE_RLAN_CTX_DBUF_S;
 	rx_ctx.hbuf = rxq->rx_hdr_len >> ICE_RLAN_CTX_HBUF_S;
-	rx_ctx.dtype = 0; /* No Header Split mode */
+	rx_ctx.dtype = 0; /* No Buffer Split mode */
 	rx_ctx.dsize = 1; /* 32B descriptors */
 	rx_ctx.rxmax = ICE_ETH_MAX_LEN;
 	/* TPH: Transaction Layer Packet (TLP) processing hints */
@@ -1076,6 +1140,7 @@ ice_rx_queue_setup(struct rte_eth_dev *dev,
 	uint16_t len;
 	int use_def_burst_func = 1;
 	uint64_t offloads;
+	uint16_t n_seg = rx_conf->rx_nseg;
 
 	if (nb_desc % ICE_ALIGN_RING_DESC != 0 ||
 	    nb_desc > ICE_MAX_RING_DESC ||
@@ -1087,6 +1152,17 @@ ice_rx_queue_setup(struct rte_eth_dev *dev,
 
 	offloads = rx_conf->offloads | dev->data->dev_conf.rxmode.offloads;
 
+	if (mp)
+		n_seg = 1;
+
+	if (n_seg > 1) {
+		if (!(offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT)) {
+			PMD_INIT_LOG(ERR, "port %u queue index %u split offload not configured",
+					dev->data->port_id, queue_idx);
+			return -EINVAL;
+		}
+	}
+
 	/* Free memory if needed */
 	if (dev->data->rx_queues[queue_idx]) {
 		ice_rx_queue_release(dev->data->rx_queues[queue_idx]);
@@ -1098,12 +1174,22 @@ ice_rx_queue_setup(struct rte_eth_dev *dev,
 				 sizeof(struct ice_rx_queue),
 				 RTE_CACHE_LINE_SIZE,
 				 socket_id);
+
 	if (!rxq) {
 		PMD_INIT_LOG(ERR, "Failed to allocate memory for "
 			     "rx queue data structure");
 		return -ENOMEM;
 	}
-	rxq->mp = mp;
+
+	rxq->rxseg_nb = n_seg;
+	if (n_seg > 1) {
+		rte_memcpy(rxq->rxseg, rx_conf->rx_seg,
+			sizeof(struct rte_eth_rxseg_split) * n_seg);
+		rxq->mp = rxq->rxseg[0].mp;
+	} else {
+		rxq->mp = mp;
+	}
+
 	rxq->nb_rx_desc = nb_desc;
 	rxq->rx_free_thresh = rx_conf->rx_free_thresh;
 	rxq->queue_id = queue_idx;
@@ -1568,7 +1654,7 @@ ice_rx_scan_hw_ring(struct ice_rx_queue *rxq)
 	struct ice_rx_entry *rxep;
 	struct rte_mbuf *mb;
 	uint16_t stat_err0;
-	uint16_t pkt_len;
+	uint16_t pkt_len, hdr_len;
 	int32_t s[ICE_LOOK_AHEAD], nb_dd;
 	int32_t i, j, nb_rx = 0;
 	uint64_t pkt_flags = 0;
@@ -1623,6 +1709,24 @@ ice_rx_scan_hw_ring(struct ice_rx_queue *rxq)
 				   ICE_RX_FLX_DESC_PKT_LEN_M) - rxq->crc_len;
 			mb->data_len = pkt_len;
 			mb->pkt_len = pkt_len;
+
+			if (rxq->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) {
+				mb->nb_segs = (uint16_t)(mb->nb_segs + mb->next->nb_segs);
+				mb->next->next = NULL;
+				hdr_len = rte_le_to_cpu_16(rxdp[j].wb.hdr_len_sph_flex_flags1) &
+						ICE_RX_FLEX_DESC_HEADER_LEN_M;
+				pkt_len = (rte_le_to_cpu_16(rxdp[j].wb.pkt_len) &
+					ICE_RX_FLX_DESC_PKT_LEN_M) - rxq->crc_len;
+				mb->data_len = hdr_len;
+				mb->pkt_len = hdr_len + pkt_len;
+				mb->next->data_len = pkt_len;
+			} else {
+				pkt_len = (rte_le_to_cpu_16(rxdp[j].wb.pkt_len) &
+					ICE_RX_FLX_DESC_PKT_LEN_M) - rxq->crc_len;
+				mb->data_len = pkt_len;
+				mb->pkt_len = pkt_len;
+			}
+
 			mb->ol_flags = 0;
 			stat_err0 = rte_le_to_cpu_16(rxdp[j].wb.status_error0);
 			pkt_flags = ice_rxd_error_to_pkt_flags(stat_err0);
@@ -1714,7 +1818,9 @@ ice_rx_alloc_bufs(struct ice_rx_queue *rxq)
 	struct rte_mbuf *mb;
 	uint16_t alloc_idx, i;
 	uint64_t dma_addr;
-	int diag;
+	int diag, diag_pay;
+	uint64_t pay_addr;
+	struct rte_mbuf *mbufs_pay[rxq->rx_free_thresh];
 
 	/* Allocate buffers in bulk */
 	alloc_idx = (uint16_t)(rxq->rx_free_trigger -
@@ -1727,6 +1833,15 @@ ice_rx_alloc_bufs(struct ice_rx_queue *rxq)
 		return -ENOMEM;
 	}
 
+	if (rxq->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) {
+		diag_pay = rte_mempool_get_bulk(rxq->rxseg[1].mp,
+				(void *)mbufs_pay, rxq->rx_free_thresh);
+		if (unlikely(diag_pay != 0)) {
+			PMD_RX_LOG(ERR, "Failed to get payload mbufs in bulk");
+			return -ENOMEM;
+		}
+	}
+
 	rxdp = &rxq->rx_ring[alloc_idx];
 	for (i = 0; i < rxq->rx_free_thresh; i++) {
 		if (likely(i < (rxq->rx_free_thresh - 1)))
@@ -1735,13 +1850,21 @@ ice_rx_alloc_bufs(struct ice_rx_queue *rxq)
 
 		mb = rxep[i].mbuf;
 		rte_mbuf_refcnt_set(mb, 1);
-		mb->next = NULL;
 		mb->data_off = RTE_PKTMBUF_HEADROOM;
 		mb->nb_segs = 1;
 		mb->port = rxq->port_id;
 		dma_addr = rte_cpu_to_le_64(rte_mbuf_data_iova_default(mb));
-		rxdp[i].read.hdr_addr = 0;
-		rxdp[i].read.pkt_addr = dma_addr;
+
+		if (rxq->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) {
+			mb->next = mbufs_pay[i];
+			pay_addr = rte_mbuf_data_iova_default(mbufs_pay[i]);
+			rxdp[i].read.hdr_addr = dma_addr;
+			rxdp[i].read.pkt_addr = rte_cpu_to_le_64(pay_addr);
+		} else {
+			mb->next = NULL;
+			rxdp[i].read.hdr_addr = 0;
+			rxdp[i].read.pkt_addr = dma_addr;
+		}
 	}
 
 	/* Update Rx tail register */
@@ -2350,11 +2473,13 @@ ice_recv_pkts(void *rx_queue,
 	struct ice_rx_entry *sw_ring = rxq->sw_ring;
 	struct ice_rx_entry *rxe;
 	struct rte_mbuf *nmb; /* new allocated mbuf */
+	struct rte_mbuf *nmb_pay; /* new allocated payload mbuf */
 	struct rte_mbuf *rxm; /* pointer to store old mbuf in SW ring */
 	uint16_t rx_id = rxq->rx_tail;
 	uint16_t nb_rx = 0;
 	uint16_t nb_hold = 0;
 	uint16_t rx_packet_len;
+	uint16_t rx_header_len;
 	uint16_t rx_stat_err0;
 	uint64_t dma_addr;
 	uint64_t pkt_flags;
@@ -2382,12 +2507,16 @@ ice_recv_pkts(void *rx_queue,
 		if (!(rx_stat_err0 & (1 << ICE_RX_FLEX_DESC_STATUS0_DD_S)))
 			break;
 
-		/* allocate mbuf */
+		if (rx_stat_err0 & (1 << ICE_RX_FLEX_DESC_STATUS0_HBO_S))
+			break;
+
+		/* allocate header mbuf */
 		nmb = rte_mbuf_raw_alloc(rxq->mp);
 		if (unlikely(!nmb)) {
 			rxq->vsi->adapter->pf.dev_data->rx_mbuf_alloc_failed++;
 			break;
 		}
+
 		rxd = *rxdp; /* copy descriptor in ring to temp variable*/
 
 		nb_hold++;
@@ -2400,24 +2529,55 @@ ice_recv_pkts(void *rx_queue,
 		dma_addr =
 			rte_cpu_to_le_64(rte_mbuf_data_iova_default(nmb));
 
-		/**
-		 * fill the read format of descriptor with physic address in
-		 * new allocated mbuf: nmb
-		 */
-		rxdp->read.hdr_addr = 0;
-		rxdp->read.pkt_addr = dma_addr;
+		if (rxq->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) {
+			/* allocate payload mbuf */
+			nmb_pay = rte_mbuf_raw_alloc(rxq->rxseg[1].mp);
+			if (unlikely(!nmb_pay)) {
+				rxq->vsi->adapter->pf.dev_data->rx_mbuf_alloc_failed++;
+				break;
+			}
+
+			nmb->next = nmb_pay;
+			nmb_pay->next = NULL;
 
-		/* calculate rx_packet_len of the received pkt */
-		rx_packet_len = (rte_le_to_cpu_16(rxd.wb.pkt_len) &
-				 ICE_RX_FLX_DESC_PKT_LEN_M) - rxq->crc_len;
+			/**
+			 * fill the read format of descriptor with physic address in
+			 * new allocated mbuf: nmb
+			 */
+			rxdp->read.hdr_addr = dma_addr;
+			rxdp->read.pkt_addr = rte_cpu_to_le_64(rte_mbuf_data_iova_default(nmb_pay));
+		} else {
+			/**
+			 * fill the read format of descriptor with physic address in
+			 * new allocated mbuf: nmb
+			 */
+			rxdp->read.hdr_addr = 0;
+			rxdp->read.pkt_addr = dma_addr;
+		}
 
 		/* fill old mbuf with received descriptor: rxd */
 		rxm->data_off = RTE_PKTMBUF_HEADROOM;
 		rte_prefetch0(RTE_PTR_ADD(rxm->buf_addr, RTE_PKTMBUF_HEADROOM));
-		rxm->nb_segs = 1;
-		rxm->next = NULL;
-		rxm->pkt_len = rx_packet_len;
-		rxm->data_len = rx_packet_len;
+		if (rxq->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) {
+			rxm->nb_segs = (uint16_t)(rxm->nb_segs + rxm->next->nb_segs);
+			rxm->next->next = NULL;
+			/* calculate rx_packet_len of the received pkt */
+			rx_header_len = rte_le_to_cpu_16(rxd.wb.hdr_len_sph_flex_flags1) &
+					ICE_RX_FLEX_DESC_HEADER_LEN_M;
+			rx_packet_len = (rte_le_to_cpu_16(rxd.wb.pkt_len) &
+					ICE_RX_FLX_DESC_PKT_LEN_M) - rxq->crc_len;
+			rxm->data_len = rx_header_len;
+			rxm->pkt_len = rx_header_len + rx_packet_len;
+			rxm->next->data_len = rx_packet_len;
+		} else {
+			rxm->nb_segs = 1;
+			rxm->next = NULL;
+			/* calculate rx_packet_len of the received pkt */
+			rx_packet_len = (rte_le_to_cpu_16(rxd.wb.pkt_len) &
+					ICE_RX_FLX_DESC_PKT_LEN_M) - rxq->crc_len;
+			rxm->data_len = rx_packet_len;
+			rxm->pkt_len = rx_packet_len;
+		}
 		rxm->port = rxq->port_id;
 		rxm->packet_type = ptype_tbl[ICE_RX_FLEX_DESC_PTYPE_M &
 			rte_le_to_cpu_16(rxd.wb.ptype_flex_flags0)];
diff --git a/drivers/net/ice/ice_rxtx.h b/drivers/net/ice/ice_rxtx.h
index bb18a01951..611dbc8503 100644
--- a/drivers/net/ice/ice_rxtx.h
+++ b/drivers/net/ice/ice_rxtx.h
@@ -16,6 +16,9 @@
 #define ICE_RX_MAX_BURST 32
 #define ICE_TX_MAX_BURST 32
 
+/* Maximal number of segments to split. */
+#define ICE_RX_MAX_NSEG 2
+
 #define ICE_CHK_Q_ENA_COUNT        100
 #define ICE_CHK_Q_ENA_INTERVAL_US  100
 
@@ -43,6 +46,11 @@
 extern uint64_t ice_timestamp_dynflag;
 extern int ice_timestamp_dynfield_offset;
 
+/* Max header size can be 2K - 64 bytes */
+#define ICE_RX_HDR_BUF_SIZE    (2048 - 64)
+
+#define ICE_HEADER_SPLIT_ENA   BIT(0)
+
 typedef void (*ice_rx_release_mbufs_t)(struct ice_rx_queue *rxq);
 typedef void (*ice_tx_release_mbufs_t)(struct ice_tx_queue *txq);
 typedef void (*ice_rxd_to_pkt_fields_t)(struct ice_rx_queue *rxq,
@@ -53,6 +61,12 @@ struct ice_rx_entry {
 	struct rte_mbuf *mbuf;
 };
 
+enum ice_rx_dtype {
+	ICE_RX_DTYPE_NO_SPLIT       = 0,
+	ICE_RX_DTYPE_HEADER_SPLIT   = 1,
+	ICE_RX_DTYPE_SPLIT_ALWAYS   = 2,
+};
+
 struct ice_rx_queue {
 	struct rte_mempool *mp; /* mbuf pool to populate RX ring */
 	volatile union ice_rx_flex_desc *rx_ring;/* RX ring virtual address */
@@ -95,6 +109,8 @@ struct ice_rx_queue {
 	uint32_t time_high;
 	uint32_t hw_register_set;
 	const struct rte_memzone *mz;
+	struct rte_eth_rxseg_split rxseg[ICE_RX_MAX_NSEG];
+	uint32_t rxseg_nb;
 };
 
 struct ice_tx_entry {
diff --git a/drivers/net/ice/ice_rxtx_vec_common.h b/drivers/net/ice/ice_rxtx_vec_common.h
index 2dd2d83650..eec6ea2134 100644
--- a/drivers/net/ice/ice_rxtx_vec_common.h
+++ b/drivers/net/ice/ice_rxtx_vec_common.h
@@ -291,6 +291,9 @@ ice_rx_vec_queue_default(struct ice_rx_queue *rxq)
 	if (rxq->offloads & RTE_ETH_RX_OFFLOAD_TIMESTAMP)
 		return -1;
 
+	if (rxq->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT)
+		return -1;
+
 	if (rxq->offloads & ICE_RX_VECTOR_OFFLOAD)
 		return ICE_VECTOR_OFFLOAD_PATH;
 
diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
index be161ff999..fbd55cdd9d 100644
--- a/lib/ethdev/rte_ethdev.c
+++ b/lib/ethdev/rte_ethdev.c
@@ -1707,7 +1707,7 @@ rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
 			}
 		} else {
 			/* Split after specified protocol header. */
-			if (proto_hdr & RTE_BUFFER_SPLIT_PROTO_HDR_MASK) {
+			if (!(proto_hdr & RTE_BUFFER_SPLIT_PROTO_HDR_MASK)) {
 				RTE_ETHDEV_LOG(ERR,
 					"Protocol header %u not supported)\n",
 					proto_hdr);
-- 
2.25.1


^ permalink raw reply	[flat|nested] 88+ messages in thread

* [PATCH v7 3/3] app/testpmd: add rxhdrs commands and parameters
  2022-06-01 13:06 ` [PATCH v7 0/3] ethdev: introduce protocol type based header split wenxuanx.wu
  2022-06-01 13:06   ` [PATCH v7 1/3] ethdev: introduce protocol header based buffer split wenxuanx.wu
  2022-06-01 13:06   ` [PATCH v7 2/3] net/ice: support buffer split in Rx path wenxuanx.wu
@ 2022-06-01 13:06   ` wenxuanx.wu
  2 siblings, 0 replies; 88+ messages in thread
From: wenxuanx.wu @ 2022-06-01 13:06 UTC (permalink / raw)
  To: dev, qi.z.zhang, jerinjacobk, xiaoyun.li, aman.deep.singh, yuying.zhang
  Cc: stephen, mb, viacheslavo, ping.yu, xuan.ding, yuanx.wang, wenxuanx.wu

From: Wenxuan Wu <wenxuanx.wu@intel.com>

Add command line parameter:
--rxhdrs=mac[,ipv4]*

Set the protocol_hdr of segments to scatter packets on receiving if
the split feature is engaged. It affects only the queues configured
with the BUFFER_SPLIT offload flag.

Add interactive mode command:
testpmd>set rxhdrs mac,ipv4,l3,tcp,udp,sctp
(protocol sequence and nb_segs should be valid)

The protocol split feature is off by default. To enable protocol split,
you need to:
1. Start testpmd with two mempools, e.g. --mbuf-size=2048,2048
2. Configure the Rx queue with the buffer split Rx offload on.
3. Set the protocol type of buffer split, e.g. set rxhdrs mac,ipv4
   (Supported protocols: mac|ipv4|ipv6|l3|tcp|udp|sctp|l4|inner_mac|
    inner_ipv4|inner_ipv6|inner_l3|inner_tcp|inner_udp|inner_sctp|
    inner_l4)
A full example session is sketched below.
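
For illustration (hedged; the EAL core/memory arguments and the raw
offload bit value are assumptions, not part of this patch), a session
enabling protocol based split could look like:

    # two mempools: one for headers, one for payload;
    # 0x100000 assumed to be RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT
    ./dpdk-testpmd -l 0-3 -n 4 -- -i --mbuf-size=2048,2048 \
        --rx-offloads=0x100000 --rxhdrs=mac,ipv4

    testpmd> show config rxhdrs        # verify the configured protocols
    testpmd> set rxhdrs mac,ipv4       # or adjust them at runtime
    testpmd> start
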
Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
Reviewed-by: Qi Zhang <qi.z.zhang@intel.com>
---
 app/test-pmd/cmdline.c    | 127 +++++++++++++++++++++++++++++++++++++-
 app/test-pmd/config.c     |  81 ++++++++++++++++++++++++
 app/test-pmd/parameters.c |  15 ++++-
 app/test-pmd/testpmd.c    |   6 +-
 app/test-pmd/testpmd.h    |   6 ++
 5 files changed, 228 insertions(+), 7 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 6ffea8e21a..52e98e1c06 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -183,7 +183,7 @@ static void cmd_help_long_parsed(void *parsed_result,
 			"show (rxq|txq) info (port_id) (queue_id)\n"
 			"    Display information for configured RX/TX queue.\n\n"
 
-			"show config (rxtx|cores|fwd|rxoffs|rxpkts|txpkts)\n"
+			"show config (rxtx|cores|fwd|rxoffs|rxpkts|rxhdrs|txpkts)\n"
 			"    Display the given configuration.\n\n"
 
 			"read rxd (port_id) (queue_id) (rxd_id)\n"
@@ -316,6 +316,15 @@ static void cmd_help_long_parsed(void *parsed_result,
 			" Affects only the queues configured with split"
 			" offloads.\n\n"
 
+			"set rxhdrs (mac[,ipv4])*\n"
+			"	Set the protocol hdr of each segment to scatter"
+			" packets on receiving if split feature is engaged."
+			" Affects only the queues configured with split"
+			" offloads.\n\n"
+			"	Supported proto header: mac|ipv4|ipv6|l3|tcp|udp|sctp|l4|"
+			"inner_mac|inner_ipv4|inner_ipv6|inner_l3|inner_tcp|"
+			"inner_udp|inner_sctp\n"
+
 			"set txpkts (x[,y]*)\n"
 			"    Set the length of each segment of TXONLY"
 			" and optionally CSUM packets.\n\n"
@@ -3617,6 +3626,72 @@ cmdline_parse_inst_t cmd_stop = {
 	},
 };
 
+static unsigned int
+get_ptype(char *value)
+{
+	uint32_t protocol;
+	if (!strcmp(value, "mac"))
+		protocol = RTE_PTYPE_L2_ETHER;
+	else if (!strcmp(value, "ipv4"))
+		protocol = RTE_PTYPE_L3_IPV4;
+	else if (!strcmp(value, "ipv6"))
+		protocol = RTE_PTYPE_L3_IPV6;
+	else if (!strcmp(value, "l3"))
+		protocol = RTE_PTYPE_L3_IPV4|RTE_PTYPE_L3_IPV6;
+	else if (!strcmp(value, "tcp"))
+		protocol = RTE_PTYPE_L4_TCP;
+	else if (!strcmp(value, "udp"))
+		protocol = RTE_PTYPE_L4_UDP;
+	else if (!strcmp(value, "sctp"))
+		protocol = RTE_PTYPE_L4_SCTP;
+	else if (!strcmp(value, "l4"))
+		protocol = RTE_PTYPE_L4_TCP|RTE_PTYPE_L4_UDP|RTE_PTYPE_L4_SCTP;
+	else if (!strcmp(value, "inner_mac"))
+		protocol = RTE_PTYPE_INNER_L2_ETHER;
+	else if (!strcmp(value, "inner_ipv4"))
+		protocol = RTE_PTYPE_INNER_L3_IPV4;
+	else if (!strcmp(value, "inner_ipv6"))
+		protocol = RTE_PTYPE_INNER_L3_IPV6;
+	else if (!strcmp(value, "inner_l3"))
+		protocol = RTE_PTYPE_INNER_L3_IPV4|RTE_PTYPE_INNER_L3_IPV6;
+	else if (!strcmp(value, "inner_tcp"))
+		protocol = RTE_PTYPE_INNER_L4_TCP;
+	else if (!strcmp(value, "inner_udp"))
+		protocol = RTE_PTYPE_INNER_L4_UDP;
+	else if (!strcmp(value, "inner_sctp"))
+		protocol = RTE_PTYPE_INNER_L4_SCTP;
+	else {
+		fprintf(stderr, "Unknown protocol name: %s\n", value);
+		return 0;
+	}
+	return protocol;
+}
+/* *** SET RXHDRSLIST *** */
+
+unsigned int
+parse_hdrs_list(const char *str, const char *item_name, unsigned int max_items,
+				unsigned int *parsed_items, int check_hdrs_sequence)
+{
+	unsigned int nb_item;
+	char *cur;
+	char *tmp;
+	nb_item = 0;
+	char *str2 = strdup(str);
+	cur = strtok_r(str2, ",", &tmp);
+	while (cur != NULL) {
+		parsed_items[nb_item] = get_ptype(cur);
+		cur = strtok_r(NULL, ",", &tmp);
+		nb_item++;
+	}
+	if (nb_item > max_items)
+		fprintf(stderr, "Number of %s = %u > %u (maximum items)\n",
+			item_name, nb_item + 1, max_items);
+	set_rx_pkt_hdrs(parsed_items, nb_item);
+	free(str2);
+	if (!check_hdrs_sequence)
+		return nb_item;
+	return nb_item;
+}
 /* *** SET CORELIST and PORTLIST CONFIGURATION *** */
 
 unsigned int
@@ -3986,6 +4061,49 @@ cmdline_parse_inst_t cmd_set_rxpkts = {
 	},
 };
 
+/* *** SET SEGMENT HEADERS OF RX PACKETS SPLIT *** */
+struct cmd_set_rxhdrs_result {
+	cmdline_fixed_string_t cmd_keyword;
+	cmdline_fixed_string_t rxhdrs;
+	cmdline_fixed_string_t seg_hdrs;
+};
+
+static void
+cmd_set_rxhdrs_parsed(void *parsed_result,
+		      __rte_unused struct cmdline *cl,
+		      __rte_unused void *data)
+{
+	struct cmd_set_rxhdrs_result *res;
+	unsigned int seg_hdrs[MAX_SEGS_BUFFER_SPLIT];
+	unsigned int nb_segs;
+
+	res = parsed_result;
+	nb_segs = parse_hdrs_list(res->seg_hdrs, "segment hdrs",
+				  MAX_SEGS_BUFFER_SPLIT, seg_hdrs, 0);
+	if (nb_segs >= 1)
+		set_rx_pkt_hdrs(seg_hdrs, nb_segs);
+	cmd_reconfig_device_queue(RTE_PORT_ALL, 0, 1);
+}
+cmdline_parse_token_string_t cmd_set_rxhdrs_keyword =
+	TOKEN_STRING_INITIALIZER(struct cmd_set_rxhdrs_result,
+				 cmd_keyword, "set");
+cmdline_parse_token_string_t cmd_set_rxhdrs_name =
+	TOKEN_STRING_INITIALIZER(struct cmd_set_rxhdrs_result,
+				 rxhdrs, "rxhdrs");
+cmdline_parse_token_string_t cmd_set_rxhdrs_seg_hdrs =
+	TOKEN_STRING_INITIALIZER(struct cmd_set_rxhdrs_result,
+				 seg_hdrs, NULL);
+cmdline_parse_inst_t cmd_set_rxhdrs = {
+	.f = cmd_set_rxhdrs_parsed,
+	.data = NULL,
+	.help_str = "set rxhdrs <mac[,ipv4]*>",
+	.tokens = {
+		(void *)&cmd_set_rxhdrs_keyword,
+		(void *)&cmd_set_rxhdrs_name,
+		(void *)&cmd_set_rxhdrs_seg_hdrs,
+		NULL,
+	},
+};
 /* *** SET SEGMENT LENGTHS OF TXONLY PACKETS *** */
 
 struct cmd_set_txpkts_result {
@@ -8058,6 +8176,8 @@ static void cmd_showcfg_parsed(void *parsed_result,
 		show_rx_pkt_offsets();
 	else if (!strcmp(res->what, "rxpkts"))
 		show_rx_pkt_segments();
+	else if (!strcmp(res->what, "rxhdrs"))
+		show_rx_pkt_hdrs();
 	else if (!strcmp(res->what, "txpkts"))
 		show_tx_pkt_segments();
 	else if (!strcmp(res->what, "txtimes"))
@@ -8070,12 +8190,12 @@ cmdline_parse_token_string_t cmd_showcfg_port =
 	TOKEN_STRING_INITIALIZER(struct cmd_showcfg_result, cfg, "config");
 cmdline_parse_token_string_t cmd_showcfg_what =
 	TOKEN_STRING_INITIALIZER(struct cmd_showcfg_result, what,
-				 "rxtx#cores#fwd#rxoffs#rxpkts#txpkts#txtimes");
+				 "rxtx#cores#fwd#rxoffs#rxpkts#rxhdrs#txpkts#txtimes");
 
 cmdline_parse_inst_t cmd_showcfg = {
 	.f = cmd_showcfg_parsed,
 	.data = NULL,
-	.help_str = "show config rxtx|cores|fwd|rxoffs|rxpkts|txpkts|txtimes",
+	.help_str = "show config rxtx|cores|fwd|rxoffs|rxpkts|rxhdrs|txpkts|txtimes",
 	.tokens = {
 		(void *)&cmd_showcfg_show,
 		(void *)&cmd_showcfg_port,
@@ -17833,6 +17953,7 @@ cmdline_parse_ctx_t main_ctx[] = {
 	(cmdline_parse_inst_t *)&cmd_set_log,
 	(cmdline_parse_inst_t *)&cmd_set_rxoffs,
 	(cmdline_parse_inst_t *)&cmd_set_rxpkts,
+	(cmdline_parse_inst_t *)&cmd_set_rxhdrs,
 	(cmdline_parse_inst_t *)&cmd_set_txpkts,
 	(cmdline_parse_inst_t *)&cmd_set_txsplit,
 	(cmdline_parse_inst_t *)&cmd_set_txtimes,
diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index cc8e7aa138..742473456a 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -4757,6 +4757,87 @@ show_rx_pkt_segments(void)
 		printf("%hu\n", rx_pkt_seg_lengths[i]);
 	}
 }
+static const char *get_ptype_str(uint32_t ptype)
+{
+	switch (ptype) {
+	case RTE_PTYPE_L2_ETHER:
+		return "mac";
+	case RTE_PTYPE_L3_IPV4:
+		return "ipv4";
+	case RTE_PTYPE_L3_IPV6:
+		return "ipv6";
+	case RTE_PTYPE_L3_IPV6|RTE_PTYPE_L3_IPV4:
+		return "l3";
+	case RTE_PTYPE_L4_TCP:
+		return "tcp";
+	case RTE_PTYPE_L4_UDP:
+		return "udp";
+	case RTE_PTYPE_L4_SCTP:
+		return "sctp";
+	case RTE_PTYPE_L4_TCP|RTE_PTYPE_L4_UDP|RTE_PTYPE_L4_SCTP:
+		return "l4";
+	case RTE_PTYPE_INNER_L2_ETHER:
+		return "inner_mac";
+	case RTE_PTYPE_INNER_L3_IPV4:
+		return "inner_ipv4";
+	case RTE_PTYPE_INNER_L3_IPV6:
+		return "inner_ipv6";
+	case RTE_PTYPE_INNER_L4_TCP:
+		return "inner_tcp";
+	case RTE_PTYPE_INNER_L4_UDP:
+		return "inner_udp";
+	case RTE_PTYPE_INNER_L4_SCTP:
+		return "inner_sctp";
+	default:
+		return "unknown";
+	}
+}
+void
+show_rx_pkt_hdrs(void)
+{
+	uint32_t i, n;
+
+	n = rx_pkt_nb_segs;
+	printf("Number of segments: %u\n", n);
+	if (n) {
+		printf("Packet segs: ");
+		for (i = 0; i != n - 1; i++)
+			printf("%s, ", get_ptype_str(rx_pkt_hdr_protos[i]));
+		printf("%s\n", rx_pkt_hdr_protos[i] == 0 ? "payload" :
+						get_ptype_str(rx_pkt_hdr_protos[i]));
+	}
+}
+void
+set_rx_pkt_hdrs(unsigned int *seg_hdrs, unsigned int nb_segs)
+{
+	unsigned int i;
+
+	if (nb_segs >= MAX_SEGS_BUFFER_SPLIT) {
+		printf("nb segments per RX packets=%u >= "
+		       "MAX_SEGS_BUFFER_SPLIT - ignored\n", nb_segs);
+		return;
+	}
+
+	/*
+	 * No extra check here, the segment length will be checked by PMD
+	 * in the extended queue setup.
+	 */
+	for (i = 0; i < nb_segs; i++) {
+		if (!(seg_hdrs[i] & RTE_BUFFER_SPLIT_PROTO_HDR_MASK)) {
+			printf("ptype [%u]=%u > is not supported - give up\n",
+			       i, seg_hdrs[i]);
+			return;
+		}
+	}
+
+	for (i = 0; i < nb_segs; i++)
+		rx_pkt_hdr_protos[i] = (uint32_t) seg_hdrs[i];
+	/*
+	 * We calculate the number of hdrs, but payload is not included,
+	 * so rx_pkt_nb_segs would increase 1.
+	 */
+	rx_pkt_nb_segs = (uint8_t) nb_segs + 1;
+}
 
 void
 set_rx_pkt_segments(unsigned int *seg_lengths, unsigned int nb_segs)
diff --git a/app/test-pmd/parameters.c b/app/test-pmd/parameters.c
index daf6a31b2b..f86d626276 100644
--- a/app/test-pmd/parameters.c
+++ b/app/test-pmd/parameters.c
@@ -161,6 +161,7 @@ usage(char* progname)
 	       " Used mainly with PCAP drivers.\n");
 	printf("  --rxoffs=X[,Y]*: set RX segment offsets for split.\n");
 	printf("  --rxpkts=X[,Y]*: set RX segment sizes to split.\n");
+	printf("  --rxhdrs=mac[,ipv4]*: set RX segment protocol to split.\n");
 	printf("  --txpkts=X[,Y]*: set TX segment sizes"
 		" or total packet length.\n");
 	printf("  --txonly-multi-flow: generate multiple flows in txonly mode\n");
@@ -673,6 +674,7 @@ launch_args_parse(int argc, char** argv)
 		{ "flow-isolate-all",	        0, 0, 0 },
 		{ "rxoffs",			1, 0, 0 },
 		{ "rxpkts",			1, 0, 0 },
+		{ "rxhdrs",			1, 0, 0 },
 		{ "txpkts",			1, 0, 0 },
 		{ "txonly-multi-flow",		0, 0, 0 },
 		{ "rxq-share",			2, 0, 0 },
@@ -1327,7 +1329,6 @@ launch_args_parse(int argc, char** argv)
 			if (!strcmp(lgopts[opt_idx].name, "rxpkts")) {
 				unsigned int seg_len[MAX_SEGS_BUFFER_SPLIT];
 				unsigned int nb_segs;
-
 				nb_segs = parse_item_list
 						(optarg, "rxpkt segments",
 						 MAX_SEGS_BUFFER_SPLIT,
@@ -1337,6 +1338,18 @@ launch_args_parse(int argc, char** argv)
 				else
 					rte_exit(EXIT_FAILURE, "bad rxpkts\n");
 			}
+			if (!strcmp(lgopts[opt_idx].name, "rxhdrs")) {
+				unsigned int seg_hdrs[MAX_SEGS_BUFFER_SPLIT];
+				unsigned int nb_segs;
+				nb_segs = parse_hdrs_list
+						(optarg, "rxpkt segments",
+						MAX_SEGS_BUFFER_SPLIT,
+						seg_hdrs, 0);
+				if (nb_segs >= 1)
+					set_rx_pkt_hdrs(seg_hdrs, nb_segs);
+				else
+					rte_exit(EXIT_FAILURE, "bad rxpkts\n");
+			}
 			if (!strcmp(lgopts[opt_idx].name, "txpkts")) {
 				unsigned seg_lengths[RTE_MAX_SEGS_PER_PKT];
 				unsigned int nb_segs;
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index fe2ce19f99..77379b7aa9 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -240,6 +240,7 @@ uint16_t rx_pkt_seg_lengths[MAX_SEGS_BUFFER_SPLIT];
 uint8_t  rx_pkt_nb_segs; /**< Number of segments to split */
 uint16_t rx_pkt_seg_offsets[MAX_SEGS_BUFFER_SPLIT];
 uint8_t  rx_pkt_nb_offs; /**< Number of specified offsets */
+uint32_t rx_pkt_hdr_protos[MAX_SEGS_BUFFER_SPLIT];
 
 /*
  * Configuration of packet segments used by the "txonly" processing engine.
@@ -2586,12 +2587,11 @@ rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
 		mp_n = (i > mbuf_data_size_n) ? mbuf_data_size_n - 1 : i;
 		mpx = mbuf_pool_find(socket_id, mp_n);
 		/* Handle zero as mbuf data buffer size. */
-		rx_seg->length = rx_pkt_seg_lengths[i] ?
-				   rx_pkt_seg_lengths[i] :
-				   mbuf_data_size[mp_n];
+		rx_seg->length = rx_pkt_seg_lengths[i];
 		rx_seg->offset = i < rx_pkt_nb_offs ?
 				   rx_pkt_seg_offsets[i] : 0;
 		rx_seg->mp = mpx ? mpx : mp;
+		rx_seg->proto_hdr = rx_pkt_hdr_protos[i];
 	}
 	rx_conf->rx_nseg = rx_pkt_nb_segs;
 	rx_conf->rx_seg = rx_useg;
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index 31f766c965..e791b9becd 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -534,6 +534,7 @@ extern uint32_t max_rx_pkt_len;
  * Configuration of packet segments used to scatter received packets
  * if some of split features is configured.
  */
+extern uint32_t rx_pkt_hdr_protos[MAX_SEGS_BUFFER_SPLIT];
 extern uint16_t rx_pkt_seg_lengths[MAX_SEGS_BUFFER_SPLIT];
 extern uint8_t  rx_pkt_nb_segs; /**< Number of segments to split */
 extern uint16_t rx_pkt_seg_offsets[MAX_SEGS_BUFFER_SPLIT];
@@ -864,6 +865,9 @@ inc_tx_burst_stats(struct fwd_stream *fs, uint16_t nb_tx)
 unsigned int parse_item_list(const char *str, const char *item_name,
 			unsigned int max_items,
 			unsigned int *parsed_items, int check_unique_values);
+unsigned int parse_hdrs_list(const char *str, const char *item_name,
+			unsigned int max_item,
+			unsigned int *parsed_items, int check_unique_values);
 void launch_args_parse(int argc, char** argv);
 void cmdline_read_from_file(const char *filename);
 void prompt(void);
@@ -1018,6 +1022,8 @@ void set_record_core_cycles(uint8_t on_off);
 void set_record_burst_stats(uint8_t on_off);
 void set_verbose_level(uint16_t vb_level);
 void set_rx_pkt_segments(unsigned int *seg_lengths, unsigned int nb_segs);
+void set_rx_pkt_hdrs(unsigned int *seg_protos, unsigned int nb_segs);
+void show_rx_pkt_hdrs(void);
 void show_rx_pkt_segments(void);
 void set_rx_pkt_offsets(unsigned int *seg_offsets, unsigned int nb_offs);
 void show_rx_pkt_offsets(void);
-- 
2.25.1


^ permalink raw reply	[flat|nested] 88+ messages in thread

* [PATCH v7 0/3] ethdev: introduce protocol type based header split
  2022-03-03  6:01 [RFC] ethdev: introduce protocol type based header split xuan.ding
                   ` (7 preceding siblings ...)
  2022-06-01 13:06 ` [PATCH v7 0/3] ethdev: introduce protocol type based header split wenxuanx.wu
@ 2022-06-01 13:22 ` wenxuanx.wu
  2022-06-01 13:22   ` [PATCH v7 1/3] ethdev: introduce protocol header based buffer split wenxuanx.wu
                     ` (2 more replies)
  2022-06-01 13:50 ` [PATCH v8 0/3] ethdev: introduce protocol type based header split wenxuanx.wu
  2022-06-13 10:25 ` [PATCH v9 0/4] add an api to support proto based buffer split wenxuanx.wu
  10 siblings, 3 replies; 88+ messages in thread
From: wenxuanx.wu @ 2022-06-01 13:22 UTC (permalink / raw)
  To: thomas, andrew.rybchenko, xiaoyun.li, ferruh.yigit,
	aman.deep.singh, dev, yuying.zhang, qi.z.zhang, jerinjacobk
  Cc: stephen, Wenxuan Wu

From: Wenxuan Wu <wenxuanx.wu@intel.com>

Buffer split consists of splitting a received packet into two separate
regions based on the packet content. It is useful in some scenarios,
such as GPU acceleration. The splitting will help to enable true zero
copy and hence improve the performance significantly.

This patchset extends the current buffer split to support protocol
header based split. When the Rx queue is configured with the buffer
split feature, received packets are split into two different mempools.

v6->v7:
*fix supported header protocol check.
*add rxhdrs commands and parameters.
v5->v6:
*Change doc and design of struct rte_eth_rxseg_split, support
multi-segment protocol_hdr configuration.
v4->v5:
* Use protocol and mbuf_offset based buffer split instead of header split.
* Use RTE_PTYPE* instead of enum rte_eth_rx_header_split_protocol_type.
* Improve the description of rte_eth_rxseg_split.proto.

v3->v4:
* Use RTE_ETH_RX_HEADER_SPLIT_NONE instead of 0.

v2->v3:
* Fix a PMD bug.
* Add rx queue header split check.
* Revise the log and doc.

v1->v2:
* Add support for all header split protocol types.

Wenxuan Wu (3):
  ethdev: introduce protocol header based buffer split
  net/ice: support buffer split in Rx path
  app/testpmd: add rxhdrs commands and parameters

 app/test-pmd/cmdline.c                | 127 ++++++++++++++-
 app/test-pmd/config.c                 |  81 ++++++++++
 app/test-pmd/parameters.c             |  15 +-
 app/test-pmd/testpmd.c                |   6 +-
 app/test-pmd/testpmd.h                |   6 +
 drivers/net/ice/ice_ethdev.c          |  10 +-
 drivers/net/ice/ice_rxtx.c            | 220 ++++++++++++++++++++++----
 drivers/net/ice/ice_rxtx.h            |  16 ++
 drivers/net/ice/ice_rxtx_vec_common.h |   3 +
 lib/ethdev/rte_ethdev.c               |  40 ++++-
 lib/ethdev/rte_ethdev.h               |  28 +++-
 11 files changed, 505 insertions(+), 47 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 88+ messages in thread

* [PATCH v7 1/3] ethdev: introduce protocol header based buffer split
  2022-06-01 13:22 ` [PATCH v7 0/3] ethdev: introduce protocol type based header split wenxuanx.wu
@ 2022-06-01 13:22   ` wenxuanx.wu
  2022-06-01 13:22   ` [PATCH v7 2/3] net/ice: support buffer split in Rx path wenxuanx.wu
  2022-06-01 13:22   ` [PATCH v7 3/3] app/testpmd: add rxhdrs commands and parameters wenxuanx.wu
  2 siblings, 0 replies; 88+ messages in thread
From: wenxuanx.wu @ 2022-06-01 13:22 UTC (permalink / raw)
  To: thomas, andrew.rybchenko, xiaoyun.li, ferruh.yigit,
	aman.deep.singh, dev, yuying.zhang, qi.z.zhang, jerinjacobk
  Cc: stephen, Wenxuan Wu, Xuan Ding, Yuan Wang, Ray Kinsella

From: Wenxuan Wu <wenxuanx.wu@intel.com>

Currently, Rx buffer split supports length based split. With Rx queue
offload RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT enabled and Rx packet segment
configured, PMD will be able to split the received packets into
multiple segments.

However, length based buffer split is not suitable for NICs that split
based on protocol headers. Given an arbitrarily variable length in an
Rx packet segment, it is almost impossible to pass a fixed protocol
header to the PMD. Besides, tunneling makes the composition of a packet
vary, which makes the situation even worse.

This patch extends the current buffer split to support protocol header
based buffer split. A new proto_hdr field is introduced in the reserved
field of the rte_eth_rxseg_split structure to specify the protocol
header. The proto_hdr field defines the split position of the packet;
splitting always happens after the protocol header defined in the Rx
packet segment. When the Rx queue offload
RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT is enabled and the corresponding
protocol header is configured, the PMD will split the ingress packets
into multiple segments.

    struct rte_eth_rxseg_split {
        struct rte_mempool *mp; /* memory pool to allocate segment from */
        uint16_t length;        /* segment maximal data length,
                                   configures "split point" */
        uint16_t offset;        /* data offset from beginning
                                   of mbuf data buffer */
        uint32_t proto_hdr;     /* inner/outer L2/L3/L4 protocol header,
                                   configures "split point" */
    };

Both inner and outer L2/L3/L4 level protocol header split can be supported.
The corresponding protocol header capabilities are RTE_PTYPE_L2_ETHER,
RTE_PTYPE_L3_IPV4, RTE_PTYPE_L3_IPV6, RTE_PTYPE_L4_TCP, RTE_PTYPE_L4_UDP,
RTE_PTYPE_L4_SCTP, RTE_PTYPE_INNER_L2_ETHER, RTE_PTYPE_INNER_L3_IPV4,
RTE_PTYPE_INNER_L3_IPV6, RTE_PTYPE_INNER_L4_TCP, RTE_PTYPE_INNER_L4_UDP,
RTE_PTYPE_INNER_L4_SCTP.

For example, let's suppose we configured the Rx queue with the
following segments:
    seg0 - pool0, proto_hdr0=RTE_PTYPE_L3_IPV4, off0=2B
    seg1 - pool1, proto_hdr1=RTE_PTYPE_L4_UDP, off1=128B
    seg2 - pool2, off2=0B

A packet consisting of MAC_IPV4_UDP_PAYLOAD will be split as
follows:
    seg0 - ipv4 header @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from pool0
    seg1 - udp header @ 128 in mbuf from pool1
    seg2 - payload @ 0 in mbuf from pool2

Now buffer split can be configured in two modes. For length based
buffer split, the mp, length and offset fields in the Rx packet segment
should be configured, while the proto_hdr field should not be.
For protocol header based buffer split, the mp, offset and proto_hdr
fields in the Rx packet segment should be configured, while the length
field should not be.

The split limitations imposed by the underlying PMD are reported in the
rte_eth_dev_info->rx_seg_capa field. The memory attributes of the split
parts may also differ, e.g. DPDK memory for one part and external
memory for the other.

Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
Reviewed-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
---
 lib/ethdev/rte_ethdev.c | 40 +++++++++++++++++++++++++++++++++-------
 lib/ethdev/rte_ethdev.h | 28 +++++++++++++++++++++++++++-
 2 files changed, 60 insertions(+), 8 deletions(-)

diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
index 29a3d80466..be161ff999 100644
--- a/lib/ethdev/rte_ethdev.c
+++ b/lib/ethdev/rte_ethdev.c
@@ -1661,6 +1661,7 @@ rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
 		struct rte_mempool *mpl = rx_seg[seg_idx].mp;
 		uint32_t length = rx_seg[seg_idx].length;
 		uint32_t offset = rx_seg[seg_idx].offset;
+		uint32_t proto_hdr = rx_seg[seg_idx].proto_hdr;
 
 		if (mpl == NULL) {
 			RTE_ETHDEV_LOG(ERR, "null mempool pointer\n");
@@ -1694,13 +1695,38 @@ rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
 		}
 		offset += seg_idx != 0 ? 0 : RTE_PKTMBUF_HEADROOM;
 		*mbp_buf_size = rte_pktmbuf_data_room_size(mpl);
-		length = length != 0 ? length : *mbp_buf_size;
-		if (*mbp_buf_size < length + offset) {
-			RTE_ETHDEV_LOG(ERR,
-				       "%s mbuf_data_room_size %u < %u (segment length=%u + segment offset=%u)\n",
-				       mpl->name, *mbp_buf_size,
-				       length + offset, length, offset);
-			return -EINVAL;
+		if (proto_hdr == RTE_PTYPE_UNKNOWN) {
+			/* Split at fixed length. */
+			length = length != 0 ? length : *mbp_buf_size;
+			if (*mbp_buf_size < length + offset) {
+				RTE_ETHDEV_LOG(ERR,
+					"%s mbuf_data_room_size %u < %u (segment length=%u + segment offset=%u)\n",
+					mpl->name, *mbp_buf_size,
+					length + offset, length, offset);
+				return -EINVAL;
+			}
+		} else {
+			/* Split after specified protocol header. */
+			if (proto_hdr & RTE_BUFFER_SPLIT_PROTO_HDR_MASK) {
+				RTE_ETHDEV_LOG(ERR,
+					"Protocol header %u not supported)\n",
+					proto_hdr);
+				return -EINVAL;
+			}
+
+			if (length != 0) {
+				RTE_ETHDEV_LOG(ERR, "segment length should be set to zero in protocol header "
+					       "based buffer split\n");
+				return -EINVAL;
+			}
+
+			if (*mbp_buf_size < offset) {
+				RTE_ETHDEV_LOG(ERR,
+						"%s mbuf_data_room_size %u < %u segment offset)\n",
+						mpl->name, *mbp_buf_size,
+						offset);
+				return -EINVAL;
+			}
 		}
 	}
 	return 0;
diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
index 04cff8ee10..0cd9dd6cc0 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -1187,6 +1187,9 @@ struct rte_eth_txmode {
  *   mbuf) the following data will be pushed to the next segment
  *   up to its own length, and so on.
  *
+ * - The proto_hdrs in the elements define the split position of
+ *   received packets.
+ *
  * - If the length in the segment description element is zero
  *   the actual buffer size will be deduced from the appropriate
  *   memory pool properties.
@@ -1197,14 +1200,37 @@ struct rte_eth_txmode {
  *     - pool from the last valid element
  *     - the buffer size from this pool
  *     - zero offset
+ *
+ * - Length based buffer split:
+ *     - mp, length, offset should be configured.
+ *     - The proto_hdr field should not be configured.
+ *
+ * - Protocol header based buffer split:
+ *     - mp, offset, proto_hdr should be configured.
+ *     - The length field should not be configured.
  */
 struct rte_eth_rxseg_split {
 	struct rte_mempool *mp; /**< Memory pool to allocate segment from. */
 	uint16_t length; /**< Segment data length, configures split point. */
 	uint16_t offset; /**< Data offset from beginning of mbuf data buffer. */
-	uint32_t reserved; /**< Reserved field. */
+	uint32_t proto_hdr; /**< Inner/outer L2/L3/L4 protocol header, configures split point. */
 };
 
+/* Buffer split protocol header capability. */
+#define RTE_BUFFER_SPLIT_PROTO_HDR_MASK ( \
+	RTE_PTYPE_L2_ETHER | \
+	RTE_PTYPE_L3_IPV4 | \
+	RTE_PTYPE_L3_IPV6 | \
+	RTE_PTYPE_L4_TCP | \
+	RTE_PTYPE_L4_UDP | \
+	RTE_PTYPE_L4_SCTP | \
+	RTE_PTYPE_INNER_L2_ETHER | \
+	RTE_PTYPE_INNER_L3_IPV4 | \
+	RTE_PTYPE_INNER_L3_IPV6 | \
+	RTE_PTYPE_INNER_L4_TCP | \
+	RTE_PTYPE_INNER_L4_UDP | \
+	RTE_PTYPE_INNER_L4_SCTP)
+
 /**
  * @warning
  * @b EXPERIMENTAL: this structure may change without prior notice.
-- 
2.25.1


^ permalink raw reply	[flat|nested] 88+ messages in thread

* [PATCH v7 2/3] net/ice: support buffer split in Rx path
  2022-06-01 13:22 ` [PATCH v7 0/3] ethdev: introduce protocol type based header split wenxuanx.wu
  2022-06-01 13:22   ` [PATCH v7 1/3] ethdev: introduce protocol header based buffer split wenxuanx.wu
@ 2022-06-01 13:22   ` wenxuanx.wu
  2022-06-01 13:22   ` [PATCH v7 3/3] app/testpmd: add rxhdrs commands and parameters wenxuanx.wu
  2 siblings, 0 replies; 88+ messages in thread
From: wenxuanx.wu @ 2022-06-01 13:22 UTC (permalink / raw)
  To: thomas, andrew.rybchenko, xiaoyun.li, ferruh.yigit,
	aman.deep.singh, dev, yuying.zhang, qi.z.zhang, jerinjacobk
  Cc: stephen, Wenxuan Wu, Xuan Ding, Yuan Wang

From: Wenxuan Wu <wenxuanx.wu@intel.com>

This patch adds support for protocol based buffer split in the normal
Rx data paths. When the Rx queue is configured with a specific protocol
type, received packets are split into protocol header and payload
parts, and the two parts are put into different mempools.

Currently, protocol based buffer split is not supported in vectorized
paths.

Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
Reviewed-by: Qi Zhang <qi.z.zhang@intel.com>
---
 drivers/net/ice/ice_ethdev.c          |  10 +-
 drivers/net/ice/ice_rxtx.c            | 220 ++++++++++++++++++++++----
 drivers/net/ice/ice_rxtx.h            |  16 ++
 drivers/net/ice/ice_rxtx_vec_common.h |   3 +
 lib/ethdev/rte_ethdev.c               |   2 +-
 5 files changed, 218 insertions(+), 33 deletions(-)

diff --git a/drivers/net/ice/ice_ethdev.c b/drivers/net/ice/ice_ethdev.c
index 73e550f5fb..ce3f49c863 100644
--- a/drivers/net/ice/ice_ethdev.c
+++ b/drivers/net/ice/ice_ethdev.c
@@ -3713,7 +3713,8 @@ ice_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 			RTE_ETH_RX_OFFLOAD_OUTER_IPV4_CKSUM |
 			RTE_ETH_RX_OFFLOAD_VLAN_EXTEND |
 			RTE_ETH_RX_OFFLOAD_RSS_HASH |
-			RTE_ETH_RX_OFFLOAD_TIMESTAMP;
+			RTE_ETH_RX_OFFLOAD_TIMESTAMP |
+			RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT;
 		dev_info->tx_offload_capa |=
 			RTE_ETH_TX_OFFLOAD_QINQ_INSERT |
 			RTE_ETH_TX_OFFLOAD_IPV4_CKSUM |
@@ -3725,7 +3726,7 @@ ice_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 		dev_info->flow_type_rss_offloads |= ICE_RSS_OFFLOAD_ALL;
 	}
 
-	dev_info->rx_queue_offload_capa = 0;
+	dev_info->rx_queue_offload_capa = RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT;
 	dev_info->tx_queue_offload_capa = RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE;
 
 	dev_info->reta_size = pf->hash_lut_size;
@@ -3794,6 +3795,11 @@ ice_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 	dev_info->default_rxportconf.ring_size = ICE_BUF_SIZE_MIN;
 	dev_info->default_txportconf.ring_size = ICE_BUF_SIZE_MIN;
 
+	dev_info->rx_seg_capa.max_nseg = ICE_RX_MAX_NSEG;
+	dev_info->rx_seg_capa.multi_pools = 1;
+	dev_info->rx_seg_capa.offset_allowed = 0;
+	dev_info->rx_seg_capa.offset_align_log2 = 0;
+
 	return 0;
 }
 
diff --git a/drivers/net/ice/ice_rxtx.c b/drivers/net/ice/ice_rxtx.c
index 2dd2637fbb..77ab258f7f 100644
--- a/drivers/net/ice/ice_rxtx.c
+++ b/drivers/net/ice/ice_rxtx.c
@@ -282,7 +282,6 @@ ice_program_hw_rx_queue(struct ice_rx_queue *rxq)
 	/* Set buffer size as the head split is disabled. */
 	buf_size = (uint16_t)(rte_pktmbuf_data_room_size(rxq->mp) -
 			      RTE_PKTMBUF_HEADROOM);
-	rxq->rx_hdr_len = 0;
 	rxq->rx_buf_len = RTE_ALIGN(buf_size, (1 << ICE_RLAN_CTX_DBUF_S));
 	rxq->max_pkt_len =
 		RTE_MIN((uint32_t)ICE_SUPPORT_CHAIN_NUM * rxq->rx_buf_len,
@@ -311,11 +310,53 @@ ice_program_hw_rx_queue(struct ice_rx_queue *rxq)
 
 	memset(&rx_ctx, 0, sizeof(rx_ctx));
 
+	if (rxq->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) {
+		switch (rxq->rxseg[0].proto_hdr) {
+		case RTE_PTYPE_L2_ETHER:
+			rx_ctx.dtype = ICE_RX_DTYPE_HEADER_SPLIT;
+			rx_ctx.hsplit_1 = ICE_RLAN_RX_HSPLIT_1_SPLIT_L2;
+			break;
+		case RTE_PTYPE_INNER_L2_ETHER:
+			rx_ctx.dtype = ICE_RX_DTYPE_HEADER_SPLIT;
+			rx_ctx.hsplit_0 = ICE_RLAN_RX_HSPLIT_0_SPLIT_L2;
+			break;
+		case RTE_PTYPE_L3_IPV4:
+		case RTE_PTYPE_L3_IPV6:
+		case RTE_PTYPE_INNER_L3_IPV4:
+		case RTE_PTYPE_INNER_L3_IPV6:
+		case RTE_PTYPE_L3_IPV4 | RTE_PTYPE_L3_IPV6:
+			rx_ctx.dtype = ICE_RX_DTYPE_HEADER_SPLIT;
+			rx_ctx.hsplit_0 = ICE_RLAN_RX_HSPLIT_0_SPLIT_IP;
+			break;
+		case RTE_PTYPE_L4_TCP:
+		case RTE_PTYPE_L4_UDP:
+		case RTE_PTYPE_INNER_L4_TCP:
+		case RTE_PTYPE_INNER_L4_UDP:
+			rx_ctx.dtype = ICE_RX_DTYPE_HEADER_SPLIT;
+			rx_ctx.hsplit_0 = ICE_RLAN_RX_HSPLIT_0_SPLIT_TCP_UDP;
+			break;
+		case RTE_PTYPE_L4_SCTP:
+		case RTE_PTYPE_INNER_L4_SCTP:
+			rx_ctx.dtype = ICE_RX_DTYPE_HEADER_SPLIT;
+			rx_ctx.hsplit_0 = ICE_RLAN_RX_HSPLIT_0_SPLIT_SCTP;
+			break;
+		case 0:
+			PMD_DRV_LOG(ERR, "Buffer split protocol must be configured");
+			return -EINVAL;
+		default:
+			PMD_DRV_LOG(ERR, "Buffer split protocol is not supported");
+			return -EINVAL;
+		}
+		rxq->rx_hdr_len = ICE_RX_HDR_BUF_SIZE;
+	} else {
+		rxq->rx_hdr_len = 0;
+		rx_ctx.dtype = 0; /* No Protocol Based Buffer Split mode */
+	}
+
 	rx_ctx.base = rxq->rx_ring_dma / ICE_QUEUE_BASE_ADDR_UNIT;
 	rx_ctx.qlen = rxq->nb_rx_desc;
 	rx_ctx.dbuf = rxq->rx_buf_len >> ICE_RLAN_CTX_DBUF_S;
 	rx_ctx.hbuf = rxq->rx_hdr_len >> ICE_RLAN_CTX_HBUF_S;
-	rx_ctx.dtype = 0; /* No Header Split mode */
 #ifndef RTE_LIBRTE_ICE_16BYTE_RX_DESC
 	rx_ctx.dsize = 1; /* 32B descriptors */
 #endif
@@ -401,6 +442,7 @@ ice_alloc_rx_queue_mbufs(struct ice_rx_queue *rxq)
 
 	for (i = 0; i < rxq->nb_rx_desc; i++) {
 		volatile union ice_rx_flex_desc *rxd;
+		rxd = &rxq->rx_ring[i];
 		struct rte_mbuf *mbuf = rte_mbuf_raw_alloc(rxq->mp);
 
 		if (unlikely(!mbuf)) {
@@ -408,8 +450,6 @@ ice_alloc_rx_queue_mbufs(struct ice_rx_queue *rxq)
 			return -ENOMEM;
 		}
 
-		rte_mbuf_refcnt_set(mbuf, 1);
-		mbuf->next = NULL;
 		mbuf->data_off = RTE_PKTMBUF_HEADROOM;
 		mbuf->nb_segs = 1;
 		mbuf->port = rxq->port_id;
@@ -417,9 +457,33 @@ ice_alloc_rx_queue_mbufs(struct ice_rx_queue *rxq)
 		dma_addr =
 			rte_cpu_to_le_64(rte_mbuf_data_iova_default(mbuf));
 
-		rxd = &rxq->rx_ring[i];
-		rxd->read.pkt_addr = dma_addr;
-		rxd->read.hdr_addr = 0;
+		if (rxq->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) {
+			struct rte_mbuf *mbuf_pay;
+			mbuf_pay = rte_mbuf_raw_alloc(rxq->rxseg[1].mp);
+			if (unlikely(!mbuf_pay)) {
+				PMD_DRV_LOG(ERR, "Failed to allocate payload mbuf for RX");
+				return -ENOMEM;
+			}
+
+			mbuf_pay->next = NULL;
+			mbuf_pay->data_off = RTE_PKTMBUF_HEADROOM;
+			mbuf_pay->nb_segs = 1;
+			mbuf_pay->port = rxq->port_id;
+			mbuf->next = mbuf_pay;
+
+			rxd->read.hdr_addr = dma_addr;
+			/* The LS bit should be set to zero regardless of
+			 * buffer split enablement.
+			 */
+			rxd->read.pkt_addr = rte_cpu_to_le_64(rte_mbuf_data_iova_default(mbuf_pay));
+
+		} else {
+			rte_mbuf_refcnt_set(mbuf, 1);
+			mbuf->next = NULL;
+			rxd->read.hdr_addr = 0;
+			rxd->read.pkt_addr = dma_addr;
+		}
+
 #ifndef RTE_LIBRTE_ICE_16BYTE_RX_DESC
 		rxd->read.rsvd1 = 0;
 		rxd->read.rsvd2 = 0;
@@ -443,14 +507,14 @@ _ice_rx_queue_release_mbufs(struct ice_rx_queue *rxq)
 
 	for (i = 0; i < rxq->nb_rx_desc; i++) {
 		if (rxq->sw_ring[i].mbuf) {
-			rte_pktmbuf_free_seg(rxq->sw_ring[i].mbuf);
+			rte_pktmbuf_free(rxq->sw_ring[i].mbuf);
 			rxq->sw_ring[i].mbuf = NULL;
 		}
 	}
 	if (rxq->rx_nb_avail == 0)
 		return;
 	for (i = 0; i < rxq->rx_nb_avail; i++)
-		rte_pktmbuf_free_seg(rxq->rx_stage[rxq->rx_next_avail + i]);
+		rte_pktmbuf_free(rxq->rx_stage[rxq->rx_next_avail + i]);
 
 	rxq->rx_nb_avail = 0;
 }
@@ -742,7 +806,7 @@ ice_fdir_program_hw_rx_queue(struct ice_rx_queue *rxq)
 	rx_ctx.qlen = rxq->nb_rx_desc;
 	rx_ctx.dbuf = rxq->rx_buf_len >> ICE_RLAN_CTX_DBUF_S;
 	rx_ctx.hbuf = rxq->rx_hdr_len >> ICE_RLAN_CTX_HBUF_S;
-	rx_ctx.dtype = 0; /* No Header Split mode */
+	rx_ctx.dtype = 0; /* No Buffer Split mode */
 	rx_ctx.dsize = 1; /* 32B descriptors */
 	rx_ctx.rxmax = ICE_ETH_MAX_LEN;
 	/* TPH: Transaction Layer Packet (TLP) processing hints */
@@ -1076,6 +1140,7 @@ ice_rx_queue_setup(struct rte_eth_dev *dev,
 	uint16_t len;
 	int use_def_burst_func = 1;
 	uint64_t offloads;
+	uint16_t n_seg = rx_conf->rx_nseg;
 
 	if (nb_desc % ICE_ALIGN_RING_DESC != 0 ||
 	    nb_desc > ICE_MAX_RING_DESC ||
@@ -1087,6 +1152,17 @@ ice_rx_queue_setup(struct rte_eth_dev *dev,
 
 	offloads = rx_conf->offloads | dev->data->dev_conf.rxmode.offloads;
 
+	if (mp)
+		n_seg = 1;
+
+	if (n_seg > 1) {
+		if (!(offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT)) {
+			PMD_INIT_LOG(ERR, "port %u queue index %u split offload not configured",
+					dev->data->port_id, queue_idx);
+			return -EINVAL;
+		}
+	}
+
 	/* Free memory if needed */
 	if (dev->data->rx_queues[queue_idx]) {
 		ice_rx_queue_release(dev->data->rx_queues[queue_idx]);
@@ -1098,12 +1174,22 @@ ice_rx_queue_setup(struct rte_eth_dev *dev,
 				 sizeof(struct ice_rx_queue),
 				 RTE_CACHE_LINE_SIZE,
 				 socket_id);
+
 	if (!rxq) {
 		PMD_INIT_LOG(ERR, "Failed to allocate memory for "
 			     "rx queue data structure");
 		return -ENOMEM;
 	}
-	rxq->mp = mp;
+
+	rxq->rxseg_nb = n_seg;
+	if (n_seg > 1) {
+		rte_memcpy(rxq->rxseg, rx_conf->rx_seg,
+			sizeof(struct rte_eth_rxseg_split) * n_seg);
+		rxq->mp = rxq->rxseg[0].mp;
+	} else {
+		rxq->mp = mp;
+	}
+
 	rxq->nb_rx_desc = nb_desc;
 	rxq->rx_free_thresh = rx_conf->rx_free_thresh;
 	rxq->queue_id = queue_idx;
@@ -1568,7 +1654,7 @@ ice_rx_scan_hw_ring(struct ice_rx_queue *rxq)
 	struct ice_rx_entry *rxep;
 	struct rte_mbuf *mb;
 	uint16_t stat_err0;
-	uint16_t pkt_len;
+	uint16_t pkt_len, hdr_len;
 	int32_t s[ICE_LOOK_AHEAD], nb_dd;
 	int32_t i, j, nb_rx = 0;
 	uint64_t pkt_flags = 0;
@@ -1623,6 +1709,24 @@ ice_rx_scan_hw_ring(struct ice_rx_queue *rxq)
 				   ICE_RX_FLX_DESC_PKT_LEN_M) - rxq->crc_len;
 			mb->data_len = pkt_len;
 			mb->pkt_len = pkt_len;
+
+			if (rxq->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) {
+				mb->nb_segs = (uint16_t)(mb->nb_segs + mb->next->nb_segs);
+				mb->next->next = NULL;
+				hdr_len = rte_le_to_cpu_16(rxdp[j].wb.hdr_len_sph_flex_flags1) &
+						ICE_RX_FLEX_DESC_HEADER_LEN_M;
+				pkt_len = (rte_le_to_cpu_16(rxdp[j].wb.pkt_len) &
+					ICE_RX_FLX_DESC_PKT_LEN_M) - rxq->crc_len;
+				mb->data_len = hdr_len;
+				mb->pkt_len = hdr_len + pkt_len;
+				mb->next->data_len = pkt_len;
+			} else {
+				pkt_len = (rte_le_to_cpu_16(rxdp[j].wb.pkt_len) &
+					ICE_RX_FLX_DESC_PKT_LEN_M) - rxq->crc_len;
+				mb->data_len = pkt_len;
+				mb->pkt_len = pkt_len;
+			}
+
 			mb->ol_flags = 0;
 			stat_err0 = rte_le_to_cpu_16(rxdp[j].wb.status_error0);
 			pkt_flags = ice_rxd_error_to_pkt_flags(stat_err0);
@@ -1714,7 +1818,9 @@ ice_rx_alloc_bufs(struct ice_rx_queue *rxq)
 	struct rte_mbuf *mb;
 	uint16_t alloc_idx, i;
 	uint64_t dma_addr;
-	int diag;
+	int diag, diag_pay;
+	uint64_t pay_addr;
+	struct rte_mbuf *mbufs_pay[rxq->rx_free_thresh];
 
 	/* Allocate buffers in bulk */
 	alloc_idx = (uint16_t)(rxq->rx_free_trigger -
@@ -1727,6 +1833,15 @@ ice_rx_alloc_bufs(struct ice_rx_queue *rxq)
 		return -ENOMEM;
 	}
 
+	if (rxq->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) {
+		diag_pay = rte_mempool_get_bulk(rxq->rxseg[1].mp,
+				(void *)mbufs_pay, rxq->rx_free_thresh);
+		if (unlikely(diag_pay != 0)) {
+			PMD_RX_LOG(ERR, "Failed to get payload mbufs in bulk");
+			return -ENOMEM;
+		}
+	}
+
 	rxdp = &rxq->rx_ring[alloc_idx];
 	for (i = 0; i < rxq->rx_free_thresh; i++) {
 		if (likely(i < (rxq->rx_free_thresh - 1)))
@@ -1735,13 +1850,21 @@ ice_rx_alloc_bufs(struct ice_rx_queue *rxq)
 
 		mb = rxep[i].mbuf;
 		rte_mbuf_refcnt_set(mb, 1);
-		mb->next = NULL;
 		mb->data_off = RTE_PKTMBUF_HEADROOM;
 		mb->nb_segs = 1;
 		mb->port = rxq->port_id;
 		dma_addr = rte_cpu_to_le_64(rte_mbuf_data_iova_default(mb));
-		rxdp[i].read.hdr_addr = 0;
-		rxdp[i].read.pkt_addr = dma_addr;
+
+		if (rxq->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) {
+			mb->next = mbufs_pay[i];
+			pay_addr = rte_mbuf_data_iova_default(mbufs_pay[i]);
+			rxdp[i].read.hdr_addr = dma_addr;
+			rxdp[i].read.pkt_addr = rte_cpu_to_le_64(pay_addr);
+		} else {
+			mb->next = NULL;
+			rxdp[i].read.hdr_addr = 0;
+			rxdp[i].read.pkt_addr = dma_addr;
+		}
 	}
 
 	/* Update Rx tail register */
@@ -2350,11 +2473,13 @@ ice_recv_pkts(void *rx_queue,
 	struct ice_rx_entry *sw_ring = rxq->sw_ring;
 	struct ice_rx_entry *rxe;
 	struct rte_mbuf *nmb; /* new allocated mbuf */
+	struct rte_mbuf *nmb_pay; /* new allocated payload mbuf */
 	struct rte_mbuf *rxm; /* pointer to store old mbuf in SW ring */
 	uint16_t rx_id = rxq->rx_tail;
 	uint16_t nb_rx = 0;
 	uint16_t nb_hold = 0;
 	uint16_t rx_packet_len;
+	uint16_t rx_header_len;
 	uint16_t rx_stat_err0;
 	uint64_t dma_addr;
 	uint64_t pkt_flags;
@@ -2382,12 +2507,16 @@ ice_recv_pkts(void *rx_queue,
 		if (!(rx_stat_err0 & (1 << ICE_RX_FLEX_DESC_STATUS0_DD_S)))
 			break;
 
-		/* allocate mbuf */
+		if (rx_stat_err0 & (1 << ICE_RX_FLEX_DESC_STATUS0_HBO_S))
+			break;
+
+		/* allocate header mbuf */
 		nmb = rte_mbuf_raw_alloc(rxq->mp);
 		if (unlikely(!nmb)) {
 			rxq->vsi->adapter->pf.dev_data->rx_mbuf_alloc_failed++;
 			break;
 		}
+
 		rxd = *rxdp; /* copy descriptor in ring to temp variable*/
 
 		nb_hold++;
@@ -2400,24 +2529,55 @@ ice_recv_pkts(void *rx_queue,
 		dma_addr =
 			rte_cpu_to_le_64(rte_mbuf_data_iova_default(nmb));
 
-		/**
-		 * fill the read format of descriptor with physic address in
-		 * new allocated mbuf: nmb
-		 */
-		rxdp->read.hdr_addr = 0;
-		rxdp->read.pkt_addr = dma_addr;
+		if (rxq->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) {
+			/* allocate payload mbuf */
+			nmb_pay = rte_mbuf_raw_alloc(rxq->rxseg[1].mp);
+			if (unlikely(!nmb_pay)) {
+				rxq->vsi->adapter->pf.dev_data->rx_mbuf_alloc_failed++;
+				break;
+			}
+
+			nmb->next = nmb_pay;
+			nmb_pay->next = NULL;
 
-		/* calculate rx_packet_len of the received pkt */
-		rx_packet_len = (rte_le_to_cpu_16(rxd.wb.pkt_len) &
-				 ICE_RX_FLX_DESC_PKT_LEN_M) - rxq->crc_len;
+			/**
+			 * fill the read format of the descriptor with the physical
+			 * address of the newly allocated mbuf: nmb
+			 */
+			rxdp->read.hdr_addr = dma_addr;
+			rxdp->read.pkt_addr = rte_cpu_to_le_64(rte_mbuf_data_iova_default(nmb_pay));
+		} else {
+			/**
+			 * fill the read format of the descriptor with the physical
+			 * address of the newly allocated mbuf: nmb
+			 */
+			rxdp->read.hdr_addr = 0;
+			rxdp->read.pkt_addr = dma_addr;
+		}
 
 		/* fill old mbuf with received descriptor: rxd */
 		rxm->data_off = RTE_PKTMBUF_HEADROOM;
 		rte_prefetch0(RTE_PTR_ADD(rxm->buf_addr, RTE_PKTMBUF_HEADROOM));
-		rxm->nb_segs = 1;
-		rxm->next = NULL;
-		rxm->pkt_len = rx_packet_len;
-		rxm->data_len = rx_packet_len;
+		if (rxq->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) {
+			rxm->nb_segs = (uint16_t)(rxm->nb_segs + rxm->next->nb_segs);
+			rxm->next->next = NULL;
+			/* calculate rx_packet_len of the received pkt */
+			rx_header_len = rte_le_to_cpu_16(rxd.wb.hdr_len_sph_flex_flags1) &
+					ICE_RX_FLEX_DESC_HEADER_LEN_M;
+			rx_packet_len = (rte_le_to_cpu_16(rxd.wb.pkt_len) &
+					ICE_RX_FLX_DESC_PKT_LEN_M) - rxq->crc_len;
+			rxm->data_len = rx_header_len;
+			rxm->pkt_len = rx_header_len + rx_packet_len;
+			rxm->next->data_len = rx_packet_len;
+		} else {
+			rxm->nb_segs = 1;
+			rxm->next = NULL;
+			/* calculate rx_packet_len of the received pkt */
+			rx_packet_len = (rte_le_to_cpu_16(rxd.wb.pkt_len) &
+					ICE_RX_FLX_DESC_PKT_LEN_M) - rxq->crc_len;
+			rxm->data_len = rx_packet_len;
+			rxm->pkt_len = rx_packet_len;
+		}
 		rxm->port = rxq->port_id;
 		rxm->packet_type = ptype_tbl[ICE_RX_FLEX_DESC_PTYPE_M &
 			rte_le_to_cpu_16(rxd.wb.ptype_flex_flags0)];
diff --git a/drivers/net/ice/ice_rxtx.h b/drivers/net/ice/ice_rxtx.h
index bb18a01951..611dbc8503 100644
--- a/drivers/net/ice/ice_rxtx.h
+++ b/drivers/net/ice/ice_rxtx.h
@@ -16,6 +16,9 @@
 #define ICE_RX_MAX_BURST 32
 #define ICE_TX_MAX_BURST 32
 
+/* Maximal number of segments to split. */
+#define ICE_RX_MAX_NSEG 2
+
 #define ICE_CHK_Q_ENA_COUNT        100
 #define ICE_CHK_Q_ENA_INTERVAL_US  100
 
@@ -43,6 +46,11 @@
 extern uint64_t ice_timestamp_dynflag;
 extern int ice_timestamp_dynfield_offset;
 
+/* Max header size can be 2K - 64 bytes */
+#define ICE_RX_HDR_BUF_SIZE    (2048 - 64)
+
+#define ICE_HEADER_SPLIT_ENA   BIT(0)
+
 typedef void (*ice_rx_release_mbufs_t)(struct ice_rx_queue *rxq);
 typedef void (*ice_tx_release_mbufs_t)(struct ice_tx_queue *txq);
 typedef void (*ice_rxd_to_pkt_fields_t)(struct ice_rx_queue *rxq,
@@ -53,6 +61,12 @@ struct ice_rx_entry {
 	struct rte_mbuf *mbuf;
 };
 
+enum ice_rx_dtype {
+	ICE_RX_DTYPE_NO_SPLIT       = 0,
+	ICE_RX_DTYPE_HEADER_SPLIT   = 1,
+	ICE_RX_DTYPE_SPLIT_ALWAYS   = 2,
+};
+
 struct ice_rx_queue {
 	struct rte_mempool *mp; /* mbuf pool to populate RX ring */
 	volatile union ice_rx_flex_desc *rx_ring;/* RX ring virtual address */
@@ -95,6 +109,8 @@ struct ice_rx_queue {
 	uint32_t time_high;
 	uint32_t hw_register_set;
 	const struct rte_memzone *mz;
+	struct rte_eth_rxseg_split rxseg[ICE_RX_MAX_NSEG];
+	uint32_t rxseg_nb;
 };
 
 struct ice_tx_entry {
diff --git a/drivers/net/ice/ice_rxtx_vec_common.h b/drivers/net/ice/ice_rxtx_vec_common.h
index 2dd2d83650..eec6ea2134 100644
--- a/drivers/net/ice/ice_rxtx_vec_common.h
+++ b/drivers/net/ice/ice_rxtx_vec_common.h
@@ -291,6 +291,9 @@ ice_rx_vec_queue_default(struct ice_rx_queue *rxq)
 	if (rxq->offloads & RTE_ETH_RX_OFFLOAD_TIMESTAMP)
 		return -1;
 
+	if (rxq->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT)
+		return -1;
+
 	if (rxq->offloads & ICE_RX_VECTOR_OFFLOAD)
 		return ICE_VECTOR_OFFLOAD_PATH;
 
diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
index be161ff999..fbd55cdd9d 100644
--- a/lib/ethdev/rte_ethdev.c
+++ b/lib/ethdev/rte_ethdev.c
@@ -1707,7 +1707,7 @@ rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
 			}
 		} else {
 			/* Split after specified protocol header. */
-			if (proto_hdr & RTE_BUFFER_SPLIT_PROTO_HDR_MASK) {
+			if (!(proto_hdr & RTE_BUFFER_SPLIT_PROTO_HDR_MASK)) {
 				RTE_ETHDEV_LOG(ERR,
 					"Protocol header %u not supported)\n",
 					proto_hdr);
-- 
2.25.1


^ permalink raw reply	[flat|nested] 88+ messages in thread

* [PATCH v7 3/3] app/testpmd: add rxhdrs commands and parameters
  2022-06-01 13:22 ` [PATCH v7 0/3] ethdev: introduce protocol type based header split wenxuanx.wu
  2022-06-01 13:22   ` [PATCH v7 1/3] ethdev: introduce protocol header based buffer split wenxuanx.wu
  2022-06-01 13:22   ` [PATCH v7 2/3] net/ice: support buffer split in Rx path wenxuanx.wu
@ 2022-06-01 13:22   ` wenxuanx.wu
  2 siblings, 0 replies; 88+ messages in thread
From: wenxuanx.wu @ 2022-06-01 13:22 UTC (permalink / raw)
  To: thomas, andrew.rybchenko, xiaoyun.li, ferruh.yigit,
	aman.deep.singh, dev, yuying.zhang, qi.z.zhang, jerinjacobk
  Cc: stephen, Wenxuan Wu, Xuan Ding, Yuan Wang

From: Wenxuan Wu <wenxuanx.wu@intel.com>

Add command line parameter:
--rxhdrs=mac[,ipv4,udp]

It sets the protocol headers of the segments used to scatter packets
on receiving when the split feature is engaged, and applies only to
the queues configured with the BUFFER_SPLIT offload flag.

Add interactive mode command:
testpmd> set rxhdrs mac,ipv4,l3,tcp,udp,sctp
(the protocol sequence and nb_segs should be valid)

The protocol split feature is off by default. To enable protocol split
(a complete session is sketched after this list), you need to:
1. Start testpmd with two mempools, e.g. --mbuf-size=2048,2048
2. Configure the Rx queues with the buffer split offload enabled.
3. Set the protocol types for buffer split, e.g. set rxhdrs mac,ipv4
(Supported protocols: mac|ipv4|ipv6|l3|tcp|udp|sctp|l4|inner_mac|
 inner_ipv4|inner_ipv6|inner_l3|inner_tcp|inner_udp|inner_sctp|
 inner_l4)
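
For illustration, a complete session might look like the following
sketch (the EAL core list, port ID and protocol sequence below are
placeholders, not part of this patch):

    dpdk-testpmd -l 0-3 -n 4 -- -i --mbuf-size=2048,2048 \
        --rxhdrs=mac,ipv4

    testpmd> port stop all
    testpmd> port config 0 rx_offload buffer_split on
    testpmd> set rxhdrs mac,ipv4
    testpmd> show config rxhdrs
    testpmd> port start all
    testpmd> start
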
Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
Reviewed-by: Qi Zhang <qi.z.zhang@intel.com>
---
 app/test-pmd/cmdline.c    | 127 +++++++++++++++++++++++++++++++++++++-
 app/test-pmd/config.c     |  81 ++++++++++++++++++++++++
 app/test-pmd/parameters.c |  15 ++++-
 app/test-pmd/testpmd.c    |   6 +-
 app/test-pmd/testpmd.h    |   6 ++
 5 files changed, 228 insertions(+), 7 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 6ffea8e21a..52e98e1c06 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -183,7 +183,7 @@ static void cmd_help_long_parsed(void *parsed_result,
 			"show (rxq|txq) info (port_id) (queue_id)\n"
 			"    Display information for configured RX/TX queue.\n\n"
 
-			"show config (rxtx|cores|fwd|rxoffs|rxpkts|txpkts)\n"
+			"show config (rxtx|cores|fwd|rxoffs|rxpkts|rxhdrs|txpkts)\n"
 			"    Display the given configuration.\n\n"
 
 			"read rxd (port_id) (queue_id) (rxd_id)\n"
@@ -316,6 +316,15 @@ static void cmd_help_long_parsed(void *parsed_result,
 			" Affects only the queues configured with split"
 			" offloads.\n\n"
 
+			"set rxhdrs (mac[,ipv4])*\n"
+			"	Set the protocol hdr of each segment to scatter"
+			" packets on receiving if split feature is engaged."
+			" Affects only the queues configured with split"
+			" offloads.\n\n"
+			"	Supported proto header: mac|ipv4|ipv6|l3|tcp|udp|sctp|l4|"
+			"inner_mac|inner_ipv4|inner_ipv6|inner_l3|inner_tcp|"
+			"inner_udp|inner_sctp\n"
+
 			"set txpkts (x[,y]*)\n"
 			"    Set the length of each segment of TXONLY"
 			" and optionally CSUM packets.\n\n"
@@ -3617,6 +3626,72 @@ cmdline_parse_inst_t cmd_stop = {
 	},
 };
 
+static unsigned int
+get_ptype(char *value)
+{
+	uint32_t protocol;
+	if (!strcmp(value, "mac"))
+		protocol = RTE_PTYPE_L2_ETHER;
+	else if (!strcmp(value, "ipv4"))
+		protocol = RTE_PTYPE_L3_IPV4;
+	else if (!strcmp(value, "ipv6"))
+		protocol = RTE_PTYPE_L3_IPV6;
+	else if (!strcmp(value, "l3"))
+		protocol = RTE_PTYPE_L3_IPV4|RTE_PTYPE_L3_IPV6;
+	else if (!strcmp(value, "tcp"))
+		protocol = RTE_PTYPE_L4_TCP;
+	else if (!strcmp(value, "udp"))
+		protocol = RTE_PTYPE_L4_UDP;
+	else if (!strcmp(value, "sctp"))
+		protocol = RTE_PTYPE_L4_SCTP;
+	else if (!strcmp(value, "l4"))
+		protocol = RTE_PTYPE_L4_TCP|RTE_PTYPE_L4_UDP|RTE_PTYPE_L4_SCTP;
+	else if (!strcmp(value, "inner_mac"))
+		protocol = RTE_PTYPE_INNER_L2_ETHER;
+	else if (!strcmp(value, "inner_ipv4"))
+		protocol = RTE_PTYPE_INNER_L3_IPV4;
+	else if (!strcmp(value, "inner_ipv6"))
+		protocol = RTE_PTYPE_INNER_L3_IPV6;
+	else if (!strcmp(value, "inner_l3"))
+		protocol = RTE_PTYPE_INNER_L3_IPV4|RTE_PTYPE_INNER_L3_IPV6;
+	else if (!strcmp(value, "inner_tcp"))
+		protocol = RTE_PTYPE_INNER_L4_TCP;
+	else if (!strcmp(value, "inner_udp"))
+		protocol = RTE_PTYPE_INNER_L4_UDP;
+	else if (!strcmp(value, "inner_sctp"))
+		protocol = RTE_PTYPE_INNER_L4_SCTP;
+	else {
+		fprintf(stderr, "Unknown protocol name: %s\n", value);
+		return 0;
+	}
+	return protocol;
+}
+/* *** SET RXHDRSLIST *** */
+
+unsigned int
+parse_hdrs_list(const char *str, const char *item_name, unsigned int max_items,
+				unsigned int *parsed_items, int check_hdrs_sequence)
+{
+	unsigned int nb_item;
+	char *cur;
+	char *tmp;
+	nb_item = 0;
+	char *str2 = strdup(str);
+	cur = strtok_r(str2, ",", &tmp);
+	while (cur != NULL) {
+		parsed_items[nb_item] = get_ptype(cur);
+		cur = strtok_r(NULL, ",", &tmp);
+		nb_item++;
+	}
+	if (nb_item > max_items)
+		fprintf(stderr, "Number of %s = %u > %u (maximum items)\n",
+			item_name, nb_item + 1, max_items);
+	set_rx_pkt_hdrs(parsed_items, nb_item);
+	free(str2);
+	if (!check_hdrs_sequence)
+		return nb_item;
+	return nb_item;
+}
 /* *** SET CORELIST and PORTLIST CONFIGURATION *** */
 
 unsigned int
@@ -3986,6 +4061,49 @@ cmdline_parse_inst_t cmd_set_rxpkts = {
 	},
 };
 
+/* *** SET SEGMENT HEADERS OF RX PACKETS SPLIT *** */
+struct cmd_set_rxhdrs_result {
+	cmdline_fixed_string_t cmd_keyword;
+	cmdline_fixed_string_t rxhdrs;
+	cmdline_fixed_string_t seg_hdrs;
+};
+
+static void
+cmd_set_rxhdrs_parsed(void *parsed_result,
+		      __rte_unused struct cmdline *cl,
+		      __rte_unused void *data)
+{
+	struct cmd_set_rxhdrs_result *res;
+	unsigned int seg_hdrs[MAX_SEGS_BUFFER_SPLIT];
+	unsigned int nb_segs;
+
+	res = parsed_result;
+	nb_segs = parse_hdrs_list(res->seg_hdrs, "segment hdrs",
+				  MAX_SEGS_BUFFER_SPLIT, seg_hdrs, 0);
+	if (nb_segs >= 1)
+		set_rx_pkt_hdrs(seg_hdrs, nb_segs);
+	cmd_reconfig_device_queue(RTE_PORT_ALL, 0, 1);
+}
+cmdline_parse_token_string_t cmd_set_rxhdrs_keyword =
+	TOKEN_STRING_INITIALIZER(struct cmd_set_rxhdrs_result,
+				 cmd_keyword, "set");
+cmdline_parse_token_string_t cmd_set_rxhdrs_name =
+	TOKEN_STRING_INITIALIZER(struct cmd_set_rxhdrs_result,
+				 rxhdrs, "rxhdrs");
+cmdline_parse_token_string_t cmd_set_rxhdrs_seg_hdrs =
+	TOKEN_STRING_INITIALIZER(struct cmd_set_rxhdrs_result,
+				 seg_hdrs, NULL);
+cmdline_parse_inst_t cmd_set_rxhdrs = {
+	.f = cmd_set_rxhdrs_parsed,
+	.data = NULL,
+	.help_str = "set rxhdrs <mac[,ipv4]*>",
+	.tokens = {
+		(void *)&cmd_set_rxhdrs_keyword,
+		(void *)&cmd_set_rxhdrs_name,
+		(void *)&cmd_set_rxhdrs_seg_hdrs,
+		NULL,
+	},
+};
 /* *** SET SEGMENT LENGTHS OF TXONLY PACKETS *** */
 
 struct cmd_set_txpkts_result {
@@ -8058,6 +8176,8 @@ static void cmd_showcfg_parsed(void *parsed_result,
 		show_rx_pkt_offsets();
 	else if (!strcmp(res->what, "rxpkts"))
 		show_rx_pkt_segments();
+	else if (!strcmp(res->what, "rxhdrs"))
+		show_rx_pkt_hdrs();
 	else if (!strcmp(res->what, "txpkts"))
 		show_tx_pkt_segments();
 	else if (!strcmp(res->what, "txtimes"))
@@ -8070,12 +8190,12 @@ cmdline_parse_token_string_t cmd_showcfg_port =
 	TOKEN_STRING_INITIALIZER(struct cmd_showcfg_result, cfg, "config");
 cmdline_parse_token_string_t cmd_showcfg_what =
 	TOKEN_STRING_INITIALIZER(struct cmd_showcfg_result, what,
-				 "rxtx#cores#fwd#rxoffs#rxpkts#txpkts#txtimes");
+				 "rxtx#cores#fwd#rxoffs#rxpkts#rxhdrs#txpkts#txtimes");
 
 cmdline_parse_inst_t cmd_showcfg = {
 	.f = cmd_showcfg_parsed,
 	.data = NULL,
-	.help_str = "show config rxtx|cores|fwd|rxoffs|rxpkts|txpkts|txtimes",
+	.help_str = "show config rxtx|cores|fwd|rxoffs|rxpkts|rxhdrs|txpkts|txtimes",
 	.tokens = {
 		(void *)&cmd_showcfg_show,
 		(void *)&cmd_showcfg_port,
@@ -17833,6 +17953,7 @@ cmdline_parse_ctx_t main_ctx[] = {
 	(cmdline_parse_inst_t *)&cmd_set_log,
 	(cmdline_parse_inst_t *)&cmd_set_rxoffs,
 	(cmdline_parse_inst_t *)&cmd_set_rxpkts,
+	(cmdline_parse_inst_t *)&cmd_set_rxhdrs,
 	(cmdline_parse_inst_t *)&cmd_set_txpkts,
 	(cmdline_parse_inst_t *)&cmd_set_txsplit,
 	(cmdline_parse_inst_t *)&cmd_set_txtimes,
diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index cc8e7aa138..742473456a 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -4757,6 +4757,87 @@ show_rx_pkt_segments(void)
 		printf("%hu\n", rx_pkt_seg_lengths[i]);
 	}
 }
+static const char *get_ptype_str(uint32_t ptype)
+{
+	switch (ptype) {
+	case RTE_PTYPE_L2_ETHER:
+		return "mac";
+	case RTE_PTYPE_L3_IPV4:
+		return "ipv4";
+	case RTE_PTYPE_L3_IPV6:
+		return "ipv6";
+	case RTE_PTYPE_L3_IPV6|RTE_PTYPE_L3_IPV4:
+		return "l3";
+	case RTE_PTYPE_L4_TCP:
+		return "tcp";
+	case RTE_PTYPE_L4_UDP:
+		return "udp";
+	case RTE_PTYPE_L4_SCTP:
+		return "sctp";
+	case RTE_PTYPE_L4_TCP|RTE_PTYPE_L4_UDP|RTE_PTYPE_L4_SCTP:
+		return "l4";
+	case RTE_PTYPE_INNER_L2_ETHER:
+		return "inner_mac";
+	case RTE_PTYPE_INNER_L3_IPV4:
+		return "inner_ipv4";
+	case RTE_PTYPE_INNER_L3_IPV6:
+		return "inner_ipv6";
+	case RTE_PTYPE_INNER_L4_TCP:
+		return "inner_tcp";
+	case RTE_PTYPE_INNER_L4_UDP:
+		return "inner_udp";
+	case RTE_PTYPE_INNER_L4_SCTP:
+		return "inner_sctp";
+	default:
+		return "unknown";
+	}
+}
+void
+show_rx_pkt_hdrs(void)
+{
+	uint32_t i, n;
+
+	n = rx_pkt_nb_segs;
+	printf("Number of segments: %u\n", n);
+	if (n) {
+		printf("Packet segs: ");
+		for (i = 0; i != n - 1; i++)
+			printf("%s, ", get_ptype_str(rx_pkt_hdr_protos[i]));
+		printf("%s\n", rx_pkt_hdr_protos[i] == 0 ? "payload" :
+						get_ptype_str(rx_pkt_hdr_protos[i]));
+	}
+}
+void
+set_rx_pkt_hdrs(unsigned int *seg_hdrs, unsigned int nb_segs)
+{
+	unsigned int i;
+
+	if (nb_segs >= MAX_SEGS_BUFFER_SPLIT) {
+		printf("nb segments per RX packets=%u >= "
+		       "MAX_SEGS_BUFFER_SPLIT - ignored\n", nb_segs);
+		return;
+	}
+
+	/*
+	 * No extra check here, the segment length will be checked by PMD
+	 * in the extended queue setup.
+	 */
+	for (i = 0; i < nb_segs; i++) {
+		if (!(seg_hdrs[i] & RTE_BUFFER_SPLIT_PROTO_HDR_MASK)) {
+			printf("ptype [%u]=%u is not supported - give up\n",
+			       i, seg_hdrs[i]);
+			return;
+		}
+	}
+
+	for (i = 0; i < nb_segs; i++)
+		rx_pkt_hdr_protos[i] = (uint32_t) seg_hdrs[i];
+	/*
+	 * We count only the header segments here; the payload segment is
+	 * implicit, so rx_pkt_nb_segs is the number of headers plus one.
+	 */
+	rx_pkt_nb_segs = (uint8_t) nb_segs + 1;
+}
 
 void
 set_rx_pkt_segments(unsigned int *seg_lengths, unsigned int nb_segs)
diff --git a/app/test-pmd/parameters.c b/app/test-pmd/parameters.c
index daf6a31b2b..f86d626276 100644
--- a/app/test-pmd/parameters.c
+++ b/app/test-pmd/parameters.c
@@ -161,6 +161,7 @@ usage(char* progname)
 	       " Used mainly with PCAP drivers.\n");
 	printf("  --rxoffs=X[,Y]*: set RX segment offsets for split.\n");
 	printf("  --rxpkts=X[,Y]*: set RX segment sizes to split.\n");
+	printf("  --rxhdrs=mac[,ipv4]*: set RX segment protocol to split.\n");
 	printf("  --txpkts=X[,Y]*: set TX segment sizes"
 		" or total packet length.\n");
 	printf("  --txonly-multi-flow: generate multiple flows in txonly mode\n");
@@ -673,6 +674,7 @@ launch_args_parse(int argc, char** argv)
 		{ "flow-isolate-all",	        0, 0, 0 },
 		{ "rxoffs",			1, 0, 0 },
 		{ "rxpkts",			1, 0, 0 },
+		{ "rxhdrs",			1, 0, 0 },
 		{ "txpkts",			1, 0, 0 },
 		{ "txonly-multi-flow",		0, 0, 0 },
 		{ "rxq-share",			2, 0, 0 },
@@ -1327,7 +1329,6 @@ launch_args_parse(int argc, char** argv)
 			if (!strcmp(lgopts[opt_idx].name, "rxpkts")) {
 				unsigned int seg_len[MAX_SEGS_BUFFER_SPLIT];
 				unsigned int nb_segs;
-
 				nb_segs = parse_item_list
 						(optarg, "rxpkt segments",
 						 MAX_SEGS_BUFFER_SPLIT,
@@ -1337,6 +1338,18 @@ launch_args_parse(int argc, char** argv)
 				else
 					rte_exit(EXIT_FAILURE, "bad rxpkts\n");
 			}
+			if (!strcmp(lgopts[opt_idx].name, "rxhdrs")) {
+				unsigned int seg_hdrs[MAX_SEGS_BUFFER_SPLIT];
+				unsigned int nb_segs;
+				nb_segs = parse_hdrs_list
+						(optarg, "rxhdr segments",
+						MAX_SEGS_BUFFER_SPLIT,
+						seg_hdrs, 0);
+				if (nb_segs >= 1)
+					set_rx_pkt_hdrs(seg_hdrs, nb_segs);
+				else
+					rte_exit(EXIT_FAILURE, "bad rxhdrs\n");
+			}
 			if (!strcmp(lgopts[opt_idx].name, "txpkts")) {
 				unsigned seg_lengths[RTE_MAX_SEGS_PER_PKT];
 				unsigned int nb_segs;
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index fe2ce19f99..77379b7aa9 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -240,6 +240,7 @@ uint16_t rx_pkt_seg_lengths[MAX_SEGS_BUFFER_SPLIT];
 uint8_t  rx_pkt_nb_segs; /**< Number of segments to split */
 uint16_t rx_pkt_seg_offsets[MAX_SEGS_BUFFER_SPLIT];
 uint8_t  rx_pkt_nb_offs; /**< Number of specified offsets */
+uint32_t rx_pkt_hdr_protos[MAX_SEGS_BUFFER_SPLIT];
 
 /*
  * Configuration of packet segments used by the "txonly" processing engine.
@@ -2586,12 +2587,11 @@ rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
 		mp_n = (i > mbuf_data_size_n) ? mbuf_data_size_n - 1 : i;
 		mpx = mbuf_pool_find(socket_id, mp_n);
 		/* Handle zero as mbuf data buffer size. */
-		rx_seg->length = rx_pkt_seg_lengths[i] ?
-				   rx_pkt_seg_lengths[i] :
-				   mbuf_data_size[mp_n];
+		rx_seg->length = rx_pkt_seg_lengths[i];
 		rx_seg->offset = i < rx_pkt_nb_offs ?
 				   rx_pkt_seg_offsets[i] : 0;
 		rx_seg->mp = mpx ? mpx : mp;
+		rx_seg->proto_hdr = rx_pkt_hdr_protos[i];
 	}
 	rx_conf->rx_nseg = rx_pkt_nb_segs;
 	rx_conf->rx_seg = rx_useg;
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index 31f766c965..e791b9becd 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -534,6 +534,7 @@ extern uint32_t max_rx_pkt_len;
  * Configuration of packet segments used to scatter received packets
  * if some of split features is configured.
  */
+extern uint32_t rx_pkt_hdr_protos[MAX_SEGS_BUFFER_SPLIT];
 extern uint16_t rx_pkt_seg_lengths[MAX_SEGS_BUFFER_SPLIT];
 extern uint8_t  rx_pkt_nb_segs; /**< Number of segments to split */
 extern uint16_t rx_pkt_seg_offsets[MAX_SEGS_BUFFER_SPLIT];
@@ -864,6 +865,9 @@ inc_tx_burst_stats(struct fwd_stream *fs, uint16_t nb_tx)
 unsigned int parse_item_list(const char *str, const char *item_name,
 			unsigned int max_items,
 			unsigned int *parsed_items, int check_unique_values);
+unsigned int parse_hdrs_list(const char *str, const char *item_name,
+			unsigned int max_item,
+			unsigned int *parsed_items, int check_unique_values);
 void launch_args_parse(int argc, char** argv);
 void cmdline_read_from_file(const char *filename);
 void prompt(void);
@@ -1018,6 +1022,8 @@ void set_record_core_cycles(uint8_t on_off);
 void set_record_burst_stats(uint8_t on_off);
 void set_verbose_level(uint16_t vb_level);
 void set_rx_pkt_segments(unsigned int *seg_lengths, unsigned int nb_segs);
+void set_rx_pkt_hdrs(unsigned int *seg_protos, unsigned int nb_segs);
+void show_rx_pkt_hdrs(void);
 void show_rx_pkt_segments(void);
 void set_rx_pkt_offsets(unsigned int *seg_offsets, unsigned int nb_offs);
 void show_rx_pkt_offsets(void);
-- 
2.25.1


^ permalink raw reply	[flat|nested] 88+ messages in thread

* [PATCH v8 0/3] ethdev: introduce protocol type based header split
  2022-03-03  6:01 [RFC] ethdev: introduce protocol type based header split xuan.ding
                   ` (8 preceding siblings ...)
  2022-06-01 13:22 ` [PATCH v7 0/3] ethdev: introduce protocol type based header split wenxuanx.wu
@ 2022-06-01 13:50 ` wenxuanx.wu
  2022-06-01 13:50   ` [PATCH v8 1/3] ethdev: introduce protocol hdr based buffer split wenxuanx.wu
                     ` (4 more replies)
  2022-06-13 10:25 ` [PATCH v9 0/4] add an api to support proto based buffer split wenxuanx.wu
  10 siblings, 5 replies; 88+ messages in thread
From: wenxuanx.wu @ 2022-06-01 13:50 UTC (permalink / raw)
  To: thomas, andrew.rybchenko, xiaoyun.li, ferruh.yigit,
	aman.deep.singh, dev, yuying.zhang, qi.z.zhang, jerinjacobk
  Cc: stephen, Wenxuan Wu

From: Wenxuan Wu <wenxuanx.wu@intel.com>

Buffer split consists of splitting a received packet into two separate
regions based on the packet content. It is useful in some scenarios,
such as GPU acceleration. The splitting will help to enable true zero
copy and hence improve the performance significantly.

This patchset extends the current buffer split to support splitting at
multiple protocol headers. When an Rx queue is configured with the buffer
split feature, received packets are split directly into two different
mempools.
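
For instance, the two mempools could be created as follows (a sketch
only; the pool names, element counts and buffer sizes are arbitrary
placeholders):

    struct rte_mempool *hdr_pool = rte_pktmbuf_pool_create("hdr_pool",
        4096, 256, 0, RTE_PKTMBUF_HEADROOM + 256, rte_socket_id());
    struct rte_mempool *pay_pool = rte_pktmbuf_pool_create("pay_pool",
        4096, 256, 0, RTE_MBUF_DEFAULT_BUF_SIZE, rte_socket_id());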

v6->v7:
*fix supported header protocol check.
*add rxhdrs commands and parameters.
v5->v6:
*Change doc and design of struct rte_eth_rxseg_split, support multi
segment protocol_hdr configuration.
v4->v5:
* Use protocol and mbuf_offset based buffer split instead of header split.
* Use RTE_PTYPE* instead of enum rte_eth_rx_header_split_protocol_type.
* Improve the description of rte_eth_rxseg_split.proto.

v3->v4:
* Use RTE_ETH_RX_HEADER_SPLIT_NONE instead of 0.

v2->v3:
* Fix a PMD bug.
* Add rx queue header split check.
* Revise the log and doc.

v1->v2:
* Add support for all header split protocol types.

Wenxuan Wu (3):
  ethdev: introduce protocol header based buffer split
  net/ice: support buffer split in Rx path
  app/testpmd: add rxhdrs commands and parameters

 app/test-pmd/cmdline.c                | 127 ++++++++++++++-
 app/test-pmd/config.c                 |  81 ++++++++++
 app/test-pmd/parameters.c             |  15 +-
 app/test-pmd/testpmd.c                |   6 +-
 app/test-pmd/testpmd.h                |   6 +
 drivers/net/ice/ice_ethdev.c          |  10 +-
 drivers/net/ice/ice_rxtx.c            | 220 ++++++++++++++++++++++----
 drivers/net/ice/ice_rxtx.h            |  16 ++
 drivers/net/ice/ice_rxtx_vec_common.h |   3 +
 lib/ethdev/rte_ethdev.c               |  40 ++++-
 lib/ethdev/rte_ethdev.h               |  28 +++-
 11 files changed, 505 insertions(+), 47 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 88+ messages in thread

* [PATCH v8 1/3] ethdev: introduce protocol hdr based buffer split
  2022-06-01 13:50 ` [PATCH v8 0/3] ethdev: introduce protocol type based header split wenxuanx.wu
@ 2022-06-01 13:50   ` wenxuanx.wu
  2022-06-02 13:20     ` Andrew Rybchenko
  2022-06-01 13:50   ` [PATCH v8 1/3] ethdev: introduce protocol header " wenxuanx.wu
                     ` (3 subsequent siblings)
  4 siblings, 1 reply; 88+ messages in thread
From: wenxuanx.wu @ 2022-06-01 13:50 UTC (permalink / raw)
  To: thomas, andrew.rybchenko, xiaoyun.li, ferruh.yigit,
	aman.deep.singh, dev, yuying.zhang, qi.z.zhang, jerinjacobk
  Cc: stephen, Wenxuan Wu, Xuan Ding, Yuan Wang, Ray Kinsella

From: Wenxuan Wu <wenxuanx.wu@intel.com>

Currently, Rx buffer split supports length based split. With Rx queue
offload RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT enabled and Rx packet segment
configured, PMD will be able to split the received packets into
multiple segments.

However, length based buffer split is not suitable for NICs that split
based on protocol headers. Given an arbitrarily variable length in an Rx
packet segment, it is almost impossible to pass a fixed protocol header
to the PMD. Besides, tunneling makes the composition of a packet highly
variable, which makes the situation even worse.

This patch extends the current buffer split to support protocol header
based buffer split. A new proto_hdr field is introduced in the reserved
field of the rte_eth_rxseg_split structure to specify the protocol header.
The proto_hdr field defines the split position of a packet: splitting
always happens after the protocol header defined in the Rx packet segment.
When the Rx queue offload RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT is enabled and
the corresponding protocol header is configured, the PMD will split the
ingress packets into multiple segments.

struct rte_eth_rxseg_split {

        struct rte_mempool *mp; /* memory pools to allocate segment from */
        uint16_t length; /* segment maximal data length,
                            configures "split point" */
        uint16_t offset; /* data offset from beginning
                            of mbuf data buffer */
        uint32_t proto_hdr; /* inner/outer L2/L3/L4 protocol header,
			       configures "split point" */
    };

Both inner and outer L2/L3/L4 level protocol header split can be supported.
Corresponding protocol header capability is RTE_PTYPE_L2_ETHER,
RTE_PTYPE_L3_IPV4, RTE_PTYPE_L3_IPV6, RTE_PTYPE_L4_TCP, RTE_PTYPE_L4_UDP,
RTE_PTYPE_L4_SCTP, RTE_PTYPE_INNER_L2_ETHER, RTE_PTYPE_INNER_L3_IPV4,
RTE_PTYPE_INNER_L3_IPV6, RTE_PTYPE_INNER_L4_TCP, RTE_PTYPE_INNER_L4_UDP,
RTE_PTYPE_INNER_L4_SCTP.

For example, let's suppose we configured the Rx queue with the
following segments:
    seg0 - pool0, proto_hdr0=RTE_PTYPE_L3_IPV4, off0=2B
    seg1 - pool1, proto_hdr1=RTE_PTYPE_L4_UDP, off1=128B
    seg2 - pool2, off2=0B

A packet consisting of MAC_IPV4_UDP_PAYLOAD will be split as
follows:
    seg0 - ipv4 header @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from pool0
    seg1 - udp header @ 128 in mbuf from pool1
    seg2 - payload @ 0 in mbuf from pool2

Now buffer split can be configured in two modes. For length based
buffer split, the mp, length and offset fields in an Rx packet segment
should be configured, while the proto_hdr field should not be configured.
For protocol header based buffer split, the mp, offset and proto_hdr
fields in an Rx packet segment should be configured, while the length
field should not be configured.

The split limitations imposed by the underlying PMD are reported in the
rte_eth_dev_info->rx_seg_capa field. The memory attributes of the split
parts may also differ, e.g. DPDK memory for one part and external memory
for the other.
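
To illustrate the intended usage from an application, a minimal queue
setup sketch follows (hdr_pool/pay_pool, the port ID, the queue ID and
the descriptor count are placeholders; error handling is omitted):

    union rte_eth_rxseg rx_seg[2];
    struct rte_eth_rxconf rxconf;
    struct rte_eth_dev_info dev_info;

    memset(rx_seg, 0, sizeof(rx_seg));
    rte_eth_dev_info_get(port_id, &dev_info);
    rxconf = dev_info.default_rxconf;

    /* seg0: everything up to and including the UDP header */
    rx_seg[0].split.mp = hdr_pool;
    rx_seg[0].split.proto_hdr = RTE_PTYPE_L4_UDP;
    rx_seg[0].split.length = 0; /* must stay zero in proto based mode */

    /* seg1: the remaining payload */
    rx_seg[1].split.mp = pay_pool;

    rxconf.rx_seg = rx_seg;
    rxconf.rx_nseg = 2;
    rxconf.offloads |= RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT;

    /* mp is NULL: the per-segment pools come from rx_seg instead */
    rte_eth_rx_queue_setup(port_id, 0, 1024,
                           rte_eth_dev_socket_id(port_id), &rxconf, NULL);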

Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
Reviewed-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
---
 lib/ethdev/rte_ethdev.c | 40 +++++++++++++++++++++++++++++++++-------
 lib/ethdev/rte_ethdev.h | 28 +++++++++++++++++++++++++++-
 2 files changed, 60 insertions(+), 8 deletions(-)

diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
index 29a3d80466..fbd55cdd9d 100644
--- a/lib/ethdev/rte_ethdev.c
+++ b/lib/ethdev/rte_ethdev.c
@@ -1661,6 +1661,7 @@ rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
 		struct rte_mempool *mpl = rx_seg[seg_idx].mp;
 		uint32_t length = rx_seg[seg_idx].length;
 		uint32_t offset = rx_seg[seg_idx].offset;
+		uint32_t proto_hdr = rx_seg[seg_idx].proto_hdr;
 
 		if (mpl == NULL) {
 			RTE_ETHDEV_LOG(ERR, "null mempool pointer\n");
@@ -1694,13 +1695,38 @@ rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
 		}
 		offset += seg_idx != 0 ? 0 : RTE_PKTMBUF_HEADROOM;
 		*mbp_buf_size = rte_pktmbuf_data_room_size(mpl);
-		length = length != 0 ? length : *mbp_buf_size;
-		if (*mbp_buf_size < length + offset) {
-			RTE_ETHDEV_LOG(ERR,
-				       "%s mbuf_data_room_size %u < %u (segment length=%u + segment offset=%u)\n",
-				       mpl->name, *mbp_buf_size,
-				       length + offset, length, offset);
-			return -EINVAL;
+		if (proto_hdr == RTE_PTYPE_UNKNOWN) {
+			/* Split at fixed length. */
+			length = length != 0 ? length : *mbp_buf_size;
+			if (*mbp_buf_size < length + offset) {
+				RTE_ETHDEV_LOG(ERR,
+					"%s mbuf_data_room_size %u < %u (segment length=%u + segment offset=%u)\n",
+					mpl->name, *mbp_buf_size,
+					length + offset, length, offset);
+				return -EINVAL;
+			}
+		} else {
+			/* Split after specified protocol header. */
+			if (!(proto_hdr & RTE_BUFFER_SPLIT_PROTO_HDR_MASK)) {
+				RTE_ETHDEV_LOG(ERR,
+					"Protocol header %u not supported)\n",
+					proto_hdr);
+				return -EINVAL;
+			}
+
+			if (length != 0) {
+				RTE_ETHDEV_LOG(ERR, "segment length should be set to zero in protocol header "
+					       "based buffer split\n");
+				return -EINVAL;
+			}
+
+			if (*mbp_buf_size < offset) {
+				RTE_ETHDEV_LOG(ERR,
+						"%s mbuf_data_room_size %u < %u segment offset)\n",
+						mpl->name, *mbp_buf_size,
+						offset);
+				return -EINVAL;
+			}
 		}
 	}
 	return 0;
diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
index 04cff8ee10..0cd9dd6cc0 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -1187,6 +1187,9 @@ struct rte_eth_txmode {
  *   mbuf) the following data will be pushed to the next segment
  *   up to its own length, and so on.
  *
+ * - The proto_hdrs in the elements define the split position of
+ *   received packets.
+ *
  * - If the length in the segment description element is zero
  *   the actual buffer size will be deduced from the appropriate
  *   memory pool properties.
@@ -1197,14 +1200,37 @@ struct rte_eth_txmode {
  *     - pool from the last valid element
  *     - the buffer size from this pool
  *     - zero offset
+ *
+ * - Length based buffer split:
+ *     - mp, length, offset should be configured.
+ *     - The proto_hdr field should not be configured.
+ *
+ * - Protocol header based buffer split:
+ *     - mp, offset, proto_hdr should be configured.
+ *     - The length field should not be configured.
  */
 struct rte_eth_rxseg_split {
 	struct rte_mempool *mp; /**< Memory pool to allocate segment from. */
 	uint16_t length; /**< Segment data length, configures split point. */
 	uint16_t offset; /**< Data offset from beginning of mbuf data buffer. */
-	uint32_t reserved; /**< Reserved field. */
+	uint32_t proto_hdr; /**< Inner/outer L2/L3/L4 protocol header, configures split point. */
 };
 
+/* Buffer split protocol header capability. */
+#define RTE_BUFFER_SPLIT_PROTO_HDR_MASK ( \
+	RTE_PTYPE_L2_ETHER | \
+	RTE_PTYPE_L3_IPV4 | \
+	RTE_PTYPE_L3_IPV6 | \
+	RTE_PTYPE_L4_TCP | \
+	RTE_PTYPE_L4_UDP | \
+	RTE_PTYPE_L4_SCTP | \
+	RTE_PTYPE_INNER_L2_ETHER | \
+	RTE_PTYPE_INNER_L3_IPV4 | \
+	RTE_PTYPE_INNER_L3_IPV6 | \
+	RTE_PTYPE_INNER_L4_TCP | \
+	RTE_PTYPE_INNER_L4_UDP | \
+	RTE_PTYPE_INNER_L4_SCTP)
+
 /**
  * @warning
  * @b EXPERIMENTAL: this structure may change without prior notice.
-- 
2.25.1


^ permalink raw reply	[flat|nested] 88+ messages in thread

* [PATCH v8 2/3] net/ice: support buffer split in Rx path
  2022-06-01 13:50 ` [PATCH v8 0/3] ethdev: introduce protocol type based header split wenxuanx.wu
  2022-06-01 13:50   ` [PATCH v8 1/3] ethdev: introduce protocol hdr based buffer split wenxuanx.wu
  2022-06-01 13:50   ` [PATCH v8 1/3] ethdev: introduce protocol header " wenxuanx.wu
@ 2022-06-01 13:50   ` wenxuanx.wu
  2022-06-01 13:50   ` [PATCH v8 3/3] app/testpmd: add rxhdrs commands and parameters wenxuanx.wu
  2022-06-02 13:20   ` [PATCH v8 0/3] ethdev: introduce protocol type based header split Andrew Rybchenko
  4 siblings, 0 replies; 88+ messages in thread
From: wenxuanx.wu @ 2022-06-01 13:50 UTC (permalink / raw)
  To: thomas, andrew.rybchenko, xiaoyun.li, ferruh.yigit,
	aman.deep.singh, dev, yuying.zhang, qi.z.zhang, jerinjacobk
  Cc: stephen, Wenxuan Wu, Xuan Ding, Yuan Wang

From: Wenxuan Wu <wenxuanx.wu@intel.com>

This patch adds support for protocol based buffer split in the normal Rx
data paths. When the Rx queue is configured with a specific protocol
type, received packets are split directly into protocol header and
payload parts, and the two parts are put into different mempools.

Currently, protocol based buffer split is not supported in vectorized
paths.
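
For reference, a consumer of a queue configured this way sees each
packet as a two-segment mbuf chain. A minimal sketch of walking it
(process_headers() and process_payload() are hypothetical placeholders,
as are the port and queue IDs):

    struct rte_mbuf *pkts[32];
    uint16_t nb = rte_eth_rx_burst(port_id, queue_id, pkts, 32);

    for (uint16_t i = 0; i < nb; i++) {
        struct rte_mbuf *hdr = pkts[i];   /* mbuf from the header pool */
        struct rte_mbuf *pay = hdr->next; /* mbuf from the payload pool */

        /* hdr->data_len covers only the protocol headers */
        process_headers(rte_pktmbuf_mtod(hdr, void *), hdr->data_len);
        if (pay != NULL)
            process_payload(rte_pktmbuf_mtod(pay, void *), pay->data_len);

        rte_pktmbuf_free(hdr); /* frees the whole segment chain */
    }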

Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
Reviewed-by: Qi Zhang <qi.z.zhang@intel.com>
---
 drivers/net/ice/ice_ethdev.c          |  10 +-
 drivers/net/ice/ice_rxtx.c            | 220 ++++++++++++++++++++++----
 drivers/net/ice/ice_rxtx.h            |  16 ++
 drivers/net/ice/ice_rxtx_vec_common.h |   3 +
 4 files changed, 217 insertions(+), 32 deletions(-)

diff --git a/drivers/net/ice/ice_ethdev.c b/drivers/net/ice/ice_ethdev.c
index 73e550f5fb..ce3f49c863 100644
--- a/drivers/net/ice/ice_ethdev.c
+++ b/drivers/net/ice/ice_ethdev.c
@@ -3713,7 +3713,8 @@ ice_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 			RTE_ETH_RX_OFFLOAD_OUTER_IPV4_CKSUM |
 			RTE_ETH_RX_OFFLOAD_VLAN_EXTEND |
 			RTE_ETH_RX_OFFLOAD_RSS_HASH |
-			RTE_ETH_RX_OFFLOAD_TIMESTAMP;
+			RTE_ETH_RX_OFFLOAD_TIMESTAMP |
+			RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT;
 		dev_info->tx_offload_capa |=
 			RTE_ETH_TX_OFFLOAD_QINQ_INSERT |
 			RTE_ETH_TX_OFFLOAD_IPV4_CKSUM |
@@ -3725,7 +3726,7 @@ ice_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 		dev_info->flow_type_rss_offloads |= ICE_RSS_OFFLOAD_ALL;
 	}
 
-	dev_info->rx_queue_offload_capa = 0;
+	dev_info->rx_queue_offload_capa = RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT;
 	dev_info->tx_queue_offload_capa = RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE;
 
 	dev_info->reta_size = pf->hash_lut_size;
@@ -3794,6 +3795,11 @@ ice_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 	dev_info->default_rxportconf.ring_size = ICE_BUF_SIZE_MIN;
 	dev_info->default_txportconf.ring_size = ICE_BUF_SIZE_MIN;
 
+	dev_info->rx_seg_capa.max_nseg = ICE_RX_MAX_NSEG;
+	dev_info->rx_seg_capa.multi_pools = 1;
+	dev_info->rx_seg_capa.offset_allowed = 0;
+	dev_info->rx_seg_capa.offset_align_log2 = 0;
+
 	return 0;
 }
 
diff --git a/drivers/net/ice/ice_rxtx.c b/drivers/net/ice/ice_rxtx.c
index 2dd2637fbb..77ab258f7f 100644
--- a/drivers/net/ice/ice_rxtx.c
+++ b/drivers/net/ice/ice_rxtx.c
@@ -282,7 +282,6 @@ ice_program_hw_rx_queue(struct ice_rx_queue *rxq)
 	/* Set buffer size as the head split is disabled. */
 	buf_size = (uint16_t)(rte_pktmbuf_data_room_size(rxq->mp) -
 			      RTE_PKTMBUF_HEADROOM);
-	rxq->rx_hdr_len = 0;
 	rxq->rx_buf_len = RTE_ALIGN(buf_size, (1 << ICE_RLAN_CTX_DBUF_S));
 	rxq->max_pkt_len =
 		RTE_MIN((uint32_t)ICE_SUPPORT_CHAIN_NUM * rxq->rx_buf_len,
@@ -311,11 +310,53 @@ ice_program_hw_rx_queue(struct ice_rx_queue *rxq)
 
 	memset(&rx_ctx, 0, sizeof(rx_ctx));
 
+	if (rxq->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) {
+		switch (rxq->rxseg[0].proto_hdr) {
+		case RTE_PTYPE_L2_ETHER:
+			rx_ctx.dtype = ICE_RX_DTYPE_HEADER_SPLIT;
+			rx_ctx.hsplit_1 = ICE_RLAN_RX_HSPLIT_1_SPLIT_L2;
+			break;
+		case RTE_PTYPE_INNER_L2_ETHER:
+			rx_ctx.dtype = ICE_RX_DTYPE_HEADER_SPLIT;
+			rx_ctx.hsplit_0 = ICE_RLAN_RX_HSPLIT_0_SPLIT_L2;
+			break;
+		case RTE_PTYPE_L3_IPV4:
+		case RTE_PTYPE_L3_IPV6:
+		case RTE_PTYPE_INNER_L3_IPV4:
+		case RTE_PTYPE_INNER_L3_IPV6:
+		case RTE_PTYPE_L3_IPV4 | RTE_PTYPE_L3_IPV6:
+			rx_ctx.dtype = ICE_RX_DTYPE_HEADER_SPLIT;
+			rx_ctx.hsplit_0 = ICE_RLAN_RX_HSPLIT_0_SPLIT_IP;
+			break;
+		case RTE_PTYPE_L4_TCP:
+		case RTE_PTYPE_L4_UDP:
+		case RTE_PTYPE_INNER_L4_TCP:
+		case RTE_PTYPE_INNER_L4_UDP:
+			rx_ctx.dtype = ICE_RX_DTYPE_HEADER_SPLIT;
+			rx_ctx.hsplit_0 = ICE_RLAN_RX_HSPLIT_0_SPLIT_TCP_UDP;
+			break;
+		case RTE_PTYPE_L4_SCTP:
+		case RTE_PTYPE_INNER_L4_SCTP:
+			rx_ctx.dtype = ICE_RX_DTYPE_HEADER_SPLIT;
+			rx_ctx.hsplit_0 = ICE_RLAN_RX_HSPLIT_0_SPLIT_SCTP;
+			break;
+		case 0:
+			PMD_DRV_LOG(ERR, "Buffer split protocol must be configured");
+			return -EINVAL;
+		default:
+			PMD_DRV_LOG(ERR, "Buffer split protocol is not supported");
+			return -EINVAL;
+		}
+		rxq->rx_hdr_len = ICE_RX_HDR_BUF_SIZE;
+	} else {
+		rxq->rx_hdr_len = 0;
+		rx_ctx.dtype = 0; /* No Protocol Based Buffer Split mode */
+	}
+
 	rx_ctx.base = rxq->rx_ring_dma / ICE_QUEUE_BASE_ADDR_UNIT;
 	rx_ctx.qlen = rxq->nb_rx_desc;
 	rx_ctx.dbuf = rxq->rx_buf_len >> ICE_RLAN_CTX_DBUF_S;
 	rx_ctx.hbuf = rxq->rx_hdr_len >> ICE_RLAN_CTX_HBUF_S;
-	rx_ctx.dtype = 0; /* No Header Split mode */
 #ifndef RTE_LIBRTE_ICE_16BYTE_RX_DESC
 	rx_ctx.dsize = 1; /* 32B descriptors */
 #endif
@@ -401,6 +442,7 @@ ice_alloc_rx_queue_mbufs(struct ice_rx_queue *rxq)
 
 	for (i = 0; i < rxq->nb_rx_desc; i++) {
 		volatile union ice_rx_flex_desc *rxd;
+		rxd = &rxq->rx_ring[i];
 		struct rte_mbuf *mbuf = rte_mbuf_raw_alloc(rxq->mp);
 
 		if (unlikely(!mbuf)) {
@@ -408,8 +450,6 @@ ice_alloc_rx_queue_mbufs(struct ice_rx_queue *rxq)
 			return -ENOMEM;
 		}
 
-		rte_mbuf_refcnt_set(mbuf, 1);
-		mbuf->next = NULL;
 		mbuf->data_off = RTE_PKTMBUF_HEADROOM;
 		mbuf->nb_segs = 1;
 		mbuf->port = rxq->port_id;
@@ -417,9 +457,33 @@ ice_alloc_rx_queue_mbufs(struct ice_rx_queue *rxq)
 		dma_addr =
 			rte_cpu_to_le_64(rte_mbuf_data_iova_default(mbuf));
 
-		rxd = &rxq->rx_ring[i];
-		rxd->read.pkt_addr = dma_addr;
-		rxd->read.hdr_addr = 0;
+		if (rxq->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) {
+			struct rte_mbuf *mbuf_pay;
+			mbuf_pay = rte_mbuf_raw_alloc(rxq->rxseg[1].mp);
+			if (unlikely(!mbuf_pay)) {
+				PMD_DRV_LOG(ERR, "Failed to allocate payload mbuf for RX");
+				return -ENOMEM;
+			}
+
+			mbuf_pay->next = NULL;
+			mbuf_pay->data_off = RTE_PKTMBUF_HEADROOM;
+			mbuf_pay->nb_segs = 1;
+			mbuf_pay->port = rxq->port_id;
+			mbuf->next = mbuf_pay;
+
+			rxd->read.hdr_addr = dma_addr;
+			/* The LS bit should be set to zero regardless of
+			 * buffer split enablement.
+			 */
+			rxd->read.pkt_addr = rte_cpu_to_le_64(rte_mbuf_data_iova_default(mbuf_pay));
+
+		} else {
+			rte_mbuf_refcnt_set(mbuf, 1);
+			mbuf->next = NULL;
+			rxd->read.hdr_addr = 0;
+			rxd->read.pkt_addr = dma_addr;
+		}
+
 #ifndef RTE_LIBRTE_ICE_16BYTE_RX_DESC
 		rxd->read.rsvd1 = 0;
 		rxd->read.rsvd2 = 0;
@@ -443,14 +507,14 @@ _ice_rx_queue_release_mbufs(struct ice_rx_queue *rxq)
 
 	for (i = 0; i < rxq->nb_rx_desc; i++) {
 		if (rxq->sw_ring[i].mbuf) {
-			rte_pktmbuf_free_seg(rxq->sw_ring[i].mbuf);
+			rte_pktmbuf_free(rxq->sw_ring[i].mbuf);
 			rxq->sw_ring[i].mbuf = NULL;
 		}
 	}
 	if (rxq->rx_nb_avail == 0)
 		return;
 	for (i = 0; i < rxq->rx_nb_avail; i++)
-		rte_pktmbuf_free_seg(rxq->rx_stage[rxq->rx_next_avail + i]);
+		rte_pktmbuf_free(rxq->rx_stage[rxq->rx_next_avail + i]);
 
 	rxq->rx_nb_avail = 0;
 }
@@ -742,7 +806,7 @@ ice_fdir_program_hw_rx_queue(struct ice_rx_queue *rxq)
 	rx_ctx.qlen = rxq->nb_rx_desc;
 	rx_ctx.dbuf = rxq->rx_buf_len >> ICE_RLAN_CTX_DBUF_S;
 	rx_ctx.hbuf = rxq->rx_hdr_len >> ICE_RLAN_CTX_HBUF_S;
-	rx_ctx.dtype = 0; /* No Header Split mode */
+	rx_ctx.dtype = 0; /* No Buffer Split mode */
 	rx_ctx.dsize = 1; /* 32B descriptors */
 	rx_ctx.rxmax = ICE_ETH_MAX_LEN;
 	/* TPH: Transaction Layer Packet (TLP) processing hints */
@@ -1076,6 +1140,7 @@ ice_rx_queue_setup(struct rte_eth_dev *dev,
 	uint16_t len;
 	int use_def_burst_func = 1;
 	uint64_t offloads;
+	uint16_t n_seg = rx_conf->rx_nseg;
 
 	if (nb_desc % ICE_ALIGN_RING_DESC != 0 ||
 	    nb_desc > ICE_MAX_RING_DESC ||
@@ -1087,6 +1152,17 @@ ice_rx_queue_setup(struct rte_eth_dev *dev,
 
 	offloads = rx_conf->offloads | dev->data->dev_conf.rxmode.offloads;
 
+	if (mp)
+		n_seg = 1;
+
+	if (n_seg > 1) {
+		if (!(offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT)) {
+			PMD_INIT_LOG(ERR, "port %u queue index %u split offload not configured",
+					dev->data->port_id, queue_idx);
+			return -EINVAL;
+		}
+	}
+
 	/* Free memory if needed */
 	if (dev->data->rx_queues[queue_idx]) {
 		ice_rx_queue_release(dev->data->rx_queues[queue_idx]);
@@ -1098,12 +1174,22 @@ ice_rx_queue_setup(struct rte_eth_dev *dev,
 				 sizeof(struct ice_rx_queue),
 				 RTE_CACHE_LINE_SIZE,
 				 socket_id);
+
 	if (!rxq) {
 		PMD_INIT_LOG(ERR, "Failed to allocate memory for "
 			     "rx queue data structure");
 		return -ENOMEM;
 	}
-	rxq->mp = mp;
+
+	rxq->rxseg_nb = n_seg;
+	if (n_seg > 1) {
+		rte_memcpy(rxq->rxseg, rx_conf->rx_seg,
+			sizeof(struct rte_eth_rxseg_split) * n_seg);
+		rxq->mp = rxq->rxseg[0].mp;
+	} else {
+		rxq->mp = mp;
+	}
+
 	rxq->nb_rx_desc = nb_desc;
 	rxq->rx_free_thresh = rx_conf->rx_free_thresh;
 	rxq->queue_id = queue_idx;
@@ -1568,7 +1654,7 @@ ice_rx_scan_hw_ring(struct ice_rx_queue *rxq)
 	struct ice_rx_entry *rxep;
 	struct rte_mbuf *mb;
 	uint16_t stat_err0;
-	uint16_t pkt_len;
+	uint16_t pkt_len, hdr_len;
 	int32_t s[ICE_LOOK_AHEAD], nb_dd;
 	int32_t i, j, nb_rx = 0;
 	uint64_t pkt_flags = 0;
@@ -1623,6 +1709,24 @@ ice_rx_scan_hw_ring(struct ice_rx_queue *rxq)
 				   ICE_RX_FLX_DESC_PKT_LEN_M) - rxq->crc_len;
 			mb->data_len = pkt_len;
 			mb->pkt_len = pkt_len;
+
+			if (rxq->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) {
+				mb->nb_segs = (uint16_t)(mb->nb_segs + mb->next->nb_segs);
+				mb->next->next = NULL;
+				hdr_len = rte_le_to_cpu_16(rxdp[j].wb.hdr_len_sph_flex_flags1) &
+						ICE_RX_FLEX_DESC_HEADER_LEN_M;
+				pkt_len = (rte_le_to_cpu_16(rxdp[j].wb.pkt_len) &
+					ICE_RX_FLX_DESC_PKT_LEN_M) - rxq->crc_len;
+				mb->data_len = hdr_len;
+				mb->pkt_len = hdr_len + pkt_len;
+				mb->next->data_len = pkt_len;
+			} else {
+				pkt_len = (rte_le_to_cpu_16(rxdp[j].wb.pkt_len) &
+					ICE_RX_FLX_DESC_PKT_LEN_M) - rxq->crc_len;
+				mb->data_len = pkt_len;
+				mb->pkt_len = pkt_len;
+			}
+
 			mb->ol_flags = 0;
 			stat_err0 = rte_le_to_cpu_16(rxdp[j].wb.status_error0);
 			pkt_flags = ice_rxd_error_to_pkt_flags(stat_err0);
@@ -1714,7 +1818,9 @@ ice_rx_alloc_bufs(struct ice_rx_queue *rxq)
 	struct rte_mbuf *mb;
 	uint16_t alloc_idx, i;
 	uint64_t dma_addr;
-	int diag;
+	int diag, diag_pay;
+	uint64_t pay_addr;
+	struct rte_mbuf *mbufs_pay[rxq->rx_free_thresh];
 
 	/* Allocate buffers in bulk */
 	alloc_idx = (uint16_t)(rxq->rx_free_trigger -
@@ -1727,6 +1833,15 @@ ice_rx_alloc_bufs(struct ice_rx_queue *rxq)
 		return -ENOMEM;
 	}
 
+	if (rxq->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) {
+		diag_pay = rte_mempool_get_bulk(rxq->rxseg[1].mp,
+				(void *)mbufs_pay, rxq->rx_free_thresh);
+		if (unlikely(diag_pay != 0)) {
+			PMD_RX_LOG(ERR, "Failed to get payload mbufs in bulk");
+			return -ENOMEM;
+		}
+	}
+
 	rxdp = &rxq->rx_ring[alloc_idx];
 	for (i = 0; i < rxq->rx_free_thresh; i++) {
 		if (likely(i < (rxq->rx_free_thresh - 1)))
@@ -1735,13 +1850,21 @@ ice_rx_alloc_bufs(struct ice_rx_queue *rxq)
 
 		mb = rxep[i].mbuf;
 		rte_mbuf_refcnt_set(mb, 1);
-		mb->next = NULL;
 		mb->data_off = RTE_PKTMBUF_HEADROOM;
 		mb->nb_segs = 1;
 		mb->port = rxq->port_id;
 		dma_addr = rte_cpu_to_le_64(rte_mbuf_data_iova_default(mb));
-		rxdp[i].read.hdr_addr = 0;
-		rxdp[i].read.pkt_addr = dma_addr;
+
+		if (rxq->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) {
+			mb->next = mbufs_pay[i];
+			pay_addr = rte_mbuf_data_iova_default(mbufs_pay[i]);
+			rxdp[i].read.hdr_addr = dma_addr;
+			rxdp[i].read.pkt_addr = rte_cpu_to_le_64(pay_addr);
+		} else {
+			mb->next = NULL;
+			rxdp[i].read.hdr_addr = 0;
+			rxdp[i].read.pkt_addr = dma_addr;
+		}
 	}
 
 	/* Update Rx tail register */
@@ -2350,11 +2473,13 @@ ice_recv_pkts(void *rx_queue,
 	struct ice_rx_entry *sw_ring = rxq->sw_ring;
 	struct ice_rx_entry *rxe;
 	struct rte_mbuf *nmb; /* new allocated mbuf */
+	struct rte_mbuf *nmb_pay; /* new allocated payload mbuf */
 	struct rte_mbuf *rxm; /* pointer to store old mbuf in SW ring */
 	uint16_t rx_id = rxq->rx_tail;
 	uint16_t nb_rx = 0;
 	uint16_t nb_hold = 0;
 	uint16_t rx_packet_len;
+	uint16_t rx_header_len;
 	uint16_t rx_stat_err0;
 	uint64_t dma_addr;
 	uint64_t pkt_flags;
@@ -2382,12 +2507,16 @@ ice_recv_pkts(void *rx_queue,
 		if (!(rx_stat_err0 & (1 << ICE_RX_FLEX_DESC_STATUS0_DD_S)))
 			break;
 
-		/* allocate mbuf */
+		if (rx_stat_err0 & (1 << ICE_RX_FLEX_DESC_STATUS0_HBO_S))
+			break;
+
+		/* allocate header mbuf */
 		nmb = rte_mbuf_raw_alloc(rxq->mp);
 		if (unlikely(!nmb)) {
 			rxq->vsi->adapter->pf.dev_data->rx_mbuf_alloc_failed++;
 			break;
 		}
+
 		rxd = *rxdp; /* copy descriptor in ring to temp variable*/
 
 		nb_hold++;
@@ -2400,24 +2529,55 @@ ice_recv_pkts(void *rx_queue,
 		dma_addr =
 			rte_cpu_to_le_64(rte_mbuf_data_iova_default(nmb));
 
-		/**
-		 * fill the read format of descriptor with physic address in
-		 * new allocated mbuf: nmb
-		 */
-		rxdp->read.hdr_addr = 0;
-		rxdp->read.pkt_addr = dma_addr;
+		if (rxq->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) {
+			/* allocate payload mbuf */
+			nmb_pay = rte_mbuf_raw_alloc(rxq->rxseg[1].mp);
+			if (unlikely(!nmb_pay)) {
+				rxq->vsi->adapter->pf.dev_data->rx_mbuf_alloc_failed++;
+				break;
+			}
+
+			nmb->next = nmb_pay;
+			nmb_pay->next = NULL;
 
-		/* calculate rx_packet_len of the received pkt */
-		rx_packet_len = (rte_le_to_cpu_16(rxd.wb.pkt_len) &
-				 ICE_RX_FLX_DESC_PKT_LEN_M) - rxq->crc_len;
+			/**
+			 * fill the read format of the descriptor with the physical
+			 * address of the newly allocated mbuf: nmb
+			 */
+			rxdp->read.hdr_addr = dma_addr;
+			rxdp->read.pkt_addr = rte_cpu_to_le_64(rte_mbuf_data_iova_default(nmb_pay));
+		} else {
+			/**
+			 * fill the read format of the descriptor with the physical
+			 * address of the newly allocated mbuf: nmb
+			 */
+			rxdp->read.hdr_addr = 0;
+			rxdp->read.pkt_addr = dma_addr;
+		}
 
 		/* fill old mbuf with received descriptor: rxd */
 		rxm->data_off = RTE_PKTMBUF_HEADROOM;
 		rte_prefetch0(RTE_PTR_ADD(rxm->buf_addr, RTE_PKTMBUF_HEADROOM));
-		rxm->nb_segs = 1;
-		rxm->next = NULL;
-		rxm->pkt_len = rx_packet_len;
-		rxm->data_len = rx_packet_len;
+		if (rxq->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) {
+			rxm->nb_segs = (uint16_t)(rxm->nb_segs + rxm->next->nb_segs);
+			rxm->next->next = NULL;
+			/* calculate rx_packet_len of the received pkt */
+			rx_header_len = rte_le_to_cpu_16(rxd.wb.hdr_len_sph_flex_flags1) &
+					ICE_RX_FLEX_DESC_HEADER_LEN_M;
+			rx_packet_len = (rte_le_to_cpu_16(rxd.wb.pkt_len) &
+					ICE_RX_FLX_DESC_PKT_LEN_M) - rxq->crc_len;
+			rxm->data_len = rx_header_len;
+			rxm->pkt_len = rx_header_len + rx_packet_len;
+			rxm->next->data_len = rx_packet_len;
+		} else {
+			rxm->nb_segs = 1;
+			rxm->next = NULL;
+			/* calculate rx_packet_len of the received pkt */
+			rx_packet_len = (rte_le_to_cpu_16(rxd.wb.pkt_len) &
+					ICE_RX_FLX_DESC_PKT_LEN_M) - rxq->crc_len;
+			rxm->data_len = rx_packet_len;
+			rxm->pkt_len = rx_packet_len;
+		}
 		rxm->port = rxq->port_id;
 		rxm->packet_type = ptype_tbl[ICE_RX_FLEX_DESC_PTYPE_M &
 			rte_le_to_cpu_16(rxd.wb.ptype_flex_flags0)];
diff --git a/drivers/net/ice/ice_rxtx.h b/drivers/net/ice/ice_rxtx.h
index bb18a01951..611dbc8503 100644
--- a/drivers/net/ice/ice_rxtx.h
+++ b/drivers/net/ice/ice_rxtx.h
@@ -16,6 +16,9 @@
 #define ICE_RX_MAX_BURST 32
 #define ICE_TX_MAX_BURST 32
 
+/* Maximal number of segments to split. */
+#define ICE_RX_MAX_NSEG 2
+
 #define ICE_CHK_Q_ENA_COUNT        100
 #define ICE_CHK_Q_ENA_INTERVAL_US  100
 
@@ -43,6 +46,11 @@
 extern uint64_t ice_timestamp_dynflag;
 extern int ice_timestamp_dynfield_offset;
 
+/* Max header size can be 2K - 64 bytes */
+#define ICE_RX_HDR_BUF_SIZE    (2048 - 64)
+
+#define ICE_HEADER_SPLIT_ENA   BIT(0)
+
 typedef void (*ice_rx_release_mbufs_t)(struct ice_rx_queue *rxq);
 typedef void (*ice_tx_release_mbufs_t)(struct ice_tx_queue *txq);
 typedef void (*ice_rxd_to_pkt_fields_t)(struct ice_rx_queue *rxq,
@@ -53,6 +61,12 @@ struct ice_rx_entry {
 	struct rte_mbuf *mbuf;
 };
 
+enum ice_rx_dtype {
+	ICE_RX_DTYPE_NO_SPLIT       = 0,
+	ICE_RX_DTYPE_HEADER_SPLIT   = 1,
+	ICE_RX_DTYPE_SPLIT_ALWAYS   = 2,
+};
+
 struct ice_rx_queue {
 	struct rte_mempool *mp; /* mbuf pool to populate RX ring */
 	volatile union ice_rx_flex_desc *rx_ring;/* RX ring virtual address */
@@ -95,6 +109,8 @@ struct ice_rx_queue {
 	uint32_t time_high;
 	uint32_t hw_register_set;
 	const struct rte_memzone *mz;
+	struct rte_eth_rxseg_split rxseg[ICE_RX_MAX_NSEG];
+	uint32_t rxseg_nb;
 };
 
 struct ice_tx_entry {
diff --git a/drivers/net/ice/ice_rxtx_vec_common.h b/drivers/net/ice/ice_rxtx_vec_common.h
index 2dd2d83650..eec6ea2134 100644
--- a/drivers/net/ice/ice_rxtx_vec_common.h
+++ b/drivers/net/ice/ice_rxtx_vec_common.h
@@ -291,6 +291,9 @@ ice_rx_vec_queue_default(struct ice_rx_queue *rxq)
 	if (rxq->offloads & RTE_ETH_RX_OFFLOAD_TIMESTAMP)
 		return -1;
 
+	if (rxq->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT)
+		return -1;
+
 	if (rxq->offloads & ICE_RX_VECTOR_OFFLOAD)
 		return ICE_VECTOR_OFFLOAD_PATH;
 
-- 
2.25.1


^ permalink raw reply	[flat|nested] 88+ messages in thread

* [PATCH v8 3/3] app/testpmd: add rxhdrs commands and parameters
  2022-06-01 13:50 ` [PATCH v8 0/3] ethdev: introduce protocol type based header split wenxuanx.wu
                     ` (2 preceding siblings ...)
  2022-06-01 13:50   ` [PATCH v8 2/3] net/ice: support buffer split in Rx path wenxuanx.wu
@ 2022-06-01 13:50   ` wenxuanx.wu
  2022-06-02 13:20   ` [PATCH v8 0/3] ethdev: introduce protocol type based header split Andrew Rybchenko
  4 siblings, 0 replies; 88+ messages in thread
From: wenxuanx.wu @ 2022-06-01 13:50 UTC (permalink / raw)
  To: thomas, andrew.rybchenko, xiaoyun.li, ferruh.yigit,
	aman.deep.singh, dev, yuying.zhang, qi.z.zhang, jerinjacobk
  Cc: stephen, Wenxuan Wu, Xuan Ding, Yuan Wang

From: Wenxuan Wu <wenxuanx.wu@intel.com>

Add command line parameter:
--rxhdrs=mac[,ipv4,udp]

Set the protocol_hdr of segments to scatter packets on receiving if
the split feature is engaged. It affects only the queues configured
with the BUFFER_SPLIT offload flag.

Add interactive mode command:
testpmd> set rxhdrs mac,ipv4,l3,tcp,udp,sctp
(protocol sequence and nb_segs should be valid)

The protocol split feature is off by default. To enable protocol split,
you need to:
1. Start testpmd with two mempools, e.g. --mbuf-size=2048,2048
2. Configure the Rx queue with rx_offload buffer split on.
3. Set the protocol types of buffer split, e.g. set rxhdrs mac,ipv4
(Supported protocols: mac|ipv4|ipv6|l3|tcp|udp|sctp|l4|inner_mac|
 inner_ipv4|inner_ipv6|inner_l3|inner_tcp|inner_udp|inner_sctp|inner_l4)
A complete example session is shown below.
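
For example, a full session could look like this (illustrative only;
the EAL options and port id depend on the setup, and the buffer_split
Rx offload name assumes the PMD exposes that capability):

    ./dpdk-testpmd -l 0-3 -n 4 -- -i --mbuf-size=2048,2048
    testpmd> port stop all
    testpmd> port config 0 rx_offload buffer_split on
    testpmd> port start all
    testpmd> set rxhdrs mac,ipv4
    testpmd> show config rxhdrs
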
Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
Reviewed-by: Qi Zhang <qi.z.zhang@intel.com>
---
 app/test-pmd/cmdline.c    | 127 +++++++++++++++++++++++++++++++++++++-
 app/test-pmd/config.c     |  81 ++++++++++++++++++++++++
 app/test-pmd/parameters.c |  15 ++++-
 app/test-pmd/testpmd.c    |   6 +-
 app/test-pmd/testpmd.h    |   6 ++
 5 files changed, 228 insertions(+), 7 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 6ffea8e21a..52e98e1c06 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -183,7 +183,7 @@ static void cmd_help_long_parsed(void *parsed_result,
 			"show (rxq|txq) info (port_id) (queue_id)\n"
 			"    Display information for configured RX/TX queue.\n\n"
 
-			"show config (rxtx|cores|fwd|rxoffs|rxpkts|txpkts)\n"
+			"show config (rxtx|cores|fwd|rxoffs|rxpkts|rxhdrs|txpkts)\n"
 			"    Display the given configuration.\n\n"
 
 			"read rxd (port_id) (queue_id) (rxd_id)\n"
@@ -316,6 +316,15 @@ static void cmd_help_long_parsed(void *parsed_result,
 			" Affects only the queues configured with split"
 			" offloads.\n\n"
 
+			"set rxhdrs (mac[,ipv4])*\n"
+			"    Set the protocol hdr of each segment to scatter"
+			" packets on receiving if split feature is engaged."
+			" Affects only the queues configured with split"
+			" offloads.\n\n"
+			"    Supported proto headers: mac|ipv4|ipv6|l3|tcp|udp|sctp|l4|"
+			"inner_mac|inner_ipv4|inner_ipv6|inner_l3|inner_tcp|"
+			"inner_udp|inner_sctp|inner_l4\n\n"
+
 			"set txpkts (x[,y]*)\n"
 			"    Set the length of each segment of TXONLY"
 			" and optionally CSUM packets.\n\n"
@@ -3617,6 +3626,72 @@ cmdline_parse_inst_t cmd_stop = {
 	},
 };
 
+static unsigned int
+get_ptype(char *value)
+{
+	uint32_t protocol;
+
+	if (!strcmp(value, "mac"))
+		protocol = RTE_PTYPE_L2_ETHER;
+	else if (!strcmp(value, "ipv4"))
+		protocol = RTE_PTYPE_L3_IPV4;
+	else if (!strcmp(value, "ipv6"))
+		protocol = RTE_PTYPE_L3_IPV6;
+	else if (!strcmp(value, "l3"))
+		protocol = RTE_PTYPE_L3_IPV4|RTE_PTYPE_L3_IPV6;
+	else if (!strcmp(value, "tcp"))
+		protocol = RTE_PTYPE_L4_TCP;
+	else if (!strcmp(value, "udp"))
+		protocol = RTE_PTYPE_L4_UDP;
+	else if (!strcmp(value, "sctp"))
+		protocol = RTE_PTYPE_L4_SCTP;
+	else if (!strcmp(value, "l4"))
+		protocol = RTE_PTYPE_L4_TCP|RTE_PTYPE_L4_UDP|RTE_PTYPE_L4_SCTP;
+	else if (!strcmp(value, "inner_mac"))
+		protocol = RTE_PTYPE_INNER_L2_ETHER;
+	else if (!strcmp(value, "inner_ipv4"))
+		protocol = RTE_PTYPE_INNER_L3_IPV4;
+	else if (!strcmp(value, "inner_ipv6"))
+		protocol = RTE_PTYPE_INNER_L3_IPV6;
+	else if (!strcmp(value, "inner_l3"))
+		protocol = RTE_PTYPE_INNER_L3_IPV4|RTE_PTYPE_INNER_L3_IPV6;
+	else if (!strcmp(value, "inner_tcp"))
+		protocol = RTE_PTYPE_INNER_L4_TCP;
+	else if (!strcmp(value, "inner_udp"))
+		protocol = RTE_PTYPE_INNER_L4_UDP;
+	else if (!strcmp(value, "inner_sctp"))
+		protocol = RTE_PTYPE_INNER_L4_SCTP;
+	else if (!strcmp(value, "inner_l4"))
+		protocol = RTE_PTYPE_INNER_L4_TCP|RTE_PTYPE_INNER_L4_UDP|RTE_PTYPE_INNER_L4_SCTP;
+	else {
+		fprintf(stderr, "Unknown protocol name: %s\n", value);
+		return 0;
+	}
+	return protocol;
+}
+
+/* *** SET RXHDRSLIST *** */
+
+unsigned int
+parse_hdrs_list(const char *str, const char *item_name, unsigned int max_items,
+				unsigned int *parsed_items, int check_hdrs_sequence)
+{
+	unsigned int nb_item = 0;
+	char *cur;
+	char *tmp;
+	char *str2 = strdup(str);
+
+	if (str2 == NULL)
+		return 0;
+	cur = strtok_r(str2, ",", &tmp);
+	while (cur != NULL) {
+		/* Stop before overrunning the caller's array. */
+		if (nb_item >= max_items) {
+			fprintf(stderr, "Number of %s > %u (maximum items)\n",
+				item_name, max_items);
+			free(str2);
+			return 0;
+		}
+		parsed_items[nb_item] = get_ptype(cur);
+		cur = strtok_r(NULL, ",", &tmp);
+		nb_item++;
+	}
+	free(str2);
+	/* The header sequence check is not implemented yet. */
+	RTE_SET_USED(check_hdrs_sequence);
+	return nb_item;
+}
+
 /* *** SET CORELIST and PORTLIST CONFIGURATION *** */
 
 unsigned int
@@ -3986,6 +4061,49 @@ cmdline_parse_inst_t cmd_set_rxpkts = {
 	},
 };
 
+/* *** SET SEGMENT HEADERS OF RX PACKETS SPLIT *** */
+struct cmd_set_rxhdrs_result {
+	cmdline_fixed_string_t cmd_keyword;
+	cmdline_fixed_string_t rxhdrs;
+	cmdline_fixed_string_t seg_hdrs;
+};
+
+static void
+cmd_set_rxhdrs_parsed(void *parsed_result,
+		      __rte_unused struct cmdline *cl,
+		      __rte_unused void *data)
+{
+	struct cmd_set_rxhdrs_result *res;
+	unsigned int seg_hdrs[MAX_SEGS_BUFFER_SPLIT];
+	unsigned int nb_segs;
+
+	res = parsed_result;
+	nb_segs = parse_hdrs_list(res->seg_hdrs, "segment hdrs",
+				  MAX_SEGS_BUFFER_SPLIT, seg_hdrs, 0);
+	if (nb_segs >= 1)
+		set_rx_pkt_hdrs(seg_hdrs, nb_segs);
+	cmd_reconfig_device_queue(RTE_PORT_ALL, 0, 1);
+}
+
+cmdline_parse_token_string_t cmd_set_rxhdrs_keyword =
+	TOKEN_STRING_INITIALIZER(struct cmd_set_rxhdrs_result,
+				 cmd_keyword, "set");
+cmdline_parse_token_string_t cmd_set_rxhdrs_name =
+	TOKEN_STRING_INITIALIZER(struct cmd_set_rxhdrs_result,
+				 rxhdrs, "rxhdrs");
+cmdline_parse_token_string_t cmd_set_rxhdrs_seg_hdrs =
+	TOKEN_STRING_INITIALIZER(struct cmd_set_rxhdrs_result,
+				 seg_hdrs, NULL);
+cmdline_parse_inst_t cmd_set_rxhdrs = {
+	.f = cmd_set_rxhdrs_parsed,
+	.data = NULL,
+	.help_str = "set rxhdrs <mac[,ipv4]*>",
+	.tokens = {
+		(void *)&cmd_set_rxhdrs_keyword,
+		(void *)&cmd_set_rxhdrs_name,
+		(void *)&cmd_set_rxhdrs_seg_hdrs,
+		NULL,
+	},
+};
 /* *** SET SEGMENT LENGTHS OF TXONLY PACKETS *** */
 
 struct cmd_set_txpkts_result {
@@ -8058,6 +8176,8 @@ static void cmd_showcfg_parsed(void *parsed_result,
 		show_rx_pkt_offsets();
 	else if (!strcmp(res->what, "rxpkts"))
 		show_rx_pkt_segments();
+	else if (!strcmp(res->what, "rxhdrs"))
+		show_rx_pkt_hdrs();
 	else if (!strcmp(res->what, "txpkts"))
 		show_tx_pkt_segments();
 	else if (!strcmp(res->what, "txtimes"))
@@ -8070,12 +8190,12 @@ cmdline_parse_token_string_t cmd_showcfg_port =
 	TOKEN_STRING_INITIALIZER(struct cmd_showcfg_result, cfg, "config");
 cmdline_parse_token_string_t cmd_showcfg_what =
 	TOKEN_STRING_INITIALIZER(struct cmd_showcfg_result, what,
-				 "rxtx#cores#fwd#rxoffs#rxpkts#txpkts#txtimes");
+				 "rxtx#cores#fwd#rxoffs#rxpkts#rxhdrs#txpkts#txtimes");
 
 cmdline_parse_inst_t cmd_showcfg = {
 	.f = cmd_showcfg_parsed,
 	.data = NULL,
-	.help_str = "show config rxtx|cores|fwd|rxoffs|rxpkts|txpkts|txtimes",
+	.help_str = "show config rxtx|cores|fwd|rxoffs|rxpkts|rxhdrs|txpkts|txtimes",
 	.tokens = {
 		(void *)&cmd_showcfg_show,
 		(void *)&cmd_showcfg_port,
@@ -17833,6 +17953,7 @@ cmdline_parse_ctx_t main_ctx[] = {
 	(cmdline_parse_inst_t *)&cmd_set_log,
 	(cmdline_parse_inst_t *)&cmd_set_rxoffs,
 	(cmdline_parse_inst_t *)&cmd_set_rxpkts,
+	(cmdline_parse_inst_t *)&cmd_set_rxhdrs,
 	(cmdline_parse_inst_t *)&cmd_set_txpkts,
 	(cmdline_parse_inst_t *)&cmd_set_txsplit,
 	(cmdline_parse_inst_t *)&cmd_set_txtimes,
diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index cc8e7aa138..742473456a 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -4757,6 +4757,87 @@ show_rx_pkt_segments(void)
 		printf("%hu\n", rx_pkt_seg_lengths[i]);
 	}
 }
+
+static const char *
+get_ptype_str(uint32_t ptype)
+{
+	switch (ptype) {
+	case RTE_PTYPE_L2_ETHER:
+		return "mac";
+	case RTE_PTYPE_L3_IPV4:
+		return "ipv4";
+	case RTE_PTYPE_L3_IPV6:
+		return "ipv6";
+	case RTE_PTYPE_L3_IPV6|RTE_PTYPE_L3_IPV4:
+		return "l3";
+	case RTE_PTYPE_L4_TCP:
+		return "tcp";
+	case RTE_PTYPE_L4_UDP:
+		return "udp";
+	case RTE_PTYPE_L4_SCTP:
+		return "sctp";
+	case RTE_PTYPE_L4_TCP|RTE_PTYPE_L4_UDP|RTE_PTYPE_L4_SCTP:
+		return "l4";
+	case RTE_PTYPE_INNER_L2_ETHER:
+		return "inner_mac";
+	case RTE_PTYPE_INNER_L3_IPV4:
+		return "inner_ipv4";
+	case RTE_PTYPE_INNER_L3_IPV6:
+		return "inner_ipv6";
+	case RTE_PTYPE_INNER_L4_TCP:
+		return "inner_tcp";
+	case RTE_PTYPE_INNER_L4_UDP:
+		return "inner_udp";
+	case RTE_PTYPE_INNER_L4_SCTP:
+		return "inner_sctp";
+	case RTE_PTYPE_INNER_L4_TCP|RTE_PTYPE_INNER_L4_UDP|RTE_PTYPE_INNER_L4_SCTP:
+		return "inner_l4";
+	default:
+		return "unknown";
+	}
+}
+
+void
+show_rx_pkt_hdrs(void)
+{
+	uint32_t i, n;
+
+	n = rx_pkt_nb_segs;
+	printf("Number of segments: %u\n", n);
+	if (n) {
+		printf("Packet segs: ");
+		for (i = 0; i != n - 1; i++)
+			printf("%s, ", get_ptype_str(rx_pkt_hdr_protos[i]));
+		printf("%s\n", rx_pkt_hdr_protos[i] == 0 ? "payload" :
+						get_ptype_str(rx_pkt_hdr_protos[i]));
+	}
+}
+
+void
+set_rx_pkt_hdrs(unsigned int *seg_hdrs, unsigned int nb_segs)
+{
+	unsigned int i;
+
+	if (nb_segs >= MAX_SEGS_BUFFER_SPLIT) {
+		printf("nb segments per RX packets=%u >= "
+		       "MAX_SEGS_BUFFER_SPLIT - ignored\n", nb_segs);
+		return;
+	}
+
+	/*
+	 * No extra check here, the segment length will be checked by PMD
+	 * in the extended queue setup.
+	 */
+	for (i = 0; i < nb_segs; i++) {
+		if (!(seg_hdrs[i] & RTE_BUFFER_SPLIT_PROTO_HDR_MASK)) {
+			printf("ptype [%u]=%u is not supported - give up\n",
+			       i, seg_hdrs[i]);
+			return;
+		}
+	}
+
+	for (i = 0; i < nb_segs; i++)
+		rx_pkt_hdr_protos[i] = (uint32_t) seg_hdrs[i];
+	/*
+	 * nb_segs counts only the protocol headers; the trailing payload
+	 * segment is implicit, so rx_pkt_nb_segs is nb_segs plus one.
+	 */
+	rx_pkt_nb_segs = (uint8_t) nb_segs + 1;
+}
 
 void
 set_rx_pkt_segments(unsigned int *seg_lengths, unsigned int nb_segs)
diff --git a/app/test-pmd/parameters.c b/app/test-pmd/parameters.c
index daf6a31b2b..f86d626276 100644
--- a/app/test-pmd/parameters.c
+++ b/app/test-pmd/parameters.c
@@ -161,6 +161,7 @@ usage(char* progname)
 	       " Used mainly with PCAP drivers.\n");
 	printf("  --rxoffs=X[,Y]*: set RX segment offsets for split.\n");
 	printf("  --rxpkts=X[,Y]*: set RX segment sizes to split.\n");
+	printf("  --rxhdrs=mac[,ipv4]*: set RX segment protocol to split.\n");
 	printf("  --txpkts=X[,Y]*: set TX segment sizes"
 		" or total packet length.\n");
 	printf("  --txonly-multi-flow: generate multiple flows in txonly mode\n");
@@ -673,6 +674,7 @@ launch_args_parse(int argc, char** argv)
 		{ "flow-isolate-all",	        0, 0, 0 },
 		{ "rxoffs",			1, 0, 0 },
 		{ "rxpkts",			1, 0, 0 },
+		{ "rxhdrs",			1, 0, 0 },
 		{ "txpkts",			1, 0, 0 },
 		{ "txonly-multi-flow",		0, 0, 0 },
 		{ "rxq-share",			2, 0, 0 },
@@ -1327,7 +1329,6 @@ launch_args_parse(int argc, char** argv)
 			if (!strcmp(lgopts[opt_idx].name, "rxpkts")) {
 				unsigned int seg_len[MAX_SEGS_BUFFER_SPLIT];
 				unsigned int nb_segs;
-
 				nb_segs = parse_item_list
 						(optarg, "rxpkt segments",
 						 MAX_SEGS_BUFFER_SPLIT,
@@ -1337,6 +1338,18 @@ launch_args_parse(int argc, char** argv)
 				else
 					rte_exit(EXIT_FAILURE, "bad rxpkts\n");
 			}
+			if (!strcmp(lgopts[opt_idx].name, "rxhdrs")) {
+				unsigned int seg_hdrs[MAX_SEGS_BUFFER_SPLIT];
+				unsigned int nb_segs;
+				nb_segs = parse_hdrs_list
+						(optarg, "rxhdr segments",
+						MAX_SEGS_BUFFER_SPLIT,
+						seg_hdrs, 0);
+				if (nb_segs >= 1)
+					set_rx_pkt_hdrs(seg_hdrs, nb_segs);
+				else
+					rte_exit(EXIT_FAILURE, "bad rxhdrs\n");
+			}
 			if (!strcmp(lgopts[opt_idx].name, "txpkts")) {
 				unsigned seg_lengths[RTE_MAX_SEGS_PER_PKT];
 				unsigned int nb_segs;
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index fe2ce19f99..77379b7aa9 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -240,6 +240,7 @@ uint16_t rx_pkt_seg_lengths[MAX_SEGS_BUFFER_SPLIT];
 uint8_t  rx_pkt_nb_segs; /**< Number of segments to split */
 uint16_t rx_pkt_seg_offsets[MAX_SEGS_BUFFER_SPLIT];
 uint8_t  rx_pkt_nb_offs; /**< Number of specified offsets */
+uint32_t rx_pkt_hdr_protos[MAX_SEGS_BUFFER_SPLIT];
 
 /*
  * Configuration of packet segments used by the "txonly" processing engine.
@@ -2586,12 +2587,11 @@ rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
 		mp_n = (i > mbuf_data_size_n) ? mbuf_data_size_n - 1 : i;
 		mpx = mbuf_pool_find(socket_id, mp_n);
-		/* Handle zero as mbuf data buffer size. */
-		rx_seg->length = rx_pkt_seg_lengths[i] ?
-				   rx_pkt_seg_lengths[i] :
-				   mbuf_data_size[mp_n];
+		rx_seg->length = rx_pkt_seg_lengths[i];
 		rx_seg->offset = i < rx_pkt_nb_offs ?
 				   rx_pkt_seg_offsets[i] : 0;
 		rx_seg->mp = mpx ? mpx : mp;
+		rx_seg->proto_hdr = rx_pkt_hdr_protos[i];
 	}
 	rx_conf->rx_nseg = rx_pkt_nb_segs;
 	rx_conf->rx_seg = rx_useg;
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index 31f766c965..e791b9becd 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -534,6 +534,7 @@ extern uint32_t max_rx_pkt_len;
  * Configuration of packet segments used to scatter received packets
  * if some of split features is configured.
  */
+extern uint32_t rx_pkt_hdr_protos[MAX_SEGS_BUFFER_SPLIT];
 extern uint16_t rx_pkt_seg_lengths[MAX_SEGS_BUFFER_SPLIT];
 extern uint8_t  rx_pkt_nb_segs; /**< Number of segments to split */
 extern uint16_t rx_pkt_seg_offsets[MAX_SEGS_BUFFER_SPLIT];
@@ -864,6 +865,9 @@ inc_tx_burst_stats(struct fwd_stream *fs, uint16_t nb_tx)
 unsigned int parse_item_list(const char *str, const char *item_name,
 			unsigned int max_items,
 			unsigned int *parsed_items, int check_unique_values);
+unsigned int parse_hdrs_list(const char *str, const char *item_name,
+			unsigned int max_item,
+			unsigned int *parsed_items, int check_hdrs_sequence);
 void launch_args_parse(int argc, char** argv);
 void cmdline_read_from_file(const char *filename);
 void prompt(void);
@@ -1018,6 +1022,8 @@ void set_record_core_cycles(uint8_t on_off);
 void set_record_burst_stats(uint8_t on_off);
 void set_verbose_level(uint16_t vb_level);
 void set_rx_pkt_segments(unsigned int *seg_lengths, unsigned int nb_segs);
+void set_rx_pkt_hdrs(unsigned int *seg_protos, unsigned int nb_segs);
+void show_rx_pkt_hdrs(void);
 void show_rx_pkt_segments(void);
 void set_rx_pkt_offsets(unsigned int *seg_offsets, unsigned int nb_offs);
 void show_rx_pkt_offsets(void);
-- 
2.25.1


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v8 0/3] ethdev: introduce protocol type based header split
  2022-06-01 13:50 ` [PATCH v8 0/3] ethdev: introduce protocol type based header split wenxuanx.wu
                     ` (3 preceding siblings ...)
  2022-06-01 13:50   ` [PATCH v8 3/3] app/testpmd: add rxhdrs commands and parameters wenxuanx.wu
@ 2022-06-02 13:20   ` Andrew Rybchenko
  4 siblings, 0 replies; 88+ messages in thread
From: Andrew Rybchenko @ 2022-06-02 13:20 UTC (permalink / raw)
  To: wenxuanx.wu, thomas, xiaoyun.li, ferruh.yigit, aman.deep.singh,
	dev, yuying.zhang, qi.z.zhang, jerinjacobk
  Cc: stephen

On 6/1/22 16:50, wenxuanx.wu@intel.com wrote:
> From: Wenxuan Wu <wenxuanx.wu@intel.com>
> 
> Buffer split consists of splitting a received packet into two separate

'two' is misleading above. Buffer split supports many segments.

> regions based on the packet content.

As far as I know buffer split is not based on packet content.

> It is useful in some scenarios,
> such as GPU acceleration. The splitting will help to enable true zero
> copy and hence improve the performance significantly.
> 
> This patchset extends the current buffer split to support multi protocol
> headers split. When Rx queue is configured with buffer split feature,
> packets received will be directly split into two different mempools.

v8?

> 
> v6->v7:
> *fix supported header protocol check.
> *add rxhdrs commands and parameters.
> v5->v6:
> *Change doc and design of struct rte_eth_rxseg_split, support multi
> segments protocol_hdr configuration.
> v4->v5:
> * Use protocol and mbuf_offset based buffer split instead of header split.
> * Use RTE_PTYPE* instead of enum rte_eth_rx_header_split_protocol_type.
> * Improve the description of rte_eth_rxseg_split.proto.
> 
> v3->v4:
> * Use RTE_ETH_RX_HEADER_SPLIT_NONE instead of 0.
> 
> v2->v3:
> * Fix a PMD bug.
> * Add rx queue header split check.
> * Revise the log and doc.
> 
> v1->v2:
> * Add support for all header split protocol types.
> 
> Wenxuan Wu (3):
>    ethdev: introduce protocol header based buffer split
>    net/ice: support buffer split in Rx path
>    app/testpmd: add rxhdrs commands and parameters
> 
>   app/test-pmd/cmdline.c                | 127 ++++++++++++++-
>   app/test-pmd/config.c                 |  81 ++++++++++
>   app/test-pmd/parameters.c             |  15 +-
>   app/test-pmd/testpmd.c                |   6 +-
>   app/test-pmd/testpmd.h                |   6 +
>   drivers/net/ice/ice_ethdev.c          |  10 +-
>   drivers/net/ice/ice_rxtx.c            | 220 ++++++++++++++++++++++----
>   drivers/net/ice/ice_rxtx.h            |  16 ++
>   drivers/net/ice/ice_rxtx_vec_common.h |   3 +
>   lib/ethdev/rte_ethdev.c               |  40 ++++-
>   lib/ethdev/rte_ethdev.h               |  28 +++-
>   11 files changed, 505 insertions(+), 47 deletions(-)
> 


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v8 1/3] ethdev: introduce protocol header based buffer split
  2022-06-01 13:50   ` [PATCH v8 1/3] ethdev: introduce protocol header " wenxuanx.wu
@ 2022-06-02 13:20     ` Andrew Rybchenko
  2022-06-02 13:44       ` Ding, Xuan
  0 siblings, 1 reply; 88+ messages in thread
From: Andrew Rybchenko @ 2022-06-02 13:20 UTC (permalink / raw)
  To: wenxuanx.wu, thomas, xiaoyun.li, ferruh.yigit, aman.deep.singh,
	dev, yuying.zhang, qi.z.zhang, jerinjacobk
  Cc: stephen, Xuan Ding, Yuan Wang, Ray Kinsella

There are two v8 1/3 patches in my mailbox. Which one is the right one?

On 6/1/22 16:50, wenxuanx.wu@intel.com wrote:
> From: Wenxuan Wu <wenxuanx.wu@intel.com>
> 
> Currently, Rx buffer split supports length based split. With Rx queue
> offload RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT enabled and Rx packet segment
> configured, PMD will be able to split the received packets into
> multiple segments.
> 
> However, length based buffer split is not suitable for NICs that do split
> based on protocol headers. Given a arbitrarily variable length in Rx packet
> segment, it is almost impossible to pass a fixed protocol header to PMD.
> Besides, the existence of tunneling results in the composition of a packet
> is various, which makes the situation even worse.
> 
> This patch extends current buffer split to support protocol header based
> buffer split. A new proto_hdr field is introduced in the reserved field
> of rte_eth_rxseg_split structure to specify protocol header. The proto_hdr
> field defines the split position of packet, splitting will always happens
> after the protocol header defined in the Rx packet segment. When Rx queue
> offload RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT is enabled and corresponding
> protocol header is configured, PMD will split the ingress packets into
> multiple segments.
> 
> struct rte_eth_rxseg_split {
> 
>          struct rte_mempool *mp; /* memory pools to allocate segment from */
>          uint16_t length; /* segment maximal data length,
>                              configures "split point" */
>          uint16_t offset; /* data offset from beginning
>                              of mbuf data buffer */
>          uint32_t proto_hdr; /* inner/outer L2/L3/L4 protocol header,
> 			       configures "split point" */
>      };
> 
> Both inner and outer L2/L3/L4 level protocol header split can be supported.
> Corresponding protocol header capability is RTE_PTYPE_L2_ETHER,
> RTE_PTYPE_L3_IPV4, RTE_PTYPE_L3_IPV6, RTE_PTYPE_L4_TCP, RTE_PTYPE_L4_UDP,
> RTE_PTYPE_L4_SCTP, RTE_PTYPE_INNER_L2_ETHER, RTE_PTYPE_INNER_L3_IPV4,
> RTE_PTYPE_INNER_L3_IPV6, RTE_PTYPE_INNER_L4_TCP, RTE_PTYPE_INNER_L4_UDP,
> RTE_PTYPE_INNER_L4_SCTP.
> 
> For example, let's suppose we configured the Rx queue with the
> following segments:
>      seg0 - pool0, proto_hdr0=RTE_PTYPE_L3_IPV4, off0=2B
>      seg1 - pool1, proto_hdr1=RTE_PTYPE_L4_UDP, off1=128B
>      seg2 - pool2, off1=0B
> 
> The packet consists of MAC_IPV4_UDP_PAYLOAD will be split like
> following:
>      seg0 - ipv4 header @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from pool0
>      seg1 - udp header @ 128 in mbuf from pool1
>      seg2 - payload @ 0 in mbuf from pool2
> 
> Now buffer split can be configured in two modes. For length based
> buffer split, the mp, length, offset field in Rx packet segment should
> be configured, while the proto_hdr field should not be configured.
> For protocol header based buffer split, the mp, offset, proto_hdr field
> in Rx packet segment should be configured, while the length field should
> not be configured.
> 
> The split limitations imposed by underlying PMD is reported in the
> rte_eth_dev_info->rx_seg_capa field. The memory attributes for the split
> parts may differ either, dpdk memory and external memory, respectively.
> 
> Signed-off-by: Xuan Ding <xuan.ding@intel.com>
> Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
> Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
> Reviewed-by: Qi Zhang <qi.z.zhang@intel.com>
> Acked-by: Ray Kinsella <mdr@ashroe.eu>

[snip]

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v8 1/3] ethdev: introduce protocol hdr based buffer split
  2022-06-01 13:50   ` [PATCH v8 1/3] ethdev: introduce protocol hdr based buffer split wenxuanx.wu
@ 2022-06-02 13:20     ` Andrew Rybchenko
  2022-06-03 16:30       ` Ding, Xuan
  0 siblings, 1 reply; 88+ messages in thread
From: Andrew Rybchenko @ 2022-06-02 13:20 UTC (permalink / raw)
  To: wenxuanx.wu, thomas, xiaoyun.li, ferruh.yigit, aman.deep.singh,
	dev, yuying.zhang, qi.z.zhang, jerinjacobk
  Cc: stephen, Xuan Ding, Yuan Wang, Ray Kinsella

Is it the right one since it is listed in patchwork?

On 6/1/22 16:50, wenxuanx.wu@intel.com wrote:
> From: Wenxuan Wu <wenxuanx.wu@intel.com>
> 
> Currently, Rx buffer split supports length based split. With Rx queue
> offload RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT enabled and Rx packet segment
> configured, PMD will be able to split the received packets into
> multiple segments.
> 
> However, length based buffer split is not suitable for NICs that do split
> based on protocol headers. Given a arbitrarily variable length in Rx packet

a -> an

> segment, it is almost impossible to pass a fixed protocol header to PMD.
> Besides, the existence of tunneling results in the composition of a packet
> is various, which makes the situation even worse.
> 
> This patch extends current buffer split to support protocol header based
> buffer split. A new proto_hdr field is introduced in the reserved field
> of rte_eth_rxseg_split structure to specify protocol header. The proto_hdr
> field defines the split position of packet, splitting will always happens
> after the protocol header defined in the Rx packet segment. When Rx queue
> offload RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT is enabled and corresponding
> protocol header is configured, PMD will split the ingress packets into
> multiple segments.
> 
> struct rte_eth_rxseg_split {
> 
>          struct rte_mempool *mp; /* memory pools to allocate segment from */
>          uint16_t length; /* segment maximal data length,
>                              configures "split point" */
>          uint16_t offset; /* data offset from beginning
>                              of mbuf data buffer */
>          uint32_t proto_hdr; /* inner/outer L2/L3/L4 protocol header,
> 			       configures "split point" */
>      };
> 
> Both inner and outer L2/L3/L4 level protocol header split can be supported.
> Corresponding protocol header capability is RTE_PTYPE_L2_ETHER,
> RTE_PTYPE_L3_IPV4, RTE_PTYPE_L3_IPV6, RTE_PTYPE_L4_TCP, RTE_PTYPE_L4_UDP,
> RTE_PTYPE_L4_SCTP, RTE_PTYPE_INNER_L2_ETHER, RTE_PTYPE_INNER_L3_IPV4,
> RTE_PTYPE_INNER_L3_IPV6, RTE_PTYPE_INNER_L4_TCP, RTE_PTYPE_INNER_L4_UDP,
> RTE_PTYPE_INNER_L4_SCTP.
> 
> For example, let's suppose we configured the Rx queue with the
> following segments:
>      seg0 - pool0, proto_hdr0=RTE_PTYPE_L3_IPV4, off0=2B
>      seg1 - pool1, proto_hdr1=RTE_PTYPE_L4_UDP, off1=128B
>      seg2 - pool2, off1=0B
> 
> The packet consists of MAC_IPV4_UDP_PAYLOAD will be split like
> following:
>      seg0 - ipv4 header @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from pool0
>      seg1 - udp header @ 128 in mbuf from pool1
>      seg2 - payload @ 0 in mbuf from pool2

It must be defined how ICMPv4 packets will be split in such case.
And how UDP over IPv6 will be split.
> 
> > Now buffer split can be configured in two modes. For length based
> buffer split, the mp, length, offset field in Rx packet segment should
> be configured, while the proto_hdr field should not be configured.
> For protocol header based buffer split, the mp, offset, proto_hdr field
> in Rx packet segment should be configured, while the length field should
> not be configured.
> 
> The split limitations imposed by underlying PMD is reported in the
> rte_eth_dev_info->rx_seg_capa field. The memory attributes for the split
> parts may differ either, dpdk memory and external memory, respectively.
> 
> Signed-off-by: Xuan Ding <xuan.ding@intel.com>
> Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
> Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
> Reviewed-by: Qi Zhang <qi.z.zhang@intel.com>
> Acked-by: Ray Kinsella <mdr@ashroe.eu>
> ---
>   lib/ethdev/rte_ethdev.c | 40 +++++++++++++++++++++++++++++++++-------
>   lib/ethdev/rte_ethdev.h | 28 +++++++++++++++++++++++++++-
>   2 files changed, 60 insertions(+), 8 deletions(-)
> 
> diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
> index 29a3d80466..fbd55cdd9d 100644
> --- a/lib/ethdev/rte_ethdev.c
> +++ b/lib/ethdev/rte_ethdev.c
> @@ -1661,6 +1661,7 @@ rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
>   		struct rte_mempool *mpl = rx_seg[seg_idx].mp;
>   		uint32_t length = rx_seg[seg_idx].length;
>   		uint32_t offset = rx_seg[seg_idx].offset;
> +		uint32_t proto_hdr = rx_seg[seg_idx].proto_hdr;
>   
>   		if (mpl == NULL) {
>   			RTE_ETHDEV_LOG(ERR, "null mempool pointer\n");
> @@ -1694,13 +1695,38 @@ rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
>   		}
>   		offset += seg_idx != 0 ? 0 : RTE_PKTMBUF_HEADROOM;
>   		*mbp_buf_size = rte_pktmbuf_data_room_size(mpl);
> -		length = length != 0 ? length : *mbp_buf_size;
> -		if (*mbp_buf_size < length + offset) {
> -			RTE_ETHDEV_LOG(ERR,
> -				       "%s mbuf_data_room_size %u < %u (segment length=%u + segment offset=%u)\n",
> -				       mpl->name, *mbp_buf_size,
> -				       length + offset, length, offset);
> -			return -EINVAL;
> +		if (proto_hdr == RTE_PTYPE_UNKNOWN) {
> +			/* Split at fixed length. */
> +			length = length != 0 ? length : *mbp_buf_size;
> +			if (*mbp_buf_size < length + offset) {
> +				RTE_ETHDEV_LOG(ERR,
> +					"%s mbuf_data_room_size %u < %u (segment length=%u + segment offset=%u)\n",
> +					mpl->name, *mbp_buf_size,
> +					length + offset, length, offset);
> +				return -EINVAL;
> +			}
> +		} else {
> +			/* Split after specified protocol header. */
> +			if (!(proto_hdr & RTE_BUFFER_SPLIT_PROTO_HDR_MASK)) {

The condition looks suspicious. It will be true if proto_hdr has no
single bit from the mask. I guess it is not the intent.
I guess the condition should be
   proto_hdr & ~RTE_BUFFER_SPLIT_PROTO_HDR_MASK
i.e. there are unsupported bits in proto_hdr. A tiny example is below.
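
For instance (a sketch; RTE_PTYPE_TUNNEL_GRE just stands in for any
ptype bit outside the mask):

	uint32_t proto_hdr = RTE_PTYPE_L2_ETHER | RTE_PTYPE_TUNNEL_GRE;

	/* current check: 0, i.e. not rejected - GRE slips through */
	int bad_old = !(proto_hdr & RTE_BUFFER_SPLIT_PROTO_HDR_MASK);

	/* suggested check: 1, i.e. rejected - unsupported bit is caught */
	int bad_new = (proto_hdr & ~RTE_BUFFER_SPLIT_PROTO_HDR_MASK) != 0;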

IMHO we need an extra field in dev_info to report supported protocols to
split on. Or a new API to get an array similar to ptype get.
May be a new API is a better choice to not overload dev_info and to
be more flexible in reporting.

> +				RTE_ETHDEV_LOG(ERR,
> +					"Protocol header %u not supported)\n",
> +					proto_hdr);

I think it would be useful to log unsupported bits only, if we say so.

> +				return -EINVAL;
> +			}
> +
> +			if (length != 0) {
> +				RTE_ETHDEV_LOG(ERR, "segment length should be set to zero in protocol header "
> +					       "based buffer split\n");
> +				return -EINVAL;
> +			}
> +
> +			if (*mbp_buf_size < offset) {
> +				RTE_ETHDEV_LOG(ERR,
> +						"%s mbuf_data_room_size %u < %u segment offset)\n",
> +						mpl->name, *mbp_buf_size,
> +						offset);
> +				return -EINVAL;
> +			}
>   		}
>   	}
>   	return 0;
> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> index 04cff8ee10..0cd9dd6cc0 100644
> --- a/lib/ethdev/rte_ethdev.h
> +++ b/lib/ethdev/rte_ethdev.h
> @@ -1187,6 +1187,9 @@ struct rte_eth_txmode {
>    *   mbuf) the following data will be pushed to the next segment
>    *   up to its own length, and so on.
>    *
> + * - The proto_hdrs in the elements define the split position of
> + *   received packets.
> + *
>    * - If the length in the segment description element is zero
>    *   the actual buffer size will be deduced from the appropriate
>    *   memory pool properties.
> @@ -1197,14 +1200,37 @@ struct rte_eth_txmode {
>    *     - pool from the last valid element
>    *     - the buffer size from this pool
>    *     - zero offset
> + *
> + * - Length based buffer split:
> + *     - mp, length, offset should be configured.
> + *     - The proto_hdr field should not be configured.
> + *
> + * - Protocol header based buffer split:
> + *     - mp, offset, proto_hdr should be configured.
> + *     - The length field should not be configured.
>    */
>   struct rte_eth_rxseg_split {
>   	struct rte_mempool *mp; /**< Memory pool to allocate segment from. */
>   	uint16_t length; /**< Segment data length, configures split point. */
>   	uint16_t offset; /**< Data offset from beginning of mbuf data buffer. */
> -	uint32_t reserved; /**< Reserved field. */
> +	uint32_t proto_hdr; /**< Inner/outer L2/L3/L4 protocol header, configures split point. */
>   };
>   
> +/* Buffer split protocol header capability. */
> +#define RTE_BUFFER_SPLIT_PROTO_HDR_MASK ( \
> +	RTE_PTYPE_L2_ETHER | \
> +	RTE_PTYPE_L3_IPV4 | \
> +	RTE_PTYPE_L3_IPV6 | \
> +	RTE_PTYPE_L4_TCP | \
> +	RTE_PTYPE_L4_UDP | \
> +	RTE_PTYPE_L4_SCTP | \
> +	RTE_PTYPE_INNER_L2_ETHER | \
> +	RTE_PTYPE_INNER_L3_IPV4 | \
> +	RTE_PTYPE_INNER_L3_IPV6 | \
> +	RTE_PTYPE_INNER_L4_TCP | \
> +	RTE_PTYPE_INNER_L4_UDP | \
> +	RTE_PTYPE_INNER_L4_SCTP)
> +
>   /**
>    * @warning
>    * @b EXPERIMENTAL: this structure may change without prior notice.


^ permalink raw reply	[flat|nested] 88+ messages in thread

* RE: [PATCH v8 1/3] ethdev: introduce protocol header based buffer split
  2022-06-02 13:20     ` Andrew Rybchenko
@ 2022-06-02 13:44       ` Ding, Xuan
  0 siblings, 0 replies; 88+ messages in thread
From: Ding, Xuan @ 2022-06-02 13:44 UTC (permalink / raw)
  To: Andrew Rybchenko, Wu, WenxuanX, thomas, Li, Xiaoyun,
	ferruh.yigit, Singh, Aman Deep, dev, Zhang, Yuying, Zhang, Qi Z,
	jerinjacobk
  Cc: stephen, Wang, YuanX, Ray Kinsella

Hi Andrew,

> -----Original Message-----
> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> Sent: Thursday, June 2, 2022 9:21 PM
> To: Wu, WenxuanX <wenxuanx.wu@intel.com>; thomas@monjalon.net; Li,
> Xiaoyun <xiaoyun.li@intel.com>; ferruh.yigit@xilinx.com; Singh, Aman Deep
> <aman.deep.singh@intel.com>; dev@dpdk.org; Zhang, Yuying
> <yuying.zhang@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>;
> jerinjacobk@gmail.com
> Cc: stephen@networkplumber.org; Ding, Xuan <xuan.ding@intel.com>;
> Wang, YuanX <yuanx.wang@intel.com>; Ray Kinsella <mdr@ashroe.eu>
> Subject: Re: [PATCH v8 1/3] ethdev: introduce protocol header based buffer
> split
> 
> There are two v8 1/3 patches in my mailbox. Which one is the right one?

Yes, you are right, the second one is the latest one, sorry for the inconvenience.

Thanks,
Xuan

> 
> On 6/1/22 16:50, wenxuanx.wu@intel.com wrote:
> > From: Wenxuan Wu <wenxuanx.wu@intel.com>
> >
> > Currently, Rx buffer split supports length based split. With Rx queue
> > offload RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT enabled and Rx packet
> segment
> > configured, PMD will be able to split the received packets into
> > multiple segments.
> >
> > However, length based buffer split is not suitable for NICs that do
> > split based on protocol headers. Given a arbitrarily variable length
> > in Rx packet segment, it is almost impossible to pass a fixed protocol
> header to PMD.
> > Besides, the existence of tunneling results in the composition of a
> > packet is various, which makes the situation even worse.
> >
> > This patch extends current buffer split to support protocol header
> > based buffer split. A new proto_hdr field is introduced in the
> > reserved field of rte_eth_rxseg_split structure to specify protocol
> > header. The proto_hdr field defines the split position of packet,
> > splitting will always happens after the protocol header defined in the
> > Rx packet segment. When Rx queue offload
> > RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT is enabled and corresponding
> protocol
> > header is configured, PMD will split the ingress packets into multiple
> segments.
> >
> > struct rte_eth_rxseg_split {
> >
> >          struct rte_mempool *mp; /* memory pools to allocate segment from
> */
> >          uint16_t length; /* segment maximal data length,
> >                              configures "split point" */
> >          uint16_t offset; /* data offset from beginning
> >                              of mbuf data buffer */
> >          uint32_t proto_hdr; /* inner/outer L2/L3/L4 protocol header,
> > 			       configures "split point" */
> >      };
> >
> > Both inner and outer L2/L3/L4 level protocol header split can be supported.
> > Corresponding protocol header capability is RTE_PTYPE_L2_ETHER,
> > RTE_PTYPE_L3_IPV4, RTE_PTYPE_L3_IPV6, RTE_PTYPE_L4_TCP,
> > RTE_PTYPE_L4_UDP, RTE_PTYPE_L4_SCTP, RTE_PTYPE_INNER_L2_ETHER,
> > RTE_PTYPE_INNER_L3_IPV4, RTE_PTYPE_INNER_L3_IPV6,
> > RTE_PTYPE_INNER_L4_TCP, RTE_PTYPE_INNER_L4_UDP,
> RTE_PTYPE_INNER_L4_SCTP.
> >
> > For example, let's suppose we configured the Rx queue with the
> > following segments:
> >      seg0 - pool0, proto_hdr0=RTE_PTYPE_L3_IPV4, off0=2B
> >      seg1 - pool1, proto_hdr1=RTE_PTYPE_L4_UDP, off1=128B
> >      seg2 - pool2, off1=0B
> >
> > The packet consists of MAC_IPV4_UDP_PAYLOAD will be split like
> > following:
> >      seg0 - ipv4 header @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from
> pool0
> >      seg1 - udp header @ 128 in mbuf from pool1
> >      seg2 - payload @ 0 in mbuf from pool2
> >
> > Now buffer split can be configured in two modes. For length based
> > buffer split, the mp, length, offset field in Rx packet segment should
> > be configured, while the proto_hdr field should not be configured.
> > For protocol header based buffer split, the mp, offset, proto_hdr
> > field in Rx packet segment should be configured, while the length
> > field should not be configured.
> >
> > The split limitations imposed by underlying PMD is reported in the
> > rte_eth_dev_info->rx_seg_capa field. The memory attributes for the
> > split parts may differ either, dpdk memory and external memory,
> respectively.
> >
> > Signed-off-by: Xuan Ding <xuan.ding@intel.com>
> > Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
> > Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
> > Reviewed-by: Qi Zhang <qi.z.zhang@intel.com>
> > Acked-by: Ray Kinsella <mdr@ashroe.eu>
> 
> [snip]

^ permalink raw reply	[flat|nested] 88+ messages in thread

* RE: [PATCH v8 1/3] ethdev: introduce protocol hdr based buffer split
  2022-06-02 13:20     ` Andrew Rybchenko
@ 2022-06-03 16:30       ` Ding, Xuan
  2022-06-04 14:25         ` Andrew Rybchenko
  0 siblings, 1 reply; 88+ messages in thread
From: Ding, Xuan @ 2022-06-03 16:30 UTC (permalink / raw)
  To: Andrew Rybchenko, Wu, WenxuanX, thomas, Li, Xiaoyun,
	ferruh.yigit, Singh, Aman Deep, dev, Zhang, Yuying, Zhang, Qi Z,
	jerinjacobk
  Cc: stephen, Wang, YuanX, Ray Kinsella

Hi Andrew,

> -----Original Message-----
> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> Sent: Thursday, June 2, 2022 9:21 PM
> To: Wu, WenxuanX <wenxuanx.wu@intel.com>; thomas@monjalon.net; Li,
> Xiaoyun <xiaoyun.li@intel.com>; ferruh.yigit@xilinx.com; Singh, Aman Deep
> <aman.deep.singh@intel.com>; dev@dpdk.org; Zhang, Yuying
> <yuying.zhang@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>;
> jerinjacobk@gmail.com
> Cc: stephen@networkplumber.org; Ding, Xuan <xuan.ding@intel.com>;
> Wang, YuanX <yuanx.wang@intel.com>; Ray Kinsella <mdr@ashroe.eu>
> Subject: Re: [PATCH v8 1/3] ethdev: introduce protocol hdr based buffer split
> 
> Is it the right one since it is listed in patchwork?

Yes, it is.

> 
> On 6/1/22 16:50, wenxuanx.wu@intel.com wrote:
> > From: Wenxuan Wu <wenxuanx.wu@intel.com>
> >
> > Currently, Rx buffer split supports length based split. With Rx queue
> > offload RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT enabled and Rx packet
> segment
> > configured, PMD will be able to split the received packets into
> > multiple segments.
> >
> > However, length based buffer split is not suitable for NICs that do
> > split based on protocol headers. Given a arbitrarily variable length
> > in Rx packet
> 
> a -> an

Thanks for your catch, will fix it in next version.

> 
> > segment, it is almost impossible to pass a fixed protocol header to PMD.
> > Besides, the existence of tunneling results in the composition of a
> > packet is various, which makes the situation even worse.
> >
> > This patch extends current buffer split to support protocol header
> > based buffer split. A new proto_hdr field is introduced in the
> > reserved field of rte_eth_rxseg_split structure to specify protocol
> > header. The proto_hdr field defines the split position of packet,
> > splitting will always happens after the protocol header defined in the
> > Rx packet segment. When Rx queue offload
> > RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT is enabled and corresponding
> protocol
> > header is configured, PMD will split the ingress packets into multiple
> segments.
> >
> > struct rte_eth_rxseg_split {
> >
> >          struct rte_mempool *mp; /* memory pools to allocate segment from
> */
> >          uint16_t length; /* segment maximal data length,
> >                              configures "split point" */
> >          uint16_t offset; /* data offset from beginning
> >                              of mbuf data buffer */
> >          uint32_t proto_hdr; /* inner/outer L2/L3/L4 protocol header,
> > 			       configures "split point" */
> >      };
> >
> > Both inner and outer L2/L3/L4 level protocol header split can be supported.
> > Corresponding protocol header capability is RTE_PTYPE_L2_ETHER,
> > RTE_PTYPE_L3_IPV4, RTE_PTYPE_L3_IPV6, RTE_PTYPE_L4_TCP,
> > RTE_PTYPE_L4_UDP, RTE_PTYPE_L4_SCTP, RTE_PTYPE_INNER_L2_ETHER,
> > RTE_PTYPE_INNER_L3_IPV4, RTE_PTYPE_INNER_L3_IPV6,
> > RTE_PTYPE_INNER_L4_TCP, RTE_PTYPE_INNER_L4_UDP,
> RTE_PTYPE_INNER_L4_SCTP.
> >
> > For example, let's suppose we configured the Rx queue with the
> > following segments:
> >      seg0 - pool0, proto_hdr0=RTE_PTYPE_L3_IPV4, off0=2B
> >      seg1 - pool1, proto_hdr1=RTE_PTYPE_L4_UDP, off1=128B
> >      seg2 - pool2, off1=0B
> >
> > The packet consists of MAC_IPV4_UDP_PAYLOAD will be split like
> > following:
> >      seg0 - ipv4 header @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from
> pool0
> >      seg1 - udp header @ 128 in mbuf from pool1
> >      seg2 - payload @ 0 in mbuf from pool2
> 
> It must be defined how ICMPv4 packets will be split in such case.
> And how UDP over IPv6 will be split.

The ICMP header type is missing; I will define the expected split behavior and
add it in the next version, thanks for your catch.

In fact, the buffer split based on protocol header depends on the driver parsing result.
As long as the driver can recognize the packet type, I think there is no difference between
UDP over IPv4 and UDP over IPv6?

> >
> > Now buffer split can be configured in two modes. For length based
> > buffer split, the mp, length, offset field in Rx packet segment should
> > be configured, while the proto_hdr field should not be configured.
> > For protocol header based buffer split, the mp, offset, proto_hdr
> > field in Rx packet segment should be configured, while the length
> > field should not be configured.
> >
> > The split limitations imposed by underlying PMD is reported in the
> > rte_eth_dev_info->rx_seg_capa field. The memory attributes for the
> > split parts may differ either, dpdk memory and external memory,
> respectively.
> >
> > Signed-off-by: Xuan Ding <xuan.ding@intel.com>
> > Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
> > Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
> > Reviewed-by: Qi Zhang <qi.z.zhang@intel.com>
> > Acked-by: Ray Kinsella <mdr@ashroe.eu>
> > ---
> >   lib/ethdev/rte_ethdev.c | 40 +++++++++++++++++++++++++++++++++-------
> >   lib/ethdev/rte_ethdev.h | 28 +++++++++++++++++++++++++++-
> >   2 files changed, 60 insertions(+), 8 deletions(-)
> >
> > diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c index
> > 29a3d80466..fbd55cdd9d 100644
> > --- a/lib/ethdev/rte_ethdev.c
> > +++ b/lib/ethdev/rte_ethdev.c
> > @@ -1661,6 +1661,7 @@ rte_eth_rx_queue_check_split(const struct
> rte_eth_rxseg_split *rx_seg,
> >   		struct rte_mempool *mpl = rx_seg[seg_idx].mp;
> >   		uint32_t length = rx_seg[seg_idx].length;
> >   		uint32_t offset = rx_seg[seg_idx].offset;
> > +		uint32_t proto_hdr = rx_seg[seg_idx].proto_hdr;
> >
> >   		if (mpl == NULL) {
> >   			RTE_ETHDEV_LOG(ERR, "null mempool pointer\n");
> @@ -1694,13
> > +1695,38 @@ rte_eth_rx_queue_check_split(const struct
> rte_eth_rxseg_split *rx_seg,
> >   		}
> >   		offset += seg_idx != 0 ? 0 : RTE_PKTMBUF_HEADROOM;
> >   		*mbp_buf_size = rte_pktmbuf_data_room_size(mpl);
> > -		length = length != 0 ? length : *mbp_buf_size;
> > -		if (*mbp_buf_size < length + offset) {
> > -			RTE_ETHDEV_LOG(ERR,
> > -				       "%s mbuf_data_room_size %u < %u
> (segment length=%u + segment offset=%u)\n",
> > -				       mpl->name, *mbp_buf_size,
> > -				       length + offset, length, offset);
> > -			return -EINVAL;
> > +		if (proto_hdr == RTE_PTYPE_UNKNOWN) {
> > +			/* Split at fixed length. */
> > +			length = length != 0 ? length : *mbp_buf_size;
> > +			if (*mbp_buf_size < length + offset) {
> > +				RTE_ETHDEV_LOG(ERR,
> > +					"%s mbuf_data_room_size %u < %u
> (segment length=%u + segment offset=%u)\n",
> > +					mpl->name, *mbp_buf_size,
> > +					length + offset, length, offset);
> > +				return -EINVAL;
> > +			}
> > +		} else {
> > +			/* Split after specified protocol header. */
> > +			if (!(proto_hdr &
> RTE_BUFFER_SPLIT_PROTO_HDR_MASK)) {
> 
> The condition looks suspicious. It will be true if proto_hdr has no single bit
> from the mask. I guess it is not the intent.

Actually it is the intent... Here the mask is used to check if proto_hdr
belongs to the inner/outer L2/L3/L4 capability we defined. And whether a
given proto_hdr is supported by the NIC will be checked in the PMD later.

> I guess the condition should be
>    proto_hdr & ~RTE_BUFFER_SPLIT_PROTO_HDR_MASK i.e. there are
> unsupported bits in proto_hdr
> 
> IMHO we need extra field in dev_info to report supported protocols to split
> on. Or a new API to get an array similar to ptype get.
> May be a new API is a better choice to not overload dev_info and to be more
> flexible in reporting.

Thanks for your suggestion.
Here I hope to confirm the intent of the dev_info field or API that exposes the supported proto_hdr of the driver.
Is it for the proto_hdr check in rte_eth_rx_queue_check_split()?
If so, could we just check in the lib whether the configured proto_hdrs belong to L2/L3/L4, and check the
capability in the PMD? This is what the current design does.

Actually I have another question: do we need an API or dev_info field to expose which buffer split mode
the driver supports, i.e. length based or proto_hdr based? Each mode requires different fields to be
configured in the Rx packet segment. A rough sketch of such an API is below.
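
For instance (just a sketch; the function name and signature are only
illustrative, modeled after rte_eth_dev_get_supported_ptypes()):

	/*
	 * Report the protocol headers a port can split on.
	 * Returns the number of supported header ptypes written into
	 * the ptypes array, or a negative errno on failure.
	 */
	__rte_experimental
	int rte_eth_buffer_split_get_supported_hdr_ptypes(uint16_t port_id,
			uint32_t *ptypes, int num);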

Hope to get your insights. :)

> 
> > +				RTE_ETHDEV_LOG(ERR,
> > +					"Protocol header %u not
> supported)\n",
> > +					proto_hdr);
> 
> I think it would be useful to log unsupported bits only, if we say so.

The same as above.
Thanks again for your time.

Regards,
Xuan

> 
> > +				return -EINVAL;
> > +			}
> > +
> > +			if (length != 0) {
> > +				RTE_ETHDEV_LOG(ERR, "segment length
> should be set to zero in protocol header "
> > +					       "based buffer split\n");
> > +				return -EINVAL;
> > +			}
> > +
> > +			if (*mbp_buf_size < offset) {
> > +				RTE_ETHDEV_LOG(ERR,
> > +						"%s
> mbuf_data_room_size %u < %u segment offset)\n",
> > +						mpl->name, *mbp_buf_size,
> > +						offset);
> > +				return -EINVAL;
> > +			}
> >   		}
> >   	}
> >   	return 0;
> > diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h index
> > 04cff8ee10..0cd9dd6cc0 100644
> > --- a/lib/ethdev/rte_ethdev.h
> > +++ b/lib/ethdev/rte_ethdev.h
> > @@ -1187,6 +1187,9 @@ struct rte_eth_txmode {
> >    *   mbuf) the following data will be pushed to the next segment
> >    *   up to its own length, and so on.
> >    *
> > + * - The proto_hdrs in the elements define the split position of
> > + *   received packets.
> > + *
> >    * - If the length in the segment description element is zero
> >    *   the actual buffer size will be deduced from the appropriate
> >    *   memory pool properties.
> > @@ -1197,14 +1200,37 @@ struct rte_eth_txmode {
> >    *     - pool from the last valid element
> >    *     - the buffer size from this pool
> >    *     - zero offset
> > + *
> > + * - Length based buffer split:
> > + *     - mp, length, offset should be configured.
> > + *     - The proto_hdr field should not be configured.
> > + *
> > + * - Protocol header based buffer split:
> > + *     - mp, offset, proto_hdr should be configured.
> > + *     - The length field should not be configured.
> >    */
> >   struct rte_eth_rxseg_split {
> >   	struct rte_mempool *mp; /**< Memory pool to allocate segment
> from. */
> >   	uint16_t length; /**< Segment data length, configures split point. */
> >   	uint16_t offset; /**< Data offset from beginning of mbuf data buffer.
> */
> > -	uint32_t reserved; /**< Reserved field. */
> > +	uint32_t proto_hdr; /**< Inner/outer L2/L3/L4 protocol header,
> > +configures split point. */
> >   };
> >
> > +/* Buffer split protocol header capability. */ #define
> > +RTE_BUFFER_SPLIT_PROTO_HDR_MASK ( \
> > +	RTE_PTYPE_L2_ETHER | \
> > +	RTE_PTYPE_L3_IPV4 | \
> > +	RTE_PTYPE_L3_IPV6 | \
> > +	RTE_PTYPE_L4_TCP | \
> > +	RTE_PTYPE_L4_UDP | \
> > +	RTE_PTYPE_L4_SCTP | \
> > +	RTE_PTYPE_INNER_L2_ETHER | \
> > +	RTE_PTYPE_INNER_L3_IPV4 | \
> > +	RTE_PTYPE_INNER_L3_IPV6 | \
> > +	RTE_PTYPE_INNER_L4_TCP | \
> > +	RTE_PTYPE_INNER_L4_UDP | \
> > +	RTE_PTYPE_INNER_L4_SCTP)
> > +
> >   /**
> >    * @warning
> >    * @b EXPERIMENTAL: this structure may change without prior notice.


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v8 1/3] ethdev: introduce protocol hdr based buffer split
  2022-06-03 16:30       ` Ding, Xuan
@ 2022-06-04 14:25         ` Andrew Rybchenko
  2022-06-07 10:13           ` Ding, Xuan
  0 siblings, 1 reply; 88+ messages in thread
From: Andrew Rybchenko @ 2022-06-04 14:25 UTC (permalink / raw)
  To: Ding, Xuan, Wu, WenxuanX, thomas, Li, Xiaoyun, ferruh.yigit,
	Singh, Aman Deep, dev, Zhang, Yuying, Zhang, Qi Z, jerinjacobk
  Cc: stephen, Wang, YuanX, Ray Kinsella

On 6/3/22 19:30, Ding, Xuan wrote:
> Hi Andrew,
> 
>> -----Original Message-----
>> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
>> Sent: Thursday, June 2, 2022 9:21 PM
>> To: Wu, WenxuanX <wenxuanx.wu@intel.com>; thomas@monjalon.net; Li,
>> Xiaoyun <xiaoyun.li@intel.com>; ferruh.yigit@xilinx.com; Singh, Aman Deep
>> <aman.deep.singh@intel.com>; dev@dpdk.org; Zhang, Yuying
>> <yuying.zhang@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>;
>> jerinjacobk@gmail.com
>> Cc: stephen@networkplumber.org; Ding, Xuan <xuan.ding@intel.com>;
>> Wang, YuanX <yuanx.wang@intel.com>; Ray Kinsella <mdr@ashroe.eu>
>> Subject: Re: [PATCH v8 1/3] ethdev: introduce protocol hdr based buffer split
>>
>> Is it the right one since it is listed in patchwork?
> 
> Yes, it is.
> 
>>
>> On 6/1/22 16:50, wenxuanx.wu@intel.com wrote:
>>> From: Wenxuan Wu <wenxuanx.wu@intel.com>
>>>
>>> Currently, Rx buffer split supports length based split. With Rx queue
>>> offload RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT enabled and Rx packet
>> segment
>>> configured, PMD will be able to split the received packets into
>>> multiple segments.
>>>
>>> However, length based buffer split is not suitable for NICs that do
>>> split based on protocol headers. Given a arbitrarily variable length
>>> in Rx packet
>>
>> a -> an
> 
> Thanks for your catch, will fix it in next version.
> 
>>
>>> segment, it is almost impossible to pass a fixed protocol header to PMD.
>>> Besides, the existence of tunneling results in the composition of a
>>> packet is various, which makes the situation even worse.
>>>
>>> This patch extends current buffer split to support protocol header
>>> based buffer split. A new proto_hdr field is introduced in the
>>> reserved field of rte_eth_rxseg_split structure to specify protocol
>>> header. The proto_hdr field defines the split position of packet,
>>> splitting will always happens after the protocol header defined in the
>>> Rx packet segment. When Rx queue offload
>>> RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT is enabled and corresponding
>> protocol
>>> header is configured, PMD will split the ingress packets into multiple
>> segments.
>>>
>>> struct rte_eth_rxseg_split {
>>>
>>>           struct rte_mempool *mp; /* memory pools to allocate segment from
>> */
>>>           uint16_t length; /* segment maximal data length,
>>>                               configures "split point" */
>>>           uint16_t offset; /* data offset from beginning
>>>                               of mbuf data buffer */
>>>           uint32_t proto_hdr; /* inner/outer L2/L3/L4 protocol header,
>>> 			       configures "split point" */
>>>       };
>>>
>>> Both inner and outer L2/L3/L4 level protocol header split can be supported.
>>> Corresponding protocol header capability is RTE_PTYPE_L2_ETHER,
>>> RTE_PTYPE_L3_IPV4, RTE_PTYPE_L3_IPV6, RTE_PTYPE_L4_TCP,
>>> RTE_PTYPE_L4_UDP, RTE_PTYPE_L4_SCTP, RTE_PTYPE_INNER_L2_ETHER,
>>> RTE_PTYPE_INNER_L3_IPV4, RTE_PTYPE_INNER_L3_IPV6,
>>> RTE_PTYPE_INNER_L4_TCP, RTE_PTYPE_INNER_L4_UDP,
>> RTE_PTYPE_INNER_L4_SCTP.
>>>
>>> For example, let's suppose we configured the Rx queue with the
>>> following segments:
>>>       seg0 - pool0, proto_hdr0=RTE_PTYPE_L3_IPV4, off0=2B
>>>       seg1 - pool1, proto_hdr1=RTE_PTYPE_L4_UDP, off1=128B
>>>       seg2 - pool2, off1=0B
>>>
>>> The packet consists of MAC_IPV4_UDP_PAYLOAD will be split like
>>> following:
>>>       seg0 - ipv4 header @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from
>> pool0
>>>       seg1 - udp header @ 128 in mbuf from pool1
>>>       seg2 - payload @ 0 in mbuf from pool2
>>
>> It must be defined how ICMPv4 packets will be split in such case.
>> And how UDP over IPv6 will be split.
> 
> The ICMP header type is missing; I will define the expected split behavior and
> add it in the next version, thanks for your catch.
> 
> In fact, the buffer split based on protocol header depends on the driver parsing result.
> As long as the driver can recognize the packet type, I think there is no difference between
> UDP over IPv4 and UDP over IPv6?

We can bind it to ptypes recognized by the HW+driver, but I can
easily imagine the case when HW has no means to report recognized
packet type (i.e. ptype get returns empty list), but still could
split on it.
Also, nobody guarantees that there is no difference in UDP over IPv4 vs
IPv6 recognition and split. IPv6 could have a number of extension
headers which could be not that trivial to hop over in HW. So, HW could
recognize IPv6, but not the protocols after it.
Also it is a very interesting question how to define protocol split
for IPv6 plus extension headers. Where to stop?
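
For example (a purely illustrative packet layout):

	eth | ipv6 | hop-by-hop ext | fragment ext | udp | payload

Does a split on RTE_PTYPE_L3_IPV6 end after the fixed 40-byte IPv6
header, or after the last extension header? This has to be defined.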

> 
>>>
>>> Now buffer split can be configured in two modes. For length based
>>> buffer split, the mp, length, offset field in Rx packet segment should
>>> be configured, while the proto_hdr field should not be configured.
>>> For protocol header based buffer split, the mp, offset, proto_hdr
>>> field in Rx packet segment should be configured, while the length
>>> field should not be configured.
>>>
>>> The split limitations imposed by underlying PMD is reported in the
>>> rte_eth_dev_info->rx_seg_capa field. The memory attributes for the
>>> split parts may differ either, dpdk memory and external memory,
>> respectively.
>>>
>>> Signed-off-by: Xuan Ding <xuan.ding@intel.com>
>>> Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
>>> Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
>>> Reviewed-by: Qi Zhang <qi.z.zhang@intel.com>
>>> Acked-by: Ray Kinsella <mdr@ashroe.eu>
>>> ---
>>>    lib/ethdev/rte_ethdev.c | 40 +++++++++++++++++++++++++++++++++-------
>>>    lib/ethdev/rte_ethdev.h | 28 +++++++++++++++++++++++++++-
>>>    2 files changed, 60 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c index
>>> 29a3d80466..fbd55cdd9d 100644
>>> --- a/lib/ethdev/rte_ethdev.c
>>> +++ b/lib/ethdev/rte_ethdev.c
>>> @@ -1661,6 +1661,7 @@ rte_eth_rx_queue_check_split(const struct
>> rte_eth_rxseg_split *rx_seg,
>>>    		struct rte_mempool *mpl = rx_seg[seg_idx].mp;
>>>    		uint32_t length = rx_seg[seg_idx].length;
>>>    		uint32_t offset = rx_seg[seg_idx].offset;
>>> +		uint32_t proto_hdr = rx_seg[seg_idx].proto_hdr;
>>>
>>>    		if (mpl == NULL) {
>>>    			RTE_ETHDEV_LOG(ERR, "null mempool pointer\n");
>> @@ -1694,13
>>> +1695,38 @@ rte_eth_rx_queue_check_split(const struct
>> rte_eth_rxseg_split *rx_seg,
>>>    		}
>>>    		offset += seg_idx != 0 ? 0 : RTE_PKTMBUF_HEADROOM;
>>>    		*mbp_buf_size = rte_pktmbuf_data_room_size(mpl);
>>> -		length = length != 0 ? length : *mbp_buf_size;
>>> -		if (*mbp_buf_size < length + offset) {
>>> -			RTE_ETHDEV_LOG(ERR,
>>> -				       "%s mbuf_data_room_size %u < %u
>> (segment length=%u + segment offset=%u)\n",
>>> -				       mpl->name, *mbp_buf_size,
>>> -				       length + offset, length, offset);
>>> -			return -EINVAL;
>>> +		if (proto_hdr == RTE_PTYPE_UNKNOWN) {
>>> +			/* Split at fixed length. */
>>> +			length = length != 0 ? length : *mbp_buf_size;
>>> +			if (*mbp_buf_size < length + offset) {
>>> +				RTE_ETHDEV_LOG(ERR,
>>> +					"%s mbuf_data_room_size %u < %u
>> (segment length=%u + segment offset=%u)\n",
>>> +					mpl->name, *mbp_buf_size,
>>> +					length + offset, length, offset);
>>> +				return -EINVAL;
>>> +			}
>>> +		} else {
>>> +			/* Split after specified protocol header. */
>>> +			if (!(proto_hdr &
>> RTE_BUFFER_SPLIT_PROTO_HDR_MASK)) {
>>
>> The condition looks suspicious. It will be true if proto_hdr has no single bit
>> from the mask. I guess it is not the intent.
> 
> Actually it is the intent... Here the mask is used to check if proto_hdr
> belongs to the inner/outer L2/L3/L4 capability we defined. And which
> proto_hdr is supported by the NIC will be checked in the PMD later.

Frankly speaking, I see no value in such an incomplete check if
we still rely on the driver. I simply see no reason to oblige the
driver to support one of these protocols.

> 
>> I guess the condition should be
>>     proto_hdr & ~RTE_BUFFER_SPLIT_PROTO_HDR_MASK i.e. there is
>> unsupported bits in proto_hdr
>>
>> IMHO we need extra field in dev_info to report supported protocols to split
>> on. Or a new API to get an array similar to ptype get.
>> May be a new API is a better choice to not overload dev_info and to be more
>> flexible in reporting.
> 
> Thanks for your suggestion.
> Here I hope to confirm the intent of dev_info or API to expose the supported proto_hdr of the driver.
> Is it for the proto_hdr check in the rte_eth_rx_queue_check_split()?
> If so, could we just check whether the proto_hdrs configured belong to L2/L3/L4 in lib, and check the
> capability in PMD? This is what the current design does.

Look. Application needs to know what to expect from eth device.
It should know which protocols it can split on. Of course we can
enforce application to use try-fail approach which would make sense
if we have dedicated API to request Rx buffer split, but since it
is done via Rx queue configuration, it could be tricky for application
to realize which part of the configuration is wrong. It could simply
result in too many retries with different configurations.

I.e. the information should be used by ethdev to validate request and
the information should be used by the application to understand what is
supported.
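
For example, something along these lines (the function name below is
purely hypothetical, just to illustrate the flow):

    uint32_t hdr_ptypes = 0;
    /* hypothetical capability-query API, name illustrative only */
    int ret = rte_eth_buffer_split_hdrs_get(port_id, &hdr_ptypes);

    if (ret == -ENOTSUP || hdr_ptypes == RTE_PTYPE_UNKNOWN) {
        /* No protocol based split: fall back to length based split. */
    } else {
        /* Build rx_seg[] only from proto_hdr values present in
         * hdr_ptypes; ethdev can validate against the same set. */
    }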

> 
> Actually I have another question, do we need a API or dev_info to expose which buffer split the driver supports.
> i.e. length based or proto_hdr based. Because it requires different fields to be configured
> in RX packet segment.

See above. If dedicated API return -ENOTSUP or empty set of supported
protocols to split on, the answer is clear.

> 
> Hope to get your insights. :)
> 
>>
>>> +				RTE_ETHDEV_LOG(ERR,
>>> +					"Protocol header %u not
>> supported)\n",
>>> +					proto_hdr);
>>
>> I think it would be useful to log unsupported bits only, if we say so.
> 
> The same as above.
> Thanks again for your time.
> 
> Regards,
> Xuan

^ permalink raw reply	[flat|nested] 88+ messages in thread

* RE: [PATCH v8 1/3] ethdev: introduce protocol hdr based buffer split
  2022-06-04 14:25         ` Andrew Rybchenko
@ 2022-06-07 10:13           ` Ding, Xuan
  2022-06-07 10:48             ` Andrew Rybchenko
  0 siblings, 1 reply; 88+ messages in thread
From: Ding, Xuan @ 2022-06-07 10:13 UTC (permalink / raw)
  To: Andrew Rybchenko, Wu, WenxuanX, thomas, Li, Xiaoyun,
	ferruh.yigit, Singh, Aman Deep, dev, Zhang, Yuying, Zhang, Qi Z,
	jerinjacobk
  Cc: stephen, Wang, YuanX, Ray Kinsella

Hi Andrew,

> -----Original Message-----
> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> Sent: Saturday, June 4, 2022 10:26 PM
> To: Ding, Xuan <xuan.ding@intel.com>; Wu, WenxuanX
> <wenxuanx.wu@intel.com>; thomas@monjalon.net; Li, Xiaoyun
> <xiaoyun.li@intel.com>; ferruh.yigit@xilinx.com; Singh, Aman Deep
> <aman.deep.singh@intel.com>; dev@dpdk.org; Zhang, Yuying
> <yuying.zhang@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>;
> jerinjacobk@gmail.com
> Cc: stephen@networkplumber.org; Wang, YuanX <yuanx.wang@intel.com>;
> Ray Kinsella <mdr@ashroe.eu>
> Subject: Re: [PATCH v8 1/3] ethdev: introduce protocol hdr based buffer split
> 
> On 6/3/22 19:30, Ding, Xuan wrote:
> > Hi Andrew,
> >
> >> -----Original Message-----
> >> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> >> Sent: Thursday, June 2, 2022 9:21 PM
> >> To: Wu, WenxuanX <wenxuanx.wu@intel.com>; thomas@monjalon.net;
> Li,
> >> Xiaoyun <xiaoyun.li@intel.com>; ferruh.yigit@xilinx.com; Singh, Aman
> >> Deep <aman.deep.singh@intel.com>; dev@dpdk.org; Zhang, Yuying
> >> <yuying.zhang@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>;
> >> jerinjacobk@gmail.com
> >> Cc: stephen@networkplumber.org; Ding, Xuan <xuan.ding@intel.com>;
> >> Wang, YuanX <yuanx.wang@intel.com>; Ray Kinsella <mdr@ashroe.eu>
> >> Subject: Re: [PATCH v8 1/3] ethdev: introduce protocol hdr based
> >> buffer split
> >>
> >> Is it the right one since it is listed in patchwork?
> >
> > Yes, it is.
> >
> >>
> >> On 6/1/22 16:50, wenxuanx.wu@intel.com wrote:
> >>> From: Wenxuan Wu <wenxuanx.wu@intel.com>
> >>>
> >>> Currently, Rx buffer split supports length based split. With Rx
> >>> queue offload RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT enabled and Rx
> packet
> >> segment
> >>> configured, PMD will be able to split the received packets into
> >>> multiple segments.
> >>>
> >>> However, length based buffer split is not suitable for NICs that do
> >>> split based on protocol headers. Given a arbitrarily variable length
> >>> in Rx packet
> >>
> >> a -> an
> >
> > Thanks for your catch, will fix it in next version.
> >
> >>
> >>> segment, it is almost impossible to pass a fixed protocol header to PMD.
> >>> Besides, the existence of tunneling results in the composition of a
> >>> packet is various, which makes the situation even worse.
> >>>
> >>> This patch extends current buffer split to support protocol header
> >>> based buffer split. A new proto_hdr field is introduced in the
> >>> reserved field of rte_eth_rxseg_split structure to specify protocol
> >>> header. The proto_hdr field defines the split position of packet,
> >>> splitting will always happen after the protocol header defined in
> >>> the Rx packet segment. When Rx queue offload
> >>> RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT is enabled and corresponding
> >> protocol
> >>> header is configured, PMD will split the ingress packets into
> >>> multiple
> >> segments.
> >>>
> >>> struct rte_eth_rxseg_split {
> >>>
> >>>           struct rte_mempool *mp; /* memory pools to allocate
> >>> segment from
> >> */
> >>>           uint16_t length; /* segment maximal data length,
> >>>                               configures "split point" */
> >>>           uint16_t offset; /* data offset from beginning
> >>>                               of mbuf data buffer */
> >>>           uint32_t proto_hdr; /* inner/outer L2/L3/L4 protocol header,
> >>> 			       configures "split point" */
> >>>       };
> >>>
> >>> Both inner and outer L2/L3/L4 level protocol header split can be
> supported.
> >>> Corresponding protocol header capability is RTE_PTYPE_L2_ETHER,
> >>> RTE_PTYPE_L3_IPV4, RTE_PTYPE_L3_IPV6, RTE_PTYPE_L4_TCP,
> >>> RTE_PTYPE_L4_UDP, RTE_PTYPE_L4_SCTP, RTE_PTYPE_INNER_L2_ETHER,
> >>> RTE_PTYPE_INNER_L3_IPV4, RTE_PTYPE_INNER_L3_IPV6,
> >>> RTE_PTYPE_INNER_L4_TCP, RTE_PTYPE_INNER_L4_UDP,
> >> RTE_PTYPE_INNER_L4_SCTP.
> >>>
> >>> For example, let's suppose we configured the Rx queue with the
> >>> following segments:
> >>>       seg0 - pool0, proto_hdr0=RTE_PTYPE_L3_IPV4, off0=2B
> >>>       seg1 - pool1, proto_hdr1=RTE_PTYPE_L4_UDP, off1=128B
> >>>       seg2 - pool2, off1=0B
> >>>
> >>> The packet consists of MAC_IPV4_UDP_PAYLOAD will be split like
> >>> following:
> >>>       seg0 - ipv4 header @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from
> >> pool0
> >>>       seg1 - udp header @ 128 in mbuf from pool1
> >>>       seg2 - payload @ 0 in mbuf from pool2
> >>
> >> It must be defined how ICMPv4 packets will be split in such case.
> >> And how UDP over IPv6 will be split.
> >
> > The ICMP header type is missing; I will define the expected split
> > behavior and add it in the next version, thanks for your catch.

I have a question here. Since ICMP packets are mainly used to check the
connectivity of the network, is it necessary for us to split ICMP packets?
And I found there is no RTE_PTYPE for ICMP.

> >
> > In fact, the buffer split based on protocol header depends on the driver
> parsing result.
> > As long as driver can recognize this packet type, I think there is no
> > difference between UDP over IPV4 and UDP over IPV6?
> 
> We can bind it to ptypes recognized by the HW+driver, but I can easily
> imagine the case when HW has no means to report recognized packet type
> (i.e. ptype get returns empty list), but still could split on it.

Get your point. But if one ptype cannot be recognized by HW+driver, is it still necessary for
us to do the split? The main purpose of buffer split is to split header and payload. Although we
add split for various protocol headers now, we should focus on the ptypes that can be recognized.

> Also, nobody guarantees that there is no difference in UDP over IPv4 vs
> IPv6 recognition and split. IPv6 could have a number of extension headers
> which could be not that trivial to hop in HW. So, HW could recognize IPv6,
> but not protocols after it.
> Also it is a very interesting question how to define protocol split for IPv6 plus
> extension headers. Where to stop?

The extension headers you mentioned are indeed an interesting question.
On our device, the stop would be the end of the extension headers. The same as
above, the main purpose of buffer split is for header and payload.
Even in rte_flow we don't list all of the extension headers. So we can't cope with
all the IPV6 extension headers.

For IPV6 extension headers, what if we treat the IPV6 header and its
extension headers as one layer? Because 99% of cases will not require a
separate extension header.

Hope to get your insights.

> 
> >
> >>>
> >>> Now buffer split can be configured in two modes. For length based
> >>> buffer split, the mp, length, offset field in Rx packet segment
> >>> should be configured, while the proto_hdr field should not be configured.
> >>> For protocol header based buffer split, the mp, offset, proto_hdr
> >>> field in Rx packet segment should be configured, while the length
> >>> field should not be configured.
> >>>
> >>> The split limitations imposed by underlying PMD is reported in the
> >>> rte_eth_dev_info->rx_seg_capa field. The memory attributes for the
> >>> split parts may differ either, dpdk memory and external memory,
> >> respectively.
> >>>
> >>> Signed-off-by: Xuan Ding <xuan.ding@intel.com>
> >>> Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
> >>> Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
> >>> Reviewed-by: Qi Zhang <qi.z.zhang@intel.com>
> >>> Acked-by: Ray Kinsella <mdr@ashroe.eu>
> >>> ---
> >>>    lib/ethdev/rte_ethdev.c | 40 +++++++++++++++++++++++++++++++++---
> ----
> >>>    lib/ethdev/rte_ethdev.h | 28 +++++++++++++++++++++++++++-
> >>>    2 files changed, 60 insertions(+), 8 deletions(-)
> >>>
> >>> diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c index
> >>> 29a3d80466..fbd55cdd9d 100644
> >>> --- a/lib/ethdev/rte_ethdev.c
> >>> +++ b/lib/ethdev/rte_ethdev.c
> >>> @@ -1661,6 +1661,7 @@ rte_eth_rx_queue_check_split(const struct
> >> rte_eth_rxseg_split *rx_seg,
> >>>    		struct rte_mempool *mpl = rx_seg[seg_idx].mp;
> >>>    		uint32_t length = rx_seg[seg_idx].length;
> >>>    		uint32_t offset = rx_seg[seg_idx].offset;
> >>> +		uint32_t proto_hdr = rx_seg[seg_idx].proto_hdr;
> >>>
> >>>    		if (mpl == NULL) {
> >>>    			RTE_ETHDEV_LOG(ERR, "null mempool pointer\n");
> >> @@ -1694,13
> >>> +1695,38 @@ rte_eth_rx_queue_check_split(const struct
> >> rte_eth_rxseg_split *rx_seg,
> >>>    		}
> >>>    		offset += seg_idx != 0 ? 0 : RTE_PKTMBUF_HEADROOM;
> >>>    		*mbp_buf_size = rte_pktmbuf_data_room_size(mpl);
> >>> -		length = length != 0 ? length : *mbp_buf_size;
> >>> -		if (*mbp_buf_size < length + offset) {
> >>> -			RTE_ETHDEV_LOG(ERR,
> >>> -				       "%s mbuf_data_room_size %u < %u
> >> (segment length=%u + segment offset=%u)\n",
> >>> -				       mpl->name, *mbp_buf_size,
> >>> -				       length + offset, length, offset);
> >>> -			return -EINVAL;
> >>> +		if (proto_hdr == RTE_PTYPE_UNKNOWN) {
> >>> +			/* Split at fixed length. */
> >>> +			length = length != 0 ? length : *mbp_buf_size;
> >>> +			if (*mbp_buf_size < length + offset) {
> >>> +				RTE_ETHDEV_LOG(ERR,
> >>> +					"%s mbuf_data_room_size %u < %u
> >> (segment length=%u + segment offset=%u)\n",
> >>> +					mpl->name, *mbp_buf_size,
> >>> +					length + offset, length, offset);
> >>> +				return -EINVAL;
> >>> +			}
> >>> +		} else {
> >>> +			/* Split after specified protocol header. */
> >>> +			if (!(proto_hdr &
> >> RTE_BUFFER_SPLIT_PROTO_HDR_MASK)) {
> >>
> >> The condition looks suspicious. It will be true if proto_hdr has no
> >> single bit from the mask. I guess it is not the intent.
> >
> > Actually it is the intent... Here the mask is used to check if
> > proto_hdr belongs to the inner/outer L2/L3/L4 capability we defined.
> > And which proto_hdr is supported by the NIC will be checked in the PMD
> later.

Need to correct here: you are right, I made a bug in the previous implementation.

> 
> Frankly speaking I see no value in such incomplete check if we still rely on
> driver. I simply see no reason to oblige the driver to support one of these
> protocols.

With the API, we can get the driver's capabilities first, and do the check in ethdev later.
In this way, we can finish the checks in one pass. Please see v9.

> 
> >
> >> I guess the condition should be
> >>     proto_hdr & ~RTE_BUFFER_SPLIT_PROTO_HDR_MASK i.e. there is
> >> unsupported bits in proto_hdr
> >>
> >> IMHO we need extra field in dev_info to report supported protocols to
> >> split on. Or a new API to get an array similar to ptype get.
> >> May be a new API is a better choice to not overload dev_info and to
> >> be more flexible in reporting.
> >
> > Thanks for your suggestion.
> > Here I hope to confirm the intent of dev_info or API to expose the
> supported proto_hdr of driver.
> > Is it for the pro_hdr check in the rte_eth_rx_queue_check_split()?
> > If so, could we just check whether pro_hdrs configured belongs to
> > L2/L3/L4 in lib, and check the capability in PMD? This is what the current
> design does.
> 
> Look. Application needs to know what to expect from eth device.
> It should know which protocols it can split on. Of course we can enforce
> application to use try-fail approach which would make sense if we have
> dedicated API to request Rx buffer split, but since it is done via Rx queue
> configuration, it could be tricky for application to realize which part of the
> configuration is wrong. It could simply result in too many retries with
> different configurations.

Agree. To avoid the unnecessary try-fails, I will add a new API in dev_ops,
please see v9.

> 
> I.e. the information should be used by ethdev to validate request and the
> information should be used by the application to understand what is
> supported.
> 
> >
> > Actually I have another question, do we need a API or dev_info to expose
> which buffer split the driver supports.
> > i.e. length based or proto_hdr based. Because it requires different
> > fields to be configured in RX packet segment.
> 
> See above. If dedicated API return -ENOTSUP or empty set of supported
> protocols to split on, the answer is clear.

Get your point. I totally agree with your idea of a new API.
Through the API, the logic will look like:

	ret = rte_get_supported_buffer_split_proto();
	if (ret == -ENOTSUP)
		Check length based buffer split
	else
		Check proto based buffer split

We don't need to care about the irrelevant fields of the respective
buffer split mode anymore.
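
A slightly more concrete sketch with the new API planned for v9
(simplified, and treating the reported ptypes as a plain bitmask the
way the earlier mask check did; the real logic will live in
rte_eth_rx_queue_check_split()):

	uint32_t supported = 0;
	int ret = rte_eth_supported_hdrs_get(port_id, &supported);

	if (ret == -ENOTSUP) {
		/* Length based split: validate mp, length, offset only. */
	} else if (ret == 0) {
		/* Proto based split: each configured proto_hdr must be
		 * within the reported set. */
		if ((proto_hdr & ~supported) != 0)
			return -EINVAL;
	}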

BTW, could you help to review the deprecation notice for header split?
If it gets acked, I will start the deprecation in 22.11.

Thanks,
Xuan

> 
> >
> > Hope to get your insights. :)
> >
> >>
> >>> +				RTE_ETHDEV_LOG(ERR,
> >>> +					"Protocol header %u not
> >> supported)\n",
> >>> +					proto_hdr);
> >>
> >> I think it would be useful to log unsupported bits only, if we say so.
> >
> > The same as above.
> > Thanks again for your time.
> >
> > Regards,
> > Xuan

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v8 1/3] ethdev: introduce protocol hdr based buffer split
  2022-06-07 10:13           ` Ding, Xuan
@ 2022-06-07 10:48             ` Andrew Rybchenko
  2022-06-10 15:04               ` Ding, Xuan
  0 siblings, 1 reply; 88+ messages in thread
From: Andrew Rybchenko @ 2022-06-07 10:48 UTC (permalink / raw)
  To: Ding, Xuan, Wu, WenxuanX, thomas, Li, Xiaoyun, ferruh.yigit,
	Singh, Aman Deep, dev, Zhang, Yuying, Zhang, Qi Z, jerinjacobk
  Cc: stephen, Wang, YuanX, Ray Kinsella

On 6/7/22 13:13, Ding, Xuan wrote:
> Hi Andrew,
> 
>> -----Original Message-----
>> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
>> Sent: Saturday, June 4, 2022 10:26 PM
>> To: Ding, Xuan <xuan.ding@intel.com>; Wu, WenxuanX
>> <wenxuanx.wu@intel.com>; thomas@monjalon.net; Li, Xiaoyun
>> <xiaoyun.li@intel.com>; ferruh.yigit@xilinx.com; Singh, Aman Deep
>> <aman.deep.singh@intel.com>; dev@dpdk.org; Zhang, Yuying
>> <yuying.zhang@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>;
>> jerinjacobk@gmail.com
>> Cc: stephen@networkplumber.org; Wang, YuanX <yuanx.wang@intel.com>;
>> Ray Kinsella <mdr@ashroe.eu>
>> Subject: Re: [PATCH v8 1/3] ethdev: introduce protocol hdr based buffer split
>>
>> On 6/3/22 19:30, Ding, Xuan wrote:
>>> Hi Andrew,
>>>
>>>> -----Original Message-----
>>>> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
>>>> Sent: Thursday, June 2, 2022 9:21 PM
>>>> To: Wu, WenxuanX <wenxuanx.wu@intel.com>; thomas@monjalon.net;
>> Li,
>>>> Xiaoyun <xiaoyun.li@intel.com>; ferruh.yigit@xilinx.com; Singh, Aman
>>>> Deep <aman.deep.singh@intel.com>; dev@dpdk.org; Zhang, Yuying
>>>> <yuying.zhang@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>;
>>>> jerinjacobk@gmail.com
>>>> Cc: stephen@networkplumber.org; Ding, Xuan <xuan.ding@intel.com>;
>>>> Wang, YuanX <yuanx.wang@intel.com>; Ray Kinsella <mdr@ashroe.eu>
>>>> Subject: Re: [PATCH v8 1/3] ethdev: introduce protocol hdr based
>>>> buffer split
>>>>
>>>> Is it the right one since it is listed in patchwork?
>>>
>>> Yes, it is.
>>>
>>>>
>>>> On 6/1/22 16:50, wenxuanx.wu@intel.com wrote:
>>>>> From: Wenxuan Wu <wenxuanx.wu@intel.com>
>>>>>
>>>>> Currently, Rx buffer split supports length based split. With Rx
>>>>> queue offload RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT enabled and Rx
>> packet
>>>> segment
>>>>> configured, PMD will be able to split the received packets into
>>>>> multiple segments.
>>>>>
>>>>> However, length based buffer split is not suitable for NICs that do
>>>>> split based on protocol headers. Given a arbitrarily variable length
>>>>> in Rx packet
>>>>
>>>> a -> an
>>>
>>> Thanks for your catch, will fix it in next version.
>>>
>>>>
>>>>> segment, it is almost impossible to pass a fixed protocol header to PMD.
>>>>> Besides, the existence of tunneling results in the composition of a
>>>>> packet is various, which makes the situation even worse.
>>>>>
>>>>> This patch extends current buffer split to support protocol header
>>>>> based buffer split. A new proto_hdr field is introduced in the
>>>>> reserved field of rte_eth_rxseg_split structure to specify protocol
>>>>> header. The proto_hdr field defines the split position of packet,
>>>>> splitting will always happen after the protocol header defined in
>>>>> the Rx packet segment. When Rx queue offload
>>>>> RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT is enabled and corresponding
>>>> protocol
>>>>> header is configured, PMD will split the ingress packets into
>>>>> multiple
>>>> segments.
>>>>>
>>>>> struct rte_eth_rxseg_split {
>>>>>
>>>>>            struct rte_mempool *mp; /* memory pools to allocate
>>>>> segment from
>>>> */
>>>>>            uint16_t length; /* segment maximal data length,
>>>>>                                configures "split point" */
>>>>>            uint16_t offset; /* data offset from beginning
>>>>>                                of mbuf data buffer */
>>>>>            uint32_t proto_hdr; /* inner/outer L2/L3/L4 protocol header,
>>>>> 			       configures "split point" */
>>>>>        };
>>>>>
>>>>> Both inner and outer L2/L3/L4 level protocol header split can be
>> supported.
>>>>> Corresponding protocol header capability is RTE_PTYPE_L2_ETHER,
>>>>> RTE_PTYPE_L3_IPV4, RTE_PTYPE_L3_IPV6, RTE_PTYPE_L4_TCP,
>>>>> RTE_PTYPE_L4_UDP, RTE_PTYPE_L4_SCTP, RTE_PTYPE_INNER_L2_ETHER,
>>>>> RTE_PTYPE_INNER_L3_IPV4, RTE_PTYPE_INNER_L3_IPV6,
>>>>> RTE_PTYPE_INNER_L4_TCP, RTE_PTYPE_INNER_L4_UDP,
>>>> RTE_PTYPE_INNER_L4_SCTP.
>>>>>
>>>>> For example, let's suppose we configured the Rx queue with the
>>>>> following segments:
>>>>>        seg0 - pool0, proto_hdr0=RTE_PTYPE_L3_IPV4, off0=2B
>>>>>        seg1 - pool1, proto_hdr1=RTE_PTYPE_L4_UDP, off1=128B
>>>>>        seg2 - pool2, off1=0B
>>>>>
>>>>> The packet consists of MAC_IPV4_UDP_PAYLOAD will be split like
>>>>> following:
>>>>>        seg0 - ipv4 header @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from
>>>> pool0
>>>>>        seg1 - udp header @ 128 in mbuf from pool1
>>>>>        seg2 - payload @ 0 in mbuf from pool2
>>>>
>>>> It must be defined how ICMPv4 packets will be split in such case.
>>>> And how UDP over IPv6 will be split.
>>>
>>> The ICMP header type is missing; I will define the expected split
>>> behavior and add it in the next version, thanks for your catch.
> 
> I have a question here. Since ICMP packets are mainly used to check the
> connectivity of the network, is it necessary for us to split ICMP packets?
> And I found there is no RTE_PTYPE for ICMP.

I'm not saying that we should split on ICMP. I'm just saying that we
must define what happens for packets which do not match the split
specification. Does it split on the longest match, with everything else
put in the (which?) last buffer? E.g. we configure split on ETH-IPv4-TCP.
What happens with ETH-IPv4-UDP? ETH-IPv6?

>>>
>>> In fact, the buffer split based on protocol header depends on the driver
>> parsing result.
>>> As long as driver can recognize this packet type, I think there is no
>>> difference between UDP over IPV4 and UDP over IPV6?
>>
>> We can bind it to ptypes recognized by the HW+driver, but I can easily
>> imagine the case when HW has no means to report recognized packet type
>> (i.e. ptype get returns empty list), but still could split on it.
> 
> Get your point. But if one ptype cannot be recognized by HW+driver, is it still necessary for
> us to do the split? The main purpose of buffer split is to split header and payload. Although we
> add split for various protocol headers now, we should focus on the ptypes that can be recognized.

Recognition and reporting are separate things. It could be recognized,
but the HW can have no means to report it to the driver. ptype_get is about
reporting.

> 
>> Also, nobody guarantees that there is no difference in UDP over IPv4 vs
>> IPv6 recognition and split. IPv6 could have a number of extension headers
>> which could be not that trivial to hop in HW. So, HW could recognize IPv6,
>> but not protocols after it.
>> Also it is a very interesting question how to define protocol split for IPv6 plus
>> extension headers. Where to stop?
> 
> The extension header you mentioned is indeed an interesting question.
> On our device, the stop would be the end of extension header. The same as
> above, the main purpose of buffer split is for header and payload.
> Even rte_flow, we don't list all of the extension headers. So we can't cope with
> all the IPV6 extension headers.

Again, we must define the behaviour. Application needs to know what to
expect.

> 
> For IPV6 extension headers, what if we treat the IPV6 header and extension
> header as one layer? Because 99% of cases will not require a separate extension
> header.

I'd like to highlight that it is not "an extension header". It is
'extension headers' (plural). I'm not sure that we can say that
in order to split on IPv6 HW must support *all* (even future)
extension headers.

> 
> Hope to get your insights.

Unfortunately I have no solutions. Just questions to be answered...

> 
>>
>>>
>>>>>
>>>>> Now buffer split can be configured in two modes. For length based
>>>>> buffer split, the mp, length, offset field in Rx packet segment
>>>>> should be configured, while the proto_hdr field should not be configured.
>>>>> For protocol header based buffer split, the mp, offset, proto_hdr
>>>>> field in Rx packet segment should be configured, while the length
>>>>> field should not be configured.
>>>>>
>>>>> The split limitations imposed by underlying PMD is reported in the
>>>>> rte_eth_dev_info->rx_seg_capa field. The memory attributes for the
>>>>> split parts may differ either, dpdk memory and external memory,
>>>> respectively.
>>>>>
>>>>> Signed-off-by: Xuan Ding <xuan.ding@intel.com>
>>>>> Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
>>>>> Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
>>>>> Reviewed-by: Qi Zhang <qi.z.zhang@intel.com>
>>>>> Acked-by: Ray Kinsella <mdr@ashroe.eu>
>>>>> ---
>>>>>     lib/ethdev/rte_ethdev.c | 40 +++++++++++++++++++++++++++++++++---
>> ----
>>>>>     lib/ethdev/rte_ethdev.h | 28 +++++++++++++++++++++++++++-
>>>>>     2 files changed, 60 insertions(+), 8 deletions(-)
>>>>>
>>>>> diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c index
>>>>> 29a3d80466..fbd55cdd9d 100644
>>>>> --- a/lib/ethdev/rte_ethdev.c
>>>>> +++ b/lib/ethdev/rte_ethdev.c
>>>>> @@ -1661,6 +1661,7 @@ rte_eth_rx_queue_check_split(const struct
>>>> rte_eth_rxseg_split *rx_seg,
>>>>>     		struct rte_mempool *mpl = rx_seg[seg_idx].mp;
>>>>>     		uint32_t length = rx_seg[seg_idx].length;
>>>>>     		uint32_t offset = rx_seg[seg_idx].offset;
>>>>> +		uint32_t proto_hdr = rx_seg[seg_idx].proto_hdr;
>>>>>
>>>>>     		if (mpl == NULL) {
>>>>>     			RTE_ETHDEV_LOG(ERR, "null mempool pointer\n");
>>>> @@ -1694,13
>>>>> +1695,38 @@ rte_eth_rx_queue_check_split(const struct
>>>> rte_eth_rxseg_split *rx_seg,
>>>>>     		}
>>>>>     		offset += seg_idx != 0 ? 0 : RTE_PKTMBUF_HEADROOM;
>>>>>     		*mbp_buf_size = rte_pktmbuf_data_room_size(mpl);
>>>>> -		length = length != 0 ? length : *mbp_buf_size;
>>>>> -		if (*mbp_buf_size < length + offset) {
>>>>> -			RTE_ETHDEV_LOG(ERR,
>>>>> -				       "%s mbuf_data_room_size %u < %u
>>>> (segment length=%u + segment offset=%u)\n",
>>>>> -				       mpl->name, *mbp_buf_size,
>>>>> -				       length + offset, length, offset);
>>>>> -			return -EINVAL;
>>>>> +		if (proto_hdr == RTE_PTYPE_UNKNOWN) {
>>>>> +			/* Split at fixed length. */
>>>>> +			length = length != 0 ? length : *mbp_buf_size;
>>>>> +			if (*mbp_buf_size < length + offset) {
>>>>> +				RTE_ETHDEV_LOG(ERR,
>>>>> +					"%s mbuf_data_room_size %u < %u
>>>> (segment length=%u + segment offset=%u)\n",
>>>>> +					mpl->name, *mbp_buf_size,
>>>>> +					length + offset, length, offset);
>>>>> +				return -EINVAL;
>>>>> +			}
>>>>> +		} else {
>>>>> +			/* Split after specified protocol header. */
>>>>> +			if (!(proto_hdr &
>>>> RTE_BUFFER_SPLIT_PROTO_HDR_MASK)) {
>>>>
>>>> The condition looks suspicious. It will be true if proto_hdr has no
>>>> single bit from the mask. I guess it is not the intent.
>>>
>>> Actually it is the intent... Here the mask is used to check if
>>> proto_hdr belongs to the inner/outer L2/L3/L4 capability we defined.
>>> And which proto_hdr is supported by the NIC will be checked in the PMD
>> later.
> 
> Need to correct here. You are right, I made a bug in previous implementation.
> 
>>
>> Frankly speaking I see no value in such incomplete check if we still rely on
>> driver. I simply see no reason to oblige the driver to support one of these
>> protocols.
> 
> With the API, we can get the driver's capabilities first, and do the check in ethdev later.
> In this way, we can finish the checks in one pass. Please see v9.
> 
>>
>>>
>>>> I guess the condition should be
>>>>      proto_hdr & ~RTE_BUFFER_SPLIT_PROTO_HDR_MASK i.e. there is
>>>> unsupported bits in proto_hdr
>>>>
>>>> IMHO we need extra field in dev_info to report supported protocols to
>>>> split on. Or a new API to get an array similar to ptype get.
>>>> May be a new API is a better choice to not overload dev_info and to
>>>> be more flexible in reporting.
>>>
>>> Thanks for your suggestion.
>>> Here I hope to confirm the intent of dev_info or API to expose the
>> supported proto_hdr of driver.
>>> Is it for the pro_hdr check in the rte_eth_rx_queue_check_split()?
>>> If so, could we just check whether pro_hdrs configured belongs to
>>> L2/L3/L4 in lib, and check the capability in PMD? This is what the current
>> design does.
>>
>> Look. Application needs to know what to expect from eth device.
>> It should know which protocols it can split on. Of course we can enforce
>> application to use try-fail approach which would make sense if we have
>> dedicated API to request Rx buffer split, but since it is done via Rx queue
>> configuration, it could be tricky for application to realize which part of the
>> configuration is wrong. It could simply result in too many retries with
>> different configurations.
> 
> Agree. To avoid the unnecessary try-fails, I will add a new API in dev_ops,
> please see v9.
> 
>>
>> I.e. the information should be used by ethdev to validate request and the
>> information should be used by the application to understand what is
>> supported.
>>
>>>
>>> Actually I have another question, do we need a API or dev_info to expose
>> which buffer split the driver supports.
>>> i.e. length based or proto_hdr based. Because it requires different
>>> fields to be configured in RX packet segment.
>>
>> See above. If dedicated API return -ENOTSUP or empty set of supported
>> protocols to split on, the answer is clear.
> 
> Get your point. I totally agree with your idea of a new API.
> Through the API, the logic will look like:
> ret = rte_get_supported_buffer_split_proto();
> 	if (ret == -ENOTSUP)
> 		Check length based buffer split
> 	else
> 		Check proto based buffer split
> 
> We don't need to care about the irrelevant fields in respective
> buffer split anymore.
> 
> BTW, could you help to review the deprecation notice for header split?
> If it gets acked, I will start the deprecation in 22.11.

Since the feature is definitely dead, I'd use a faster track:
deprecate in 22.07 and remove in 22.11.

> 
> Thanks,
> Xuan
> 
>>
>>>
>>> Hope to get your insights. :)
>>>
>>>>
>>>>> +				RTE_ETHDEV_LOG(ERR,
>>>>> +					"Protocol header %u not
>>>> supported)\n",
>>>>> +					proto_hdr);
>>>>
>>>> I think it would be useful to log unsupported bits only, if we say so.
>>>
>>> The same as above.
>>> Thanks again for your time.
>>>
>>> Regards,
>>> Xuan


^ permalink raw reply	[flat|nested] 88+ messages in thread

* RE: [PATCH v8 1/3] ethdev: introduce protocol hdr based buffer split
  2022-06-07 10:48             ` Andrew Rybchenko
@ 2022-06-10 15:04               ` Ding, Xuan
  0 siblings, 0 replies; 88+ messages in thread
From: Ding, Xuan @ 2022-06-10 15:04 UTC (permalink / raw)
  To: Andrew Rybchenko, Wu, WenxuanX, thomas, Li, Xiaoyun,
	ferruh.yigit, Singh, Aman Deep, dev, Zhang, Yuying, Zhang, Qi Z,
	jerinjacobk
  Cc: stephen, Wang, YuanX, Ray Kinsella

Hi Andrew,

Sorry for the late response, please see replies inline.

> -----Original Message-----
> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> Sent: Tuesday, June 7, 2022 6:49 PM
> To: Ding, Xuan <xuan.ding@intel.com>; Wu, WenxuanX
> <wenxuanx.wu@intel.com>; thomas@monjalon.net; Li, Xiaoyun
> <xiaoyun.li@intel.com>; ferruh.yigit@xilinx.com; Singh, Aman Deep
> <aman.deep.singh@intel.com>; dev@dpdk.org; Zhang, Yuying
> <yuying.zhang@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>;
> jerinjacobk@gmail.com
> Cc: stephen@networkplumber.org; Wang, YuanX <yuanx.wang@intel.com>;
> Ray Kinsella <mdr@ashroe.eu>
> Subject: Re: [PATCH v8 1/3] ethdev: introduce protocol hdr based buffer split
> 
> On 6/7/22 13:13, Ding, Xuan wrote:
> > Hi Andrew,
> >
> >> -----Original Message-----
> >> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> >> Sent: Saturday, June 4, 2022 10:26 PM
> >> To: Ding, Xuan <xuan.ding@intel.com>; Wu, WenxuanX
> >> <wenxuanx.wu@intel.com>; thomas@monjalon.net; Li, Xiaoyun
> >> <xiaoyun.li@intel.com>; ferruh.yigit@xilinx.com; Singh, Aman Deep
> >> <aman.deep.singh@intel.com>; dev@dpdk.org; Zhang, Yuying
> >> <yuying.zhang@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>;
> >> jerinjacobk@gmail.com
> >> Cc: stephen@networkplumber.org; Wang, YuanX
> <yuanx.wang@intel.com>;
> >> Ray Kinsella <mdr@ashroe.eu>
> >> Subject: Re: [PATCH v8 1/3] ethdev: introduce protocol hdr based
> >> buffer split
> >>
> >> On 6/3/22 19:30, Ding, Xuan wrote:
> >>> Hi Andrew,
> >>>
> >>>> -----Original Message-----
> >>>> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> >>>> Sent: Thursday, June 2, 2022 9:21 PM
> >>>> To: Wu, WenxuanX <wenxuanx.wu@intel.com>;
> thomas@monjalon.net;
> >> Li,
> >>>> Xiaoyun <xiaoyun.li@intel.com>; ferruh.yigit@xilinx.com; Singh,
> >>>> Aman Deep <aman.deep.singh@intel.com>; dev@dpdk.org; Zhang,
> Yuying
> >>>> <yuying.zhang@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>;
> >>>> jerinjacobk@gmail.com
> >>>> Cc: stephen@networkplumber.org; Ding, Xuan <xuan.ding@intel.com>;
> >>>> Wang, YuanX <yuanx.wang@intel.com>; Ray Kinsella <mdr@ashroe.eu>
> >>>> Subject: Re: [PATCH v8 1/3] ethdev: introduce protocol hdr based
> >>>> buffer split
> >>>>
> >>>> Is it the right one since it is listed in patchwork?
> >>>
> >>> Yes, it is.
> >>>
> >>>>
> >>>> On 6/1/22 16:50, wenxuanx.wu@intel.com wrote:
> >>>>> From: Wenxuan Wu <wenxuanx.wu@intel.com>
> >>>>>
> >>>>> Currently, Rx buffer split supports length based split. With Rx
> >>>>> queue offload RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT enabled and Rx
> >> packet
> >>>> segment
> >>>>> configured, PMD will be able to split the received packets into
> >>>>> multiple segments.
> >>>>>
> >>>>> However, length based buffer split is not suitable for NICs that
> >>>>> do split based on protocol headers. Given a arbitrarily variable
> >>>>> length in Rx packet
> >>>>
> >>>> a -> an
> >>>
> >>> Thanks for your catch, will fix it in next version.
> >>>
> >>>>
> >>>>> segment, it is almost impossible to pass a fixed protocol header to
> PMD.
> >>>>> Besides, the existence of tunneling results in the composition of
> >>>>> a packet is various, which makes the situation even worse.
> >>>>>
> >>>>> This patch extends current buffer split to support protocol header
> >>>>> based buffer split. A new proto_hdr field is introduced in the
> >>>>> reserved field of rte_eth_rxseg_split structure to specify
> >>>>> protocol header. The proto_hdr field defines the split position of
> >>>>> packet, splitting will always happen after the protocol header
> >>>>> defined in the Rx packet segment. When Rx queue offload
> >>>>> RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT is enabled and corresponding
> >>>> protocol
> >>>>> header is configured, PMD will split the ingress packets into
> >>>>> multiple
> >>>> segments.
> >>>>>
> >>>>> struct rte_eth_rxseg_split {
> >>>>>
> >>>>>            struct rte_mempool *mp; /* memory pools to allocate
> >>>>> segment from
> >>>> */
> >>>>>            uint16_t length; /* segment maximal data length,
> >>>>>                                configures "split point" */
> >>>>>            uint16_t offset; /* data offset from beginning
> >>>>>                                of mbuf data buffer */
> >>>>>            uint32_t proto_hdr; /* inner/outer L2/L3/L4 protocol header,
> >>>>> 			       configures "split point" */
> >>>>>        };
> >>>>>
> >>>>> Both inner and outer L2/L3/L4 level protocol header split can be
> >> supported.
> >>>>> Corresponding protocol header capability is RTE_PTYPE_L2_ETHER,
> >>>>> RTE_PTYPE_L3_IPV4, RTE_PTYPE_L3_IPV6, RTE_PTYPE_L4_TCP,
> >>>>> RTE_PTYPE_L4_UDP, RTE_PTYPE_L4_SCTP,
> RTE_PTYPE_INNER_L2_ETHER,
> >>>>> RTE_PTYPE_INNER_L3_IPV4, RTE_PTYPE_INNER_L3_IPV6,
> >>>>> RTE_PTYPE_INNER_L4_TCP, RTE_PTYPE_INNER_L4_UDP,
> >>>> RTE_PTYPE_INNER_L4_SCTP.
> >>>>>
> >>>>> For example, let's suppose we configured the Rx queue with the
> >>>>> following segments:
> >>>>>        seg0 - pool0, proto_hdr0=RTE_PTYPE_L3_IPV4, off0=2B
> >>>>>        seg1 - pool1, proto_hdr1=RTE_PTYPE_L4_UDP, off1=128B
> >>>>>        seg2 - pool2, off1=0B
> >>>>>
> >>>>> The packet consists of MAC_IPV4_UDP_PAYLOAD will be split like
> >>>>> following:
> >>>>>        seg0 - ipv4 header @ RTE_PKTMBUF_HEADROOM + 2 in mbuf
> from
> >>>> pool0
> >>>>>        seg1 - udp header @ 128 in mbuf from pool1
> >>>>>        seg2 - payload @ 0 in mbuf from pool2
> >>>>
> >>>> It must be defined how ICMPv4 packets will be split in such case.
> >>>> And how UDP over IPv6 will be split.
> >>>
> >>> The ICMP header type is missing; I will define the expected split
> >>> behavior and add it in the next version, thanks for your catch.
> >
> > I have a question here. Since ICMP packets are mainly used to check
> > the connectivity of the network, is it necessary for us to split ICMP packets?
> > And I found there is no RTE_PTYPE for ICMP.
> 
> I'm not saying that we should split on ICMP. I'm just saying that we must
> define what happens for packets which do not match the split
> specification. Does it split on the longest match, with everything else put in
> the (which?) last buffer? E.g. we configure split on ETH-IPv4-TCP.
> What happens with ETH-IPv4-UDP? ETH-IPv6?

Get your point. Firstly, our device only supports splitting the packets into two
segments, so there will be an exact match for the configured protocol header.
Back to this question, for the set of proto_hdrs configured, there can be two
behaviors:
1. The aggressive way is to split on the longest match you mentioned. E.g. if we
configure split on ETH-IPV4-TCP and receive ETH-IPV4-UDP or ETH-IPV6, it can
still split on ETH-IPV4 or ETH.
2. A more conservative way is to split only when the packet matches the protocol
headers in the Rx packet segment. In the above situation, ETH-IPV4-UDP and
ETH-IPV6 will not be split at all.

I prefer the second behavior, because the split is usually for the innermost
header and payload; if they do not match, the rest of the headers have no
actual value. What do you think?
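
To make behavior 2 concrete with your earlier example (my assumption is
that an unsplit packet lands entirely in the first segment's mempool,
to be confirmed): with split configured on ETH-IPV4-TCP,

      ETH-IPV4-TCP -> split after each configured header,
                      payload in the last segment
      ETH-IPV4-UDP -> no split, whole packet in the first segment
      ETH-IPV6     -> no split, whole packet in the first segment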

> 
> >>>
> >>> In fact, the buffer split based on protocol header depends on the
> >>> driver
> >> parsing result.
> >>> As long as driver can recognize this packet type, I think there is
> >>> no difference between UDP over IPV4 and UDP over IPV6?
> >>
> >> We can bind it to ptypes recognized by the HW+driver, but I can
> >> easily imagine the case when HW has no means to report recognized
> >> packet type (i.e. ptype get returns empty list), but still could split on it.
> >
> > Get your point. But if one ptype cannot be recognized by HW+driver, is
> > it still necessary for us to do the split? The main purpose of buffer
> > split is to split header and payload. Although we add split for various
> > protocol headers now, we should focus on the ptypes that can be recognized.
> 
> Recognition and reporting are separate things. It could be recognized, but it
> can have no means to report it to the driver. ptype_get is about reporting.
> 
> >
> >> Also, nobody guarantees that there is no difference in UDP over IPv4
> >> vs
> >> IPv6 recognition and split. IPv6 could have a number of extension
> >> headers which could be not that trivial to hop in HW. So, HW could
> >> recognize IPv6, but not protocols after it.
> >> Also it is a very interesting question how to define protocol split for
> >> IPv6 plus extension headers. Where to stop?
> >
> > The extension header you mentioned is indeed an interesting question.
> > On our device, the stop would be the end of extension header. The same
> > as above, the main purpose of buffer split is for header and payload.
> > Even rte_flow, we don't list all of the extension headers. So we can't
> > cope with all the IPV6 extension headers.
> 
> Again, we must define the behaviour. Application needs to know what to
> expect.

Now I understand the behavior needs to be defined clearly for the application.
The application needs a clear expectation for each function call.

> 
> >
> > For IPV6 extension headers, what if we treat the IPV6 header and
> > extension header as one layer? Because 99% of cases will not require a
> > separate extension header.
> 
> I'd like to highlight that it is not "an extension header". It is 'extension
> headers' (plural). I'm not sure that we can say that in order to split on IPv6
> HW must support *all* (even future) extension headers.

Yes, I'm also referring to extension headers (plural) here.

Whether it's one or more extension headers for IPV6, as long as the driver can
recognize the end of the IPV6 extension headers, we split after them once.
E.g. for ETH-IPV6-IPV6 extension-UDP-payload, we split it into ETH-IPV6-IPV6-extension,
UDP and payload, with Rx segments RTE_PTYPE_IPV6 and RTE_PTYPE_UDP configured.

This is the behavior I hope to define for IPV6 extensions (it is also our device's behavior).
We don't specify the IPV6 extension header in the Rx segment. If IPV6 split is configured,
the IPV6 header and extension headers (if any) will be treated as one layer.
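
As a sketch, the Rx segment configuration for that example could look
like the following (hdr_pool and pay_pool are placeholder mempools
here, and I use the L3/L4 ptype names from the capability list):

	struct rte_eth_rxseg_split segs[3] = {
		/* ETH through IPV6 + any extension headers */
		{ .mp = hdr_pool, .proto_hdr = RTE_PTYPE_L3_IPV6 },
		/* UDP header */
		{ .mp = hdr_pool, .proto_hdr = RTE_PTYPE_L4_UDP },
		/* payload: no proto_hdr, takes the rest of the packet */
		{ .mp = pay_pool, .proto_hdr = RTE_PTYPE_UNKNOWN },
	};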

> 
> >
> > Hope to get your insights.
> 
> Unfortunately I have no solutions. Just questions to be answered...
> 
> >
> >>
> >>>
> >>>>>
> >>>>> Now buffer split can be configured in two modes. For length based
> >>>>> buffer split, the mp, length, offset field in Rx packet segment
> >>>>> should be configured, while the proto_hdr field should not be
> configured.
> >>>>> For protocol header based buffer split, the mp, offset, proto_hdr
> >>>>> field in Rx packet segment should be configured, while the length
> >>>>> field should not be configured.
> >>>>>
> >>>>> The split limitations imposed by underlying PMD is reported in the
> >>>>> rte_eth_dev_info->rx_seg_capa field. The memory attributes for the
> >>>>> split parts may differ either, dpdk memory and external memory,
> >>>> respectively.
> >>>>>
> >>>>> Signed-off-by: Xuan Ding <xuan.ding@intel.com>
> >>>>> Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
> >>>>> Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
> >>>>> Reviewed-by: Qi Zhang <qi.z.zhang@intel.com>
> >>>>> Acked-by: Ray Kinsella <mdr@ashroe.eu>
> >>>>> ---
> >>>>>     lib/ethdev/rte_ethdev.c | 40
> >>>>> +++++++++++++++++++++++++++++++++---
> >> ----
> >>>>>     lib/ethdev/rte_ethdev.h | 28 +++++++++++++++++++++++++++-
> >>>>>     2 files changed, 60 insertions(+), 8 deletions(-)
> >>>>>
> >>>>> diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
> >>>>> index 29a3d80466..fbd55cdd9d 100644
> >>>>> --- a/lib/ethdev/rte_ethdev.c
> >>>>> +++ b/lib/ethdev/rte_ethdev.c
> >>>>> @@ -1661,6 +1661,7 @@ rte_eth_rx_queue_check_split(const struct
> >>>> rte_eth_rxseg_split *rx_seg,
> >>>>>     		struct rte_mempool *mpl = rx_seg[seg_idx].mp;
> >>>>>     		uint32_t length = rx_seg[seg_idx].length;
> >>>>>     		uint32_t offset = rx_seg[seg_idx].offset;
> >>>>> +		uint32_t proto_hdr = rx_seg[seg_idx].proto_hdr;
> >>>>>
> >>>>>     		if (mpl == NULL) {
> >>>>>     			RTE_ETHDEV_LOG(ERR, "null mempool
> pointer\n");
> >>>> @@ -1694,13
> >>>>> +1695,38 @@ rte_eth_rx_queue_check_split(const struct
> >>>> rte_eth_rxseg_split *rx_seg,
> >>>>>     		}
> >>>>>     		offset += seg_idx != 0 ? 0 :
> RTE_PKTMBUF_HEADROOM;
> >>>>>     		*mbp_buf_size = rte_pktmbuf_data_room_size(mpl);
> >>>>> -		length = length != 0 ? length : *mbp_buf_size;
> >>>>> -		if (*mbp_buf_size < length + offset) {
> >>>>> -			RTE_ETHDEV_LOG(ERR,
> >>>>> -				       "%s mbuf_data_room_size %u < %u
> >>>> (segment length=%u + segment offset=%u)\n",
> >>>>> -				       mpl->name, *mbp_buf_size,
> >>>>> -				       length + offset, length, offset);
> >>>>> -			return -EINVAL;
> >>>>> +		if (proto_hdr == RTE_PTYPE_UNKNOWN) {
> >>>>> +			/* Split at fixed length. */
> >>>>> +			length = length != 0 ? length : *mbp_buf_size;
> >>>>> +			if (*mbp_buf_size < length + offset) {
> >>>>> +				RTE_ETHDEV_LOG(ERR,
> >>>>> +					"%s
> mbuf_data_room_size %u < %u
> >>>> (segment length=%u + segment offset=%u)\n",
> >>>>> +					mpl->name, *mbp_buf_size,
> >>>>> +					length + offset, length, offset);
> >>>>> +				return -EINVAL;
> >>>>> +			}
> >>>>> +		} else {
> >>>>> +			/* Split after specified protocol header. */
> >>>>> +			if (!(proto_hdr &
> >>>> RTE_BUFFER_SPLIT_PROTO_HDR_MASK)) {
> >>>>
> >>>> The condition looks suspicious. It will be true if proto_hdr has no
> >>>> single bit from the mask. I guess it is not the intent.
> >>>
> >>> Actually it is the intent... Here the mask is used to check if
> >>> proto_hdr belongs to the inner/outer L2/L3/L4 capability we defined.
> >>> And which proto_hdr is supported by the NIC will be checked in the
> >>> PMD
> >> later.
> >
> > Need to correct here. You are right, I made a bug in previous
> implementation.
> >
> >>
> >> Frankly speaking I see no value in such incomplete check if we still
> >> rely on driver. I simply see no reason to oblige the driver to
> >> support one of these protocols.
> >
> > With the API, we can get the driver's capabilities first, and do the check in
> > ethdev later.
> > In this way, we can finish the checks in one pass. Please see v9.
> >
> >>
> >>>
> >>>> I guess the condition should be
> >>>>      proto_hdr & ~RTE_BUFFER_SPLIT_PROTO_HDR_MASK i.e. there is
> >>>> unsupported bits in proto_hdr
> >>>>
> >>>> IMHO we need extra field in dev_info to report supported protocols
> >>>> to split on. Or a new API to get an array similar to ptype get.
> >>>> May be a new API is a better choice to not overload dev_info and to
> >>>> be more flexible in reporting.
> >>>
> >>> Thanks for your suggestion.
> >>> Here I hope to confirm the intent of dev_info or API to expose the
> >> supported proto_hdr of driver.
> >>> Is it for the pro_hdr check in the rte_eth_rx_queue_check_split()?
> >>> If so, could we just check whether pro_hdrs configured belongs to
> >>> L2/L3/L4 in lib, and check the capability in PMD? This is what the
> >>> current
> >> design does.
> >>
> >> Look. Application needs to know what to expect from eth device.
> >> It should know which protocols it can split on. Of course we can
> >> enforce application to use try-fail approach which would make sense
> >> if we have dedicated API to request Rx buffer split, but since it is
> >> done via Rx queue configuration, it could be tricky for application
> >> to realize which part of the configuration is wrong. It could simply
> >> result in too many retries with different configurations.
> >
> > Agree. To avoid the unnecessary try-fails, I will add a new API in
> > dev_ops, please see v9.
> >
> >>
> >> I.e. the information should be used by ethdev to validate request and
> >> the information should be used by the application to understand what
> >> is supported.
> >>
> >>>
> >>> Actually I have another question, do we need a API or dev_info to
> >>> expose
> >> which buffer split the driver supports.
> >>> i.e. length based or proto_hdr based. Because it requires different
> >>> fields to be configured in RX packet segment.
> >>
> >> See above. If dedicated API return -ENOTSUP or empty set of supported
> >> protocols to split on, the answer is clear.
> >
> > Get your point. I totally agree with your idea of a new API.
> > Through the API, the logic will look like:
> > ret = rte_get_supported_buffer_split_proto();
> > 	if (ret == -ENOTSUP)
> > 		Check length based buffer split
> > 	else
> > 		Check proto based buffer split
> >
> > We don't need to care about the irrelevant fields in respective buffer
> > split anymore.
> >
> > BTW, could you help to review the deprecation notice for header split?
> > If it gets acked, I will start the deprecation in 22.11.
> 
> Since the feature is definitely dead, I'd use a faster track:
> deprecate in 22.07 and remove in 22.11.

Does this mean I don't need to do the removal work? I'm also willing to help.
A deprecation notice has been sent in 22.07:
http://patchwork.dpdk.org/project/dpdk/patch/20220523142016.44451-1-xuan.ding@intel.com/

Thanks,
Xuan

> 
> >
> > Thanks,
> > Xuan
> >
> >>
> >>>
> >>> Hope to get your insights. :)
> >>>
> >>>>
> >>>>> +				RTE_ETHDEV_LOG(ERR,
> >>>>> +					"Protocol header %u not
> >>>> supported)\n",
> >>>>> +					proto_hdr);
> >>>>
> >>>> I think it would be useful to log unsupported bits only, if we say so.
> >>>
> >>> The same as above.
> >>> Thanks again for your time.
> >>>
> >>> Regards,
> >>> Xuan


^ permalink raw reply	[flat|nested] 88+ messages in thread

* [PATCH v9 0/4] add an api to support proto based buffer split
  2022-03-03  6:01 [RFC] ethdev: introduce protocol type based header split xuan.ding
                   ` (9 preceding siblings ...)
  2022-06-01 13:50 ` [PATCH v8 0/3] ethdev: introduce protocol type based header split wenxuanx.wu
@ 2022-06-13 10:25 ` wenxuanx.wu
  2022-06-13 10:25   ` [PATCH v9 1/4] ethdev: introduce protocol header API wenxuanx.wu
                     ` (4 more replies)
  10 siblings, 5 replies; 88+ messages in thread
From: wenxuanx.wu @ 2022-06-13 10:25 UTC (permalink / raw)
  To: thomas, andrew.rybchenko, xiaoyun.li, ferruh.yigit,
	aman.deep.singh, dev, yuying.zhang, qi.z.zhang, jerinjacobk
  Cc: stephen, Wenxuan Wu

From: Wenxuan Wu <wenxuanx.wu@intel.com>

Protocol type based buffer split consists of splitting a received packet into
several separate segments based on the packet content. It is useful in some
scenarios, such as GPU acceleration. The splitting will help to enable
true zero copy and hence improve the performance significantly.

This patchset aims to support protocol header split based on current buffer
split. When Rx queue is configured with RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT
offload and corresponding protocol, packets received will be directly split
into different mempools.

v8->v9:
* Introduce a new API rte_eth_supported_hdrs_get to retrieve the supported
  ptype mask of a PMD to split on.
* Fix header protocol split check.
* Support header protocol configuration of rxhdrs by default, e.g.
  ipv4, ipv6, mac, inner_mac, outer_mac, l3, l4.
* Refine doc.

v7->v8:
* Refine ethdev doc.
* Fix header protocol split check.

v6->v7:
* Fix supported header protocol check.
* Add rxhdrs commands and parameters.

v5->v6:
* The header split deprecation notice is sent.
* Refine the documents, protocol header based buffer split can actually
  support multi-segment split.
* Add buffer split protocol header capability.
* Fix some format issues.

v4->v5:
* Use protocol and mbuf_offset based buffer split instead of header split.
* Use RTE_PTYPE* instead of enum rte_eth_rx_header_split_protocol_type.
* Improve the description of rte_eth_rxseg_split.proto.

v3->v4:
* Use RTE_ETH_RX_HEADER_SPLIT_NONE instead of 0.

v2->v3:
* Fix a PMD bug.
* Add rx queue header split check.
* Revise the log and doc.

v1->v2:
* Add support for all header split protocol types.

Wenxuan Wu (4):
  ethdev: introduce protocol header API
  ethdev: introduce protocol hdr based buffer split
  app/testpmd: add rxhdrs commands and parameters
  net/ice: support buffer split in Rx path

 app/test-pmd/cmdline.c                 | 133 ++++++++++++++-
 app/test-pmd/config.c                  |  75 +++++++++
 app/test-pmd/parameters.c              |  15 +-
 app/test-pmd/testpmd.c                 |   6 +-
 app/test-pmd/testpmd.h                 |   6 +
 doc/guides/rel_notes/release_22_07.rst |   2 +
 drivers/net/ice/ice_ethdev.c           |  38 ++++-
 drivers/net/ice/ice_rxtx.c             | 220 +++++++++++++++++++++----
 drivers/net/ice/ice_rxtx.h             |  16 ++
 drivers/net/ice/ice_rxtx_vec_common.h  |   3 +
 lib/ethdev/ethdev_driver.h             |  18 ++
 lib/ethdev/rte_ethdev.c                |  61 +++++--
 lib/ethdev/rte_ethdev.h                |  36 +++-
 lib/ethdev/version.map                 |   3 +
 14 files changed, 582 insertions(+), 50 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 88+ messages in thread

* [PATCH v9 1/4] ethdev: introduce protocol header API
  2022-06-13 10:25 ` [PATCH v9 0/4] add an api to support proto based buffer split wenxuanx.wu
@ 2022-06-13 10:25   ` wenxuanx.wu
  2022-07-07  9:05     ` Thomas Monjalon
  2022-07-08 15:00     ` Andrew Rybchenko
  2022-06-13 10:25   ` [PATCH v9 2/4] ethdev: introduce protocol hdr based buffer split wenxuanx.wu
                     ` (3 subsequent siblings)
  4 siblings, 2 replies; 88+ messages in thread
From: wenxuanx.wu @ 2022-06-13 10:25 UTC (permalink / raw)
  To: thomas, andrew.rybchenko, xiaoyun.li, ferruh.yigit,
	aman.deep.singh, dev, yuying.zhang, qi.z.zhang, jerinjacobk
  Cc: stephen, Wenxuan Wu

From: Wenxuan Wu <wenxuanx.wu@intel.com>

This patch adds a new ethdev API to retrieve the supported protocol header
mask of a PMD, which helps to configure protocol header based buffer split.

Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
---
 doc/guides/rel_notes/release_22_07.rst |  2 ++
 lib/ethdev/ethdev_driver.h             | 18 +++++++++++++++
 lib/ethdev/rte_ethdev.c                | 31 +++++++++++++++++---------
 lib/ethdev/rte_ethdev.h                | 22 ++++++++++++++++++
 lib/ethdev/version.map                 |  3 +++
 5 files changed, 65 insertions(+), 11 deletions(-)

diff --git a/doc/guides/rel_notes/release_22_07.rst b/doc/guides/rel_notes/release_22_07.rst
index 42a5f2d990..a9b8ed3494 100644
--- a/doc/guides/rel_notes/release_22_07.rst
+++ b/doc/guides/rel_notes/release_22_07.rst
@@ -54,7 +54,9 @@ New Features
      This section is a comment. Do not overwrite or remove it.
      Also, make sure to start the actual text at the margin.
      =======================================================
+* **Added new ethdev API for PMD to get buffer split supported protocol types.**
 
+  Added ``rte_eth_supported_hdrs_get()``, to get supported header protocol mask of a PMD to split.
 
 Removed Items
 -------------
diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
index 69d9dc21d8..7b19842582 100644
--- a/lib/ethdev/ethdev_driver.h
+++ b/lib/ethdev/ethdev_driver.h
@@ -1054,6 +1054,21 @@ typedef int (*eth_ip_reassembly_conf_get_t)(struct rte_eth_dev *dev,
 typedef int (*eth_ip_reassembly_conf_set_t)(struct rte_eth_dev *dev,
 		const struct rte_eth_ip_reassembly_params *conf);
 
+/**
+ * @internal
+ * Get supported protocol flags of a PMD to split.
+ *
+ * @param dev
+ *   ethdev handle of port.
+ *
+ * @param[out] ptype
+ *   Supported ptype mask of a PMD.
+ *
+ * @return
+ *   Negative errno value on error, zero on success.
+ */
+typedef int (*eth_buffer_split_hdr_ptype_get_t)(struct rte_eth_dev *dev, uint32_t *ptype);
+
 /**
  * @internal
  * Dump private info from device to a file.
@@ -1281,6 +1296,9 @@ struct eth_dev_ops {
 	/** Set IP reassembly configuration */
 	eth_ip_reassembly_conf_set_t ip_reassembly_conf_set;
 
+	/** Get supported ptypes to split */
+	eth_buffer_split_hdr_ptype_get_t hdrs_supported_ptypes_get;
+
 	/** Dump private info from device */
 	eth_dev_priv_dump_t eth_dev_priv_dump;
 };
diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
index 29a3d80466..e1f2a0ffe3 100644
--- a/lib/ethdev/rte_ethdev.c
+++ b/lib/ethdev/rte_ethdev.c
@@ -1636,9 +1636,10 @@ rte_eth_dev_is_removed(uint16_t port_id)
 }
 
 static int
-rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
-			     uint16_t n_seg, uint32_t *mbp_buf_size,
-			     const struct rte_eth_dev_info *dev_info)
+rte_eth_rx_queue_check_split(uint16_t port_id,
+				const struct rte_eth_rxseg_split *rx_seg,
+				int16_t n_seg, uint32_t *mbp_buf_size,
+			    const struct rte_eth_dev_info *dev_info)
 {
 	const struct rte_eth_rxseg_capa *seg_capa = &dev_info->rx_seg_capa;
 	struct rte_mempool *mp_first;
@@ -1694,13 +1695,7 @@ rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
 		}
 		offset += seg_idx != 0 ? 0 : RTE_PKTMBUF_HEADROOM;
 		*mbp_buf_size = rte_pktmbuf_data_room_size(mpl);
-		length = length != 0 ? length : *mbp_buf_size;
-		if (*mbp_buf_size < length + offset) {
-			RTE_ETHDEV_LOG(ERR,
-				       "%s mbuf_data_room_size %u < %u (segment length=%u + segment offset=%u)\n",
-				       mpl->name, *mbp_buf_size,
-				       length + offset, length, offset);
-			return -EINVAL;
+
 		}
 	}
 	return 0;
@@ -1779,7 +1774,7 @@ rte_eth_rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
 		n_seg = rx_conf->rx_nseg;
 
 		if (rx_conf->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) {
-			ret = rte_eth_rx_queue_check_split(rx_seg, n_seg,
+			ret = rte_eth_rx_queue_check_split(port_id, rx_seg, n_seg,
 							   &mbp_buf_size,
 							   &dev_info);
 			if (ret != 0)
@@ -5844,6 +5839,20 @@ rte_eth_ip_reassembly_conf_set(uint16_t port_id,
 		       (*dev->dev_ops->ip_reassembly_conf_set)(dev, conf));
 }
 
+int
+rte_eth_supported_hdrs_get(uint16_t port_id, uint32_t *ptypes)
+{
+	struct rte_eth_dev *dev;
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
+	dev = &rte_eth_devices[port_id];
+
+	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->hdrs_supported_ptypes_get,
+				-ENOTSUP);
+
+	return eth_err(port_id,
+		       (*dev->dev_ops->hdrs_supported_ptypes_get)(dev, ptypes));
+}
+
 int
 rte_eth_dev_priv_dump(uint16_t port_id, FILE *file)
 {
diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
index 04cff8ee10..72cac1518e 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -6152,6 +6152,28 @@ rte_eth_tx_buffer(uint16_t port_id, uint16_t queue_id,
 	return rte_eth_tx_buffer_flush(port_id, queue_id, buffer);
 }
 
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Get supported header protocols to split supported by PMD.
+ * The API will return error if the device is not valid.
+ *
+ * @param port_id
+ *   The port identifier of the device.
+ * @param ptype
+ *   Supported protocol headers of driver.
+ * @return
+ *   - (-ENOTSUP) if header protocol is not supported by device.
+ *   - (-ENODEV) if *port_id* invalid.
+ *   - (-EIO) if device is removed.
+ *   - (0) on success.
+ */
+__rte_experimental
+int rte_eth_supported_hdrs_get(uint16_t port_id,
+		uint32_t *ptype);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/ethdev/version.map b/lib/ethdev/version.map
index 20391ab29e..7705c0364a 100644
--- a/lib/ethdev/version.map
+++ b/lib/ethdev/version.map
@@ -279,6 +279,9 @@ EXPERIMENTAL {
 	rte_flow_async_action_handle_create;
 	rte_flow_async_action_handle_destroy;
 	rte_flow_async_action_handle_update;
+
+	# added in 22.07
+	rte_eth_supported_hdrs_get;
 };
 
 INTERNAL {
-- 
2.25.1


^ permalink raw reply	[flat|nested] 88+ messages in thread

* [PATCH v9 2/4] ethdev: introduce protocol hdr based buffer split
  2022-06-13 10:25 ` [PATCH v9 0/4] add an api to support proto based buffer split wenxuanx.wu
  2022-06-13 10:25   ` [PATCH v9 1/4] ethdev: introduce protocol header API wenxuanx.wu
@ 2022-06-13 10:25   ` wenxuanx.wu
  2022-07-07  9:07     ` Thomas Monjalon
  2022-07-08 15:00     ` Andrew Rybchenko
  2022-06-13 10:25   ` [PATCH v9 3/4] app/testpmd: add rxhdrs commands and parameters wenxuanx.wu
                     ` (2 subsequent siblings)
  4 siblings, 2 replies; 88+ messages in thread
From: wenxuanx.wu @ 2022-06-13 10:25 UTC (permalink / raw)
  To: thomas, andrew.rybchenko, xiaoyun.li, ferruh.yigit,
	aman.deep.singh, dev, yuying.zhang, qi.z.zhang, jerinjacobk
  Cc: stephen, Wenxuan Wu, Xuan Ding, Yuan Wang, Ray Kinsella

From: Wenxuan Wu <wenxuanx.wu@intel.com>

Currently, Rx buffer split supports length based split. With Rx queue
offload RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT enabled and Rx packet segment
configured, PMD will be able to split the received packets into
multiple segments.

However, length based buffer split is not suitable for NICs that do split
based on protocol headers. Given an arbitrarily variable length in an Rx
packet segment, it is almost impossible to pass a fixed protocol header to
the driver. Besides, tunneling makes the composition of a packet variable,
which makes the situation even worse.

This patch extends the current buffer split to support protocol header based
buffer split. A new proto_hdr field is introduced in the reserved field
of the rte_eth_rxseg_split structure to specify the protocol header. The
proto_hdr field defines the split position of a packet: splitting always
happens after the protocol header defined in the Rx packet segment. When the
Rx queue offload RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT is enabled and the
corresponding protocol header is configured, the driver will split the
ingress packets into multiple segments.

struct rte_eth_rxseg_split {

        struct rte_mempool *mp; /* memory pools to allocate segment from */
        uint16_t length; /* segment maximal data length,
                            configures "split point" */
        uint16_t offset; /* data offset from beginning
                            of mbuf data buffer */
        uint32_t proto_hdr; /* inner/outer L2/L3/L4 protocol header,
			       configures "split point" */
    };

Both inner and outer L2/L3/L4 level protocol header split can be
supported by a PMD. The corresponding protocol header capabilities are
RTE_PTYPE_L2_ETHER, RTE_PTYPE_L3_IPV4, RTE_PTYPE_L3_IPV6, RTE_PTYPE_L4_TCP,
RTE_PTYPE_L4_UDP, RTE_PTYPE_L4_SCTP, RTE_PTYPE_INNER_L2_ETHER,
RTE_PTYPE_INNER_L3_IPV4, RTE_PTYPE_INNER_L3_IPV6, RTE_PTYPE_INNER_L4_TCP,
RTE_PTYPE_INNER_L4_UDP, RTE_PTYPE_INNER_L4_SCTP.

For example, let's suppose we configured the Rx queue with the
following segments:
    seg0 - pool0, proto_hdr0=RTE_PTYPE_L3_IPV4, off0=2B
    seg1 - pool1, proto_hdr1=RTE_PTYPE_L4_UDP, off1=128B
    seg2 - pool2, off2=0B

A packet consisting of MAC_IPV4_UDP_PAYLOAD will be split as
follows:
    seg0 - ipv4 header @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from pool0
    seg1 - udp header @ 128 in mbuf from pool1
    seg2 - payload @ 0 in mbuf from pool2

Now buffer split can be configured in two modes. For length based
buffer split, the mp, length and offset fields in the Rx packet segment
should be configured, while the proto_hdr field should not be configured.
For protocol header based buffer split, the mp, offset and proto_hdr fields
in the Rx packet segment should be configured, while the length field
should not be configured.

The split limitations imposed by the underlying driver are reported in the
rte_eth_dev_info->rx_seg_capa field. The memory attributes of the split
parts may also differ, e.g. DPDK memory for one part and external memory
for the other.
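
As a sketch, a protocol header based split configuration could look as
below; the mempool names and the chosen split point are illustrative
only:

    union rte_eth_rxseg rx_useg[2];
    struct rte_eth_rxconf rx_conf;
    int ret;

    memset(rx_useg, 0, sizeof(rx_useg));
    memset(&rx_conf, 0, sizeof(rx_conf));

    rx_useg[0].split.mp = hdr_pool;                 /* header part */
    rx_useg[0].split.proto_hdr = RTE_PTYPE_L3_IPV4; /* split after IPv4 */
    rx_useg[0].split.length = 0;                    /* unused in this mode */
    rx_useg[1].split.mp = pay_pool;                 /* payload part */

    rx_conf.rx_seg = rx_useg;
    rx_conf.rx_nseg = 2;
    rx_conf.offloads = RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT;
    ret = rte_eth_rx_queue_setup(port_id, 0, nb_rxd, socket_id,
                                 &rx_conf, NULL);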

Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
Reviewed-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
---
 lib/ethdev/rte_ethdev.c | 32 +++++++++++++++++++++++++++++++-
 lib/ethdev/rte_ethdev.h | 14 +++++++++++++-
 2 files changed, 44 insertions(+), 2 deletions(-)

diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
index e1f2a0ffe3..b89e30296f 100644
--- a/lib/ethdev/rte_ethdev.c
+++ b/lib/ethdev/rte_ethdev.c
@@ -1662,6 +1662,7 @@ rte_eth_rx_queue_check_split(uint16_t port_id,
 		struct rte_mempool *mpl = rx_seg[seg_idx].mp;
 		uint32_t length = rx_seg[seg_idx].length;
 		uint32_t offset = rx_seg[seg_idx].offset;
+		uint32_t proto_hdr = rx_seg[seg_idx].proto_hdr;
 
 		if (mpl == NULL) {
 			RTE_ETHDEV_LOG(ERR, "null mempool pointer\n");
@@ -1695,7 +1696,36 @@ rte_eth_rx_queue_check_split(uint16_t port_id,
 		}
 		offset += seg_idx != 0 ? 0 : RTE_PKTMBUF_HEADROOM;
 		*mbp_buf_size = rte_pktmbuf_data_room_size(mpl);
-
+		uint32_t ptypes_mask;
+		int ret = rte_eth_supported_hdrs_get(port_id, &ptypes_mask);
+
+		if (ret == -ENOTSUP || ptypes_mask == RTE_PTYPE_UNKNOWN) {
+			/* Split at fixed length. */
+			length = length != 0 ? length : *mbp_buf_size;
+			if (*mbp_buf_size < length + offset) {
+				RTE_ETHDEV_LOG(ERR,
+					"%s mbuf_data_room_size %u < %u (segment length=%u + segment offset=%u)\n",
+					mpl->name, *mbp_buf_size,
+					length + offset, length, offset);
+				return -EINVAL;
+			}
+		} else if (ret == 0) {
+			/* Split after specified protocol header. */
+			if (proto_hdr & ~ptypes_mask) {
+				RTE_ETHDEV_LOG(ERR,
+					"Protocol header 0x%x is not supported.\n",
+					proto_hdr & ~ptypes_mask);
+				return -EINVAL;
+			}
+			if (*mbp_buf_size < offset) {
+				RTE_ETHDEV_LOG(ERR,
+						"%s mbuf_data_room_size %u < %u segment offset)\n",
+						mpl->name, *mbp_buf_size,
+						offset);
+				return -EINVAL;
+			}
+		} else {
+			return ret;
 		}
 	}
 	return 0;
diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
index 72cac1518e..7df40f9f9b 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -1176,6 +1176,9 @@ struct rte_eth_txmode {
  *   specified in the first array element, the second buffer, from the
  *   pool in the second element, and so on.
  *
+ * - The proto_hdrs in the elements define the split position of
+ *   received packets.
+ *
  * - The offsets from the segment description elements specify
  *   the data offset from the buffer beginning except the first mbuf.
  *   The first segment offset is added with RTE_PKTMBUF_HEADROOM.
@@ -1197,12 +1200,21 @@ struct rte_eth_txmode {
  *     - pool from the last valid element
  *     - the buffer size from this pool
  *     - zero offset
+ *
+ * - Length based buffer split:
+ *     - mp, length, offset should be configured.
+ *     - The proto_hdr field should not be configured.
+ *
+ * - Protocol header based buffer split:
+ *     - mp, offset, proto_hdr should be configured.
+ *     - The length field should not be configured.
  */
 struct rte_eth_rxseg_split {
 	struct rte_mempool *mp; /**< Memory pool to allocate segment from. */
 	uint16_t length; /**< Segment data length, configures split point. */
 	uint16_t offset; /**< Data offset from beginning of mbuf data buffer. */
-	uint32_t reserved; /**< Reserved field. */
+	/**< Supported ptypes mask of a specific pmd, configures split point. */
+	uint32_t proto_hdr;
 };
 
 /**
-- 
2.25.1


^ permalink raw reply	[flat|nested] 88+ messages in thread

* [PATCH v9 3/4] app/testpmd: add rxhdrs commands and parameters
  2022-06-13 10:25 ` [PATCH v9 0/4] add an api to support proto based buffer split wenxuanx.wu
  2022-06-13 10:25   ` [PATCH v9 1/4] ethdev: introduce protocol header API wenxuanx.wu
  2022-06-13 10:25   ` [PATCH v9 2/4] ethdev: introduce protocol hdr based buffer split wenxuanx.wu
@ 2022-06-13 10:25   ` wenxuanx.wu
  2022-06-13 10:25   ` [PATCH v9 4/4] net/ice: support buffer split in Rx path wenxuanx.wu
  2022-06-21  8:56   ` [PATCH v9 0/4] add an api to support proto based buffer split Ding, Xuan
  4 siblings, 0 replies; 88+ messages in thread
From: wenxuanx.wu @ 2022-06-13 10:25 UTC (permalink / raw)
  To: thomas, andrew.rybchenko, xiaoyun.li, ferruh.yigit,
	aman.deep.singh, dev, yuying.zhang, qi.z.zhang, jerinjacobk
  Cc: stephen, Wenxuan Wu, Xuan Ding, Yuan Wang

From: Wenxuan Wu <wenxuanx.wu@intel.com>

Add command line parameter:
--rxhdrs=mac[,ipv4,udp]

Set the protocol header of each segment to scatter packets on receiving,
if the split feature is engaged, for queues configured with the
BUFFER_SPLIT flag.

Add interactive mode command:
testpmd>set rxhdrs mac,ipv4,l3,tcp,udp,sctp
(protocol sequence should be valid)

The protocol split feature is off by default. To enable protocol split,
you need:
1. Start testpmd with two mempools. E.g. --mbuf-size=2048,2048
2. Configure Rx queue with rx_offload buffer split on.
3. Set the protocol type of buffer split. E.g. set rxhdrs mac,ipv4
	(default protocols of testpmd : mac|icmp|ipv4|ipv6|l3|tcp|udp|
				sctp|l4|inner_mac|inner_ipv4|inner_ipv6|
				inner_l3|inner_tcp|inner_udp|inner_sctp|
				inner_l4)
The above protocols can be configured in testpmd, but the configuration
only takes effect when it is supported by the specific PMD.
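
A possible end-to-end flow, assuming the offload is exposed as
"buffer_split" in the port config command (core list and port number
are illustrative):

    dpdk-testpmd -l 0-3 -n 4 -- -i --mbuf-size=2048,2048
    testpmd> port stop all
    testpmd> port config 0 rx_offload buffer_split on
    testpmd> set rxhdrs mac,ipv4
    testpmd> port start all
    testpmd> start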

Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
Reviewed-by: Qi Zhang <qi.z.zhang@intel.com>
---
 app/test-pmd/cmdline.c    | 133 +++++++++++++++++++++++++++++++++++++-
 app/test-pmd/config.c     |  75 +++++++++++++++++++++
 app/test-pmd/parameters.c |  15 ++++-
 app/test-pmd/testpmd.c    |   6 +-
 app/test-pmd/testpmd.h    |   6 ++
 5 files changed, 229 insertions(+), 6 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 6ffea8e21a..474235bc91 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -183,7 +183,7 @@ static void cmd_help_long_parsed(void *parsed_result,
 			"show (rxq|txq) info (port_id) (queue_id)\n"
 			"    Display information for configured RX/TX queue.\n\n"
 
-			"show config (rxtx|cores|fwd|rxoffs|rxpkts|txpkts)\n"
+			"show config (rxtx|cores|fwd|rxoffs|rxpkts|rxhdrs|txpkts)\n"
 			"    Display the given configuration.\n\n"
 
 			"read rxd (port_id) (queue_id) (rxd_id)\n"
@@ -316,6 +316,15 @@ static void cmd_help_long_parsed(void *parsed_result,
 			" Affects only the queues configured with split"
 			" offloads.\n\n"
 
+			"set rxhdrs (mac[,ipv4])*\n"
+			"	Set the protocol hdr of each segment to scatter"
+			" packets on receiving if split feature is engaged."
+			" Affects only the queues configured with split"
+			" offloads.\n\n"
+			"	Supported proto header: mac|ipv4||qinq|gre|ipv6|l3|tcp|udp|sctp|l4|"
+			"inner_mac|inner_ipv4|inner_ipv6|inner_l3|inner_tcp|"
+			"inner_udp|inner_sctp\n"
+
 			"set txpkts (x[,y]*)\n"
 			"    Set the length of each segment of TXONLY"
 			" and optionally CSUM packets.\n\n"
@@ -3617,6 +3626,78 @@ cmdline_parse_inst_t cmd_stop = {
 	},
 };
 
+static unsigned int
+get_ptype(char *value)
+{
+	uint32_t protocol;
+	if (!strcmp(value, "mac"))
+		protocol = RTE_PTYPE_L2_ETHER;
+	else if (!strcmp(value, "ipv4"))
+		protocol = RTE_PTYPE_L3_IPV4;
+	else if (!strcmp(value, "ipv6"))
+		protocol = RTE_PTYPE_L3_IPV6;
+	else if (!strcmp(value, "l3"))
+		protocol = RTE_PTYPE_L3_IPV4|RTE_PTYPE_L3_IPV6;
+	else if (!strcmp(value, "tcp"))
+		protocol = RTE_PTYPE_L4_TCP;
+	else if (!strcmp(value, "udp"))
+		protocol = RTE_PTYPE_L4_UDP;
+	else if (!strcmp(value, "sctp"))
+		protocol = RTE_PTYPE_L4_SCTP;
+	else if (!strcmp(value, "l4"))
+		protocol = RTE_PTYPE_L4_TCP|RTE_PTYPE_L4_UDP|RTE_PTYPE_L4_SCTP;
+	else if (!strcmp(value, "inner_mac"))
+		protocol = RTE_PTYPE_INNER_L2_ETHER;
+	else if (!strcmp(value, "inner_ipv4"))
+		protocol = RTE_PTYPE_INNER_L3_IPV4;
+	else if (!strcmp(value, "inner_ipv6"))
+		protocol = RTE_PTYPE_INNER_L3_IPV6;
+	else if (!strcmp(value, "inner_l3"))
+		protocol = RTE_PTYPE_INNER_L3_IPV4|RTE_PTYPE_INNER_L3_IPV6;
+	else if (!strcmp(value, "inner_tcp"))
+		protocol = RTE_PTYPE_INNER_L4_TCP;
+	else if (!strcmp(value, "inner_udp"))
+		protocol = RTE_PTYPE_INNER_L4_UDP;
+	else if (!strcmp(value, "inner_sctp"))
+		protocol = RTE_PTYPE_INNER_L4_SCTP;
+	else if (!strcmp(value, "unknown"))
+		protocol = RTE_PTYPE_UNKNOWN;
+	else if (!strcmp(value, "gre"))
+		protocol = RTE_PTYPE_TUNNEL_GRE;
+	else if (!strcmp(value, "qinq"))
+		protocol = RTE_PTYPE_L2_ETHER_QINQ;
+	else {
+		fprintf(stderr, "Unsupported protocol name: %s\n", value);
+		return 0;
+	}
+	return protocol;
+}
+/* *** SET RXHDRSLIST *** */
+
+unsigned int
+parse_hdrs_list(const char *str, const char *item_name, unsigned int max_items,
+				unsigned int *parsed_items, int check_hdrs_sequence)
+{
+	unsigned int nb_item;
+	char *cur;
+	char *tmp;
+	nb_item = 0;
+	char *str2 = strdup(str);
+	cur = strtok_r(str2, ",", &tmp);
+	while (cur != NULL) {
+		parsed_items[nb_item] = get_ptype(cur);
+		cur = strtok_r(NULL, ",", &tmp);
+		nb_item++;
+	}
+	if (nb_item > max_items)
+		fprintf(stderr, "Number of %s = %u > %u (maximum items)\n",
+			item_name, nb_item + 1, max_items);
+	set_rx_pkt_hdrs(parsed_items, nb_item);
+	free(str2);
+	if (!check_hdrs_sequence)
+		return nb_item;
+	return nb_item;
+}
 /* *** SET CORELIST and PORTLIST CONFIGURATION *** */
 
 unsigned int
@@ -3986,6 +4067,49 @@ cmdline_parse_inst_t cmd_set_rxpkts = {
 	},
 };
 
+/* *** SET SEGMENT HEADERS OF RX PACKETS SPLIT *** */
+struct cmd_set_rxhdrs_result {
+	cmdline_fixed_string_t cmd_keyword;
+	cmdline_fixed_string_t rxhdrs;
+	cmdline_fixed_string_t seg_hdrs;
+};
+
+static void
+cmd_set_rxhdrs_parsed(void *parsed_result,
+		      __rte_unused struct cmdline *cl,
+		      __rte_unused void *data)
+{
+	struct cmd_set_rxhdrs_result *res;
+	unsigned int seg_hdrs[MAX_SEGS_BUFFER_SPLIT];
+	unsigned int nb_segs;
+
+	res = parsed_result;
+	nb_segs = parse_hdrs_list(res->seg_hdrs, "segment hdrs",
+				  MAX_SEGS_BUFFER_SPLIT, seg_hdrs, 0);
+	if (nb_segs >= 1)
+		set_rx_pkt_hdrs(seg_hdrs, nb_segs);
+	cmd_reconfig_device_queue(RTE_PORT_ALL, 0, 1);
+}
+cmdline_parse_token_string_t cmd_set_rxhdrs_keyword =
+	TOKEN_STRING_INITIALIZER(struct cmd_set_rxhdrs_result,
+				 cmd_keyword, "set");
+cmdline_parse_token_string_t cmd_set_rxhdrs_name =
+	TOKEN_STRING_INITIALIZER(struct cmd_set_rxhdrs_result,
+				 rxhdrs, "rxhdrs");
+cmdline_parse_token_string_t cmd_set_rxhdrs_seg_hdrs =
+	TOKEN_STRING_INITIALIZER(struct cmd_set_rxhdrs_result,
+				 seg_hdrs, NULL);
+cmdline_parse_inst_t cmd_set_rxhdrs = {
+	.f = cmd_set_rxhdrs_parsed,
+	.data = NULL,
+	.help_str = "set rxhdrs <mac[,ipv4]*>",
+	.tokens = {
+		(void *)&cmd_set_rxhdrs_keyword,
+		(void *)&cmd_set_rxhdrs_name,
+		(void *)&cmd_set_rxhdrs_seg_hdrs,
+		NULL,
+	},
+};
 /* *** SET SEGMENT LENGTHS OF TXONLY PACKETS *** */
 
 struct cmd_set_txpkts_result {
@@ -8058,6 +8182,8 @@ static void cmd_showcfg_parsed(void *parsed_result,
 		show_rx_pkt_offsets();
 	else if (!strcmp(res->what, "rxpkts"))
 		show_rx_pkt_segments();
+	else if (!strcmp(res->what, "rxhdrs"))
+		show_rx_pkt_hdrs();
 	else if (!strcmp(res->what, "txpkts"))
 		show_tx_pkt_segments();
 	else if (!strcmp(res->what, "txtimes"))
@@ -8070,12 +8196,12 @@ cmdline_parse_token_string_t cmd_showcfg_port =
 	TOKEN_STRING_INITIALIZER(struct cmd_showcfg_result, cfg, "config");
 cmdline_parse_token_string_t cmd_showcfg_what =
 	TOKEN_STRING_INITIALIZER(struct cmd_showcfg_result, what,
-				 "rxtx#cores#fwd#rxoffs#rxpkts#txpkts#txtimes");
+				 "rxtx#cores#fwd#rxoffs#rxpkts#rxhdrs#txpkts#txtimes");
 
 cmdline_parse_inst_t cmd_showcfg = {
 	.f = cmd_showcfg_parsed,
 	.data = NULL,
-	.help_str = "show config rxtx|cores|fwd|rxoffs|rxpkts|txpkts|txtimes",
+	.help_str = "show config rxtx|cores|fwd|rxoffs|rxpkts|rxhdrs|txpkts|txtimes",
 	.tokens = {
 		(void *)&cmd_showcfg_show,
 		(void *)&cmd_showcfg_port,
@@ -17833,6 +17959,7 @@ cmdline_parse_ctx_t main_ctx[] = {
 	(cmdline_parse_inst_t *)&cmd_set_log,
 	(cmdline_parse_inst_t *)&cmd_set_rxoffs,
 	(cmdline_parse_inst_t *)&cmd_set_rxpkts,
+	(cmdline_parse_inst_t *)&cmd_set_rxhdrs,
 	(cmdline_parse_inst_t *)&cmd_set_txpkts,
 	(cmdline_parse_inst_t *)&cmd_set_txsplit,
 	(cmdline_parse_inst_t *)&cmd_set_txtimes,
diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index cc8e7aa138..90ac5cfa68 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -4757,6 +4757,81 @@ show_rx_pkt_segments(void)
 		printf("%hu\n", rx_pkt_seg_lengths[i]);
 	}
 }
+static const char *get_ptype_str(uint32_t ptype)
+{
+	switch (ptype) {
+	case RTE_PTYPE_INNER_L2_ETHER_QINQ:
+		return "qinq";
+	case RTE_PTYPE_TUNNEL_GRE:
+		return "gre";
+	case RTE_PTYPE_UNKNOWN:
+		return "unknown";
+	case RTE_PTYPE_L2_ETHER:
+		return "outer_mac";
+	case RTE_PTYPE_L3_IPV4:
+		return "ipv4";
+	case RTE_PTYPE_L3_IPV6:
+		return "ipv6";
+	case RTE_PTYPE_L3_IPV6|RTE_PTYPE_L3_IPV4:
+		return "ip";
+	case RTE_PTYPE_L4_TCP:
+		return "tcp";
+	case RTE_PTYPE_L4_UDP:
+		return "udp";
+	case RTE_PTYPE_L4_SCTP:
+		return "sctp";
+	case RTE_PTYPE_L4_TCP|RTE_PTYPE_L4_UDP|RTE_PTYPE_L4_SCTP:
+		return "l4";
+	case RTE_PTYPE_INNER_L2_ETHER:
+		return "inner_mac";
+	case RTE_PTYPE_INNER_L3_IPV4:
+		return "inner_ipv4";
+	case RTE_PTYPE_INNER_L3_IPV6:
+		return "inner_ipv6";
+	case RTE_PTYPE_INNER_L4_TCP:
+		return "inner_tcp";
+	case RTE_PTYPE_INNER_L4_UDP:
+		return "inner_udp";
+	case RTE_PTYPE_INNER_L4_SCTP:
+		return "inner_sctp";
+	default:
+		return "unsupported";
+	}
+}
+void
+show_rx_pkt_hdrs(void)
+{
+	uint32_t i, n;
+
+	n = rx_pkt_nb_segs;
+	printf("Number of segments: %u\n", n);
+	if (n) {
+		printf("Packet segs: ");
+		for (i = 0; i != n - 1; i++)
+			printf("%s, ", get_ptype_str(rx_pkt_hdr_protos[i]));
+		printf("%s\n", rx_pkt_hdr_protos[i] == 0 ? "payload" :
+						get_ptype_str(rx_pkt_hdr_protos[i]));
+	}
+}
+void
+set_rx_pkt_hdrs(unsigned int *seg_hdrs, unsigned int nb_segs)
+{
+	unsigned int i;
+
+	if (nb_segs >= MAX_SEGS_BUFFER_SPLIT) {
+		printf("nb segments per RX packets=%u >= "
+		       "MAX_SEGS_BUFFER_SPLIT - ignored\n", nb_segs);
+		return;
+	}
+
+	for (i = 0; i < nb_segs; i++)
+		rx_pkt_hdr_protos[i] = (uint32_t) seg_hdrs[i];
+	/*
+	 * Only header segments are counted in the list; the trailing
+	 * payload segment is implicit, so rx_pkt_nb_segs is nb_segs + 1.
+	 */
+	rx_pkt_nb_segs = (rx_pkt_nb_segs == 0) ? (uint8_t) nb_segs + 1 : rx_pkt_nb_segs;
+}
 
 void
 set_rx_pkt_segments(unsigned int *seg_lengths, unsigned int nb_segs)
diff --git a/app/test-pmd/parameters.c b/app/test-pmd/parameters.c
index daf6a31b2b..f86d626276 100644
--- a/app/test-pmd/parameters.c
+++ b/app/test-pmd/parameters.c
@@ -161,6 +161,7 @@ usage(char* progname)
 	       " Used mainly with PCAP drivers.\n");
 	printf("  --rxoffs=X[,Y]*: set RX segment offsets for split.\n");
 	printf("  --rxpkts=X[,Y]*: set RX segment sizes to split.\n");
+	printf("  --rxhdrs=mac[,ipv4]*: set RX segment protocol to split.\n");
 	printf("  --txpkts=X[,Y]*: set TX segment sizes"
 		" or total packet length.\n");
 	printf("  --txonly-multi-flow: generate multiple flows in txonly mode\n");
@@ -673,6 +674,7 @@ launch_args_parse(int argc, char** argv)
 		{ "flow-isolate-all",	        0, 0, 0 },
 		{ "rxoffs",			1, 0, 0 },
 		{ "rxpkts",			1, 0, 0 },
+		{ "rxhdrs",			1, 0, 0 },
 		{ "txpkts",			1, 0, 0 },
 		{ "txonly-multi-flow",		0, 0, 0 },
 		{ "rxq-share",			2, 0, 0 },
@@ -1327,7 +1329,6 @@ launch_args_parse(int argc, char** argv)
 			if (!strcmp(lgopts[opt_idx].name, "rxpkts")) {
 				unsigned int seg_len[MAX_SEGS_BUFFER_SPLIT];
 				unsigned int nb_segs;
-
 				nb_segs = parse_item_list
 						(optarg, "rxpkt segments",
 						 MAX_SEGS_BUFFER_SPLIT,
@@ -1337,6 +1338,18 @@ launch_args_parse(int argc, char** argv)
 				else
 					rte_exit(EXIT_FAILURE, "bad rxpkts\n");
 			}
+			if (!strcmp(lgopts[opt_idx].name, "rxhdrs")) {
+				unsigned int seg_hdrs[MAX_SEGS_BUFFER_SPLIT];
+				unsigned int nb_segs;
+				nb_segs = parse_hdrs_list
+						(optarg, "rxpkt segments",
+						MAX_SEGS_BUFFER_SPLIT,
+						seg_hdrs, 0);
+				if (nb_segs >= 1)
+					set_rx_pkt_hdrs(seg_hdrs, nb_segs);
+				else
+					rte_exit(EXIT_FAILURE, "bad rxhdrs\n");
+			}
 			if (!strcmp(lgopts[opt_idx].name, "txpkts")) {
 				unsigned seg_lengths[RTE_MAX_SEGS_PER_PKT];
 				unsigned int nb_segs;
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index fe2ce19f99..ef679c70be 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -240,6 +240,7 @@ uint16_t rx_pkt_seg_lengths[MAX_SEGS_BUFFER_SPLIT];
 uint8_t  rx_pkt_nb_segs; /**< Number of segments to split */
 uint16_t rx_pkt_seg_offsets[MAX_SEGS_BUFFER_SPLIT];
 uint8_t  rx_pkt_nb_offs; /**< Number of specified offsets */
+uint32_t rx_pkt_hdr_protos[MAX_SEGS_BUFFER_SPLIT];
 
 /*
  * Configuration of packet segments used by the "txonly" processing engine.
@@ -2587,11 +2588,12 @@ rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
 		mpx = mbuf_pool_find(socket_id, mp_n);
 		/* Handle zero as mbuf data buffer size. */
 		rx_seg->length = rx_pkt_seg_lengths[i] ?
-				   rx_pkt_seg_lengths[i] :
-				   mbuf_data_size[mp_n];
+					rx_pkt_seg_lengths[i] :
+					mbuf_data_size[mp_n];
 		rx_seg->offset = i < rx_pkt_nb_offs ?
 				   rx_pkt_seg_offsets[i] : 0;
 		rx_seg->mp = mpx ? mpx : mp;
+		rx_seg->proto_hdr = rx_pkt_hdr_protos[i];
 	}
 	rx_conf->rx_nseg = rx_pkt_nb_segs;
 	rx_conf->rx_seg = rx_useg;
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index 31f766c965..e791b9becd 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -534,6 +534,7 @@ extern uint32_t max_rx_pkt_len;
  * Configuration of packet segments used to scatter received packets
  * if some of split features is configured.
  */
+extern uint32_t rx_pkt_hdr_protos[MAX_SEGS_BUFFER_SPLIT];
 extern uint16_t rx_pkt_seg_lengths[MAX_SEGS_BUFFER_SPLIT];
 extern uint8_t  rx_pkt_nb_segs; /**< Number of segments to split */
 extern uint16_t rx_pkt_seg_offsets[MAX_SEGS_BUFFER_SPLIT];
@@ -864,6 +865,9 @@ inc_tx_burst_stats(struct fwd_stream *fs, uint16_t nb_tx)
 unsigned int parse_item_list(const char *str, const char *item_name,
 			unsigned int max_items,
 			unsigned int *parsed_items, int check_unique_values);
+unsigned int parse_hdrs_list(const char *str, const char *item_name,
+			unsigned int max_item,
+			unsigned int *parsed_items, int check_unique_values);
 void launch_args_parse(int argc, char** argv);
 void cmdline_read_from_file(const char *filename);
 void prompt(void);
@@ -1018,6 +1022,8 @@ void set_record_core_cycles(uint8_t on_off);
 void set_record_burst_stats(uint8_t on_off);
 void set_verbose_level(uint16_t vb_level);
 void set_rx_pkt_segments(unsigned int *seg_lengths, unsigned int nb_segs);
+void set_rx_pkt_hdrs(unsigned int *seg_protos, unsigned int nb_segs);
+void show_rx_pkt_hdrs(void);
 void show_rx_pkt_segments(void);
 void set_rx_pkt_offsets(unsigned int *seg_offsets, unsigned int nb_offs);
 void show_rx_pkt_offsets(void);
-- 
2.25.1


^ permalink raw reply	[flat|nested] 88+ messages in thread

* [PATCH v9 4/4] net/ice: support buffer split in Rx path
  2022-06-13 10:25 ` [PATCH v9 0/4] add an api to support proto based buffer split wenxuanx.wu
                     ` (2 preceding siblings ...)
  2022-06-13 10:25   ` [PATCH v9 3/4] app/testpmd: add rxhdrs commands and parameters wenxuanx.wu
@ 2022-06-13 10:25   ` wenxuanx.wu
  2022-06-21  8:56   ` [PATCH v9 0/4] add an api to support proto based buffer split Ding, Xuan
  4 siblings, 0 replies; 88+ messages in thread
From: wenxuanx.wu @ 2022-06-13 10:25 UTC (permalink / raw)
  To: thomas, andrew.rybchenko, xiaoyun.li, ferruh.yigit,
	aman.deep.singh, dev, yuying.zhang, qi.z.zhang, jerinjacobk
  Cc: stephen, Wenxuan Wu, Xuan Ding, Yuan Wang

From: Wenxuan Wu <wenxuanx.wu@intel.com>

This patch adds support for protocol based buffer split in normal Rx
data paths. When the Rx queue is configured with a specific protocol type,
packets received will be directly split into protocol header and
payload parts, within the limitations of the ice PMD, and the two parts
will be put into different mempools.

Currently, protocol based buffer split is not supported in vectorized
paths.

A new API, ice_get_supported_split_hdrs(), is introduced; it returns
the supported header protocols of the ice PMD to the application for splitting.
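
From the application's point of view, a split packet is then received
as a two-segment mbuf chain; a rough sketch of how the fields are
filled by this patch (variable names are illustrative):

    struct rte_mbuf *hdr = pkts[i];   /* allocated from rxseg[0].mp */
    struct rte_mbuf *pay = hdr->next; /* allocated from rxseg[1].mp */

    /* hdr->data_len holds the protocol header length,
     * pay->data_len holds the payload length, and
     * hdr->pkt_len is the sum of both.
     */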

Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
Reviewed-by: Qi Zhang <qi.z.zhang@intel.com>
---
 drivers/net/ice/ice_ethdev.c          |  38 ++++-
 drivers/net/ice/ice_rxtx.c            | 220 ++++++++++++++++++++++----
 drivers/net/ice/ice_rxtx.h            |  16 ++
 drivers/net/ice/ice_rxtx_vec_common.h |   3 +
 4 files changed, 245 insertions(+), 32 deletions(-)

diff --git a/drivers/net/ice/ice_ethdev.c b/drivers/net/ice/ice_ethdev.c
index 73e550f5fb..dcd4ad2eb4 100644
--- a/drivers/net/ice/ice_ethdev.c
+++ b/drivers/net/ice/ice_ethdev.c
@@ -169,6 +169,8 @@ static int ice_timesync_read_time(struct rte_eth_dev *dev,
 static int ice_timesync_write_time(struct rte_eth_dev *dev,
 				   const struct timespec *timestamp);
 static int ice_timesync_disable(struct rte_eth_dev *dev);
+static int ice_get_supported_split_hdrs(struct rte_eth_dev *dev,
+					uint32_t *ptypes);
 
 static const struct rte_pci_id pci_id_ice_map[] = {
 	{ RTE_PCI_DEVICE(ICE_INTEL_VENDOR_ID, ICE_DEV_ID_E823L_BACKPLANE) },
@@ -267,6 +269,7 @@ static const struct eth_dev_ops ice_eth_dev_ops = {
 	.timesync_read_time           = ice_timesync_read_time,
 	.timesync_write_time          = ice_timesync_write_time,
 	.timesync_disable             = ice_timesync_disable,
+	.hdrs_supported_ptypes_get    = ice_get_supported_split_hdrs,
 };
 
 /* store statistics names and its offset in stats structure */
@@ -3713,7 +3716,8 @@ ice_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 			RTE_ETH_RX_OFFLOAD_OUTER_IPV4_CKSUM |
 			RTE_ETH_RX_OFFLOAD_VLAN_EXTEND |
 			RTE_ETH_RX_OFFLOAD_RSS_HASH |
-			RTE_ETH_RX_OFFLOAD_TIMESTAMP;
+			RTE_ETH_RX_OFFLOAD_TIMESTAMP |
+			RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT;
 		dev_info->tx_offload_capa |=
 			RTE_ETH_TX_OFFLOAD_QINQ_INSERT |
 			RTE_ETH_TX_OFFLOAD_IPV4_CKSUM |
@@ -3725,7 +3729,7 @@ ice_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 		dev_info->flow_type_rss_offloads |= ICE_RSS_OFFLOAD_ALL;
 	}
 
-	dev_info->rx_queue_offload_capa = 0;
+	dev_info->rx_queue_offload_capa = RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT;
 	dev_info->tx_queue_offload_capa = RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE;
 
 	dev_info->reta_size = pf->hash_lut_size;
@@ -3794,6 +3798,11 @@ ice_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 	dev_info->default_rxportconf.ring_size = ICE_BUF_SIZE_MIN;
 	dev_info->default_txportconf.ring_size = ICE_BUF_SIZE_MIN;
 
+	dev_info->rx_seg_capa.max_nseg = ICE_RX_MAX_NSEG;
+	dev_info->rx_seg_capa.multi_pools = 1;
+	dev_info->rx_seg_capa.offset_allowed = 0;
+	dev_info->rx_seg_capa.offset_align_log2 = 0;
+
 	return 0;
 }
 
@@ -5840,6 +5849,31 @@ ice_timesync_disable(struct rte_eth_dev *dev)
 	return 0;
 }
 
+static int
+ice_get_supported_split_hdrs(struct rte_eth_dev *dev, uint32_t *ptypes)
+{
+	if (!dev)
+		return -EINVAL;
+/* Buffer split protocol header capability. */
+#define RTE_BUFFER_SPLIT_PROTO_HDR_MASK ( \
+	RTE_PTYPE_L2_ETHER | \
+	RTE_PTYPE_L3_IPV4 | \
+	RTE_PTYPE_L3_IPV6 | \
+	RTE_PTYPE_L4_TCP | \
+	RTE_PTYPE_L4_UDP | \
+	RTE_PTYPE_L4_SCTP | \
+	RTE_PTYPE_INNER_L2_ETHER | \
+	RTE_PTYPE_INNER_L3_IPV4 | \
+	RTE_PTYPE_INNER_L3_IPV6 | \
+	RTE_PTYPE_INNER_L4_TCP | \
+	RTE_PTYPE_INNER_L4_UDP | \
+	RTE_PTYPE_INNER_L4_SCTP)
+
+	*ptypes = RTE_BUFFER_SPLIT_PROTO_HDR_MASK;
+
+	return 0;
+}
+
 static int
 ice_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
 	      struct rte_pci_device *pci_dev)
diff --git a/drivers/net/ice/ice_rxtx.c b/drivers/net/ice/ice_rxtx.c
index 2dd2637fbb..47ef5bbe35 100644
--- a/drivers/net/ice/ice_rxtx.c
+++ b/drivers/net/ice/ice_rxtx.c
@@ -282,7 +282,6 @@ ice_program_hw_rx_queue(struct ice_rx_queue *rxq)
 	/* Set buffer size as the head split is disabled. */
 	buf_size = (uint16_t)(rte_pktmbuf_data_room_size(rxq->mp) -
 			      RTE_PKTMBUF_HEADROOM);
-	rxq->rx_hdr_len = 0;
 	rxq->rx_buf_len = RTE_ALIGN(buf_size, (1 << ICE_RLAN_CTX_DBUF_S));
 	rxq->max_pkt_len =
 		RTE_MIN((uint32_t)ICE_SUPPORT_CHAIN_NUM * rxq->rx_buf_len,
@@ -311,11 +310,53 @@ ice_program_hw_rx_queue(struct ice_rx_queue *rxq)
 
 	memset(&rx_ctx, 0, sizeof(rx_ctx));
 
+	if (rxq->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) {
+		switch (rxq->rxseg[0].proto_hdr) {
+		case RTE_PTYPE_L2_ETHER:
+			rx_ctx.dtype = ICE_RX_DTYPE_HEADER_SPLIT;
+			rx_ctx.hsplit_1 = ICE_RLAN_RX_HSPLIT_1_SPLIT_L2;
+			break;
+		case RTE_PTYPE_INNER_L2_ETHER:
+			rx_ctx.dtype = ICE_RX_DTYPE_HEADER_SPLIT;
+			rx_ctx.hsplit_0 = ICE_RLAN_RX_HSPLIT_0_SPLIT_L2;
+			break;
+		case RTE_PTYPE_L3_IPV4:
+		case RTE_PTYPE_L3_IPV6:
+		case RTE_PTYPE_INNER_L3_IPV4:
+		case RTE_PTYPE_INNER_L3_IPV6:
+		case RTE_PTYPE_L3_IPV4 | RTE_PTYPE_L3_IPV6:
+			rx_ctx.dtype = ICE_RX_DTYPE_HEADER_SPLIT;
+			rx_ctx.hsplit_0 = ICE_RLAN_RX_HSPLIT_0_SPLIT_IP;
+			break;
+		case RTE_PTYPE_L4_TCP:
+		case RTE_PTYPE_L4_UDP:
+		case RTE_PTYPE_INNER_L4_TCP:
+		case RTE_PTYPE_INNER_L4_UDP:
+			rx_ctx.dtype = ICE_RX_DTYPE_HEADER_SPLIT;
+			rx_ctx.hsplit_0 = ICE_RLAN_RX_HSPLIT_0_SPLIT_TCP_UDP;
+			break;
+		case RTE_PTYPE_L4_SCTP:
+		case RTE_PTYPE_INNER_L4_SCTP:
+			rx_ctx.dtype = ICE_RX_DTYPE_HEADER_SPLIT;
+			rx_ctx.hsplit_0 = ICE_RLAN_RX_HSPLIT_0_SPLIT_SCTP;
+			break;
+		case RTE_PTYPE_UNKNOWN:
+			PMD_DRV_LOG(ERR, "Buffer split protocol must be configured");
+			return -EINVAL;
+		default:
+			PMD_DRV_LOG(ERR, "Buffer split protocol is not supported");
+			return -EINVAL;
+		}
+		rxq->rx_hdr_len = ICE_RX_HDR_BUF_SIZE;
+	} else {
+		rxq->rx_hdr_len = 0;
+		rx_ctx.dtype = 0; /* No Protocol Based Buffer Split mode */
+	}
+
 	rx_ctx.base = rxq->rx_ring_dma / ICE_QUEUE_BASE_ADDR_UNIT;
 	rx_ctx.qlen = rxq->nb_rx_desc;
 	rx_ctx.dbuf = rxq->rx_buf_len >> ICE_RLAN_CTX_DBUF_S;
 	rx_ctx.hbuf = rxq->rx_hdr_len >> ICE_RLAN_CTX_HBUF_S;
-	rx_ctx.dtype = 0; /* No Header Split mode */
 #ifndef RTE_LIBRTE_ICE_16BYTE_RX_DESC
 	rx_ctx.dsize = 1; /* 32B descriptors */
 #endif
@@ -401,6 +442,7 @@ ice_alloc_rx_queue_mbufs(struct ice_rx_queue *rxq)
 
 	for (i = 0; i < rxq->nb_rx_desc; i++) {
 		volatile union ice_rx_flex_desc *rxd;
+		rxd = &rxq->rx_ring[i];
 		struct rte_mbuf *mbuf = rte_mbuf_raw_alloc(rxq->mp);
 
 		if (unlikely(!mbuf)) {
@@ -408,8 +450,6 @@ ice_alloc_rx_queue_mbufs(struct ice_rx_queue *rxq)
 			return -ENOMEM;
 		}
 
-		rte_mbuf_refcnt_set(mbuf, 1);
-		mbuf->next = NULL;
 		mbuf->data_off = RTE_PKTMBUF_HEADROOM;
 		mbuf->nb_segs = 1;
 		mbuf->port = rxq->port_id;
@@ -417,9 +457,33 @@ ice_alloc_rx_queue_mbufs(struct ice_rx_queue *rxq)
 		dma_addr =
 			rte_cpu_to_le_64(rte_mbuf_data_iova_default(mbuf));
 
-		rxd = &rxq->rx_ring[i];
-		rxd->read.pkt_addr = dma_addr;
-		rxd->read.hdr_addr = 0;
+		if (rxq->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) {
+			struct rte_mbuf *mbuf_pay;
+			mbuf_pay = rte_mbuf_raw_alloc(rxq->rxseg[1].mp);
+			if (unlikely(!mbuf_pay)) {
+				PMD_DRV_LOG(ERR, "Failed to allocate payload mbuf for RX");
+				return -ENOMEM;
+			}
+
+			mbuf_pay->next = NULL;
+			mbuf_pay->data_off = RTE_PKTMBUF_HEADROOM;
+			mbuf_pay->nb_segs = 1;
+			mbuf_pay->port = rxq->port_id;
+			mbuf->next = mbuf_pay;
+
+			rxd->read.hdr_addr = dma_addr;
+			/* The LS bit should be set to zero regardless of
+			 * buffer split enablement.
+			 */
+			rxd->read.pkt_addr = rte_cpu_to_le_64(rte_mbuf_data_iova_default(mbuf_pay));
+
+		} else {
+			rte_mbuf_refcnt_set(mbuf, 1);
+			mbuf->next = NULL;
+			rxd->read.hdr_addr = 0;
+			rxd->read.pkt_addr = dma_addr;
+		}
+
 #ifndef RTE_LIBRTE_ICE_16BYTE_RX_DESC
 		rxd->read.rsvd1 = 0;
 		rxd->read.rsvd2 = 0;
@@ -443,14 +507,14 @@ _ice_rx_queue_release_mbufs(struct ice_rx_queue *rxq)
 
 	for (i = 0; i < rxq->nb_rx_desc; i++) {
 		if (rxq->sw_ring[i].mbuf) {
-			rte_pktmbuf_free_seg(rxq->sw_ring[i].mbuf);
+			rte_pktmbuf_free(rxq->sw_ring[i].mbuf);
 			rxq->sw_ring[i].mbuf = NULL;
 		}
 	}
 	if (rxq->rx_nb_avail == 0)
 		return;
 	for (i = 0; i < rxq->rx_nb_avail; i++)
-		rte_pktmbuf_free_seg(rxq->rx_stage[rxq->rx_next_avail + i]);
+		rte_pktmbuf_free(rxq->rx_stage[rxq->rx_next_avail + i]);
 
 	rxq->rx_nb_avail = 0;
 }
@@ -742,7 +806,7 @@ ice_fdir_program_hw_rx_queue(struct ice_rx_queue *rxq)
 	rx_ctx.qlen = rxq->nb_rx_desc;
 	rx_ctx.dbuf = rxq->rx_buf_len >> ICE_RLAN_CTX_DBUF_S;
 	rx_ctx.hbuf = rxq->rx_hdr_len >> ICE_RLAN_CTX_HBUF_S;
-	rx_ctx.dtype = 0; /* No Header Split mode */
+	rx_ctx.dtype = 0; /* No Buffer Split mode */
 	rx_ctx.dsize = 1; /* 32B descriptors */
 	rx_ctx.rxmax = ICE_ETH_MAX_LEN;
 	/* TPH: Transaction Layer Packet (TLP) processing hints */
@@ -1076,6 +1140,7 @@ ice_rx_queue_setup(struct rte_eth_dev *dev,
 	uint16_t len;
 	int use_def_burst_func = 1;
 	uint64_t offloads;
+	uint16_t n_seg = rx_conf->rx_nseg;
 
 	if (nb_desc % ICE_ALIGN_RING_DESC != 0 ||
 	    nb_desc > ICE_MAX_RING_DESC ||
@@ -1087,6 +1152,17 @@ ice_rx_queue_setup(struct rte_eth_dev *dev,
 
 	offloads = rx_conf->offloads | dev->data->dev_conf.rxmode.offloads;
 
+	if (mp)
+		n_seg = 1;
+
+	if (n_seg > 1) {
+		if (!(offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT)) {
+			PMD_INIT_LOG(ERR, "port %u queue index %u split offload not configured",
+					dev->data->port_id, queue_idx);
+			return -EINVAL;
+		}
+	}
+
 	/* Free memory if needed */
 	if (dev->data->rx_queues[queue_idx]) {
 		ice_rx_queue_release(dev->data->rx_queues[queue_idx]);
@@ -1098,12 +1174,22 @@ ice_rx_queue_setup(struct rte_eth_dev *dev,
 				 sizeof(struct ice_rx_queue),
 				 RTE_CACHE_LINE_SIZE,
 				 socket_id);
+
 	if (!rxq) {
 		PMD_INIT_LOG(ERR, "Failed to allocate memory for "
 			     "rx queue data structure");
 		return -ENOMEM;
 	}
-	rxq->mp = mp;
+
+	rxq->rxseg_nb = n_seg;
+	if (n_seg > 1) {
+		rte_memcpy(rxq->rxseg, rx_conf->rx_seg,
+			sizeof(struct rte_eth_rxseg_split) * n_seg);
+		rxq->mp = rxq->rxseg[0].mp;
+	} else {
+		rxq->mp = mp;
+	}
+
 	rxq->nb_rx_desc = nb_desc;
 	rxq->rx_free_thresh = rx_conf->rx_free_thresh;
 	rxq->queue_id = queue_idx;
@@ -1568,7 +1654,7 @@ ice_rx_scan_hw_ring(struct ice_rx_queue *rxq)
 	struct ice_rx_entry *rxep;
 	struct rte_mbuf *mb;
 	uint16_t stat_err0;
-	uint16_t pkt_len;
+	uint16_t pkt_len, hdr_len;
 	int32_t s[ICE_LOOK_AHEAD], nb_dd;
 	int32_t i, j, nb_rx = 0;
 	uint64_t pkt_flags = 0;
@@ -1623,6 +1709,24 @@ ice_rx_scan_hw_ring(struct ice_rx_queue *rxq)
 				   ICE_RX_FLX_DESC_PKT_LEN_M) - rxq->crc_len;
 			mb->data_len = pkt_len;
 			mb->pkt_len = pkt_len;
+
+			if (rxq->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) {
+				mb->nb_segs = (uint16_t)(mb->nb_segs + mb->next->nb_segs);
+				mb->next->next = NULL;
+				hdr_len = rte_le_to_cpu_16(rxdp[j].wb.hdr_len_sph_flex_flags1) &
+						ICE_RX_FLEX_DESC_HEADER_LEN_M;
+				pkt_len = (rte_le_to_cpu_16(rxdp[j].wb.pkt_len) &
+					ICE_RX_FLX_DESC_PKT_LEN_M) - rxq->crc_len;
+				mb->data_len = hdr_len;
+				mb->pkt_len = hdr_len + pkt_len;
+				mb->next->data_len = pkt_len;
+			} else {
+				pkt_len = (rte_le_to_cpu_16(rxdp[j].wb.pkt_len) &
+					ICE_RX_FLX_DESC_PKT_LEN_M) - rxq->crc_len;
+				mb->data_len = pkt_len;
+				mb->pkt_len = pkt_len;
+			}
+
 			mb->ol_flags = 0;
 			stat_err0 = rte_le_to_cpu_16(rxdp[j].wb.status_error0);
 			pkt_flags = ice_rxd_error_to_pkt_flags(stat_err0);
@@ -1714,7 +1818,9 @@ ice_rx_alloc_bufs(struct ice_rx_queue *rxq)
 	struct rte_mbuf *mb;
 	uint16_t alloc_idx, i;
 	uint64_t dma_addr;
-	int diag;
+	int diag, diag_pay;
+	uint64_t pay_addr;
+	struct rte_mbuf *mbufs_pay[rxq->rx_free_thresh];
 
 	/* Allocate buffers in bulk */
 	alloc_idx = (uint16_t)(rxq->rx_free_trigger -
@@ -1727,6 +1833,15 @@ ice_rx_alloc_bufs(struct ice_rx_queue *rxq)
 		return -ENOMEM;
 	}
 
+	if (rxq->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) {
+		diag_pay = rte_mempool_get_bulk(rxq->rxseg[1].mp,
+				(void *)mbufs_pay, rxq->rx_free_thresh);
+		if (unlikely(diag_pay != 0)) {
+			PMD_RX_LOG(ERR, "Failed to get payload mbufs in bulk");
+			return -ENOMEM;
+		}
+	}
+
 	rxdp = &rxq->rx_ring[alloc_idx];
 	for (i = 0; i < rxq->rx_free_thresh; i++) {
 		if (likely(i < (rxq->rx_free_thresh - 1)))
@@ -1735,13 +1850,21 @@ ice_rx_alloc_bufs(struct ice_rx_queue *rxq)
 
 		mb = rxep[i].mbuf;
 		rte_mbuf_refcnt_set(mb, 1);
-		mb->next = NULL;
 		mb->data_off = RTE_PKTMBUF_HEADROOM;
 		mb->nb_segs = 1;
 		mb->port = rxq->port_id;
 		dma_addr = rte_cpu_to_le_64(rte_mbuf_data_iova_default(mb));
-		rxdp[i].read.hdr_addr = 0;
-		rxdp[i].read.pkt_addr = dma_addr;
+
+		if (rxq->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) {
+			mb->next = mbufs_pay[i];
+			pay_addr = rte_mbuf_data_iova_default(mbufs_pay[i]);
+			rxdp[i].read.hdr_addr = dma_addr;
+			rxdp[i].read.pkt_addr = rte_cpu_to_le_64(pay_addr);
+		} else {
+			mb->next = NULL;
+			rxdp[i].read.hdr_addr = 0;
+			rxdp[i].read.pkt_addr = dma_addr;
+		}
 	}
 
 	/* Update Rx tail register */
@@ -2350,11 +2473,13 @@ ice_recv_pkts(void *rx_queue,
 	struct ice_rx_entry *sw_ring = rxq->sw_ring;
 	struct ice_rx_entry *rxe;
 	struct rte_mbuf *nmb; /* new allocated mbuf */
+	struct rte_mbuf *nmb_pay; /* new allocated payload mbuf */
 	struct rte_mbuf *rxm; /* pointer to store old mbuf in SW ring */
 	uint16_t rx_id = rxq->rx_tail;
 	uint16_t nb_rx = 0;
 	uint16_t nb_hold = 0;
 	uint16_t rx_packet_len;
+	uint16_t rx_header_len;
 	uint16_t rx_stat_err0;
 	uint64_t dma_addr;
 	uint64_t pkt_flags;
@@ -2382,12 +2507,16 @@ ice_recv_pkts(void *rx_queue,
 		if (!(rx_stat_err0 & (1 << ICE_RX_FLEX_DESC_STATUS0_DD_S)))
 			break;
 
-		/* allocate mbuf */
+		if (rx_stat_err0 & (1 << ICE_RX_FLEX_DESC_STATUS0_HBO_S))
+			break;
+
+		/* allocate header mbuf */
 		nmb = rte_mbuf_raw_alloc(rxq->mp);
 		if (unlikely(!nmb)) {
 			rxq->vsi->adapter->pf.dev_data->rx_mbuf_alloc_failed++;
 			break;
 		}
+
 		rxd = *rxdp; /* copy descriptor in ring to temp variable*/
 
 		nb_hold++;
@@ -2400,24 +2529,55 @@ ice_recv_pkts(void *rx_queue,
 		dma_addr =
 			rte_cpu_to_le_64(rte_mbuf_data_iova_default(nmb));
 
-		/**
-		 * fill the read format of descriptor with physic address in
-		 * new allocated mbuf: nmb
-		 */
-		rxdp->read.hdr_addr = 0;
-		rxdp->read.pkt_addr = dma_addr;
+		if (rxq->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) {
+			/* allocate payload mbuf */
+			nmb_pay = rte_mbuf_raw_alloc(rxq->rxseg[1].mp);
+			if (unlikely(!nmb_pay)) {
+				rxq->vsi->adapter->pf.dev_data->rx_mbuf_alloc_failed++;
+				break;
+			}
+
+			nmb->next = nmb_pay;
+			nmb_pay->next = NULL;
 
-		/* calculate rx_packet_len of the received pkt */
-		rx_packet_len = (rte_le_to_cpu_16(rxd.wb.pkt_len) &
-				 ICE_RX_FLX_DESC_PKT_LEN_M) - rxq->crc_len;
+			/**
+			 * fill the read format of the descriptor with the physical
+			 * address of the newly allocated mbuf: nmb
+			 */
+			rxdp->read.hdr_addr = dma_addr;
+			rxdp->read.pkt_addr = rte_cpu_to_le_64(rte_mbuf_data_iova_default(nmb_pay));
+		} else {
+			/**
+			 * fill the read format of the descriptor with the physical
+			 * address of the newly allocated mbuf: nmb
+			 */
+			rxdp->read.hdr_addr = 0;
+			rxdp->read.pkt_addr = dma_addr;
+		}
 
 		/* fill old mbuf with received descriptor: rxd */
 		rxm->data_off = RTE_PKTMBUF_HEADROOM;
 		rte_prefetch0(RTE_PTR_ADD(rxm->buf_addr, RTE_PKTMBUF_HEADROOM));
-		rxm->nb_segs = 1;
-		rxm->next = NULL;
-		rxm->pkt_len = rx_packet_len;
-		rxm->data_len = rx_packet_len;
+		if (rxq->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) {
+			rxm->nb_segs = (uint16_t)(rxm->nb_segs + rxm->next->nb_segs);
+			rxm->next->next = NULL;
+			/* calculate rx_packet_len of the received pkt */
+			rx_header_len = rte_le_to_cpu_16(rxd.wb.hdr_len_sph_flex_flags1) &
+					ICE_RX_FLEX_DESC_HEADER_LEN_M;
+			rx_packet_len = (rte_le_to_cpu_16(rxd.wb.pkt_len) &
+					ICE_RX_FLX_DESC_PKT_LEN_M) - rxq->crc_len;
+			rxm->data_len = rx_header_len;
+			rxm->pkt_len = rx_header_len + rx_packet_len;
+			rxm->next->data_len = rx_packet_len;
+		} else {
+			rxm->nb_segs = 1;
+			rxm->next = NULL;
+			/* calculate rx_packet_len of the received pkt */
+			rx_packet_len = (rte_le_to_cpu_16(rxd.wb.pkt_len) &
+					ICE_RX_FLX_DESC_PKT_LEN_M) - rxq->crc_len;
+			rxm->data_len = rx_packet_len;
+			rxm->pkt_len = rx_packet_len;
+		}
 		rxm->port = rxq->port_id;
 		rxm->packet_type = ptype_tbl[ICE_RX_FLEX_DESC_PTYPE_M &
 			rte_le_to_cpu_16(rxd.wb.ptype_flex_flags0)];
diff --git a/drivers/net/ice/ice_rxtx.h b/drivers/net/ice/ice_rxtx.h
index bb18a01951..611dbc8503 100644
--- a/drivers/net/ice/ice_rxtx.h
+++ b/drivers/net/ice/ice_rxtx.h
@@ -16,6 +16,9 @@
 #define ICE_RX_MAX_BURST 32
 #define ICE_TX_MAX_BURST 32
 
+/* Maximal number of segments to split. */
+#define ICE_RX_MAX_NSEG 2
+
 #define ICE_CHK_Q_ENA_COUNT        100
 #define ICE_CHK_Q_ENA_INTERVAL_US  100
 
@@ -43,6 +46,11 @@
 extern uint64_t ice_timestamp_dynflag;
 extern int ice_timestamp_dynfield_offset;
 
+/* Max header size can be 2K - 64 bytes */
+#define ICE_RX_HDR_BUF_SIZE    (2048 - 64)
+
+#define ICE_HEADER_SPLIT_ENA   BIT(0)
+
 typedef void (*ice_rx_release_mbufs_t)(struct ice_rx_queue *rxq);
 typedef void (*ice_tx_release_mbufs_t)(struct ice_tx_queue *txq);
 typedef void (*ice_rxd_to_pkt_fields_t)(struct ice_rx_queue *rxq,
@@ -53,6 +61,12 @@ struct ice_rx_entry {
 	struct rte_mbuf *mbuf;
 };
 
+enum ice_rx_dtype {
+	ICE_RX_DTYPE_NO_SPLIT       = 0,
+	ICE_RX_DTYPE_HEADER_SPLIT   = 1,
+	ICE_RX_DTYPE_SPLIT_ALWAYS   = 2,
+};
+
 struct ice_rx_queue {
 	struct rte_mempool *mp; /* mbuf pool to populate RX ring */
 	volatile union ice_rx_flex_desc *rx_ring;/* RX ring virtual address */
@@ -95,6 +109,8 @@ struct ice_rx_queue {
 	uint32_t time_high;
 	uint32_t hw_register_set;
 	const struct rte_memzone *mz;
+	struct rte_eth_rxseg_split rxseg[ICE_RX_MAX_NSEG];
+	uint32_t rxseg_nb;
 };
 
 struct ice_tx_entry {
diff --git a/drivers/net/ice/ice_rxtx_vec_common.h b/drivers/net/ice/ice_rxtx_vec_common.h
index 2dd2d83650..eec6ea2134 100644
--- a/drivers/net/ice/ice_rxtx_vec_common.h
+++ b/drivers/net/ice/ice_rxtx_vec_common.h
@@ -291,6 +291,9 @@ ice_rx_vec_queue_default(struct ice_rx_queue *rxq)
 	if (rxq->offloads & RTE_ETH_RX_OFFLOAD_TIMESTAMP)
 		return -1;
 
+	if (rxq->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT)
+		return -1;
+
 	if (rxq->offloads & ICE_RX_VECTOR_OFFLOAD)
 		return ICE_VECTOR_OFFLOAD_PATH;
 
-- 
2.25.1


^ permalink raw reply	[flat|nested] 88+ messages in thread

* RE: [PATCH v9 0/4] add an api to support proto based buffer split
  2022-06-13 10:25 ` [PATCH v9 0/4] add an api to support proto based buffer split wenxuanx.wu
                     ` (3 preceding siblings ...)
  2022-06-13 10:25   ` [PATCH v9 4/4] net/ice: support buffer split in Rx path wenxuanx.wu
@ 2022-06-21  8:56   ` Ding, Xuan
  2022-07-07  9:10     ` Thomas Monjalon
  4 siblings, 1 reply; 88+ messages in thread
From: Ding, Xuan @ 2022-06-21  8:56 UTC (permalink / raw)
  To: andrew.rybchenko
  Cc: thomas, Li, Xiaoyun, ferruh.yigit, dev, Zhang, Yuying, Zhang,
	Qi Z, jerinjacobk, stephen, Wu, WenxuanX

Hi Andrew,

> -----Original Message-----
> From: wenxuanx.wu@intel.com <wenxuanx.wu@intel.com>
> Sent: Monday, June 13, 2022 6:26 PM
> To: thomas@monjalon.net; andrew.rybchenko@oktetlabs.ru; Li, Xiaoyun
> <xiaoyun.li@intel.com>; ferruh.yigit@xilinx.com; Singh, Aman Deep
> <aman.deep.singh@intel.com>; dev@dpdk.org; Zhang, Yuying
> <yuying.zhang@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>;
> jerinjacobk@gmail.com
> Cc: stephen@networkplumber.org; Wu, WenxuanX
> <wenxuanx.wu@intel.com>
> Subject: [PATCH v9 0/4] add an api to support proto based buffer split
> 
> From: Wenxuan Wu <wenxuanx.wu@intel.com>
> 
> Protocol type based buffer split consists of splitting a received packet into
> several separate segments based on the packet content. It is useful in some
> scenarios, such as GPU acceleration. The splitting will help to enable true
> zero copy and hence improve the performance significantly.
> 
> This patchset aims to support protocol header split based on current buffer
> split. When Rx queue is configured with
> RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT offload and corresponding protocol,
> packets received will be directly split into different mempools.

This protocol based buffer split patch series has been updated to v9.
Sincere thanks for the effort you put into this series.

We hope to know your thoughts on this series now.
Do you think it is possible to get it into 22.07, or are there still some critical gaps that need to be solved?
We don't want the same thing to happen in 22.11.

Thanks very much.

Regards,
Xuan

> 
> v8->v9:
> * Introduce a new api rte_eth_supported_hdrs_get to retrieve supported
>   ptypes mask of a pmd to split.
> * Fix header protocol split check.
> * Support header protocol configuration of rxhdrs by default, e.g.
>   ipv4, ipv6, mac, inner_mac, outer_mac, l3, l4.
> * Refine doc.
> 
> v7->v8:
> * Refine ethdev doc.
> * Fix header protocol split check.
> 
> v6->v7:
> * Fix supported header protocol check.
> * Add rxhdrs commands and parameters.
> 
> v5->v6:
> * The header split deprecation notice is sent.
> * Refine the documents, protocol header based buffer split can actually
>   support multi-segment split.
> * Add buffer split protocol header capability.
> * Fix some format issues.
> 
> v4->v5:
> * Use protocol and mbuf_offset based buffer split instead of header split.
> * Use RTE_PTYPE* instead of enum rte_eth_rx_header_split_protocol_type.
> * Improve the description of rte_eth_rxseg_split.proto.
> 
> v3->v4:
> * Use RTE_ETH_RX_HEADER_SPLIT_NONE instead of 0.
> 
> v2->v3:
> * Fix a PMD bug.
> * Add rx queue header split check.
> * Revise the log and doc.
> 
> v1->v2:
> * Add support for all header split protocol types.
> 
> Wenxuan Wu (4):
>   ethdev: introduce protocol header API
>   ethdev: introduce protocol hdr based buffer split
>   app/testpmd: add rxhdrs commands and parameters
>   net/ice: support buffer split in Rx path
> 
>  app/test-pmd/cmdline.c                 | 133 ++++++++++++++-
>  app/test-pmd/config.c                  |  75 +++++++++
>  app/test-pmd/parameters.c              |  15 +-
>  app/test-pmd/testpmd.c                 |   6 +-
>  app/test-pmd/testpmd.h                 |   6 +
>  doc/guides/rel_notes/release_22_07.rst |   2 +
>  drivers/net/ice/ice_ethdev.c           |  38 ++++-
>  drivers/net/ice/ice_rxtx.c             | 220 +++++++++++++++++++++----
>  drivers/net/ice/ice_rxtx.h             |  16 ++
>  drivers/net/ice/ice_rxtx_vec_common.h  |   3 +
>  lib/ethdev/ethdev_driver.h             |  18 ++
>  lib/ethdev/rte_ethdev.c                |  61 +++++--
>  lib/ethdev/rte_ethdev.h                |  36 +++-
>  lib/ethdev/version.map                 |   3 +
>  14 files changed, 582 insertions(+), 50 deletions(-)
> 
> --
> 2.25.1


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v9 1/4] ethdev: introduce protocol header API
  2022-06-13 10:25   ` [PATCH v9 1/4] ethdev: introduce protocol header API wenxuanx.wu
@ 2022-07-07  9:05     ` Thomas Monjalon
  2022-08-01  7:09       ` Wang, YuanX
  2022-07-08 15:00     ` Andrew Rybchenko
  1 sibling, 1 reply; 88+ messages in thread
From: Thomas Monjalon @ 2022-07-07  9:05 UTC (permalink / raw)
  To: Wenxuan Wu
  Cc: andrew.rybchenko, xiaoyun.li, ferruh.yigit, aman.deep.singh, dev,
	yuying.zhang, qi.z.zhang, jerinjacobk, stephen

13/06/2022 12:25, wenxuanx.wu@intel.com:
> From: Wenxuan Wu <wenxuanx.wu@intel.com>
> 
> This patch added new ethdev API to retrieve supported protocol header mask
> of a PMD, which helps to configure protocol header based buffer split.
> 
> Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
> ---
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * Get supported header protocols to split supported by PMD.
> + * The API will return error if the device is not valid.
> + *
> + * @param port_id
> + *   The port identifier of the device.
> + * @param ptype
> + *   Supported protocol headers of driver.

It doesn't say where to find the types.
Please give the prefix.

> + * @return
> + *   - (-ENOTSUP) if header protocol is not supported by device.
> + *   - (-ENODEV) if *port_id* invalid.
> + *   - (-EIO) if device is removed.
> + *   - (0) on success.
> + */
> +__rte_experimental
> +int rte_eth_supported_hdrs_get(uint16_t port_id,
> +		uint32_t *ptype);

The function name is not precise enough.
There should be the word "split" in its name.



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v9 2/4] ethdev: introduce protocol hdr based buffer split
  2022-06-13 10:25   ` [PATCH v9 2/4] ethdev: introduce protocol hdr based buffer split wenxuanx.wu
@ 2022-07-07  9:07     ` Thomas Monjalon
  2022-07-11  9:54       ` Ding, Xuan
  2022-07-08 15:00     ` Andrew Rybchenko
  1 sibling, 1 reply; 88+ messages in thread
From: Thomas Monjalon @ 2022-07-07  9:07 UTC (permalink / raw)
  To: wenxuanx.wu
  Cc: andrew.rybchenko, xiaoyun.li, ferruh.yigit, aman.deep.singh, dev,
	yuying.zhang, qi.z.zhang, jerinjacobk, stephen, Wenxuan Wu,
	Xuan Ding, Yuan Wang, Ray Kinsella

13/06/2022 12:25, wenxuanx.wu@intel.com:
> --- a/lib/ethdev/rte_ethdev.h
> +++ b/lib/ethdev/rte_ethdev.h
> @@ -1176,6 +1176,9 @@ struct rte_eth_txmode {
>   *   specified in the first array element, the second buffer, from the
>   *   pool in the second element, and so on.
>   *
> + * - The proto_hdrs in the elements define the split position of
> + *   received packets.
> + *
>   * - The offsets from the segment description elements specify
>   *   the data offset from the buffer beginning except the first mbuf.
>   *   The first segment offset is added with RTE_PKTMBUF_HEADROOM.
> @@ -1197,12 +1200,21 @@ struct rte_eth_txmode {
>   *     - pool from the last valid element
>   *     - the buffer size from this pool
>   *     - zero offset
> + *
> + * - Length based buffer split:
> + *     - mp, length, offset should be configured.
> + *     - The proto_hdr field should not be configured.
> + *
> + * - Protocol header based buffer split:
> + *     - mp, offset, proto_hdr should be configured.
> + *     - The length field should not be configured.
>   */
>  struct rte_eth_rxseg_split {
>  	struct rte_mempool *mp; /**< Memory pool to allocate segment from. */
>  	uint16_t length; /**< Segment data length, configures split point. */
>  	uint16_t offset; /**< Data offset from beginning of mbuf data buffer. */
> -	uint32_t reserved; /**< Reserved field. */
> +	/**< Supported ptypes mask of a specific pmd, configures split point. */

The doxygen syntax is wrong: remove the "<" which is for post-comment.

> +	uint32_t proto_hdr;
>  };

How do we know it is a length or buffer split?
Is it based on checking some 0 value?



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v9 0/4] add an api to support proto based buffer split
  2022-06-21  8:56   ` [PATCH v9 0/4] add an api to support proto based buffer split Ding, Xuan
@ 2022-07-07  9:10     ` Thomas Monjalon
  2022-07-11 10:08       ` Ding, Xuan
  0 siblings, 1 reply; 88+ messages in thread
From: Thomas Monjalon @ 2022-07-07  9:10 UTC (permalink / raw)
  To: Wu, WenxuanX, Ding, Xuan
  Cc: andrew.rybchenko, dev, Li, Xiaoyun, ferruh.yigit, dev, Zhang,
	Yuying, Zhang, Qi Z, jerinjacobk, stephen, qi.z.zhang,
	bruce.richardson, john.mcnamara

21/06/2022 10:56, Ding, Xuan:
> This protocol based buffer split patch series has been updated to v9.
> Sincere thanks for the effort you put into this series.
> 
> We hope to know your thoughts on this series now.
> Do you think it is possible to get it into 22.07, or are there still some critical gaps that need to be solved?
> We don't want the same thing to happen in 22.11.

My quick comment: I think you must take better care of all the details.
Precise explanations are very important.
It is more encouraging to review when we see the author has tried hard
to avoid any confusion or approximation.




^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v9 1/4] ethdev: introduce protocol header API
  2022-06-13 10:25   ` [PATCH v9 1/4] ethdev: introduce protocol header API wenxuanx.wu
  2022-07-07  9:05     ` Thomas Monjalon
@ 2022-07-08 15:00     ` Andrew Rybchenko
  2022-08-01  7:17       ` Wang, YuanX
  1 sibling, 1 reply; 88+ messages in thread
From: Andrew Rybchenko @ 2022-07-08 15:00 UTC (permalink / raw)
  To: wenxuanx.wu, thomas, xiaoyun.li, ferruh.yigit, aman.deep.singh,
	dev, yuying.zhang, qi.z.zhang, jerinjacobk
  Cc: stephen

On 6/13/22 13:25, wenxuanx.wu@intel.com wrote:
> From: Wenxuan Wu <wenxuanx.wu@intel.com>
> 
> This patch added new ethdev API to retrieve supported protocol header mask

This patch added -> Add

> of a PMD, which helps to configure protocol header based buffer split.

I'd like to see the motivation why a single mask is considered sufficient,
i.e. why don't we follow the ptypes approach, which is more flexible but
a bit more complicated.

Looking at the RTE_PTYPE_* defines carefully, it looks like the API below
simply cannot provide the information that we can split after
TCP or UDP.

> 
> Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>

[snip]

>   /**
>    * @internal
>    * Dump private info from device to a file.
> @@ -1281,6 +1296,9 @@ struct eth_dev_ops {
>   	/** Set IP reassembly configuration */
>   	eth_ip_reassembly_conf_set_t ip_reassembly_conf_set;
>   
> +	/** Get supported ptypes to split */
> +	eth_buffer_split_hdr_ptype_get_t hdrs_supported_ptypes_get;
> +

It is better to be consistent with naming. I.e. just cut the prefix "eth_"
and the suffix "_t".

Also the type name sounds like it gets the current split configuration,
not the supported one.

>   	/** Dump private info from device */
>   	eth_dev_priv_dump_t eth_dev_priv_dump;
>   };
> diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
> index 29a3d80466..e1f2a0ffe3 100644
> --- a/lib/ethdev/rte_ethdev.c
> +++ b/lib/ethdev/rte_ethdev.c
> @@ -1636,9 +1636,10 @@ rte_eth_dev_is_removed(uint16_t port_id)
>   }
>   
>   static int
> -rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
> -			     uint16_t n_seg, uint32_t *mbp_buf_size,
> -			     const struct rte_eth_dev_info *dev_info)
> +rte_eth_rx_queue_check_split(uint16_t port_id,
> +				const struct rte_eth_rxseg_split *rx_seg,
> +				int16_t n_seg, uint32_t *mbp_buf_size,
> +			    const struct rte_eth_dev_info *dev_info)
>   {
>   	const struct rte_eth_rxseg_capa *seg_capa = &dev_info->rx_seg_capa;
>   	struct rte_mempool *mp_first;
> @@ -1694,13 +1695,7 @@ rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
>   		}
>   		offset += seg_idx != 0 ? 0 : RTE_PKTMBUF_HEADROOM;
>   		*mbp_buf_size = rte_pktmbuf_data_room_size(mpl);
> -		length = length != 0 ? length : *mbp_buf_size;
> -		if (*mbp_buf_size < length + offset) {

I don't understand why the check goes away completely.

> -			RTE_ETHDEV_LOG(ERR,
> -				       "%s mbuf_data_room_size %u < %u (segment length=%u + segment offset=%u)\n",
> -				       mpl->name, *mbp_buf_size,
> -				       length + offset, length, offset);
> -			return -EINVAL;
> +

Unnecessary empty line

>   		}

Shouldn't the curly bracket go away as well, together with its 'if'?

>   	}
>   	return 0;
> @@ -1779,7 +1774,7 @@ rte_eth_rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
>   		n_seg = rx_conf->rx_nseg;
>   
>   		if (rx_conf->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) {
> -			ret = rte_eth_rx_queue_check_split(rx_seg, n_seg,
> +			ret = rte_eth_rx_queue_check_split(port_id, rx_seg, n_seg,
>   							   &mbp_buf_size,
>   							   &dev_info);
>   			if (ret != 0)
> @@ -5844,6 +5839,20 @@ rte_eth_ip_reassembly_conf_set(uint16_t port_id,
>   		       (*dev->dev_ops->ip_reassembly_conf_set)(dev, conf));
>   }
>   
> +int
> +rte_eth_supported_hdrs_get(uint16_t port_id, uint32_t *ptypes)
> +{
> +	struct rte_eth_dev *dev;
> +	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
> +	dev = &rte_eth_devices[port_id];

ptypes must be checked vs NULL

> +
> +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->hdrs_supported_ptypes_get,
> +				-ENOTSUP);
> +
> +	return eth_err(port_id,
> +		       (*dev->dev_ops->hdrs_supported_ptypes_get)(dev, ptypes));
> +}
> +
>   int
>   rte_eth_dev_priv_dump(uint16_t port_id, FILE *file)
>   {
> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> index 04cff8ee10..72cac1518e 100644
> --- a/lib/ethdev/rte_ethdev.h
> +++ b/lib/ethdev/rte_ethdev.h
> @@ -6152,6 +6152,28 @@ rte_eth_tx_buffer(uint16_t port_id, uint16_t queue_id,
>   	return rte_eth_tx_buffer_flush(port_id, queue_id, buffer);
>   }
>   
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * Get supported header protocols to split supported by PMD.

"supported" twice above.
Get supported header protocols to split on Rx.

> + * The API will return error if the device is not valid.

Above sentence is obvious and does not add any value. Please, remove.

> + *
> + * @param port_id
> + *   The port identifier of the device.
> + * @param ptype

Why do you use the out annotation for the callback description but do not
use it here?

> + *   Supported protocol headers of driver.
> + * @return
> + *   - (-ENOTSUP) if header protocol is not supported by device.
> + *   - (-ENODEV) if *port_id* invalid.

EINVAL in the case of invalid ptypes argument

> + *   - (-EIO) if device is removed.
> + *   - (0) on success.
> + */
> +__rte_experimental
> +int rte_eth_supported_hdrs_get(uint16_t port_id,
> +		uint32_t *ptype);
> +
>   #ifdef __cplusplus
>   }
>   #endif
> diff --git a/lib/ethdev/version.map b/lib/ethdev/version.map
> index 20391ab29e..7705c0364a 100644
> --- a/lib/ethdev/version.map
> +++ b/lib/ethdev/version.map
> @@ -279,6 +279,9 @@ EXPERIMENTAL {
>   	rte_flow_async_action_handle_create;
>   	rte_flow_async_action_handle_destroy;
>   	rte_flow_async_action_handle_update;
> +
> +	# added in 22.07

It hopefully will be in 22.11

> +	rte_eth_supported_hdrs_get;
>   };
>   
>   INTERNAL {


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v9 2/4] ethdev: introduce protocol hdr based buffer split
  2022-06-13 10:25   ` [PATCH v9 2/4] ethdev: introduce protocol hdr based buffer split wenxuanx.wu
  2022-07-07  9:07     ` Thomas Monjalon
@ 2022-07-08 15:00     ` Andrew Rybchenko
  2022-07-21  3:24       ` Ding, Xuan
  1 sibling, 1 reply; 88+ messages in thread
From: Andrew Rybchenko @ 2022-07-08 15:00 UTC (permalink / raw)
  To: wenxuanx.wu, thomas, xiaoyun.li, ferruh.yigit, aman.deep.singh,
	dev, yuying.zhang, qi.z.zhang, jerinjacobk
  Cc: stephen, Xuan Ding, Yuan Wang, Ray Kinsella

On 6/13/22 13:25, wenxuanx.wu@intel.com wrote:
> From: Wenxuan Wu <wenxuanx.wu@intel.com>
> 
> Currently, Rx buffer split supports length based split. With Rx queue
> offload RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT enabled and Rx packet segment
> configured, PMD will be able to split the received packets into
> multiple segments.
> 
> However, length based buffer split is not suitable for NICs that do split
> based on protocol headers. Given an arbitrarily variable length in an Rx
> packet segment, it is almost impossible to pass a fixed protocol header to
> the driver. Besides, tunneling makes the composition of a packet vary,
> which makes the situation even worse.
> 
> This patch extends the current buffer split to support protocol header based
> buffer split. A new proto_hdr field is introduced in the reserved field
> of the rte_eth_rxseg_split structure to specify the protocol header. The
> proto_hdr field defines the split position of a packet; splitting will
> always happen after the protocol header defined in the Rx packet segment.
> When the Rx queue offload RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT is enabled and
> the corresponding protocol header is configured, the driver will split the
> ingress packets into multiple segments.
> 
> struct rte_eth_rxseg_split {
> 
>          struct rte_mempool *mp; /* memory pools to allocate segment from */
>          uint16_t length; /* segment maximal data length,
>                              configures "split point" */
>          uint16_t offset; /* data offset from beginning
>                              of mbuf data buffer */
>          uint32_t proto_hdr; /* inner/outer L2/L3/L4 protocol header,
> 			       configures "split point" */

There is a big problem here: using the RTE_PTYPE_* defines, I can't
request a split after either a TCP or a UDP header.

>      };
> 
> Both inner and outer L2/L3/L4 level protocol header split can be
> supported by a PMD. The corresponding protocol header capabilities are
> RTE_PTYPE_L2_ETHER, RTE_PTYPE_L3_IPV4, RTE_PTYPE_L3_IPV6, RTE_PTYPE_L4_TCP,
> RTE_PTYPE_L4_UDP, RTE_PTYPE_L4_SCTP, RTE_PTYPE_INNER_L2_ETHER,
> RTE_PTYPE_INNER_L3_IPV4, RTE_PTYPE_INNER_L3_IPV6, RTE_PTYPE_INNER_L4_TCP,
> RTE_PTYPE_INNER_L4_UDP, RTE_PTYPE_INNER_L4_SCTP.

I think there is no point in listing the above defines here if they are
not the only supported defines.

> 
> For example, let's suppose we configured the Rx queue with the
> following segments:
>      seg0 - pool0, proto_hdr0=RTE_PTYPE_L3_IPV4, off0=2B
>      seg1 - pool1, proto_hdr1=RTE_PTYPE_L4_UDP, off1=128B
>      seg2 - pool2, off1=0B
> 
> The packet consists of MAC_IPV4_UDP_PAYLOAD will be split like
> following:
>      seg0 - ipv4 header @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from pool0
>      seg1 - udp header @ 128 in mbuf from pool1
>      seg2 - payload @ 0 in mbuf from pool2

Sorry, but I still see no definition of what should happen with, for
example, an ARP packet with the above config.

> 
> Now buffer split can be configured in two modes. For length based
> buffer split, the mp, length, and offset fields in the Rx packet segment
> should be configured, while the proto_hdr field should not be configured.
> For protocol header based buffer split, the mp, offset, and proto_hdr
> fields in the Rx packet segment should be configured, while the length
> field should not be configured.
> 
> The split limitations imposed by the underlying driver are reported in the
> rte_eth_dev_info->rx_seg_capa field. The memory attributes of the split
> parts may also differ, e.g. DPDK memory versus external memory.
> 
> Signed-off-by: Xuan Ding <xuan.ding@intel.com>
> Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
> Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
> Reviewed-by: Qi Zhang <qi.z.zhang@intel.com>
> Acked-by: Ray Kinsella <mdr@ashroe.eu>
> ---
>   lib/ethdev/rte_ethdev.c | 32 +++++++++++++++++++++++++++++++-
>   lib/ethdev/rte_ethdev.h | 14 +++++++++++++-
>   2 files changed, 44 insertions(+), 2 deletions(-)

Do we need a dedicated feature in doc/guides/nics/features.rst?
Or should we just update buffer split to refer to the new supported
header split API and callback?

Also, the feature definitely deserves an entry in the release notes.

[snip]

^ permalink raw reply	[flat|nested] 88+ messages in thread

* RE: [PATCH v9 2/4] ethdev: introduce protocol hdr based buffer split
  2022-07-07  9:07     ` Thomas Monjalon
@ 2022-07-11  9:54       ` Ding, Xuan
  2022-07-11 10:12         ` Thomas Monjalon
  0 siblings, 1 reply; 88+ messages in thread
From: Ding, Xuan @ 2022-07-11  9:54 UTC (permalink / raw)
  To: Thomas Monjalon, Wu, WenxuanX
  Cc: andrew.rybchenko, Li, Xiaoyun, ferruh.yigit, Singh, Aman Deep,
	dev, Zhang, Yuying, Zhang, Qi Z, jerinjacobk, stephen, Wu,
	WenxuanX, Wang, YuanX, Ray Kinsella

Hi Thomas,

> -----Original Message-----
> From: Thomas Monjalon <thomas@monjalon.net>
> Sent: Thursday, July 7, 2022 5:08 PM
> To: Wu, WenxuanX <wenxuanx.wu@intel.com>
> Cc: andrew.rybchenko@oktetlabs.ru; Li, Xiaoyun <xiaoyun.li@intel.com>;
> ferruh.yigit@xilinx.com; Singh, Aman Deep <aman.deep.singh@intel.com>;
> dev@dpdk.org; Zhang, Yuying <yuying.zhang@intel.com>; Zhang, Qi Z
> <qi.z.zhang@intel.com>; jerinjacobk@gmail.com;
> stephen@networkplumber.org; Wu, WenxuanX <wenxuanx.wu@intel.com>;
> Ding, Xuan <xuan.ding@intel.com>; Wang, YuanX <yuanx.wang@intel.com>;
> Ray Kinsella <mdr@ashroe.eu>
> Subject: Re: [PATCH v9 2/4] ethdev: introduce protocol hdr based buffer split
> 
> 13/06/2022 12:25, wenxuanx.wu@intel.com:
> > --- a/lib/ethdev/rte_ethdev.h
> > +++ b/lib/ethdev/rte_ethdev.h
> > @@ -1176,6 +1176,9 @@ struct rte_eth_txmode {
> >   *   specified in the first array element, the second buffer, from the
> >   *   pool in the second element, and so on.
> >   *
> > + * - The proto_hdrs in the elements define the split position of
> > + *   received packets.
> > + *
> >   * - The offsets from the segment description elements specify
> >   *   the data offset from the buffer beginning except the first mbuf.
> >   *   The first segment offset is added with RTE_PKTMBUF_HEADROOM.
> > @@ -1197,12 +1200,21 @@ struct rte_eth_txmode {
> >   *     - pool from the last valid element
> >   *     - the buffer size from this pool
> >   *     - zero offset
> > + *
> > + * - Length based buffer split:
> > + *     - mp, length, offset should be configured.
> > + *     - The proto_hdr field should not be configured.
> > + *
> > + * - Protocol header based buffer split:
> > + *     - mp, offset, proto_hdr should be configured.
> > + *     - The length field should not be configured.
> >   */
> >  struct rte_eth_rxseg_split {
> >  	struct rte_mempool *mp; /**< Memory pool to allocate segment
> from. */
> >  	uint16_t length; /**< Segment data length, configures split point. */
> >  	uint16_t offset; /**< Data offset from beginning of mbuf data buffer.
> */
> > -	uint32_t reserved; /**< Reserved field. */
> > +	/**< Supported ptypes mask of a specific pmd, configures split point.
> */
> 
> The doxygen syntax is wrong: remove the "<" which is for post-comment.

Thanks for your catch.

> 
> > +	uint32_t proto_hdr;
> >  };
> 
> How do we know whether it is a length based or a protocol based buffer split?
> Is it based on checking for some zero value?

Yes, as Andrew suggests, we introduced the API rte_eth_supported_hdrs_get() in v9.
It reports the protocol headers the driver supports splitting on.
If the API returns -ENOTSUP, it means the driver only supports length based buffer split.

Of course, no matter which kind of buffer split it is, the application needs to check
the RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT offload first.
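
To make the flow concrete, a minimal sketch (the API name and its mask
semantics follow the v9 proposal and may still change; hdr_pool stands
for a mempool the application created beforehand):

	uint32_t ptypes = 0;
	struct rte_eth_rxseg_split seg = {0}; /* passed via rx_conf.rx_seg */
	int ret = rte_eth_supported_hdrs_get(port_id, &ptypes);

	if (ret == 0 && (ptypes & RTE_PTYPE_L4_UDP) != 0) {
		/* protocol header based split: leave length as 0 */
		seg.mp = hdr_pool;
		seg.proto_hdr = RTE_PTYPE_L4_UDP;
	} else if (ret == -ENOTSUP) {
		/* length based split only: leave proto_hdr as 0 */
		seg.mp = hdr_pool;
		seg.length = 64;
	}
	/* in both cases, RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT must be set in
	 * rx_conf.offloads before calling rte_eth_rx_queue_setup() */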

Thanks,
Xuan

> 


^ permalink raw reply	[flat|nested] 88+ messages in thread

* RE: [PATCH v9 0/4] add an api to support proto based buffer split
  2022-07-07  9:10     ` Thomas Monjalon
@ 2022-07-11 10:08       ` Ding, Xuan
  0 siblings, 0 replies; 88+ messages in thread
From: Ding, Xuan @ 2022-07-11 10:08 UTC (permalink / raw)
  To: Thomas Monjalon, Wu, WenxuanX
  Cc: andrew.rybchenko, dev, Li, Xiaoyun, ferruh.yigit, dev, Zhang,
	Yuying, Zhang, Qi Z, jerinjacobk, stephen, Zhang, Qi Z,
	Richardson, Bruce, Mcnamara, John

Hi,

> -----Original Message-----
> From: Thomas Monjalon <thomas@monjalon.net>
> Sent: Thursday, July 7, 2022 5:10 PM
> To: Wu, WenxuanX <wenxuanx.wu@intel.com>; Ding, Xuan
> <xuan.ding@intel.com>
> Cc: andrew.rybchenko@oktetlabs.ru; dev@dpdk.org; Li, Xiaoyun
> <xiaoyun.li@intel.com>; ferruh.yigit@xilinx.com; dev@dpdk.org; Zhang,
> Yuying <yuying.zhang@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>;
> jerinjacobk@gmail.com; stephen@networkplumber.org; Zhang, Qi Z
> <qi.z.zhang@intel.com>; Richardson, Bruce <bruce.richardson@intel.com>;
> Mcnamara, John <john.mcnamara@intel.com>
> Subject: Re: [PATCH v9 0/4] add an api to support proto based buffer split
> 
> 21/06/2022 10:56, Ding, Xuan:
> > This protocol based buffer split patch series has been updated to v9.
> > Sincerely, thank you for the effort you put into this series.
> >
> > We hope to know your thoughts on this series now.
> > Do you think it is possible to get it into 22.07? Or are there still some critical
> gaps that need to be solved?
> > We don't want the same thing to happen again in 22.11.
> 
> A quick comment: I think you should take better care of all the details.
> Precise explanations are very important.
> It is more encouraging to review when we see that the author tried hard to avoid
> any confusion or approximation.

Thanks a lot to all the reviewers for the effort on this series.
Although we tried to make the explanations and documentation detailed,
there must be some places that are not clearly explained or where the doc is not good enough.

We will do more self-checks and try to refine wherever it might not be clear.
Your comments are welcome.

Regards,
Xuan

> 
> 


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v9 2/4] ethdev: introduce protocol hdr based buffer split
  2022-07-11  9:54       ` Ding, Xuan
@ 2022-07-11 10:12         ` Thomas Monjalon
  0 siblings, 0 replies; 88+ messages in thread
From: Thomas Monjalon @ 2022-07-11 10:12 UTC (permalink / raw)
  To: Wu, WenxuanX, Ding, Xuan
  Cc: andrew.rybchenko, Li, Xiaoyun, ferruh.yigit, Singh, Aman Deep,
	dev, Zhang, Yuying, Zhang, Qi Z, jerinjacobk, stephen, Wu,
	WenxuanX, Wang, YuanX, Ray Kinsella

11/07/2022 11:54, Ding, Xuan:
> From: Thomas Monjalon <thomas@monjalon.net>
> > 13/06/2022 12:25, wenxuanx.wu@intel.com:
> > > --- a/lib/ethdev/rte_ethdev.h
> > > +++ b/lib/ethdev/rte_ethdev.h
> > > @@ -1176,6 +1176,9 @@ struct rte_eth_txmode {
> > >   *   specified in the first array element, the second buffer, from the
> > >   *   pool in the second element, and so on.
> > >   *
> > > + * - The proto_hdrs in the elements define the split position of
> > > + *   received packets.
> > > + *
> > >   * - The offsets from the segment description elements specify
> > >   *   the data offset from the buffer beginning except the first mbuf.
> > >   *   The first segment offset is added with RTE_PKTMBUF_HEADROOM.
> > > @@ -1197,12 +1200,21 @@ struct rte_eth_txmode {
> > >   *     - pool from the last valid element
> > >   *     - the buffer size from this pool
> > >   *     - zero offset
> > > + *
> > > + * - Length based buffer split:
> > > + *     - mp, length, offset should be configured.
> > > + *     - The proto_hdr field should not be configured.
> > > + *
> > > + * - Protocol header based buffer split:
> > > + *     - mp, offset, proto_hdr should be configured.
> > > + *     - The length field should not be configured.
> > >   */
> > >  struct rte_eth_rxseg_split {
> > >  	struct rte_mempool *mp; /**< Memory pool to allocate segment
> > from. */
> > >  	uint16_t length; /**< Segment data length, configures split point. */
> > >  	uint16_t offset; /**< Data offset from beginning of mbuf data buffer.
> > */
> > > -	uint32_t reserved; /**< Reserved field. */
> > > +	/**< Supported ptypes mask of a specific pmd, configures split point.
> > */
> > 
> > The doxygen syntax is wrong: remove the "<" which is for post-comment.
> 
> Thanks for your catch.
> 
> > 
> > > +	uint32_t proto_hdr;
> > >  };
> > 
> > How do we know whether it is a length based or a protocol based buffer split?
> > Is it based on checking for some zero value?
> 
> Yes, as Andrew suggests, we introduced the API rte_eth_supported_hdrs_get() in v9.
> It reports the protocol headers the driver supports splitting on.
> If the API returns -ENOTSUP, it means the driver only supports length based buffer split.
> 
> Of course, no matter which kind of buffer split it is, the application needs to check
> the RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT offload first.

So you need to talk about RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT in the comment of this struct.
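
Something along these lines perhaps, as draft wording only:

 /**
  * ...
  * Buffer split, in either mode, takes effect only when the
  * RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT offload is enabled in
  * rte_eth_rxconf.offloads for the Rx queue.
  */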




^ permalink raw reply	[flat|nested] 88+ messages in thread

* RE: [PATCH v9 2/4] ethdev: introduce protocol hdr based buffer split
  2022-07-08 15:00     ` Andrew Rybchenko
@ 2022-07-21  3:24       ` Ding, Xuan
  2022-08-01 14:28         ` Andrew Rybchenko
  0 siblings, 1 reply; 88+ messages in thread
From: Ding, Xuan @ 2022-07-21  3:24 UTC (permalink / raw)
  To: Andrew Rybchenko
  Cc: dev, stephen, Wang, YuanX, Ray Kinsella, Wu, WenxuanX, thomas,
	Li, Xiaoyun, ferruh.yigit, Singh, Aman Deep, Zhang, Yuying,
	Zhang, Qi Z, jerinjacobk, viacheslavo

Hi Andrew,

> -----Original Message-----
> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> Sent: 2022年7月8日 23:01
> To: Wu, WenxuanX <wenxuanx.wu@intel.com>; thomas@monjalon.net; Li,
> Xiaoyun <xiaoyun.li@intel.com>; ferruh.yigit@xilinx.com; Singh, Aman Deep
> <aman.deep.singh@intel.com>; dev@dpdk.org; Zhang, Yuying
> <yuying.zhang@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>;
> jerinjacobk@gmail.com
> Cc: stephen@networkplumber.org; Ding, Xuan <xuan.ding@intel.com>; Wang,
> YuanX <yuanx.wang@intel.com>; Ray Kinsella <mdr@ashroe.eu>
> Subject: Re: [PATCH v9 2/4] ethdev: introduce protocol hdr based buffer split
> 
> On 6/13/22 13:25, wenxuanx.wu@intel.com wrote:
> > From: Wenxuan Wu <wenxuanx.wu@intel.com>
> >
> > Currently, Rx buffer split supports length based split. With Rx queue
> > offload RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT enabled and Rx packet
> segment
> > configured, PMD will be able to split the received packets into
> > multiple segments.
> >
> >>> However, length based buffer split is not suitable for NICs that do
> >>> split based on protocol headers. Given an arbitrarily variable length
> >>> in an Rx packet segment, it is almost impossible to pass a fixed protocol
> >>> header to the driver. Besides, tunneling makes the composition of a
> >>> packet vary, which makes the situation even worse.
> >
> >>> This patch extends the current buffer split to support protocol header
> >>> based buffer split. A new proto_hdr field is introduced in the
> >>> reserved field of the rte_eth_rxseg_split structure to specify the
> >>> protocol header. The proto_hdr field defines the split position of a
> >>> packet; splitting will always happen after the protocol header defined
> >>> in the Rx packet segment. When the Rx queue offload
> >>> RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT is enabled and the corresponding
> >>> protocol header is configured, the driver will split the ingress
> >> packets into multiple segments.
> >
> > struct rte_eth_rxseg_split {
> >
> >          struct rte_mempool *mp; /* memory pools to allocate segment from */
> >          uint16_t length; /* segment maximal data length,
> >                              configures "split point" */
> >          uint16_t offset; /* data offset from beginning
> >                              of mbuf data buffer */
> >          uint32_t proto_hdr; /* inner/outer L2/L3/L4 protocol header,
> > 			       configures "split point" */
> 
> There is a big problem here: using the RTE_PTYPE_* defines, I can't request a split
> after either a TCP or a UDP header.

Sorry, for some reason I missed your reply.

The current RTE_PTYPE_* defines list all the tunnel and L2/L3/L4 protocol headers (both outer and inner).
Do you mean that we should support higher layer protocols above L4?

I think the tunnel and L2/L3/L4 protocol headers are enough.
In DPDK, we don't parse protocols above L4.
And there are far more higher layer protocols; we can't list all of them.
What do you think?

> 
> >      };
> >
> > Both inner and outer L2/L3/L4 level protocol header split can be
> > supported by a PMD. The corresponding protocol header capabilities are
> > RTE_PTYPE_L2_ETHER, RTE_PTYPE_L3_IPV4, RTE_PTYPE_L3_IPV6,
> > RTE_PTYPE_L4_TCP, RTE_PTYPE_L4_UDP, RTE_PTYPE_L4_SCTP,
> > RTE_PTYPE_INNER_L2_ETHER, RTE_PTYPE_INNER_L3_IPV4,
> > RTE_PTYPE_INNER_L3_IPV6, RTE_PTYPE_INNER_L4_TCP,
> RTE_PTYPE_INNER_L4_UDP, RTE_PTYPE_INNER_L4_SCTP.
> 
> I think there is no point in listing the above defines here if they are not the only
> supported defines.

Yes, since we use an API to return the protocol headers the driver supports splitting on,
there is no need to list an incomplete RTE_PTYPE_* set here. Please see the next version.

> 
> >
> > For example, let's suppose we configured the Rx queue with the
> > following segments:
> >      seg0 - pool0, proto_hdr0=RTE_PTYPE_L3_IPV4, off0=2B
> >      seg1 - pool1, proto_hdr1=RTE_PTYPE_L4_UDP, off1=128B
> >      seg2 - pool2, off1=0B
> >
> > The packet consists of MAC_IPV4_UDP_PAYLOAD will be split like
> > following:
> >      seg0 - ipv4 header @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from pool0
> >      seg1 - udp header @ 128 in mbuf from pool1
> >      seg2 - payload @ 0 in mbuf from pool2
> 
> Sorry, but I still see no definition of what should happen with, for example, an ARP
> packet with the above config.

Thanks. Because the following reply was not answered in v8,
the definition has not been added in v9 yet.

"
Our NIC only supports to split the packets into two segments,
so there will be an exact match for the only one protocol header configured. Back to this
question, for the set of proto_hdrs configured, it can have two behaviors:
1. The aggressive way is to split on longest match you mentioned, E.g. we configure split
on ETH-IPV4-TCP, when receives ETH-IPV4-UDP or ETH-IPV6, it can also split on ETH-IPV4
or ETH.
2. A more conservative way is to split only when the packets meet the all protocol headers
in the Rx packet segment. In the above situation, it will not do split for ETH-IPV4-UDP
and ETH-IPV6.

I prefer the second behavior, because the split is usually for the inner most header and
payload, if it does not meet, the rest of the headers have no actual value.
"

Hope to get your insights.
And we will update the doc to define the behavior in the next version.

> 
> >
> > Now buffer split can be configured in two modes. For length based
> > buffer split, the mp, length, and offset fields in the Rx packet segment
> > should be configured, while the proto_hdr field should not be configured.
> > For protocol header based buffer split, the mp, offset, and proto_hdr
> > fields in the Rx packet segment should be configured, while the length
> > field should not be configured.
> >
> > The split limitations imposed by the underlying driver are reported in the
> > rte_eth_dev_info->rx_seg_capa field. The memory attributes of the
> > split parts may also differ, e.g. DPDK memory versus external memory.
> >
> > Signed-off-by: Xuan Ding <xuan.ding@intel.com>
> > Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
> > Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
> > Reviewed-by: Qi Zhang <qi.z.zhang@intel.com>
> > Acked-by: Ray Kinsella <mdr@ashroe.eu>
> > ---
> >   lib/ethdev/rte_ethdev.c | 32 +++++++++++++++++++++++++++++++-
> >   lib/ethdev/rte_ethdev.h | 14 +++++++++++++-
> >   2 files changed, 44 insertions(+), 2 deletions(-)
> 
> Do we need a dedicated feature in doc/guides/nics/features.rst?
> Or should we just update buffer split to refer to the new supported header split API
> and callback?
> 
> Also, the feature definitely deserves an entry in the release notes.
 
Regarding the newly introduced protocol based buffer split, it is definitely worth a doc update.
The reason we didn't do it earlier is that the design was still under discussion.

Before we send a new version, we will put more effort into cleaning up the current patch
to make the doc more comprehensive and easier to understand.

Thanks,
Xuan

> 
> [snip]

^ permalink raw reply	[flat|nested] 88+ messages in thread

* RE: [PATCH v9 1/4] ethdev: introduce protocol header API
  2022-07-07  9:05     ` Thomas Monjalon
@ 2022-08-01  7:09       ` Wang, YuanX
  2022-08-01 10:01         ` Thomas Monjalon
  0 siblings, 1 reply; 88+ messages in thread
From: Wang, YuanX @ 2022-08-01  7:09 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: andrew.rybchenko, Li, Xiaoyun, ferruh.yigit, Singh, Aman Deep,
	dev, Zhang, Yuying, Zhang, Qi Z, jerinjacobk, stephen, Wu,
	WenxuanX, Ding, Xuan

Hi Thomas,

Sorry it took so long to respond to your email.

> -----Original Message-----
> From: Thomas Monjalon <thomas@monjalon.net>
> Sent: Thursday, July 7, 2022 5:05 PM
> To: Wu, WenxuanX <wenxuanx.wu@intel.com>
> Cc: andrew.rybchenko@oktetlabs.ru; Li, Xiaoyun <xiaoyun.li@intel.com>;
> ferruh.yigit@xilinx.com; Singh, Aman Deep <aman.deep.singh@intel.com>;
> dev@dpdk.org; Zhang, Yuying <yuying.zhang@intel.com>; Zhang, Qi Z
> <qi.z.zhang@intel.com>; jerinjacobk@gmail.com;
> stephen@networkplumber.org
> Subject: Re: [PATCH v9 1/4] ethdev: introduce protocol header API
> 
> 13/06/2022 12:25, wenxuanx.wu@intel.com:
> > From: Wenxuan Wu <wenxuanx.wu@intel.com>
> >
> > This patch added new ethdev API to retrieve supported protocol header
> > mask of a PMD, which helps to configure protocol header based buffer split.
> >
> > Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
> > ---
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change without prior notice
> > + *
> > + * Get supported header protocols to split supported by PMD.
> > + * The API will return error if the device is not valid.
> > + *
> > + * @param port_id
> > + *   The port identifier of the device.
> > + * @param ptype
> > + *   Supported protocol headers of driver.
> 
> It doesn't say where to find the types.
> Please give the prefix.

Sorry, I didn't catch your point. Are you saying that the ptype should be composed of RTE_PTYPE_* values?
Could you explain it in more detail?

> 
> > + * @return
> > + *   - (-ENOTSUP) if header protocol is not supported by device.
> > + *   - (-ENODEV) if *port_id* invalid.
> > + *   - (-EIO) if device is removed.
> > + *   - (0) on success.
> > + */
> > +__rte_experimental
> > +int rte_eth_supported_hdrs_get(uint16_t port_id,
> > +		uint32_t *ptype);
> 
> The function name is not precise enough.
> There should be the word "split" in its name.

Thanks for the suggestion, it will be revised in the next version.

Thanks,
Yuan
> 



^ permalink raw reply	[flat|nested] 88+ messages in thread

* RE: [PATCH v9 1/4] ethdev: introduce protocol header API
  2022-07-08 15:00     ` Andrew Rybchenko
@ 2022-08-01  7:17       ` Wang, YuanX
  0 siblings, 0 replies; 88+ messages in thread
From: Wang, YuanX @ 2022-08-01  7:17 UTC (permalink / raw)
  To: Andrew Rybchenko, Wu, WenxuanX, thomas, Li, Xiaoyun,
	ferruh.yigit, Singh, Aman Deep, dev, Zhang, Yuying, Zhang, Qi Z,
	jerinjacobk, Ding, Xuan
  Cc: stephen

Hi Andrew,

Apologies for the delay in getting back to you.

> -----Original Message-----
> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> Sent: Friday, July 8, 2022 11:01 PM
> To: Wu, WenxuanX <wenxuanx.wu@intel.com>; thomas@monjalon.net; Li,
> Xiaoyun <xiaoyun.li@intel.com>; ferruh.yigit@xilinx.com; Singh, Aman Deep
> <aman.deep.singh@intel.com>; dev@dpdk.org; Zhang, Yuying
> <yuying.zhang@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>;
> jerinjacobk@gmail.com
> Cc: stephen@networkplumber.org
> Subject: Re: [PATCH v9 1/4] ethdev: introduce protocol header API
> 
> On 6/13/22 13:25, wenxuanx.wu@intel.com wrote:
> > From: Wenxuan Wu <wenxuanx.wu@intel.com>
> >
> > This patch added new ethdev API to retrieve supported protocol header
> > mask
> 
> This patch added -> Add

Thanks for your catch, will fix in the next version.

> 
> > of a PMD, which helps to configure protocol header based buffer split.
> 
> I'd like to see the motivation for why a single mask is considered sufficient.
> I.e. why don't we follow the ptypes approach, which is more flexible but a bit
> more complicated.
> 
> Looking at the RTE_PTYPE_* defines carefully, it looks like the below API simply
> cannot convey the information that we can split after either TCP or UDP.

As Xuan replied on patch 2, we think RTE_PTYPE_* may be enough.
Any insights are welcome.

> 
> >
> > Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
> 
> [snip]
> 
> >   /**
> >    * @internal
> >    * Dump private info from device to a file.
> > @@ -1281,6 +1296,9 @@ struct eth_dev_ops {
> >   	/** Set IP reassembly configuration */
> >   	eth_ip_reassembly_conf_set_t ip_reassembly_conf_set;
> >
> > +	/** Get supported ptypes to split */
> > +	eth_buffer_split_hdr_ptype_get_t hdrs_supported_ptypes_get;
> > +
> 
> It is better to be consistent with naming. I.e. just cut the prefix "eth_"
> and the suffix "_t".
> 
> Also the type name sounds like it gets the current split configuration, not
> the supported one.

Thank you for your suggestion, will fix in the next version.

> 
> >   	/** Dump private info from device */
> >   	eth_dev_priv_dump_t eth_dev_priv_dump;
> >   };
> > diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c index
> > 29a3d80466..e1f2a0ffe3 100644
> > --- a/lib/ethdev/rte_ethdev.c
> > +++ b/lib/ethdev/rte_ethdev.c
> > @@ -1636,9 +1636,10 @@ rte_eth_dev_is_removed(uint16_t port_id)
> >   }
> >
> >   static int
> > -rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
> > -			     uint16_t n_seg, uint32_t *mbp_buf_size,
> > -			     const struct rte_eth_dev_info *dev_info)
> > +rte_eth_rx_queue_check_split(uint16_t port_id,
> > +				const struct rte_eth_rxseg_split *rx_seg,
> > +				int16_t n_seg, uint32_t *mbp_buf_size,
> > +			    const struct rte_eth_dev_info *dev_info)
> >   {
> >   	const struct rte_eth_rxseg_capa *seg_capa = &dev_info-
> >rx_seg_capa;
> >   	struct rte_mempool *mp_first;
> > @@ -1694,13 +1695,7 @@ rte_eth_rx_queue_check_split(const struct
> rte_eth_rxseg_split *rx_seg,
> >   		}
> >   		offset += seg_idx != 0 ? 0 : RTE_PKTMBUF_HEADROOM;
> >   		*mbp_buf_size = rte_pktmbuf_data_room_size(mpl);
> > -		length = length != 0 ? length : *mbp_buf_size;
> > -		if (*mbp_buf_size < length + offset) {
> 
> I don't understand why the check goes away completely.

Thanks for your catch; it should be in patch 2, will fix in the next version.

> 
> > -			RTE_ETHDEV_LOG(ERR,
> > -				       "%s mbuf_data_room_size %u < %u
> (segment length=%u + segment offset=%u)\n",
> > -				       mpl->name, *mbp_buf_size,
> > -				       length + offset, length, offset);
> > -			return -EINVAL;
> > +
> 
> Unnecessary empty line
> 
> >   		}
> 
> Shouldn't the curly bracket go away as well, together with its 'if'?

Thanks for your catch, will fix in the next version.

> 
> >   	}
> >   	return 0;
> > @@ -1779,7 +1774,7 @@ rte_eth_rx_queue_setup(uint16_t port_id,
> uint16_t rx_queue_id,
> >   		n_seg = rx_conf->rx_nseg;
> >
> >   		if (rx_conf->offloads &
> RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) {
> > -			ret = rte_eth_rx_queue_check_split(rx_seg, n_seg,
> > +			ret = rte_eth_rx_queue_check_split(port_id, rx_seg,
> n_seg,
> >   							   &mbp_buf_size,
> >   							   &dev_info);
> >   			if (ret != 0)
> > @@ -5844,6 +5839,20 @@ rte_eth_ip_reassembly_conf_set(uint16_t
> port_id,
> >   		       (*dev->dev_ops->ip_reassembly_conf_set)(dev, conf));
> >   }
> >
> > +int
> > +rte_eth_supported_hdrs_get(uint16_t port_id, uint32_t *ptypes) {
> > +	struct rte_eth_dev *dev;
> > +	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
> > +	dev = &rte_eth_devices[port_id];
> 
> ptypes must be checked vs NULL

Thanks for your catch, will fix in the next version.

> 
> > +
> > +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops-
> >hdrs_supported_ptypes_get,
> > +				-ENOTSUP);
> > +
> > +	return eth_err(port_id,
> > +		       (*dev->dev_ops->hdrs_supported_ptypes_get)(dev,
> ptypes)); }
> > +
> >   int
> >   rte_eth_dev_priv_dump(uint16_t port_id, FILE *file)
> >   {
> > diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h index
> > 04cff8ee10..72cac1518e 100644
> > --- a/lib/ethdev/rte_ethdev.h
> > +++ b/lib/ethdev/rte_ethdev.h
> > @@ -6152,6 +6152,28 @@ rte_eth_tx_buffer(uint16_t port_id, uint16_t
> queue_id,
> >   	return rte_eth_tx_buffer_flush(port_id, queue_id, buffer);
> >   }
> >
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change without prior notice
> > + *
> > + * Get supported header protocols to split supported by PMD.
> 
> "supported" twice above.
> Get supported header protocols to split on Rx.

Thank you for your suggestion, will fix in the next version.

> 
> > + * The API will return error if the device is not valid.
> 
> Above sentence is obvious and does not add any value. Please, remove.
> 
> > + *
> > + * @param port_id
> > + *   The port identifier of the device.
> > + * @param ptype
> 
> Why do you use the out annotation for the callback description but do not use
> it here?

Thank you for your suggestion, will fix in the next version.

> 
> > + *   Supported protocol headers of driver.
> > + * @return
> > + *   - (-ENOTSUP) if header protocol is not supported by device.
> > + *   - (-ENODEV) if *port_id* invalid.
> 
> EINVAL in the case of invalid ptypes argument

Thank you for your suggestion, will fix in the next version.

> 
> > + *   - (-EIO) if device is removed.
> > + *   - (0) on success.
> > + */
> > +__rte_experimental
> > +int rte_eth_supported_hdrs_get(uint16_t port_id,
> > +		uint32_t *ptype);
> > +
> >   #ifdef __cplusplus
> >   }
> >   #endif
> > diff --git a/lib/ethdev/version.map b/lib/ethdev/version.map index
> > 20391ab29e..7705c0364a 100644
> > --- a/lib/ethdev/version.map
> > +++ b/lib/ethdev/version.map
> > @@ -279,6 +279,9 @@ EXPERIMENTAL {
> >   	rte_flow_async_action_handle_create;
> >   	rte_flow_async_action_handle_destroy;
> >   	rte_flow_async_action_handle_update;
> > +
> > +	# added in 22.07
> 
> It hopefully will be in 22.11

Sure, it should be targeted for 22.11.

Thanks,
Yuan

> 
> > +	rte_eth_supported_hdrs_get;
> >   };
> >
> >   INTERNAL {


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v9 1/4] ethdev: introduce protocol header API
  2022-08-01  7:09       ` Wang, YuanX
@ 2022-08-01 10:01         ` Thomas Monjalon
  2022-08-02 10:12           ` Wang, YuanX
  0 siblings, 1 reply; 88+ messages in thread
From: Thomas Monjalon @ 2022-08-01 10:01 UTC (permalink / raw)
  To: Wang, YuanX
  Cc: andrew.rybchenko, Li, Xiaoyun, ferruh.yigit, Singh, Aman Deep,
	dev, Zhang, Yuying, Zhang, Qi Z, jerinjacobk, stephen, Wu,
	WenxuanX, Ding, Xuan

01/08/2022 09:09, Wang, YuanX:
> Hi Thomas,
> 
> Sorry it took so long to respond to your email.
> 
> From: Thomas Monjalon <thomas@monjalon.net>
> > 13/06/2022 12:25, wenxuanx.wu@intel.com:
> > > From: Wenxuan Wu <wenxuanx.wu@intel.com>
> > >
> > > This patch added new ethdev API to retrieve supported protocol header
> > > mask of a PMD, which helps to configure protocol header based buffer split.
> > >
> > > Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
> > > ---
> > > +/**
> > > + * @warning
> > > + * @b EXPERIMENTAL: this API may change without prior notice
> > > + *
> > > + * Get supported header protocols to split supported by PMD.
> > > + * The API will return error if the device is not valid.
> > > + *
> > > + * @param port_id
> > > + *   The port identifier of the device.
> > > + * @param ptype
> > > + *   Supported protocol headers of driver.
> > 
> > It doesn't say where to find the types.
> > Please give the prefix.
> 
> Sorry, I didn't catch your point. Are you saying that the ptype should be composed of RTE_PTYPE_* values?
> Could you explain it in more detail?

Yes, just give the user the required info to use the function.
If ptype must be composed of RTE_PTYPE_* values, it must be said.

Thanks



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v9 2/4] ethdev: introduce protocol hdr based buffer split
  2022-07-21  3:24       ` Ding, Xuan
@ 2022-08-01 14:28         ` Andrew Rybchenko
  2022-08-02  7:22           ` Ding, Xuan
  0 siblings, 1 reply; 88+ messages in thread
From: Andrew Rybchenko @ 2022-08-01 14:28 UTC (permalink / raw)
  To: Ding, Xuan
  Cc: dev, stephen, Wang, YuanX, Ray Kinsella, Wu, WenxuanX, thomas,
	Li, Xiaoyun, ferruh.yigit, Singh, Aman Deep, Zhang, Yuying,
	Zhang, Qi Z, jerinjacobk, viacheslavo

On 7/21/22 06:24, Ding, Xuan wrote:
> Hi Andrew,
> 
>> -----Original Message-----
>> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
>> Sent: 2022年7月8日 23:01
>> To: Wu, WenxuanX <wenxuanx.wu@intel.com>; thomas@monjalon.net; Li,
>> Xiaoyun <xiaoyun.li@intel.com>; ferruh.yigit@xilinx.com; Singh, Aman Deep
>> <aman.deep.singh@intel.com>; dev@dpdk.org; Zhang, Yuying
>> <yuying.zhang@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>;
>> jerinjacobk@gmail.com
>> Cc: stephen@networkplumber.org; Ding, Xuan <xuan.ding@intel.com>; Wang,
>> YuanX <yuanx.wang@intel.com>; Ray Kinsella <mdr@ashroe.eu>
>> Subject: Re: [PATCH v9 2/4] ethdev: introduce protocol hdr based buffer split
>>
>> On 6/13/22 13:25, wenxuanx.wu@intel.com wrote:
>>> From: Wenxuan Wu <wenxuanx.wu@intel.com>
>>>
>>> Currently, Rx buffer split supports length based split. With Rx queue
>>> offload RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT enabled and Rx packet
>> segment
>>> configured, PMD will be able to split the received packets into
>>> multiple segments.
>>>
>>> However, length based buffer split is not suitable for NICs that do
>>> split based on protocol headers. Given an arbitrarily variable length
>>> in an Rx packet segment, it is almost impossible to pass a fixed protocol
>>> header to the driver. Besides, tunneling makes the composition of a
>>> packet vary, which makes the situation even worse.
>>>
>>> This patch extends the current buffer split to support protocol header
>>> based buffer split. A new proto_hdr field is introduced in the
>>> reserved field of the rte_eth_rxseg_split structure to specify the
>>> protocol header. The proto_hdr field defines the split position of a
>>> packet; splitting will always happen after the protocol header defined
>>> in the Rx packet segment. When the Rx queue offload
>>> RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT is enabled and the corresponding
>>> protocol header is configured, the driver will split the ingress
>> packets into multiple segments.
>>>
>>> struct rte_eth_rxseg_split {
>>>
>>>           struct rte_mempool *mp; /* memory pools to allocate segment from */
>>>           uint16_t length; /* segment maximal data length,
>>>                               configures "split point" */
>>>           uint16_t offset; /* data offset from beginning
>>>                               of mbuf data buffer */
>>>           uint32_t proto_hdr; /* inner/outer L2/L3/L4 protocol header,
>>> 			       configures "split point" */
>>
>> There is a big problem here: using the RTE_PTYPE_* defines, I can't request a split
>> after either a TCP or a UDP header.
> 
> Sorry, for some reason I missed your reply.
> 
> The current RTE_PTYPE_* defines list all the tunnel and L2/L3/L4 protocol headers (both outer and inner).
> Do you mean that we should support higher layer protocols above L4?
> 
> I think the tunnel and L2/L3/L4 protocol headers are enough.
> In DPDK, we don't parse protocols above L4.
> And there are far more higher layer protocols; we can't list all of them.
> What do you think?

It looks like you don't get my point. You simply cannot say:
RTE_PTYPE_L4_TCP | RTE_PTYPE_L4_UDP, since it is numerically equal to
RTE_PTYPE_L4_FRAG. Maybe the design limitation is acceptable.
I have no strong opinion, but it must be clear to everyone that the
limitation exists.
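
For reference, the L4 part of the packet type is an enumeration, not a
bit mask; in rte_mbuf_ptype.h:

 #define RTE_PTYPE_L4_TCP  0x00000100
 #define RTE_PTYPE_L4_UDP  0x00000200
 #define RTE_PTYPE_L4_FRAG 0x00000300 /* == RTE_PTYPE_L4_TCP | RTE_PTYPE_L4_UDP */

So OR-ing two L4 values silently yields a third, different ptype.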

>>
>>>       };
>>>
>>> Both inner and outer L2/L3/L4 level protocol header split can be
>>> supported by a PMD. The corresponding protocol header capabilities are
>>> RTE_PTYPE_L2_ETHER, RTE_PTYPE_L3_IPV4, RTE_PTYPE_L3_IPV6,
>>> RTE_PTYPE_L4_TCP, RTE_PTYPE_L4_UDP, RTE_PTYPE_L4_SCTP,
>>> RTE_PTYPE_INNER_L2_ETHER, RTE_PTYPE_INNER_L3_IPV4,
>>> RTE_PTYPE_INNER_L3_IPV6, RTE_PTYPE_INNER_L4_TCP,
>> RTE_PTYPE_INNER_L4_UDP, RTE_PTYPE_INNER_L4_SCTP.
>>
>> I think there is no point in listing the above defines here if they are not the only
>> supported defines.
> 
> Yes, since we use an API to return the protocol headers the driver supports splitting on,
> there is no need to list an incomplete RTE_PTYPE_* set here. Please see the next version.
> 
>>
>>>
>>> For example, let's suppose we configured the Rx queue with the
>>> following segments:
>>>       seg0 - pool0, proto_hdr0=RTE_PTYPE_L3_IPV4, off0=2B
>>>       seg1 - pool1, proto_hdr1=RTE_PTYPE_L4_UDP, off1=128B
>>>       seg2 - pool2, off1=0B
>>>
>>> The packet consists of MAC_IPV4_UDP_PAYLOAD will be split like
>>> following:
>>>       seg0 - ipv4 header @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from pool0
>>>       seg1 - udp header @ 128 in mbuf from pool1
>>>       seg2 - payload @ 0 in mbuf from pool2
>>
>> Sorry, but I still see no definition of what should happen with, for example, an ARP
>> packet with the above config.
> 
> Thanks. Because the following reply was not answered in v8,
> the definition has not been added in v9 yet.
> 
> "
> Our NIC only supports splitting the packets into two segments,
> so there will be an exact match for the single protocol header configured. Back to this
> question, for the set of proto_hdrs configured, there can be two behaviors:
> 1. The aggressive way is to split on the longest match, as you mentioned. E.g. if we
> configure a split on ETH-IPV4-TCP and receive ETH-IPV4-UDP or ETH-IPV6, it can still
> split on ETH-IPV4 or ETH.
> 2. A more conservative way is to split only when the packet matches all the protocol
> headers in the Rx packet segments. In the above situation, it will not split
> ETH-IPV4-UDP or ETH-IPV6 at all.
> 
> I prefer the second behavior, because the split is usually for the innermost header and
> payload; if they do not match, the rest of the headers have no actual value.
> "
> 
> Hope to get your insights.
> And we will update the doc to define the behavior in the next version.

I'm OK with (2) as well. Please define it in the documentation. Also, it
must be clear which segment/mempool is used if a packet is not split.
^ permalink raw reply	[flat|nested] 88+ messages in thread

* RE: [PATCH v9 2/4] ethdev: introduce protocol hdr based buffer split
  2022-08-01 14:28         ` Andrew Rybchenko
@ 2022-08-02  7:22           ` Ding, Xuan
  0 siblings, 0 replies; 88+ messages in thread
From: Ding, Xuan @ 2022-08-02  7:22 UTC (permalink / raw)
  To: Andrew Rybchenko
  Cc: dev, stephen, Wang, YuanX, Ray Kinsella, Wu, WenxuanX, thomas,
	Li, Xiaoyun, ferruh.yigit, Singh, Aman Deep, Zhang, Yuying,
	Zhang, Qi Z, jerinjacobk, viacheslavo

Hi,

> -----Original Message-----
> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> Sent: Monday, August 1, 2022 10:28 PM
> To: Ding, Xuan <xuan.ding@intel.com>
> Cc: dev@dpdk.org; stephen@networkplumber.org; Wang, YuanX
> <yuanx.wang@intel.com>; Ray Kinsella <mdr@ashroe.eu>; Wu, WenxuanX
> <wenxuanx.wu@intel.com>; thomas@monjalon.net; Li, Xiaoyun
> <xiaoyun.li@intel.com>; ferruh.yigit@xilinx.com; Singh, Aman Deep
> <aman.deep.singh@intel.com>; Zhang, Yuying <yuying.zhang@intel.com>;
> Zhang, Qi Z <qi.z.zhang@intel.com>; jerinjacobk@gmail.com;
> viacheslavo@nvidia.com
> Subject: Re: [PATCH v9 2/4] ethdev: introduce protocol hdr based buffer split
> 
> On 7/21/22 06:24, Ding, Xuan wrote:
> > Hi Andrew,
> >
> >> -----Original Message-----
> >> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> >> Sent: 2022年7月8日 23:01
> >> To: Wu, WenxuanX <wenxuanx.wu@intel.com>; thomas@monjalon.net;
> Li,
> >> Xiaoyun <xiaoyun.li@intel.com>; ferruh.yigit@xilinx.com; Singh, Aman
> >> Deep <aman.deep.singh@intel.com>; dev@dpdk.org; Zhang, Yuying
> >> <yuying.zhang@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>;
> >> jerinjacobk@gmail.com
> >> Cc: stephen@networkplumber.org; Ding, Xuan <xuan.ding@intel.com>;
> >> Wang, YuanX <yuanx.wang@intel.com>; Ray Kinsella <mdr@ashroe.eu>
> >> Subject: Re: [PATCH v9 2/4] ethdev: introduce protocol hdr based
> >> buffer split
> >>
> >> On 6/13/22 13:25, wenxuanx.wu@intel.com wrote:
> >>> From: Wenxuan Wu <wenxuanx.wu@intel.com>
> >>>
> >>> Currently, Rx buffer split supports length based split. With Rx
> >>> queue offload RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT enabled and Rx
> packet
> >> segment
> >>> configured, PMD will be able to split the received packets into
> >>> multiple segments.
> >>>
> >>> However, length based buffer split is not suitable for NICs that do
> >>> split based on protocol headers. Given an arbitrarily variable
> >>> length in an Rx packet segment, it is almost impossible to pass a fixed
> >>> protocol header to the driver. Besides, tunneling makes the
> >>> composition of a packet vary, which makes the
> situation even worse.
> >>>
> >>> This patch extends the current buffer split to support protocol header
> >>> based buffer split. A new proto_hdr field is introduced in the
> >>> reserved field of the rte_eth_rxseg_split structure to specify the
> >>> protocol header. The proto_hdr field defines the split position of a
> >>> packet; splitting will always happen after the protocol header defined
> >>> in the Rx packet segment. When the Rx queue offload
> >>> RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT is enabled and the corresponding
> >>> protocol header is configured, the driver will split the ingress packets
> >>> into multiple
> >> segments.
> >>>
> >>> struct rte_eth_rxseg_split {
> >>>
> >>>           struct rte_mempool *mp; /* memory pools to allocate segment
> from */
> >>>           uint16_t length; /* segment maximal data length,
> >>>                               configures "split point" */
> >>>           uint16_t offset; /* data offset from beginning
> >>>                               of mbuf data buffer */
> >>>           uint32_t proto_hdr; /* inner/outer L2/L3/L4 protocol header,
> >>> 			       configures "split point" */
> >>
> >> There is a big problem here: using the RTE_PTYPE_* defines, I can't
> >> request a split after either a TCP or a UDP header.
> >
> > Sorry, for some reason I missed your reply.
> >
> > The current RTE_PTYPE_* defines list all the tunnel and L2/L3/L4 protocol headers
> (both outer and inner).
> > Do you mean that we should support higher layer protocols above L4?
> >
> > I think the tunnel and L2/L3/L4 protocol headers are enough.
> > In DPDK, we don't parse protocols above L4.
> > And there are far more higher layer protocols; we can't list all of them.
> > What do you think?
> 
> It looks like you don't get my point. You simply cannot say:
> RTE_PTYPE_L4_TCP | RTE_PTYPE_L4_UDP, since it is numerically equal to
> RTE_PTYPE_L4_FRAG. Maybe the design limitation is acceptable.
> I have no strong opinion, but it must be clear to everyone that the limitation exists.

Thanks for your correction.
Similarly, RTE_PTYPE_INNER_L4_TCP | RTE_PTYPE_INNER_L4_UDP is numerically
equal to RTE_PTYPE_INNER_L4_FRAG, so the same situation exists there.

I will try to solve this limitation by following the ptypes_get approach.
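
That is, returning an array of individual ptypes rather than a single
OR-ed mask, similar in shape to rte_eth_dev_get_supported_ptypes().
A rough sketch only; the final name and signature are still open:

	/* fill ptypes[] (up to num entries) with the protocol headers
	 * after which the driver can split; return the total count */
	int rte_eth_buffer_split_get_supported_hdr_ptypes(uint16_t port_id,
			uint32_t *ptypes, int num);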

> 
> >>
> >>>       };
> >>>
> >>> Both inner and outer L2/L3/L4 level protocol header split can be
> >>> supported by a PMD. The corresponding protocol header capabilities are
> >>> RTE_PTYPE_L2_ETHER, RTE_PTYPE_L3_IPV4, RTE_PTYPE_L3_IPV6,
> >>> RTE_PTYPE_L4_TCP, RTE_PTYPE_L4_UDP, RTE_PTYPE_L4_SCTP,
> >>> RTE_PTYPE_INNER_L2_ETHER, RTE_PTYPE_INNER_L3_IPV4,
> >>> RTE_PTYPE_INNER_L3_IPV6, RTE_PTYPE_INNER_L4_TCP,
> >> RTE_PTYPE_INNER_L4_UDP, RTE_PTYPE_INNER_L4_SCTP.
> >>
> >> I think there is no point in listing the above defines here if they are
> >> not the only supported defines.
> >
> > Yes, since we use an API to return the protocol headers the driver supports
> > splitting on, there is no need to list an incomplete RTE_PTYPE_* set here. Please
> see the next version.
> >
> >>
> >>>
> >>> For example, let's suppose we configured the Rx queue with the
> >>> following segments:
> >>>       seg0 - pool0, proto_hdr0=RTE_PTYPE_L3_IPV4, off0=2B
> >>>       seg1 - pool1, proto_hdr1=RTE_PTYPE_L4_UDP, off1=128B
> >>>       seg2 - pool2, off1=0B
> >>>
> >>> The packet consists of MAC_IPV4_UDP_PAYLOAD will be split like
> >>> following:
> >>>       seg0 - ipv4 header @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from
> pool0
> >>>       seg1 - udp header @ 128 in mbuf from pool1
> >>>       seg2 - payload @ 0 in mbuf from pool2
> >>
> >> Sorry, but I still see no definition of what should happen with, for
> >> example, an ARP packet with the above config.
> >
> > Thanks. Because the following reply was not answered in v8, the
> > definition has not been added in v9 yet.
> >
> > "
> > Our NIC only supports splitting the packets into two segments, so there
> > will be an exact match for the single protocol header configured.
> > Back to this question, for the set of proto_hdrs configured, there can be two
> behaviors:
> > 1. The aggressive way is to split on the longest match, as you mentioned. E.g.
> > if we configure a split on ETH-IPV4-TCP and receive ETH-IPV4-UDP or
> > ETH-IPV6, it can still split on ETH-IPV4 or ETH.
> > 2. A more conservative way is to split only when the packet matches all
> > the protocol headers in the Rx packet segments. In the above situation,
> > it will not split ETH-IPV4-UDP or ETH-IPV6 at all.
> >
> > I prefer the second behavior, because the split is usually for the
> > innermost header and payload; if they do not match, the rest of the headers
> have no actual value.
> > "
> >
> > Hope to get your insights.
> > And we will update the doc to define the behavior in the next version.
> 
> I'm OK with (2) as well. Please define it in the documentation. Also, it must
> be clear which segment/mempool is used if a packet is not split.

Got your point. Will fix it in the next version.

Thanks,
Xuan

^ permalink raw reply	[flat|nested] 88+ messages in thread

* RE: [PATCH v9 1/4] ethdev: introduce protocol header API
  2022-08-01 10:01         ` Thomas Monjalon
@ 2022-08-02 10:12           ` Wang, YuanX
  0 siblings, 0 replies; 88+ messages in thread
From: Wang, YuanX @ 2022-08-02 10:12 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: andrew.rybchenko, Li, Xiaoyun, ferruh.yigit, Singh, Aman Deep,
	dev, Zhang, Yuying, Zhang, Qi Z, jerinjacobk, stephen, Wu,
	WenxuanX, Ding, Xuan

Hi Thomas,

> -----Original Message-----
> From: Thomas Monjalon <thomas@monjalon.net>
> Sent: Monday, August 1, 2022 6:01 PM
> To: Wang, YuanX <yuanx.wang@intel.com>
> Cc: andrew.rybchenko@oktetlabs.ru; Li, Xiaoyun <xiaoyun.li@intel.com>;
> ferruh.yigit@xilinx.com; Singh, Aman Deep <aman.deep.singh@intel.com>;
> dev@dpdk.org; Zhang, Yuying <yuying.zhang@intel.com>; Zhang, Qi Z
> <qi.z.zhang@intel.com>; jerinjacobk@gmail.com;
> stephen@networkplumber.org; Wu, WenxuanX <wenxuanx.wu@intel.com>;
> Ding, Xuan <xuan.ding@intel.com>
> Subject: Re: [PATCH v9 1/4] ethdev: introduce protocol header API
> 
> 01/08/2022 09:09, Wang, YuanX:
> > Hi Thomas,
> >
> > Sorry it took so long to respond to your email.
> >
> > From: Thomas Monjalon <thomas@monjalon.net>
> > > 13/06/2022 12:25, wenxuanx.wu@intel.com:
> > > > From: Wenxuan Wu <wenxuanx.wu@intel.com>
> > > >
> > > > This patch added new ethdev API to retrieve supported protocol
> > > > header mask of a PMD, which helps to configure protocol header based
> buffer split.
> > > >
> > > > Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
> > > > ---
> > > > +/**
> > > > + * @warning
> > > > + * @b EXPERIMENTAL: this API may change without prior notice
> > > > + *
> > > > + * Get supported header protocols to split supported by PMD.
> > > > + * The API will return error if the device is not valid.
> > > > + *
> > > > + * @param port_id
> > > > + *   The port identifier of the device.
> > > > + * @param ptype
> > > > + *   Supported protocol headers of driver.
> > >
> > > It doesn't say where to find the types.
> > > Please give the prefix.
> >
> > Sorry, I didn't catch your point. Are you saying that the ptype should be
> composed of RTE_PTYPE_* values?
> > Could you explain it in more detail?
> 
> Yes, just give the user the required info to use the function.
> If ptype must be composed of RTE_PTYPE_* values, it must be said.

Thanks for your explanation, will fix in the next version.
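
For example, the parameter description could become something like the
following (draft wording only):

 * @param ptypes
 *   The protocol headers that the driver can split after, composed of
 *   RTE_PTYPE_* flags, e.g. (RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4).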

Thanks,
Yuan

> 
> Thanks
> 


^ permalink raw reply	[flat|nested] 88+ messages in thread

end of thread, other threads:[~2022-08-02 10:12 UTC | newest]

Thread overview: 88+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-03-03  6:01 [RFC] ethdev: introduce protocol type based header split xuan.ding
2022-03-03  8:55 ` Thomas Monjalon
2022-03-08  7:48   ` Ding, Xuan
2022-03-03 16:15 ` Stephen Hemminger
2022-03-04  9:58   ` Zhang, Qi Z
2022-03-04 11:54     ` Morten Brørup
2022-03-04 17:32     ` Stephen Hemminger
2022-03-22  3:56 ` [RFC,v2 0/3] " xuan.ding
2022-03-22  3:56   ` [RFC,v2 1/3] " xuan.ding
2022-03-22  7:14     ` Zhang, Qi Z
2022-03-22  7:43       ` Ding, Xuan
2022-03-22  3:56   ` [RFC,v2 2/3] app/testpmd: add header split configuration xuan.ding
2022-03-22  3:56   ` [RFC,v2 3/3] net/ice: support header split in Rx data path xuan.ding
2022-03-29  6:49 ` [RFC,v3 0/3] ethdev: introduce protocol type based header split xuan.ding
2022-03-29  6:49   ` [RFC,v3 1/3] " xuan.ding
2022-03-29  7:56     ` Zhang, Qi Z
2022-03-29  8:18       ` Ding, Xuan
2022-03-29  6:49   ` [RFC,v3 2/3] app/testpmd: add header split configuration xuan.ding
2022-03-29  6:49   ` [RFC,v3 3/3] net/ice: support header split in Rx data path xuan.ding
2022-04-02 10:41 ` [v4 0/3] ethdev: introduce protocol type based header split wenxuanx.wu
2022-04-02 10:41   ` [v4 1/3] " wenxuanx.wu
2022-04-07 10:47     ` Andrew Rybchenko
2022-04-12 16:15       ` Ding, Xuan
2022-04-20 15:48         ` Andrew Rybchenko
2022-04-25 14:57           ` Ding, Xuan
2022-04-21 10:27         ` Thomas Monjalon
2022-04-25 15:05           ` Ding, Xuan
2022-04-07 13:26     ` Jerin Jacob
2022-04-12 16:40       ` Ding, Xuan
2022-04-20 14:39         ` Andrew Rybchenko
2022-04-21 10:36           ` Thomas Monjalon
2022-04-25  9:23           ` Ding, Xuan
2022-04-26 11:13     ` [PATCH v5 0/3] ethdev: introduce protocol based buffer split wenxuanx.wu
2022-04-26 11:13       ` [PATCH v5 1/4] lib/ethdev: introduce protocol type " wenxuanx.wu
2022-05-17 21:12         ` Thomas Monjalon
2022-05-19 14:40           ` Ding, Xuan
2022-05-26 14:58             ` Ding, Xuan
2022-04-26 11:13       ` [PATCH v5 2/4] app/testpmd: add proto based buffer split config wenxuanx.wu
2022-04-26 11:13       ` [PATCH v5 3/4] net/ice: support proto based buf split in Rx path wenxuanx.wu
2022-04-02 10:41   ` [v4 2/3] app/testpmd: add header split configuration wenxuanx.wu
2022-04-02 10:41   ` [v4 3/3] net/ice: support header split in Rx data path wenxuanx.wu
2022-05-27  7:54 ` [PATCH v6] ethdev: introduce protocol header based buffer split xuan.ding
2022-05-27  8:14 ` [PATCH v6 0/1] ethdev: introduce protocol " xuan.ding
2022-05-27  8:14   ` [PATCH v6 1/1] ethdev: introduce protocol header " xuan.ding
2022-05-30  9:43     ` Ray Kinsella
2022-06-01 13:06 ` [PATCH v7 0/3] ethdev: introduce protocol type based header split wenxuanx.wu
2022-06-01 13:06   ` [PATCH v7 1/3] ethdev: introduce protocol header based buffer split wenxuanx.wu
2022-06-01 13:06   ` [PATCH v7 2/3] net/ice: support buffer split in Rx path wenxuanx.wu
2022-06-01 13:06   ` [PATCH v7 3/3] app/testpmd: add rxhdrs commands and parameters wenxuanx.wu
2022-06-01 13:22 ` [PATCH v7 0/3] ethdev: introduce protocol type based header split wenxuanx.wu
2022-06-01 13:22   ` [PATCH v7 1/3] ethdev: introduce protocol header based buffer split wenxuanx.wu
2022-06-01 13:22   ` [PATCH v7 2/3] net/ice: support buffer split in Rx path wenxuanx.wu
2022-06-01 13:22   ` [PATCH v7 3/3] app/testpmd: add rxhdrs commands and parameters wenxuanx.wu
2022-06-01 13:50 ` [PATCH v8 0/3] ethdev: introduce protocol type based header split wenxuanx.wu
2022-06-01 13:50   ` [PATCH v8 1/3] ethdev: introduce protocol hdr based buffer split wenxuanx.wu
2022-06-02 13:20     ` Andrew Rybchenko
2022-06-03 16:30       ` Ding, Xuan
2022-06-04 14:25         ` Andrew Rybchenko
2022-06-07 10:13           ` Ding, Xuan
2022-06-07 10:48             ` Andrew Rybchenko
2022-06-10 15:04               ` Ding, Xuan
2022-06-01 13:50   ` [PATCH v8 1/3] ethdev: introduce protocol header " wenxuanx.wu
2022-06-02 13:20     ` Andrew Rybchenko
2022-06-02 13:44       ` Ding, Xuan
2022-06-01 13:50   ` [PATCH v8 2/3] net/ice: support buffer split in Rx path wenxuanx.wu
2022-06-01 13:50   ` [PATCH v8 3/3] app/testpmd: add rxhdrs commands and parameters wenxuanx.wu
2022-06-02 13:20   ` [PATCH v8 0/3] ethdev: introduce protocol type based header split Andrew Rybchenko
2022-06-13 10:25 ` [PATCH v9 0/4] add an api to support proto based buffer split wenxuanx.wu
2022-06-13 10:25   ` [PATCH v9 1/4] ethdev: introduce protocol header API wenxuanx.wu
2022-07-07  9:05     ` Thomas Monjalon
2022-08-01  7:09       ` Wang, YuanX
2022-08-01 10:01         ` Thomas Monjalon
2022-08-02 10:12           ` Wang, YuanX
2022-07-08 15:00     ` Andrew Rybchenko
2022-08-01  7:17       ` Wang, YuanX
2022-06-13 10:25   ` [PATCH v9 2/4] ethdev: introduce protocol hdr based buffer split wenxuanx.wu
2022-07-07  9:07     ` Thomas Monjalon
2022-07-11  9:54       ` Ding, Xuan
2022-07-11 10:12         ` Thomas Monjalon
2022-07-08 15:00     ` Andrew Rybchenko
2022-07-21  3:24       ` Ding, Xuan
2022-08-01 14:28         ` Andrew Rybchenko
2022-08-02  7:22           ` Ding, Xuan
2022-06-13 10:25   ` [PATCH v9 3/4] app/testpmd: add rxhdrs commands and parameters wenxuanx.wu
2022-06-13 10:25   ` [PATCH v9 4/4] net/ice: support buffer split in Rx path wenxuanx.wu
2022-06-21  8:56   ` [PATCH v9 0/4] add an api to support proto based buffer split Ding, Xuan
2022-07-07  9:10     ` Thomas Monjalon
2022-07-11 10:08       ` Ding, Xuan
