DPDK patches and discussions
* [dpdk-dev] [RFC] ethdev: introduce Rx buffer split
@ 2020-08-17 17:49 Slava Ovsiienko
  2020-09-17 16:55 ` Andrew Rybchenko
                   ` (13 more replies)
  0 siblings, 14 replies; 172+ messages in thread
From: Slava Ovsiienko @ 2020-08-17 17:49 UTC (permalink / raw)
  To: dev
  Cc: Thomas Monjalon, stephen, ferruh.yigit, Shahaf Shuler,
	olivier.matz, jerinjacobk, maxime.coquelin, david.marchand,
	arybchenko, Asaf Penso

From 7f7052d8b85ff3ff7011bd844b6d3169c6e51923 Mon Sep 17 00:00:00 2001
From: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Date: Mon, 17 Aug 2020 16:57:43 +0000
Subject: [RFC] ethdev: introduce Rx buffer split

The DPDK datapath in the transmit direction is very flexible.
An application can build a multi-segment packet and manage
almost all data aspects - the memory pools the segments are
allocated from, the segment lengths, and memory attributes
such as external buffers or registration for DMA.

In the receive direction the datapath is much less flexible:
an application can only specify the memory pool when configuring
the receive queue, and nothing more. To extend the receive
datapath capabilities, it is proposed to add a way to provide
extended information on how to split the packets being received.

The following structure is introduced to specify the Rx packet
segment:

struct rte_eth_rxseg {
    struct rte_mempool *mp; /* memory pools to allocate segment from */
    uint16_t length; /* segment maximal data length */
    uint16_t offset; /* data offset from beginning of mbuf data buffer */
    uint32_t reserved; /* reserved field */
};

The new routine rte_eth_rx_queue_setup_ex() is introduced to
setup the given Rx queue using the new extended Rx packet segment
description:

int
rte_eth_rx_queue_setup_ex(uint16_t port_id, uint16_t rx_queue_id,
                          uint16_t nb_rx_desc, unsigned int socket_id,
                          const struct rte_eth_rxconf *rx_conf,
		          const struct rte_eth_rxseg *rx_seg,
                          uint16_t n_seg)

This routine takes two new parameters:
    rx_seg - pointer to the array of segment descriptions; each element
             describes the memory pool, maximal data length, and initial
             data offset from the beginning of the data buffer in the mbuf
    n_seg - number of elements in the array

The new offload flag DEV_RX_OFFLOAD_BUFFER_SPLIT is introduced in the
device capabilities so that a PMD can report to the application that
it supports splitting Rx packets into configurable segments. Before
invoking the rte_eth_rx_queue_setup_ex() routine, the application
should check the DEV_RX_OFFLOAD_BUFFER_SPLIT flag.

If the Rx queue is configured with the new routine, the packets being
received will be split into multiple segments pushed to mbufs
with the specified attributes. The PMD will allocate the first mbuf
from the pool specified in the first segment descriptor and put
the data starting at the specified offset in the allocated mbuf data
buffer. If the packet length exceeds the specified segment length,
the next mbuf will be allocated according to the next segment
descriptor (if any) and data will be put in its data buffer at the
specified offset, not exceeding the specified length. If there is
no next descriptor, the next mbuf will be allocated and filled in the
same way (from the same pool and with the same buffer offset/length)
as the current one.

For example, let's suppose we configured the Rx queue with the
following segments:
    seg0 - pool0, len0=14B, off0=RTE_PKTMBUF_HEADROOM
    seg1 - pool1, len1=20B, off1=0B
    seg2 - pool2, len2=20B, off2=0B
    seg3 - pool3, len3=512B, off3=0B

A 46-byte packet will look like the following:
    seg0 - 14B long @ RTE_PKTMBUF_HEADROOM in mbuf from pool0
    seg1 - 20B long @ 0 in mbuf from pool1
    seg2 - 12B long @ 0 in mbuf from pool2

A 1500-byte packet will look like the following:
    seg0 - 14B @ RTE_PKTMBUF_HEADROOM in mbuf from pool0
    seg1 - 20B @ 0 in mbuf from pool1
    seg2 - 20B @ 0 in mbuf from pool2
    seg3 - 512B @ 0 in mbuf from pool3
    seg4 - 512B @ 0 in mbuf from pool3
    seg5 - 422B @ 0 in mbuf from pool3
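
The two layouts above can be reproduced with a small standalone model of
the split rule. This is only a sketch and not part of the patch:
struct rxseg_model, struct seg_out, split_layout() and MODEL_HEADROOM are
hypothetical names, the memory pool is reduced to an index, and the
headroom value of 128 is an assumption standing in for
RTE_PKTMBUF_HEADROOM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_HEADROOM 128 /* assumption, stands in for RTE_PKTMBUF_HEADROOM */

/* Hypothetical stand-in for struct rte_eth_rxseg, pool reduced to an index. */
struct rxseg_model {
	int pool;        /* index of the pool the mbuf is taken from */
	uint16_t length; /* segment maximal data length */
	uint16_t offset; /* data offset inside the mbuf data buffer */
};

/* One segment of a received packet as the model lays it out. */
struct seg_out {
	int pool;
	uint16_t data_len;
	uint16_t offset;
};

/*
 * Fill out[] with the layout of a pkt_len-byte packet and return the
 * number of segments produced (never more than max_out).
 */
static unsigned int
split_layout(const struct rxseg_model *seg, unsigned int n_seg,
	     uint32_t pkt_len, struct seg_out *out, unsigned int max_out)
{
	unsigned int n = 0;
	unsigned int idx = 0;

	assert(seg != NULL && out != NULL);
	if (n_seg == 0)
		return 0;

	while (pkt_len > 0 && n < max_out) {
		const struct rxseg_model *s = &seg[idx];
		uint16_t chunk = pkt_len < s->length ?
				 (uint16_t)pkt_len : s->length;

		out[n].pool = s->pool;
		out[n].data_len = chunk;
		out[n].offset = s->offset;
		n++;
		pkt_len -= chunk;
		/* Advance to the next description, or stay on the last one. */
		if (idx + 1 < n_seg)
			idx++;
	}
	return n;
}
```

With the four descriptions from the example, split_layout() produces
three segments for a 46-byte packet and six segments for a 1500-byte
packet, the last one carrying 422 bytes from pool3.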

The offload DEV_RX_OFFLOAD_SCATTER must be present and
configured to support the new buffer split feature (if n_seg
is greater than one).

The new approach would allow splitting ingress packets into
multiple parts pushed to memory with different attributes.
For example, the packet headers can be pushed to the embedded
data buffers within mbufs and the application data into
external buffers attached to mbufs allocated from
different memory pools. The memory attributes of the split
parts may differ as well - for example, the application data
may be pushed into external memory located on a dedicated
physical device, say a GPU or NVMe. This would improve the
flexibility of the DPDK receive datapath while preserving
compatibility with the existing API.

Also, the proposed segment description might be used to specify
the Rx packet split for other features. For example, it could
provide a way to specify the extra memory pool for the Header Split
feature of some Intel PMDs.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
 lib/librte_ethdev/rte_ethdev.c      | 166 ++++++++++++++++++++++++++++++++++++
 lib/librte_ethdev/rte_ethdev.h      |  15 ++++
 lib/librte_ethdev/rte_ethdev_core.h |  10 +++
 3 files changed, 191 insertions(+)

diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
index 7858ad5..638e42d 100644
--- a/lib/librte_ethdev/rte_ethdev.c
+++ b/lib/librte_ethdev/rte_ethdev.c
@@ -1933,6 +1933,172 @@ struct rte_eth_dev *
 }
 
 int
+rte_eth_rx_queue_setup_ex(uint16_t port_id, uint16_t rx_queue_id,
+			  uint16_t nb_rx_desc, unsigned int socket_id,
+			  const struct rte_eth_rxconf *rx_conf,
+			  const struct rte_eth_rxseg *rx_seg, uint16_t n_seg)
+{
+	int ret;
+	uint16_t seg_idx;
+	uint32_t mbp_buf_size;
+	struct rte_eth_dev *dev;
+	struct rte_eth_dev_info dev_info;
+	struct rte_eth_rxconf local_conf;
+	void **rxq;
+
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
+
+	dev = &rte_eth_devices[port_id];
+	if (rx_queue_id >= dev->data->nb_rx_queues) {
+		RTE_ETHDEV_LOG(ERR, "Invalid RX queue_id=%u\n", rx_queue_id);
+		return -EINVAL;
+	}
+
+	if (rx_seg == NULL) {
+		RTE_ETHDEV_LOG(ERR, "Invalid null description pointer\n");
+		return -EINVAL;
+	}
+
+	if (n_seg == 0) {
+		RTE_ETHDEV_LOG(ERR, "Invalid zero description number\n");
+		return -EINVAL;
+	}
+
+	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_setup_ex, -ENOTSUP);
+
+	/*
+	 * Check the size of the mbuf data buffer.
+	 * This value must be provided in the private data of the memory pool.
+	 * First check that the memory pool has a valid private data.
+	 */
+	ret = rte_eth_dev_info_get(port_id, &dev_info);
+	if (ret != 0)
+		return ret;
+
+	for (seg_idx = 0; seg_idx < n_seg; seg_idx++) {
+		struct rte_mempool *mp = rx_seg[seg_idx].mp;
+
+		if (mp->private_data_size < sizeof(struct rte_pktmbuf_pool_private)) {
+			RTE_ETHDEV_LOG(ERR, "%s private_data_size %d < %d\n",
+				mp->name, (int)mp->private_data_size,
+				(int)sizeof(struct rte_pktmbuf_pool_private));
+			return -ENOSPC;
+		}
+
+		mbp_buf_size = rte_pktmbuf_data_room_size(mp);
+		if (mbp_buf_size < rx_seg[seg_idx].length + rx_seg[seg_idx].offset) {
+			RTE_ETHDEV_LOG(ERR,
+				"%s mbuf_data_room_size %d < %d"
+				" (segment length=%d + segment offset=%d)\n",
+				mp->name, (int)mbp_buf_size,
+				(int)(rx_seg[seg_idx].length + rx_seg[seg_idx].offset),
+				(int)rx_seg[seg_idx].length,
+				(int)rx_seg[seg_idx].offset);
+			return -EINVAL;
+		}
+	}
+
+	/* Use default specified by driver, if nb_rx_desc is zero */
+	if (nb_rx_desc == 0) {
+		nb_rx_desc = dev_info.default_rxportconf.ring_size;
+		/* If driver default is also zero, fall back on EAL default */
+		if (nb_rx_desc == 0)
+			nb_rx_desc = RTE_ETH_DEV_FALLBACK_RX_RINGSIZE;
+	}
+
+	if (nb_rx_desc > dev_info.rx_desc_lim.nb_max ||
+			nb_rx_desc < dev_info.rx_desc_lim.nb_min ||
+			nb_rx_desc % dev_info.rx_desc_lim.nb_align != 0) {
+
+		RTE_ETHDEV_LOG(ERR,
+			"Invalid value for nb_rx_desc(=%hu), should be: <= %hu, >= %hu, and a product of %hu\n",
+			nb_rx_desc, dev_info.rx_desc_lim.nb_max,
+			dev_info.rx_desc_lim.nb_min,
+			dev_info.rx_desc_lim.nb_align);
+		return -EINVAL;
+	}
+
+	if (dev->data->dev_started &&
+		!(dev_info.dev_capa &
+			RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP))
+		return -EBUSY;
+
+	if (dev->data->dev_started &&
+		(dev->data->rx_queue_state[rx_queue_id] !=
+			RTE_ETH_QUEUE_STATE_STOPPED))
+		return -EBUSY;
+
+	rxq = dev->data->rx_queues;
+	if (rxq[rx_queue_id]) {
+		RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_release,
+					-ENOTSUP);
+		(*dev->dev_ops->rx_queue_release)(rxq[rx_queue_id]);
+		rxq[rx_queue_id] = NULL;
+	}
+
+	if (rx_conf == NULL)
+		rx_conf = &dev_info.default_rxconf;
+
+	local_conf = *rx_conf;
+
+	/*
+	 * If an offloading has already been enabled in
+	 * rte_eth_dev_configure(), it has been enabled on all queues,
+	 * so there is no need to enable it in this queue again.
+	 * The local_conf.offloads input to underlying PMD only carries
+	 * those offloadings which are only enabled on this queue and
+	 * not enabled on all queues.
+	 */
+	local_conf.offloads &= ~dev->data->dev_conf.rxmode.offloads;
+
+	/*
+	 * New added offloadings for this queue are those not enabled in
+	 * rte_eth_dev_configure() and they must be per-queue type.
+	 * A pure per-port offloading can't be enabled on a queue while
+	 * disabled on another queue. A pure per-port offloading can't
+	 * be enabled for any queue as new added one if it hasn't been
+	 * enabled in rte_eth_dev_configure().
+	 */
+	if ((local_conf.offloads & dev_info.rx_queue_offload_capa) !=
+	     local_conf.offloads) {
+		RTE_ETHDEV_LOG(ERR,
+			"Ethdev port_id=%d rx_queue_id=%d, new added offloads 0x%"PRIx64" must be "
+			"within per-queue offload capabilities 0x%"PRIx64" in %s()\n",
+			port_id, rx_queue_id, local_conf.offloads,
+			dev_info.rx_queue_offload_capa,
+			__func__);
+		return -EINVAL;
+	}
+
+	/*
+	 * If LRO is enabled, check that the maximum aggregated packet
+	 * size is supported by the configured device.
+	 */
+	if (local_conf.offloads & DEV_RX_OFFLOAD_TCP_LRO) {
+		if (dev->data->dev_conf.rxmode.max_lro_pkt_size == 0)
+			dev->data->dev_conf.rxmode.max_lro_pkt_size =
+				dev->data->dev_conf.rxmode.max_rx_pkt_len;
+		ret = check_lro_pkt_size(port_id,
+				dev->data->dev_conf.rxmode.max_lro_pkt_size,
+				dev->data->dev_conf.rxmode.max_rx_pkt_len,
+				dev_info.max_lro_pkt_size);
+		if (ret != 0)
+			return ret;
+	}
+
+	ret = (*dev->dev_ops->rx_queue_setup_ex)(dev, rx_queue_id, nb_rx_desc,
+						 socket_id, &local_conf,
+						 rx_seg, n_seg);
+	if (!ret) {
+		if (!dev->data->min_rx_buf_size ||
+		    dev->data->min_rx_buf_size > mbp_buf_size)
+			dev->data->min_rx_buf_size = mbp_buf_size;
+	}
+
+	return eth_err(port_id, ret);
+}
+
+int
 rte_eth_rx_hairpin_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
 			       uint16_t nb_rx_desc,
 			       const struct rte_eth_hairpin_conf *conf)
diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
index 70295d7..701264a 100644
--- a/lib/librte_ethdev/rte_ethdev.h
+++ b/lib/librte_ethdev/rte_ethdev.h
@@ -938,6 +938,16 @@ struct rte_eth_txmode {
 };
 
 /**
+ * A structure used to configure an RX packet segment to split.
+ */
+struct rte_eth_rxseg {
+	struct rte_mempool *mp; /**< Memory pools to allocate segment from */
+	uint16_t length; /**< Segment maximal data length */
+	uint16_t offset; /**< Data offset from beginning of mbuf data buffer */
+	uint32_t reserved; /**< Reserved field */
+};
+
+/**
  * A structure used to configure an RX ring of an Ethernet port.
  */
 struct rte_eth_rxconf {
@@ -1988,6 +1998,11 @@ int rte_eth_rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
 		const struct rte_eth_rxconf *rx_conf,
 		struct rte_mempool *mb_pool);
 
+int rte_eth_rx_queue_setup_ex(uint16_t port_id, uint16_t rx_queue_id,
+		uint16_t nb_rx_desc, unsigned int socket_id,
+		const struct rte_eth_rxconf *rx_conf,
+		const struct rte_eth_rxseg *rx_seg, uint16_t n_seg);
+
 /**
  * @warning
  * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
diff --git a/lib/librte_ethdev/rte_ethdev_core.h b/lib/librte_ethdev/rte_ethdev_core.h
index 32407dd..27018de 100644
--- a/lib/librte_ethdev/rte_ethdev_core.h
+++ b/lib/librte_ethdev/rte_ethdev_core.h
@@ -265,6 +265,15 @@ typedef int (*eth_rx_queue_setup_t)(struct rte_eth_dev *dev,
 				    struct rte_mempool *mb_pool);
 /**< @internal Set up a receive queue of an Ethernet device. */
 
+typedef int (*eth_rx_queue_setup_ex_t)(struct rte_eth_dev *dev,
+				       uint16_t rx_queue_id,
+				       uint16_t nb_rx_desc,
+				       unsigned int socket_id,
+				       const struct rte_eth_rxconf *rx_conf,
+				       const struct rte_eth_rxseg *rx_seg,
+				       uint16_t n_seg);
+/**< @internal Set up a receive queue of an Ethernet device. */
+
 typedef int (*eth_tx_queue_setup_t)(struct rte_eth_dev *dev,
 				    uint16_t tx_queue_id,
 				    uint16_t nb_tx_desc,
@@ -659,6 +668,7 @@ struct eth_dev_ops {
 	eth_queue_start_t          tx_queue_start;/**< Start TX for a queue. */
 	eth_queue_stop_t           tx_queue_stop; /**< Stop TX for a queue. */
 	eth_rx_queue_setup_t       rx_queue_setup;/**< Set up device RX queue. */
+	eth_rx_queue_setup_ex_t    rx_queue_setup_ex;/**< Set up device RX queue. */
 	eth_queue_release_t        rx_queue_release; /**< Release RX queue. */
 	eth_rx_queue_count_t       rx_queue_count;
 	/**< Get the number of used RX descriptors. */
-- 
1.8.3.1



* Re: [dpdk-dev] [RFC] ethdev: introduce Rx buffer split
  2020-08-17 17:49 [dpdk-dev] [RFC] ethdev: introduce Rx buffer split Slava Ovsiienko
@ 2020-09-17 16:55 ` Andrew Rybchenko
  2020-10-01  8:54   ` Slava Ovsiienko
  2020-10-05  6:26 ` [dpdk-dev] [PATCH 0/5] " Viacheslav Ovsiienko
                   ` (12 subsequent siblings)
  13 siblings, 1 reply; 172+ messages in thread
From: Andrew Rybchenko @ 2020-09-17 16:55 UTC (permalink / raw)
  To: Slava Ovsiienko, dev
  Cc: Thomas Monjalon, stephen, ferruh.yigit, Shahaf Shuler,
	olivier.matz, jerinjacobk, maxime.coquelin, david.marchand,
	Asaf Penso

On 8/17/20 8:49 PM, Slava Ovsiienko wrote:
>>From 7f7052d8b85ff3ff7011bd844b6d3169c6e51923 Mon Sep 17 00:00:00 2001
> From: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
> Date: Mon, 17 Aug 2020 16:57:43 +0000
> Subject: [RFC] ethdev: introduce Rx buffer split
> 
> The DPDK datapath in the transmit direction is very flexible.
> An application can build the multisegment packet and manages
> almost all data aspects - the memory pools where segments
> are allocated from, the segment lengths, the memory attributes
> like external buffers, registered for DMA, etc.
> 
> In the receiving direction, the datapath is much less flexible,
> an application can only specify the memory pool to configure the
> receiving queue and nothing more. In order to extend receiving
> datapath capabilities it is proposed to add the way to provide
> extended infoirmation how to split the packets being received.

infoirmation -> information

> 
> The following structure is introduced to specify the Rx packet
> segment:
> 
> struct rte_eth_rxseg {
>      struct rte_mempool *mp; /* memory pools to allocate segment from */
>      uint16_t length; /* segment maximal data length */
>      uint16_t offset; /* data offset from beginning of mbuf data buffer */
>      uint32_t reserved; /* reserved field */
> };
> 
> The new routine rte_eth_rx_queue_setup_ex() is introduced to
> setup the given Rx queue using the new extended Rx packet segment
> description:
> 
> int
> rte_eth_rx_queue_setup_ex(uint16_t port_id, uint16_t rx_queue_id,
>                            uint16_t nb_rx_desc, unsigned int socket_id,
>                            const struct rte_eth_rxconf *rx_conf,
> 		          const struct rte_eth_rxseg *rx_seg,
>                            uint16_t n_seg)
> 
> This routine presents the two new parameters:
>      rx_seg - pointer the array of segment descriptions, each element
>               describes the memory pool, maximal data length, initial
>               data offset from the beginning of data buffer in mbuf
>      n_seg - number of elements in the array
> 
> The new offload flag DEV_RX_OFFLOAD_BUFFER_SPLIT in device
> capabilities is introduced to present the way for PMD to report to
> application about supporting Rx packet split to configurable
> segments. Prior invoking the rte_eth_rx_queue_setup_ex() routine
> application should check DEV_RX_OFFLOAD_BUFFER_SPLIT flag.
> 
> If the Rx queue is configured with new routine the packets being
> received will be split into multiple segments pushed to the mbufs
> with specified attributes. The PMD will allocate the first mbuf
> from the pool specified in the first segment descriptor and puts
> the data staring at specified offeset in the allocated mbuf data

offeset -> offset

> buffer. If packet length exceeds the specified segment length
> the next mbuf will be allocated according to the next segment
> descriptor (if any) and data will be put in its data buffer at
> specified offset and not exceeding specified length. If there is
> no next descriptor the next mbuf will be allocated and filled in the
> same way (from the same pool and with the same buffer offset/length)
> as the current one.
> 
> For example, let's suppose we configured the Rx queue with the
> following segments:
>      seg0 - pool0, len0=14B, off0=RTE_PKTMBUF_HEADROOM
>      seg1 - pool1, len1=20B, off1=0B
>      seg2 - pool2, len2=20B, off2=0B
>      seg3 - pool3, len3=512B, off3=0B
> 
> The packet 46 bytes long will look like the following:
>      seg0 - 14B long @ RTE_PKTMBUF_HEADROOM in mbuf from pool0
>      seg1 - 20B long @ 0 in mbuf from pool1
>      seg2 - 12B long @ 0 in mbuf from pool2
> 
> The packet 1500 bytes long will look like the following:
>      seg0 - 14B @ RTE_PKTMBUF_HEADROOM in mbuf from pool0
>      seg1 - 20B @ 0 in mbuf from pool1
>      seg2 - 20B @ 0 in mbuf from pool2
>      seg3 - 512B @ 0 in mbuf from pool3
>      seg4 - 512B @ 0 in mbuf from pool3
>      seg5 - 422B @ 0 in mbuf from pool3

The behaviour is logical, but what should be done if HW can't do it,
i.e. can't use the last segment many times? Should it reject the
configuration if the provided segments are insufficient to fit
an MTU-sized packet? How should the limitation be reported?
(I'm still trying to convince you that SCATTER and BUFFER_SPLIT
should be independent).

> 
> The offload DEV_RX_OFFLOAD_SCATTER must be present and
> configured to support new buffer spllit feature (if n_seg

spllit -> split

> is greater than one).
> 
> The new approach would allow splitting the ingress packets into
> multiple parts pushed to the memory with different attributes.
> For example, the packet headers can be pushed to the embedded
> data buffers within mbufs and the application data into
> the external buffers attached to mbufs allocated from the
> different memory pools. The memory attributes for the split
> parts may differ either - for example the application data
> may be pushed into the external memory located on the dedicated
> physical device, say GPU or NVMe. This would improve the DPDK
> receiving datapath flexibility with preserving compatibility
> with existing API.
> 
> Also, the proposed segment description might be used to specify
> Rx packet split for some other features. For example, provide
> the way to specify the extra memory pool for the Header Split
> feature of some Intel PMD.
> 
> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
> ---
>   lib/librte_ethdev/rte_ethdev.c      | 166 ++++++++++++++++++++++++++++++++++++
>   lib/librte_ethdev/rte_ethdev.h      |  15 ++++
>   lib/librte_ethdev/rte_ethdev_core.h |  10 +++
>   3 files changed, 191 insertions(+)
> 
> diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
> index 7858ad5..638e42d 100644
> --- a/lib/librte_ethdev/rte_ethdev.c
> +++ b/lib/librte_ethdev/rte_ethdev.c
> @@ -1933,6 +1933,172 @@ struct rte_eth_dev *
>   }
>   
>   int
> +rte_eth_rx_queue_setup_ex(uint16_t port_id, uint16_t rx_queue_id,
> +			  uint16_t nb_rx_desc, unsigned int socket_id,
> +			  const struct rte_eth_rxconf *rx_conf,
> +			  const struct rte_eth_rxseg *rx_seg, uint16_t n_seg)
> +{
> +	int ret;
> +	uint16_t seg_idx;
> +	uint32_t mbp_buf_size;
> +	struct rte_eth_dev *dev;
> +	struct rte_eth_dev_info dev_info;
> +	struct rte_eth_rxconf local_conf;
> +	void **rxq;
> +
> +	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
> +
> +	dev = &rte_eth_devices[port_id];
> +	if (rx_queue_id >= dev->data->nb_rx_queues) {
> +		RTE_ETHDEV_LOG(ERR, "Invalid RX queue_id=%u\n", rx_queue_id);
> +		return -EINVAL;
> +	}
> +
> +	if (rx_seg == NULL) {
> +		RTE_ETHDEV_LOG(ERR, "Invalid null description pointer\n");
> +		return -EINVAL;
> +	}
> +
> +	if (n_seg == 0) {
> +		RTE_ETHDEV_LOG(ERR, "Invalid zero description number\n");
> +		return -EINVAL;
> +	}
> +
> +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_setup_ex, -ENOTSUP);
> +
> +	/*
> +	 * Check the size of the mbuf data buffer.
> +	 * This value must be provided in the private data of the memory pool.
> +	 * First check that the memory pool has a valid private data.
> +	 */
> +	ret = rte_eth_dev_info_get(port_id, &dev_info);
> +	if (ret != 0)
> +		return ret;
> +
> +	for (seg_idx = 0; seg_idx < n_seg; seg_idx++) {
> +		struct rte_mempool *mp = rx_seg[seg_idx].mp;
> +
> +		if (mp->private_data_size < sizeof(struct rte_pktmbuf_pool_private)) {
> +			RTE_ETHDEV_LOG(ERR, "%s private_data_size %d < %d\n",
> +				mp->name, (int)mp->private_data_size,
> +				(int)sizeof(struct rte_pktmbuf_pool_private));
> +			return -ENOSPC;
> +		}
> +
> +		mbp_buf_size = rte_pktmbuf_data_room_size(mp);
> +		if (mbp_buf_size < rx_seg[seg_idx].length + rx_seg[seg_idx].offset) {
> +			RTE_ETHDEV_LOG(ERR,
> +				"%s mbuf_data_room_size %d < %d"
> +				" (segment length=%d + segment offset=%d)\n",
> +				mp->name, (int)mbp_buf_size,
> +				(int)(rx_seg[seg_idx].length + rx_seg[seg_idx].offset),
> +				(int)rx_seg[seg_idx].length,
> +				(int)rx_seg[seg_idx].offset);
> +			return -EINVAL;
> +		}
> +	}
> +
> +	/* Use default specified by driver, if nb_rx_desc is zero */
> +	if (nb_rx_desc == 0) {
> +		nb_rx_desc = dev_info.default_rxportconf.ring_size;
> +		/* If driver default is also zero, fall back on EAL default */
> +		if (nb_rx_desc == 0)
> +			nb_rx_desc = RTE_ETH_DEV_FALLBACK_RX_RINGSIZE;
> +	}
> +
> +	if (nb_rx_desc > dev_info.rx_desc_lim.nb_max ||
> +			nb_rx_desc < dev_info.rx_desc_lim.nb_min ||
> +			nb_rx_desc % dev_info.rx_desc_lim.nb_align != 0) {
> +
> +		RTE_ETHDEV_LOG(ERR,
> +			"Invalid value for nb_rx_desc(=%hu), should be: <= %hu, >= %hu, and a product of %hu\n",
> +			nb_rx_desc, dev_info.rx_desc_lim.nb_max,
> +			dev_info.rx_desc_lim.nb_min,
> +			dev_info.rx_desc_lim.nb_align);
> +		return -EINVAL;
> +	}
> +
> +	if (dev->data->dev_started &&
> +		!(dev_info.dev_capa &
> +			RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP))
> +		return -EBUSY;
> +
> +	if (dev->data->dev_started &&
> +		(dev->data->rx_queue_state[rx_queue_id] !=
> +			RTE_ETH_QUEUE_STATE_STOPPED))
> +		return -EBUSY;
> +
> +	rxq = dev->data->rx_queues;
> +	if (rxq[rx_queue_id]) {
> +		RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_release,
> +					-ENOTSUP);
> +		(*dev->dev_ops->rx_queue_release)(rxq[rx_queue_id]);
> +		rxq[rx_queue_id] = NULL;
> +	}
> +
> +	if (rx_conf == NULL)
> +		rx_conf = &dev_info.default_rxconf;
> +
> +	local_conf = *rx_conf;
> +
> +	/*
> +	 * If an offloading has already been enabled in
> +	 * rte_eth_dev_configure(), it has been enabled on all queues,
> +	 * so there is no need to enable it in this queue again.
> +	 * The local_conf.offloads input to underlying PMD only carries
> +	 * those offloadings which are only enabled on this queue and
> +	 * not enabled on all queues.
> +	 */
> +	local_conf.offloads &= ~dev->data->dev_conf.rxmode.offloads;
> +
> +	/*
> +	 * New added offloadings for this queue are those not enabled in
> +	 * rte_eth_dev_configure() and they must be per-queue type.
> +	 * A pure per-port offloading can't be enabled on a queue while
> +	 * disabled on another queue. A pure per-port offloading can't
> +	 * be enabled for any queue as new added one if it hasn't been
> +	 * enabled in rte_eth_dev_configure().
> +	 */
> +	if ((local_conf.offloads & dev_info.rx_queue_offload_capa) !=
> +	     local_conf.offloads) {
> +		RTE_ETHDEV_LOG(ERR,
> +			"Ethdev port_id=%d rx_queue_id=%d, new added offloads 0x%"PRIx64" must be "
> +			"within per-queue offload capabilities 0x%"PRIx64" in %s()\n",
> +			port_id, rx_queue_id, local_conf.offloads,
> +			dev_info.rx_queue_offload_capa,
> +			__func__);
> +		return -EINVAL;
> +	}
> +
> +	/*
> +	 * If LRO is enabled, check that the maximum aggregated packet
> +	 * size is supported by the configured device.
> +	 */
> +	if (local_conf.offloads & DEV_RX_OFFLOAD_TCP_LRO) {
> +		if (dev->data->dev_conf.rxmode.max_lro_pkt_size == 0)
> +			dev->data->dev_conf.rxmode.max_lro_pkt_size =
> +				dev->data->dev_conf.rxmode.max_rx_pkt_len;
> +		int ret = check_lro_pkt_size(port_id,
> +				dev->data->dev_conf.rxmode.max_lro_pkt_size,
> +				dev->data->dev_conf.rxmode.max_rx_pkt_len,
> +				dev_info.max_lro_pkt_size);
> +		if (ret != 0)
> +			return ret;
> +	}
> +
> +	ret = (*dev->dev_ops->rx_queue_setup_ex)(dev, rx_queue_id, nb_rx_desc,
> +						 socket_id, &local_conf,
> +						 rx_seg, n_seg);
> +	if (!ret) {
> +		if (!dev->data->min_rx_buf_size ||
> +		    dev->data->min_rx_buf_size > mbp_buf_size)
> +			dev->data->min_rx_buf_size = mbp_buf_size;
> +	}
> +
> +	return eth_err(port_id, ret);
> +}
> +
> +int
>   rte_eth_rx_hairpin_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
>   			       uint16_t nb_rx_desc,
>   			       const struct rte_eth_hairpin_conf *conf)

I dislike the idea of introducing a new device operation.
rte_eth_rxconf has reserved space, and the BUFFER_SPLIT offload will
mean that the PMD looks at the split configuration located there.

The above duplication is simply not acceptable, but even if we
factor out the shared code into a helper function, I still see no
point in introducing a new device operation.

[snip]



* Re: [dpdk-dev] [RFC] ethdev: introduce Rx buffer split
  2020-09-17 16:55 ` Andrew Rybchenko
@ 2020-10-01  8:54   ` Slava Ovsiienko
  2020-10-12  8:45     ` Andrew Rybchenko
  0 siblings, 1 reply; 172+ messages in thread
From: Slava Ovsiienko @ 2020-10-01  8:54 UTC (permalink / raw)
  To: Andrew Rybchenko, dev
  Cc: Thomas Monjalon, stephen, ferruh.yigit, Shahaf Shuler,
	olivier.matz, jerinjacobk, maxime.coquelin, david.marchand,
	Asaf Penso

Hi, Andrew

Thank you for the comments, please see my replies below.

> -----Original Message-----
> From: Andrew Rybchenko <arybchenko@solarflare.com>
> Sent: Thursday, September 17, 2020 19:55
> To: Slava Ovsiienko <viacheslavo@nvidia.com>; dev@dpdk.org
> Cc: Thomas Monjalon <thomasm@mellanox.com>;
> stephen@networkplumber.org; ferruh.yigit@intel.com; Shahaf Shuler
> <shahafs@nvidia.com>; olivier.matz@6wind.com; jerinjacobk@gmail.com;
> maxime.coquelin@redhat.com; david.marchand@redhat.com; Asaf Penso
> <asafp@nvidia.com>
> Subject: Re: [dpdk-dev] [RFC] ethdev: introduce Rx buffer split
> 
[snip]
> >
> > For example, let's suppose we configured the Rx queue with the
> > following segments:
> >      seg0 - pool0, len0=14B, off0=RTE_PKTMBUF_HEADROOM
> >      seg1 - pool1, len1=20B, off1=0B
> >      seg2 - pool2, len2=20B, off2=0B
> >      seg3 - pool3, len3=512B, off3=0B
> >
> > The packet 46 bytes long will look like the following:
> >      seg0 - 14B long @ RTE_PKTMBUF_HEADROOM in mbuf from pool0
> >      seg1 - 20B long @ 0 in mbuf from pool1
> >      seg2 - 12B long @ 0 in mbuf from pool2
> >
> > The packet 1500 bytes long will look like the following:
> >      seg0 - 14B @ RTE_PKTMBUF_HEADROOM in mbuf from pool0
> >      seg1 - 20B @ 0 in mbuf from pool1
> >      seg2 - 20B @ 0 in mbuf from pool2
> >      seg3 - 512B @ 0 in mbuf from pool3
> >      seg4 - 512B @ 0 in mbuf from pool3
> >      seg5 - 422B @ 0 in mbuf from pool3
> 
> The behaviour is logical, but what to do if HW can't do it, i.e. use the last
> segment many times. Should it reject configuration if provided segments are
> insufficient to fit MTU packet? How to report the limitation?
> (I'm still trying to convince that SCATTER and BUFFER_SPLIT should be
> independent).

BUFFER_SPLIT is rather a way to tune SCATTER. Currently scattering
happens on unconditional mbuf data buffer boundaries (we have reserved
HEAD space in the first mbuf and fill this one to the buffer end,
the next mbuf buffers might be filled completely). BUFFER_SPLIT provides
a way to specify the desired points at which to split the packet, rather
than blindly following buffer boundaries. There is a check implemented in
the common part that each split segment fits the mbuf allocated from the
appropriate pool. The PMD should do an extra check internally whether it
supports the requested split settings; if not, the call will be rejected.
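
The common-part check mentioned above can be sketched in isolation as
follows. This is a simplified model and not the patch code:
struct rxseg_fit and check_segments_fit() are hypothetical names, and the
pool is represented only by the data room size of its mbufs (the value
the patch obtains from rte_pktmbuf_data_room_size()).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flattened view of one segment description. */
struct rxseg_fit {
	uint32_t data_room; /* data room size of mbufs in the segment's pool */
	uint16_t length;    /* segment maximal data length */
	uint16_t offset;    /* data offset inside the mbuf data buffer */
};

/* Return 0 if every segment fits its pool's data room, -EINVAL otherwise. */
static int
check_segments_fit(const struct rxseg_fit *seg, uint16_t n_seg)
{
	uint16_t i;

	if (seg == NULL || n_seg == 0)
		return -EINVAL;

	for (i = 0; i < n_seg; i++) {
		/* Widen before adding so the sum is computed in 32 bits. */
		uint32_t need = (uint32_t)seg[i].length + seg[i].offset;

		if (seg[i].data_room < need)
			return -EINVAL;
	}
	return 0;
}

/* Usage: a 2048-byte data room holds 1920B at offset 128, but not 1921B. */
static void
check_segments_fit_example(void)
{
	const struct rxseg_fit ok = { 2048, 1920, 128 };
	const struct rxseg_fit bad = { 2048, 1921, 128 };

	assert(check_segments_fit(&ok, 1) == 0);
	assert(check_segments_fit(&bad, 1) == -EINVAL);
}
```

Whether the hardware can honour a configuration that passes this check is
a separate question, which is why the PMD-internal check is still needed.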

[snip]
> 
> I dislike the idea to introduce new device operation.
> rte_eth_rxconf has reserved space and BUFFER_SPLIT offload will mean that
> PMD looks at the split configuration location there.
> 
We considered the approach of pushing the split settings into the rxconf
structure.
[http://patches.dpdk.org/patch/75205/]
But it seems there are some issues:

- the split configuration description requires a variable-length array (due
  to variations in the number of segments), so the rte_eth_rxconf structure
  would have variable length (not nice, IMO).

  We could push a pointer to the array of rte_eth_rxseg, but we would lose
  the simplicity of a single structure (and contiguous memory); this approach
  has no advantages over specifying the split configuration as parameters
  of setup_ex().

- it would introduce ambiguity: rte_eth_rx_queue_setup() specifies a single
  mbuf pool as a parameter. What should we do with it? Set it to NULL?
  Treat it as the first pool? I would prefer to specify all split segments in
  a uniform fashion, i.e. as an array of rte_eth_rxseg structures (and it can
  be easily updated with some extra segment attributes if needed). So, in my
  opinion, we should remove/replace the pool parameter in rx_queue_setup
  (by introducing a new function).

- specifying the new extended setup routine has the advantage that we do
  not need to update any PMD code in the existing implementations of
  rte_eth_rx_queue_setup().

  If a PMD supports BUFFER_SPLIT (or another related feature) it should just
  provide rte_eth_rx_queue_setup_ex() and check the DEV_RX_OFFLOAD_BUFFER_SPLIT
  (or HEADER_SPLIT, or whatever feature) flag it supports. The common code does
  not check the feature flags - it is up to the PMDs. In order to configure a
  PMD to perform an arbitrary desired Rx split, the application should check
  DEV_RX_OFFLOAD_BUFFER_SPLIT in the port capabilities, if found - set
  DEV_RX_OFFLOAD_BUFFER_SPLIT in the configuration and call
  rte_eth_rx_queue_setup_ex().
  And this approach can be followed for any other split-related feature.
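
The application-side sequence from the last point (check the capability,
then request the offload, then call the extended setup routine) can be
modelled without any DPDK dependency. Everything here is a hypothetical
stand-in: struct model_port, model_request_buffer_split() and the bit
position of MODEL_RX_OFFLOAD_BUFFER_SPLIT are assumptions for
illustration, and returning -ENOTSUP for a missing capability is a choice
of this sketch, not something the RFC specifies.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Assumed bit position, for illustration only. */
#define MODEL_RX_OFFLOAD_BUFFER_SPLIT (UINT64_C(1) << 20)

struct model_port {
	uint64_t rx_offload_capa; /* capabilities reported by the PMD */
	uint64_t rx_offloads;     /* offloads requested by the application */
};

/*
 * Steps 1 and 2 of the sequence: check the capability, then request
 * the offload. Step 3, the rte_eth_rx_queue_setup_ex() call itself,
 * is left out of the model.
 */
static int
model_request_buffer_split(struct model_port *port)
{
	if (!(port->rx_offload_capa & MODEL_RX_OFFLOAD_BUFFER_SPLIT))
		return -ENOTSUP;

	port->rx_offloads |= MODEL_RX_OFFLOAD_BUFFER_SPLIT;
	return 0;
}

/* Usage: only a port reporting the capability gets the offload set. */
static void
model_request_buffer_split_example(void)
{
	struct model_port capable = { MODEL_RX_OFFLOAD_BUFFER_SPLIT, 0 };
	struct model_port legacy = { 0, 0 };

	assert(model_request_buffer_split(&capable) == 0);
	assert(capable.rx_offloads & MODEL_RX_OFFLOAD_BUFFER_SPLIT);
	assert(model_request_buffer_split(&legacy) == -ENOTSUP);
	assert(legacy.rx_offloads == 0);
}
```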

With best regards, Slava



* [dpdk-dev] [PATCH 0/5] ethdev: introduce Rx buffer split
  2020-08-17 17:49 [dpdk-dev] [RFC] ethdev: introduce Rx buffer split Slava Ovsiienko
  2020-09-17 16:55 ` Andrew Rybchenko
@ 2020-10-05  6:26 ` " Viacheslav Ovsiienko
  2020-10-05  6:26   ` [dpdk-dev] [PATCH 1/5] " Viacheslav Ovsiienko
                     ` (4 more replies)
  2020-10-07 15:06 ` [dpdk-dev] [PATCH v2 0/9] ethdev: introduce Rx buffer split Viacheslav Ovsiienko
                   ` (11 subsequent siblings)
  13 siblings, 5 replies; 172+ messages in thread
From: Viacheslav Ovsiienko @ 2020-10-05  6:26 UTC (permalink / raw)
  To: dev
  Cc: thomasm, stephen, ferruh.yigit, olivier.matz, jerinjacobk,
	maxime.coquelin, david.marchand, arybchenko

The DPDK datapath in the transmit direction is very flexible.
An application can build a multi-segment packet and manage
almost all data aspects - the memory pools where segments
are allocated from, the segment lengths, the memory attributes
like external buffers, registered for DMA, etc.

In the receiving direction, the datapath is much less flexible:
an application can only specify the memory pool to configure the
receiving queue and nothing more. In order to extend the receiving
datapath capabilities it is proposed to add a way to provide
extended information on how to split the packets being received.

The following structure is introduced to specify the Rx packet
segment:

struct rte_eth_rxseg {
    struct rte_mempool *mp; /* memory pools to allocate segment from */
    uint16_t length; /* segment maximal data length */
    uint16_t offset; /* data offset from beginning of mbuf data buffer */
    uint32_t reserved; /* reserved field */
};

The new routine rte_eth_rx_queue_setup_ex() is introduced to
setup the given Rx queue using the new extended Rx packet segment
description:

int
rte_eth_rx_queue_setup_ex(uint16_t port_id, uint16_t rx_queue_id,
                          uint16_t nb_rx_desc, unsigned int socket_id,
                          const struct rte_eth_rxconf *rx_conf,
                          const struct rte_eth_rxseg *rx_seg,
                          uint16_t n_seg)

This routine presents the two new parameters:
    rx_seg - pointer to the array of segment descriptions, each element
             describes the memory pool, maximal data length, initial
             data offset from the beginning of data buffer in mbuf
    n_seg - number of elements in the array

The new offload flag DEV_RX_OFFLOAD_BUFFER_SPLIT in device
capabilities is introduced to present the way for a PMD to report to
the application that it supports splitting Rx packets into configurable
segments. Prior to invoking the rte_eth_rx_queue_setup_ex() routine the
application should check the DEV_RX_OFFLOAD_BUFFER_SPLIT flag.

If the Rx queue is configured with the new routine, the packets being
received will be split into multiple segments pushed to the mbufs
with the specified attributes. The PMD will allocate the first mbuf
from the pool specified in the first segment descriptor and put
the data starting at the specified offset in the allocated mbuf data
buffer. If the packet length exceeds the specified segment length,
the next mbuf will be allocated according to the next segment
descriptor (if any) and data will be put in its data buffer at the
specified offset, not exceeding the specified length. If there is
no next descriptor the next mbuf will be allocated and filled in the
same way (from the same pool and with the same buffer offset/length)
as the current one.

For example, let's suppose we configured the Rx queue with the
following segments:
    seg0 - pool0, len0=14B, off0=RTE_PKTMBUF_HEADROOM
    seg1 - pool1, len1=20B, off1=0B
    seg2 - pool2, len2=20B, off2=0B
    seg3 - pool3, len3=512B, off3=0B

The packet 46 bytes long will look like the following:
    seg0 - 14B long @ RTE_PKTMBUF_HEADROOM in mbuf from pool0
    seg1 - 20B long @ 0 in mbuf from pool1
    seg2 - 12B long @ 0 in mbuf from pool2

The packet 1500 bytes long will look like the following:
    seg0 - 14B @ RTE_PKTMBUF_HEADROOM in mbuf from pool0
    seg1 - 20B @ 0 in mbuf from pool1
    seg2 - 20B @ 0 in mbuf from pool2
    seg3 - 512B @ 0 in mbuf from pool3
    seg4 - 512B @ 0 in mbuf from pool3
    seg5 - 422B @ 0 in mbuf from pool3
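
The two layouts above can be reproduced with a small stand-alone sketch
of the split rule. It is not PMD code: struct rxseg_desc is a stand-in
for the proposed struct rte_eth_rxseg with the mempool pointer replaced
by a plain pool index, and split_layout() is a hypothetical helper that
only computes segment lengths.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the proposed struct rte_eth_rxseg (assumption: the pool
 * pointer is replaced by an index so the sketch builds without DPDK). */
struct rxseg_desc {
	unsigned int pool; /* hypothetical pool index */
	uint16_t length;   /* segment maximal data length */
	uint16_t offset;   /* data offset from beginning of mbuf data buffer */
};

/* One resulting mbuf segment of a received packet. */
struct seg_layout {
	unsigned int pool;
	uint16_t data_len;
	uint16_t offset;
};

/*
 * Compute how a packet of pkt_len bytes is spread over mbufs.
 * Descriptors are consumed in order; once they run out, the last
 * descriptor is reused for every further mbuf.
 * Returns the number of segments written to out, or 0 if out is too
 * small or a descriptor has zero length.
 */
static size_t
split_layout(const struct rxseg_desc *desc, size_t n_desc, uint32_t pkt_len,
	     struct seg_layout *out, size_t max_out)
{
	size_t n = 0;

	assert(desc != NULL && out != NULL && n_desc > 0);
	while (pkt_len > 0) {
		const struct rxseg_desc *d =
			&desc[n < n_desc ? n : n_desc - 1];
		uint16_t chunk;

		if (n == max_out || d->length == 0)
			return 0;
		chunk = pkt_len < d->length ? (uint16_t)pkt_len : d->length;
		out[n].pool = d->pool;
		out[n].data_len = chunk;
		out[n].offset = d->offset;
		pkt_len -= chunk;
		n++;
	}
	return n;
}
```

Fed with the four descriptors of the example, the helper yields three
segments for the 46-byte packet and six for the 1500-byte one, the last
of them 422 bytes long.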

The offload DEV_RX_OFFLOAD_SCATTER must be present and
configured to support new buffer split feature (if n_seg
is greater than one).

The new approach would allow splitting the ingress packets into
multiple parts pushed to memory with different attributes.
For example, the packet headers can be pushed to the embedded
data buffers within mbufs and the application data into
the external buffers attached to mbufs allocated from
different memory pools. The memory attributes for the split
parts may differ as well - for example the application data
may be pushed into external memory located on a dedicated
physical device, say GPU or NVMe. This would improve the DPDK
receiving datapath flexibility while preserving compatibility
with the existing API.

Also, the proposed segment description might be used to specify
Rx packet split for some other features, for example to provide
a way to specify the extra memory pool for the Header Split
feature of some Intel PMDs.

[RFC]: http://patches.dpdk.org/patch/75582/
Related deprecation note (revoked): http://patches.dpdk.org/patch/75205/

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>

Viacheslav Ovsiienko (5):
  ethdev: introduce Rx buffer split
  app/testpmd: add multiple pools per core creation
  app/testpmd: add buffer split offload configuration
  app/testpmd: add rxpkts commands and parameters
  app/testpmd: add extended Rx queue setup

 app/test-pmd/bpf_cmd.c                      |   4 +-
 app/test-pmd/cmdline.c                      |  96 ++++++++++++----
 app/test-pmd/config.c                       |  63 +++++++++-
 app/test-pmd/parameters.c                   |  39 ++++++-
 app/test-pmd/testpmd.c                      | 108 ++++++++++++-----
 app/test-pmd/testpmd.h                      |  41 ++++++-
 doc/guides/nics/features.rst                |  15 +++
 doc/guides/rel_notes/release_20_11.rst      |   6 +
 doc/guides/testpmd_app_ug/run_app.rst       |  16 ++-
 doc/guides/testpmd_app_ug/testpmd_funcs.rst |  21 +++-
 lib/librte_ethdev/rte_ethdev.c              | 172 ++++++++++++++++++++++++++++
 lib/librte_ethdev/rte_ethdev.h              |  16 +++
 lib/librte_ethdev/rte_ethdev_driver.h       |  10 ++
 13 files changed, 535 insertions(+), 72 deletions(-)

-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 172+ messages in thread

* [dpdk-dev] [PATCH 1/5] ethdev: introduce Rx buffer split
  2020-10-05  6:26 ` [dpdk-dev] [PATCH 0/5] " Viacheslav Ovsiienko
@ 2020-10-05  6:26   ` " Viacheslav Ovsiienko
  2020-10-05  6:26   ` [dpdk-dev] [PATCH 2/5] app/testpmd: add multiple pools per core creation Viacheslav Ovsiienko
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 172+ messages in thread
From: Viacheslav Ovsiienko @ 2020-10-05  6:26 UTC (permalink / raw)
  To: dev
  Cc: thomasm, stephen, ferruh.yigit, olivier.matz, jerinjacobk,
	maxime.coquelin, david.marchand, arybchenko

The DPDK datapath in the transmit direction is very flexible.
An application can build a multi-segment packet and manage
almost all data aspects - the memory pools where segments
are allocated from, the segment lengths, the memory attributes
like external buffers, registered for DMA, etc.

In the receiving direction, the datapath is much less flexible:
an application can only specify the memory pool to configure the
receiving queue and nothing more. In order to extend the receiving
datapath capabilities it is proposed to add a way to provide
extended information on how to split the packets being received.

The following structure is introduced to specify the Rx packet
segment:

struct rte_eth_rxseg {
    struct rte_mempool *mp; /* memory pools to allocate segment from */
    uint16_t length; /* segment maximal data length */
    uint16_t offset; /* data offset from beginning of mbuf data buffer */
    uint32_t reserved; /* reserved field */
};

The new routine rte_eth_rx_queue_setup_ex() is introduced to
setup the given Rx queue using the new extended Rx packet segment
description:

int
rte_eth_rx_queue_setup_ex(uint16_t port_id, uint16_t rx_queue_id,
                          uint16_t nb_rx_desc, unsigned int socket_id,
                          const struct rte_eth_rxconf *rx_conf,
		          const struct rte_eth_rxseg *rx_seg,
                          uint16_t n_seg)

This routine presents the two new parameters:
    rx_seg - pointer to the array of segment descriptions, each element
             describes the memory pool, maximal data length, initial
             data offset from the beginning of data buffer in mbuf
    n_seg - number of elements in the array

The new offload flag DEV_RX_OFFLOAD_BUFFER_SPLIT in device
capabilities is introduced to present the way for a PMD to report to
the application that it supports splitting Rx packets into configurable
segments. Prior to invoking the rte_eth_rx_queue_setup_ex() routine the
application should check the DEV_RX_OFFLOAD_BUFFER_SPLIT flag.

If the Rx queue is configured with the new routine, the packets being
received will be split into multiple segments pushed to the mbufs
with the specified attributes. The PMD will allocate the first mbuf
from the pool specified in the first segment descriptor and put
the data starting at the specified offset in the allocated mbuf data
buffer. If the packet length exceeds the specified segment length,
the next mbuf will be allocated according to the next segment
descriptor (if any) and data will be put in its data buffer at the
specified offset, not exceeding the specified length. If there is
no next descriptor the next mbuf will be allocated and filled in the
same way (from the same pool and with the same buffer offset/length)
as the current one.

For example, let's suppose we configured the Rx queue with the
following segments:
    seg0 - pool0, len0=14B, off0=RTE_PKTMBUF_HEADROOM
    seg1 - pool1, len1=20B, off1=0B
    seg2 - pool2, len2=20B, off2=0B
    seg3 - pool3, len3=512B, off3=0B

The packet 46 bytes long will look like the following:
    seg0 - 14B long @ RTE_PKTMBUF_HEADROOM in mbuf from pool0
    seg1 - 20B long @ 0 in mbuf from pool1
    seg2 - 12B long @ 0 in mbuf from pool2

The packet 1500 bytes long will look like the following:
    seg0 - 14B @ RTE_PKTMBUF_HEADROOM in mbuf from pool0
    seg1 - 20B @ 0 in mbuf from pool1
    seg2 - 20B @ 0 in mbuf from pool2
    seg3 - 512B @ 0 in mbuf from pool3
    seg4 - 512B @ 0 in mbuf from pool3
    seg5 - 422B @ 0 in mbuf from pool3

The offload DEV_RX_OFFLOAD_SCATTER must be present and
configured to support new buffer split feature (if n_seg
is greater than one).

The new approach would allow splitting the ingress packets into
multiple parts pushed to memory with different attributes.
For example, the packet headers can be pushed to the embedded
data buffers within mbufs and the application data into
the external buffers attached to mbufs allocated from
different memory pools. The memory attributes for the split
parts may differ as well - for example the application data
may be pushed into external memory located on a dedicated
physical device, say GPU or NVMe. This would improve the DPDK
receiving datapath flexibility while preserving compatibility
with the existing API.

Also, the proposed segment description might be used to specify
Rx packet split for some other features, for example to provide
a way to specify the extra memory pool for the Header Split
feature of some Intel PMDs.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
 doc/guides/nics/features.rst           |  15 +++
 doc/guides/rel_notes/release_20_11.rst |   6 ++
 lib/librte_ethdev/rte_ethdev.c         | 172 +++++++++++++++++++++++++++++++++
 lib/librte_ethdev/rte_ethdev.h         |  16 +++
 lib/librte_ethdev/rte_ethdev_driver.h  |  10 ++
 5 files changed, 219 insertions(+)

diff --git a/doc/guides/nics/features.rst b/doc/guides/nics/features.rst
index dd8c955..ac9dfd7 100644
--- a/doc/guides/nics/features.rst
+++ b/doc/guides/nics/features.rst
@@ -185,6 +185,21 @@ Supports receiving segmented mbufs.
 * **[related]    eth_dev_ops**: ``rx_pkt_burst``.
 
 
+.. _nic_features_buffer_split:
+
+Buffer Split on Rx
+------------------
+
+Scatters the packets being received on specified boundaries to segmented mbufs.
+
+* **[uses]       rte_eth_rxconf,rte_eth_rxmode**: ``offloads:DEV_RX_OFFLOAD_BUFFER_SPLIT``.
+* **[implements] datapath**: ``Buffer Split functionality``.
+* **[implements] rte_eth_dev_data**: ``buffer_split``.
+* **[provides]   rte_eth_dev_info**: ``rx_offload_capa:DEV_RX_OFFLOAD_BUFFER_SPLIT``.
+* **[provides]   eth_dev_ops**: ``rxq_info_get:buffer_split``.
+* **[related]    API**: ``rte_eth_rx_queue_setup_ex()``.
+
+
 .. _nic_features_lro:
 
 LRO
diff --git a/doc/guides/rel_notes/release_20_11.rst b/doc/guides/rel_notes/release_20_11.rst
index 4bcf220..8da5cc9 100644
--- a/doc/guides/rel_notes/release_20_11.rst
+++ b/doc/guides/rel_notes/release_20_11.rst
@@ -55,6 +55,12 @@ New Features
      Also, make sure to start the actual text at the margin.
      =======================================================
 
+* **Introduced extended buffer description for receiving.**
+
+  * Added extended Rx queue setup routine
+  * Added description for Rx segment sizes
+  * Added capability to specify the memory pool for each segment
+
 * **Updated Cisco enic driver.**
 
   * Added support for VF representors with single-queue Tx/Rx and flow API
diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
index dfe5c1b..ace7567 100644
--- a/lib/librte_ethdev/rte_ethdev.c
+++ b/lib/librte_ethdev/rte_ethdev.c
@@ -128,6 +128,7 @@ struct rte_eth_xstats_name_off {
 	RTE_RX_OFFLOAD_BIT2STR(SCTP_CKSUM),
 	RTE_RX_OFFLOAD_BIT2STR(OUTER_UDP_CKSUM),
 	RTE_RX_OFFLOAD_BIT2STR(RSS_HASH),
+	RTE_RX_OFFLOAD_BIT2STR(BUFFER_SPLIT),
 };
 
 #undef RTE_RX_OFFLOAD_BIT2STR
@@ -1933,6 +1934,177 @@ struct rte_eth_dev *
 }
 
 int
+rte_eth_rx_queue_setup_ex(uint16_t port_id, uint16_t rx_queue_id,
+			  uint16_t nb_rx_desc, unsigned int socket_id,
+			  const struct rte_eth_rxconf *rx_conf,
+			  const struct rte_eth_rxseg *rx_seg, uint16_t n_seg)
+{
+	int ret;
+	uint16_t seg_idx;
+	uint32_t mbp_buf_size;
+	struct rte_eth_dev *dev;
+	struct rte_eth_dev_info dev_info;
+	struct rte_eth_rxconf local_conf;
+	void **rxq;
+
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
+
+	dev = &rte_eth_devices[port_id];
+	if (rx_queue_id >= dev->data->nb_rx_queues) {
+		RTE_ETHDEV_LOG(ERR, "Invalid RX queue_id=%u\n", rx_queue_id);
+		return -EINVAL;
+	}
+
+	if (rx_seg == NULL) {
+		RTE_ETHDEV_LOG(ERR, "Invalid null description pointer\n");
+		return -EINVAL;
+	}
+
+	if (n_seg == 0) {
+		RTE_ETHDEV_LOG(ERR, "Invalid zero description number\n");
+		return -EINVAL;
+	}
+
+	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_setup_ex, -ENOTSUP);
+
+	/*
+	 * Check the size of the mbuf data buffer.
+	 * This value must be provided in the private data of the memory pool.
+	 * First check that the memory pool has a valid private data.
+	 */
+	ret = rte_eth_dev_info_get(port_id, &dev_info);
+	if (ret != 0)
+		return ret;
+
+	for (seg_idx = 0; seg_idx < n_seg; seg_idx++) {
+		struct rte_mempool *mp = rx_seg[seg_idx].mp;
+
+		if (mp->private_data_size <
+				sizeof(struct rte_pktmbuf_pool_private)) {
+			RTE_ETHDEV_LOG(ERR, "%s private_data_size %d < %d\n",
+				mp->name, (int)mp->private_data_size,
+				(int)sizeof(struct rte_pktmbuf_pool_private));
+			return -ENOSPC;
+		}
+
+		mbp_buf_size = rte_pktmbuf_data_room_size(mp);
+		if (mbp_buf_size <
+			 rx_seg[seg_idx].length + rx_seg[seg_idx].offset) {
+			RTE_ETHDEV_LOG(ERR,
+				"%s mbuf_data_room_size %d < %d"
+				" (segment length=%d + segment offset=%d)\n",
+				mp->name, (int)mbp_buf_size,
+				(int)(rx_seg[seg_idx].length +
+				      rx_seg[seg_idx].offset),
+				(int)rx_seg[seg_idx].length,
+				(int)rx_seg[seg_idx].offset);
+			return -EINVAL;
+		}
+	}
+
+	/* Use default specified by driver, if nb_rx_desc is zero */
+	if (nb_rx_desc == 0) {
+		nb_rx_desc = dev_info.default_rxportconf.ring_size;
+		/* If driver default is also zero, fall back on EAL default */
+		if (nb_rx_desc == 0)
+			nb_rx_desc = RTE_ETH_DEV_FALLBACK_RX_RINGSIZE;
+	}
+
+	if (nb_rx_desc > dev_info.rx_desc_lim.nb_max ||
+			nb_rx_desc < dev_info.rx_desc_lim.nb_min ||
+			nb_rx_desc % dev_info.rx_desc_lim.nb_align != 0) {
+
+		RTE_ETHDEV_LOG(ERR,
+			"Invalid value for nb_rx_desc(=%hu), should be: "
+			"<= %hu, >= %hu, and a product of %hu\n",
+			nb_rx_desc, dev_info.rx_desc_lim.nb_max,
+			dev_info.rx_desc_lim.nb_min,
+			dev_info.rx_desc_lim.nb_align);
+		return -EINVAL;
+	}
+
+	if (dev->data->dev_started &&
+		!(dev_info.dev_capa &
+			RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP))
+		return -EBUSY;
+
+	if (dev->data->dev_started &&
+		(dev->data->rx_queue_state[rx_queue_id] !=
+			RTE_ETH_QUEUE_STATE_STOPPED))
+		return -EBUSY;
+
+	rxq = dev->data->rx_queues;
+	if (rxq[rx_queue_id]) {
+		RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_release,
+					-ENOTSUP);
+		(*dev->dev_ops->rx_queue_release)(rxq[rx_queue_id]);
+		rxq[rx_queue_id] = NULL;
+	}
+
+	if (rx_conf == NULL)
+		rx_conf = &dev_info.default_rxconf;
+
+	local_conf = *rx_conf;
+
+	/*
+	 * If an offloading has already been enabled in
+	 * rte_eth_dev_configure(), it has been enabled on all queues,
+	 * so there is no need to enable it in this queue again.
+	 * The local_conf.offloads input to underlying PMD only carries
+	 * those offloadings which are only enabled on this queue and
+	 * not enabled on all queues.
+	 */
+	local_conf.offloads &= ~dev->data->dev_conf.rxmode.offloads;
+
+	/*
+	 * New added offloadings for this queue are those not enabled in
+	 * rte_eth_dev_configure() and they must be per-queue type.
+	 * A pure per-port offloading can't be enabled on a queue while
+	 * disabled on another queue. A pure per-port offloading can't
+	 * be enabled for any queue as new added one if it hasn't been
+	 * enabled in rte_eth_dev_configure().
+	 */
+	if ((local_conf.offloads & dev_info.rx_queue_offload_capa) !=
+	     local_conf.offloads) {
+		RTE_ETHDEV_LOG(ERR,
+			"Ethdev port_id=%d rx_queue_id=%d, new added offloads"
+			" 0x%"PRIx64" must be within per-queue offload"
+			" capabilities 0x%"PRIx64" in %s()\n",
+			port_id, rx_queue_id, local_conf.offloads,
+			dev_info.rx_queue_offload_capa,
+			__func__);
+		return -EINVAL;
+	}
+
+	/*
+	 * If LRO is enabled, check that the maximum aggregated packet
+	 * size is supported by the configured device.
+	 */
+	if (local_conf.offloads & DEV_RX_OFFLOAD_TCP_LRO) {
+		if (dev->data->dev_conf.rxmode.max_lro_pkt_size == 0)
+			dev->data->dev_conf.rxmode.max_lro_pkt_size =
+				dev->data->dev_conf.rxmode.max_rx_pkt_len;
+		ret = check_lro_pkt_size(port_id,
+				dev->data->dev_conf.rxmode.max_lro_pkt_size,
+				dev->data->dev_conf.rxmode.max_rx_pkt_len,
+				dev_info.max_lro_pkt_size);
+		if (ret != 0)
+			return ret;
+	}
+
+	ret = (*dev->dev_ops->rx_queue_setup_ex)(dev, rx_queue_id, nb_rx_desc,
+						 socket_id, &local_conf,
+						 rx_seg, n_seg);
+	if (!ret) {
+		if (!dev->data->min_rx_buf_size ||
+		    dev->data->min_rx_buf_size > mbp_buf_size)
+			dev->data->min_rx_buf_size = mbp_buf_size;
+	}
+
+	return eth_err(port_id, ret);
+}
+
+int
 rte_eth_rx_hairpin_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
 			       uint16_t nb_rx_desc,
 			       const struct rte_eth_hairpin_conf *conf)
diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
index 645a186..553900b 100644
--- a/lib/librte_ethdev/rte_ethdev.h
+++ b/lib/librte_ethdev/rte_ethdev.h
@@ -970,6 +970,16 @@ struct rte_eth_txmode {
 };
 
 /**
+ * A structure used to configure an RX packet segment to split.
+ */
+struct rte_eth_rxseg {
+	struct rte_mempool *mp; /**< Memory pools to allocate segment from */
+	uint16_t length; /**< Segment maximal data length */
+	uint16_t offset; /**< Data offset from beginning of mbuf data buffer */
+	uint32_t reserved; /**< Reserved field */
+};
+
+/**
  * A structure used to configure an RX ring of an Ethernet port.
  */
 struct rte_eth_rxconf {
@@ -1260,6 +1270,7 @@ struct rte_eth_conf {
 #define DEV_RX_OFFLOAD_SCTP_CKSUM	0x00020000
 #define DEV_RX_OFFLOAD_OUTER_UDP_CKSUM  0x00040000
 #define DEV_RX_OFFLOAD_RSS_HASH		0x00080000
+#define DEV_RX_OFFLOAD_BUFFER_SPLIT     0x00100000
 
 #define DEV_RX_OFFLOAD_CHECKSUM (DEV_RX_OFFLOAD_IPV4_CKSUM | \
 				 DEV_RX_OFFLOAD_UDP_CKSUM | \
@@ -2020,6 +2031,11 @@ int rte_eth_rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
 		const struct rte_eth_rxconf *rx_conf,
 		struct rte_mempool *mb_pool);
 
+int rte_eth_rx_queue_setup_ex(uint16_t port_id, uint16_t rx_queue_id,
+		uint16_t nb_rx_desc, unsigned int socket_id,
+		const struct rte_eth_rxconf *rx_conf,
+		const struct rte_eth_rxseg *rx_seg, uint16_t n_seg);
+
 /**
  * @warning
  * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
diff --git a/lib/librte_ethdev/rte_ethdev_driver.h b/lib/librte_ethdev/rte_ethdev_driver.h
index 04ac8e9..de4d7de 100644
--- a/lib/librte_ethdev/rte_ethdev_driver.h
+++ b/lib/librte_ethdev/rte_ethdev_driver.h
@@ -264,6 +264,15 @@ typedef int (*eth_rx_queue_setup_t)(struct rte_eth_dev *dev,
 				    struct rte_mempool *mb_pool);
 /**< @internal Set up a receive queue of an Ethernet device. */
 
+typedef int (*eth_rx_queue_setup_ex_t)(struct rte_eth_dev *dev,
+				       uint16_t rx_queue_id,
+				       uint16_t nb_rx_desc,
+				       unsigned int socket_id,
+				       const struct rte_eth_rxconf *rx_conf,
+				       const struct rte_eth_rxseg *rx_seg,
+				       uint16_t n_seg);
+/**< @internal Extended setup of a receive queue of an Ethernet device. */
+
 typedef int (*eth_tx_queue_setup_t)(struct rte_eth_dev *dev,
 				    uint16_t tx_queue_id,
 				    uint16_t nb_tx_desc,
@@ -630,6 +639,7 @@ struct eth_dev_ops {
 	eth_queue_start_t          tx_queue_start;/**< Start TX for a queue. */
 	eth_queue_stop_t           tx_queue_stop; /**< Stop TX for a queue. */
 	eth_rx_queue_setup_t       rx_queue_setup;/**< Set up device RX queue. */
+	eth_rx_queue_setup_ex_t    rx_queue_setup_ex;/**< Extended RX setup. */
 	eth_queue_release_t        rx_queue_release; /**< Release RX queue. */
 
 	eth_rx_enable_intr_t       rx_queue_intr_enable;  /**< Enable Rx queue interrupt. */
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 172+ messages in thread

* [dpdk-dev] [PATCH 2/5] app/testpmd: add multiple pools per core creation
  2020-10-05  6:26 ` [dpdk-dev] [PATCH 0/5] " Viacheslav Ovsiienko
  2020-10-05  6:26   ` [dpdk-dev] [PATCH 1/5] " Viacheslav Ovsiienko
@ 2020-10-05  6:26   ` Viacheslav Ovsiienko
  2020-10-05  6:26   ` [dpdk-dev] [PATCH 3/5] app/testpmd: add buffer split offload configuration Viacheslav Ovsiienko
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 172+ messages in thread
From: Viacheslav Ovsiienko @ 2020-10-05  6:26 UTC (permalink / raw)
  To: dev
  Cc: thomasm, stephen, ferruh.yigit, olivier.matz, jerinjacobk,
	maxime.coquelin, david.marchand, arybchenko

The command line parameter --mbuf-size is updated so that it can handle
multiple values like the following:

--mbuf-size=2176,512,768,4096

specifying the creation of extra memory pools with the requested
mbuf data buffer sizes. If some buffer split feature is engaged,
the extra memory pools can be used to configure the Rx queues
with rte_eth_rx_queue_setup_ex().
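
A stand-alone sketch of how such a comma-separated size list might be
parsed and range-checked. testpmd itself relies on its parse_item_list()
helper; the function parse_mbuf_sizes() below is hypothetical and only
mirrors the "> 0 and < 65536" rule applied by the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse "N[,N1[,..Nn]]" into sizes[].
 * Every value must be in the range 1..65535 and at most max_sizes
 * values are accepted. Returns the number of parsed values or -1
 * on any malformed or out-of-range input.
 */
static int
parse_mbuf_sizes(const char *arg, uint16_t *sizes, unsigned int max_sizes)
{
	unsigned int n = 0;
	const char *p = arg;

	assert(arg != NULL && sizes != NULL);
	for (;;) {
		char *end;
		unsigned long v;

		errno = 0;
		v = strtoul(p, &end, 10);
		/* Reject empty items, overflow, zero and values > 16 bits. */
		if (end == p || errno != 0 || v == 0 || v > 0xFFFF)
			return -1;
		if (n == max_sizes)
			return -1;
		sizes[n++] = (uint16_t)v;
		if (*end == '\0')
			return (int)n;
		if (*end != ',')
			return -1;
		p = end + 1;
	}
}
```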

The extra pools are created with requested sizes, and pool names
are assigned with appended index: mbuf_pool_socket_%socket_%index.
Index zero is used to specify the first mandatory pool to maintain
compatibility with existing code.
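
The naming rule can be sketched stand-alone as follows. The function
poolname_build() is a hypothetical copy of the logic of
mbuf_poolname_build() from this patch, reduced to standard C so that it
builds outside testpmd.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build the pool name for a socket and size index.
 * Index zero keeps the historical name without a suffix, so existing
 * lookups of the first pool keep working; other indexes are appended.
 */
static void
poolname_build(unsigned int sock_id, unsigned int idx,
	       char *name, size_t size)
{
	assert(name != NULL && size > 0);
	if (idx == 0)
		snprintf(name, size, "mbuf_pool_socket_%u", sock_id);
	else
		snprintf(name, size, "mbuf_pool_socket_%u_%u", sock_id, idx);
}
```

For socket 1 this gives "mbuf_pool_socket_1" for the mandatory pool and
"mbuf_pool_socket_1_2" for the pool with index 2.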

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
 app/test-pmd/bpf_cmd.c                |  4 +--
 app/test-pmd/cmdline.c                |  2 +-
 app/test-pmd/config.c                 |  6 ++--
 app/test-pmd/parameters.c             | 24 +++++++++----
 app/test-pmd/testpmd.c                | 63 +++++++++++++++++++----------------
 app/test-pmd/testpmd.h                | 24 ++++++++++---
 doc/guides/testpmd_app_ug/run_app.rst |  7 ++--
 7 files changed, 83 insertions(+), 47 deletions(-)

diff --git a/app/test-pmd/bpf_cmd.c b/app/test-pmd/bpf_cmd.c
index 0f984cc..d8bb7ca 100644
--- a/app/test-pmd/bpf_cmd.c
+++ b/app/test-pmd/bpf_cmd.c
@@ -69,7 +69,7 @@ struct cmd_bpf_ld_result {
 
 	*flags = RTE_BPF_ETH_F_NONE;
 	arg->type = RTE_BPF_ARG_PTR;
-	arg->size = mbuf_data_size;
+	arg->size = mbuf_data_size[0];
 
 	for (i = 0; str[i] != 0; i++) {
 		v = toupper(str[i]);
@@ -78,7 +78,7 @@ struct cmd_bpf_ld_result {
 		else if (v == 'M') {
 			arg->type = RTE_BPF_ARG_PTR_MBUF;
 			arg->size = sizeof(struct rte_mbuf);
-			arg->buf_size = mbuf_data_size;
+			arg->buf_size = mbuf_data_size[0];
 		} else if (v == '-')
 			continue;
 		else
diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 08e123f..3f57182 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -2898,7 +2898,7 @@ struct cmd_setup_rxtx_queue {
 		if (!numa_support || socket_id == NUMA_NO_CONFIG)
 			socket_id = port->socket_id;
 
-		mp = mbuf_pool_find(socket_id);
+		mp = mbuf_pool_find(socket_id, 0);
 		if (mp == NULL) {
 			printf("Failed to setup RX queue: "
 				"No mempool allocation"
diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index 17a6efe..7048288 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -625,7 +625,7 @@ static int bus_match_all(const struct rte_bus *bus, const void *data)
 	printf("\nConnect to socket: %u", port->socket_id);
 
 	if (port_numa[port_id] != NUMA_NO_CONFIG) {
-		mp = mbuf_pool_find(port_numa[port_id]);
+		mp = mbuf_pool_find(port_numa[port_id], 0);
 		if (mp)
 			printf("\nmemory allocation on the socket: %d",
 							port_numa[port_id]);
@@ -3124,9 +3124,9 @@ struct igb_ring_desc_16_bytes {
 	 */
 	tx_pkt_len = 0;
 	for (i = 0; i < nb_segs; i++) {
-		if (seg_lengths[i] > (unsigned) mbuf_data_size) {
+		if (seg_lengths[i] > mbuf_data_size[0]) {
 			printf("length[%u]=%u > mbuf_data_size=%u - give up\n",
-			       i, seg_lengths[i], (unsigned) mbuf_data_size);
+			       i, seg_lengths[i], mbuf_data_size[0]);
 			return;
 		}
 		tx_pkt_len = (uint16_t)(tx_pkt_len + seg_lengths[i]);
diff --git a/app/test-pmd/parameters.c b/app/test-pmd/parameters.c
index 1ead595..1f40d73 100644
--- a/app/test-pmd/parameters.c
+++ b/app/test-pmd/parameters.c
@@ -106,7 +106,9 @@
 	       "(flag: 1 for RX; 2 for TX; 3 for RX and TX).\n");
 	printf("  --socket-num=N: set socket from which all memory is allocated "
 	       "in NUMA mode.\n");
-	printf("  --mbuf-size=N: set the data size of mbuf to N bytes.\n");
+	printf("  --mbuf-size=N[,N1[,..Nn]]: set the data size of mbuf to "
+	       "N bytes. If multiple numbers are specified the extra pools "
+	       "will be created to receive with packet split features\n");
 	printf("  --total-num-mbufs=N: set the number of mbufs to be allocated "
 	       "in mbuf pools.\n");
 	printf("  --max-pkt-len=N: set the maximum size of packet to N bytes.\n");
@@ -890,12 +892,22 @@
 				}
 			}
 			if (!strcmp(lgopts[opt_idx].name, "mbuf-size")) {
-				n = atoi(optarg);
-				if (n > 0 && n <= 0xFFFF)
-					mbuf_data_size = (uint16_t) n;
-				else
+				unsigned int mb_sz[MAX_SEGS_BUFFER_SPLIT];
+				unsigned int nb_segs, i;
+
+				nb_segs = parse_item_list(optarg, "mbuf-size",
+					MAX_SEGS_BUFFER_SPLIT, mb_sz, 0);
+				if (nb_segs <= 0)
 					rte_exit(EXIT_FAILURE,
-						 "mbuf-size should be > 0 and < 65536\n");
+						 "bad mbuf-size\n");
+				for (i = 0; i < nb_segs; i++) {
+					if (mb_sz[i] <= 0 || mb_sz[i] > 0xFFFF)
+						rte_exit(EXIT_FAILURE,
+							 "mbuf-size should be "
+							 "> 0 and < 65536\n");
+					mbuf_data_size[i] = (uint16_t) mb_sz[i];
+				}
+				mbuf_data_size_n = nb_segs;
 			}
 			if (!strcmp(lgopts[opt_idx].name, "total-num-mbufs")) {
 				n = atoi(optarg);
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index fe6450c..f5060ee 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -186,7 +186,7 @@ struct fwd_engine * fwd_engines[] = {
 	NULL,
 };
 
-struct rte_mempool *mempools[RTE_MAX_NUMA_NODES];
+struct rte_mempool *mempools[RTE_MAX_NUMA_NODES * MAX_SEGS_BUFFER_SPLIT];
 uint16_t mempool_flags;
 
 struct fwd_config cur_fwd_config;
@@ -195,7 +195,10 @@ struct fwd_engine * fwd_engines[] = {
 uint32_t burst_tx_delay_time = BURST_TX_WAIT_US;
 uint32_t burst_tx_retry_num = BURST_TX_RETRIES;
 
-uint16_t mbuf_data_size = DEFAULT_MBUF_DATA_SIZE; /**< Mbuf data space size. */
+uint32_t mbuf_data_size_n = 1; /* Number of specified mbuf sizes. */
+uint16_t mbuf_data_size[MAX_SEGS_BUFFER_SPLIT] = {
+	DEFAULT_MBUF_DATA_SIZE
+}; /**< Mbuf data space size. */
 uint32_t param_total_num_mbufs = 0;  /**< number of mbufs in all pools - if
                                       * specified on command-line. */
 uint16_t stats_period; /**< Period to show statistics (disabled by default) */
@@ -955,14 +958,14 @@ struct extmem_param {
  */
 static struct rte_mempool *
 mbuf_pool_create(uint16_t mbuf_seg_size, unsigned nb_mbuf,
-		 unsigned int socket_id)
+		 unsigned int socket_id, unsigned int size_idx)
 {
 	char pool_name[RTE_MEMPOOL_NAMESIZE];
 	struct rte_mempool *rte_mp = NULL;
 	uint32_t mb_size;
 
 	mb_size = sizeof(struct rte_mbuf) + mbuf_seg_size;
-	mbuf_poolname_build(socket_id, pool_name, sizeof(pool_name));
+	mbuf_poolname_build(socket_id, pool_name, sizeof(pool_name), size_idx);
 
 	TESTPMD_LOG(INFO,
 		"create a new mbuf pool <%s>: n=%u, size=%u, socket=%u\n",
@@ -1485,8 +1488,8 @@ struct extmem_param {
 				port->dev_info.rx_desc_lim.nb_mtu_seg_max;
 
 			if ((data_size + RTE_PKTMBUF_HEADROOM) >
-							mbuf_data_size) {
-				mbuf_data_size = data_size +
+							mbuf_data_size[0]) {
+				mbuf_data_size[0] = data_size +
 						 RTE_PKTMBUF_HEADROOM;
 				warning = 1;
 			}
@@ -1494,9 +1497,9 @@ struct extmem_param {
 	}
 
 	if (warning)
-		TESTPMD_LOG(WARNING, "Configured mbuf size %hu\n",
-			    mbuf_data_size);
-
+		TESTPMD_LOG(WARNING,
+			    "Configured mbuf size of the first segment %hu\n",
+			    mbuf_data_size[0]);
 	/*
 	 * Create pools of mbuf.
 	 * If NUMA support is disabled, create a single pool of mbuf in
@@ -1516,21 +1519,23 @@ struct extmem_param {
 	}
 
 	if (numa_support) {
-		uint8_t i;
+		uint8_t i, j;
 
 		for (i = 0; i < num_sockets; i++)
-			mempools[i] = mbuf_pool_create(mbuf_data_size,
-						       nb_mbuf_per_pool,
-						       socket_ids[i]);
+			for (j = 0; j < mbuf_data_size_n; j++)
+				mempools[i * MAX_SEGS_BUFFER_SPLIT + j] =
+					mbuf_pool_create(mbuf_data_size[j],
+							  nb_mbuf_per_pool,
+							  socket_ids[i], j);
 	} else {
-		if (socket_num == UMA_NO_CONFIG)
-			mempools[0] = mbuf_pool_create(mbuf_data_size,
-						       nb_mbuf_per_pool, 0);
-		else
-			mempools[socket_num] = mbuf_pool_create
-							(mbuf_data_size,
-							 nb_mbuf_per_pool,
-							 socket_num);
+		uint8_t i;
+
+		for (i = 0; i < mbuf_data_size_n; i++)
+			mempools[i] = mbuf_pool_create
+					(mbuf_data_size[i],
+					 nb_mbuf_per_pool,
+					 socket_num == UMA_NO_CONFIG ?
+					 0 : socket_num, i);
 	}
 
 	init_port_config();
@@ -1542,10 +1547,10 @@ struct extmem_param {
 	 */
 	for (lc_id = 0; lc_id < nb_lcores; lc_id++) {
 		mbp = mbuf_pool_find(
-			rte_lcore_to_socket_id(fwd_lcores_cpuids[lc_id]));
+			rte_lcore_to_socket_id(fwd_lcores_cpuids[lc_id]), 0);
 
 		if (mbp == NULL)
-			mbp = mbuf_pool_find(0);
+			mbp = mbuf_pool_find(0, 0);
 		fwd_lcores[lc_id]->mbp = mbp;
 		/* initialize GSO context */
 		fwd_lcores[lc_id]->gso_ctx.direct_pool = mbp;
@@ -2498,7 +2503,8 @@ struct extmem_param {
 				if ((numa_support) &&
 					(rxring_numa[pi] != NUMA_NO_CONFIG)) {
 					struct rte_mempool * mp =
-						mbuf_pool_find(rxring_numa[pi]);
+						mbuf_pool_find
+							(rxring_numa[pi], 0);
 					if (mp == NULL) {
 						printf("Failed to setup RX queue:"
 							"No mempool allocation"
@@ -2514,7 +2520,8 @@ struct extmem_param {
 					     mp);
 				} else {
 					struct rte_mempool *mp =
-						mbuf_pool_find(port->socket_id);
+						mbuf_pool_find
+							(port->socket_id, 0);
 					if (mp == NULL) {
 						printf("Failed to setup RX queue:"
 							"No mempool allocation"
@@ -2928,13 +2935,13 @@ struct extmem_param {
 pmd_test_exit(void)
 {
 	portid_t pt_id;
+	unsigned int i;
 	int ret;
-	int i;
 
 	if (test_done == 0)
 		stop_packet_forwarding();
 
-	for (i = 0 ; i < RTE_MAX_NUMA_NODES ; i++) {
+	for (i = 0 ; i < RTE_DIM(mempools) ; i++) {
 		if (mempools[i]) {
 			if (mp_alloc_type == MP_ALLOC_ANON)
 				rte_mempool_mem_iter(mempools[i], dma_unmap_cb,
@@ -2978,7 +2985,7 @@ struct extmem_param {
 			return;
 		}
 	}
-	for (i = 0 ; i < RTE_MAX_NUMA_NODES ; i++) {
+	for (i = 0 ; i < RTE_DIM(mempools) ; i++) {
 		if (mempools[i])
 			rte_mempool_free(mempools[i]);
 	}
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index c7e7e41..e5cdd12 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -42,6 +42,13 @@
  */
 #define RTE_MAX_SEGS_PER_PKT 255 /**< nb_segs is a 8-bit unsigned char. */
 
+/*
+ * The maximum number of segments per packet is used to configure
+ * buffer split feature, also specifies the maximum amount of
+ * optional Rx pools to allocate mbufs to split.
+ */
+#define MAX_SEGS_BUFFER_SPLIT 8 /**< Maximal number of segments to split. */
+
 #define MAX_PKT_BURST 512
 #define DEF_PKT_BURST 32
 
@@ -393,7 +400,9 @@ struct queue_stats_mappings {
 extern uint8_t dcb_config;
 extern uint8_t dcb_test;
 
-extern uint16_t mbuf_data_size; /**< Mbuf data space size. */
+extern uint32_t mbuf_data_size_n;
+extern uint16_t mbuf_data_size[MAX_SEGS_BUFFER_SPLIT];
+/**< Mbuf data space size. */
 extern uint32_t param_total_num_mbufs;
 
 extern uint16_t stats_period;
@@ -604,17 +613,22 @@ struct mplsoudp_decap_conf {
 
 /* Mbuf Pools */
 static inline void
-mbuf_poolname_build(unsigned int sock_id, char* mp_name, int name_size)
+mbuf_poolname_build(unsigned int sock_id, char *mp_name,
+		    int name_size, unsigned int idx)
 {
-	snprintf(mp_name, name_size, "mbuf_pool_socket_%u", sock_id);
+	if (!idx)
+		snprintf(mp_name, name_size, "mbuf_pool_socket_%u", sock_id);
+	else
+		snprintf(mp_name, name_size, "mbuf_pool_socket_%u_%u",
+			 sock_id, idx);
 }
 
 static inline struct rte_mempool *
-mbuf_pool_find(unsigned int sock_id)
+mbuf_pool_find(unsigned int sock_id, unsigned int idx)
 {
 	char pool_name[RTE_MEMPOOL_NAMESIZE];
 
-	mbuf_poolname_build(sock_id, pool_name, sizeof(pool_name));
+	mbuf_poolname_build(sock_id, pool_name, sizeof(pool_name), idx);
 	return rte_mempool_lookup((const char *)pool_name);
 }
 
diff --git a/doc/guides/testpmd_app_ug/run_app.rst b/doc/guides/testpmd_app_ug/run_app.rst
index e2539f6..2d5a263 100644
--- a/doc/guides/testpmd_app_ug/run_app.rst
+++ b/doc/guides/testpmd_app_ug/run_app.rst
@@ -107,9 +107,12 @@ The command line options are:
     Set the socket from which all memory is allocated in NUMA mode,
     where 0 <= N < number of sockets on the board.
 
-*   ``--mbuf-size=N``
+*   ``--mbuf-size=N[,N1[,...Nn]]``
 
-    Set the data size of the mbufs used to N bytes, where N < 65536. The default value is 2048.
+    Set the data size of the mbufs used to N bytes, where N < 65536.
+    The default value is 2048. If multiple mbuf-size values are specified, the
+    extra memory pools will be created for allocating mbufs to receive packets
+    with the buffer split feature.
 
 *   ``--total-num-mbufs=N``
 
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 172+ messages in thread

* [dpdk-dev] [PATCH 3/5] app/testpmd: add buffer split offload configuration
  2020-10-05  6:26 ` [dpdk-dev] [PATCH 0/5] " Viacheslav Ovsiienko
  2020-10-05  6:26   ` [dpdk-dev] [PATCH 1/5] " Viacheslav Ovsiienko
  2020-10-05  6:26   ` [dpdk-dev] [PATCH 2/5] app/testpmd: add multiple pools per core creation Viacheslav Ovsiienko
@ 2020-10-05  6:26   ` Viacheslav Ovsiienko
  2020-10-05  6:26   ` [dpdk-dev] [PATCH 4/5] app/testpmd: add rxpkts commands and parameters Viacheslav Ovsiienko
  2020-10-05  6:26   ` [dpdk-dev] [PATCH 5/5] app/testpmd: add extended Rx queue setup Viacheslav Ovsiienko
  4 siblings, 0 replies; 172+ messages in thread
From: Viacheslav Ovsiienko @ 2020-10-05  6:26 UTC (permalink / raw)
  To: dev
  Cc: thomasm, stephen, ferruh.yigit, olivier.matz, jerinjacobk,
	maxime.coquelin, david.marchand, arybchenko

This patch adds support for DEV_RX_OFFLOAD_BUFFER_SPLIT,
providing per-queue configuration for this offload.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
 app/test-pmd/cmdline.c | 21 +++++++++++----------
 app/test-pmd/config.c  |  9 +++++++++
 2 files changed, 20 insertions(+), 10 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 3f57182..24ca56a 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -874,16 +874,16 @@ static void cmd_help_long_parsed(void *parsed_result,
 			"port config <port_id> rx_offload vlan_strip|"
 			"ipv4_cksum|udp_cksum|tcp_cksum|tcp_lro|qinq_strip|"
 			"outer_ipv4_cksum|macsec_strip|header_split|"
-			"vlan_filter|vlan_extend|jumbo_frame|"
-			"scatter|timestamp|security|keep_crc on|off\n"
+			"vlan_filter|vlan_extend|jumbo_frame|scatter|"
+			"buffer_split|timestamp|security|keep_crc on|off\n"
 			"     Enable or disable a per port Rx offloading"
 			" on all Rx queues of a port\n\n"
 
 			"port (port_id) rxq (queue_id) rx_offload vlan_strip|"
 			"ipv4_cksum|udp_cksum|tcp_cksum|tcp_lro|qinq_strip|"
 			"outer_ipv4_cksum|macsec_strip|header_split|"
-			"vlan_filter|vlan_extend|jumbo_frame|"
-			"scatter|timestamp|security|keep_crc on|off\n"
+			"vlan_filter|vlan_extend|jumbo_frame|scatter|"
+			"buffer_split|timestamp|security|keep_crc on|off\n"
 			"    Enable or disable a per queue Rx offloading"
 			" only on a specific Rx queue\n\n"
 
@@ -18399,7 +18399,8 @@ struct cmd_config_per_port_rx_offload_result {
 		 offload, "vlan_strip#ipv4_cksum#udp_cksum#tcp_cksum#tcp_lro#"
 			   "qinq_strip#outer_ipv4_cksum#macsec_strip#"
 			   "header_split#vlan_filter#vlan_extend#jumbo_frame#"
-			   "scatter#timestamp#security#keep_crc#rss_hash");
+			   "scatter#buffer_split#timestamp#security#"
+			   "keep_crc#rss_hash");
 cmdline_parse_token_string_t cmd_config_per_port_rx_offload_result_on_off =
 	TOKEN_STRING_INITIALIZER
 		(struct cmd_config_per_port_rx_offload_result,
@@ -18479,8 +18480,8 @@ struct cmd_config_per_port_rx_offload_result {
 	.help_str = "port config <port_id> rx_offload vlan_strip|ipv4_cksum|"
 		    "udp_cksum|tcp_cksum|tcp_lro|qinq_strip|outer_ipv4_cksum|"
 		    "macsec_strip|header_split|vlan_filter|vlan_extend|"
-		    "jumbo_frame|scatter|timestamp|security|keep_crc|rss_hash "
-		    "on|off",
+		    "jumbo_frame|scatter|buffer_split|timestamp|security|"
+		    "keep_crc|rss_hash on|off",
 	.tokens = {
 		(void *)&cmd_config_per_port_rx_offload_result_port,
 		(void *)&cmd_config_per_port_rx_offload_result_config,
@@ -18529,7 +18530,7 @@ struct cmd_config_per_queue_rx_offload_result {
 		 offload, "vlan_strip#ipv4_cksum#udp_cksum#tcp_cksum#tcp_lro#"
 			   "qinq_strip#outer_ipv4_cksum#macsec_strip#"
 			   "header_split#vlan_filter#vlan_extend#jumbo_frame#"
-			   "scatter#timestamp#security#keep_crc");
+			   "scatter#buffer_split#timestamp#security#keep_crc");
 cmdline_parse_token_string_t cmd_config_per_queue_rx_offload_result_on_off =
 	TOKEN_STRING_INITIALIZER
 		(struct cmd_config_per_queue_rx_offload_result,
@@ -18585,8 +18586,8 @@ struct cmd_config_per_queue_rx_offload_result {
 		    "vlan_strip|ipv4_cksum|"
 		    "udp_cksum|tcp_cksum|tcp_lro|qinq_strip|outer_ipv4_cksum|"
 		    "macsec_strip|header_split|vlan_filter|vlan_extend|"
-		    "jumbo_frame|scatter|timestamp|security|keep_crc "
-		    "on|off",
+		    "jumbo_frame|scatter|buffer_split|timestamp|security|"
+		    "keep_crc on|off",
 	.tokens = {
 		(void *)&cmd_config_per_queue_rx_offload_result_port,
 		(void *)&cmd_config_per_queue_rx_offload_result_port_id,
diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index 7048288..395ea6b 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -1027,6 +1027,15 @@ static int bus_match_all(const struct rte_bus *bus, const void *data)
 			printf("off\n");
 	}
 
+	if (dev_info.rx_offload_capa & DEV_RX_OFFLOAD_BUFFER_SPLIT) {
+		printf("RX offload buffer split:       ");
+		if (ports[port_id].dev_conf.rxmode.offloads &
+		    DEV_RX_OFFLOAD_BUFFER_SPLIT)
+			printf("on\n");
+		else
+			printf("off\n");
+	}
+
 	if (dev_info.tx_offload_capa & DEV_TX_OFFLOAD_VLAN_INSERT) {
 		printf("VLAN insert:                   ");
 		if (ports[port_id].dev_conf.txmode.offloads &
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 172+ messages in thread

* [dpdk-dev] [PATCH 4/5] app/testpmd: add rxpkts commands and parameters
  2020-10-05  6:26 ` [dpdk-dev] [PATCH 0/5] " Viacheslav Ovsiienko
                     ` (2 preceding siblings ...)
  2020-10-05  6:26   ` [dpdk-dev] [PATCH 3/5] app/testpmd: add buffer split offload configuration Viacheslav Ovsiienko
@ 2020-10-05  6:26   ` Viacheslav Ovsiienko
  2020-10-05  6:26   ` [dpdk-dev] [PATCH 5/5] app/testpmd: add extended Rx queue setup Viacheslav Ovsiienko
  4 siblings, 0 replies; 172+ messages in thread
From: Viacheslav Ovsiienko @ 2020-10-05  6:26 UTC (permalink / raw)
  To: dev
  Cc: thomasm, stephen, ferruh.yigit, olivier.matz, jerinjacobk,
	maxime.coquelin, david.marchand, arybchenko

Add command line parameter:

--rxpkts=X[,Y]

Sets the length of segments to scatter packets on receiving if split
feature is engaged. Affects only the queues configured with split
offloads (currently only BUFFER_SPLIT is supported).

Add interactive mode command:

testpmd> set rxpkts (x[,y]*)

Where x[,y]* represents a CSV list of values, without white space.
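
For illustration only, a minimal sketch of how such a list could be
validated. The helper name parse_seg_lengths() is hypothetical and is not
the testpmd parse_item_list() routine used by the patch; the UINT16_MAX
bound is an assumption taken from the uint16_t segment length field.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not the testpmd parse_item_list()): parse a
 * comma-separated list of segment lengths such as "14,20,512".
 * Returns the number of values stored in out[], or -1 on any error
 * (empty item, white space or sign, too many items, value > UINT16_MAX).
 */
static int
parse_seg_lengths(const char *str, uint16_t *out, unsigned int max_items)
{
	unsigned int n = 0;

	assert(out != NULL);
	if (str == NULL || *str == '\0')
		return -1;
	for (;;) {
		char *end;
		unsigned long v;

		/* Each item must start with a digit: rejects spaces and signs. */
		if (*str < '0' || *str > '9')
			return -1;
		errno = 0;
		v = strtoul(str, &end, 10);
		if (errno != 0 || v > UINT16_MAX || n >= max_items)
			return -1;
		out[n++] = (uint16_t)v;
		if (*end == '\0')
			return (int)n;
		/* Anything other than a comma after a number is an error. */
		if (*end != ',')
			return -1;
		str = end + 1;
	}
}
```

With this sketch "14,20,512" yields three values, while "14, 20" is
rejected because of the white space after the comma.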

Sets the length of segments to scatter packets on receiving if split
feature is engaged. Affects only the queues configured with split
offloads (currently only BUFFER_SPLIT is supported). Optionally,
multiple memory pools can be specified with the --mbuf-size command line
parameter, and the mbufs to receive will be allocated sequentially
from these extra memory pools.
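
The sequential pool selection can be sketched as below. This assumes the
rule stated in the testpmd guide text of this patch: segment N takes its
mbuf from pool N, and segments beyond the pool list reuse the last valid
pool. The helper name seg_pool_index() is hypothetical.

```c
#include <assert.h>

/*
 * Pick the pool index for Rx segment seg_idx when n_pools pools were
 * created with --mbuf-size. Segments beyond the end of the pool list
 * reuse the last pool. Hypothetical helper, for illustration only.
 */
static unsigned int
seg_pool_index(unsigned int seg_idx, unsigned int n_pools)
{
	assert(n_pools > 0);
	return seg_idx < n_pools ? seg_idx : n_pools - 1;
}
```

Note the comparison is seg_idx < n_pools: a segment index equal to the
number of pools is already out of range and must map to the last pool.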

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
 app/test-pmd/cmdline.c                      | 61 +++++++++++++++++++++++++++--
 app/test-pmd/config.c                       | 48 ++++++++++++++++++++++-
 app/test-pmd/parameters.c                   | 15 +++++++
 app/test-pmd/testpmd.c                      |  7 ++++
 app/test-pmd/testpmd.h                      | 11 +++++-
 doc/guides/testpmd_app_ug/run_app.rst       |  9 +++++
 doc/guides/testpmd_app_ug/testpmd_funcs.rst | 21 +++++++++-
 7 files changed, 165 insertions(+), 7 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 24ca56a..e0ac76e 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -183,7 +183,7 @@ static void cmd_help_long_parsed(void *parsed_result,
 			"show (rxq|txq) info (port_id) (queue_id)\n"
 			"    Display information for configured RX/TX queue.\n\n"
 
-			"show config (rxtx|cores|fwd|txpkts)\n"
+			"show config (rxtx|cores|fwd|rxpkts|txpkts)\n"
 			"    Display the given configuration.\n\n"
 
 			"read rxd (port_id) (queue_id) (rxd_id)\n"
@@ -288,6 +288,12 @@ static void cmd_help_long_parsed(void *parsed_result,
 			"    Set the transmit delay time and number of retries,"
 			" effective when retry is enabled.\n\n"
 
+			"set rxpkts (x[,y]*)\n"
+			"    Set the length of each segment to scatter"
+			" packets on receiving if split feature is engaged."
+			" Affects only the queues configured with split"
+			" offloads.\n\n"
+
 			"set txpkts (x[,y]*)\n"
 			"    Set the length of each segment of TXONLY"
 			" and optionally CSUM packets.\n\n"
@@ -3880,6 +3886,52 @@ struct cmd_set_log_result {
 	},
 };
 
+/* *** SET SEGMENT LENGTHS OF RX PACKETS SPLIT *** */
+
+struct cmd_set_rxpkts_result {
+	cmdline_fixed_string_t cmd_keyword;
+	cmdline_fixed_string_t rxpkts;
+	cmdline_fixed_string_t seg_lengths;
+};
+
+static void
+cmd_set_rxpkts_parsed(void *parsed_result,
+		      __rte_unused struct cmdline *cl,
+		      __rte_unused void *data)
+{
+	struct cmd_set_rxpkts_result *res;
+	unsigned int seg_lengths[MAX_SEGS_BUFFER_SPLIT];
+	unsigned int nb_segs;
+
+	res = parsed_result;
+	nb_segs = parse_item_list(res->seg_lengths, "segment lengths",
+				  MAX_SEGS_BUFFER_SPLIT, seg_lengths, 0);
+	if (nb_segs > 0)
+		set_rx_pkt_segments(seg_lengths, nb_segs);
+}
+
+cmdline_parse_token_string_t cmd_set_rxpkts_keyword =
+	TOKEN_STRING_INITIALIZER(struct cmd_set_rxpkts_result,
+				 cmd_keyword, "set");
+cmdline_parse_token_string_t cmd_set_rxpkts_name =
+	TOKEN_STRING_INITIALIZER(struct cmd_set_rxpkts_result,
+				 rxpkts, "rxpkts");
+cmdline_parse_token_string_t cmd_set_rxpkts_lengths =
+	TOKEN_STRING_INITIALIZER(struct cmd_set_rxpkts_result,
+				 seg_lengths, NULL);
+
+cmdline_parse_inst_t cmd_set_rxpkts = {
+	.f = cmd_set_rxpkts_parsed,
+	.data = NULL,
+	.help_str = "set rxpkts <len0[,len1]*>",
+	.tokens = {
+		(void *)&cmd_set_rxpkts_keyword,
+		(void *)&cmd_set_rxpkts_name,
+		(void *)&cmd_set_rxpkts_lengths,
+		NULL,
+	},
+};
+
 /* *** SET SEGMENT LENGTHS OF TXONLY PACKETS *** */
 
 struct cmd_set_txpkts_result {
@@ -7499,6 +7551,8 @@ static void cmd_showcfg_parsed(void *parsed_result,
 		fwd_lcores_config_display();
 	else if (!strcmp(res->what, "fwd"))
 		pkt_fwd_config_display(&cur_fwd_config);
+	else if (!strcmp(res->what, "rxpkts"))
+		show_rx_pkt_segments();
 	else if (!strcmp(res->what, "txpkts"))
 		show_tx_pkt_segments();
 	else if (!strcmp(res->what, "txtimes"))
@@ -7511,12 +7565,12 @@ static void cmd_showcfg_parsed(void *parsed_result,
 	TOKEN_STRING_INITIALIZER(struct cmd_showcfg_result, cfg, "config");
 cmdline_parse_token_string_t cmd_showcfg_what =
 	TOKEN_STRING_INITIALIZER(struct cmd_showcfg_result, what,
-				 "rxtx#cores#fwd#txpkts#txtimes");
+				 "rxtx#cores#fwd#rxpkts#txpkts#txtimes");
 
 cmdline_parse_inst_t cmd_showcfg = {
 	.f = cmd_showcfg_parsed,
 	.data = NULL,
-	.help_str = "show config rxtx|cores|fwd|txpkts|txtimes",
+	.help_str = "show config rxtx|cores|fwd|rxpkts|txpkts|txtimes",
 	.tokens = {
 		(void *)&cmd_showcfg_show,
 		(void *)&cmd_showcfg_port,
@@ -19569,6 +19623,7 @@ struct cmd_showport_macs_result {
 	(cmdline_parse_inst_t *)&cmd_reset,
 	(cmdline_parse_inst_t *)&cmd_set_numbers,
 	(cmdline_parse_inst_t *)&cmd_set_log,
+	(cmdline_parse_inst_t *)&cmd_set_rxpkts,
 	(cmdline_parse_inst_t *)&cmd_set_txpkts,
 	(cmdline_parse_inst_t *)&cmd_set_txsplit,
 	(cmdline_parse_inst_t *)&cmd_set_txtimes,
diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index 395ea6b..ff09ead 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -3096,6 +3096,50 @@ struct igb_ring_desc_16_bytes {
 }
 
 void
+show_rx_pkt_segments(void)
+{
+	uint32_t i, n;
+
+	n = rx_pkt_nb_segs;
+	printf("Number of segments: %u\n", n);
+	if (n) {
+		printf("Segment sizes: ");
+		for (i = 0; i != n - 1; i++)
+			printf("%hu,", rx_pkt_seg_lengths[i]);
+		printf("%hu\n", rx_pkt_seg_lengths[i]);
+	}
+}
+
+void
+set_rx_pkt_segments(unsigned int *seg_lengths, unsigned int nb_segs)
+{
+	unsigned int i;
+
+	if (nb_segs >= MAX_SEGS_BUFFER_SPLIT) {
+		printf("nb segments per RX packets=%u >= "
+		       "MAX_SEGS_BUFFER_SPLIT - ignored\n", nb_segs);
+		return;
+	}
+
+	/*
+	 * No extra check here, the segment length will be checked by PMD
+	 * in the extended queue setup.
+	 */
+	for (i = 0; i < nb_segs; i++) {
+		if (seg_lengths[i] >= UINT16_MAX) {
+			printf("length[%u]=%u >= UINT16_MAX - give up\n",
+			       i, seg_lengths[i]);
+			return;
+		}
+	}
+
+	for (i = 0; i < nb_segs; i++)
+		rx_pkt_seg_lengths[i] = (uint16_t) seg_lengths[i];
+
+	rx_pkt_nb_segs = (uint8_t) nb_segs;
+}
+
+void
 show_tx_pkt_segments(void)
 {
 	uint32_t i, n;
@@ -3113,10 +3157,10 @@ struct igb_ring_desc_16_bytes {
 }
 
 void
-set_tx_pkt_segments(unsigned *seg_lengths, unsigned nb_segs)
+set_tx_pkt_segments(unsigned int *seg_lengths, unsigned int nb_segs)
 {
 	uint16_t tx_pkt_len;
-	unsigned i;
+	unsigned int i;
 
 	if (nb_segs >= (unsigned) nb_txd) {
 		printf("nb segments per TX packets=%u >= nb_txd=%u - ignored\n",
diff --git a/app/test-pmd/parameters.c b/app/test-pmd/parameters.c
index 1f40d73..99f0223 100644
--- a/app/test-pmd/parameters.c
+++ b/app/test-pmd/parameters.c
@@ -184,6 +184,7 @@
 	       "(0 <= mapping <= %d).\n", RTE_ETHDEV_QUEUE_STAT_CNTRS - 1);
 	printf("  --no-flush-rx: Don't flush RX streams before forwarding."
 	       " Used mainly with PCAP drivers.\n");
+	printf("  --rxpkts=X[,Y]*: set RX segment sizes to split.\n");
 	printf("  --txpkts=X[,Y]*: set TX segment sizes"
 		" or total packet length.\n");
 	printf("  --txonly-multi-flow: generate multiple flows in txonly mode\n");
@@ -661,6 +662,7 @@
 		{ "rx-queue-stats-mapping",	1, 0, 0 },
 		{ "no-flush-rx",	0, 0, 0 },
 		{ "flow-isolate-all",	        0, 0, 0 },
+		{ "rxpkts",			1, 0, 0 },
 		{ "txpkts",			1, 0, 0 },
 		{ "txonly-multi-flow",		0, 0, 0 },
 		{ "disable-link-check",		0, 0, 0 },
@@ -1270,6 +1272,19 @@
 						 "invalid RX queue statistics mapping config entered\n");
 				}
 			}
+			if (!strcmp(lgopts[opt_idx].name, "rxpkts")) {
+				unsigned int seg_len[MAX_SEGS_BUFFER_SPLIT];
+				unsigned int nb_segs;
+
+				nb_segs = parse_item_list
+						(optarg, "rxpkt segments",
+						 MAX_SEGS_BUFFER_SPLIT,
+						 seg_len, 0);
+				if (nb_segs > 0)
+					set_rx_pkt_segments(seg_len, nb_segs);
+				else
+					rte_exit(EXIT_FAILURE, "bad rxpkts\n");
+			}
 			if (!strcmp(lgopts[opt_idx].name, "txpkts")) {
 				unsigned seg_lengths[RTE_MAX_SEGS_PER_PKT];
 				unsigned int nb_segs;
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index f5060ee..3c88ca7 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -210,6 +210,13 @@ struct fwd_engine * fwd_engines[] = {
 uint8_t f_quit;
 
 /*
+ * Configuration of packet segments used to scatter received packets
+ * if any of the split features is configured.
+ */
+uint16_t rx_pkt_seg_lengths[MAX_SEGS_BUFFER_SPLIT];
+uint8_t  rx_pkt_nb_segs; /**< Number of segments to split */
+
+/*
  * Configuration of packet segments used by the "txonly" processing engine.
  */
 uint16_t tx_pkt_length = TXONLY_DEF_PACKET_LEN; /**< TXONLY packet length. */
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index e5cdd12..0576b7c 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -420,6 +420,13 @@ struct queue_stats_mappings {
 extern struct rte_fdir_conf fdir_conf;
 
 /*
+ * Configuration of packet segments used to scatter received packets
+ * if any of the split features is configured.
+ */
+extern uint16_t rx_pkt_seg_lengths[MAX_SEGS_BUFFER_SPLIT];
+extern uint8_t  rx_pkt_nb_segs; /**< Number of segments to split */
+
+/*
  * Configuration of packet segments used by the "txonly" processing engine.
  */
 #define TXONLY_DEF_PACKET_LEN 64
@@ -815,7 +822,9 @@ void vlan_tpid_set(portid_t port_id, enum rte_vlan_type vlan_type,
 void set_record_core_cycles(uint8_t on_off);
 void set_record_burst_stats(uint8_t on_off);
 void set_verbose_level(uint16_t vb_level);
-void set_tx_pkt_segments(unsigned *seg_lengths, unsigned nb_segs);
+void set_rx_pkt_segments(unsigned int *seg_lengths, unsigned int nb_segs);
+void show_rx_pkt_segments(void);
+void set_tx_pkt_segments(unsigned int *seg_lengths, unsigned int nb_segs);
 void show_tx_pkt_segments(void);
 void set_tx_pkt_times(unsigned int *tx_times);
 void show_tx_pkt_times(void);
diff --git a/doc/guides/testpmd_app_ug/run_app.rst b/doc/guides/testpmd_app_ug/run_app.rst
index 2d5a263..9286281 100644
--- a/doc/guides/testpmd_app_ug/run_app.rst
+++ b/doc/guides/testpmd_app_ug/run_app.rst
@@ -361,6 +361,15 @@ The command line options are:
 
     Don't flush the RX streams before starting forwarding. Used mainly with the PCAP PMD.
 
+*   ``--rxpkts=X[,Y]``
+
+    Set the length of segments to scatter packets on receiving if split
+    feature is engaged. Affects only the queues configured
+    with split offloads (currently only BUFFER_SPLIT is supported).
+    Optionally, multiple memory pools can be specified with the --mbuf-size
+    command line parameter, and the mbufs to receive will be allocated
+    sequentially from these extra memory pools.
+
 *   ``--txpkts=X[,Y]``
 
     Set TX segment sizes or total packet length. Valid for ``tx-only``
diff --git a/doc/guides/testpmd_app_ug/testpmd_funcs.rst b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
index 7f067af..0466920 100644
--- a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
+++ b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
@@ -273,7 +273,7 @@ show config
 Displays the configuration of the application.
 The configuration comes from the command-line, the runtime or the application defaults::
 
-   testpmd> show config (rxtx|cores|fwd|txpkts|txtimes)
+   testpmd> show config (rxtx|cores|fwd|rxpkts|txpkts|txtimes)
 
 The available information categories are:
 
@@ -283,6 +283,8 @@ The available information categories are:
 
 * ``fwd``: Packet forwarding configuration.
 
+* ``rxpkts``: Packets to RX split configuration.
+
 * ``txpkts``: Packets to TX configuration.
 
 * ``txtimes``: Burst time pattern for Tx only mode.
@@ -760,6 +762,23 @@ When retry is enabled, the transmit delay time and number of retries can also be
 
    testpmd> set burst tx delay (microseconds) retry (num)
 
+set rxpkts
+~~~~~~~~~~
+
+Set the length of segments to scatter packets on receiving if split
+feature is engaged. Affects only the queues configured with split offloads
+(currently only BUFFER_SPLIT is supported). Optionally, multiple memory
+pools can be specified with the --mbuf-size command line parameter, and the
+mbufs to receive will be allocated sequentially from these extra memory pools
+(the mbuf for the first segment is allocated from the first pool, the second
+one from the second pool, and so on; if there are more segments than pools, the
+mbufs for the remaining segments will be allocated from the last valid pool)::
+
+   testpmd> set rxpkts (x[,y]*)
+
+Where x[,y]* represents a CSV list of values, without white space. Zero value
+means to use the corresponding memory pool data buffer size.
+
 set txpkts
 ~~~~~~~~~~
 
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 172+ messages in thread

* [dpdk-dev] [PATCH 5/5] app/testpmd: add extended Rx queue setup
  2020-10-05  6:26 ` [dpdk-dev] [PATCH 0/5] " Viacheslav Ovsiienko
                     ` (3 preceding siblings ...)
  2020-10-05  6:26   ` [dpdk-dev] [PATCH 4/5] app/testpmd: add rxpkts commands and parameters Viacheslav Ovsiienko
@ 2020-10-05  6:26   ` Viacheslav Ovsiienko
  4 siblings, 0 replies; 172+ messages in thread
From: Viacheslav Ovsiienko @ 2020-10-05  6:26 UTC (permalink / raw)
  To: dev
  Cc: thomasm, stephen, ferruh.yigit, olivier.matz, jerinjacobk,
	maxime.coquelin, david.marchand, arybchenko

If the Rx queue is configured with the split feature, the extended
setup with the specified segment sizes and pools will be performed.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
 app/test-pmd/cmdline.c | 12 ++++++------
 app/test-pmd/testpmd.c | 38 ++++++++++++++++++++++++++++++++++++--
 app/test-pmd/testpmd.h |  6 ++++++
 3 files changed, 48 insertions(+), 8 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index e0ac76e..1c65499 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -2912,12 +2912,12 @@ struct cmd_setup_rxtx_queue {
 				rxring_numa[res->portid]);
 			return;
 		}
-		ret = rte_eth_rx_queue_setup(res->portid,
-					     res->qid,
-					     port->nb_rx_desc[res->qid],
-					     socket_id,
-					     &port->rx_conf[res->qid],
-					     mp);
+		ret = rx_queue_setup(res->portid,
+				     res->qid,
+				     port->nb_rx_desc[res->qid],
+				     socket_id,
+				     &port->rx_conf[res->qid],
+				     mp);
 		if (ret)
 			printf("Failed to setup RX queue\n");
 	} else {
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 3c88ca7..cd17cb0 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -2412,6 +2412,40 @@ struct extmem_param {
 	return 0;
 }
 
+/* Configure the Rx with optional split. */
+int
+rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
+	       uint16_t nb_rx_desc, unsigned int socket_id,
+	       const struct rte_eth_rxconf *rx_conf,
+	       struct rte_mempool *mp)
+{
+	struct rte_eth_rxseg rx_seg[MAX_SEGS_BUFFER_SPLIT] = {};
+	unsigned int i, mp_n;
+
+	if (rx_pkt_nb_segs <= 1 ||
+	    (rx_conf->offloads & DEV_RX_OFFLOAD_BUFFER_SPLIT) == 0)
+		return rte_eth_rx_queue_setup(port_id, rx_queue_id,
+					      nb_rx_desc, socket_id,
+					      rx_conf, mp);
+	for (i = 0; i < rx_pkt_nb_segs; i++) {
+		struct rte_mempool *mpx;
+		/*
+		 * Use last valid pool for the segments with number
+		 * exceeding the pool index.
+		 */
+		mp_n = (i >= mbuf_data_size_n) ? mbuf_data_size_n - 1 : i;
+		mpx = mbuf_pool_find(socket_id, mp_n);
+		/* Handle zero as mbuf data buffer size. */
+		rx_seg[i].length = rx_pkt_seg_lengths[i] ?
+				   rx_pkt_seg_lengths[i] :
+				   mbuf_data_size[mp_n];
+		rx_seg[i].mp = mpx ? mpx : mp;
+	}
+	return rte_eth_rx_queue_setup_ex(port_id, rx_queue_id,
+					 nb_rx_desc, socket_id, rx_conf,
+					 rx_seg, rx_pkt_nb_segs);
+}
+
 int
 start_port(portid_t pid)
 {
@@ -2520,7 +2554,7 @@ struct extmem_param {
 						return -1;
 					}
 
-					diag = rte_eth_rx_queue_setup(pi, qi,
+					diag = rx_queue_setup(pi, qi,
 					     port->nb_rx_desc[qi],
 					     rxring_numa[pi],
 					     &(port->rx_conf[qi]),
@@ -2536,7 +2570,7 @@ struct extmem_param {
 							port->socket_id);
 						return -1;
 					}
-					diag = rte_eth_rx_queue_setup(pi, qi,
+					diag = rx_queue_setup(pi, qi,
 					     port->nb_rx_desc[qi],
 					     port->socket_id,
 					     &(port->rx_conf[qi]),
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index 0576b7c..1953c11 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -869,6 +869,12 @@ void port_rss_reta_info(portid_t port_id,
 
 void set_vf_traffic(portid_t port_id, uint8_t is_rx, uint16_t vf, uint8_t on);
 
+int
+rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
+	       uint16_t nb_rx_desc, unsigned int socket_id,
+	       const struct rte_eth_rxconf *rx_conf,
+	       struct rte_mempool *mp);
+
 int set_queue_rate_limit(portid_t port_id, uint16_t queue_idx, uint16_t rate);
 int set_vf_rate_limit(portid_t port_id, uint16_t vf, uint16_t rate,
 				uint64_t q_msk);
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 172+ messages in thread

* [dpdk-dev] [PATCH v2 0/9] ethdev: introduce Rx buffer split
  2020-08-17 17:49 [dpdk-dev] [RFC] ethdev: introduce Rx buffer split Slava Ovsiienko
  2020-09-17 16:55 ` Andrew Rybchenko
  2020-10-05  6:26 ` [dpdk-dev] [PATCH 0/5] " Viacheslav Ovsiienko
@ 2020-10-07 15:06 ` Viacheslav Ovsiienko
  2020-10-07 15:06   ` [dpdk-dev] [PATCH v2 1/9] " Viacheslav Ovsiienko
                     ` (8 more replies)
  2020-10-12 16:19 ` [dpdk-dev] [PATCH v3 0/9] ethdev: introduce Rx buffer split Viacheslav Ovsiienko
                   ` (10 subsequent siblings)
  13 siblings, 9 replies; 172+ messages in thread
From: Viacheslav Ovsiienko @ 2020-10-07 15:06 UTC (permalink / raw)
  To: dev
  Cc: thomasm, stephen, ferruh.yigit, olivier.matz, jerinjacobk,
	maxime.coquelin, david.marchand, arybchenko

The DPDK datapath in the transmit direction is very flexible.
An application can build a multi-segment packet and manage
almost all data aspects - the memory pools where segments
are allocated from, the segment lengths, the memory attributes
like external buffers, registered for DMA, etc.

In the receiving direction, the datapath is much less flexible:
an application can only specify the memory pool to configure the
receiving queue and nothing more. In order to extend receiving
datapath capabilities it is proposed to add a way to provide
extended information on how to split the packets being received.

The following structure is introduced to specify the Rx packet
segment:

struct rte_eth_rxseg {
    struct rte_mempool *mp; /* memory pools to allocate segment from */
    uint16_t length; /* segment maximal data length */
    uint16_t offset; /* data offset from beginning of mbuf data buffer */
    uint32_t reserved; /* reserved field */
};

The new routine rte_eth_rx_queue_setup_ex() is introduced to
setup the given Rx queue using the new extended Rx packet segment
description:

int
rte_eth_rx_queue_setup_ex(uint16_t port_id, uint16_t rx_queue_id,
                          uint16_t nb_rx_desc, unsigned int socket_id,
                          const struct rte_eth_rxconf *rx_conf,
                          const struct rte_eth_rxseg *rx_seg,
                          uint16_t n_seg)

This routine presents two new parameters:
    rx_seg - pointer to the array of segment descriptions, each element
             describes the memory pool, maximal data length, initial
             data offset from the beginning of data buffer in mbuf
    n_seg - number of elements in the array

The new offload flag DEV_RX_OFFLOAD_BUFFER_SPLIT in device
capabilities is introduced to present the way for PMD to report to
application about supporting Rx packet split to configurable
segments. Prior to invoking the rte_eth_rx_queue_setup_ex() routine
the application should check the DEV_RX_OFFLOAD_BUFFER_SPLIT flag.

If the Rx queue is configured with the new routine the packets being
received will be split into multiple segments pushed to the mbufs
with specified attributes. The PMD will allocate the first mbuf
from the pool specified in the first segment descriptor and put
the data starting at the specified offset in the allocated mbuf data
buffer. If the packet length exceeds the specified segment length
the next mbuf will be allocated according to the next segment
descriptor (if any) and data will be put in its data buffer at
the specified offset and not exceeding the specified length. If there
is no next descriptor the next mbuf will be allocated and filled in the
same way (from the same pool and with the same buffer offset/length)
as the current one.

For example, let's suppose we configured the Rx queue with the
following segments:
    seg0 - pool0, len0=14B, off0=RTE_PKTMBUF_HEADROOM
    seg1 - pool1, len1=20B, off1=0B
    seg2 - pool2, len2=20B, off2=0B
    seg3 - pool3, len3=512B, off3=0B

The packet 46 bytes long will look like the following:
    seg0 - 14B long @ RTE_PKTMBUF_HEADROOM in mbuf from pool0
    seg1 - 20B long @ 0 in mbuf from pool1
    seg2 - 12B long @ 0 in mbuf from pool2

The packet 1500 bytes long will look like the following:
    seg0 - 14B @ RTE_PKTMBUF_HEADROOM in mbuf from pool0
    seg1 - 20B @ 0 in mbuf from pool1
    seg2 - 20B @ 0 in mbuf from pool2
    seg3 - 512B @ 0 in mbuf from pool3
    seg4 - 512B @ 0 in mbuf from pool3
    seg5 - 422B @ 0 in mbuf from pool3
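
The two layouts above can be reproduced with a small standalone sketch of
the split rule. It is not PMD code: struct seg_desc is a local stand-in
for the proposed struct rte_eth_rxseg with the pool replaced by an index,
and split_layout() is a hypothetical helper that only computes lengths.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-in for the proposed struct rte_eth_rxseg (pool as index). */
struct seg_desc {
	unsigned int pool;  /* index of the pool the mbuf comes from */
	uint16_t length;    /* maximal data length of the segment */
	uint16_t offset;    /* data offset inside the mbuf data buffer */
};

/* One piece of a received packet after the split. */
struct seg_out {
	unsigned int pool;
	uint16_t length;
	uint16_t offset;
};

/*
 * Compute how a packet of pkt_len bytes is laid out over the descriptors.
 * Once the descriptors are exhausted the last one is reused.
 * Returns the number of pieces written to out[], or -1 if out[] is too small.
 */
static int
split_layout(const struct seg_desc *desc, unsigned int n_desc,
	     uint32_t pkt_len, struct seg_out *out, unsigned int max_out)
{
	unsigned int n = 0;

	assert(n_desc > 0);
	while (pkt_len > 0) {
		const struct seg_desc *d =
			&desc[n < n_desc ? n : n_desc - 1];
		uint16_t len;

		assert(d->length > 0);
		if (n >= max_out)
			return -1;
		/* Take at most d->length bytes for this piece. */
		len = pkt_len < d->length ? (uint16_t)pkt_len : d->length;
		out[n].pool = d->pool;
		out[n].length = len;
		out[n].offset = d->offset;
		pkt_len -= len;
		n++;
	}
	return (int)n;
}
```

Fed with the four descriptors above, a 46-byte packet gives pieces of
14, 20 and 12 bytes, and a 1500-byte packet gives 14, 20, 20, 512, 512
and 422 bytes, the last three all taken from pool3.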

The offload DEV_RX_OFFLOAD_SCATTER must be present and
configured to support new buffer split feature (if n_seg
is greater than one).
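
A possible pre-check on the application side, given as a sketch only.
The SKETCH_* bit values are placeholders chosen for this example; the
real DEV_RX_OFFLOAD_* values come from rte_ethdev.h, and the helper name
check_split_offloads() is hypothetical.

```c
#include <stdint.h>

/* Placeholder bits for illustration; not the real DEV_RX_OFFLOAD_* values. */
#define SKETCH_RX_OFFLOAD_SCATTER      (UINT64_C(1) << 0)
#define SKETCH_RX_OFFLOAD_BUFFER_SPLIT (UINT64_C(1) << 1)

/*
 * Return 0 if a queue with n_seg segments may be set up, -1 otherwise.
 * capa     - Rx offload capabilities reported by the device
 * offloads - Rx offloads the application enabled for the queue
 */
static int
check_split_offloads(uint64_t capa, uint64_t offloads, uint16_t n_seg)
{
	if (n_seg == 0)
		return -1;
	/* The PMD must report the buffer split capability. */
	if ((capa & SKETCH_RX_OFFLOAD_BUFFER_SPLIT) == 0)
		return -1;
	/* More than one segment needs scatter both supported and enabled. */
	if (n_seg > 1 &&
	    ((capa & SKETCH_RX_OFFLOAD_SCATTER) == 0 ||
	     (offloads & SKETCH_RX_OFFLOAD_SCATTER) == 0))
		return -1;
	return 0;
}
```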

The new approach would allow splitting the ingress packets into
multiple parts pushed to the memory with different attributes.
For example, the packet headers can be pushed to the embedded
data buffers within mbufs and the application data into
the external buffers attached to mbufs allocated from the
different memory pools. The memory attributes for the split
parts may differ as well - for example the application data
may be pushed into the external memory located on the dedicated
physical device, say GPU or NVMe. This would improve the DPDK
receiving datapath flexibility while preserving compatibility
with the existing API.

Also, the proposed segment description might be used to specify
Rx packet split for some other features. For example, it could provide
a way to specify the extra memory pool for the Header Split
feature of some Intel PMD.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>

---
[RFC]: http://patches.dpdk.org/patch/75582/
Related deprecation note (revoked): http://patches.dpdk.org/patch/75205/

v1: http://patches.dpdk.org/patch/79594/
v2: add feature support to mlx5 PMD

Viacheslav Ovsiienko (9):
  ethdev: introduce Rx buffer split
  app/testpmd: add multiple pools per core creation
  app/testpmd: add buffer split offload configuration
  app/testpmd: add rxpkts commands and parameters
  app/testpmd: add extended Rx queue setup
  net/mlx5: add extended Rx queue setup routine
  net/mlx5: configure Rx queue to support split
  net/mlx5: register multiple pool for Rx queue
  net/mlx5: update Rx datapath to support split

 app/test-pmd/bpf_cmd.c                      |   4 +-
 app/test-pmd/cmdline.c                      |  96 +++++++++++---
 app/test-pmd/config.c                       |  63 ++++++++-
 app/test-pmd/parameters.c                   |  39 +++++-
 app/test-pmd/testpmd.c                      | 108 +++++++++++-----
 app/test-pmd/testpmd.h                      |  41 +++++-
 doc/guides/nics/features.rst                |  15 +++
 doc/guides/rel_notes/release_20_11.rst      |   6 +
 doc/guides/testpmd_app_ug/run_app.rst       |  16 ++-
 doc/guides/testpmd_app_ug/testpmd_funcs.rst |  21 ++-
 drivers/net/mlx5/linux/mlx5_os.c            |   2 +
 drivers/net/mlx5/mlx5.h                     |   3 +
 drivers/net/mlx5/mlx5_mr.c                  |   3 +
 drivers/net/mlx5/mlx5_rxq.c                 | 194 +++++++++++++++++++++++-----
 drivers/net/mlx5/mlx5_rxtx.c                |   3 +-
 drivers/net/mlx5/mlx5_rxtx.h                |  10 +-
 drivers/net/mlx5/mlx5_trigger.c             |  20 +--
 lib/librte_ethdev/rte_ethdev.c              | 174 +++++++++++++++++++++++++
 lib/librte_ethdev/rte_ethdev.h              |  16 +++
 lib/librte_ethdev/rte_ethdev_driver.h       |  10 ++
 20 files changed, 733 insertions(+), 111 deletions(-)

-- 
1.8.3.1



* [dpdk-dev] [PATCH v2 1/9] ethdev: introduce Rx buffer split
  2020-10-07 15:06 ` [dpdk-dev] [PATCH v2 0/9] ethdev: introduce Rx buffer split Viacheslav Ovsiienko
@ 2020-10-07 15:06   ` " Viacheslav Ovsiienko
  2020-10-11 22:17     ` Thomas Monjalon
  2020-10-07 15:06   ` [dpdk-dev] [PATCH v2 2/9] app/testpmd: add multiple pools per core creation Viacheslav Ovsiienko
                     ` (7 subsequent siblings)
  8 siblings, 1 reply; 172+ messages in thread
From: Viacheslav Ovsiienko @ 2020-10-07 15:06 UTC (permalink / raw)
  To: dev
  Cc: thomasm, stephen, ferruh.yigit, olivier.matz, jerinjacobk,
	maxime.coquelin, david.marchand, arybchenko

The DPDK datapath in the transmit direction is very flexible.
An application can build a multi-segment packet and manage
almost all data aspects - the memory pools the segments
are allocated from, the segment lengths, the memory attributes
like external buffers, registered for DMA, etc.

In the receiving direction, the datapath is much less flexible:
an application can only specify the memory pool to configure the
receiving queue and nothing more. In order to extend the receiving
datapath capabilities, it is proposed to add a way to provide
extended information on how to split the packets being received.

The following structure is introduced to specify the Rx packet
segment:

struct rte_eth_rxseg {
    struct rte_mempool *mp; /* memory pools to allocate segment from */
    uint16_t length; /* segment maximal data length */
    uint16_t offset; /* data offset from beginning of mbuf data buffer */
    uint32_t reserved; /* reserved field */
};

The new routine rte_eth_rx_queue_setup_ex() is introduced to
setup the given Rx queue using the new extended Rx packet segment
description:

int
rte_eth_rx_queue_setup_ex(uint16_t port_id, uint16_t rx_queue_id,
                          uint16_t nb_rx_desc, unsigned int socket_id,
                          const struct rte_eth_rxconf *rx_conf,
		          const struct rte_eth_rxseg *rx_seg,
                          uint16_t n_seg)

This routine presents the two new parameters:
    rx_seg - pointer to the array of segment descriptions, each element
             describes the memory pool, maximal data length, initial
             data offset from the beginning of data buffer in mbuf
    n_seg - number of elements in the array
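
The per-segment sanity check that the diff below applies to rx_seg can
be sketched in isolation. This is only an illustration, not the ethdev
code: struct seg_check, SKETCH_HEADROOM and first_bad_segment() are
hypothetical names, and the data room size is passed in directly instead
of being read from the pool with rte_pktmbuf_data_room_size(). As in the
patch, the headroom is added on top of the offset for the first segment.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed stand-in for RTE_PKTMBUF_HEADROOM (128 is the usual default). */
#define SKETCH_HEADROOM 128u

/* Hypothetical flattened view of one rx_seg entry plus its pool geometry. */
struct seg_check {
	uint32_t data_room; /* data room size of the pool behind the segment */
	uint16_t length;    /* segment maximal data length */
	uint16_t offset;    /* data offset from beginning of mbuf data buffer */
};

/*
 * Return the index of the first segment whose length + offset does not
 * fit into the pool data room, or -1 if every segment fits.
 */
static int
first_bad_segment(const struct seg_check *seg, uint16_t n_seg)
{
	uint16_t i;

	for (i = 0; i < n_seg; i++) {
		uint32_t need = (uint32_t)seg[i].length + seg[i].offset +
				(i == 0 ? SKETCH_HEADROOM : 0);

		if (seg[i].data_room < need)
			return i;
	}
	return -1;
}
```

With a 2176-byte first pool and a 512-byte second pool, the pair
{14B @ 128, 512B @ 0} passes, while moving the second segment to
offset 64 makes it overflow its buffer and index 1 is reported.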

The new offload flag DEV_RX_OFFLOAD_BUFFER_SPLIT in device
capabilities is introduced so that a PMD can report to the
application that it supports splitting Rx packets into configurable
segments. Prior to invoking the rte_eth_rx_queue_setup_ex() routine,
the application should check the DEV_RX_OFFLOAD_BUFFER_SPLIT flag.
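
A minimal sketch of the check an application might perform before
calling the new routine is shown here. It is standalone and does not
include DPDK headers: the SKETCH_* constants and buffer_split_allowed()
are illustrative names, the buffer split bit is the value added by this
patch, and the scatter bit is assumed to match rte_ethdev.h. The scatter
condition reflects the requirement stated further down in this message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local copies for illustration only; real code uses rte_ethdev.h. */
#define SKETCH_RX_OFFLOAD_SCATTER      0x00002000ULL /* assumed value */
#define SKETCH_RX_OFFLOAD_BUFFER_SPLIT 0x00100000ULL /* from this patch */

/*
 * Decide whether a split configuration with n_seg segments may be tried.
 * rx_offload_capa - capabilities reported by the device
 * enabled         - Rx offloads the application has enabled
 */
static bool
buffer_split_allowed(uint64_t rx_offload_capa, uint64_t enabled,
		     uint16_t n_seg)
{
	if (n_seg == 0)
		return false;
	/* The PMD must advertise the buffer split capability. */
	if (!(rx_offload_capa & SKETCH_RX_OFFLOAD_BUFFER_SPLIT))
		return false;
	/* More than one segment additionally needs scatter enabled. */
	if (n_seg > 1 && !(enabled & SKETCH_RX_OFFLOAD_SCATTER))
		return false;
	return true;
}
```

In a real application the first argument would come from
rx_offload_capa in the structure filled by rte_eth_dev_info_get().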

If the Rx queue is configured with the new routine, the packets being
received will be split into multiple segments pushed to the mbufs
with the specified attributes. The PMD will allocate the first mbuf
from the pool specified in the first segment descriptor and put
the data starting at the specified offset in the allocated mbuf data
buffer. If the packet length exceeds the specified segment length,
the next mbuf will be allocated according to the next segment
descriptor (if any) and data will be put in its data buffer at the
specified offset, not exceeding the specified length. If there is
no next descriptor, the next mbuf will be allocated and filled in the
same way (from the same pool and with the same buffer offset/length)
as the current one.

For example, let's suppose we configured the Rx queue with the
following segments:
    seg0 - pool0, len0=14B, off0=RTE_PKTMBUF_HEADROOM
    seg1 - pool1, len1=20B, off1=0B
    seg2 - pool2, len2=20B, off2=0B
    seg3 - pool3, len3=512B, off3=0B

The packet 46 bytes long will look like the following:
    seg0 - 14B long @ RTE_PKTMBUF_HEADROOM in mbuf from pool0
    seg1 - 20B long @ 0 in mbuf from pool1
    seg2 - 12B long @ 0 in mbuf from pool2

The packet 1500 bytes long will look like the following:
    seg0 - 14B @ RTE_PKTMBUF_HEADROOM in mbuf from pool0
    seg1 - 20B @ 0 in mbuf from pool1
    seg2 - 20B @ 0 in mbuf from pool2
    seg3 - 512B @ 0 in mbuf from pool3
    seg4 - 512B @ 0 in mbuf from pool3
    seg5 - 422B @ 0 in mbuf from pool3
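
The two layouts above follow from a simple rule: walk the descriptor
array and reuse the last descriptor once the array is exhausted. The
sketch below reproduces that arithmetic only. It is not PMD code:
struct rxseg_desc, struct rx_chunk and split_layout() are hypothetical
names, and pools are represented by plain indices instead of
struct rte_mempool pointers.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative stand-in for struct rte_eth_rxseg. */
struct rxseg_desc {
	unsigned int pool; /* index of the pool the mbuf would come from */
	uint16_t length;   /* segment maximal data length */
	uint16_t offset;   /* data offset inside the mbuf data buffer */
};

/* One mbuf of the resulting chain. */
struct rx_chunk {
	unsigned int pool;
	uint16_t offset;
	uint16_t data_len;
};

/*
 * Compute how a packet of pkt_len bytes is spread over mbufs.
 * Returns the number of chunks stored in out, or 0 if the input is
 * invalid or out is too small.
 */
static size_t
split_layout(const struct rxseg_desc *seg, size_t n_seg, uint32_t pkt_len,
	     struct rx_chunk *out, size_t out_max)
{
	size_t n = 0;
	size_t idx = 0;

	if (seg == NULL || n_seg == 0 || out == NULL)
		return 0;
	while (pkt_len > 0) {
		const struct rxseg_desc *d = &seg[idx];
		uint16_t len;

		if (d->length == 0 || n == out_max)
			return 0;
		len = pkt_len < d->length ? (uint16_t)pkt_len : d->length;
		out[n].pool = d->pool;
		out[n].offset = d->offset;
		out[n].data_len = len;
		n++;
		pkt_len -= len;
		/* The last descriptor is reused for the rest of the packet. */
		if (idx + 1 < n_seg)
			idx++;
	}
	return n;
}
```

Feeding it the four descriptors from the example (with 128 standing in
for RTE_PKTMBUF_HEADROOM) yields three chunks for a 46-byte packet and
six chunks for a 1500-byte packet, the last one carrying 422 bytes.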

The offload DEV_RX_OFFLOAD_SCATTER must be present and
configured to support the new buffer split feature (if n_seg
is greater than one).

The new approach would allow splitting the ingress packets into
multiple parts pushed to the memory with different attributes.
For example, the packet headers can be pushed to the embedded
data buffers within mbufs and the application data into
the external buffers attached to mbufs allocated from the
different memory pools. The memory attributes for the split
parts may differ as well - for example, the application data
may be pushed into the external memory located on the dedicated
physical device, say GPU or NVMe. This would improve the DPDK
receiving datapath flexibility while preserving compatibility
with the existing API.

Also, the proposed segment description might be used to specify
Rx packet split for some other features. For example, it could
provide a way to specify the extra memory pool for the Header
Split feature of some Intel PMDs.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
 doc/guides/nics/features.rst           |  15 +++
 doc/guides/rel_notes/release_20_11.rst |   6 ++
 lib/librte_ethdev/rte_ethdev.c         | 174 +++++++++++++++++++++++++++++++++
 lib/librte_ethdev/rte_ethdev.h         |  16 +++
 lib/librte_ethdev/rte_ethdev_driver.h  |  10 ++
 5 files changed, 221 insertions(+)

diff --git a/doc/guides/nics/features.rst b/doc/guides/nics/features.rst
index dd8c955..ac9dfd7 100644
--- a/doc/guides/nics/features.rst
+++ b/doc/guides/nics/features.rst
@@ -185,6 +185,21 @@ Supports receiving segmented mbufs.
 * **[related]    eth_dev_ops**: ``rx_pkt_burst``.
 
 
+.. _nic_features_buffer_split:
+
+Buffer Split on Rx
+------------------
+
+Scatters the packets being received on specified boundaries to segmented mbufs.
+
+* **[uses]       rte_eth_rxconf,rte_eth_rxmode**: ``offloads:DEV_RX_OFFLOAD_BUFFER_SPLIT``.
+* **[implements] datapath**: ``Buffer Split functionality``.
+* **[implements] rte_eth_dev_data**: ``buffer_split``.
+* **[provides]   rte_eth_dev_info**: ``rx_offload_capa:DEV_RX_OFFLOAD_BUFFER_SPLIT``.
+* **[provides]   eth_dev_ops**: ``rxq_info_get:buffer_split``.
+* **[related]    API**: ``rte_eth_rx_queue_setup_ex()``.
+
+
 .. _nic_features_lro:
 
 LRO
diff --git a/doc/guides/rel_notes/release_20_11.rst b/doc/guides/rel_notes/release_20_11.rst
index 4bcf220..8da5cc9 100644
--- a/doc/guides/rel_notes/release_20_11.rst
+++ b/doc/guides/rel_notes/release_20_11.rst
@@ -55,6 +55,12 @@ New Features
      Also, make sure to start the actual text at the margin.
      =======================================================
 
+* **Introduced extended buffer description for receiving.**
+
+  * Added extended Rx queue setup routine
+  * Added description for Rx segment sizes
+  * Added capability to specify the memory pool for each segment
+
 * **Updated Cisco enic driver.**
 
   * Added support for VF representors with single-queue Tx/Rx and flow API
diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
index dfe5c1b..c626afa 100644
--- a/lib/librte_ethdev/rte_ethdev.c
+++ b/lib/librte_ethdev/rte_ethdev.c
@@ -128,6 +128,7 @@ struct rte_eth_xstats_name_off {
 	RTE_RX_OFFLOAD_BIT2STR(SCTP_CKSUM),
 	RTE_RX_OFFLOAD_BIT2STR(OUTER_UDP_CKSUM),
 	RTE_RX_OFFLOAD_BIT2STR(RSS_HASH),
+	RTE_RX_OFFLOAD_BIT2STR(BUFFER_SPLIT),
 };
 
 #undef RTE_RX_OFFLOAD_BIT2STR
@@ -1933,6 +1934,179 @@ struct rte_eth_dev *
 }
 
 int
+rte_eth_rx_queue_setup_ex(uint16_t port_id, uint16_t rx_queue_id,
+			  uint16_t nb_rx_desc, unsigned int socket_id,
+			  const struct rte_eth_rxconf *rx_conf,
+			  const struct rte_eth_rxseg *rx_seg, uint16_t n_seg)
+{
+	int ret;
+	uint16_t seg_idx;
+	uint32_t mbp_buf_size;
+	struct rte_eth_dev *dev;
+	struct rte_eth_dev_info dev_info;
+	struct rte_eth_rxconf local_conf;
+	void **rxq;
+
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
+
+	dev = &rte_eth_devices[port_id];
+	if (rx_queue_id >= dev->data->nb_rx_queues) {
+		RTE_ETHDEV_LOG(ERR, "Invalid RX queue_id=%u\n", rx_queue_id);
+		return -EINVAL;
+	}
+
+	if (rx_seg == NULL) {
+		RTE_ETHDEV_LOG(ERR, "Invalid null description pointer\n");
+		return -EINVAL;
+	}
+
+	if (n_seg == 0) {
+		RTE_ETHDEV_LOG(ERR, "Invalid zero description number\n");
+		return -EINVAL;
+	}
+
+	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_setup_ex, -ENOTSUP);
+
+	/*
+	 * Check the size of the mbuf data buffer.
+	 * This value must be provided in the private data of the memory pool.
+	 * First check that the memory pool has a valid private data.
+	 */
+	ret = rte_eth_dev_info_get(port_id, &dev_info);
+	if (ret != 0)
+		return ret;
+
+	for (seg_idx = 0; seg_idx < n_seg; seg_idx++) {
+		struct rte_mempool *mp = rx_seg[seg_idx].mp;
+
+		if (mp->private_data_size <
+				sizeof(struct rte_pktmbuf_pool_private)) {
+			RTE_ETHDEV_LOG(ERR, "%s private_data_size %d < %d\n",
+				mp->name, (int)mp->private_data_size,
+				(int)sizeof(struct rte_pktmbuf_pool_private));
+			return -ENOSPC;
+		}
+
+		mbp_buf_size = rte_pktmbuf_data_room_size(mp);
+		if (mbp_buf_size < rx_seg[seg_idx].length +
+				   rx_seg[seg_idx].offset +
+				   (seg_idx ? 0 :
+				    (uint32_t)RTE_PKTMBUF_HEADROOM)) {
+			RTE_ETHDEV_LOG(ERR,
+				"%s mbuf_data_room_size %d < %d"
+				" (segment length=%d + segment offset=%d)\n",
+				mp->name, (int)mbp_buf_size,
+				(int)(rx_seg[seg_idx].length +
+				      rx_seg[seg_idx].offset),
+				(int)rx_seg[seg_idx].length,
+				(int)rx_seg[seg_idx].offset);
+			return -EINVAL;
+		}
+	}
+
+	/* Use default specified by driver, if nb_rx_desc is zero */
+	if (nb_rx_desc == 0) {
+		nb_rx_desc = dev_info.default_rxportconf.ring_size;
+		/* If driver default is also zero, fall back on EAL default */
+		if (nb_rx_desc == 0)
+			nb_rx_desc = RTE_ETH_DEV_FALLBACK_RX_RINGSIZE;
+	}
+
+	if (nb_rx_desc > dev_info.rx_desc_lim.nb_max ||
+			nb_rx_desc < dev_info.rx_desc_lim.nb_min ||
+			nb_rx_desc % dev_info.rx_desc_lim.nb_align != 0) {
+
+		RTE_ETHDEV_LOG(ERR,
+			"Invalid value for nb_rx_desc(=%hu), should be: "
+			"<= %hu, >= %hu, and a product of %hu\n",
+			nb_rx_desc, dev_info.rx_desc_lim.nb_max,
+			dev_info.rx_desc_lim.nb_min,
+			dev_info.rx_desc_lim.nb_align);
+		return -EINVAL;
+	}
+
+	if (dev->data->dev_started &&
+		!(dev_info.dev_capa &
+			RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP))
+		return -EBUSY;
+
+	if (dev->data->dev_started &&
+		(dev->data->rx_queue_state[rx_queue_id] !=
+			RTE_ETH_QUEUE_STATE_STOPPED))
+		return -EBUSY;
+
+	rxq = dev->data->rx_queues;
+	if (rxq[rx_queue_id]) {
+		RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_release,
+					-ENOTSUP);
+		(*dev->dev_ops->rx_queue_release)(rxq[rx_queue_id]);
+		rxq[rx_queue_id] = NULL;
+	}
+
+	if (rx_conf == NULL)
+		rx_conf = &dev_info.default_rxconf;
+
+	local_conf = *rx_conf;
+
+	/*
+	 * If an offloading has already been enabled in
+	 * rte_eth_dev_configure(), it has been enabled on all queues,
+	 * so there is no need to enable it in this queue again.
+	 * The local_conf.offloads input to underlying PMD only carries
+	 * those offloadings which are only enabled on this queue and
+	 * not enabled on all queues.
+	 */
+	local_conf.offloads &= ~dev->data->dev_conf.rxmode.offloads;
+
+	/*
+	 * New added offloadings for this queue are those not enabled in
+	 * rte_eth_dev_configure() and they must be per-queue type.
+	 * A pure per-port offloading can't be enabled on a queue while
+	 * disabled on another queue. A pure per-port offloading can't
+	 * be enabled for any queue as new added one if it hasn't been
+	 * enabled in rte_eth_dev_configure().
+	 */
+	if ((local_conf.offloads & dev_info.rx_queue_offload_capa) !=
+	     local_conf.offloads) {
+		RTE_ETHDEV_LOG(ERR,
+			"Ethdev port_id=%d rx_queue_id=%d, new added offloads"
+			" 0x%"PRIx64" must be within per-queue offload"
+			" capabilities 0x%"PRIx64" in %s()\n",
+			port_id, rx_queue_id, local_conf.offloads,
+			dev_info.rx_queue_offload_capa,
+			__func__);
+		return -EINVAL;
+	}
+
+	/*
+	 * If LRO is enabled, check that the maximum aggregated packet
+	 * size is supported by the configured device.
+	 */
+	if (local_conf.offloads & DEV_RX_OFFLOAD_TCP_LRO) {
+		if (dev->data->dev_conf.rxmode.max_lro_pkt_size == 0)
+			dev->data->dev_conf.rxmode.max_lro_pkt_size =
+				dev->data->dev_conf.rxmode.max_rx_pkt_len;
+		ret = check_lro_pkt_size(port_id,
+				dev->data->dev_conf.rxmode.max_lro_pkt_size,
+				dev->data->dev_conf.rxmode.max_rx_pkt_len,
+				dev_info.max_lro_pkt_size);
+		if (ret != 0)
+			return ret;
+	}
+
+	ret = (*dev->dev_ops->rx_queue_setup_ex)(dev, rx_queue_id, nb_rx_desc,
+						 socket_id, &local_conf,
+						 rx_seg, n_seg);
+	if (!ret) {
+		if (!dev->data->min_rx_buf_size ||
+		    dev->data->min_rx_buf_size > mbp_buf_size)
+			dev->data->min_rx_buf_size = mbp_buf_size;
+	}
+
+	return eth_err(port_id, ret);
+}
+
+int
 rte_eth_rx_hairpin_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
 			       uint16_t nb_rx_desc,
 			       const struct rte_eth_hairpin_conf *conf)
diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
index 645a186..553900b 100644
--- a/lib/librte_ethdev/rte_ethdev.h
+++ b/lib/librte_ethdev/rte_ethdev.h
@@ -970,6 +970,16 @@ struct rte_eth_txmode {
 };
 
 /**
+ * A structure used to configure an RX packet segment to split.
+ */
+struct rte_eth_rxseg {
+	struct rte_mempool *mp; /**< Memory pools to allocate segment from */
+	uint16_t length; /**< Segment maximal data length */
+	uint16_t offset; /**< Data offset from beginning of mbuf data buffer */
+	uint32_t reserved; /**< Reserved field */
+};
+
+/**
  * A structure used to configure an RX ring of an Ethernet port.
  */
 struct rte_eth_rxconf {
@@ -1260,6 +1270,7 @@ struct rte_eth_conf {
 #define DEV_RX_OFFLOAD_SCTP_CKSUM	0x00020000
 #define DEV_RX_OFFLOAD_OUTER_UDP_CKSUM  0x00040000
 #define DEV_RX_OFFLOAD_RSS_HASH		0x00080000
+#define DEV_RX_OFFLOAD_BUFFER_SPLIT     0x00100000
 
 #define DEV_RX_OFFLOAD_CHECKSUM (DEV_RX_OFFLOAD_IPV4_CKSUM | \
 				 DEV_RX_OFFLOAD_UDP_CKSUM | \
@@ -2020,6 +2031,11 @@ int rte_eth_rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
 		const struct rte_eth_rxconf *rx_conf,
 		struct rte_mempool *mb_pool);
 
+int rte_eth_rx_queue_setup_ex(uint16_t port_id, uint16_t rx_queue_id,
+		uint16_t nb_rx_desc, unsigned int socket_id,
+		const struct rte_eth_rxconf *rx_conf,
+		const struct rte_eth_rxseg *rx_seg, uint16_t n_seg);
+
 /**
  * @warning
  * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
diff --git a/lib/librte_ethdev/rte_ethdev_driver.h b/lib/librte_ethdev/rte_ethdev_driver.h
index 04ac8e9..de4d7de 100644
--- a/lib/librte_ethdev/rte_ethdev_driver.h
+++ b/lib/librte_ethdev/rte_ethdev_driver.h
@@ -264,6 +264,15 @@ typedef int (*eth_rx_queue_setup_t)(struct rte_eth_dev *dev,
 				    struct rte_mempool *mb_pool);
 /**< @internal Set up a receive queue of an Ethernet device. */
 
+typedef int (*eth_rx_queue_setup_ex_t)(struct rte_eth_dev *dev,
+				       uint16_t rx_queue_id,
+				       uint16_t nb_rx_desc,
+				       unsigned int socket_id,
+				       const struct rte_eth_rxconf *rx_conf,
+				       const struct rte_eth_rxseg *rx_seg,
+				       uint16_t n_seg);
+/**< @internal Extended setup of a receive queue of an Ethernet device. */
+
 typedef int (*eth_tx_queue_setup_t)(struct rte_eth_dev *dev,
 				    uint16_t tx_queue_id,
 				    uint16_t nb_tx_desc,
@@ -630,6 +639,7 @@ struct eth_dev_ops {
 	eth_queue_start_t          tx_queue_start;/**< Start TX for a queue. */
 	eth_queue_stop_t           tx_queue_stop; /**< Stop TX for a queue. */
 	eth_rx_queue_setup_t       rx_queue_setup;/**< Set up device RX queue. */
+	eth_rx_queue_setup_ex_t    rx_queue_setup_ex;/**< Extended RX setup. */
 	eth_queue_release_t        rx_queue_release; /**< Release RX queue. */
 
 	eth_rx_enable_intr_t       rx_queue_intr_enable;  /**< Enable Rx queue interrupt. */
-- 
1.8.3.1



* [dpdk-dev] [PATCH v2 2/9] app/testpmd: add multiple pools per core creation
  2020-10-07 15:06 ` [dpdk-dev] [PATCH v2 0/9] ethdev: introduce Rx buffer split Viacheslav Ovsiienko
  2020-10-07 15:06   ` [dpdk-dev] [PATCH v2 1/9] " Viacheslav Ovsiienko
@ 2020-10-07 15:06   ` Viacheslav Ovsiienko
  2020-10-07 15:06   ` [dpdk-dev] [PATCH v2 3/9] app/testpmd: add buffer split offload configuration Viacheslav Ovsiienko
                     ` (6 subsequent siblings)
  8 siblings, 0 replies; 172+ messages in thread
From: Viacheslav Ovsiienko @ 2020-10-07 15:06 UTC (permalink / raw)
  To: dev
  Cc: thomasm, stephen, ferruh.yigit, olivier.matz, jerinjacobk,
	maxime.coquelin, david.marchand, arybchenko

The command line parameter --mbuf-size is updated to accept
multiple values, like the following:

--mbuf-size=2176,512,768,4096

specifying the creation of extra memory pools with the requested
mbuf data buffer sizes. If some buffer split feature is engaged,
the extra memory pools can be used to configure the Rx queues
with rte_eth_rx_queue_setup_ex().
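
For illustration, the comma-separated value could be parsed along the
following lines. This is a standalone sketch and not the testpmd code,
which relies on its own parse_item_list() helper: parse_mbuf_sizes() is
a hypothetical name, and the limits (1..65535 per value, at most
MAX_SEGS_BUFFER_SPLIT values) are taken from the diff below.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_SEGS_BUFFER_SPLIT 8 /* same limit as defined in testpmd.h */

/*
 * Parse "N[,N1[,...Nn]]" into sizes[].
 * Returns the number of values parsed, or -1 if the string is
 * malformed, a value is out of the 1..65535 range, or more than
 * max values are given.
 */
static int
parse_mbuf_sizes(const char *arg, uint16_t *sizes, unsigned int max)
{
	unsigned int n = 0;
	const char *p = arg;

	if (arg == NULL || sizes == NULL)
		return -1;
	for (;;) {
		char *end;
		unsigned long v;

		errno = 0;
		v = strtoul(p, &end, 10);
		if (end == p || errno != 0 || v == 0 || v > 0xFFFF ||
		    n == max)
			return -1;
		sizes[n++] = (uint16_t)v;
		if (*end == '\0')
			return (int)n;
		if (*end != ',')
			return -1;
		p = end + 1;
	}
}
```

Parsing "2176,512,768,4096" returns 4, with 2176 describing the first,
mandatory pool and the remaining values describing the extra pools.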

The extra pools are created with requested sizes, and pool names
are assigned with appended index: mbuf_pool_socket_%socket_%index.
Index zero is used to specify the first mandatory pool to maintain
compatibility with existing code.
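
The naming rule and the layout of the pool table can be sketched as
follows. The code mirrors mbuf_poolname_build() and the indexing of the
mempools[] array from the diff below, but pool_name_build() and
pool_slot() are illustrative names and the snippet is meant to be read
standalone, outside of testpmd.

```c
#include <stdio.h>

#define MAX_SEGS_BUFFER_SPLIT 8 /* same limit as defined in testpmd.h */

/*
 * Build the pool name for a socket and a size index. Index zero keeps
 * the legacy name so that existing lookups continue to work.
 * Returns the snprintf() result, i.e. the length of the full name.
 */
static int
pool_name_build(unsigned int sock_id, unsigned int idx,
		char *name, size_t name_size)
{
	if (idx == 0)
		return snprintf(name, name_size, "mbuf_pool_socket_%u",
				sock_id);
	return snprintf(name, name_size, "mbuf_pool_socket_%u_%u",
			sock_id, idx);
}

/* Slot in the flat pool table for the given socket position and size. */
static unsigned int
pool_slot(unsigned int socket_pos, unsigned int size_idx)
{
	return socket_pos * MAX_SEGS_BUFFER_SPLIT + size_idx;
}
```

Each (socket, size index) pair has to map to a distinct name, since a
memory pool is later found by name through rte_mempool_lookup().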

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
 app/test-pmd/bpf_cmd.c                |  4 +--
 app/test-pmd/cmdline.c                |  2 +-
 app/test-pmd/config.c                 |  6 ++--
 app/test-pmd/parameters.c             | 24 +++++++++----
 app/test-pmd/testpmd.c                | 63 +++++++++++++++++++----------------
 app/test-pmd/testpmd.h                | 24 ++++++++++---
 doc/guides/testpmd_app_ug/run_app.rst |  7 ++--
 7 files changed, 83 insertions(+), 47 deletions(-)

diff --git a/app/test-pmd/bpf_cmd.c b/app/test-pmd/bpf_cmd.c
index 0f984cc..d8bb7ca 100644
--- a/app/test-pmd/bpf_cmd.c
+++ b/app/test-pmd/bpf_cmd.c
@@ -69,7 +69,7 @@ struct cmd_bpf_ld_result {
 
 	*flags = RTE_BPF_ETH_F_NONE;
 	arg->type = RTE_BPF_ARG_PTR;
-	arg->size = mbuf_data_size;
+	arg->size = mbuf_data_size[0];
 
 	for (i = 0; str[i] != 0; i++) {
 		v = toupper(str[i]);
@@ -78,7 +78,7 @@ struct cmd_bpf_ld_result {
 		else if (v == 'M') {
 			arg->type = RTE_BPF_ARG_PTR_MBUF;
 			arg->size = sizeof(struct rte_mbuf);
-			arg->buf_size = mbuf_data_size;
+			arg->buf_size = mbuf_data_size[0];
 		} else if (v == '-')
 			continue;
 		else
diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 08e123f..3f57182 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -2898,7 +2898,7 @@ struct cmd_setup_rxtx_queue {
 		if (!numa_support || socket_id == NUMA_NO_CONFIG)
 			socket_id = port->socket_id;
 
-		mp = mbuf_pool_find(socket_id);
+		mp = mbuf_pool_find(socket_id, 0);
 		if (mp == NULL) {
 			printf("Failed to setup RX queue: "
 				"No mempool allocation"
diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index 17a6efe..7048288 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -625,7 +625,7 @@ static int bus_match_all(const struct rte_bus *bus, const void *data)
 	printf("\nConnect to socket: %u", port->socket_id);
 
 	if (port_numa[port_id] != NUMA_NO_CONFIG) {
-		mp = mbuf_pool_find(port_numa[port_id]);
+		mp = mbuf_pool_find(port_numa[port_id], 0);
 		if (mp)
 			printf("\nmemory allocation on the socket: %d",
 							port_numa[port_id]);
@@ -3124,9 +3124,9 @@ struct igb_ring_desc_16_bytes {
 	 */
 	tx_pkt_len = 0;
 	for (i = 0; i < nb_segs; i++) {
-		if (seg_lengths[i] > (unsigned) mbuf_data_size) {
+		if (seg_lengths[i] > mbuf_data_size[0]) {
 			printf("length[%u]=%u > mbuf_data_size=%u - give up\n",
-			       i, seg_lengths[i], (unsigned) mbuf_data_size);
+			       i, seg_lengths[i], mbuf_data_size[0]);
 			return;
 		}
 		tx_pkt_len = (uint16_t)(tx_pkt_len + seg_lengths[i]);
diff --git a/app/test-pmd/parameters.c b/app/test-pmd/parameters.c
index 1ead595..1f40d73 100644
--- a/app/test-pmd/parameters.c
+++ b/app/test-pmd/parameters.c
@@ -106,7 +106,9 @@
 	       "(flag: 1 for RX; 2 for TX; 3 for RX and TX).\n");
 	printf("  --socket-num=N: set socket from which all memory is allocated "
 	       "in NUMA mode.\n");
-	printf("  --mbuf-size=N: set the data size of mbuf to N bytes.\n");
+	printf("  --mbuf-size=N[,N1[,..Nn]]: set the data size of mbuf to "
+	       "N bytes. If multiple numbers are specified the extra pools "
+	       "will be created to receive with packet split features\n");
 	printf("  --total-num-mbufs=N: set the number of mbufs to be allocated "
 	       "in mbuf pools.\n");
 	printf("  --max-pkt-len=N: set the maximum size of packet to N bytes.\n");
@@ -890,12 +892,22 @@
 				}
 			}
 			if (!strcmp(lgopts[opt_idx].name, "mbuf-size")) {
-				n = atoi(optarg);
-				if (n > 0 && n <= 0xFFFF)
-					mbuf_data_size = (uint16_t) n;
-				else
+				unsigned int mb_sz[MAX_SEGS_BUFFER_SPLIT];
+				unsigned int nb_segs, i;
+
+				nb_segs = parse_item_list(optarg, "mbuf-size",
+					MAX_SEGS_BUFFER_SPLIT, mb_sz, 0);
+				if (nb_segs == 0)
 					rte_exit(EXIT_FAILURE,
-						 "mbuf-size should be > 0 and < 65536\n");
+						 "bad mbuf-size\n");
+				for (i = 0; i < nb_segs; i++) {
+					if (mb_sz[i] == 0 || mb_sz[i] > 0xFFFF)
+						rte_exit(EXIT_FAILURE,
+							 "mbuf-size should be "
+							 "> 0 and < 65536\n");
+					mbuf_data_size[i] = (uint16_t) mb_sz[i];
+				}
+				mbuf_data_size_n = nb_segs;
 			}
 			if (!strcmp(lgopts[opt_idx].name, "total-num-mbufs")) {
 				n = atoi(optarg);
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index fe6450c..f5060ee 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -186,7 +186,7 @@ struct fwd_engine * fwd_engines[] = {
 	NULL,
 };
 
-struct rte_mempool *mempools[RTE_MAX_NUMA_NODES];
+struct rte_mempool *mempools[RTE_MAX_NUMA_NODES * MAX_SEGS_BUFFER_SPLIT];
 uint16_t mempool_flags;
 
 struct fwd_config cur_fwd_config;
@@ -195,7 +195,10 @@ struct fwd_engine * fwd_engines[] = {
 uint32_t burst_tx_delay_time = BURST_TX_WAIT_US;
 uint32_t burst_tx_retry_num = BURST_TX_RETRIES;
 
-uint16_t mbuf_data_size = DEFAULT_MBUF_DATA_SIZE; /**< Mbuf data space size. */
+uint32_t mbuf_data_size_n = 1; /* Number of specified mbuf sizes. */
+uint16_t mbuf_data_size[MAX_SEGS_BUFFER_SPLIT] = {
+	DEFAULT_MBUF_DATA_SIZE
+}; /**< Mbuf data space size. */
 uint32_t param_total_num_mbufs = 0;  /**< number of mbufs in all pools - if
                                       * specified on command-line. */
 uint16_t stats_period; /**< Period to show statistics (disabled by default) */
@@ -955,14 +958,14 @@ struct extmem_param {
  */
 static struct rte_mempool *
 mbuf_pool_create(uint16_t mbuf_seg_size, unsigned nb_mbuf,
-		 unsigned int socket_id)
+		 unsigned int socket_id, unsigned int size_idx)
 {
 	char pool_name[RTE_MEMPOOL_NAMESIZE];
 	struct rte_mempool *rte_mp = NULL;
 	uint32_t mb_size;
 
 	mb_size = sizeof(struct rte_mbuf) + mbuf_seg_size;
-	mbuf_poolname_build(socket_id, pool_name, sizeof(pool_name));
+	mbuf_poolname_build(socket_id, pool_name, sizeof(pool_name), size_idx);
 
 	TESTPMD_LOG(INFO,
 		"create a new mbuf pool <%s>: n=%u, size=%u, socket=%u\n",
@@ -1485,8 +1488,8 @@ struct extmem_param {
 				port->dev_info.rx_desc_lim.nb_mtu_seg_max;
 
 			if ((data_size + RTE_PKTMBUF_HEADROOM) >
-							mbuf_data_size) {
-				mbuf_data_size = data_size +
+							mbuf_data_size[0]) {
+				mbuf_data_size[0] = data_size +
 						 RTE_PKTMBUF_HEADROOM;
 				warning = 1;
 			}
@@ -1494,9 +1497,9 @@ struct extmem_param {
 	}
 
 	if (warning)
-		TESTPMD_LOG(WARNING, "Configured mbuf size %hu\n",
-			    mbuf_data_size);
-
+		TESTPMD_LOG(WARNING,
+			    "Configured mbuf size of the first segment %hu\n",
+			    mbuf_data_size[0]);
 	/*
 	 * Create pools of mbuf.
 	 * If NUMA support is disabled, create a single pool of mbuf in
@@ -1516,21 +1519,23 @@ struct extmem_param {
 	}
 
 	if (numa_support) {
-		uint8_t i;
+		uint8_t i, j;
 
 		for (i = 0; i < num_sockets; i++)
-			mempools[i] = mbuf_pool_create(mbuf_data_size,
-						       nb_mbuf_per_pool,
-						       socket_ids[i]);
+			for (j = 0; j < mbuf_data_size_n; j++)
+				mempools[i * MAX_SEGS_BUFFER_SPLIT + j] =
+					mbuf_pool_create(mbuf_data_size[j],
+							  nb_mbuf_per_pool,
+							  socket_ids[i], j);
 	} else {
-		if (socket_num == UMA_NO_CONFIG)
-			mempools[0] = mbuf_pool_create(mbuf_data_size,
-						       nb_mbuf_per_pool, 0);
-		else
-			mempools[socket_num] = mbuf_pool_create
-							(mbuf_data_size,
-							 nb_mbuf_per_pool,
-							 socket_num);
+		uint8_t i;
+
+		for (i = 0; i < mbuf_data_size_n; i++)
+			mempools[i] = mbuf_pool_create
+					(mbuf_data_size[i],
+					 nb_mbuf_per_pool,
+					 socket_num == UMA_NO_CONFIG ?
+					 0 : socket_num, i);
 	}
 
 	init_port_config();
@@ -1542,10 +1547,10 @@ struct extmem_param {
 	 */
 	for (lc_id = 0; lc_id < nb_lcores; lc_id++) {
 		mbp = mbuf_pool_find(
-			rte_lcore_to_socket_id(fwd_lcores_cpuids[lc_id]));
+			rte_lcore_to_socket_id(fwd_lcores_cpuids[lc_id]), 0);
 
 		if (mbp == NULL)
-			mbp = mbuf_pool_find(0);
+			mbp = mbuf_pool_find(0, 0);
 		fwd_lcores[lc_id]->mbp = mbp;
 		/* initialize GSO context */
 		fwd_lcores[lc_id]->gso_ctx.direct_pool = mbp;
@@ -2498,7 +2503,8 @@ struct extmem_param {
 				if ((numa_support) &&
 					(rxring_numa[pi] != NUMA_NO_CONFIG)) {
 					struct rte_mempool * mp =
-						mbuf_pool_find(rxring_numa[pi]);
+						mbuf_pool_find
+							(rxring_numa[pi], 0);
 					if (mp == NULL) {
 						printf("Failed to setup RX queue:"
 							"No mempool allocation"
@@ -2514,7 +2520,8 @@ struct extmem_param {
 					     mp);
 				} else {
 					struct rte_mempool *mp =
-						mbuf_pool_find(port->socket_id);
+						mbuf_pool_find
+							(port->socket_id, 0);
 					if (mp == NULL) {
 						printf("Failed to setup RX queue:"
 							"No mempool allocation"
@@ -2928,13 +2935,13 @@ struct extmem_param {
 pmd_test_exit(void)
 {
 	portid_t pt_id;
+	unsigned int i;
 	int ret;
-	int i;
 
 	if (test_done == 0)
 		stop_packet_forwarding();
 
-	for (i = 0 ; i < RTE_MAX_NUMA_NODES ; i++) {
+	for (i = 0 ; i < RTE_DIM(mempools) ; i++) {
 		if (mempools[i]) {
 			if (mp_alloc_type == MP_ALLOC_ANON)
 				rte_mempool_mem_iter(mempools[i], dma_unmap_cb,
@@ -2978,7 +2985,7 @@ struct extmem_param {
 			return;
 		}
 	}
-	for (i = 0 ; i < RTE_MAX_NUMA_NODES ; i++) {
+	for (i = 0 ; i < RTE_DIM(mempools) ; i++) {
 		if (mempools[i])
 			rte_mempool_free(mempools[i]);
 	}
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index c7e7e41..e5cdd12 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -42,6 +42,13 @@
  */
 #define RTE_MAX_SEGS_PER_PKT 255 /**< nb_segs is a 8-bit unsigned char. */
 
+/*
+ * The maximum number of segments per packet is used to configure
+ * buffer split feature, also specifies the maximum amount of
+ * optional Rx pools to allocate mbufs to split.
+ */
+#define MAX_SEGS_BUFFER_SPLIT 8 /**< Maximum number of Rx split segments. */
+
 #define MAX_PKT_BURST 512
 #define DEF_PKT_BURST 32
 
@@ -393,7 +400,9 @@ struct queue_stats_mappings {
 extern uint8_t dcb_config;
 extern uint8_t dcb_test;
 
-extern uint16_t mbuf_data_size; /**< Mbuf data space size. */
+extern uint32_t mbuf_data_size_n;
+extern uint16_t mbuf_data_size[MAX_SEGS_BUFFER_SPLIT];
+/**< Mbuf data space size. */
 extern uint32_t param_total_num_mbufs;
 
 extern uint16_t stats_period;
@@ -604,17 +613,22 @@ struct mplsoudp_decap_conf {
 
 /* Mbuf Pools */
 static inline void
-mbuf_poolname_build(unsigned int sock_id, char* mp_name, int name_size)
+mbuf_poolname_build(unsigned int sock_id, char *mp_name,
+		    int name_size, unsigned int idx)
 {
-	snprintf(mp_name, name_size, "mbuf_pool_socket_%u", sock_id);
+	if (!idx)
+		snprintf(mp_name, name_size, "mbuf_pool_socket_%u", sock_id);
+	else
+		snprintf(mp_name, name_size, "mbuf_pool_socket_%u_%u",
+			 sock_id, idx);
 }
 
 static inline struct rte_mempool *
-mbuf_pool_find(unsigned int sock_id)
+mbuf_pool_find(unsigned int sock_id, unsigned int idx)
 {
 	char pool_name[RTE_MEMPOOL_NAMESIZE];
 
-	mbuf_poolname_build(sock_id, pool_name, sizeof(pool_name));
+	mbuf_poolname_build(sock_id, pool_name, sizeof(pool_name), idx);
 	return rte_mempool_lookup((const char *)pool_name);
 }
 
diff --git a/doc/guides/testpmd_app_ug/run_app.rst b/doc/guides/testpmd_app_ug/run_app.rst
index e2539f6..2d5a263 100644
--- a/doc/guides/testpmd_app_ug/run_app.rst
+++ b/doc/guides/testpmd_app_ug/run_app.rst
@@ -107,9 +107,12 @@ The command line options are:
     Set the socket from which all memory is allocated in NUMA mode,
     where 0 <= N < number of sockets on the board.
 
-*   ``--mbuf-size=N``
+*   ``--mbuf-size=N[,N1[,...Nn]]``
 
-    Set the data size of the mbufs used to N bytes, where N < 65536. The default value is 2048.
+    Set the data size of the mbufs used to N bytes, where N < 65536.
+    The default value is 2048. If multiple mbuf-size values are specified the
+    extra memory pools will be created for allocating mbufs to receive packets
+    with buffer splitting features.
 
 *   ``--total-num-mbufs=N``
 
-- 
1.8.3.1



* [dpdk-dev] [PATCH v2 3/9] app/testpmd: add buffer split offload configuration
  2020-10-07 15:06 ` [dpdk-dev] [PATCH v2 0/9] ethdev: introduce Rx buffer split Viacheslav Ovsiienko
  2020-10-07 15:06   ` [dpdk-dev] [PATCH v2 1/9] " Viacheslav Ovsiienko
  2020-10-07 15:06   ` [dpdk-dev] [PATCH v2 2/9] app/testpmd: add multiple pools per core creation Viacheslav Ovsiienko
@ 2020-10-07 15:06   ` Viacheslav Ovsiienko
  2020-10-07 15:06   ` [dpdk-dev] [PATCH v2 4/9] app/testpmd: add rxpkts commands and parameters Viacheslav Ovsiienko
                     ` (5 subsequent siblings)
  8 siblings, 0 replies; 172+ messages in thread
From: Viacheslav Ovsiienko @ 2020-10-07 15:06 UTC (permalink / raw)
  To: dev
  Cc: thomasm, stephen, ferruh.yigit, olivier.matz, jerinjacobk,
	maxime.coquelin, david.marchand, arybchenko

This patch adds support for DEV_RX_OFFLOAD_BUFFER_SPLIT,
providing per-queue configuration for this offload.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
 app/test-pmd/cmdline.c | 21 +++++++++++----------
 app/test-pmd/config.c  |  9 +++++++++
 2 files changed, 20 insertions(+), 10 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 3f57182..24ca56a 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -874,16 +874,16 @@ static void cmd_help_long_parsed(void *parsed_result,
 			"port config <port_id> rx_offload vlan_strip|"
 			"ipv4_cksum|udp_cksum|tcp_cksum|tcp_lro|qinq_strip|"
 			"outer_ipv4_cksum|macsec_strip|header_split|"
-			"vlan_filter|vlan_extend|jumbo_frame|"
-			"scatter|timestamp|security|keep_crc on|off\n"
+			"vlan_filter|vlan_extend|jumbo_frame|scatter|"
+			"buffer_split|timestamp|security|keep_crc on|off\n"
 			"     Enable or disable a per port Rx offloading"
 			" on all Rx queues of a port\n\n"
 
 			"port (port_id) rxq (queue_id) rx_offload vlan_strip|"
 			"ipv4_cksum|udp_cksum|tcp_cksum|tcp_lro|qinq_strip|"
 			"outer_ipv4_cksum|macsec_strip|header_split|"
-			"vlan_filter|vlan_extend|jumbo_frame|"
-			"scatter|timestamp|security|keep_crc on|off\n"
+			"vlan_filter|vlan_extend|jumbo_frame|scatter|"
+			"buffer_split|timestamp|security|keep_crc on|off\n"
 			"    Enable or disable a per queue Rx offloading"
 			" only on a specific Rx queue\n\n"
 
@@ -18399,7 +18399,8 @@ struct cmd_config_per_port_rx_offload_result {
 		 offload, "vlan_strip#ipv4_cksum#udp_cksum#tcp_cksum#tcp_lro#"
 			   "qinq_strip#outer_ipv4_cksum#macsec_strip#"
 			   "header_split#vlan_filter#vlan_extend#jumbo_frame#"
-			   "scatter#timestamp#security#keep_crc#rss_hash");
+			   "scatter#buffer_split#timestamp#security#"
+			   "keep_crc#rss_hash");
 cmdline_parse_token_string_t cmd_config_per_port_rx_offload_result_on_off =
 	TOKEN_STRING_INITIALIZER
 		(struct cmd_config_per_port_rx_offload_result,
@@ -18479,8 +18480,8 @@ struct cmd_config_per_port_rx_offload_result {
 	.help_str = "port config <port_id> rx_offload vlan_strip|ipv4_cksum|"
 		    "udp_cksum|tcp_cksum|tcp_lro|qinq_strip|outer_ipv4_cksum|"
 		    "macsec_strip|header_split|vlan_filter|vlan_extend|"
-		    "jumbo_frame|scatter|timestamp|security|keep_crc|rss_hash "
-		    "on|off",
+		    "jumbo_frame|scatter|buffer_split|timestamp|security|"
+		    "keep_crc|rss_hash on|off",
 	.tokens = {
 		(void *)&cmd_config_per_port_rx_offload_result_port,
 		(void *)&cmd_config_per_port_rx_offload_result_config,
@@ -18529,7 +18530,7 @@ struct cmd_config_per_queue_rx_offload_result {
 		 offload, "vlan_strip#ipv4_cksum#udp_cksum#tcp_cksum#tcp_lro#"
 			   "qinq_strip#outer_ipv4_cksum#macsec_strip#"
 			   "header_split#vlan_filter#vlan_extend#jumbo_frame#"
-			   "scatter#timestamp#security#keep_crc");
+			   "scatter#buffer_split#timestamp#security#keep_crc");
 cmdline_parse_token_string_t cmd_config_per_queue_rx_offload_result_on_off =
 	TOKEN_STRING_INITIALIZER
 		(struct cmd_config_per_queue_rx_offload_result,
@@ -18585,8 +18586,8 @@ struct cmd_config_per_queue_rx_offload_result {
 		    "vlan_strip|ipv4_cksum|"
 		    "udp_cksum|tcp_cksum|tcp_lro|qinq_strip|outer_ipv4_cksum|"
 		    "macsec_strip|header_split|vlan_filter|vlan_extend|"
-		    "jumbo_frame|scatter|timestamp|security|keep_crc "
-		    "on|off",
+		    "jumbo_frame|scatter|buffer_split|timestamp|security|"
+		    "keep_crc on|off",
 	.tokens = {
 		(void *)&cmd_config_per_queue_rx_offload_result_port,
 		(void *)&cmd_config_per_queue_rx_offload_result_port_id,
diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index 7048288..395ea6b 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -1027,6 +1027,15 @@ static int bus_match_all(const struct rte_bus *bus, const void *data)
 			printf("off\n");
 	}
 
+	if (dev_info.rx_offload_capa & DEV_RX_OFFLOAD_BUFFER_SPLIT) {
+		printf("RX offload buffer split:       ");
+		if (ports[port_id].dev_conf.rxmode.offloads &
+		    DEV_RX_OFFLOAD_BUFFER_SPLIT)
+			printf("on\n");
+		else
+			printf("off\n");
+	}
+
 	if (dev_info.tx_offload_capa & DEV_TX_OFFLOAD_VLAN_INSERT) {
 		printf("VLAN insert:                   ");
 		if (ports[port_id].dev_conf.txmode.offloads &
-- 
1.8.3.1



* [dpdk-dev] [PATCH v2 4/9] app/testpmd: add rxpkts commands and parameters
  2020-10-07 15:06 ` [dpdk-dev] [PATCH v2 0/9] ethdev: introduce Rx buffer split Viacheslav Ovsiienko
                     ` (2 preceding siblings ...)
  2020-10-07 15:06   ` [dpdk-dev] [PATCH v2 3/9] app/testpmd: add buffer split offload configuration Viacheslav Ovsiienko
@ 2020-10-07 15:06   ` Viacheslav Ovsiienko
  2020-10-07 15:06   ` [dpdk-dev] [PATCH v2 5/9] app/testpmd: add extended Rx queue setup Viacheslav Ovsiienko
                     ` (4 subsequent siblings)
  8 siblings, 0 replies; 172+ messages in thread
From: Viacheslav Ovsiienko @ 2020-10-07 15:06 UTC (permalink / raw)
  To: dev
  Cc: thomasm, stephen, ferruh.yigit, olivier.matz, jerinjacobk,
	maxime.coquelin, david.marchand, arybchenko

Add command line parameter:

--rxpkts=X[,Y]

Sets the lengths of the segments into which received packets are
scattered when the split feature is engaged. Affects only the queues
configured with split offloads (currently only BUFFER_SPLIT is supported).

Add interactive mode command:

testpmd> set rxpkts (x[,y]*)

Where x[,y]* represents a CSV list of values, without white space.

Sets the lengths of the segments into which received packets are
scattered when the split feature is engaged. Affects only the queues
configured with split offloads (currently only BUFFER_SPLIT is
supported). Optionally, multiple memory pools can be specified with the
--mbuf-size command line parameter, and the receive mbufs are then
allocated sequentially from these extra memory pools.
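
To make the accepted list format concrete, here is a minimal standalone
sketch of parsing such a CSV list of segment lengths. It is not the
testpmd parse_item_list() routine; the function name and the
SKETCH_MAX_SEGS limit are assumptions made for the example. Like the
patch, it rejects values that are not below UINT16_MAX and does not
accept white space.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_MAX_SEGS 8 /* assumed limit, stands in for MAX_SEGS_BUFFER_SPLIT */

/*
 * Hypothetical parser for "x[,y]*": decimal values separated by commas,
 * no white space. Returns the number of parsed segments, or -1 on error.
 */
static int
sketch_parse_seg_lengths(const char *str, uint16_t *out, unsigned int max_segs)
{
	unsigned int n = 0;

	if (str == NULL)
		return -1;
	for (;;) {
		char *end;
		unsigned long v;

		if (n == max_segs)
			return -1; /* too many items */
		if (*str < '0' || *str > '9')
			return -1; /* empty item, sign or white space */
		errno = 0;
		v = strtoul(str, &end, 10);
		if (errno != 0 || v >= UINT16_MAX)
			return -1; /* does not fit a segment length */
		out[n++] = (uint16_t)v;
		if (*end == '\0')
			return (int)n;
		if (*end != ',')
			return -1; /* unexpected separator */
		str = end + 1;
	}
}
```

A zero item is accepted here on purpose, since the documentation in this
patch defines zero as "use the pool data buffer size".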

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
 app/test-pmd/cmdline.c                      | 61 +++++++++++++++++++++++++++--
 app/test-pmd/config.c                       | 48 ++++++++++++++++++++++-
 app/test-pmd/parameters.c                   | 15 +++++++
 app/test-pmd/testpmd.c                      |  7 ++++
 app/test-pmd/testpmd.h                      | 11 +++++-
 doc/guides/testpmd_app_ug/run_app.rst       |  9 +++++
 doc/guides/testpmd_app_ug/testpmd_funcs.rst | 21 +++++++++-
 7 files changed, 165 insertions(+), 7 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 24ca56a..e0ac76e 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -183,7 +183,7 @@ static void cmd_help_long_parsed(void *parsed_result,
 			"show (rxq|txq) info (port_id) (queue_id)\n"
 			"    Display information for configured RX/TX queue.\n\n"
 
-			"show config (rxtx|cores|fwd|txpkts)\n"
+			"show config (rxtx|cores|fwd|rxpkts|txpkts)\n"
 			"    Display the given configuration.\n\n"
 
 			"read rxd (port_id) (queue_id) (rxd_id)\n"
@@ -288,6 +288,12 @@ static void cmd_help_long_parsed(void *parsed_result,
 			"    Set the transmit delay time and number of retries,"
 			" effective when retry is enabled.\n\n"
 
+			"set rxpkts (x[,y]*)\n"
+			"    Set the length of each segment to scatter"
+			" packets on receiving if split feature is engaged."
+			" Affects only the queues configured with split"
+			" offloads.\n\n"
+
 			"set txpkts (x[,y]*)\n"
 			"    Set the length of each segment of TXONLY"
 			" and optionally CSUM packets.\n\n"
@@ -3880,6 +3886,52 @@ struct cmd_set_log_result {
 	},
 };
 
+/* *** SET SEGMENT LENGTHS OF RX PACKETS SPLIT *** */
+
+struct cmd_set_rxpkts_result {
+	cmdline_fixed_string_t cmd_keyword;
+	cmdline_fixed_string_t rxpkts;
+	cmdline_fixed_string_t seg_lengths;
+};
+
+static void
+cmd_set_rxpkts_parsed(void *parsed_result,
+		      __rte_unused struct cmdline *cl,
+		      __rte_unused void *data)
+{
+	struct cmd_set_rxpkts_result *res;
+	unsigned int seg_lengths[MAX_SEGS_BUFFER_SPLIT];
+	unsigned int nb_segs;
+
+	res = parsed_result;
+	nb_segs = parse_item_list(res->seg_lengths, "segment lengths",
+				  MAX_SEGS_BUFFER_SPLIT, seg_lengths, 0);
+	if (nb_segs > 0)
+		set_rx_pkt_segments(seg_lengths, nb_segs);
+}
+
+cmdline_parse_token_string_t cmd_set_rxpkts_keyword =
+	TOKEN_STRING_INITIALIZER(struct cmd_set_rxpkts_result,
+				 cmd_keyword, "set");
+cmdline_parse_token_string_t cmd_set_rxpkts_name =
+	TOKEN_STRING_INITIALIZER(struct cmd_set_rxpkts_result,
+				 rxpkts, "rxpkts");
+cmdline_parse_token_string_t cmd_set_rxpkts_lengths =
+	TOKEN_STRING_INITIALIZER(struct cmd_set_rxpkts_result,
+				 seg_lengths, NULL);
+
+cmdline_parse_inst_t cmd_set_rxpkts = {
+	.f = cmd_set_rxpkts_parsed,
+	.data = NULL,
+	.help_str = "set rxpkts <len0[,len1]*>",
+	.tokens = {
+		(void *)&cmd_set_rxpkts_keyword,
+		(void *)&cmd_set_rxpkts_name,
+		(void *)&cmd_set_rxpkts_lengths,
+		NULL,
+	},
+};
+
 /* *** SET SEGMENT LENGTHS OF TXONLY PACKETS *** */
 
 struct cmd_set_txpkts_result {
@@ -7499,6 +7551,8 @@ static void cmd_showcfg_parsed(void *parsed_result,
 		fwd_lcores_config_display();
 	else if (!strcmp(res->what, "fwd"))
 		pkt_fwd_config_display(&cur_fwd_config);
+	else if (!strcmp(res->what, "rxpkts"))
+		show_rx_pkt_segments();
 	else if (!strcmp(res->what, "txpkts"))
 		show_tx_pkt_segments();
 	else if (!strcmp(res->what, "txtimes"))
@@ -7511,12 +7565,12 @@ static void cmd_showcfg_parsed(void *parsed_result,
 	TOKEN_STRING_INITIALIZER(struct cmd_showcfg_result, cfg, "config");
 cmdline_parse_token_string_t cmd_showcfg_what =
 	TOKEN_STRING_INITIALIZER(struct cmd_showcfg_result, what,
-				 "rxtx#cores#fwd#txpkts#txtimes");
+				 "rxtx#cores#fwd#rxpkts#txpkts#txtimes");
 
 cmdline_parse_inst_t cmd_showcfg = {
 	.f = cmd_showcfg_parsed,
 	.data = NULL,
-	.help_str = "show config rxtx|cores|fwd|txpkts|txtimes",
+	.help_str = "show config rxtx|cores|fwd|rxpkts|txpkts|txtimes",
 	.tokens = {
 		(void *)&cmd_showcfg_show,
 		(void *)&cmd_showcfg_port,
@@ -19569,6 +19623,7 @@ struct cmd_showport_macs_result {
 	(cmdline_parse_inst_t *)&cmd_reset,
 	(cmdline_parse_inst_t *)&cmd_set_numbers,
 	(cmdline_parse_inst_t *)&cmd_set_log,
+	(cmdline_parse_inst_t *)&cmd_set_rxpkts,
 	(cmdline_parse_inst_t *)&cmd_set_txpkts,
 	(cmdline_parse_inst_t *)&cmd_set_txsplit,
 	(cmdline_parse_inst_t *)&cmd_set_txtimes,
diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index 395ea6b..ff09ead 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -3096,6 +3096,50 @@ struct igb_ring_desc_16_bytes {
 }
 
 void
+show_rx_pkt_segments(void)
+{
+	uint32_t i, n;
+
+	n = rx_pkt_nb_segs;
+	printf("Number of segments: %u\n", n);
+	if (n) {
+		printf("Segment sizes: ");
+		for (i = 0; i != n - 1; i++)
+			printf("%hu,", rx_pkt_seg_lengths[i]);
+		printf("%hu\n", rx_pkt_seg_lengths[i]);
+	}
+}
+
+void
+set_rx_pkt_segments(unsigned int *seg_lengths, unsigned int nb_segs)
+{
+	unsigned int i;
+
+	if (nb_segs >= MAX_SEGS_BUFFER_SPLIT) {
+		printf("nb segments per RX packets=%u >= "
+		       "MAX_SEGS_BUFFER_SPLIT - ignored\n", nb_segs);
+		return;
+	}
+
+	/*
+	 * No extra check here, the segment length will be checked by PMD
+	 * in the extended queue setup.
+	 */
+	for (i = 0; i < nb_segs; i++) {
+		if (seg_lengths[i] >= UINT16_MAX) {
+			printf("length[%u]=%u >= UINT16_MAX - give up\n",
+			       i, seg_lengths[i]);
+			return;
+		}
+	}
+
+	for (i = 0; i < nb_segs; i++)
+		rx_pkt_seg_lengths[i] = (uint16_t) seg_lengths[i];
+
+	rx_pkt_nb_segs = (uint8_t) nb_segs;
+}
+
+void
 show_tx_pkt_segments(void)
 {
 	uint32_t i, n;
@@ -3113,10 +3157,10 @@ struct igb_ring_desc_16_bytes {
 }
 
 void
-set_tx_pkt_segments(unsigned *seg_lengths, unsigned nb_segs)
+set_tx_pkt_segments(unsigned int *seg_lengths, unsigned int nb_segs)
 {
 	uint16_t tx_pkt_len;
-	unsigned i;
+	unsigned int i;
 
 	if (nb_segs >= (unsigned) nb_txd) {
 		printf("nb segments per TX packets=%u >= nb_txd=%u - ignored\n",
diff --git a/app/test-pmd/parameters.c b/app/test-pmd/parameters.c
index 1f40d73..99f0223 100644
--- a/app/test-pmd/parameters.c
+++ b/app/test-pmd/parameters.c
@@ -184,6 +184,7 @@
 	       "(0 <= mapping <= %d).\n", RTE_ETHDEV_QUEUE_STAT_CNTRS - 1);
 	printf("  --no-flush-rx: Don't flush RX streams before forwarding."
 	       " Used mainly with PCAP drivers.\n");
+	printf("  --rxpkts=X[,Y]*: set RX segment sizes to split.\n");
 	printf("  --txpkts=X[,Y]*: set TX segment sizes"
 		" or total packet length.\n");
 	printf("  --txonly-multi-flow: generate multiple flows in txonly mode\n");
@@ -661,6 +662,7 @@
 		{ "rx-queue-stats-mapping",	1, 0, 0 },
 		{ "no-flush-rx",	0, 0, 0 },
 		{ "flow-isolate-all",	        0, 0, 0 },
+		{ "rxpkts",			1, 0, 0 },
 		{ "txpkts",			1, 0, 0 },
 		{ "txonly-multi-flow",		0, 0, 0 },
 		{ "disable-link-check",		0, 0, 0 },
@@ -1270,6 +1272,19 @@
 						 "invalid RX queue statistics mapping config entered\n");
 				}
 			}
+			if (!strcmp(lgopts[opt_idx].name, "rxpkts")) {
+				unsigned int seg_len[MAX_SEGS_BUFFER_SPLIT];
+				unsigned int nb_segs;
+
+				nb_segs = parse_item_list
+						(optarg, "rxpkt segments",
+						 MAX_SEGS_BUFFER_SPLIT,
+						 seg_len, 0);
+				if (nb_segs > 0)
+					set_rx_pkt_segments(seg_len, nb_segs);
+				else
+					rte_exit(EXIT_FAILURE, "bad rxpkts\n");
+			}
 			if (!strcmp(lgopts[opt_idx].name, "txpkts")) {
 				unsigned seg_lengths[RTE_MAX_SEGS_PER_PKT];
 				unsigned int nb_segs;
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index f5060ee..3c88ca7 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -210,6 +210,13 @@ struct fwd_engine * fwd_engines[] = {
 uint8_t f_quit;
 
 /*
+ * Configuration of packet segments used to scatter received packets
+ * if any of the split features is configured.
+ */
+uint16_t rx_pkt_seg_lengths[MAX_SEGS_BUFFER_SPLIT];
+uint8_t  rx_pkt_nb_segs; /**< Number of segments to split */
+
+/*
  * Configuration of packet segments used by the "txonly" processing engine.
  */
 uint16_t tx_pkt_length = TXONLY_DEF_PACKET_LEN; /**< TXONLY packet length. */
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index e5cdd12..0576b7c 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -420,6 +420,13 @@ struct queue_stats_mappings {
 extern struct rte_fdir_conf fdir_conf;
 
 /*
+ * Configuration of packet segments used to scatter received packets
+ * if any of the split features is configured.
+ */
+extern uint16_t rx_pkt_seg_lengths[MAX_SEGS_BUFFER_SPLIT];
+extern uint8_t  rx_pkt_nb_segs; /**< Number of segments to split */
+
+/*
  * Configuration of packet segments used by the "txonly" processing engine.
  */
 #define TXONLY_DEF_PACKET_LEN 64
@@ -815,7 +822,9 @@ void vlan_tpid_set(portid_t port_id, enum rte_vlan_type vlan_type,
 void set_record_core_cycles(uint8_t on_off);
 void set_record_burst_stats(uint8_t on_off);
 void set_verbose_level(uint16_t vb_level);
-void set_tx_pkt_segments(unsigned *seg_lengths, unsigned nb_segs);
+void set_rx_pkt_segments(unsigned int *seg_lengths, unsigned int nb_segs);
+void show_rx_pkt_segments(void);
+void set_tx_pkt_segments(unsigned int *seg_lengths, unsigned int nb_segs);
 void show_tx_pkt_segments(void);
 void set_tx_pkt_times(unsigned int *tx_times);
 void show_tx_pkt_times(void);
diff --git a/doc/guides/testpmd_app_ug/run_app.rst b/doc/guides/testpmd_app_ug/run_app.rst
index 2d5a263..9286281 100644
--- a/doc/guides/testpmd_app_ug/run_app.rst
+++ b/doc/guides/testpmd_app_ug/run_app.rst
@@ -361,6 +361,15 @@ The command line options are:
 
     Don't flush the RX streams before starting forwarding. Used mainly with the PCAP PMD.
 
+*   ``--rxpkts=X[,Y]``
+
+    Set the lengths of the segments into which received packets are scattered
+    when the split feature is engaged. Affects only the queues configured
+    with split offloads (currently only BUFFER_SPLIT is supported).
+    Optionally, multiple memory pools can be specified with the --mbuf-size
+    command line parameter, and the receive mbufs are then allocated
+    sequentially from these extra memory pools.
+
 *   ``--txpkts=X[,Y]``
 
     Set TX segment sizes or total packet length. Valid for ``tx-only``
diff --git a/doc/guides/testpmd_app_ug/testpmd_funcs.rst b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
index 7f067af..0466920 100644
--- a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
+++ b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
@@ -273,7 +273,7 @@ show config
 Displays the configuration of the application.
 The configuration comes from the command-line, the runtime or the application defaults::
 
-   testpmd> show config (rxtx|cores|fwd|txpkts|txtimes)
+   testpmd> show config (rxtx|cores|fwd|rxpkts|txpkts|txtimes)
 
 The available information categories are:
 
@@ -283,6 +283,8 @@ The available information categories are:
 
 * ``fwd``: Packet forwarding configuration.
 
+* ``rxpkts``: Packets to RX split configuration.
+
 * ``txpkts``: Packets to TX configuration.
 
 * ``txtimes``: Burst time pattern for Tx only mode.
@@ -760,6 +762,23 @@ When retry is enabled, the transmit delay time and number of retries can also be
 
    testpmd> set burst tx delay (microseconds) retry (num)
 
+set rxpkts
+~~~~~~~~~~
+
+Set the lengths of the segments into which received packets are scattered
+when the split feature is engaged. Affects only the queues configured with
+split offloads (currently only BUFFER_SPLIT is supported). Optionally,
+multiple memory pools can be specified with the --mbuf-size command line
+parameter, and the receive mbufs are then allocated sequentially from these
+pools: the mbuf for the first segment comes from the first pool, the second
+from the second pool, and so on. If there are more segments than pools, the
+mbufs for the remaining segments are allocated from the last valid pool.
+
+   testpmd> set rxpkts (x[,y]*)
+
+Where x[,y]* represents a CSV list of values, without white space. A zero value
+means that the data buffer size of the corresponding memory pool is used.
+
 set txpkts
 ~~~~~~~~~~
 
-- 
1.8.3.1



* [dpdk-dev] [PATCH v2 5/9] app/testpmd: add extended Rx queue setup
  2020-10-07 15:06 ` [dpdk-dev] [PATCH v2 0/9] ethdev: introduce Rx buffer split Viacheslav Ovsiienko
                     ` (3 preceding siblings ...)
  2020-10-07 15:06   ` [dpdk-dev] [PATCH v2 4/9] app/testpmd: add rxpkts commands and parameters Viacheslav Ovsiienko
@ 2020-10-07 15:06   ` Viacheslav Ovsiienko
  2020-10-07 15:06   ` [dpdk-dev] [PATCH v2 6/9] net/mlx5: add extended Rx queue setup routine Viacheslav Ovsiienko
                     ` (3 subsequent siblings)
  8 siblings, 0 replies; 172+ messages in thread
From: Viacheslav Ovsiienko @ 2020-10-07 15:06 UTC (permalink / raw)
  To: dev
  Cc: thomasm, stephen, ferruh.yigit, olivier.matz, jerinjacobk,
	maxime.coquelin, david.marchand, arybchenko

If an Rx queue is configured with the split feature, the extended
setup with the specified segment sizes and pools is performed.
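
The per-segment selection described here (pool i for segment i, the last
valid pool for any further segments, and a zero length meaning "use the
pool buffer size") can be sketched standalone as below. The struct and
function names are hypothetical and pools are represented only by their
index and data buffer size. The clamp uses "i >= nb_pools" so the index
never leaves the pool array.

```c
#include <stdint.h>

/* Hypothetical, simplified stand-in for one Rx segment description. */
struct sketch_seg {
	unsigned int pool_idx; /* index of the pool to allocate from */
	uint16_t length;       /* maximal data length of the segment */
};

/*
 * Fill nb_segs segment descriptions from the configured lengths.
 * Returns the number of segments filled, or 0 if there is no pool.
 */
static unsigned int
sketch_fill_segs(const uint16_t *seg_len, unsigned int nb_segs,
		 const uint16_t *pool_buf_size, unsigned int nb_pools,
		 struct sketch_seg *out)
{
	unsigned int i;

	if (nb_pools == 0)
		return 0;
	for (i = 0; i < nb_segs; i++) {
		/* Segments beyond the pool count reuse the last valid pool. */
		unsigned int p = (i >= nb_pools) ? nb_pools - 1 : i;

		out[i].pool_idx = p;
		/* Zero length means the pool data buffer size is used. */
		out[i].length = seg_len[i] ? seg_len[i] : pool_buf_size[p];
	}
	return nb_segs;
}
```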

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
 app/test-pmd/cmdline.c | 12 ++++++------
 app/test-pmd/testpmd.c | 38 ++++++++++++++++++++++++++++++++++++--
 app/test-pmd/testpmd.h |  6 ++++++
 3 files changed, 48 insertions(+), 8 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index e0ac76e..1c65499 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -2912,12 +2912,12 @@ struct cmd_setup_rxtx_queue {
 				rxring_numa[res->portid]);
 			return;
 		}
-		ret = rte_eth_rx_queue_setup(res->portid,
-					     res->qid,
-					     port->nb_rx_desc[res->qid],
-					     socket_id,
-					     &port->rx_conf[res->qid],
-					     mp);
+		ret = rx_queue_setup(res->portid,
+				     res->qid,
+				     port->nb_rx_desc[res->qid],
+				     socket_id,
+				     &port->rx_conf[res->qid],
+				     mp);
 		if (ret)
 			printf("Failed to setup RX queue\n");
 	} else {
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 3c88ca7..cd17cb0 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -2412,6 +2412,40 @@ struct extmem_param {
 	return 0;
 }
 
+/* Configure the Rx with optional split. */
+int
+rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
+	       uint16_t nb_rx_desc, unsigned int socket_id,
+	       const struct rte_eth_rxconf *rx_conf,
+	       struct rte_mempool *mp)
+{
+	struct rte_eth_rxseg rx_seg[MAX_SEGS_BUFFER_SPLIT] = {};
+	unsigned int i, mp_n;
+
+	if (rx_pkt_nb_segs <= 1 ||
+	    (rx_conf->offloads & DEV_RX_OFFLOAD_BUFFER_SPLIT) == 0)
+		return rte_eth_rx_queue_setup(port_id, rx_queue_id,
+					      nb_rx_desc, socket_id,
+					      rx_conf, mp);
+	for (i = 0; i < rx_pkt_nb_segs; i++) {
+		struct rte_mempool *mpx;
+		/*
+		 * Use last valid pool for the segments with number
+		 * exceeding the pool index.
+		 */
+		mp_n = (i >= mbuf_data_size_n) ? mbuf_data_size_n - 1 : i;
+		mpx = mbuf_pool_find(socket_id, mp_n);
+		/* Handle zero as mbuf data buffer size. */
+		rx_seg[i].length = rx_pkt_seg_lengths[i] ?
+				   rx_pkt_seg_lengths[i] :
+				   mbuf_data_size[mp_n];
+		rx_seg[i].mp = mpx ? mpx : mp;
+	}
+	return rte_eth_rx_queue_setup_ex(port_id, rx_queue_id,
+					 nb_rx_desc, socket_id, rx_conf,
+					 rx_seg, rx_pkt_nb_segs);
+}
+
 int
 start_port(portid_t pid)
 {
@@ -2520,7 +2554,7 @@ struct extmem_param {
 						return -1;
 					}
 
-					diag = rte_eth_rx_queue_setup(pi, qi,
+					diag = rx_queue_setup(pi, qi,
 					     port->nb_rx_desc[qi],
 					     rxring_numa[pi],
 					     &(port->rx_conf[qi]),
@@ -2536,7 +2570,7 @@ struct extmem_param {
 							port->socket_id);
 						return -1;
 					}
-					diag = rte_eth_rx_queue_setup(pi, qi,
+					diag = rx_queue_setup(pi, qi,
 					     port->nb_rx_desc[qi],
 					     port->socket_id,
 					     &(port->rx_conf[qi]),
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index 0576b7c..1953c11 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -869,6 +869,12 @@ void port_rss_reta_info(portid_t port_id,
 
 void set_vf_traffic(portid_t port_id, uint8_t is_rx, uint16_t vf, uint8_t on);
 
+int
+rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
+	       uint16_t nb_rx_desc, unsigned int socket_id,
+	       const struct rte_eth_rxconf *rx_conf,
+	       struct rte_mempool *mp);
+
 int set_queue_rate_limit(portid_t port_id, uint16_t queue_idx, uint16_t rate);
 int set_vf_rate_limit(portid_t port_id, uint16_t vf, uint16_t rate,
 				uint64_t q_msk);
-- 
1.8.3.1



* [dpdk-dev] [PATCH v2 6/9] net/mlx5: add extended Rx queue setup routine
  2020-10-07 15:06 ` [dpdk-dev] [PATCH v2 0/9] ethdev: introduce Rx buffer split Viacheslav Ovsiienko
                     ` (4 preceding siblings ...)
  2020-10-07 15:06   ` [dpdk-dev] [PATCH v2 5/9] app/testpmd: add extended Rx queue setup Viacheslav Ovsiienko
@ 2020-10-07 15:06   ` Viacheslav Ovsiienko
  2020-10-07 15:06   ` [dpdk-dev] [PATCH v2 7/9] net/mlx5: configure Rx queue to support split Viacheslav Ovsiienko
                     ` (2 subsequent siblings)
  8 siblings, 0 replies; 172+ messages in thread
From: Viacheslav Ovsiienko @ 2020-10-07 15:06 UTC (permalink / raw)
  To: dev
  Cc: thomasm, stephen, ferruh.yigit, olivier.matz, jerinjacobk,
	maxime.coquelin, david.marchand, arybchenko

Add the Rx queue setup routine that takes an extended
receiving buffer description.
It allows the application to specify the desired segment
lengths, the data position offsets in the buffer,
and a dedicated memory pool for each segment.
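
The argument checks that the new routine performs before creating the
queue can be summarized by the following standalone sketch. The flag
bits, the segment limit, and all sketch_ names are placeholders chosen
for the example. The struct only mirrors the field layout given in the
RFC, and the error codes follow the ones used in the diff below.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_OFFLOAD_SCATTER      (UINT64_C(1) << 0) /* placeholder bit */
#define SKETCH_OFFLOAD_BUFFER_SPLIT (UINT64_C(1) << 1) /* placeholder bit */
#define SKETCH_MAX_RXQ_NSEG         32u                /* placeholder limit */

/* Stand-in for the Rx segment description from the RFC. */
struct sketch_rxseg {
	void *mp;          /* memory pool to allocate the segment from */
	uint16_t length;   /* segment maximal data length */
	uint16_t offset;   /* data offset from beginning of data buffer */
	uint32_t reserved; /* reserved field */
};

/* Return 0 if the split description is acceptable, negative errno if not. */
static int
sketch_check_split(const struct sketch_rxseg *rx_seg, uint16_t n_seg,
		   uint64_t offloads)
{
	if (rx_seg == NULL || n_seg == 0)
		return -EINVAL;
	if (n_seg == 1)
		return 0; /* single segment: same as the legacy setup */
	if (!(offloads & SKETCH_OFFLOAD_SCATTER))
		return -ENOSPC; /* split requires scattering */
	if (!(offloads & SKETCH_OFFLOAD_BUFFER_SPLIT))
		return -ENOSPC; /* split offload not configured */
	if (n_seg > SKETCH_MAX_RXQ_NSEG)
		return -EOVERFLOW; /* too many segments */
	return 0;
}
```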

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
 drivers/net/mlx5/linux/mlx5_os.c |  2 +
 drivers/net/mlx5/mlx5.h          |  3 ++
 drivers/net/mlx5/mlx5_rxq.c      | 91 +++++++++++++++++++++++++++++++++++-----
 drivers/net/mlx5/mlx5_rxtx.h     | 10 ++++-
 4 files changed, 95 insertions(+), 11 deletions(-)

diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index 81a2e99..11826c3 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -2417,6 +2417,7 @@
 	.dev_supported_ptypes_get = mlx5_dev_supported_ptypes_get,
 	.vlan_filter_set = mlx5_vlan_filter_set,
 	.rx_queue_setup = mlx5_rx_queue_setup,
+	.rx_queue_setup_ex = mlx5_rx_queue_setup_ex,
 	.rx_hairpin_queue_setup = mlx5_rx_hairpin_queue_setup,
 	.tx_queue_setup = mlx5_tx_queue_setup,
 	.tx_hairpin_queue_setup = mlx5_tx_hairpin_queue_setup,
@@ -2500,6 +2501,7 @@
 	.dev_supported_ptypes_get = mlx5_dev_supported_ptypes_get,
 	.vlan_filter_set = mlx5_vlan_filter_set,
 	.rx_queue_setup = mlx5_rx_queue_setup,
+	.rx_queue_setup_ex = mlx5_rx_queue_setup_ex,
 	.rx_hairpin_queue_setup = mlx5_rx_hairpin_queue_setup,
 	.tx_queue_setup = mlx5_tx_queue_setup,
 	.tx_hairpin_queue_setup = mlx5_tx_hairpin_queue_setup,
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 0907506..606f6c6 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -162,6 +162,9 @@ struct mlx5_stats_ctrl {
 /* Maximal size of aggregated LRO packet. */
 #define MLX5_MAX_LRO_SIZE (UINT8_MAX * MLX5_LRO_SEG_CHUNK_SIZE)
 
+/* Maximal number of segments to split. */
+#define MLX5_MAX_RXQ_NSEG (1u << MLX5_MAX_LOG_RQ_SEGS)
+
 /* LRO configurations structure. */
 struct mlx5_lro_config {
 	uint32_t supported:1; /* Whether LRO is supported. */
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index a9ccc2b..24a247c 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -390,6 +390,7 @@
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_dev_config *config = &priv->config;
 	uint64_t offloads = (DEV_RX_OFFLOAD_SCATTER |
+			     DEV_RX_OFFLOAD_BUFFER_SPLIT |
 			     DEV_RX_OFFLOAD_TIMESTAMP |
 			     DEV_RX_OFFLOAD_JUMBO_FRAME |
 			     DEV_RX_OFFLOAD_RSS_HASH);
@@ -715,16 +716,20 @@
  *   NUMA socket on which memory must be allocated.
  * @param[in] conf
  *   Thresholds parameters.
- * @param mp
- *   Memory pool for buffer allocations.
+ * @param rx_seg
+ *   Pointer to the array of segment descriptions, each element
+ *   describes the memory pool, maximal data length, initial
+ *   data offset from the beginning of data buffer in mbuf
+ * @param n_seg
+ *   Number of elements in the segment descriptions array
  *
  * @return
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 int
-mlx5_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
-		    unsigned int socket, const struct rte_eth_rxconf *conf,
-		    struct rte_mempool *mp)
+mlx5_rx_queue_setup_ex(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
+		       unsigned int socket, const struct rte_eth_rxconf *conf,
+		       const struct rte_eth_rxseg *rx_seg, uint16_t n_seg)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_rxq_data *rxq = (*priv->rxqs)[idx];
@@ -732,10 +737,43 @@
 		container_of(rxq, struct mlx5_rxq_ctrl, rxq);
 	int res;
 
+	if (!n_seg || !rx_seg) {
+		DRV_LOG(ERR, "port %u queue index %u invalid "
+			      "split description",
+			      dev->data->port_id, idx);
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+	if (n_seg > 1) {
+		uint64_t offloads = conf->offloads |
+				    dev->data->dev_conf.rxmode.offloads;
+
+		if (!(offloads & DEV_RX_OFFLOAD_SCATTER)) {
+			DRV_LOG(ERR, "port %u queue index %u split "
+				     "configuration requires scattering",
+				     dev->data->port_id, idx);
+			rte_errno = ENOSPC;
+			return -rte_errno;
+		}
+		if (!(offloads & DEV_RX_OFFLOAD_BUFFER_SPLIT)) {
+			DRV_LOG(ERR, "port %u queue index %u split "
+				     "offload not configured",
+				     dev->data->port_id, idx);
+			rte_errno = ENOSPC;
+			return -rte_errno;
+		}
+		if (n_seg > MLX5_MAX_RXQ_NSEG) {
+			DRV_LOG(ERR, "port %u queue index %u too many "
+				     "segments %u to split",
+				     dev->data->port_id, idx, n_seg);
+			rte_errno = EOVERFLOW;
+			return -rte_errno;
+		}
+	}
 	res = mlx5_rx_queue_pre_setup(dev, idx, &desc);
 	if (res)
 		return res;
-	rxq_ctrl = mlx5_rxq_new(dev, idx, desc, socket, conf, mp);
+	rxq_ctrl = mlx5_rxq_new(dev, idx, desc, socket, conf, rx_seg, n_seg);
 	if (!rxq_ctrl) {
 		DRV_LOG(ERR, "port %u unable to allocate queue index %u",
 			dev->data->port_id, idx);
@@ -756,6 +794,39 @@
  *   RX queue index.
  * @param desc
  *   Number of descriptors to configure in queue.
+ * @param socket
+ *   NUMA socket on which memory must be allocated.
+ * @param[in] conf
+ *   Thresholds parameters.
+ * @param mp
+ *   Memory pool for buffer allocations.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
+		    unsigned int socket, const struct rte_eth_rxconf *conf,
+		    struct rte_mempool *mp)
+{
+	struct rte_eth_rxseg rx_seg = {
+		.mp = mp,
+		/*
+		 * All other fields are zeroed, zero segment length
+		 * means the pool buffer size should be used by PMD.
+		 */
+	};
+	return mlx5_rx_queue_setup_ex(dev, idx, desc, socket, conf, &rx_seg, 1);
+}
+
+/**
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ * @param idx
+ *   RX queue index.
+ * @param desc
+ *   Number of descriptors to configure in queue.
  * @param hairpin_conf
  *   Hairpin configuration parameters.
  *
@@ -1328,11 +1399,11 @@
 struct mlx5_rxq_ctrl *
 mlx5_rxq_new(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 	     unsigned int socket, const struct rte_eth_rxconf *conf,
-	     struct rte_mempool *mp)
+	     const struct rte_eth_rxseg *rx_seg, uint16_t n_seg)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_rxq_ctrl *tmpl;
-	unsigned int mb_len = rte_pktmbuf_data_room_size(mp);
+	unsigned int mb_len = rte_pktmbuf_data_room_size(rx_seg[0].mp);
 	unsigned int mprq_stride_nums;
 	unsigned int mprq_stride_size;
 	unsigned int mprq_stride_cap;
@@ -1346,7 +1417,7 @@ struct mlx5_rxq_ctrl *
 	uint64_t offloads = conf->offloads |
 			   dev->data->dev_conf.rxmode.offloads;
 	unsigned int lro_on_queue = !!(offloads & DEV_RX_OFFLOAD_TCP_LRO);
-	const int mprq_en = mlx5_check_mprq_support(dev) > 0;
+	const int mprq_en = mlx5_check_mprq_support(dev) > 0 && n_seg == 1;
 	unsigned int max_rx_pkt_len = lro_on_queue ?
 			dev->data->dev_conf.rxmode.max_lro_pkt_size :
 			dev->data->dev_conf.rxmode.max_rx_pkt_len;
@@ -1531,7 +1602,7 @@ struct mlx5_rxq_ctrl *
 		(!!(dev->data->dev_conf.rxmode.mq_mode & ETH_MQ_RX_RSS));
 	tmpl->rxq.port_id = dev->data->port_id;
 	tmpl->priv = priv;
-	tmpl->rxq.mp = mp;
+	tmpl->rxq.mp = rx_seg[0].mp;
 	tmpl->rxq.elts_n = log2above(desc);
 	tmpl->rxq.rq_repl_thresh =
 		MLX5_VPMD_RXQ_RPLNSH_THRESH(1 << tmpl->rxq.elts_n);
diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index 6876c1b..949a0ba 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -150,6 +150,9 @@ struct mlx5_rxq_data {
 	rte_spinlock_t *uar_lock_cq;
 	/* CQ (UAR) access lock required for 32bit implementations */
 #endif
+	struct rte_eth_rxseg rxseg[MLX5_MAX_RXQ_NSEG];
+	/* Buffer split segment descriptions - sizes, offsets, pools. */
+	uint32_t rxseg_n; /* Number of split segment descriptions. */
 	uint32_t tunnel; /* Tunnel information. */
 	uint64_t flow_meta_mask;
 	int32_t flow_meta_offset;
@@ -344,6 +347,10 @@ struct mlx5_txq_ctrl {
 int mlx5_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 			unsigned int socket, const struct rte_eth_rxconf *conf,
 			struct rte_mempool *mp);
+int mlx5_rx_queue_setup_ex
+	(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
+	 unsigned int socket, const struct rte_eth_rxconf *conf,
+	 const struct rte_eth_rxseg *rx_seg, uint16_t n_seg);
 int mlx5_rx_hairpin_queue_setup
 	(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 	 const struct rte_eth_hairpin_conf *hairpin_conf);
@@ -356,7 +363,8 @@ int mlx5_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 struct mlx5_rxq_ctrl *mlx5_rxq_new(struct rte_eth_dev *dev, uint16_t idx,
 				   uint16_t desc, unsigned int socket,
 				   const struct rte_eth_rxconf *conf,
-				   struct rte_mempool *mp);
+				   const struct rte_eth_rxseg *rx_seg,
+				   uint16_t n_seg);
 struct mlx5_rxq_ctrl *mlx5_rxq_hairpin_new
 	(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 	 const struct rte_eth_hairpin_conf *hairpin_conf);
-- 
1.8.3.1



* [dpdk-dev] [PATCH v2 7/9] net/mlx5: configure Rx queue to support split
  2020-10-07 15:06 ` [dpdk-dev] [PATCH v2 0/9] ethdev: introduce Rx buffer split Viacheslav Ovsiienko
                     ` (5 preceding siblings ...)
  2020-10-07 15:06   ` [dpdk-dev] [PATCH v2 6/9] net/mlx5: add extended Rx queue setup routine Viacheslav Ovsiienko
@ 2020-10-07 15:06   ` Viacheslav Ovsiienko
  2020-10-07 15:06   ` [dpdk-dev] [PATCH v2 8/9] net/mlx5: register multiple pool for Rx queue Viacheslav Ovsiienko
  2020-10-07 15:06   ` [dpdk-dev] [PATCH v2 9/9] net/mlx5: update Rx datapath to support split Viacheslav Ovsiienko
  8 siblings, 0 replies; 172+ messages in thread
From: Viacheslav Ovsiienko @ 2020-10-07 15:06 UTC (permalink / raw)
  To: dev
  Cc: thomasm, stephen, ferruh.yigit, olivier.matz, jerinjacobk,
	maxime.coquelin, david.marchand, arybchenko

The scatter-gather elements should be configured
accordingly to support the buffer split feature.
The application provides the desired settings for
the segments at the beginning of the packets and
the PMD pads the buffer chain (if needed) with the
attributes of the last specified segment to
accommodate a packet of maximal length.

Some limitations are implied. The MPRQ feature
must be disabled if split is requested, because
MPRQ neither supports pushing data to dedicated
pools nor follows flexible buffer sizes.
The vectorized rx_burst routines do not support
scattering (they are extremely simplified and
work over a single segment only) and can't
handle split either.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
 drivers/net/mlx5/mlx5_rxq.c | 94 ++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 80 insertions(+), 14 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index 24a247c..44be6df 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -1417,7 +1417,8 @@ struct mlx5_rxq_ctrl *
 	uint64_t offloads = conf->offloads |
 			   dev->data->dev_conf.rxmode.offloads;
 	unsigned int lro_on_queue = !!(offloads & DEV_RX_OFFLOAD_TCP_LRO);
-	const int mprq_en = mlx5_check_mprq_support(dev) > 0 && n_seg == 1;
+	const int mprq_en = mlx5_check_mprq_support(dev) > 0 && n_seg == 1 &&
+			    !rx_seg[0].offset && !rx_seg[0].length;
 	unsigned int max_rx_pkt_len = lro_on_queue ?
 			dev->data->dev_conf.rxmode.max_lro_pkt_size :
 			dev->data->dev_conf.rxmode.max_rx_pkt_len;
@@ -1425,22 +1426,87 @@ struct mlx5_rxq_ctrl *
 							RTE_PKTMBUF_HEADROOM;
 	unsigned int max_lro_size = 0;
 	unsigned int first_mb_free_size = mb_len - RTE_PKTMBUF_HEADROOM;
+	const struct rte_eth_rxseg *qs_seg = rx_seg;
+	unsigned int tail_len;
 
-	if (non_scatter_min_mbuf_size > mb_len && !(offloads &
-						    DEV_RX_OFFLOAD_SCATTER)) {
+	tmpl = mlx5_malloc(MLX5_MEM_RTE | MLX5_MEM_ZERO, sizeof(*tmpl) +
+			   desc_n * sizeof(struct rte_mbuf *), 0, socket);
+	if (!tmpl) {
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+	MLX5_ASSERT(n_seg && n_seg <= MLX5_MAX_RXQ_NSEG);
+	/*
+	 * Build the array of actual buffer offsets and lengths.
+	 * Pad with the buffers from the last memory pool if
+	 * needed to handle max size packets, replace zero length
+	 * with the buffer length from the pool.
+	 */
+	tail_len = max_rx_pkt_len;
+	do {
+		struct rte_eth_rxseg *hw_seg =
+					&tmpl->rxq.rxseg[tmpl->rxq.rxseg_n];
+		uint32_t buf_len = rte_pktmbuf_data_room_size(qs_seg->mp);
+		uint32_t offset, seg_len;
+
+		/*
+		 * For the buffers beyond descriptions offset is zero,
+		 * the first buffer contains head room.
+		 */
+		offset = (tmpl->rxq.rxseg_n >= n_seg ? 0 : qs_seg->offset) +
+			 (tmpl->rxq.rxseg_n ? 0 : RTE_PKTMBUF_HEADROOM);
+		/*
+		 * For the buffers beyond the descriptions the length is
+		 * the pool buffer length, zero lengths are replaced with
+		 * the pool buffer length as well.
+		 */
+		seg_len = tmpl->rxq.rxseg_n >= n_seg ? buf_len :
+			  qs_seg->length ? qs_seg->length : (buf_len - offset);
+		/* The check is done in 32-bit arithmetic, no overflows. */
+		if (buf_len < seg_len + offset) {
+			DRV_LOG(ERR, "port %u Rx queue %u: Split offset/length "
+				     "%u/%u can't be satisfied",
+				     dev->data->port_id, idx,
+				     qs_seg->offset, qs_seg->length);
+			rte_errno = EINVAL;
+			goto error;
+		}
+		if (seg_len > tail_len)
+			seg_len = buf_len - offset;
+		if (++tmpl->rxq.rxseg_n > MLX5_MAX_RXQ_NSEG) {
+			DRV_LOG(ERR,
+				"port %u too many SGEs (%u) needed to handle"
+				" requested maximum packet size %u, the maximum"
+				" supported are %u", dev->data->port_id,
+				tmpl->rxq.rxseg_n, max_rx_pkt_len,
+				MLX5_MAX_RXQ_NSEG);
+			rte_errno = ENOTSUP;
+			goto error;
+		}
+		/* Build the actual scattering element in the queue object. */
+		hw_seg->mp = qs_seg->mp;
+		MLX5_ASSERT(offset <= UINT16_MAX);
+		MLX5_ASSERT(seg_len <= UINT16_MAX);
+		hw_seg->offset = (uint16_t)offset;
+		hw_seg->length = (uint16_t)seg_len;
+		/*
+		 * Advance the segment descriptor, the padding is based
+		 * on the attributes of the last descriptor.
+		 */
+		if (tmpl->rxq.rxseg_n < n_seg)
+			qs_seg++;
+		tail_len -= RTE_MIN(tail_len, seg_len);
+	} while (tail_len || !rte_is_power_of_2(tmpl->rxq.rxseg_n));
+	MLX5_ASSERT(tmpl->rxq.rxseg_n &&
+		    tmpl->rxq.rxseg_n <= MLX5_MAX_RXQ_NSEG);
+	if (tmpl->rxq.rxseg_n > 1 && !(offloads & DEV_RX_OFFLOAD_SCATTER)) {
 		DRV_LOG(ERR, "port %u Rx queue %u: Scatter offload is not"
 			" configured and no enough mbuf space(%u) to contain "
 			"the maximum RX packet length(%u) with head-room(%u)",
 			dev->data->port_id, idx, mb_len, max_rx_pkt_len,
 			RTE_PKTMBUF_HEADROOM);
 		rte_errno = ENOSPC;
-		return NULL;
-	}
-	tmpl = mlx5_malloc(MLX5_MEM_RTE | MLX5_MEM_ZERO, sizeof(*tmpl) +
-			   desc_n * sizeof(struct rte_mbuf *), 0, socket);
-	if (!tmpl) {
-		rte_errno = ENOMEM;
-		return NULL;
+		goto error;
 	}
 	tmpl->type = MLX5_RXQ_TYPE_STANDARD;
 	if (mlx5_mr_btree_init(&tmpl->rxq.mr_ctrl.cache_bh,
@@ -1467,7 +1533,7 @@ struct mlx5_rxq_ctrl *
 	 *  - The number of descs is more than the number of strides.
 	 *  - max_rx_pkt_len plus overhead is less than the max size
 	 *    of a stride or mprq_stride_size is specified by a user.
-	 *    Need to nake sure that there are enough stides to encap
+	 *    Need to make sure that there are enough strides to encap
 	 *    the maximum packet size in case mprq_stride_size is set.
 	 *  Otherwise, enable Rx scatter if necessary.
 	 */
@@ -1497,11 +1563,11 @@ struct mlx5_rxq_ctrl *
 			" strd_num_n = %u, strd_sz_n = %u",
 			dev->data->port_id, idx,
 			tmpl->rxq.strd_num_n, tmpl->rxq.strd_sz_n);
-	} else if (max_rx_pkt_len <= first_mb_free_size) {
+	} else if (tmpl->rxq.rxseg_n == 1) {
+		MLX5_ASSERT(max_rx_pkt_len <= first_mb_free_size);
 		tmpl->rxq.sges_n = 0;
 		max_lro_size = max_rx_pkt_len;
 	} else if (offloads & DEV_RX_OFFLOAD_SCATTER) {
-		unsigned int size = non_scatter_min_mbuf_size;
 		unsigned int sges_n;
 
 		if (lro_on_queue && first_mb_free_size <
@@ -1516,7 +1582,7 @@ struct mlx5_rxq_ctrl *
 		 * Determine the number of SGEs needed for a full packet
 		 * and round it to the next power of two.
 		 */
-		sges_n = log2above((size / mb_len) + !!(size % mb_len));
+		sges_n = log2above(tmpl->rxq.rxseg_n);
 		if (sges_n > MLX5_MAX_LOG_RQ_SEGS) {
 			DRV_LOG(ERR,
 				"port %u too many SGEs (%u) needed to handle"
-- 
1.8.3.1
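
As an illustration only (not part of the patch), the padding loop added
to mlx5_rxq_new() above can be sketched as a standalone C function. The
names seg_req, seg_hw, build_segs, HEADROOM and MAX_NSEG are hypothetical
stand-ins for the driver types and for RTE_PKTMBUF_HEADROOM and
MLX5_MAX_RXQ_NSEG; the numeric values are assumptions. The sketch follows
the same steps: add headroom to the first segment only, replace a zero
length with the rest of the buffer, pad with the last described segment
until the maximal packet fits, and round the segment count up to a power
of two.

```c
#include <assert.h>
#include <stdint.h>

#define HEADROOM 128u /* assumed stand-in for RTE_PKTMBUF_HEADROOM */
#define MAX_NSEG 32u  /* assumed stand-in for MLX5_MAX_RXQ_NSEG */

/* Requested segment: pool buffer size plus the user's length/offset. */
struct seg_req {
	uint32_t buf_len; /* data room of the pool used by this segment */
	uint16_t length;  /* 0 means "use the rest of the buffer" */
	uint16_t offset;  /* extra offset; headroom is added for seg 0 */
};

/* Resulting segment as it would be stored in the queue object. */
struct seg_hw {
	uint32_t length;
	uint32_t offset;
};

/*
 * Build the padded segment array in hw[] (room for MAX_NSEG entries).
 * Return the number of entries or -1 on error.
 */
int
build_segs(const struct seg_req *req, unsigned int n_req,
	   uint32_t max_pkt_len, struct seg_hw *hw)
{
	const struct seg_req *cur = req;
	uint32_t tail = max_pkt_len;
	unsigned int n = 0;

	if (n_req == 0)
		return -1;
	do {
		int pad = n >= n_req; /* beyond the user's descriptions */
		uint32_t offset, len;

		if (n == MAX_NSEG)
			return -1; /* too many segments needed */
		offset = (pad ? 0 : cur->offset) + (n ? 0 : HEADROOM);
		if (offset > cur->buf_len)
			return -1;
		if (pad)
			len = cur->buf_len;
		else
			len = cur->length ? cur->length :
					    cur->buf_len - offset;
		if (cur->buf_len < len + offset)
			return -1; /* request does not fit the buffer */
		/* As in the patch: a segment longer than the tail takes
		 * the rest of its buffer. */
		if (len > tail)
			len = cur->buf_len - offset;
		hw[n].offset = offset;
		hw[n].length = len;
		/* Padding reuses the last described segment. */
		if (++n < n_req)
			cur++;
		tail -= tail < len ? tail : len;
	} while (tail || (n & (n - 1))); /* until fit and power of two */
	return (int)n;
}
```

With four described segments of 14, 20, 20 and 512 bytes and a maximal
packet length of 1500, this sketch yields eight entries: the four
described ones followed by four copies of the 512-byte pool buffer.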


^ permalink raw reply	[flat|nested] 172+ messages in thread

* [dpdk-dev] [PATCH v2 8/9] net/mlx5: register multiple pool for Rx queue
  2020-10-07 15:06 ` [dpdk-dev] [PATCH v2 0/9] ethdev: introduce Rx buffer split Viacheslav Ovsiienko
                     ` (6 preceding siblings ...)
  2020-10-07 15:06   ` [dpdk-dev] [PATCH v2 7/9] net/mlx5: configure Rx queue to support split Viacheslav Ovsiienko
@ 2020-10-07 15:06   ` Viacheslav Ovsiienko
  2020-10-07 15:06   ` [dpdk-dev] [PATCH v2 9/9] net/mlx5: update Rx datapath to support split Viacheslav Ovsiienko
  8 siblings, 0 replies; 172+ messages in thread
From: Viacheslav Ovsiienko @ 2020-10-07 15:06 UTC (permalink / raw)
  To: dev
  Cc: thomasm, stephen, ferruh.yigit, olivier.matz, jerinjacobk,
	maxime.coquelin, david.marchand, arybchenko

The split feature for receiving packets was added to the mlx5
PMD. Now an Rx queue can receive data into buffers belonging
to different pools, and the memory of all the involved pools
must be registered for DMA operations in order to allow the
hardware to store the data.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
 drivers/net/mlx5/mlx5_mr.c      |  3 +++
 drivers/net/mlx5/mlx5_trigger.c | 20 ++++++++++++--------
 2 files changed, 15 insertions(+), 8 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_mr.c b/drivers/net/mlx5/mlx5_mr.c
index dbcf0aa..c308ecc 100644
--- a/drivers/net/mlx5/mlx5_mr.c
+++ b/drivers/net/mlx5/mlx5_mr.c
@@ -536,6 +536,9 @@ struct mr_update_mp_data {
 		.ret = 0,
 	};
 
+	DRV_LOG(DEBUG, "Port %u Rx queue registering mp %s "
+		       "having %u chunks.", dev->data->port_id,
+		       mp->name, mp->nb_mem_chunks);
 	rte_mempool_mem_iter(mp, mlx5_mr_update_mp_cb, &data);
 	if (data.ret < 0 && rte_errno == ENXIO) {
 		/* Mempool may have externally allocated memory. */
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index 0f4d031..e25a2b7 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -123,18 +123,22 @@
 		dev->data->port_id, priv->sh->device_attr.max_sge);
 	for (i = 0; i != priv->rxqs_n; ++i) {
 		struct mlx5_rxq_ctrl *rxq_ctrl = mlx5_rxq_get(dev, i);
-		struct rte_mempool *mp;
 
 		if (!rxq_ctrl)
 			continue;
 		if (rxq_ctrl->type == MLX5_RXQ_TYPE_STANDARD) {
-			/* Pre-register Rx mempool. */
-			mp = mlx5_rxq_mprq_enabled(&rxq_ctrl->rxq) ?
-			     rxq_ctrl->rxq.mprq_mp : rxq_ctrl->rxq.mp;
-			DRV_LOG(DEBUG, "Port %u Rx queue %u registering mp %s"
-				" having %u chunks.", dev->data->port_id,
-				rxq_ctrl->rxq.idx, mp->name, mp->nb_mem_chunks);
-			mlx5_mr_update_mp(dev, &rxq_ctrl->rxq.mr_ctrl, mp);
+			/* Pre-register Rx mempools. */
+			if (mlx5_rxq_mprq_enabled(&rxq_ctrl->rxq)) {
+				mlx5_mr_update_mp(dev, &rxq_ctrl->rxq.mr_ctrl,
+						  rxq_ctrl->rxq.mprq_mp);
+			} else {
+				uint32_t s;
+
+				for (s = 0; s < rxq_ctrl->rxq.rxseg_n; s++)
+					mlx5_mr_update_mp
+						(dev, &rxq_ctrl->rxq.mr_ctrl,
+						rxq_ctrl->rxq.rxseg[s].mp);
+			}
 			ret = rxq_alloc_elts(rxq_ctrl);
 			if (ret)
 				goto error;
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 172+ messages in thread

* [dpdk-dev] [PATCH v2 9/9] net/mlx5: update Rx datapath to support split
  2020-10-07 15:06 ` [dpdk-dev] [PATCH v2 0/9] ethdev: introduce Rx buffer split Viacheslav Ovsiienko
                     ` (7 preceding siblings ...)
  2020-10-07 15:06   ` [dpdk-dev] [PATCH v2 8/9] net/mlx5: register multiple pool for Rx queue Viacheslav Ovsiienko
@ 2020-10-07 15:06   ` Viacheslav Ovsiienko
  8 siblings, 0 replies; 172+ messages in thread
From: Viacheslav Ovsiienko @ 2020-10-07 15:06 UTC (permalink / raw)
  To: dev
  Cc: thomasm, stephen, ferruh.yigit, olivier.matz, jerinjacobk,
	maxime.coquelin, david.marchand, arybchenko

Only the regular rx_burst routine is updated to support split,
because the vectorized ones do not support scatter and MPRQ
does not support split at all.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
 drivers/net/mlx5/mlx5_rxq.c  | 11 +++++------
 drivers/net/mlx5/mlx5_rxtx.c |  3 ++-
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index 44be6df..5a035cc 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -210,9 +210,10 @@
 
 	/* Iterate on segments. */
 	for (i = 0; (i != elts_n); ++i) {
+		struct rte_eth_rxseg *seg = &rxq_ctrl->rxq.rxseg[i % sges_n];
 		struct rte_mbuf *buf;
 
-		buf = rte_pktmbuf_alloc(rxq_ctrl->rxq.mp);
+		buf = rte_pktmbuf_alloc(seg->mp);
 		if (buf == NULL) {
 			DRV_LOG(ERR, "port %u empty mbuf pool",
 				PORT_ID(rxq_ctrl->priv));
@@ -225,12 +226,10 @@
 		MLX5_ASSERT(rte_pktmbuf_data_len(buf) == 0);
 		MLX5_ASSERT(rte_pktmbuf_pkt_len(buf) == 0);
 		MLX5_ASSERT(!buf->next);
-		/* Only the first segment keeps headroom. */
-		if (i % sges_n)
-			SET_DATA_OFF(buf, 0);
+		SET_DATA_OFF(buf, seg->offset);
 		PORT(buf) = rxq_ctrl->rxq.port_id;
-		DATA_LEN(buf) = rte_pktmbuf_tailroom(buf);
-		PKT_LEN(buf) = DATA_LEN(buf);
+		DATA_LEN(buf) = seg->length;
+		PKT_LEN(buf) = seg->length;
 		NB_SEGS(buf) = 1;
 		(*rxq_ctrl->rxq.elts)[i] = buf;
 	}
diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index 101555e..ad4da09 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -1430,7 +1430,8 @@ enum mlx5_txcmp_code {
 		rte_prefetch0(seg);
 		rte_prefetch0(cqe);
 		rte_prefetch0(wqe);
-		rep = rte_mbuf_raw_alloc(rxq->mp);
+		/* Allocate the buf from the same pool. */
+		rep = rte_mbuf_raw_alloc(seg->pool);
 		if (unlikely(rep == NULL)) {
 			++rxq->stats.rx_nombuf;
 			if (!pkt) {
-- 
1.8.3.1
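
A minimal sketch of the element initialization changed above, for
illustration only. It assumes sges_n is the number of segments per
packet (a power of two), so that queue element i takes its attributes
from segment description i % sges_n. The types seg_desc and elt and the
function fill_elts are hypothetical stand-ins for struct rte_eth_rxseg,
struct rte_mbuf and the driver's allocation loop; pool_id replaces the
mempool pointer.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for one stored segment description. */
struct seg_desc {
	int pool_id;     /* which pool the buffer comes from */
	uint16_t length; /* data length programmed for the buffer */
	uint16_t offset; /* data offset inside the buffer */
};

/* Hypothetical stand-in for the mbuf fields set by the loop. */
struct elt {
	int pool_id;
	uint16_t data_off;
	uint16_t data_len;
};

/* Initialize elts_n queue elements from sges_n segment descriptions. */
void
fill_elts(const struct seg_desc *segs, unsigned int sges_n,
	  struct elt *elts, unsigned int elts_n)
{
	unsigned int i;

	for (i = 0; i != elts_n; ++i) {
		/* Elements cycle through the per-packet segment list. */
		const struct seg_desc *seg = &segs[i % sges_n];

		elts[i].pool_id = seg->pool_id;
		elts[i].data_off = seg->offset;
		elts[i].data_len = seg->length;
	}
}
```

Because the replacement buffer in the datapath is taken from the pool of
the buffer being replaced, each position in the ring keeps the pool it
was given here.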


^ permalink raw reply	[flat|nested] 172+ messages in thread

* Re: [dpdk-dev] [PATCH v2 1/9] ethdev: introduce Rx buffer split
  2020-10-07 15:06   ` [dpdk-dev] [PATCH v2 1/9] " Viacheslav Ovsiienko
@ 2020-10-11 22:17     ` Thomas Monjalon
  2020-10-12  9:40       ` Slava Ovsiienko
  0 siblings, 1 reply; 172+ messages in thread
From: Thomas Monjalon @ 2020-10-11 22:17 UTC (permalink / raw)
  To: Viacheslav Ovsiienko
  Cc: dev, stephen, ferruh.yigit, olivier.matz, jerinjacobk,
	maxime.coquelin, david.marchand, arybchenko

07/10/2020 17:06, Viacheslav Ovsiienko:
> The DPDK datapath in the transmit direction is very flexible.
> An application can build the multi-segment packet and manages
> almost all data aspects - the memory pools where segments
> are allocated from, the segment lengths, the memory attributes
> like external buffers, registered for DMA, etc.
> 
> In the receiving direction, the datapath is much less flexible,
> an application can only specify the memory pool to configure the
> receiving queue and nothing more. In order to extend receiving
> datapath capabilities it is proposed to add the way to provide
> extended information how to split the packets being received.
> 
> The following structure is introduced to specify the Rx packet
> segment:
> 
> struct rte_eth_rxseg {
>     struct rte_mempool *mp; /* memory pools to allocate segment from */
>     uint16_t length; /* segment maximal data length */

The "length" parameter is configuring a split point.
Worth to note in the comment I think.

>     uint16_t offset; /* data offset from beginning of mbuf data buffer */

Is it replacing RTE_PKTMBUF_HEADROOM?

>     uint32_t reserved; /* reserved field */
> };
> 
> The new routine rte_eth_rx_queue_setup_ex() is introduced to
> setup the given Rx queue using the new extended Rx packet segment
> description:
> 
> int
> rte_eth_rx_queue_setup_ex(uint16_t port_id, uint16_t rx_queue_id,
>                           uint16_t nb_rx_desc, unsigned int socket_id,
>                           const struct rte_eth_rxconf *rx_conf,
> 		          const struct rte_eth_rxseg *rx_seg,
>                           uint16_t n_seg)

An alternative name for this function:
	rte_eth_rxseg_queue_setup

> This routine presents the two new parameters:
>     rx_seg - pointer the array of segment descriptions, each element
>              describes the memory pool, maximal data length, initial
>              data offset from the beginning of data buffer in mbuf
>     n_seg - number of elements in the array

Not clear why we need an array.
I suggest writing here that each segment of the same packet
can have different properties, the array representing the full packet.

> The new offload flag DEV_RX_OFFLOAD_BUFFER_SPLIT in device

The name should start with RTE_ prefix.

> capabilities is introduced to present the way for PMD to report to
> application about supporting Rx packet split to configurable
> segments. Prior invoking the rte_eth_rx_queue_setup_ex() routine
> application should check DEV_RX_OFFLOAD_BUFFER_SPLIT flag.
> 
> If the Rx queue is configured with new routine the packets being
> received will be split into multiple segments pushed to the mbufs
> with specified attributes. The PMD will allocate the first mbuf
> from the pool specified in the first segment descriptor and puts
> the data starting at specified offset in the allocated mbuf data
> buffer. If packet length exceeds the specified segment length
> the next mbuf will be allocated according to the next segment
> descriptor (if any) and data will be put in its data buffer at
> specified offset and not exceeding specified length. If there is
> no next descriptor the next mbuf will be allocated and filled in the
> same way (from the same pool and with the same buffer offset/length)
> as the current one.
> 
> For example, let's suppose we configured the Rx queue with the
> following segments:
>     seg0 - pool0, len0=14B, off0=RTE_PKTMBUF_HEADROOM
>     seg1 - pool1, len1=20B, off1=0B
>     seg2 - pool2, len2=20B, off2=0B
>     seg3 - pool3, len3=512B, off3=0B
> 
> The packet 46 bytes long will look like the following:
>     seg0 - 14B long @ RTE_PKTMBUF_HEADROOM in mbuf from pool0
>     seg1 - 20B long @ 0 in mbuf from pool1
>     seg2 - 12B long @ 0 in mbuf from pool2
> 
> The packet 1500 bytes long will look like the following:
>     seg0 - 14B @ RTE_PKTMBUF_HEADROOM in mbuf from pool0
>     seg1 - 20B @ 0 in mbuf from pool1
>     seg2 - 20B @ 0 in mbuf from pool2
>     seg3 - 512B @ 0 in mbuf from pool3
>     seg4 - 512B @ 0 in mbuf from pool3
>     seg5 - 422B @ 0 in mbuf from pool3
> 
> The offload DEV_RX_OFFLOAD_SCATTER must be present and
> configured to support new buffer split feature (if n_seg
> is greater than one).
> 
> The new approach would allow splitting the ingress packets into
> multiple parts pushed to the memory with different attributes.
> For example, the packet headers can be pushed to the embedded
> data buffers within mbufs and the application data into
> the external buffers attached to mbufs allocated from the
> different memory pools. The memory attributes for the split
> parts may differ either - for example the application data
> may be pushed into the external memory located on the dedicated
> physical device, say GPU or NVMe. This would improve the DPDK
> receiving datapath flexibility with preserving compatibility
> with existing API.
> 
> Also, the proposed segment description might be used to specify
> Rx packet split for some other features. For example, provide
> the way to specify the extra memory pool for the Header Split
> feature of some Intel PMD.

I don't understand what you are referring in this last paragraph.
I think explanation above is enough to demonstrate the flexibility.

> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>

Thank you, I like this feature.
More minor comments below.

[...]
> +* **Introduced extended buffer description for receiving.**

Rewording:
	Introduced extended setup of Rx queue

> +  * Added extended Rx queue setup routine
> +  * Added description for Rx segment sizes

not only "sizes", but also offset and mempool.

> +  * Added capability to specify the memory pool for each segment

This one can be merged with the above, or offset should be added.

[...]
The doxygen comment is missing here.

> +int rte_eth_rx_queue_setup_ex(uint16_t port_id, uint16_t rx_queue_id,
> +		uint16_t nb_rx_desc, unsigned int socket_id,
> +		const struct rte_eth_rxconf *rx_conf,
> +		const struct rte_eth_rxseg *rx_seg, uint16_t n_seg);

This new function should be experimental and it should be added to the .map file.
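
For illustration, the split semantics from the quoted example (segments
of 14, 20, 20 and 512 bytes, the last description reused for the rest of
the packet) can be sketched as follows. The function split_packet and
its parameters are hypothetical and only model the data lengths; pools
and offsets are left out.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Compute the data length of every mbuf in the chain for a packet of
 * pkt_len bytes. seg_len[] holds the configured (nonzero) segment
 * lengths; the last one is reused once the descriptions run out.
 * Return the number of mbufs used (at most out_max).
 */
unsigned int
split_packet(const uint16_t *seg_len, unsigned int n_seg,
	     uint32_t pkt_len, uint32_t *out_len, unsigned int out_max)
{
	unsigned int n = 0;

	if (n_seg == 0)
		return 0;
	while (pkt_len && n < out_max) {
		unsigned int idx = n < n_seg ? n : n_seg - 1;
		uint32_t chunk = pkt_len < seg_len[idx] ?
				 pkt_len : seg_len[idx];

		out_len[n++] = chunk;
		pkt_len -= chunk;
	}
	return n;
}
```

For the configuration above, a 46-byte packet gives 14 + 20 + 12 and a
1500-byte packet gives 14 + 20 + 20 + 512 + 512 + 422, matching the
quoted layouts.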



^ permalink raw reply	[flat|nested] 172+ messages in thread

* Re: [dpdk-dev] [RFC] ethdev: introduce Rx buffer split
  2020-10-01  8:54   ` Slava Ovsiienko
@ 2020-10-12  8:45     ` Andrew Rybchenko
  2020-10-12  9:56       ` Slava Ovsiienko
  0 siblings, 1 reply; 172+ messages in thread
From: Andrew Rybchenko @ 2020-10-12  8:45 UTC (permalink / raw)
  To: Slava Ovsiienko, dev
  Cc: Thomas Monjalon, stephen, ferruh.yigit, Shahaf Shuler,
	olivier.matz, jerinjacobk, maxime.coquelin, david.marchand,
	Asaf Penso

Hi Slava,

I'm sorry for the late reply. See my notes below.

On 10/1/20 11:54 AM, Slava Ovsiienko wrote:
> Hi, Andrew
>
> Thank you for the comments, please see my replies below.
>
>> -----Original Message-----
>> From: Andrew Rybchenko <arybchenko@solarflare.com>
>> Sent: Thursday, September 17, 2020 19:55
>> To: Slava Ovsiienko <viacheslavo@nvidia.com>; dev@dpdk.org
>> Cc: Thomas Monjalon <thomasm@mellanox.com>;
>> stephen@networkplumber.org; ferruh.yigit@intel.com; Shahaf Shuler
>> <shahafs@nvidia.com>; olivier.matz@6wind.com; jerinjacobk@gmail.com;
>> maxime.coquelin@redhat.com; david.marchand@redhat.com; Asaf Penso
>> <asafp@nvidia.com>
>> Subject: Re: [dpdk-dev] [RFC] ethdev: introduce Rx buffer split
>>
> [snip]
>>>
>>> For example, let's suppose we configured the Rx queue with the
>>> following segments:
>>> seg0 - pool0, len0=14B, off0=RTE_PKTMBUF_HEADROOM
>>> seg1 - pool1, len1=20B, off1=0B
>>> seg2 - pool2, len2=20B, off2=0B
>>> seg3 - pool3, len3=512B, off3=0B
>>>
>>> The packet 46 bytes long will look like the following:
>>> seg0 - 14B long @ RTE_PKTMBUF_HEADROOM in mbuf from pool0
>>> seg1 - 20B long @ 0 in mbuf from pool1
>>> seg2 - 12B long @ 0 in mbuf from pool2
>>>
>>> The packet 1500 bytes long will look like the following:
>>> seg0 - 14B @ RTE_PKTMBUF_HEADROOM in mbuf from pool0
>>> seg1 - 20B @ 0 in mbuf from pool1
>>> seg2 - 20B @ 0 in mbuf from pool2
>>> seg3 - 512B @ 0 in mbuf from pool3
>>> seg4 - 512B @ 0 in mbuf from pool3
>>> seg5 - 422B @ 0 in mbuf from pool3
>>
>> The behaviour is logical, but what to do if HW can't do it, i.e. use
>> the last
>> segment many times. Should it reject configuration if provided
>> segments are
>> insufficient to fit MTU packet? How to report the limitation?
>> (I'm still trying to convince that SCATTER and BUFFER_SPLIT should be
>> independent).
>
> BUFFER_SPLIT is rather the way to tune SCATTER. Currently scattering
> happens on unconditional mbuf data buffer boundaries (we have reserved
> HEAD space in the first mbuf and fill this one to the buffer end,
> the next mbuf buffers might be filled completely). BUFFER_SPLIT provides
> the way to specify the desired points to split packet, not just blindly
> follow buffer boundaries. There is a check implemented in the common part
> if each split segment fits the mbuf allocated from appropriate pool.
> PMD should do extra check internally whether it supports the requested
> split settings, if not - call will be rejected.
>

@Thomas, @Ferruh: I'd like to hear what other ethdev
maintainers think about it.

> [snip]
>>
>> I dislike the idea to introduce new device operation.
>> rte_eth_rxconf has reserved space and BUFFER_SPLIT offload will mean that
>> PMD looks at the split configuration location there.
>>
> We considered the approach of pushing split setting to the rxconf
> structure.
> [http://patches.dpdk.org/patch/75205/]
> But it seems there are some issues:
>
> - the split configuration description requires the variable length
> array (due
> to variations in number of segments), so rte_eth_rxconf structure would
> have the variable length (not nice, IMO).
>
> We could push pointers to the array of rte_eth_rxseg, but we would lose
> the single structure (and contiguous memory) simplicity, this approach has
> no advantages over the specifying the split configuration as parameters
> of setup_ex().
>

I think avoiding an extra device operation
is a huge advantage.

> - it would introduce an ambiguity: rte_eth_rx_queue_setup()
> specifies the single
> mbuf pool as parameter. What should we do with it? Set to NULL? Treat
> as the first pool? I would prefer to specify all split segments in
> uniform fashion, i.e. as array or rte_eth_rxseg structures (and it can be
> easily updated with some extra segment attributes if needed). So, in my
> opinion, we should remove/replace the pool parameter in rx_queue_setup
> (by introducing new func).
>

I'm trying to resolve the ambiguity as described above
(see BUFFER_SPLIT vs SCATTER). Use the pointer for
tail segments with respect to SCATTER capability.

> - specifying the new extended setup routine has an advantage that we
> should
> not update any PMDs code in part of existing implementations of
> rte_eth_rx_queue_setup().

It is not required since it is controlled by the new offload
flags. If the offload is not supported, the new field is
invisible to the PMD (it simply ignores it).

>
> If PMD supports BUFFER_SPLIT (or other related feature) it just should
> provide
> rte_eth_rx_queue_setup_ex() and check the DEV_RX_OFFLOAD_BUFFER_SPLIT
> (or HEADER_SPLIT, or whatever feature) it supports. The common code does
> not check the feature flags - it is on PMDs' own. In order to
> configure PMD
> to perform arbitrary desired Rx splitting the application should check
> DEV_RX_OFFLOAD_BUFFER_SPLIT in port capabilities, if found - set
> DEV_RX_OFFLOAD_BUFFER_SPLIT in configuration and call
> rte_eth_rx_queue_setup_ex().
> And this approach can be followed for any other split related feature.
>
> With best regards, Slava
>
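
The application-side flow described in the quoted text (check the
capability, set the offload, then call the extended setup routine) can
be modelled by the small sketch below. It is self-contained and uses no
DPDK headers: OFFLOAD_BUFFER_SPLIT is a hypothetical bit standing in for
DEV_RX_OFFLOAD_BUFFER_SPLIT, and choose_rx_setup with its return codes
is an assumed helper, not an existing API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit standing in for DEV_RX_OFFLOAD_BUFFER_SPLIT. */
#define OFFLOAD_BUFFER_SPLIT (UINT64_C(1) << 20)

enum setup_path {
	SETUP_REJECT, /* split requested but not supported by the port */
	SETUP_LEGACY, /* use rte_eth_rx_queue_setup() */
	SETUP_EX,     /* use the proposed rte_eth_rx_queue_setup_ex() */
};

/*
 * Decide which setup routine to call. rx_offload_capa models the port
 * capabilities, *queue_offloads the offloads to be configured.
 */
enum setup_path
choose_rx_setup(uint64_t rx_offload_capa, uint64_t *queue_offloads,
		int want_split)
{
	if (!want_split)
		return SETUP_LEGACY;
	if (!(rx_offload_capa & OFFLOAD_BUFFER_SPLIT))
		return SETUP_REJECT;
	/* Capability found: request the offload in the configuration. */
	*queue_offloads |= OFFLOAD_BUFFER_SPLIT;
	return SETUP_EX;
}
```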



^ permalink raw reply	[flat|nested] 172+ messages in thread

* Re: [dpdk-dev] [PATCH v2 1/9] ethdev: introduce Rx buffer split
  2020-10-11 22:17     ` Thomas Monjalon
@ 2020-10-12  9:40       ` Slava Ovsiienko
  2020-10-12 10:09         ` Thomas Monjalon
  0 siblings, 1 reply; 172+ messages in thread
From: Slava Ovsiienko @ 2020-10-12  9:40 UTC (permalink / raw)
  To: NBU-Contact-Thomas Monjalon
  Cc: dev, stephen, ferruh.yigit, olivier.matz, jerinjacobk,
	maxime.coquelin, david.marchand, arybchenko

Hi, Thomas

Thank you for the comments, please, see my answers below.

> -----Original Message-----
> From: Thomas Monjalon <thomas@monjalon.net>
> Sent: Monday, October 12, 2020 1:18
> To: Slava Ovsiienko <viacheslavo@nvidia.com>
> Cc: dev@dpdk.org; stephen@networkplumber.org; ferruh.yigit@intel.com;
> olivier.matz@6wind.com; jerinjacobk@gmail.com;
> maxime.coquelin@redhat.com; david.marchand@redhat.com;
> arybchenko@solarflare.com
> Subject: Re: [dpdk-dev] [PATCH v2 1/9] ethdev: introduce Rx buffer split
> 
> 07/10/2020 17:06, Viacheslav Ovsiienko:
> > The DPDK datapath in the transmit direction is very flexible.
> > An application can build the multi-segment packet and manages almost
> > all data aspects - the memory pools where segments are allocated from,
> > the segment lengths, the memory attributes like external buffers,
> > registered for DMA, etc.
> >
> > In the receiving direction, the datapath is much less flexible, an
> > application can only specify the memory pool to configure the
> > receiving queue and nothing more. In order to extend receiving
> > datapath capabilities it is proposed to add the way to provide
> > extended information how to split the packets being received.
> >
> > The following structure is introduced to specify the Rx packet
> > segment:
> >
> > struct rte_eth_rxseg {
> >     struct rte_mempool *mp; /* memory pools to allocate segment from */
> >     uint16_t length; /* segment maximal data length */
> 
> The "length" parameter is configuring a split point.
> Worth to note in the comment I think.

OK, got it.

> 
> >     uint16_t offset; /* data offset from beginning of mbuf data buffer
> > */
> 
> Is it replacing RTE_PKTMBUF_HEADROOM?
> 
Actually adding to HEAD_ROOM. We should keep HEAD_ROOM intact,
so the actual data offset in the first mbuf must be the sum HEAD_ROOM + offset.
The mlx5 PMD implementation follows this approach, documentation will be updated in v3.

> >     uint32_t reserved; /* reserved field */ };
> >
> > The new routine rte_eth_rx_queue_setup_ex() is introduced to setup the
> > given Rx queue using the new extended Rx packet segment
> > description:
> >
> > int
> > rte_eth_rx_queue_setup_ex(uint16_t port_id, uint16_t rx_queue_id,
> >                           uint16_t nb_rx_desc, unsigned int socket_id,
> >                           const struct rte_eth_rxconf *rx_conf,
> > 		          const struct rte_eth_rxseg *rx_seg,
> >                           uint16_t n_seg)
> 
> An alternative name for this function:
> 	rte_eth_rxseg_queue_setup
M-m-m... Routine name follows the pattern object_verb:
rx_queue is an object, setup is an action.
rxseg_queue is not an object.
What about "rte_eth_rx_queue_setup_seg"?

> 
> > This routine presents the two new parameters:
> >     rx_seg - pointer the array of segment descriptions, each element
> >              describes the memory pool, maximal data length, initial
> >              data offset from the beginning of data buffer in mbuf
> >     n_seg - number of elements in the array
> 
> Not clear why we need an array.
> I suggest writing here that each segment of the same packet can have
> different properties, the array representing the full packet.
OK, will write.

> 
> > The new offload flag DEV_RX_OFFLOAD_BUFFER_SPLIT in device
> 
> The name should start with RTE_ prefix.
It is an existing pattern for DEV_RX_OFFLOAD_xxxx, no RTE_ for the case.

> 
> > capabilities is introduced to present the way for PMD to report to
> > application about supporting Rx packet split to configurable segments.
> > Prior invoking the rte_eth_rx_queue_setup_ex() routine application
> > should check DEV_RX_OFFLOAD_BUFFER_SPLIT flag.
> >
> > If the Rx queue is configured with new routine the packets being
> > received will be split into multiple segments pushed to the mbufs with
> > specified attributes. The PMD will allocate the first mbuf from the
> > pool specified in the first segment descriptor and puts the data
> > starting at specified offset in the allocated mbuf data buffer. If
> > packet length exceeds the specified segment length the next mbuf will
> > be allocated according to the next segment descriptor (if any) and
> > data will be put in its data buffer at specified offset and not
> > exceeding specified length. If there is no next descriptor the next
> > mbuf will be allocated and filled in the same way (from the same pool
> > and with the same buffer offset/length) as the current one.
> >
> > For example, let's suppose we configured the Rx queue with the
> > following segments:
> >     seg0 - pool0, len0=14B, off0=RTE_PKTMBUF_HEADROOM
> >     seg1 - pool1, len1=20B, off1=0B
> >     seg2 - pool2, len2=20B, off2=0B
> >     seg3 - pool3, len3=512B, off3=0B
> >
> > The packet 46 bytes long will look like the following:
> >     seg0 - 14B long @ RTE_PKTMBUF_HEADROOM in mbuf from pool0
> >     seg1 - 20B long @ 0 in mbuf from pool1
> >     seg2 - 12B long @ 0 in mbuf from pool2
> >
> > The packet 1500 bytes long will look like the following:
> >     seg0 - 14B @ RTE_PKTMBUF_HEADROOM in mbuf from pool0
> >     seg1 - 20B @ 0 in mbuf from pool1
> >     seg2 - 20B @ 0 in mbuf from pool2
> >     seg3 - 512B @ 0 in mbuf from pool3
> >     seg4 - 512B @ 0 in mbuf from pool3
> >     seg5 - 422B @ 0 in mbuf from pool3
> >
> > The offload DEV_RX_OFFLOAD_SCATTER must be present and configured
> to
> > support new buffer split feature (if n_seg is greater than one).
> >
> > The new approach would allow splitting the ingress packets into
> > multiple parts pushed to the memory with different attributes.
> > For example, the packet headers can be pushed to the embedded data
> > buffers within mbufs and the application data into the external
> > buffers attached to mbufs allocated from the different memory pools.
> > The memory attributes for the split parts may differ either - for
> > example the application data may be pushed into the external memory
> > located on the dedicated physical device, say GPU or NVMe. This would
> > improve the DPDK receiving datapath flexibility with preserving
> > compatibility with existing API.
> >
> > Also, the proposed segment description might be used to specify Rx
> > packet split for some other features. For example, provide the way to
> > specify the extra memory pool for the Header Split feature of some
> > Intel PMD.
> 
> I don't understand what you are referring in this last paragraph.
> I think explanation above is enough to demonstrate the flexibility.
> 
I just noted that the segment description is a common thing and could be
reused by some other features.
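
To make the segment example quoted earlier in this message concrete, the
splitting rule can be sketched in plain C. This is only an illustration under
assumptions: struct rxseg_sketch is a hypothetical stand-in for the proposed
rte_eth_rxseg (the pool is reduced to an index so that no DPDK headers are
needed), and split_packet_sketch() is not part of the patch. It assumes the
behaviour shown in the example, where the last description is reused until
the packet is consumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the proposed struct rte_eth_rxseg. */
struct rxseg_sketch {
	int pool_id;     /* stands in for the mempool pointer */
	uint16_t length; /* segment maximal data length */
	uint16_t offset; /* data offset inside the mbuf data buffer */
};

/* One piece of a received packet after the split. */
struct chunk_sketch {
	int pool_id;
	uint16_t offset;
	uint16_t data_len;
};

/*
 * Split pkt_len bytes along the configured segments. Once the descriptions
 * run out, the last one is reused for the remainder of the packet.
 * Returns the number of chunks written, or 0 if out[] is too small or a
 * description has zero length.
 */
static size_t
split_packet_sketch(const struct rxseg_sketch *seg, size_t n_seg,
		    uint32_t pkt_len, struct chunk_sketch *out, size_t max_out)
{
	size_t n = 0;

	assert(seg != NULL && n_seg > 0 && out != NULL);
	while (pkt_len > 0) {
		const struct rxseg_sketch *s = &seg[n < n_seg ? n : n_seg - 1];
		uint16_t len = pkt_len < s->length ? (uint16_t)pkt_len : s->length;

		if (n == max_out || len == 0)
			return 0;
		out[n].pool_id = s->pool_id;
		out[n].offset = s->offset;
		out[n].data_len = len;
		pkt_len -= len;
		n++;
	}
	return n;
}
```

With the four descriptions from the example (14B, 20B, 20B, 512B), a 46-byte
packet yields three chunks with the last one 12B long, and a 1500-byte packet
yields six chunks with the last one 422B long.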

> > Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
> 
> Thank you, I like this feature.
> More minor comments below.
> 
> [...]
> > +* **Introduced extended buffer description for receiving.**
> 
> Rewording:
> 	Introduced extended setup of Rx queue
OK, sounds better.

> 

> > +  * Added extended Rx queue setup routine
> > +  * Added description for Rx segment sizes
> 
> not only "sizes", but also offset and mempool.
> 
> > +  * Added capability to specify the memory pool for each segment
> 
> This one can be merged with the above, or offset should be added.
> 
> [...]
> The doxygen comment is missing here.
Yes, thank you. Also noted that, updating.

> 
> > +int rte_eth_rx_queue_setup_ex(uint16_t port_id, uint16_t rx_queue_id,
> > +		uint16_t nb_rx_desc, unsigned int socket_id,
> > +		const struct rte_eth_rxconf *rx_conf,
> > +		const struct rte_eth_rxseg *rx_seg, uint16_t n_seg);
> 
> This new function should be experimental and it should be added to the
> .map file.
> 
OK.

With best regards, Slava


^ permalink raw reply	[flat|nested] 172+ messages in thread

* Re: [dpdk-dev] [RFC] ethdev: introduce Rx buffer split
  2020-10-12  8:45     ` Andrew Rybchenko
@ 2020-10-12  9:56       ` Slava Ovsiienko
  2020-10-12 15:14         ` Thomas Monjalon
  2020-10-13 21:59         ` Ferruh Yigit
  0 siblings, 2 replies; 172+ messages in thread
From: Slava Ovsiienko @ 2020-10-12  9:56 UTC (permalink / raw)
  To: Andrew Rybchenko, dev
  Cc: Thomas Monjalon, stephen, ferruh.yigit, Shahaf Shuler,
	olivier.matz, jerinjacobk, maxime.coquelin, david.marchand,
	Asaf Penso

Hi, Andrew

Thank you for the comments.

We have two approaches to specifying multiple segments to split Rx packets:
1. update queue configuration structure
2. introduce new rx_queue_setup_ex() routine with extra parameters.

For [1] my only actual dislike is that we would have multiple places to specify
the pool - in rx_queue_setup() and in the config structure. So, we would have to
implement some checking: if the offload flag is set, we should check that
the mp parameter is NULL and that the segment description array pointer/size
is provided; if the offload flag is not set, we must check that the description
array is empty.
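
A minimal sketch of the check described for approach [1], in plain C. All
names here are hypothetical: the flag bit, struct rxconf_sketch and
check_rx_pools_sketch() only illustrate the rule and are not the ethdev API.
Requiring a non-NULL mp on the legacy path is an added assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bit; the real name and value are not settled here. */
#define SKETCH_RX_OFFLOAD_BUFFER_SPLIT (UINT64_C(1) << 20)

/* Hypothetical stand-in for the proposed struct rte_eth_rxseg. */
struct rxseg_sketch {
	const void *mp;
	uint16_t length;
	uint16_t offset;
};

/* Hypothetical queue config extended with the segment array. */
struct rxconf_sketch {
	uint64_t offloads;
	const struct rxseg_sketch *rx_seg;
	uint16_t n_seg;
};

/* Returns 0 if the pool arguments are consistent, -EINVAL otherwise. */
static int
check_rx_pools_sketch(const struct rxconf_sketch *conf, const void *mp)
{
	assert(conf != NULL);
	if (conf->offloads & SKETCH_RX_OFFLOAD_BUFFER_SPLIT) {
		/* Split requested: pools come from the segment array only. */
		if (mp != NULL)
			return -EINVAL;
		if (conf->rx_seg == NULL || conf->n_seg == 0)
			return -EINVAL;
		return 0;
	}
	/* No split: the segment array must be empty. */
	if (conf->rx_seg != NULL || conf->n_seg != 0)
		return -EINVAL;
	/* Assumption: the legacy path still needs the single pool. */
	return mp != NULL ? 0 : -EINVAL;
}
```

The four combinations (flag set or not, mp given or not) each end up in
exactly one branch, which is the conditional ambiguity mentioned above.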

> @Thomas, @Ferruh: I'd like to hear what other ethdev maintainers think
> about it.

Yes, it would be very nice to hear extra opinions. Do we think that providing
an extra API function is worse than extending the existing structure, introducing
some conditional ambiguity and complicating the parameter compliance
check?

Now I'm updating the existing version of the patch based on rx_queue_setup_ex()
and then could prepare the version for the configuration structure.
It is not a problem - the approaches are very similar, we just should choose
the most relevant one.

With best regards, Slava

> -----Original Message-----
> From: Andrew Rybchenko <Andrew.Rybchenko@oktetlabs.ru>
> Sent: Monday, October 12, 2020 11:45
> To: Slava Ovsiienko <viacheslavo@nvidia.com>; dev@dpdk.org
> Cc: Thomas Monjalon <thomasm@mellanox.com>;
> stephen@networkplumber.org; ferruh.yigit@intel.com; Shahaf Shuler
> <shahafs@nvidia.com>; olivier.matz@6wind.com; jerinjacobk@gmail.com;
> maxime.coquelin@redhat.com; david.marchand@redhat.com; Asaf Penso
> <asafp@nvidia.com>
> Subject: Re: [dpdk-dev] [RFC] ethdev: introduce Rx buffer split
> 
> Hi Slava,
> 
> I'm sorry for late reply. See my notes below.
> 
> On 10/1/20 11:54 AM, Slava Ovsiienko wrote:
> > Hi, Andrew
> >
> > Thank you for the comments, please see my replies below.
> >
> >> -----Original Message-----
> >> From: Andrew Rybchenko <arybchenko@solarflare.com>
> >> Sent: Thursday, September 17, 2020 19:55
> >> To: Slava Ovsiienko <viacheslavo@nvidia.com>; dev@dpdk.org
> >> Cc: Thomas Monjalon <thomasm@mellanox.com>;
> >> stephen@networkplumber.org; ferruh.yigit@intel.com; Shahaf Shuler
> >> <shahafs@nvidia.com>; olivier.matz@6wind.com; jerinjacobk@gmail.com;
> >> maxime.coquelin@redhat.com; david.marchand@redhat.com; Asaf Penso
> >> <asafp@nvidia.com>
> >> Subject: Re: [dpdk-dev] [RFC] ethdev: introduce Rx buffer split
> >>
> > [snip]
> >>>
> >>> For example, let's suppose we configured the Rx queue with the
> >>> following segments:
> >>> seg0 - pool0, len0=14B, off0=RTE_PKTMBUF_HEADROOM
> >>> seg1 - pool1, len1=20B, off1=0B
> >>> seg2 - pool2, len2=20B, off2=0B
> >>> seg3 - pool3, len3=512B, off3=0B
> >>>
> >>> The packet 46 bytes long will look like the following:
> >>> seg0 - 14B long @ RTE_PKTMBUF_HEADROOM in mbuf from pool0
> >>> seg1 - 20B long @ 0 in mbuf from pool1
> >>> seg2 - 12B long @ 0 in mbuf from pool2
> >>>
> >>> The packet 1500 bytes long will look like the following:
> >>> seg0 - 14B @ RTE_PKTMBUF_HEADROOM in mbuf from pool0
> >>> seg1 - 20B @ 0 in mbuf from pool1
> >>> seg2 - 20B @ 0 in mbuf from pool2
> >>> seg3 - 512B @ 0 in mbuf from pool3
> >>> seg4 - 512B @ 0 in mbuf from pool3
> >>> seg5 - 422B @ 0 in mbuf from pool3
> >>
> >> The behaviour is logical, but what to do if HW can't do it, i.e. use
> >> the last segment many times. Should it reject configuration if
> >> provided segments are insufficient to fit MTU packet? How to report
> >> the limitation?
> >> (I'm still trying to convince that SCATTER and BUFFER_SPLIT should be
> >> independent).
> >
> > BUFFER_SPLIT is rather the way to tune SCATTER. Currently scattering
> > happens on unconditional mbuf data buffer boundaries (we have reserved
> > HEAD space in the first mbuf and fill this one to the buffer end, the
> > next mbuf buffers might be filled completely). BUFFER_SPLIT provides
> > the way to specify the desired points to split packet, not just
> > blindly follow buffer boundaries. There is the check implemented in
> > common part if each split segment fits the mbuf allocated from
> > appropriate pool.
> > PMD should do extra check internally whether it supports the requested
> > split settings, if not - call will be rejected.
> >
> 
> @Thomas, @Ferruh: I'd like to hear what other ethdev maintainers think
> about it.
> 
> > [snip]
> >>
> >> I dislike the idea to introduce new device operation.
> >> rte_eth_rxconf has reserved space and BUFFER_SPLIT offload will mean
> >> that PMD looks at the split configuration location there.
> >>
> > We considered the approach of pushing split setting to the rxconf
> > structure.
> >
> > [http://patches.dpdk.org/patch/75205/]
> > But it seems there are some issues:
> >
> > - the split configuration description requires the variable length
> > array (due to variations in number of segments), so rte_eth_rxconf
> > structure would have the variable length (not nice, IMO).
> >
> > We could push pointers to the array of rte_eth_rxseg, but we would
> > lose the single structure (and contiguous memory) simplicity, this
> > approach has no advantages over the specifying the split configuration
> > as parameters of setup_ex().
> >
> 
> I think it has a huge advantage to avoid extra device operation.
> 
> > - it would introduce an ambiguity: rte_eth_rx_queue_setup()
> > specifies the single mbuf pool as parameter. What should we do with
> > it? Set to NULL? Treat as the first pool? I would prefer to specify
> > all split segments in uniform fashion, i.e. as array of rte_eth_rxseg
> > structures (and it can be easily updated with some extra segment
> > attributes if needed). So, in my opinion, we should remove/replace the
> > pool parameter in rx_queue_setup (by introducing new func).
> >
> 
> I'm trying to resolve the ambiguity as described above (see BUFFER_SPLIT vs
> SCATTER). Use the pointer for tail segments with respect to SCATTER
> capability.
> 
> > - specifying the new extended setup routine has an advantage that we
> > should not update any PMDs code in part of existing implementations of
> > rte_eth_rx_queue_setup().
> 
> It is not required since it is controlled by the new offload flags. If the offload
> is not supported, the new field is invisible for PMD (it simply ignores it).
> 
> >
> > If PMD supports BUFFER_SPLIT (or other related feature) it just should
> > provide rte_eth_rx_queue_setup_ex() and check the
> > DEV_RX_OFFLOAD_BUFFER_SPLIT (or HEADER_SPLIT, or whatever feature) it
> > supports. The common code does not check the feature flags - it is on
> > PMDs' own. In order to configure PMD to perform arbitrary desired Rx
> > splitting the application should check DEV_RX_OFFLOAD_BUFFER_SPLIT in
> > port capabilities, if found - set DEV_RX_OFFLOAD_BUFFER_SPLIT in
> > configuration and call rte_eth_rx_queue_setup_ex().
> > And this approach can be followed for any other split related feature.
> >
> > With best regards, Slava
> >
> 


^ permalink raw reply	[flat|nested] 172+ messages in thread

* Re: [dpdk-dev] [PATCH v2 1/9] ethdev: introduce Rx buffer split
  2020-10-12  9:40       ` Slava Ovsiienko
@ 2020-10-12 10:09         ` Thomas Monjalon
  0 siblings, 0 replies; 172+ messages in thread
From: Thomas Monjalon @ 2020-10-12 10:09 UTC (permalink / raw)
  To: Slava Ovsiienko
  Cc: dev, stephen, ferruh.yigit, olivier.matz, jerinjacobk,
	maxime.coquelin, david.marchand, arybchenko

12/10/2020 11:40, Slava Ovsiienko:
> From: Thomas Monjalon <thomas@monjalon.net>
> > > int
> > > rte_eth_rx_queue_setup_ex(uint16_t port_id, uint16_t rx_queue_id,
> > >                           uint16_t nb_rx_desc, unsigned int socket_id,
> > >                           const struct rte_eth_rxconf *rx_conf,
> > > 		          const struct rte_eth_rxseg *rx_seg,
> > >                           uint16_t n_seg)
> > 
> > An alternative name for this function:
> > 	rte_eth_rxseg_queue_setup
> M-m-m... Routine name follows pattern object_verb:
> rx_queue is an object, setup is an action.
> rxseg_queue is not an object.
> What about "rte_eth_rx_queue_setup_seg"?

rte_eth_rxseg is the name of the struct,
so it looks natural to me to keep it as prefix (object name).

[...]
> > > The new offload flag DEV_RX_OFFLOAD_BUFFER_SPLIT in device
> > 
> > The name should start with RTE_ prefix.
> 
> It is an existing pattern for DEV_RX_OFFLOAD_xxxx, no RTE_ for the case.

It is a wrong pattern which must be fixed.
Please start fresh with the right prefix for new ones.
Thinking twice, it should be:
	RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT

[...]
> > > Also, the proposed segment description might be used to specify Rx
> > > packet split for some other features. For example, provide the way to
> > > specify the extra memory pool for the Header Split feature of some
> > > Intel PMD.
> > 
> > I don't understand what you are referring to in this last paragraph.
> > I think explanation above is enough to demonstrate the flexibility.
> > 
> Just noted the segment description is common thing and could be
> promoted to be used in some other features. 

I think it is not needed. And giving Intel as an example is arbitrary.




^ permalink raw reply	[flat|nested] 172+ messages in thread

* Re: [dpdk-dev] [RFC] ethdev: introduce Rx buffer split
  2020-10-12  9:56       ` Slava Ovsiienko
@ 2020-10-12 15:14         ` Thomas Monjalon
  2020-10-12 15:28           ` Ananyev, Konstantin
  2020-10-12 16:03           ` Andrew Rybchenko
  2020-10-13 21:59         ` Ferruh Yigit
  1 sibling, 2 replies; 172+ messages in thread
From: Thomas Monjalon @ 2020-10-12 15:14 UTC (permalink / raw)
  To: Andrew Rybchenko, ferruh.yigit, Slava Ovsiienko
  Cc: dev, stephen, Shahaf Shuler, olivier.matz, jerinjacobk,
	maxime.coquelin, david.marchand, Asaf Penso

12/10/2020 11:56, Slava Ovsiienko:
> Hi, Andrew
> 
> Thank you for the comments.
> 
> We have two approaches how to specify multiple segments to split Rx packets:
> 1. update queue configuration structure
> 2. introduce new rx_queue_setup_ex() routine with extra parameters.
> 
> For [1] my only actual dislike is that we would have multiple places to specify
> the pool - in rx_queue_setup() and in the config structure. So, we should
> implement some checking (if we have offload flag set we should check
> whether mp parameter is NULL and segment descriptions array pointer/size
> is provided, if no offload flag set - we must check the description array is empty). 
> 
> > @Thomas, @Ferruh: I'd like to hear what other ethdev maintainers think
> > about it.
> 
> Yes, it would be very nice to hear extra opinions. Do we think the providing
> of extra API function is worse than extending existing structure, introducing
> some conditional ambiguity and complicating the parameter compliance
> check?

Let's try listing pros and cons of each approach, so we can conclude.

1/ update queue config struct

	1.1 pro: keep same queue setup function
	1.2 con: two mempool pointers (struct or function)
	1.3 con: variable size of segment description array

2/ new queue setup function

	2.1 con: two functions for queue setup
	2.2 pro: mempool pointer is not redundant
	2.3 pro: segment description array size defined by the caller

What else am I missing?



^ permalink raw reply	[flat|nested] 172+ messages in thread

* Re: [dpdk-dev] [RFC] ethdev: introduce Rx buffer split
  2020-10-12 15:14         ` Thomas Monjalon
@ 2020-10-12 15:28           ` Ananyev, Konstantin
  2020-10-12 15:34             ` Slava Ovsiienko
  2020-10-12 16:03           ` Andrew Rybchenko
  1 sibling, 1 reply; 172+ messages in thread
From: Ananyev, Konstantin @ 2020-10-12 15:28 UTC (permalink / raw)
  To: Thomas Monjalon, Andrew Rybchenko, Yigit, Ferruh, Slava Ovsiienko
  Cc: dev, stephen, Shahaf Shuler, olivier.matz, jerinjacobk,
	maxime.coquelin, david.marchand, Asaf Penso



> 
> 12/10/2020 11:56, Slava Ovsiienko:
> > Hi, Andrew
> >
> > Thank you for the comments.
> >
> > We have two approaches how to specify multiple segments to split Rx packets:
> > 1. update queue configuration structure
> > 2. introduce new rx_queue_setup_ex() routine with extra parameters.
> >
> > For [1] my only actual dislike is that we would have multiple places to specify
> > the pool - in rx_queue_setup() and in the config structure. So, we should
> > implement some checking (if we have offload flag set we should check
> > whether mp parameter is NULL and segment descriptions array pointer/size
> > is provided, if no offload flag set - we must check the description array is empty).
> >
> > > @Thomas, @Ferruh: I'd like to hear what other ethdev maintainers think
> > > about it.
> >
> > Yes, it would be very nice to hear extra opinions. Do we think the providing
> > of extra API function is worse than extending existing structure, introducing
> > some conditional ambiguity and complicating the parameter compliance
> > check?
> 
> Let's try listing pros and cons of each approach, so we can conclude.
> 
> 1/ update queue config struct
> 
> 	1.1 pro: keep same queue setup function
> 	1.2 con: two mempool pointers (struct or function)
> 	1.3 con: variable size of segment description array
> 
> 2/ new queue setup function
> 
> 	2.1 con: two functions for queue setup
> 	2.2 pro: mempool pointer is not redundant
> 	2.3 pro: segment description array size defined by the caller
> 
> What else I'm missing?
> 

My 2 cents: can we make the new (_ex) function work for both
the original config (1 mp for all sizes, no split) and the new config
(multiple mp, split allowed)?
Then in future (21.11?) we can either get rid of the original one,
or even make it a wrapper around the new one?
Konstantin
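
A rough sketch of the wrapper idea, in plain C with hypothetical names only.
Neither function below is the ethdev API: rx_queue_setup_ex_sketch() stands
in for the proposed extended setup, and struct rxq_sketch only records what
was passed so that the behaviour can be observed. Using a zero length and
offset to mean "no split" is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the proposed struct rte_eth_rxseg. */
struct rxseg_sketch {
	const void *mp;
	uint16_t length;
	uint16_t offset;
};

/* Records what the extended setup received. */
struct rxq_sketch {
	const void *first_mp;
	uint16_t n_seg;
};

/* Stand-in for the extended setup: takes an array of segments. */
static int
rx_queue_setup_ex_sketch(struct rxq_sketch *q,
			 const struct rxseg_sketch *seg, uint16_t n_seg)
{
	assert(q != NULL);
	if (seg == NULL || n_seg == 0 || seg[0].mp == NULL)
		return -1;
	q->first_mp = seg[0].mp;
	q->n_seg = n_seg;
	return 0;
}

/* Legacy-style entry point expressed as a thin wrapper. */
static int
rx_queue_setup_sketch(struct rxq_sketch *q, const void *mp)
{
	/* One pool for all sizes, no split: a single segment description. */
	const struct rxseg_sketch one = { .mp = mp, .length = 0, .offset = 0 };

	return rx_queue_setup_ex_sketch(q, &one, 1);
}
```

In this shape all argument checking lives in one place, and the legacy call
only differs in how the segment array is built.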


^ permalink raw reply	[flat|nested] 172+ messages in thread

* Re: [dpdk-dev] [RFC] ethdev: introduce Rx buffer split
  2020-10-12 15:28           ` Ananyev, Konstantin
@ 2020-10-12 15:34             ` Slava Ovsiienko
  2020-10-12 15:56               ` Ananyev, Konstantin
  0 siblings, 1 reply; 172+ messages in thread
From: Slava Ovsiienko @ 2020-10-12 15:34 UTC (permalink / raw)
  To: Ananyev, Konstantin, NBU-Contact-Thomas Monjalon,
	Andrew Rybchenko, Yigit, Ferruh
  Cc: dev, stephen, Shahaf Shuler, olivier.matz, jerinjacobk,
	maxime.coquelin, david.marchand, Asaf Penso



> -----Original Message-----
> From: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> Sent: Monday, October 12, 2020 18:28
> To: NBU-Contact-Thomas Monjalon <thomas@monjalon.net>; Andrew
> Rybchenko <Andrew.Rybchenko@oktetlabs.ru>; Yigit, Ferruh
> <ferruh.yigit@intel.com>; Slava Ovsiienko <viacheslavo@nvidia.com>
> Cc: dev@dpdk.org; stephen@networkplumber.org; Shahaf Shuler
> <shahafs@nvidia.com>; olivier.matz@6wind.com; jerinjacobk@gmail.com;
> maxime.coquelin@redhat.com; david.marchand@redhat.com; Asaf Penso
> <asafp@nvidia.com>
> Subject: RE: [dpdk-dev] [RFC] ethdev: introduce Rx buffer split
> 
> 
> 
> >
> > 12/10/2020 11:56, Slava Ovsiienko:
> > > Hi, Andrew
> > >
> > > Thank you for the comments.
> > >
> > > We have two approaches how to specify multiple segments to split Rx
> > > packets:
> > > 1. update queue configuration structure
> > > 2. introduce new rx_queue_setup_ex() routine with extra parameters.
> > >
> > > For [1] my only actual dislike is that we would have multiple places
> > > to specify the pool - in rx_queue_setup() and in the config
> > > structure. So, we should implement some checking (if we have offload
> > > flag set we should check whether mp parameter is NULL and segment
> > > descriptions array pointer/size is provided, if no offload flag set -
> > > we must check the description array is empty).
> > >
> > > > @Thomas, @Ferruh: I'd like to hear what other ethdev maintainers
> > > > think about it.
> > >
> > > Yes, it would be very nice to hear extra opinions. Do we think the
> > > providing of extra API function is worse than extending existing
> > > structure, introducing some conditional ambiguity and complicating
> > > the parameter compliance check?
> >
> > Let's try listing pros and cons of each approach, so we can conclude.
> >
> > 1/ update queue config struct
> >
> > 	1.1 pro: keep same queue setup function
> > 	1.2 con: two mempool pointers (struct or function)
> > 	1.3 con: variable size of segment description array
> >
> > 2/ new queue setup function
> >
> > 	2.1 con: two functions for queue setup
> > 	2.2 pro: mempool pointer is not redundant
> > 	2.3 pro: segment description array size defined by the caller
> >
> > What else I'm missing?
> >
> 
> My 2 cents: can we make new (_ex) function to work for both original config
> (1 mp for all sizes, no split) and for new config (multiple mp, split allowed)?
> Then in future (21.11?) we can either get rid of original one, or even make it
> a wrapper around all one?
> Konstantin

Yes, actually the mlx5 PMD implementation follows this approach -
specifying the segment description array with a single element
and zero size/offset provides exactly the same configuration as the existing
rte_eth_rx_queue_setup().

Currently I'm detailing the description (how HEAD_ROOM is handled, what happens
if the array is shorter than the buffer chain for segment of maximal size, that
a zero segment size means following the value deduced from the pool, and so on).
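
One possible reading of "zero segment size means follow the value deduced
from the pool" is sketched below in plain C. Everything here is an
assumption made for illustration: struct pool_sketch with its data_room
field is a hypothetical stand-in for a mempool, and deducing the length as
data room minus offset is a guess, since the final rule is still being
written down.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pool: only the mbuf data room size matters here. */
struct pool_sketch {
	uint16_t data_room;
};

/* Hypothetical stand-in for the proposed struct rte_eth_rxseg. */
struct rxseg_sketch {
	const struct pool_sketch *mp;
	uint16_t length;
	uint16_t offset;
};

/*
 * Resolve the effective segment length.
 * length == 0 is taken to mean "use what is left in the buffer after
 * offset" (an assumption). Returns 0 on success, -1 if the segment does
 * not fit the mbuf data buffer of its pool.
 */
static int
resolve_seg_len_sketch(const struct rxseg_sketch *seg, uint16_t *len_out)
{
	uint16_t room;

	assert(seg != NULL && seg->mp != NULL && len_out != NULL);
	if (seg->offset >= seg->mp->data_room)
		return -1;
	room = (uint16_t)(seg->mp->data_room - seg->offset);
	if (seg->length == 0) {
		*len_out = room;
		return 0;
	}
	if (seg->length > room)
		return -1;
	*len_out = seg->length;
	return 0;
}
```

With a pool of 2176-byte buffers (an arbitrary value chosen for the sketch),
a description with offset 128 and zero length resolves to 2048 bytes, while
an explicit length larger than the remaining room is rejected.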

So, may we consider this point as one more "pro" for the setup_ex approach? 😊

With best regards, Slava




^ permalink raw reply	[flat|nested] 172+ messages in thread

* Re: [dpdk-dev] [RFC] ethdev: introduce Rx buffer split
  2020-10-12 15:34             ` Slava Ovsiienko
@ 2020-10-12 15:56               ` Ananyev, Konstantin
  2020-10-12 15:59                 ` Slava Ovsiienko
  2020-10-12 16:52                 ` Thomas Monjalon
  0 siblings, 2 replies; 172+ messages in thread
From: Ananyev, Konstantin @ 2020-10-12 15:56 UTC (permalink / raw)
  To: Slava Ovsiienko, NBU-Contact-Thomas Monjalon, Andrew Rybchenko,
	Yigit, Ferruh
  Cc: dev, stephen, Shahaf Shuler, olivier.matz, jerinjacobk,
	maxime.coquelin, david.marchand, Asaf Penso


> 
> > -----Original Message-----
> > From: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> > Sent: Monday, October 12, 2020 18:28
> > To: NBU-Contact-Thomas Monjalon <thomas@monjalon.net>; Andrew
> > Rybchenko <Andrew.Rybchenko@oktetlabs.ru>; Yigit, Ferruh
> > <ferruh.yigit@intel.com>; Slava Ovsiienko <viacheslavo@nvidia.com>
> > Cc: dev@dpdk.org; stephen@networkplumber.org; Shahaf Shuler
> > <shahafs@nvidia.com>; olivier.matz@6wind.com; jerinjacobk@gmail.com;
> > maxime.coquelin@redhat.com; david.marchand@redhat.com; Asaf Penso
> > <asafp@nvidia.com>
> > Subject: RE: [dpdk-dev] [RFC] ethdev: introduce Rx buffer split
> >
> >
> >
> > >
> > > 12/10/2020 11:56, Slava Ovsiienko:
> > > > Hi, Andrew
> > > >
> > > > Thank you for the comments.
> > > >
> > > > We have two approaches how to specify multiple segments to split Rx
> > > > packets:
> > > > 1. update queue configuration structure
> > > > 2. introduce new rx_queue_setup_ex() routine with extra parameters.
> > > >
> > > > For [1] my only actual dislike is that we would have multiple places
> > > > to specify the pool - in rx_queue_setup() and in the config
> > > > structure. So, we should implement some checking (if we have offload
> > > > flag set we should check whether mp parameter is NULL and segment
> > > > descriptions array pointer/size is provided, if no offload flag set -
> > > > we must check the description array is empty).
> > > >
> > > > > @Thomas, @Ferruh: I'd like to hear what other ethdev maintainers
> > > > > think about it.
> > > >
> > > > Yes, it would be very nice to hear extra opinions. Do we think the
> > > > providing of extra API function is worse than extending existing
> > > > structure, introducing some conditional ambiguity and complicating
> > > > the parameter compliance check?
> > >
> > > Let's try listing pros and cons of each approach, so we can conclude.
> > >
> > > 1/ update queue config struct
> > >
> > > 	1.1 pro: keep same queue setup function
> > > 	1.2 con: two mempool pointers (struct or function)
> > > 	1.3 con: variable size of segment description array
> > >
> > > 2/ new queue setup function
> > >
> > > 	2.1 con: two functions for queue setup
> > > 	2.2 pro: mempool pointer is not redundant
> > > 	2.3 pro: segment description array size defined by the caller
> > >
> > > What else I'm missing?
> > >
> >
> > My 2 cents: can we make new (_ex) function to work for both original config
> > (1 mp for all sizes, no split) and for new config (multiple mp, split allowed)?
> > Then in future (21.11?) we can either get rid of original one, or even make it
> > a wrapper around all one?
> > Konstantin
> 
> Yes, actually the mlx5 PMD implementation follows this approach -
> specifying the segment description array with the only element
> and zero size/offset provides exactly the same configuration as existing
> rte_eth_rx_queue_setup().
> 
> Currently I'm detailing the description (how HEAD_ROOM is handled, what happens
> if array is shorter than the buffer chain for segment of maximal size, the zero segment
> size means follow the value deduced from the pool and so on).
> 
> So, may we consider this point as one more "pro" to setup_ex approach ? 😊
> 

From my perspective, yes.
It is a more gradual approach.
I expect it would be an experimental function for some time,
so we'll have time to try it, adjust, fix, etc. without breaking the original one.
 

^ permalink raw reply	[flat|nested] 172+ messages in thread

* Re: [dpdk-dev] [RFC] ethdev: introduce Rx buffer split
  2020-10-12 15:56               ` Ananyev, Konstantin
@ 2020-10-12 15:59                 ` Slava Ovsiienko
  2020-10-12 16:52                 ` Thomas Monjalon
  1 sibling, 0 replies; 172+ messages in thread
From: Slava Ovsiienko @ 2020-10-12 15:59 UTC (permalink / raw)
  To: Ananyev, Konstantin, NBU-Contact-Thomas Monjalon,
	Andrew Rybchenko, Yigit, Ferruh
  Cc: dev, stephen, Shahaf Shuler, olivier.matz, jerinjacobk,
	maxime.coquelin, david.marchand, Asaf Penso

> -----Original Message-----
> From: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> Sent: Monday, October 12, 2020 18:56
> To: Slava Ovsiienko <viacheslavo@nvidia.com>; NBU-Contact-Thomas
> Monjalon <thomas@monjalon.net>; Andrew Rybchenko
> <Andrew.Rybchenko@oktetlabs.ru>; Yigit, Ferruh <ferruh.yigit@intel.com>
> Cc: dev@dpdk.org; stephen@networkplumber.org; Shahaf Shuler
> <shahafs@nvidia.com>; olivier.matz@6wind.com; jerinjacobk@gmail.com;
> maxime.coquelin@redhat.com; david.marchand@redhat.com; Asaf Penso
> <asafp@nvidia.com>
> Subject: RE: [dpdk-dev] [RFC] ethdev: introduce Rx buffer split
> 
> 
> >
> > > -----Original Message-----
> > > From: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> > > Sent: Monday, October 12, 2020 18:28
> > > To: NBU-Contact-Thomas Monjalon <thomas@monjalon.net>; Andrew
> > > Rybchenko <Andrew.Rybchenko@oktetlabs.ru>; Yigit, Ferruh
> > > <ferruh.yigit@intel.com>; Slava Ovsiienko <viacheslavo@nvidia.com>
> > > Cc: dev@dpdk.org; stephen@networkplumber.org; Shahaf Shuler
> > > <shahafs@nvidia.com>; olivier.matz@6wind.com; jerinjacobk@gmail.com;
> > > maxime.coquelin@redhat.com; david.marchand@redhat.com; Asaf Penso
> > > <asafp@nvidia.com>
> > > Subject: RE: [dpdk-dev] [RFC] ethdev: introduce Rx buffer split
> > >
> > >
> > >
> > > >
> > > > 12/10/2020 11:56, Slava Ovsiienko:
> > > > > Hi, Andrew
> > > > >
> > > > > Thank you for the comments.
> > > > >
> > > > > We have two approaches how to specify multiple segments to split
> > > > > Rx packets:
> > > > > 1. update queue configuration structure
> > > > > 2. introduce new rx_queue_setup_ex() routine with extra parameters.
> > > > >
> > > > > For [1] my only actual dislike is that we would have multiple
> > > > > places to specify the pool - in rx_queue_setup() and in the
> > > > > config structure. So, we should implement some checking (if we
> > > > > have offload flag set we should check whether mp parameter is
> > > > > NULL and segment descriptions array pointer/size is provided, if
> > > > > no offload flag set - we must check the description array is empty).
> > > > >
> > > > > > @Thomas, @Ferruh: I'd like to hear what other ethdev
> > > > > > maintainers think about it.
> > > > >
> > > > > Yes, it would be very nice to hear extra opinions. Do we think
> > > > > the providing of extra API function is worse than extending
> > > > > existing structure, introducing some conditional ambiguity and
> > > > > complicating the parameter compliance check?
> > > >
> > > > Let's try listing pros and cons of each approach, so we can conclude.
> > > >
> > > > 1/ update queue config struct
> > > >
> > > > 	1.1 pro: keep same queue setup function
> > > > 	1.2 con: two mempool pointers (struct or function)
> > > > 	1.3 con: variable size of segment description array
> > > >
> > > > 2/ new queue setup function
> > > >
> > > > 	2.1 con: two functions for queue setup
> > > > 	2.2 pro: mempool pointer is not redundant
> > > > 	2.3 pro: segment description array size defined by the caller
> > > >
> > > > What else I'm missing?
> > > >
> > >
> > > My 2 cents: can we make new (_ex) function to work for both original
> > > config (1 mp for all sizes, no split) and for new config (multiple mp,
> > > split allowed)?
> > > Then in future (21.11?) we can either get rid of original one, or
> > > even make it a wrapper around all one?
> > > Konstantin
> >
> > Yes, actually the mlx5 PMD implementation follows this approach -
> > specifying the segment description array with the only element and
> > zero size/offset provides exactly the same configuration as existing
> > rte_eth_rx_queue_setup().
> >
> > Currently I'm detailing the description (how HEAD_ROOM is handled,
> > what happens if array is shorter than the buffer chain for segment of
> > maximal size, the zero segment size means follow the value deduced from
> > the pool and so on).
> >
> > So, may we consider this point as one more "pro" to setup_ex approach? 😊
> >
> 
> From my perspective, yes.
> It is sort of more gradual approach.
> I expect it would be experimental function for some time, so we'll have time
> to try it, adjust, fix, etc without breaking original one.
> 
Thank you for sharing your opinion.
Yes, function will be marked as experimental.

With best regards, Slava


^ permalink raw reply	[flat|nested] 172+ messages in thread

* Re: [dpdk-dev] [RFC] ethdev: introduce Rx buffer split
  2020-10-12 15:14         ` Thomas Monjalon
  2020-10-12 15:28           ` Ananyev, Konstantin
@ 2020-10-12 16:03           ` Andrew Rybchenko
  2020-10-12 16:10             ` Slava Ovsiienko
  1 sibling, 1 reply; 172+ messages in thread
From: Andrew Rybchenko @ 2020-10-12 16:03 UTC (permalink / raw)
  To: Thomas Monjalon, ferruh.yigit, Slava Ovsiienko
  Cc: dev, stephen, Shahaf Shuler, olivier.matz, jerinjacobk,
	maxime.coquelin, david.marchand, Asaf Penso

On 10/12/20 6:14 PM, Thomas Monjalon wrote:
> 12/10/2020 11:56, Slava Ovsiienko:
>> Hi, Andrew
>>
>> Thank you for the comments.
>>
>> We have two approaches how to specify multiple segments to split Rx packets:
>> 1. update queue configuration structure
>> 2. introduce new rx_queue_setup_ex() routine with extra parameters.
>>
>> For [1] my only actual dislike is that we would have multiple places to specify
>> the pool - in rx_queue_setup() and in the config structure. So, we should
>> implement some checking (if we have offload flag set we should check
>> whether mp parameter is NULL and segment descriptions array pointer/size
>> is provided, if no offload flag set - we must check the description array is empty). 
>>
>>> @Thomas, @Ferruh: I'd like to hear what other ethdev maintainers think
>>> about it.
>>
>> Yes, it would be very nice to hear extra opinions. Do we think the providing
>> of extra API function is worse than extending existing structure, introducing
>> some conditional ambiguity and complicating the parameter compliance
>> check?
> 
> Let's try listing pros and cons of each approach, so we can conclude.
> 
> 1/ update queue config struct
> 
> 	1.1 pro: keep same queue setup function

pro: no code duplication

> 	1.2 con: two mempool pointers (struct or function)
> 	1.3 con: variable size of segment description array
> 
> 2/ new queue setup function
> 
> 	2.1 con: two functions for queue setup

con: code duplication or refactoring of existing stable code

> 	2.2 pro: mempool pointer is not redundant
> 	2.3 pro: segment description array size defined by the caller
> 
> What else I'm missing?
> 


^ permalink raw reply	[flat|nested] 172+ messages in thread

* Re: [dpdk-dev] [RFC] ethdev: introduce Rx buffer split
  2020-10-12 16:03           ` Andrew Rybchenko
@ 2020-10-12 16:10             ` Slava Ovsiienko
  0 siblings, 0 replies; 172+ messages in thread
From: Slava Ovsiienko @ 2020-10-12 16:10 UTC (permalink / raw)
  To: Andrew Rybchenko, NBU-Contact-Thomas Monjalon, ferruh.yigit
  Cc: dev, stephen, Shahaf Shuler, olivier.matz, jerinjacobk,
	maxime.coquelin, david.marchand, Asaf Penso

> -----Original Message-----
> From: Andrew Rybchenko <Andrew.Rybchenko@oktetlabs.ru>
> Sent: Monday, October 12, 2020 19:04
> To: NBU-Contact-Thomas Monjalon <thomas@monjalon.net>;
> ferruh.yigit@intel.com; Slava Ovsiienko <viacheslavo@nvidia.com>
> Cc: dev@dpdk.org; stephen@networkplumber.org; Shahaf Shuler
> <shahafs@nvidia.com>; olivier.matz@6wind.com; jerinjacobk@gmail.com;
> maxime.coquelin@redhat.com; david.marchand@redhat.com; Asaf Penso
> <asafp@nvidia.com>
> Subject: Re: [dpdk-dev] [RFC] ethdev: introduce Rx buffer split
> 
> On 10/12/20 6:14 PM, Thomas Monjalon wrote:
> > 12/10/2020 11:56, Slava Ovsiienko:
> >> Hi, Andrew
> >>
> >> Thank you for the comments.
> >>
> >> We have two approaches how to specify multiple segments to split Rx
> >> packets:
> >> 1. update queue configuration structure
> >> 2. introduce new rx_queue_setup_ex() routine with extra parameters.
> >>
> >> For [1] my only actual dislike is that we would have multiple places
> >> to specify the pool - in rx_queue_setup() and in the config
> >> structure. So, we should implement some checking (if we have offload
> >> flag set we should check whether mp parameter is NULL and segment
> >> descriptions array pointer/size is provided, if no offload flag set -
> >> we must check the description array is empty).
> >>
> >>> @Thomas, @Ferruh: I'd like to hear what other ethdev maintainers
> >>> think about it.
> >>
> >> Yes, it would be very nice to hear extra opinions. Do we think the
> >> providing of extra API function is worse than extending existing
> >> structure, introducing some conditional ambiguity and complicating
> >> the parameter compliance check?
> >
> > Let's try listing pros and cons of each approach, so we can conclude.
> >
> > 1/ update queue config struct
> >
> > 	1.1 pro: keep same queue setup function
> 
> pro: no code duplication
> 
> > 	1.2 con: two mempool pointers (struct or function)
> > 	1.3 con: variable size of segment description array
> >
> > 2/ new queue setup function
> >
> > 	2.1 con: two functions for queue setup
> 
> con: code duplication or refactoring of existing stable code

- no refactoring of the existing rte_eth_rx_queue_setup() - it is kept intact
- yes, there is some duplication in rte_eth_rxseg_queue_setup(), but
  a large part of the code is new - there is a very specific check for
  the split buffer parameters.
- no code duplication at PMD level - both ways go to the same
  internal routine
- PMD code must be refactored anyway, so this point is neither a con nor
  a pro - the updated PMD Rx queue setup must handle the new split format,
  there is no way to avoid it.

> 
> > 	2.2 pro: mempool pointer is not redundant
> > 	2.3 pro: segment description array size defined by the caller
> >
> > What else I'm missing?
> >

With best regards, Slava


^ permalink raw reply	[flat|nested] 172+ messages in thread

* [dpdk-dev] [PATCH v3 0/9] ethdev: introduce Rx buffer split
  2020-08-17 17:49 [dpdk-dev] [RFC] ethdev: introduce Rx buffer split Slava Ovsiienko
                   ` (2 preceding siblings ...)
  2020-10-07 15:06 ` [dpdk-dev] [PATCH v2 0/9] ethdev: introduce Rx buffer split Viacheslav Ovsiienko
@ 2020-10-12 16:19 ` Viacheslav Ovsiienko
  2020-10-12 16:19   ` [dpdk-dev] [PATCH v3 1/9] " Viacheslav Ovsiienko
                     ` (8 more replies)
  2020-10-12 20:09 ` [dpdk-dev] [PATCH v4 0/9] ethdev: introduce Rx buffer split Viacheslav Ovsiienko
                   ` (9 subsequent siblings)
  13 siblings, 9 replies; 172+ messages in thread
From: Viacheslav Ovsiienko @ 2020-10-12 16:19 UTC (permalink / raw)
  To: dev
  Cc: thomasm, stephen, ferruh.yigit, olivier.matz, jerinjacobk,
	maxime.coquelin, david.marchand, arybchenko

The DPDK datapath in the transmit direction is very flexible.
An application can build a multi-segment packet and manage
almost all data aspects - the memory pools the segments
are allocated from, the segment lengths, the memory attributes
like external buffers, registered for DMA, etc.

In the receiving direction, the datapath is much less flexible:
an application can only specify the memory pool to configure the
receiving queue and nothing more. To extend the receiving
datapath capabilities, it is proposed to add a way to provide
extended information on how to split the packets being received.

The following structure is introduced to specify the Rx packet
segment:

struct rte_eth_rxseg {
    struct rte_mempool *mp; /* memory pools to allocate segment from */
    uint16_t length; /* segment maximal data length,
		       	configures "split point" */
    uint16_t offset; /* data offset from beginning
		       	of mbuf data buffer */
    uint32_t reserved; /* reserved field */
};

The new routine rte_eth_rxseg_queue_setup() is introduced to
set up the given Rx queue using the new extended Rx packet segment
description:

int
rte_eth_rxseg_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
                          uint16_t nb_rx_desc, unsigned int socket_id,
                          const struct rte_eth_rxconf *rx_conf,
                          const struct rte_eth_rxseg *rx_seg,
                          uint16_t n_seg)

This routine introduces two new parameters:
    rx_seg - pointer to the array of segment descriptions, each element
             describes the memory pool, maximal data length, and initial
             data offset from the beginning of the data buffer in the mbuf.
             This array allows specifying different settings for
             each segment individually.
    n_seg - number of elements in the array

The new offload flag RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT in device
capabilities is introduced so that a PMD can report to the
application that it supports splitting Rx packets into configurable
segments. Prior to invoking the rte_eth_rxseg_queue_setup() routine,
the application should check the RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT flag.

If the Rx queue is configured with the new routine, the PMD will
split the received packets into multiple segments, pushed to mbufs
with the specified attributes, according to the specification in the
description array:

- the first network buffer will be allocated from the memory pool
  specified in the first segment description element, the second
  network buffer - from the pool in the second segment description
  element, and so on. If there are not enough elements to describe
  the buffers for an entire packet of maximal length, the pool from the
  last valid element will be used to allocate the buffers for the
  rest of the segments.

- the offsets from the segment description elements will provide
  the data offset from the buffer beginning, except for the first mbuf -
  for this one the offset is added to RTE_PKTMBUF_HEADROOM to get
  the actual offset from the buffer beginning. If there are not enough
  elements to describe the buffers for an entire packet of maximal
  length, the offsets for the rest of the segments are assumed to be zero.

- the data length received into each segment is limited by the
  length specified in the segment description element. Data
  reception starts with filling up the first mbuf data buffer; if the
  specified maximal segment length is reached and there is data
  remaining (the packet is longer than the buffer in the first mbuf), the
  following data will be pushed to the next segment up to its own
  maximal length. If the first two segments are not enough to store
  all the remaining packet data, the next (third) segment will
  be engaged, and so on. If the length in the segment description
  element is zero, the actual buffer size will be deduced from
  the appropriate memory pool properties. If there are not enough
  elements to describe the buffers for an entire packet of maximal
  length, the buffer size will be deduced from the pool of the last
  valid element for the remaining segments.
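The per-segment limits above imply a setup-time check that each
described segment fits into a buffer from its pool. A minimal
standalone sketch of that check follows. It is not the ethdev code:
struct rxseg_check, first_bad_segment() and the HEADROOM value are
hypothetical stand-ins (HEADROOM plays the role of
RTE_PKTMBUF_HEADROOM, data_room the role of the pool's mbuf data room
size), so that the sketch builds without DPDK headers.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: stands in for RTE_PKTMBUF_HEADROOM (a build-time option). */
#define HEADROOM 128u

/* Hypothetical, DPDK-free view of one segment description. */
struct rxseg_check {
	uint32_t data_room; /* stand-in for the pool's mbuf data room size */
	uint16_t length;    /* maximal data length of the segment */
	uint16_t offset;    /* data offset inside the buffer */
};

/*
 * Return the index of the first segment whose length + offset
 * (plus the headroom for the first segment only) does not fit
 * into the buffer, or -1 if every described segment fits.
 */
static int
first_bad_segment(const struct rxseg_check *seg, size_t n_seg)
{
	size_t i;

	for (i = 0; i < n_seg; i++) {
		uint32_t need = (uint32_t)seg[i].length + seg[i].offset +
				(i == 0 ? HEADROOM : 0);

		if (seg[i].data_room < need)
			return (int)i;
	}
	return -1;
}
```

The headroom is added for the first segment only, matching the offset
rule in the list above. A zero length (size deduced from the pool) is
accepted by this sketch as long as the offset itself fits.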

For example, let's suppose we configured the Rx queue with the
following segments:
    seg0 - pool0, len0=14B, off0=2B
    seg1 - pool1, len1=20B, off1=128B
    seg2 - pool2, len2=20B, off2=0B
    seg3 - pool3, len3=512B, off3=0B

A 46-byte packet will look like the following:
    seg0 - 14B long @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from pool0
    seg1 - 20B long @ 128 in mbuf from pool1
    seg2 - 12B long @ 0 in mbuf from pool2

A 1500-byte packet will look like the following:
    seg0 - 14B @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from pool0
    seg1 - 20B @ 128 in mbuf from pool1
    seg2 - 20B @ 0 in mbuf from pool2
    seg3 - 512B @ 0 in mbuf from pool3
    seg4 - 512B @ 0 in mbuf from pool3
    seg5 - 422B @ 0 in mbuf from pool3
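
The two layouts above can be reproduced with a small standalone model
of the split rules. This is only an illustration, not PMD code: the
types and split_packet() are hypothetical, pools are replaced by
integer ids, and HEADROOM is an assumed value standing in for
RTE_PKTMBUF_HEADROOM. As a simplification, segments beyond the
described ones reuse the last element's length (which matches the
example), and a zero length (size deduced from the pool) is not
modelled.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: stands in for RTE_PKTMBUF_HEADROOM (a build-time option). */
#define HEADROOM 128u

/* Hypothetical mirror of a segment description, pool replaced by an id. */
struct rxseg_desc {
	int pool_id;
	uint16_t length; /* maximal data length, must be nonzero here */
	uint16_t offset; /* data offset from the buffer beginning */
};

/* One resulting mbuf segment of a received packet. */
struct rx_chunk {
	int pool_id;
	uint32_t data_off;
	uint32_t data_len;
};

/*
 * Split pkt_len bytes according to seg[0..n_seg-1].
 * Returns the number of chunks written to out[], or 0 on bad input
 * or if max_out chunks are not enough to hold the packet.
 */
static size_t
split_packet(const struct rxseg_desc *seg, size_t n_seg, uint32_t pkt_len,
	     struct rx_chunk *out, size_t max_out)
{
	size_t n = 0;

	if (seg == NULL || n_seg == 0)
		return 0;
	while (pkt_len > 0 && n < max_out) {
		/* Past the described elements: reuse the last pool. */
		const struct rxseg_desc *d = &seg[n < n_seg ? n : n_seg - 1];
		/* Past the described elements: offset is zero. */
		uint32_t off = n < n_seg ? d->offset : 0;
		uint32_t len;

		if (d->length == 0)
			return 0; /* pool-deduced size is not modelled */
		if (n == 0)
			off += HEADROOM; /* first mbuf keeps its headroom */
		len = pkt_len < d->length ? pkt_len : d->length;
		out[n].pool_id = d->pool_id;
		out[n].data_off = off;
		out[n].data_len = len;
		pkt_len -= len;
		n++;
	}
	return pkt_len == 0 ? n : 0;
}
```

With the four descriptions from the example, a 46-byte packet yields
three chunks and a 1500-byte packet yields six, the last one carrying
422 bytes from pool3.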

The offload RTE_ETH_RX_OFFLOAD_SCATTER must be present and
configured to support the new buffer split feature (if n_seg
is greater than one).
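
The offload preconditions can be summarized as a small predicate. The
sketch below is hypothetical and DPDK-free: buffer_split_usable() is
not part of the proposed API, the BUFFER_SPLIT bit value is the one
added by this series, and the SCATTER bit value is assumed to match
DEV_RX_OFFLOAD_SCATTER in rte_ethdev.h. A real application would take
the capability mask from rte_eth_dev_info_get().

```c
#include <stdint.h>

/* Local copies for illustration only (see the assumptions above). */
#define RX_OFFLOAD_SCATTER      0x00002000ULL
#define RX_OFFLOAD_BUFFER_SPLIT 0x00100000ULL

/*
 * Return 1 if a queue with n_seg segment descriptions may be set up:
 * the device must report buffer split, and for more than one segment
 * scatter must be both reported and enabled. Return 0 otherwise.
 */
static int
buffer_split_usable(uint64_t rx_offload_capa, uint64_t enabled_offloads,
		    uint16_t n_seg)
{
	if (n_seg == 0)
		return 0;
	if (!(rx_offload_capa & RX_OFFLOAD_BUFFER_SPLIT))
		return 0;
	if (n_seg > 1 &&
	    (!(rx_offload_capa & RX_OFFLOAD_SCATTER) ||
	     !(enabled_offloads & RX_OFFLOAD_SCATTER)))
		return 0;
	return 1;
}
```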

The new approach would allow splitting the ingress packets into
multiple parts pushed to memory with different attributes.
For example, the packet headers can be pushed to the embedded
data buffers within mbufs and the application data into
the external buffers attached to mbufs allocated from
different memory pools. The memory attributes for the split
parts may differ as well - for example, the application data
may be pushed into external memory located on a dedicated
physical device, say a GPU or NVMe. This would improve the
flexibility of the DPDK receiving datapath while preserving
compatibility with the existing API.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>

---
[RFC]: http://patches.dpdk.org/patch/75582/
Related deprecation note (revoked): http://patches.dpdk.org/patch/75205/

v1: http://patches.dpdk.org/patch/79594/
v2: http://patches.dpdk.org/patch/79893/
    - add feature support to mlx5 PMD

v3: - rte_eth_rx_queue_setup_ex is renamed to rte_eth_rxseg_queue_setup
    - DEV_RX_OFFLOAD_BUFFER_SPLIT is renamed to RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT
    - commit message update
    - documentation provided
    - release notes update

Viacheslav Ovsiienko (9):
  ethdev: introduce Rx buffer split
  app/testpmd: add multiple pools per core creation
  app/testpmd: add buffer split offload configuration
  app/testpmd: add rxpkts commands and parameters
  app/testpmd: add extended Rx queue setup
  net/mlx5: add extended Rx queue setup routine
  net/mlx5: configure Rx queue to support split
  net/mlx5: register multiple pool for Rx queue
  net/mlx5: update Rx datapath to support split

 app/test-pmd/bpf_cmd.c                      |   4 +-
 app/test-pmd/cmdline.c                      |  96 +++++++++++---
 app/test-pmd/config.c                       |  63 ++++++++-
 app/test-pmd/parameters.c                   |  39 +++++-
 app/test-pmd/testpmd.c                      | 108 +++++++++++-----
 app/test-pmd/testpmd.h                      |  41 +++++-
 doc/guides/nics/features.rst                |  15 +++
 doc/guides/rel_notes/release_20_11.rst      |   6 +
 doc/guides/testpmd_app_ug/run_app.rst       |  16 ++-
 doc/guides/testpmd_app_ug/testpmd_funcs.rst |  21 ++-
 drivers/net/mlx5/linux/mlx5_os.c            |   2 +
 drivers/net/mlx5/mlx5.h                     |   3 +
 drivers/net/mlx5/mlx5_mr.c                  |   3 +
 drivers/net/mlx5/mlx5_rxq.c                 | 194 +++++++++++++++++++++++-----
 drivers/net/mlx5/mlx5_rxtx.c                |   3 +-
 drivers/net/mlx5/mlx5_rxtx.h                |  10 +-
 drivers/net/mlx5/mlx5_trigger.c             |  20 +--
 lib/librte_ethdev/rte_ethdev.c              | 178 +++++++++++++++++++++++++
 lib/librte_ethdev/rte_ethdev.h              | 107 +++++++++++++++
 lib/librte_ethdev/rte_ethdev_driver.h       |  10 ++
 lib/librte_ethdev/rte_ethdev_version.map    |   1 +
 21 files changed, 829 insertions(+), 111 deletions(-)

-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 172+ messages in thread

* [dpdk-dev] [PATCH v3 1/9] ethdev: introduce Rx buffer split
  2020-10-12 16:19 ` [dpdk-dev] [PATCH v3 0/9] ethdev: introduce Rx buffer split Viacheslav Ovsiienko
@ 2020-10-12 16:19   ` " Viacheslav Ovsiienko
  2020-10-12 16:38     ` Andrew Rybchenko
  2020-10-12 16:19   ` [dpdk-dev] [PATCH v3 2/9] app/testpmd: add multiple pools per core creation Viacheslav Ovsiienko
                     ` (7 subsequent siblings)
  8 siblings, 1 reply; 172+ messages in thread
From: Viacheslav Ovsiienko @ 2020-10-12 16:19 UTC (permalink / raw)
  To: dev
  Cc: thomasm, stephen, ferruh.yigit, olivier.matz, jerinjacobk,
	maxime.coquelin, david.marchand, arybchenko

The DPDK datapath in the transmit direction is very flexible.
An application can build a multi-segment packet and manage
almost all data aspects - the memory pools the segments
are allocated from, the segment lengths, the memory attributes
like external buffers, registered for DMA, etc.

In the receiving direction, the datapath is much less flexible:
an application can only specify the memory pool to configure the
receiving queue and nothing more. To extend the receiving
datapath capabilities, it is proposed to add a way to provide
extended information on how to split the packets being received.

The following structure is introduced to specify the Rx packet
segment:

struct rte_eth_rxseg {
    struct rte_mempool *mp; /* memory pools to allocate segment from */
    uint16_t length; /* segment maximal data length,
		       	configures "split point" */
    uint16_t offset; /* data offset from beginning
		       	of mbuf data buffer */
    uint32_t reserved; /* reserved field */
};

The new routine rte_eth_rxseg_queue_setup() is introduced to
set up the given Rx queue using the new extended Rx packet segment
description:

int
rte_eth_rxseg_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
                          uint16_t nb_rx_desc, unsigned int socket_id,
                          const struct rte_eth_rxconf *rx_conf,
                          const struct rte_eth_rxseg *rx_seg,
                          uint16_t n_seg)

This routine introduces two new parameters:
    rx_seg - pointer to the array of segment descriptions, each element
             describes the memory pool, maximal data length, and initial
             data offset from the beginning of the data buffer in the mbuf.
             This array allows specifying different settings for
             each segment individually.
    n_seg - number of elements in the array

The new offload flag RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT in device
capabilities is introduced so that a PMD can report to the
application that it supports splitting Rx packets into configurable
segments. Prior to invoking the rte_eth_rxseg_queue_setup() routine,
the application should check the RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT flag.

If the Rx queue is configured with the new routine, the PMD will
split the received packets into multiple segments, pushed to mbufs
with the specified attributes, according to the specification in the
description array:

- the first network buffer will be allocated from the memory pool
  specified in the first segment description element, the second
  network buffer - from the pool in the second segment description
  element, and so on. If there are not enough elements to describe
  the buffers for an entire packet of maximal length, the pool from the
  last valid element will be used to allocate the buffers for the
  rest of the segments.

- the offsets from the segment description elements will provide
  the data offset from the buffer beginning, except for the first mbuf -
  for this one the offset is added to RTE_PKTMBUF_HEADROOM to get
  the actual offset from the buffer beginning. If there are not enough
  elements to describe the buffers for an entire packet of maximal
  length, the offsets for the rest of the segments are assumed to be zero.

- the data length received into each segment is limited by the
  length specified in the segment description element. Data
  reception starts with filling up the first mbuf data buffer; if the
  specified maximal segment length is reached and there is data
  remaining (the packet is longer than the buffer in the first mbuf), the
  following data will be pushed to the next segment up to its own
  maximal length. If the first two segments are not enough to store
  all the remaining packet data, the next (third) segment will
  be engaged, and so on. If the length in the segment description
  element is zero, the actual buffer size will be deduced from
  the appropriate memory pool properties. If there are not enough
  elements to describe the buffers for an entire packet of maximal
  length, the buffer size will be deduced from the pool of the last
  valid element for the remaining segments.

For example, let's suppose we configured the Rx queue with the
following segments:
    seg0 - pool0, len0=14B, off0=2B
    seg1 - pool1, len1=20B, off1=128B
    seg2 - pool2, len2=20B, off2=0B
    seg3 - pool3, len3=512B, off3=0B

A 46-byte packet will look like the following:
    seg0 - 14B long @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from pool0
    seg1 - 20B long @ 128 in mbuf from pool1
    seg2 - 12B long @ 0 in mbuf from pool2

A 1500-byte packet will look like the following:
    seg0 - 14B @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from pool0
    seg1 - 20B @ 128 in mbuf from pool1
    seg2 - 20B @ 0 in mbuf from pool2
    seg3 - 512B @ 0 in mbuf from pool3
    seg4 - 512B @ 0 in mbuf from pool3
    seg5 - 422B @ 0 in mbuf from pool3

The offload RTE_ETH_RX_OFFLOAD_SCATTER must be present and
configured to support the new buffer split feature (if n_seg
is greater than one).

The new approach would allow splitting the ingress packets into
multiple parts pushed to memory with different attributes.
For example, the packet headers can be pushed to the embedded
data buffers within mbufs and the application data into
the external buffers attached to mbufs allocated from
different memory pools. The memory attributes for the split
parts may differ as well - for example, the application data
may be pushed into external memory located on a dedicated
physical device, say a GPU or NVMe. This would improve the
flexibility of the DPDK receiving datapath while preserving
compatibility with the existing API.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
 doc/guides/nics/features.rst             |  15 +++
 doc/guides/rel_notes/release_20_11.rst   |   6 ++
 lib/librte_ethdev/rte_ethdev.c           | 178 +++++++++++++++++++++++++++++++
 lib/librte_ethdev/rte_ethdev.h           | 107 +++++++++++++++++++
 lib/librte_ethdev/rte_ethdev_driver.h    |  10 ++
 lib/librte_ethdev/rte_ethdev_version.map |   1 +
 6 files changed, 317 insertions(+)

diff --git a/doc/guides/nics/features.rst b/doc/guides/nics/features.rst
index dd8c955..21b91db 100644
--- a/doc/guides/nics/features.rst
+++ b/doc/guides/nics/features.rst
@@ -185,6 +185,21 @@ Supports receiving segmented mbufs.
 * **[related]    eth_dev_ops**: ``rx_pkt_burst``.
 
 
+.. _nic_features_buffer_split:
+
+Buffer Split on Rx
+------------------
+
+Scatters the packets being received on specified boundaries to segmented mbufs.
+
+* **[uses]       rte_eth_rxconf,rte_eth_rxmode**: ``offloads:RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT``.
+* **[implements] datapath**: ``Buffer Split functionality``.
+* **[implements] rte_eth_dev_data**: ``buffer_split``.
+* **[provides]   rte_eth_dev_info**: ``rx_offload_capa:RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT``.
+* **[provides]   eth_dev_ops**: ``rxq_info_get:buffer_split``.
+* **[related]    API**: ``rte_eth_rxseg_queue_setup()``.
+
+
 .. _nic_features_lro:
 
 LRO
diff --git a/doc/guides/rel_notes/release_20_11.rst b/doc/guides/rel_notes/release_20_11.rst
index 2cec9dd..d87247a 100644
--- a/doc/guides/rel_notes/release_20_11.rst
+++ b/doc/guides/rel_notes/release_20_11.rst
@@ -60,6 +60,12 @@ New Features
   Added the FEC API which provides functions for query FEC capabilities and
   current FEC mode from device. Also, API for configuring FEC mode is also provided.
 
+* **Introduced extended buffer description for receiving.**
+
+  Added the extended Rx queue setup routine providing the individual
+  descriptions for each Rx segment with maximal size, buffer offset and memory
+  pool to allocate data buffers from.
+
 * **Updated Broadcom bnxt driver.**
 
   Updated the Broadcom bnxt driver with new features and improvements, including:
diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
index 59beb8a..3a55567 100644
--- a/lib/librte_ethdev/rte_ethdev.c
+++ b/lib/librte_ethdev/rte_ethdev.c
@@ -105,6 +105,9 @@ struct rte_eth_xstats_name_off {
 #define RTE_RX_OFFLOAD_BIT2STR(_name)	\
 	{ DEV_RX_OFFLOAD_##_name, #_name }
 
+#define RTE_ETH_RX_OFFLOAD_BIT2STR(_name)	\
+	{ RTE_ETH_RX_OFFLOAD_##_name, #_name }
+
 static const struct {
 	uint64_t offload;
 	const char *name;
@@ -128,9 +131,11 @@ struct rte_eth_xstats_name_off {
 	RTE_RX_OFFLOAD_BIT2STR(SCTP_CKSUM),
 	RTE_RX_OFFLOAD_BIT2STR(OUTER_UDP_CKSUM),
 	RTE_RX_OFFLOAD_BIT2STR(RSS_HASH),
+	RTE_ETH_RX_OFFLOAD_BIT2STR(BUFFER_SPLIT),
 };
 
 #undef RTE_RX_OFFLOAD_BIT2STR
+#undef RTE_ETH_RX_OFFLOAD_BIT2STR
 
 #define RTE_TX_OFFLOAD_BIT2STR(_name)	\
 	{ DEV_TX_OFFLOAD_##_name, #_name }
@@ -1920,6 +1925,179 @@ struct rte_eth_dev *
 }
 
 int
+rte_eth_rxseg_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
+			  uint16_t nb_rx_desc, unsigned int socket_id,
+			  const struct rte_eth_rxconf *rx_conf,
+			  const struct rte_eth_rxseg *rx_seg, uint16_t n_seg)
+{
+	int ret;
+	uint16_t seg_idx;
+	uint32_t mbp_buf_size;
+	struct rte_eth_dev *dev;
+	struct rte_eth_dev_info dev_info;
+	struct rte_eth_rxconf local_conf;
+	void **rxq;
+
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
+
+	dev = &rte_eth_devices[port_id];
+	if (rx_queue_id >= dev->data->nb_rx_queues) {
+		RTE_ETHDEV_LOG(ERR, "Invalid RX queue_id=%u\n", rx_queue_id);
+		return -EINVAL;
+	}
+
+	if (rx_seg == NULL) {
+		RTE_ETHDEV_LOG(ERR, "Invalid null description pointer\n");
+		return -EINVAL;
+	}
+
+	if (n_seg == 0) {
+		RTE_ETHDEV_LOG(ERR, "Invalid zero description number\n");
+		return -EINVAL;
+	}
+
+	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rxseg_queue_setup, -ENOTSUP);
+
+	/*
+	 * Check the size of the mbuf data buffer.
+	 * This value must be provided in the private data of the memory pool.
+	 * First check that the memory pool has a valid private data.
+	 */
+	ret = rte_eth_dev_info_get(port_id, &dev_info);
+	if (ret != 0)
+		return ret;
+
+	for (seg_idx = 0; seg_idx < n_seg; seg_idx++) {
+		struct rte_mempool *mp = rx_seg[seg_idx].mp;
+
+		if (mp->private_data_size <
+				sizeof(struct rte_pktmbuf_pool_private)) {
+			RTE_ETHDEV_LOG(ERR, "%s private_data_size %d < %d\n",
+				mp->name, (int)mp->private_data_size,
+				(int)sizeof(struct rte_pktmbuf_pool_private));
+			return -ENOSPC;
+		}
+
+		mbp_buf_size = rte_pktmbuf_data_room_size(mp);
+		if (mbp_buf_size < rx_seg[seg_idx].length +
+				   rx_seg[seg_idx].offset +
+				   (seg_idx ? 0 :
+				    (uint32_t)RTE_PKTMBUF_HEADROOM)) {
+			RTE_ETHDEV_LOG(ERR,
+				"%s mbuf_data_room_size %d < %d"
+				" (segment length=%d + segment offset=%d)\n",
+				mp->name, (int)mbp_buf_size,
+				(int)(rx_seg[seg_idx].length +
+				      rx_seg[seg_idx].offset),
+				(int)rx_seg[seg_idx].length,
+				(int)rx_seg[seg_idx].offset);
+			return -EINVAL;
+		}
+	}
+
+	/* Use default specified by driver, if nb_rx_desc is zero */
+	if (nb_rx_desc == 0) {
+		nb_rx_desc = dev_info.default_rxportconf.ring_size;
+		/* If driver default is also zero, fall back on EAL default */
+		if (nb_rx_desc == 0)
+			nb_rx_desc = RTE_ETH_DEV_FALLBACK_RX_RINGSIZE;
+	}
+
+	if (nb_rx_desc > dev_info.rx_desc_lim.nb_max ||
+			nb_rx_desc < dev_info.rx_desc_lim.nb_min ||
+			nb_rx_desc % dev_info.rx_desc_lim.nb_align != 0) {
+
+		RTE_ETHDEV_LOG(ERR,
+			"Invalid value for nb_rx_desc(=%hu), should be: "
+			"<= %hu, >= %hu, and a product of %hu\n",
+			nb_rx_desc, dev_info.rx_desc_lim.nb_max,
+			dev_info.rx_desc_lim.nb_min,
+			dev_info.rx_desc_lim.nb_align);
+		return -EINVAL;
+	}
+
+	if (dev->data->dev_started &&
+		!(dev_info.dev_capa &
+			RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP))
+		return -EBUSY;
+
+	if (dev->data->dev_started &&
+		(dev->data->rx_queue_state[rx_queue_id] !=
+			RTE_ETH_QUEUE_STATE_STOPPED))
+		return -EBUSY;
+
+	rxq = dev->data->rx_queues;
+	if (rxq[rx_queue_id]) {
+		RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_release,
+					-ENOTSUP);
+		(*dev->dev_ops->rx_queue_release)(rxq[rx_queue_id]);
+		rxq[rx_queue_id] = NULL;
+	}
+
+	if (rx_conf == NULL)
+		rx_conf = &dev_info.default_rxconf;
+
+	local_conf = *rx_conf;
+
+	/*
+	 * If an offloading has already been enabled in
+	 * rte_eth_dev_configure(), it has been enabled on all queues,
+	 * so there is no need to enable it in this queue again.
+	 * The local_conf.offloads input to underlying PMD only carries
+	 * those offloadings which are only enabled on this queue and
+	 * not enabled on all queues.
+	 */
+	local_conf.offloads &= ~dev->data->dev_conf.rxmode.offloads;
+
+	/*
+	 * New added offloadings for this queue are those not enabled in
+	 * rte_eth_dev_configure() and they must be per-queue type.
+	 * A pure per-port offloading can't be enabled on a queue while
+	 * disabled on another queue. A pure per-port offloading can't
+	 * be enabled for any queue as new added one if it hasn't been
+	 * enabled in rte_eth_dev_configure().
+	 */
+	if ((local_conf.offloads & dev_info.rx_queue_offload_capa) !=
+	     local_conf.offloads) {
+		RTE_ETHDEV_LOG(ERR,
+			"Ethdev port_id=%d rx_queue_id=%d, new added offloads"
+			" 0x%"PRIx64" must be within per-queue offload"
+			" capabilities 0x%"PRIx64" in %s()\n",
+			port_id, rx_queue_id, local_conf.offloads,
+			dev_info.rx_queue_offload_capa,
+			__func__);
+		return -EINVAL;
+	}
+
+	/*
+	 * If LRO is enabled, check that the maximum aggregated packet
+	 * size is supported by the configured device.
+	 */
+	if (local_conf.offloads & DEV_RX_OFFLOAD_TCP_LRO) {
+		if (dev->data->dev_conf.rxmode.max_lro_pkt_size == 0)
+			dev->data->dev_conf.rxmode.max_lro_pkt_size =
+				dev->data->dev_conf.rxmode.max_rx_pkt_len;
+		ret = check_lro_pkt_size(port_id,
+				dev->data->dev_conf.rxmode.max_lro_pkt_size,
+				dev->data->dev_conf.rxmode.max_rx_pkt_len,
+				dev_info.max_lro_pkt_size);
+		if (ret != 0)
+			return ret;
+	}
+
+	ret = (*dev->dev_ops->rxseg_queue_setup)(dev, rx_queue_id, nb_rx_desc,
+						 socket_id, &local_conf,
+						 rx_seg, n_seg);
+	if (!ret) {
+		if (!dev->data->min_rx_buf_size ||
+		    dev->data->min_rx_buf_size > mbp_buf_size)
+			dev->data->min_rx_buf_size = mbp_buf_size;
+	}
+
+	return eth_err(port_id, ret);
+}
+
+int
 rte_eth_rx_hairpin_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
 			       uint16_t nb_rx_desc,
 			       const struct rte_eth_hairpin_conf *conf)
diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
index 3a31f94..bbf25c8 100644
--- a/lib/librte_ethdev/rte_ethdev.h
+++ b/lib/librte_ethdev/rte_ethdev.h
@@ -970,6 +970,16 @@ struct rte_eth_txmode {
 };
 
 /**
+ * A structure used to configure an RX packet segment to split.
+ */
+struct rte_eth_rxseg {
+	struct rte_mempool *mp; /**< Memory pools to allocate segment from */
+	uint16_t length; /**< Segment data length, configures split point. */
+	uint16_t offset; /**< Data offset from beginning of mbuf data buffer */
+	uint32_t reserved; /**< Reserved field */
+};
+
+/**
  * A structure used to configure an RX ring of an Ethernet port.
  */
 struct rte_eth_rxconf {
@@ -1260,6 +1270,7 @@ struct rte_eth_conf {
 #define DEV_RX_OFFLOAD_SCTP_CKSUM	0x00020000
 #define DEV_RX_OFFLOAD_OUTER_UDP_CKSUM  0x00040000
 #define DEV_RX_OFFLOAD_RSS_HASH		0x00080000
+#define RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT 0x00100000
 
 #define DEV_RX_OFFLOAD_CHECKSUM (DEV_RX_OFFLOAD_IPV4_CKSUM | \
 				 DEV_RX_OFFLOAD_UDP_CKSUM | \
@@ -2037,6 +2048,102 @@ int rte_eth_rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
 		uint16_t nb_rx_desc, unsigned int socket_id,
 		const struct rte_eth_rxconf *rx_conf,
 		struct rte_mempool *mb_pool);
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Allocate and set up a receive queue for an Ethernet device
+ * with specifying receiving segments parameters.
+ *
+ * The function allocates a contiguous block of memory for *nb_rx_desc*
+ * receive descriptors from a memory zone associated with *socket_id*.
+ * The descriptors might be divided into groups by PMD to receive the data
+ * into multi-segment packet presented by the chain of mbufs.
+ *
+ * Each descriptor within the group is initialized accordingly with
+ * the network buffers allocated from the specified memory pool and with
+ * specified buffer offset and maximal segment length.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param rx_queue_id
+ *   The index of the receive queue to set up.
+ *   The value must be in the range [0, nb_rx_queue - 1] previously supplied
+ *   to rte_eth_dev_configure().
+ * @param nb_rx_desc
+ *   The number of receive descriptors to allocate for the receive ring.
+ * @param socket_id
+ *   The *socket_id* argument is the socket identifier in case of NUMA.
+ *   The value can be *SOCKET_ID_ANY* if there is no NUMA constraint for
+ *   the DMA memory allocated for the receive descriptors of the ring.
+ * @param rx_conf
+ *   The pointer to the configuration data to be used for the receive queue.
+ *   NULL value is allowed, in which case default RX configuration
+ *   will be used.
+ *   The *rx_conf* structure contains an *rx_thresh* structure with the values
+ *   of the Prefetch, Host, and Write-Back threshold registers of the receive
+ *   ring.
+ *   In addition it contains the hardware offloads features to activate using
+ *   the DEV_RX_OFFLOAD_* flags.
+ *   If an offloading set in rx_conf->offloads
+ *   hasn't been set in the input argument eth_conf->rxmode.offloads
+ *   to rte_eth_dev_configure(), it is a new added offloading, it must be
+ *   per-queue type and it is enabled for the queue.
+ *   No need to repeat any bit in rx_conf->offloads which has already been
+ *   enabled in rte_eth_dev_configure() at port level. An offloading enabled
+ *   at port level can't be disabled at queue level.
+ * @param rx_seg
+ *   The pointer to the array of segment descriptions, each element describes
+ *   the memory pool, maximal segment data length, initial data offset from
+ *   the beginning of data buffer in mbuf. This allows specifying dedicated
+ *   properties for each segment in the receiving buffer - pool, buffer
+ *   offset, maximal segment size. If RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT offload
+ *   flag is configured, the PMD will split the received packets into multiple
+ *   segments according to the specification in the description array:
+ *   - the first network buffer will be allocated from the memory pool
+ *     specified in the first segment description element, the second
+ *     network buffer - from the pool in the second segment description
+ *     element and so on. If there are not enough elements to describe
+ *     the buffers for an entire packet of maximal length, the pool from the
+ *     last valid element will be used to allocate the buffers for the rest
+ *     of the segments.
+ *   - the offsets from the segment description elements will provide the
+ *     data offset from the buffer beginning, except for the first mbuf - for
+ *     this one the offset is added to RTE_PKTMBUF_HEADROOM to get the actual
+ *     offset from the buffer beginning. If there are not enough elements
+ *     to describe the buffers for an entire packet of maximal length, the
+ *     offsets for the rest of the segments are assumed to be zero.
+ *   - the data length received into each segment is limited by the
+ *     length specified in the segment description element. Data reception
+ *     starts with filling up the first mbuf data buffer; if the specified
+ *     maximal segment length is reached and there is data remaining
+ *     (the packet is longer than the buffer in the first mbuf), the following
+ *     data will be pushed to the next segment up to its own length. If the
+ *     first two segments are not enough to store all the packet data, the
+ *     next (third) segment will be engaged and so on. If the length in the
+ *     segment description element is zero, the actual buffer size will be
+ *     deduced from the appropriate memory pool properties. If there are not
+ *     enough elements to describe the buffers for an entire packet of maximal
+ *     length, the buffer size will be deduced from the pool of the last
+ *     valid element for all the remaining segments.
+ * @param n_seg
+ *   The number of elements in the segment description array.
+ * @return
+ *   - 0: Success, receive queue correctly set up.
+ *   - -EIO: if device is removed.
+ *   - -EINVAL: The segment descriptors array is empty (pointer to is null or
+ *      zero number of elements) or the size of network buffers which can be
+ *      allocated from this memory pool does not fit the various buffer sizes
+ *      allowed by the device controller.
+ *   - -ENOMEM: Unable to allocate the receive ring descriptors or to
+ *      allocate network memory buffers from the memory pool when
+ *      initializing receive descriptors.
+ */
+__rte_experimental
+int rte_eth_rxseg_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
+		uint16_t nb_rx_desc, unsigned int socket_id,
+		const struct rte_eth_rxconf *rx_conf,
+		const struct rte_eth_rxseg *rx_seg, uint16_t n_seg);
 
 /**
  * @warning
diff --git a/lib/librte_ethdev/rte_ethdev_driver.h b/lib/librte_ethdev/rte_ethdev_driver.h
index 35cc4fb..5dee210 100644
--- a/lib/librte_ethdev/rte_ethdev_driver.h
+++ b/lib/librte_ethdev/rte_ethdev_driver.h
@@ -264,6 +264,15 @@ typedef int (*eth_rx_queue_setup_t)(struct rte_eth_dev *dev,
 				    struct rte_mempool *mb_pool);
 /**< @internal Set up a receive queue of an Ethernet device. */
 
+typedef int (*eth_rxseg_queue_setup_t)(struct rte_eth_dev *dev,
+				       uint16_t rx_queue_id,
+				       uint16_t nb_rx_desc,
+				       unsigned int socket_id,
+				       const struct rte_eth_rxconf *rx_conf,
+				       const struct rte_eth_rxseg *rx_seg,
+				       uint16_t n_seg);
+/**< @internal Extended setup of a receive queue of an Ethernet device. */
+
 typedef int (*eth_tx_queue_setup_t)(struct rte_eth_dev *dev,
 				    uint16_t tx_queue_id,
 				    uint16_t nb_tx_desc,
@@ -711,6 +720,7 @@ struct eth_dev_ops {
 	eth_queue_start_t          tx_queue_start;/**< Start TX for a queue. */
 	eth_queue_stop_t           tx_queue_stop; /**< Stop TX for a queue. */
 	eth_rx_queue_setup_t       rx_queue_setup;/**< Set up device RX queue. */
+	eth_rxseg_queue_setup_t    rxseg_queue_setup;/**< Extended RX setup. */
 	eth_queue_release_t        rx_queue_release; /**< Release RX queue. */
 
 	eth_rx_enable_intr_t       rx_queue_intr_enable;  /**< Enable Rx queue interrupt. */
diff --git a/lib/librte_ethdev/rte_ethdev_version.map b/lib/librte_ethdev/rte_ethdev_version.map
index f8a0945..d4b9849 100644
--- a/lib/librte_ethdev/rte_ethdev_version.map
+++ b/lib/librte_ethdev/rte_ethdev_version.map
@@ -195,6 +195,7 @@ EXPERIMENTAL {
 	rte_flow_get_aged_flows;
 
 	# Marked as experimental in 20.11
+	rte_eth_rxseg_queue_setup;
 	rte_tm_capabilities_get;
 	rte_tm_get_number_of_leaf_nodes;
 	rte_tm_hierarchy_commit;
-- 
1.8.3.1
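
The segment-filling rules described in the rte_eth_rxseg_queue_setup()
comment above can be sketched in plain C. This is only an illustration under
stated assumptions: struct seg_desc, seg_capacity() and split_packet() are
hypothetical names that are not part of the patch, and offsets and
RTE_PKTMBUF_HEADROOM are ignored to keep the arithmetic visible.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one segment description element. Only the
 * length and the data buffer size of the backing pool are modelled. */
struct seg_desc {
	uint16_t length;        /* maximal data length, 0 = use pool size */
	uint16_t pool_buf_size; /* data buffer size of the element's pool */
};

/* Capacity of the mbuf at position idx in the received chain. */
static uint16_t
seg_capacity(const struct seg_desc *segs, size_t n_seg, size_t idx)
{
	assert(n_seg > 0);
	if (idx >= n_seg) /* past the array: pool of the last valid element */
		return segs[n_seg - 1].pool_buf_size;
	return segs[idx].length ? segs[idx].length : segs[idx].pool_buf_size;
}

/* Store the number of bytes placed in each segment into out[].
 * Returns the number of segments used, or 0 if out[] is too small
 * or a segment has no capacity. */
static size_t
split_packet(const struct seg_desc *segs, size_t n_seg, uint32_t pkt_len,
	     uint16_t *out, size_t max_out)
{
	size_t used = 0;

	while (pkt_len > 0) {
		uint16_t cap = seg_capacity(segs, n_seg, used);
		uint16_t take;

		if (used == max_out || cap == 0)
			return 0;
		take = pkt_len < cap ? (uint16_t)pkt_len : cap;
		out[used++] = take;
		pkt_len -= take;
	}
	return used;
}
```

With two elements, {length 14, pool 2048} and {length 0, pool 512}, a
1200-byte packet would occupy four segments in this model: 14 bytes, then
two full 512-byte buffers, then the rest in a further buffer from the last pool.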



* [dpdk-dev] [PATCH v3 2/9] app/testpmd: add multiple pools per core creation
  2020-10-12 16:19 ` [dpdk-dev] [PATCH v3 0/9] ethdev: introduce Rx buffer split Viacheslav Ovsiienko
  2020-10-12 16:19   ` [dpdk-dev] [PATCH v3 1/9] " Viacheslav Ovsiienko
@ 2020-10-12 16:19   ` Viacheslav Ovsiienko
  2020-10-12 16:19   ` [dpdk-dev] [PATCH v3 3/9] app/testpmd: add buffer split offload configuration Viacheslav Ovsiienko
                     ` (6 subsequent siblings)
  8 siblings, 0 replies; 172+ messages in thread
From: Viacheslav Ovsiienko @ 2020-10-12 16:19 UTC (permalink / raw)
  To: dev
  Cc: thomasm, stephen, ferruh.yigit, olivier.matz, jerinjacobk,
	maxime.coquelin, david.marchand, arybchenko

The command line parameter --mbuf-size is extended to accept
multiple comma-separated values, for example:

--mbuf-size=2176,512,768,4096

This requests the creation of extra memory pools with the specified
mbuf data buffer sizes. If a buffer split feature is engaged,
the extra memory pools can be used to configure the Rx queues
with rte_eth_rxseg_queue_setup().

The extra pools are created with the requested sizes, and the pool
names get an index suffix: mbuf_pool_socket_%socket_%index.
Index zero denotes the first mandatory pool, which keeps its
original name to maintain compatibility with existing code.
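
The naming rule can be illustrated with a standalone sketch. It mirrors
the mbuf_poolname_build() change in this patch but is not the patch code
itself; pool_name_build() is a hypothetical name and the buffer size in the
usage note is arbitrary.

```c
#include <stdio.h>
#include <string.h>

/* Build the pool name for a socket and a size index.
 * Index 0 yields the legacy name without a suffix. */
static void
pool_name_build(unsigned int sock_id, unsigned int idx,
		char *name, size_t size)
{
	if (idx == 0)
		snprintf(name, size, "mbuf_pool_socket_%u", sock_id);
	else
		snprintf(name, size, "mbuf_pool_socket_%u_%u", sock_id, idx);
}
```

For socket 1, index 0 would give "mbuf_pool_socket_1" and index 2 would give
"mbuf_pool_socket_1_2", so a lookup by the old name still finds the first pool.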

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
 app/test-pmd/bpf_cmd.c                |  4 +--
 app/test-pmd/cmdline.c                |  2 +-
 app/test-pmd/config.c                 |  6 ++--
 app/test-pmd/parameters.c             | 24 +++++++++----
 app/test-pmd/testpmd.c                | 63 +++++++++++++++++++----------------
 app/test-pmd/testpmd.h                | 24 ++++++++++---
 doc/guides/testpmd_app_ug/run_app.rst |  7 ++--
 7 files changed, 83 insertions(+), 47 deletions(-)

diff --git a/app/test-pmd/bpf_cmd.c b/app/test-pmd/bpf_cmd.c
index 16e3c3b..0a1a178 100644
--- a/app/test-pmd/bpf_cmd.c
+++ b/app/test-pmd/bpf_cmd.c
@@ -69,7 +69,7 @@ struct cmd_bpf_ld_result {
 
 	*flags = RTE_BPF_ETH_F_NONE;
 	arg->type = RTE_BPF_ARG_PTR;
-	arg->size = mbuf_data_size;
+	arg->size = mbuf_data_size[0];
 
 	for (i = 0; str[i] != 0; i++) {
 		v = toupper(str[i]);
@@ -78,7 +78,7 @@ struct cmd_bpf_ld_result {
 		else if (v == 'M') {
 			arg->type = RTE_BPF_ARG_PTR_MBUF;
 			arg->size = sizeof(struct rte_mbuf);
-			arg->buf_size = mbuf_data_size;
+			arg->buf_size = mbuf_data_size[0];
 		} else if (v == '-')
 			continue;
 		else
diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 273fb1a..a585cf0 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -2907,7 +2907,7 @@ struct cmd_setup_rxtx_queue {
 		if (!numa_support || socket_id == NUMA_NO_CONFIG)
 			socket_id = port->socket_id;
 
-		mp = mbuf_pool_find(socket_id);
+		mp = mbuf_pool_find(socket_id, 0);
 		if (mp == NULL) {
 			printf("Failed to setup RX queue: "
 				"No mempool allocation"
diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index 8ccd989..4405abc 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -647,7 +647,7 @@ static int bus_match_all(const struct rte_bus *bus, const void *data)
 	printf("\nConnect to socket: %u", port->socket_id);
 
 	if (port_numa[port_id] != NUMA_NO_CONFIG) {
-		mp = mbuf_pool_find(port_numa[port_id]);
+		mp = mbuf_pool_find(port_numa[port_id], 0);
 		if (mp)
 			printf("\nmemory allocation on the socket: %d",
 							port_numa[port_id]);
@@ -3309,9 +3309,9 @@ struct igb_ring_desc_16_bytes {
 	 */
 	tx_pkt_len = 0;
 	for (i = 0; i < nb_segs; i++) {
-		if (seg_lengths[i] > (unsigned) mbuf_data_size) {
+		if (seg_lengths[i] > mbuf_data_size[0]) {
 			printf("length[%u]=%u > mbuf_data_size=%u - give up\n",
-			       i, seg_lengths[i], (unsigned) mbuf_data_size);
+			       i, seg_lengths[i], mbuf_data_size[0]);
 			return;
 		}
 		tx_pkt_len = (uint16_t)(tx_pkt_len + seg_lengths[i]);
diff --git a/app/test-pmd/parameters.c b/app/test-pmd/parameters.c
index 1ead595..1f40d73 100644
--- a/app/test-pmd/parameters.c
+++ b/app/test-pmd/parameters.c
@@ -106,7 +106,9 @@
 	       "(flag: 1 for RX; 2 for TX; 3 for RX and TX).\n");
 	printf("  --socket-num=N: set socket from which all memory is allocated "
 	       "in NUMA mode.\n");
-	printf("  --mbuf-size=N: set the data size of mbuf to N bytes.\n");
+	printf("  --mbuf-size=N[,N1[,...Nn]]: set the data size of mbuf to "
+	       "N bytes. If multiple numbers are specified the extra pools "
+	       "will be created to receive with packet split features\n");
 	printf("  --total-num-mbufs=N: set the number of mbufs to be allocated "
 	       "in mbuf pools.\n");
 	printf("  --max-pkt-len=N: set the maximum size of packet to N bytes.\n");
@@ -890,12 +892,22 @@
 				}
 			}
 			if (!strcmp(lgopts[opt_idx].name, "mbuf-size")) {
-				n = atoi(optarg);
-				if (n > 0 && n <= 0xFFFF)
-					mbuf_data_size = (uint16_t) n;
-				else
+				unsigned int mb_sz[MAX_SEGS_BUFFER_SPLIT];
+				unsigned int nb_segs, i;
+
+				nb_segs = parse_item_list(optarg, "mbuf-size",
+					MAX_SEGS_BUFFER_SPLIT, mb_sz, 0);
+				if (nb_segs <= 0)
 					rte_exit(EXIT_FAILURE,
-						 "mbuf-size should be > 0 and < 65536\n");
+						 "bad mbuf-size\n");
+				for (i = 0; i < nb_segs; i++) {
+					if (mb_sz[i] <= 0 || mb_sz[i] > 0xFFFF)
+						rte_exit(EXIT_FAILURE,
+							 "mbuf-size should be "
+							 "> 0 and < 65536\n");
+					mbuf_data_size[i] = (uint16_t) mb_sz[i];
+				}
+				mbuf_data_size_n = nb_segs;
 			}
 			if (!strcmp(lgopts[opt_idx].name, "total-num-mbufs")) {
 				n = atoi(optarg);
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index ccba71c..ec66060 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -186,7 +186,7 @@ struct fwd_engine * fwd_engines[] = {
 	NULL,
 };
 
-struct rte_mempool *mempools[RTE_MAX_NUMA_NODES];
+struct rte_mempool *mempools[RTE_MAX_NUMA_NODES * MAX_SEGS_BUFFER_SPLIT];
 uint16_t mempool_flags;
 
 struct fwd_config cur_fwd_config;
@@ -195,7 +195,10 @@ struct fwd_engine * fwd_engines[] = {
 uint32_t burst_tx_delay_time = BURST_TX_WAIT_US;
 uint32_t burst_tx_retry_num = BURST_TX_RETRIES;
 
-uint16_t mbuf_data_size = DEFAULT_MBUF_DATA_SIZE; /**< Mbuf data space size. */
+uint32_t mbuf_data_size_n = 1; /* Number of specified mbuf sizes. */
+uint16_t mbuf_data_size[MAX_SEGS_BUFFER_SPLIT] = {
+	DEFAULT_MBUF_DATA_SIZE
+}; /**< Mbuf data space size. */
 uint32_t param_total_num_mbufs = 0;  /**< number of mbufs in all pools - if
                                       * specified on command-line. */
 uint16_t stats_period; /**< Period to show statistics (disabled by default) */
@@ -955,14 +958,14 @@ struct extmem_param {
  */
 static struct rte_mempool *
 mbuf_pool_create(uint16_t mbuf_seg_size, unsigned nb_mbuf,
-		 unsigned int socket_id)
+		 unsigned int socket_id, unsigned int size_idx)
 {
 	char pool_name[RTE_MEMPOOL_NAMESIZE];
 	struct rte_mempool *rte_mp = NULL;
 	uint32_t mb_size;
 
 	mb_size = sizeof(struct rte_mbuf) + mbuf_seg_size;
-	mbuf_poolname_build(socket_id, pool_name, sizeof(pool_name));
+	mbuf_poolname_build(socket_id, pool_name, sizeof(pool_name), size_idx);
 
 	TESTPMD_LOG(INFO,
 		"create a new mbuf pool <%s>: n=%u, size=%u, socket=%u\n",
@@ -1485,8 +1488,8 @@ struct extmem_param {
 				port->dev_info.rx_desc_lim.nb_mtu_seg_max;
 
 			if ((data_size + RTE_PKTMBUF_HEADROOM) >
-							mbuf_data_size) {
-				mbuf_data_size = data_size +
+							mbuf_data_size[0]) {
+				mbuf_data_size[0] = data_size +
 						 RTE_PKTMBUF_HEADROOM;
 				warning = 1;
 			}
@@ -1494,9 +1497,9 @@ struct extmem_param {
 	}
 
 	if (warning)
-		TESTPMD_LOG(WARNING, "Configured mbuf size %hu\n",
-			    mbuf_data_size);
-
+		TESTPMD_LOG(WARNING,
+			    "Configured mbuf size of the first segment %hu\n",
+			    mbuf_data_size[0]);
 	/*
 	 * Create pools of mbuf.
 	 * If NUMA support is disabled, create a single pool of mbuf in
@@ -1516,21 +1519,23 @@ struct extmem_param {
 	}
 
 	if (numa_support) {
-		uint8_t i;
+		uint8_t i, j;
 
 		for (i = 0; i < num_sockets; i++)
-			mempools[i] = mbuf_pool_create(mbuf_data_size,
-						       nb_mbuf_per_pool,
-						       socket_ids[i]);
+			for (j = 0; j < mbuf_data_size_n; j++)
+				mempools[i * MAX_SEGS_BUFFER_SPLIT + j] =
+					mbuf_pool_create(mbuf_data_size[j],
+							  nb_mbuf_per_pool,
+							  socket_ids[i], j);
 	} else {
-		if (socket_num == UMA_NO_CONFIG)
-			mempools[0] = mbuf_pool_create(mbuf_data_size,
-						       nb_mbuf_per_pool, 0);
-		else
-			mempools[socket_num] = mbuf_pool_create
-							(mbuf_data_size,
-							 nb_mbuf_per_pool,
-							 socket_num);
+		uint8_t i;
+
+		for (i = 0; i < mbuf_data_size_n; i++)
+			mempools[i] = mbuf_pool_create
+					(mbuf_data_size[i],
+					 nb_mbuf_per_pool,
+					 socket_num == UMA_NO_CONFIG ?
+					 0 : socket_num, i);
 	}
 
 	init_port_config();
@@ -1542,10 +1547,10 @@ struct extmem_param {
 	 */
 	for (lc_id = 0; lc_id < nb_lcores; lc_id++) {
 		mbp = mbuf_pool_find(
-			rte_lcore_to_socket_id(fwd_lcores_cpuids[lc_id]));
+			rte_lcore_to_socket_id(fwd_lcores_cpuids[lc_id]), 0);
 
 		if (mbp == NULL)
-			mbp = mbuf_pool_find(0);
+			mbp = mbuf_pool_find(0, 0);
 		fwd_lcores[lc_id]->mbp = mbp;
 		/* initialize GSO context */
 		fwd_lcores[lc_id]->gso_ctx.direct_pool = mbp;
@@ -2498,7 +2503,8 @@ struct extmem_param {
 				if ((numa_support) &&
 					(rxring_numa[pi] != NUMA_NO_CONFIG)) {
 					struct rte_mempool * mp =
-						mbuf_pool_find(rxring_numa[pi]);
+						mbuf_pool_find
+							(rxring_numa[pi], 0);
 					if (mp == NULL) {
 						printf("Failed to setup RX queue:"
 							"No mempool allocation"
@@ -2514,7 +2520,8 @@ struct extmem_param {
 					     mp);
 				} else {
 					struct rte_mempool *mp =
-						mbuf_pool_find(port->socket_id);
+						mbuf_pool_find
+							(port->socket_id, 0);
 					if (mp == NULL) {
 						printf("Failed to setup RX queue:"
 							"No mempool allocation"
@@ -2909,13 +2916,13 @@ struct extmem_param {
 pmd_test_exit(void)
 {
 	portid_t pt_id;
+	unsigned int i;
 	int ret;
-	int i;
 
 	if (test_done == 0)
 		stop_packet_forwarding();
 
-	for (i = 0 ; i < RTE_MAX_NUMA_NODES ; i++) {
+	for (i = 0 ; i < RTE_DIM(mempools) ; i++) {
 		if (mempools[i]) {
 			if (mp_alloc_type == MP_ALLOC_ANON)
 				rte_mempool_mem_iter(mempools[i], dma_unmap_cb,
@@ -2959,7 +2966,7 @@ struct extmem_param {
 			return;
 		}
 	}
-	for (i = 0 ; i < RTE_MAX_NUMA_NODES ; i++) {
+	for (i = 0 ; i < RTE_DIM(mempools) ; i++) {
 		if (mempools[i])
 			rte_mempool_free(mempools[i]);
 	}
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index 227b694..e56a89c 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -42,6 +42,13 @@
  */
 #define RTE_MAX_SEGS_PER_PKT 255 /**< nb_segs is a 8-bit unsigned char. */
 
+/*
+ * The maximum number of segments per packet used to configure the
+ * buffer split feature; it also specifies the maximum number of
+ * optional Rx pools to allocate mbufs for the split.
+ */
+#define MAX_SEGS_BUFFER_SPLIT 8 /**< Maximum number of split segments/pools. */
+
 #define MAX_PKT_BURST 512
 #define DEF_PKT_BURST 32
 
@@ -393,7 +400,9 @@ struct queue_stats_mappings {
 extern uint8_t dcb_config;
 extern uint8_t dcb_test;
 
-extern uint16_t mbuf_data_size; /**< Mbuf data space size. */
+extern uint32_t mbuf_data_size_n;
+extern uint16_t mbuf_data_size[MAX_SEGS_BUFFER_SPLIT];
+/**< Mbuf data space size. */
 extern uint32_t param_total_num_mbufs;
 
 extern uint16_t stats_period;
@@ -604,17 +613,22 @@ struct mplsoudp_decap_conf {
 
 /* Mbuf Pools */
 static inline void
-mbuf_poolname_build(unsigned int sock_id, char* mp_name, int name_size)
+mbuf_poolname_build(unsigned int sock_id, char *mp_name,
+		    int name_size, unsigned int idx)
 {
-	snprintf(mp_name, name_size, "mbuf_pool_socket_%u", sock_id);
+	if (!idx)
+		snprintf(mp_name, name_size, "mbuf_pool_socket_%u", sock_id);
+	else
+		snprintf(mp_name, name_size, "mbuf_pool_socket_%u_%u",
+			 sock_id, idx);
 }
 
 static inline struct rte_mempool *
-mbuf_pool_find(unsigned int sock_id)
+mbuf_pool_find(unsigned int sock_id, unsigned int idx)
 {
 	char pool_name[RTE_MEMPOOL_NAMESIZE];
 
-	mbuf_poolname_build(sock_id, pool_name, sizeof(pool_name));
+	mbuf_poolname_build(sock_id, pool_name, sizeof(pool_name), idx);
 	return rte_mempool_lookup((const char *)pool_name);
 }
 
diff --git a/doc/guides/testpmd_app_ug/run_app.rst b/doc/guides/testpmd_app_ug/run_app.rst
index e2539f6..2d5a263 100644
--- a/doc/guides/testpmd_app_ug/run_app.rst
+++ b/doc/guides/testpmd_app_ug/run_app.rst
@@ -107,9 +107,12 @@ The command line options are:
     Set the socket from which all memory is allocated in NUMA mode,
     where 0 <= N < number of sockets on the board.
 
-*   ``--mbuf-size=N``
+*   ``--mbuf-size=N[,N1[,...Nn]]``
 
-    Set the data size of the mbufs used to N bytes, where N < 65536. The default value is 2048.
+    Set the data size of the mbufs used to N bytes, where N < 65536.
+    The default value is 2048. If multiple mbuf-size values are specified, the
+    extra memory pools will be created for allocating mbufs to receive packets
+    with buffer splitting features.
 
 *   ``--total-num-mbufs=N``
 
-- 
1.8.3.1



* [dpdk-dev] [PATCH v3 3/9] app/testpmd: add buffer split offload configuration
  2020-10-12 16:19 ` [dpdk-dev] [PATCH v3 0/9] ethdev: introduce Rx buffer split Viacheslav Ovsiienko
  2020-10-12 16:19   ` [dpdk-dev] [PATCH v3 1/9] " Viacheslav Ovsiienko
  2020-10-12 16:19   ` [dpdk-dev] [PATCH v3 2/9] app/testpmd: add multiple pools per core creation Viacheslav Ovsiienko
@ 2020-10-12 16:19   ` Viacheslav Ovsiienko
  2020-10-12 16:19   ` [dpdk-dev] [PATCH v3 4/9] app/testpmd: add rxpkts commands and parameters Viacheslav Ovsiienko
                     ` (5 subsequent siblings)
  8 siblings, 0 replies; 172+ messages in thread
From: Viacheslav Ovsiienko @ 2020-10-12 16:19 UTC (permalink / raw)
  To: dev
  Cc: thomasm, stephen, ferruh.yigit, olivier.matz, jerinjacobk,
	maxime.coquelin, david.marchand, arybchenko

This patch adds support for RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT,
providing per-queue configuration for this offload.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
 app/test-pmd/cmdline.c | 21 +++++++++++----------
 app/test-pmd/config.c  |  9 +++++++++
 2 files changed, 20 insertions(+), 10 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index a585cf0..fa71039 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -883,16 +883,16 @@ static void cmd_help_long_parsed(void *parsed_result,
 			"port config <port_id> rx_offload vlan_strip|"
 			"ipv4_cksum|udp_cksum|tcp_cksum|tcp_lro|qinq_strip|"
 			"outer_ipv4_cksum|macsec_strip|header_split|"
-			"vlan_filter|vlan_extend|jumbo_frame|"
-			"scatter|timestamp|security|keep_crc on|off\n"
+			"vlan_filter|vlan_extend|jumbo_frame|scatter|"
+			"buffer_split|timestamp|security|keep_crc on|off\n"
 			"     Enable or disable a per port Rx offloading"
 			" on all Rx queues of a port\n\n"
 
 			"port (port_id) rxq (queue_id) rx_offload vlan_strip|"
 			"ipv4_cksum|udp_cksum|tcp_cksum|tcp_lro|qinq_strip|"
 			"outer_ipv4_cksum|macsec_strip|header_split|"
-			"vlan_filter|vlan_extend|jumbo_frame|"
-			"scatter|timestamp|security|keep_crc on|off\n"
+			"vlan_filter|vlan_extend|jumbo_frame|scatter|"
+			"buffer_split|timestamp|security|keep_crc on|off\n"
 			"    Enable or disable a per queue Rx offloading"
 			" only on a specific Rx queue\n\n"
 
@@ -18417,7 +18417,8 @@ struct cmd_config_per_port_rx_offload_result {
 		 offload, "vlan_strip#ipv4_cksum#udp_cksum#tcp_cksum#tcp_lro#"
 			   "qinq_strip#outer_ipv4_cksum#macsec_strip#"
 			   "header_split#vlan_filter#vlan_extend#jumbo_frame#"
-			   "scatter#timestamp#security#keep_crc#rss_hash");
+			   "scatter#buffer_split#timestamp#security#"
+			   "keep_crc#rss_hash");
 cmdline_parse_token_string_t cmd_config_per_port_rx_offload_result_on_off =
 	TOKEN_STRING_INITIALIZER
 		(struct cmd_config_per_port_rx_offload_result,
@@ -18497,8 +18498,8 @@ struct cmd_config_per_port_rx_offload_result {
 	.help_str = "port config <port_id> rx_offload vlan_strip|ipv4_cksum|"
 		    "udp_cksum|tcp_cksum|tcp_lro|qinq_strip|outer_ipv4_cksum|"
 		    "macsec_strip|header_split|vlan_filter|vlan_extend|"
-		    "jumbo_frame|scatter|timestamp|security|keep_crc|rss_hash "
-		    "on|off",
+		    "jumbo_frame|scatter|buffer_split|timestamp|security|"
+		    "keep_crc|rss_hash on|off",
 	.tokens = {
 		(void *)&cmd_config_per_port_rx_offload_result_port,
 		(void *)&cmd_config_per_port_rx_offload_result_config,
@@ -18547,7 +18548,7 @@ struct cmd_config_per_queue_rx_offload_result {
 		 offload, "vlan_strip#ipv4_cksum#udp_cksum#tcp_cksum#tcp_lro#"
 			   "qinq_strip#outer_ipv4_cksum#macsec_strip#"
 			   "header_split#vlan_filter#vlan_extend#jumbo_frame#"
-			   "scatter#timestamp#security#keep_crc");
+			   "scatter#buffer_split#timestamp#security#keep_crc");
 cmdline_parse_token_string_t cmd_config_per_queue_rx_offload_result_on_off =
 	TOKEN_STRING_INITIALIZER
 		(struct cmd_config_per_queue_rx_offload_result,
@@ -18603,8 +18604,8 @@ struct cmd_config_per_queue_rx_offload_result {
 		    "vlan_strip|ipv4_cksum|"
 		    "udp_cksum|tcp_cksum|tcp_lro|qinq_strip|outer_ipv4_cksum|"
 		    "macsec_strip|header_split|vlan_filter|vlan_extend|"
-		    "jumbo_frame|scatter|timestamp|security|keep_crc "
-		    "on|off",
+		    "jumbo_frame|scatter|buffer_split|timestamp|security|"
+		    "keep_crc on|off",
 	.tokens = {
 		(void *)&cmd_config_per_queue_rx_offload_result_port,
 		(void *)&cmd_config_per_queue_rx_offload_result_port_id,
diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index 4405abc..6af8ea9 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -1049,6 +1049,15 @@ static int bus_match_all(const struct rte_bus *bus, const void *data)
 			printf("off\n");
 	}
 
+	if (dev_info.rx_offload_capa & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) {
+		printf("RX offload buffer split:       ");
+		if (ports[port_id].dev_conf.rxmode.offloads &
+		    RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT)
+			printf("on\n");
+		else
+			printf("off\n");
+	}
+
 	if (dev_info.tx_offload_capa & DEV_TX_OFFLOAD_VLAN_INSERT) {
 		printf("VLAN insert:                   ");
 		if (ports[port_id].dev_conf.txmode.offloads &
-- 
1.8.3.1



* [dpdk-dev] [PATCH v3 4/9] app/testpmd: add rxpkts commands and parameters
  2020-10-12 16:19 ` [dpdk-dev] [PATCH v3 0/9] ethdev: introduce Rx buffer split Viacheslav Ovsiienko
                     ` (2 preceding siblings ...)
  2020-10-12 16:19   ` [dpdk-dev] [PATCH v3 3/9] app/testpmd: add buffer split offload configuration Viacheslav Ovsiienko
@ 2020-10-12 16:19   ` Viacheslav Ovsiienko
  2020-10-12 16:19   ` [dpdk-dev] [PATCH v3 5/9] app/testpmd: add extended Rx queue setup Viacheslav Ovsiienko
                     ` (4 subsequent siblings)
  8 siblings, 0 replies; 172+ messages in thread
From: Viacheslav Ovsiienko @ 2020-10-12 16:19 UTC (permalink / raw)
  To: dev
  Cc: thomasm, stephen, ferruh.yigit, olivier.matz, jerinjacobk,
	maxime.coquelin, david.marchand, arybchenko

Add command line parameter:

--rxpkts=X[,Y]*

Sets the lengths of the segments to scatter packets into on receive if a
split feature is engaged. Affects only the queues configured with split
offloads (currently only BUFFER_SPLIT is supported).

Add interactive mode command:

testpmd> set rxpkts (x[,y]*)

Where x[,y]* represents a CSV list of values, without white space.

Sets the lengths of the segments to scatter packets into on receive if a
split feature is engaged. Affects only the queues configured with split
offloads (currently only BUFFER_SPLIT is supported). Optionally,
multiple memory pools can be specified with the --mbuf-size command line
parameter, and the mbufs to receive into will be allocated sequentially
from these extra memory pools.
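
The sequential pool selection, as spelled out in the testpmd_funcs.rst
update of this patch (segments beyond the number of pools fall back to the
last valid pool), might be sketched as follows. The helper name
pool_index_for_segment() is an assumption made for illustration and does not
appear in the patch.

```c
#include <assert.h>

/* Map a segment position to the index of the pool that supplies its mbuf.
 * Segments beyond the number of pools reuse the last valid pool. */
static unsigned int
pool_index_for_segment(unsigned int seg_idx, unsigned int nb_pools)
{
	assert(nb_pools > 0);
	return seg_idx < nb_pools ? seg_idx : nb_pools - 1;
}
```

With two pools, segments 0 and 1 would map to pools 0 and 1, and any later
segment would map to pool 1.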

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
 app/test-pmd/cmdline.c                      | 61 +++++++++++++++++++++++++++--
 app/test-pmd/config.c                       | 48 ++++++++++++++++++++++-
 app/test-pmd/parameters.c                   | 15 +++++++
 app/test-pmd/testpmd.c                      |  7 ++++
 app/test-pmd/testpmd.h                      | 11 +++++-
 doc/guides/testpmd_app_ug/run_app.rst       |  9 +++++
 doc/guides/testpmd_app_ug/testpmd_funcs.rst | 21 +++++++++-
 7 files changed, 165 insertions(+), 7 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index fa71039..d8dba54 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -183,7 +183,7 @@ static void cmd_help_long_parsed(void *parsed_result,
 			"show (rxq|txq) info (port_id) (queue_id)\n"
 			"    Display information for configured RX/TX queue.\n\n"
 
-			"show config (rxtx|cores|fwd|txpkts)\n"
+			"show config (rxtx|cores|fwd|rxpkts|txpkts)\n"
 			"    Display the given configuration.\n\n"
 
 			"read rxd (port_id) (queue_id) (rxd_id)\n"
@@ -294,6 +294,12 @@ static void cmd_help_long_parsed(void *parsed_result,
 			"    Set the transmit delay time and number of retries,"
 			" effective when retry is enabled.\n\n"
 
+			"set rxpkts (x[,y]*)\n"
+			"    Set the length of each segment to scatter"
+			" packets on receiving if split feature is engaged."
+			" Affects only the queues configured with split"
+			" offloads.\n\n"
+
 			"set txpkts (x[,y]*)\n"
 			"    Set the length of each segment of TXONLY"
 			" and optionally CSUM packets.\n\n"
@@ -3889,6 +3895,52 @@ struct cmd_set_log_result {
 	},
 };
 
+/* *** SET SEGMENT LENGTHS OF RX PACKETS SPLIT *** */
+
+struct cmd_set_rxpkts_result {
+	cmdline_fixed_string_t cmd_keyword;
+	cmdline_fixed_string_t rxpkts;
+	cmdline_fixed_string_t seg_lengths;
+};
+
+static void
+cmd_set_rxpkts_parsed(void *parsed_result,
+		      __rte_unused struct cmdline *cl,
+		      __rte_unused void *data)
+{
+	struct cmd_set_rxpkts_result *res;
+	unsigned int seg_lengths[MAX_SEGS_BUFFER_SPLIT];
+	unsigned int nb_segs;
+
+	res = parsed_result;
+	nb_segs = parse_item_list(res->seg_lengths, "segment lengths",
+				  MAX_SEGS_BUFFER_SPLIT, seg_lengths, 0);
+	if (nb_segs > 0)
+		set_rx_pkt_segments(seg_lengths, nb_segs);
+}
+
+cmdline_parse_token_string_t cmd_set_rxpkts_keyword =
+	TOKEN_STRING_INITIALIZER(struct cmd_set_rxpkts_result,
+				 cmd_keyword, "set");
+cmdline_parse_token_string_t cmd_set_rxpkts_name =
+	TOKEN_STRING_INITIALIZER(struct cmd_set_rxpkts_result,
+				 rxpkts, "rxpkts");
+cmdline_parse_token_string_t cmd_set_rxpkts_lengths =
+	TOKEN_STRING_INITIALIZER(struct cmd_set_rxpkts_result,
+				 seg_lengths, NULL);
+
+cmdline_parse_inst_t cmd_set_rxpkts = {
+	.f = cmd_set_rxpkts_parsed,
+	.data = NULL,
+	.help_str = "set rxpkts <len0[,len1]*>",
+	.tokens = {
+		(void *)&cmd_set_rxpkts_keyword,
+		(void *)&cmd_set_rxpkts_name,
+		(void *)&cmd_set_rxpkts_lengths,
+		NULL,
+	},
+};
+
 /* *** SET SEGMENT LENGTHS OF TXONLY PACKETS *** */
 
 struct cmd_set_txpkts_result {
@@ -7517,6 +7569,8 @@ static void cmd_showcfg_parsed(void *parsed_result,
 		fwd_lcores_config_display();
 	else if (!strcmp(res->what, "fwd"))
 		pkt_fwd_config_display(&cur_fwd_config);
+	else if (!strcmp(res->what, "rxpkts"))
+		show_rx_pkt_segments();
 	else if (!strcmp(res->what, "txpkts"))
 		show_tx_pkt_segments();
 	else if (!strcmp(res->what, "txtimes"))
@@ -7529,12 +7583,12 @@ static void cmd_showcfg_parsed(void *parsed_result,
 	TOKEN_STRING_INITIALIZER(struct cmd_showcfg_result, cfg, "config");
 cmdline_parse_token_string_t cmd_showcfg_what =
 	TOKEN_STRING_INITIALIZER(struct cmd_showcfg_result, what,
-				 "rxtx#cores#fwd#txpkts#txtimes");
+				 "rxtx#cores#fwd#rxpkts#txpkts#txtimes");
 
 cmdline_parse_inst_t cmd_showcfg = {
 	.f = cmd_showcfg_parsed,
 	.data = NULL,
-	.help_str = "show config rxtx|cores|fwd|txpkts|txtimes",
+	.help_str = "show config rxtx|cores|fwd|rxpkts|txpkts|txtimes",
 	.tokens = {
 		(void *)&cmd_showcfg_show,
 		(void *)&cmd_showcfg_port,
@@ -19807,6 +19861,7 @@ struct cmd_showport_macs_result {
 	(cmdline_parse_inst_t *)&cmd_reset,
 	(cmdline_parse_inst_t *)&cmd_set_numbers,
 	(cmdline_parse_inst_t *)&cmd_set_log,
+	(cmdline_parse_inst_t *)&cmd_set_rxpkts,
 	(cmdline_parse_inst_t *)&cmd_set_txpkts,
 	(cmdline_parse_inst_t *)&cmd_set_txsplit,
 	(cmdline_parse_inst_t *)&cmd_set_txtimes,
diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index 6af8ea9..6a130ab 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -3257,6 +3257,50 @@ struct igb_ring_desc_16_bytes {
 }
 
 void
+show_rx_pkt_segments(void)
+{
+	uint32_t i, n;
+
+	n = rx_pkt_nb_segs;
+	printf("Number of segments: %u\n", n);
+	if (n) {
+		printf("Segment sizes: ");
+		for (i = 0; i != n - 1; i++)
+			printf("%hu,", rx_pkt_seg_lengths[i]);
+		printf("%hu\n", rx_pkt_seg_lengths[i]);
+	}
+}
+
+void
+set_rx_pkt_segments(unsigned int *seg_lengths, unsigned int nb_segs)
+{
+	unsigned int i;
+
+	if (nb_segs >= MAX_SEGS_BUFFER_SPLIT) {
+		printf("nb segments per RX packets=%u >= "
+		       "MAX_SEGS_BUFFER_SPLIT - ignored\n", nb_segs);
+		return;
+	}
+
+	/*
+	 * No extra check here, the segment length will be checked by PMD
+	 * in the extended queue setup.
+	 */
+	for (i = 0; i < nb_segs; i++) {
+		if (seg_lengths[i] > UINT16_MAX) {
+			printf("length[%u]=%u > UINT16_MAX - give up\n",
+			       i, seg_lengths[i]);
+			return;
+		}
+	}
+
+	for (i = 0; i < nb_segs; i++)
+		rx_pkt_seg_lengths[i] = (uint16_t) seg_lengths[i];
+
+	rx_pkt_nb_segs = (uint8_t) nb_segs;
+}
+
+void
 show_tx_pkt_segments(void)
 {
 	uint32_t i, n;
@@ -3301,10 +3345,10 @@ struct igb_ring_desc_16_bytes {
 }
 
 void
-set_tx_pkt_segments(unsigned *seg_lengths, unsigned nb_segs)
+set_tx_pkt_segments(unsigned int *seg_lengths, unsigned int nb_segs)
 {
 	uint16_t tx_pkt_len;
-	unsigned i;
+	unsigned int i;
 
 	if (nb_segs_is_invalid(nb_segs))
 		return;
diff --git a/app/test-pmd/parameters.c b/app/test-pmd/parameters.c
index 1f40d73..99f0223 100644
--- a/app/test-pmd/parameters.c
+++ b/app/test-pmd/parameters.c
@@ -184,6 +184,7 @@
 	       "(0 <= mapping <= %d).\n", RTE_ETHDEV_QUEUE_STAT_CNTRS - 1);
 	printf("  --no-flush-rx: Don't flush RX streams before forwarding."
 	       " Used mainly with PCAP drivers.\n");
+	printf("  --rxpkts=X[,Y]*: set RX segment sizes to split.\n");
 	printf("  --txpkts=X[,Y]*: set TX segment sizes"
 		" or total packet length.\n");
 	printf("  --txonly-multi-flow: generate multiple flows in txonly mode\n");
@@ -661,6 +662,7 @@
 		{ "rx-queue-stats-mapping",	1, 0, 0 },
 		{ "no-flush-rx",	0, 0, 0 },
 		{ "flow-isolate-all",	        0, 0, 0 },
+		{ "rxpkts",			1, 0, 0 },
 		{ "txpkts",			1, 0, 0 },
 		{ "txonly-multi-flow",		0, 0, 0 },
 		{ "disable-link-check",		0, 0, 0 },
@@ -1270,6 +1272,19 @@
 						 "invalid RX queue statistics mapping config entered\n");
 				}
 			}
+			if (!strcmp(lgopts[opt_idx].name, "rxpkts")) {
+				unsigned int seg_len[MAX_SEGS_BUFFER_SPLIT];
+				unsigned int nb_segs;
+
+				nb_segs = parse_item_list
+						(optarg, "rxpkt segments",
+						 MAX_SEGS_BUFFER_SPLIT,
+						 seg_len, 0);
+				if (nb_segs > 0)
+					set_rx_pkt_segments(seg_len, nb_segs);
+				else
+					rte_exit(EXIT_FAILURE, "bad rxpkts\n");
+			}
 			if (!strcmp(lgopts[opt_idx].name, "txpkts")) {
 				unsigned seg_lengths[RTE_MAX_SEGS_PER_PKT];
 				unsigned int nb_segs;
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index ec66060..8e1c502 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -210,6 +210,13 @@ struct fwd_engine * fwd_engines[] = {
 uint8_t f_quit;
 
 /*
+ * Configuration of packet segments used to scatter received packets
+ * if any of the split features is configured.
+ */
+uint16_t rx_pkt_seg_lengths[MAX_SEGS_BUFFER_SPLIT];
+uint8_t  rx_pkt_nb_segs; /**< Number of segments to split */
+
+/*
  * Configuration of packet segments used by the "txonly" processing engine.
  */
 uint16_t tx_pkt_length = TXONLY_DEF_PACKET_LEN; /**< TXONLY packet length. */
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index e56a89c..5f8928e 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -420,6 +420,13 @@ struct queue_stats_mappings {
 extern struct rte_fdir_conf fdir_conf;
 
 /*
+ * Configuration of packet segments used to scatter received packets
+ * if any of the split features is configured.
+ */
+extern uint16_t rx_pkt_seg_lengths[MAX_SEGS_BUFFER_SPLIT];
+extern uint8_t  rx_pkt_nb_segs; /**< Number of segments to split */
+
+/*
  * Configuration of packet segments used by the "txonly" processing engine.
  */
 #define TXONLY_DEF_PACKET_LEN 64
@@ -815,7 +822,9 @@ void vlan_tpid_set(portid_t port_id, enum rte_vlan_type vlan_type,
 void set_record_core_cycles(uint8_t on_off);
 void set_record_burst_stats(uint8_t on_off);
 void set_verbose_level(uint16_t vb_level);
-void set_tx_pkt_segments(unsigned *seg_lengths, unsigned nb_segs);
+void set_rx_pkt_segments(unsigned int *seg_lengths, unsigned int nb_segs);
+void show_rx_pkt_segments(void);
+void set_tx_pkt_segments(unsigned int *seg_lengths, unsigned int nb_segs);
 void show_tx_pkt_segments(void);
 void set_tx_pkt_times(unsigned int *tx_times);
 void show_tx_pkt_times(void);
diff --git a/doc/guides/testpmd_app_ug/run_app.rst b/doc/guides/testpmd_app_ug/run_app.rst
index 2d5a263..9286281 100644
--- a/doc/guides/testpmd_app_ug/run_app.rst
+++ b/doc/guides/testpmd_app_ug/run_app.rst
@@ -361,6 +361,15 @@ The command line options are:
 
     Don't flush the RX streams before starting forwarding. Used mainly with the PCAP PMD.
 
+*   ``--rxpkts=X[,Y]``
+
+    Set the lengths of the segments to scatter packets into on receive if a
+    split feature is engaged. Affects only the queues configured
+    with split offloads (currently only BUFFER_SPLIT is supported).
+    Optionally, multiple memory pools can be specified with the ``--mbuf-size``
+    command line parameter, and the mbufs to receive into will be allocated
+    sequentially from these extra memory pools.
+
 *   ``--txpkts=X[,Y]``
 
     Set TX segment sizes or total packet length. Valid for ``tx-only``
diff --git a/doc/guides/testpmd_app_ug/testpmd_funcs.rst b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
index 12268bc..529470a 100644
--- a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
+++ b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
@@ -273,7 +273,7 @@ show config
 Displays the configuration of the application.
 The configuration comes from the command-line, the runtime or the application defaults::
 
-   testpmd> show config (rxtx|cores|fwd|txpkts|txtimes)
+   testpmd> show config (rxtx|cores|fwd|rxpkts|txpkts|txtimes)
 
 The available information categories are:
 
@@ -283,6 +283,8 @@ The available information categories are:
 
 * ``fwd``: Packet forwarding configuration.
 
+* ``rxpkts``: Packets to RX split configuration.
+
 * ``txpkts``: Packets to TX configuration.
 
 * ``txtimes``: Burst time pattern for Tx only mode.
@@ -774,6 +776,23 @@ When retry is enabled, the transmit delay time and number of retries can also be
 
    testpmd> set burst tx delay (microseconds) retry (num)
 
+set rxpkts
+~~~~~~~~~~
+
+Set the lengths of the segments to scatter received packets into if a split
+feature is engaged. Affects only the queues configured with split offloads
+(currently only BUFFER_SPLIT is supported). Optionally, multiple memory
+pools can be specified with the ``--mbuf-size`` command line parameter; the
+mbufs for receiving are then allocated sequentially from these extra pools (the
+mbuf for the first segment is allocated from the first pool, the second one
+from the second pool, and so on; if there are more segments than pools, the
+mbufs for the remaining segments are allocated from the last valid pool).
+
+   testpmd> set rxpkts (x[,y]*)
+
+Where x[,y]* represents a CSV list of values, without white space. A zero value
+means that the data buffer size of the corresponding memory pool is used.
+
 set txpkts
 ~~~~~~~~~~
 
-- 
1.8.3.1
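
The ``set rxpkts (x[,y]*)`` syntax documented above takes a comma-separated list of segment lengths with no white space. As a rough illustration only, the sketch below parses such a list into 16-bit lengths. It is not testpmd's own parser; the names parse_seg_lengths and SKETCH_MAX_SEGS are hypothetical, and SKETCH_MAX_SEGS merely stands in for MAX_SEGS_BUFFER_SPLIT.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical limit, standing in for MAX_SEGS_BUFFER_SPLIT. */
#define SKETCH_MAX_SEGS 8u

/*
 * Parse "x[,y]*" into lens[]. Returns the number of values parsed,
 * or -1 on malformed input, too many values, or a value above 65535.
 */
static int
parse_seg_lengths(const char *list, uint16_t *lens, unsigned int max_segs)
{
	unsigned int n = 0;

	assert(list != NULL && lens != NULL);
	for (;;) {
		char *end;
		unsigned long v;

		if (n == max_segs)
			return -1;
		/*
		 * strtoul() skips blanks and accepts a sign, so a leading
		 * digit is required explicitly to reject white space.
		 */
		if (*list < '0' || *list > '9')
			return -1;
		errno = 0;
		v = strtoul(list, &end, 10);
		if (errno != 0 || v > UINT16_MAX)
			return -1;
		lens[n++] = (uint16_t)v;
		if (*end == '\0')
			return (int)n;
		if (*end != ',')
			return -1;
		list = end + 1;
	}
}
```

With this sketch, "64,128,0" yields three lengths, the last of which (zero) would later be replaced by the data buffer size of the corresponding pool, while "64, 128" is rejected because of the blank.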



* [dpdk-dev] [PATCH v3 5/9] app/testpmd: add extended Rx queue setup
  2020-10-12 16:19 ` [dpdk-dev] [PATCH v3 0/9] ethdev: introduce Rx buffer split Viacheslav Ovsiienko
                     ` (3 preceding siblings ...)
  2020-10-12 16:19   ` [dpdk-dev] [PATCH v3 4/9] app/testpmd: add rxpkts commands and parameters Viacheslav Ovsiienko
@ 2020-10-12 16:19   ` Viacheslav Ovsiienko
  2020-10-12 16:19   ` [dpdk-dev] [PATCH v3 6/9] net/mlx5: add extended Rx queue setup routine Viacheslav Ovsiienko
                     ` (3 subsequent siblings)
  8 siblings, 0 replies; 172+ messages in thread
From: Viacheslav Ovsiienko @ 2020-10-12 16:19 UTC (permalink / raw)
  To: dev
  Cc: thomasm, stephen, ferruh.yigit, olivier.matz, jerinjacobk,
	maxime.coquelin, david.marchand, arybchenko

If the Rx queue is configured with the split feature, the extended
setup with the specified segment sizes and pools is performed.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
 app/test-pmd/cmdline.c | 12 ++++++------
 app/test-pmd/testpmd.c | 38 ++++++++++++++++++++++++++++++++++++--
 app/test-pmd/testpmd.h |  6 ++++++
 3 files changed, 48 insertions(+), 8 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index d8dba54..cf99f66 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -2921,12 +2921,12 @@ struct cmd_setup_rxtx_queue {
 				rxring_numa[res->portid]);
 			return;
 		}
-		ret = rte_eth_rx_queue_setup(res->portid,
-					     res->qid,
-					     port->nb_rx_desc[res->qid],
-					     socket_id,
-					     &port->rx_conf[res->qid],
-					     mp);
+		ret = rx_queue_setup(res->portid,
+				     res->qid,
+				     port->nb_rx_desc[res->qid],
+				     socket_id,
+				     &port->rx_conf[res->qid],
+				     mp);
 		if (ret)
 			printf("Failed to setup RX queue\n");
 	} else {
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 8e1c502..960fb67 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -2412,6 +2412,40 @@ struct extmem_param {
 	return 0;
 }
 
+/* Configure the Rx queue with optional split. */
+int
+rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
+	       uint16_t nb_rx_desc, unsigned int socket_id,
+	       const struct rte_eth_rxconf *rx_conf,
+	       struct rte_mempool *mp)
+{
+	struct rte_eth_rxseg rx_seg[MAX_SEGS_BUFFER_SPLIT] = {};
+	unsigned int i, mp_n;
+
+	if (rx_pkt_nb_segs <= 1 ||
+	    (rx_conf->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) == 0)
+		return rte_eth_rx_queue_setup(port_id, rx_queue_id,
+					      nb_rx_desc, socket_id,
+					      rx_conf, mp);
+	for (i = 0; i < rx_pkt_nb_segs; i++) {
+		struct rte_mempool *mpx;
+		/*
+		 * Use the last valid pool for the segments whose index
+		 * exceeds the number of configured pools.
+		 */
+		mp_n = (i >= mbuf_data_size_n) ? mbuf_data_size_n - 1 : i;
+		mpx = mbuf_pool_find(socket_id, mp_n);
+		/* Handle zero as mbuf data buffer size. */
+		rx_seg[i].length = rx_pkt_seg_lengths[i] ?
+				   rx_pkt_seg_lengths[i] :
+				   mbuf_data_size[mp_n];
+		rx_seg[i].mp = mpx ? mpx : mp;
+	}
+	return rte_eth_rxseg_queue_setup(port_id, rx_queue_id,
+					 nb_rx_desc, socket_id, rx_conf,
+					 rx_seg, rx_pkt_nb_segs);
+}
+
 int
 start_port(portid_t pid)
 {
@@ -2520,7 +2554,7 @@ struct extmem_param {
 						return -1;
 					}
 
-					diag = rte_eth_rx_queue_setup(pi, qi,
+					diag = rx_queue_setup(pi, qi,
 					     port->nb_rx_desc[qi],
 					     rxring_numa[pi],
 					     &(port->rx_conf[qi]),
@@ -2536,7 +2570,7 @@ struct extmem_param {
 							port->socket_id);
 						return -1;
 					}
-					diag = rte_eth_rx_queue_setup(pi, qi,
+					diag = rx_queue_setup(pi, qi,
 					     port->nb_rx_desc[qi],
 					     port->socket_id,
 					     &(port->rx_conf[qi]),
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index 5f8928e..b7611be 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -871,6 +871,12 @@ void port_rss_reta_info(portid_t port_id,
 
 void set_vf_traffic(portid_t port_id, uint8_t is_rx, uint16_t vf, uint8_t on);
 
+int
+rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
+	       uint16_t nb_rx_desc, unsigned int socket_id,
+	       const struct rte_eth_rxconf *rx_conf,
+	       struct rte_mempool *mp);
+
 int set_queue_rate_limit(portid_t port_id, uint16_t queue_idx, uint16_t rate);
 int set_vf_rate_limit(portid_t port_id, uint16_t vf, uint16_t rate,
 				uint64_t q_msk);
-- 
1.8.3.1
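
The per-segment setup in rx_queue_setup() boils down to two rules: segment i takes pool i, clamped to the last configured pool, and a zero length falls back to that pool's data buffer size. The standalone sketch below restates those rules over plain arrays, without any DPDK types. The names seg_cfg and select_pools are hypothetical, and the pool is represented by its index only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result record: which pool a segment uses and its length. */
struct seg_cfg {
	unsigned int pool_idx;
	uint16_t length;
};

/*
 * seg_lengths[] holds the requested lengths (0 = use pool buffer size),
 * pool_data_size[] holds the data buffer size of each configured pool.
 */
static void
select_pools(const uint16_t *seg_lengths, unsigned int nb_segs,
	     const uint16_t *pool_data_size, unsigned int nb_pools,
	     struct seg_cfg *out)
{
	unsigned int i;

	assert(seg_lengths != NULL && pool_data_size != NULL && out != NULL);
	assert(nb_segs > 0 && nb_pools > 0);
	for (i = 0; i < nb_segs; i++) {
		/* Clamp to the last valid pool index. */
		unsigned int p = (i >= nb_pools) ? nb_pools - 1 : i;

		out[i].pool_idx = p;
		/* Zero means "use the data buffer size of the pool". */
		out[i].length = seg_lengths[i] ? seg_lengths[i] :
						 pool_data_size[p];
	}
}
```

For three segments with lengths {64, 0, 0} and two pools of 2048 and 4096 bytes, the second and third segments both end up on pool index 1 with length 4096.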



* [dpdk-dev] [PATCH v3 6/9] net/mlx5: add extended Rx queue setup routine
  2020-10-12 16:19 ` [dpdk-dev] [PATCH v3 0/9] ethdev: introduce Rx buffer split Viacheslav Ovsiienko
                     ` (4 preceding siblings ...)
  2020-10-12 16:19   ` [dpdk-dev] [PATCH v3 5/9] app/testpmd: add extended Rx queue setup Viacheslav Ovsiienko
@ 2020-10-12 16:19   ` Viacheslav Ovsiienko
  2020-10-12 16:19   ` [dpdk-dev] [PATCH v3 7/9] net/mlx5: configure Rx queue to support split Viacheslav Ovsiienko
                     ` (2 subsequent siblings)
  8 siblings, 0 replies; 172+ messages in thread
From: Viacheslav Ovsiienko @ 2020-10-12 16:19 UTC (permalink / raw)
  To: dev
  Cc: thomasm, stephen, ferruh.yigit, olivier.matz, jerinjacobk,
	maxime.coquelin, david.marchand, arybchenko

Add a routine that sets up an Rx queue from an extended
receive buffer description.
It allows the application to specify the desired segment
lengths, the data offsets within the buffers,
and a dedicated memory pool for each segment.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
 drivers/net/mlx5/linux/mlx5_os.c |  2 +
 drivers/net/mlx5/mlx5.h          |  3 ++
 drivers/net/mlx5/mlx5_rxq.c      | 91 +++++++++++++++++++++++++++++++++++-----
 drivers/net/mlx5/mlx5_rxtx.h     | 10 ++++-
 4 files changed, 95 insertions(+), 11 deletions(-)

diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index 487714f..0e85489 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -2495,6 +2495,7 @@
 	.dev_supported_ptypes_get = mlx5_dev_supported_ptypes_get,
 	.vlan_filter_set = mlx5_vlan_filter_set,
 	.rx_queue_setup = mlx5_rx_queue_setup,
+	.rxseg_queue_setup = mlx5_rxseg_queue_setup,
 	.rx_hairpin_queue_setup = mlx5_rx_hairpin_queue_setup,
 	.tx_queue_setup = mlx5_tx_queue_setup,
 	.tx_hairpin_queue_setup = mlx5_tx_hairpin_queue_setup,
@@ -2578,6 +2579,7 @@
 	.dev_supported_ptypes_get = mlx5_dev_supported_ptypes_get,
 	.vlan_filter_set = mlx5_vlan_filter_set,
 	.rx_queue_setup = mlx5_rx_queue_setup,
+	.rxseg_queue_setup = mlx5_rxseg_queue_setup,
 	.rx_hairpin_queue_setup = mlx5_rx_hairpin_queue_setup,
 	.tx_queue_setup = mlx5_tx_queue_setup,
 	.tx_hairpin_queue_setup = mlx5_tx_hairpin_queue_setup,
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 87d3c15..bfc0812 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -162,6 +162,9 @@ struct mlx5_stats_ctrl {
 /* Maximal size of aggregated LRO packet. */
 #define MLX5_MAX_LRO_SIZE (UINT8_MAX * MLX5_LRO_SEG_CHUNK_SIZE)
 
+/* Maximal number of segments to split. */
+#define MLX5_MAX_RXQ_NSEG (1u << MLX5_MAX_LOG_RQ_SEGS)
+
 /* LRO configurations structure. */
 struct mlx5_lro_config {
 	uint32_t supported:1; /* Whether LRO is supported. */
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index f1d8373..42818d8 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -390,6 +390,7 @@
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_dev_config *config = &priv->config;
 	uint64_t offloads = (DEV_RX_OFFLOAD_SCATTER |
+			     RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT |
 			     DEV_RX_OFFLOAD_TIMESTAMP |
 			     DEV_RX_OFFLOAD_JUMBO_FRAME |
 			     DEV_RX_OFFLOAD_RSS_HASH);
@@ -715,16 +716,20 @@
  *   NUMA socket on which memory must be allocated.
  * @param[in] conf
  *   Thresholds parameters.
- * @param mp
- *   Memory pool for buffer allocations.
+ * @param rx_seg
+ *   Pointer to the array of segment descriptions, each element
+ *   describes the memory pool, maximal data length, initial
+ *   data offset from the beginning of data buffer in mbuf
+ * @param n_seg
+ *   Number of elements in the segment descriptions array
  *
  * @return
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 int
-mlx5_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
-		    unsigned int socket, const struct rte_eth_rxconf *conf,
-		    struct rte_mempool *mp)
+mlx5_rxseg_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
+		       unsigned int socket, const struct rte_eth_rxconf *conf,
+		       const struct rte_eth_rxseg *rx_seg, uint16_t n_seg)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_rxq_data *rxq = (*priv->rxqs)[idx];
@@ -732,10 +737,43 @@
 		container_of(rxq, struct mlx5_rxq_ctrl, rxq);
 	int res;
 
+	if (!n_seg || !rx_seg) {
+		DRV_LOG(ERR, "port %u queue index %u invalid "
+			      "split description",
+			      dev->data->port_id, idx);
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+	if (n_seg > 1) {
+		uint64_t offloads = conf->offloads |
+				    dev->data->dev_conf.rxmode.offloads;
+
+		if (!(offloads & DEV_RX_OFFLOAD_SCATTER)) {
+			DRV_LOG(ERR, "port %u queue index %u split "
+				     "configuration requires scattering",
+				     dev->data->port_id, idx);
+			rte_errno = ENOSPC;
+			return -rte_errno;
+		}
+		if (!(offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT)) {
+			DRV_LOG(ERR, "port %u queue index %u split "
+				     "offload not configured",
+				     dev->data->port_id, idx);
+			rte_errno = ENOSPC;
+			return -rte_errno;
+		}
+		if (n_seg > MLX5_MAX_RXQ_NSEG) {
+			DRV_LOG(ERR, "port %u queue index %u too many "
+				     "segments %u to split",
+				     dev->data->port_id, idx, n_seg);
+			rte_errno = EOVERFLOW;
+			return -rte_errno;
+		}
+	}
 	res = mlx5_rx_queue_pre_setup(dev, idx, &desc);
 	if (res)
 		return res;
-	rxq_ctrl = mlx5_rxq_new(dev, idx, desc, socket, conf, mp);
+	rxq_ctrl = mlx5_rxq_new(dev, idx, desc, socket, conf, rx_seg, n_seg);
 	if (!rxq_ctrl) {
 		DRV_LOG(ERR, "port %u unable to allocate queue index %u",
 			dev->data->port_id, idx);
@@ -756,6 +794,39 @@
  *   RX queue index.
  * @param desc
  *   Number of descriptors to configure in queue.
+ * @param socket
+ *   NUMA socket on which memory must be allocated.
+ * @param[in] conf
+ *   Thresholds parameters.
+ * @param mp
+ *   Memory pool for buffer allocations.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
+		    unsigned int socket, const struct rte_eth_rxconf *conf,
+		    struct rte_mempool *mp)
+{
+	struct rte_eth_rxseg rx_seg = {
+		.mp = mp,
+		/*
+		 * All other fields are zeroed, zero segment length
+		 * means the pool buffer size should be used by PMD.
+		 */
+	};
+	return mlx5_rxseg_queue_setup(dev, idx, desc, socket, conf, &rx_seg, 1);
+}
+
+/**
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ * @param idx
+ *   RX queue index.
+ * @param desc
+ *   Number of descriptors to configure in queue.
  * @param hairpin_conf
  *   Hairpin configuration parameters.
  *
@@ -1328,11 +1399,11 @@
 struct mlx5_rxq_ctrl *
 mlx5_rxq_new(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 	     unsigned int socket, const struct rte_eth_rxconf *conf,
-	     struct rte_mempool *mp)
+	     const struct rte_eth_rxseg *rx_seg, uint16_t n_seg)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_rxq_ctrl *tmpl;
-	unsigned int mb_len = rte_pktmbuf_data_room_size(mp);
+	unsigned int mb_len = rte_pktmbuf_data_room_size(rx_seg[0].mp);
 	unsigned int mprq_stride_nums;
 	unsigned int mprq_stride_size;
 	unsigned int mprq_stride_cap;
@@ -1346,7 +1417,7 @@ struct mlx5_rxq_ctrl *
 	uint64_t offloads = conf->offloads |
 			   dev->data->dev_conf.rxmode.offloads;
 	unsigned int lro_on_queue = !!(offloads & DEV_RX_OFFLOAD_TCP_LRO);
-	const int mprq_en = mlx5_check_mprq_support(dev) > 0;
+	const int mprq_en = mlx5_check_mprq_support(dev) > 0 && n_seg == 1;
 	unsigned int max_rx_pkt_len = lro_on_queue ?
 			dev->data->dev_conf.rxmode.max_lro_pkt_size :
 			dev->data->dev_conf.rxmode.max_rx_pkt_len;
@@ -1531,7 +1602,7 @@ struct mlx5_rxq_ctrl *
 		(!!(dev->data->dev_conf.rxmode.mq_mode & ETH_MQ_RX_RSS));
 	tmpl->rxq.port_id = dev->data->port_id;
 	tmpl->priv = priv;
-	tmpl->rxq.mp = mp;
+	tmpl->rxq.mp = rx_seg[0].mp;
 	tmpl->rxq.elts_n = log2above(desc);
 	tmpl->rxq.rq_repl_thresh =
 		MLX5_VPMD_RXQ_RPLNSH_THRESH(1 << tmpl->rxq.elts_n);
diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index 674296e..f103a30 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -150,6 +150,9 @@ struct mlx5_rxq_data {
 	rte_spinlock_t *uar_lock_cq;
 	/* CQ (UAR) access lock required for 32bit implementations */
 #endif
+	struct rte_eth_rxseg rxseg[MLX5_MAX_RXQ_NSEG];
+	/* Buffer split segment descriptions - sizes, offsets, pools. */
+	uint32_t rxseg_n; /* Number of split segment descriptions. */
 	uint32_t tunnel; /* Tunnel information. */
 	uint64_t flow_meta_mask;
 	int32_t flow_meta_offset;
@@ -304,6 +307,10 @@ struct mlx5_txq_ctrl {
 int mlx5_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 			unsigned int socket, const struct rte_eth_rxconf *conf,
 			struct rte_mempool *mp);
+int mlx5_rxseg_queue_setup
+	(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
+	 unsigned int socket, const struct rte_eth_rxconf *conf,
+	 const struct rte_eth_rxseg *rx_seg, uint16_t n_seg);
 int mlx5_rx_hairpin_queue_setup
 	(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 	 const struct rte_eth_hairpin_conf *hairpin_conf);
@@ -316,7 +323,8 @@ int mlx5_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 struct mlx5_rxq_ctrl *mlx5_rxq_new(struct rte_eth_dev *dev, uint16_t idx,
 				   uint16_t desc, unsigned int socket,
 				   const struct rte_eth_rxconf *conf,
-				   struct rte_mempool *mp);
+				   const struct rte_eth_rxseg *rx_seg,
+				   uint16_t n_seg);
 struct mlx5_rxq_ctrl *mlx5_rxq_hairpin_new
 	(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 	 const struct rte_eth_hairpin_conf *hairpin_conf);
-- 
1.8.3.1
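
The argument checks at the top of mlx5_rxseg_queue_setup() follow a fixed order: reject an empty description, then, for more than one segment, require both the scatter and the buffer split offloads and enforce the segment limit. The sketch below mirrors that order and the errno values used in the patch. Everything prefixed with SKETCH_ and the function name check_split_request are hypothetical; the flag bit positions are placeholders and not the real DPDK offload values.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder bits, NOT the real DPDK offload flag values. */
#define SKETCH_OFFLOAD_SCATTER      (UINT64_C(1) << 0)
#define SKETCH_OFFLOAD_BUFFER_SPLIT (UINT64_C(1) << 1)
/* Placeholder limit, standing in for MLX5_MAX_RXQ_NSEG. */
#define SKETCH_MAX_NSEG 32u

/* Returns 0 if the request is acceptable, a negative errno otherwise. */
static int
check_split_request(const void *rx_seg, uint16_t n_seg,
		    uint64_t queue_offloads, uint64_t port_offloads)
{
	/* Queue-level and port-level offloads are combined, as in the PMD. */
	uint64_t offloads = queue_offloads | port_offloads;

	/* The limit is a power of two, like 1u << MLX5_MAX_LOG_RQ_SEGS. */
	assert((SKETCH_MAX_NSEG & (SKETCH_MAX_NSEG - 1)) == 0);
	if (rx_seg == NULL || n_seg == 0)
		return -EINVAL;
	if (n_seg == 1)
		return 0; /* Single segment: the legacy setup path. */
	if (!(offloads & SKETCH_OFFLOAD_SCATTER))
		return -ENOSPC;
	if (!(offloads & SKETCH_OFFLOAD_BUFFER_SPLIT))
		return -ENOSPC;
	if (n_seg > SKETCH_MAX_NSEG)
		return -EOVERFLOW;
	return 0;
}
```

A single-segment request passes regardless of the offload flags, which is what lets the legacy mlx5_rx_queue_setup() wrapper reuse the same routine with n_seg set to 1.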



* [dpdk-dev] [PATCH v3 7/9] net/mlx5: configure Rx queue to support split
  2020-10-12 16:19 ` [dpdk-dev] [PATCH v3 0/9] ethdev: introduce Rx buffer split Viacheslav Ovsiienko
                     ` (5 preceding siblings ...)
  2020-10-12 16:19   ` [dpdk-dev] [PATCH v3 6/9] net/mlx5: add extended Rx queue setup routine Viacheslav Ovsiienko
@ 2020-10-12 16:19   ` Viacheslav Ovsiienko
  2020-10-12 16:19   ` [dpdk-dev] [PATCH v3 8/9] net/mlx5: register multiple pool for Rx queue Viacheslav Ovsiienko
  2020-10-12 16:19   ` [dpdk-dev] [PATCH v3 9/9] net/mlx5: update Rx datapath to support split Viacheslav Ovsiienko
  8 siblings, 0 replies; 172+ messages in thread
From: Viacheslav Ovsiienko @ 2020-10-12 16:19 UTC (permalink / raw)
  To: dev
  Cc: thomasm, stephen, ferruh.yigit, olivier.matz, jerinjacobk,
	maxime.coquelin, david.marchand, arybchenko

The scatter-gather elements must be configured
accordingly to support the buffer split feature.
The application provides the desired settings for
the segments at the beginning of the packets, and the
PMD pads the buffer chain (if needed) with the attributes
of the last specified segment to accommodate a packet
of maximal length.

Some limitations are implied. The MPRQ
feature must be disengaged if split is requested,
because MPRQ neither supports pushing data to the
dedicated pools nor follows the flexible buffer sizes.
The vectorized rx_burst routines do not support
scattering (they are extremely simplified
and work over a single segment only) and can't
handle split either.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
 drivers/net/mlx5/mlx5_rxq.c | 94 ++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 80 insertions(+), 14 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index 42818d8..4ec4677 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -1417,7 +1417,8 @@ struct mlx5_rxq_ctrl *
 	uint64_t offloads = conf->offloads |
 			   dev->data->dev_conf.rxmode.offloads;
 	unsigned int lro_on_queue = !!(offloads & DEV_RX_OFFLOAD_TCP_LRO);
-	const int mprq_en = mlx5_check_mprq_support(dev) > 0 && n_seg == 1;
+	const int mprq_en = mlx5_check_mprq_support(dev) > 0 && n_seg == 1 &&
+			    !rx_seg[0].offset && !rx_seg[0].length;
 	unsigned int max_rx_pkt_len = lro_on_queue ?
 			dev->data->dev_conf.rxmode.max_lro_pkt_size :
 			dev->data->dev_conf.rxmode.max_rx_pkt_len;
@@ -1425,22 +1426,87 @@ struct mlx5_rxq_ctrl *
 							RTE_PKTMBUF_HEADROOM;
 	unsigned int max_lro_size = 0;
 	unsigned int first_mb_free_size = mb_len - RTE_PKTMBUF_HEADROOM;
+	const struct rte_eth_rxseg *qs_seg = rx_seg;
+	unsigned int tail_len;
 
-	if (non_scatter_min_mbuf_size > mb_len && !(offloads &
-						    DEV_RX_OFFLOAD_SCATTER)) {
+	tmpl = mlx5_malloc(MLX5_MEM_RTE | MLX5_MEM_ZERO, sizeof(*tmpl) +
+			   desc_n * sizeof(struct rte_mbuf *), 0, socket);
+	if (!tmpl) {
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+	MLX5_ASSERT(n_seg && n_seg <= MLX5_MAX_RXQ_NSEG);
+	/*
+	 * Build the array of actual buffer offsets and lengths.
+	 * Pad with the buffers from the last memory pool if
+	 * needed to handle max size packets, replace zero length
+	 * with the buffer length from the pool.
+	 */
+	tail_len = max_rx_pkt_len;
+	do {
+		struct rte_eth_rxseg *hw_seg =
+					&tmpl->rxq.rxseg[tmpl->rxq.rxseg_n];
+		uint32_t buf_len = rte_pktmbuf_data_room_size(qs_seg->mp);
+		uint32_t offset, seg_len;
+
+		/*
+		 * For the buffers beyond descriptions offset is zero,
+		 * the first buffer contains head room.
+		 */
+		offset = (tmpl->rxq.rxseg_n >= n_seg ? 0 : qs_seg->offset) +
+			 (tmpl->rxq.rxseg_n ? 0 : RTE_PKTMBUF_HEADROOM);
+		/*
+		 * For the buffers beyond the descriptions the length is
+		 * the pool buffer length; a zero length is replaced with
+		 * the pool buffer length minus the offset.
+		 */
+		seg_len = tmpl->rxq.rxseg_n >= n_seg ? buf_len :
+			  qs_seg->length ? qs_seg->length : (buf_len - offset);
+		/* Check is done in 32-bit arithmetic, no overflows. */
+		if (buf_len < seg_len + offset) {
+			DRV_LOG(ERR, "port %u Rx queue %u: Split offset/length "
+				     "%u/%u can't be satisfied",
+				     dev->data->port_id, idx,
+				     qs_seg->offset, qs_seg->length);
+			rte_errno = EINVAL;
+			goto error;
+		}
+		if (seg_len > tail_len)
+			seg_len = buf_len - offset;
+		if (++tmpl->rxq.rxseg_n > MLX5_MAX_RXQ_NSEG) {
+			DRV_LOG(ERR,
+				"port %u too many SGEs (%u) needed to handle"
+				" requested maximum packet size %u, the maximum"
+				" supported are %u", dev->data->port_id,
+				tmpl->rxq.rxseg_n, max_rx_pkt_len,
+				MLX5_MAX_RXQ_NSEG);
+			rte_errno = ENOTSUP;
+			goto error;
+		}
+		/* Build the actual scattering element in the queue object. */
+		hw_seg->mp = qs_seg->mp;
+		MLX5_ASSERT(offset <= UINT16_MAX);
+		MLX5_ASSERT(seg_len <= UINT16_MAX);
+		hw_seg->offset = (uint16_t)offset;
+		hw_seg->length = (uint16_t)seg_len;
+		/*
+		 * Advance the segment descriptor, the padding is based
+		 * on the attributes of the last descriptor.
+		 */
+		if (tmpl->rxq.rxseg_n < n_seg)
+			qs_seg++;
+		tail_len -= RTE_MIN(tail_len, seg_len);
+	} while (tail_len || !rte_is_power_of_2(tmpl->rxq.rxseg_n));
+	MLX5_ASSERT(tmpl->rxq.rxseg_n &&
+		    tmpl->rxq.rxseg_n <= MLX5_MAX_RXQ_NSEG);
+	if (tmpl->rxq.rxseg_n > 1 && !(offloads & DEV_RX_OFFLOAD_SCATTER)) {
 		DRV_LOG(ERR, "port %u Rx queue %u: Scatter offload is not"
 			" configured and no enough mbuf space(%u) to contain "
 			"the maximum RX packet length(%u) with head-room(%u)",
 			dev->data->port_id, idx, mb_len, max_rx_pkt_len,
 			RTE_PKTMBUF_HEADROOM);
 		rte_errno = ENOSPC;
-		return NULL;
-	}
-	tmpl = mlx5_malloc(MLX5_MEM_RTE | MLX5_MEM_ZERO, sizeof(*tmpl) +
-			   desc_n * sizeof(struct rte_mbuf *), 0, socket);
-	if (!tmpl) {
-		rte_errno = ENOMEM;
-		return NULL;
+		goto error;
 	}
 	tmpl->type = MLX5_RXQ_TYPE_STANDARD;
 	if (mlx5_mr_btree_init(&tmpl->rxq.mr_ctrl.cache_bh,
@@ -1467,7 +1533,7 @@ struct mlx5_rxq_ctrl *
 	 *  - The number of descs is more than the number of strides.
 	 *  - max_rx_pkt_len plus overhead is less than the max size
 	 *    of a stride or mprq_stride_size is specified by a user.
-	 *    Need to nake sure that there are enough stides to encap
+	 *    Need to make sure that there are enough strides to encap
 	 *    the maximum packet size in case mprq_stride_size is set.
 	 *  Otherwise, enable Rx scatter if necessary.
 	 */
@@ -1497,11 +1563,11 @@ struct mlx5_rxq_ctrl *
 			" strd_num_n = %u, strd_sz_n = %u",
 			dev->data->port_id, idx,
 			tmpl->rxq.strd_num_n, tmpl->rxq.strd_sz_n);
-	} else if (max_rx_pkt_len <= first_mb_free_size) {
+	} else if (tmpl->rxq.rxseg_n == 1) {
+		MLX5_ASSERT(max_rx_pkt_len <= first_mb_free_size);
 		tmpl->rxq.sges_n = 0;
 		max_lro_size = max_rx_pkt_len;
 	} else if (offloads & DEV_RX_OFFLOAD_SCATTER) {
-		unsigned int size = non_scatter_min_mbuf_size;
 		unsigned int sges_n;
 
 		if (lro_on_queue && first_mb_free_size <
@@ -1516,7 +1582,7 @@ struct mlx5_rxq_ctrl *
 		 * Determine the number of SGEs needed for a full packet
 		 * and round it to the next power of two.
 		 */
-		sges_n = log2above((size / mb_len) + !!(size % mb_len));
+		sges_n = log2above(tmpl->rxq.rxseg_n);
 		if (sges_n > MLX5_MAX_LOG_RQ_SEGS) {
 			DRV_LOG(ERR,
 				"port %u too many SGEs (%u) needed to handle"
-- 
1.8.3.1
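
The padding loop added to mlx5_rxq_new() is easier to follow in isolation. The sketch below is a simplified, standalone rendering of it under stated assumptions: each description carries its pool's data room size directly in a hypothetical buf_len field instead of a mempool pointer, the headroom and segment limit are placeholder constants, and an explicit guard against an offset larger than the buffer is added that the patch itself does not have. All names (seg_desc, hw_seg, build_hw_segs, SKETCH_*) are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_SEGS 32u  /* placeholder for MLX5_MAX_RXQ_NSEG */
#define SKETCH_HEADROOM 128u /* placeholder for RTE_PKTMBUF_HEADROOM */

struct seg_desc {
	uint32_t buf_len; /* stand-in for the pool's data room size */
	uint16_t length;  /* 0 = use what is left in the buffer */
	uint16_t offset;
};

struct hw_seg {
	uint32_t offset;
	uint32_t length;
};

static int
is_pow2(unsigned int v)
{
	return v != 0 && (v & (v - 1)) == 0;
}

/*
 * Fill out[] (room for SKETCH_MAX_SEGS entries) so that the chain covers
 * max_pkt_len and the entry count is a power of two. Returns the count,
 * or -1 if the request cannot be satisfied.
 */
static int
build_hw_segs(const struct seg_desc *desc, unsigned int n_seg,
	      uint32_t max_pkt_len, struct hw_seg *out)
{
	unsigned int n = 0;
	uint32_t tail = max_pkt_len;

	assert(desc != NULL && out != NULL);
	assert(n_seg >= 1 && n_seg <= SKETCH_MAX_SEGS);
	do {
		/* Beyond the descriptions, reuse the last one for padding. */
		const struct seg_desc *d = &desc[n < n_seg ? n : n_seg - 1];
		int padding = n >= n_seg;
		/* Only the first buffer keeps the headroom. */
		uint32_t offset = (padding ? 0 : d->offset) +
				  (n ? 0 : SKETCH_HEADROOM);
		uint32_t len;

		if (offset > d->buf_len)
			return -1;
		len = padding ? d->buf_len :
		      d->length ? d->length : d->buf_len - offset;
		if (len > d->buf_len - offset)
			return -1;
		/* The segment that ends the packet may use the whole buffer. */
		if (len > tail)
			len = d->buf_len - offset;
		if (n == SKETCH_MAX_SEGS)
			return -1;
		out[n].offset = offset;
		out[n].length = len;
		n++;
		tail -= tail < len ? tail : len;
	} while (tail != 0 || !is_pow2(n));
	return (int)n;
}
```

With descriptions {64 bytes, rest of buffer} over 2048-byte buffers and a 1518-byte maximum packet, the sketch produces two entries. With three descriptions it produces four, because the count is rounded up to a power of two using the attributes of the last description.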



* [dpdk-dev] [PATCH v3 8/9] net/mlx5: register multiple pool for Rx queue
  2020-10-12 16:19 ` [dpdk-dev] [PATCH v3 0/9] ethdev: introduce Rx buffer split Viacheslav Ovsiienko
                     ` (6 preceding siblings ...)
  2020-10-12 16:19   ` [dpdk-dev] [PATCH v3 7/9] net/mlx5: configure Rx queue to support split Viacheslav Ovsiienko
@ 2020-10-12 16:19   ` Viacheslav Ovsiienko
  2020-10-12 16:19   ` [dpdk-dev] [PATCH v3 9/9] net/mlx5: update Rx datapath to support split Viacheslav Ovsiienko
  8 siblings, 0 replies; 172+ messages in thread
From: Viacheslav Ovsiienko @ 2020-10-12 16:19 UTC (permalink / raw)
  To: dev
  Cc: thomasm, stephen, ferruh.yigit, olivier.matz, jerinjacobk,
	maxime.coquelin, david.marchand, arybchenko

The split feature for receiving packets was added to the mlx5
PMD, so an Rx queue can now receive data into buffers belonging
to different pools. The memory of all the involved pools
must be registered for DMA operations in order to allow the hardware
to store the data.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
 drivers/net/mlx5/mlx5_mr.c      |  3 +++
 drivers/net/mlx5/mlx5_trigger.c | 20 ++++++++++++--------
 2 files changed, 15 insertions(+), 8 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_mr.c b/drivers/net/mlx5/mlx5_mr.c
index dbcf0aa..c308ecc 100644
--- a/drivers/net/mlx5/mlx5_mr.c
+++ b/drivers/net/mlx5/mlx5_mr.c
@@ -536,6 +536,9 @@ struct mr_update_mp_data {
 		.ret = 0,
 	};
 
+	DRV_LOG(DEBUG, "Port %u Rx queue registering mp %s "
+		       "having %u chunks.", dev->data->port_id,
+		       mp->name, mp->nb_mem_chunks);
 	rte_mempool_mem_iter(mp, mlx5_mr_update_mp_cb, &data);
 	if (data.ret < 0 && rte_errno == ENXIO) {
 		/* Mempool may have externally allocated memory. */
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index e72e5fb..643e10f 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -145,18 +145,22 @@
 		dev->data->port_id, priv->sh->device_attr.max_sge);
 	for (i = 0; i != priv->rxqs_n; ++i) {
 		struct mlx5_rxq_ctrl *rxq_ctrl = mlx5_rxq_get(dev, i);
-		struct rte_mempool *mp;
 
 		if (!rxq_ctrl)
 			continue;
 		if (rxq_ctrl->type == MLX5_RXQ_TYPE_STANDARD) {
-			/* Pre-register Rx mempool. */
-			mp = mlx5_rxq_mprq_enabled(&rxq_ctrl->rxq) ?
-			     rxq_ctrl->rxq.mprq_mp : rxq_ctrl->rxq.mp;
-			DRV_LOG(DEBUG, "Port %u Rx queue %u registering mp %s"
-				" having %u chunks.", dev->data->port_id,
-				rxq_ctrl->rxq.idx, mp->name, mp->nb_mem_chunks);
-			mlx5_mr_update_mp(dev, &rxq_ctrl->rxq.mr_ctrl, mp);
+			/* Pre-register Rx mempools. */
+			if (mlx5_rxq_mprq_enabled(&rxq_ctrl->rxq)) {
+				mlx5_mr_update_mp(dev, &rxq_ctrl->rxq.mr_ctrl,
+						  rxq_ctrl->rxq.mprq_mp);
+			} else {
+				uint32_t s;
+
+				for (s = 0; s < rxq_ctrl->rxq.rxseg_n; s++)
+					mlx5_mr_update_mp
+						(dev, &rxq_ctrl->rxq.mr_ctrl,
+						rxq_ctrl->rxq.rxseg[s].mp);
+			}
 			ret = rxq_alloc_elts(rxq_ctrl);
 			if (ret)
 				goto error;
-- 
1.8.3.1



* [dpdk-dev] [PATCH v3 9/9] net/mlx5: update Rx datapath to support split
  2020-10-12 16:19 ` [dpdk-dev] [PATCH v3 0/9] ethdev: introduce Rx buffer split Viacheslav Ovsiienko
                     ` (7 preceding siblings ...)
  2020-10-12 16:19   ` [dpdk-dev] [PATCH v3 8/9] net/mlx5: register multiple pool for Rx queue Viacheslav Ovsiienko
@ 2020-10-12 16:19   ` Viacheslav Ovsiienko
  8 siblings, 0 replies; 172+ messages in thread
From: Viacheslav Ovsiienko @ 2020-10-12 16:19 UTC (permalink / raw)
  To: dev
  Cc: thomasm, stephen, ferruh.yigit, olivier.matz, jerinjacobk,
	maxime.coquelin, david.marchand, arybchenko

Only the regular rx_burst routine is updated to support split,
because the vectorized ones do not support scatter and MPRQ
does not support split at all.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
 drivers/net/mlx5/mlx5_rxq.c  | 11 +++++------
 drivers/net/mlx5/mlx5_rxtx.c |  3 ++-
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index 4ec4677..2ebb265 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -210,9 +210,10 @@
 
 	/* Iterate on segments. */
 	for (i = 0; (i != elts_n); ++i) {
+		struct rte_eth_rxseg *seg = &rxq_ctrl->rxq.rxseg[i % sges_n];
 		struct rte_mbuf *buf;
 
-		buf = rte_pktmbuf_alloc(rxq_ctrl->rxq.mp);
+		buf = rte_pktmbuf_alloc(seg->mp);
 		if (buf == NULL) {
 			DRV_LOG(ERR, "port %u empty mbuf pool",
 				PORT_ID(rxq_ctrl->priv));
@@ -225,12 +226,10 @@
 		MLX5_ASSERT(rte_pktmbuf_data_len(buf) == 0);
 		MLX5_ASSERT(rte_pktmbuf_pkt_len(buf) == 0);
 		MLX5_ASSERT(!buf->next);
-		/* Only the first segment keeps headroom. */
-		if (i % sges_n)
-			SET_DATA_OFF(buf, 0);
+		SET_DATA_OFF(buf, seg->offset);
 		PORT(buf) = rxq_ctrl->rxq.port_id;
-		DATA_LEN(buf) = rte_pktmbuf_tailroom(buf);
-		PKT_LEN(buf) = DATA_LEN(buf);
+		DATA_LEN(buf) = seg->length;
+		PKT_LEN(buf) = seg->length;
 		NB_SEGS(buf) = 1;
 		(*rxq_ctrl->rxq.elts)[i] = buf;
 	}
diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index b530ff4..dd84249 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -1334,7 +1334,8 @@ enum mlx5_txcmp_code {
 		rte_prefetch0(seg);
 		rte_prefetch0(cqe);
 		rte_prefetch0(wqe);
-		rep = rte_mbuf_raw_alloc(rxq->mp);
+		/* Allocate the buf from the same pool. */
+		rep = rte_mbuf_raw_alloc(seg->pool);
 		if (unlikely(rep == NULL)) {
 			++rxq->stats.rx_nombuf;
 			if (!pkt) {
-- 
1.8.3.1
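
The change to the element allocation loop maps ring element i to segment description i % sges_n, so each group of sges_n consecutive elements forms one packet's buffer chain with per-segment pool, offset and length. A minimal standalone sketch of that mapping follows. The types split_seg and ring_elt and the function fill_ring are hypothetical, pools are represented by an integer id, and the first segment's offset is assumed to already include the headroom, as arranged at queue setup time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-segment settings prepared at queue setup time. */
struct split_seg {
	unsigned int pool_id;
	uint16_t offset; /* first segment: headroom already included */
	uint16_t length;
};

/* Hypothetical stand-in for the mbuf fields set for each ring element. */
struct ring_elt {
	unsigned int pool_id;
	uint16_t data_off;
	uint16_t data_len;
};

static void
fill_ring(const struct split_seg *segs, unsigned int sges_n,
	  struct ring_elt *elts, unsigned int elts_n)
{
	unsigned int i;

	assert(segs != NULL && elts != NULL);
	/* The ring holds a whole number of per-packet chains. */
	assert(sges_n != 0 && elts_n % sges_n == 0);
	for (i = 0; i != elts_n; ++i) {
		const struct split_seg *seg = &segs[i % sges_n];

		elts[i].pool_id = seg->pool_id;
		elts[i].data_off = seg->offset;
		elts[i].data_len = seg->length;
	}
}
```

The same idea explains the rx_burst change: a replacement buffer is taken from the pool of the buffer being replaced, so every ring position keeps its pool across refills.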



* Re: [dpdk-dev] [PATCH v3 1/9] ethdev: introduce Rx buffer split
  2020-10-12 16:19   ` [dpdk-dev] [PATCH v3 1/9] " Viacheslav Ovsiienko
@ 2020-10-12 16:38     ` Andrew Rybchenko
  2020-10-12 17:03       ` Thomas Monjalon
  0 siblings, 1 reply; 172+ messages in thread
From: Andrew Rybchenko @ 2020-10-12 16:38 UTC (permalink / raw)
  To: Viacheslav Ovsiienko, dev
  Cc: thomasm, stephen, ferruh.yigit, olivier.matz, jerinjacobk,
	maxime.coquelin, david.marchand

On 10/12/20 7:19 PM, Viacheslav Ovsiienko wrote:
> The DPDK datapath in the transmit direction is very flexible.
> An application can build the multi-segment packet and manages
> almost all data aspects - the memory pools where segments
> are allocated from, the segment lengths, the memory attributes
> like external buffers, registered for DMA, etc.
> 
> In the receiving direction, the datapath is much less flexible,
> an application can only specify the memory pool to configure the
> receiving queue and nothing more. In order to extend receiving
> datapath capabilities it is proposed to add the way to provide
> extended information how to split the packets being received.
> 
> The following structure is introduced to specify the Rx packet
> segment:
> 
> struct rte_eth_rxseg {
>     struct rte_mempool *mp; /* memory pools to allocate segment from */
>     uint16_t length; /* segment maximal data length,
> 		       	configures "split point" */
>     uint16_t offset; /* data offset from beginning
> 		       	of mbuf data buffer */
>     uint32_t reserved; /* reserved field */
> };
> 
> The new routine rte_eth_rxseg_queue_setup_ex() is introduced to
> setup the given Rx queue using the new extended Rx packet segment
> description:
> 
> int
> rte_eth_rx_queue_setup_ex(uint16_t port_id, uint16_t rx_queue_id,
>                           uint16_t nb_rx_desc, unsigned int socket_id,
>                           const struct rte_eth_rxconf *rx_conf,
> 		          const struct rte_eth_rxseg *rx_seg,
>                           uint16_t n_seg)
> 
> This routine presents the two new parameters:
>     rx_seg - pointer the array of segment descriptions, each element
>              describes the memory pool, maximal data length, initial
>              data offset from the beginning of data buffer in mbuf.
> 	     This array allows to specify the different settings for
> 	     each segment in individual fashion.
>     n_seg - number of elements in the array
> 
> The new offload flag RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT in device
> capabilities is introduced so that a PMD can report to the
> application that it supports splitting Rx packets into configurable
> segments. Prior to invoking the rte_eth_rx_queue_setup_ex() routine
> the application should check the RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT flag.
> 
> If the Rx queue is configured with new routine the packets being
> received will be split into multiple segments pushed to the mbufs
> with specified attributes. The PMD will split the received packets
> into multiple segments according to the specification in the
> description array:
> 
> - the first network buffer will be allocated from the memory pool
>   specified in the first segment description element, the second
>   network buffer - from the pool in the second segment description
>   element and so on. If there are not enough elements to describe
>   the buffers for an entire packet of maximal length, the pool from
>   the last valid element will be used to allocate the buffers for
>   the rest of the segments
> 
> - the offsets from the segment description elements will provide
>   the data offset from the buffer beginning, except for the first
>   mbuf - for this one the offset is added to RTE_PKTMBUF_HEADROOM to
>   get the actual offset from the buffer beginning. If there are not
>   enough elements to describe the buffers for an entire packet of
>   maximal length, the offsets for the rest of the segments are
>   assumed to be zero.
> 
> - the data length received into each segment is limited by the
>   length specified in the segment description element. Receiving
>   starts with filling up the first mbuf data buffer; if the
>   specified maximal segment length is reached and there is data
>   remaining (the packet is longer than the buffer in the first mbuf),
>   the following data will be pushed to the next segment up to its own
>   maximal length. If the first two segments are not enough to store
>   all the remaining packet data, the next (third) segment will
>   be engaged and so on. If the length in the segment description
>   element is zero, the actual buffer size will be deduced from
>   the appropriate memory pool properties. If there are not enough
>   elements to describe the buffers for an entire packet of maximal
>   length, the buffer size will be deduced from the pool of the last
>   valid element for the remaining segments.
> 
> For example, let's suppose we configured the Rx queue with the
> following segments:
>     seg0 - pool0, len0=14B, off0=2
>     seg1 - pool1, len1=20B, off1=128B
>     seg2 - pool2, len2=20B, off2=0B
>     seg3 - pool3, len3=512B, off3=0B
> 
> The packet 46 bytes long will look like the following:
>     seg0 - 14B long @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from pool0
>     seg1 - 20B long @ 128 in mbuf from pool1
>     seg2 - 12B long @ 0 in mbuf from pool2
> 
> The packet 1500 bytes long will look like the following:
>     seg0 - 14B @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from pool0
>     seg1 - 20B @ 128 in mbuf from pool1
>     seg2 - 20B @ 0 in mbuf from pool2
>     seg3 - 512B @ 0 in mbuf from pool3
>     seg4 - 512B @ 0 in mbuf from pool3
>     seg5 - 422B @ 0 in mbuf from pool3
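
The split rule shown in the two examples above can be sketched as a small
C helper. This is only an illustration under stated assumptions:
struct rxseg_desc and split_layout() are hypothetical names that are not
part of the patch, the memory pool is left out, and mbufs beyond the last
described segment reuse the last element's length, as the 1500-byte
example does. A zero length (size deduced from the pool) is not modelled
and is rejected.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the proposed struct rte_eth_rxseg (no pool). */
struct rxseg_desc {
	uint16_t length; /* maximal data length, i.e. the split point */
	uint16_t offset; /* data offset in the mbuf buffer (not used here) */
};

/*
 * Compute how many bytes of a pkt_len-byte packet land in each mbuf.
 * out_len[i] receives the data length of the i-th mbuf in the chain.
 * Returns the number of mbufs used, or -1 on bad input or if more
 * than max_out mbufs would be needed.
 */
static int
split_layout(const struct rxseg_desc *seg, uint16_t n_seg,
	     uint32_t pkt_len, uint32_t *out_len, int max_out)
{
	int n = 0;

	assert(out_len != NULL);
	if (seg == NULL || n_seg == 0)
		return -1;

	while (pkt_len > 0) {
		/* Past the end of the array, keep using the last element. */
		const struct rxseg_desc *d = &seg[n < n_seg ? n : n_seg - 1];
		uint32_t chunk;

		if (d->length == 0 || n == max_out)
			return -1;
		chunk = pkt_len < d->length ? pkt_len : d->length;
		out_len[n++] = chunk;
		pkt_len -= chunk;
	}
	return n;
}
```

Fed with the four segments from the example (14, 20, 20 and 512 bytes),
the helper yields three mbufs for the 46-byte packet and six for the
1500-byte one.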
> 
> The offload RTE_ETH_RX_OFFLOAD_SCATTER must be present and
> configured to support new buffer split feature (if n_seg
> is greater than one).
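
The two offload preconditions stated in the commit message can be
sketched as one predicate. This is a rough illustration, not code from
the patch: buffer_split_usable() is a hypothetical helper, the
BUFFER_SPLIT bit value is the one defined in this patch, and the SCATTER
bit value is an assumption that should be taken from rte_ethdev.h in
real code.

```c
#include <stdbool.h>
#include <stdint.h>

#define RX_OFFLOAD_SCATTER      UINT64_C(0x00002000) /* assumed value */
#define RX_OFFLOAD_BUFFER_SPLIT UINT64_C(0x00100000) /* from this patch */

/*
 * rx_offload_capa:  capabilities reported by the device
 * enabled_offloads: offloads actually configured for the port/queue
 * n_seg:            number of segment descriptions to be passed
 */
static bool
buffer_split_usable(uint64_t rx_offload_capa, uint64_t enabled_offloads,
		    uint16_t n_seg)
{
	if (n_seg == 0)
		return false;
	/* The PMD must advertise the buffer split capability. */
	if (!(rx_offload_capa & RX_OFFLOAD_BUFFER_SPLIT))
		return false;
	/* More than one segment requires scattered Rx to be enabled. */
	if (n_seg > 1 && !(enabled_offloads & RX_OFFLOAD_SCATTER))
		return false;
	return true;
}
```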
> 
> The new approach would allow splitting the ingress packets into
> multiple parts pushed to the memory with different attributes.
> For example, the packet headers can be pushed to the embedded
> data buffers within mbufs and the application data into
> the external buffers attached to mbufs allocated from the
> different memory pools. The memory attributes for the split
> parts may differ as well - for example the application data
> may be pushed into the external memory located on the dedicated
> physical device, say GPU or NVMe. This would improve the DPDK
> receiving datapath flexibility while preserving compatibility
> with the existing API.
> 
> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
> ---
>  doc/guides/nics/features.rst             |  15 +++
>  doc/guides/rel_notes/release_20_11.rst   |   6 ++
>  lib/librte_ethdev/rte_ethdev.c           | 178 +++++++++++++++++++++++++++++++
>  lib/librte_ethdev/rte_ethdev.h           | 107 +++++++++++++++++++
>  lib/librte_ethdev/rte_ethdev_driver.h    |  10 ++
>  lib/librte_ethdev/rte_ethdev_version.map |   1 +
>  6 files changed, 317 insertions(+)
> 
> diff --git a/doc/guides/nics/features.rst b/doc/guides/nics/features.rst
> index dd8c955..21b91db 100644
> --- a/doc/guides/nics/features.rst
> +++ b/doc/guides/nics/features.rst
> @@ -185,6 +185,21 @@ Supports receiving segmented mbufs.
>  * **[related]    eth_dev_ops**: ``rx_pkt_burst``.
>  
>  
> +.. _nic_features_buffer_split:
> +
> +Buffer Split on Rx
> +------------------
> +
> +Scatters the packets being received on specified boundaries to segmented mbufs.
> +
> +* **[uses]       rte_eth_rxconf,rte_eth_rxmode**: ``offloads:RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT``.
> +* **[implements] datapath**: ``Buffer Split functionality``.
> +* **[implements] rte_eth_dev_data**: ``buffer_split``.
> +* **[provides]   rte_eth_dev_info**: ``rx_offload_capa:RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT``.
> +* **[provides]   eth_dev_ops**: ``rxq_info_get:buffer_split``.
> +* **[related] API**: ``rte_eth_rxseg_queue_setup()``.
> +
> +
>  .. _nic_features_lro:
>  
>  LRO
> diff --git a/doc/guides/rel_notes/release_20_11.rst b/doc/guides/rel_notes/release_20_11.rst
> index 2cec9dd..d87247a 100644
> --- a/doc/guides/rel_notes/release_20_11.rst
> +++ b/doc/guides/rel_notes/release_20_11.rst
> @@ -60,6 +60,12 @@ New Features
>    Added the FEC API which provides functions for query FEC capabilities and
>    current FEC mode from device. Also, API for configuring FEC mode is also provided.
>  
> +* **Introduced extended buffer description for receiving.**
> +
> +  Added the extended Rx queue setup routine providing the individual
> +  descriptions for each Rx segment with maximal size, buffer offset and memory
> +  pool to allocate data buffers from.
> +
>  * **Updated Broadcom bnxt driver.**
>  
>    Updated the Broadcom bnxt driver with new features and improvements, including:
> diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
> index 59beb8a..3a55567 100644
> --- a/lib/librte_ethdev/rte_ethdev.c
> +++ b/lib/librte_ethdev/rte_ethdev.c
> @@ -105,6 +105,9 @@ struct rte_eth_xstats_name_off {
>  #define RTE_RX_OFFLOAD_BIT2STR(_name)	\
>  	{ DEV_RX_OFFLOAD_##_name, #_name }
>  
> +#define RTE_ETH_RX_OFFLOAD_BIT2STR(_name)	\
> +	{ RTE_ETH_RX_OFFLOAD_##_name, #_name }
> +
>  static const struct {
>  	uint64_t offload;
>  	const char *name;
> @@ -128,9 +131,11 @@ struct rte_eth_xstats_name_off {
>  	RTE_RX_OFFLOAD_BIT2STR(SCTP_CKSUM),
>  	RTE_RX_OFFLOAD_BIT2STR(OUTER_UDP_CKSUM),
>  	RTE_RX_OFFLOAD_BIT2STR(RSS_HASH),
> +	RTE_ETH_RX_OFFLOAD_BIT2STR(BUFFER_SPLIT),
>  };
>  
>  #undef RTE_RX_OFFLOAD_BIT2STR
> +#undef RTE_ETH_RX_OFFLOAD_BIT2STR
>  
>  #define RTE_TX_OFFLOAD_BIT2STR(_name)	\
>  	{ DEV_TX_OFFLOAD_##_name, #_name }
> @@ -1920,6 +1925,179 @@ struct rte_eth_dev *
>  }
>  
>  int
> +rte_eth_rxseg_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
> +			  uint16_t nb_rx_desc, unsigned int socket_id,
> +			  const struct rte_eth_rxconf *rx_conf,
> +			  const struct rte_eth_rxseg *rx_seg, uint16_t n_seg)
> +{
> +	int ret;
> +	uint16_t seg_idx;
> +	uint32_t mbp_buf_size;

<start-of-dup>

> +	struct rte_eth_dev *dev;
> +	struct rte_eth_dev_info dev_info;
> +	struct rte_eth_rxconf local_conf;
> +	void **rxq;
> +
> +	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
> +
> +	dev = &rte_eth_devices[port_id];
> +	if (rx_queue_id >= dev->data->nb_rx_queues) {
> +		RTE_ETHDEV_LOG(ERR, "Invalid RX queue_id=%u\n", rx_queue_id);
> +		return -EINVAL;
> +	}

<end-of-dup>

> +
> +	if (rx_seg == NULL) {
> +		RTE_ETHDEV_LOG(ERR, "Invalid null description pointer\n");
> +		return -EINVAL;
> +	}
> +
> +	if (n_seg == 0) {
> +		RTE_ETHDEV_LOG(ERR, "Invalid zero description number\n");
> +		return -EINVAL;
> +	}
> +
> +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rxseg_queue_setup, -ENOTSUP);
> +

<start-of-dup>

> +	/*
> +	 * Check the size of the mbuf data buffer.
> +	 * This value must be provided in the private data of the memory pool.
> +	 * First check that the memory pool has a valid private data.
> +	 */
> +	ret = rte_eth_dev_info_get(port_id, &dev_info);
> +	if (ret != 0)
> +		return ret;

<end-of-dup>

> +
> +	for (seg_idx = 0; seg_idx < n_seg; seg_idx++) {
> +		struct rte_mempool *mp = rx_seg[seg_idx].mp;
> +
> +		if (mp->private_data_size <
> +				sizeof(struct rte_pktmbuf_pool_private)) {
> +			RTE_ETHDEV_LOG(ERR, "%s private_data_size %d < %d\n",
> +				mp->name, (int)mp->private_data_size,
> +				(int)sizeof(struct rte_pktmbuf_pool_private));
> +			return -ENOSPC;
> +		}
> +
> +		mbp_buf_size = rte_pktmbuf_data_room_size(mp);
> +		if (mbp_buf_size < rx_seg[seg_idx].length +
> +				   rx_seg[seg_idx].offset +
> +				   (seg_idx ? 0 :
> +				    (uint32_t)RTE_PKTMBUF_HEADROOM)) {
> +			RTE_ETHDEV_LOG(ERR,
> +				"%s mbuf_data_room_size %d < %d"
> +				" (segment length=%d + segment offset=%d)\n",
> +				mp->name, (int)mbp_buf_size,
> +				(int)(rx_seg[seg_idx].length +
> +				      rx_seg[seg_idx].offset),
> +				(int)rx_seg[seg_idx].length,
> +				(int)rx_seg[seg_idx].offset);
> +			return -EINVAL;
> +		}
> +	}
> +

<start-of-huge-dup>

> +	/* Use default specified by driver, if nb_rx_desc is zero */
> +	if (nb_rx_desc == 0) {
> +		nb_rx_desc = dev_info.default_rxportconf.ring_size;
> +		/* If driver default is also zero, fall back on EAL default */
> +		if (nb_rx_desc == 0)
> +			nb_rx_desc = RTE_ETH_DEV_FALLBACK_RX_RINGSIZE;
> +	}
> +
> +	if (nb_rx_desc > dev_info.rx_desc_lim.nb_max ||
> +			nb_rx_desc < dev_info.rx_desc_lim.nb_min ||
> +			nb_rx_desc % dev_info.rx_desc_lim.nb_align != 0) {
> +
> +		RTE_ETHDEV_LOG(ERR,
> +			"Invalid value for nb_rx_desc(=%hu), should be: "
> +			"<= %hu, >= %hu, and a product of %hu\n",
> +			nb_rx_desc, dev_info.rx_desc_lim.nb_max,
> +			dev_info.rx_desc_lim.nb_min,
> +			dev_info.rx_desc_lim.nb_align);
> +		return -EINVAL;
> +	}
> +
> +	if (dev->data->dev_started &&
> +		!(dev_info.dev_capa &
> +			RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP))
> +		return -EBUSY;
> +
> +	if (dev->data->dev_started &&
> +		(dev->data->rx_queue_state[rx_queue_id] !=
> +			RTE_ETH_QUEUE_STATE_STOPPED))
> +		return -EBUSY;
> +
> +	rxq = dev->data->rx_queues;
> +	if (rxq[rx_queue_id]) {
> +		RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_release,
> +					-ENOTSUP);
> +		(*dev->dev_ops->rx_queue_release)(rxq[rx_queue_id]);
> +		rxq[rx_queue_id] = NULL;
> +	}
> +
> +	if (rx_conf == NULL)
> +		rx_conf = &dev_info.default_rxconf;
> +
> +	local_conf = *rx_conf;
> +
> +	/*
> +	 * If an offloading has already been enabled in
> +	 * rte_eth_dev_configure(), it has been enabled on all queues,
> +	 * so there is no need to enable it in this queue again.
> +	 * The local_conf.offloads input to underlying PMD only carries
> +	 * those offloadings which are only enabled on this queue and
> +	 * not enabled on all queues.
> +	 */
> +	local_conf.offloads &= ~dev->data->dev_conf.rxmode.offloads;
> +
> +	/*
> +	 * New added offloadings for this queue are those not enabled in
> +	 * rte_eth_dev_configure() and they must be per-queue type.
> +	 * A pure per-port offloading can't be enabled on a queue while
> +	 * disabled on another queue. A pure per-port offloading can't
> +	 * be enabled for any queue as new added one if it hasn't been
> +	 * enabled in rte_eth_dev_configure().
> +	 */
> +	if ((local_conf.offloads & dev_info.rx_queue_offload_capa) !=
> +	     local_conf.offloads) {
> +		RTE_ETHDEV_LOG(ERR,
> +			"Ethdev port_id=%d rx_queue_id=%d, new added offloads"
> +			" 0x%"PRIx64" must be within per-queue offload"
> +			" capabilities 0x%"PRIx64" in %s()\n",
> +			port_id, rx_queue_id, local_conf.offloads,
> +			dev_info.rx_queue_offload_capa,
> +			__func__);
> +		return -EINVAL;
> +	}
> +
> +	/*
> +	 * If LRO is enabled, check that the maximum aggregated packet
> +	 * size is supported by the configured device.
> +	 */
> +	if (local_conf.offloads & DEV_RX_OFFLOAD_TCP_LRO) {
> +		if (dev->data->dev_conf.rxmode.max_lro_pkt_size == 0)
> +			dev->data->dev_conf.rxmode.max_lro_pkt_size =
> +				dev->data->dev_conf.rxmode.max_rx_pkt_len;
> +		int ret = check_lro_pkt_size(port_id,
> +				dev->data->dev_conf.rxmode.max_lro_pkt_size,
> +				dev->data->dev_conf.rxmode.max_rx_pkt_len,
> +				dev_info.max_lro_pkt_size);
> +		if (ret != 0)
> +			return ret;
> +	}

<end-of-huge-dup>

IMO it is not acceptable to duplicate so much code.
It is simply unmaintainable.

NACK

> +
> +	ret = (*dev->dev_ops->rxseg_queue_setup)(dev, rx_queue_id, nb_rx_desc,
> +						 socket_id, &local_conf,
> +						 rx_seg, n_seg);
> +	if (!ret) {
> +		if (!dev->data->min_rx_buf_size ||
> +		    dev->data->min_rx_buf_size > mbp_buf_size)
> +			dev->data->min_rx_buf_size = mbp_buf_size;
> +	}
> +
> +	return eth_err(port_id, ret);
> +}
> +
> +int
>  rte_eth_rx_hairpin_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
>  			       uint16_t nb_rx_desc,
>  			       const struct rte_eth_hairpin_conf *conf)
> diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
> index 3a31f94..bbf25c8 100644
> --- a/lib/librte_ethdev/rte_ethdev.h
> +++ b/lib/librte_ethdev/rte_ethdev.h
> @@ -970,6 +970,16 @@ struct rte_eth_txmode {
>  };
>  
>  /**
> + * A structure used to configure an RX packet segment to split.
> + */
> +struct rte_eth_rxseg {
> +	struct rte_mempool *mp; /**< Memory pools to allocate segment from */
> +	uint16_t length; /**< Segment data length, configures split point. */
> +	uint16_t offset; /**< Data offset from beginning of mbuf data buffer */
> +	uint32_t reserved; /**< Reserved field */
> +};
> +
> +/**
>   * A structure used to configure an RX ring of an Ethernet port.
>   */
>  struct rte_eth_rxconf {
> @@ -1260,6 +1270,7 @@ struct rte_eth_conf {
>  #define DEV_RX_OFFLOAD_SCTP_CKSUM	0x00020000
>  #define DEV_RX_OFFLOAD_OUTER_UDP_CKSUM  0x00040000
>  #define DEV_RX_OFFLOAD_RSS_HASH		0x00080000
> +#define RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT 0x00100000
>  
>  #define DEV_RX_OFFLOAD_CHECKSUM (DEV_RX_OFFLOAD_IPV4_CKSUM | \
>  				 DEV_RX_OFFLOAD_UDP_CKSUM | \
> @@ -2037,6 +2048,102 @@ int rte_eth_rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
>  		uint16_t nb_rx_desc, unsigned int socket_id,
>  		const struct rte_eth_rxconf *rx_conf,
>  		struct rte_mempool *mb_pool);
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Allocate and set up a receive queue for an Ethernet device
> + * with specifying receiving segments parameters.
> + *
> + * The function allocates a contiguous block of memory for *nb_rx_desc*
> + * receive descriptors from a memory zone associated with *socket_id*.
> + * The descriptors might be divided into groups by PMD to receive the data
> + * into multi-segment packet presented by the chain of mbufs.
> + *
> + * Each descriptor within the group is initialized accordingly with
> + * the network buffers allocated from the specified memory pool and with
> + * specified buffer offset and maximal segment length.
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param rx_queue_id
> + *   The index of the receive queue to set up.
> + *   The value must be in the range [0, nb_rx_queue - 1] previously supplied
> + *   to rte_eth_dev_configure().
> + * @param nb_rx_desc
> + *   The number of receive descriptors to allocate for the receive ring.
> + * @param socket_id
> + *   The *socket_id* argument is the socket identifier in case of NUMA.
> + *   The value can be *SOCKET_ID_ANY* if there is no NUMA constraint for
> + *   the DMA memory allocated for the receive descriptors of the ring.
> + * @param rx_conf
> + *   The pointer to the configuration data to be used for the receive queue.
> + *   NULL value is allowed, in which case default RX configuration
> + *   will be used.
> + *   The *rx_conf* structure contains an *rx_thresh* structure with the values
> + *   of the Prefetch, Host, and Write-Back threshold registers of the receive
> + *   ring.
> + *   In addition it contains the hardware offloads features to activate using
> + *   the DEV_RX_OFFLOAD_* flags.
> + *   If an offloading set in rx_conf->offloads
> + *   hasn't been set in the input argument eth_conf->rxmode.offloads
> + *   to rte_eth_dev_configure(), it is a new added offloading, it must be
> + *   per-queue type and it is enabled for the queue.
> + *   No need to repeat any bit in rx_conf->offloads which has already been
> + *   enabled in rte_eth_dev_configure() at port level. An offloading enabled
> + *   at port level can't be disabled at queue level.
> + * @param rx_seg
> + *   The pointer to the array of segment descriptions, each element describes
> + *   the memory pool, maximal segment data length, initial data offset from
> + *   the beginning of data buffer in mbuf. This allows specifying dedicated
> + *   properties for each segment in the receiving buffer - pool, buffer
> + *   offset, maximal segment size. If RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT offload
> + *   flag is configured the PMD will split the received packets into multiple
> + *   segments according to the specification in the description array:
> + *   - the first network buffer will be allocated from the memory pool,
> + *     specified in the first segment description element, the second
> + *     network buffer - from the pool in the second segment description
> + *     element and so on. If there are not enough elements to describe
> + *     the buffers for an entire packet of maximal length, the pool from the
> + *     last valid element will be used to allocate the buffers for the rest
> + *     of the segments.
> + *   - the offsets from the segment description elements will provide the
> + *     data offset from the buffer beginning except the first mbuf - for this
> + *     one the offset is added to the RTE_PKTMBUF_HEADROOM to get actual
> + *     offset from the buffer beginning. If there are not enough elements
> + *     to describe the buffers for an entire packet of maximal length, the
> + *     offsets for the rest of the segments are assumed to be zero.
> + *   - the data length being received to each segment is limited by the
> + *     length specified in the segment description element. The data receiving
> + *     starts with filling up the first mbuf data buffer, if the specified
> + *     maximal segment length is reached and there are data remaining
> + *     (packet is longer than buffer in the first mbuf) the following data
> + *     will be pushed to the next segment up to its own length. If the first
> + *     two segments are not enough to store all the packet data, the next
> + *     (third) segment will be engaged and so on. If the length in the segment
> + *     description element is zero, the actual buffer size will be deduced
> + *     from the appropriate memory pool properties. If there are not enough
> + *     elements to describe the buffers for an entire packet of maximal
> + *     length, the buffer size will be deduced from the pool of the last
> + *     valid element for all the remaining segments.
> + * @param n_seg
> + *   The number of elements in the segment description array.
> + * @return
> + *   - 0: Success, receive queue correctly set up.
> + *   - -EIO: if device is removed.
> + *   - -EINVAL: The segment descriptors array is empty (the pointer is null
> + *      or the number of elements is zero) or the size of network buffers
> + *      which can be allocated from this memory pool does not fit the various
> + *      buffer sizes allowed by the device controller.
> + *   - -ENOMEM: Unable to allocate the receive ring descriptors or to
> + *      allocate network memory buffers from the memory pool when
> + *      initializing receive descriptors.
> + */
> +__rte_experimental
> +int rte_eth_rxseg_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
> +		uint16_t nb_rx_desc, unsigned int socket_id,
> +		const struct rte_eth_rxconf *rx_conf,
> +		const struct rte_eth_rxseg *rx_seg, uint16_t n_seg);
>  
>  /**
>   * @warning
> diff --git a/lib/librte_ethdev/rte_ethdev_driver.h b/lib/librte_ethdev/rte_ethdev_driver.h
> index 35cc4fb..5dee210 100644
> --- a/lib/librte_ethdev/rte_ethdev_driver.h
> +++ b/lib/librte_ethdev/rte_ethdev_driver.h
> @@ -264,6 +264,15 @@ typedef int (*eth_rx_queue_setup_t)(struct rte_eth_dev *dev,
>  				    struct rte_mempool *mb_pool);
>  /**< @internal Set up a receive queue of an Ethernet device. */
>  
> +typedef int (*eth_rxseg_queue_setup_t)(struct rte_eth_dev *dev,
> +				       uint16_t rx_queue_id,
> +				       uint16_t nb_rx_desc,
> +				       unsigned int socket_id,
> +				       const struct rte_eth_rxconf *rx_conf,
> +				       const struct rte_eth_rxseg *rx_seg,
> +				       uint16_t n_seg);
> +/**< @internal extended Set up a receive queue of an Ethernet device. */
> +
>  typedef int (*eth_tx_queue_setup_t)(struct rte_eth_dev *dev,
>  				    uint16_t tx_queue_id,
>  				    uint16_t nb_tx_desc,
> @@ -711,6 +720,7 @@ struct eth_dev_ops {
>  	eth_queue_start_t          tx_queue_start;/**< Start TX for a queue. */
>  	eth_queue_stop_t           tx_queue_stop; /**< Stop TX for a queue. */
>  	eth_rx_queue_setup_t       rx_queue_setup;/**< Set up device RX queue. */
> +	eth_rxseg_queue_setup_t    rxseg_queue_setup;/**< Extended RX setup. */
>  	eth_queue_release_t        rx_queue_release; /**< Release RX queue. */
>  
>  	eth_rx_enable_intr_t       rx_queue_intr_enable;  /**< Enable Rx queue interrupt. */
> diff --git a/lib/librte_ethdev/rte_ethdev_version.map b/lib/librte_ethdev/rte_ethdev_version.map
> index f8a0945..d4b9849 100644
> --- a/lib/librte_ethdev/rte_ethdev_version.map
> +++ b/lib/librte_ethdev/rte_ethdev_version.map
> @@ -195,6 +195,7 @@ EXPERIMENTAL {
>  	rte_flow_get_aged_flows;
>  
>  	# Marked as experimental in 20.11
> +	rte_eth_rxseg_queue_setup;
>  	rte_tm_capabilities_get;
>  	rte_tm_get_number_of_leaf_nodes;
>  	rte_tm_hierarchy_commit;
> 



* Re: [dpdk-dev] [RFC] ethdev: introduce Rx buffer split
  2020-10-12 15:56               ` Ananyev, Konstantin
  2020-10-12 15:59                 ` Slava Ovsiienko
@ 2020-10-12 16:52                 ` Thomas Monjalon
  1 sibling, 0 replies; 172+ messages in thread
From: Thomas Monjalon @ 2020-10-12 16:52 UTC (permalink / raw)
  To: Slava Ovsiienko, Andrew Rybchenko, Yigit, Ferruh, Ananyev, Konstantin
  Cc: dev, stephen, Shahaf Shuler, olivier.matz, jerinjacobk,
	maxime.coquelin, david.marchand, Asaf Penso

12/10/2020 17:56, Ananyev, Konstantin:
> > From: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> > > > 12/10/2020 11:56, Slava Ovsiienko:
> > > > > We have two approaches to specifying multiple segments to split Rx
> > > > > packets:
> > > > > 1. update queue configuration structure
> > > > > 2. introduce new rx_queue_setup_ex() routine with extra parameters.
> > > > >
> > > > > For [1] my only actual dislike is that we would have multiple places
> > > > > to specify the pool - in rx_queue_setup() and in the config
> > > > > structure. So, we should implement some checking (if we have offload
> > > > > flag set we should check whether mp parameter is NULL and segment
> > > > > descriptions array pointer/size is provided, if no offload flag set -
> > > > > we must check the description array is empty).
> > > > >
> > > > > > @Thomas, @Ferruh: I'd like to hear what other ethdev maintainers
> > > > > > think about it.
> > > > >
> > > > > Yes, it would be very nice to hear extra opinions. Do we think the
> > > > > providing of extra API function is worse than extending existing
> > > > > structure, introducing some conditional ambiguity and complicating
> > > > > the parameter compliance check?
> > > >
> > > > Let's try listing pros and cons of each approach, so we can conclude.
> > > >
> > > > 1/ update queue config struct
> > > >
> > > > 	1.1 pro: keep same queue setup function
> > > > 	1.2 con: two mempool pointers (struct or function)
> > > > 	1.3 con: variable size of segment description array
> > > >
> > > > 2/ new queue setup function
> > > >
> > > > 	2.1 con: two functions for queue setup
> > > > 	2.2 pro: mempool pointer is not redundant
> > > > 	2.3 pro: segment description array size defined by the caller
> > > >
> > > > What else I'm missing?
> > > >
> > >
> > > My 2 cents: can we make new (_ex) function to work for both original config
> > > (1 mp for all sizes, no split) and for new config (multiple mp, split allowed)?
> > > Then in future (21.11?) we can either get rid of original one, or even make it
> > > a wrapper around the new one?
> > > Konstantin
> > 
> > Yes, actually the mlx5 PMD implementation follows this approach -
> > specifying the segment description array with the only element
> > and zero size/offset provides exactly the same configuration as existing
> > rte_eth_rx_queue_setup().
> > 
> > Currently I'm detailing the description (how HEAD_ROOM is handled, what happens
> > if array is shorter than the buffer chain for segment of maximal size, the zero segment
> > size means follow the value deduced from the pool and so on).
> > 
> > So, may we consider this point as one more "pro" to setup_ex approach ? 😊
> 
> From my perspective, yes.
> It is sort of more gradual approach.
> I expect it would be experimental function for some time,
> so we'll have time to try it, adjust, fix, etc without breaking original one.

I like the wrapper idea.
Is it possible to call rte_eth_rx_queue_setup_ex()
from rte_eth_rx_queue_setup() using a rte_eth_rxseg object on the stack?
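
A minimal sketch of that wrapper idea, built on stand-in types so it
compiles on its own. It assumes, as stated earlier in the thread for the
mlx5 PMD, that a single segment with zero length and zero offset is
equivalent to the classic single-pool setup. rxseg_queue_setup_stub() and
rx_queue_setup_wrapper() are hypothetical names; the stub only records
its arguments and does not reflect the real ethdev internals.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the DPDK types; only what the sketch needs. */
struct rte_mempool { const char *name; };
struct rte_eth_rxconf { uint64_t offloads; };

struct rte_eth_rxseg {
	struct rte_mempool *mp;
	uint16_t length;
	uint16_t offset;
	uint32_t reserved;
};

/* Last arguments seen by the stub, kept so the call can be inspected. */
static struct rte_eth_rxseg last_seg;
static uint16_t last_n_seg;

/* Stub for the extended setup: validates and copies the description. */
static int
rxseg_queue_setup_stub(uint16_t port_id, uint16_t rx_queue_id,
		       uint16_t nb_rx_desc, unsigned int socket_id,
		       const struct rte_eth_rxconf *rx_conf,
		       const struct rte_eth_rxseg *rx_seg, uint16_t n_seg)
{
	(void)port_id; (void)rx_queue_id; (void)nb_rx_desc;
	(void)socket_id; (void)rx_conf;

	if (rx_seg == NULL || n_seg == 0)
		return -EINVAL;
	last_seg = rx_seg[0];
	last_n_seg = n_seg;
	return 0;
}

/* Classic single-pool setup expressed through the extended routine. */
static int
rx_queue_setup_wrapper(uint16_t port_id, uint16_t rx_queue_id,
		       uint16_t nb_rx_desc, unsigned int socket_id,
		       const struct rte_eth_rxconf *rx_conf,
		       struct rte_mempool *mb_pool)
{
	/* One on-stack element: whole buffer, no extra offset. */
	struct rte_eth_rxseg seg = {
		.mp = mb_pool, .length = 0, .offset = 0, .reserved = 0,
	};

	if (mb_pool == NULL)
		return -EINVAL;
	return rxseg_queue_setup_stub(port_id, rx_queue_id, nb_rx_desc,
				      socket_id, rx_conf, &seg, 1);
}
```

The on-stack object only works if the callee copies what it needs before
returning, as the stub does here; a driver that kept the rx_seg pointer
would be left with a dangling reference.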




* Re: [dpdk-dev] [PATCH v3 1/9] ethdev: introduce Rx buffer split
  2020-10-12 16:38     ` Andrew Rybchenko
@ 2020-10-12 17:03       ` Thomas Monjalon
  2020-10-12 17:11         ` Andrew Rybchenko
  2020-10-12 17:11         ` Slava Ovsiienko
  0 siblings, 2 replies; 172+ messages in thread
From: Thomas Monjalon @ 2020-10-12 17:03 UTC (permalink / raw)
  To: Viacheslav Ovsiienko, Andrew Rybchenko
  Cc: dev, stephen, ferruh.yigit, olivier.matz, jerinjacobk,
	maxime.coquelin, david.marchand

12/10/2020 18:38, Andrew Rybchenko:
> On 10/12/20 7:19 PM, Viacheslav Ovsiienko wrote:
> >  int
> > +rte_eth_rxseg_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
> > +			  uint16_t nb_rx_desc, unsigned int socket_id,
> > +			  const struct rte_eth_rxconf *rx_conf,
> > +			  const struct rte_eth_rxseg *rx_seg, uint16_t n_seg)
> > +{
> > +	int ret;
> > +	uint16_t seg_idx;
> > +	uint32_t mbp_buf_size;
> 
> <start-of-dup>
> 
> > +	struct rte_eth_dev *dev;
> > +	struct rte_eth_dev_info dev_info;
> > +	struct rte_eth_rxconf local_conf;
> > +	void **rxq;
> > +
> > +	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
> > +
> > +	dev = &rte_eth_devices[port_id];
> > +	if (rx_queue_id >= dev->data->nb_rx_queues) {
> > +		RTE_ETHDEV_LOG(ERR, "Invalid RX queue_id=%u\n", rx_queue_id);
> > +		return -EINVAL;
> > +	}
> 
> <end-of-dup>
> 
> > +
> > +	if (rx_seg == NULL) {
> > +		RTE_ETHDEV_LOG(ERR, "Invalid null description pointer\n");
> > +		return -EINVAL;
> > +	}
> > +
> > +	if (n_seg == 0) {
> > +		RTE_ETHDEV_LOG(ERR, "Invalid zero description number\n");
> > +		return -EINVAL;
> > +	}
> > +
> > +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rxseg_queue_setup, -ENOTSUP);
> > +
> 
> <start-of-dup>
> 
> > +	/*
> > +	 * Check the size of the mbuf data buffer.
> > +	 * This value must be provided in the private data of the memory pool.
> > +	 * First check that the memory pool has a valid private data.
> > +	 */
> > +	ret = rte_eth_dev_info_get(port_id, &dev_info);
> > +	if (ret != 0)
> > +		return ret;
> 
> <end-of-dup>
> 
> > +
> > +	for (seg_idx = 0; seg_idx < n_seg; seg_idx++) {
> > +		struct rte_mempool *mp = rx_seg[seg_idx].mp;
> > +
> > +		if (mp->private_data_size <
> > +				sizeof(struct rte_pktmbuf_pool_private)) {
> > +			RTE_ETHDEV_LOG(ERR, "%s private_data_size %d < %d\n",
> > +				mp->name, (int)mp->private_data_size,
> > +				(int)sizeof(struct rte_pktmbuf_pool_private));
> > +			return -ENOSPC;
> > +		}
> > +
> > +		mbp_buf_size = rte_pktmbuf_data_room_size(mp);
> > +		if (mbp_buf_size < rx_seg[seg_idx].length +
> > +				   rx_seg[seg_idx].offset +
> > +				   (seg_idx ? 0 :
> > +				    (uint32_t)RTE_PKTMBUF_HEADROOM)) {
> > +			RTE_ETHDEV_LOG(ERR,
> > +				"%s mbuf_data_room_size %d < %d"
> > +				" (segment length=%d + segment offset=%d)\n",
> > +				mp->name, (int)mbp_buf_size,
> > +				(int)(rx_seg[seg_idx].length +
> > +				      rx_seg[seg_idx].offset),
> > +				(int)rx_seg[seg_idx].length,
> > +				(int)rx_seg[seg_idx].offset);
> > +			return -EINVAL;
> > +		}
> > +	}
> > +
> 
> <start-of-huge-dup>
> 
> > +	/* Use default specified by driver, if nb_rx_desc is zero */
> > +	if (nb_rx_desc == 0) {
> > +		nb_rx_desc = dev_info.default_rxportconf.ring_size;
> > +		/* If driver default is also zero, fall back on EAL default */
> > +		if (nb_rx_desc == 0)
> > +			nb_rx_desc = RTE_ETH_DEV_FALLBACK_RX_RINGSIZE;
> > +	}
> > +
> > +	if (nb_rx_desc > dev_info.rx_desc_lim.nb_max ||
> > +			nb_rx_desc < dev_info.rx_desc_lim.nb_min ||
> > +			nb_rx_desc % dev_info.rx_desc_lim.nb_align != 0) {
> > +
> > +		RTE_ETHDEV_LOG(ERR,
> > +			"Invalid value for nb_rx_desc(=%hu), should be: "
> > +			"<= %hu, >= %hu, and a product of %hu\n",
> > +			nb_rx_desc, dev_info.rx_desc_lim.nb_max,
> > +			dev_info.rx_desc_lim.nb_min,
> > +			dev_info.rx_desc_lim.nb_align);
> > +		return -EINVAL;
> > +	}
> > +
> > +	if (dev->data->dev_started &&
> > +		!(dev_info.dev_capa &
> > +			RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP))
> > +		return -EBUSY;
> > +
> > +	if (dev->data->dev_started &&
> > +		(dev->data->rx_queue_state[rx_queue_id] !=
> > +			RTE_ETH_QUEUE_STATE_STOPPED))
> > +		return -EBUSY;
> > +
> > +	rxq = dev->data->rx_queues;
> > +	if (rxq[rx_queue_id]) {
> > +		RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_release,
> > +					-ENOTSUP);
> > +		(*dev->dev_ops->rx_queue_release)(rxq[rx_queue_id]);
> > +		rxq[rx_queue_id] = NULL;
> > +	}
> > +
> > +	if (rx_conf == NULL)
> > +		rx_conf = &dev_info.default_rxconf;
> > +
> > +	local_conf = *rx_conf;
> > +
> > +	/*
> > +	 * If an offloading has already been enabled in
> > +	 * rte_eth_dev_configure(), it has been enabled on all queues,
> > +	 * so there is no need to enable it in this queue again.
> > +	 * The local_conf.offloads input to underlying PMD only carries
> > +	 * those offloadings which are only enabled on this queue and
> > +	 * not enabled on all queues.
> > +	 */
> > +	local_conf.offloads &= ~dev->data->dev_conf.rxmode.offloads;
> > +
> > +	/*
> > +	 * New added offloadings for this queue are those not enabled in
> > +	 * rte_eth_dev_configure() and they must be per-queue type.
> > +	 * A pure per-port offloading can't be enabled on a queue while
> > +	 * disabled on another queue. A pure per-port offloading can't
> > +	 * be enabled for any queue as new added one if it hasn't been
> > +	 * enabled in rte_eth_dev_configure().
> > +	 */
> > +	if ((local_conf.offloads & dev_info.rx_queue_offload_capa) !=
> > +	     local_conf.offloads) {
> > +		RTE_ETHDEV_LOG(ERR,
> > +			"Ethdev port_id=%d rx_queue_id=%d, new added offloads"
> > +			" 0x%"PRIx64" must be within per-queue offload"
> > +			" capabilities 0x%"PRIx64" in %s()\n",
> > +			port_id, rx_queue_id, local_conf.offloads,
> > +			dev_info.rx_queue_offload_capa,
> > +			__func__);
> > +		return -EINVAL;
> > +	}
> > +
> > +	/*
> > +	 * If LRO is enabled, check that the maximum aggregated packet
> > +	 * size is supported by the configured device.
> > +	 */
> > +	if (local_conf.offloads & DEV_RX_OFFLOAD_TCP_LRO) {
> > +		if (dev->data->dev_conf.rxmode.max_lro_pkt_size == 0)
> > +			dev->data->dev_conf.rxmode.max_lro_pkt_size =
> > +				dev->data->dev_conf.rxmode.max_rx_pkt_len;
> > +		int ret = check_lro_pkt_size(port_id,
> > +				dev->data->dev_conf.rxmode.max_lro_pkt_size,
> > +				dev->data->dev_conf.rxmode.max_rx_pkt_len,
> > +				dev_info.max_lro_pkt_size);
> > +		if (ret != 0)
> > +			return ret;
> > +	}
> 
> <end-of-huge-dup>
> 
> > IMO It is not acceptable to duplicate so much code.
> It is simply unmaintainable.
> 
> NACK

Can it be solved by making rte_eth_rx_queue_setup() a wrapper
on top of this new rte_eth_rxseg_queue_setup() ?





* Re: [dpdk-dev] [PATCH v3 1/9] ethdev: introduce Rx buffer split
  2020-10-12 17:03       ` Thomas Monjalon
@ 2020-10-12 17:11         ` Andrew Rybchenko
  2020-10-12 20:22           ` Slava Ovsiienko
  2020-10-12 17:11         ` Slava Ovsiienko
  1 sibling, 1 reply; 172+ messages in thread
From: Andrew Rybchenko @ 2020-10-12 17:11 UTC (permalink / raw)
  To: Thomas Monjalon, Viacheslav Ovsiienko
  Cc: dev, stephen, ferruh.yigit, olivier.matz, jerinjacobk,
	maxime.coquelin, david.marchand

On 10/12/20 8:03 PM, Thomas Monjalon wrote:
> 12/10/2020 18:38, Andrew Rybchenko:
>> On 10/12/20 7:19 PM, Viacheslav Ovsiienko wrote:
>>>  int
>>> +rte_eth_rxseg_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
>>> +			  uint16_t nb_rx_desc, unsigned int socket_id,
>>> +			  const struct rte_eth_rxconf *rx_conf,
>>> +			  const struct rte_eth_rxseg *rx_seg, uint16_t n_seg)
>>> +{
>>> +	int ret;
>>> +	uint16_t seg_idx;
>>> +	uint32_t mbp_buf_size;
>>
>> <start-of-dup>
>>
>>> +	struct rte_eth_dev *dev;
>>> +	struct rte_eth_dev_info dev_info;
>>> +	struct rte_eth_rxconf local_conf;
>>> +	void **rxq;
>>> +
>>> +	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
>>> +
>>> +	dev = &rte_eth_devices[port_id];
>>> +	if (rx_queue_id >= dev->data->nb_rx_queues) {
>>> +		RTE_ETHDEV_LOG(ERR, "Invalid RX queue_id=%u\n", rx_queue_id);
>>> +		return -EINVAL;
>>> +	}
>>
>> <end-of-dup>
>>
>>> +
>>> +	if (rx_seg == NULL) {
>>> +		RTE_ETHDEV_LOG(ERR, "Invalid null description pointer\n");
>>> +		return -EINVAL;
>>> +	}
>>> +
>>> +	if (n_seg == 0) {
>>> +		RTE_ETHDEV_LOG(ERR, "Invalid zero description number\n");
>>> +		return -EINVAL;
>>> +	}
>>> +
>>> +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rxseg_queue_setup, -ENOTSUP);
>>> +
>>
>> <start-of-dup>
>>
>>> +	/*
>>> +	 * Check the size of the mbuf data buffer.
>>> +	 * This value must be provided in the private data of the memory pool.
>>> +	 * First check that the memory pool has a valid private data.
>>> +	 */
>>> +	ret = rte_eth_dev_info_get(port_id, &dev_info);
>>> +	if (ret != 0)
>>> +		return ret;
>>
>> <end-of-dup>
>>
>>> +
>>> +	for (seg_idx = 0; seg_idx < n_seg; seg_idx++) {
>>> +		struct rte_mempool *mp = rx_seg[seg_idx].mp;
>>> +
>>> +		if (mp->private_data_size <
>>> +				sizeof(struct rte_pktmbuf_pool_private)) {
>>> +			RTE_ETHDEV_LOG(ERR, "%s private_data_size %d < %d\n",
>>> +				mp->name, (int)mp->private_data_size,
>>> +				(int)sizeof(struct rte_pktmbuf_pool_private));
>>> +			return -ENOSPC;
>>> +		}
>>> +
>>> +		mbp_buf_size = rte_pktmbuf_data_room_size(mp);
>>> +		if (mbp_buf_size < rx_seg[seg_idx].length +
>>> +				   rx_seg[seg_idx].offset +
>>> +				   (seg_idx ? 0 :
>>> +				    (uint32_t)RTE_PKTMBUF_HEADROOM)) {
>>> +			RTE_ETHDEV_LOG(ERR,
>>> +				"%s mbuf_data_room_size %d < %d"
>>> +				" (segment length=%d + segment offset=%d)\n",
>>> +				mp->name, (int)mbp_buf_size,
>>> +				(int)(rx_seg[seg_idx].length +
>>> +				      rx_seg[seg_idx].offset),
>>> +				(int)rx_seg[seg_idx].length,
>>> +				(int)rx_seg[seg_idx].offset);
>>> +			return -EINVAL;
>>> +		}
>>> +	}
>>> +
>>
>> <start-of-huge-dup>
>>
>>> +	/* Use default specified by driver, if nb_rx_desc is zero */
>>> +	if (nb_rx_desc == 0) {
>>> +		nb_rx_desc = dev_info.default_rxportconf.ring_size;
>>> +		/* If driver default is also zero, fall back on EAL default */
>>> +		if (nb_rx_desc == 0)
>>> +			nb_rx_desc = RTE_ETH_DEV_FALLBACK_RX_RINGSIZE;
>>> +	}
>>> +
>>> +	if (nb_rx_desc > dev_info.rx_desc_lim.nb_max ||
>>> +			nb_rx_desc < dev_info.rx_desc_lim.nb_min ||
>>> +			nb_rx_desc % dev_info.rx_desc_lim.nb_align != 0) {
>>> +
>>> +		RTE_ETHDEV_LOG(ERR,
>>> +			"Invalid value for nb_rx_desc(=%hu), should be: "
>>> +			"<= %hu, >= %hu, and a product of %hu\n",
>>> +			nb_rx_desc, dev_info.rx_desc_lim.nb_max,
>>> +			dev_info.rx_desc_lim.nb_min,
>>> +			dev_info.rx_desc_lim.nb_align);
>>> +		return -EINVAL;
>>> +	}
>>> +
>>> +	if (dev->data->dev_started &&
>>> +		!(dev_info.dev_capa &
>>> +			RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP))
>>> +		return -EBUSY;
>>> +
>>> +	if (dev->data->dev_started &&
>>> +		(dev->data->rx_queue_state[rx_queue_id] !=
>>> +			RTE_ETH_QUEUE_STATE_STOPPED))
>>> +		return -EBUSY;
>>> +
>>> +	rxq = dev->data->rx_queues;
>>> +	if (rxq[rx_queue_id]) {
>>> +		RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_release,
>>> +					-ENOTSUP);
>>> +		(*dev->dev_ops->rx_queue_release)(rxq[rx_queue_id]);
>>> +		rxq[rx_queue_id] = NULL;
>>> +	}
>>> +
>>> +	if (rx_conf == NULL)
>>> +		rx_conf = &dev_info.default_rxconf;
>>> +
>>> +	local_conf = *rx_conf;
>>> +
>>> +	/*
>>> +	 * If an offloading has already been enabled in
>>> +	 * rte_eth_dev_configure(), it has been enabled on all queues,
>>> +	 * so there is no need to enable it in this queue again.
>>> +	 * The local_conf.offloads input to underlying PMD only carries
>>> +	 * those offloadings which are only enabled on this queue and
>>> +	 * not enabled on all queues.
>>> +	 */
>>> +	local_conf.offloads &= ~dev->data->dev_conf.rxmode.offloads;
>>> +
>>> +	/*
>>> +	 * New added offloadings for this queue are those not enabled in
>>> +	 * rte_eth_dev_configure() and they must be per-queue type.
>>> +	 * A pure per-port offloading can't be enabled on a queue while
>>> +	 * disabled on another queue. A pure per-port offloading can't
>>> +	 * be enabled for any queue as new added one if it hasn't been
>>> +	 * enabled in rte_eth_dev_configure().
>>> +	 */
>>> +	if ((local_conf.offloads & dev_info.rx_queue_offload_capa) !=
>>> +	     local_conf.offloads) {
>>> +		RTE_ETHDEV_LOG(ERR,
>>> +			"Ethdev port_id=%d rx_queue_id=%d, new added offloads"
>>> +			" 0x%"PRIx64" must be within per-queue offload"
>>> +			" capabilities 0x%"PRIx64" in %s()\n",
>>> +			port_id, rx_queue_id, local_conf.offloads,
>>> +			dev_info.rx_queue_offload_capa,
>>> +			__func__);
>>> +		return -EINVAL;
>>> +	}
>>> +
>>> +	/*
>>> +	 * If LRO is enabled, check that the maximum aggregated packet
>>> +	 * size is supported by the configured device.
>>> +	 */
>>> +	if (local_conf.offloads & DEV_RX_OFFLOAD_TCP_LRO) {
>>> +		if (dev->data->dev_conf.rxmode.max_lro_pkt_size == 0)
>>> +			dev->data->dev_conf.rxmode.max_lro_pkt_size =
>>> +				dev->data->dev_conf.rxmode.max_rx_pkt_len;
>>> +		int ret = check_lro_pkt_size(port_id,
>>> +				dev->data->dev_conf.rxmode.max_lro_pkt_size,
>>> +				dev->data->dev_conf.rxmode.max_rx_pkt_len,
>>> +				dev_info.max_lro_pkt_size);
>>> +		if (ret != 0)
>>> +			return ret;
>>> +	}
>>
>> <end-of-huge-dup>
>>
>> IMO It is not acceptable to duplicate so much code.
>> It is simply unmaintainable.
>>
>> NACK
> 
> Can it be solved by making rte_eth_rx_queue_setup() a wrapper
> on top of this new rte_eth_rxseg_queue_setup() ?
> 

It could be, but strictly speaking it would change the argument
validation order and the error reporting in various cases.
So refactoring is required to keep it consistent.


* Re: [dpdk-dev] [PATCH v3 1/9] ethdev: introduce Rx buffer split
  2020-10-12 17:03       ` Thomas Monjalon
  2020-10-12 17:11         ` Andrew Rybchenko
@ 2020-10-12 17:11         ` Slava Ovsiienko
  1 sibling, 0 replies; 172+ messages in thread
From: Slava Ovsiienko @ 2020-10-12 17:11 UTC (permalink / raw)
  To: Thomas Monjalon, Andrew Rybchenko
  Cc: dev, stephen, ferruh.yigit, olivier.matz, jerinjacobk,
	maxime.coquelin, david.marchand

> -----Original Message-----
> From: Thomas Monjalon <thomas@monjalon.net>
> Sent: Monday, October 12, 2020 20:03
> To: Slava Ovsiienko <viacheslavo@nvidia.com>; Andrew Rybchenko
> <andrew.rybchenko@oktetlabs.ru>
> Cc: dev@dpdk.org; stephen@networkplumber.org; ferruh.yigit@intel.com;
> olivier.matz@6wind.com; jerinjacobk@gmail.com;
> maxime.coquelin@redhat.com; david.marchand@redhat.com
> Subject: Re: [dpdk-dev] [PATCH v3 1/9] ethdev: introduce Rx buffer split
> 
> 12/10/2020 18:38, Andrew Rybchenko:
> > On 10/12/20 7:19 PM, Viacheslav Ovsiienko wrote:
> > >  int
> > > +rte_eth_rxseg_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
> > > +			  uint16_t nb_rx_desc, unsigned int socket_id,
> > > +			  const struct rte_eth_rxconf *rx_conf,
> > > +			  const struct rte_eth_rxseg *rx_seg, uint16_t n_seg) {
> > > +	int ret;
> > > +	uint16_t seg_idx;
> > > +	uint32_t mbp_buf_size;
> >
> > <start-of-dup>
> >
> > > +	struct rte_eth_dev *dev;
> > > +	struct rte_eth_dev_info dev_info;
> > > +	struct rte_eth_rxconf local_conf;
> > > +	void **rxq;
> > > +
> > > +	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
> > > +
> > > +	dev = &rte_eth_devices[port_id];
> > > +	if (rx_queue_id >= dev->data->nb_rx_queues) {
> > > +		RTE_ETHDEV_LOG(ERR, "Invalid RX queue_id=%u\n",
> rx_queue_id);
> > > +		return -EINVAL;
> > > +	}
> >
> > <end-of-dup>
> >
> > > +
> > > +	if (rx_seg == NULL) {
> > > +		RTE_ETHDEV_LOG(ERR, "Invalid null description pointer\n");
> > > +		return -EINVAL;
> > > +	}
> > > +
> > > +	if (n_seg == 0) {
> > > +		RTE_ETHDEV_LOG(ERR, "Invalid zero description
> number\n");
> > > +		return -EINVAL;
> > > +	}
> > > +
> > > +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rxseg_queue_setup,
> > > +-ENOTSUP);
> > > +
> >
> > <start-of-dup>
> >
> > > +	/*
> > > +	 * Check the size of the mbuf data buffer.
> > > +	 * This value must be provided in the private data of the memory
> pool.
> > > +	 * First check that the memory pool has a valid private data.
> > > +	 */
> > > +	ret = rte_eth_dev_info_get(port_id, &dev_info);
> > > +	if (ret != 0)
> > > +		return ret;
> >
> > <end-of-dup>
> >
> > > +
> > > +	for (seg_idx = 0; seg_idx < n_seg; seg_idx++) {
> > > +		struct rte_mempool *mp = rx_seg[seg_idx].mp;
> > > +
> > > +		if (mp->private_data_size <
> > > +				sizeof(struct rte_pktmbuf_pool_private)) {
> > > +			RTE_ETHDEV_LOG(ERR, "%s private_data_size %d <
> %d\n",
> > > +				mp->name, (int)mp->private_data_size,
> > > +				(int)sizeof(struct
> rte_pktmbuf_pool_private));
> > > +			return -ENOSPC;
> > > +		}
> > > +
> > > +		mbp_buf_size = rte_pktmbuf_data_room_size(mp);
> > > +		if (mbp_buf_size < rx_seg[seg_idx].length +
> > > +				   rx_seg[seg_idx].offset +
> > > +				   (seg_idx ? 0 :
> > > +				    (uint32_t)RTE_PKTMBUF_HEADROOM)) {
> > > +			RTE_ETHDEV_LOG(ERR,
> > > +				"%s mbuf_data_room_size %d < %d"
> > > +				" (segment length=%d + segment
> offset=%d)\n",
> > > +				mp->name, (int)mbp_buf_size,
> > > +				(int)(rx_seg[seg_idx].length +
> > > +				      rx_seg[seg_idx].offset),
> > > +				(int)rx_seg[seg_idx].length,
> > > +				(int)rx_seg[seg_idx].offset);
> > > +			return -EINVAL;
> > > +		}
> > > +	}
> > > +
> >
> > <start-of-huge-dup>
> >
> > > +	/* Use default specified by driver, if nb_rx_desc is zero */
> > > +	if (nb_rx_desc == 0) {
> > > +		nb_rx_desc = dev_info.default_rxportconf.ring_size;
> > > +		/* If driver default is also zero, fall back on EAL default */
> > > +		if (nb_rx_desc == 0)
> > > +			nb_rx_desc =
> RTE_ETH_DEV_FALLBACK_RX_RINGSIZE;
> > > +	}
> > > +
> > > +	if (nb_rx_desc > dev_info.rx_desc_lim.nb_max ||
> > > +			nb_rx_desc < dev_info.rx_desc_lim.nb_min ||
> > > +			nb_rx_desc % dev_info.rx_desc_lim.nb_align != 0) {
> > > +
> > > +		RTE_ETHDEV_LOG(ERR,
> > > +			"Invalid value for nb_rx_desc(=%hu), should be: "
> > > +			"<= %hu, >= %hu, and a product of %hu\n",
> > > +			nb_rx_desc, dev_info.rx_desc_lim.nb_max,
> > > +			dev_info.rx_desc_lim.nb_min,
> > > +			dev_info.rx_desc_lim.nb_align);
> > > +		return -EINVAL;
> > > +	}
> > > +
> > > +	if (dev->data->dev_started &&
> > > +		!(dev_info.dev_capa &
> > > +			RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP))
> > > +		return -EBUSY;
> > > +
> > > +	if (dev->data->dev_started &&
> > > +		(dev->data->rx_queue_state[rx_queue_id] !=
> > > +			RTE_ETH_QUEUE_STATE_STOPPED))
> > > +		return -EBUSY;
> > > +
> > > +	rxq = dev->data->rx_queues;
> > > +	if (rxq[rx_queue_id]) {
> > > +		RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops-
> >rx_queue_release,
> > > +					-ENOTSUP);
> > > +		(*dev->dev_ops->rx_queue_release)(rxq[rx_queue_id]);
> > > +		rxq[rx_queue_id] = NULL;
> > > +	}
> > > +
> > > +	if (rx_conf == NULL)
> > > +		rx_conf = &dev_info.default_rxconf;
> > > +
> > > +	local_conf = *rx_conf;
> > > +
> > > +	/*
> > > +	 * If an offloading has already been enabled in
> > > +	 * rte_eth_dev_configure(), it has been enabled on all queues,
> > > +	 * so there is no need to enable it in this queue again.
> > > +	 * The local_conf.offloads input to underlying PMD only carries
> > > +	 * those offloadings which are only enabled on this queue and
> > > +	 * not enabled on all queues.
> > > +	 */
> > > +	local_conf.offloads &= ~dev->data->dev_conf.rxmode.offloads;
> > > +
> > > +	/*
> > > +	 * New added offloadings for this queue are those not enabled in
> > > +	 * rte_eth_dev_configure() and they must be per-queue type.
> > > +	 * A pure per-port offloading can't be enabled on a queue while
> > > +	 * disabled on another queue. A pure per-port offloading can't
> > > +	 * be enabled for any queue as new added one if it hasn't been
> > > +	 * enabled in rte_eth_dev_configure().
> > > +	 */
> > > +	if ((local_conf.offloads & dev_info.rx_queue_offload_capa) !=
> > > +	     local_conf.offloads) {
> > > +		RTE_ETHDEV_LOG(ERR,
> > > +			"Ethdev port_id=%d rx_queue_id=%d, new added
> offloads"
> > > +			" 0x%"PRIx64" must be within per-queue offload"
> > > +			" capabilities 0x%"PRIx64" in %s()\n",
> > > +			port_id, rx_queue_id, local_conf.offloads,
> > > +			dev_info.rx_queue_offload_capa,
> > > +			__func__);
> > > +		return -EINVAL;
> > > +	}
> > > +
> > > +	/*
> > > +	 * If LRO is enabled, check that the maximum aggregated packet
> > > +	 * size is supported by the configured device.
> > > +	 */
> > > +	if (local_conf.offloads & DEV_RX_OFFLOAD_TCP_LRO) {
> > > +		if (dev->data->dev_conf.rxmode.max_lro_pkt_size == 0)
> > > +			dev->data->dev_conf.rxmode.max_lro_pkt_size =
> > > +				dev->data-
> >dev_conf.rxmode.max_rx_pkt_len;
> > > +		int ret = check_lro_pkt_size(port_id,
> > > +				dev->data-
> >dev_conf.rxmode.max_lro_pkt_size,
> > > +				dev->data-
> >dev_conf.rxmode.max_rx_pkt_len,
> > > +				dev_info.max_lro_pkt_size);
> > > +		if (ret != 0)
> > > +			return ret;
> > > +	}
> >
> > <end-of-huge-dup>
> >
> > > IMO It is not acceptable to duplicate so much code.
> > It is simply unmaintainable.
> >
> > NACK
> 
> Can it be solved by making rte_eth_rx_queue_setup() a wrapper on top of
> this new rte_eth_rxseg_queue_setup() ?
> 
That would require code refactoring. A simpler solution is to provide a
subroutine that performs the common part of the parameter checks.

There seem to be no decisive pros and cons between these two approaches.
As I said, my main concern about including the segment descriptions in the
config structure is that it introduces some ambiguity. But if we decide to
switch to that approach, I will handle it.
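
To make the subroutine idea concrete, here is a minimal sketch, not the
code from the patch. The structure dev_limits, the helper
rxq_check_common() and both *_sketch() wrappers are hypothetical names
invented for illustration, and the fallback ring size of 512 is an assumed
value. It only shows the queue-id and descriptor-count checks living in one
place, with both setup paths calling it.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_FALLBACK_RING 512 /* assumed fallback ring size */

/* Hypothetical subset of the device limits used by the common checks. */
struct dev_limits {
	uint16_t nb_rx_queues;
	uint16_t desc_min;
	uint16_t desc_max;
	uint16_t desc_align;
	uint16_t default_ring;
};

/* Checks shared by both setup paths; may replace *nb_desc with a default. */
static int
rxq_check_common(const struct dev_limits *lim, uint16_t queue_id,
		 uint16_t *nb_desc)
{
	if (queue_id >= lim->nb_rx_queues)
		return -EINVAL;
	/* Zero means "use the driver default", then the fallback value. */
	if (*nb_desc == 0)
		*nb_desc = lim->default_ring != 0 ?
			   lim->default_ring : SKETCH_FALLBACK_RING;
	if (*nb_desc > lim->desc_max || *nb_desc < lim->desc_min ||
	    (lim->desc_align != 0 && *nb_desc % lim->desc_align != 0))
		return -EINVAL;
	return 0;
}

/* Single-pool path: only the pool pointer check is path-specific. */
static int
rx_queue_setup_sketch(const struct dev_limits *lim, uint16_t queue_id,
		      uint16_t nb_desc, const void *mp)
{
	if (mp == NULL)
		return -EINVAL;
	return rxq_check_common(lim, queue_id, &nb_desc);
}

/* Segmented path: only the segment array checks are path-specific. */
static int
rxseg_queue_setup_sketch(const struct dev_limits *lim, uint16_t queue_id,
			 uint16_t nb_desc, const void *rx_seg, uint16_t n_seg)
{
	if (rx_seg == NULL || n_seg == 0)
		return -EINVAL;
	return rxq_check_common(lim, queue_id, &nb_desc);
}
```

Where the helper is called relative to the path-specific checks decides
which error is reported first, which is the ordering concern raised earlier
in the thread.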

With best regards, Slava





* [dpdk-dev] [PATCH v4 0/9] ethdev: introduce Rx buffer split
  2020-08-17 17:49 [dpdk-dev] [RFC] ethdev: introduce Rx buffer split Slava Ovsiienko
                   ` (3 preceding siblings ...)
  2020-10-12 16:19 ` [dpdk-dev] [PATCH v3 0/9] ethdev: introduce Rx buffer split Viacheslav Ovsiienko
@ 2020-10-12 20:09 ` Viacheslav Ovsiienko
  2020-10-12 20:09   ` [dpdk-dev] [PATCH v4 1/9] " Viacheslav Ovsiienko
                     ` (8 more replies)
  2020-10-13 19:21 ` [dpdk-dev] [PATCH v5 0/6] ethdev: introduce Rx buffer split Viacheslav Ovsiienko
                   ` (8 subsequent siblings)
  13 siblings, 9 replies; 172+ messages in thread
From: Viacheslav Ovsiienko @ 2020-10-12 20:09 UTC (permalink / raw)
  To: dev
  Cc: thomasm, stephen, ferruh.yigit, olivier.matz, jerinjacobk,
	maxime.coquelin, david.marchand, arybchenko

The DPDK datapath in the transmit direction is very flexible.
An application can build a multi-segment packet and manage
almost all data aspects - the memory pools the segments are
allocated from, the segment lengths, and memory attributes
such as external buffers registered for DMA.

In the receive direction the datapath is much less flexible:
an application can only specify the memory pool used to configure
the receive queue and nothing more. To extend the receive datapath
capabilities, it is proposed to add a way to provide extended
information on how to split the packets being received.

The following structure is introduced to specify the Rx packet
segment:

struct rte_eth_rxseg {
    struct rte_mempool *mp; /* memory pool to allocate segment from */
    uint16_t length; /* segment maximal data length,
                        configures "split point" */
    uint16_t offset; /* data offset from beginning
                        of mbuf data buffer */
    uint32_t reserved; /* reserved field */
};

The new routine rte_eth_rxseg_queue_setup() is introduced to
set up the given Rx queue using the new extended Rx packet segment
description:

int
rte_eth_rxseg_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
                          uint16_t nb_rx_desc, unsigned int socket_id,
                          const struct rte_eth_rxconf *rx_conf,
                          const struct rte_eth_rxseg *rx_seg,
                          uint16_t n_seg)

This routine takes two new parameters:
    rx_seg - pointer to the array of segment descriptions; each element
             describes the memory pool, maximal data length, and initial
             data offset from the beginning of the data buffer in the mbuf.
             This array allows specifying different settings for
             each segment individually.
    n_seg - number of elements in the array

The new offload flag RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT in device
capabilities is introduced so that a PMD can report to the
application that it supports splitting Rx packets into configurable
segments. Prior to invoking the rte_eth_rxseg_queue_setup() routine,
the application should check the RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT flag.

If the Rx queue is configured with the new routine, the packets being
received will be split into multiple segments pushed to mbufs
with the specified attributes. The PMD will split the received packets
into multiple segments according to the specification in the
description array:

- the first network buffer will be allocated from the memory pool
  specified in the first segment description element, the second
  network buffer from the pool in the second segment description
  element, and so on. If there are not enough elements to describe
  the buffers for an entire packet of maximal length, the pool from
  the last valid element will be used to allocate the buffers for
  the rest of the segments.

- the offsets from the segment description elements will provide
  the data offset from the buffer beginning, except for the first
  mbuf - for this one the offset is added to RTE_PKTMBUF_HEADROOM to
  get the actual offset from the buffer beginning. If there are not
  enough elements to describe the buffers for an entire packet of
  maximal length, the offsets for the rest of the segments are
  assumed to be zero.

- the data length received into each segment is limited by the
  length specified in the segment description element. Receiving
  starts by filling the first mbuf data buffer; if the specified
  maximal segment length is reached and data remain (the packet is
  longer than the buffer in the first mbuf), the following data
  will be pushed to the next segment up to its own maximal length.
  If the first two segments are not enough to store all the
  remaining packet data, the next (third) segment will be engaged,
  and so on. If the length in the segment description element is
  zero, the actual buffer size will be deduced from the appropriate
  memory pool properties. If there are not enough elements to
  describe the buffers for an entire packet of maximal length, the
  buffer size for the remaining segments will be deduced from the
  pool of the last valid element.

For example, let's suppose we configured the Rx queue with the
following segments:
    seg0 - pool0, len0=14B, off0=2B
    seg1 - pool1, len1=20B, off1=128B
    seg2 - pool2, len2=20B, off2=0B
    seg3 - pool3, len3=512B, off3=0B

A packet 46 bytes long will look like the following:
    seg0 - 14B long @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from pool0
    seg1 - 20B long @ 128 in mbuf from pool1
    seg2 - 12B long @ 0 in mbuf from pool2

A packet 1500 bytes long will look like the following:
    seg0 - 14B @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from pool0
    seg1 - 20B @ 128 in mbuf from pool1
    seg2 - 20B @ 0 in mbuf from pool2
    seg3 - 512B @ 0 in mbuf from pool3
    seg4 - 512B @ 0 in mbuf from pool3
    seg5 - 422B @ 0 in mbuf from pool3
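
The two layouts above can be reproduced with a small standalone model of
the split rules. This is only a sketch under stated assumptions:
rxseg_desc, rx_chunk and split_packet() are hypothetical names, the
mempool pointer is replaced by a plain pool index so that no DPDK headers
are needed, RTE_PKTMBUF_HEADROOM is assumed to be 128, and the "zero
length means deduce from the pool" case is not modelled.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_HEADROOM 128u /* assumed value of RTE_PKTMBUF_HEADROOM */

/* Illustrative stand-in for struct rte_eth_rxseg. */
struct rxseg_desc {
	int pool;        /* hypothetical pool index instead of a mempool */
	uint16_t length; /* maximal data length, the "split point" */
	uint16_t offset; /* data offset inside the mbuf data buffer */
};

/* One resulting segment of a received packet. */
struct rx_chunk {
	int pool;
	uint32_t length;
	uint32_t offset;
};

/*
 * Split pkt_len bytes according to desc[0..n_seg-1] and store the result
 * in out[]. Returns the number of chunks written (at most max_out).
 */
static size_t
split_packet(const struct rxseg_desc *desc, size_t n_seg, uint32_t pkt_len,
	     struct rx_chunk *out, size_t max_out)
{
	size_t n = 0;

	if (n_seg == 0)
		return 0;
	while (pkt_len > 0 && n < max_out) {
		/* Past the end of the array the last element is reused. */
		const struct rxseg_desc *d = &desc[n < n_seg ? n : n_seg - 1];
		uint32_t len = d->length;

		if (len == 0)
			break; /* "deduce from pool" is not modelled here */
		if (len > pkt_len)
			len = pkt_len;
		out[n].pool = d->pool;
		out[n].length = len;
		/* Offsets beyond the described segments are zero. */
		out[n].offset = n < n_seg ? d->offset : 0;
		/* Only the first mbuf keeps the headroom in front. */
		if (n == 0)
			out[n].offset += SKETCH_HEADROOM;
		pkt_len -= len;
		n++;
	}
	return n;
}
```

Calling split_packet() with the four descriptors listed above and packet
lengths of 46 and 1500 bytes yields three and six chunks respectively,
matching the layouts in the text.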

The offload RTE_ETH_RX_OFFLOAD_SCATTER must be present and
configured to support the new buffer split feature (if n_seg
is greater than one).

The new approach would allow splitting the ingress packets into
multiple parts pushed to memory with different attributes.
For example, the packet headers can be pushed to the embedded
data buffers within mbufs and the application data into
external buffers attached to mbufs allocated from
different memory pools. The memory attributes for the split
parts may differ as well - for example, the application data
may be pushed into external memory located on a dedicated
physical device, say GPU or NVMe. This would improve the
flexibility of the DPDK receive datapath while preserving
compatibility with the existing API.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>

---
v1: http://patches.dpdk.org/patch/79594/
v2: http://patches.dpdk.org/patch/79893/
    - add feature support to mlx5 PMD

v3: http://patches.dpdk.org/patch/80389/
    - rte_eth_rx_queue_setup_ex is renamed to rte_eth_rxseg_queue_setup
    - DEV_RX_OFFLOAD_BUFFER_SPLIT is renamed to
      RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT
    - commit message update
    - documentation provided
    - release notes update
    - minor bug fixes in testpmd related part

v4: - common part of rx_queue_setup/rxseg_queue_setup


Viacheslav Ovsiienko (9):
  ethdev: introduce Rx buffer split
  app/testpmd: add multiple pools per core creation
  app/testpmd: add buffer split offload configuration
  app/testpmd: add rxpkts commands and parameters
  app/testpmd: add extended Rx queue setup
  net/mlx5: add extended Rx queue setup routine
  net/mlx5: configure Rx queue to support split
  net/mlx5: register multiple pool for Rx queue
  net/mlx5: update Rx datapath to support split

 app/test-pmd/bpf_cmd.c                      |   4 +-
 app/test-pmd/cmdline.c                      |  96 +++++++++++---
 app/test-pmd/config.c                       |  63 ++++++++-
 app/test-pmd/parameters.c                   |  39 +++++-
 app/test-pmd/testpmd.c                      | 108 +++++++++++-----
 app/test-pmd/testpmd.h                      |  41 +++++-
 doc/guides/nics/features.rst                |  15 +++
 doc/guides/rel_notes/release_20_11.rst      |   6 +
 doc/guides/testpmd_app_ug/run_app.rst       |  16 ++-
 doc/guides/testpmd_app_ug/testpmd_funcs.rst |  21 ++-
 drivers/net/mlx5/linux/mlx5_os.c            |   2 +
 drivers/net/mlx5/mlx5.h                     |   3 +
 drivers/net/mlx5/mlx5_mr.c                  |   3 +
 drivers/net/mlx5/mlx5_rxq.c                 | 194 +++++++++++++++++++++++-----
 drivers/net/mlx5/mlx5_rxtx.c                |   3 +-
 drivers/net/mlx5/mlx5_rxtx.h                |  10 +-
 drivers/net/mlx5/mlx5_trigger.c             |  20 +--
 lib/librte_ethdev/ethdev_trace_points.c     |   3 +
 lib/librte_ethdev/rte_ethdev.c              | 133 +++++++++++++++----
 lib/librte_ethdev/rte_ethdev.h              | 107 +++++++++++++++
 lib/librte_ethdev/rte_ethdev_driver.h       |  10 ++
 lib/librte_ethdev/rte_ethdev_trace.h        |  19 +++
 lib/librte_ethdev/rte_ethdev_version.map    |   3 +
 23 files changed, 780 insertions(+), 139 deletions(-)

-- 
1.8.3.1



* [dpdk-dev] [PATCH v4 1/9] ethdev: introduce Rx buffer split
  2020-10-12 20:09 ` [dpdk-dev] [PATCH v4 0/9] ethdev: introduce Rx buffer split Viacheslav Ovsiienko
@ 2020-10-12 20:09   ` " Viacheslav Ovsiienko
  2020-10-12 20:09   ` [dpdk-dev] [PATCH v4 2/9] app/testpmd: add multiple pools per core creation Viacheslav Ovsiienko
                     ` (7 subsequent siblings)
  8 siblings, 0 replies; 172+ messages in thread
From: Viacheslav Ovsiienko @ 2020-10-12 20:09 UTC (permalink / raw)
  To: dev
  Cc: thomasm, stephen, ferruh.yigit, olivier.matz, jerinjacobk,
	maxime.coquelin, david.marchand, arybchenko

The DPDK datapath in the transmit direction is very flexible.
An application can build a multi-segment packet and manage
almost all data aspects - the memory pools the segments are
allocated from, the segment lengths, and memory attributes
such as external buffers registered for DMA.

In the receive direction the datapath is much less flexible:
an application can only specify the memory pool used to configure
the receive queue and nothing more. To extend the receive datapath
capabilities, it is proposed to add a way to provide extended
information on how to split the packets being received.

The following structure is introduced to specify the Rx packet
segment:

struct rte_eth_rxseg {
    struct rte_mempool *mp; /* memory pool to allocate segment from */
    uint16_t length; /* segment maximal data length,
                        configures "split point" */
    uint16_t offset; /* data offset from beginning
                        of mbuf data buffer */
    uint32_t reserved; /* reserved field */
};

The new routine rte_eth_rxseg_queue_setup() is introduced to
set up the given Rx queue using the new extended Rx packet segment
description:

int
rte_eth_rxseg_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
                          uint16_t nb_rx_desc, unsigned int socket_id,
                          const struct rte_eth_rxconf *rx_conf,
                          const struct rte_eth_rxseg *rx_seg,
                          uint16_t n_seg)

This routine takes two new parameters:
    rx_seg - pointer to the array of segment descriptions; each element
             describes the memory pool, maximal data length, and initial
             data offset from the beginning of the data buffer in the mbuf.
             This array allows specifying different settings for
             each segment individually.
    n_seg - number of elements in the array

The new offload flag RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT in device
capabilities is introduced so that a PMD can report to the
application that it supports splitting Rx packets into configurable
segments. Prior to invoking the rte_eth_rxseg_queue_setup() routine,
the application should check the RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT flag.

If the Rx queue is configured with the new routine, the packets being
received will be split into multiple segments pushed to mbufs
with the specified attributes. The PMD will split the received packets
into multiple segments according to the specification in the
description array:

- the first network buffer will be allocated from the memory pool
  specified in the first segment description element, the second
  network buffer from the pool in the second segment description
  element, and so on. If there are not enough elements to describe
  the buffers for an entire packet of maximal length, the pool from
  the last valid element will be used to allocate the buffers for
  the rest of the segments.

- the offsets from the segment description elements will provide
  the data offset from the buffer beginning, except for the first
  mbuf - for this one the offset is added to RTE_PKTMBUF_HEADROOM to
  get the actual offset from the buffer beginning. If there are not
  enough elements to describe the buffers for an entire packet of
  maximal length, the offsets for the rest of the segments are
  assumed to be zero.

- the data length received into each segment is limited by the
  length specified in the segment description element. Receiving
  starts by filling the first mbuf data buffer; if the specified
  maximal segment length is reached and data remain (the packet is
  longer than the buffer in the first mbuf), the following data
  will be pushed to the next segment up to its own maximal length.
  If the first two segments are not enough to store all the
  remaining packet data, the next (third) segment will be engaged,
  and so on. If the length in the segment description element is
  zero, the actual buffer size will be deduced from the appropriate
  memory pool properties. If there are not enough elements to
  describe the buffers for an entire packet of maximal length, the
  buffer size for the remaining segments will be deduced from the
  pool of the last valid element.
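
The offset and length rules above imply a per-segment size constraint on
the pool buffers: the mbuf data room must hold the segment length plus its
offset, and the first segment must additionally leave room for the
headroom. A minimal standalone sketch of that check follows. The names
seg_req and check_segments() are hypothetical, the data room is passed in
directly instead of being read from a mempool, and RTE_PKTMBUF_HEADROOM is
assumed to be 128.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_HEADROOM 128u /* assumed value of RTE_PKTMBUF_HEADROOM */

/* Hypothetical per-segment request; data_room stands for the pool's
 * mbuf data room size. */
struct seg_req {
	uint32_t data_room;
	uint16_t length;
	uint16_t offset;
};

/* Returns 0 if every segment fits into its pool's buffers. */
static int
check_segments(const struct seg_req *seg, size_t n_seg)
{
	size_t i;

	if (seg == NULL || n_seg == 0)
		return -EINVAL;
	for (i = 0; i < n_seg; i++) {
		/* Only the first mbuf reserves the headroom. */
		uint32_t need = (uint32_t)seg[i].length + seg[i].offset +
				(i == 0 ? SKETCH_HEADROOM : 0);

		if (seg[i].data_room < need)
			return -EINVAL;
	}
	return 0;
}
```

The same buffer that is too small for the first segment can still be
acceptable for a later one, because only the first mbuf carries the
headroom.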

For example, let's suppose we configured the Rx queue with the
following segments:
    seg0 - pool0, len0=14B, off0=2B
    seg1 - pool1, len1=20B, off1=128B
    seg2 - pool2, len2=20B, off2=0B
    seg3 - pool3, len3=512B, off3=0B

A packet 46 bytes long will look like the following:
    seg0 - 14B long @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from pool0
    seg1 - 20B long @ 128 in mbuf from pool1
    seg2 - 12B long @ 0 in mbuf from pool2

A packet 1500 bytes long will look like the following:
    seg0 - 14B @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from pool0
    seg1 - 20B @ 128 in mbuf from pool1
    seg2 - 20B @ 0 in mbuf from pool2
    seg3 - 512B @ 0 in mbuf from pool3
    seg4 - 512B @ 0 in mbuf from pool3
    seg5 - 422B @ 0 in mbuf from pool3

The offload RTE_ETH_RX_OFFLOAD_SCATTER must be present and
configured to support the new buffer split feature (if n_seg
is greater than one).
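
The capability and scatter requirements can be summarised as a small
pre-check an application might run before attempting the segmented setup.
This is a sketch only: buffer_split_allowed() is a hypothetical helper, and
the two SKETCH_* bit values are placeholders chosen for illustration, not
the real flag values from rte_ethdev.h.

```c
#include <errno.h>
#include <stdint.h>

/* Placeholder bit positions, assumed for this sketch only. */
#define SKETCH_RX_OFFLOAD_SCATTER      (UINT64_C(1) << 0)
#define SKETCH_RX_OFFLOAD_BUFFER_SPLIT (UINT64_C(1) << 1)

/*
 * rx_offload_capa: offloads the device reports as supported.
 * enabled_offloads: offloads actually configured for the port or queue.
 */
static int
buffer_split_allowed(uint64_t rx_offload_capa, uint64_t enabled_offloads,
		     uint16_t n_seg)
{
	if (n_seg == 0)
		return -EINVAL;
	/* The PMD must advertise buffer split support. */
	if (!(rx_offload_capa & SKETCH_RX_OFFLOAD_BUFFER_SPLIT))
		return -ENOTSUP;
	/* More than one segment requires scattered Rx to be enabled. */
	if (n_seg > 1 && !(enabled_offloads & SKETCH_RX_OFFLOAD_SCATTER))
		return -EINVAL;
	return 0;
}
```

In a real application the capability mask would come from the device info
and the enabled mask from the port and queue configuration.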

The new approach would allow splitting the ingress packets into
multiple parts pushed to memory with different attributes.
For example, the packet headers can be pushed to the embedded
data buffers within mbufs and the application data into
external buffers attached to mbufs allocated from
different memory pools. The memory attributes for the split
parts may differ as well - for example, the application data
may be pushed into external memory located on a dedicated
physical device, say GPU or NVMe. This would improve the
flexibility of the DPDK receive datapath while preserving
compatibility with the existing API.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
 doc/guides/nics/features.rst             |  15 ++++
 doc/guides/rel_notes/release_20_11.rst   |   6 ++
 lib/librte_ethdev/ethdev_trace_points.c  |   3 +
 lib/librte_ethdev/rte_ethdev.c           | 133 ++++++++++++++++++++++++-------
 lib/librte_ethdev/rte_ethdev.h           | 107 +++++++++++++++++++++++++
 lib/librte_ethdev/rte_ethdev_driver.h    |  10 +++
 lib/librte_ethdev/rte_ethdev_trace.h     |  19 +++++
 lib/librte_ethdev/rte_ethdev_version.map |   3 +
 8 files changed, 268 insertions(+), 28 deletions(-)

diff --git a/doc/guides/nics/features.rst b/doc/guides/nics/features.rst
index dd8c955..21b91db 100644
--- a/doc/guides/nics/features.rst
+++ b/doc/guides/nics/features.rst
@@ -185,6 +185,21 @@ Supports receiving segmented mbufs.
 * **[related]    eth_dev_ops**: ``rx_pkt_burst``.
 
 
+.. _nic_features_buffer_split:
+
+Buffer Split on Rx
+------------------
+
+Scatters the packets being received on specified boundaries to segmented mbufs.
+
+* **[uses]       rte_eth_rxconf,rte_eth_rxmode**: ``offloads:RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT``.
+* **[implements] datapath**: ``Buffer Split functionality``.
+* **[implements] rte_eth_dev_data**: ``buffer_split``.
+* **[provides]   rte_eth_dev_info**: ``rx_offload_capa:RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT``.
+* **[provides]   eth_dev_ops**: ``rxq_info_get:buffer_split``.
+* **[related] API**: ``rte_eth_rxseg_queue_setup()``.
+
+
 .. _nic_features_lro:
 
 LRO
diff --git a/doc/guides/rel_notes/release_20_11.rst b/doc/guides/rel_notes/release_20_11.rst
index bcc0fc2..06a35c1 100644
--- a/doc/guides/rel_notes/release_20_11.rst
+++ b/doc/guides/rel_notes/release_20_11.rst
@@ -60,6 +60,12 @@ New Features
   Added the FEC API which provides functions for query FEC capabilities and
   current FEC mode from device. Also, API for configuring FEC mode is also provided.
 
+* **Introduced extended buffer description for receiving.**
+
+  Added the extended Rx queue setup routine providing the individual
+  descriptions for each Rx segment with maximal size, buffer offset and memory
+  pool to allocate data buffers from.
+
 * **Updated Broadcom bnxt driver.**
 
   Updated the Broadcom bnxt driver with new features and improvements, including:
diff --git a/lib/librte_ethdev/ethdev_trace_points.c b/lib/librte_ethdev/ethdev_trace_points.c
index 2919409..0ec8fc4 100644
--- a/lib/librte_ethdev/ethdev_trace_points.c
+++ b/lib/librte_ethdev/ethdev_trace_points.c
@@ -12,6 +12,9 @@
 RTE_TRACE_POINT_REGISTER(rte_ethdev_trace_rxq_setup,
 	lib.ethdev.rxq.setup)
 
+RTE_TRACE_POINT_REGISTER(rte_ethdev_trace_rxq_seg_setup,
+	lib.ethdev.rxq.setup)
+
 RTE_TRACE_POINT_REGISTER(rte_ethdev_trace_txq_setup,
 	lib.ethdev.txq.setup)
 
diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
index 892c246..579acf9 100644
--- a/lib/librte_ethdev/rte_ethdev.c
+++ b/lib/librte_ethdev/rte_ethdev.c
@@ -105,6 +105,9 @@ struct rte_eth_xstats_name_off {
 #define RTE_RX_OFFLOAD_BIT2STR(_name)	\
 	{ DEV_RX_OFFLOAD_##_name, #_name }
 
+#define RTE_ETH_RX_OFFLOAD_BIT2STR(_name)	\
+	{ RTE_ETH_RX_OFFLOAD_##_name, #_name }
+
 static const struct {
 	uint64_t offload;
 	const char *name;
@@ -128,9 +131,11 @@ struct rte_eth_xstats_name_off {
 	RTE_RX_OFFLOAD_BIT2STR(SCTP_CKSUM),
 	RTE_RX_OFFLOAD_BIT2STR(OUTER_UDP_CKSUM),
 	RTE_RX_OFFLOAD_BIT2STR(RSS_HASH),
+	RTE_ETH_RX_OFFLOAD_BIT2STR(BUFFER_SPLIT),
 };
 
 #undef RTE_RX_OFFLOAD_BIT2STR
+#undef RTE_ETH_RX_OFFLOAD_BIT2STR
 
 #define RTE_TX_OFFLOAD_BIT2STR(_name)	\
 	{ DEV_TX_OFFLOAD_##_name, #_name }
@@ -1763,13 +1768,14 @@ struct rte_eth_dev *
 	return ret;
 }
 
-int
-rte_eth_rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
-		       uint16_t nb_rx_desc, unsigned int socket_id,
-		       const struct rte_eth_rxconf *rx_conf,
-		       struct rte_mempool *mp)
+static int
+__rte_eth_rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
+			 uint16_t nb_rx_desc, unsigned int socket_id,
+			 const struct rte_eth_rxconf *rx_conf,
+			 const struct rte_eth_rxseg *rx_seg, uint16_t n_seg)
 {
-	int ret;
+	int ret, ext;
+	uint16_t seg_idx;
 	uint32_t mbp_buf_size;
 	struct rte_eth_dev *dev;
 	struct rte_eth_dev_info dev_info;
@@ -1784,12 +1790,23 @@ struct rte_eth_dev *
 		return -EINVAL;
 	}
 
-	if (mp == NULL) {
-		RTE_ETHDEV_LOG(ERR, "Invalid null mempool pointer\n");
+	if (rx_seg == NULL) {
+		RTE_ETHDEV_LOG(ERR, "Invalid null description pointer\n");
+		return -EINVAL;
+	}
+
+	if (n_seg == 0) {
+		RTE_ETHDEV_LOG(ERR, "Invalid zero description number\n");
 		return -EINVAL;
 	}
 
-	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_setup, -ENOTSUP);
+	ext = rx_seg[0].length || rx_seg[0].offset || n_seg > 1;
+	if (ext)
+		RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rxseg_queue_setup,
+					-ENOTSUP);
+	else
+		RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_setup,
+					-ENOTSUP);
 
 	/*
 	 * Check the size of the mbuf data buffer.
@@ -1800,22 +1817,48 @@ struct rte_eth_dev *
 	if (ret != 0)
 		return ret;
 
-	if (mp->private_data_size < sizeof(struct rte_pktmbuf_pool_private)) {
-		RTE_ETHDEV_LOG(ERR, "%s private_data_size %d < %d\n",
-			mp->name, (int)mp->private_data_size,
-			(int)sizeof(struct rte_pktmbuf_pool_private));
-		return -ENOSPC;
-	}
-	mbp_buf_size = rte_pktmbuf_data_room_size(mp);
+	for (seg_idx = 0; seg_idx < n_seg; seg_idx++) {
+		struct rte_mempool *mp = rx_seg[seg_idx].mp;
+		uint32_t length = rx_seg[seg_idx].length;
+		uint32_t offset = rx_seg[seg_idx].offset;
+		uint32_t head_room = seg_idx ? 0 : RTE_PKTMBUF_HEADROOM;
 
-	if (mbp_buf_size < dev_info.min_rx_bufsize + RTE_PKTMBUF_HEADROOM) {
-		RTE_ETHDEV_LOG(ERR,
-			"%s mbuf_data_room_size %d < %d (RTE_PKTMBUF_HEADROOM=%d + min_rx_bufsize(dev)=%d)\n",
-			mp->name, (int)mbp_buf_size,
-			(int)(RTE_PKTMBUF_HEADROOM + dev_info.min_rx_bufsize),
-			(int)RTE_PKTMBUF_HEADROOM,
-			(int)dev_info.min_rx_bufsize);
-		return -EINVAL;
+		if (mp == NULL) {
+			RTE_ETHDEV_LOG(ERR, "Invalid null mempool pointer\n");
+			return -EINVAL;
+		}
+
+		if (mp->private_data_size <
+				sizeof(struct rte_pktmbuf_pool_private)) {
+			RTE_ETHDEV_LOG(ERR, "%s private_data_size %d < %d\n",
+				mp->name, (int)mp->private_data_size,
+				(int)sizeof(struct rte_pktmbuf_pool_private));
+			return -ENOSPC;
+		}
+
+		mbp_buf_size = rte_pktmbuf_data_room_size(mp);
+		length = length ? length : (mbp_buf_size - head_room);
+		if (mbp_buf_size < length + offset + head_room) {
+			RTE_ETHDEV_LOG(ERR,
+				"%s mbuf_data_room_size %u < %u"
+				" (segment length=%u + segment offset=%u)\n",
+				mp->name, mbp_buf_size,
+				length + offset, length, offset);
+			return -EINVAL;
+		}
+		if (!ext && (mbp_buf_size < dev_info.min_rx_bufsize +
+					    RTE_PKTMBUF_HEADROOM)) {
+			RTE_ETHDEV_LOG(ERR,
+				       "%s mbuf_data_room_size %u < %u "
+				       "(RTE_PKTMBUF_HEADROOM=%u + "
+				       "min_rx_bufsize(dev)=%u)\n",
+				       mp->name, mbp_buf_size,
+				       (RTE_PKTMBUF_HEADROOM +
+				       dev_info.min_rx_bufsize),
+				       RTE_PKTMBUF_HEADROOM,
+				       dev_info.min_rx_bufsize);
+			return -EINVAL;
+		}
 	}
 
 	/* Use default specified by driver, if nb_rx_desc is zero */
@@ -1906,20 +1949,54 @@ struct rte_eth_dev *
 			return ret;
 	}
 
-	ret = (*dev->dev_ops->rx_queue_setup)(dev, rx_queue_id, nb_rx_desc,
-					      socket_id, &local_conf, mp);
+	ret = ext ?
+	      (*dev->dev_ops->rxseg_queue_setup)(dev, rx_queue_id, nb_rx_desc,
+						 socket_id, &local_conf,
+						 rx_seg, n_seg) :
+	      (*dev->dev_ops->rx_queue_setup)(dev, rx_queue_id, nb_rx_desc,
+					      socket_id, &local_conf,
+					      rx_seg[0].mp);
 	if (!ret) {
 		if (!dev->data->min_rx_buf_size ||
 		    dev->data->min_rx_buf_size > mbp_buf_size)
 			dev->data->min_rx_buf_size = mbp_buf_size;
 	}
 
-	rte_ethdev_trace_rxq_setup(port_id, rx_queue_id, nb_rx_desc, mp,
-		rx_conf, ret);
 	return eth_err(port_id, ret);
 }
 
 int
+rte_eth_rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
+		       uint16_t nb_rx_desc, unsigned int socket_id,
+		       const struct rte_eth_rxconf *rx_conf,
+		       struct rte_mempool *mp)
+{
+	struct rte_eth_rxseg rx_seg = {.mp = mp};
+	int ret;
+
+	ret = __rte_eth_rx_queue_setup(port_id, rx_queue_id, nb_rx_desc,
+				       socket_id, rx_conf, &rx_seg, 1);
+	rte_ethdev_trace_rxq_setup(port_id, rx_queue_id, nb_rx_desc,
+				   mp, rx_conf, ret);
+	return ret;
+}
+
+int
+rte_eth_rxseg_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
+			  uint16_t nb_rx_desc, unsigned int socket_id,
+			  const struct rte_eth_rxconf *rx_conf,
+			  const struct rte_eth_rxseg *rx_seg, uint16_t n_seg)
+{
+	int ret;
+
+	ret = __rte_eth_rx_queue_setup(port_id, rx_queue_id, nb_rx_desc,
+				       socket_id, rx_conf, rx_seg, n_seg);
+	rte_ethdev_trace_rxq_seg_setup(port_id, rx_queue_id, nb_rx_desc,
+				       rx_conf, rx_seg, n_seg, ret);
+	return ret;
+}
+
+int
 rte_eth_rx_hairpin_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
 			       uint16_t nb_rx_desc,
 			       const struct rte_eth_hairpin_conf *conf)
diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
index 5bcfbb8..2596f6e 100644
--- a/lib/librte_ethdev/rte_ethdev.h
+++ b/lib/librte_ethdev/rte_ethdev.h
@@ -970,6 +970,16 @@ struct rte_eth_txmode {
 };
 
 /**
+ * A structure used to configure an RX packet segment to split.
+ */
+struct rte_eth_rxseg {
+	struct rte_mempool *mp; /**< Memory pools to allocate segment from */
+	uint16_t length; /**< Segment data length, configures split point. */
+	uint16_t offset; /**< Data offset from beginning of mbuf data buffer */
+	uint32_t reserved; /**< Reserved field */
+};
+
+/**
  * A structure used to configure an RX ring of an Ethernet port.
  */
 struct rte_eth_rxconf {
@@ -1260,6 +1270,7 @@ struct rte_eth_conf {
 #define DEV_RX_OFFLOAD_SCTP_CKSUM	0x00020000
 #define DEV_RX_OFFLOAD_OUTER_UDP_CKSUM  0x00040000
 #define DEV_RX_OFFLOAD_RSS_HASH		0x00080000
+#define RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT 0x00100000
 
 #define DEV_RX_OFFLOAD_CHECKSUM (DEV_RX_OFFLOAD_IPV4_CKSUM | \
 				 DEV_RX_OFFLOAD_UDP_CKSUM | \
@@ -2044,6 +2055,102 @@ int rte_eth_rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
 		uint16_t nb_rx_desc, unsigned int socket_id,
 		const struct rte_eth_rxconf *rx_conf,
 		struct rte_mempool *mb_pool);
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Allocate and set up a receive queue for an Ethernet device
+ * with specifying receiving segments parameters.
+ *
+ * The function allocates a contiguous block of memory for *nb_rx_desc*
+ * receive descriptors from a memory zone associated with *socket_id*.
+ * The descriptors might be divided into groups by PMD to receive the data
+ * into multi-segment packet presented by the chain of mbufs.
+ *
+ * Each descriptor within the group is initialized accordingly with
+ * the network buffers allocated from the specified memory pool and with
+ * specified buffer offset and maximal segment length.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param rx_queue_id
+ *   The index of the receive queue to set up.
+ *   The value must be in the range [0, nb_rx_queue - 1] previously supplied
+ *   to rte_eth_dev_configure().
+ * @param nb_rx_desc
+ *   The number of receive descriptors to allocate for the receive ring.
+ * @param socket_id
+ *   The *socket_id* argument is the socket identifier in case of NUMA.
+ *   The value can be *SOCKET_ID_ANY* if there is no NUMA constraint for
+ *   the DMA memory allocated for the receive descriptors of the ring.
+ * @param rx_conf
+ *   The pointer to the configuration data to be used for the receive queue.
+ *   NULL value is allowed, in which case default RX configuration
+ *   will be used.
+ *   The *rx_conf* structure contains an *rx_thresh* structure with the values
+ *   of the Prefetch, Host, and Write-Back threshold registers of the receive
+ *   ring.
+ *   In addition it contains the hardware offloads features to activate using
+ *   the DEV_RX_OFFLOAD_* flags.
+ *   If an offloading set in rx_conf->offloads
+ *   hasn't been set in the input argument eth_conf->rxmode.offloads
+ *   to rte_eth_dev_configure(), it is a new added offloading, it must be
+ *   per-queue type and it is enabled for the queue.
+ *   No need to repeat any bit in rx_conf->offloads which has already been
+ *   enabled in rte_eth_dev_configure() at port level. An offloading enabled
+ *   at port level can't be disabled at queue level.
+ * @param rx_seg
+ *   The pointer to the array of segment descriptions, each element describes
+ *   the memory pool, maximal segment data length, initial data offset from
+ *   the beginning of data buffer in mbuf. This allows specifying dedicated
+ *   properties for each segment in the receiving buffer - pool, buffer
+ *   offset, maximal segment size. If RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT offload
+ *   flag is configured the PMD will split the received packets into multiple
+ *   segments according to the specification in the description array:
+ *   - the first network buffer will be allocated from the memory pool,
+ *     specified in the first segment description element, the second
+ *     network buffer - from the pool in the second segment description
+ *     element and so on. If there are not enough elements to describe
+ *     the buffer for entire packet of maximal length the pool from the last
+ *     valid element will be used to allocate the buffers for the rest
+ *     of the segments.
+ *   - the offsets from the segment description elements will provide the
+ *     data offset from the buffer beginning except the first mbuf - for this
+ *     one the offset is added to the RTE_PKTMBUF_HEADROOM to get actual
+ *     offset from the buffer beginning. If there are not enough elements
+ *     to describe the buffer for entire packet of maximal length the offsets
+ *     for the remaining segments are assumed to be zero.
+ *   - the data length being received to each segment is limited by the
+ *     length specified in the segment description element. The data receiving
+ *     starts with filling up the first mbuf data buffer, if the specified
+ *     maximal segment length is reached and there are data remaining
+ *     (packet is longer than buffer in the first mbuf) the following data
+ *     will be pushed to the next segment up to its own length. If the first
+ *     two segments are not enough to store all the packet data the next
+ *     (third) segment will be engaged and so on. If the length in the segment
+ *     description element is zero the actual buffer size will be deduced
+ *     from the appropriate memory pool properties. If there are not enough
+ *     elements to describe the buffer for entire packet of maximal length
+ *     the buffer size will be deduced from the pool of the last valid
+ *     element for all the remaining segments.
+ * @param n_seg
+ *   The number of elements in the segment description array.
+ * @return
+ *   - 0: Success, receive queue correctly set up.
+ *   - -EIO: if device is removed.
+ *   - -EINVAL: The segment descriptors array is empty (pointer to is null or
+ *      zero number of elements) or the size of network buffers which can be
+ *      allocated from this memory pool does not fit the various buffer sizes
+ *      allowed by the device controller.
+ *   - -ENOMEM: Unable to allocate the receive ring descriptors or to
+ *      allocate network memory buffers from the memory pool when
+ *      initializing receive descriptors.
+ */
+__rte_experimental
+int rte_eth_rxseg_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
+		uint16_t nb_rx_desc, unsigned int socket_id,
+		const struct rte_eth_rxconf *rx_conf,
+		const struct rte_eth_rxseg *rx_seg, uint16_t n_seg);
 
 /**
  * @warning
diff --git a/lib/librte_ethdev/rte_ethdev_driver.h b/lib/librte_ethdev/rte_ethdev_driver.h
index 35cc4fb..5dee210 100644
--- a/lib/librte_ethdev/rte_ethdev_driver.h
+++ b/lib/librte_ethdev/rte_ethdev_driver.h
@@ -264,6 +264,15 @@ typedef int (*eth_rx_queue_setup_t)(struct rte_eth_dev *dev,
 				    struct rte_mempool *mb_pool);
 /**< @internal Set up a receive queue of an Ethernet device. */
 
+typedef int (*eth_rxseg_queue_setup_t)(struct rte_eth_dev *dev,
+				       uint16_t rx_queue_id,
+				       uint16_t nb_rx_desc,
+				       unsigned int socket_id,
+				       const struct rte_eth_rxconf *rx_conf,
+				       const struct rte_eth_rxseg *rx_seg,
+				       uint16_t n_seg);
+/**< @internal extended Set up a receive queue of an Ethernet device. */
+
 typedef int (*eth_tx_queue_setup_t)(struct rte_eth_dev *dev,
 				    uint16_t tx_queue_id,
 				    uint16_t nb_tx_desc,
@@ -711,6 +720,7 @@ struct eth_dev_ops {
 	eth_queue_start_t          tx_queue_start;/**< Start TX for a queue. */
 	eth_queue_stop_t           tx_queue_stop; /**< Stop TX for a queue. */
 	eth_rx_queue_setup_t       rx_queue_setup;/**< Set up device RX queue. */
+	eth_rxseg_queue_setup_t    rxseg_queue_setup;/**< Extended RX setup. */
 	eth_queue_release_t        rx_queue_release; /**< Release RX queue. */
 
 	eth_rx_enable_intr_t       rx_queue_intr_enable;  /**< Enable Rx queue interrupt. */
diff --git a/lib/librte_ethdev/rte_ethdev_trace.h b/lib/librte_ethdev/rte_ethdev_trace.h
index 16f5bf2..7341ae9 100644
--- a/lib/librte_ethdev/rte_ethdev_trace.h
+++ b/lib/librte_ethdev/rte_ethdev_trace.h
@@ -55,6 +55,25 @@
 )
 
 RTE_TRACE_POINT(
+	rte_ethdev_trace_rxq_seg_setup,
+	RTE_TRACE_POINT_ARGS(uint16_t port_id, uint16_t rx_queue_id,
+		uint16_t nb_rx_desc, const struct rte_eth_rxconf *rx_conf,
+		const struct rte_eth_rxseg *rx_seg, uint16_t n_seg, int rc),
+	rte_trace_point_emit_u16(port_id);
+	rte_trace_point_emit_u16(rx_queue_id);
+	rte_trace_point_emit_u16(nb_rx_desc);
+	rte_trace_point_emit_u8(rx_conf->rx_thresh.pthresh);
+	rte_trace_point_emit_u8(rx_conf->rx_thresh.hthresh);
+	rte_trace_point_emit_u8(rx_conf->rx_thresh.wthresh);
+	rte_trace_point_emit_u8(rx_conf->rx_drop_en);
+	rte_trace_point_emit_u8(rx_conf->rx_deferred_start);
+	rte_trace_point_emit_u64(rx_conf->offloads);
+	rte_trace_point_emit_ptr(rx_seg);
+	rte_trace_point_emit_u16(n_seg);
+	rte_trace_point_emit_int(rc);
+)
+
+RTE_TRACE_POINT(
 	rte_ethdev_trace_txq_setup,
 	RTE_TRACE_POINT_ARGS(uint16_t port_id, uint16_t tx_queue_id,
 		uint16_t nb_tx_desc, const struct rte_eth_txconf *tx_conf),
diff --git a/lib/librte_ethdev/rte_ethdev_version.map b/lib/librte_ethdev/rte_ethdev_version.map
index f8a0945..848438b 100644
--- a/lib/librte_ethdev/rte_ethdev_version.map
+++ b/lib/librte_ethdev/rte_ethdev_version.map
@@ -195,6 +195,7 @@ EXPERIMENTAL {
 	rte_flow_get_aged_flows;
 
 	# Marked as experimental in 20.11
+	rte_eth_rxseg_queue_setup;
 	rte_tm_capabilities_get;
 	rte_tm_get_number_of_leaf_nodes;
 	rte_tm_hierarchy_commit;
@@ -232,6 +233,8 @@ EXPERIMENTAL {
 	rte_eth_fec_get_capability;
 	rte_eth_fec_get;
 	rte_eth_fec_set;
+	__rte_ethdev_trace_rxq_seg_setup;
+
 };
 
 INTERNAL {
-- 
1.8.3.1



* [dpdk-dev] [PATCH v4 2/9] app/testpmd: add multiple pools per core creation
  2020-10-12 20:09 ` [dpdk-dev] [PATCH v4 0/9] ethdev: introduce Rx buffer split Viacheslav Ovsiienko
  2020-10-12 20:09   ` [dpdk-dev] [PATCH v4 1/9] " Viacheslav Ovsiienko
@ 2020-10-12 20:09   ` Viacheslav Ovsiienko
  2020-10-12 20:09   ` [dpdk-dev] [PATCH v4 3/9] app/testpmd: add buffer split offload configuration Viacheslav Ovsiienko
                     ` (6 subsequent siblings)
  8 siblings, 0 replies; 172+ messages in thread
From: Viacheslav Ovsiienko @ 2020-10-12 20:09 UTC (permalink / raw)
  To: dev
  Cc: thomasm, stephen, ferruh.yigit, olivier.matz, jerinjacobk,
	maxime.coquelin, david.marchand, arybchenko

The --mbuf-size command line parameter is updated to accept
multiple values, for example:

--mbuf-size=2176,512,768,4096

This requests the creation of extra memory pools with the given
mbuf data buffer sizes. If a buffer split feature is engaged,
the extra memory pools can be used to configure the Rx queues
with rte_eth_dev_rx_queue_setup_ex().
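
To make the accepted syntax concrete, here is a minimal standalone
sketch of parsing such a comma-separated size list. It is not the
testpmd code (the patch uses parse_item_list() for this); the names
parse_mbuf_sizes and MAX_SIZES are hypothetical, with the limit of 8
and the 1..65535 range taken from the patch.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_SIZES 8 /* assumed to mirror MAX_SEGS_BUFFER_SPLIT */

/*
 * Parse "N[,N1[,...Nn]]" into sizes[]. Every value must be in the
 * range 1..65535. Returns the number of values parsed, or -1 on a
 * malformed list, an out-of-range value, or more than MAX_SIZES
 * values.
 */
int
parse_mbuf_sizes(const char *arg, uint16_t sizes[MAX_SIZES])
{
	int n = 0;

	if (arg == NULL)
		return -1;
	for (;;) {
		char *end;
		unsigned long v;

		if (n == MAX_SIZES)
			return -1;
		errno = 0;
		v = strtoul(arg, &end, 10);
		/* Reject empty items, overflow, zero and values > 65535. */
		if (end == arg || errno != 0 || v == 0 || v > 0xFFFF)
			return -1;
		sizes[n++] = (uint16_t)v;
		if (*end == '\0')
			return n;
		if (*end != ',')
			return -1;
		arg = end + 1;
	}
}
```

In this sketch the first value plays the role of the mandatory pool
size and any further values describe the extra pools.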

The extra pools are created with the requested sizes, and the pool
names get an appended index: mbuf_pool_socket_%socket_%index.
Index zero denotes the first, mandatory pool; it keeps the original
name to maintain compatibility with existing code.
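
The naming rule and the flat pool-table layout can be sketched as
follows. This mirrors the logic visible in the diff below but is a
simplified standalone version: pool_name_build, pool_slot and
POOLS_PER_SOCKET are hypothetical names chosen for the example.

```c
#include <stdio.h>

#define POOLS_PER_SOCKET 8 /* assumed to mirror MAX_SEGS_BUFFER_SPLIT */

/*
 * Build the pool name for a socket and size index. Index zero keeps
 * the name without a suffix; other indexes are appended after an
 * underscore.
 */
void
pool_name_build(unsigned int sock_id, unsigned int idx,
		char *name, size_t name_size)
{
	if (idx == 0)
		snprintf(name, name_size, "mbuf_pool_socket_%u", sock_id);
	else
		snprintf(name, name_size, "mbuf_pool_socket_%u_%u",
			 sock_id, idx);
}

/*
 * Slot in a flat pool table that reserves POOLS_PER_SOCKET entries
 * for each socket position.
 */
unsigned int
pool_slot(unsigned int socket_pos, unsigned int idx)
{
	return socket_pos * POOLS_PER_SOCKET + idx;
}
```

Because index zero maps to the unsuffixed name, a lookup written for
the single-pool layout would still find the first pool.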

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
 app/test-pmd/bpf_cmd.c                |  4 +--
 app/test-pmd/cmdline.c                |  2 +-
 app/test-pmd/config.c                 |  6 ++--
 app/test-pmd/parameters.c             | 24 +++++++++----
 app/test-pmd/testpmd.c                | 63 +++++++++++++++++++----------------
 app/test-pmd/testpmd.h                | 24 ++++++++++---
 doc/guides/testpmd_app_ug/run_app.rst |  7 ++--
 7 files changed, 83 insertions(+), 47 deletions(-)

diff --git a/app/test-pmd/bpf_cmd.c b/app/test-pmd/bpf_cmd.c
index 16e3c3b..0a1a178 100644
--- a/app/test-pmd/bpf_cmd.c
+++ b/app/test-pmd/bpf_cmd.c
@@ -69,7 +69,7 @@ struct cmd_bpf_ld_result {
 
 	*flags = RTE_BPF_ETH_F_NONE;
 	arg->type = RTE_BPF_ARG_PTR;
-	arg->size = mbuf_data_size;
+	arg->size = mbuf_data_size[0];
 
 	for (i = 0; str[i] != 0; i++) {
 		v = toupper(str[i]);
@@ -78,7 +78,7 @@ struct cmd_bpf_ld_result {
 		else if (v == 'M') {
 			arg->type = RTE_BPF_ARG_PTR_MBUF;
 			arg->size = sizeof(struct rte_mbuf);
-			arg->buf_size = mbuf_data_size;
+			arg->buf_size = mbuf_data_size[0];
 		} else if (v == '-')
 			continue;
 		else
diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 273fb1a..a585cf0 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -2907,7 +2907,7 @@ struct cmd_setup_rxtx_queue {
 		if (!numa_support || socket_id == NUMA_NO_CONFIG)
 			socket_id = port->socket_id;
 
-		mp = mbuf_pool_find(socket_id);
+		mp = mbuf_pool_find(socket_id, 0);
 		if (mp == NULL) {
 			printf("Failed to setup RX queue: "
 				"No mempool allocation"
diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index d4be694..5f501f6 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -690,7 +690,7 @@ static int bus_match_all(const struct rte_bus *bus, const void *data)
 	printf("\nConnect to socket: %u", port->socket_id);
 
 	if (port_numa[port_id] != NUMA_NO_CONFIG) {
-		mp = mbuf_pool_find(port_numa[port_id]);
+		mp = mbuf_pool_find(port_numa[port_id], 0);
 		if (mp)
 			printf("\nmemory allocation on the socket: %d",
 							port_numa[port_id]);
@@ -3352,9 +3352,9 @@ struct igb_ring_desc_16_bytes {
 	 */
 	tx_pkt_len = 0;
 	for (i = 0; i < nb_segs; i++) {
-		if (seg_lengths[i] > (unsigned) mbuf_data_size) {
+		if (seg_lengths[i] > mbuf_data_size[0]) {
 			printf("length[%u]=%u > mbuf_data_size=%u - give up\n",
-			       i, seg_lengths[i], (unsigned) mbuf_data_size);
+			       i, seg_lengths[i], mbuf_data_size[0]);
 			return;
 		}
 		tx_pkt_len = (uint16_t)(tx_pkt_len + seg_lengths[i]);
diff --git a/app/test-pmd/parameters.c b/app/test-pmd/parameters.c
index 15ce8c1..4db4987 100644
--- a/app/test-pmd/parameters.c
+++ b/app/test-pmd/parameters.c
@@ -106,7 +106,9 @@
 	       "(flag: 1 for RX; 2 for TX; 3 for RX and TX).\n");
 	printf("  --socket-num=N: set socket from which all memory is allocated "
 	       "in NUMA mode.\n");
-	printf("  --mbuf-size=N: set the data size of mbuf to N bytes.\n");
+	printf("  --mbuf-size=N[,N1[,...Nn]]: set the data size of mbuf to "
+	       "N bytes. If multiple numbers are specified the extra pools "
+	       "will be created to receive with packet split features\n");
 	printf("  --total-num-mbufs=N: set the number of mbufs to be allocated "
 	       "in mbuf pools.\n");
 	printf("  --max-pkt-len=N: set the maximum size of packet to N bytes.\n");
@@ -892,12 +894,22 @@
 				}
 			}
 			if (!strcmp(lgopts[opt_idx].name, "mbuf-size")) {
-				n = atoi(optarg);
-				if (n > 0 && n <= 0xFFFF)
-					mbuf_data_size = (uint16_t) n;
-				else
+				unsigned int mb_sz[MAX_SEGS_BUFFER_SPLIT];
+				unsigned int nb_segs, i;
+
+				nb_segs = parse_item_list(optarg, "mbuf-size",
+					MAX_SEGS_BUFFER_SPLIT, mb_sz, 0);
+				if (nb_segs <= 0)
 					rte_exit(EXIT_FAILURE,
-						 "mbuf-size should be > 0 and < 65536\n");
+						 "bad mbuf-size\n");
+				for (i = 0; i < nb_segs; i++) {
+					if (mb_sz[i] <= 0 || mb_sz[i] > 0xFFFF)
+						rte_exit(EXIT_FAILURE,
+							 "mbuf-size should be "
+							 "> 0 and < 65536\n");
+					mbuf_data_size[i] = (uint16_t) mb_sz[i];
+				}
+				mbuf_data_size_n = nb_segs;
 			}
 			if (!strcmp(lgopts[opt_idx].name, "total-num-mbufs")) {
 				n = atoi(optarg);
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index ccba71c..7e6ef80 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -186,7 +186,7 @@ struct fwd_engine * fwd_engines[] = {
 	NULL,
 };
 
-struct rte_mempool *mempools[RTE_MAX_NUMA_NODES];
+struct rte_mempool *mempools[RTE_MAX_NUMA_NODES * MAX_SEGS_BUFFER_SPLIT];
 uint16_t mempool_flags;
 
 struct fwd_config cur_fwd_config;
@@ -195,7 +195,10 @@ struct fwd_engine * fwd_engines[] = {
 uint32_t burst_tx_delay_time = BURST_TX_WAIT_US;
 uint32_t burst_tx_retry_num = BURST_TX_RETRIES;
 
-uint16_t mbuf_data_size = DEFAULT_MBUF_DATA_SIZE; /**< Mbuf data space size. */
+uint32_t mbuf_data_size_n = 1; /* Number of specified mbuf sizes. */
+uint16_t mbuf_data_size[MAX_SEGS_BUFFER_SPLIT] = {
+	DEFAULT_MBUF_DATA_SIZE
+}; /**< Mbuf data space size. */
 uint32_t param_total_num_mbufs = 0;  /**< number of mbufs in all pools - if
                                       * specified on command-line. */
 uint16_t stats_period; /**< Period to show statistics (disabled by default) */
@@ -955,14 +958,14 @@ struct extmem_param {
  */
 static struct rte_mempool *
 mbuf_pool_create(uint16_t mbuf_seg_size, unsigned nb_mbuf,
-		 unsigned int socket_id)
+		 unsigned int socket_id, unsigned int size_idx)
 {
 	char pool_name[RTE_MEMPOOL_NAMESIZE];
 	struct rte_mempool *rte_mp = NULL;
 	uint32_t mb_size;
 
 	mb_size = sizeof(struct rte_mbuf) + mbuf_seg_size;
-	mbuf_poolname_build(socket_id, pool_name, sizeof(pool_name));
+	mbuf_poolname_build(socket_id, pool_name, sizeof(pool_name), size_idx);
 
 	TESTPMD_LOG(INFO,
 		"create a new mbuf pool <%s>: n=%u, size=%u, socket=%u\n",
@@ -1485,8 +1488,8 @@ struct extmem_param {
 				port->dev_info.rx_desc_lim.nb_mtu_seg_max;
 
 			if ((data_size + RTE_PKTMBUF_HEADROOM) >
-							mbuf_data_size) {
-				mbuf_data_size = data_size +
+							mbuf_data_size[0]) {
+				mbuf_data_size[0] = data_size +
 						 RTE_PKTMBUF_HEADROOM;
 				warning = 1;
 			}
@@ -1494,9 +1497,9 @@ struct extmem_param {
 	}
 
 	if (warning)
-		TESTPMD_LOG(WARNING, "Configured mbuf size %hu\n",
-			    mbuf_data_size);
-
+		TESTPMD_LOG(WARNING,
+			    "Configured mbuf size of the first segment %hu\n",
+			    mbuf_data_size[0]);
 	/*
 	 * Create pools of mbuf.
 	 * If NUMA support is disabled, create a single pool of mbuf in
@@ -1516,21 +1519,23 @@ struct extmem_param {
 	}
 
 	if (numa_support) {
-		uint8_t i;
+		uint8_t i, j;
 
 		for (i = 0; i < num_sockets; i++)
-			mempools[i] = mbuf_pool_create(mbuf_data_size,
-						       nb_mbuf_per_pool,
-						       socket_ids[i]);
+			for (j = 0; j < mbuf_data_size_n; j++)
+				mempools[i * MAX_SEGS_BUFFER_SPLIT + j] =
+					mbuf_pool_create(mbuf_data_size[j],
+							  nb_mbuf_per_pool,
+							  socket_ids[i], j);
 	} else {
-		if (socket_num == UMA_NO_CONFIG)
-			mempools[0] = mbuf_pool_create(mbuf_data_size,
-						       nb_mbuf_per_pool, 0);
-		else
-			mempools[socket_num] = mbuf_pool_create
-							(mbuf_data_size,
-							 nb_mbuf_per_pool,
-							 socket_num);
+		uint8_t i;
+
+		for (i = 0; i < mbuf_data_size_n; i++)
+			mempools[i] = mbuf_pool_create
+					(mbuf_data_size[i],
+					 nb_mbuf_per_pool,
+					 socket_num == UMA_NO_CONFIG ?
+					 0 : socket_num, i);
 	}
 
 	init_port_config();
@@ -1542,10 +1547,10 @@ struct extmem_param {
 	 */
 	for (lc_id = 0; lc_id < nb_lcores; lc_id++) {
 		mbp = mbuf_pool_find(
-			rte_lcore_to_socket_id(fwd_lcores_cpuids[lc_id]));
+			rte_lcore_to_socket_id(fwd_lcores_cpuids[lc_id]), 0);
 
 		if (mbp == NULL)
-			mbp = mbuf_pool_find(0);
+			mbp = mbuf_pool_find(0, 0);
 		fwd_lcores[lc_id]->mbp = mbp;
 		/* initialize GSO context */
 		fwd_lcores[lc_id]->gso_ctx.direct_pool = mbp;
@@ -2498,7 +2503,8 @@ struct extmem_param {
 				if ((numa_support) &&
 					(rxring_numa[pi] != NUMA_NO_CONFIG)) {
 					struct rte_mempool * mp =
-						mbuf_pool_find(rxring_numa[pi]);
+						mbuf_pool_find
+							(rxring_numa[pi], 0);
 					if (mp == NULL) {
 						printf("Failed to setup RX queue:"
 							"No mempool allocation"
@@ -2514,7 +2520,8 @@ struct extmem_param {
 					     mp);
 				} else {
 					struct rte_mempool *mp =
-						mbuf_pool_find(port->socket_id);
+						mbuf_pool_find
+							(port->socket_id, 0);
 					if (mp == NULL) {
 						printf("Failed to setup RX queue:"
 							"No mempool allocation"
@@ -2909,13 +2916,13 @@ struct extmem_param {
 pmd_test_exit(void)
 {
 	portid_t pt_id;
+	unsigned int i;
 	int ret;
-	int i;
 
 	if (test_done == 0)
 		stop_packet_forwarding();
 
-	for (i = 0 ; i < RTE_MAX_NUMA_NODES ; i++) {
+	for (i = 0 ; i < RTE_DIM(mempools) ; i++) {
 		if (mempools[i]) {
 			if (mp_alloc_type == MP_ALLOC_ANON)
 				rte_mempool_mem_iter(mempools[i], dma_unmap_cb,
@@ -2959,7 +2966,7 @@ struct extmem_param {
 			return;
 		}
 	}
-	for (i = 0 ; i < RTE_MAX_NUMA_NODES ; i++) {
+	for (i = 0 ; i < RTE_DIM(mempools) ; i++) {
 		if (mempools[i])
 			rte_mempool_free(mempools[i]);
 	}
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index 9a29d7a..b42d710 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -42,6 +42,13 @@
  */
 #define RTE_MAX_SEGS_PER_PKT 255 /**< nb_segs is a 8-bit unsigned char. */
 
+/*
+ * The maximum number of segments per packet is used to configure
+ * buffer split feature, also specifies the maximum amount of
+ * optional Rx pools to allocate mbufs to split.
+ */
+#define MAX_SEGS_BUFFER_SPLIT 8 /**< Maximum number of Rx split segments. */
+
 #define MAX_PKT_BURST 512
 #define DEF_PKT_BURST 32
 
@@ -393,7 +400,9 @@ struct queue_stats_mappings {
 extern uint8_t dcb_config;
 extern uint8_t dcb_test;
 
-extern uint16_t mbuf_data_size; /**< Mbuf data space size. */
+extern uint32_t mbuf_data_size_n;
+extern uint16_t mbuf_data_size[MAX_SEGS_BUFFER_SPLIT];
+/**< Mbuf data space size. */
 extern uint32_t param_total_num_mbufs;
 
 extern uint16_t stats_period;
@@ -605,17 +614,22 @@ struct mplsoudp_decap_conf {
 
 /* Mbuf Pools */
 static inline void
-mbuf_poolname_build(unsigned int sock_id, char* mp_name, int name_size)
+mbuf_poolname_build(unsigned int sock_id, char *mp_name,
+		    int name_size, unsigned int idx)
 {
-	snprintf(mp_name, name_size, "mbuf_pool_socket_%u", sock_id);
+	if (!idx)
+		snprintf(mp_name, name_size, "mbuf_pool_socket_%u", sock_id);
+	else
+		snprintf(mp_name, name_size, "mbuf_pool_socket_%u_%u",
+			 sock_id, idx);
 }
 
 static inline struct rte_mempool *
-mbuf_pool_find(unsigned int sock_id)
+mbuf_pool_find(unsigned int sock_id, unsigned int idx)
 {
 	char pool_name[RTE_MEMPOOL_NAMESIZE];
 
-	mbuf_poolname_build(sock_id, pool_name, sizeof(pool_name));
+	mbuf_poolname_build(sock_id, pool_name, sizeof(pool_name), idx);
 	return rte_mempool_lookup((const char *)pool_name);
 }
 
diff --git a/doc/guides/testpmd_app_ug/run_app.rst b/doc/guides/testpmd_app_ug/run_app.rst
index ec085c2..1eb0a10 100644
--- a/doc/guides/testpmd_app_ug/run_app.rst
+++ b/doc/guides/testpmd_app_ug/run_app.rst
@@ -107,9 +107,12 @@ The command line options are:
     Set the socket from which all memory is allocated in NUMA mode,
     where 0 <= N < number of sockets on the board.
 
-*   ``--mbuf-size=N``
+*   ``--mbuf-size=N[,N1[,...Nn]]``
 
-    Set the data size of the mbufs used to N bytes, where N < 65536. The default value is 2048.
+    Set the data size of the mbufs used to N bytes, where N < 65536.
+    The default value is 2048. If multiple mbuf-size values are specified the
+    extra memory pools will be created for allocating mbufs to receive packets
+    with buffer splitting features.
 
 *   ``--total-num-mbufs=N``
 
-- 
1.8.3.1



* [dpdk-dev] [PATCH v4 3/9] app/testpmd: add buffer split offload configuration
  2020-10-12 20:09 ` [dpdk-dev] [PATCH v4 0/9] ethdev: introduce Rx buffer split Viacheslav Ovsiienko
  2020-10-12 20:09   ` [dpdk-dev] [PATCH v4 1/9] " Viacheslav Ovsiienko
  2020-10-12 20:09   ` [dpdk-dev] [PATCH v4 2/9] app/testpmd: add multiple pools per core creation Viacheslav Ovsiienko
@ 2020-10-12 20:09   ` Viacheslav Ovsiienko
  2020-10-12 20:09   ` [dpdk-dev] [PATCH v4 4/9] app/testpmd: add rxpkts commands and parameters Viacheslav Ovsiienko
                     ` (5 subsequent siblings)
  8 siblings, 0 replies; 172+ messages in thread
From: Viacheslav Ovsiienko @ 2020-10-12 20:09 UTC (permalink / raw)
  To: dev
  Cc: thomasm, stephen, ferruh.yigit, olivier.matz, jerinjacobk,
	maxime.coquelin, david.marchand, arybchenko

This patch adds support for RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT,
providing per-queue configuration for this offload.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
 app/test-pmd/cmdline.c | 21 +++++++++++----------
 app/test-pmd/config.c  |  9 +++++++++
 2 files changed, 20 insertions(+), 10 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index a585cf0..fa71039 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -883,16 +883,16 @@ static void cmd_help_long_parsed(void *parsed_result,
 			"port config <port_id> rx_offload vlan_strip|"
 			"ipv4_cksum|udp_cksum|tcp_cksum|tcp_lro|qinq_strip|"
 			"outer_ipv4_cksum|macsec_strip|header_split|"
-			"vlan_filter|vlan_extend|jumbo_frame|"
-			"scatter|timestamp|security|keep_crc on|off\n"
+			"vlan_filter|vlan_extend|jumbo_frame|scatter|"
+			"buffer_split|timestamp|security|keep_crc on|off\n"
 			"     Enable or disable a per port Rx offloading"
 			" on all Rx queues of a port\n\n"
 
 			"port (port_id) rxq (queue_id) rx_offload vlan_strip|"
 			"ipv4_cksum|udp_cksum|tcp_cksum|tcp_lro|qinq_strip|"
 			"outer_ipv4_cksum|macsec_strip|header_split|"
-			"vlan_filter|vlan_extend|jumbo_frame|"
-			"scatter|timestamp|security|keep_crc on|off\n"
+			"vlan_filter|vlan_extend|jumbo_frame|scatter|"
+			"buffer_split|timestamp|security|keep_crc on|off\n"
 			"    Enable or disable a per queue Rx offloading"
 			" only on a specific Rx queue\n\n"
 
@@ -18417,7 +18417,8 @@ struct cmd_config_per_port_rx_offload_result {
 		 offload, "vlan_strip#ipv4_cksum#udp_cksum#tcp_cksum#tcp_lro#"
 			   "qinq_strip#outer_ipv4_cksum#macsec_strip#"
 			   "header_split#vlan_filter#vlan_extend#jumbo_frame#"
-			   "scatter#timestamp#security#keep_crc#rss_hash");
+			   "scatter#buffer_split#timestamp#security#"
+			   "keep_crc#rss_hash");
 cmdline_parse_token_string_t cmd_config_per_port_rx_offload_result_on_off =
 	TOKEN_STRING_INITIALIZER
 		(struct cmd_config_per_port_rx_offload_result,
@@ -18497,8 +18498,8 @@ struct cmd_config_per_port_rx_offload_result {
 	.help_str = "port config <port_id> rx_offload vlan_strip|ipv4_cksum|"
 		    "udp_cksum|tcp_cksum|tcp_lro|qinq_strip|outer_ipv4_cksum|"
 		    "macsec_strip|header_split|vlan_filter|vlan_extend|"
-		    "jumbo_frame|scatter|timestamp|security|keep_crc|rss_hash "
-		    "on|off",
+		    "jumbo_frame|scatter|buffer_split|timestamp|security|"
+		    "keep_crc|rss_hash on|off",
 	.tokens = {
 		(void *)&cmd_config_per_port_rx_offload_result_port,
 		(void *)&cmd_config_per_port_rx_offload_result_config,
@@ -18547,7 +18548,7 @@ struct cmd_config_per_queue_rx_offload_result {
 		 offload, "vlan_strip#ipv4_cksum#udp_cksum#tcp_cksum#tcp_lro#"
 			   "qinq_strip#outer_ipv4_cksum#macsec_strip#"
 			   "header_split#vlan_filter#vlan_extend#jumbo_frame#"
-			   "scatter#timestamp#security#keep_crc");
+			   "scatter#buffer_split#timestamp#security#keep_crc");
 cmdline_parse_token_string_t cmd_config_per_queue_rx_offload_result_on_off =
 	TOKEN_STRING_INITIALIZER
 		(struct cmd_config_per_queue_rx_offload_result,
@@ -18603,8 +18604,8 @@ struct cmd_config_per_queue_rx_offload_result {
 		    "vlan_strip|ipv4_cksum|"
 		    "udp_cksum|tcp_cksum|tcp_lro|qinq_strip|outer_ipv4_cksum|"
 		    "macsec_strip|header_split|vlan_filter|vlan_extend|"
-		    "jumbo_frame|scatter|timestamp|security|keep_crc "
-		    "on|off",
+		    "jumbo_frame|scatter|buffer_split|timestamp|security|"
+		    "keep_crc on|off",
 	.tokens = {
 		(void *)&cmd_config_per_queue_rx_offload_result_port,
 		(void *)&cmd_config_per_queue_rx_offload_result_port_id,
diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index 5f501f6..7126d91 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -1092,6 +1092,15 @@ static int bus_match_all(const struct rte_bus *bus, const void *data)
 			printf("off\n");
 	}
 
+	if (dev_info.rx_offload_capa & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) {
+		printf("RX offload buffer split:       ");
+		if (ports[port_id].dev_conf.rxmode.offloads &
+		    RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT)
+			printf("on\n");
+		else
+			printf("off\n");
+	}
+
 	if (dev_info.tx_offload_capa & DEV_TX_OFFLOAD_VLAN_INSERT) {
 		printf("VLAN insert:                   ");
 		if (ports[port_id].dev_conf.txmode.offloads &
-- 
1.8.3.1



* [dpdk-dev] [PATCH v4 4/9] app/testpmd: add rxpkts commands and parameters
  2020-10-12 20:09 ` [dpdk-dev] [PATCH v4 0/9] ethdev: introduce Rx buffer split Viacheslav Ovsiienko
                     ` (2 preceding siblings ...)
  2020-10-12 20:09   ` [dpdk-dev] [PATCH v4 3/9] app/testpmd: add buffer split offload configuration Viacheslav Ovsiienko
@ 2020-10-12 20:09   ` Viacheslav Ovsiienko
  2020-10-12 20:09   ` [dpdk-dev] [PATCH v4 5/9] app/testpmd: add extended Rx queue setup Viacheslav Ovsiienko
                     ` (4 subsequent siblings)
  8 siblings, 0 replies; 172+ messages in thread
From: Viacheslav Ovsiienko @ 2020-10-12 20:09 UTC (permalink / raw)
  To: dev
  Cc: thomasm, stephen, ferruh.yigit, olivier.matz, jerinjacobk,
	maxime.coquelin, david.marchand, arybchenko

Add command line parameter:

--rxpkts=X[,Y]

Sets the lengths of the segments into which received packets are
scattered when a split feature is engaged. Affects only the queues
configured with split offloads (currently only BUFFER_SPLIT is supported).

Add interactive mode command:

testpmd> set rxpkts (x[,y]*)

Where x[,y]* represents a CSV list of values, without white space.

Sets the lengths of the segments into which received packets are
scattered when a split feature is engaged. Affects only the queues
configured with split offloads (currently only BUFFER_SPLIT is supported).
Optionally, multiple memory pools can be specified with the --mbuf-size
command line parameter; the mbufs for the received segments are then
allocated sequentially from these extra memory pools.
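
The CSV format accepted by --rxpkts and "set rxpkts" can be illustrated
with a minimal, self-contained sketch. It is not the parse_item_list()
helper that testpmd uses; the function name, the SKETCH_MAX_SEGS limit
and the exact accepted range (any value that fits into uint16_t) are
assumptions made for illustration only.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical limit, standing in for MAX_SEGS_BUFFER_SPLIT. */
#define SKETCH_MAX_SEGS 8

/*
 * Parse "x[,y]*" into lens[]. Returns the number of segments parsed,
 * or -1 on malformed input, too many items, or a value that does not
 * fit into uint16_t. Hypothetical helper, not testpmd code.
 */
static int
sketch_parse_seg_lengths(const char *str, uint16_t *lens, unsigned int max)
{
	unsigned int n = 0;

	if (str == NULL || *str == '\0')
		return -1;
	for (;;) {
		char *end;
		unsigned long v;

		/* Reject white space, signs and empty items up front. */
		if (*str < '0' || *str > '9')
			return -1;
		errno = 0;
		v = strtoul(str, &end, 10);
		if (errno != 0 || v > UINT16_MAX || n >= max)
			return -1;
		lens[n++] = (uint16_t)v;
		if (*end == '\0')
			return (int)n;
		if (*end != ',')
			return -1;
		str = end + 1;
	}
}
```

With this sketch, "64,128,0" yields three segments, while "64, 128"
(white space) and "70000" (does not fit into 16 bits) are rejected.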

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
 app/test-pmd/cmdline.c                      | 61 +++++++++++++++++++++++++++--
 app/test-pmd/config.c                       | 48 ++++++++++++++++++++++-
 app/test-pmd/parameters.c                   | 15 +++++++
 app/test-pmd/testpmd.c                      |  7 ++++
 app/test-pmd/testpmd.h                      | 11 +++++-
 doc/guides/testpmd_app_ug/run_app.rst       |  9 +++++
 doc/guides/testpmd_app_ug/testpmd_funcs.rst | 21 +++++++++-
 7 files changed, 165 insertions(+), 7 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index fa71039..d8dba54 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -183,7 +183,7 @@ static void cmd_help_long_parsed(void *parsed_result,
 			"show (rxq|txq) info (port_id) (queue_id)\n"
 			"    Display information for configured RX/TX queue.\n\n"
 
-			"show config (rxtx|cores|fwd|txpkts)\n"
+			"show config (rxtx|cores|fwd|rxpkts|txpkts)\n"
 			"    Display the given configuration.\n\n"
 
 			"read rxd (port_id) (queue_id) (rxd_id)\n"
@@ -294,6 +294,12 @@ static void cmd_help_long_parsed(void *parsed_result,
 			"    Set the transmit delay time and number of retries,"
 			" effective when retry is enabled.\n\n"
 
+			"set rxpkts (x[,y]*)\n"
+			"    Set the length of each segment to scatter"
+			" packets on receiving if split feature is engaged."
+			" Affects only the queues configured with split"
+			" offloads.\n\n"
+
 			"set txpkts (x[,y]*)\n"
 			"    Set the length of each segment of TXONLY"
 			" and optionally CSUM packets.\n\n"
@@ -3889,6 +3895,52 @@ struct cmd_set_log_result {
 	},
 };
 
+/* *** SET SEGMENT LENGTHS OF RX PACKETS SPLIT *** */
+
+struct cmd_set_rxpkts_result {
+	cmdline_fixed_string_t cmd_keyword;
+	cmdline_fixed_string_t rxpkts;
+	cmdline_fixed_string_t seg_lengths;
+};
+
+static void
+cmd_set_rxpkts_parsed(void *parsed_result,
+		      __rte_unused struct cmdline *cl,
+		      __rte_unused void *data)
+{
+	struct cmd_set_rxpkts_result *res;
+	unsigned int seg_lengths[MAX_SEGS_BUFFER_SPLIT];
+	unsigned int nb_segs;
+
+	res = parsed_result;
+	nb_segs = parse_item_list(res->seg_lengths, "segment lengths",
+				  MAX_SEGS_BUFFER_SPLIT, seg_lengths, 0);
+	if (nb_segs > 0)
+		set_rx_pkt_segments(seg_lengths, nb_segs);
+}
+
+cmdline_parse_token_string_t cmd_set_rxpkts_keyword =
+	TOKEN_STRING_INITIALIZER(struct cmd_set_rxpkts_result,
+				 cmd_keyword, "set");
+cmdline_parse_token_string_t cmd_set_rxpkts_name =
+	TOKEN_STRING_INITIALIZER(struct cmd_set_rxpkts_result,
+				 rxpkts, "rxpkts");
+cmdline_parse_token_string_t cmd_set_rxpkts_lengths =
+	TOKEN_STRING_INITIALIZER(struct cmd_set_rxpkts_result,
+				 seg_lengths, NULL);
+
+cmdline_parse_inst_t cmd_set_rxpkts = {
+	.f = cmd_set_rxpkts_parsed,
+	.data = NULL,
+	.help_str = "set rxpkts <len0[,len1]*>",
+	.tokens = {
+		(void *)&cmd_set_rxpkts_keyword,
+		(void *)&cmd_set_rxpkts_name,
+		(void *)&cmd_set_rxpkts_lengths,
+		NULL,
+	},
+};
+
 /* *** SET SEGMENT LENGTHS OF TXONLY PACKETS *** */
 
 struct cmd_set_txpkts_result {
@@ -7517,6 +7569,8 @@ static void cmd_showcfg_parsed(void *parsed_result,
 		fwd_lcores_config_display();
 	else if (!strcmp(res->what, "fwd"))
 		pkt_fwd_config_display(&cur_fwd_config);
+	else if (!strcmp(res->what, "rxpkts"))
+		show_rx_pkt_segments();
 	else if (!strcmp(res->what, "txpkts"))
 		show_tx_pkt_segments();
 	else if (!strcmp(res->what, "txtimes"))
@@ -7529,12 +7583,12 @@ static void cmd_showcfg_parsed(void *parsed_result,
 	TOKEN_STRING_INITIALIZER(struct cmd_showcfg_result, cfg, "config");
 cmdline_parse_token_string_t cmd_showcfg_what =
 	TOKEN_STRING_INITIALIZER(struct cmd_showcfg_result, what,
-				 "rxtx#cores#fwd#txpkts#txtimes");
+				 "rxtx#cores#fwd#rxpkts#txpkts#txtimes");
 
 cmdline_parse_inst_t cmd_showcfg = {
 	.f = cmd_showcfg_parsed,
 	.data = NULL,
-	.help_str = "show config rxtx|cores|fwd|txpkts|txtimes",
+	.help_str = "show config rxtx|cores|fwd|rxpkts|txpkts|txtimes",
 	.tokens = {
 		(void *)&cmd_showcfg_show,
 		(void *)&cmd_showcfg_port,
@@ -19807,6 +19861,7 @@ struct cmd_showport_macs_result {
 	(cmdline_parse_inst_t *)&cmd_reset,
 	(cmdline_parse_inst_t *)&cmd_set_numbers,
 	(cmdline_parse_inst_t *)&cmd_set_log,
+	(cmdline_parse_inst_t *)&cmd_set_rxpkts,
 	(cmdline_parse_inst_t *)&cmd_set_txpkts,
 	(cmdline_parse_inst_t *)&cmd_set_txsplit,
 	(cmdline_parse_inst_t *)&cmd_set_txtimes,
diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index 7126d91..24e9a7e 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -3300,6 +3300,50 @@ struct igb_ring_desc_16_bytes {
 }
 
 void
+show_rx_pkt_segments(void)
+{
+	uint32_t i, n;
+
+	n = rx_pkt_nb_segs;
+	printf("Number of segments: %u\n", n);
+	if (n) {
+		printf("Segment sizes: ");
+		for (i = 0; i != n - 1; i++)
+			printf("%hu,", rx_pkt_seg_lengths[i]);
+		printf("%hu\n", rx_pkt_seg_lengths[i]);
+	}
+}
+
+void
+set_rx_pkt_segments(unsigned int *seg_lengths, unsigned int nb_segs)
+{
+	unsigned int i;
+
+	if (nb_segs >= MAX_SEGS_BUFFER_SPLIT) {
+		printf("nb segments per RX packets=%u >= "
+		       "MAX_SEGS_BUFFER_SPLIT - ignored\n", nb_segs);
+		return;
+	}
+
+	/*
+	 * No extra check here, the segment length will be checked by PMD
+	 * in the extended queue setup.
+	 */
+	for (i = 0; i < nb_segs; i++) {
+		if (seg_lengths[i] >= UINT16_MAX) {
+			printf("length[%u]=%u >= UINT16_MAX - give up\n",
+			       i, seg_lengths[i]);
+			return;
+		}
+	}
+
+	for (i = 0; i < nb_segs; i++)
+		rx_pkt_seg_lengths[i] = (uint16_t) seg_lengths[i];
+
+	rx_pkt_nb_segs = (uint8_t) nb_segs;
+}
+
+void
 show_tx_pkt_segments(void)
 {
 	uint32_t i, n;
@@ -3344,10 +3388,10 @@ struct igb_ring_desc_16_bytes {
 }
 
 void
-set_tx_pkt_segments(unsigned *seg_lengths, unsigned nb_segs)
+set_tx_pkt_segments(unsigned int *seg_lengths, unsigned int nb_segs)
 {
 	uint16_t tx_pkt_len;
-	unsigned i;
+	unsigned int i;
 
 	if (nb_segs_is_invalid(nb_segs))
 		return;
diff --git a/app/test-pmd/parameters.c b/app/test-pmd/parameters.c
index 4db4987..e4e3635 100644
--- a/app/test-pmd/parameters.c
+++ b/app/test-pmd/parameters.c
@@ -184,6 +184,7 @@
 	       "(0 <= mapping <= %d).\n", RTE_ETHDEV_QUEUE_STAT_CNTRS - 1);
 	printf("  --no-flush-rx: Don't flush RX streams before forwarding."
 	       " Used mainly with PCAP drivers.\n");
+	printf("  --rxpkts=X[,Y]*: set RX segment sizes to split.\n");
 	printf("  --txpkts=X[,Y]*: set TX segment sizes"
 		" or total packet length.\n");
 	printf("  --txonly-multi-flow: generate multiple flows in txonly mode\n");
@@ -662,6 +663,7 @@
 		{ "rx-queue-stats-mapping",	1, 0, 0 },
 		{ "no-flush-rx",	0, 0, 0 },
 		{ "flow-isolate-all",	        0, 0, 0 },
+		{ "rxpkts",			1, 0, 0 },
 		{ "txpkts",			1, 0, 0 },
 		{ "txonly-multi-flow",		0, 0, 0 },
 		{ "disable-link-check",		0, 0, 0 },
@@ -1272,6 +1274,19 @@
 						 "invalid RX queue statistics mapping config entered\n");
 				}
 			}
+			if (!strcmp(lgopts[opt_idx].name, "rxpkts")) {
+				unsigned int seg_len[MAX_SEGS_BUFFER_SPLIT];
+				unsigned int nb_segs;
+
+				nb_segs = parse_item_list
+						(optarg, "rxpkt segments",
+						 MAX_SEGS_BUFFER_SPLIT,
+						 seg_len, 0);
+				if (nb_segs > 0)
+					set_rx_pkt_segments(seg_len, nb_segs);
+				else
+					rte_exit(EXIT_FAILURE, "bad rxpkts\n");
+			}
 			if (!strcmp(lgopts[opt_idx].name, "txpkts")) {
 				unsigned seg_lengths[RTE_MAX_SEGS_PER_PKT];
 				unsigned int nb_segs;
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 7e6ef80..f88c1e2 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -210,6 +210,13 @@ struct fwd_engine * fwd_engines[] = {
 uint8_t f_quit;
 
 /*
+ * Configuration of packet segments used to scatter received packets
+ * if any of the split features is configured.
+ */
+uint16_t rx_pkt_seg_lengths[MAX_SEGS_BUFFER_SPLIT];
+uint8_t  rx_pkt_nb_segs; /**< Number of segments to split */
+
+/*
  * Configuration of packet segments used by the "txonly" processing engine.
  */
 uint16_t tx_pkt_length = TXONLY_DEF_PACKET_LEN; /**< TXONLY packet length. */
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index b42d710..8e5ba6a 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -420,6 +420,13 @@ struct queue_stats_mappings {
 extern struct rte_fdir_conf fdir_conf;
 
 /*
+ * Configuration of packet segments used to scatter received packets
+ * if any of the split features is configured.
+ */
+extern uint16_t rx_pkt_seg_lengths[MAX_SEGS_BUFFER_SPLIT];
+extern uint8_t  rx_pkt_nb_segs; /**< Number of segments to split */
+
+/*
  * Configuration of packet segments used by the "txonly" processing engine.
  */
 #define TXONLY_DEF_PACKET_LEN 64
@@ -816,7 +823,9 @@ void vlan_tpid_set(portid_t port_id, enum rte_vlan_type vlan_type,
 void set_record_core_cycles(uint8_t on_off);
 void set_record_burst_stats(uint8_t on_off);
 void set_verbose_level(uint16_t vb_level);
-void set_tx_pkt_segments(unsigned *seg_lengths, unsigned nb_segs);
+void set_rx_pkt_segments(unsigned int *seg_lengths, unsigned int nb_segs);
+void show_rx_pkt_segments(void);
+void set_tx_pkt_segments(unsigned int *seg_lengths, unsigned int nb_segs);
 void show_tx_pkt_segments(void);
 void set_tx_pkt_times(unsigned int *tx_times);
 void show_tx_pkt_times(void);
diff --git a/doc/guides/testpmd_app_ug/run_app.rst b/doc/guides/testpmd_app_ug/run_app.rst
index 1eb0a10..463b76c 100644
--- a/doc/guides/testpmd_app_ug/run_app.rst
+++ b/doc/guides/testpmd_app_ug/run_app.rst
@@ -361,6 +361,15 @@ The command line options are:
 
     Don't flush the RX streams before starting forwarding. Used mainly with the PCAP PMD.
 
+*   ``--rxpkts=X[,Y]``
+
+    Set the length of segments to scatter packets on receiving if split
+    feature is engaged. Affects only the queues configured
+    with split offloads (currently only BUFFER_SPLIT is supported).
+    Optionally, multiple memory pools can be specified with the --mbuf-size
+    command line parameter and the mbufs to receive will be allocated
+    sequentially from these extra memory pools.
+
 *   ``--txpkts=X[,Y]``
 
     Set TX segment sizes or total packet length. Valid for ``tx-only``
diff --git a/doc/guides/testpmd_app_ug/testpmd_funcs.rst b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
index 795c739..ff88762 100644
--- a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
+++ b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
@@ -273,7 +273,7 @@ show config
 Displays the configuration of the application.
 The configuration comes from the command-line, the runtime or the application defaults::
 
-   testpmd> show config (rxtx|cores|fwd|txpkts|txtimes)
+   testpmd> show config (rxtx|cores|fwd|rxpkts|txpkts|txtimes)
 
 The available information categories are:
 
@@ -283,6 +283,8 @@ The available information categories are:
 
 * ``fwd``: Packet forwarding configuration.
 
+* ``rxpkts``: Packets to RX split configuration.
+
 * ``txpkts``: Packets to TX configuration.
 
 * ``txtimes``: Burst time pattern for Tx only mode.
@@ -774,6 +776,23 @@ When retry is enabled, the transmit delay time and number of retries can also be
 
    testpmd> set burst tx delay (microseconds) retry (num)
 
+set rxpkts
+~~~~~~~~~~
+
+Set the length of segments to scatter packets on receiving if split
+feature is engaged. Affects only the queues configured with split offloads
+(currently only BUFFER_SPLIT is supported). Optionally, multiple memory
+pools can be specified with the --mbuf-size command line parameter and the mbufs
+to receive will be allocated sequentially from these extra memory pools (the
+mbuf for the first segment is allocated from the first pool, the second one
+from the second pool, and so on; if there are more segments than pools, the
+mbufs for the remaining segments are allocated from the last valid pool)::
+
+   testpmd> set rxpkts (x[,y]*)
+
+Where x[,y]* represents a CSV list of values, without white space. Zero value
+means to use the corresponding memory pool data buffer size.
+
 set txpkts
 ~~~~~~~~~~
 
-- 
1.8.3.1



* [dpdk-dev] [PATCH v4 5/9] app/testpmd: add extended Rx queue setup
  2020-10-12 20:09 ` [dpdk-dev] [PATCH v4 0/9] ethdev: introduce Rx buffer split Viacheslav Ovsiienko
                     ` (3 preceding siblings ...)
  2020-10-12 20:09   ` [dpdk-dev] [PATCH v4 4/9] app/testpmd: add rxpkts commands and parameters Viacheslav Ovsiienko
@ 2020-10-12 20:09   ` Viacheslav Ovsiienko
  2020-10-12 20:09   ` [dpdk-dev] [PATCH v4 6/9] net/mlx5: add extended Rx queue setup routine Viacheslav Ovsiienko
                     ` (3 subsequent siblings)
  8 siblings, 0 replies; 172+ messages in thread
From: Viacheslav Ovsiienko @ 2020-10-12 20:09 UTC (permalink / raw)
  To: dev
  Cc: thomasm, stephen, ferruh.yigit, olivier.matz, jerinjacobk,
	maxime.coquelin, david.marchand, arybchenko

If the Rx queue is configured with a split feature, the extended
setup with the specified segment sizes and pools is performed.
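
The per-segment resolution described above (pools taken in order, the
last pool reused once the segment index runs past the pool list, a zero
length replaced by the pool's data buffer size) can be sketched on plain
arrays. The struct and function names below are hypothetical and do not
exist in testpmd or ethdev; this is only an illustration of the rule.

```c
#include <stdint.h>

/* Hypothetical stand-in for one resolved Rx segment; not the ethdev struct. */
struct sketch_seg {
	unsigned int pool_idx; /* index of the pool the segment draws from */
	uint16_t length;       /* effective segment data length */
};

/*
 * Resolve segment i: pools are used in order, segments beyond the last
 * pool reuse the last one, and a zero length selects the data buffer
 * size of the chosen pool. n_pools must be at least 1.
 */
static struct sketch_seg
sketch_resolve_seg(unsigned int i, const uint16_t *seg_len,
		   const uint16_t *pool_buf_size, unsigned int n_pools)
{
	struct sketch_seg s;

	/* Valid pool indices are 0..n_pools-1, hence the >= comparison. */
	s.pool_idx = (i >= n_pools) ? n_pools - 1 : i;
	s.length = seg_len[i] ? seg_len[i] : pool_buf_size[s.pool_idx];
	return s;
}
```

For example, with segment lengths {64, 0, 0} and two pools of 2048 and
4096 bytes, the third segment falls back to the second pool and takes
its buffer size.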

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
 app/test-pmd/cmdline.c | 12 ++++++------
 app/test-pmd/testpmd.c | 38 ++++++++++++++++++++++++++++++++++++--
 app/test-pmd/testpmd.h |  6 ++++++
 3 files changed, 48 insertions(+), 8 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index d8dba54..cf99f66 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -2921,12 +2921,12 @@ struct cmd_setup_rxtx_queue {
 				rxring_numa[res->portid]);
 			return;
 		}
-		ret = rte_eth_rx_queue_setup(res->portid,
-					     res->qid,
-					     port->nb_rx_desc[res->qid],
-					     socket_id,
-					     &port->rx_conf[res->qid],
-					     mp);
+		ret = rx_queue_setup(res->portid,
+				     res->qid,
+				     port->nb_rx_desc[res->qid],
+				     socket_id,
+				     &port->rx_conf[res->qid],
+				     mp);
 		if (ret)
 			printf("Failed to setup RX queue\n");
 	} else {
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index f88c1e2..8cc265e 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -2412,6 +2412,40 @@ struct extmem_param {
 	return 0;
 }
 
+/* Configure the Rx with optional split. */
+int
+rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
+	       uint16_t nb_rx_desc, unsigned int socket_id,
+	       const struct rte_eth_rxconf *rx_conf,
+	       struct rte_mempool *mp)
+{
+	struct rte_eth_rxseg rx_seg[MAX_SEGS_BUFFER_SPLIT] = {};
+	unsigned int i, mp_n;
+
+	if (rx_pkt_nb_segs <= 1 ||
+	    (rx_conf->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) == 0)
+		return rte_eth_rx_queue_setup(port_id, rx_queue_id,
+					      nb_rx_desc, socket_id,
+					      rx_conf, mp);
+	for (i = 0; i < rx_pkt_nb_segs; i++) {
+		struct rte_mempool *mpx;
+		/*
+		 * Use last valid pool for the segments with number
+		 * exceeding the pool index.
+		 */
+		mp_n = (i >= mbuf_data_size_n) ? mbuf_data_size_n - 1 : i;
+		mpx = mbuf_pool_find(socket_id, mp_n);
+		/* Handle zero as mbuf data buffer size. */
+		rx_seg[i].length = rx_pkt_seg_lengths[i] ?
+				   rx_pkt_seg_lengths[i] :
+				   mbuf_data_size[mp_n];
+		rx_seg[i].mp = mpx ? mpx : mp;
+	}
+	return rte_eth_rxseg_queue_setup(port_id, rx_queue_id,
+					 nb_rx_desc, socket_id, rx_conf,
+					 rx_seg, rx_pkt_nb_segs);
+}
+
 int
 start_port(portid_t pid)
 {
@@ -2520,7 +2554,7 @@ struct extmem_param {
 						return -1;
 					}
 
-					diag = rte_eth_rx_queue_setup(pi, qi,
+					diag = rx_queue_setup(pi, qi,
 					     port->nb_rx_desc[qi],
 					     rxring_numa[pi],
 					     &(port->rx_conf[qi]),
@@ -2536,7 +2570,7 @@ struct extmem_param {
 							port->socket_id);
 						return -1;
 					}
-					diag = rte_eth_rx_queue_setup(pi, qi,
+					diag = rx_queue_setup(pi, qi,
 					     port->nb_rx_desc[qi],
 					     port->socket_id,
 					     &(port->rx_conf[qi]),
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index 8e5ba6a..5cef419 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -872,6 +872,12 @@ void port_rss_reta_info(portid_t port_id,
 
 void set_vf_traffic(portid_t port_id, uint8_t is_rx, uint16_t vf, uint8_t on);
 
+int
+rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
+	       uint16_t nb_rx_desc, unsigned int socket_id,
+	       const struct rte_eth_rxconf *rx_conf,
+	       struct rte_mempool *mp);
+
 int set_queue_rate_limit(portid_t port_id, uint16_t queue_idx, uint16_t rate);
 int set_vf_rate_limit(portid_t port_id, uint16_t vf, uint16_t rate,
 				uint64_t q_msk);
-- 
1.8.3.1



* [dpdk-dev] [PATCH v4 6/9] net/mlx5: add extended Rx queue setup routine
  2020-10-12 20:09 ` [dpdk-dev] [PATCH v4 0/9] ethdev: introduce Rx buffer split Viacheslav Ovsiienko
                     ` (4 preceding siblings ...)
  2020-10-12 20:09   ` [dpdk-dev] [PATCH v4 5/9] app/testpmd: add extended Rx queue setup Viacheslav Ovsiienko
@ 2020-10-12 20:09   ` Viacheslav Ovsiienko
  2020-10-12 20:10   ` [dpdk-dev] [PATCH v4 7/9] net/mlx5: configure Rx queue to support split Viacheslav Ovsiienko
                     ` (2 subsequent siblings)
  8 siblings, 0 replies; 172+ messages in thread
From: Viacheslav Ovsiienko @ 2020-10-12 20:09 UTC (permalink / raw)
  To: dev
  Cc: thomasm, stephen, ferruh.yigit, olivier.matz, jerinjacobk,
	maxime.coquelin, david.marchand, arybchenko

Add a routine that sets up an Rx queue from an extended
description of the receiving buffers.
It allows the application to specify the desired segment
lengths, the data position offsets in the buffer,
and a dedicated memory pool for each segment.
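
The argument checks a driver might apply to such a segment description
can be sketched as follows. This is a simplified, self-contained
illustration: the flag bits, the SKETCH_MAX_NSEG limit and the struct
are local stand-ins (the struct mirrors the fields of the RFC's
struct rte_eth_rxseg), not the ethdev or mlx5 definitions.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bits and limit; the real values live in ethdev/mlx5. */
#define SKETCH_OFFLOAD_SCATTER      (UINT64_C(1) << 0)
#define SKETCH_OFFLOAD_BUFFER_SPLIT (UINT64_C(1) << 1)
#define SKETCH_MAX_NSEG 32u

/* Local stand-in mirroring the fields of the RFC's struct rte_eth_rxseg. */
struct sketch_rxseg {
	void *mp;          /* memory pool to allocate the segment from */
	uint16_t length;   /* segment maximal data length, 0 = pool default */
	uint16_t offset;   /* data offset from the beginning of the buffer */
	uint32_t reserved;
};

/* Return 0 if the description is acceptable, a negative errno otherwise. */
static int
sketch_check_split(const struct sketch_rxseg *seg, uint16_t n_seg,
		   uint64_t offloads)
{
	if (seg == NULL || n_seg == 0)
		return -EINVAL;
	if (n_seg == 1)
		return 0; /* single segment: same as the plain queue setup */
	if (!(offloads & SKETCH_OFFLOAD_SCATTER))
		return -ENOSPC; /* split requires scattering */
	if (!(offloads & SKETCH_OFFLOAD_BUFFER_SPLIT))
		return -ENOSPC; /* split offload not configured */
	if (n_seg > SKETCH_MAX_NSEG)
		return -EOVERFLOW; /* too many segments to split */
	return 0;
}
```

A single zero-initialized segment that carries only the pool pointer
passes the check regardless of offloads, which is how a legacy
single-pool setup can be expressed through the segment-based routine.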

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
 drivers/net/mlx5/linux/mlx5_os.c |  2 +
 drivers/net/mlx5/mlx5.h          |  3 ++
 drivers/net/mlx5/mlx5_rxq.c      | 91 +++++++++++++++++++++++++++++++++++-----
 drivers/net/mlx5/mlx5_rxtx.h     | 10 ++++-
 4 files changed, 95 insertions(+), 11 deletions(-)

diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index 487714f..0e85489 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -2495,6 +2495,7 @@
 	.dev_supported_ptypes_get = mlx5_dev_supported_ptypes_get,
 	.vlan_filter_set = mlx5_vlan_filter_set,
 	.rx_queue_setup = mlx5_rx_queue_setup,
+	.rxseg_queue_setup = mlx5_rxseg_queue_setup,
 	.rx_hairpin_queue_setup = mlx5_rx_hairpin_queue_setup,
 	.tx_queue_setup = mlx5_tx_queue_setup,
 	.tx_hairpin_queue_setup = mlx5_tx_hairpin_queue_setup,
@@ -2578,6 +2579,7 @@
 	.dev_supported_ptypes_get = mlx5_dev_supported_ptypes_get,
 	.vlan_filter_set = mlx5_vlan_filter_set,
 	.rx_queue_setup = mlx5_rx_queue_setup,
+	.rxseg_queue_setup = mlx5_rxseg_queue_setup,
 	.rx_hairpin_queue_setup = mlx5_rx_hairpin_queue_setup,
 	.tx_queue_setup = mlx5_tx_queue_setup,
 	.tx_hairpin_queue_setup = mlx5_tx_hairpin_queue_setup,
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 87d3c15..bfc0812 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -162,6 +162,9 @@ struct mlx5_stats_ctrl {
 /* Maximal size of aggregated LRO packet. */
 #define MLX5_MAX_LRO_SIZE (UINT8_MAX * MLX5_LRO_SEG_CHUNK_SIZE)
 
+/* Maximal number of segments to split. */
+#define MLX5_MAX_RXQ_NSEG (1u << MLX5_MAX_LOG_RQ_SEGS)
+
 /* LRO configurations structure. */
 struct mlx5_lro_config {
 	uint32_t supported:1; /* Whether LRO is supported. */
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index f1d8373..42818d8 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -390,6 +390,7 @@
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_dev_config *config = &priv->config;
 	uint64_t offloads = (DEV_RX_OFFLOAD_SCATTER |
+			     RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT |
 			     DEV_RX_OFFLOAD_TIMESTAMP |
 			     DEV_RX_OFFLOAD_JUMBO_FRAME |
 			     DEV_RX_OFFLOAD_RSS_HASH);
@@ -715,16 +716,20 @@
  *   NUMA socket on which memory must be allocated.
  * @param[in] conf
  *   Thresholds parameters.
- * @param mp
- *   Memory pool for buffer allocations.
+ * @param rx_seg
+ *   Pointer to the array of segment descriptions, each element
+ *   describes the memory pool, maximal data length, initial
+ *   data offset from the beginning of data buffer in mbuf
+ * @param n_seg
+ *   Number of elements in the segment descriptions array
  *
  * @return
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 int
-mlx5_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
-		    unsigned int socket, const struct rte_eth_rxconf *conf,
-		    struct rte_mempool *mp)
+mlx5_rxseg_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
+		       unsigned int socket, const struct rte_eth_rxconf *conf,
+		       const struct rte_eth_rxseg *rx_seg, uint16_t n_seg)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_rxq_data *rxq = (*priv->rxqs)[idx];
@@ -732,10 +737,43 @@
 		container_of(rxq, struct mlx5_rxq_ctrl, rxq);
 	int res;
 
+	if (!n_seg || !rx_seg) {
+		DRV_LOG(ERR, "port %u queue index %u invalid "
+			      "split description",
+			      dev->data->port_id, idx);
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+	if (n_seg > 1) {
+		uint64_t offloads = conf->offloads |
+				    dev->data->dev_conf.rxmode.offloads;
+
+		if (!(offloads & DEV_RX_OFFLOAD_SCATTER)) {
+			DRV_LOG(ERR, "port %u queue index %u split "
+				     "configuration requires scattering",
+				     dev->data->port_id, idx);
+			rte_errno = ENOSPC;
+			return -rte_errno;
+		}
+		if (!(offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT)) {
+			DRV_LOG(ERR, "port %u queue index %u split "
+				     "offload not configured",
+				     dev->data->port_id, idx);
+			rte_errno = ENOSPC;
+			return -rte_errno;
+		}
+		if (n_seg > MLX5_MAX_RXQ_NSEG) {
+			DRV_LOG(ERR, "port %u queue index %u too many "
+				     "segments %u to split",
+				     dev->data->port_id, idx, n_seg);
+			rte_errno = EOVERFLOW;
+			return -rte_errno;
+		}
+	}
 	res = mlx5_rx_queue_pre_setup(dev, idx, &desc);
 	if (res)
 		return res;
-	rxq_ctrl = mlx5_rxq_new(dev, idx, desc, socket, conf, mp);
+	rxq_ctrl = mlx5_rxq_new(dev, idx, desc, socket, conf, rx_seg, n_seg);
 	if (!rxq_ctrl) {
 		DRV_LOG(ERR, "port %u unable to allocate queue index %u",
 			dev->data->port_id, idx);
@@ -756,6 +794,39 @@
  *   RX queue index.
  * @param desc
  *   Number of descriptors to configure in queue.
+ * @param socket
+ *   NUMA socket on which memory must be allocated.
+ * @param[in] conf
+ *   Thresholds parameters.
+ * @param mp
+ *   Memory pool for buffer allocations.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
+		    unsigned int socket, const struct rte_eth_rxconf *conf,
+		    struct rte_mempool *mp)
+{
+	struct rte_eth_rxseg rx_seg = {
+		.mp = mp,
+		/*
+		 * All other fields are zeroed, zero segment length
+		 * means the pool buffer size should be used by PMD.
+		 */
+	};
+	return mlx5_rxseg_queue_setup(dev, idx, desc, socket, conf, &rx_seg, 1);
+}
+
+/**
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ * @param idx
+ *   RX queue index.
+ * @param desc
+ *   Number of descriptors to configure in queue.
  * @param hairpin_conf
  *   Hairpin configuration parameters.
  *
@@ -1328,11 +1399,11 @@
 struct mlx5_rxq_ctrl *
 mlx5_rxq_new(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 	     unsigned int socket, const struct rte_eth_rxconf *conf,
-	     struct rte_mempool *mp)
+	     const struct rte_eth_rxseg *rx_seg, uint16_t n_seg)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_rxq_ctrl *tmpl;
-	unsigned int mb_len = rte_pktmbuf_data_room_size(mp);
+	unsigned int mb_len = rte_pktmbuf_data_room_size(rx_seg[0].mp);
 	unsigned int mprq_stride_nums;
 	unsigned int mprq_stride_size;
 	unsigned int mprq_stride_cap;
@@ -1346,7 +1417,7 @@ struct mlx5_rxq_ctrl *
 	uint64_t offloads = conf->offloads |
 			   dev->data->dev_conf.rxmode.offloads;
 	unsigned int lro_on_queue = !!(offloads & DEV_RX_OFFLOAD_TCP_LRO);
-	const int mprq_en = mlx5_check_mprq_support(dev) > 0;
+	const int mprq_en = mlx5_check_mprq_support(dev) > 0 && n_seg == 1;
 	unsigned int max_rx_pkt_len = lro_on_queue ?
 			dev->data->dev_conf.rxmode.max_lro_pkt_size :
 			dev->data->dev_conf.rxmode.max_rx_pkt_len;
@@ -1531,7 +1602,7 @@ struct mlx5_rxq_ctrl *
 		(!!(dev->data->dev_conf.rxmode.mq_mode & ETH_MQ_RX_RSS));
 	tmpl->rxq.port_id = dev->data->port_id;
 	tmpl->priv = priv;
-	tmpl->rxq.mp = mp;
+	tmpl->rxq.mp = rx_seg[0].mp;
 	tmpl->rxq.elts_n = log2above(desc);
 	tmpl->rxq.rq_repl_thresh =
 		MLX5_VPMD_RXQ_RPLNSH_THRESH(1 << tmpl->rxq.elts_n);
diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index 674296e..f103a30 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -150,6 +150,9 @@ struct mlx5_rxq_data {
 	rte_spinlock_t *uar_lock_cq;
 	/* CQ (UAR) access lock required for 32bit implementations */
 #endif
+	struct rte_eth_rxseg rxseg[MLX5_MAX_RXQ_NSEG];
+	/* Buffer split segment descriptions - sizes, offsets, pools. */
+	uint32_t rxseg_n; /* Number of split segment descriptions. */
 	uint32_t tunnel; /* Tunnel information. */
 	uint64_t flow_meta_mask;
 	int32_t flow_meta_offset;
@@ -304,6 +307,10 @@ struct mlx5_txq_ctrl {
 int mlx5_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 			unsigned int socket, const struct rte_eth_rxconf *conf,
 			struct rte_mempool *mp);
+int mlx5_rxseg_queue_setup
+	(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
+	 unsigned int socket, const struct rte_eth_rxconf *conf,
+	 const struct rte_eth_rxseg *rx_seg, uint16_t n_seg);
 int mlx5_rx_hairpin_queue_setup
 	(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 	 const struct rte_eth_hairpin_conf *hairpin_conf);
@@ -316,7 +323,8 @@ int mlx5_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 struct mlx5_rxq_ctrl *mlx5_rxq_new(struct rte_eth_dev *dev, uint16_t idx,
 				   uint16_t desc, unsigned int socket,
 				   const struct rte_eth_rxconf *conf,
-				   struct rte_mempool *mp);
+				   const struct rte_eth_rxseg *rx_seg,
+				   uint16_t n_seg);
 struct mlx5_rxq_ctrl *mlx5_rxq_hairpin_new
 	(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 	 const struct rte_eth_hairpin_conf *hairpin_conf);
-- 
1.8.3.1



* [dpdk-dev] [PATCH v4 7/9] net/mlx5: configure Rx queue to support split
  2020-10-12 20:09 ` [dpdk-dev] [PATCH v4 0/9] ethdev: introduce Rx buffer split Viacheslav Ovsiienko
                     ` (5 preceding siblings ...)
  2020-10-12 20:09   ` [dpdk-dev] [PATCH v4 6/9] net/mlx5: add extended Rx queue setup routine Viacheslav Ovsiienko
@ 2020-10-12 20:10   ` Viacheslav Ovsiienko
  2020-10-12 20:10   ` [dpdk-dev] [PATCH v4 8/9] net/mlx5: register multiple pool for Rx queue Viacheslav Ovsiienko
  2020-10-12 20:10   ` [dpdk-dev] [PATCH v4 9/9] net/mlx5: update Rx datapath to support split Viacheslav Ovsiienko
  8 siblings, 0 replies; 172+ messages in thread
From: Viacheslav Ovsiienko @ 2020-10-12 20:10 UTC (permalink / raw)
  To: dev
  Cc: thomasm, stephen, ferruh.yigit, olivier.matz, jerinjacobk,
	maxime.coquelin, david.marchand, arybchenko

The scatter-gather elements should be configured
accordingly to support the buffer split feature.
The application provides the desired settings for
the segments at the beginning of the packets and
the PMD pads the buffer chain (if needed) with the
attributes of the last specified segment to accommodate
a packet of maximal length.

Some limitations are implied. The MPRQ feature
should be disengaged if split is requested, because
MPRQ neither supports pushing data to dedicated
pools nor follows flexible buffer sizes.
The vectorized rx_burst routines do not support
scattering (they are extremely simplified and work
over a single segment only) and can't handle split
either.
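
As an illustration of the padding rule described above, here is a simplified,
self-contained model of how the number of scatter elements could be derived.
It is not the mlx5 code: struct seg_cfg, count_sges() and the max_sges
parameter are hypothetical names, pool buffer sizes are passed as plain
integers, and only the padding with the last pool and the rounding up to a
power of two are modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical segment description with the pool reduced to its buffer size. */
struct seg_cfg {
	uint32_t buf_len; /* data room of the buffers in the segment's pool */
	uint32_t length;  /* requested maximal data length, 0 = rest of buffer */
	uint32_t offset;  /* requested data offset */
};

/*
 * Count the scatter elements needed to hold max_pkt_len bytes.
 * Buffers beyond the description array reuse the last pool with zero
 * offset and full buffer length. The count is rounded up to a power of two.
 * Returns -1 if a segment does not fit or more than max_sges are needed.
 */
static int
count_sges(const struct seg_cfg *seg, unsigned int n_seg,
	   uint32_t max_pkt_len, uint32_t headroom, unsigned int max_sges)
{
	uint32_t tail = max_pkt_len;
	unsigned int n = 0;

	assert(n_seg > 0);
	do {
		const struct seg_cfg *d = &seg[n < n_seg ? n : n_seg - 1];
		/* Only the first buffer keeps the headroom. */
		uint32_t offset = (n < n_seg ? d->offset : 0) +
				  (n == 0 ? headroom : 0);
		uint32_t len;

		if (offset >= d->buf_len)
			return -1;
		len = (n < n_seg && d->length) ? d->length :
						 d->buf_len - offset;
		if (len > d->buf_len - offset)
			return -1;
		if (++n > max_sges)
			return -1;
		tail -= len < tail ? len : tail;
	} while (tail > 0 || (n & (n - 1)) != 0);
	return (int)n;
}
```

With a single 2048-byte pool and 128 bytes of headroom, a 1500-byte maximum
packet needs one element in this model, while a 9000-byte one needs five
buffers and is therefore padded to eight.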

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
 drivers/net/mlx5/mlx5_rxq.c | 94 ++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 80 insertions(+), 14 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index 42818d8..4ec4677 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -1417,7 +1417,8 @@ struct mlx5_rxq_ctrl *
 	uint64_t offloads = conf->offloads |
 			   dev->data->dev_conf.rxmode.offloads;
 	unsigned int lro_on_queue = !!(offloads & DEV_RX_OFFLOAD_TCP_LRO);
-	const int mprq_en = mlx5_check_mprq_support(dev) > 0 && n_seg == 1;
+	const int mprq_en = mlx5_check_mprq_support(dev) > 0 && n_seg == 1 &&
+			    !rx_seg[0].offset && !rx_seg[0].length;
 	unsigned int max_rx_pkt_len = lro_on_queue ?
 			dev->data->dev_conf.rxmode.max_lro_pkt_size :
 			dev->data->dev_conf.rxmode.max_rx_pkt_len;
@@ -1425,22 +1426,87 @@ struct mlx5_rxq_ctrl *
 							RTE_PKTMBUF_HEADROOM;
 	unsigned int max_lro_size = 0;
 	unsigned int first_mb_free_size = mb_len - RTE_PKTMBUF_HEADROOM;
+	const struct rte_eth_rxseg *qs_seg = rx_seg;
+	unsigned int tail_len;
 
-	if (non_scatter_min_mbuf_size > mb_len && !(offloads &
-						    DEV_RX_OFFLOAD_SCATTER)) {
+	tmpl = mlx5_malloc(MLX5_MEM_RTE | MLX5_MEM_ZERO, sizeof(*tmpl) +
+			   desc_n * sizeof(struct rte_mbuf *), 0, socket);
+	if (!tmpl) {
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+	MLX5_ASSERT(n_seg && n_seg <= MLX5_MAX_RXQ_NSEG);
+	/*
+	 * Build the array of actual buffer offsets and lengths.
+	 * Pad with the buffers from the last memory pool if
+	 * needed to handle max size packets, replace zero length
+	 * with the buffer length from the pool.
+	 */
+	tail_len = max_rx_pkt_len;
+	do {
+		struct rte_eth_rxseg *hw_seg =
+					&tmpl->rxq.rxseg[tmpl->rxq.rxseg_n];
+		uint32_t buf_len = rte_pktmbuf_data_room_size(qs_seg->mp);
+		uint32_t offset, seg_len;
+
+		/*
+		 * For the buffers beyond descriptions offset is zero,
+		 * the first buffer contains head room.
+		 */
+		offset = (tmpl->rxq.rxseg_n >= n_seg ? 0 : qs_seg->offset) +
+			 (tmpl->rxq.rxseg_n ? 0 : RTE_PKTMBUF_HEADROOM);
+		/*
+		 * For the buffers beyond descriptions the length is
+		 * pool buffer length, zero lengths are replaced with
+		 * pool buffer length as well.
+		 */
+		seg_len = tmpl->rxq.rxseg_n >= n_seg ? buf_len :
+			  qs_seg->length ? qs_seg->length : (buf_len - offset);
+		/* Check is done in 32-bit arithmetic, no overflows. */
+		if (buf_len < seg_len + offset) {
+			DRV_LOG(ERR, "port %u Rx queue %u: Split offset/length "
+				     "%u/%u can't be satisfied",
+				     dev->data->port_id, idx,
+				     qs_seg->offset, qs_seg->length);
+			rte_errno = EINVAL;
+			goto error;
+		}
+		if (seg_len > tail_len)
+			seg_len = buf_len - offset;
+		if (++tmpl->rxq.rxseg_n > MLX5_MAX_RXQ_NSEG) {
+			DRV_LOG(ERR,
+				"port %u too many SGEs (%u) needed to handle"
+				" requested maximum packet size %u, the maximum"
+				" supported are %u", dev->data->port_id,
+				tmpl->rxq.rxseg_n, max_rx_pkt_len,
+				MLX5_MAX_RXQ_NSEG);
+			rte_errno = ENOTSUP;
+			goto error;
+		}
+		/* Build the actual scattering element in the queue object. */
+		hw_seg->mp = qs_seg->mp;
+		MLX5_ASSERT(offset <= UINT16_MAX);
+		MLX5_ASSERT(seg_len <= UINT16_MAX);
+		hw_seg->offset = (uint16_t)offset;
+		hw_seg->length = (uint16_t)seg_len;
+		/*
+		 * Advance the segment descriptor, the padding is based
+		 * on the attributes of the last descriptor.
+		 */
+		if (tmpl->rxq.rxseg_n < n_seg)
+			qs_seg++;
+		tail_len -= RTE_MIN(tail_len, seg_len);
+	} while (tail_len || !rte_is_power_of_2(tmpl->rxq.rxseg_n));
+	MLX5_ASSERT(tmpl->rxq.rxseg_n &&
+		    tmpl->rxq.rxseg_n <= MLX5_MAX_RXQ_NSEG);
+	if (tmpl->rxq.rxseg_n > 1 && !(offloads & DEV_RX_OFFLOAD_SCATTER)) {
 		DRV_LOG(ERR, "port %u Rx queue %u: Scatter offload is not"
 			" configured and no enough mbuf space(%u) to contain "
 			"the maximum RX packet length(%u) with head-room(%u)",
 			dev->data->port_id, idx, mb_len, max_rx_pkt_len,
 			RTE_PKTMBUF_HEADROOM);
 		rte_errno = ENOSPC;
-		return NULL;
-	}
-	tmpl = mlx5_malloc(MLX5_MEM_RTE | MLX5_MEM_ZERO, sizeof(*tmpl) +
-			   desc_n * sizeof(struct rte_mbuf *), 0, socket);
-	if (!tmpl) {
-		rte_errno = ENOMEM;
-		return NULL;
+		goto error;
 	}
 	tmpl->type = MLX5_RXQ_TYPE_STANDARD;
 	if (mlx5_mr_btree_init(&tmpl->rxq.mr_ctrl.cache_bh,
@@ -1467,7 +1533,7 @@ struct mlx5_rxq_ctrl *
 	 *  - The number of descs is more than the number of strides.
 	 *  - max_rx_pkt_len plus overhead is less than the max size
 	 *    of a stride or mprq_stride_size is specified by a user.
-	 *    Need to nake sure that there are enough stides to encap
+	 *    Need to make sure that there are enough stides to encap
 	 *    the maximum packet size in case mprq_stride_size is set.
 	 *  Otherwise, enable Rx scatter if necessary.
 	 */
@@ -1497,11 +1563,11 @@ struct mlx5_rxq_ctrl *
 			" strd_num_n = %u, strd_sz_n = %u",
 			dev->data->port_id, idx,
 			tmpl->rxq.strd_num_n, tmpl->rxq.strd_sz_n);
-	} else if (max_rx_pkt_len <= first_mb_free_size) {
+	} else if (tmpl->rxq.rxseg_n == 1) {
+		MLX5_ASSERT(max_rx_pkt_len <= first_mb_free_size);
 		tmpl->rxq.sges_n = 0;
 		max_lro_size = max_rx_pkt_len;
 	} else if (offloads & DEV_RX_OFFLOAD_SCATTER) {
-		unsigned int size = non_scatter_min_mbuf_size;
 		unsigned int sges_n;
 
 		if (lro_on_queue && first_mb_free_size <
@@ -1516,7 +1582,7 @@ struct mlx5_rxq_ctrl *
 		 * Determine the number of SGEs needed for a full packet
 		 * and round it to the next power of two.
 		 */
-		sges_n = log2above((size / mb_len) + !!(size % mb_len));
+		sges_n = log2above(tmpl->rxq.rxseg_n);
 		if (sges_n > MLX5_MAX_LOG_RQ_SEGS) {
 			DRV_LOG(ERR,
 				"port %u too many SGEs (%u) needed to handle"
-- 
1.8.3.1



* [dpdk-dev] [PATCH v4 8/9] net/mlx5: register multiple pool for Rx queue
  2020-10-12 20:09 ` [dpdk-dev] [PATCH v4 0/9] ethdev: introduce Rx buffer split Viacheslav Ovsiienko
                     ` (6 preceding siblings ...)
  2020-10-12 20:10   ` [dpdk-dev] [PATCH v4 7/9] net/mlx5: configure Rx queue to support split Viacheslav Ovsiienko
@ 2020-10-12 20:10   ` Viacheslav Ovsiienko
  2020-10-12 20:10   ` [dpdk-dev] [PATCH v4 9/9] net/mlx5: update Rx datapath to support split Viacheslav Ovsiienko
  8 siblings, 0 replies; 172+ messages in thread
From: Viacheslav Ovsiienko @ 2020-10-12 20:10 UTC (permalink / raw)
  To: dev
  Cc: thomasm, stephen, ferruh.yigit, olivier.matz, jerinjacobk,
	maxime.coquelin, david.marchand, arybchenko

The split feature for receiving packets was added to the mlx5
PMD. Now an Rx queue can receive data into buffers belonging
to different pools, and the memory of all the involved pools
must be registered for DMA operations in order to allow the
hardware to store the data.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
 drivers/net/mlx5/mlx5_mr.c      |  3 +++
 drivers/net/mlx5/mlx5_trigger.c | 20 ++++++++++++--------
 2 files changed, 15 insertions(+), 8 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_mr.c b/drivers/net/mlx5/mlx5_mr.c
index dbcf0aa..c308ecc 100644
--- a/drivers/net/mlx5/mlx5_mr.c
+++ b/drivers/net/mlx5/mlx5_mr.c
@@ -536,6 +536,9 @@ struct mr_update_mp_data {
 		.ret = 0,
 	};
 
+	DRV_LOG(DEBUG, "Port %u Rx queue registering mp %s "
+		       "having %u chunks.", dev->data->port_id,
+		       mp->name, mp->nb_mem_chunks);
 	rte_mempool_mem_iter(mp, mlx5_mr_update_mp_cb, &data);
 	if (data.ret < 0 && rte_errno == ENXIO) {
 		/* Mempool may have externally allocated memory. */
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index e72e5fb..643e10f 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -145,18 +145,22 @@
 		dev->data->port_id, priv->sh->device_attr.max_sge);
 	for (i = 0; i != priv->rxqs_n; ++i) {
 		struct mlx5_rxq_ctrl *rxq_ctrl = mlx5_rxq_get(dev, i);
-		struct rte_mempool *mp;
 
 		if (!rxq_ctrl)
 			continue;
 		if (rxq_ctrl->type == MLX5_RXQ_TYPE_STANDARD) {
-			/* Pre-register Rx mempool. */
-			mp = mlx5_rxq_mprq_enabled(&rxq_ctrl->rxq) ?
-			     rxq_ctrl->rxq.mprq_mp : rxq_ctrl->rxq.mp;
-			DRV_LOG(DEBUG, "Port %u Rx queue %u registering mp %s"
-				" having %u chunks.", dev->data->port_id,
-				rxq_ctrl->rxq.idx, mp->name, mp->nb_mem_chunks);
-			mlx5_mr_update_mp(dev, &rxq_ctrl->rxq.mr_ctrl, mp);
+			/* Pre-register Rx mempools. */
+			if (mlx5_rxq_mprq_enabled(&rxq_ctrl->rxq)) {
+				mlx5_mr_update_mp(dev, &rxq_ctrl->rxq.mr_ctrl,
+						  rxq_ctrl->rxq.mprq_mp);
+			} else {
+				uint32_t s;
+
+				for (s = 0; s < rxq_ctrl->rxq.rxseg_n; s++)
+					mlx5_mr_update_mp
+						(dev, &rxq_ctrl->rxq.mr_ctrl,
+						rxq_ctrl->rxq.rxseg[s].mp);
+			}
 			ret = rxq_alloc_elts(rxq_ctrl);
 			if (ret)
 				goto error;
-- 
1.8.3.1



* [dpdk-dev] [PATCH v4 9/9] net/mlx5: update Rx datapath to support split
  2020-10-12 20:09 ` [dpdk-dev] [PATCH v4 0/9] ethdev: introduce Rx buffer split Viacheslav Ovsiienko
                     ` (7 preceding siblings ...)
  2020-10-12 20:10   ` [dpdk-dev] [PATCH v4 8/9] net/mlx5: register multiple pool for Rx queue Viacheslav Ovsiienko
@ 2020-10-12 20:10   ` Viacheslav Ovsiienko
  8 siblings, 0 replies; 172+ messages in thread
From: Viacheslav Ovsiienko @ 2020-10-12 20:10 UTC (permalink / raw)
  To: dev
  Cc: thomasm, stephen, ferruh.yigit, olivier.matz, jerinjacobk,
	maxime.coquelin, david.marchand, arybchenko

Only the regular rx_burst routine is updated to support split,
because the vectorized ones do not support scatter and MPRQ
does not support split at all.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
 drivers/net/mlx5/mlx5_rxq.c  | 11 +++++------
 drivers/net/mlx5/mlx5_rxtx.c |  3 ++-
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index 4ec4677..2ebb265 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -210,9 +210,10 @@
 
 	/* Iterate on segments. */
 	for (i = 0; (i != elts_n); ++i) {
+		struct rte_eth_rxseg *seg = &rxq_ctrl->rxq.rxseg[i % sges_n];
 		struct rte_mbuf *buf;
 
-		buf = rte_pktmbuf_alloc(rxq_ctrl->rxq.mp);
+		buf = rte_pktmbuf_alloc(seg->mp);
 		if (buf == NULL) {
 			DRV_LOG(ERR, "port %u empty mbuf pool",
 				PORT_ID(rxq_ctrl->priv));
@@ -225,12 +226,10 @@
 		MLX5_ASSERT(rte_pktmbuf_data_len(buf) == 0);
 		MLX5_ASSERT(rte_pktmbuf_pkt_len(buf) == 0);
 		MLX5_ASSERT(!buf->next);
-		/* Only the first segment keeps headroom. */
-		if (i % sges_n)
-			SET_DATA_OFF(buf, 0);
+		SET_DATA_OFF(buf, seg->offset);
 		PORT(buf) = rxq_ctrl->rxq.port_id;
-		DATA_LEN(buf) = rte_pktmbuf_tailroom(buf);
-		PKT_LEN(buf) = DATA_LEN(buf);
+		DATA_LEN(buf) = seg->length;
+		PKT_LEN(buf) = seg->length;
 		NB_SEGS(buf) = 1;
 		(*rxq_ctrl->rxq.elts)[i] = buf;
 	}
diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index b530ff4..dd84249 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -1334,7 +1334,8 @@ enum mlx5_txcmp_code {
 		rte_prefetch0(seg);
 		rte_prefetch0(cqe);
 		rte_prefetch0(wqe);
-		rep = rte_mbuf_raw_alloc(rxq->mp);
+		/* Allocate the buf from the same pool. */
+		rep = rte_mbuf_raw_alloc(seg->pool);
 		if (unlikely(rep == NULL)) {
 			++rxq->stats.rx_nombuf;
 			if (!pkt) {
-- 
1.8.3.1



* Re: [dpdk-dev] [PATCH v3 1/9] ethdev: introduce Rx buffer split
  2020-10-12 17:11         ` Andrew Rybchenko
@ 2020-10-12 20:22           ` Slava Ovsiienko
  0 siblings, 0 replies; 172+ messages in thread
From: Slava Ovsiienko @ 2020-10-12 20:22 UTC (permalink / raw)
  To: Andrew Rybchenko, NBU-Contact-Thomas Monjalon
  Cc: dev, stephen, ferruh.yigit, olivier.matz, jerinjacobk,
	maxime.coquelin, david.marchand

Hi, Andrew

You are right - the duplication of the rte_eth_rx_queue_setup() code was large
and it did not look good indeed.

I've updated the code, now rte_eth_rx_queue_setup() and rte_eth_rxseg_queue_setup()
share the underlying internal routine __rte_eth_rx_queue_setup().

Of course, there is some refactoring, but it is fairly straightforward, and I hope you
will find it acceptable, please see the v4 of the patchset.

As I said, I do not see a decisive pro or con either way.
Anyway, if we decide to move the segment descriptions to the config struct - there is
just a small step remaining over the existing code to implement that approach.

With best regards, Slava

> -----Original Message-----
> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> Sent: Monday, October 12, 2020 20:11
> To: NBU-Contact-Thomas Monjalon <thomas@monjalon.net>; Slava
> Ovsiienko <viacheslavo@nvidia.com>
> Cc: dev@dpdk.org; stephen@networkplumber.org; ferruh.yigit@intel.com;
> olivier.matz@6wind.com; jerinjacobk@gmail.com;
> maxime.coquelin@redhat.com; david.marchand@redhat.com
> Subject: Re: [dpdk-dev] [PATCH v3 1/9] ethdev: introduce Rx buffer split
> 
> On 10/12/20 8:03 PM, Thomas Monjalon wrote:
> > 12/10/2020 18:38, Andrew Rybchenko:
> >> On 10/12/20 7:19 PM, Viacheslav Ovsiienko wrote:
> >>>  int
> >>> +rte_eth_rxseg_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
> >>> +			  uint16_t nb_rx_desc, unsigned int socket_id,
> >>> +			  const struct rte_eth_rxconf *rx_conf,
> >>> +			  const struct rte_eth_rxseg *rx_seg, uint16_t n_seg) {
> >>> +	int ret;
> >>> +	uint16_t seg_idx;
> >>> +	uint32_t mbp_buf_size;
> >>
> >> <start-of-dup>
> >>
> >>> +	struct rte_eth_dev *dev;
> >>> +	struct rte_eth_dev_info dev_info;
> >>> +	struct rte_eth_rxconf local_conf;
> >>> +	void **rxq;
> >>> +
> >>> +	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
> >>> +
> >>> +	dev = &rte_eth_devices[port_id];
> >>> +	if (rx_queue_id >= dev->data->nb_rx_queues) {
> >>> +		RTE_ETHDEV_LOG(ERR, "Invalid RX queue_id=%u\n",
> rx_queue_id);
> >>> +		return -EINVAL;
> >>> +	}
> >>
> >> <end-of-dup>
> >>
> >>> +
> >>> +	if (rx_seg == NULL) {
> >>> +		RTE_ETHDEV_LOG(ERR, "Invalid null description pointer\n");
> >>> +		return -EINVAL;
> >>> +	}
> >>> +
> >>> +	if (n_seg == 0) {
> >>> +		RTE_ETHDEV_LOG(ERR, "Invalid zero description
> number\n");
> >>> +		return -EINVAL;
> >>> +	}
> >>> +
> >>> +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rxseg_queue_setup,
> >>> +-ENOTSUP);
> >>> +
> >>
> >> <start-of-dup>
> >>
> >>> +	/*
> >>> +	 * Check the size of the mbuf data buffer.
> >>> +	 * This value must be provided in the private data of the memory
> pool.
> >>> +	 * First check that the memory pool has a valid private data.
> >>> +	 */
> >>> +	ret = rte_eth_dev_info_get(port_id, &dev_info);
> >>> +	if (ret != 0)
> >>> +		return ret;
> >>
> >> <end-of-dup>
> >>
> >>> +
> >>> +	for (seg_idx = 0; seg_idx < n_seg; seg_idx++) {
> >>> +		struct rte_mempool *mp = rx_seg[seg_idx].mp;
> >>> +
> >>> +		if (mp->private_data_size <
> >>> +				sizeof(struct rte_pktmbuf_pool_private)) {
> >>> +			RTE_ETHDEV_LOG(ERR, "%s private_data_size %d <
> %d\n",
> >>> +				mp->name, (int)mp->private_data_size,
> >>> +				(int)sizeof(struct
> rte_pktmbuf_pool_private));
> >>> +			return -ENOSPC;
> >>> +		}
> >>> +
> >>> +		mbp_buf_size = rte_pktmbuf_data_room_size(mp);
> >>> +		if (mbp_buf_size < rx_seg[seg_idx].length +
> >>> +				   rx_seg[seg_idx].offset +
> >>> +				   (seg_idx ? 0 :
> >>> +				    (uint32_t)RTE_PKTMBUF_HEADROOM)) {
> >>> +			RTE_ETHDEV_LOG(ERR,
> >>> +				"%s mbuf_data_room_size %d < %d"
> >>> +				" (segment length=%d + segment
> offset=%d)\n",
> >>> +				mp->name, (int)mbp_buf_size,
> >>> +				(int)(rx_seg[seg_idx].length +
> >>> +				      rx_seg[seg_idx].offset),
> >>> +				(int)rx_seg[seg_idx].length,
> >>> +				(int)rx_seg[seg_idx].offset);
> >>> +			return -EINVAL;
> >>> +		}
> >>> +	}
> >>> +
> >>
> >> <start-of-huge-dup>
> >>
> >>> +	/* Use default specified by driver, if nb_rx_desc is zero */
> >>> +	if (nb_rx_desc == 0) {
> >>> +		nb_rx_desc = dev_info.default_rxportconf.ring_size;
> >>> +		/* If driver default is also zero, fall back on EAL default */
> >>> +		if (nb_rx_desc == 0)
> >>> +			nb_rx_desc =
> RTE_ETH_DEV_FALLBACK_RX_RINGSIZE;
> >>> +	}
> >>> +
> >>> +	if (nb_rx_desc > dev_info.rx_desc_lim.nb_max ||
> >>> +			nb_rx_desc < dev_info.rx_desc_lim.nb_min ||
> >>> +			nb_rx_desc % dev_info.rx_desc_lim.nb_align != 0) {
> >>> +
> >>> +		RTE_ETHDEV_LOG(ERR,
> >>> +			"Invalid value for nb_rx_desc(=%hu), should be: "
> >>> +			"<= %hu, >= %hu, and a product of %hu\n",
> >>> +			nb_rx_desc, dev_info.rx_desc_lim.nb_max,
> >>> +			dev_info.rx_desc_lim.nb_min,
> >>> +			dev_info.rx_desc_lim.nb_align);
> >>> +		return -EINVAL;
> >>> +	}
> >>> +
> >>> +	if (dev->data->dev_started &&
> >>> +		!(dev_info.dev_capa &
> >>> +			RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP))
> >>> +		return -EBUSY;
> >>> +
> >>> +	if (dev->data->dev_started &&
> >>> +		(dev->data->rx_queue_state[rx_queue_id] !=
> >>> +			RTE_ETH_QUEUE_STATE_STOPPED))
> >>> +		return -EBUSY;
> >>> +
> >>> +	rxq = dev->data->rx_queues;
> >>> +	if (rxq[rx_queue_id]) {
> >>> +		RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops-
> >rx_queue_release,
> >>> +					-ENOTSUP);
> >>> +		(*dev->dev_ops->rx_queue_release)(rxq[rx_queue_id]);
> >>> +		rxq[rx_queue_id] = NULL;
> >>> +	}
> >>> +
> >>> +	if (rx_conf == NULL)
> >>> +		rx_conf = &dev_info.default_rxconf;
> >>> +
> >>> +	local_conf = *rx_conf;
> >>> +
> >>> +	/*
> >>> +	 * If an offloading has already been enabled in
> >>> +	 * rte_eth_dev_configure(), it has been enabled on all queues,
> >>> +	 * so there is no need to enable it in this queue again.
> >>> +	 * The local_conf.offloads input to underlying PMD only carries
> >>> +	 * those offloadings which are only enabled on this queue and
> >>> +	 * not enabled on all queues.
> >>> +	 */
> >>> +	local_conf.offloads &= ~dev->data->dev_conf.rxmode.offloads;
> >>> +
> >>> +	/*
> >>> +	 * New added offloadings for this queue are those not enabled in
> >>> +	 * rte_eth_dev_configure() and they must be per-queue type.
> >>> +	 * A pure per-port offloading can't be enabled on a queue while
> >>> +	 * disabled on another queue. A pure per-port offloading can't
> >>> +	 * be enabled for any queue as new added one if it hasn't been
> >>> +	 * enabled in rte_eth_dev_configure().
> >>> +	 */
> >>> +	if ((local_conf.offloads & dev_info.rx_queue_offload_capa) !=
> >>> +	     local_conf.offloads) {
> >>> +		RTE_ETHDEV_LOG(ERR,
> >>> +			"Ethdev port_id=%d rx_queue_id=%d, new added
> offloads"
> >>> +			" 0x%"PRIx64" must be within per-queue offload"
> >>> +			" capabilities 0x%"PRIx64" in %s()\n",
> >>> +			port_id, rx_queue_id, local_conf.offloads,
> >>> +			dev_info.rx_queue_offload_capa,
> >>> +			__func__);
> >>> +		return -EINVAL;
> >>> +	}
> >>> +
> >>> +	/*
> >>> +	 * If LRO is enabled, check that the maximum aggregated packet
> >>> +	 * size is supported by the configured device.
> >>> +	 */
> >>> +	if (local_conf.offloads & DEV_RX_OFFLOAD_TCP_LRO) {
> >>> +		if (dev->data->dev_conf.rxmode.max_lro_pkt_size == 0)
> >>> +			dev->data->dev_conf.rxmode.max_lro_pkt_size =
> >>> +				dev->data-
> >dev_conf.rxmode.max_rx_pkt_len;
> >>> +		int ret = check_lro_pkt_size(port_id,
> >>> +				dev->data-
> >dev_conf.rxmode.max_lro_pkt_size,
> >>> +				dev->data-
> >dev_conf.rxmode.max_rx_pkt_len,
> >>> +				dev_info.max_lro_pkt_size);
> >>> +		if (ret != 0)
> >>> +			return ret;
> >>> +	}
> >>
> >> <end-of-huge-dup>
> >>
> >> IMO It is not acceptable to duplication so much code.
> >> It is simply unmaintainable.
> >>
> >> NACK
> >
> > Can it be solved by making rte_eth_rx_queue_setup() a wrapper on top
> > of this new rte_eth_rxseg_queue_setup() ?
> >
> 
> Could be, but strictly speaking it will break arguments validation order and
> error reporting in various cases.
> So, refactoring is required to keep it consistent.


* [dpdk-dev] [PATCH v5 0/6] ethdev: introduce Rx buffer split
  2020-08-17 17:49 [dpdk-dev] [RFC] ethdev: introduce Rx buffer split Slava Ovsiienko
                   ` (4 preceding siblings ...)
  2020-10-12 20:09 ` [dpdk-dev] [PATCH v4 0/9] ethdev: introduce Rx buffer split Viacheslav Ovsiienko
@ 2020-10-13 19:21 ` Viacheslav Ovsiienko
  2020-10-13 19:21   ` [dpdk-dev] [PATCH v5 1/6] " Viacheslav Ovsiienko
                     ` (5 more replies)
  2020-10-14 18:11 ` [dpdk-dev] [PATCH v6 0/6] ethdev: introduce Rx buffer split Viacheslav Ovsiienko
                   ` (7 subsequent siblings)
  13 siblings, 6 replies; 172+ messages in thread
From: Viacheslav Ovsiienko @ 2020-10-13 19:21 UTC (permalink / raw)
  To: dev
  Cc: thomasm, stephen, ferruh.yigit, olivier.matz, jerinjacobk,
	maxime.coquelin, david.marchand, arybchenko

The DPDK datapath in the transmit direction is very flexible.
An application can build a multi-segment packet and manage
almost all data aspects - the memory pools where segments
are allocated from, the segment lengths, the memory attributes
like external buffers, registered for DMA, etc.

In the receiving direction, the datapath is much less flexible:
an application can only specify the memory pool to configure the
receiving queue and nothing more. In order to extend receiving
datapath capabilities it is proposed to add a way to provide
extended information on how to split the packets being received.

The following structure is introduced to specify the Rx packet
segment:

struct rte_eth_rxseg {
    struct rte_mempool *mp; /* memory pool to allocate segment from */
    uint16_t length; /* segment maximal data length,
                        configures "split point" */
    uint16_t offset; /* data offset from beginning
                        of mbuf data buffer */
    uint32_t reserved; /* reserved field */
};

The segment descriptions are added to the rte_eth_rxconf structure:
   rx_seg - pointer to the array of segment descriptions, each element
            describes the memory pool, maximal data length, initial
            data offset from the beginning of data buffer in mbuf.
            This array allows specifying different settings for
            each segment individually.
   n_seg - number of elements in the array

If the extended segment descriptions are provided with these new
fields the mp parameter of rte_eth_rx_queue_setup() must be
specified as NULL to avoid ambiguity.

The new offload flag RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT in device
capabilities is introduced to provide a way for the PMD to report to
the application that it supports splitting Rx packets into configurable
segments. Prior to invoking the rte_eth_rx_queue_setup() routine the
application should check the RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT flag.

If the Rx queue is configured with the segment descriptions, the
PMD will split the received packets into multiple segments, pushed
to mbufs with the specified attributes, according to the
specification in the description array:

- the first network buffer will be allocated from the memory pool
  specified in the first segment description element, the second
  network buffer - from the pool in the second segment description
  element and so on. If there are not enough elements to describe
  the buffers for an entire packet of maximal length, the pool from
  the last valid element will be used to allocate the buffers for
  the rest of the segments.

- the offsets from the segment description elements will provide
  the data offset from the buffer beginning, except for the first mbuf -
  for this one the offset is added to RTE_PKTMBUF_HEADROOM to get the
  actual offset from the buffer beginning. If there are not enough
  elements to describe the buffers for an entire packet of maximal
  length, the offsets for the rest of the segments are assumed to be zero.

- the data length received into each segment is limited by the
  length specified in the segment description element. Data
  reception starts with filling up the first mbuf data buffer; if the
  specified maximal segment length is reached and there is data
  remaining (the packet is longer than the buffer in the first mbuf), the
  following data will be pushed to the next segment up to its own
  maximal length. If the first two segments are not enough to store
  all the remaining packet data, the next (third) segment will
  be engaged and so on. If the length in the segment description
  element is zero, the actual buffer size will be deduced from
  the appropriate memory pool properties. If there are not enough
  elements to describe the buffers for an entire packet of maximal
  length, the buffer size will be deduced from the pool of the last
  valid element for the remaining segments.
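
The offset and length rules in the list above imply a per-segment size check:
the requested length plus offset (plus the headroom for the first buffer only)
has to fit into a buffer of the segment's pool. The sketch below models that
check in isolation. struct seg_check and check_segments() are hypothetical
names, and the pool is reduced to its data room size, so this is an
illustration of the rule rather than the ethdev code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one segment description plus its pool size. */
struct seg_check {
	uint32_t data_room; /* mbuf data room size of the segment's pool */
	uint16_t length;    /* requested maximal data length, 0 = pool size */
	uint16_t offset;    /* requested data offset */
};

/*
 * Return 0 if every segment fits into a buffer of its pool, otherwise
 * the negated 1-based index of the first segment that does not fit.
 */
static int
check_segments(const struct seg_check *seg, unsigned int n_seg,
	       uint32_t headroom)
{
	unsigned int i;

	assert(seg != NULL && n_seg > 0);
	for (i = 0; i < n_seg; i++) {
		/* Only the first buffer keeps the headroom. */
		uint32_t need = (uint32_t)seg[i].length + seg[i].offset +
				(i == 0 ? headroom : 0);

		if (need > seg[i].data_room)
			return -(int)(i + 1);
	}
	return 0;
}
```

A 2000-byte segment fits a 2048-byte buffer in any position but the first,
where the headroom (128 bytes in this example) pushes it over the limit.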

For example, let's suppose we configured the Rx queue with the
following segments:
    seg0 - pool0, len0=14B, off0=2B
    seg1 - pool1, len1=20B, off1=128B
    seg2 - pool2, len2=20B, off2=0B
    seg3 - pool3, len3=512B, off3=0B

The packet 46 bytes long will look like the following:
    seg0 - 14B long @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from pool0
    seg1 - 20B long @ 128 in mbuf from pool1
    seg2 - 12B long @ 0 in mbuf from pool2

The packet 1500 bytes long will look like the following:
    seg0 - 14B @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from pool0
    seg1 - 20B @ 128 in mbuf from pool1
    seg2 - 20B @ 0 in mbuf from pool2
    seg3 - 512B @ 0 in mbuf from pool3
    seg4 - 512B @ 0 in mbuf from pool3
    seg5 - 422B @ 0 in mbuf from pool3
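
The two layouts above can be reproduced with a small stand-alone model of the
split rules. This is only a sketch: struct seg_desc, struct seg_out and
split_layout() are hypothetical names, pools are represented by their index,
HEADROOM is an assumed stand-in for RTE_PKTMBUF_HEADROOM, and zero-length
descriptions (which need the pool buffer size) are rejected instead of being
resolved.

```c
#include <assert.h>
#include <stdint.h>

#define HEADROOM 128u /* assumed stand-in for RTE_PKTMBUF_HEADROOM */

/* Hypothetical mirror of a segment description, pool given as an index. */
struct seg_desc {
	unsigned int pool; /* index of the pool the buffer comes from */
	uint16_t length;   /* maximal data length, must be non-zero here */
	uint16_t offset;   /* data offset requested by the application */
};

/* One piece of a received packet after the split. */
struct seg_out {
	unsigned int pool;
	uint32_t offset; /* actual offset from the buffer beginning */
	uint32_t length;
};

/*
 * Split pkt_len bytes across the described segments. Buffers beyond the
 * description array reuse the last pool and length with zero offset.
 * Returns the number of pieces written to out[], or -1 on error.
 */
static int
split_layout(const struct seg_desc *seg, unsigned int n_seg,
	     uint32_t pkt_len, struct seg_out *out, unsigned int max_out)
{
	unsigned int i, n = 0;

	assert(n_seg > 0);
	for (i = 0; i < n_seg; i++)
		if (seg[i].length == 0)
			return -1; /* pool-derived lengths are not modelled */
	while (pkt_len > 0) {
		const struct seg_desc *d = &seg[n < n_seg ? n : n_seg - 1];
		uint32_t len = d->length < pkt_len ? d->length : pkt_len;

		if (n == max_out)
			return -1;
		out[n].pool = d->pool;
		/* Only the first buffer keeps the headroom. */
		out[n].offset = (n < n_seg ? d->offset : 0) +
				(n == 0 ? HEADROOM : 0);
		out[n].length = len;
		pkt_len -= len;
		n++;
	}
	return (int)n;
}
```

Feeding it the four segments from the example gives three pieces for the
46-byte packet and six for the 1500-byte one, the last being 422 bytes from
pool3.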

The offload RTE_ETH_RX_OFFLOAD_SCATTER must be present and
configured to support the new buffer split feature (if n_seg
is greater than one).
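
The capability and scatter requirements stated so far can be summarized as a
small predicate. This is a sketch under assumptions: the bit values below are
placeholders rather than the ones from rte_ethdev.h, and
can_use_buffer_split() is a hypothetical helper. In a real application the
capability mask would presumably come from rte_eth_dev_info_get().

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder bit values for illustration only, not the DPDK definitions. */
#define OFFLOAD_SCATTER      (UINT64_C(1) << 0)
#define OFFLOAD_BUFFER_SPLIT (UINT64_C(1) << 1)

/*
 * Return 1 if a queue with n_seg segment descriptions may be configured:
 * the device must report buffer split support, and for more than one
 * segment scatter must be both supported and enabled.
 */
static int
can_use_buffer_split(uint64_t rx_offload_capa, uint64_t rx_offloads,
		     unsigned int n_seg)
{
	assert(n_seg > 0);
	if (!(rx_offload_capa & OFFLOAD_BUFFER_SPLIT))
		return 0;
	if (n_seg > 1 && !(rx_offload_capa & rx_offloads & OFFLOAD_SCATTER))
		return 0;
	return 1;
}
```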

The new approach would allow splitting the ingress packets into
multiple parts pushed to memory with different attributes.
For example, the packet headers can be pushed to the embedded
data buffers within mbufs and the application data into
the external buffers attached to mbufs allocated from
different memory pools. The memory attributes for the split
parts may differ as well - for example the application data
may be pushed into external memory located on a dedicated
physical device, say GPU or NVMe. This would improve the DPDK
receiving datapath flexibility while preserving compatibility
with the existing API.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
v1: http://patches.dpdk.org/patch/79594/
v2: http://patches.dpdk.org/patch/79893/
    - add feature support to mlx5 PMD

v3: http://patches.dpdk.org/patch/80389/
    - rte_eth_rx_queue_setup_ex is renamed to rte_eth_rxseg_queue_setup
    - DEV_RX_OFFLOAD_BUFFER_SPLIT is renamed to
      RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT
    - commit message update
    - documentation provided
    - release notes update
    - minor bug fixes in testpmd related part

v4: http://patches.dpdk.org/patch/80401/
    - common part of rx_queue_setup/rxseg_queue_setup

v5: - refactored to approach of providing split configuration
      in the rte_eth_rxconf structure instead of introducing
      the new API routine
    - added support for rxoffs command to testpmd to
      provide segment offsets for complete testing of split
      configurations
    - patchset is split into two parts - PMD part will
      be presented as separate series

Viacheslav Ovsiienko (6):
  ethdev: introduce Rx buffer split
  app/testpmd: add multiple pools per core creation
  app/testpmd: add buffer split offload configuration
  app/testpmd: add rxpkts commands and parameters
  app/testpmd: add rxoffs commands and parameters
  app/testpmd: add extended Rx queue setup

 app/test-pmd/bpf_cmd.c                      |   4 +-
 app/test-pmd/cmdline.c                      | 151 ++++++++++++++++++++++++----
 app/test-pmd/config.c                       | 107 +++++++++++++++++++-
 app/test-pmd/parameters.c                   |  54 ++++++++--
 app/test-pmd/testpmd.c                      | 120 ++++++++++++++++------
 app/test-pmd/testpmd.h                      |  44 ++++++--
 doc/guides/nics/features.rst                |  15 +++
 doc/guides/rel_notes/release_20_11.rst      |   9 ++
 doc/guides/testpmd_app_ug/run_app.rst       |  22 +++-
 doc/guides/testpmd_app_ug/testpmd_funcs.rst |  36 ++++++-
 lib/librte_ethdev/rte_ethdev.c              |  95 +++++++++++++----
 lib/librte_ethdev/rte_ethdev.h              |  58 ++++++++++-
 lib/librte_ethdev/rte_ethdev_version.map    |   1 +
 13 files changed, 621 insertions(+), 95 deletions(-)

-- 
1.8.3.1



* [dpdk-dev] [PATCH v5 1/6] ethdev: introduce Rx buffer split
  2020-10-13 19:21 ` [dpdk-dev] [PATCH v5 0/6] ethdev: introduce Rx buffer split Viacheslav Ovsiienko
@ 2020-10-13 19:21   ` " Viacheslav Ovsiienko
  2020-10-13 22:34     ` Ferruh Yigit
  2020-10-13 19:21   ` [dpdk-dev] [PATCH v5 2/6] app/testpmd: add multiple pools per core creation Viacheslav Ovsiienko
                     ` (4 subsequent siblings)
  5 siblings, 1 reply; 172+ messages in thread
From: Viacheslav Ovsiienko @ 2020-10-13 19:21 UTC (permalink / raw)
  To: dev
  Cc: thomasm, stephen, ferruh.yigit, olivier.matz, jerinjacobk,
	maxime.coquelin, david.marchand, arybchenko

The DPDK datapath in the transmit direction is very flexible.
An application can build a multi-segment packet and manage
almost all data aspects - the memory pools where segments
are allocated from, the segment lengths, the memory attributes
like external buffers, registered for DMA, etc.

In the receiving direction, the datapath is much less flexible:
an application can only specify the memory pool to configure the
receiving queue and nothing more. In order to extend receiving
datapath capabilities it is proposed to add a way to provide
extended information on how to split the packets being received.

The following structure is introduced to specify the Rx packet
segment:

struct rte_eth_rxseg {
    struct rte_mempool *mp; /* memory pool to allocate segment from */
    uint16_t length; /* segment maximal data length,
                        configures "split point" */
    uint16_t offset; /* data offset from beginning
                        of mbuf data buffer */
    uint32_t reserved; /* reserved field */
};

The segment descriptions are added to the rte_eth_rxconf structure:
   rx_seg - pointer to the array of segment descriptions, each element
            describes the memory pool, maximal data length, and initial
            data offset from the beginning of the data buffer in the mbuf.
            This array allows different settings to be specified for
            each segment individually.
   rx_nseg - number of elements in the array

If the extended segment descriptions are provided with these new
fields, the mp parameter of rte_eth_rx_queue_setup() must be
specified as NULL to avoid ambiguity.
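
The rule above can be illustrated with a small standalone sketch. It
does not use the DPDK headers: struct pool and struct rxseg are
stand-ins for struct rte_mempool and struct rte_eth_rxseg, and
resolve_rx_segs() is a hypothetical helper name, not part of the
ethdev API. It only shows which (mp, rx_seg, rx_nseg) combinations
are treated as valid under the description above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for struct rte_mempool; only pointer identity matters here. */
struct pool { int id; };

/* Stand-in mirroring the layout of struct rte_eth_rxseg. */
struct rxseg {
    struct pool *mp;
    uint16_t length;
    uint16_t offset;
    uint32_t reserved;
};

/*
 * Hypothetical helper: resolve the effective segment array from the
 * (mp, rx_seg, rx_nseg) triple. Returns 0 on success or -EINVAL when
 * the combination is ambiguous or incomplete.
 */
static int
resolve_rx_segs(struct pool *mp, const struct rxseg *rx_seg,
                uint16_t rx_nseg, struct rxseg *single,
                const struct rxseg **out_seg, uint16_t *out_nseg)
{
    assert(single != NULL && out_seg != NULL && out_nseg != NULL);

    if (rx_seg == NULL) {
        /* Legacy mode: a single pool, no segment descriptions. */
        if (rx_nseg != 0 || mp == NULL)
            return -EINVAL;
        single->mp = mp;
        single->length = 0;
        single->offset = 0;
        single->reserved = 0;
        *out_seg = single;
        *out_nseg = 1;
        return 0;
    }
    /* Extended mode: descriptions given, so mp must be NULL. */
    if (rx_nseg == 0 || mp != NULL)
        return -EINVAL;
    *out_seg = rx_seg;
    *out_nseg = rx_nseg;
    return 0;
}
```

Passing both a pool and a segment array is rejected rather than
silently preferring one of them, which is the ambiguity the NULL mp
requirement is meant to exclude.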

The new offload flag RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT is introduced in
the device capabilities so that a PMD can report to the application
that it supports splitting Rx packets into configurable segments.
Before invoking the rte_eth_rx_queue_setup() routine the application
should check the RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT flag.

If the Rx queue is configured with the segment descriptions, the
packets being received will be split into multiple segments pushed
to mbufs with the specified attributes. The PMD will split the
received packets according to the specification in the description
array:

- the first network buffer will be allocated from the memory pool
  specified in the first segment description element, the second
  network buffer - from the pool in the second segment description
  element, and so on. If there are not enough elements to describe
  the buffers for an entire packet of maximal length, the pool from
  the last valid element will be used to allocate the buffers for
  the rest of the segments.

- the offsets from the segment description elements will provide
  the data offset from the buffer beginning, except for the first
  mbuf - for this one the offset is added to RTE_PKTMBUF_HEADROOM to
  get the actual offset from the buffer beginning. If there are not
  enough elements to describe the buffers for an entire packet of
  maximal length, the offsets for the rest of the segments will be
  assumed to be zero.

- the data length received into each segment is limited by the
  length specified in the segment description element. Receiving
  starts with filling up the first mbuf data buffer; if the
  specified maximal segment length is reached and there is data
  remaining (the packet is longer than the buffer in the first mbuf),
  the following data will be pushed to the next segment up to its
  own maximal length. If the first two segments are not enough to
  store all the remaining packet data, the next (third) segment will
  be engaged, and so on. If the length in the segment description
  element is zero, the actual buffer size will be deduced from
  the appropriate memory pool properties. If there are not enough
  elements to describe the buffers for an entire packet of maximal
  length, the buffer size will be deduced from the pool of the last
  valid element for the remaining segments.
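
The length and offset rules can be sketched as a standalone check
that mirrors the per-segment validation in this patch. The code below
is illustrative only: HEADROOM stands in for RTE_PKTMBUF_HEADROOM
(128 is assumed here), data_room stands in for the value returned by
rte_pktmbuf_data_room_size(), and effective_seg_length() is a
hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

#define HEADROOM 128u /* assumed stand-in for RTE_PKTMBUF_HEADROOM */

/*
 * Hypothetical helper: return the effective maximal data length of a
 * segment, or 0 if the description does not fit into the buffer.
 * Only the first segment reserves the headroom.
 */
static uint32_t
effective_seg_length(uint32_t data_room, uint16_t length,
                     uint16_t offset, int is_first)
{
    uint32_t head_room = is_first ? HEADROOM : 0;
    uint32_t len = length;

    if (len == 0) {
        /* Zero length: deduce the size from the pool data room. */
        if (data_room <= head_room)
            return 0;
        len = data_room - head_room;
    }
    /* The data must fit after the headroom and the offset. */
    if (data_room < len + offset + head_room)
        return 0;
    return len;
}
```

Note that, with this check, a zero length combined with a non-zero
offset is rejected, because the deduced length already takes all of
the remaining data room.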

For example, let's suppose we configured the Rx queue with the
following segments:
    seg0 - pool0, len0=14B, off0=2B
    seg1 - pool1, len1=20B, off1=128B
    seg2 - pool2, len2=20B, off2=0B
    seg3 - pool3, len3=512B, off3=0B

A packet 46 bytes long will look like the following:
    seg0 - 14B long @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from pool0
    seg1 - 20B long @ 128 in mbuf from pool1
    seg2 - 12B long @ 0 in mbuf from pool2

A packet 1500 bytes long will look like the following:
    seg0 - 14B @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from pool0
    seg1 - 20B @ 128 in mbuf from pool1
    seg2 - 20B @ 0 in mbuf from pool2
    seg3 - 512B @ 0 in mbuf from pool3
    seg4 - 512B @ 0 in mbuf from pool3
    seg5 - 422B @ 0 in mbuf from pool3
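
The two layouts above can be reproduced with a small standalone
simulation of the split rules. This is a sketch, not PMD code: the
types and split_packet() are hypothetical, pools are represented by
plain indices, HEADROOM stands in for RTE_PKTMBUF_HEADROOM (128 is
assumed), and all lengths are required to be non-zero because pool
properties are not modelled. Segments beyond the description array
reuse the pool and length of the last element with a zero offset,
as in the 1500-byte example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HEADROOM 128u /* assumed stand-in for RTE_PKTMBUF_HEADROOM */

struct seg_desc {
    int pool;        /* pool index, stand-in for struct rte_mempool * */
    uint16_t length; /* maximal data length, non-zero in this sketch */
    uint16_t offset; /* configured data offset */
};

struct seg_out {
    int pool;
    uint32_t data_off; /* offset from the buffer beginning */
    uint32_t data_len; /* bytes of the packet stored in this segment */
};

/* Returns the number of segments written to out[]. */
static size_t
split_packet(const struct seg_desc *desc, size_t n_desc, uint32_t pkt_len,
             struct seg_out *out, size_t max_out)
{
    size_t n = 0;

    assert(desc != NULL && out != NULL && n_desc > 0);
    while (pkt_len > 0 && n < max_out) {
        /* Past the end of the array, reuse the last valid element. */
        const struct seg_desc *d = &desc[n < n_desc ? n : n_desc - 1];
        uint32_t off = n < n_desc ? d->offset : 0;
        uint32_t len;

        assert(d->length > 0);
        len = pkt_len < d->length ? pkt_len : d->length;
        if (n == 0)
            off += HEADROOM; /* only the first mbuf keeps the headroom */
        out[n].pool = d->pool;
        out[n].data_off = off;
        out[n].data_len = len;
        pkt_len -= len;
        n++;
    }
    return n;
}
```

Called with the four descriptions listed above, the sketch yields
three segments of 14, 20 and 12 bytes for the 46-byte packet and six
segments ending with 512, 512 and 422 bytes for the 1500-byte packet.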

The offload RTE_ETH_RX_OFFLOAD_SCATTER must be present and
configured to support the new buffer split feature (if rx_nseg
is greater than one).

The new approach would allow splitting the ingress packets into
multiple parts pushed to memory with different attributes.
For example, the packet headers can be pushed to the embedded
data buffers within mbufs and the application data into
external buffers attached to mbufs allocated from
different memory pools. The memory attributes for the split
parts may differ as well - for example, the application data
may be pushed into external memory located on a dedicated
physical device, say a GPU or NVMe. This would improve the
flexibility of the DPDK receive datapath while preserving
compatibility with the existing API.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
 doc/guides/nics/features.rst             | 15 +++++
 doc/guides/rel_notes/release_20_11.rst   |  9 +++
 lib/librte_ethdev/rte_ethdev.c           | 95 ++++++++++++++++++++++++--------
 lib/librte_ethdev/rte_ethdev.h           | 58 ++++++++++++++++++-
 lib/librte_ethdev/rte_ethdev_version.map |  1 +
 5 files changed, 155 insertions(+), 23 deletions(-)

diff --git a/doc/guides/nics/features.rst b/doc/guides/nics/features.rst
index dd8c955..a45a9e8 100644
--- a/doc/guides/nics/features.rst
+++ b/doc/guides/nics/features.rst
@@ -185,6 +185,21 @@ Supports receiving segmented mbufs.
 * **[related]    eth_dev_ops**: ``rx_pkt_burst``.
 
 
+.. _nic_features_buffer_split:
+
+Buffer Split on Rx
+------------------
+
+Scatters the packets being received on specified boundaries to segmented mbufs.
+
+* **[uses]       rte_eth_rxconf,rte_eth_rxmode**: ``offloads:RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT``.
+* **[implements] datapath**: ``Buffer Split functionality``.
+* **[implements] rte_eth_dev_data**: ``buffer_split``.
+* **[provides]   rte_eth_dev_info**: ``rx_offload_capa:RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT``.
+* **[provides]   eth_dev_ops**: ``rxq_info_get:buffer_split``.
+* **[related]    API**: ``rte_eth_rx_queue_setup()``.
+
+
 .. _nic_features_lro:
 
 LRO
diff --git a/doc/guides/rel_notes/release_20_11.rst b/doc/guides/rel_notes/release_20_11.rst
index bcc0fc2..f67ec62 100644
--- a/doc/guides/rel_notes/release_20_11.rst
+++ b/doc/guides/rel_notes/release_20_11.rst
@@ -60,6 +60,12 @@ New Features
   Added the FEC API which provides functions for query FEC capabilities and
   current FEC mode from device. Also, API for configuring FEC mode is also provided.
 
+* **Introduced extended buffer description for receiving.**
+
+  Added the extended Rx queue setup routine providing the individual
+  descriptions for each Rx segment with maximal size, buffer offset and memory
+  pool to allocate data buffers from.
+
 * **Updated Broadcom bnxt driver.**
 
   Updated the Broadcom bnxt driver with new features and improvements, including:
@@ -253,6 +259,9 @@ API Changes
   As the data of ``uint8_t`` will be truncated when queue number under
   a TC is greater than 256.
 
+* ethdev: Added fields rx_seg and rx_nseg to rte_eth_rxconf structure
+  to provide extended description of the receiving buffer.
+
 * vhost: Moved vDPA APIs from experimental to stable.
 
 * rawdev: Added a structure size parameter to the functions
diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
index 892c246..7b64a6e 100644
--- a/lib/librte_ethdev/rte_ethdev.c
+++ b/lib/librte_ethdev/rte_ethdev.c
@@ -105,6 +105,9 @@ struct rte_eth_xstats_name_off {
 #define RTE_RX_OFFLOAD_BIT2STR(_name)	\
 	{ DEV_RX_OFFLOAD_##_name, #_name }
 
+#define RTE_ETH_RX_OFFLOAD_BIT2STR(_name)	\
+	{ RTE_ETH_RX_OFFLOAD_##_name, #_name }
+
 static const struct {
 	uint64_t offload;
 	const char *name;
@@ -128,9 +131,11 @@ struct rte_eth_xstats_name_off {
 	RTE_RX_OFFLOAD_BIT2STR(SCTP_CKSUM),
 	RTE_RX_OFFLOAD_BIT2STR(OUTER_UDP_CKSUM),
 	RTE_RX_OFFLOAD_BIT2STR(RSS_HASH),
+	RTE_ETH_RX_OFFLOAD_BIT2STR(BUFFER_SPLIT),
 };
 
 #undef RTE_RX_OFFLOAD_BIT2STR
+#undef RTE_ETH_RX_OFFLOAD_BIT2STR
 
 #define RTE_TX_OFFLOAD_BIT2STR(_name)	\
 	{ DEV_TX_OFFLOAD_##_name, #_name }
@@ -1770,10 +1775,14 @@ struct rte_eth_dev *
 		       struct rte_mempool *mp)
 {
 	int ret;
+	uint16_t seg_idx;
 	uint32_t mbp_buf_size;
 	struct rte_eth_dev *dev;
 	struct rte_eth_dev_info dev_info;
 	struct rte_eth_rxconf local_conf;
+	const struct rte_eth_rxseg *rx_seg;
+	struct rte_eth_rxseg seg_single = { .mp = mp};
+	uint16_t n_seg;
 	void **rxq;
 
 	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
@@ -1784,13 +1793,32 @@ struct rte_eth_dev *
 		return -EINVAL;
 	}
 
-	if (mp == NULL) {
-		RTE_ETHDEV_LOG(ERR, "Invalid null mempool pointer\n");
-		return -EINVAL;
-	}
-
 	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_setup, -ENOTSUP);
 
+	rx_seg = rx_conf->rx_seg;
+	n_seg = rx_conf->rx_nseg;
+	if (rx_seg == NULL) {
+		/* Exclude ambiguities about segment descriptions. */
+		if (n_seg) {
+			RTE_ETHDEV_LOG(ERR,
+				       "Non empty array with null pointer\n");
+			return -EINVAL;
+		}
+		rx_seg = &seg_single;
+		n_seg = 1;
+	} else {
+		if (n_seg == 0) {
+			RTE_ETHDEV_LOG(ERR,
+				       "Invalid zero descriptions number\n");
+			return -EINVAL;
+		}
+		if (mp) {
+			RTE_ETHDEV_LOG(ERR,
+				       "Memory pool duplicated definition\n");
+			return -EINVAL;
+		}
+	}
+
 	/*
 	 * Check the size of the mbuf data buffer.
 	 * This value must be provided in the private data of the memory pool.
@@ -1800,23 +1828,48 @@ struct rte_eth_dev *
 	if (ret != 0)
 		return ret;
 
-	if (mp->private_data_size < sizeof(struct rte_pktmbuf_pool_private)) {
-		RTE_ETHDEV_LOG(ERR, "%s private_data_size %d < %d\n",
-			mp->name, (int)mp->private_data_size,
-			(int)sizeof(struct rte_pktmbuf_pool_private));
-		return -ENOSPC;
-	}
-	mbp_buf_size = rte_pktmbuf_data_room_size(mp);
+	for (seg_idx = 0; seg_idx < n_seg; seg_idx++) {
+		struct rte_mempool *mpl = rx_seg[seg_idx].mp;
+		uint32_t length = rx_seg[seg_idx].length;
+		uint32_t offset = rx_seg[seg_idx].offset;
+		uint32_t head_room = seg_idx ? 0 : RTE_PKTMBUF_HEADROOM;
 
-	if (mbp_buf_size < dev_info.min_rx_bufsize + RTE_PKTMBUF_HEADROOM) {
-		RTE_ETHDEV_LOG(ERR,
-			"%s mbuf_data_room_size %d < %d (RTE_PKTMBUF_HEADROOM=%d + min_rx_bufsize(dev)=%d)\n",
-			mp->name, (int)mbp_buf_size,
-			(int)(RTE_PKTMBUF_HEADROOM + dev_info.min_rx_bufsize),
-			(int)RTE_PKTMBUF_HEADROOM,
-			(int)dev_info.min_rx_bufsize);
-		return -EINVAL;
+		if (mpl == NULL) {
+			RTE_ETHDEV_LOG(ERR, "Invalid null mempool pointer\n");
+			return -EINVAL;
+		}
+
+		if (mpl->private_data_size <
+				sizeof(struct rte_pktmbuf_pool_private)) {
+			RTE_ETHDEV_LOG(ERR, "%s private_data_size %d < %d\n",
+				mpl->name, (int)mpl->private_data_size,
+				(int)sizeof(struct rte_pktmbuf_pool_private));
+			return -ENOSPC;
+		}
+
+		mbp_buf_size = rte_pktmbuf_data_room_size(mpl);
+		length = length ? length : (mbp_buf_size - head_room);
+		if (mbp_buf_size < length + offset + head_room) {
+			RTE_ETHDEV_LOG(ERR,
+				"%s mbuf_data_room_size %u < %u"
+				" (segment length=%u + segment offset=%u)\n",
+				mpl->name, mbp_buf_size,
+				length + offset, length, offset);
+			return -EINVAL;
+		}
 	}
+	/* Check the minimal buffer size for the single segment only. */
+	if (mp && (mbp_buf_size < dev_info.min_rx_bufsize +
+				  RTE_PKTMBUF_HEADROOM)) {
+		RTE_ETHDEV_LOG(ERR,
+			       "%s mbuf_data_room_size %u < %u "
+			       "(RTE_PKTMBUF_HEADROOM=%u + "
+			       "min_rx_bufsize(dev)=%u)\n",
+			       mp->name, mbp_buf_size,
+			       RTE_PKTMBUF_HEADROOM + dev_info.min_rx_bufsize,
+			       RTE_PKTMBUF_HEADROOM, dev_info.min_rx_bufsize);
+			return -EINVAL;
+		}
 
 	/* Use default specified by driver, if nb_rx_desc is zero */
 	if (nb_rx_desc == 0) {
@@ -1914,8 +1967,6 @@ struct rte_eth_dev *
 			dev->data->min_rx_buf_size = mbp_buf_size;
 	}
 
-	rte_ethdev_trace_rxq_setup(port_id, rx_queue_id, nb_rx_desc, mp,
-		rx_conf, ret);
 	return eth_err(port_id, ret);
 }
 
diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
index 5bcfbb8..9cf0a03 100644
--- a/lib/librte_ethdev/rte_ethdev.h
+++ b/lib/librte_ethdev/rte_ethdev.h
@@ -970,6 +970,16 @@ struct rte_eth_txmode {
 };
 
 /**
+ * A structure used to configure an RX packet segment to split.
+ */
+struct rte_eth_rxseg {
+	struct rte_mempool *mp; /**< Memory pool to allocate segment from. */
+	uint16_t length; /**< Segment data length, configures split point. */
+	uint16_t offset; /**< Data offset from beginning of mbuf data buffer */
+	uint32_t reserved; /**< Reserved field */
+};
+
+/**
  * A structure used to configure an RX ring of an Ethernet port.
  */
 struct rte_eth_rxconf {
@@ -977,13 +987,23 @@ struct rte_eth_rxconf {
 	uint16_t rx_free_thresh; /**< Drives the freeing of RX descriptors. */
 	uint8_t rx_drop_en; /**< Drop packets if no descriptors are available. */
 	uint8_t rx_deferred_start; /**< Do not start queue with rte_eth_dev_start(). */
+	uint16_t rx_nseg; /**< Number of descriptions in rx_seg array. */
+	/**
+	 * The pointer to the array of segment descriptions, each element
+	 * describes the memory pool, maximal segment data length, initial
+	 * data offset from the beginning of data buffer in mbuf. This allows
+	 * specifying dedicated properties for each segment in the receiving
+	 * buffer - pool, buffer offset, maximal segment size. The number of
+	 * segment descriptions in the array is specified by the rx_nseg
+	 * field.
+	 */
+	struct rte_eth_rxseg *rx_seg;
 	/**
 	 * Per-queue Rx offloads to be set using DEV_RX_OFFLOAD_* flags.
 	 * Only offloads set on rx_queue_offload_capa or rx_offload_capa
 	 * fields on rte_eth_dev_info structure are allowed to be set.
 	 */
 	uint64_t offloads;
-
 	uint64_t reserved_64s[2]; /**< Reserved for future fields */
 	void *reserved_ptrs[2];   /**< Reserved for future fields */
 };
@@ -1260,6 +1280,7 @@ struct rte_eth_conf {
 #define DEV_RX_OFFLOAD_SCTP_CKSUM	0x00020000
 #define DEV_RX_OFFLOAD_OUTER_UDP_CKSUM  0x00040000
 #define DEV_RX_OFFLOAD_RSS_HASH		0x00080000
+#define RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT 0x00100000
 
 #define DEV_RX_OFFLOAD_CHECKSUM (DEV_RX_OFFLOAD_IPV4_CKSUM | \
 				 DEV_RX_OFFLOAD_UDP_CKSUM | \
@@ -2027,6 +2048,41 @@ int rte_eth_dev_configure(uint16_t port_id, uint16_t nb_rx_queue,
  *   No need to repeat any bit in rx_conf->offloads which has already been
  *   enabled in rte_eth_dev_configure() at port level. An offloading enabled
  *   at port level can't be disabled at queue level.
+ *   The configuration structure also contains the pointer to the array
+ *   of the receiving buffer segment descriptions, each element describes
+ *   the memory pool, maximal segment data length, initial data offset from
+ *   the beginning of data buffer in mbuf. This allows specifying dedicated
+ *   properties for each segment in the receiving buffer - pool, buffer
+ *   offset, maximal segment size. If RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT offload
+ *   flag is configured the PMD will split the received packets into multiple
+ *   segments according to the specification in the description array:
+ *   - the first network buffer will be allocated from the memory pool,
+ *     specified in the first segment description element, the second
+ *     network buffer - from the pool in the second segment description
+ *     element and so on. If there are not enough elements to describe
+ *     the buffer for entire packet of maximal length the pool from the last
+ *     valid element will be used to allocate the buffers for the rest
+ *     of segments.
+ *   - the offsets from the segment description elements will provide the
+ *     data offset from the buffer beginning except the first mbuf - for this
+ *     one the offset is added to the RTE_PKTMBUF_HEADROOM to get actual
+ *     offset from the buffer beginning. If there are not enough elements
+ *     to describe the buffer for entire packet of maximal length the offsets
+ *     for the rest of segments will be assumed to be zero.
+ *   - the data length being received to each segment is limited by the
+ *     length specified in the segment description element. The data receiving
+ *     starts with filling up the first mbuf data buffer, if the specified
+ *     maximal segment length is reached and there are data remaining
+ *     (packet is longer than buffer in the first mbuf) the following data
+ *     will be pushed to the next segment up to its own length. If the first
+ *     two segments are not enough to store all the packet data the next
+ *     (third) segment will be engaged and so on. If the length in the segment
+ *     description element is zero the actual buffer size will be deduced
+ *     from the appropriate memory pool properties. If there are not enough
+ *     elements to describe the buffer for entire packet of maximal length
+ *     the buffer size will be deduced from the pool of the last valid
+ *     element for all the remaining segments.
+ *
  * @param mb_pool
  *   The pointer to the memory pool from which to allocate *rte_mbuf* network
  *   memory buffers to populate each descriptor of the receive ring.
diff --git a/lib/librte_ethdev/rte_ethdev_version.map b/lib/librte_ethdev/rte_ethdev_version.map
index f8a0945..25f7cee 100644
--- a/lib/librte_ethdev/rte_ethdev_version.map
+++ b/lib/librte_ethdev/rte_ethdev_version.map
@@ -232,6 +232,7 @@ EXPERIMENTAL {
 	rte_eth_fec_get_capability;
 	rte_eth_fec_get;
 	rte_eth_fec_set;
+
 };
 
 INTERNAL {
-- 
1.8.3.1



* [dpdk-dev] [PATCH v5 2/6] app/testpmd: add multiple pools per core creation
  2020-10-13 19:21 ` [dpdk-dev] [PATCH v5 0/6] ethdev: introduce Rx buffer split Viacheslav Ovsiienko
  2020-10-13 19:21   ` [dpdk-dev] [PATCH v5 1/6] " Viacheslav Ovsiienko
@ 2020-10-13 19:21   ` Viacheslav Ovsiienko
  2020-10-13 19:21   ` [dpdk-dev] [PATCH v5 3/6] app/testpmd: add buffer split offload configuration Viacheslav Ovsiienko
                     ` (3 subsequent siblings)
  5 siblings, 0 replies; 172+ messages in thread
From: Viacheslav Ovsiienko @ 2020-10-13 19:21 UTC (permalink / raw)
  To: dev
  Cc: thomasm, stephen, ferruh.yigit, olivier.matz, jerinjacobk,
	maxime.coquelin, david.marchand, arybchenko

The command line parameter --mbuf-size is updated to accept
multiple values, for example:

--mbuf-size=2176,512,768,4096

This requests the creation of extra memory pools with the requested
mbuf data buffer sizes. If a buffer split feature is engaged,
the extra memory pools can be used to configure the Rx queues
with rte_eth_rx_queue_setup().
The extra pools are created with the requested sizes, and pool names
are assigned with an appended index: mbuf_pool_socket_%socket_%index.
Index zero specifies the first, mandatory pool and keeps the
unsuffixed name to maintain compatibility with existing code.
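
The naming scheme and the layout of the pool table can be sketched as
follows. The code is a standalone illustration modelled on the
mbuf_poolname_build() change in this patch; pool_name() and
pool_slot() are hypothetical names, and MAX_SIZES is a local stand-in
for MAX_SEGS_BUFFER_SPLIT.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_SIZES 8 /* same value as MAX_SEGS_BUFFER_SPLIT in the patch */

/* Build the pool name; index zero keeps the legacy, unsuffixed name. */
static void
pool_name(char *buf, size_t size, unsigned int socket, unsigned int idx)
{
    assert(buf != NULL && size > 0);
    if (idx == 0)
        snprintf(buf, size, "mbuf_pool_socket_%u", socket);
    else
        snprintf(buf, size, "mbuf_pool_socket_%u_%u", socket, idx);
}

/*
 * Slot of a pool in a flat table holding MAX_SIZES pools per socket.
 * socket_pos is the position of the socket in the socket list.
 */
static unsigned int
pool_slot(unsigned int socket_pos, unsigned int idx)
{
    assert(idx < MAX_SIZES);
    return socket_pos * MAX_SIZES + idx;
}
```

Keeping the unsuffixed name for index zero means that lookups by the
old name still find the first pool of each socket.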

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
 app/test-pmd/bpf_cmd.c                |  4 +--
 app/test-pmd/cmdline.c                |  2 +-
 app/test-pmd/config.c                 |  6 ++--
 app/test-pmd/parameters.c             | 24 +++++++++----
 app/test-pmd/testpmd.c                | 63 +++++++++++++++++++----------------
 app/test-pmd/testpmd.h                | 24 ++++++++++---
 doc/guides/testpmd_app_ug/run_app.rst |  7 ++--
 7 files changed, 83 insertions(+), 47 deletions(-)

diff --git a/app/test-pmd/bpf_cmd.c b/app/test-pmd/bpf_cmd.c
index 16e3c3b..0a1a178 100644
--- a/app/test-pmd/bpf_cmd.c
+++ b/app/test-pmd/bpf_cmd.c
@@ -69,7 +69,7 @@ struct cmd_bpf_ld_result {
 
 	*flags = RTE_BPF_ETH_F_NONE;
 	arg->type = RTE_BPF_ARG_PTR;
-	arg->size = mbuf_data_size;
+	arg->size = mbuf_data_size[0];
 
 	for (i = 0; str[i] != 0; i++) {
 		v = toupper(str[i]);
@@ -78,7 +78,7 @@ struct cmd_bpf_ld_result {
 		else if (v == 'M') {
 			arg->type = RTE_BPF_ARG_PTR_MBUF;
 			arg->size = sizeof(struct rte_mbuf);
-			arg->buf_size = mbuf_data_size;
+			arg->buf_size = mbuf_data_size[0];
 		} else if (v == '-')
 			continue;
 		else
diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 273fb1a..a585cf0 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -2907,7 +2907,7 @@ struct cmd_setup_rxtx_queue {
 		if (!numa_support || socket_id == NUMA_NO_CONFIG)
 			socket_id = port->socket_id;
 
-		mp = mbuf_pool_find(socket_id);
+		mp = mbuf_pool_find(socket_id, 0);
 		if (mp == NULL) {
 			printf("Failed to setup RX queue: "
 				"No mempool allocation"
diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index d4be694..5f501f6 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -690,7 +690,7 @@ static int bus_match_all(const struct rte_bus *bus, const void *data)
 	printf("\nConnect to socket: %u", port->socket_id);
 
 	if (port_numa[port_id] != NUMA_NO_CONFIG) {
-		mp = mbuf_pool_find(port_numa[port_id]);
+		mp = mbuf_pool_find(port_numa[port_id], 0);
 		if (mp)
 			printf("\nmemory allocation on the socket: %d",
 							port_numa[port_id]);
@@ -3352,9 +3352,9 @@ struct igb_ring_desc_16_bytes {
 	 */
 	tx_pkt_len = 0;
 	for (i = 0; i < nb_segs; i++) {
-		if (seg_lengths[i] > (unsigned) mbuf_data_size) {
+		if (seg_lengths[i] > mbuf_data_size[0]) {
 			printf("length[%u]=%u > mbuf_data_size=%u - give up\n",
-			       i, seg_lengths[i], (unsigned) mbuf_data_size);
+			       i, seg_lengths[i], mbuf_data_size[0]);
 			return;
 		}
 		tx_pkt_len = (uint16_t)(tx_pkt_len + seg_lengths[i]);
diff --git a/app/test-pmd/parameters.c b/app/test-pmd/parameters.c
index 15ce8c1..4db4987 100644
--- a/app/test-pmd/parameters.c
+++ b/app/test-pmd/parameters.c
@@ -106,7 +106,9 @@
 	       "(flag: 1 for RX; 2 for TX; 3 for RX and TX).\n");
 	printf("  --socket-num=N: set socket from which all memory is allocated "
 	       "in NUMA mode.\n");
-	printf("  --mbuf-size=N: set the data size of mbuf to N bytes.\n");
+	printf("  --mbuf-size=N[,N1[,...Nn]]: set the data size of mbuf to "
+	       "N bytes. If multiple numbers are specified the extra pools "
+	       "will be created to receive with packet split features\n");
 	printf("  --total-num-mbufs=N: set the number of mbufs to be allocated "
 	       "in mbuf pools.\n");
 	printf("  --max-pkt-len=N: set the maximum size of packet to N bytes.\n");
@@ -892,12 +894,22 @@
 				}
 			}
 			if (!strcmp(lgopts[opt_idx].name, "mbuf-size")) {
-				n = atoi(optarg);
-				if (n > 0 && n <= 0xFFFF)
-					mbuf_data_size = (uint16_t) n;
-				else
+				unsigned int mb_sz[MAX_SEGS_BUFFER_SPLIT];
+				unsigned int nb_segs, i;
+
+				nb_segs = parse_item_list(optarg, "mbuf-size",
+					MAX_SEGS_BUFFER_SPLIT, mb_sz, 0);
+				if (nb_segs <= 0)
 					rte_exit(EXIT_FAILURE,
-						 "mbuf-size should be > 0 and < 65536\n");
+						 "bad mbuf-size\n");
+				for (i = 0; i < nb_segs; i++) {
+					if (mb_sz[i] <= 0 || mb_sz[i] > 0xFFFF)
+						rte_exit(EXIT_FAILURE,
+							 "mbuf-size should be "
+							 "> 0 and < 65536\n");
+					mbuf_data_size[i] = (uint16_t) mb_sz[i];
+				}
+				mbuf_data_size_n = nb_segs;
 			}
 			if (!strcmp(lgopts[opt_idx].name, "total-num-mbufs")) {
 				n = atoi(optarg);
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index ccba71c..7e6ef80 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -186,7 +186,7 @@ struct fwd_engine * fwd_engines[] = {
 	NULL,
 };
 
-struct rte_mempool *mempools[RTE_MAX_NUMA_NODES];
+struct rte_mempool *mempools[RTE_MAX_NUMA_NODES * MAX_SEGS_BUFFER_SPLIT];
 uint16_t mempool_flags;
 
 struct fwd_config cur_fwd_config;
@@ -195,7 +195,10 @@ struct fwd_engine * fwd_engines[] = {
 uint32_t burst_tx_delay_time = BURST_TX_WAIT_US;
 uint32_t burst_tx_retry_num = BURST_TX_RETRIES;
 
-uint16_t mbuf_data_size = DEFAULT_MBUF_DATA_SIZE; /**< Mbuf data space size. */
+uint32_t mbuf_data_size_n = 1; /* Number of specified mbuf sizes. */
+uint16_t mbuf_data_size[MAX_SEGS_BUFFER_SPLIT] = {
+	DEFAULT_MBUF_DATA_SIZE
+}; /**< Mbuf data space size. */
 uint32_t param_total_num_mbufs = 0;  /**< number of mbufs in all pools - if
                                       * specified on command-line. */
 uint16_t stats_period; /**< Period to show statistics (disabled by default) */
@@ -955,14 +958,14 @@ struct extmem_param {
  */
 static struct rte_mempool *
 mbuf_pool_create(uint16_t mbuf_seg_size, unsigned nb_mbuf,
-		 unsigned int socket_id)
+		 unsigned int socket_id, unsigned int size_idx)
 {
 	char pool_name[RTE_MEMPOOL_NAMESIZE];
 	struct rte_mempool *rte_mp = NULL;
 	uint32_t mb_size;
 
 	mb_size = sizeof(struct rte_mbuf) + mbuf_seg_size;
-	mbuf_poolname_build(socket_id, pool_name, sizeof(pool_name));
+	mbuf_poolname_build(socket_id, pool_name, sizeof(pool_name), size_idx);
 
 	TESTPMD_LOG(INFO,
 		"create a new mbuf pool <%s>: n=%u, size=%u, socket=%u\n",
@@ -1485,8 +1488,8 @@ struct extmem_param {
 				port->dev_info.rx_desc_lim.nb_mtu_seg_max;
 
 			if ((data_size + RTE_PKTMBUF_HEADROOM) >
-							mbuf_data_size) {
-				mbuf_data_size = data_size +
+							mbuf_data_size[0]) {
+				mbuf_data_size[0] = data_size +
 						 RTE_PKTMBUF_HEADROOM;
 				warning = 1;
 			}
@@ -1494,9 +1497,9 @@ struct extmem_param {
 	}
 
 	if (warning)
-		TESTPMD_LOG(WARNING, "Configured mbuf size %hu\n",
-			    mbuf_data_size);
-
+		TESTPMD_LOG(WARNING,
+			    "Configured mbuf size of the first segment %hu\n",
+			    mbuf_data_size[0]);
 	/*
 	 * Create pools of mbuf.
 	 * If NUMA support is disabled, create a single pool of mbuf in
@@ -1516,21 +1519,23 @@ struct extmem_param {
 	}
 
 	if (numa_support) {
-		uint8_t i;
+		uint8_t i, j;
 
 		for (i = 0; i < num_sockets; i++)
-			mempools[i] = mbuf_pool_create(mbuf_data_size,
-						       nb_mbuf_per_pool,
-						       socket_ids[i]);
+			for (j = 0; j < mbuf_data_size_n; j++)
+				mempools[i * MAX_SEGS_BUFFER_SPLIT + j] =
+					mbuf_pool_create(mbuf_data_size[j],
+							  nb_mbuf_per_pool,
+							  socket_ids[i], j);
 	} else {
-		if (socket_num == UMA_NO_CONFIG)
-			mempools[0] = mbuf_pool_create(mbuf_data_size,
-						       nb_mbuf_per_pool, 0);
-		else
-			mempools[socket_num] = mbuf_pool_create
-							(mbuf_data_size,
-							 nb_mbuf_per_pool,
-							 socket_num);
+		uint8_t i;
+
+		for (i = 0; i < mbuf_data_size_n; i++)
+			mempools[i] = mbuf_pool_create
+					(mbuf_data_size[i],
+					 nb_mbuf_per_pool,
+					 socket_num == UMA_NO_CONFIG ?
+					 0 : socket_num, i);
 	}
 
 	init_port_config();
@@ -1542,10 +1547,10 @@ struct extmem_param {
 	 */
 	for (lc_id = 0; lc_id < nb_lcores; lc_id++) {
 		mbp = mbuf_pool_find(
-			rte_lcore_to_socket_id(fwd_lcores_cpuids[lc_id]));
+			rte_lcore_to_socket_id(fwd_lcores_cpuids[lc_id]), 0);
 
 		if (mbp == NULL)
-			mbp = mbuf_pool_find(0);
+			mbp = mbuf_pool_find(0, 0);
 		fwd_lcores[lc_id]->mbp = mbp;
 		/* initialize GSO context */
 		fwd_lcores[lc_id]->gso_ctx.direct_pool = mbp;
@@ -2498,7 +2503,8 @@ struct extmem_param {
 				if ((numa_support) &&
 					(rxring_numa[pi] != NUMA_NO_CONFIG)) {
 					struct rte_mempool * mp =
-						mbuf_pool_find(rxring_numa[pi]);
+						mbuf_pool_find
+							(rxring_numa[pi], 0);
 					if (mp == NULL) {
 						printf("Failed to setup RX queue:"
 							"No mempool allocation"
@@ -2514,7 +2520,8 @@ struct extmem_param {
 					     mp);
 				} else {
 					struct rte_mempool *mp =
-						mbuf_pool_find(port->socket_id);
+						mbuf_pool_find
+							(port->socket_id, 0);
 					if (mp == NULL) {
 						printf("Failed to setup RX queue:"
 							"No mempool allocation"
@@ -2909,13 +2916,13 @@ struct extmem_param {
 pmd_test_exit(void)
 {
 	portid_t pt_id;
+	unsigned int i;
 	int ret;
-	int i;
 
 	if (test_done == 0)
 		stop_packet_forwarding();
 
-	for (i = 0 ; i < RTE_MAX_NUMA_NODES ; i++) {
+	for (i = 0 ; i < RTE_DIM(mempools) ; i++) {
 		if (mempools[i]) {
 			if (mp_alloc_type == MP_ALLOC_ANON)
 				rte_mempool_mem_iter(mempools[i], dma_unmap_cb,
@@ -2959,7 +2966,7 @@ struct extmem_param {
 			return;
 		}
 	}
-	for (i = 0 ; i < RTE_MAX_NUMA_NODES ; i++) {
+	for (i = 0 ; i < RTE_DIM(mempools) ; i++) {
 		if (mempools[i])
 			rte_mempool_free(mempools[i]);
 	}
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index 9a29d7a..b42d710 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -42,6 +42,13 @@
  */
 #define RTE_MAX_SEGS_PER_PKT 255 /**< nb_segs is a 8-bit unsigned char. */
 
+/*
+ * The maximum number of segments per packet is used to configure
+ * buffer split feature, also specifies the maximum amount of
+ * optional Rx pools to allocate mbufs to split.
+ */
+#define MAX_SEGS_BUFFER_SPLIT 8 /**< Maximum Rx segments per packet. */
+
 #define MAX_PKT_BURST 512
 #define DEF_PKT_BURST 32
 
@@ -393,7 +400,9 @@ struct queue_stats_mappings {
 extern uint8_t dcb_config;
 extern uint8_t dcb_test;
 
-extern uint16_t mbuf_data_size; /**< Mbuf data space size. */
+extern uint32_t mbuf_data_size_n;
+extern uint16_t mbuf_data_size[MAX_SEGS_BUFFER_SPLIT];
+/**< Mbuf data space size. */
 extern uint32_t param_total_num_mbufs;
 
 extern uint16_t stats_period;
@@ -605,17 +614,22 @@ struct mplsoudp_decap_conf {
 
 /* Mbuf Pools */
 static inline void
-mbuf_poolname_build(unsigned int sock_id, char* mp_name, int name_size)
+mbuf_poolname_build(unsigned int sock_id, char *mp_name,
+		    int name_size, unsigned int idx)
 {
-	snprintf(mp_name, name_size, "mbuf_pool_socket_%u", sock_id);
+	if (!idx)
+		snprintf(mp_name, name_size, "mbuf_pool_socket_%u", sock_id);
+	else
+		snprintf(mp_name, name_size, "mbuf_pool_socket_%u_%u",
+			 sock_id, idx);
 }
 
 static inline struct rte_mempool *
-mbuf_pool_find(unsigned int sock_id)
+mbuf_pool_find(unsigned int sock_id, unsigned int idx)
 {
 	char pool_name[RTE_MEMPOOL_NAMESIZE];
 
-	mbuf_poolname_build(sock_id, pool_name, sizeof(pool_name));
+	mbuf_poolname_build(sock_id, pool_name, sizeof(pool_name), idx);
 	return rte_mempool_lookup((const char *)pool_name);
 }
 
diff --git a/doc/guides/testpmd_app_ug/run_app.rst b/doc/guides/testpmd_app_ug/run_app.rst
index ec085c2..1eb0a10 100644
--- a/doc/guides/testpmd_app_ug/run_app.rst
+++ b/doc/guides/testpmd_app_ug/run_app.rst
@@ -107,9 +107,12 @@ The command line options are:
     Set the socket from which all memory is allocated in NUMA mode,
     where 0 <= N < number of sockets on the board.
 
-*   ``--mbuf-size=N``
+*   ``--mbuf-size=N[,N1[,...Nn]]``
 
-    Set the data size of the mbufs used to N bytes, where N < 65536. The default value is 2048.
+    Set the data size of the mbufs used to N bytes, where N < 65536.
+    The default value is 2048. If multiple mbuf-size values are specified the
+    extra memory pools will be created for allocating mbufs to receive packets
+    with buffer splitting features.
 
 *   ``--total-num-mbufs=N``
 
-- 
1.8.3.1



* [dpdk-dev] [PATCH v5 3/6] app/testpmd: add buffer split offload configuration
  2020-10-13 19:21 ` [dpdk-dev] [PATCH v5 0/6] ethdev: introduce Rx buffer split Viacheslav Ovsiienko
  2020-10-13 19:21   ` [dpdk-dev] [PATCH v5 1/6] " Viacheslav Ovsiienko
  2020-10-13 19:21   ` [dpdk-dev] [PATCH v5 2/6] app/testpmd: add multiple pools per core creation Viacheslav Ovsiienko
@ 2020-10-13 19:21   ` Viacheslav Ovsiienko
  2020-10-13 19:21   ` [dpdk-dev] [PATCH v5 4/6] app/testpmd: add rxpkts commands and parameters Viacheslav Ovsiienko
                     ` (2 subsequent siblings)
  5 siblings, 0 replies; 172+ messages in thread
From: Viacheslav Ovsiienko @ 2020-10-13 19:21 UTC (permalink / raw)
  To: dev
  Cc: thomasm, stephen, ferruh.yigit, olivier.matz, jerinjacobk,
	maxime.coquelin, david.marchand, arybchenko

This patch adds support for RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT,
providing per queue configuration for this offload.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
 app/test-pmd/cmdline.c | 21 +++++++++++----------
 app/test-pmd/config.c  |  9 +++++++++
 2 files changed, 20 insertions(+), 10 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index a585cf0..fa71039 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -883,16 +883,16 @@ static void cmd_help_long_parsed(void *parsed_result,
 			"port config <port_id> rx_offload vlan_strip|"
 			"ipv4_cksum|udp_cksum|tcp_cksum|tcp_lro|qinq_strip|"
 			"outer_ipv4_cksum|macsec_strip|header_split|"
-			"vlan_filter|vlan_extend|jumbo_frame|"
-			"scatter|timestamp|security|keep_crc on|off\n"
+			"vlan_filter|vlan_extend|jumbo_frame|scatter|"
+			"buffer_split|timestamp|security|keep_crc on|off\n"
 			"     Enable or disable a per port Rx offloading"
 			" on all Rx queues of a port\n\n"
 
 			"port (port_id) rxq (queue_id) rx_offload vlan_strip|"
 			"ipv4_cksum|udp_cksum|tcp_cksum|tcp_lro|qinq_strip|"
 			"outer_ipv4_cksum|macsec_strip|header_split|"
-			"vlan_filter|vlan_extend|jumbo_frame|"
-			"scatter|timestamp|security|keep_crc on|off\n"
+			"vlan_filter|vlan_extend|jumbo_frame|scatter|"
+			"buffer_split|timestamp|security|keep_crc on|off\n"
 			"    Enable or disable a per queue Rx offloading"
 			" only on a specific Rx queue\n\n"
 
@@ -18417,7 +18417,8 @@ struct cmd_config_per_port_rx_offload_result {
 		 offload, "vlan_strip#ipv4_cksum#udp_cksum#tcp_cksum#tcp_lro#"
 			   "qinq_strip#outer_ipv4_cksum#macsec_strip#"
 			   "header_split#vlan_filter#vlan_extend#jumbo_frame#"
-			   "scatter#timestamp#security#keep_crc#rss_hash");
+			   "scatter#buffer_split#timestamp#security#"
+			   "keep_crc#rss_hash");
 cmdline_parse_token_string_t cmd_config_per_port_rx_offload_result_on_off =
 	TOKEN_STRING_INITIALIZER
 		(struct cmd_config_per_port_rx_offload_result,
@@ -18497,8 +18498,8 @@ struct cmd_config_per_port_rx_offload_result {
 	.help_str = "port config <port_id> rx_offload vlan_strip|ipv4_cksum|"
 		    "udp_cksum|tcp_cksum|tcp_lro|qinq_strip|outer_ipv4_cksum|"
 		    "macsec_strip|header_split|vlan_filter|vlan_extend|"
-		    "jumbo_frame|scatter|timestamp|security|keep_crc|rss_hash "
-		    "on|off",
+		    "jumbo_frame|scatter|buffer_split|timestamp|security|"
+		    "keep_crc|rss_hash on|off",
 	.tokens = {
 		(void *)&cmd_config_per_port_rx_offload_result_port,
 		(void *)&cmd_config_per_port_rx_offload_result_config,
@@ -18547,7 +18548,7 @@ struct cmd_config_per_queue_rx_offload_result {
 		 offload, "vlan_strip#ipv4_cksum#udp_cksum#tcp_cksum#tcp_lro#"
 			   "qinq_strip#outer_ipv4_cksum#macsec_strip#"
 			   "header_split#vlan_filter#vlan_extend#jumbo_frame#"
-			   "scatter#timestamp#security#keep_crc");
+			   "scatter#buffer_split#timestamp#security#keep_crc");
 cmdline_parse_token_string_t cmd_config_per_queue_rx_offload_result_on_off =
 	TOKEN_STRING_INITIALIZER
 		(struct cmd_config_per_queue_rx_offload_result,
@@ -18603,8 +18604,8 @@ struct cmd_config_per_queue_rx_offload_result {
 		    "vlan_strip|ipv4_cksum|"
 		    "udp_cksum|tcp_cksum|tcp_lro|qinq_strip|outer_ipv4_cksum|"
 		    "macsec_strip|header_split|vlan_filter|vlan_extend|"
-		    "jumbo_frame|scatter|timestamp|security|keep_crc "
-		    "on|off",
+		    "jumbo_frame|scatter|buffer_split|timestamp|security|"
+		    "keep_crc on|off",
 	.tokens = {
 		(void *)&cmd_config_per_queue_rx_offload_result_port,
 		(void *)&cmd_config_per_queue_rx_offload_result_port_id,
diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index 5f501f6..7126d91 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -1092,6 +1092,15 @@ static int bus_match_all(const struct rte_bus *bus, const void *data)
 			printf("off\n");
 	}
 
+	if (dev_info.rx_offload_capa & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) {
+		printf("RX offload buffer split:       ");
+		if (ports[port_id].dev_conf.rxmode.offloads &
+		    RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT)
+			printf("on\n");
+		else
+			printf("off\n");
+	}
+
 	if (dev_info.tx_offload_capa & DEV_TX_OFFLOAD_VLAN_INSERT) {
 		printf("VLAN insert:                   ");
 		if (ports[port_id].dev_conf.txmode.offloads &
-- 
1.8.3.1



* [dpdk-dev] [PATCH v5 4/6] app/testpmd: add rxpkts commands and parameters
  2020-10-13 19:21 ` [dpdk-dev] [PATCH v5 0/6] ethdev: introduce Rx buffer split Viacheslav Ovsiienko
                     ` (2 preceding siblings ...)
  2020-10-13 19:21   ` [dpdk-dev] [PATCH v5 3/6] app/testpmd: add buffer split offload configuration Viacheslav Ovsiienko
@ 2020-10-13 19:21   ` Viacheslav Ovsiienko
  2020-10-13 19:21   ` [dpdk-dev] [PATCH v5 5/6] app/testpmd: add rxoffs " Viacheslav Ovsiienko
  2020-10-13 19:21   ` [dpdk-dev] [PATCH v5 6/6] app/testpmd: add extended Rx queue setup Viacheslav Ovsiienko
  5 siblings, 0 replies; 172+ messages in thread
From: Viacheslav Ovsiienko @ 2020-10-13 19:21 UTC (permalink / raw)
  To: dev
  Cc: thomasm, stephen, ferruh.yigit, olivier.matz, jerinjacobk,
	maxime.coquelin, david.marchand, arybchenko

Add command line parameter:

--rxpkts=X[,Y]

Sets the lengths of the segments into which received packets are
scattered when a split feature is engaged. Affects only the queues
configured with split offloads (currently only BUFFER_SPLIT is supported).

Add interactive mode command:

testpmd> set rxpkts (x[,y]*)

Where x[,y]* represents a CSV list of values, without white space.

Sets the lengths of the segments into which received packets are
scattered when a split feature is engaged. Affects only the queues
configured with split offloads (currently only BUFFER_SPLIT is
supported). Optionally, multiple memory pools can be specified with
the --mbuf-size command line parameter; the mbufs for received
segments are then allocated sequentially from these extra memory pools.
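
As a rough sketch of what parsing the x[,y]* list involves, the helper
below splits a CSV string into 16-bit segment lengths. It is not testpmd's
parse_item_list(); sketch_parse_seg_list and SKETCH_MAX_SEGS are
hypothetical names, and the limit of 8 is an assumption standing in for
MAX_SEGS_BUFFER_SPLIT. Values at or above UINT16_MAX are rejected, in line
with the check in set_rx_pkt_segments().

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_MAX_SEGS 8 /* assumption; stands in for MAX_SEGS_BUFFER_SPLIT */

/*
 * Parse "x[,y]*" into out[]. Returns the number of values stored, or -1
 * on malformed input, more than max values, or a value >= UINT16_MAX.
 */
static int
sketch_parse_seg_list(const char *str, uint16_t *out, unsigned int max)
{
	unsigned int n = 0;

	assert(str != NULL && out != NULL);
	for (;;) {
		char *end;
		unsigned long v;

		/* Each item must start with a digit: no blanks, signs, gaps. */
		if (*str < '0' || *str > '9')
			return -1;
		errno = 0;
		v = strtoul(str, &end, 10);
		if (errno != 0 || v >= UINT16_MAX || n >= max)
			return -1;
		out[n++] = (uint16_t)v;
		if (*end == '\0')
			return (int)n; /* whole string consumed */
		if (*end != ',')
			return -1; /* unexpected separator */
		str = end + 1;
	}
}
```

With this helper, "64,128,0" yields three values, while "64, 128" is
rejected because the list must not contain white space.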

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
 app/test-pmd/cmdline.c                      | 61 +++++++++++++++++++++++++++--
 app/test-pmd/config.c                       | 48 ++++++++++++++++++++++-
 app/test-pmd/parameters.c                   | 15 +++++++
 app/test-pmd/testpmd.c                      |  7 ++++
 app/test-pmd/testpmd.h                      | 11 +++++-
 doc/guides/testpmd_app_ug/run_app.rst       |  9 +++++
 doc/guides/testpmd_app_ug/testpmd_funcs.rst | 21 +++++++++-
 7 files changed, 165 insertions(+), 7 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index fa71039..d8dba54 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -183,7 +183,7 @@ static void cmd_help_long_parsed(void *parsed_result,
 			"show (rxq|txq) info (port_id) (queue_id)\n"
 			"    Display information for configured RX/TX queue.\n\n"
 
-			"show config (rxtx|cores|fwd|txpkts)\n"
+			"show config (rxtx|cores|fwd|rxpkts|txpkts)\n"
 			"    Display the given configuration.\n\n"
 
 			"read rxd (port_id) (queue_id) (rxd_id)\n"
@@ -294,6 +294,12 @@ static void cmd_help_long_parsed(void *parsed_result,
 			"    Set the transmit delay time and number of retries,"
 			" effective when retry is enabled.\n\n"
 
+			"set rxpkts (x[,y]*)\n"
+			"    Set the length of each segment to scatter"
+			" packets on receiving if split feature is engaged."
+			" Affects only the queues configured with split"
+			" offloads.\n\n"
+
 			"set txpkts (x[,y]*)\n"
 			"    Set the length of each segment of TXONLY"
 			" and optionally CSUM packets.\n\n"
@@ -3889,6 +3895,52 @@ struct cmd_set_log_result {
 	},
 };
 
+/* *** SET SEGMENT LENGTHS OF RX PACKETS SPLIT *** */
+
+struct cmd_set_rxpkts_result {
+	cmdline_fixed_string_t cmd_keyword;
+	cmdline_fixed_string_t rxpkts;
+	cmdline_fixed_string_t seg_lengths;
+};
+
+static void
+cmd_set_rxpkts_parsed(void *parsed_result,
+		      __rte_unused struct cmdline *cl,
+		      __rte_unused void *data)
+{
+	struct cmd_set_rxpkts_result *res;
+	unsigned int seg_lengths[MAX_SEGS_BUFFER_SPLIT];
+	unsigned int nb_segs;
+
+	res = parsed_result;
+	nb_segs = parse_item_list(res->seg_lengths, "segment lengths",
+				  MAX_SEGS_BUFFER_SPLIT, seg_lengths, 0);
+	if (nb_segs > 0)
+		set_rx_pkt_segments(seg_lengths, nb_segs);
+}
+
+cmdline_parse_token_string_t cmd_set_rxpkts_keyword =
+	TOKEN_STRING_INITIALIZER(struct cmd_set_rxpkts_result,
+				 cmd_keyword, "set");
+cmdline_parse_token_string_t cmd_set_rxpkts_name =
+	TOKEN_STRING_INITIALIZER(struct cmd_set_rxpkts_result,
+				 rxpkts, "rxpkts");
+cmdline_parse_token_string_t cmd_set_rxpkts_lengths =
+	TOKEN_STRING_INITIALIZER(struct cmd_set_rxpkts_result,
+				 seg_lengths, NULL);
+
+cmdline_parse_inst_t cmd_set_rxpkts = {
+	.f = cmd_set_rxpkts_parsed,
+	.data = NULL,
+	.help_str = "set rxpkts <len0[,len1]*>",
+	.tokens = {
+		(void *)&cmd_set_rxpkts_keyword,
+		(void *)&cmd_set_rxpkts_name,
+		(void *)&cmd_set_rxpkts_lengths,
+		NULL,
+	},
+};
+
 /* *** SET SEGMENT LENGTHS OF TXONLY PACKETS *** */
 
 struct cmd_set_txpkts_result {
@@ -7517,6 +7569,8 @@ static void cmd_showcfg_parsed(void *parsed_result,
 		fwd_lcores_config_display();
 	else if (!strcmp(res->what, "fwd"))
 		pkt_fwd_config_display(&cur_fwd_config);
+	else if (!strcmp(res->what, "rxpkts"))
+		show_rx_pkt_segments();
 	else if (!strcmp(res->what, "txpkts"))
 		show_tx_pkt_segments();
 	else if (!strcmp(res->what, "txtimes"))
@@ -7529,12 +7583,12 @@ static void cmd_showcfg_parsed(void *parsed_result,
 	TOKEN_STRING_INITIALIZER(struct cmd_showcfg_result, cfg, "config");
 cmdline_parse_token_string_t cmd_showcfg_what =
 	TOKEN_STRING_INITIALIZER(struct cmd_showcfg_result, what,
-				 "rxtx#cores#fwd#txpkts#txtimes");
+				 "rxtx#cores#fwd#rxpkts#txpkts#txtimes");
 
 cmdline_parse_inst_t cmd_showcfg = {
 	.f = cmd_showcfg_parsed,
 	.data = NULL,
-	.help_str = "show config rxtx|cores|fwd|txpkts|txtimes",
+	.help_str = "show config rxtx|cores|fwd|rxpkts|txpkts|txtimes",
 	.tokens = {
 		(void *)&cmd_showcfg_show,
 		(void *)&cmd_showcfg_port,
@@ -19807,6 +19861,7 @@ struct cmd_showport_macs_result {
 	(cmdline_parse_inst_t *)&cmd_reset,
 	(cmdline_parse_inst_t *)&cmd_set_numbers,
 	(cmdline_parse_inst_t *)&cmd_set_log,
+	(cmdline_parse_inst_t *)&cmd_set_rxpkts,
 	(cmdline_parse_inst_t *)&cmd_set_txpkts,
 	(cmdline_parse_inst_t *)&cmd_set_txsplit,
 	(cmdline_parse_inst_t *)&cmd_set_txtimes,
diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index 7126d91..24e9a7e 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -3300,6 +3300,50 @@ struct igb_ring_desc_16_bytes {
 }
 
 void
+show_rx_pkt_segments(void)
+{
+	uint32_t i, n;
+
+	n = rx_pkt_nb_segs;
+	printf("Number of segments: %u\n", n);
+	if (n) {
+		printf("Segment sizes: ");
+		for (i = 0; i != n - 1; i++)
+			printf("%hu,", rx_pkt_seg_lengths[i]);
+		printf("%hu\n", rx_pkt_seg_lengths[i]);
+	}
+}
+
+void
+set_rx_pkt_segments(unsigned int *seg_lengths, unsigned int nb_segs)
+{
+	unsigned int i;
+
+	if (nb_segs >= MAX_SEGS_BUFFER_SPLIT) {
+		printf("nb segments per RX packets=%u >= "
+		       "MAX_SEGS_BUFFER_SPLIT - ignored\n", nb_segs);
+		return;
+	}
+
+	/*
+	 * No extra check here, the segment length will be checked by PMD
+	 * in the extended queue setup.
+	 */
+	for (i = 0; i < nb_segs; i++) {
+		if (seg_lengths[i] >= UINT16_MAX) {
+			printf("length[%u]=%u >= UINT16_MAX - give up\n",
+			       i, seg_lengths[i]);
+			return;
+		}
+	}
+
+	for (i = 0; i < nb_segs; i++)
+		rx_pkt_seg_lengths[i] = (uint16_t) seg_lengths[i];
+
+	rx_pkt_nb_segs = (uint8_t) nb_segs;
+}
+
+void
 show_tx_pkt_segments(void)
 {
 	uint32_t i, n;
@@ -3344,10 +3388,10 @@ struct igb_ring_desc_16_bytes {
 }
 
 void
-set_tx_pkt_segments(unsigned *seg_lengths, unsigned nb_segs)
+set_tx_pkt_segments(unsigned int *seg_lengths, unsigned int nb_segs)
 {
 	uint16_t tx_pkt_len;
-	unsigned i;
+	unsigned int i;
 
 	if (nb_segs_is_invalid(nb_segs))
 		return;
diff --git a/app/test-pmd/parameters.c b/app/test-pmd/parameters.c
index 4db4987..e4e3635 100644
--- a/app/test-pmd/parameters.c
+++ b/app/test-pmd/parameters.c
@@ -184,6 +184,7 @@
 	       "(0 <= mapping <= %d).\n", RTE_ETHDEV_QUEUE_STAT_CNTRS - 1);
 	printf("  --no-flush-rx: Don't flush RX streams before forwarding."
 	       " Used mainly with PCAP drivers.\n");
+	printf("  --rxpkts=X[,Y]*: set RX segment sizes to split.\n");
 	printf("  --txpkts=X[,Y]*: set TX segment sizes"
 		" or total packet length.\n");
 	printf("  --txonly-multi-flow: generate multiple flows in txonly mode\n");
@@ -662,6 +663,7 @@
 		{ "rx-queue-stats-mapping",	1, 0, 0 },
 		{ "no-flush-rx",	0, 0, 0 },
 		{ "flow-isolate-all",	        0, 0, 0 },
+		{ "rxpkts",			1, 0, 0 },
 		{ "txpkts",			1, 0, 0 },
 		{ "txonly-multi-flow",		0, 0, 0 },
 		{ "disable-link-check",		0, 0, 0 },
@@ -1272,6 +1274,19 @@
 						 "invalid RX queue statistics mapping config entered\n");
 				}
 			}
+			if (!strcmp(lgopts[opt_idx].name, "rxpkts")) {
+				unsigned int seg_len[MAX_SEGS_BUFFER_SPLIT];
+				unsigned int nb_segs;
+
+				nb_segs = parse_item_list
+						(optarg, "rxpkt segments",
+						 MAX_SEGS_BUFFER_SPLIT,
+						 seg_len, 0);
+				if (nb_segs > 0)
+					set_rx_pkt_segments(seg_len, nb_segs);
+				else
+					rte_exit(EXIT_FAILURE, "bad rxpkts\n");
+			}
 			if (!strcmp(lgopts[opt_idx].name, "txpkts")) {
 				unsigned seg_lengths[RTE_MAX_SEGS_PER_PKT];
 				unsigned int nb_segs;
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 7e6ef80..f88c1e2 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -210,6 +210,13 @@ struct fwd_engine * fwd_engines[] = {
 uint8_t f_quit;
 
 /*
+ * Configuration of packet segments used to scatter received packets
+ * if any of the split features is configured.
+ */
+uint16_t rx_pkt_seg_lengths[MAX_SEGS_BUFFER_SPLIT];
+uint8_t  rx_pkt_nb_segs; /**< Number of segments to split */
+
+/*
  * Configuration of packet segments used by the "txonly" processing engine.
  */
 uint16_t tx_pkt_length = TXONLY_DEF_PACKET_LEN; /**< TXONLY packet length. */
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index b42d710..8e5ba6a 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -420,6 +420,13 @@ struct queue_stats_mappings {
 extern struct rte_fdir_conf fdir_conf;
 
 /*
+ * Configuration of packet segments used to scatter received packets
+ * if any of the split features is configured.
+ */
+extern uint16_t rx_pkt_seg_lengths[MAX_SEGS_BUFFER_SPLIT];
+extern uint8_t  rx_pkt_nb_segs; /**< Number of segments to split */
+
+/*
  * Configuration of packet segments used by the "txonly" processing engine.
  */
 #define TXONLY_DEF_PACKET_LEN 64
@@ -816,7 +823,9 @@ void vlan_tpid_set(portid_t port_id, enum rte_vlan_type vlan_type,
 void set_record_core_cycles(uint8_t on_off);
 void set_record_burst_stats(uint8_t on_off);
 void set_verbose_level(uint16_t vb_level);
-void set_tx_pkt_segments(unsigned *seg_lengths, unsigned nb_segs);
+void set_rx_pkt_segments(unsigned int *seg_lengths, unsigned int nb_segs);
+void show_rx_pkt_segments(void);
+void set_tx_pkt_segments(unsigned int *seg_lengths, unsigned int nb_segs);
 void show_tx_pkt_segments(void);
 void set_tx_pkt_times(unsigned int *tx_times);
 void show_tx_pkt_times(void);
diff --git a/doc/guides/testpmd_app_ug/run_app.rst b/doc/guides/testpmd_app_ug/run_app.rst
index 1eb0a10..463b76c 100644
--- a/doc/guides/testpmd_app_ug/run_app.rst
+++ b/doc/guides/testpmd_app_ug/run_app.rst
@@ -361,6 +361,15 @@ The command line options are:
 
     Don't flush the RX streams before starting forwarding. Used mainly with the PCAP PMD.
 
+*   ``--rxpkts=X[,Y]``
+
+    Set the length of segments to scatter packets on receiving if split
+    feature is engaged. Affects only the queues configured
+    with split offloads (currently BUFFER_SPLIT is supported only).
+    Optionally the multiple memory pools can be specified with --mbuf-size
+    command line parameter and the mbufs to receive will be allocated
+    sequentially from these extra memory pools.
+
 *   ``--txpkts=X[,Y]``
 
     Set TX segment sizes or total packet length. Valid for ``tx-only``
diff --git a/doc/guides/testpmd_app_ug/testpmd_funcs.rst b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
index 795c739..ff88762 100644
--- a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
+++ b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
@@ -273,7 +273,7 @@ show config
 Displays the configuration of the application.
 The configuration comes from the command-line, the runtime or the application defaults::
 
-   testpmd> show config (rxtx|cores|fwd|txpkts|txtimes)
+   testpmd> show config (rxtx|cores|fwd|rxpkts|txpkts|txtimes)
 
 The available information categories are:
 
@@ -283,6 +283,8 @@ The available information categories are:
 
 * ``fwd``: Packet forwarding configuration.
 
+* ``rxpkts``: Packets to RX split configuration.
+
 * ``txpkts``: Packets to TX configuration.
 
 * ``txtimes``: Burst time pattern for Tx only mode.
@@ -774,6 +776,23 @@ When retry is enabled, the transmit delay time and number of retries can also be
 
    testpmd> set burst tx delay (microseconds) retry (num)
 
+set rxpkts
+~~~~~~~~~~
+
+Set the length of segments to scatter packets on receiving if split
+feature is engaged. Affects only the queues configured with split offloads
+(currently BUFFER_SPLIT is supported only). Optionally the multiple memory
+pools can be specified with --mbuf-size command line parameter and the mbufs
+to receive will be allocated sequentially from these extra memory pools (the
+mbuf for the first segment is allocated from the first pool, the second one
+from the second pool, and so on; if there are more segments than pools, the
+mbufs for the remaining segments are allocated from the last valid pool).
+
+   testpmd> set rxpkts (x[,y]*)
+
+Where x[,y]* represents a CSV list of values, without white space. Zero value
+means to use the corresponding memory pool data buffer size.
+
 set txpkts
 ~~~~~~~~~~
 
-- 
1.8.3.1



* [dpdk-dev] [PATCH v5 5/6] app/testpmd: add rxoffs commands and parameters
  2020-10-13 19:21 ` [dpdk-dev] [PATCH v5 0/6] ethdev: introduce Rx buffer split Viacheslav Ovsiienko
                     ` (3 preceding siblings ...)
  2020-10-13 19:21   ` [dpdk-dev] [PATCH v5 4/6] app/testpmd: add rxpkts commands and parameters Viacheslav Ovsiienko
@ 2020-10-13 19:21   ` " Viacheslav Ovsiienko
  2020-10-13 19:21   ` [dpdk-dev] [PATCH v5 6/6] app/testpmd: add extended Rx queue setup Viacheslav Ovsiienko
  5 siblings, 0 replies; 172+ messages in thread
From: Viacheslav Ovsiienko @ 2020-10-13 19:21 UTC (permalink / raw)
  To: dev
  Cc: thomasm, stephen, ferruh.yigit, olivier.matz, jerinjacobk,
	maxime.coquelin, david.marchand, arybchenko

Add command line parameter:

--rxoffs=X[,Y]

Sets the offsets of packet segments from the beginning of the
receive buffer when a split feature is engaged. Affects only the
queues configured with split offloads (currently only BUFFER_SPLIT
is supported).

Add an interactive mode command that provides the same:

testpmd> set rxoffs (x[,y]*)

Where x[,y]* represents a CSV list of values, without white space.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
 app/test-pmd/cmdline.c                      | 61 +++++++++++++++++++++++++++--
 app/test-pmd/config.c                       | 44 +++++++++++++++++++++
 app/test-pmd/parameters.c                   | 15 +++++++
 app/test-pmd/testpmd.c                      |  2 +
 app/test-pmd/testpmd.h                      |  4 ++
 doc/guides/testpmd_app_ug/run_app.rst       |  6 +++
 doc/guides/testpmd_app_ug/testpmd_funcs.rst | 17 +++++++-
 7 files changed, 145 insertions(+), 4 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index d8dba54..7182bba 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -183,7 +183,7 @@ static void cmd_help_long_parsed(void *parsed_result,
 			"show (rxq|txq) info (port_id) (queue_id)\n"
 			"    Display information for configured RX/TX queue.\n\n"
 
-			"show config (rxtx|cores|fwd|rxpkts|txpkts)\n"
+			"show config (rxtx|cores|fwd|rxoffs|rxpkts|txpkts)\n"
 			"    Display the given configuration.\n\n"
 
 			"read rxd (port_id) (queue_id) (rxd_id)\n"
@@ -294,6 +294,12 @@ static void cmd_help_long_parsed(void *parsed_result,
 			"    Set the transmit delay time and number of retries,"
 			" effective when retry is enabled.\n\n"
 
+			"set rxoffs (x[,y]*)\n"
+			"    Set the offset of each packet segment on"
+			" receiving if split feature is engaged."
+			" Affects only the queues configured with split"
+			" offloads.\n\n"
+
 			"set rxpkts (x[,y]*)\n"
 			"    Set the length of each segment to scatter"
 			" packets on receiving if split feature is engaged."
@@ -3895,6 +3901,52 @@ struct cmd_set_log_result {
 	},
 };
 
+/* *** SET SEGMENT OFFSETS OF RX PACKETS SPLIT *** */
+
+struct cmd_set_rxoffs_result {
+	cmdline_fixed_string_t cmd_keyword;
+	cmdline_fixed_string_t rxoffs;
+	cmdline_fixed_string_t seg_offsets;
+};
+
+static void
+cmd_set_rxoffs_parsed(void *parsed_result,
+		      __rte_unused struct cmdline *cl,
+		      __rte_unused void *data)
+{
+	struct cmd_set_rxoffs_result *res;
+	unsigned int seg_offsets[MAX_SEGS_BUFFER_SPLIT];
+	unsigned int nb_segs;
+
+	res = parsed_result;
+	nb_segs = parse_item_list(res->seg_offsets, "segment offsets",
+				  MAX_SEGS_BUFFER_SPLIT, seg_offsets, 0);
+	if (nb_segs > 0)
+		set_rx_pkt_offsets(seg_offsets, nb_segs);
+}
+
+cmdline_parse_token_string_t cmd_set_rxoffs_keyword =
+	TOKEN_STRING_INITIALIZER(struct cmd_set_rxoffs_result,
+				 cmd_keyword, "set");
+cmdline_parse_token_string_t cmd_set_rxoffs_name =
+	TOKEN_STRING_INITIALIZER(struct cmd_set_rxoffs_result,
+				 rxoffs, "rxoffs");
+cmdline_parse_token_string_t cmd_set_rxoffs_offsets =
+	TOKEN_STRING_INITIALIZER(struct cmd_set_rxoffs_result,
+				 seg_offsets, NULL);
+
+cmdline_parse_inst_t cmd_set_rxoffs = {
+	.f = cmd_set_rxoffs_parsed,
+	.data = NULL,
+	.help_str = "set rxoffs <off0[,off1]*>",
+	.tokens = {
+		(void *)&cmd_set_rxoffs_keyword,
+		(void *)&cmd_set_rxoffs_name,
+		(void *)&cmd_set_rxoffs_offsets,
+		NULL,
+	},
+};
+
 /* *** SET SEGMENT LENGTHS OF RX PACKETS SPLIT *** */
 
 struct cmd_set_rxpkts_result {
@@ -7569,6 +7621,8 @@ static void cmd_showcfg_parsed(void *parsed_result,
 		fwd_lcores_config_display();
 	else if (!strcmp(res->what, "fwd"))
 		pkt_fwd_config_display(&cur_fwd_config);
+	else if (!strcmp(res->what, "rxoffs"))
+		show_rx_pkt_offsets();
 	else if (!strcmp(res->what, "rxpkts"))
 		show_rx_pkt_segments();
 	else if (!strcmp(res->what, "txpkts"))
@@ -7583,12 +7637,12 @@ static void cmd_showcfg_parsed(void *parsed_result,
 	TOKEN_STRING_INITIALIZER(struct cmd_showcfg_result, cfg, "config");
 cmdline_parse_token_string_t cmd_showcfg_what =
 	TOKEN_STRING_INITIALIZER(struct cmd_showcfg_result, what,
-				 "rxtx#cores#fwd#rxpkts#txpkts#txtimes");
+				 "rxtx#cores#fwd#rxoffs#rxpkts#txpkts#txtimes");
 
 cmdline_parse_inst_t cmd_showcfg = {
 	.f = cmd_showcfg_parsed,
 	.data = NULL,
-	.help_str = "show config rxtx|cores|fwd|rxpkts|txpkts|txtimes",
+	.help_str = "show config rxtx|cores|fwd|rxoffs|rxpkts|txpkts|txtimes",
 	.tokens = {
 		(void *)&cmd_showcfg_show,
 		(void *)&cmd_showcfg_port,
@@ -19861,6 +19915,7 @@ struct cmd_showport_macs_result {
 	(cmdline_parse_inst_t *)&cmd_reset,
 	(cmdline_parse_inst_t *)&cmd_set_numbers,
 	(cmdline_parse_inst_t *)&cmd_set_log,
+	(cmdline_parse_inst_t *)&cmd_set_rxoffs,
 	(cmdline_parse_inst_t *)&cmd_set_rxpkts,
 	(cmdline_parse_inst_t *)&cmd_set_txpkts,
 	(cmdline_parse_inst_t *)&cmd_set_txsplit,
diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index 24e9a7e..43b8fb6 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -3300,6 +3300,50 @@ struct igb_ring_desc_16_bytes {
 }
 
 void
+show_rx_pkt_offsets(void)
+{
+	uint32_t i, n;
+
+	n = rx_pkt_nb_offs;
+	printf("Number of offsets: %u\n", n);
+	if (n) {
+		printf("Segment offsets: ");
+		for (i = 0; i != n - 1; i++)
+			printf("%hu,", rx_pkt_seg_offsets[i]);
+		printf("%hu\n", rx_pkt_seg_offsets[i]);
+	}
+}
+
+void
+set_rx_pkt_offsets(unsigned int *seg_offsets, unsigned int nb_offs)
+{
+	unsigned int i;
+
+	if (nb_offs >= MAX_SEGS_BUFFER_SPLIT) {
+		printf("nb offsets per RX packets=%u >= "
+		       "MAX_SEGS_BUFFER_SPLIT - ignored\n", nb_offs);
+		return;
+	}
+
+	/*
+	 * No extra check here, the segment offset will be checked by PMD
+	 * in the extended queue setup.
+	 */
+	for (i = 0; i < nb_offs; i++) {
+		if (seg_offsets[i] >= UINT16_MAX) {
+			printf("offset[%u]=%u >= UINT16_MAX - give up\n",
+			       i, seg_offsets[i]);
+			return;
+		}
+	}
+
+	for (i = 0; i < nb_offs; i++)
+		rx_pkt_seg_offsets[i] = (uint16_t) seg_offsets[i];
+
+	rx_pkt_nb_offs = (uint8_t) nb_offs;
+}
+
+void
 show_rx_pkt_segments(void)
 {
 	uint32_t i, n;
diff --git a/app/test-pmd/parameters.c b/app/test-pmd/parameters.c
index e4e3635..2298ba5 100644
--- a/app/test-pmd/parameters.c
+++ b/app/test-pmd/parameters.c
@@ -184,6 +184,7 @@
 	       "(0 <= mapping <= %d).\n", RTE_ETHDEV_QUEUE_STAT_CNTRS - 1);
 	printf("  --no-flush-rx: Don't flush RX streams before forwarding."
 	       " Used mainly with PCAP drivers.\n");
+	printf("  --rxoffs=X[,Y]*: set RX segment offsets for split.\n");
 	printf("  --rxpkts=X[,Y]*: set RX segment sizes to split.\n");
 	printf("  --txpkts=X[,Y]*: set TX segment sizes"
 		" or total packet length.\n");
@@ -663,6 +664,7 @@
 		{ "rx-queue-stats-mapping",	1, 0, 0 },
 		{ "no-flush-rx",	0, 0, 0 },
 		{ "flow-isolate-all",	        0, 0, 0 },
+		{ "rxoffs",			1, 0, 0 },
 		{ "rxpkts",			1, 0, 0 },
 		{ "txpkts",			1, 0, 0 },
 		{ "txonly-multi-flow",		0, 0, 0 },
@@ -1274,6 +1276,19 @@
 						 "invalid RX queue statistics mapping config entered\n");
 				}
 			}
+			if (!strcmp(lgopts[opt_idx].name, "rxoffs")) {
+				unsigned int seg_off[MAX_SEGS_BUFFER_SPLIT];
+				unsigned int nb_offs;
+
+				nb_offs = parse_item_list
+						(optarg, "rxpkt offsets",
+						 MAX_SEGS_BUFFER_SPLIT,
+						 seg_off, 0);
+				if (nb_offs > 0)
+					set_rx_pkt_offsets(seg_off, nb_offs);
+				else
+					rte_exit(EXIT_FAILURE, "bad rxoffs\n");
+			}
 			if (!strcmp(lgopts[opt_idx].name, "rxpkts")) {
 				unsigned int seg_len[MAX_SEGS_BUFFER_SPLIT];
 				unsigned int nb_segs;
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index f88c1e2..580178d 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -215,6 +215,8 @@ struct fwd_engine * fwd_engines[] = {
  */
 uint16_t rx_pkt_seg_lengths[MAX_SEGS_BUFFER_SPLIT];
 uint8_t  rx_pkt_nb_segs; /**< Number of segments to split */
+uint16_t rx_pkt_seg_offsets[MAX_SEGS_BUFFER_SPLIT];
+uint8_t  rx_pkt_nb_offs; /**< Number of specified offsets */
 
 /*
  * Configuration of packet segments used by the "txonly" processing engine.
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index 8e5ba6a..fc56b60 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -425,6 +425,8 @@ struct queue_stats_mappings {
  */
 extern uint16_t rx_pkt_seg_lengths[MAX_SEGS_BUFFER_SPLIT];
 extern uint8_t  rx_pkt_nb_segs; /**< Number of segments to split */
+extern uint16_t rx_pkt_seg_offsets[MAX_SEGS_BUFFER_SPLIT];
+extern uint8_t  rx_pkt_nb_offs; /**< Number of specified offsets */
 
 /*
  * Configuration of packet segments used by the "txonly" processing engine.
@@ -825,6 +827,8 @@ void vlan_tpid_set(portid_t port_id, enum rte_vlan_type vlan_type,
 void set_verbose_level(uint16_t vb_level);
 void set_rx_pkt_segments(unsigned int *seg_lengths, unsigned int nb_segs);
 void show_rx_pkt_segments(void);
+void set_rx_pkt_offsets(unsigned int *seg_offsets, unsigned int nb_offs);
+void show_rx_pkt_offsets(void);
 void set_tx_pkt_segments(unsigned int *seg_lengths, unsigned int nb_segs);
 void show_tx_pkt_segments(void);
 void set_tx_pkt_times(unsigned int *tx_times);
diff --git a/doc/guides/testpmd_app_ug/run_app.rst b/doc/guides/testpmd_app_ug/run_app.rst
index 463b76c..9b0a84a 100644
--- a/doc/guides/testpmd_app_ug/run_app.rst
+++ b/doc/guides/testpmd_app_ug/run_app.rst
@@ -361,6 +361,12 @@ The command line options are:
 
     Don't flush the RX streams before starting forwarding. Used mainly with the PCAP PMD.
 
+*   ``--rxoffs=X[,Y]``
+
+    Set the offsets of packet segments on receiving if split
+    feature is engaged. Affects only the queues configured
+    with split offloads (currently BUFFER_SPLIT is supported only).
+
 *   ``--rxpkts=X[,Y]``
 
     Set the length of segments to scatter packets on receiving if split
diff --git a/doc/guides/testpmd_app_ug/testpmd_funcs.rst b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
index ff88762..c99d887 100644
--- a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
+++ b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
@@ -273,7 +273,7 @@ show config
 Displays the configuration of the application.
 The configuration comes from the command-line, the runtime or the application defaults::
 
-   testpmd> show config (rxtx|cores|fwd|rxpkts|txpkts|txtimes)
+   testpmd> show config (rxtx|cores|fwd|rxoffs|rxpkts|txpkts|txtimes)
 
 The available information categories are:
 
@@ -283,6 +283,8 @@ The available information categories are:
 
 * ``fwd``: Packet forwarding configuration.
 
+* ``rxoffs``: Packet offsets for RX split.
+
 * ``rxpkts``: Packets to RX split configuration.
 
 * ``txpkts``: Packets to TX configuration.
@@ -776,6 +778,19 @@ When retry is enabled, the transmit delay time and number of retries can also be
 
    testpmd> set burst tx delay (microseconds) retry (num)
 
+set rxoffs
+~~~~~~~~~~
+
+Set the offsets of segments relative to the beginning of the data buffer on
+receiving if split feature is engaged. Affects only the queues configured with
+split offloads (currently BUFFER_SPLIT is supported only).
+
+   testpmd> set rxoffs (x[,y]*)
+
+Where x[,y]* represents a CSV list of values, without white space. If the list
+of offsets is shorter than the list of segments the zero offsets will be used
+for the remaining segments.
+
 set rxpkts
 ~~~~~~~~~~
 
-- 
1.8.3.1



* [dpdk-dev] [PATCH v5 6/6] app/testpmd: add extended Rx queue setup
  2020-10-13 19:21 ` [dpdk-dev] [PATCH v5 0/6] ethdev: introduce Rx buffer split Viacheslav Ovsiienko
                     ` (4 preceding siblings ...)
  2020-10-13 19:21   ` [dpdk-dev] [PATCH v5 5/6] app/testpmd: add rxoffs " Viacheslav Ovsiienko
@ 2020-10-13 19:21   ` Viacheslav Ovsiienko
  5 siblings, 0 replies; 172+ messages in thread
From: Viacheslav Ovsiienko @ 2020-10-13 19:21 UTC (permalink / raw)
  To: dev
  Cc: thomasm, stephen, ferruh.yigit, olivier.matz, jerinjacobk,
	maxime.coquelin, david.marchand, arybchenko

If an Rx queue is configured with the split feature, the extended
setup with the specified segment sizes and pools is performed.
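
To make the per-segment rules easier to follow, here is a standalone sketch
of how the segment descriptions could be filled in: segment i draws from
pool i, segments beyond the last pool reuse the last valid pool, a zero
length means "use that pool's data buffer size", and missing offsets
default to zero. struct sketch_seg and sketch_build_segs are hypothetical
and work on pool indexes only; this is not the rx_queue_setup() added by
the patch and it calls no DPDK API.

```c
#include <assert.h>
#include <stdint.h>

struct sketch_seg {
	unsigned int pool_idx; /* index of the pool the segment draws from */
	uint16_t length;       /* maximal data length of the segment */
	uint16_t offset;       /* data offset inside the buffer */
};

static void
sketch_build_segs(struct sketch_seg *segs, unsigned int nb_segs,
		  const uint16_t *lengths,
		  const uint16_t *offs, unsigned int nb_offs,
		  const uint16_t *pool_buf_size, unsigned int nb_pools)
{
	unsigned int i;

	assert(nb_pools > 0); /* at least one pool is needed */
	for (i = 0; i < nb_segs; i++) {
		/* Segments past the last pool reuse the last valid pool. */
		unsigned int p = i >= nb_pools ? nb_pools - 1 : i;

		segs[i].pool_idx = p;
		/* Zero length selects the pool's data buffer size. */
		segs[i].length = lengths[i] ? lengths[i] : pool_buf_size[p];
		/* Offsets not specified by the user default to zero. */
		segs[i].offset = i < nb_offs ? offs[i] : 0;
	}
}
```

With lengths {14, 0, 0}, a single offset {2} and two pools whose buffers
hold 256 and 2048 bytes, the sketch yields segments (pool 0, 14, 2),
(pool 1, 2048, 0) and (pool 1, 2048, 0).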

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
 app/test-pmd/cmdline.c | 12 ++++++------
 app/test-pmd/testpmd.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++--
 app/test-pmd/testpmd.h |  5 +++++
 3 files changed, 57 insertions(+), 8 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 7182bba..204221f 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -2927,12 +2927,12 @@ struct cmd_setup_rxtx_queue {
 				rxring_numa[res->portid]);
 			return;
 		}
-		ret = rte_eth_rx_queue_setup(res->portid,
-					     res->qid,
-					     port->nb_rx_desc[res->qid],
-					     socket_id,
-					     &port->rx_conf[res->qid],
-					     mp);
+		ret = rx_queue_setup(res->portid,
+				     res->qid,
+				     port->nb_rx_desc[res->qid],
+				     socket_id,
+				     &port->rx_conf[res->qid],
+				     mp);
 		if (ret)
 			printf("Failed to setup RX queue\n");
 	} else {
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 580178d..4c79570 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -2414,6 +2414,50 @@ struct extmem_param {
 	return 0;
 }
 
+/* Configure the Rx queue with optional split. */
+int
+rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
+	       uint16_t nb_rx_desc, unsigned int socket_id,
+	       struct rte_eth_rxconf *rx_conf, struct rte_mempool *mp)
+{
+	struct rte_eth_rxseg rx_seg[MAX_SEGS_BUFFER_SPLIT] = {};
+	unsigned int i, mp_n;
+	int ret;
+
+	if (rx_pkt_nb_segs <= 1 ||
+	    (rx_conf->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) == 0) {
+		rx_conf->rx_seg = NULL;
+		rx_conf->rx_nseg = 0;
+		ret = rte_eth_rx_queue_setup(port_id, rx_queue_id,
+					     nb_rx_desc, socket_id,
+					     rx_conf, mp);
+		return ret;
+	}
+	for (i = 0; i < rx_pkt_nb_segs; i++) {
+		struct rte_mempool *mpx;
+		/*
+		 * Use last valid pool for the segments with number
+		 * exceeding the pool index.
+		 */
+		mp_n = (i >= mbuf_data_size_n) ? mbuf_data_size_n - 1 : i;
+		mpx = mbuf_pool_find(socket_id, mp_n);
+		/* Handle zero as mbuf data buffer size. */
+		rx_seg[i].length = rx_pkt_seg_lengths[i] ?
+				   rx_pkt_seg_lengths[i] :
+				   mbuf_data_size[mp_n];
+		rx_seg[i].offset = i < rx_pkt_nb_offs ?
+				   rx_pkt_seg_offsets[i] : 0;
+		rx_seg[i].mp = mpx ? mpx : mp;
+	}
+	rx_conf->rx_nseg = rx_pkt_nb_segs;
+	rx_conf->rx_seg = rx_seg;
+	ret = rte_eth_rx_queue_setup(port_id, rx_queue_id, nb_rx_desc,
+				    socket_id, rx_conf, NULL);
+	rx_conf->rx_seg = NULL;
+	rx_conf->rx_nseg = 0;
+	return ret;
+}
+
 int
 start_port(portid_t pid)
 {
@@ -2522,7 +2566,7 @@ struct extmem_param {
 						return -1;
 					}
 
-					diag = rte_eth_rx_queue_setup(pi, qi,
+					diag = rx_queue_setup(pi, qi,
 					     port->nb_rx_desc[qi],
 					     rxring_numa[pi],
 					     &(port->rx_conf[qi]),
@@ -2538,7 +2582,7 @@ struct extmem_param {
 							port->socket_id);
 						return -1;
 					}
-					diag = rte_eth_rx_queue_setup(pi, qi,
+					diag = rx_queue_setup(pi, qi,
 					     port->nb_rx_desc[qi],
 					     port->socket_id,
 					     &(port->rx_conf[qi]),
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index fc56b60..af654ea 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -876,6 +876,11 @@ void port_rss_reta_info(portid_t port_id,
 
 void set_vf_traffic(portid_t port_id, uint8_t is_rx, uint16_t vf, uint8_t on);
 
+int
+rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
+	       uint16_t nb_rx_desc, unsigned int socket_id,
+	       struct rte_eth_rxconf *rx_conf, struct rte_mempool *mp);
+
 int set_queue_rate_limit(portid_t port_id, uint16_t queue_idx, uint16_t rate);
 int set_vf_rate_limit(portid_t port_id, uint16_t vf, uint16_t rate,
 				uint64_t q_msk);
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 172+ messages in thread

* Re: [dpdk-dev] [RFC] ethdev: introduce Rx buffer split
  2020-10-12  9:56       ` Slava Ovsiienko
  2020-10-12 15:14         ` Thomas Monjalon
@ 2020-10-13 21:59         ` Ferruh Yigit
  2020-10-14  7:17           ` Thomas Monjalon
  2020-10-14  7:37           ` Slava Ovsiienko
  1 sibling, 2 replies; 172+ messages in thread
From: Ferruh Yigit @ 2020-10-13 21:59 UTC (permalink / raw)
  To: Slava Ovsiienko, Andrew Rybchenko, dev
  Cc: Thomas Monjalon, stephen, Shahaf Shuler, olivier.matz,
	jerinjacobk, maxime.coquelin, david.marchand, Asaf Penso,
	Konstantin Ananyev

On 10/12/2020 10:56 AM, Slava Ovsiienko wrote:
> Hi, Andrew
> 
> Thank you for the comments.
> 
> We have two approaches how to specify multiple segments to split Rx packets:
> 1. update queue configuration structure
> 2. introduce new rx_queue_setup_ex() routine with extra parameters.
> 
> For [1] my only actual dislike is that we would have multiple places to specify
> the pool - in rx_queue_setup() and in the config structure. So, we should
> implement some checking (if we have offload flag set we should check
> whether mp parameter is NULL and segment descriptions array pointer/size
> is provided, if no offload flag set - we must check the description array is empty).
> 
>> @Thomas, @Ferruh: I'd like to hear what other ethdev maintainers think
>> about it.
> 
> Yes, it would be very nice to hear extra opinions. Do we think the providing
> of extra API function is worse than extending existing structure, introducing
> some conditional ambiguity and complicating the parameter compliance
> check?
> 

I think the decision was already made with the deprecation notice, which says
``rte_eth_rxconf`` will be updated for this.

With a new API we need to create a new dev_ops too, and I am not sure about
creating a new dev_ops for a single PMD.

A PMD that supports this feature will need two dev_ops that are fairly close to
each other; as Andrew mentioned, this is duplication.

And from the user's perspective, two overlapping setup functions can be confusing.

+1 to having a single setup function and updating the config instead. I can see
v5 was sent this way, I will check it.


> Now I'm updating the existing version on the patch based on rx_queue_ex()
> and then could prepare the version for configuration structure,
> it is not a problem - approaches are very similar, we just should choose
> the most relevant one.
> 
> With best regards, Slava
> 
>> -----Original Message-----
>> From: Andrew Rybchenko <Andrew.Rybchenko@oktetlabs.ru>
>> Sent: Monday, October 12, 2020 11:45
>> To: Slava Ovsiienko <viacheslavo@nvidia.com>; dev@dpdk.org
>> Cc: Thomas Monjalon <thomasm@mellanox.com>;
>> stephen@networkplumber.org; ferruh.yigit@intel.com; Shahaf Shuler
>> <shahafs@nvidia.com>; olivier.matz@6wind.com; jerinjacobk@gmail.com;
>> maxime.coquelin@redhat.com; david.marchand@redhat.com; Asaf Penso
>> <asafp@nvidia.com>
>> Subject: Re: [dpdk-dev] [RFC] ethdev: introduce Rx buffer split
>>
>> Hi Slava,
>>
>> I'm sorry for late reply. See my notes below.
>>
>> On 10/1/20 11:54 AM, Slava Ovsiienko wrote:
>>> Hi, Andrew
>>>
>>> Thank you for the comments, please see my replies below.
>>>
>>>> -----Original Message-----
>>>> From: Andrew Rybchenko <arybchenko@solarflare.com>
>>>> Sent: Thursday, September 17, 2020 19:55
>>>> To: Slava Ovsiienko <viacheslavo@nvidia.com>; dev@dpdk.org
>>>> Cc: Thomas Monjalon <thomasm@mellanox.com>;
>>>> stephen@networkplumber.org; ferruh.yigit@intel.com; Shahaf Shuler
>>>> <shahafs@nvidia.com>; olivier.matz@6wind.com;
>> jerinjacobk@gmail.com;
>>>> maxime.coquelin@redhat.com; david.marchand@redhat.com; Asaf
>> Penso
>>>> <asafp@nvidia.com>
>>>> Subject: Re: [dpdk-dev] [RFC] ethdev: introduce Rx buffer split
>>>>
>>> [snip]
>>>>>
>>>>> For example, let's suppose we configured the Rx queue with the
>>>>> following segments:
>>>>> seg0 - pool0, len0=14B, off0=RTE_PKTMBUF_HEADROOM
>>>>> seg1 - pool1, len1=20B, off1=0B
>>>>> seg2 - pool2, len2=20B, off2=0B
>>>>> seg3 - pool3, len3=512B, off3=0B
>>>>>
>>>>> The packet 46 bytes long will look like the following:
>>>>> seg0 - 14B long @ RTE_PKTMBUF_HEADROOM in mbuf from pool0
>>>>> seg1 - 20B long @ 0 in mbuf from pool1
>>>>> seg2 - 12B long @ 0 in mbuf from pool2
>>>>>
>>>>> The packet 1500 bytes long will look like the following:
>>>>> seg0 - 14B @ RTE_PKTMBUF_HEADROOM in mbuf from pool0
>>>>> seg1 - 20B @ 0 in mbuf from pool1
>>>>> seg2 - 20B @ 0 in mbuf from pool2
>>>>> seg3 - 512B @ 0 in mbuf from pool3
>>>>> seg4 - 512B @ 0 in mbuf from pool3
>>>>> seg5 - 422B @ 0 in mbuf from pool3
>>>>
>>>> The behaviour is logical, but what to do if HW can't do it, i.e. use
>>>> the last segment many times. Should it reject configuration if
>>>> provided segments are insufficient to fit MTU packet? How to report
>>>> the limitation?
>>>> (I'm still trying to convince that SCATTER and BUFFER_SPLIT should be
>>>> independent).
>>>
>>> BUFFER_SPLIT is rather the way to tune SCATTER. Currently scattering
>>> happens on unconditional mbuf data buffer boundaries (we have reserved
>>> HEAD space in the first mbuf and fill this one to the buffer end, the
>>> next mbuf buffers might be filled completely). BUFFER_SPLIT provides
>>> the way to specify the desired points to split packet, not just
>>> blindly follow buffer boundaries. There is the check implemented in
>>> common part if each split segment fits the mbuf allocated from
>>> appropriate pool.
>>> PMD should do extra check internally whether it supports the requested
>>> split settings, if not - call will be rejected.
>>>
>>
>> @Thomas, @Ferruh: I'd like to hear what other ethdev maintainers think
>> about it.
>>
>>> [snip]
>>>>
>>>> I dislike the idea to introduce new device operation.
>>>> rte_eth_rxconf has reserved space and BUFFER_SPLIT offload will mean
>>>> that PMD looks at the split configuration location there.
>>>>
>>> We considered the approach of pushing split setting to the rxconf
>>> structure.
>>>
>>> [http://patches.dpdk.org/patch/75205/]
>>> But it seems there are some issues:
>>>
>>> - the split configuration description requires the variable length
>>> array (due to variations in number of segments), so rte_eth_rxconf
>>> structure would have the variable length (not nice, IMO).
>>>
>>> We could push pointers to the array of rte_eth_rxseg, but we would
>>> lose the single structure (and contiguous memory) simplicity, this
>>> approach has no advantages over the specifying the split configuration
>>> as parameters of setup_ex().
>>>
>>
>> I think it has a huge advantage to avoid extra device operation.
>>
>>> - it would introduce the ambiguity, rte_eth_rx_queue_setup()
>>> specifies the single mbuf pool as parameter. What should we do with
>>> it? Set to NULL? Treat as the first pool? I would prefer to specify
>>> all split segments in uniform fashion, i.e. as array of rte_eth_rxseg
>>> structures (and it can be easily updated with some extra segment
>>> attributes if needed). So, in my opinion, we should remove/replace the
>>> pool parameter in rx_queue_setup (by introducing new func).
>>>
>>
>> I'm trying to resolve the ambiguity as described above (see BUFFER_SPLIT vs
>> SCATTER). Use the pointer for tail segments with respect to SCATTER
>> capability.
>>
>>> - specifying the new extended setup routine has an advantage that we
>>> should not update any PMDs code in part of existing implementations of
>>> rte_eth_rx_queue_setup().
>>
>> It is not required since it is controlled by the new offload flags. If the offload
>> is not supported, the new field is invisible for PMD (it simply ignores).
>>
>>>
>>> If PMD supports BUFFER_SPLIT (or other related feature) it just should
>>> provide rte_eth_rx_queue_setup_ex() and check the
>>> DEV_RX_OFFLOAD_BUFFER_SPLIT (or HEADER_SPLIT, or whatever feature) it
>>> supports. The common code does not check the feature flags - it is on
>>> PMDs' own. In order to configure PMD to perform arbitrary desired Rx
>>> splitting the application should check DEV_RX_OFFLOAD_BUFFER_SPLIT in
>>> port capabilities, if found - set DEV_RX_OFFLOAD_BUFFER_SPLIT in
>>> configuration and call rte_eth_rx_queue_setup_ex().
>>> And this approach can be followed for any other split related feature.
>>>
>>> With best regards, Slava
>>>
>>
> 


^ permalink raw reply	[flat|nested] 172+ messages in thread

* Re: [dpdk-dev] [PATCH v5 1/6] ethdev: introduce Rx buffer split
  2020-10-13 19:21   ` [dpdk-dev] [PATCH v5 1/6] " Viacheslav Ovsiienko
@ 2020-10-13 22:34     ` Ferruh Yigit
  2020-10-14 13:31       ` Olivier Matz
  2020-10-14 14:42       ` Slava Ovsiienko
  0 siblings, 2 replies; 172+ messages in thread
From: Ferruh Yigit @ 2020-10-13 22:34 UTC (permalink / raw)
  To: Viacheslav Ovsiienko, dev
  Cc: thomasm, stephen, olivier.matz, jerinjacobk, maxime.coquelin,
	david.marchand, arybchenko

On 10/13/2020 8:21 PM, Viacheslav Ovsiienko wrote:
> The DPDK datapath in the transmit direction is very flexible.
> An application can build the multi-segment packet and manages
> almost all data aspects - the memory pools where segments
> are allocated from, the segment lengths, the memory attributes
> like external buffers, registered for DMA, etc.
> 
> In the receiving direction, the datapath is much less flexible,
> an application can only specify the memory pool to configure the
> receiving queue and nothing more. In order to extend receiving
> datapath capabilities it is proposed to add the way to provide
> extended information how to split the packets being received.
> 
> The following structure is introduced to specify the Rx packet
> segment:
> 
> struct rte_eth_rxseg {
>      struct rte_mempool *mp; /* memory pools to allocate segment from */
>      uint16_t length; /* segment maximal data length,
> 		       	configures "split point" */
>      uint16_t offset; /* data offset from beginning
> 		       	of mbuf data buffer */
>      uint32_t reserved; /* reserved field */
> };
> 
> The segment descriptions are added to the rte_eth_rxconf structure:
>     rx_seg - pointer to the array of segment descriptions, each element
>              describes the memory pool, maximal data length, initial
>              data offset from the beginning of data buffer in mbuf.
>              This array allows specifying different settings for
>              each segment individually.
>     rx_nseg - number of elements in the array
> 
> If the extended segment descriptions are provided with these new
> fields the mp parameter of rte_eth_rx_queue_setup() must be
> specified as NULL to avoid ambiguity.
> 
> The new offload flag RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT in device
> capabilities is introduced to let the PMD report to the
> application that it supports splitting Rx packets into configurable
> segments. Prior to invoking the rte_eth_rx_queue_setup() routine
> the application should check the RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT flag.
> 
> If the Rx queue is configured with new routine the packets being
> received will be split into multiple segments pushed to the mbufs
> with specified attributes. The PMD will split the received packets
> into multiple segments according to the specification in the
> description array:
> 
> - the first network buffer will be allocated from the memory pool
>    specified in the first segment description element, the second
>    network buffer - from the pool in the second segment description
>    element and so on. If there are not enough elements to describe
>    the buffers for an entire packet of maximal length, the pool from
>    the last valid element will be used to allocate the buffers for
>    the rest of the segments.
> 
> - the offsets from the segment description elements will provide
>    the data offset from the buffer beginning, except for the first
>    mbuf - for this one the offset is added to RTE_PKTMBUF_HEADROOM to
>    get the actual offset from the buffer beginning. If there are not
>    enough elements to describe the buffers for an entire packet of
>    maximal length, the offsets for the rest of the segments are
>    assumed to be zero.
> 
> - the data length received into each segment is limited by the
>    length specified in the segment description element. Receiving
>    starts with filling up the first mbuf data buffer; if the
>    specified maximal segment length is reached and there is data
>    remaining (the packet is longer than the buffer in the first mbuf),
>    the following data will be pushed to the next segment up to its own
>    maximal length. If the first two segments are not enough to store
>    all the remaining packet data, the next (third) segment will
>    be engaged and so on. If the length in the segment description
>    element is zero, the actual buffer size will be deduced from
>    the appropriate memory pool properties. If there are not enough
>    elements to describe the buffers for an entire packet of maximal
>    length, the buffer size will be deduced from the pool of the last
>    valid element for the remaining segments.
> 
> For example, let's suppose we configured the Rx queue with the
> following segments:
>      seg0 - pool0, len0=14B, off0=2
>      seg1 - pool1, len1=20B, off1=128B
>      seg2 - pool2, len2=20B, off2=0B
>      seg3 - pool3, len3=512B, off3=0B
> 
> The packet 46 bytes long will look like the following:
>      seg0 - 14B long @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from pool0
>      seg1 - 20B long @ 128 in mbuf from pool1
>      seg2 - 12B long @ 0 in mbuf from pool2
> 
> The packet 1500 bytes long will look like the following:
>      seg0 - 14B @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from pool0
>      seg1 - 20B @ 128 in mbuf from pool1
>      seg2 - 20B @ 0 in mbuf from pool2
>      seg3 - 512B @ 0 in mbuf from pool3
>      seg4 - 512B @ 0 in mbuf from pool3
>      seg5 - 422B @ 0 in mbuf from pool3
> 
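
To double-check the two layouts in the example, here is a minimal standalone
sketch in C that reproduces the arithmetic. It is not ethdev or PMD code: the
helper name split_layout() and its parameters are hypothetical, and only the
segment lengths are modeled (pools and offsets are left out). It assumes
non-zero lengths; a zero length would first have to be replaced by the buffer
size of the corresponding pool.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: compute the data length placed into each mbuf
 * for a packet of pkt_len bytes, given the configured maximal segment
 * lengths seg_len[0..n_seg-1].
 * Returns the number of mbufs used, or 0 if out[] is too small
 * (a zero-length packet also yields 0).
 */
static size_t
split_layout(const uint16_t *seg_len, size_t n_seg, uint32_t pkt_len,
	     uint16_t *out, size_t out_max)
{
	size_t n = 0;

	assert(n_seg > 0);
	while (pkt_len > 0) {
		/* Past the end of the array, reuse the last description. */
		uint16_t cap = seg_len[n < n_seg ? n : n_seg - 1];

		assert(cap > 0);
		if (n == out_max)
			return 0;
		out[n] = pkt_len < cap ? (uint16_t)pkt_len : cap;
		pkt_len -= out[n];
		n++;
	}
	return n;
}
```

With the lengths {14, 20, 20, 512} this gives 14/20/12 for a 46 byte packet
and 14/20/20/512/512/422 for a 1500 byte packet, which matches the listings
in the commit message.
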
> The offload RTE_ETH_RX_OFFLOAD_SCATTER must be present and
> configured to support the new buffer split feature (if rx_nseg
> is greater than one).
> 
> The new approach would allow splitting the ingress packets into
> multiple parts pushed to the memory with different attributes.
> For example, the packet headers can be pushed to the embedded
> data buffers within mbufs and the application data into
> the external buffers attached to mbufs allocated from the
> different memory pools. The memory attributes for the split
> parts may differ as well - for example the application data
> may be pushed into the external memory located on the dedicated
> physical device, say GPU or NVMe. This would improve the DPDK
> receiving datapath flexibility while preserving compatibility
> with the existing API.
> 
> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
> ---
>   doc/guides/nics/features.rst             | 15 +++++
>   doc/guides/rel_notes/release_20_11.rst   |  9 +++
>   lib/librte_ethdev/rte_ethdev.c           | 95 ++++++++++++++++++++++++--------
>   lib/librte_ethdev/rte_ethdev.h           | 58 ++++++++++++++++++-
>   lib/librte_ethdev/rte_ethdev_version.map |  1 +

Can you please also update the deprecation document, to remove the related notice?

>   5 files changed, 155 insertions(+), 23 deletions(-)
> 
> diff --git a/doc/guides/nics/features.rst b/doc/guides/nics/features.rst
> index dd8c955..a45a9e8 100644
> --- a/doc/guides/nics/features.rst
> +++ b/doc/guides/nics/features.rst
> @@ -185,6 +185,21 @@ Supports receiving segmented mbufs.
>   * **[related]    eth_dev_ops**: ``rx_pkt_burst``.
>   
>   
> +.. _nic_features_buffer_split:
> +
> +Buffer Split on Rx
> +------------------
> +
> +Scatters the packets being received on specified boundaries to segmented mbufs.
> +
> +* **[uses]       rte_eth_rxconf,rte_eth_rxmode**: ``offloads:RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT``.

It also uses 'rx_seg' and 'rx_nseg' from 'rte_eth_rxconf'; perhaps that can be another line.

> +* **[implements] datapath**: ``Buffer Split functionality``.
> +* **[implements] rte_eth_dev_data**: ``buffer_split``.

What is implemented here?

> +* **[provides]   rte_eth_dev_info**: ``rx_offload_capa:RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT``.
> +* **[provides]   eth_dev_ops**: ``rxq_info_get:buffer_split``.

Is this correct?

> +* **[related] API**: ``rte_eth_rx_queue_setup()``.
> +
> +
>   .. _nic_features_lro:
>   
>   LRO
> diff --git a/doc/guides/rel_notes/release_20_11.rst b/doc/guides/rel_notes/release_20_11.rst
> index bcc0fc2..f67ec62 100644
> --- a/doc/guides/rel_notes/release_20_11.rst
> +++ b/doc/guides/rel_notes/release_20_11.rst
> @@ -60,6 +60,12 @@ New Features
>     Added the FEC API which provides functions for query FEC capabilities and
>     current FEC mode from device. Also, API for configuring FEC mode is also provided.
>   
> +* **Introduced extended buffer description for receiving.**
> +
> +  Added the extended Rx queue setup routine providing the individual

This looks wrong for the latest version; there is no separate extended setup
routine anymore.

> +  descriptions for each Rx segment with maximal size, buffer offset and memory
> +  pool to allocate data buffers from.
> +
>   * **Updated Broadcom bnxt driver.**
>   
>     Updated the Broadcom bnxt driver with new features and improvements, including:
> @@ -253,6 +259,9 @@ API Changes
>     As the data of ``uint8_t`` will be truncated when queue number under
>     a TC is greater than 256.
>   
> +* ethdev: Added fields rx_seg and rx_nseg to rte_eth_rxconf structure
> +  to provide extended description of the receiving buffer.
> +
>   * vhost: Moved vDPA APIs from experimental to stable.
>   
>   * rawdev: Added a structure size parameter to the functions
> diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
> index 892c246..7b64a6e 100644
> --- a/lib/librte_ethdev/rte_ethdev.c
> +++ b/lib/librte_ethdev/rte_ethdev.c
> @@ -105,6 +105,9 @@ struct rte_eth_xstats_name_off {
>   #define RTE_RX_OFFLOAD_BIT2STR(_name)	\
>   	{ DEV_RX_OFFLOAD_##_name, #_name }
>   
> +#define RTE_ETH_RX_OFFLOAD_BIT2STR(_name)	\
> +	{ RTE_ETH_RX_OFFLOAD_##_name, #_name }
> +
>   static const struct {
>   	uint64_t offload;
>   	const char *name;
> @@ -128,9 +131,11 @@ struct rte_eth_xstats_name_off {
>   	RTE_RX_OFFLOAD_BIT2STR(SCTP_CKSUM),
>   	RTE_RX_OFFLOAD_BIT2STR(OUTER_UDP_CKSUM),
>   	RTE_RX_OFFLOAD_BIT2STR(RSS_HASH),
> +	RTE_ETH_RX_OFFLOAD_BIT2STR(BUFFER_SPLIT),
>   };
>   
>   #undef RTE_RX_OFFLOAD_BIT2STR
> +#undef RTE_ETH_RX_OFFLOAD_BIT2STR
>   
>   #define RTE_TX_OFFLOAD_BIT2STR(_name)	\
>   	{ DEV_TX_OFFLOAD_##_name, #_name }
> @@ -1770,10 +1775,14 @@ struct rte_eth_dev *
>   		       struct rte_mempool *mp)
>   {
>   	int ret;
> +	uint16_t seg_idx;
>   	uint32_t mbp_buf_size;
>   	struct rte_eth_dev *dev;
>   	struct rte_eth_dev_info dev_info;
>   	struct rte_eth_rxconf local_conf;
> +	const struct rte_eth_rxseg *rx_seg;
> +	struct rte_eth_rxseg seg_single = { .mp = mp};
> +	uint16_t n_seg;
>   	void **rxq;
>   
>   	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
> @@ -1784,13 +1793,32 @@ struct rte_eth_dev *
>   		return -EINVAL;
>   	}
>   
> -	if (mp == NULL) {
> -		RTE_ETHDEV_LOG(ERR, "Invalid null mempool pointer\n");
> -		return -EINVAL;
> -	}
> -
>   	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_setup, -ENOTSUP);
>   
> +	rx_seg = rx_conf->rx_seg;
> +	n_seg = rx_conf->rx_nseg;
> +	if (rx_seg == NULL) {
> +		/* Exclude ambiguities about segment descriptions. */
> +		if (n_seg) {
> +			RTE_ETHDEV_LOG(ERR,
> +				       "Non empty array with null pointer\n");
> +			return -EINVAL;
> +		}
> +		rx_seg = &seg_single;
> +		n_seg = 1;

Why set 'rx_seg' & 'n_seg'? Why not leave them NULL and 0 when not used?
This way the PMD can do a NULL/0 check and know they are not used.


I think it is better to do an "if (mp == NULL)" check here; 'rx_seg' and 'mp'
should not both be NULL.
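
A rough standalone sketch of such an argument check follows. It is only an
illustration under assumptions, not a proposed change to
rte_eth_rx_queue_setup(): "struct mempool", "struct rxseg" and
check_rxq_pools() are placeholder names, not ethdev API, and the rule that
exactly one of 'mp' and 'rx_seg' is given is taken from the commit message.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

struct mempool;			/* placeholder for struct rte_mempool */

struct rxseg {			/* placeholder for struct rte_eth_rxseg */
	struct mempool *mp;
	uint16_t length;
	uint16_t offset;
};

/*
 * Exactly one source of pools must be given:
 *   legacy: mp != NULL, rx_seg == NULL, n_seg == 0
 *   split:  mp == NULL, rx_seg != NULL, n_seg  > 0
 * Returns 0 if the combination is valid, -EINVAL otherwise.
 */
static int
check_rxq_pools(const struct mempool *mp, const struct rxseg *rx_seg,
		uint16_t n_seg)
{
	if (rx_seg == NULL) {
		if (n_seg != 0)
			return -EINVAL;	/* count without an array */
		if (mp == NULL)
			return -EINVAL;	/* no pool given at all */
		return 0;
	}
	if (n_seg == 0)
		return -EINVAL;		/* array without a count */
	if (mp != NULL)
		return -EINVAL;		/* pool specified twice */
	return 0;
}
```

In the legacy case nothing is rewritten, so a PMD would still see NULL/0 for
the array and the count.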

> +	} else {
> +		if (n_seg == 0) {
> +			RTE_ETHDEV_LOG(ERR,
> +				       "Invalid zero descriptions number\n");
> +			return -EINVAL;
> +		}
> +		if (mp) {
> +			RTE_ETHDEV_LOG(ERR,
> +				       "Memory pool duplicated definition\n");
> +			return -EINVAL;
> +		}
> +	}
> +
>   	/*
>   	 * Check the size of the mbuf data buffer.
>   	 * This value must be provided in the private data of the memory pool.
> @@ -1800,23 +1828,48 @@ struct rte_eth_dev *
>   	if (ret != 0)
>   		return ret;
>   
> -	if (mp->private_data_size < sizeof(struct rte_pktmbuf_pool_private)) {
> -		RTE_ETHDEV_LOG(ERR, "%s private_data_size %d < %d\n",
> -			mp->name, (int)mp->private_data_size,
> -			(int)sizeof(struct rte_pktmbuf_pool_private));
> -		return -ENOSPC;
> -	}
> -	mbp_buf_size = rte_pktmbuf_data_room_size(mp);
> +	for (seg_idx = 0; seg_idx < n_seg; seg_idx++) {
> +		struct rte_mempool *mpl = rx_seg[seg_idx].mp;
> +		uint32_t length = rx_seg[seg_idx].length;
> +		uint32_t offset = rx_seg[seg_idx].offset;
> +		uint32_t head_room = seg_idx ? 0 : RTE_PKTMBUF_HEADROOM;
>   
> -	if (mbp_buf_size < dev_info.min_rx_bufsize + RTE_PKTMBUF_HEADROOM) {
> -		RTE_ETHDEV_LOG(ERR,
> -			"%s mbuf_data_room_size %d < %d (RTE_PKTMBUF_HEADROOM=%d + min_rx_bufsize(dev)=%d)\n",
> -			mp->name, (int)mbp_buf_size,
> -			(int)(RTE_PKTMBUF_HEADROOM + dev_info.min_rx_bufsize),
> -			(int)RTE_PKTMBUF_HEADROOM,
> -			(int)dev_info.min_rx_bufsize);
> -		return -EINVAL;
> +		if (mpl == NULL) {
> +			RTE_ETHDEV_LOG(ERR, "Invalid null mempool pointer\n");
> +			return -EINVAL;
> +		}
> +
> +		if (mpl->private_data_size <
> +				sizeof(struct rte_pktmbuf_pool_private)) {
> +			RTE_ETHDEV_LOG(ERR, "%s private_data_size %d < %d\n",
> +				mpl->name, (int)mpl->private_data_size,
> +				(int)sizeof(struct rte_pktmbuf_pool_private));
> +			return -ENOSPC;
> +		}
> +
> +		mbp_buf_size = rte_pktmbuf_data_room_size(mpl);
> +		length = length ? length : (mbp_buf_size - head_room);
> +		if (mbp_buf_size < length + offset + head_room) {
> +			RTE_ETHDEV_LOG(ERR,
> +				"%s mbuf_data_room_size %u < %u"
> +				" (segment length=%u + segment offset=%u)\n",
> +				mpl->name, mbp_buf_size,
> +				length + offset, length, offset);
> +			return -EINVAL;
> +		}
>   	}
> +	/* Check the minimal buffer size for the single segment only. */

This is the main branch. What do you think about moving the comment to the
beginning of the for loop above, and adding a comment about testing multiple
segments?

Btw, I have a concern that this single/multi segment wording can cause confusion
with multi segment packets. Can something else, like "split package", be used
instead of "segment"?
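
For reference while reading the loop, the per-segment size check can be
modeled by the standalone sketch below. The names are hypothetical, HEADROOM
merely stands in for RTE_PKTMBUF_HEADROOM (128 is assumed here, the real
value is a build-time setting), and the guard against a data room smaller
than the headroom is an addition of the sketch, not something the patch does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define HEADROOM 128u	/* assumed stand-in for RTE_PKTMBUF_HEADROOM */

/*
 * Return true if a segment of the given length and offset fits into an
 * mbuf data room of data_room bytes. Headroom is only accounted for in
 * the first segment; a zero length means "the rest of the buffer".
 */
static bool
seg_fits(uint32_t data_room, uint32_t length, uint32_t offset, bool first)
{
	uint32_t head_room = first ? HEADROOM : 0;

	if (length == 0) {
		/* Avoid unsigned wrap-around on tiny buffers. */
		if (data_room < head_room)
			return false;
		length = data_room - head_room;
	}
	return data_room >= length + offset + head_room;
}
```

One consequence, if the sketch reflects the patch correctly: a zero length
combined with a non-zero offset never fits, because the deduced length
already consumes the whole buffer.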

> +	if (mp && (mbp_buf_size < dev_info.min_rx_bufsize +
> +				  RTE_PKTMBUF_HEADROOM)) {
> +		RTE_ETHDEV_LOG(ERR,
> +			       "%s mbuf_data_room_size %u < %u "
> +			       "(RTE_PKTMBUF_HEADROOM=%u + "
> +			       "min_rx_bufsize(dev)=%u)\n",
> +			       mp->name, mbp_buf_size,
> +			       RTE_PKTMBUF_HEADROOM + dev_info.min_rx_bufsize,
> +			       RTE_PKTMBUF_HEADROOM, dev_info.min_rx_bufsize);
> +			return -EINVAL;
> +		}
>   
>   	/* Use default specified by driver, if nb_rx_desc is zero */
>   	if (nb_rx_desc == 0) {
> @@ -1914,8 +1967,6 @@ struct rte_eth_dev *
>   			dev->data->min_rx_buf_size = mbp_buf_size;
>   	}
>   
> -	rte_ethdev_trace_rxq_setup(port_id, rx_queue_id, nb_rx_desc, mp,
> -		rx_conf, ret);

Is this removed intentionally?

>   	return eth_err(port_id, ret);
>   }
>   
> diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
> index 5bcfbb8..9cf0a03 100644
> --- a/lib/librte_ethdev/rte_ethdev.h
> +++ b/lib/librte_ethdev/rte_ethdev.h
> @@ -970,6 +970,16 @@ struct rte_eth_txmode {
>   };
>   
>   /**
> + * A structure used to configure an RX packet segment to split.
> + */
> +struct rte_eth_rxseg {
> +	struct rte_mempool *mp; /**< Memory pools to allocate segment from */
> +	uint16_t length; /**< Segment data length, configures split point. */
> +	uint16_t offset; /**< Data offset from beginning of mbuf data buffer */
> +	uint32_t reserved; /**< Reserved field */
> +};
> +
> +/**
>    * A structure used to configure an RX ring of an Ethernet port.
>    */
>   struct rte_eth_rxconf {
> @@ -977,13 +987,23 @@ struct rte_eth_rxconf {
>   	uint16_t rx_free_thresh; /**< Drives the freeing of RX descriptors. */
>   	uint8_t rx_drop_en; /**< Drop packets if no descriptors are available. */
>   	uint8_t rx_deferred_start; /**< Do not start queue with rte_eth_dev_start(). */
> +	uint16_t rx_nseg; /**< Number of descriptions in rx_seg array. */
> +	/**
> +	 * The pointer to the array of segment descriptions, each element
> +	 * describes the memory pool, maximal segment data length, initial
> +	 * data offset from the beginning of data buffer in mbuf. This allow
> +	 * to specify the dedicated properties for each segment in the receiving
> +	 * buffer - pool, buffer offset, maximal segment size. The number of
> +	 * segment descriptions in the array is specified by the rx_nseg
> +	 * field.
> +	 */

What do you think about providing a short description here, and moving the
comment above to the "struct rte_eth_rxseg" definition?

> +	struct rte_eth_rxseg *rx_seg;
>   	/**
>   	 * Per-queue Rx offloads to be set using DEV_RX_OFFLOAD_* flags.
>   	 * Only offloads set on rx_queue_offload_capa or rx_offload_capa
>   	 * fields on rte_eth_dev_info structure are allowed to be set.
>   	 */
>   	uint64_t offloads;
> -

unrelated

>   	uint64_t reserved_64s[2]; /**< Reserved for future fields */
>   	void *reserved_ptrs[2];   /**< Reserved for future fields */
>   };
> @@ -1260,6 +1280,7 @@ struct rte_eth_conf {
>   #define DEV_RX_OFFLOAD_SCTP_CKSUM	0x00020000
>   #define DEV_RX_OFFLOAD_OUTER_UDP_CKSUM  0x00040000
>   #define DEV_RX_OFFLOAD_RSS_HASH		0x00080000
> +#define RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT 0x00100000
>   
>   #define DEV_RX_OFFLOAD_CHECKSUM (DEV_RX_OFFLOAD_IPV4_CKSUM | \
>   				 DEV_RX_OFFLOAD_UDP_CKSUM | \
> @@ -2027,6 +2048,41 @@ int rte_eth_dev_configure(uint16_t port_id, uint16_t nb_rx_queue,
>    *   No need to repeat any bit in rx_conf->offloads which has already been
>    *   enabled in rte_eth_dev_configure() at port level. An offloading enabled
>    *   at port level can't be disabled at queue level.

Would it be possible to put a kind of marker here, like "@rx_seg & @rx_nseg", to
clarify what you are talking about?

> + *   The configuration structure also contains the pointer to the array
> + *   of the receiving buffer segment descriptions, each element describes
> + *   the memory pool, maximal segment data length, initial data offset from
> + *   the beginning of data buffer in mbuf. This allow to specify the dedicated
> + *   properties for each segment in the receiving buffer - pool, buffer
> + *   offset, maximal segment size. If RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT offload
> + *   flag is configured the PMD will split the received packets into multiple
> + *   segments according to the specification in the description array:
> + *   - the first network buffer will be allocated from the memory pool,
> + *     specified in the first segment description element, the second
> + *     network buffer - from the pool in the second segment description
> + *     element and so on. If there is no enough elements to describe
> + *     the buffer for entire packet of maximal length the pool from the last
> + *     valid element will be used to allocate the buffers from for the rest
> + *     of segments.
> + *   - the offsets from the segment description elements will provide the
> + *     data offset from the buffer beginning except the first mbuf - for this
> + *     one the offset is added to the RTE_PKTMBUF_HEADROOM to get actual
> + *     offset from the buffer beginning. If there is no enough elements
> + *     to describe the buffer for entire packet of maximal length the offsets
> + *     for the rest of segment will be supposed to be zero.
> + *   - the data length being received to each segment is limited by the
> + *     length specified in the segment description element. The data receiving
> + *     starts with filling up the first mbuf data buffer, if the specified
> + *     maximal segment length is reached and there are data remaining
> + *     (packet is longer than buffer in the first mbuf) the following data
> + *     will be pushed to the next segment up to its own length. If the first
> + *     two segments is not enough to store all the packet data the next
> + *     (third) segment will be engaged and so on. If the length in the segment
> + *     description element is zero the actual buffer size will be deduced
> + *     from the appropriate memory pool properties. If there is no enough
> + *     elements to describe the buffer for entire packet of maximal length
> + *     the buffer size will be deduced from the pool of the last valid
> + *     element for the all remaining segments.
> + *

I think the comment should first of all clarify that if @rx_seg is provided,
'mb_pool' should be NULL, and that if the split Rx feature is not used, "@rx_seg &
@rx_nseg" should be NULL and 0.

Also, the above is too wordy and hard to follow. For example, it should be easier
to see that "@rx_seg & @rx_nseg" are only taken into account if the application
provides the 'RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT' offload.
Can you try to simplify it? Perhaps moving some of the above comments to
"struct rte_eth_rxseg" can work.

>    * @param mb_pool
>    *   The pointer to the memory pool from which to allocate *rte_mbuf* network
>    *   memory buffers to populate each descriptor of the receive ring.
> diff --git a/lib/librte_ethdev/rte_ethdev_version.map b/lib/librte_ethdev/rte_ethdev_version.map
> index f8a0945..25f7cee 100644
> --- a/lib/librte_ethdev/rte_ethdev_version.map
> +++ b/lib/librte_ethdev/rte_ethdev_version.map
> @@ -232,6 +232,7 @@ EXPERIMENTAL {
>   	rte_eth_fec_get_capability;
>   	rte_eth_fec_get;
>   	rte_eth_fec_set;
> +

unrelated

^ permalink raw reply	[flat|nested] 172+ messages in thread

* Re: [dpdk-dev] [RFC] ethdev: introduce Rx buffer split
  2020-10-13 21:59         ` Ferruh Yigit
@ 2020-10-14  7:17           ` Thomas Monjalon
  2020-10-14  7:37           ` Slava Ovsiienko
  1 sibling, 0 replies; 172+ messages in thread
From: Thomas Monjalon @ 2020-10-14  7:17 UTC (permalink / raw)
  To: Ferruh Yigit
  Cc: Slava Ovsiienko, Andrew Rybchenko, dev, stephen, Shahaf Shuler,
	olivier.matz, jerinjacobk, maxime.coquelin, david.marchand,
	Asaf Penso, Konstantin Ananyev

13/10/2020 23:59, Ferruh Yigit:
> On 10/12/2020 10:56 AM, Slava Ovsiienko wrote:
> > We have two approaches how to specify multiple segments to split Rx packets:
> > 1. update queue configuration structure
> > 2. introduce new rx_queue_setup_ex() routine with extra parameters.
> > 
> > For [1] my only actual dislike is that we would have multiple places to specify
> > the pool - in rx_queue_setup() and in the config structure. So, we should
> > implement some checking (if we have offload flag set we should check
> > whether mp parameter is NULL and segment descriptions array pointer/size
> > is provided, if no offload flag set - we must check the description array is empty).
> > 
> >> @Thomas, @Ferruh: I'd like to hear what other ethdev maintainers think
> >> about it.
> > 
> > Yes, it would be very nice to hear extra opinions. Do we think the providing
> > of extra API function is worse than extending existing structure, introducing
> > some conditional ambiguity and complicating the parameter compliance
> > check?
> 
> I think decision was given with the deprecation notice which already says 
> ``rte_eth_rxconf`` will be updated for this.
> 
> With new API, we need to create a new dev_ops too, not sure about creating a new 
> dev_ops for a single PMD.

You should not view it as a feature for a single PMD.
Yes, as always, it starts with only one PMD implementing the API,
but I really think this feature is generic and multiple NICs
will be able to support this offload.


> For the PMD that supports this feature will need two dev_ops that is fairly 
> close to each other, as Andrew mentioned this is a duplication.
> 
> And from user perspective two setup functions with overlaps can be confusing.
> 
> +1 to having single setup function but update the config, and I can see v5 sent 
> this way, I will check it.
> 
> 
> > Now I'm updating the existing version on the patch based on rx_queue_ex()
> > and then could prepare the version for configuration structure,
> > it is not a problem - approaches are very similar, we just should choose
> > the most relevant one.
> > 
> > With best regards, Slava





* Re: [dpdk-dev] [RFC] ethdev: introduce Rx buffer split
  2020-10-13 21:59         ` Ferruh Yigit
  2020-10-14  7:17           ` Thomas Monjalon
@ 2020-10-14  7:37           ` Slava Ovsiienko
  1 sibling, 0 replies; 172+ messages in thread
From: Slava Ovsiienko @ 2020-10-14  7:37 UTC (permalink / raw)
  To: Ferruh Yigit, Andrew Rybchenko, dev
  Cc: Thomas Monjalon, stephen, Shahaf Shuler, olivier.matz,
	jerinjacobk, maxime.coquelin, david.marchand, Asaf Penso,
	Konstantin Ananyev

> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@intel.com>
> Sent: Wednesday, October 14, 2020 0:59
> To: Slava Ovsiienko <viacheslavo@nvidia.com>; Andrew Rybchenko
> <Andrew.Rybchenko@oktetlabs.ru>; dev@dpdk.org
> Cc: Thomas Monjalon <thomasm@mellanox.com>;
> stephen@networkplumber.org; Shahaf Shuler <shahafs@nvidia.com>;
> olivier.matz@6wind.com; jerinjacobk@gmail.com;
> maxime.coquelin@redhat.com; david.marchand@redhat.com; Asaf Penso
> <asafp@nvidia.com>; Konstantin Ananyev
> <konstantin.ananyev@intel.com>
> Subject: Re: [dpdk-dev] [RFC] ethdev: introduce Rx buffer split
> 
> On 10/12/2020 10:56 AM, Slava Ovsiienko wrote:
> > Hi, Andrew
> >
> > Thank you for the comments.
> >
> > We have two approaches how to specify multiple segments to split Rx packets:
> > 1. update queue configuration structure
> > 2. introduce new rx_queue_setup_ex() routine with extra parameters.
> >
> > For [1] my only actual dislike is that we would have multiple places
> > to specify the pool - in rx_queue_setup() and in the config structure.
> > So, we should implement some checking (if we have offload flag set we
> > should check whether mp parameter is NULL and segment descriptions
> > array pointer/size is provided, if no offload flag set - we must check the
> > description array is empty).
> >
> >> @Thomas, @Ferruh: I'd like to hear what other ethdev maintainers
> >> think about it.
> >
> > Yes, it would be very nice to hear extra opinions. Do we think the
> > providing of extra API function is worse than extending existing
> > structure, introducing some conditional ambiguity and complicating the
> > parameter compliance check?
> >
> 
> I think decision was given with the deprecation notice which already says
> ``rte_eth_rxconf`` will be updated for this.
> 
> With new API, we need to create a new dev_ops too, not sure about creating
> a new dev_ops for a single PMD.

I would rather consider the feature as a generic one, not as one for a "single PMD".
Currently DPDK does not provide any flexibility for the Rx buffer, and applications
have no way to control the Rx buffer attributes. I suppose the split
is not specific to mlx5; it is very likely that the hardware of many other
vendors is capable of doing the same.

> 
> For the PMD that supports this feature will need two dev_ops that is fairly
> close to each other, as Andrew mentioned this is a duplication.
> 
> And from user perspective two setup functions with overlaps can be
> confusing.
> 
> +1 to having single setup function but update the config, and I can see v5 sent
> this way, I will check it.

OK, got it, thank you very much for the opinion, let's move in this way.
v5 with reverted (back to deprecation notice) approach is sent, I will address the comments.

With best regards, Slava


* Re: [dpdk-dev] [PATCH v5 1/6] ethdev: introduce Rx buffer split
  2020-10-13 22:34     ` Ferruh Yigit
@ 2020-10-14 13:31       ` Olivier Matz
  2020-10-14 14:42       ` Slava Ovsiienko
  1 sibling, 0 replies; 172+ messages in thread
From: Olivier Matz @ 2020-10-14 13:31 UTC (permalink / raw)
  To: Ferruh Yigit
  Cc: Viacheslav Ovsiienko, dev, thomasm, stephen, jerinjacobk,
	maxime.coquelin, david.marchand, arybchenko

Hi Slava,

After reviewing this version and comparing with v4, I think having the
configuration in rxconf (like in this patch) is a better choice.

A few comments below.

On Tue, Oct 13, 2020 at 11:34:16PM +0100, Ferruh Yigit wrote:
> On 10/13/2020 8:21 PM, Viacheslav Ovsiienko wrote:
> > The DPDK datapath in the transmit direction is very flexible.
> > An application can build the multi-segment packet and manages
> > almost all data aspects - the memory pools where segments
> > are allocated from, the segment lengths, the memory attributes
> > like external buffers, registered for DMA, etc.
> > 
> > In the receiving direction, the datapath is much less flexible,
> > an application can only specify the memory pool to configure the
> > receiving queue and nothing more. In order to extend receiving
> > datapath capabilities it is proposed to add the way to provide
> > extended information how to split the packets being received.
> > 
> > The following structure is introduced to specify the Rx packet
> > segment:
> > 
> > struct rte_eth_rxseg {
> >      struct rte_mempool *mp; /* memory pools to allocate segment from */
> >      uint16_t length; /* segment maximal data length,
> > 		       	configures "split point" */
> >      uint16_t offset; /* data offset from beginning
> > 		       	of mbuf data buffer */
> >      uint32_t reserved; /* reserved field */
> > };
> > 
> > The segment descriptions are added to the rte_eth_rxconf structure:
> >     rx_seg - pointer the array of segment descriptions, each element
> >               describes the memory pool, maximal data length, initial
> >               data offset from the beginning of data buffer in mbuf.
> > 	     This array allows to specify the different settings for
> > 	     each segment in individual fashion.
> >     n_seg - number of elements in the array
> > 
> > If the extended segment descriptions is provided with these new
> > fields the mp parameter of the rte_eth_rx_queue_setup must be
> > specified as NULL to avoid ambiguity.
> > 
> > The new offload flag RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT in device
> > capabilities is introduced to present the way for PMD to report to
> > application about supporting Rx packet split to configurable
> > segments. Prior invoking the rte_eth_rx_queue_setup() routine
> > application should check RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT flag.
> > 
> > If the Rx queue is configured with new routine the packets being
> > received will be split into multiple segments pushed to the mbufs
> > with specified attributes. The PMD will split the received packets
> > into multiple segments according to the specification in the
> > description array:
> > 
> > - the first network buffer will be allocated from the memory pool,
> >    specified in the first segment description element, the second
> >    network buffer - from the pool in the second segment description
> >    element and so on. If there is no enough elements to describe
> >    the buffer for entire packet of maximal length the pool from the
> >    last valid element will be used to allocate the buffers from for the
> >    rest of segments
> > 
> > - the offsets from the segment description elements will provide
> >    the data offset from the buffer beginning except the first mbuf -
> >    for this one the offset is added to the RTE_PKTMBUF_HEADROOM to get
> >    actual offset from the buffer beginning. If there is no enough
> >    elements to describe the buffer for entire packet of maximal length
> >    the offsets for the rest of segment will be supposed to be zero.
> > 
> > - the data length being received to each segment is limited  by the
> >    length specified in the segment description element. The data
> >    receiving starts with filling up the first mbuf data buffer, if the
> >    specified maximal segment length is reached and there are data
> >    remaining (packet is longer than buffer in the first mbuf) the
> >    following data will be pushed to the next segment up to its own
> >    maximal length. If the first two segments is not enough to store
> >    all the packet remaining data  the next (third) segment will
> >    be engaged and so on. If the length in the segment description
> >    element is zero the actual buffer size will be deduced from
> >    the appropriate memory pool properties. If there is no enough
> >    elements to describe the buffer for entire packet of maximal
> >    length the buffer size will be deduced from the pool of the last
> >    valid element for the remaining segments.
> > 
> > For example, let's suppose we configured the Rx queue with the
> > following segments:
> >      seg0 - pool0, len0=14B, off0=2
> >      seg1 - pool1, len1=20B, off1=128B
> >      seg2 - pool2, len2=20B, off2=0B
> >      seg3 - pool3, len3=512B, off3=0B
> > 
> > The packet 46 bytes long will look like the following:
> >      seg0 - 14B long @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from pool0
> >      seg1 - 20B long @ 128 in mbuf from pool1
> >      seg2 - 12B long @ 0 in mbuf from pool2
> > 
> > The packet 1500 bytes long will look like the following:
> >      seg0 - 14B @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from pool0
> >      seg1 - 20B @ 128 in mbuf from pool1
> >      seg2 - 20B @ 0 in mbuf from pool2
> >      seg3 - 512B @ 0 in mbuf from pool3
> >      seg4 - 512B @ 0 in mbuf from pool3
> >      seg5 - 422B @ 0 in mbuf from pool3
> > 
> > The offload RTE_ETH_RX_OFFLOAD_SCATTER must be present and
> > configured to support new buffer split feature (if n_seg
> > is greater than one).
> > 
> > The new approach would allow splitting the ingress packets into
> > multiple parts pushed to the memory with different attributes.
> > For example, the packet headers can be pushed to the embedded
> > data buffers within mbufs and the application data into
> > the external buffers attached to mbufs allocated from the
> > different memory pools. The memory attributes for the split
> > parts may differ either - for example the application data
> > may be pushed into the external memory located on the dedicated
> > physical device, say GPU or NVMe. This would improve the DPDK
> > receiving datapath flexibility with preserving compatibility
> > with existing API.
> > 
> > Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
> > ---
> >   doc/guides/nics/features.rst             | 15 +++++
> >   doc/guides/rel_notes/release_20_11.rst   |  9 +++
> >   lib/librte_ethdev/rte_ethdev.c           | 95 ++++++++++++++++++++++++--------
> >   lib/librte_ethdev/rte_ethdev.h           | 58 ++++++++++++++++++-
> >   lib/librte_ethdev/rte_ethdev_version.map |  1 +
> 
> Can you please update deprecation notice too, to remove the notice?
> 
> >   5 files changed, 155 insertions(+), 23 deletions(-)
> > 
> > diff --git a/doc/guides/nics/features.rst b/doc/guides/nics/features.rst
> > index dd8c955..a45a9e8 100644
> > --- a/doc/guides/nics/features.rst
> > +++ b/doc/guides/nics/features.rst
> > @@ -185,6 +185,21 @@ Supports receiving segmented mbufs.
> >   * **[related]    eth_dev_ops**: ``rx_pkt_burst``.
> > +.. _nic_features_buffer_split:
> > +
> > +Buffer Split on Rx
> > +------------------
> > +
> > +Scatters the packets being received on specified boundaries to segmented mbufs.
> > +
> > +* **[uses]       rte_eth_rxconf,rte_eth_rxmode**: ``offloads:RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT``.
> 
> uses 'rx_seg and rx_nseg' from 'rte_eth_rxconf', perhaps it can be another line.
> 
> > +* **[implements] datapath**: ``Buffer Split functionality``.
> > +* **[implements] rte_eth_dev_data**: ``buffer_split``.
> 
> What is implemented here?
> 
> > +* **[provides]   rte_eth_dev_info**: ``rx_offload_capa:RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT``.
> > +* **[provides]   eth_dev_ops**: ``rxq_info_get:buffer_split``.
> 
> Is this correct?
> 
> > +* **[related] API**: ``rte_eth_rx_queue_setup()``.
> > +
> > +
> >   .. _nic_features_lro:
> >   LRO
> > diff --git a/doc/guides/rel_notes/release_20_11.rst b/doc/guides/rel_notes/release_20_11.rst
> > index bcc0fc2..f67ec62 100644
> > --- a/doc/guides/rel_notes/release_20_11.rst
> > +++ b/doc/guides/rel_notes/release_20_11.rst
> > @@ -60,6 +60,12 @@ New Features
> >     Added the FEC API which provides functions for query FEC capabilities and
> >     current FEC mode from device. Also, API for configuring FEC mode is also provided.
> > +* **Introduced extended buffer description for receiving.**
> > +
> > +  Added the extended Rx queue setup routine providing the individual
> 
> This looks wrong with last version.
> 
> > +  descriptions for each Rx segment with maximal size, buffer offset and memory
> > +  pool to allocate data buffers from.
> > +
> >   * **Updated Broadcom bnxt driver.**
> >     Updated the Broadcom bnxt driver with new features and improvements, including:
> > @@ -253,6 +259,9 @@ API Changes
> >     As the data of ``uint8_t`` will be truncated when queue number under
> >     a TC is greater than 256.
> > +* ethdev: Added fields rx_seg and rx_nseg to rte_eth_rxconf structure
> > +  to provide extended description of the receiving buffer.
> > +
> >   * vhost: Moved vDPA APIs from experimental to stable.
> >   * rawdev: Added a structure size parameter to the functions
> > diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
> > index 892c246..7b64a6e 100644
> > --- a/lib/librte_ethdev/rte_ethdev.c
> > +++ b/lib/librte_ethdev/rte_ethdev.c
> > @@ -105,6 +105,9 @@ struct rte_eth_xstats_name_off {
> >   #define RTE_RX_OFFLOAD_BIT2STR(_name)	\
> >   	{ DEV_RX_OFFLOAD_##_name, #_name }
> > +#define RTE_ETH_RX_OFFLOAD_BIT2STR(_name)	\
> > +	{ RTE_ETH_RX_OFFLOAD_##_name, #_name }
> > +
> >   static const struct {
> >   	uint64_t offload;
> >   	const char *name;
> > @@ -128,9 +131,11 @@ struct rte_eth_xstats_name_off {
> >   	RTE_RX_OFFLOAD_BIT2STR(SCTP_CKSUM),
> >   	RTE_RX_OFFLOAD_BIT2STR(OUTER_UDP_CKSUM),
> >   	RTE_RX_OFFLOAD_BIT2STR(RSS_HASH),
> > +	RTE_ETH_RX_OFFLOAD_BIT2STR(BUFFER_SPLIT),
> >   };
> >   #undef RTE_RX_OFFLOAD_BIT2STR
> > +#undef RTE_ETH_RX_OFFLOAD_BIT2STR
> >   #define RTE_TX_OFFLOAD_BIT2STR(_name)	\
> >   	{ DEV_TX_OFFLOAD_##_name, #_name }
> > @@ -1770,10 +1775,14 @@ struct rte_eth_dev *
> >   		       struct rte_mempool *mp)
> >   {
> >   	int ret;
> > +	uint16_t seg_idx;
> >   	uint32_t mbp_buf_size;
> >   	struct rte_eth_dev *dev;
> >   	struct rte_eth_dev_info dev_info;
> >   	struct rte_eth_rxconf local_conf;
> > +	const struct rte_eth_rxseg *rx_seg;
> > +	struct rte_eth_rxseg seg_single = { .mp = mp};

missing space

> > +	uint16_t n_seg;
> >   	void **rxq;
> >   	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
> > @@ -1784,13 +1793,32 @@ struct rte_eth_dev *
> >   		return -EINVAL;
> >   	}
> > -	if (mp == NULL) {
> > -		RTE_ETHDEV_LOG(ERR, "Invalid null mempool pointer\n");
> > -		return -EINVAL;
> > -	}
> > -
> >   	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_setup, -ENOTSUP);
> > +	rx_seg = rx_conf->rx_seg;
> > +	n_seg = rx_conf->rx_nseg;
> > +	if (rx_seg == NULL) {
> > +		/* Exclude ambiguities about segment descrtiptions. */
> > +		if (n_seg) {
> > +			RTE_ETHDEV_LOG(ERR,
> > +				       "Non empty array with null pointer\n");
> > +			return -EINVAL;
> > +		}
> > +		rx_seg = &seg_single;
> > +		n_seg = 1;
> 
> Why setting 'rx_seg' & 'n_seg'? Why not leaving them NULL and 0 when not used?
> > This way PMD can do NULL/0 check and can know they are not used.

I think they are just set locally to factorize the checks below, but I agree it
is questionable: it seems the only check which is really factorized is about
private_data_size.

> I think better to do a "if (mp == NULL)" check here, both 'rx_seg' & 'mp'
> should not be NULL.

Agree, something like this looks simpler:

	if (mp == NULL) {
		if (n_seg == 0 || rx_seg == NULL)
			RTE_ETHDEV_LOG(ERR, "...");
	} else {
		if (n_seg != 0 || rx_seg != NULL)
			RTE_ETHDEV_LOG(ERR, "...");
		rx_seg = &seg_single;
		n_seg = 1;
	}
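
For illustration only, the same idea can be written as a self-contained
sketch with explicit error returns. The types mempool_stub and rxseg_stub
and the function name check_rx_buffer_config() are hypothetical stand-ins
so the code compiles on its own, they are not the ethdev code:

```c
#include <assert.h>	/* assert() is only needed for the usage note below */
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the DPDK types, not the real definitions. */
struct mempool_stub { int unused; };

struct rxseg_stub {
	struct mempool_stub *mp; /* pool to allocate the segment from */
	uint16_t length;         /* maximal segment data length */
	uint16_t offset;         /* data offset in the mbuf data buffer */
	uint32_t reserved;       /* reserved field */
};

/*
 * Return 0 when exactly one of the two configuration styles is used,
 * -EINVAL when they are mixed or when neither is provided.
 */
int
check_rx_buffer_config(const struct mempool_stub *mp,
		       const struct rxseg_stub *rx_seg, uint16_t n_seg)
{
	if (mp == NULL) {
		/* Extended style: the array must be present and non-empty. */
		if (n_seg == 0 || rx_seg == NULL)
			return -EINVAL;
	} else {
		/* Legacy style: no segment descriptions are allowed. */
		if (n_seg != 0 || rx_seg != NULL)
			return -EINVAL;
	}
	return 0;
}
```

With a pool object `pool` and a two-element array `segs`, the call
check_rx_buffer_config(&pool, NULL, 0) returns 0, while
check_rx_buffer_config(&pool, segs, 2) returns -EINVAL.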

 
> > +	} else {
> > +		if (n_seg == 0) {
> > +			RTE_ETHDEV_LOG(ERR,
> > +				       "Invalid zero descriptions number\n");
> > +			return -EINVAL;
> > +		}
> > +		if (mp) {
> > +			RTE_ETHDEV_LOG(ERR,
> > +				       "Memory pool duplicated definition\n");
> > +			return -EINVAL;
> > +		}
> > +	}
> > +
> >   	/*
> >   	 * Check the size of the mbuf data buffer.
> >   	 * This value must be provided in the private data of the memory pool.
> > @@ -1800,23 +1828,48 @@ struct rte_eth_dev *
> >   	if (ret != 0)
> >   		return ret;
> > -	if (mp->private_data_size < sizeof(struct rte_pktmbuf_pool_private)) {
> > -		RTE_ETHDEV_LOG(ERR, "%s private_data_size %d < %d\n",
> > -			mp->name, (int)mp->private_data_size,
> > -			(int)sizeof(struct rte_pktmbuf_pool_private));
> > -		return -ENOSPC;
> > -	}
> > -	mbp_buf_size = rte_pktmbuf_data_room_size(mp);
> > +	for (seg_idx = 0; seg_idx < n_seg; seg_idx++) {
> > +		struct rte_mempool *mpl = rx_seg[seg_idx].mp;
> > +		uint32_t length = rx_seg[seg_idx].length;
> > +		uint32_t offset = rx_seg[seg_idx].offset;
> > +		uint32_t head_room = seg_idx ? 0 : RTE_PKTMBUF_HEADROOM;
> > -	if (mbp_buf_size < dev_info.min_rx_bufsize + RTE_PKTMBUF_HEADROOM) {
> > -		RTE_ETHDEV_LOG(ERR,
> > -			"%s mbuf_data_room_size %d < %d (RTE_PKTMBUF_HEADROOM=%d + min_rx_bufsize(dev)=%d)\n",
> > -			mp->name, (int)mbp_buf_size,
> > -			(int)(RTE_PKTMBUF_HEADROOM + dev_info.min_rx_bufsize),
> > -			(int)RTE_PKTMBUF_HEADROOM,
> > -			(int)dev_info.min_rx_bufsize);
> > -		return -EINVAL;
> > +		if (mpl == NULL) {
> > +			RTE_ETHDEV_LOG(ERR, "Invalid null mempool pointer\n");
> > +			return -EINVAL;
> > +		}
> > +
> > +		if (mpl->private_data_size <
> > +				sizeof(struct rte_pktmbuf_pool_private)) {
> > +			RTE_ETHDEV_LOG(ERR, "%s private_data_size %d < %d\n",
> > +				mpl->name, (int)mpl->private_data_size,
> > +				(int)sizeof(struct rte_pktmbuf_pool_private));
> > +			return -ENOSPC;
> > +		}
> > +
> > +		mbp_buf_size = rte_pktmbuf_data_room_size(mpl);
> > +		length = length ? length : (mbp_buf_size - head_room);
> > +		if (mbp_buf_size < length + offset + head_room) {


Is length == 0 allowed? Or is it to handle the case where mp != NULL?
Is the test needed in that case? It seems it is equivalent to do "if
(offset > 0)".

Wouldn't it be better to check mp like this?

	if (mp == NULL) {
		if (mbp_buf_size < length + offset + head_room)
			...error...
	}
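
To make the "equivalent to offset > 0" remark concrete, here is a small
standalone sketch of the per-segment check from the quoted hunk. The
function seg_fits() and its plain numeric parameters are assumptions for
illustration; the real code takes the buffer size from the mempool:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Mirrors the per-segment size check of the quoted patch.
 * Returns true when length + offset + head_room fits into the buffer.
 */
bool
seg_fits(uint32_t mbp_buf_size, uint32_t head_room,
	 uint32_t length, uint32_t offset)
{
	assert(head_room <= mbp_buf_size);
	/* A zero length is replaced by all the room left after the headroom. */
	if (length == 0)
		length = mbp_buf_size - head_room;
	/*
	 * With the substitution above the sum becomes
	 * mbp_buf_size + offset, so the check can only fail
	 * when offset is non-zero.
	 */
	return mbp_buf_size >= length + offset + head_room;
}
```

For a 2048-byte buffer with 128 bytes of headroom, a zero length passes
with offset 0 and fails with any positive offset, which is the
equivalence mentioned above.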


> > +			RTE_ETHDEV_LOG(ERR,
> > +				"%s mbuf_data_room_size %u < %u"
> > +				" (segment length=%u + segment offset=%u)\n",
> > +				mpl->name, mbp_buf_size,
> > +				length + offset, length, offset);
> > +			return -EINVAL;
> > +		}
> >   	}
> > +	/* Check the minimal buffer size for the single segment only. */
> 
> > This is the main branch, what do you think moving the comment to the
> > beginning of above for loop and add a comment about testing the multiple
> > segment.
> 
> Btw, I have a concern that this single/multi segment can cause a confusion
> with multi segment packets. Can something else, like "split package" can be
> used instead of segment?
> 
> > +	if (mp && (mbp_buf_size < dev_info.min_rx_bufsize +
> > +				  RTE_PKTMBUF_HEADROOM)) {
> > +		RTE_ETHDEV_LOG(ERR,
> > +			       "%s mbuf_data_room_size %u < %u "
> > +			       "(RTE_PKTMBUF_HEADROOM=%u + "
> > +			       "min_rx_bufsize(dev)=%u)\n",
> > +			       mp->name, mbp_buf_size,
> > +			       RTE_PKTMBUF_HEADROOM + dev_info.min_rx_bufsize,
> > +			       RTE_PKTMBUF_HEADROOM, dev_info.min_rx_bufsize);
> > +			return -EINVAL;
> > +		}

The "}" is not indented correctly


> >   	/* Use default specified by driver, if nb_rx_desc is zero */
> >   	if (nb_rx_desc == 0) {
> > @@ -1914,8 +1967,6 @@ struct rte_eth_dev *
> >   			dev->data->min_rx_buf_size = mbp_buf_size;
> >   	}
> > -	rte_ethdev_trace_rxq_setup(port_id, rx_queue_id, nb_rx_desc, mp,
> > -		rx_conf, ret);
> 
> Is this removed intentionally?
> 
> >   	return eth_err(port_id, ret);
> >   }
> > diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
> > index 5bcfbb8..9cf0a03 100644
> > --- a/lib/librte_ethdev/rte_ethdev.h
> > +++ b/lib/librte_ethdev/rte_ethdev.h
> > @@ -970,6 +970,16 @@ struct rte_eth_txmode {
> >   };
> >   /**
> > + * A structure used to configure an RX packet segment to split.
> > + */
> > +struct rte_eth_rxseg {
> > +	struct rte_mempool *mp; /**< Memory pools to allocate segment from */

pools -> pool

> > +	uint16_t length; /**< Segment data length, configures split point. */
> > +	uint16_t offset; /**< Data offset from beginning of mbuf data buffer */
> > +	uint32_t reserved; /**< Reserved field */

To allow future additions to the structure, should the reserved field
always be set to 0? If yes, maybe it should be specified here and
checked in rte_eth_rx_queue_setup().
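
Such a check could look roughly like the sketch below. The struct
rxseg_stub and the function check_reserved_zero() are hypothetical names
used only to keep the example self-contained:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical copy of the proposed segment description layout. */
struct rxseg_stub {
	void *mp;          /* pool pointer, opaque in this sketch */
	uint16_t length;   /* segment data length */
	uint16_t offset;   /* data offset in the mbuf data buffer */
	uint32_t reserved; /* must be zero so it can get a meaning later */
};

/* Reject any description whose reserved field is not zero. */
int
check_reserved_zero(const struct rxseg_stub *rx_seg, uint16_t n_seg)
{
	uint16_t i;

	assert(rx_seg != NULL || n_seg == 0);
	for (i = 0; i < n_seg; i++) {
		if (rx_seg[i].reserved != 0)
			return -EINVAL;
	}
	return 0;
}
```

Rejecting non-zero values now means a later release can assign a meaning
to the field without breaking applications that passed garbage in it.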

Some "." missing at the end of comments.

> > +};
> > +
> > +/**
> >    * A structure used to configure an RX ring of an Ethernet port.
> >    */
> >   struct rte_eth_rxconf {
> > @@ -977,13 +987,23 @@ struct rte_eth_rxconf {
> >   	uint16_t rx_free_thresh; /**< Drives the freeing of RX descriptors. */
> >   	uint8_t rx_drop_en; /**< Drop packets if no descriptors are available. */
> >   	uint8_t rx_deferred_start; /**< Do not start queue with rte_eth_dev_start(). */
> > +	uint16_t rx_nseg; /**< Number of descriptions in rx_seg array. */
> > +	/**
> > +	 * The pointer to the array of segment descriptions, each element
> > +	 * describes the memory pool, maximal segment data length, initial
> > +	 * data offset from the beginning of data buffer in mbuf. This allow

allow -> allows
or maybe "This allows to specify" -> "This specifies"

> > +	 * to specify the dedicated properties for each segment in the receiving
> > +	 * buffer - pool, buffer offset, maximal segment size. The number of
> > +	 * segment descriptions in the array is specified by the rx_nseg
> > +	 * field.
> > +	 */
> 
> > What do you think providing a short description here, and move above comment
> > to above "struct rte_eth_rxseg" struct?

+1

> 
> > +	struct rte_eth_rxseg *rx_seg;
> >   	/**
> >   	 * Per-queue Rx offloads to be set using DEV_RX_OFFLOAD_* flags.
> >   	 * Only offloads set on rx_queue_offload_capa or rx_offload_capa
> >   	 * fields on rte_eth_dev_info structure are allowed to be set.
> >   	 */
> >   	uint64_t offloads;
> > -
> 
> unrelated
> 
> >   	uint64_t reserved_64s[2]; /**< Reserved for future fields */
> >   	void *reserved_ptrs[2];   /**< Reserved for future fields */
> >   };
> > @@ -1260,6 +1280,7 @@ struct rte_eth_conf {
> >   #define DEV_RX_OFFLOAD_SCTP_CKSUM	0x00020000
> >   #define DEV_RX_OFFLOAD_OUTER_UDP_CKSUM  0x00040000
> >   #define DEV_RX_OFFLOAD_RSS_HASH		0x00080000
> > +#define RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT 0x00100000
> >   #define DEV_RX_OFFLOAD_CHECKSUM (DEV_RX_OFFLOAD_IPV4_CKSUM | \
> >   				 DEV_RX_OFFLOAD_UDP_CKSUM | \
> > @@ -2027,6 +2048,41 @@ int rte_eth_dev_configure(uint16_t port_id, uint16_t nb_rx_queue,
> >    *   No need to repeat any bit in rx_conf->offloads which has already been
> >    *   enabled in rte_eth_dev_configure() at port level. An offloading enabled
> >    *   at port level can't be disabled at queue level.
> 
> Can it be possible to put a kind of marker here, like "@rx_seg & @rx_nseg",
> to clarify what are you talking about.
> 
> > + *   The configuration structure also contains the pointer to the array
> > + *   of the receiving buffer segment descriptions, each element describes
> > + *   the memory pool, maximal segment data length, initial data offset from
> > + *   the beginning of data buffer in mbuf. This allow to specify the dedicated
> > + *   properties for each segment in the receiving buffer - pool, buffer
> > + *   offset, maximal segment size. If RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT offload
> > + *   flag is configured the PMD will split the received packets into multiple
> > + *   segments according to the specification in the description array:
> > + *   - the first network buffer will be allocated from the memory pool,
> > + *     specified in the first segment description element, the second
> > + *     network buffer - from the pool in the second segment description
> > + *     element and so on. If there is no enough elements to describe
> > + *     the buffer for entire packet of maximal length the pool from the last
> > + *     valid element will be used to allocate the buffers from for the rest
> > + *     of segments.
> > + *   - the offsets from the segment description elements will provide the
> > + *     data offset from the buffer beginning except the first mbuf - for this
> > + *     one the offset is added to the RTE_PKTMBUF_HEADROOM to get actual
> > + *     offset from the buffer beginning. If there is no enough elements
> > + *     to describe the buffer for entire packet of maximal length the offsets
> > + *     for the rest of segment will be supposed to be zero.
> > + *   - the data length being received to each segment is limited by the
> > + *     length specified in the segment description element. The data receiving
> > + *     starts with filling up the first mbuf data buffer, if the specified
> > + *     maximal segment length is reached and there are data remaining
> > + *     (packet is longer than buffer in the first mbuf) the following data
> > + *     will be pushed to the next segment up to its own length. If the first
> > + *     two segments is not enough to store all the packet data the next
> > + *     (third) segment will be engaged and so on. If the length in the segment
> > + *     description element is zero the actual buffer size will be deduced
> > + *     from the appropriate memory pool properties. If there is no enough
> > + *     elements to describe the buffer for entire packet of maximal length
> > + *     the buffer size will be deduced from the pool of the last valid
> > + *     element for the all remaining segments.
> > + *
> 
> I think as a first thing the comment should clarify that if @rx_seg provided
> 'mb_pool' should be NULL, and if split Rx feature is not used "@rx_seg &
> @rx_nseg" should be NULL and 0.
> 
> Also above is too wordy, it is hard to follow. Like "@rx_seg & @rx_nseg" are
> only taken into account if application provides
> 'RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT' offload should be clearer to see, etc.
> > Can you try to simplify it, perhaps moving some of above comments to the
> "struct rte_eth_rxseg" can work?
> 
> >    * @param mb_pool
> >    *   The pointer to the memory pool from which to allocate *rte_mbuf* network
> >    *   memory buffers to populate each descriptor of the receive ring.
> > diff --git a/lib/librte_ethdev/rte_ethdev_version.map b/lib/librte_ethdev/rte_ethdev_version.map
> > index f8a0945..25f7cee 100644
> > --- a/lib/librte_ethdev/rte_ethdev_version.map
> > +++ b/lib/librte_ethdev/rte_ethdev_version.map
> > @@ -232,6 +232,7 @@ EXPERIMENTAL {
> >   	rte_eth_fec_get_capability;
> >   	rte_eth_fec_get;
> >   	rte_eth_fec_set;
> > +
> 
> unrelated


* Re: [dpdk-dev] [PATCH v5 1/6] ethdev: introduce Rx buffer split
  2020-10-13 22:34     ` Ferruh Yigit
  2020-10-14 13:31       ` Olivier Matz
@ 2020-10-14 14:42       ` Slava Ovsiienko
  1 sibling, 0 replies; 172+ messages in thread
From: Slava Ovsiienko @ 2020-10-14 14:42 UTC (permalink / raw)
  To: Ferruh Yigit, dev
  Cc: thomasm, stephen, olivier.matz, jerinjacobk, maxime.coquelin,
	david.marchand, arybchenko

Hi, Ferruh
> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@intel.com>
> Sent: Wednesday, October 14, 2020 1:34
> To: Slava Ovsiienko <viacheslavo@nvidia.com>; dev@dpdk.org
> Cc: thomasm@monjalon.net; stephen@networkplumber.org;
> olivier.matz@6wind.com; jerinjacobk@gmail.com;
> maxime.coquelin@redhat.com; david.marchand@redhat.com;
> arybchenko@solarflare.com
> Subject: Re: [PATCH v5 1/6] ethdev: introduce Rx buffer split
> 
[..snip..]
> 
> Can you please update deprecation notice too, to remove the notice?
> 
Yes, I missed the point, thank you for noticing.

> >   5 files changed, 155 insertions(+), 23 deletions(-)
> >
> > diff --git a/doc/guides/nics/features.rst b/doc/guides/nics/features.rst
> > index dd8c955..a45a9e8 100644
> > --- a/doc/guides/nics/features.rst
> > +* **[implements] rte_eth_dev_data**: ``buffer_split``.
> 
> What is implemented here?
> 
none, removed.

> > +* **[provides]   rte_eth_dev_info**: ``rx_offload_capa:RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT``.
> > +* **[provides]   eth_dev_ops**: ``rxq_info_get:buffer_split``.
> 
> Is this correct?
Yes, the dedicated rx_burst routine supporting buffer split might
be engaged by PMD and might be reported via rxq_info_get().

> > +		rx_seg = &seg_single;
> > +		n_seg = 1;
> 
> Why setting 'rx_seg' & 'n_seg'? Why not leaving them NULL and 0 when not
> used?
> This way PMD can do NULL/0 check and can know they are not used.
Refactored, single pool (legacy) and new extended config check are
separated into dedicated branches.
 

> > -	rte_ethdev_trace_rxq_setup(port_id, rx_queue_id, nb_rx_desc, mp,
> > -		rx_conf, ret);
> 
> Is this removed intentionally?
> 
Missed statement, reverted back.

[..snip..]
Comments and descriptions rearranged and updated according to the comments, v6 is coming.

With best regards, Slava


* [dpdk-dev] [PATCH v6 0/6] ethdev: introduce Rx buffer split
  2020-08-17 17:49 [dpdk-dev] [RFC] ethdev: introduce Rx buffer split Slava Ovsiienko
                   ` (5 preceding siblings ...)
  2020-10-13 19:21 ` [dpdk-dev] [PATCH v5 0/6] ethdev: introduce Rx buffer split Viacheslav Ovsiienko
@ 2020-10-14 18:11 ` Viacheslav Ovsiienko
  2020-10-14 18:11   ` [dpdk-dev] [PATCH v6 1/6] " Viacheslav Ovsiienko
                     ` (5 more replies)
  2020-10-15  0:55 ` [dpdk-dev] [PATCH v2] eal/rte_malloc: add alloc_size() attribute to allocation functions Stephen Hemminger
                   ` (6 subsequent siblings)
  13 siblings, 6 replies; 172+ messages in thread
From: Viacheslav Ovsiienko @ 2020-10-14 18:11 UTC (permalink / raw)
  To: dev
  Cc: thomas, stephen, ferruh.yigit, olivier.matz, jerinjacobk,
	maxime.coquelin, david.marchand, arybchenko

The DPDK datapath in the transmit direction is very flexible.
An application can build a multi-segment packet and manage
almost all data aspects: the memory pools the segments are
allocated from, the segment lengths, and the memory attributes
(external buffers, memory registered for DMA, etc.).

In the receive direction the datapath is much less flexible:
an application can only specify the memory pool used to configure
the receive queue and nothing more. To extend the receive datapath
capabilities, it is proposed to add a way to provide extended
information on how to split the packets being received.

The following structure is introduced to specify the Rx packet
segment:

struct rte_eth_rxseg {
    struct rte_mempool *mp; /* memory pool to allocate segment from */
    uint16_t length; /* segment maximal data length,
                        configures "split point" */
    uint16_t offset; /* data offset from beginning
                        of mbuf data buffer */
    uint32_t reserved; /* reserved field */
};

The segment descriptions are added to the rte_eth_rxconf structure:
   rx_seg - pointer to the array of segment descriptions; each element
            describes the memory pool, the maximal data length and the
            initial data offset from the beginning of the data buffer
            in the mbuf. This array allows specifying different
            settings for each segment individually.
   rx_nseg - number of elements in the array

If the extended segment descriptions are provided via these new
fields, the mp parameter of rte_eth_rx_queue_setup() must be
specified as NULL to avoid ambiguity.

There are two options to specify the Rx buffer configuration:
- mp is not NULL, rx_conf.rx_seg is NULL, rx_conf.rx_nseg is zero:
  this is the compatible configuration, it follows the existing
  implementation and provides a single pool with no description of
  segment sizes and offsets.
- mp is NULL, rx_conf.rx_seg is not NULL, rx_conf.rx_nseg is not
  zero: this is the extended configuration, provided individually
  for each segment.

The new offload flag RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT is introduced
in the device capabilities so that a PMD can report to the
application that it supports splitting Rx packets into configurable
segments. Prior to invoking the rte_eth_rx_queue_setup() routine
the application should check the RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT flag.

If the Rx queue is configured with the new settings, the PMD will
split the packets being received into multiple segments, pushed to
mbufs with the specified attributes, according to the specification
in the description array:

- the first network buffer will be allocated from the memory pool
  specified in the first segment description element, the second
  network buffer from the pool in the second segment description
  element, and so on. If there are not enough elements to describe
  the buffers for an entire packet of maximal length, the pool from
  the last valid element will be used to allocate the buffers for
  the rest of the segments.

- the offsets from the segment description elements specify the
  data offset from the beginning of the buffer, except for the first
  mbuf: for this one the offset is added to RTE_PKTMBUF_HEADROOM to
  get the actual offset from the beginning of the buffer. If there
  are not enough elements to describe the buffers for an entire
  packet of maximal length, the offsets for the rest of the segments
  are assumed to be zero.

- the length of data received into each segment is limited by the
  length specified in the segment description element. Receiving
  starts with filling up the first mbuf data buffer; if the
  specified maximal segment length is reached and there is data
  remaining (the packet is longer than the buffer in the first mbuf),
  the following data will be pushed to the next segment up to its
  own maximal length. If the first two segments are not enough to
  store all the remaining packet data, the next (third) segment will
  be engaged, and so on. If the length in the segment description
  element is zero, the actual buffer size will be deduced from
  the properties of the corresponding memory pool. If there are not
  enough elements to describe the buffers for an entire packet of
  maximal length, the buffer size for the remaining segments will
  be deduced from the pool of the last valid element.

For example, suppose the Rx queue is configured with the
following segments:
    seg0 - pool0, len0=14B, off0=2B
    seg1 - pool1, len1=20B, off1=128B
    seg2 - pool2, len2=20B, off2=0B
    seg3 - pool3, len3=512B, off3=0B

A packet 46 bytes long will look like the following:
    seg0 - 14B long @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from pool0
    seg1 - 20B long @ 128 in mbuf from pool1
    seg2 - 12B long @ 0 in mbuf from pool2

A packet 1500 bytes long will look like the following:
    seg0 - 14B @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from pool0
    seg1 - 20B @ 128 in mbuf from pool1
    seg2 - 20B @ 0 in mbuf from pool2
    seg3 - 512B @ 0 in mbuf from pool3
    seg4 - 512B @ 0 in mbuf from pool3
    seg5 - 422B @ 0 in mbuf from pool3
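
The two layouts above can be reproduced with a small stand-alone
model of the split rules. This is only a sketch, not PMD code: the
types seg_desc and seg_out and the function split_packet() are
hypothetical names invented for illustration, pools are reduced to
integer indices, and every description is assumed to carry a
non-zero length (the "deduce from pool" case is left out).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct rte_eth_rxseg (pool as an index). */
struct seg_desc {
	int pool;        /* pool the segment buffer is allocated from */
	uint16_t length; /* maximal data length, non-zero in this sketch */
	uint16_t offset; /* data offset within the mbuf data buffer */
};

/* One resulting segment of a received packet. */
struct seg_out {
	int pool;
	uint32_t len;
	uint16_t offset; /* for segment 0, added to RTE_PKTMBUF_HEADROOM */
};

/*
 * Split pkt_len bytes according to desc[0..nseg-1]. Once the array is
 * exhausted, the last element's pool and length are reused with a zero
 * offset. Returns the number of segments written (at most max_out).
 */
static size_t
split_packet(const struct seg_desc *desc, size_t nseg, uint32_t pkt_len,
	     struct seg_out *out, size_t max_out)
{
	size_t n = 0;

	assert(nseg > 0);
	while (pkt_len > 0 && n < max_out) {
		const struct seg_desc *d = &desc[n < nseg ? n : nseg - 1];
		uint32_t chunk = pkt_len < d->length ? pkt_len : d->length;

		assert(d->length > 0);
		out[n].pool = d->pool;
		out[n].len = chunk;
		out[n].offset = n < nseg ? d->offset : 0;
		pkt_len -= chunk;
		n++;
	}
	return n;
}
```

Feeding the four descriptions from the example and packet lengths of
46 and 1500 bytes into split_packet() gives three and six segments
respectively, matching the tables above.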

The RTE_ETH_RX_OFFLOAD_SCATTER offload must be present and
configured to support the new buffer split feature (if rx_nseg
is greater than one).

The new approach allows splitting the ingress packets into
multiple parts pushed to memory with different attributes.
For example, the packet headers can be pushed to the embedded
data buffers within mbufs and the application data into
external buffers attached to mbufs allocated from
different memory pools. The memory attributes for the split
parts may differ as well: for example, the application data
may be pushed into external memory located on a dedicated
physical device, say a GPU or NVMe. This improves the flexibility
of the DPDK receive datapath while preserving compatibility
with the existing API.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
v1: http://patches.dpdk.org/patch/79594/
v2: http://patches.dpdk.org/patch/79893/
    - add feature support to mlx5 PMD

v3: http://patches.dpdk.org/patch/80389/
    - rte_eth_rx_queue_setup_ex is renamed to rte_eth_rxseg_queue_setup
    - DEV_RX_OFFLOAD_BUFFER_SPLIT is renamed to RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT
    - commit message update
    - documentation provided
    - release notes update
    - minor bug fixes in testpmd related part

v4: http://patches.dpdk.org/patch/80401/
    - common part of rx_queue_setup/rxseg_queue_setup

v5: http://patches.dpdk.org/patch/80609/
    - refactored to approach of providing split configuration
      in the rte_eth_rxconf structure instead of introducing
      the new API routine
    - added support for rxoffs command to testpmd to
      provide segment offsets for complete testing of split
      configurations
    - patchset is split into two parts - PMD part will
      be presented as separate series

v6: - wordy comments rephrased
    - typos fixed
    - rte_eth_rx_queue_setup configuration check isolated
      for two main options
    - the rest of comments addressed


Viacheslav Ovsiienko (6):
  ethdev: introduce Rx buffer split
  app/testpmd: add multiple pools per core creation
  app/testpmd: add buffer split offload configuration
  app/testpmd: add rxpkts commands and parameters
  app/testpmd: add rxoffs commands and parameters
  app/testpmd: add extended Rx queue setup

 app/test-pmd/bpf_cmd.c                      |   4 +-
 app/test-pmd/cmdline.c                      | 151 ++++++++++++++++++++++++----
 app/test-pmd/config.c                       | 107 +++++++++++++++++++-
 app/test-pmd/parameters.c                   |  54 ++++++++--
 app/test-pmd/testpmd.c                      | 120 ++++++++++++++++------
 app/test-pmd/testpmd.h                      |  44 ++++++--
 doc/guides/nics/features.rst                |  15 +++
 doc/guides/rel_notes/deprecation.rst        |   5 -
 doc/guides/rel_notes/release_20_11.rst      |   9 ++
 doc/guides/testpmd_app_ug/run_app.rst       |  22 +++-
 doc/guides/testpmd_app_ug/testpmd_funcs.rst |  36 ++++++-
 lib/librte_ethdev/rte_ethdev.c              | 111 +++++++++++++++-----
 lib/librte_ethdev/rte_ethdev.h              |  62 +++++++++++-
 13 files changed, 637 insertions(+), 103 deletions(-)

-- 
1.8.3.1



* [dpdk-dev] [PATCH v6 1/6] ethdev: introduce Rx buffer split
  2020-10-14 18:11 ` [dpdk-dev] [PATCH v6 0/6] ethdev: introduce Rx buffer split Viacheslav Ovsiienko
@ 2020-10-14 18:11   ` " Viacheslav Ovsiienko
  2020-10-14 18:57     ` Jerin Jacob
                       ` (3 more replies)
  2020-10-14 18:11   ` [dpdk-dev] [PATCH v6 2/6] app/testpmd: add multiple pools per core creation Viacheslav Ovsiienko
                     ` (4 subsequent siblings)
  5 siblings, 4 replies; 172+ messages in thread
From: Viacheslav Ovsiienko @ 2020-10-14 18:11 UTC (permalink / raw)
  To: dev
  Cc: thomas, stephen, ferruh.yigit, olivier.matz, jerinjacobk,
	maxime.coquelin, david.marchand, arybchenko

The DPDK datapath in the transmit direction is very flexible.
An application can build a multi-segment packet and manage
almost all data aspects: the memory pools the segments are
allocated from, the segment lengths, and the memory attributes
(external buffers, memory registered for DMA, etc.).

In the receive direction the datapath is much less flexible:
an application can only specify the memory pool used to configure
the receive queue and nothing more. To extend the receive datapath
capabilities, it is proposed to add a way to provide extended
information on how to split the packets being received.

The following structure is introduced to specify the Rx packet
segment:

struct rte_eth_rxseg {
    struct rte_mempool *mp; /* memory pool to allocate segment from */
    uint16_t length; /* segment maximal data length,
                        configures "split point" */
    uint16_t offset; /* data offset from beginning
                        of mbuf data buffer */
    uint32_t reserved; /* reserved field */
};

The segment descriptions are added to the rte_eth_rxconf structure:
   rx_seg - pointer to the array of segment descriptions; each element
            describes the memory pool, the maximal data length and the
            initial data offset from the beginning of the data buffer
            in the mbuf. This array allows specifying different
            settings for each segment individually.
   rx_nseg - number of elements in the array

If the extended segment descriptions are provided via these new
fields, the mp parameter of rte_eth_rx_queue_setup() must be
specified as NULL to avoid ambiguity.

There are two options to specify the Rx buffer configuration:
- mp is not NULL, rx_conf.rx_seg is NULL, rx_conf.rx_nseg is zero:
  this is the compatible configuration, it follows the existing
  implementation and provides a single pool with no description of
  segment sizes and offsets.
- mp is NULL, rx_conf.rx_seg is not NULL, rx_conf.rx_nseg is not
  zero: this is the extended configuration, provided individually
  for each segment.
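
The two options can be told apart with a check along the following
lines. This is a minimal sketch that follows the branch structure of
the rte_ethdev.c change in this patch; the stub types and the
function classify_rx_buf_cfg() are hypothetical and stand in for the
real DPDK structures so that the snippet compiles on its own.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stubs, just enough of the real structures for the check. */
struct mempool_stub { int unused; };
struct rxseg_stub {
	struct mempool_stub *mp;
	uint16_t length;
	uint16_t offset;
};
struct rxconf_stub {
	uint16_t rx_nseg;
	struct rxseg_stub *rx_seg;
};

enum rx_buf_cfg { RX_BUF_SINGLE_POOL, RX_BUF_SPLIT };

/*
 * Return which of the two configurations is requested, or -EINVAL
 * if the combination of mp, rx_seg and rx_nseg is ambiguous or empty.
 */
static int
classify_rx_buf_cfg(const struct mempool_stub *mp,
		    const struct rxconf_stub *conf)
{
	assert(conf != NULL);
	if (mp != NULL) {
		/* Legacy single pool: extended fields must stay unset. */
		if (conf->rx_seg != NULL || conf->rx_nseg != 0)
			return -EINVAL;
		return RX_BUF_SINGLE_POOL;
	}
	/* No pool given: the extended description is mandatory. */
	if (conf->rx_seg == NULL || conf->rx_nseg == 0)
		return -EINVAL;
	return RX_BUF_SPLIT;
}
```

Passing both a pool and a segment array, or neither, is rejected,
which is what keeps the extended configuration unambiguous.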

The new offload flag RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT is introduced
in the device capabilities so that a PMD can report to the
application that it supports splitting Rx packets into configurable
segments. Prior to invoking the rte_eth_rx_queue_setup() routine
the application should check the RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT flag.

If the Rx queue is configured with the new settings, the PMD will
split the packets being received into multiple segments, pushed to
mbufs with the specified attributes, according to the specification
in the description array:

- the first network buffer will be allocated from the memory pool
  specified in the first segment description element, the second
  network buffer from the pool in the second segment description
  element, and so on. If there are not enough elements to describe
  the buffers for an entire packet of maximal length, the pool from
  the last valid element will be used to allocate the buffers for
  the rest of the segments.

- the offsets from the segment description elements specify the
  data offset from the beginning of the buffer, except for the first
  mbuf: for this one the offset is added to RTE_PKTMBUF_HEADROOM to
  get the actual offset from the beginning of the buffer. If there
  are not enough elements to describe the buffers for an entire
  packet of maximal length, the offsets for the rest of the segments
  are assumed to be zero.

- the length of data received into each segment is limited by the
  length specified in the segment description element. Receiving
  starts with filling up the first mbuf data buffer; if the
  specified maximal segment length is reached and there is data
  remaining (the packet is longer than the buffer in the first mbuf),
  the following data will be pushed to the next segment up to its
  own maximal length. If the first two segments are not enough to
  store all the remaining packet data, the next (third) segment will
  be engaged, and so on. If the length in the segment description
  element is zero, the actual buffer size will be deduced from
  the properties of the corresponding memory pool. If there are not
  enough elements to describe the buffers for an entire packet of
  maximal length, the buffer size for the remaining segments will
  be deduced from the pool of the last valid element.
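
The length and offset rules translate into a per-segment fit check
against the data room of the pool. The sketch below mirrors the
arithmetic used in the rte_ethdev.c hunk of this patch, but it is a
stand-alone illustration: check_seg_fits() is a hypothetical helper,
and HEADROOM is an assumed value standing in for
RTE_PKTMBUF_HEADROOM, which is a build-time setting in DPDK.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define HEADROOM 128u /* assumption: stands in for RTE_PKTMBUF_HEADROOM */

/*
 * Check that a segment description fits into a buffer of data_room
 * bytes. Only the first segment (seg_idx == 0) reserves the headroom.
 * A zero length means "use whatever the buffer can hold".
 */
static int
check_seg_fits(unsigned int seg_idx, uint32_t data_room,
	       uint32_t length, uint32_t offset)
{
	uint32_t head_room = seg_idx ? 0 : HEADROOM;

	assert(data_room >= head_room);
	if (length == 0)
		length = data_room - head_room;
	if (data_room < length + offset + head_room)
		return -EINVAL;
	return 0;
}
```

Note that with a zero length the deduced size already fills the
buffer, so any non-zero offset makes the description invalid.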

For example, suppose the Rx queue is configured with the
following segments:
    seg0 - pool0, len0=14B, off0=2B
    seg1 - pool1, len1=20B, off1=128B
    seg2 - pool2, len2=20B, off2=0B
    seg3 - pool3, len3=512B, off3=0B

A packet 46 bytes long will look like the following:
    seg0 - 14B long @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from pool0
    seg1 - 20B long @ 128 in mbuf from pool1
    seg2 - 12B long @ 0 in mbuf from pool2

A packet 1500 bytes long will look like the following:
    seg0 - 14B @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from pool0
    seg1 - 20B @ 128 in mbuf from pool1
    seg2 - 20B @ 0 in mbuf from pool2
    seg3 - 512B @ 0 in mbuf from pool3
    seg4 - 512B @ 0 in mbuf from pool3
    seg5 - 422B @ 0 in mbuf from pool3

The RTE_ETH_RX_OFFLOAD_SCATTER offload must be present and
configured to support the new buffer split feature (if rx_nseg
is greater than one).

The new approach allows splitting the ingress packets into
multiple parts pushed to memory with different attributes.
For example, the packet headers can be pushed to the embedded
data buffers within mbufs and the application data into
external buffers attached to mbufs allocated from
different memory pools. The memory attributes for the split
parts may differ as well: for example, the application data
may be pushed into external memory located on a dedicated
physical device, say a GPU or NVMe. This improves the flexibility
of the DPDK receive datapath while preserving compatibility
with the existing API.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
 doc/guides/nics/features.rst           |  15 +++++
 doc/guides/rel_notes/deprecation.rst   |   5 --
 doc/guides/rel_notes/release_20_11.rst |   9 +++
 lib/librte_ethdev/rte_ethdev.c         | 111 +++++++++++++++++++++++++--------
 lib/librte_ethdev/rte_ethdev.h         |  62 +++++++++++++++++-
 5 files changed, 171 insertions(+), 31 deletions(-)

diff --git a/doc/guides/nics/features.rst b/doc/guides/nics/features.rst
index dd8c955..832ea3b 100644
--- a/doc/guides/nics/features.rst
+++ b/doc/guides/nics/features.rst
@@ -185,6 +185,21 @@ Supports receiving segmented mbufs.
 * **[related]    eth_dev_ops**: ``rx_pkt_burst``.
 
 
+.. _nic_features_buffer_split:
+
+Buffer Split on Rx
+------------------
+
+Scatters the packets being received on specified boundaries to segmented mbufs.
+
+* **[uses]       rte_eth_rxconf,rte_eth_rxmode**: ``offloads:RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT``.
+* **[uses]       rte_eth_rxconf**: ``rx_conf.rx_seg, rx_conf.rx_nseg``.
+* **[implements] datapath**: ``Buffer Split functionality``.
+* **[provides]   rte_eth_dev_info**: ``rx_offload_capa:RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT``.
+* **[provides]   eth_dev_ops**: ``rxq_info_get:buffer_split``.
+* **[related] API**: ``rte_eth_rx_queue_setup()``.
+
+
 .. _nic_features_lro:
 
 LRO
diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 584e720..232cd54 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -138,11 +138,6 @@ Deprecation Notices
   In 19.11 PMDs will still update the field even when the offload is not
   enabled.
 
-* ethdev: Add new fields to ``rte_eth_rxconf`` to configure the receiving
-  queues to split ingress packets into multiple segments according to the
-  specified lengths into the buffers allocated from the specified
-  memory pools. The backward compatibility to existing API is preserved.
-
 * ethdev: ``rx_descriptor_done`` dev_ops and ``rte_eth_rx_descriptor_done``
   will be removed in 21.11.
   Existing ``rte_eth_rx_descriptor_status`` and ``rte_eth_tx_descriptor_status``
diff --git a/doc/guides/rel_notes/release_20_11.rst b/doc/guides/rel_notes/release_20_11.rst
index bcc0fc2..bcc2479 100644
--- a/doc/guides/rel_notes/release_20_11.rst
+++ b/doc/guides/rel_notes/release_20_11.rst
@@ -60,6 +60,12 @@ New Features
   Added the FEC API which provides functions for query FEC capabilities and
   current FEC mode from device. Also, API for configuring FEC mode is also provided.
 
+* **Introduced extended buffer description for receiving.**
+
+  Added the extended Rx buffer description for Rx queue setup routine
+  providing the individual settings for each Rx segment with maximal size,
+  buffer offset and memory pool to allocate data buffers from.
+
 * **Updated Broadcom bnxt driver.**
 
   Updated the Broadcom bnxt driver with new features and improvements, including:
@@ -253,6 +259,9 @@ API Changes
   As the data of ``uint8_t`` will be truncated when queue number under
   a TC is greater than 256.
 
+* ethdev: Added fields rx_seg and rx_nseg to rte_eth_rxconf structure
+  to provide extended description of the receiving buffer.
+
 * vhost: Moved vDPA APIs from experimental to stable.
 
 * rawdev: Added a structure size parameter to the functions
diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
index 892c246..96ecb91 100644
--- a/lib/librte_ethdev/rte_ethdev.c
+++ b/lib/librte_ethdev/rte_ethdev.c
@@ -105,6 +105,9 @@ struct rte_eth_xstats_name_off {
 #define RTE_RX_OFFLOAD_BIT2STR(_name)	\
 	{ DEV_RX_OFFLOAD_##_name, #_name }
 
+#define RTE_ETH_RX_OFFLOAD_BIT2STR(_name)	\
+	{ RTE_ETH_RX_OFFLOAD_##_name, #_name }
+
 static const struct {
 	uint64_t offload;
 	const char *name;
@@ -128,9 +131,11 @@ struct rte_eth_xstats_name_off {
 	RTE_RX_OFFLOAD_BIT2STR(SCTP_CKSUM),
 	RTE_RX_OFFLOAD_BIT2STR(OUTER_UDP_CKSUM),
 	RTE_RX_OFFLOAD_BIT2STR(RSS_HASH),
+	RTE_ETH_RX_OFFLOAD_BIT2STR(BUFFER_SPLIT),
 };
 
 #undef RTE_RX_OFFLOAD_BIT2STR
+#undef RTE_ETH_RX_OFFLOAD_BIT2STR
 
 #define RTE_TX_OFFLOAD_BIT2STR(_name)	\
 	{ DEV_TX_OFFLOAD_##_name, #_name }
@@ -1784,38 +1789,94 @@ struct rte_eth_dev *
 		return -EINVAL;
 	}
 
-	if (mp == NULL) {
-		RTE_ETHDEV_LOG(ERR, "Invalid null mempool pointer\n");
-		return -EINVAL;
-	}
-
 	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_setup, -ENOTSUP);
 
-	/*
-	 * Check the size of the mbuf data buffer.
-	 * This value must be provided in the private data of the memory pool.
-	 * First check that the memory pool has a valid private data.
-	 */
 	ret = rte_eth_dev_info_get(port_id, &dev_info);
 	if (ret != 0)
 		return ret;
 
-	if (mp->private_data_size < sizeof(struct rte_pktmbuf_pool_private)) {
-		RTE_ETHDEV_LOG(ERR, "%s private_data_size %d < %d\n",
-			mp->name, (int)mp->private_data_size,
-			(int)sizeof(struct rte_pktmbuf_pool_private));
-		return -ENOSPC;
-	}
-	mbp_buf_size = rte_pktmbuf_data_room_size(mp);
+	if (mp) {
+		/* Single pool configuration check. */
+		if (rx_conf->rx_seg || rx_conf->rx_nseg) {
+			RTE_ETHDEV_LOG(ERR,
+				       "Ambiguous segment configuration\n");
+			return -EINVAL;
+		}
+		/*
+		 * Check the size of the mbuf data buffer, this value
+		 * must be provided in the private data of the memory pool.
+		 * First check that the memory pool(s) has a valid private data.
+		 */
+		if (mp->private_data_size <
+				sizeof(struct rte_pktmbuf_pool_private)) {
+			RTE_ETHDEV_LOG(ERR, "%s private_data_size %u < %u\n",
+				mp->name, mp->private_data_size,
+				(unsigned int)
+				sizeof(struct rte_pktmbuf_pool_private));
+			return -ENOSPC;
+		}
+		mbp_buf_size = rte_pktmbuf_data_room_size(mp);
+		if (mbp_buf_size < dev_info.min_rx_bufsize +
+				   RTE_PKTMBUF_HEADROOM) {
+			RTE_ETHDEV_LOG(ERR,
+				       "%s mbuf_data_room_size %u < %u"
+				       " (RTE_PKTMBUF_HEADROOM=%u +"
+				       " min_rx_bufsize(dev)=%u)\n",
+				       mp->name, mbp_buf_size,
+				       RTE_PKTMBUF_HEADROOM +
+				       dev_info.min_rx_bufsize,
+				       RTE_PKTMBUF_HEADROOM,
+				       dev_info.min_rx_bufsize);
+			return -EINVAL;
+		}
+	} else {
+		const struct rte_eth_rxseg *rx_seg = rx_conf->rx_seg;
+		uint16_t n_seg = rx_conf->rx_nseg;
+		uint16_t seg_idx;
 
-	if (mbp_buf_size < dev_info.min_rx_bufsize + RTE_PKTMBUF_HEADROOM) {
-		RTE_ETHDEV_LOG(ERR,
-			"%s mbuf_data_room_size %d < %d (RTE_PKTMBUF_HEADROOM=%d + min_rx_bufsize(dev)=%d)\n",
-			mp->name, (int)mbp_buf_size,
-			(int)(RTE_PKTMBUF_HEADROOM + dev_info.min_rx_bufsize),
-			(int)RTE_PKTMBUF_HEADROOM,
-			(int)dev_info.min_rx_bufsize);
-		return -EINVAL;
+		/* Extended multi-segment configuration check. */
+		if (!rx_conf->rx_seg || !rx_conf->rx_nseg) {
+			RTE_ETHDEV_LOG(ERR,
+				       "Memory pool is null and no"
+				       " extended configuration provided\n");
+			return -EINVAL;
+		}
+		/*
+		 * Check the sizes and offsets against buffer sizes
+		 * for each segment specified in extended configuration.
+		 */
+		for (seg_idx = 0; seg_idx < n_seg; seg_idx++) {
+			struct rte_mempool *mpl = rx_seg[seg_idx].mp;
+			uint32_t length = rx_seg[seg_idx].length;
+			uint32_t offset = rx_seg[seg_idx].offset;
+			uint32_t head_room = seg_idx ? 0 : RTE_PKTMBUF_HEADROOM;
+
+			if (mpl == NULL) {
+				RTE_ETHDEV_LOG(ERR, "null mempool pointer\n");
+				return -EINVAL;
+			}
+			if (mpl->private_data_size <
+				sizeof(struct rte_pktmbuf_pool_private)) {
+				RTE_ETHDEV_LOG(ERR,
+					       "%s private_data_size %u < %u\n",
+					       mpl->name,
+					       mpl->private_data_size,
+					       (unsigned int)sizeof(struct
+					       rte_pktmbuf_pool_private));
+				return -ENOSPC;
+			}
+			mbp_buf_size = rte_pktmbuf_data_room_size(mpl);
+			length = length ? length : (mbp_buf_size - head_room);
+			if (mbp_buf_size < length + offset + head_room) {
+				RTE_ETHDEV_LOG(ERR,
+					"%s mbuf_data_room_size %u < %u"
+					" (segment length=%u +"
+					" segment offset=%u)\n",
+					mpl->name, mbp_buf_size,
+					length + offset, length, offset);
+				return -EINVAL;
+			}
+		}
 	}
 
 	/* Use default specified by driver, if nb_rx_desc is zero */
diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
index 5bcfbb8..e019f4a 100644
--- a/lib/librte_ethdev/rte_ethdev.h
+++ b/lib/librte_ethdev/rte_ethdev.h
@@ -970,6 +970,16 @@ struct rte_eth_txmode {
 };
 
 /**
+ * A structure used to configure an RX packet segment to split.
+ */
+struct rte_eth_rxseg {
+	struct rte_mempool *mp; /**< Memory pool to allocate segment from. */
+	uint16_t length; /**< Segment data length, configures split point. */
+	uint16_t offset; /**< Data offset from beginning of mbuf data buffer. */
+	uint32_t reserved; /**< Reserved field. */
+};
+
+/**
  * A structure used to configure an RX ring of an Ethernet port.
  */
 struct rte_eth_rxconf {
@@ -977,6 +987,43 @@ struct rte_eth_rxconf {
 	uint16_t rx_free_thresh; /**< Drives the freeing of RX descriptors. */
 	uint8_t rx_drop_en; /**< Drop packets if no descriptors are available. */
 	uint8_t rx_deferred_start; /**< Do not start queue with rte_eth_dev_start(). */
+	uint16_t rx_nseg; /**< Number of descriptions in rx_seg array. */
+	/**
+	 * Points to the array of segment descriptions. Each array element
+	 * describes the properties for each segment in the receiving
+	 * buffer.
+	 *
+	 * If RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT flag is set in offloads field,
+	 * the PMD will split the received packets into multiple segments
+	 * according to the specification in the description array:
+	 *
+	 * - the first network buffer will be allocated from the memory pool,
+	 *   specified in the first array element, the second buffer, from the
+	 *   pool in the second element, and so on.
+	 *
+	 * - the offsets from the segment description elements specify
+	 *   the data offset from the buffer beginning except the first mbuf.
+	 *   For this one the offset is added to RTE_PKTMBUF_HEADROOM.
+	 *
+	 * - the lengths in the elements define the maximal data amount
+	 *   being received to each segment. The receiving starts with filling
+	 *   up the first mbuf data buffer up to specified length. If
+	 *   there is data remaining (packet is longer than buffer in the first
+	 *   mbuf) the following data will be pushed to the next segment
+	 *   up to its own length, and so on.
+	 *
+	 * - If the length in the segment description element is zero
+	 *   the actual buffer size will be deduced from the appropriate
+	 *   memory pool properties.
+	 *
+	 * - if there are not enough elements to describe the buffer for entire
+	 *   packet of maximal length the following parameters will be used
+	 *   for all the remaining segments:
+	 *     - pool from the last valid element
+	 *     - the buffer size from this pool
+	 *     - zero offset
+	 */
+	struct rte_eth_rxseg *rx_seg;
 	/**
 	 * Per-queue Rx offloads to be set using DEV_RX_OFFLOAD_* flags.
 	 * Only offloads set on rx_queue_offload_capa or rx_offload_capa
@@ -1260,6 +1307,7 @@ struct rte_eth_conf {
 #define DEV_RX_OFFLOAD_SCTP_CKSUM	0x00020000
 #define DEV_RX_OFFLOAD_OUTER_UDP_CKSUM  0x00040000
 #define DEV_RX_OFFLOAD_RSS_HASH		0x00080000
+#define RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT 0x00100000
 
 #define DEV_RX_OFFLOAD_CHECKSUM (DEV_RX_OFFLOAD_IPV4_CKSUM | \
 				 DEV_RX_OFFLOAD_UDP_CKSUM | \
@@ -2027,9 +2075,21 @@ int rte_eth_dev_configure(uint16_t port_id, uint16_t nb_rx_queue,
  *   No need to repeat any bit in rx_conf->offloads which has already been
  *   enabled in rte_eth_dev_configure() at port level. An offloading enabled
  *   at port level can't be disabled at queue level.
+ *   The configuration structure also contains the pointer to the array
+ *   of the receiving buffer segment descriptions, see rx_seg and rx_nseg
+ *   fields, this extended configuration might be used by split offloads like
+ *   RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT. If mb_pool is not NULL,
+ *   the extended configuration fields must be set to NULL and zero.
  * @param mb_pool
  *   The pointer to the memory pool from which to allocate *rte_mbuf* network
- *   memory buffers to populate each descriptor of the receive ring.
+ *   memory buffers to populate each descriptor of the receive ring. There are
+ *   two options to provide Rx buffer configuration:
+ *   - single pool:
+ *     mb_pool is not NULL, rx_conf.rx_seg is NULL, rx_conf.rx_nseg is 0.
+ *   - multiple segments description:
+ *     mb_pool is NULL, rx_conf.rx_seg is not NULL, rx_conf.rx_nseg is not 0.
+ *     Taken only if flag RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT is set in offloads.
+ *
  * @return
  *   - 0: Success, receive queue correctly set up.
  *   - -EIO: if device is removed.
-- 
1.8.3.1



* [dpdk-dev] [PATCH v6 2/6] app/testpmd: add multiple pools per core creation
  2020-10-14 18:11 ` [dpdk-dev] [PATCH v6 0/6] ethdev: introduce Rx buffer split Viacheslav Ovsiienko
  2020-10-14 18:11   ` [dpdk-dev] [PATCH v6 1/6] " Viacheslav Ovsiienko
@ 2020-10-14 18:11   ` Viacheslav Ovsiienko
  2020-10-14 18:11   ` [dpdk-dev] [PATCH v6 3/6] app/testpmd: add buffer split offload configuration Viacheslav Ovsiienko
                     ` (3 subsequent siblings)
  5 siblings, 0 replies; 172+ messages in thread
From: Viacheslav Ovsiienko @ 2020-10-14 18:11 UTC (permalink / raw)
  To: dev
  Cc: thomas, stephen, ferruh.yigit, olivier.matz, jerinjacobk,
	maxime.coquelin, david.marchand, arybchenko

The command line parameter --mbuf-size is updated: it can now
handle multiple values, like the following:

--mbuf-size=2176,512,768,4096

specifying the creation of extra memory pools with the requested
mbuf data buffer sizes. If a buffer split feature is engaged,
the extra memory pools can be used to configure the Rx queues
with rte_eth_rx_queue_setup().

The extra pools are created with the requested sizes, and the pool
names are assigned with an appended index:
mbuf_pool_socket_%socket_%index. Index zero is used to specify the
first mandatory pool to maintain compatibility with existing code.
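
How a comma-separated --mbuf-size value maps to an array of sizes
can be sketched as below. This is not the testpmd implementation
(the patch relies on its existing parse_item_list() helper); the
function parse_mbuf_sizes() and the MAX_SIZES bound are hypothetical
and only illustrate the accepted format and the 1..65535 range check.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_SIZES 8 /* assumed bound; the patch uses MAX_SEGS_BUFFER_SPLIT */

/*
 * Parse "N[,N1[,..Nn]]" into sizes[]. Returns the number of values
 * parsed, or -1 if the string is malformed, a value is outside
 * 1..65535, or more than max values are given.
 */
static int
parse_mbuf_sizes(const char *arg, uint16_t *sizes, int max)
{
	int n = 0;

	assert(arg != NULL && sizes != NULL);
	while (n < max) {
		char *end;
		unsigned long v;

		errno = 0;
		v = strtoul(arg, &end, 10);
		if (end == arg || errno != 0 || v == 0 || v > 0xFFFF)
			return -1;
		sizes[n++] = (uint16_t)v;
		if (*end == '\0')
			return n; /* whole string consumed */
		if (*end != ',')
			return -1;
		arg = end + 1; /* continue after the comma */
	}
	return -1; /* too many values */
}
```

For the example above, "2176,512,768,4096" yields four sizes; the
first one describes the mandatory pool with index zero.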

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
 app/test-pmd/bpf_cmd.c                |  4 +--
 app/test-pmd/cmdline.c                |  2 +-
 app/test-pmd/config.c                 |  6 ++--
 app/test-pmd/parameters.c             | 24 +++++++++----
 app/test-pmd/testpmd.c                | 63 +++++++++++++++++++----------------
 app/test-pmd/testpmd.h                | 24 ++++++++++---
 doc/guides/testpmd_app_ug/run_app.rst |  7 ++--
 7 files changed, 83 insertions(+), 47 deletions(-)

diff --git a/app/test-pmd/bpf_cmd.c b/app/test-pmd/bpf_cmd.c
index 16e3c3b..0a1a178 100644
--- a/app/test-pmd/bpf_cmd.c
+++ b/app/test-pmd/bpf_cmd.c
@@ -69,7 +69,7 @@ struct cmd_bpf_ld_result {
 
 	*flags = RTE_BPF_ETH_F_NONE;
 	arg->type = RTE_BPF_ARG_PTR;
-	arg->size = mbuf_data_size;
+	arg->size = mbuf_data_size[0];
 
 	for (i = 0; str[i] != 0; i++) {
 		v = toupper(str[i]);
@@ -78,7 +78,7 @@ struct cmd_bpf_ld_result {
 		else if (v == 'M') {
 			arg->type = RTE_BPF_ARG_PTR_MBUF;
 			arg->size = sizeof(struct rte_mbuf);
-			arg->buf_size = mbuf_data_size;
+			arg->buf_size = mbuf_data_size[0];
 		} else if (v == '-')
 			continue;
 		else
diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 273fb1a..a585cf0 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -2907,7 +2907,7 @@ struct cmd_setup_rxtx_queue {
 		if (!numa_support || socket_id == NUMA_NO_CONFIG)
 			socket_id = port->socket_id;
 
-		mp = mbuf_pool_find(socket_id);
+		mp = mbuf_pool_find(socket_id, 0);
 		if (mp == NULL) {
 			printf("Failed to setup RX queue: "
 				"No mempool allocation"
diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index d4be694..5f501f6 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -690,7 +690,7 @@ static int bus_match_all(const struct rte_bus *bus, const void *data)
 	printf("\nConnect to socket: %u", port->socket_id);
 
 	if (port_numa[port_id] != NUMA_NO_CONFIG) {
-		mp = mbuf_pool_find(port_numa[port_id]);
+		mp = mbuf_pool_find(port_numa[port_id], 0);
 		if (mp)
 			printf("\nmemory allocation on the socket: %d",
 							port_numa[port_id]);
@@ -3352,9 +3352,9 @@ struct igb_ring_desc_16_bytes {
 	 */
 	tx_pkt_len = 0;
 	for (i = 0; i < nb_segs; i++) {
-		if (seg_lengths[i] > (unsigned) mbuf_data_size) {
+		if (seg_lengths[i] > mbuf_data_size[0]) {
 			printf("length[%u]=%u > mbuf_data_size=%u - give up\n",
-			       i, seg_lengths[i], (unsigned) mbuf_data_size);
+			       i, seg_lengths[i], mbuf_data_size[0]);
 			return;
 		}
 		tx_pkt_len = (uint16_t)(tx_pkt_len + seg_lengths[i]);
diff --git a/app/test-pmd/parameters.c b/app/test-pmd/parameters.c
index 15ce8c1..4db4987 100644
--- a/app/test-pmd/parameters.c
+++ b/app/test-pmd/parameters.c
@@ -106,7 +106,9 @@
 	       "(flag: 1 for RX; 2 for TX; 3 for RX and TX).\n");
 	printf("  --socket-num=N: set socket from which all memory is allocated "
 	       "in NUMA mode.\n");
-	printf("  --mbuf-size=N: set the data size of mbuf to N bytes.\n");
+	printf("  --mbuf-size=N[,N1[,..Nn]]: set the data size of mbuf to "
+	       "N bytes. If multiple numbers are specified the extra pools "
+	       "will be created to receive with packet split features\n");
 	printf("  --total-num-mbufs=N: set the number of mbufs to be allocated "
 	       "in mbuf pools.\n");
 	printf("  --max-pkt-len=N: set the maximum size of packet to N bytes.\n");
@@ -892,12 +894,22 @@
 				}
 			}
 			if (!strcmp(lgopts[opt_idx].name, "mbuf-size")) {
-				n = atoi(optarg);
-				if (n > 0 && n <= 0xFFFF)
-					mbuf_data_size = (uint16_t) n;
-				else
+				unsigned int mb_sz[MAX_SEGS_BUFFER_SPLIT];
+				unsigned int nb_segs, i;
+
+				nb_segs = parse_item_list(optarg, "mbuf-size",
+					MAX_SEGS_BUFFER_SPLIT, mb_sz, 0);
+				if (nb_segs == 0)
 					rte_exit(EXIT_FAILURE,
-						 "mbuf-size should be > 0 and < 65536\n");
+						 "bad mbuf-size\n");
+				for (i = 0; i < nb_segs; i++) {
+					if (mb_sz[i] <= 0 || mb_sz[i] > 0xFFFF)
+						rte_exit(EXIT_FAILURE,
+							 "mbuf-size should be "
+							 "> 0 and < 65536\n");
+					mbuf_data_size[i] = (uint16_t) mb_sz[i];
+				}
+				mbuf_data_size_n = nb_segs;
 			}
 			if (!strcmp(lgopts[opt_idx].name, "total-num-mbufs")) {
 				n = atoi(optarg);
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index ccba71c..7e6ef80 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -186,7 +186,7 @@ struct fwd_engine * fwd_engines[] = {
 	NULL,
 };
 
-struct rte_mempool *mempools[RTE_MAX_NUMA_NODES];
+struct rte_mempool *mempools[RTE_MAX_NUMA_NODES * MAX_SEGS_BUFFER_SPLIT];
 uint16_t mempool_flags;
 
 struct fwd_config cur_fwd_config;
@@ -195,7 +195,10 @@ struct fwd_engine * fwd_engines[] = {
 uint32_t burst_tx_delay_time = BURST_TX_WAIT_US;
 uint32_t burst_tx_retry_num = BURST_TX_RETRIES;
 
-uint16_t mbuf_data_size = DEFAULT_MBUF_DATA_SIZE; /**< Mbuf data space size. */
+uint32_t mbuf_data_size_n = 1; /* Number of specified mbuf sizes. */
+uint16_t mbuf_data_size[MAX_SEGS_BUFFER_SPLIT] = {
+	DEFAULT_MBUF_DATA_SIZE
+}; /**< Mbuf data space size. */
 uint32_t param_total_num_mbufs = 0;  /**< number of mbufs in all pools - if
                                       * specified on command-line. */
 uint16_t stats_period; /**< Period to show statistics (disabled by default) */
@@ -955,14 +958,14 @@ struct extmem_param {
  */
 static struct rte_mempool *
 mbuf_pool_create(uint16_t mbuf_seg_size, unsigned nb_mbuf,
-		 unsigned int socket_id)
+		 unsigned int socket_id, unsigned int size_idx)
 {
 	char pool_name[RTE_MEMPOOL_NAMESIZE];
 	struct rte_mempool *rte_mp = NULL;
 	uint32_t mb_size;
 
 	mb_size = sizeof(struct rte_mbuf) + mbuf_seg_size;
-	mbuf_poolname_build(socket_id, pool_name, sizeof(pool_name));
+	mbuf_poolname_build(socket_id, pool_name, sizeof(pool_name), size_idx);
 
 	TESTPMD_LOG(INFO,
 		"create a new mbuf pool <%s>: n=%u, size=%u, socket=%u\n",
@@ -1485,8 +1488,8 @@ struct extmem_param {
 				port->dev_info.rx_desc_lim.nb_mtu_seg_max;
 
 			if ((data_size + RTE_PKTMBUF_HEADROOM) >
-							mbuf_data_size) {
-				mbuf_data_size = data_size +
+							mbuf_data_size[0]) {
+				mbuf_data_size[0] = data_size +
 						 RTE_PKTMBUF_HEADROOM;
 				warning = 1;
 			}
@@ -1494,9 +1497,9 @@ struct extmem_param {
 	}
 
 	if (warning)
-		TESTPMD_LOG(WARNING, "Configured mbuf size %hu\n",
-			    mbuf_data_size);
-
+		TESTPMD_LOG(WARNING,
+			    "Configured mbuf size of the first segment %hu\n",
+			    mbuf_data_size[0]);
 	/*
 	 * Create pools of mbuf.
 	 * If NUMA support is disabled, create a single pool of mbuf in
@@ -1516,21 +1519,23 @@ struct extmem_param {
 	}
 
 	if (numa_support) {
-		uint8_t i;
+		uint8_t i, j;
 
 		for (i = 0; i < num_sockets; i++)
-			mempools[i] = mbuf_pool_create(mbuf_data_size,
-						       nb_mbuf_per_pool,
-						       socket_ids[i]);
+			for (j = 0; j < mbuf_data_size_n; j++)
+				mempools[i * MAX_SEGS_BUFFER_SPLIT + j] =
+					mbuf_pool_create(mbuf_data_size[j],
+							  nb_mbuf_per_pool,
+							  socket_ids[i], j);
 	} else {
-		if (socket_num == UMA_NO_CONFIG)
-			mempools[0] = mbuf_pool_create(mbuf_data_size,
-						       nb_mbuf_per_pool, 0);
-		else
-			mempools[socket_num] = mbuf_pool_create
-							(mbuf_data_size,
-							 nb_mbuf_per_pool,
-							 socket_num);
+		uint8_t i;
+
+		for (i = 0; i < mbuf_data_size_n; i++)
+			mempools[i] = mbuf_pool_create
+					(mbuf_data_size[i],
+					 nb_mbuf_per_pool,
+					 socket_num == UMA_NO_CONFIG ?
+					 0 : socket_num, i);
 	}
 
 	init_port_config();
@@ -1542,10 +1547,10 @@ struct extmem_param {
 	 */
 	for (lc_id = 0; lc_id < nb_lcores; lc_id++) {
 		mbp = mbuf_pool_find(
-			rte_lcore_to_socket_id(fwd_lcores_cpuids[lc_id]));
+			rte_lcore_to_socket_id(fwd_lcores_cpuids[lc_id]), 0);
 
 		if (mbp == NULL)
-			mbp = mbuf_pool_find(0);
+			mbp = mbuf_pool_find(0, 0);
 		fwd_lcores[lc_id]->mbp = mbp;
 		/* initialize GSO context */
 		fwd_lcores[lc_id]->gso_ctx.direct_pool = mbp;
@@ -2498,7 +2503,8 @@ struct extmem_param {
 				if ((numa_support) &&
 					(rxring_numa[pi] != NUMA_NO_CONFIG)) {
 					struct rte_mempool * mp =
-						mbuf_pool_find(rxring_numa[pi]);
+						mbuf_pool_find
+							(rxring_numa[pi], 0);
 					if (mp == NULL) {
 						printf("Failed to setup RX queue:"
 							"No mempool allocation"
@@ -2514,7 +2520,8 @@ struct extmem_param {
 					     mp);
 				} else {
 					struct rte_mempool *mp =
-						mbuf_pool_find(port->socket_id);
+						mbuf_pool_find
+							(port->socket_id, 0);
 					if (mp == NULL) {
 						printf("Failed to setup RX queue:"
 							"No mempool allocation"
@@ -2909,13 +2916,13 @@ struct extmem_param {
 pmd_test_exit(void)
 {
 	portid_t pt_id;
+	unsigned int i;
 	int ret;
-	int i;
 
 	if (test_done == 0)
 		stop_packet_forwarding();
 
-	for (i = 0 ; i < RTE_MAX_NUMA_NODES ; i++) {
+	for (i = 0 ; i < RTE_DIM(mempools) ; i++) {
 		if (mempools[i]) {
 			if (mp_alloc_type == MP_ALLOC_ANON)
 				rte_mempool_mem_iter(mempools[i], dma_unmap_cb,
@@ -2959,7 +2966,7 @@ struct extmem_param {
 			return;
 		}
 	}
-	for (i = 0 ; i < RTE_MAX_NUMA_NODES ; i++) {
+	for (i = 0 ; i < RTE_DIM(mempools) ; i++) {
 		if (mempools[i])
 			rte_mempool_free(mempools[i]);
 	}
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index 9a29d7a..b42d710 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -42,6 +42,13 @@
  */
 #define RTE_MAX_SEGS_PER_PKT 255 /**< nb_segs is a 8-bit unsigned char. */
 
+/*
+ * The maximum number of segments per packet used to configure the
+ * buffer split feature; it also limits the number of optional Rx
+ * pools that can be created to allocate mbufs for split segments.
+ */
+#define MAX_SEGS_BUFFER_SPLIT 8 /**< Maximum number of split segments. */
+
 #define MAX_PKT_BURST 512
 #define DEF_PKT_BURST 32
 
@@ -393,7 +400,9 @@ struct queue_stats_mappings {
 extern uint8_t dcb_config;
 extern uint8_t dcb_test;
 
-extern uint16_t mbuf_data_size; /**< Mbuf data space size. */
+extern uint32_t mbuf_data_size_n;
+extern uint16_t mbuf_data_size[MAX_SEGS_BUFFER_SPLIT];
+/**< Mbuf data space size. */
 extern uint32_t param_total_num_mbufs;
 
 extern uint16_t stats_period;
@@ -605,17 +614,22 @@ struct mplsoudp_decap_conf {
 
 /* Mbuf Pools */
 static inline void
-mbuf_poolname_build(unsigned int sock_id, char* mp_name, int name_size)
+mbuf_poolname_build(unsigned int sock_id, char *mp_name,
+		    int name_size, unsigned int idx)
 {
-	snprintf(mp_name, name_size, "mbuf_pool_socket_%u", sock_id);
+	if (!idx)
+		snprintf(mp_name, name_size, "mbuf_pool_socket_%u", sock_id);
+	else
+		snprintf(mp_name, name_size, "mbuf_pool_socket_%u_%u",
+			 sock_id, idx);
 }
 
 static inline struct rte_mempool *
-mbuf_pool_find(unsigned int sock_id)
+mbuf_pool_find(unsigned int sock_id, unsigned int idx)
 {
 	char pool_name[RTE_MEMPOOL_NAMESIZE];
 
-	mbuf_poolname_build(sock_id, pool_name, sizeof(pool_name));
+	mbuf_poolname_build(sock_id, pool_name, sizeof(pool_name), idx);
 	return rte_mempool_lookup((const char *)pool_name);
 }
 
diff --git a/doc/guides/testpmd_app_ug/run_app.rst b/doc/guides/testpmd_app_ug/run_app.rst
index ec085c2..1eb0a10 100644
--- a/doc/guides/testpmd_app_ug/run_app.rst
+++ b/doc/guides/testpmd_app_ug/run_app.rst
@@ -107,9 +107,12 @@ The command line options are:
     Set the socket from which all memory is allocated in NUMA mode,
     where 0 <= N < number of sockets on the board.
 
-*   ``--mbuf-size=N``
+*   ``--mbuf-size=N[,N1[,...Nn]]``
 
-    Set the data size of the mbufs used to N bytes, where N < 65536. The default value is 2048.
+    Set the data size of the mbufs used to N bytes, where N < 65536.
+    The default value is 2048. If multiple mbuf-size values are specified, the
+    extra memory pools will be created for allocating mbufs to receive packets
+    with the buffer splitting feature.
 
 *   ``--total-num-mbufs=N``
 
-- 
1.8.3.1



* [dpdk-dev] [PATCH v6 3/6] app/testpmd: add buffer split offload configuration
  2020-10-14 18:11 ` [dpdk-dev] [PATCH v6 0/6] ethdev: introduce Rx buffer split Viacheslav Ovsiienko
  2020-10-14 18:11   ` [dpdk-dev] [PATCH v6 1/6] " Viacheslav Ovsiienko
  2020-10-14 18:11   ` [dpdk-dev] [PATCH v6 2/6] app/testpmd: add multiple pools per core creation Viacheslav Ovsiienko
@ 2020-10-14 18:11   ` Viacheslav Ovsiienko
  2020-10-14 18:12   ` [dpdk-dev] [PATCH v6 4/6] app/testpmd: add rxpkts commands and parameters Viacheslav Ovsiienko
                     ` (2 subsequent siblings)
  5 siblings, 0 replies; 172+ messages in thread
From: Viacheslav Ovsiienko @ 2020-10-14 18:11 UTC (permalink / raw)
  To: dev
  Cc: thomas, stephen, ferruh.yigit, olivier.matz, jerinjacobk,
	maxime.coquelin, david.marchand, arybchenko

This patch adds support for RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT,
providing per-queue configuration for this offload.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
 app/test-pmd/cmdline.c | 21 +++++++++++----------
 app/test-pmd/config.c  |  9 +++++++++
 2 files changed, 20 insertions(+), 10 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index a585cf0..fa71039 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -883,16 +883,16 @@ static void cmd_help_long_parsed(void *parsed_result,
 			"port config <port_id> rx_offload vlan_strip|"
 			"ipv4_cksum|udp_cksum|tcp_cksum|tcp_lro|qinq_strip|"
 			"outer_ipv4_cksum|macsec_strip|header_split|"
-			"vlan_filter|vlan_extend|jumbo_frame|"
-			"scatter|timestamp|security|keep_crc on|off\n"
+			"vlan_filter|vlan_extend|jumbo_frame|scatter|"
+			"buffer_split|timestamp|security|keep_crc on|off\n"
 			"     Enable or disable a per port Rx offloading"
 			" on all Rx queues of a port\n\n"
 
 			"port (port_id) rxq (queue_id) rx_offload vlan_strip|"
 			"ipv4_cksum|udp_cksum|tcp_cksum|tcp_lro|qinq_strip|"
 			"outer_ipv4_cksum|macsec_strip|header_split|"
-			"vlan_filter|vlan_extend|jumbo_frame|"
-			"scatter|timestamp|security|keep_crc on|off\n"
+			"vlan_filter|vlan_extend|jumbo_frame|scatter|"
+			"buffer_split|timestamp|security|keep_crc on|off\n"
 			"    Enable or disable a per queue Rx offloading"
 			" only on a specific Rx queue\n\n"
 
@@ -18417,7 +18417,8 @@ struct cmd_config_per_port_rx_offload_result {
 		 offload, "vlan_strip#ipv4_cksum#udp_cksum#tcp_cksum#tcp_lro#"
 			   "qinq_strip#outer_ipv4_cksum#macsec_strip#"
 			   "header_split#vlan_filter#vlan_extend#jumbo_frame#"
-			   "scatter#timestamp#security#keep_crc#rss_hash");
+			   "scatter#buffer_split#timestamp#security#"
+			   "keep_crc#rss_hash");
 cmdline_parse_token_string_t cmd_config_per_port_rx_offload_result_on_off =
 	TOKEN_STRING_INITIALIZER
 		(struct cmd_config_per_port_rx_offload_result,
@@ -18497,8 +18498,8 @@ struct cmd_config_per_port_rx_offload_result {
 	.help_str = "port config <port_id> rx_offload vlan_strip|ipv4_cksum|"
 		    "udp_cksum|tcp_cksum|tcp_lro|qinq_strip|outer_ipv4_cksum|"
 		    "macsec_strip|header_split|vlan_filter|vlan_extend|"
-		    "jumbo_frame|scatter|timestamp|security|keep_crc|rss_hash "
-		    "on|off",
+		    "jumbo_frame|scatter|buffer_split|timestamp|security|"
+		    "keep_crc|rss_hash on|off",
 	.tokens = {
 		(void *)&cmd_config_per_port_rx_offload_result_port,
 		(void *)&cmd_config_per_port_rx_offload_result_config,
@@ -18547,7 +18548,7 @@ struct cmd_config_per_queue_rx_offload_result {
 		 offload, "vlan_strip#ipv4_cksum#udp_cksum#tcp_cksum#tcp_lro#"
 			   "qinq_strip#outer_ipv4_cksum#macsec_strip#"
 			   "header_split#vlan_filter#vlan_extend#jumbo_frame#"
-			   "scatter#timestamp#security#keep_crc");
+			   "scatter#buffer_split#timestamp#security#keep_crc");
 cmdline_parse_token_string_t cmd_config_per_queue_rx_offload_result_on_off =
 	TOKEN_STRING_INITIALIZER
 		(struct cmd_config_per_queue_rx_offload_result,
@@ -18603,8 +18604,8 @@ struct cmd_config_per_queue_rx_offload_result {
 		    "vlan_strip|ipv4_cksum|"
 		    "udp_cksum|tcp_cksum|tcp_lro|qinq_strip|outer_ipv4_cksum|"
 		    "macsec_strip|header_split|vlan_filter|vlan_extend|"
-		    "jumbo_frame|scatter|timestamp|security|keep_crc "
-		    "on|off",
+		    "jumbo_frame|scatter|buffer_split|timestamp|security|"
+		    "keep_crc on|off",
 	.tokens = {
 		(void *)&cmd_config_per_queue_rx_offload_result_port,
 		(void *)&cmd_config_per_queue_rx_offload_result_port_id,
diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index 5f501f6..7126d91 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -1092,6 +1092,15 @@ static int bus_match_all(const struct rte_bus *bus, const void *data)
 			printf("off\n");
 	}
 
+	if (dev_info.rx_offload_capa & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) {
+		printf("RX offload buffer split:       ");
+		if (ports[port_id].dev_conf.rxmode.offloads &
+		    RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT)
+			printf("on\n");
+		else
+			printf("off\n");
+	}
+
 	if (dev_info.tx_offload_capa & DEV_TX_OFFLOAD_VLAN_INSERT) {
 		printf("VLAN insert:                   ");
 		if (ports[port_id].dev_conf.txmode.offloads &
-- 
1.8.3.1



* [dpdk-dev] [PATCH v6 4/6] app/testpmd: add rxpkts commands and parameters
  2020-10-14 18:11 ` [dpdk-dev] [PATCH v6 0/6] ethdev: introduce Rx buffer split Viacheslav Ovsiienko
                     ` (2 preceding siblings ...)
  2020-10-14 18:11   ` [dpdk-dev] [PATCH v6 3/6] app/testpmd: add buffer split offload configuration Viacheslav Ovsiienko
@ 2020-10-14 18:12   ` Viacheslav Ovsiienko
  2020-10-14 18:12   ` [dpdk-dev] [PATCH v6 5/6] app/testpmd: add rxoffs " Viacheslav Ovsiienko
  2020-10-14 18:12   ` [dpdk-dev] [PATCH v6 6/6] app/testpmd: add extended Rx queue setup Viacheslav Ovsiienko
  5 siblings, 0 replies; 172+ messages in thread
From: Viacheslav Ovsiienko @ 2020-10-14 18:12 UTC (permalink / raw)
  To: dev
  Cc: thomas, stephen, ferruh.yigit, olivier.matz, jerinjacobk,
	maxime.coquelin, david.marchand, arybchenko

Add command line parameter:

--rxpkts=X[,Y]

Sets the lengths of the segments into which received packets are
scattered when the split feature is engaged. Affects only the queues
configured with split offloads (currently only BUFFER_SPLIT is supported).

Add interactive mode command:

testpmd> set rxpkts (x[,y]*)

Where x[,y]* represents a CSV list of values, without white space.

Sets the lengths of the segments into which received packets are
scattered when the split feature is engaged. Affects only the queues
configured with split offloads (currently only BUFFER_SPLIT is supported).
Optionally, multiple memory pools can be specified with the --mbuf-size
command line parameter; the mbufs for the received segments are then
allocated sequentially from these extra memory pools.
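
The sequential pool selection can be sketched as follows. This is only
an illustration based on the documentation text in this patch (segment i
takes pool i, and segments beyond the last pool reuse the last valid
pool). The helper name rx_seg_pool_index() is hypothetical and is not
part of testpmd.

```c
#include <assert.h>

/*
 * Hypothetical helper, not part of testpmd: choose the index of the
 * memory pool used for a given Rx segment. Segment i takes pool i;
 * when there are more segments than pools, the remaining segments
 * fall back to the last pool.
 */
unsigned int
rx_seg_pool_index(unsigned int seg_idx, unsigned int nb_pools)
{
	assert(nb_pools > 0); /* at least one pool must be configured */
	return seg_idx < nb_pools ? seg_idx : nb_pools - 1;
}
```

For example, with two pools created by --mbuf-size and three segments
set by --rxpkts, the third segment would be taken from the second pool.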

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
 app/test-pmd/cmdline.c                      | 61 +++++++++++++++++++++++++++--
 app/test-pmd/config.c                       | 48 ++++++++++++++++++++++-
 app/test-pmd/parameters.c                   | 15 +++++++
 app/test-pmd/testpmd.c                      |  7 ++++
 app/test-pmd/testpmd.h                      | 11 +++++-
 doc/guides/testpmd_app_ug/run_app.rst       |  9 +++++
 doc/guides/testpmd_app_ug/testpmd_funcs.rst | 21 +++++++++-
 7 files changed, 165 insertions(+), 7 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index fa71039..d8dba54 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -183,7 +183,7 @@ static void cmd_help_long_parsed(void *parsed_result,
 			"show (rxq|txq) info (port_id) (queue_id)\n"
 			"    Display information for configured RX/TX queue.\n\n"
 
-			"show config (rxtx|cores|fwd|txpkts)\n"
+			"show config (rxtx|cores|fwd|rxpkts|txpkts)\n"
 			"    Display the given configuration.\n\n"
 
 			"read rxd (port_id) (queue_id) (rxd_id)\n"
@@ -294,6 +294,12 @@ static void cmd_help_long_parsed(void *parsed_result,
 			"    Set the transmit delay time and number of retries,"
 			" effective when retry is enabled.\n\n"
 
+			"set rxpkts (x[,y]*)\n"
+			"    Set the length of each segment to scatter"
+			" packets on receiving if split feature is engaged."
+			" Affects only the queues configured with split"
+			" offloads.\n\n"
+
 			"set txpkts (x[,y]*)\n"
 			"    Set the length of each segment of TXONLY"
 			" and optionally CSUM packets.\n\n"
@@ -3889,6 +3895,52 @@ struct cmd_set_log_result {
 	},
 };
 
+/* *** SET SEGMENT LENGTHS OF RX PACKETS SPLIT *** */
+
+struct cmd_set_rxpkts_result {
+	cmdline_fixed_string_t cmd_keyword;
+	cmdline_fixed_string_t rxpkts;
+	cmdline_fixed_string_t seg_lengths;
+};
+
+static void
+cmd_set_rxpkts_parsed(void *parsed_result,
+		      __rte_unused struct cmdline *cl,
+		      __rte_unused void *data)
+{
+	struct cmd_set_rxpkts_result *res;
+	unsigned int seg_lengths[MAX_SEGS_BUFFER_SPLIT];
+	unsigned int nb_segs;
+
+	res = parsed_result;
+	nb_segs = parse_item_list(res->seg_lengths, "segment lengths",
+				  MAX_SEGS_BUFFER_SPLIT, seg_lengths, 0);
+	if (nb_segs > 0)
+		set_rx_pkt_segments(seg_lengths, nb_segs);
+}
+
+cmdline_parse_token_string_t cmd_set_rxpkts_keyword =
+	TOKEN_STRING_INITIALIZER(struct cmd_set_rxpkts_result,
+				 cmd_keyword, "set");
+cmdline_parse_token_string_t cmd_set_rxpkts_name =
+	TOKEN_STRING_INITIALIZER(struct cmd_set_rxpkts_result,
+				 rxpkts, "rxpkts");
+cmdline_parse_token_string_t cmd_set_rxpkts_lengths =
+	TOKEN_STRING_INITIALIZER(struct cmd_set_rxpkts_result,
+				 seg_lengths, NULL);
+
+cmdline_parse_inst_t cmd_set_rxpkts = {
+	.f = cmd_set_rxpkts_parsed,
+	.data = NULL,
+	.help_str = "set rxpkts <len0[,len1]*>",
+	.tokens = {
+		(void *)&cmd_set_rxpkts_keyword,
+		(void *)&cmd_set_rxpkts_name,
+		(void *)&cmd_set_rxpkts_lengths,
+		NULL,
+	},
+};
+
 /* *** SET SEGMENT LENGTHS OF TXONLY PACKETS *** */
 
 struct cmd_set_txpkts_result {
@@ -7517,6 +7569,8 @@ static void cmd_showcfg_parsed(void *parsed_result,
 		fwd_lcores_config_display();
 	else if (!strcmp(res->what, "fwd"))
 		pkt_fwd_config_display(&cur_fwd_config);
+	else if (!strcmp(res->what, "rxpkts"))
+		show_rx_pkt_segments();
 	else if (!strcmp(res->what, "txpkts"))
 		show_tx_pkt_segments();
 	else if (!strcmp(res->what, "txtimes"))
@@ -7529,12 +7583,12 @@ static void cmd_showcfg_parsed(void *parsed_result,
 	TOKEN_STRING_INITIALIZER(struct cmd_showcfg_result, cfg, "config");
 cmdline_parse_token_string_t cmd_showcfg_what =
 	TOKEN_STRING_INITIALIZER(struct cmd_showcfg_result, what,
-				 "rxtx#cores#fwd#txpkts#txtimes");
+				 "rxtx#cores#fwd#rxpkts#txpkts#txtimes");
 
 cmdline_parse_inst_t cmd_showcfg = {
 	.f = cmd_showcfg_parsed,
 	.data = NULL,
-	.help_str = "show config rxtx|cores|fwd|txpkts|txtimes",
+	.help_str = "show config rxtx|cores|fwd|rxpkts|txpkts|txtimes",
 	.tokens = {
 		(void *)&cmd_showcfg_show,
 		(void *)&cmd_showcfg_port,
@@ -19807,6 +19861,7 @@ struct cmd_showport_macs_result {
 	(cmdline_parse_inst_t *)&cmd_reset,
 	(cmdline_parse_inst_t *)&cmd_set_numbers,
 	(cmdline_parse_inst_t *)&cmd_set_log,
+	(cmdline_parse_inst_t *)&cmd_set_rxpkts,
 	(cmdline_parse_inst_t *)&cmd_set_txpkts,
 	(cmdline_parse_inst_t *)&cmd_set_txsplit,
 	(cmdline_parse_inst_t *)&cmd_set_txtimes,
diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index 7126d91..24e9a7e 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -3300,6 +3300,50 @@ struct igb_ring_desc_16_bytes {
 }
 
 void
+show_rx_pkt_segments(void)
+{
+	uint32_t i, n;
+
+	n = rx_pkt_nb_segs;
+	printf("Number of segments: %u\n", n);
+	if (n) {
+		printf("Segment sizes: ");
+		for (i = 0; i != n - 1; i++)
+			printf("%hu,", rx_pkt_seg_lengths[i]);
+		printf("%hu\n", rx_pkt_seg_lengths[i]);
+	}
+}
+
+void
+set_rx_pkt_segments(unsigned int *seg_lengths, unsigned int nb_segs)
+{
+	unsigned int i;
+
+	if (nb_segs >= MAX_SEGS_BUFFER_SPLIT) {
+		printf("nb segments per RX packets=%u >= "
+		       "MAX_SEGS_BUFFER_SPLIT - ignored\n", nb_segs);
+		return;
+	}
+
+	/*
+	 * Only the uint16_t range is checked here; the segment lengths
+	 * are validated by the PMD in the extended queue setup.
+	 */
+	for (i = 0; i < nb_segs; i++) {
+		if (seg_lengths[i] > UINT16_MAX) {
+			printf("length[%u]=%u > UINT16_MAX - give up\n",
+			       i, seg_lengths[i]);
+			return;
+		}
+	}
+
+	for (i = 0; i < nb_segs; i++)
+		rx_pkt_seg_lengths[i] = (uint16_t) seg_lengths[i];
+
+	rx_pkt_nb_segs = (uint8_t) nb_segs;
+}
+
+void
 show_tx_pkt_segments(void)
 {
 	uint32_t i, n;
@@ -3344,10 +3388,10 @@ struct igb_ring_desc_16_bytes {
 }
 
 void
-set_tx_pkt_segments(unsigned *seg_lengths, unsigned nb_segs)
+set_tx_pkt_segments(unsigned int *seg_lengths, unsigned int nb_segs)
 {
 	uint16_t tx_pkt_len;
-	unsigned i;
+	unsigned int i;
 
 	if (nb_segs_is_invalid(nb_segs))
 		return;
diff --git a/app/test-pmd/parameters.c b/app/test-pmd/parameters.c
index 4db4987..e4e3635 100644
--- a/app/test-pmd/parameters.c
+++ b/app/test-pmd/parameters.c
@@ -184,6 +184,7 @@
 	       "(0 <= mapping <= %d).\n", RTE_ETHDEV_QUEUE_STAT_CNTRS - 1);
 	printf("  --no-flush-rx: Don't flush RX streams before forwarding."
 	       " Used mainly with PCAP drivers.\n");
+	printf("  --rxpkts=X[,Y]*: set RX segment sizes to split.\n");
 	printf("  --txpkts=X[,Y]*: set TX segment sizes"
 		" or total packet length.\n");
 	printf("  --txonly-multi-flow: generate multiple flows in txonly mode\n");
@@ -662,6 +663,7 @@
 		{ "rx-queue-stats-mapping",	1, 0, 0 },
 		{ "no-flush-rx",	0, 0, 0 },
 		{ "flow-isolate-all",	        0, 0, 0 },
+		{ "rxpkts",			1, 0, 0 },
 		{ "txpkts",			1, 0, 0 },
 		{ "txonly-multi-flow",		0, 0, 0 },
 		{ "disable-link-check",		0, 0, 0 },
@@ -1272,6 +1274,19 @@
 						 "invalid RX queue statistics mapping config entered\n");
 				}
 			}
+			if (!strcmp(lgopts[opt_idx].name, "rxpkts")) {
+				unsigned int seg_len[MAX_SEGS_BUFFER_SPLIT];
+				unsigned int nb_segs;
+
+				nb_segs = parse_item_list
+						(optarg, "rxpkt segments",
+						 MAX_SEGS_BUFFER_SPLIT,
+						 seg_len, 0);
+				if (nb_segs > 0)
+					set_rx_pkt_segments(seg_len, nb_segs);
+				else
+					rte_exit(EXIT_FAILURE, "bad rxpkts\n");
+			}
 			if (!strcmp(lgopts[opt_idx].name, "txpkts")) {
 				unsigned seg_lengths[RTE_MAX_SEGS_PER_PKT];
 				unsigned int nb_segs;
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 7e6ef80..f88c1e2 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -210,6 +210,13 @@ struct fwd_engine * fwd_engines[] = {
 uint8_t f_quit;
 
 /*
+ * Configuration of packet segments used to scatter received packets
+ * if any of the split features is configured.
+ */
+uint16_t rx_pkt_seg_lengths[MAX_SEGS_BUFFER_SPLIT];
+uint8_t  rx_pkt_nb_segs; /**< Number of segments to split */
+
+/*
  * Configuration of packet segments used by the "txonly" processing engine.
  */
 uint16_t tx_pkt_length = TXONLY_DEF_PACKET_LEN; /**< TXONLY packet length. */
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index b42d710..8e5ba6a 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -420,6 +420,13 @@ struct queue_stats_mappings {
 extern struct rte_fdir_conf fdir_conf;
 
 /*
+ * Configuration of packet segments used to scatter received packets
+ * if any of the split features is configured.
+ */
+extern uint16_t rx_pkt_seg_lengths[MAX_SEGS_BUFFER_SPLIT];
+extern uint8_t  rx_pkt_nb_segs; /**< Number of segments to split */
+
+/*
  * Configuration of packet segments used by the "txonly" processing engine.
  */
 #define TXONLY_DEF_PACKET_LEN 64
@@ -816,7 +823,9 @@ void vlan_tpid_set(portid_t port_id, enum rte_vlan_type vlan_type,
 void set_record_core_cycles(uint8_t on_off);
 void set_record_burst_stats(uint8_t on_off);
 void set_verbose_level(uint16_t vb_level);
-void set_tx_pkt_segments(unsigned *seg_lengths, unsigned nb_segs);
+void set_rx_pkt_segments(unsigned int *seg_lengths, unsigned int nb_segs);
+void show_rx_pkt_segments(void);
+void set_tx_pkt_segments(unsigned int *seg_lengths, unsigned int nb_segs);
 void show_tx_pkt_segments(void);
 void set_tx_pkt_times(unsigned int *tx_times);
 void show_tx_pkt_times(void);
diff --git a/doc/guides/testpmd_app_ug/run_app.rst b/doc/guides/testpmd_app_ug/run_app.rst
index 1eb0a10..463b76c 100644
--- a/doc/guides/testpmd_app_ug/run_app.rst
+++ b/doc/guides/testpmd_app_ug/run_app.rst
@@ -361,6 +361,15 @@ The command line options are:
 
     Don't flush the RX streams before starting forwarding. Used mainly with the PCAP PMD.
 
+*   ``--rxpkts=X[,Y]``
+
+    Set the lengths of the segments into which received packets are
+    scattered if the split feature is engaged. Affects only the queues
+    configured with split offloads (currently only BUFFER_SPLIT is supported).
+    Optionally, multiple memory pools can be specified with the ``--mbuf-size``
+    command line parameter; the mbufs for the received segments are then
+    allocated sequentially from these extra memory pools.
+
 *   ``--txpkts=X[,Y]``
 
     Set TX segment sizes or total packet length. Valid for ``tx-only``
diff --git a/doc/guides/testpmd_app_ug/testpmd_funcs.rst b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
index 795c739..ff88762 100644
--- a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
+++ b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
@@ -273,7 +273,7 @@ show config
 Displays the configuration of the application.
 The configuration comes from the command-line, the runtime or the application defaults::
 
-   testpmd> show config (rxtx|cores|fwd|txpkts|txtimes)
+   testpmd> show config (rxtx|cores|fwd|rxpkts|txpkts|txtimes)
 
 The available information categories are:
 
@@ -283,6 +283,8 @@ The available information categories are:
 
 * ``fwd``: Packet forwarding configuration.
 
+* ``rxpkts``: Rx packet split segment lengths configuration.
+
 * ``txpkts``: Packets to TX configuration.
 
 * ``txtimes``: Burst time pattern for Tx only mode.
@@ -774,6 +776,23 @@ When retry is enabled, the transmit delay time and number of retries can also be
 
    testpmd> set burst tx delay (microseconds) retry (num)
 
+set rxpkts
+~~~~~~~~~~
+
+Set the lengths of the segments into which received packets are scattered
+if the split feature is engaged. Affects only the queues configured with
+split offloads (currently only BUFFER_SPLIT is supported). Optionally,
+multiple memory pools can be specified with the ``--mbuf-size`` command line
+parameter; the mbufs for the received segments are then allocated
+sequentially from these pools: the mbuf for the first segment comes from the
+first pool, the second from the second pool, and so on. If there are more
+segments than pools, the remaining mbufs come from the last valid pool::
+
+   testpmd> set rxpkts (x[,y]*)
+
+Where x[,y]* represents a CSV list of values, without white space. A zero
+value means that the corresponding memory pool data buffer size is used.
+
 set txpkts
 ~~~~~~~~~~
 
-- 
1.8.3.1



* [dpdk-dev] [PATCH v6 5/6] app/testpmd: add rxoffs commands and parameters
  2020-10-14 18:11 ` [dpdk-dev] [PATCH v6 0/6] ethdev: introduce Rx buffer split Viacheslav Ovsiienko
                     ` (3 preceding siblings ...)
  2020-10-14 18:12   ` [dpdk-dev] [PATCH v6 4/6] app/testpmd: add rxpkts commands and parameters Viacheslav Ovsiienko
@ 2020-10-14 18:12   ` " Viacheslav Ovsiienko
  2020-10-14 18:12   ` [dpdk-dev] [PATCH v6 6/6] app/testpmd: add extended Rx queue setup Viacheslav Ovsiienko
  5 siblings, 0 replies; 172+ messages in thread
From: Viacheslav Ovsiienko @ 2020-10-14 18:12 UTC (permalink / raw)
  To: dev
  Cc: thomas, stephen, ferruh.yigit, olivier.matz, jerinjacobk,
	maxime.coquelin, david.marchand, arybchenko

Add command line parameter:

--rxoffs=X[,Y]

Sets the offsets of packet segments from the beginning of the
receiving buffer if the split feature is engaged. Affects only the
queues configured with split offloads (currently only BUFFER_SPLIT
is supported).

Add interactive mode command, providing the same:

testpmd> set rxoffs (x[,y]*)

Where x[,y]* represents a CSV list of values, without white space.
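
As an illustration of the accepted list format, below is a minimal,
self-contained sketch of a parser for such a CSV list. It is not the
testpmd implementation (testpmd uses its own parse_item_list() routine);
the function name parse_u16_csv() and its return convention are
assumptions made for this example only.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical parser, not testpmd's parse_item_list(): read a list
 * in the "x[,y]*" form into vals[]. Returns the number of values
 * parsed, or -1 on malformed input, on a value above UINT16_MAX, or
 * when more than max_vals items are given.
 */
int
parse_u16_csv(const char *str, uint16_t *vals, unsigned int max_vals)
{
	unsigned int n = 0;

	assert(str != NULL && vals != NULL);
	for (;;) {
		char *end;
		unsigned long v;

		/* Reject empty items, signs and white space up front. */
		if (*str < '0' || *str > '9')
			return -1;
		errno = 0;
		v = strtoul(str, &end, 10);
		if (errno != 0 || v > UINT16_MAX || n >= max_vals)
			return -1;
		vals[n++] = (uint16_t)v;
		if (*end == '\0')
			return (int)n;
		if (*end != ',')
			return -1;
		str = end + 1; /* skip the comma, parse the next item */
	}
}
```

With this sketch, "64,128" yields two values, while "64, 128" is
rejected because of the embedded white space.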

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
 app/test-pmd/cmdline.c                      | 61 +++++++++++++++++++++++++++--
 app/test-pmd/config.c                       | 44 +++++++++++++++++++++
 app/test-pmd/parameters.c                   | 15 +++++++
 app/test-pmd/testpmd.c                      |  2 +
 app/test-pmd/testpmd.h                      |  4 ++
 doc/guides/testpmd_app_ug/run_app.rst       |  6 +++
 doc/guides/testpmd_app_ug/testpmd_funcs.rst | 17 +++++++-
 7 files changed, 145 insertions(+), 4 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index d8dba54..7182bba 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -183,7 +183,7 @@ static void cmd_help_long_parsed(void *parsed_result,
 			"show (rxq|txq) info (port_id) (queue_id)\n"
 			"    Display information for configured RX/TX queue.\n\n"
 
-			"show config (rxtx|cores|fwd|rxpkts|txpkts)\n"
+			"show config (rxtx|cores|fwd|rxoffs|rxpkts|txpkts)\n"
 			"    Display the given configuration.\n\n"
 
 			"read rxd (port_id) (queue_id) (rxd_id)\n"
@@ -294,6 +294,12 @@ static void cmd_help_long_parsed(void *parsed_result,
 			"    Set the transmit delay time and number of retries,"
 			" effective when retry is enabled.\n\n"
 
+			"set rxoffs (x[,y]*)\n"
+			"    Set the offset of each packet segment on"
+			" receiving if split feature is engaged."
+			" Affects only the queues configured with split"
+			" offloads.\n\n"
+
 			"set rxpkts (x[,y]*)\n"
 			"    Set the length of each segment to scatter"
 			" packets on receiving if split feature is engaged."
@@ -3895,6 +3901,52 @@ struct cmd_set_log_result {
 	},
 };
 
+/* *** SET SEGMENT OFFSETS OF RX PACKETS SPLIT *** */
+
+struct cmd_set_rxoffs_result {
+	cmdline_fixed_string_t cmd_keyword;
+	cmdline_fixed_string_t rxoffs;
+	cmdline_fixed_string_t seg_offsets;
+};
+
+static void
+cmd_set_rxoffs_parsed(void *parsed_result,
+		      __rte_unused struct cmdline *cl,
+		      __rte_unused void *data)
+{
+	struct cmd_set_rxoffs_result *res;
+	unsigned int seg_offsets[MAX_SEGS_BUFFER_SPLIT];
+	unsigned int nb_segs;
+
+	res = parsed_result;
+	nb_segs = parse_item_list(res->seg_offsets, "segment offsets",
+				  MAX_SEGS_BUFFER_SPLIT, seg_offsets, 0);
+	if (nb_segs > 0)
+		set_rx_pkt_offsets(seg_offsets, nb_segs);
+}
+
+cmdline_parse_token_string_t cmd_set_rxoffs_keyword =
+	TOKEN_STRING_INITIALIZER(struct cmd_set_rxoffs_result,
+				 cmd_keyword, "set");
+cmdline_parse_token_string_t cmd_set_rxoffs_name =
+	TOKEN_STRING_INITIALIZER(struct cmd_set_rxoffs_result,
+				 rxoffs, "rxoffs");
+cmdline_parse_token_string_t cmd_set_rxoffs_offsets =
+	TOKEN_STRING_INITIALIZER(struct cmd_set_rxoffs_result,
+				 seg_offsets, NULL);
+
+cmdline_parse_inst_t cmd_set_rxoffs = {
+	.f = cmd_set_rxoffs_parsed,
+	.data = NULL,
+	.help_str = "set rxoffs <off0[,off1]*>",
+	.tokens = {
+		(void *)&cmd_set_rxoffs_keyword,
+		(void *)&cmd_set_rxoffs_name,
+		(void *)&cmd_set_rxoffs_offsets,
+		NULL,
+	},
+};
+
 /* *** SET SEGMENT LENGTHS OF RX PACKETS SPLIT *** */
 
 struct cmd_set_rxpkts_result {
@@ -7569,6 +7621,8 @@ static void cmd_showcfg_parsed(void *parsed_result,
 		fwd_lcores_config_display();
 	else if (!strcmp(res->what, "fwd"))
 		pkt_fwd_config_display(&cur_fwd_config);
+	else if (!strcmp(res->what, "rxoffs"))
+		show_rx_pkt_offsets();
 	else if (!strcmp(res->what, "rxpkts"))
 		show_rx_pkt_segments();
 	else if (!strcmp(res->what, "txpkts"))
@@ -7583,12 +7637,12 @@ static void cmd_showcfg_parsed(void *parsed_result,
 	TOKEN_STRING_INITIALIZER(struct cmd_showcfg_result, cfg, "config");
 cmdline_parse_token_string_t cmd_showcfg_what =
 	TOKEN_STRING_INITIALIZER(struct cmd_showcfg_result, what,
-				 "rxtx#cores#fwd#rxpkts#txpkts#txtimes");
+				 "rxtx#cores#fwd#rxoffs#rxpkts#txpkts#txtimes");
 
 cmdline_parse_inst_t cmd_showcfg = {
 	.f = cmd_showcfg_parsed,
 	.data = NULL,
-	.help_str = "show config rxtx|cores|fwd|rxpkts|txpkts|txtimes",
+	.help_str = "show config rxtx|cores|fwd|rxoffs|rxpkts|txpkts|txtimes",
 	.tokens = {
 		(void *)&cmd_showcfg_show,
 		(void *)&cmd_showcfg_port,
@@ -19861,6 +19915,7 @@ struct cmd_showport_macs_result {
 	(cmdline_parse_inst_t *)&cmd_reset,
 	(cmdline_parse_inst_t *)&cmd_set_numbers,
 	(cmdline_parse_inst_t *)&cmd_set_log,
+	(cmdline_parse_inst_t *)&cmd_set_rxoffs,
 	(cmdline_parse_inst_t *)&cmd_set_rxpkts,
 	(cmdline_parse_inst_t *)&cmd_set_txpkts,
 	(cmdline_parse_inst_t *)&cmd_set_txsplit,
diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index 24e9a7e..43b8fb6 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -3300,6 +3300,50 @@ struct igb_ring_desc_16_bytes {
 }
 
 void
+show_rx_pkt_offsets(void)
+{
+	uint32_t i, n;
+
+	n = rx_pkt_nb_offs;
+	printf("Number of offsets: %u\n", n);
+	if (n) {
+		printf("Segment offsets: ");
+		for (i = 0; i != n - 1; i++)
+			printf("%hu,", rx_pkt_seg_offsets[i]);
+		printf("%hu\n", rx_pkt_seg_offsets[i]);
+	}
+}
+
+void
+set_rx_pkt_offsets(unsigned int *seg_offsets, unsigned int nb_offs)
+{
+	unsigned int i;
+
+	if (nb_offs >= MAX_SEGS_BUFFER_SPLIT) {
+		printf("nb segments per RX packets=%u >= "
+		       "MAX_SEGS_BUFFER_SPLIT - ignored\n", nb_offs);
+		return;
+	}
+
+	/*
+	 * No extra check here, the offsets will be checked by the PMD
+	 * in the extended queue setup.
+	 */
+	for (i = 0; i < nb_offs; i++) {
+		if (seg_offsets[i] >= UINT16_MAX) {
+			printf("offset[%u]=%u >= UINT16_MAX - give up\n",
+			       i, seg_offsets[i]);
+			return;
+		}
+	}
+
+	for (i = 0; i < nb_offs; i++)
+		rx_pkt_seg_offsets[i] = (uint16_t) seg_offsets[i];
+
+	rx_pkt_nb_offs = (uint8_t) nb_offs;
+}
+
+void
 show_rx_pkt_segments(void)
 {
 	uint32_t i, n;
diff --git a/app/test-pmd/parameters.c b/app/test-pmd/parameters.c
index e4e3635..2298ba5 100644
--- a/app/test-pmd/parameters.c
+++ b/app/test-pmd/parameters.c
@@ -184,6 +184,7 @@
 	       "(0 <= mapping <= %d).\n", RTE_ETHDEV_QUEUE_STAT_CNTRS - 1);
 	printf("  --no-flush-rx: Don't flush RX streams before forwarding."
 	       " Used mainly with PCAP drivers.\n");
+	printf("  --rxoffs=X[,Y]*: set RX segment offsets for split.\n");
 	printf("  --rxpkts=X[,Y]*: set RX segment sizes to split.\n");
 	printf("  --txpkts=X[,Y]*: set TX segment sizes"
 		" or total packet length.\n");
@@ -663,6 +664,7 @@
 		{ "rx-queue-stats-mapping",	1, 0, 0 },
 		{ "no-flush-rx",	0, 0, 0 },
 		{ "flow-isolate-all",	        0, 0, 0 },
+		{ "rxoffs",			1, 0, 0 },
 		{ "rxpkts",			1, 0, 0 },
 		{ "txpkts",			1, 0, 0 },
 		{ "txonly-multi-flow",		0, 0, 0 },
@@ -1274,6 +1276,19 @@
 						 "invalid RX queue statistics mapping config entered\n");
 				}
 			}
+			if (!strcmp(lgopts[opt_idx].name, "rxoffs")) {
+				unsigned int seg_off[MAX_SEGS_BUFFER_SPLIT];
+				unsigned int nb_offs;
+
+				nb_offs = parse_item_list
+						(optarg, "rxpkt offsets",
+						 MAX_SEGS_BUFFER_SPLIT,
+						 seg_off, 0);
+				if (nb_offs > 0)
+					set_rx_pkt_offsets(seg_off, nb_offs);
+				else
+					rte_exit(EXIT_FAILURE, "bad rxoffs\n");
+			}
 			if (!strcmp(lgopts[opt_idx].name, "rxpkts")) {
 				unsigned int seg_len[MAX_SEGS_BUFFER_SPLIT];
 				unsigned int nb_segs;
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index f88c1e2..580178d 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -215,6 +215,8 @@ struct fwd_engine * fwd_engines[] = {
  */
 uint16_t rx_pkt_seg_lengths[MAX_SEGS_BUFFER_SPLIT];
 uint8_t  rx_pkt_nb_segs; /**< Number of segments to split */
+uint16_t rx_pkt_seg_offsets[MAX_SEGS_BUFFER_SPLIT];
+uint8_t  rx_pkt_nb_offs; /**< Number of specified offsets */
 
 /*
  * Configuration of packet segments used by the "txonly" processing engine.
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index 8e5ba6a..fc56b60 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -425,6 +425,8 @@ struct queue_stats_mappings {
  */
 extern uint16_t rx_pkt_seg_lengths[MAX_SEGS_BUFFER_SPLIT];
 extern uint8_t  rx_pkt_nb_segs; /**< Number of segments to split */
+extern uint16_t rx_pkt_seg_offsets[MAX_SEGS_BUFFER_SPLIT];
+extern uint8_t  rx_pkt_nb_offs; /**< Number of specified offsets */
 
 /*
  * Configuration of packet segments used by the "txonly" processing engine.
@@ -825,6 +827,8 @@ void vlan_tpid_set(portid_t port_id, enum rte_vlan_type vlan_type,
 void set_verbose_level(uint16_t vb_level);
 void set_rx_pkt_segments(unsigned int *seg_lengths, unsigned int nb_segs);
 void show_rx_pkt_segments(void);
+void set_rx_pkt_offsets(unsigned int *seg_offsets, unsigned int nb_offs);
+void show_rx_pkt_offsets(void);
 void set_tx_pkt_segments(unsigned int *seg_lengths, unsigned int nb_segs);
 void show_tx_pkt_segments(void);
 void set_tx_pkt_times(unsigned int *tx_times);
diff --git a/doc/guides/testpmd_app_ug/run_app.rst b/doc/guides/testpmd_app_ug/run_app.rst
index 463b76c..9b0a84a 100644
--- a/doc/guides/testpmd_app_ug/run_app.rst
+++ b/doc/guides/testpmd_app_ug/run_app.rst
@@ -361,6 +361,12 @@ The command line options are:
 
     Don't flush the RX streams before starting forwarding. Used mainly with the PCAP PMD.
 
+*   ``--rxoffs=X[,Y]``
+
+    Set the offsets of packet segments on receive if the split
+    feature is engaged. Affects only the queues configured
+    with split offloads (currently only BUFFER_SPLIT is supported).
+
 *   ``--rxpkts=X[,Y]``
 
     Set the length of segments to scatter packets on receiving if split
diff --git a/doc/guides/testpmd_app_ug/testpmd_funcs.rst b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
index ff88762..c99d887 100644
--- a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
+++ b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
@@ -273,7 +273,7 @@ show config
 Displays the configuration of the application.
 The configuration comes from the command-line, the runtime or the application defaults::
 
-   testpmd> show config (rxtx|cores|fwd|rxpkts|txpkts|txtimes)
+   testpmd> show config (rxtx|cores|fwd|rxoffs|rxpkts|txpkts|txtimes)
 
 The available information categories are:
 
@@ -283,6 +283,8 @@ The available information categories are:
 
 * ``fwd``: Packet forwarding configuration.
 
+* ``rxoffs``: Packet offsets for RX split.
+
 * ``rxpkts``: Packets to RX split configuration.
 
 * ``txpkts``: Packets to TX configuration.
@@ -776,6 +778,19 @@ When retry is enabled, the transmit delay time and number of retries can also be
 
    testpmd> set burst tx delay (microseconds) retry (num)
 
+set rxoffs
+~~~~~~~~~~
+
+Set the offsets of segments relative to the beginning of the data buffer on
+receive if the split feature is engaged. Affects only the queues configured
+with split offloads (currently only BUFFER_SPLIT is supported)::
+
+   testpmd> set rxoffs (x[,y]*)
+
+Where x[,y]* represents a CSV list of values, without white space. If the list
+of offsets is shorter than the list of segments, zero offsets will be used
+for the remaining segments.
+
 set rxpkts
 ~~~~~~~~~~
 
-- 
1.8.3.1
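
For readers following the rxoffs handling above, here is a minimal,
self-contained sketch of the same idea: validate the whole offset list
first, store it all-or-nothing, and fall back to a zero offset for any
segment without an explicit entry. All sketch_* names and SKETCH_MAX_SEGS
are hypothetical stand-ins and not testpmd symbols, and the exact bounds
(more than SKETCH_MAX_SEGS entries, values above UINT16_MAX) are an
assumption of the sketch rather than a copy of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_SEGS 8 /* hypothetical stand-in for MAX_SEGS_BUFFER_SPLIT */

/* Hypothetical container for the configured per-segment offsets. */
struct sketch_rx_offsets {
	uint16_t offs[SKETCH_MAX_SEGS];
	uint8_t nb_offs;
};

/* Store the offsets all-or-nothing; return 0 on success, -1 if rejected. */
static int
sketch_set_offsets(struct sketch_rx_offsets *cfg,
		   const unsigned int *in, unsigned int nb)
{
	unsigned int i;

	assert(cfg != NULL && in != NULL);
	if (nb > SKETCH_MAX_SEGS)
		return -1;
	/* Validate every entry first so a bad one leaves cfg untouched. */
	for (i = 0; i < nb; i++)
		if (in[i] > UINT16_MAX)
			return -1;
	for (i = 0; i < nb; i++)
		cfg->offs[i] = (uint16_t)in[i];
	cfg->nb_offs = (uint8_t)nb;
	return 0;
}

/* Offset applied to segment 'seg'; segments without an entry get zero. */
static uint16_t
sketch_seg_offset(const struct sketch_rx_offsets *cfg, unsigned int seg)
{
	return seg < cfg->nb_offs ? cfg->offs[seg] : 0;
}
```

With the list {2, 128} stored, sketch_seg_offset() yields 2 and 128 for
the first two segments and 0 for every later one, which matches the
documented behaviour when the offset list is shorter than the segment list.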



* [dpdk-dev] [PATCH v6 6/6] app/testpmd: add extended Rx queue setup
  2020-10-14 18:11 ` [dpdk-dev] [PATCH v6 0/6] ethdev: introduce Rx buffer split Viacheslav Ovsiienko
                     ` (4 preceding siblings ...)
  2020-10-14 18:12   ` [dpdk-dev] [PATCH v6 5/6] app/testpmd: add rxoffs " Viacheslav Ovsiienko
@ 2020-10-14 18:12   ` Viacheslav Ovsiienko
  5 siblings, 0 replies; 172+ messages in thread
From: Viacheslav Ovsiienko @ 2020-10-14 18:12 UTC (permalink / raw)
  To: dev
  Cc: thomas, stephen, ferruh.yigit, olivier.matz, jerinjacobk,
	maxime.coquelin, david.marchand, arybchenko

If an Rx queue is configured with the split feature, the extended
setup with the specified segment sizes and pools is performed.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
 app/test-pmd/cmdline.c | 12 ++++++------
 app/test-pmd/testpmd.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++--
 app/test-pmd/testpmd.h |  5 +++++
 3 files changed, 57 insertions(+), 8 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 7182bba..204221f 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -2927,12 +2927,12 @@ struct cmd_setup_rxtx_queue {
 				rxring_numa[res->portid]);
 			return;
 		}
-		ret = rte_eth_rx_queue_setup(res->portid,
-					     res->qid,
-					     port->nb_rx_desc[res->qid],
-					     socket_id,
-					     &port->rx_conf[res->qid],
-					     mp);
+		ret = rx_queue_setup(res->portid,
+				     res->qid,
+				     port->nb_rx_desc[res->qid],
+				     socket_id,
+				     &port->rx_conf[res->qid],
+				     mp);
 		if (ret)
 			printf("Failed to setup RX queue\n");
 	} else {
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 580178d..4c79570 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -2414,6 +2414,50 @@ struct extmem_param {
 	return 0;
 }
 
+/* Configure the Rx with optional split. */
+int
+rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
+	       uint16_t nb_rx_desc, unsigned int socket_id,
+	       struct rte_eth_rxconf *rx_conf, struct rte_mempool *mp)
+{
+	struct rte_eth_rxseg rx_seg[MAX_SEGS_BUFFER_SPLIT] = {};
+	unsigned int i, mp_n;
+	int ret;
+
+	if (rx_pkt_nb_segs <= 1 ||
+	    (rx_conf->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) == 0) {
+		rx_conf->rx_seg = NULL;
+		rx_conf->rx_nseg = 0;
+		ret = rte_eth_rx_queue_setup(port_id, rx_queue_id,
+					     nb_rx_desc, socket_id,
+					     rx_conf, mp);
+		return ret;
+	}
+	for (i = 0; i < rx_pkt_nb_segs; i++) {
+		struct rte_mempool *mpx;
+		/*
+		 * Use last valid pool for the segments with number
+		 * exceeding the pool index.
+		 */
+		mp_n = (i >= mbuf_data_size_n) ? mbuf_data_size_n - 1 : i;
+		mpx = mbuf_pool_find(socket_id, mp_n);
+		/* Handle zero as mbuf data buffer size. */
+		rx_seg[i].length = rx_pkt_seg_lengths[i] ?
+				   rx_pkt_seg_lengths[i] :
+				   mbuf_data_size[mp_n];
+		rx_seg[i].offset = i < rx_pkt_nb_offs ?
+				   rx_pkt_seg_offsets[i] : 0;
+		rx_seg[i].mp = mpx ? mpx : mp;
+	}
+	rx_conf->rx_nseg = rx_pkt_nb_segs;
+	rx_conf->rx_seg = rx_seg;
+	ret = rte_eth_rx_queue_setup(port_id, rx_queue_id, nb_rx_desc,
+				    socket_id, rx_conf, NULL);
+	rx_conf->rx_seg = NULL;
+	rx_conf->rx_nseg = 0;
+	return ret;
+}
+
 int
 start_port(portid_t pid)
 {
@@ -2522,7 +2566,7 @@ struct extmem_param {
 						return -1;
 					}
 
-					diag = rte_eth_rx_queue_setup(pi, qi,
+					diag = rx_queue_setup(pi, qi,
 					     port->nb_rx_desc[qi],
 					     rxring_numa[pi],
 					     &(port->rx_conf[qi]),
@@ -2538,7 +2582,7 @@ struct extmem_param {
 							port->socket_id);
 						return -1;
 					}
-					diag = rte_eth_rx_queue_setup(pi, qi,
+					diag = rx_queue_setup(pi, qi,
 					     port->nb_rx_desc[qi],
 					     port->socket_id,
 					     &(port->rx_conf[qi]),
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index fc56b60..af654ea 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -876,6 +876,11 @@ void port_rss_reta_info(portid_t port_id,
 
 void set_vf_traffic(portid_t port_id, uint8_t is_rx, uint16_t vf, uint8_t on);
 
+int
+rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
+	       uint16_t nb_rx_desc, unsigned int socket_id,
+	       struct rte_eth_rxconf *rx_conf, struct rte_mempool *mp);
+
 int set_queue_rate_limit(portid_t port_id, uint16_t queue_idx, uint16_t rate);
 int set_vf_rate_limit(portid_t port_id, uint16_t vf, uint16_t rate,
 				uint64_t q_msk);
-- 
1.8.3.1
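
The per-segment defaults applied by rx_queue_setup() above can be
illustrated in isolation. The sketch below is a simplified model and not
testpmd code: struct sketch_pool, struct sketch_seg and
sketch_fill_segs() are hypothetical names, and pools are referenced by
index instead of struct rte_mempool pointers. It shows the three rules:
segments past the last pool reuse the last pool (so an index equal to the
pool count must also be clamped), a zero length means the data buffer
size of the pool, and missing offsets default to zero.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for one mbuf pool: only its data buffer size. */
struct sketch_pool {
	uint16_t data_size;
};

/* Hypothetical stand-in for a segment description (pool kept by index). */
struct sketch_seg {
	unsigned int pool_idx;
	uint16_t length;
	uint16_t offset;
};

static void
sketch_fill_segs(struct sketch_seg *segs, unsigned int nb_segs,
		 const uint16_t *lengths,
		 const uint16_t *offsets, unsigned int nb_offs,
		 const struct sketch_pool *pools, unsigned int nb_pools)
{
	unsigned int i, p;

	assert(nb_pools > 0);
	for (i = 0; i < nb_segs; i++) {
		/* Segments beyond the pool list reuse the last valid pool. */
		p = i >= nb_pools ? nb_pools - 1 : i;
		segs[i].pool_idx = p;
		/* Zero length means "use the data buffer size of the pool". */
		segs[i].length = lengths[i] ? lengths[i] : pools[p].data_size;
		/* Segments without a configured offset start at zero. */
		segs[i].offset = i < nb_offs ? offsets[i] : 0;
	}
}
```

For example, with two pools of 2048 and 4096 bytes, lengths {14, 0, 0}
and a single offset {2}, the three segments come out as (pool 0, 14, 2),
(pool 1, 4096, 0) and (pool 1, 4096, 0).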



* Re: [dpdk-dev] [PATCH v6 1/6] ethdev: introduce Rx buffer split
  2020-10-14 18:11   ` [dpdk-dev] [PATCH v6 1/6] " Viacheslav Ovsiienko
@ 2020-10-14 18:57     ` Jerin Jacob
  2020-10-15  7:43       ` Slava Ovsiienko
  2020-10-14 22:13     ` Thomas Monjalon
                       ` (2 subsequent siblings)
  3 siblings, 1 reply; 172+ messages in thread
From: Jerin Jacob @ 2020-10-14 18:57 UTC (permalink / raw)
  To: Viacheslav Ovsiienko
  Cc: dpdk-dev, Thomas Monjalon, Stephen Hemminger, Ferruh Yigit,
	Olivier Matz, Maxime Coquelin, David Marchand, Andrew Rybchenko

On Wed, Oct 14, 2020 at 11:42 PM Viacheslav Ovsiienko
<viacheslavo@nvidia.com> wrote:
>
> The DPDK datapath in the transmit direction is very flexible.
> An application can build the multi-segment packet and manages
> almost all data aspects - the memory pools where segments
> are allocated from, the segment lengths, the memory attributes
> like external buffers, registered for DMA, etc.
>
> In the receiving direction, the datapath is much less flexible,
> an application can only specify the memory pool to configure the
> receiving queue and nothing more. In order to extend receiving
> datapath capabilities it is proposed to add the way to provide
> extended information how to split the packets being received.
>
> The following structure is introduced to specify the Rx packet
> segment:
>
> struct rte_eth_rxseg {
>     struct rte_mempool *mp; /* memory pools to allocate segment from */
>     uint16_t length; /* segment maximal data length,
>                         configures "split point" */
>     uint16_t offset; /* data offset from beginning
>                         of mbuf data buffer */
>     uint32_t reserved; /* reserved field */
> };
>
> The segment descriptions are added to the rte_eth_rxconf structure:
>    rx_seg - pointer the array of segment descriptions, each element
>              describes the memory pool, maximal data length, initial
>              data offset from the beginning of data buffer in mbuf.
>              This array allows to specify the different settings for
>              each segment in individual fashion.
>    rx_nseg - number of elements in the array
>
> If the extended segment descriptions is provided with these new
> fields the mp parameter of the rte_eth_rx_queue_setup must be
> specified as NULL to avoid ambiguity.
>
> There are two options to specify Rx buffer configuration:
> - mp is not NULL, rx_conf.rx_seg is NULL, rx_conf.rx_nseg is zero,
>   it is compatible configuration, follows existing implementation,
>   provides single pool and no description for segment sizes
>   and offsets.
> - mp is NULL, rx_conf.rx_seg is not NULL, rx_conf.rx_nseg is not
>   zero, it provides the extended configuration, individually for
>   each segment.
>
> The new offload flag RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT in device
> capabilities is introduced to present the way for PMD to report to
> application about supporting Rx packet split to configurable
> segments. Prior invoking the rte_eth_rx_queue_setup() routine
> application should check RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT flag.
>
> If the Rx queue is configured with new settings the packets being
> received will be split into multiple segments pushed to the mbufs
> with specified attributes. The PMD will split the received packets
> into multiple segments according to the specification in the
> description array:
>
> - the first network buffer will be allocated from the memory pool,
>   specified in the first segment description element, the second
>   network buffer - from the pool in the second segment description
>   element and so on. If there is no enough elements to describe
>   the buffer for entire packet of maximal length the pool from the
>   last valid element will be used to allocate the buffers from for the
>   rest of segments
>
> - the offsets from the segment description elements will provide
>   the data offset from the buffer beginning except the first mbuf -
>   for this one the offset is added to the RTE_PKTMBUF_HEADROOM to get
>   actual offset from the buffer beginning. If there is no enough
>   elements to describe the buffer for entire packet of maximal length
>   the offsets for the rest of segment will be supposed to be zero.
>
> - the data length being received to each segment is limited  by the
>   length specified in the segment description element. The data
>   receiving starts with filling up the first mbuf data buffer, if the
>   specified maximal segment length is reached and there are data
>   remaining (packet is longer than buffer in the first mbuf) the
>   following data will be pushed to the next segment up to its own
>   maximal length. If the first two segments is not enough to store
>   all the packet remaining data  the next (third) segment will
>   be engaged and so on. If the length in the segment description
>   element is zero the actual buffer size will be deduced from
>   the appropriate memory pool properties. If there is no enough
>   elements to describe the buffer for entire packet of maximal
>   length the buffer size will be deduced from the pool of the last
>   valid element for the remaining segments.
>
> For example, let's suppose we configured the Rx queue with the
> following segments:
>     seg0 - pool0, len0=14B, off0=2
>     seg1 - pool1, len1=20B, off1=128B
>     seg2 - pool2, len2=20B, off2=0B
>     seg3 - pool3, len3=512B, off3=0B


Sorry for chiming in late. This API looks good to me.
But I am wondering how the application can know the capabilities or "limits"
of struct rte_eth_rxseg for a specific PMD. The other descriptor limit is
exposed through struct rte_eth_dev_info::rx_desc_lim.
If a PMD can support only a specific pattern, the application should know
the limit up front rather than getting a blanket error.
IMO, it is better to add
struct rte_eth_rxseg *rxsegs;
uint16_t nb_max_rxsegs;
to the rte_eth_dev_info structure to express the capability,
where the length and offset fields can define the maximum values.

Thoughts?
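
To make the idea concrete, here is a rough sketch of how an application
could check a requested layout against advertised limits before calling
the queue setup. Everything in it is hypothetical: struct
sketch_rxseg_limits, struct sketch_rxseg_req and sketch_check_segs() do
not exist in ethdev, and the choice of limits (segment count, maximal
length, maximal offset) is only an assumption about what a PMD might
want to report.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-device limits; not part of the posted patch. */
struct sketch_rxseg_limits {
	uint16_t max_nseg;   /* maximal number of segment descriptions */
	uint16_t max_length; /* maximal per-segment data length */
	uint16_t max_offset; /* maximal per-segment data offset */
};

/* Hypothetical request mirroring the length/offset pair of a segment. */
struct sketch_rxseg_req {
	uint16_t length;
	uint16_t offset;
};

/* Return 0 if the request fits within the limits, -1 otherwise. */
static int
sketch_check_segs(const struct sketch_rxseg_limits *lim,
		  const struct sketch_rxseg_req *segs, uint16_t nb_seg)
{
	uint16_t i;

	assert(lim != NULL && segs != NULL);
	if (nb_seg == 0 || nb_seg > lim->max_nseg)
		return -1;
	for (i = 0; i < nb_seg; i++) {
		if (segs[i].length > lim->max_length ||
		    segs[i].offset > lim->max_offset)
			return -1;
	}
	return 0;
}
```

With such a check the application can pick a supported layout up front
instead of probing the PMD through failed queue setup calls.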


* Re: [dpdk-dev] [PATCH v6 1/6] ethdev: introduce Rx buffer split
  2020-10-14 18:11   ` [dpdk-dev] [PATCH v6 1/6] " Viacheslav Ovsiienko
  2020-10-14 18:57     ` Jerin Jacob
@ 2020-10-14 22:13     ` Thomas Monjalon
  2020-10-14 22:50     ` Ajit Khaparde
  2020-10-15 10:11     ` Andrew Rybchenko
  3 siblings, 0 replies; 172+ messages in thread
From: Thomas Monjalon @ 2020-10-14 22:13 UTC (permalink / raw)
  To: Viacheslav Ovsiienko
  Cc: dev, stephen, ferruh.yigit, olivier.matz, jerinjacobk,
	maxime.coquelin, david.marchand, arybchenko

14/10/2020 20:11, Viacheslav Ovsiienko:
> The DPDK datapath in the transmit direction is very flexible.
> An application can build the multi-segment packet and manages
> almost all data aspects - the memory pools where segments
> are allocated from, the segment lengths, the memory attributes
> like external buffers, registered for DMA, etc.
> 
> In the receiving direction, the datapath is much less flexible,
> an application can only specify the memory pool to configure the
> receiving queue and nothing more. In order to extend receiving
> datapath capabilities it is proposed to add the way to provide
> extended information how to split the packets being received.
> 
> The following structure is introduced to specify the Rx packet
> segment:
> 
> struct rte_eth_rxseg {
>     struct rte_mempool *mp; /* memory pools to allocate segment from */
>     uint16_t length; /* segment maximal data length,
> 		       	configures "split point" */
>     uint16_t offset; /* data offset from beginning
> 		       	of mbuf data buffer */
>     uint32_t reserved; /* reserved field */
> };
> 
> The segment descriptions are added to the rte_eth_rxconf structure:
>    rx_seg - pointer the array of segment descriptions, each element
>              describes the memory pool, maximal data length, initial
>              data offset from the beginning of data buffer in mbuf.
> 	     This array allows to specify the different settings for
> 	     each segment in individual fashion.
>    rx_nseg - number of elements in the array
> 
> If the extended segment descriptions is provided with these new
> fields the mp parameter of the rte_eth_rx_queue_setup must be
> specified as NULL to avoid ambiguity.
> 
> There are two options to specify Rx buffer configuration:
> - mp is not NULL, rx_conf.rx_seg is NULL, rx_conf.rx_nseg is zero,
>   it is compatible configuration, follows existing implementation,
>   provides single pool and no description for segment sizes
>   and offsets.
> - mp is NULL, rx_conf.rx_seg is not NULL, rx_conf.rx_nseg is not
>   zero, it provides the extended configuration, individually for
>   each segment.
> 
> The new offload flag RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT in device
> capabilities is introduced to present the way for PMD to report to
> application about supporting Rx packet split to configurable
> segments. Prior invoking the rte_eth_rx_queue_setup() routine
> application should check RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT flag.
> 
> If the Rx queue is configured with new settings the packets being
> received will be split into multiple segments pushed to the mbufs
> with specified attributes. The PMD will split the received packets
> into multiple segments according to the specification in the
> description array:
> 
> - the first network buffer will be allocated from the memory pool,
>   specified in the first segment description element, the second
>   network buffer - from the pool in the second segment description
>   element and so on. If there is no enough elements to describe
>   the buffer for entire packet of maximal length the pool from the
>   last valid element will be used to allocate the buffers from for the
>   rest of segments
> 
> - the offsets from the segment description elements will provide
>   the data offset from the buffer beginning except the first mbuf -
>   for this one the offset is added to the RTE_PKTMBUF_HEADROOM to get
>   actual offset from the buffer beginning. If there is no enough
>   elements to describe the buffer for entire packet of maximal length
>   the offsets for the rest of segment will be supposed to be zero.
> 
> - the data length being received to each segment is limited  by the
>   length specified in the segment description element. The data
>   receiving starts with filling up the first mbuf data buffer, if the
>   specified maximal segment length is reached and there are data
>   remaining (packet is longer than buffer in the first mbuf) the
>   following data will be pushed to the next segment up to its own
>   maximal length. If the first two segments is not enough to store
>   all the packet remaining data  the next (third) segment will
>   be engaged and so on. If the length in the segment description
>   element is zero the actual buffer size will be deduced from
>   the appropriate memory pool properties. If there is no enough
>   elements to describe the buffer for entire packet of maximal
>   length the buffer size will be deduced from the pool of the last
>   valid element for the remaining segments.
> 
> For example, let's suppose we configured the Rx queue with the
> following segments:
>     seg0 - pool0, len0=14B, off0=2
>     seg1 - pool1, len1=20B, off1=128B
>     seg2 - pool2, len2=20B, off2=0B
>     seg3 - pool3, len3=512B, off3=0B
> 
> The packet 46 bytes long will look like the following:
>     seg0 - 14B long @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from pool0
>     seg1 - 20B long @ 128 in mbuf from pool1
>     seg2 - 12B long @ 0 in mbuf from pool2
> 
> The packet 1500 bytes long will look like the following:
>     seg0 - 14B @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from pool0
>     seg1 - 20B @ 128 in mbuf from pool1
>     seg2 - 20B @ 0 in mbuf from pool2
>     seg3 - 512B @ 0 in mbuf from pool3
>     seg4 - 512B @ 0 in mbuf from pool3
>     seg5 - 422B @ 0 in mbuf from pool3
> 
> The offload RTE_ETH_RX_OFFLOAD_SCATTER must be present and
> configured to support new buffer split feature (if rx_nseg
> is greater than one).
> 
> The new approach would allow splitting the ingress packets into
> multiple parts pushed to the memory with different attributes.
> For example, the packet headers can be pushed to the embedded
> data buffers within mbufs and the application data into
> the external buffers attached to mbufs allocated from the
> different memory pools. The memory attributes for the split
> parts may differ either - for example the application data
> may be pushed into the external memory located on the dedicated
> physical device, say GPU or NVMe. This would improve the DPDK
> receiving datapath flexibility with preserving compatibility
> with existing API.
> 
> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>

A large part of this commit log can be dropped because it is redundant
with the doxygen comments.

Acked-by: Thomas Monjalon <thomas@monjalon.net>
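
The split arithmetic in the quoted example (segments of 14, 20, 20 and
512 bytes) can be checked with a small model. This is only a sketch of
the length rule described in the commit log, not PMD code:
sketch_split() is a hypothetical helper, it ignores pools and offsets,
and it assumes every configured length is non-zero. Mbufs needed beyond
the last description reuse the last configured length.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Compute the data length stored in each mbuf for a packet of pkt_len
 * bytes. Returns the number of mbufs used, or -1 if out[] is too small.
 */
static int
sketch_split(const uint16_t *seg_len, unsigned int nb_seg,
	     uint32_t pkt_len, uint16_t *out, unsigned int out_max)
{
	unsigned int n = 0;

	assert(nb_seg > 0);
	while (pkt_len > 0) {
		/* Past the last description, reuse the last length. */
		uint16_t cap = seg_len[n < nb_seg ? n : nb_seg - 1];
		uint16_t chunk;

		assert(cap > 0);
		if (n == out_max)
			return -1;
		/* Fill the current mbuf up to its configured limit. */
		chunk = pkt_len < cap ? (uint16_t)pkt_len : cap;
		out[n++] = chunk;
		pkt_len -= chunk;
	}
	return (int)n;
}
```

For the 46 byte packet this gives 14, 20 and 12 bytes; for the 1500 byte
packet it gives 14, 20, 20, 512, 512 and 422 bytes, in agreement with the
layouts listed in the commit log.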




* Re: [dpdk-dev] [PATCH v6 1/6] ethdev: introduce Rx buffer split
  2020-10-14 18:11   ` [dpdk-dev] [PATCH v6 1/6] " Viacheslav Ovsiienko
  2020-10-14 18:57     ` Jerin Jacob
  2020-10-14 22:13     ` Thomas Monjalon
@ 2020-10-14 22:50     ` Ajit Khaparde
  2020-10-15 10:11     ` Andrew Rybchenko
  3 siblings, 0 replies; 172+ messages in thread
From: Ajit Khaparde @ 2020-10-14 22:50 UTC (permalink / raw)
  To: Viacheslav Ovsiienko
  Cc: dpdk-dev, Thomas Monjalon, Stephen Hemminger, Ferruh Yigit,
	Olivier Matz, Jerin Jacob, Maxime Coquelin, David Marchand,
	Andrew Rybchenko

On Wed, Oct 14, 2020 at 11:13 AM Viacheslav Ovsiienko
<viacheslavo@nvidia.com> wrote:
>
> The DPDK datapath in the transmit direction is very flexible.
> An application can build the multi-segment packet and manages
> almost all data aspects - the memory pools where segments
> are allocated from, the segment lengths, the memory attributes
> like external buffers, registered for DMA, etc.
>
> In the receiving direction, the datapath is much less flexible,
> an application can only specify the memory pool to configure the
> receiving queue and nothing more. In order to extend receiving
> datapath capabilities it is proposed to add the way to provide
> extended information how to split the packets being received.
>
> The following structure is introduced to specify the Rx packet
> segment:
>
> struct rte_eth_rxseg {
>     struct rte_mempool *mp; /* memory pools to allocate segment from */
>     uint16_t length; /* segment maximal data length,
>                         configures "split point" */
>     uint16_t offset; /* data offset from beginning
>                         of mbuf data buffer */
>     uint32_t reserved; /* reserved field */
> };
>
> The segment descriptions are added to the rte_eth_rxconf structure:
>    rx_seg - pointer the array of segment descriptions, each element
>              describes the memory pool, maximal data length, initial
>              data offset from the beginning of data buffer in mbuf.
>              This array allows to specify the different settings for
>              each segment in individual fashion.
>    rx_nseg - number of elements in the array
>
> If the extended segment descriptions is provided with these new
> fields the mp parameter of the rte_eth_rx_queue_setup must be
> specified as NULL to avoid ambiguity.
>
> There are two options to specify Rx buffer configuration:
> - mp is not NULL, rx_conf.rx_seg is NULL, rx_conf.rx_nseg is zero,
>   it is compatible configuration, follows existing implementation,
>   provides single pool and no description for segment sizes
>   and offsets.
> - mp is NULL, rx_conf.rx_seg is not NULL, rx_conf.rx_nseg is not
>   zero, it provides the extended configuration, individually for
>   each segment.
>
> The new offload flag RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT in device
> capabilities is introduced to present the way for PMD to report to
> application about supporting Rx packet split to configurable
> segments. Prior invoking the rte_eth_rx_queue_setup() routine
> application should check RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT flag.
>
> If the Rx queue is configured with new settings the packets being
> received will be split into multiple segments pushed to the mbufs
> with specified attributes. The PMD will split the received packets
> into multiple segments according to the specification in the
> description array:
>
> - the first network buffer will be allocated from the memory pool,
>   specified in the first segment description element, the second
>   network buffer - from the pool in the second segment description
>   element and so on. If there is no enough elements to describe
>   the buffer for entire packet of maximal length the pool from the
>   last valid element will be used to allocate the buffers from for the
>   rest of segments
>
> - the offsets from the segment description elements will provide
>   the data offset from the buffer beginning except the first mbuf -
>   for this one the offset is added to the RTE_PKTMBUF_HEADROOM to get
>   actual offset from the buffer beginning. If there is no enough
>   elements to describe the buffer for entire packet of maximal length
>   the offsets for the rest of segment will be supposed to be zero.
>
> - the data length being received to each segment is limited  by the
>   length specified in the segment description element. The data
>   receiving starts with filling up the first mbuf data buffer, if the
>   specified maximal segment length is reached and there are data
>   remaining (packet is longer than buffer in the first mbuf) the
>   following data will be pushed to the next segment up to its own
>   maximal length. If the first two segments is not enough to store
>   all the packet remaining data  the next (third) segment will
>   be engaged and so on. If the length in the segment description
>   element is zero the actual buffer size will be deduced from
>   the appropriate memory pool properties. If there is no enough
>   elements to describe the buffer for entire packet of maximal
>   length the buffer size will be deduced from the pool of the last
>   valid element for the remaining segments.
>
> For example, let's suppose we configured the Rx queue with the
> following segments:
>     seg0 - pool0, len0=14B, off0=2
>     seg1 - pool1, len1=20B, off1=128B
>     seg2 - pool2, len2=20B, off2=0B
>     seg3 - pool3, len3=512B, off3=0B
>
> The packet 46 bytes long will look like the following:
>     seg0 - 14B long @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from pool0
>     seg1 - 20B long @ 128 in mbuf from pool1
>     seg2 - 12B long @ 0 in mbuf from pool2
>
> The packet 1500 bytes long will look like the following:
>     seg0 - 14B @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from pool0
>     seg1 - 20B @ 128 in mbuf from pool1
>     seg2 - 20B @ 0 in mbuf from pool2
>     seg3 - 512B @ 0 in mbuf from pool3
>     seg4 - 512B @ 0 in mbuf from pool3
>     seg5 - 422B @ 0 in mbuf from pool3
>
> The offload RTE_ETH_RX_OFFLOAD_SCATTER must be present and
> configured to support new buffer split feature (if rx_nseg
> is greater than one).
>
> The new approach would allow splitting the ingress packets into
> multiple parts pushed to the memory with different attributes.
> For example, the packet headers can be pushed to the embedded
> data buffers within mbufs and the application data into
> the external buffers attached to mbufs allocated from the
> different memory pools. The memory attributes for the split
> parts may differ as well - for example the application data
> may be pushed into the external memory located on the dedicated
> physical device, say GPU or NVMe. This would improve the DPDK
> receiving datapath flexibility while preserving compatibility
> with the existing API.
>
> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>

> ---
>  doc/guides/nics/features.rst           |  15 +++++
>  doc/guides/rel_notes/deprecation.rst   |   5 --
>  doc/guides/rel_notes/release_20_11.rst |   9 +++
>  lib/librte_ethdev/rte_ethdev.c         | 111 +++++++++++++++++++++++++--------
>  lib/librte_ethdev/rte_ethdev.h         |  62 +++++++++++++++++-
>  5 files changed, 171 insertions(+), 31 deletions(-)
>
> diff --git a/doc/guides/nics/features.rst b/doc/guides/nics/features.rst
> index dd8c955..832ea3b 100644
> --- a/doc/guides/nics/features.rst
> +++ b/doc/guides/nics/features.rst
> @@ -185,6 +185,21 @@ Supports receiving segmented mbufs.
>  * **[related]    eth_dev_ops**: ``rx_pkt_burst``.
>
>
> +.. _nic_features_buffer_split:
> +
> +Buffer Split on Rx
> +------------------
> +
> +Scatters the packets being received on specified boundaries to segmented mbufs.
> +
> +* **[uses]       rte_eth_rxconf,rte_eth_rxmode**: ``offloads:RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT``.
> +* **[uses]       rte_eth_rxconf**: ``rx_conf.rx_seg, rx_conf.rx_nseg``.
> +* **[implements] datapath**: ``Buffer Split functionality``.
> +* **[provides]   rte_eth_dev_info**: ``rx_offload_capa:RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT``.
> +* **[provides]   eth_dev_ops**: ``rxq_info_get:buffer_split``.
> +* **[related]    API**: ``rte_eth_rx_queue_setup()``.
> +
> +
>  .. _nic_features_lro:
>
>  LRO
> diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
> index 584e720..232cd54 100644
> --- a/doc/guides/rel_notes/deprecation.rst
> +++ b/doc/guides/rel_notes/deprecation.rst
> @@ -138,11 +138,6 @@ Deprecation Notices
>    In 19.11 PMDs will still update the field even when the offload is not
>    enabled.
>
> -* ethdev: Add new fields to ``rte_eth_rxconf`` to configure the receiving
> -  queues to split ingress packets into multiple segments according to the
> -  specified lengths into the buffers allocated from the specified
> -  memory pools. The backward compatibility to existing API is preserved.
> -
>  * ethdev: ``rx_descriptor_done`` dev_ops and ``rte_eth_rx_descriptor_done``
>    will be removed in 21.11.
>    Existing ``rte_eth_rx_descriptor_status`` and ``rte_eth_tx_descriptor_status``
> diff --git a/doc/guides/rel_notes/release_20_11.rst b/doc/guides/rel_notes/release_20_11.rst
> index bcc0fc2..bcc2479 100644
> --- a/doc/guides/rel_notes/release_20_11.rst
> +++ b/doc/guides/rel_notes/release_20_11.rst
> @@ -60,6 +60,12 @@ New Features
>    Added the FEC API which provides functions for query FEC capabilities and
>    current FEC mode from device. Also, API for configuring FEC mode is also provided.
>
> +* **Introduced extended buffer description for receiving.**
> +
> +  Added the extended Rx buffer description for Rx queue setup routine
> +  providing the individual settings for each Rx segment with maximal size,
> +  buffer offset and memory pool to allocate data buffers from.
> +
>  * **Updated Broadcom bnxt driver.**
>
>    Updated the Broadcom bnxt driver with new features and improvements, including:
> @@ -253,6 +259,9 @@ API Changes
>    As the data of ``uint8_t`` will be truncated when queue number under
>    a TC is greater than 256.
>
> +* ethdev: Added fields rx_seg and rx_nseg to rte_eth_rxconf structure
> +  to provide extended description of the receiving buffer.
> +
>  * vhost: Moved vDPA APIs from experimental to stable.
>
>  * rawdev: Added a structure size parameter to the functions
> diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
> index 892c246..96ecb91 100644
> --- a/lib/librte_ethdev/rte_ethdev.c
> +++ b/lib/librte_ethdev/rte_ethdev.c
> @@ -105,6 +105,9 @@ struct rte_eth_xstats_name_off {
>  #define RTE_RX_OFFLOAD_BIT2STR(_name)  \
>         { DEV_RX_OFFLOAD_##_name, #_name }
>
> +#define RTE_ETH_RX_OFFLOAD_BIT2STR(_name)      \
> +       { RTE_ETH_RX_OFFLOAD_##_name, #_name }
> +
>  static const struct {
>         uint64_t offload;
>         const char *name;
> @@ -128,9 +131,11 @@ struct rte_eth_xstats_name_off {
>         RTE_RX_OFFLOAD_BIT2STR(SCTP_CKSUM),
>         RTE_RX_OFFLOAD_BIT2STR(OUTER_UDP_CKSUM),
>         RTE_RX_OFFLOAD_BIT2STR(RSS_HASH),
> +       RTE_ETH_RX_OFFLOAD_BIT2STR(BUFFER_SPLIT),
>  };
>
>  #undef RTE_RX_OFFLOAD_BIT2STR
> +#undef RTE_ETH_RX_OFFLOAD_BIT2STR
>
>  #define RTE_TX_OFFLOAD_BIT2STR(_name)  \
>         { DEV_TX_OFFLOAD_##_name, #_name }
> @@ -1784,38 +1789,94 @@ struct rte_eth_dev *
>                 return -EINVAL;
>         }
>
> -       if (mp == NULL) {
> -               RTE_ETHDEV_LOG(ERR, "Invalid null mempool pointer\n");
> -               return -EINVAL;
> -       }
> -
>         RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_setup, -ENOTSUP);
>
> -       /*
> -        * Check the size of the mbuf data buffer.
> -        * This value must be provided in the private data of the memory pool.
> -        * First check that the memory pool has a valid private data.
> -        */
>         ret = rte_eth_dev_info_get(port_id, &dev_info);
>         if (ret != 0)
>                 return ret;
>
> -       if (mp->private_data_size < sizeof(struct rte_pktmbuf_pool_private)) {
> -               RTE_ETHDEV_LOG(ERR, "%s private_data_size %d < %d\n",
> -                       mp->name, (int)mp->private_data_size,
> -                       (int)sizeof(struct rte_pktmbuf_pool_private));
> -               return -ENOSPC;
> -       }
> -       mbp_buf_size = rte_pktmbuf_data_room_size(mp);
> +       if (mp) {
> +               /* Single pool configuration check. */
> +               if (rx_conf->rx_seg || rx_conf->rx_nseg) {
> +                       RTE_ETHDEV_LOG(ERR,
> +                                      "Ambiguous segment configuration\n");
> +                       return -EINVAL;
> +               }
> +               /*
> +                * Check the size of the mbuf data buffer, this value
> +                * must be provided in the private data of the memory pool.
> +                * First check that the memory pool(s) has a valid private data.
> +                */
> +               if (mp->private_data_size <
> +                               sizeof(struct rte_pktmbuf_pool_private)) {
> +                       RTE_ETHDEV_LOG(ERR, "%s private_data_size %u < %u\n",
> +                               mp->name, mp->private_data_size,
> +                               (unsigned int)
> +                               sizeof(struct rte_pktmbuf_pool_private));
> +                       return -ENOSPC;
> +               }
> +               mbp_buf_size = rte_pktmbuf_data_room_size(mp);
> +               if (mbp_buf_size < dev_info.min_rx_bufsize +
> +                                  RTE_PKTMBUF_HEADROOM) {
> +                       RTE_ETHDEV_LOG(ERR,
> +                                      "%s mbuf_data_room_size %u < %u"
> +                                      " (RTE_PKTMBUF_HEADROOM=%u +"
> +                                      " min_rx_bufsize(dev)=%u)\n",
> +                                      mp->name, mbp_buf_size,
> +                                      RTE_PKTMBUF_HEADROOM +
> +                                      dev_info.min_rx_bufsize,
> +                                      RTE_PKTMBUF_HEADROOM,
> +                                      dev_info.min_rx_bufsize);
> +                       return -EINVAL;
> +               }
> +       } else {
> +               const struct rte_eth_rxseg *rx_seg = rx_conf->rx_seg;
> +               uint16_t n_seg = rx_conf->rx_nseg;
> +               uint16_t seg_idx;
>
> -       if (mbp_buf_size < dev_info.min_rx_bufsize + RTE_PKTMBUF_HEADROOM) {
> -               RTE_ETHDEV_LOG(ERR,
> -                       "%s mbuf_data_room_size %d < %d (RTE_PKTMBUF_HEADROOM=%d + min_rx_bufsize(dev)=%d)\n",
> -                       mp->name, (int)mbp_buf_size,
> -                       (int)(RTE_PKTMBUF_HEADROOM + dev_info.min_rx_bufsize),
> -                       (int)RTE_PKTMBUF_HEADROOM,
> -                       (int)dev_info.min_rx_bufsize);
> -               return -EINVAL;
> +               /* Extended multi-segment configuration check. */
> +               if (!rx_conf->rx_seg || !rx_conf->rx_nseg) {
> +                       RTE_ETHDEV_LOG(ERR,
> +                                      "Memory pool is null and no"
> +                                      " extended configuration provided\n");
> +                       return -EINVAL;
> +               }
> +               /*
> +                * Check the sizes and offsets against buffer sizes
> +                * for each segment specified in extended configuration.
> +                */
> +               for (seg_idx = 0; seg_idx < n_seg; seg_idx++) {
> +                       struct rte_mempool *mpl = rx_seg[seg_idx].mp;
> +                       uint32_t length = rx_seg[seg_idx].length;
> +                       uint32_t offset = rx_seg[seg_idx].offset;
> +                       uint32_t head_room = seg_idx ? 0 : RTE_PKTMBUF_HEADROOM;
> +
> +                       if (mpl == NULL) {
> +                               RTE_ETHDEV_LOG(ERR, "null mempool pointer\n");
> +                               return -EINVAL;
> +                       }
> +                       if (mpl->private_data_size <
> +                               sizeof(struct rte_pktmbuf_pool_private)) {
> +                               RTE_ETHDEV_LOG(ERR,
> +                                              "%s private_data_size %u < %u\n",
> +                                              mpl->name,
> +                                              mpl->private_data_size,
> +                                              (unsigned int)sizeof(struct
> +                                              rte_pktmbuf_pool_private));
> +                               return -ENOSPC;
> +                       }
> +                       mbp_buf_size = rte_pktmbuf_data_room_size(mpl);
> +                       length = length ? length : (mbp_buf_size - head_room);
> +                       if (mbp_buf_size < length + offset + head_room) {
> +                               RTE_ETHDEV_LOG(ERR,
> +                                       "%s mbuf_data_room_size %u < %u"
> +                                       " (segment length=%u +"
> +                                       " segment offset=%u)\n",
> +                                       mpl->name, mbp_buf_size,
> +                                       length + offset, length, offset);
> +                               return -EINVAL;
> +                       }
> +               }
>         }
>
>         /* Use default specified by driver, if nb_rx_desc is zero */
> diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
> index 5bcfbb8..e019f4a 100644
> --- a/lib/librte_ethdev/rte_ethdev.h
> +++ b/lib/librte_ethdev/rte_ethdev.h
> @@ -970,6 +970,16 @@ struct rte_eth_txmode {
>  };
>
>  /**
> + * A structure used to configure an RX packet segment to split.
> + */
> +struct rte_eth_rxseg {
> +       struct rte_mempool *mp; /**< Memory pool to allocate segment from. */
> +       uint16_t length; /**< Segment data length, configures split point. */
> +       uint16_t offset; /**< Data offset from beginning of mbuf data buffer. */
> +       uint32_t reserved; /**< Reserved field. */
> +};
> +
> +/**
>   * A structure used to configure an RX ring of an Ethernet port.
>   */
>  struct rte_eth_rxconf {
> @@ -977,6 +987,43 @@ struct rte_eth_rxconf {
>         uint16_t rx_free_thresh; /**< Drives the freeing of RX descriptors. */
>         uint8_t rx_drop_en; /**< Drop packets if no descriptors are available. */
>         uint8_t rx_deferred_start; /**< Do not start queue with rte_eth_dev_start(). */
> +       uint16_t rx_nseg; /**< Number of descriptions in rx_seg array. */
> +       /**
> +        * Points to the array of segment descriptions. Each array element
> +        * describes the properties for each segment in the receiving
> +        * buffer.
> +        *
> +        * If RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT flag is set in offloads field,
> +        * the PMD will split the received packets into multiple segments
> +        * according to the specification in the description array:
> +        *
> +        * - the first network buffer will be allocated from the memory pool,
> +        *   specified in the first array element, the second buffer, from the
> +        *   pool in the second element, and so on.
> +        *
> +        * - the offsets from the segment description elements specify
> +        *   the data offset from the buffer beginning except the first mbuf.
> +        *   For this one the offset is added with RTE_PKTMBUF_HEADROOM.
> +        *
> +        * - the lengths in the elements define the maximal data amount
> +        *   being received to each segment. The receiving starts with filling
> +        *   up the first mbuf data buffer up to specified length. If
> +        *   there is data remaining (packet is longer than buffer in the first
> +        *   mbuf) the following data will be pushed to the next segment
> +        *   up to its own length, and so on.
> +        *
> +        * - If the length in the segment description element is zero
> +        *   the actual buffer size will be deduced from the appropriate
> +        *   memory pool properties.
> +        *
> +        * - if there are not enough elements to describe the buffer for entire
> +        *   packet of maximal length the following parameters will be used
> +        *   for all the remaining segments:
> +        *     - pool from the last valid element
> +        *     - the buffer size from this pool
> +        *     - zero offset
> +        */
> +       struct rte_eth_rxseg *rx_seg;
>         /**
>          * Per-queue Rx offloads to be set using DEV_RX_OFFLOAD_* flags.
>          * Only offloads set on rx_queue_offload_capa or rx_offload_capa
> @@ -1260,6 +1307,7 @@ struct rte_eth_conf {
>  #define DEV_RX_OFFLOAD_SCTP_CKSUM      0x00020000
>  #define DEV_RX_OFFLOAD_OUTER_UDP_CKSUM  0x00040000
>  #define DEV_RX_OFFLOAD_RSS_HASH                0x00080000
> +#define RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT 0x00100000
>
>  #define DEV_RX_OFFLOAD_CHECKSUM (DEV_RX_OFFLOAD_IPV4_CKSUM | \
>                                  DEV_RX_OFFLOAD_UDP_CKSUM | \
> @@ -2027,9 +2075,21 @@ int rte_eth_dev_configure(uint16_t port_id, uint16_t nb_rx_queue,
>   *   No need to repeat any bit in rx_conf->offloads which has already been
>   *   enabled in rte_eth_dev_configure() at port level. An offloading enabled
>   *   at port level can't be disabled at queue level.
> + *   The configuration structure also contains the pointer to the array
> + *   of the receiving buffer segment descriptions, see rx_seg and rx_nseg
> + *   fields, this extended configuration might be used by split offloads like
> + *   RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT. If mb_pool is not NULL,
> + *   the extended configuration fields must be set to NULL and zero.
>   * @param mb_pool
>   *   The pointer to the memory pool from which to allocate *rte_mbuf* network
> - *   memory buffers to populate each descriptor of the receive ring.
> + *   memory buffers to populate each descriptor of the receive ring. There are
> + *   two options to provide Rx buffer configuration:
> + *   - single pool:
> + *     mb_pool is not NULL, rx_conf.rx_seg is NULL, rx_conf.rx_nseg is 0.
> + *   - multiple segments description:
> + *     mb_pool is NULL, rx_conf.rx_seg is not NULL, rx_conf.rx_nseg is not 0.
> + *     Taken only if flag RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT is set in offloads.
> + *
>   * @return
>   *   - 0: Success, receive queue correctly set up.
>   *   - -EIO: if device is removed.
> --
> 1.8.3.1
>


* [dpdk-dev] [PATCH v2] eal/rte_malloc: add alloc_size() attribute to allocation functions
  2020-08-17 17:49 [dpdk-dev] [RFC] ethdev: introduce Rx buffer split Slava Ovsiienko
                   ` (6 preceding siblings ...)
  2020-10-14 18:11 ` [dpdk-dev] [PATCH v6 0/6] ethdev: introduce Rx buffer split Viacheslav Ovsiienko
@ 2020-10-15  0:55 ` Stephen Hemminger
  2020-10-19 14:13   ` Thomas Monjalon
  2020-10-15 20:17 ` [dpdk-dev] [PATCH v7 0/6] ethdev: introduce Rx buffer split Viacheslav Ovsiienko
                   ` (5 subsequent siblings)
  13 siblings, 1 reply; 172+ messages in thread
From: Stephen Hemminger @ 2020-10-15  0:55 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

By using the alloc_size() attribute the compiler can optimize
better and detect errors at compile time.

For example, GCC will reject one of the invalid allocation examples
in app/test/test_malloc.c at compile time because the requested
allocation is outside the limits of memory.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
v2 - rebase onto correct branch (main)
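
For readers unfamiliar with the attribute, here is a minimal sketch of
what it gives the compiler. It does not use the DPDK headers: the
sketch_* names are hypothetical stand-ins for __rte_alloc_size() and
the rte_malloc() family, backed by plain malloc()/calloc(), and the
type and align arguments are ignored. The size reported by
__builtin_object_size() depends on the compiler and optimization
level, so no particular value is promised.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Same shape as the macro in the patch, under a hypothetical name. */
#if defined(__GNUC__) || defined(__clang__)
#define sketch_alloc_size(...) __attribute__((alloc_size(__VA_ARGS__)))
#else
#define sketch_alloc_size(...)
#endif

/* Argument 2 is the size in bytes of the returned object. */
static void *sketch_malloc(const char *type, size_t size, unsigned int align)
	sketch_alloc_size(2);

/* The product of arguments 2 and 3 is the size of the returned object. */
static void *sketch_calloc(const char *type, size_t num, size_t size,
			   unsigned int align)
	sketch_alloc_size(2, 3);

static void *
sketch_malloc(const char *type, size_t size, unsigned int align)
{
	(void)type;  /* ignored in this sketch */
	(void)align; /* ignored in this sketch */
	return malloc(size);
}

static void *
sketch_calloc(const char *type, size_t num, size_t size, unsigned int align)
{
	(void)type;
	(void)align;
	return calloc(num, size);
}

/*
 * Ask the compiler how large it believes the object is. With the
 * attribute and optimization enabled this is typically the requested
 * size; otherwise (size_t)-1 means "unknown".
 */
static size_t
sketch_visible_size(void)
{
	char *p = sketch_malloc("demo", 64, 0);
	size_t n = (size_t)-1;

	assert(p != NULL);
#if defined(__GNUC__) || defined(__clang__)
	n = __builtin_object_size(p, 0);
#endif
	free(p);
	return n;
}
```

Note that the attribute arguments are 1-based parameter positions, and
the two-argument form means "multiply these two parameters", so it only
fits calloc-style prototypes.
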

 app/test/test_malloc.c              |  5 ++++-
 lib/librte_eal/include/rte_common.h | 12 ++++++++++++
 lib/librte_eal/include/rte_malloc.h | 24 ++++++++++++++++--------
 3 files changed, 32 insertions(+), 9 deletions(-)

diff --git a/app/test/test_malloc.c b/app/test/test_malloc.c
index 71b3cfdde5cf..fdf77b4f6a14 100644
--- a/app/test/test_malloc.c
+++ b/app/test/test_malloc.c
@@ -846,6 +846,9 @@ test_malloc_bad_params(void)
 	if (bad_ptr != NULL)
 		goto err_return;
 
+#if defined(RTE_CC_GCC) || defined(RTE_CC_CLANG)
+	/* this test can not be built, will get trapped at compile time! */
+#else
 	/* rte_malloc expected to return null with size will cause overflow */
 	align = RTE_CACHE_LINE_SIZE;
 	size = (size_t)-8;
@@ -857,7 +860,7 @@ test_malloc_bad_params(void)
 	bad_ptr = rte_realloc(NULL, size, align);
 	if (bad_ptr != NULL)
 		goto err_return;
-
+#endif
 	return 0;
 
 err_return:
diff --git a/lib/librte_eal/include/rte_common.h b/lib/librte_eal/include/rte_common.h
index 2920255fc1e3..e63ef0f1de5e 100644
--- a/lib/librte_eal/include/rte_common.h
+++ b/lib/librte_eal/include/rte_common.h
@@ -134,6 +134,18 @@ typedef uint16_t unaligned_uint16_t;
 	__attribute__((format(printf, format_index, first_arg)))
 #endif
 
+/**
+ * Tells compiler that the function returns a value that points to
+ * memory, where the size is given by the one or two arguments.
+ * Used by compiler to validate object size.
+ */
+#if defined(RTE_CC_GCC) || defined(RTE_CC_CLANG)
+#define __rte_alloc_size(...) \
+	__attribute__((alloc_size(__VA_ARGS__)))
+#else
+#define __rte_alloc_size(...)
+#endif
+
 #define RTE_PRIORITY_LOG 101
 #define RTE_PRIORITY_BUS 110
 #define RTE_PRIORITY_CLASS 120
diff --git a/lib/librte_eal/include/rte_malloc.h b/lib/librte_eal/include/rte_malloc.h
index 42ca05182f8e..3af64f87618e 100644
--- a/lib/librte_eal/include/rte_malloc.h
+++ b/lib/librte_eal/include/rte_malloc.h
@@ -54,7 +54,8 @@ struct rte_malloc_socket_stats {
  *   - Otherwise, the pointer to the allocated object.
  */
 void *
-rte_malloc(const char *type, size_t size, unsigned align);
+rte_malloc(const char *type, size_t size, unsigned align)
+	__rte_alloc_size(2);
 
 /**
  * Allocate zero'ed memory from the heap.
@@ -80,7 +81,8 @@ rte_malloc(const char *type, size_t size, unsigned align);
  *   - Otherwise, the pointer to the allocated object.
  */
 void *
-rte_zmalloc(const char *type, size_t size, unsigned align);
+rte_zmalloc(const char *type, size_t size, unsigned align)
+	__rte_alloc_size(2);
 
 /**
  * Replacement function for calloc(), using huge-page memory. Memory area is
@@ -106,7 +108,8 @@ rte_zmalloc(const char *type, size_t size, unsigned align);
  *   - Otherwise, the pointer to the allocated object.
  */
 void *
-rte_calloc(const char *type, size_t num, size_t size, unsigned align);
+rte_calloc(const char *type, size_t num, size_t size, unsigned align)
+	__rte_alloc_size(2, 3);
 
 /**
  * Replacement function for realloc(), using huge-page memory. Reserved area
@@ -129,7 +132,8 @@ rte_calloc(const char *type, size_t num, size_t size, unsigned align);
  *   - Otherwise, the pointer to the reallocated memory.
  */
 void *
-rte_realloc(void *ptr, size_t size, unsigned int align);
+rte_realloc(void *ptr, size_t size, unsigned int align)
+	__rte_alloc_size(2);
 
 /**
  * Replacement function for realloc(), using huge-page memory. Reserved area
@@ -155,7 +159,8 @@ rte_realloc(void *ptr, size_t size, unsigned int align);
  */
 __rte_experimental
 void *
-rte_realloc_socket(void *ptr, size_t size, unsigned int align, int socket);
+rte_realloc_socket(void *ptr, size_t size, unsigned int align, int socket)
+	__rte_alloc_size(2);
 
 /**
  * This function allocates memory from the huge-page area of memory. The memory
@@ -181,7 +186,8 @@ rte_realloc_socket(void *ptr, size_t size, unsigned int align, int socket);
  *   - Otherwise, the pointer to the allocated object.
  */
 void *
-rte_malloc_socket(const char *type, size_t size, unsigned align, int socket);
+rte_malloc_socket(const char *type, size_t size, unsigned align, int socket)
+	__rte_alloc_size(2);
 
 /**
  * Allocate zero'ed memory from the heap.
@@ -209,7 +215,8 @@ rte_malloc_socket(const char *type, size_t size, unsigned align, int socket);
  *   - Otherwise, the pointer to the allocated object.
  */
 void *
-rte_zmalloc_socket(const char *type, size_t size, unsigned align, int socket);
+rte_zmalloc_socket(const char *type, size_t size, unsigned align, int socket)
+	__rte_alloc_size(2);
 
 /**
  * Replacement function for calloc(), using huge-page memory. Memory area is
@@ -237,7 +244,8 @@ rte_zmalloc_socket(const char *type, size_t size, unsigned align, int socket);
  *   - Otherwise, the pointer to the allocated object.
  */
 void *
-rte_calloc_socket(const char *type, size_t num, size_t size, unsigned align, int socket);
+rte_calloc_socket(const char *type, size_t num, size_t size, unsigned align, int socket)
+	__rte_alloc_size(2, 3);
 
 /**
  * Frees the memory space pointed to by the provided pointer.
-- 
2.27.0



* Re: [dpdk-dev] [PATCH v6 1/6] ethdev: introduce Rx buffer split
  2020-10-14 18:57     ` Jerin Jacob
@ 2020-10-15  7:43       ` Slava Ovsiienko
  2020-10-15  9:27         ` Jerin Jacob
  2020-10-15  9:49         ` Andrew Rybchenko
  0 siblings, 2 replies; 172+ messages in thread
From: Slava Ovsiienko @ 2020-10-15  7:43 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: dpdk-dev, NBU-Contact-Thomas Monjalon, Stephen Hemminger,
	Ferruh Yigit, Olivier Matz, Maxime Coquelin, David Marchand,
	Andrew Rybchenko

Hi, Jerin

> -----Original Message-----
> From: Jerin Jacob <jerinjacobk@gmail.com>
> Sent: Wednesday, October 14, 2020 21:57
> To: Slava Ovsiienko <viacheslavo@nvidia.com>
> Cc: dpdk-dev <dev@dpdk.org>; NBU-Contact-Thomas Monjalon
> <thomas@monjalon.net>; Stephen Hemminger
> <stephen@networkplumber.org>; Ferruh Yigit <ferruh.yigit@intel.com>;
> Olivier Matz <olivier.matz@6wind.com>; Maxime Coquelin
> <maxime.coquelin@redhat.com>; David Marchand
> <david.marchand@redhat.com>; Andrew Rybchenko
> <arybchenko@solarflare.com>
> Subject: Re: [PATCH v6 1/6] ethdev: introduce Rx buffer split
> 
> On Wed, Oct 14, 2020 at 11:42 PM Viacheslav Ovsiienko
> <viacheslavo@nvidia.com> wrote:
> >
> > The DPDK datapath in the transmit direction is very flexible.
> > An application can build the multi-segment packet and manages almost
> > all data aspects - the memory pools where segments are allocated from,
> > the segment lengths, the memory attributes like external buffers,
> > registered for DMA, etc.
> >

[..snip..]

> > For example, let's suppose we configured the Rx queue with the
> > following segments:
> >     seg0 - pool0, len0=14B, off0=2
> >     seg1 - pool1, len1=20B, off1=128B
> >     seg2 - pool2, len2=20B, off2=0B
> >     seg3 - pool3, len3=512B, off3=0B
> 
> 
> Sorry for chiming in late. This API looks good to me.
> But, I am wondering how the application can know the capability or "limits" of
> struct rte_eth_rxseg structure for the specific PMD. The other descriptor limits
> are exposed with struct rte_eth_dev_info::rx_desc_lim. If a PMD can
> support a specific pattern rather than returning the blanket error, the
> application should know the limit.
> IMO, it is better to add
> struct rte_eth_rxseg *rxsegs;
> uint16_t nb_max_rxsegs
> in rte_eth_dev_info structure to express the capability.
> Where the len and offset can define the max offset.
> 
> Thoughts?

Moreover, a lot of various limitations might be implied - offsets might not be supported at all or
might have alignment requirements, and similar requirements might apply to the segment size
(say, some granularity). Currently it is not obvious how to report all the nuances, and it is supposed
that limitations of this kind must be documented in the PMD chapter. As for mlx5 - it has no special
limitations besides the common requirements for regular segments.

One more point - the split feature might be considered as just one of the possible cases of using
these segment descriptions; other features might impose other (unknown for now) limitations.
If we see other features of this kind, or other PMDs adopt the split feature, we'll try to find
the common root and consider how to report it.

With best regards, Slava



* Re: [dpdk-dev] [PATCH v6 1/6] ethdev: introduce Rx buffer split
  2020-10-15  7:43       ` Slava Ovsiienko
@ 2020-10-15  9:27         ` Jerin Jacob
  2020-10-15 10:27           ` Jerin Jacob
  2020-10-15  9:49         ` Andrew Rybchenko
  1 sibling, 1 reply; 172+ messages in thread
From: Jerin Jacob @ 2020-10-15  9:27 UTC (permalink / raw)
  To: Slava Ovsiienko
  Cc: dpdk-dev, NBU-Contact-Thomas Monjalon, Stephen Hemminger,
	Ferruh Yigit, Olivier Matz, Maxime Coquelin, David Marchand,
	Andrew Rybchenko

On Thu, Oct 15, 2020 at 1:13 PM Slava Ovsiienko <viacheslavo@nvidia.com> wrote:
>
> Hi, Jerin

Hi Slava,

>
> > -----Original Message-----
> > From: Jerin Jacob <jerinjacobk@gmail.com>
> > Sent: Wednesday, October 14, 2020 21:57
> > To: Slava Ovsiienko <viacheslavo@nvidia.com>
> > Cc: dpdk-dev <dev@dpdk.org>; NBU-Contact-Thomas Monjalon
> > <thomas@monjalon.net>; Stephen Hemminger
> > <stephen@networkplumber.org>; Ferruh Yigit <ferruh.yigit@intel.com>;
> > Olivier Matz <olivier.matz@6wind.com>; Maxime Coquelin
> > <maxime.coquelin@redhat.com>; David Marchand
> > <david.marchand@redhat.com>; Andrew Rybchenko
> > <arybchenko@solarflare.com>
> > Subject: Re: [PATCH v6 1/6] ethdev: introduce Rx buffer split
> >
> > On Wed, Oct 14, 2020 at 11:42 PM Viacheslav Ovsiienko
> > <viacheslavo@nvidia.com> wrote:
> > >
> > > The DPDK datapath in the transmit direction is very flexible.
> > > An application can build the multi-segment packet and manages almost
> > > all data aspects - the memory pools where segments are allocated from,
> > > the segment lengths, the memory attributes like external buffers,
> > > registered for DMA, etc.
> > >
>
> [..snip..]
>
> > > For example, let's suppose we configured the Rx queue with the
> > > following segments:
> > >     seg0 - pool0, len0=14B, off0=2
> > >     seg1 - pool1, len1=20B, off1=128B
> > >     seg2 - pool2, len2=20B, off2=0B
> > >     seg3 - pool3, len3=512B, off3=0B
> >
> >
> > Sorry for chiming in late. This API looks good to me.
> > But, I am wondering how the application can know the capability or "limits" of
> > struct rte_eth_rxseg structure for the specific PMD. The other descriptor limits
> > are exposed with struct rte_eth_dev_info::rx_desc_lim. If a PMD can
> > support a specific pattern rather than returning the blanket error, the
> > application should know the limit.
> > IMO, it is better to add
> > struct rte_eth_rxseg *rxsegs;
> > uint16_t nb_max_rxsegs
> > in rte_eth_dev_info structure to express the capability.
> > Where the len and offset can define the max offset.
> >
> > Thoughts?
>
> Moreover, a lot of various limitations might be implied - offsets might not be supported at all or
> might have alignment requirements, and similar requirements might apply to the segment size
> (say, some granularity). Currently it is not obvious how to report all the nuances, and it is supposed
> that limitations of this kind must be documented in the PMD chapter. As for mlx5 - it has no special
> limitations besides the common requirements for regular segments.

Reporting the limitations in the documentation will not help
generic applications.

>
> One more point - the split feature might be considered as just one of the possible cases of using
> these segment descriptions; other features might impose other (unknown for now) limitations.
> If we see other features of this kind, or other PMDs adopt the split feature, we'll try to find
> the common root and consider how to report it.

My only concern with that approach is another ABI break if
something needs to be exposed over rte_eth_dev_info().
IMO, the feature is complete only when its capabilities are
exposed in a programmatic manner.
As for mlx5, if there is no limitation, then report
rte_eth_dev_info::rxsegs[x].len, offset etc. as UINT16_MAX so
that the application is aware of the state.

>
> With best regards, Slava
>


* Re: [dpdk-dev] [PATCH v6 1/6] ethdev: introduce Rx buffer split
  2020-10-15  7:43       ` Slava Ovsiienko
  2020-10-15  9:27         ` Jerin Jacob
@ 2020-10-15  9:49         ` Andrew Rybchenko
  2020-10-15 10:34           ` Slava Ovsiienko
  1 sibling, 1 reply; 172+ messages in thread
From: Andrew Rybchenko @ 2020-10-15  9:49 UTC (permalink / raw)
  To: Slava Ovsiienko, Jerin Jacob
  Cc: dpdk-dev, NBU-Contact-Thomas Monjalon, Stephen Hemminger,
	Ferruh Yigit, Olivier Matz, Maxime Coquelin, David Marchand,
	Andrew Rybchenko

On 10/15/20 10:43 AM, Slava Ovsiienko wrote:
> Hi, Jerin
> 
>> -----Original Message-----
>> From: Jerin Jacob <jerinjacobk@gmail.com>
>> Sent: Wednesday, October 14, 2020 21:57
>> To: Slava Ovsiienko <viacheslavo@nvidia.com>
>> Cc: dpdk-dev <dev@dpdk.org>; NBU-Contact-Thomas Monjalon
>> <thomas@monjalon.net>; Stephen Hemminger
>> <stephen@networkplumber.org>; Ferruh Yigit <ferruh.yigit@intel.com>;
>> Olivier Matz <olivier.matz@6wind.com>; Maxime Coquelin
>> <maxime.coquelin@redhat.com>; David Marchand
>> <david.marchand@redhat.com>; Andrew Rybchenko
>> <arybchenko@solarflare.com>
>> Subject: Re: [PATCH v6 1/6] ethdev: introduce Rx buffer split
>>
>> On Wed, Oct 14, 2020 at 11:42 PM Viacheslav Ovsiienko
>> <viacheslavo@nvidia.com> wrote:
>>>
>>> The DPDK datapath in the transmit direction is very flexible.
>>> An application can build the multi-segment packet and manages almost
>>> all data aspects - the memory pools where segments are allocated from,
>>> the segment lengths, the memory attributes like external buffers,
>>> registered for DMA, etc.
>>>
> 
> [..snip..]
> 
>>> For example, let's suppose we configured the Rx queue with the
>>> following segments:
>>>     seg0 - pool0, len0=14B, off0=2
>>>     seg1 - pool1, len1=20B, off1=128B
>>>     seg2 - pool2, len2=20B, off2=0B
>>>     seg3 - pool3, len3=512B, off3=0B
>>
>>
>> Sorry for chiming in late. This API looks good to me.
>> But, I am wondering how the application can know the capability or "limits" of
>> struct rte_eth_rxseg structure for the specific PMD. The other descriptor limits
>> are exposed with struct rte_eth_dev_info::rx_desc_lim. If a PMD can
>> support a specific pattern rather than returning the blanket error, the
>> application should know the limit.
>> IMO, it is better to add
>> struct rte_eth_rxseg *rxsegs;
>> uint16_t nb_max_rxsegs
>> in rte_eth_dev_info structure to express the capability.
>> Where the len and offset can define the max offset.
>>
>> Thoughts?
> 
> Moreover, a lot of various limitations might be implied - offsets might not be supported at all or
> might have alignment requirements, and similar requirements might apply to the segment size
> (say, some granularity). Currently it is not obvious how to report all the nuances, and it is supposed
> that limitations of this kind must be documented in the PMD chapter. As for mlx5 - it has no special
> limitations besides the common requirements for regular segments.
> 
> One more point - the split feature might be considered as just one of the possible cases of using
> these segment descriptions; other features might impose other (unknown for now) limitations.
> If we see other features of this kind, or other PMDs adopt the split feature, we'll try to find
> the common root and consider how to report it.

At least there are a few simple limitations which are easy to
express:
 1. Maximum number of segments
 2. Possibility to use the last segment many times if required
    (I was suggesting to use scatter for it, but you rejected
     the idea - maybe it is time to reconsider :) )
 3. Maximum offset
    Frankly speaking, I'm not sure why it cannot be handled at
    the PMD level (i.e. provide descriptors with the offset taken
    into account, or guarantee that HW mempool objects are
    initialized correctly with the required headroom). Maybe it
    matters in some corner cases when the same HW mempool is
    shared by various segments with different offset requirements.
 4. Offset alignment
 5. Maximum/minimum length of a segment
 6. Length alignment

I realize that 3, 4 and 5 could be per segment number.
If it is really that complex, report the common denominator
which is guaranteed to work. If we have no checks at the ethdev
layer, the application can ignore it if it knows better.
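
A minimal sketch of how the six limits above could be reported and checked.
Everything here is hypothetical: struct rxseg_limits, struct rxseg_desc and
rxseg_check_limits() do not exist in ethdev, and the alignment fields are
assumed to mean "value must be a multiple of", with 0 or 1 meaning no
requirement.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical "common denominator" limits a PMD could report. */
struct rxseg_limits {
	uint16_t max_nseg;     /* 1. maximum number of segments */
	bool repeat_last;      /* 2. last segment may be used many times */
	uint16_t max_offset;   /* 3. maximum offset */
	uint16_t offset_align; /* 4. offset alignment, 0 or 1 = none */
	uint16_t min_length;   /* 5. minimum segment length */
	uint16_t max_length;   /* 5. maximum segment length */
	uint16_t length_align; /* 6. length alignment, 0 or 1 = none */
};

/* Stand-in for the length/offset part of struct rte_eth_rxseg. */
struct rxseg_desc {
	uint16_t length;
	uint16_t offset;
};

static bool
is_aligned(uint16_t value, uint16_t align)
{
	return align <= 1 || value % align == 0;
}

/* Return 0 if every segment fits the limits, -1 otherwise. */
static int
rxseg_check_limits(const struct rxseg_limits *lim,
		   const struct rxseg_desc *seg, uint16_t nseg)
{
	uint16_t i;

	if (nseg == 0 || nseg > lim->max_nseg)
		return -1;
	for (i = 0; i < nseg; i++) {
		if (seg[i].offset > lim->max_offset ||
		    !is_aligned(seg[i].offset, lim->offset_align))
			return -1;
		/* Zero length means "deduce from the pool": no range check. */
		if (seg[i].length == 0)
			continue;
		if (seg[i].length < lim->min_length ||
		    seg[i].length > lim->max_length ||
		    !is_aligned(seg[i].length, lim->length_align))
			return -1;
	}
	return 0;
}
```

The repeat_last flag is only carried as information here; an application
would consult it before relying on the last description being reused.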


* Re: [dpdk-dev] [PATCH v6 1/6] ethdev: introduce Rx buffer split
  2020-10-14 18:11   ` [dpdk-dev] [PATCH v6 1/6] " Viacheslav Ovsiienko
                       ` (2 preceding siblings ...)
  2020-10-14 22:50     ` Ajit Khaparde
@ 2020-10-15 10:11     ` Andrew Rybchenko
  2020-10-15 10:19       ` Thomas Monjalon
  3 siblings, 1 reply; 172+ messages in thread
From: Andrew Rybchenko @ 2020-10-15 10:11 UTC (permalink / raw)
  To: Viacheslav Ovsiienko, dev
  Cc: thomas, stephen, ferruh.yigit, olivier.matz, jerinjacobk,
	maxime.coquelin, david.marchand, arybchenko

On 10/14/20 9:11 PM, Viacheslav Ovsiienko wrote:
> The DPDK datapath in the transmit direction is very flexible.
> An application can build the multi-segment packet and manages
> almost all data aspects - the memory pools where segments
> are allocated from, the segment lengths, the memory attributes
> like external buffers, registered for DMA, etc.
> 
> In the receiving direction, the datapath is much less flexible,
> an application can only specify the memory pool to configure the
> receiving queue and nothing more. In order to extend receiving
> datapath capabilities it is proposed to add the way to provide
> extended information how to split the packets being received.
> 
> The following structure is introduced to specify the Rx packet
> segment:
> 
> struct rte_eth_rxseg {
>     struct rte_mempool *mp; /* memory pools to allocate segment from */
>     uint16_t length; /* segment maximal data length,
> 		       	configures "split point" */
>     uint16_t offset; /* data offset from beginning
> 		       	of mbuf data buffer */
>     uint32_t reserved; /* reserved field */
> };
> 
> The segment descriptions are added to the rte_eth_rxconf structure:
>    rx_seg - pointer the array of segment descriptions, each element
>              describes the memory pool, maximal data length, initial
>              data offset from the beginning of data buffer in mbuf.
> 	     This array allows to specify the different settings for
> 	     each segment in individual fashion.
>    rx_nseg - number of elements in the array
> 
> If the extended segment descriptions are provided with these new
> fields, the mp parameter of rte_eth_rx_queue_setup() must be
> specified as NULL to avoid ambiguity.
> 
> There are two options to specify the Rx buffer configuration:
> - mp is not NULL, rx_conf.rx_seg is NULL, rx_conf.rx_nseg is zero,
>   it is the compatible configuration, follows the existing implementation,
>   provides single pool and no description for segment sizes
>   and offsets.
> - mp is NULL, rx_conf.rx_seg is not NULL, rx_conf.rx_nseg is not
>   zero, it provides the extended configuration, individually for
>   each segment.
> 
> The new offload flag RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT in device
> capabilities is introduced to present the way for PMD to report to
> application about supporting Rx packet split to configurable
> segments. Prior invoking the rte_eth_rx_queue_setup() routine
> application should check RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT flag.
> 
> If the Rx queue is configured with new settings the packets being
> received will be split into multiple segments pushed to the mbufs
> with specified attributes. The PMD will split the received packets
> into multiple segments according to the specification in the
> description array:
> 
> - the first network buffer will be allocated from the memory pool
>   specified in the first segment description element, the second
>   network buffer from the pool in the second segment description
>   element, and so on. If there are not enough elements to describe
>   the buffers for an entire packet of maximal length, the pool from
>   the last valid element will be used to allocate the buffers for
>   the rest of the segments.
> 
> - the offsets from the segment description elements provide
>   the data offset from the buffer beginning, except for the first
>   mbuf: for this one the offset is added to RTE_PKTMBUF_HEADROOM to
>   get the actual offset from the buffer beginning. If there are not
>   enough elements to describe the buffers for an entire packet of
>   maximal length, the offsets for the rest of the segments are
>   assumed to be zero.
> 
> - the data length received into each segment is limited by the
>   length specified in the segment description element. Receiving
>   starts with filling up the first mbuf data buffer. If the
>   specified maximal segment length is reached and there is data
>   remaining (the packet is longer than the buffer in the first mbuf),
>   the following data will be pushed to the next segment up to its own
>   maximal length. If the first two segments are not enough to store
>   all the remaining packet data, the next (third) segment will
>   be engaged, and so on. If the length in the segment description
>   element is zero, the actual buffer size will be deduced from
>   the appropriate memory pool properties. If there are not enough
>   elements to describe the buffers for an entire packet of maximal
>   length, the buffer size will be deduced from the pool of the last
>   valid element for the remaining segments.
> 
> For example, let's suppose we configured the Rx queue with the
> following segments:
>     seg0 - pool0, len0=14B, off0=2
>     seg1 - pool1, len1=20B, off1=128B
>     seg2 - pool2, len2=20B, off2=0B
>     seg3 - pool3, len3=512B, off3=0B
> 
> The packet 46 bytes long will look like the following:
>     seg0 - 14B long @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from pool0
>     seg1 - 20B long @ 128 in mbuf from pool1
>     seg2 - 12B long @ 0 in mbuf from pool2
> 
> The packet 1500 bytes long will look like the following:
>     seg0 - 14B @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from pool0
>     seg1 - 20B @ 128 in mbuf from pool1
>     seg2 - 20B @ 0 in mbuf from pool2
>     seg3 - 512B @ 0 in mbuf from pool3
>     seg4 - 512B @ 0 in mbuf from pool3
>     seg5 - 422B @ 0 in mbuf from pool3
> 
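
The two layouts above can be reproduced with a small helper. This is only an
illustrative sketch: split_lengths() is a hypothetical name, it models the
lengths alone (no pools or offsets), and it assumes nseg >= 1 and non-zero
lengths, so the "deduce from the pool" case is not covered.

```c
#include <stdint.h>

/*
 * Fill out[] with the data length placed in each mbuf for a packet of
 * pkt_len bytes; returns the number of mbufs used (at most max_out).
 */
static unsigned int
split_lengths(const uint16_t *seg_len, unsigned int nseg,
	      uint32_t pkt_len, uint32_t *out, unsigned int max_out)
{
	unsigned int n = 0;

	while (pkt_len > 0 && n < max_out) {
		/* Past the end of the array the last element is reused. */
		uint32_t cap = seg_len[n < nseg ? n : nseg - 1];
		uint32_t chunk = pkt_len < cap ? pkt_len : cap;

		out[n++] = chunk;
		pkt_len -= chunk;
	}
	return n;
}
```

With lengths {14, 20, 20, 512} a 46-byte packet uses three mbufs and
a 1500-byte packet uses six, the last one holding 422 bytes.
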
> The offload RTE_ETH_RX_OFFLOAD_SCATTER must be present and
> configured to support new buffer split feature (if rx_nseg
> is greater than one).
> 
> The new approach would allow splitting the ingress packets into
> multiple parts pushed to the memory with different attributes.
> For example, the packet headers can be pushed to the embedded
> data buffers within mbufs and the application data into
> the external buffers attached to mbufs allocated from the
> different memory pools. The memory attributes for the split
> parts may differ either - for example the application data
> may be pushed into the external memory located on the dedicated
> physical device, say GPU or NVMe. This would improve the DPDK
> receiving datapath flexibility with preserving compatibility
> with existing API.
> 
> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
> ---
>  doc/guides/nics/features.rst           |  15 +++++
>  doc/guides/rel_notes/deprecation.rst   |   5 --
>  doc/guides/rel_notes/release_20_11.rst |   9 +++
>  lib/librte_ethdev/rte_ethdev.c         | 111 +++++++++++++++++++++++++--------
>  lib/librte_ethdev/rte_ethdev.h         |  62 +++++++++++++++++-
>  5 files changed, 171 insertions(+), 31 deletions(-)
> 
> diff --git a/doc/guides/nics/features.rst b/doc/guides/nics/features.rst
> index dd8c955..832ea3b 100644
> --- a/doc/guides/nics/features.rst
> +++ b/doc/guides/nics/features.rst
> @@ -185,6 +185,21 @@ Supports receiving segmented mbufs.
>  * **[related]    eth_dev_ops**: ``rx_pkt_burst``.
>  
>  
> +.. _nic_features_buffer_split:
> +
> +Buffer Split on Rx
> +------------------
> +
> +Scatters the packets being received on specified boundaries to segmented mbufs.
> +
> +* **[uses]       rte_eth_rxconf,rte_eth_rxmode**: ``offloads:RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT``.
> +* **[uses]       rte_eth_rxconf**: ``rx_conf.rx_seg, rx_conf.rx_nseg``.
> +* **[implements] datapath**: ``Buffer Split functionality``.
> +* **[provides]   rte_eth_dev_info**: ``rx_offload_capa:RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT``.
> +* **[provides]   eth_dev_ops**: ``rxq_info_get:buffer_split``.
> +* **[related] API**: ``rte_eth_rx_queue_setup()``.
> +
> +
>  .. _nic_features_lro:
>  
>  LRO
> diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
> index 584e720..232cd54 100644
> --- a/doc/guides/rel_notes/deprecation.rst
> +++ b/doc/guides/rel_notes/deprecation.rst
> @@ -138,11 +138,6 @@ Deprecation Notices
>    In 19.11 PMDs will still update the field even when the offload is not
>    enabled.
>  
> -* ethdev: Add new fields to ``rte_eth_rxconf`` to configure the receiving
> -  queues to split ingress packets into multiple segments according to the
> -  specified lengths into the buffers allocated from the specified
> -  memory pools. The backward compatibility to existing API is preserved.
> -
>  * ethdev: ``rx_descriptor_done`` dev_ops and ``rte_eth_rx_descriptor_done``
>    will be removed in 21.11.
>    Existing ``rte_eth_rx_descriptor_status`` and ``rte_eth_tx_descriptor_status``
> diff --git a/doc/guides/rel_notes/release_20_11.rst b/doc/guides/rel_notes/release_20_11.rst
> index bcc0fc2..bcc2479 100644
> --- a/doc/guides/rel_notes/release_20_11.rst
> +++ b/doc/guides/rel_notes/release_20_11.rst
> @@ -60,6 +60,12 @@ New Features
>    Added the FEC API which provides functions for query FEC capabilities and
>    current FEC mode from device. Also, API for configuring FEC mode is also provided.
>  
> +* **Introduced extended buffer description for receiving.**
> +
> +  Added the extended Rx buffer description for Rx queue setup routine
> +  providing the individual settings for each Rx segment with maximal size,
> +  buffer offset and memory pool to allocate data buffers from.
> +
>  * **Updated Broadcom bnxt driver.**
>  
>    Updated the Broadcom bnxt driver with new features and improvements, including:
> @@ -253,6 +259,9 @@ API Changes
>    As the data of ``uint8_t`` will be truncated when queue number under
>    a TC is greater than 256.
>  
> +* ethdev: Added fields rx_seg and rx_nseg to rte_eth_rxconf structure
> +  to provide extended description of the receiving buffer.
> +
>  * vhost: Moved vDPA APIs from experimental to stable.
>  
>  * rawdev: Added a structure size parameter to the functions
> diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
> index 892c246..96ecb91 100644
> --- a/lib/librte_ethdev/rte_ethdev.c
> +++ b/lib/librte_ethdev/rte_ethdev.c
> @@ -105,6 +105,9 @@ struct rte_eth_xstats_name_off {
>  #define RTE_RX_OFFLOAD_BIT2STR(_name)	\
>  	{ DEV_RX_OFFLOAD_##_name, #_name }
>  
> +#define RTE_ETH_RX_OFFLOAD_BIT2STR(_name)	\
> +	{ RTE_ETH_RX_OFFLOAD_##_name, #_name }
> +
>  static const struct {
>  	uint64_t offload;
>  	const char *name;
> @@ -128,9 +131,11 @@ struct rte_eth_xstats_name_off {
>  	RTE_RX_OFFLOAD_BIT2STR(SCTP_CKSUM),
>  	RTE_RX_OFFLOAD_BIT2STR(OUTER_UDP_CKSUM),
>  	RTE_RX_OFFLOAD_BIT2STR(RSS_HASH),
> +	RTE_ETH_RX_OFFLOAD_BIT2STR(BUFFER_SPLIT),
>  };
>  
>  #undef RTE_RX_OFFLOAD_BIT2STR
> +#undef RTE_ETH_RX_OFFLOAD_BIT2STR
>  
>  #define RTE_TX_OFFLOAD_BIT2STR(_name)	\
>  	{ DEV_TX_OFFLOAD_##_name, #_name }
> @@ -1784,38 +1789,94 @@ struct rte_eth_dev *
>  		return -EINVAL;
>  	}
>  
> -	if (mp == NULL) {
> -		RTE_ETHDEV_LOG(ERR, "Invalid null mempool pointer\n");
> -		return -EINVAL;
> -	}
> -
>  	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_setup, -ENOTSUP);
>  
> -	/*
> -	 * Check the size of the mbuf data buffer.
> -	 * This value must be provided in the private data of the memory pool.
> -	 * First check that the memory pool has a valid private data.
> -	 */
>  	ret = rte_eth_dev_info_get(port_id, &dev_info);
>  	if (ret != 0)
>  		return ret;
>  
> -	if (mp->private_data_size < sizeof(struct rte_pktmbuf_pool_private)) {
> -		RTE_ETHDEV_LOG(ERR, "%s private_data_size %d < %d\n",
> -			mp->name, (int)mp->private_data_size,
> -			(int)sizeof(struct rte_pktmbuf_pool_private));
> -		return -ENOSPC;
> -	}
> -	mbp_buf_size = rte_pktmbuf_data_room_size(mp);
> +	if (mp) {

Please, compare vs NULL [1]

[1] https://doc.dpdk.org/guides/contributing/coding_style.html#null-pointers

> +		/* Single pool configuration check. */
> +		if (rx_conf->rx_seg || rx_conf->rx_nseg) {

Please, compare vs NULL and 0. IMHO, the rx_nseg check is sufficient. If it
is 0, nobody cares what is in rx_seg.

> +			RTE_ETHDEV_LOG(ERR,
> +				       "Ambiguous segment configuration\n");
> +			return -EINVAL;
> +		}
> +		/*
> +		 * Check the size of the mbuf data buffer, this value
> +		 * must be provided in the private data of the memory pool.
> +		 * First check that the memory pool(s) has a valid private data.
> +		 */
> +		if (mp->private_data_size <
> +				sizeof(struct rte_pktmbuf_pool_private)) {
> +			RTE_ETHDEV_LOG(ERR, "%s private_data_size %u < %u\n",
> +				mp->name, mp->private_data_size,
> +				(unsigned int)
> +				sizeof(struct rte_pktmbuf_pool_private));
> +			return -ENOSPC;
> +		}
> +		mbp_buf_size = rte_pktmbuf_data_room_size(mp);
> +		if (mbp_buf_size < dev_info.min_rx_bufsize +
> +				   RTE_PKTMBUF_HEADROOM) {
> +			RTE_ETHDEV_LOG(ERR,
> +				       "%s mbuf_data_room_size %u < %u"
> +				       " (RTE_PKTMBUF_HEADROOM=%u +"
> +				       " min_rx_bufsize(dev)=%u)\n",

Do not split format string. It is not a problem that it is long.

> +				       mp->name, mbp_buf_size,
> +				       RTE_PKTMBUF_HEADROOM +
> +				       dev_info.min_rx_bufsize,
> +				       RTE_PKTMBUF_HEADROOM,
> +				       dev_info.min_rx_bufsize);
> +			return -EINVAL;
> +		}
> +	} else {
> +		const struct rte_eth_rxseg *rx_seg = rx_conf->rx_seg;
> +		uint16_t n_seg = rx_conf->rx_nseg;
> +		uint16_t seg_idx;
>  
> -	if (mbp_buf_size < dev_info.min_rx_bufsize + RTE_PKTMBUF_HEADROOM) {
> -		RTE_ETHDEV_LOG(ERR,
> -			"%s mbuf_data_room_size %d < %d (RTE_PKTMBUF_HEADROOM=%d + min_rx_bufsize(dev)=%d)\n",
> -			mp->name, (int)mbp_buf_size,
> -			(int)(RTE_PKTMBUF_HEADROOM + dev_info.min_rx_bufsize),
> -			(int)RTE_PKTMBUF_HEADROOM,
> -			(int)dev_info.min_rx_bufsize);
> -		return -EINVAL;
> +		/* Extended multi-segment configuration check. */
> +		if (!rx_conf->rx_seg || !rx_conf->rx_nseg) {

Please, compare vs NULL and 0

> +			RTE_ETHDEV_LOG(ERR,
> +				       "Memory pool is null and no"
> +				       " extended configuration provided\n");

Do not split format string. It is not a problem that it is long.


> +			return -EINVAL;
> +		}
> +		/*
> +		 * Check the sizes and offsets against buffer sizes
> +		 * for each segment specified in extended configuration.
> +		 */
> +		for (seg_idx = 0; seg_idx < n_seg; seg_idx++) {
> +			struct rte_mempool *mpl = rx_seg[seg_idx].mp;
> +			uint32_t length = rx_seg[seg_idx].length;
> +			uint32_t offset = rx_seg[seg_idx].offset;
> +			uint32_t head_room = seg_idx ? 0 : RTE_PKTMBUF_HEADROOM;

Why? Shouldn't it be in offset? IMHO too many offsets this way.

> +
> +			if (mpl == NULL) {
> +				RTE_ETHDEV_LOG(ERR, "null mempool pointer\n");
> +				return -EINVAL;
> +			}
> +			if (mpl->private_data_size <
> +				sizeof(struct rte_pktmbuf_pool_private)) {
> +				RTE_ETHDEV_LOG(ERR,
> +					       "%s private_data_size %u < %u\n",
> +					       mpl->name,
> +					       mpl->private_data_size,
> +					       (unsigned int)sizeof(struct
> +					       rte_pktmbuf_pool_private));
> +				return -ENOSPC;
> +			}
> +			mbp_buf_size = rte_pktmbuf_data_room_size(mpl);
> +			length = length ? length : (mbp_buf_size - head_room);

Compare length with 0.
What ensures that mbp_buf_size is greater than or equal to head_room?
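
To illustrate the concern: all the values are unsigned, so if the pool's data
room is smaller than the headroom, the subtraction wraps around to a huge
length. A minimal sketch of a guarded version follows; resolve_seg_length()
is a hypothetical helper, and unlike the patch it also subtracts the offset
when deducing a zero length, which is an assumption on my side.

```c
#include <stdint.h>

/*
 * Resolve the effective segment length. Returns 0 on success,
 * -1 if the buffer cannot hold head_room + offset + length.
 */
static int
resolve_seg_length(uint32_t buf_size, uint32_t head_room, uint32_t offset,
		   uint32_t length, uint32_t *out)
{
	/* Without this check buf_size - head_room - offset wraps around. */
	if (buf_size < head_room || buf_size - head_room < offset)
		return -1;
	/* Zero length: take whatever room is left in the buffer. */
	if (length == 0)
		length = buf_size - head_room - offset;
	if (length > buf_size - head_room - offset)
		return -1;
	*out = length;
	return 0;
}
```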

> +			if (mbp_buf_size < length + offset + head_room) {
> +				RTE_ETHDEV_LOG(ERR,
> +					"%s mbuf_data_room_size %u < %u"
> +					" (segment length=%u +"
> +					" segment offset=%u)\n",

Do not split format string. It is not a problem that it is long.


> +					mpl->name, mbp_buf_size,
> +					length + offset, length, offset);
> +				return -EINVAL;
> +			}
> +		}
>  	}
>  
>  	/* Use default specified by driver, if nb_rx_desc is zero */
> diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
> index 5bcfbb8..e019f4a 100644
> --- a/lib/librte_ethdev/rte_ethdev.h
> +++ b/lib/librte_ethdev/rte_ethdev.h
> @@ -970,6 +970,16 @@ struct rte_eth_txmode {
>  };
>  
>  /**
> + * A structure used to configure an RX packet segment to split.

RX -> Rx

> + */
> +struct rte_eth_rxseg {
> +	struct rte_mempool *mp; /**< Memory pool to allocate segment from. */
> +	uint16_t length; /**< Segment data length, configures split point. */
> +	uint16_t offset; /**< Data offset from beginning of mbuf data buffer. */
> +	uint32_t reserved; /**< Reserved field. */
> +};
> +
> +/**
>   * A structure used to configure an RX ring of an Ethernet port.
>   */
>  struct rte_eth_rxconf {
> @@ -977,6 +987,43 @@ struct rte_eth_rxconf {
>  	uint16_t rx_free_thresh; /**< Drives the freeing of RX descriptors. */
>  	uint8_t rx_drop_en; /**< Drop packets if no descriptors are available. */
>  	uint8_t rx_deferred_start; /**< Do not start queue with rte_eth_dev_start(). */
> +	uint16_t rx_nseg; /**< Number of descriptions in rx_seg array. */
> +	/**
> +	 * Points to the array of segment descriptions. Each array element
> +	 * describes the properties for each segment in the receiving
> +	 * buffer.
> +	 *
> +	 * If RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT flag is set in offloads field,
> +	 * the PMD will split the received packets into multiple segments
> +	 * according to the specification in the description array:
> +	 *
> +	 * - the first network buffer will be allocated from the memory pool,
> +	 *   specified in the first array element, the second buffer, from the
> +	 *   pool in the second element, and so on.
> +	 *
> +	 * - the offsets from the segment description elements specify
> +	 *   the data offset from the buffer beginning except the first mbuf.
> +	 *   For this one the offset is added with RTE_PKTMBUF_HEADROOM.
> +	 *
> +	 * - the lengthes in the elements define the maximal data amount
> +	 *   being received to each segment. The receiving starts with filling
> +	 *   up the first mbuf data buffer up to specified length. If the
> +	 *   there are data remaining (packet is longer than buffer in the first
> +	 *   mbuf) the following data will be pushed to the next segment
> +	 *   up to its own length, and so on.
> +	 *
> +	 * - If the length in the segment description element is zero
> +	 *   the actual buffer size will be deduced from the appropriate
> +	 *   memory pool properties.
> +	 *
> +	 * - if there is not enough elements to describe the buffer for entire
> +	 *   packet of maximal length the following parameters will be used
> +	 *   for the all remaining segments:
> +	 *     - pool from the last valid element
> +	 *     - the buffer size from this pool
> +	 *     - zero offset
> +	 */
> +	struct rte_eth_rxseg *rx_seg;
>  	/**
>  	 * Per-queue Rx offloads to be set using DEV_RX_OFFLOAD_* flags.
>  	 * Only offloads set on rx_queue_offload_capa or rx_offload_capa
> @@ -1260,6 +1307,7 @@ struct rte_eth_conf {
>  #define DEV_RX_OFFLOAD_SCTP_CKSUM	0x00020000
>  #define DEV_RX_OFFLOAD_OUTER_UDP_CKSUM  0x00040000
>  #define DEV_RX_OFFLOAD_RSS_HASH		0x00080000
> +#define RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT 0x00100000
>  
>  #define DEV_RX_OFFLOAD_CHECKSUM (DEV_RX_OFFLOAD_IPV4_CKSUM | \
>  				 DEV_RX_OFFLOAD_UDP_CKSUM | \
> @@ -2027,9 +2075,21 @@ int rte_eth_dev_configure(uint16_t port_id, uint16_t nb_rx_queue,
>   *   No need to repeat any bit in rx_conf->offloads which has already been
>   *   enabled in rte_eth_dev_configure() at port level. An offloading enabled
>   *   at port level can't be disabled at queue level.
> + *   The configuration structure also contains the pointer to the array
> + *   of the receiving buffer segment descriptions, see rx_seg and rx_nseg
> + *   fields, this extended configuration might be used by split offloads like
> + *   RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT. If mp_pool is not NULL,
> + *   the extended configuration fields must be set to NULL and zero.
>   * @param mb_pool
>   *   The pointer to the memory pool from which to allocate *rte_mbuf* network
> - *   memory buffers to populate each descriptor of the receive ring.
> + *   memory buffers to populate each descriptor of the receive ring. There are
> + *   two options to provide Rx buffer configuration:
> + *   - single pool:
> + *     mb_pool is not NULL, rx_conf.rx_seg is NULL, rx_conf.rx_nseg is 0.
> + *   - multiple segments description:
> + *     mb_pool is NULL, rx_conf.rx_seg is not NULL, rx_conf.rx_nseg is not 0.
> + *     Taken only if flag RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT is set in offloads.
> + *
>   * @return
>   *   - 0: Success, receive queue correctly set up.
>   *   - -EIO: if device is removed.
> 



* Re: [dpdk-dev] [PATCH v6 1/6] ethdev: introduce Rx buffer split
  2020-10-15 10:11     ` Andrew Rybchenko
@ 2020-10-15 10:19       ` Thomas Monjalon
  0 siblings, 0 replies; 172+ messages in thread
From: Thomas Monjalon @ 2020-10-15 10:19 UTC (permalink / raw)
  To: Viacheslav Ovsiienko, Andrew Rybchenko
  Cc: dev, stephen, ferruh.yigit, olivier.matz, jerinjacobk,
	maxime.coquelin, david.marchand, arybchenko

15/10/2020 12:11, Andrew Rybchenko:
> On 10/14/20 9:11 PM, Viacheslav Ovsiienko wrote:
> > +		/* Single pool configuration check. */
> > +		if (rx_conf->rx_seg || rx_conf->rx_nseg) {
> 
> Please, compare vs NULL and 0. IMHO, rx_nsegs check is sufficient. If it
> is 0, nobody cares what is in rx_seg.

Yes, the pointer should not be a criterion.
Having more than zero items is enough to check.
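
A possible shape of that check, as a sketch only: enum rxq_buf_cfg and
classify_rxq_cfg() are hypothetical names, and plain void pointers stand in
for the mempool and the rte_eth_rxseg array.

```c
#include <stddef.h>
#include <stdint.h>

enum rxq_buf_cfg {
	RXQ_CFG_INVALID,     /* ambiguous or missing configuration */
	RXQ_CFG_SINGLE_POOL, /* legacy: one mempool, no segment array */
	RXQ_CFG_SEGMENTS,    /* extended: per-segment descriptions */
};

static enum rxq_buf_cfg
classify_rxq_cfg(const void *mp, const void *rx_seg, uint16_t rx_nseg)
{
	if (mp != NULL)
		/* rx_seg is ignored here: zero items means nothing to read. */
		return rx_nseg == 0 ? RXQ_CFG_SINGLE_POOL : RXQ_CFG_INVALID;
	/* No pool: the array must exist and be non-empty. */
	if (rx_nseg != 0 && rx_seg != NULL)
		return RXQ_CFG_SEGMENTS;
	return RXQ_CFG_INVALID;
}
```

In the single pool case only the item count decides; the pointer is still
checked in the extended case because it is about to be dereferenced.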

[...]
> > +			RTE_ETHDEV_LOG(ERR,
> > +				       "%s mbuf_data_room_size %u < %u"
> > +				       " (RTE_PKTMBUF_HEADROOM=%u +"
> > +				       " min_rx_bufsize(dev)=%u)\n",
> 
> Do not split format string. It is not a problem that it is long.

The benefit of keeping format string on the same line is for "grepping"
the source code. But after a format specifier, I think we can split.
Who is grepping "< %u (RTE_PKTMBUF_HEADROOM" ?
I would just change the split on the second line after the %u.
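
Something like the sketch below, where each literal ends right after a format
specifier and the greppable words stay together. This is one reading of the
suggestion; format_room_error() and HEADROOM are stand-ins so the snippet
compiles without DPDK headers.

```c
#include <stdio.h>

#define HEADROOM 128u /* stand-in for RTE_PKTMBUF_HEADROOM */

static int
format_room_error(char *buf, size_t len, const char *pool,
		  unsigned int room, unsigned int min_rx)
{
	/* Adjacent string literals are concatenated by the compiler. */
	return snprintf(buf, len,
			"%s mbuf_data_room_size %u < %u"
			" (RTE_PKTMBUF_HEADROOM=%u"
			" + min_rx_bufsize(dev)=%u)\n",
			pool, room, HEADROOM + min_rx, HEADROOM, min_rx);
}
```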





* Re: [dpdk-dev] [PATCH v6 1/6] ethdev: introduce Rx buffer split
  2020-10-15  9:27         ` Jerin Jacob
@ 2020-10-15 10:27           ` Jerin Jacob
  2020-10-15 10:51             ` Slava Ovsiienko
  0 siblings, 1 reply; 172+ messages in thread
From: Jerin Jacob @ 2020-10-15 10:27 UTC (permalink / raw)
  To: Slava Ovsiienko
  Cc: dpdk-dev, NBU-Contact-Thomas Monjalon, Stephen Hemminger,
	Ferruh Yigit, Olivier Matz, Maxime Coquelin, David Marchand,
	Andrew Rybchenko

On Thu, Oct 15, 2020 at 2:57 PM Jerin Jacob <jerinjacobk@gmail.com> wrote:
>
> On Thu, Oct 15, 2020 at 1:13 PM Slava Ovsiienko <viacheslavo@nvidia.com> wrote:
> >
> > Hi, Jerin
>
> Hi Slava,
>
> >
> > > -----Original Message-----
> > > From: Jerin Jacob <jerinjacobk@gmail.com>
> > > Sent: Wednesday, October 14, 2020 21:57
> > > To: Slava Ovsiienko <viacheslavo@nvidia.com>
> > > Cc: dpdk-dev <dev@dpdk.org>; NBU-Contact-Thomas Monjalon
> > > <thomas@monjalon.net>; Stephen Hemminger
> > > <stephen@networkplumber.org>; Ferruh Yigit <ferruh.yigit@intel.com>;
> > > Olivier Matz <olivier.matz@6wind.com>; Maxime Coquelin
> > > <maxime.coquelin@redhat.com>; David Marchand
> > > <david.marchand@redhat.com>; Andrew Rybchenko
> > > <arybchenko@solarflare.com>
> > > Subject: Re: [PATCH v6 1/6] ethdev: introduce Rx buffer split
> > >
> > > On Wed, Oct 14, 2020 at 11:42 PM Viacheslav Ovsiienko
> > > <viacheslavo@nvidia.com> wrote:
> > > >
> > > > The DPDK datapath in the transmit direction is very flexible.
> > > > An application can build the multi-segment packet and manages almost
> > > > all data aspects - the memory pools where segments are allocated from,
> > > > the segment lengths, the memory attributes like external buffers,
> > > > registered for DMA, etc.
> > > >
> >
> > [..snip..]
> >
> > > > For example, let's suppose we configured the Rx queue with the
> > > > following segments:
> > > >     seg0 - pool0, len0=14B, off0=2
> > > >     seg1 - pool1, len1=20B, off1=128B
> > > >     seg2 - pool2, len2=20B, off2=0B
> > > >     seg3 - pool3, len3=512B, off3=0B
> > >
> > >
> > > Sorry for chime in late. This API lookout looks good to me.
> > > But, I am wondering how the application can know the capability or "limits" of
> > > struct rte_eth_rxseg structure for the specific PMD. The other descriptor limit,
> > > it's being exposed with struct rte_eth_dev_info::rx_desc_lim; If PMD can
> > > support a specific pattern rather than returning the blanket error, the
> > > application should know the limit.
> > > IMO, it is better to add
> > > struct rte_eth_rxseg *rxsegs;
> > > uint16_t nb_max_rxsegs
> > > in rte_eth_dev_info structure to express the capability.
> > > Where the en and offset can define the max offset.
> > >
> > > Thoughts?
> >
> > Moreover, there might be implied a lot of various limitations - offsets might be not supported at all or
> > have some requirements for alignment, the similar requirements might be applied to segment size
> > (say, ask for some granularity). Currently it is not obvious how to report all nuances, and it is supposed
> > the limitations of this kind must be documented in PMD chapter. As for mlx5 - it has no special
> > limitations besides common requirements to the regular segments.
>
> Reporting the limitation in the documentation will not help for the
> generic applications.
>
> >
> > One more point - the split feature might be considered as just one of possible cases of using
> > these segment descriptions, other features might impose other (unknown for now) limitations.

Also, I agree that we will have multiple use cases with segment descriptors.
In order to make the API definition future proof, it is better to change
from:
struct rte_eth_rxseg {
    struct rte_mempool *mp; /**< Memory pool to allocate segment from. */
    uint16_t length; /**< Segment data length, configures split point. */
    uint16_t offset; /**< Data offset from beginning of mbuf data buffer. */
    uint32_t reserved; /**< Reserved field. */
};
to something like below:

struct rte_eth_rxseg {
    enum rte_eth_rxseg_mode mode;
    union {
        /* "xxx" is a placeholder for the mode name. */
        struct rte_eth_rxseg_mode_xxx {
            struct rte_mempool *mp; /**< Memory pool to allocate segment from. */
            uint16_t length; /**< Segment data length, configures split point. */
            uint16_t offset; /**< Data offset from beginning of mbuf data buffer. */
            uint32_t reserved; /**< Reserved field. */
        } xxx;
    };
};

Another mode, which the Marvell PMD has (I believe Intel also), is
selecting the pool by packet size, i.e. when we say:

seg0 - pool0, len0=2000B, off0=0
seg1 - pool1, len1=2001B, off1=0

packets of up to 2000B go to pool0 and packets of 2001B or more go to pool1.
I think it is better to have a mode param in rte_eth_rxseg to avoid
ABI changes (just like the clean rte_flow APIs).
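
A rough sketch of that size-based mode, to make the difference from the split
mode explicit: the whole packet lands in one pool chosen by its length. The
names struct size_seg and select_pool_by_size() are made up for illustration,
pools are represented by plain indexes, and the descriptions are assumed to
be sorted by ascending length with the last one catching all larger packets.

```c
#include <stdint.h>

/* Stand-in for a segment description; the pool is just an index here. */
struct size_seg {
	uint16_t pool_id;
	uint16_t length; /* largest packet this pool is meant for */
};

/* Return the pool for the whole packet, or -1 if nseg is zero. */
static int
select_pool_by_size(const struct size_seg *seg, unsigned int nseg,
		    uint32_t pkt_len)
{
	unsigned int i;

	if (nseg == 0)
		return -1;
	for (i = 0; i < nseg; i++)
		if (pkt_len <= seg[i].length)
			return seg[i].pool_id;
	/* Larger than every threshold: fall back to the last pool. */
	return seg[nseg - 1].pool_id;
}
```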

> > If we see some of the features of such kind or other PMDs adopts the split feature - we'll try to find
> > the common root and consider the way how to report it.
>
> My only concern with that approach is another ABI break if
> something needs to be exposed over rte_eth_dev_info().
> IMO, a feature is complete only when its capabilities are
> exposed in a programmatic manner.
> As for mlx5, if there is no limitation, then report
> rte_eth_dev_info::rxsegs[x].len, offset etc. as UINT16_MAX so
> that the application is aware of the state.
>
> >
> > With best regards, Slava
> >


* Re: [dpdk-dev] [PATCH v6 1/6] ethdev: introduce Rx buffer split
  2020-10-15  9:49         ` Andrew Rybchenko
@ 2020-10-15 10:34           ` Slava Ovsiienko
  2020-10-15 11:09             ` Andrew Rybchenko
  0 siblings, 1 reply; 172+ messages in thread
From: Slava Ovsiienko @ 2020-10-15 10:34 UTC (permalink / raw)
  To: Andrew Rybchenko, Jerin Jacob
  Cc: dpdk-dev, NBU-Contact-Thomas Monjalon, Stephen Hemminger,
	Ferruh Yigit, Olivier Matz, Maxime Coquelin, David Marchand,
	Andrew Rybchenko

Hi, Andrew

> -----Original Message-----
> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> Sent: Thursday, October 15, 2020 12:49
> To: Slava Ovsiienko <viacheslavo@nvidia.com>; Jerin Jacob
> <jerinjacobk@gmail.com>
> Cc: dpdk-dev <dev@dpdk.org>; NBU-Contact-Thomas Monjalon
> <thomas@monjalon.net>; Stephen Hemminger
> <stephen@networkplumber.org>; Ferruh Yigit <ferruh.yigit@intel.com>;
> Olivier Matz <olivier.matz@6wind.com>; Maxime Coquelin
> <maxime.coquelin@redhat.com>; David Marchand
> <david.marchand@redhat.com>; Andrew Rybchenko
> <arybchenko@solarflare.com>
> Subject: Re: [dpdk-dev] [PATCH v6 1/6] ethdev: introduce Rx buffer split
> 
> On 10/15/20 10:43 AM, Slava Ovsiienko wrote:
> > Hi, Jerin
> >
> >> -----Original Message-----
> >> From: Jerin Jacob <jerinjacobk@gmail.com>
> >> Sent: Wednesday, October 14, 2020 21:57
> >> To: Slava Ovsiienko <viacheslavo@nvidia.com>
> >> Cc: dpdk-dev <dev@dpdk.org>; NBU-Contact-Thomas Monjalon
> >> <thomas@monjalon.net>; Stephen Hemminger
> >> <stephen@networkplumber.org>; Ferruh Yigit <ferruh.yigit@intel.com>;
> >> Olivier Matz <olivier.matz@6wind.com>; Maxime Coquelin
> >> <maxime.coquelin@redhat.com>; David Marchand
> >> <david.marchand@redhat.com>; Andrew Rybchenko
> >> <arybchenko@solarflare.com>
> >> Subject: Re: [PATCH v6 1/6] ethdev: introduce Rx buffer split
> >>
> >> On Wed, Oct 14, 2020 at 11:42 PM Viacheslav Ovsiienko
> >> <viacheslavo@nvidia.com> wrote:
> >>>
> >>> The DPDK datapath in the transmit direction is very flexible.
> >>> An application can build the multi-segment packet and manages almost
> >>> all data aspects - the memory pools where segments are allocated
> >>> from, the segment lengths, the memory attributes like external
> >>> buffers, registered for DMA, etc.
> >>>
> >
> > [..snip..]
> >
> >>> For example, let's suppose we configured the Rx queue with the
> >>> following segments:
> >>>     seg0 - pool0, len0=14B, off0=2
> >>>     seg1 - pool1, len1=20B, off1=128B
> >>>     seg2 - pool2, len2=20B, off2=0B
> >>>     seg3 - pool3, len3=512B, off3=0B
> >>
> >>
> >> Sorry for chime in late. This API lookout looks good to me.
> >> But, I am wondering how the application can know the capability or
> >> "limits" of struct rte_eth_rxseg structure for the specific PMD. The
> >> other descriptor limit, it's being exposed with struct
> >> rte_eth_dev_info::rx_desc_lim; If PMD can support a specific pattern
> >> rather than returning the blanket error, the application should know the
> >> limit.
> >> IMO, it is better to add
> >> struct rte_eth_rxseg *rxsegs;
> >> uint16_t nb_max_rxsegs
> >> in rte_eth_dev_info structure to express the capability.
> >> Where the len and offset can define the max offset.
> >>
> >> Thoughts?
> >
> > Moreover, there might be implied a lot of various limitations -
> > offsets might be not supported at all or have some requirements for
> > alignment, the similar requirements might be applied to segment size
> > (say, ask for some granularity). Currently it is not obvious how to
> > report all nuances, and it is supposed the limitations of this kind
> > must be documented in PMD chapter. As for mlx5 - it has no special
> > limitations besides common requirements to the regular segments.
> >
> > One more point - the split feature might be considered as just one of
> > possible cases of using these segment descriptions, other features might
> > impose other (unknown for now) limitations.
> > If we see some of the features of such kind or other PMDs adopts the
> > split feature - we'll try to find the common root and consider the way
> > how to report it.
> 
> At least there are few simple limitations which are easy to
> express:
>  1. Maximum number of segments
We already have the scatter capability, and we do not report the maximum number
of segments for it; that is left to the PMD. We could add a field to
rte_eth_dev_info, but I am not sure we have anything special to report there,
even for the mlx5 case.


>  2. Possibility to use the last segment many times if required
>     (I was suggesting to use scatter for it, but you rejected
>      the idea - may be time to reconsider :) ) 

Hmm, sorry, I do not follow; I may have misunderstood or missed your idea.
Some attributes of the last segment are already used multiple times to scatter
the rest of the data, in a fashion very close to the existing scattering
approach: at least the pool, and the buffer size from that pool, are reused.
The beginning of the packet is scattered according to the new descriptions,
and the rest of the packet according to the existing regular scattering, with
the pool settings taken from the last segment description.
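
To illustrate the intended behaviour, here is a minimal sketch in plain C (not
PMD code). The names seg_desc and split_layout are hypothetical, and the sketch
assumes that the remainder of the packet is cut into chunks of the last
segment's length, which is a simplification of "buffer size from this pool":

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for one Rx segment description (length only). */
struct seg_desc {
	uint16_t length; /* maximal data length of the segment */
};

/*
 * Fill out[] with the number of bytes placed into each mbuf of the chain
 * for a packet of pkt_len bytes. The first nb_seg mbufs follow the segment
 * descriptions; once the descriptions are exhausted, the last one is reused
 * for the rest of the packet (assumption: same chunk length).
 * Returns the number of mbufs used, or -1 if out[] is too small.
 */
static int
split_layout(const struct seg_desc *segs, uint16_t nb_seg,
	     uint32_t pkt_len, uint16_t *out, int out_cap)
{
	int n = 0;

	/* The reused last segment must be able to hold some data. */
	assert(nb_seg > 0 && segs[nb_seg - 1].length > 0);

	while (pkt_len > 0) {
		const struct seg_desc *s = &segs[n < nb_seg ? n : nb_seg - 1];
		uint16_t chunk = pkt_len < s->length ?
				 (uint16_t)pkt_len : s->length;

		if (n == out_cap)
			return -1;
		out[n++] = chunk;
		pkt_len -= chunk;
	}
	return n;
}
```

With the four segments from the example quoted above (14B, 20B, 20B, 512B),
a 1500B packet would occupy six mbufs under this assumption: the three header
segments, two full 512B buffers and a 422B tail, the last three all taken from
pool3.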

>  3. Maximum offset
>     Frankly speaking I'm not sure why it cannot be handled on
>     PMD level (i.e. provide descriptors with offset taken into
>     account or guarantee that HW mempool objects initialized
>     correctly with required headroom). May be in some corner
>     cases when the same HW mempool is shared by various
>     segments with different offset requirements.

HW offsets are beyond the scope of this feature; the offset in the segment
description is supposed to be added to the native pool offset (if any).

>  4. Offset alignment
>  5. Maximum/minimum length of a segment
>  6. Length alignment
In which form? A mask of LSBs? Would 0 mean no limitation?
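
To make the question concrete, one possible reading is sketched below. It is
only an assumption about the encoding, and lsb_mask_ok is a hypothetical helper
name: the mask marks the low bits that must be zero in an offset or a length,
so 0x3f would ask for 64-byte granularity and 0 would impose nothing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Check a segment offset or length against an LSB mask.
 * Every bit set in mask must be clear in value; mask == 0 accepts anything.
 */
static bool
lsb_mask_ok(uint16_t value, uint16_t mask)
{
	return (value & mask) == 0;
}
```

Such a mask can only express power-of-two granularity, which may or may not be
enough for other PMDs.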

> 
> I realize that 3, 4 and 5 could be per segment number.
> If it is really that complex, report common denominator which is guaranteed to
> work. If we have no checks on ethdev layer, application can ignore it if it knows
> better.

Currently it is not clear at all what kind of limitations should be reported.
We could include all of the mentioned/proposed ones, and no PMD would report
anything there: mlx5 has no reasonable limitations to report for now.

Should we reserve a pointer field in rte_eth_dev_info to report the
limitations? (The limitation description would have to contain a variable-size
array, depending on the number of segments, so a pointer seems appropriate.)
That would allow us to avoid an ABI break and to present the limitation
structure once it is defined.
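
Just to show the shape I have in mind, a rough sketch follows. Nothing here is
an existing ethdev definition: rxseg_limit, rxseg_limits, dev_info_sketch and
rxseg_check are hypothetical names, and the set of fields is only the list
discussed above, under the assumption that masks use 0 for "no limitation".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-segment limits. */
struct rxseg_limit {
	uint16_t max_length;        /* upper bound for the segment length */
	uint16_t max_offset;        /* upper bound for the segment offset */
	uint16_t length_align_mask; /* LSBs that must be zero, 0 = any */
	uint16_t offset_align_mask; /* LSBs that must be zero, 0 = any */
};

/* Hypothetical variable-size description, one entry per segment. */
struct rxseg_limits {
	uint16_t nb_seg;               /* maximum number of segments */
	const struct rxseg_limit *seg; /* nb_seg entries */
};

/* Stand-in for the single pointer field reserved in the device info. */
struct dev_info_sketch {
	const struct rxseg_limits *rxseg_lim; /* NULL: nothing reported */
};

/* Return 0 if segment idx with the given length/offset fits, -1 if not. */
static int
rxseg_check(const struct dev_info_sketch *info, uint16_t idx,
	    uint16_t length, uint16_t offset)
{
	const struct rxseg_limits *lim = info->rxseg_lim;
	const struct rxseg_limit *l;

	if (lim == NULL || lim->nb_seg == 0)
		return 0; /* the PMD reports no limitations */
	if (idx >= lim->nb_seg)
		return -1; /* too many segments */
	l = &lim->seg[idx];
	if (length > l->max_length || offset > l->max_offset)
		return -1;
	if ((length & l->length_align_mask) != 0 ||
	    (offset & l->offset_align_mask) != 0)
		return -1;
	return 0;
}
```

A PMD with nothing to report, as mlx5 for now, would simply leave the pointer
NULL, and an application could skip the check if it knows better.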

With best regards, Slava


^ permalink raw reply	[