DPDK patches and discussions
 help / color / mirror / Atom feed
* [dpdk-dev] [PATCH] doc: announce ABI change for rte_eth_dev structure
@ 2016-07-20 14:24 Tomasz Kulasek
  2016-07-20 15:01 ` Thomas Monjalon
                   ` (2 more replies)
  0 siblings, 3 replies; 27+ messages in thread
From: Tomasz Kulasek @ 2016-07-20 14:24 UTC (permalink / raw)
  To: dev

This is an ABI deprecation notice for DPDK 16.11 in librte_ether about
changes in rte_eth_dev and rte_eth_desc_lim structures.

In 16.11, we plan to introduce rte_eth_tx_prep() function to do
necessary preparations of packet burst to be safely transmitted on
device for desired HW offloads (set/reset checksum field according to
the hardware requirements) and check HW constraints (number of segments
per packet, etc).

While the limitations and requirements may differ for devices, it
requires to extend rte_eth_dev structure with new function pointer
"tx_pkt_prep" which can be implemented in the driver to prepare and
verify packets, in devices specific way, before burst, what should to
prevent application to send malformed packets.

Also new fields will be introduced in rte_eth_desc_lim: nb_seg_max and
nb_mtu_seg_max, providing an information about max segments in TSO and
non TSO packets acceptable by device.

Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
---
 doc/guides/rel_notes/deprecation.rst |    7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index f502f86..485aacb 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -41,3 +41,10 @@ Deprecation Notices
 * The mempool functions for single/multi producer/consumer are deprecated and
   will be removed in 16.11.
   It is replaced by rte_mempool_generic_get/put functions.
+
+* In 16.11 ABI changes are plained: the ``rte_eth_dev`` structure will be
+  extended with new function pointer ``tx_pkt_prep`` allowing verification
+  and processing of packet burst to meet HW specific requirements before
+  transmit. Also new fields will be added to the ``rte_eth_desc_lim`` structure:
+  ``nb_seg_max`` and ``nb_mtu_seg_max`` provideing information about number of
+  segments limit to be transmitted by device for TSO/non-TSO packets.
-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [dpdk-dev] [PATCH] doc: announce ABI change for rte_eth_dev structure
  2016-07-20 14:24 [dpdk-dev] [PATCH] doc: announce ABI change for rte_eth_dev structure Tomasz Kulasek
@ 2016-07-20 15:01 ` Thomas Monjalon
  2016-07-20 15:13   ` Ananyev, Konstantin
  2016-07-21 15:24 ` [dpdk-dev] [PATCH v2] " Tomasz Kulasek
  2016-07-31  7:46 ` [dpdk-dev] [PATCH] " Vlad Zolotarov
  2 siblings, 1 reply; 27+ messages in thread
From: Thomas Monjalon @ 2016-07-20 15:01 UTC (permalink / raw)
  To: Tomasz Kulasek; +Cc: dev

Hi,

This patch announces an interesting change in the DPDK design.

2016-07-20 16:24, Tomasz Kulasek:
> This is an ABI deprecation notice for DPDK 16.11 in librte_ether about
> changes in rte_eth_dev and rte_eth_desc_lim structures.
> 
> In 16.11, we plan to introduce rte_eth_tx_prep() function to do
> necessary preparations of packet burst to be safely transmitted on
> device for desired HW offloads (set/reset checksum field according to
> the hardware requirements) and check HW constraints (number of segments
> per packet, etc).
> 
> While the limitations and requirements may differ for devices, it
> requires to extend rte_eth_dev structure with new function pointer
> "tx_pkt_prep" which can be implemented in the driver to prepare and
> verify packets, in devices specific way, before burst, what should to
> prevent application to send malformed packets.
> 
> Also new fields will be introduced in rte_eth_desc_lim: nb_seg_max and
> nb_mtu_seg_max, providing an information about max segments in TSO and
> non TSO packets acceptable by device.

We cannot acknowledge such notice without a prior design discussion.
Please explain why you plan to work on this change and give a draft of
the new structures (a RFC patch would be ideal).

Thanks

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [dpdk-dev] [PATCH] doc: announce ABI change for rte_eth_dev structure
  2016-07-20 15:01 ` Thomas Monjalon
@ 2016-07-20 15:13   ` Ananyev, Konstantin
  2016-07-20 15:22     ` Thomas Monjalon
  0 siblings, 1 reply; 27+ messages in thread
From: Ananyev, Konstantin @ 2016-07-20 15:13 UTC (permalink / raw)
  To: Thomas Monjalon, Kulasek, TomaszX; +Cc: dev

Hi Thomas,

> Hi,
> 
> This patch announces an interesting change in the DPDK design.
> 
> 2016-07-20 16:24, Tomasz Kulasek:
> > This is an ABI deprecation notice for DPDK 16.11 in librte_ether about
> > changes in rte_eth_dev and rte_eth_desc_lim structures.
> >
> > In 16.11, we plan to introduce rte_eth_tx_prep() function to do
> > necessary preparations of packet burst to be safely transmitted on
> > device for desired HW offloads (set/reset checksum field according to
> > the hardware requirements) and check HW constraints (number of
> > segments per packet, etc).
> >
> > While the limitations and requirements may differ for devices, it
> > requires to extend rte_eth_dev structure with new function pointer
> > "tx_pkt_prep" which can be implemented in the driver to prepare and
> > verify packets, in devices specific way, before burst, what should to
> > prevent application to send malformed packets.
> >
> > Also new fields will be introduced in rte_eth_desc_lim: nb_seg_max and
> > nb_mtu_seg_max, providing an information about max segments in TSO and
> > non TSO packets acceptable by device.
> 
> We cannot acknowledge such notice without a prior design discussion.
> Please explain why you plan to work on this change and give a draft of the new structures (a RFC patch would be ideal).

I think it is not really a deprecation note, but announce ABI change for rte_ethdev.h structures.
The plan is to implement what was proposed & discussed the following thread:
http://dpdk.org/ml/archives/dev/2015-September/023603.html

Konstantin

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [dpdk-dev] [PATCH] doc: announce ABI change for rte_eth_dev structure
  2016-07-20 15:13   ` Ananyev, Konstantin
@ 2016-07-20 15:22     ` Thomas Monjalon
  2016-07-20 15:42       ` Kulasek, TomaszX
  0 siblings, 1 reply; 27+ messages in thread
From: Thomas Monjalon @ 2016-07-20 15:22 UTC (permalink / raw)
  To: Ananyev, Konstantin; +Cc: Kulasek, TomaszX, dev

2016-07-20 15:13, Ananyev, Konstantin:
> Hi Thomas,
> 
> > Hi,
> > 
> > This patch announces an interesting change in the DPDK design.
> > 
> > 2016-07-20 16:24, Tomasz Kulasek:
> > > This is an ABI deprecation notice for DPDK 16.11 in librte_ether about
> > > changes in rte_eth_dev and rte_eth_desc_lim structures.
> > >
> > > In 16.11, we plan to introduce rte_eth_tx_prep() function to do
> > > necessary preparations of packet burst to be safely transmitted on
> > > device for desired HW offloads (set/reset checksum field according to
> > > the hardware requirements) and check HW constraints (number of
> > > segments per packet, etc).
> > >
> > > While the limitations and requirements may differ for devices, it
> > > requires to extend rte_eth_dev structure with new function pointer
> > > "tx_pkt_prep" which can be implemented in the driver to prepare and
> > > verify packets, in devices specific way, before burst, what should to
> > > prevent application to send malformed packets.
> > >
> > > Also new fields will be introduced in rte_eth_desc_lim: nb_seg_max and
> > > nb_mtu_seg_max, providing an information about max segments in TSO and
> > > non TSO packets acceptable by device.
> > 
> > We cannot acknowledge such notice without a prior design discussion.
> > Please explain why you plan to work on this change and give a draft of the new structures (a RFC patch would be ideal).
> 
> I think it is not really a deprecation note, but announce ABI change for rte_ethdev.h structures.

An ABI break requires a deprecation notice. So it is :)

> The plan is to implement what was proposed & discussed the following thread:
> http://dpdk.org/ml/archives/dev/2015-September/023603.html

Please could you summarize it here?

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [dpdk-dev] [PATCH] doc: announce ABI change for rte_eth_dev structure
  2016-07-20 15:22     ` Thomas Monjalon
@ 2016-07-20 15:42       ` Kulasek, TomaszX
  0 siblings, 0 replies; 27+ messages in thread
From: Kulasek, TomaszX @ 2016-07-20 15:42 UTC (permalink / raw)
  To: Thomas Monjalon, Ananyev, Konstantin; +Cc: dev



> -----Original Message-----
> From: Thomas Monjalon [mailto:thomas.monjalon@6wind.com]
> Sent: Wednesday, July 20, 2016 17:22
> To: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> Cc: Kulasek, TomaszX <tomaszx.kulasek@intel.com>; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] doc: announce ABI change for rte_eth_dev
> structure
> 
> 2016-07-20 15:13, Ananyev, Konstantin:
> > Hi Thomas,
> >
> > > Hi,
> > >
> > > This patch announces an interesting change in the DPDK design.
> > >
> > > 2016-07-20 16:24, Tomasz Kulasek:
> > > > This is an ABI deprecation notice for DPDK 16.11 in librte_ether
> > > > about changes in rte_eth_dev and rte_eth_desc_lim structures.
> > > >
> > > > In 16.11, we plan to introduce rte_eth_tx_prep() function to do
> > > > necessary preparations of packet burst to be safely transmitted on
> > > > device for desired HW offloads (set/reset checksum field according
> > > > to the hardware requirements) and check HW constraints (number of
> > > > segments per packet, etc).
> > > >
> > > > While the limitations and requirements may differ for devices, it
> > > > requires to extend rte_eth_dev structure with new function pointer
> > > > "tx_pkt_prep" which can be implemented in the driver to prepare
> > > > and verify packets, in devices specific way, before burst, what
> > > > should to prevent application to send malformed packets.
> > > >
> > > > Also new fields will be introduced in rte_eth_desc_lim: nb_seg_max
> > > > and nb_mtu_seg_max, providing an information about max segments in
> > > > TSO and non TSO packets acceptable by device.
> > >
> > > We cannot acknowledge such notice without a prior design discussion.
> > > Please explain why you plan to work on this change and give a draft of
> the new structures (a RFC patch would be ideal).
> >
> > I think it is not really a deprecation note, but announce ABI change for
> rte_ethdev.h structures.
> 
> An ABI break requires a deprecation notice. So it is :)
> 
> > The plan is to implement what was proposed & discussed the following
> thread:
> > http://dpdk.org/ml/archives/dev/2015-September/023603.html
> 
> Please could you summarize it here?

Hi Thomas,

The implementation of rte_eth_tx_prep() will be similar to the rte_eth_tx_burst(), passing same arguments to the driver, so packets can be checked and modified before real burst will be done.

The API for new function will be implemented in the fallowed way:

+/**
+ * Process a burst of output packets on a transmit queue of an Ethernet device.
+ *
+ * The rte_eth_tx_prep() function is invoked to prepare output packets to be
+ * transmitted on the output queue *queue_id* of the Ethernet device designated
+ * by its *port_id*.
+ * The *nb_pkts* parameter is the number of packets to be prepared which are
+ * supplied in the *tx_pkts* array of *rte_mbuf* structures, each of them
+ * allocated from a pool created with rte_pktmbuf_pool_create().
+ * For each packet to send, the rte_eth_tx_prep() function performs
+ * the following operations:
+ *
+ * - Check if packet meets devices requirements for tx offloads.
+ *
+ * - Check limitations about number of segments.
+ *
+ * - Check additional requirements when debug is enabled.
+ *
+ * - Update and/or reset required checksums when tx offload is set for packet.
+ *
+ * The rte_eth_tx_prep() function returns the number of packets ready it
+ * actually sent. A return value equal to *nb_pkts* means that all packets
+ * are valid and ready to be sent.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param queue_id
+ *   The index of the transmit queue through which output packets must be
+ *   sent.
+ *   The value must be in the range [0, nb_tx_queue - 1] previously supplied
+ *   to rte_eth_dev_configure().
+ * @param tx_pkts
+ *   The address of an array of *nb_pkts* pointers to *rte_mbuf* structures
+ *   which contain the output packets.
+ * @param nb_pkts
+ *   The maximum number of packets to process.
+ * @return
+ *   The number of packets correct and ready to be sent. The return value can be
+ *   less than the value of the *tx_pkts* parameter when some packet doesn't
+ *   meet devices requirements with rte_errno set appropriately.
+ */
+static inline uint16_t
+rte_eth_tx_prep(uint8_t port_id, uint16_t queue_id, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+
+	if (!dev->tx_pkt_prep) {
+		rte_errno = -ENOTSUP;
+		return 0;
+	}
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+	if (queue_id >= dev->data->nb_tx_queues) {
+		RTE_PMD_DEBUG_TRACE("Invalid TX queue_id=%d\n", queue_id);
+		rte_errno = -EINVAL;
+		return 0;
+	}
+#endif
+
+	return (*dev->tx_pkt_prep)(dev->data->tx_queues[queue_id], tx_pkts, nb_pkts);
+}

Tomasz

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [dpdk-dev] [PATCH v2] doc: announce ABI change for rte_eth_dev structure
  2016-07-20 14:24 [dpdk-dev] [PATCH] doc: announce ABI change for rte_eth_dev structure Tomasz Kulasek
  2016-07-20 15:01 ` Thomas Monjalon
@ 2016-07-21 15:24 ` Tomasz Kulasek
  2016-07-21 22:48   ` Ananyev, Konstantin
  2016-07-28 12:04   ` Avi Kivity
  2016-07-31  7:46 ` [dpdk-dev] [PATCH] " Vlad Zolotarov
  2 siblings, 2 replies; 27+ messages in thread
From: Tomasz Kulasek @ 2016-07-21 15:24 UTC (permalink / raw)
  To: dev

This is an ABI deprecation notice for DPDK 16.11 in librte_ether about
changes in rte_eth_dev and rte_eth_desc_lim structures.

As discussed in that thread:

http://dpdk.org/ml/archives/dev/2015-September/023603.html

Different NIC models depending on HW offload requested might impose
different requirements on packets to be TX-ed in terms of:

 - Max number of fragments per packet allowed
 - Max number of fragments per TSO segments
 - The way pseudo-header checksum should be pre-calculated
 - L3/L4 header fields filling
 - etc.


MOTIVATION:
-----------

1) Some work cannot (and didn't should) be done in rte_eth_tx_burst.
   However, this work is sometimes required, and now, it's an
   application issue.

2) Different hardware may have different requirements for TX offloads,
   other subset can be supported and so on.

3) Some parameters (eg. number of segments in ixgbe driver) may hung
   device. These parameters may be vary for different devices.

   For example i40e HW allows 8 fragments per packet, but that is after
   TSO segmentation. While ixgbe has a 38-fragment pre-TSO limit.

4) Fields in packet may require different initialization (like eg. will
   require pseudo-header checksum precalculation, sometimes in a
   different way depending on packet type, and so on). Now application
   needs to care about it.

5) Using additional API (rte_eth_tx_prep) before rte_eth_tx_burst let to
   prepare packet burst in acceptable form for specific device.

6) Some additional checks may be done in debug mode keeping tx_burst
   implementation clean.


PROPOSAL:
---------

To help user to deal with all these varieties we propose to:

1. Introduce rte_eth_tx_prep() function to do necessary preparations of
   packet burst to be safely transmitted on device for desired HW
   offloads (set/reset checksum field according to the hardware
   requirements) and check HW constraints (number of segments per
   packet, etc).

   While the limitations and requirements may differ for devices, it
   requires to extend rte_eth_dev structure with new function pointer
   "tx_pkt_prep" which can be implemented in the driver to prepare and
   verify packets, in devices specific way, before burst, what should to
   prevent application to send malformed packets.

2. Also new fields will be introduced in rte_eth_desc_lim: 
   nb_seg_max and nb_mtu_seg_max, providing an information about max
   segments in TSO and non-TSO packets acceptable by device.

   This information is useful for application to not create/limit
   malicious packet.


APPLICATION (CASE OF USE):
--------------------------

1) Application should to initialize burst of packets to send, set
   required tx offload flags and required fields, like l2_len, l3_len,
   l4_len, and tso_segsz

2) Application passes burst to the rte_eth_tx_prep to check conditions
   required to send packets through the NIC.

3) The result of rte_eth_tx_prep can be used to send valid packets
   and/or restore invalid if function fails.

eg.

	for (i = 0; i < nb_pkts; i++) {

		/* initialize or process packet */

		bufs[i]->tso_segsz = 800;
		bufs[i]->ol_flags = PKT_TX_TCP_SEG | PKT_TX_IPV4
				| PKT_TX_IP_CKSUM;
		bufs[i]->l2_len = sizeof(struct ether_hdr);
		bufs[i]->l3_len = sizeof(struct ipv4_hdr);
		bufs[i]->l4_len = sizeof(struct tcp_hdr);
	}

	/* Prepare burst of TX packets */
	nb_prep = rte_eth_tx_prep(port, 0, bufs, nb_pkts);

	if (nb_prep < nb_pkts) {
		printf("tx_prep failed\n");

		/* drop or restore invalid packets */

	}

	/* Send burst of TX packets */
	nb_tx = rte_eth_tx_burst(port, 0, bufs, nb_prep);

	/* Free any unsent packets. */



Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
---
 doc/guides/rel_notes/deprecation.rst |    7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index f502f86..485aacb 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -41,3 +41,10 @@ Deprecation Notices
 * The mempool functions for single/multi producer/consumer are deprecated and
   will be removed in 16.11.
   It is replaced by rte_mempool_generic_get/put functions.
+
+* In 16.11 ABI changes are plained: the ``rte_eth_dev`` structure will be
+  extended with new function pointer ``tx_pkt_prep`` allowing verification
+  and processing of packet burst to meet HW specific requirements before
+  transmit. Also new fields will be added to the ``rte_eth_desc_lim`` structure:
+  ``nb_seg_max`` and ``nb_mtu_seg_max`` provideing information about number of
+  segments limit to be transmitted by device for TSO/non-TSO packets.
-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [dpdk-dev] [PATCH v2] doc: announce ABI change for rte_eth_dev structure
  2016-07-21 15:24 ` [dpdk-dev] [PATCH v2] " Tomasz Kulasek
@ 2016-07-21 22:48   ` Ananyev, Konstantin
  2016-07-27  8:59     ` Thomas Monjalon
  2016-07-31  7:50     ` Vlad Zolotarov
  2016-07-28 12:04   ` Avi Kivity
  1 sibling, 2 replies; 27+ messages in thread
From: Ananyev, Konstantin @ 2016-07-21 22:48 UTC (permalink / raw)
  To: Kulasek, TomaszX, dev



> 
> This is an ABI deprecation notice for DPDK 16.11 in librte_ether about
> changes in rte_eth_dev and rte_eth_desc_lim structures.
> 
> As discussed in that thread:
> 
> http://dpdk.org/ml/archives/dev/2015-September/023603.html
> 
> Different NIC models depending on HW offload requested might impose
> different requirements on packets to be TX-ed in terms of:
> 
>  - Max number of fragments per packet allowed
>  - Max number of fragments per TSO segments
>  - The way pseudo-header checksum should be pre-calculated
>  - L3/L4 header fields filling
>  - etc.
> 
> 
> MOTIVATION:
> -----------
> 
> 1) Some work cannot (and didn't should) be done in rte_eth_tx_burst.
>    However, this work is sometimes required, and now, it's an
>    application issue.
> 
> 2) Different hardware may have different requirements for TX offloads,
>    other subset can be supported and so on.
> 
> 3) Some parameters (eg. number of segments in ixgbe driver) may hung
>    device. These parameters may be vary for different devices.
> 
>    For example i40e HW allows 8 fragments per packet, but that is after
>    TSO segmentation. While ixgbe has a 38-fragment pre-TSO limit.
> 
> 4) Fields in packet may require different initialization (like eg. will
>    require pseudo-header checksum precalculation, sometimes in a
>    different way depending on packet type, and so on). Now application
>    needs to care about it.
> 
> 5) Using additional API (rte_eth_tx_prep) before rte_eth_tx_burst let to
>    prepare packet burst in acceptable form for specific device.
> 
> 6) Some additional checks may be done in debug mode keeping tx_burst
>    implementation clean.
> 
> 
> PROPOSAL:
> ---------
> 
> To help user to deal with all these varieties we propose to:
> 
> 1. Introduce rte_eth_tx_prep() function to do necessary preparations of
>    packet burst to be safely transmitted on device for desired HW
>    offloads (set/reset checksum field according to the hardware
>    requirements) and check HW constraints (number of segments per
>    packet, etc).
> 
>    While the limitations and requirements may differ for devices, it
>    requires to extend rte_eth_dev structure with new function pointer
>    "tx_pkt_prep" which can be implemented in the driver to prepare and
>    verify packets, in devices specific way, before burst, what should to
>    prevent application to send malformed packets.
> 
> 2. Also new fields will be introduced in rte_eth_desc_lim:
>    nb_seg_max and nb_mtu_seg_max, providing an information about max
>    segments in TSO and non-TSO packets acceptable by device.
> 
>    This information is useful for application to not create/limit
>    malicious packet.
> 
> 
> APPLICATION (CASE OF USE):
> --------------------------
> 
> 1) Application should to initialize burst of packets to send, set
>    required tx offload flags and required fields, like l2_len, l3_len,
>    l4_len, and tso_segsz
> 
> 2) Application passes burst to the rte_eth_tx_prep to check conditions
>    required to send packets through the NIC.
> 
> 3) The result of rte_eth_tx_prep can be used to send valid packets
>    and/or restore invalid if function fails.
> 
> eg.
> 
> 	for (i = 0; i < nb_pkts; i++) {
> 
> 		/* initialize or process packet */
> 
> 		bufs[i]->tso_segsz = 800;
> 		bufs[i]->ol_flags = PKT_TX_TCP_SEG | PKT_TX_IPV4
> 				| PKT_TX_IP_CKSUM;
> 		bufs[i]->l2_len = sizeof(struct ether_hdr);
> 		bufs[i]->l3_len = sizeof(struct ipv4_hdr);
> 		bufs[i]->l4_len = sizeof(struct tcp_hdr);
> 	}
> 
> 	/* Prepare burst of TX packets */
> 	nb_prep = rte_eth_tx_prep(port, 0, bufs, nb_pkts);
> 
> 	if (nb_prep < nb_pkts) {
> 		printf("tx_prep failed\n");
> 
> 		/* drop or restore invalid packets */
> 
> 	}
> 
> 	/* Send burst of TX packets */
> 	nb_tx = rte_eth_tx_burst(port, 0, bufs, nb_prep);
> 
> 	/* Free any unsent packets. */
> 
> 
> 
> Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
> ---
>  doc/guides/rel_notes/deprecation.rst |    7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
> index f502f86..485aacb 100644
> --- a/doc/guides/rel_notes/deprecation.rst
> +++ b/doc/guides/rel_notes/deprecation.rst
> @@ -41,3 +41,10 @@ Deprecation Notices
>  * The mempool functions for single/multi producer/consumer are deprecated and
>    will be removed in 16.11.
>    It is replaced by rte_mempool_generic_get/put functions.
> +
> +* In 16.11 ABI changes are plained: the ``rte_eth_dev`` structure will be
> +  extended with new function pointer ``tx_pkt_prep`` allowing verification
> +  and processing of packet burst to meet HW specific requirements before
> +  transmit. Also new fields will be added to the ``rte_eth_desc_lim`` structure:
> +  ``nb_seg_max`` and ``nb_mtu_seg_max`` provideing information about number of
> +  segments limit to be transmitted by device for TSO/non-TSO packets.
> --

Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>

> 1.7.9.5

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [dpdk-dev] [PATCH v2] doc: announce ABI change for rte_eth_dev structure
  2016-07-21 22:48   ` Ananyev, Konstantin
@ 2016-07-27  8:59     ` Thomas Monjalon
  2016-07-27 17:10       ` Jerin Jacob
  2016-07-31  7:50     ` Vlad Zolotarov
  1 sibling, 1 reply; 27+ messages in thread
From: Thomas Monjalon @ 2016-07-27  8:59 UTC (permalink / raw)
  To: Kulasek, TomaszX; +Cc: dev, Ananyev, Konstantin

> > Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
> > ---
> > +* In 16.11 ABI changes are plained: the ``rte_eth_dev`` structure will be
> > +  extended with new function pointer ``tx_pkt_prep`` allowing verification
> > +  and processing of packet burst to meet HW specific requirements before
> > +  transmit. Also new fields will be added to the ``rte_eth_desc_lim`` structure:
> > +  ``nb_seg_max`` and ``nb_mtu_seg_max`` provideing information about number of
> > +  segments limit to be transmitted by device for TSO/non-TSO packets.
> 
> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>

I think I understand you want to split the TX processing:
	1/ modify/write in mbufs
	2/ write in HW
and let application decide:
	- where the TX prep is done (which core)
	- what to do if the TX prep fail
So adding some processing in this first part becomes "not too expensive" or
"manageable" from the application point of view.

If I well understand the intent,

Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
(except typos ;)

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [dpdk-dev] [PATCH v2] doc: announce ABI change for rte_eth_dev structure
  2016-07-27  8:59     ` Thomas Monjalon
@ 2016-07-27 17:10       ` Jerin Jacob
  2016-07-27 17:33         ` Ananyev, Konstantin
  0 siblings, 1 reply; 27+ messages in thread
From: Jerin Jacob @ 2016-07-27 17:10 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: Kulasek, TomaszX, dev, Ananyev,  Konstantin

On Wed, Jul 27, 2016 at 01:59:01AM -0700, Thomas Monjalon wrote:
> > > Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
> > > ---
> > > +* In 16.11 ABI changes are plained: the ``rte_eth_dev`` structure will be
> > > +  extended with new function pointer ``tx_pkt_prep`` allowing verification
> > > +  and processing of packet burst to meet HW specific requirements before
> > > +  transmit. Also new fields will be added to the ``rte_eth_desc_lim`` structure:
> > > +  ``nb_seg_max`` and ``nb_mtu_seg_max`` provideing information about number of
> > > +  segments limit to be transmitted by device for TSO/non-TSO packets.
> > 
> > Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> 
> I think I understand you want to split the TX processing:
> 	1/ modify/write in mbufs
> 	2/ write in HW
> and let application decide:
> 	- where the TX prep is done (which core)

In what basics applications knows when and where to call tx_pkt_prep in fast path.
if all the time it needs to call before tx_burst then the PMD won't have/don't need this
callback waste cycles in fast path.Is this the expected behavior ?
Anything think it as compile time to make other PMDs wont suffer because
of this change.


> 	- what to do if the TX prep fail
> So adding some processing in this first part becomes "not too expensive" or
> "manageable" from the application point of view.
> 
> If I well understand the intent,
> 
> Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
> (except typos ;)

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [dpdk-dev] [PATCH v2] doc: announce ABI change for rte_eth_dev structure
  2016-07-27 17:10       ` Jerin Jacob
@ 2016-07-27 17:33         ` Ananyev, Konstantin
  2016-07-27 17:41           ` Jerin Jacob
  0 siblings, 1 reply; 27+ messages in thread
From: Ananyev, Konstantin @ 2016-07-27 17:33 UTC (permalink / raw)
  To: Jerin Jacob, Thomas Monjalon; +Cc: Kulasek, TomaszX, dev



> -----Original Message-----
> From: Jerin Jacob [mailto:jerin.jacob@caviumnetworks.com]
> Sent: Wednesday, July 27, 2016 6:11 PM
> To: Thomas Monjalon <thomas.monjalon@6wind.com>
> Cc: Kulasek, TomaszX <tomaszx.kulasek@intel.com>; dev@dpdk.org; Ananyev, Konstantin <konstantin.ananyev@intel.com>
> Subject: Re: [dpdk-dev] [PATCH v2] doc: announce ABI change for rte_eth_dev structure
> 
> On Wed, Jul 27, 2016 at 01:59:01AM -0700, Thomas Monjalon wrote:
> > > > Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
> > > > ---
> > > > +* In 16.11 ABI changes are plained: the ``rte_eth_dev`` structure
> > > > +will be
> > > > +  extended with new function pointer ``tx_pkt_prep`` allowing
> > > > +verification
> > > > +  and processing of packet burst to meet HW specific requirements
> > > > +before
> > > > +  transmit. Also new fields will be added to the ``rte_eth_desc_lim`` structure:
> > > > +  ``nb_seg_max`` and ``nb_mtu_seg_max`` provideing information
> > > > +about number of
> > > > +  segments limit to be transmitted by device for TSO/non-TSO packets.
> > >
> > > Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> >
> > I think I understand you want to split the TX processing:
> > 	1/ modify/write in mbufs
> > 	2/ write in HW
> > and let application decide:
> > 	- where the TX prep is done (which core)
> 
> In what basics applications knows when and where to call tx_pkt_prep in fast path.
> if all the time it needs to call before tx_burst then the PMD won't have/don't need this callback waste cycles in fast path.Is this the expected
> behavior ?
> Anything think it as compile time to make other PMDs wont suffer because of this change.

Not sure what suffering you are talking about...
Current model - i.e. when application does preparations (or doesn't if none is required)
on its own and just call tx_burst() would still be there.
If the app doesn't want to use tx_prep() by some reason - that still ok,
and decision is up to the particular app. 
Konstantin

> 
> 
> > 	- what to do if the TX prep fail
> > So adding some processing in this first part becomes "not too
> > expensive" or "manageable" from the application point of view.
> >
> > If I well understand the intent,
> >
> > Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com> (except typos ;)

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [dpdk-dev] [PATCH v2] doc: announce ABI change for rte_eth_dev structure
  2016-07-27 17:33         ` Ananyev, Konstantin
@ 2016-07-27 17:41           ` Jerin Jacob
  2016-07-27 20:51             ` Ananyev, Konstantin
  0 siblings, 1 reply; 27+ messages in thread
From: Jerin Jacob @ 2016-07-27 17:41 UTC (permalink / raw)
  To: Ananyev, Konstantin; +Cc: Thomas Monjalon, Kulasek, TomaszX, dev

On Wed, Jul 27, 2016 at 05:33:01PM +0000, Ananyev, Konstantin wrote:
> 
> 
> > -----Original Message-----
> > From: Jerin Jacob [mailto:jerin.jacob@caviumnetworks.com]
> > Sent: Wednesday, July 27, 2016 6:11 PM
> > To: Thomas Monjalon <thomas.monjalon@6wind.com>
> > Cc: Kulasek, TomaszX <tomaszx.kulasek@intel.com>; dev@dpdk.org; Ananyev, Konstantin <konstantin.ananyev@intel.com>
> > Subject: Re: [dpdk-dev] [PATCH v2] doc: announce ABI change for rte_eth_dev structure
> > 
> > On Wed, Jul 27, 2016 at 01:59:01AM -0700, Thomas Monjalon wrote:
> > > > > Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
> > > > > ---
> > > > > +* In 16.11 ABI changes are plained: the ``rte_eth_dev`` structure
> > > > > +will be
> > > > > +  extended with new function pointer ``tx_pkt_prep`` allowing
> > > > > +verification
> > > > > +  and processing of packet burst to meet HW specific requirements
> > > > > +before
> > > > > +  transmit. Also new fields will be added to the ``rte_eth_desc_lim`` structure:
> > > > > +  ``nb_seg_max`` and ``nb_mtu_seg_max`` provideing information
> > > > > +about number of
> > > > > +  segments limit to be transmitted by device for TSO/non-TSO packets.
> > > >
> > > > Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> > >
> > > I think I understand you want to split the TX processing:
> > > 	1/ modify/write in mbufs
> > > 	2/ write in HW
> > > and let application decide:
> > > 	- where the TX prep is done (which core)
> > 
> > In what basics applications knows when and where to call tx_pkt_prep in fast path.
> > if all the time it needs to call before tx_burst then the PMD won't have/don't need this callback waste cycles in fast path.Is this the expected
> > behavior ?
> > Anything think it as compile time to make other PMDs wont suffer because of this change.
> 
> Not sure what suffering you are talking about...
> Current model - i.e. when application does preparations (or doesn't if none is required)
> on its own and just call tx_burst() would still be there.
> If the app doesn't want to use tx_prep() by some reason - that still ok,
> and decision is up to the particular app. 

So my question is in what basics application decides to call the preparation.
Can you tell me the use case in application perspective?
and what if the PMD does not implement that callback then it is of waste
cycles. Right?

Jerin


> Konstantin
> 
> > 
> > 
> > > 	- what to do if the TX prep fail
> > > So adding some processing in this first part becomes "not too
> > > expensive" or "manageable" from the application point of view.
> > >
> > > If I well understand the intent,
> > >
> > > Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com> (except typos ;)

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [dpdk-dev] [PATCH v2] doc: announce ABI change for rte_eth_dev structure
  2016-07-27 17:41           ` Jerin Jacob
@ 2016-07-27 20:51             ` Ananyev, Konstantin
  2016-07-28  2:13               ` Jerin Jacob
  0 siblings, 1 reply; 27+ messages in thread
From: Ananyev, Konstantin @ 2016-07-27 20:51 UTC (permalink / raw)
  To: Jerin Jacob; +Cc: Thomas Monjalon, dev


> 
> On Wed, Jul 27, 2016 at 05:33:01PM +0000, Ananyev, Konstantin wrote:
> >
> >
> > > -----Original Message-----
> > > From: Jerin Jacob [mailto:jerin.jacob@caviumnetworks.com]
> > > Sent: Wednesday, July 27, 2016 6:11 PM
> > > To: Thomas Monjalon <thomas.monjalon@6wind.com>
> > > Cc: Kulasek, TomaszX <tomaszx.kulasek@intel.com>; dev@dpdk.org;
> > > Ananyev, Konstantin <konstantin.ananyev@intel.com>
> > > Subject: Re: [dpdk-dev] [PATCH v2] doc: announce ABI change for
> > > rte_eth_dev structure
> > >
> > > On Wed, Jul 27, 2016 at 01:59:01AM -0700, Thomas Monjalon wrote:
> > > > > > Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
> > > > > > ---
> > > > > > +* In 16.11 ABI changes are plained: the ``rte_eth_dev``
> > > > > > +structure will be
> > > > > > +  extended with new function pointer ``tx_pkt_prep`` allowing
> > > > > > +verification
> > > > > > +  and processing of packet burst to meet HW specific
> > > > > > +requirements before
> > > > > > +  transmit. Also new fields will be added to the ``rte_eth_desc_lim`` structure:
> > > > > > +  ``nb_seg_max`` and ``nb_mtu_seg_max`` provideing
> > > > > > +information about number of
> > > > > > +  segments limit to be transmitted by device for TSO/non-TSO packets.
> > > > >
> > > > > Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> > > >
> > > > I think I understand you want to split the TX processing:
> > > > 	1/ modify/write in mbufs
> > > > 	2/ write in HW
> > > > and let application decide:
> > > > 	- where the TX prep is done (which core)
> > >
> > > In what basics applications knows when and where to call tx_pkt_prep in fast path.
> > > if all the time it needs to call before tx_burst then the PMD won't
> > > have/don't need this callback waste cycles in fast path.Is this the expected behavior ?
> > > Anything think it as compile time to make other PMDs wont suffer because of this change.
> >
> > Not sure what suffering you are talking about...
> > Current model - i.e. when application does preparations (or doesn't if
> > none is required) on its own and just call tx_burst() would still be there.
> > If the app doesn't want to use tx_prep() by some reason - that still
> > ok, and decision is up to the particular app.
> 
> So my question is in what basics application decides to call the preparation.
> Can you tell me the use case in application perspective?

I suppose one most common use-case when application uses HW TX offloads,
and don't' to cope on its own which L3/L4 header fields need to be filled
for that particular dev_type/hw_offload combination.
Instead it just setups the ol_flags, fills tx_offload fields and calls tx_prep().
Please read the original Tomasz's patch, I think he explained possible use-cases 
with lot of details.  

> and what if the PMD does not implement that callback then it is of waste cycles. Right?

If you refer as lost cycles here something like:
RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_prep, -ENOTSUP);
then yes.
Though comparing to actual work need to be done for most HW TX offloads,
I think it is neglectable.
Again, as I said before, it is totally voluntary for the application.
Konstantin 

> 
> Jerin
> 
> 
> > Konstantin
> >
> > >
> > >
> > > > 	- what to do if the TX prep fail
> > > > So adding some processing in this first part becomes "not too
> > > > expensive" or "manageable" from the application point of view.
> > > >
> > > > If I well understand the intent,
> > > >
> > > > Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com> (except
> > > > typos ;)

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [dpdk-dev] [PATCH v2] doc: announce ABI change for rte_eth_dev structure
  2016-07-27 20:51             ` Ananyev, Konstantin
@ 2016-07-28  2:13               ` Jerin Jacob
  2016-07-28 10:36                 ` Ananyev, Konstantin
  0 siblings, 1 reply; 27+ messages in thread
From: Jerin Jacob @ 2016-07-28  2:13 UTC (permalink / raw)
  To: Ananyev, Konstantin; +Cc: Thomas Monjalon, dev

On Wed, Jul 27, 2016 at 08:51:09PM +0000, Ananyev, Konstantin wrote:
> 
> > 
> > On Wed, Jul 27, 2016 at 05:33:01PM +0000, Ananyev, Konstantin wrote:
> > >
> > >
> > > > -----Original Message-----
> > > > From: Jerin Jacob [mailto:jerin.jacob@caviumnetworks.com]
> > > > Sent: Wednesday, July 27, 2016 6:11 PM
> > > > To: Thomas Monjalon <thomas.monjalon@6wind.com>
> > > > Cc: Kulasek, TomaszX <tomaszx.kulasek@intel.com>; dev@dpdk.org;
> > > > Ananyev, Konstantin <konstantin.ananyev@intel.com>
> > > > Subject: Re: [dpdk-dev] [PATCH v2] doc: announce ABI change for
> > > > rte_eth_dev structure
> > > >
> > > > On Wed, Jul 27, 2016 at 01:59:01AM -0700, Thomas Monjalon wrote:
> > > > > > > Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
> > > > > > > ---
> > > > > > > +* In 16.11 ABI changes are plained: the ``rte_eth_dev``
> > > > > > > +structure will be
> > > > > > > +  extended with new function pointer ``tx_pkt_prep`` allowing
> > > > > > > +verification
> > > > > > > +  and processing of packet burst to meet HW specific
> > > > > > > +requirements before
> > > > > > > +  transmit. Also new fields will be added to the ``rte_eth_desc_lim`` structure:
> > > > > > > +  ``nb_seg_max`` and ``nb_mtu_seg_max`` provideing
> > > > > > > +information about number of
> > > > > > > +  segments limit to be transmitted by device for TSO/non-TSO packets.
> > > > > >
> > > > > > Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> > > > >
> > > > > I think I understand you want to split the TX processing:
> > > > > 	1/ modify/write in mbufs
> > > > > 	2/ write in HW
> > > > > and let application decide:
> > > > > 	- where the TX prep is done (which core)
> > > >
> > > > In what basics applications knows when and where to call tx_pkt_prep in fast path.
> > > > if all the time it needs to call before tx_burst then the PMD won't
> > > > have/don't need this callback waste cycles in fast path.Is this the expected behavior ?
> > > > Anything think it as compile time to make other PMDs wont suffer because of this change.
> > >
> > > Not sure what suffering you are talking about...
> > > Current model - i.e. when application does preparations (or doesn't if
> > > none is required) on its own and just call tx_burst() would still be there.
> > > If the app doesn't want to use tx_prep() by some reason - that still
> > > ok, and decision is up to the particular app.
> > 
> > So my question is in what basics application decides to call the preparation.
> > Can you tell me the use case in application perspective?
> 
> I suppose one most common use-case when application uses HW TX offloads,
> and don't' to cope on its own which L3/L4 header fields need to be filled
> for that particular dev_type/hw_offload combination.

If it does not cope up then it can skip tx'ing in the actual tx burst
itself and move the "skipped" tx packets to end of the list in the tx
burst so that application can take the action on "skipped" packet after
the tx burst


> Instead it just setups the ol_flags, fills tx_offload fields and calls tx_prep().
> Please read the original Tomasz's patch, I think he explained possible use-cases 
> with lot of details.

Sorry, it is not very clear in terms of use cases.

In HW perspective, It it tries to avoid the illegal state. But not sure
calling "back to back" tx prepare and then tx burst how does it improve the
situation as the check illegal state check introduce in actual tx burst
it self.

In SW perspective, its try to avoid sending malformed packets. In my
view the same can achieved with existing tx burst it self as PMD is the
one finally send the packets on the wire.

proposal quote:

1. Introduce rte_eth_tx_prep() function to do necessary preparations of
   packet burst to be safely transmitted on device for desired HW
   offloads (set/reset checksum field according to the hardware
   requirements) and check HW constraints (number of segments per
   packet, etc).

   While the limitations and requirements may differ for devices, it
   requires to extend rte_eth_dev structure with new function pointer
   "tx_pkt_prep" which can be implemented in the driver to prepare and
   verify packets, in devices specific way, before burst, what should to
   prevent application to send malformed packets.


> 
> > and what if the PMD does not implement that callback then it is of waste cycles. Right?
> 
> If you refer as lost cycles here something like:
> RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_prep, -ENOTSUP);
> then yes.
> Though comparing to actual work need to be done for most HW TX offloads,
> I think it is neglectable.

Not sure.

> Again, as I said before, it is totally voluntary for the application.

Not according to proposal. It can't be too as application has no idea
what PMD driver does with "prep" what is the implication on a HW if
application does not

Jerin

> Konstantin 
> 
> > 
> > Jerin
> > 
> > 
> > > Konstantin
> > >
> > > >
> > > >
> > > > > 	- what to do if the TX prep fail
> > > > > So adding some processing in this first part becomes "not too
> > > > > expensive" or "manageable" from the application point of view.
> > > > >
> > > > > If I well understand the intent,
> > > > >
> > > > > Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com> (except
> > > > > typos ;)

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [dpdk-dev] [PATCH v2] doc: announce ABI change for rte_eth_dev structure
  2016-07-28  2:13               ` Jerin Jacob
@ 2016-07-28 10:36                 ` Ananyev, Konstantin
  2016-07-28 11:38                   ` Jerin Jacob
  0 siblings, 1 reply; 27+ messages in thread
From: Ananyev, Konstantin @ 2016-07-28 10:36 UTC (permalink / raw)
  To: Jerin Jacob; +Cc: Thomas Monjalon, dev



> > > >
> > > > > -----Original Message-----
> > > > > From: Jerin Jacob [mailto:jerin.jacob@caviumnetworks.com]
> > > > > Sent: Wednesday, July 27, 2016 6:11 PM
> > > > > To: Thomas Monjalon <thomas.monjalon@6wind.com>
> > > > > Cc: Kulasek, TomaszX <tomaszx.kulasek@intel.com>; dev@dpdk.org;
> > > > > Ananyev, Konstantin <konstantin.ananyev@intel.com>
> > > > > Subject: Re: [dpdk-dev] [PATCH v2] doc: announce ABI change for
> > > > > rte_eth_dev structure
> > > > >
> > > > > On Wed, Jul 27, 2016 at 01:59:01AM -0700, Thomas Monjalon wrote:
> > > > > > > > Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
> > > > > > > > ---
> > > > > > > > +* In 16.11 ABI changes are plained: the ``rte_eth_dev``
> > > > > > > > +structure will be
> > > > > > > > +  extended with new function pointer ``tx_pkt_prep`` allowing
> > > > > > > > +verification
> > > > > > > > +  and processing of packet burst to meet HW specific
> > > > > > > > +requirements before
> > > > > > > > +  transmit. Also new fields will be added to the ``rte_eth_desc_lim`` structure:
> > > > > > > > +  ``nb_seg_max`` and ``nb_mtu_seg_max`` provideing
> > > > > > > > +information about number of
> > > > > > > > +  segments limit to be transmitted by device for TSO/non-TSO packets.
> > > > > > >
> > > > > > > Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> > > > > >
> > > > > > I think I understand you want to split the TX processing:
> > > > > > 	1/ modify/write in mbufs
> > > > > > 	2/ write in HW
> > > > > > and let application decide:
> > > > > > 	- where the TX prep is done (which core)
> > > > >
> > > > > In what basics applications knows when and where to call tx_pkt_prep in fast path.
> > > > > if all the time it needs to call before tx_burst then the PMD won't
> > > > > have/don't need this callback waste cycles in fast path.Is this the expected behavior ?
> > > > > Anything think it as compile time to make other PMDs wont suffer because of this change.
> > > >
> > > > Not sure what suffering you are talking about...
> > > > Current model - i.e. when application does preparations (or doesn't if
> > > > none is required) on its own and just call tx_burst() would still be there.
> > > > If the app doesn't want to use tx_prep() by some reason - that still
> > > > ok, and decision is up to the particular app.
> > >
> > > So my question is in what basics application decides to call the preparation.
> > > Can you tell me the use case in application perspective?
> >
> > I suppose one most common use-case when application uses HW TX offloads,
> > and don't' to cope on its own which L3/L4 header fields need to be filled
> > for that particular dev_type/hw_offload combination.
> 
> If it does not cope up then it can skip tx'ing in the actual tx burst
> itself and move the "skipped" tx packets to end of the list in the tx
> burst so that application can take the action on "skipped" packet after
> the tx burst

Sorry, that's too cryptic for me.
Can you reword it somehow?

> 
> 
> > Instead it just setups the ol_flags, fills tx_offload fields and calls tx_prep().
> > Please read the original Tomasz's patch, I think he explained possible use-cases
> > with lot of details.
> 
> Sorry, it is not very clear in terms of use cases.

Ok, what I meant to say:
Right now, if user wants to use HW TX cksum/TSO offloads he might have to:
- setup ipv4 header cksum field.
- calculate the pseudo header checksum
- setup tcp/udp cksum field.

Rules how these calculations need to be done and which fields need to be updated,
may vary depending on HW underneath and requested offloads.
tx_prep() - supposed to hide all these nuances from user and allow him to use TX HW offloads
in a transparent way.

Another main purpose of tx_prep(): for multi-segment packets is to check
that number of segments doesn't exceed  HW limit.
Again right now users have to do that on their own.

> 
> In HW perspective, It it tries to avoid the illegal state. But not sure
> calling "back to back" tx prepare and then tx burst how does it improve the
> situation as the check illegal state check introduce in actual tx burst
> it self.
> 
> In SW perspective, its try to avoid sending malformed packets. In my
> view the same can achieved with existing tx burst it self as PMD is the
> one finally send the packets on the wire.

Ok, so your question is: why not to put that functionality into
tx_burst() itself, right?
For few reasons:
1. putting that functionality into tx_burst() would introduce unnecessary
    slowdown for cases when that functionality is not needed
    (one segment per packet, no HW offloads).
2. User might don't want to use tx_prep() - he/she might have its
    own analog, which he/she belives is faster/smarter,etc.
3.  Having it a s separate function would allow user control when/where
      to call it, let say only for some packets, or probably call tx_prep()
      on one core, and do actual tx_burst() for these packets on the other. 
       
> 
> proposal quote:
> 
> 1. Introduce rte_eth_tx_prep() function to do necessary preparations of
>    packet burst to be safely transmitted on device for desired HW
>    offloads (set/reset checksum field according to the hardware
>    requirements) and check HW constraints (number of segments per
>    packet, etc).
> 
>    While the limitations and requirements may differ for devices, it
>    requires to extend rte_eth_dev structure with new function pointer
>    "tx_pkt_prep" which can be implemented in the driver to prepare and
>    verify packets, in devices specific way, before burst, what should to
>    prevent application to send malformed packets.
> 
> 
> >
> > > and what if the PMD does not implement that callback then it is of waste cycles. Right?
> >
> > If you refer as lost cycles here something like:
> > RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_prep, -ENOTSUP);
> > then yes.
> > Though comparing to actual work need to be done for most HW TX offloads,
> > I think it is neglectable.
> 
> Not sure.
> 
> > Again, as I said before, it is totally voluntary for the application.
> 
> Not according to proposal. It can't be too as application has no idea
> what PMD driver does with "prep" what is the implication on a HW if
> application does not

Why application writer wouldn't have an idea? 
We would document what tx_prep() supposed to do, and in what cases user don't need it.
Then it would be up to the user:
- not to use it at all (one segment per packet, no HW TX offloads)
- not to use tx_prep(), and make necessary preparations himself,
  that what people have to do now.
- use tx_prep()

Konstantin

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [dpdk-dev] [PATCH v2] doc: announce ABI change for rte_eth_dev structure
  2016-07-28 10:36                 ` Ananyev, Konstantin
@ 2016-07-28 11:38                   ` Jerin Jacob
  2016-07-28 12:07                     ` Avi Kivity
  2016-07-28 13:01                     ` Ananyev, Konstantin
  0 siblings, 2 replies; 27+ messages in thread
From: Jerin Jacob @ 2016-07-28 11:38 UTC (permalink / raw)
  To: Ananyev, Konstantin; +Cc: Thomas Monjalon, dev

On Thu, Jul 28, 2016 at 10:36:07AM +0000, Ananyev, Konstantin wrote:
> > If it does not cope up then it can skip tx'ing in the actual tx burst
> > itself and move the "skipped" tx packets to end of the list in the tx
> > burst so that application can take the action on "skipped" packet after
> > the tx burst
> 
> Sorry, that's too cryptic for me.
> Can you reword it somehow?

OK. 
1) lets say application requests 32 packets to send it using tx_burst.
2) packets are from p0 to p31
3) in driver due to some reason, it is not able to send the packets due to some
constraints in the driver(say expect p2 and p16 everything else sent
successfully by the driver)
4) driver can move p2 and p16 at pkt[0] and pkt[1] on tx_burst and
return 30
5) application can take action on p2 and p16 based the return value of
30(ie 32-30 = 2 packets needs to handle at pkt[0] and pkt[1]


> 
> > 
> > 
> > > Instead it just setups the ol_flags, fills tx_offload fields and calls tx_prep().
> > > Please read the original Tomasz's patch, I think he explained possible use-cases
> > > with lot of details.
> > 
> > Sorry, it is not very clear in terms of use cases.
> 
> Ok, what I meant to say:
> Right now, if user wants to use HW TX cksum/TSO offloads he might have to:
> - setup ipv4 header cksum field.
> - calculate the pseudo header checksum
> - setup tcp/udp cksum field.
> 
> Rules how these calculations need to be done and which fields need to be updated,
> may vary depending on HW underneath and requested offloads.
> tx_prep() - supposed to hide all these nuances from user and allow him to use TX HW offloads
> in a transparent way.

Not sure I understand it completely. Bit contradicting with below
statement
|We would document what tx_prep() supposed to do, and in what cases user
|don't need it.

How about introducing a new ethdev generic eal command-line mode OR
new ethdev_configure hint that PMD driver is in "tx_prep->tx_burst" mode
instead of just tx_burst? That way no fast-path performance degradation
for the PMD that does not need it


> 
> Another main purpose of tx_prep(): for multi-segment packets is to check
> that number of segments doesn't exceed  HW limit.
> Again right now users have to do that on their own.
> 
> > 
> > In HW perspective, It it tries to avoid the illegal state. But not sure
> > calling "back to back" tx prepare and then tx burst how does it improve the
> > situation as the check illegal state check introduce in actual tx burst
> > it self.
> > 
> > In SW perspective, its try to avoid sending malformed packets. In my
> > view the same can achieved with existing tx burst it self as PMD is the
> > one finally send the packets on the wire.
> 
> Ok, so your question is: why not to put that functionality into
> tx_burst() itself, right?
> For few reasons:
> 1. putting that functionality into tx_burst() would introduce unnecessary
>     slowdown for cases when that functionality is not needed
>     (one segment per packet, no HW offloads).

These parameters can be configured on init time

> 2. User might don't want to use tx_prep() - he/she might have its
>     own analog, which he/she belives is faster/smarter,etc.

That's the current mode. Right?
> 3.  Having it a s separate function would allow user control when/where
>       to call it, let say only for some packets, or probably call tx_prep()
>       on one core, and do actual tx_burst() for these packets on the other. 
Why to process it under tx_prep() as application can always process the
packet in one core

> > 
> > proposal quote:
> > 
> > 1. Introduce rte_eth_tx_prep() function to do necessary preparations of
> >    packet burst to be safely transmitted on device for desired HW
> >    offloads (set/reset checksum field according to the hardware
> >    requirements) and check HW constraints (number of segments per
> >    packet, etc).
> > 
> >    While the limitations and requirements may differ for devices, it
> >    requires to extend rte_eth_dev structure with new function pointer
> >    "tx_pkt_prep" which can be implemented in the driver to prepare and
> >    verify packets, in devices specific way, before burst, what should to
> >    prevent application to send malformed packets.
> > 
> > 
> > >
> > > > and what if the PMD does not implement that callback then it is of waste cycles. Right?
> > >
> > > If you refer as lost cycles here something like:
> > > RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_prep, -ENOTSUP);
> > > then yes.
> > > Though comparing to actual work need to be done for most HW TX offloads,
> > > I think it is neglectable.
> > 
> > Not sure.
> > 
> > > Again, as I said before, it is totally voluntary for the application.
> > 
> > Not according to proposal. It can't be too as application has no idea
> > what PMD driver does with "prep" what is the implication on a HW if
> > application does not
> 
> Why application writer wouldn't have an idea? 
> We would document what tx_prep() supposed to do, and in what cases user don't need it.

But how he/she detect that on that run-time ?

> Then it would be up to the user:
> - not to use it at all (one segment per packet, no HW TX offloads)

We already have TX flags for that

> - not to use tx_prep(), and make necessary preparations himself,
>   that what people have to do now.
> - use tx_prep()
> 
> Konstantin
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [dpdk-dev] [PATCH v2] doc: announce ABI change for rte_eth_dev structure
  2016-07-21 15:24 ` [dpdk-dev] [PATCH v2] " Tomasz Kulasek
  2016-07-21 22:48   ` Ananyev, Konstantin
@ 2016-07-28 12:04   ` Avi Kivity
  1 sibling, 0 replies; 27+ messages in thread
From: Avi Kivity @ 2016-07-28 12:04 UTC (permalink / raw)
  To: Tomasz Kulasek, dev; +Cc: Vladislav Zolotarov, Takuya ASADA

On 07/21/2016 06:24 PM, Tomasz Kulasek wrote:
> This is an ABI deprecation notice for DPDK 16.11 in librte_ether about
> changes in rte_eth_dev and rte_eth_desc_lim structures.
>
> As discussed in that thread:
>
> http://dpdk.org/ml/archives/dev/2015-September/023603.html
>
> Different NIC models depending on HW offload requested might impose
> different requirements on packets to be TX-ed in terms of:
>
>   - Max number of fragments per packet allowed
>   - Max number of fragments per TSO segments
>   - The way pseudo-header checksum should be pre-calculated
>   - L3/L4 header fields filling
>   - etc.
>
>
> MOTIVATION:
> -----------
>
> 1) Some work cannot (and didn't should) be done in rte_eth_tx_burst.
>     However, this work is sometimes required, and now, it's an
>     application issue.
>
> 2) Different hardware may have different requirements for TX offloads,
>     other subset can be supported and so on.
>
> 3) Some parameters (eg. number of segments in ixgbe driver) may hung
>     device. These parameters may be vary for different devices.
>
>     For example i40e HW allows 8 fragments per packet, but that is after
>     TSO segmentation. While ixgbe has a 38-fragment pre-TSO limit.
>
> 4) Fields in packet may require different initialization (like eg. will
>     require pseudo-header checksum precalculation, sometimes in a
>     different way depending on packet type, and so on). Now application
>     needs to care about it.
>
> 5) Using additional API (rte_eth_tx_prep) before rte_eth_tx_burst let to
>     prepare packet burst in acceptable form for specific device.
>
> 6) Some additional checks may be done in debug mode keeping tx_burst
>     implementation clean.

Thanks a lot for this.  Seastar suffered from this issue and had to 
apply NIC-specific workarounds.

The proposal will work well for seastar.

>
> PROPOSAL:
> ---------
>
> To help user to deal with all these varieties we propose to:
>
> 1. Introduce rte_eth_tx_prep() function to do necessary preparations of
>     packet burst to be safely transmitted on device for desired HW
>     offloads (set/reset checksum field according to the hardware
>     requirements) and check HW constraints (number of segments per
>     packet, etc).
>
>     While the limitations and requirements may differ for devices, it
>     requires to extend rte_eth_dev structure with new function pointer
>     "tx_pkt_prep" which can be implemented in the driver to prepare and
>     verify packets, in devices specific way, before burst, what should to
>     prevent application to send malformed packets.
>
> 2. Also new fields will be introduced in rte_eth_desc_lim:
>     nb_seg_max and nb_mtu_seg_max, providing an information about max
>     segments in TSO and non-TSO packets acceptable by device.
>
>     This information is useful for application to not create/limit
>     malicious packet.
>
>
> APPLICATION (CASE OF USE):
> --------------------------
>
> 1) Application should to initialize burst of packets to send, set
>     required tx offload flags and required fields, like l2_len, l3_len,
>     l4_len, and tso_segsz
>
> 2) Application passes burst to the rte_eth_tx_prep to check conditions
>     required to send packets through the NIC.
>
> 3) The result of rte_eth_tx_prep can be used to send valid packets
>     and/or restore invalid if function fails.
>
> eg.
>
> 	for (i = 0; i < nb_pkts; i++) {
>
> 		/* initialize or process packet */
>
> 		bufs[i]->tso_segsz = 800;
> 		bufs[i]->ol_flags = PKT_TX_TCP_SEG | PKT_TX_IPV4
> 				| PKT_TX_IP_CKSUM;
> 		bufs[i]->l2_len = sizeof(struct ether_hdr);
> 		bufs[i]->l3_len = sizeof(struct ipv4_hdr);
> 		bufs[i]->l4_len = sizeof(struct tcp_hdr);
> 	}
>
> 	/* Prepare burst of TX packets */
> 	nb_prep = rte_eth_tx_prep(port, 0, bufs, nb_pkts);
>
> 	if (nb_prep < nb_pkts) {
> 		printf("tx_prep failed\n");
>
> 		/* drop or restore invalid packets */
>
> 	}
>
> 	/* Send burst of TX packets */
> 	nb_tx = rte_eth_tx_burst(port, 0, bufs, nb_prep);
>
> 	/* Free any unsent packets. */
>
>
>
> Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
> ---
>   doc/guides/rel_notes/deprecation.rst |    7 +++++++
>   1 file changed, 7 insertions(+)
>
> diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
> index f502f86..485aacb 100644
> --- a/doc/guides/rel_notes/deprecation.rst
> +++ b/doc/guides/rel_notes/deprecation.rst
> @@ -41,3 +41,10 @@ Deprecation Notices
>   * The mempool functions for single/multi producer/consumer are deprecated and
>     will be removed in 16.11.
>     It is replaced by rte_mempool_generic_get/put functions.
> +
> +* In 16.11 ABI changes are plained: the ``rte_eth_dev`` structure will be
> +  extended with new function pointer ``tx_pkt_prep`` allowing verification
> +  and processing of packet burst to meet HW specific requirements before
> +  transmit. Also new fields will be added to the ``rte_eth_desc_lim`` structure:
> +  ``nb_seg_max`` and ``nb_mtu_seg_max`` provideing information about number of
> +  segments limit to be transmitted by device for TSO/non-TSO packets.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [dpdk-dev] [PATCH v2] doc: announce ABI change for rte_eth_dev structure
  2016-07-28 11:38                   ` Jerin Jacob
@ 2016-07-28 12:07                     ` Avi Kivity
  2016-07-28 13:01                     ` Ananyev, Konstantin
  1 sibling, 0 replies; 27+ messages in thread
From: Avi Kivity @ 2016-07-28 12:07 UTC (permalink / raw)
  To: Jerin Jacob, Ananyev, Konstantin; +Cc: Thomas Monjalon, dev



On 07/28/2016 02:38 PM, Jerin Jacob wrote:
> On Thu, Jul 28, 2016 at 10:36:07AM +0000, Ananyev, Konstantin wrote:
>>> If it does not cope up then it can skip tx'ing in the actual tx burst
>>> itself and move the "skipped" tx packets to end of the list in the tx
>>> burst so that application can take the action on "skipped" packet after
>>> the tx burst
>> Sorry, that's too cryptic for me.
>> Can you reword it somehow?
> OK.
> 1) lets say application requests 32 packets to send it using tx_burst.
> 2) packets are from p0 to p31
> 3) in driver due to some reason, it is not able to send the packets due to some
> constraints in the driver(say expect p2 and p16 everything else sent
> successfully by the driver)
> 4) driver can move p2 and p16 at pkt[0] and pkt[1] on tx_burst and
> return 30
> 5) application can take action on p2 and p16 based the return value of
> 30(ie 32-30 = 2 packets needs to handle at pkt[0] and pkt[1]

That can cause reordering; while it is legal, it reduces tcp performance.

Better to preserve the application-provided order.


>
>>>
>>>> Instead it just setups the ol_flags, fills tx_offload fields and calls tx_prep().
>>>> Please read the original Tomasz's patch, I think he explained possible use-cases
>>>> with lot of details.
>>> Sorry, it is not very clear in terms of use cases.
>> Ok, what I meant to say:
>> Right now, if user wants to use HW TX cksum/TSO offloads he might have to:
>> - setup ipv4 header cksum field.
>> - calculate the pseudo header checksum
>> - setup tcp/udp cksum field.
>>
>> Rules how these calculations need to be done and which fields need to be updated,
>> may vary depending on HW underneath and requested offloads.
>> tx_prep() - supposed to hide all these nuances from user and allow him to use TX HW offloads
>> in a transparent way.
> Not sure I understand it completely. Bit contradicting with below
> statement
> |We would document what tx_prep() supposed to do, and in what cases user
> |don't need it.
>
> How about introducing a new ethdev generic eal command-line mode OR
> new ethdev_configure hint that PMD driver is in "tx_prep->tx_burst" mode
> instead of just tx_burst? That way no fast-path performance degradation
> for the PMD that does not need it
>
>
>> Another main purpose of tx_prep(): for multi-segment packets is to check
>> that number of segments doesn't exceed  HW limit.
>> Again right now users have to do that on their own.
>>
>>> In HW perspective, It it tries to avoid the illegal state. But not sure
>>> calling "back to back" tx prepare and then tx burst how does it improve the
>>> situation as the check illegal state check introduce in actual tx burst
>>> it self.
>>>
>>> In SW perspective, its try to avoid sending malformed packets. In my
>>> view the same can achieved with existing tx burst it self as PMD is the
>>> one finally send the packets on the wire.
>> Ok, so your question is: why not to put that functionality into
>> tx_burst() itself, right?
>> For few reasons:
>> 1. putting that functionality into tx_burst() would introduce unnecessary
>>      slowdown for cases when that functionality is not needed
>>      (one segment per packet, no HW offloads).
> These parameters can be configured on init time
>
>> 2. User might don't want to use tx_prep() - he/she might have its
>>      own analog, which he/she belives is faster/smarter,etc.
> That's the current mode. Right?
>> 3.  Having it a s separate function would allow user control when/where
>>        to call it, let say only for some packets, or probably call tx_prep()
>>        on one core, and do actual tx_burst() for these packets on the other.
> Why to process it under tx_prep() as application can always process the
> packet in one core
>
>>> proposal quote:
>>>
>>> 1. Introduce rte_eth_tx_prep() function to do necessary preparations of
>>>     packet burst to be safely transmitted on device for desired HW
>>>     offloads (set/reset checksum field according to the hardware
>>>     requirements) and check HW constraints (number of segments per
>>>     packet, etc).
>>>
>>>     While the limitations and requirements may differ for devices, it
>>>     requires to extend rte_eth_dev structure with new function pointer
>>>     "tx_pkt_prep" which can be implemented in the driver to prepare and
>>>     verify packets, in devices specific way, before burst, what should to
>>>     prevent application to send malformed packets.
>>>
>>>
>>>>> and what if the PMD does not implement that callback then it is of waste cycles. Right?
>>>> If you refer as lost cycles here something like:
>>>> RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_prep, -ENOTSUP);
>>>> then yes.
>>>> Though comparing to actual work need to be done for most HW TX offloads,
>>>> I think it is neglectable.
>>> Not sure.
>>>
>>>> Again, as I said before, it is totally voluntary for the application.
>>> Not according to proposal. It can't be too as application has no idea
>>> what PMD driver does with "prep" what is the implication on a HW if
>>> application does not
>> Why application writer wouldn't have an idea?
>> We would document what tx_prep() supposed to do, and in what cases user don't need it.
> But how he/she detect that on that run-time ?
>
>> Then it would be up to the user:
>> - not to use it at all (one segment per packet, no HW TX offloads)
> We already have TX flags for that
>
>> - not to use tx_prep(), and make necessary preparations himself,
>>    that what people have to do now.
>> - use tx_prep()
>>
>> Konstantin
>>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [dpdk-dev] [PATCH v2] doc: announce ABI change for rte_eth_dev structure
  2016-07-28 11:38                   ` Jerin Jacob
  2016-07-28 12:07                     ` Avi Kivity
@ 2016-07-28 13:01                     ` Ananyev, Konstantin
  2016-07-28 13:58                       ` Olivier MATZ
  2016-07-28 13:59                       ` Jerin Jacob
  1 sibling, 2 replies; 27+ messages in thread
From: Ananyev, Konstantin @ 2016-07-28 13:01 UTC (permalink / raw)
  To: Jerin Jacob; +Cc: Thomas Monjalon, dev



> -----Original Message-----
> From: Jerin Jacob [mailto:jerin.jacob@caviumnetworks.com]
> Sent: Thursday, July 28, 2016 12:39 PM
> To: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> Cc: Thomas Monjalon <thomas.monjalon@6wind.com>; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v2] doc: announce ABI change for rte_eth_dev structure
> 
> On Thu, Jul 28, 2016 at 10:36:07AM +0000, Ananyev, Konstantin wrote:
> > > If it does not cope up then it can skip tx'ing in the actual tx
> > > burst itself and move the "skipped" tx packets to end of the list in
> > > the tx burst so that application can take the action on "skipped"
> > > packet after the tx burst
> >
> > Sorry, that's too cryptic for me.
> > Can you reword it somehow?
> 
> OK.
> 1) lets say application requests 32 packets to send it using tx_burst.
> 2) packets are from p0 to p31
> 3) in driver due to some reason, it is not able to send the packets due to some constraints in the driver(say expect p2 and p16 everything
> else sent successfully by the driver)
> 4) driver can move p2 and p16 at pkt[0] and pkt[1] on tx_burst and return 30
> 5) application can take action on p2 and p16 based the return value of 30(ie 32-30 = 2 packets needs to handle at pkt[0] and pkt[1]

That would introduce packets reordering and unnecessary complicate the PMD TX functions.
Again it would require changes in *all* existing PMD tx functions.
So we don't plan to do things that way.

> 
> 
> >
> > >
> > >
> > > > Instead it just setups the ol_flags, fills tx_offload fields and calls tx_prep().
> > > > Please read the original Tomasz's patch, I think he explained
> > > > possible use-cases with lot of details.
> > >
> > > Sorry, it is not very clear in terms of use cases.
> >
> > Ok, what I meant to say:
> > Right now, if user wants to use HW TX cksum/TSO offloads he might have to:
> > - setup ipv4 header cksum field.
> > - calculate the pseudo header checksum
> > - setup tcp/udp cksum field.
> >
> > Rules how these calculations need to be done and which fields need to
> > be updated, may vary depending on HW underneath and requested offloads.
> > tx_prep() - supposed to hide all these nuances from user and allow him
> > to use TX HW offloads in a transparent way.
> 
> Not sure I understand it completely. Bit contradicting with below statement
> |We would document what tx_prep() supposed to do, and in what cases user
> |don't need it.

How that contradicts?
Right now to make HW TX offloads to work user is required to do particular actions:
1. set mbuf.ol_flags properly.
2. setup mbuf.tx_offload fields properly.
3. update L3/L4 header fields in a particular way.

We move #3 into tx_prep(), to hide that complexity from the user simplify things for him.
Though if he still prefers to do #3  by himself - that's ok too.
 
> 
> How about introducing a new ethdev generic eal command-line mode OR new ethdev_configure hint that PMD driver is in "tx_prep-
> >tx_burst" mode instead of just tx_burst? That way no fast-path performance degradation for the PMD that does not need it

User might want to send different packets over different devices,
or even over different queues over the same device.
Or even he might want to call tx_prep() for one group of packets,
but skip for different group of packets for the same TX queue. 
So I think we should allow user to decide when/where to call it.

> 
> 
> >
> > Another main purpose of tx_prep(): for multi-segment packets is to
> > check that number of segments doesn't exceed  HW limit.
> > Again right now users have to do that on their own.
> >
> > >
> > > In HW perspective, It it tries to avoid the illegal state. But not
> > > sure calling "back to back" tx prepare and then tx burst how does it
> > > improve the situation as the check illegal state check introduce in
> > > actual tx burst it self.
> > >
> > > In SW perspective, its try to avoid sending malformed packets. In my
> > > view the same can achieved with existing tx burst it self as PMD is
> > > the one finally send the packets on the wire.
> >
> > Ok, so your question is: why not to put that functionality into
> > tx_burst() itself, right?
> > For few reasons:
> > 1. putting that functionality into tx_burst() would introduce unnecessary
> >     slowdown for cases when that functionality is not needed
> >     (one segment per packet, no HW offloads).
> 
> These parameters can be configured on init time

No always, see above.

> 
> > 2. User might don't want to use tx_prep() - he/she might have its
> >     own analog, which he/she belives is faster/smarter,etc.
> 
> That's the current mode. Right?

Yes.

> > 3.  Having it a s separate function would allow user control when/where
> >       to call it, let say only for some packets, or probably call tx_prep()
> >       on one core, and do actual tx_burst() for these packets on the other.


> Why to process it under tx_prep() as application can always process the packet in one core

Because not every application wants to do it over the same core.
Some apps would like to do it on the same core, some apps would like to do it on different core.
With proposed API both models are possible.

> 
> > >
> > > proposal quote:
> > >
> > > 1. Introduce rte_eth_tx_prep() function to do necessary preparations of
> > >    packet burst to be safely transmitted on device for desired HW
> > >    offloads (set/reset checksum field according to the hardware
> > >    requirements) and check HW constraints (number of segments per
> > >    packet, etc).
> > >
> > >    While the limitations and requirements may differ for devices, it
> > >    requires to extend rte_eth_dev structure with new function pointer
> > >    "tx_pkt_prep" which can be implemented in the driver to prepare and
> > >    verify packets, in devices specific way, before burst, what should to
> > >    prevent application to send malformed packets.
> > >
> > >
> > > >
> > > > > and what if the PMD does not implement that callback then it is of waste cycles. Right?
> > > >
> > > > If you refer as lost cycles here something like:
> > > > RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_prep, -ENOTSUP); then
> > > > yes.
> > > > Though comparing to actual work need to be done for most HW TX
> > > > offloads, I think it is neglectable.
> > >
> > > Not sure.
> > >
> > > > Again, as I said before, it is totally voluntary for the application.
> > >
> > > Not according to proposal. It can't be too as application has no
> > > idea what PMD driver does with "prep" what is the implication on a
> > > HW if application does not
> >
> > Why application writer wouldn't have an idea?
> > We would document what tx_prep() supposed to do, and in what cases user don't need it.
> 
> But how he/she detect that on that run-time ?

By the application logic for example.
If let say is doing the l2fwd for that group of packets, it would know
that it doesn't need to do tx_prep().

To be honest, I don't understand what is your concern here.
That proposed change doesn't break any existing functionality,
doesn't introduce any new requirements to the existing API, 
and wouldn't introduce any performance regression for existing apps.
It is a an extension, and user is free not to use it, if it doesn't fit his needs.
>From other side there are users who are interested in that functionality,
and they do have use-cases for  it.
So what worries you?
Konstantin

> 
> > Then it would be up to the user:
> > - not to use it at all (one segment per packet, no HW TX offloads)
> 
> We already have TX flags for that
> 
> > - not to use tx_prep(), and make necessary preparations himself,
> >   that what people have to do now.
> > - use tx_prep()
> >
> > Konstantin
> >

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [dpdk-dev] [PATCH v2] doc: announce ABI change for rte_eth_dev structure
  2016-07-28 13:01                     ` Ananyev, Konstantin
@ 2016-07-28 13:58                       ` Olivier MATZ
  2016-07-28 14:21                         ` Ananyev, Konstantin
  2016-07-28 13:59                       ` Jerin Jacob
  1 sibling, 1 reply; 27+ messages in thread
From: Olivier MATZ @ 2016-07-28 13:58 UTC (permalink / raw)
  To: Ananyev, Konstantin, Jerin Jacob; +Cc: Thomas Monjalon, dev

Hi,

Jumping into this thread, it looks it's the last pending patch remaining 
for the release.

For reference, the idea of tx_prep() was mentionned some time ago in
http://dpdk.org/ml/archives/dev/2014-May/002504.html

Few comments below.

On 07/28/2016 03:01 PM, Ananyev, Konstantin wrote:
> Right now to make HW TX offloads to work user is required to do particular actions:
> 1. set mbuf.ol_flags properly.
> 2. setup mbuf.tx_offload fields properly.
> 3. update L3/L4 header fields in a particular way.
>
> We move #3 into tx_prep(), to hide that complexity from the user simplify things for him.
> Though if he still prefers to do #3  by himself - that's ok too.

I think moving #3 out of the application is a good idea. Today, for TSO, 
the offload dpdk API requires to set a specific pseudo header checksum 
(which does not include the ip len, as expected by Intel drivers), and 
set the IP checksum to 0.

In our app, the network stack sets the TCP checksum to the standard 
pseudo header checksum, and before sending the mbuf:
- packets are split in sw if the driver does not support tso
- the tcp csum is patched to match dpdk api if the driver supports tso

In the patchs I've recently sent adding tso support for virtio-pmd, it 
conforms to the dpdk API (phdr csum without ip len), so the tx function 
need to patch the mbuf data inside the driver, which is something what 
we want to avoid, for some good reasons explained by Konstantin.

So, I think having a tx_prep would also be the occasion to standardize a 
bit the dpdk offload api, and let the driver-specific stuff in tx_prep().

Konstantin, any opinion about this?

>>> Another main purpose of tx_prep(): for multi-segment packets is to
>>> check that number of segments doesn't exceed  HW limit.
>>> Again right now users have to do that on their own.

If calling tx_prep() is optional, does it mean that this check may be 
done twice? (once in tx_prep() and once in tx_burst())

What would be the advantage of doing this check in tx_prep() instead of 
keeping it in tx_burst(), as it does not touch the mbuf data?

>>> 3.  Having it a s separate function would allow user control when/where
>>>        to call it, let say only for some packets, or probably call tx_prep()
>>>        on one core, and do actual tx_burst() for these packets on the other.

Yes, from what I remember, the pipeline model was the main reason why we 
do not just modify the packet in tx_burst(). Right?

>>>>> If you refer as lost cycles here something like:
>>>>> RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_prep, -ENOTSUP); then
>>>>> yes.
>>>>> Though comparing to actual work need to be done for most HW TX
>>>>> offloads, I think it is neglectable.
>>>>
>>>> Not sure.

I think doing this test on a per-bulk basis should not impact performance.

> To be honest, I don't understand what is your concern here.
> That proposed change doesn't break any existing functionality,
> doesn't introduce any new requirements to the existing API,
> and wouldn't introduce any performance regression for existing apps.
> It is a an extension, and user is free not to use it, if it doesn't fit his needs.
>  From other side there are users who are interested in that functionality,
> and they do have use-cases for  it.

In my opinion, using tx_prep() will implicitly become mandatory as soon 
as the application want to do offload. An application that won't use it 
will have to prepare the mbuf, and this preparation will depend on the 
device, which is not acceptable inside an application.


So, to conclude, the api change notification looks good to me, even if 
there is still some room to discuss the implementation details.


Acked-by: Olivier Matz <olivier.matz@6wind.com>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [dpdk-dev] [PATCH v2] doc: announce ABI change for rte_eth_dev structure
  2016-07-28 13:01                     ` Ananyev, Konstantin
  2016-07-28 13:58                       ` Olivier MATZ
@ 2016-07-28 13:59                       ` Jerin Jacob
  2016-07-28 14:52                         ` Thomas Monjalon
  1 sibling, 1 reply; 27+ messages in thread
From: Jerin Jacob @ 2016-07-28 13:59 UTC (permalink / raw)
  To: Ananyev, Konstantin; +Cc: Thomas Monjalon, dev

On Thu, Jul 28, 2016 at 01:01:16PM +0000, Ananyev, Konstantin wrote:
> 
> 
> > > >
> > > > Not according to proposal. It can't be too as application has no
> > > > idea what PMD driver does with "prep" what is the implication on a
> > > > HW if application does not
> > >
> > > Why application writer wouldn't have an idea?
> > > We would document what tx_prep() supposed to do, and in what cases user don't need it.
> > 
> > But how he/she detect that on that run-time ?
> 
> By the application logic for example.
> If let say is doing the l2fwd for that group of packets, it would know
> that it doesn't need to do tx_prep().
> 
> To be honest, I don't understand what is your concern here.
> That proposed change doesn't break any existing functionality,
> doesn't introduce any new requirements to the existing API, 
> and wouldn't introduce any performance regression for existing apps.

Yes for the existing application but no for ANY application that uses tx_prep() in future,
that run on the PMD where callback is NULL(one/two PMDs vs N PMDs)

> It is a an extension, and user is free not to use it, if it doesn't fit his needs.

If it is a workaround for a specific HW then why to change the normative
"fastpath" ethdev specification. You could give your fixup as internal
PMD driver routine and be done with it. It is as simple as that.

> From other side there are users who are interested in that functionality,
> and they do have use-cases for  it.
> So what worries you?
Above things worries me, I wouldn't have cared if the changes are not comes
in fastpath and I don't think this sort of issues will never get fixed any time
soon in this community.

So I given up.

Jerin

> Konstantin
> 
> > 
> > > Then it would be up to the user:
> > > - not to use it at all (one segment per packet, no HW TX offloads)
> > 
> > We already have TX flags for that
> > 
> > > - not to use tx_prep(), and make necessary preparations himself,
> > >   that what people have to do now.
> > > - use tx_prep()
> > >
> > > Konstantin
> > >

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [dpdk-dev] [PATCH v2] doc: announce ABI change for rte_eth_dev structure
  2016-07-28 13:58                       ` Olivier MATZ
@ 2016-07-28 14:21                         ` Ananyev, Konstantin
  0 siblings, 0 replies; 27+ messages in thread
From: Ananyev, Konstantin @ 2016-07-28 14:21 UTC (permalink / raw)
  To: Olivier MATZ, Jerin Jacob; +Cc: Thomas Monjalon, dev

Hi Olivier,

> 
> Hi,
> 
> Jumping into this thread, it looks it's the last pending patch remaining for the release.
> 
> For reference, the idea of tx_prep() was mentionned some time ago in http://dpdk.org/ml/archives/dev/2014-May/002504.html
> 
> Few comments below.
> 
> On 07/28/2016 03:01 PM, Ananyev, Konstantin wrote:
> > Right now to make HW TX offloads to work user is required to do particular actions:
> > 1. set mbuf.ol_flags properly.
> > 2. setup mbuf.tx_offload fields properly.
> > 3. update L3/L4 header fields in a particular way.
> >
> > We move #3 into tx_prep(), to hide that complexity from the user simplify things for him.
> > Though if he still prefers to do #3  by himself - that's ok too.
> 
> I think moving #3 out of the application is a good idea. Today, for TSO, the offload dpdk API requires to set a specific pseudo header
> checksum (which does not include the ip len, as expected by Intel drivers), and set the IP checksum to 0.
> 
> In our app, the network stack sets the TCP checksum to the standard pseudo header checksum, and before sending the mbuf:
> - packets are split in sw if the driver does not support tso
> - the tcp csum is patched to match dpdk api if the driver supports tso
> 
> In the patchs I've recently sent adding tso support for virtio-pmd, it conforms to the dpdk API (phdr csum without ip len), so the tx function
> need to patch the mbuf data inside the driver, which is something what we want to avoid, for some good reasons explained by Konstantin.

Yep, that would be another good use-case for tx_prep() I suppose.

> 
> So, I think having a tx_prep would also be the occasion to standardize a bit the dpdk offload api, and let the driver-specific stuff in tx_prep().
> 
> Konstantin, any opinion about this?

Yes, that sounds like a good thing to me.

> 
> >>> Another main purpose of tx_prep(): for multi-segment packets is to
> >>> check that number of segments doesn't exceed  HW limit.
> >>> Again right now users have to do that on their own.
> 
> If calling tx_prep() is optional, does it mean that this check may be done twice? (once in tx_prep() and once in tx_burst())

I meant 'optional' in a way, that if user doesn't want to use tx_prep() and
do step #3 from the above on his own (what happens now) that is still ok.
But I think step #3 (modify packet's data) still needs to be done before tx_burst()  is called for the packets.

> 
> What would be the advantage of doing this check in tx_prep() instead of keeping it in tx_burst(), as it does not touch the mbuf data?
> 
> >>> 3.  Having it a s separate function would allow user control when/where
> >>>        to call it, let say only for some packets, or probably call tx_prep()
> >>>        on one core, and do actual tx_burst() for these packets on the other.
> 
> Yes, from what I remember, the pipeline model was the main reason why we do not just modify the packet in tx_burst(). Right?

Yes.

> 
> >>>>> If you refer as lost cycles here something like:
> >>>>> RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_prep, -ENOTSUP); then
> >>>>> yes.
> >>>>> Though comparing to actual work need to be done for most HW TX
> >>>>> offloads, I think it is neglectable.
> >>>>
> >>>> Not sure.
> 
> I think doing this test on a per-bulk basis should not impact performance.
> 
> > To be honest, I don't understand what is your concern here.
> > That proposed change doesn't break any existing functionality, doesn't
> > introduce any new requirements to the existing API, and wouldn't
> > introduce any performance regression for existing apps.
> > It is a an extension, and user is free not to use it, if it doesn't fit his needs.
> >  From other side there are users who are interested in that
> > functionality, and they do have use-cases for  it.
> 
> In my opinion, using tx_prep() will implicitly become mandatory as soon as the application want to do offload. An application that won't use
> it will have to prepare the mbuf, and this preparation will depend on the device, which is not acceptable inside an application.

Yes, I also hope that most apps that do use TX offloads will start to use it,
as I think it will be much more convenient way, then what we have right now.
I just liked to emphasis that user wouldn't be forced to.
Konstantin

> 
> 
> So, to conclude, the api change notification looks good to me, even if there is still some room to discuss the implementation details.
> 
> 
> Acked-by: Olivier Matz <olivier.matz@6wind.com>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [dpdk-dev] [PATCH v2] doc: announce ABI change for rte_eth_dev structure
  2016-07-28 13:59                       ` Jerin Jacob
@ 2016-07-28 14:52                         ` Thomas Monjalon
  2016-07-28 16:25                           ` Jerin Jacob
  0 siblings, 1 reply; 27+ messages in thread
From: Thomas Monjalon @ 2016-07-28 14:52 UTC (permalink / raw)
  To: Jerin Jacob; +Cc: Ananyev, Konstantin, dev

2016-07-28 19:29, Jerin Jacob:
> Above things worries me, I wouldn't have cared if the changes are not comes
> in fastpath and I don't think this sort of issues will never get fixed any time
> soon in this community.
> 
> So I given up.

I feel something goes wrong here but I cannot understand your sentence.
Please could you reword/explain Jerin?

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [dpdk-dev] [PATCH v2] doc: announce ABI change for rte_eth_dev structure
  2016-07-28 14:52                         ` Thomas Monjalon
@ 2016-07-28 16:25                           ` Jerin Jacob
  2016-07-28 17:07                             ` Thomas Monjalon
  0 siblings, 1 reply; 27+ messages in thread
From: Jerin Jacob @ 2016-07-28 16:25 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: Ananyev, Konstantin, dev

On Thu, Jul 28, 2016 at 04:52:45PM +0200, Thomas Monjalon wrote:
> 2016-07-28 19:29, Jerin Jacob:
> > Above things worries me, I wouldn't have cared if the changes are not comes
> > in fastpath and I don't think this sort of issues will never get fixed any time
> > soon in this community.
> > 
> > So I given up.
> 
> I feel something goes wrong here but I cannot understand your sentence.
> Please could you reword/explain Jerin?

I guess you have removed the context from the email. Never mind.

1) IMHO, Introducing a new fast path API which has "performance impact"
on existing other PMD should get the consensus from the other PMD maintainers.
At least, bare minimum, send a patch much in advance with the
implementation of ethdev API as well as PMD
driver implementation to get feedback from other developers _before_ ABI
change announcement rather just debating on hypothetical points.

2) What I can understand from the discussion is that it is the
workaround for an HW limitation.
At this point, I am not sure tx_prep is the only way to address it and
do other PMD have similar
restriction?. If yes, Can we have abstract it in a proper way the usage
part will be very clear from PMD and application perspective?

Jerin

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [dpdk-dev] [PATCH v2] doc: announce ABI change for rte_eth_dev structure
  2016-07-28 16:25                           ` Jerin Jacob
@ 2016-07-28 17:07                             ` Thomas Monjalon
  0 siblings, 0 replies; 27+ messages in thread
From: Thomas Monjalon @ 2016-07-28 17:07 UTC (permalink / raw)
  To: Jerin Jacob, Ananyev, Konstantin, Tomasz Kulasek; +Cc: dev

2016-07-28 21:55, Jerin Jacob:
> On Thu, Jul 28, 2016 at 04:52:45PM +0200, Thomas Monjalon wrote:
> > 2016-07-28 19:29, Jerin Jacob:
> > > Above things worries me, I wouldn't have cared if the changes are not comes
> > > in fastpath and I don't think this sort of issues will never get fixed any time
> > > soon in this community.
> > > 
> > > So I given up.
> > 
> > I feel something goes wrong here but I cannot understand your sentence.
> > Please could you reword/explain Jerin?
> 
> I guess you have removed the context from the email. Never mind.
> 
> 1) IMHO, Introducing a new fast path API which has "performance impact"
> on existing other PMD should get the consensus from the other PMD maintainers.
> At least, bare minimum, send a patch much in advance with the
> implementation of ethdev API as well as PMD
> driver implementation to get feedback from other developers _before_ ABI
> change announcement rather just debating on hypothetical points.

I totally agree with you and it was my first comment in this thread:
	http://dpdk.org/ml/archives/dev/2016-July/044366.html
Unfortunately it is difficult to have a formal process so it is not
so strict currently. You are welcome to suggest how to improve the
process for the next releases.

> 2) What I can understand from the discussion is that it is the
> workaround for an HW limitation.
> At this point, I am not sure tx_prep is the only way to address it and
> do other PMD have similar
> restriction?. If yes, Can we have abstract it in a proper way the usage
> part will be very clear from PMD and application perspective?

I feel the tx_prep can be interesting to solve a current problem.
However, as you say, there are maybe other solutions to consider.
That's why I think we can keep this deprecation notice and follow up
with a patch-based discussion. We will be able to discard this change
if something better is found.
As an example, we have just removed a deprecation notice which has
never been implemented:
	http://dpdk.org/browse/dpdk/commit/?id=16695af340
I know this process is not perfect and the ethdev API is far from perfect,
so we must continue improving our process to define a good API.

Konstantin, Tomasz,
I generally prefer waiting for a consensus. For this case, I'll make an
exception and apply the deprecation notice.
Please make an effort to better explain your next patches and meet
a clear consensus. We'll review your patches very carefully and keep
the right to reject them.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [dpdk-dev] [PATCH] doc: announce ABI change for rte_eth_dev structure
  2016-07-20 14:24 [dpdk-dev] [PATCH] doc: announce ABI change for rte_eth_dev structure Tomasz Kulasek
  2016-07-20 15:01 ` Thomas Monjalon
  2016-07-21 15:24 ` [dpdk-dev] [PATCH v2] " Tomasz Kulasek
@ 2016-07-31  7:46 ` Vlad Zolotarov
  2016-07-31  8:10   ` Vlad Zolotarov
  2 siblings, 1 reply; 27+ messages in thread
From: Vlad Zolotarov @ 2016-07-31  7:46 UTC (permalink / raw)
  To: Tomasz Kulasek, dev



On 07/20/2016 05:24 PM, Tomasz Kulasek wrote:
> This is an ABI deprecation notice for DPDK 16.11 in librte_ether about
> changes in rte_eth_dev and rte_eth_desc_lim structures.
>
> In 16.11, we plan to introduce rte_eth_tx_prep() function to do
> necessary preparations of packet burst to be safely transmitted on
> device for desired HW offloads (set/reset checksum field according to
> the hardware requirements) and check HW constraints (number of segments
> per packet, etc).
>
> While the limitations and requirements may differ for devices, it
> requires to extend rte_eth_dev structure with new function pointer
> "tx_pkt_prep" which can be implemented in the driver to prepare and
> verify packets, in devices specific way, before burst, what should to
> prevent application to send malformed packets.
>
> Also new fields will be introduced in rte_eth_desc_lim: nb_seg_max and
> nb_mtu_seg_max, providing an information about max segments in TSO and
> non TSO packets acceptable by device.
>
> Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>

Acked-by: Vlad Zolotarov <vladz@scylladb.com>

> ---
>   doc/guides/rel_notes/deprecation.rst |    7 +++++++
>   1 file changed, 7 insertions(+)
>
> diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
> index f502f86..485aacb 100644
> --- a/doc/guides/rel_notes/deprecation.rst
> +++ b/doc/guides/rel_notes/deprecation.rst
> @@ -41,3 +41,10 @@ Deprecation Notices
>   * The mempool functions for single/multi producer/consumer are deprecated and
>     will be removed in 16.11.
>     It is replaced by rte_mempool_generic_get/put functions.
> +
> +* In 16.11 ABI changes are plained: the ``rte_eth_dev`` structure will be
> +  extended with new function pointer ``tx_pkt_prep`` allowing verification
> +  and processing of packet burst to meet HW specific requirements before
> +  transmit. Also new fields will be added to the ``rte_eth_desc_lim`` structure:
> +  ``nb_seg_max`` and ``nb_mtu_seg_max`` provideing information about number of
> +  segments limit to be transmitted by device for TSO/non-TSO packets.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [dpdk-dev] [PATCH v2] doc: announce ABI change for rte_eth_dev structure
  2016-07-21 22:48   ` Ananyev, Konstantin
  2016-07-27  8:59     ` Thomas Monjalon
@ 2016-07-31  7:50     ` Vlad Zolotarov
  1 sibling, 0 replies; 27+ messages in thread
From: Vlad Zolotarov @ 2016-07-31  7:50 UTC (permalink / raw)
  To: Ananyev, Konstantin, Kulasek, TomaszX, dev



On 07/22/2016 01:48 AM, Ananyev, Konstantin wrote:
>
>> This is an ABI deprecation notice for DPDK 16.11 in librte_ether about
>> changes in rte_eth_dev and rte_eth_desc_lim structures.
>>
>> As discussed in that thread:
>>
>> http://dpdk.org/ml/archives/dev/2015-September/023603.html
>>
>> Different NIC models depending on HW offload requested might impose
>> different requirements on packets to be TX-ed in terms of:
>>
>>   - Max number of fragments per packet allowed
>>   - Max number of fragments per TSO segments
>>   - The way pseudo-header checksum should be pre-calculated
>>   - L3/L4 header fields filling
>>   - etc.
>>
>>
>> MOTIVATION:
>> -----------
>>
>> 1) Some work cannot (and didn't should) be done in rte_eth_tx_burst.
>>     However, this work is sometimes required, and now, it's an
>>     application issue.
>>
>> 2) Different hardware may have different requirements for TX offloads,
>>     other subset can be supported and so on.
>>
>> 3) Some parameters (eg. number of segments in ixgbe driver) may hung
>>     device. These parameters may be vary for different devices.
>>
>>     For example i40e HW allows 8 fragments per packet, but that is after
>>     TSO segmentation. While ixgbe has a 38-fragment pre-TSO limit.
>>
>> 4) Fields in packet may require different initialization (like eg. will
>>     require pseudo-header checksum precalculation, sometimes in a
>>     different way depending on packet type, and so on). Now application
>>     needs to care about it.
>>
>> 5) Using additional API (rte_eth_tx_prep) before rte_eth_tx_burst let to
>>     prepare packet burst in acceptable form for specific device.
>>
>> 6) Some additional checks may be done in debug mode keeping tx_burst
>>     implementation clean.
>>
>>
>> PROPOSAL:
>> ---------
>>
>> To help user to deal with all these varieties we propose to:
>>
>> 1. Introduce rte_eth_tx_prep() function to do necessary preparations of
>>     packet burst to be safely transmitted on device for desired HW
>>     offloads (set/reset checksum field according to the hardware
>>     requirements) and check HW constraints (number of segments per
>>     packet, etc).
>>
>>     While the limitations and requirements may differ for devices, it
>>     requires to extend rte_eth_dev structure with new function pointer
>>     "tx_pkt_prep" which can be implemented in the driver to prepare and
>>     verify packets, in devices specific way, before burst, what should to
>>     prevent application to send malformed packets.
>>
>> 2. Also new fields will be introduced in rte_eth_desc_lim:
>>     nb_seg_max and nb_mtu_seg_max, providing an information about max
>>     segments in TSO and non-TSO packets acceptable by device.
>>
>>     This information is useful for application to not create/limit
>>     malicious packet.
>>
>>
>> APPLICATION (CASE OF USE):
>> --------------------------
>>
>> 1) Application should to initialize burst of packets to send, set
>>     required tx offload flags and required fields, like l2_len, l3_len,
>>     l4_len, and tso_segsz
>>
>> 2) Application passes burst to the rte_eth_tx_prep to check conditions
>>     required to send packets through the NIC.
>>
>> 3) The result of rte_eth_tx_prep can be used to send valid packets
>>     and/or restore invalid if function fails.
>>
>> eg.
>>
>> 	for (i = 0; i < nb_pkts; i++) {
>>
>> 		/* initialize or process packet */
>>
>> 		bufs[i]->tso_segsz = 800;
>> 		bufs[i]->ol_flags = PKT_TX_TCP_SEG | PKT_TX_IPV4
>> 				| PKT_TX_IP_CKSUM;
>> 		bufs[i]->l2_len = sizeof(struct ether_hdr);
>> 		bufs[i]->l3_len = sizeof(struct ipv4_hdr);
>> 		bufs[i]->l4_len = sizeof(struct tcp_hdr);
>> 	}
>>
>> 	/* Prepare burst of TX packets */
>> 	nb_prep = rte_eth_tx_prep(port, 0, bufs, nb_pkts);
>>
>> 	if (nb_prep < nb_pkts) {
>> 		printf("tx_prep failed\n");
>>
>> 		/* drop or restore invalid packets */
>>
>> 	}
>>
>> 	/* Send burst of TX packets */
>> 	nb_tx = rte_eth_tx_burst(port, 0, bufs, nb_prep);
>>
>> 	/* Free any unsent packets. */
>>
>>
>>
>> Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>

Acked-by: Vlad Zolotarov <vladz@scylladb.com>

>> ---
>>   doc/guides/rel_notes/deprecation.rst |    7 +++++++
>>   1 file changed, 7 insertions(+)
>>
>> diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
>> index f502f86..485aacb 100644
>> --- a/doc/guides/rel_notes/deprecation.rst
>> +++ b/doc/guides/rel_notes/deprecation.rst
>> @@ -41,3 +41,10 @@ Deprecation Notices
>>   * The mempool functions for single/multi producer/consumer are deprecated and
>>     will be removed in 16.11.
>>     It is replaced by rte_mempool_generic_get/put functions.
>> +
>> +* In 16.11 ABI changes are plained: the ``rte_eth_dev`` structure will be
>> +  extended with new function pointer ``tx_pkt_prep`` allowing verification
>> +  and processing of packet burst to meet HW specific requirements before
>> +  transmit. Also new fields will be added to the ``rte_eth_desc_lim`` structure:
>> +  ``nb_seg_max`` and ``nb_mtu_seg_max`` provideing information about number of
>> +  segments limit to be transmitted by device for TSO/non-TSO packets.
>> --
> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
>
>> 1.7.9.5

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [dpdk-dev] [PATCH] doc: announce ABI change for rte_eth_dev structure
  2016-07-31  7:46 ` [dpdk-dev] [PATCH] " Vlad Zolotarov
@ 2016-07-31  8:10   ` Vlad Zolotarov
  0 siblings, 0 replies; 27+ messages in thread
From: Vlad Zolotarov @ 2016-07-31  8:10 UTC (permalink / raw)
  To: Tomasz Kulasek, dev



On 07/31/2016 10:46 AM, Vlad Zolotarov wrote:
>
>
> On 07/20/2016 05:24 PM, Tomasz Kulasek wrote:
>> This is an ABI deprecation notice for DPDK 16.11 in librte_ether about
>> changes in rte_eth_dev and rte_eth_desc_lim structures.
>>
>> In 16.11, we plan to introduce rte_eth_tx_prep() function to do
>> necessary preparations of packet burst to be safely transmitted on
>> device for desired HW offloads (set/reset checksum field according to
>> the hardware requirements) and check HW constraints (number of segments
>> per packet, etc).
>>
>> While the limitations and requirements may differ for devices, it
>> requires to extend rte_eth_dev structure with new function pointer
>> "tx_pkt_prep" which can be implemented in the driver to prepare and
>> verify packets, in devices specific way, before burst, what should to
>> prevent application to send malformed packets.
>>
>> Also new fields will be introduced in rte_eth_desc_lim: nb_seg_max and
>> nb_mtu_seg_max, providing an information about max segments in TSO and
>> non TSO packets acceptable by device.
>>
>> Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
>
> Acked-by: Vlad Zolotarov <vladz@scylladb.com>

One small comment however.
Although this function is a must we need a way to clearly understand 
which one the clusters are malformed since dropping the whole bulk is 
usually not an option and sending the malformed packets anyway may cause 
a HW hang, thus not an option as well.
Another thing - I've pulled the current master and I couldn't find the 
way an application may query the mentioned Tx offload HW limitation, 
e.g. maximum number of segments.
Knowing this limitation would avoid extra liniarization.

thanks,
vlad

>
>> ---
>>   doc/guides/rel_notes/deprecation.rst |    7 +++++++
>>   1 file changed, 7 insertions(+)
>>
>> diff --git a/doc/guides/rel_notes/deprecation.rst 
>> b/doc/guides/rel_notes/deprecation.rst
>> index f502f86..485aacb 100644
>> --- a/doc/guides/rel_notes/deprecation.rst
>> +++ b/doc/guides/rel_notes/deprecation.rst
>> @@ -41,3 +41,10 @@ Deprecation Notices
>>   * The mempool functions for single/multi producer/consumer are 
>> deprecated and
>>     will be removed in 16.11.
>>     It is replaced by rte_mempool_generic_get/put functions.
>> +
>> +* In 16.11 ABI changes are plained: the ``rte_eth_dev`` structure 
>> will be
>> +  extended with new function pointer ``tx_pkt_prep`` allowing 
>> verification
>> +  and processing of packet burst to meet HW specific requirements 
>> before
>> +  transmit. Also new fields will be added to the 
>> ``rte_eth_desc_lim`` structure:
>> +  ``nb_seg_max`` and ``nb_mtu_seg_max`` provideing information about 
>> number of
>> +  segments limit to be transmitted by device for TSO/non-TSO packets.
>

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2016-07-31  8:10 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-07-20 14:24 [dpdk-dev] [PATCH] doc: announce ABI change for rte_eth_dev structure Tomasz Kulasek
2016-07-20 15:01 ` Thomas Monjalon
2016-07-20 15:13   ` Ananyev, Konstantin
2016-07-20 15:22     ` Thomas Monjalon
2016-07-20 15:42       ` Kulasek, TomaszX
2016-07-21 15:24 ` [dpdk-dev] [PATCH v2] " Tomasz Kulasek
2016-07-21 22:48   ` Ananyev, Konstantin
2016-07-27  8:59     ` Thomas Monjalon
2016-07-27 17:10       ` Jerin Jacob
2016-07-27 17:33         ` Ananyev, Konstantin
2016-07-27 17:41           ` Jerin Jacob
2016-07-27 20:51             ` Ananyev, Konstantin
2016-07-28  2:13               ` Jerin Jacob
2016-07-28 10:36                 ` Ananyev, Konstantin
2016-07-28 11:38                   ` Jerin Jacob
2016-07-28 12:07                     ` Avi Kivity
2016-07-28 13:01                     ` Ananyev, Konstantin
2016-07-28 13:58                       ` Olivier MATZ
2016-07-28 14:21                         ` Ananyev, Konstantin
2016-07-28 13:59                       ` Jerin Jacob
2016-07-28 14:52                         ` Thomas Monjalon
2016-07-28 16:25                           ` Jerin Jacob
2016-07-28 17:07                             ` Thomas Monjalon
2016-07-31  7:50     ` Vlad Zolotarov
2016-07-28 12:04   ` Avi Kivity
2016-07-31  7:46 ` [dpdk-dev] [PATCH] " Vlad Zolotarov
2016-07-31  8:10   ` Vlad Zolotarov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).