DPDK patches and discussions
* [dpdk-dev] [PATCH 0/6] add Tx preparation
@ 2016-08-26 16:22 Tomasz Kulasek
  2016-08-26 16:22 ` [dpdk-dev] [PATCH 1/6] ethdev: " Tomasz Kulasek
                   ` (7 more replies)
  0 siblings, 8 replies; 261+ messages in thread
From: Tomasz Kulasek @ 2016-08-26 16:22 UTC (permalink / raw)
  To: dev

As discussed in this thread:

http://dpdk.org/ml/archives/dev/2015-September/023603.html

Depending on the HW offloads requested, different NIC models may impose
different requirements on the packets to be transmitted, in terms of:

 - Max number of fragments per packet allowed
 - Max number of fragments per TSO segment
 - The way pseudo-header checksum should be pre-calculated
 - L3/L4 header fields filling
 - etc.


MOTIVATION:
-----------

1) Some work cannot (and should not) be done in rte_eth_tx_burst.
   However, this work is sometimes required, and currently it is left
   entirely to the application.

2) Different hardware may have different requirements for TX offloads;
   each device may support only a subset of them, and so on.

3) Some parameters (e.g. the number of segments in the ixgbe driver) may
   hang the device. These parameters may vary between devices.

   For example, i40e HW allows 8 fragments per packet, but that limit
   applies after TSO segmentation, while ixgbe has a 38-fragment pre-TSO
   limit.

4) Fields in the packet may require different initialization (e.g. some
   devices require the pseudo-header checksum to be precalculated,
   sometimes in a different way depending on the packet type, and so on).
   Currently the application needs to take care of this.

5) Calling an additional API (rte_eth_tx_prep) before rte_eth_tx_burst
   lets the application prepare the packet burst in a form acceptable to
   the specific device.

6) Some additional checks may be done in debug mode, keeping the
   tx_burst implementation clean.


PROPOSAL:
---------

To help users deal with all this variety, we propose to:

1) Introduce an rte_eth_tx_prep() function that performs the necessary
   preparation of a packet burst so it can be safely transmitted on the
   device with the desired HW offloads (setting/resetting checksum fields
   according to the hardware requirements) and that checks HW constraints
   (number of segments per packet, etc).

   Since the limitations and requirements may differ between devices, this
   requires extending the rte_eth_dev structure with a new function
   pointer, "tx_pkt_prep", which can be implemented in the driver to
   prepare and verify packets, in a device-specific way, before the burst;
   this should prevent the application from sending malformed packets.

2) Also, new fields will be introduced in rte_eth_desc_lim:
   nb_seg_max and nb_mtu_seg_max, providing information about the maximum
   number of segments in TSO and non-TSO packets acceptable to the device.

   This information helps the application avoid creating packets that
   exceed these limits (see the sketch below).
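
   A minimal sketch (not part of this patchset) of how an application
   could read these limits once the new fields exist:

	struct rte_eth_dev_info dev_info;

	rte_eth_dev_info_get(port, &dev_info);

	/* Max segments per non-TSO packet / per MTU-sized TSO chunk. */
	uint16_t nb_mtu_seg_max = dev_info.tx_desc_lim.nb_mtu_seg_max;

	/* Max segments per whole (pre-segmentation) TSO packet. */
	uint16_t nb_seg_max = dev_info.tx_desc_lim.nb_seg_max;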


APPLICATION (USE CASE):
-----------------------

1) The application initializes the burst of packets to send and sets the
   required tx offload flags and fields, such as l2_len, l3_len, l4_len,
   and tso_segsz.

2) The application passes the burst to rte_eth_tx_prep to check the
   conditions required to send the packets through the NIC.

3) The result of rte_eth_tx_prep can be used to send the valid packets
   and/or to restore the invalid ones if the function fails.

e.g.

	for (i = 0; i < nb_pkts; i++) {

		/* initialize or process packet */

		bufs[i]->tso_segsz = 800;
		bufs[i]->ol_flags = PKT_TX_TCP_SEG | PKT_TX_IPV4
				| PKT_TX_IP_CKSUM;
		bufs[i]->l2_len = sizeof(struct ether_hdr);
		bufs[i]->l3_len = sizeof(struct ipv4_hdr);
		bufs[i]->l4_len = sizeof(struct tcp_hdr);
	}

	/* Prepare burst of TX packets */
	nb_prep = rte_eth_tx_prep(port, 0, bufs, nb_pkts);

	if (nb_prep < nb_pkts) {
		printf("tx_prep failed\n");

		/* nb_prep here indicates the index of the first invalid packet.
		 * rte_eth_tx_prep can be called again on the remaining packets
		 * to find further invalid ones.
		 */

	}

	/* Send burst of TX packets */
	nb_tx = rte_eth_tx_burst(port, 0, bufs, nb_prep);

	/* Free any unsent packets. */


Tomasz Kulasek (6):
  ethdev: add Tx preparation
  e1000: add Tx preparation
  fm10k: add Tx preparation
  i40e: add Tx preparation
  ixgbe: add Tx preparation
  testpmd: add txprep engine

 app/test-pmd/Makefile            |    3 +-
 app/test-pmd/testpmd.c           |    1 +
 app/test-pmd/testpmd.h           |    1 +
 app/test-pmd/txprep.c            |  412 ++++++++++++++++++++++++++++++++++++++
 drivers/net/e1000/e1000_ethdev.h |   11 +
 drivers/net/e1000/em_ethdev.c    |    5 +-
 drivers/net/e1000/em_rxtx.c      |   46 ++++-
 drivers/net/e1000/igb_ethdev.c   |    4 +
 drivers/net/e1000/igb_rxtx.c     |   50 ++++-
 drivers/net/fm10k/fm10k.h        |    9 +
 drivers/net/fm10k/fm10k_ethdev.c |    5 +
 drivers/net/fm10k/fm10k_rxtx.c   |   87 +++++++-
 drivers/net/i40e/i40e_ethdev.c   |    3 +
 drivers/net/i40e/i40e_rxtx.c     |   98 ++++++++-
 drivers/net/i40e/i40e_rxtx.h     |   10 +
 drivers/net/ixgbe/ixgbe_ethdev.c |    3 +
 drivers/net/ixgbe/ixgbe_ethdev.h |    8 +-
 drivers/net/ixgbe/ixgbe_rxtx.c   |   83 +++++++-
 drivers/net/ixgbe/ixgbe_rxtx.h   |    2 +
 lib/librte_ether/rte_ethdev.h    |   74 +++++++
 lib/librte_mbuf/rte_mbuf.h       |    8 +
 lib/librte_net/Makefile          |    2 +-
 lib/librte_net/rte_pkt.h         |  132 ++++++++++++
 23 files changed, 1048 insertions(+), 9 deletions(-)
 create mode 100644 app/test-pmd/txprep.c
 create mode 100644 lib/librte_net/rte_pkt.h

-- 
1.7.9.5


* [dpdk-dev] [PATCH 1/6] ethdev: add Tx preparation
  2016-08-26 16:22 [dpdk-dev] [PATCH 0/6] add Tx preparation Tomasz Kulasek
@ 2016-08-26 16:22 ` Tomasz Kulasek
  2016-09-08  7:28   ` Jerin Jacob
  2016-08-26 16:22 ` [dpdk-dev] [PATCH 2/6] e1000: " Tomasz Kulasek
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 261+ messages in thread
From: Tomasz Kulasek @ 2016-08-26 16:22 UTC (permalink / raw)
  To: dev

Added API for `rte_eth_tx_prep`

uint16_t rte_eth_tx_prep(uint8_t port_id, uint16_t queue_id,
	struct rte_mbuf **tx_pkts, uint16_t nb_pkts)

Added fields to the `struct rte_eth_desc_lim`:

	uint16_t nb_seg_max;
		/**< Max number of segments per whole packet. */

	uint16_t nb_mtu_seg_max;
		/**< Max number of segments per one MTU */

Created `rte_pkt.h` header with commonly used functions:

int rte_validate_tx_offload(struct rte_mbuf *m)
	to validate general requirements for TX offload set in a packet, such
	as flag completeness. In the current implementation this function is
	called optionally, when RTE_LIBRTE_ETHDEV_DEBUG is enabled.

int rte_phdr_cksum_fix(struct rte_mbuf *m)
	to fix the pseudo-header checksum for TSO and non-TSO TCP/UDP packets
	before hardware TX checksum offload:
	 - for non-TSO TCP/UDP packets the full pseudo-header checksum is
	   computed and set,
	 - for TSO the IP payload length is not included.
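
A driver is expected to combine these helpers with its own device-specific
checks in the tx_pkt_prep callback; a minimal illustrative shape (the xxx_
prefix is a placeholder, not a function from this series) could be:

	uint16_t
	xxx_prep_pkts(void *txq __rte_unused, struct rte_mbuf **tx_pkts,
			uint16_t nb_pkts)
	{
		uint16_t i;

		for (i = 0; i < nb_pkts; i++) {
			/* device-specific checks (segment count, MSS, ...) */
	#ifdef RTE_LIBRTE_ETHDEV_DEBUG
			if (rte_validate_tx_offload(tx_pkts[i]) != 0) {
				rte_errno = -EINVAL;
				return i;
			}
	#endif
			if (rte_phdr_cksum_fix(tx_pkts[i]) != 0) {
				rte_errno = -EINVAL;
				return i;
			}
		}
		return i;
	}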

Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
---
 lib/librte_ether/rte_ethdev.h |   74 +++++++++++++++++++++++
 lib/librte_mbuf/rte_mbuf.h    |    8 +++
 lib/librte_net/Makefile       |    2 +-
 lib/librte_net/rte_pkt.h      |  132 +++++++++++++++++++++++++++++++++++++++++
 4 files changed, 215 insertions(+), 1 deletion(-)
 create mode 100644 lib/librte_net/rte_pkt.h

diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index b0fe033..02569ca 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -182,6 +182,7 @@ extern "C" {
 #include <rte_pci.h>
 #include <rte_dev.h>
 #include <rte_devargs.h>
+#include <rte_errno.h>
 #include "rte_ether.h"
 #include "rte_eth_ctrl.h"
 #include "rte_dev_info.h"
@@ -696,6 +697,8 @@ struct rte_eth_desc_lim {
 	uint16_t nb_max;   /**< Max allowed number of descriptors. */
 	uint16_t nb_min;   /**< Min allowed number of descriptors. */
 	uint16_t nb_align; /**< Number of descriptors should be aligned to. */
+	uint16_t nb_seg_max;     /**< Max number of segments per whole packet. */
+	uint16_t nb_mtu_seg_max; /**< Max number of segments per one MTU */
 };
 
 /**
@@ -1181,6 +1184,12 @@ typedef uint16_t (*eth_tx_burst_t)(void *txq,
 				   uint16_t nb_pkts);
 /**< @internal Send output packets on a transmit queue of an Ethernet device. */
 
+typedef uint16_t (*eth_tx_prep_t)(void *txq,
+				   struct rte_mbuf **tx_pkts,
+				   uint16_t nb_pkts);
+/**< @internal Prepare output packets on a transmit queue of an Ethernet
+		device. */
+
 typedef int (*flow_ctrl_get_t)(struct rte_eth_dev *dev,
 			       struct rte_eth_fc_conf *fc_conf);
 /**< @internal Get current flow control parameter on an Ethernet device */
@@ -1626,6 +1635,7 @@ enum rte_eth_dev_type {
 struct rte_eth_dev {
 	eth_rx_burst_t rx_pkt_burst; /**< Pointer to PMD receive function. */
 	eth_tx_burst_t tx_pkt_burst; /**< Pointer to PMD transmit function. */
+	eth_tx_prep_t tx_pkt_prep; /**< Pointer to PMD transmit prepare function. */
 	struct rte_eth_dev_data *data;  /**< Pointer to device data */
 	const struct eth_driver *driver;/**< Driver for this device */
 	const struct eth_dev_ops *dev_ops; /**< Functions exported by PMD */
@@ -2833,6 +2843,70 @@ rte_eth_tx_burst(uint8_t port_id, uint16_t queue_id,
 	return (*dev->tx_pkt_burst)(dev->data->tx_queues[queue_id], tx_pkts, nb_pkts);
 }
 
+/**
+ * Process a burst of output packets on a transmit queue of an Ethernet device.
+ *
+ * The rte_eth_tx_prep() function is invoked to prepare output packets to be
+ * transmitted on the output queue *queue_id* of the Ethernet device designated
+ * by its *port_id*.
+ * The *nb_pkts* parameter is the number of packets to be prepared which are
+ * supplied in the *tx_pkts* array of *rte_mbuf* structures, each of them
+ * allocated from a pool created with rte_pktmbuf_pool_create().
+ * For each packet to send, the rte_eth_tx_prep() function performs
+ * the following operations:
+ *
+ * - Check if the packet meets the device's requirements for TX offloads.
+ *
+ * - Check limitations on the number of segments.
+ *
+ * - Check additional requirements when debug is enabled.
+ *
+ * - Update and/or reset the required checksums when a TX offload is set for the packet.
+ *
+ * The rte_eth_tx_prep() function returns the number of packets ready to be
+ * sent. A return value equal to *nb_pkts* means that all packets are valid and
+ * ready to be sent.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param queue_id
+ *   The index of the transmit queue through which output packets must be
+ *   sent.
+ *   The value must be in the range [0, nb_tx_queue - 1] previously supplied
+ *   to rte_eth_dev_configure().
+ * @param tx_pkts
+ *   The address of an array of *nb_pkts* pointers to *rte_mbuf* structures
+ *   which contain the output packets.
+ * @param nb_pkts
+ *   The maximum number of packets to process.
+ * @return
+ *   The number of packets that are valid and ready to be sent. The return
+ *   value can be less than *nb_pkts* when some packet does not meet the
+ *   device's requirements; rte_errno is then set appropriately.
+ */
+static inline uint16_t
+rte_eth_tx_prep(uint8_t port_id, uint16_t queue_id, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+
+	if (!dev->tx_pkt_prep) {
+		rte_errno = -ENOTSUP;
+		return 0;
+	}
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+	if (queue_id >= dev->data->nb_tx_queues) {
+		RTE_PMD_DEBUG_TRACE("Invalid TX queue_id=%d\n", queue_id);
+		rte_errno = -EINVAL;
+		return 0;
+	}
+#endif
+
+	return (*dev->tx_pkt_prep)(dev->data->tx_queues[queue_id],
+			tx_pkts, nb_pkts);
+}
+
 typedef void (*buffer_tx_error_fn)(struct rte_mbuf **unsent, uint16_t count,
 		void *userdata);
 
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index 7ea66ed..72fd352 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -211,6 +211,14 @@ extern "C" {
  */
 #define PKT_TX_OUTER_IPV4   (1ULL << 59)
 
+#define PKT_TX_OFFLOAD_MASK (    \
+		PKT_TX_IP_CKSUM |        \
+		PKT_TX_L4_MASK |         \
+		PKT_TX_OUTER_IP_CKSUM |  \
+		PKT_TX_TCP_SEG |         \
+		PKT_TX_QINQ_PKT |        \
+		PKT_TX_VLAN_PKT)
+
 /**
  * Packet outer header is IPv6. This flag must be set when using any
  * outer offload feature (L4 checksum) to tell the NIC that the outer
diff --git a/lib/librte_net/Makefile b/lib/librte_net/Makefile
index ad2e482..b5abe84 100644
--- a/lib/librte_net/Makefile
+++ b/lib/librte_net/Makefile
@@ -34,7 +34,7 @@ include $(RTE_SDK)/mk/rte.vars.mk
 CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -O3
 
 # install includes
-SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include := rte_ip.h rte_tcp.h rte_udp.h rte_sctp.h rte_icmp.h rte_arp.h
+SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include := rte_ip.h rte_tcp.h rte_udp.h rte_sctp.h rte_icmp.h rte_arp.h rte_pkt.h
 
 
 include $(RTE_SDK)/mk/rte.install.mk
diff --git a/lib/librte_net/rte_pkt.h b/lib/librte_net/rte_pkt.h
new file mode 100644
index 0000000..a3c3e3c
--- /dev/null
+++ b/lib/librte_net/rte_pkt.h
@@ -0,0 +1,132 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2016 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_PKT_H_
+#define _RTE_PKT_H_
+
+#include <rte_ip.h>
+#include <rte_udp.h>
+#include <rte_tcp.h>
+#include <rte_sctp.h>
+
+/**
+ * Validate general requirements for tx offload in packet.
+ */
+static inline int
+rte_validate_tx_offload(struct rte_mbuf *m)
+{
+	uint64_t ol_flags = m->ol_flags;
+
+	/* Does packet set any of available offloads? */
+	if (!(ol_flags & PKT_TX_OFFLOAD_MASK))
+		return 0;
+
+	/* IP checksum can be computed only for an IPv4 packet */
+	if ((ol_flags & PKT_TX_IP_CKSUM) && (ol_flags & PKT_TX_IPV6))
+		return -EINVAL;
+
+	if (ol_flags & (PKT_TX_L4_MASK | PKT_TX_TCP_SEG))
+		/* IP type not set */
+		if (!(ol_flags & (PKT_TX_IPV4 | PKT_TX_IPV6)))
+			return -EINVAL;
+
+	if (ol_flags & PKT_TX_TCP_SEG) {
+
+		/* PKT_TX_IP_CKSUM offload not set for IPv4 TSO packet */
+		if ((ol_flags & PKT_TX_IPV4) && !(ol_flags & PKT_TX_IP_CKSUM))
+			return -EINVAL;
+
+		if (m->tso_segsz == 0)
+			return -EINVAL;
+
+	}
+
+	/* PKT_TX_OUTER_IP_CKSUM set for non outer IPv4 packet. */
+	if ((ol_flags & PKT_TX_OUTER_IP_CKSUM) && !(ol_flags & PKT_TX_OUTER_IPV4))
+		return -EINVAL;
+
+	return 0;
+}
+
+/**
+ * Fix the pseudo-header checksum for TSO and non-TSO TCP/UDP packets before
+ * hardware TX checksum offload.
+ * For non-TSO TCP/UDP packets the full pseudo-header checksum is computed and set.
+ * For TSO the IP payload length is not included.
+ */
+static inline int
+rte_phdr_cksum_fix(struct rte_mbuf *m)
+{
+	struct ipv4_hdr *ipv4_hdr;
+	struct ipv6_hdr *ipv6_hdr;
+	struct tcp_hdr *tcp_hdr;
+	struct udp_hdr *udp_hdr;
+
+	if (m->ol_flags & PKT_TX_IPV4) {
+		ipv4_hdr = rte_pktmbuf_mtod_offset(m, struct ipv4_hdr *, m->l2_len);
+
+		if (m->ol_flags & PKT_TX_IP_CKSUM)
+			ipv4_hdr->hdr_checksum = 0;
+
+		if ((m->ol_flags & PKT_TX_UDP_CKSUM) == PKT_TX_UDP_CKSUM) {
+			/* non-TSO udp */
+			udp_hdr = rte_pktmbuf_mtod_offset(m, struct udp_hdr *, m->l2_len +
+					m->l3_len);
+			udp_hdr->dgram_cksum = rte_ipv4_phdr_cksum(ipv4_hdr, m->ol_flags);
+		} else if ((m->ol_flags & PKT_TX_TCP_CKSUM) ||
+				(m->ol_flags & PKT_TX_TCP_SEG)) {
+			/* non-TSO tcp or TSO */
+			tcp_hdr = rte_pktmbuf_mtod_offset(m, struct tcp_hdr *, m->l2_len +
+					m->l3_len);
+			tcp_hdr->cksum = rte_ipv4_phdr_cksum(ipv4_hdr, m->ol_flags);
+		}
+	} else if (m->ol_flags & PKT_TX_IPV6) {
+		ipv6_hdr = rte_pktmbuf_mtod_offset(m, struct ipv6_hdr *, m->l2_len);
+
+		if ((m->ol_flags & PKT_TX_UDP_CKSUM) == PKT_TX_UDP_CKSUM) {
+			/* non-TSO udp */
+			udp_hdr = rte_pktmbuf_mtod_offset(m, struct udp_hdr *, m->l2_len +
+					m->l3_len);
+			udp_hdr->dgram_cksum = rte_ipv6_phdr_cksum(ipv6_hdr, m->ol_flags);
+		} else if ((m->ol_flags & PKT_TX_TCP_CKSUM) ||
+				(m->ol_flags & PKT_TX_TCP_SEG)) {
+			/* non-TSO tcp or TSO */
+			tcp_hdr = rte_pktmbuf_mtod_offset(m, struct tcp_hdr *, m->l2_len +
+					m->l3_len);
+			tcp_hdr->cksum = rte_ipv6_phdr_cksum(ipv6_hdr, m->ol_flags);
+		}
+	}
+	return 0;
+}
+
+#endif /* _RTE_PKT_H_ */
-- 
1.7.9.5


* [dpdk-dev] [PATCH 2/6] e1000: add Tx preparation
  2016-08-26 16:22 [dpdk-dev] [PATCH 0/6] add Tx preparation Tomasz Kulasek
  2016-08-26 16:22 ` [dpdk-dev] [PATCH 1/6] ethdev: " Tomasz Kulasek
@ 2016-08-26 16:22 ` Tomasz Kulasek
  2016-08-26 16:22 ` [dpdk-dev] [PATCH 3/6] fm10k: " Tomasz Kulasek
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 261+ messages in thread
From: Tomasz Kulasek @ 2016-08-26 16:22 UTC (permalink / raw)
  To: dev

Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
---
 drivers/net/e1000/e1000_ethdev.h |   11 +++++++++
 drivers/net/e1000/em_ethdev.c    |    5 +++-
 drivers/net/e1000/em_rxtx.c      |   46 ++++++++++++++++++++++++++++++++++-
 drivers/net/e1000/igb_ethdev.c   |    4 +++
 drivers/net/e1000/igb_rxtx.c     |   50 +++++++++++++++++++++++++++++++++++++-
 5 files changed, 113 insertions(+), 3 deletions(-)

diff --git a/drivers/net/e1000/e1000_ethdev.h b/drivers/net/e1000/e1000_ethdev.h
index 6c25c8d..bd0f277 100644
--- a/drivers/net/e1000/e1000_ethdev.h
+++ b/drivers/net/e1000/e1000_ethdev.h
@@ -138,6 +138,11 @@
 #define E1000_MISC_VEC_ID               RTE_INTR_VEC_ZERO_OFFSET
 #define E1000_RX_VEC_START              RTE_INTR_VEC_RXTX_OFFSET
 
+#define IGB_TX_MAX_SEG     UINT8_MAX
+#define IGB_TX_MAX_MTU_SEG UINT8_MAX
+#define EM_TX_MAX_SEG      UINT8_MAX
+#define EM_TX_MAX_MTU_SEG  UINT8_MAX
+
 /* structure for interrupt relative data */
 struct e1000_interrupt {
 	uint32_t flags;
@@ -315,6 +320,9 @@ void eth_igb_tx_init(struct rte_eth_dev *dev);
 uint16_t eth_igb_xmit_pkts(void *txq, struct rte_mbuf **tx_pkts,
 		uint16_t nb_pkts);
 
+uint16_t eth_igb_prep_pkts(void *txq, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts);
+
 uint16_t eth_igb_recv_pkts(void *rxq, struct rte_mbuf **rx_pkts,
 		uint16_t nb_pkts);
 
@@ -376,6 +384,9 @@ void eth_em_tx_init(struct rte_eth_dev *dev);
 uint16_t eth_em_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		uint16_t nb_pkts);
 
+uint16_t eth_em_prep_pkts(void *txq, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts);
+
 uint16_t eth_em_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
 		uint16_t nb_pkts);
 
diff --git a/drivers/net/e1000/em_ethdev.c b/drivers/net/e1000/em_ethdev.c
index ad104ed..1baf268 100644
--- a/drivers/net/e1000/em_ethdev.c
+++ b/drivers/net/e1000/em_ethdev.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -300,6 +300,7 @@ eth_em_dev_init(struct rte_eth_dev *eth_dev)
 	eth_dev->dev_ops = &eth_em_ops;
 	eth_dev->rx_pkt_burst = (eth_rx_burst_t)&eth_em_recv_pkts;
 	eth_dev->tx_pkt_burst = (eth_tx_burst_t)&eth_em_xmit_pkts;
+	eth_dev->tx_pkt_prep = (eth_tx_prep_t)&eth_em_prep_pkts;
 
 	/* for secondary processes, we don't initialise any further as primary
 	 * has already done this work. Only check we don't need a different
@@ -1073,6 +1074,8 @@ eth_em_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 		.nb_max = E1000_MAX_RING_DESC,
 		.nb_min = E1000_MIN_RING_DESC,
 		.nb_align = EM_TXD_ALIGN,
+		.nb_seg_max = EM_TX_MAX_SEG,
+		.nb_mtu_seg_max = EM_TX_MAX_MTU_SEG,
 	};
 
 	dev_info->speed_capa = ETH_LINK_SPEED_10M_HD | ETH_LINK_SPEED_10M |
diff --git a/drivers/net/e1000/em_rxtx.c b/drivers/net/e1000/em_rxtx.c
index 6d8750a..25ea5b3 100644
--- a/drivers/net/e1000/em_rxtx.c
+++ b/drivers/net/e1000/em_rxtx.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -67,6 +67,7 @@
 #include <rte_udp.h>
 #include <rte_tcp.h>
 #include <rte_sctp.h>
+#include <rte_pkt.h>
 #include <rte_string_fns.h>
 
 #include "e1000_logs.h"
@@ -78,6 +79,11 @@
 
 #define E1000_RXDCTL_GRAN	0x01000000 /* RXDCTL Granularity */
 
+#define E1000_TX_OFFLOAD_MASK ( \
+		PKT_TX_IP_CKSUM |       \
+		PKT_TX_L4_MASK |        \
+		PKT_TX_VLAN_PKT)
+
 /**
  * Structure associated with each descriptor of the RX ring of a RX queue.
  */
@@ -619,6 +625,44 @@ end_of_tx:
 
 /*********************************************************************
  *
+ *  TX prep functions
+ *
+ **********************************************************************/
+uint16_t
+eth_em_prep_pkts(void *tx_queue __rte_unused, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts)
+{
+	int i, ret;
+	struct rte_mbuf *m;
+
+	for (i = 0; i < nb_pkts; i++) {
+		m = tx_pkts[i];
+
+		if ((m->ol_flags & PKT_TX_OFFLOAD_MASK) !=
+				(m->ol_flags & E1000_TX_OFFLOAD_MASK)) {
+			rte_errno = -EINVAL;
+			return i;
+		}
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+		ret = rte_validate_tx_offload(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+#endif
+		ret = rte_phdr_cksum_fix(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+	}
+
+	return i;
+}
+
+/*********************************************************************
+ *
  *  RX functions
  *
  **********************************************************************/
diff --git a/drivers/net/e1000/igb_ethdev.c b/drivers/net/e1000/igb_ethdev.c
index 4e9e6a3..1f7ba4d 100644
--- a/drivers/net/e1000/igb_ethdev.c
+++ b/drivers/net/e1000/igb_ethdev.c
@@ -369,6 +369,8 @@ static const struct rte_eth_desc_lim tx_desc_lim = {
 	.nb_max = E1000_MAX_RING_DESC,
 	.nb_min = E1000_MIN_RING_DESC,
 	.nb_align = IGB_RXD_ALIGN,
+	.nb_seg_max = IGB_TX_MAX_SEG,
+	.nb_mtu_seg_max = IGB_TX_MAX_MTU_SEG,
 };
 
 static const struct eth_dev_ops eth_igb_ops = {
@@ -760,6 +762,7 @@ eth_igb_dev_init(struct rte_eth_dev *eth_dev)
 	eth_dev->dev_ops = &eth_igb_ops;
 	eth_dev->rx_pkt_burst = &eth_igb_recv_pkts;
 	eth_dev->tx_pkt_burst = &eth_igb_xmit_pkts;
+	eth_dev->tx_pkt_prep = &eth_igb_prep_pkts;
 
 	/* for secondary processes, we don't initialise any further as primary
 	 * has already done this work. Only check we don't need a different
@@ -963,6 +966,7 @@ eth_igbvf_dev_init(struct rte_eth_dev *eth_dev)
 	eth_dev->dev_ops = &igbvf_eth_dev_ops;
 	eth_dev->rx_pkt_burst = &eth_igb_recv_pkts;
 	eth_dev->tx_pkt_burst = &eth_igb_xmit_pkts;
+	eth_dev->tx_pkt_prep = &eth_igb_prep_pkts;
 
 	/* for secondary processes, we don't initialise any further as primary
 	 * has already done this work. Only check we don't need a different
diff --git a/drivers/net/e1000/igb_rxtx.c b/drivers/net/e1000/igb_rxtx.c
index 9d80a0b..1593dc2 100644
--- a/drivers/net/e1000/igb_rxtx.c
+++ b/drivers/net/e1000/igb_rxtx.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -66,6 +66,7 @@
 #include <rte_udp.h>
 #include <rte_tcp.h>
 #include <rte_sctp.h>
+#include <rte_pkt.h>
 #include <rte_string_fns.h>
 
 #include "e1000_logs.h"
@@ -617,6 +618,52 @@ eth_igb_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 
 /*********************************************************************
  *
+ *  TX prep functions
+ *
+ **********************************************************************/
+uint16_t
+eth_igb_prep_pkts(void *tx_queue __rte_unused, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts)
+{
+	int i, ret;
+	struct rte_mbuf *m;
+
+	for (i = 0; i < nb_pkts; i++) {
+		m = tx_pkts[i];
+
+		/* Check some limitations for TSO in hardware */
+		if (m->ol_flags & PKT_TX_TCP_SEG)
+			if ((m->tso_segsz > IGB_TSO_MAX_MSS) || (m->l2_len + m->l3_len +
+					m->l4_len > IGB_TSO_MAX_HDRLEN)) {
+				rte_errno = -EINVAL;
+				return i;
+			}
+
+		if ((m->ol_flags & PKT_TX_OFFLOAD_MASK) !=
+				(m->ol_flags & IGB_TX_OFFLOAD_MASK)) {
+			rte_errno = -EINVAL;
+			return i;
+		}
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+		ret = rte_validate_tx_offload(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+#endif
+		ret = rte_phdr_cksum_fix(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+	}
+
+	return i;
+}
+
+/*********************************************************************
+ *
  *  RX functions
  *
  **********************************************************************/
@@ -1363,6 +1410,7 @@ eth_igb_tx_queue_setup(struct rte_eth_dev *dev,
 
 	igb_reset_tx_queue(txq, dev);
 	dev->tx_pkt_burst = eth_igb_xmit_pkts;
+	dev->tx_pkt_prep = &eth_igb_prep_pkts;
 	dev->data->tx_queues[queue_idx] = txq;
 
 	return 0;
-- 
1.7.9.5


* [dpdk-dev] [PATCH 3/6] fm10k: add Tx preparation
  2016-08-26 16:22 [dpdk-dev] [PATCH 0/6] add Tx preparation Tomasz Kulasek
  2016-08-26 16:22 ` [dpdk-dev] [PATCH 1/6] ethdev: " Tomasz Kulasek
  2016-08-26 16:22 ` [dpdk-dev] [PATCH 2/6] e1000: " Tomasz Kulasek
@ 2016-08-26 16:22 ` Tomasz Kulasek
  2016-08-26 16:22 ` [dpdk-dev] [PATCH 4/6] i40e: " Tomasz Kulasek
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 261+ messages in thread
From: Tomasz Kulasek @ 2016-08-26 16:22 UTC (permalink / raw)
  To: dev

Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
---
 drivers/net/fm10k/fm10k.h        |    9 ++++
 drivers/net/fm10k/fm10k_ethdev.c |    5 +++
 drivers/net/fm10k/fm10k_rxtx.c   |   87 +++++++++++++++++++++++++++++++++++++-
 3 files changed, 100 insertions(+), 1 deletion(-)

diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
index 05aa1a2..83d2bfb 100644
--- a/drivers/net/fm10k/fm10k.h
+++ b/drivers/net/fm10k/fm10k.h
@@ -69,6 +69,9 @@
 #define FM10K_MAX_RX_DESC  (FM10K_MAX_RX_RING_SZ / sizeof(union fm10k_rx_desc))
 #define FM10K_MAX_TX_DESC  (FM10K_MAX_TX_RING_SZ / sizeof(struct fm10k_tx_desc))
 
+#define FM10K_TX_MAX_SEG     UINT8_MAX
+#define FM10K_TX_MAX_MTU_SEG UINT8_MAX
+
 /*
  * byte aligment for HW RX data buffer
  * Datasheet requires RX buffer addresses shall either be 512-byte aligned or
@@ -356,6 +359,12 @@ fm10k_dev_rx_descriptor_done(void *rx_queue, uint16_t offset);
 uint16_t fm10k_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 	uint16_t nb_pkts);
 
+uint16_t fm10k_prep_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
+	uint16_t nb_pkts);
+
+uint16_t fm10k_prep_pkts_simple(void *tx_queue, struct rte_mbuf **tx_pkts,
+	uint16_t nb_pkts);
+
 int fm10k_rxq_vec_setup(struct fm10k_rx_queue *rxq);
 int fm10k_rx_vec_condition_check(struct rte_eth_dev *);
 void fm10k_rx_queue_release_mbufs_vec(struct fm10k_rx_queue *rxq);
diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index 01f4a72..c9f450e 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -1441,6 +1441,8 @@ fm10k_dev_infos_get(struct rte_eth_dev *dev,
 		.nb_max = FM10K_MAX_TX_DESC,
 		.nb_min = FM10K_MIN_TX_DESC,
 		.nb_align = FM10K_MULT_TX_DESC,
+		.nb_seg_max = FM10K_TX_MAX_SEG,
+		.nb_mtu_seg_max = FM10K_TX_MAX_MTU_SEG,
 	};
 
 	dev_info->speed_capa = ETH_LINK_SPEED_1G | ETH_LINK_SPEED_2_5G |
@@ -2749,8 +2751,10 @@ fm10k_set_tx_function(struct rte_eth_dev *dev)
 			fm10k_txq_vec_setup(txq);
 		}
 		dev->tx_pkt_burst = fm10k_xmit_pkts_vec;
+		dev->tx_pkt_prep = fm10k_prep_pkts_simple;
 	} else {
 		dev->tx_pkt_burst = fm10k_xmit_pkts;
+		dev->tx_pkt_prep = fm10k_prep_pkts;
 		PMD_INIT_LOG(DEBUG, "Use regular Tx func");
 	}
 }
@@ -2829,6 +2833,7 @@ eth_fm10k_dev_init(struct rte_eth_dev *dev)
 	dev->dev_ops = &fm10k_eth_dev_ops;
 	dev->rx_pkt_burst = &fm10k_recv_pkts;
 	dev->tx_pkt_burst = &fm10k_xmit_pkts;
+	dev->tx_pkt_prep = &fm10k_prep_pkts;
 
 	/* only initialize in the primary process */
 	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
diff --git a/drivers/net/fm10k/fm10k_rxtx.c b/drivers/net/fm10k/fm10k_rxtx.c
index 5b2d04b..72b1931 100644
--- a/drivers/net/fm10k/fm10k_rxtx.c
+++ b/drivers/net/fm10k/fm10k_rxtx.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2013-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2013-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -35,6 +35,7 @@
 
 #include <rte_ethdev.h>
 #include <rte_common.h>
+#include <rte_pkt.h>
 #include "fm10k.h"
 #include "base/fm10k_type.h"
 
@@ -65,6 +66,12 @@ static inline void dump_rxd(union fm10k_rx_desc *rxd)
 }
 #endif
 
+#define FM10K_TX_OFFLOAD_MASK (  \
+		PKT_TX_VLAN_PKT |        \
+		PKT_TX_IP_CKSUM |        \
+		PKT_TX_L4_MASK |         \
+		PKT_TX_TCP_SEG)
+
 /* @note: When this function is changed, make corresponding change to
  * fm10k_dev_supported_ptypes_get()
  */
@@ -583,3 +590,81 @@ fm10k_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 
 	return count;
 }
+
+uint16_t
+fm10k_prep_pkts(void *tx_queue __rte_unused, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts)
+{
+	int i, ret;
+	struct rte_mbuf *m;
+	struct fm10k_tx_queue *q = tx_queue;
+
+	for (i = 0; i < nb_pkts; i++) {
+		m = tx_pkts[i];
+
+		if (m->ol_flags & PKT_TX_TCP_SEG) {
+			uint8_t hdrlen = m->outer_l2_len + m->outer_l3_len + m->l2_len +
+				m->l3_len + m->l4_len;
+
+			if (q->hw_ring[q->next_free].flags & FM10K_TXD_FLAG_FTAG)
+				hdrlen += sizeof(struct fm10k_ftag);
+
+			if ((hdrlen < FM10K_TSO_MIN_HEADERLEN) ||
+					(hdrlen > FM10K_TSO_MAX_HEADERLEN) ||
+					(m->tso_segsz < FM10K_TSO_MINMSS)) {
+				rte_errno = -EINVAL;
+				return i;
+			}
+		}
+
+		if ((m->ol_flags & PKT_TX_OFFLOAD_MASK) !=
+				(m->ol_flags & FM10K_TX_OFFLOAD_MASK)) {
+			rte_errno = -EINVAL;
+			return i;
+		}
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+		ret = rte_validate_tx_offload(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+#endif
+		ret = rte_phdr_cksum_fix(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+	}
+
+	return i;
+}
+
+/* fm10k vector TX path doesn't support tx offloads */
+uint16_t
+fm10k_prep_pkts_simple(void *tx_queue __rte_unused, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts)
+{
+	int i;
+	struct rte_mbuf *m;
+	uint64_t ol_flags;
+
+	for (i = 0; i < nb_pkts; i++) {
+		m = tx_pkts[i];
+		ol_flags = m->ol_flags;
+
+		/* simple tx path doesn't support multi-segments */
+		if (m->nb_segs != 1) {
+			rte_errno = -EINVAL;
+			return i;
+		}
+
+		/* For simple path no tx offloads are supported */
+		if (ol_flags & PKT_TX_OFFLOAD_MASK) {
+			rte_errno = -EINVAL;
+			return i;
+		}
+	}
+
+	return i;
+}
-- 
1.7.9.5


* [dpdk-dev] [PATCH 4/6] i40e: add Tx preparation
  2016-08-26 16:22 [dpdk-dev] [PATCH 0/6] add Tx preparation Tomasz Kulasek
                   ` (2 preceding siblings ...)
  2016-08-26 16:22 ` [dpdk-dev] [PATCH 3/6] fm10k: " Tomasz Kulasek
@ 2016-08-26 16:22 ` Tomasz Kulasek
  2016-08-26 16:22 ` [dpdk-dev] [PATCH 5/6] ixgbe: " Tomasz Kulasek
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 261+ messages in thread
From: Tomasz Kulasek @ 2016-08-26 16:22 UTC (permalink / raw)
  To: dev

Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
---
 drivers/net/i40e/i40e_ethdev.c |    3 ++
 drivers/net/i40e/i40e_rxtx.c   |   98 +++++++++++++++++++++++++++++++++++++++-
 drivers/net/i40e/i40e_rxtx.h   |   10 ++++
 3 files changed, 110 insertions(+), 1 deletion(-)

diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
index d0aeb70..52a2abb 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -948,6 +948,7 @@ eth_i40e_dev_init(struct rte_eth_dev *dev)
 	dev->dev_ops = &i40e_eth_dev_ops;
 	dev->rx_pkt_burst = i40e_recv_pkts;
 	dev->tx_pkt_burst = i40e_xmit_pkts;
+	dev->tx_pkt_prep = i40e_prep_pkts;
 
 	/* for secondary processes, we don't initialise any further as primary
 	 * has already done this work. Only check we don't need a different
@@ -2614,6 +2615,8 @@ i40e_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 		.nb_max = I40E_MAX_RING_DESC,
 		.nb_min = I40E_MIN_RING_DESC,
 		.nb_align = I40E_ALIGN_RING_DESC,
+		.nb_seg_max = I40E_TX_MAX_SEG,
+		.nb_mtu_seg_max = I40E_TX_MAX_MTU_SEG,
 	};
 
 	if (pf->flags & I40E_FLAG_VMDQ) {
diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
index 554d167..1cd34f7 100644
--- a/drivers/net/i40e/i40e_rxtx.c
+++ b/drivers/net/i40e/i40e_rxtx.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -50,6 +50,8 @@
 #include <rte_tcp.h>
 #include <rte_sctp.h>
 #include <rte_udp.h>
+#include <rte_ip.h>
+#include <rte_pkt.h>
 
 #include "i40e_logs.h"
 #include "base/i40e_prototype.h"
@@ -79,6 +81,14 @@
 		PKT_TX_TCP_SEG |		 \
 		PKT_TX_OUTER_IP_CKSUM)
 
+#define I40E_TX_OFFLOAD_MASK (  \
+		PKT_TX_IP_CKSUM |       \
+		PKT_TX_L4_MASK |        \
+		PKT_TX_OUTER_IP_CKSUM | \
+		PKT_TX_TCP_SEG |        \
+		PKT_TX_QINQ_PKT |       \
+		PKT_TX_VLAN_PKT)
+
 static uint16_t i40e_xmit_pkts_simple(void *tx_queue,
 				      struct rte_mbuf **tx_pkts,
 				      uint16_t nb_pkts);
@@ -1930,6 +1940,90 @@ i40e_xmit_pkts_simple(void *tx_queue,
 	return nb_tx;
 }
 
+/*********************************************************************
+ *
+ *  TX prep functions
+ *
+ **********************************************************************/
+uint16_t
+i40e_prep_pkts(void *tx_queue __rte_unused, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts)
+{
+	int i, ret;
+	uint64_t ol_flags;
+	struct rte_mbuf *m;
+
+	for (i = 0; i < nb_pkts; i++) {
+		m = tx_pkts[i];
+		ol_flags = m->ol_flags;
+
+		/**
+		 * m->nb_segs is uint8_t, so it can never exceed
+		 * I40E_TX_MAX_SEG (UINT8_MAX).
+		 * Only the m->nb_segs > I40E_TX_MAX_MTU_SEG condition is checked here.
+		 */
+		if (!(ol_flags & PKT_TX_TCP_SEG)) {
+			if (m->nb_segs > I40E_TX_MAX_MTU_SEG) {
+				rte_errno = -1;
+				return i;
+			}
+		} else if ((m->tso_segsz < I40E_MIN_TSO_MSS) ||
+				(m->tso_segsz > I40E_MAX_TSO_MSS)) {
+			/* An MSS outside the range (256B - 9674B) is considered malicious */
+			rte_errno = -EINVAL;
+			return i;
+		}
+
+		if ((ol_flags & PKT_TX_OFFLOAD_MASK) !=
+				(ol_flags & I40E_TX_OFFLOAD_MASK)) {
+			rte_errno = -EINVAL;
+			return i;
+		}
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+		ret = rte_validate_tx_offload(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+#endif
+		ret = rte_phdr_cksum_fix(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+	}
+	return i;
+}
+
+/* i40e simple path doesn't support tx offloads */
+uint16_t
+i40e_prep_pkts_simple(void *tx_queue __rte_unused, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts)
+{
+	int i;
+	uint64_t ol_flags;
+	struct rte_mbuf *m;
+
+	for (i = 0; i < nb_pkts; i++) {
+		m = tx_pkts[i];
+		ol_flags = m->ol_flags;
+
+		/* For simple path (simple and vector) no tx offloads are supported */
+		if (m->nb_segs > I40E_TX_MAX_MTU_SEG) {
+			rte_errno = -1;
+			return i;
+		}
+
+		if (ol_flags & PKT_TX_OFFLOAD_MASK) {
+			rte_errno = -1;
+			return i;
+		}
+	}
+
+	return i;
+}
+
 /*
  * Find the VSI the queue belongs to. 'queue_idx' is the queue index
  * application used, which assume having sequential ones. But from driver's
@@ -3271,9 +3365,11 @@ i40e_set_tx_function(struct rte_eth_dev *dev)
 			PMD_INIT_LOG(DEBUG, "Simple tx finally be used.");
 			dev->tx_pkt_burst = i40e_xmit_pkts_simple;
 		}
+		dev->tx_pkt_prep = i40e_prep_pkts_simple;
 	} else {
 		PMD_INIT_LOG(DEBUG, "Xmit tx finally be used.");
 		dev->tx_pkt_burst = i40e_xmit_pkts;
+		dev->tx_pkt_prep = i40e_prep_pkts;
 	}
 }
 
diff --git a/drivers/net/i40e/i40e_rxtx.h b/drivers/net/i40e/i40e_rxtx.h
index 98179f0..2ff7862 100644
--- a/drivers/net/i40e/i40e_rxtx.h
+++ b/drivers/net/i40e/i40e_rxtx.h
@@ -63,6 +63,12 @@
 #define	I40E_MIN_RING_DESC	64
 #define	I40E_MAX_RING_DESC	4096
 
+#define I40E_MIN_TSO_MSS          256
+#define I40E_MAX_TSO_MSS          9674
+
+#define I40E_TX_MAX_SEG     UINT8_MAX
+#define I40E_TX_MAX_MTU_SEG 8
+
 #undef container_of
 #define container_of(ptr, type, member) ({ \
 		typeof(((type *)0)->member)(*__mptr) = (ptr); \
@@ -223,6 +229,10 @@ uint16_t i40e_recv_scattered_pkts(void *rx_queue,
 uint16_t i40e_xmit_pkts(void *tx_queue,
 			struct rte_mbuf **tx_pkts,
 			uint16_t nb_pkts);
+uint16_t i40e_prep_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts);
+uint16_t i40e_prep_pkts_simple(void *tx_queue, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts);
 int i40e_tx_queue_init(struct i40e_tx_queue *txq);
 int i40e_rx_queue_init(struct i40e_rx_queue *rxq);
 void i40e_free_tx_resources(struct i40e_tx_queue *txq);
-- 
1.7.9.5


* [dpdk-dev] [PATCH 5/6] ixgbe: add Tx preparation
  2016-08-26 16:22 [dpdk-dev] [PATCH 0/6] add Tx preparation Tomasz Kulasek
                   ` (3 preceding siblings ...)
  2016-08-26 16:22 ` [dpdk-dev] [PATCH 4/6] i40e: " Tomasz Kulasek
@ 2016-08-26 16:22 ` Tomasz Kulasek
  2016-08-26 16:22 ` [dpdk-dev] [PATCH 6/6] testpmd: add txprep engine Tomasz Kulasek
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 261+ messages in thread
From: Tomasz Kulasek @ 2016-08-26 16:22 UTC (permalink / raw)
  To: dev

Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
---
 drivers/net/ixgbe/ixgbe_ethdev.c |    3 ++
 drivers/net/ixgbe/ixgbe_ethdev.h |    8 +++-
 drivers/net/ixgbe/ixgbe_rxtx.c   |   83 +++++++++++++++++++++++++++++++++++++-
 drivers/net/ixgbe/ixgbe_rxtx.h   |    2 +
 4 files changed, 94 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
index fb618ef..1509979 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -515,6 +515,8 @@ static const struct rte_eth_desc_lim tx_desc_lim = {
 	.nb_max = IXGBE_MAX_RING_DESC,
 	.nb_min = IXGBE_MIN_RING_DESC,
 	.nb_align = IXGBE_TXD_ALIGN,
+	.nb_seg_max = IXGBE_TX_MAX_SEG,
+	.nb_mtu_seg_max = IXGBE_TX_MAX_SEG,
 };
 
 static const struct eth_dev_ops ixgbe_eth_dev_ops = {
@@ -1101,6 +1103,7 @@ eth_ixgbe_dev_init(struct rte_eth_dev *eth_dev)
 	eth_dev->dev_ops = &ixgbe_eth_dev_ops;
 	eth_dev->rx_pkt_burst = &ixgbe_recv_pkts;
 	eth_dev->tx_pkt_burst = &ixgbe_xmit_pkts;
+	eth_dev->tx_pkt_prep = &ixgbe_prep_pkts;
 
 	/*
 	 * For secondary processes, we don't initialise any further as primary
diff --git a/drivers/net/ixgbe/ixgbe_ethdev.h b/drivers/net/ixgbe/ixgbe_ethdev.h
index 4ff6338..09d96de 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.h
+++ b/drivers/net/ixgbe/ixgbe_ethdev.h
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -396,6 +396,12 @@ uint16_t ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 uint16_t ixgbe_xmit_pkts_simple(void *tx_queue, struct rte_mbuf **tx_pkts,
 		uint16_t nb_pkts);
 
+uint16_t ixgbe_prep_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts);
+
+uint16_t ixgbe_prep_pkts_simple(void *tx_queue, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts);
+
 int ixgbe_dev_rss_hash_update(struct rte_eth_dev *dev,
 			      struct rte_eth_rss_conf *rss_conf);
 
diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
index 8a306b0..87defa0 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   Copyright 2014 6WIND S.A.
  *   All rights reserved.
  *
@@ -71,6 +71,7 @@
 #include <rte_string_fns.h>
 #include <rte_errno.h>
 #include <rte_ip.h>
+#include <rte_pkt.h>
 
 #include "ixgbe_logs.h"
 #include "base/ixgbe_api.h"
@@ -906,6 +907,84 @@ end_of_tx:
 
 /*********************************************************************
  *
+ *  TX prep functions
+ *
+ **********************************************************************/
+uint16_t
+ixgbe_prep_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
+{
+	int i, ret;
+	struct rte_mbuf *m;
+	struct ixgbe_tx_queue *txq = (struct ixgbe_tx_queue *)tx_queue;
+
+	for (i = 0; i < nb_pkts; i++) {
+		m = tx_pkts[i];
+
+		/**
+		 * Check if packet meets requirements for number of segments
+		 *
+		 * NOTE: for ixgbe it's always (40 - WTHRESH) for both TSO and non-TSO
+		 */
+
+		if (m->nb_segs > IXGBE_TX_MAX_SEG - txq->wthresh) {
+			rte_errno = -EINVAL;
+			return i;
+		}
+
+		if ((m->ol_flags & PKT_TX_OFFLOAD_MASK) !=
+				(m->ol_flags & IXGBE_TX_OFFLOAD_MASK)) {
+			rte_errno = -EINVAL;
+			return i;
+		}
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+		ret = rte_validate_tx_offload(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+#endif
+		ret = rte_phdr_cksum_fix(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+	}
+
+	return i;
+}
+
+/* ixgbe simple path as well as vector TX doesn't support tx offloads */
+uint16_t
+ixgbe_prep_pkts_simple(void *tx_queue __rte_unused, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts)
+{
+	int i;
+	struct rte_mbuf *m;
+	uint64_t ol_flags;
+
+	for (i = 0; i < nb_pkts; i++) {
+		m = tx_pkts[i];
+		ol_flags = m->ol_flags;
+
+		/* simple tx path doesn't support multi-segments */
+		if (m->nb_segs != 1) {
+			rte_errno = -EINVAL;
+			return i;
+		}
+
+		/* For simple path (simple and vector) no tx offloads are supported */
+		if (ol_flags & PKT_TX_OFFLOAD_MASK) {
+			rte_errno = -EINVAL;
+			return i;
+		}
+	}
+
+	return i;
+}
+
+/*********************************************************************
+ *
  *  RX functions
  *
  **********************************************************************/
@@ -2290,6 +2369,7 @@ ixgbe_set_tx_function(struct rte_eth_dev *dev, struct ixgbe_tx_queue *txq)
 		} else
 #endif
 		dev->tx_pkt_burst = ixgbe_xmit_pkts_simple;
+		dev->tx_pkt_prep = ixgbe_prep_pkts_simple;
 	} else {
 		PMD_INIT_LOG(DEBUG, "Using full-featured tx code path");
 		PMD_INIT_LOG(DEBUG,
@@ -2301,6 +2381,7 @@ ixgbe_set_tx_function(struct rte_eth_dev *dev, struct ixgbe_tx_queue *txq)
 				(unsigned long)txq->tx_rs_thresh,
 				(unsigned long)RTE_PMD_IXGBE_TX_MAX_BURST);
 		dev->tx_pkt_burst = ixgbe_xmit_pkts;
+		dev->tx_pkt_prep = ixgbe_prep_pkts;
 	}
 }
 
diff --git a/drivers/net/ixgbe/ixgbe_rxtx.h b/drivers/net/ixgbe/ixgbe_rxtx.h
index 2608b36..7bbd9b8 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx.h
+++ b/drivers/net/ixgbe/ixgbe_rxtx.h
@@ -80,6 +80,8 @@
 #define RTE_IXGBE_WAIT_100_US               100
 #define RTE_IXGBE_VMTXSW_REGISTER_COUNT     2
 
+#define IXGBE_TX_MAX_SEG                    40
+
 #define IXGBE_PACKET_TYPE_MASK_82599        0X7F
 #define IXGBE_PACKET_TYPE_MASK_X550         0X10FF
 #define IXGBE_PACKET_TYPE_MASK_TUNNEL       0XFF
-- 
1.7.9.5


* [dpdk-dev] [PATCH 6/6] testpmd: add txprep engine
  2016-08-26 16:22 [dpdk-dev] [PATCH 0/6] add Tx preparation Tomasz Kulasek
                   ` (4 preceding siblings ...)
  2016-08-26 16:22 ` [dpdk-dev] [PATCH 5/6] ixgbe: " Tomasz Kulasek
@ 2016-08-26 16:22 ` Tomasz Kulasek
  2016-08-26 17:31 ` [dpdk-dev] [PATCH 0/6] add Tx preparation Stephen Hemminger
  2016-09-12 14:44 ` [dpdk-dev] [PATCH v2 " Tomasz Kulasek
  7 siblings, 0 replies; 261+ messages in thread
From: Tomasz Kulasek @ 2016-08-26 16:22 UTC (permalink / raw)
  To: dev

This patch adds txprep engine to the testpmd application.

The txprep engine is intended to verify the Tx preparation functionality
implemented in the PMD.

It's based on the default "io" engine with the following changes:
 - Tx HW offloads are reset in incoming packets,
 - the burst is passed to the Tx preparation function before the tx burst,
 - "txsplit" and "tso" functionality added for outgoing packets.

Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
---
 app/test-pmd/Makefile  |    3 +-
 app/test-pmd/testpmd.c |    1 +
 app/test-pmd/testpmd.h |    1 +
 app/test-pmd/txprep.c  |  412 ++++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 416 insertions(+), 1 deletion(-)
 create mode 100644 app/test-pmd/txprep.c

diff --git a/app/test-pmd/Makefile b/app/test-pmd/Makefile
index 2a0b5a5..7540244 100644
--- a/app/test-pmd/Makefile
+++ b/app/test-pmd/Makefile
@@ -1,6 +1,6 @@
 #   BSD LICENSE
 #
-#   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+#   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
 #   All rights reserved.
 #
 #   Redistribution and use in source and binary forms, with or without
@@ -49,6 +49,7 @@ SRCS-y += parameters.c
 SRCS-$(CONFIG_RTE_LIBRTE_CMDLINE) += cmdline.c
 SRCS-y += config.c
 SRCS-y += iofwd.c
+SRCS-y += txprep.c
 SRCS-y += macfwd.c
 SRCS-y += macswap.c
 SRCS-y += flowgen.c
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 1428974..5858a33 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -152,6 +152,7 @@ struct fwd_engine * fwd_engines[] = {
 	&rx_only_engine,
 	&tx_only_engine,
 	&csum_fwd_engine,
+	&txprep_fwd_engine,
 	&icmp_echo_engine,
 #ifdef RTE_LIBRTE_IEEE1588
 	&ieee1588_fwd_engine,
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index 2b281cc..0af9557 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -239,6 +239,7 @@ extern struct fwd_engine icmp_echo_engine;
 #ifdef RTE_LIBRTE_IEEE1588
 extern struct fwd_engine ieee1588_fwd_engine;
 #endif
+extern struct fwd_engine txprep_fwd_engine;
 
 extern struct fwd_engine * fwd_engines[]; /**< NULL terminated array. */
 
diff --git a/app/test-pmd/txprep.c b/app/test-pmd/txprep.c
new file mode 100644
index 0000000..688927e
--- /dev/null
+++ b/app/test-pmd/txprep.c
@@ -0,0 +1,412 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <stdarg.h>
+#include <stdio.h>
+#include <string.h>
+#include <errno.h>
+#include <stdint.h>
+#include <unistd.h>
+#include <inttypes.h>
+
+#include <sys/queue.h>
+#include <sys/stat.h>
+
+#include <rte_common.h>
+#include <rte_byteorder.h>
+#include <rte_log.h>
+#include <rte_debug.h>
+#include <rte_cycles.h>
+#include <rte_memory.h>
+#include <rte_memzone.h>
+#include <rte_launch.h>
+#include <rte_eal.h>
+#include <rte_per_lcore.h>
+#include <rte_lcore.h>
+#include <rte_atomic.h>
+#include <rte_branch_prediction.h>
+#include <rte_ring.h>
+#include <rte_memory.h>
+#include <rte_memcpy.h>
+#include <rte_mempool.h>
+#include <rte_mbuf.h>
+#include <rte_interrupts.h>
+#include <rte_pci.h>
+#include <rte_ether.h>
+#include <rte_ethdev.h>
+#include <rte_string_fns.h>
+#include <rte_ip.h>
+#include <rte_tcp.h>
+#include <rte_udp.h>
+
+#include "testpmd.h"
+
+/* We cannot use rte_cpu_to_be_16() on a constant in a switch/case */
+#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
+#define _htons(x) ((uint16_t)((((x) & 0x00ffU) << 8) | (((x) & 0xff00U) >> 8)))
+#else
+#define _htons(x) (x)
+#endif
+
+/*
+ * Helper function.
+ * Performs actual copying.
+ * Returns number of segments in the destination mbuf on success,
+ * or negative error code on failure.
+ */
+static int
+mbuf_copy_split(const struct rte_mbuf *ms, struct rte_mbuf *md[],
+	uint16_t seglen[], uint8_t nb_seg)
+{
+	uint32_t dlen, slen, tlen;
+	uint32_t i, len;
+	const struct rte_mbuf *m;
+	const uint8_t *src;
+	uint8_t *dst;
+
+	dlen = 0;
+	slen = 0;
+	tlen = 0;
+
+	dst = NULL;
+	src = NULL;
+
+	m = ms;
+	i = 0;
+	while (ms != NULL && i != nb_seg) {
+
+		if (slen == 0) {
+			slen = rte_pktmbuf_data_len(ms);
+			src = rte_pktmbuf_mtod(ms, const uint8_t *);
+		}
+
+		if (dlen == 0) {
+			dlen = RTE_MIN(seglen[i], slen);
+			md[i]->data_len = dlen;
+			md[i]->next = (i + 1 == nb_seg) ? NULL : md[i + 1];
+			dst = rte_pktmbuf_mtod(md[i], uint8_t *);
+		}
+
+		len = RTE_MIN(slen, dlen);
+		memcpy(dst, src, len);
+		tlen += len;
+		slen -= len;
+		dlen -= len;
+		src += len;
+		dst += len;
+
+		if (slen == 0)
+			ms = ms->next;
+		if (dlen == 0)
+			i++;
+	}
+
+	if (ms != NULL)
+		return -ENOBUFS;
+	else if (tlen != m->pkt_len)
+		return -EINVAL;
+
+	md[0]->nb_segs = nb_seg;
+	md[0]->pkt_len = tlen;
+	md[0]->vlan_tci = m->vlan_tci;
+	md[0]->vlan_tci_outer = m->vlan_tci_outer;
+	md[0]->ol_flags = m->ol_flags;
+	md[0]->tx_offload = m->tx_offload;
+
+	return nb_seg;
+}
+
+/*
+ * Allocate a new mbuf with up to tx_pkt_nb_segs segments.
+ * Copy packet contents and offload information into the new segmented mbuf.
+ */
+static struct rte_mbuf *
+pkt_copy_split(const struct rte_mbuf *pkt)
+{
+	int32_t n, rc;
+	uint32_t i, len, nb_seg;
+	struct rte_mempool *mp;
+	uint16_t seglen[RTE_MAX_SEGS_PER_PKT];
+	struct rte_mbuf *p, *md[RTE_MAX_SEGS_PER_PKT];
+
+	mp = current_fwd_lcore()->mbp;
+
+	if (tx_pkt_split == TX_PKT_SPLIT_RND)
+		nb_seg = random() % tx_pkt_nb_segs + 1;
+	else
+		nb_seg = tx_pkt_nb_segs;
+
+	memcpy(seglen, tx_pkt_seg_lengths, nb_seg * sizeof(seglen[0]));
+
+	/* calculate number of segments to use and their length. */
+	len = 0;
+	for (i = 0; i != nb_seg && len < pkt->pkt_len; i++) {
+		len += seglen[i];
+		md[i] = NULL;
+	}
+
+	n = pkt->pkt_len - len;
+
+	/* update size of the last segment to fit rest of the packet */
+	if (n >= 0) {
+		seglen[i - 1] += n;
+		len += n;
+	}
+
+	nb_seg = i;
+	while (i != 0) {
+		p = rte_pktmbuf_alloc(mp);
+		if (p == NULL) {
+			RTE_LOG(ERR, USER1,
+				"failed to allocate %u-th of %u mbuf "
+				"from mempool: %s\n",
+				nb_seg - i, nb_seg, mp->name);
+			break;
+		}
+
+		md[--i] = p;
+		if (rte_pktmbuf_tailroom(md[i]) < seglen[i]) {
+			RTE_LOG(ERR, USER1, "mempool %s, %u-th segment: "
+				"expected seglen: %u, "
+				"actual mbuf tailroom: %u\n",
+				mp->name, i, seglen[i],
+				rte_pktmbuf_tailroom(md[i]));
+			break;
+		}
+	}
+
+	/* all mbufs successfully allocated, do copy */
+	if (i == 0) {
+		rc = mbuf_copy_split(pkt, md, seglen, nb_seg);
+		if (rc < 0)
+			RTE_LOG(ERR, USER1,
+				"mbuf_copy_split for %p(len=%u, nb_seg=%hhu) "
+				"into %u segments failed with error code: %d\n",
+				pkt, pkt->pkt_len, pkt->nb_segs, nb_seg, rc);
+
+		/* figure out how many mbufs to free. */
+		i = RTE_MAX(rc, 0);
+	}
+
+	/* free unused mbufs */
+	for (; i != nb_seg; i++) {
+		rte_pktmbuf_free_seg(md[i]);
+		md[i] = NULL;
+	}
+
+	return md[0];
+}
+
+/*
+ * Forwarding of packets in txprep mode.
+ * Forward packets with Tx preparation: offload flags and header lengths
+ * are set up from the packet contents, then the burst is passed to
+ * rte_eth_tx_prep() before transmission.
+ */
+static void
+pkt_burst_txprep_forward(struct fwd_stream *fs)
+{
+	struct rte_mbuf *pkts_burst[MAX_PKT_BURST];
+	struct rte_mbuf *p;
+	struct rte_port *txp;
+	int i;
+	uint16_t nb_rx;
+	uint16_t nb_prep;
+	uint16_t nb_tx;
+#ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES
+	uint64_t start_tsc;
+	uint64_t end_tsc;
+	uint64_t core_cycles;
+#endif
+	uint16_t tso_segsz = 0;
+	uint64_t ol_flags = 0;
+
+	struct ether_hdr *eth_hdr;
+	struct vlan_hdr *vlan_hdr;
+	struct ipv4_hdr *ipv4_hdr;
+	struct ipv6_hdr *ipv6_hdr;
+	struct tcp_hdr *tcp_hdr;
+	char *l3_hdr = NULL;
+
+	uint8_t l4_proto = 0;
+
+#ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES
+	start_tsc = rte_rdtsc();
+#endif
+
+	/*
+	 * Receive a burst of packets and forward them.
+	 */
+	nb_rx = rte_eth_rx_burst(fs->rx_port, fs->rx_queue, pkts_burst,
+			nb_pkt_per_burst);
+	if (unlikely(nb_rx == 0))
+		return;
+
+	txp = &ports[fs->tx_port];
+	tso_segsz = txp->tso_segsz;
+
+	for (i = 0; i < nb_rx; i++) {
+
+		eth_hdr = rte_pktmbuf_mtod(pkts_burst[i], struct ether_hdr *);
+		ether_addr_copy(&peer_eth_addrs[fs->peer_addr],
+				&eth_hdr->d_addr);
+		ether_addr_copy(&ports[fs->tx_port].eth_addr,
+				&eth_hdr->s_addr);
+
+		uint16_t ether_type = eth_hdr->ether_type;
+
+		pkts_burst[i]->l2_len = sizeof(struct ether_hdr);
+
+		ol_flags = 0;
+
+		if (tso_segsz > 0)
+			ol_flags |= PKT_TX_TCP_SEG;
+
+		if (ether_type == _htons(ETHER_TYPE_VLAN)) {
+			ol_flags |= PKT_TX_VLAN_PKT;
+			vlan_hdr = (struct vlan_hdr *)(eth_hdr + 1);
+			pkts_burst[i]->l2_len += sizeof(struct vlan_hdr);
+			ether_type = vlan_hdr->eth_proto;
+		}
+
+		switch (ether_type) {
+		case _htons(ETHER_TYPE_IPv4):
+			ol_flags |= (PKT_TX_IPV4 | PKT_TX_IP_CKSUM);
+			pkts_burst[i]->l3_len = sizeof(struct ipv4_hdr);
+			pkts_burst[i]->l4_len = sizeof(struct tcp_hdr);
+
+			ipv4_hdr = (struct ipv4_hdr *)((char *)eth_hdr +
+					pkts_burst[i]->l2_len);
+			l3_hdr = (char *)ipv4_hdr;
+			pkts_burst[i]->l3_len = (ipv4_hdr->version_ihl & 0x0f) * 4;
+			l4_proto = ipv4_hdr->next_proto_id;
+
+			break;
+		case _htons(ETHER_TYPE_IPv6):
+			ol_flags |= PKT_TX_IPV6;
+
+			ipv6_hdr = (struct ipv6_hdr *)((char *)eth_hdr +
+					pkts_burst[i]->l2_len);
+			l3_hdr = (char *)ipv6_hdr;
+			l4_proto = ipv6_hdr->proto;
+			pkts_burst[i]->l3_len = sizeof(struct ipv6_hdr);
+			break;
+		default:
+			printf("Unknown packet type\n");
+			break;
+		}
+
+		if (l4_proto == IPPROTO_TCP) {
+			ol_flags |= PKT_TX_TCP_CKSUM;
+			tcp_hdr = (struct tcp_hdr *)(l3_hdr + pkts_burst[i]->l3_len);
+			pkts_burst[i]->l4_len = (tcp_hdr->data_off & 0xf0) >> 2;
+		} else if (l4_proto == IPPROTO_UDP) {
+			ol_flags |= PKT_TX_UDP_CKSUM;
+			pkts_burst[i]->l4_len = sizeof(struct udp_hdr);
+		}
+
+		pkts_burst[i]->tso_segsz = tso_segsz;
+		pkts_burst[i]->ol_flags = ol_flags;
+
+		/* Do split & copy for the packet. */
+		if (tx_pkt_split != TX_PKT_SPLIT_OFF) {
+			p = pkt_copy_split(pkts_burst[i]);
+			if (p != NULL) {
+				rte_pktmbuf_free(pkts_burst[i]);
+				pkts_burst[i] = p;
+			}
+		}
+
+		/* if verbose mode is enabled, dump debug info */
+		if (verbose_level > 0) {
+			printf("l2_len=%d, l3_len=%d, l4_len=%d, nb_segs=%d, tso_segsz=%d\n",
+					pkts_burst[i]->l2_len, pkts_burst[i]->l3_len,
+					pkts_burst[i]->l4_len, pkts_burst[i]->nb_segs,
+					pkts_burst[i]->tso_segsz);
+		}
+	}
+
+	/*
+	 * Prepare burst to transmit
+	 */
+	nb_prep = rte_eth_tx_prep(fs->tx_port, fs->tx_queue, pkts_burst, nb_rx);
+
+	if (nb_prep < nb_rx)
+		printf("Preparing packet burst to transmit failed: %s\n",
+				rte_strerror(rte_errno));
+
+#ifdef RTE_TEST_PMD_RECORD_BURST_STATS
+	fs->rx_burst_stats.pkt_burst_spread[nb_rx]++;
+#endif
+	fs->rx_packets += nb_rx;
+	nb_tx = rte_eth_tx_burst(fs->tx_port, fs->tx_queue, pkts_burst, nb_prep);
+	fs->tx_packets += nb_tx;
+#ifdef RTE_TEST_PMD_RECORD_BURST_STATS
+	fs->tx_burst_stats.pkt_burst_spread[nb_tx]++;
+#endif
+	if (unlikely(nb_tx < nb_rx)) {
+		fs->fwd_dropped += (nb_rx - nb_tx);
+		do {
+			rte_pktmbuf_free(pkts_burst[nb_tx]);
+		} while (++nb_tx < nb_rx);
+	}
+#ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES
+	end_tsc = rte_rdtsc();
+	core_cycles = (end_tsc - start_tsc);
+	fs->core_cycles = (uint64_t) (fs->core_cycles + core_cycles);
+#endif
+}
+
+static void
+txprep_fwd_begin(portid_t pi)
+{
+	struct rte_eth_dev_info dev_info;
+
+	rte_eth_dev_info_get(pi, &dev_info);
+	printf("  nb_seg_max=%d, nb_mtu_seg_max=%d\n",
+			dev_info.tx_desc_lim.nb_seg_max,
+			dev_info.tx_desc_lim.nb_mtu_seg_max);
+}
+
+static void
+txprep_fwd_end(portid_t pi __rte_unused)
+{
+	printf("txprep_fwd_end\n");
+}
+
+struct fwd_engine txprep_fwd_engine = {
+	.fwd_mode_name  = "txprep",
+	.port_fwd_begin = txprep_fwd_begin,
+	.port_fwd_end   = txprep_fwd_end,
+	.packet_fwd     = pkt_burst_txprep_forward,
+};
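
(Note: assuming the engine is also registered in testpmd's fwd_engines[]
array -- the diffstat touches app/test-pmd/testpmd.c -- it would be
selected at runtime with the existing forwarding-mode commands, e.g.
"testpmd> set fwd txprep" followed by "testpmd> start".)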
-- 
1.7.9.5


* Re: [dpdk-dev] [PATCH 0/6] add Tx preparation
  2016-08-26 16:22 [dpdk-dev] [PATCH 0/6] add Tx preparation Tomasz Kulasek
                   ` (5 preceding siblings ...)
  2016-08-26 16:22 ` [dpdk-dev] [PATCH 6/6] testpmd: add txprep engine Tomasz Kulasek
@ 2016-08-26 17:31 ` Stephen Hemminger
  2016-08-31 12:34   ` Ananyev, Konstantin
  2016-09-12 14:44 ` [dpdk-dev] [PATCH v2 " Tomasz Kulasek
  7 siblings, 1 reply; 261+ messages in thread
From: Stephen Hemminger @ 2016-08-26 17:31 UTC (permalink / raw)
  To: Tomasz Kulasek; +Cc: dev

On Fri, 26 Aug 2016 18:22:52 +0200
Tomasz Kulasek <tomaszx.kulasek@intel.com> wrote:

> As discussed in that thread:
> 
> http://dpdk.org/ml/archives/dev/2015-September/023603.html
> 
> Different NIC models depending on HW offload requested might impose
> different requirements on packets to be TX-ed in terms of:
> 
>  - Max number of fragments per packet allowed
>  - Max number of fragments per TSO segments
>  - The way pseudo-header checksum should be pre-calculated
>  - L3/L4 header fields filling
>  - etc.
> 
> 
> MOTIVATION:
> -----------
> 
> 1) Some work cannot (and didn't should) be done in rte_eth_tx_burst.
>    However, this work is sometimes required, and now, it's an
>    application issue.

Why not? You are adding an additional API burden on every application.

> 
> 2) Different hardware may have different requirements for TX offloads,
>    other subset can be supported and so on.

These need to be reported by the API so that the application can handle them.
Doing these transformations in tx_prep seems late in the process.

> 
> 3) Some parameters (e.g. number of segments in ixgbe driver) may hung
>    device. These parameters may be vary for different devices.
> 
>    For example i40e HW allows 8 fragments per packet, but that is after
>    TSO segmentation. While ixgbe has a 38-fragment pre-TSO limit.

Seems better to handle these limits as exceptions in i40e_tx_burst etc.
rather than as a pre-step. Look at how the Linux driver API works: several
drivers have to have an exception linearize path.
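
Something along these lines (a hand-wavy sketch only; the helper name is
made up, Linux uses skb_linearize() for this):

	/* inside the PMD tx_burst, before touching the descriptor ring */
	if (unlikely(pkt->nb_segs > max_segs_per_pkt)) {
		/* coalesce the mbuf chain into fewer segments */
		if (pmd_linearize_mbuf(pkt) != 0)	/* hypothetical */
			break;	/* cannot fix it up - stop/drop */
	}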

> 
> 4) Fields in packet may require different initialization (like e.g. will
>    require pseudo-header checksum precalculation, sometimes in a
>    different way depending on packet type, and so on). Now application
>    needs to care about it.

Once again, the driver should do this in Tx.


> 
> 5) Using additional API (rte_eth_tx_prep) before rte_eth_tx_burst let to
>    prepare packet burst in acceptable form for specific device.
> 
> 6) Some additional checks may be done in debug mode keeping tx_burst
>    implementation clean.

Most of this could be done by refactoring the existing tx_burst in drivers.
Much of the code seems to be written on the principle of "let's write a
2000-line function because that is most efficient" rather than "let's write
small steps and let the compiler optimize it".


* Re: [dpdk-dev] [PATCH 0/6] add Tx preparation
  2016-08-26 17:31 ` [dpdk-dev] [PATCH 0/6] add Tx preparation Stephen Hemminger
@ 2016-08-31 12:34   ` Ananyev, Konstantin
  0 siblings, 0 replies; 261+ messages in thread
From: Ananyev, Konstantin @ 2016-08-31 12:34 UTC (permalink / raw)
  To: Stephen Hemminger, Kulasek, TomaszX; +Cc: dev



> 
> On Fri, 26 Aug 2016 18:22:52 +0200
> Tomasz Kulasek <tomaszx.kulasek@intel.com> wrote:
> 
> > As discussed in that thread:
> >
> > http://dpdk.org/ml/archives/dev/2015-September/023603.html
> >
> > Different NIC models depending on HW offload requested might impose
> > different requirements on packets to be TX-ed in terms of:
> >
> >  - Max number of fragments per packet allowed
> >  - Max number of fragments per TSO segments
> >  - The way pseudo-header checksum should be pre-calculated
> >  - L3/L4 header fields filling
> >  - etc.
> >
> >
> > MOTIVATION:
> > -----------
> >
> > 1) Some work cannot (and didn't should) be done in rte_eth_tx_burst.
> >    However, this work is sometimes required, and now, it's an
> >    application issue.
> 
> Why not? You are adding an additional API burden on every application.
> 
> >
> > 2) Different hardware may have different requirements for TX offloads,
> >    other subset can be supported and so on.
> 
> These need to be reported by API so that application can handle it.

If you read the patch description, you'll see that we do both:
- provide tx_prep()
- "2) Also new fields will be introduced in rte_eth_desc_lim: 
   nb_seg_max and nb_mtu_seg_max, providing an information about max
   segments in TSO and non-TSO packets acceptable by device.

   This information is useful for application to not create/limit
   malicious packet."
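
E.g. something like this on the application side (an illustrative sketch
only, not code from the patch set):

	struct rte_eth_dev_info dev_info;

	rte_eth_dev_info_get(port_id, &dev_info);

	/* refuse to build a packet with more segments than the device
	 * can take; the limits come from the new rte_eth_desc_lim fields.
	 */
	if (mbuf->nb_segs > dev_info.tx_desc_lim.nb_seg_max)
		handle_oversized(mbuf);	/* app specific - split or drop */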

> Doing these transformations in tx_prep seems late in the process.

Why is that?
It is totally up to the application to decide at what stage it wants to call tx_prep() for each packet -
just after it has formed an mbuf to be TX-ed, or just before calling tx_burst() for it, or somewhere in between.

> 
> >
> > 3) Some parameters (e.g. number of segments in ixgbe driver) may hung
> >    device. These parameters may be vary for different devices.
> >
> >    For example i40e HW allows 8 fragments per packet, but that is after
> >    TSO segmentation. While ixgbe has a 38-fragment pre-TSO limit.
> 
> Seems better to handle these limits as exceptions in i40e_tx_burst etc; rather than a pre-step. Look at how Linux driver API works, several
> drivers have to have an exception linearize path.

Hmm, doesn't that contradict your statement above:
'Doing these transformations in tx_prep seems late in the process.'? :)
I suppose we all know that the Linux kernel driver and DPDK PMD usage models are quite different.
As a rule of thumb we try to avoid modifying packet data inside tx_burst() itself.
Having this functionality in a separate function gives the upper layer a choice of when it is better
to modify packet contents, and hopefully lets it hide/minimize memory access latencies.

> 
> >
> > 4) Fields in packet may require different initialization (like e.g. will
> >    require pseudo-header checksum precalculation, sometimes in a
> >    different way depending on packet type, and so on). Now application
> >    needs to care about it.
> 
> Once again, the driver should do this in Tx.

Once again, I really doubt it should.

> 
> 
> >
> > 5) Using additional API (rte_eth_tx_prep) before rte_eth_tx_burst let to
> >    prepare packet burst in acceptable form for specific device.
> >
> > 6) Some additional checks may be done in debug mode keeping tx_burst
> >    implementation clean.
> 
> Most of this could be done by refactoring existing tx_burst in drivers.
> Much of the code seems to be written as the "let's write a 2000 line function because that is most efficient" rather than "let's write small
> steps and let the compiler optimize it"

I don't see how that could be easily done inside tx_burst() without significant performance loss.
Especially if we have a pipeline model, where one or several lcores produce mbufs to be TX-ed,
and one or several lcores do the actual TX for these packets.
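
Just to illustrate the pipeline case (a rough sketch, not code from the
patch set; assumes an rte_ring connecting the two stages):

	/* worker lcore: form the packets and prepare them for the TX port */
	nb = rte_eth_tx_prep(tx_port, tx_queue, pkts, nb);
	rte_ring_enqueue_burst(tx_ring, (void **)pkts, nb);

	/* TX lcore: transmit only - no packet data is touched here */
	n = rte_ring_dequeue_burst(tx_ring, (void **)pkts, MAX_PKT_BURST);
	sent = rte_eth_tx_burst(tx_port, tx_queue, pkts, n);

That way the cache misses taken while fixing up headers stay on the worker
lcore, away from the TX loop.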

Konstantin
 


* Re: [dpdk-dev] [PATCH 1/6] ethdev: add Tx preparation
  2016-08-26 16:22 ` [dpdk-dev] [PATCH 1/6] ethdev: " Tomasz Kulasek
@ 2016-09-08  7:28   ` Jerin Jacob
  2016-09-08 16:09     ` Kulasek, TomaszX
  0 siblings, 1 reply; 261+ messages in thread
From: Jerin Jacob @ 2016-09-08  7:28 UTC (permalink / raw)
  To: Tomasz Kulasek; +Cc: dev

On Fri, Aug 26, 2016 at 06:22:53PM +0200, Tomasz Kulasek wrote:
> Added API for `rte_eth_tx_prep`
> 
> uint16_t rte_eth_tx_prep(uint8_t port_id, uint16_t queue_id,
> 	struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
> 
> Added fields to the `struct rte_eth_desc_lim`:
> 
> 	uint16_t nb_seg_max;
> 		/**< Max number of segments per whole packet. */
> 
> 	uint16_t nb_mtu_seg_max;
> 		/**< Max number of segments per one MTU */
> 
> Created `rte_pkt.h` header with common used functions:
> 
> int rte_validate_tx_offload(struct rte_mbuf *m)
> 	to validate general requirements for tx offload in packet such a
> 	flag completness. In current implementation this function is called
> 	optionaly when RTE_LIBRTE_ETHDEV_DEBUG is enabled.
> 
> int rte_phdr_cksum_fix(struct rte_mbuf *m)
> 	to fix pseudo header checksum for TSO and non-TSO tcp/udp packets
> 	before hardware tx checksum offload.
> 	 - for non-TSO tcp/udp packets full pseudo-header checksum is
> 	   counted and set.
> 	 - for TSO the IP payload length is not included.
> 
> Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
> ---
>  lib/librte_ether/rte_ethdev.h |   74 +++++++++++++++++++++++
>  lib/librte_mbuf/rte_mbuf.h    |    8 +++
>  lib/librte_net/Makefile       |    2 +-
>  lib/librte_net/rte_pkt.h      |  132 +++++++++++++++++++++++++++++++++++++++++
>  4 files changed, 215 insertions(+), 1 deletion(-)
>  create mode 100644 lib/librte_net/rte_pkt.h
> 
> diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
> index b0fe033..02569ca 100644
> --- a/lib/librte_ether/rte_ethdev.h
> +++ b/lib/librte_ether/rte_ethdev.h
> @@ -182,6 +182,7 @@ extern "C" {
>  #include <rte_pci.h>
>  #include <rte_dev.h>
>  #include <rte_devargs.h>
> +#include <rte_errno.h>
>  #include "rte_ether.h"
>  #include "rte_eth_ctrl.h"
>  #include "rte_dev_info.h"
> @@ -696,6 +697,8 @@ struct rte_eth_desc_lim {
>  	uint16_t nb_max;   /**< Max allowed number of descriptors. */
>  	uint16_t nb_min;   /**< Min allowed number of descriptors. */
>  	uint16_t nb_align; /**< Number of descriptors should be aligned to. */
> +	uint16_t nb_seg_max;     /**< Max number of segments per whole packet. */
> +	uint16_t nb_mtu_seg_max; /**< Max number of segments per one MTU */
>  };
>  
>  /**
> @@ -1181,6 +1184,12 @@ typedef uint16_t (*eth_tx_burst_t)(void *txq,
>  				   uint16_t nb_pkts);
>  /**< @internal Send output packets on a transmit queue of an Ethernet device. */
>  
> +typedef uint16_t (*eth_tx_prep_t)(void *txq,
> +				   struct rte_mbuf **tx_pkts,
> +				   uint16_t nb_pkts);
> +/**< @internal Prepare output packets on a transmit queue of an Ethernet
> +		device. */
> +
>  typedef int (*flow_ctrl_get_t)(struct rte_eth_dev *dev,
>  			       struct rte_eth_fc_conf *fc_conf);
>  /**< @internal Get current flow control parameter on an Ethernet device */
> @@ -1626,6 +1635,7 @@ enum rte_eth_dev_type {
>  struct rte_eth_dev {
>  	eth_rx_burst_t rx_pkt_burst; /**< Pointer to PMD receive function. */
>  	eth_tx_burst_t tx_pkt_burst; /**< Pointer to PMD transmit function. */
> +	eth_tx_prep_t tx_pkt_prep; /**< Pointer to PMD transmit prepare function. */
>  	struct rte_eth_dev_data *data;  /**< Pointer to device data */
>  	const struct eth_driver *driver;/**< Driver for this device */
>  	const struct eth_dev_ops *dev_ops; /**< Functions exported by PMD */
> @@ -2833,6 +2843,70 @@ rte_eth_tx_burst(uint8_t port_id, uint16_t queue_id,
>  	return (*dev->tx_pkt_burst)(dev->data->tx_queues[queue_id], tx_pkts, nb_pkts);
>  }
>  
> +/**
> + * Process a burst of output packets on a transmit queue of an Ethernet device.
> + *
> + * The rte_eth_tx_prep() function is invoked to prepare output packets to be
> + * transmitted on the output queue *queue_id* of the Ethernet device designated
> + * by its *port_id*.
> + * The *nb_pkts* parameter is the number of packets to be prepared which are
> + * supplied in the *tx_pkts* array of *rte_mbuf* structures, each of them
> + * allocated from a pool created with rte_pktmbuf_pool_create().
> + * For each packet to send, the rte_eth_tx_prep() function performs
> + * the following operations:
> + *
> + * - Check if packet meets devices requirements for tx offloads.
> + *
> + * - Check limitations about number of segments.
> + *
> + * - Check additional requirements when debug is enabled.
> + *
> + * - Update and/or reset required checksums when tx offload is set for packet.
> + *
> + * The rte_eth_tx_prep() function returns the number of packets ready to be
> + * sent. A return value equal to *nb_pkts* means that all packets are valid and
> + * ready to be sent.
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param queue_id
> + *   The index of the transmit queue through which output packets must be
> + *   sent.
> + *   The value must be in the range [0, nb_tx_queue - 1] previously supplied
> + *   to rte_eth_dev_configure().
> + * @param tx_pkts
> + *   The address of an array of *nb_pkts* pointers to *rte_mbuf* structures
> + *   which contain the output packets.
> + * @param nb_pkts
> + *   The maximum number of packets to process.
> + * @return
> + *   The number of packets correct and ready to be sent. The return value can be
> + *   less than the value of the *tx_pkts* parameter when some packet doesn't
> + *   meet devices requirements with rte_errno set appropriately.
> + */
> +static inline uint16_t
> +rte_eth_tx_prep(uint8_t port_id, uint16_t queue_id, struct rte_mbuf **tx_pkts,
> +		uint16_t nb_pkts)
> +{
> +	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
> +
> +	if (!dev->tx_pkt_prep) {
> +		rte_errno = -ENOTSUP;

The rte_errno update may not be necessary here; see below.

> +		return 0;
IMO, we should return "nb_pkts" here instead of 0 (i.e. treat all packets
as valid in case the PMD does not have a tx_prep function); with "0" the
following check in the application will also fail for no reason:
if (nb_prep < nb_pkts) {
	printf("tx_prep failed\n");
}


> +	}
> +
> +#ifdef RTE_LIBRTE_ETHDEV_DEBUG
> +	if (queue_id >= dev->data->nb_tx_queues) {
> +		RTE_PMD_DEBUG_TRACE("Invalid TX queue_id=%d\n", queue_id);
> +		rte_errno = -EINVAL;
> +		return 0;
> +	}
> +#endif
> +
> +	return (*dev->tx_pkt_prep)(dev->data->tx_queues[queue_id],
> +			tx_pkts, nb_pkts);
> +}
> +

IMO, we need to provide a compile-time option to make rte_eth_tx_prep a
NOOP. The default should be non-NOOP, but in case a _target_ wants to
override it to a NOOP that should be possible. The reason is:

- Low-end ARMv7/ARMv8 targets may not have PCIe-RC support and may have
only an integrated NIC controller. On those targets, where the integrated
NIC controller does not use the tx_prep service, it can be made a NOOP to
save cycles on the following "rte_eth_tx_prep" call and the associated "if
(unlikely(nb_prep < nb_rx))" checks in the application.

/* Prepare burst of TX packets */
nb_prep = rte_eth_tx_prep(fs->rx_port, 0, pkts_burst, nb_rx);

if (unlikely(nb_prep < nb_rx)) {
        int i;
        for (i = nb_prep; i < nb_rx; i++)
                rte_pktmbuf_free(pkts_burst[i]);
}


Jerin


* Re: [dpdk-dev] [PATCH 1/6] ethdev: add Tx preparation
  2016-09-08  7:28   ` Jerin Jacob
@ 2016-09-08 16:09     ` Kulasek, TomaszX
  2016-09-09  5:58       ` Jerin Jacob
  0 siblings, 1 reply; 261+ messages in thread
From: Kulasek, TomaszX @ 2016-09-08 16:09 UTC (permalink / raw)
  To: Jerin Jacob; +Cc: dev

Hi Jerin,

> -----Original Message-----
> From: Jerin Jacob [mailto:jerin.jacob@caviumnetworks.com]
> Sent: Thursday, September 8, 2016 09:29
> To: Kulasek, TomaszX <tomaszx.kulasek@intel.com>
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 1/6] ethdev: add Tx preparation
> 

[...]

> > +static inline uint16_t
> > +rte_eth_tx_prep(uint8_t port_id, uint16_t queue_id, struct rte_mbuf
> **tx_pkts,
> > +		uint16_t nb_pkts)
> > +{
> > +	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
> > +
> > +	if (!dev->tx_pkt_prep) {
> > +		rte_errno = -ENOTSUP;
> 
> rte_errno update may not be necessary here. see below
> 
> > +		return 0;
> IMO, We should return "nb_pkts" here instead of 0(i.e, all the packets are
> valid in-case PMD does not have tx_prep function) and in-case of "0"
> the following check in the application also will fail for no reason if
> (nb_prep < nb_pkts) {
> 	printf("tx_prep failed\n");
> }
> 

Yes, it seems to be reasonable.

> 
> > +	}
> > +
> > +#ifdef RTE_LIBRTE_ETHDEV_DEBUG
> > +	if (queue_id >= dev->data->nb_tx_queues) {
> > +		RTE_PMD_DEBUG_TRACE("Invalid TX queue_id=%d\n", queue_id);
> > +		rte_errno = -EINVAL;
> > +		return 0;
> > +	}
> > +#endif
> > +
> > +	return (*dev->tx_pkt_prep)(dev->data->tx_queues[queue_id],
> > +			tx_pkts, nb_pkts);
> > +}
> > +
> 
> IMO, We need to provide a compile time option for rte_eth_tx_prep as NOOP.
> Default option should be non NOOP but incase a _target_ want to override
> to NOOP it should be possible, the reasons is:
> 
> - Low-end ARMv7,ARMv8 targets may not have PCIE-RC support and it may have
> only integrated NIC controller. On those targets, where integrated NIC
> controller does not use tx_prep service it can made it as NOOP to save
> cycles on following "rte_eth_tx_prep" and associated "if (unlikely(nb_prep
> < nb_rx))" checks in the application.
> 
> /* Prepare burst of TX packets */
> nb_prep = rte_eth_tx_prep(fs->rx_port, 0, pkts_burst, nb_rx);
> 
> if (unlikely(nb_prep < nb_rx)) {
>         int i;
>         for (i = nb_prep; i < nb_rx; i++)
>                 rte_pktmbuf_free(pkts_burst[i]); }
> 

You mean to have code for the NOOP like this:


	/* Prepare burst of TX packets */
	nb_prep = nb_rx; /* rte_eth_tx_prep(fs->rx_port, 0, pkts_burst, nb_rx); */
 
	if (unlikely(nb_prep < nb_rx)) {
		int i;

		for (i = nb_prep; i < nb_rx; i++)
			rte_pktmbuf_free(pkts_burst[i]);
	}


and let the optimizer remove the unused parts?


IMHO it should be the application's decision whether to use tx_prep or not.

Since part of the job is done by the driver (verification and preparation) and part by the application (error handling), such a global compile-time option can introduce inconsistency if the application does not handle both cases.

If someone wants to turn this functionality off, it should be done at the application level, e.g. with a compilation option.
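
Something like this (a sketch only):

	#ifdef APP_USE_TX_PREP	/* application-defined build option */
		nb_prep = rte_eth_tx_prep(port, queue, bufs, nb_rx);
	#else
		nb_prep = nb_rx;
	#endif

so the call and its error handling are enabled or disabled together.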
 
> 
> Jerin
> 


* Re: [dpdk-dev] [PATCH 1/6] ethdev: add Tx preparation
  2016-09-08 16:09     ` Kulasek, TomaszX
@ 2016-09-09  5:58       ` Jerin Jacob
  0 siblings, 0 replies; 261+ messages in thread
From: Jerin Jacob @ 2016-09-09  5:58 UTC (permalink / raw)
  To: Kulasek, TomaszX; +Cc: dev

On Thu, Sep 08, 2016 at 04:09:05PM +0000, Kulasek, TomaszX wrote:
> Hi Jerin,

Hi TomaszX,

> 
> > -----Original Message-----
> > From: Jerin Jacob [mailto:jerin.jacob@caviumnetworks.com]
> > Sent: Thursday, September 8, 2016 09:29
> > To: Kulasek, TomaszX <tomaszx.kulasek@intel.com>
> > Cc: dev@dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH 1/6] ethdev: add Tx preparation
> > 
> 
> [...]
> 
> > > +static inline uint16_t
> > > +rte_eth_tx_prep(uint8_t port_id, uint16_t queue_id, struct rte_mbuf
> > **tx_pkts,
> > > +		uint16_t nb_pkts)
> > > +{
> > > +	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
> > > +
> > > +	if (!dev->tx_pkt_prep) {
> > > +		rte_errno = -ENOTSUP;
> > 
> > rte_errno update may not be necessary here. see below
> > 
> > > +		return 0;
> > IMO, We should return "nb_pkts" here instead of 0(i.e, all the packets are
> > valid in-case PMD does not have tx_prep function) and in-case of "0"
> > the following check in the application also will fail for no reason if
> > (nb_prep < nb_pkts) {
> > 	printf("tx_prep failed\n");
> > }
> > 
> 
> Yes, it seems to be reasonable.
> 
> > 
> > > +	}
> > > +
> > > +#ifdef RTE_LIBRTE_ETHDEV_DEBUG
> > > +	if (queue_id >= dev->data->nb_tx_queues) {
> > > +		RTE_PMD_DEBUG_TRACE("Invalid TX queue_id=%d\n", queue_id);
> > > +		rte_errno = -EINVAL;
> > > +		return 0;
> > > +	}
> > > +#endif
> > > +
> > > +	return (*dev->tx_pkt_prep)(dev->data->tx_queues[queue_id],
> > > +			tx_pkts, nb_pkts);
> > > +}
> > > +
> > 
> > IMO, We need to provide a compile time option for rte_eth_tx_prep as NOOP.
> > Default option should be non NOOP but incase a _target_ want to override
> > to NOOP it should be possible, the reasons is:
> > 
> > - Low-end ARMv7,ARMv8 targets may not have PCIE-RC support and it may have
> > only integrated NIC controller. On those targets, where integrated NIC
> > controller does not use tx_prep service it can made it as NOOP to save
> > cycles on following "rte_eth_tx_prep" and associated "if (unlikely(nb_prep
> > < nb_rx))" checks in the application.
> > 
> > /* Prepare burst of TX packets */
> > nb_prep = rte_eth_tx_prep(fs->rx_port, 0, pkts_burst, nb_rx);
> > 
> > if (unlikely(nb_prep < nb_rx)) {
> >         int i;
> >         for (i = nb_prep; i < nb_rx; i++)
> >                 rte_pktmbuf_free(pkts_burst[i]); }
> > 
> 
> You mean to have a code for NOOP like:
> 
> 
> 	/* Prepare burst of TX packets */
> 	nb_prep = nb_rx; /* rte_eth_tx_prep(fs->rx_port, 0, pkts_burst, nb_rx); */
>  
> 	if (unlikely(nb_prep < nb_rx)) {
>          int i;
>          for (i = nb_prep; i < nb_rx; i++)
>                  rte_pktmbuf_free(pkts_burst[i]); }
> 
> 
> and let optimizer to remove unused parts?

I thought of creating a compile-time NOOP like this:
CONFIG_RTE_LIBRTE_ETHDEV_TXPREP_SUPPORT=y in config/common_base,
and having two flavors of the rte_eth_tx_prep definition:

#ifdef RTE_LIBRTE_ETHDEV_TXPREP_SUPPORT
static inline uint16_t
rte_eth_tx_prep(uint8_t port_id, uint16_t queue_id, struct rte_mbuf
**tx_pkts, uint16_t nb_pkts)
{
	Proposed implementation
}
#else
static inline uint16_t
rte_eth_tx_prep(uint8_t port_id, uint16_t queue_id, struct rte_mbuf
**tx_pkts, uint16_t nb_pkts)
{
	(void)port_id;
	(void)queue_id;
	..
}
#endif

> 
> 
> IMHO it should be an application issue to use tx_prep or not.

In some cases even the _target_ (example: config/defconfig_arm64-*) can decide that.
An example of such a target:
Low-end ARMv7/ARMv8 targets may not have PCIe-RC support and may have only
an integrated NIC controller. On those targets/configs, where the
integrated NIC controller does not use the tx_prep service, it can be made
a NOOP to save cycles on the following "rte_eth_tx_prep" call and the
associated "if (unlikely(nb_prep < nb_rx))" checks in the application.

> 
> While part of the job is done by the driver (verification and preparation), and part by application (error handling), such a global compile time option can introduce inconsistency, if application will not handle both cases.

Each DPDK application is built/compiled against a target/config, so I think
it is OK.

> 
> If someone wants to turn off this functionality, it should be done on application level, e.g. with compilation option.
>  


* [dpdk-dev] [PATCH v2 0/6] add Tx preparation
  2016-08-26 16:22 [dpdk-dev] [PATCH 0/6] add Tx preparation Tomasz Kulasek
                   ` (6 preceding siblings ...)
  2016-08-26 17:31 ` [dpdk-dev] [PATCH 0/6] add Tx preparation Stephen Hemminger
@ 2016-09-12 14:44 ` Tomasz Kulasek
  2016-09-12 14:44   ` [dpdk-dev] [PATCH v2 1/6] ethdev: " Tomasz Kulasek
                     ` (6 more replies)
  7 siblings, 7 replies; 261+ messages in thread
From: Tomasz Kulasek @ 2016-09-12 14:44 UTC (permalink / raw)
  To: dev; +Cc: jerin.jacob

As discussed in that thread:

http://dpdk.org/ml/archives/dev/2015-September/023603.html

Different NIC models, depending on the HW offloads requested, might impose
different requirements on packets to be TX-ed, in terms of:

 - Max number of fragments per packet allowed
 - Max number of fragments per TSO segment
 - The way the pseudo-header checksum should be pre-calculated
 - L3/L4 header field filling
 - etc.


MOTIVATION:
-----------

1) Some work cannot (and should not) be done in rte_eth_tx_burst.
   However, this work is sometimes required, and today it is left
   entirely to the application.

2) Different hardware may have different requirements for TX offloads;
   a different subset may be supported, and so on.

3) Some parameters (e.g. the number of segments in the ixgbe driver) may
   hang the device. These parameters may vary between devices.

   For example i40e HW allows 8 fragments per packet, but that is after
   TSO segmentation. While ixgbe has a 38-fragment pre-TSO limit.

4) Fields in the packet may require different initialization (e.g.
   pseudo-header checksum precalculation, sometimes done differently
   depending on the packet type, and so on). Today the application
   has to take care of all this.

5) Using an additional API (rte_eth_tx_prep) before rte_eth_tx_burst
   allows preparing the packet burst in a form acceptable to the
   specific device.

6) Some additional checks may be done in debug mode while keeping the
   tx_burst implementation clean.


PROPOSAL:
---------

To help user to deal with all these varieties we propose to:

1) Introduce rte_eth_tx_prep() function to do necessary preparations of
   packet burst to be safely transmitted on device for desired HW
   offloads (set/reset checksum field according to the hardware
   requirements) and check HW constraints (number of segments per
   packet, etc).

   Since the limitations and requirements may differ between devices,
   this requires extending the rte_eth_dev structure with a new function
   pointer, "tx_pkt_prep", which can be implemented in the driver to
   prepare and verify packets, in a device-specific way, before the
   burst, preventing the application from sending malformed packets.

2) Also, new fields will be introduced in rte_eth_desc_lim:
   nb_seg_max and nb_mtu_seg_max, providing information about the
   maximum number of segments in TSO and non-TSO packets acceptable
   to the device.

   This information lets the application avoid creating packets that
   exceed those limits.


APPLICATION (CASE OF USE):
--------------------------

1) The application should initialize the burst of packets to send and
   set the required tx offload flags and fields, like l2_len, l3_len,
   l4_len, and tso_segsz.

2) The application passes the burst to rte_eth_tx_prep to check the
   conditions required to send the packets through the NIC.

3) The result of rte_eth_tx_prep can be used to send the valid packets
   and/or to restore the invalid ones if the function fails.

e.g.

	for (i = 0; i < nb_pkts; i++) {

		/* initialize or process packet */

		bufs[i]->tso_segsz = 800;
		bufs[i]->ol_flags = PKT_TX_TCP_SEG | PKT_TX_IPV4
				| PKT_TX_IP_CKSUM;
		bufs[i]->l2_len = sizeof(struct ether_hdr);
		bufs[i]->l3_len = sizeof(struct ipv4_hdr);
		bufs[i]->l4_len = sizeof(struct tcp_hdr);
	}

	/* Prepare burst of TX packets */
	nb_prep = rte_eth_tx_prep(port, 0, bufs, nb_pkts);

	if (nb_prep < nb_pkts) {
		printf("tx_prep failed\n");

		/* nb_prep here indicates the first invalid packet.
		 * rte_eth_tx_prep can be re-run on the remaining packets
		 * to find further ones.
		 */

	}

	/* Send burst of TX packets */
	nb_tx = rte_eth_tx_burst(port, 0, bufs, nb_prep);

	/* Free any unsent packets. */


v2 changes:
 - rte_eth_tx_prep() returns the number of packets when the device
   doesn't support the tx_prep functionality,
 - introduced CONFIG_RTE_ETHDEV_TX_PREP, allowing tx_prep to be
   turned off

Tomasz Kulasek (6):
  ethdev: add Tx preparation
  e1000: add Tx preparation
  fm10k: add Tx preparation
  i40e: add Tx preparation
  ixgbe: add Tx preparation
  testpmd: add txprep engine

 app/test-pmd/Makefile            |    3 +-
 app/test-pmd/testpmd.c           |    3 +
 app/test-pmd/testpmd.h           |    4 +-
 app/test-pmd/txprep.c            |  412 ++++++++++++++++++++++++++++++++++++++
 config/common_base               |    1 +
 drivers/net/e1000/e1000_ethdev.h |   11 +
 drivers/net/e1000/em_ethdev.c    |    5 +-
 drivers/net/e1000/em_rxtx.c      |   46 ++++-
 drivers/net/e1000/igb_ethdev.c   |    4 +
 drivers/net/e1000/igb_rxtx.c     |   50 ++++-
 drivers/net/fm10k/fm10k.h        |    9 +
 drivers/net/fm10k/fm10k_ethdev.c |    5 +
 drivers/net/fm10k/fm10k_rxtx.c   |   77 ++++++-
 drivers/net/i40e/i40e_ethdev.c   |    3 +
 drivers/net/i40e/i40e_rxtx.c     |   98 ++++++++-
 drivers/net/i40e/i40e_rxtx.h     |   10 +
 drivers/net/ixgbe/ixgbe_ethdev.c |    3 +
 drivers/net/ixgbe/ixgbe_ethdev.h |    8 +-
 drivers/net/ixgbe/ixgbe_rxtx.c   |   83 +++++++-
 drivers/net/ixgbe/ixgbe_rxtx.h   |    2 +
 lib/librte_ether/rte_ethdev.h    |   85 ++++++++
 lib/librte_mbuf/rte_mbuf.h       |    8 +
 lib/librte_net/Makefile          |    2 +-
 lib/librte_net/rte_pkt.h         |  132 ++++++++++++
 24 files changed, 1054 insertions(+), 10 deletions(-)
 create mode 100644 app/test-pmd/txprep.c
 create mode 100644 lib/librte_net/rte_pkt.h

-- 
1.7.9.5


* [dpdk-dev] [PATCH v2 1/6] ethdev: add Tx preparation
  2016-09-12 14:44 ` [dpdk-dev] [PATCH v2 " Tomasz Kulasek
@ 2016-09-12 14:44   ` Tomasz Kulasek
  2016-09-19 13:03     ` Ananyev, Konstantin
  2016-09-12 14:44   ` [dpdk-dev] [PATCH v2 2/6] e1000: " Tomasz Kulasek
                     ` (5 subsequent siblings)
  6 siblings, 1 reply; 261+ messages in thread
From: Tomasz Kulasek @ 2016-09-12 14:44 UTC (permalink / raw)
  To: dev; +Cc: jerin.jacob

Added API for `rte_eth_tx_prep`

uint16_t rte_eth_tx_prep(uint8_t port_id, uint16_t queue_id,
	struct rte_mbuf **tx_pkts, uint16_t nb_pkts)

Added fields to the `struct rte_eth_desc_lim`:

	uint16_t nb_seg_max;
		/**< Max number of segments per whole packet. */

	uint16_t nb_mtu_seg_max;
		/**< Max number of segments per one MTU */

Created `rte_pkt.h` header with commonly used functions:

int rte_validate_tx_offload(struct rte_mbuf *m)
	to validate general requirements for tx offload set in a packet,
	such as flag completeness. In the current implementation this
	function is called optionally, when RTE_LIBRTE_ETHDEV_DEBUG is
	enabled.

int rte_phdr_cksum_fix(struct rte_mbuf *m)
	to fix the pseudo-header checksum for TSO and non-TSO tcp/udp
	packets before hardware tx checksum offload.
	 - for non-TSO tcp/udp packets the full pseudo-header checksum
	   is computed and set.
	 - for TSO the IP payload length is not included.
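
For reference, the effect for a non-TSO IPv4/TCP packet looks like this
(a sketch using existing rte_ip.h helpers, not a quote from the patch):

	struct ipv4_hdr *ip = rte_pktmbuf_mtod_offset(m,
			struct ipv4_hdr *, m->l2_len);
	struct tcp_hdr *tcp = rte_pktmbuf_mtod_offset(m,
			struct tcp_hdr *, m->l2_len + m->l3_len);

	ip->hdr_checksum = 0;	/* HW fills in the IP checksum */
	/* seed the TCP checksum with the pseudo-header sum; with
	 * PKT_TX_TCP_SEG set, rte_ipv4_phdr_cksum() leaves the payload
	 * length out, since HW recomputes it per TSO segment.
	 */
	tcp->cksum = rte_ipv4_phdr_cksum(ip, m->ol_flags);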

Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
---
 config/common_base            |    1 +
 lib/librte_ether/rte_ethdev.h |   85 ++++++++++++++++++++++++++
 lib/librte_mbuf/rte_mbuf.h    |    8 +++
 lib/librte_net/Makefile       |    2 +-
 lib/librte_net/rte_pkt.h      |  132 +++++++++++++++++++++++++++++++++++++++++
 5 files changed, 227 insertions(+), 1 deletion(-)
 create mode 100644 lib/librte_net/rte_pkt.h

diff --git a/config/common_base b/config/common_base
index 7830535..7ada9e0 100644
--- a/config/common_base
+++ b/config/common_base
@@ -120,6 +120,7 @@ CONFIG_RTE_MAX_QUEUES_PER_PORT=1024
 CONFIG_RTE_LIBRTE_IEEE1588=n
 CONFIG_RTE_ETHDEV_QUEUE_STAT_CNTRS=16
 CONFIG_RTE_ETHDEV_RXTX_CALLBACKS=y
+CONFIG_RTE_ETHDEV_TX_PREP=y
 
 #
 # Support NIC bypass logic
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index b0fe033..4fa674d 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -182,6 +182,7 @@ extern "C" {
 #include <rte_pci.h>
 #include <rte_dev.h>
 #include <rte_devargs.h>
+#include <rte_errno.h>
 #include "rte_ether.h"
 #include "rte_eth_ctrl.h"
 #include "rte_dev_info.h"
@@ -696,6 +697,8 @@ struct rte_eth_desc_lim {
 	uint16_t nb_max;   /**< Max allowed number of descriptors. */
 	uint16_t nb_min;   /**< Min allowed number of descriptors. */
 	uint16_t nb_align; /**< Number of descriptors should be aligned to. */
+	uint16_t nb_seg_max;     /**< Max number of segments per whole packet. */
+	uint16_t nb_mtu_seg_max; /**< Max number of segments per one MTU */
 };
 
 /**
@@ -1181,6 +1184,11 @@ typedef uint16_t (*eth_tx_burst_t)(void *txq,
 				   uint16_t nb_pkts);
 /**< @internal Send output packets on a transmit queue of an Ethernet device. */
 
+typedef uint16_t (*eth_tx_prep_t)(void *txq,
+				   struct rte_mbuf **tx_pkts,
+				   uint16_t nb_pkts);
+/**< @internal Prepare output packets on a transmit queue of an Ethernet device. */
+
 typedef int (*flow_ctrl_get_t)(struct rte_eth_dev *dev,
 			       struct rte_eth_fc_conf *fc_conf);
 /**< @internal Get current flow control parameter on an Ethernet device */
@@ -1626,6 +1634,7 @@ enum rte_eth_dev_type {
 struct rte_eth_dev {
 	eth_rx_burst_t rx_pkt_burst; /**< Pointer to PMD receive function. */
 	eth_tx_burst_t tx_pkt_burst; /**< Pointer to PMD transmit function. */
+	eth_tx_prep_t tx_pkt_prep; /**< Pointer to PMD transmit prepare function. */
 	struct rte_eth_dev_data *data;  /**< Pointer to device data */
 	const struct eth_driver *driver;/**< Driver for this device */
 	const struct eth_dev_ops *dev_ops; /**< Functions exported by PMD */
@@ -2833,6 +2842,82 @@ rte_eth_tx_burst(uint8_t port_id, uint16_t queue_id,
 	return (*dev->tx_pkt_burst)(dev->data->tx_queues[queue_id], tx_pkts, nb_pkts);
 }
 
+/**
+ * Process a burst of output packets on a transmit queue of an Ethernet device.
+ *
+ * The rte_eth_tx_prep() function is invoked to prepare output packets to be
+ * transmitted on the output queue *queue_id* of the Ethernet device designated
+ * by its *port_id*.
+ * The *nb_pkts* parameter is the number of packets to be prepared which are
+ * supplied in the *tx_pkts* array of *rte_mbuf* structures, each of them
+ * allocated from a pool created with rte_pktmbuf_pool_create().
+ * For each packet to send, the rte_eth_tx_prep() function performs
+ * the following operations:
+ *
+ * - Check if the packet meets the device's requirements for tx offloads.
+ *
+ * - Check limitations on the number of segments.
+ *
+ * - Check additional requirements when debug is enabled.
+ *
+ * - Update and/or reset required checksums when tx offload is set for packet.
+ *
+ * The rte_eth_tx_prep() function returns the number of packets ready to be
+ * sent. A return value equal to *nb_pkts* means that all packets are valid and
+ * ready to be sent.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param queue_id
+ *   The index of the transmit queue through which output packets must be
+ *   sent.
+ *   The value must be in the range [0, nb_tx_queue - 1] previously supplied
+ *   to rte_eth_dev_configure().
+ * @param tx_pkts
+ *   The address of an array of *nb_pkts* pointers to *rte_mbuf* structures
+ *   which contain the output packets.
+ * @param nb_pkts
+ *   The maximum number of packets to process.
+ * @return
+ *   The number of packets correct and ready to be sent. The return value
+ *   can be less than the value of the *nb_pkts* parameter when some packet
+ *   doesn't meet the device's requirements; rte_errno is set appropriately.
+ */
+
+#ifdef RTE_ETHDEV_TX_PREP
+
+static inline uint16_t
+rte_eth_tx_prep(uint8_t port_id, uint16_t queue_id, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+
+	if (!dev->tx_pkt_prep)
+		return nb_pkts;
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+	if (queue_id >= dev->data->nb_tx_queues) {
+		RTE_PMD_DEBUG_TRACE("Invalid TX queue_id=%d\n", queue_id);
+		rte_errno = -EINVAL;
+		return 0;
+	}
+#endif
+
+	return (*dev->tx_pkt_prep)(dev->data->tx_queues[queue_id],
+			tx_pkts, nb_pkts);
+}
+
+#else
+
+static inline uint16_t
+rte_eth_tx_prep(uint8_t port_id __rte_unused, uint16_t queue_id __rte_unused,
+		struct rte_mbuf **tx_pkts __rte_unused, uint16_t nb_pkts)
+{
+	return nb_pkts;
+}
+
+#endif
+
 typedef void (*buffer_tx_error_fn)(struct rte_mbuf **unsent, uint16_t count,
 		void *userdata);
 
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index 7ea66ed..72fd352 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -211,6 +211,14 @@ extern "C" {
  */
 #define PKT_TX_OUTER_IPV4   (1ULL << 59)
 
+#define PKT_TX_OFFLOAD_MASK (    \
+		PKT_TX_IP_CKSUM |        \
+		PKT_TX_L4_MASK |         \
+		PKT_TX_OUTER_IP_CKSUM |  \
+		PKT_TX_TCP_SEG |         \
+		PKT_TX_QINQ_PKT |        \
+		PKT_TX_VLAN_PKT)
+
 /**
  * Packet outer header is IPv6. This flag must be set when using any
  * outer offload feature (L4 checksum) to tell the NIC that the outer
diff --git a/lib/librte_net/Makefile b/lib/librte_net/Makefile
index ad2e482..b5abe84 100644
--- a/lib/librte_net/Makefile
+++ b/lib/librte_net/Makefile
@@ -34,7 +34,7 @@ include $(RTE_SDK)/mk/rte.vars.mk
 CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -O3
 
 # install includes
-SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include := rte_ip.h rte_tcp.h rte_udp.h rte_sctp.h rte_icmp.h rte_arp.h
+SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include := rte_ip.h rte_tcp.h rte_udp.h rte_sctp.h rte_icmp.h rte_arp.h rte_pkt.h
 
 
 include $(RTE_SDK)/mk/rte.install.mk
diff --git a/lib/librte_net/rte_pkt.h b/lib/librte_net/rte_pkt.h
new file mode 100644
index 0000000..a3c3e3c
--- /dev/null
+++ b/lib/librte_net/rte_pkt.h
@@ -0,0 +1,132 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2016 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_PKT_H_
+#define _RTE_PKT_H_
+
+#include <rte_ip.h>
+#include <rte_udp.h>
+#include <rte_tcp.h>
+#include <rte_sctp.h>
+
+/**
+ * Validate general requirements for tx offload in packet.
+ */
+static inline int
+rte_validate_tx_offload(struct rte_mbuf *m)
+{
+	uint64_t ol_flags = m->ol_flags;
+
+	/* Does packet set any of available offloads? */
+	if (!(ol_flags & PKT_TX_OFFLOAD_MASK))
+		return 0;
+
+	/* IP checksum can be counted only for IPv4 packet */
+	if ((ol_flags & PKT_TX_IP_CKSUM) && (ol_flags & PKT_TX_IPV6))
+		return -EINVAL;
+
+	if (ol_flags & (PKT_TX_L4_MASK | PKT_TX_TCP_SEG))
+		/* IP type not set */
+		if (!(ol_flags & (PKT_TX_IPV4 | PKT_TX_IPV6)))
+			return -EINVAL;
+
+	if (ol_flags & PKT_TX_TCP_SEG) {
+
+		/* PKT_TX_IP_CKSUM offload not set for IPv4 TSO packet */
+		if ((ol_flags & PKT_TX_IPV4) && !(ol_flags & PKT_TX_IP_CKSUM))
+			return -EINVAL;
+
+		if (m->tso_segsz == 0)
+			return -EINVAL;
+
+	}
+
+	/* PKT_TX_OUTER_IP_CKSUM set for non outer IPv4 packet. */
+	if ((ol_flags & PKT_TX_OUTER_IP_CKSUM) && !(ol_flags & PKT_TX_OUTER_IPV4))
+		return -EINVAL;
+
+	return 0;
+}
+
+/**
+ * Fix pseudo header checksum for TSO and non-TSO tcp/udp packets before
+ * hardware tx checksum.
+ * For non-TSO tcp/udp packets the full pseudo-header checksum is computed and set.
+ * For TSO the IP payload length is not included.
+ */
+static inline int
+rte_phdr_cksum_fix(struct rte_mbuf *m)
+{
+	struct ipv4_hdr *ipv4_hdr;
+	struct ipv6_hdr *ipv6_hdr;
+	struct tcp_hdr *tcp_hdr;
+	struct udp_hdr *udp_hdr;
+
+	if (m->ol_flags & PKT_TX_IPV4) {
+		ipv4_hdr = rte_pktmbuf_mtod_offset(m, struct ipv4_hdr *, m->l2_len);
+
+		if (m->ol_flags & PKT_TX_IP_CKSUM)
+			ipv4_hdr->hdr_checksum = 0;
+
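+		/* the L4 checksum type is a 2-bit field inside
+		 * PKT_TX_L4_MASK, so compare for equality rather than
+		 * testing individual bits.
+		 */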
+		if ((m->ol_flags & PKT_TX_UDP_CKSUM) == PKT_TX_UDP_CKSUM) {
+			/* non-TSO udp */
+			udp_hdr = rte_pktmbuf_mtod_offset(m, struct udp_hdr *, m->l2_len +
+					m->l3_len);
+			udp_hdr->dgram_cksum = rte_ipv4_phdr_cksum(ipv4_hdr, m->ol_flags);
+		} else if ((m->ol_flags & PKT_TX_TCP_CKSUM) ||
+				(m->ol_flags & PKT_TX_TCP_SEG)) {
+			/* non-TSO tcp or TSO */
+			tcp_hdr = rte_pktmbuf_mtod_offset(m, struct tcp_hdr *, m->l2_len +
+					m->l3_len);
+			tcp_hdr->cksum = rte_ipv4_phdr_cksum(ipv4_hdr, m->ol_flags);
+		}
+	} else if (m->ol_flags & PKT_TX_IPV6) {
+		ipv6_hdr = rte_pktmbuf_mtod_offset(m, struct ipv6_hdr *, m->l2_len);
+
+		if ((m->ol_flags & PKT_TX_UDP_CKSUM) == PKT_TX_UDP_CKSUM) {
+			/* non-TSO udp */
+			udp_hdr = rte_pktmbuf_mtod_offset(m, struct udp_hdr *, m->l2_len +
+					m->l3_len);
+			udp_hdr->dgram_cksum = rte_ipv6_phdr_cksum(ipv6_hdr, m->ol_flags);
+		} else if ((m->ol_flags & PKT_TX_TCP_CKSUM) ||
+				(m->ol_flags & PKT_TX_TCP_SEG)) {
+			/* non-TSO tcp or TSO */
+			tcp_hdr = rte_pktmbuf_mtod_offset(m, struct tcp_hdr *, m->l2_len +
+					m->l3_len);
+			tcp_hdr->cksum = rte_ipv6_phdr_cksum(ipv6_hdr, m->ol_flags);
+		}
+	}
+	return 0;
+}
+
+#endif /* _RTE_PKT_H_ */
-- 
1.7.9.5


* [dpdk-dev] [PATCH v2 2/6] e1000: add Tx preparation
  2016-09-12 14:44 ` [dpdk-dev] [PATCH v2 " Tomasz Kulasek
  2016-09-12 14:44   ` [dpdk-dev] [PATCH v2 1/6] ethdev: " Tomasz Kulasek
@ 2016-09-12 14:44   ` Tomasz Kulasek
  2016-09-12 14:44   ` [dpdk-dev] [PATCH v2 3/6] fm10k: " Tomasz Kulasek
                     ` (4 subsequent siblings)
  6 siblings, 0 replies; 261+ messages in thread
From: Tomasz Kulasek @ 2016-09-12 14:44 UTC (permalink / raw)
  To: dev; +Cc: jerin.jacob

Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
---
 drivers/net/e1000/e1000_ethdev.h |   11 +++++++++
 drivers/net/e1000/em_ethdev.c    |    5 +++-
 drivers/net/e1000/em_rxtx.c      |   46 ++++++++++++++++++++++++++++++++++-
 drivers/net/e1000/igb_ethdev.c   |    4 +++
 drivers/net/e1000/igb_rxtx.c     |   50 +++++++++++++++++++++++++++++++++++++-
 5 files changed, 113 insertions(+), 3 deletions(-)

diff --git a/drivers/net/e1000/e1000_ethdev.h b/drivers/net/e1000/e1000_ethdev.h
index 6c25c8d..bd0f277 100644
--- a/drivers/net/e1000/e1000_ethdev.h
+++ b/drivers/net/e1000/e1000_ethdev.h
@@ -138,6 +138,11 @@
 #define E1000_MISC_VEC_ID               RTE_INTR_VEC_ZERO_OFFSET
 #define E1000_RX_VEC_START              RTE_INTR_VEC_RXTX_OFFSET
 
+#define IGB_TX_MAX_SEG     UINT8_MAX
+#define IGB_TX_MAX_MTU_SEG UINT8_MAX
+#define EM_TX_MAX_SEG      UINT8_MAX
+#define EM_TX_MAX_MTU_SEG  UINT8_MAX
+
 /* structure for interrupt relative data */
 struct e1000_interrupt {
 	uint32_t flags;
@@ -315,6 +320,9 @@ void eth_igb_tx_init(struct rte_eth_dev *dev);
 uint16_t eth_igb_xmit_pkts(void *txq, struct rte_mbuf **tx_pkts,
 		uint16_t nb_pkts);
 
+uint16_t eth_igb_prep_pkts(void *txq, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts);
+
 uint16_t eth_igb_recv_pkts(void *rxq, struct rte_mbuf **rx_pkts,
 		uint16_t nb_pkts);
 
@@ -376,6 +384,9 @@ void eth_em_tx_init(struct rte_eth_dev *dev);
 uint16_t eth_em_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		uint16_t nb_pkts);
 
+uint16_t eth_em_prep_pkts(void *txq, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts);
+
 uint16_t eth_em_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
 		uint16_t nb_pkts);
 
diff --git a/drivers/net/e1000/em_ethdev.c b/drivers/net/e1000/em_ethdev.c
index ad104ed..1baf268 100644
--- a/drivers/net/e1000/em_ethdev.c
+++ b/drivers/net/e1000/em_ethdev.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -300,6 +300,7 @@ eth_em_dev_init(struct rte_eth_dev *eth_dev)
 	eth_dev->dev_ops = &eth_em_ops;
 	eth_dev->rx_pkt_burst = (eth_rx_burst_t)&eth_em_recv_pkts;
 	eth_dev->tx_pkt_burst = (eth_tx_burst_t)&eth_em_xmit_pkts;
+	eth_dev->tx_pkt_prep = (eth_tx_prep_t)&eth_em_prep_pkts;
 
 	/* for secondary processes, we don't initialise any further as primary
 	 * has already done this work. Only check we don't need a different
@@ -1073,6 +1074,8 @@ eth_em_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 		.nb_max = E1000_MAX_RING_DESC,
 		.nb_min = E1000_MIN_RING_DESC,
 		.nb_align = EM_TXD_ALIGN,
+		.nb_seg_max = EM_TX_MAX_SEG,
+		.nb_mtu_seg_max = EM_TX_MAX_MTU_SEG,
 	};
 
 	dev_info->speed_capa = ETH_LINK_SPEED_10M_HD | ETH_LINK_SPEED_10M |
diff --git a/drivers/net/e1000/em_rxtx.c b/drivers/net/e1000/em_rxtx.c
index 6d8750a..25ea5b3 100644
--- a/drivers/net/e1000/em_rxtx.c
+++ b/drivers/net/e1000/em_rxtx.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -67,6 +67,7 @@
 #include <rte_udp.h>
 #include <rte_tcp.h>
 #include <rte_sctp.h>
+#include <rte_pkt.h>
 #include <rte_string_fns.h>
 
 #include "e1000_logs.h"
@@ -78,6 +79,11 @@
 
 #define E1000_RXDCTL_GRAN	0x01000000 /* RXDCTL Granularity */
 
+#define E1000_TX_OFFLOAD_MASK ( \
+		PKT_TX_IP_CKSUM |       \
+		PKT_TX_L4_MASK |        \
+		PKT_TX_VLAN_PKT)
+
 /**
  * Structure associated with each descriptor of the RX ring of a RX queue.
  */
@@ -619,6 +625,44 @@ end_of_tx:
 
 /*********************************************************************
  *
+ *  TX prep functions
+ *
+ **********************************************************************/
+uint16_t
+eth_em_prep_pkts(void *tx_queue __rte_unused, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts)
+{
+	int i, ret;
+	struct rte_mbuf *m;
+
+	for (i = 0; i < nb_pkts; i++) {
+		m = tx_pkts[i];
+
+		if ((m->ol_flags & PKT_TX_OFFLOAD_MASK) !=
+				(m->ol_flags & E1000_TX_OFFLOAD_MASK)) {
+			rte_errno = -EINVAL;
+			return i;
+		}
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+		ret = rte_validate_tx_offload(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+#endif
+		ret = rte_phdr_cksum_fix(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+	}
+
+	return i;
+}
+
+/*********************************************************************
+ *
  *  RX functions
  *
  **********************************************************************/
diff --git a/drivers/net/e1000/igb_ethdev.c b/drivers/net/e1000/igb_ethdev.c
index 4e9e6a3..1f7ba4d 100644
--- a/drivers/net/e1000/igb_ethdev.c
+++ b/drivers/net/e1000/igb_ethdev.c
@@ -369,6 +369,8 @@ static const struct rte_eth_desc_lim tx_desc_lim = {
 	.nb_max = E1000_MAX_RING_DESC,
 	.nb_min = E1000_MIN_RING_DESC,
 	.nb_align = IGB_RXD_ALIGN,
+	.nb_seg_max = IGB_TX_MAX_SEG,
+	.nb_mtu_seg_max = IGB_TX_MAX_MTU_SEG,
 };
 
 static const struct eth_dev_ops eth_igb_ops = {
@@ -760,6 +762,7 @@ eth_igb_dev_init(struct rte_eth_dev *eth_dev)
 	eth_dev->dev_ops = &eth_igb_ops;
 	eth_dev->rx_pkt_burst = &eth_igb_recv_pkts;
 	eth_dev->tx_pkt_burst = &eth_igb_xmit_pkts;
+	eth_dev->tx_pkt_prep = &eth_igb_prep_pkts;
 
 	/* for secondary processes, we don't initialise any further as primary
 	 * has already done this work. Only check we don't need a different
@@ -963,6 +966,7 @@ eth_igbvf_dev_init(struct rte_eth_dev *eth_dev)
 	eth_dev->dev_ops = &igbvf_eth_dev_ops;
 	eth_dev->rx_pkt_burst = &eth_igb_recv_pkts;
 	eth_dev->tx_pkt_burst = &eth_igb_xmit_pkts;
+	eth_dev->tx_pkt_prep = &eth_igb_prep_pkts;
 
 	/* for secondary processes, we don't initialise any further as primary
 	 * has already done this work. Only check we don't need a different
diff --git a/drivers/net/e1000/igb_rxtx.c b/drivers/net/e1000/igb_rxtx.c
index 9d80a0b..1593dc2 100644
--- a/drivers/net/e1000/igb_rxtx.c
+++ b/drivers/net/e1000/igb_rxtx.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -66,6 +66,7 @@
 #include <rte_udp.h>
 #include <rte_tcp.h>
 #include <rte_sctp.h>
+#include <rte_pkt.h>
 #include <rte_string_fns.h>
 
 #include "e1000_logs.h"
@@ -617,6 +618,52 @@ eth_igb_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 
 /*********************************************************************
  *
+ *  TX prep functions
+ *
+ **********************************************************************/
+uint16_t
+eth_igb_prep_pkts(void *tx_queue __rte_unused, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts)
+{
+	int i, ret;
+	struct rte_mbuf *m;
+
+	for (i = 0; i < nb_pkts; i++) {
+		m = tx_pkts[i];
+
+		/* Check some limitations for TSO in hardware */
+		if (m->ol_flags & PKT_TX_TCP_SEG)
+			if ((m->tso_segsz > IGB_TSO_MAX_MSS) || (m->l2_len + m->l3_len +
+					m->l4_len > IGB_TSO_MAX_HDRLEN)) {
+				rte_errno = -EINVAL;
+				return i;
+			}
+
+		if ((m->ol_flags & PKT_TX_OFFLOAD_MASK) !=
+				(m->ol_flags & IGB_TX_OFFLOAD_MASK)) {
+			rte_errno = -EINVAL;
+			return i;
+		}
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+		ret = rte_validate_tx_offload(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+#endif
+		ret = rte_phdr_cksum_fix(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+	}
+
+	return i;
+}
+
+/*********************************************************************
+ *
  *  RX functions
  *
  **********************************************************************/
@@ -1363,6 +1410,7 @@ eth_igb_tx_queue_setup(struct rte_eth_dev *dev,
 
 	igb_reset_tx_queue(txq, dev);
 	dev->tx_pkt_burst = eth_igb_xmit_pkts;
+	dev->tx_pkt_prep = &eth_igb_prep_pkts;
 	dev->data->tx_queues[queue_idx] = txq;
 
 	return 0;
-- 
1.7.9.5


* [dpdk-dev] [PATCH v2 3/6] fm10k: add Tx preparation
  2016-09-12 14:44 ` [dpdk-dev] [PATCH v2 " Tomasz Kulasek
  2016-09-12 14:44   ` [dpdk-dev] [PATCH v2 1/6] ethdev: " Tomasz Kulasek
  2016-09-12 14:44   ` [dpdk-dev] [PATCH v2 2/6] e1000: " Tomasz Kulasek
@ 2016-09-12 14:44   ` Tomasz Kulasek
  2016-09-12 14:44   ` [dpdk-dev] [PATCH v2 4/6] i40e: " Tomasz Kulasek
                     ` (3 subsequent siblings)
  6 siblings, 0 replies; 261+ messages in thread
From: Tomasz Kulasek @ 2016-09-12 14:44 UTC (permalink / raw)
  To: dev; +Cc: jerin.jacob

Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
---
 drivers/net/fm10k/fm10k.h        |    9 +++++
 drivers/net/fm10k/fm10k_ethdev.c |    5 +++
 drivers/net/fm10k/fm10k_rxtx.c   |   77 +++++++++++++++++++++++++++++++++++++-
 3 files changed, 90 insertions(+), 1 deletion(-)

diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
index 05aa1a2..83d2bfb 100644
--- a/drivers/net/fm10k/fm10k.h
+++ b/drivers/net/fm10k/fm10k.h
@@ -69,6 +69,9 @@
 #define FM10K_MAX_RX_DESC  (FM10K_MAX_RX_RING_SZ / sizeof(union fm10k_rx_desc))
 #define FM10K_MAX_TX_DESC  (FM10K_MAX_TX_RING_SZ / sizeof(struct fm10k_tx_desc))
 
+#define FM10K_TX_MAX_SEG     UINT8_MAX
+#define FM10K_TX_MAX_MTU_SEG UINT8_MAX
+
 /*
  * byte aligment for HW RX data buffer
  * Datasheet requires RX buffer addresses shall either be 512-byte aligned or
@@ -356,6 +359,12 @@ fm10k_dev_rx_descriptor_done(void *rx_queue, uint16_t offset);
 uint16_t fm10k_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 	uint16_t nb_pkts);
 
+uint16_t fm10k_prep_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
+	uint16_t nb_pkts);
+
+uint16_t fm10k_prep_pkts_simple(void *tx_queue, struct rte_mbuf **tx_pkts,
+	uint16_t nb_pkts);
+
 int fm10k_rxq_vec_setup(struct fm10k_rx_queue *rxq);
 int fm10k_rx_vec_condition_check(struct rte_eth_dev *);
 void fm10k_rx_queue_release_mbufs_vec(struct fm10k_rx_queue *rxq);
diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index 01f4a72..c9f450e 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -1441,6 +1441,8 @@ fm10k_dev_infos_get(struct rte_eth_dev *dev,
 		.nb_max = FM10K_MAX_TX_DESC,
 		.nb_min = FM10K_MIN_TX_DESC,
 		.nb_align = FM10K_MULT_TX_DESC,
+		.nb_seg_max = FM10K_TX_MAX_SEG,
+		.nb_mtu_seg_max = FM10K_TX_MAX_MTU_SEG,
 	};
 
 	dev_info->speed_capa = ETH_LINK_SPEED_1G | ETH_LINK_SPEED_2_5G |
@@ -2749,8 +2751,10 @@ fm10k_set_tx_function(struct rte_eth_dev *dev)
 			fm10k_txq_vec_setup(txq);
 		}
 		dev->tx_pkt_burst = fm10k_xmit_pkts_vec;
+		dev->tx_pkt_prep = fm10k_prep_pkts_simple;
 	} else {
 		dev->tx_pkt_burst = fm10k_xmit_pkts;
+		dev->tx_pkt_prep = fm10k_prep_pkts;
 		PMD_INIT_LOG(DEBUG, "Use regular Tx func");
 	}
 }
@@ -2829,6 +2833,7 @@ eth_fm10k_dev_init(struct rte_eth_dev *dev)
 	dev->dev_ops = &fm10k_eth_dev_ops;
 	dev->rx_pkt_burst = &fm10k_recv_pkts;
 	dev->tx_pkt_burst = &fm10k_xmit_pkts;
+	dev->tx_pkt_prep = &fm10k_prep_pkts;
 
 	/* only initialize in the primary process */
 	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
diff --git a/drivers/net/fm10k/fm10k_rxtx.c b/drivers/net/fm10k/fm10k_rxtx.c
index 5b2d04b..a87b06f 100644
--- a/drivers/net/fm10k/fm10k_rxtx.c
+++ b/drivers/net/fm10k/fm10k_rxtx.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2013-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2013-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -35,6 +35,7 @@
 
 #include <rte_ethdev.h>
 #include <rte_common.h>
+#include <rte_pkt.h>
 #include "fm10k.h"
 #include "base/fm10k_type.h"
 
@@ -65,6 +66,12 @@ static inline void dump_rxd(union fm10k_rx_desc *rxd)
 }
 #endif
 
+#define FM10K_TX_OFFLOAD_MASK (  \
+		PKT_TX_VLAN_PKT |        \
+		PKT_TX_IP_CKSUM |        \
+		PKT_TX_L4_MASK |         \
+		PKT_TX_TCP_SEG)
+
 /* @note: When this function is changed, make corresponding change to
  * fm10k_dev_supported_ptypes_get()
  */
@@ -583,3 +590,71 @@ fm10k_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 
 	return count;
 }
+
+uint16_t
+fm10k_prep_pkts(void *tx_queue __rte_unused, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts)
+{
+	int i, ret;
+	struct rte_mbuf *m;
+
+	for (i = 0; i < nb_pkts; i++) {
+		m = tx_pkts[i];
+
+		if ((m->ol_flags & PKT_TX_TCP_SEG) &&
+				(m->tso_segsz < FM10K_TSO_MINMSS)) {
+			rte_errno = -EINVAL;
+			return i;
+		}
+
+		if ((m->ol_flags & PKT_TX_OFFLOAD_MASK) !=
+				(m->ol_flags & FM10K_TX_OFFLOAD_MASK)) {
+			rte_errno = -EINVAL;
+			return i;
+		}
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+		ret = rte_validate_tx_offload(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+#endif
+		ret = rte_phdr_cksum_fix(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+	}
+
+	return i;
+}
+
+/* fm10k vector TX path doesn't support tx offloads */
+uint16_t
+fm10k_prep_pkts_simple(void *tx_queue __rte_unused, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts)
+{
+	int i;
+	struct rte_mbuf *m;
+	uint64_t ol_flags;
+
+	for (i = 0; i < nb_pkts; i++) {
+		m = tx_pkts[i];
+		ol_flags = m->ol_flags;
+
+		/* simple tx path doesn't support multi-segments */
+		if (m->nb_segs != 1) {
+			rte_errno = -EINVAL;
+			return i;
+		}
+
+		/* For simple path no tx offloads are supported */
+		if (ol_flags & PKT_TX_OFFLOAD_MASK) {
+			rte_errno = -EINVAL;
+			return i;
+		}
+	}
+
+	return i;
+}
-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 261+ messages in thread

* [dpdk-dev] [PATCH v2 4/6] i40e: add Tx preparation
  2016-09-12 14:44 ` [dpdk-dev] [PATCH v2 " Tomasz Kulasek
                     ` (2 preceding siblings ...)
  2016-09-12 14:44   ` [dpdk-dev] [PATCH v2 3/6] fm10k: " Tomasz Kulasek
@ 2016-09-12 14:44   ` Tomasz Kulasek
  2016-09-12 14:44   ` [dpdk-dev] [PATCH v2 5/6] ixgbe: " Tomasz Kulasek
                     ` (2 subsequent siblings)
  6 siblings, 0 replies; 261+ messages in thread
From: Tomasz Kulasek @ 2016-09-12 14:44 UTC (permalink / raw)
  To: dev; +Cc: jerin.jacob

Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
---
 drivers/net/i40e/i40e_ethdev.c |    3 ++
 drivers/net/i40e/i40e_rxtx.c   |   98 +++++++++++++++++++++++++++++++++++++++-
 drivers/net/i40e/i40e_rxtx.h   |   10 ++++
 3 files changed, 110 insertions(+), 1 deletion(-)

diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
index d0aeb70..52a2abb 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -948,6 +948,7 @@ eth_i40e_dev_init(struct rte_eth_dev *dev)
 	dev->dev_ops = &i40e_eth_dev_ops;
 	dev->rx_pkt_burst = i40e_recv_pkts;
 	dev->tx_pkt_burst = i40e_xmit_pkts;
+	dev->tx_pkt_prep = i40e_prep_pkts;
 
 	/* for secondary processes, we don't initialise any further as primary
 	 * has already done this work. Only check we don't need a different
@@ -2614,6 +2615,8 @@ i40e_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 		.nb_max = I40E_MAX_RING_DESC,
 		.nb_min = I40E_MIN_RING_DESC,
 		.nb_align = I40E_ALIGN_RING_DESC,
+		.nb_seg_max = I40E_TX_MAX_SEG,
+		.nb_mtu_seg_max = I40E_TX_MAX_MTU_SEG,
 	};
 
 	if (pf->flags & I40E_FLAG_VMDQ) {
diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
index 554d167..1cd34f7 100644
--- a/drivers/net/i40e/i40e_rxtx.c
+++ b/drivers/net/i40e/i40e_rxtx.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -50,6 +50,8 @@
 #include <rte_tcp.h>
 #include <rte_sctp.h>
 #include <rte_udp.h>
+#include <rte_ip.h>
+#include <rte_pkt.h>
 
 #include "i40e_logs.h"
 #include "base/i40e_prototype.h"
@@ -79,6 +81,14 @@
 		PKT_TX_TCP_SEG |		 \
 		PKT_TX_OUTER_IP_CKSUM)
 
+#define I40E_TX_OFFLOAD_MASK (  \
+		PKT_TX_IP_CKSUM |       \
+		PKT_TX_L4_MASK |        \
+		PKT_TX_OUTER_IP_CKSUM | \
+		PKT_TX_TCP_SEG |        \
+		PKT_TX_QINQ_PKT |       \
+		PKT_TX_VLAN_PKT)
+
 static uint16_t i40e_xmit_pkts_simple(void *tx_queue,
 				      struct rte_mbuf **tx_pkts,
 				      uint16_t nb_pkts);
@@ -1930,6 +1940,90 @@ i40e_xmit_pkts_simple(void *tx_queue,
 	return nb_tx;
 }
 
+/*********************************************************************
+ *
+ *  TX prep functions
+ *
+ **********************************************************************/
+uint16_t
+i40e_prep_pkts(void *tx_queue __rte_unused, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts)
+{
+	int i, ret;
+	uint64_t ol_flags;
+	struct rte_mbuf *m;
+
+	for (i = 0; i < nb_pkts; i++) {
+		m = tx_pkts[i];
+		ol_flags = m->ol_flags;
+
+		/**
+		 * m->nb_segs is uint8_t, so it can never exceed
+		 * I40E_TX_MAX_SEG (UINT8_MAX); only the non-TSO limit
+		 * I40E_TX_MAX_MTU_SEG needs an explicit check.
+		 */
+		if (!(ol_flags & PKT_TX_TCP_SEG)) {
+			if (m->nb_segs > I40E_TX_MAX_MTU_SEG) {
+				rte_errno = -EINVAL;
+				return i;
+			}
+		} else if ((m->tso_segsz < I40E_MIN_TSO_MSS) ||
+				(m->tso_segsz > I40E_MAX_TSO_MSS)) {
+			/* MSS outside the range (256B - 9674B) is considered malicious */
+			rte_errno = -EINVAL;
+			return i;
+		}
+
+		if ((ol_flags & PKT_TX_OFFLOAD_MASK) !=
+				(ol_flags & I40E_TX_OFFLOAD_MASK)) {
+			rte_errno = -EINVAL;
+			return i;
+		}
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+		ret = rte_validate_tx_offload(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+#endif
+		ret = rte_phdr_cksum_fix(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+	}
+	return i;
+}
+
+/* i40e simple path doesn't support tx offloads */
+uint16_t
+i40e_prep_pkts_simple(void *tx_queue __rte_unused, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts)
+{
+	int i;
+	uint64_t ol_flags;
+	struct rte_mbuf *m;
+
+	for (i = 0; i < nb_pkts; i++) {
+		m = tx_pkts[i];
+		ol_flags = m->ol_flags;
+
+		/* simple path supports at most I40E_TX_MAX_MTU_SEG segments */
+		if (m->nb_segs > I40E_TX_MAX_MTU_SEG) {
+			rte_errno = -EINVAL;
+			return i;
+		}
+
+		/* For simple path (simple and vector) no tx offloads are supported */
+		if (ol_flags & PKT_TX_OFFLOAD_MASK) {
+			rte_errno = -EINVAL;
+			return i;
+		}
+	}
+
+	return i;
+}
+
 /*
  * Find the VSI the queue belongs to. 'queue_idx' is the queue index
  * application used, which assume having sequential ones. But from driver's
@@ -3271,9 +3365,11 @@ i40e_set_tx_function(struct rte_eth_dev *dev)
 			PMD_INIT_LOG(DEBUG, "Simple tx finally be used.");
 			dev->tx_pkt_burst = i40e_xmit_pkts_simple;
 		}
+		dev->tx_pkt_prep = i40e_prep_pkts_simple;
 	} else {
 		PMD_INIT_LOG(DEBUG, "Xmit tx finally be used.");
 		dev->tx_pkt_burst = i40e_xmit_pkts;
+		dev->tx_pkt_prep = i40e_prep_pkts;
 	}
 }
 
diff --git a/drivers/net/i40e/i40e_rxtx.h b/drivers/net/i40e/i40e_rxtx.h
index 98179f0..2ff7862 100644
--- a/drivers/net/i40e/i40e_rxtx.h
+++ b/drivers/net/i40e/i40e_rxtx.h
@@ -63,6 +63,12 @@
 #define	I40E_MIN_RING_DESC	64
 #define	I40E_MAX_RING_DESC	4096
 
+#define I40E_MIN_TSO_MSS          256
+#define I40E_MAX_TSO_MSS          9674
+
+#define I40E_TX_MAX_SEG     UINT8_MAX
+#define I40E_TX_MAX_MTU_SEG 8
+
 #undef container_of
 #define container_of(ptr, type, member) ({ \
 		typeof(((type *)0)->member)(*__mptr) = (ptr); \
@@ -223,6 +229,10 @@ uint16_t i40e_recv_scattered_pkts(void *rx_queue,
 uint16_t i40e_xmit_pkts(void *tx_queue,
 			struct rte_mbuf **tx_pkts,
 			uint16_t nb_pkts);
+uint16_t i40e_prep_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts);
+uint16_t i40e_prep_pkts_simple(void *tx_queue, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts);
 int i40e_tx_queue_init(struct i40e_tx_queue *txq);
 int i40e_rx_queue_init(struct i40e_rx_queue *rxq);
 void i40e_free_tx_resources(struct i40e_tx_queue *txq);
-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 261+ messages in thread

* [dpdk-dev] [PATCH v2 5/6] ixgbe: add Tx preparation
  2016-09-12 14:44 ` [dpdk-dev] [PATCH v2 " Tomasz Kulasek
                     ` (3 preceding siblings ...)
  2016-09-12 14:44   ` [dpdk-dev] [PATCH v2 4/6] i40e: " Tomasz Kulasek
@ 2016-09-12 14:44   ` Tomasz Kulasek
  2016-09-19 12:54     ` Ananyev, Konstantin
  2016-09-12 14:44   ` [dpdk-dev] [PATCH v2 6/6] testpmd: add txprep engine Tomasz Kulasek
  2016-09-28 11:10   ` [dpdk-dev] [PATCH v3 0/6] add Tx preparation Tomasz Kulasek
  6 siblings, 1 reply; 261+ messages in thread
From: Tomasz Kulasek @ 2016-09-12 14:44 UTC (permalink / raw)
  To: dev; +Cc: jerin.jacob

Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
---
 drivers/net/ixgbe/ixgbe_ethdev.c |    3 ++
 drivers/net/ixgbe/ixgbe_ethdev.h |    8 +++-
 drivers/net/ixgbe/ixgbe_rxtx.c   |   83 +++++++++++++++++++++++++++++++++++++-
 drivers/net/ixgbe/ixgbe_rxtx.h   |    2 +
 4 files changed, 94 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
index fb618ef..1509979 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -515,6 +515,8 @@ static const struct rte_eth_desc_lim tx_desc_lim = {
 	.nb_max = IXGBE_MAX_RING_DESC,
 	.nb_min = IXGBE_MIN_RING_DESC,
 	.nb_align = IXGBE_TXD_ALIGN,
+	.nb_seg_max = IXGBE_TX_MAX_SEG,
+	.nb_mtu_seg_max = IXGBE_TX_MAX_SEG,
 };
 
 static const struct eth_dev_ops ixgbe_eth_dev_ops = {
@@ -1101,6 +1103,7 @@ eth_ixgbe_dev_init(struct rte_eth_dev *eth_dev)
 	eth_dev->dev_ops = &ixgbe_eth_dev_ops;
 	eth_dev->rx_pkt_burst = &ixgbe_recv_pkts;
 	eth_dev->tx_pkt_burst = &ixgbe_xmit_pkts;
+	eth_dev->tx_pkt_prep = &ixgbe_prep_pkts;
 
 	/*
 	 * For secondary processes, we don't initialise any further as primary
diff --git a/drivers/net/ixgbe/ixgbe_ethdev.h b/drivers/net/ixgbe/ixgbe_ethdev.h
index 4ff6338..09d96de 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.h
+++ b/drivers/net/ixgbe/ixgbe_ethdev.h
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -396,6 +396,12 @@ uint16_t ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 uint16_t ixgbe_xmit_pkts_simple(void *tx_queue, struct rte_mbuf **tx_pkts,
 		uint16_t nb_pkts);
 
+uint16_t ixgbe_prep_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts);
+
+uint16_t ixgbe_prep_pkts_simple(void *tx_queue, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts);
+
 int ixgbe_dev_rss_hash_update(struct rte_eth_dev *dev,
 			      struct rte_eth_rss_conf *rss_conf);
 
diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
index 8a306b0..87defa0 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   Copyright 2014 6WIND S.A.
  *   All rights reserved.
  *
@@ -71,6 +71,7 @@
 #include <rte_string_fns.h>
 #include <rte_errno.h>
 #include <rte_ip.h>
+#include <rte_pkt.h>
 
 #include "ixgbe_logs.h"
 #include "base/ixgbe_api.h"
@@ -906,6 +907,84 @@ end_of_tx:
 
 /*********************************************************************
  *
+ *  TX prep functions
+ *
+ **********************************************************************/
+uint16_t
+ixgbe_prep_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
+{
+	int i, ret;
+	struct rte_mbuf *m;
+	struct ixgbe_tx_queue *txq = (struct ixgbe_tx_queue *)tx_queue;
+
+	for (i = 0; i < nb_pkts; i++) {
+		m = tx_pkts[i];
+
+		/**
+		 * Check if packet meets requirements for number of segments
+		 *
+		 * NOTE: for ixgbe it's always (40 - WTHRESH) for both TSO and non-TSO
+		 */
+
+		if (m->nb_segs > IXGBE_TX_MAX_SEG - txq->wthresh) {
+			rte_errno = -EINVAL;
+			return i;
+		}
+
+		if ((m->ol_flags & PKT_TX_OFFLOAD_MASK) !=
+				(m->ol_flags & IXGBE_TX_OFFLOAD_MASK)) {
+			rte_errno = -EINVAL;
+			return i;
+		}
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+		ret = rte_validate_tx_offload(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+#endif
+		ret = rte_phdr_cksum_fix(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+	}
+
+	return i;
+}
+
+/* ixgbe simple and vector TX paths don't support tx offloads */
+uint16_t
+ixgbe_prep_pkts_simple(void *tx_queue __rte_unused, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts)
+{
+	int i;
+	struct rte_mbuf *m;
+	uint64_t ol_flags;
+
+	for (i = 0; i < nb_pkts; i++) {
+		m = tx_pkts[i];
+		ol_flags = m->ol_flags;
+
+		/* simple tx path doesn't support multi-segment packets */
+		if (m->nb_segs != 1) {
+			rte_errno = -EINVAL;
+			return i;
+		}
+
+		/* For simple path (simple and vector) no tx offloads are supported */
+		if (ol_flags & PKT_TX_OFFLOAD_MASK) {
+			rte_errno = -EINVAL;
+			return i;
+		}
+	}
+
+	return i;
+}
+
+/*********************************************************************
+ *
  *  RX functions
  *
  **********************************************************************/
@@ -2290,6 +2369,7 @@ ixgbe_set_tx_function(struct rte_eth_dev *dev, struct ixgbe_tx_queue *txq)
 		} else
 #endif
 		dev->tx_pkt_burst = ixgbe_xmit_pkts_simple;
+		dev->tx_pkt_prep = ixgbe_prep_pkts_simple;
 	} else {
 		PMD_INIT_LOG(DEBUG, "Using full-featured tx code path");
 		PMD_INIT_LOG(DEBUG,
@@ -2301,6 +2381,7 @@ ixgbe_set_tx_function(struct rte_eth_dev *dev, struct ixgbe_tx_queue *txq)
 				(unsigned long)txq->tx_rs_thresh,
 				(unsigned long)RTE_PMD_IXGBE_TX_MAX_BURST);
 		dev->tx_pkt_burst = ixgbe_xmit_pkts;
+		dev->tx_pkt_prep = ixgbe_prep_pkts;
 	}
 }
 
diff --git a/drivers/net/ixgbe/ixgbe_rxtx.h b/drivers/net/ixgbe/ixgbe_rxtx.h
index 2608b36..7bbd9b8 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx.h
+++ b/drivers/net/ixgbe/ixgbe_rxtx.h
@@ -80,6 +80,8 @@
 #define RTE_IXGBE_WAIT_100_US               100
 #define RTE_IXGBE_VMTXSW_REGISTER_COUNT     2
 
+#define IXGBE_TX_MAX_SEG                    40
+
 #define IXGBE_PACKET_TYPE_MASK_82599        0X7F
 #define IXGBE_PACKET_TYPE_MASK_X550         0X10FF
 #define IXGBE_PACKET_TYPE_MASK_TUNNEL       0XFF
-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 261+ messages in thread

* [dpdk-dev] [PATCH v2 6/6] testpmd: add txprep engine
  2016-09-12 14:44 ` [dpdk-dev] [PATCH v2 " Tomasz Kulasek
                     ` (4 preceding siblings ...)
  2016-09-12 14:44   ` [dpdk-dev] [PATCH v2 5/6] ixgbe: " Tomasz Kulasek
@ 2016-09-12 14:44   ` Tomasz Kulasek
  2016-09-19 12:59     ` Ananyev, Konstantin
  2016-09-28 11:10   ` [dpdk-dev] [PATCH v3 0/6] add Tx preparation Tomasz Kulasek
  6 siblings, 1 reply; 261+ messages in thread
From: Tomasz Kulasek @ 2016-09-12 14:44 UTC (permalink / raw)
  To: dev; +Cc: jerin.jacob

This patch adds txprep engine to the testpmd application.

The txprep engine is intended to verify the Tx preparation functionality
implemented in the PMD driver (a usage note follows the change list below).

It's based on the default "io" engine with the following changes:
 - Tx HW offloads are reset in incoming packets,
 - burst is passed to the Tx preparation function before tx burst,
 - added "txsplit" and "tso" functionality for outgoing packets.

Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
---
 app/test-pmd/Makefile  |    3 +-
 app/test-pmd/testpmd.c |    3 +
 app/test-pmd/testpmd.h |    4 +-
 app/test-pmd/txprep.c  |  412 ++++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 420 insertions(+), 2 deletions(-)
 create mode 100644 app/test-pmd/txprep.c

diff --git a/app/test-pmd/Makefile b/app/test-pmd/Makefile
index 2a0b5a5..3f9ad1c 100644
--- a/app/test-pmd/Makefile
+++ b/app/test-pmd/Makefile
@@ -1,6 +1,6 @@
 #   BSD LICENSE
 #
-#   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+#   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
 #   All rights reserved.
 #
 #   Redistribution and use in source and binary forms, with or without
@@ -49,6 +49,7 @@ SRCS-y += parameters.c
 SRCS-$(CONFIG_RTE_LIBRTE_CMDLINE) += cmdline.c
 SRCS-y += config.c
 SRCS-y += iofwd.c
+SRCS-$(CONFIG_RTE_ETHDEV_TX_PREP) += txprep.c
 SRCS-y += macfwd.c
 SRCS-y += macswap.c
 SRCS-y += flowgen.c
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 1428974..9b6c475 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -152,6 +152,9 @@ struct fwd_engine * fwd_engines[] = {
 	&rx_only_engine,
 	&tx_only_engine,
 	&csum_fwd_engine,
+#ifdef RTE_ETHDEV_TX_PREP
+	&txprep_fwd_engine,
+#endif
 	&icmp_echo_engine,
 #ifdef RTE_LIBRTE_IEEE1588
 	&ieee1588_fwd_engine,
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index 2b281cc..f800846 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -239,7 +239,9 @@ extern struct fwd_engine icmp_echo_engine;
 #ifdef RTE_LIBRTE_IEEE1588
 extern struct fwd_engine ieee1588_fwd_engine;
 #endif
-
+#ifdef RTE_ETHDEV_TX_PREP
+extern struct fwd_engine txprep_fwd_engine;
+#endif
 extern struct fwd_engine * fwd_engines[]; /**< NULL terminated array. */
 
 /**
diff --git a/app/test-pmd/txprep.c b/app/test-pmd/txprep.c
new file mode 100644
index 0000000..688927e
--- /dev/null
+++ b/app/test-pmd/txprep.c
@@ -0,0 +1,412 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <stdarg.h>
+#include <stdio.h>
+#include <string.h>
+#include <errno.h>
+#include <stdint.h>
+#include <unistd.h>
+#include <inttypes.h>
+
+#include <sys/queue.h>
+#include <sys/stat.h>
+
+#include <rte_common.h>
+#include <rte_byteorder.h>
+#include <rte_log.h>
+#include <rte_debug.h>
+#include <rte_cycles.h>
+#include <rte_memory.h>
+#include <rte_memzone.h>
+#include <rte_launch.h>
+#include <rte_eal.h>
+#include <rte_per_lcore.h>
+#include <rte_lcore.h>
+#include <rte_atomic.h>
+#include <rte_branch_prediction.h>
+#include <rte_ring.h>
+#include <rte_memory.h>
+#include <rte_memcpy.h>
+#include <rte_mempool.h>
+#include <rte_mbuf.h>
+#include <rte_interrupts.h>
+#include <rte_pci.h>
+#include <rte_ether.h>
+#include <rte_ethdev.h>
+#include <rte_string_fns.h>
+#include <rte_ip.h>
+#include <rte_tcp.h>
+#include <rte_udp.h>
+
+#include "testpmd.h"
+
+/* We cannot use rte_cpu_to_be_16() on a constant in a switch/case */
+#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
+#define _htons(x) ((uint16_t)((((x) & 0x00ffU) << 8) | (((x) & 0xff00U) >> 8)))
+#else
+#define _htons(x) (x)
+#endif
+
+/*
+ * Helper function.
+ * Performs actual copying.
+ * Returns number of segments in the destination mbuf on success,
+ * or negative error code on failure.
+ */
+static int
+mbuf_copy_split(const struct rte_mbuf *ms, struct rte_mbuf *md[],
+	uint16_t seglen[], uint8_t nb_seg)
+{
+	uint32_t dlen, slen, tlen;
+	uint32_t i, len;
+	const struct rte_mbuf *m;
+	const uint8_t *src;
+	uint8_t *dst;
+
+	dlen = 0;
+	slen = 0;
+	tlen = 0;
+
+	dst = NULL;
+	src = NULL;
+
+	m = ms;
+	i = 0;
+	while (ms != NULL && i != nb_seg) {
+
+		if (slen == 0) {
+			slen = rte_pktmbuf_data_len(ms);
+			src = rte_pktmbuf_mtod(ms, const uint8_t *);
+		}
+
+		if (dlen == 0) {
+			dlen = RTE_MIN(seglen[i], slen);
+			md[i]->data_len = dlen;
+			md[i]->next = (i + 1 == nb_seg) ? NULL : md[i + 1];
+			dst = rte_pktmbuf_mtod(md[i], uint8_t *);
+		}
+
+		len = RTE_MIN(slen, dlen);
+		memcpy(dst, src, len);
+		tlen += len;
+		slen -= len;
+		dlen -= len;
+		src += len;
+		dst += len;
+
+		if (slen == 0)
+			ms = ms->next;
+		if (dlen == 0)
+			i++;
+	}
+
+	if (ms != NULL)
+		return -ENOBUFS;
+	else if (tlen != m->pkt_len)
+		return -EINVAL;
+
+	md[0]->nb_segs = nb_seg;
+	md[0]->pkt_len = tlen;
+	md[0]->vlan_tci = m->vlan_tci;
+	md[0]->vlan_tci_outer = m->vlan_tci_outer;
+	md[0]->ol_flags = m->ol_flags;
+	md[0]->tx_offload = m->tx_offload;
+
+	return nb_seg;
+}
+
+/*
+ * Allocate a new mbuf with up to tx_pkt_nb_segs segments.
+ * Copy packet contents and offload information into then new segmented mbuf.
+ */
+static struct rte_mbuf *
+pkt_copy_split(const struct rte_mbuf *pkt)
+{
+	int32_t n, rc;
+	uint32_t i, len, nb_seg;
+	struct rte_mempool *mp;
+	uint16_t seglen[RTE_MAX_SEGS_PER_PKT];
+	struct rte_mbuf *p, *md[RTE_MAX_SEGS_PER_PKT];
+
+	mp = current_fwd_lcore()->mbp;
+
+	if (tx_pkt_split == TX_PKT_SPLIT_RND)
+		nb_seg = random() % tx_pkt_nb_segs + 1;
+	else
+		nb_seg = tx_pkt_nb_segs;
+
+	memcpy(seglen, tx_pkt_seg_lengths, nb_seg * sizeof(seglen[0]));
+
+	/* calculate number of segments to use and their length. */
+	len = 0;
+	for (i = 0; i != nb_seg && len < pkt->pkt_len; i++) {
+		len += seglen[i];
+		md[i] = NULL;
+	}
+
+	n = pkt->pkt_len - len;
+
+	/* update size of the last segment to fit rest of the packet */
+	if (n >= 0) {
+		seglen[i - 1] += n;
+		len += n;
+	}
+
+	nb_seg = i;
+	while (i != 0) {
+		p = rte_pktmbuf_alloc(mp);
+		if (p == NULL) {
+			RTE_LOG(ERR, USER1,
+				"failed to allocate %u-th of %u mbuf "
+				"from mempool: %s\n",
+				nb_seg - i, nb_seg, mp->name);
+			break;
+		}
+
+		md[--i] = p;
+		if (rte_pktmbuf_tailroom(md[i]) < seglen[i]) {
+			RTE_LOG(ERR, USER1, "mempool %s, %u-th segment: "
+				"expected seglen: %u, "
+				"actual mbuf tailroom: %u\n",
+				mp->name, i, seglen[i],
+				rte_pktmbuf_tailroom(md[i]));
+			break;
+		}
+	}
+
+	/* all mbufs successfully allocated, do copy */
+	if (i == 0) {
+		rc = mbuf_copy_split(pkt, md, seglen, nb_seg);
+		if (rc < 0)
+			RTE_LOG(ERR, USER1,
+				"mbuf_copy_split for %p(len=%u, nb_seg=%hhu) "
+				"into %u segments failed with error code: %d\n",
+				pkt, pkt->pkt_len, pkt->nb_segs, nb_seg, rc);
+
+		/* figure out how many mbufs to free. */
+		i = RTE_MAX(rc, 0);
+	}
+
+	/* free unused mbufs */
+	for (; i != nb_seg; i++) {
+		rte_pktmbuf_free_seg(md[i]);
+		md[i] = NULL;
+	}
+
+	return md[0];
+}
+
+/*
+ * Forwarding of packets in txprep mode.
+ * Parse packet headers, set Tx offload flags and pass the burst to the
+ * Tx preparation function before transmitting.
+ */
+static void
+pkt_burst_txprep_forward(struct fwd_stream *fs)
+{
+	struct rte_mbuf *pkts_burst[MAX_PKT_BURST];
+	struct rte_mbuf *p;
+	struct rte_port *txp;
+	int i;
+	uint16_t nb_rx;
+	uint16_t nb_prep;
+	uint16_t nb_tx;
+#ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES
+	uint64_t start_tsc;
+	uint64_t end_tsc;
+	uint64_t core_cycles;
+#endif
+	uint16_t tso_segsz = 0;
+	uint64_t ol_flags = 0;
+
+	struct ether_hdr *eth_hdr;
+	struct vlan_hdr *vlan_hdr;
+	struct ipv4_hdr *ipv4_hdr;
+	struct ipv6_hdr *ipv6_hdr;
+	struct tcp_hdr *tcp_hdr;
+	char *l3_hdr = NULL;
+
+	uint8_t l4_proto = 0;
+
+#ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES
+	start_tsc = rte_rdtsc();
+#endif
+
+	/*
+	 * Receive a burst of packets and forward them.
+	 */
+	nb_rx = rte_eth_rx_burst(fs->rx_port, fs->rx_queue, pkts_burst,
+			nb_pkt_per_burst);
+	if (unlikely(nb_rx == 0))
+		return;
+
+	txp = &ports[fs->tx_port];
+	tso_segsz = txp->tso_segsz;
+
+	for (i = 0; i < nb_rx; i++) {
+
+		eth_hdr = rte_pktmbuf_mtod(pkts_burst[i], struct ether_hdr *);
+		ether_addr_copy(&peer_eth_addrs[fs->peer_addr],
+				&eth_hdr->d_addr);
+		ether_addr_copy(&ports[fs->tx_port].eth_addr,
+				&eth_hdr->s_addr);
+
+		uint16_t ether_type = eth_hdr->ether_type;
+
+		pkts_burst[i]->l2_len = sizeof(struct ether_hdr);
+
+		ol_flags = 0;
+
+		if (tso_segsz > 0)
+			ol_flags |= PKT_TX_TCP_SEG;
+
+		if (ether_type == _htons(ETHER_TYPE_VLAN)) {
+			ol_flags |= PKT_TX_VLAN_PKT;
+			vlan_hdr = (struct vlan_hdr *)(eth_hdr + 1);
+			pkts_burst[i]->l2_len += sizeof(struct vlan_hdr);
+			ether_type = vlan_hdr->eth_proto;
+		}
+
+		switch (ether_type) {
+		case _htons(ETHER_TYPE_IPv4):
+			ol_flags |= (PKT_TX_IPV4 | PKT_TX_IP_CKSUM);
+			pkts_burst[i]->l3_len = sizeof(struct ipv4_hdr);
+			pkts_burst[i]->l4_len = sizeof(struct tcp_hdr);
+
+			ipv4_hdr = (struct ipv4_hdr *)((char *)eth_hdr +
+					pkts_burst[i]->l2_len);
+			l3_hdr = (char *)ipv4_hdr;
+			pkts_burst[i]->l3_len = (ipv4_hdr->version_ihl & 0x0f) * 4;
+			l4_proto = ipv4_hdr->next_proto_id;
+
+			break;
+		case _htons(ETHER_TYPE_IPv6):
+			ol_flags |= PKT_TX_IPV6;
+
+			ipv6_hdr = (struct ipv6_hdr *)((char *)eth_hdr +
+					pkts_burst[i]->l2_len);
+			l3_hdr = (char *)ipv6_hdr;
+			l4_proto = ipv6_hdr->proto;
+			pkts_burst[i]->l3_len = sizeof(struct ipv6_hdr);
+			break;
+		default:
+			printf("Unknown packet type\n");
+			break;
+		}
+
+		if (l4_proto == IPPROTO_TCP) {
+			ol_flags |= PKT_TX_TCP_CKSUM;
+			tcp_hdr = (struct tcp_hdr *)(l3_hdr + pkts_burst[i]->l3_len);
+			pkts_burst[i]->l4_len = (tcp_hdr->data_off & 0xf0) >> 2;
+		} else if (l4_proto == IPPROTO_UDP) {
+			ol_flags |= PKT_TX_UDP_CKSUM;
+			pkts_burst[i]->l4_len = sizeof(struct udp_hdr);
+		}
+
+		pkts_burst[i]->tso_segsz = tso_segsz;
+		pkts_burst[i]->ol_flags = ol_flags;
+
+		/* Do split & copy for the packet. */
+		if (tx_pkt_split != TX_PKT_SPLIT_OFF) {
+			p = pkt_copy_split(pkts_burst[i]);
+			if (p != NULL) {
+				rte_pktmbuf_free(pkts_burst[i]);
+				pkts_burst[i] = p;
+			}
+		}
+
+		/* if verbose mode is enabled, dump debug info */
+		if (verbose_level > 0) {
+			printf("l2_len=%d, l3_len=%d, l4_len=%d, nb_segs=%d, tso_segz=%d\n",
+					pkts_burst[i]->l2_len, pkts_burst[i]->l3_len,
+					pkts_burst[i]->l4_len, pkts_burst[i]->nb_segs,
+					pkts_burst[i]->tso_segsz);
+		}
+	}
+
+	/*
+	 * Prepare burst to transmit
+	 */
+	nb_prep = rte_eth_tx_prep(fs->tx_port, fs->tx_queue, pkts_burst, nb_rx);
+
+	if (nb_prep < nb_rx)
+		printf("Preparing packet burst to transmit failed: %s\n",
+				rte_strerror(rte_errno));
+
+#ifdef RTE_TEST_PMD_RECORD_BURST_STATS
+	fs->rx_burst_stats.pkt_burst_spread[nb_rx]++;
+#endif
+	fs->rx_packets += nb_rx;
+	nb_tx = rte_eth_tx_burst(fs->tx_port, fs->tx_queue, pkts_burst, nb_prep);
+	fs->tx_packets += nb_tx;
+#ifdef RTE_TEST_PMD_RECORD_BURST_STATS
+	fs->tx_burst_stats.pkt_burst_spread[nb_tx]++;
+#endif
+	if (unlikely(nb_tx < nb_rx)) {
+		fs->fwd_dropped += (nb_rx - nb_tx);
+		do {
+			rte_pktmbuf_free(pkts_burst[nb_tx]);
+		} while (++nb_tx < nb_rx);
+	}
+#ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES
+	end_tsc = rte_rdtsc();
+	core_cycles = (end_tsc - start_tsc);
+	fs->core_cycles = (uint64_t) (fs->core_cycles + core_cycles);
+#endif
+}
+
+static void
+txprep_fwd_begin(portid_t pi)
+{
+	struct rte_eth_dev_info dev_info;
+
+	rte_eth_dev_info_get(pi, &dev_info);
+	printf("  nb_seg_max=%d, nb_mtu_seg_max=%d\n",
+			dev_info.tx_desc_lim.nb_seg_max,
+			dev_info.tx_desc_lim.nb_mtu_seg_max);
+}
+
+static void
+txprep_fwd_end(portid_t pi __rte_unused)
+{
+	printf("txprep_fwd_end\n");
+}
+
+struct fwd_engine txprep_fwd_engine = {
+	.fwd_mode_name  = "txprep",
+	.port_fwd_begin = txprep_fwd_begin,
+	.port_fwd_end   = txprep_fwd_end,
+	.packet_fwd     = pkt_burst_txprep_forward,
+};
-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v2 5/6] ixgbe: add Tx preparation
  2016-09-12 14:44   ` [dpdk-dev] [PATCH v2 5/6] ixgbe: " Tomasz Kulasek
@ 2016-09-19 12:54     ` Ananyev, Konstantin
  2016-09-19 13:58       ` Kulasek, TomaszX
  0 siblings, 1 reply; 261+ messages in thread
From: Ananyev, Konstantin @ 2016-09-19 12:54 UTC (permalink / raw)
  To: Kulasek, TomaszX, dev; +Cc: jerin.jacob


Hi Tomasz,

> ---
>  drivers/net/ixgbe/ixgbe_ethdev.c |    3 ++
>  drivers/net/ixgbe/ixgbe_ethdev.h |    8 +++-
>  drivers/net/ixgbe/ixgbe_rxtx.c   |   83 +++++++++++++++++++++++++++++++++++++-
>  drivers/net/ixgbe/ixgbe_rxtx.h   |    2 +
>  4 files changed, 94 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
> index fb618ef..1509979 100644
> --- a/drivers/net/ixgbe/ixgbe_ethdev.c
> +++ b/drivers/net/ixgbe/ixgbe_ethdev.c
> @@ -515,6 +515,8 @@ static const struct rte_eth_desc_lim tx_desc_lim = {
>  	.nb_max = IXGBE_MAX_RING_DESC,
>  	.nb_min = IXGBE_MIN_RING_DESC,
>  	.nb_align = IXGBE_TXD_ALIGN,
> +	.nb_seg_max = IXGBE_TX_MAX_SEG,
> +	.nb_mtu_seg_max = IXGBE_TX_MAX_SEG,
>  };
> 
>  static const struct eth_dev_ops ixgbe_eth_dev_ops = {
> @@ -1101,6 +1103,7 @@ eth_ixgbe_dev_init(struct rte_eth_dev *eth_dev)
>  	eth_dev->dev_ops = &ixgbe_eth_dev_ops;
>  	eth_dev->rx_pkt_burst = &ixgbe_recv_pkts;
>  	eth_dev->tx_pkt_burst = &ixgbe_xmit_pkts;
> +	eth_dev->tx_pkt_prep = &ixgbe_prep_pkts;
> 
>  	/*
>  	 * For secondary processes, we don't initialise any further as primary
> diff --git a/drivers/net/ixgbe/ixgbe_ethdev.h b/drivers/net/ixgbe/ixgbe_ethdev.h
> index 4ff6338..09d96de 100644
> --- a/drivers/net/ixgbe/ixgbe_ethdev.h
> +++ b/drivers/net/ixgbe/ixgbe_ethdev.h
> @@ -1,7 +1,7 @@
>  /*-
>   *   BSD LICENSE
>   *
> - *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
> + *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
>   *   All rights reserved.
>   *
>   *   Redistribution and use in source and binary forms, with or without
> @@ -396,6 +396,12 @@ uint16_t ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
>  uint16_t ixgbe_xmit_pkts_simple(void *tx_queue, struct rte_mbuf **tx_pkts,
>  		uint16_t nb_pkts);
> 
> +uint16_t ixgbe_prep_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
> +		uint16_t nb_pkts);
> +
> +uint16_t ixgbe_prep_pkts_simple(void *tx_queue, struct rte_mbuf **tx_pkts,
> +		uint16_t nb_pkts);
> +
>  int ixgbe_dev_rss_hash_update(struct rte_eth_dev *dev,
>  			      struct rte_eth_rss_conf *rss_conf);
> 
> diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
> index 8a306b0..87defa0 100644
> --- a/drivers/net/ixgbe/ixgbe_rxtx.c
> +++ b/drivers/net/ixgbe/ixgbe_rxtx.c
> @@ -1,7 +1,7 @@
>  /*-
>   *   BSD LICENSE
>   *
> - *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
> + *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
>   *   Copyright 2014 6WIND S.A.
>   *   All rights reserved.
>   *
> @@ -71,6 +71,7 @@
>  #include <rte_string_fns.h>
>  #include <rte_errno.h>
>  #include <rte_ip.h>
> +#include <rte_pkt.h>
> 
>  #include "ixgbe_logs.h"
>  #include "base/ixgbe_api.h"
> @@ -906,6 +907,84 @@ end_of_tx:
> 
>  /*********************************************************************
>   *
> + *  TX prep functions
> + *
> + **********************************************************************/
> +uint16_t
> +ixgbe_prep_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
> +{
> +	int i, ret;
> +	struct rte_mbuf *m;
> +	struct ixgbe_tx_queue *txq = (struct ixgbe_tx_queue *)tx_queue;
> +
> +	for (i = 0; i < nb_pkts; i++) {
> +		m = tx_pkts[i];
> +
> +		/**
> +		 * Check if packet meets requirements for number of segments
> +		 *
> +		 * NOTE: for ixgbe it's always (40 - WTHRESH) for both TSO and non-TSO
> +		 */
> +
> +		if (m->nb_segs > IXGBE_TX_MAX_SEG - txq->wthresh) {
> +			rte_errno = -EINVAL;
> +			return i;
> +		}
> +
> +		if ((m->ol_flags & PKT_TX_OFFLOAD_MASK) !=
> +				(m->ol_flags & IXGBE_TX_OFFLOAD_MASK)) {


As a nit, it probably makes sense to:
#define IXGBE_TX_OFFLOAD_NOTSUP_MASK (PKT_TX_OFFLOAD_MASK ^ IXGBE_TX_OFFLOAD_MASK)

and then here:
(m->ol_flags & IXGBE_TX_OFFLOAD_NOTSUP_MASK)

Might help to save a few cycles: since IXGBE_TX_OFFLOAD_MASK is a subset
of PKT_TX_OFFLOAD_MASK, the XOR yields exactly the unsupported flag bits,
and the per-packet check becomes a single AND.


> +			rte_errno = -EINVAL;
> +			return i;
> +		}
> +
> +#ifdef RTE_LIBRTE_ETHDEV_DEBUG
> +		ret = rte_validate_tx_offload(m);
> +		if (ret != 0) {
> +			rte_errno = ret;
> +			return i;
> +		}
> +#endif
> +		ret = rte_phdr_cksum_fix(m);

We probably need to update rte_phdr_cksum_fix() to take the tx_offload
outer lengths into account when PKT_TX_OUTER_IP_CKSUM is set,
as both ixgbe and i40e can do that offload these days.
Sorry for not spotting that earlier.
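
A minimal sketch of what that update could look like (assuming the inner
L3 header simply starts after outer_l2_len + outer_l3_len whenever the
outer IP checksum flag is set; the exact shape is of course up to you):

	/* inside rte_phdr_cksum_fix(), before locating the inner L3 header */
	uint64_t inner_l3_offset = m->l2_len;

	if (m->ol_flags & PKT_TX_OUTER_IP_CKSUM)
		inner_l3_offset += m->outer_l2_len + m->outer_l3_len;

	/* then read the IPv4/IPv6 header at inner_l3_offset instead of
	 * m->l2_len when computing the pseudo-header checksum */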


> +		if (ret != 0) {
> +			rte_errno = ret;
> +			return i;
> +		}
> +	}
> +
> +	return i;
> +}
> +
> +/* ixgbe simple and vector TX paths don't support tx offloads */
> +uint16_t
> +ixgbe_prep_pkts_simple(void *tx_queue __rte_unused, struct rte_mbuf **tx_pkts,
> +		uint16_t nb_pkts)
> +{
> +	int i;
> +	struct rte_mbuf *m;
> +	uint64_t ol_flags;
> +
> +	for (i = 0; i < nb_pkts; i++) {
> +		m = tx_pkts[i];
> +		ol_flags = m->ol_flags;
> +
> +		/* simple tx path doesn't support multi-segment packets */
> +		if (m->nb_segs != 1) {
> +			rte_errno = -EINVAL;
> +			return i;
> +		}
> +
> +		/* For simple path (simple and vector) no tx offloads are supported */
> +		if (ol_flags & PKT_TX_OFFLOAD_MASK) {
> +			rte_errno = -EINVAL;
> +			return i;
> +		}
> +	}
> +
> +	return i;
> +}
> +
> +/*********************************************************************
> + *
>   *  RX functions
>   *
>   **********************************************************************/
> @@ -2290,6 +2369,7 @@ ixgbe_set_tx_function(struct rte_eth_dev *dev, struct ixgbe_tx_queue *txq)
>  		} else
>  #endif
>  		dev->tx_pkt_burst = ixgbe_xmit_pkts_simple;
> +		dev->tx_pkt_prep = ixgbe_prep_pkts_simple;

Shouldn't we set up ixgbe_prep_pkts_simple when vector TX is selected too?
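
Something like this in the RTE_IXGBE_INC_VECTOR branch elided from the
quoted hunk (a sketch only; ixgbe_xmit_pkts_vec is the existing vector
transmit function):

	dev->tx_pkt_burst = ixgbe_xmit_pkts_vec;
	dev->tx_pkt_prep = ixgbe_prep_pkts_simple;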

>  	} else {
>  		PMD_INIT_LOG(DEBUG, "Using full-featured tx code path");
>  		PMD_INIT_LOG(DEBUG,
> @@ -2301,6 +2381,7 @@ ixgbe_set_tx_function(struct rte_eth_dev *dev, struct ixgbe_tx_queue *txq)
>  				(unsigned long)txq->tx_rs_thresh,
>  				(unsigned long)RTE_PMD_IXGBE_TX_MAX_BURST);
>  		dev->tx_pkt_burst = ixgbe_xmit_pkts;
> +		dev->tx_pkt_prep = ixgbe_prep_pkts;
>  	}
>  }
> 
> diff --git a/drivers/net/ixgbe/ixgbe_rxtx.h b/drivers/net/ixgbe/ixgbe_rxtx.h index 2608b36..7bbd9b8 100644
> --- a/drivers/net/ixgbe/ixgbe_rxtx.h
> +++ b/drivers/net/ixgbe/ixgbe_rxtx.h
> @@ -80,6 +80,8 @@
>  #define RTE_IXGBE_WAIT_100_US               100
>  #define RTE_IXGBE_VMTXSW_REGISTER_COUNT     2
> 
> +#define IXGBE_TX_MAX_SEG                    40
> +
>  #define IXGBE_PACKET_TYPE_MASK_82599        0X7F
>  #define IXGBE_PACKET_TYPE_MASK_X550         0X10FF
>  #define IXGBE_PACKET_TYPE_MASK_TUNNEL       0XFF
> --
> 1.7.9.5

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v2 6/6] testpmd: add txprep engine
  2016-09-12 14:44   ` [dpdk-dev] [PATCH v2 6/6] testpmd: add txprep engine Tomasz Kulasek
@ 2016-09-19 12:59     ` Ananyev, Konstantin
  0 siblings, 0 replies; 261+ messages in thread
From: Ananyev, Konstantin @ 2016-09-19 12:59 UTC (permalink / raw)
  To: Kulasek, TomaszX, dev; +Cc: jerin.jacob



> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Tomasz Kulasek
> Sent: Monday, September 12, 2016 3:45 PM
> To: dev@dpdk.org
> Cc: jerin.jacob@caviumnetworks.com
> Subject: [dpdk-dev] [PATCH v2 6/6] testpmd: add txprep engine
> 
> This patch adds txprep engine to the testpmd application.
> 
> The txprep engine is intended to verify the Tx preparation functionality
> implemented in the PMD driver.
> 
> It's based on the default "io" engine with the following changes:
>  - Tx HW offloads are reset in incoming packets,
>  - burst is passed to the Tx preparation function before tx burst,
>  - added "txsplit" and "tso" functionality for outgoing packets.

Do we really need a whole new mode with header parsing and packet splitting?
Can't we just modify the testpmd csumonly mode to use tx_prep() instead?
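
For reference, a minimal sketch of that alternative, assuming csumonly's
pkt_burst_checksum_forward() keeps its existing pkts_burst/nb_rx variables:

	nb_prep = rte_eth_tx_prep(fs->tx_port, fs->tx_queue,
			pkts_burst, nb_rx);
	nb_tx = rte_eth_tx_burst(fs->tx_port, fs->tx_queue,
			pkts_burst, nb_prep);
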
Konstantin

> 
> Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
> ---
>  app/test-pmd/Makefile  |    3 +-
>  app/test-pmd/testpmd.c |    3 +
>  app/test-pmd/testpmd.h |    4 +-
>  app/test-pmd/txprep.c  |  412 ++++++++++++++++++++++++++++++++++++++++++++++++
>  4 files changed, 420 insertions(+), 2 deletions(-)
>  create mode 100644 app/test-pmd/txprep.c
> 
> diff --git a/app/test-pmd/Makefile b/app/test-pmd/Makefile
> index 2a0b5a5..3f9ad1c 100644
> --- a/app/test-pmd/Makefile
> +++ b/app/test-pmd/Makefile
> @@ -1,6 +1,6 @@
>  #   BSD LICENSE
>  #
> -#   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
> +#   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
>  #   All rights reserved.
>  #
>  #   Redistribution and use in source and binary forms, with or without
> @@ -49,6 +49,7 @@ SRCS-y += parameters.c
>  SRCS-$(CONFIG_RTE_LIBRTE_CMDLINE) += cmdline.c
>  SRCS-y += config.c
>  SRCS-y += iofwd.c
> +SRCS-$(CONFIG_RTE_ETHDEV_TX_PREP) += txprep.c
>  SRCS-y += macfwd.c
>  SRCS-y += macswap.c
>  SRCS-y += flowgen.c
> diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
> index 1428974..9b6c475 100644
> --- a/app/test-pmd/testpmd.c
> +++ b/app/test-pmd/testpmd.c
> @@ -152,6 +152,9 @@ struct fwd_engine * fwd_engines[] = {
>  	&rx_only_engine,
>  	&tx_only_engine,
>  	&csum_fwd_engine,
> +#ifdef RTE_ETHDEV_TX_PREP
> +	&txprep_fwd_engine,
> +#endif
>  	&icmp_echo_engine,
>  #ifdef RTE_LIBRTE_IEEE1588
>  	&ieee1588_fwd_engine,
> diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
> index 2b281cc..f800846 100644
> --- a/app/test-pmd/testpmd.h
> +++ b/app/test-pmd/testpmd.h
> @@ -239,7 +239,9 @@ extern struct fwd_engine icmp_echo_engine;
>  #ifdef RTE_LIBRTE_IEEE1588
>  extern struct fwd_engine ieee1588_fwd_engine;
>  #endif
> -
> +#ifdef RTE_ETHDEV_TX_PREP
> +extern struct fwd_engine txprep_fwd_engine;
> +#endif
>  extern struct fwd_engine * fwd_engines[]; /**< NULL terminated array. */
> 
>  /**
> diff --git a/app/test-pmd/txprep.c b/app/test-pmd/txprep.c
> new file mode 100644
> index 0000000..688927e
> --- /dev/null
> +++ b/app/test-pmd/txprep.c
> @@ -0,0 +1,412 @@
> +/*-
> + *   BSD LICENSE
> + *
> + *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
> + *   All rights reserved.
> + *
> + *   Redistribution and use in source and binary forms, with or without
> + *   modification, are permitted provided that the following conditions
> + *   are met:
> + *
> + *     * Redistributions of source code must retain the above copyright
> + *       notice, this list of conditions and the following disclaimer.
> + *     * Redistributions in binary form must reproduce the above copyright
> + *       notice, this list of conditions and the following disclaimer in
> + *       the documentation and/or other materials provided with the
> + *       distribution.
> + *     * Neither the name of Intel Corporation nor the names of its
> + *       contributors may be used to endorse or promote products derived
> + *       from this software without specific prior written permission.
> + *
> + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> + */
> +
> +#include <stdarg.h>
> +#include <stdio.h>
> +#include <string.h>
> +#include <errno.h>
> +#include <stdint.h>
> +#include <unistd.h>
> +#include <inttypes.h>
> +
> +#include <sys/queue.h>
> +#include <sys/stat.h>
> +
> +#include <rte_common.h>
> +#include <rte_byteorder.h>
> +#include <rte_log.h>
> +#include <rte_debug.h>
> +#include <rte_cycles.h>
> +#include <rte_memory.h>
> +#include <rte_memzone.h>
> +#include <rte_launch.h>
> +#include <rte_eal.h>
> +#include <rte_per_lcore.h>
> +#include <rte_lcore.h>
> +#include <rte_atomic.h>
> +#include <rte_branch_prediction.h>
> +#include <rte_ring.h>
> +#include <rte_memory.h>
> +#include <rte_memcpy.h>
> +#include <rte_mempool.h>
> +#include <rte_mbuf.h>
> +#include <rte_interrupts.h>
> +#include <rte_pci.h>
> +#include <rte_ether.h>
> +#include <rte_ethdev.h>
> +#include <rte_string_fns.h>
> +#include <rte_ip.h>
> +#include <rte_tcp.h>
> +#include <rte_udp.h>
> +
> +#include "testpmd.h"
> +
> +/* We cannot use rte_cpu_to_be_16() on a constant in a switch/case */
> +#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
> +#define _htons(x) ((uint16_t)((((x) & 0x00ffU) << 8) | (((x) & 0xff00U) >> 8)))
> +#else
> +#define _htons(x) (x)
> +#endif
> +
> +/*
> + * Helper function.
> + * Performs actual copying.
> + * Returns number of segments in the destination mbuf on success,
> + * or negative error code on failure.
> + */
> +static int
> +mbuf_copy_split(const struct rte_mbuf *ms, struct rte_mbuf *md[],
> +	uint16_t seglen[], uint8_t nb_seg)
> +{
> +	uint32_t dlen, slen, tlen;
> +	uint32_t i, len;
> +	const struct rte_mbuf *m;
> +	const uint8_t *src;
> +	uint8_t *dst;
> +
> +	dlen = 0;
> +	slen = 0;
> +	tlen = 0;
> +
> +	dst = NULL;
> +	src = NULL;
> +
> +	m = ms;
> +	i = 0;
> +	while (ms != NULL && i != nb_seg) {
> +
> +		if (slen == 0) {
> +			slen = rte_pktmbuf_data_len(ms);
> +			src = rte_pktmbuf_mtod(ms, const uint8_t *);
> +		}
> +
> +		if (dlen == 0) {
> +			dlen = RTE_MIN(seglen[i], slen);
> +			md[i]->data_len = dlen;
> +			md[i]->next = (i + 1 == nb_seg) ? NULL : md[i + 1];
> +			dst = rte_pktmbuf_mtod(md[i], uint8_t *);
> +		}
> +
> +		len = RTE_MIN(slen, dlen);
> +		memcpy(dst, src, len);
> +		tlen += len;
> +		slen -= len;
> +		dlen -= len;
> +		src += len;
> +		dst += len;
> +
> +		if (slen == 0)
> +			ms = ms->next;
> +		if (dlen == 0)
> +			i++;
> +	}
> +
> +	if (ms != NULL)
> +		return -ENOBUFS;
> +	else if (tlen != m->pkt_len)
> +		return -EINVAL;
> +
> +	md[0]->nb_segs = nb_seg;
> +	md[0]->pkt_len = tlen;
> +	md[0]->vlan_tci = m->vlan_tci;
> +	md[0]->vlan_tci_outer = m->vlan_tci_outer;
> +	md[0]->ol_flags = m->ol_flags;
> +	md[0]->tx_offload = m->tx_offload;
> +
> +	return nb_seg;
> +}
> +
> +/*
> + * Allocate a new mbuf with up to tx_pkt_nb_segs segments.
> + * Copy packet contents and offload information into the new segmented mbuf.
> + */
> +static struct rte_mbuf *
> +pkt_copy_split(const struct rte_mbuf *pkt)
> +{
> +	int32_t n, rc;
> +	uint32_t i, len, nb_seg;
> +	struct rte_mempool *mp;
> +	uint16_t seglen[RTE_MAX_SEGS_PER_PKT];
> +	struct rte_mbuf *p, *md[RTE_MAX_SEGS_PER_PKT];
> +
> +	mp = current_fwd_lcore()->mbp;
> +
> +	if (tx_pkt_split == TX_PKT_SPLIT_RND)
> +		nb_seg = random() % tx_pkt_nb_segs + 1;
> +	else
> +		nb_seg = tx_pkt_nb_segs;
> +
> +	memcpy(seglen, tx_pkt_seg_lengths, nb_seg * sizeof(seglen[0]));
> +
> +	/* calculate number of segments to use and their length. */
> +	len = 0;
> +	for (i = 0; i != nb_seg && len < pkt->pkt_len; i++) {
> +		len += seglen[i];
> +		md[i] = NULL;
> +	}
> +
> +	n = pkt->pkt_len - len;
> +
> +	/* update size of the last segment to fit rest of the packet */
> +	if (n >= 0) {
> +		seglen[i - 1] += n;
> +		len += n;
> +	}
> +
> +	nb_seg = i;
> +	while (i != 0) {
> +		p = rte_pktmbuf_alloc(mp);
> +		if (p == NULL) {
> +			RTE_LOG(ERR, USER1,
> +				"failed to allocate %u-th of %u mbuf "
> +				"from mempool: %s\n",
> +				nb_seg - i, nb_seg, mp->name);
> +			break;
> +		}
> +
> +		md[--i] = p;
> +		if (rte_pktmbuf_tailroom(md[i]) < seglen[i]) {
> +			RTE_LOG(ERR, USER1, "mempool %s, %u-th segment: "
> +				"expected seglen: %u, "
> +				"actual mbuf tailroom: %u\n",
> +				mp->name, i, seglen[i],
> +				rte_pktmbuf_tailroom(md[i]));
> +			break;
> +		}
> +	}
> +
> +	/* all mbufs successfully allocated, do copy */
> +	if (i == 0) {
> +		rc = mbuf_copy_split(pkt, md, seglen, nb_seg);
> +		if (rc < 0)
> +			RTE_LOG(ERR, USER1,
> +				"mbuf_copy_split for %p(len=%u, nb_seg=%hhu) "
> +				"into %u segments failed with error code: %d\n",
> +				pkt, pkt->pkt_len, pkt->nb_segs, nb_seg, rc);
> +
> +		/* figure out how many mbufs to free. */
> +		i = RTE_MAX(rc, 0);
> +	}
> +
> +	/* free unused mbufs */
> +	for (; i != nb_seg; i++) {
> +		rte_pktmbuf_free_seg(md[i]);
> +		md[i] = NULL;
> +	}
> +
> +	return md[0];
> +}
> +
> +/*
> + * Forwarding of packets in txprep mode.
> + * Parse packet headers, set Tx offload flags and pass the burst to the
> + * Tx preparation function before transmitting.
> + */
> +static void
> +pkt_burst_txprep_forward(struct fwd_stream *fs)
> +{
> +	struct rte_mbuf *pkts_burst[MAX_PKT_BURST];
> +	struct rte_mbuf *p;
> +	struct rte_port *txp;
> +	int i;
> +	uint16_t nb_rx;
> +	uint16_t nb_prep;
> +	uint16_t nb_tx;
> +#ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES
> +	uint64_t start_tsc;
> +	uint64_t end_tsc;
> +	uint64_t core_cycles;
> +#endif
> +	uint16_t tso_segsz = 0;
> +	uint64_t ol_flags = 0;
> +
> +	struct ether_hdr *eth_hdr;
> +	struct vlan_hdr *vlan_hdr;
> +	struct ipv4_hdr *ipv4_hdr;
> +	struct ipv6_hdr *ipv6_hdr;
> +	struct tcp_hdr *tcp_hdr;
> +	char *l3_hdr = NULL;
> +
> +	uint8_t l4_proto = 0;
> +
> +#ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES
> +	start_tsc = rte_rdtsc();
> +#endif
> +
> +	/*
> +	 * Receive a burst of packets and forward them.
> +	 */
> +	nb_rx = rte_eth_rx_burst(fs->rx_port, fs->rx_queue, pkts_burst,
> +			nb_pkt_per_burst);
> +	if (unlikely(nb_rx == 0))
> +		return;
> +
> +	txp = &ports[fs->tx_port];
> +	tso_segsz = txp->tso_segsz;
> +
> +	for (i = 0; i < nb_rx; i++) {
> +
> +		eth_hdr = rte_pktmbuf_mtod(pkts_burst[i], struct ether_hdr *);
> +		ether_addr_copy(&peer_eth_addrs[fs->peer_addr],
> +				&eth_hdr->d_addr);
> +		ether_addr_copy(&ports[fs->tx_port].eth_addr,
> +				&eth_hdr->s_addr);
> +
> +		uint16_t ether_type = eth_hdr->ether_type;
> +
> +		pkts_burst[i]->l2_len = sizeof(struct ether_hdr);
> +
> +		ol_flags = 0;
> +
> +		if (tso_segsz > 0)
> +			ol_flags |= PKT_TX_TCP_SEG;
> +
> +		if (ether_type == _htons(ETHER_TYPE_VLAN)) {
> +			ol_flags |= PKT_TX_VLAN_PKT;
> +			vlan_hdr = (struct vlan_hdr *)(eth_hdr + 1);
> +			pkts_burst[i]->l2_len += sizeof(struct vlan_hdr);
> +			ether_type = vlan_hdr->eth_proto;
> +		}
> +
> +		switch (ether_type) {
> +		case _htons(ETHER_TYPE_IPv4):
> +			ol_flags |= (PKT_TX_IPV4 | PKT_TX_IP_CKSUM);
> +			pkts_burst[i]->l3_len = sizeof(struct ipv4_hdr);
> +			pkts_burst[i]->l4_len = sizeof(struct tcp_hdr);
> +
> +			ipv4_hdr = (struct ipv4_hdr *)((char *)eth_hdr +
> +					pkts_burst[i]->l2_len);
> +			l3_hdr = (char *)ipv4_hdr;
> +			pkts_burst[i]->l3_len = (ipv4_hdr->version_ihl & 0x0f) * 4;
> +			l4_proto = ipv4_hdr->next_proto_id;
> +
> +			break;
> +		case _htons(ETHER_TYPE_IPv6):
> +			ol_flags |= PKT_TX_IPV6;
> +
> +			ipv6_hdr = (struct ipv6_hdr *)((char *)eth_hdr +
> +					pkts_burst[i]->l2_len);
> +			l3_hdr = (char *)ipv6_hdr;
> +			l4_proto = ipv6_hdr->proto;
> +			pkts_burst[i]->l3_len = sizeof(struct ipv6_hdr);
> +			break;
> +		default:
> +			printf("Unknown packet type\n");
> +			break;
> +		}
> +
> +		if (l4_proto == IPPROTO_TCP) {
> +			ol_flags |= PKT_TX_TCP_CKSUM;
> +			tcp_hdr = (struct tcp_hdr *)(l3_hdr + pkts_burst[i]->l3_len);
> +			pkts_burst[i]->l4_len = (tcp_hdr->data_off & 0xf0) >> 2;
> +		} else if (l4_proto == IPPROTO_UDP) {
> +			ol_flags |= PKT_TX_UDP_CKSUM;
> +			pkts_burst[i]->l4_len = sizeof(struct udp_hdr);
> +		}
> +
> +		pkts_burst[i]->tso_segsz = tso_segsz;
> +		pkts_burst[i]->ol_flags = ol_flags;
> +
> +		/* Do split & copy for the packet. */
> +		if (tx_pkt_split != TX_PKT_SPLIT_OFF) {
> +			p = pkt_copy_split(pkts_burst[i]);
> +			if (p != NULL) {
> +				rte_pktmbuf_free(pkts_burst[i]);
> +				pkts_burst[i] = p;
> +			}
> +		}
> +
> +		/* if verbose mode is enabled, dump debug info */
> +		if (verbose_level > 0) {
> +			printf("l2_len=%d, l3_len=%d, l4_len=%d, nb_segs=%d, tso_segz=%d\n",
> +					pkts_burst[i]->l2_len, pkts_burst[i]->l3_len,
> +					pkts_burst[i]->l4_len, pkts_burst[i]->nb_segs,
> +					pkts_burst[i]->tso_segsz);
> +		}
> +	}
> +
> +	/*
> +	 * Prepare burst to transmit
> +	 */
> +	nb_prep = rte_eth_tx_prep(fs->tx_port, fs->tx_queue, pkts_burst, nb_rx);
> +
> +	if (nb_prep < nb_rx)
> +		printf("Preparing packet burst to transmit failed: %s\n",
> +				rte_strerror(rte_errno));
> +
> +#ifdef RTE_TEST_PMD_RECORD_BURST_STATS
> +	fs->rx_burst_stats.pkt_burst_spread[nb_rx]++;
> +#endif
> +	fs->rx_packets += nb_rx;
> +	nb_tx = rte_eth_tx_burst(fs->tx_port, fs->tx_queue, pkts_burst, nb_prep);
> +	fs->tx_packets += nb_tx;
> +#ifdef RTE_TEST_PMD_RECORD_BURST_STATS
> +	fs->tx_burst_stats.pkt_burst_spread[nb_tx]++;
> +#endif
> +	if (unlikely(nb_tx < nb_rx)) {
> +		fs->fwd_dropped += (nb_rx - nb_tx);
> +		do {
> +			rte_pktmbuf_free(pkts_burst[nb_tx]);
> +		} while (++nb_tx < nb_rx);
> +	}
> +#ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES
> +	end_tsc = rte_rdtsc();
> +	core_cycles = (end_tsc - start_tsc);
> +	fs->core_cycles = (uint64_t) (fs->core_cycles + core_cycles);
> +#endif
> +}
> +
> +static void
> +txprep_fwd_begin(portid_t pi)
> +{
> +	struct rte_eth_dev_info dev_info;
> +
> +	rte_eth_dev_info_get(pi, &dev_info);
> +	printf("  nb_seg_max=%d, nb_mtu_seg_max=%d\n",
> +			dev_info.tx_desc_lim.nb_seg_max,
> +			dev_info.tx_desc_lim.nb_mtu_seg_max);
> +}
> +
> +static void
> +txprep_fwd_end(portid_t pi __rte_unused)
> +{
> +	printf("txprep_fwd_end\n");
> +}
> +
> +struct fwd_engine txprep_fwd_engine = {
> +	.fwd_mode_name  = "txprep",
> +	.port_fwd_begin = txprep_fwd_begin,
> +	.port_fwd_end   = txprep_fwd_end,
> +	.packet_fwd     = pkt_burst_txprep_forward,
> +};
> --
> 1.7.9.5

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v2 1/6] ethdev: add Tx preparation
  2016-09-12 14:44   ` [dpdk-dev] [PATCH v2 1/6] ethdev: " Tomasz Kulasek
@ 2016-09-19 13:03     ` Ananyev, Konstantin
  2016-09-19 15:29       ` Kulasek, TomaszX
  0 siblings, 1 reply; 261+ messages in thread
From: Ananyev, Konstantin @ 2016-09-19 13:03 UTC (permalink / raw)
  To: Kulasek, TomaszX, dev; +Cc: jerin.jacob

Hi Tomasz,

> 
> Added API for `rte_eth_tx_prep`
> 
> uint16_t rte_eth_tx_prep(uint8_t port_id, uint16_t queue_id,
> 	struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
> 
> Added fields to the `struct rte_eth_desc_lim`:
> 
> 	uint16_t nb_seg_max;
> 		/**< Max number of segments per whole packet. */
> 
> 	uint16_t nb_mtu_seg_max;
> 		/**< Max number of segments per one MTU */
> 
> Created `rte_pkt.h` header with commonly used functions:
> 
> int rte_validate_tx_offload(struct rte_mbuf *m)
> 	to validate general requirements for tx offload in a packet, such
> 	as flag completeness. In the current implementation this function
> 	is called optionally when RTE_LIBRTE_ETHDEV_DEBUG is enabled.
> 
> int rte_phdr_cksum_fix(struct rte_mbuf *m)
> 	to fix pseudo header checksum for TSO and non-TSO tcp/udp packets
> 	before hardware tx checksum offload.
> 	 - for non-TSO tcp/udp packets the full pseudo-header checksum is
> 	   calculated and set.
> 	 - for TSO the IP payload length is not included.
> 
> Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
> ---
>  config/common_base            |    1 +
>  lib/librte_ether/rte_ethdev.h |   85 ++++++++++++++++++++++++++
>  lib/librte_mbuf/rte_mbuf.h    |    8 +++
>  lib/librte_net/Makefile       |    2 +-
>  lib/librte_net/rte_pkt.h      |  132 +++++++++++++++++++++++++++++++++++++++++
>  5 files changed, 227 insertions(+), 1 deletion(-)
>  create mode 100644 lib/librte_net/rte_pkt.h
> 
> diff --git a/config/common_base b/config/common_base
> index 7830535..7ada9e0 100644
> --- a/config/common_base
> +++ b/config/common_base
> @@ -120,6 +120,7 @@ CONFIG_RTE_MAX_QUEUES_PER_PORT=1024
>  CONFIG_RTE_LIBRTE_IEEE1588=n
>  CONFIG_RTE_ETHDEV_QUEUE_STAT_CNTRS=16
>  CONFIG_RTE_ETHDEV_RXTX_CALLBACKS=y
> +CONFIG_RTE_ETHDEV_TX_PREP=y
> 
>  #
>  # Support NIC bypass logic
> diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
> index b0fe033..4fa674d 100644
> --- a/lib/librte_ether/rte_ethdev.h
> +++ b/lib/librte_ether/rte_ethdev.h
> @@ -182,6 +182,7 @@ extern "C" {
>  #include <rte_pci.h>
>  #include <rte_dev.h>
>  #include <rte_devargs.h>
> +#include <rte_errno.h>
>  #include "rte_ether.h"
>  #include "rte_eth_ctrl.h"
>  #include "rte_dev_info.h"
> @@ -696,6 +697,8 @@ struct rte_eth_desc_lim {
>  	uint16_t nb_max;   /**< Max allowed number of descriptors. */
>  	uint16_t nb_min;   /**< Min allowed number of descriptors. */
>  	uint16_t nb_align; /**< Number of descriptors should be aligned to. */
> +	uint16_t nb_seg_max;     /**< Max number of segments per whole packet. */
> +	uint16_t nb_mtu_seg_max; /**< Max number of segments per one MTU */
>  };
> 
>  /**
> @@ -1181,6 +1184,11 @@ typedef uint16_t (*eth_tx_burst_t)(void *txq,
>  				   uint16_t nb_pkts);
>  /**< @internal Send output packets on a transmit queue of an Ethernet device. */
> 
> +typedef uint16_t (*eth_tx_prep_t)(void *txq,
> +				   struct rte_mbuf **tx_pkts,
> +				   uint16_t nb_pkts);
> +/**< @internal Prepare output packets on a transmit queue of an Ethernet device. */
> +
>  typedef int (*flow_ctrl_get_t)(struct rte_eth_dev *dev,
>  			       struct rte_eth_fc_conf *fc_conf);
>  /**< @internal Get current flow control parameter on an Ethernet device */
> @@ -1626,6 +1634,7 @@ enum rte_eth_dev_type {
>  struct rte_eth_dev {
>  	eth_rx_burst_t rx_pkt_burst; /**< Pointer to PMD receive function. */
>  	eth_tx_burst_t tx_pkt_burst; /**< Pointer to PMD transmit function. */
> +	eth_tx_prep_t tx_pkt_prep; /**< Pointer to PMD transmit prepare function. */
>  	struct rte_eth_dev_data *data;  /**< Pointer to device data */
>  	const struct eth_driver *driver;/**< Driver for this device */
>  	const struct eth_dev_ops *dev_ops; /**< Functions exported by PMD */
> @@ -2833,6 +2842,82 @@ rte_eth_tx_burst(uint8_t port_id, uint16_t queue_id,
>  	return (*dev->tx_pkt_burst)(dev->data->tx_queues[queue_id], tx_pkts, nb_pkts);
>  }
> 
> +/**
> + * Process a burst of output packets on a transmit queue of an Ethernet device.
> + *
> + * The rte_eth_tx_prep() function is invoked to prepare output packets to be
> + * transmitted on the output queue *queue_id* of the Ethernet device designated
> + * by its *port_id*.
> + * The *nb_pkts* parameter is the number of packets to be prepared which are
> + * supplied in the *tx_pkts* array of *rte_mbuf* structures, each of them
> + * allocated from a pool created with rte_pktmbuf_pool_create().
> + * For each packet to send, the rte_eth_tx_prep() function performs
> + * the following operations:
> + *
> + * - Check if packet meets devices requirements for tx offloads.
> + *
> + * - Check limitations about number of segments.
> + *
> + * - Check additional requirements when debug is enabled.
> + *
> + * - Update and/or reset required checksums when tx offload is set for packet.
> + *
> + * The rte_eth_tx_prep() function returns the number of packets ready to be
> + * sent. A return value equal to *nb_pkts* means that all packets are valid and
> + * ready to be sent.
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param queue_id
> + *   The index of the transmit queue through which output packets must be
> + *   sent.
> + *   The value must be in the range [0, nb_tx_queue - 1] previously supplied
> + *   to rte_eth_dev_configure().
> + * @param tx_pkts
> + *   The address of an array of *nb_pkts* pointers to *rte_mbuf* structures
> + *   which contain the output packets.
> + * @param nb_pkts
> + *   The maximum number of packets to process.
> + * @return
> + *   The number of packets correct and ready to be sent. The return value can be
> + *   less than the value of the *tx_pkts* parameter when some packet doesn't
> + *   meet devices requirements with rte_errno set appropriately.
> + */
> +
> +#ifdef RTE_ETHDEV_TX_PREP

Sorry for being a bit late on that discussion, but what's the point of having
that config macro (RTE_ETHDEV_TX_PREP) at all?
As I can see right now, if the driver doesn't set up tx_pkt_prep, then nb_pkts
would be returned anyway...

BTW, here is another question - should it be that way?
Shouldn't we return 0 (and set rte_errno=ENOTSUP) here if dev->tx_pkt_prep == NULL?
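
I.e. something along these lines (just a sketch of the alternative
behaviour, mirroring the suggestion above):

	if (dev->tx_pkt_prep == NULL) {
		rte_errno = ENOTSUP;
		return 0;
	}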

> +
> +static inline uint16_t
> +rte_eth_tx_prep(uint8_t port_id, uint16_t queue_id, struct rte_mbuf **tx_pkts,
> +		uint16_t nb_pkts)
> +{
> +	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
> +
> +	if (!dev->tx_pkt_prep)
> +		return nb_pkts;
> +
> +#ifdef RTE_LIBRTE_ETHDEV_DEBUG
> +	if (queue_id >= dev->data->nb_tx_queues) {
> +		RTE_PMD_DEBUG_TRACE("Invalid TX queue_id=%d\n", queue_id);
> +		rte_errno = -EINVAL;
> +		return 0;
> +	}
> +#endif
> +
> +	return (*dev->tx_pkt_prep)(dev->data->tx_queues[queue_id],
> +			tx_pkts, nb_pkts);
> +}
> +
> +#else
> +
> +static inline uint16_t
> +rte_eth_tx_prep(uint8_t port_id __rte_unused, uint16_t queue_id __rte_unused,
> +		struct rte_mbuf **tx_pkts __rte_unused, uint16_t nb_pkts)
> +{
> +	return nb_pkts;
> +}
> +
> +#endif
> +
>  typedef void (*buffer_tx_error_fn)(struct rte_mbuf **unsent, uint16_t count,
>  		void *userdata);
> 
> diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> index 7ea66ed..72fd352 100644
> --- a/lib/librte_mbuf/rte_mbuf.h
> +++ b/lib/librte_mbuf/rte_mbuf.h
> @@ -211,6 +211,14 @@ extern "C" {
>   */
>  #define PKT_TX_OUTER_IPV4   (1ULL << 59)
> 
> +#define PKT_TX_OFFLOAD_MASK (    \
> +		PKT_TX_IP_CKSUM |        \
> +		PKT_TX_L4_MASK |         \
> +		PKT_TX_OUTER_IP_CKSUM |  \
> +		PKT_TX_TCP_SEG |         \
> +		PKT_TX_QINQ_PKT |        \
> +		PKT_TX_VLAN_PKT)
> +
>  /**
>   * Packet outer header is IPv6. This flag must be set when using any
>   * outer offload feature (L4 checksum) to tell the NIC that the outer
> diff --git a/lib/librte_net/Makefile b/lib/librte_net/Makefile
> index ad2e482..b5abe84 100644
> --- a/lib/librte_net/Makefile
> +++ b/lib/librte_net/Makefile
> @@ -34,7 +34,7 @@ include $(RTE_SDK)/mk/rte.vars.mk
>  CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -O3
> 
>  # install includes
> -SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include := rte_ip.h rte_tcp.h rte_udp.h rte_sctp.h rte_icmp.h rte_arp.h
> +SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include := rte_ip.h rte_tcp.h rte_udp.h rte_sctp.h rte_icmp.h rte_arp.h rte_pkt.h
> 
> 
>  include $(RTE_SDK)/mk/rte.install.mk
> diff --git a/lib/librte_net/rte_pkt.h b/lib/librte_net/rte_pkt.h
> new file mode 100644
> index 0000000..a3c3e3c
> --- /dev/null
> +++ b/lib/librte_net/rte_pkt.h
> @@ -0,0 +1,132 @@
> +/*-
> + *   BSD LICENSE
> + *
> + *   Copyright(c) 2016 Intel Corporation. All rights reserved.
> + *   All rights reserved.
> + *
> + *   Redistribution and use in source and binary forms, with or without
> + *   modification, are permitted provided that the following conditions
> + *   are met:
> + *
> + *     * Redistributions of source code must retain the above copyright
> + *       notice, this list of conditions and the following disclaimer.
> + *     * Redistributions in binary form must reproduce the above copyright
> + *       notice, this list of conditions and the following disclaimer in
> + *       the documentation and/or other materials provided with the
> + *       distribution.
> + *     * Neither the name of Intel Corporation nor the names of its
> + *       contributors may be used to endorse or promote products derived
> + *       from this software without specific prior written permission.
> + *
> + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> + */
> +
> +#ifndef _RTE_PKT_H_
> +#define _RTE_PKT_H_
> +
> +#include <rte_ip.h>
> +#include <rte_udp.h>
> +#include <rte_tcp.h>
> +#include <rte_sctp.h>
> +
> +/**
> + * Validate general requirements for tx offload in packet.
> + */
> +static inline int
> +rte_validate_tx_offload(struct rte_mbuf *m)
> +{
> +	uint64_t ol_flags = m->ol_flags;
> +
> +	/* Does packet set any of available offloads? */
> +	if (!(ol_flags & PKT_TX_OFFLOAD_MASK))
> +		return 0;
> +
> +	/* IP checksum can be counted only for IPv4 packet */
> +	if ((ol_flags & PKT_TX_IP_CKSUM) && (ol_flags & PKT_TX_IPV6))
> +		return -EINVAL;
> +
> +	if (ol_flags & (PKT_TX_L4_MASK | PKT_TX_TCP_SEG))
> +		/* IP type not set */
> +		if (!(ol_flags & (PKT_TX_IPV4 | PKT_TX_IPV6)))
> +			return -EINVAL;
> +
> +	if (ol_flags & PKT_TX_TCP_SEG) {
> +
> +		/* PKT_TX_IP_CKSUM offload not set for IPv4 TSO packet */
> +		if ((ol_flags & PKT_TX_IPV4) && !(ol_flags & PKT_TX_IP_CKSUM))
> +			return -EINVAL;
> +
> +		if (m->tso_segsz == 0)
> +			return -EINVAL;
> +
> +	}


I suppose these two if (ol_flags & PKT_TX_TCP_SEG) checks above could be united into one.
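
One possible way to unite them while keeping the same semantics (a sketch):

	if (ol_flags & PKT_TX_TCP_SEG) {
		/* IP type not set for TSO */
		if (!(ol_flags & (PKT_TX_IPV4 | PKT_TX_IPV6)))
			return -EINVAL;

		/* PKT_TX_IP_CKSUM offload not set for IPv4 TSO packet */
		if ((ol_flags & PKT_TX_IPV4) && !(ol_flags & PKT_TX_IP_CKSUM))
			return -EINVAL;

		if (m->tso_segsz == 0)
			return -EINVAL;
	} else if (ol_flags & PKT_TX_L4_MASK) {
		/* IP type not set for L4 checksum offload */
		if (!(ol_flags & (PKT_TX_IPV4 | PKT_TX_IPV6)))
			return -EINVAL;
	}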

> +
> +	/* PKT_TX_OUTER_IP_CKSUM set for non outer IPv4 packet. */
> +	if ((ol_flags & PKT_TX_OUTER_IP_CKSUM) && !(ol_flags & PKT_TX_OUTER_IPV4))
> +		return -EINVAL;
> +
> +	return 0;
> +}
> +
> +/**
> + * Fix pseudo header checksum for TSO and non-TSO tcp/udp packets before
> + * hardware tx checksum.
> + * For non-TSO tcp/udp packets full pseudo-header checksum is counted and set.
> + * For TSO the IP payload length is not included.
> + */
> +static inline int
> +rte_phdr_cksum_fix(struct rte_mbuf *m)
> +{
> +	struct ipv4_hdr *ipv4_hdr;
> +	struct ipv6_hdr *ipv6_hdr;
> +	struct tcp_hdr *tcp_hdr;
> +	struct udp_hdr *udp_hdr;
> +
> +	if (m->ol_flags & PKT_TX_IPV4) {
> +		ipv4_hdr = rte_pktmbuf_mtod_offset(m, struct ipv4_hdr *, m->l2_len);
> +
> +		if (m->ol_flags & PKT_TX_IP_CKSUM)
> +			ipv4_hdr->hdr_checksum = 0;
> +
> +		if ((m->ol_flags & PKT_TX_UDP_CKSUM) == PKT_TX_UDP_CKSUM) {
> +			/* non-TSO udp */
> +			udp_hdr = rte_pktmbuf_mtod_offset(m, struct udp_hdr *, m->l2_len +
> +					m->l3_len);
> +			udp_hdr->dgram_cksum = rte_ipv4_phdr_cksum(ipv4_hdr, m->ol_flags);
> +		} else if ((m->ol_flags & PKT_TX_TCP_CKSUM) ||
> +				(m->ol_flags & PKT_TX_TCP_SEG)) {
> +			/* non-TSO tcp or TSO */
> +			tcp_hdr = rte_pktmbuf_mtod_offset(m, struct tcp_hdr *, m->l2_len +
> +					m->l3_len);
> +			tcp_hdr->cksum = rte_ipv4_phdr_cksum(ipv4_hdr, m->ol_flags);
> +		}
> +	} else if (m->ol_flags & PKT_TX_IPV6) {
> +		ipv6_hdr = rte_pktmbuf_mtod_offset(m, struct ipv6_hdr *, m->l2_len);
> +
> +		if ((m->ol_flags & PKT_TX_UDP_CKSUM) == PKT_TX_UDP_CKSUM) {
> +			/* non-TSO udp */
> +			udp_hdr = rte_pktmbuf_mtod_offset(m, struct udp_hdr *, m->l2_len +
> +					m->l3_len);
> +			udp_hdr->dgram_cksum = rte_ipv6_phdr_cksum(ipv6_hdr, m->ol_flags);
> +		} else if ((m->ol_flags & PKT_TX_TCP_CKSUM) ||
> +				(m->ol_flags & PKT_TX_TCP_SEG)) {
> +			/* non-TSO tcp or TSO */
> +			tcp_hdr = rte_pktmbuf_mtod_offset(m, struct tcp_hdr *, m->l2_len +
> +					m->l3_len);
> +			tcp_hdr->cksum = rte_ipv6_phdr_cksum(ipv6_hdr, m->ol_flags);
> +		}
> +	}
> +	return 0;
> +}

We probably need to take into account that in some cases outer_l*_len could be set up here
(in the case of tunneled packets).

Konstantin
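
A minimal sketch of how rte_phdr_cksum_fix() might account for the outer
headers (assuming the application fills in outer_l2_len and outer_l3_len
whenever PKT_TX_OUTER_IP_CKSUM is set; the fragment would slot into the
IPv4 branch above):

	uint64_t inner_l3_offset = m->l2_len;

	/* For a tunneled packet the inner IP header does not start right
	 * after l2_len bytes; skip the outer headers first. */
	if (m->ol_flags & PKT_TX_OUTER_IP_CKSUM)
		inner_l3_offset += m->outer_l2_len + m->outer_l3_len;

	ipv4_hdr = rte_pktmbuf_mtod_offset(m, struct ipv4_hdr *,
			inner_l3_offset);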

> +
> +#endif /* _RTE_PKT_H_ */
> --
> 1.7.9.5

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v2 5/6] ixgbe: add Tx preparation
  2016-09-19 12:54     ` Ananyev, Konstantin
@ 2016-09-19 13:58       ` Kulasek, TomaszX
  2016-09-19 15:23         ` Ananyev, Konstantin
  0 siblings, 1 reply; 261+ messages in thread
From: Kulasek, TomaszX @ 2016-09-19 13:58 UTC (permalink / raw)
  To: Ananyev, Konstantin, dev; +Cc: jerin.jacob

Hi Konstantin,

> -----Original Message-----
> From: Ananyev, Konstantin
> Sent: Monday, September 19, 2016 14:55
> To: Kulasek, TomaszX <tomaszx.kulasek@intel.com>; dev@dpdk.org
> Cc: jerin.jacob@caviumnetworks.com
> Subject: RE: [dpdk-dev] [PATCH v2 5/6] ixgbe: add Tx preparation
> 
> 
> Hi Tomasz,
> 

[...]

> > +uint16_t
> > +ixgbe_prep_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t
> > +nb_pkts) {
> > +	int i, ret;
> > +	struct rte_mbuf *m;
> > +	struct ixgbe_tx_queue *txq = (struct ixgbe_tx_queue *)tx_queue;
> > +
> > +	for (i = 0; i < nb_pkts; i++) {
> > +		m = tx_pkts[i];
> > +
> > +		/**
> > +		 * Check if packet meets requirements for number of segments
> > +		 *
> > +		 * NOTE: for ixgbe it's always (40 - WTHRESH) for both TSO and
> non-TSO
> > +		 */
> > +
> > +		if (m->nb_segs > IXGBE_TX_MAX_SEG - txq->wthresh) {
> > +			rte_errno = -EINVAL;
> > +			return i;
> > +		}
> > +
> > +		if ((m->ol_flags & PKT_TX_OFFLOAD_MASK) !=
> > +				(m->ol_flags & IXGBE_TX_OFFLOAD_MASK)) {
> 
> 
> As a nit, it probably makes sense to:
> #define IXGBE_TX_OFFLOAD_NOTSUP_MASK (PKT_TX_OFFLOAD_MASK ^
> IXGBE_TX_OFFLOAD_MASK)
> 
> and then here:
> (m->ol_flags & IXGBE_TX_OFFLOAD_NOTSUP_MASK)
> 
> Might help to save few cycles.
> 

Ok.
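
For reference, the suggestion above boils down to one compile-time constant
and a single AND per packet (a sketch):

	/* computed once at compile time, not per packet */
	#define IXGBE_TX_OFFLOAD_NOTSUP_MASK \
			(PKT_TX_OFFLOAD_MASK ^ IXGBE_TX_OFFLOAD_MASK)

	/* per-packet test inside ixgbe_prep_pkts() */
	if (m->ol_flags & IXGBE_TX_OFFLOAD_NOTSUP_MASK) {
		rte_errno = -EINVAL;
		return i;
	}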

> 
> > +			rte_errno = -EINVAL;
> > +			return i;
> > +		}
> > +
> > +#ifdef RTE_LIBRTE_ETHDEV_DEBUG
> > +		ret = rte_validate_tx_offload(m);
> > +		if (ret != 0) {
> > +			rte_errno = ret;
> > +			return i;
> > +		}
> > +#endif
> > +		ret = rte_phdr_cksum_fix(m);
> 
> We probable need to update rte_phdr_cksum_fix() to take into account
> tx_offload outer lengths in case PKT_TX_OUTER_IP_CKSUM is defined.
> As both ixgbe and i40e can do it these days.
> Sorry for not spotting that earlier.
> 

Ok.

> 
> > +		if (ret != 0) {
> > +			rte_errno = ret;
> > +			return i;
> > +		}
> > +	}
> > +
> > +	return i;
> > +}
> > +

[...]

> >
> > **********************************************************************
> > / @@ -2290,6 +2369,7 @@ ixgbe_set_tx_function(struct rte_eth_dev *dev,
> > struct ixgbe_tx_queue *txq)
> >  		} else
> >  #endif
> >  		dev->tx_pkt_burst = ixgbe_xmit_pkts_simple;
> > +		dev->tx_pkt_prep = ixgbe_prep_pkts_simple;
> 
> Shouldn't we setup ixgbe_prep_pkts_simple when vTX is selected too?
> 


It is, but the source code is formatted as below:

#ifdef RTE_IXGBE_INC_VECTOR
		if (txq->tx_rs_thresh <= RTE_IXGBE_TX_MAX_FREE_BUF_SZ &&
				(rte_eal_process_type() != RTE_PROC_PRIMARY ||
					ixgbe_txq_vec_setup(txq) == 0)) {
			PMD_INIT_LOG(DEBUG, "Vector tx enabled.");
			dev->tx_pkt_burst = ixgbe_xmit_pkts_vec;
		} else
#endif
		dev->tx_pkt_burst = ixgbe_xmit_pkts_simple;
		dev->tx_pkt_prep = ixgbe_prep_pkts_simple;


> >  	} else {
> >  		PMD_INIT_LOG(DEBUG, "Using full-featured tx code path");
> >  		PMD_INIT_LOG(DEBUG,
> > @@ -2301,6 +2381,7 @@ ixgbe_set_tx_function(struct rte_eth_dev *dev,
> struct ixgbe_tx_queue *txq)
> >  				(unsigned long)txq->tx_rs_thresh,
> >  				(unsigned long)RTE_PMD_IXGBE_TX_MAX_BURST);
> >  		dev->tx_pkt_burst = ixgbe_xmit_pkts;
> > +		dev->tx_pkt_prep = ixgbe_prep_pkts;
> >  	}
> >  }
> >
> > diff --git a/drivers/net/ixgbe/ixgbe_rxtx.h
> > b/drivers/net/ixgbe/ixgbe_rxtx.h index 2608b36..7bbd9b8 100644
> > --- a/drivers/net/ixgbe/ixgbe_rxtx.h
> > +++ b/drivers/net/ixgbe/ixgbe_rxtx.h
> > @@ -80,6 +80,8 @@
> >  #define RTE_IXGBE_WAIT_100_US               100
> >  #define RTE_IXGBE_VMTXSW_REGISTER_COUNT     2
> >
> > +#define IXGBE_TX_MAX_SEG                    40
> > +
> >  #define IXGBE_PACKET_TYPE_MASK_82599        0X7F
> >  #define IXGBE_PACKET_TYPE_MASK_X550         0X10FF
> >  #define IXGBE_PACKET_TYPE_MASK_TUNNEL       0XFF
> > --
> > 1.7.9.5

Tomasz.

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v2 5/6] ixgbe: add Tx preparation
  2016-09-19 13:58       ` Kulasek, TomaszX
@ 2016-09-19 15:23         ` Ananyev, Konstantin
  2016-09-20  7:15           ` Ananyev, Konstantin
  0 siblings, 1 reply; 261+ messages in thread
From: Ananyev, Konstantin @ 2016-09-19 15:23 UTC (permalink / raw)
  To: Kulasek, TomaszX, dev; +Cc: jerin.jacob

> 
> [...]
> 
> > >
> > > ********************************************************************
> > > ** / @@ -2290,6 +2369,7 @@ ixgbe_set_tx_function(struct rte_eth_dev
> > > *dev, struct ixgbe_tx_queue *txq)
> > >  		} else
> > >  #endif
> > >  		dev->tx_pkt_burst = ixgbe_xmit_pkts_simple;
> > > +		dev->tx_pkt_prep = ixgbe_prep_pkts_simple;
> >
> > Shouldn't we setup ixgbe_prep_pkts_simple when vTX is selected too?
> >
> 
> 
> It is, but source code is formatted like below:
> 
> #ifdef RTE_IXGBE_INC_VECTOR
> 		if (txq->tx_rs_thresh <= RTE_IXGBE_TX_MAX_FREE_BUF_SZ &&
> 				(rte_eal_process_type() != RTE_PROC_PRIMARY ||
> 					ixgbe_txq_vec_setup(txq) == 0)) {
> 			PMD_INIT_LOG(DEBUG, "Vector tx enabled.");
> 			dev->tx_pkt_burst = ixgbe_xmit_pkts_vec;

Yep, so I thought we need:
dev->tx_pkt_prep = ixgbe_prep_pkts_simple;
here too, no?

Konstantin


> 		} else
> #endif
> 		dev->tx_pkt_burst = ixgbe_xmit_pkts_simple;
> 		dev->tx_pkt_prep = ixgbe_prep_pkts_simple;
> 
> 
> > >  	} else {
> > >  		PMD_INIT_LOG(DEBUG, "Using full-featured tx code path");
> > >  		PMD_INIT_LOG(DEBUG,
> > > @@ -2301,6 +2381,7 @@ ixgbe_set_tx_function(struct rte_eth_dev *dev,
> > struct ixgbe_tx_queue *txq)
> > >  				(unsigned long)txq->tx_rs_thresh,
> > >  				(unsigned long)RTE_PMD_IXGBE_TX_MAX_BURST);
> > >  		dev->tx_pkt_burst = ixgbe_xmit_pkts;
> > > +		dev->tx_pkt_prep = ixgbe_prep_pkts;
> > >  	}
> > >  }
> > >
> > > diff --git a/drivers/net/ixgbe/ixgbe_rxtx.h
> > > b/drivers/net/ixgbe/ixgbe_rxtx.h index 2608b36..7bbd9b8 100644
> > > --- a/drivers/net/ixgbe/ixgbe_rxtx.h
> > > +++ b/drivers/net/ixgbe/ixgbe_rxtx.h
> > > @@ -80,6 +80,8 @@
> > >  #define RTE_IXGBE_WAIT_100_US               100
> > >  #define RTE_IXGBE_VMTXSW_REGISTER_COUNT     2
> > >
> > > +#define IXGBE_TX_MAX_SEG                    40
> > > +
> > >  #define IXGBE_PACKET_TYPE_MASK_82599        0X7F
> > >  #define IXGBE_PACKET_TYPE_MASK_X550         0X10FF
> > >  #define IXGBE_PACKET_TYPE_MASK_TUNNEL       0XFF
> > > --
> > > 1.7.9.5
> 
> Tomasz.

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v2 1/6] ethdev: add Tx preparation
  2016-09-19 13:03     ` Ananyev, Konstantin
@ 2016-09-19 15:29       ` Kulasek, TomaszX
  2016-09-19 16:06         ` Jerin Jacob
  0 siblings, 1 reply; 261+ messages in thread
From: Kulasek, TomaszX @ 2016-09-19 15:29 UTC (permalink / raw)
  To: Ananyev, Konstantin, dev; +Cc: jerin.jacob

Hi Konstantin,

> -----Original Message-----
> From: Ananyev, Konstantin
> Sent: Monday, September 19, 2016 15:03
> To: Kulasek, TomaszX <tomaszx.kulasek@intel.com>; dev@dpdk.org
> Cc: jerin.jacob@caviumnetworks.com
> Subject: RE: [dpdk-dev] [PATCH v2 1/6] ethdev: add Tx preparation
> 
> Hi Tomasz,
> 
> >

[...]

> > +
> > +#ifdef RTE_ETHDEV_TX_PREP
> 
> Sorry for being a bit late on that discussion, but what the point of
> having that config macro (RTE_ETHDEV_TX_PREP ) at all?
> As I can see right now, if driver doesn't setup tx_pkt_prep, then nb_pkts
> would be return anyway...
> 
> BTW, there is my another question - should it be that way?
> Shouldn't we return 0 (and set rte_errno=ENOTSUP) here if dev->tx_pk_prep
> == NULL?
> 

It's an answer to Jerin's request discussed here: http://dpdk.org/ml/archives/dev/2016-September/046437.html

When a driver doesn't support tx_prep, the default behavior is "we don't know the requirements, so we have nothing to do here". It will simplify application logic and improve performance for these drivers, I think. Catching this error on every burst may be problematic.
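
To illustrate, with that default the application can chain the two calls
unconditionally (a sketch; handle_invalid() is a hypothetical application
helper, not part of the API):

	/* If the PMD has no tx_pkt_prep callback, nb_prep == nb_pkts and
	 * the burst goes out unchanged - no per-driver special case. */
	nb_prep = rte_eth_tx_prep(port, 0, bufs, nb_pkts);
	if (nb_prep < nb_pkts)
		handle_invalid(bufs, nb_prep, nb_pkts); /* hypothetical */
	nb_tx = rte_eth_tx_burst(port, 0, bufs, nb_prep);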

As for the RTE_ETHDEV_TX_PREP macro, suggested by Jerin in the same thread, I still don't think it's the best solution to the problem he described. I have added it here for further discussion.

Jerin, have you something to add?

Tomasz.

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v2 1/6] ethdev: add Tx preparation
  2016-09-19 15:29       ` Kulasek, TomaszX
@ 2016-09-19 16:06         ` Jerin Jacob
  2016-09-20  9:06           ` Ananyev, Konstantin
  0 siblings, 1 reply; 261+ messages in thread
From: Jerin Jacob @ 2016-09-19 16:06 UTC (permalink / raw)
  To: Kulasek, TomaszX; +Cc: Ananyev, Konstantin, dev

On Mon, Sep 19, 2016 at 03:29:07PM +0000, Kulasek, TomaszX wrote:
> Hi Konstantin,
> 
> > -----Original Message-----
> > From: Ananyev, Konstantin
> > Sent: Monday, September 19, 2016 15:03
> > To: Kulasek, TomaszX <tomaszx.kulasek@intel.com>; dev@dpdk.org
> > Cc: jerin.jacob@caviumnetworks.com
> > Subject: RE: [dpdk-dev] [PATCH v2 1/6] ethdev: add Tx preparation
> > 
> > Hi Tomasz,
> > 
> > >
> 
> [...]
> 
> > > +
> > > +#ifdef RTE_ETHDEV_TX_PREP
> > 
> > Sorry for being a bit late on that discussion, but what the point of
> > having that config macro (RTE_ETHDEV_TX_PREP ) at all?
> > As I can see right now, if driver doesn't setup tx_pkt_prep, then nb_pkts
> > would be return anyway...
> > 
> > BTW, there is my another question - should it be that way?
> > Shouldn't we return 0 (and set rte_errno=ENOTSUP) here if dev->tx_pk_prep
> > == NULL?
> > 
> 
> It's an answer to the Jerin's request discussed here: http://dpdk.org/ml/archives/dev/2016-September/046437.html
> 
> When driver doesn't support tx_prep, default behavior is "we don't know requirements, so we have nothing to do here". It will simplify application logic and improve performance for these drivers, I think. Catching this error with every burst may be problematic.
> 
> As for RTE_ETHDEV_TX_PREP macro, suggested by Jerin in the same thread, I still don't think It's the best solution of the problem described by him. I have added it here for further discussion.
> 
> Jerin, have you something to add?

Nothing very specific to add here. I think I have tried to share the rationale in
http://dpdk.org/ml/archives/dev/2016-September/046437.html

> 
> Tomasz.

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v2 5/6] ixgbe: add Tx preparation
  2016-09-19 15:23         ` Ananyev, Konstantin
@ 2016-09-20  7:15           ` Ananyev, Konstantin
  0 siblings, 0 replies; 261+ messages in thread
From: Ananyev, Konstantin @ 2016-09-20  7:15 UTC (permalink / raw)
  To: Kulasek, TomaszX, 'dev@dpdk.org'
  Cc: 'jerin.jacob@caviumnetworks.com'



> -----Original Message-----
> From: Ananyev, Konstantin
> Sent: Monday, September 19, 2016 4:24 PM
> To: Kulasek, TomaszX <tomaszx.kulasek@intel.com>; dev@dpdk.org
> Cc: jerin.jacob@caviumnetworks.com
> Subject: RE: [dpdk-dev] [PATCH v2 5/6] ixgbe: add Tx preparation
> 
> >
> > [...]
> >
> > > >
> > > > ******************************************************************
> > > > **
> > > > ** / @@ -2290,6 +2369,7 @@ ixgbe_set_tx_function(struct
> > > > rte_eth_dev *dev, struct ixgbe_tx_queue *txq)
> > > >  		} else
> > > >  #endif
> > > >  		dev->tx_pkt_burst = ixgbe_xmit_pkts_simple;
> > > > +		dev->tx_pkt_prep = ixgbe_prep_pkts_simple;
> > >
> > > Shouldn't we setup ixgbe_prep_pkts_simple when vTX is selected too?
> > >
> >
> >
> > It is, but source code is formatted like below:
> >
> > #ifdef RTE_IXGBE_INC_VECTOR
> > 		if (txq->tx_rs_thresh <= RTE_IXGBE_TX_MAX_FREE_BUF_SZ &&
> > 				(rte_eal_process_type() != RTE_PROC_PRIMARY ||
> > 					ixgbe_txq_vec_setup(txq) == 0)) {
> > 			PMD_INIT_LOG(DEBUG, "Vector tx enabled.");
> > 			dev->tx_pkt_burst = ixgbe_xmit_pkts_vec;
> 
> Yep, so I thought we need a:
> dev->tx_pkt_prep = ixgbe_prep_pkts_simple;
> here too, no?

Or, if we decide not to set up tx_pkt_prep at all for the simple TX path -
set NULL for both?
But I think the tx_prep value should be the same for both simple and vector TX.
Konstantin
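
A sketch of what making that explicit would look like (functionally the
current code already behaves this way, since the unbraced else covers only
the tx_pkt_burst assignment):

#ifdef RTE_IXGBE_INC_VECTOR
		if (txq->tx_rs_thresh <= RTE_IXGBE_TX_MAX_FREE_BUF_SZ &&
				(rte_eal_process_type() != RTE_PROC_PRIMARY ||
					ixgbe_txq_vec_setup(txq) == 0)) {
			PMD_INIT_LOG(DEBUG, "Vector tx enabled.");
			dev->tx_pkt_burst = ixgbe_xmit_pkts_vec;
		} else
#endif
		dev->tx_pkt_burst = ixgbe_xmit_pkts_simple;

		/* one prep callback (or NULL) shared by both paths */
		dev->tx_pkt_prep = ixgbe_prep_pkts_simple;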


> 
> Konstantin
> 
> 
> > 		} else
> > #endif
> > 		dev->tx_pkt_burst = ixgbe_xmit_pkts_simple;
> > 		dev->tx_pkt_prep = ixgbe_prep_pkts_simple;
> >
> >
> > > >  	} else {
> > > >  		PMD_INIT_LOG(DEBUG, "Using full-featured tx code path");
> > > >  		PMD_INIT_LOG(DEBUG,
> > > > @@ -2301,6 +2381,7 @@ ixgbe_set_tx_function(struct rte_eth_dev
> > > > *dev,
> > > struct ixgbe_tx_queue *txq)
> > > >  				(unsigned long)txq->tx_rs_thresh,
> > > >  				(unsigned long)RTE_PMD_IXGBE_TX_MAX_BURST);
> > > >  		dev->tx_pkt_burst = ixgbe_xmit_pkts;
> > > > +		dev->tx_pkt_prep = ixgbe_prep_pkts;
> > > >  	}
> > > >  }
> > > >
> > > > diff --git a/drivers/net/ixgbe/ixgbe_rxtx.h
> > > > b/drivers/net/ixgbe/ixgbe_rxtx.h index 2608b36..7bbd9b8 100644
> > > > --- a/drivers/net/ixgbe/ixgbe_rxtx.h
> > > > +++ b/drivers/net/ixgbe/ixgbe_rxtx.h
> > > > @@ -80,6 +80,8 @@
> > > >  #define RTE_IXGBE_WAIT_100_US               100
> > > >  #define RTE_IXGBE_VMTXSW_REGISTER_COUNT     2
> > > >
> > > > +#define IXGBE_TX_MAX_SEG                    40
> > > > +
> > > >  #define IXGBE_PACKET_TYPE_MASK_82599        0X7F
> > > >  #define IXGBE_PACKET_TYPE_MASK_X550         0X10FF
> > > >  #define IXGBE_PACKET_TYPE_MASK_TUNNEL       0XFF
> > > > --
> > > > 1.7.9.5
> >
> > Tomasz.

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v2 1/6] ethdev: add Tx preparation
  2016-09-19 16:06         ` Jerin Jacob
@ 2016-09-20  9:06           ` Ananyev, Konstantin
  2016-09-21  8:29             ` Jerin Jacob
  0 siblings, 1 reply; 261+ messages in thread
From: Ananyev, Konstantin @ 2016-09-20  9:06 UTC (permalink / raw)
  To: Jerin Jacob, Kulasek, TomaszX; +Cc: dev


Hi Jerin,

> > > >
> >
> > [...]
> >
> > > > +
> > > > +#ifdef RTE_ETHDEV_TX_PREP
> > >
> > > Sorry for being a bit late on that discussion, but what the point of
> > > having that config macro (RTE_ETHDEV_TX_PREP ) at all?
> > > As I can see right now, if driver doesn't setup tx_pkt_prep, then
> > > nb_pkts would be return anyway...
> > >
> > > BTW, there is my another question - should it be that way?
> > > Shouldn't we return 0 (and set rte_errno=ENOTSUP) here if
> > > dev->tx_pk_prep == NULL?
> > >
> >
> > It's an answer to the Jerin's request discussed here:
> > http://dpdk.org/ml/archives/dev/2016-September/046437.html
> >
> > When driver doesn't support tx_prep, default behavior is "we don't know requirements, so we have nothing to do here". It will simplify
> application logic and improve performance for these drivers, I think. Catching this error with every burst may be problematic.
> >
> > As for RTE_ETHDEV_TX_PREP macro, suggested by Jerin in the same thread, I still don't think It's the best solution of the problem
> described by him. I have added it here for further discussion.
> >
> > Jerin, have you something to add?
> 
> Nothing very specific to add here. I think, I have tried to share the rational in, http://dpdk.org/ml/archives/dev/2016-
> September/046437.html
> 

Ok, I am not sure I fully understand your intention here.
I think I understand why you propose rte_eth_tx_prep() to do:
	if (!dev->tx_pkt_prep)
		return nb_pkts;

That allows drivers to NOOP the tx_prep functionality without paying the
price for callback invocation.
What I don't understand is why, with that in place, we also need a NOOP for
the whole rte_eth_tx_prep():
+static inline uint16_t
+rte_eth_tx_prep(uint8_t port_id __rte_unused, uint16_t queue_id __rte_unused,
+		struct rte_mbuf **tx_pkts __rte_unused, uint16_t nb_pkts)
+{
+	return nb_pkts;
+}
+
+#endif

What are you trying to save here: just reading 'dev->tx_pkt_prep'?
If so, then it doesn't seem that performance-critical to me.
Something else?
From my point of view a NOOP on the driver level is more than enough.
Again, I would prefer to avoid introducing a new config option, if possible.

Konstantin

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v2 1/6] ethdev: add Tx preparation
  2016-09-20  9:06           ` Ananyev, Konstantin
@ 2016-09-21  8:29             ` Jerin Jacob
  2016-09-22  9:36               ` Ananyev, Konstantin
  0 siblings, 1 reply; 261+ messages in thread
From: Jerin Jacob @ 2016-09-21  8:29 UTC (permalink / raw)
  To: Ananyev, Konstantin; +Cc: Kulasek, TomaszX, dev

On Tue, Sep 20, 2016 at 09:06:42AM +0000, Ananyev, Konstantin wrote:
> 
> Hi Jerin,

Hi Konstantin,

> 
> > > > >
> > >
> > > [...]
> > >
> > > > > +
> > > > > +#ifdef RTE_ETHDEV_TX_PREP
> > > >
> > > > Sorry for being a bit late on that discussion, but what the point of
> > > > having that config macro (RTE_ETHDEV_TX_PREP ) at all?
> > > > As I can see right now, if driver doesn't setup tx_pkt_prep, then
> > > > nb_pkts would be return anyway...
> > > >
> > > > BTW, there is my another question - should it be that way?
> > > > Shouldn't we return 0 (and set rte_errno=ENOTSUP) here if
> > > > dev->tx_pk_prep == NULL?
> > > >
> > >
> > > It's an answer to the Jerin's request discussed here:
> > > http://dpdk.org/ml/archives/dev/2016-September/046437.html
> > >
> > > When driver doesn't support tx_prep, default behavior is "we don't know requirements, so we have nothing to do here". It will simplify
> > application logic and improve performance for these drivers, I think. Catching this error with every burst may be problematic.
> > >
> > > As for RTE_ETHDEV_TX_PREP macro, suggested by Jerin in the same thread, I still don't think It's the best solution of the problem
> > described by him. I have added it here for further discussion.
> > >
> > > Jerin, have you something to add?
> > 
> > Nothing very specific to add here. I think, I have tried to share the rational in, http://dpdk.org/ml/archives/dev/2016-
> > September/046437.html
> > 
> 
> Ok, not sure I am fully understand your intention here.
> I think I understand why you propose rte_eth_tx_prep() to do:
> 	if (!dev->tx_pkt_prep)
> 		return nb_pkts;
> 
> That allows drivers to NOOP the tx_prep functionality without paying the
> price for callback invocation.

In a true sense, returning nb_pkts makes it a functional NOP as well (i.e. the PMD
does not have any limitation on the Tx side, so all packets are _good_ (no
preparation is required)).


> What I don't understand, why with that in place we also need a NOOP for
> the whole rte_eth_tx_prep():
> +static inline uint16_t
> +rte_eth_tx_prep(uint8_t port_id __rte_unused, uint16_t queue_id __rte_unused,
> +		struct rte_mbuf **tx_pkts __rte_unused, uint16_t nb_pkts)
> +{
> +	return nb_pkts;
> +}
> +
> +#endif
> 
> What are you trying to save here: just reading ' dev->tx_pkt_prep'?
> If so, then it seems not that performance critical for me.
> Something else?

The proposed scheme can make it a true NOOP from the compiler's perspective too, if a
target decides to do that.
I have checked the instruction generation with ARM assembly; a non-true
compiler NOOP has the following instruction overhead at minimum.

	# 1 load
	# 1  mov
	if (!dev->tx_pkt_prep)
		return nb_pkts;

	# the compiler can't predict whether this function will be executed,
	# so there is pressure on register allocation, and most likely it
	# calls for some stack push and pop based load on the outer function
	# (as it is an inline function)

	return (*dev->tx_pkt_prep)(dev->data->tx_queues[queue_id], tx_pkts, nb_pkts);

	# 1 branch
	if (unlikely(nb_prep < nb_rx)) {
		# bunch of code not executed, but pressure on i cache
		int i;
		for (i = nb_prep; i < nb_rx; i++)
	                 rte_pktmbuf_free(pkts_burst[i]);
	}

From a server target (IA or high-end armv8) with an external PCIe based system, it makes sense to have
the RTE_ETHDEV_TX_PREP option enabled (which is the case in the proposed patch), but
on low-end arm platforms with
- limited cores
- less i-cache
- IPC == 1
- running around 1GHz
- most importantly, an _integrated_ NIC controller with no external PCIe
  support
it does not make much sense to waste the cycles/time on it.
A cycle saved is a cycle earned.

Since DPDK compilation is based on _target_, I really don't see any
issue with this approach, nor does it hurt anything on server targets.
So, IMO, it should be up to the target to decide what works better for it.

Jerin

> From my point of view NOOP on the driver level is more than enough.
> Again I would prefer to introduce new config option, if possible.
> 
> Konstantin
> 
> 
> 
> 
> 
> 
> 

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v2 1/6] ethdev: add Tx preparation
  2016-09-21  8:29             ` Jerin Jacob
@ 2016-09-22  9:36               ` Ananyev, Konstantin
  2016-09-22  9:59                 ` Jerin Jacob
  0 siblings, 1 reply; 261+ messages in thread
From: Ananyev, Konstantin @ 2016-09-22  9:36 UTC (permalink / raw)
  To: Jerin Jacob; +Cc: Kulasek, TomaszX, dev


Hi Jerin,

> >
> > Hi Jerin,
> 
> Hi Konstantin,
> 
> >
> > > > > >
> > > >
> > > > [...]
> > > >
> > > > > > +
> > > > > > +#ifdef RTE_ETHDEV_TX_PREP
> > > > >
> > > > > Sorry for being a bit late on that discussion, but what the
> > > > > point of having that config macro (RTE_ETHDEV_TX_PREP ) at all?
> > > > > As I can see right now, if driver doesn't setup tx_pkt_prep,
> > > > > then nb_pkts would be return anyway...
> > > > >
> > > > > BTW, there is my another question - should it be that way?
> > > > > Shouldn't we return 0 (and set rte_errno=ENOTSUP) here if
> > > > > dev->tx_pk_prep == NULL?
> > > > >
> > > >
> > > > It's an answer to the Jerin's request discussed here:
> > > > http://dpdk.org/ml/archives/dev/2016-September/046437.html
> > > >
> > > > When driver doesn't support tx_prep, default behavior is "we don't
> > > > know requirements, so we have nothing to do here". It will
> > > > simplify
> > > application logic and improve performance for these drivers, I think. Catching this error with every burst may be problematic.
> > > >
> > > > As for RTE_ETHDEV_TX_PREP macro, suggested by Jerin in the same
> > > > thread, I still don't think It's the best solution of the problem
> > > described by him. I have added it here for further discussion.
> > > >
> > > > Jerin, have you something to add?
> > >
> > > Nothing very specific to add here. I think, I have tried to share
> > > the rational in, http://dpdk.org/ml/archives/dev/2016-
> > > September/046437.html
> > >
> >
> > Ok, not sure I am fully understand your intention here.
> > I think I understand why you propose rte_eth_tx_prep() to do:
> > 	if (!dev->tx_pkt_prep)
> > 		return nb_pkts;
> >
> > That allows drivers to NOOP the tx_prep functionality without paying
> > the price for callback invocation.
> 
> In true sense, returning the nb_pkts makes it functional NOP as well(i.e The PMD does not have any limitation on Tx side, so all
> packets are _good_(no preparation is required))
> 
> 
> > What I don't understand, why with that in place we also need a NOOP
> > for the whole rte_eth_tx_prep():
> > +static inline uint16_t
> > +rte_eth_tx_prep(uint8_t port_id __rte_unused, uint16_t queue_id __rte_unused,
> > +		struct rte_mbuf **tx_pkts __rte_unused, uint16_t nb_pkts) {
> > +	return nb_pkts;
> > +}
> > +
> > +#endif
> >
> > What are you trying to save here: just reading ' dev->tx_pkt_prep'?
> > If so, then it seems not that performance critical for me.
> > Something else?
> 
> The proposed scheme can make it as true NOOP from compiler perspective too if a target decided to do that, I have checked the
> instruction generation with arm Assembly, a non true compiler NOOP has following instructions overhead at minimum.
> 
> 	# 1 load
> 	# 1  mov
> 	if (!dev->tx_pkt_prep)
> 		return nb_pkts;

Yep.

> 
> 	# compile can't predict this function needs be executed or not so
> 	# pressure on register allocation and mostly likely it call for
> 	# some stack push and pop based load on outer function(as it is an
> 	# inline function)


Well, I suppose the compiler wouldn't try to fill the function argument registers before the branch above.

> 
> 	return (*dev->tx_pkt_prep)(dev->data->tx_queues[queue_id], tx_pkts, nb_pkts);
> 
> 	# 1 branch
> 	if (unlikely(nb_prep < nb_rx)) {
> 		# bunch of code not executed, but pressure on i cache
> 		int i;
> 		for (i = nb_prep; i < nb_rx; i++)
> 	                 rte_pktmbuf_free(pkts_burst[i]);
> 	}
> 
> From a server target(IA or high-end armv8) with external PCIe based system makes sense to have RTE_ETHDEV_TX_PREP option
> enabled(which is the case in proposed patch) but the low end arm platforms with
> - limited cores
> - less i cache
> - IPC == 1
> - running around 1GHz
> - most importantly, _integrated_ nic controller with no external PCIE
>   support
> does not make much sense to waste the cycles/time for it.
> cycle saved is cycle earned.

Ok, so it is all to save one memory dereference and a comparison plus branch.
Do you have any estimation of how badly it would hit low-end CPU performance?
The reason I am asking: obviously I would prefer to avoid introducing a new build config option
(that's the common dpdk coding practice these days) unless it is really important.

> 
> Since DPDK compilation is based on _target_, I really don't see any issue with this approach nor it does not hurt anything on server
> target.
> So, IMO, It should be upto the target to decide what works better for the target.
> 
> Jerin
> 
> > From my point of view NOOP on the driver level is more than enough.
> > Again I would prefer to introduce new config option, if possible.
> >
> > Konstantin
> >

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v2 1/6] ethdev: add Tx preparation
  2016-09-22  9:36               ` Ananyev, Konstantin
@ 2016-09-22  9:59                 ` Jerin Jacob
  2016-09-23  9:41                   ` Ananyev, Konstantin
  0 siblings, 1 reply; 261+ messages in thread
From: Jerin Jacob @ 2016-09-22  9:59 UTC (permalink / raw)
  To: Ananyev, Konstantin; +Cc: Kulasek, TomaszX, dev

On Thu, Sep 22, 2016 at 09:36:15AM +0000, Ananyev, Konstantin wrote:

Hi Konstantin,

> 
> Hi Jerin,
> 
> > >
> > > Hi Jerin,
> > 
> > Hi Konstantin,
> > 
> > >
> > > > > > >
> > > > >
> > > > > [...]
> > > > >
> > > > > > > +
> > > > > > > +#ifdef RTE_ETHDEV_TX_PREP
> > > > > >
> > > > > > Sorry for being a bit late on that discussion, but what the
> > > > > > point of having that config macro (RTE_ETHDEV_TX_PREP ) at all?
> > > > > > As I can see right now, if driver doesn't setup tx_pkt_prep,
> > > > > > then nb_pkts would be return anyway...
> > > > > >
> > > > > > BTW, there is my another question - should it be that way?
> > > > > > Shouldn't we return 0 (and set rte_errno=ENOTSUP) here if
> > > > > > dev->tx_pk_prep == NULL?
> > > > > >
> > > > >
> > > > > It's an answer to the Jerin's request discussed here:
> > > > > http://dpdk.org/ml/archives/dev/2016-September/046437.html
> > > > >
> > > > > When driver doesn't support tx_prep, default behavior is "we don't
> > > > > know requirements, so we have nothing to do here". It will
> > > > > simplify
> > > > application logic and improve performance for these drivers, I think. Catching this error with every burst may be problematic.
> > > > >
> > > > > As for RTE_ETHDEV_TX_PREP macro, suggested by Jerin in the same
> > > > > thread, I still don't think It's the best solution of the problem
> > > > described by him. I have added it here for further discussion.
> > > > >
> > > > > Jerin, have you something to add?
> > > >
> > > > Nothing very specific to add here. I think, I have tried to share
> > > > the rational in, http://dpdk.org/ml/archives/dev/2016-
> > > > September/046437.html
> > > >
> > >
> > > Ok, not sure I am fully understand your intention here.
> > > I think I understand why you propose rte_eth_tx_prep() to do:
> > > 	if (!dev->tx_pkt_prep)
> > > 		return nb_pkts;
> > >
> > > That allows drivers to NOOP the tx_prep functionality without paying
> > > the price for callback invocation.
> > 
> > In true sense, returning the nb_pkts makes it functional NOP as well(i.e The PMD does not have any limitation on Tx side, so all
> > packets are _good_(no preparation is required))
> > 
> > 
> > > What I don't understand, why with that in place we also need a NOOP
> > > for the whole rte_eth_tx_prep():
> > > +static inline uint16_t
> > > +rte_eth_tx_prep(uint8_t port_id __rte_unused, uint16_t queue_id __rte_unused,
> > > +		struct rte_mbuf **tx_pkts __rte_unused, uint16_t nb_pkts) {
> > > +	return nb_pkts;
> > > +}
> > > +
> > > +#endif
> > >
> > > What are you trying to save here: just reading ' dev->tx_pkt_prep'?
> > > If so, then it seems not that performance critical for me.
> > > Something else?
> > 
> > The proposed scheme can make it as true NOOP from compiler perspective too if a target decided to do that, I have checked the
> > instruction generation with arm Assembly, a non true compiler NOOP has following instructions overhead at minimum.
> > 
> > 	# 1 load
> > 	# 1  mov
> > 	if (!dev->tx_pkt_prep)
> > 		return nb_pkts;
> 
> Yep.
> 
> > 
> > 	# compile can't predict this function needs be executed or not so
> > 	# pressure on register allocation and mostly likely it call for
> > 	# some stack push and pop based load on outer function(as it is an
> > 	# inline function)
> 
> 
> Well, I suppose compiler wouldn't try to fill function argument registers before the branch above. 

Not the case with the arm gcc compiler (maybe based on the outer function load).
The recent external pool manager function pointer conversion
cost around 700kpps/core in local cache mode (even though the
function pointers are not executed).

> 
> > 
> > 	return (*dev->tx_pkt_prep)(dev->data->tx_queues[queue_id], tx_pkts, nb_pkts);
> > 
> > 	# 1 branch
> > 	if (unlikely(nb_prep < nb_rx)) {
> > 		# bunch of code not executed, but pressure on i cache
> > 		int i;
> > 		for (i = nb_prep; i < nb_rx; i++)
> > 	                 rte_pktmbuf_free(pkts_burst[i]);
> > 	}
> > 
> > From a server target(IA or high-end armv8) with external PCIe based system makes sense to have RTE_ETHDEV_TX_PREP option
> > enabled(which is the case in proposed patch) but the low end arm platforms with
> > - limited cores
> > - less i cache
> > - IPC == 1
> > - running around 1GHz
> > - most importantly, _integrated_ nic controller with no external PCIE
> >   support
> > does not make much sense to waste the cycles/time for it.
> > cycle saved is cycle earned.
> 
> Ok, so it is all to save one memory de-refrence and a comparison plus branch.
> Do you have aby estimation how badly it would hit low-end cpu performance?

Around 400kpps/core. On four-core systems, around 2mpps (4 cores with
2x10G ports).

> The reason I am asking: obviously I would prefer to avoid to introduce new build config option
> (that's the common dpdk coding practice these days) unless it is really important.  
Practice is something we need to revisit based on new use cases/usage.
I think the scheme of non-external-PCIe based NW cards is new to DPDK.

> 
> > 
> > Since DPDK compilation is based on _target_, I really don't see any issue with this approach nor it does not hurt anything on server
> > target.
> > So, IMO, It should be upto the target to decide what works better for the target.
> > 
> > Jerin
> > 
> > > From my point of view NOOP on the driver level is more than enough.
> > > Again I would prefer to introduce new config option, if possible.
> > >
> > > Konstantin
> > >

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v2 1/6] ethdev: add Tx preparation
  2016-09-22  9:59                 ` Jerin Jacob
@ 2016-09-23  9:41                   ` Ananyev, Konstantin
  2016-09-23 10:29                     ` Jerin Jacob
  0 siblings, 1 reply; 261+ messages in thread
From: Ananyev, Konstantin @ 2016-09-23  9:41 UTC (permalink / raw)
  To: Jerin Jacob; +Cc: Kulasek, TomaszX, dev

Hi Jerin,

> 
> Hi Konstantin,
> 
> >
> > Hi Jerin,
> >
> > > >
> > > > Hi Jerin,
> > >
> > > Hi Konstantin,
> > >
> > > >
> > > > > > > >
> > > > > >
> > > > > > [...]
> > > > > >
> > > > > > > > +
> > > > > > > > +#ifdef RTE_ETHDEV_TX_PREP
> > > > > > >
> > > > > > > Sorry for being a bit late on that discussion, but what the
> > > > > > > point of having that config macro (RTE_ETHDEV_TX_PREP ) at all?
> > > > > > > As I can see right now, if driver doesn't setup tx_pkt_prep,
> > > > > > > then nb_pkts would be return anyway...
> > > > > > >
> > > > > > > BTW, there is my another question - should it be that way?
> > > > > > > Shouldn't we return 0 (and set rte_errno=ENOTSUP) here if
> > > > > > > dev->tx_pk_prep == NULL?
> > > > > > >
> > > > > >
> > > > > > It's an answer to the Jerin's request discussed here:
> > > > > > http://dpdk.org/ml/archives/dev/2016-September/046437.html
> > > > > >
> > > > > > When driver doesn't support tx_prep, default behavior is "we
> > > > > > don't know requirements, so we have nothing to do here". It
> > > > > > will simplify
> > > > > application logic and improve performance for these drivers, I think. Catching this error with every burst may be problematic.
> > > > > >
> > > > > > As for RTE_ETHDEV_TX_PREP macro, suggested by Jerin in the
> > > > > > same thread, I still don't think It's the best solution of the
> > > > > > problem
> > > > > described by him. I have added it here for further discussion.
> > > > > >
> > > > > > Jerin, have you something to add?
> > > > >
> > > > > Nothing very specific to add here. I think, I have tried to
> > > > > share the rational in, http://dpdk.org/ml/archives/dev/2016-
> > > > > September/046437.html
> > > > >
> > > >
> > > > Ok, not sure I am fully understand your intention here.
> > > > I think I understand why you propose rte_eth_tx_prep() to do:
> > > > 	if (!dev->tx_pkt_prep)
> > > > 		return nb_pkts;
> > > >
> > > > That allows drivers to NOOP the tx_prep functionality without
> > > > paying the price for callback invocation.
> > >
> > > In true sense, returning the nb_pkts makes it functional NOP as
> > > well(i.e The PMD does not have any limitation on Tx side, so all
> > > packets are _good_(no preparation is required))
> > >
> > >
> > > > What I don't understand, why with that in place we also need a
> > > > NOOP for the whole rte_eth_tx_prep():
> > > > +static inline uint16_t
> > > > +rte_eth_tx_prep(uint8_t port_id __rte_unused, uint16_t queue_id __rte_unused,
> > > > +		struct rte_mbuf **tx_pkts __rte_unused, uint16_t nb_pkts) {
> > > > +	return nb_pkts;
> > > > +}
> > > > +
> > > > +#endif
> > > >
> > > > What are you trying to save here: just reading ' dev->tx_pkt_prep'?
> > > > If so, then it seems not that performance critical for me.
> > > > Something else?
> > >
> > > The proposed scheme can make it as true NOOP from compiler
> > > perspective too if a target decided to do that, I have checked the instruction generation with arm Assembly, a non true compiler
> NOOP has following instructions overhead at minimum.
> > >
> > > 	# 1 load
> > > 	# 1  mov
> > > 	if (!dev->tx_pkt_prep)
> > > 		return nb_pkts;
> >
> > Yep.
> >
> > >
> > > 	# compile can't predict this function needs be executed or not so
> > > 	# pressure on register allocation and mostly likely it call for
> > > 	# some stack push and pop based load on outer function(as it is an
> > > 	# inline function)
> >
> >
> > Well, I suppose compiler wouldn't try to fill function argument registers before the branch above.
> 
> Not the case with arm gcc compiler(may be based outer function load).

Ok, so for my own curiosity (I am not very familiar with the ARM arch):
gcc generates several conditional execution instructions in a row to spill/fill
the function argument registers, and that comes at a price at execution time if the condition is not met?

> The recent, external pool manager function pointer conversion reduced around 700kpps/core in local cache mode(even though the
> function pointers are not executed)
> 
> >
> > >
> > > 	return (*dev->tx_pkt_prep)(dev->data->tx_queues[queue_id], tx_pkts,
> > > nb_pkts);
> > >
> > > 	# 1 branch
> > > 	if (unlikely(nb_prep < nb_rx)) {
> > > 		# bunch of code not executed, but pressure on i cache
> > > 		int i;
> > > 		for (i = nb_prep; i < nb_rx; i++)
> > > 	                 rte_pktmbuf_free(pkts_burst[i]);
> > > 	}
> > >
> > > From a server target(IA or high-end armv8) with external PCIe based
> > > system makes sense to have RTE_ETHDEV_TX_PREP option enabled(which
> > > is the case in proposed patch) but the low end arm platforms with
> > > - limited cores
> > > - less i cache
> > > - IPC == 1
> > > - running around 1GHz
> > > - most importantly, _integrated_ nic controller with no external PCIE
> > >   support
> > > does not make much sense to waste the cycles/time for it.
> > > cycle saved is cycle earned.
> >
> > Ok, so it is all to save one memory de-refrence and a comparison plus branch.
> > Do you have aby estimation how badly it would hit low-end cpu performance?
> 
> around 400kpps/core. On four core systems, around 2 mpps.(4 core with
> 10G*2 ports)

So it is about ~7% for 2x10G, correct?
I agree that seems big enough to keep the config option,
even though I am not quite happy about introducing a new one.
So no more objections from my side here.
Thanks 
Konstantin

> 
> > The reason I am asking: obviously I would prefer to avoid to introduce
> > new build config option (that's the common dpdk coding practice these days) unless it is really important.
> Practice is something we need to revisit based on the new use case/usage.
> I think, the scheme of non external pcie based NW cards is new to DPDK.
> 
> >
> > >
> > > Since DPDK compilation is based on _target_, I really don't see any
> > > issue with this approach nor it does not hurt anything on server target.
> > > So, IMO, It should be upto the target to decide what works better for the target.
> > >
> > > Jerin
> > >
> > > > From my point of view NOOP on the driver level is more than enough.
> > > > Again I would prefer to introduce new config option, if possible.
> > > >
> > > > Konstantin
> > > >

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v2 1/6] ethdev: add Tx preparation
  2016-09-23  9:41                   ` Ananyev, Konstantin
@ 2016-09-23 10:29                     ` Jerin Jacob
  0 siblings, 0 replies; 261+ messages in thread
From: Jerin Jacob @ 2016-09-23 10:29 UTC (permalink / raw)
  To: Ananyev, Konstantin; +Cc: Kulasek, TomaszX, dev

On Fri, Sep 23, 2016 at 09:41:52AM +0000, Ananyev, Konstantin wrote:
Hi Konstantin,
> Hi Jerin,
> 
> > 
> > Hi Konstantin,
> > 
> > >
> > > Hi Jerin,
> > >
> > > > >
> > > > > Hi Jerin,
> > > >
> > > > Hi Konstantin,
> > > >
> > > > >
> > > > > > > > >
> > > > > > >
> > > > > > > [...]
> > > > > > >
> > > > > > > > > +
> > > > > > > > > +#ifdef RTE_ETHDEV_TX_PREP
> > > > > > > >
> > > > > > > > Sorry for being a bit late on that discussion, but what the
> > > > > > > > point of having that config macro (RTE_ETHDEV_TX_PREP ) at all?
> > > > > > > > As I can see right now, if driver doesn't setup tx_pkt_prep,
> > > > > > > > then nb_pkts would be return anyway...
> > > > > > > >
> > > > > > > > BTW, there is my another question - should it be that way?
> > > > > > > > Shouldn't we return 0 (and set rte_errno=ENOTSUP) here if
> > > > > > > > dev->tx_pk_prep == NULL?
> > > > > > > >
> > > > > > >
> > > > > > > It's an answer to the Jerin's request discussed here:
> > > > > > > http://dpdk.org/ml/archives/dev/2016-September/046437.html
> > > > > > >
> > > > > > > When driver doesn't support tx_prep, default behavior is "we
> > > > > > > don't know requirements, so we have nothing to do here". It
> > > > > > > will simplify
> > > > > > application logic and improve performance for these drivers, I think. Catching this error with every burst may be problematic.
> > > > > > >
> > > > > > > As for RTE_ETHDEV_TX_PREP macro, suggested by Jerin in the
> > > > > > > same thread, I still don't think It's the best solution of the
> > > > > > > problem
> > > > > > described by him. I have added it here for further discussion.
> > > > > > >
> > > > > > > Jerin, have you something to add?
> > > > > >
> > > > > > Nothing very specific to add here. I think, I have tried to
> > > > > > share the rational in, http://dpdk.org/ml/archives/dev/2016-
> > > > > > September/046437.html
> > > > > >
> > > > >
> > > > > Ok, not sure I am fully understand your intention here.
> > > > > I think I understand why you propose rte_eth_tx_prep() to do:
> > > > > 	if (!dev->tx_pkt_prep)
> > > > > 		return nb_pkts;
> > > > >
> > > > > That allows drivers to NOOP the tx_prep functionality without
> > > > > paying the price for callback invocation.
> > > >
> > > > In true sense, returning the nb_pkts makes it functional NOP as
> > > > well(i.e The PMD does not have any limitation on Tx side, so all
> > > > packets are _good_(no preparation is required))
> > > >
> > > >
> > > > > What I don't understand, why with that in place we also need a
> > > > > NOOP for the whole rte_eth_tx_prep():
> > > > > +static inline uint16_t
> > > > > +rte_eth_tx_prep(uint8_t port_id __rte_unused, uint16_t queue_id __rte_unused,
> > > > > +		struct rte_mbuf **tx_pkts __rte_unused, uint16_t nb_pkts) {
> > > > > +	return nb_pkts;
> > > > > +}
> > > > > +
> > > > > +#endif
> > > > >
> > > > > What are you trying to save here: just reading ' dev->tx_pkt_prep'?
> > > > > If so, then it seems not that performance critical for me.
> > > > > Something else?
> > > >
> > > > The proposed scheme can make it as true NOOP from compiler
> > > > perspective too if a target decided to do that, I have checked the instruction generation with arm Assembly, a non true compiler
> > NOOP has following instructions overhead at minimum.
> > > >
> > > > 	# 1 load
> > > > 	# 1  mov
> > > > 	if (!dev->tx_pkt_prep)
> > > > 		return nb_pkts;
> > >
> > > Yep.
> > >
> > > >
> > > > 	# compile can't predict this function needs be executed or not so
> > > > 	# pressure on register allocation and mostly likely it call for
> > > > 	# some stack push and pop based load on outer function(as it is an
> > > > 	# inline function)
> > >
> > >
> > > Well, I suppose compiler wouldn't try to fill function argument registers before the branch above.
> > 
> > Not the case with arm gcc compiler(may be based outer function load).
> 
> Ok, so for my own curiosity (I am not very familiar with the ARM arch):
>  gcc generates several conditional execution instructions in a row to spill/fill 
> function arguments registers, and that comes at a price at execution time if condition is not met? 

Yes. That's what I saw (at least for the gcc 5.3 + arm64 back-end case) when I was debugging
the external mempool function pointer performance regression issue.
The sad part is, I couldn't find any gcc option to override it.


> 
> > The recent, external pool manager function pointer conversion reduced around 700kpps/core in local cache mode(even though the
> > function pointers are not executed)
> > 
> > >
> > > >
> > > > 	return (*dev->tx_pkt_prep)(dev->data->tx_queues[queue_id], tx_pkts,
> > > > nb_pkts);
> > > >
> > > > 	# 1 branch
> > > > 	if (unlikely(nb_prep < nb_rx)) {
> > > > 		# bunch of code not executed, but pressure on i cache
> > > > 		int i;
> > > > 		for (i = nb_prep; i < nb_rx; i++)
> > > > 	                 rte_pktmbuf_free(pkts_burst[i]);
> > > > 	}
> > > >
> > > > From a server target(IA or high-end armv8) with external PCIe based
> > > > system makes sense to have RTE_ETHDEV_TX_PREP option enabled(which
> > > > is the case in proposed patch) but the low end arm platforms with
> > > > - limited cores
> > > > - less i cache
> > > > - IPC == 1
> > > > - running around 1GHz
> > > > - most importantly, _integrated_ nic controller with no external PCIE
> > > >   support
> > > > does not make much sense to waste the cycles/time for it.
> > > > cycle saved is cycle earned.
> > >
> > > Ok, so it is all to save one memory de-refrence and a comparison plus branch.
> > > Do you have aby estimation how badly it would hit low-end cpu performance?
> > 
> > around 400kpps/core. On four core systems, around 2 mpps.(4 core with
> > 10G*2 ports)
> 
> So it is about ~7% for 2x10G, correct?
> I agree that seems big enough to keep the config option,
> even though I am not quite happy with introducing new config option. 
> So no more objections from my side here.

Thanks.

That was for very low-end CPUs.
So even in the high-end CPU case, if it calls for a 100kpps drop/core,
a Cavium configuration like 96 cores + >200G will see at least 9.6mpps worth
of cycles dropped.


> Thanks 
> Konstantin
> 
> > 
> > > The reason I am asking: obviously I would prefer to avoid to introduce
> > > new build config option (that's the common dpdk coding practice these days) unless it is really important.
> > Practice is something we need to revisit based on the new use case/usage.
> > I think, the scheme of non external pcie based NW cards is new to DPDK.
> > 
> > >
> > > >
> > > > Since DPDK compilation is based on _target_, I really don't see any
> > > > issue with this approach nor it does not hurt anything on server target.
> > > > So, IMO, It should be upto the target to decide what works better for the target.
> > > >
> > > > Jerin
> > > >
> > > > > From my point of view NOOP on the driver level is more than enough.
> > > > > Again I would prefer to introduce new config option, if possible.
> > > > >
> > > > > Konstantin
> > > > >

^ permalink raw reply	[flat|nested] 261+ messages in thread

* [dpdk-dev] [PATCH v3 0/6] add Tx preparation
  2016-09-12 14:44 ` [dpdk-dev] [PATCH v2 " Tomasz Kulasek
                     ` (5 preceding siblings ...)
  2016-09-12 14:44   ` [dpdk-dev] [PATCH v2 6/6] testpmd: add txprep engine Tomasz Kulasek
@ 2016-09-28 11:10   ` Tomasz Kulasek
  2016-09-28 11:10     ` [dpdk-dev] [PATCH v3 1/6] ethdev: " Tomasz Kulasek
                       ` (6 more replies)
  6 siblings, 7 replies; 261+ messages in thread
From: Tomasz Kulasek @ 2016-09-28 11:10 UTC (permalink / raw)
  To: dev; +Cc: konstantin.ananyev, Tomasz Kulasek

As discussed in that thread:

http://dpdk.org/ml/archives/dev/2015-September/023603.html

Depending on the HW offloads requested, different NIC models might impose
different requirements on the packets to be TX-ed, in terms of:

 - Max number of fragments per packet allowed
 - Max number of fragments per TSO segment
 - The way pseudo-header checksum should be pre-calculated
 - L3/L4 header fields filling
 - etc.


MOTIVATION:
-----------

1) Some work cannot (and should not) be done in rte_eth_tx_burst.
   However, this work is sometimes required, and now it's an
   application issue.

2) Different hardware may have different requirements for TX offloads;
   a different subset can be supported, and so on.

3) Some parameters (e.g. the number of segments in the ixgbe driver) may hang
   the device. These parameters may vary between devices.

   For example, i40e HW allows 8 fragments per packet, but that is after
   TSO segmentation, while ixgbe has a 38-fragment pre-TSO limit.

4) Fields in the packet may require different initialization (e.g. some will
   require pseudo-header checksum precalculation, sometimes done in a
   different way depending on packet type, and so on). Now the application
   needs to take care of it.

5) Using an additional API (rte_eth_tx_prep) before rte_eth_tx_burst lets
   the application prepare the packet burst in a form acceptable to the
   specific device.

6) Some additional checks may be done in debug mode, keeping the tx_burst
   implementation clean.


PROPOSAL:
---------

To help the user deal with all these varieties, we propose to:

1) Introduce a rte_eth_tx_prep() function to do the necessary preparation of a
   packet burst so it can be safely transmitted on the device with the desired
   HW offloads (set/reset the checksum field according to the hardware
   requirements) and to check HW constraints (number of segments per
   packet, etc).

   While the limitations and requirements may differ between devices, it
   requires extending the rte_eth_dev structure with a new function pointer,
   "tx_pkt_prep", which can be implemented in the driver to prepare and
   verify packets, in a device-specific way, before the burst; this should
   prevent the application from sending malformed packets.

2) Also new fields will be introduced in rte_eth_desc_lim: 
   nb_seg_max and nb_mtu_seg_max, providing information about the max
   number of segments in TSO and non-TSO packets acceptable to the device.

   This information is useful for the application to avoid creating
   malformed packets.


APPLICATION (CASE OF USE):
--------------------------

1) The application should initialize the burst of packets to send, setting
   the required tx offload flags and required fields, like l2_len, l3_len,
   l4_len, and tso_segsz

2) The application passes the burst to rte_eth_tx_prep to check the
   conditions required to send the packets through the NIC.

3) The result of rte_eth_tx_prep can be used to send valid packets
   and/or to restore invalid ones if the function fails.

e.g.

	for (i = 0; i < nb_pkts; i++) {

		/* initialize or process packet */

		bufs[i]->tso_segsz = 800;
		bufs[i]->ol_flags = PKT_TX_TCP_SEG | PKT_TX_IPV4
				| PKT_TX_IP_CKSUM;
		bufs[i]->l2_len = sizeof(struct ether_hdr);
		bufs[i]->l3_len = sizeof(struct ipv4_hdr);
		bufs[i]->l4_len = sizeof(struct tcp_hdr);
	}

	/* Prepare burst of TX packets */
	nb_prep = rte_eth_tx_prep(port, 0, bufs, nb_pkts);

	if (nb_prep < nb_pkts) {
		printf("tx_prep failed\n");

		/* nb_prep here indicates the first invalid packet; rte_eth_tx_prep
		 * can be used on the remaining packets to find further ones.
		 */

	}

	/* Send burst of TX packets */
	nb_tx = rte_eth_tx_burst(port, 0, bufs, nb_prep);

	/* Free any unsent packets. */

v3 changes:
 - reworked the csum testpmd engine instead of adding a new one,
 - fixed the checksum initialization procedure to also include outer
   checksum offloads,
 - some minor formatting and optimizations

v2 changes:
 - rte_eth_tx_prep() returns the number of packets when the device doesn't
   support the tx_prep functionality,
 - introduced CONFIG_RTE_ETHDEV_TX_PREP allowing tx_prep to be turned off

Tomasz Kulasek (6):
  ethdev: add Tx preparation
  e1000: add Tx preparation
  fm10k: add Tx preparation
  i40e: add Tx preparation
  ixgbe: add Tx preparation
  testpmd: use Tx preparation in csum engine

 app/test-pmd/csumonly.c          |   97 +++++++++++++++------------
 config/common_base               |    1 +
 drivers/net/e1000/e1000_ethdev.h |   11 ++++
 drivers/net/e1000/em_ethdev.c    |    5 +-
 drivers/net/e1000/em_rxtx.c      |   48 +++++++++++++-
 drivers/net/e1000/igb_ethdev.c   |    4 ++
 drivers/net/e1000/igb_rxtx.c     |   52 ++++++++++++++-
 drivers/net/fm10k/fm10k.h        |    9 +++
 drivers/net/fm10k/fm10k_ethdev.c |    5 ++
 drivers/net/fm10k/fm10k_rxtx.c   |   79 +++++++++++++++++++++-
 drivers/net/i40e/i40e_ethdev.c   |    3 +
 drivers/net/i40e/i40e_rxtx.c     |  100 +++++++++++++++++++++++++++-
 drivers/net/i40e/i40e_rxtx.h     |   10 +++
 drivers/net/ixgbe/ixgbe_ethdev.c |    3 +
 drivers/net/ixgbe/ixgbe_ethdev.h |    8 ++-
 drivers/net/ixgbe/ixgbe_rxtx.c   |   85 +++++++++++++++++++++++-
 drivers/net/ixgbe/ixgbe_rxtx.h   |    2 +
 lib/librte_ether/rte_ethdev.h    |   85 ++++++++++++++++++++++++
 lib/librte_mbuf/rte_mbuf.h       |    8 +++
 lib/librte_net/Makefile          |    2 +-
 lib/librte_net/rte_pkt.h         |  133 ++++++++++++++++++++++++++++++++++++++
 21 files changed, 699 insertions(+), 51 deletions(-)
 create mode 100644 lib/librte_net/rte_pkt.h

-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 261+ messages in thread

* [dpdk-dev] [PATCH v3 1/6] ethdev: add Tx preparation
  2016-09-28 11:10   ` [dpdk-dev] [PATCH v3 0/6] add Tx preparation Tomasz Kulasek
@ 2016-09-28 11:10     ` Tomasz Kulasek
  2016-09-29 10:40       ` Ananyev, Konstantin
  2016-09-28 11:10     ` [dpdk-dev] [PATCH v3 2/6] e1000: " Tomasz Kulasek
                       ` (5 subsequent siblings)
  6 siblings, 1 reply; 261+ messages in thread
From: Tomasz Kulasek @ 2016-09-28 11:10 UTC (permalink / raw)
  To: dev; +Cc: konstantin.ananyev, Tomasz Kulasek

Added API for `rte_eth_tx_prep`

uint16_t rte_eth_tx_prep(uint8_t port_id, uint16_t queue_id,
	struct rte_mbuf **tx_pkts, uint16_t nb_pkts)

Added fields to the `struct rte_eth_desc_lim`:

	uint16_t nb_seg_max;
		/**< Max number of segments per whole packet. */

	uint16_t nb_mtu_seg_max;
		/**< Max number of segments per one MTU */

Created the `rte_pkt.h` header with commonly used functions:

int rte_validate_tx_offload(struct rte_mbuf *m)
	to validate general requirements for tx offloads in a packet, such
	as flag completeness. In the current implementation this function
	is called optionally, when RTE_LIBRTE_ETHDEV_DEBUG is enabled.

int rte_phdr_cksum_fix(struct rte_mbuf *m)
	to fix pseudo header checksum for TSO and non-TSO tcp/udp packets
	before hardware tx checksum offload.
	 - for non-TSO tcp/udp packets the full pseudo-header checksum is
	   computed and set.
	 - for TSO the IP payload length is not included.
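
For illustration, a caller would typically combine these with tx_burst
roughly as follows (a sketch, not part of the patch; error handling is
kept minimal):

	uint16_t nb_prep, nb_tx;

	nb_prep = rte_eth_tx_prep(port_id, queue_id, pkts, nb_pkts);
	if (nb_prep < nb_pkts) {
		/* pkts[nb_prep] is the first invalid packet; the driver's
		 * tx_pkt_prep callback has set rte_errno.
		 */
		printf("tx_prep: %s\n", rte_strerror(rte_errno));
	}
	nb_tx = rte_eth_tx_burst(port_id, queue_id, pkts, nb_prep);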

Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
---
 config/common_base            |    1 +
 lib/librte_ether/rte_ethdev.h |   85 ++++++++++++++++++++++++++
 lib/librte_mbuf/rte_mbuf.h    |    8 +++
 lib/librte_net/Makefile       |    2 +-
 lib/librte_net/rte_pkt.h      |  133 +++++++++++++++++++++++++++++++++++++++++
 5 files changed, 228 insertions(+), 1 deletion(-)
 create mode 100644 lib/librte_net/rte_pkt.h

diff --git a/config/common_base b/config/common_base
index 7830535..7ada9e0 100644
--- a/config/common_base
+++ b/config/common_base
@@ -120,6 +120,7 @@ CONFIG_RTE_MAX_QUEUES_PER_PORT=1024
 CONFIG_RTE_LIBRTE_IEEE1588=n
 CONFIG_RTE_ETHDEV_QUEUE_STAT_CNTRS=16
 CONFIG_RTE_ETHDEV_RXTX_CALLBACKS=y
+CONFIG_RTE_ETHDEV_TX_PREP=y
 
 #
 # Support NIC bypass logic
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 96575e8..6594544 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -182,6 +182,7 @@ extern "C" {
 #include <rte_pci.h>
 #include <rte_dev.h>
 #include <rte_devargs.h>
+#include <rte_errno.h>
 #include "rte_ether.h"
 #include "rte_eth_ctrl.h"
 #include "rte_dev_info.h"
@@ -699,6 +700,8 @@ struct rte_eth_desc_lim {
 	uint16_t nb_max;   /**< Max allowed number of descriptors. */
 	uint16_t nb_min;   /**< Min allowed number of descriptors. */
 	uint16_t nb_align; /**< Number of descriptors should be aligned to. */
+	uint16_t nb_seg_max;     /**< Max number of segments per whole packet. */
+	uint16_t nb_mtu_seg_max; /**< Max number of segments per one MTU */
 };
 
 /**
@@ -1184,6 +1187,11 @@ typedef uint16_t (*eth_tx_burst_t)(void *txq,
 				   uint16_t nb_pkts);
 /**< @internal Send output packets on a transmit queue of an Ethernet device. */
 
+typedef uint16_t (*eth_tx_prep_t)(void *txq,
+				   struct rte_mbuf **tx_pkts,
+				   uint16_t nb_pkts);
+/**< @internal Prepare output packets on a transmit queue of an Ethernet device. */
+
 typedef int (*flow_ctrl_get_t)(struct rte_eth_dev *dev,
 			       struct rte_eth_fc_conf *fc_conf);
 /**< @internal Get current flow control parameter on an Ethernet device */
@@ -1629,6 +1637,7 @@ enum rte_eth_dev_type {
 struct rte_eth_dev {
 	eth_rx_burst_t rx_pkt_burst; /**< Pointer to PMD receive function. */
 	eth_tx_burst_t tx_pkt_burst; /**< Pointer to PMD transmit function. */
+	eth_tx_prep_t tx_pkt_prep; /**< Pointer to PMD transmit prepare function. */
 	struct rte_eth_dev_data *data;  /**< Pointer to device data */
 	const struct eth_driver *driver;/**< Driver for this device */
 	const struct eth_dev_ops *dev_ops; /**< Functions exported by PMD */
@@ -2837,6 +2846,82 @@ rte_eth_tx_burst(uint8_t port_id, uint16_t queue_id,
 	return (*dev->tx_pkt_burst)(dev->data->tx_queues[queue_id], tx_pkts, nb_pkts);
 }
 
+/**
+ * Process a burst of output packets on a transmit queue of an Ethernet device.
+ *
+ * The rte_eth_tx_prep() function is invoked to prepare output packets to be
+ * transmitted on the output queue *queue_id* of the Ethernet device designated
+ * by its *port_id*.
+ * The *nb_pkts* parameter is the number of packets to be prepared which are
+ * supplied in the *tx_pkts* array of *rte_mbuf* structures, each of them
+ * allocated from a pool created with rte_pktmbuf_pool_create().
+ * For each packet to send, the rte_eth_tx_prep() function performs
+ * the following operations:
+ *
+ * - Check if packet meets devices requirements for tx offloads.
+ *
+ * - Check limitations about number of segments.
+ *
+ * - Check additional requirements when debug is enabled.
+ *
+ * - Update and/or reset required checksums when tx offload is set for packet.
+ *
+ * The rte_eth_tx_prep() function returns the number of packets ready to be
+ * sent. A return value equal to *nb_pkts* means that all packets are valid and
+ * ready to be sent.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param queue_id
+ *   The index of the transmit queue through which output packets must be
+ *   sent.
+ *   The value must be in the range [0, nb_tx_queue - 1] previously supplied
+ *   to rte_eth_dev_configure().
+ * @param tx_pkts
+ *   The address of an array of *nb_pkts* pointers to *rte_mbuf* structures
+ *   which contain the output packets.
+ * @param nb_pkts
+ *   The maximum number of packets to process.
+ * @return
+ *   The number of packets correct and ready to be sent. The return value can be
+ *   less than the value of the *tx_pkts* parameter when some packet doesn't
+ *   meet devices requirements with rte_errno set appropriately.
+ */
+
+#ifdef RTE_ETHDEV_TX_PREP
+
+static inline uint16_t
+rte_eth_tx_prep(uint8_t port_id, uint16_t queue_id, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+
+	if (!dev->tx_pkt_prep)
+		return nb_pkts;
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+	if (queue_id >= dev->data->nb_tx_queues) {
+		RTE_PMD_DEBUG_TRACE("Invalid TX queue_id=%d\n", queue_id);
+		rte_errno = -EINVAL;
+		return 0;
+	}
+#endif
+
+	return (*dev->tx_pkt_prep)(dev->data->tx_queues[queue_id],
+			tx_pkts, nb_pkts);
+}
+
+#else
+
+static inline uint16_t
+rte_eth_tx_prep(uint8_t port_id __rte_unused, uint16_t queue_id __rte_unused,
+		struct rte_mbuf **tx_pkts __rte_unused, uint16_t nb_pkts)
+{
+	return nb_pkts;
+}
+
+#endif
+
 typedef void (*buffer_tx_error_fn)(struct rte_mbuf **unsent, uint16_t count,
 		void *userdata);
 
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index 23b7bf8..8b73261 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -211,6 +211,14 @@ extern "C" {
  */
 #define PKT_TX_OUTER_IPV4   (1ULL << 59)
 
+#define PKT_TX_OFFLOAD_MASK (    \
+		PKT_TX_IP_CKSUM |        \
+		PKT_TX_L4_MASK |         \
+		PKT_TX_OUTER_IP_CKSUM |  \
+		PKT_TX_TCP_SEG |         \
+		PKT_TX_QINQ_PKT |        \
+		PKT_TX_VLAN_PKT)
+
 /**
  * Packet outer header is IPv6. This flag must be set when using any
  * outer offload feature (L4 checksum) to tell the NIC that the outer
diff --git a/lib/librte_net/Makefile b/lib/librte_net/Makefile
index ad2e482..b5abe84 100644
--- a/lib/librte_net/Makefile
+++ b/lib/librte_net/Makefile
@@ -34,7 +34,7 @@ include $(RTE_SDK)/mk/rte.vars.mk
 CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -O3
 
 # install includes
-SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include := rte_ip.h rte_tcp.h rte_udp.h rte_sctp.h rte_icmp.h rte_arp.h
+SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include := rte_ip.h rte_tcp.h rte_udp.h rte_sctp.h rte_icmp.h rte_arp.h rte_pkt.h
 
 
 include $(RTE_SDK)/mk/rte.install.mk
diff --git a/lib/librte_net/rte_pkt.h b/lib/librte_net/rte_pkt.h
new file mode 100644
index 0000000..72903ac
--- /dev/null
+++ b/lib/librte_net/rte_pkt.h
@@ -0,0 +1,133 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2016 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_PKT_H_
+#define _RTE_PKT_H_
+
+#include <rte_ip.h>
+#include <rte_udp.h>
+#include <rte_tcp.h>
+#include <rte_sctp.h>
+
+/**
+ * Validate general requirements for tx offload in packet.
+ */
+static inline int
+rte_validate_tx_offload(struct rte_mbuf *m)
+{
+	uint64_t ol_flags = m->ol_flags;
+
+	/* Does packet set any of available offloads? */
+	if (!(ol_flags & PKT_TX_OFFLOAD_MASK))
+		return 0;
+
+	/* IP checksum can be counted only for IPv4 packet */
+	if ((ol_flags & PKT_TX_IP_CKSUM) && (ol_flags & PKT_TX_IPV6))
+		return -EINVAL;
+
+	if (ol_flags & (PKT_TX_L4_MASK | PKT_TX_TCP_SEG))
+		/* IP type not set */
+		if (!(ol_flags & (PKT_TX_IPV4 | PKT_TX_IPV6)))
+			return -EINVAL;
+
+	if (ol_flags & PKT_TX_TCP_SEG)
+		/* PKT_TX_IP_CKSUM offload not set for IPv4 TSO packet */
+		if ((m->tso_segsz == 0) ||
+				((ol_flags & PKT_TX_IPV4) && !(ol_flags & PKT_TX_IP_CKSUM)))
+			return -EINVAL;
+
+	/* PKT_TX_OUTER_IP_CKSUM set for non outer IPv4 packet. */
+	if ((ol_flags & PKT_TX_OUTER_IP_CKSUM) && !(ol_flags & PKT_TX_OUTER_IPV4))
+		return -EINVAL;
+
+	return 0;
+}
+
+/**
+ * Fix pseudo header checksum for TSO and non-TSO tcp/udp packets before
+ * hardware tx checksum.
+ * For non-TSO tcp/udp packets full pseudo-header checksum is counted and set.
+ * For TSO the IP payload length is not included.
+ */
+static inline int
+rte_phdr_cksum_fix(struct rte_mbuf *m)
+{
+	struct ipv4_hdr *ipv4_hdr;
+	struct ipv6_hdr *ipv6_hdr;
+	struct tcp_hdr *tcp_hdr;
+	struct udp_hdr *udp_hdr;
+	uint64_t inner_l3_offset = m->l2_len;
+
+	if (m->ol_flags & PKT_TX_OUTER_IP_CKSUM)
+		inner_l3_offset += m->outer_l2_len + m->outer_l3_len;
+
+	if (m->ol_flags & PKT_TX_IPV4) {
+		ipv4_hdr = rte_pktmbuf_mtod_offset(m, struct ipv4_hdr *,
+				inner_l3_offset);
+
+		if (m->ol_flags & PKT_TX_IP_CKSUM)
+			ipv4_hdr->hdr_checksum = 0;
+
+		if ((m->ol_flags & PKT_TX_UDP_CKSUM) == PKT_TX_UDP_CKSUM) {
+			/* non-TSO udp */
+			udp_hdr = rte_pktmbuf_mtod_offset(m, struct udp_hdr *,
+					inner_l3_offset + m->l3_len);
+			udp_hdr->dgram_cksum = rte_ipv4_phdr_cksum(ipv4_hdr, m->ol_flags);
+		} else if ((m->ol_flags & PKT_TX_TCP_CKSUM) ||
+				(m->ol_flags & PKT_TX_TCP_SEG)) {
+			/* non-TSO tcp or TSO */
+			tcp_hdr = rte_pktmbuf_mtod_offset(m, struct tcp_hdr *,
+					inner_l3_offset + m->l3_len);
+			tcp_hdr->cksum = rte_ipv4_phdr_cksum(ipv4_hdr, m->ol_flags);
+		}
+	} else if (m->ol_flags & PKT_TX_IPV6) {
+		ipv6_hdr = rte_pktmbuf_mtod_offset(m, struct ipv6_hdr *,
+				inner_l3_offset);
+
+		if ((m->ol_flags & PKT_TX_UDP_CKSUM) == PKT_TX_UDP_CKSUM) {
+			/* non-TSO udp */
+			udp_hdr = rte_pktmbuf_mtod_offset(m, struct udp_hdr *,
+					inner_l3_offset + m->l3_len);
+			udp_hdr->dgram_cksum = rte_ipv6_phdr_cksum(ipv6_hdr, m->ol_flags);
+		} else if ((m->ol_flags & PKT_TX_TCP_CKSUM) ||
+				(m->ol_flags & PKT_TX_TCP_SEG)) {
+			/* non-TSO tcp or TSO */
+			tcp_hdr = rte_pktmbuf_mtod_offset(m, struct tcp_hdr *,
+					inner_l3_offset + m->l3_len);
+			tcp_hdr->cksum = rte_ipv6_phdr_cksum(ipv6_hdr, m->ol_flags);
+		}
+	}
+	return 0;
+}
+
+#endif /* _RTE_PKT_H_ */
-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 261+ messages in thread

* [dpdk-dev] [PATCH v3 2/6] e1000: add Tx preparation
  2016-09-28 11:10   ` [dpdk-dev] [PATCH v3 0/6] add Tx preparation Tomasz Kulasek
  2016-09-28 11:10     ` [dpdk-dev] [PATCH v3 1/6] ethdev: " Tomasz Kulasek
@ 2016-09-28 11:10     ` Tomasz Kulasek
  2016-09-28 11:10     ` [dpdk-dev] [PATCH v3 3/6] fm10k: " Tomasz Kulasek
                       ` (4 subsequent siblings)
  6 siblings, 0 replies; 261+ messages in thread
From: Tomasz Kulasek @ 2016-09-28 11:10 UTC (permalink / raw)
  To: dev; +Cc: konstantin.ananyev, Tomasz Kulasek

Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
---
 drivers/net/e1000/e1000_ethdev.h |   11 ++++++++
 drivers/net/e1000/em_ethdev.c    |    5 +++-
 drivers/net/e1000/em_rxtx.c      |   48 ++++++++++++++++++++++++++++++++++-
 drivers/net/e1000/igb_ethdev.c   |    4 +++
 drivers/net/e1000/igb_rxtx.c     |   52 +++++++++++++++++++++++++++++++++++++-
 5 files changed, 117 insertions(+), 3 deletions(-)

diff --git a/drivers/net/e1000/e1000_ethdev.h b/drivers/net/e1000/e1000_ethdev.h
index 6c25c8d..bd0f277 100644
--- a/drivers/net/e1000/e1000_ethdev.h
+++ b/drivers/net/e1000/e1000_ethdev.h
@@ -138,6 +138,11 @@
 #define E1000_MISC_VEC_ID               RTE_INTR_VEC_ZERO_OFFSET
 #define E1000_RX_VEC_START              RTE_INTR_VEC_RXTX_OFFSET
 
+#define IGB_TX_MAX_SEG     UINT8_MAX
+#define IGB_TX_MAX_MTU_SEG UINT8_MAX
+#define EM_TX_MAX_SEG      UINT8_MAX
+#define EM_TX_MAX_MTU_SEG  UINT8_MAX
+
 /* structure for interrupt relative data */
 struct e1000_interrupt {
 	uint32_t flags;
@@ -315,6 +320,9 @@ void eth_igb_tx_init(struct rte_eth_dev *dev);
 uint16_t eth_igb_xmit_pkts(void *txq, struct rte_mbuf **tx_pkts,
 		uint16_t nb_pkts);
 
+uint16_t eth_igb_prep_pkts(void *txq, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts);
+
 uint16_t eth_igb_recv_pkts(void *rxq, struct rte_mbuf **rx_pkts,
 		uint16_t nb_pkts);
 
@@ -376,6 +384,9 @@ void eth_em_tx_init(struct rte_eth_dev *dev);
 uint16_t eth_em_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		uint16_t nb_pkts);
 
+uint16_t eth_em_prep_pkts(void *txq, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts);
+
 uint16_t eth_em_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
 		uint16_t nb_pkts);
 
diff --git a/drivers/net/e1000/em_ethdev.c b/drivers/net/e1000/em_ethdev.c
index c5bf294..46515d4 100644
--- a/drivers/net/e1000/em_ethdev.c
+++ b/drivers/net/e1000/em_ethdev.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -300,6 +300,7 @@ eth_em_dev_init(struct rte_eth_dev *eth_dev)
 	eth_dev->dev_ops = &eth_em_ops;
 	eth_dev->rx_pkt_burst = (eth_rx_burst_t)&eth_em_recv_pkts;
 	eth_dev->tx_pkt_burst = (eth_tx_burst_t)&eth_em_xmit_pkts;
+	eth_dev->tx_pkt_prep = (eth_tx_prep_t)&eth_em_prep_pkts;
 
 	/* for secondary processes, we don't initialise any further as primary
 	 * has already done this work. Only check we don't need a different
@@ -1073,6 +1074,8 @@ eth_em_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 		.nb_max = E1000_MAX_RING_DESC,
 		.nb_min = E1000_MIN_RING_DESC,
 		.nb_align = EM_TXD_ALIGN,
+		.nb_seg_max = EM_TX_MAX_SEG,
+		.nb_mtu_seg_max = EM_TX_MAX_MTU_SEG,
 	};
 
 	dev_info->speed_capa = ETH_LINK_SPEED_10M_HD | ETH_LINK_SPEED_10M |
diff --git a/drivers/net/e1000/em_rxtx.c b/drivers/net/e1000/em_rxtx.c
index 41f51c0..44009d6 100644
--- a/drivers/net/e1000/em_rxtx.c
+++ b/drivers/net/e1000/em_rxtx.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -66,6 +66,7 @@
 #include <rte_udp.h>
 #include <rte_tcp.h>
 #include <rte_sctp.h>
+#include <rte_pkt.h>
 #include <rte_string_fns.h>
 
 #include "e1000_logs.h"
@@ -77,6 +78,14 @@
 
 #define E1000_RXDCTL_GRAN	0x01000000 /* RXDCTL Granularity */
 
+#define E1000_TX_OFFLOAD_MASK ( \
+		PKT_TX_IP_CKSUM |       \
+		PKT_TX_L4_MASK |        \
+		PKT_TX_VLAN_PKT)
+
+#define E1000_TX_OFFLOAD_NOTSUP_MASK \
+		(PKT_TX_OFFLOAD_MASK ^ E1000_TX_OFFLOAD_MASK)
+
 /**
  * Structure associated with each descriptor of the RX ring of a RX queue.
  */
@@ -618,6 +627,43 @@ end_of_tx:
 
 /*********************************************************************
  *
+ *  TX prep functions
+ *
+ **********************************************************************/
+uint16_t
+eth_em_prep_pkts(void *tx_queue __rte_unused, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts)
+{
+	int i, ret;
+	struct rte_mbuf *m;
+
+	for (i = 0; i < nb_pkts; i++) {
+		m = tx_pkts[i];
+
+		if (m->ol_flags & E1000_TX_OFFLOAD_NOTSUP_MASK) {
+			rte_errno = -EINVAL;
+			return i;
+		}
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+		ret = rte_validate_tx_offload(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+#endif
+		ret = rte_phdr_cksum_fix(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+	}
+
+	return i;
+}
+
+/*********************************************************************
+ *
  *  RX functions
  *
  **********************************************************************/
diff --git a/drivers/net/e1000/igb_ethdev.c b/drivers/net/e1000/igb_ethdev.c
index f7cfa18..5baacd0 100644
--- a/drivers/net/e1000/igb_ethdev.c
+++ b/drivers/net/e1000/igb_ethdev.c
@@ -369,6 +369,8 @@ static const struct rte_eth_desc_lim tx_desc_lim = {
 	.nb_max = E1000_MAX_RING_DESC,
 	.nb_min = E1000_MIN_RING_DESC,
 	.nb_align = IGB_RXD_ALIGN,
+	.nb_seg_max = IGB_TX_MAX_SEG,
+	.nb_mtu_seg_max = IGB_TX_MAX_MTU_SEG,
 };
 
 static const struct eth_dev_ops eth_igb_ops = {
@@ -760,6 +762,7 @@ eth_igb_dev_init(struct rte_eth_dev *eth_dev)
 	eth_dev->dev_ops = &eth_igb_ops;
 	eth_dev->rx_pkt_burst = &eth_igb_recv_pkts;
 	eth_dev->tx_pkt_burst = &eth_igb_xmit_pkts;
+	eth_dev->tx_pkt_prep = &eth_igb_prep_pkts;
 
 	/* for secondary processes, we don't initialise any further as primary
 	 * has already done this work. Only check we don't need a different
@@ -963,6 +966,7 @@ eth_igbvf_dev_init(struct rte_eth_dev *eth_dev)
 	eth_dev->dev_ops = &igbvf_eth_dev_ops;
 	eth_dev->rx_pkt_burst = &eth_igb_recv_pkts;
 	eth_dev->tx_pkt_burst = &eth_igb_xmit_pkts;
+	eth_dev->tx_pkt_prep = &eth_igb_prep_pkts;
 
 	/* for secondary processes, we don't initialise any further as primary
 	 * has already done this work. Only check we don't need a different
diff --git a/drivers/net/e1000/igb_rxtx.c b/drivers/net/e1000/igb_rxtx.c
index a47c640..d466888 100644
--- a/drivers/net/e1000/igb_rxtx.c
+++ b/drivers/net/e1000/igb_rxtx.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -65,6 +65,7 @@
 #include <rte_udp.h>
 #include <rte_tcp.h>
 #include <rte_sctp.h>
+#include <rte_pkt.h>
 #include <rte_string_fns.h>
 
 #include "e1000_logs.h"
@@ -78,6 +79,9 @@
 		PKT_TX_L4_MASK |		 \
 		PKT_TX_TCP_SEG)
 
+#define IGB_TX_OFFLOAD_NOTSUP_MASK \
+		(PKT_TX_OFFLOAD_MASK ^ IGB_TX_OFFLOAD_MASK)
+
 /**
  * Structure associated with each descriptor of the RX ring of a RX queue.
  */
@@ -616,6 +620,51 @@ eth_igb_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 
 /*********************************************************************
  *
+ *  TX prep functions
+ *
+ **********************************************************************/
+uint16_t
+eth_igb_prep_pkts(void *tx_queue __rte_unused, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts)
+{
+	int i, ret;
+	struct rte_mbuf *m;
+
+	for (i = 0; i < nb_pkts; i++) {
+		m = tx_pkts[i];
+
+		/* Check some limitations for TSO in hardware */
+		if (m->ol_flags & PKT_TX_TCP_SEG)
+			if ((m->tso_segsz > IGB_TSO_MAX_MSS) || (m->l2_len + m->l3_len +
+					m->l4_len > IGB_TSO_MAX_HDRLEN)) {
+				rte_errno = -EINVAL;
+				return i;
+			}
+
+		if (m->ol_flags & IGB_TX_OFFLOAD_NOTSUP_MASK) {
+			rte_errno = -EINVAL;
+			return i;
+		}
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+		ret = rte_validate_tx_offload(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+#endif
+		ret = rte_phdr_cksum_fix(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+	}
+
+	return i;
+}
+
+/*********************************************************************
+ *
  *  RX functions
  *
  **********************************************************************/
@@ -1362,6 +1411,7 @@ eth_igb_tx_queue_setup(struct rte_eth_dev *dev,
 
 	igb_reset_tx_queue(txq, dev);
 	dev->tx_pkt_burst = eth_igb_xmit_pkts;
+	dev->tx_pkt_prep = &eth_igb_prep_pkts;
 	dev->data->tx_queues[queue_idx] = txq;
 
 	return 0;
-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 261+ messages in thread

* [dpdk-dev] [PATCH v3 3/6] fm10k: add Tx preparation
  2016-09-28 11:10   ` [dpdk-dev] [PATCH v3 0/6] add Tx preparation Tomasz Kulasek
  2016-09-28 11:10     ` [dpdk-dev] [PATCH v3 1/6] ethdev: " Tomasz Kulasek
  2016-09-28 11:10     ` [dpdk-dev] [PATCH v3 2/6] e1000: " Tomasz Kulasek
@ 2016-09-28 11:10     ` Tomasz Kulasek
  2016-09-28 11:10     ` [dpdk-dev] [PATCH v3 4/6] i40e: " Tomasz Kulasek
                       ` (3 subsequent siblings)
  6 siblings, 0 replies; 261+ messages in thread
From: Tomasz Kulasek @ 2016-09-28 11:10 UTC (permalink / raw)
  To: dev; +Cc: konstantin.ananyev, Tomasz Kulasek

Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
---
 drivers/net/fm10k/fm10k.h        |    9 +++++
 drivers/net/fm10k/fm10k_ethdev.c |    5 +++
 drivers/net/fm10k/fm10k_rxtx.c   |   79 +++++++++++++++++++++++++++++++++++++-
 3 files changed, 92 insertions(+), 1 deletion(-)

diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
index 05aa1a2..83d2bfb 100644
--- a/drivers/net/fm10k/fm10k.h
+++ b/drivers/net/fm10k/fm10k.h
@@ -69,6 +69,9 @@
 #define FM10K_MAX_RX_DESC  (FM10K_MAX_RX_RING_SZ / sizeof(union fm10k_rx_desc))
 #define FM10K_MAX_TX_DESC  (FM10K_MAX_TX_RING_SZ / sizeof(struct fm10k_tx_desc))
 
+#define FM10K_TX_MAX_SEG     UINT8_MAX
+#define FM10K_TX_MAX_MTU_SEG UINT8_MAX
+
 /*
  * byte aligment for HW RX data buffer
  * Datasheet requires RX buffer addresses shall either be 512-byte aligned or
@@ -356,6 +359,12 @@ fm10k_dev_rx_descriptor_done(void *rx_queue, uint16_t offset);
 uint16_t fm10k_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 	uint16_t nb_pkts);
 
+uint16_t fm10k_prep_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
+	uint16_t nb_pkts);
+
+uint16_t fm10k_prep_pkts_simple(void *tx_queue, struct rte_mbuf **tx_pkts,
+	uint16_t nb_pkts);
+
 int fm10k_rxq_vec_setup(struct fm10k_rx_queue *rxq);
 int fm10k_rx_vec_condition_check(struct rte_eth_dev *);
 void fm10k_rx_queue_release_mbufs_vec(struct fm10k_rx_queue *rxq);
diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index 0ecc167..17c45e1 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -1441,6 +1441,8 @@ fm10k_dev_infos_get(struct rte_eth_dev *dev,
 		.nb_max = FM10K_MAX_TX_DESC,
 		.nb_min = FM10K_MIN_TX_DESC,
 		.nb_align = FM10K_MULT_TX_DESC,
+		.nb_seg_max = FM10K_TX_MAX_SEG,
+		.nb_mtu_seg_max = FM10K_TX_MAX_MTU_SEG,
 	};
 
 	dev_info->speed_capa = ETH_LINK_SPEED_1G | ETH_LINK_SPEED_2_5G |
@@ -2749,8 +2751,10 @@ fm10k_set_tx_function(struct rte_eth_dev *dev)
 			fm10k_txq_vec_setup(txq);
 		}
 		dev->tx_pkt_burst = fm10k_xmit_pkts_vec;
+		dev->tx_pkt_prep = fm10k_prep_pkts_simple;
 	} else {
 		dev->tx_pkt_burst = fm10k_xmit_pkts;
+		dev->tx_pkt_prep = fm10k_prep_pkts;
 		PMD_INIT_LOG(DEBUG, "Use regular Tx func");
 	}
 }
@@ -2829,6 +2833,7 @@ eth_fm10k_dev_init(struct rte_eth_dev *dev)
 	dev->dev_ops = &fm10k_eth_dev_ops;
 	dev->rx_pkt_burst = &fm10k_recv_pkts;
 	dev->tx_pkt_burst = &fm10k_xmit_pkts;
+	dev->tx_pkt_prep = &fm10k_prep_pkts;
 
 	/* only initialize in the primary process */
 	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
diff --git a/drivers/net/fm10k/fm10k_rxtx.c b/drivers/net/fm10k/fm10k_rxtx.c
index 5b2d04b..e4c531f 100644
--- a/drivers/net/fm10k/fm10k_rxtx.c
+++ b/drivers/net/fm10k/fm10k_rxtx.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2013-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2013-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -35,6 +35,7 @@
 
 #include <rte_ethdev.h>
 #include <rte_common.h>
+#include <rte_pkt.h>
 #include "fm10k.h"
 #include "base/fm10k_type.h"
 
@@ -65,6 +66,15 @@ static inline void dump_rxd(union fm10k_rx_desc *rxd)
 }
 #endif
 
+#define FM10K_TX_OFFLOAD_MASK (  \
+		PKT_TX_VLAN_PKT |        \
+		PKT_TX_IP_CKSUM |        \
+		PKT_TX_L4_MASK |         \
+		PKT_TX_TCP_SEG)
+
+#define FM10K_TX_OFFLOAD_NOTSUP_MASK \
+		(PKT_TX_OFFLOAD_MASK ^ FM10K_TX_OFFLOAD_MASK)
+
 /* @note: When this function is changed, make corresponding change to
  * fm10k_dev_supported_ptypes_get()
  */
@@ -583,3 +593,70 @@ fm10k_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 
 	return count;
 }
+
+uint16_t
+fm10k_prep_pkts(void *tx_queue __rte_unused, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts)
+{
+	int i, ret;
+	struct rte_mbuf *m;
+
+	for (i = 0; i < nb_pkts; i++) {
+		m = tx_pkts[i];
+
+		if ((m->ol_flags & PKT_TX_TCP_SEG) &&
+				(m->tso_segsz < FM10K_TSO_MINMSS)) {
+			rte_errno = -EINVAL;
+			return i;
+		}
+
+		if (m->ol_flags & FM10K_TX_OFFLOAD_NOTSUP_MASK) {
+			rte_errno = -EINVAL;
+			return i;
+		}
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+		ret = rte_validate_tx_offload(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+#endif
+		ret = rte_phdr_cksum_fix(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+	}
+
+	return i;
+}
+
+/* fm10k vector TX path doesn't support tx offloads */
+uint16_t
+fm10k_prep_pkts_simple(void *tx_queue __rte_unused, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts)
+{
+	int i;
+	struct rte_mbuf *m;
+	uint64_t ol_flags;
+
+	for (i = 0; i < nb_pkts; i++) {
+		m = tx_pkts[i];
+		ol_flags = m->ol_flags;
+
+		/* simple tx path doesn't support multi-segments */
+		if (m->nb_segs != 1) {
+			rte_errno = -EINVAL;
+			return i;
+		}
+
+		/* For simple path no tx offloads are supported */
+		if (ol_flags & PKT_TX_OFFLOAD_MASK) {
+			rte_errno = -EINVAL;
+			return i;
+		}
+	}
+
+	return i;
+}
-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 261+ messages in thread

* [dpdk-dev] [PATCH v3 4/6] i40e: add Tx preparation
  2016-09-28 11:10   ` [dpdk-dev] [PATCH v3 0/6] add Tx preparation Tomasz Kulasek
                       ` (2 preceding siblings ...)
  2016-09-28 11:10     ` [dpdk-dev] [PATCH v3 3/6] fm10k: " Tomasz Kulasek
@ 2016-09-28 11:10     ` Tomasz Kulasek
  2016-09-28 11:10     ` [dpdk-dev] [PATCH v3 5/6] ixgbe: " Tomasz Kulasek
                       ` (2 subsequent siblings)
  6 siblings, 0 replies; 261+ messages in thread
From: Tomasz Kulasek @ 2016-09-28 11:10 UTC (permalink / raw)
  To: dev; +Cc: konstantin.ananyev, Tomasz Kulasek

Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
---
 drivers/net/i40e/i40e_ethdev.c |    3 ++
 drivers/net/i40e/i40e_rxtx.c   |  100 +++++++++++++++++++++++++++++++++++++++-
 drivers/net/i40e/i40e_rxtx.h   |   10 ++++
 3 files changed, 112 insertions(+), 1 deletion(-)

diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
index b04c833..c1ee7e6 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -948,6 +948,7 @@ eth_i40e_dev_init(struct rte_eth_dev *dev)
 	dev->dev_ops = &i40e_eth_dev_ops;
 	dev->rx_pkt_burst = i40e_recv_pkts;
 	dev->tx_pkt_burst = i40e_xmit_pkts;
+	dev->tx_pkt_prep = i40e_prep_pkts;
 
 	/* for secondary processes, we don't initialise any further as primary
 	 * has already done this work. Only check we don't need a different
@@ -2614,6 +2615,8 @@ i40e_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 		.nb_max = I40E_MAX_RING_DESC,
 		.nb_min = I40E_MIN_RING_DESC,
 		.nb_align = I40E_ALIGN_RING_DESC,
+		.nb_seg_max = I40E_TX_MAX_SEG,
+		.nb_mtu_seg_max = I40E_TX_MAX_MTU_SEG,
 	};
 
 	if (pf->flags & I40E_FLAG_VMDQ) {
diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
index 554d167..5d1820c 100644
--- a/drivers/net/i40e/i40e_rxtx.c
+++ b/drivers/net/i40e/i40e_rxtx.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -50,6 +50,8 @@
 #include <rte_tcp.h>
 #include <rte_sctp.h>
 #include <rte_udp.h>
+#include <rte_ip.h>
+#include <rte_pkt.h>
 
 #include "i40e_logs.h"
 #include "base/i40e_prototype.h"
@@ -79,6 +81,17 @@
 		PKT_TX_TCP_SEG |		 \
 		PKT_TX_OUTER_IP_CKSUM)
 
+#define I40E_TX_OFFLOAD_MASK (  \
+		PKT_TX_IP_CKSUM |       \
+		PKT_TX_L4_MASK |        \
+		PKT_TX_OUTER_IP_CKSUM | \
+		PKT_TX_TCP_SEG |        \
+		PKT_TX_QINQ_PKT |       \
+		PKT_TX_VLAN_PKT)
+
+#define I40E_TX_OFFLOAD_NOTSUP_MASK \
+		(PKT_TX_OFFLOAD_MASK ^ I40E_TX_OFFLOAD_MASK)
+
 static uint16_t i40e_xmit_pkts_simple(void *tx_queue,
 				      struct rte_mbuf **tx_pkts,
 				      uint16_t nb_pkts);
@@ -1930,6 +1943,89 @@ i40e_xmit_pkts_simple(void *tx_queue,
 	return nb_tx;
 }
 
+/*********************************************************************
+ *
+ *  TX prep functions
+ *
+ **********************************************************************/
+uint16_t
+i40e_prep_pkts(void *tx_queue __rte_unused, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts)
+{
+	int i, ret;
+	uint64_t ol_flags;
+	struct rte_mbuf *m;
+
+	for (i = 0; i < nb_pkts; i++) {
+		m = tx_pkts[i];
+		ol_flags = m->ol_flags;
+
+		/**
+		 * m->nb_segs is uint8_t, so m->nb_segs is always less than
+		 * I40E_TX_MAX_SEG.
+		 * We check only a condition for m->nb_segs > I40E_TX_MAX_MTU_SEG.
+		 */
+		if (!(ol_flags & PKT_TX_TCP_SEG)) {
+			if (m->nb_segs > I40E_TX_MAX_MTU_SEG) {
+				rte_errno = -1;
+				return i;
+			}
+		} else if ((m->tso_segsz < I40E_MIN_TSO_MSS) ||
+				(m->tso_segsz > I40E_MAX_TSO_MSS)) {
+			/* MSS outside the range (256B - 9674B) are considered malicious */
+			rte_errno = -EINVAL;
+			return i;
+		}
+
+		if (ol_flags & I40E_TX_OFFLOAD_NOTSUP_MASK) {
+			rte_errno = -EINVAL;
+			return i;
+		}
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+		ret = rte_validate_tx_offload(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+#endif
+		ret = rte_phdr_cksum_fix(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+	}
+	return i;
+}
+
+/* i40e simple path doesn't support tx offloads */
+uint16_t
+i40e_prep_pkts_simple(void *tx_queue __rte_unused, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts)
+{
+	int i;
+	uint64_t ol_flags;
+	struct rte_mbuf *m;
+
+	for (i = 0; i < nb_pkts; i++) {
+		m = tx_pkts[i];
+		ol_flags = m->ol_flags;
+
+		/* For simple path (simple and vector) no tx offloads are supported */
+		if (m->nb_segs > I40E_TX_MAX_MTU_SEG) {
+			rte_errno = -1;
+			return i;
+		}
+
+		if (ol_flags & PKT_TX_OFFLOAD_MASK) {
+			rte_errno = -1;
+			return i;
+		}
+	}
+
+	return i;
+}
+
 /*
  * Find the VSI the queue belongs to. 'queue_idx' is the queue index
  * application used, which assume having sequential ones. But from driver's
@@ -3271,9 +3367,11 @@ i40e_set_tx_function(struct rte_eth_dev *dev)
 			PMD_INIT_LOG(DEBUG, "Simple tx finally be used.");
 			dev->tx_pkt_burst = i40e_xmit_pkts_simple;
 		}
+		dev->tx_pkt_prep = i40e_prep_pkts_simple;
 	} else {
 		PMD_INIT_LOG(DEBUG, "Xmit tx finally be used.");
 		dev->tx_pkt_burst = i40e_xmit_pkts;
+		dev->tx_pkt_prep = i40e_prep_pkts;
 	}
 }
 
diff --git a/drivers/net/i40e/i40e_rxtx.h b/drivers/net/i40e/i40e_rxtx.h
index 98179f0..2ff7862 100644
--- a/drivers/net/i40e/i40e_rxtx.h
+++ b/drivers/net/i40e/i40e_rxtx.h
@@ -63,6 +63,12 @@
 #define	I40E_MIN_RING_DESC	64
 #define	I40E_MAX_RING_DESC	4096
 
+#define I40E_MIN_TSO_MSS          256
+#define I40E_MAX_TSO_MSS          9674
+
+#define I40E_TX_MAX_SEG     UINT8_MAX
+#define I40E_TX_MAX_MTU_SEG 8
+
 #undef container_of
 #define container_of(ptr, type, member) ({ \
 		typeof(((type *)0)->member)(*__mptr) = (ptr); \
@@ -223,6 +229,10 @@ uint16_t i40e_recv_scattered_pkts(void *rx_queue,
 uint16_t i40e_xmit_pkts(void *tx_queue,
 			struct rte_mbuf **tx_pkts,
 			uint16_t nb_pkts);
+uint16_t i40e_prep_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts);
+uint16_t i40e_prep_pkts_simple(void *tx_queue, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts);
 int i40e_tx_queue_init(struct i40e_tx_queue *txq);
 int i40e_rx_queue_init(struct i40e_rx_queue *rxq);
 void i40e_free_tx_resources(struct i40e_tx_queue *txq);
-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 261+ messages in thread

* [dpdk-dev] [PATCH v3 5/6] ixgbe: add Tx preparation
  2016-09-28 11:10   ` [dpdk-dev] [PATCH v3 0/6] add Tx preparation Tomasz Kulasek
                       ` (3 preceding siblings ...)
  2016-09-28 11:10     ` [dpdk-dev] [PATCH v3 4/6] i40e: " Tomasz Kulasek
@ 2016-09-28 11:10     ` Tomasz Kulasek
  2016-09-29 11:09       ` Ananyev, Konstantin
  2016-09-28 11:10     ` [dpdk-dev] [PATCH v3 6/6] testpmd: use Tx preparation in csum engine Tomasz Kulasek
  2016-09-30  9:00     ` [dpdk-dev] [PATCH v4 0/6] add Tx preparation Tomasz Kulasek
  6 siblings, 1 reply; 261+ messages in thread
From: Tomasz Kulasek @ 2016-09-28 11:10 UTC (permalink / raw)
  To: dev; +Cc: konstantin.ananyev, Tomasz Kulasek

Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
---
 drivers/net/ixgbe/ixgbe_ethdev.c |    3 ++
 drivers/net/ixgbe/ixgbe_ethdev.h |    8 +++-
 drivers/net/ixgbe/ixgbe_rxtx.c   |   85 +++++++++++++++++++++++++++++++++++++-
 drivers/net/ixgbe/ixgbe_rxtx.h   |    2 +
 4 files changed, 96 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
index 73a406b..fa6f045 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -515,6 +515,8 @@ static const struct rte_eth_desc_lim tx_desc_lim = {
 	.nb_max = IXGBE_MAX_RING_DESC,
 	.nb_min = IXGBE_MIN_RING_DESC,
 	.nb_align = IXGBE_TXD_ALIGN,
+	.nb_seg_max = IXGBE_TX_MAX_SEG,
+	.nb_mtu_seg_max = IXGBE_TX_MAX_SEG,
 };
 
 static const struct eth_dev_ops ixgbe_eth_dev_ops = {
@@ -1101,6 +1103,7 @@ eth_ixgbe_dev_init(struct rte_eth_dev *eth_dev)
 	eth_dev->dev_ops = &ixgbe_eth_dev_ops;
 	eth_dev->rx_pkt_burst = &ixgbe_recv_pkts;
 	eth_dev->tx_pkt_burst = &ixgbe_xmit_pkts;
+	eth_dev->tx_pkt_prep = &ixgbe_prep_pkts;
 
 	/*
 	 * For secondary processes, we don't initialise any further as primary
diff --git a/drivers/net/ixgbe/ixgbe_ethdev.h b/drivers/net/ixgbe/ixgbe_ethdev.h
index 4ff6338..09d96de 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.h
+++ b/drivers/net/ixgbe/ixgbe_ethdev.h
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -396,6 +396,12 @@ uint16_t ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 uint16_t ixgbe_xmit_pkts_simple(void *tx_queue, struct rte_mbuf **tx_pkts,
 		uint16_t nb_pkts);
 
+uint16_t ixgbe_prep_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts);
+
+uint16_t ixgbe_prep_pkts_simple(void *tx_queue, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts);
+
 int ixgbe_dev_rss_hash_update(struct rte_eth_dev *dev,
 			      struct rte_eth_rss_conf *rss_conf);
 
diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
index 8b99282..2489db4 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   Copyright 2014 6WIND S.A.
  *   All rights reserved.
  *
@@ -70,6 +70,7 @@
 #include <rte_string_fns.h>
 #include <rte_errno.h>
 #include <rte_ip.h>
+#include <rte_pkt.h>
 
 #include "ixgbe_logs.h"
 #include "base/ixgbe_api.h"
@@ -87,6 +88,9 @@
 		PKT_TX_TCP_SEG |		 \
 		PKT_TX_OUTER_IP_CKSUM)
 
+#define IXGBE_TX_OFFLOAD_NOTSUP_MASK \
+		(PKT_TX_OFFLOAD_MASK ^ IXGBE_TX_OFFLOAD_MASK)
+
 #if 1
 #define RTE_PMD_USE_PREFETCH
 #endif
@@ -905,6 +909,83 @@ end_of_tx:
 
 /*********************************************************************
  *
+ *  TX prep functions
+ *
+ **********************************************************************/
+uint16_t
+ixgbe_prep_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
+{
+	int i, ret;
+	struct rte_mbuf *m;
+	struct ixgbe_tx_queue *txq = (struct ixgbe_tx_queue *)tx_queue;
+
+	for (i = 0; i < nb_pkts; i++) {
+		m = tx_pkts[i];
+
+		/**
+		 * Check if packet meets requirements for number of segments
+		 *
+		 * NOTE: for ixgbe it's always (40 - WTHRESH) for both TSO and non-TSO
+		 */
+
+		if (m->nb_segs > IXGBE_TX_MAX_SEG - txq->wthresh) {
+			rte_errno = -EINVAL;
+			return i;
+		}
+
+		if (m->ol_flags & IXGBE_TX_OFFLOAD_NOTSUP_MASK) {
+			rte_errno = -EINVAL;
+			return i;
+		}
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+		ret = rte_validate_tx_offload(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+#endif
+		ret = rte_phdr_cksum_fix(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+	}
+
+	return i;
+}
+
+/* ixgbe simple path as well as vector TX doesn't support tx offloads */
+uint16_t
+ixgbe_prep_pkts_simple(void *tx_queue __rte_unused, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts)
+{
+	int i;
+	struct rte_mbuf *m;
+	uint64_t ol_flags;
+
+	for (i = 0; i < nb_pkts; i++) {
+		m = tx_pkts[i];
+		ol_flags = m->ol_flags;
+
+		/* simple tx path doesn't support multi-segments */
+		if (m->nb_segs != 1) {
+			rte_errno = -EINVAL;
+			return i;
+		}
+
+		/* For simple path (simple and vector) no tx offloads are supported */
+		if (ol_flags & PKT_TX_OFFLOAD_MASK) {
+			rte_errno = -EINVAL;
+			return i;
+		}
+	}
+
+	return i;
+}
+
+/*********************************************************************
+ *
  *  RX functions
  *
  **********************************************************************/
@@ -2280,6 +2361,7 @@ ixgbe_set_tx_function(struct rte_eth_dev *dev, struct ixgbe_tx_queue *txq)
 	if (((txq->txq_flags & IXGBE_SIMPLE_FLAGS) == IXGBE_SIMPLE_FLAGS)
 			&& (txq->tx_rs_thresh >= RTE_PMD_IXGBE_TX_MAX_BURST)) {
 		PMD_INIT_LOG(DEBUG, "Using simple tx code path");
+		dev->tx_pkt_prep = ixgbe_prep_pkts_simple;
 #ifdef RTE_IXGBE_INC_VECTOR
 		if (txq->tx_rs_thresh <= RTE_IXGBE_TX_MAX_FREE_BUF_SZ &&
 				(rte_eal_process_type() != RTE_PROC_PRIMARY ||
@@ -2300,6 +2382,7 @@ ixgbe_set_tx_function(struct rte_eth_dev *dev, struct ixgbe_tx_queue *txq)
 				(unsigned long)txq->tx_rs_thresh,
 				(unsigned long)RTE_PMD_IXGBE_TX_MAX_BURST);
 		dev->tx_pkt_burst = ixgbe_xmit_pkts;
+		dev->tx_pkt_prep = ixgbe_prep_pkts;
 	}
 }
 
diff --git a/drivers/net/ixgbe/ixgbe_rxtx.h b/drivers/net/ixgbe/ixgbe_rxtx.h
index 2608b36..7bbd9b8 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx.h
+++ b/drivers/net/ixgbe/ixgbe_rxtx.h
@@ -80,6 +80,8 @@
 #define RTE_IXGBE_WAIT_100_US               100
 #define RTE_IXGBE_VMTXSW_REGISTER_COUNT     2
 
+#define IXGBE_TX_MAX_SEG                    40
+
 #define IXGBE_PACKET_TYPE_MASK_82599        0X7F
 #define IXGBE_PACKET_TYPE_MASK_X550         0X10FF
 #define IXGBE_PACKET_TYPE_MASK_TUNNEL       0XFF
-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 261+ messages in thread

* [dpdk-dev] [PATCH v3 6/6] testpmd: use Tx preparation in csum engine
  2016-09-28 11:10   ` [dpdk-dev] [PATCH v3 0/6] add Tx preparation Tomasz Kulasek
                       ` (4 preceding siblings ...)
  2016-09-28 11:10     ` [dpdk-dev] [PATCH v3 5/6] ixgbe: " Tomasz Kulasek
@ 2016-09-28 11:10     ` Tomasz Kulasek
  2016-09-30  9:00     ` [dpdk-dev] [PATCH v4 0/6] add Tx preparation Tomasz Kulasek
  6 siblings, 0 replies; 261+ messages in thread
From: Tomasz Kulasek @ 2016-09-28 11:10 UTC (permalink / raw)
  To: dev; +Cc: konstantin.ananyev, Tomasz Kulasek

Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
---
 app/test-pmd/csumonly.c |   97 ++++++++++++++++++++++++++---------------------
 1 file changed, 54 insertions(+), 43 deletions(-)

diff --git a/app/test-pmd/csumonly.c b/app/test-pmd/csumonly.c
index 21cb78f..8fcf814 100644
--- a/app/test-pmd/csumonly.c
+++ b/app/test-pmd/csumonly.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   Copyright 2014 6WIND S.A.
  *   All rights reserved.
  *
@@ -110,15 +110,6 @@ struct simple_gre_hdr {
 } __attribute__((__packed__));
 
 static uint16_t
-get_psd_sum(void *l3_hdr, uint16_t ethertype, uint64_t ol_flags)
-{
-	if (ethertype == _htons(ETHER_TYPE_IPv4))
-		return rte_ipv4_phdr_cksum(l3_hdr, ol_flags);
-	else /* assume ethertype == ETHER_TYPE_IPv6 */
-		return rte_ipv6_phdr_cksum(l3_hdr, ol_flags);
-}
-
-static uint16_t
 get_udptcp_checksum(void *l3_hdr, void *l4_hdr, uint16_t ethertype)
 {
 	if (ethertype == _htons(ETHER_TYPE_IPv4))
@@ -368,11 +359,9 @@ process_inner_cksums(void *l3_hdr, const struct testpmd_offload_info *info,
 		/* do not recalculate udp cksum if it was 0 */
 		if (udp_hdr->dgram_cksum != 0) {
 			udp_hdr->dgram_cksum = 0;
-			if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_UDP_CKSUM) {
+			if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_UDP_CKSUM)
 				ol_flags |= PKT_TX_UDP_CKSUM;
-				udp_hdr->dgram_cksum = get_psd_sum(l3_hdr,
-					info->ethertype, ol_flags);
-			} else {
+			else {
 				udp_hdr->dgram_cksum =
 					get_udptcp_checksum(l3_hdr, udp_hdr,
 						info->ethertype);
@@ -381,15 +370,11 @@ process_inner_cksums(void *l3_hdr, const struct testpmd_offload_info *info,
 	} else if (info->l4_proto == IPPROTO_TCP) {
 		tcp_hdr = (struct tcp_hdr *)((char *)l3_hdr + info->l3_len);
 		tcp_hdr->cksum = 0;
-		if (info->tso_segsz != 0) {
+		if (info->tso_segsz != 0)
 			ol_flags |= PKT_TX_TCP_SEG;
-			tcp_hdr->cksum = get_psd_sum(l3_hdr, info->ethertype,
-				ol_flags);
-		} else if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_TCP_CKSUM) {
+		else if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_TCP_CKSUM)
 			ol_flags |= PKT_TX_TCP_CKSUM;
-			tcp_hdr->cksum = get_psd_sum(l3_hdr, info->ethertype,
-				ol_flags);
-		} else {
+		else {
 			tcp_hdr->cksum =
 				get_udptcp_checksum(l3_hdr, tcp_hdr,
 					info->ethertype);
@@ -639,7 +624,8 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 	void *l3_hdr = NULL, *outer_l3_hdr = NULL; /* can be IPv4 or IPv6 */
 	uint16_t nb_rx;
 	uint16_t nb_tx;
-	uint16_t i;
+	uint16_t nb_prep;
+	uint16_t i, n;
 	uint64_t ol_flags;
 	uint16_t testpmd_ol_flags;
 	uint32_t retry;
@@ -847,31 +833,56 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 			printf("\n");
 		}
 	}
-	nb_tx = rte_eth_tx_burst(fs->tx_port, fs->tx_queue, pkts_burst, nb_rx);
-	/*
-	 * Retry if necessary
-	 */
-	if (unlikely(nb_tx < nb_rx) && fs->retry_enabled) {
-		retry = 0;
-		while (nb_tx < nb_rx && retry++ < burst_tx_retry_num) {
-			rte_delay_us(burst_tx_delay_time);
-			nb_tx += rte_eth_tx_burst(fs->tx_port, fs->tx_queue,
-					&pkts_burst[nb_tx], nb_rx - nb_tx);
+
+	n = 0;
+
+	do {
+		nb_prep = rte_eth_tx_prep(fs->tx_port, fs->tx_queue, &pkts_burst[n],
+				nb_rx - n);
+
+		if (nb_prep != nb_rx - n) {
+			printf("Preparing packet burst to transmit failed: %s\n",
+					rte_strerror(rte_errno));
+			/* Drop malicious packet */
+			rte_pktmbuf_free(pkts_burst[n + nb_prep]);
+			fs->fwd_dropped++;
+		}
+
+		nb_tx = rte_eth_tx_burst(fs->tx_port, fs->tx_queue, &pkts_burst[n],
+				nb_prep);
+
+		/*
+		 * Retry if necessary
+		 */
+		if (unlikely(nb_tx < nb_prep) && fs->retry_enabled) {
+			retry = 0;
+			while ((nb_tx < nb_prep) && (retry++ < burst_tx_retry_num)) {
+				rte_delay_us(burst_tx_delay_time);
+				nb_tx += rte_eth_tx_burst(fs->tx_port, fs->tx_queue,
+						&pkts_burst[nb_tx + n], nb_prep - nb_tx);
+			}
 		}
-	}
-	fs->tx_packets += nb_tx;
-	fs->rx_bad_ip_csum += rx_bad_ip_csum;
-	fs->rx_bad_l4_csum += rx_bad_l4_csum;
+
+		fs->tx_packets += nb_tx;
 
 #ifdef RTE_TEST_PMD_RECORD_BURST_STATS
-	fs->tx_burst_stats.pkt_burst_spread[nb_tx]++;
+		fs->tx_burst_stats.pkt_burst_spread[nb_tx]++;
 #endif
-	if (unlikely(nb_tx < nb_rx)) {
-		fs->fwd_dropped += (nb_rx - nb_tx);
-		do {
-			rte_pktmbuf_free(pkts_burst[nb_tx]);
-		} while (++nb_tx < nb_rx);
-	}
+		if (unlikely(nb_tx < nb_prep)) {
+			fs->fwd_dropped += (nb_prep - nb_tx);
+			do {
+				rte_pktmbuf_free(pkts_burst[nb_tx]);
+			} while (++nb_tx < nb_prep);
+		}
+
+		/* If tx_prep failed, skip malicious packet */
+		n += (nb_prep + 1);
+
+	} while (n < nb_rx);
+
+	fs->rx_bad_ip_csum += rx_bad_ip_csum;
+	fs->rx_bad_l4_csum += rx_bad_l4_csum;
+
 #ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES
 	end_tsc = rte_rdtsc();
 	core_cycles = (end_tsc - start_tsc);
-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v3 1/6] ethdev: add Tx preparation
  2016-09-28 11:10     ` [dpdk-dev] [PATCH v3 1/6] ethdev: " Tomasz Kulasek
@ 2016-09-29 10:40       ` Ananyev, Konstantin
  2016-09-29 13:04         ` Kulasek, TomaszX
  0 siblings, 1 reply; 261+ messages in thread
From: Ananyev, Konstantin @ 2016-09-29 10:40 UTC (permalink / raw)
  To: Kulasek, TomaszX, dev

Hi Tomasz,

> -----Original Message-----
> From: Kulasek, TomaszX
> Sent: Wednesday, September 28, 2016 12:11 PM
> To: dev@dpdk.org
> Cc: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Kulasek, TomaszX <tomaszx.kulasek@intel.com>
> Subject: [PATCH v3 1/6] ethdev: add Tx preparation
> 
> Added API for `rte_eth_tx_prep`
> 
> uint16_t rte_eth_tx_prep(uint8_t port_id, uint16_t queue_id,
> 	struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
> 
> Added fields to the `struct rte_eth_desc_lim`:
> 
> 	uint16_t nb_seg_max;
> 		/**< Max number of segments per whole packet. */
> 
> 	uint16_t nb_mtu_seg_max;
> 		/**< Max number of segments per one MTU */
> 
> Created the `rte_pkt.h` header with commonly used functions:
> 
> int rte_validate_tx_offload(struct rte_mbuf *m)
> 	to validate general requirements for tx offloads in a packet, such
> 	as flag completeness. In the current implementation this function
> 	is called optionally, when RTE_LIBRTE_ETHDEV_DEBUG is enabled.
> 
> int rte_phdr_cksum_fix(struct rte_mbuf *m)
> 	to fix pseudo header checksum for TSO and non-TSO tcp/udp packets
> 	before hardware tx checksum offload.
> 	 - for non-TSO tcp/udp packets the full pseudo-header checksum is
> 	   computed and set.
> 	 - for TSO the IP payload length is not included.
> 
> Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
> ---
>  config/common_base            |    1 +
>  lib/librte_ether/rte_ethdev.h |   85 ++++++++++++++++++++++++++
>  lib/librte_mbuf/rte_mbuf.h    |    8 +++
>  lib/librte_net/Makefile       |    2 +-
>  lib/librte_net/rte_pkt.h      |  133 +++++++++++++++++++++++++++++++++++++++++
>  5 files changed, 228 insertions(+), 1 deletion(-)  create mode 100644 lib/librte_net/rte_pkt.h
> 

....

> diff --git a/lib/librte_net/rte_pkt.h b/lib/librte_net/rte_pkt.h new file mode 100644 index 0000000..72903ac
> --- /dev/null
> +++ b/lib/librte_net/rte_pkt.h
> @@ -0,0 +1,133 @@
> +/*-
> + *   BSD LICENSE
> + *
> + *   Copyright(c) 2016 Intel Corporation. All rights reserved.
> + *   All rights reserved.
> + *
> + *   Redistribution and use in source and binary forms, with or without
> + *   modification, are permitted provided that the following conditions
> + *   are met:
> + *
> + *     * Redistributions of source code must retain the above copyright
> + *       notice, this list of conditions and the following disclaimer.
> + *     * Redistributions in binary form must reproduce the above copyright
> + *       notice, this list of conditions and the following disclaimer in
> + *       the documentation and/or other materials provided with the
> + *       distribution.
> + *     * Neither the name of Intel Corporation nor the names of its
> + *       contributors may be used to endorse or promote products derived
> + *       from this software without specific prior written permission.
> + *
> + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> + */
> +
> +#ifndef _RTE_PKT_H_
> +#define _RTE_PKT_H_
> +
> +#include <rte_ip.h>
> +#include <rte_udp.h>
> +#include <rte_tcp.h>
> +#include <rte_sctp.h>
> +
> +/**
> + * Validate general requirements for tx offload in packet.
> + */
> +static inline int
> +rte_validate_tx_offload(struct rte_mbuf *m) {
> +	uint64_t ol_flags = m->ol_flags;
> +
> +	/* Does packet set any of available offloads? */
> +	if (!(ol_flags & PKT_TX_OFFLOAD_MASK))
> +		return 0;
> +
> +	/* IP checksum can be counted only for IPv4 packet */
> +	if ((ol_flags & PKT_TX_IP_CKSUM) && (ol_flags & PKT_TX_IPV6))
> +		return -EINVAL;
> +
> +	if (ol_flags & (PKT_TX_L4_MASK | PKT_TX_TCP_SEG))

Not sure what you are trying to test here.
Is it that PKT_TX_TCP_SEG is set?
If so, the test condition doesn't look correct to me.

> +		/* IP type not set */
> +		if (!(ol_flags & (PKT_TX_IPV4 | PKT_TX_IPV6)))
> +			return -EINVAL;
> +
> +	if (ol_flags & PKT_TX_TCP_SEG)
> +		/* PKT_TX_IP_CKSUM offload not set for IPv4 TSO packet */
> +		if ((m->tso_segsz == 0) ||
> +				((ol_flags & PKT_TX_IPV4) && !(ol_flags & PKT_TX_IP_CKSUM)))
> +			return -EINVAL;
> +

Why not just:

	if (ol_flags & (PKT_TX_L4_MASK | PKT_TX_TCP_SEG)) {

		uint64_t f = ol_flags &
			(PKT_TX_IPV4 | PKT_TX_IPV6 | PKT_TX_IP_CKSUM);

		if ((f & (PKT_TX_IPV4 | PKT_TX_IPV6)) == 0 ||
				((ol_flags & PKT_TX_TCP_SEG) &&
				(f == PKT_TX_IPV4 || m->tso_segsz == 0)))
			return -EINVAL;
	}

instead of the two separate ifs around TCP_SEG above?

Konstantin

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v3 5/6] ixgbe: add Tx preparation
  2016-09-28 11:10     ` [dpdk-dev] [PATCH v3 5/6] ixgbe: " Tomasz Kulasek
@ 2016-09-29 11:09       ` Ananyev, Konstantin
  2016-09-29 15:12         ` Kulasek, TomaszX
  0 siblings, 1 reply; 261+ messages in thread
From: Ananyev, Konstantin @ 2016-09-29 11:09 UTC (permalink / raw)
  To: Kulasek, TomaszX, dev

Hi Tomasz,

> Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
> ---
>  drivers/net/ixgbe/ixgbe_ethdev.c |    3 ++
>  drivers/net/ixgbe/ixgbe_ethdev.h |    8 +++-
>  drivers/net/ixgbe/ixgbe_rxtx.c   |   85 +++++++++++++++++++++++++++++++++++++-
>  drivers/net/ixgbe/ixgbe_rxtx.h   |    2 +
>  4 files changed, 96 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
> index 73a406b..fa6f045 100644
> --- a/drivers/net/ixgbe/ixgbe_ethdev.c
> +++ b/drivers/net/ixgbe/ixgbe_ethdev.c
> @@ -515,6 +515,8 @@ static const struct rte_eth_desc_lim tx_desc_lim = {
>  	.nb_max = IXGBE_MAX_RING_DESC,
>  	.nb_min = IXGBE_MIN_RING_DESC,
>  	.nb_align = IXGBE_TXD_ALIGN,
> +	.nb_seg_max = IXGBE_TX_MAX_SEG,
> +	.nb_mtu_seg_max = IXGBE_TX_MAX_SEG,
>  };
> 
>  static const struct eth_dev_ops ixgbe_eth_dev_ops = { @@ -1101,6 +1103,7 @@ eth_ixgbe_dev_init(struct rte_eth_dev *eth_dev)
>  	eth_dev->dev_ops = &ixgbe_eth_dev_ops;
>  	eth_dev->rx_pkt_burst = &ixgbe_recv_pkts;
>  	eth_dev->tx_pkt_burst = &ixgbe_xmit_pkts;
> +	eth_dev->tx_pkt_prep = &ixgbe_prep_pkts;
> 
>  	/*
>  	 * For secondary processes, we don't initialise any further as primary diff --git a/drivers/net/ixgbe/ixgbe_ethdev.h
> b/drivers/net/ixgbe/ixgbe_ethdev.h
> index 4ff6338..09d96de 100644
> --- a/drivers/net/ixgbe/ixgbe_ethdev.h
> +++ b/drivers/net/ixgbe/ixgbe_ethdev.h
> @@ -1,7 +1,7 @@
>  /*-
>   *   BSD LICENSE
>   *
> - *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
> + *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
>   *   All rights reserved.
>   *
>   *   Redistribution and use in source and binary forms, with or without
> @@ -396,6 +396,12 @@ uint16_t ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,  uint16_t
> ixgbe_xmit_pkts_simple(void *tx_queue, struct rte_mbuf **tx_pkts,
>  		uint16_t nb_pkts);
> 
> +uint16_t ixgbe_prep_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
> +		uint16_t nb_pkts);
> +
> +uint16_t ixgbe_prep_pkts_simple(void *tx_queue, struct rte_mbuf **tx_pkts,
> +		uint16_t nb_pkts);
> +
>  int ixgbe_dev_rss_hash_update(struct rte_eth_dev *dev,
>  			      struct rte_eth_rss_conf *rss_conf);
> 
> diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c index 8b99282..2489db4 100644
> --- a/drivers/net/ixgbe/ixgbe_rxtx.c
> +++ b/drivers/net/ixgbe/ixgbe_rxtx.c
> @@ -1,7 +1,7 @@
>  /*-
>   *   BSD LICENSE
>   *
> - *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
> + *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
>   *   Copyright 2014 6WIND S.A.
>   *   All rights reserved.
>   *
> @@ -70,6 +70,7 @@
>  #include <rte_string_fns.h>
>  #include <rte_errno.h>
>  #include <rte_ip.h>
> +#include <rte_pkt.h>
> 
>  #include "ixgbe_logs.h"
>  #include "base/ixgbe_api.h"
> @@ -87,6 +88,9 @@
>  		PKT_TX_TCP_SEG |		 \
>  		PKT_TX_OUTER_IP_CKSUM)
> 
> +#define IXGBE_TX_OFFLOAD_NOTSUP_MASK \
> +		(PKT_TX_OFFLOAD_MASK ^ IXGBE_TX_OFFLOAD_MASK)
> +
>  #if 1
>  #define RTE_PMD_USE_PREFETCH
>  #endif
> @@ -905,6 +909,83 @@ end_of_tx:
> 
>  /*********************************************************************
>   *
> + *  TX prep functions
> + *
> +
> +**********************************************************************/
> +uint16_t
> +ixgbe_prep_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t
> +nb_pkts) {
> +	int i, ret;
> +	struct rte_mbuf *m;
> +	struct ixgbe_tx_queue *txq = (struct ixgbe_tx_queue *)tx_queue;
> +
> +	for (i = 0; i < nb_pkts; i++) {
> +		m = tx_pkts[i];
> +
> +		/**
> +		 * Check if packet meets requirements for number of segments
> +		 *
> +		 * NOTE: for ixgbe it's always (40 - WTHRESH) for both TSO and non-TSO
> +		 */
> +
> +		if (m->nb_segs > IXGBE_TX_MAX_SEG - txq->wthresh) {
> +			rte_errno = -EINVAL;
> +			return i;
> +		}
> +
> +		if (m->ol_flags & IXGBE_TX_OFFLOAD_NOTSUP_MASK) {
> +			rte_errno = -EINVAL;
> +			return i;
> +		}
> +
> +#ifdef RTE_LIBRTE_ETHDEV_DEBUG
> +		ret = rte_validate_tx_offload(m);
> +		if (ret != 0) {
> +			rte_errno = ret;
> +			return i;
> +		}
> +#endif
> +		ret = rte_phdr_cksum_fix(m);
> +		if (ret != 0) {
> +			rte_errno = ret;
> +			return i;
> +		}
> +	}
> +
> +	return i;
> +}
> +
> +/* ixgbe simple path as well as vector TX doesn't support tx offloads */
> +uint16_t
> +ixgbe_prep_pkts_simple(void *tx_queue __rte_unused, struct rte_mbuf **tx_pkts,
> +		uint16_t nb_pkts)
> +{
> +	int i;
> +	struct rte_mbuf *m;
> +	uint64_t ol_flags;
> +
> +	for (i = 0; i < nb_pkts; i++) {
> +		m = tx_pkts[i];
> +		ol_flags = m->ol_flags;
> +
> +		/* simple tx path doesn't support multi-segments */
> +		if (m->nb_segs != 1) {
> +			rte_errno = -EINVAL;
> +			return i;
> +		}
> +
> +		/* For simple path (simple and vector) no tx offloads are supported */
> +		if (ol_flags & PKT_TX_OFFLOAD_MASK) {
> +			rte_errno = -EINVAL;
> +			return i;
> +		}
> +	}
> +
> +	return i;
> +}

Just thought about it once again:
As inside rte_eth_tx_prep() we now do:
+
+	if (!dev->tx_pkt_prep)
+		return nb_pkts;

Then there might be a better approach to set 
dev->tx_pkt_prep = NULL
for simple and vector TX functions?

After all, prep_simple() does nothing but return an error if conditions are not met.
And if simple TX was already selected, then that means that the user deliberately disabled all
HW TX offloads in favor of faster TX and there is no point in slowing him down with extra checks here.
Same for i40e and fm10k.
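
E.g. a rough sketch for ixgbe_set_tx_function() (untested, just to
illustrate the idea - names as in your patch):

	if ((txq->txq_flags & IXGBE_SIMPLE_FLAGS) == IXGBE_SIMPLE_FLAGS &&
			txq->tx_rs_thresh >= RTE_PMD_IXGBE_TX_MAX_BURST) {
		/* simple/vector path: no TX offloads, skip prep entirely */
		dev->tx_pkt_prep = NULL;
	} else {
		/* full-featured path: keep the per-packet checks */
		dev->tx_pkt_prep = ixgbe_prep_pkts;
	}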
What is your opinion?

Konstantin

> +
> +/*********************************************************************
> + *
>   *  RX functions
>   *
>   **********************************************************************/
> @@ -2280,6 +2361,7 @@ ixgbe_set_tx_function(struct rte_eth_dev *dev, struct ixgbe_tx_queue *txq)
>  	if (((txq->txq_flags & IXGBE_SIMPLE_FLAGS) == IXGBE_SIMPLE_FLAGS)
>  			&& (txq->tx_rs_thresh >= RTE_PMD_IXGBE_TX_MAX_BURST)) {
>  		PMD_INIT_LOG(DEBUG, "Using simple tx code path");
> +		dev->tx_pkt_prep = ixgbe_prep_pkts_simple;
>  #ifdef RTE_IXGBE_INC_VECTOR
>  		if (txq->tx_rs_thresh <= RTE_IXGBE_TX_MAX_FREE_BUF_SZ &&
>  				(rte_eal_process_type() != RTE_PROC_PRIMARY ||
> @@ -2300,6 +2382,7 @@ ixgbe_set_tx_function(struct rte_eth_dev *dev, struct ixgbe_tx_queue *txq)
>  				(unsigned long)txq->tx_rs_thresh,
>  				(unsigned long)RTE_PMD_IXGBE_TX_MAX_BURST);
>  		dev->tx_pkt_burst = ixgbe_xmit_pkts;
> +		dev->tx_pkt_prep = ixgbe_prep_pkts;
>  	}
>  }
> 
> diff --git a/drivers/net/ixgbe/ixgbe_rxtx.h b/drivers/net/ixgbe/ixgbe_rxtx.h
> index 2608b36..7bbd9b8 100644
> --- a/drivers/net/ixgbe/ixgbe_rxtx.h
> +++ b/drivers/net/ixgbe/ixgbe_rxtx.h
> @@ -80,6 +80,8 @@
>  #define RTE_IXGBE_WAIT_100_US               100
>  #define RTE_IXGBE_VMTXSW_REGISTER_COUNT     2
> 
> +#define IXGBE_TX_MAX_SEG                    40
> +
>  #define IXGBE_PACKET_TYPE_MASK_82599        0X7F
>  #define IXGBE_PACKET_TYPE_MASK_X550         0X10FF
>  #define IXGBE_PACKET_TYPE_MASK_TUNNEL       0XFF
> --
> 1.7.9.5

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v3 1/6] ethdev: add Tx preparation
  2016-09-29 10:40       ` Ananyev, Konstantin
@ 2016-09-29 13:04         ` Kulasek, TomaszX
  2016-09-29 13:57           ` Ananyev, Konstantin
  0 siblings, 1 reply; 261+ messages in thread
From: Kulasek, TomaszX @ 2016-09-29 13:04 UTC (permalink / raw)
  To: Ananyev, Konstantin, dev

Hi Konstantin,

> -----Original Message-----
> From: Ananyev, Konstantin
> Sent: Thursday, September 29, 2016 12:41
> To: Kulasek, TomaszX <tomaszx.kulasek@intel.com>; dev@dpdk.org
> Subject: RE: [PATCH v3 1/6] ethdev: add Tx preparation
> 
> Hi Tomasz,
> 
> ....
> 
> > diff --git a/lib/librte_net/rte_pkt.h b/lib/librte_net/rte_pkt.h
> > new file mode 100644
> > index 0000000..72903ac
> > --- /dev/null
> > +++ b/lib/librte_net/rte_pkt.h
> > @@ -0,0 +1,133 @@
> > +/*-
> > + *   BSD LICENSE
> > + *
> > + *   Copyright(c) 2016 Intel Corporation. All rights reserved.
> > + *   All rights reserved.
> > + *
> > + *   Redistribution and use in source and binary forms, with or without
> > + *   modification, are permitted provided that the following conditions
> > + *   are met:
> > + *
> > + *     * Redistributions of source code must retain the above copyright
> > + *       notice, this list of conditions and the following disclaimer.
> > + *     * Redistributions in binary form must reproduce the above copyright
> > + *       notice, this list of conditions and the following disclaimer in
> > + *       the documentation and/or other materials provided with the
> > + *       distribution.
> > + *     * Neither the name of Intel Corporation nor the names of its
> > + *       contributors may be used to endorse or promote products derived
> > + *       from this software without specific prior written permission.
> > + *
> > + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> > + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> > + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> > + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> > + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> > + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> > + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> > + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> > + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> > + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> > + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> > + */
> > +
> > +#ifndef _RTE_PKT_H_
> > +#define _RTE_PKT_H_
> > +
> > +#include <rte_ip.h>
> > +#include <rte_udp.h>
> > +#include <rte_tcp.h>
> > +#include <rte_sctp.h>
> > +
> > +/**
> > + * Validate general requirements for tx offload in packet.
> > + */
> > +static inline int
> > +rte_validate_tx_offload(struct rte_mbuf *m)
> > +{
> > +	uint64_t ol_flags = m->ol_flags;
> > +
> > +	/* Does packet set any of available offloads? */
> > +	if (!(ol_flags & PKT_TX_OFFLOAD_MASK))
> > +		return 0;
> > +
> > +	/* IP checksum can be counted only for IPv4 packet */
> > +	if ((ol_flags & PKT_TX_IP_CKSUM) && (ol_flags & PKT_TX_IPV6))
> > +		return -EINVAL;
> > +
> > +	if (ol_flags & (PKT_TX_L4_MASK | PKT_TX_TCP_SEG))
> 
> Not sure what you are trying to test here?
> Is it that PKT_TX_TCP_SEG is set?
> If so, then the test condition doesn't look correct to me.
> 
> > +		/* IP type not set */
> > +		if (!(ol_flags & (PKT_TX_IPV4 | PKT_TX_IPV6)))
> > +			return -EINVAL;
> > +

	if (ol_flags & (PKT_TX_L4_MASK | PKT_TX_TCP_SEG))
		/* IP type not set */
		if (!(ol_flags & (PKT_TX_IPV4 | PKT_TX_IPV6)))
			return -EINVAL;

For L4 checksums (L4_MASK == TCP_CSUM|UDP_CSUM|SCTP_CSUM) as well as for
TCP_SEG, the flag PKT_TX_IPV4 or PKT_TX_IPV6 must be set. L4_MASK doesn't
include the TCP_SEG bit, so I added it to have one condition for all these
cases (check if the IPV4/6 flag is set when required).

More detailed check, only for TCP_SEG is below (tso_segsz and IP_CSUM flag for IPV4):

> > +	if (ol_flags & PKT_TX_TCP_SEG)
> > +		/* PKT_TX_IP_CKSUM offload not set for IPv4 TSO packet */
> > +		if ((m->tso_segsz == 0) ||
> > +				((ol_flags & PKT_TX_IPV4) && !(ol_flags & PKT_TX_IP_CKSUM)))
> > +			return -EINVAL;
> > +
> 
> Why not just:
> if ((ol_flags & PKT_TX_L4_MASK) == PKT_TX_TCP_SEG) {

PKT_TX_L4_MASK doesn't include PKT_TX_TCP_SEG, so it will always be false.
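
For reference, the relevant bit definitions from rte_mbuf.h (quoted here
just for illustration):

	#define PKT_TX_TCP_SEG   (1ULL << 50)
	#define PKT_TX_L4_MASK   (3ULL << 52)

So (ol_flags & PKT_TX_L4_MASK) keeps only bits 52-53 and can never compare
equal to PKT_TX_TCP_SEG (bit 50).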

> 
>          uint64_t f = ol_flags & PKT_TX_L4_MASK;
> 
>          if ((f & (PKT_TX_IPV4 | PKT_TX_IPV6)) == 0 || f == PKT_TX_IPV4 ||
>                  m->tso_segsz == 0)
> 		return -EINVAL;
> }
> 
> Instead of 2 ifs around TCP_SEG above?
> 
> Konstantin
> 

Tomasz

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v3 1/6] ethdev: add Tx preparation
  2016-09-29 13:04         ` Kulasek, TomaszX
@ 2016-09-29 13:57           ` Ananyev, Konstantin
  0 siblings, 0 replies; 261+ messages in thread
From: Ananyev, Konstantin @ 2016-09-29 13:57 UTC (permalink / raw)
  To: Kulasek, TomaszX, dev



> -----Original Message-----
> From: Kulasek, TomaszX
> Sent: Thursday, September 29, 2016 2:04 PM
> To: Ananyev, Konstantin <konstantin.ananyev@intel.com>; dev@dpdk.org
> Subject: RE: [PATCH v3 1/6] ethdev: add Tx preparation
> 
> Hi Konstantin,
> 
> > -----Original Message-----
> > From: Ananyev, Konstantin
> > Sent: Thursday, September 29, 2016 12:41
> > To: Kulasek, TomaszX <tomaszx.kulasek@intel.com>; dev@dpdk.org
> > Subject: RE: [PATCH v3 1/6] ethdev: add Tx preparation
> >
> > Hi Tomasz,
> >
> > ....
> >
> > > diff --git a/lib/librte_net/rte_pkt.h b/lib/librte_net/rte_pkt.h
> > > new file mode 100644
> > > index 0000000..72903ac
> > > --- /dev/null
> > > +++ b/lib/librte_net/rte_pkt.h
> > > @@ -0,0 +1,133 @@
> > > +/*-
> > > + *   BSD LICENSE
> > > + *
> > > + *   Copyright(c) 2016 Intel Corporation. All rights reserved.
> > > + *   All rights reserved.
> > > + *
> > > + *   Redistribution and use in source and binary forms, with or without
> > > + *   modification, are permitted provided that the following conditions
> > > + *   are met:
> > > + *
> > > + *     * Redistributions of source code must retain the above copyright
> > > + *       notice, this list of conditions and the following disclaimer.
> > > + *     * Redistributions in binary form must reproduce the above copyright
> > > + *       notice, this list of conditions and the following disclaimer in
> > > + *       the documentation and/or other materials provided with the
> > > + *       distribution.
> > > + *     * Neither the name of Intel Corporation nor the names of its
> > > + *       contributors may be used to endorse or promote products derived
> > > + *       from this software without specific prior written permission.
> > > + *
> > > + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> > > + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> > > + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> > > + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> > > + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> > > + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> > > + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> > > + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> > > + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> > > + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> > > + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> > > + */
> > > +
> > > +#ifndef _RTE_PKT_H_
> > > +#define _RTE_PKT_H_
> > > +
> > > +#include <rte_ip.h>
> > > +#include <rte_udp.h>
> > > +#include <rte_tcp.h>
> > > +#include <rte_sctp.h>
> > > +
> > > +/**
> > > + * Validate general requirements for tx offload in packet.
> > > + */
> > > +static inline int
> > > +rte_validate_tx_offload(struct rte_mbuf *m)
> > > +{
> > > +	uint64_t ol_flags = m->ol_flags;
> > > +
> > > +	/* Does packet set any of available offloads? */
> > > +	if (!(ol_flags & PKT_TX_OFFLOAD_MASK))
> > > +		return 0;
> > > +
> > > +	/* IP checksum can be counted only for IPv4 packet */
> > > +	if ((ol_flags & PKT_TX_IP_CKSUM) && (ol_flags & PKT_TX_IPV6))
> > > +		return -EINVAL;
> > > +
> > > +	if (ol_flags & (PKT_TX_L4_MASK | PKT_TX_TCP_SEG))
> >
> > Not sure what you are trying to test here?
> > Is it that PKT_TX_TCP_SEG is set?
> > If so, then the test condition doesn't look correct to me.
> >
> > > +		/* IP type not set */
> > > +		if (!(ol_flags & (PKT_TX_IPV4 | PKT_TX_IPV6)))
> > > +			return -EINVAL;
> > > +
> 
> 	if (ol_flags & (PKT_TX_L4_MASK | PKT_TX_TCP_SEG))
> 		/* IP type not set */
> 		if (!(ol_flags & (PKT_TX_IPV4 | PKT_TX_IPV6)))
> 			return -EINVAL;
> 
> For L4 checksums (L4_MASK == TCP_CSUM|UDP_CSUM|SCTP_CSUM) as well as for
> TCP_SEG, the flag PKT_TX_IPV4 or PKT_TX_IPV6 must be set. L4_MASK doesn't
> include the TCP_SEG bit, so I added it to have one condition for all these
> cases (check if the IPV4/6 flag is set when required).
> 
> More detailed check, only for TCP_SEG is below (tso_segsz and IP_CSUM flag for IPV4):
> 
> > > +	if (ol_flags & PKT_TX_TCP_SEG)
> > > +		/* PKT_TX_IP_CKSUM offload not set for IPv4 TSO packet */
> > > +		if ((m->tso_segsz == 0) ||
> > > +				((ol_flags & PKT_TX_IPV4) && !(ol_flags & PKT_TX_IP_CKSUM)))
> > > +			return -EINVAL;
> > > +
> >
> > Why not just:
> > if ((ol_flags & PKT_TX_L4_MASK) == PKT_TX_TCP_SEG) {
> 
> PKT_TX_L4_MASK doesn't include PKT_TX_TCP_SEG, so it will always be false.

Ah yes, you're right.
For some reason I thought that it did.
Looks good to me then and sorry for the noise.
Konstantin

> 
> >
> >          uint64_t f = ol_flags & PKT_TX_L4_MASK;
> >
> >          if ((f & (PKT_TX_IPV4 | PKT_TX_IPV6)) == 0 || f == PKT_TX_IPV4 ||
> >                  m->tso_segsz == 0)
> > 		return -EINVAL;
> > }
> >
> > Instead of 2 ifs around TCP_SEG above?
> >
> > Konstantin
> >
> 
> Tomasz

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v3 5/6] ixgbe: add Tx preparation
  2016-09-29 11:09       ` Ananyev, Konstantin
@ 2016-09-29 15:12         ` Kulasek, TomaszX
  2016-09-29 17:01           ` Ananyev, Konstantin
  0 siblings, 1 reply; 261+ messages in thread
From: Kulasek, TomaszX @ 2016-09-29 15:12 UTC (permalink / raw)
  To: Ananyev, Konstantin, dev

Hi Konstantin,

> -----Original Message-----
> From: Ananyev, Konstantin
> Sent: Thursday, September 29, 2016 13:09
> To: Kulasek, TomaszX <tomaszx.kulasek@intel.com>; dev@dpdk.org
> Subject: RE: [PATCH v3 5/6] ixgbe: add Tx preparation
> 
> Hi Tomasz,
> 
> > Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
> > ---

...

> > +*/
> > +uint16_t
> > +ixgbe_prep_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
> > +{
> > +	int i, ret;
> > +	struct rte_mbuf *m;
> > +	struct ixgbe_tx_queue *txq = (struct ixgbe_tx_queue *)tx_queue;
> > +
> > +	for (i = 0; i < nb_pkts; i++) {
> > +		m = tx_pkts[i];
> > +
> > +		/**
> > +		 * Check if packet meets requirements for number of segments
> > +		 *
> > +		 * NOTE: for ixgbe it's always (40 - WTHRESH) for both TSO and non-TSO
> > +		 */
> > +
> > +		if (m->nb_segs > IXGBE_TX_MAX_SEG - txq->wthresh) {
> > +			rte_errno = -EINVAL;
> > +			return i;
> > +		}
> > +
> > +		if (m->ol_flags & IXGBE_TX_OFFLOAD_NOTSUP_MASK) {
> > +			rte_errno = -EINVAL;
> > +			return i;
> > +		}
> > +
> > +#ifdef RTE_LIBRTE_ETHDEV_DEBUG
> > +		ret = rte_validate_tx_offload(m);
> > +		if (ret != 0) {
> > +			rte_errno = ret;
> > +			return i;
> > +		}
> > +#endif
> > +		ret = rte_phdr_cksum_fix(m);
> > +		if (ret != 0) {
> > +			rte_errno = ret;
> > +			return i;
> > +		}
> > +	}
> > +
> > +	return i;
> > +}
> > +
> > +/* ixgbe simple path as well as vector TX doesn't support tx offloads */
> > +uint16_t
> > +ixgbe_prep_pkts_simple(void *tx_queue __rte_unused, struct rte_mbuf **tx_pkts,
> > +		uint16_t nb_pkts)
> > +{
> > +	int i;
> > +	struct rte_mbuf *m;
> > +	uint64_t ol_flags;
> > +
> > +	for (i = 0; i < nb_pkts; i++) {
> > +		m = tx_pkts[i];
> > +		ol_flags = m->ol_flags;
> > +
> > +		/* simple tx path doesn't support multi-segments */
> > +		if (m->nb_segs != 1) {
> > +			rte_errno = -EINVAL;
> > +			return i;
> > +		}
> > +
> > +		/* For simple path (simple and vector) no tx offloads are supported */
> > +		if (ol_flags & PKT_TX_OFFLOAD_MASK) {
> > +			rte_errno = -EINVAL;
> > +			return i;
> > +		}
> > +	}
> > +
> > +	return i;
> > +}
> 
> Just thought about it once again:
> As inside rte_eth_tx_prep() we now do:
> +
> +	if (!dev->tx_pkt_prep)
> +		return nb_pkts;
> 
> Then there might be a better approach to set
> dev->tx_pkt_prep = NULL
> for simple and vector TX functions?
> 
> After all, prep_simple() does nothing but return an error if conditions are
> not met.
> And if simple TX was already selected, then that means that the user
> deliberately disabled all HW TX offloads in favor of faster TX and there is
> no point in slowing him down with extra checks here.
> Same for i40e and fm10k.
> What is your opinion?
> 
> Konstantin
> 

Yes, since performance is key here, and the limitations of the vector/simple
path are quite well documented, these additional checks are a bit
overzealous. We may assume that to make tx offloads work, we need to
configure the driver in the right way, and it is a configuration issue if
something doesn't work.

I will remove it.

Tomasz

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v3 5/6] ixgbe: add Tx preparation
  2016-09-29 15:12         ` Kulasek, TomaszX
@ 2016-09-29 17:01           ` Ananyev, Konstantin
  0 siblings, 0 replies; 261+ messages in thread
From: Ananyev, Konstantin @ 2016-09-29 17:01 UTC (permalink / raw)
  To: Kulasek, TomaszX, dev

Hi Tomasz,

> >
> > Just thought about it once again:
> > As inside rte_eth_tx_prep() we now do:
> > +
> > +	if (!dev->tx_pkt_prep)
> > +		return nb_pkts;
> >
> > Then there might be a better approach to set
> > dev->tx_pkt_prep = NULL
> > for simple and vector TX functions?
> >
> > After all, prep_simple() does nothing but return an error if
> > conditions are not met.
> > And if simple TX was already selected, then that means that the user
> > deliberately disabled all HW TX offloads in favor of faster TX and
> > there is no point in slowing him down with extra checks here.
> > Same for i40e and fm10k.
> > What is your opinion?
> >
> > Konstantin
> >
> 
> Yes, since performance is key here, and the limitations of the vector/simple
> path are quite well documented, these additional checks are a bit
> overzealous. We may assume that to make tx offloads work, we need to
> configure the driver in the right way, and it is a configuration issue if
> something doesn't work.
> 
> I will remove it.

Great, thanks.
Konstantin

^ permalink raw reply	[flat|nested] 261+ messages in thread

* [dpdk-dev] [PATCH v4 0/6] add Tx preparation
  2016-09-28 11:10   ` [dpdk-dev] [PATCH v3 0/6] add Tx preparation Tomasz Kulasek
                       ` (5 preceding siblings ...)
  2016-09-28 11:10     ` [dpdk-dev] [PATCH v3 6/6] testpmd: use Tx preparation in csum engine Tomasz Kulasek
@ 2016-09-30  9:00     ` Tomasz Kulasek
  2016-09-30  9:00       ` [dpdk-dev] [PATCH v4 1/6] ethdev: " Tomasz Kulasek
                         ` (7 more replies)
  6 siblings, 8 replies; 261+ messages in thread
From: Tomasz Kulasek @ 2016-09-30  9:00 UTC (permalink / raw)
  To: dev; +Cc: konstantin.ananyev, Tomasz Kulasek

As discussed in that thread:

http://dpdk.org/ml/archives/dev/2015-September/023603.html

Different NIC models, depending on the HW offloads requested, might impose
different requirements on packets to be TX-ed, in terms of:

 - Max number of fragments per packet allowed
 - Max number of fragments per TSO segments
 - The way pseudo-header checksum should be pre-calculated
 - L3/L4 header fields filling
 - etc.


MOTIVATION:
-----------

1) Some work cannot (and should not) be done in rte_eth_tx_burst.
   However, this work is sometimes required, and right now it is left to
   the application.

2) Different hardware may have different requirements for TX offloads;
   a different subset may be supported, and so on.

3) Some parameters (e.g. the number of segments in the ixgbe driver) may
   hang the device. These parameters may vary between devices.

   For example, i40e HW allows 8 fragments per packet, but that is after
   TSO segmentation, while ixgbe has a 38-fragment pre-TSO limit.

4) Fields in the packet may require different initialization (e.g.
   pseudo-header checksum precalculation, sometimes done in a different
   way depending on the packet type, and so on). Currently the
   application needs to take care of this.

5) Using an additional API (rte_eth_tx_prep) before rte_eth_tx_burst
   lets the application prepare the packet burst in a form acceptable to
   the specific device.

6) Some additional checks may be done in debug mode, keeping the
   tx_burst implementation clean.


PROPOSAL:
---------

To help the user deal with all these varieties, we propose to:

1) Introduce a rte_eth_tx_prep() function to do the necessary
   preparations of a packet burst so it can be safely transmitted on the
   device with the desired HW offloads (set/reset checksum fields
   according to the hardware requirements) and to check HW constraints
   (number of segments per packet, etc).

   Since the limitations and requirements may differ between devices,
   this requires extending the rte_eth_dev structure with a new function
   pointer, "tx_pkt_prep", which can be implemented in the driver to
   prepare and verify packets, in a device-specific way, before the
   burst; this should prevent the application from sending malformed
   packets.

2) Also, new fields will be introduced in rte_eth_desc_lim:
   nb_seg_max and nb_mtu_seg_max, providing information about the max
   number of segments in TSO and non-TSO packets acceptable to the
   device.

   This information is useful for the application to avoid creating
   packets the device would treat as malformed or malicious.


APPLICATION (CASE OF USE):
--------------------------

1) The application should initialize the burst of packets to send and
   set the required tx offload flags and fields, like l2_len, l3_len,
   l4_len, and tso_segsz.

2) The application passes the burst to rte_eth_tx_prep to check the
   conditions required to send the packets through the NIC.

3) The result of rte_eth_tx_prep can be used to send the valid packets
   and/or to restore the invalid ones if the function fails.

e.g.

	for (i = 0; i < nb_pkts; i++) {

		/* initialize or process packet */

		bufs[i]->tso_segsz = 800;
		bufs[i]->ol_flags = PKT_TX_TCP_SEG | PKT_TX_IPV4
				| PKT_TX_IP_CKSUM;
		bufs[i]->l2_len = sizeof(struct ether_hdr);
		bufs[i]->l3_len = sizeof(struct ipv4_hdr);
		bufs[i]->l4_len = sizeof(struct tcp_hdr);
	}

	/* Prepare burst of TX packets */
	nb_prep = rte_eth_tx_prep(port, 0, bufs, nb_pkts);

	if (nb_prep < nb_pkts) {
		printf("tx_prep failed\n");

		/* nb_prep here indicates the first invalid packet.
		 * rte_eth_tx_prep can be used on the remaining packets
		 * to find further invalid ones (see the sketch below).
		 */

	}

	/* Send burst of TX packets */
	nb_tx = rte_eth_tx_burst(port, 0, bufs, nb_prep);

	/* Free any unsent packets. */

v4 changes:
 - tx_prep is now set to the default behavior (NULL) for the simple/vector
   path in the fm10k, i40e and ixgbe drivers to increase performance, since
   Tx offloads are intentionally not available on those paths

v3 changes:
 - reworked the csum testpmd engine instead of adding a new one,
 - fixed checksum initialization procedure to include also outer
   checksum offloads,
 - some minor formatting cleanups and optimizations

v2 changes:
 - rte_eth_tx_prep() returns the number of packets when the device doesn't
   support the tx_prep functionality,
 - introduced CONFIG_RTE_ETHDEV_TX_PREP, allowing tx_prep to be turned off


Tomasz Kulasek (6):
  ethdev: add Tx preparation
  e1000: add Tx preparation
  fm10k: add Tx preparation
  i40e: add Tx preparation
  ixgbe: add Tx preparation
  testpmd: use Tx preparation in csum engine

 app/test-pmd/csumonly.c          |   97 +++++++++++++++------------
 config/common_base               |    1 +
 drivers/net/e1000/e1000_ethdev.h |   11 ++++
 drivers/net/e1000/em_ethdev.c    |    5 +-
 drivers/net/e1000/em_rxtx.c      |   48 +++++++++++++-
 drivers/net/e1000/igb_ethdev.c   |    4 ++
 drivers/net/e1000/igb_rxtx.c     |   52 ++++++++++++++-
 drivers/net/fm10k/fm10k.h        |    6 ++
 drivers/net/fm10k/fm10k_ethdev.c |    5 ++
 drivers/net/fm10k/fm10k_rxtx.c   |   50 +++++++++++++-
 drivers/net/i40e/i40e_ethdev.c   |    3 +
 drivers/net/i40e/i40e_rxtx.c     |   72 ++++++++++++++++++++-
 drivers/net/i40e/i40e_rxtx.h     |    8 +++
 drivers/net/ixgbe/ixgbe_ethdev.c |    3 +
 drivers/net/ixgbe/ixgbe_ethdev.h |    5 +-
 drivers/net/ixgbe/ixgbe_rxtx.c   |   56 +++++++++++++++-
 drivers/net/ixgbe/ixgbe_rxtx.h   |    2 +
 lib/librte_ether/rte_ethdev.h    |   85 ++++++++++++++++++++++++
 lib/librte_mbuf/rte_mbuf.h       |    8 +++
 lib/librte_net/Makefile          |    2 +-
 lib/librte_net/rte_pkt.h         |  133 ++++++++++++++++++++++++++++++++++++++
 21 files changed, 605 insertions(+), 51 deletions(-)
 create mode 100644 lib/librte_net/rte_pkt.h

-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 261+ messages in thread

* [dpdk-dev] [PATCH v4 1/6] ethdev: add Tx preparation
  2016-09-30  9:00     ` [dpdk-dev] [PATCH v4 0/6] add Tx preparation Tomasz Kulasek
@ 2016-09-30  9:00       ` Tomasz Kulasek
  2016-10-10 14:08         ` Thomas Monjalon
  2016-09-30  9:00       ` [dpdk-dev] [PATCH v4 2/6] e1000: " Tomasz Kulasek
                         ` (6 subsequent siblings)
  7 siblings, 1 reply; 261+ messages in thread
From: Tomasz Kulasek @ 2016-09-30  9:00 UTC (permalink / raw)
  To: dev; +Cc: konstantin.ananyev, Tomasz Kulasek

Added API for `rte_eth_tx_prep`

uint16_t rte_eth_tx_prep(uint8_t port_id, uint16_t queue_id,
	struct rte_mbuf **tx_pkts, uint16_t nb_pkts)

Added fields to the `struct rte_eth_desc_lim`:

	uint16_t nb_seg_max;
		/**< Max number of segments per whole packet. */

	uint16_t nb_mtu_seg_max;
		/**< Max number of segments per one MTU */
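
For illustration, an application could query these limits through the
existing rte_eth_dev_info_get() API (a minimal sketch; error handling
omitted, port_id and mbuf m assumed):

	struct rte_eth_dev_info dev_info;

	rte_eth_dev_info_get(port_id, &dev_info);
	if (m->nb_segs > dev_info.tx_desc_lim.nb_mtu_seg_max) {
		/* split or drop the packet before tx_prep/tx_burst */
	}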

Created `rte_pkt.h` header with commonly used functions:

int rte_validate_tx_offload(struct rte_mbuf *m)
	to validate general requirements for tx offloads in a packet, such
	as flag completeness. In the current implementation this function
	is called optionally when RTE_LIBRTE_ETHDEV_DEBUG is enabled.

int rte_phdr_cksum_fix(struct rte_mbuf *m)
	to fix pseudo header checksum for TSO and non-TSO tcp/udp packets
	before hardware tx checksum offload.
	 - for non-TSO tcp/udp packets the full pseudo-header checksum is
	   computed and set.
	 - for TSO the IP payload length is not included.
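
For example (illustrative only), for a tcp packet over IPv4 both cases
reduce to a call to the existing rte_ipv4_phdr_cksum() helper:

	/* rte_ipv4_phdr_cksum() omits the IP payload length from the
	 * pseudo-header checksum when PKT_TX_TCP_SEG is set in ol_flags */
	tcp_hdr->cksum = rte_ipv4_phdr_cksum(ipv4_hdr, m->ol_flags);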

Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
---
 config/common_base            |    1 +
 lib/librte_ether/rte_ethdev.h |   85 ++++++++++++++++++++++++++
 lib/librte_mbuf/rte_mbuf.h    |    8 +++
 lib/librte_net/Makefile       |    2 +-
 lib/librte_net/rte_pkt.h      |  133 +++++++++++++++++++++++++++++++++++++++++
 5 files changed, 228 insertions(+), 1 deletion(-)
 create mode 100644 lib/librte_net/rte_pkt.h

diff --git a/config/common_base b/config/common_base
index 7830535..7ada9e0 100644
--- a/config/common_base
+++ b/config/common_base
@@ -120,6 +120,7 @@ CONFIG_RTE_MAX_QUEUES_PER_PORT=1024
 CONFIG_RTE_LIBRTE_IEEE1588=n
 CONFIG_RTE_ETHDEV_QUEUE_STAT_CNTRS=16
 CONFIG_RTE_ETHDEV_RXTX_CALLBACKS=y
+CONFIG_RTE_ETHDEV_TX_PREP=y
 
 #
 # Support NIC bypass logic
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 96575e8..6594544 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -182,6 +182,7 @@ extern "C" {
 #include <rte_pci.h>
 #include <rte_dev.h>
 #include <rte_devargs.h>
+#include <rte_errno.h>
 #include "rte_ether.h"
 #include "rte_eth_ctrl.h"
 #include "rte_dev_info.h"
@@ -699,6 +700,8 @@ struct rte_eth_desc_lim {
 	uint16_t nb_max;   /**< Max allowed number of descriptors. */
 	uint16_t nb_min;   /**< Min allowed number of descriptors. */
 	uint16_t nb_align; /**< Number of descriptors should be aligned to. */
+	uint16_t nb_seg_max;     /**< Max number of segments per whole packet. */
+	uint16_t nb_mtu_seg_max; /**< Max number of segments per one MTU */
 };
 
 /**
@@ -1184,6 +1187,11 @@ typedef uint16_t (*eth_tx_burst_t)(void *txq,
 				   uint16_t nb_pkts);
 /**< @internal Send output packets on a transmit queue of an Ethernet device. */
 
+typedef uint16_t (*eth_tx_prep_t)(void *txq,
+				   struct rte_mbuf **tx_pkts,
+				   uint16_t nb_pkts);
+/**< @internal Prepare output packets on a transmit queue of an Ethernet device. */
+
 typedef int (*flow_ctrl_get_t)(struct rte_eth_dev *dev,
 			       struct rte_eth_fc_conf *fc_conf);
 /**< @internal Get current flow control parameter on an Ethernet device */
@@ -1629,6 +1637,7 @@ enum rte_eth_dev_type {
 struct rte_eth_dev {
 	eth_rx_burst_t rx_pkt_burst; /**< Pointer to PMD receive function. */
 	eth_tx_burst_t tx_pkt_burst; /**< Pointer to PMD transmit function. */
+	eth_tx_prep_t tx_pkt_prep; /**< Pointer to PMD transmit prepare function. */
 	struct rte_eth_dev_data *data;  /**< Pointer to device data */
 	const struct eth_driver *driver;/**< Driver for this device */
 	const struct eth_dev_ops *dev_ops; /**< Functions exported by PMD */
@@ -2837,6 +2846,82 @@ rte_eth_tx_burst(uint8_t port_id, uint16_t queue_id,
 	return (*dev->tx_pkt_burst)(dev->data->tx_queues[queue_id], tx_pkts, nb_pkts);
 }
 
+/**
+ * Process a burst of output packets on a transmit queue of an Ethernet device.
+ *
+ * The rte_eth_tx_prep() function is invoked to prepare output packets to be
+ * transmitted on the output queue *queue_id* of the Ethernet device designated
+ * by its *port_id*.
+ * The *nb_pkts* parameter is the number of packets to be prepared which are
+ * supplied in the *tx_pkts* array of *rte_mbuf* structures, each of them
+ * allocated from a pool created with rte_pktmbuf_pool_create().
+ * For each packet to send, the rte_eth_tx_prep() function performs
+ * the following operations:
+ *
+ * - Check if the packet meets the device's requirements for tx offloads.
+ *
+ * - Check limitations on the number of segments.
+ *
+ * - Check additional requirements when debug is enabled.
+ *
+ * - Update and/or reset required checksums when tx offload is set for packet.
+ *
+ * The rte_eth_tx_prep() function returns the number of packets ready to be
+ * sent. A return value equal to *nb_pkts* means that all packets are valid and
+ * ready to be sent.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param queue_id
+ *   The index of the transmit queue through which output packets must be
+ *   sent.
+ *   The value must be in the range [0, nb_tx_queue - 1] previously supplied
+ *   to rte_eth_dev_configure().
+ * @param tx_pkts
+ *   The address of an array of *nb_pkts* pointers to *rte_mbuf* structures
+ *   which contain the output packets.
+ * @param nb_pkts
+ *   The maximum number of packets to process.
+ * @return
+ *   The number of packets correct and ready to be sent. The return value can be
+ *   less than the value of the *nb_pkts* parameter when some packet doesn't
+ *   meet the device's requirements, with rte_errno set appropriately.
+ */
+
+#ifdef RTE_ETHDEV_TX_PREP
+
+static inline uint16_t
+rte_eth_tx_prep(uint8_t port_id, uint16_t queue_id, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+
+	if (!dev->tx_pkt_prep)
+		return nb_pkts;
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+	if (queue_id >= dev->data->nb_tx_queues) {
+		RTE_PMD_DEBUG_TRACE("Invalid TX queue_id=%d\n", queue_id);
+		rte_errno = -EINVAL;
+		return 0;
+	}
+#endif
+
+	return (*dev->tx_pkt_prep)(dev->data->tx_queues[queue_id],
+			tx_pkts, nb_pkts);
+}
+
+#else
+
+static inline uint16_t
+rte_eth_tx_prep(uint8_t port_id __rte_unused, uint16_t queue_id __rte_unused,
+		struct rte_mbuf **tx_pkts __rte_unused, uint16_t nb_pkts)
+{
+	return nb_pkts;
+}
+
+#endif
+
 typedef void (*buffer_tx_error_fn)(struct rte_mbuf **unsent, uint16_t count,
 		void *userdata);
 
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index 23b7bf8..8b73261 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -211,6 +211,14 @@ extern "C" {
  */
 #define PKT_TX_OUTER_IPV4   (1ULL << 59)
 
+#define PKT_TX_OFFLOAD_MASK (    \
+		PKT_TX_IP_CKSUM |        \
+		PKT_TX_L4_MASK |         \
+		PKT_TX_OUTER_IP_CKSUM |  \
+		PKT_TX_TCP_SEG |         \
+		PKT_TX_QINQ_PKT |        \
+		PKT_TX_VLAN_PKT)
+
 /**
  * Packet outer header is IPv6. This flag must be set when using any
  * outer offload feature (L4 checksum) to tell the NIC that the outer
diff --git a/lib/librte_net/Makefile b/lib/librte_net/Makefile
index ad2e482..b5abe84 100644
--- a/lib/librte_net/Makefile
+++ b/lib/librte_net/Makefile
@@ -34,7 +34,7 @@ include $(RTE_SDK)/mk/rte.vars.mk
 CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -O3
 
 # install includes
-SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include := rte_ip.h rte_tcp.h rte_udp.h rte_sctp.h rte_icmp.h rte_arp.h
+SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include := rte_ip.h rte_tcp.h rte_udp.h rte_sctp.h rte_icmp.h rte_arp.h rte_pkt.h
 
 
 include $(RTE_SDK)/mk/rte.install.mk
diff --git a/lib/librte_net/rte_pkt.h b/lib/librte_net/rte_pkt.h
new file mode 100644
index 0000000..72903ac
--- /dev/null
+++ b/lib/librte_net/rte_pkt.h
@@ -0,0 +1,133 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2016 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_PKT_H_
+#define _RTE_PKT_H_
+
+#include <rte_ip.h>
+#include <rte_udp.h>
+#include <rte_tcp.h>
+#include <rte_sctp.h>
+
+/**
+ * Validate general requirements for tx offload in packet.
+ */
+static inline int
+rte_validate_tx_offload(struct rte_mbuf *m)
+{
+	uint64_t ol_flags = m->ol_flags;
+
+	/* Does packet set any of available offloads? */
+	if (!(ol_flags & PKT_TX_OFFLOAD_MASK))
+		return 0;
+
+	/* IP checksum can be counted only for IPv4 packet */
+	if ((ol_flags & PKT_TX_IP_CKSUM) && (ol_flags & PKT_TX_IPV6))
+		return -EINVAL;
+
+	if (ol_flags & (PKT_TX_L4_MASK | PKT_TX_TCP_SEG))
+		/* IP type not set */
+		if (!(ol_flags & (PKT_TX_IPV4 | PKT_TX_IPV6)))
+			return -EINVAL;
+
+	if (ol_flags & PKT_TX_TCP_SEG)
+		/* PKT_TX_IP_CKSUM offload not set for IPv4 TSO packet */
+		if ((m->tso_segsz == 0) ||
+				((ol_flags & PKT_TX_IPV4) && !(ol_flags & PKT_TX_IP_CKSUM)))
+			return -EINVAL;
+
+	/* PKT_TX_OUTER_IP_CKSUM set for non outer IPv4 packet. */
+	if ((ol_flags & PKT_TX_OUTER_IP_CKSUM) && !(ol_flags & PKT_TX_OUTER_IPV4))
+		return -EINVAL;
+
+	return 0;
+}
+
+/**
+ * Fix pseudo header checksum for TSO and non-TSO tcp/udp packets before
+ * hardware tx checksum.
+ * For non-TSO tcp/udp packets full pseudo-header checksum is counted and set.
+ * For TSO the IP payload length is not included.
+ */
+static inline int
+rte_phdr_cksum_fix(struct rte_mbuf *m)
+{
+	struct ipv4_hdr *ipv4_hdr;
+	struct ipv6_hdr *ipv6_hdr;
+	struct tcp_hdr *tcp_hdr;
+	struct udp_hdr *udp_hdr;
+	uint64_t inner_l3_offset = m->l2_len;
+
+	if (m->ol_flags & PKT_TX_OUTER_IP_CKSUM)
+		inner_l3_offset += m->outer_l2_len + m->outer_l3_len;
+
+	if (m->ol_flags & PKT_TX_IPV4) {
+		ipv4_hdr = rte_pktmbuf_mtod_offset(m, struct ipv4_hdr *,
+				inner_l3_offset);
+
+		if (m->ol_flags & PKT_TX_IP_CKSUM)
+			ipv4_hdr->hdr_checksum = 0;
+
+		if ((m->ol_flags & PKT_TX_UDP_CKSUM) == PKT_TX_UDP_CKSUM) {
+			/* non-TSO udp */
+			udp_hdr = rte_pktmbuf_mtod_offset(m, struct udp_hdr *,
+					inner_l3_offset + m->l3_len);
+			udp_hdr->dgram_cksum = rte_ipv4_phdr_cksum(ipv4_hdr, m->ol_flags);
+		} else if ((m->ol_flags & PKT_TX_TCP_CKSUM) ||
+				(m->ol_flags & PKT_TX_TCP_SEG)) {
+			/* non-TSO tcp or TSO */
+			tcp_hdr = rte_pktmbuf_mtod_offset(m, struct tcp_hdr *,
+					inner_l3_offset + m->l3_len);
+			tcp_hdr->cksum = rte_ipv4_phdr_cksum(ipv4_hdr, m->ol_flags);
+		}
+	} else if (m->ol_flags & PKT_TX_IPV6) {
+		ipv6_hdr = rte_pktmbuf_mtod_offset(m, struct ipv6_hdr *,
+				inner_l3_offset);
+
+		if ((m->ol_flags & PKT_TX_UDP_CKSUM) == PKT_TX_UDP_CKSUM) {
+			/* non-TSO udp */
+			udp_hdr = rte_pktmbuf_mtod_offset(m, struct udp_hdr *,
+					inner_l3_offset + m->l3_len);
+			udp_hdr->dgram_cksum = rte_ipv6_phdr_cksum(ipv6_hdr, m->ol_flags);
+		} else if ((m->ol_flags & PKT_TX_TCP_CKSUM) ||
+				(m->ol_flags & PKT_TX_TCP_SEG)) {
+			/* non-TSO tcp or TSO */
+			tcp_hdr = rte_pktmbuf_mtod_offset(m, struct tcp_hdr *,
+					inner_l3_offset + m->l3_len);
+			tcp_hdr->cksum = rte_ipv6_phdr_cksum(ipv6_hdr, m->ol_flags);
+		}
+	}
+	return 0;
+}
+
+#endif /* _RTE_PKT_H_ */
-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 261+ messages in thread

* [dpdk-dev] [PATCH v4 2/6] e1000: add Tx preparation
  2016-09-30  9:00     ` [dpdk-dev] [PATCH v4 0/6] add Tx preparation Tomasz Kulasek
  2016-09-30  9:00       ` [dpdk-dev] [PATCH v4 1/6] ethdev: " Tomasz Kulasek
@ 2016-09-30  9:00       ` Tomasz Kulasek
  2016-09-30  9:00       ` [dpdk-dev] [PATCH v4 3/6] fm10k: " Tomasz Kulasek
                         ` (5 subsequent siblings)
  7 siblings, 0 replies; 261+ messages in thread
From: Tomasz Kulasek @ 2016-09-30  9:00 UTC (permalink / raw)
  To: dev; +Cc: konstantin.ananyev, Tomasz Kulasek

Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
---
 drivers/net/e1000/e1000_ethdev.h |   11 ++++++++
 drivers/net/e1000/em_ethdev.c    |    5 +++-
 drivers/net/e1000/em_rxtx.c      |   48 ++++++++++++++++++++++++++++++++++-
 drivers/net/e1000/igb_ethdev.c   |    4 +++
 drivers/net/e1000/igb_rxtx.c     |   52 +++++++++++++++++++++++++++++++++++++-
 5 files changed, 117 insertions(+), 3 deletions(-)

diff --git a/drivers/net/e1000/e1000_ethdev.h b/drivers/net/e1000/e1000_ethdev.h
index 6c25c8d..bd0f277 100644
--- a/drivers/net/e1000/e1000_ethdev.h
+++ b/drivers/net/e1000/e1000_ethdev.h
@@ -138,6 +138,11 @@
 #define E1000_MISC_VEC_ID               RTE_INTR_VEC_ZERO_OFFSET
 #define E1000_RX_VEC_START              RTE_INTR_VEC_RXTX_OFFSET
 
+#define IGB_TX_MAX_SEG     UINT8_MAX
+#define IGB_TX_MAX_MTU_SEG UINT8_MAX
+#define EM_TX_MAX_SEG      UINT8_MAX
+#define EM_TX_MAX_MTU_SEG  UINT8_MAX
+
 /* structure for interrupt relative data */
 struct e1000_interrupt {
 	uint32_t flags;
@@ -315,6 +320,9 @@ void eth_igb_tx_init(struct rte_eth_dev *dev);
 uint16_t eth_igb_xmit_pkts(void *txq, struct rte_mbuf **tx_pkts,
 		uint16_t nb_pkts);
 
+uint16_t eth_igb_prep_pkts(void *txq, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts);
+
 uint16_t eth_igb_recv_pkts(void *rxq, struct rte_mbuf **rx_pkts,
 		uint16_t nb_pkts);
 
@@ -376,6 +384,9 @@ void eth_em_tx_init(struct rte_eth_dev *dev);
 uint16_t eth_em_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		uint16_t nb_pkts);
 
+uint16_t eth_em_prep_pkts(void *txq, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts);
+
 uint16_t eth_em_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
 		uint16_t nb_pkts);
 
diff --git a/drivers/net/e1000/em_ethdev.c b/drivers/net/e1000/em_ethdev.c
index c5bf294..46515d4 100644
--- a/drivers/net/e1000/em_ethdev.c
+++ b/drivers/net/e1000/em_ethdev.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -300,6 +300,7 @@ eth_em_dev_init(struct rte_eth_dev *eth_dev)
 	eth_dev->dev_ops = &eth_em_ops;
 	eth_dev->rx_pkt_burst = (eth_rx_burst_t)&eth_em_recv_pkts;
 	eth_dev->tx_pkt_burst = (eth_tx_burst_t)&eth_em_xmit_pkts;
+	eth_dev->tx_pkt_prep = (eth_tx_prep_t)&eth_em_prep_pkts;
 
 	/* for secondary processes, we don't initialise any further as primary
 	 * has already done this work. Only check we don't need a different
@@ -1073,6 +1074,8 @@ eth_em_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 		.nb_max = E1000_MAX_RING_DESC,
 		.nb_min = E1000_MIN_RING_DESC,
 		.nb_align = EM_TXD_ALIGN,
+		.nb_seg_max = EM_TX_MAX_SEG,
+		.nb_mtu_seg_max = EM_TX_MAX_MTU_SEG,
 	};
 
 	dev_info->speed_capa = ETH_LINK_SPEED_10M_HD | ETH_LINK_SPEED_10M |
diff --git a/drivers/net/e1000/em_rxtx.c b/drivers/net/e1000/em_rxtx.c
index 41f51c0..44009d6 100644
--- a/drivers/net/e1000/em_rxtx.c
+++ b/drivers/net/e1000/em_rxtx.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -66,6 +66,7 @@
 #include <rte_udp.h>
 #include <rte_tcp.h>
 #include <rte_sctp.h>
+#include <rte_pkt.h>
 #include <rte_string_fns.h>
 
 #include "e1000_logs.h"
@@ -77,6 +78,14 @@
 
 #define E1000_RXDCTL_GRAN	0x01000000 /* RXDCTL Granularity */
 
+#define E1000_TX_OFFLOAD_MASK ( \
+		PKT_TX_IP_CKSUM |       \
+		PKT_TX_L4_MASK |        \
+		PKT_TX_VLAN_PKT)
+
+#define E1000_TX_OFFLOAD_NOTSUP_MASK \
+		(PKT_TX_OFFLOAD_MASK ^ E1000_TX_OFFLOAD_MASK)
+
 /**
  * Structure associated with each descriptor of the RX ring of a RX queue.
  */
@@ -618,6 +627,43 @@ end_of_tx:
 
 /*********************************************************************
  *
+ *  TX prep functions
+ *
+ **********************************************************************/
+uint16_t
+eth_em_prep_pkts(void *tx_queue __rte_unused, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts)
+{
+	int i, ret;
+	struct rte_mbuf *m;
+
+	for (i = 0; i < nb_pkts; i++) {
+		m = tx_pkts[i];
+
+		if (m->ol_flags & E1000_TX_OFFLOAD_NOTSUP_MASK) {
+			rte_errno = -EINVAL;
+			return i;
+		}
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+		ret = rte_validate_tx_offload(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+#endif
+		ret = rte_phdr_cksum_fix(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+	}
+
+	return i;
+}
+
+/*********************************************************************
+ *
  *  RX functions
  *
  **********************************************************************/
diff --git a/drivers/net/e1000/igb_ethdev.c b/drivers/net/e1000/igb_ethdev.c
index f7cfa18..5baacd0 100644
--- a/drivers/net/e1000/igb_ethdev.c
+++ b/drivers/net/e1000/igb_ethdev.c
@@ -369,6 +369,8 @@ static const struct rte_eth_desc_lim tx_desc_lim = {
 	.nb_max = E1000_MAX_RING_DESC,
 	.nb_min = E1000_MIN_RING_DESC,
 	.nb_align = IGB_RXD_ALIGN,
+	.nb_seg_max = IGB_TX_MAX_SEG,
+	.nb_mtu_seg_max = IGB_TX_MAX_MTU_SEG,
 };
 
 static const struct eth_dev_ops eth_igb_ops = {
@@ -760,6 +762,7 @@ eth_igb_dev_init(struct rte_eth_dev *eth_dev)
 	eth_dev->dev_ops = &eth_igb_ops;
 	eth_dev->rx_pkt_burst = &eth_igb_recv_pkts;
 	eth_dev->tx_pkt_burst = &eth_igb_xmit_pkts;
+	eth_dev->tx_pkt_prep = &eth_igb_prep_pkts;
 
 	/* for secondary processes, we don't initialise any further as primary
 	 * has already done this work. Only check we don't need a different
@@ -963,6 +966,7 @@ eth_igbvf_dev_init(struct rte_eth_dev *eth_dev)
 	eth_dev->dev_ops = &igbvf_eth_dev_ops;
 	eth_dev->rx_pkt_burst = &eth_igb_recv_pkts;
 	eth_dev->tx_pkt_burst = &eth_igb_xmit_pkts;
+	eth_dev->tx_pkt_prep = &eth_igb_prep_pkts;
 
 	/* for secondary processes, we don't initialise any further as primary
 	 * has already done this work. Only check we don't need a different
diff --git a/drivers/net/e1000/igb_rxtx.c b/drivers/net/e1000/igb_rxtx.c
index a47c640..d466888 100644
--- a/drivers/net/e1000/igb_rxtx.c
+++ b/drivers/net/e1000/igb_rxtx.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -65,6 +65,7 @@
 #include <rte_udp.h>
 #include <rte_tcp.h>
 #include <rte_sctp.h>
+#include <rte_pkt.h>
 #include <rte_string_fns.h>
 
 #include "e1000_logs.h"
@@ -78,6 +79,9 @@
 		PKT_TX_L4_MASK |		 \
 		PKT_TX_TCP_SEG)
 
+#define IGB_TX_OFFLOAD_NOTSUP_MASK \
+		(PKT_TX_OFFLOAD_MASK ^ IGB_TX_OFFLOAD_MASK)
+
 /**
  * Structure associated with each descriptor of the RX ring of a RX queue.
  */
@@ -616,6 +620,51 @@ eth_igb_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 
 /*********************************************************************
  *
+ *  TX prep functions
+ *
+ **********************************************************************/
+uint16_t
+eth_igb_prep_pkts(void *tx_queue __rte_unused, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts)
+{
+	int i, ret;
+	struct rte_mbuf *m;
+
+	for (i = 0; i < nb_pkts; i++) {
+		m = tx_pkts[i];
+
+		/* Check some limitations for TSO in hardware */
+		if (m->ol_flags & PKT_TX_TCP_SEG)
+			if ((m->tso_segsz > IGB_TSO_MAX_MSS) || (m->l2_len + m->l3_len +
+					m->l4_len > IGB_TSO_MAX_HDRLEN)) {
+				rte_errno = -EINVAL;
+				return i;
+			}
+
+		if (m->ol_flags & IGB_TX_OFFLOAD_NOTSUP_MASK) {
+			rte_errno = -EINVAL;
+			return i;
+		}
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+		ret = rte_validate_tx_offload(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+#endif
+		ret = rte_phdr_cksum_fix(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+	}
+
+	return i;
+}
+
+/*********************************************************************
+ *
  *  RX functions
  *
  **********************************************************************/
@@ -1362,6 +1411,7 @@ eth_igb_tx_queue_setup(struct rte_eth_dev *dev,
 
 	igb_reset_tx_queue(txq, dev);
 	dev->tx_pkt_burst = eth_igb_xmit_pkts;
+	dev->tx_pkt_prep = &eth_igb_prep_pkts;
 	dev->data->tx_queues[queue_idx] = txq;
 
 	return 0;
-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 261+ messages in thread

* [dpdk-dev] [PATCH v4 3/6] fm10k: add Tx preparation
  2016-09-30  9:00     ` [dpdk-dev] [PATCH v4 0/6] add Tx preparation Tomasz Kulasek
  2016-09-30  9:00       ` [dpdk-dev] [PATCH v4 1/6] ethdev: " Tomasz Kulasek
  2016-09-30  9:00       ` [dpdk-dev] [PATCH v4 2/6] e1000: " Tomasz Kulasek
@ 2016-09-30  9:00       ` Tomasz Kulasek
  2016-09-30  9:00       ` [dpdk-dev] [PATCH v4 4/6] i40e: " Tomasz Kulasek
                         ` (4 subsequent siblings)
  7 siblings, 0 replies; 261+ messages in thread
From: Tomasz Kulasek @ 2016-09-30  9:00 UTC (permalink / raw)
  To: dev; +Cc: konstantin.ananyev, Tomasz Kulasek

Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
---
 drivers/net/fm10k/fm10k.h        |    6 +++++
 drivers/net/fm10k/fm10k_ethdev.c |    5 ++++
 drivers/net/fm10k/fm10k_rxtx.c   |   50 +++++++++++++++++++++++++++++++++++++-
 3 files changed, 60 insertions(+), 1 deletion(-)

diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
index 05aa1a2..c6fed21 100644
--- a/drivers/net/fm10k/fm10k.h
+++ b/drivers/net/fm10k/fm10k.h
@@ -69,6 +69,9 @@
 #define FM10K_MAX_RX_DESC  (FM10K_MAX_RX_RING_SZ / sizeof(union fm10k_rx_desc))
 #define FM10K_MAX_TX_DESC  (FM10K_MAX_TX_RING_SZ / sizeof(struct fm10k_tx_desc))
 
+#define FM10K_TX_MAX_SEG     UINT8_MAX
+#define FM10K_TX_MAX_MTU_SEG UINT8_MAX
+
 /*
  * byte aligment for HW RX data buffer
  * Datasheet requires RX buffer addresses shall either be 512-byte aligned or
@@ -356,6 +359,9 @@ fm10k_dev_rx_descriptor_done(void *rx_queue, uint16_t offset);
 uint16_t fm10k_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 	uint16_t nb_pkts);
 
+uint16_t fm10k_prep_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
+	uint16_t nb_pkts);
+
 int fm10k_rxq_vec_setup(struct fm10k_rx_queue *rxq);
 int fm10k_rx_vec_condition_check(struct rte_eth_dev *);
 void fm10k_rx_queue_release_mbufs_vec(struct fm10k_rx_queue *rxq);
diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index 0ecc167..8dacba7 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -1441,6 +1441,8 @@ fm10k_dev_infos_get(struct rte_eth_dev *dev,
 		.nb_max = FM10K_MAX_TX_DESC,
 		.nb_min = FM10K_MIN_TX_DESC,
 		.nb_align = FM10K_MULT_TX_DESC,
+		.nb_seg_max = FM10K_TX_MAX_SEG,
+		.nb_mtu_seg_max = FM10K_TX_MAX_MTU_SEG,
 	};
 
 	dev_info->speed_capa = ETH_LINK_SPEED_1G | ETH_LINK_SPEED_2_5G |
@@ -2749,8 +2751,10 @@ fm10k_set_tx_function(struct rte_eth_dev *dev)
 			fm10k_txq_vec_setup(txq);
 		}
 		dev->tx_pkt_burst = fm10k_xmit_pkts_vec;
+		dev->tx_pkt_prep = NULL;
 	} else {
 		dev->tx_pkt_burst = fm10k_xmit_pkts;
+		dev->tx_pkt_prep = fm10k_prep_pkts;
 		PMD_INIT_LOG(DEBUG, "Use regular Tx func");
 	}
 }
@@ -2829,6 +2833,7 @@ eth_fm10k_dev_init(struct rte_eth_dev *dev)
 	dev->dev_ops = &fm10k_eth_dev_ops;
 	dev->rx_pkt_burst = &fm10k_recv_pkts;
 	dev->tx_pkt_burst = &fm10k_xmit_pkts;
+	dev->tx_pkt_prep = &fm10k_prep_pkts;
 
 	/* only initialize in the primary process */
 	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
diff --git a/drivers/net/fm10k/fm10k_rxtx.c b/drivers/net/fm10k/fm10k_rxtx.c
index 5b2d04b..fa5bf9c 100644
--- a/drivers/net/fm10k/fm10k_rxtx.c
+++ b/drivers/net/fm10k/fm10k_rxtx.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2013-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2013-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -35,6 +35,7 @@
 
 #include <rte_ethdev.h>
 #include <rte_common.h>
+#include <rte_pkt.h>
 #include "fm10k.h"
 #include "base/fm10k_type.h"
 
@@ -65,6 +66,15 @@ static inline void dump_rxd(union fm10k_rx_desc *rxd)
 }
 #endif
 
+#define FM10K_TX_OFFLOAD_MASK (  \
+		PKT_TX_VLAN_PKT |        \
+		PKT_TX_IP_CKSUM |        \
+		PKT_TX_L4_MASK |         \
+		PKT_TX_TCP_SEG)
+
+#define FM10K_TX_OFFLOAD_NOTSUP_MASK \
+		(PKT_TX_OFFLOAD_MASK ^ FM10K_TX_OFFLOAD_MASK)
+
 /* @note: When this function is changed, make corresponding change to
  * fm10k_dev_supported_ptypes_get()
  */
@@ -583,3 +593,41 @@ fm10k_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 
 	return count;
 }
+
+uint16_t
+fm10k_prep_pkts(void *tx_queue __rte_unused, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts)
+{
+	int i, ret;
+	struct rte_mbuf *m;
+
+	for (i = 0; i < nb_pkts; i++) {
+		m = tx_pkts[i];
+
+		if ((m->ol_flags & PKT_TX_TCP_SEG) &&
+				(m->tso_segsz < FM10K_TSO_MINMSS)) {
+			rte_errno = -EINVAL;
+			return i;
+		}
+
+		if (m->ol_flags & FM10K_TX_OFFLOAD_NOTSUP_MASK) {
+			rte_errno = -EINVAL;
+			return i;
+		}
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+		ret = rte_validate_tx_offload(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+#endif
+		ret = rte_phdr_cksum_fix(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+	}
+
+	return i;
+}
-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 261+ messages in thread

* [dpdk-dev] [PATCH v4 4/6] i40e: add Tx preparation
  2016-09-30  9:00     ` [dpdk-dev] [PATCH v4 0/6] add Tx preparation Tomasz Kulasek
                         ` (2 preceding siblings ...)
  2016-09-30  9:00       ` [dpdk-dev] [PATCH v4 3/6] fm10k: " Tomasz Kulasek
@ 2016-09-30  9:00       ` Tomasz Kulasek
  2016-10-10 14:02         ` Wu, Jingjing
  2016-09-30  9:00       ` [dpdk-dev] [PATCH v4 5/6] ixgbe: " Tomasz Kulasek
                         ` (3 subsequent siblings)
  7 siblings, 1 reply; 261+ messages in thread
From: Tomasz Kulasek @ 2016-09-30  9:00 UTC (permalink / raw)
  To: dev; +Cc: konstantin.ananyev, Tomasz Kulasek

Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
---
 drivers/net/i40e/i40e_ethdev.c |    3 ++
 drivers/net/i40e/i40e_rxtx.c   |   72 +++++++++++++++++++++++++++++++++++++++-
 drivers/net/i40e/i40e_rxtx.h   |    8 +++++
 3 files changed, 82 insertions(+), 1 deletion(-)

diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
index b04c833..c1ee7e6 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -948,6 +948,7 @@ eth_i40e_dev_init(struct rte_eth_dev *dev)
 	dev->dev_ops = &i40e_eth_dev_ops;
 	dev->rx_pkt_burst = i40e_recv_pkts;
 	dev->tx_pkt_burst = i40e_xmit_pkts;
+	dev->tx_pkt_prep = i40e_prep_pkts;
 
 	/* for secondary processes, we don't initialise any further as primary
 	 * has already done this work. Only check we don't need a different
@@ -2614,6 +2615,8 @@ i40e_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 		.nb_max = I40E_MAX_RING_DESC,
 		.nb_min = I40E_MIN_RING_DESC,
 		.nb_align = I40E_ALIGN_RING_DESC,
+		.nb_seg_max = I40E_TX_MAX_SEG,
+		.nb_mtu_seg_max = I40E_TX_MAX_MTU_SEG,
 	};
 
 	if (pf->flags & I40E_FLAG_VMDQ) {
diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
index 554d167..bb69175 100644
--- a/drivers/net/i40e/i40e_rxtx.c
+++ b/drivers/net/i40e/i40e_rxtx.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -50,6 +50,8 @@
 #include <rte_tcp.h>
 #include <rte_sctp.h>
 #include <rte_udp.h>
+#include <rte_ip.h>
+#include <rte_pkt.h>
 
 #include "i40e_logs.h"
 #include "base/i40e_prototype.h"
@@ -79,6 +81,17 @@
 		PKT_TX_TCP_SEG |		 \
 		PKT_TX_OUTER_IP_CKSUM)
 
+#define I40E_TX_OFFLOAD_MASK (  \
+		PKT_TX_IP_CKSUM |       \
+		PKT_TX_L4_MASK |        \
+		PKT_TX_OUTER_IP_CKSUM | \
+		PKT_TX_TCP_SEG |        \
+		PKT_TX_QINQ_PKT |       \
+		PKT_TX_VLAN_PKT)
+
+#define I40E_TX_OFFLOAD_NOTSUP_MASK \
+		(PKT_TX_OFFLOAD_MASK ^ I40E_TX_OFFLOAD_MASK)
+
 static uint16_t i40e_xmit_pkts_simple(void *tx_queue,
 				      struct rte_mbuf **tx_pkts,
 				      uint16_t nb_pkts);
@@ -1930,6 +1943,61 @@ i40e_xmit_pkts_simple(void *tx_queue,
 	return nb_tx;
 }
 
+/*********************************************************************
+ *
+ *  TX prep functions
+ *
+ **********************************************************************/
+uint16_t
+i40e_prep_pkts(void *tx_queue __rte_unused, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts)
+{
+	int i, ret;
+	uint64_t ol_flags;
+	struct rte_mbuf *m;
+
+	for (i = 0; i < nb_pkts; i++) {
+		m = tx_pkts[i];
+		ol_flags = m->ol_flags;
+
+		/**
+		 * m->nb_segs is uint8_t, so m->nb_segs is always less than
+		 * I40E_TX_MAX_SEG.
+		 * We check only a condition for m->nb_segs > I40E_TX_MAX_MTU_SEG.
+		 */
+		if (!(ol_flags & PKT_TX_TCP_SEG)) {
+			if (m->nb_segs > I40E_TX_MAX_MTU_SEG) {
+				rte_errno = -EINVAL;
+				return i;
+			}
+		} else if ((m->tso_segsz < I40E_MIN_TSO_MSS) ||
+				(m->tso_segsz > I40E_MAX_TSO_MSS)) {
+			/* An MSS outside the range (256B - 9674B) is considered malicious */
+			rte_errno = -EINVAL;
+			return i;
+		}
+
+		if (ol_flags & I40E_TX_OFFLOAD_NOTSUP_MASK) {
+			rte_errno = -EINVAL;
+			return i;
+		}
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+		ret = rte_validate_tx_offload(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+#endif
+		ret = rte_phdr_cksum_fix(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+	}
+	return i;
+}
+
 /*
  * Find the VSI the queue belongs to. 'queue_idx' is the queue index
  * application used, which assume having sequential ones. But from driver's
@@ -3271,9 +3339,11 @@ i40e_set_tx_function(struct rte_eth_dev *dev)
 			PMD_INIT_LOG(DEBUG, "Simple tx finally be used.");
 			dev->tx_pkt_burst = i40e_xmit_pkts_simple;
 		}
+		dev->tx_pkt_prep = NULL;
 	} else {
 		PMD_INIT_LOG(DEBUG, "Xmit tx finally be used.");
 		dev->tx_pkt_burst = i40e_xmit_pkts;
+		dev->tx_pkt_prep = i40e_prep_pkts;
 	}
 }
 
diff --git a/drivers/net/i40e/i40e_rxtx.h b/drivers/net/i40e/i40e_rxtx.h
index 98179f0..e7eb89c 100644
--- a/drivers/net/i40e/i40e_rxtx.h
+++ b/drivers/net/i40e/i40e_rxtx.h
@@ -63,6 +63,12 @@
 #define	I40E_MIN_RING_DESC	64
 #define	I40E_MAX_RING_DESC	4096
 
+#define I40E_MIN_TSO_MSS          256
+#define I40E_MAX_TSO_MSS          9674
+
+#define I40E_TX_MAX_SEG     UINT8_MAX
+#define I40E_TX_MAX_MTU_SEG 8
+
 #undef container_of
 #define container_of(ptr, type, member) ({ \
 		typeof(((type *)0)->member)(*__mptr) = (ptr); \
@@ -223,6 +229,8 @@ uint16_t i40e_recv_scattered_pkts(void *rx_queue,
 uint16_t i40e_xmit_pkts(void *tx_queue,
 			struct rte_mbuf **tx_pkts,
 			uint16_t nb_pkts);
+uint16_t i40e_prep_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts);
 int i40e_tx_queue_init(struct i40e_tx_queue *txq);
 int i40e_rx_queue_init(struct i40e_rx_queue *rxq);
 void i40e_free_tx_resources(struct i40e_tx_queue *txq);
-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 261+ messages in thread

* [dpdk-dev] [PATCH v4 5/6] ixgbe: add Tx preparation
  2016-09-30  9:00     ` [dpdk-dev] [PATCH v4 0/6] add Tx preparation Tomasz Kulasek
                         ` (3 preceding siblings ...)
  2016-09-30  9:00       ` [dpdk-dev] [PATCH v4 4/6] i40e: " Tomasz Kulasek
@ 2016-09-30  9:00       ` Tomasz Kulasek
  2016-09-30  9:00       ` [dpdk-dev] [PATCH v4 6/6] testpmd: use Tx preparation in csum engine Tomasz Kulasek
                         ` (2 subsequent siblings)
  7 siblings, 0 replies; 261+ messages in thread
From: Tomasz Kulasek @ 2016-09-30  9:00 UTC (permalink / raw)
  To: dev; +Cc: konstantin.ananyev, Tomasz Kulasek

Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
---
 drivers/net/ixgbe/ixgbe_ethdev.c |    3 ++
 drivers/net/ixgbe/ixgbe_ethdev.h |    5 +++-
 drivers/net/ixgbe/ixgbe_rxtx.c   |   56 +++++++++++++++++++++++++++++++++++++-
 drivers/net/ixgbe/ixgbe_rxtx.h   |    2 ++
 4 files changed, 64 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
index 73a406b..fa6f045 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -515,6 +515,8 @@ static const struct rte_eth_desc_lim tx_desc_lim = {
 	.nb_max = IXGBE_MAX_RING_DESC,
 	.nb_min = IXGBE_MIN_RING_DESC,
 	.nb_align = IXGBE_TXD_ALIGN,
+	.nb_seg_max = IXGBE_TX_MAX_SEG,
+	.nb_mtu_seg_max = IXGBE_TX_MAX_SEG,
 };
 
 static const struct eth_dev_ops ixgbe_eth_dev_ops = {
@@ -1101,6 +1103,7 @@ eth_ixgbe_dev_init(struct rte_eth_dev *eth_dev)
 	eth_dev->dev_ops = &ixgbe_eth_dev_ops;
 	eth_dev->rx_pkt_burst = &ixgbe_recv_pkts;
 	eth_dev->tx_pkt_burst = &ixgbe_xmit_pkts;
+	eth_dev->tx_pkt_prep = &ixgbe_prep_pkts;
 
 	/*
 	 * For secondary processes, we don't initialise any further as primary
diff --git a/drivers/net/ixgbe/ixgbe_ethdev.h b/drivers/net/ixgbe/ixgbe_ethdev.h
index 4ff6338..e229cf5 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.h
+++ b/drivers/net/ixgbe/ixgbe_ethdev.h
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -396,6 +396,9 @@ uint16_t ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 uint16_t ixgbe_xmit_pkts_simple(void *tx_queue, struct rte_mbuf **tx_pkts,
 		uint16_t nb_pkts);
 
+uint16_t ixgbe_prep_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts);
+
 int ixgbe_dev_rss_hash_update(struct rte_eth_dev *dev,
 			      struct rte_eth_rss_conf *rss_conf);
 
diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
index 8b99282..a0caa74 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   Copyright 2014 6WIND S.A.
  *   All rights reserved.
  *
@@ -70,6 +70,7 @@
 #include <rte_string_fns.h>
 #include <rte_errno.h>
 #include <rte_ip.h>
+#include <rte_pkt.h>
 
 #include "ixgbe_logs.h"
 #include "base/ixgbe_api.h"
@@ -87,6 +88,9 @@
 		PKT_TX_TCP_SEG |		 \
 		PKT_TX_OUTER_IP_CKSUM)
 
+#define IXGBE_TX_OFFLOAD_NOTSUP_MASK \
+		(PKT_TX_OFFLOAD_MASK ^ IXGBE_TX_OFFLOAD_MASK)
+
 #if 1
 #define RTE_PMD_USE_PREFETCH
 #endif
@@ -905,6 +909,54 @@ end_of_tx:
 
 /*********************************************************************
  *
+ *  TX prep functions
+ *
+ **********************************************************************/
+uint16_t
+ixgbe_prep_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
+{
+	int i, ret;
+	struct rte_mbuf *m;
+	struct ixgbe_tx_queue *txq = (struct ixgbe_tx_queue *)tx_queue;
+
+	for (i = 0; i < nb_pkts; i++) {
+		m = tx_pkts[i];
+
+		/**
+		 * Check if the packet meets the requirements on the number of
+		 * segments.
+		 * NOTE: for ixgbe it is always (40 - WTHRESH), for both TSO and non-TSO.
+		 */
+
+		if (m->nb_segs > IXGBE_TX_MAX_SEG - txq->wthresh) {
+			rte_errno = -EINVAL;
+			return i;
+		}
+
+		if (m->ol_flags & IXGBE_TX_OFFLOAD_NOTSUP_MASK) {
+			rte_errno = -EINVAL;
+			return i;
+		}
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+		ret = rte_validate_tx_offload(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+#endif
+		ret = rte_phdr_cksum_fix(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+	}
+
+	return i;
+}
+
+/*********************************************************************
+ *
  *  RX functions
  *
  **********************************************************************/
@@ -2280,6 +2332,7 @@ ixgbe_set_tx_function(struct rte_eth_dev *dev, struct ixgbe_tx_queue *txq)
 	if (((txq->txq_flags & IXGBE_SIMPLE_FLAGS) == IXGBE_SIMPLE_FLAGS)
 			&& (txq->tx_rs_thresh >= RTE_PMD_IXGBE_TX_MAX_BURST)) {
 		PMD_INIT_LOG(DEBUG, "Using simple tx code path");
+		dev->tx_pkt_prep = NULL;
 #ifdef RTE_IXGBE_INC_VECTOR
 		if (txq->tx_rs_thresh <= RTE_IXGBE_TX_MAX_FREE_BUF_SZ &&
 				(rte_eal_process_type() != RTE_PROC_PRIMARY ||
@@ -2300,6 +2353,7 @@ ixgbe_set_tx_function(struct rte_eth_dev *dev, struct ixgbe_tx_queue *txq)
 				(unsigned long)txq->tx_rs_thresh,
 				(unsigned long)RTE_PMD_IXGBE_TX_MAX_BURST);
 		dev->tx_pkt_burst = ixgbe_xmit_pkts;
+		dev->tx_pkt_prep = ixgbe_prep_pkts;
 	}
 }
 
diff --git a/drivers/net/ixgbe/ixgbe_rxtx.h b/drivers/net/ixgbe/ixgbe_rxtx.h
index 2608b36..7bbd9b8 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx.h
+++ b/drivers/net/ixgbe/ixgbe_rxtx.h
@@ -80,6 +80,8 @@
 #define RTE_IXGBE_WAIT_100_US               100
 #define RTE_IXGBE_VMTXSW_REGISTER_COUNT     2
 
+#define IXGBE_TX_MAX_SEG                    40
+
 #define IXGBE_PACKET_TYPE_MASK_82599        0X7F
 #define IXGBE_PACKET_TYPE_MASK_X550         0X10FF
 #define IXGBE_PACKET_TYPE_MASK_TUNNEL       0XFF
-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 261+ messages in thread

* [dpdk-dev] [PATCH v4 6/6] testpmd: use Tx preparation in csum engine
  2016-09-30  9:00     ` [dpdk-dev] [PATCH v4 0/6] add Tx preparation Tomasz Kulasek
                         ` (4 preceding siblings ...)
  2016-09-30  9:00       ` [dpdk-dev] [PATCH v4 5/6] ixgbe: " Tomasz Kulasek
@ 2016-09-30  9:00       ` Tomasz Kulasek
  2016-09-30  9:55       ` [dpdk-dev] [PATCH v4 0/6] add Tx preparation Ananyev, Konstantin
  2016-10-13 17:36       ` [dpdk-dev] [PATCH v5 " Tomasz Kulasek
  7 siblings, 0 replies; 261+ messages in thread
From: Tomasz Kulasek @ 2016-09-30  9:00 UTC (permalink / raw)
  To: dev; +Cc: konstantin.ananyev, Tomasz Kulasek

Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
---
 app/test-pmd/csumonly.c |   97 ++++++++++++++++++++++++++---------------------
 1 file changed, 54 insertions(+), 43 deletions(-)

diff --git a/app/test-pmd/csumonly.c b/app/test-pmd/csumonly.c
index 21cb78f..8fcf814 100644
--- a/app/test-pmd/csumonly.c
+++ b/app/test-pmd/csumonly.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   Copyright 2014 6WIND S.A.
  *   All rights reserved.
  *
@@ -110,15 +110,6 @@ struct simple_gre_hdr {
 } __attribute__((__packed__));
 
 static uint16_t
-get_psd_sum(void *l3_hdr, uint16_t ethertype, uint64_t ol_flags)
-{
-	if (ethertype == _htons(ETHER_TYPE_IPv4))
-		return rte_ipv4_phdr_cksum(l3_hdr, ol_flags);
-	else /* assume ethertype == ETHER_TYPE_IPv6 */
-		return rte_ipv6_phdr_cksum(l3_hdr, ol_flags);
-}
-
-static uint16_t
 get_udptcp_checksum(void *l3_hdr, void *l4_hdr, uint16_t ethertype)
 {
 	if (ethertype == _htons(ETHER_TYPE_IPv4))
@@ -368,11 +359,9 @@ process_inner_cksums(void *l3_hdr, const struct testpmd_offload_info *info,
 		/* do not recalculate udp cksum if it was 0 */
 		if (udp_hdr->dgram_cksum != 0) {
 			udp_hdr->dgram_cksum = 0;
-			if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_UDP_CKSUM) {
+			if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_UDP_CKSUM)
 				ol_flags |= PKT_TX_UDP_CKSUM;
-				udp_hdr->dgram_cksum = get_psd_sum(l3_hdr,
-					info->ethertype, ol_flags);
-			} else {
+			else {
 				udp_hdr->dgram_cksum =
 					get_udptcp_checksum(l3_hdr, udp_hdr,
 						info->ethertype);
@@ -381,15 +370,11 @@ process_inner_cksums(void *l3_hdr, const struct testpmd_offload_info *info,
 	} else if (info->l4_proto == IPPROTO_TCP) {
 		tcp_hdr = (struct tcp_hdr *)((char *)l3_hdr + info->l3_len);
 		tcp_hdr->cksum = 0;
-		if (info->tso_segsz != 0) {
+		if (info->tso_segsz != 0)
 			ol_flags |= PKT_TX_TCP_SEG;
-			tcp_hdr->cksum = get_psd_sum(l3_hdr, info->ethertype,
-				ol_flags);
-		} else if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_TCP_CKSUM) {
+		else if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_TCP_CKSUM)
 			ol_flags |= PKT_TX_TCP_CKSUM;
-			tcp_hdr->cksum = get_psd_sum(l3_hdr, info->ethertype,
-				ol_flags);
-		} else {
+		else {
 			tcp_hdr->cksum =
 				get_udptcp_checksum(l3_hdr, tcp_hdr,
 					info->ethertype);
@@ -639,7 +624,8 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 	void *l3_hdr = NULL, *outer_l3_hdr = NULL; /* can be IPv4 or IPv6 */
 	uint16_t nb_rx;
 	uint16_t nb_tx;
-	uint16_t i;
+	uint16_t nb_prep;
+	uint16_t i, n;
 	uint64_t ol_flags;
 	uint16_t testpmd_ol_flags;
 	uint32_t retry;
@@ -847,31 +833,56 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 			printf("\n");
 		}
 	}
-	nb_tx = rte_eth_tx_burst(fs->tx_port, fs->tx_queue, pkts_burst, nb_rx);
-	/*
-	 * Retry if necessary
-	 */
-	if (unlikely(nb_tx < nb_rx) && fs->retry_enabled) {
-		retry = 0;
-		while (nb_tx < nb_rx && retry++ < burst_tx_retry_num) {
-			rte_delay_us(burst_tx_delay_time);
-			nb_tx += rte_eth_tx_burst(fs->tx_port, fs->tx_queue,
-					&pkts_burst[nb_tx], nb_rx - nb_tx);
+
+	n = 0;
+
+	do {
+		nb_prep = rte_eth_tx_prep(fs->tx_port, fs->tx_queue, &pkts_burst[n],
+				nb_rx - n);
+
+		if (nb_prep != nb_rx - n) {
+			printf("Preparing packet burst to transmit failed: %s\n",
+					rte_strerror(rte_errno));
+			/* Drop malicious packet */
+			rte_pktmbuf_free(pkts_burst[n + nb_prep]);
+			fs->fwd_dropped++;
+		}
+
+		nb_tx = rte_eth_tx_burst(fs->tx_port, fs->tx_queue, &pkts_burst[n],
+				nb_prep);
+
+		/*
+		 * Retry if necessary
+		 */
+		if (unlikely(nb_tx < nb_prep) && fs->retry_enabled) {
+			retry = 0;
+			while ((nb_tx < nb_prep) && (retry++ < burst_tx_retry_num)) {
+				rte_delay_us(burst_tx_delay_time);
+				nb_tx += rte_eth_tx_burst(fs->tx_port, fs->tx_queue,
+						&pkts_burst[nb_tx + n], nb_prep - nb_tx);
+			}
 		}
-	}
-	fs->tx_packets += nb_tx;
-	fs->rx_bad_ip_csum += rx_bad_ip_csum;
-	fs->rx_bad_l4_csum += rx_bad_l4_csum;
+
+		fs->tx_packets += nb_tx;
 
 #ifdef RTE_TEST_PMD_RECORD_BURST_STATS
-	fs->tx_burst_stats.pkt_burst_spread[nb_tx]++;
+		fs->tx_burst_stats.pkt_burst_spread[nb_tx]++;
 #endif
-	if (unlikely(nb_tx < nb_rx)) {
-		fs->fwd_dropped += (nb_rx - nb_tx);
-		do {
-			rte_pktmbuf_free(pkts_burst[nb_tx]);
-		} while (++nb_tx < nb_rx);
-	}
+		if (unlikely(nb_tx < nb_prep)) {
+			fs->fwd_dropped += (nb_prep - nb_tx);
+			do {
+				rte_pktmbuf_free(pkts_burst[n + nb_tx]);
+			} while (++nb_tx < nb_prep);
+		}
+
+		/* Skip past the prepared chunk plus the invalid packet, if any */
+		n += (nb_prep + 1);
+
+	} while (n < nb_rx);
+
+	fs->rx_bad_ip_csum += rx_bad_ip_csum;
+	fs->rx_bad_l4_csum += rx_bad_l4_csum;
+
 #ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES
 	end_tsc = rte_rdtsc();
 	core_cycles = (end_tsc - start_tsc);
-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v4 0/6] add Tx preparation
  2016-09-30  9:00     ` [dpdk-dev] [PATCH v4 0/6] add Tx preparation Tomasz Kulasek
                         ` (5 preceding siblings ...)
  2016-09-30  9:00       ` [dpdk-dev] [PATCH v4 6/6] testpmd: use Tx preparation in csum engine Tomasz Kulasek
@ 2016-09-30  9:55       ` Ananyev, Konstantin
  2016-10-13 17:36       ` [dpdk-dev] [PATCH v5 " Tomasz Kulasek
  7 siblings, 0 replies; 261+ messages in thread
From: Ananyev, Konstantin @ 2016-09-30  9:55 UTC (permalink / raw)
  To: Kulasek, TomaszX, dev



> [...]

Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>


^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v4 4/6] i40e: add Tx preparation
  2016-09-30  9:00       ` [dpdk-dev] [PATCH v4 4/6] i40e: " Tomasz Kulasek
@ 2016-10-10 14:02         ` Wu, Jingjing
  2016-10-10 17:20           ` Kulasek, TomaszX
  0 siblings, 1 reply; 261+ messages in thread
From: Wu, Jingjing @ 2016-10-10 14:02 UTC (permalink / raw)
  To: Kulasek, TomaszX, dev; +Cc: Ananyev, Konstantin, Kulasek, TomaszX

>  #include "i40e_logs.h"
>  #include "base/i40e_prototype.h"
> @@ -79,6 +81,17 @@
>  		PKT_TX_TCP_SEG |		 \
>  		PKT_TX_OUTER_IP_CKSUM)
> 
> +#define I40E_TX_OFFLOAD_MASK (  \
> +		PKT_TX_IP_CKSUM |       \
> +		PKT_TX_L4_MASK |        \
> +		PKT_TX_OUTER_IP_CKSUM | \
> +		PKT_TX_TCP_SEG |        \
> +		PKT_TX_QINQ_PKT |       \
> +		PKT_TX_VLAN_PKT)
> +
More TX flags are added for tunneling as below.
/**
 * Bits 45:48 used for the tunnel type.
 * When doing Tx offload like TSO or checksum, the HW needs to configure the
 * tunnel type into the HW descriptors.
 */
#define PKT_TX_TUNNEL_VXLAN   (0x1ULL << 45)
#define PKT_TX_TUNNEL_GRE     (0x2ULL << 45)
#define PKT_TX_TUNNEL_IPIP    (0x3ULL << 45)
#define PKT_TX_TUNNEL_GENEVE  (0x4ULL << 45)
/* add new TX TUNNEL type here */
#define PKT_TX_TUNNEL_MASK    (0xFULL << 45)
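
With these, I40E_TX_OFFLOAD_MASK would presumably need to be extended to
cover the tunnel types too, e.g. something like (just a sketch, the exact
set of flags to accept is up to you):

#define I40E_TX_OFFLOAD_MASK (  \
		PKT_TX_IP_CKSUM |       \
		PKT_TX_L4_MASK |        \
		PKT_TX_OUTER_IP_CKSUM | \
		PKT_TX_TCP_SEG |        \
		PKT_TX_QINQ_PKT |       \
		PKT_TX_VLAN_PKT |       \
		PKT_TX_TUNNEL_MASK)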

Please check:
commit 63c0d74daaa9a807fbca8a3e363bbe41d6fb715f
Author: Jianfeng Tan <jianfeng.tan@intel.com>
Date:   Mon Aug 1 03:56:53 2016 +0000

    mbuf: add Tx side tunneling type


Thanks
Jingjing

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v4 1/6] ethdev: add Tx preparation
  2016-09-30  9:00       ` [dpdk-dev] [PATCH v4 1/6] ethdev: " Tomasz Kulasek
@ 2016-10-10 14:08         ` Thomas Monjalon
  2016-10-13  7:08           ` Thomas Monjalon
  0 siblings, 1 reply; 261+ messages in thread
From: Thomas Monjalon @ 2016-10-10 14:08 UTC (permalink / raw)
  To: Tomasz Kulasek; +Cc: dev, konstantin.ananyev

Hi,

Now that the feature seems to meet a consensus, I've looked at it more
closely before integrating. Sorry if it appears like a late review.

2016-09-30 11:00, Tomasz Kulasek:
> Added API for `rte_eth_tx_prep`

I would love to read the usability and performance considerations here.
No need for something as long as the cover letter. Just a few lines
about why it is needed and why it is a good choice for performance.

> uint16_t rte_eth_tx_prep(uint8_t port_id, uint16_t queue_id,
> 	struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
> 
> Added fields to the `struct rte_eth_desc_lim`:
> 
> 	uint16_t nb_seg_max;
> 		/**< Max number of segments per whole packet. */
> 
> 	uint16_t nb_mtu_seg_max;
> 		/**< Max number of segments per one MTU */
[...]
> +#else
> +
> +static inline uint16_t
> +rte_eth_tx_prep(uint8_t port_id __rte_unused, uint16_t queue_id __rte_unused,
> +		struct rte_mbuf **tx_pkts __rte_unused, uint16_t nb_pkts)

Doxygen is failing here.
Have you tried to move __rte_unused before the identifier?
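
For example, something like this (not tested against Doxygen, just to
illustrate the placement):

static inline uint16_t
rte_eth_tx_prep(__rte_unused uint8_t port_id, __rte_unused uint16_t queue_id,
		__rte_unused struct rte_mbuf **tx_pkts, uint16_t nb_pkts)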

[...]
> +#define PKT_TX_OFFLOAD_MASK (    \
> +		PKT_TX_IP_CKSUM |        \
> +		PKT_TX_L4_MASK |         \
> +		PKT_TX_OUTER_IP_CKSUM |  \
> +		PKT_TX_TCP_SEG |         \
> +		PKT_TX_QINQ_PKT |        \
> +		PKT_TX_VLAN_PKT)

We should really stop adding some public constants without the proper
RTE prefix.
And by the way, should not we move such flags into rte_net?

[...]
> -SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include := rte_ip.h rte_tcp.h rte_udp.h rte_sctp.h rte_icmp.h rte_arp.h
> +SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include := rte_ip.h rte_tcp.h rte_udp.h rte_sctp.h rte_icmp.h rte_arp.h rte_pkt.h

You can use the += operator on a new line for free :)
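
i.e. keep the existing line unchanged and just add:

SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include += rte_pkt.h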

No more comments, the rest seems OK. Thanks

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v4 4/6] i40e: add Tx preparation
  2016-10-10 14:02         ` Wu, Jingjing
@ 2016-10-10 17:20           ` Kulasek, TomaszX
  0 siblings, 0 replies; 261+ messages in thread
From: Kulasek, TomaszX @ 2016-10-10 17:20 UTC (permalink / raw)
  To: Wu, Jingjing, dev; +Cc: Ananyev, Konstantin

Hi Jingjing,

> -----Original Message-----
> From: Wu, Jingjing
> Sent: Monday, October 10, 2016 16:03
> To: Kulasek, TomaszX <tomaszx.kulasek@intel.com>; dev@dpdk.org
> Cc: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Kulasek, TomaszX
> <tomaszx.kulasek@intel.com>
> Subject: RE: [dpdk-dev] [PATCH v4 4/6] i40e: add Tx preparation
> 
> >  #include "i40e_logs.h"
> >  #include "base/i40e_prototype.h"
> > @@ -79,6 +81,17 @@
> >  		PKT_TX_TCP_SEG |		 \
> >  		PKT_TX_OUTER_IP_CKSUM)
> >
> > +#define I40E_TX_OFFLOAD_MASK (  \
> > +		PKT_TX_IP_CKSUM |       \
> > +		PKT_TX_L4_MASK |        \
> > +		PKT_TX_OUTER_IP_CKSUM | \
> > +		PKT_TX_TCP_SEG |        \
> > +		PKT_TX_QINQ_PKT |       \
> > +		PKT_TX_VLAN_PKT)
> > +
> More TX flags are added for tunneling as below.
> /**
>  * Bits 45:48 used for the tunnel type.
>  * When doing Tx offload like TSO or checksum, the HW needs to configure
> the
>  * tunnel type into the HW descriptors.
>  */
> #define PKT_TX_TUNNEL_VXLAN   (0x1ULL << 45)
> #define PKT_TX_TUNNEL_GRE     (0x2ULL << 45)
> #define PKT_TX_TUNNEL_IPIP    (0x3ULL << 45)
> #define PKT_TX_TUNNEL_GENEVE  (0x4ULL << 45)
> /* add new TX TUNNEL type here */
> #define PKT_TX_TUNNEL_MASK    (0xFULL << 45)
> 
> Please check:
> commit 63c0d74daaa9a807fbca8a3e363bbe41d6fb715f
> Author: Jianfeng Tan <jianfeng.tan@intel.com>
> Date:   Mon Aug 1 03:56:53 2016 +0000
> 
>     mbuf: add Tx side tunneling type
> 
> 
> Thanks
> Jingjing

Thanks for spotting this, I will send updated v5.

Tomasz

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v4 1/6] ethdev: add Tx preparation
  2016-10-10 14:08         ` Thomas Monjalon
@ 2016-10-13  7:08           ` Thomas Monjalon
  2016-10-13 10:47             ` Kulasek, TomaszX
  0 siblings, 1 reply; 261+ messages in thread
From: Thomas Monjalon @ 2016-10-13  7:08 UTC (permalink / raw)
  To: Tomasz Kulasek; +Cc: dev, konstantin.ananyev

Hi Tomasz,

Any news?
Sorry to speed up, we are very very late for RC1.

2016-10-10 16:08, Thomas Monjalon:
> Hi,
> 
> Now that the feature seems to meet a consensus, I've looked at it more
> closely before integrating. Sorry if it appears like a late review.
> 
> 2016-09-30 11:00, Tomasz Kulasek:
> > Added API for `rte_eth_tx_prep`
> 
> I would love to read the usability and performance considerations here.
> No need for something as long as the cover letter. Just a few lines
> about why it is needed and why it is a good choice for performance.
> 
> > uint16_t rte_eth_tx_prep(uint8_t port_id, uint16_t queue_id,
> > 	struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
> > 
> > Added fields to the `struct rte_eth_desc_lim`:
> > 
> > 	uint16_t nb_seg_max;
> > 		/**< Max number of segments per whole packet. */
> > 
> > 	uint16_t nb_mtu_seg_max;
> > 		/**< Max number of segments per one MTU */
> [...]
> > +#else
> > +
> > +static inline uint16_t
> > +rte_eth_tx_prep(uint8_t port_id __rte_unused, uint16_t queue_id __rte_unused,
> > +		struct rte_mbuf **tx_pkts __rte_unused, uint16_t nb_pkts)
> 
> Doxygen is failing here.
> Have you tried to move __rte_unused before the identifier?
> 
> [...]
> > +#define PKT_TX_OFFLOAD_MASK (    \
> > +		PKT_TX_IP_CKSUM |        \
> > +		PKT_TX_L4_MASK |         \
> > +		PKT_TX_OUTER_IP_CKSUM |  \
> > +		PKT_TX_TCP_SEG |         \
> > +		PKT_TX_QINQ_PKT |        \
> > +		PKT_TX_VLAN_PKT)
> 
> We should really stop adding some public constants without the proper
> RTE prefix.
> And by the way, should not we move such flags into rte_net?

Do not worry with this comment. It was just a thought which could be
addressed in a separate patch by someone else.

> [...]
> > -SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include := rte_ip.h rte_tcp.h rte_udp.h rte_sctp.h rte_icmp.h rte_arp.h
> > +SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include := rte_ip.h rte_tcp.h rte_udp.h rte_sctp.h rte_icmp.h rte_arp.h rte_pkt.h
> 
> You can use the += operator on a new line for free :)
> 
> No more comments, the rest seems OK. Thanks

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v4 1/6] ethdev: add Tx preparation
  2016-10-13  7:08           ` Thomas Monjalon
@ 2016-10-13 10:47             ` Kulasek, TomaszX
  0 siblings, 0 replies; 261+ messages in thread
From: Kulasek, TomaszX @ 2016-10-13 10:47 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev, Ananyev, Konstantin

Hi Thomas,

> -----Original Message-----
> From: Thomas Monjalon [mailto:thomas.monjalon@6wind.com]
> Sent: Thursday, October 13, 2016 09:09
> To: Kulasek, TomaszX <tomaszx.kulasek@intel.com>
> Cc: dev@dpdk.org; Ananyev, Konstantin <konstantin.ananyev@intel.com>
> Subject: Re: [dpdk-dev] [PATCH v4 1/6] ethdev: add Tx preparation
> 
> Hi Tomasz,
> 
> Any news?
> Sorry to speed up, we are very very late for RC1.
> 
> 2016-10-10 16:08, Thomas Monjalon:
> > Hi,
> >
> > Now that the feature seems to meet a consensus, I've looked at it more
> > closely before integrating. Sorry if it appears like a late review.
> >
> > 2016-09-30 11:00, Tomasz Kulasek:
> > > Added API for `rte_eth_tx_prep`
> >
> > I would love to read the usability and performance considerations here.
> > No need for something as long as the cover letter. Just a few lines
> > about why it is needed and why it is a good choice for performance.
> >
> > > uint16_t rte_eth_tx_prep(uint8_t port_id, uint16_t queue_id,
> > > 	struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
> > >
> > > Added fields to the `struct rte_eth_desc_lim`:
> > >
> > > 	uint16_t nb_seg_max;
> > > 		/**< Max number of segments per whole packet. */
> > >
> > > 	uint16_t nb_mtu_seg_max;
> > > 		/**< Max number of segments per one MTU */
> > [...]
> > > +#else
> > > +
> > > +static inline uint16_t
> > > +rte_eth_tx_prep(uint8_t port_id __rte_unused, uint16_t queue_id
> __rte_unused,
> > > +		struct rte_mbuf **tx_pkts __rte_unused, uint16_t nb_pkts)
> >
> > Doxygen is failing here.
> > Have you tried to move __rte_unused before the identifier?
> >
> > [...]
> > > +#define PKT_TX_OFFLOAD_MASK (    \
> > > +		PKT_TX_IP_CKSUM |        \
> > > +		PKT_TX_L4_MASK |         \
> > > +		PKT_TX_OUTER_IP_CKSUM |  \
> > > +		PKT_TX_TCP_SEG |         \
> > > +		PKT_TX_QINQ_PKT |        \
> > > +		PKT_TX_VLAN_PKT)
> >
> > We should really stop adding some public constants without the proper
> > RTE prefix.
> > And by the way, should not we move such flags into rte_net?
> 
> Do not worry with this comment. It was just a thought which could be
> addressed in a separate patch by someone else.

I've used this naming convention to be consistent with the other offload flag names, and that is the only reason. The place for these flags was chosen for easier maintenance (flags and mask in one place).

I will leave it as is.

> 
> > [...]
> > > -SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include := rte_ip.h rte_tcp.h
> > > rte_udp.h rte_sctp.h rte_icmp.h rte_arp.h
> > > +SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include := rte_ip.h rte_tcp.h
> > > +rte_udp.h rte_sctp.h rte_icmp.h rte_arp.h rte_pkt.h
> >
> > You can use the += operator on a new line for free :)
> >
> > No more comments, the rest seems OK. Thanks
> 

Some additional work was needed due to the new tx tunnel type flags and the changes in the csum engine.

I will send v5 today.

Tomasz

^ permalink raw reply	[flat|nested] 261+ messages in thread

* [dpdk-dev] [PATCH v5 0/6] add Tx preparation
  2016-09-30  9:00     ` [dpdk-dev] [PATCH v4 0/6] add Tx preparation Tomasz Kulasek
                         ` (6 preceding siblings ...)
  2016-09-30  9:55       ` [dpdk-dev] [PATCH v4 0/6] add Tx preparation Ananyev, Konstantin
@ 2016-10-13 17:36       ` Tomasz Kulasek
  2016-10-13 17:36         ` [dpdk-dev] [PATCH v5 1/6] ethdev: " Tomasz Kulasek
                           ` (6 more replies)
  7 siblings, 7 replies; 261+ messages in thread
From: Tomasz Kulasek @ 2016-10-13 17:36 UTC (permalink / raw)
  To: dev; +Cc: konstantin.ananyev, thomas.monjalon

As discussed in that thread:

http://dpdk.org/ml/archives/dev/2015-September/023603.html

Different NIC models, depending on the HW offloads requested, might impose different requirements on packets to be TX-ed, in terms of:

 - Max number of fragments per packet allowed
 - Max number of fragments per TSO segments
 - The way pseudo-header checksum should be pre-calculated
 - L3/L4 header fields filling
 - etc.


MOTIVATION:
-----------

1) Some work cannot (and should not) be done in rte_eth_tx_burst.
   However, this work is sometimes required, and today it is left as an
   application issue.

2) Different hardware may have different requirements for TX offloads:
   a different subset may be supported, and so on.

3) Some parameters (e.g. the number of segments in the ixgbe driver)
   may hang the device. These parameters may vary between devices.

   For example, i40e HW allows 8 fragments per packet, but that is
   after TSO segmentation, while ixgbe has a 38-fragment pre-TSO limit.

4) Fields in the packet may require different initialization (e.g.
   pseudo-header checksum precalculation, sometimes done differently
   depending on the packet type, and so on). Today the application
   needs to take care of this.

5) Using an additional API (rte_eth_tx_prep) before rte_eth_tx_burst
   lets the application prepare the packet burst in a form acceptable
   to the specific device.

6) Some additional checks may be done in debug mode, keeping the
   tx_burst implementation clean.


PROPOSAL:
---------

To help user to deal with all these varieties we propose to:

1) Introduce a rte_eth_tx_prep() function to do the necessary
   preparations of a packet burst so that it can be safely transmitted
   on the device with the desired HW offloads (set/reset the checksum
   fields according to the hardware requirements) and to check HW
   constraints (number of segments per packet, etc).

   Since the limitations and requirements may differ between devices,
   the rte_eth_dev structure is extended with a new function pointer,
   "tx_pkt_prep", which can be implemented in the driver to prepare and
   verify packets, in a device-specific way, before the burst; this
   should prevent the application from sending malformed packets.

2) Also, new fields are introduced in rte_eth_desc_lim:
   nb_seg_max and nb_mtu_seg_max, providing information about the
   maximum number of segments acceptable to the device in TSO and
   non-TSO packets.

   This information helps the application avoid creating, or at least
   limit, malformed packets.


APPLICATION (USE CASE):
-----------------------

1) The application should initialize the burst of packets to send and
   set the required tx offload flags and fields, such as l2_len,
   l3_len, l4_len, and tso_segsz.

2) The application passes the burst to rte_eth_tx_prep to check the
   conditions required to send the packets through the NIC.

3) The result of rte_eth_tx_prep can be used to send the valid packets
   and/or to restore the invalid ones if the function fails.

e.g.

	for (i = 0; i < nb_pkts; i++) {

		/* initialize or process packet */

		bufs[i]->tso_segsz = 800;
		bufs[i]->ol_flags = PKT_TX_TCP_SEG | PKT_TX_IPV4
				| PKT_TX_IP_CKSUM;
		bufs[i]->l2_len = sizeof(struct ether_hdr);
		bufs[i]->l3_len = sizeof(struct ipv4_hdr);
		bufs[i]->l4_len = sizeof(struct tcp_hdr);
	}

	/* Prepare burst of TX packets */
	nb_prep = rte_eth_tx_prep(port, 0, bufs, nb_pkts);

	if (nb_prep < nb_pkts) {
		printf("tx_prep failed\n");

		/* nb_prep here indicates the first invalid packet.
		 * rte_eth_tx_prep can be used on the remaining packets to
		 * find further invalid ones; see the loop sketch below.
		 */

	}

	/* Send burst of TX packets */
	nb_tx = rte_eth_tx_burst(port, 0, bufs, nb_prep);

	/* Free any unsent packets. */
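
A complete drop-and-continue loop could look like this (a sketch only;
it reuses the names from the example above and keeps the error handling
minimal):

	uint16_t n = 0;

	while (n < nb_pkts) {
		uint16_t nb_prep, nb_tx;

		nb_prep = rte_eth_tx_prep(port, 0, &bufs[n], nb_pkts - n);

		/* drop the first invalid packet, if any */
		if (nb_prep < nb_pkts - n)
			rte_pktmbuf_free(bufs[n + nb_prep]);

		nb_tx = rte_eth_tx_burst(port, 0, &bufs[n], nb_prep);

		/* free the unsent packets of this chunk */
		while (nb_tx < nb_prep)
			rte_pktmbuf_free(bufs[n + nb_tx++]);

		/* continue right after the dropped packet */
		n += nb_prep + 1;
	}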


v5 changes:
 - rebased csum engine modification
 - added information to the csum engine about performance tests
 - some performance improvements

v4 changes:
 - tx_prep is now set to default behavior (NULL) for simple/vector path
   in fm10k, i40e and ixgbe drivers to increase performance, when
   Tx offloads are not intentionally available

v3 changes:
 - reworked the csum testpmd engine instead of adding a new one,
 - fixed the checksum initialization procedure to also include outer
   checksum offloads,
 - some minor formatting and optimization changes

v2 changes:
 - rte_eth_tx_prep() returns the number of packets when the device
   doesn't support the tx_prep functionality,
 - introduced CONFIG_RTE_ETHDEV_TX_PREP, allowing tx_prep to be turned off


Tomasz Kulasek (6):
  ethdev: add Tx preparation
  e1000: add Tx preparation
  fm10k: add Tx preparation
  i40e: add Tx preparation
  ixgbe: add Tx preparation
  csum: fixup

 app/test-pmd/csumonly.c          |   36 ++++------
 config/common_base               |    1 +
 drivers/net/e1000/e1000_ethdev.h |   11 +++
 drivers/net/e1000/em_ethdev.c    |    5 +-
 drivers/net/e1000/em_rxtx.c      |   48 ++++++++++++-
 drivers/net/e1000/igb_ethdev.c   |    4 ++
 drivers/net/e1000/igb_rxtx.c     |   52 +++++++++++++-
 drivers/net/fm10k/fm10k.h        |    6 ++
 drivers/net/fm10k/fm10k_ethdev.c |    5 ++
 drivers/net/fm10k/fm10k_rxtx.c   |   50 +++++++++++++-
 drivers/net/i40e/i40e_ethdev.c   |    3 +
 drivers/net/i40e/i40e_rxtx.c     |   72 +++++++++++++++++++-
 drivers/net/i40e/i40e_rxtx.h     |    8 +++
 drivers/net/ixgbe/ixgbe_ethdev.c |    3 +
 drivers/net/ixgbe/ixgbe_ethdev.h |    5 +-
 drivers/net/ixgbe/ixgbe_rxtx.c   |   58 +++++++++++++++-
 drivers/net/ixgbe/ixgbe_rxtx.h   |    2 +
 lib/librte_ether/rte_ethdev.h    |   85 +++++++++++++++++++++++
 lib/librte_mbuf/rte_mbuf.h       |    9 +++
 lib/librte_net/Makefile          |    3 +-
 lib/librte_net/rte_pkt.h         |  139 ++++++++++++++++++++++++++++++++++++++
 21 files changed, 574 insertions(+), 31 deletions(-)
 create mode 100644 lib/librte_net/rte_pkt.h

-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 261+ messages in thread

* [dpdk-dev] [PATCH v5 1/6] ethdev: add Tx preparation
  2016-10-13 17:36       ` [dpdk-dev] [PATCH v5 " Tomasz Kulasek
@ 2016-10-13 17:36         ` Tomasz Kulasek
  2016-10-13 19:21           ` Thomas Monjalon
  2016-10-13 17:36         ` [dpdk-dev] [PATCH v5 2/6] e1000: " Tomasz Kulasek
                           ` (5 subsequent siblings)
  6 siblings, 1 reply; 261+ messages in thread
From: Tomasz Kulasek @ 2016-10-13 17:36 UTC (permalink / raw)
  To: dev; +Cc: konstantin.ananyev, thomas.monjalon

Added API for `rte_eth_tx_prep`

uint16_t rte_eth_tx_prep(uint8_t port_id, uint16_t queue_id,
	struct rte_mbuf **tx_pkts, uint16_t nb_pkts)

Added fields to the `struct rte_eth_desc_lim`:

	uint16_t nb_seg_max;
		/**< Max number of segments per whole packet. */

	uint16_t nb_mtu_seg_max;
		/**< Max number of segments per one MTU */
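
An application can read these limits through rte_eth_dev_info_get(),
e.g. (illustrative only, for a non-TSO mbuf m):

	struct rte_eth_dev_info dev_info;

	rte_eth_dev_info_get(port_id, &dev_info);

	if (m->nb_segs > dev_info.tx_desc_lim.nb_mtu_seg_max) {
		/* packet would be rejected by tx_prep: split or drop it */
	}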

Created `rte_pkt.h` header with commonly used functions:

int rte_validate_tx_offload(struct rte_mbuf *m)
	to validate general requirements for the tx offloads set in a
	packet, such as flag completeness. In the current implementation
	this function is called optionally, when RTE_LIBRTE_ETHDEV_DEBUG
	is enabled.

int rte_phdr_cksum_fix(struct rte_mbuf *m)
	to fix pseudo header checksum for TSO and non-TSO tcp/udp packets
	before hardware tx checksum offload.
	 - for non-TSO tcp/udp packets the full pseudo-header checksum is
	   computed and set.
	 - for TSO the IP payload length is not included.

Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
---
 config/common_base            |    1 +
 lib/librte_ether/rte_ethdev.h |   85 +++++++++++++++++++++++++
 lib/librte_mbuf/rte_mbuf.h    |    9 +++
 lib/librte_net/Makefile       |    3 +-
 lib/librte_net/rte_pkt.h      |  139 +++++++++++++++++++++++++++++++++++++++++
 5 files changed, 236 insertions(+), 1 deletion(-)
 create mode 100644 lib/librte_net/rte_pkt.h

diff --git a/config/common_base b/config/common_base
index f5d2eff..0af6481 100644
--- a/config/common_base
+++ b/config/common_base
@@ -120,6 +120,7 @@ CONFIG_RTE_MAX_QUEUES_PER_PORT=1024
 CONFIG_RTE_LIBRTE_IEEE1588=n
 CONFIG_RTE_ETHDEV_QUEUE_STAT_CNTRS=16
 CONFIG_RTE_ETHDEV_RXTX_CALLBACKS=y
+CONFIG_RTE_ETHDEV_TX_PREP=y
 
 #
 # Support NIC bypass logic
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 0a32ebb..db5dc93 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -182,6 +182,7 @@ extern "C" {
 #include <rte_pci.h>
 #include <rte_dev.h>
 #include <rte_devargs.h>
+#include <rte_errno.h>
 #include "rte_ether.h"
 #include "rte_eth_ctrl.h"
 #include "rte_dev_info.h"
@@ -699,6 +700,8 @@ struct rte_eth_desc_lim {
 	uint16_t nb_max;   /**< Max allowed number of descriptors. */
 	uint16_t nb_min;   /**< Min allowed number of descriptors. */
 	uint16_t nb_align; /**< Number of descriptors should be aligned to. */
+	uint16_t nb_seg_max;     /**< Max number of segments per whole packet. */
+	uint16_t nb_mtu_seg_max; /**< Max number of segments per one MTU */
 };
 
 /**
@@ -1188,6 +1191,11 @@ typedef uint16_t (*eth_tx_burst_t)(void *txq,
 				   uint16_t nb_pkts);
 /**< @internal Send output packets on a transmit queue of an Ethernet device. */
 
+typedef uint16_t (*eth_tx_prep_t)(void *txq,
+				   struct rte_mbuf **tx_pkts,
+				   uint16_t nb_pkts);
+/**< @internal Prepare output packets on a transmit queue of an Ethernet device. */
+
 typedef int (*flow_ctrl_get_t)(struct rte_eth_dev *dev,
 			       struct rte_eth_fc_conf *fc_conf);
 /**< @internal Get current flow control parameter on an Ethernet device */
@@ -1622,6 +1630,7 @@ struct rte_eth_rxtx_callback {
 struct rte_eth_dev {
 	eth_rx_burst_t rx_pkt_burst; /**< Pointer to PMD receive function. */
 	eth_tx_burst_t tx_pkt_burst; /**< Pointer to PMD transmit function. */
+	eth_tx_prep_t tx_pkt_prep; /**< Pointer to PMD transmit prepare function. */
 	struct rte_eth_dev_data *data;  /**< Pointer to device data */
 	const struct eth_driver *driver;/**< Driver for this device */
 	const struct eth_dev_ops *dev_ops; /**< Functions exported by PMD */
@@ -2816,6 +2825,82 @@ rte_eth_tx_burst(uint8_t port_id, uint16_t queue_id,
 	return (*dev->tx_pkt_burst)(dev->data->tx_queues[queue_id], tx_pkts, nb_pkts);
 }
 
+/**
+ * Process a burst of output packets on a transmit queue of an Ethernet device.
+ *
+ * The rte_eth_tx_prep() function is invoked to prepare output packets to be
+ * transmitted on the output queue *queue_id* of the Ethernet device designated
+ * by its *port_id*.
+ * The *nb_pkts* parameter is the number of packets to be prepared which are
+ * supplied in the *tx_pkts* array of *rte_mbuf* structures, each of them
+ * allocated from a pool created with rte_pktmbuf_pool_create().
+ * For each packet to send, the rte_eth_tx_prep() function performs
+ * the following operations:
+ *
+ * - Check if the packet meets the device's requirements for tx offloads.
+ *
+ * - Check limitations about number of segments.
+ *
+ * - Check additional requirements when debug is enabled.
+ *
+ * - Update and/or reset required checksums when tx offload is set for packet.
+ *
+ * The rte_eth_tx_prep() function returns the number of packets ready to be
+ * sent. A return value equal to *nb_pkts* means that all packets are valid and
+ * ready to be sent.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param queue_id
+ *   The index of the transmit queue through which output packets must be
+ *   sent.
+ *   The value must be in the range [0, nb_tx_queue - 1] previously supplied
+ *   to rte_eth_dev_configure().
+ * @param tx_pkts
+ *   The address of an array of *nb_pkts* pointers to *rte_mbuf* structures
+ *   which contain the output packets.
+ * @param nb_pkts
+ *   The maximum number of packets to process.
+ * @return
+ *   The number of packets correct and ready to be sent. The return value can
+ *   be less than the value of the *nb_pkts* parameter when some packet does
+ *   not meet the device's requirements; rte_errno is then set appropriately.
+ */
+
+#ifdef RTE_ETHDEV_TX_PREP
+
+static inline uint16_t
+rte_eth_tx_prep(uint8_t port_id, uint16_t queue_id, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+
+	if (!dev->tx_pkt_prep)
+		return nb_pkts;
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+	if (queue_id >= dev->data->nb_tx_queues) {
+		RTE_PMD_DEBUG_TRACE("Invalid TX queue_id=%d\n", queue_id);
+		rte_errno = -EINVAL;
+		return 0;
+	}
+#endif
+
+	return (*dev->tx_pkt_prep)(dev->data->tx_queues[queue_id],
+			tx_pkts, nb_pkts);
+}
+
+#else
+
+static inline uint16_t
+rte_eth_tx_prep(uint8_t port_id __rte_unused, uint16_t queue_id __rte_unused,
+		struct rte_mbuf **tx_pkts __rte_unused, uint16_t nb_pkts)
+{
+	return nb_pkts;
+}
+
+#endif
+
 typedef void (*buffer_tx_error_fn)(struct rte_mbuf **unsent, uint16_t count,
 		void *userdata);
 
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index 7541070..bc8fd87 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -224,6 +224,15 @@ extern "C" {
  */
 #define PKT_TX_OUTER_IPV4   (1ULL << 59)
 
+#define PKT_TX_OFFLOAD_MASK (    \
+		PKT_TX_IP_CKSUM |        \
+		PKT_TX_L4_MASK |         \
+		PKT_TX_OUTER_IP_CKSUM |  \
+		PKT_TX_TCP_SEG |         \
+		PKT_TX_QINQ_PKT |        \
+		PKT_TX_VLAN_PKT |        \
+		PKT_TX_TUNNEL_MASK)
+
 /**
  * Packet outer header is IPv6. This flag must be set when using any
  * outer offload feature (L4 checksum) to tell the NIC that the outer
diff --git a/lib/librte_net/Makefile b/lib/librte_net/Makefile
index e5758ce..cc69bc0 100644
--- a/lib/librte_net/Makefile
+++ b/lib/librte_net/Makefile
@@ -1,6 +1,6 @@
 #   BSD LICENSE
 #
-#   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+#   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
 #   All rights reserved.
 #
 #   Redistribution and use in source and binary forms, with or without
@@ -44,6 +44,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_NET) := rte_net.c
 SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include := rte_ip.h rte_tcp.h rte_udp.h
 SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include += rte_sctp.h rte_icmp.h rte_arp.h
 SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include += rte_ether.h rte_gre.h rte_net.h
+SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include += rte_pkt.h
 
 DEPDIRS-$(CONFIG_RTE_LIBRTE_NET) += lib/librte_eal lib/librte_mempool
 DEPDIRS-$(CONFIG_RTE_LIBRTE_NET) += lib/librte_mbuf
diff --git a/lib/librte_net/rte_pkt.h b/lib/librte_net/rte_pkt.h
new file mode 100644
index 0000000..8f53c0b
--- /dev/null
+++ b/lib/librte_net/rte_pkt.h
@@ -0,0 +1,139 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2016 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_PKT_H_
+#define _RTE_PKT_H_
+
+#include <rte_ip.h>
+#include <rte_udp.h>
+#include <rte_tcp.h>
+#include <rte_sctp.h>
+
+/**
+ * Validate general requirements for tx offload in packet.
+ */
+static inline int
+rte_validate_tx_offload(struct rte_mbuf *m)
+{
+	uint64_t ol_flags = m->ol_flags;
+
+	/* Does packet set any of available offloads? */
+	if (!(ol_flags & PKT_TX_OFFLOAD_MASK))
+		return 0;
+
+	/* IP checksum can be computed only for an IPv4 packet */
+	if ((ol_flags & PKT_TX_IP_CKSUM) && (ol_flags & PKT_TX_IPV6))
+		return -EINVAL;
+
+	/* IP type must be set when an L4 checksum or TSO is requested */
+	if ((ol_flags & (PKT_TX_L4_MASK | PKT_TX_TCP_SEG)) &&
+			!(ol_flags & (PKT_TX_IPV4 | PKT_TX_IPV6)))
+		return -EINVAL;
+
+	/* TSO requires tso_segsz to be set and, for IPv4 packets,
+	 * the PKT_TX_IP_CKSUM offload as well */
+	if ((ol_flags & PKT_TX_TCP_SEG) && ((m->tso_segsz == 0) ||
+			((ol_flags & PKT_TX_IPV4) && !(ol_flags & PKT_TX_IP_CKSUM))))
+		return -EINVAL;
+
+	/* PKT_TX_OUTER_IP_CKSUM set for non outer IPv4 packet. */
+	if ((ol_flags & PKT_TX_OUTER_IP_CKSUM) && !(ol_flags & PKT_TX_OUTER_IPV4))
+		return -EINVAL;
+
+	return 0;
+}
+
+/**
+ * Fix pseudo header checksum for TSO and non-TSO tcp/udp packets before
+ * hardware tx checksum.
+ * For non-TSO tcp/udp packets full pseudo-header checksum is counted and set.
+ * For TSO the IP payload length is not included.
+ */
+static inline int
+rte_phdr_cksum_fix(struct rte_mbuf *m)
+{
+	struct ipv4_hdr *ipv4_hdr;
+	struct ipv6_hdr *ipv6_hdr;
+	struct tcp_hdr *tcp_hdr;
+	struct udp_hdr *udp_hdr;
+	uint64_t ol_flags = m->ol_flags;
+	uint64_t inner_l3_offset = m->l2_len;
+
+	if (ol_flags & PKT_TX_OUTER_IP_CKSUM)
+		inner_l3_offset += m->outer_l2_len + m->outer_l3_len;
+
+	if (ol_flags & PKT_TX_IPV4) {
+		if ((ol_flags & PKT_TX_UDP_CKSUM) == PKT_TX_UDP_CKSUM) {
+			/* non-TSO udp */
+			ipv4_hdr = rte_pktmbuf_mtod_offset(m, struct ipv4_hdr *,
+					inner_l3_offset);
+
+			if (ol_flags & PKT_TX_IP_CKSUM)
+				ipv4_hdr->hdr_checksum = 0;
+
+			udp_hdr = (struct udp_hdr *)((char *)ipv4_hdr + m->l3_len);
+			udp_hdr->dgram_cksum = rte_ipv4_phdr_cksum(ipv4_hdr, ol_flags);
+		} else if ((ol_flags & PKT_TX_TCP_CKSUM) ||
+				(ol_flags & PKT_TX_TCP_SEG)) {
+			ipv4_hdr = rte_pktmbuf_mtod_offset(m, struct ipv4_hdr *,
+					inner_l3_offset);
+
+			if (ol_flags & PKT_TX_IP_CKSUM)
+				ipv4_hdr->hdr_checksum = 0;
+
+			/* non-TSO tcp or TSO */
+			tcp_hdr = (struct tcp_hdr *)((char *)ipv4_hdr + m->l3_len);
+			tcp_hdr->cksum = rte_ipv4_phdr_cksum(ipv4_hdr, ol_flags);
+		}
+	} else if (ol_flags & PKT_TX_IPV6) {
+		if ((ol_flags & PKT_TX_UDP_CKSUM) == PKT_TX_UDP_CKSUM) {
+			ipv6_hdr = rte_pktmbuf_mtod_offset(m, struct ipv6_hdr *,
+					inner_l3_offset);
+			/* non-TSO udp */
+			udp_hdr = rte_pktmbuf_mtod_offset(m, struct udp_hdr *,
+					inner_l3_offset + m->l3_len);
+			udp_hdr->dgram_cksum = rte_ipv6_phdr_cksum(ipv6_hdr, ol_flags);
+		} else if ((ol_flags & PKT_TX_TCP_CKSUM) ||
+				(ol_flags & PKT_TX_TCP_SEG)) {
+			ipv6_hdr = rte_pktmbuf_mtod_offset(m, struct ipv6_hdr *,
+					inner_l3_offset);
+			/* non-TSO tcp or TSO */
+			tcp_hdr = rte_pktmbuf_mtod_offset(m, struct tcp_hdr *,
+					inner_l3_offset + m->l3_len);
+			tcp_hdr->cksum = rte_ipv6_phdr_cksum(ipv6_hdr, ol_flags);
+		}
+	}
+	return 0;
+}
+
+#endif /* _RTE_PKT_H_ */
-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 261+ messages in thread

* [dpdk-dev] [PATCH v5 2/6] e1000: add Tx preparation
  2016-10-13 17:36       ` [dpdk-dev] [PATCH v5 " Tomasz Kulasek
  2016-10-13 17:36         ` [dpdk-dev] [PATCH v5 1/6] ethdev: " Tomasz Kulasek
@ 2016-10-13 17:36         ` Tomasz Kulasek
  2016-10-13 17:36         ` [dpdk-dev] [PATCH v5 3/6] fm10k: " Tomasz Kulasek
                           ` (4 subsequent siblings)
  6 siblings, 0 replies; 261+ messages in thread
From: Tomasz Kulasek @ 2016-10-13 17:36 UTC (permalink / raw)
  To: dev; +Cc: konstantin.ananyev, thomas.monjalon

Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
---
 drivers/net/e1000/e1000_ethdev.h |   11 ++++++++
 drivers/net/e1000/em_ethdev.c    |    5 +++-
 drivers/net/e1000/em_rxtx.c      |   48 ++++++++++++++++++++++++++++++++++-
 drivers/net/e1000/igb_ethdev.c   |    4 +++
 drivers/net/e1000/igb_rxtx.c     |   52 +++++++++++++++++++++++++++++++++++++-
 5 files changed, 117 insertions(+), 3 deletions(-)

diff --git a/drivers/net/e1000/e1000_ethdev.h b/drivers/net/e1000/e1000_ethdev.h
index 6c25c8d..bd0f277 100644
--- a/drivers/net/e1000/e1000_ethdev.h
+++ b/drivers/net/e1000/e1000_ethdev.h
@@ -138,6 +138,11 @@
 #define E1000_MISC_VEC_ID               RTE_INTR_VEC_ZERO_OFFSET
 #define E1000_RX_VEC_START              RTE_INTR_VEC_RXTX_OFFSET
 
+#define IGB_TX_MAX_SEG     UINT8_MAX
+#define IGB_TX_MAX_MTU_SEG UINT8_MAX
+#define EM_TX_MAX_SEG      UINT8_MAX
+#define EM_TX_MAX_MTU_SEG  UINT8_MAX
+
 /* structure for interrupt relative data */
 struct e1000_interrupt {
 	uint32_t flags;
@@ -315,6 +320,9 @@ void eth_igb_tx_init(struct rte_eth_dev *dev);
 uint16_t eth_igb_xmit_pkts(void *txq, struct rte_mbuf **tx_pkts,
 		uint16_t nb_pkts);
 
+uint16_t eth_igb_prep_pkts(void *txq, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts);
+
 uint16_t eth_igb_recv_pkts(void *rxq, struct rte_mbuf **rx_pkts,
 		uint16_t nb_pkts);
 
@@ -376,6 +384,9 @@ void eth_em_tx_init(struct rte_eth_dev *dev);
 uint16_t eth_em_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		uint16_t nb_pkts);
 
+uint16_t eth_em_prep_pkts(void *txq, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts);
+
 uint16_t eth_em_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
 		uint16_t nb_pkts);
 
diff --git a/drivers/net/e1000/em_ethdev.c b/drivers/net/e1000/em_ethdev.c
index f767e1c..6d2c5fc 100644
--- a/drivers/net/e1000/em_ethdev.c
+++ b/drivers/net/e1000/em_ethdev.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -300,6 +300,7 @@ eth_em_dev_init(struct rte_eth_dev *eth_dev)
 	eth_dev->dev_ops = &eth_em_ops;
 	eth_dev->rx_pkt_burst = (eth_rx_burst_t)&eth_em_recv_pkts;
 	eth_dev->tx_pkt_burst = (eth_tx_burst_t)&eth_em_xmit_pkts;
+	eth_dev->tx_pkt_prep = (eth_tx_prep_t)&eth_em_prep_pkts;
 
 	/* for secondary processes, we don't initialise any further as primary
 	 * has already done this work. Only check we don't need a different
@@ -1067,6 +1068,8 @@ eth_em_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 		.nb_max = E1000_MAX_RING_DESC,
 		.nb_min = E1000_MIN_RING_DESC,
 		.nb_align = EM_TXD_ALIGN,
+		.nb_seg_max = EM_TX_MAX_SEG,
+		.nb_mtu_seg_max = EM_TX_MAX_MTU_SEG,
 	};
 
 	dev_info->speed_capa = ETH_LINK_SPEED_10M_HD | ETH_LINK_SPEED_10M |
diff --git a/drivers/net/e1000/em_rxtx.c b/drivers/net/e1000/em_rxtx.c
index 41f51c0..3af2f69 100644
--- a/drivers/net/e1000/em_rxtx.c
+++ b/drivers/net/e1000/em_rxtx.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -66,6 +66,7 @@
 #include <rte_udp.h>
 #include <rte_tcp.h>
 #include <rte_sctp.h>
+#include <rte_pkt.h>
 #include <rte_string_fns.h>
 
 #include "e1000_logs.h"
@@ -77,6 +78,14 @@
 
 #define E1000_RXDCTL_GRAN	0x01000000 /* RXDCTL Granularity */
 
+#define E1000_TX_OFFLOAD_MASK ( \
+		PKT_TX_IP_CKSUM |       \
+		PKT_TX_L4_MASK |        \
+		PKT_TX_VLAN_PKT)
+
+#define E1000_TX_OFFLOAD_NOTSUP_MASK \
+		(PKT_TX_OFFLOAD_MASK ^ E1000_TX_OFFLOAD_MASK)
+
 /**
  * Structure associated with each descriptor of the RX ring of a RX queue.
  */
@@ -618,6 +627,43 @@ end_of_tx:
 
 /*********************************************************************
  *
+ *  TX prep functions
+ *
+ **********************************************************************/
+uint16_t
+eth_em_prep_pkts(__rte_unused void *tx_queue, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts)
+{
+	int i, ret;
+	struct rte_mbuf *m;
+
+	for (i = 0; i < nb_pkts; i++) {
+		m = tx_pkts[i];
+
+		if (m->ol_flags & E1000_TX_OFFLOAD_NOTSUP_MASK) {
+			rte_errno = -EINVAL;
+			return i;
+		}
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+		ret = rte_validate_tx_offload(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+#endif
+		ret = rte_phdr_cksum_fix(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+	}
+
+	return i;
+}
+
+/*********************************************************************
+ *
  *  RX functions
  *
  **********************************************************************/
diff --git a/drivers/net/e1000/igb_ethdev.c b/drivers/net/e1000/igb_ethdev.c
index 5a1a83e..2c861b4 100644
--- a/drivers/net/e1000/igb_ethdev.c
+++ b/drivers/net/e1000/igb_ethdev.c
@@ -369,6 +369,8 @@ static const struct rte_eth_desc_lim tx_desc_lim = {
 	.nb_max = E1000_MAX_RING_DESC,
 	.nb_min = E1000_MIN_RING_DESC,
 	.nb_align = IGB_RXD_ALIGN,
+	.nb_seg_max = IGB_TX_MAX_SEG,
+	.nb_mtu_seg_max = IGB_TX_MAX_MTU_SEG,
 };
 
 static const struct eth_dev_ops eth_igb_ops = {
@@ -760,6 +762,7 @@ eth_igb_dev_init(struct rte_eth_dev *eth_dev)
 	eth_dev->dev_ops = &eth_igb_ops;
 	eth_dev->rx_pkt_burst = &eth_igb_recv_pkts;
 	eth_dev->tx_pkt_burst = &eth_igb_xmit_pkts;
+	eth_dev->tx_pkt_prep = &eth_igb_prep_pkts;
 
 	/* for secondary processes, we don't initialise any further as primary
 	 * has already done this work. Only check we don't need a different
@@ -963,6 +966,7 @@ eth_igbvf_dev_init(struct rte_eth_dev *eth_dev)
 	eth_dev->dev_ops = &igbvf_eth_dev_ops;
 	eth_dev->rx_pkt_burst = &eth_igb_recv_pkts;
 	eth_dev->tx_pkt_burst = &eth_igb_xmit_pkts;
+	eth_dev->tx_pkt_prep = &eth_igb_prep_pkts;
 
 	/* for secondary processes, we don't initialise any further as primary
 	 * has already done this work. Only check we don't need a different
diff --git a/drivers/net/e1000/igb_rxtx.c b/drivers/net/e1000/igb_rxtx.c
index e8f9933..ca9399f 100644
--- a/drivers/net/e1000/igb_rxtx.c
+++ b/drivers/net/e1000/igb_rxtx.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -65,6 +65,7 @@
 #include <rte_udp.h>
 #include <rte_tcp.h>
 #include <rte_sctp.h>
+#include <rte_pkt.h>
 #include <rte_string_fns.h>
 
 #include "e1000_logs.h"
@@ -78,6 +79,9 @@
 		PKT_TX_L4_MASK |		 \
 		PKT_TX_TCP_SEG)
 
+#define IGB_TX_OFFLOAD_NOTSUP_MASK \
+		(PKT_TX_OFFLOAD_MASK ^ IGB_TX_OFFLOAD_MASK)
+
 /**
  * Structure associated with each descriptor of the RX ring of a RX queue.
  */
@@ -616,6 +620,51 @@ eth_igb_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 
 /*********************************************************************
  *
+ *  TX prep functions
+ *
+ **********************************************************************/
+uint16_t
+eth_igb_prep_pkts(__rte_unused void *tx_queue, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts)
+{
+	int i, ret;
+	struct rte_mbuf *m;
+
+	for (i = 0; i < nb_pkts; i++) {
+		m = tx_pkts[i];
+
+		/* Check some limitations for TSO in hardware */
+		if (m->ol_flags & PKT_TX_TCP_SEG)
+			if ((m->tso_segsz > IGB_TSO_MAX_MSS) || (m->l2_len + m->l3_len +
+					m->l4_len > IGB_TSO_MAX_HDRLEN)) {
+				rte_errno = -EINVAL;
+				return i;
+			}
+
+		if (m->ol_flags & IGB_TX_OFFLOAD_NOTSUP_MASK) {
+			rte_errno = -EINVAL;
+			return i;
+		}
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+		ret = rte_validate_tx_offload(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+#endif
+		ret = rte_phdr_cksum_fix(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+	}
+
+	return i;
+}
+
+/*********************************************************************
+ *
  *  RX functions
  *
  **********************************************************************/
@@ -1362,6 +1411,7 @@ eth_igb_tx_queue_setup(struct rte_eth_dev *dev,
 
 	igb_reset_tx_queue(txq, dev);
 	dev->tx_pkt_burst = eth_igb_xmit_pkts;
+	dev->tx_pkt_prep = &eth_igb_prep_pkts;
 	dev->data->tx_queues[queue_idx] = txq;
 
 	return 0;
-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 261+ messages in thread

* [dpdk-dev] [PATCH v5 3/6] fm10k: add Tx preparation
  2016-10-13 17:36       ` [dpdk-dev] [PATCH v5 " Tomasz Kulasek
  2016-10-13 17:36         ` [dpdk-dev] [PATCH v5 1/6] ethdev: " Tomasz Kulasek
  2016-10-13 17:36         ` [dpdk-dev] [PATCH v5 2/6] e1000: " Tomasz Kulasek
@ 2016-10-13 17:36         ` Tomasz Kulasek
  2016-10-13 17:37         ` [dpdk-dev] [PATCH v5 4/6] i40e: " Tomasz Kulasek
                           ` (3 subsequent siblings)
  6 siblings, 0 replies; 261+ messages in thread
From: Tomasz Kulasek @ 2016-10-13 17:36 UTC (permalink / raw)
  To: dev; +Cc: konstantin.ananyev, thomas.monjalon

Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
---
 drivers/net/fm10k/fm10k.h        |    6 +++++
 drivers/net/fm10k/fm10k_ethdev.c |    5 ++++
 drivers/net/fm10k/fm10k_rxtx.c   |   50 +++++++++++++++++++++++++++++++++++++-
 3 files changed, 60 insertions(+), 1 deletion(-)

diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
index 05aa1a2..c6fed21 100644
--- a/drivers/net/fm10k/fm10k.h
+++ b/drivers/net/fm10k/fm10k.h
@@ -69,6 +69,9 @@
 #define FM10K_MAX_RX_DESC  (FM10K_MAX_RX_RING_SZ / sizeof(union fm10k_rx_desc))
 #define FM10K_MAX_TX_DESC  (FM10K_MAX_TX_RING_SZ / sizeof(struct fm10k_tx_desc))
 
+#define FM10K_TX_MAX_SEG     UINT8_MAX
+#define FM10K_TX_MAX_MTU_SEG UINT8_MAX
+
 /*
  * byte aligment for HW RX data buffer
  * Datasheet requires RX buffer addresses shall either be 512-byte aligned or
@@ -356,6 +359,9 @@ fm10k_dev_rx_descriptor_done(void *rx_queue, uint16_t offset);
 uint16_t fm10k_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 	uint16_t nb_pkts);
 
+uint16_t fm10k_prep_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
+	uint16_t nb_pkts);
+
 int fm10k_rxq_vec_setup(struct fm10k_rx_queue *rxq);
 int fm10k_rx_vec_condition_check(struct rte_eth_dev *);
 void fm10k_rx_queue_release_mbufs_vec(struct fm10k_rx_queue *rxq);
diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index 372564b..b9060c7 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -1446,6 +1446,8 @@ fm10k_dev_infos_get(struct rte_eth_dev *dev,
 		.nb_max = FM10K_MAX_TX_DESC,
 		.nb_min = FM10K_MIN_TX_DESC,
 		.nb_align = FM10K_MULT_TX_DESC,
+		.nb_seg_max = FM10K_TX_MAX_SEG,
+		.nb_mtu_seg_max = FM10K_TX_MAX_MTU_SEG,
 	};
 
 	dev_info->speed_capa = ETH_LINK_SPEED_1G | ETH_LINK_SPEED_2_5G |
@@ -2754,8 +2756,10 @@ fm10k_set_tx_function(struct rte_eth_dev *dev)
 			fm10k_txq_vec_setup(txq);
 		}
 		dev->tx_pkt_burst = fm10k_xmit_pkts_vec;
+		dev->tx_pkt_prep = NULL;
 	} else {
 		dev->tx_pkt_burst = fm10k_xmit_pkts;
+		dev->tx_pkt_prep = fm10k_prep_pkts;
 		PMD_INIT_LOG(DEBUG, "Use regular Tx func");
 	}
 }
@@ -2834,6 +2838,7 @@ eth_fm10k_dev_init(struct rte_eth_dev *dev)
 	dev->dev_ops = &fm10k_eth_dev_ops;
 	dev->rx_pkt_burst = &fm10k_recv_pkts;
 	dev->tx_pkt_burst = &fm10k_xmit_pkts;
+	dev->tx_pkt_prep = &fm10k_prep_pkts;
 
 	/* only initialize in the primary process */
 	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
diff --git a/drivers/net/fm10k/fm10k_rxtx.c b/drivers/net/fm10k/fm10k_rxtx.c
index 5b2d04b..76b5705 100644
--- a/drivers/net/fm10k/fm10k_rxtx.c
+++ b/drivers/net/fm10k/fm10k_rxtx.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2013-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2013-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -35,6 +35,7 @@
 
 #include <rte_ethdev.h>
 #include <rte_common.h>
+#include <rte_pkt.h>
 #include "fm10k.h"
 #include "base/fm10k_type.h"
 
@@ -65,6 +66,15 @@ static inline void dump_rxd(union fm10k_rx_desc *rxd)
 }
 #endif
 
+#define FM10K_TX_OFFLOAD_MASK (  \
+		PKT_TX_VLAN_PKT |        \
+		PKT_TX_IP_CKSUM |        \
+		PKT_TX_L4_MASK |         \
+		PKT_TX_TCP_SEG)
+
+#define FM10K_TX_OFFLOAD_NOTSUP_MASK \
+		(PKT_TX_OFFLOAD_MASK ^ FM10K_TX_OFFLOAD_MASK)
+
 /* @note: When this function is changed, make corresponding change to
  * fm10k_dev_supported_ptypes_get()
  */
@@ -583,3 +593,41 @@ fm10k_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 
 	return count;
 }
+
+uint16_t
+fm10k_prep_pkts(__rte_unused void *tx_queue, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts)
+{
+	int i, ret;
+	struct rte_mbuf *m;
+
+	for (i = 0; i < nb_pkts; i++) {
+		m = tx_pkts[i];
+
+		if ((m->ol_flags & PKT_TX_TCP_SEG) &&
+				(m->tso_segsz < FM10K_TSO_MINMSS)) {
+			rte_errno = -EINVAL;
+			return i;
+		}
+
+		if (m->ol_flags & FM10K_TX_OFFLOAD_NOTSUP_MASK) {
+			rte_errno = -EINVAL;
+			return i;
+		}
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+		ret = rte_validate_tx_offload(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+#endif
+		ret = rte_phdr_cksum_fix(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+	}
+
+	return i;
+}
-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 261+ messages in thread

* [dpdk-dev] [PATCH v5 4/6] i40e: add Tx preparation
  2016-10-13 17:36       ` [dpdk-dev] [PATCH v5 " Tomasz Kulasek
                           ` (2 preceding siblings ...)
  2016-10-13 17:36         ` [dpdk-dev] [PATCH v5 3/6] fm10k: " Tomasz Kulasek
@ 2016-10-13 17:37         ` Tomasz Kulasek
  2016-10-13 17:37         ` [dpdk-dev] [PATCH v5 5/6] ixgbe: " Tomasz Kulasek
                           ` (2 subsequent siblings)
  6 siblings, 0 replies; 261+ messages in thread
From: Tomasz Kulasek @ 2016-10-13 17:37 UTC (permalink / raw)
  To: dev; +Cc: konstantin.ananyev, thomas.monjalon

Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
---
 drivers/net/i40e/i40e_ethdev.c |    3 ++
 drivers/net/i40e/i40e_rxtx.c   |   72 +++++++++++++++++++++++++++++++++++++++-
 drivers/net/i40e/i40e_rxtx.h   |    8 +++++
 3 files changed, 82 insertions(+), 1 deletion(-)

diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
index d0640b9..c6d61c0 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -936,6 +936,7 @@ eth_i40e_dev_init(struct rte_eth_dev *dev)
 	dev->dev_ops = &i40e_eth_dev_ops;
 	dev->rx_pkt_burst = i40e_recv_pkts;
 	dev->tx_pkt_burst = i40e_xmit_pkts;
+	dev->tx_pkt_prep = i40e_prep_pkts;
 
 	/* for secondary processes, we don't initialise any further as primary
 	 * has already done this work. Only check we don't need a different
@@ -2629,6 +2630,8 @@ i40e_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 		.nb_max = I40E_MAX_RING_DESC,
 		.nb_min = I40E_MIN_RING_DESC,
 		.nb_align = I40E_ALIGN_RING_DESC,
+		.nb_seg_max = I40E_TX_MAX_SEG,
+		.nb_mtu_seg_max = I40E_TX_MAX_MTU_SEG,
 	};
 
 	if (pf->flags & I40E_FLAG_VMDQ) {
diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
index 2cb2e30..499217d 100644
--- a/drivers/net/i40e/i40e_rxtx.c
+++ b/drivers/net/i40e/i40e_rxtx.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -50,6 +50,8 @@
 #include <rte_tcp.h>
 #include <rte_sctp.h>
 #include <rte_udp.h>
+#include <rte_ip.h>
+#include <rte_pkt.h>
 
 #include "i40e_logs.h"
 #include "base/i40e_prototype.h"
@@ -79,6 +81,17 @@
 		PKT_TX_TCP_SEG |		 \
 		PKT_TX_OUTER_IP_CKSUM)
 
+#define I40E_TX_OFFLOAD_MASK (  \
+		PKT_TX_IP_CKSUM |       \
+		PKT_TX_L4_MASK |        \
+		PKT_TX_OUTER_IP_CKSUM | \
+		PKT_TX_TCP_SEG |        \
+		PKT_TX_QINQ_PKT |       \
+		PKT_TX_VLAN_PKT)
+
+#define I40E_TX_OFFLOAD_NOTSUP_MASK \
+		(PKT_TX_OFFLOAD_MASK ^ I40E_TX_OFFLOAD_MASK)
+
 static uint16_t i40e_xmit_pkts_simple(void *tx_queue,
 				      struct rte_mbuf **tx_pkts,
 				      uint16_t nb_pkts);
@@ -1968,6 +1981,61 @@ i40e_xmit_pkts_simple(void *tx_queue,
 	return nb_tx;
 }
 
+/*********************************************************************
+ *
+ *  TX prep functions
+ *
+ **********************************************************************/
+uint16_t
+i40e_prep_pkts(__rte_unused void *tx_queue, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts)
+{
+	int i, ret;
+	uint64_t ol_flags;
+	struct rte_mbuf *m;
+
+	for (i = 0; i < nb_pkts; i++) {
+		m = tx_pkts[i];
+		ol_flags = m->ol_flags;
+
+		/**
+		 * m->nb_segs is uint8_t, so it can never exceed
+		 * I40E_TX_MAX_SEG (UINT8_MAX); only the
+		 * m->nb_segs > I40E_TX_MAX_MTU_SEG condition is checked here.
+		 */
+		if (!(ol_flags & PKT_TX_TCP_SEG)) {
+			if (m->nb_segs > I40E_TX_MAX_MTU_SEG) {
+				rte_errno = -EINVAL;
+				return i;
+			}
+		} else if ((m->tso_segsz < I40E_MIN_TSO_MSS) ||
+				(m->tso_segsz > I40E_MAX_TSO_MSS)) {
+			/* an MSS outside the range (256B - 9674B) is considered malicious */
+			rte_errno = -EINVAL;
+			return i;
+		}
+
+		if (ol_flags & I40E_TX_OFFLOAD_NOTSUP_MASK) {
+			rte_errno = -EINVAL;
+			return i;
+		}
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+		ret = rte_validate_tx_offload(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+#endif
+		ret = rte_phdr_cksum_fix(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+	}
+	return i;
+}
+
 /*
  * Find the VSI the queue belongs to. 'queue_idx' is the queue index
  * application used, which assume having sequential ones. But from driver's
@@ -3318,9 +3386,11 @@ i40e_set_tx_function(struct rte_eth_dev *dev)
 			PMD_INIT_LOG(DEBUG, "Simple tx finally be used.");
 			dev->tx_pkt_burst = i40e_xmit_pkts_simple;
 		}
+		dev->tx_pkt_prep = NULL;
 	} else {
 		PMD_INIT_LOG(DEBUG, "Xmit tx finally be used.");
 		dev->tx_pkt_burst = i40e_xmit_pkts;
+		dev->tx_pkt_prep = i40e_prep_pkts;
 	}
 }
 
diff --git a/drivers/net/i40e/i40e_rxtx.h b/drivers/net/i40e/i40e_rxtx.h
index 98179f0..e7eb89c 100644
--- a/drivers/net/i40e/i40e_rxtx.h
+++ b/drivers/net/i40e/i40e_rxtx.h
@@ -63,6 +63,12 @@
 #define	I40E_MIN_RING_DESC	64
 #define	I40E_MAX_RING_DESC	4096
 
+#define I40E_MIN_TSO_MSS          256
+#define I40E_MAX_TSO_MSS          9674
+
+#define I40E_TX_MAX_SEG     UINT8_MAX
+#define I40E_TX_MAX_MTU_SEG 8
+
 #undef container_of
 #define container_of(ptr, type, member) ({ \
 		typeof(((type *)0)->member)(*__mptr) = (ptr); \
@@ -223,6 +229,8 @@ uint16_t i40e_recv_scattered_pkts(void *rx_queue,
 uint16_t i40e_xmit_pkts(void *tx_queue,
 			struct rte_mbuf **tx_pkts,
 			uint16_t nb_pkts);
+uint16_t i40e_prep_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts);
 int i40e_tx_queue_init(struct i40e_tx_queue *txq);
 int i40e_rx_queue_init(struct i40e_rx_queue *rxq);
 void i40e_free_tx_resources(struct i40e_tx_queue *txq);
-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 261+ messages in thread

* [dpdk-dev] [PATCH v5 5/6] ixgbe: add Tx preparation
  2016-10-13 17:36       ` [dpdk-dev] [PATCH v5 " Tomasz Kulasek
                           ` (3 preceding siblings ...)
  2016-10-13 17:37         ` [dpdk-dev] [PATCH v5 4/6] i40e: " Tomasz Kulasek
@ 2016-10-13 17:37         ` Tomasz Kulasek
  2016-10-13 17:37         ` [dpdk-dev] [PATCH v5 6/6] testpmd: use Tx preparation in csum engine Tomasz Kulasek
  2016-10-14 15:05         ` [dpdk-dev] [PATCH v6 0/6] add Tx preparation Tomasz Kulasek
  6 siblings, 0 replies; 261+ messages in thread
From: Tomasz Kulasek @ 2016-10-13 17:37 UTC (permalink / raw)
  To: dev; +Cc: konstantin.ananyev, thomas.monjalon

Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
---
 drivers/net/ixgbe/ixgbe_ethdev.c |    3 ++
 drivers/net/ixgbe/ixgbe_ethdev.h |    5 +++-
 drivers/net/ixgbe/ixgbe_rxtx.c   |   58 +++++++++++++++++++++++++++++++++++++-
 drivers/net/ixgbe/ixgbe_rxtx.h   |    2 ++
 4 files changed, 66 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
index 6b3d4fa..1d740f0 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -515,6 +515,8 @@ static const struct rte_eth_desc_lim tx_desc_lim = {
 	.nb_max = IXGBE_MAX_RING_DESC,
 	.nb_min = IXGBE_MIN_RING_DESC,
 	.nb_align = IXGBE_TXD_ALIGN,
+	.nb_seg_max = IXGBE_TX_MAX_SEG,
+	.nb_mtu_seg_max = IXGBE_TX_MAX_SEG,
 };
 
 static const struct eth_dev_ops ixgbe_eth_dev_ops = {
@@ -1101,6 +1103,7 @@ eth_ixgbe_dev_init(struct rte_eth_dev *eth_dev)
 	eth_dev->dev_ops = &ixgbe_eth_dev_ops;
 	eth_dev->rx_pkt_burst = &ixgbe_recv_pkts;
 	eth_dev->tx_pkt_burst = &ixgbe_xmit_pkts;
+	eth_dev->tx_pkt_prep = &ixgbe_prep_pkts;
 
 	/*
 	 * For secondary processes, we don't initialise any further as primary
diff --git a/drivers/net/ixgbe/ixgbe_ethdev.h b/drivers/net/ixgbe/ixgbe_ethdev.h
index 4ff6338..e229cf5 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.h
+++ b/drivers/net/ixgbe/ixgbe_ethdev.h
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -396,6 +396,9 @@ uint16_t ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 uint16_t ixgbe_xmit_pkts_simple(void *tx_queue, struct rte_mbuf **tx_pkts,
 		uint16_t nb_pkts);
 
+uint16_t ixgbe_prep_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts);
+
 int ixgbe_dev_rss_hash_update(struct rte_eth_dev *dev,
 			      struct rte_eth_rss_conf *rss_conf);
 
diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
index 8b99282..dd4f5ba 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   Copyright 2014 6WIND S.A.
  *   All rights reserved.
  *
@@ -70,6 +70,7 @@
 #include <rte_string_fns.h>
 #include <rte_errno.h>
 #include <rte_ip.h>
+#include <rte_pkt.h>
 
 #include "ixgbe_logs.h"
 #include "base/ixgbe_api.h"
@@ -87,6 +88,9 @@
 		PKT_TX_TCP_SEG |		 \
 		PKT_TX_OUTER_IP_CKSUM)
 
+#define IXGBE_TX_OFFLOAD_NOTSUP_MASK \
+		(PKT_TX_OFFLOAD_MASK ^ IXGBE_TX_OFFLOAD_MASK)
+
 #if 1
 #define RTE_PMD_USE_PREFETCH
 #endif
@@ -905,6 +909,56 @@ end_of_tx:
 
 /*********************************************************************
  *
+ *  TX prep functions
+ *
+ **********************************************************************/
+uint16_t
+ixgbe_prep_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
+{
+	int i, ret;
+	uint64_t ol_flags;
+	struct rte_mbuf *m;
+	struct ixgbe_tx_queue *txq = (struct ixgbe_tx_queue *)tx_queue;
+
+	for (i = 0; i < nb_pkts; i++) {
+		m = tx_pkts[i];
+		ol_flags = m->ol_flags;
+
+		/**
+		 * Check if packet meets requirements for number of segments
+		 *
+		 * NOTE: for ixgbe it's always (40 - WTHRESH) for both TSO and non-TSO
+		 */
+
+		if (m->nb_segs > IXGBE_TX_MAX_SEG - txq->wthresh) {
+			rte_errno = -EINVAL;
+			return i;
+		}
+
+		if (ol_flags & IXGBE_TX_OFFLOAD_NOTSUP_MASK) {
+			rte_errno = -EINVAL;
+			return i;
+		}
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+		ret = rte_validate_tx_offload(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+#endif
+		ret = rte_phdr_cksum_fix(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+	}
+
+	return i;
+}
+
+/*********************************************************************
+ *
  *  RX functions
  *
  **********************************************************************/
@@ -2280,6 +2334,7 @@ ixgbe_set_tx_function(struct rte_eth_dev *dev, struct ixgbe_tx_queue *txq)
 	if (((txq->txq_flags & IXGBE_SIMPLE_FLAGS) == IXGBE_SIMPLE_FLAGS)
 			&& (txq->tx_rs_thresh >= RTE_PMD_IXGBE_TX_MAX_BURST)) {
 		PMD_INIT_LOG(DEBUG, "Using simple tx code path");
+		dev->tx_pkt_prep = NULL;
 #ifdef RTE_IXGBE_INC_VECTOR
 		if (txq->tx_rs_thresh <= RTE_IXGBE_TX_MAX_FREE_BUF_SZ &&
 				(rte_eal_process_type() != RTE_PROC_PRIMARY ||
@@ -2300,6 +2355,7 @@ ixgbe_set_tx_function(struct rte_eth_dev *dev, struct ixgbe_tx_queue *txq)
 				(unsigned long)txq->tx_rs_thresh,
 				(unsigned long)RTE_PMD_IXGBE_TX_MAX_BURST);
 		dev->tx_pkt_burst = ixgbe_xmit_pkts;
+		dev->tx_pkt_prep = ixgbe_prep_pkts;
 	}
 }
 
diff --git a/drivers/net/ixgbe/ixgbe_rxtx.h b/drivers/net/ixgbe/ixgbe_rxtx.h
index 2608b36..7bbd9b8 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx.h
+++ b/drivers/net/ixgbe/ixgbe_rxtx.h
@@ -80,6 +80,8 @@
 #define RTE_IXGBE_WAIT_100_US               100
 #define RTE_IXGBE_VMTXSW_REGISTER_COUNT     2
 
+#define IXGBE_TX_MAX_SEG                    40
+
 #define IXGBE_PACKET_TYPE_MASK_82599        0X7F
 #define IXGBE_PACKET_TYPE_MASK_X550         0X10FF
 #define IXGBE_PACKET_TYPE_MASK_TUNNEL       0XFF
-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 261+ messages in thread

* [dpdk-dev] [PATCH v5 6/6] testpmd: use Tx preparation in csum engine
  2016-10-13 17:36       ` [dpdk-dev] [PATCH v5 " Tomasz Kulasek
                           ` (4 preceding siblings ...)
  2016-10-13 17:37         ` [dpdk-dev] [PATCH v5 5/6] ixgbe: " Tomasz Kulasek
@ 2016-10-13 17:37         ` Tomasz Kulasek
  2016-10-14 15:05         ` [dpdk-dev] [PATCH v6 0/6] add Tx preparation Tomasz Kulasek
  6 siblings, 0 replies; 261+ messages in thread
From: Tomasz Kulasek @ 2016-10-13 17:37 UTC (permalink / raw)
  To: dev; +Cc: konstantin.ananyev, thomas.monjalon

Removed the pseudo-header checksum calculation for udp/tcp/tso packets
from the application and used the Tx preparation API for packet
preparation and verification.

Adding the additional step to the csum engine costs about 3-4% of
performance on my setup with the ixgbe driver. The drop is caused mostly
by the need to re-access and modify the packet data.
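
In outline, the change to the send path is (a simplified sketch with
placeholder variables; the actual code is in csumonly.c below):

	/* before: the application seeded the L4 pseudo-header checksum
	 * itself via the (now removed) get_psd_sum() helper */
	tcp_hdr->cksum = get_psd_sum(l3_hdr, info->ethertype, ol_flags);
	nb_tx = rte_eth_tx_burst(port, queue, pkts, nb_pkts);

	/* after: rte_eth_tx_prep() seeds it in a device-specific way */
	nb_prep = rte_eth_tx_prep(port, queue, pkts, nb_pkts);
	nb_tx = rte_eth_tx_burst(port, queue, pkts, nb_prep);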

Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
---
 app/test-pmd/csumonly.c |   36 +++++++++++++-----------------------
 1 file changed, 13 insertions(+), 23 deletions(-)

diff --git a/app/test-pmd/csumonly.c b/app/test-pmd/csumonly.c
index f9e65b6..3354b3d 100644
--- a/app/test-pmd/csumonly.c
+++ b/app/test-pmd/csumonly.c
@@ -112,15 +112,6 @@ struct simple_gre_hdr {
 } __attribute__((__packed__));
 
 static uint16_t
-get_psd_sum(void *l3_hdr, uint16_t ethertype, uint64_t ol_flags)
-{
-	if (ethertype == _htons(ETHER_TYPE_IPv4))
-		return rte_ipv4_phdr_cksum(l3_hdr, ol_flags);
-	else /* assume ethertype == ETHER_TYPE_IPv6 */
-		return rte_ipv6_phdr_cksum(l3_hdr, ol_flags);
-}
-
-static uint16_t
 get_udptcp_checksum(void *l3_hdr, void *l4_hdr, uint16_t ethertype)
 {
 	if (ethertype == _htons(ETHER_TYPE_IPv4))
@@ -372,32 +363,24 @@ process_inner_cksums(void *l3_hdr, const struct testpmd_offload_info *info,
 		/* do not recalculate udp cksum if it was 0 */
 		if (udp_hdr->dgram_cksum != 0) {
 			udp_hdr->dgram_cksum = 0;
-			if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_UDP_CKSUM) {
+			if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_UDP_CKSUM)
 				ol_flags |= PKT_TX_UDP_CKSUM;
-				udp_hdr->dgram_cksum = get_psd_sum(l3_hdr,
-					info->ethertype, ol_flags);
-			} else {
+			else
 				udp_hdr->dgram_cksum =
 					get_udptcp_checksum(l3_hdr, udp_hdr,
 						info->ethertype);
-			}
 		}
 	} else if (info->l4_proto == IPPROTO_TCP) {
 		tcp_hdr = (struct tcp_hdr *)((char *)l3_hdr + info->l3_len);
 		tcp_hdr->cksum = 0;
-		if (tso_segsz) {
+		if (tso_segsz)
 			ol_flags |= PKT_TX_TCP_SEG;
-			tcp_hdr->cksum = get_psd_sum(l3_hdr, info->ethertype,
-				ol_flags);
-		} else if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_TCP_CKSUM) {
+		else if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_TCP_CKSUM)
 			ol_flags |= PKT_TX_TCP_CKSUM;
-			tcp_hdr->cksum = get_psd_sum(l3_hdr, info->ethertype,
-				ol_flags);
-		} else {
+		else
 			tcp_hdr->cksum =
 				get_udptcp_checksum(l3_hdr, tcp_hdr,
 					info->ethertype);
-		}
 	} else if (info->l4_proto == IPPROTO_SCTP) {
 		sctp_hdr = (struct sctp_hdr *)((char *)l3_hdr + info->l3_len);
 		sctp_hdr->cksum = 0;
@@ -650,6 +633,7 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 	void *l3_hdr = NULL, *outer_l3_hdr = NULL; /* can be IPv4 or IPv6 */
 	uint16_t nb_rx;
 	uint16_t nb_tx;
+	uint16_t nb_prep;
 	uint16_t i;
 	uint64_t rx_ol_flags, tx_ol_flags;
 	uint16_t testpmd_ol_flags;
@@ -855,7 +839,13 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 			printf("\n");
 		}
 	}
-	nb_tx = rte_eth_tx_burst(fs->tx_port, fs->tx_queue, pkts_burst, nb_rx);
+	nb_prep = rte_eth_tx_prep(fs->tx_port, fs->tx_queue, pkts_burst,
+				nb_rx);
+	if (nb_prep != nb_rx)
+		printf("Preparing packet burst to transmit failed: %s\n",
+				rte_strerror(rte_errno));
+
+	nb_tx = rte_eth_tx_burst(fs->tx_port, fs->tx_queue, pkts_burst, nb_prep);
 	/*
 	 * Retry if necessary
 	 */
-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v5 1/6] ethdev: add Tx preparation
  2016-10-13 17:36         ` [dpdk-dev] [PATCH v5 1/6] ethdev: " Tomasz Kulasek
@ 2016-10-13 19:21           ` Thomas Monjalon
  2016-10-14 14:02             ` Kulasek, TomaszX
  0 siblings, 1 reply; 261+ messages in thread
From: Thomas Monjalon @ 2016-10-13 19:21 UTC (permalink / raw)
  To: Tomasz Kulasek; +Cc: dev, konstantin.ananyev

Hi,

2016-10-13 19:36, Tomasz Kulasek:
> Added API for `rte_eth_tx_prep`
> 
> uint16_t rte_eth_tx_prep(uint8_t port_id, uint16_t queue_id,
> 	struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
> 
> Added fields to the `struct rte_eth_desc_lim`:
> 
> 	uint16_t nb_seg_max;
> 		/**< Max number of segments per whole packet. */
> 
> 	uint16_t nb_mtu_seg_max;
> 		/**< Max number of segments per one MTU */
> 
> Created `rte_pkt.h` header with common used functions:

Same comment as in previous revision:
this description lacks the usability and performance considerations.

> +static inline uint16_t
> +rte_eth_tx_prep(uint8_t port_id __rte_unused, uint16_t queue_id __rte_unused,
> +		struct rte_mbuf **tx_pkts __rte_unused, uint16_t nb_pkts)

Doxygen still does not parse it well (same issue as in the previous revision).

> +/**
> + * Fix pseudo header checksum for TSO and non-TSO tcp/udp packets before
> + * hardware tx checksum.
> + * For non-TSO tcp/udp packets full pseudo-header checksum is counted and set.
> + * For TSO the IP payload length is not included.
> + */
> +static inline int
> +rte_phdr_cksum_fix(struct rte_mbuf *m)

You probably don't need this function since the recent improvements from Olivier.

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v5 1/6] ethdev: add Tx preparation
  2016-10-13 19:21           ` Thomas Monjalon
@ 2016-10-14 14:02             ` Kulasek, TomaszX
  2016-10-14 14:20               ` Thomas Monjalon
  0 siblings, 1 reply; 261+ messages in thread
From: Kulasek, TomaszX @ 2016-10-14 14:02 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev, Ananyev, Konstantin

Hi Thomas,

> -----Original Message-----
> From: Thomas Monjalon [mailto:thomas.monjalon@6wind.com]
> Sent: Thursday, October 13, 2016 21:21
> To: Kulasek, TomaszX <tomaszx.kulasek@intel.com>
> Cc: dev@dpdk.org; Ananyev, Konstantin <konstantin.ananyev@intel.com>
> Subject: Re: [PATCH v5 1/6] ethdev: add Tx preparation
> 
> Hi,
> 
> 2016-10-13 19:36, Tomasz Kulasek:
> > Added API for `rte_eth_tx_prep`
> >
> > uint16_t rte_eth_tx_prep(uint8_t port_id, uint16_t queue_id,
> > 	struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
> >
> > Added fields to the `struct rte_eth_desc_lim`:
> >
> > 	uint16_t nb_seg_max;
> > 		/**< Max number of segments per whole packet. */
> >
> > 	uint16_t nb_mtu_seg_max;
> > 		/**< Max number of segments per one MTU */
> >
> > Created `rte_pkt.h` header with common used functions:
> 
> Same comment as in previous revision:
> this description lacks the usability and performance considerations.
> 
> > +static inline uint16_t
> > +rte_eth_tx_prep(uint8_t port_id __rte_unused, uint16_t queue_id
> __rte_unused,
> > +		struct rte_mbuf **tx_pkts __rte_unused, uint16_t nb_pkts)
> 
> Doxygen still does not parse it well (same issue as in the previous revision).
> 
> > +/**
> > + * Fix pseudo header checksum for TSO and non-TSO tcp/udp packets
> before
> > + * hardware tx checksum.
> > + * For non-TSO tcp/udp packets full pseudo-header checksum is counted
> and set.
> > + * For TSO the IP payload length is not included.
> > + */
> > +static inline int
> > +rte_phdr_cksum_fix(struct rte_mbuf *m)
> 
> You probably don't need this function since the recent improvements from
> Olivier.

Do you mean this improvement: "net: add function to calculate a checksum in a mbuf"
http://dpdk.org/dev/patchwork/patch/16542/

I see only full raw checksum computation on the mbuf in Olivier's patches, while this function computes only the pseudo-header checksum to be used with the tx checksum offload.
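
To make the distinction concrete, a minimal sketch for an IPv4/TCP mbuf
m with l2_len/l3_len set (rte_ipv4_phdr_cksum() and
rte_ipv4_udptcp_cksum() are the existing helpers from rte_ip.h):

	struct ipv4_hdr *ip = rte_pktmbuf_mtod_offset(m, struct ipv4_hdr *,
			m->l2_len);
	struct tcp_hdr *tcp = (struct tcp_hdr *)((char *)ip + m->l3_len);

	/* for HW tx checksum offload: seed only the pseudo-header sum,
	 * the NIC computes the rest */
	tcp->cksum = rte_ipv4_phdr_cksum(ip, m->ol_flags);

	/* full SW checksum: covers the pseudo-header and the whole segment */
	tcp->cksum = 0;
	tcp->cksum = rte_ipv4_udptcp_cksum(ip, tcp);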

Tomasz

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v5 1/6] ethdev: add Tx preparation
  2016-10-14 14:02             ` Kulasek, TomaszX
@ 2016-10-14 14:20               ` Thomas Monjalon
  2016-10-17 16:25                 ` Kulasek, TomaszX
  0 siblings, 1 reply; 261+ messages in thread
From: Thomas Monjalon @ 2016-10-14 14:20 UTC (permalink / raw)
  To: Kulasek, TomaszX; +Cc: dev, Ananyev, Konstantin, olivier.matz

2016-10-14 14:02, Kulasek, TomaszX:
> From: Thomas Monjalon [mailto:thomas.monjalon@6wind.com]
> > 2016-10-13 19:36, Tomasz Kulasek:
> > > +/**
> > > + * Fix pseudo header checksum for TSO and non-TSO tcp/udp packets
> > before
> > > + * hardware tx checksum.
> > > + * For non-TSO tcp/udp packets full pseudo-header checksum is counted
> > and set.
> > > + * For TSO the IP payload length is not included.
> > > + */
> > > +static inline int
> > > +rte_phdr_cksum_fix(struct rte_mbuf *m)
> > 
> > You probably don't need this function since the recent improvements from
> > Olivier.
> 
> Do you mean this improvement: "net: add function to calculate a checksum in a mbuf"
> http://dpdk.org/dev/patchwork/patch/16542/
> 
> I see only full raw checksum computation on the mbuf in Olivier's patches, while this function computes only the pseudo-header checksum to be used with the tx checksum offload.

OK. Please check what exists already in librte_net (especially rte_ip.h)
and try to re-use code if possible. Thanks

^ permalink raw reply	[flat|nested] 261+ messages in thread

* [dpdk-dev] [PATCH v6 0/6] add Tx preparation
  2016-10-13 17:36       ` [dpdk-dev] [PATCH v5 " Tomasz Kulasek
                           ` (5 preceding siblings ...)
  2016-10-13 17:37         ` [dpdk-dev] [PATCH v5 6/6] testpmd: use Tx preparation in csum engine Tomasz Kulasek
@ 2016-10-14 15:05         ` Tomasz Kulasek
  2016-10-14 15:05           ` [dpdk-dev] [PATCH v6 1/6] ethdev: " Tomasz Kulasek
                             ` (7 more replies)
  6 siblings, 8 replies; 261+ messages in thread
From: Tomasz Kulasek @ 2016-10-14 15:05 UTC (permalink / raw)
  To: dev; +Cc: konstantin.ananyev, thomas.monjalon

As discussed in that thread:

http://dpdk.org/ml/archives/dev/2015-September/023603.html

Depending on the HW offloads requested, different NIC models might impose different requirements on the packets to be transmitted, in terms of:

 - Max number of fragments per packet allowed
 - Max number of fragments per TSO segments
 - The way pseudo-header checksum should be pre-calculated
 - L3/L4 header fields filling
 - etc.


MOTIVATION:
-----------

1) Some work cannot (and should not) be done in rte_eth_tx_burst.
   However, this work is sometimes required, and currently it is left
   to the application.

2) Different hardware may have different requirements for TX offloads;
   a different subset may be supported, and so on.

3) Some parameters (e.g. the number of segments in the ixgbe driver) may
   hang the device. These parameters may vary between devices.

   For example, i40e HW allows 8 fragments per packet, but that is after
   TSO segmentation, while ixgbe has a 38-fragment pre-TSO limit.

4) Fields in the packet may require different initialization (e.g.
   pseudo-header checksum precalculation, sometimes done in a different
   way depending on the packet type, and so on). Currently the
   application needs to take care of this.

5) Using an additional API (rte_eth_tx_prep) before rte_eth_tx_burst
   lets the application prepare a packet burst in a form acceptable to
   the specific device.

6) Some additional checks may be done in debug mode, keeping the
   tx_burst implementation clean.


PROPOSAL:
---------

To help the user deal with all these varieties, we propose to:

1) Introduce an rte_eth_tx_prep() function to do the necessary
   preparation of a packet burst so that it can be safely transmitted
   on the device with the desired HW offloads (set/reset checksum
   fields according to the hardware requirements) and to check HW
   constraints (number of segments per packet, etc).

   Since the limitations and requirements may differ between devices,
   this requires extending the rte_eth_dev structure with a new
   function pointer, "tx_pkt_prep", which can be implemented in the
   driver to prepare and verify packets, in a device-specific way,
   before the burst, and which should prevent the application from
   sending malformed packets.

2) Also introduce new fields in rte_eth_desc_lim: nb_seg_max and
   nb_mtu_seg_max, providing information about the maximum number of
   segments in TSO and non-TSO packets acceptable to the device.

   This information is useful for the application to avoid creating
   malformed packets; see the sketch below.
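
   As an illustration, an application could query these limits as
   follows (a sketch using the existing rte_eth_dev_info_get() API):

	struct rte_eth_dev_info dev_info;

	rte_eth_dev_info_get(port_id, &dev_info);

	/* respect both limits when building multi-segment packets */
	uint16_t tso_seg_limit = dev_info.tx_desc_lim.nb_seg_max;
	uint16_t mtu_seg_limit = dev_info.tx_desc_lim.nb_mtu_seg_max;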


APPLICATION (USE CASE):
--------------------------

1) The application should initialize the burst of packets to send and
   set the required tx offload flags and fields, such as l2_len,
   l3_len, l4_len, and tso_segsz.

2) The application passes the burst to rte_eth_tx_prep to check the
   conditions required to send the packets through the NIC.

3) The result of rte_eth_tx_prep can be used to send the valid packets
   and/or to restore the invalid ones if the function fails.

e.g.

	for (i = 0; i < nb_pkts; i++) {

		/* initialize or process packet */

		bufs[i]->tso_segsz = 800;
		bufs[i]->ol_flags = PKT_TX_TCP_SEG | PKT_TX_IPV4
				| PKT_TX_IP_CKSUM;
		bufs[i]->l2_len = sizeof(struct ether_hdr);
		bufs[i]->l3_len = sizeof(struct ipv4_hdr);
		bufs[i]->l4_len = sizeof(struct tcp_hdr);
	}

	/* Prepare burst of TX packets */
	nb_prep = rte_eth_tx_prep(port, 0, bufs, nb_pkts);

	if (nb_prep < nb_pkts) {
		printf("tx_prep failed\n");

		/* nb_prep indicates the first invalid packet here.
		 * rte_eth_tx_prep can be called on the remaining packets
		 * to find further invalid ones.
		 */

	}

	/* Send burst of TX packets */
	nb_tx = rte_eth_tx_burst(port, 0, bufs, nb_prep);

	/* Free any unsent packets. */
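
One possible way to handle the failure branch above (a sketch, not part
of the patch: drop each invalid packet, keep the burst contiguous, and
re-run preparation on the not-yet-checked tail):

	while (nb_prep < nb_pkts) {
		/* bufs[nb_prep] failed preparation: drop it... */
		rte_pktmbuf_free(bufs[nb_prep]);

		/* ...close the gap so the burst stays contiguous... */
		memmove(&bufs[nb_prep], &bufs[nb_prep + 1],
			sizeof(bufs[0]) * (nb_pkts - nb_prep - 1));
		nb_pkts--;

		/* ...and prepare the remaining unchecked packets */
		nb_prep += rte_eth_tx_prep(port, 0, &bufs[nb_prep],
				nb_pkts - nb_prep);
	}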


v5 changes:
 - rebased csum engine modification
 - added information to the csum engine about performance tests
 - some performance improvements

v4 changes:
 - tx_prep is now set to the default behavior (NULL) for the
   simple/vector paths in the fm10k, i40e and ixgbe drivers to avoid a
   performance penalty when Tx offloads are intentionally not available

v3 changes:
 - reworked the csum testpmd engine instead of adding a new one,
 - fixed the checksum initialization procedure to also include outer
   checksum offloads,
 - some minor formatting and optimizations

v2 changes:
 - rte_eth_tx_prep() returns the number of packets unchanged when the
   device doesn't support the tx_prep functionality,
 - introduced CONFIG_RTE_ETHDEV_TX_PREP, allowing tx_prep to be turned off


Tomasz Kulasek (6):
  ethdev: add Tx preparation
  e1000: add Tx preparation
  fm10k: add Tx preparation
  i40e: add Tx preparation
  ixgbe: add Tx preparation
  testpmd: use Tx preparation in csum engine

 app/test-pmd/csumonly.c          |   36 ++++------
 config/common_base               |    1 +
 drivers/net/e1000/e1000_ethdev.h |   11 +++
 drivers/net/e1000/em_ethdev.c    |    5 +-
 drivers/net/e1000/em_rxtx.c      |   48 ++++++++++++-
 drivers/net/e1000/igb_ethdev.c   |    4 ++
 drivers/net/e1000/igb_rxtx.c     |   52 ++++++++++++++-
 drivers/net/fm10k/fm10k.h        |    6 ++
 drivers/net/fm10k/fm10k_ethdev.c |    5 ++
 drivers/net/fm10k/fm10k_rxtx.c   |   50 +++++++++++++-
 drivers/net/i40e/i40e_ethdev.c   |    3 +
 drivers/net/i40e/i40e_rxtx.c     |   72 +++++++++++++++++++-
 drivers/net/i40e/i40e_rxtx.h     |    8 +++
 drivers/net/ixgbe/ixgbe_ethdev.c |    3 +
 drivers/net/ixgbe/ixgbe_ethdev.h |    5 +-
 drivers/net/ixgbe/ixgbe_rxtx.c   |   58 +++++++++++++++-
 drivers/net/ixgbe/ixgbe_rxtx.h   |    2 +
 lib/librte_ether/rte_ethdev.h    |   85 +++++++++++++++++++++++
 lib/librte_mbuf/rte_mbuf.h       |    9 +++
 lib/librte_net/Makefile          |    3 +-
 lib/librte_net/rte_pkt.h         |  137 ++++++++++++++++++++++++++++++++++++++
 21 files changed, 572 insertions(+), 31 deletions(-)
 create mode 100644 lib/librte_net/rte_pkt.h

-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 261+ messages in thread

* [dpdk-dev] [PATCH v6 1/6] ethdev: add Tx preparation
  2016-10-14 15:05         ` [dpdk-dev] [PATCH v6 0/6] add Tx preparation Tomasz Kulasek
@ 2016-10-14 15:05           ` Tomasz Kulasek
  2016-10-18 14:57             ` Olivier Matz
  2016-10-14 15:05           ` [dpdk-dev] [PATCH v6 2/6] e1000: " Tomasz Kulasek
                             ` (6 subsequent siblings)
  7 siblings, 1 reply; 261+ messages in thread
From: Tomasz Kulasek @ 2016-10-14 15:05 UTC (permalink / raw)
  To: dev; +Cc: konstantin.ananyev, thomas.monjalon

Added API for `rte_eth_tx_prep`

uint16_t rte_eth_tx_prep(uint8_t port_id, uint16_t queue_id,
	struct rte_mbuf **tx_pkts, uint16_t nb_pkts)

Added fields to the `struct rte_eth_desc_lim`:

	uint16_t nb_seg_max;
		/**< Max number of segments per whole packet. */

	uint16_t nb_mtu_seg_max;
		/**< Max number of segments per one MTU */

Created `rte_pkt.h` header with common used functions:

int rte_validate_tx_offload(struct rte_mbuf *m)
	to validate general requirements for tx offload in a packet, such
	as flag completeness. In the current implementation this function
	is called optionally, when RTE_LIBRTE_ETHDEV_DEBUG is enabled.

int rte_phdr_cksum_fix(struct rte_mbuf *m)
	to fix the pseudo-header checksum for TSO and non-TSO tcp/udp
	packets before the hardware tx checksum offload.
	 - for non-TSO tcp/udp packets the full pseudo-header checksum is
	   computed and set.
	 - for TSO the IP payload length is not included.


PERFORMANCE TESTS
-----------------

This feature was tested with a modified csum engine from test-pmd.

The packet checksum preparation was moved from the application to the
Tx preparation step performed before the burst.

We may expect some overhead caused by:
1) using an additional callback before the burst,
2) rescanning the burst,
3) additional condition checks (packet validation),
4) weaker compiler optimization (e.g. of packet data access, etc.)

We tested it using the ixgbe Tx preparation implementation with some
parts disabled, to get comparable information about the impact of the
different parts of the implementation.

IMPACT:

1) For an unimplemented Tx preparation callback the performance impact
   is negligible,
2) For the packet condition check without checksum modifications
   (nb_segs, available offloads, etc.) the result is 14626628/14252168
   (~2.62% drop),
3) For full support in the ixgbe driver (point 2 plus packet checksum
   initialization) the result is 14060924/13588094 (~3.48% drop)


Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
---
 config/common_base            |    1 +
 lib/librte_ether/rte_ethdev.h |   85 +++++++++++++++++++++++++
 lib/librte_mbuf/rte_mbuf.h    |    9 +++
 lib/librte_net/Makefile       |    3 +-
 lib/librte_net/rte_pkt.h      |  137 +++++++++++++++++++++++++++++++++++++++++
 5 files changed, 234 insertions(+), 1 deletion(-)
 create mode 100644 lib/librte_net/rte_pkt.h

diff --git a/config/common_base b/config/common_base
index c7fd3db..619284b 100644
--- a/config/common_base
+++ b/config/common_base
@@ -120,6 +120,7 @@ CONFIG_RTE_MAX_QUEUES_PER_PORT=1024
 CONFIG_RTE_LIBRTE_IEEE1588=n
 CONFIG_RTE_ETHDEV_QUEUE_STAT_CNTRS=16
 CONFIG_RTE_ETHDEV_RXTX_CALLBACKS=y
+CONFIG_RTE_ETHDEV_TX_PREP=y
 
 #
 # Support NIC bypass logic
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 38641e8..a10ed9c 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -182,6 +182,7 @@ extern "C" {
 #include <rte_pci.h>
 #include <rte_dev.h>
 #include <rte_devargs.h>
+#include <rte_errno.h>
 #include "rte_ether.h"
 #include "rte_eth_ctrl.h"
 #include "rte_dev_info.h"
@@ -699,6 +700,8 @@ struct rte_eth_desc_lim {
 	uint16_t nb_max;   /**< Max allowed number of descriptors. */
 	uint16_t nb_min;   /**< Min allowed number of descriptors. */
 	uint16_t nb_align; /**< Number of descriptors should be aligned to. */
+	uint16_t nb_seg_max;     /**< Max number of segments per whole packet. */
+	uint16_t nb_mtu_seg_max; /**< Max number of segments per one MTU */
 };
 
 /**
@@ -1188,6 +1191,11 @@ typedef uint16_t (*eth_tx_burst_t)(void *txq,
 				   uint16_t nb_pkts);
 /**< @internal Send output packets on a transmit queue of an Ethernet device. */
 
+typedef uint16_t (*eth_tx_prep_t)(void *txq,
+				   struct rte_mbuf **tx_pkts,
+				   uint16_t nb_pkts);
+/**< @internal Prepare output packets on a transmit queue of an Ethernet device. */
+
 typedef int (*flow_ctrl_get_t)(struct rte_eth_dev *dev,
 			       struct rte_eth_fc_conf *fc_conf);
 /**< @internal Get current flow control parameter on an Ethernet device */
@@ -1622,6 +1630,7 @@ struct rte_eth_rxtx_callback {
 struct rte_eth_dev {
 	eth_rx_burst_t rx_pkt_burst; /**< Pointer to PMD receive function. */
 	eth_tx_burst_t tx_pkt_burst; /**< Pointer to PMD transmit function. */
+	eth_tx_prep_t tx_pkt_prep; /**< Pointer to PMD transmit prepare function. */
 	struct rte_eth_dev_data *data;  /**< Pointer to device data */
 	const struct eth_driver *driver;/**< Driver for this device */
 	const struct eth_dev_ops *dev_ops; /**< Functions exported by PMD */
@@ -2816,6 +2825,82 @@ rte_eth_tx_burst(uint8_t port_id, uint16_t queue_id,
 	return (*dev->tx_pkt_burst)(dev->data->tx_queues[queue_id], tx_pkts, nb_pkts);
 }
 
+/**
+ * Process a burst of output packets on a transmit queue of an Ethernet device.
+ *
+ * The rte_eth_tx_prep() function is invoked to prepare output packets to be
+ * transmitted on the output queue *queue_id* of the Ethernet device designated
+ * by its *port_id*.
+ * The *nb_pkts* parameter is the number of packets to be prepared which are
+ * supplied in the *tx_pkts* array of *rte_mbuf* structures, each of them
+ * allocated from a pool created with rte_pktmbuf_pool_create().
+ * For each packet to send, the rte_eth_tx_prep() function performs
+ * the following operations:
+ *
+ * - Check if the packet meets the device's requirements for tx offloads.
+ *
+ * - Check limitations about number of segments.
+ *
+ * - Check additional requirements when debug is enabled.
+ *
+ * - Update and/or reset required checksums when a tx offload is set for the packet.
+ *
+ * The rte_eth_tx_prep() function returns the number of packets ready to be
+ * sent. A return value equal to *nb_pkts* means that all packets are valid and
+ * ready to be sent.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param queue_id
+ *   The index of the transmit queue through which output packets must be
+ *   sent.
+ *   The value must be in the range [0, nb_tx_queue - 1] previously supplied
+ *   to rte_eth_dev_configure().
+ * @param tx_pkts
+ *   The address of an array of *nb_pkts* pointers to *rte_mbuf* structures
+ *   which contain the output packets.
+ * @param nb_pkts
+ *   The maximum number of packets to process.
+ * @return
+ *   The number of packets correct and ready to be sent. The return value
+ *   can be less than the value of the *nb_pkts* parameter when some packet
+ *   doesn't meet the device's requirements; rte_errno is set appropriately.
+ */
+
+#ifdef RTE_ETHDEV_TX_PREP
+
+static inline uint16_t
+rte_eth_tx_prep(uint8_t port_id, uint16_t queue_id, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+
+	if (!dev->tx_pkt_prep)
+		return nb_pkts;
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+	if (queue_id >= dev->data->nb_tx_queues) {
+		RTE_PMD_DEBUG_TRACE("Invalid TX queue_id=%d\n", queue_id);
+		rte_errno = -EINVAL;
+		return 0;
+	}
+#endif
+
+	return (*dev->tx_pkt_prep)(dev->data->tx_queues[queue_id],
+			tx_pkts, nb_pkts);
+}
+
+#else
+
+static inline uint16_t
+rte_eth_tx_prep(__rte_unused uint8_t port_id, __rte_unused uint16_t queue_id,
+		__rte_unused struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
+{
+	return nb_pkts;
+}
+
+#endif
+
 typedef void (*buffer_tx_error_fn)(struct rte_mbuf **unsent, uint16_t count,
 		void *userdata);
 
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index 109e666..cfd6284 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -276,6 +276,15 @@ extern "C" {
  */
 #define PKT_TX_OUTER_IPV4   (1ULL << 59)
 
+#define PKT_TX_OFFLOAD_MASK (    \
+		PKT_TX_IP_CKSUM |        \
+		PKT_TX_L4_MASK |         \
+		PKT_TX_OUTER_IP_CKSUM |  \
+		PKT_TX_TCP_SEG |         \
+		PKT_TX_QINQ_PKT |        \
+		PKT_TX_VLAN_PKT |        \
+		PKT_TX_TUNNEL_MASK)
+
 /**
  * Packet outer header is IPv6. This flag must be set when using any
  * outer offload feature (L4 checksum) to tell the NIC that the outer
diff --git a/lib/librte_net/Makefile b/lib/librte_net/Makefile
index e5758ce..cc69bc0 100644
--- a/lib/librte_net/Makefile
+++ b/lib/librte_net/Makefile
@@ -1,6 +1,6 @@
 #   BSD LICENSE
 #
-#   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+#   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
 #   All rights reserved.
 #
 #   Redistribution and use in source and binary forms, with or without
@@ -44,6 +44,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_NET) := rte_net.c
 SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include := rte_ip.h rte_tcp.h rte_udp.h
 SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include += rte_sctp.h rte_icmp.h rte_arp.h
 SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include += rte_ether.h rte_gre.h rte_net.h
+SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include += rte_pkt.h
 
 DEPDIRS-$(CONFIG_RTE_LIBRTE_NET) += lib/librte_eal lib/librte_mempool
 DEPDIRS-$(CONFIG_RTE_LIBRTE_NET) += lib/librte_mbuf
diff --git a/lib/librte_net/rte_pkt.h b/lib/librte_net/rte_pkt.h
new file mode 100644
index 0000000..c4bd7b2
--- /dev/null
+++ b/lib/librte_net/rte_pkt.h
@@ -0,0 +1,137 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2016 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_PKT_H_
+#define _RTE_PKT_H_
+
+#include <rte_ip.h>
+#include <rte_udp.h>
+#include <rte_tcp.h>
+#include <rte_sctp.h>
+
+/**
+ * Validate general requirements for tx offload in a packet.
+ */
+static inline int
+rte_validate_tx_offload(struct rte_mbuf *m)
+{
+	uint64_t ol_flags = m->ol_flags;
+
+	/* Does packet set any of available offloads? */
+	if (!(ol_flags & PKT_TX_OFFLOAD_MASK))
+		return 0;
+
+	/* IP checksum can be computed only for an IPv4 packet */
+	if ((ol_flags & PKT_TX_IP_CKSUM) && (ol_flags & PKT_TX_IPV6))
+		return -EINVAL;
+
+	if (ol_flags & (PKT_TX_L4_MASK | PKT_TX_TCP_SEG))
+		/* IP type not set */
+		if (!(ol_flags & (PKT_TX_IPV4 | PKT_TX_IPV6)))
+			return -EINVAL;
+
+	if (ol_flags & PKT_TX_TCP_SEG)
+		/* PKT_TX_IP_CKSUM offload not set for IPv4 TSO packet */
+		if ((m->tso_segsz == 0) ||
+				((ol_flags & PKT_TX_IPV4) && !(ol_flags & PKT_TX_IP_CKSUM)))
+			return -EINVAL;
+
+	/* PKT_TX_OUTER_IP_CKSUM set for non outer IPv4 packet. */
+	if ((ol_flags & PKT_TX_OUTER_IP_CKSUM) && !(ol_flags & PKT_TX_OUTER_IPV4))
+		return -EINVAL;
+
+	return 0;
+}
+
+/**
+ * Fix the pseudo-header checksum for TSO and non-TSO tcp/udp packets
+ * before the hardware tx checksum offload.
+ * For non-TSO tcp/udp packets the full pseudo-header checksum is computed
+ * and set. For TSO the IP payload length is not included.
+ */
+static inline int
+rte_phdr_cksum_fix(struct rte_mbuf *m)
+{
+	struct ipv4_hdr *ipv4_hdr;
+	struct ipv6_hdr *ipv6_hdr;
+	struct tcp_hdr *tcp_hdr;
+	struct udp_hdr *udp_hdr;
+	uint64_t ol_flags = m->ol_flags;
+	uint64_t inner_l3_offset = m->l2_len;
+
+	if (ol_flags & PKT_TX_OUTER_IP_CKSUM)
+		inner_l3_offset += m->outer_l2_len + m->outer_l3_len;
+
+	if ((ol_flags & PKT_TX_UDP_CKSUM) == PKT_TX_UDP_CKSUM) {
+		if (ol_flags & PKT_TX_IPV4) {
+			ipv4_hdr = rte_pktmbuf_mtod_offset(m, struct ipv4_hdr *,
+					inner_l3_offset);
+
+			if (ol_flags & PKT_TX_IP_CKSUM)
+				ipv4_hdr->hdr_checksum = 0;
+
+			udp_hdr = (struct udp_hdr *)((char *)ipv4_hdr + m->l3_len);
+			udp_hdr->dgram_cksum = rte_ipv4_phdr_cksum(ipv4_hdr, ol_flags);
+		} else {
+			ipv6_hdr = rte_pktmbuf_mtod_offset(m, struct ipv6_hdr *,
+					inner_l3_offset);
+			/* non-TSO udp */
+			udp_hdr = rte_pktmbuf_mtod_offset(m, struct udp_hdr *,
+					inner_l3_offset + m->l3_len);
+			udp_hdr->dgram_cksum = rte_ipv6_phdr_cksum(ipv6_hdr, ol_flags);
+		}
+	} else if ((ol_flags & PKT_TX_TCP_CKSUM) ||
+			(ol_flags & PKT_TX_TCP_SEG)) {
+		if (ol_flags & PKT_TX_IPV4) {
+			ipv4_hdr = rte_pktmbuf_mtod_offset(m, struct ipv4_hdr *,
+					inner_l3_offset);
+
+			if (ol_flags & PKT_TX_IP_CKSUM)
+				ipv4_hdr->hdr_checksum = 0;
+
+			/* non-TSO tcp or TSO */
+			tcp_hdr = (struct tcp_hdr *)((char *)ipv4_hdr + m->l3_len);
+			tcp_hdr->cksum = rte_ipv4_phdr_cksum(ipv4_hdr, ol_flags);
+		} else {
+			ipv6_hdr = rte_pktmbuf_mtod_offset(m, struct ipv6_hdr *,
+					inner_l3_offset);
+			/* non-TSO tcp or TSO */
+			tcp_hdr = rte_pktmbuf_mtod_offset(m, struct tcp_hdr *,
+					inner_l3_offset + m->l3_len);
+			tcp_hdr->cksum = rte_ipv6_phdr_cksum(ipv6_hdr, ol_flags);
+		}
+	}
+	return 0;
+}
+
+#endif /* _RTE_PKT_H_ */
-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 261+ messages in thread

* [dpdk-dev] [PATCH v6 2/6] e1000: add Tx preparation
  2016-10-14 15:05         ` [dpdk-dev] [PATCH v6 0/6] add Tx preparation Tomasz Kulasek
  2016-10-14 15:05           ` [dpdk-dev] [PATCH v6 1/6] ethdev: " Tomasz Kulasek
@ 2016-10-14 15:05           ` Tomasz Kulasek
  2016-10-14 15:05           ` [dpdk-dev] [PATCH v6 3/6] fm10k: " Tomasz Kulasek
                             ` (5 subsequent siblings)
  7 siblings, 0 replies; 261+ messages in thread
From: Tomasz Kulasek @ 2016-10-14 15:05 UTC (permalink / raw)
  To: dev; +Cc: konstantin.ananyev, thomas.monjalon

Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
---
 drivers/net/e1000/e1000_ethdev.h |   11 ++++++++
 drivers/net/e1000/em_ethdev.c    |    5 +++-
 drivers/net/e1000/em_rxtx.c      |   48 ++++++++++++++++++++++++++++++++++-
 drivers/net/e1000/igb_ethdev.c   |    4 +++
 drivers/net/e1000/igb_rxtx.c     |   52 +++++++++++++++++++++++++++++++++++++-
 5 files changed, 117 insertions(+), 3 deletions(-)

diff --git a/drivers/net/e1000/e1000_ethdev.h b/drivers/net/e1000/e1000_ethdev.h
index 6c25c8d..bd0f277 100644
--- a/drivers/net/e1000/e1000_ethdev.h
+++ b/drivers/net/e1000/e1000_ethdev.h
@@ -138,6 +138,11 @@
 #define E1000_MISC_VEC_ID               RTE_INTR_VEC_ZERO_OFFSET
 #define E1000_RX_VEC_START              RTE_INTR_VEC_RXTX_OFFSET
 
+#define IGB_TX_MAX_SEG     UINT8_MAX
+#define IGB_TX_MAX_MTU_SEG UINT8_MAX
+#define EM_TX_MAX_SEG      UINT8_MAX
+#define EM_TX_MAX_MTU_SEG  UINT8_MAX
+
 /* structure for interrupt relative data */
 struct e1000_interrupt {
 	uint32_t flags;
@@ -315,6 +320,9 @@ void eth_igb_tx_init(struct rte_eth_dev *dev);
 uint16_t eth_igb_xmit_pkts(void *txq, struct rte_mbuf **tx_pkts,
 		uint16_t nb_pkts);
 
+uint16_t eth_igb_prep_pkts(void *txq, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts);
+
 uint16_t eth_igb_recv_pkts(void *rxq, struct rte_mbuf **rx_pkts,
 		uint16_t nb_pkts);
 
@@ -376,6 +384,9 @@ void eth_em_tx_init(struct rte_eth_dev *dev);
 uint16_t eth_em_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		uint16_t nb_pkts);
 
+uint16_t eth_em_prep_pkts(void *txq, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts);
+
 uint16_t eth_em_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
 		uint16_t nb_pkts);
 
diff --git a/drivers/net/e1000/em_ethdev.c b/drivers/net/e1000/em_ethdev.c
index 7cf5f0c..17b45cb 100644
--- a/drivers/net/e1000/em_ethdev.c
+++ b/drivers/net/e1000/em_ethdev.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -300,6 +300,7 @@ eth_em_dev_init(struct rte_eth_dev *eth_dev)
 	eth_dev->dev_ops = &eth_em_ops;
 	eth_dev->rx_pkt_burst = (eth_rx_burst_t)&eth_em_recv_pkts;
 	eth_dev->tx_pkt_burst = (eth_tx_burst_t)&eth_em_xmit_pkts;
+	eth_dev->tx_pkt_prep = (eth_tx_prep_t)&eth_em_prep_pkts;
 
 	/* for secondary processes, we don't initialise any further as primary
 	 * has already done this work. Only check we don't need a different
@@ -1067,6 +1068,8 @@ eth_em_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 		.nb_max = E1000_MAX_RING_DESC,
 		.nb_min = E1000_MIN_RING_DESC,
 		.nb_align = EM_TXD_ALIGN,
+		.nb_seg_max = EM_TX_MAX_SEG,
+		.nb_mtu_seg_max = EM_TX_MAX_MTU_SEG,
 	};
 
 	dev_info->speed_capa = ETH_LINK_SPEED_10M_HD | ETH_LINK_SPEED_10M |
diff --git a/drivers/net/e1000/em_rxtx.c b/drivers/net/e1000/em_rxtx.c
index 41f51c0..3af2f69 100644
--- a/drivers/net/e1000/em_rxtx.c
+++ b/drivers/net/e1000/em_rxtx.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -66,6 +66,7 @@
 #include <rte_udp.h>
 #include <rte_tcp.h>
 #include <rte_sctp.h>
+#include <rte_pkt.h>
 #include <rte_string_fns.h>
 
 #include "e1000_logs.h"
@@ -77,6 +78,14 @@
 
 #define E1000_RXDCTL_GRAN	0x01000000 /* RXDCTL Granularity */
 
+#define E1000_TX_OFFLOAD_MASK ( \
+		PKT_TX_IP_CKSUM |       \
+		PKT_TX_L4_MASK |        \
+		PKT_TX_VLAN_PKT)
+
+#define E1000_TX_OFFLOAD_NOTSUP_MASK \
+		(PKT_TX_OFFLOAD_MASK ^ E1000_TX_OFFLOAD_MASK)
+
 /**
  * Structure associated with each descriptor of the RX ring of a RX queue.
  */
@@ -618,6 +627,43 @@ end_of_tx:
 
 /*********************************************************************
  *
+ *  TX prep functions
+ *
+ **********************************************************************/
+uint16_t
+eth_em_prep_pkts(__rte_unused void *tx_queue, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts)
+{
+	int i, ret;
+	struct rte_mbuf *m;
+
+	for (i = 0; i < nb_pkts; i++) {
+		m = tx_pkts[i];
+
+		if (m->ol_flags & E1000_TX_OFFLOAD_NOTSUP_MASK) {
+			rte_errno = -EINVAL;
+			return i;
+		}
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+		ret = rte_validate_tx_offload(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+#endif
+		ret = rte_phdr_cksum_fix(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+	}
+
+	return i;
+}
+
+/*********************************************************************
+ *
  *  RX functions
  *
  **********************************************************************/
diff --git a/drivers/net/e1000/igb_ethdev.c b/drivers/net/e1000/igb_ethdev.c
index 4924396..0afdd09 100644
--- a/drivers/net/e1000/igb_ethdev.c
+++ b/drivers/net/e1000/igb_ethdev.c
@@ -369,6 +369,8 @@ static const struct rte_eth_desc_lim tx_desc_lim = {
 	.nb_max = E1000_MAX_RING_DESC,
 	.nb_min = E1000_MIN_RING_DESC,
 	.nb_align = IGB_RXD_ALIGN,
+	.nb_seg_max = IGB_TX_MAX_SEG,
+	.nb_mtu_seg_max = IGB_TX_MAX_MTU_SEG,
 };
 
 static const struct eth_dev_ops eth_igb_ops = {
@@ -760,6 +762,7 @@ eth_igb_dev_init(struct rte_eth_dev *eth_dev)
 	eth_dev->dev_ops = &eth_igb_ops;
 	eth_dev->rx_pkt_burst = &eth_igb_recv_pkts;
 	eth_dev->tx_pkt_burst = &eth_igb_xmit_pkts;
+	eth_dev->tx_pkt_prep = &eth_igb_prep_pkts;
 
 	/* for secondary processes, we don't initialise any further as primary
 	 * has already done this work. Only check we don't need a different
@@ -963,6 +966,7 @@ eth_igbvf_dev_init(struct rte_eth_dev *eth_dev)
 	eth_dev->dev_ops = &igbvf_eth_dev_ops;
 	eth_dev->rx_pkt_burst = &eth_igb_recv_pkts;
 	eth_dev->tx_pkt_burst = &eth_igb_xmit_pkts;
+	eth_dev->tx_pkt_prep = &eth_igb_prep_pkts;
 
 	/* for secondary processes, we don't initialise any further as primary
 	 * has already done this work. Only check we don't need a different
diff --git a/drivers/net/e1000/igb_rxtx.c b/drivers/net/e1000/igb_rxtx.c
index dbd37ac..786902d 100644
--- a/drivers/net/e1000/igb_rxtx.c
+++ b/drivers/net/e1000/igb_rxtx.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -65,6 +65,7 @@
 #include <rte_udp.h>
 #include <rte_tcp.h>
 #include <rte_sctp.h>
+#include <rte_pkt.h>
 #include <rte_string_fns.h>
 
 #include "e1000_logs.h"
@@ -78,6 +79,9 @@
 		PKT_TX_L4_MASK |		 \
 		PKT_TX_TCP_SEG)
 
+#define IGB_TX_OFFLOAD_NOTSUP_MASK \
+		(PKT_TX_OFFLOAD_MASK ^ IGB_TX_OFFLOAD_MASK)
+
 /**
  * Structure associated with each descriptor of the RX ring of a RX queue.
  */
@@ -616,6 +620,51 @@ eth_igb_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 
 /*********************************************************************
  *
+ *  TX prep functions
+ *
+ **********************************************************************/
+uint16_t
+eth_igb_prep_pkts(__rte_unused void *tx_queue, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts)
+{
+	int i, ret;
+	struct rte_mbuf *m;
+
+	for (i = 0; i < nb_pkts; i++) {
+		m = tx_pkts[i];
+
+		/* Check some limitations for TSO in hardware */
+		if ((m->ol_flags & PKT_TX_TCP_SEG) &&
+				((m->tso_segsz > IGB_TSO_MAX_MSS) || (m->l2_len +
+				m->l3_len + m->l4_len > IGB_TSO_MAX_HDRLEN))) {
+			rte_errno = -EINVAL;
+			return i;
+		}
+
+		if (m->ol_flags & IGB_TX_OFFLOAD_NOTSUP_MASK) {
+			rte_errno = -EINVAL;
+			return i;
+		}
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+		ret = rte_validate_tx_offload(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+#endif
+		ret = rte_phdr_cksum_fix(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+	}
+
+	return i;
+}
+
+/*********************************************************************
+ *
  *  RX functions
  *
  **********************************************************************/
@@ -1364,6 +1413,7 @@ eth_igb_tx_queue_setup(struct rte_eth_dev *dev,
 
 	igb_reset_tx_queue(txq, dev);
 	dev->tx_pkt_burst = eth_igb_xmit_pkts;
+	dev->tx_pkt_prep = &eth_igb_prep_pkts;
 	dev->data->tx_queues[queue_idx] = txq;
 
 	return 0;
-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 261+ messages in thread

* [dpdk-dev] [PATCH v6 3/6] fm10k: add Tx preparation
  2016-10-14 15:05         ` [dpdk-dev] [PATCH v6 0/6] add Tx preparation Tomasz Kulasek
  2016-10-14 15:05           ` [dpdk-dev] [PATCH v6 1/6] ethdev: " Tomasz Kulasek
  2016-10-14 15:05           ` [dpdk-dev] [PATCH v6 2/6] e1000: " Tomasz Kulasek
@ 2016-10-14 15:05           ` Tomasz Kulasek
  2016-10-14 15:05           ` [dpdk-dev] [PATCH v6 4/6] i40e: " Tomasz Kulasek
                             ` (4 subsequent siblings)
  7 siblings, 0 replies; 261+ messages in thread
From: Tomasz Kulasek @ 2016-10-14 15:05 UTC (permalink / raw)
  To: dev; +Cc: konstantin.ananyev, thomas.monjalon

Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
---
 drivers/net/fm10k/fm10k.h        |    6 +++++
 drivers/net/fm10k/fm10k_ethdev.c |    5 ++++
 drivers/net/fm10k/fm10k_rxtx.c   |   50 +++++++++++++++++++++++++++++++++++++-
 3 files changed, 60 insertions(+), 1 deletion(-)

diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
index 05aa1a2..c6fed21 100644
--- a/drivers/net/fm10k/fm10k.h
+++ b/drivers/net/fm10k/fm10k.h
@@ -69,6 +69,9 @@
 #define FM10K_MAX_RX_DESC  (FM10K_MAX_RX_RING_SZ / sizeof(union fm10k_rx_desc))
 #define FM10K_MAX_TX_DESC  (FM10K_MAX_TX_RING_SZ / sizeof(struct fm10k_tx_desc))
 
+#define FM10K_TX_MAX_SEG     UINT8_MAX
+#define FM10K_TX_MAX_MTU_SEG UINT8_MAX
+
 /*
  * byte aligment for HW RX data buffer
  * Datasheet requires RX buffer addresses shall either be 512-byte aligned or
@@ -356,6 +359,9 @@ fm10k_dev_rx_descriptor_done(void *rx_queue, uint16_t offset);
 uint16_t fm10k_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 	uint16_t nb_pkts);
 
+uint16_t fm10k_prep_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
+	uint16_t nb_pkts);
+
 int fm10k_rxq_vec_setup(struct fm10k_rx_queue *rxq);
 int fm10k_rx_vec_condition_check(struct rte_eth_dev *);
 void fm10k_rx_queue_release_mbufs_vec(struct fm10k_rx_queue *rxq);
diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index c804436..dffb6d1 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -1446,6 +1446,8 @@ fm10k_dev_infos_get(struct rte_eth_dev *dev,
 		.nb_max = FM10K_MAX_TX_DESC,
 		.nb_min = FM10K_MIN_TX_DESC,
 		.nb_align = FM10K_MULT_TX_DESC,
+		.nb_seg_max = FM10K_TX_MAX_SEG,
+		.nb_mtu_seg_max = FM10K_TX_MAX_MTU_SEG,
 	};
 
 	dev_info->speed_capa = ETH_LINK_SPEED_1G | ETH_LINK_SPEED_2_5G |
@@ -2754,8 +2756,10 @@ fm10k_set_tx_function(struct rte_eth_dev *dev)
 			fm10k_txq_vec_setup(txq);
 		}
 		dev->tx_pkt_burst = fm10k_xmit_pkts_vec;
+		dev->tx_pkt_prep = NULL;
 	} else {
 		dev->tx_pkt_burst = fm10k_xmit_pkts;
+		dev->tx_pkt_prep = fm10k_prep_pkts;
 		PMD_INIT_LOG(DEBUG, "Use regular Tx func");
 	}
 }
@@ -2834,6 +2838,7 @@ eth_fm10k_dev_init(struct rte_eth_dev *dev)
 	dev->dev_ops = &fm10k_eth_dev_ops;
 	dev->rx_pkt_burst = &fm10k_recv_pkts;
 	dev->tx_pkt_burst = &fm10k_xmit_pkts;
+	dev->tx_pkt_prep = &fm10k_prep_pkts;
 
 	/* only initialize in the primary process */
 	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
diff --git a/drivers/net/fm10k/fm10k_rxtx.c b/drivers/net/fm10k/fm10k_rxtx.c
index 32cc7ff..7ca28c0 100644
--- a/drivers/net/fm10k/fm10k_rxtx.c
+++ b/drivers/net/fm10k/fm10k_rxtx.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2013-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2013-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -35,6 +35,7 @@
 
 #include <rte_ethdev.h>
 #include <rte_common.h>
+#include <rte_pkt.h>
 #include "fm10k.h"
 #include "base/fm10k_type.h"
 
@@ -65,6 +66,15 @@ static inline void dump_rxd(union fm10k_rx_desc *rxd)
 }
 #endif
 
+#define FM10K_TX_OFFLOAD_MASK (  \
+		PKT_TX_VLAN_PKT |        \
+		PKT_TX_IP_CKSUM |        \
+		PKT_TX_L4_MASK |         \
+		PKT_TX_TCP_SEG)
+
+#define FM10K_TX_OFFLOAD_NOTSUP_MASK \
+		(PKT_TX_OFFLOAD_MASK ^ FM10K_TX_OFFLOAD_MASK)
+
 /* @note: When this function is changed, make corresponding change to
  * fm10k_dev_supported_ptypes_get()
  */
@@ -597,3 +607,41 @@ fm10k_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 
 	return count;
 }
+
+uint16_t
+fm10k_prep_pkts(__rte_unused void *tx_queue, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts)
+{
+	int i, ret;
+	struct rte_mbuf *m;
+
+	for (i = 0; i < nb_pkts; i++) {
+		m = tx_pkts[i];
+
+		if ((m->ol_flags & PKT_TX_TCP_SEG) &&
+				(m->tso_segsz < FM10K_TSO_MINMSS)) {
+			rte_errno = -EINVAL;
+			return i;
+		}
+
+		if (m->ol_flags & FM10K_TX_OFFLOAD_NOTSUP_MASK) {
+			rte_errno = -EINVAL;
+			return i;
+		}
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+		ret = rte_validate_tx_offload(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+#endif
+		ret = rte_phdr_cksum_fix(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+	}
+
+	return i;
+}
-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 261+ messages in thread

* [dpdk-dev] [PATCH v6 4/6] i40e: add Tx preparation
  2016-10-14 15:05         ` [dpdk-dev] [PATCH v6 0/6] add Tx preparation Tomasz Kulasek
                             ` (2 preceding siblings ...)
  2016-10-14 15:05           ` [dpdk-dev] [PATCH v6 3/6] fm10k: " Tomasz Kulasek
@ 2016-10-14 15:05           ` Tomasz Kulasek
  2016-10-14 15:05           ` [dpdk-dev] [PATCH v6 5/6] ixgbe: " Tomasz Kulasek
                             ` (3 subsequent siblings)
  7 siblings, 0 replies; 261+ messages in thread
From: Tomasz Kulasek @ 2016-10-14 15:05 UTC (permalink / raw)
  To: dev; +Cc: konstantin.ananyev, thomas.monjalon

Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
---
 drivers/net/i40e/i40e_ethdev.c |    3 ++
 drivers/net/i40e/i40e_rxtx.c   |   72 +++++++++++++++++++++++++++++++++++++++-
 drivers/net/i40e/i40e_rxtx.h   |    8 +++++
 3 files changed, 82 insertions(+), 1 deletion(-)

diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
index 5af0e43..dab0d48 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -936,6 +936,7 @@ eth_i40e_dev_init(struct rte_eth_dev *dev)
 	dev->dev_ops = &i40e_eth_dev_ops;
 	dev->rx_pkt_burst = i40e_recv_pkts;
 	dev->tx_pkt_burst = i40e_xmit_pkts;
+	dev->tx_pkt_prep = i40e_prep_pkts;
 
 	/* for secondary processes, we don't initialise any further as primary
 	 * has already done this work. Only check we don't need a different
@@ -2629,6 +2630,8 @@ i40e_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 		.nb_max = I40E_MAX_RING_DESC,
 		.nb_min = I40E_MIN_RING_DESC,
 		.nb_align = I40E_ALIGN_RING_DESC,
+		.nb_seg_max = I40E_TX_MAX_SEG,
+		.nb_mtu_seg_max = I40E_TX_MAX_MTU_SEG,
 	};
 
 	if (pf->flags & I40E_FLAG_VMDQ) {
diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
index 7ae7d9f..3e2c428 100644
--- a/drivers/net/i40e/i40e_rxtx.c
+++ b/drivers/net/i40e/i40e_rxtx.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -50,6 +50,8 @@
 #include <rte_tcp.h>
 #include <rte_sctp.h>
 #include <rte_udp.h>
+#include <rte_ip.h>
+#include <rte_pkt.h>
 
 #include "i40e_logs.h"
 #include "base/i40e_prototype.h"
@@ -79,6 +81,17 @@
 		PKT_TX_TCP_SEG |		 \
 		PKT_TX_OUTER_IP_CKSUM)
 
+#define I40E_TX_OFFLOAD_MASK (  \
+		PKT_TX_IP_CKSUM |       \
+		PKT_TX_L4_MASK |        \
+		PKT_TX_OUTER_IP_CKSUM | \
+		PKT_TX_TCP_SEG |        \
+		PKT_TX_QINQ_PKT |       \
+		PKT_TX_VLAN_PKT)
+
+#define I40E_TX_OFFLOAD_NOTSUP_MASK \
+		(PKT_TX_OFFLOAD_MASK ^ I40E_TX_OFFLOAD_MASK)
+
 static uint16_t i40e_xmit_pkts_simple(void *tx_queue,
 				      struct rte_mbuf **tx_pkts,
 				      uint16_t nb_pkts);
@@ -1411,6 +1424,61 @@ i40e_xmit_pkts_simple(void *tx_queue,
 	return nb_tx;
 }
 
+/*********************************************************************
+ *
+ *  TX prep functions
+ *
+ **********************************************************************/
+uint16_t
+i40e_prep_pkts(__rte_unused void *tx_queue, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts)
+{
+	int i, ret;
+	uint64_t ol_flags;
+	struct rte_mbuf *m;
+
+	for (i = 0; i < nb_pkts; i++) {
+		m = tx_pkts[i];
+		ol_flags = m->ol_flags;
+
+		/**
+		 * m->nb_segs is uint8_t, so it can never exceed
+		 * I40E_TX_MAX_SEG (UINT8_MAX); only the non-TSO limit
+		 * I40E_TX_MAX_MTU_SEG needs to be checked here.
+		 */
+		if (!(ol_flags & PKT_TX_TCP_SEG)) {
+			if (m->nb_segs > I40E_TX_MAX_MTU_SEG) {
+				rte_errno = -EINVAL;
+				return i;
+			}
+		} else if ((m->tso_segsz < I40E_MIN_TSO_MSS) ||
+				(m->tso_segsz > I40E_MAX_TSO_MSS)) {
+			/* An MSS outside the range (256B - 9674B) is considered malicious */
+			rte_errno = -EINVAL;
+			return i;
+		}
+
+		if (ol_flags & I40E_TX_OFFLOAD_NOTSUP_MASK) {
+			rte_errno = -EINVAL;
+			return i;
+		}
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+		ret = rte_validate_tx_offload(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+#endif
+		ret = rte_phdr_cksum_fix(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+	}
+	return i;
+}
+
 /*
  * Find the VSI the queue belongs to. 'queue_idx' is the queue index
  * application used, which assume having sequential ones. But from driver's
@@ -2763,9 +2831,11 @@ i40e_set_tx_function(struct rte_eth_dev *dev)
 			PMD_INIT_LOG(DEBUG, "Simple tx finally be used.");
 			dev->tx_pkt_burst = i40e_xmit_pkts_simple;
 		}
+		dev->tx_pkt_prep = NULL;
 	} else {
 		PMD_INIT_LOG(DEBUG, "Xmit tx finally be used.");
 		dev->tx_pkt_burst = i40e_xmit_pkts;
+		dev->tx_pkt_prep = i40e_prep_pkts;
 	}
 }
 
diff --git a/drivers/net/i40e/i40e_rxtx.h b/drivers/net/i40e/i40e_rxtx.h
index ecdb13c..9df8a56 100644
--- a/drivers/net/i40e/i40e_rxtx.h
+++ b/drivers/net/i40e/i40e_rxtx.h
@@ -63,6 +63,12 @@
 #define	I40E_MIN_RING_DESC	64
 #define	I40E_MAX_RING_DESC	4096
 
+#define I40E_MIN_TSO_MSS          256
+#define I40E_MAX_TSO_MSS          9674
+
+#define I40E_TX_MAX_SEG     UINT8_MAX
+#define I40E_TX_MAX_MTU_SEG 8
+
 #undef container_of
 #define container_of(ptr, type, member) ({ \
 		typeof(((type *)0)->member)(*__mptr) = (ptr); \
@@ -223,6 +229,8 @@ uint16_t i40e_recv_scattered_pkts(void *rx_queue,
 uint16_t i40e_xmit_pkts(void *tx_queue,
 			struct rte_mbuf **tx_pkts,
 			uint16_t nb_pkts);
+uint16_t i40e_prep_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts);
 int i40e_tx_queue_init(struct i40e_tx_queue *txq);
 int i40e_rx_queue_init(struct i40e_rx_queue *rxq);
 void i40e_free_tx_resources(struct i40e_tx_queue *txq);
-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 261+ messages in thread

* [dpdk-dev] [PATCH v6 5/6] ixgbe: add Tx preparation
  2016-10-14 15:05         ` [dpdk-dev] [PATCH v6 0/6] add Tx preparation Tomasz Kulasek
                             ` (3 preceding siblings ...)
  2016-10-14 15:05           ` [dpdk-dev] [PATCH v6 4/6] i40e: " Tomasz Kulasek
@ 2016-10-14 15:05           ` Tomasz Kulasek
  2016-10-14 15:05           ` [dpdk-dev] [PATCH v6 6/6] testpmd: use Tx preparation in csum engine Tomasz Kulasek
                             ` (2 subsequent siblings)
  7 siblings, 0 replies; 261+ messages in thread
From: Tomasz Kulasek @ 2016-10-14 15:05 UTC (permalink / raw)
  To: dev; +Cc: konstantin.ananyev, thomas.monjalon

Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
---
 drivers/net/ixgbe/ixgbe_ethdev.c |    3 ++
 drivers/net/ixgbe/ixgbe_ethdev.h |    5 +++-
 drivers/net/ixgbe/ixgbe_rxtx.c   |   58 +++++++++++++++++++++++++++++++++++++-
 drivers/net/ixgbe/ixgbe_rxtx.h   |    2 ++
 4 files changed, 66 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
index 4ca5747..4c6a8e1 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -517,6 +517,8 @@ static const struct rte_eth_desc_lim tx_desc_lim = {
 	.nb_max = IXGBE_MAX_RING_DESC,
 	.nb_min = IXGBE_MIN_RING_DESC,
 	.nb_align = IXGBE_TXD_ALIGN,
+	.nb_seg_max = IXGBE_TX_MAX_SEG,
+	.nb_mtu_seg_max = IXGBE_TX_MAX_SEG,
 };
 
 static const struct eth_dev_ops ixgbe_eth_dev_ops = {
@@ -1103,6 +1105,7 @@ eth_ixgbe_dev_init(struct rte_eth_dev *eth_dev)
 	eth_dev->dev_ops = &ixgbe_eth_dev_ops;
 	eth_dev->rx_pkt_burst = &ixgbe_recv_pkts;
 	eth_dev->tx_pkt_burst = &ixgbe_xmit_pkts;
+	eth_dev->tx_pkt_prep = &ixgbe_prep_pkts;
 
 	/*
 	 * For secondary processes, we don't initialise any further as primary
diff --git a/drivers/net/ixgbe/ixgbe_ethdev.h b/drivers/net/ixgbe/ixgbe_ethdev.h
index 4ff6338..e229cf5 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.h
+++ b/drivers/net/ixgbe/ixgbe_ethdev.h
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -396,6 +396,9 @@ uint16_t ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 uint16_t ixgbe_xmit_pkts_simple(void *tx_queue, struct rte_mbuf **tx_pkts,
 		uint16_t nb_pkts);
 
+uint16_t ixgbe_prep_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts);
+
 int ixgbe_dev_rss_hash_update(struct rte_eth_dev *dev,
 			      struct rte_eth_rss_conf *rss_conf);
 
diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
index 2ce8234..83db18f 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   Copyright 2014 6WIND S.A.
  *   All rights reserved.
  *
@@ -70,6 +70,7 @@
 #include <rte_string_fns.h>
 #include <rte_errno.h>
 #include <rte_ip.h>
+#include <rte_pkt.h>
 
 #include "ixgbe_logs.h"
 #include "base/ixgbe_api.h"
@@ -87,6 +88,9 @@
 		PKT_TX_TCP_SEG |		 \
 		PKT_TX_OUTER_IP_CKSUM)
 
+#define IXGBE_TX_OFFLOAD_NOTSUP_MASK \
+		(PKT_TX_OFFLOAD_MASK ^ IXGBE_TX_OFFLOAD_MASK)
+
 #if 1
 #define RTE_PMD_USE_PREFETCH
 #endif
@@ -905,6 +909,56 @@ end_of_tx:
 
 /*********************************************************************
  *
+ *  TX prep functions
+ *
+ **********************************************************************/
+uint16_t
+ixgbe_prep_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
+{
+	int i, ret;
+	uint64_t ol_flags;
+	struct rte_mbuf *m;
+	struct ixgbe_tx_queue *txq = (struct ixgbe_tx_queue *)tx_queue;
+
+	for (i = 0; i < nb_pkts; i++) {
+		m = tx_pkts[i];
+		ol_flags = m->ol_flags;
+
+		/**
+		 * Check if packet meets requirements for number of segments
+		 *
+		 * NOTE: for ixgbe it's always (40 - WTHRESH) for both TSO and non-TSO
+		 */
+
+		if (m->nb_segs > IXGBE_TX_MAX_SEG - txq->wthresh) {
+			rte_errno = -EINVAL;
+			return i;
+		}
+
+		if (ol_flags & IXGBE_TX_OFFLOAD_NOTSUP_MASK) {
+			rte_errno = -EINVAL;
+			return i;
+		}
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+		ret = rte_validate_tx_offload(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+#endif
+		ret = rte_phdr_cksum_fix(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+	}
+
+	return i;
+}
+
+/*********************************************************************
+ *
  *  RX functions
  *
  **********************************************************************/
@@ -2282,6 +2336,7 @@ ixgbe_set_tx_function(struct rte_eth_dev *dev, struct ixgbe_tx_queue *txq)
 	if (((txq->txq_flags & IXGBE_SIMPLE_FLAGS) == IXGBE_SIMPLE_FLAGS)
 			&& (txq->tx_rs_thresh >= RTE_PMD_IXGBE_TX_MAX_BURST)) {
 		PMD_INIT_LOG(DEBUG, "Using simple tx code path");
+		dev->tx_pkt_prep = NULL;
 #ifdef RTE_IXGBE_INC_VECTOR
 		if (txq->tx_rs_thresh <= RTE_IXGBE_TX_MAX_FREE_BUF_SZ &&
 				(rte_eal_process_type() != RTE_PROC_PRIMARY ||
@@ -2302,6 +2357,7 @@ ixgbe_set_tx_function(struct rte_eth_dev *dev, struct ixgbe_tx_queue *txq)
 				(unsigned long)txq->tx_rs_thresh,
 				(unsigned long)RTE_PMD_IXGBE_TX_MAX_BURST);
 		dev->tx_pkt_burst = ixgbe_xmit_pkts;
+		dev->tx_pkt_prep = ixgbe_prep_pkts;
 	}
 }
 
diff --git a/drivers/net/ixgbe/ixgbe_rxtx.h b/drivers/net/ixgbe/ixgbe_rxtx.h
index 2608b36..7bbd9b8 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx.h
+++ b/drivers/net/ixgbe/ixgbe_rxtx.h
@@ -80,6 +80,8 @@
 #define RTE_IXGBE_WAIT_100_US               100
 #define RTE_IXGBE_VMTXSW_REGISTER_COUNT     2
 
+#define IXGBE_TX_MAX_SEG                    40
+
 #define IXGBE_PACKET_TYPE_MASK_82599        0X7F
 #define IXGBE_PACKET_TYPE_MASK_X550         0X10FF
 #define IXGBE_PACKET_TYPE_MASK_TUNNEL       0XFF
-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 261+ messages in thread

* [dpdk-dev] [PATCH v6 6/6] testpmd: use Tx preparation in csum engine
  2016-10-14 15:05         ` [dpdk-dev] [PATCH v6 0/6] add Tx preparation Tomasz Kulasek
                             ` (4 preceding siblings ...)
  2016-10-14 15:05           ` [dpdk-dev] [PATCH v6 5/6] ixgbe: " Tomasz Kulasek
@ 2016-10-14 15:05           ` Tomasz Kulasek
  2016-10-18 12:28           ` [dpdk-dev] [PATCH v6 0/6] add Tx preparation Ananyev, Konstantin
  2016-10-21 13:42           ` [dpdk-dev] [PATCH v7 " Tomasz Kulasek
  7 siblings, 0 replies; 261+ messages in thread
From: Tomasz Kulasek @ 2016-10-14 15:05 UTC (permalink / raw)
  To: dev; +Cc: konstantin.ananyev, thomas.monjalon

Removed the pseudo-header checksum calculation for UDP/TCP/TSO packets
from the application and switched to the Tx preparation API for packet
preparation and verification.

Adding the additional step to the csum engine costs about a 3-4%
performance drop on my setup with the ixgbe driver. It's caused mostly
by the need to re-access and modify the packet data.

Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
---
 app/test-pmd/csumonly.c |   36 +++++++++++++-----------------------
 1 file changed, 13 insertions(+), 23 deletions(-)

diff --git a/app/test-pmd/csumonly.c b/app/test-pmd/csumonly.c
index 57e6ae2..6f33ae9 100644
--- a/app/test-pmd/csumonly.c
+++ b/app/test-pmd/csumonly.c
@@ -112,15 +112,6 @@ struct simple_gre_hdr {
 } __attribute__((__packed__));
 
 static uint16_t
-get_psd_sum(void *l3_hdr, uint16_t ethertype, uint64_t ol_flags)
-{
-	if (ethertype == _htons(ETHER_TYPE_IPv4))
-		return rte_ipv4_phdr_cksum(l3_hdr, ol_flags);
-	else /* assume ethertype == ETHER_TYPE_IPv6 */
-		return rte_ipv6_phdr_cksum(l3_hdr, ol_flags);
-}
-
-static uint16_t
 get_udptcp_checksum(void *l3_hdr, void *l4_hdr, uint16_t ethertype)
 {
 	if (ethertype == _htons(ETHER_TYPE_IPv4))
@@ -370,32 +361,24 @@ process_inner_cksums(void *l3_hdr, const struct testpmd_offload_info *info,
 		/* do not recalculate udp cksum if it was 0 */
 		if (udp_hdr->dgram_cksum != 0) {
 			udp_hdr->dgram_cksum = 0;
-			if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_UDP_CKSUM) {
+			if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_UDP_CKSUM)
 				ol_flags |= PKT_TX_UDP_CKSUM;
-				udp_hdr->dgram_cksum = get_psd_sum(l3_hdr,
-					info->ethertype, ol_flags);
-			} else {
+			else
 				udp_hdr->dgram_cksum =
 					get_udptcp_checksum(l3_hdr, udp_hdr,
 						info->ethertype);
-			}
 		}
 	} else if (info->l4_proto == IPPROTO_TCP) {
 		tcp_hdr = (struct tcp_hdr *)((char *)l3_hdr + info->l3_len);
 		tcp_hdr->cksum = 0;
-		if (tso_segsz) {
+		if (tso_segsz)
 			ol_flags |= PKT_TX_TCP_SEG;
-			tcp_hdr->cksum = get_psd_sum(l3_hdr, info->ethertype,
-				ol_flags);
-		} else if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_TCP_CKSUM) {
+		else if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_TCP_CKSUM)
 			ol_flags |= PKT_TX_TCP_CKSUM;
-			tcp_hdr->cksum = get_psd_sum(l3_hdr, info->ethertype,
-				ol_flags);
-		} else {
+		else
 			tcp_hdr->cksum =
 				get_udptcp_checksum(l3_hdr, tcp_hdr,
 					info->ethertype);
-		}
 	} else if (info->l4_proto == IPPROTO_SCTP) {
 		sctp_hdr = (struct sctp_hdr *)((char *)l3_hdr + info->l3_len);
 		sctp_hdr->cksum = 0;
@@ -648,6 +631,7 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 	void *l3_hdr = NULL, *outer_l3_hdr = NULL; /* can be IPv4 or IPv6 */
 	uint16_t nb_rx;
 	uint16_t nb_tx;
+	uint16_t nb_prep;
 	uint16_t i;
 	uint64_t rx_ol_flags, tx_ol_flags;
 	uint16_t testpmd_ol_flags;
@@ -857,7 +841,13 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 			printf("\n");
 		}
 	}
-	nb_tx = rte_eth_tx_burst(fs->tx_port, fs->tx_queue, pkts_burst, nb_rx);
+	nb_prep = rte_eth_tx_prep(fs->tx_port, fs->tx_queue, pkts_burst,
+				nb_rx);
+	if (nb_prep != nb_rx)
+		printf("Preparing packet burst to transmit failed: %s\n",
+				rte_strerror(rte_errno));
+
+	nb_tx = rte_eth_tx_burst(fs->tx_port, fs->tx_queue, pkts_burst, nb_prep);
 	/*
 	 * Retry if necessary
 	 */
-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v5 1/6] ethdev: add Tx preparation
  2016-10-14 14:20               ` Thomas Monjalon
@ 2016-10-17 16:25                 ` Kulasek, TomaszX
  0 siblings, 0 replies; 261+ messages in thread
From: Kulasek, TomaszX @ 2016-10-17 16:25 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev, Ananyev, Konstantin, olivier.matz

Hi Thomas,

> -----Original Message-----
> From: Thomas Monjalon [mailto:thomas.monjalon@6wind.com]
> Sent: Friday, October 14, 2016 16:20
> To: Kulasek, TomaszX <tomaszx.kulasek@intel.com>
> Cc: dev@dpdk.org; Ananyev, Konstantin <konstantin.ananyev@intel.com>;
> olivier.matz@6wind.com
> Subject: Re: [dpdk-dev] [PATCH v5 1/6] ethdev: add Tx preparation
> 
> 2016-10-14 14:02, Kulasek, TomaszX:
> > From: Thomas Monjalon [mailto:thomas.monjalon@6wind.com]
> > > 2016-10-13 19:36, Tomasz Kulasek:
> > > > +/**
> > > > + * Fix pseudo header checksum for TSO and non-TSO tcp/udp packets
> > > before
> > > > + * hardware tx checksum.
> > > > + * For non-TSO tcp/udp packets full pseudo-header checksum is
> > > > + counted
> > > and set.
> > > > + * For TSO the IP payload length is not included.
> > > > + */
> > > > +static inline int
> > > > +rte_phdr_cksum_fix(struct rte_mbuf *m)
> > >
> > > You probably don't need this function since the recent improvements
> > > from Olivier.
> >
> > Do you mean this improvement: "net: add function to calculate a checksum
> in a mbuf"
> > http://dpdk.org/dev/patchwork/patch/16542/
> >
> > I see only full raw checksum computation on mbuf in Olivier patches,
> while this function counts only pseudo-header checksum to be used with tx
> offload.
> 
> OK. Please check what exists already in librte_net (especially rte_ip.h)
> and try to re-use code if possible. Thanks

I already sent v6 with the requested changes on Friday. There's no equivalent of rte_phdr_cksum_fix in librte_net. The function already uses rte_ipv4_phdr_cksum and rte_ipv6_phdr_cksum, and there's nothing similar at a higher level that would simplify it further.

Tomasz

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v6 0/6] add Tx preparation
  2016-10-14 15:05         ` [dpdk-dev] [PATCH v6 0/6] add Tx preparation Tomasz Kulasek
                             ` (5 preceding siblings ...)
  2016-10-14 15:05           ` [dpdk-dev] [PATCH v6 6/6] testpmd: use Tx preparation in csum engine Tomasz Kulasek
@ 2016-10-18 12:28           ` Ananyev, Konstantin
  2016-10-21 13:42           ` [dpdk-dev] [PATCH v7 " Tomasz Kulasek
  7 siblings, 0 replies; 261+ messages in thread
From: Ananyev, Konstantin @ 2016-10-18 12:28 UTC (permalink / raw)
  To: Kulasek, TomaszX, dev; +Cc: thomas.monjalon



> 
> As discussed in that thread:
> 
> http://dpdk.org/ml/archives/dev/2015-September/023603.html
> 
> Different NIC models depending on HW offload requested might impose different requirements on packets to be TX-ed in terms of:
> 
>  - Max number of fragments per packet allowed
>  - Max number of fragments per TSO segments
>  - The way pseudo-header checksum should be pre-calculated
>  - L3/L4 header fields filling
>  - etc.
> 
> 
> MOTIVATION:
> -----------
> 
> 1) Some work cannot (and didn't should) be done in rte_eth_tx_burst.
>    However, this work is sometimes required, and now, it's an
>    application issue.
> 
> 2) Different hardware may have different requirements for TX offloads,
>    other subset can be supported and so on.
> 
> 3) Some parameters (e.g. number of segments in ixgbe driver) may hung
>    device. These parameters may be vary for different devices.
> 
>    For example i40e HW allows 8 fragments per packet, but that is after
>    TSO segmentation. While ixgbe has a 38-fragment pre-TSO limit.
> 
> 4) Fields in packet may require different initialization (like e.g. will
>    require pseudo-header checksum precalculation, sometimes in a
>    different way depending on packet type, and so on). Now application
>    needs to care about it.
> 
> 5) Using additional API (rte_eth_tx_prep) before rte_eth_tx_burst let to
>    prepare packet burst in acceptable form for specific device.
> 
> 6) Some additional checks may be done in debug mode keeping tx_burst
>    implementation clean.
> 
> 
> PROPOSAL:
> ---------
> 
> To help user to deal with all these varieties we propose to:
> 
> 1) Introduce rte_eth_tx_prep() function to do necessary preparations of
>    packet burst to be safely transmitted on device for desired HW
>    offloads (set/reset checksum field according to the hardware
>    requirements) and check HW constraints (number of segments per
>    packet, etc).
> 
>    While the limitations and requirements may differ for devices, it
>    requires to extend rte_eth_dev structure with new function pointer
>    "tx_pkt_prep" which can be implemented in the driver to prepare and
>    verify packets, in devices specific way, before burst, what should to
>    prevent application to send malformed packets.
> 
> 2) Also new fields will be introduced in rte_eth_desc_lim:
>    nb_seg_max and nb_mtu_seg_max, providing an information about max
>    segments in TSO and non-TSO packets acceptable by device.
> 
>    This information is useful for application to not create/limit
>    malicious packet.
> 
> 
> APPLICATION (CASE OF USE):
> --------------------------
> 
> 1) Application should to initialize burst of packets to send, set
>    required tx offload flags and required fields, like l2_len, l3_len,
>    l4_len, and tso_segsz
> 
> 2) Application passes burst to the rte_eth_tx_prep to check conditions
>    required to send packets through the NIC.
> 
> 3) The result of rte_eth_tx_prep can be used to send valid packets
>    and/or restore invalid if function fails.
> 
> e.g.
> 
> 	for (i = 0; i < nb_pkts; i++) {
> 
> 		/* initialize or process packet */
> 
> 		bufs[i]->tso_segsz = 800;
> 		bufs[i]->ol_flags = PKT_TX_TCP_SEG | PKT_TX_IPV4
> 				| PKT_TX_IP_CKSUM;
> 		bufs[i]->l2_len = sizeof(struct ether_hdr);
> 		bufs[i]->l3_len = sizeof(struct ipv4_hdr);
> 		bufs[i]->l4_len = sizeof(struct tcp_hdr);
> 	}
> 
> 	/* Prepare burst of TX packets */
> 	nb_prep = rte_eth_tx_prep(port, 0, bufs, nb_pkts);
> 
> 	if (nb_prep < nb_pkts) {
> 		printf("tx_prep failed\n");
> 
> 		/* nb_prep indicates here first invalid packet. rte_eth_tx_prep
> 		 * can be used on remaining packets to find another ones.
> 		 */
> 
> 	}
> 
> 	/* Send burst of TX packets */
> 	nb_tx = rte_eth_tx_burst(port, 0, bufs, nb_prep);
> 
> 	/* Free any unsent packets. */
> 
> 
> v5 changes:
>  - rebased csum engine modification
>  - added information to the csum engine about performance tests
>  - some performance improvements
> 
> v4 changes:
>  - tx_prep is now set to default behavior (NULL) for simple/vector path
>    in fm10k, i40e and ixgbe drivers to increase performance, when
>    Tx offloads are not intentionally available
> 
> v3 changes:
>  - reworked csum testpmd engine instead adding new one,
>  - fixed checksum initialization procedure to include also outer
>    checksum offloads,
>  - some minor formattings and optimalizations
> 
> v2 changes:
>  - rte_eth_tx_prep() returns number of packets when device doesn't
>    support tx_prep functionality,
>  - introduced CONFIG_RTE_ETHDEV_TX_PREP allowing to turn off tx_prep
> 
> 
> Tomasz Kulasek (6):
>   ethdev: add Tx preparation
>   e1000: add Tx preparation
>   fm10k: add Tx preparation
>   i40e: add Tx preparation
>   ixgbe: add Tx preparation
>   testpmd: use Tx preparation in csum engine
> 
>  app/test-pmd/csumonly.c          |   36 ++++------
>  config/common_base               |    1 +
>  drivers/net/e1000/e1000_ethdev.h |   11 +++
>  drivers/net/e1000/em_ethdev.c    |    5 +-
>  drivers/net/e1000/em_rxtx.c      |   48 ++++++++++++-
>  drivers/net/e1000/igb_ethdev.c   |    4 ++
>  drivers/net/e1000/igb_rxtx.c     |   52 ++++++++++++++-
>  drivers/net/fm10k/fm10k.h        |    6 ++
>  drivers/net/fm10k/fm10k_ethdev.c |    5 ++
>  drivers/net/fm10k/fm10k_rxtx.c   |   50 +++++++++++++-
>  drivers/net/i40e/i40e_ethdev.c   |    3 +
>  drivers/net/i40e/i40e_rxtx.c     |   72 +++++++++++++++++++-
>  drivers/net/i40e/i40e_rxtx.h     |    8 +++
>  drivers/net/ixgbe/ixgbe_ethdev.c |    3 +
>  drivers/net/ixgbe/ixgbe_ethdev.h |    5 +-
>  drivers/net/ixgbe/ixgbe_rxtx.c   |   58 +++++++++++++++-
>  drivers/net/ixgbe/ixgbe_rxtx.h   |    2 +
>  lib/librte_ether/rte_ethdev.h    |   85 +++++++++++++++++++++++
>  lib/librte_mbuf/rte_mbuf.h       |    9 +++
>  lib/librte_net/Makefile          |    3 +-
>  lib/librte_net/rte_pkt.h         |  137 ++++++++++++++++++++++++++++++++++++++
>  21 files changed, 572 insertions(+), 31 deletions(-)
>  create mode 100644 lib/librte_net/rte_pkt.h
> 
> --

Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>

> 1.7.9.5

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v6 1/6] ethdev: add Tx preparation
  2016-10-14 15:05           ` [dpdk-dev] [PATCH v6 1/6] ethdev: " Tomasz Kulasek
@ 2016-10-18 14:57             ` Olivier Matz
  2016-10-19 15:42               ` Kulasek, TomaszX
  0 siblings, 1 reply; 261+ messages in thread
From: Olivier Matz @ 2016-10-18 14:57 UTC (permalink / raw)
  To: Tomasz Kulasek, dev; +Cc: konstantin.ananyev, thomas.monjalon

Hi Tomasz,

I think the principle of tx_prep() is good, it may for instance help to
remove the function virtio_tso_fix_cksum() from the virtio, and maybe
even change the mbuf TSO/cksum API.

I have some questions/comments below, I'm sorry it comes very late.

On 10/14/2016 05:05 PM, Tomasz Kulasek wrote:
> Added API for `rte_eth_tx_prep`
> 
> uint16_t rte_eth_tx_prep(uint8_t port_id, uint16_t queue_id,
> 	struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
> 
> Added fields to the `struct rte_eth_desc_lim`:
> 
> 	uint16_t nb_seg_max;
> 		/**< Max number of segments per whole packet. */
> 
> 	uint16_t nb_mtu_seg_max;
> 		/**< Max number of segments per one MTU */

Not sure I understand the second one. Is this for the TSO case?

Is it a usual limitation in different network hardware?
Can this info be retrieved/used by the application?

> 
> Created `rte_pkt.h` header with common used functions:
> 
> int rte_validate_tx_offload(struct rte_mbuf *m)
> 	to validate general requirements for tx offload in packet such a
> 	flag completness. In current implementation this function is called
> 	optionaly when RTE_LIBRTE_ETHDEV_DEBUG is enabled.
> 
> int rte_phdr_cksum_fix(struct rte_mbuf *m)
> 	to fix pseudo header checksum for TSO and non-TSO tcp/udp packets
> 	before hardware tx checksum offload.
> 	 - for non-TSO tcp/udp packets full pseudo-header checksum is
> 	   counted and set.
> 	 - for TSO the IP payload length is not included.

Why not in rte_net.h?


> [...]
>  
> @@ -2816,6 +2825,82 @@ rte_eth_tx_burst(uint8_t port_id, uint16_t queue_id,
>  	return (*dev->tx_pkt_burst)(dev->data->tx_queues[queue_id], tx_pkts, nb_pkts);
>  }
>  
> +/**
> + * Process a burst of output packets on a transmit queue of an Ethernet device.
> + *
> + * The rte_eth_tx_prep() function is invoked to prepare output packets to be
> + * transmitted on the output queue *queue_id* of the Ethernet device designated
> + * by its *port_id*.
> + * The *nb_pkts* parameter is the number of packets to be prepared which are
> + * supplied in the *tx_pkts* array of *rte_mbuf* structures, each of them
> + * allocated from a pool created with rte_pktmbuf_pool_create().
> + * For each packet to send, the rte_eth_tx_prep() function performs
> + * the following operations:
> + *
> + * - Check if packet meets devices requirements for tx offloads.

Do you mean hardware requirements?
Can the application be aware of these requirements? I mean capability
flags, or something in dev_infos?

Maybe the comment could be more precise?

> + * - Check limitations about number of segments.
> + *
> + * - Check additional requirements when debug is enabled.

What kind of additional requirements?

> + *
> + * - Update and/or reset required checksums when tx offload is set for packet.
> + *

By reading this, I think it may not be clear for the user about what
should be set in the mbuf. In mbuf API, it is said:

 * TCP segmentation offload. To enable this offload feature for a
 * packet to be transmitted on hardware supporting TSO:
 *  - set the PKT_TX_TCP_SEG flag in mbuf->ol_flags (this flag implies
 *    PKT_TX_TCP_CKSUM)
 *  - set the flag PKT_TX_IPV4 or PKT_TX_IPV6
 *  - if it's IPv4, set the PKT_TX_IP_CKSUM flag and write the IP checksum
 *    to 0 in the packet
 *  - fill the mbuf offload information: l2_len, l3_len, l4_len, tso_segsz
 *  - calculate the pseudo header checksum without taking ip_len in account,
 *    and set it in the TCP header. Refer to rte_ipv4_phdr_cksum() and
 *    rte_ipv6_phdr_cksum() that can be used as helpers.


If I understand well, using tx_prep(), the user will have to do the
same except writing the IP checksum to 0, and without setting the
TCP pseudo header checksum, right?


> + * The rte_eth_tx_prep() function returns the number of packets ready to be
> + * sent. A return value equal to *nb_pkts* means that all packets are valid and
> + * ready to be sent.
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param queue_id
> + *   The index of the transmit queue through which output packets must be
> + *   sent.
> + *   The value must be in the range [0, nb_tx_queue - 1] previously supplied
> + *   to rte_eth_dev_configure().
> + * @param tx_pkts
> + *   The address of an array of *nb_pkts* pointers to *rte_mbuf* structures
> + *   which contain the output packets.
> + * @param nb_pkts
> + *   The maximum number of packets to process.
> + * @return
> + *   The number of packets correct and ready to be sent. The return value can be
> + *   less than the value of the *tx_pkts* parameter when some packet doesn't
> + *   meet devices requirements with rte_errno set appropriately.
> + */

Can we add the constraint that invalid packets are left untouched?

I think most of the time there will be a software fallback in that
case, so it would be good to ensure that this function does not change
the flags or the packet data.

Another thing that could be interesting for the caller is to know the
reason of the failure. Maybe the different errno types could be detailed
here. For instance:
- EINVAL: offload flags are not correctly set (i.e. would fail whatever
  the hardware)
- ENOTSUP: the offload feature is not supported by the hardware
- ...

> +
> +#ifdef RTE_ETHDEV_TX_PREP
> +
> +static inline uint16_t
> +rte_eth_tx_prep(uint8_t port_id, uint16_t queue_id, struct rte_mbuf **tx_pkts,
> +		uint16_t nb_pkts)
> +{
> +	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
> +
> +	if (!dev->tx_pkt_prep)
> +		return nb_pkts;
> +
> +#ifdef RTE_LIBRTE_ETHDEV_DEBUG
> +	if (queue_id >= dev->data->nb_tx_queues) {
> +		RTE_PMD_DEBUG_TRACE("Invalid TX queue_id=%d\n", queue_id);
> +		rte_errno = -EINVAL;
> +		return 0;
> +	}
> +#endif

Why checking the queue_id but not the port_id?

Maybe the API comment should also be modified to say that the port_id
has to be valid, because most ethdev functions do the check.

> +
> +	return (*dev->tx_pkt_prep)(dev->data->tx_queues[queue_id],
> +			tx_pkts, nb_pkts);
> +}
> +
> +#else
> +
> +static inline uint16_t
> +rte_eth_tx_prep(__rte_unused uint8_t port_id, __rte_unused uint16_t queue_id,
> +		__rte_unused struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
> +{
> +	return nb_pkts;
> +}
> +
> +#endif
> +

nit: I wonder if the #else part should be inside the function instead
(with the proper RTE_SET_USED()), it would avoid to define the prototype
twice.

>  typedef void (*buffer_tx_error_fn)(struct rte_mbuf **unsent, uint16_t count,
>  		void *userdata);
>  
> diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> index 109e666..cfd6284 100644
> --- a/lib/librte_mbuf/rte_mbuf.h
> +++ b/lib/librte_mbuf/rte_mbuf.h
> @@ -276,6 +276,15 @@ extern "C" {
>   */
>  #define PKT_TX_OUTER_IPV4   (1ULL << 59)
>  
> +#define PKT_TX_OFFLOAD_MASK (    \
> +		PKT_TX_IP_CKSUM |        \
> +		PKT_TX_L4_MASK |         \
> +		PKT_TX_OUTER_IP_CKSUM |  \
> +		PKT_TX_TCP_SEG |         \
> +		PKT_TX_QINQ_PKT |        \
> +		PKT_TX_VLAN_PKT |        \
> +		PKT_TX_TUNNEL_MASK)
> +

Could you add an API comment?

> --- /dev/null
> +++ b/lib/librte_net/rte_pkt.h
> 
> [...]
> +/**
> + * Validate general requirements for tx offload in packet.
> + */

The API comment does not have the usual format.

> +static inline int
> +rte_validate_tx_offload(struct rte_mbuf *m)

should be const struct rte_mbuf *m

> +{
> +	uint64_t ol_flags = m->ol_flags;
> +
> +	/* Does packet set any of available offloads? */
> +	if (!(ol_flags & PKT_TX_OFFLOAD_MASK))
> +		return 0;
> +
> +	/* IP checksum can be counted only for IPv4 packet */
> +	if ((ol_flags & PKT_TX_IP_CKSUM) && (ol_flags & PKT_TX_IPV6))
> +		return -EINVAL;
> +
> +	if (ol_flags & (PKT_TX_L4_MASK | PKT_TX_TCP_SEG))
> +		/* IP type not set */
> +		if (!(ol_flags & (PKT_TX_IPV4 | PKT_TX_IPV6)))
> +			return -EINVAL;
> +
> +	if (ol_flags & PKT_TX_TCP_SEG)
> +		/* PKT_TX_IP_CKSUM offload not set for IPv4 TSO packet */
> +		if ((m->tso_segsz == 0) ||
> +				((ol_flags & PKT_TX_IPV4) && !(ol_flags & PKT_TX_IP_CKSUM)))
> +			return -EINVAL;
> +
> +	/* PKT_TX_OUTER_IP_CKSUM set for non outer IPv4 packet. */
> +	if ((ol_flags & PKT_TX_OUTER_IP_CKSUM) && !(ol_flags & PKT_TX_OUTER_IPV4))
> +		return -EINVAL;
> +
> +	return 0;
> +}

It looks this function is only used when RTE_LIBRTE_ETHDEV_DEBUG is
set.

I'd say this function should go in rte_mbuf.h, because it's purely
related to mbuf flags, and does not rely on packet data or network
headers.


> +
> +/**
> + * Fix pseudo header checksum for TSO and non-TSO tcp/udp packets before
> + * hardware tx checksum.
> + * For non-TSO tcp/udp packets full pseudo-header checksum is counted and set.
> + * For TSO the IP payload length is not included.
> + */

The API comment should be fixed.

> +static inline int
> +rte_phdr_cksum_fix(struct rte_mbuf *m)
> +{
> +	struct ipv4_hdr *ipv4_hdr;
> +	struct ipv6_hdr *ipv6_hdr;
> +	struct tcp_hdr *tcp_hdr;
> +	struct udp_hdr *udp_hdr;
> +	uint64_t ol_flags = m->ol_flags;
> +	uint64_t inner_l3_offset = m->l2_len;
> +
> +	if (ol_flags & PKT_TX_OUTER_IP_CKSUM)
> +		inner_l3_offset += m->outer_l2_len + m->outer_l3_len;
> +
> +	if ((ol_flags & PKT_TX_UDP_CKSUM) == PKT_TX_UDP_CKSUM) {
> +		if (ol_flags & PKT_TX_IPV4) {
> +			ipv4_hdr = rte_pktmbuf_mtod_offset(m, struct ipv4_hdr *,
> +					inner_l3_offset);
> +
> +			if (ol_flags & PKT_TX_IP_CKSUM)
> +				ipv4_hdr->hdr_checksum = 0;
> +
> +			udp_hdr = (struct udp_hdr *)((char *)ipv4_hdr + m->l3_len);
> +			udp_hdr->dgram_cksum = rte_ipv4_phdr_cksum(ipv4_hdr, ol_flags);
> +		} else {
> +			ipv6_hdr = rte_pktmbuf_mtod_offset(m, struct ipv6_hdr *,
> +					inner_l3_offset);
> +			/* non-TSO udp */
> +			udp_hdr = rte_pktmbuf_mtod_offset(m, struct udp_hdr *,
> +					inner_l3_offset + m->l3_len);
> +			udp_hdr->dgram_cksum = rte_ipv6_phdr_cksum(ipv6_hdr, ol_flags);
> +		}
> +	} else if ((ol_flags & PKT_TX_TCP_CKSUM) ||
> +			(ol_flags & PKT_TX_TCP_SEG)) {
> +		if (ol_flags & PKT_TX_IPV4) {
> +			ipv4_hdr = rte_pktmbuf_mtod_offset(m, struct ipv4_hdr *,
> +					inner_l3_offset);
> +
> +			if (ol_flags & PKT_TX_IP_CKSUM)
> +				ipv4_hdr->hdr_checksum = 0;
> +
> +			/* non-TSO tcp or TSO */
> +			tcp_hdr = (struct tcp_hdr *)((char *)ipv4_hdr + m->l3_len);
> +			tcp_hdr->cksum = rte_ipv4_phdr_cksum(ipv4_hdr, ol_flags);
> +		} else {
> +			ipv6_hdr = rte_pktmbuf_mtod_offset(m, struct ipv6_hdr *,
> +					inner_l3_offset);
> +			/* non-TSO tcp or TSO */
> +			tcp_hdr = rte_pktmbuf_mtod_offset(m, struct tcp_hdr *,
> +					inner_l3_offset + m->l3_len);
> +			tcp_hdr->cksum = rte_ipv6_phdr_cksum(ipv6_hdr, ol_flags);
> +		}
> +	}
> +	return 0;
> +}
> +
> +#endif /* _RTE_PKT_H_ */
> 

The function expects that all the network headers are in the first segment,
and that each of them is contiguous.

Also, I had an interesting remark from Stephen [1] on a similar code.
If the mbuf is a clone, it will modify the data of the direct mbuf,
which should be read-only. Note that it is likely to happen in
a TCP stack, because the packet is kept locally in case it has to
be retransmitted. Cloning a mbuf is more efficient than duplicating
it.

I plan to fix it in virtio code by "uncloning" the headers.

[1] http://dpdk.org/ml/archives/dev/2016-October/048873.html



Regards,
Olivier

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v6 1/6] ethdev: add Tx preparation
  2016-10-18 14:57             ` Olivier Matz
@ 2016-10-19 15:42               ` Kulasek, TomaszX
  2016-10-19 22:07                 ` Ananyev, Konstantin
  0 siblings, 1 reply; 261+ messages in thread
From: Kulasek, TomaszX @ 2016-10-19 15:42 UTC (permalink / raw)
  To: Olivier Matz, dev; +Cc: Ananyev, Konstantin, thomas.monjalon

Hi Olivier,

> -----Original Message-----
> From: Olivier Matz [mailto:olivier.matz@6wind.com]
> Sent: Tuesday, October 18, 2016 16:57
> To: Kulasek, TomaszX <tomaszx.kulasek@intel.com>; dev@dpdk.org
> Cc: Ananyev, Konstantin <konstantin.ananyev@intel.com>;
> thomas.monjalon@6wind.com
> Subject: Re: [dpdk-dev] [PATCH v6 1/6] ethdev: add Tx preparation
> 
> Hi Tomasz,
> 
> I think the principle of tx_prep() is good, it may for instance help to
> remove the function virtio_tso_fix_cksum() from the virtio, and maybe even
> change the mbuf TSO/cksum API.
> 
> I have some questions/comments below, I'm sorry it comes very late.
> 
> On 10/14/2016 05:05 PM, Tomasz Kulasek wrote:
> > Added API for `rte_eth_tx_prep`
> >
> > uint16_t rte_eth_tx_prep(uint8_t port_id, uint16_t queue_id,
> > 	struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
> >
> > Added fields to the `struct rte_eth_desc_lim`:
> >
> > 	uint16_t nb_seg_max;
> > 		/**< Max number of segments per whole packet. */
> >
> > 	uint16_t nb_mtu_seg_max;
> > 		/**< Max number of segments per one MTU */
> 
> Not sure I understand the second one. Is this for the TSO case?
> 
> Is it a usual limitation in different network hardware?
> Can this info be retrieved/used by the application?
> 

Yes, the limit on the number of segments may differ depending on TSO vs. non-TSO, e.g. for the Fortville NIC. This information is available to the application.
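
For instance (just a sketch, assuming an already configured port), the application can read the new limits straight from the device info:

	struct rte_eth_dev_info dev_info;
	uint16_t seg_max, mtu_seg_max;

	rte_eth_dev_info_get(port_id, &dev_info);

	/* max segments per TSO packet / per non-TSO (single-MTU) packet */
	seg_max = dev_info.tx_desc_lim.nb_seg_max;
	mtu_seg_max = dev_info.tx_desc_lim.nb_mtu_seg_max;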

> >
> > Created `rte_pkt.h` header with common used functions:
> >
> > int rte_validate_tx_offload(struct rte_mbuf *m)
> > 	to validate general requirements for tx offload in packet such a
> > 	flag completness. In current implementation this function is called
> > 	optionaly when RTE_LIBRTE_ETHDEV_DEBUG is enabled.
> >
> > int rte_phdr_cksum_fix(struct rte_mbuf *m)
> > 	to fix pseudo header checksum for TSO and non-TSO tcp/udp packets
> > 	before hardware tx checksum offload.
> > 	 - for non-TSO tcp/udp packets full pseudo-header checksum is
> > 	   counted and set.
> > 	 - for TSO the IP payload length is not included.
> 
> Why not in rte_net.h?
> 
> 
> > [...]
> >
> > @@ -2816,6 +2825,82 @@ rte_eth_tx_burst(uint8_t port_id, uint16_t
> queue_id,
> >  	return (*dev->tx_pkt_burst)(dev->data->tx_queues[queue_id], tx_pkts,
> > nb_pkts);  }
> >
> > +/**
> > + * Process a burst of output packets on a transmit queue of an Ethernet
> device.
> > + *
> > + * The rte_eth_tx_prep() function is invoked to prepare output
> > +packets to be
> > + * transmitted on the output queue *queue_id* of the Ethernet device
> > +designated
> > + * by its *port_id*.
> > + * The *nb_pkts* parameter is the number of packets to be prepared
> > +which are
> > + * supplied in the *tx_pkts* array of *rte_mbuf* structures, each of
> > +them
> > + * allocated from a pool created with rte_pktmbuf_pool_create().
> > + * For each packet to send, the rte_eth_tx_prep() function performs
> > + * the following operations:
> > + *
> > + * - Check if packet meets devices requirements for tx offloads.
> 
> Do you mean hardware requirements?
> Can the application be aware of these requirements? I mean capability
> flags, or something in dev_infos?

Yes. If some offloads cannot be handled by the hardware, preparation fails; likewise if, e.g., the number of segments is invalid, and so on.

> 
> Maybe the comment could be more precise?
> 
> > + * - Check limitations about number of segments.
> > + *
> > + * - Check additional requirements when debug is enabled.
> 
> What kind of additional requirements?
> 

We may assume that the application sets these fields correctly, e.g. the IP header version, which is required for most checksum offloads. To avoid additional performance overhead, these checks are done only when debug is on.
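
For example (illustrative only), with RTE_LIBRTE_ETHDEV_DEBUG enabled a combination like the one below is caught by rte_validate_tx_offload(), since IP checksum offload is defined only for IPv4:

	/* invalid: IP checksum offload requested for an IPv6 packet */
	m->ol_flags = PKT_TX_IP_CKSUM | PKT_TX_IPV6 | PKT_TX_TCP_CKSUM;
	/* rte_eth_tx_prep() stops at this mbuf and sets rte_errno */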

> > + *
> > + * - Update and/or reset required checksums when tx offload is set for
> packet.
> > + *
> 
> By reading this, I think it may not be clear for the user about what
> should be set in the mbuf. In mbuf API, it is said:
> 
>  * TCP segmentation offload. To enable this offload feature for a
>  * packet to be transmitted on hardware supporting TSO:
>  *  - set the PKT_TX_TCP_SEG flag in mbuf->ol_flags (this flag implies
>  *    PKT_TX_TCP_CKSUM)
>  *  - set the flag PKT_TX_IPV4 or PKT_TX_IPV6
>  *  - if it's IPv4, set the PKT_TX_IP_CKSUM flag and write the IP checksum
>  *    to 0 in the packet
>  *  - fill the mbuf offload information: l2_len, l3_len, l4_len, tso_segsz
>  *  - calculate the pseudo header checksum without taking ip_len in
> account,
>  *    and set it in the TCP header. Refer to rte_ipv4_phdr_cksum() and
>  *    rte_ipv6_phdr_cksum() that can be used as helpers.
> 
> 
> If I understand well, using tx_prep(), the user will have to do the same
> except writing the IP checksum to 0, and without setting the TCP pseudo
> header checksum, right?
> 

Right, but this header information is still valid for a tx_burst operation done without the preparation stage.
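
In other words, without the preparation stage the application keeps filling the pseudo-header checksum by hand, e.g. for a contiguous IPv4/TCP packet (a sketch using the same helpers rte_phdr_cksum_fix builds on):

	struct ipv4_hdr *ip = rte_pktmbuf_mtod_offset(m, struct ipv4_hdr *,
			m->l2_len);
	struct tcp_hdr *tcp = (struct tcp_hdr *)((char *)ip + m->l3_len);

	ip->hdr_checksum = 0;
	tcp->cksum = rte_ipv4_phdr_cksum(ip, m->ol_flags);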

> 
> > + * The rte_eth_tx_prep() function returns the number of packets ready
> > + to be
> > + * sent. A return value equal to *nb_pkts* means that all packets are
> > + valid and
> > + * ready to be sent.
> > + *
> > + * @param port_id
> > + *   The port identifier of the Ethernet device.
> > + * @param queue_id
> > + *   The index of the transmit queue through which output packets must
> be
> > + *   sent.
> > + *   The value must be in the range [0, nb_tx_queue - 1] previously
> supplied
> > + *   to rte_eth_dev_configure().
> > + * @param tx_pkts
> > + *   The address of an array of *nb_pkts* pointers to *rte_mbuf*
> structures
> > + *   which contain the output packets.
> > + * @param nb_pkts
> > + *   The maximum number of packets to process.
> > + * @return
> > + *   The number of packets correct and ready to be sent. The return
> value can be
> > + *   less than the value of the *tx_pkts* parameter when some packet
> doesn't
> > + *   meet devices requirements with rte_errno set appropriately.
> > + */
> 
> Can we add the constraint that invalid packets are left untouched?
> 
> I think most of the time there will be a software fallback in that case,
> so it would be good to ensure that this function does not change the flags
> or the packet data.

In the current implementation, if a packet is invalid, its data is never modified; only checks are done. The only exception is when a checksum needs to be updated or initialized, but that happens only after the packet passes validation.
If we want to use/restore a packet in the application, an invalid packet must not have been changed in any way.
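
A caller can rely on that to drop the first failing packet and retry preparation on the remainder, e.g. (a rough sketch, which simply frees each invalid mbuf):

	uint16_t nb_prep, nb_ok = 0, nb_left = nb_pkts;

	while (nb_left > 0) {
		nb_prep = rte_eth_tx_prep(port_id, queue_id,
				&pkts[nb_ok], nb_left);
		nb_ok += nb_prep;
		nb_left -= nb_prep;
		if (nb_left == 0)
			break;
		/* pkts[nb_ok] failed and was left untouched */
		rte_pktmbuf_free(pkts[nb_ok]);
		memmove(&pkts[nb_ok], &pkts[nb_ok + 1],
				(nb_left - 1) * sizeof(pkts[0]));
		nb_left--;
	}
	/* nb_ok packets in pkts[] are now ready for rte_eth_tx_burst() */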

> 
> Another thing that could be interesting for the caller is to know the
> reason of the failure. Maybe the different errno types could be detailed
> here. For instance:
> - EINVAL: offload flags are not correctly set (i.e. would fail whatever
>   the hardware)
> - ENOTSUP: the offload feature is not supported by the hardware
> - ...
> 

Ok.

> > +
> > +#ifdef RTE_ETHDEV_TX_PREP
> > +
> > +static inline uint16_t
> > +rte_eth_tx_prep(uint8_t port_id, uint16_t queue_id, struct rte_mbuf
> **tx_pkts,
> > +		uint16_t nb_pkts)
> > +{
> > +	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
> > +
> > +	if (!dev->tx_pkt_prep)
> > +		return nb_pkts;
> > +
> > +#ifdef RTE_LIBRTE_ETHDEV_DEBUG
> > +	if (queue_id >= dev->data->nb_tx_queues) {
> > +		RTE_PMD_DEBUG_TRACE("Invalid TX queue_id=%d\n", queue_id);
> > +		rte_errno = -EINVAL;
> > +		return 0;
> > +	}
> > +#endif
> 
> Why checking the queue_id but not the port_id?
> 
> Maybe the API comment should also be modified to say that the port_id has
> to be valid, because most ethdev functions do the check.
> 

I can add this check when debug is on to make it more complete, and update the API comment for the default case.
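
Something like this (just a sketch of the additional debug-mode check, mirroring the existing queue_id one):

#ifdef RTE_LIBRTE_ETHDEV_DEBUG
	if (!rte_eth_dev_is_valid_port(port_id)) {
		RTE_PMD_DEBUG_TRACE("Invalid TX port_id=%d\n", port_id);
		rte_errno = -EINVAL;
		return 0;
	}
#endif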

> > +
> > +	return (*dev->tx_pkt_prep)(dev->data->tx_queues[queue_id],
> > +			tx_pkts, nb_pkts);
> > +}
> > +
> > +#else
> > +
> > +static inline uint16_t
> > +rte_eth_tx_prep(__rte_unused uint8_t port_id, __rte_unused uint16_t
> queue_id,
> > +		__rte_unused struct rte_mbuf **tx_pkts, uint16_t nb_pkts) {
> > +	return nb_pkts;
> > +}
> > +
> > +#endif
> > +
> 
> nit: I wonder if the #else part should be inside the function instead
> (with the proper RTE_SET_USED()), it would avoid to define the prototype
> twice.
> 
> >  typedef void (*buffer_tx_error_fn)(struct rte_mbuf **unsent, uint16_t
> count,
> >  		void *userdata);
> >
> > diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> > index 109e666..cfd6284 100644
> > --- a/lib/librte_mbuf/rte_mbuf.h
> > +++ b/lib/librte_mbuf/rte_mbuf.h
> > @@ -276,6 +276,15 @@ extern "C" {
> >   */
> >  #define PKT_TX_OUTER_IPV4   (1ULL << 59)
> >
> > +#define PKT_TX_OFFLOAD_MASK (    \
> > +		PKT_TX_IP_CKSUM |        \
> > +		PKT_TX_L4_MASK |         \
> > +		PKT_TX_OUTER_IP_CKSUM |  \
> > +		PKT_TX_TCP_SEG |         \
> > +		PKT_TX_QINQ_PKT |        \
> > +		PKT_TX_VLAN_PKT |        \
> > +		PKT_TX_TUNNEL_MASK)
> > +
> 
> Could you add an API comment?
> 
> > --- /dev/null
> > +++ b/lib/librte_net/rte_pkt.h
> >
> > [...]
> > +/**
> > + * Validate general requirements for tx offload in packet.
> > + */
> 
> The API comment does not have the usual format.
> 
> > +static inline int
> > +rte_validate_tx_offload(struct rte_mbuf *m)
> 
> should be const struct rte_mbuf *m
> 
> > +{
> > +	uint64_t ol_flags = m->ol_flags;
> > +
> > +	/* Does packet set any of available offloads? */
> > +	if (!(ol_flags & PKT_TX_OFFLOAD_MASK))
> > +		return 0;
> > +
> > +	/* IP checksum can be counted only for IPv4 packet */
> > +	if ((ol_flags & PKT_TX_IP_CKSUM) && (ol_flags & PKT_TX_IPV6))
> > +		return -EINVAL;
> > +
> > +	if (ol_flags & (PKT_TX_L4_MASK | PKT_TX_TCP_SEG))
> > +		/* IP type not set */
> > +		if (!(ol_flags & (PKT_TX_IPV4 | PKT_TX_IPV6)))
> > +			return -EINVAL;
> > +
> > +	if (ol_flags & PKT_TX_TCP_SEG)
> > +		/* PKT_TX_IP_CKSUM offload not set for IPv4 TSO packet */
> > +		if ((m->tso_segsz == 0) ||
> > +				((ol_flags & PKT_TX_IPV4) && !(ol_flags &
> PKT_TX_IP_CKSUM)))
> > +			return -EINVAL;
> > +
> > +	/* PKT_TX_OUTER_IP_CKSUM set for non outer IPv4 packet. */
> > +	if ((ol_flags & PKT_TX_OUTER_IP_CKSUM) && !(ol_flags &
> PKT_TX_OUTER_IPV4))
> > +		return -EINVAL;
> > +
> > +	return 0;
> > +}
> 
> It looks this function is only used when RTE_LIBRTE_ETHDEV_DEBUG is set.
> 

Yes, for performance reasons we expect that the application sets these values correctly. These make up most of the "additional checks" mentioned before.

> I'd say this function should go in rte_mbuf.h, because it's purely related
> to mbuf flags, and does not rely on packet data or network headers.
> 
> 
> > +
> > +/**
> > + * Fix pseudo header checksum for TSO and non-TSO tcp/udp packets
> > +before
> > + * hardware tx checksum.
> > + * For non-TSO tcp/udp packets full pseudo-header checksum is counted
> and set.
> > + * For TSO the IP payload length is not included.
> > + */
> 
> The API comment should be fixed.

Ok.

> 
> > +static inline int
> > +rte_phdr_cksum_fix(struct rte_mbuf *m) {
> > +	struct ipv4_hdr *ipv4_hdr;
> > +	struct ipv6_hdr *ipv6_hdr;
> > +	struct tcp_hdr *tcp_hdr;
> > +	struct udp_hdr *udp_hdr;
> > +	uint64_t ol_flags = m->ol_flags;
> > +	uint64_t inner_l3_offset = m->l2_len;
> > +
> > +	if (ol_flags & PKT_TX_OUTER_IP_CKSUM)
> > +		inner_l3_offset += m->outer_l2_len + m->outer_l3_len;
> > +
> > +	if ((ol_flags & PKT_TX_UDP_CKSUM) == PKT_TX_UDP_CKSUM) {
> > +		if (ol_flags & PKT_TX_IPV4) {
> > +			ipv4_hdr = rte_pktmbuf_mtod_offset(m, struct ipv4_hdr *,
> > +					inner_l3_offset);
> > +
> > +			if (ol_flags & PKT_TX_IP_CKSUM)
> > +				ipv4_hdr->hdr_checksum = 0;
> > +
> > +			udp_hdr = (struct udp_hdr *)((char *)ipv4_hdr + m-
> >l3_len);
> > +			udp_hdr->dgram_cksum = rte_ipv4_phdr_cksum(ipv4_hdr,
> ol_flags);
> > +		} else {
> > +			ipv6_hdr = rte_pktmbuf_mtod_offset(m, struct ipv6_hdr *,
> > +					inner_l3_offset);
> > +			/* non-TSO udp */
> > +			udp_hdr = rte_pktmbuf_mtod_offset(m, struct udp_hdr *,
> > +					inner_l3_offset + m->l3_len);
> > +			udp_hdr->dgram_cksum = rte_ipv6_phdr_cksum(ipv6_hdr,
> ol_flags);
> > +		}
> > +	} else if ((ol_flags & PKT_TX_TCP_CKSUM) ||
> > +			(ol_flags & PKT_TX_TCP_SEG)) {
> > +		if (ol_flags & PKT_TX_IPV4) {
> > +			ipv4_hdr = rte_pktmbuf_mtod_offset(m, struct ipv4_hdr *,
> > +					inner_l3_offset);
> > +
> > +			if (ol_flags & PKT_TX_IP_CKSUM)
> > +				ipv4_hdr->hdr_checksum = 0;
> > +
> > +			/* non-TSO tcp or TSO */
> > +			tcp_hdr = (struct tcp_hdr *)((char *)ipv4_hdr + m-
> >l3_len);
> > +			tcp_hdr->cksum = rte_ipv4_phdr_cksum(ipv4_hdr, ol_flags);
> > +		} else {
> > +			ipv6_hdr = rte_pktmbuf_mtod_offset(m, struct ipv6_hdr *,
> > +					inner_l3_offset);
> > +			/* non-TSO tcp or TSO */
> > +			tcp_hdr = rte_pktmbuf_mtod_offset(m, struct tcp_hdr *,
> > +					inner_l3_offset + m->l3_len);
> > +			tcp_hdr->cksum = rte_ipv6_phdr_cksum(ipv6_hdr, ol_flags);
> > +		}
> > +	}
> > +	return 0;
> > +}
> > +
> > +#endif /* _RTE_PKT_H_ */
> >
> 
> The function expects that all the network headers are in the first segment,
> and that each of them is contiguous.
> 

Yes, I see...

> Also, I had an interesting remark from Stephen [1] on a similar code.
> If the mbuf is a clone, it will modify the data of the direct mbuf, which
> should be read-only. Note that it is likely to happen in a TCP stack,
> because the packet is kept locally in case it has to be retransmitted.
> Cloning a mbuf is more efficient than duplicating it.
> 
> I plan to fix it in virtio code by "uncloning" the headers.
> 
> [1] http://dpdk.org/ml/archives/dev/2016-October/048873.html
> 
> 
> 
> Regards,
> Olivier

Thanks,
Tomasz

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v6 1/6] ethdev: add Tx preparation
  2016-10-19 15:42               ` Kulasek, TomaszX
@ 2016-10-19 22:07                 ` Ananyev, Konstantin
  0 siblings, 0 replies; 261+ messages in thread
From: Ananyev, Konstantin @ 2016-10-19 22:07 UTC (permalink / raw)
  To: Kulasek, TomaszX, Olivier Matz, dev; +Cc: thomas.monjalon

Hi guys,

> >
> > > +static inline int
> > > +rte_phdr_cksum_fix(struct rte_mbuf *m) {
> > > +	struct ipv4_hdr *ipv4_hdr;
> > > +	struct ipv6_hdr *ipv6_hdr;
> > > +	struct tcp_hdr *tcp_hdr;
> > > +	struct udp_hdr *udp_hdr;
> > > +	uint64_t ol_flags = m->ol_flags;
> > > +	uint64_t inner_l3_offset = m->l2_len;
> > > +
> > > +	if (ol_flags & PKT_TX_OUTER_IP_CKSUM)
> > > +		inner_l3_offset += m->outer_l2_len + m->outer_l3_len;
> > > +
> > > +	if ((ol_flags & PKT_TX_UDP_CKSUM) == PKT_TX_UDP_CKSUM) {
> > > +		if (ol_flags & PKT_TX_IPV4) {
> > > +			ipv4_hdr = rte_pktmbuf_mtod_offset(m, struct ipv4_hdr *,
> > > +					inner_l3_offset);
> > > +
> > > +			if (ol_flags & PKT_TX_IP_CKSUM)
> > > +				ipv4_hdr->hdr_checksum = 0;
> > > +
> > > +			udp_hdr = (struct udp_hdr *)((char *)ipv4_hdr + m->l3_len);
> > > +			udp_hdr->dgram_cksum = rte_ipv4_phdr_cksum(ipv4_hdr, ol_flags);
> > > +		} else {
> > > +			ipv6_hdr = rte_pktmbuf_mtod_offset(m, struct ipv6_hdr *,
> > > +					inner_l3_offset);
> > > +			/* non-TSO udp */
> > > +			udp_hdr = rte_pktmbuf_mtod_offset(m, struct udp_hdr *,
> > > +					inner_l3_offset + m->l3_len);
> > > +			udp_hdr->dgram_cksum = rte_ipv6_phdr_cksum(ipv6_hdr, ol_flags);
> > > +		}
> > > +	} else if ((ol_flags & PKT_TX_TCP_CKSUM) ||
> > > +			(ol_flags & PKT_TX_TCP_SEG)) {
> > > +		if (ol_flags & PKT_TX_IPV4) {
> > > +			ipv4_hdr = rte_pktmbuf_mtod_offset(m, struct ipv4_hdr *,
> > > +					inner_l3_offset);
> > > +
> > > +			if (ol_flags & PKT_TX_IP_CKSUM)
> > > +				ipv4_hdr->hdr_checksum = 0;
> > > +
> > > +			/* non-TSO tcp or TSO */
> > > +			tcp_hdr = (struct tcp_hdr *)((char *)ipv4_hdr + m->l3_len);
> > > +			tcp_hdr->cksum = rte_ipv4_phdr_cksum(ipv4_hdr, ol_flags);
> > > +		} else {
> > > +			ipv6_hdr = rte_pktmbuf_mtod_offset(m, struct ipv6_hdr *,
> > > +					inner_l3_offset);
> > > +			/* non-TSO tcp or TSO */
> > > +			tcp_hdr = rte_pktmbuf_mtod_offset(m, struct tcp_hdr *,
> > > +					inner_l3_offset + m->l3_len);
> > > +			tcp_hdr->cksum = rte_ipv6_phdr_cksum(ipv6_hdr, ol_flags);
> > > +		}
> > > +	}
> > > +	return 0;
> > > +}
> > > +
> > > +#endif /* _RTE_PKT_H_ */
> > >
> >
> > The function expects that all the network headers are in the first segment,
> > and that each of them is contiguous.
> >
> 
> Yes, I see...

Yes it does.
I suppose it is a legitimate restriction (assumption) for those who'd like to use that function.
But that's a good point, and I suppose we need to state it explicitly in the API comments.

> 
> > Also, I had an interesting remark from Stephen [1] on a similar code.
> > If the mbuf is a clone, it will modify the data of the direct mbuf, which
> > should be read-only. Note that it is likely to happen in a TCP stack,
> > because the packet is kept locally in case it has to be retransmitted.
> > Cloning a mbuf is more efficient than duplicating it.
> >
> > I plan to fix it in virtio code by "uncloning" the headers.
> >
> > [1] http://dpdk.org/ml/archives/dev/2016-October/048873.html

This subject is probably a bit off-topic...
My position on it: it shouldn't be a PMD responsibility to make these kinds of checks.
I think it should be the upper layer's responsibility to provide the PMD an mbuf
that can be safely used (and in that case modified) by the driver.
Let's say the upper layer would like to use packet clones for TCP retransmissions;
it can easily overcome that problem by cloning only the data part of the packet,
putting the L2/L3/L4 headers in a separate (head) mbuf, and chaining it with the
cloned data mbuf, as in the sketch below.
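
A minimal sketch of that idea, for illustration only (it is not part of
this patch set). The mempools "hdr_pool" and "clone_pool" are assumed to
be pre-created, and the function name is hypothetical:

	#include <rte_mbuf.h>

	/* Keep the L2/L3/L4 headers in a private, writable head mbuf and
	 * attach the payload kept for retransmission as an indirect clone,
	 * so that tx_prep may safely modify the checksum fields in the head
	 * while the direct payload mbuf stays read-only.
	 */
	static struct rte_mbuf *
	build_tx_pkt(struct rte_mbuf *payload, uint16_t hdr_len,
		struct rte_mempool *hdr_pool, struct rte_mempool *clone_pool)
	{
		struct rte_mbuf *hdr = rte_pktmbuf_alloc(hdr_pool);
		struct rte_mbuf *clone;

		if (hdr == NULL)
			return NULL;

		/* reserve writable room for the headers in the head mbuf */
		if (rte_pktmbuf_append(hdr, hdr_len) == NULL) {
			rte_pktmbuf_free(hdr);
			return NULL;
		}

		clone = rte_pktmbuf_clone(payload, clone_pool);
		if (clone == NULL) {
			rte_pktmbuf_free(hdr);
			return NULL;
		}

		/* chain the shared payload behind the private headers */
		hdr->next = clone;
		hdr->nb_segs = (uint8_t)(clone->nb_segs + 1);
		hdr->pkt_len += clone->pkt_len;

		return hdr;
	}

The same data mbuf can then be re-cloned for every retransmission, with a
fresh head mbuf each time.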

Konstantin 

^ permalink raw reply	[flat|nested] 261+ messages in thread

* [dpdk-dev] [PATCH v7 0/6] add Tx preparation
  2016-10-14 15:05         ` [dpdk-dev] [PATCH v6 0/6] add Tx preparation Tomasz Kulasek
                             ` (6 preceding siblings ...)
  2016-10-18 12:28           ` [dpdk-dev] [PATCH v6 0/6] add Tx preparation Ananyev, Konstantin
@ 2016-10-21 13:42           ` Tomasz Kulasek
  2016-10-21 13:42             ` [dpdk-dev] [PATCH v7 1/6] ethdev: " Tomasz Kulasek
                               ` (6 more replies)
  7 siblings, 7 replies; 261+ messages in thread
From: Tomasz Kulasek @ 2016-10-21 13:42 UTC (permalink / raw)
  To: dev

From 35b09a978d244092337b6f46fd1309f8c733bb6b Mon Sep 17 00:00:00 2001
From: Tomasz Kulasek <tomaszx.kulasek@intel.com>
Date: Fri, 14 Oct 2016 16:10:35 +0200
Subject: [PATCH v6 0/6] add Tx preparation

As discussed in that thread:

http://dpdk.org/ml/archives/dev/2015-September/023603.html

Depending on the HW offloads requested, different NIC models might impose different requirements on packets to be TX-ed, in terms of:

 - Max number of fragments per packet allowed
 - Max number of fragments per TSO segment
 - The way pseudo-header checksum should be pre-calculated
 - L3/L4 header fields filling
 - etc.


MOTIVATION:
-----------

1) Some work cannot (and should not) be done in rte_eth_tx_burst.
   However, this work is sometimes required, and now it is left as an
   application issue.

2) Different hardware may have different requirements for TX offloads,
   a different subset may be supported, and so on.

3) Some parameters (e.g. number of segments in the ixgbe driver) may hang
   the device. These parameters may vary for different devices.

   For example, i40e HW allows 8 fragments per packet, but that is after
   TSO segmentation, while ixgbe has a 38-fragment pre-TSO limit.

4) Fields in the packet may require different initialization (e.g.
   pseudo-header checksum precalculation, sometimes done in a different
   way depending on the packet type, and so on). Now the application
   needs to take care of it.

5) Using an additional API (rte_eth_tx_prep) before rte_eth_tx_burst lets
   the application prepare a packet burst in a form acceptable to the
   specific device.

6) Some additional checks may be done in debug mode, keeping the tx_burst
   implementation clean.


PROPOSAL:
---------

To help users deal with all these varieties, we propose to:

1) Introduce rte_eth_tx_prep() function to do necessary preparations of
   packet burst to be safely transmitted on device for desired HW
   offloads (set/reset checksum field according to the hardware
   requirements) and check HW constraints (number of segments per
   packet, etc).

   Since the limitations and requirements may differ between devices, this
   requires extending the rte_eth_dev structure with a new function pointer,
   "tx_pkt_prep", which can be implemented in the driver to prepare and
   verify packets, in a device-specific way, before the burst; this should
   prevent the application from sending malformed packets.

2) Also, new fields will be introduced in rte_eth_desc_lim:
   nb_seg_max and nb_mtu_seg_max, providing information about the max
   number of segments in TSO and non-TSO packets acceptable to the device.

   This information is useful for the application to avoid creating
   malformed (potentially malicious) packets.


APPLICATION (CASE OF USE):
--------------------------

1) The application should initialize the burst of packets to send, and
   set the required tx offload flags and fields, like l2_len, l3_len,
   l4_len, and tso_segsz.

2) The application passes the burst to rte_eth_tx_prep to check the
   conditions required to send the packets through the NIC.

3) The result of rte_eth_tx_prep can be used to send the valid packets
   and/or to restore the invalid ones if the function fails.

e.g.

	for (i = 0; i < nb_pkts; i++) {

		/* initialize or process packet */

		bufs[i]->tso_segsz = 800;
		bufs[i]->ol_flags = PKT_TX_TCP_SEG | PKT_TX_IPV4
				| PKT_TX_IP_CKSUM;
		bufs[i]->l2_len = sizeof(struct ether_hdr);
		bufs[i]->l3_len = sizeof(struct ipv4_hdr);
		bufs[i]->l4_len = sizeof(struct tcp_hdr);
	}

	/* Prepare burst of TX packets */
	nb_prep = rte_eth_tx_prep(port, 0, bufs, nb_pkts);

	if (nb_prep < nb_pkts) {
		printf("tx_prep failed\n");

		/* nb_prep here indicates the first invalid packet. rte_eth_tx_prep
		 * can be called again on the remaining packets to find further ones.
		 */

	}

	/* Send burst of TX packets */
	nb_tx = rte_eth_tx_burst(port, 0, bufs, nb_prep);

	/* Free any unsent packets. */


v7 changes:
 - comments reworded/added
 - changed errno values returned from Tx prep API
 - added a check in rte_phdr_cksum_fix if headers are in the first
   data segment and can be safely modified
 - moved rte_validate_tx_offload to rte_mbuf
 - moved rte_phdr_cksum_fix to rte_net.h
 - removed rte_pkt.h new file as useless

v5 changes:
 - rebased csum engine modification
 - added information to the csum engine about performance tests
 - some performance improvements

v4 changes:
 - tx_prep is now set to default behavior (NULL) for simple/vector path
   in fm10k, i40e and ixgbe drivers to increase performance, when
   Tx offloads are not intentionally available

v3 changes:
 - reworked the csum testpmd engine instead of adding a new one,
 - fixed checksum initialization procedure to include also outer
   checksum offloads,
 - some minor formatting and optimization fixes

v2 changes:
 - rte_eth_tx_prep() returns the number of packets when the device
   doesn't support the tx_prep functionality,
 - introduced CONFIG_RTE_ETHDEV_TX_PREP, allowing tx_prep to be turned off


Tomasz Kulasek (6):
  ethdev: add Tx preparation
  e1000: add Tx preparation
  fm10k: add Tx preparation
  i40e: add Tx preparation
  ixgbe: add Tx preparation
  testpmd: use Tx preparation in csum engine

 app/test-pmd/csumonly.c          |   36 +++++---------
 config/common_base               |    1 +
 drivers/net/e1000/e1000_ethdev.h |   11 +++++
 drivers/net/e1000/em_ethdev.c    |    5 +-
 drivers/net/e1000/em_rxtx.c      |   48 ++++++++++++++++++-
 drivers/net/e1000/igb_ethdev.c   |    4 ++
 drivers/net/e1000/igb_rxtx.c     |   52 +++++++++++++++++++-
 drivers/net/fm10k/fm10k.h        |    6 +++
 drivers/net/fm10k/fm10k_ethdev.c |    5 ++
 drivers/net/fm10k/fm10k_rxtx.c   |   50 +++++++++++++++++++-
 drivers/net/i40e/i40e_ethdev.c   |    3 ++
 drivers/net/i40e/i40e_rxtx.c     |   72 +++++++++++++++++++++++++++-
 drivers/net/i40e/i40e_rxtx.h     |    8 ++++
 drivers/net/ixgbe/ixgbe_ethdev.c |    3 ++
 drivers/net/ixgbe/ixgbe_ethdev.h |    5 +-
 drivers/net/ixgbe/ixgbe_rxtx.c   |   58 ++++++++++++++++++++++-
 drivers/net/ixgbe/ixgbe_rxtx.h   |    2 +
 lib/librte_ether/rte_ethdev.h    |   97 ++++++++++++++++++++++++++++++++++++++
 lib/librte_mbuf/rte_mbuf.h       |   57 ++++++++++++++++++++++
 lib/librte_net/rte_net.h         |   90 +++++++++++++++++++++++++++++++++++
 20 files changed, 583 insertions(+), 30 deletions(-)

-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 261+ messages in thread

* [dpdk-dev] [PATCH v7 1/6] ethdev: add Tx preparation
  2016-10-21 13:42           ` [dpdk-dev] [PATCH v7 " Tomasz Kulasek
@ 2016-10-21 13:42             ` Tomasz Kulasek
  2016-10-21 13:42             ` [dpdk-dev] [PATCH v7 2/6] e1000: " Tomasz Kulasek
                               ` (5 subsequent siblings)
  6 siblings, 0 replies; 261+ messages in thread
From: Tomasz Kulasek @ 2016-10-21 13:42 UTC (permalink / raw)
  To: dev

Added API for `rte_eth_tx_prep`

uint16_t rte_eth_tx_prep(uint8_t port_id, uint16_t queue_id,
	struct rte_mbuf **tx_pkts, uint16_t nb_pkts)

Added fields to the `struct rte_eth_desc_lim`:

	uint16_t nb_seg_max;
		/**< Max number of segments per whole packet. */

	uint16_t nb_mtu_seg_max;
		/**< Max number of segments per one MTU */

Added functions:

int rte_validate_tx_offload(struct rte_mbuf *m)
	to validate general requirements for the tx offloads set in the mbuf
	of a packet, such as flag completeness. In the current implementation
	this function is called optionally, when RTE_LIBRTE_ETHDEV_DEBUG is
	enabled.

int rte_phdr_cksum_fix(struct rte_mbuf *m)
	to fix pseudo header checksum for TSO and non-TSO tcp/udp packets
	before hardware tx checksum offload.
	 - for non-TSO tcp/udp packets full pseudo-header checksum is
	   counted and set.
	 - for TSO the IP payload length is not included.
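
For illustration only, a hypothetical PMD tx_prep callback built from the
two helpers (the function name is made up; the real drivers in this series
follow the same shape and the same negative-errno convention):

	uint16_t
	xxx_prep_pkts(__rte_unused void *txq, struct rte_mbuf **tx_pkts,
			uint16_t nb_pkts)
	{
		uint16_t i;
		int ret;

		for (i = 0; i < nb_pkts; i++) {
			/* flag consistency first, then checksum fixup */
			ret = rte_validate_tx_offload(tx_pkts[i]);
			if (ret == 0)
				ret = rte_phdr_cksum_fix(tx_pkts[i]);
			if (ret != 0) {
				rte_errno = ret;
				break;
			}
		}

		return i;	/* index of the first invalid packet, or nb_pkts */
	}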

PERFORMANCE TESTS
-----------------

This feature was tested with modified csum engine from test-pmd.

The packet checksum preparation was moved from the application to the Tx
preparation step placed before the burst.

We may expect some overhead costs caused by:
1) using additional callback before burst,
2) rescanning burst,
3) additional condition checking (packet validation),
4) worse optimization (e.g. packet data access, etc.)

We tested it using the ixgbe Tx preparation implementation with some parts
disabled, to have comparable information about the impact of different
parts of the implementation.

IMPACT:

1) For an unimplemented Tx preparation callback the performance impact is
   negligible,
2) For the packet condition check without checksum modifications (nb_segs,
   available offloads, etc.) it is 14626628/14252168 (~2.62% drop),
3) For full support in the ixgbe driver (point 2 + packet checksum
   initialization) it is 14060924/13588094 (~3.48% drop)

Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
---
 config/common_base            |    1 +
 lib/librte_ether/rte_ethdev.h |   97 +++++++++++++++++++++++++++++++++++++++++
 lib/librte_mbuf/rte_mbuf.h    |   57 ++++++++++++++++++++++++
 lib/librte_net/rte_net.h      |   90 ++++++++++++++++++++++++++++++++++++++
 4 files changed, 245 insertions(+)

diff --git a/config/common_base b/config/common_base
index c7fd3db..619284b 100644
--- a/config/common_base
+++ b/config/common_base
@@ -120,6 +120,7 @@ CONFIG_RTE_MAX_QUEUES_PER_PORT=1024
 CONFIG_RTE_LIBRTE_IEEE1588=n
 CONFIG_RTE_ETHDEV_QUEUE_STAT_CNTRS=16
 CONFIG_RTE_ETHDEV_RXTX_CALLBACKS=y
+CONFIG_RTE_ETHDEV_TX_PREP=y
 
 #
 # Support NIC bypass logic
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 38641e8..d548d48 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -182,6 +182,7 @@ extern "C" {
 #include <rte_pci.h>
 #include <rte_dev.h>
 #include <rte_devargs.h>
+#include <rte_errno.h>
 #include "rte_ether.h"
 #include "rte_eth_ctrl.h"
 #include "rte_dev_info.h"
@@ -699,6 +700,8 @@ struct rte_eth_desc_lim {
 	uint16_t nb_max;   /**< Max allowed number of descriptors. */
 	uint16_t nb_min;   /**< Min allowed number of descriptors. */
 	uint16_t nb_align; /**< Number of descriptors should be aligned to. */
+	uint16_t nb_seg_max;     /**< Max number of segments per whole packet. */
+	uint16_t nb_mtu_seg_max; /**< Max number of segments per one MTU */
 };
 
 /**
@@ -1188,6 +1191,11 @@ typedef uint16_t (*eth_tx_burst_t)(void *txq,
 				   uint16_t nb_pkts);
 /**< @internal Send output packets on a transmit queue of an Ethernet device. */
 
+typedef uint16_t (*eth_tx_prep_t)(void *txq,
+				   struct rte_mbuf **tx_pkts,
+				   uint16_t nb_pkts);
+/**< @internal Prepare output packets on a transmit queue of an Ethernet device. */
+
 typedef int (*flow_ctrl_get_t)(struct rte_eth_dev *dev,
 			       struct rte_eth_fc_conf *fc_conf);
 /**< @internal Get current flow control parameter on an Ethernet device */
@@ -1622,6 +1630,7 @@ struct rte_eth_rxtx_callback {
 struct rte_eth_dev {
 	eth_rx_burst_t rx_pkt_burst; /**< Pointer to PMD receive function. */
 	eth_tx_burst_t tx_pkt_burst; /**< Pointer to PMD transmit function. */
+	eth_tx_prep_t tx_pkt_prep; /**< Pointer to PMD transmit prepare function. */
 	struct rte_eth_dev_data *data;  /**< Pointer to device data */
 	const struct eth_driver *driver;/**< Driver for this device */
 	const struct eth_dev_ops *dev_ops; /**< Functions exported by PMD */
@@ -2816,6 +2825,94 @@ rte_eth_tx_burst(uint8_t port_id, uint16_t queue_id,
 	return (*dev->tx_pkt_burst)(dev->data->tx_queues[queue_id], tx_pkts, nb_pkts);
 }
 
+/**
+ * Process a burst of output packets on a transmit queue of an Ethernet device.
+ *
+ * The rte_eth_tx_prep() function is invoked to prepare output packets to be
+ * transmitted on the output queue *queue_id* of the Ethernet device designated
+ * by its *port_id*.
+ * The *nb_pkts* parameter is the number of packets to be prepared which are
+ * supplied in the *tx_pkts* array of *rte_mbuf* structures, each of them
+ * allocated from a pool created with rte_pktmbuf_pool_create().
+ * For each packet to send, the rte_eth_tx_prep() function performs
+ * the following operations:
+ *
+ * - Check if the packet meets the device's requirements for tx offloads.
+ *
+ * - Check limitations about number of segments.
+ *
+ * - Check additional requirements when debug is enabled.
+ *
+ * - Update and/or reset required checksums when tx offload is set for packet.
+ *
+ * The rte_eth_tx_prep() function returns the number of packets ready to be
+ * sent. A return value equal to *nb_pkts* means that all packets are valid and
+ * ready to be sent.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ *   The value must be a valid port id.
+ * @param queue_id
+ *   The index of the transmit queue through which output packets must be
+ *   sent.
+ *   The value must be in the range [0, nb_tx_queue - 1] previously supplied
+ *   to rte_eth_dev_configure().
+ * @param tx_pkts
+ *   The address of an array of *nb_pkts* pointers to *rte_mbuf* structures
+ *   which contain the output packets.
+ * @param nb_pkts
+ *   The maximum number of packets to process.
+ * @return
+ *   The number of packets correct and ready to be sent. The return value can be
+ *   less than the value of the *nb_pkts* parameter when some packet doesn't
+ *   meet the device's requirements; rte_errno is then set appropriately.
+ */
+
+#ifdef RTE_ETHDEV_TX_PREP
+
+static inline uint16_t
+rte_eth_tx_prep(uint8_t port_id, uint16_t queue_id, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts)
+{
+	struct rte_eth_dev *dev;
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+	if (!rte_eth_dev_is_valid_port(port_id)) {
+		RTE_PMD_DEBUG_TRACE("Invalid TX port_id=%d\n", port_id);
+		rte_errno = -EINVAL;
+		return 0;
+	}
+#endif
+
+	dev = &rte_eth_devices[port_id];
+
+	if (!dev->tx_pkt_prep)
+		return nb_pkts;
+
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+	if (queue_id >= dev->data->nb_tx_queues) {
+		RTE_PMD_DEBUG_TRACE("Invalid TX queue_id=%d\n", queue_id);
+		rte_errno = -EINVAL;
+		return 0;
+	}
+#endif
+
+	return (*dev->tx_pkt_prep)(dev->data->tx_queues[queue_id],
+			tx_pkts, nb_pkts);
+}
+
+#else
+
+static inline uint16_t
+rte_eth_tx_prep(__rte_unused uint8_t port_id, __rte_unused uint16_t queue_id,
+		__rte_unused struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
+{
+	return nb_pkts;
+}
+
+#endif
+
 typedef void (*buffer_tx_error_fn)(struct rte_mbuf **unsent, uint16_t count,
 		void *userdata);
 
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index 109e666..6fae003 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -283,6 +283,19 @@ extern "C" {
  */
 #define PKT_TX_OUTER_IPV6    (1ULL << 60)
 
+/**
+ * Bit Mask of all supported packet Tx offload features flags, which can be set
+ * for packet.
+ */
+#define PKT_TX_OFFLOAD_MASK (    \
+		PKT_TX_IP_CKSUM |        \
+		PKT_TX_L4_MASK |         \
+		PKT_TX_OUTER_IP_CKSUM |  \
+		PKT_TX_TCP_SEG |         \
+		PKT_TX_QINQ_PKT |        \
+		PKT_TX_VLAN_PKT |        \
+		PKT_TX_TUNNEL_MASK)
+
 #define __RESERVED           (1ULL << 61) /**< reserved for future mbuf use */
 
 #define IND_ATTACHED_MBUF    (1ULL << 62) /**< Indirect attached mbuf */
@@ -1647,6 +1660,50 @@ static inline int rte_pktmbuf_chain(struct rte_mbuf *head, struct rte_mbuf *tail
 }
 
 /**
+ * Validate general requirements for tx offload in packet.
+ *
+ * This function checks correctness and completeness of Tx offload flags
+ * settings.
+ *
+ * @param m
+ *   The packet mbuf to be validated.
+ * @return
+ *   0 if packet is valid
+ */
+static inline int
+rte_validate_tx_offload(struct rte_mbuf *m)
+{
+	uint64_t ol_flags = m->ol_flags;
+
+	/* Does packet set any of available offloads? */
+	if (!(ol_flags & PKT_TX_OFFLOAD_MASK))
+		return 0;
+
+	/* IP checksum can be counted only for IPv4 packet */
+	if ((ol_flags & PKT_TX_IP_CKSUM) && (ol_flags & PKT_TX_IPV6))
+		return -EINVAL;
+
+	/* IP type not set when required */
+	if (ol_flags & (PKT_TX_L4_MASK | PKT_TX_TCP_SEG))
+		if (!(ol_flags & (PKT_TX_IPV4 | PKT_TX_IPV6)))
+			return -EINVAL;
+
+	/* Check requirements for TSO packet */
+	if (ol_flags & PKT_TX_TCP_SEG)
+		if ((m->tso_segsz == 0) ||
+				((ol_flags & PKT_TX_IPV4) &&
+				!(ol_flags & PKT_TX_IP_CKSUM)))
+			return -EINVAL;
+
+	/* PKT_TX_OUTER_IP_CKSUM set for non outer IPv4 packet. */
+	if ((ol_flags & PKT_TX_OUTER_IP_CKSUM) &&
+			!(ol_flags & PKT_TX_OUTER_IPV4))
+		return -EINVAL;
+
+	return 0;
+}
+
+/**
  * Dump an mbuf structure to a file.
  *
  * Dump all fields for the given packet mbuf and all its associated
diff --git a/lib/librte_net/rte_net.h b/lib/librte_net/rte_net.h
index d4156ae..79669d7 100644
--- a/lib/librte_net/rte_net.h
+++ b/lib/librte_net/rte_net.h
@@ -38,6 +38,11 @@
 extern "C" {
 #endif
 
+#include <rte_ip.h>
+#include <rte_udp.h>
+#include <rte_tcp.h>
+#include <rte_sctp.h>
+
 /**
  * Structure containing header lengths associated to a packet, filled
  * by rte_net_get_ptype().
@@ -86,6 +91,91 @@ struct rte_net_hdr_lens {
 uint32_t rte_net_get_ptype(const struct rte_mbuf *m,
 	struct rte_net_hdr_lens *hdr_lens, uint32_t layers);
 
+/**
+ * Fix pseudo header checksum
+ *
+ * This function fixes the pseudo header checksum for TSO and non-TSO tcp/udp
+ * packets in the provided mbuf's packet data.
+ *
+ * - for non-TSO tcp/udp packets full pseudo-header checksum is counted and set
+ *   in packet data,
+ * - for TSO the IP payload length is not included in pseudo header.
+ *
+ * This function expects that the headers used are in the first data segment
+ * of the mbuf and are not fragmented.
+ *
+ * @param m
+ *   The packet mbuf to be fixed.
+ * @return
+ *   0 if checksum is initialized properly
+ */
+static inline int
+rte_phdr_cksum_fix(struct rte_mbuf *m)
+{
+	struct ipv4_hdr *ipv4_hdr;
+	struct ipv6_hdr *ipv6_hdr;
+	struct tcp_hdr *tcp_hdr;
+	struct udp_hdr *udp_hdr;
+	uint64_t ol_flags = m->ol_flags;
+	uint64_t inner_l3_offset = m->l2_len;
+
+	if (ol_flags & PKT_TX_OUTER_IP_CKSUM)
+		inner_l3_offset += m->outer_l2_len + m->outer_l3_len;
+
+	/* headers do not fit in the first data segment */
+	if (unlikely(rte_pktmbuf_data_len(m) < inner_l3_offset + m->l3_len +
+			m->l4_len))
+		return -ENOTSUP;
+
+	if ((ol_flags & PKT_TX_UDP_CKSUM) == PKT_TX_UDP_CKSUM) {
+		if (ol_flags & PKT_TX_IPV4) {
+			ipv4_hdr = rte_pktmbuf_mtod_offset(m, struct ipv4_hdr *,
+					inner_l3_offset);
+
+			if (ol_flags & PKT_TX_IP_CKSUM)
+				ipv4_hdr->hdr_checksum = 0;
+
+			udp_hdr = (struct udp_hdr *)((char *)ipv4_hdr +
+					m->l3_len);
+			udp_hdr->dgram_cksum = rte_ipv4_phdr_cksum(ipv4_hdr,
+					ol_flags);
+		} else {
+			ipv6_hdr = rte_pktmbuf_mtod_offset(m, struct ipv6_hdr *,
+					inner_l3_offset);
+			/* non-TSO udp */
+			udp_hdr = rte_pktmbuf_mtod_offset(m, struct udp_hdr *,
+					inner_l3_offset + m->l3_len);
+			udp_hdr->dgram_cksum = rte_ipv6_phdr_cksum(ipv6_hdr,
+					ol_flags);
+		}
+	} else if ((ol_flags & PKT_TX_TCP_CKSUM) ||
+			(ol_flags & PKT_TX_TCP_SEG)) {
+		if (ol_flags & PKT_TX_IPV4) {
+			ipv4_hdr = rte_pktmbuf_mtod_offset(m, struct ipv4_hdr *,
+					inner_l3_offset);
+
+			if (ol_flags & PKT_TX_IP_CKSUM)
+				ipv4_hdr->hdr_checksum = 0;
+
+			/* non-TSO tcp or TSO */
+			tcp_hdr = (struct tcp_hdr *)((char *)ipv4_hdr +
+					m->l3_len);
+			tcp_hdr->cksum = rte_ipv4_phdr_cksum(ipv4_hdr,
+					ol_flags);
+		} else {
+			ipv6_hdr = rte_pktmbuf_mtod_offset(m, struct ipv6_hdr *,
+					inner_l3_offset);
+			/* non-TSO tcp or TSO */
+			tcp_hdr = rte_pktmbuf_mtod_offset(m, struct tcp_hdr *,
+					inner_l3_offset + m->l3_len);
+			tcp_hdr->cksum = rte_ipv6_phdr_cksum(ipv6_hdr,
+					ol_flags);
+		}
+	}
+
+	return 0;
+}
+
 #ifdef __cplusplus
 }
 #endif
-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 261+ messages in thread

* [dpdk-dev] [PATCH v7 2/6] e1000: add Tx preparation
  2016-10-21 13:42           ` [dpdk-dev] [PATCH v7 " Tomasz Kulasek
  2016-10-21 13:42             ` [dpdk-dev] [PATCH v7 1/6] ethdev: " Tomasz Kulasek
@ 2016-10-21 13:42             ` Tomasz Kulasek
  2016-10-21 13:42             ` [dpdk-dev] [PATCH v7 3/6] fm10k: " Tomasz Kulasek
                               ` (4 subsequent siblings)
  6 siblings, 0 replies; 261+ messages in thread
From: Tomasz Kulasek @ 2016-10-21 13:42 UTC (permalink / raw)
  To: dev

Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
---
 drivers/net/e1000/e1000_ethdev.h |   11 ++++++++
 drivers/net/e1000/em_ethdev.c    |    5 +++-
 drivers/net/e1000/em_rxtx.c      |   48 ++++++++++++++++++++++++++++++++++-
 drivers/net/e1000/igb_ethdev.c   |    4 +++
 drivers/net/e1000/igb_rxtx.c     |   52 +++++++++++++++++++++++++++++++++++++-
 5 files changed, 117 insertions(+), 3 deletions(-)

diff --git a/drivers/net/e1000/e1000_ethdev.h b/drivers/net/e1000/e1000_ethdev.h
index 6c25c8d..bd0f277 100644
--- a/drivers/net/e1000/e1000_ethdev.h
+++ b/drivers/net/e1000/e1000_ethdev.h
@@ -138,6 +138,11 @@
 #define E1000_MISC_VEC_ID               RTE_INTR_VEC_ZERO_OFFSET
 #define E1000_RX_VEC_START              RTE_INTR_VEC_RXTX_OFFSET
 
+#define IGB_TX_MAX_SEG     UINT8_MAX
+#define IGB_TX_MAX_MTU_SEG UINT8_MAX
+#define EM_TX_MAX_SEG      UINT8_MAX
+#define EM_TX_MAX_MTU_SEG  UINT8_MAX
+
 /* structure for interrupt relative data */
 struct e1000_interrupt {
 	uint32_t flags;
@@ -315,6 +320,9 @@ void eth_igb_tx_init(struct rte_eth_dev *dev);
 uint16_t eth_igb_xmit_pkts(void *txq, struct rte_mbuf **tx_pkts,
 		uint16_t nb_pkts);
 
+uint16_t eth_igb_prep_pkts(void *txq, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts);
+
 uint16_t eth_igb_recv_pkts(void *rxq, struct rte_mbuf **rx_pkts,
 		uint16_t nb_pkts);
 
@@ -376,6 +384,9 @@ void eth_em_tx_init(struct rte_eth_dev *dev);
 uint16_t eth_em_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		uint16_t nb_pkts);
 
+uint16_t eth_em_prep_pkts(void *txq, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts);
+
 uint16_t eth_em_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
 		uint16_t nb_pkts);
 
diff --git a/drivers/net/e1000/em_ethdev.c b/drivers/net/e1000/em_ethdev.c
index 7cf5f0c..17b45cb 100644
--- a/drivers/net/e1000/em_ethdev.c
+++ b/drivers/net/e1000/em_ethdev.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -300,6 +300,7 @@ eth_em_dev_init(struct rte_eth_dev *eth_dev)
 	eth_dev->dev_ops = &eth_em_ops;
 	eth_dev->rx_pkt_burst = (eth_rx_burst_t)&eth_em_recv_pkts;
 	eth_dev->tx_pkt_burst = (eth_tx_burst_t)&eth_em_xmit_pkts;
+	eth_dev->tx_pkt_prep = (eth_tx_prep_t)&eth_em_prep_pkts;
 
 	/* for secondary processes, we don't initialise any further as primary
 	 * has already done this work. Only check we don't need a different
@@ -1067,6 +1068,8 @@ eth_em_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 		.nb_max = E1000_MAX_RING_DESC,
 		.nb_min = E1000_MIN_RING_DESC,
 		.nb_align = EM_TXD_ALIGN,
+		.nb_seg_max = EM_TX_MAX_SEG,
+		.nb_mtu_seg_max = EM_TX_MAX_MTU_SEG,
 	};
 
 	dev_info->speed_capa = ETH_LINK_SPEED_10M_HD | ETH_LINK_SPEED_10M |
diff --git a/drivers/net/e1000/em_rxtx.c b/drivers/net/e1000/em_rxtx.c
index 41f51c0..5bd3c99 100644
--- a/drivers/net/e1000/em_rxtx.c
+++ b/drivers/net/e1000/em_rxtx.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -66,6 +66,7 @@
 #include <rte_udp.h>
 #include <rte_tcp.h>
 #include <rte_sctp.h>
+#include <rte_net.h>
 #include <rte_string_fns.h>
 
 #include "e1000_logs.h"
@@ -77,6 +78,14 @@
 
 #define E1000_RXDCTL_GRAN	0x01000000 /* RXDCTL Granularity */
 
+#define E1000_TX_OFFLOAD_MASK ( \
+		PKT_TX_IP_CKSUM |       \
+		PKT_TX_L4_MASK |        \
+		PKT_TX_VLAN_PKT)
+
+#define E1000_TX_OFFLOAD_NOTSUP_MASK \
+		(PKT_TX_OFFLOAD_MASK ^ E1000_TX_OFFLOAD_MASK)
+
 /**
  * Structure associated with each descriptor of the RX ring of a RX queue.
  */
@@ -618,6 +627,43 @@ end_of_tx:
 
 /*********************************************************************
  *
+ *  TX prep functions
+ *
+ **********************************************************************/
+uint16_t
+eth_em_prep_pkts(__rte_unused void *tx_queue, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts)
+{
+	int i, ret;
+	struct rte_mbuf *m;
+
+	for (i = 0; i < nb_pkts; i++) {
+		m = tx_pkts[i];
+
+		if (m->ol_flags & E1000_TX_OFFLOAD_NOTSUP_MASK) {
+			rte_errno = -ENOTSUP;
+			return i;
+		}
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+		ret = rte_validate_tx_offload(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+#endif
+		ret = rte_phdr_cksum_fix(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+	}
+
+	return i;
+}
+
+/*********************************************************************
+ *
  *  RX functions
  *
  **********************************************************************/
diff --git a/drivers/net/e1000/igb_ethdev.c b/drivers/net/e1000/igb_ethdev.c
index 4924396..0afdd09 100644
--- a/drivers/net/e1000/igb_ethdev.c
+++ b/drivers/net/e1000/igb_ethdev.c
@@ -369,6 +369,8 @@ static const struct rte_eth_desc_lim tx_desc_lim = {
 	.nb_max = E1000_MAX_RING_DESC,
 	.nb_min = E1000_MIN_RING_DESC,
 	.nb_align = IGB_RXD_ALIGN,
+	.nb_seg_max = IGB_TX_MAX_SEG,
+	.nb_mtu_seg_max = IGB_TX_MAX_MTU_SEG,
 };
 
 static const struct eth_dev_ops eth_igb_ops = {
@@ -760,6 +762,7 @@ eth_igb_dev_init(struct rte_eth_dev *eth_dev)
 	eth_dev->dev_ops = &eth_igb_ops;
 	eth_dev->rx_pkt_burst = &eth_igb_recv_pkts;
 	eth_dev->tx_pkt_burst = &eth_igb_xmit_pkts;
+	eth_dev->tx_pkt_prep = &eth_igb_prep_pkts;
 
 	/* for secondary processes, we don't initialise any further as primary
 	 * has already done this work. Only check we don't need a different
@@ -963,6 +966,7 @@ eth_igbvf_dev_init(struct rte_eth_dev *eth_dev)
 	eth_dev->dev_ops = &igbvf_eth_dev_ops;
 	eth_dev->rx_pkt_burst = &eth_igb_recv_pkts;
 	eth_dev->tx_pkt_burst = &eth_igb_xmit_pkts;
+	eth_dev->tx_pkt_prep = &eth_igb_prep_pkts;
 
 	/* for secondary processes, we don't initialise any further as primary
 	 * has already done this work. Only check we don't need a different
diff --git a/drivers/net/e1000/igb_rxtx.c b/drivers/net/e1000/igb_rxtx.c
index dbd37ac..08e47f2 100644
--- a/drivers/net/e1000/igb_rxtx.c
+++ b/drivers/net/e1000/igb_rxtx.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -65,6 +65,7 @@
 #include <rte_udp.h>
 #include <rte_tcp.h>
 #include <rte_sctp.h>
+#include <rte_net.h>
 #include <rte_string_fns.h>
 
 #include "e1000_logs.h"
@@ -78,6 +79,9 @@
 		PKT_TX_L4_MASK |		 \
 		PKT_TX_TCP_SEG)
 
+#define IGB_TX_OFFLOAD_NOTSUP_MASK \
+		(PKT_TX_OFFLOAD_MASK ^ IGB_TX_OFFLOAD_MASK)
+
 /**
  * Structure associated with each descriptor of the RX ring of a RX queue.
  */
@@ -616,6 +620,51 @@ eth_igb_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 
 /*********************************************************************
  *
+ *  TX prep functions
+ *
+ **********************************************************************/
+uint16_t
+eth_igb_prep_pkts(__rte_unused void *tx_queue, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts)
+{
+	int i, ret;
+	struct rte_mbuf *m;
+
+	for (i = 0; i < nb_pkts; i++) {
+		m = tx_pkts[i];
+
+		/* Check some limitations for TSO in hardware */
+		if (m->ol_flags & PKT_TX_TCP_SEG)
+			if ((m->tso_segsz > IGB_TSO_MAX_MSS) || (m->l2_len + m->l3_len +
+					m->l4_len > IGB_TSO_MAX_HDRLEN)) {
+				rte_errno = -EINVAL;
+				return i;
+			}
+
+		if (m->ol_flags & IGB_TX_OFFLOAD_NOTSUP_MASK) {
+			rte_errno = -ENOTSUP;
+			return i;
+		}
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+		ret = rte_validate_tx_offload(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+#endif
+		ret = rte_phdr_cksum_fix(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+	}
+
+	return i;
+}
+
+/*********************************************************************
+ *
  *  RX functions
  *
  **********************************************************************/
@@ -1364,6 +1413,7 @@ eth_igb_tx_queue_setup(struct rte_eth_dev *dev,
 
 	igb_reset_tx_queue(txq, dev);
 	dev->tx_pkt_burst = eth_igb_xmit_pkts;
+	dev->tx_pkt_prep = &eth_igb_prep_pkts;
 	dev->data->tx_queues[queue_idx] = txq;
 
 	return 0;
-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 261+ messages in thread

* [dpdk-dev] [PATCH v7 3/6] fm10k: add Tx preparation
  2016-10-21 13:42           ` [dpdk-dev] [PATCH v7 " Tomasz Kulasek
  2016-10-21 13:42             ` [dpdk-dev] [PATCH v7 1/6] ethdev: " Tomasz Kulasek
  2016-10-21 13:42             ` [dpdk-dev] [PATCH v7 2/6] e1000: " Tomasz Kulasek
@ 2016-10-21 13:42             ` Tomasz Kulasek
  2016-10-21 13:42             ` [dpdk-dev] [PATCH v7 4/6] i40e: " Tomasz Kulasek
                               ` (3 subsequent siblings)
  6 siblings, 0 replies; 261+ messages in thread
From: Tomasz Kulasek @ 2016-10-21 13:42 UTC (permalink / raw)
  To: dev

Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
---
 drivers/net/fm10k/fm10k.h        |    6 +++++
 drivers/net/fm10k/fm10k_ethdev.c |    5 ++++
 drivers/net/fm10k/fm10k_rxtx.c   |   50 +++++++++++++++++++++++++++++++++++++-
 3 files changed, 60 insertions(+), 1 deletion(-)

diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
index 05aa1a2..c6fed21 100644
--- a/drivers/net/fm10k/fm10k.h
+++ b/drivers/net/fm10k/fm10k.h
@@ -69,6 +69,9 @@
 #define FM10K_MAX_RX_DESC  (FM10K_MAX_RX_RING_SZ / sizeof(union fm10k_rx_desc))
 #define FM10K_MAX_TX_DESC  (FM10K_MAX_TX_RING_SZ / sizeof(struct fm10k_tx_desc))
 
+#define FM10K_TX_MAX_SEG     UINT8_MAX
+#define FM10K_TX_MAX_MTU_SEG UINT8_MAX
+
 /*
  * byte aligment for HW RX data buffer
  * Datasheet requires RX buffer addresses shall either be 512-byte aligned or
@@ -356,6 +359,9 @@ fm10k_dev_rx_descriptor_done(void *rx_queue, uint16_t offset);
 uint16_t fm10k_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 	uint16_t nb_pkts);
 
+uint16_t fm10k_prep_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
+	uint16_t nb_pkts);
+
 int fm10k_rxq_vec_setup(struct fm10k_rx_queue *rxq);
 int fm10k_rx_vec_condition_check(struct rte_eth_dev *);
 void fm10k_rx_queue_release_mbufs_vec(struct fm10k_rx_queue *rxq);
diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index c804436..dffb6d1 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -1446,6 +1446,8 @@ fm10k_dev_infos_get(struct rte_eth_dev *dev,
 		.nb_max = FM10K_MAX_TX_DESC,
 		.nb_min = FM10K_MIN_TX_DESC,
 		.nb_align = FM10K_MULT_TX_DESC,
+		.nb_seg_max = FM10K_TX_MAX_SEG,
+		.nb_mtu_seg_max = FM10K_TX_MAX_MTU_SEG,
 	};
 
 	dev_info->speed_capa = ETH_LINK_SPEED_1G | ETH_LINK_SPEED_2_5G |
@@ -2754,8 +2756,10 @@ fm10k_set_tx_function(struct rte_eth_dev *dev)
 			fm10k_txq_vec_setup(txq);
 		}
 		dev->tx_pkt_burst = fm10k_xmit_pkts_vec;
+		dev->tx_pkt_prep = NULL;
 	} else {
 		dev->tx_pkt_burst = fm10k_xmit_pkts;
+		dev->tx_pkt_prep = fm10k_prep_pkts;
 		PMD_INIT_LOG(DEBUG, "Use regular Tx func");
 	}
 }
@@ -2834,6 +2838,7 @@ eth_fm10k_dev_init(struct rte_eth_dev *dev)
 	dev->dev_ops = &fm10k_eth_dev_ops;
 	dev->rx_pkt_burst = &fm10k_recv_pkts;
 	dev->tx_pkt_burst = &fm10k_xmit_pkts;
+	dev->tx_pkt_prep = &fm10k_prep_pkts;
 
 	/* only initialize in the primary process */
 	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
diff --git a/drivers/net/fm10k/fm10k_rxtx.c b/drivers/net/fm10k/fm10k_rxtx.c
index 32cc7ff..5fc4d5a 100644
--- a/drivers/net/fm10k/fm10k_rxtx.c
+++ b/drivers/net/fm10k/fm10k_rxtx.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2013-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2013-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -35,6 +35,7 @@
 
 #include <rte_ethdev.h>
 #include <rte_common.h>
+#include <rte_net.h>
 #include "fm10k.h"
 #include "base/fm10k_type.h"
 
@@ -65,6 +66,15 @@ static inline void dump_rxd(union fm10k_rx_desc *rxd)
 }
 #endif
 
+#define FM10K_TX_OFFLOAD_MASK (  \
+		PKT_TX_VLAN_PKT |        \
+		PKT_TX_IP_CKSUM |        \
+		PKT_TX_L4_MASK |         \
+		PKT_TX_TCP_SEG)
+
+#define FM10K_TX_OFFLOAD_NOTSUP_MASK \
+		(PKT_TX_OFFLOAD_MASK ^ FM10K_TX_OFFLOAD_MASK)
+
 /* @note: When this function is changed, make corresponding change to
  * fm10k_dev_supported_ptypes_get()
  */
@@ -597,3 +607,41 @@ fm10k_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 
 	return count;
 }
+
+uint16_t
+fm10k_prep_pkts(__rte_unused void *tx_queue, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts)
+{
+	int i, ret;
+	struct rte_mbuf *m;
+
+	for (i = 0; i < nb_pkts; i++) {
+		m = tx_pkts[i];
+
+		if ((m->ol_flags & PKT_TX_TCP_SEG) &&
+				(m->tso_segsz < FM10K_TSO_MINMSS)) {
+			rte_errno = -EINVAL;
+			return i;
+		}
+
+		if (m->ol_flags & FM10K_TX_OFFLOAD_NOTSUP_MASK) {
+			rte_errno = -ENOTSUP;
+			return i;
+		}
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+		ret = rte_validate_tx_offload(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+#endif
+		ret = rte_phdr_cksum_fix(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+	}
+
+	return i;
+}
-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 261+ messages in thread

* [dpdk-dev] [PATCH v7 4/6] i40e: add Tx preparation
  2016-10-21 13:42           ` [dpdk-dev] [PATCH v7 " Tomasz Kulasek
                               ` (2 preceding siblings ...)
  2016-10-21 13:42             ` [dpdk-dev] [PATCH v7 3/6] fm10k: " Tomasz Kulasek
@ 2016-10-21 13:42             ` Tomasz Kulasek
  2016-10-21 13:42             ` [dpdk-dev] [PATCH v7 5/6] ixgbe: " Tomasz Kulasek
                               ` (2 subsequent siblings)
  6 siblings, 0 replies; 261+ messages in thread
From: Tomasz Kulasek @ 2016-10-21 13:42 UTC (permalink / raw)
  To: dev

Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
---
 drivers/net/i40e/i40e_ethdev.c |    3 ++
 drivers/net/i40e/i40e_rxtx.c   |   72 +++++++++++++++++++++++++++++++++++++++-
 drivers/net/i40e/i40e_rxtx.h   |    8 +++++
 3 files changed, 82 insertions(+), 1 deletion(-)

diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
index 5af0e43..dab0d48 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -936,6 +936,7 @@ eth_i40e_dev_init(struct rte_eth_dev *dev)
 	dev->dev_ops = &i40e_eth_dev_ops;
 	dev->rx_pkt_burst = i40e_recv_pkts;
 	dev->tx_pkt_burst = i40e_xmit_pkts;
+	dev->tx_pkt_prep = i40e_prep_pkts;
 
 	/* for secondary processes, we don't initialise any further as primary
 	 * has already done this work. Only check we don't need a different
@@ -2629,6 +2630,8 @@ i40e_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 		.nb_max = I40E_MAX_RING_DESC,
 		.nb_min = I40E_MIN_RING_DESC,
 		.nb_align = I40E_ALIGN_RING_DESC,
+		.nb_seg_max = I40E_TX_MAX_SEG,
+		.nb_mtu_seg_max = I40E_TX_MAX_MTU_SEG,
 	};
 
 	if (pf->flags & I40E_FLAG_VMDQ) {
diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
index 7ae7d9f..7f6d3d8 100644
--- a/drivers/net/i40e/i40e_rxtx.c
+++ b/drivers/net/i40e/i40e_rxtx.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -50,6 +50,8 @@
 #include <rte_tcp.h>
 #include <rte_sctp.h>
 #include <rte_udp.h>
+#include <rte_ip.h>
+#include <rte_net.h>
 
 #include "i40e_logs.h"
 #include "base/i40e_prototype.h"
@@ -79,6 +81,17 @@
 		PKT_TX_TCP_SEG |		 \
 		PKT_TX_OUTER_IP_CKSUM)
 
+#define I40E_TX_OFFLOAD_MASK (  \
+		PKT_TX_IP_CKSUM |       \
+		PKT_TX_L4_MASK |        \
+		PKT_TX_OUTER_IP_CKSUM | \
+		PKT_TX_TCP_SEG |        \
+		PKT_TX_QINQ_PKT |       \
+		PKT_TX_VLAN_PKT)
+
+#define I40E_TX_OFFLOAD_NOTSUP_MASK \
+		(PKT_TX_OFFLOAD_MASK ^ I40E_TX_OFFLOAD_MASK)
+
 static uint16_t i40e_xmit_pkts_simple(void *tx_queue,
 				      struct rte_mbuf **tx_pkts,
 				      uint16_t nb_pkts);
@@ -1411,6 +1424,61 @@ i40e_xmit_pkts_simple(void *tx_queue,
 	return nb_tx;
 }
 
+/*********************************************************************
+ *
+ *  TX prep functions
+ *
+ **********************************************************************/
+uint16_t
+i40e_prep_pkts(__rte_unused void *tx_queue, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts)
+{
+	int i, ret;
+	uint64_t ol_flags;
+	struct rte_mbuf *m;
+
+	for (i = 0; i < nb_pkts; i++) {
+		m = tx_pkts[i];
+		ol_flags = m->ol_flags;
+
+		/**
+		 * m->nb_segs is uint8_t, so it can never exceed
+		 * I40E_TX_MAX_SEG (UINT8_MAX). Only the condition
+		 * m->nb_segs > I40E_TX_MAX_MTU_SEG is checked here.
+		 */
+		if (!(ol_flags & PKT_TX_TCP_SEG)) {
+			if (m->nb_segs > I40E_TX_MAX_MTU_SEG) {
+				rte_errno = -EINVAL;
+				return i;
+			}
+		} else if ((m->tso_segsz < I40E_MIN_TSO_MSS) ||
+				(m->tso_segsz > I40E_MAX_TSO_MSS)) {
+			/* An MSS outside the range (256B - 9674B) is considered malicious */
+			rte_errno = -EINVAL;
+			return i;
+		}
+
+		if (ol_flags & I40E_TX_OFFLOAD_NOTSUP_MASK) {
+			rte_errno = -ENOTSUP;
+			return i;
+		}
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+		ret = rte_validate_tx_offload(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+#endif
+		ret = rte_phdr_cksum_fix(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+	}
+	return i;
+}
+
 /*
  * Find the VSI the queue belongs to. 'queue_idx' is the queue index
  * application used, which assume having sequential ones. But from driver's
@@ -2763,9 +2831,11 @@ i40e_set_tx_function(struct rte_eth_dev *dev)
 			PMD_INIT_LOG(DEBUG, "Simple tx finally be used.");
 			dev->tx_pkt_burst = i40e_xmit_pkts_simple;
 		}
+		dev->tx_pkt_prep = NULL;
 	} else {
 		PMD_INIT_LOG(DEBUG, "Xmit tx finally be used.");
 		dev->tx_pkt_burst = i40e_xmit_pkts;
+		dev->tx_pkt_prep = i40e_prep_pkts;
 	}
 }
 
diff --git a/drivers/net/i40e/i40e_rxtx.h b/drivers/net/i40e/i40e_rxtx.h
index ecdb13c..9df8a56 100644
--- a/drivers/net/i40e/i40e_rxtx.h
+++ b/drivers/net/i40e/i40e_rxtx.h
@@ -63,6 +63,12 @@
 #define	I40E_MIN_RING_DESC	64
 #define	I40E_MAX_RING_DESC	4096
 
+#define I40E_MIN_TSO_MSS          256
+#define I40E_MAX_TSO_MSS          9674
+
+#define I40E_TX_MAX_SEG     UINT8_MAX
+#define I40E_TX_MAX_MTU_SEG 8
+
 #undef container_of
 #define container_of(ptr, type, member) ({ \
 		typeof(((type *)0)->member)(*__mptr) = (ptr); \
@@ -223,6 +229,8 @@ uint16_t i40e_recv_scattered_pkts(void *rx_queue,
 uint16_t i40e_xmit_pkts(void *tx_queue,
 			struct rte_mbuf **tx_pkts,
 			uint16_t nb_pkts);
+uint16_t i40e_prep_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts);
 int i40e_tx_queue_init(struct i40e_tx_queue *txq);
 int i40e_rx_queue_init(struct i40e_rx_queue *rxq);
 void i40e_free_tx_resources(struct i40e_tx_queue *txq);
-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 261+ messages in thread

* [dpdk-dev] [PATCH v7 5/6] ixgbe: add Tx preparation
  2016-10-21 13:42           ` [dpdk-dev] [PATCH v7 " Tomasz Kulasek
                               ` (3 preceding siblings ...)
  2016-10-21 13:42             ` [dpdk-dev] [PATCH v7 4/6] i40e: " Tomasz Kulasek
@ 2016-10-21 13:42             ` Tomasz Kulasek
  2016-10-21 13:42             ` [dpdk-dev] [PATCH v7 6/6] testpmd: use Tx preparation in csum engine Tomasz Kulasek
  2016-10-21 14:46             ` [dpdk-dev] [PATCH v8 0/6] add Tx preparation Tomasz Kulasek
  6 siblings, 0 replies; 261+ messages in thread
From: Tomasz Kulasek @ 2016-10-21 13:42 UTC (permalink / raw)
  To: dev

Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
---
 drivers/net/ixgbe/ixgbe_ethdev.c |    3 ++
 drivers/net/ixgbe/ixgbe_ethdev.h |    5 +++-
 drivers/net/ixgbe/ixgbe_rxtx.c   |   58 +++++++++++++++++++++++++++++++++++++-
 drivers/net/ixgbe/ixgbe_rxtx.h   |    2 ++
 4 files changed, 66 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
index 4ca5747..4c6a8e1 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -517,6 +517,8 @@ static const struct rte_eth_desc_lim tx_desc_lim = {
 	.nb_max = IXGBE_MAX_RING_DESC,
 	.nb_min = IXGBE_MIN_RING_DESC,
 	.nb_align = IXGBE_TXD_ALIGN,
+	.nb_seg_max = IXGBE_TX_MAX_SEG,
+	.nb_mtu_seg_max = IXGBE_TX_MAX_SEG,
 };
 
 static const struct eth_dev_ops ixgbe_eth_dev_ops = {
@@ -1103,6 +1105,7 @@ eth_ixgbe_dev_init(struct rte_eth_dev *eth_dev)
 	eth_dev->dev_ops = &ixgbe_eth_dev_ops;
 	eth_dev->rx_pkt_burst = &ixgbe_recv_pkts;
 	eth_dev->tx_pkt_burst = &ixgbe_xmit_pkts;
+	eth_dev->tx_pkt_prep = &ixgbe_prep_pkts;
 
 	/*
 	 * For secondary processes, we don't initialise any further as primary
diff --git a/drivers/net/ixgbe/ixgbe_ethdev.h b/drivers/net/ixgbe/ixgbe_ethdev.h
index 4ff6338..e229cf5 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.h
+++ b/drivers/net/ixgbe/ixgbe_ethdev.h
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -396,6 +396,9 @@ uint16_t ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 uint16_t ixgbe_xmit_pkts_simple(void *tx_queue, struct rte_mbuf **tx_pkts,
 		uint16_t nb_pkts);
 
+uint16_t ixgbe_prep_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts);
+
 int ixgbe_dev_rss_hash_update(struct rte_eth_dev *dev,
 			      struct rte_eth_rss_conf *rss_conf);
 
diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
index 2ce8234..031414c 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   Copyright 2014 6WIND S.A.
  *   All rights reserved.
  *
@@ -70,6 +70,7 @@
 #include <rte_string_fns.h>
 #include <rte_errno.h>
 #include <rte_ip.h>
+#include <rte_net.h>
 
 #include "ixgbe_logs.h"
 #include "base/ixgbe_api.h"
@@ -87,6 +88,9 @@
 		PKT_TX_TCP_SEG |		 \
 		PKT_TX_OUTER_IP_CKSUM)
 
+#define IXGBE_TX_OFFLOAD_NOTSUP_MASK \
+		(PKT_TX_OFFLOAD_MASK ^ IXGBE_TX_OFFLOAD_MASK)
+
 #if 1
 #define RTE_PMD_USE_PREFETCH
 #endif
@@ -905,6 +909,56 @@ end_of_tx:
 
 /*********************************************************************
  *
+ *  TX prep functions
+ *
+ **********************************************************************/
+uint16_t
+ixgbe_prep_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
+{
+	int i, ret;
+	uint64_t ol_flags;
+	struct rte_mbuf *m;
+	struct ixgbe_tx_queue *txq = (struct ixgbe_tx_queue *)tx_queue;
+
+	for (i = 0; i < nb_pkts; i++) {
+		m = tx_pkts[i];
+		ol_flags = m->ol_flags;
+
+		/**
+		 * Check if the packet meets the requirements on the number
+		 * of segments.
+		 * NOTE: for ixgbe it's always (40 - WTHRESH) for both TSO and non-TSO.
+		 */
+
+		if (m->nb_segs > IXGBE_TX_MAX_SEG - txq->wthresh) {
+			rte_errno = -EINVAL;
+			return i;
+		}
+
+		if (ol_flags & IXGBE_TX_OFFLOAD_NOTSUP_MASK) {
+			rte_errno = -ENOTSUP;
+			return i;
+		}
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+		ret = rte_validate_tx_offload(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+#endif
+		ret = rte_phdr_cksum_fix(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+	}
+
+	return i;
+}
+
+/*********************************************************************
+ *
  *  RX functions
  *
  **********************************************************************/
@@ -2282,6 +2336,7 @@ ixgbe_set_tx_function(struct rte_eth_dev *dev, struct ixgbe_tx_queue *txq)
 	if (((txq->txq_flags & IXGBE_SIMPLE_FLAGS) == IXGBE_SIMPLE_FLAGS)
 			&& (txq->tx_rs_thresh >= RTE_PMD_IXGBE_TX_MAX_BURST)) {
 		PMD_INIT_LOG(DEBUG, "Using simple tx code path");
+		dev->tx_pkt_prep = NULL;
 #ifdef RTE_IXGBE_INC_VECTOR
 		if (txq->tx_rs_thresh <= RTE_IXGBE_TX_MAX_FREE_BUF_SZ &&
 				(rte_eal_process_type() != RTE_PROC_PRIMARY ||
@@ -2302,6 +2357,7 @@ ixgbe_set_tx_function(struct rte_eth_dev *dev, struct ixgbe_tx_queue *txq)
 				(unsigned long)txq->tx_rs_thresh,
 				(unsigned long)RTE_PMD_IXGBE_TX_MAX_BURST);
 		dev->tx_pkt_burst = ixgbe_xmit_pkts;
+		dev->tx_pkt_prep = ixgbe_prep_pkts;
 	}
 }
 
diff --git a/drivers/net/ixgbe/ixgbe_rxtx.h b/drivers/net/ixgbe/ixgbe_rxtx.h
index 2608b36..7bbd9b8 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx.h
+++ b/drivers/net/ixgbe/ixgbe_rxtx.h
@@ -80,6 +80,8 @@
 #define RTE_IXGBE_WAIT_100_US               100
 #define RTE_IXGBE_VMTXSW_REGISTER_COUNT     2
 
+#define IXGBE_TX_MAX_SEG                    40
+
 #define IXGBE_PACKET_TYPE_MASK_82599        0X7F
 #define IXGBE_PACKET_TYPE_MASK_X550         0X10FF
 #define IXGBE_PACKET_TYPE_MASK_TUNNEL       0XFF
-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 261+ messages in thread

* [dpdk-dev] [PATCH v7 6/6] testpmd: use Tx preparation in csum engine
  2016-10-21 13:42           ` [dpdk-dev] [PATCH v7 " Tomasz Kulasek
                               ` (4 preceding siblings ...)
  2016-10-21 13:42             ` [dpdk-dev] [PATCH v7 5/6] ixgbe: " Tomasz Kulasek
@ 2016-10-21 13:42             ` Tomasz Kulasek
  2016-10-21 14:46             ` [dpdk-dev] [PATCH v8 0/6] add Tx preparation Tomasz Kulasek
  6 siblings, 0 replies; 261+ messages in thread
From: Tomasz Kulasek @ 2016-10-21 13:42 UTC (permalink / raw)
  To: dev

Removed the pseudo header calculation for udp/tcp/tso packets from the
application and used the Tx preparation API for packet preparation and
verification.

Adding the additional step to the csum engine costs about a 3-4%
performance drop on my setup with the ixgbe driver. It's caused mostly
by the need to re-access and modify the packet data.

Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
---
 app/test-pmd/csumonly.c |   36 +++++++++++++-----------------------
 1 file changed, 13 insertions(+), 23 deletions(-)

diff --git a/app/test-pmd/csumonly.c b/app/test-pmd/csumonly.c
index 57e6ae2..6f33ae9 100644
--- a/app/test-pmd/csumonly.c
+++ b/app/test-pmd/csumonly.c
@@ -112,15 +112,6 @@ struct simple_gre_hdr {
 } __attribute__((__packed__));
 
 static uint16_t
-get_psd_sum(void *l3_hdr, uint16_t ethertype, uint64_t ol_flags)
-{
-	if (ethertype == _htons(ETHER_TYPE_IPv4))
-		return rte_ipv4_phdr_cksum(l3_hdr, ol_flags);
-	else /* assume ethertype == ETHER_TYPE_IPv6 */
-		return rte_ipv6_phdr_cksum(l3_hdr, ol_flags);
-}
-
-static uint16_t
 get_udptcp_checksum(void *l3_hdr, void *l4_hdr, uint16_t ethertype)
 {
 	if (ethertype == _htons(ETHER_TYPE_IPv4))
@@ -370,32 +361,24 @@ process_inner_cksums(void *l3_hdr, const struct testpmd_offload_info *info,
 		/* do not recalculate udp cksum if it was 0 */
 		if (udp_hdr->dgram_cksum != 0) {
 			udp_hdr->dgram_cksum = 0;
-			if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_UDP_CKSUM) {
+			if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_UDP_CKSUM)
 				ol_flags |= PKT_TX_UDP_CKSUM;
-				udp_hdr->dgram_cksum = get_psd_sum(l3_hdr,
-					info->ethertype, ol_flags);
-			} else {
+			else
 				udp_hdr->dgram_cksum =
 					get_udptcp_checksum(l3_hdr, udp_hdr,
 						info->ethertype);
-			}
 		}
 	} else if (info->l4_proto == IPPROTO_TCP) {
 		tcp_hdr = (struct tcp_hdr *)((char *)l3_hdr + info->l3_len);
 		tcp_hdr->cksum = 0;
-		if (tso_segsz) {
+		if (tso_segsz)
 			ol_flags |= PKT_TX_TCP_SEG;
-			tcp_hdr->cksum = get_psd_sum(l3_hdr, info->ethertype,
-				ol_flags);
-		} else if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_TCP_CKSUM) {
+		else if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_TCP_CKSUM)
 			ol_flags |= PKT_TX_TCP_CKSUM;
-			tcp_hdr->cksum = get_psd_sum(l3_hdr, info->ethertype,
-				ol_flags);
-		} else {
+		else
 			tcp_hdr->cksum =
 				get_udptcp_checksum(l3_hdr, tcp_hdr,
 					info->ethertype);
-		}
 	} else if (info->l4_proto == IPPROTO_SCTP) {
 		sctp_hdr = (struct sctp_hdr *)((char *)l3_hdr + info->l3_len);
 		sctp_hdr->cksum = 0;
@@ -648,6 +631,7 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 	void *l3_hdr = NULL, *outer_l3_hdr = NULL; /* can be IPv4 or IPv6 */
 	uint16_t nb_rx;
 	uint16_t nb_tx;
+	uint16_t nb_prep;
 	uint16_t i;
 	uint64_t rx_ol_flags, tx_ol_flags;
 	uint16_t testpmd_ol_flags;
@@ -857,7 +841,13 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 			printf("\n");
 		}
 	}
-	nb_tx = rte_eth_tx_burst(fs->tx_port, fs->tx_queue, pkts_burst, nb_rx);
+	nb_prep = rte_eth_tx_prep(fs->tx_port, fs->tx_queue, pkts_burst,
+				nb_rx);
+	if (nb_prep != nb_rx)
+		printf("Preparing packet burst to transmit failed: %s\n",
+				rte_strerror(rte_errno));
+
+	nb_tx = rte_eth_tx_burst(fs->tx_port, fs->tx_queue, pkts_burst, nb_prep);
 	/*
 	 * Retry if necessary
 	 */
-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 261+ messages in thread

* [dpdk-dev] [PATCH v8 0/6] add Tx preparation
  2016-10-21 13:42           ` [dpdk-dev] [PATCH v7 " Tomasz Kulasek
                               ` (5 preceding siblings ...)
  2016-10-21 13:42             ` [dpdk-dev] [PATCH v7 6/6] testpmd: use Tx preparation in csum engine Tomasz Kulasek
@ 2016-10-21 14:46             ` Tomasz Kulasek
  2016-10-21 14:46               ` [dpdk-dev] [PATCH v8 1/6] ethdev: " Tomasz Kulasek
                                 ` (6 more replies)
  6 siblings, 7 replies; 261+ messages in thread
From: Tomasz Kulasek @ 2016-10-21 14:46 UTC (permalink / raw)
  To: dev; +Cc: konstantin.ananyev, olivier.matz

From 35b09a978d244092337b6f46fd1309f8c733bb6b Mon Sep 17 00:00:00 2001
From: Tomasz Kulasek <tomaszx.kulasek@intel.com>
Date: Fri, 14 Oct 2016 16:10:35 +0200
Subject: [PATCH v6 0/6] add Tx preparation

As discussed in that thread:

http://dpdk.org/ml/archives/dev/2015-September/023603.html

Depending on the HW offloads requested, different NIC models might impose different requirements on packets to be TX-ed, in terms of:

 - Max number of fragments per packet allowed
 - Max number of fragments per TSO segment
 - The way pseudo-header checksum should be pre-calculated
 - L3/L4 header fields filling
 - etc.


MOTIVATION:
-----------

1) Some work cannot (and should not) be done in rte_eth_tx_burst.
   However, this work is sometimes required, and now it is left as an
   application issue.

2) Different hardware may have different requirements for TX offloads,
   a different subset may be supported, and so on.

3) Some parameters (e.g. number of segments in the ixgbe driver) may hang
   the device. These parameters may vary for different devices.

   For example, i40e HW allows 8 fragments per packet, but that is after
   TSO segmentation, while ixgbe has a 38-fragment pre-TSO limit.

4) Fields in the packet may require different initialization (e.g.
   pseudo-header checksum precalculation, sometimes done in a different
   way depending on the packet type, and so on). Now the application
   needs to take care of it.

5) Using an additional API (rte_eth_tx_prep) before rte_eth_tx_burst lets
   the application prepare a packet burst in a form acceptable to the
   specific device.

6) Some additional checks may be done in debug mode, keeping the tx_burst
   implementation clean.


PROPOSAL:
---------

To help users deal with all these varieties, we propose to:

1) Introduce rte_eth_tx_prep() function to do necessary preparations of
   packet burst to be safely transmitted on device for desired HW
   offloads (set/reset checksum field according to the hardware
   requirements) and check HW constraints (number of segments per
   packet, etc).

   Since the limitations and requirements may differ between devices, this
   requires extending the rte_eth_dev structure with a new function pointer,
   "tx_pkt_prep", which can be implemented in the driver to prepare and
   verify packets, in a device-specific way, before the burst; this should
   prevent the application from sending malformed packets.

2) Also, new fields will be introduced in rte_eth_desc_lim:
   nb_seg_max and nb_mtu_seg_max, providing information about the max
   number of segments in TSO and non-TSO packets acceptable to the device.

   This information is useful for the application to avoid creating
   malformed (potentially malicious) packets.


APPLICATION (CASE OF USE):
--------------------------

1) The application should initialize the burst of packets to send, and
   set the required tx offload flags and fields, like l2_len, l3_len,
   l4_len, and tso_segsz.

2) The application passes the burst to rte_eth_tx_prep to check the
   conditions required to send the packets through the NIC.

3) The result of rte_eth_tx_prep can be used to send the valid packets
   and/or to restore the invalid ones if the function fails.

e.g.

	for (i = 0; i < nb_pkts; i++) {

		/* initialize or process packet */

		bufs[i]->tso_segsz = 800;
		bufs[i]->ol_flags = PKT_TX_TCP_SEG | PKT_TX_IPV4
				| PKT_TX_IP_CKSUM;
		bufs[i]->l2_len = sizeof(struct ether_hdr);
		bufs[i]->l3_len = sizeof(struct ipv4_hdr);
		bufs[i]->l4_len = sizeof(struct tcp_hdr);
	}

	/* Prepare burst of TX packets */
	nb_prep = rte_eth_tx_prep(port, 0, bufs, nb_pkts);

	if (nb_prep < nb_pkts) {
		printf("tx_prep failed\n");

		/* nb_prep indicates the index of the first invalid packet.
		 * rte_eth_tx_prep can be called again on the remaining packets
		 * to find further invalid ones.
		 */

	}

	/* Send burst of TX packets */
	nb_tx = rte_eth_tx_burst(port, 0, bufs, nb_prep);

	/* Free any unsent packets. */
	for (i = nb_tx; i < nb_prep; i++)
		rte_pktmbuf_free(bufs[i]);

v8 changes:
 - mbuf argument in rte_validate_tx_offload declared as const

v7 changes:
 - comments reworded/added
 - changed errno values returned from Tx prep API
 - added a check in rte_phdr_cksum_fix that headers are in the first
   data segment and can be safely modified
 - moved rte_validate_tx_offload to rte_mbuf
 - moved rte_phdr_cksum_fix to rte_net.h
 - removed rte_pkt.h new file as useless

v5 changes:
 - rebased csum engine modification
 - added information to the csum engine about performance tests
 - some performance improvements

v4 changes:
 - tx_prep is now set to the default behavior (NULL) for the simple/vector
   paths in the fm10k, i40e and ixgbe drivers to increase performance,
   since Tx offloads are intentionally not available on those paths

v3 changes:
 - reworked the csum testpmd engine instead of adding a new one,
 - fixed the checksum initialization procedure to also include outer
   checksum offloads,
 - some minor formatting and optimization fixes

v2 changes:
 - rte_eth_tx_prep() returns the number of packets when the device
   doesn't support the tx_prep functionality,
 - introduced CONFIG_RTE_ETHDEV_TX_PREP to allow turning tx_prep off


Tomasz Kulasek (6):
  ethdev: add Tx preparation
  e1000: add Tx preparation
  fm10k: add Tx preparation
  i40e: add Tx preparation
  ixgbe: add Tx preparation
  testpmd: use Tx preparation in csum engine

 app/test-pmd/csumonly.c          |   36 +++++---------
 config/common_base               |    1 +
 drivers/net/e1000/e1000_ethdev.h |   11 +++++
 drivers/net/e1000/em_ethdev.c    |    5 +-
 drivers/net/e1000/em_rxtx.c      |   48 ++++++++++++++++++-
 drivers/net/e1000/igb_ethdev.c   |    4 ++
 drivers/net/e1000/igb_rxtx.c     |   52 +++++++++++++++++++-
 drivers/net/fm10k/fm10k.h        |    6 +++
 drivers/net/fm10k/fm10k_ethdev.c |    5 ++
 drivers/net/fm10k/fm10k_rxtx.c   |   50 +++++++++++++++++++-
 drivers/net/i40e/i40e_ethdev.c   |    3 ++
 drivers/net/i40e/i40e_rxtx.c     |   72 +++++++++++++++++++++++++++-
 drivers/net/i40e/i40e_rxtx.h     |    8 ++++
 drivers/net/ixgbe/ixgbe_ethdev.c |    3 ++
 drivers/net/ixgbe/ixgbe_ethdev.h |    5 +-
 drivers/net/ixgbe/ixgbe_rxtx.c   |   58 ++++++++++++++++++++++-
 drivers/net/ixgbe/ixgbe_rxtx.h   |    2 +
 lib/librte_ether/rte_ethdev.h    |   97 ++++++++++++++++++++++++++++++++++++++
 lib/librte_mbuf/rte_mbuf.h       |   56 ++++++++++++++++++++++
 lib/librte_net/rte_net.h         |   90 +++++++++++++++++++++++++++++++++++
 20 files changed, 582 insertions(+), 30 deletions(-)

-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 261+ messages in thread

* [dpdk-dev] [PATCH v8 1/6] ethdev: add Tx preparation
  2016-10-21 14:46             ` [dpdk-dev] [PATCH v8 0/6] add Tx preparation Tomasz Kulasek
@ 2016-10-21 14:46               ` Tomasz Kulasek
  2016-10-24 12:14                 ` Ananyev, Konstantin
  2016-10-21 14:46               ` [dpdk-dev] [PATCH v8 2/6] e1000: " Tomasz Kulasek
                                 ` (5 subsequent siblings)
  6 siblings, 1 reply; 261+ messages in thread
From: Tomasz Kulasek @ 2016-10-21 14:46 UTC (permalink / raw)
  To: dev; +Cc: konstantin.ananyev, olivier.matz

Added API for `rte_eth_tx_prep`

uint16_t rte_eth_tx_prep(uint8_t port_id, uint16_t queue_id,
	struct rte_mbuf **tx_pkts, uint16_t nb_pkts)

Added fields to the `struct rte_eth_desc_lim`:

	uint16_t nb_seg_max;
		/**< Max number of segments per whole packet. */

	uint16_t nb_mtu_seg_max;
		/**< Max number of segments per one MTU */

Added functions:

int rte_validate_tx_offload(const struct rte_mbuf *m)
	to validate the general requirements for the tx offloads set in the
	mbuf of a packet, such as flag completeness. In the current
	implementation this function is called optionally, when
	RTE_LIBRTE_ETHDEV_DEBUG is enabled.

int rte_phdr_cksum_fix(struct rte_mbuf *m)
	to fix the pseudo-header checksum for TSO and non-TSO tcp/udp packets
	before hardware tx checksum offload:
	 - for non-TSO tcp/udp packets the full pseudo-header checksum is
	   computed and set,
	 - for TSO the IP payload length is not included in the pseudo-header.
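
A driver-specific prep callback built on these helpers is expected to
look roughly like the sketch below (it mirrors the driver patches later
in this series; DRV_TX_OFFLOAD_NOTSUP_MASK stands for a hypothetical
per-driver mask of offload flags the device does not support):

	uint16_t
	drv_prep_pkts(__rte_unused void *tx_queue, struct rte_mbuf **tx_pkts,
			uint16_t nb_pkts)
	{
		int i, ret;
		struct rte_mbuf *m;

		for (i = 0; i < nb_pkts; i++) {
			m = tx_pkts[i];

			/* reject offload flags the device cannot handle */
			if (m->ol_flags & DRV_TX_OFFLOAD_NOTSUP_MASK) {
				rte_errno = -ENOTSUP;
				return i;
			}

	#ifdef RTE_LIBRTE_ETHDEV_DEBUG
			/* optional checks, compiled in for debug builds */
			ret = rte_validate_tx_offload(m);
			if (ret != 0) {
				rte_errno = ret;
				return i;
			}
	#endif
			/* set/reset the pseudo-header checksum as needed */
			ret = rte_phdr_cksum_fix(m);
			if (ret != 0) {
				rte_errno = ret;
				return i;
			}
		}

		return i;
	}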

PERFORMANCE TESTS
-----------------

This feature was tested with a modified csum engine from test-pmd.

The packet checksum preparation was moved from the application to the Tx
preparation step performed before the burst.

We may expect some overhead, caused by:
1) using an additional callback before the burst,
2) rescanning the burst,
3) additional condition checks (packet validation),
4) reduced optimization opportunities (e.g. extra packet data accesses).

We tested it using the ixgbe Tx preparation implementation with some
parts disabled, to obtain comparable information about the impact of the
different parts of the implementation.

IMPACT:

1) When the Tx preparation callback is not implemented, the performance
   impact is negligible,
2) Packet condition checks without checksum modification (nb_segs,
   available offloads, etc.) yield 14626628 vs 14252168 (~2.62% drop),
3) Full support in the ixgbe driver (point 2 plus packet checksum
   initialization) yields 14060924 vs 13588094 (~3.48% drop).

Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
---
 config/common_base            |    1 +
 lib/librte_ether/rte_ethdev.h |   97 +++++++++++++++++++++++++++++++++++++++++
 lib/librte_mbuf/rte_mbuf.h    |   56 ++++++++++++++++++++++++
 lib/librte_net/rte_net.h      |   90 ++++++++++++++++++++++++++++++++++++++
 4 files changed, 244 insertions(+)

diff --git a/config/common_base b/config/common_base
index c7fd3db..619284b 100644
--- a/config/common_base
+++ b/config/common_base
@@ -120,6 +120,7 @@ CONFIG_RTE_MAX_QUEUES_PER_PORT=1024
 CONFIG_RTE_LIBRTE_IEEE1588=n
 CONFIG_RTE_ETHDEV_QUEUE_STAT_CNTRS=16
 CONFIG_RTE_ETHDEV_RXTX_CALLBACKS=y
+CONFIG_RTE_ETHDEV_TX_PREP=y
 
 #
 # Support NIC bypass logic
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 38641e8..d548d48 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -182,6 +182,7 @@ extern "C" {
 #include <rte_pci.h>
 #include <rte_dev.h>
 #include <rte_devargs.h>
+#include <rte_errno.h>
 #include "rte_ether.h"
 #include "rte_eth_ctrl.h"
 #include "rte_dev_info.h"
@@ -699,6 +700,8 @@ struct rte_eth_desc_lim {
 	uint16_t nb_max;   /**< Max allowed number of descriptors. */
 	uint16_t nb_min;   /**< Min allowed number of descriptors. */
 	uint16_t nb_align; /**< Number of descriptors should be aligned to. */
+	uint16_t nb_seg_max;     /**< Max number of segments per whole packet. */
+	uint16_t nb_mtu_seg_max; /**< Max number of segments per one MTU */
 };
 
 /**
@@ -1188,6 +1191,11 @@ typedef uint16_t (*eth_tx_burst_t)(void *txq,
 				   uint16_t nb_pkts);
 /**< @internal Send output packets on a transmit queue of an Ethernet device. */
 
+typedef uint16_t (*eth_tx_prep_t)(void *txq,
+				   struct rte_mbuf **tx_pkts,
+				   uint16_t nb_pkts);
+/**< @internal Prepare output packets on a transmit queue of an Ethernet device. */
+
 typedef int (*flow_ctrl_get_t)(struct rte_eth_dev *dev,
 			       struct rte_eth_fc_conf *fc_conf);
 /**< @internal Get current flow control parameter on an Ethernet device */
@@ -1622,6 +1630,7 @@ struct rte_eth_rxtx_callback {
 struct rte_eth_dev {
 	eth_rx_burst_t rx_pkt_burst; /**< Pointer to PMD receive function. */
 	eth_tx_burst_t tx_pkt_burst; /**< Pointer to PMD transmit function. */
+	eth_tx_prep_t tx_pkt_prep; /**< Pointer to PMD transmit prepare function. */
 	struct rte_eth_dev_data *data;  /**< Pointer to device data */
 	const struct eth_driver *driver;/**< Driver for this device */
 	const struct eth_dev_ops *dev_ops; /**< Functions exported by PMD */
@@ -2816,6 +2825,94 @@ rte_eth_tx_burst(uint8_t port_id, uint16_t queue_id,
 	return (*dev->tx_pkt_burst)(dev->data->tx_queues[queue_id], tx_pkts, nb_pkts);
 }
 
+/**
+ * Process a burst of output packets on a transmit queue of an Ethernet device.
+ *
+ * The rte_eth_tx_prep() function is invoked to prepare output packets to be
+ * transmitted on the output queue *queue_id* of the Ethernet device designated
+ * by its *port_id*.
+ * The *nb_pkts* parameter is the number of packets to be prepared which are
+ * supplied in the *tx_pkts* array of *rte_mbuf* structures, each of them
+ * allocated from a pool created with rte_pktmbuf_pool_create().
+ * For each packet to send, the rte_eth_tx_prep() function performs
+ * the following operations:
+ *
+ * - Check if the packet meets the device's requirements for tx offloads.
+ *
+ * - Check limitations on the number of segments.
+ *
+ * - Check additional requirements when debug is enabled.
+ *
+ * - Update and/or reset required checksums when a tx offload is set for the packet.
+ *
+ * The rte_eth_tx_prep() function returns the number of packets ready to be
+ * sent. A return value equal to *nb_pkts* means that all packets are valid and
+ * ready to be sent.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ *   The value must be a valid port id.
+ * @param queue_id
+ *   The index of the transmit queue through which output packets must be
+ *   sent.
+ *   The value must be in the range [0, nb_tx_queue - 1] previously supplied
+ *   to rte_eth_dev_configure().
+ * @param tx_pkts
+ *   The address of an array of *nb_pkts* pointers to *rte_mbuf* structures
+ *   which contain the output packets.
+ * @param nb_pkts
+ *   The maximum number of packets to process.
+ * @return
+ *   The number of packets valid and ready to be sent. The return value can be
+ *   less than the value of the *nb_pkts* parameter when some packet doesn't
+ *   meet the device's requirements; rte_errno is set appropriately.
+ */
+
+#ifdef RTE_ETHDEV_TX_PREP
+
+static inline uint16_t
+rte_eth_tx_prep(uint8_t port_id, uint16_t queue_id, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts)
+{
+	struct rte_eth_dev *dev;
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+	if (!rte_eth_dev_is_valid_port(port_id)) {
+		RTE_PMD_DEBUG_TRACE("Invalid TX port_id=%d\n", port_id);
+		rte_errno = -EINVAL;
+		return 0;
+	}
+#endif
+
+	dev = &rte_eth_devices[port_id];
+
+	if (!dev->tx_pkt_prep)
+		return nb_pkts;
+
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+	if (queue_id >= dev->data->nb_tx_queues) {
+		RTE_PMD_DEBUG_TRACE("Invalid TX queue_id=%d\n", queue_id);
+		rte_errno = -EINVAL;
+		return 0;
+	}
+#endif
+
+	return (*dev->tx_pkt_prep)(dev->data->tx_queues[queue_id],
+			tx_pkts, nb_pkts);
+}
+
+#else
+
+static inline uint16_t
+rte_eth_tx_prep(__rte_unused uint8_t port_id, __rte_unused uint16_t queue_id,
+		__rte_unused struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
+{
+	return nb_pkts;
+}
+
+#endif
+
 typedef void (*buffer_tx_error_fn)(struct rte_mbuf **unsent, uint16_t count,
 		void *userdata);
 
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index 109e666..db4c99a 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -283,6 +283,19 @@ extern "C" {
  */
 #define PKT_TX_OUTER_IPV6    (1ULL << 60)
 
+/**
+ * Bitmask of all supported packet Tx offload feature flags that can be set
+ * for a packet.
+ */
+#define PKT_TX_OFFLOAD_MASK (    \
+		PKT_TX_IP_CKSUM |        \
+		PKT_TX_L4_MASK |         \
+		PKT_TX_OUTER_IP_CKSUM |  \
+		PKT_TX_TCP_SEG |         \
+		PKT_TX_QINQ_PKT |        \
+		PKT_TX_VLAN_PKT |        \
+		PKT_TX_TUNNEL_MASK)
+
 #define __RESERVED           (1ULL << 61) /**< reserved for future mbuf use */
 
 #define IND_ATTACHED_MBUF    (1ULL << 62) /**< Indirect attached mbuf */
@@ -1647,6 +1660,49 @@ static inline int rte_pktmbuf_chain(struct rte_mbuf *head, struct rte_mbuf *tail
 }
 
 /**
+ * Validate general requirements for tx offload in mbuf.
+ *
+ * This function checks correctness and completeness of Tx offload settings.
+ *
+ * @param m
+ *   The packet mbuf to be validated.
+ * @return
+ *   0 if packet is valid
+ */
+static inline int
+rte_validate_tx_offload(const struct rte_mbuf *m)
+{
+	uint64_t ol_flags = m->ol_flags;
+
+	/* Does packet set any of available offloads? */
+	if (!(ol_flags & PKT_TX_OFFLOAD_MASK))
+		return 0;
+
+	/* IP checksum can be counted only for IPv4 packet */
+	if ((ol_flags & PKT_TX_IP_CKSUM) && (ol_flags & PKT_TX_IPV6))
+		return -EINVAL;
+
+	/* IP type not set when required */
+	if (ol_flags & (PKT_TX_L4_MASK | PKT_TX_TCP_SEG))
+		if (!(ol_flags & (PKT_TX_IPV4 | PKT_TX_IPV6)))
+			return -EINVAL;
+
+	/* Check requirements for TSO packet */
+	if (ol_flags & PKT_TX_TCP_SEG)
+		if ((m->tso_segsz == 0) ||
+				((ol_flags & PKT_TX_IPV4) &&
+				!(ol_flags & PKT_TX_IP_CKSUM)))
+			return -EINVAL;
+
+	/* PKT_TX_OUTER_IP_CKSUM set for non outer IPv4 packet. */
+	if ((ol_flags & PKT_TX_OUTER_IP_CKSUM) &&
+			!(ol_flags & PKT_TX_OUTER_IPV4))
+		return -EINVAL;
+
+	return 0;
+}
+
+/**
  * Dump an mbuf structure to a file.
  *
  * Dump all fields for the given packet mbuf and all its associated
diff --git a/lib/librte_net/rte_net.h b/lib/librte_net/rte_net.h
index d4156ae..79669d7 100644
--- a/lib/librte_net/rte_net.h
+++ b/lib/librte_net/rte_net.h
@@ -38,6 +38,11 @@
 extern "C" {
 #endif
 
+#include <rte_ip.h>
+#include <rte_udp.h>
+#include <rte_tcp.h>
+#include <rte_sctp.h>
+
 /**
  * Structure containing header lengths associated to a packet, filled
  * by rte_net_get_ptype().
@@ -86,6 +91,91 @@ struct rte_net_hdr_lens {
 uint32_t rte_net_get_ptype(const struct rte_mbuf *m,
 	struct rte_net_hdr_lens *hdr_lens, uint32_t layers);
 
+/**
+ * Fix pseudo header checksum
+ *
+ * This function fixes pseudo header checksum for TSO and non-TSO tcp/udp in
+ * provided mbufs packet data.
+ *
+ * - for non-TSO tcp/udp packets full pseudo-header checksum is counted and set
+ *   in packet data,
+ * - for TSO the IP payload length is not included in pseudo header.
+ *
+ * This function expects that used headers are in the first data segment of
+ * mbuf, and are not fragmented.
+ *
+ * @param m
+ *   The packet mbuf to be validated.
+ * @return
+ *   0 if checksum is initialized properly
+ */
+static inline int
+rte_phdr_cksum_fix(struct rte_mbuf *m)
+{
+	struct ipv4_hdr *ipv4_hdr;
+	struct ipv6_hdr *ipv6_hdr;
+	struct tcp_hdr *tcp_hdr;
+	struct udp_hdr *udp_hdr;
+	uint64_t ol_flags = m->ol_flags;
+	uint64_t inner_l3_offset = m->l2_len;
+
+	if (ol_flags & PKT_TX_OUTER_IP_CKSUM)
+		inner_l3_offset += m->outer_l2_len + m->outer_l3_len;
+
+	/* headers are fragmented */
+	if (unlikely(rte_pktmbuf_data_len(m) >= inner_l3_offset + m->l3_len +
+			m->l4_len))
+		return -ENOTSUP;
+
+	if ((ol_flags & PKT_TX_UDP_CKSUM) == PKT_TX_UDP_CKSUM) {
+		if (ol_flags & PKT_TX_IPV4) {
+			ipv4_hdr = rte_pktmbuf_mtod_offset(m, struct ipv4_hdr *,
+					inner_l3_offset);
+
+			if (ol_flags & PKT_TX_IP_CKSUM)
+				ipv4_hdr->hdr_checksum = 0;
+
+			udp_hdr = (struct udp_hdr *)((char *)ipv4_hdr +
+					m->l3_len);
+			udp_hdr->dgram_cksum = rte_ipv4_phdr_cksum(ipv4_hdr,
+					ol_flags);
+		} else {
+			ipv6_hdr = rte_pktmbuf_mtod_offset(m, struct ipv6_hdr *,
+					inner_l3_offset);
+			/* non-TSO udp */
+			udp_hdr = rte_pktmbuf_mtod_offset(m, struct udp_hdr *,
+					inner_l3_offset + m->l3_len);
+			udp_hdr->dgram_cksum = rte_ipv6_phdr_cksum(ipv6_hdr,
+					ol_flags);
+		}
+	} else if ((ol_flags & PKT_TX_TCP_CKSUM) ||
+			(ol_flags & PKT_TX_TCP_SEG)) {
+		if (ol_flags & PKT_TX_IPV4) {
+			ipv4_hdr = rte_pktmbuf_mtod_offset(m, struct ipv4_hdr *,
+					inner_l3_offset);
+
+			if (ol_flags & PKT_TX_IP_CKSUM)
+				ipv4_hdr->hdr_checksum = 0;
+
+			/* non-TSO tcp or TSO */
+			tcp_hdr = (struct tcp_hdr *)((char *)ipv4_hdr +
+					m->l3_len);
+			tcp_hdr->cksum = rte_ipv4_phdr_cksum(ipv4_hdr,
+					ol_flags);
+		} else {
+			ipv6_hdr = rte_pktmbuf_mtod_offset(m, struct ipv6_hdr *,
+					inner_l3_offset);
+			/* non-TSO tcp or TSO */
+			tcp_hdr = rte_pktmbuf_mtod_offset(m, struct tcp_hdr *,
+					inner_l3_offset + m->l3_len);
+			tcp_hdr->cksum = rte_ipv6_phdr_cksum(ipv6_hdr,
+					ol_flags);
+		}
+	}
+
+	return 0;
+}
+
 #ifdef __cplusplus
 }
 #endif
-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 261+ messages in thread

* [dpdk-dev] [PATCH v8 2/6] e1000: add Tx preparation
  2016-10-21 14:46             ` [dpdk-dev] [PATCH v8 0/6] add Tx preparation Tomasz Kulasek
  2016-10-21 14:46               ` [dpdk-dev] [PATCH v8 1/6] ethdev: " Tomasz Kulasek
@ 2016-10-21 14:46               ` Tomasz Kulasek
  2016-10-21 14:46               ` [dpdk-dev] [PATCH v8 3/6] fm10k: " Tomasz Kulasek
                                 ` (4 subsequent siblings)
  6 siblings, 0 replies; 261+ messages in thread
From: Tomasz Kulasek @ 2016-10-21 14:46 UTC (permalink / raw)
  To: dev; +Cc: konstantin.ananyev, olivier.matz

Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
---
 drivers/net/e1000/e1000_ethdev.h |   11 ++++++++
 drivers/net/e1000/em_ethdev.c    |    5 +++-
 drivers/net/e1000/em_rxtx.c      |   48 ++++++++++++++++++++++++++++++++++-
 drivers/net/e1000/igb_ethdev.c   |    4 +++
 drivers/net/e1000/igb_rxtx.c     |   52 +++++++++++++++++++++++++++++++++++++-
 5 files changed, 117 insertions(+), 3 deletions(-)

diff --git a/drivers/net/e1000/e1000_ethdev.h b/drivers/net/e1000/e1000_ethdev.h
index 6c25c8d..bd0f277 100644
--- a/drivers/net/e1000/e1000_ethdev.h
+++ b/drivers/net/e1000/e1000_ethdev.h
@@ -138,6 +138,11 @@
 #define E1000_MISC_VEC_ID               RTE_INTR_VEC_ZERO_OFFSET
 #define E1000_RX_VEC_START              RTE_INTR_VEC_RXTX_OFFSET
 
+#define IGB_TX_MAX_SEG     UINT8_MAX
+#define IGB_TX_MAX_MTU_SEG UINT8_MAX
+#define EM_TX_MAX_SEG      UINT8_MAX
+#define EM_TX_MAX_MTU_SEG  UINT8_MAX
+
 /* structure for interrupt relative data */
 struct e1000_interrupt {
 	uint32_t flags;
@@ -315,6 +320,9 @@ void eth_igb_tx_init(struct rte_eth_dev *dev);
 uint16_t eth_igb_xmit_pkts(void *txq, struct rte_mbuf **tx_pkts,
 		uint16_t nb_pkts);
 
+uint16_t eth_igb_prep_pkts(void *txq, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts);
+
 uint16_t eth_igb_recv_pkts(void *rxq, struct rte_mbuf **rx_pkts,
 		uint16_t nb_pkts);
 
@@ -376,6 +384,9 @@ void eth_em_tx_init(struct rte_eth_dev *dev);
 uint16_t eth_em_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		uint16_t nb_pkts);
 
+uint16_t eth_em_prep_pkts(void *txq, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts);
+
 uint16_t eth_em_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
 		uint16_t nb_pkts);
 
diff --git a/drivers/net/e1000/em_ethdev.c b/drivers/net/e1000/em_ethdev.c
index 7cf5f0c..17b45cb 100644
--- a/drivers/net/e1000/em_ethdev.c
+++ b/drivers/net/e1000/em_ethdev.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -300,6 +300,7 @@ eth_em_dev_init(struct rte_eth_dev *eth_dev)
 	eth_dev->dev_ops = &eth_em_ops;
 	eth_dev->rx_pkt_burst = (eth_rx_burst_t)&eth_em_recv_pkts;
 	eth_dev->tx_pkt_burst = (eth_tx_burst_t)&eth_em_xmit_pkts;
+	eth_dev->tx_pkt_prep = (eth_tx_prep_t)&eth_em_prep_pkts;
 
 	/* for secondary processes, we don't initialise any further as primary
 	 * has already done this work. Only check we don't need a different
@@ -1067,6 +1068,8 @@ eth_em_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 		.nb_max = E1000_MAX_RING_DESC,
 		.nb_min = E1000_MIN_RING_DESC,
 		.nb_align = EM_TXD_ALIGN,
+		.nb_seg_max = EM_TX_MAX_SEG,
+		.nb_mtu_seg_max = EM_TX_MAX_MTU_SEG,
 	};
 
 	dev_info->speed_capa = ETH_LINK_SPEED_10M_HD | ETH_LINK_SPEED_10M |
diff --git a/drivers/net/e1000/em_rxtx.c b/drivers/net/e1000/em_rxtx.c
index 41f51c0..5bd3c99 100644
--- a/drivers/net/e1000/em_rxtx.c
+++ b/drivers/net/e1000/em_rxtx.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -66,6 +66,7 @@
 #include <rte_udp.h>
 #include <rte_tcp.h>
 #include <rte_sctp.h>
+#include <rte_net.h>
 #include <rte_string_fns.h>
 
 #include "e1000_logs.h"
@@ -77,6 +78,14 @@
 
 #define E1000_RXDCTL_GRAN	0x01000000 /* RXDCTL Granularity */
 
+#define E1000_TX_OFFLOAD_MASK ( \
+		PKT_TX_IP_CKSUM |       \
+		PKT_TX_L4_MASK |        \
+		PKT_TX_VLAN_PKT)
+
+#define E1000_TX_OFFLOAD_NOTSUP_MASK \
+		(PKT_TX_OFFLOAD_MASK ^ E1000_TX_OFFLOAD_MASK)
+
 /**
  * Structure associated with each descriptor of the RX ring of a RX queue.
  */
@@ -618,6 +627,43 @@ end_of_tx:
 
 /*********************************************************************
  *
+ *  TX prep functions
+ *
+ **********************************************************************/
+uint16_t
+eth_em_prep_pkts(__rte_unused void *tx_queue, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts)
+{
+	int i, ret;
+	struct rte_mbuf *m;
+
+	for (i = 0; i < nb_pkts; i++) {
+		m = tx_pkts[i];
+
+		if (m->ol_flags & E1000_TX_OFFLOAD_NOTSUP_MASK) {
+			rte_errno = -ENOTSUP;
+			return i;
+		}
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+		ret = rte_validate_tx_offload(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+#endif
+		ret = rte_phdr_cksum_fix(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+	}
+
+	return i;
+}
+
+/*********************************************************************
+ *
  *  RX functions
  *
  **********************************************************************/
diff --git a/drivers/net/e1000/igb_ethdev.c b/drivers/net/e1000/igb_ethdev.c
index 4924396..0afdd09 100644
--- a/drivers/net/e1000/igb_ethdev.c
+++ b/drivers/net/e1000/igb_ethdev.c
@@ -369,6 +369,8 @@ static const struct rte_eth_desc_lim tx_desc_lim = {
 	.nb_max = E1000_MAX_RING_DESC,
 	.nb_min = E1000_MIN_RING_DESC,
 	.nb_align = IGB_RXD_ALIGN,
+	.nb_seg_max = IGB_TX_MAX_SEG,
+	.nb_mtu_seg_max = IGB_TX_MAX_MTU_SEG,
 };
 
 static const struct eth_dev_ops eth_igb_ops = {
@@ -760,6 +762,7 @@ eth_igb_dev_init(struct rte_eth_dev *eth_dev)
 	eth_dev->dev_ops = &eth_igb_ops;
 	eth_dev->rx_pkt_burst = &eth_igb_recv_pkts;
 	eth_dev->tx_pkt_burst = &eth_igb_xmit_pkts;
+	eth_dev->tx_pkt_prep = &eth_igb_prep_pkts;
 
 	/* for secondary processes, we don't initialise any further as primary
 	 * has already done this work. Only check we don't need a different
@@ -963,6 +966,7 @@ eth_igbvf_dev_init(struct rte_eth_dev *eth_dev)
 	eth_dev->dev_ops = &igbvf_eth_dev_ops;
 	eth_dev->rx_pkt_burst = &eth_igb_recv_pkts;
 	eth_dev->tx_pkt_burst = &eth_igb_xmit_pkts;
+	eth_dev->tx_pkt_prep = &eth_igb_prep_pkts;
 
 	/* for secondary processes, we don't initialise any further as primary
 	 * has already done this work. Only check we don't need a different
diff --git a/drivers/net/e1000/igb_rxtx.c b/drivers/net/e1000/igb_rxtx.c
index dbd37ac..08e47f2 100644
--- a/drivers/net/e1000/igb_rxtx.c
+++ b/drivers/net/e1000/igb_rxtx.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -65,6 +65,7 @@
 #include <rte_udp.h>
 #include <rte_tcp.h>
 #include <rte_sctp.h>
+#include <rte_net.h>
 #include <rte_string_fns.h>
 
 #include "e1000_logs.h"
@@ -78,6 +79,9 @@
 		PKT_TX_L4_MASK |		 \
 		PKT_TX_TCP_SEG)
 
+#define IGB_TX_OFFLOAD_NOTSUP_MASK \
+		(PKT_TX_OFFLOAD_MASK ^ IGB_TX_OFFLOAD_MASK)
+
 /**
  * Structure associated with each descriptor of the RX ring of a RX queue.
  */
@@ -616,6 +620,51 @@ eth_igb_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 
 /*********************************************************************
  *
+ *  TX prep functions
+ *
+ **********************************************************************/
+uint16_t
+eth_igb_prep_pkts(__rte_unused void *tx_queue, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts)
+{
+	int i, ret;
+	struct rte_mbuf *m;
+
+	for (i = 0; i < nb_pkts; i++) {
+		m = tx_pkts[i];
+
+		/* Check some limitations for TSO in hardware */
+		if (m->ol_flags & PKT_TX_TCP_SEG)
+			if ((m->tso_segsz > IGB_TSO_MAX_MSS) || (m->l2_len + m->l3_len +
+					m->l4_len > IGB_TSO_MAX_HDRLEN)) {
+				rte_errno = -EINVAL;
+				return i;
+			}
+
+		if (m->ol_flags & IGB_TX_OFFLOAD_NOTSUP_MASK) {
+			rte_errno = -ENOTSUP;
+			return i;
+		}
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+		ret = rte_validate_tx_offload(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+#endif
+		ret = rte_phdr_cksum_fix(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+	}
+
+	return i;
+}
+
+/*********************************************************************
+ *
  *  RX functions
  *
  **********************************************************************/
@@ -1364,6 +1413,7 @@ eth_igb_tx_queue_setup(struct rte_eth_dev *dev,
 
 	igb_reset_tx_queue(txq, dev);
 	dev->tx_pkt_burst = eth_igb_xmit_pkts;
+	dev->tx_pkt_prep = &eth_igb_prep_pkts;
 	dev->data->tx_queues[queue_idx] = txq;
 
 	return 0;
-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 261+ messages in thread

* [dpdk-dev] [PATCH v8 3/6] fm10k: add Tx preparation
  2016-10-21 14:46             ` [dpdk-dev] [PATCH v8 0/6] add Tx preparation Tomasz Kulasek
  2016-10-21 14:46               ` [dpdk-dev] [PATCH v8 1/6] ethdev: " Tomasz Kulasek
  2016-10-21 14:46               ` [dpdk-dev] [PATCH v8 2/6] e1000: " Tomasz Kulasek
@ 2016-10-21 14:46               ` Tomasz Kulasek
  2016-10-21 14:46               ` [dpdk-dev] [PATCH v8 4/6] i40e: " Tomasz Kulasek
                                 ` (3 subsequent siblings)
  6 siblings, 0 replies; 261+ messages in thread
From: Tomasz Kulasek @ 2016-10-21 14:46 UTC (permalink / raw)
  To: dev; +Cc: konstantin.ananyev, olivier.matz

Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
---
 drivers/net/fm10k/fm10k.h        |    6 +++++
 drivers/net/fm10k/fm10k_ethdev.c |    5 ++++
 drivers/net/fm10k/fm10k_rxtx.c   |   50 +++++++++++++++++++++++++++++++++++++-
 3 files changed, 60 insertions(+), 1 deletion(-)

diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
index 05aa1a2..c6fed21 100644
--- a/drivers/net/fm10k/fm10k.h
+++ b/drivers/net/fm10k/fm10k.h
@@ -69,6 +69,9 @@
 #define FM10K_MAX_RX_DESC  (FM10K_MAX_RX_RING_SZ / sizeof(union fm10k_rx_desc))
 #define FM10K_MAX_TX_DESC  (FM10K_MAX_TX_RING_SZ / sizeof(struct fm10k_tx_desc))
 
+#define FM10K_TX_MAX_SEG     UINT8_MAX
+#define FM10K_TX_MAX_MTU_SEG UINT8_MAX
+
 /*
  * byte aligment for HW RX data buffer
  * Datasheet requires RX buffer addresses shall either be 512-byte aligned or
@@ -356,6 +359,9 @@ fm10k_dev_rx_descriptor_done(void *rx_queue, uint16_t offset);
 uint16_t fm10k_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 	uint16_t nb_pkts);
 
+uint16_t fm10k_prep_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
+	uint16_t nb_pkts);
+
 int fm10k_rxq_vec_setup(struct fm10k_rx_queue *rxq);
 int fm10k_rx_vec_condition_check(struct rte_eth_dev *);
 void fm10k_rx_queue_release_mbufs_vec(struct fm10k_rx_queue *rxq);
diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index c804436..dffb6d1 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -1446,6 +1446,8 @@ fm10k_dev_infos_get(struct rte_eth_dev *dev,
 		.nb_max = FM10K_MAX_TX_DESC,
 		.nb_min = FM10K_MIN_TX_DESC,
 		.nb_align = FM10K_MULT_TX_DESC,
+		.nb_seg_max = FM10K_TX_MAX_SEG,
+		.nb_mtu_seg_max = FM10K_TX_MAX_MTU_SEG,
 	};
 
 	dev_info->speed_capa = ETH_LINK_SPEED_1G | ETH_LINK_SPEED_2_5G |
@@ -2754,8 +2756,10 @@ fm10k_set_tx_function(struct rte_eth_dev *dev)
 			fm10k_txq_vec_setup(txq);
 		}
 		dev->tx_pkt_burst = fm10k_xmit_pkts_vec;
+		dev->tx_pkt_prep = NULL;
 	} else {
 		dev->tx_pkt_burst = fm10k_xmit_pkts;
+		dev->tx_pkt_prep = fm10k_prep_pkts;
 		PMD_INIT_LOG(DEBUG, "Use regular Tx func");
 	}
 }
@@ -2834,6 +2838,7 @@ eth_fm10k_dev_init(struct rte_eth_dev *dev)
 	dev->dev_ops = &fm10k_eth_dev_ops;
 	dev->rx_pkt_burst = &fm10k_recv_pkts;
 	dev->tx_pkt_burst = &fm10k_xmit_pkts;
+	dev->tx_pkt_prep = &fm10k_prep_pkts;
 
 	/* only initialize in the primary process */
 	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
diff --git a/drivers/net/fm10k/fm10k_rxtx.c b/drivers/net/fm10k/fm10k_rxtx.c
index 32cc7ff..5fc4d5a 100644
--- a/drivers/net/fm10k/fm10k_rxtx.c
+++ b/drivers/net/fm10k/fm10k_rxtx.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2013-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2013-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -35,6 +35,7 @@
 
 #include <rte_ethdev.h>
 #include <rte_common.h>
+#include <rte_net.h>
 #include "fm10k.h"
 #include "base/fm10k_type.h"
 
@@ -65,6 +66,15 @@ static inline void dump_rxd(union fm10k_rx_desc *rxd)
 }
 #endif
 
+#define FM10K_TX_OFFLOAD_MASK (  \
+		PKT_TX_VLAN_PKT |        \
+		PKT_TX_IP_CKSUM |        \
+		PKT_TX_L4_MASK |         \
+		PKT_TX_TCP_SEG)
+
+#define FM10K_TX_OFFLOAD_NOTSUP_MASK \
+		(PKT_TX_OFFLOAD_MASK ^ FM10K_TX_OFFLOAD_MASK)
+
 /* @note: When this function is changed, make corresponding change to
  * fm10k_dev_supported_ptypes_get()
  */
@@ -597,3 +607,41 @@ fm10k_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 
 	return count;
 }
+
+uint16_t
+fm10k_prep_pkts(__rte_unused void *tx_queue, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts)
+{
+	int i, ret;
+	struct rte_mbuf *m;
+
+	for (i = 0; i < nb_pkts; i++) {
+		m = tx_pkts[i];
+
+		if ((m->ol_flags & PKT_TX_TCP_SEG) &&
+				(m->tso_segsz < FM10K_TSO_MINMSS)) {
+			rte_errno = -EINVAL;
+			return i;
+		}
+
+		if (m->ol_flags & FM10K_TX_OFFLOAD_NOTSUP_MASK) {
+			rte_errno = -ENOTSUP;
+			return i;
+		}
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+		ret = rte_validate_tx_offload(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+#endif
+		ret = rte_phdr_cksum_fix(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+	}
+
+	return i;
+}
-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 261+ messages in thread

* [dpdk-dev] [PATCH v8 4/6] i40e: add Tx preparation
  2016-10-21 14:46             ` [dpdk-dev] [PATCH v8 0/6] add Tx preparation Tomasz Kulasek
                                 ` (2 preceding siblings ...)
  2016-10-21 14:46               ` [dpdk-dev] [PATCH v8 3/6] fm10k: " Tomasz Kulasek
@ 2016-10-21 14:46               ` Tomasz Kulasek
  2016-10-21 14:46               ` [dpdk-dev] [PATCH v8 5/6] ixgbe: " Tomasz Kulasek
                                 ` (2 subsequent siblings)
  6 siblings, 0 replies; 261+ messages in thread
From: Tomasz Kulasek @ 2016-10-21 14:46 UTC (permalink / raw)
  To: dev; +Cc: konstantin.ananyev, olivier.matz

Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
---
 drivers/net/i40e/i40e_ethdev.c |    3 ++
 drivers/net/i40e/i40e_rxtx.c   |   72 +++++++++++++++++++++++++++++++++++++++-
 drivers/net/i40e/i40e_rxtx.h   |    8 +++++
 3 files changed, 82 insertions(+), 1 deletion(-)

diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
index 5af0e43..dab0d48 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -936,6 +936,7 @@ eth_i40e_dev_init(struct rte_eth_dev *dev)
 	dev->dev_ops = &i40e_eth_dev_ops;
 	dev->rx_pkt_burst = i40e_recv_pkts;
 	dev->tx_pkt_burst = i40e_xmit_pkts;
+	dev->tx_pkt_prep = i40e_prep_pkts;
 
 	/* for secondary processes, we don't initialise any further as primary
 	 * has already done this work. Only check we don't need a different
@@ -2629,6 +2630,8 @@ i40e_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 		.nb_max = I40E_MAX_RING_DESC,
 		.nb_min = I40E_MIN_RING_DESC,
 		.nb_align = I40E_ALIGN_RING_DESC,
+		.nb_seg_max = I40E_TX_MAX_SEG,
+		.nb_mtu_seg_max = I40E_TX_MAX_MTU_SEG,
 	};
 
 	if (pf->flags & I40E_FLAG_VMDQ) {
diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
index 7ae7d9f..7f6d3d8 100644
--- a/drivers/net/i40e/i40e_rxtx.c
+++ b/drivers/net/i40e/i40e_rxtx.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -50,6 +50,8 @@
 #include <rte_tcp.h>
 #include <rte_sctp.h>
 #include <rte_udp.h>
+#include <rte_ip.h>
+#include <rte_net.h>
 
 #include "i40e_logs.h"
 #include "base/i40e_prototype.h"
@@ -79,6 +81,17 @@
 		PKT_TX_TCP_SEG |		 \
 		PKT_TX_OUTER_IP_CKSUM)
 
+#define I40E_TX_OFFLOAD_MASK (  \
+		PKT_TX_IP_CKSUM |       \
+		PKT_TX_L4_MASK |        \
+		PKT_TX_OUTER_IP_CKSUM | \
+		PKT_TX_TCP_SEG |        \
+		PKT_TX_QINQ_PKT |       \
+		PKT_TX_VLAN_PKT)
+
+#define I40E_TX_OFFLOAD_NOTSUP_MASK \
+		(PKT_TX_OFFLOAD_MASK ^ I40E_TX_OFFLOAD_MASK)
+
 static uint16_t i40e_xmit_pkts_simple(void *tx_queue,
 				      struct rte_mbuf **tx_pkts,
 				      uint16_t nb_pkts);
@@ -1411,6 +1424,61 @@ i40e_xmit_pkts_simple(void *tx_queue,
 	return nb_tx;
 }
 
+/*********************************************************************
+ *
+ *  TX prep functions
+ *
+ **********************************************************************/
+uint16_t
+i40e_prep_pkts(__rte_unused void *tx_queue, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts)
+{
+	int i, ret;
+	uint64_t ol_flags;
+	struct rte_mbuf *m;
+
+	for (i = 0; i < nb_pkts; i++) {
+		m = tx_pkts[i];
+		ol_flags = m->ol_flags;
+
+		/**
+		 * m->nb_segs is uint8_t, so m->nb_segs is always less than
+		 * I40E_TX_MAX_SEG.
+		 * We check only a condition for m->nb_segs > I40E_TX_MAX_MTU_SEG.
+		 */
+		if (!(ol_flags & PKT_TX_TCP_SEG)) {
+			if (m->nb_segs > I40E_TX_MAX_MTU_SEG) {
+				rte_errno = -EINVAL;
+				return i;
+			}
+		} else if ((m->tso_segsz < I40E_MIN_TSO_MSS) ||
+				(m->tso_segsz > I40E_MAX_TSO_MSS)) {
+			/* MSS outside the range (256B - 9674B) are considered malicious */
+			rte_errno = -EINVAL;
+			return i;
+		}
+
+		if (ol_flags & I40E_TX_OFFLOAD_NOTSUP_MASK) {
+			rte_errno = -ENOTSUP;
+			return i;
+		}
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+		ret = rte_validate_tx_offload(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+#endif
+		ret = rte_phdr_cksum_fix(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+	}
+	return i;
+}
+
 /*
  * Find the VSI the queue belongs to. 'queue_idx' is the queue index
  * application used, which assume having sequential ones. But from driver's
@@ -2763,9 +2831,11 @@ i40e_set_tx_function(struct rte_eth_dev *dev)
 			PMD_INIT_LOG(DEBUG, "Simple tx finally be used.");
 			dev->tx_pkt_burst = i40e_xmit_pkts_simple;
 		}
+		dev->tx_pkt_prep = NULL;
 	} else {
 		PMD_INIT_LOG(DEBUG, "Xmit tx finally be used.");
 		dev->tx_pkt_burst = i40e_xmit_pkts;
+		dev->tx_pkt_prep = i40e_prep_pkts;
 	}
 }
 
diff --git a/drivers/net/i40e/i40e_rxtx.h b/drivers/net/i40e/i40e_rxtx.h
index ecdb13c..9df8a56 100644
--- a/drivers/net/i40e/i40e_rxtx.h
+++ b/drivers/net/i40e/i40e_rxtx.h
@@ -63,6 +63,12 @@
 #define	I40E_MIN_RING_DESC	64
 #define	I40E_MAX_RING_DESC	4096
 
+#define I40E_MIN_TSO_MSS          256
+#define I40E_MAX_TSO_MSS          9674
+
+#define I40E_TX_MAX_SEG     UINT8_MAX
+#define I40E_TX_MAX_MTU_SEG 8
+
 #undef container_of
 #define container_of(ptr, type, member) ({ \
 		typeof(((type *)0)->member)(*__mptr) = (ptr); \
@@ -223,6 +229,8 @@ uint16_t i40e_recv_scattered_pkts(void *rx_queue,
 uint16_t i40e_xmit_pkts(void *tx_queue,
 			struct rte_mbuf **tx_pkts,
 			uint16_t nb_pkts);
+uint16_t i40e_prep_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts);
 int i40e_tx_queue_init(struct i40e_tx_queue *txq);
 int i40e_rx_queue_init(struct i40e_rx_queue *rxq);
 void i40e_free_tx_resources(struct i40e_tx_queue *txq);
-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 261+ messages in thread

* [dpdk-dev] [PATCH v8 5/6] ixgbe: add Tx preparation
  2016-10-21 14:46             ` [dpdk-dev] [PATCH v8 0/6] add Tx preparation Tomasz Kulasek
                                 ` (3 preceding siblings ...)
  2016-10-21 14:46               ` [dpdk-dev] [PATCH v8 4/6] i40e: " Tomasz Kulasek
@ 2016-10-21 14:46               ` Tomasz Kulasek
  2016-10-21 14:46               ` [dpdk-dev] [PATCH v8 6/6] testpmd: use Tx preparation in csum engine Tomasz Kulasek
  2016-10-24 14:05               ` [dpdk-dev] [PATCH v9 0/6] add Tx preparation Tomasz Kulasek
  6 siblings, 0 replies; 261+ messages in thread
From: Tomasz Kulasek @ 2016-10-21 14:46 UTC (permalink / raw)
  To: dev; +Cc: konstantin.ananyev, olivier.matz

Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
---
 drivers/net/ixgbe/ixgbe_ethdev.c |    3 ++
 drivers/net/ixgbe/ixgbe_ethdev.h |    5 +++-
 drivers/net/ixgbe/ixgbe_rxtx.c   |   58 +++++++++++++++++++++++++++++++++++++-
 drivers/net/ixgbe/ixgbe_rxtx.h   |    2 ++
 4 files changed, 66 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
index 4ca5747..4c6a8e1 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -517,6 +517,8 @@ static const struct rte_eth_desc_lim tx_desc_lim = {
 	.nb_max = IXGBE_MAX_RING_DESC,
 	.nb_min = IXGBE_MIN_RING_DESC,
 	.nb_align = IXGBE_TXD_ALIGN,
+	.nb_seg_max = IXGBE_TX_MAX_SEG,
+	.nb_mtu_seg_max = IXGBE_TX_MAX_SEG,
 };
 
 static const struct eth_dev_ops ixgbe_eth_dev_ops = {
@@ -1103,6 +1105,7 @@ eth_ixgbe_dev_init(struct rte_eth_dev *eth_dev)
 	eth_dev->dev_ops = &ixgbe_eth_dev_ops;
 	eth_dev->rx_pkt_burst = &ixgbe_recv_pkts;
 	eth_dev->tx_pkt_burst = &ixgbe_xmit_pkts;
+	eth_dev->tx_pkt_prep = &ixgbe_prep_pkts;
 
 	/*
 	 * For secondary processes, we don't initialise any further as primary
diff --git a/drivers/net/ixgbe/ixgbe_ethdev.h b/drivers/net/ixgbe/ixgbe_ethdev.h
index 4ff6338..e229cf5 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.h
+++ b/drivers/net/ixgbe/ixgbe_ethdev.h
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -396,6 +396,9 @@ uint16_t ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 uint16_t ixgbe_xmit_pkts_simple(void *tx_queue, struct rte_mbuf **tx_pkts,
 		uint16_t nb_pkts);
 
+uint16_t ixgbe_prep_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts);
+
 int ixgbe_dev_rss_hash_update(struct rte_eth_dev *dev,
 			      struct rte_eth_rss_conf *rss_conf);
 
diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
index 2ce8234..031414c 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   Copyright 2014 6WIND S.A.
  *   All rights reserved.
  *
@@ -70,6 +70,7 @@
 #include <rte_string_fns.h>
 #include <rte_errno.h>
 #include <rte_ip.h>
+#include <rte_net.h>
 
 #include "ixgbe_logs.h"
 #include "base/ixgbe_api.h"
@@ -87,6 +88,9 @@
 		PKT_TX_TCP_SEG |		 \
 		PKT_TX_OUTER_IP_CKSUM)
 
+#define IXGBE_TX_OFFLOAD_NOTSUP_MASK \
+		(PKT_TX_OFFLOAD_MASK ^ IXGBE_TX_OFFLOAD_MASK)
+
 #if 1
 #define RTE_PMD_USE_PREFETCH
 #endif
@@ -905,6 +909,56 @@ end_of_tx:
 
 /*********************************************************************
  *
+ *  TX prep functions
+ *
+ **********************************************************************/
+uint16_t
+ixgbe_prep_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
+{
+	int i, ret;
+	uint64_t ol_flags;
+	struct rte_mbuf *m;
+	struct ixgbe_tx_queue *txq = (struct ixgbe_tx_queue *)tx_queue;
+
+	for (i = 0; i < nb_pkts; i++) {
+		m = tx_pkts[i];
+		ol_flags = m->ol_flags;
+
+		/**
+		 * Check if packet meets requirements for number of segments
+		 *
+		 * NOTE: for ixgbe it's always (40 - WTHRESH) for both TSO and non-TSO
+		 */
+
+		if (m->nb_segs > IXGBE_TX_MAX_SEG - txq->wthresh) {
+			rte_errno = -EINVAL;
+			return i;
+		}
+
+		if (ol_flags & IXGBE_TX_OFFLOAD_NOTSUP_MASK) {
+			rte_errno = -ENOTSUP;
+			return i;
+		}
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+		ret = rte_validate_tx_offload(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+#endif
+		ret = rte_phdr_cksum_fix(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+	}
+
+	return i;
+}
+
+/*********************************************************************
+ *
  *  RX functions
  *
  **********************************************************************/
@@ -2282,6 +2336,7 @@ ixgbe_set_tx_function(struct rte_eth_dev *dev, struct ixgbe_tx_queue *txq)
 	if (((txq->txq_flags & IXGBE_SIMPLE_FLAGS) == IXGBE_SIMPLE_FLAGS)
 			&& (txq->tx_rs_thresh >= RTE_PMD_IXGBE_TX_MAX_BURST)) {
 		PMD_INIT_LOG(DEBUG, "Using simple tx code path");
+		dev->tx_pkt_prep = NULL;
 #ifdef RTE_IXGBE_INC_VECTOR
 		if (txq->tx_rs_thresh <= RTE_IXGBE_TX_MAX_FREE_BUF_SZ &&
 				(rte_eal_process_type() != RTE_PROC_PRIMARY ||
@@ -2302,6 +2357,7 @@ ixgbe_set_tx_function(struct rte_eth_dev *dev, struct ixgbe_tx_queue *txq)
 				(unsigned long)txq->tx_rs_thresh,
 				(unsigned long)RTE_PMD_IXGBE_TX_MAX_BURST);
 		dev->tx_pkt_burst = ixgbe_xmit_pkts;
+		dev->tx_pkt_prep = ixgbe_prep_pkts;
 	}
 }
 
diff --git a/drivers/net/ixgbe/ixgbe_rxtx.h b/drivers/net/ixgbe/ixgbe_rxtx.h
index 2608b36..7bbd9b8 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx.h
+++ b/drivers/net/ixgbe/ixgbe_rxtx.h
@@ -80,6 +80,8 @@
 #define RTE_IXGBE_WAIT_100_US               100
 #define RTE_IXGBE_VMTXSW_REGISTER_COUNT     2
 
+#define IXGBE_TX_MAX_SEG                    40
+
 #define IXGBE_PACKET_TYPE_MASK_82599        0X7F
 #define IXGBE_PACKET_TYPE_MASK_X550         0X10FF
 #define IXGBE_PACKET_TYPE_MASK_TUNNEL       0XFF
-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 261+ messages in thread

* [dpdk-dev] [PATCH v8 6/6] testpmd: use Tx preparation in csum engine
  2016-10-21 14:46             ` [dpdk-dev] [PATCH v8 0/6] add Tx preparation Tomasz Kulasek
                                 ` (4 preceding siblings ...)
  2016-10-21 14:46               ` [dpdk-dev] [PATCH v8 5/6] ixgbe: " Tomasz Kulasek
@ 2016-10-21 14:46               ` Tomasz Kulasek
  2016-10-24 14:05               ` [dpdk-dev] [PATCH v9 0/6] add Tx preparation Tomasz Kulasek
  6 siblings, 0 replies; 261+ messages in thread
From: Tomasz Kulasek @ 2016-10-21 14:46 UTC (permalink / raw)
  To: dev; +Cc: konstantin.ananyev, olivier.matz

Removed the pseudo-header checksum calculation for udp/tcp/tso packets
from the application and used the Tx preparation API for packet
preparation and verification.

Adding the additional step to the csum engine costs about a 3-4%
performance drop on my setup with the ixgbe driver. It's caused mostly
by the need to re-access and modify the packet data.

Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
---
 app/test-pmd/csumonly.c |   36 +++++++++++++-----------------------
 1 file changed, 13 insertions(+), 23 deletions(-)

diff --git a/app/test-pmd/csumonly.c b/app/test-pmd/csumonly.c
index 57e6ae2..6f33ae9 100644
--- a/app/test-pmd/csumonly.c
+++ b/app/test-pmd/csumonly.c
@@ -112,15 +112,6 @@ struct simple_gre_hdr {
 } __attribute__((__packed__));
 
 static uint16_t
-get_psd_sum(void *l3_hdr, uint16_t ethertype, uint64_t ol_flags)
-{
-	if (ethertype == _htons(ETHER_TYPE_IPv4))
-		return rte_ipv4_phdr_cksum(l3_hdr, ol_flags);
-	else /* assume ethertype == ETHER_TYPE_IPv6 */
-		return rte_ipv6_phdr_cksum(l3_hdr, ol_flags);
-}
-
-static uint16_t
 get_udptcp_checksum(void *l3_hdr, void *l4_hdr, uint16_t ethertype)
 {
 	if (ethertype == _htons(ETHER_TYPE_IPv4))
@@ -370,32 +361,24 @@ process_inner_cksums(void *l3_hdr, const struct testpmd_offload_info *info,
 		/* do not recalculate udp cksum if it was 0 */
 		if (udp_hdr->dgram_cksum != 0) {
 			udp_hdr->dgram_cksum = 0;
-			if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_UDP_CKSUM) {
+			if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_UDP_CKSUM)
 				ol_flags |= PKT_TX_UDP_CKSUM;
-				udp_hdr->dgram_cksum = get_psd_sum(l3_hdr,
-					info->ethertype, ol_flags);
-			} else {
+			else
 				udp_hdr->dgram_cksum =
 					get_udptcp_checksum(l3_hdr, udp_hdr,
 						info->ethertype);
-			}
 		}
 	} else if (info->l4_proto == IPPROTO_TCP) {
 		tcp_hdr = (struct tcp_hdr *)((char *)l3_hdr + info->l3_len);
 		tcp_hdr->cksum = 0;
-		if (tso_segsz) {
+		if (tso_segsz)
 			ol_flags |= PKT_TX_TCP_SEG;
-			tcp_hdr->cksum = get_psd_sum(l3_hdr, info->ethertype,
-				ol_flags);
-		} else if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_TCP_CKSUM) {
+		else if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_TCP_CKSUM)
 			ol_flags |= PKT_TX_TCP_CKSUM;
-			tcp_hdr->cksum = get_psd_sum(l3_hdr, info->ethertype,
-				ol_flags);
-		} else {
+		else
 			tcp_hdr->cksum =
 				get_udptcp_checksum(l3_hdr, tcp_hdr,
 					info->ethertype);
-		}
 	} else if (info->l4_proto == IPPROTO_SCTP) {
 		sctp_hdr = (struct sctp_hdr *)((char *)l3_hdr + info->l3_len);
 		sctp_hdr->cksum = 0;
@@ -648,6 +631,7 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 	void *l3_hdr = NULL, *outer_l3_hdr = NULL; /* can be IPv4 or IPv6 */
 	uint16_t nb_rx;
 	uint16_t nb_tx;
+	uint16_t nb_prep;
 	uint16_t i;
 	uint64_t rx_ol_flags, tx_ol_flags;
 	uint16_t testpmd_ol_flags;
@@ -857,7 +841,13 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 			printf("\n");
 		}
 	}
-	nb_tx = rte_eth_tx_burst(fs->tx_port, fs->tx_queue, pkts_burst, nb_rx);
+	nb_prep = rte_eth_tx_prep(fs->tx_port, fs->tx_queue, pkts_burst,
+				nb_rx);
+	if (nb_prep != nb_rx)
+		printf("Preparing packet burst to transmit failed: %s\n",
+				rte_strerror(rte_errno));
+
+	nb_tx = rte_eth_tx_burst(fs->tx_port, fs->tx_queue, pkts_burst, nb_prep);
 	/*
 	 * Retry if necessary
 	 */
-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v8 1/6] ethdev: add Tx preparation
  2016-10-21 14:46               ` [dpdk-dev] [PATCH v8 1/6] ethdev: " Tomasz Kulasek
@ 2016-10-24 12:14                 ` Ananyev, Konstantin
  2016-10-24 12:49                   ` Kulasek, TomaszX
  0 siblings, 1 reply; 261+ messages in thread
From: Ananyev, Konstantin @ 2016-10-24 12:14 UTC (permalink / raw)
  To: Kulasek, TomaszX, dev; +Cc: olivier.matz

Hi Tomasz,

> 
>  /**
> + * Validate general requirements for tx offload in mbuf.
> + *
> + * This function checks correctness and completeness of Tx offload settings.
> + *
> + * @param m
> + *   The packet mbuf to be validated.
> + * @return
> + *   0 if packet is valid
> + */
> +static inline int
> +rte_validate_tx_offload(const struct rte_mbuf *m)
> +{
> +	uint64_t ol_flags = m->ol_flags;
> +
> +	/* Does packet set any of available offloads? */
> +	if (!(ol_flags & PKT_TX_OFFLOAD_MASK))
> +		return 0;
> +
> +	/* IP checksum can be counted only for IPv4 packet */
> +	if ((ol_flags & PKT_TX_IP_CKSUM) && (ol_flags & PKT_TX_IPV6))
> +		return -EINVAL;
> +
> +	/* IP type not set when required */
> +	if (ol_flags & (PKT_TX_L4_MASK | PKT_TX_TCP_SEG))
> +		if (!(ol_flags & (PKT_TX_IPV4 | PKT_TX_IPV6)))
> +			return -EINVAL;
> +
> +	/* Check requirements for TSO packet */
> +	if (ol_flags & PKT_TX_TCP_SEG)
> +		if ((m->tso_segsz == 0) ||
> +				((ol_flags & PKT_TX_IPV4) &&
> +				!(ol_flags & PKT_TX_IP_CKSUM)))
> +			return -EINVAL;
> +
> +	/* PKT_TX_OUTER_IP_CKSUM set for non outer IPv4 packet. */
> +	if ((ol_flags & PKT_TX_OUTER_IP_CKSUM) &&
> +			!(ol_flags & PKT_TX_OUTER_IPV4))
> +		return -EINVAL;
> +
> +	return 0;
> +}
> +
> +/**
>   * Dump an mbuf structure to a file.
>   *
>   * Dump all fields for the given packet mbuf and all its associated
> diff --git a/lib/librte_net/rte_net.h b/lib/librte_net/rte_net.h
> index d4156ae..79669d7 100644
> --- a/lib/librte_net/rte_net.h
> +++ b/lib/librte_net/rte_net.h
> @@ -38,6 +38,11 @@
>  extern "C" {
>  #endif
> 
> +#include <rte_ip.h>
> +#include <rte_udp.h>
> +#include <rte_tcp.h>
> +#include <rte_sctp.h>
> +
>  /**
>   * Structure containing header lengths associated to a packet, filled
>   * by rte_net_get_ptype().
> @@ -86,6 +91,91 @@ struct rte_net_hdr_lens {
>  uint32_t rte_net_get_ptype(const struct rte_mbuf *m,
>  	struct rte_net_hdr_lens *hdr_lens, uint32_t layers);
> 
> +/**
> + * Fix pseudo header checksum
> + *
> + * This function fixes pseudo header checksum for TSO and non-TSO tcp/udp in
> + * provided mbufs packet data.
> + *
> + * - for non-TSO tcp/udp packets full pseudo-header checksum is counted and set
> + *   in packet data,
> + * - for TSO the IP payload length is not included in pseudo header.
> + *
> + * This function expects that used headers are in the first data segment of
> + * mbuf, and are not fragmented.
> + *
> + * @param m
> + *   The packet mbuf to be validated.
> + * @return
> + *   0 if checksum is initialized properly
> + */
> +static inline int
> +rte_phdr_cksum_fix(struct rte_mbuf *m)
> +{
> +	struct ipv4_hdr *ipv4_hdr;
> +	struct ipv6_hdr *ipv6_hdr;
> +	struct tcp_hdr *tcp_hdr;
> +	struct udp_hdr *udp_hdr;
> +	uint64_t ol_flags = m->ol_flags;
> +	uint64_t inner_l3_offset = m->l2_len;
> +
> +	if (ol_flags & PKT_TX_OUTER_IP_CKSUM)
> +		inner_l3_offset += m->outer_l2_len + m->outer_l3_len;
> +
> +	/* headers are fragmented */
> +	if (unlikely(rte_pktmbuf_data_len(m) >= inner_l3_offset + m->l3_len +
> +			m->l4_len))

Might be better to move that check into rte_validate_tx_offload(),
so it would be called only when TX_DEBUG is on.
Another thing, shouldn't it be:
if (rte_pktmbuf_data_len(m) < inner_l3_offset + m->l3_len + m->l4_len)
?
Konstantin


> +		return -ENOTSUP;
> +
> +	if ((ol_flags & PKT_TX_UDP_CKSUM) == PKT_TX_UDP_CKSUM) {
> +		if (ol_flags & PKT_TX_IPV4) {
> +			ipv4_hdr = rte_pktmbuf_mtod_offset(m, struct ipv4_hdr *,
> +					inner_l3_offset);
> +
> +			if (ol_flags & PKT_TX_IP_CKSUM)
> +				ipv4_hdr->hdr_checksum = 0;
> +
> +			udp_hdr = (struct udp_hdr *)((char *)ipv4_hdr +
> +					m->l3_len);
> +			udp_hdr->dgram_cksum = rte_ipv4_phdr_cksum(ipv4_hdr,
> +					ol_flags);
> +		} else {
> +			ipv6_hdr = rte_pktmbuf_mtod_offset(m, struct ipv6_hdr *,
> +					inner_l3_offset);
> +			/* non-TSO udp */
> +			udp_hdr = rte_pktmbuf_mtod_offset(m, struct udp_hdr *,
> +					inner_l3_offset + m->l3_len);
> +			udp_hdr->dgram_cksum = rte_ipv6_phdr_cksum(ipv6_hdr,
> +					ol_flags);
> +		}
> +	} else if ((ol_flags & PKT_TX_TCP_CKSUM) ||
> +			(ol_flags & PKT_TX_TCP_SEG)) {
> +		if (ol_flags & PKT_TX_IPV4) {
> +			ipv4_hdr = rte_pktmbuf_mtod_offset(m, struct ipv4_hdr *,
> +					inner_l3_offset);
> +
> +			if (ol_flags & PKT_TX_IP_CKSUM)
> +				ipv4_hdr->hdr_checksum = 0;
> +
> +			/* non-TSO tcp or TSO */
> +			tcp_hdr = (struct tcp_hdr *)((char *)ipv4_hdr +
> +					m->l3_len);
> +			tcp_hdr->cksum = rte_ipv4_phdr_cksum(ipv4_hdr,
> +					ol_flags);
> +		} else {
> +			ipv6_hdr = rte_pktmbuf_mtod_offset(m, struct ipv6_hdr *,
> +					inner_l3_offset);
> +			/* non-TSO tcp or TSO */
> +			tcp_hdr = rte_pktmbuf_mtod_offset(m, struct tcp_hdr *,
> +					inner_l3_offset + m->l3_len);
> +			tcp_hdr->cksum = rte_ipv6_phdr_cksum(ipv6_hdr,
> +					ol_flags);
> +		}
> +	}
> +
> +	return 0;
> +}
> +
>  #ifdef __cplusplus
>  }
>  #endif
> --
> 1.7.9.5

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v8 1/6] ethdev: add Tx preparation
  2016-10-24 12:14                 ` Ananyev, Konstantin
@ 2016-10-24 12:49                   ` Kulasek, TomaszX
  2016-10-24 12:56                     ` Ananyev, Konstantin
  0 siblings, 1 reply; 261+ messages in thread
From: Kulasek, TomaszX @ 2016-10-24 12:49 UTC (permalink / raw)
  To: Ananyev, Konstantin, dev; +Cc: olivier.matz

Hi Konstantin,

> -----Original Message-----
> From: Ananyev, Konstantin
> Sent: Monday, October 24, 2016 14:15
> To: Kulasek, TomaszX <tomaszx.kulasek@intel.com>; dev@dpdk.org
> Cc: olivier.matz@6wind.com
> Subject: RE: [PATCH v8 1/6] ethdev: add Tx preparation
> 
> Hi Tomasz,
> 

[...]

> >
> > +/**
> > + * Fix pseudo header checksum
> > + *
> > + * This function fixes pseudo header checksum for TSO and non-TSO
> > +tcp/udp in
> > + * provided mbufs packet data.
> > + *
> > + * - for non-TSO tcp/udp packets full pseudo-header checksum is counted
> and set
> > + *   in packet data,
> > + * - for TSO the IP payload length is not included in pseudo header.
> > + *
> > + * This function expects that used headers are in the first data
> > +segment of
> > + * mbuf, and are not fragmented.
> > + *
> > + * @param m
> > + *   The packet mbuf to be validated.
> > + * @return
> > + *   0 if checksum is initialized properly
> > + */
> > +static inline int
> > +rte_phdr_cksum_fix(struct rte_mbuf *m) {
> > +	struct ipv4_hdr *ipv4_hdr;
> > +	struct ipv6_hdr *ipv6_hdr;
> > +	struct tcp_hdr *tcp_hdr;
> > +	struct udp_hdr *udp_hdr;
> > +	uint64_t ol_flags = m->ol_flags;
> > +	uint64_t inner_l3_offset = m->l2_len;
> > +
> > +	if (ol_flags & PKT_TX_OUTER_IP_CKSUM)
> > +		inner_l3_offset += m->outer_l2_len + m->outer_l3_len;
> > +
> > +	/* headers are fragmented */
> > +	if (unlikely(rte_pktmbuf_data_len(m) >= inner_l3_offset + m->l3_len
> +
> > +			m->l4_len))
> 
> Might be better to move that check into rte_validate_tx_offload(), so it
> would be called only when TX_DEBUG is on.

While unfragmented headers are not a general requirement for Tx offloads, and this requirement is specific to this particular implementation, for performance reasons it may be better to keep the check here and just add an #if DEBUG guard, leaving rte_validate_tx_offload more generic.

> Another thing, shouldn't it be:
> if (rte_pktmbuf_data_len(m) < inner_l3_offset + m->l3_len + m->l4_len) ?

Yes, it should.

> Konstantin
> 

Tomasz
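
For reference, the check as corrected per the exchange above would read:

	/* headers are fragmented (do not fit in the first data segment) */
	if (unlikely(rte_pktmbuf_data_len(m) < inner_l3_offset + m->l3_len +
			m->l4_len))
		return -ENOTSUP;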

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v8 1/6] ethdev: add Tx preparation
  2016-10-24 12:49                   ` Kulasek, TomaszX
@ 2016-10-24 12:56                     ` Ananyev, Konstantin
  2016-10-24 14:12                       ` Kulasek, TomaszX
  0 siblings, 1 reply; 261+ messages in thread
From: Ananyev, Konstantin @ 2016-10-24 12:56 UTC (permalink / raw)
  To: Kulasek, TomaszX, dev; +Cc: olivier.matz



> -----Original Message-----
> From: Kulasek, TomaszX
> Sent: Monday, October 24, 2016 1:49 PM
> To: Ananyev, Konstantin <konstantin.ananyev@intel.com>; dev@dpdk.org
> Cc: olivier.matz@6wind.com
> Subject: RE: [PATCH v8 1/6] ethdev: add Tx preparation
> 
> Hi Konstantin,
> 
> > -----Original Message-----
> > From: Ananyev, Konstantin
> > Sent: Monday, October 24, 2016 14:15
> > To: Kulasek, TomaszX <tomaszx.kulasek@intel.com>; dev@dpdk.org
> > Cc: olivier.matz@6wind.com
> > Subject: RE: [PATCH v8 1/6] ethdev: add Tx preparation
> >
> > Hi Tomasz,
> >
> 
> [...]
> 
> > >
> > > +/**
> > > + * Fix pseudo header checksum
> > > + *
> > > + * This function fixes pseudo header checksum for TSO and non-TSO
> > > +tcp/udp in
> > > + * provided mbufs packet data.
> > > + *
> > > + * - for non-TSO tcp/udp packets full pseudo-header checksum is counted
> > and set
> > > + *   in packet data,
> > > + * - for TSO the IP payload length is not included in pseudo header.
> > > + *
> > > + * This function expects that used headers are in the first data
> > > +segment of
> > > + * mbuf, and are not fragmented.
> > > + *
> > > + * @param m
> > > + *   The packet mbuf to be validated.
> > > + * @return
> > > + *   0 if checksum is initialized properly
> > > + */
> > > +static inline int
> > > +rte_phdr_cksum_fix(struct rte_mbuf *m) {
> > > +	struct ipv4_hdr *ipv4_hdr;
> > > +	struct ipv6_hdr *ipv6_hdr;
> > > +	struct tcp_hdr *tcp_hdr;
> > > +	struct udp_hdr *udp_hdr;
> > > +	uint64_t ol_flags = m->ol_flags;
> > > +	uint64_t inner_l3_offset = m->l2_len;
> > > +
> > > +	if (ol_flags & PKT_TX_OUTER_IP_CKSUM)
> > > +		inner_l3_offset += m->outer_l2_len + m->outer_l3_len;
> > > +
> > > +	/* headers are fragmented */
> > > +	if (unlikely(rte_pktmbuf_data_len(m) >= inner_l3_offset + m->l3_len
> > +
> > > +			m->l4_len))
> >
> > Might be better to move that check into rte_validate_tx_offload(), so it
> > would be called only when TX_DEBUG is on.
> 
> While unfragmented headers are not a general requirement for Tx offloads, and this requirement is specific to this particular implementation,
> for performance reasons it may be better to keep the check here and just add an #if DEBUG guard, leaving rte_validate_tx_offload more generic.

Hmm, and what is the advantage of polluting that code with more ifdefs?
Again, why are unfragmented headers not a general requirement?
As long as the DPDK pseudo-header csum calculation routines can't handle
the fragmented case, it pretty much is a general requirement, no?
Konstantin

> 
> > Another thing, shouldn't it be:
> > if (rte_pktmbuf_data_len(m) < inner_l3_offset + m->l3_len + m->l4_len) ?
> 
> Yes, it should.
> 
> > Konstantin
> >
> 
> Tomasz

^ permalink raw reply	[flat|nested] 261+ messages in thread

* [dpdk-dev] [PATCH v9 0/6] add Tx preparation
  2016-10-21 14:46             ` [dpdk-dev] [PATCH v8 0/6] add Tx preparation Tomasz Kulasek
                                 ` (5 preceding siblings ...)
  2016-10-21 14:46               ` [dpdk-dev] [PATCH v8 6/6] testpmd: use Tx preparation in csum engine Tomasz Kulasek
@ 2016-10-24 14:05               ` Tomasz Kulasek
  2016-10-24 14:05                 ` [dpdk-dev] [PATCH v9 1/6] ethdev: " Tomasz Kulasek
                                   ` (6 more replies)
  6 siblings, 7 replies; 261+ messages in thread
From: Tomasz Kulasek @ 2016-10-24 14:05 UTC (permalink / raw)
  To: dev; +Cc: konstantin.ananyev, olivier.matz


As discussed in that thread:

http://dpdk.org/ml/archives/dev/2015-September/023603.html

Different NIC models depending on HW offload requested might impose
different requirements on packets to be TX-ed in terms of:

 - Max number of fragments per packet allowed
 - Max number of fragments per TSO segments
 - The way pseudo-header checksum should be pre-calculated
 - L3/L4 header fields filling
 - etc.


MOTIVATION:
-----------

1) Some work cannot (and should not) be done in rte_eth_tx_burst.
   However, this work is sometimes required, and now it is left as an
   application issue.

2) Different hardware may have different requirements for TX offloads;
   a different subset may be supported, and so on.

3) Some parameters (e.g. the number of segments in the ixgbe driver) may
   hang the device. These parameters may vary between devices.

   For example, i40e HW allows 8 fragments per packet, but that is after
   TSO segmentation, while ixgbe has a 38-fragment pre-TSO limit.

4) Fields in the packet may require different initialization (e.g. some
   devices require pseudo-header checksum precalculation, sometimes done
   in a different way depending on the packet type, and so on). Now the
   application needs to take care of it.

5) Using an additional API (rte_eth_tx_prep) before rte_eth_tx_burst lets
   the application prepare the packet burst in a form acceptable to the
   specific device.

6) Some additional checks may be done in debug mode, keeping the tx_burst
   implementation clean.


PROPOSAL:
---------

To help the user deal with all these varieties, we propose to:

1) Introduce a rte_eth_tx_prep() function to do the necessary preparation
   of a packet burst so it can be safely transmitted on the device with
   the desired HW offloads (set/reset the checksum fields according to
   the hardware requirements) and to check HW constraints (number of
   segments per packet, etc).

   While the limitations and requirements may differ between devices,
   this requires extending the rte_eth_dev structure with a new function
   pointer, "tx_pkt_prep", which can be implemented in the driver to
   prepare and verify packets, in a device-specific way, before burst;
   this should prevent the application from sending malformed packets.

2) Also, new fields will be introduced in rte_eth_desc_lim:
   nb_seg_max and nb_mtu_seg_max, providing information about the max
   number of segments in TSO and non-TSO packets acceptable by the
   device.

   This information is useful for the application to limit the number of
   segments and not create packets the device would consider malicious.


APPLICATION (CASE OF USE):
--------------------------

1) The application should initialize the burst of packets to send, and
   set the required tx offload flags and fields, like l2_len, l3_len,
   l4_len, and tso_segsz.

2) The application passes the burst to rte_eth_tx_prep to check the
   conditions required to send packets through the NIC.

3) The result of rte_eth_tx_prep can be used to send valid packets
   and/or to restore invalid ones if the function fails.

e.g.

	for (i = 0; i < nb_pkts; i++) {

		/* initialize or process packet */

		bufs[i]->tso_segsz = 800;
		bufs[i]->ol_flags = PKT_TX_TCP_SEG | PKT_TX_IPV4
				| PKT_TX_IP_CKSUM;
		bufs[i]->l2_len = sizeof(struct ether_hdr);
		bufs[i]->l3_len = sizeof(struct ipv4_hdr);
		bufs[i]->l4_len = sizeof(struct tcp_hdr);
	}

	/* Prepare burst of TX packets */
	nb_prep = rte_eth_tx_prep(port, 0, bufs, nb_pkts);

	if (nb_prep < nb_pkts) {
		printf("tx_prep failed\n");

		/* nb_prep indicates here the first invalid packet.
		 * rte_eth_tx_prep can be used on the remaining packets
		 * to find further invalid ones.
		 */

	}

	/* Send burst of TX packets */
	nb_tx = rte_eth_tx_burst(port, 0, bufs, nb_prep);

	/* Free any unsent packets. */
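
One possible continuation of this example, a minimal sketch rather than
part of the patch, drops each packet rejected by rte_eth_tx_prep and
retries the remainder (restoring the packet instead is an equally valid
policy; the helper variable "sent" exists only for this illustration):

	uint16_t sent = 0;

	while (sent < nb_pkts) {
		/* re-run preparation on the not-yet-sent tail */
		nb_prep = rte_eth_tx_prep(port, 0, bufs + sent,
				nb_pkts - sent);

		nb_tx = rte_eth_tx_burst(port, 0, bufs + sent, nb_prep);
		sent += nb_tx;

		if (nb_tx < nb_prep)
			break; /* TX queue is full, stop here */

		if (sent < nb_pkts) {
			/* bufs[sent] failed preparation: drop it */
			rte_pktmbuf_free(bufs[sent]);
			sent++;
		}
	}

	/* free whatever could not be sent */
	while (sent < nb_pkts)
		rte_pktmbuf_free(bufs[sent++]);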


v9 changes:
 - fixed headers structure fragmentation check
 - moved fragmentation check into rte_validate_tx_offload()

v8 changes:
 - mbuf argument in rte_validate_tx_offload declared as const

v7 changes:
 - comments reworded/added
 - changed errno values returned from Tx prep API
 - added check in rte_phdr_cksum_fix if headers are in the first
   data segment and can be safely modified
 - moved rte_validate_tx_offload to rte_mbuf
 - moved rte_phdr_cksum_fix to rte_net.h
 - removed rte_pkt.h new file as useless

v6 changes:
- added performance impact test results to the patch description

v5 changes:
 - rebased csum engine modification
 - added information to the csum engine about performance tests
 - some performance improvements

v4 changes:
 - tx_prep is now set to default behavior (NULL) for simple/vector path
   in fm10k, i40e and ixgbe drivers to increase performance, when
   Tx offloads are not intentionally available

v3 changes:
 - reworked csum testpmd engine instead adding new one,
 - fixed checksum initialization procedure to include also outer
   checksum offloads,
 - some minor formatting and optimizations

v2 changes:
 - rte_eth_tx_prep() returns number of packets when device doesn't
   support tx_prep functionality,
 - introduced CONFIG_RTE_ETHDEV_TX_PREP allowing to turn off tx_prep


Tomasz Kulasek (6):
  ethdev: add Tx preparation
  e1000: add Tx preparation
  fm10k: add Tx preparation
  i40e: add Tx preparation
  ixgbe: add Tx preparation
  testpmd: use Tx preparation in csum engine

 app/test-pmd/csumonly.c          |   36 +++++---------
 config/common_base               |    1 +
 drivers/net/e1000/e1000_ethdev.h |   11 +++++
 drivers/net/e1000/em_ethdev.c    |    5 +-
 drivers/net/e1000/em_rxtx.c      |   48 ++++++++++++++++++-
 drivers/net/e1000/igb_ethdev.c   |    4 ++
 drivers/net/e1000/igb_rxtx.c     |   52 +++++++++++++++++++-
 drivers/net/fm10k/fm10k.h        |    6 +++
 drivers/net/fm10k/fm10k_ethdev.c |    5 ++
 drivers/net/fm10k/fm10k_rxtx.c   |   50 +++++++++++++++++++-
 drivers/net/i40e/i40e_ethdev.c   |    3 ++
 drivers/net/i40e/i40e_rxtx.c     |   72 +++++++++++++++++++++++++++-
 drivers/net/i40e/i40e_rxtx.h     |    8 ++++
 drivers/net/ixgbe/ixgbe_ethdev.c |    3 ++
 drivers/net/ixgbe/ixgbe_ethdev.h |    5 +-
 drivers/net/ixgbe/ixgbe_rxtx.c   |   58 ++++++++++++++++++++++-
 drivers/net/ixgbe/ixgbe_rxtx.h   |    2 +
 lib/librte_ether/rte_ethdev.h    |   97 ++++++++++++++++++++++++++++++++++++++
 lib/librte_mbuf/rte_mbuf.h       |   64 +++++++++++++++++++++++++
 lib/librte_net/rte_net.h         |   85 +++++++++++++++++++++++++++++++++
 20 files changed, 585 insertions(+), 30 deletions(-)

-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 261+ messages in thread

* [dpdk-dev] [PATCH v9 1/6] ethdev: add Tx preparation
  2016-10-24 14:05               ` [dpdk-dev] [PATCH v9 0/6] add Tx preparation Tomasz Kulasek
@ 2016-10-24 14:05                 ` Tomasz Kulasek
  2016-10-24 14:05                 ` [dpdk-dev] [PATCH v9 2/6] e1000: " Tomasz Kulasek
                                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 261+ messages in thread
From: Tomasz Kulasek @ 2016-10-24 14:05 UTC (permalink / raw)
  To: dev; +Cc: konstantin.ananyev, olivier.matz

Added API for `rte_eth_tx_prep`

uint16_t rte_eth_tx_prep(uint8_t port_id, uint16_t queue_id,
	struct rte_mbuf **tx_pkts, uint16_t nb_pkts)

Added fields to the `struct rte_eth_desc_lim`:

	uint16_t nb_seg_max;
		/**< Max number of segments per whole packet. */

	uint16_t nb_mtu_seg_max;
		/**< Max number of segments per one MTU */

Added functions:

int rte_validate_tx_offload(struct rte_mbuf *m)
	to validate the general requirements for the tx offload settings in an
	mbuf, such as flag completeness. In the current implementation this
	function is called optionally, when RTE_LIBRTE_ETHDEV_DEBUG is enabled.

int rte_phdr_cksum_fix(struct rte_mbuf *m)
	to fix the pseudo-header checksum for TSO and non-TSO tcp/udp packets
	before the hardware tx checksum offload.
	 - for non-TSO tcp/udp packets the full pseudo-header checksum is
	   computed and set.
	 - for TSO, the IP payload length is not included.

PERFORMANCE TESTS
-----------------

This feature was tested with a modified csum engine from test-pmd.

The packet checksum preparation was moved from the application to the Tx
preparation step placed before the burst.

We may expect some overhead costs caused by:
1) using an additional callback before burst,
2) rescanning the burst,
3) additional condition checks (packet validation),
4) worse optimization (e.g. packet data access, etc.)

We tested it using the ixgbe Tx preparation implementation with some
parts disabled, to get comparable information about the impact of the
different parts of the implementation.

IMPACT:

1) For an unimplemented Tx preparation callback the performance impact is
   negligible,
2) For the packet condition check without checksum modifications (nb_segs,
   available offloads, etc.) it is 14626628/14252168 (~2.62% drop),
3) For full support in the ixgbe driver (point 2 plus packet checksum
   initialization) it is 14060924/13588094 (~3.48% drop)

Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
---
 config/common_base            |    1 +
 lib/librte_ether/rte_ethdev.h |   97 +++++++++++++++++++++++++++++++++++++++++
 lib/librte_mbuf/rte_mbuf.h    |   64 +++++++++++++++++++++++++++
 lib/librte_net/rte_net.h      |   85 ++++++++++++++++++++++++++++++++++++
 4 files changed, 247 insertions(+)

diff --git a/config/common_base b/config/common_base
index c7fd3db..619284b 100644
--- a/config/common_base
+++ b/config/common_base
@@ -120,6 +120,7 @@ CONFIG_RTE_MAX_QUEUES_PER_PORT=1024
 CONFIG_RTE_LIBRTE_IEEE1588=n
 CONFIG_RTE_ETHDEV_QUEUE_STAT_CNTRS=16
 CONFIG_RTE_ETHDEV_RXTX_CALLBACKS=y
+CONFIG_RTE_ETHDEV_TX_PREP=y
 
 #
 # Support NIC bypass logic
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 38641e8..d548d48 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -182,6 +182,7 @@ extern "C" {
 #include <rte_pci.h>
 #include <rte_dev.h>
 #include <rte_devargs.h>
+#include <rte_errno.h>
 #include "rte_ether.h"
 #include "rte_eth_ctrl.h"
 #include "rte_dev_info.h"
@@ -699,6 +700,8 @@ struct rte_eth_desc_lim {
 	uint16_t nb_max;   /**< Max allowed number of descriptors. */
 	uint16_t nb_min;   /**< Min allowed number of descriptors. */
 	uint16_t nb_align; /**< Number of descriptors should be aligned to. */
+	uint16_t nb_seg_max;     /**< Max number of segments per whole packet. */
+	uint16_t nb_mtu_seg_max; /**< Max number of segments per one MTU */
 };
 
 /**
@@ -1188,6 +1191,11 @@ typedef uint16_t (*eth_tx_burst_t)(void *txq,
 				   uint16_t nb_pkts);
 /**< @internal Send output packets on a transmit queue of an Ethernet device. */
 
+typedef uint16_t (*eth_tx_prep_t)(void *txq,
+				   struct rte_mbuf **tx_pkts,
+				   uint16_t nb_pkts);
+/**< @internal Prepare output packets on a transmit queue of an Ethernet device. */
+
 typedef int (*flow_ctrl_get_t)(struct rte_eth_dev *dev,
 			       struct rte_eth_fc_conf *fc_conf);
 /**< @internal Get current flow control parameter on an Ethernet device */
@@ -1622,6 +1630,7 @@ struct rte_eth_rxtx_callback {
 struct rte_eth_dev {
 	eth_rx_burst_t rx_pkt_burst; /**< Pointer to PMD receive function. */
 	eth_tx_burst_t tx_pkt_burst; /**< Pointer to PMD transmit function. */
+	eth_tx_prep_t tx_pkt_prep; /**< Pointer to PMD transmit prepare function. */
 	struct rte_eth_dev_data *data;  /**< Pointer to device data */
 	const struct eth_driver *driver;/**< Driver for this device */
 	const struct eth_dev_ops *dev_ops; /**< Functions exported by PMD */
@@ -2816,6 +2825,94 @@ rte_eth_tx_burst(uint8_t port_id, uint16_t queue_id,
 	return (*dev->tx_pkt_burst)(dev->data->tx_queues[queue_id], tx_pkts, nb_pkts);
 }
 
+/**
+ * Process a burst of output packets on a transmit queue of an Ethernet device.
+ *
+ * The rte_eth_tx_prep() function is invoked to prepare output packets to be
+ * transmitted on the output queue *queue_id* of the Ethernet device designated
+ * by its *port_id*.
+ * The *nb_pkts* parameter is the number of packets to be prepared which are
+ * supplied in the *tx_pkts* array of *rte_mbuf* structures, each of them
+ * allocated from a pool created with rte_pktmbuf_pool_create().
+ * For each packet to send, the rte_eth_tx_prep() function performs
+ * the following operations:
+ *
+ * - Check if the packet meets the device's requirements for tx offloads.
+ *
+ * - Check the limitations on the number of segments.
+ *
+ * - Check additional requirements when debug is enabled.
+ *
+ * - Update and/or reset the required checksums when a tx offload is set for the packet.
+ *
+ * The rte_eth_tx_prep() function returns the number of packets ready to be
+ * sent. A return value equal to *nb_pkts* means that all packets are valid and
+ * ready to be sent.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ *   The value must be a valid port id.
+ * @param queue_id
+ *   The index of the transmit queue through which output packets must be
+ *   sent.
+ *   The value must be in the range [0, nb_tx_queue - 1] previously supplied
+ *   to rte_eth_dev_configure().
+ * @param tx_pkts
+ *   The address of an array of *nb_pkts* pointers to *rte_mbuf* structures
+ *   which contain the output packets.
+ * @param nb_pkts
+ *   The maximum number of packets to process.
+ * @return
+ *   The number of packets correct and ready to be sent. The return value
+ *   can be less than the value of the *nb_pkts* parameter when some packet
+ *   doesn't meet the device's requirements; rte_errno is set appropriately.
+ */
+
+#ifdef RTE_ETHDEV_TX_PREP
+
+static inline uint16_t
+rte_eth_tx_prep(uint8_t port_id, uint16_t queue_id, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts)
+{
+	struct rte_eth_dev *dev;
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+	if (!rte_eth_dev_is_valid_port(port_id)) {
+		RTE_PMD_DEBUG_TRACE("Invalid TX port_id=%d\n", port_id);
+		rte_errno = -EINVAL;
+		return 0;
+	}
+#endif
+
+	dev = &rte_eth_devices[port_id];
+
+	if (!dev->tx_pkt_prep)
+		return nb_pkts;
+
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+	if (queue_id >= dev->data->nb_tx_queues) {
+		RTE_PMD_DEBUG_TRACE("Invalid TX queue_id=%d\n", queue_id);
+		rte_errno = -EINVAL;
+		return 0;
+	}
+#endif
+
+	return (*dev->tx_pkt_prep)(dev->data->tx_queues[queue_id],
+			tx_pkts, nb_pkts);
+}
+
+#else
+
+static inline uint16_t
+rte_eth_tx_prep(__rte_unused uint8_t port_id, __rte_unused uint16_t queue_id,
+		__rte_unused struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
+{
+	return nb_pkts;
+}
+
+#endif
+
 typedef void (*buffer_tx_error_fn)(struct rte_mbuf **unsent, uint16_t count,
 		void *userdata);
 
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index 109e666..ff9e749 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -283,6 +283,19 @@ extern "C" {
  */
 #define PKT_TX_OUTER_IPV6    (1ULL << 60)
 
+/**
+ * Bit mask of all supported packet Tx offload feature flags that can be
+ * set for a packet.
+ */
+#define PKT_TX_OFFLOAD_MASK (    \
+		PKT_TX_IP_CKSUM |        \
+		PKT_TX_L4_MASK |         \
+		PKT_TX_OUTER_IP_CKSUM |  \
+		PKT_TX_TCP_SEG |         \
+		PKT_TX_QINQ_PKT |        \
+		PKT_TX_VLAN_PKT |        \
+		PKT_TX_TUNNEL_MASK)
+
 #define __RESERVED           (1ULL << 61) /**< reserved for future mbuf use */
 
 #define IND_ATTACHED_MBUF    (1ULL << 62) /**< Indirect attached mbuf */
@@ -1647,6 +1660,57 @@ static inline int rte_pktmbuf_chain(struct rte_mbuf *head, struct rte_mbuf *tail
 }
 
 /**
+ * Validate the general requirements for the tx offloads set in an mbuf.
+ *
+ * This function checks correctness and completeness of Tx offload settings.
+ *
+ * @param m
+ *   The packet mbuf to be validated.
+ * @return
+ *   0 if packet is valid
+ */
+static inline int
+rte_validate_tx_offload(const struct rte_mbuf *m)
+{
+	uint64_t ol_flags = m->ol_flags;
+	uint64_t inner_l3_offset = m->l2_len;
+
+	/* Does packet set any of available offloads? */
+	if (!(ol_flags & PKT_TX_OFFLOAD_MASK))
+		return 0;
+
+	if (ol_flags & PKT_TX_OUTER_IP_CKSUM)
+		inner_l3_offset += m->outer_l2_len + m->outer_l3_len;
+
+	/* Headers are fragmented */
+	if (rte_pktmbuf_data_len(m) < inner_l3_offset + m->l3_len + m->l4_len)
+		return -ENOTSUP;
+
+	/* IP checksum can be computed only for an IPv4 packet */
+	if ((ol_flags & PKT_TX_IP_CKSUM) && (ol_flags & PKT_TX_IPV6))
+		return -EINVAL;
+
+	/* IP type not set when required */
+	if (ol_flags & (PKT_TX_L4_MASK | PKT_TX_TCP_SEG))
+		if (!(ol_flags & (PKT_TX_IPV4 | PKT_TX_IPV6)))
+			return -EINVAL;
+
+	/* Check requirements for TSO packet */
+	if (ol_flags & PKT_TX_TCP_SEG)
+		if ((m->tso_segsz == 0) ||
+				((ol_flags & PKT_TX_IPV4) &&
+				!(ol_flags & PKT_TX_IP_CKSUM)))
+			return -EINVAL;
+
+	/* PKT_TX_OUTER_IP_CKSUM set for a non-IPv4 outer packet. */
+	if ((ol_flags & PKT_TX_OUTER_IP_CKSUM) &&
+			!(ol_flags & PKT_TX_OUTER_IPV4))
+		return -EINVAL;
+
+	return 0;
+}
+
+/**
  * Dump an mbuf structure to a file.
  *
  * Dump all fields for the given packet mbuf and all its associated
diff --git a/lib/librte_net/rte_net.h b/lib/librte_net/rte_net.h
index d4156ae..90af335 100644
--- a/lib/librte_net/rte_net.h
+++ b/lib/librte_net/rte_net.h
@@ -38,6 +38,11 @@
 extern "C" {
 #endif
 
+#include <rte_ip.h>
+#include <rte_udp.h>
+#include <rte_tcp.h>
+#include <rte_sctp.h>
+
 /**
  * Structure containing header lengths associated to a packet, filled
  * by rte_net_get_ptype().
@@ -86,6 +91,86 @@ struct rte_net_hdr_lens {
 uint32_t rte_net_get_ptype(const struct rte_mbuf *m,
 	struct rte_net_hdr_lens *hdr_lens, uint32_t layers);
 
+/**
+ * Fix pseudo header checksum
+ *
+ * This function fixes the pseudo-header checksum for TSO and non-TSO
+ * tcp/udp packets in the provided mbuf's packet data.
+ *
+ * - for non-TSO tcp/udp packets the full pseudo-header checksum is computed
+ *   and set in the packet data,
+ * - for TSO, the IP payload length is not included in the pseudo header.
+ *
+ * This function expects that the used headers are in the first data segment
+ * of the mbuf, and are not fragmented.
+ *
+ * @param m
+ *   The packet mbuf whose checksum is to be fixed.
+ * @return
+ *   0 if the checksum is initialized properly
+ */
+static inline int
+rte_phdr_cksum_fix(struct rte_mbuf *m)
+{
+	struct ipv4_hdr *ipv4_hdr;
+	struct ipv6_hdr *ipv6_hdr;
+	struct tcp_hdr *tcp_hdr;
+	struct udp_hdr *udp_hdr;
+	uint64_t ol_flags = m->ol_flags;
+	uint64_t inner_l3_offset = m->l2_len;
+
+	if (ol_flags & PKT_TX_OUTER_IP_CKSUM)
+		inner_l3_offset += m->outer_l2_len + m->outer_l3_len;
+
+	if ((ol_flags & PKT_TX_UDP_CKSUM) == PKT_TX_UDP_CKSUM) {
+		if (ol_flags & PKT_TX_IPV4) {
+			ipv4_hdr = rte_pktmbuf_mtod_offset(m, struct ipv4_hdr *,
+					inner_l3_offset);
+
+			if (ol_flags & PKT_TX_IP_CKSUM)
+				ipv4_hdr->hdr_checksum = 0;
+
+			udp_hdr = (struct udp_hdr *)((char *)ipv4_hdr +
+					m->l3_len);
+			udp_hdr->dgram_cksum = rte_ipv4_phdr_cksum(ipv4_hdr,
+					ol_flags);
+		} else {
+			ipv6_hdr = rte_pktmbuf_mtod_offset(m, struct ipv6_hdr *,
+					inner_l3_offset);
+			/* non-TSO udp */
+			udp_hdr = rte_pktmbuf_mtod_offset(m, struct udp_hdr *,
+					inner_l3_offset + m->l3_len);
+			udp_hdr->dgram_cksum = rte_ipv6_phdr_cksum(ipv6_hdr,
+					ol_flags);
+		}
+	} else if ((ol_flags & PKT_TX_TCP_CKSUM) ||
+			(ol_flags & PKT_TX_TCP_SEG)) {
+		if (ol_flags & PKT_TX_IPV4) {
+			ipv4_hdr = rte_pktmbuf_mtod_offset(m, struct ipv4_hdr *,
+					inner_l3_offset);
+
+			if (ol_flags & PKT_TX_IP_CKSUM)
+				ipv4_hdr->hdr_checksum = 0;
+
+			/* non-TSO tcp or TSO */
+			tcp_hdr = (struct tcp_hdr *)((char *)ipv4_hdr +
+					m->l3_len);
+			tcp_hdr->cksum = rte_ipv4_phdr_cksum(ipv4_hdr,
+					ol_flags);
+		} else {
+			ipv6_hdr = rte_pktmbuf_mtod_offset(m, struct ipv6_hdr *,
+					inner_l3_offset);
+			/* non-TSO tcp or TSO */
+			tcp_hdr = rte_pktmbuf_mtod_offset(m, struct tcp_hdr *,
+					inner_l3_offset + m->l3_len);
+			tcp_hdr->cksum = rte_ipv6_phdr_cksum(ipv6_hdr,
+					ol_flags);
+		}
+	}
+
+	return 0;
+}
+
 #ifdef __cplusplus
 }
 #endif
-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 261+ messages in thread

* [dpdk-dev] [PATCH v9 2/6] e1000: add Tx preparation
  2016-10-24 14:05               ` [dpdk-dev] [PATCH v9 0/6] add Tx preparation Tomasz Kulasek
  2016-10-24 14:05                 ` [dpdk-dev] [PATCH v9 1/6] ethdev: " Tomasz Kulasek
@ 2016-10-24 14:05                 ` Tomasz Kulasek
  2016-10-24 14:05                 ` [dpdk-dev] [PATCH v9 3/6] fm10k: " Tomasz Kulasek
                                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 261+ messages in thread
From: Tomasz Kulasek @ 2016-10-24 14:05 UTC (permalink / raw)
  To: dev; +Cc: konstantin.ananyev, olivier.matz

Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
---
 drivers/net/e1000/e1000_ethdev.h |   11 ++++++++
 drivers/net/e1000/em_ethdev.c    |    5 +++-
 drivers/net/e1000/em_rxtx.c      |   48 ++++++++++++++++++++++++++++++++++-
 drivers/net/e1000/igb_ethdev.c   |    4 +++
 drivers/net/e1000/igb_rxtx.c     |   52 +++++++++++++++++++++++++++++++++++++-
 5 files changed, 117 insertions(+), 3 deletions(-)

diff --git a/drivers/net/e1000/e1000_ethdev.h b/drivers/net/e1000/e1000_ethdev.h
index 6c25c8d..bd0f277 100644
--- a/drivers/net/e1000/e1000_ethdev.h
+++ b/drivers/net/e1000/e1000_ethdev.h
@@ -138,6 +138,11 @@
 #define E1000_MISC_VEC_ID               RTE_INTR_VEC_ZERO_OFFSET
 #define E1000_RX_VEC_START              RTE_INTR_VEC_RXTX_OFFSET
 
+#define IGB_TX_MAX_SEG     UINT8_MAX
+#define IGB_TX_MAX_MTU_SEG UINT8_MAX
+#define EM_TX_MAX_SEG      UINT8_MAX
+#define EM_TX_MAX_MTU_SEG  UINT8_MAX
+
 /* structure for interrupt relative data */
 struct e1000_interrupt {
 	uint32_t flags;
@@ -315,6 +320,9 @@ void eth_igb_tx_init(struct rte_eth_dev *dev);
 uint16_t eth_igb_xmit_pkts(void *txq, struct rte_mbuf **tx_pkts,
 		uint16_t nb_pkts);
 
+uint16_t eth_igb_prep_pkts(void *txq, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts);
+
 uint16_t eth_igb_recv_pkts(void *rxq, struct rte_mbuf **rx_pkts,
 		uint16_t nb_pkts);
 
@@ -376,6 +384,9 @@ void eth_em_tx_init(struct rte_eth_dev *dev);
 uint16_t eth_em_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		uint16_t nb_pkts);
 
+uint16_t eth_em_prep_pkts(void *txq, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts);
+
 uint16_t eth_em_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
 		uint16_t nb_pkts);
 
diff --git a/drivers/net/e1000/em_ethdev.c b/drivers/net/e1000/em_ethdev.c
index 7cf5f0c..17b45cb 100644
--- a/drivers/net/e1000/em_ethdev.c
+++ b/drivers/net/e1000/em_ethdev.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -300,6 +300,7 @@ eth_em_dev_init(struct rte_eth_dev *eth_dev)
 	eth_dev->dev_ops = &eth_em_ops;
 	eth_dev->rx_pkt_burst = (eth_rx_burst_t)&eth_em_recv_pkts;
 	eth_dev->tx_pkt_burst = (eth_tx_burst_t)&eth_em_xmit_pkts;
+	eth_dev->tx_pkt_prep = (eth_tx_prep_t)&eth_em_prep_pkts;
 
 	/* for secondary processes, we don't initialise any further as primary
 	 * has already done this work. Only check we don't need a different
@@ -1067,6 +1068,8 @@ eth_em_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 		.nb_max = E1000_MAX_RING_DESC,
 		.nb_min = E1000_MIN_RING_DESC,
 		.nb_align = EM_TXD_ALIGN,
+		.nb_seg_max = EM_TX_MAX_SEG,
+		.nb_mtu_seg_max = EM_TX_MAX_MTU_SEG,
 	};
 
 	dev_info->speed_capa = ETH_LINK_SPEED_10M_HD | ETH_LINK_SPEED_10M |
diff --git a/drivers/net/e1000/em_rxtx.c b/drivers/net/e1000/em_rxtx.c
index 41f51c0..5bd3c99 100644
--- a/drivers/net/e1000/em_rxtx.c
+++ b/drivers/net/e1000/em_rxtx.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -66,6 +66,7 @@
 #include <rte_udp.h>
 #include <rte_tcp.h>
 #include <rte_sctp.h>
+#include <rte_net.h>
 #include <rte_string_fns.h>
 
 #include "e1000_logs.h"
@@ -77,6 +78,14 @@
 
 #define E1000_RXDCTL_GRAN	0x01000000 /* RXDCTL Granularity */
 
+#define E1000_TX_OFFLOAD_MASK ( \
+		PKT_TX_IP_CKSUM |       \
+		PKT_TX_L4_MASK |        \
+		PKT_TX_VLAN_PKT)
+
+#define E1000_TX_OFFLOAD_NOTSUP_MASK \
+		(PKT_TX_OFFLOAD_MASK ^ E1000_TX_OFFLOAD_MASK)
+
 /**
  * Structure associated with each descriptor of the RX ring of a RX queue.
  */
@@ -618,6 +627,43 @@ end_of_tx:
 
 /*********************************************************************
  *
+ *  TX prep functions
+ *
+ **********************************************************************/
+uint16_t
+eth_em_prep_pkts(__rte_unused void *tx_queue, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts)
+{
+	int i, ret;
+	struct rte_mbuf *m;
+
+	for (i = 0; i < nb_pkts; i++) {
+		m = tx_pkts[i];
+
+		if (m->ol_flags & E1000_TX_OFFLOAD_NOTSUP_MASK) {
+			rte_errno = -ENOTSUP;
+			return i;
+		}
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+		ret = rte_validate_tx_offload(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+#endif
+		ret = rte_phdr_cksum_fix(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+	}
+
+	return i;
+}
+
+/*********************************************************************
+ *
  *  RX functions
  *
  **********************************************************************/
diff --git a/drivers/net/e1000/igb_ethdev.c b/drivers/net/e1000/igb_ethdev.c
index 4924396..0afdd09 100644
--- a/drivers/net/e1000/igb_ethdev.c
+++ b/drivers/net/e1000/igb_ethdev.c
@@ -369,6 +369,8 @@ static const struct rte_eth_desc_lim tx_desc_lim = {
 	.nb_max = E1000_MAX_RING_DESC,
 	.nb_min = E1000_MIN_RING_DESC,
 	.nb_align = IGB_RXD_ALIGN,
+	.nb_seg_max = IGB_TX_MAX_SEG,
+	.nb_mtu_seg_max = IGB_TX_MAX_MTU_SEG,
 };
 
 static const struct eth_dev_ops eth_igb_ops = {
@@ -760,6 +762,7 @@ eth_igb_dev_init(struct rte_eth_dev *eth_dev)
 	eth_dev->dev_ops = &eth_igb_ops;
 	eth_dev->rx_pkt_burst = &eth_igb_recv_pkts;
 	eth_dev->tx_pkt_burst = &eth_igb_xmit_pkts;
+	eth_dev->tx_pkt_prep = &eth_igb_prep_pkts;
 
 	/* for secondary processes, we don't initialise any further as primary
 	 * has already done this work. Only check we don't need a different
@@ -963,6 +966,7 @@ eth_igbvf_dev_init(struct rte_eth_dev *eth_dev)
 	eth_dev->dev_ops = &igbvf_eth_dev_ops;
 	eth_dev->rx_pkt_burst = &eth_igb_recv_pkts;
 	eth_dev->tx_pkt_burst = &eth_igb_xmit_pkts;
+	eth_dev->tx_pkt_prep = &eth_igb_prep_pkts;
 
 	/* for secondary processes, we don't initialise any further as primary
 	 * has already done this work. Only check we don't need a different
diff --git a/drivers/net/e1000/igb_rxtx.c b/drivers/net/e1000/igb_rxtx.c
index dbd37ac..08e47f2 100644
--- a/drivers/net/e1000/igb_rxtx.c
+++ b/drivers/net/e1000/igb_rxtx.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -65,6 +65,7 @@
 #include <rte_udp.h>
 #include <rte_tcp.h>
 #include <rte_sctp.h>
+#include <rte_net.h>
 #include <rte_string_fns.h>
 
 #include "e1000_logs.h"
@@ -78,6 +79,9 @@
 		PKT_TX_L4_MASK |		 \
 		PKT_TX_TCP_SEG)
 
+#define IGB_TX_OFFLOAD_NOTSUP_MASK \
+		(PKT_TX_OFFLOAD_MASK ^ IGB_TX_OFFLOAD_MASK)
+
 /**
  * Structure associated with each descriptor of the RX ring of a RX queue.
  */
@@ -616,6 +620,51 @@ eth_igb_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 
 /*********************************************************************
  *
+ *  TX prep functions
+ *
+ **********************************************************************/
+uint16_t
+eth_igb_prep_pkts(__rte_unused void *tx_queue, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts)
+{
+	int i, ret;
+	struct rte_mbuf *m;
+
+	for (i = 0; i < nb_pkts; i++) {
+		m = tx_pkts[i];
+
+		/* Check some limitations for TSO in hardware */
+		if (m->ol_flags & PKT_TX_TCP_SEG)
+			if ((m->tso_segsz > IGB_TSO_MAX_MSS) || (m->l2_len + m->l3_len +
+					m->l4_len > IGB_TSO_MAX_HDRLEN)) {
+				rte_errno = -EINVAL;
+				return i;
+			}
+
+		if (m->ol_flags & IGB_TX_OFFLOAD_NOTSUP_MASK) {
+			rte_errno = -ENOTSUP;
+			return i;
+		}
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+		ret = rte_validate_tx_offload(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+#endif
+		ret = rte_phdr_cksum_fix(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+	}
+
+	return i;
+}
+
+/*********************************************************************
+ *
  *  RX functions
  *
  **********************************************************************/
@@ -1364,6 +1413,7 @@ eth_igb_tx_queue_setup(struct rte_eth_dev *dev,
 
 	igb_reset_tx_queue(txq, dev);
 	dev->tx_pkt_burst = eth_igb_xmit_pkts;
+	dev->tx_pkt_prep = &eth_igb_prep_pkts;
 	dev->data->tx_queues[queue_idx] = txq;
 
 	return 0;
-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 261+ messages in thread

* [dpdk-dev] [PATCH v9 3/6] fm10k: add Tx preparation
  2016-10-24 14:05               ` [dpdk-dev] [PATCH v9 0/6] add Tx preparation Tomasz Kulasek
  2016-10-24 14:05                 ` [dpdk-dev] [PATCH v9 1/6] ethdev: " Tomasz Kulasek
  2016-10-24 14:05                 ` [dpdk-dev] [PATCH v9 2/6] e1000: " Tomasz Kulasek
@ 2016-10-24 14:05                 ` Tomasz Kulasek
  2016-10-24 14:05                 ` [dpdk-dev] [PATCH v9 4/6] i40e: " Tomasz Kulasek
                                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 261+ messages in thread
From: Tomasz Kulasek @ 2016-10-24 14:05 UTC (permalink / raw)
  To: dev; +Cc: konstantin.ananyev, olivier.matz

Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
---
 drivers/net/fm10k/fm10k.h        |    6 +++++
 drivers/net/fm10k/fm10k_ethdev.c |    5 ++++
 drivers/net/fm10k/fm10k_rxtx.c   |   50 +++++++++++++++++++++++++++++++++++++-
 3 files changed, 60 insertions(+), 1 deletion(-)

diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
index 05aa1a2..c6fed21 100644
--- a/drivers/net/fm10k/fm10k.h
+++ b/drivers/net/fm10k/fm10k.h
@@ -69,6 +69,9 @@
 #define FM10K_MAX_RX_DESC  (FM10K_MAX_RX_RING_SZ / sizeof(union fm10k_rx_desc))
 #define FM10K_MAX_TX_DESC  (FM10K_MAX_TX_RING_SZ / sizeof(struct fm10k_tx_desc))
 
+#define FM10K_TX_MAX_SEG     UINT8_MAX
+#define FM10K_TX_MAX_MTU_SEG UINT8_MAX
+
 /*
  * byte aligment for HW RX data buffer
  * Datasheet requires RX buffer addresses shall either be 512-byte aligned or
@@ -356,6 +359,9 @@ fm10k_dev_rx_descriptor_done(void *rx_queue, uint16_t offset);
 uint16_t fm10k_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 	uint16_t nb_pkts);
 
+uint16_t fm10k_prep_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
+	uint16_t nb_pkts);
+
 int fm10k_rxq_vec_setup(struct fm10k_rx_queue *rxq);
 int fm10k_rx_vec_condition_check(struct rte_eth_dev *);
 void fm10k_rx_queue_release_mbufs_vec(struct fm10k_rx_queue *rxq);
diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index c804436..dffb6d1 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -1446,6 +1446,8 @@ fm10k_dev_infos_get(struct rte_eth_dev *dev,
 		.nb_max = FM10K_MAX_TX_DESC,
 		.nb_min = FM10K_MIN_TX_DESC,
 		.nb_align = FM10K_MULT_TX_DESC,
+		.nb_seg_max = FM10K_TX_MAX_SEG,
+		.nb_mtu_seg_max = FM10K_TX_MAX_MTU_SEG,
 	};
 
 	dev_info->speed_capa = ETH_LINK_SPEED_1G | ETH_LINK_SPEED_2_5G |
@@ -2754,8 +2756,10 @@ fm10k_set_tx_function(struct rte_eth_dev *dev)
 			fm10k_txq_vec_setup(txq);
 		}
 		dev->tx_pkt_burst = fm10k_xmit_pkts_vec;
+		dev->tx_pkt_prep = NULL;
 	} else {
 		dev->tx_pkt_burst = fm10k_xmit_pkts;
+		dev->tx_pkt_prep = fm10k_prep_pkts;
 		PMD_INIT_LOG(DEBUG, "Use regular Tx func");
 	}
 }
@@ -2834,6 +2838,7 @@ eth_fm10k_dev_init(struct rte_eth_dev *dev)
 	dev->dev_ops = &fm10k_eth_dev_ops;
 	dev->rx_pkt_burst = &fm10k_recv_pkts;
 	dev->tx_pkt_burst = &fm10k_xmit_pkts;
+	dev->tx_pkt_prep = &fm10k_prep_pkts;
 
 	/* only initialize in the primary process */
 	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
diff --git a/drivers/net/fm10k/fm10k_rxtx.c b/drivers/net/fm10k/fm10k_rxtx.c
index 32cc7ff..5fc4d5a 100644
--- a/drivers/net/fm10k/fm10k_rxtx.c
+++ b/drivers/net/fm10k/fm10k_rxtx.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2013-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2013-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -35,6 +35,7 @@
 
 #include <rte_ethdev.h>
 #include <rte_common.h>
+#include <rte_net.h>
 #include "fm10k.h"
 #include "base/fm10k_type.h"
 
@@ -65,6 +66,15 @@ static inline void dump_rxd(union fm10k_rx_desc *rxd)
 }
 #endif
 
+#define FM10K_TX_OFFLOAD_MASK (  \
+		PKT_TX_VLAN_PKT |        \
+		PKT_TX_IP_CKSUM |        \
+		PKT_TX_L4_MASK |         \
+		PKT_TX_TCP_SEG)
+
+#define FM10K_TX_OFFLOAD_NOTSUP_MASK \
+		(PKT_TX_OFFLOAD_MASK ^ FM10K_TX_OFFLOAD_MASK)
+
 /* @note: When this function is changed, make corresponding change to
  * fm10k_dev_supported_ptypes_get()
  */
@@ -597,3 +607,41 @@ fm10k_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 
 	return count;
 }
+
+uint16_t
+fm10k_prep_pkts(__rte_unused void *tx_queue, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts)
+{
+	int i, ret;
+	struct rte_mbuf *m;
+
+	for (i = 0; i < nb_pkts; i++) {
+		m = tx_pkts[i];
+
+		if ((m->ol_flags & PKT_TX_TCP_SEG) &&
+				(m->tso_segsz < FM10K_TSO_MINMSS)) {
+			rte_errno = -EINVAL;
+			return i;
+		}
+
+		if (m->ol_flags & FM10K_TX_OFFLOAD_NOTSUP_MASK) {
+			rte_errno = -ENOTSUP;
+			return i;
+		}
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+		ret = rte_validate_tx_offload(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+#endif
+		ret = rte_phdr_cksum_fix(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+	}
+
+	return i;
+}
-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 261+ messages in thread

* [dpdk-dev] [PATCH v9 4/6] i40e: add Tx preparation
  2016-10-24 14:05               ` [dpdk-dev] [PATCH v9 0/6] add Tx preparation Tomasz Kulasek
                                   ` (2 preceding siblings ...)
  2016-10-24 14:05                 ` [dpdk-dev] [PATCH v9 3/6] fm10k: " Tomasz Kulasek
@ 2016-10-24 14:05                 ` Tomasz Kulasek
  2016-10-24 14:05                 ` [dpdk-dev] [PATCH v9 5/6] ixgbe: " Tomasz Kulasek
                                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 261+ messages in thread
From: Tomasz Kulasek @ 2016-10-24 14:05 UTC (permalink / raw)
  To: dev; +Cc: konstantin.ananyev, olivier.matz

Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
---
 drivers/net/i40e/i40e_ethdev.c |    3 ++
 drivers/net/i40e/i40e_rxtx.c   |   72 +++++++++++++++++++++++++++++++++++++++-
 drivers/net/i40e/i40e_rxtx.h   |    8 +++++
 3 files changed, 82 insertions(+), 1 deletion(-)

diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
index 5af0e43..dab0d48 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -936,6 +936,7 @@ eth_i40e_dev_init(struct rte_eth_dev *dev)
 	dev->dev_ops = &i40e_eth_dev_ops;
 	dev->rx_pkt_burst = i40e_recv_pkts;
 	dev->tx_pkt_burst = i40e_xmit_pkts;
+	dev->tx_pkt_prep = i40e_prep_pkts;
 
 	/* for secondary processes, we don't initialise any further as primary
 	 * has already done this work. Only check we don't need a different
@@ -2629,6 +2630,8 @@ i40e_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 		.nb_max = I40E_MAX_RING_DESC,
 		.nb_min = I40E_MIN_RING_DESC,
 		.nb_align = I40E_ALIGN_RING_DESC,
+		.nb_seg_max = I40E_TX_MAX_SEG,
+		.nb_mtu_seg_max = I40E_TX_MAX_MTU_SEG,
 	};
 
 	if (pf->flags & I40E_FLAG_VMDQ) {
diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
index 7ae7d9f..7f6d3d8 100644
--- a/drivers/net/i40e/i40e_rxtx.c
+++ b/drivers/net/i40e/i40e_rxtx.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -50,6 +50,8 @@
 #include <rte_tcp.h>
 #include <rte_sctp.h>
 #include <rte_udp.h>
+#include <rte_ip.h>
+#include <rte_net.h>
 
 #include "i40e_logs.h"
 #include "base/i40e_prototype.h"
@@ -79,6 +81,17 @@
 		PKT_TX_TCP_SEG |		 \
 		PKT_TX_OUTER_IP_CKSUM)
 
+#define I40E_TX_OFFLOAD_MASK (  \
+		PKT_TX_IP_CKSUM |       \
+		PKT_TX_L4_MASK |        \
+		PKT_TX_OUTER_IP_CKSUM | \
+		PKT_TX_TCP_SEG |        \
+		PKT_TX_QINQ_PKT |       \
+		PKT_TX_VLAN_PKT)
+
+#define I40E_TX_OFFLOAD_NOTSUP_MASK \
+		(PKT_TX_OFFLOAD_MASK ^ I40E_TX_OFFLOAD_MASK)
+
 static uint16_t i40e_xmit_pkts_simple(void *tx_queue,
 				      struct rte_mbuf **tx_pkts,
 				      uint16_t nb_pkts);
@@ -1411,6 +1424,61 @@ i40e_xmit_pkts_simple(void *tx_queue,
 	return nb_tx;
 }
 
+/*********************************************************************
+ *
+ *  TX prep functions
+ *
+ **********************************************************************/
+uint16_t
+i40e_prep_pkts(__rte_unused void *tx_queue, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts)
+{
+	int i, ret;
+	uint64_t ol_flags;
+	struct rte_mbuf *m;
+
+	for (i = 0; i < nb_pkts; i++) {
+		m = tx_pkts[i];
+		ol_flags = m->ol_flags;
+
+		/**
+		 * m->nb_segs is uint8_t, so it can never exceed
+		 * I40E_TX_MAX_SEG (UINT8_MAX).
+		 * We only check the m->nb_segs > I40E_TX_MAX_MTU_SEG condition.
+		 */
+		if (!(ol_flags & PKT_TX_TCP_SEG)) {
+			if (m->nb_segs > I40E_TX_MAX_MTU_SEG) {
+				rte_errno = -EINVAL;
+				return i;
+			}
+		} else if ((m->tso_segsz < I40E_MIN_TSO_MSS) ||
+				(m->tso_segsz > I40E_MAX_TSO_MSS)) {
+			/* An MSS outside the range (256B - 9674B) is considered malicious */
+			rte_errno = -EINVAL;
+			return i;
+		}
+
+		if (ol_flags & I40E_TX_OFFLOAD_NOTSUP_MASK) {
+			rte_errno = -ENOTSUP;
+			return i;
+		}
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+		ret = rte_validate_tx_offload(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+#endif
+		ret = rte_phdr_cksum_fix(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+	}
+	return i;
+}
+
 /*
  * Find the VSI the queue belongs to. 'queue_idx' is the queue index
  * application used, which assume having sequential ones. But from driver's
@@ -2763,9 +2831,11 @@ i40e_set_tx_function(struct rte_eth_dev *dev)
 			PMD_INIT_LOG(DEBUG, "Simple tx finally be used.");
 			dev->tx_pkt_burst = i40e_xmit_pkts_simple;
 		}
+		dev->tx_pkt_prep = NULL;
 	} else {
 		PMD_INIT_LOG(DEBUG, "Xmit tx finally be used.");
 		dev->tx_pkt_burst = i40e_xmit_pkts;
+		dev->tx_pkt_prep = i40e_prep_pkts;
 	}
 }
 
diff --git a/drivers/net/i40e/i40e_rxtx.h b/drivers/net/i40e/i40e_rxtx.h
index ecdb13c..9df8a56 100644
--- a/drivers/net/i40e/i40e_rxtx.h
+++ b/drivers/net/i40e/i40e_rxtx.h
@@ -63,6 +63,12 @@
 #define	I40E_MIN_RING_DESC	64
 #define	I40E_MAX_RING_DESC	4096
 
+#define I40E_MIN_TSO_MSS          256
+#define I40E_MAX_TSO_MSS          9674
+
+#define I40E_TX_MAX_SEG     UINT8_MAX
+#define I40E_TX_MAX_MTU_SEG 8
+
 #undef container_of
 #define container_of(ptr, type, member) ({ \
 		typeof(((type *)0)->member)(*__mptr) = (ptr); \
@@ -223,6 +229,8 @@ uint16_t i40e_recv_scattered_pkts(void *rx_queue,
 uint16_t i40e_xmit_pkts(void *tx_queue,
 			struct rte_mbuf **tx_pkts,
 			uint16_t nb_pkts);
+uint16_t i40e_prep_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts);
 int i40e_tx_queue_init(struct i40e_tx_queue *txq);
 int i40e_rx_queue_init(struct i40e_rx_queue *rxq);
 void i40e_free_tx_resources(struct i40e_tx_queue *txq);
-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 261+ messages in thread

* [dpdk-dev] [PATCH v9 5/6] ixgbe: add Tx preparation
  2016-10-24 14:05               ` [dpdk-dev] [PATCH v9 0/6] add Tx preparation Tomasz Kulasek
                                   ` (3 preceding siblings ...)
  2016-10-24 14:05                 ` [dpdk-dev] [PATCH v9 4/6] i40e: " Tomasz Kulasek
@ 2016-10-24 14:05                 ` Tomasz Kulasek
  2016-10-24 14:05                 ` [dpdk-dev] [PATCH v9 6/6] testpmd: use Tx preparation in csum engine Tomasz Kulasek
  2016-10-24 16:51                 ` [dpdk-dev] [PATCH v10 0/6] add Tx preparation Tomasz Kulasek
  6 siblings, 0 replies; 261+ messages in thread
From: Tomasz Kulasek @ 2016-10-24 14:05 UTC (permalink / raw)
  To: dev; +Cc: konstantin.ananyev, olivier.matz

Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
---
 drivers/net/ixgbe/ixgbe_ethdev.c |    3 ++
 drivers/net/ixgbe/ixgbe_ethdev.h |    5 +++-
 drivers/net/ixgbe/ixgbe_rxtx.c   |   58 +++++++++++++++++++++++++++++++++++++-
 drivers/net/ixgbe/ixgbe_rxtx.h   |    2 ++
 4 files changed, 66 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
index 4ca5747..4c6a8e1 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -517,6 +517,8 @@ static const struct rte_eth_desc_lim tx_desc_lim = {
 	.nb_max = IXGBE_MAX_RING_DESC,
 	.nb_min = IXGBE_MIN_RING_DESC,
 	.nb_align = IXGBE_TXD_ALIGN,
+	.nb_seg_max = IXGBE_TX_MAX_SEG,
+	.nb_mtu_seg_max = IXGBE_TX_MAX_SEG,
 };
 
 static const struct eth_dev_ops ixgbe_eth_dev_ops = {
@@ -1103,6 +1105,7 @@ eth_ixgbe_dev_init(struct rte_eth_dev *eth_dev)
 	eth_dev->dev_ops = &ixgbe_eth_dev_ops;
 	eth_dev->rx_pkt_burst = &ixgbe_recv_pkts;
 	eth_dev->tx_pkt_burst = &ixgbe_xmit_pkts;
+	eth_dev->tx_pkt_prep = &ixgbe_prep_pkts;
 
 	/*
 	 * For secondary processes, we don't initialise any further as primary
diff --git a/drivers/net/ixgbe/ixgbe_ethdev.h b/drivers/net/ixgbe/ixgbe_ethdev.h
index 4ff6338..e229cf5 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.h
+++ b/drivers/net/ixgbe/ixgbe_ethdev.h
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -396,6 +396,9 @@ uint16_t ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 uint16_t ixgbe_xmit_pkts_simple(void *tx_queue, struct rte_mbuf **tx_pkts,
 		uint16_t nb_pkts);
 
+uint16_t ixgbe_prep_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts);
+
 int ixgbe_dev_rss_hash_update(struct rte_eth_dev *dev,
 			      struct rte_eth_rss_conf *rss_conf);
 
diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
index 2ce8234..031414c 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   Copyright 2014 6WIND S.A.
  *   All rights reserved.
  *
@@ -70,6 +70,7 @@
 #include <rte_string_fns.h>
 #include <rte_errno.h>
 #include <rte_ip.h>
+#include <rte_net.h>
 
 #include "ixgbe_logs.h"
 #include "base/ixgbe_api.h"
@@ -87,6 +88,9 @@
 		PKT_TX_TCP_SEG |		 \
 		PKT_TX_OUTER_IP_CKSUM)
 
+#define IXGBE_TX_OFFLOAD_NOTSUP_MASK \
+		(PKT_TX_OFFLOAD_MASK ^ IXGBE_TX_OFFLOAD_MASK)
+
 #if 1
 #define RTE_PMD_USE_PREFETCH
 #endif
@@ -905,6 +909,56 @@ end_of_tx:
 
 /*********************************************************************
  *
+ *  TX prep functions
+ *
+ **********************************************************************/
+uint16_t
+ixgbe_prep_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
+{
+	int i, ret;
+	uint64_t ol_flags;
+	struct rte_mbuf *m;
+	struct ixgbe_tx_queue *txq = (struct ixgbe_tx_queue *)tx_queue;
+
+	for (i = 0; i < nb_pkts; i++) {
+		m = tx_pkts[i];
+		ol_flags = m->ol_flags;
+
+		/**
+		 * Check if the packet meets the requirements for the number of segments
+		 *
+		 * NOTE: for ixgbe it's always (40 - WTHRESH) for both TSO and non-TSO
+		 */
+
+		if (m->nb_segs > IXGBE_TX_MAX_SEG - txq->wthresh) {
+			rte_errno = -EINVAL;
+			return i;
+		}
+
+		if (ol_flags & IXGBE_TX_OFFLOAD_NOTSUP_MASK) {
+			rte_errno = -ENOTSUP;
+			return i;
+		}
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+		ret = rte_validate_tx_offload(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+#endif
+		ret = rte_phdr_cksum_fix(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+	}
+
+	return i;
+}
+
+/*********************************************************************
+ *
  *  RX functions
  *
  **********************************************************************/
@@ -2282,6 +2336,7 @@ ixgbe_set_tx_function(struct rte_eth_dev *dev, struct ixgbe_tx_queue *txq)
 	if (((txq->txq_flags & IXGBE_SIMPLE_FLAGS) == IXGBE_SIMPLE_FLAGS)
 			&& (txq->tx_rs_thresh >= RTE_PMD_IXGBE_TX_MAX_BURST)) {
 		PMD_INIT_LOG(DEBUG, "Using simple tx code path");
+		dev->tx_pkt_prep = NULL;
 #ifdef RTE_IXGBE_INC_VECTOR
 		if (txq->tx_rs_thresh <= RTE_IXGBE_TX_MAX_FREE_BUF_SZ &&
 				(rte_eal_process_type() != RTE_PROC_PRIMARY ||
@@ -2302,6 +2357,7 @@ ixgbe_set_tx_function(struct rte_eth_dev *dev, struct ixgbe_tx_queue *txq)
 				(unsigned long)txq->tx_rs_thresh,
 				(unsigned long)RTE_PMD_IXGBE_TX_MAX_BURST);
 		dev->tx_pkt_burst = ixgbe_xmit_pkts;
+		dev->tx_pkt_prep = ixgbe_prep_pkts;
 	}
 }
 
diff --git a/drivers/net/ixgbe/ixgbe_rxtx.h b/drivers/net/ixgbe/ixgbe_rxtx.h
index 2608b36..7bbd9b8 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx.h
+++ b/drivers/net/ixgbe/ixgbe_rxtx.h
@@ -80,6 +80,8 @@
 #define RTE_IXGBE_WAIT_100_US               100
 #define RTE_IXGBE_VMTXSW_REGISTER_COUNT     2
 
+#define IXGBE_TX_MAX_SEG                    40
+
 #define IXGBE_PACKET_TYPE_MASK_82599        0X7F
 #define IXGBE_PACKET_TYPE_MASK_X550         0X10FF
 #define IXGBE_PACKET_TYPE_MASK_TUNNEL       0XFF
-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 261+ messages in thread

* [dpdk-dev] [PATCH v9 6/6] testpmd: use Tx preparation in csum engine
  2016-10-24 14:05               ` [dpdk-dev] [PATCH v9 0/6] add Tx preparation Tomasz Kulasek
                                   ` (4 preceding siblings ...)
  2016-10-24 14:05                 ` [dpdk-dev] [PATCH v9 5/6] ixgbe: " Tomasz Kulasek
@ 2016-10-24 14:05                 ` Tomasz Kulasek
  2016-10-24 16:51                 ` [dpdk-dev] [PATCH v10 0/6] add Tx preparation Tomasz Kulasek
  6 siblings, 0 replies; 261+ messages in thread
From: Tomasz Kulasek @ 2016-10-24 14:05 UTC (permalink / raw)
  To: dev; +Cc: konstantin.ananyev, olivier.matz

Removed the pseudo-header calculation for udp/tcp/tso packets from the
application and used the Tx preparation API for packet preparation and
verification.

Adding the additional step to the csum engine costs about a 3-4%
performance drop on my setup with the ixgbe driver. It is caused mostly
by the need to re-access and modify the packet data.

Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
---
 app/test-pmd/csumonly.c |   36 +++++++++++++-----------------------
 1 file changed, 13 insertions(+), 23 deletions(-)

diff --git a/app/test-pmd/csumonly.c b/app/test-pmd/csumonly.c
index 57e6ae2..6f33ae9 100644
--- a/app/test-pmd/csumonly.c
+++ b/app/test-pmd/csumonly.c
@@ -112,15 +112,6 @@ struct simple_gre_hdr {
 } __attribute__((__packed__));
 
 static uint16_t
-get_psd_sum(void *l3_hdr, uint16_t ethertype, uint64_t ol_flags)
-{
-	if (ethertype == _htons(ETHER_TYPE_IPv4))
-		return rte_ipv4_phdr_cksum(l3_hdr, ol_flags);
-	else /* assume ethertype == ETHER_TYPE_IPv6 */
-		return rte_ipv6_phdr_cksum(l3_hdr, ol_flags);
-}
-
-static uint16_t
 get_udptcp_checksum(void *l3_hdr, void *l4_hdr, uint16_t ethertype)
 {
 	if (ethertype == _htons(ETHER_TYPE_IPv4))
@@ -370,32 +361,24 @@ process_inner_cksums(void *l3_hdr, const struct testpmd_offload_info *info,
 		/* do not recalculate udp cksum if it was 0 */
 		if (udp_hdr->dgram_cksum != 0) {
 			udp_hdr->dgram_cksum = 0;
-			if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_UDP_CKSUM) {
+			if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_UDP_CKSUM)
 				ol_flags |= PKT_TX_UDP_CKSUM;
-				udp_hdr->dgram_cksum = get_psd_sum(l3_hdr,
-					info->ethertype, ol_flags);
-			} else {
+			else
 				udp_hdr->dgram_cksum =
 					get_udptcp_checksum(l3_hdr, udp_hdr,
 						info->ethertype);
-			}
 		}
 	} else if (info->l4_proto == IPPROTO_TCP) {
 		tcp_hdr = (struct tcp_hdr *)((char *)l3_hdr + info->l3_len);
 		tcp_hdr->cksum = 0;
-		if (tso_segsz) {
+		if (tso_segsz)
 			ol_flags |= PKT_TX_TCP_SEG;
-			tcp_hdr->cksum = get_psd_sum(l3_hdr, info->ethertype,
-				ol_flags);
-		} else if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_TCP_CKSUM) {
+		else if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_TCP_CKSUM)
 			ol_flags |= PKT_TX_TCP_CKSUM;
-			tcp_hdr->cksum = get_psd_sum(l3_hdr, info->ethertype,
-				ol_flags);
-		} else {
+		else
 			tcp_hdr->cksum =
 				get_udptcp_checksum(l3_hdr, tcp_hdr,
 					info->ethertype);
-		}
 	} else if (info->l4_proto == IPPROTO_SCTP) {
 		sctp_hdr = (struct sctp_hdr *)((char *)l3_hdr + info->l3_len);
 		sctp_hdr->cksum = 0;
@@ -648,6 +631,7 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 	void *l3_hdr = NULL, *outer_l3_hdr = NULL; /* can be IPv4 or IPv6 */
 	uint16_t nb_rx;
 	uint16_t nb_tx;
+	uint16_t nb_prep;
 	uint16_t i;
 	uint64_t rx_ol_flags, tx_ol_flags;
 	uint16_t testpmd_ol_flags;
@@ -857,7 +841,13 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 			printf("\n");
 		}
 	}
-	nb_tx = rte_eth_tx_burst(fs->tx_port, fs->tx_queue, pkts_burst, nb_rx);
+	nb_prep = rte_eth_tx_prep(fs->tx_port, fs->tx_queue, pkts_burst,
+				nb_rx);
+	if (nb_prep != nb_rx)
+		printf("Preparing packet burst to transmit failed: %s\n",
+				rte_strerror(rte_errno));
+
+	nb_tx = rte_eth_tx_burst(fs->tx_port, fs->tx_queue, pkts_burst, nb_prep);
 	/*
 	 * Retry if necessary
 	 */
-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v8 1/6] ethdev: add Tx preparation
  2016-10-24 12:56                     ` Ananyev, Konstantin
@ 2016-10-24 14:12                       ` Kulasek, TomaszX
  0 siblings, 0 replies; 261+ messages in thread
From: Kulasek, TomaszX @ 2016-10-24 14:12 UTC (permalink / raw)
  To: Ananyev, Konstantin, dev; +Cc: olivier.matz



> -----Original Message-----
> From: Ananyev, Konstantin
> Sent: Monday, October 24, 2016 14:57
> To: Kulasek, TomaszX <tomaszx.kulasek@intel.com>; dev@dpdk.org
> Cc: olivier.matz@6wind.com
> Subject: RE: [PATCH v8 1/6] ethdev: add Tx preparation
> 
> 
> 
> > -----Original Message-----
> > From: Kulasek, TomaszX
> > Sent: Monday, October 24, 2016 1:49 PM
> > To: Ananyev, Konstantin <konstantin.ananyev@intel.com>; dev@dpdk.org
> > Cc: olivier.matz@6wind.com
> > Subject: RE: [PATCH v8 1/6] ethdev: add Tx preparation
> >
> > Hi Konstantin,
> >
> > > -----Original Message-----
> > > From: Ananyev, Konstantin
> > > Sent: Monday, October 24, 2016 14:15
> > > To: Kulasek, TomaszX <tomaszx.kulasek@intel.com>; dev@dpdk.org
> > > Cc: olivier.matz@6wind.com
> > > Subject: RE: [PATCH v8 1/6] ethdev: add Tx preparation
> > >
> > > Hi Tomasz,
> > >
> >
> > [...]
> >
> > > >
> > > > +/**
> > > > + * Fix pseudo header checksum
> > > > + *
> > > > + * This function fixes pseudo header checksum for TSO and non-TSO tcp/udp in
> > > > + * provided mbufs packet data.
> > > > + *
> > > > + * - for non-TSO tcp/udp packets full pseudo-header checksum is counted and set
> > > > + *   in packet data,
> > > > + * - for TSO the IP payload length is not included in pseudo header.
> > > > + *
> > > > + * This function expects that used headers are in the first data segment of
> > > > + * mbuf, and are not fragmented.
> > > > + *
> > > > + * @param m
> > > > + *   The packet mbuf to be validated.
> > > > + * @return
> > > > + *   0 if checksum is initialized properly
> > > > + */
> > > > +static inline int
> > > > +rte_phdr_cksum_fix(struct rte_mbuf *m) {
> > > > +	struct ipv4_hdr *ipv4_hdr;
> > > > +	struct ipv6_hdr *ipv6_hdr;
> > > > +	struct tcp_hdr *tcp_hdr;
> > > > +	struct udp_hdr *udp_hdr;
> > > > +	uint64_t ol_flags = m->ol_flags;
> > > > +	uint64_t inner_l3_offset = m->l2_len;
> > > > +
> > > > +	if (ol_flags & PKT_TX_OUTER_IP_CKSUM)
> > > > +		inner_l3_offset += m->outer_l2_len + m->outer_l3_len;
> > > > +
> > > > +	/* headers are fragmented */
> > > > +	if (unlikely(rte_pktmbuf_data_len(m) >= inner_l3_offset + m->l3_len +
> > > > +			m->l4_len))
> > >
> > > Might be better to move that check into rte_validate_tx_offload(),
> > > so it would be called only when TX_DEBUG is on.
> >
> > Since unfragmented headers are not a general requirement for Tx
> > offloads, and this requirement comes from this particular implementation,
> > maybe for performance reasons it will be better to keep the check here
> > and just add #if DEBUG, leaving rte_validate_tx_offload more generic.
> 
> Hmm, and what is the advantage of polluting that code with more ifdefs?
> Again, why are unfragmented headers not a general requirement?
> As long as the DPDK pseudo-header csum calculation routines can't handle
> the fragmented case, it pretty much is a general requirement, no?
> Konstantin
> 

Ok, you're right. If we assume that this is a general requirement, it should be moved.

> >
> > > Another thing, shouldn't it be:
> > > if (rte_pktmbuf_data_len(m) < inner_l3_offset + m->l3_len + m->l4_len)
> ?
> >
> > Yes, it should.
> >
> > > Konstantin
> > >
> >
> > Tomasz

^ permalink raw reply	[flat|nested] 261+ messages in thread

* [dpdk-dev] [PATCH v10 0/6] add Tx preparation
  2016-10-24 14:05               ` [dpdk-dev] [PATCH v9 0/6] add Tx preparation Tomasz Kulasek
                                   ` (5 preceding siblings ...)
  2016-10-24 14:05                 ` [dpdk-dev] [PATCH v9 6/6] testpmd: use Tx preparation in csum engine Tomasz Kulasek
@ 2016-10-24 16:51                 ` Tomasz Kulasek
  2016-10-24 16:51                   ` [dpdk-dev] [PATCH v10 1/6] ethdev: " Tomasz Kulasek
                                     ` (7 more replies)
  6 siblings, 8 replies; 261+ messages in thread
From: Tomasz Kulasek @ 2016-10-24 16:51 UTC (permalink / raw)
  To: dev; +Cc: konstantin.ananyev, olivier.matz

As discussed in that thread:

http://dpdk.org/ml/archives/dev/2015-September/023603.html

Different NIC models depending on HW offload requested might impose
different requirements on packets to be TX-ed in terms of:

 - Max number of fragments per packet allowed
 - Max number of fragments per TSO segments
 - The way pseudo-header checksum should be pre-calculated
 - L3/L4 header fields filling
 - etc.


MOTIVATION:
-----------

1) Some work cannot (and should not) be done in rte_eth_tx_burst.
   However, this work is sometimes required, and now it is left as an
   application issue.

2) Different hardware may have different requirements for TX offloads;
   a different subset may be supported, and so on.

3) Some parameters (e.g. the number of segments in the ixgbe driver) may
   hang the device. These parameters may vary between devices.

   For example, i40e HW allows 8 fragments per packet, but that is after
   TSO segmentation, while ixgbe has a 38-fragment pre-TSO limit.

4) Fields in the packet may require different initialization (e.g. some
   devices require pseudo-header checksum precalculation, sometimes done
   in a different way depending on the packet type, and so on). Now the
   application needs to take care of it.

5) Using an additional API (rte_eth_tx_prep) before rte_eth_tx_burst lets
   the application prepare the packet burst in a form acceptable to the
   specific device.

6) Some additional checks may be done in debug mode, keeping the tx_burst
   implementation clean.


PROPOSAL:
---------

To help the user deal with all these varieties, we propose to:

1) Introduce the rte_eth_tx_prep() function to do the necessary
   preparation of a packet burst so it can be safely transmitted on the
   device with the desired HW offloads (set/reset checksum fields
   according to the hardware requirements) and to check HW constraints
   (number of segments per packet, etc).

   Because the limitations and requirements may differ between devices,
   this requires extending the rte_eth_dev structure with a new function
   pointer, "tx_pkt_prep", which can be implemented in the driver to
   prepare and verify packets, in a device-specific way, before the
   burst; this should prevent the application from sending malformed
   packets.

2) Also, new fields will be introduced in rte_eth_desc_lim:
   nb_seg_max and nb_mtu_seg_max, providing information about the maximum
   number of segments in TSO and non-TSO packets acceptable to the device.

   This information is useful for the application to avoid creating (or
   to limit) malicious packets.


APPLICATION (USE CASE):
-----------------------

1) The application should initialize the burst of packets to send and set
   the required tx offload flags and fields, such as l2_len, l3_len,
   l4_len, and tso_segsz.

2) The application passes the burst to rte_eth_tx_prep to check the
   conditions required to send the packets through the NIC.

3) The result of rte_eth_tx_prep can be used to send the valid packets
   and/or to restore the invalid ones if the function fails.

e.g.

	for (i = 0; i < nb_pkts; i++) {

		/* initialize or process packet */

		bufs[i]->tso_segsz = 800;
		bufs[i]->ol_flags = PKT_TX_TCP_SEG | PKT_TX_IPV4
				| PKT_TX_IP_CKSUM;
		bufs[i]->l2_len = sizeof(struct ether_hdr);
		bufs[i]->l3_len = sizeof(struct ipv4_hdr);
		bufs[i]->l4_len = sizeof(struct tcp_hdr);
	}

	/* Prepare burst of TX packets */
	nb_prep = rte_eth_tx_prep(port, 0, bufs, nb_pkts);

	if (nb_prep < nb_pkts) {
		printf("tx_prep failed\n");

		/* nb_prep here indicates the first invalid packet.
		 * rte_eth_tx_prep can be used on the remaining packets to
		 * find further invalid ones.
		 */

	}

	/* Send burst of TX packets */
	nb_tx = rte_eth_tx_burst(port, 0, bufs, nb_prep);

	/* Free any unsent packets. */

v10 changes:
 - moved the driver tx callback check in rte_eth_tx_prep after the queue_id check

v9 changes:
 - fixed headers structure fragmentation check
 - moved fragmentation check into rte_validate_tx_offload()

v8 changes:
 - mbuf argument in rte_validate_tx_offload declared as const

v7 changes:
 - comments reworded/added
 - changed errno values returned from Tx prep API
 - added check in rte_phdr_cksum_fix if headers are in the first
   data segment and can be safely modified
 - moved rte_validate_tx_offload to rte_mbuf
 - moved rte_phdr_cksum_fix to rte_net.h
 - removed rte_pkt.h new file as useless

v6 changes:
- added performance impact test results to the patch description

v5 changes:
 - rebased csum engine modification
 - added information to the csum engine about performance tests
 - some performance improvements

v4 changes:
 - tx_prep is now set to default behavior (NULL) for the simple/vector
   paths in the fm10k, i40e and ixgbe drivers to increase performance,
   when Tx offloads are intentionally not available

v3 changes:
 - reworked csum testpmd engine instead adding new one,
 - fixed checksum initialization procedure to include also outer
   checksum offloads,
 - some minor formatting fixes and optimizations

v2 changes:
 - rte_eth_tx_prep() returns number of packets when device doesn't
   support tx_prep functionality,
 - introduced CONFIG_RTE_ETHDEV_TX_PREP allowing to turn off tx_prep

Tomasz Kulasek (6):
  ethdev: add Tx preparation
  e1000: add Tx preparation
  fm10k: add Tx preparation
  i40e: add Tx preparation
  ixgbe: add Tx preparation
  testpmd: use Tx preparation in csum engine

 app/test-pmd/csumonly.c          |   36 ++++++--------
 config/common_base               |    1 +
 drivers/net/e1000/e1000_ethdev.h |   11 +++++
 drivers/net/e1000/em_ethdev.c    |    5 +-
 drivers/net/e1000/em_rxtx.c      |   48 ++++++++++++++++++-
 drivers/net/e1000/igb_ethdev.c   |    4 ++
 drivers/net/e1000/igb_rxtx.c     |   52 ++++++++++++++++++++-
 drivers/net/fm10k/fm10k.h        |    6 +++
 drivers/net/fm10k/fm10k_ethdev.c |    5 ++
 drivers/net/fm10k/fm10k_rxtx.c   |   50 +++++++++++++++++++-
 drivers/net/i40e/i40e_ethdev.c   |    3 ++
 drivers/net/i40e/i40e_rxtx.c     |   72 +++++++++++++++++++++++++++-
 drivers/net/i40e/i40e_rxtx.h     |    8 ++++
 drivers/net/ixgbe/ixgbe_ethdev.c |    3 ++
 drivers/net/ixgbe/ixgbe_ethdev.h |    5 +-
 drivers/net/ixgbe/ixgbe_rxtx.c   |   58 ++++++++++++++++++++++-
 drivers/net/ixgbe/ixgbe_rxtx.h   |    2 +
 lib/librte_ether/rte_ethdev.h    |   96 ++++++++++++++++++++++++++++++++++++++
 lib/librte_mbuf/rte_mbuf.h       |   64 +++++++++++++++++++++++++
 lib/librte_net/rte_net.h         |   85 +++++++++++++++++++++++++++++++++
 20 files changed, 584 insertions(+), 30 deletions(-)

-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 261+ messages in thread

* [dpdk-dev] [PATCH v10 1/6] ethdev: add Tx preparation
  2016-10-24 16:51                 ` [dpdk-dev] [PATCH v10 0/6] add Tx preparation Tomasz Kulasek
@ 2016-10-24 16:51                   ` Tomasz Kulasek
  2016-10-25 14:41                     ` Olivier Matz
  2016-10-24 16:51                   ` [dpdk-dev] [PATCH v10 2/6] e1000: " Tomasz Kulasek
                                     ` (6 subsequent siblings)
  7 siblings, 1 reply; 261+ messages in thread
From: Tomasz Kulasek @ 2016-10-24 16:51 UTC (permalink / raw)
  To: dev; +Cc: konstantin.ananyev, olivier.matz

Added API for `rte_eth_tx_prep`

uint16_t rte_eth_tx_prep(uint8_t port_id, uint16_t queue_id,
	struct rte_mbuf **tx_pkts, uint16_t nb_pkts)

Added fields to the `struct rte_eth_desc_lim`:

	uint16_t nb_seg_max;
		/**< Max number of segments per whole packet. */

	uint16_t nb_mtu_seg_max;
		/**< Max number of segments per one MTU */
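
For illustration, a minimal sketch of how an application might consult
these limits before transmitting, assuming a configured port "port_id"
and a non-TSO mbuf "m":

	struct rte_eth_dev_info dev_info;

	rte_eth_dev_info_get(port_id, &dev_info);

	/* a non-TSO packet must respect the per-MTU segment limit */
	if (m->nb_segs > dev_info.tx_desc_lim.nb_mtu_seg_max) {
		/* split or linearize the packet before tx_prep/tx_burst */
	}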

Added functions:

int rte_validate_tx_offload(struct rte_mbuf *m)
	to validate the general requirements for the tx offloads set in the
	mbuf of a packet, such as flag completeness. In the current
	implementation this function is called optionally, when
	RTE_LIBRTE_ETHDEV_DEBUG is enabled.

int rte_phdr_cksum_fix(struct rte_mbuf *m)
	to fix the pseudo-header checksum for TSO and non-TSO tcp/udp packets
	before hardware tx checksum offload.
	 - for non-TSO tcp/udp packets the full pseudo-header checksum is
	   computed and set.
	 - for TSO, the IP payload length is not included.
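
For illustration, a condensed sketch of how a driver's prep callback
composes these two helpers per mbuf (xxx_prep_pkts is a placeholder name;
the per-driver patches in this series follow this pattern, with
additional device-specific checks):

	uint16_t
	xxx_prep_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
			uint16_t nb_pkts)
	{
		int i, ret;

		for (i = 0; i < nb_pkts; i++) {
			/* general validation; in the drivers below this is
			 * done under RTE_LIBRTE_ETHDEV_DEBUG only
			 */
			ret = rte_validate_tx_offload(tx_pkts[i]);
			if (ret == 0)
				/* fix the pseudo-header checksum in place */
				ret = rte_phdr_cksum_fix(tx_pkts[i]);
			if (ret != 0) {
				rte_errno = ret;
				return i;
			}
		}
		return i;
	}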

PERFORMANCE TESTS
-----------------

This feature was tested with a modified csum engine from test-pmd.

The packet checksum preparation was moved from the application to the Tx
preparation step placed before the burst.

We may expect some overhead caused by:
1) using an additional callback before the burst,
2) rescanning the burst,
3) additional condition checking (packet validation),
4) worse optimization (e.g. packet data access, etc.)

We tested it using the ixgbe Tx preparation implementation with some
parts disabled, to get comparable information about the impact of the
different parts of the implementation.

IMPACT:

1) For an unimplemented Tx preparation callback, the performance impact
   is negligible.
2) The packet condition check without checksum modifications (nb_segs,
   available offloads, etc.) gives 14626628/14252168 (~2.62% drop).
3) Full support in the ixgbe driver (point 2 + packet checksum
   initialization) gives 14060924/13588094 (~3.48% drop).

Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
---
 config/common_base            |    1 +
 lib/librte_ether/rte_ethdev.h |   96 +++++++++++++++++++++++++++++++++++++++++
 lib/librte_mbuf/rte_mbuf.h    |   64 +++++++++++++++++++++++++++
 lib/librte_net/rte_net.h      |   85 ++++++++++++++++++++++++++++++++++++
 4 files changed, 246 insertions(+)

diff --git a/config/common_base b/config/common_base
index c7fd3db..619284b 100644
--- a/config/common_base
+++ b/config/common_base
@@ -120,6 +120,7 @@ CONFIG_RTE_MAX_QUEUES_PER_PORT=1024
 CONFIG_RTE_LIBRTE_IEEE1588=n
 CONFIG_RTE_ETHDEV_QUEUE_STAT_CNTRS=16
 CONFIG_RTE_ETHDEV_RXTX_CALLBACKS=y
+CONFIG_RTE_ETHDEV_TX_PREP=y
 
 #
 # Support NIC bypass logic
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 38641e8..c4a8ccd 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -182,6 +182,7 @@ extern "C" {
 #include <rte_pci.h>
 #include <rte_dev.h>
 #include <rte_devargs.h>
+#include <rte_errno.h>
 #include "rte_ether.h"
 #include "rte_eth_ctrl.h"
 #include "rte_dev_info.h"
@@ -699,6 +700,8 @@ struct rte_eth_desc_lim {
 	uint16_t nb_max;   /**< Max allowed number of descriptors. */
 	uint16_t nb_min;   /**< Min allowed number of descriptors. */
 	uint16_t nb_align; /**< Number of descriptors should be aligned to. */
+	uint16_t nb_seg_max;     /**< Max number of segments per whole packet. */
+	uint16_t nb_mtu_seg_max; /**< Max number of segments per one MTU */
 };
 
 /**
@@ -1188,6 +1191,11 @@ typedef uint16_t (*eth_tx_burst_t)(void *txq,
 				   uint16_t nb_pkts);
 /**< @internal Send output packets on a transmit queue of an Ethernet device. */
 
+typedef uint16_t (*eth_tx_prep_t)(void *txq,
+				   struct rte_mbuf **tx_pkts,
+				   uint16_t nb_pkts);
+/**< @internal Prepare output packets on a transmit queue of an Ethernet device. */
+
 typedef int (*flow_ctrl_get_t)(struct rte_eth_dev *dev,
 			       struct rte_eth_fc_conf *fc_conf);
 /**< @internal Get current flow control parameter on an Ethernet device */
@@ -1622,6 +1630,7 @@ struct rte_eth_rxtx_callback {
 struct rte_eth_dev {
 	eth_rx_burst_t rx_pkt_burst; /**< Pointer to PMD receive function. */
 	eth_tx_burst_t tx_pkt_burst; /**< Pointer to PMD transmit function. */
+	eth_tx_prep_t tx_pkt_prep; /**< Pointer to PMD transmit prepare function. */
 	struct rte_eth_dev_data *data;  /**< Pointer to device data */
 	const struct eth_driver *driver;/**< Driver for this device */
 	const struct eth_dev_ops *dev_ops; /**< Functions exported by PMD */
@@ -2816,6 +2825,93 @@ rte_eth_tx_burst(uint8_t port_id, uint16_t queue_id,
 	return (*dev->tx_pkt_burst)(dev->data->tx_queues[queue_id], tx_pkts, nb_pkts);
 }
 
+/**
+ * Process a burst of output packets on a transmit queue of an Ethernet device.
+ *
+ * The rte_eth_tx_prep() function is invoked to prepare output packets to be
+ * transmitted on the output queue *queue_id* of the Ethernet device designated
+ * by its *port_id*.
+ * The *nb_pkts* parameter is the number of packets to be prepared which are
+ * supplied in the *tx_pkts* array of *rte_mbuf* structures, each of them
+ * allocated from a pool created with rte_pktmbuf_pool_create().
+ * For each packet to send, the rte_eth_tx_prep() function performs
+ * the following operations:
+ *
+ * - Check if the packet meets the device's requirements for tx offloads.
+ *
+ * - Check limitations about number of segments.
+ *
+ * - Check additional requirements when debug is enabled.
+ *
+ * - Update and/or reset required checksums when tx offload is set for packet.
+ *
+ * The rte_eth_tx_prep() function returns the number of packets ready to be
+ * sent. A return value equal to *nb_pkts* means that all packets are valid and
+ * ready to be sent.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ *   The value must be a valid port id.
+ * @param queue_id
+ *   The index of the transmit queue through which output packets must be
+ *   sent.
+ *   The value must be in the range [0, nb_tx_queue - 1] previously supplied
+ *   to rte_eth_dev_configure().
+ * @param tx_pkts
+ *   The address of an array of *nb_pkts* pointers to *rte_mbuf* structures
+ *   which contain the output packets.
+ * @param nb_pkts
+ *   The maximum number of packets to process.
+ * @return
+ *   The number of packets correct and ready to be sent. The return value can be
+ *   less than the value of the *nb_pkts* parameter when a packet doesn't
+ *   meet the device's requirements; rte_errno is set appropriately.
+ */
+
+#ifdef RTE_ETHDEV_TX_PREP
+
+static inline uint16_t
+rte_eth_tx_prep(uint8_t port_id, uint16_t queue_id, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts)
+{
+	struct rte_eth_dev *dev;
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+	if (!rte_eth_dev_is_valid_port(port_id)) {
+		RTE_PMD_DEBUG_TRACE("Invalid TX port_id=%d\n", port_id);
+		rte_errno = -EINVAL;
+		return 0;
+	}
+#endif
+
+	dev = &rte_eth_devices[port_id];
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+	if (queue_id >= dev->data->nb_tx_queues) {
+		RTE_PMD_DEBUG_TRACE("Invalid TX queue_id=%d\n", queue_id);
+		rte_errno = -EINVAL;
+		return 0;
+	}
+#endif
+
+	if (!dev->tx_pkt_prep)
+		return nb_pkts;
+
+	return (*dev->tx_pkt_prep)(dev->data->tx_queues[queue_id],
+			tx_pkts, nb_pkts);
+}
+
+#else
+
+static inline uint16_t
+rte_eth_tx_prep(__rte_unused uint8_t port_id, __rte_unused uint16_t queue_id,
+		__rte_unused struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
+{
+	return nb_pkts;
+}
+
+#endif
+
 typedef void (*buffer_tx_error_fn)(struct rte_mbuf **unsent, uint16_t count,
 		void *userdata);
 
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index 109e666..ff9e749 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -283,6 +283,19 @@ extern "C" {
  */
 #define PKT_TX_OUTER_IPV6    (1ULL << 60)
 
+/**
+ * Bit Mask of all supported packet Tx offload features flags, which can be set
+ * for packet.
+ */
+#define PKT_TX_OFFLOAD_MASK (    \
+		PKT_TX_IP_CKSUM |        \
+		PKT_TX_L4_MASK |         \
+		PKT_TX_OUTER_IP_CKSUM |  \
+		PKT_TX_TCP_SEG |         \
+		PKT_TX_QINQ_PKT |        \
+		PKT_TX_VLAN_PKT |        \
+		PKT_TX_TUNNEL_MASK)
+
 #define __RESERVED           (1ULL << 61) /**< reserved for future mbuf use */
 
 #define IND_ATTACHED_MBUF    (1ULL << 62) /**< Indirect attached mbuf */
@@ -1647,6 +1660,57 @@ static inline int rte_pktmbuf_chain(struct rte_mbuf *head, struct rte_mbuf *tail
 }
 
 /**
+ * Validate general requirements for tx offload in mbuf.
+ *
+ * This function checks correctness and completeness of Tx offload settings.
+ *
+ * @param m
+ *   The packet mbuf to be validated.
+ * @return
+ *   0 if the packet is valid, a negative errno value otherwise
+ */
+static inline int
+rte_validate_tx_offload(const struct rte_mbuf *m)
+{
+	uint64_t ol_flags = m->ol_flags;
+	uint64_t inner_l3_offset = m->l2_len;
+
+	/* Does packet set any of available offloads? */
+	if (!(ol_flags & PKT_TX_OFFLOAD_MASK))
+		return 0;
+
+	if (ol_flags & PKT_TX_OUTER_IP_CKSUM)
+		inner_l3_offset += m->outer_l2_len + m->outer_l3_len;
+
+	/* Headers are fragmented */
+	if (rte_pktmbuf_data_len(m) < inner_l3_offset + m->l3_len + m->l4_len)
+		return -ENOTSUP;
+
+	/* IP checksum can be counted only for IPv4 packet */
+	if ((ol_flags & PKT_TX_IP_CKSUM) && (ol_flags & PKT_TX_IPV6))
+		return -EINVAL;
+
+	/* IP type not set when required */
+	if (ol_flags & (PKT_TX_L4_MASK | PKT_TX_TCP_SEG))
+		if (!(ol_flags & (PKT_TX_IPV4 | PKT_TX_IPV6)))
+			return -EINVAL;
+
+	/* Check requirements for TSO packet */
+	if (ol_flags & PKT_TX_TCP_SEG)
+		if ((m->tso_segsz == 0) ||
+				((ol_flags & PKT_TX_IPV4) &&
+				!(ol_flags & PKT_TX_IP_CKSUM)))
+			return -EINVAL;
+
+	/* PKT_TX_OUTER_IP_CKSUM set for non outer IPv4 packet. */
+	if ((ol_flags & PKT_TX_OUTER_IP_CKSUM) &&
+			!(ol_flags & PKT_TX_OUTER_IPV4))
+		return -EINVAL;
+
+	return 0;
+}
+
+/**
  * Dump an mbuf structure to a file.
  *
  * Dump all fields for the given packet mbuf and all its associated
diff --git a/lib/librte_net/rte_net.h b/lib/librte_net/rte_net.h
index d4156ae..90af335 100644
--- a/lib/librte_net/rte_net.h
+++ b/lib/librte_net/rte_net.h
@@ -38,6 +38,11 @@
 extern "C" {
 #endif
 
+#include <rte_ip.h>
+#include <rte_udp.h>
+#include <rte_tcp.h>
+#include <rte_sctp.h>
+
 /**
  * Structure containing header lengths associated to a packet, filled
  * by rte_net_get_ptype().
@@ -86,6 +91,86 @@ struct rte_net_hdr_lens {
 uint32_t rte_net_get_ptype(const struct rte_mbuf *m,
 	struct rte_net_hdr_lens *hdr_lens, uint32_t layers);
 
+/**
+ * Fix pseudo header checksum
+ *
+ * This function fixes pseudo header checksum for TSO and non-TSO tcp/udp in
+ * the provided mbuf's packet data.
+ *
+ * - for non-TSO tcp/udp packets the full pseudo-header checksum is computed
+ *   and set in the packet data,
+ * - for TSO the IP payload length is not included in pseudo header.
+ *
+ * This function expects that used headers are in the first data segment of
+ * mbuf, and are not fragmented.
+ *
+ * @param m
+ *   The packet mbuf whose pseudo-header checksum is to be fixed.
+ * @return
+ *   0 if checksum is initialized properly
+ */
+static inline int
+rte_phdr_cksum_fix(struct rte_mbuf *m)
+{
+	struct ipv4_hdr *ipv4_hdr;
+	struct ipv6_hdr *ipv6_hdr;
+	struct tcp_hdr *tcp_hdr;
+	struct udp_hdr *udp_hdr;
+	uint64_t ol_flags = m->ol_flags;
+	uint64_t inner_l3_offset = m->l2_len;
+
+	if (ol_flags & PKT_TX_OUTER_IP_CKSUM)
+		inner_l3_offset += m->outer_l2_len + m->outer_l3_len;
+
+	if ((ol_flags & PKT_TX_UDP_CKSUM) == PKT_TX_UDP_CKSUM) {
+		if (ol_flags & PKT_TX_IPV4) {
+			ipv4_hdr = rte_pktmbuf_mtod_offset(m, struct ipv4_hdr *,
+					inner_l3_offset);
+
+			if (ol_flags & PKT_TX_IP_CKSUM)
+				ipv4_hdr->hdr_checksum = 0;
+
+			udp_hdr = (struct udp_hdr *)((char *)ipv4_hdr +
+					m->l3_len);
+			udp_hdr->dgram_cksum = rte_ipv4_phdr_cksum(ipv4_hdr,
+					ol_flags);
+		} else {
+			ipv6_hdr = rte_pktmbuf_mtod_offset(m, struct ipv6_hdr *,
+					inner_l3_offset);
+			/* non-TSO udp */
+			udp_hdr = rte_pktmbuf_mtod_offset(m, struct udp_hdr *,
+					inner_l3_offset + m->l3_len);
+			udp_hdr->dgram_cksum = rte_ipv6_phdr_cksum(ipv6_hdr,
+					ol_flags);
+		}
+	} else if ((ol_flags & PKT_TX_TCP_CKSUM) ||
+			(ol_flags & PKT_TX_TCP_SEG)) {
+		if (ol_flags & PKT_TX_IPV4) {
+			ipv4_hdr = rte_pktmbuf_mtod_offset(m, struct ipv4_hdr *,
+					inner_l3_offset);
+
+			if (ol_flags & PKT_TX_IP_CKSUM)
+				ipv4_hdr->hdr_checksum = 0;
+
+			/* non-TSO tcp or TSO */
+			tcp_hdr = (struct tcp_hdr *)((char *)ipv4_hdr +
+					m->l3_len);
+			tcp_hdr->cksum = rte_ipv4_phdr_cksum(ipv4_hdr,
+					ol_flags);
+		} else {
+			ipv6_hdr = rte_pktmbuf_mtod_offset(m, struct ipv6_hdr *,
+					inner_l3_offset);
+			/* non-TSO tcp or TSO */
+			tcp_hdr = rte_pktmbuf_mtod_offset(m, struct tcp_hdr *,
+					inner_l3_offset + m->l3_len);
+			tcp_hdr->cksum = rte_ipv6_phdr_cksum(ipv6_hdr,
+					ol_flags);
+		}
+	}
+
+	return 0;
+}
+
 #ifdef __cplusplus
 }
 #endif
-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 261+ messages in thread

* [dpdk-dev] [PATCH v10 2/6] e1000: add Tx preparation
  2016-10-24 16:51                 ` [dpdk-dev] [PATCH v10 0/6] add Tx preparation Tomasz Kulasek
  2016-10-24 16:51                   ` [dpdk-dev] [PATCH v10 1/6] ethdev: " Tomasz Kulasek
@ 2016-10-24 16:51                   ` Tomasz Kulasek
  2016-10-24 16:51                   ` [dpdk-dev] [PATCH v10 3/6] fm10k: " Tomasz Kulasek
                                     ` (5 subsequent siblings)
  7 siblings, 0 replies; 261+ messages in thread
From: Tomasz Kulasek @ 2016-10-24 16:51 UTC (permalink / raw)
  To: dev; +Cc: konstantin.ananyev, olivier.matz

Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
---
 drivers/net/e1000/e1000_ethdev.h |   11 ++++++++
 drivers/net/e1000/em_ethdev.c    |    5 +++-
 drivers/net/e1000/em_rxtx.c      |   48 ++++++++++++++++++++++++++++++++++-
 drivers/net/e1000/igb_ethdev.c   |    4 +++
 drivers/net/e1000/igb_rxtx.c     |   52 +++++++++++++++++++++++++++++++++++++-
 5 files changed, 117 insertions(+), 3 deletions(-)

diff --git a/drivers/net/e1000/e1000_ethdev.h b/drivers/net/e1000/e1000_ethdev.h
index 6c25c8d..bd0f277 100644
--- a/drivers/net/e1000/e1000_ethdev.h
+++ b/drivers/net/e1000/e1000_ethdev.h
@@ -138,6 +138,11 @@
 #define E1000_MISC_VEC_ID               RTE_INTR_VEC_ZERO_OFFSET
 #define E1000_RX_VEC_START              RTE_INTR_VEC_RXTX_OFFSET
 
+#define IGB_TX_MAX_SEG     UINT8_MAX
+#define IGB_TX_MAX_MTU_SEG UINT8_MAX
+#define EM_TX_MAX_SEG      UINT8_MAX
+#define EM_TX_MAX_MTU_SEG  UINT8_MAX
+
 /* structure for interrupt relative data */
 struct e1000_interrupt {
 	uint32_t flags;
@@ -315,6 +320,9 @@ void eth_igb_tx_init(struct rte_eth_dev *dev);
 uint16_t eth_igb_xmit_pkts(void *txq, struct rte_mbuf **tx_pkts,
 		uint16_t nb_pkts);
 
+uint16_t eth_igb_prep_pkts(void *txq, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts);
+
 uint16_t eth_igb_recv_pkts(void *rxq, struct rte_mbuf **rx_pkts,
 		uint16_t nb_pkts);
 
@@ -376,6 +384,9 @@ void eth_em_tx_init(struct rte_eth_dev *dev);
 uint16_t eth_em_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		uint16_t nb_pkts);
 
+uint16_t eth_em_prep_pkts(void *txq, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts);
+
 uint16_t eth_em_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
 		uint16_t nb_pkts);
 
diff --git a/drivers/net/e1000/em_ethdev.c b/drivers/net/e1000/em_ethdev.c
index 7cf5f0c..17b45cb 100644
--- a/drivers/net/e1000/em_ethdev.c
+++ b/drivers/net/e1000/em_ethdev.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -300,6 +300,7 @@ eth_em_dev_init(struct rte_eth_dev *eth_dev)
 	eth_dev->dev_ops = &eth_em_ops;
 	eth_dev->rx_pkt_burst = (eth_rx_burst_t)&eth_em_recv_pkts;
 	eth_dev->tx_pkt_burst = (eth_tx_burst_t)&eth_em_xmit_pkts;
+	eth_dev->tx_pkt_prep = (eth_tx_prep_t)&eth_em_prep_pkts;
 
 	/* for secondary processes, we don't initialise any further as primary
 	 * has already done this work. Only check we don't need a different
@@ -1067,6 +1068,8 @@ eth_em_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 		.nb_max = E1000_MAX_RING_DESC,
 		.nb_min = E1000_MIN_RING_DESC,
 		.nb_align = EM_TXD_ALIGN,
+		.nb_seg_max = EM_TX_MAX_SEG,
+		.nb_mtu_seg_max = EM_TX_MAX_MTU_SEG,
 	};
 
 	dev_info->speed_capa = ETH_LINK_SPEED_10M_HD | ETH_LINK_SPEED_10M |
diff --git a/drivers/net/e1000/em_rxtx.c b/drivers/net/e1000/em_rxtx.c
index 41f51c0..5bd3c99 100644
--- a/drivers/net/e1000/em_rxtx.c
+++ b/drivers/net/e1000/em_rxtx.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -66,6 +66,7 @@
 #include <rte_udp.h>
 #include <rte_tcp.h>
 #include <rte_sctp.h>
+#include <rte_net.h>
 #include <rte_string_fns.h>
 
 #include "e1000_logs.h"
@@ -77,6 +78,14 @@
 
 #define E1000_RXDCTL_GRAN	0x01000000 /* RXDCTL Granularity */
 
+#define E1000_TX_OFFLOAD_MASK ( \
+		PKT_TX_IP_CKSUM |       \
+		PKT_TX_L4_MASK |        \
+		PKT_TX_VLAN_PKT)
+
+#define E1000_TX_OFFLOAD_NOTSUP_MASK \
+		(PKT_TX_OFFLOAD_MASK ^ E1000_TX_OFFLOAD_MASK)
+
 /**
  * Structure associated with each descriptor of the RX ring of a RX queue.
  */
@@ -618,6 +627,43 @@ end_of_tx:
 
 /*********************************************************************
  *
+ *  TX prep functions
+ *
+ **********************************************************************/
+uint16_t
+eth_em_prep_pkts(__rte_unused void *tx_queue, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts)
+{
+	int i, ret;
+	struct rte_mbuf *m;
+
+	for (i = 0; i < nb_pkts; i++) {
+		m = tx_pkts[i];
+
+		if (m->ol_flags & E1000_TX_OFFLOAD_NOTSUP_MASK) {
+			rte_errno = -ENOTSUP;
+			return i;
+		}
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+		ret = rte_validate_tx_offload(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+#endif
+		ret = rte_phdr_cksum_fix(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+	}
+
+	return i;
+}
+
+/*********************************************************************
+ *
  *  RX functions
  *
  **********************************************************************/
diff --git a/drivers/net/e1000/igb_ethdev.c b/drivers/net/e1000/igb_ethdev.c
index 4924396..0afdd09 100644
--- a/drivers/net/e1000/igb_ethdev.c
+++ b/drivers/net/e1000/igb_ethdev.c
@@ -369,6 +369,8 @@ static const struct rte_eth_desc_lim tx_desc_lim = {
 	.nb_max = E1000_MAX_RING_DESC,
 	.nb_min = E1000_MIN_RING_DESC,
 	.nb_align = IGB_RXD_ALIGN,
+	.nb_seg_max = IGB_TX_MAX_SEG,
+	.nb_mtu_seg_max = IGB_TX_MAX_MTU_SEG,
 };
 
 static const struct eth_dev_ops eth_igb_ops = {
@@ -760,6 +762,7 @@ eth_igb_dev_init(struct rte_eth_dev *eth_dev)
 	eth_dev->dev_ops = &eth_igb_ops;
 	eth_dev->rx_pkt_burst = &eth_igb_recv_pkts;
 	eth_dev->tx_pkt_burst = &eth_igb_xmit_pkts;
+	eth_dev->tx_pkt_prep = &eth_igb_prep_pkts;
 
 	/* for secondary processes, we don't initialise any further as primary
 	 * has already done this work. Only check we don't need a different
@@ -963,6 +966,7 @@ eth_igbvf_dev_init(struct rte_eth_dev *eth_dev)
 	eth_dev->dev_ops = &igbvf_eth_dev_ops;
 	eth_dev->rx_pkt_burst = &eth_igb_recv_pkts;
 	eth_dev->tx_pkt_burst = &eth_igb_xmit_pkts;
+	eth_dev->tx_pkt_prep = &eth_igb_prep_pkts;
 
 	/* for secondary processes, we don't initialise any further as primary
 	 * has already done this work. Only check we don't need a different
diff --git a/drivers/net/e1000/igb_rxtx.c b/drivers/net/e1000/igb_rxtx.c
index dbd37ac..08e47f2 100644
--- a/drivers/net/e1000/igb_rxtx.c
+++ b/drivers/net/e1000/igb_rxtx.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -65,6 +65,7 @@
 #include <rte_udp.h>
 #include <rte_tcp.h>
 #include <rte_sctp.h>
+#include <rte_net.h>
 #include <rte_string_fns.h>
 
 #include "e1000_logs.h"
@@ -78,6 +79,9 @@
 		PKT_TX_L4_MASK |		 \
 		PKT_TX_TCP_SEG)
 
+#define IGB_TX_OFFLOAD_NOTSUP_MASK \
+		(PKT_TX_OFFLOAD_MASK ^ IGB_TX_OFFLOAD_MASK)
+
 /**
  * Structure associated with each descriptor of the RX ring of a RX queue.
  */
@@ -616,6 +620,51 @@ eth_igb_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 
 /*********************************************************************
  *
+ *  TX prep functions
+ *
+ **********************************************************************/
+uint16_t
+eth_igb_prep_pkts(__rte_unused void *tx_queue, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts)
+{
+	int i, ret;
+	struct rte_mbuf *m;
+
+	for (i = 0; i < nb_pkts; i++) {
+		m = tx_pkts[i];
+
+		/* Check some limitations for TSO in hardware */
+		if (m->ol_flags & PKT_TX_TCP_SEG)
+			if ((m->tso_segsz > IGB_TSO_MAX_MSS) || (m->l2_len + m->l3_len +
+					m->l4_len > IGB_TSO_MAX_HDRLEN)) {
+				rte_errno = -EINVAL;
+				return i;
+			}
+
+		if (m->ol_flags & IGB_TX_OFFLOAD_NOTSUP_MASK) {
+			rte_errno = -ENOTSUP;
+			return i;
+		}
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+		ret = rte_validate_tx_offload(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+#endif
+		ret = rte_phdr_cksum_fix(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+	}
+
+	return i;
+}
+
+/*********************************************************************
+ *
  *  RX functions
  *
  **********************************************************************/
@@ -1364,6 +1413,7 @@ eth_igb_tx_queue_setup(struct rte_eth_dev *dev,
 
 	igb_reset_tx_queue(txq, dev);
 	dev->tx_pkt_burst = eth_igb_xmit_pkts;
+	dev->tx_pkt_prep = &eth_igb_prep_pkts;
 	dev->data->tx_queues[queue_idx] = txq;
 
 	return 0;
-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 261+ messages in thread

* [dpdk-dev] [PATCH v10 3/6] fm10k: add Tx preparation
  2016-10-24 16:51                 ` [dpdk-dev] [PATCH v10 0/6] add Tx preparation Tomasz Kulasek
  2016-10-24 16:51                   ` [dpdk-dev] [PATCH v10 1/6] ethdev: " Tomasz Kulasek
  2016-10-24 16:51                   ` [dpdk-dev] [PATCH v10 2/6] e1000: " Tomasz Kulasek
@ 2016-10-24 16:51                   ` Tomasz Kulasek
  2016-10-24 16:51                   ` [dpdk-dev] [PATCH v10 4/6] i40e: " Tomasz Kulasek
                                     ` (4 subsequent siblings)
  7 siblings, 0 replies; 261+ messages in thread
From: Tomasz Kulasek @ 2016-10-24 16:51 UTC (permalink / raw)
  To: dev; +Cc: konstantin.ananyev, olivier.matz

Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
---
 drivers/net/fm10k/fm10k.h        |    6 +++++
 drivers/net/fm10k/fm10k_ethdev.c |    5 ++++
 drivers/net/fm10k/fm10k_rxtx.c   |   50 +++++++++++++++++++++++++++++++++++++-
 3 files changed, 60 insertions(+), 1 deletion(-)

diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
index 05aa1a2..c6fed21 100644
--- a/drivers/net/fm10k/fm10k.h
+++ b/drivers/net/fm10k/fm10k.h
@@ -69,6 +69,9 @@
 #define FM10K_MAX_RX_DESC  (FM10K_MAX_RX_RING_SZ / sizeof(union fm10k_rx_desc))
 #define FM10K_MAX_TX_DESC  (FM10K_MAX_TX_RING_SZ / sizeof(struct fm10k_tx_desc))
 
+#define FM10K_TX_MAX_SEG     UINT8_MAX
+#define FM10K_TX_MAX_MTU_SEG UINT8_MAX
+
 /*
  * byte aligment for HW RX data buffer
  * Datasheet requires RX buffer addresses shall either be 512-byte aligned or
@@ -356,6 +359,9 @@ fm10k_dev_rx_descriptor_done(void *rx_queue, uint16_t offset);
 uint16_t fm10k_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 	uint16_t nb_pkts);
 
+uint16_t fm10k_prep_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
+	uint16_t nb_pkts);
+
 int fm10k_rxq_vec_setup(struct fm10k_rx_queue *rxq);
 int fm10k_rx_vec_condition_check(struct rte_eth_dev *);
 void fm10k_rx_queue_release_mbufs_vec(struct fm10k_rx_queue *rxq);
diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index c804436..dffb6d1 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -1446,6 +1446,8 @@ fm10k_dev_infos_get(struct rte_eth_dev *dev,
 		.nb_max = FM10K_MAX_TX_DESC,
 		.nb_min = FM10K_MIN_TX_DESC,
 		.nb_align = FM10K_MULT_TX_DESC,
+		.nb_seg_max = FM10K_TX_MAX_SEG,
+		.nb_mtu_seg_max = FM10K_TX_MAX_MTU_SEG,
 	};
 
 	dev_info->speed_capa = ETH_LINK_SPEED_1G | ETH_LINK_SPEED_2_5G |
@@ -2754,8 +2756,10 @@ fm10k_set_tx_function(struct rte_eth_dev *dev)
 			fm10k_txq_vec_setup(txq);
 		}
 		dev->tx_pkt_burst = fm10k_xmit_pkts_vec;
+		dev->tx_pkt_prep = NULL;
 	} else {
 		dev->tx_pkt_burst = fm10k_xmit_pkts;
+		dev->tx_pkt_prep = fm10k_prep_pkts;
 		PMD_INIT_LOG(DEBUG, "Use regular Tx func");
 	}
 }
@@ -2834,6 +2838,7 @@ eth_fm10k_dev_init(struct rte_eth_dev *dev)
 	dev->dev_ops = &fm10k_eth_dev_ops;
 	dev->rx_pkt_burst = &fm10k_recv_pkts;
 	dev->tx_pkt_burst = &fm10k_xmit_pkts;
+	dev->tx_pkt_prep = &fm10k_prep_pkts;
 
 	/* only initialize in the primary process */
 	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
diff --git a/drivers/net/fm10k/fm10k_rxtx.c b/drivers/net/fm10k/fm10k_rxtx.c
index 32cc7ff..5fc4d5a 100644
--- a/drivers/net/fm10k/fm10k_rxtx.c
+++ b/drivers/net/fm10k/fm10k_rxtx.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2013-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2013-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -35,6 +35,7 @@
 
 #include <rte_ethdev.h>
 #include <rte_common.h>
+#include <rte_net.h>
 #include "fm10k.h"
 #include "base/fm10k_type.h"
 
@@ -65,6 +66,15 @@ static inline void dump_rxd(union fm10k_rx_desc *rxd)
 }
 #endif
 
+#define FM10K_TX_OFFLOAD_MASK (  \
+		PKT_TX_VLAN_PKT |        \
+		PKT_TX_IP_CKSUM |        \
+		PKT_TX_L4_MASK |         \
+		PKT_TX_TCP_SEG)
+
+#define FM10K_TX_OFFLOAD_NOTSUP_MASK \
+		(PKT_TX_OFFLOAD_MASK ^ FM10K_TX_OFFLOAD_MASK)
+
 /* @note: When this function is changed, make corresponding change to
  * fm10k_dev_supported_ptypes_get()
  */
@@ -597,3 +607,41 @@ fm10k_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 
 	return count;
 }
+
+uint16_t
+fm10k_prep_pkts(__rte_unused void *tx_queue, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts)
+{
+	int i, ret;
+	struct rte_mbuf *m;
+
+	for (i = 0; i < nb_pkts; i++) {
+		m = tx_pkts[i];
+
+		if ((m->ol_flags & PKT_TX_TCP_SEG) &&
+				(m->tso_segsz < FM10K_TSO_MINMSS)) {
+			rte_errno = -EINVAL;
+			return i;
+		}
+
+		if (m->ol_flags & FM10K_TX_OFFLOAD_NOTSUP_MASK) {
+			rte_errno = -ENOTSUP;
+			return i;
+		}
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+		ret = rte_validate_tx_offload(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+#endif
+		ret = rte_phdr_cksum_fix(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+	}
+
+	return i;
+}
-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 261+ messages in thread

* [dpdk-dev] [PATCH v10 4/6] i40e: add Tx preparation
  2016-10-24 16:51                 ` [dpdk-dev] [PATCH v10 0/6] add Tx preparation Tomasz Kulasek
                                     ` (2 preceding siblings ...)
  2016-10-24 16:51                   ` [dpdk-dev] [PATCH v10 3/6] fm10k: " Tomasz Kulasek
@ 2016-10-24 16:51                   ` Tomasz Kulasek
  2016-10-24 16:51                   ` [dpdk-dev] [PATCH v10 5/6] ixgbe: " Tomasz Kulasek
                                     ` (3 subsequent siblings)
  7 siblings, 0 replies; 261+ messages in thread
From: Tomasz Kulasek @ 2016-10-24 16:51 UTC (permalink / raw)
  To: dev; +Cc: konstantin.ananyev, olivier.matz

Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
---
 drivers/net/i40e/i40e_ethdev.c |    3 ++
 drivers/net/i40e/i40e_rxtx.c   |   72 +++++++++++++++++++++++++++++++++++++++-
 drivers/net/i40e/i40e_rxtx.h   |    8 +++++
 3 files changed, 82 insertions(+), 1 deletion(-)

diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
index 5af0e43..dab0d48 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -936,6 +936,7 @@ eth_i40e_dev_init(struct rte_eth_dev *dev)
 	dev->dev_ops = &i40e_eth_dev_ops;
 	dev->rx_pkt_burst = i40e_recv_pkts;
 	dev->tx_pkt_burst = i40e_xmit_pkts;
+	dev->tx_pkt_prep = i40e_prep_pkts;
 
 	/* for secondary processes, we don't initialise any further as primary
 	 * has already done this work. Only check we don't need a different
@@ -2629,6 +2630,8 @@ i40e_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 		.nb_max = I40E_MAX_RING_DESC,
 		.nb_min = I40E_MIN_RING_DESC,
 		.nb_align = I40E_ALIGN_RING_DESC,
+		.nb_seg_max = I40E_TX_MAX_SEG,
+		.nb_mtu_seg_max = I40E_TX_MAX_MTU_SEG,
 	};
 
 	if (pf->flags & I40E_FLAG_VMDQ) {
diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
index 7ae7d9f..7f6d3d8 100644
--- a/drivers/net/i40e/i40e_rxtx.c
+++ b/drivers/net/i40e/i40e_rxtx.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -50,6 +50,8 @@
 #include <rte_tcp.h>
 #include <rte_sctp.h>
 #include <rte_udp.h>
+#include <rte_ip.h>
+#include <rte_net.h>
 
 #include "i40e_logs.h"
 #include "base/i40e_prototype.h"
@@ -79,6 +81,17 @@
 		PKT_TX_TCP_SEG |		 \
 		PKT_TX_OUTER_IP_CKSUM)
 
+#define I40E_TX_OFFLOAD_MASK (  \
+		PKT_TX_IP_CKSUM |       \
+		PKT_TX_L4_MASK |        \
+		PKT_TX_OUTER_IP_CKSUM | \
+		PKT_TX_TCP_SEG |        \
+		PKT_TX_QINQ_PKT |       \
+		PKT_TX_VLAN_PKT)
+
+#define I40E_TX_OFFLOAD_NOTSUP_MASK \
+		(PKT_TX_OFFLOAD_MASK ^ I40E_TX_OFFLOAD_MASK)
+
 static uint16_t i40e_xmit_pkts_simple(void *tx_queue,
 				      struct rte_mbuf **tx_pkts,
 				      uint16_t nb_pkts);
@@ -1411,6 +1424,61 @@ i40e_xmit_pkts_simple(void *tx_queue,
 	return nb_tx;
 }
 
+/*********************************************************************
+ *
+ *  TX prep functions
+ *
+ **********************************************************************/
+uint16_t
+i40e_prep_pkts(__rte_unused void *tx_queue, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts)
+{
+	int i, ret;
+	uint64_t ol_flags;
+	struct rte_mbuf *m;
+
+	for (i = 0; i < nb_pkts; i++) {
+		m = tx_pkts[i];
+		ol_flags = m->ol_flags;
+
+		/**
+		 * m->nb_segs is uint8_t, so it can never exceed
+		 * I40E_TX_MAX_SEG (UINT8_MAX); only the condition
+		 * m->nb_segs > I40E_TX_MAX_MTU_SEG needs to be checked.
+		 */
+		if (!(ol_flags & PKT_TX_TCP_SEG)) {
+			if (m->nb_segs > I40E_TX_MAX_MTU_SEG) {
+				rte_errno = -EINVAL;
+				return i;
+			}
+		} else if ((m->tso_segsz < I40E_MIN_TSO_MSS) ||
+				(m->tso_segsz > I40E_MAX_TSO_MSS)) {
+			/* MSS values outside the range (256B - 9674B) are considered malicious */
+			rte_errno = -EINVAL;
+			return i;
+		}
+
+		if (ol_flags & I40E_TX_OFFLOAD_NOTSUP_MASK) {
+			rte_errno = -ENOTSUP;
+			return i;
+		}
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+		ret = rte_validate_tx_offload(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+#endif
+		ret = rte_phdr_cksum_fix(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+	}
+	return i;
+}
+
 /*
  * Find the VSI the queue belongs to. 'queue_idx' is the queue index
  * application used, which assume having sequential ones. But from driver's
@@ -2763,9 +2831,11 @@ i40e_set_tx_function(struct rte_eth_dev *dev)
 			PMD_INIT_LOG(DEBUG, "Simple tx finally be used.");
 			dev->tx_pkt_burst = i40e_xmit_pkts_simple;
 		}
+		dev->tx_pkt_prep = NULL;
 	} else {
 		PMD_INIT_LOG(DEBUG, "Xmit tx finally be used.");
 		dev->tx_pkt_burst = i40e_xmit_pkts;
+		dev->tx_pkt_prep = i40e_prep_pkts;
 	}
 }
 
diff --git a/drivers/net/i40e/i40e_rxtx.h b/drivers/net/i40e/i40e_rxtx.h
index ecdb13c..9df8a56 100644
--- a/drivers/net/i40e/i40e_rxtx.h
+++ b/drivers/net/i40e/i40e_rxtx.h
@@ -63,6 +63,12 @@
 #define	I40E_MIN_RING_DESC	64
 #define	I40E_MAX_RING_DESC	4096
 
+#define I40E_MIN_TSO_MSS          256
+#define I40E_MAX_TSO_MSS          9674
+
+#define I40E_TX_MAX_SEG     UINT8_MAX
+#define I40E_TX_MAX_MTU_SEG 8
+
 #undef container_of
 #define container_of(ptr, type, member) ({ \
 		typeof(((type *)0)->member)(*__mptr) = (ptr); \
@@ -223,6 +229,8 @@ uint16_t i40e_recv_scattered_pkts(void *rx_queue,
 uint16_t i40e_xmit_pkts(void *tx_queue,
 			struct rte_mbuf **tx_pkts,
 			uint16_t nb_pkts);
+uint16_t i40e_prep_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts);
 int i40e_tx_queue_init(struct i40e_tx_queue *txq);
 int i40e_rx_queue_init(struct i40e_rx_queue *rxq);
 void i40e_free_tx_resources(struct i40e_tx_queue *txq);
-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 261+ messages in thread

* [dpdk-dev] [PATCH v10 5/6] ixgbe: add Tx preparation
  2016-10-24 16:51                 ` [dpdk-dev] [PATCH v10 0/6] add Tx preparation Tomasz Kulasek
                                     ` (3 preceding siblings ...)
  2016-10-24 16:51                   ` [dpdk-dev] [PATCH v10 4/6] i40e: " Tomasz Kulasek
@ 2016-10-24 16:51                   ` Tomasz Kulasek
  2016-10-24 16:51                   ` [dpdk-dev] [PATCH v10 6/6] testpmd: use Tx preparation in csum engine Tomasz Kulasek
                                     ` (2 subsequent siblings)
  7 siblings, 0 replies; 261+ messages in thread
From: Tomasz Kulasek @ 2016-10-24 16:51 UTC (permalink / raw)
  To: dev; +Cc: konstantin.ananyev, olivier.matz

Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
---
 drivers/net/ixgbe/ixgbe_ethdev.c |    3 ++
 drivers/net/ixgbe/ixgbe_ethdev.h |    5 +++-
 drivers/net/ixgbe/ixgbe_rxtx.c   |   58 +++++++++++++++++++++++++++++++++++++-
 drivers/net/ixgbe/ixgbe_rxtx.h   |    2 ++
 4 files changed, 66 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
index 4ca5747..4c6a8e1 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -517,6 +517,8 @@ static const struct rte_eth_desc_lim tx_desc_lim = {
 	.nb_max = IXGBE_MAX_RING_DESC,
 	.nb_min = IXGBE_MIN_RING_DESC,
 	.nb_align = IXGBE_TXD_ALIGN,
+	.nb_seg_max = IXGBE_TX_MAX_SEG,
+	.nb_mtu_seg_max = IXGBE_TX_MAX_SEG,
 };
 
 static const struct eth_dev_ops ixgbe_eth_dev_ops = {
@@ -1103,6 +1105,7 @@ eth_ixgbe_dev_init(struct rte_eth_dev *eth_dev)
 	eth_dev->dev_ops = &ixgbe_eth_dev_ops;
 	eth_dev->rx_pkt_burst = &ixgbe_recv_pkts;
 	eth_dev->tx_pkt_burst = &ixgbe_xmit_pkts;
+	eth_dev->tx_pkt_prep = &ixgbe_prep_pkts;
 
 	/*
 	 * For secondary processes, we don't initialise any further as primary
diff --git a/drivers/net/ixgbe/ixgbe_ethdev.h b/drivers/net/ixgbe/ixgbe_ethdev.h
index 4ff6338..e229cf5 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.h
+++ b/drivers/net/ixgbe/ixgbe_ethdev.h
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -396,6 +396,9 @@ uint16_t ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 uint16_t ixgbe_xmit_pkts_simple(void *tx_queue, struct rte_mbuf **tx_pkts,
 		uint16_t nb_pkts);
 
+uint16_t ixgbe_prep_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts);
+
 int ixgbe_dev_rss_hash_update(struct rte_eth_dev *dev,
 			      struct rte_eth_rss_conf *rss_conf);
 
diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
index 2ce8234..031414c 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   Copyright 2014 6WIND S.A.
  *   All rights reserved.
  *
@@ -70,6 +70,7 @@
 #include <rte_string_fns.h>
 #include <rte_errno.h>
 #include <rte_ip.h>
+#include <rte_net.h>
 
 #include "ixgbe_logs.h"
 #include "base/ixgbe_api.h"
@@ -87,6 +88,9 @@
 		PKT_TX_TCP_SEG |		 \
 		PKT_TX_OUTER_IP_CKSUM)
 
+#define IXGBE_TX_OFFLOAD_NOTSUP_MASK \
+		(PKT_TX_OFFLOAD_MASK ^ IXGBE_TX_OFFLOAD_MASK)
+
 #if 1
 #define RTE_PMD_USE_PREFETCH
 #endif
@@ -905,6 +909,56 @@ end_of_tx:
 
 /*********************************************************************
  *
+ *  TX prep functions
+ *
+ **********************************************************************/
+uint16_t
+ixgbe_prep_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
+{
+	int i, ret;
+	uint64_t ol_flags;
+	struct rte_mbuf *m;
+	struct ixgbe_tx_queue *txq = (struct ixgbe_tx_queue *)tx_queue;
+
+	for (i = 0; i < nb_pkts; i++) {
+		m = tx_pkts[i];
+		ol_flags = m->ol_flags;
+
+		/**
+		 * Check if packet meets requirements for number of segments
+		 *
+		 * NOTE: for ixgbe it's always (40 - WTHRESH) for both TSO and non-TSO
+		 */
+
+		if (m->nb_segs > IXGBE_TX_MAX_SEG - txq->wthresh) {
+			rte_errno = -EINVAL;
+			return i;
+		}
+
+		if (ol_flags & IXGBE_TX_OFFLOAD_NOTSUP_MASK) {
+			rte_errno = -ENOTSUP;
+			return i;
+		}
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+		ret = rte_validate_tx_offload(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+#endif
+		ret = rte_phdr_cksum_fix(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+	}
+
+	return i;
+}
+
+/*********************************************************************
+ *
  *  RX functions
  *
  **********************************************************************/
@@ -2282,6 +2336,7 @@ ixgbe_set_tx_function(struct rte_eth_dev *dev, struct ixgbe_tx_queue *txq)
 	if (((txq->txq_flags & IXGBE_SIMPLE_FLAGS) == IXGBE_SIMPLE_FLAGS)
 			&& (txq->tx_rs_thresh >= RTE_PMD_IXGBE_TX_MAX_BURST)) {
 		PMD_INIT_LOG(DEBUG, "Using simple tx code path");
+		dev->tx_pkt_prep = NULL;
 #ifdef RTE_IXGBE_INC_VECTOR
 		if (txq->tx_rs_thresh <= RTE_IXGBE_TX_MAX_FREE_BUF_SZ &&
 				(rte_eal_process_type() != RTE_PROC_PRIMARY ||
@@ -2302,6 +2357,7 @@ ixgbe_set_tx_function(struct rte_eth_dev *dev, struct ixgbe_tx_queue *txq)
 				(unsigned long)txq->tx_rs_thresh,
 				(unsigned long)RTE_PMD_IXGBE_TX_MAX_BURST);
 		dev->tx_pkt_burst = ixgbe_xmit_pkts;
+		dev->tx_pkt_prep = ixgbe_prep_pkts;
 	}
 }
 
diff --git a/drivers/net/ixgbe/ixgbe_rxtx.h b/drivers/net/ixgbe/ixgbe_rxtx.h
index 2608b36..7bbd9b8 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx.h
+++ b/drivers/net/ixgbe/ixgbe_rxtx.h
@@ -80,6 +80,8 @@
 #define RTE_IXGBE_WAIT_100_US               100
 #define RTE_IXGBE_VMTXSW_REGISTER_COUNT     2
 
+#define IXGBE_TX_MAX_SEG                    40
+
 #define IXGBE_PACKET_TYPE_MASK_82599        0X7F
 #define IXGBE_PACKET_TYPE_MASK_X550         0X10FF
 #define IXGBE_PACKET_TYPE_MASK_TUNNEL       0XFF
-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 261+ messages in thread

* [dpdk-dev] [PATCH v10 6/6] testpmd: use Tx preparation in csum engine
  2016-10-24 16:51                 ` [dpdk-dev] [PATCH v10 0/6] add Tx preparation Tomasz Kulasek
                                     ` (4 preceding siblings ...)
  2016-10-24 16:51                   ` [dpdk-dev] [PATCH v10 5/6] ixgbe: " Tomasz Kulasek
@ 2016-10-24 16:51                   ` Tomasz Kulasek
  2016-10-24 17:26                   ` [dpdk-dev] [PATCH v10 0/6] add Tx preparation Ananyev, Konstantin
  2016-10-26 12:56                   ` [dpdk-dev] [PATCH v11 " Tomasz Kulasek
  7 siblings, 0 replies; 261+ messages in thread
From: Tomasz Kulasek @ 2016-10-24 16:51 UTC (permalink / raw)
  To: dev; +Cc: konstantin.ananyev, olivier.matz

Removed the pseudo-header checksum calculation for udp/tcp/tso packets
from the application and used the Tx preparation API for packet
preparation and verification.

Adding the additional step to the csum engine costs about a 3-4%
performance drop on my setup with the ixgbe driver. It's caused mostly
by the need to re-access and modify the packet data.

Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
---
 app/test-pmd/csumonly.c |   36 +++++++++++++-----------------------
 1 file changed, 13 insertions(+), 23 deletions(-)

diff --git a/app/test-pmd/csumonly.c b/app/test-pmd/csumonly.c
index 57e6ae2..6f33ae9 100644
--- a/app/test-pmd/csumonly.c
+++ b/app/test-pmd/csumonly.c
@@ -112,15 +112,6 @@ struct simple_gre_hdr {
 } __attribute__((__packed__));
 
 static uint16_t
-get_psd_sum(void *l3_hdr, uint16_t ethertype, uint64_t ol_flags)
-{
-	if (ethertype == _htons(ETHER_TYPE_IPv4))
-		return rte_ipv4_phdr_cksum(l3_hdr, ol_flags);
-	else /* assume ethertype == ETHER_TYPE_IPv6 */
-		return rte_ipv6_phdr_cksum(l3_hdr, ol_flags);
-}
-
-static uint16_t
 get_udptcp_checksum(void *l3_hdr, void *l4_hdr, uint16_t ethertype)
 {
 	if (ethertype == _htons(ETHER_TYPE_IPv4))
@@ -370,32 +361,24 @@ process_inner_cksums(void *l3_hdr, const struct testpmd_offload_info *info,
 		/* do not recalculate udp cksum if it was 0 */
 		if (udp_hdr->dgram_cksum != 0) {
 			udp_hdr->dgram_cksum = 0;
-			if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_UDP_CKSUM) {
+			if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_UDP_CKSUM)
 				ol_flags |= PKT_TX_UDP_CKSUM;
-				udp_hdr->dgram_cksum = get_psd_sum(l3_hdr,
-					info->ethertype, ol_flags);
-			} else {
+			else
 				udp_hdr->dgram_cksum =
 					get_udptcp_checksum(l3_hdr, udp_hdr,
 						info->ethertype);
-			}
 		}
 	} else if (info->l4_proto == IPPROTO_TCP) {
 		tcp_hdr = (struct tcp_hdr *)((char *)l3_hdr + info->l3_len);
 		tcp_hdr->cksum = 0;
-		if (tso_segsz) {
+		if (tso_segsz)
 			ol_flags |= PKT_TX_TCP_SEG;
-			tcp_hdr->cksum = get_psd_sum(l3_hdr, info->ethertype,
-				ol_flags);
-		} else if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_TCP_CKSUM) {
+		else if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_TCP_CKSUM)
 			ol_flags |= PKT_TX_TCP_CKSUM;
-			tcp_hdr->cksum = get_psd_sum(l3_hdr, info->ethertype,
-				ol_flags);
-		} else {
+		else
 			tcp_hdr->cksum =
 				get_udptcp_checksum(l3_hdr, tcp_hdr,
 					info->ethertype);
-		}
 	} else if (info->l4_proto == IPPROTO_SCTP) {
 		sctp_hdr = (struct sctp_hdr *)((char *)l3_hdr + info->l3_len);
 		sctp_hdr->cksum = 0;
@@ -648,6 +631,7 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 	void *l3_hdr = NULL, *outer_l3_hdr = NULL; /* can be IPv4 or IPv6 */
 	uint16_t nb_rx;
 	uint16_t nb_tx;
+	uint16_t nb_prep;
 	uint16_t i;
 	uint64_t rx_ol_flags, tx_ol_flags;
 	uint16_t testpmd_ol_flags;
@@ -857,7 +841,13 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 			printf("\n");
 		}
 	}
-	nb_tx = rte_eth_tx_burst(fs->tx_port, fs->tx_queue, pkts_burst, nb_rx);
+	nb_prep = rte_eth_tx_prep(fs->tx_port, fs->tx_queue, pkts_burst,
+				nb_rx);
+	if (nb_prep != nb_rx)
+		printf("Preparing packet burst to transmit failed: %s\n",
+				rte_strerror(rte_errno));
+
+	nb_tx = rte_eth_tx_burst(fs->tx_port, fs->tx_queue, pkts_burst, nb_prep);
 	/*
 	 * Retry if necessary
 	 */
-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v10 0/6] add Tx preparation
  2016-10-24 16:51                 ` [dpdk-dev] [PATCH v10 0/6] add Tx preparation Tomasz Kulasek
                                     ` (5 preceding siblings ...)
  2016-10-24 16:51                   ` [dpdk-dev] [PATCH v10 6/6] testpmd: use Tx preparation in csum engine Tomasz Kulasek
@ 2016-10-24 17:26                   ` Ananyev, Konstantin
  2016-10-26 12:56                   ` [dpdk-dev] [PATCH v11 " Tomasz Kulasek
  7 siblings, 0 replies; 261+ messages in thread
From: Ananyev, Konstantin @ 2016-10-24 17:26 UTC (permalink / raw)
  To: Kulasek, TomaszX, dev; +Cc: olivier.matz

> 
> As discussed in that thread:
> 
> http://dpdk.org/ml/archives/dev/2015-September/023603.html
> 
> Different NIC models, depending on the HW offloads requested, might impose
> different requirements on packets to be TX-ed in terms of:
> 
>  - Max number of fragments per packet allowed
>  - Max number of fragments per TSO segment
>  - The way pseudo-header checksum should be pre-calculated
>  - L3/L4 header fields filling
>  - etc.
> 
> 
> MOTIVATION:
> -----------
> 
> 1) Some work cannot (and should not) be done in rte_eth_tx_burst.
>    However, this work is sometimes required, and currently it is left
>    entirely to the application.
> 
> 2) Different hardware may have different requirements for TX offloads;
>    a different subset may be supported, and so on.
> 
> 3) Some parameters (e.g. the number of segments in the ixgbe driver) may
>    hang the device. These parameters may vary between devices.
> 
>    For example, i40e HW allows 8 fragments per packet, but that is after
>    TSO segmentation, while ixgbe has a 38-fragment pre-TSO limit.
> 
> 4) Fields in the packet may require different initialization (e.g.
>    pseudo-header checksum precalculation, sometimes done in a different
>    way depending on packet type, and so on). Currently the application
>    needs to take care of this.
> 
> 5) Using an additional API (rte_eth_tx_prep) before rte_eth_tx_burst lets
>    the application prepare a packet burst in a form acceptable to the
>    specific device.
> 
> 6) Some additional checks may be done in debug mode, keeping the tx_burst
>    implementation clean.
> 
> 
> PROPOSAL:
> ---------
> 
> To help the user deal with all these varieties, we propose to:
> 
> 1) Introduce the rte_eth_tx_prep() function to do the necessary
>    preparation of a packet burst so it can be safely transmitted on the
>    device with the desired HW offloads (set/reset checksum fields
>    according to the hardware requirements) and to check HW constraints
>    (number of segments per packet, etc).
> 
>    Because the limitations and requirements may differ between devices,
>    this requires extending the rte_eth_dev structure with a new function
>    pointer, "tx_pkt_prep", which can be implemented in the driver to
>    prepare and verify packets, in a device-specific way, before the
>    burst; this should prevent the application from sending malformed
>    packets.
> 
> 2) Also, new fields will be introduced in rte_eth_desc_lim:
>    nb_seg_max and nb_mtu_seg_max, providing information about the maximum
>    number of segments in TSO and non-TSO packets acceptable to the
>    device.
> 
>    This information is useful for the application to avoid creating (or
>    to limit) malicious packets.
> 
> 
> APPLICATION (USE CASE):
> -----------------------
> 
> 1) The application should initialize the burst of packets to send and
>    set the required tx offload flags and fields, such as l2_len, l3_len,
>    l4_len, and tso_segsz.
> 
> 2) The application passes the burst to rte_eth_tx_prep to check the
>    conditions required to send the packets through the NIC.
> 
> 3) The result of rte_eth_tx_prep can be used to send the valid packets
>    and/or to restore the invalid ones if the function fails.
> 
> e.g.
> 
> 	for (i = 0; i < nb_pkts; i++) {
> 
> 		/* initialize or process packet */
> 
> 		bufs[i]->tso_segsz = 800;
> 		bufs[i]->ol_flags = PKT_TX_TCP_SEG | PKT_TX_IPV4
> 				| PKT_TX_IP_CKSUM;
> 		bufs[i]->l2_len = sizeof(struct ether_hdr);
> 		bufs[i]->l3_len = sizeof(struct ipv4_hdr);
> 		bufs[i]->l4_len = sizeof(struct tcp_hdr);
> 	}
> 
> 	/* Prepare burst of TX packets */
> 	nb_prep = rte_eth_tx_prep(port, 0, bufs, nb_pkts);
> 
> 	if (nb_prep < nb_pkts) {
> 		printf("tx_prep failed\n");
> 
> 		/* nb_prep here indicates the first invalid packet.
> 		 * rte_eth_tx_prep can be used on the remaining packets to
> 		 * find further invalid ones.
> 		 */
> 
> 	}
> 
> 	/* Send burst of TX packets */
> 	nb_tx = rte_eth_tx_burst(port, 0, bufs, nb_prep);
> 
> 	/* Free any unsent packets. */
> 
> v10 changes:
>  - moved the driver tx callback check in rte_eth_tx_prep after the queue_id check
> 
> v9 changes:
>  - fixed headers structure fragmentation check
>  - moved fragmentation check into rte_validate_tx_offload()
> 
> v8 changes:
>  - mbuf argument in rte_validate_tx_offload declared as const
> 
> v7 changes:
>  - comments reworded/added
>  - changed errno values returned from Tx prep API
>  - added check in rte_phdr_cksum_fix if headers are in the first
>    data segment and can be safely modified
>  - moved rte_validate_tx_offload to rte_mbuf
>  - moved rte_phdr_cksum_fix to rte_net.h
>  - removed rte_pkt.h new file as useless
> 
> v6 changes:
> - added performance impact test results to the patch description
> 
> v5 changes:
>  - rebased csum engine modification
>  - added information to the csum engine about performance tests
>  - some performance improvements
> 
> v4 changes:
>  - tx_prep is now set to default behavior (NULL) for the simple/vector
>    paths in the fm10k, i40e and ixgbe drivers to increase performance,
>    when Tx offloads are intentionally not available
> 
> v3 changes:
>  - reworked csum testpmd engine instead adding new one,
>  - fixed checksum initialization procedure to include also outer
>    checksum offloads,
>  - some minor formatting fixes and optimizations
> 
> v2 changes:
>  - rte_eth_tx_prep() returns number of packets when device doesn't
>    support tx_prep functionality,
>  - introduced CONFIG_RTE_ETHDEV_TX_PREP allowing to turn off tx_prep
> 
> Tomasz Kulasek (6):
>   ethdev: add Tx preparation
>   e1000: add Tx preparation
>   fm10k: add Tx preparation
>   i40e: add Tx preparation
>   ixgbe: add Tx preparation
>   testpmd: use Tx preparation in csum engine
> 
>  app/test-pmd/csumonly.c          |   36 ++++++--------
>  config/common_base               |    1 +
>  drivers/net/e1000/e1000_ethdev.h |   11 +++++
>  drivers/net/e1000/em_ethdev.c    |    5 +-
>  drivers/net/e1000/em_rxtx.c      |   48 ++++++++++++++++++-
>  drivers/net/e1000/igb_ethdev.c   |    4 ++
>  drivers/net/e1000/igb_rxtx.c     |   52 ++++++++++++++++++++-
>  drivers/net/fm10k/fm10k.h        |    6 +++
>  drivers/net/fm10k/fm10k_ethdev.c |    5 ++
>  drivers/net/fm10k/fm10k_rxtx.c   |   50 +++++++++++++++++++-
>  drivers/net/i40e/i40e_ethdev.c   |    3 ++
>  drivers/net/i40e/i40e_rxtx.c     |   72 +++++++++++++++++++++++++++-
>  drivers/net/i40e/i40e_rxtx.h     |    8 ++++
>  drivers/net/ixgbe/ixgbe_ethdev.c |    3 ++
>  drivers/net/ixgbe/ixgbe_ethdev.h |    5 +-
>  drivers/net/ixgbe/ixgbe_rxtx.c   |   58 ++++++++++++++++++++++-
>  drivers/net/ixgbe/ixgbe_rxtx.h   |    2 +
>  lib/librte_ether/rte_ethdev.h    |   96 ++++++++++++++++++++++++++++++++++++++
>  lib/librte_mbuf/rte_mbuf.h       |   64 +++++++++++++++++++++++++
>  lib/librte_net/rte_net.h         |   85 +++++++++++++++++++++++++++++++++
>  20 files changed, 584 insertions(+), 30 deletions(-)
> 
> --

Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>

> 1.7.9.5

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v10 1/6] ethdev: add Tx preparation
  2016-10-24 16:51                   ` [dpdk-dev] [PATCH v10 1/6] ethdev: " Tomasz Kulasek
@ 2016-10-25 14:41                     ` Olivier Matz
  2016-10-25 17:28                       ` Kulasek, TomaszX
  0 siblings, 1 reply; 261+ messages in thread
From: Olivier Matz @ 2016-10-25 14:41 UTC (permalink / raw)
  To: Tomasz Kulasek, dev; +Cc: konstantin.ananyev

Hi Tomasz,

On 10/24/2016 06:51 PM, Tomasz Kulasek wrote:
> Added API for `rte_eth_tx_prep`
> 
> [...]
> --- a/lib/librte_ether/rte_ethdev.h
> +++ b/lib/librte_ether/rte_ethdev.h
> @@ -182,6 +182,7 @@ extern "C" {
>  #include <rte_pci.h>
>  #include <rte_dev.h>
>  #include <rte_devargs.h>
> +#include <rte_errno.h>
>  #include "rte_ether.h"
>  #include "rte_eth_ctrl.h"
>  #include "rte_dev_info.h"
> @@ -699,6 +700,8 @@ struct rte_eth_desc_lim {
>  	uint16_t nb_max;   /**< Max allowed number of descriptors. */
>  	uint16_t nb_min;   /**< Min allowed number of descriptors. */
>  	uint16_t nb_align; /**< Number of descriptors should be aligned to. */
> +	uint16_t nb_seg_max;     /**< Max number of segments per whole packet. */
> +	uint16_t nb_mtu_seg_max; /**< Max number of segments per one MTU */

Sorry if it was not clear in my previous review, but I think this should
be better explained here. You said that the "limitation of number
of segments may differ depend of TSO/non TSO".

As an application developer, I still find it difficult to
clearly understand what that means. Is it the maximum number
of mbuf-segments that contain payload for one tcp-segment sent by
the device?

In that case, it looks quite difficult to verify that in an application.
It seems that this field is not used by validate_offload(), so how
should it be used by an application?


>  };
>  
>  /**
> @@ -1188,6 +1191,11 @@ typedef uint16_t (*eth_tx_burst_t)(void *txq,
>  				   uint16_t nb_pkts);
>  /**< @internal Send output packets on a transmit queue of an Ethernet device. */
>  
> +typedef uint16_t (*eth_tx_prep_t)(void *txq,
> +				   struct rte_mbuf **tx_pkts,
> +				   uint16_t nb_pkts);
> +/**< @internal Prepare output packets on a transmit queue of an Ethernet device. */
> +
>  typedef int (*flow_ctrl_get_t)(struct rte_eth_dev *dev,
>  			       struct rte_eth_fc_conf *fc_conf);
>  /**< @internal Get current flow control parameter on an Ethernet device */
> @@ -1622,6 +1630,7 @@ struct rte_eth_rxtx_callback {
>  struct rte_eth_dev {
>  	eth_rx_burst_t rx_pkt_burst; /**< Pointer to PMD receive function. */
>  	eth_tx_burst_t tx_pkt_burst; /**< Pointer to PMD transmit function. */
> +	eth_tx_prep_t tx_pkt_prep; /**< Pointer to PMD transmit prepare function. */
>  	struct rte_eth_dev_data *data;  /**< Pointer to device data */
>  	const struct eth_driver *driver;/**< Driver for this device */
>  	const struct eth_dev_ops *dev_ops; /**< Functions exported by PMD */
> @@ -2816,6 +2825,93 @@ rte_eth_tx_burst(uint8_t port_id, uint16_t queue_id,
>  	return (*dev->tx_pkt_burst)(dev->data->tx_queues[queue_id], tx_pkts, nb_pkts);
>  }
>  
> +/**
> + * Process a burst of output packets on a transmit queue of an Ethernet device.
> + *
> + * The rte_eth_tx_prep() function is invoked to prepare output packets to be
> + * transmitted on the output queue *queue_id* of the Ethernet device designated
> + * by its *port_id*.
> + * The *nb_pkts* parameter is the number of packets to be prepared which are
> + * supplied in the *tx_pkts* array of *rte_mbuf* structures, each of them
> + * allocated from a pool created with rte_pktmbuf_pool_create().
> + * For each packet to send, the rte_eth_tx_prep() function performs
> + * the following operations:
> + *
> + * - Check if packet meets devices requirements for tx offloads.
> + *
> + * - Check limitations about number of segments.
> + *
> + * - Check additional requirements when debug is enabled.
> + *
> + * - Update and/or reset required checksums when tx offload is set for packet.
> + *
> + * The rte_eth_tx_prep() function returns the number of packets ready to be
> + * sent. A return value equal to *nb_pkts* means that all packets are valid and
> + * ready to be sent.
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + *   The value must be a valid port id.
> + * @param queue_id
> + *   The index of the transmit queue through which output packets must be
> + *   sent.
> + *   The value must be in the range [0, nb_tx_queue - 1] previously supplied
> + *   to rte_eth_dev_configure().
> + * @param tx_pkts
> + *   The address of an array of *nb_pkts* pointers to *rte_mbuf* structures
> + *   which contain the output packets.
> + * @param nb_pkts
> + *   The maximum number of packets to process.
> + * @return
> + *   The number of packets correct and ready to be sent. The return value can be
> + *   less than the value of the *tx_pkts* parameter when some packet doesn't
> + *   meet devices requirements with rte_errno set appropriately.
> + */

Inserting here the previous comment:

>> Can we add the constraint that invalid packets are left untouched?
>>
>> I think most of the time there will be a software fallback in that case,
>> so it would be good to ensure that this function does not change the flags
>> or the packet data.
> 
> In the current implementation, if a packet is invalid, its data is never modified. Only checks are done. The only exception is when a checksum needs to be updated or initialized, but that is done after packet validation.
> If we want to use/restore a packet in the application, it should not be changed in any way for invalid packets.
> 

I think it should be explicitly said in the API comment that
valid packets may be modified (*), but invalid packets (whose index
is greater than or equal to the return value) are left untouched.

(*) we still need to discuss that point, see below.

Another comment was made:

>> Another thing that could be interesting for the caller is to know the
>> reason of the failure. Maybe the different errno types could be detailed
>> here. For instance:
>> - EINVAL: offload flags are not correctly set (i.e. would fail whatever
>>   the hardware)
>> - ENOTSUP: the offload feature is not supported by the hardware
>> - ...
>>
> 
> Ok.

Don't you feel it could go in the API comment too?
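
For illustration only (not part of the patch), an application-side
dispatch on such errno values could look like the sketch below. It
assumes bufs/nb_pkts as in the cover letter example, and follows this
series' convention of assigning the values negated:

	nb_prep = rte_eth_tx_prep(port_id, queue_id, bufs, nb_pkts);
	if (nb_prep < nb_pkts) {
		/* bufs[nb_prep] is the first invalid packet */
		if (rte_errno == -ENOTSUP) {
			/* offload not supported by this hardware:
			 * fall back to a software implementation */
		} else if (rte_errno == -EINVAL) {
			/* offload flags malformed: fix or drop */
		}
	}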


> [...]
>
> +/**
> + * Fix pseudo header checksum
> + *
> + * This function fixes pseudo header checksum for TSO and non-TSO tcp/udp in
> + * provided mbufs packet data.
> + *
> + * - for non-TSO tcp/udp packets full pseudo-header checksum is counted and set
> + *   in packet data,
> + * - for TSO the IP payload length is not included in pseudo header.
> + *
> + * This function expects that used headers are in the first data segment of
> + * mbuf, and are not fragmented.

There is another requirement about the cloning and reference count.
Sorry I did not answer Konstantin, but I still think this could be
an issue.

For instance, I think that the zero-copy mode of vhost application
references the packet sent by the guest and sends the data. The payload
should not be modified because it is in guest memory (we don't know,
maybe the guest also cloned it for its own purpose).

It means that the tx_prep() API must not be used with clones, i.e.
the headers must not reside in a segment for which RTE_MBUF_INDIRECT(seg)
is true or rte_mbuf_refcnt_read(seg) > 1.
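
To make the constraint concrete: a check such as the following sketch
(illustrative only, built from the RTE_MBUF_INDIRECT() macro and
rte_mbuf_refcnt_read() mentioned above) would have to hold for the
header segment:

	/* Headers are safely writable in-place only when the first
	 * segment is direct and not shared. */
	static inline int
	mbuf_hdrs_writable(struct rte_mbuf *m)
	{
		return !RTE_MBUF_INDIRECT(m) &&
			rte_mbuf_refcnt_read(m) == 1;
	}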

- if we really want this API in 16.11, it should be clearly explained
  in the API comment that it does not work with shared segments

- for next versions, we have to take a decision whether it should be
  supported or not. In my opinion, cloned packets are useful and should
  be supported properly by all dpdk APIs.


Thanks,
Olivier

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v10 1/6] ethdev: add Tx preparation
  2016-10-25 14:41                     ` Olivier Matz
@ 2016-10-25 17:28                       ` Kulasek, TomaszX
  0 siblings, 0 replies; 261+ messages in thread
From: Kulasek, TomaszX @ 2016-10-25 17:28 UTC (permalink / raw)
  To: Olivier Matz, dev; +Cc: Ananyev, Konstantin



> -----Original Message-----
> From: Olivier Matz [mailto:olivier.matz@6wind.com]
> Sent: Tuesday, October 25, 2016 16:42
> To: Kulasek, TomaszX <tomaszx.kulasek@intel.com>; dev@dpdk.org
> Cc: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> Subject: Re: [PATCH v10 1/6] ethdev: add Tx preparation
> 
> Hi Tomasz,
> 
> On 10/24/2016 06:51 PM, Tomasz Kulasek wrote:
> > Added API for `rte_eth_tx_prep`
> >
> > [...]
> > --- a/lib/librte_ether/rte_ethdev.h
> > +++ b/lib/librte_ether/rte_ethdev.h
> > @@ -182,6 +182,7 @@ extern "C" {
> >  #include <rte_pci.h>
> >  #include <rte_dev.h>
> >  #include <rte_devargs.h>
> > +#include <rte_errno.h>
> >  #include "rte_ether.h"
> >  #include "rte_eth_ctrl.h"
> >  #include "rte_dev_info.h"
> > @@ -699,6 +700,8 @@ struct rte_eth_desc_lim {
> >  	uint16_t nb_max;   /**< Max allowed number of descriptors. */
> >  	uint16_t nb_min;   /**< Min allowed number of descriptors. */
> >  	uint16_t nb_align; /**< Number of descriptors should be aligned to.
> > */
> > +	uint16_t nb_seg_max;     /**< Max number of segments per whole
> packet. */
> > +	uint16_t nb_mtu_seg_max; /**< Max number of segments per one MTU */
> 
> Sorry if it was not clear in my previous review, but I think this should
> be better explained here. You said that the "limitation of number of
> segments may differ depend of TSO/non TSO".
> 
> As an application developer, I still find it difficult to clearly
> understand what that means. Is it the maximum number of mbuf-segments
> that contain payload for one tcp-segment sent by the device?
> 
> In that case, it looks quite difficult to verify that in an application.
> It seems that this field is not used by validate_offload(), so how should
> it be used by an application?
> 

E.g. for i40e (from xl710 datasheet):

  8.4.1 Transmit Packet in System Memory
   ...
  A few rules related to the transmit packet in host memory are:
   ...
   - A single transmit packet may span up to 8 buffers (up to 8 data descriptors
     per packet including both the header and payload buffers).
   - The total number of data descriptors for the whole TSO (explained later on in
     this chapter) is unlimited as long as each segment within the TSO obeys
     the previous rule (up to 8 data descriptors per segment for both the TSO
     header and the segment payload buffers).
  ...

+#define I40E_TX_MAX_SEG     UINT8_MAX
+#define I40E_TX_MAX_MTU_SEG 8

For the ixgbe driver there's one limitation, the same for both TSO and
non-TSO, and it is always 40-WTHRESH.

Such differences mean that these values should be checked in the tx_prep
callback (when required) for a specific device (e.g. i40e, ixgbe).

If another device has no such limitations, or its limits are higher than
DPDK can handle (nb_segs is uint8_t), then for performance reasons they
don't need to be checked.

Validate_offload() is for general checks; the driver-specific implementation
is the part in the callback function, e.g. in

  uint16_t ixgbe_prep_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
		uint16_t nb_pkts)

for ixgbe.

Values in "struct rte_eth_desc_lim" provide information to the
application, which should let it avoid creating malicious packets, or
at least limit their number.

Using the Tx preparation API, these checks are made transparently to the
application, and when nb_segs is out of limits, the tx_prep function fails.
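
For example, an application that wants to enforce these limits on its
own could read them from the device info; a rough sketch, assuming an
already configured port_id and a packet m:

	struct rte_eth_dev_info dev_info;

	rte_eth_dev_info_get(port_id, &dev_info);

	/* Non-TSO packet: obey the per-MTU segment limit. */
	if (!(m->ol_flags & PKT_TX_TCP_SEG) &&
			m->nb_segs > dev_info.tx_desc_lim.nb_mtu_seg_max) {
		/* linearize, split or drop the packet */
	}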

> 
> >  };
> >
> >  /**
> > @@ -1188,6 +1191,11 @@ typedef uint16_t (*eth_tx_burst_t)(void *txq,
> >  				   uint16_t nb_pkts);
> >  /**< @internal Send output packets on a transmit queue of an Ethernet
> > device. */
> >
> > +typedef uint16_t (*eth_tx_prep_t)(void *txq,
> > +				   struct rte_mbuf **tx_pkts,
> > +				   uint16_t nb_pkts);
> > +/**< @internal Prepare output packets on a transmit queue of an
> > +Ethernet device. */
> > +
> >  typedef int (*flow_ctrl_get_t)(struct rte_eth_dev *dev,
> >  			       struct rte_eth_fc_conf *fc_conf);  /**< @internal
> Get
> > current flow control parameter on an Ethernet device */ @@ -1622,6
> > +1630,7 @@ struct rte_eth_rxtx_callback {  struct rte_eth_dev {
> >  	eth_rx_burst_t rx_pkt_burst; /**< Pointer to PMD receive function.
> */
> >  	eth_tx_burst_t tx_pkt_burst; /**< Pointer to PMD transmit function.
> > */
> > +	eth_tx_prep_t tx_pkt_prep; /**< Pointer to PMD transmit prepare
> > +function. */
> >  	struct rte_eth_dev_data *data;  /**< Pointer to device data */
> >  	const struct eth_driver *driver;/**< Driver for this device */
> >  	const struct eth_dev_ops *dev_ops; /**< Functions exported by PMD */
> > @@ -2816,6 +2825,93 @@ rte_eth_tx_burst(uint8_t port_id, uint16_t
> queue_id,
> >  	return (*dev->tx_pkt_burst)(dev->data->tx_queues[queue_id], tx_pkts,
> > nb_pkts);  }
> >
> > +/**
> > + * Process a burst of output packets on a transmit queue of an Ethernet
> device.
> > + *
> > + * The rte_eth_tx_prep() function is invoked to prepare output
> > +packets to be
> > + * transmitted on the output queue *queue_id* of the Ethernet device
> > +designated
> > + * by its *port_id*.
> > + * The *nb_pkts* parameter is the number of packets to be prepared
> > +which are
> > + * supplied in the *tx_pkts* array of *rte_mbuf* structures, each of
> > +them
> > + * allocated from a pool created with rte_pktmbuf_pool_create().
> > + * For each packet to send, the rte_eth_tx_prep() function performs
> > + * the following operations:
> > + *
> > + * - Check if packet meets devices requirements for tx offloads.
> > + *
> > + * - Check limitations about number of segments.
> > + *
> > + * - Check additional requirements when debug is enabled.
> > + *
> > + * - Update and/or reset required checksums when tx offload is set for
> packet.
> > + *
> > + * The rte_eth_tx_prep() function returns the number of packets ready
> > +to be
> > + * sent. A return value equal to *nb_pkts* means that all packets are
> > +valid and
> > + * ready to be sent.
> > + *
> > + * @param port_id
> > + *   The port identifier of the Ethernet device.
> > + *   The value must be a valid port id.
> > + * @param queue_id
> > + *   The index of the transmit queue through which output packets must
> be
> > + *   sent.
> > + *   The value must be in the range [0, nb_tx_queue - 1] previously
> supplied
> > + *   to rte_eth_dev_configure().
> > + * @param tx_pkts
> > + *   The address of an array of *nb_pkts* pointers to *rte_mbuf*
> structures
> > + *   which contain the output packets.
> > + * @param nb_pkts
> > + *   The maximum number of packets to process.
> > + * @return
> > + *   The number of packets correct and ready to be sent. The return
> value can be
> > + *   less than the value of the *tx_pkts* parameter when some packet
> doesn't
> > + *   meet devices requirements with rte_errno set appropriately.
> > + */
> 
> Inserting here the previous comment:
> 
> >> Can we add the constraint that invalid packets are left untouched?
> >>
> >> I think most of the time there will be a software fallback in that
> >> case, so it would be good to ensure that this function does not
> >> change the flags or the packet data.
> >
> > In the current implementation, if a packet is invalid, its data is never
> modified. Only checks are done. The only exception is when a checksum needs
> to be updated or initialized, but that is done after packet validation.
> > If we want to use/restore a packet in the application, it should not be
> changed in any way for invalid packets.
> >
> 
> I think it should be explicitly said in the API comment that valid
> packets may be modified (*), but invalid packets (whose index is greater
> than or equal to the return value) are left untouched.
> 
> (*) we still need to discuss that point, see below.
> 
> Another comment was made:
> 
> >> Another thing that could be interesting for the caller is to know the
> >> reason of the failure. Maybe the different errno types could be
> >> detailed here. For instance:
> >> - EINVAL: offload flags are not correctly set (i.e. would fail whatever
> >>   the hardware)
> >> - ENOTSUP: the offload feature is not supported by the hardware
> >> - ...
> >>
> >
> > Ok.
> 
> Don't you feel it could go in the API comment too?
> 

Ok, I will modify comments.

> 
> > [...]
> >
> > +/**
> > + * Fix pseudo header checksum
> > + *
> > + * This function fixes pseudo header checksum for TSO and non-TSO
> > +tcp/udp in
> > + * provided mbufs packet data.
> > + *
> > + * - for non-TSO tcp/udp packets full pseudo-header checksum is counted
> and set
> > + *   in packet data,
> > + * - for TSO the IP payload length is not included in pseudo header.
> > + *
> > + * This function expects that used headers are in the first data
> > +segment of
> > + * mbuf, and are not fragmented.
> 
> There is another requirement about the cloning and reference count.
> Sorry I did not answer Konstantin, but I still think this could be an
> issue.
> 
> For instance, I think that the zero-copy mode of vhost application
> references the packet sent by the guest and send the data. The payload
> should not be modified because it is in guest memory (we don't know, maybe
> the guest also cloned it for its own purpose).
> 
> It means that the tx_prep() API must not be used with clones, i.e. the
> headers must not reside in a segment for which RTE_MBUF_INDIRECT(seg) is
> true or rte_mbuf_refcnt_read(seg) > 1.
> 
> - if we really want this API in 16.11, it should be clearly explained
>   in the API comment that it does not work with shared segments
> 

Ok, I will.

> - for next versions, we have to take a decision whether it should be
>   supported or not. In my opinion, cloned packets are useful and should
>   be supported properly by all dpdk APIs.
> 
> 
> Thanks,
> Olivier

Thanks,
Tomasz

^ permalink raw reply	[flat|nested] 261+ messages in thread

* [dpdk-dev] [PATCH v11 0/6] add Tx preparation
  2016-10-24 16:51                 ` [dpdk-dev] [PATCH v10 0/6] add Tx preparation Tomasz Kulasek
                                     ` (6 preceding siblings ...)
  2016-10-24 17:26                   ` [dpdk-dev] [PATCH v10 0/6] add Tx preparation Ananyev, Konstantin
@ 2016-10-26 12:56                   ` Tomasz Kulasek
  2016-10-26 12:56                     ` [dpdk-dev] [PATCH v11 1/6] ethdev: " Tomasz Kulasek
                                       ` (6 more replies)
  7 siblings, 7 replies; 261+ messages in thread
From: Tomasz Kulasek @ 2016-10-26 12:56 UTC (permalink / raw)
  To: dev

As discussed in that thread:

http://dpdk.org/ml/archives/dev/2015-September/023603.html

Different NIC models, depending on the HW offloads requested, might impose
different requirements on packets to be TX-ed, in terms of:

 - Max number of fragments per packet allowed
 - Max number of fragments per TSO segment
 - The way pseudo-header checksum should be pre-calculated
 - L3/L4 header field filling
 - etc.


MOTIVATION:
-----------

1) Some work cannot (and should not) be done in rte_eth_tx_burst.
   However, this work is sometimes required, and now it's an
   application issue.

2) Different hardware may have different requirements for TX offloads,
   a different subset may be supported, and so on.

3) Some parameters (e.g. number of segments in ixgbe driver) may hang
   the device. These parameters may vary for different devices.

   For example i40e HW allows 8 fragments per packet, but that is after
   TSO segmentation. While ixgbe has a 38-fragment pre-TSO limit.

4) Fields in the packet may require different initialization (e.g.
   pseudo-header checksum precalculation, sometimes done in a
   different way depending on packet type, and so on). Now the
   application needs to take care of it.

5) Using an additional API (rte_eth_tx_prep) before rte_eth_tx_burst
   lets the application prepare the packet burst in a form acceptable
   to the specific device.

6) Some additional checks may be done in debug mode, keeping the
   tx_burst implementation clean.


PROPOSAL:
---------

To help users deal with all these varieties, we propose to:

1) Introduce an rte_eth_tx_prep() function to do the necessary
   preparation of a packet burst to be safely transmitted on the device
   with the desired HW offloads (set/reset checksum fields according to
   the hardware requirements) and to check HW constraints (number of
   segments per packet, etc).

   Since the limitations and requirements may differ between devices,
   this requires extending the rte_eth_dev structure with a new function
   pointer, "tx_pkt_prep", which can be implemented in the driver to
   prepare and verify packets, in a device-specific way, before the
   burst; this should prevent the application from sending malformed packets.

2) Also, new fields will be introduced in rte_eth_desc_lim:
   nb_seg_max and nb_mtu_seg_max, providing information about the max
   number of segments in TSO and non-TSO packets acceptable to the device.

   This information is useful for the application to avoid creating
   malicious packets, or at least to limit their number.


APPLICATION (CASE OF USE):
--------------------------

1) The application should initialize the burst of packets to send, and
   set the required tx offload flags and fields, like l2_len, l3_len,
   l4_len, and tso_segsz.

2) The application passes the burst to rte_eth_tx_prep to check the
   conditions required to send the packets through the NIC.

3) The result of rte_eth_tx_prep can be used to send the valid packets
   and/or restore the invalid ones if the function fails.

e.g.

	for (i = 0; i < nb_pkts; i++) {

		/* initialize or process packet */

		bufs[i]->tso_segsz = 800;
		bufs[i]->ol_flags = PKT_TX_TCP_SEG | PKT_TX_IPV4
				| PKT_TX_IP_CKSUM;
		bufs[i]->l2_len = sizeof(struct ether_hdr);
		bufs[i]->l3_len = sizeof(struct ipv4_hdr);
		bufs[i]->l4_len = sizeof(struct tcp_hdr);
	}

	/* Prepare burst of TX packets */
	nb_prep = rte_eth_tx_prep(port, 0, bufs, nb_pkts);

	if (nb_prep < nb_pkts) {
		printf("tx_prep failed\n");

		/* nb_prep here indicates the first invalid packet. rte_eth_tx_prep
		 * can be called on the remaining packets to find further invalid ones.
		 */

	}
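
	/* One possible recovery strategy (a sketch, not part of this
	 * patch set): drop each invalid packet reported by tx_prep and
	 * retry preparation on the packets that follow it. Assumes
	 * <string.h> for memmove(); nb_pkts shrinks as packets are dropped.
	 */
	while (nb_prep < nb_pkts) {
		rte_pktmbuf_free(bufs[nb_prep]);
		/* close the gap left in the burst */
		memmove(&bufs[nb_prep], &bufs[nb_prep + 1],
			(nb_pkts - nb_prep - 1) * sizeof(bufs[0]));
		nb_pkts--;
		nb_prep += rte_eth_tx_prep(port, 0, &bufs[nb_prep],
				nb_pkts - nb_prep);
	}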

	/* Send burst of TX packets */
	nb_tx = rte_eth_tx_burst(port, 0, bufs, nb_prep);

	/* Free any unsent packets. */

v11 changes:
 - updated comments
 - added information to the API description about packet data
   requirements/limitations.

v10 changes:
 - moved drivers tx callback check in rte_eth_tx_prep after the queue_id check

v9 changes:
 - fixed headers structure fragmentation check
 - moved fragmentation check into rte_validate_tx_offload()

v8 changes:
 - mbuf argument in rte_validate_tx_offload declared as const

v7 changes:
 - comments reworded/added
 - changed errno values returned from Tx prep API
 - added check in rte_phdr_cksum_fix if headers are in the first
   data segment and can be safely modified
 - moved rte_validate_tx_offload to rte_mbuf
 - moved rte_phdr_cksum_fix to rte_net.h
 - removed the new rte_pkt.h file as unnecessary

v6 changes:
- added performance impact test results to the patch description

v5 changes:
 - rebased csum engine modification
 - added information to the csum engine about performance tests
 - some performance improvements

v4 changes:
 - tx_prep is now set to default behavior (NULL) for simple/vector path
   in fm10k, i40e and ixgbe drivers to increase performance when
   Tx offloads are intentionally not available

v3 changes:
 - reworked csum testpmd engine instead of adding a new one,
 - fixed checksum initialization procedure to include also outer
   checksum offloads,
 - some minor formatting and optimization fixes

v2 changes:
 - rte_eth_tx_prep() returns the number of packets when the device doesn't
   support tx_prep functionality,
 - introduced CONFIG_RTE_ETHDEV_TX_PREP, allowing tx_prep to be turned off


Tomasz Kulasek (6):
  ethdev: add Tx preparation
  e1000: add Tx preparation
  fm10k: add Tx preparation
  i40e: add Tx preparation
  ixgbe: add Tx preparation
  testpmd: use Tx preparation in csum engine

 app/test-pmd/csumonly.c          |   36 +++++--------
 config/common_base               |    1 +
 drivers/net/e1000/e1000_ethdev.h |   11 ++++
 drivers/net/e1000/em_ethdev.c    |    5 +-
 drivers/net/e1000/em_rxtx.c      |   48 +++++++++++++++++-
 drivers/net/e1000/igb_ethdev.c   |    4 ++
 drivers/net/e1000/igb_rxtx.c     |   52 ++++++++++++++++++-
 drivers/net/fm10k/fm10k.h        |    6 +++
 drivers/net/fm10k/fm10k_ethdev.c |    5 ++
 drivers/net/fm10k/fm10k_rxtx.c   |   50 +++++++++++++++++-
 drivers/net/i40e/i40e_ethdev.c   |    3 ++
 drivers/net/i40e/i40e_rxtx.c     |   72 +++++++++++++++++++++++++-
 drivers/net/i40e/i40e_rxtx.h     |    8 +++
 drivers/net/ixgbe/ixgbe_ethdev.c |    3 ++
 drivers/net/ixgbe/ixgbe_ethdev.h |    5 +-
 drivers/net/ixgbe/ixgbe_rxtx.c   |   58 ++++++++++++++++++++-
 drivers/net/ixgbe/ixgbe_rxtx.h   |    2 +
 lib/librte_ether/rte_ethdev.h    |  103 ++++++++++++++++++++++++++++++++++++++
 lib/librte_mbuf/rte_mbuf.h       |   64 +++++++++++++++++++++++
 lib/librte_net/rte_net.h         |   85 +++++++++++++++++++++++++++++++
 20 files changed, 591 insertions(+), 30 deletions(-)

-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 261+ messages in thread

* [dpdk-dev] [PATCH v11 1/6] ethdev: add Tx preparation
  2016-10-26 12:56                   ` [dpdk-dev] [PATCH v11 " Tomasz Kulasek
@ 2016-10-26 12:56                     ` Tomasz Kulasek
  2016-10-27 12:38                       ` Olivier Matz
  2016-10-27 15:01                       ` Thomas Monjalon
  2016-10-26 12:56                     ` [dpdk-dev] [PATCH v11 2/6] e1000: " Tomasz Kulasek
                                       ` (5 subsequent siblings)
  6 siblings, 2 replies; 261+ messages in thread
From: Tomasz Kulasek @ 2016-10-26 12:56 UTC (permalink / raw)
  To: dev

Added API for `rte_eth_tx_prep`

uint16_t rte_eth_tx_prep(uint8_t port_id, uint16_t queue_id,
	struct rte_mbuf **tx_pkts, uint16_t nb_pkts)

Added fields to the `struct rte_eth_desc_lim`:

	uint16_t nb_seg_max;
		/**< Max number of segments per whole packet. */

	uint16_t nb_mtu_seg_max;
		/**< Max number of segments per one MTU */

Added functions:

int rte_validate_tx_offload(struct rte_mbuf *m)
	to validate general requirements for the tx offloads set in the mbuf
  of a packet, such as flag completeness. In the current implementation
  this function is called optionally, when RTE_LIBRTE_ETHDEV_DEBUG is enabled.

int rte_phdr_cksum_fix(struct rte_mbuf *m)
	to fix pseudo header checksum for TSO and non-TSO tcp/udp packets
	before hardware tx checksum offload.
	 - for non-TSO tcp/udp packets the full pseudo-header checksum is
	   calculated and set.
	 - for TSO the IP payload length is not included.
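
A condensed sketch of how a PMD prep callback is expected to combine
these helpers (the drivers later in this series follow this pattern;
xxx_prep_pkts is a placeholder name, and the driver-specific flag and
segment-limit checks are omitted here):

	uint16_t
	xxx_prep_pkts(__rte_unused void *tx_queue, struct rte_mbuf **tx_pkts,
			uint16_t nb_pkts)
	{
		int i, ret;

		for (i = 0; i < nb_pkts; i++) {
	#ifdef RTE_LIBRTE_ETHDEV_DEBUG
			/* optional generic validation */
			ret = rte_validate_tx_offload(tx_pkts[i]);
			if (ret != 0) {
				rte_errno = ret;
				return i;
			}
	#endif
			/* fix pseudo-header checksums in place */
			ret = rte_phdr_cksum_fix(tx_pkts[i]);
			if (ret != 0) {
				rte_errno = ret;
				return i;
			}
		}
		return i;
	}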

PERFORMANCE TESTS
-----------------

This feature was tested with modified csum engine from test-pmd.

The packet checksum preparation was moved from the application to the Tx
preparation step placed before the burst.

We may expect some overhead costs caused by:
1) using an additional callback before the burst,
2) rescanning the burst,
3) additional condition checking (packet validation),
4) worse optimization (e.g. packet data access, etc.)

We tested it using the ixgbe Tx preparation implementation with some parts
disabled, to get comparable information about the impact of the different
parts of the implementation.

IMPACT:

1) For an unimplemented Tx preparation callback the performance impact is
   negligible,
2) For the packet condition check without checksum modifications (nb_segs,
   available offloads, etc.) the result is 14626628/14252168 (~2.62% drop),
3) For full support in the ixgbe driver (point 2 + packet checksum
   initialization) the result is 14060924/13588094 (~3.48% drop)

Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
---
 config/common_base            |    1 +
 lib/librte_ether/rte_ethdev.h |  103 +++++++++++++++++++++++++++++++++++++++++
 lib/librte_mbuf/rte_mbuf.h    |   64 +++++++++++++++++++++++++
 lib/librte_net/rte_net.h      |   85 ++++++++++++++++++++++++++++++++++
 4 files changed, 253 insertions(+)

diff --git a/config/common_base b/config/common_base
index c7fd3db..619284b 100644
--- a/config/common_base
+++ b/config/common_base
@@ -120,6 +120,7 @@ CONFIG_RTE_MAX_QUEUES_PER_PORT=1024
 CONFIG_RTE_LIBRTE_IEEE1588=n
 CONFIG_RTE_ETHDEV_QUEUE_STAT_CNTRS=16
 CONFIG_RTE_ETHDEV_RXTX_CALLBACKS=y
+CONFIG_RTE_ETHDEV_TX_PREP=y
 
 #
 # Support NIC bypass logic
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 38641e8..cf6f68e 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -182,6 +182,7 @@ extern "C" {
 #include <rte_pci.h>
 #include <rte_dev.h>
 #include <rte_devargs.h>
+#include <rte_errno.h>
 #include "rte_ether.h"
 #include "rte_eth_ctrl.h"
 #include "rte_dev_info.h"
@@ -699,6 +700,8 @@ struct rte_eth_desc_lim {
 	uint16_t nb_max;   /**< Max allowed number of descriptors. */
 	uint16_t nb_min;   /**< Min allowed number of descriptors. */
 	uint16_t nb_align; /**< Number of descriptors should be aligned to. */
+	uint16_t nb_seg_max;     /**< Max number of segments per whole packet. */
+	uint16_t nb_mtu_seg_max; /**< Max number of segments per one MTU */
 };
 
 /**
@@ -1188,6 +1191,11 @@ typedef uint16_t (*eth_tx_burst_t)(void *txq,
 				   uint16_t nb_pkts);
 /**< @internal Send output packets on a transmit queue of an Ethernet device. */
 
+typedef uint16_t (*eth_tx_prep_t)(void *txq,
+				   struct rte_mbuf **tx_pkts,
+				   uint16_t nb_pkts);
+/**< @internal Prepare output packets on a transmit queue of an Ethernet device. */
+
 typedef int (*flow_ctrl_get_t)(struct rte_eth_dev *dev,
 			       struct rte_eth_fc_conf *fc_conf);
 /**< @internal Get current flow control parameter on an Ethernet device */
@@ -1622,6 +1630,7 @@ struct rte_eth_rxtx_callback {
 struct rte_eth_dev {
 	eth_rx_burst_t rx_pkt_burst; /**< Pointer to PMD receive function. */
 	eth_tx_burst_t tx_pkt_burst; /**< Pointer to PMD transmit function. */
+	eth_tx_prep_t tx_pkt_prep; /**< Pointer to PMD transmit prepare function. */
 	struct rte_eth_dev_data *data;  /**< Pointer to device data */
 	const struct eth_driver *driver;/**< Driver for this device */
 	const struct eth_dev_ops *dev_ops; /**< Functions exported by PMD */
@@ -2816,6 +2825,100 @@ rte_eth_tx_burst(uint8_t port_id, uint16_t queue_id,
 	return (*dev->tx_pkt_burst)(dev->data->tx_queues[queue_id], tx_pkts, nb_pkts);
 }
 
+/**
+ * Process a burst of output packets on a transmit queue of an Ethernet device.
+ *
+ * The rte_eth_tx_prep() function is invoked to prepare output packets to be
+ * transmitted on the output queue *queue_id* of the Ethernet device designated
+ * by its *port_id*.
+ * The *nb_pkts* parameter is the number of packets to be prepared which are
+ * supplied in the *tx_pkts* array of *rte_mbuf* structures, each of them
+ * allocated from a pool created with rte_pktmbuf_pool_create().
+ * For each packet to send, the rte_eth_tx_prep() function performs
+ * the following operations:
+ *
+ * - Check if the packet meets the device's requirements for tx offloads.
+ *
+ * - Check limitations on the number of segments.
+ *
+ * - Check additional requirements when debug is enabled.
+ *
+ * - Update and/or reset required checksums when a tx offload is set for the packet.
+ *
+ * Since this function can modify packet data, provided mbufs must be safely
+ * writable (e.g. modified data cannot be in a shared segment).
+ *
+ * The rte_eth_tx_prep() function returns the number of packets ready to be
+ * sent. A return value equal to *nb_pkts* means that all packets are valid and
+ * ready to be sent; otherwise processing stops at the first invalid packet
+ * and the remaining packets are left untouched.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ *   The value must be a valid port id.
+ * @param queue_id
+ *   The index of the transmit queue through which output packets must be
+ *   sent.
+ *   The value must be in the range [0, nb_tx_queue - 1] previously supplied
+ *   to rte_eth_dev_configure().
+ * @param tx_pkts
+ *   The address of an array of *nb_pkts* pointers to *rte_mbuf* structures
+ *   which contain the output packets.
+ * @param nb_pkts
+ *   The maximum number of packets to process.
+ * @return
+ *   The number of packets correct and ready to be sent. The return value can be
+ *   less than the value of the *nb_pkts* parameter when a packet doesn't
+ *   meet the device's requirements, with rte_errno set appropriately:
+ *   - -EINVAL: offload flags are not correctly set
+ *   - -ENOTSUP: the offload feature is not supported by the hardware
+ *
+ */
+
+#ifdef RTE_ETHDEV_TX_PREP
+
+static inline uint16_t
+rte_eth_tx_prep(uint8_t port_id, uint16_t queue_id, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts)
+{
+	struct rte_eth_dev *dev;
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+	if (!rte_eth_dev_is_valid_port(port_id)) {
+		RTE_PMD_DEBUG_TRACE("Invalid TX port_id=%d\n", port_id);
+		rte_errno = -EINVAL;
+		return 0;
+	}
+#endif
+
+	dev = &rte_eth_devices[port_id];
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+	if (queue_id >= dev->data->nb_tx_queues) {
+		RTE_PMD_DEBUG_TRACE("Invalid TX queue_id=%d\n", queue_id);
+		rte_errno = -EINVAL;
+		return 0;
+	}
+#endif
+
+	if (!dev->tx_pkt_prep)
+		return nb_pkts;
+
+	return (*dev->tx_pkt_prep)(dev->data->tx_queues[queue_id],
+			tx_pkts, nb_pkts);
+}
+
+#else
+
+static inline uint16_t
+rte_eth_tx_prep(__rte_unused uint8_t port_id, __rte_unused uint16_t queue_id,
+		__rte_unused struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
+{
+	return nb_pkts;
+}
+
+#endif
+
 typedef void (*buffer_tx_error_fn)(struct rte_mbuf **unsent, uint16_t count,
 		void *userdata);
 
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index 109e666..ff9e749 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -283,6 +283,19 @@ extern "C" {
  */
 #define PKT_TX_OUTER_IPV6    (1ULL << 60)
 
+/**
+ * Bit mask of all supported packet Tx offload feature flags, which can be set
+ * for a packet.
+ */
+#define PKT_TX_OFFLOAD_MASK (    \
+		PKT_TX_IP_CKSUM |        \
+		PKT_TX_L4_MASK |         \
+		PKT_TX_OUTER_IP_CKSUM |  \
+		PKT_TX_TCP_SEG |         \
+		PKT_TX_QINQ_PKT |        \
+		PKT_TX_VLAN_PKT |        \
+		PKT_TX_TUNNEL_MASK)
+
 #define __RESERVED           (1ULL << 61) /**< reserved for future mbuf use */
 
 #define IND_ATTACHED_MBUF    (1ULL << 62) /**< Indirect attached mbuf */
@@ -1647,6 +1660,57 @@ static inline int rte_pktmbuf_chain(struct rte_mbuf *head, struct rte_mbuf *tail
 }
 
 /**
+ * Validate general requirements for tx offload in mbuf.
+ *
+ * This function checks correctness and completeness of Tx offload settings.
+ *
+ * @param m
+ *   The packet mbuf to be validated.
+ * @return
+ *   0 if packet is valid
+ */
+static inline int
+rte_validate_tx_offload(const struct rte_mbuf *m)
+{
+	uint64_t ol_flags = m->ol_flags;
+	uint64_t inner_l3_offset = m->l2_len;
+
+	/* Does the packet set any of the available offloads? */
+	if (!(ol_flags & PKT_TX_OFFLOAD_MASK))
+		return 0;
+
+	if (ol_flags & PKT_TX_OUTER_IP_CKSUM)
+		inner_l3_offset += m->outer_l2_len + m->outer_l3_len;
+
+	/* Headers are fragmented */
+	if (rte_pktmbuf_data_len(m) < inner_l3_offset + m->l3_len + m->l4_len)
+		return -ENOTSUP;
+
+	/* IP checksum can be calculated only for an IPv4 packet */
+	if ((ol_flags & PKT_TX_IP_CKSUM) && (ol_flags & PKT_TX_IPV6))
+		return -EINVAL;
+
+	/* IP type not set when required */
+	if (ol_flags & (PKT_TX_L4_MASK | PKT_TX_TCP_SEG))
+		if (!(ol_flags & (PKT_TX_IPV4 | PKT_TX_IPV6)))
+			return -EINVAL;
+
+	/* Check requirements for TSO packet */
+	if (ol_flags & PKT_TX_TCP_SEG)
+		if ((m->tso_segsz == 0) ||
+				((ol_flags & PKT_TX_IPV4) &&
+				!(ol_flags & PKT_TX_IP_CKSUM)))
+			return -EINVAL;
+
+	/* PKT_TX_OUTER_IP_CKSUM set for a packet that is not outer IPv4. */
+	if ((ol_flags & PKT_TX_OUTER_IP_CKSUM) &&
+			!(ol_flags & PKT_TX_OUTER_IPV4))
+		return -EINVAL;
+
+	return 0;
+}
+
+/**
  * Dump an mbuf structure to a file.
  *
  * Dump all fields for the given packet mbuf and all its associated
diff --git a/lib/librte_net/rte_net.h b/lib/librte_net/rte_net.h
index d4156ae..553d5f1 100644
--- a/lib/librte_net/rte_net.h
+++ b/lib/librte_net/rte_net.h
@@ -38,6 +38,11 @@
 extern "C" {
 #endif
 
+#include <rte_ip.h>
+#include <rte_udp.h>
+#include <rte_tcp.h>
+#include <rte_sctp.h>
+
 /**
  * Structure containing header lengths associated to a packet, filled
  * by rte_net_get_ptype().
@@ -86,6 +91,86 @@ struct rte_net_hdr_lens {
 uint32_t rte_net_get_ptype(const struct rte_mbuf *m,
 	struct rte_net_hdr_lens *hdr_lens, uint32_t layers);
 
+/**
+ * Fix pseudo header checksum
+ *
+ * This function fixes the pseudo header checksum for TSO and non-TSO tcp/udp
+ * packets in the provided mbuf's packet data.
+ *
+ * - for non-TSO tcp/udp packets the full pseudo-header checksum is calculated
+ *   and set in the packet data,
+ * - for TSO the IP payload length is not included in pseudo header.
+ *
+ * This function expects that the used headers are in the first data segment
+ * of the mbuf, are not fragmented and can be safely modified.
+ *
+ * @param m
+ *   The packet mbuf to be fixed.
+ * @return
+ *   0 if checksum is initialized properly
+ */
+static inline int
+rte_phdr_cksum_fix(struct rte_mbuf *m)
+{
+	struct ipv4_hdr *ipv4_hdr;
+	struct ipv6_hdr *ipv6_hdr;
+	struct tcp_hdr *tcp_hdr;
+	struct udp_hdr *udp_hdr;
+	uint64_t ol_flags = m->ol_flags;
+	uint64_t inner_l3_offset = m->l2_len;
+
+	if (ol_flags & PKT_TX_OUTER_IP_CKSUM)
+		inner_l3_offset += m->outer_l2_len + m->outer_l3_len;
+
+	if ((ol_flags & PKT_TX_UDP_CKSUM) == PKT_TX_UDP_CKSUM) {
+		if (ol_flags & PKT_TX_IPV4) {
+			ipv4_hdr = rte_pktmbuf_mtod_offset(m, struct ipv4_hdr *,
+					inner_l3_offset);
+
+			if (ol_flags & PKT_TX_IP_CKSUM)
+				ipv4_hdr->hdr_checksum = 0;
+
+			udp_hdr = (struct udp_hdr *)((char *)ipv4_hdr +
+					m->l3_len);
+			udp_hdr->dgram_cksum = rte_ipv4_phdr_cksum(ipv4_hdr,
+					ol_flags);
+		} else {
+			ipv6_hdr = rte_pktmbuf_mtod_offset(m, struct ipv6_hdr *,
+					inner_l3_offset);
+			/* non-TSO udp */
+			udp_hdr = rte_pktmbuf_mtod_offset(m, struct udp_hdr *,
+					inner_l3_offset + m->l3_len);
+			udp_hdr->dgram_cksum = rte_ipv6_phdr_cksum(ipv6_hdr,
+					ol_flags);
+		}
+	} else if ((ol_flags & PKT_TX_TCP_CKSUM) ||
+			(ol_flags & PKT_TX_TCP_SEG)) {
+		if (ol_flags & PKT_TX_IPV4) {
+			ipv4_hdr = rte_pktmbuf_mtod_offset(m, struct ipv4_hdr *,
+					inner_l3_offset);
+
+			if (ol_flags & PKT_TX_IP_CKSUM)
+				ipv4_hdr->hdr_checksum = 0;
+
+			/* non-TSO tcp or TSO */
+			tcp_hdr = (struct tcp_hdr *)((char *)ipv4_hdr +
+					m->l3_len);
+			tcp_hdr->cksum = rte_ipv4_phdr_cksum(ipv4_hdr,
+					ol_flags);
+		} else {
+			ipv6_hdr = rte_pktmbuf_mtod_offset(m, struct ipv6_hdr *,
+					inner_l3_offset);
+			/* non-TSO tcp or TSO */
+			tcp_hdr = rte_pktmbuf_mtod_offset(m, struct tcp_hdr *,
+					inner_l3_offset + m->l3_len);
+			tcp_hdr->cksum = rte_ipv6_phdr_cksum(ipv6_hdr,
+					ol_flags);
+		}
+	}
+
+	return 0;
+}
+
 #ifdef __cplusplus
 }
 #endif
-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 261+ messages in thread

* [dpdk-dev] [PATCH v11 2/6] e1000: add Tx preparation
  2016-10-26 12:56                   ` [dpdk-dev] [PATCH v11 " Tomasz Kulasek
  2016-10-26 12:56                     ` [dpdk-dev] [PATCH v11 1/6] ethdev: " Tomasz Kulasek
@ 2016-10-26 12:56                     ` Tomasz Kulasek
  2016-10-26 12:56                     ` [dpdk-dev] [PATCH v11 3/6] fm10k: " Tomasz Kulasek
                                       ` (4 subsequent siblings)
  6 siblings, 0 replies; 261+ messages in thread
From: Tomasz Kulasek @ 2016-10-26 12:56 UTC (permalink / raw)
  To: dev

Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
---
 drivers/net/e1000/e1000_ethdev.h |   11 ++++++++
 drivers/net/e1000/em_ethdev.c    |    5 +++-
 drivers/net/e1000/em_rxtx.c      |   48 ++++++++++++++++++++++++++++++++++-
 drivers/net/e1000/igb_ethdev.c   |    4 +++
 drivers/net/e1000/igb_rxtx.c     |   52 +++++++++++++++++++++++++++++++++++++-
 5 files changed, 117 insertions(+), 3 deletions(-)

diff --git a/drivers/net/e1000/e1000_ethdev.h b/drivers/net/e1000/e1000_ethdev.h
index 6c25c8d..bd0f277 100644
--- a/drivers/net/e1000/e1000_ethdev.h
+++ b/drivers/net/e1000/e1000_ethdev.h
@@ -138,6 +138,11 @@
 #define E1000_MISC_VEC_ID               RTE_INTR_VEC_ZERO_OFFSET
 #define E1000_RX_VEC_START              RTE_INTR_VEC_RXTX_OFFSET
 
+#define IGB_TX_MAX_SEG     UINT8_MAX
+#define IGB_TX_MAX_MTU_SEG UINT8_MAX
+#define EM_TX_MAX_SEG      UINT8_MAX
+#define EM_TX_MAX_MTU_SEG  UINT8_MAX
+
 /* structure for interrupt relative data */
 struct e1000_interrupt {
 	uint32_t flags;
@@ -315,6 +320,9 @@ void eth_igb_tx_init(struct rte_eth_dev *dev);
 uint16_t eth_igb_xmit_pkts(void *txq, struct rte_mbuf **tx_pkts,
 		uint16_t nb_pkts);
 
+uint16_t eth_igb_prep_pkts(void *txq, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts);
+
 uint16_t eth_igb_recv_pkts(void *rxq, struct rte_mbuf **rx_pkts,
 		uint16_t nb_pkts);
 
@@ -376,6 +384,9 @@ void eth_em_tx_init(struct rte_eth_dev *dev);
 uint16_t eth_em_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		uint16_t nb_pkts);
 
+uint16_t eth_em_prep_pkts(void *txq, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts);
+
 uint16_t eth_em_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
 		uint16_t nb_pkts);
 
diff --git a/drivers/net/e1000/em_ethdev.c b/drivers/net/e1000/em_ethdev.c
index 7cf5f0c..17b45cb 100644
--- a/drivers/net/e1000/em_ethdev.c
+++ b/drivers/net/e1000/em_ethdev.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -300,6 +300,7 @@ eth_em_dev_init(struct rte_eth_dev *eth_dev)
 	eth_dev->dev_ops = &eth_em_ops;
 	eth_dev->rx_pkt_burst = (eth_rx_burst_t)&eth_em_recv_pkts;
 	eth_dev->tx_pkt_burst = (eth_tx_burst_t)&eth_em_xmit_pkts;
+	eth_dev->tx_pkt_prep = (eth_tx_prep_t)&eth_em_prep_pkts;
 
 	/* for secondary processes, we don't initialise any further as primary
 	 * has already done this work. Only check we don't need a different
@@ -1067,6 +1068,8 @@ eth_em_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 		.nb_max = E1000_MAX_RING_DESC,
 		.nb_min = E1000_MIN_RING_DESC,
 		.nb_align = EM_TXD_ALIGN,
+		.nb_seg_max = EM_TX_MAX_SEG,
+		.nb_mtu_seg_max = EM_TX_MAX_MTU_SEG,
 	};
 
 	dev_info->speed_capa = ETH_LINK_SPEED_10M_HD | ETH_LINK_SPEED_10M |
diff --git a/drivers/net/e1000/em_rxtx.c b/drivers/net/e1000/em_rxtx.c
index 41f51c0..5bd3c99 100644
--- a/drivers/net/e1000/em_rxtx.c
+++ b/drivers/net/e1000/em_rxtx.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -66,6 +66,7 @@
 #include <rte_udp.h>
 #include <rte_tcp.h>
 #include <rte_sctp.h>
+#include <rte_net.h>
 #include <rte_string_fns.h>
 
 #include "e1000_logs.h"
@@ -77,6 +78,14 @@
 
 #define E1000_RXDCTL_GRAN	0x01000000 /* RXDCTL Granularity */
 
+#define E1000_TX_OFFLOAD_MASK ( \
+		PKT_TX_IP_CKSUM |       \
+		PKT_TX_L4_MASK |        \
+		PKT_TX_VLAN_PKT)
+
+#define E1000_TX_OFFLOAD_NOTSUP_MASK \
+		(PKT_TX_OFFLOAD_MASK ^ E1000_TX_OFFLOAD_MASK)
+
 /**
  * Structure associated with each descriptor of the RX ring of a RX queue.
  */
@@ -618,6 +627,43 @@ end_of_tx:
 
 /*********************************************************************
  *
+ *  TX prep functions
+ *
+ **********************************************************************/
+uint16_t
+eth_em_prep_pkts(__rte_unused void *tx_queue, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts)
+{
+	int i, ret;
+	struct rte_mbuf *m;
+
+	for (i = 0; i < nb_pkts; i++) {
+		m = tx_pkts[i];
+
+		if (m->ol_flags & E1000_TX_OFFLOAD_NOTSUP_MASK) {
+			rte_errno = -ENOTSUP;
+			return i;
+		}
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+		ret = rte_validate_tx_offload(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+#endif
+		ret = rte_phdr_cksum_fix(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+	}
+
+	return i;
+}
+
+/*********************************************************************
+ *
  *  RX functions
  *
  **********************************************************************/
diff --git a/drivers/net/e1000/igb_ethdev.c b/drivers/net/e1000/igb_ethdev.c
index 4924396..0afdd09 100644
--- a/drivers/net/e1000/igb_ethdev.c
+++ b/drivers/net/e1000/igb_ethdev.c
@@ -369,6 +369,8 @@ static const struct rte_eth_desc_lim tx_desc_lim = {
 	.nb_max = E1000_MAX_RING_DESC,
 	.nb_min = E1000_MIN_RING_DESC,
 	.nb_align = IGB_RXD_ALIGN,
+	.nb_seg_max = IGB_TX_MAX_SEG,
+	.nb_mtu_seg_max = IGB_TX_MAX_MTU_SEG,
 };
 
 static const struct eth_dev_ops eth_igb_ops = {
@@ -760,6 +762,7 @@ eth_igb_dev_init(struct rte_eth_dev *eth_dev)
 	eth_dev->dev_ops = &eth_igb_ops;
 	eth_dev->rx_pkt_burst = &eth_igb_recv_pkts;
 	eth_dev->tx_pkt_burst = &eth_igb_xmit_pkts;
+	eth_dev->tx_pkt_prep = &eth_igb_prep_pkts;
 
 	/* for secondary processes, we don't initialise any further as primary
 	 * has already done this work. Only check we don't need a different
@@ -963,6 +966,7 @@ eth_igbvf_dev_init(struct rte_eth_dev *eth_dev)
 	eth_dev->dev_ops = &igbvf_eth_dev_ops;
 	eth_dev->rx_pkt_burst = &eth_igb_recv_pkts;
 	eth_dev->tx_pkt_burst = &eth_igb_xmit_pkts;
+	eth_dev->tx_pkt_prep = &eth_igb_prep_pkts;
 
 	/* for secondary processes, we don't initialise any further as primary
 	 * has already done this work. Only check we don't need a different
diff --git a/drivers/net/e1000/igb_rxtx.c b/drivers/net/e1000/igb_rxtx.c
index dbd37ac..08e47f2 100644
--- a/drivers/net/e1000/igb_rxtx.c
+++ b/drivers/net/e1000/igb_rxtx.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -65,6 +65,7 @@
 #include <rte_udp.h>
 #include <rte_tcp.h>
 #include <rte_sctp.h>
+#include <rte_net.h>
 #include <rte_string_fns.h>
 
 #include "e1000_logs.h"
@@ -78,6 +79,9 @@
 		PKT_TX_L4_MASK |		 \
 		PKT_TX_TCP_SEG)
 
+#define IGB_TX_OFFLOAD_NOTSUP_MASK \
+		(PKT_TX_OFFLOAD_MASK ^ IGB_TX_OFFLOAD_MASK)
+
 /**
  * Structure associated with each descriptor of the RX ring of a RX queue.
  */
@@ -616,6 +620,51 @@ eth_igb_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 
 /*********************************************************************
  *
+ *  TX prep functions
+ *
+ **********************************************************************/
+uint16_t
+eth_igb_prep_pkts(__rte_unused void *tx_queue, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts)
+{
+	int i, ret;
+	struct rte_mbuf *m;
+
+	for (i = 0; i < nb_pkts; i++) {
+		m = tx_pkts[i];
+
+		/* Check some limitations for TSO in hardware */
+		if (m->ol_flags & PKT_TX_TCP_SEG)
+			if ((m->tso_segsz > IGB_TSO_MAX_MSS) || (m->l2_len + m->l3_len +
+					m->l4_len > IGB_TSO_MAX_HDRLEN)) {
+				rte_errno = -EINVAL;
+				return i;
+			}
+
+		if (m->ol_flags & IGB_TX_OFFLOAD_NOTSUP_MASK) {
+			rte_errno = -ENOTSUP;
+			return i;
+		}
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+		ret = rte_validate_tx_offload(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+#endif
+		ret = rte_phdr_cksum_fix(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+	}
+
+	return i;
+}
+
+/*********************************************************************
+ *
  *  RX functions
  *
  **********************************************************************/
@@ -1364,6 +1413,7 @@ eth_igb_tx_queue_setup(struct rte_eth_dev *dev,
 
 	igb_reset_tx_queue(txq, dev);
 	dev->tx_pkt_burst = eth_igb_xmit_pkts;
+	dev->tx_pkt_prep = &eth_igb_prep_pkts;
 	dev->data->tx_queues[queue_idx] = txq;
 
 	return 0;
-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 261+ messages in thread

* [dpdk-dev] [PATCH v11 3/6] fm10k: add Tx preparation
  2016-10-26 12:56                   ` [dpdk-dev] [PATCH v11 " Tomasz Kulasek
  2016-10-26 12:56                     ` [dpdk-dev] [PATCH v11 1/6] ethdev: " Tomasz Kulasek
  2016-10-26 12:56                     ` [dpdk-dev] [PATCH v11 2/6] e1000: " Tomasz Kulasek
@ 2016-10-26 12:56                     ` Tomasz Kulasek
  2016-10-26 12:56                     ` [dpdk-dev] [PATCH v11 4/6] i40e: " Tomasz Kulasek
                                       ` (3 subsequent siblings)
  6 siblings, 0 replies; 261+ messages in thread
From: Tomasz Kulasek @ 2016-10-26 12:56 UTC (permalink / raw)
  To: dev

Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
---
 drivers/net/fm10k/fm10k.h        |    6 +++++
 drivers/net/fm10k/fm10k_ethdev.c |    5 ++++
 drivers/net/fm10k/fm10k_rxtx.c   |   50 +++++++++++++++++++++++++++++++++++++-
 3 files changed, 60 insertions(+), 1 deletion(-)

diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
index 05aa1a2..c6fed21 100644
--- a/drivers/net/fm10k/fm10k.h
+++ b/drivers/net/fm10k/fm10k.h
@@ -69,6 +69,9 @@
 #define FM10K_MAX_RX_DESC  (FM10K_MAX_RX_RING_SZ / sizeof(union fm10k_rx_desc))
 #define FM10K_MAX_TX_DESC  (FM10K_MAX_TX_RING_SZ / sizeof(struct fm10k_tx_desc))
 
+#define FM10K_TX_MAX_SEG     UINT8_MAX
+#define FM10K_TX_MAX_MTU_SEG UINT8_MAX
+
 /*
  * byte aligment for HW RX data buffer
  * Datasheet requires RX buffer addresses shall either be 512-byte aligned or
@@ -356,6 +359,9 @@ fm10k_dev_rx_descriptor_done(void *rx_queue, uint16_t offset);
 uint16_t fm10k_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 	uint16_t nb_pkts);
 
+uint16_t fm10k_prep_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
+	uint16_t nb_pkts);
+
 int fm10k_rxq_vec_setup(struct fm10k_rx_queue *rxq);
 int fm10k_rx_vec_condition_check(struct rte_eth_dev *);
 void fm10k_rx_queue_release_mbufs_vec(struct fm10k_rx_queue *rxq);
diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index c804436..dffb6d1 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -1446,6 +1446,8 @@ fm10k_dev_infos_get(struct rte_eth_dev *dev,
 		.nb_max = FM10K_MAX_TX_DESC,
 		.nb_min = FM10K_MIN_TX_DESC,
 		.nb_align = FM10K_MULT_TX_DESC,
+		.nb_seg_max = FM10K_TX_MAX_SEG,
+		.nb_mtu_seg_max = FM10K_TX_MAX_MTU_SEG,
 	};
 
 	dev_info->speed_capa = ETH_LINK_SPEED_1G | ETH_LINK_SPEED_2_5G |
@@ -2754,8 +2756,10 @@ fm10k_set_tx_function(struct rte_eth_dev *dev)
 			fm10k_txq_vec_setup(txq);
 		}
 		dev->tx_pkt_burst = fm10k_xmit_pkts_vec;
+		dev->tx_pkt_prep = NULL;
 	} else {
 		dev->tx_pkt_burst = fm10k_xmit_pkts;
+		dev->tx_pkt_prep = fm10k_prep_pkts;
 		PMD_INIT_LOG(DEBUG, "Use regular Tx func");
 	}
 }
@@ -2834,6 +2838,7 @@ eth_fm10k_dev_init(struct rte_eth_dev *dev)
 	dev->dev_ops = &fm10k_eth_dev_ops;
 	dev->rx_pkt_burst = &fm10k_recv_pkts;
 	dev->tx_pkt_burst = &fm10k_xmit_pkts;
+	dev->tx_pkt_prep = &fm10k_prep_pkts;
 
 	/* only initialize in the primary process */
 	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
diff --git a/drivers/net/fm10k/fm10k_rxtx.c b/drivers/net/fm10k/fm10k_rxtx.c
index 32cc7ff..5fc4d5a 100644
--- a/drivers/net/fm10k/fm10k_rxtx.c
+++ b/drivers/net/fm10k/fm10k_rxtx.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2013-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2013-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -35,6 +35,7 @@
 
 #include <rte_ethdev.h>
 #include <rte_common.h>
+#include <rte_net.h>
 #include "fm10k.h"
 #include "base/fm10k_type.h"
 
@@ -65,6 +66,15 @@ static inline void dump_rxd(union fm10k_rx_desc *rxd)
 }
 #endif
 
+#define FM10K_TX_OFFLOAD_MASK (  \
+		PKT_TX_VLAN_PKT |        \
+		PKT_TX_IP_CKSUM |        \
+		PKT_TX_L4_MASK |         \
+		PKT_TX_TCP_SEG)
+
+#define FM10K_TX_OFFLOAD_NOTSUP_MASK \
+		(PKT_TX_OFFLOAD_MASK ^ FM10K_TX_OFFLOAD_MASK)
+
 /* @note: When this function is changed, make corresponding change to
  * fm10k_dev_supported_ptypes_get()
  */
@@ -597,3 +607,41 @@ fm10k_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 
 	return count;
 }
+
+uint16_t
+fm10k_prep_pkts(__rte_unused void *tx_queue, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts)
+{
+	int i, ret;
+	struct rte_mbuf *m;
+
+	for (i = 0; i < nb_pkts; i++) {
+		m = tx_pkts[i];
+
+		if ((m->ol_flags & PKT_TX_TCP_SEG) &&
+				(m->tso_segsz < FM10K_TSO_MINMSS)) {
+			rte_errno = -EINVAL;
+			return i;
+		}
+
+		if (m->ol_flags & FM10K_TX_OFFLOAD_NOTSUP_MASK) {
+			rte_errno = -ENOTSUP;
+			return i;
+		}
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+		ret = rte_validate_tx_offload(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+#endif
+		ret = rte_phdr_cksum_fix(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+	}
+
+	return i;
+}
-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 261+ messages in thread

* [dpdk-dev] [PATCH v11 4/6] i40e: add Tx preparation
  2016-10-26 12:56                   ` [dpdk-dev] [PATCH v11 " Tomasz Kulasek
                                       ` (2 preceding siblings ...)
  2016-10-26 12:56                     ` [dpdk-dev] [PATCH v11 3/6] fm10k: " Tomasz Kulasek
@ 2016-10-26 12:56                     ` Tomasz Kulasek
  2016-10-26 12:56                     ` [dpdk-dev] [PATCH v11 5/6] ixgbe: " Tomasz Kulasek
                                       ` (2 subsequent siblings)
  6 siblings, 0 replies; 261+ messages in thread
From: Tomasz Kulasek @ 2016-10-26 12:56 UTC (permalink / raw)
  To: dev

Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
---
 drivers/net/i40e/i40e_ethdev.c |    3 ++
 drivers/net/i40e/i40e_rxtx.c   |   72 +++++++++++++++++++++++++++++++++++++++-
 drivers/net/i40e/i40e_rxtx.h   |    8 +++++
 3 files changed, 82 insertions(+), 1 deletion(-)

diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
index 5af0e43..dab0d48 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -936,6 +936,7 @@ eth_i40e_dev_init(struct rte_eth_dev *dev)
 	dev->dev_ops = &i40e_eth_dev_ops;
 	dev->rx_pkt_burst = i40e_recv_pkts;
 	dev->tx_pkt_burst = i40e_xmit_pkts;
+	dev->tx_pkt_prep = i40e_prep_pkts;
 
 	/* for secondary processes, we don't initialise any further as primary
 	 * has already done this work. Only check we don't need a different
@@ -2629,6 +2630,8 @@ i40e_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 		.nb_max = I40E_MAX_RING_DESC,
 		.nb_min = I40E_MIN_RING_DESC,
 		.nb_align = I40E_ALIGN_RING_DESC,
+		.nb_seg_max = I40E_TX_MAX_SEG,
+		.nb_mtu_seg_max = I40E_TX_MAX_MTU_SEG,
 	};
 
 	if (pf->flags & I40E_FLAG_VMDQ) {
diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
index 7ae7d9f..7f6d3d8 100644
--- a/drivers/net/i40e/i40e_rxtx.c
+++ b/drivers/net/i40e/i40e_rxtx.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -50,6 +50,8 @@
 #include <rte_tcp.h>
 #include <rte_sctp.h>
 #include <rte_udp.h>
+#include <rte_ip.h>
+#include <rte_net.h>
 
 #include "i40e_logs.h"
 #include "base/i40e_prototype.h"
@@ -79,6 +81,17 @@
 		PKT_TX_TCP_SEG |		 \
 		PKT_TX_OUTER_IP_CKSUM)
 
+#define I40E_TX_OFFLOAD_MASK (  \
+		PKT_TX_IP_CKSUM |       \
+		PKT_TX_L4_MASK |        \
+		PKT_TX_OUTER_IP_CKSUM | \
+		PKT_TX_TCP_SEG |        \
+		PKT_TX_QINQ_PKT |       \
+		PKT_TX_VLAN_PKT)
+
+#define I40E_TX_OFFLOAD_NOTSUP_MASK \
+		(PKT_TX_OFFLOAD_MASK ^ I40E_TX_OFFLOAD_MASK)
+
 static uint16_t i40e_xmit_pkts_simple(void *tx_queue,
 				      struct rte_mbuf **tx_pkts,
 				      uint16_t nb_pkts);
@@ -1411,6 +1424,61 @@ i40e_xmit_pkts_simple(void *tx_queue,
 	return nb_tx;
 }
 
+/*********************************************************************
+ *
+ *  TX prep functions
+ *
+ **********************************************************************/
+uint16_t
+i40e_prep_pkts(__rte_unused void *tx_queue, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts)
+{
+	int i, ret;
+	uint64_t ol_flags;
+	struct rte_mbuf *m;
+
+	for (i = 0; i < nb_pkts; i++) {
+		m = tx_pkts[i];
+		ol_flags = m->ol_flags;
+
+		/**
+		 * m->nb_segs is uint8_t, so m->nb_segs never exceeds
+		 * I40E_TX_MAX_SEG.
+		 * We only check the condition m->nb_segs > I40E_TX_MAX_MTU_SEG.
+		 */
+		if (!(ol_flags & PKT_TX_TCP_SEG)) {
+			if (m->nb_segs > I40E_TX_MAX_MTU_SEG) {
+				rte_errno = -EINVAL;
+				return i;
+			}
+		} else if ((m->tso_segsz < I40E_MIN_TSO_MSS) ||
+				(m->tso_segsz > I40E_MAX_TSO_MSS)) {
+			/* MSS outside the range (256B - 9674B) is considered malicious */
+			rte_errno = -EINVAL;
+			return i;
+		}
+
+		if (ol_flags & I40E_TX_OFFLOAD_NOTSUP_MASK) {
+			rte_errno = -ENOTSUP;
+			return i;
+		}
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+		ret = rte_validate_tx_offload(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+#endif
+		ret = rte_phdr_cksum_fix(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+	}
+	return i;
+}
+
 /*
  * Find the VSI the queue belongs to. 'queue_idx' is the queue index
  * application used, which assume having sequential ones. But from driver's
@@ -2763,9 +2831,11 @@ i40e_set_tx_function(struct rte_eth_dev *dev)
 			PMD_INIT_LOG(DEBUG, "Simple tx finally be used.");
 			dev->tx_pkt_burst = i40e_xmit_pkts_simple;
 		}
+		dev->tx_pkt_prep = NULL;
 	} else {
 		PMD_INIT_LOG(DEBUG, "Xmit tx finally be used.");
 		dev->tx_pkt_burst = i40e_xmit_pkts;
+		dev->tx_pkt_prep = i40e_prep_pkts;
 	}
 }
 
diff --git a/drivers/net/i40e/i40e_rxtx.h b/drivers/net/i40e/i40e_rxtx.h
index ecdb13c..9df8a56 100644
--- a/drivers/net/i40e/i40e_rxtx.h
+++ b/drivers/net/i40e/i40e_rxtx.h
@@ -63,6 +63,12 @@
 #define	I40E_MIN_RING_DESC	64
 #define	I40E_MAX_RING_DESC	4096
 
+#define I40E_MIN_TSO_MSS          256
+#define I40E_MAX_TSO_MSS          9674
+
+#define I40E_TX_MAX_SEG     UINT8_MAX
+#define I40E_TX_MAX_MTU_SEG 8
+
 #undef container_of
 #define container_of(ptr, type, member) ({ \
 		typeof(((type *)0)->member)(*__mptr) = (ptr); \
@@ -223,6 +229,8 @@ uint16_t i40e_recv_scattered_pkts(void *rx_queue,
 uint16_t i40e_xmit_pkts(void *tx_queue,
 			struct rte_mbuf **tx_pkts,
 			uint16_t nb_pkts);
+uint16_t i40e_prep_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts);
 int i40e_tx_queue_init(struct i40e_tx_queue *txq);
 int i40e_rx_queue_init(struct i40e_rx_queue *rxq);
 void i40e_free_tx_resources(struct i40e_tx_queue *txq);
-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 261+ messages in thread
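
A usage sketch (illustrative only, not part of the patch): given the limits
above, an application-side transmit loop built on this series could look as
follows; port_id, queue_id, bufs and nb_pkts are hypothetical placeholders,
while the return value and rte_errno semantics are those of rte_eth_tx_prep().

	uint16_t nb_prep, nb_tx;

	nb_prep = rte_eth_tx_prep(port_id, queue_id, bufs, nb_pkts);
	if (nb_prep < nb_pkts) {
		/*
		 * bufs[nb_prep] violated a device limit, e.g. more than
		 * I40E_TX_MAX_MTU_SEG (8) segments in a non-TSO packet, or
		 * a TSO MSS outside [I40E_MIN_TSO_MSS, I40E_MAX_TSO_MSS].
		 */
		printf("tx_prep failed at %u: %s\n",
				(unsigned int)nb_prep, rte_strerror(rte_errno));
	}
	nb_tx = rte_eth_tx_burst(port_id, queue_id, bufs, nb_prep);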

* [dpdk-dev] [PATCH v11 5/6] ixgbe: add Tx preparation
  2016-10-26 12:56                   ` [dpdk-dev] [PATCH v11 " Tomasz Kulasek
                                       ` (3 preceding siblings ...)
  2016-10-26 12:56                     ` [dpdk-dev] [PATCH v11 4/6] i40e: " Tomasz Kulasek
@ 2016-10-26 12:56                     ` Tomasz Kulasek
  2016-10-26 12:56                     ` [dpdk-dev] [PATCH v11 6/6] testpmd: use Tx preparation in csum engine Tomasz Kulasek
  2016-11-23 17:36                     ` [dpdk-dev] [PATCH v12 0/6] add Tx preparation Tomasz Kulasek
  6 siblings, 0 replies; 261+ messages in thread
From: Tomasz Kulasek @ 2016-10-26 12:56 UTC (permalink / raw)
  To: dev

Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
---
 drivers/net/ixgbe/ixgbe_ethdev.c |    3 ++
 drivers/net/ixgbe/ixgbe_ethdev.h |    5 +++-
 drivers/net/ixgbe/ixgbe_rxtx.c   |   58 +++++++++++++++++++++++++++++++++++++-
 drivers/net/ixgbe/ixgbe_rxtx.h   |    2 ++
 4 files changed, 66 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
index 4ca5747..4c6a8e1 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -517,6 +517,8 @@ static const struct rte_eth_desc_lim tx_desc_lim = {
 	.nb_max = IXGBE_MAX_RING_DESC,
 	.nb_min = IXGBE_MIN_RING_DESC,
 	.nb_align = IXGBE_TXD_ALIGN,
+	.nb_seg_max = IXGBE_TX_MAX_SEG,
+	.nb_mtu_seg_max = IXGBE_TX_MAX_SEG,
 };
 
 static const struct eth_dev_ops ixgbe_eth_dev_ops = {
@@ -1103,6 +1105,7 @@ eth_ixgbe_dev_init(struct rte_eth_dev *eth_dev)
 	eth_dev->dev_ops = &ixgbe_eth_dev_ops;
 	eth_dev->rx_pkt_burst = &ixgbe_recv_pkts;
 	eth_dev->tx_pkt_burst = &ixgbe_xmit_pkts;
+	eth_dev->tx_pkt_prep = &ixgbe_prep_pkts;
 
 	/*
 	 * For secondary processes, we don't initialise any further as primary
diff --git a/drivers/net/ixgbe/ixgbe_ethdev.h b/drivers/net/ixgbe/ixgbe_ethdev.h
index 4ff6338..e229cf5 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.h
+++ b/drivers/net/ixgbe/ixgbe_ethdev.h
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -396,6 +396,9 @@ uint16_t ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 uint16_t ixgbe_xmit_pkts_simple(void *tx_queue, struct rte_mbuf **tx_pkts,
 		uint16_t nb_pkts);
 
+uint16_t ixgbe_prep_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts);
+
 int ixgbe_dev_rss_hash_update(struct rte_eth_dev *dev,
 			      struct rte_eth_rss_conf *rss_conf);
 
diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
index 2ce8234..031414c 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   Copyright 2014 6WIND S.A.
  *   All rights reserved.
  *
@@ -70,6 +70,7 @@
 #include <rte_string_fns.h>
 #include <rte_errno.h>
 #include <rte_ip.h>
+#include <rte_net.h>
 
 #include "ixgbe_logs.h"
 #include "base/ixgbe_api.h"
@@ -87,6 +88,9 @@
 		PKT_TX_TCP_SEG |		 \
 		PKT_TX_OUTER_IP_CKSUM)
 
+#define IXGBE_TX_OFFLOAD_NOTSUP_MASK \
+		(PKT_TX_OFFLOAD_MASK ^ IXGBE_TX_OFFLOAD_MASK)
+
 #if 1
 #define RTE_PMD_USE_PREFETCH
 #endif
@@ -905,6 +909,56 @@ end_of_tx:
 
 /*********************************************************************
  *
+ *  TX prep functions
+ *
+ **********************************************************************/
+uint16_t
+ixgbe_prep_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
+{
+	int i, ret;
+	uint64_t ol_flags;
+	struct rte_mbuf *m;
+	struct ixgbe_tx_queue *txq = (struct ixgbe_tx_queue *)tx_queue;
+
+	for (i = 0; i < nb_pkts; i++) {
+		m = tx_pkts[i];
+		ol_flags = m->ol_flags;
+
+		/**
+		 * Check that the packet meets the device limit on segment count
+		 *
+		 * NOTE: for ixgbe the limit is always (40 - WTHRESH), for both TSO and non-TSO
+		 */
+
+		if (m->nb_segs > IXGBE_TX_MAX_SEG - txq->wthresh) {
+			rte_errno = -EINVAL;
+			return i;
+		}
+
+		if (ol_flags & IXGBE_TX_OFFLOAD_NOTSUP_MASK) {
+			rte_errno = -ENOTSUP;
+			return i;
+		}
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+		ret = rte_validate_tx_offload(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+#endif
+		ret = rte_phdr_cksum_fix(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+	}
+
+	return i;
+}
+
+/*********************************************************************
+ *
  *  RX functions
  *
  **********************************************************************/
@@ -2282,6 +2336,7 @@ ixgbe_set_tx_function(struct rte_eth_dev *dev, struct ixgbe_tx_queue *txq)
 	if (((txq->txq_flags & IXGBE_SIMPLE_FLAGS) == IXGBE_SIMPLE_FLAGS)
 			&& (txq->tx_rs_thresh >= RTE_PMD_IXGBE_TX_MAX_BURST)) {
 		PMD_INIT_LOG(DEBUG, "Using simple tx code path");
+		dev->tx_pkt_prep = NULL;
 #ifdef RTE_IXGBE_INC_VECTOR
 		if (txq->tx_rs_thresh <= RTE_IXGBE_TX_MAX_FREE_BUF_SZ &&
 				(rte_eal_process_type() != RTE_PROC_PRIMARY ||
@@ -2302,6 +2357,7 @@ ixgbe_set_tx_function(struct rte_eth_dev *dev, struct ixgbe_tx_queue *txq)
 				(unsigned long)txq->tx_rs_thresh,
 				(unsigned long)RTE_PMD_IXGBE_TX_MAX_BURST);
 		dev->tx_pkt_burst = ixgbe_xmit_pkts;
+		dev->tx_pkt_prep = ixgbe_prep_pkts;
 	}
 }
 
diff --git a/drivers/net/ixgbe/ixgbe_rxtx.h b/drivers/net/ixgbe/ixgbe_rxtx.h
index 2608b36..7bbd9b8 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx.h
+++ b/drivers/net/ixgbe/ixgbe_rxtx.h
@@ -80,6 +80,8 @@
 #define RTE_IXGBE_WAIT_100_US               100
 #define RTE_IXGBE_VMTXSW_REGISTER_COUNT     2
 
+#define IXGBE_TX_MAX_SEG                    40
+
 #define IXGBE_PACKET_TYPE_MASK_82599        0X7F
 #define IXGBE_PACKET_TYPE_MASK_X550         0X10FF
 #define IXGBE_PACKET_TYPE_MASK_TUNNEL       0XFF
-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 261+ messages in thread
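
A related note for applications: rather than hard-coding IXGBE_TX_MAX_SEG,
the per-device segment limits this series adds to rte_eth_desc_lim can be
read back at runtime. A hypothetical sketch (m stands for any outgoing mbuf):

	struct rte_eth_dev_info dev_info;

	rte_eth_dev_info_get(port_id, &dev_info);

	/* max segments per TSO packet and per non-TSO (MTU-sized) packet */
	uint16_t seg_max = dev_info.tx_desc_lim.nb_seg_max;
	uint16_t mtu_seg_max = dev_info.tx_desc_lim.nb_mtu_seg_max;

	if (m->nb_segs > mtu_seg_max)
		rte_pktmbuf_free(m);	/* tx_prep would reject this packet */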

* [dpdk-dev] [PATCH v11 6/6] testpmd: use Tx preparation in csum engine
  2016-10-26 12:56                   ` [dpdk-dev] [PATCH v11 " Tomasz Kulasek
                                       ` (4 preceding siblings ...)
  2016-10-26 12:56                     ` [dpdk-dev] [PATCH v11 5/6] ixgbe: " Tomasz Kulasek
@ 2016-10-26 12:56                     ` Tomasz Kulasek
  2016-11-23 17:36                     ` [dpdk-dev] [PATCH v12 0/6] add Tx preparation Tomasz Kulasek
  6 siblings, 0 replies; 261+ messages in thread
From: Tomasz Kulasek @ 2016-10-26 12:56 UTC (permalink / raw)
  To: dev

Removed pseudo header calculation for udp/tcp/tso packets from
application and used Tx preparation API for packet preparation and
verification.

Adding the additional step to the csum engine costs about a 3-4%
performance drop on my setup with the ixgbe driver. It's caused mostly
by the need to re-access and modify the packet data.

Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
---
 app/test-pmd/csumonly.c |   36 +++++++++++++-----------------------
 1 file changed, 13 insertions(+), 23 deletions(-)

diff --git a/app/test-pmd/csumonly.c b/app/test-pmd/csumonly.c
index 57e6ae2..6f33ae9 100644
--- a/app/test-pmd/csumonly.c
+++ b/app/test-pmd/csumonly.c
@@ -112,15 +112,6 @@ struct simple_gre_hdr {
 } __attribute__((__packed__));
 
 static uint16_t
-get_psd_sum(void *l3_hdr, uint16_t ethertype, uint64_t ol_flags)
-{
-	if (ethertype == _htons(ETHER_TYPE_IPv4))
-		return rte_ipv4_phdr_cksum(l3_hdr, ol_flags);
-	else /* assume ethertype == ETHER_TYPE_IPv6 */
-		return rte_ipv6_phdr_cksum(l3_hdr, ol_flags);
-}
-
-static uint16_t
 get_udptcp_checksum(void *l3_hdr, void *l4_hdr, uint16_t ethertype)
 {
 	if (ethertype == _htons(ETHER_TYPE_IPv4))
@@ -370,32 +361,24 @@ process_inner_cksums(void *l3_hdr, const struct testpmd_offload_info *info,
 		/* do not recalculate udp cksum if it was 0 */
 		if (udp_hdr->dgram_cksum != 0) {
 			udp_hdr->dgram_cksum = 0;
-			if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_UDP_CKSUM) {
+			if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_UDP_CKSUM)
 				ol_flags |= PKT_TX_UDP_CKSUM;
-				udp_hdr->dgram_cksum = get_psd_sum(l3_hdr,
-					info->ethertype, ol_flags);
-			} else {
+			else
 				udp_hdr->dgram_cksum =
 					get_udptcp_checksum(l3_hdr, udp_hdr,
 						info->ethertype);
-			}
 		}
 	} else if (info->l4_proto == IPPROTO_TCP) {
 		tcp_hdr = (struct tcp_hdr *)((char *)l3_hdr + info->l3_len);
 		tcp_hdr->cksum = 0;
-		if (tso_segsz) {
+		if (tso_segsz)
 			ol_flags |= PKT_TX_TCP_SEG;
-			tcp_hdr->cksum = get_psd_sum(l3_hdr, info->ethertype,
-				ol_flags);
-		} else if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_TCP_CKSUM) {
+		else if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_TCP_CKSUM)
 			ol_flags |= PKT_TX_TCP_CKSUM;
-			tcp_hdr->cksum = get_psd_sum(l3_hdr, info->ethertype,
-				ol_flags);
-		} else {
+		else
 			tcp_hdr->cksum =
 				get_udptcp_checksum(l3_hdr, tcp_hdr,
 					info->ethertype);
-		}
 	} else if (info->l4_proto == IPPROTO_SCTP) {
 		sctp_hdr = (struct sctp_hdr *)((char *)l3_hdr + info->l3_len);
 		sctp_hdr->cksum = 0;
@@ -648,6 +631,7 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 	void *l3_hdr = NULL, *outer_l3_hdr = NULL; /* can be IPv4 or IPv6 */
 	uint16_t nb_rx;
 	uint16_t nb_tx;
+	uint16_t nb_prep;
 	uint16_t i;
 	uint64_t rx_ol_flags, tx_ol_flags;
 	uint16_t testpmd_ol_flags;
@@ -857,7 +841,13 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 			printf("\n");
 		}
 	}
-	nb_tx = rte_eth_tx_burst(fs->tx_port, fs->tx_queue, pkts_burst, nb_rx);
+	nb_prep = rte_eth_tx_prep(fs->tx_port, fs->tx_queue, pkts_burst,
+				nb_rx);
+	if (nb_prep != nb_rx)
+		printf("Preparing packet burst to transmit failed: %s\n",
+				rte_strerror(rte_errno));
+
+	nb_tx = rte_eth_tx_burst(fs->tx_port, fs->tx_queue, pkts_burst, nb_prep);
 	/*
 	 * Retry if necessary
 	 */
-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 261+ messages in thread
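
Since rte_eth_tx_prep() works in place and returns the index of the first
failing packet, an application that needs more than the diagnostic above can
drop the offender and re-run preparation on the rest of the burst. A
hypothetical sketch (port, queue, pkts and nb_rx are placeholder names):

	uint16_t nb_ok = 0, idx = 0, nb_tx;

	while (idx < nb_rx) {
		uint16_t n = rte_eth_tx_prep(port, queue, &pkts[idx],
				nb_rx - idx);

		/* compact the n packets that passed preparation */
		memmove(&pkts[nb_ok], &pkts[idx], n * sizeof(pkts[0]));
		nb_ok += n;
		idx += n;
		if (idx < nb_rx) {
			/* pkts[idx] failed preparation: drop it, go on */
			rte_pktmbuf_free(pkts[idx]);
			idx++;
		}
	}
	nb_tx = rte_eth_tx_burst(port, queue, pkts, nb_ok);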

* Re: [dpdk-dev] [PATCH v11 1/6] ethdev: add Tx preparation
  2016-10-26 12:56                     ` [dpdk-dev] [PATCH v11 1/6] ethdev: " Tomasz Kulasek
@ 2016-10-27 12:38                       ` Olivier Matz
  2016-10-27 15:01                       ` Thomas Monjalon
  1 sibling, 0 replies; 261+ messages in thread
From: Olivier Matz @ 2016-10-27 12:38 UTC (permalink / raw)
  To: Tomasz Kulasek, dev



On 10/26/2016 02:56 PM, Tomasz Kulasek wrote:
> Added API for `rte_eth_tx_prep`
> 
> [...]
> 
> Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>

Acked-by: Olivier Matz <olivier.matz@6wind.com>

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v11 1/6] ethdev: add Tx preparation
  2016-10-26 12:56                     ` [dpdk-dev] [PATCH v11 1/6] ethdev: " Tomasz Kulasek
  2016-10-27 12:38                       ` Olivier Matz
@ 2016-10-27 15:01                       ` Thomas Monjalon
  2016-10-27 15:52                         ` Ananyev, Konstantin
  2016-10-27 16:29                         ` Kulasek, TomaszX
  1 sibling, 2 replies; 261+ messages in thread
From: Thomas Monjalon @ 2016-10-27 15:01 UTC (permalink / raw)
  To: Tomasz Kulasek; +Cc: dev

Hi Tomasz,

This is a major new function in the API and I still have some comments.

2016-10-26 14:56, Tomasz Kulasek:
> --- a/config/common_base
> +++ b/config/common_base
> +CONFIG_RTE_ETHDEV_TX_PREP=y

We cannot enable it until it is implemented in every driver.

>  struct rte_eth_dev {
>  	eth_rx_burst_t rx_pkt_burst; /**< Pointer to PMD receive function. */
>  	eth_tx_burst_t tx_pkt_burst; /**< Pointer to PMD transmit function. */
> +	eth_tx_prep_t tx_pkt_prep; /**< Pointer to PMD transmit prepare function. */
>  	struct rte_eth_dev_data *data;  /**< Pointer to device data */
>  	const struct eth_driver *driver;/**< Driver for this device */
>  	const struct eth_dev_ops *dev_ops; /**< Functions exported by PMD */

Could you confirm why tx_pkt_prep is not in dev_ops?
I guess we want to have several implementations?

Shouldn't we have a const struct control_dev_ops and a struct datapath_dev_ops?

> +rte_eth_tx_prep(uint8_t port_id, uint16_t queue_id, struct rte_mbuf **tx_pkts,
> +               uint16_t nb_pkts)

The word "prep" can be understood as "prepend".
Why not rte_eth_tx_prepare?

> +/**
> + * Fix pseudo header checksum
> + *
> + * This function fixes pseudo header checksum for TSO and non-TSO tcp/udp in
> + * provided mbufs packet data.
> + *
> + * - for non-TSO tcp/udp packets full pseudo-header checksum is computed and set
> + *   in packet data,
> + * - for TSO the IP payload length is not included in pseudo header.
> + *
> + * This function expects that used headers are in the first data segment of
> + * mbuf, are not fragmented and can be safely modified.

What happens otherwise?

> + *
> + * @param m
> + *   The packet mbuf to be fixed.
> + * @return
> + *   0 if checksum is initialized properly
> + */
> +static inline int
> +rte_phdr_cksum_fix(struct rte_mbuf *m)

Could we find a better name for this function?
- About the prefix, rte_ip_ ?
- About the scope, where this phdr_cksum is specified?
Isn't it an intel_phdr_cksum to match what hardware expects?
- About the verb, is it really fixing something broken?
Or just writing into a mbuf?
I would suggest rte_ip_intel_cksum_prepare.

^ permalink raw reply	[flat|nested] 261+ messages in thread
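
For reference, what the helper's doc comment describes boils down to
something like this for an IPv4/TCP packet (a sketch built on the existing
rte_ipv4_phdr_cksum(); as the comment states, the headers are assumed to sit
unfragmented in the first data segment of mbuf m):

	struct ipv4_hdr *ip;
	struct tcp_hdr *tcp;

	/* headers are expected contiguous in the first data segment */
	ip = rte_pktmbuf_mtod_offset(m, struct ipv4_hdr *, m->l2_len);
	tcp = (struct tcp_hdr *)((char *)ip + m->l3_len);

	/*
	 * rte_ipv4_phdr_cksum() leaves the IP payload length out of the
	 * pseudo-header when PKT_TX_TCP_SEG is set, which is what TSO
	 * hardware expects.
	 */
	tcp->cksum = rte_ipv4_phdr_cksum(ip, m->ol_flags);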

* Re: [dpdk-dev] [PATCH v11 1/6] ethdev: add Tx preparation
  2016-10-27 15:01                       ` Thomas Monjalon
@ 2016-10-27 15:52                         ` Ananyev, Konstantin
  2016-10-27 16:02                           ` Thomas Monjalon
  2016-10-27 16:29                         ` Kulasek, TomaszX
  1 sibling, 1 reply; 261+ messages in thread
From: Ananyev, Konstantin @ 2016-10-27 15:52 UTC (permalink / raw)
  To: Thomas Monjalon, Kulasek, TomaszX; +Cc: dev



> 
> Hi Tomasz,
> 
> This is a major new function in the API and I still have some comments.
> 
> 2016-10-26 14:56, Tomasz Kulasek:
> > --- a/config/common_base
> > +++ b/config/common_base
> > +CONFIG_RTE_ETHDEV_TX_PREP=y
> 
> We cannot enable it until it is implemented in every drivers.

Not sure why?
If tx_pkt_prep == NULL, then rte_eth_tx_prep() would just act as noop.
Right now it is not mandatory for the PMD to implement it.

> 
> >  struct rte_eth_dev {
> >  	eth_rx_burst_t rx_pkt_burst; /**< Pointer to PMD receive function. */
> >  	eth_tx_burst_t tx_pkt_burst; /**< Pointer to PMD transmit function. */
> > +	eth_tx_prep_t tx_pkt_prep; /**< Pointer to PMD transmit prepare function. */
> >  	struct rte_eth_dev_data *data;  /**< Pointer to device data */
> >  	const struct eth_driver *driver;/**< Driver for this device */
> >  	const struct eth_dev_ops *dev_ops; /**< Functions exported by PMD */
> 
> Could you confirm why tx_pkt_prep is not in dev_ops?
> I guess we want to have several implementations?

Yes, it depends on configuration options, same as tx_pkt_burst.

> 
> Shouldn't we have a const struct control_dev_ops and a struct datapath_dev_ops?

That's probably a good idea, but I suppose it is out of scope for that patch.
Konstantin

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v11 1/6] ethdev: add Tx preparation
  2016-10-27 15:52                         ` Ananyev, Konstantin
@ 2016-10-27 16:02                           ` Thomas Monjalon
  2016-10-27 16:24                             ` Ananyev, Konstantin
  0 siblings, 1 reply; 261+ messages in thread
From: Thomas Monjalon @ 2016-10-27 16:02 UTC (permalink / raw)
  To: Ananyev, Konstantin; +Cc: dev

2016-10-27 15:52, Ananyev, Konstantin:
> 
> > 
> > Hi Tomasz,
> > 
> > This is a major new function in the API and I still have some comments.
> > 
> > 2016-10-26 14:56, Tomasz Kulasek:
> > > --- a/config/common_base
> > > +++ b/config/common_base
> > > +CONFIG_RTE_ETHDEV_TX_PREP=y
> > 
> > We cannot enable it until it is implemented in every drivers.
> 
> Not sure why?
> If tx_pkt_prep == NULL, then rte_eth_tx_prep() would just act as noop.
> Right now it is not mandatory for the PMD to implement it.

If it is not implemented, the application must do the preparation by itself.
From patch 6:
"
Removed pseudo header calculation for udp/tcp/tso packets from
application and used Tx preparation API for packet preparation and
verification.
"
So how does it behave with other drivers?

> > >  struct rte_eth_dev {
> > >  	eth_rx_burst_t rx_pkt_burst; /**< Pointer to PMD receive function. */
> > >  	eth_tx_burst_t tx_pkt_burst; /**< Pointer to PMD transmit function. */
> > > +	eth_tx_prep_t tx_pkt_prep; /**< Pointer to PMD transmit prepare function. */
> > >  	struct rte_eth_dev_data *data;  /**< Pointer to device data */
> > >  	const struct eth_driver *driver;/**< Driver for this device */
> > >  	const struct eth_dev_ops *dev_ops; /**< Functions exported by PMD */
> > 
> > Could you confirm why tx_pkt_prep is not in dev_ops?
> > I guess we want to have several implementations?
> 
> Yes, it depends on configuration options, same as tx_pkt_burst.
> 
> > 
> > Shouldn't we have a const struct control_dev_ops and a struct datapath_dev_ops?
> 
> That's probably a good idea, but I suppose it is out of scope for that patch.

No it's not out of scope.
It answers the question "why is it added in this structure and not dev_ops".
We won't do this change when nothing else is changed in the struct.

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v11 1/6] ethdev: add Tx preparation
  2016-10-27 16:02                           ` Thomas Monjalon
@ 2016-10-27 16:24                             ` Ananyev, Konstantin
  2016-10-27 16:39                               ` Thomas Monjalon
  2016-10-27 16:39                               ` Kulasek, TomaszX
  0 siblings, 2 replies; 261+ messages in thread
From: Ananyev, Konstantin @ 2016-10-27 16:24 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev



> -----Original Message-----
> From: Thomas Monjalon [mailto:thomas.monjalon@6wind.com]
> Sent: Thursday, October 27, 2016 5:02 PM
> To: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> Cc: Kulasek, TomaszX <tomaszx.kulasek@intel.com>; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v11 1/6] ethdev: add Tx preparation
> 
> 2016-10-27 15:52, Ananyev, Konstantin:
> >
> > >
> > > Hi Tomasz,
> > >
> > > This is a major new function in the API and I still have some comments.
> > >
> > > 2016-10-26 14:56, Tomasz Kulasek:
> > > > --- a/config/common_base
> > > > +++ b/config/common_base
> > > > +CONFIG_RTE_ETHDEV_TX_PREP=y
> > >
> > > We cannot enable it until it is implemented in every drivers.
> >
> > Not sure why?
> > If tx_pkt_prep == NULL, then rte_eth_tx_prep() would just act as noop.
> > Right now it is not mandatory for the PMD to implement it.
> 
> If it is not implemented, the application must do the preparation by itself.
> From patch 6:
> "
> Removed pseudo header calculation for udp/tcp/tso packets from
> application and used Tx preparation API for packet preparation and
> verification.
> "
> So how does it behave with other drivers?

Hmm, so it seems that we broke testpmd csumonly mode for non-Intel drivers...
My bad, missed that part completely.
Yes, then I suppose for now we'll need to support both (with and without) code paths for testpmd.
Probably a new fwd mode, or just an extra parameter for the existing one?
Any other suggestions?

> 
> > > >  struct rte_eth_dev {
> > > >  	eth_rx_burst_t rx_pkt_burst; /**< Pointer to PMD receive function. */
> > > >  	eth_tx_burst_t tx_pkt_burst; /**< Pointer to PMD transmit function. */
> > > > +	eth_tx_prep_t tx_pkt_prep; /**< Pointer to PMD transmit prepare function. */
> > > >  	struct rte_eth_dev_data *data;  /**< Pointer to device data */
> > > >  	const struct eth_driver *driver;/**< Driver for this device */
> > > >  	const struct eth_dev_ops *dev_ops; /**< Functions exported by PMD */
> > >
> > > Could you confirm why tx_pkt_prep is not in dev_ops?
> > > I guess we want to have several implementations?
> >
> > Yes, it depends on configuration options, same as tx_pkt_burst.
> >
> > >
> > > Shouldn't we have a const struct control_dev_ops and a struct datapath_dev_ops?
> >
> > That's probably a good idea, but I suppose it is out of scope for that patch.
> 
> No it's not out of scope.
> It answers to the question "why is it added in this structure and not dev_ops".
> We won't do this change when nothing else is changed in the struct.

Not sure I understood you here:
Are you saying datapath_dev_ops/controlpath_dev_ops have to be introduced as part of that patch?
But that's a lot of changes all over rte_ethdev.[h,c].
It definitely deserves a separate patch (and might need some discussion), in my view.
Konstantin

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v11 1/6] ethdev: add Tx preparation
  2016-10-27 15:01                       ` Thomas Monjalon
  2016-10-27 15:52                         ` Ananyev, Konstantin
@ 2016-10-27 16:29                         ` Kulasek, TomaszX
  1 sibling, 0 replies; 261+ messages in thread
From: Kulasek, TomaszX @ 2016-10-27 16:29 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev

Hi Thomas,

> -----Original Message-----
> From: Thomas Monjalon [mailto:thomas.monjalon@6wind.com]
> Sent: Thursday, October 27, 2016 17:01
> To: Kulasek, TomaszX <tomaszx.kulasek@intel.com>
> Cc: dev@dpdk.org; olivier.matz@6wind.com
> Subject: Re: [dpdk-dev] [PATCH v11 1/6] ethdev: add Tx preparation
> 
> Hi Tomasz,
> 
> This is a major new function in the API and I still have some comments.
> 
> 2016-10-26 14:56, Tomasz Kulasek:
> > --- a/config/common_base
> > +++ b/config/common_base
> > +CONFIG_RTE_ETHDEV_TX_PREP=y
> 
> We cannot enable it until it is implemented in every drivers.
> 

For most drivers it's safe to enable it by default: if this feature is not supported, no checks or modifications are done, so the processing path is the same as without Tx preparation.

Introducing this macro was discussed in the threads:

http://dpdk.org/ml/archives/dev/2016-September/046437.html
http://dpdk.org/dev/patchwork/patch/15770/

Short conclusion:

Jerin Jacob pointed out that it can have a significant impact on some architectures (such as low-end ARMv7/ARMv8 targets, which may not have PCIe-RC support and only an integrated NIC controller), even if this feature is not implemented.

We've added this macro to provide a NOOP fallback and to allow this feature to be turned off if it would have an adverse effect on a specific configuration or hardware.
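
Stripped of error checking, a sketch of that gate (names follow this series;
the real implementation in the patch also validates port_id and so on):

	static inline uint16_t
	rte_eth_tx_prep(uint8_t port_id, uint16_t queue_id,
			struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
	{
	#ifdef RTE_ETHDEV_TX_PREP
		struct rte_eth_dev *dev = &rte_eth_devices[port_id];

		if (!dev->tx_pkt_prep)	/* PMD opted out: nothing to check */
			return nb_pkts;

		return (*dev->tx_pkt_prep)(dev->data->tx_queues[queue_id],
				tx_pkts, nb_pkts);
	#else
		/* feature compiled out: a pure noop, burst passes through */
		RTE_SET_USED(port_id);
		RTE_SET_USED(queue_id);
		RTE_SET_USED(tx_pkts);
		return nb_pkts;
	#endif
	}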

> >  struct rte_eth_dev {
> >  	eth_rx_burst_t rx_pkt_burst; /**< Pointer to PMD receive function.
> */
> >  	eth_tx_burst_t tx_pkt_burst; /**< Pointer to PMD transmit function.
> */
> > +	eth_tx_prep_t tx_pkt_prep; /**< Pointer to PMD transmit prepare
> function. */
> >  	struct rte_eth_dev_data *data;  /**< Pointer to device data */
> >  	const struct eth_driver *driver;/**< Driver for this device */
> >  	const struct eth_dev_ops *dev_ops; /**< Functions exported by PMD */
> 
> Could you confirm why tx_pkt_prep is not in dev_ops?
> I guess we want to have several implementations?
> 

Yes, the implementation may vary with the selected tx_burst path (e.g. vector, simple, or full-featured implementation, and so on), and each path can have different requirements, such as the set of implemented features or its performance constraints.
The path is chosen transparently based on the application requirements, and we have a pair of callbacks -- tx_burst and the corresponding prepare callback (which depends directly on the tx_burst path).

> Shouldn't we have a const struct control_dev_ops and a struct
> datapath_dev_ops?
> 
> > +rte_eth_tx_prep(uint8_t port_id, uint16_t queue_id, struct rte_mbuf
> **tx_pkts,
> > +               uint16_t nb_pkts)
> 
> The word "prep" can be understood as "prepend".
> Why not rte_eth_tx_prepare?
> 

I do not mind.

> > +/**
> > + * Fix pseudo header checksum
> > + *
> > + * This function fixes pseudo header checksum for TSO and non-TSO
> tcp/udp in
> > + * provided mbufs packet data.
> > + *
> > + * - for non-TSO tcp/udp packets full pseudo-header checksum is computed
> and set
> > + *   in packet data,
> > + * - for TSO the IP payload length is not included in pseudo header.
> > + *
> > + * This function expects that used headers are in the first data
> segment of
> > + * mbuf, are not fragmented and can be safely modified.
> 
> What happens otherwise?
> 

There are requirements for this helper function. In the Tx preparation callback we check them, and if the check fails, -ENOTSUP is returned in rte_errno.

> > + *
> > + * @param m
> > + *   The packet mbuf to be fixed.
> > + * @return
> > + *   0 if checksum is initialized properly
> > + */
> > +static inline int
> > +rte_phdr_cksum_fix(struct rte_mbuf *m)
> 
> Could we find a better name for this function?
> - About the prefix, rte_ip_ ?
> - About the scope, where this phdr_cksum is specified?
> Isn't it an intel_phdr_cksum to match what hardware expects?
> - About the verb, is it really fixing something broken?
> Or just writing into a mbuf?
> I would suggest rte_ip_intel_cksum_prepare.

"Fixes" in the sense of the offload requirements, which state e.g. that to use a specific Tx offload we should fill the checksums in a particular way; if not, the settings are not valid and should be fixed.

But you're right, "prepare" is a better word.

About the function name, maybe rte_net_intel_chksum_prepare would be better, since it also prepares the tcp/udp headers and is placed in rte_net.h?

Tomasz

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v11 1/6] ethdev: add Tx preparation
  2016-10-27 16:24                             ` Ananyev, Konstantin
@ 2016-10-27 16:39                               ` Thomas Monjalon
  2016-10-28 11:29                                 ` Ananyev, Konstantin
  2016-10-27 16:39                               ` Kulasek, TomaszX
  1 sibling, 1 reply; 261+ messages in thread
From: Thomas Monjalon @ 2016-10-27 16:39 UTC (permalink / raw)
  To: Ananyev, Konstantin; +Cc: dev

2016-10-27 16:24, Ananyev, Konstantin:
> From: Thomas Monjalon [mailto:thomas.monjalon@6wind.com]
> > 2016-10-27 15:52, Ananyev, Konstantin:
> > > > Hi Tomasz,
> > > >
> > > > This is a major new function in the API and I still have some comments.
> > > >
> > > > 2016-10-26 14:56, Tomasz Kulasek:
> > > > > --- a/config/common_base
> > > > > +++ b/config/common_base
> > > > > +CONFIG_RTE_ETHDEV_TX_PREP=y
> > > >
> > > > We cannot enable it until it is implemented in every drivers.
> > >
> > > Not sure why?
> > > If tx_pkt_prep == NULL, then rte_eth_tx_prep() would just act as noop.
> > > Right now it is not mandatory for the PMD to implement it.
> > 
> > If it is not implemented, the application must do the preparation by itself.
> > From patch 6:
> > "
> > Removed pseudo header calculation for udp/tcp/tso packets from
> > application and used Tx preparation API for packet preparation and
> > verification.
> > "
> > So how does it behave with other drivers?
> 
> Hmm so it seems that we broke testpmd csumonly mode for non-intel drivers..
> My bad, missed that part completely.
> Yes, then I suppose for now we'll need to support both (with and without) code paths for testpmd.
> Probably a new fwd mode or just extra parameter for the existing one?
> Any other suggestions?

Please think about how we can use it in every application.
It is not ready.
Either we introduce the API without enabling it, or we implement it
in every driver.

> > > > >  struct rte_eth_dev {
> > > > >  	eth_rx_burst_t rx_pkt_burst; /**< Pointer to PMD receive function. */
> > > > >  	eth_tx_burst_t tx_pkt_burst; /**< Pointer to PMD transmit function. */
> > > > > +	eth_tx_prep_t tx_pkt_prep; /**< Pointer to PMD transmit prepare function. */
> > > > >  	struct rte_eth_dev_data *data;  /**< Pointer to device data */
> > > > >  	const struct eth_driver *driver;/**< Driver for this device */
> > > > >  	const struct eth_dev_ops *dev_ops; /**< Functions exported by PMD */
> > > >
> > > > Could you confirm why tx_pkt_prep is not in dev_ops?
> > > > I guess we want to have several implementations?
> > >
> > > Yes, it depends on configuration options, same as tx_pkt_burst.
> > >
> > > >
> > > > Shouldn't we have a const struct control_dev_ops and a struct datapath_dev_ops?
> > >
> > > That's probably a good idea, but I suppose it is out of scope for that patch.
> > 
> > No it's not out of scope.
> > It answers to the question "why is it added in this structure and not dev_ops".
> > We won't do this change when nothing else is changed in the struct.
> 
> Not sure I understood you here:
> Are you saying datapath_dev_ops/controlpath_dev_ops have to be introduced as part of that patch?
> But that's a lot of  changes all over rte_ethdev.[h,c].
> It definitely worse a separate patch (might be some discussion) for me.

Yes it could be a separate patch in the same patchset.

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v11 1/6] ethdev: add Tx preparation
  2016-10-27 16:24                             ` Ananyev, Konstantin
  2016-10-27 16:39                               ` Thomas Monjalon
@ 2016-10-27 16:39                               ` Kulasek, TomaszX
  2016-10-28 10:15                                 ` Ananyev, Konstantin
  1 sibling, 1 reply; 261+ messages in thread
From: Kulasek, TomaszX @ 2016-10-27 16:39 UTC (permalink / raw)
  To: Ananyev, Konstantin, Thomas Monjalon; +Cc: dev

Hi

> -----Original Message-----
> From: Ananyev, Konstantin
> Sent: Thursday, October 27, 2016 18:24
> To: Thomas Monjalon <thomas.monjalon@6wind.com>
> Cc: Kulasek, TomaszX <tomaszx.kulasek@intel.com>; dev@dpdk.org
> Subject: RE: [dpdk-dev] [PATCH v11 1/6] ethdev: add Tx preparation
> 
> 
> 
> > -----Original Message-----
> > From: Thomas Monjalon [mailto:thomas.monjalon@6wind.com]
> > Sent: Thursday, October 27, 2016 5:02 PM
> > To: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> > Cc: Kulasek, TomaszX <tomaszx.kulasek@intel.com>; dev@dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH v11 1/6] ethdev: add Tx preparation
> >
> > 2016-10-27 15:52, Ananyev, Konstantin:
> > >
> > > >
> > > > Hi Tomasz,
> > > >
> > > > This is a major new function in the API and I still have some
> comments.
> > > >
> > > > 2016-10-26 14:56, Tomasz Kulasek:
> > > > > --- a/config/common_base
> > > > > +++ b/config/common_base
> > > > > +CONFIG_RTE_ETHDEV_TX_PREP=y
> > > >
> > > > We cannot enable it until it is implemented in every drivers.
> > >
> > > Not sure why?
> > > If tx_pkt_prep == NULL, then rte_eth_tx_prep() would just act as noop.
> > > Right now it is not mandatory for the PMD to implement it.
> >
> > If it is not implemented, the application must do the preparation by
> itself.
> > From patch 6:
> > "
> > Removed pseudo header calculation for udp/tcp/tso packets from
> > application and used Tx preparation API for packet preparation and
> > verification.
> > "
> > So how does it behave with other drivers?
> 
> Hmm so it seems that we broke testpmd csumonly mode for non-intel
> drivers..
> My bad, missed that part completely.
> Yes, then I suppose for now we'll need to support both (with and without)
> code paths for testpmd.
> Probably a new fwd mode or just extra parameter for the existing one?
> Any other suggestions?
> 

I had sent a txprep engine in v2 (http://dpdk.org/dev/patchwork/patch/15775/), but I'm open to suggestions. If you like it I can resend it in place of the csumonly modification.

Tomasz

> >
> > > > >  struct rte_eth_dev {
> > > > >  	eth_rx_burst_t rx_pkt_burst; /**< Pointer to PMD receive
> function. */
> > > > >  	eth_tx_burst_t tx_pkt_burst; /**< Pointer to PMD transmit
> > > > > function. */
> > > > > +	eth_tx_prep_t tx_pkt_prep; /**< Pointer to PMD transmit
> > > > > +prepare function. */
> > > > >  	struct rte_eth_dev_data *data;  /**< Pointer to device data */
> > > > >  	const struct eth_driver *driver;/**< Driver for this device */
> > > > >  	const struct eth_dev_ops *dev_ops; /**< Functions exported by
> > > > > PMD */
> > > >
> > > > Could you confirm why tx_pkt_prep is not in dev_ops?
> > > > I guess we want to have several implementations?
> > >
> > > Yes, it depends on configuration options, same as tx_pkt_burst.
> > >
> > > >
> > > > Shouldn't we have a const struct control_dev_ops and a struct
> datapath_dev_ops?
> > >
> > > That's probably a good idea, but I suppose it is out of scope for that
> patch.
> >
> > No it's not out of scope.
> > It answers to the question "why is it added in this structure and not
> dev_ops".
> > We won't do this change when nothing else is changed in the struct.
> 
> Not sure I understood you here:
> Are you saying datapath_dev_ops/controlpath_dev_ops have to be introduced
> as part of that patch?
> But that's a lot of  changes all over rte_ethdev.[h,c].
> It definitely worse a separate patch (might be some discussion) for me.
> Konstantin
> 
> 

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v11 1/6] ethdev: add Tx preparation
  2016-10-27 16:39                               ` Kulasek, TomaszX
@ 2016-10-28 10:15                                 ` Ananyev, Konstantin
  2016-10-28 10:22                                   ` Kulasek, TomaszX
                                                     ` (2 more replies)
  0 siblings, 3 replies; 261+ messages in thread
From: Ananyev, Konstantin @ 2016-10-28 10:15 UTC (permalink / raw)
  To: Kulasek, TomaszX, Thomas Monjalon; +Cc: dev

Hi Tomasz,

> 
> Hi
> 
> > -----Original Message-----
> > From: Ananyev, Konstantin
> > Sent: Thursday, October 27, 2016 18:24
> > To: Thomas Monjalon <thomas.monjalon@6wind.com>
> > Cc: Kulasek, TomaszX <tomaszx.kulasek@intel.com>; dev@dpdk.org
> > Subject: RE: [dpdk-dev] [PATCH v11 1/6] ethdev: add Tx preparation
> >
> >
> >
> > > -----Original Message-----
> > > From: Thomas Monjalon [mailto:thomas.monjalon@6wind.com]
> > > Sent: Thursday, October 27, 2016 5:02 PM
> > > To: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> > > Cc: Kulasek, TomaszX <tomaszx.kulasek@intel.com>; dev@dpdk.org
> > > Subject: Re: [dpdk-dev] [PATCH v11 1/6] ethdev: add Tx preparation
> > >
> > > 2016-10-27 15:52, Ananyev, Konstantin:
> > > >
> > > > >
> > > > > Hi Tomasz,
> > > > >
> > > > > This is a major new function in the API and I still have some
> > comments.
> > > > >
> > > > > 2016-10-26 14:56, Tomasz Kulasek:
> > > > > > --- a/config/common_base
> > > > > > +++ b/config/common_base
> > > > > > +CONFIG_RTE_ETHDEV_TX_PREP=y
> > > > >
> > > > > We cannot enable it until it is implemented in every drivers.
> > > >
> > > > Not sure why?
> > > > If tx_pkt_prep == NULL, then rte_eth_tx_prep() would just act as noop.
> > > > Right now it is not mandatory for the PMD to implement it.
> > >
> > > If it is not implemented, the application must do the preparation by
> > itself.
> > > From patch 6:
> > > "
> > > Removed pseudo header calculation for udp/tcp/tso packets from
> > > application and used Tx preparation API for packet preparation and
> > > verification.
> > > "
> > > So how does it behave with other drivers?
> >
> > Hmm so it seems that we broke testpmd csumonly mode for non-intel
> > drivers..
> > My bad, missed that part completely.
> > Yes, then I suppose for now we'll need to support both (with and without)
> > code paths for testpmd.
> > Probably a new fwd mode or just extra parameter for the existing one?
> > Any other suggestions?
> >
> 
> I had sent txprep engine in v2 (http://dpdk.org/dev/patchwork/patch/15775/), but I'm opened on the suggestions. If you like it I can resent
> it in place of csumonly modification.

I'm still not sure it is worth having another version of csum...
Can we introduce a new global variable in testpmd and a new command:
testpmd> csum tx_prep
or so? 
Looking at the current testpmd patch, I suppose the changes will be minimal.
What do you think?
Konstantin 

> 
> Tomasz
> 
> > >
> > > > > >  struct rte_eth_dev {
> > > > > >  	eth_rx_burst_t rx_pkt_burst; /**< Pointer to PMD receive
> > function. */
> > > > > >  	eth_tx_burst_t tx_pkt_burst; /**< Pointer to PMD transmit
> > > > > > function. */
> > > > > > +	eth_tx_prep_t tx_pkt_prep; /**< Pointer to PMD transmit
> > > > > > +prepare function. */
> > > > > >  	struct rte_eth_dev_data *data;  /**< Pointer to device data */
> > > > > >  	const struct eth_driver *driver;/**< Driver for this device */
> > > > > >  	const struct eth_dev_ops *dev_ops; /**< Functions exported by
> > > > > > PMD */
> > > > >
> > > > > Could you confirm why tx_pkt_prep is not in dev_ops?
> > > > > I guess we want to have several implementations?
> > > >
> > > > Yes, it depends on configuration options, same as tx_pkt_burst.
> > > >
> > > > >
> > > > > Shouldn't we have a const struct control_dev_ops and a struct
> > datapath_dev_ops?
> > > >
> > > > That's probably a good idea, but I suppose it is out of scope for that
> > patch.
> > >
> > > No it's not out of scope.
> > > It answers to the question "why is it added in this structure and not
> > dev_ops".
> > > We won't do this change when nothing else is changed in the struct.
> >
> > Not sure I understood you here:
> > Are you saying datapath_dev_ops/controlpath_dev_ops have to be introduced
> > as part of that patch?
> > But that's a lot of  changes all over rte_ethdev.[h,c].
> > It definitely worse a separate patch (might be some discussion) for me.
> > Konstantin
> >
> >

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v11 1/6] ethdev: add Tx preparation
  2016-10-28 10:15                                 ` Ananyev, Konstantin
@ 2016-10-28 10:22                                   ` Kulasek, TomaszX
  2016-10-28 10:22                                   ` Thomas Monjalon
  2016-10-28 11:14                                   ` Jerin Jacob
  2 siblings, 0 replies; 261+ messages in thread
From: Kulasek, TomaszX @ 2016-10-28 10:22 UTC (permalink / raw)
  To: Ananyev, Konstantin, Thomas Monjalon; +Cc: dev

Hi Konstantin,

> -----Original Message-----
> From: Ananyev, Konstantin
> Sent: Friday, October 28, 2016 12:16
> To: Kulasek, TomaszX <tomaszx.kulasek@intel.com>; Thomas Monjalon
> <thomas.monjalon@6wind.com>
> Cc: dev@dpdk.org
> Subject: RE: [dpdk-dev] [PATCH v11 1/6] ethdev: add Tx preparation
> 
> Hi Tomasz,
> 
> >
> > Hi
> >
> > > -----Original Message-----
> > > From: Ananyev, Konstantin
> > > Sent: Thursday, October 27, 2016 18:24
> > > To: Thomas Monjalon <thomas.monjalon@6wind.com>
> > > Cc: Kulasek, TomaszX <tomaszx.kulasek@intel.com>; dev@dpdk.org
> > > Subject: RE: [dpdk-dev] [PATCH v11 1/6] ethdev: add Tx preparation
> > >
> > >
> > >
> > > > -----Original Message-----
> > > > From: Thomas Monjalon [mailto:thomas.monjalon@6wind.com]
> > > > Sent: Thursday, October 27, 2016 5:02 PM
> > > > To: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> > > > Cc: Kulasek, TomaszX <tomaszx.kulasek@intel.com>; dev@dpdk.org
> > > > Subject: Re: [dpdk-dev] [PATCH v11 1/6] ethdev: add Tx preparation
> > > >
> > > > 2016-10-27 15:52, Ananyev, Konstantin:
> > > > >
> > > > > >
> > > > > > Hi Tomasz,
> > > > > >
> > > > > > This is a major new function in the API and I still have some
> > > comments.
> > > > > >
> > > > > > 2016-10-26 14:56, Tomasz Kulasek:
> > > > > > > --- a/config/common_base
> > > > > > > +++ b/config/common_base
> > > > > > > +CONFIG_RTE_ETHDEV_TX_PREP=y
> > > > > >
> > > > > > We cannot enable it until it is implemented in every drivers.
> > > > >
> > > > > Not sure why?
> > > > > If tx_pkt_prep == NULL, then rte_eth_tx_prep() would just act as
> noop.
> > > > > Right now it is not mandatory for the PMD to implement it.
> > > >
> > > > If it is not implemented, the application must do the preparation
> > > > by
> > > itself.
> > > > From patch 6:
> > > > "
> > > > Removed pseudo header calculation for udp/tcp/tso packets from
> > > > application and used Tx preparation API for packet preparation and
> > > > verification.
> > > > "
> > > > So how does it behave with other drivers?
> > >
> > > Hmm so it seems that we broke testpmd csumonly mode for non-intel
> > > drivers..
> > > My bad, missed that part completely.
> > > Yes, then I suppose for now we'll need to support both (with and
> > > without) code paths for testpmd.
> > > Probably a new fwd mode or just extra parameter for the existing one?
> > > Any other suggestions?
> > >
> >
> > I had sent txprep engine in v2
> > (http://dpdk.org/dev/patchwork/patch/15775/), but I'm opened on the
> suggestions. If you like it I can resent it in place of csumonly
> modification.
> 
> I still not sure it is worth to have another version of csum...
> Can we introduce a new global variable in testpmd and a new command:
> testpmd> csum tx_prep
> or so?
> Looking at current testpmd patch, I suppose the changes will be minimal.
> What do you think?
> Konstantin
> 

This is not a problem.

Tomasz

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v11 1/6] ethdev: add Tx preparation
  2016-10-28 10:15                                 ` Ananyev, Konstantin
  2016-10-28 10:22                                   ` Kulasek, TomaszX
@ 2016-10-28 10:22                                   ` Thomas Monjalon
  2016-10-28 10:28                                     ` Ananyev, Konstantin
  2016-10-28 11:14                                   ` Jerin Jacob
  2 siblings, 1 reply; 261+ messages in thread
From: Thomas Monjalon @ 2016-10-28 10:22 UTC (permalink / raw)
  To: Ananyev, Konstantin, Kulasek, TomaszX; +Cc: dev

2016-10-28 10:15, Ananyev, Konstantin:
> > From: Ananyev, Konstantin
> > > From: Thomas Monjalon [mailto:thomas.monjalon@6wind.com]
> > > > 2016-10-27 15:52, Ananyev, Konstantin:
> > > > > > 2016-10-26 14:56, Tomasz Kulasek:
> > > > > > > --- a/config/common_base
> > > > > > > +++ b/config/common_base
> > > > > > > +CONFIG_RTE_ETHDEV_TX_PREP=y
> > > > > >
> > > > > > We cannot enable it until it is implemented in every drivers.
> > > > >
> > > > > Not sure why?
> > > > > If tx_pkt_prep == NULL, then rte_eth_tx_prep() would just act as noop.
> > > > > Right now it is not mandatory for the PMD to implement it.
> > > >
> > > > If it is not implemented, the application must do the preparation by
> > > itself.
> > > > From patch 6:
> > > > "
> > > > Removed pseudo header calculation for udp/tcp/tso packets from
> > > > application and used Tx preparation API for packet preparation and
> > > > verification.
> > > > "
> > > > So how does it behave with other drivers?
> > >
> > > Hmm so it seems that we broke testpmd csumonly mode for non-intel
> > > drivers..
> > > My bad, missed that part completely.
> > > Yes, then I suppose for now we'll need to support both (with and without)
> > > code paths for testpmd.
> > > Probably a new fwd mode or just extra parameter for the existing one?
> > > Any other suggestions?
> > >
> > 
> > I had sent txprep engine in v2 (http://dpdk.org/dev/patchwork/patch/15775/), but I'm opened on the suggestions. If you like it I can resent
> > it in place of csumonly modification.
> 
> I still not sure it is worth to have another version of csum...
> Can we introduce a new global variable in testpmd and a new command:
> testpmd> csum tx_prep
> or so? 
> Looking at current testpmd patch, I suppose the changes will be minimal.
> What do you think?

No please no!
The problem is not in testpmd.
The problem is in every application.
Should we prepare the checksums or let tx_prep do it?
The result will depend on the driver used.

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v11 1/6] ethdev: add Tx preparation
  2016-10-28 10:22                                   ` Thomas Monjalon
@ 2016-10-28 10:28                                     ` Ananyev, Konstantin
  2016-10-28 11:02                                       ` Richardson, Bruce
  0 siblings, 1 reply; 261+ messages in thread
From: Ananyev, Konstantin @ 2016-10-28 10:28 UTC (permalink / raw)
  To: Thomas Monjalon, Kulasek, TomaszX; +Cc: dev



> -----Original Message-----
> From: Thomas Monjalon [mailto:thomas.monjalon@6wind.com]
> Sent: Friday, October 28, 2016 11:22 AM
> To: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Kulasek, TomaszX <tomaszx.kulasek@intel.com>
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v11 1/6] ethdev: add Tx preparation
> 
> 2016-10-28 10:15, Ananyev, Konstantin:
> > > From: Ananyev, Konstantin
> > > > From: Thomas Monjalon [mailto:thomas.monjalon@6wind.com]
> > > > > 2016-10-27 15:52, Ananyev, Konstantin:
> > > > > > > 2016-10-26 14:56, Tomasz Kulasek:
> > > > > > > > --- a/config/common_base
> > > > > > > > +++ b/config/common_base
> > > > > > > > +CONFIG_RTE_ETHDEV_TX_PREP=y
> > > > > > >
> > > > > > > We cannot enable it until it is implemented in every drivers.
> > > > > >
> > > > > > Not sure why?
> > > > > > If tx_pkt_prep == NULL, then rte_eth_tx_prep() would just act as noop.
> > > > > > Right now it is not mandatory for the PMD to implement it.
> > > > >
> > > > > If it is not implemented, the application must do the preparation by
> > > > itself.
> > > > > From patch 6:
> > > > > "
> > > > > Removed pseudo header calculation for udp/tcp/tso packets from
> > > > > application and used Tx preparation API for packet preparation and
> > > > > verification.
> > > > > "
> > > > > So how does it behave with other drivers?
> > > >
> > > > Hmm so it seems that we broke testpmd csumonly mode for non-intel
> > > > drivers..
> > > > My bad, missed that part completely.
> > > > Yes, then I suppose for now we'll need to support both (with and without)
> > > > code paths for testpmd.
> > > > Probably a new fwd mode or just extra parameter for the existing one?
> > > > Any other suggestions?
> > > >
> > >
> > > I had sent txprep engine in v2 (http://dpdk.org/dev/patchwork/patch/15775/), but I'm opened on the suggestions. If you like it I can
> resent
> > > it in place of csumonly modification.
> >
> > I still not sure it is worth to have another version of csum...
> > Can we introduce a new global variable in testpmd and a new command:
> > testpmd> csum tx_prep
> > or so?
> > Looking at current testpmd patch, I suppose the changes will be minimal.
> > What do you think?
> 
> No please no!
> The problem is not in testpmd.
> The problem is in every applications.
> Should we prepare the checksums or let tx_prep do it?

Not sure I understood you...
Right now we don't change other apps.
They would work as before.
If people would like to start using tx_prep in their apps,
they are free to do that.
If they prefer to keep doing it manually, that's fine too.
On the other hand, we need an ability to test (and demonstrate) the new functionality.
So we do need changes in testpmd.
Konstantin



> The result will depend of the driver used.

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v11 1/6] ethdev: add Tx preparation
  2016-10-28 10:28                                     ` Ananyev, Konstantin
@ 2016-10-28 11:02                                       ` Richardson, Bruce
  0 siblings, 0 replies; 261+ messages in thread
From: Richardson, Bruce @ 2016-10-28 11:02 UTC (permalink / raw)
  To: Ananyev, Konstantin, Thomas Monjalon, Kulasek, TomaszX; +Cc: dev



> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Ananyev, Konstantin
> Sent: Friday, October 28, 2016 11:29 AM
> To: Thomas Monjalon <thomas.monjalon@6wind.com>; Kulasek, TomaszX
> <tomaszx.kulasek@intel.com>
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v11 1/6] ethdev: add Tx preparation
> 
> 
> 
> > -----Original Message-----
> > From: Thomas Monjalon [mailto:thomas.monjalon@6wind.com]
> > Sent: Friday, October 28, 2016 11:22 AM
> > To: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Kulasek,
> > TomaszX <tomaszx.kulasek@intel.com>
> > Cc: dev@dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH v11 1/6] ethdev: add Tx preparation
> >
> > 2016-10-28 10:15, Ananyev, Konstantin:
> > > > From: Ananyev, Konstantin
> > > > > From: Thomas Monjalon [mailto:thomas.monjalon@6wind.com]
> > > > > > 2016-10-27 15:52, Ananyev, Konstantin:
> > > > > > > > 2016-10-26 14:56, Tomasz Kulasek:
> > > > > > > > > --- a/config/common_base
> > > > > > > > > +++ b/config/common_base
> > > > > > > > > +CONFIG_RTE_ETHDEV_TX_PREP=y
> > > > > > > >
> > > > > > > > We cannot enable it until it is implemented in every
> drivers.
> > > > > > >
> > > > > > > Not sure why?
> > > > > > > If tx_pkt_prep == NULL, then rte_eth_tx_prep() would just act
> as noop.
> > > > > > > Right now it is not mandatory for the PMD to implement it.
> > > > > >
> > > > > > If it is not implemented, the application must do the
> > > > > > preparation by
> > > > > itself.
> > > > > > From patch 6:
> > > > > > "
> > > > > > Removed pseudo header calculation for udp/tcp/tso packets from
> > > > > > application and used Tx preparation API for packet preparation
> > > > > > and verification.
> > > > > > "
> > > > > > So how does it behave with other drivers?
> > > > >
> > > > > Hmm so it seems that we broke testpmd csumonly mode for
> > > > > non-intel drivers..
> > > > > My bad, missed that part completely.
> > > > > Yes, then I suppose for now we'll need to support both (with and
> > > > > without) code paths for testpmd.
> > > > > Probably a new fwd mode or just extra parameter for the existing
> one?
> > > > > Any other suggestions?
> > > > >
> > > >
> > > > I had sent txprep engine in v2
> > > > (http://dpdk.org/dev/patchwork/patch/15775/), but I'm opened on
> > > > the suggestions. If you like it I can
> > resent
> > > > it in place of csumonly modification.
> > >
> > > I still not sure it is worth to have another version of csum...
> > > Can we introduce a new global variable in testpmd and a new command:
> > > testpmd> csum tx_prep
> > > or so?
> > > Looking at current testpmd patch, I suppose the changes will be
> minimal.
> > > What do you think?
> >
> > No please no!
> > The problem is not in testpmd.
> > The problem is in every applications.
> > Should we prepare the checksums or let tx_prep do it?
> 
> Not sure, I understood you...
> Right now we don't' change other apps.
> They would work as before.
> If people would like to start to use tx_prep in their apps - they are free
> to do that.
> If they like to keep doing that manually - that's fine too.
> From other side we need an ability to test (and demonstrate) that new
> functionality.
> So we do need changes in testpmd.
> Konstantin
> 

Just my 2c on this:
* given this is new functionality, and no apps are currently using it, I'm not sure I see the harm in having the function available by default. We just need to be clear about the limits of the function and the fact that apps need to do work themselves if the driver doesn't provide the function.
* having it enabled will then allow any apps that want to use it to do so.
* however, for our sample apps, and by default in testpmd, we *shouldn't* use this functionality, in the absence of any fallback, so that is where I would look to have the enable/disable switch, not in the library.
* going forward, I think a SW fallback inside the ethdev API itself would be a good addition to make this fully generic (see the sketch below).
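
  E.g., a rough and purely hypothetical sketch of such a fallback inside
  rte_eth_tx_prep(), reusing the generic helpers this series already
  provides, might be:

	if (!dev->tx_pkt_prep) {
		uint16_t i;

		/* no PMD-specific prep: run the generic checks/fix-ups */
		for (i = 0; i < nb_pkts; i++) {
			int ret = rte_validate_tx_offload(tx_pkts[i]);

			if (ret == 0)
				ret = rte_phdr_cksum_fix(tx_pkts[i]);
			if (ret != 0) {
				rte_errno = ret;
				return i;
			}
		}
		return nb_pkts;
	}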

Hope this helps, [and also that I haven't missed some subtlety in the discussion!]

/Bruce

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v11 1/6] ethdev: add Tx preparation
  2016-10-28 10:15                                 ` Ananyev, Konstantin
  2016-10-28 10:22                                   ` Kulasek, TomaszX
  2016-10-28 10:22                                   ` Thomas Monjalon
@ 2016-10-28 11:14                                   ` Jerin Jacob
  2 siblings, 0 replies; 261+ messages in thread
From: Jerin Jacob @ 2016-10-28 11:14 UTC (permalink / raw)
  To: Ananyev, Konstantin; +Cc: dev, Thomas Monjalon

On Fri, Oct 28, 2016 at 10:15:47AM +0000, Ananyev, Konstantin wrote:
> Hi Tomasz,
> 
> > > > > Not sure why?
> > > > > If tx_pkt_prep == NULL, then rte_eth_tx_prep() would just act as noop.
> > > > > Right now it is not mandatory for the PMD to implement it.
> > > >
> > > > If it is not implemented, the application must do the preparation by
> > > itself.
> > > > From patch 6:
> > > > "
> > > > Removed pseudo header calculation for udp/tcp/tso packets from
> > > > application and used Tx preparation API for packet preparation and
> > > > verification.
> > > > "
> > > > So how does it behave with other drivers?
> > >
> > > Hmm so it seems that we broke testpmd csumonly mode for non-intel
> > > drivers..
> > > My bad, missed that part completely.
> > > Yes, then I suppose for now we'll need to support both (with and without)
> > > code paths for testpmd.
> > > Probably a new fwd mode or just extra parameter for the existing one?
> > > Any other suggestions?
> > >
> > 
> > I had sent txprep engine in v2 (http://dpdk.org/dev/patchwork/patch/15775/), but I'm opened on the suggestions. If you like it I can resent
> > it in place of csumonly modification.
> 
> I still not sure it is worth to have another version of csum...
> Can we introduce a new global variable in testpmd and a new command:
> testpmd> csum tx_prep

Just my 2 cents: as "tx_prep" is a generic API, if a PMD tries to
fix up some other limitation (not csum), it is difficult for
the application to know in which PMD/application combination it needs to be used.

> or so? 
> Looking at current testpmd patch, I suppose the changes will be minimal.
> What do you think?
> Konstantin 
> 
> > 
> > Tomasz
> > 
> > > >
> > > > > > >  struct rte_eth_dev {
> > > > > > >  	eth_rx_burst_t rx_pkt_burst; /**< Pointer to PMD receive
> > > function. */
> > > > > > >  	eth_tx_burst_t tx_pkt_burst; /**< Pointer to PMD transmit
> > > > > > > function. */
> > > > > > > +	eth_tx_prep_t tx_pkt_prep; /**< Pointer to PMD transmit
> > > > > > > +prepare function. */
> > > > > > >  	struct rte_eth_dev_data *data;  /**< Pointer to device data */
> > > > > > >  	const struct eth_driver *driver;/**< Driver for this device */
> > > > > > >  	const struct eth_dev_ops *dev_ops; /**< Functions exported by
> > > > > > > PMD */
> > > > > >
> > > > > > Could you confirm why tx_pkt_prep is not in dev_ops?
> > > > > > I guess we want to have several implementations?
> > > > >
> > > > > Yes, it depends on configuration options, same as tx_pkt_burst.
> > > > >
> > > > > >
> > > > > > Shouldn't we have a const struct control_dev_ops and a struct
> > > datapath_dev_ops?
> > > > >
> > > > > That's probably a good idea, but I suppose it is out of scope for that
> > > patch.
> > > >
> > > > No it's not out of scope.
> > > > It answers to the question "why is it added in this structure and not
> > > dev_ops".
> > > > We won't do this change when nothing else is changed in the struct.
> > >
> > > Not sure I understood you here:
> > > Are you saying datapath_dev_ops/controlpath_dev_ops have to be introduced
> > > as part of that patch?
> > > But that's a lot of  changes all over rte_ethdev.[h,c].
> > > It definitely worse a separate patch (might be some discussion) for me.
> > > Konstantin
> > >
> > >
> 

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v11 1/6] ethdev: add Tx preparation
  2016-10-27 16:39                               ` Thomas Monjalon
@ 2016-10-28 11:29                                 ` Ananyev, Konstantin
  2016-10-28 11:34                                   ` Ananyev, Konstantin
  0 siblings, 1 reply; 261+ messages in thread
From: Ananyev, Konstantin @ 2016-10-28 11:29 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev

Hi Thomas,

> 
> 2016-10-27 16:24, Ananyev, Konstantin:
> > From: Thomas Monjalon [mailto:thomas.monjalon@6wind.com]
> > > 2016-10-27 15:52, Ananyev, Konstantin:
> > > > > Hi Tomasz,
> > > > >
> > > > > This is a major new function in the API and I still have some comments.
> > > > >
> > > > > 2016-10-26 14:56, Tomasz Kulasek:
> > > > > > --- a/config/common_base
> > > > > > +++ b/config/common_base
> > > > > > +CONFIG_RTE_ETHDEV_TX_PREP=y
> > > > >
> > > > > We cannot enable it until it is implemented in every drivers.
> > > >
> > > > Not sure why?
> > > > If tx_pkt_prep == NULL, then rte_eth_tx_prep() would just act as noop.
> > > > Right now it is not mandatory for the PMD to implement it.
> > >
> > > If it is not implemented, the application must do the preparation by itself.
> > > From patch 6:
> > > "
> > > Removed pseudo header calculation for udp/tcp/tso packets from
> > > application and used Tx preparation API for packet preparation and
> > > verification.
> > > "
> > > So how does it behave with other drivers?
> >
> > Hmm, so it seems that we broke testpmd csumonly mode for non-Intel drivers...
> > My bad, missed that part completely.
> > Yes, then I suppose for now we'll need to support both (with and without) code paths for testpmd.
> > Probably a new fwd mode or just extra parameter for the existing one?
> > Any other suggestions?
> 
> Please think about how we can use it in every application.
> It is not ready.
> Either we introduce the API without enabling it, or we implement it
> in every driver.

I understand your position here, but would just like to point out that:
1) It is new functionality, optional to use.
    The app is free not to use it and still do the preparation itself
    (as it has to do now).
    All existing apps would keep working as expected without using that function.
    Though if the app developer knows that tx_prep is implemented for all
    HW models he plans to run on, he is free to use it.
2) It would be difficult for Tomasz (and other Intel guys) to implement tx_prep()
    for all the non-Intel HW that DPDK supports right now.
    We just don't have all the actual HW in stock, and probably not adequate
    knowledge of it either.
    So we depend here on the good will of other PMD maintainers/developers to
    implement tx_prep() for these devices.
    On the other hand, if it is disabled by default, then, I think,
    PMD developers just won't be motivated to implement it.
    So it will be left untested and unused forever.

> 
> > > > > >  struct rte_eth_dev {
> > > > > >  	eth_rx_burst_t rx_pkt_burst; /**< Pointer to PMD receive function. */
> > > > > >  	eth_tx_burst_t tx_pkt_burst; /**< Pointer to PMD transmit function. */
> > > > > > +	eth_tx_prep_t tx_pkt_prep; /**< Pointer to PMD transmit prepare function. */
> > > > > >  	struct rte_eth_dev_data *data;  /**< Pointer to device data */
> > > > > >  	const struct eth_driver *driver;/**< Driver for this device */
> > > > > >  	const struct eth_dev_ops *dev_ops; /**< Functions exported by PMD */
> > > > >
> > > > > Could you confirm why tx_pkt_prep is not in dev_ops?
> > > > > I guess we want to have several implementations?
> > > >
> > > > Yes, it depends on configuration options, same as tx_pkt_burst.
> > > >
> > > > >
> > > > > Shouldn't we have a const struct control_dev_ops and a struct datapath_dev_ops?
> > > >
> > > > That's probably a good idea, but I suppose it is out of scope for that patch.
> > >
> > > No it's not out of scope.
> > > It answers the question "why is it added in this structure and not dev_ops".
> > > We won't do this change when nothing else is changed in the struct.
> >
> > Not sure I understood you here:
> > Are you saying datapath_dev_ops/controlpath_dev_ops have to be introduced as part of that patch?
> > But that's a lot of changes all over rte_ethdev.[h,c].
> > It definitely deserves a separate patch (might be some discussion) for me.
> 
> Yes it could be a separate patch in the same patchset.

Honestly, I think it is a good idea, but it is too late and too risky to make such a change right now.
We are on RC2 right now, just a few days before RC3...
Can't that wait till 17.02?
From my understanding it is pure code restructuring, with no functionality affected.
Konstantin

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v11 1/6] ethdev: add Tx preparation
  2016-10-28 11:29                                 ` Ananyev, Konstantin
@ 2016-10-28 11:34                                   ` Ananyev, Konstantin
  2016-10-28 12:23                                     ` Thomas Monjalon
  0 siblings, 1 reply; 261+ messages in thread
From: Ananyev, Konstantin @ 2016-10-28 11:34 UTC (permalink / raw)
  To: Ananyev, Konstantin, Thomas Monjalon; +Cc: dev



> 
> Hi Thomas,
> 
> >
> > 2016-10-27 16:24, Ananyev, Konstantin:
> > > From: Thomas Monjalon [mailto:thomas.monjalon@6wind.com]
> > > > 2016-10-27 15:52, Ananyev, Konstantin:
> > > > > > Hi Tomasz,
> > > > > >
> > > > > > This is a major new function in the API and I still have some comments.
> > > > > >
> > > > > > 2016-10-26 14:56, Tomasz Kulasek:
> > > > > > > --- a/config/common_base
> > > > > > > +++ b/config/common_base
> > > > > > > +CONFIG_RTE_ETHDEV_TX_PREP=y
> > > > > >
> > > > > > We cannot enable it until it is implemented in every drivers.
> > > > >
> > > > > Not sure why?
> > > > > If tx_pkt_prep == NULL, then rte_eth_tx_prep() would just act as noop.
> > > > > Right now it is not mandatory for the PMD to implement it.
> > > >
> > > > If it is not implemented, the application must do the preparation by itself.
> > > > From patch 6:
> > > > "
> > > > Removed pseudo header calculation for udp/tcp/tso packets from
> > > > application and used Tx preparation API for packet preparation and
> > > > verification.
> > > > "
> > > > So how does it behave with other drivers?
> > >
> > > Hmm, so it seems that we broke testpmd csumonly mode for non-Intel drivers...
> > > My bad, missed that part completely.
> > > Yes, then I suppose for now we'll need to support both (with and without) code paths for testpmd.
> > > Probably a new fwd mode or just extra parameter for the existing one?
> > > Any other suggestions?
> >
> > Please think about how we can use it in every application.
> > It is not ready.
> > Either we introduce the API without enabling it, or we implement it
> > in every driver.
> 
> I understand your position here, but would just like to point out that:
> 1) It is new functionality, optional to use.
>     The app is free not to use it and still do the preparation itself
>     (as it has to do now).
>     All existing apps would keep working as expected without using that function.
>     Though if the app developer knows that tx_prep is implemented for all
>     HW models he plans to run on, he is free to use it.
> 2) It would be difficult for Tomasz (and other Intel guys) to implement tx_prep()
>     for all the non-Intel HW that DPDK supports right now.
>     We just don't have all the actual HW in stock, and probably not adequate
>     knowledge of it either.
>     So we depend here on the good will of other PMD maintainers/developers to
>     implement tx_prep() for these devices.
>     On the other hand, if it is disabled by default, then, I think,
>     PMD developers just won't be motivated to implement it.
>     So it will be left untested and unused forever.

Actually as another thought:
Can we have it enabled by default, but mark it as experimental or so?
If memory serves me right, we've done that for cryptodev in the past, no?
Konstantin

> 
> >
> > > > > > >  struct rte_eth_dev {
> > > > > > >  	eth_rx_burst_t rx_pkt_burst; /**< Pointer to PMD receive function. */
> > > > > > >  	eth_tx_burst_t tx_pkt_burst; /**< Pointer to PMD transmit function. */
> > > > > > > +	eth_tx_prep_t tx_pkt_prep; /**< Pointer to PMD transmit prepare function. */
> > > > > > >  	struct rte_eth_dev_data *data;  /**< Pointer to device data */
> > > > > > >  	const struct eth_driver *driver;/**< Driver for this device */
> > > > > > >  	const struct eth_dev_ops *dev_ops; /**< Functions exported by PMD */
> > > > > >
> > > > > > Could you confirm why tx_pkt_prep is not in dev_ops?
> > > > > > I guess we want to have several implementations?
> > > > >
> > > > > Yes, it depends on configuration options, same as tx_pkt_burst.
> > > > >
> > > > > >
> > > > > > Shouldn't we have a const struct control_dev_ops and a struct datapath_dev_ops?
> > > > >
> > > > > That's probably a good idea, but I suppose it is out of scope for that patch.
> > > >
> > > > No it's not out of scope.
> > > > It answers the question "why is it added in this structure and not dev_ops".
> > > > We won't do this change when nothing else is changed in the struct.
> > >
> > > Not sure I understood you here:
> > > Are you saying datapath_dev_ops/controlpath_dev_ops have to be introduced as part of that patch?
> > > But that's a lot of changes all over rte_ethdev.[h,c].
> > > It definitely deserves a separate patch (might be some discussion) for me.
> >
> > Yes it could be a separate patch in the same patchset.
> 
> Honestly, I think it is a good idea, but it is too late and too risky to make such a change right now.
> We are on RC2 right now, just a few days before RC3...
> Can't that wait till 17.02?
> From my understanding it is pure code restructuring, with no functionality affected.
> Konstantin
> 

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v11 1/6] ethdev: add Tx preparation
  2016-10-28 11:34                                   ` Ananyev, Konstantin
@ 2016-10-28 12:23                                     ` Thomas Monjalon
  2016-10-28 12:59                                       ` Ananyev, Konstantin
  0 siblings, 1 reply; 261+ messages in thread
From: Thomas Monjalon @ 2016-10-28 12:23 UTC (permalink / raw)
  To: Ananyev, Konstantin; +Cc: dev

2016-10-28 11:34, Ananyev, Konstantin:
> > > 2016-10-27 16:24, Ananyev, Konstantin:
> > > > From: Thomas Monjalon [mailto:thomas.monjalon@6wind.com]
> > > > > 2016-10-27 15:52, Ananyev, Konstantin:
> > > > > > > 2016-10-26 14:56, Tomasz Kulasek:
> > > > > > > > --- a/config/common_base
> > > > > > > > +++ b/config/common_base
> > > > > > > > +CONFIG_RTE_ETHDEV_TX_PREP=y
> > > > > > >
> > > > > > > We cannot enable it until it is implemented in every drivers.
> > > > > >
> > > > > > Not sure why?
> > > > > > If tx_pkt_prep == NULL, then rte_eth_tx_prep() would just act as noop.
> > > > > > Right now it is not mandatory for the PMD to implement it.
> > > > >
> > > > > If it is not implemented, the application must do the preparation by itself.
> > > > > From patch 6:
> > > > > "
> > > > > Removed pseudo header calculation for udp/tcp/tso packets from
> > > > > application and used Tx preparation API for packet preparation and
> > > > > verification.
> > > > > "
> > > > > So how does it behave with other drivers?
> > > >
> > > > Hmm, so it seems that we broke testpmd csumonly mode for non-Intel drivers...
> > > > My bad, missed that part completely.
> > > > Yes, then I suppose for now we'll need to support both (with and without) code paths for testpmd.
> > > > Probably a new fwd mode or just extra parameter for the existing one?
> > > > Any other suggestions?
> > >
> > > Please think about how we can use it in every application.
> > > It is not ready.
> > > Either we introduce the API without enabling it, or we implement it
> > > in every driver.
> > 
> > I understand your position here, but would just like to point out that:
> > 1) It is new functionality, optional to use.
> >     The app is free not to use it and still do the preparation itself
> >     (as it has to do now).
> >     All existing apps would keep working as expected without using that function.
> >     Though if the app developer knows that tx_prep is implemented for all
> >     HW models he plans to run on, he is free to use it.
> > 2) It would be difficult for Tomasz (and other Intel guys) to implement tx_prep()
> >     for all the non-Intel HW that DPDK supports right now.
> >     We just don't have all the actual HW in stock, and probably not adequate
> >     knowledge of it either.
> >     So we depend here on the good will of other PMD maintainers/developers to
> >     implement tx_prep() for these devices.
> >     On the other hand, if it is disabled by default, then, I think,
> >     PMD developers just won't be motivated to implement it.
> >     So it will be left untested and unused forever.
> 
> Actually as another thought:
> Can we have it enabled by default, but mark it as experimental or so?
> If memory serves me right, we've done that for cryptodev in the past, no?

Cryptodev was a whole new library.
We won't play the game "find which function is experimental or not".

We should not enable a function until it is fully implemented.

If the user really understands that it will work only with a few drivers
then he can change the build configuration himself.
Enabling in the default configuration is a message to say that it works
everywhere without any risk.
It's so simple that I don't even understand why I must argue for it.

And by the way, it is late for 16.11.
I suggest to integrate it in the beginning of 17.02 cycle, with the hope
that you can convince other developers to implement it in other drivers,
so we could finally enable it in the default config.

Oh, and I don't trust that nobody was thinking that it would break testpmd
for non-Intel drivers.

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v11 1/6] ethdev: add Tx preparation
  2016-10-28 12:23                                     ` Thomas Monjalon
@ 2016-10-28 12:59                                       ` Ananyev, Konstantin
  2016-10-28 13:42                                         ` Thomas Monjalon
  0 siblings, 1 reply; 261+ messages in thread
From: Ananyev, Konstantin @ 2016-10-28 12:59 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev


> 
> 2016-10-28 11:34, Ananyev, Konstantin:
> > > > 2016-10-27 16:24, Ananyev, Konstantin:
> > > > > From: Thomas Monjalon [mailto:thomas.monjalon@6wind.com]
> > > > > > 2016-10-27 15:52, Ananyev, Konstantin:
> > > > > > > > 2016-10-26 14:56, Tomasz Kulasek:
> > > > > > > > > --- a/config/common_base
> > > > > > > > > +++ b/config/common_base
> > > > > > > > > +CONFIG_RTE_ETHDEV_TX_PREP=y
> > > > > > > >
> > > > > > > > We cannot enable it until it is implemented in every drivers.
> > > > > > >
> > > > > > > Not sure why?
> > > > > > > If tx_pkt_prep == NULL, then rte_eth_tx_prep() would just act as noop.
> > > > > > > Right now it is not mandatory for the PMD to implement it.
> > > > > >
> > > > > > If it is not implemented, the application must do the preparation by itself.
> > > > > > From patch 6:
> > > > > > "
> > > > > > Removed pseudo header calculation for udp/tcp/tso packets from
> > > > > > application and used Tx preparation API for packet preparation and
> > > > > > verification.
> > > > > > "
> > > > > > So how does it behave with other drivers?
> > > > >
> > > > > Hmm, so it seems that we broke testpmd csumonly mode for non-Intel drivers...
> > > > > My bad, missed that part completely.
> > > > > Yes, then I suppose for now we'll need to support both (with and without) code paths for testpmd.
> > > > > Probably a new fwd mode or just extra parameter for the existing one?
> > > > > Any other suggestions?
> > > >
> > > > Please think about how we can use it in every application.
> > > > It is not ready.
> > > > Either we introduce the API without enabling it, or we implement it
> > > > in every driver.
> > >
> > > I understand your position here, but would just like to point out that:
> > > 1) It is new functionality, optional to use.
> > >     The app is free not to use it and still do the preparation itself
> > >     (as it has to do now).
> > >     All existing apps would keep working as expected without using that function.
> > >     Though if the app developer knows that tx_prep is implemented for all
> > >     HW models he plans to run on, he is free to use it.
> > > 2) It would be difficult for Tomasz (and other Intel guys) to implement tx_prep()
> > >     for all the non-Intel HW that DPDK supports right now.
> > >     We just don't have all the actual HW in stock, and probably not adequate
> > >     knowledge of it either.
> > >     So we depend here on the good will of other PMD maintainers/developers to
> > >     implement tx_prep() for these devices.
> > >     On the other hand, if it is disabled by default, then, I think,
> > >     PMD developers just won't be motivated to implement it.
> > >     So it will be left untested and unused forever.
> >
> > Actually as another thought:
> > Can we have it enabled by default, but mark it as experimental or so?
> > If memory serves me right, we've done that for cryptodev in the past, no?
> 
> Cryptodev was a whole new library.
> We won't play the game "find which function is experimental or not".
> 
> We should not enable a function until it is fully implemented.
> 
> If the user really understands that it will work only with a few drivers
> then he can change the build configuration himself.
> Enabling in the default configuration is a message to say that it works
> everywhere without any risk.
> It's so simple that I don't even understand why I must argue for it.
> 
> And by the way, it is late for 16.11.

Ok, I understand your concern about enabling it by default and the testpmd breakage,
but what else do you believe is not ready?

> I suggest to integrate it in the beginning of 17.02 cycle, with the hope
> that you can convince other developers to implement it in other drivers,
> so we could finally enable it in the default config.

Ok, any insights then on how we can convince people to do that?
BTW, it means that tx_prep() should become part of the mandatory API
to be implemented by each PMD doing TX offloads, right?

> 
> Oh, and I don't trust that nobody was thinking that it would break testpmd
> for non-Intel drivers.

Well, believe it or not, yes, I missed that one.
I think I already admitted that it was my fault, and apologized for that.
But sure, it is your choice to trust me here or not.
Konstantin

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v11 1/6] ethdev: add Tx preparation
  2016-10-28 12:59                                       ` Ananyev, Konstantin
@ 2016-10-28 13:42                                         ` Thomas Monjalon
  2016-11-01 12:57                                           ` Ananyev, Konstantin
  0 siblings, 1 reply; 261+ messages in thread
From: Thomas Monjalon @ 2016-10-28 13:42 UTC (permalink / raw)
  To: Ananyev, Konstantin; +Cc: dev

2016-10-28 12:59, Ananyev, Konstantin:
> > 2016-10-28 11:34, Ananyev, Konstantin:
> > > > > 2016-10-27 16:24, Ananyev, Konstantin:
> > > > > > From: Thomas Monjalon [mailto:thomas.monjalon@6wind.com]
> > > > > > > 2016-10-27 15:52, Ananyev, Konstantin:
> > > > > > > > > 2016-10-26 14:56, Tomasz Kulasek:
> > > > > > > > > > --- a/config/common_base
> > > > > > > > > > +++ b/config/common_base
> > > > > > > > > > +CONFIG_RTE_ETHDEV_TX_PREP=y
> > > > > > > > >
> > > > > > > > > We cannot enable it until it is implemented in every drivers.
> > > > > > > >
> > > > > > > > Not sure why?
> > > > > > > > If tx_pkt_prep == NULL, then rte_eth_tx_prep() would just act as noop.
> > > > > > > > Right now it is not mandatory for the PMD to implement it.
> > > > > > >
> > > > > > > If it is not implemented, the application must do the preparation by itself.
> > > > > > > From patch 6:
> > > > > > > "
> > > > > > > Removed pseudo header calculation for udp/tcp/tso packets from
> > > > > > > application and used Tx preparation API for packet preparation and
> > > > > > > verification.
> > > > > > > "
> > > > > > > So how does it behave with other drivers?
> > > > > >
> > > > > > Hmm, so it seems that we broke testpmd csumonly mode for non-Intel drivers...
> > > > > > My bad, missed that part completely.
> > > > > > Yes, then I suppose for now we'll need to support both (with and without) code paths for testpmd.
> > > > > > Probably a new fwd mode or just extra parameter for the existing one?
> > > > > > Any other suggestions?
> > > > >
> > > > > Please think about how we can use it in every application.
> > > > > It is not ready.
> > > > > Either we introduce the API without enabling it, or we implement it
> > > > > in every driver.
> > > >
> > > > I understand your position here, but would just like to point out that:
> > > > 1) It is new functionality, optional to use.
> > > >     The app is free not to use it and still do the preparation itself
> > > >     (as it has to do now).
> > > >     All existing apps would keep working as expected without using that function.
> > > >     Though if the app developer knows that tx_prep is implemented for all
> > > >     HW models he plans to run on, he is free to use it.
> > > > 2) It would be difficult for Tomasz (and other Intel guys) to implement tx_prep()
> > > >     for all the non-Intel HW that DPDK supports right now.
> > > >     We just don't have all the actual HW in stock, and probably not adequate
> > > >     knowledge of it either.
> > > >     So we depend here on the good will of other PMD maintainers/developers to
> > > >     implement tx_prep() for these devices.
> > > >     On the other hand, if it is disabled by default, then, I think,
> > > >     PMD developers just won't be motivated to implement it.
> > > >     So it will be left untested and unused forever.
> > >
> > > Actually as another thought:
> > > Can we have it enabled by default, but mark it as experimental or so?
> > > If memory serves me right, we've done that for cryptodev in the past, no?
> > 
> > Cryptodev was a whole new library.
> > We won't play the game "find which function is experimental or not".
> > 
> > We should not enable a function until it is fully implemented.
> > 
> > If the user really understands that it will work only with a few drivers
> > then he can change the build configuration himself.
> > Enabling in the default configuration is a message to say that it works
> > everywhere without any risk.
> > It's so simple that I don't even understand why I must argue for it.
> > 
> > And by the way, it is late for 16.11.
> 
> Ok, I understand your concern about enabling it by default and the testpmd breakage,
> but what else do you believe is not ready?

That's already a lot!
I also commented about function naming.
All these things are trivial to fix.
But it is late. After RC1, we should stop integrating new features.

> > I suggest to integrate it in the beginning of 17.02 cycle, with the hope
> > that you can convince other developers to implement it in other drivers,
> > so we could finally enable it in the default config.
> 
> Ok, any insights then on how we can convince people to do that?

You just have to explain clearly what this new feature brings
and what the future possibilities will be.

> BTW, it means that tx_prep() should become part of the mandatory API
> to be implemented by each PMD doing TX offloads, right?

Right.
The question is: what does "mandatory" mean?
Should we block some patches for non-compliant drivers?
Should we remove offloads capability from non-compliant drivers?

> > Oh, and I don't trust that nobody was thinking that it would break testpmd
> > for non-Intel drivers.
> 
> Well, believe it or not, yes, I missed that one.
> I think I already admitted that it was my fault, and apologized for that.

And it's my fault for not having seen that before.
I was hoping that good reviews would be done by other contributors.

> But sure, it is your choice to trust me here or not.

Konstantin, I trust you.
However, if nobody else was reviewing this patchset at Intel,
this is probably an issue.
And more importantly we must encourage other vendors to review such
major patch for the ethdev API.

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v11 1/6] ethdev: add Tx preparation
  2016-10-28 13:42                                         ` Thomas Monjalon
@ 2016-11-01 12:57                                           ` Ananyev, Konstantin
  2016-11-04 11:35                                             ` Thomas Monjalon
  0 siblings, 1 reply; 261+ messages in thread
From: Ananyev, Konstantin @ 2016-11-01 12:57 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev


Hi Thomas,
 
> > > I suggest to integrate it in the beginning of 17.02 cycle, with the hope
> > > that you can convince other developers to implement it in other drivers,
> > > so we could finally enable it in the default config.
> >
> > Ok, any insights then on how we can convince people to do that?
> 
> You just have to explain clearly what this new feature brings
> and what the future possibilities will be.
> 
> > BTW, it means that tx_prep() should become part of the mandatory API
> > to be implemented by each PMD doing TX offloads, right?
> 
> Right.
> The question is: what does "mandatory" mean?

For me "mandatory" here would mean that:
 - if the PMD supports TX offloads AND
 - if, to be able to use any of these offloads, the upper layer SW would have to:
	- modify the contents of the packet OR
	- obey HW specific restrictions
 then it is the PMD developer's responsibility to provide a tx_prep() that
 implements the expected modifications of the packet contents and the
 restriction checks.
Otherwise, a tx_prep() implementation is not required and it can safely be set to NULL.

Does that sound good enough to everyone?
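
To make that concrete, here is a rough sketch of what such a callback could
look like (illustrative only, not part of the patchset; the driver name and
its 8-segment limit are made up, and it simply reuses the checksum helper
from this patchset):

	/* Hypothetical PMD "xyz": HW requires the pseudo-header checksum to
	 * be pre-set and supports at most 8 segments per packet. */
	static uint16_t
	xyz_prep_pkts(__rte_unused void *txq, struct rte_mbuf **tx_pkts,
			uint16_t nb_pkts)
	{
		uint16_t i;

		for (i = 0; i < nb_pkts; i++) {
			/* HW-specific restriction check */
			if (tx_pkts[i]->nb_segs > 8) {
				rte_errno = -EINVAL;
				return i;	/* first invalid packet */
			}
			/* expected packet-content modification */
			if (rte_net_intel_cksum_prepare(tx_pkts[i]) != 0) {
				rte_errno = -EINVAL;
				return i;
			}
		}
		return i;
	}

	/* at device init time: */
	dev->tx_pkt_prepare = xyz_prep_pkts;	/* or NULL when the HW imposes
						 * no such requirements */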

> Should we block some patches for non-compliant drivers?

If we agree that it should be a 'mandatory' one, and a patch in question breaks
that requirement, then probably yes.

> Should we remove offloads capability from non-compliant drivers?

Do you mean existing PMDs?
Are there any particular ones right now that can't work properly with testpmd csumonly mode?

Konstantin


 

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v11 1/6] ethdev: add Tx preparation
  2016-11-01 12:57                                           ` Ananyev, Konstantin
@ 2016-11-04 11:35                                             ` Thomas Monjalon
  0 siblings, 0 replies; 261+ messages in thread
From: Thomas Monjalon @ 2016-11-04 11:35 UTC (permalink / raw)
  To: Ananyev, Konstantin; +Cc: dev

2016-11-01 12:57, Ananyev, Konstantin:
> > > > I suggest to integrate it in the beginning of 17.02 cycle, with the hope
> > > > that you can convince other developers to implement it in other drivers,
> > > > so we could finally enable it in the default config.
> > >
> > > Ok, any insights then on how we can convince people to do that?
> > 
> > You just have to explain clearly what this new feature brings
> > and what the future possibilities will be.
> > 
> > > BTW, it means that tx_prep() should become part of the mandatory API
> > > to be implemented by each PMD doing TX offloads, right?
> > 
> > Right.
> > The question is: what does "mandatory" mean?
> 
> For me "mandatory" here would mean that:
>  - if the PMD supports TX offloads AND
>  - if, to be able to use any of these offloads, the upper layer SW would have to:
> 	- modify the contents of the packet OR
> 	- obey HW specific restrictions
>  then it is the PMD developer's responsibility to provide a tx_prep() that
>  implements the expected modifications of the packet contents and the
>  restriction checks.
> Otherwise, a tx_prep() implementation is not required and it can safely be set to NULL.
> 
> Does that sound good enough to everyone?

Yes, good definition, thanks.

> > Should we block some patches for non-compliant drivers?
> 
> If we agree that it should be a 'mandatory' one, and a patch in question breaks
> that requirement, then probably yes.
> 
> > Should we remove offloads capability from non-compliant drivers?
> 
> Do you mean existing PMDs?
> Are there any particular ones right now that can't work properly with testpmd csumonly mode?

I cannot answer this question.
Before txprep, there is only one API: the application must prepare the
packet checksums itself (get_psd_sum in testpmd).
With txprep, the application has two choices: keep doing the job itself,
or call txprep, which calls a PMD-specific function.
The question is: do non-Intel drivers need a checksum preparation for TSO?
Will they behave well if txprep does nothing in these drivers?
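
To make the two choices concrete (an illustrative sketch; get_psd_sum is the
testpmd helper mentioned above, the other names are from the patchset, and the
surrounding variables are assumed to be set up as in testpmd's csum engine):

	/* choice 1: the application fixes the pseudo-header checksum itself,
	 * as testpmd's csum engine does today */
	tcp_hdr->cksum = get_psd_sum(l3_hdr, ethertype, ol_flags);

	/* choice 2: the application delegates the fix-up to the PMD */
	nb_prep = rte_eth_tx_prepare(port, queue, pkts, nb_pkts);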

When looking at the code, most drivers handle the TSO flags.
But it is hard to know whether they rely on the pseudo checksum or not.

git grep -l 'PKT_TX_UDP_CKSUM\|PKT_TX_TCP_CKSUM\|PKT_TX_TCP_SEG' drivers/net/

drivers/net/bnxt/bnxt_txr.c
drivers/net/cxgbe/sge.c
drivers/net/e1000/em_rxtx.c
drivers/net/e1000/igb_rxtx.c
drivers/net/ena/ena_ethdev.c
drivers/net/enic/enic_rxtx.c
drivers/net/fm10k/fm10k_rxtx.c
drivers/net/i40e/i40e_rxtx.c
drivers/net/ixgbe/ixgbe_rxtx.c
drivers/net/mlx4/mlx4.c
drivers/net/mlx5/mlx5_rxtx.c
drivers/net/nfp/nfp_net.c
drivers/net/qede/qede_rxtx.c
drivers/net/thunderx/nicvf_rxtx.c
drivers/net/virtio/virtio_rxtx.c
drivers/net/vmxnet3/vmxnet3_rxtx.c

^ permalink raw reply	[flat|nested] 261+ messages in thread

* [dpdk-dev] [PATCH v12 0/6] add Tx preparation
  2016-10-26 12:56                   ` [dpdk-dev] [PATCH v11 " Tomasz Kulasek
                                       ` (5 preceding siblings ...)
  2016-10-26 12:56                     ` [dpdk-dev] [PATCH v11 6/6] testpmd: use Tx preparation in csum engine Tomasz Kulasek
@ 2016-11-23 17:36                     ` Tomasz Kulasek
  2016-11-23 17:36                       ` [dpdk-dev] [PATCH v12 1/6] ethdev: " Tomasz Kulasek
                                         ` (7 more replies)
  6 siblings, 8 replies; 261+ messages in thread
From: Tomasz Kulasek @ 2016-11-23 17:36 UTC (permalink / raw)
  To: dev; +Cc: konstantin.ananyev, olivier.matz

As discussed in that thread:

http://dpdk.org/ml/archives/dev/2015-September/023603.html

Different NIC models depending on HW offload requested might impose
different requirements on packets to be TX-ed in terms of:

 - Max number of fragments per packet allowed
 - Max number of fragments per TSO segments
 - The way pseudo-header checksum should be pre-calculated
 - L3/L4 header fields filling
 - etc.


MOTIVATION:
-----------

1) Some work cannot (and should not) be done in rte_eth_tx_burst.
   However, this work is sometimes required, and now it is left to the
   application.

2) Different hardware may have different requirements for TX offloads;
   a different subset may be supported, and so on.

3) Some parameters (e.g. the number of segments in the ixgbe driver) may
   hang the device. These parameters may vary between devices.

   For example, i40e HW allows 8 fragments per packet, but that is after
   TSO segmentation, while ixgbe has a 38-fragment pre-TSO limit.

4) Fields in the packet may require different initialization (e.g.
   pseudo-header checksum precalculation, sometimes done in a different
   way depending on packet type, and so on). Currently the application
   needs to take care of this.

5) Using an additional API (rte_eth_tx_prepare) before rte_eth_tx_burst
   lets the application prepare the packet burst in a form acceptable to
   the specific device.

6) Some additional checks may be done in debug mode, keeping the tx_burst
   implementation clean.


PROPOSAL:
---------

To help the user deal with all this variety, we propose to:

1) Introduce a rte_eth_tx_prepare() function to do the necessary
   preparations of a packet burst so it can be safely transmitted on the
   device with the desired HW offloads (set/reset the checksum field
   according to the hardware requirements) and to check HW constraints
   (number of segments per packet, etc).

   Since the limitations and requirements may differ between devices, this
   requires extending the rte_eth_dev structure with a new function
   pointer, "tx_pkt_prepare", which can be implemented in the driver to
   prepare and verify packets, in a device-specific way, before the burst,
   and which should prevent the application from sending malformed packets.

2) Also, new fields will be introduced in rte_eth_desc_lim:
   nb_seg_max and nb_mtu_seg_max, providing information about the maximum
   number of segments in TSO and non-TSO packets acceptable to the device.

   This information is useful for the application to avoid creating
   malformed packets.
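
   For example (an illustrative sketch; how the application handles an
   oversegmented packet is left open):

	struct rte_eth_dev_info dev_info;

	rte_eth_dev_info_get(port, &dev_info);

	/* check the packet against the advertised segment limits */
	if (mbuf->ol_flags & PKT_TX_TCP_SEG) {
		if (mbuf->nb_segs > dev_info.tx_desc_lim.nb_seg_max) {
			/* split or drop the packet */
		}
	} else if (mbuf->nb_segs > dev_info.tx_desc_lim.nb_mtu_seg_max) {
		/* split or drop the packet */
	}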


APPLICATION (USE CASE):
-----------------------

1) The application should initialize the burst of packets to send and set
   the required tx offload flags and fields, like l2_len, l3_len, l4_len,
   and tso_segsz.

2) The application passes the burst to rte_eth_tx_prepare to check the
   conditions required to send the packets through the NIC.

3) The result of rte_eth_tx_prepare can be used to send the valid packets
   and/or to restore the invalid ones if the function fails.

e.g.

	for (i = 0; i < nb_pkts; i++) {

		/* initialize or process packet */

		bufs[i]->tso_segsz = 800;
		bufs[i]->ol_flags = PKT_TX_TCP_SEG | PKT_TX_IPV4
				| PKT_TX_IP_CKSUM;
		bufs[i]->l2_len = sizeof(struct ether_hdr);
		bufs[i]->l3_len = sizeof(struct ipv4_hdr);
		bufs[i]->l4_len = sizeof(struct tcp_hdr);
	}

	/* Prepare burst of TX packets */
	nb_prep = rte_eth_tx_prepare(port, 0, bufs, nb_pkts);

	if (nb_prep < nb_pkts) {
		printf("Tx prepare failed\n");

		/* nb_prep here indicates the first invalid packet.
		 * rte_eth_tx_prepare can be used on the remaining packets
		 * to find further invalid ones.
		 */

	}

	/* Send burst of TX packets */
	nb_tx = rte_eth_tx_burst(port, 0, bufs, nb_prep);

	/* Free any unsent packets. */
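
A possible loop over the whole burst could look like this (an illustrative
sketch only, not part of the patchset; it assumes the application simply
drops every packet that fails preparation):

	uint16_t ofs = 0;

	while (ofs < nb_pkts) {
		uint16_t nb_ok, nb_sent;

		nb_ok = rte_eth_tx_prepare(port, 0, &bufs[ofs],
				nb_pkts - ofs);
		nb_sent = rte_eth_tx_burst(port, 0, &bufs[ofs], nb_ok);
		ofs += nb_sent;

		if (nb_sent < nb_ok)
			break;		/* TX queue full, give up */

		if (ofs < nb_pkts)	/* bufs[ofs] failed preparation */
			rte_pktmbuf_free(bufs[ofs++]);
	}

	while (ofs < nb_pkts)		/* free whatever was not sent */
		rte_pktmbuf_free(bufs[ofs++]);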

	
v12 changes:
 - renamed API function from "rte_eth_tx_prep" to "rte_eth_tx_prepare"
   (to be not confused with "prepend")
 - changed "rte_phdr_cksum_fix" to "rte_net_intel_cksum_prepare"
 - added a "csum txprep (on|off)" command to the csum engine, allowing one
   to select the txprep path for packet processing
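
   For example, at the testpmd prompt (using the syntax above):

	testpmd> csum txprep on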

v11 changes:
 - updated comments
 - added information to the API description about packet data
   requirements/limitations.

v10 changes:
 - moved the drivers' tx callback check in rte_eth_tx_prep after the queue_id check

v9 changes:
 - fixed headers structure fragmentation check
 - moved fragmentation check into rte_validate_tx_offload()

v8 changes:
 - mbuf argument in rte_validate_tx_offload declared as const

v7 changes:
 - comments reworded/added
 - changed errno values returned from Tx prep API
 - added a check in rte_phdr_cksum_fix that headers are in the first
   data segment and can be safely modified
 - moved rte_validate_tx_offload to rte_mbuf
 - moved rte_phdr_cksum_fix to rte_net.h
 - removed the new rte_pkt.h file as useless

v6 changes:
 - added performance impact test results to the patch description

v5 changes:
 - rebased csum engine modification
 - added information to the csum engine about performance tests
 - some performance improvements

v4 changes:
 - tx_prep is now set to default behavior (NULL) for simple/vector path
   in fm10k, i40e and ixgbe drivers to increase performance, when
   Tx offloads are not intentionally available

v3 changes:
 - reworked the csum testpmd engine instead of adding a new one,
 - fixed checksum initialization procedure to include also outer
   checksum offloads,
 - some minor formatting and optimization fixes

v2 changes:
 - rte_eth_tx_prep() returns the number of packets when the device doesn't
   support the tx_prep functionality,
 - introduced CONFIG_RTE_ETHDEV_TX_PREP, allowing tx_prep to be turned off


Tomasz Kulasek (6):
  ethdev: add Tx preparation
  e1000: add Tx preparation
  fm10k: add Tx preparation
  i40e: add Tx preparation
  ixgbe: add Tx preparation
  testpmd: use Tx preparation in csum engine

 app/test-pmd/cmdline.c           |   49 ++++++++++++++++++
 app/test-pmd/csumonly.c          |   33 +++++++++---
 app/test-pmd/testpmd.c           |    5 ++
 app/test-pmd/testpmd.h           |    2 +
 config/common_base               |    1 +
 drivers/net/e1000/e1000_ethdev.h |   11 ++++
 drivers/net/e1000/em_ethdev.c    |    5 +-
 drivers/net/e1000/em_rxtx.c      |   48 ++++++++++++++++-
 drivers/net/e1000/igb_ethdev.c   |    4 ++
 drivers/net/e1000/igb_rxtx.c     |   52 ++++++++++++++++++-
 drivers/net/fm10k/fm10k.h        |    6 +++
 drivers/net/fm10k/fm10k_ethdev.c |    5 ++
 drivers/net/fm10k/fm10k_rxtx.c   |   50 +++++++++++++++++-
 drivers/net/i40e/i40e_ethdev.c   |    3 ++
 drivers/net/i40e/i40e_rxtx.c     |   72 +++++++++++++++++++++++++-
 drivers/net/i40e/i40e_rxtx.h     |    8 +++
 drivers/net/ixgbe/ixgbe_ethdev.c |    3 ++
 drivers/net/ixgbe/ixgbe_ethdev.h |    5 +-
 drivers/net/ixgbe/ixgbe_rxtx.c   |   56 ++++++++++++++++++++
 drivers/net/ixgbe/ixgbe_rxtx.h   |    2 +
 lib/librte_ether/rte_ethdev.h    |  106 ++++++++++++++++++++++++++++++++++++++
 lib/librte_mbuf/rte_mbuf.h       |   64 +++++++++++++++++++++++
 lib/librte_net/rte_net.h         |   85 ++++++++++++++++++++++++++++++
 23 files changed, 662 insertions(+), 13 deletions(-)

-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 261+ messages in thread

* [dpdk-dev] [PATCH v12 1/6] ethdev: add Tx preparation
  2016-11-23 17:36                     ` [dpdk-dev] [PATCH v12 0/6] add Tx preparation Tomasz Kulasek
@ 2016-11-23 17:36                       ` Tomasz Kulasek
  2016-11-28 10:54                         ` Thomas Monjalon
                                           ` (2 more replies)
  2016-11-23 17:36                       ` [dpdk-dev] [PATCH v12 2/6] e1000: " Tomasz Kulasek
                                         ` (6 subsequent siblings)
  7 siblings, 3 replies; 261+ messages in thread
From: Tomasz Kulasek @ 2016-11-23 17:36 UTC (permalink / raw)
  To: dev; +Cc: konstantin.ananyev, olivier.matz

Added API for `rte_eth_tx_prepare`

uint16_t rte_eth_tx_prepare(uint8_t port_id, uint16_t queue_id,
	struct rte_mbuf **tx_pkts, uint16_t nb_pkts)

Added fields to the `struct rte_eth_desc_lim`:

	uint16_t nb_seg_max;
		/**< Max number of segments per whole packet. */

	uint16_t nb_mtu_seg_max;
		/**< Max number of segments per one MTU */

Added functions:

int rte_validate_tx_offload(const struct rte_mbuf *m)
	to validate general requirements for the tx offloads set in the mbuf of
	a packet, such as flag completeness. In the current implementation this
	function is called optionally, when RTE_LIBRTE_ETHDEV_DEBUG is enabled.

int rte_net_intel_cksum_prepare(struct rte_mbuf *m)
	to fix the pseudo-header checksum for TSO and non-TSO tcp/udp packets
	before hardware tx checksum offload.
	 - for non-TSO tcp/udp packets the full pseudo-header checksum is
	   computed and set.
	 - for TSO the IP payload length is not included.

PERFORMANCE TESTS
-----------------

This feature was tested with modified csum engine from test-pmd.

The packet checksum preparation was moved from the application to the Tx
preparation step placed before the burst.

We may expect some overhead costs caused by:
1) using additional callback before burst,
2) rescanning burst,
3) additional condition checking (packet validation),
4) worse optimization (e.g. packet data access, etc.)

We tested it using the ixgbe Tx preparation implementation with some parts
disabled, to have comparable information about the impact of the different
parts of the implementation.

IMPACT:

1) For an unimplemented Tx preparation callback the performance impact is
   negligible,
2) For the packet condition checks without checksum modifications (nb_segs,
   available offloads, etc.) it is 14626628/14252168 (~2.62% drop),
3) For full support in the ixgbe driver (point 2 + packet checksum
   initialization) it is 14060924/13588094 (~3.48% drop)

Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
---
 config/common_base            |    1 +
 lib/librte_ether/rte_ethdev.h |  106 +++++++++++++++++++++++++++++++++++++++++
 lib/librte_mbuf/rte_mbuf.h    |   64 +++++++++++++++++++++++++
 lib/librte_net/rte_net.h      |   85 +++++++++++++++++++++++++++++++++
 4 files changed, 256 insertions(+)

diff --git a/config/common_base b/config/common_base
index 4bff83a..d609a88 100644
--- a/config/common_base
+++ b/config/common_base
@@ -120,6 +120,7 @@ CONFIG_RTE_MAX_QUEUES_PER_PORT=1024
 CONFIG_RTE_LIBRTE_IEEE1588=n
 CONFIG_RTE_ETHDEV_QUEUE_STAT_CNTRS=16
 CONFIG_RTE_ETHDEV_RXTX_CALLBACKS=y
+CONFIG_RTE_ETHDEV_TX_PREPARE=y
 
 #
 # Support NIC bypass logic
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 9678179..4ffc1b3 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -182,6 +182,7 @@
 #include <rte_pci.h>
 #include <rte_dev.h>
 #include <rte_devargs.h>
+#include <rte_errno.h>
 #include "rte_ether.h"
 #include "rte_eth_ctrl.h"
 #include "rte_dev_info.h"
@@ -702,6 +703,8 @@ struct rte_eth_desc_lim {
 	uint16_t nb_max;   /**< Max allowed number of descriptors. */
 	uint16_t nb_min;   /**< Min allowed number of descriptors. */
 	uint16_t nb_align; /**< Number of descriptors should be aligned to. */
+	uint16_t nb_seg_max;     /**< Max number of segments per whole packet. */
+	uint16_t nb_mtu_seg_max; /**< Max number of segments per one MTU */
 };
 
 /**
@@ -1191,6 +1194,11 @@ typedef uint16_t (*eth_tx_burst_t)(void *txq,
 				   uint16_t nb_pkts);
 /**< @internal Send output packets on a transmit queue of an Ethernet device. */
 
+typedef uint16_t (*eth_tx_prep_t)(void *txq,
+				   struct rte_mbuf **tx_pkts,
+				   uint16_t nb_pkts);
+/**< @internal Prepare output packets on a transmit queue of an Ethernet device. */
+
 typedef int (*flow_ctrl_get_t)(struct rte_eth_dev *dev,
 			       struct rte_eth_fc_conf *fc_conf);
 /**< @internal Get current flow control parameter on an Ethernet device */
@@ -1625,6 +1633,7 @@ struct rte_eth_rxtx_callback {
 struct rte_eth_dev {
 	eth_rx_burst_t rx_pkt_burst; /**< Pointer to PMD receive function. */
 	eth_tx_burst_t tx_pkt_burst; /**< Pointer to PMD transmit function. */
+	eth_tx_prep_t tx_pkt_prepare; /**< Pointer to PMD transmit prepare function. */
 	struct rte_eth_dev_data *data;  /**< Pointer to device data */
 	const struct eth_driver *driver;/**< Driver for this device */
 	const struct eth_dev_ops *dev_ops; /**< Functions exported by PMD */
@@ -2819,6 +2828,103 @@ int rte_eth_dev_set_vlan_ether_type(uint8_t port_id,
 	return (*dev->tx_pkt_burst)(dev->data->tx_queues[queue_id], tx_pkts, nb_pkts);
 }
 
+/**
+ * Process a burst of output packets on a transmit queue of an Ethernet device.
+ *
+ * The rte_eth_tx_prepare() function is invoked to prepare output packets to be
+ * transmitted on the output queue *queue_id* of the Ethernet device designated
+ * by its *port_id*.
+ * The *nb_pkts* parameter is the number of packets to be prepared which are
+ * supplied in the *tx_pkts* array of *rte_mbuf* structures, each of them
+ * allocated from a pool created with rte_pktmbuf_pool_create().
+ * For each packet to send, the rte_eth_tx_prepare() function performs
+ * the following operations:
+ *
+ * - Check if the packet meets the device's requirements for tx offloads.
+ *
+ * - Check limitations about number of segments.
+ *
+ * - Check additional requirements when debug is enabled.
+ *
+ * - Update and/or reset required checksums when tx offload is set for packet.
+ *
+ * Since this function can modify packet data, provided mbufs must be safely
+ * writable (e.g. modified data cannot be in shared segment).
+ *
+ * The rte_eth_tx_prepare() function returns the number of packets ready to be
+ * sent. A return value equal to *nb_pkts* means that all packets are valid and
+ * ready to be sent, otherwise it stops processing on the first invalid packet
+ * and leaves the rest of the packets untouched.
+ *
+ * When this functionality is not implemented in the driver, all the packets
+ * are returned untouched.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ *   The value must be a valid port id.
+ * @param queue_id
+ *   The index of the transmit queue through which output packets must be
+ *   sent.
+ *   The value must be in the range [0, nb_tx_queue - 1] previously supplied
+ *   to rte_eth_dev_configure().
+ * @param tx_pkts
+ *   The address of an array of *nb_pkts* pointers to *rte_mbuf* structures
+ *   which contain the output packets.
+ * @param nb_pkts
+ *   The maximum number of packets to process.
+ * @return
+ *   The number of packets correct and ready to be sent. The return value can be
+ *   less than the value of the *nb_pkts* parameter when some packet doesn't
+ *   meet the device's requirements with rte_errno set appropriately:
+ *   - -EINVAL: offload flags are not correctly set
+ *   - -ENOTSUP: the offload feature is not supported by the hardware
+ *
+ */
+
+#ifdef RTE_ETHDEV_TX_PREPARE
+
+static inline uint16_t
+rte_eth_tx_prepare(uint8_t port_id, uint16_t queue_id,
+		struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
+{
+	struct rte_eth_dev *dev;
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+	if (!rte_eth_dev_is_valid_port(port_id)) {
+		RTE_PMD_DEBUG_TRACE("Invalid TX port_id=%d\n", port_id);
+		rte_errno = -EINVAL;
+		return 0;
+	}
+#endif
+
+	dev = &rte_eth_devices[port_id];
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+	if (queue_id >= dev->data->nb_tx_queues) {
+		RTE_PMD_DEBUG_TRACE("Invalid TX queue_id=%d\n", queue_id);
+		rte_errno = -EINVAL;
+		return 0;
+	}
+#endif
+
+	if (!dev->tx_pkt_prepare)
+		return nb_pkts;
+
+	return (*dev->tx_pkt_prepare)(dev->data->tx_queues[queue_id],
+			tx_pkts, nb_pkts);
+}
+
+#else
+
+static inline uint16_t
+rte_eth_tx_prepare(__rte_unused uint8_t port_id, __rte_unused uint16_t queue_id,
+		__rte_unused struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
+{
+	return nb_pkts;
+}
+
+#endif
+
 typedef void (*buffer_tx_error_fn)(struct rte_mbuf **unsent, uint16_t count,
 		void *userdata);
 
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index ead7c6e..39ee5ed 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -283,6 +283,19 @@
  */
 #define PKT_TX_OUTER_IPV6    (1ULL << 60)
 
+/**
+ * Bit mask of all supported packet Tx offload feature flags which can be set
+ * for a packet.
+ */
+#define PKT_TX_OFFLOAD_MASK (    \
+		PKT_TX_IP_CKSUM |        \
+		PKT_TX_L4_MASK |         \
+		PKT_TX_OUTER_IP_CKSUM |  \
+		PKT_TX_TCP_SEG |         \
+		PKT_TX_QINQ_PKT |        \
+		PKT_TX_VLAN_PKT |        \
+		PKT_TX_TUNNEL_MASK)
+
 #define __RESERVED           (1ULL << 61) /**< reserved for future mbuf use */
 
 #define IND_ATTACHED_MBUF    (1ULL << 62) /**< Indirect attached mbuf */
@@ -1647,6 +1660,57 @@ static inline int rte_pktmbuf_chain(struct rte_mbuf *head, struct rte_mbuf *tail
 }
 
 /**
+ * Validate general requirements for tx offload in mbuf.
+ *
+ * This function checks correctness and completeness of Tx offload settings.
+ *
+ * @param m
+ *   The packet mbuf to be validated.
+ * @return
+ *   0 if packet is valid
+ */
+static inline int
+rte_validate_tx_offload(const struct rte_mbuf *m)
+{
+	uint64_t ol_flags = m->ol_flags;
+	uint64_t inner_l3_offset = m->l2_len;
+
+	/* Does packet set any of available offloads? */
+	if (!(ol_flags & PKT_TX_OFFLOAD_MASK))
+		return 0;
+
+	if (ol_flags & PKT_TX_OUTER_IP_CKSUM)
+		inner_l3_offset += m->outer_l2_len + m->outer_l3_len;
+
+	/* Headers are fragmented */
+	if (rte_pktmbuf_data_len(m) < inner_l3_offset + m->l3_len + m->l4_len)
+		return -ENOTSUP;
+
+	/* IP checksum can be counted only for IPv4 packet */
+	if ((ol_flags & PKT_TX_IP_CKSUM) && (ol_flags & PKT_TX_IPV6))
+		return -EINVAL;
+
+	/* IP type not set when required */
+	if (ol_flags & (PKT_TX_L4_MASK | PKT_TX_TCP_SEG))
+		if (!(ol_flags & (PKT_TX_IPV4 | PKT_TX_IPV6)))
+			return -EINVAL;
+
+	/* Check requirements for TSO packet */
+	if (ol_flags & PKT_TX_TCP_SEG)
+		if ((m->tso_segsz == 0) ||
+				((ol_flags & PKT_TX_IPV4) &&
+				!(ol_flags & PKT_TX_IP_CKSUM)))
+			return -EINVAL;
+
+	/* PKT_TX_OUTER_IP_CKSUM set for non outer IPv4 packet. */
+	if ((ol_flags & PKT_TX_OUTER_IP_CKSUM) &&
+			!(ol_flags & PKT_TX_OUTER_IPV4))
+		return -EINVAL;
+
+	return 0;
+}
+
+/**
  * Dump an mbuf structure to a file.
  *
  * Dump all fields for the given packet mbuf and all its associated
diff --git a/lib/librte_net/rte_net.h b/lib/librte_net/rte_net.h
index d4156ae..85f356d 100644
--- a/lib/librte_net/rte_net.h
+++ b/lib/librte_net/rte_net.h
@@ -38,6 +38,11 @@
 extern "C" {
 #endif
 
+#include <rte_ip.h>
+#include <rte_udp.h>
+#include <rte_tcp.h>
+#include <rte_sctp.h>
+
 /**
  * Structure containing header lengths associated to a packet, filled
  * by rte_net_get_ptype().
@@ -86,6 +91,86 @@ struct rte_net_hdr_lens {
 uint32_t rte_net_get_ptype(const struct rte_mbuf *m,
 	struct rte_net_hdr_lens *hdr_lens, uint32_t layers);
 
+/**
+ * Prepare pseudo header checksum
+ *
+ * This function prepares the pseudo header checksum for TSO and non-TSO
+ * tcp/udp packets in the provided mbuf's packet data.
+ *
+ * - for non-TSO tcp/udp packets the full pseudo-header checksum is computed
+ *   and set in the packet data,
+ * - for TSO the IP payload length is not included in the pseudo header.
+ *
+ * This function expects that the headers are in the first data segment of
+ * the mbuf, are not fragmented, and can be safely modified.
+ *
+ * @param m
+ *   The packet mbuf to be fixed.
+ * @return
+ *   0 if checksum is initialized properly
+ */
+static inline int
+rte_net_intel_cksum_prepare(struct rte_mbuf *m)
+{
+	struct ipv4_hdr *ipv4_hdr;
+	struct ipv6_hdr *ipv6_hdr;
+	struct tcp_hdr *tcp_hdr;
+	struct udp_hdr *udp_hdr;
+	uint64_t ol_flags = m->ol_flags;
+	uint64_t inner_l3_offset = m->l2_len;
+
+	if (ol_flags & PKT_TX_OUTER_IP_CKSUM)
+		inner_l3_offset += m->outer_l2_len + m->outer_l3_len;
+
+	if ((ol_flags & PKT_TX_UDP_CKSUM) == PKT_TX_UDP_CKSUM) {
+		if (ol_flags & PKT_TX_IPV4) {
+			ipv4_hdr = rte_pktmbuf_mtod_offset(m, struct ipv4_hdr *,
+					inner_l3_offset);
+
+			if (ol_flags & PKT_TX_IP_CKSUM)
+				ipv4_hdr->hdr_checksum = 0;
+
+			udp_hdr = (struct udp_hdr *)((char *)ipv4_hdr +
+					m->l3_len);
+			udp_hdr->dgram_cksum = rte_ipv4_phdr_cksum(ipv4_hdr,
+					ol_flags);
+		} else {
+			ipv6_hdr = rte_pktmbuf_mtod_offset(m, struct ipv6_hdr *,
+					inner_l3_offset);
+			/* non-TSO udp */
+			udp_hdr = rte_pktmbuf_mtod_offset(m, struct udp_hdr *,
+					inner_l3_offset + m->l3_len);
+			udp_hdr->dgram_cksum = rte_ipv6_phdr_cksum(ipv6_hdr,
+					ol_flags);
+		}
+	} else if ((ol_flags & PKT_TX_TCP_CKSUM) ||
+			(ol_flags & PKT_TX_TCP_SEG)) {
+		if (ol_flags & PKT_TX_IPV4) {
+			ipv4_hdr = rte_pktmbuf_mtod_offset(m, struct ipv4_hdr *,
+					inner_l3_offset);
+
+			if (ol_flags & PKT_TX_IP_CKSUM)
+				ipv4_hdr->hdr_checksum = 0;
+
+			/* non-TSO tcp or TSO */
+			tcp_hdr = (struct tcp_hdr *)((char *)ipv4_hdr +
+					m->l3_len);
+			tcp_hdr->cksum = rte_ipv4_phdr_cksum(ipv4_hdr,
+					ol_flags);
+		} else {
+			ipv6_hdr = rte_pktmbuf_mtod_offset(m, struct ipv6_hdr *,
+					inner_l3_offset);
+			/* non-TSO tcp or TSO */
+			tcp_hdr = rte_pktmbuf_mtod_offset(m, struct tcp_hdr *,
+					inner_l3_offset + m->l3_len);
+			tcp_hdr->cksum = rte_ipv6_phdr_cksum(ipv6_hdr,
+					ol_flags);
+		}
+	}
+
+	return 0;
+}
+
 #ifdef __cplusplus
 }
 #endif
-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 261+ messages in thread

* [dpdk-dev] [PATCH v12 2/6] e1000: add Tx preparation
  2016-11-23 17:36                     ` [dpdk-dev] [PATCH v12 0/6] add Tx preparation Tomasz Kulasek
  2016-11-23 17:36                       ` [dpdk-dev] [PATCH v12 1/6] ethdev: " Tomasz Kulasek
@ 2016-11-23 17:36                       ` Tomasz Kulasek
  2016-11-23 17:36                       ` [dpdk-dev] [PATCH v12 3/6] fm10k: " Tomasz Kulasek
                                         ` (5 subsequent siblings)
  7 siblings, 0 replies; 261+ messages in thread
From: Tomasz Kulasek @ 2016-11-23 17:36 UTC (permalink / raw)
  To: dev; +Cc: konstantin.ananyev, olivier.matz

Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 drivers/net/e1000/e1000_ethdev.h |   11 ++++++++
 drivers/net/e1000/em_ethdev.c    |    5 +++-
 drivers/net/e1000/em_rxtx.c      |   48 ++++++++++++++++++++++++++++++++++-
 drivers/net/e1000/igb_ethdev.c   |    4 +++
 drivers/net/e1000/igb_rxtx.c     |   52 +++++++++++++++++++++++++++++++++++++-
 5 files changed, 117 insertions(+), 3 deletions(-)

diff --git a/drivers/net/e1000/e1000_ethdev.h b/drivers/net/e1000/e1000_ethdev.h
index 6c25c8d..bd0f277 100644
--- a/drivers/net/e1000/e1000_ethdev.h
+++ b/drivers/net/e1000/e1000_ethdev.h
@@ -138,6 +138,11 @@
 #define E1000_MISC_VEC_ID               RTE_INTR_VEC_ZERO_OFFSET
 #define E1000_RX_VEC_START              RTE_INTR_VEC_RXTX_OFFSET
 
+#define IGB_TX_MAX_SEG     UINT8_MAX
+#define IGB_TX_MAX_MTU_SEG UINT8_MAX
+#define EM_TX_MAX_SEG      UINT8_MAX
+#define EM_TX_MAX_MTU_SEG  UINT8_MAX
+
 /* structure for interrupt relative data */
 struct e1000_interrupt {
 	uint32_t flags;
@@ -315,6 +320,9 @@ int eth_igb_tx_queue_setup(struct rte_eth_dev *dev, uint16_t tx_queue_id,
 uint16_t eth_igb_xmit_pkts(void *txq, struct rte_mbuf **tx_pkts,
 		uint16_t nb_pkts);
 
+uint16_t eth_igb_prep_pkts(void *txq, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts);
+
 uint16_t eth_igb_recv_pkts(void *rxq, struct rte_mbuf **rx_pkts,
 		uint16_t nb_pkts);
 
@@ -376,6 +384,9 @@ int eth_em_tx_queue_setup(struct rte_eth_dev *dev, uint16_t tx_queue_id,
 uint16_t eth_em_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		uint16_t nb_pkts);
 
+uint16_t eth_em_prep_pkts(void *txq, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts);
+
 uint16_t eth_em_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
 		uint16_t nb_pkts);
 
diff --git a/drivers/net/e1000/em_ethdev.c b/drivers/net/e1000/em_ethdev.c
index aee3d34..a004ee9 100644
--- a/drivers/net/e1000/em_ethdev.c
+++ b/drivers/net/e1000/em_ethdev.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -300,6 +300,7 @@ static int eth_em_set_mc_addr_list(struct rte_eth_dev *dev,
 	eth_dev->dev_ops = &eth_em_ops;
 	eth_dev->rx_pkt_burst = (eth_rx_burst_t)&eth_em_recv_pkts;
 	eth_dev->tx_pkt_burst = (eth_tx_burst_t)&eth_em_xmit_pkts;
+	eth_dev->tx_pkt_prepare = (eth_tx_prep_t)&eth_em_prep_pkts;
 
 	/* for secondary processes, we don't initialise any further as primary
 	 * has already done this work. Only check we don't need a different
@@ -1079,6 +1080,8 @@ static int eth_em_set_mc_addr_list(struct rte_eth_dev *dev,
 		.nb_max = E1000_MAX_RING_DESC,
 		.nb_min = E1000_MIN_RING_DESC,
 		.nb_align = EM_TXD_ALIGN,
+		.nb_seg_max = EM_TX_MAX_SEG,
+		.nb_mtu_seg_max = EM_TX_MAX_MTU_SEG,
 	};
 
 	dev_info->speed_capa = ETH_LINK_SPEED_10M_HD | ETH_LINK_SPEED_10M |
diff --git a/drivers/net/e1000/em_rxtx.c b/drivers/net/e1000/em_rxtx.c
index 41f51c0..7e271ad 100644
--- a/drivers/net/e1000/em_rxtx.c
+++ b/drivers/net/e1000/em_rxtx.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -66,6 +66,7 @@
 #include <rte_udp.h>
 #include <rte_tcp.h>
 #include <rte_sctp.h>
+#include <rte_net.h>
 #include <rte_string_fns.h>
 
 #include "e1000_logs.h"
@@ -77,6 +78,14 @@
 
 #define E1000_RXDCTL_GRAN	0x01000000 /* RXDCTL Granularity */
 
+#define E1000_TX_OFFLOAD_MASK ( \
+		PKT_TX_IP_CKSUM |       \
+		PKT_TX_L4_MASK |        \
+		PKT_TX_VLAN_PKT)
+
+#define E1000_TX_OFFLOAD_NOTSUP_MASK \
+		(PKT_TX_OFFLOAD_MASK ^ E1000_TX_OFFLOAD_MASK)
+
 /**
  * Structure associated with each descriptor of the RX ring of a RX queue.
  */
@@ -618,6 +627,43 @@ struct em_tx_queue {
 
 /*********************************************************************
  *
+ *  TX prep functions
+ *
+ **********************************************************************/
+uint16_t
+eth_em_prep_pkts(__rte_unused void *tx_queue, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts)
+{
+	int i, ret;
+	struct rte_mbuf *m;
+
+	for (i = 0; i < nb_pkts; i++) {
+		m = tx_pkts[i];
+
+		if (m->ol_flags & E1000_TX_OFFLOAD_NOTSUP_MASK) {
+			rte_errno = -ENOTSUP;
+			return i;
+		}
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+		ret = rte_validate_tx_offload(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+#endif
+		ret = rte_net_intel_cksum_prepare(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+	}
+
+	return i;
+}
+
+/*********************************************************************
+ *
  *  RX functions
  *
  **********************************************************************/
diff --git a/drivers/net/e1000/igb_ethdev.c b/drivers/net/e1000/igb_ethdev.c
index 2fddf0c..015ef46 100644
--- a/drivers/net/e1000/igb_ethdev.c
+++ b/drivers/net/e1000/igb_ethdev.c
@@ -369,6 +369,8 @@ static void eth_igbvf_interrupt_handler(struct rte_intr_handle *handle,
 	.nb_max = E1000_MAX_RING_DESC,
 	.nb_min = E1000_MIN_RING_DESC,
 	.nb_align = IGB_RXD_ALIGN,
+	.nb_seg_max = IGB_TX_MAX_SEG,
+	.nb_mtu_seg_max = IGB_TX_MAX_MTU_SEG,
 };
 
 static const struct eth_dev_ops eth_igb_ops = {
@@ -760,6 +762,7 @@ struct rte_igb_xstats_name_off {
 	eth_dev->dev_ops = &eth_igb_ops;
 	eth_dev->rx_pkt_burst = &eth_igb_recv_pkts;
 	eth_dev->tx_pkt_burst = &eth_igb_xmit_pkts;
+	eth_dev->tx_pkt_prepare = &eth_igb_prep_pkts;
 
 	/* for secondary processes, we don't initialise any further as primary
 	 * has already done this work. Only check we don't need a different
@@ -963,6 +966,7 @@ struct rte_igb_xstats_name_off {
 	eth_dev->dev_ops = &igbvf_eth_dev_ops;
 	eth_dev->rx_pkt_burst = &eth_igb_recv_pkts;
 	eth_dev->tx_pkt_burst = &eth_igb_xmit_pkts;
+	eth_dev->tx_pkt_prepare = &eth_igb_prep_pkts;
 
 	/* for secondary processes, we don't initialise any further as primary
 	 * has already done this work. Only check we don't need a different
diff --git a/drivers/net/e1000/igb_rxtx.c b/drivers/net/e1000/igb_rxtx.c
index dbd37ac..8a3a3db 100644
--- a/drivers/net/e1000/igb_rxtx.c
+++ b/drivers/net/e1000/igb_rxtx.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -65,6 +65,7 @@
 #include <rte_udp.h>
 #include <rte_tcp.h>
 #include <rte_sctp.h>
+#include <rte_net.h>
 #include <rte_string_fns.h>
 
 #include "e1000_logs.h"
@@ -78,6 +79,9 @@
 		PKT_TX_L4_MASK |		 \
 		PKT_TX_TCP_SEG)
 
+#define IGB_TX_OFFLOAD_NOTSUP_MASK \
+		(PKT_TX_OFFLOAD_MASK ^ IGB_TX_OFFLOAD_MASK)
+
 /**
  * Structure associated with each descriptor of the RX ring of a RX queue.
  */
@@ -616,6 +620,51 @@ struct igb_tx_queue {
 
 /*********************************************************************
  *
+ *  TX prep functions
+ *
+ **********************************************************************/
+uint16_t
+eth_igb_prep_pkts(__rte_unused void *tx_queue, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts)
+{
+	int i, ret;
+	struct rte_mbuf *m;
+
+	for (i = 0; i < nb_pkts; i++) {
+		m = tx_pkts[i];
+
+		/* Check some limitations for TSO in hardware */
+		if (m->ol_flags & PKT_TX_TCP_SEG)
+			if ((m->tso_segsz > IGB_TSO_MAX_MSS) || (m->l2_len + m->l3_len +
+					m->l4_len > IGB_TSO_MAX_HDRLEN)) {
+				rte_errno = -EINVAL;
+				return i;
+			}
+
+		if (m->ol_flags & IGB_TX_OFFLOAD_NOTSUP_MASK) {
+			rte_errno = -ENOTSUP;
+			return i;
+		}
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+		ret = rte_validate_tx_offload(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+#endif
+		ret = rte_net_intel_cksum_prepare(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+	}
+
+	return i;
+}
+
+/*********************************************************************
+ *
  *  RX functions
  *
  **********************************************************************/
@@ -1364,6 +1413,7 @@ struct igb_tx_queue {
 
 	igb_reset_tx_queue(txq, dev);
 	dev->tx_pkt_burst = eth_igb_xmit_pkts;
+	dev->tx_pkt_prepare = &eth_igb_prep_pkts;
 	dev->data->tx_queues[queue_idx] = txq;
 
 	return 0;
-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 261+ messages in thread

* [dpdk-dev] [PATCH v12 3/6] fm10k: add Tx preparation
  2016-11-23 17:36                     ` [dpdk-dev] [PATCH v12 0/6] add Tx preparation Tomasz Kulasek
  2016-11-23 17:36                       ` [dpdk-dev] [PATCH v12 1/6] ethdev: " Tomasz Kulasek
  2016-11-23 17:36                       ` [dpdk-dev] [PATCH v12 2/6] e1000: " Tomasz Kulasek
@ 2016-11-23 17:36                       ` Tomasz Kulasek
  2016-11-23 17:36                       ` [dpdk-dev] [PATCH v12 4/6] i40e: " Tomasz Kulasek
                                         ` (4 subsequent siblings)
  7 siblings, 0 replies; 261+ messages in thread
From: Tomasz Kulasek @ 2016-11-23 17:36 UTC (permalink / raw)
  To: dev; +Cc: konstantin.ananyev, olivier.matz

Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 drivers/net/fm10k/fm10k.h        |    6 +++++
 drivers/net/fm10k/fm10k_ethdev.c |    5 ++++
 drivers/net/fm10k/fm10k_rxtx.c   |   50 +++++++++++++++++++++++++++++++++++++-
 3 files changed, 60 insertions(+), 1 deletion(-)

diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
index 05aa1a2..c6fed21 100644
--- a/drivers/net/fm10k/fm10k.h
+++ b/drivers/net/fm10k/fm10k.h
@@ -69,6 +69,9 @@
 #define FM10K_MAX_RX_DESC  (FM10K_MAX_RX_RING_SZ / sizeof(union fm10k_rx_desc))
 #define FM10K_MAX_TX_DESC  (FM10K_MAX_TX_RING_SZ / sizeof(struct fm10k_tx_desc))
 
+#define FM10K_TX_MAX_SEG     UINT8_MAX
+#define FM10K_TX_MAX_MTU_SEG UINT8_MAX
+
 /*
  * byte aligment for HW RX data buffer
  * Datasheet requires RX buffer addresses shall either be 512-byte aligned or
@@ -356,6 +359,9 @@ uint16_t fm10k_recv_scattered_pkts(void *rx_queue,
 uint16_t fm10k_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 	uint16_t nb_pkts);
 
+uint16_t fm10k_prep_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
+	uint16_t nb_pkts);
+
 int fm10k_rxq_vec_setup(struct fm10k_rx_queue *rxq);
 int fm10k_rx_vec_condition_check(struct rte_eth_dev *);
 void fm10k_rx_queue_release_mbufs_vec(struct fm10k_rx_queue *rxq);
diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index 923690c..a116822 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -1447,6 +1447,8 @@ static int fm10k_xstats_get_names(__rte_unused struct rte_eth_dev *dev,
 		.nb_max = FM10K_MAX_TX_DESC,
 		.nb_min = FM10K_MIN_TX_DESC,
 		.nb_align = FM10K_MULT_TX_DESC,
+		.nb_seg_max = FM10K_TX_MAX_SEG,
+		.nb_mtu_seg_max = FM10K_TX_MAX_MTU_SEG,
 	};
 
 	dev_info->speed_capa = ETH_LINK_SPEED_1G | ETH_LINK_SPEED_2_5G |
@@ -2755,8 +2757,10 @@ static void __attribute__((cold))
 			fm10k_txq_vec_setup(txq);
 		}
 		dev->tx_pkt_burst = fm10k_xmit_pkts_vec;
+		dev->tx_pkt_prepare = NULL;
 	} else {
 		dev->tx_pkt_burst = fm10k_xmit_pkts;
+		dev->tx_pkt_prepare = fm10k_prep_pkts;
 		PMD_INIT_LOG(DEBUG, "Use regular Tx func");
 	}
 }
@@ -2835,6 +2839,7 @@ static void __attribute__((cold))
 	dev->dev_ops = &fm10k_eth_dev_ops;
 	dev->rx_pkt_burst = &fm10k_recv_pkts;
 	dev->tx_pkt_burst = &fm10k_xmit_pkts;
+	dev->tx_pkt_prepare = &fm10k_prep_pkts;
 
 	/* only initialize in the primary process */
 	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
diff --git a/drivers/net/fm10k/fm10k_rxtx.c b/drivers/net/fm10k/fm10k_rxtx.c
index 32cc7ff..144e5e6 100644
--- a/drivers/net/fm10k/fm10k_rxtx.c
+++ b/drivers/net/fm10k/fm10k_rxtx.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2013-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2013-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -35,6 +35,7 @@
 
 #include <rte_ethdev.h>
 #include <rte_common.h>
+#include <rte_net.h>
 #include "fm10k.h"
 #include "base/fm10k_type.h"
 
@@ -65,6 +66,15 @@ static inline void dump_rxd(union fm10k_rx_desc *rxd)
 }
 #endif
 
+#define FM10K_TX_OFFLOAD_MASK (  \
+		PKT_TX_VLAN_PKT |        \
+		PKT_TX_IP_CKSUM |        \
+		PKT_TX_L4_MASK |         \
+		PKT_TX_TCP_SEG)
+
+#define FM10K_TX_OFFLOAD_NOTSUP_MASK \
+		(PKT_TX_OFFLOAD_MASK ^ FM10K_TX_OFFLOAD_MASK)
+
 /* @note: When this function is changed, make corresponding change to
  * fm10k_dev_supported_ptypes_get()
  */
@@ -597,3 +607,41 @@ static inline void tx_xmit_pkt(struct fm10k_tx_queue *q, struct rte_mbuf *mb)
 
 	return count;
 }
+
+uint16_t
+fm10k_prep_pkts(__rte_unused void *tx_queue, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts)
+{
+	int i, ret;
+	struct rte_mbuf *m;
+
+	for (i = 0; i < nb_pkts; i++) {
+		m = tx_pkts[i];
+
+		if ((m->ol_flags & PKT_TX_TCP_SEG) &&
+				(m->tso_segsz < FM10K_TSO_MINMSS)) {
+			rte_errno = -EINVAL;
+			return i;
+		}
+
+		if (m->ol_flags & FM10K_TX_OFFLOAD_NOTSUP_MASK) {
+			rte_errno = -ENOTSUP;
+			return i;
+		}
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+		ret = rte_validate_tx_offload(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+#endif
+		ret = rte_net_intel_cksum_prepare(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+	}
+
+	return i;
+}
-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 261+ messages in thread

* [dpdk-dev] [PATCH v12 4/6] i40e: add Tx preparation
  2016-11-23 17:36                     ` [dpdk-dev] [PATCH v12 0/6] add Tx preparation Tomasz Kulasek
                                         ` (2 preceding siblings ...)
  2016-11-23 17:36                       ` [dpdk-dev] [PATCH v12 3/6] fm10k: " Tomasz Kulasek
@ 2016-11-23 17:36                       ` Tomasz Kulasek
  2016-11-23 17:36                       ` [dpdk-dev] [PATCH v12 5/6] ixgbe: " Tomasz Kulasek
                                         ` (3 subsequent siblings)
  7 siblings, 0 replies; 261+ messages in thread
From: Tomasz Kulasek @ 2016-11-23 17:36 UTC (permalink / raw)
  To: dev; +Cc: konstantin.ananyev, olivier.matz

Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 drivers/net/i40e/i40e_ethdev.c |    3 ++
 drivers/net/i40e/i40e_rxtx.c   |   72 +++++++++++++++++++++++++++++++++++++++-
 drivers/net/i40e/i40e_rxtx.h   |    8 +++++
 3 files changed, 82 insertions(+), 1 deletion(-)

diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
index 67778ba..5761357 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -943,6 +943,7 @@ static inline void i40e_GLQF_reg_init(struct i40e_hw *hw)
 	dev->dev_ops = &i40e_eth_dev_ops;
 	dev->rx_pkt_burst = i40e_recv_pkts;
 	dev->tx_pkt_burst = i40e_xmit_pkts;
+	dev->tx_pkt_prepare = i40e_prep_pkts;
 
 	/* for secondary processes, we don't initialise any further as primary
 	 * has already done this work. Only check we don't need a different
@@ -2645,6 +2646,8 @@ static int i40e_dev_xstats_get_names(__rte_unused struct rte_eth_dev *dev,
 		.nb_max = I40E_MAX_RING_DESC,
 		.nb_min = I40E_MIN_RING_DESC,
 		.nb_align = I40E_ALIGN_RING_DESC,
+		.nb_seg_max = I40E_TX_MAX_SEG,
+		.nb_mtu_seg_max = I40E_TX_MAX_MTU_SEG,
 	};
 
 	if (pf->flags & I40E_FLAG_VMDQ) {
diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
index 7ae7d9f..5827f2f 100644
--- a/drivers/net/i40e/i40e_rxtx.c
+++ b/drivers/net/i40e/i40e_rxtx.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -50,6 +50,8 @@
 #include <rte_tcp.h>
 #include <rte_sctp.h>
 #include <rte_udp.h>
+#include <rte_ip.h>
+#include <rte_net.h>
 
 #include "i40e_logs.h"
 #include "base/i40e_prototype.h"
@@ -79,6 +81,17 @@
 		PKT_TX_TCP_SEG |		 \
 		PKT_TX_OUTER_IP_CKSUM)
 
+#define I40E_TX_OFFLOAD_MASK (  \
+		PKT_TX_IP_CKSUM |       \
+		PKT_TX_L4_MASK |        \
+		PKT_TX_OUTER_IP_CKSUM | \
+		PKT_TX_TCP_SEG |        \
+		PKT_TX_QINQ_PKT |       \
+		PKT_TX_VLAN_PKT)
+
+#define I40E_TX_OFFLOAD_NOTSUP_MASK \
+		(PKT_TX_OFFLOAD_MASK ^ I40E_TX_OFFLOAD_MASK)
+
 static uint16_t i40e_xmit_pkts_simple(void *tx_queue,
 				      struct rte_mbuf **tx_pkts,
 				      uint16_t nb_pkts);
@@ -1411,6 +1424,61 @@ static inline int __attribute__((always_inline))
 	return nb_tx;
 }
 
+/*********************************************************************
+ *
+ *  TX prep functions
+ *
+ **********************************************************************/
+uint16_t
+i40e_prep_pkts(__rte_unused void *tx_queue, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts)
+{
+	int i, ret;
+	uint64_t ol_flags;
+	struct rte_mbuf *m;
+
+	for (i = 0; i < nb_pkts; i++) {
+		m = tx_pkts[i];
+		ol_flags = m->ol_flags;
+
+		/**
+		 * m->nb_segs is uint8_t, so m->nb_segs is always less than
+		 * I40E_TX_MAX_SEG.
+		 * We check only a condition for m->nb_segs > I40E_TX_MAX_MTU_SEG.
+		 */
+		if (!(ol_flags & PKT_TX_TCP_SEG)) {
+			if (m->nb_segs > I40E_TX_MAX_MTU_SEG) {
+				rte_errno = -EINVAL;
+				return i;
+			}
+		} else if ((m->tso_segsz < I40E_MIN_TSO_MSS) ||
+				(m->tso_segsz > I40E_MAX_TSO_MSS)) {
+			/* MSS outside the range (256B - 9674B) are considered malicious */
+			rte_errno = -EINVAL;
+			return i;
+		}
+
+		if (ol_flags & I40E_TX_OFFLOAD_NOTSUP_MASK) {
+			rte_errno = -ENOTSUP;
+			return i;
+		}
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+		ret = rte_validate_tx_offload(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+#endif
+		ret = rte_net_intel_cksum_prepare(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+	}
+	return i;
+}
+
 /*
  * Find the VSI the queue belongs to. 'queue_idx' is the queue index
  * application used, which assume having sequential ones. But from driver's
@@ -2763,9 +2831,11 @@ void __attribute__((cold))
 			PMD_INIT_LOG(DEBUG, "Simple tx finally be used.");
 			dev->tx_pkt_burst = i40e_xmit_pkts_simple;
 		}
+		dev->tx_pkt_prepare = NULL;
 	} else {
 		PMD_INIT_LOG(DEBUG, "Xmit tx finally be used.");
 		dev->tx_pkt_burst = i40e_xmit_pkts;
+		dev->tx_pkt_prepare = i40e_prep_pkts;
 	}
 }
 
diff --git a/drivers/net/i40e/i40e_rxtx.h b/drivers/net/i40e/i40e_rxtx.h
index ecdb13c..9df8a56 100644
--- a/drivers/net/i40e/i40e_rxtx.h
+++ b/drivers/net/i40e/i40e_rxtx.h
@@ -63,6 +63,12 @@
 #define	I40E_MIN_RING_DESC	64
 #define	I40E_MAX_RING_DESC	4096
 
+#define I40E_MIN_TSO_MSS          256
+#define I40E_MAX_TSO_MSS          9674
+
+#define I40E_TX_MAX_SEG     UINT8_MAX
+#define I40E_TX_MAX_MTU_SEG 8
+
 #undef container_of
 #define container_of(ptr, type, member) ({ \
 		typeof(((type *)0)->member)(*__mptr) = (ptr); \
@@ -223,6 +229,8 @@ uint16_t i40e_recv_scattered_pkts(void *rx_queue,
 uint16_t i40e_xmit_pkts(void *tx_queue,
 			struct rte_mbuf **tx_pkts,
 			uint16_t nb_pkts);
+uint16_t i40e_prep_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts);
 int i40e_tx_queue_init(struct i40e_tx_queue *txq);
 int i40e_rx_queue_init(struct i40e_rx_queue *rxq);
 void i40e_free_tx_resources(struct i40e_tx_queue *txq);
-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 261+ messages in thread

* [dpdk-dev] [PATCH v12 5/6] ixgbe: add Tx preparation
  2016-11-23 17:36                     ` [dpdk-dev] [PATCH v12 0/6] add Tx preparation Tomasz Kulasek
                                         ` (3 preceding siblings ...)
  2016-11-23 17:36                       ` [dpdk-dev] [PATCH v12 4/6] i40e: " Tomasz Kulasek
@ 2016-11-23 17:36                       ` Tomasz Kulasek
  2016-11-23 17:36                       ` [dpdk-dev] [PATCH v12 6/6] testpmd: use Tx preparation in csum engine Tomasz Kulasek
                                         ` (2 subsequent siblings)
  7 siblings, 0 replies; 261+ messages in thread
From: Tomasz Kulasek @ 2016-11-23 17:36 UTC (permalink / raw)
  To: dev; +Cc: konstantin.ananyev, olivier.matz

Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 drivers/net/ixgbe/ixgbe_ethdev.c |    3 ++
 drivers/net/ixgbe/ixgbe_ethdev.h |    5 +++-
 drivers/net/ixgbe/ixgbe_rxtx.c   |   56 ++++++++++++++++++++++++++++++++++++++
 drivers/net/ixgbe/ixgbe_rxtx.h   |    2 ++
 4 files changed, 65 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
index edc9b22..a75f59d 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -517,6 +517,8 @@ static int ixgbe_dev_udp_tunnel_port_del(struct rte_eth_dev *dev,
 	.nb_max = IXGBE_MAX_RING_DESC,
 	.nb_min = IXGBE_MIN_RING_DESC,
 	.nb_align = IXGBE_TXD_ALIGN,
+	.nb_seg_max = IXGBE_TX_MAX_SEG,
+	.nb_mtu_seg_max = IXGBE_TX_MAX_SEG,
 };
 
 static const struct eth_dev_ops ixgbe_eth_dev_ops = {
@@ -1103,6 +1105,7 @@ struct rte_ixgbe_xstats_name_off {
 	eth_dev->dev_ops = &ixgbe_eth_dev_ops;
 	eth_dev->rx_pkt_burst = &ixgbe_recv_pkts;
 	eth_dev->tx_pkt_burst = &ixgbe_xmit_pkts;
+	eth_dev->tx_pkt_prepare = &ixgbe_prep_pkts;
 
 	/*
 	 * For secondary processes, we don't initialise any further as primary
diff --git a/drivers/net/ixgbe/ixgbe_ethdev.h b/drivers/net/ixgbe/ixgbe_ethdev.h
index 4ff6338..e229cf5 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.h
+++ b/drivers/net/ixgbe/ixgbe_ethdev.h
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -396,6 +396,9 @@ uint16_t ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 uint16_t ixgbe_xmit_pkts_simple(void *tx_queue, struct rte_mbuf **tx_pkts,
 		uint16_t nb_pkts);
 
+uint16_t ixgbe_prep_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts);
+
 int ixgbe_dev_rss_hash_update(struct rte_eth_dev *dev,
 			      struct rte_eth_rss_conf *rss_conf);
 
diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
index b2d9f45..dbe83e7 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx.c
@@ -70,6 +70,7 @@
 #include <rte_string_fns.h>
 #include <rte_errno.h>
 #include <rte_ip.h>
+#include <rte_net.h>
 
 #include "ixgbe_logs.h"
 #include "base/ixgbe_api.h"
@@ -87,6 +88,9 @@
 		PKT_TX_TCP_SEG |		 \
 		PKT_TX_OUTER_IP_CKSUM)
 
+#define IXGBE_TX_OFFLOAD_NOTSUP_MASK \
+		(PKT_TX_OFFLOAD_MASK ^ IXGBE_TX_OFFLOAD_MASK)
+
 #if 1
 #define RTE_PMD_USE_PREFETCH
 #endif
@@ -905,6 +909,56 @@ static inline int __attribute__((always_inline))
 
 /*********************************************************************
  *
+ *  TX prep functions
+ *
+ **********************************************************************/
+uint16_t
+ixgbe_prep_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
+{
+	int i, ret;
+	uint64_t ol_flags;
+	struct rte_mbuf *m;
+	struct ixgbe_tx_queue *txq = (struct ixgbe_tx_queue *)tx_queue;
+
+	for (i = 0; i < nb_pkts; i++) {
+		m = tx_pkts[i];
+		ol_flags = m->ol_flags;
+
+		/**
+		 * Check if packet meets requirements for number of segments
+		 *
+		 * NOTE: for ixgbe it's always (40 - WTHRESH) for both TSO and non-TSO
+		 */
+
+		if (m->nb_segs > IXGBE_TX_MAX_SEG - txq->wthresh) {
+			rte_errno = -EINVAL;
+			return i;
+		}
+
+		if (ol_flags & IXGBE_TX_OFFLOAD_NOTSUP_MASK) {
+			rte_errno = -ENOTSUP;
+			return i;
+		}
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+		ret = rte_validate_tx_offload(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+#endif
+		ret = rte_net_intel_cksum_prepare(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+	}
+
+	return i;
+}
+
+/*********************************************************************
+ *
  *  RX functions
  *
  **********************************************************************/
@@ -2282,6 +2336,7 @@ void __attribute__((cold))
 	if (((txq->txq_flags & IXGBE_SIMPLE_FLAGS) == IXGBE_SIMPLE_FLAGS)
 			&& (txq->tx_rs_thresh >= RTE_PMD_IXGBE_TX_MAX_BURST)) {
 		PMD_INIT_LOG(DEBUG, "Using simple tx code path");
+		dev->tx_pkt_prepare = NULL;
 #ifdef RTE_IXGBE_INC_VECTOR
 		if (txq->tx_rs_thresh <= RTE_IXGBE_TX_MAX_FREE_BUF_SZ &&
 				(rte_eal_process_type() != RTE_PROC_PRIMARY ||
@@ -2302,6 +2357,7 @@ void __attribute__((cold))
 				(unsigned long)txq->tx_rs_thresh,
 				(unsigned long)RTE_PMD_IXGBE_TX_MAX_BURST);
 		dev->tx_pkt_burst = ixgbe_xmit_pkts;
+		dev->tx_pkt_prepare = ixgbe_prep_pkts;
 	}
 }
 
diff --git a/drivers/net/ixgbe/ixgbe_rxtx.h b/drivers/net/ixgbe/ixgbe_rxtx.h
index 2608b36..7bbd9b8 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx.h
+++ b/drivers/net/ixgbe/ixgbe_rxtx.h
@@ -80,6 +80,8 @@
 #define RTE_IXGBE_WAIT_100_US               100
 #define RTE_IXGBE_VMTXSW_REGISTER_COUNT     2
 
+#define IXGBE_TX_MAX_SEG                    40
+
 #define IXGBE_PACKET_TYPE_MASK_82599        0X7F
 #define IXGBE_PACKET_TYPE_MASK_X550         0X10FF
 #define IXGBE_PACKET_TYPE_MASK_TUNNEL       0XFF
-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 261+ messages in thread

* [dpdk-dev] [PATCH v12 6/6] testpmd: use Tx preparation in csum engine
  2016-11-23 17:36                     ` [dpdk-dev] [PATCH v12 0/6] add Tx preparation Tomasz Kulasek
                                         ` (4 preceding siblings ...)
  2016-11-23 17:36                       ` [dpdk-dev] [PATCH v12 5/6] ixgbe: " Tomasz Kulasek
@ 2016-11-23 17:36                       ` Tomasz Kulasek
  2016-12-07 11:13                         ` Ferruh Yigit
  2016-11-28 11:03                       ` [dpdk-dev] [PATCH v12 0/6] add Tx preparation Thomas Monjalon
  2016-12-13 17:41                       ` [dpdk-dev] [PATCH v13 0/7] " Tomasz Kulasek
  7 siblings, 1 reply; 261+ messages in thread
From: Tomasz Kulasek @ 2016-11-23 17:36 UTC (permalink / raw)
  To: dev; +Cc: konstantin.ananyev, olivier.matz

Added "csum txprep (on|off)" command which allows to switch to the
tx path using Tx preparation API.

By default unchanged implementation is used.

Using Tx preparation path, pseudo header calculation for udp/tcp/tso
packets from application, and used Tx preparation API for
packet preparation and verification.

Adding additional step to the csum engine costs about 3-4% of performance
drop, on my setup with ixgbe driver. It's caused mostly by the need
of reaccessing and modification of packet data.
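
For reference, a typical interactive session enabling the new path might
look like this (a sketch; the commands other than "csum txprep" already
exist in testpmd):

	testpmd> set fwd csum
	testpmd> csum txprep on
	testpmd> start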

Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 app/test-pmd/cmdline.c  |   49 +++++++++++++++++++++++++++++++++++++++++++++++
 app/test-pmd/csumonly.c |   33 ++++++++++++++++++++++++-------
 app/test-pmd/testpmd.c  |    5 +++++
 app/test-pmd/testpmd.h  |    2 ++
 4 files changed, 82 insertions(+), 7 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 63b55dc..373fc59 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -366,6 +366,10 @@ static void cmd_help_long_parsed(void *parsed_result,
 			"csum show (port_id)\n"
 			"    Display tx checksum offload configuration\n\n"
 
+			"csum txprep (on|off)"
+			"    Enable tx preparation path in csum forward engine"
+			"\n\n"
+
 			"tso set (segsize) (portid)\n"
 			"    Enable TCP Segmentation Offload in csum forward"
 			" engine.\n"
@@ -3523,6 +3527,50 @@ struct cmd_csum_tunnel_result {
 	},
 };
 
+/* Enable/disable tx preparation path */
+struct cmd_csum_txprep_result {
+	cmdline_fixed_string_t csum;
+	cmdline_fixed_string_t parse;
+	cmdline_fixed_string_t onoff;
+};
+
+static void
+cmd_csum_txprep_parsed(void *parsed_result,
+		       __attribute__((unused)) struct cmdline *cl,
+		       __attribute__((unused)) void *data)
+{
+	struct cmd_csum_txprep_result *res = parsed_result;
+
+	if (!strcmp(res->onoff, "on"))
+		tx_prepare = 1;
+	else
+		tx_prepare = 0;
+
+}
+
+cmdline_parse_token_string_t cmd_csum_txprep_csum =
+	TOKEN_STRING_INITIALIZER(struct cmd_csum_txprep_result,
+				csum, "csum");
+cmdline_parse_token_string_t cmd_csum_txprep_parse =
+	TOKEN_STRING_INITIALIZER(struct cmd_csum_txprep_result,
+				parse, "txprep");
+cmdline_parse_token_string_t cmd_csum_txprep_onoff =
+	TOKEN_STRING_INITIALIZER(struct cmd_csum_txprep_result,
+				onoff, "on#off");
+
+cmdline_parse_inst_t cmd_csum_txprep = {
+	.f = cmd_csum_txprep_parsed,
+	.data = NULL,
+	.help_str = "enable/disable tx preparation path for csum engine: "
+	"csum txprep on|off",
+	.tokens = {
+		(void *)&cmd_csum_txprep_csum,
+		(void *)&cmd_csum_txprep_parse,
+		(void *)&cmd_csum_txprep_onoff,
+		NULL,
+	},
+};
+
 /* *** ENABLE HARDWARE SEGMENTATION IN TX NON-TUNNELED PACKETS *** */
 struct cmd_tso_set_result {
 	cmdline_fixed_string_t tso;
@@ -11470,6 +11518,7 @@ struct cmd_set_vf_mac_addr_result {
 	(cmdline_parse_inst_t *)&cmd_csum_set,
 	(cmdline_parse_inst_t *)&cmd_csum_show,
 	(cmdline_parse_inst_t *)&cmd_csum_tunnel,
+	(cmdline_parse_inst_t *)&cmd_csum_txprep,
 	(cmdline_parse_inst_t *)&cmd_tso_set,
 	(cmdline_parse_inst_t *)&cmd_tso_show,
 	(cmdline_parse_inst_t *)&cmd_tunnel_tso_set,
diff --git a/app/test-pmd/csumonly.c b/app/test-pmd/csumonly.c
index 57e6ae2..3afa9ab 100644
--- a/app/test-pmd/csumonly.c
+++ b/app/test-pmd/csumonly.c
@@ -372,8 +372,10 @@ struct simple_gre_hdr {
 			udp_hdr->dgram_cksum = 0;
 			if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_UDP_CKSUM) {
 				ol_flags |= PKT_TX_UDP_CKSUM;
-				udp_hdr->dgram_cksum = get_psd_sum(l3_hdr,
-					info->ethertype, ol_flags);
+				if (!tx_prepare)
+					udp_hdr->dgram_cksum = get_psd_sum(
+							l3_hdr, info->ethertype,
+							ol_flags);
 			} else {
 				udp_hdr->dgram_cksum =
 					get_udptcp_checksum(l3_hdr, udp_hdr,
@@ -385,12 +387,15 @@ struct simple_gre_hdr {
 		tcp_hdr->cksum = 0;
 		if (tso_segsz) {
 			ol_flags |= PKT_TX_TCP_SEG;
-			tcp_hdr->cksum = get_psd_sum(l3_hdr, info->ethertype,
-				ol_flags);
+			if (!tx_prepare)
+				tcp_hdr->cksum = get_psd_sum(l3_hdr,
+						info->ethertype, ol_flags);
+
 		} else if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_TCP_CKSUM) {
 			ol_flags |= PKT_TX_TCP_CKSUM;
-			tcp_hdr->cksum = get_psd_sum(l3_hdr, info->ethertype,
-				ol_flags);
+			if (!tx_prepare)
+				tcp_hdr->cksum = get_psd_sum(l3_hdr,
+						info->ethertype, ol_flags);
 		} else {
 			tcp_hdr->cksum =
 				get_udptcp_checksum(l3_hdr, tcp_hdr,
@@ -648,6 +653,7 @@ struct simple_gre_hdr {
 	void *l3_hdr = NULL, *outer_l3_hdr = NULL; /* can be IPv4 or IPv6 */
 	uint16_t nb_rx;
 	uint16_t nb_tx;
+	uint16_t nb_prep;
 	uint16_t i;
 	uint64_t rx_ol_flags, tx_ol_flags;
 	uint16_t testpmd_ol_flags;
@@ -857,7 +863,20 @@ struct simple_gre_hdr {
 			printf("\n");
 		}
 	}
-	nb_tx = rte_eth_tx_burst(fs->tx_port, fs->tx_queue, pkts_burst, nb_rx);
+
+	if (tx_prepare) {
+		nb_prep = rte_eth_tx_prepare(fs->tx_port, fs->tx_queue,
+				pkts_burst, nb_rx);
+		if (nb_prep != nb_rx)
+			printf("Preparing packet burst to transmit failed: %s\n",
+					rte_strerror(rte_errno));
+
+		nb_tx = rte_eth_tx_burst(fs->tx_port, fs->tx_queue, pkts_burst,
+				nb_prep);
+	} else
+		nb_tx = rte_eth_tx_burst(fs->tx_port, fs->tx_queue, pkts_burst,
+				nb_rx);
+
 	/*
 	 * Retry if necessary
 	 */
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index a0332c2..c18bc28 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -180,6 +180,11 @@ struct fwd_engine * fwd_engines[] = {
 enum tx_pkt_split tx_pkt_split = TX_PKT_SPLIT_OFF;
 /**< Split policy for packets to TX. */
 
+/*
+ * Enable Tx preparation path in the "csum" engine.
+ */
+uint8_t tx_prepare = 0;
+
 uint16_t nb_pkt_per_burst = DEF_PKT_BURST; /**< Number of packets per burst. */
 uint16_t mb_mempool_cache = DEF_MBUF_CACHE; /**< Size of mbuf mempool cache. */
 
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index 9c1e703..488a6e1 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -383,6 +383,8 @@ enum tx_pkt_split {
 
 extern enum tx_pkt_split tx_pkt_split;
 
+extern uint8_t tx_prepare;
+
 extern uint16_t nb_pkt_per_burst;
 extern uint16_t mb_mempool_cache;
 extern int8_t rx_pthresh;
-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v12 1/6] ethdev: add Tx preparation
  2016-11-23 17:36                       ` [dpdk-dev] [PATCH v12 1/6] ethdev: " Tomasz Kulasek
@ 2016-11-28 10:54                         ` Thomas Monjalon
  2016-12-01 16:24                           ` Thomas Monjalon
  2016-12-01 16:26                         ` Thomas Monjalon
  2016-12-01 16:28                         ` Thomas Monjalon
  2 siblings, 1 reply; 261+ messages in thread
From: Thomas Monjalon @ 2016-11-28 10:54 UTC (permalink / raw)
  To: Tomasz Kulasek; +Cc: dev, konstantin.ananyev, olivier.matz, bruce.richardson

Hi,

2016-11-23 18:36, Tomasz Kulasek:
> --- a/config/common_base
> +++ b/config/common_base
> @@ -120,6 +120,7 @@ CONFIG_RTE_MAX_QUEUES_PER_PORT=1024
>  CONFIG_RTE_LIBRTE_IEEE1588=n
>  CONFIG_RTE_ETHDEV_QUEUE_STAT_CNTRS=16
>  CONFIG_RTE_ETHDEV_RXTX_CALLBACKS=y
> +CONFIG_RTE_ETHDEV_TX_PREPARE=y

Please remind me why there is a configuration option here.
It should be the responsibility of the application to call tx_prepare
or not. If the application chooses to use this new API but it is
disabled, then the packets won't be prepared and there is no error code:

> +#else
> +
> +static inline uint16_t
> +rte_eth_tx_prepare(__rte_unused uint8_t port_id, __rte_unused uint16_t queue_id,
> +               __rte_unused struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
> +{
> +       return nb_pkts;
> +}
> +
> +#endif

So the application is not aware of the issue and it will not use
any fallback.
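
To illustrate with a hypothetical application snippet (handle_invalid()
is made up for the example): with the option disabled, the stub above
silently reports success, so no fallback is ever taken:

	uint16_t nb_prep;

	nb_prep = rte_eth_tx_prepare(port_id, queue_id, pkts, nb_pkts);
	if (nb_prep < nb_pkts) {
		/* never reached with CONFIG_RTE_ETHDEV_TX_PREPARE=n,
		 * since the stub unconditionally returns nb_pkts */
		handle_invalid(pkts[nb_prep]);
	}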

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v12 0/6] add Tx preparation
  2016-11-23 17:36                     ` [dpdk-dev] [PATCH v12 0/6] add Tx preparation Tomasz Kulasek
                                         ` (5 preceding siblings ...)
  2016-11-23 17:36                       ` [dpdk-dev] [PATCH v12 6/6] testpmd: use Tx preparation in csum engine Tomasz Kulasek
@ 2016-11-28 11:03                       ` Thomas Monjalon
  2016-11-30  5:48                         ` John Daley (johndale)
                                           ` (5 more replies)
  2016-12-13 17:41                       ` [dpdk-dev] [PATCH v13 0/7] " Tomasz Kulasek
  7 siblings, 6 replies; 261+ messages in thread
From: Thomas Monjalon @ 2016-11-28 11:03 UTC (permalink / raw)
  To: dev, Rahul Lakkireddy, Stephen Hurd, Jan Medala, Jakub Palider,
	John Daley, Adrien Mazarguil, Alejandro Lucero, Harish Patil,
	Rasesh Mody, Jerin Jacob, Yuanhan Liu, Yong Wang
  Cc: Tomasz Kulasek, konstantin.ananyev, olivier.matz

We need attention of every PMD developers on this thread.

Reminder of what Konstantin suggested:
"
- if the PMD supports TX offloads AND
- if to be able use any of these offloads the upper layer SW would have to:
    * modify the contents of the packet OR
    * obey HW specific restrictions
then it is a PMD developer responsibility to provide tx_prep() that would implement
expected modifications of the packet contents and restriction checks.
Otherwise, tx_prep() implementation is not required and can be safely set to NULL.      
"

I copy/paste also my previous conclusion:

Before txprep, there is only one API: the application must prepare the
packet checksums itself (get_psd_sum in testpmd).
With txprep, the application has 2 choices: keep doing the job itself
or call txprep, which calls a PMD-specific function.
The question is: do non-Intel drivers need a checksum preparation for TSO?
Will they behave well if txprep does nothing in these drivers?

When looking at the code, most drivers handle the TSO flags.
But it is hard to know whether they rely on the pseudo checksum or not.

git grep -l 'PKT_TX_UDP_CKSUM\|PKT_TX_TCP_CKSUM\|PKT_TX_TCP_SEG' drivers/net/

drivers/net/bnxt/bnxt_txr.c
drivers/net/cxgbe/sge.c
drivers/net/e1000/em_rxtx.c
drivers/net/e1000/igb_rxtx.c
drivers/net/ena/ena_ethdev.c
drivers/net/enic/enic_rxtx.c
drivers/net/fm10k/fm10k_rxtx.c
drivers/net/i40e/i40e_rxtx.c
drivers/net/ixgbe/ixgbe_rxtx.c
drivers/net/mlx4/mlx4.c
drivers/net/mlx5/mlx5_rxtx.c
drivers/net/nfp/nfp_net.c
drivers/net/qede/qede_rxtx.c
drivers/net/thunderx/nicvf_rxtx.c
drivers/net/virtio/virtio_rxtx.c
drivers/net/vmxnet3/vmxnet3_rxtx.c

Please, we need a comment for each driver saying
"it is OK, we do not need any checksum preparation for TSO"
or
"yes we have to implement tx_prepare or TSO will not work in this mode"

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v12 0/6] add Tx preparation
  2016-11-28 11:03                       ` [dpdk-dev] [PATCH v12 0/6] add Tx preparation Thomas Monjalon
@ 2016-11-30  5:48                         ` John Daley (johndale)
  2016-11-30 10:59                           ` Ananyev, Konstantin
  2016-11-30  7:40                         ` Adrien Mazarguil
                                           ` (4 subsequent siblings)
  5 siblings, 1 reply; 261+ messages in thread
From: John Daley (johndale) @ 2016-11-30  5:48 UTC (permalink / raw)
  To: Thomas Monjalon, dev, Rahul Lakkireddy, Stephen Hurd, Jan Medala,
	Jakub Palider, Adrien Mazarguil, Alejandro Lucero, Harish Patil,
	Rasesh Mody, Jerin Jacob, Yuanhan Liu, Yong Wang
  Cc: Tomasz Kulasek, konstantin.ananyev, olivier.matz

Hi,
-john

> -----Original Message-----
> From: Thomas Monjalon [mailto:thomas.monjalon@6wind.com]
> Sent: Monday, November 28, 2016 3:03 AM
> To: dev@dpdk.org; Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>;
> Stephen Hurd <stephen.hurd@broadcom.com>; Jan Medala
> <jan@semihalf.com>; Jakub Palider <jpa@semihalf.com>; John Daley
> (johndale) <johndale@cisco.com>; Adrien Mazarguil
> <adrien.mazarguil@6wind.com>; Alejandro Lucero
> <alejandro.lucero@netronome.com>; Harish Patil
> <harish.patil@qlogic.com>; Rasesh Mody <rasesh.mody@qlogic.com>; Jerin
> Jacob <jerin.jacob@caviumnetworks.com>; Yuanhan Liu
> <yuanhan.liu@linux.intel.com>; Yong Wang <yongwang@vmware.com>
> Cc: Tomasz Kulasek <tomaszx.kulasek@intel.com>;
> konstantin.ananyev@intel.com; olivier.matz@6wind.com
> Subject: Re: [dpdk-dev] [PATCH v12 0/6] add Tx preparation
> 
> We need attention of every PMD developers on this thread.
> 
> Reminder of what Konstantin suggested:
> "
> - if the PMD supports TX offloads AND
> - if to be able use any of these offloads the upper layer SW would have to:
>     * modify the contents of the packet OR
>     * obey HW specific restrictions
> then it is a PMD developer responsibility to provide tx_prep() that would
> implement expected modifications of the packet contents and restriction
> checks.
> Otherwise, tx_prep() implementation is not required and can be safely set to
> NULL.
> "
> 
> I copy/paste also my previous conclusion:
> 
> Before txprep, there is only one API: the application must prepare the
> packets checksum itself (get_psd_sum in testpmd).
> With txprep, the application have 2 choices: keep doing the job itself or call
> txprep which calls a PMD-specific function.
> The question is: does non-Intel drivers need a checksum preparation for
> TSO?
> Will it behave well if txprep does nothing in these drivers?
> 
> When looking at the code, most of drivers handle the TSO flags.
> But it is hard to know whether they rely on the pseudo checksum or not.
> 
> git grep -l 'PKT_TX_UDP_CKSUM\|PKT_TX_TCP_CKSUM\|PKT_TX_TCP_SEG'
> drivers/net/
> 
> drivers/net/bnxt/bnxt_txr.c
> drivers/net/cxgbe/sge.c
> drivers/net/e1000/em_rxtx.c
> drivers/net/e1000/igb_rxtx.c
> drivers/net/ena/ena_ethdev.c
> drivers/net/enic/enic_rxtx.c
> drivers/net/fm10k/fm10k_rxtx.c
> drivers/net/i40e/i40e_rxtx.c
> drivers/net/ixgbe/ixgbe_rxtx.c
> drivers/net/mlx4/mlx4.c
> drivers/net/mlx5/mlx5_rxtx.c
> drivers/net/nfp/nfp_net.c
> drivers/net/qede/qede_rxtx.c
> drivers/net/thunderx/nicvf_rxtx.c
> drivers/net/virtio/virtio_rxtx.c
> drivers/net/vmxnet3/vmxnet3_rxtx.c
> 
> Please, we need a comment for each driver saying "it is OK, we do not need
> any checksum preparation for TSO"
> or
> "yes we have to implement tx_prepare or TSO will not work in this mode"

I like the idea of tx prep since it should make for cleaner apps.

For enic, I believe the answer is "it is OK, we do not need any checksum preparation".

Prior to now, it was necessary to set the IP checksum to 0 and put in a TCP/UDP pseudo-header. But there is a hardware checksum-overwrite option which makes preparation in software unnecessary, and it is testing out well so far. I plan to enable it in 17.02. TSO is also being enabled for 17.02 and it does not look like any prep is required. So I'm going with "txprep NULL pointer is OK for enic", but may have to change my mind if something comes up in testing.

-john

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v12 0/6] add Tx preparation
  2016-11-28 11:03                       ` [dpdk-dev] [PATCH v12 0/6] add Tx preparation Thomas Monjalon
  2016-11-30  5:48                         ` John Daley (johndale)
@ 2016-11-30  7:40                         ` Adrien Mazarguil
  2016-11-30  8:50                           ` Thomas Monjalon
  2016-11-30 10:54                           ` Ananyev, Konstantin
  2016-11-30 16:34                         ` Harish Patil
                                           ` (3 subsequent siblings)
  5 siblings, 2 replies; 261+ messages in thread
From: Adrien Mazarguil @ 2016-11-30  7:40 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: dev, Rahul Lakkireddy, Stephen Hurd, Jan Medala, Jakub Palider,
	John Daley, Alejandro Lucero, Harish Patil, Rasesh Mody,
	Jerin Jacob, Yuanhan Liu, Yong Wang, Tomasz Kulasek,
	konstantin.ananyev, olivier.matz

On Mon, Nov 28, 2016 at 12:03:06PM +0100, Thomas Monjalon wrote:
> We need attention of every PMD developers on this thread.

I've been following this thread from the beginning while working on rte_flow
and wanted to see where it was headed before replying. (I know, v11 was
submitted about 1 month ago but still.)

> Reminder of what Konstantin suggested:
> "
> - if the PMD supports TX offloads AND
> - if to be able use any of these offloads the upper layer SW would have to:
>     * modify the contents of the packet OR
>     * obey HW specific restrictions
> then it is a PMD developer responsibility to provide tx_prep() that would implement
> expected modifications of the packet contents and restriction checks.
> Otherwise, tx_prep() implementation is not required and can be safely set to NULL.      
> "
> 
> I copy/paste also my previous conclusion:
> 
> Before txprep, there is only one API: the application must prepare the
> packets checksum itself (get_psd_sum in testpmd).
> With txprep, the application have 2 choices: keep doing the job itself
> or call txprep which calls a PMD-specific function.

Something is definitely needed here, and only PMDs can provide it. I think
applications should not have to clear checksum fields or initialize them to
some magic value, same goes for any other offload or hardware limitation
that needs to be worked around.

tx_prep() is one possible answer to this issue, however as mentioned in the
original patch it can be very expensive if exposed by the PMD.

Another issue I'm more concerned about is the way limitations are managed
(struct rte_eth_desc_lim). While not officially tied to tx_prep(), this
structure contains new fields that are only relevant to a few devices, and I
fear it will keep growing with each new hardware quirk to manage, breaking
ABIs in the process.

What are applications supposed to do, check each of them regardless before
attempting to send a burst?
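
For the sake of argument, that manual checking would look roughly like
this with the new fields (an illustrative sketch, not from this series):

	struct rte_eth_dev_info dev_info;

	rte_eth_dev_info_get(port_id, &dev_info);
	if (!(m->ol_flags & PKT_TX_TCP_SEG) &&
	    m->nb_segs > dev_info.tx_desc_lim.nb_mtu_seg_max) {
		/* would exceed the device segment limit: drop or linearize */
	}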

I understand tx_prep() automates this process, however I'm wondering why
isn't the TX burst function doing that itself. Using nb_mtu_seg_max as an
example, tx_prep() has an extra check in case of TSO that the TX burst
function does not perform. This ends up being much more expensive to
applications due to the additional loop doing redundant testing on each
mbuf.

If, say as a performance improvement, we decided to leave the validation
part to the TX burst function; what remains in tx_prep() is basically heavy
"preparation" requiring mbuf changes (i.e. erasing checksums, for now).

Following the same logic, why can't such a thing be made part of the TX
burst function as well (through a direct call to rte_phdr_cksum_fix()
whenever necessary). From an application standpoint, what are the advantages
of having to:

 if (tx_prep()) // iterate and update mbufs as needed
     tx_burst(); // iterate and send

Compared to:

 tx_burst(); // iterate, update as needed and send

Note that PMDs could still provide different TX callbacks depending on the
set of enabled offloads so performance is not unnecessarily impacted.

In my opinion the second approach is both faster to applications and more
friendly from a usability perspective, am I missing something obvious?
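
To make the alternative concrete, a PMD could fold the preparation into
its existing burst loop, along these lines (a sketch reusing the helper
from this series, with descriptor filling and error handling elided):

	static uint16_t
	xxx_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
	{
		uint16_t i;

		for (i = 0; i < nb_pkts; i++) {
			struct rte_mbuf *m = tx_pkts[i];

			/* fix the pseudo-header in place when needed */
			if (m->ol_flags & (PKT_TX_TCP_SEG | PKT_TX_L4_MASK))
				rte_net_intel_cksum_prepare(m);
			/* ... fill TX descriptors as usual ... */
		}
		return i;
	}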

> The question is: does non-Intel drivers need a checksum preparation for TSO?
> Will it behave well if txprep does nothing in these drivers?
> 
> When looking at the code, most of drivers handle the TSO flags.
> But it is hard to know whether they rely on the pseudo checksum or not.
> 
> git grep -l 'PKT_TX_UDP_CKSUM\|PKT_TX_TCP_CKSUM\|PKT_TX_TCP_SEG' drivers/net/
> 
> drivers/net/bnxt/bnxt_txr.c
> drivers/net/cxgbe/sge.c
> drivers/net/e1000/em_rxtx.c
> drivers/net/e1000/igb_rxtx.c
> drivers/net/ena/ena_ethdev.c
> drivers/net/enic/enic_rxtx.c
> drivers/net/fm10k/fm10k_rxtx.c
> drivers/net/i40e/i40e_rxtx.c
> drivers/net/ixgbe/ixgbe_rxtx.c
> drivers/net/mlx4/mlx4.c
> drivers/net/mlx5/mlx5_rxtx.c
> drivers/net/nfp/nfp_net.c
> drivers/net/qede/qede_rxtx.c
> drivers/net/thunderx/nicvf_rxtx.c
> drivers/net/virtio/virtio_rxtx.c
> drivers/net/vmxnet3/vmxnet3_rxtx.c
> 
> Please, we need a comment for each driver saying
> "it is OK, we do not need any checksum preparation for TSO"
> or
> "yes we have to implement tx_prepare or TSO will not work in this mode"

For both mlx4 and mlx5 then,
"it is OK, we do not need any checksum preparation for TSO".

Actually I do not think we'll ever need tx_prep() unless we add our own
quirks to struct rte_eth_desc_lim (and friends) which are currently quietly
handled by TX burst functions.

-- 
Adrien Mazarguil
6WIND

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v12 0/6] add Tx preparation
  2016-11-30  7:40                         ` Adrien Mazarguil
@ 2016-11-30  8:50                           ` Thomas Monjalon
  2016-11-30 10:30                             ` Kulasek, TomaszX
  2016-11-30 10:54                           ` Ananyev, Konstantin
  1 sibling, 1 reply; 261+ messages in thread
From: Thomas Monjalon @ 2016-11-30  8:50 UTC (permalink / raw)
  To: Adrien Mazarguil, Tomasz Kulasek; +Cc: dev, konstantin.ananyev, olivier.matz

2016-11-30 08:40, Adrien Mazarguil:
[...]
> I understand tx_prep() automates this process, however I'm wondering why
> isn't the TX burst function doing that itself. Using nb_mtu_seg_max as an
> example, tx_prep() has an extra check in case of TSO that the TX burst
> function does not perform. This ends up being much more expensive to
> applications due to the additional loop doing redundant testing on each
> mbuf.
> 
> If, say as a performance improvement, we decided to leave the validation
> part to the TX burst function; what remains in tx_prep() is basically heavy
> "preparation" requiring mbuf changes (i.e. erasing checksums, for now).
> 
> Following the same logic, why can't such a thing be made part of the TX
> burst function as well (through a direct call to rte_phdr_cksum_fix()
> whenever necessary). From an application standpoint, what are the advantages
> of having to:
> 
>  if (tx_prep()) // iterate and update mbufs as needed
>      tx_burst(); // iterate and send
> 
> Compared to:
> 
>  tx_burst(); // iterate, update as needed and send
> 
> Note that PMDs could still provide different TX callbacks depending on the
> set of enabled offloads so performance is not unnecessarily impacted.
> 
> In my opinion the second approach is both faster to applications and more
> friendly from a usability perspective, am I missing something obvious?

I think it was not clearly explained in this patchset, but this is
my understanding:
tx_prepare and tx_burst can be called at different stages of a pipeline,
on different cores.

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v12 0/6] add Tx preparation
  2016-11-30  8:50                           ` Thomas Monjalon
@ 2016-11-30 10:30                             ` Kulasek, TomaszX
  2016-12-01  7:19                               ` Adrien Mazarguil
  0 siblings, 1 reply; 261+ messages in thread
From: Kulasek, TomaszX @ 2016-11-30 10:30 UTC (permalink / raw)
  To: Thomas Monjalon, Adrien Mazarguil; +Cc: dev, Ananyev, Konstantin, olivier.matz

Hi,

> -----Original Message-----
> From: Thomas Monjalon [mailto:thomas.monjalon@6wind.com]
> Sent: Wednesday, November 30, 2016 09:50
> To: Adrien Mazarguil <adrien.mazarguil@6wind.com>; Kulasek, TomaszX
> <tomaszx.kulasek@intel.com>
> Cc: dev@dpdk.org; Ananyev, Konstantin <konstantin.ananyev@intel.com>;
> olivier.matz@6wind.com
> Subject: Re: [dpdk-dev] [PATCH v12 0/6] add Tx preparation
> 
> 2016-11-30 08:40, Adrien Mazarguil:
> [...]
> > I understand tx_prep() automates this process, however I'm wondering
> > why isn't the TX burst function doing that itself. Using
> > nb_mtu_seg_max as an example, tx_prep() has an extra check in case of
> > TSO that the TX burst function does not perform. This ends up being
> > much more expensive to applications due to the additional loop doing
> > redundant testing on each mbuf.
> >
> > If, say as a performance improvement, we decided to leave the
> > validation part to the TX burst function; what remains in tx_prep() is
> > basically heavy "preparation" requiring mbuf changes (i.e. erasing
> checksums, for now).
> >
> > Following the same logic, why can't such a thing be made part of the
> > TX burst function as well (through a direct call to
> > rte_phdr_cksum_fix() whenever necessary). From an application
> > standpoint, what are the advantages of having to:
> >
> >  if (tx_prep()) // iterate and update mbufs as needed
> >      tx_burst(); // iterate and send
> >
> > Compared to:
> >
> >  tx_burst(); // iterate, update as needed and send
> >
> > Note that PMDs could still provide different TX callbacks depending on
> > the set of enabled offloads so performance is not unnecessarily
> impacted.
> >
> > In my opinion the second approach is both faster to applications and
> > more friendly from a usability perspective, am I missing something
> obvious?
> 
> I think it was not clearly explained in this patchset, but this is my
> understanding:
> tx_prepare and tx_burst can be called at different stages of a pipeline,
> on different cores.

Yes, this API is intended to be used optionally, not only just before tx_burst.

1. Separating both stages:
   a) We may have control over the burst (packet content, validation) when needed, even at an earlier pipeline stage or on a different core (see the sketch below).
   b) For invalid packets we may restore them or do some other task if needed (even at an early stage of processing).
   c) Tx burst stays as simple as it should be.

2. Joining the functionality of tx_prepare and tx_burst has some disadvantages:
   a) When a packet is invalid it cannot be restored by the application and must be dropped.
   b) Tx burst needs to modify the content of the packet.
   c) We have no way to eliminate the overhead of preparation (tx_prepare) for applications where performance is key.

3. Using tx callbacks:
   a) We still need to have different implementations for different devices.
   b) The performance overhead (compared to the tx_prepare/tx_burst pair) will not be better, since both approaches use a very similar mechanism.

In addition, the tx_prepare mechanism can be turned off by a compilation flag (as discussed with Jerin in http://dpdk.org/dev/patchwork/patch/15770/) to provide real NOOP functionality (e.g. for low-end CPUs, where even an unnecessary memory dereference and check can have a significant impact on performance).
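
A rough sketch of the split pipeline (the ring plumbing and names are
illustrative, not part of this series):

	/* worker core: validate/prepare, pass good packets downstream */
	nb_ok = rte_eth_tx_prepare(port_id, queue_id, pkts, nb_pkts);
	if (nb_ok < nb_pkts) {
		/* pkts[nb_ok] failed: restore, fix or drop it here */
	}
	rte_ring_enqueue_burst(tx_ring, (void **)pkts, nb_ok);

	/* I/O core: nothing left to do but the burst itself */
	nb_deq = rte_ring_dequeue_burst(tx_ring, (void **)pkts, burst_sz);
	rte_eth_tx_burst(port_id, queue_id, pkts, nb_deq);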

Tomasz

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v12 0/6] add Tx preparation
  2016-11-30  7:40                         ` Adrien Mazarguil
  2016-11-30  8:50                           ` Thomas Monjalon
@ 2016-11-30 10:54                           ` Ananyev, Konstantin
  2016-12-01  7:15                             ` Adrien Mazarguil
  1 sibling, 1 reply; 261+ messages in thread
From: Ananyev, Konstantin @ 2016-11-30 10:54 UTC (permalink / raw)
  To: Adrien Mazarguil, Thomas Monjalon
  Cc: dev, Rahul Lakkireddy, Stephen Hurd, Jan Medala, Jakub Palider,
	John Daley, Alejandro Lucero, Harish Patil, Rasesh Mody,
	Jerin Jacob, Yuanhan Liu, Yong Wang, Kulasek, TomaszX,
	olivier.matz

Hi Adrien,

> 
> On Mon, Nov 28, 2016 at 12:03:06PM +0100, Thomas Monjalon wrote:
> > We need attention of every PMD developers on this thread.
> 
> I've been following this thread from the beginning while working on rte_flow
> and wanted to see where it was headed before replying. (I know, v11 was
> submitted about 1 month ago but still.)
> 
> > Reminder of what Konstantin suggested:
> > "
> > - if the PMD supports TX offloads AND
> > - if to be able use any of these offloads the upper layer SW would have to:
> >     * modify the contents of the packet OR
> >     * obey HW specific restrictions
> > then it is a PMD developer responsibility to provide tx_prep() that would implement
> > expected modifications of the packet contents and restriction checks.
> > Otherwise, tx_prep() implementation is not required and can be safely set to NULL.
> > "
> >
> > I copy/paste also my previous conclusion:
> >
> > Before txprep, there is only one API: the application must prepare the
> > packets checksum itself (get_psd_sum in testpmd).
> > With txprep, the application have 2 choices: keep doing the job itself
> > or call txprep which calls a PMD-specific function.
> 
> Something is definitely needed here, and only PMDs can provide it. I think
> applications should not have to clear checksum fields or initialize them to
> some magic value, same goes for any other offload or hardware limitation
> that needs to be worked around.
> 
> tx_prep() is one possible answer to this issue, however as mentioned in the
> original patch it can be very expensive if exposed by the PMD.
> 
> Another issue I'm more concerned about is the way limitations are managed
> (struct rte_eth_desc_lim). While not officially tied to tx_prep(), this
> structure contains new fields that are only relevant to a few devices, and I
> fear it will keep growing with each new hardware quirk to manage, breaking
> ABIs in the process.

Well, if some new HW capability/limitation arises and we'd like to support
it in DPDK, then yes, we would probably need to think about how to
incorporate it here. Do you have anything particular in mind?

> 
> What are applications supposed to do, check each of them regardless before
> attempting to send a burst?
> 
> I understand tx_prep() automates this process, however I'm wondering why
> isn't the TX burst function doing that itself. Using nb_mtu_seg_max as an
> example, tx_prep() has an extra check in case of TSO that the TX burst
> function does not perform. This ends up being much more expensive to
> applications due to the additional loop doing redundant testing on each
> mbuf.
> 
> If, say as a performance improvement, we decided to leave the validation
> part to the TX burst function; what remains in tx_prep() is basically heavy
> "preparation" requiring mbuf changes (i.e. erasing checksums, for now).
> 
> Following the same logic, why can't such a thing be made part of the TX
> burst function as well (through a direct call to rte_phdr_cksum_fix()
> whenever necessary). From an application standpoint, what are the advantages
> of having to:
> 
>  if (tx_prep()) // iterate and update mbufs as needed
>      tx_burst(); // iterate and send
> 
> Compared to:
> 
>  tx_burst(); // iterate, update as needed and send

I think that was discussed quite extensively here previously:
as Thomas already replied, the main motivation is to allow the user
to execute the two stages at different points of the packet TX pipeline,
and probably on different cores.
I think that gives the user better flexibility in when/where to do
these preparations and hopefully leads to better performance.

Though, if you or any other PMD developer/maintainer would prefer
a particular PMD to combine both functionalities into tx_burst() and
keep tx_prep() as a NOP - this is still possible too.
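
That is, at device init such a PMD can simply leave the hook empty,
as the vector paths in this series already do:

	dev->tx_pkt_burst = xxx_xmit_pkts;	/* does any fix-ups inline */
	dev->tx_pkt_prepare = NULL;		/* treated as a no-op by ethdev */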

> 
> Note that PMDs could still provide different TX callbacks depending on the
> set of enabled offloads so performance is not unnecessarily impacted.
> 
> In my opinion the second approach is both faster to applications and more
> friendly from a usability perspective, am I missing something obvious?
> 
> > The question is: does non-Intel drivers need a checksum preparation for TSO?
> > Will it behave well if txprep does nothing in these drivers?
> >
> > When looking at the code, most of drivers handle the TSO flags.
> > But it is hard to know whether they rely on the pseudo checksum or not.
> >
> > git grep -l 'PKT_TX_UDP_CKSUM\|PKT_TX_TCP_CKSUM\|PKT_TX_TCP_SEG' drivers/net/
> >
> > drivers/net/bnxt/bnxt_txr.c
> > drivers/net/cxgbe/sge.c
> > drivers/net/e1000/em_rxtx.c
> > drivers/net/e1000/igb_rxtx.c
> > drivers/net/ena/ena_ethdev.c
> > drivers/net/enic/enic_rxtx.c
> > drivers/net/fm10k/fm10k_rxtx.c
> > drivers/net/i40e/i40e_rxtx.c
> > drivers/net/ixgbe/ixgbe_rxtx.c
> > drivers/net/mlx4/mlx4.c
> > drivers/net/mlx5/mlx5_rxtx.c
> > drivers/net/nfp/nfp_net.c
> > drivers/net/qede/qede_rxtx.c
> > drivers/net/thunderx/nicvf_rxtx.c
> > drivers/net/virtio/virtio_rxtx.c
> > drivers/net/vmxnet3/vmxnet3_rxtx.c
> >
> > Please, we need a comment for each driver saying
> > "it is OK, we do not need any checksum preparation for TSO"
> > or
> > "yes we have to implement tx_prepare or TSO will not work in this mode"
> 
> For both mlx4 and mlx5 then,
> "it is OK, we do not need any checksum preparation for TSO".
> 
> Actually I do not think we'll ever need tx_prep() unless we add our own
> quirks to struct rte_eth_desc_lim (and friends) which are currently quietly
> handled by TX burst functions.

Ok, so MLX PMD is not affected by these changes and tx_prep for MLX can be safely
set to NULL, correct?

Thanks
Konstantin

> 
> --
> Adrien Mazarguil
> 6WIND

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v12 0/6] add Tx preparation
  2016-11-30  5:48                         ` John Daley (johndale)
@ 2016-11-30 10:59                           ` Ananyev, Konstantin
  0 siblings, 0 replies; 261+ messages in thread
From: Ananyev, Konstantin @ 2016-11-30 10:59 UTC (permalink / raw)
  To: John Daley (johndale),
	Thomas Monjalon, dev, Rahul Lakkireddy, Stephen Hurd, Jan Medala,
	Jakub Palider, Adrien Mazarguil, Alejandro Lucero, Harish Patil,
	Rasesh Mody, Jerin Jacob, Yuanhan Liu, Yong Wang
  Cc: Kulasek, TomaszX, olivier.matz

Hi John,

> 
> Hi,
> -john
> 
> > -----Original Message-----
> > From: Thomas Monjalon [mailto:thomas.monjalon@6wind.com]
> > Sent: Monday, November 28, 2016 3:03 AM
> > To: dev@dpdk.org; Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>;
> > Stephen Hurd <stephen.hurd@broadcom.com>; Jan Medala
> > <jan@semihalf.com>; Jakub Palider <jpa@semihalf.com>; John Daley
> > (johndale) <johndale@cisco.com>; Adrien Mazarguil
> > <adrien.mazarguil@6wind.com>; Alejandro Lucero
> > <alejandro.lucero@netronome.com>; Harish Patil
> > <harish.patil@qlogic.com>; Rasesh Mody <rasesh.mody@qlogic.com>; Jerin
> > Jacob <jerin.jacob@caviumnetworks.com>; Yuanhan Liu
> > <yuanhan.liu@linux.intel.com>; Yong Wang <yongwang@vmware.com>
> > Cc: Tomasz Kulasek <tomaszx.kulasek@intel.com>;
> > konstantin.ananyev@intel.com; olivier.matz@6wind.com
> > Subject: Re: [dpdk-dev] [PATCH v12 0/6] add Tx preparation
> >
> > We need attention of every PMD developers on this thread.
> >
> > Reminder of what Konstantin suggested:
> > "
> > - if the PMD supports TX offloads AND
> > - if to be able use any of these offloads the upper layer SW would have to:
> >     * modify the contents of the packet OR
> >     * obey HW specific restrictions
> > then it is a PMD developer responsibility to provide tx_prep() that would
> > implement expected modifications of the packet contents and restriction
> > checks.
> > Otherwise, tx_prep() implementation is not required and can be safely set to
> > NULL.
> > "
> >
> > I copy/paste also my previous conclusion:
> >
> > Before txprep, there is only one API: the application must prepare the
> > packets checksum itself (get_psd_sum in testpmd).
> > With txprep, the application have 2 choices: keep doing the job itself or call
> > txprep which calls a PMD-specific function.
> > The question is: does non-Intel drivers need a checksum preparation for
> > TSO?
> > Will it behave well if txprep does nothing in these drivers?
> >
> > When looking at the code, most of drivers handle the TSO flags.
> > But it is hard to know whether they rely on the pseudo checksum or not.
> >
> > git grep -l 'PKT_TX_UDP_CKSUM\|PKT_TX_TCP_CKSUM\|PKT_TX_TCP_SEG'
> > drivers/net/
> >
> > drivers/net/bnxt/bnxt_txr.c
> > drivers/net/cxgbe/sge.c
> > drivers/net/e1000/em_rxtx.c
> > drivers/net/e1000/igb_rxtx.c
> > drivers/net/ena/ena_ethdev.c
> > drivers/net/enic/enic_rxtx.c
> > drivers/net/fm10k/fm10k_rxtx.c
> > drivers/net/i40e/i40e_rxtx.c
> > drivers/net/ixgbe/ixgbe_rxtx.c
> > drivers/net/mlx4/mlx4.c
> > drivers/net/mlx5/mlx5_rxtx.c
> > drivers/net/nfp/nfp_net.c
> > drivers/net/qede/qede_rxtx.c
> > drivers/net/thunderx/nicvf_rxtx.c
> > drivers/net/virtio/virtio_rxtx.c
> > drivers/net/vmxnet3/vmxnet3_rxtx.c
> >
> > Please, we need a comment for each driver saying "it is OK, we do not need
> > any checksum preparation for TSO"
> > or
> > "yes we have to implement tx_prepare or TSO will not work in this mode"
> 
> I like the idea of tx prep since it should make for cleaner apps.
> 
> For enic, I believe the answer is "it is OK, we do not need any checksum preparation".
> 
> Prior to now, it was necessary to set the IP checksum to 0 and put in a TCP/UDP pseudo header. But there is a hardware overwrite-of-
> checksums option which makes preparation in software unnecessary, and it is testing out well so far. I plan to enable it in 17.02. TSO is also
> being enabled for 17.02 and it does not look like any prep is required. So I'm going with "txprep NULL pointer is OK for enic", but I may have
> to change my mind if something comes up in testing.

That's great, thanks.
Other non-Intel PMD maintainers, any feedback, please?
Konstantin

> 
> -john

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v12 0/6] add Tx preparation
  2016-11-28 11:03                       ` [dpdk-dev] [PATCH v12 0/6] add Tx preparation Thomas Monjalon
  2016-11-30  5:48                         ` John Daley (johndale)
  2016-11-30  7:40                         ` Adrien Mazarguil
@ 2016-11-30 16:34                         ` Harish Patil
  2016-11-30 17:42                           ` Ananyev, Konstantin
  2016-11-30 19:37                         ` Ajit Khaparde
                                           ` (2 subsequent siblings)
  5 siblings, 1 reply; 261+ messages in thread
From: Harish Patil @ 2016-11-30 16:34 UTC (permalink / raw)
  To: Thomas Monjalon, dev, Rahul Lakkireddy, Stephen Hurd, Jan Medala,
	Jakub Palider, John Daley, Adrien Mazarguil, Alejandro Lucero,
	Harish Patil, Rasesh Mody, Jacob,  Jerin, Yuanhan Liu, Yong Wang
  Cc: Tomasz Kulasek, konstantin.ananyev, olivier.matz



>We need attention of every PMD developers on this thread.
>
>Reminder of what Konstantin suggested:
>"
>- if the PMD supports TX offloads AND
>- if, to be able to use any of these offloads, the upper layer SW would have
>to:
>    * modify the contents of the packet OR
>    * obey HW specific restrictions
>then it is a PMD developer responsibility to provide tx_prep() that would
>implement
>expected modifications of the packet contents and restriction checks.
>Otherwise, tx_prep() implementation is not required and can be safely set
>to NULL.      
>"
>
>I copy/paste also my previous conclusion:
>
>Before txprep, there is only one API: the application must prepare the
>packets checksum itself (get_psd_sum in testpmd).
>With txprep, the application has 2 choices: keep doing the job itself
>or call txprep which calls a PMD-specific function.
>The question is: do non-Intel drivers need a checksum preparation for
>TSO?
>Will it behave well if txprep does nothing in these drivers?
>
>When looking at the code, most of drivers handle the TSO flags.
>But it is hard to know whether they rely on the pseudo checksum or not.
>
>git grep -l 'PKT_TX_UDP_CKSUM\|PKT_TX_TCP_CKSUM\|PKT_TX_TCP_SEG'
>drivers/net/
>
>drivers/net/bnxt/bnxt_txr.c
>drivers/net/cxgbe/sge.c
>drivers/net/e1000/em_rxtx.c
>drivers/net/e1000/igb_rxtx.c
>drivers/net/ena/ena_ethdev.c
>drivers/net/enic/enic_rxtx.c
>drivers/net/fm10k/fm10k_rxtx.c
>drivers/net/i40e/i40e_rxtx.c
>drivers/net/ixgbe/ixgbe_rxtx.c
>drivers/net/mlx4/mlx4.c
>drivers/net/mlx5/mlx5_rxtx.c
>drivers/net/nfp/nfp_net.c
>drivers/net/qede/qede_rxtx.c
>drivers/net/thunderx/nicvf_rxtx.c
>drivers/net/virtio/virtio_rxtx.c
>drivers/net/vmxnet3/vmxnet3_rxtx.c
>
>Please, we need a comment for each driver saying
>"it is OK, we do not need any checksum preparation for TSO"
>or
>"yes we have to implement tx_prepare or TSO will not work in this mode"
>

qede PMD doesn’t currently support TSO yet, it only supports Tx TCP/UDP/IP
csum offloads.
So Tx preparation isn’t applicable. So as of now -
"it is OK, we do not need any checksum preparation for TSO"


Thanks,
Harish


^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v12 0/6] add Tx preparation
  2016-11-30 16:34                         ` Harish Patil
@ 2016-11-30 17:42                           ` Ananyev, Konstantin
  2016-11-30 18:26                             ` Thomas Monjalon
  2016-11-30 18:39                             ` Harish Patil
  0 siblings, 2 replies; 261+ messages in thread
From: Ananyev, Konstantin @ 2016-11-30 17:42 UTC (permalink / raw)
  To: Harish Patil, Thomas Monjalon, dev, Rahul Lakkireddy,
	Stephen Hurd, Jan Medala, Jakub Palider, John Daley,
	Adrien Mazarguil, Alejandro Lucero, Rasesh Mody, Jacob,  Jerin,
	Yuanhan Liu, Yong Wang
  Cc: Kulasek, TomaszX, olivier.matz


Hi Harish,
> 
> 
> >We need attention of every PMD developers on this thread.
> >
> >Reminder of what Konstantin suggested:
> >"
> >- if the PMD supports TX offloads AND
> >- if, to be able to use any of these offloads, the upper layer SW would have
> >to:
> >    * modify the contents of the packet OR
> >    * obey HW specific restrictions
> >then it is a PMD developer responsibility to provide tx_prep() that would
> >implement
> >expected modifications of the packet contents and restriction checks.
> >Otherwise, tx_prep() implementation is not required and can be safely set
> >to NULL.
> >"
> >
> >I copy/paste also my previous conclusion:
> >
> >Before txprep, there is only one API: the application must prepare the
> >packets checksum itself (get_psd_sum in testpmd).
> >With txprep, the application has 2 choices: keep doing the job itself
> >or call txprep which calls a PMD-specific function.
> >The question is: do non-Intel drivers need a checksum preparation for
> >TSO?
> >Will it behave well if txprep does nothing in these drivers?
> >
> >When looking at the code, most of drivers handle the TSO flags.
> >But it is hard to know whether they rely on the pseudo checksum or not.
> >
> >git grep -l 'PKT_TX_UDP_CKSUM\|PKT_TX_TCP_CKSUM\|PKT_TX_TCP_SEG'
> >drivers/net/
> >
> >drivers/net/bnxt/bnxt_txr.c
> >drivers/net/cxgbe/sge.c
> >drivers/net/e1000/em_rxtx.c
> >drivers/net/e1000/igb_rxtx.c
> >drivers/net/ena/ena_ethdev.c
> >drivers/net/enic/enic_rxtx.c
> >drivers/net/fm10k/fm10k_rxtx.c
> >drivers/net/i40e/i40e_rxtx.c
> >drivers/net/ixgbe/ixgbe_rxtx.c
> >drivers/net/mlx4/mlx4.c
> >drivers/net/mlx5/mlx5_rxtx.c
> >drivers/net/nfp/nfp_net.c
> >drivers/net/qede/qede_rxtx.c
> >drivers/net/thunderx/nicvf_rxtx.c
> >drivers/net/virtio/virtio_rxtx.c
> >drivers/net/vmxnet3/vmxnet3_rxtx.c
> >
> >Please, we need a comment for each driver saying
> >"it is OK, we do not need any checksum preparation for TSO"
> >or
> >"yes we have to implement tx_prepare or TSO will not work in this mode"
> >
> 
> qede PMD doesn’t currently support TSO yet, it only supports Tx TCP/UDP/IP
> csum offloads.
> So Tx preparation isn’t applicable. So as of now -
> "it is OK, we do not need any checksum preparation for TSO"

Thanks for the answer.
Though please note that it is not only for TSO.
This is for any TX offload for which the upper layer SW would have
to modify the contents of the packet.
Though as I can see, for qede neither PKT_TX_IP_CKSUM nor PKT_TX_TCP_CKSUM
exhibits any extra requirements for the user.
Is that correct?

Konstantin   


> 
> 
> Thanks,
> Harish


^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v12 0/6] add Tx preparation
  2016-11-30 17:42                           ` Ananyev, Konstantin
@ 2016-11-30 18:26                             ` Thomas Monjalon
  2016-11-30 21:01                               ` Jerin Jacob
                                                 ` (2 more replies)
  2016-11-30 18:39                             ` Harish Patil
  1 sibling, 3 replies; 261+ messages in thread
From: Thomas Monjalon @ 2016-11-30 18:26 UTC (permalink / raw)
  To: Ananyev, Konstantin
  Cc: Harish Patil, dev, Rahul Lakkireddy, Stephen Hurd, Jan Medala,
	Jakub Palider, John Daley, Adrien Mazarguil, Alejandro Lucero,
	Rasesh Mody, Jacob, Jerin, Yuanhan Liu, Yong Wang, Kulasek,
	TomaszX, olivier.matz

2016-11-30 17:42, Ananyev, Konstantin:
> > >Please, we need a comment for each driver saying
> > >"it is OK, we do not need any checksum preparation for TSO"
> > >or
> > >"yes we have to implement tx_prepare or TSO will not work in this mode"
> > >
> > 
> > qede PMD doesn’t currently support TSO yet, it only supports Tx TCP/UDP/IP
> > csum offloads.
> > So Tx preparation isn’t applicable. So as of now -
> > "it is OK, we do not need any checksum preparation for TSO"
> 
> Thanks for the answer.
> Though please note that it is not only for TSO.

Oh yes, sorry, my wording was incorrect.
We need to know if any checksum preparation is needed prior
to offloading its final computation to the hardware or driver.
So the question applies to TSO and simple checksum offload.

We are still waiting answers for
	bnxt, cxgbe, ena, nfp, thunderx, virtio and vmxnet3.

> This is for any TX offload for which the upper layer SW would have
> to modify the contents of the packet.
> Though as I can see, for qede neither PKT_TX_IP_CKSUM nor PKT_TX_TCP_CKSUM
> exhibits any extra requirements for the user.
> Is that correct?

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v12 0/6] add Tx preparation
  2016-11-30 17:42                           ` Ananyev, Konstantin
  2016-11-30 18:26                             ` Thomas Monjalon
@ 2016-11-30 18:39                             ` Harish Patil
  1 sibling, 0 replies; 261+ messages in thread
From: Harish Patil @ 2016-11-30 18:39 UTC (permalink / raw)
  To: Ananyev, Konstantin, Thomas Monjalon, dev, Rahul Lakkireddy,
	Stephen Hurd, Jan Medala, Jakub Palider, John Daley,
	Adrien Mazarguil, Alejandro Lucero, Rasesh Mody, Jacob,  Jerin,
	Yuanhan Liu, Yong Wang
  Cc: Kulasek, TomaszX, olivier.matz

>
>
>
>Hi Harish,
>> 
>> 
>> >We need attention of every PMD developers on this thread.
>> >
>> >Reminder of what Konstantin suggested:
>> >"
>> >- if the PMD supports TX offloads AND
>> >- if, to be able to use any of these offloads, the upper layer SW would have
>> >to:
>> >    * modify the contents of the packet OR
>> >    * obey HW specific restrictions
>> >then it is a PMD developer responsibility to provide tx_prep() that
>>would
>> >implement
>> >expected modifications of the packet contents and restriction checks.
>> >Otherwise, tx_prep() implementation is not required and can be safely
>>set
>> >to NULL.
>> >"
>> >
>> >I copy/paste also my previous conclusion:
>> >
>> >Before txprep, there is only one API: the application must prepare the
>> >packets checksum itself (get_psd_sum in testpmd).
>> >With txprep, the application has 2 choices: keep doing the job itself
>> >or call txprep which calls a PMD-specific function.
>> >The question is: do non-Intel drivers need a checksum preparation for
>> >TSO?
>> >Will it behave well if txprep does nothing in these drivers?
>> >
>> >When looking at the code, most of drivers handle the TSO flags.
>> >But it is hard to know whether they rely on the pseudo checksum or not.
>> >
>> >git grep -l 'PKT_TX_UDP_CKSUM\|PKT_TX_TCP_CKSUM\|PKT_TX_TCP_SEG'
>> >drivers/net/
>> >
>> >drivers/net/bnxt/bnxt_txr.c
>> >drivers/net/cxgbe/sge.c
>> >drivers/net/e1000/em_rxtx.c
>> >drivers/net/e1000/igb_rxtx.c
>> >drivers/net/ena/ena_ethdev.c
>> >drivers/net/enic/enic_rxtx.c
>> >drivers/net/fm10k/fm10k_rxtx.c
>> >drivers/net/i40e/i40e_rxtx.c
>> >drivers/net/ixgbe/ixgbe_rxtx.c
>> >drivers/net/mlx4/mlx4.c
>> >drivers/net/mlx5/mlx5_rxtx.c
>> >drivers/net/nfp/nfp_net.c
>> >drivers/net/qede/qede_rxtx.c
>> >drivers/net/thunderx/nicvf_rxtx.c
>> >drivers/net/virtio/virtio_rxtx.c
>> >drivers/net/vmxnet3/vmxnet3_rxtx.c
>> >
>> >Please, we need a comment for each driver saying
>> >"it is OK, we do not need any checksum preparation for TSO"
>> >or
>> >"yes we have to implement tx_prepare or TSO will not work in this mode"
>> >
>> 
>> qede PMD doesn’t currently support TSO yet, it only supports Tx
>>TCP/UDP/IP
>> csum offloads.
>> So Tx preparation isn’t applicable. So as of now -
>> "it is OK, we do not need any checksum preparation for TSO"
>
>Thanks for the answer.
>Though please note that it is not only for TSO.

Okay. I initially thought so. But was not sure, so I explicitly indicated
that there is no TSO support.

>This is for any TX offload for which the upper layer SW would have
>to modify the contents of the packet.
>Though as I can see for qede neither PKT_TX_IP_CKSUM or PKT_TX_TCP_CKSUM
>exhibits any extra requirements for the user.
>Is that correct?

That’s right.

>
>Konstantin   
>
>
>> 
>> 
>> Thanks,
>> Harish
>
>



^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v12 0/6] add Tx preparation
  2016-11-28 11:03                       ` [dpdk-dev] [PATCH v12 0/6] add Tx preparation Thomas Monjalon
                                           ` (2 preceding siblings ...)
  2016-11-30 16:34                         ` Harish Patil
@ 2016-11-30 19:37                         ` Ajit Khaparde
  2016-12-01  8:24                         ` Rahul Lakkireddy
  2016-12-06 15:53                         ` Ferruh Yigit
  5 siblings, 0 replies; 261+ messages in thread
From: Ajit Khaparde @ 2016-11-30 19:37 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: dev, Rahul Lakkireddy, Stephen Hurd, Jan Medala, Jakub Palider,
	John Daley, Adrien Mazarguil, Alejandro Lucero, Harish Patil,
	Rasesh Mody, Jerin Jacob, Yuanhan Liu, Yong Wang, Tomasz Kulasek,
	konstantin.ananyev, olivier.matz

On Mon, Nov 28, 2016 at 5:03 AM, Thomas Monjalon <thomas.monjalon@6wind.com> wrote:

> We need attention of every PMD developers on this thread.
>
> Reminder of what Konstantin suggested:
> "
> - if the PMD supports TX offloads AND
> - if, to be able to use any of these offloads, the upper layer SW would have to:
>     * modify the contents of the packet OR
>     * obey HW specific restrictions
> then it is a PMD developer responsibility to provide tx_prep() that would
> implement
> expected modifications of the packet contents and restriction checks.
> Otherwise, tx_prep() implementation is not required and can be safely set
> to NULL.
> "
>
> I copy/paste also my previous conclusion:
>
> Before txprep, there is only one API: the application must prepare the
> packets checksum itself (get_psd_sum in testpmd).
> With txprep, the application has 2 choices: keep doing the job itself
> or call txprep which calls a PMD-specific function.
> The question is: do non-Intel drivers need a checksum preparation for
> TSO?
> Will it behave well if txprep does nothing in these drivers?
>
> When looking at the code, most of drivers handle the TSO flags.
> But it is hard to know whether they rely on the pseudo checksum or not.
>
> git grep -l 'PKT_TX_UDP_CKSUM\|PKT_TX_TCP_CKSUM\|PKT_TX_TCP_SEG'
> drivers/net/
>
> drivers/net/bnxt/bnxt_txr.c
>
::: snip :::


>
> Please, we need a comment for each driver saying
> "it is OK, we do not need any checksum preparation for TSO"
> or
> "yes we have to implement tx_prepare or TSO will not work in this mode"
> 

The bnxt devices don't need pseudo header checksum in the packet for TSO or TX
checksum offload. So..
"it is OK, we do not need any checksum preparation for TSO"

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v12 0/6] add Tx preparation
  2016-11-30 18:26                             ` Thomas Monjalon
@ 2016-11-30 21:01                               ` Jerin Jacob
  2016-12-01 10:50                               ` Ferruh Yigit
  2016-12-02 23:55                               ` Yong Wang
  2 siblings, 0 replies; 261+ messages in thread
From: Jerin Jacob @ 2016-11-30 21:01 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: Ananyev, Konstantin, Harish Patil, dev, Rahul Lakkireddy,
	Stephen Hurd, Jan Medala, Jakub Palider, John Daley,
	Adrien Mazarguil, Alejandro Lucero, Rasesh Mody, Jacob, Jerin,
	Yuanhan Liu, Yong Wang, Kulasek,  TomaszX, olivier.matz

On Wed, Nov 30, 2016 at 07:26:36PM +0100, Thomas Monjalon wrote:
> 2016-11-30 17:42, Ananyev, Konstantin:
> > > >Please, we need a comment for each driver saying
> > > >"it is OK, we do not need any checksum preparation for TSO"
> > > >or
> > > >"yes we have to implement tx_prepare or TSO will not work in this mode"
> > > >
> > > 
> > > qede PMD doesn’t currently support TSO yet, it only supports Tx TCP/UDP/IP
> > > csum offloads.
> > > So Tx preparation isn’t applicable. So as of now -
> > > "it is OK, we do not need any checksum preparation for TSO"
> > 
> > Thanks for the answer.
> > Though please note that it is not only for TSO.
> 
> Oh yes, sorry, my wording was incorrect.
> We need to know if any checksum preparation is needed prior
> to offloading its final computation to the hardware or driver.
> So the question applies to TSO and simple checksum offload.
> 
> We are still waiting answers for
> 	bnxt, cxgbe, ena, nfp, thunderx, virtio and vmxnet3.

The thunderx devices don't need pseudo header checksum
in the packet for TSO or TX checksum offload. So..
"it is OK, we do not need any checksum preparation for TSO"

> 
> > This is for any TX offload for which the upper layer SW would have
> > to modify the contents of the packet.
> > Though as I can see, for qede neither PKT_TX_IP_CKSUM nor PKT_TX_TCP_CKSUM
> > exhibits any extra requirements for the user.
> > Is that correct?
> 

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v12 0/6] add Tx preparation
  2016-11-30 10:54                           ` Ananyev, Konstantin
@ 2016-12-01  7:15                             ` Adrien Mazarguil
  2016-12-01  8:58                               ` Thomas Monjalon
  2016-12-02  1:00                               ` Ananyev, Konstantin
  0 siblings, 2 replies; 261+ messages in thread
From: Adrien Mazarguil @ 2016-12-01  7:15 UTC (permalink / raw)
  To: Ananyev, Konstantin
  Cc: Thomas Monjalon, dev, Rahul Lakkireddy, Stephen Hurd, Jan Medala,
	Jakub Palider, John Daley, Alejandro Lucero, Harish Patil,
	Rasesh Mody, Jerin Jacob, Yuanhan Liu, Yong Wang, Kulasek,
	TomaszX, olivier.matz

Hi Konstantin,

On Wed, Nov 30, 2016 at 10:54:50AM +0000, Ananyev, Konstantin wrote:
[...]
> > Something is definitely needed here, and only PMDs can provide it. I think
> > applications should not have to clear checksum fields or initialize them to
> > some magic value, same goes for any other offload or hardware limitation
> > that needs to be worked around.
> > 
> > tx_prep() is one possible answer to this issue, however as mentioned in the
> > original patch it can be very expensive if exposed by the PMD.
> > 
> > Another issue I'm more concerned about is the way limitations are managed
> > (struct rte_eth_desc_lim). While not officially tied to tx_prep(), this
> > structure contains new fields that are only relevant to a few devices, and I
> > fear it will keep growing with each new hardware quirk to manage, breaking
> > ABIs in the process.
> 
> Well, if some new HW capability/limitation were to arise and we'd like to support
> it in DPDK, then yes, we probably would need to think about how to incorporate it here.
> Do you have anything particular in mind here?

Nothing in particular, so for the sake of the argument, let's suppose that I
would like to add a field to expose some limitation that only applies to my
PMD during TX but looks generic enough to make sense, e.g. maximum packet
size when VLAN tagging is requested. PMDs are free to set that field to some
special value (say, 0) if they do not care.

Since that field exists however, conscious applications should check its
value for each packet that needs to be transmitted. This extra code causes a
slowdown just by sitting in the data path. Since it is not the only field in
that structure, the performance impact can be significant.

Even though this code is inside applications, it remains unfair to PMDs for
which these tests are irrelevant. This problem is identified and addressed
by tx_prepare().

Thanks to tx_prepare(), these checks are moved back into PMDs where they
belong. PMDs that do not need them do not have to provide support for
tx_prepare() and do not suffer any performance impact as a result;
applications only have to make sure tx_prepare() is always called at some
point before tx_burst().

Once you reach this stage, you've effectively made tx_prepare() mandatory
before tx_burst(). If some bug occurs, then perhaps you forgot to call
tx_prepare(); you just need to add it. The total cost for doing TX is
therefore tx_prepare() + tx_burst().
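
For illustration, a minimal sketch of that two-stage pattern, assuming the
v12 semantics (the return value is the number of packets successfully
prepared, and rte_errno is set for the first failing one); port_id,
queue_id, pkts and nb_pkts are whatever the application already passes to
tx_burst():

    uint16_t nb_prep, nb_sent;

    /* stage 1: fix up checksums and check HW restrictions;
     * could run on a different core than the send below */
    nb_prep = rte_eth_tx_prepare(port_id, queue_id, pkts, nb_pkts);
    if (nb_prep < nb_pkts) {
        /* pkts[nb_prep] failed the checks, rte_errno says why;
         * drop it here, though it could also be fixed and resent */
        rte_pktmbuf_free(pkts[nb_prep]);
    }

    /* stage 2: hand the prepared packets to the driver */
    nb_sent = rte_eth_tx_burst(port_id, queue_id, pkts, nb_prep);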

I'm perhaps a bit pessimistic mind you, but I do not think tx_prepare() will
remain optional for long. Sure, PMDs that do not implement it do not care,
I'm focusing on applications, for which the performance impact of calling
tx_prepare() followed by tx_burst() is higher than a single tx_burst()
performing all the necessary preparation at once.

[...]
> > Following the same logic, why can't such a thing be made part of the TX
> > burst function as well (through a direct call to rte_phdr_cksum_fix()
> > whenever necessary). From an application standpoint, what are the advantages
> > of having to:
> > 
> >  if (tx_prep()) // iterate and update mbufs as needed
> >      tx_burst(); // iterate and send
> > 
> > Compared to:
> > 
> >  tx_burst(); // iterate, update as needed and send
> 
> I think that was discussed extensively quite a lot previously here:
> As Thomas already replied - main motivation is to allow user
> to execute them on different stages of packet TX pipeline,
> and probably on different cores.
> I think that provides better flexibility to the user to when/where
> do these preparations and hopefully would lead to better performance.

And I agree, I think this use case is valid but does not warrant such a high
penalty when your application does not need that much flexibility. Simple
(yet conscious) applications need the highest performance. Complex ones as
you described already suffer quite a bit from IPCs and won't mind a couple
of extra CPU cycles right?

Yes they will, therefore we need a method that satisfies both cases.

As a possible solution, a special mbuf flag could be added to each mbuf
having gone through tx_prepare(). That way, tx_burst() could skip some
checks and things it would otherwise have done.

Another possibility: telling the PMD first that you always intend to use
tx_prepare() and getting a simpler/faster tx_burst() callback as a result.
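
To make the flag idea concrete, a purely hypothetical sketch; PKT_TX_PREPARED
does not exist in DPDK, its name and bit value are invented here:

    /* hypothetical flag: set by tx_prepare(), tested by tx_burst() */
    #define PKT_TX_PREPARED (1ULL << 63)    /* bit chosen arbitrarily */

    /* inside a PMD: does this mbuf still need software fix-ups? */
    static inline int
    needs_sw_fixup(const struct rte_mbuf *m)
    {
        /* offloads that require touching packet data... */
        const uint64_t mask = PKT_TX_TCP_SEG | PKT_TX_L4_MASK;

        /* ...unless tx_prepare() already marked the mbuf as done */
        return (m->ol_flags & mask) && !(m->ol_flags & PKT_TX_PREPARED);
    }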

> Though, if you or any other PMD developer/maintainer would prefer
> for a particular PMD to combine both functionalities into tx_burst() and
> keep tx_prep() as NOP - this is still possible too.  

Whether they implement it or not, this issue does not impact PMDs anyway; we
should probably ask DPDK application developers instead.

[...]
> > For both mlx4 and mlx5 then,
> > "it is OK, we do not need any checksum preparation for TSO".
> > 
> > Actually I do not think we'll ever need tx_prep() unless we add our own
> > quirks to struct rte_eth_desc_lim (and friends) which are currently quietly
> > handled by TX burst functions.
> 
> Ok, so MLX PMD is not affected by these changes and tx_prep for MLX can be safely
> set to NULL, correct?

Correct, actually the rest of this message should be in a separate
thread. From the MLX side, there is no issue with tx_prepare().

-- 
Adrien Mazarguil
6WIND

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v12 0/6] add Tx preparation
  2016-11-30 10:30                             ` Kulasek, TomaszX
@ 2016-12-01  7:19                               ` Adrien Mazarguil
  0 siblings, 0 replies; 261+ messages in thread
From: Adrien Mazarguil @ 2016-12-01  7:19 UTC (permalink / raw)
  To: Kulasek, TomaszX; +Cc: Thomas Monjalon, dev, Ananyev, Konstantin, olivier.matz

Hi Tomasz,

On Wed, Nov 30, 2016 at 10:30:54AM +0000, Kulasek, TomaszX wrote:
[...]
> > > In my opinion the second approach is both faster to applications and
> > > more friendly from a usability perspective, am I missing something
> > obvious?
> > 
> > I think it was not clearly explained in this patchset, but this is my
> > understanding:
> > tx_prepare and tx_burst can be called at different stages of a pipeline,
> > on different cores.
> 
> Yes, this API is intended to be used optionally, not only just before tx_burst.
> 
> 1. Separating both stages:
>    a) We may have control over the burst (packet content, validation) when needed.
>    b) For invalid packets we may restore them or do some other task if needed (even at an early stage of processing).
>    c) Tx burst stays as simple as it should be.
> 
> 2. Joining the functionality of tx_prepare and tx_burst has some disadvantages:
>    a) When a packet is invalid it cannot be restored by the application and should be dropped.
>    b) Tx burst needs to modify the content of the packet.
>    c) We have no way to eliminate the overhead of preparation (tx_prepare) for applications where performance is key.
> 
> 3. Using tx callbacks:
>    a) We still need to have different implementations for different devices.
>    b) The overhead in performance (compared to the tx_prepare/tx_burst pair) will not be better, as both ways use a very similar mechanism.
> 
> In addition, the tx_prepare mechanism can be turned off by a compilation flag (as discussed with Jerin in http://dpdk.org/dev/patchwork/patch/15770/) to provide real NOOP functionality (e.g. for low-end CPUs, where even an unnecessary memory dereference and check can have a significant impact on performance).

Thanks for the reminder; also, I missed v12 for some reason and still
thought rte_phdr_cksum_fix() was some generic function that applications had
to use directly regardless.
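
For reference, a sketch of the kind of per-packet fix-up rte_phdr_cksum_fix()
performs in the IPv4/TCP case, built only from the existing rte_ip.h helpers
(the actual patch handles more packet types):

    struct ipv4_hdr *ip;
    struct tcp_hdr *tcp;

    ip = rte_pktmbuf_mtod_offset(m, struct ipv4_hdr *, m->l2_len);
    tcp = (struct tcp_hdr *)((char *)ip + m->l3_len);

    if (m->ol_flags & PKT_TX_IP_CKSUM)
        ip->hdr_checksum = 0;    /* HW will fill the IP checksum */

    /* HW expects the TCP checksum field to hold the pseudo-header sum */
    tcp->cksum = rte_ipv4_phdr_cksum(ip, m->ol_flags);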

Although I agree with your description, I still think there is an issue,
please see my reply to Konstantin [1].

[1] http://dpdk.org/ml/archives/dev/2016-December/050970.html

-- 
Adrien Mazarguil
6WIND

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v12 0/6] add Tx preparation
  2016-11-28 11:03                       ` [dpdk-dev] [PATCH v12 0/6] add Tx preparation Thomas Monjalon
                                           ` (3 preceding siblings ...)
  2016-11-30 19:37                         ` Ajit Khaparde
@ 2016-12-01  8:24                         ` Rahul Lakkireddy
  2016-12-06 15:53                         ` Ferruh Yigit
  5 siblings, 0 replies; 261+ messages in thread
From: Rahul Lakkireddy @ 2016-12-01  8:24 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: dev, Stephen Hurd, Jan Medala, Jakub Palider, John Daley,
	Adrien Mazarguil, Alejandro Lucero, Harish Patil, Rasesh Mody,
	Jerin Jacob, Yuanhan Liu, Yong Wang, Tomasz Kulasek,
	konstantin.ananyev, olivier.matz

Hi Thomas,

On Monday, November 11/28/16, 2016 at 16:33:06 +0530, Thomas Monjalon wrote:
> We need attention of every PMD developers on this thread.
> 
> Reminder of what Konstantin suggested:
> "
> - if the PMD supports TX offloads AND
> - if, to be able to use any of these offloads, the upper layer SW would have to:
>     * modify the contents of the packet OR
>     * obey HW specific restrictions
> then it is a PMD developer responsibility to provide tx_prep() that would implement
> expected modifications of the packet contents and restriction checks.
> Otherwise, tx_prep() implementation is not required and can be safely set to NULL.      
> "
> 
> I copy/paste also my previous conclusion:
> 
> Before txprep, there is only one API: the application must prepare the
> packets checksum itself (get_psd_sum in testpmd).
> With txprep, the application has 2 choices: keep doing the job itself
> or call txprep which calls a PMD-specific function.
> The question is: do non-Intel drivers need a checksum preparation for TSO?
> Will it behave well if txprep does nothing in these drivers?
> 
> When looking at the code, most of drivers handle the TSO flags.
> But it is hard to know whether they rely on the pseudo checksum or not.
> 
> git grep -l 'PKT_TX_UDP_CKSUM\|PKT_TX_TCP_CKSUM\|PKT_TX_TCP_SEG' drivers/net/
> 
> drivers/net/bnxt/bnxt_txr.c
> drivers/net/cxgbe/sge.c
> drivers/net/e1000/em_rxtx.c
> drivers/net/e1000/igb_rxtx.c
> drivers/net/ena/ena_ethdev.c
> drivers/net/enic/enic_rxtx.c
> drivers/net/fm10k/fm10k_rxtx.c
> drivers/net/i40e/i40e_rxtx.c
> drivers/net/ixgbe/ixgbe_rxtx.c
> drivers/net/mlx4/mlx4.c
> drivers/net/mlx5/mlx5_rxtx.c
> drivers/net/nfp/nfp_net.c
> drivers/net/qede/qede_rxtx.c
> drivers/net/thunderx/nicvf_rxtx.c
> drivers/net/virtio/virtio_rxtx.c
> drivers/net/vmxnet3/vmxnet3_rxtx.c
> 
> Please, we need a comment for each driver saying
> "it is OK, we do not need any checksum preparation for TSO"
> or
> "yes we have to implement tx_prepare or TSO will not work in this mode"

For CXGBE PMD, "it is OK, we do not need any checksum preparation for
TSO".

Thanks,
Rahul

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v12 0/6] add Tx preparation
  2016-12-01  7:15                             ` Adrien Mazarguil
@ 2016-12-01  8:58                               ` Thomas Monjalon
  2016-12-01 22:03                                 ` Jerin Jacob
  2016-12-02  1:00                               ` Ananyev, Konstantin
  1 sibling, 1 reply; 261+ messages in thread
From: Thomas Monjalon @ 2016-12-01  8:58 UTC (permalink / raw)
  To: Adrien Mazarguil
  Cc: Ananyev, Konstantin, dev, Rahul Lakkireddy, Stephen Hurd,
	Jan Medala, Jakub Palider, John Daley, Alejandro Lucero,
	Harish Patil, Rasesh Mody, Jerin Jacob, Yuanhan Liu, Yong Wang,
	Kulasek, TomaszX, olivier.matz

2016-12-01 08:15, Adrien Mazarguil:
> I'm perhaps a bit pessimistic mind you, but I do not think tx_prepare() will
> remain optional for long. Sure, PMDs that do not implement it do not care,
> I'm focusing on applications, for which the performance impact of calling
> tx_prepare() followed by tx_burst() is higher than a single tx_burst()
> performing all the necessary preparation at once.

I agree that tx_prepare() should become mandatory shortly.

> [...]
> > > Following the same logic, why can't such a thing be made part of the TX
> > > burst function as well (through a direct call to rte_phdr_cksum_fix()
> > > whenever necessary). From an application standpoint, what are the advantages
> > > of having to:
> > > 
> > >  if (tx_prep()) // iterate and update mbufs as needed
> > >      tx_burst(); // iterate and send
> > > 
> > > Compared to:
> > > 
> > >  tx_burst(); // iterate, update as needed and send
> > 
> > I think that was discussed extensively quite a lot previously here:
> > As Thomas already replied - main motivation is to allow user
> > to execute them on different stages of packet TX pipeline,
> > and probably on different cores.
> > I think that provides better flexibility to the user to when/where
> > do these preparations and hopefully would lead to better performance.
> 
> And I agree, I think this use case is valid but does not warrant such a high
> penalty when your application does not need that much flexibility. Simple
> (yet conscious) applications need the highest performance. Complex ones as
> you described already suffer quite a bit from IPCs and won't mind a couple
> of extra CPU cycles right?
> 
> Yes they will, therefore we need a method that satisfies both cases.
> 
> As a possible solution, a special mbuf flag could be added to each mbuf
> having gone through tx_prepare(). That way, tx_burst() could skip some
> checks and things it would otherwise have done.

I like this idea!

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v12 0/6] add Tx preparation
  2016-11-30 18:26                             ` Thomas Monjalon
  2016-11-30 21:01                               ` Jerin Jacob
@ 2016-12-01 10:50                               ` Ferruh Yigit
  2016-12-02 23:55                               ` Yong Wang
  2 siblings, 0 replies; 261+ messages in thread
From: Ferruh Yigit @ 2016-12-01 10:50 UTC (permalink / raw)
  To: Thomas Monjalon, Ananyev, Konstantin
  Cc: dev, Jan Medala, Jakub Palider, Alejandro Lucero, Yuanhan Liu,
	Yong Wang, Kulasek, TomaszX, Netanel Belgazal, Evgeny Schemeilin

On 11/30/2016 6:26 PM, Thomas Monjalon wrote:
> 2016-11-30 17:42, Ananyev, Konstantin:
>>>> Please, we need a comment for each driver saying
>>>> "it is OK, we do not need any checksum preparation for TSO"
>>>> or
>>>> "yes we have to implement tx_prepare or TSO will not work in this mode"
>>>>
>>>
>>> qede PMD doesn’t currently support TSO yet, it only supports Tx TCP/UDP/IP
>>> csum offloads.
>>> So Tx preparation isn’t applicable. So as of now -
>>> "it is OK, we do not need any checksum preparation for TSO"
>>
>> Thanks for the answer.
>> Though please note that it is not only for TSO.
> 
> Oh yes, sorry, my wording was incorrect.
> We need to know if any checksum preparation is needed prior
> to offloading its final computation to the hardware or driver.
> So the question applies to TSO and simple checksum offload.
> 
> We are still waiting answers for
> 	bnxt, cxgbe, ena, nfp, thunderx, virtio and vmxnet3.
> 

Remaining ones:

ena
nfp
virtio
vmxnet3

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v12 1/6] ethdev: add Tx preparation
  2016-11-28 10:54                         ` Thomas Monjalon
@ 2016-12-01 16:24                           ` Thomas Monjalon
  2016-12-01 19:20                             ` Kulasek, TomaszX
  0 siblings, 1 reply; 261+ messages in thread
From: Thomas Monjalon @ 2016-12-01 16:24 UTC (permalink / raw)
  To: Tomasz Kulasek; +Cc: dev, konstantin.ananyev, olivier.matz, bruce.richardson

Please, a reply to this question would be greatly appreciated.

2016-11-28 11:54, Thomas Monjalon:
> Hi,
> 
> 2016-11-23 18:36, Tomasz Kulasek:
> > --- a/config/common_base
> > +++ b/config/common_base
> > @@ -120,6 +120,7 @@ CONFIG_RTE_MAX_QUEUES_PER_PORT=1024
> >  CONFIG_RTE_LIBRTE_IEEE1588=n
> >  CONFIG_RTE_ETHDEV_QUEUE_STAT_CNTRS=16
> >  CONFIG_RTE_ETHDEV_RXTX_CALLBACKS=y
> > +CONFIG_RTE_ETHDEV_TX_PREPARE=y
> 
> Please, remind me why there is a configuration here.
> It should be the responsibility of the application to call tx_prepare
> or not. If the application chooses to use this new API but it is
> disabled, then the packets won't be prepared and there is no error code:
> 
> > +#else
> > +
> > +static inline uint16_t
> > +rte_eth_tx_prepare(__rte_unused uint8_t port_id, __rte_unused uint16_t queue_id,
> > +               __rte_unused struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
> > +{
> > +       return nb_pkts;
> > +}
> > +
> > +#endif
> 
> So the application is not aware of the issue and it will not use
> any fallback.

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v12 1/6] ethdev: add Tx preparation
  2016-11-23 17:36                       ` [dpdk-dev] [PATCH v12 1/6] ethdev: " Tomasz Kulasek
  2016-11-28 10:54                         ` Thomas Monjalon
@ 2016-12-01 16:26                         ` Thomas Monjalon
  2016-12-01 16:28                         ` Thomas Monjalon
  2 siblings, 0 replies; 261+ messages in thread
From: Thomas Monjalon @ 2016-12-01 16:26 UTC (permalink / raw)
  To: Tomasz Kulasek; +Cc: dev, konstantin.ananyev, olivier.matz

2016-11-23 18:36, Tomasz Kulasek:
> Added fields to the `struct rte_eth_desc_lim`:
> 
> 	uint16_t nb_seg_max;
> 		/**< Max number of segments per whole packet. */
> 
> 	uint16_t nb_mtu_seg_max;
> 		/**< Max number of segments per one MTU */

How (and when) is an application supposed to use these fields?
Is it useful to expose them if we make tx_prepare() mandatory?
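
(For context, a sketch of the intended consumer: an application can read
these limits through rte_eth_dev_info_get(), whose struct rte_eth_dev_info
embeds the tx_desc_lim, and cap its mbuf chains accordingly:)

    struct rte_eth_dev_info dev_info;

    rte_eth_dev_info_get(port_id, &dev_info);

    /* respect the per-packet segment limit when chaining mbufs */
    if (m->nb_segs > dev_info.tx_desc_lim.nb_seg_max)
        rte_pktmbuf_free(m);    /* or split the chain instead */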

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v12 1/6] ethdev: add Tx preparation
  2016-11-23 17:36                       ` [dpdk-dev] [PATCH v12 1/6] ethdev: " Tomasz Kulasek
  2016-11-28 10:54                         ` Thomas Monjalon
  2016-12-01 16:26                         ` Thomas Monjalon
@ 2016-12-01 16:28                         ` Thomas Monjalon
  2016-12-02  1:06                           ` Ananyev, Konstantin
  2 siblings, 1 reply; 261+ messages in thread
From: Thomas Monjalon @ 2016-12-01 16:28 UTC (permalink / raw)
  To: Tomasz Kulasek; +Cc: dev, konstantin.ananyev, olivier.matz

2016-11-23 18:36, Tomasz Kulasek:
> +/**
> + * Process a burst of output packets on a transmit queue of an Ethernet device.
> + *
> + * The rte_eth_tx_prepare() function is invoked to prepare output packets to be
> + * transmitted on the output queue *queue_id* of the Ethernet device designated
> + * by its *port_id*.
> + * The *nb_pkts* parameter is the number of packets to be prepared which are
> + * supplied in the *tx_pkts* array of *rte_mbuf* structures, each of them
> + * allocated from a pool created with rte_pktmbuf_pool_create().
> + * For each packet to send, the rte_eth_tx_prepare() function performs
> + * the following operations:
> + *
> + * - Check if packet meets devices requirements for tx offloads.
> + *
> + * - Check limitations about number of segments.
> + *
> + * - Check additional requirements when debug is enabled.
> + *
> + * - Update and/or reset required checksums when tx offload is set for packet.
> + *
> + * Since this function can modify packet data, provided mbufs must be safely
> + * writable (e.g. modified data cannot be in shared segment).

I think we will have to remove this limitation in future releases.
As we don't know how it could affect the API, I suggest declaring this
API EXPERIMENTAL.

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v12 1/6] ethdev: add Tx preparation
  2016-12-01 16:24                           ` Thomas Monjalon
@ 2016-12-01 19:20                             ` Kulasek, TomaszX
  2016-12-01 19:52                               ` Thomas Monjalon
  0 siblings, 1 reply; 261+ messages in thread
From: Kulasek, TomaszX @ 2016-12-01 19:20 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev, Ananyev, Konstantin, olivier.matz, Richardson, Bruce

Hi Thomas,

Sorry, I answered this question in another thread and I missed this one. A detailed answer is below.

> -----Original Message-----
> From: Thomas Monjalon [mailto:thomas.monjalon@6wind.com]
> Sent: Thursday, December 1, 2016 17:24
> To: Kulasek, TomaszX <tomaszx.kulasek@intel.com>
> Cc: dev@dpdk.org; Ananyev, Konstantin <konstantin.ananyev@intel.com>;
> olivier.matz@6wind.com; Richardson, Bruce <bruce.richardson@intel.com>
> Subject: Re: [dpdk-dev] [PATCH v12 1/6] ethdev: add Tx preparation
> 
> Please, a reply to this question would be greatly appreciated.
> 
> 2016-11-28 11:54, Thomas Monjalon:
> > Hi,
> >
> > 2016-11-23 18:36, Tomasz Kulasek:
> > > --- a/config/common_base
> > > +++ b/config/common_base
> > > @@ -120,6 +120,7 @@ CONFIG_RTE_MAX_QUEUES_PER_PORT=1024
> > >  CONFIG_RTE_LIBRTE_IEEE1588=n
> > >  CONFIG_RTE_ETHDEV_QUEUE_STAT_CNTRS=16
> > >  CONFIG_RTE_ETHDEV_RXTX_CALLBACKS=y
> > > +CONFIG_RTE_ETHDEV_TX_PREPARE=y
> >
> > Please, remind me why there is a configuration here.
> > It should be the responsibility of the application to call tx_prepare
> > or not. If the application chooses to use this new API but it is
> > disabled, then the packets won't be prepared and there is no error code:
> >
> > > +#else
> > > +
> > > +static inline uint16_t
> > > +rte_eth_tx_prepare(__rte_unused uint8_t port_id, __rte_unused
> uint16_t queue_id,
> > > +               __rte_unused struct rte_mbuf **tx_pkts, uint16_t
> > > +nb_pkts) {
> > > +       return nb_pkts;
> > > +}
> > > +
> > > +#endif
> >
> > So the application is not aware of the issue and it will not use any
> > fallback.

The tx_prepare mechanism can be turned off by a compilation flag (as discussed with Jerin in http://dpdk.org/dev/patchwork/patch/15770/) to provide real NOOP functionality (e.g. for low-end CPUs, where even an unnecessary memory dereference and check can have a significant impact on performance).

Jerin observed that on some architectures (e.g. low-end ARM with an embedded NIC), just reading and comparing 'dev->tx_pkt_prepare' may cause a significant performance drop, so he proposed to introduce this configuration flag to provide a real NOOP when the tx_prepare functionality is not required, which can be chosen based on the _target_ configuration.

For other cases, when this flag is turned on (the default) and tx_prepare is not implemented, a functional NOOP is used, based on the comparison (dev->tx_pkt_prepare == NULL).
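
For reference, roughly what the two build variants look like; a sketch
following the v12 patch layout, with the debug-mode port checks omitted:

    #ifdef RTE_ETHDEV_TX_PREPARE
    /* functional NOOP: one pointer load and branch per burst
     * when the PMD leaves tx_pkt_prepare set to NULL */
    static inline uint16_t
    rte_eth_tx_prepare(uint8_t port_id, uint16_t queue_id,
            struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
    {
        struct rte_eth_dev *dev = &rte_eth_devices[port_id];

        if (dev->tx_pkt_prepare == NULL)
            return nb_pkts;
        return (*dev->tx_pkt_prepare)(dev->data->tx_queues[queue_id],
                tx_pkts, nb_pkts);
    }
    #else
    /* real NOOP: the call compiles away entirely */
    static inline uint16_t
    rte_eth_tx_prepare(__rte_unused uint8_t port_id,
            __rte_unused uint16_t queue_id,
            __rte_unused struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
    {
        return nb_pkts;
    }
    #endif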

Tomasz

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v12 1/6] ethdev: add Tx preparation
  2016-12-01 19:20                             ` Kulasek, TomaszX
@ 2016-12-01 19:52                               ` Thomas Monjalon
  2016-12-01 21:56                                 ` Jerin Jacob
  2016-12-01 22:31                                 ` Kulasek, TomaszX
  0 siblings, 2 replies; 261+ messages in thread
From: Thomas Monjalon @ 2016-12-01 19:52 UTC (permalink / raw)
  To: Kulasek, TomaszX
  Cc: dev, Ananyev, Konstantin, olivier.matz, Richardson, Bruce

2016-12-01 19:20, Kulasek, TomaszX:
> Hi Thomas,
> 
> Sorry, I answered this question in another thread and I missed this one. A detailed answer is below.

Yes you already gave this answer.
And I will continue asking the question until you understand it.

> > 2016-11-28 11:54, Thomas Monjalon:
> > > Hi,
> > >
> > > 2016-11-23 18:36, Tomasz Kulasek:
> > > > --- a/config/common_base
> > > > +++ b/config/common_base
> > > > @@ -120,6 +120,7 @@ CONFIG_RTE_MAX_QUEUES_PER_PORT=1024
> > > >  CONFIG_RTE_LIBRTE_IEEE1588=n
> > > >  CONFIG_RTE_ETHDEV_QUEUE_STAT_CNTRS=16
> > > >  CONFIG_RTE_ETHDEV_RXTX_CALLBACKS=y
> > > > +CONFIG_RTE_ETHDEV_TX_PREPARE=y
> > >
> > > Please, remind me why there is a configuration here.
> > > It should be the responsibility of the application to call tx_prepare
> > > or not. If the application chooses to use this new API but it is
> > > disabled, then the packets won't be prepared and there is no error code:
> > >
> > > > +#else
> > > > +
> > > > +static inline uint16_t
> > > > +rte_eth_tx_prepare(__rte_unused uint8_t port_id, __rte_unused
> > uint16_t queue_id,
> > > > +               __rte_unused struct rte_mbuf **tx_pkts, uint16_t
> > > > +nb_pkts) {
> > > > +       return nb_pkts;
> > > > +}
> > > > +
> > > > +#endif
> > >
> > > So the application is not aware of the issue and it will not use any
> > > fallback.
> 
> tx_prepare mechanism can be turned off by compilation flag (as discussed with Jerin in http://dpdk.org/dev/patchwork/patch/15770/) to provide real NOOP functionality (e.g. for low-end CPUs, where even unnecessary memory dereference and check can have significant impact on performance).
> 
> Jerin observed that on some architectures (e.g. low-end ARM with embedded NIC), just reading and comparing 'dev->tx_pkt_prepare' may cause significant performance drop, so he proposed to introduce this configuration flag to provide real NOOP when tx_prepare functionality is not required, and can be turned on based on the _target_ configuration.
> 
> For other cases, when this flag is turned on (by default), and tx_prepare is not implemented, functional NOOP is used based on comparison (dev->tx_pkt_prepare == NULL).

So if the application calls this function and it is disabled, it simply
won't work. Packets won't be prepared, checksum won't be computed.

I give up, I just NACK.

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v12 1/6] ethdev: add Tx preparation
  2016-12-01 19:52                               ` Thomas Monjalon
@ 2016-12-01 21:56                                 ` Jerin Jacob
  2016-12-01 22:31                                 ` Kulasek, TomaszX
  1 sibling, 0 replies; 261+ messages in thread
From: Jerin Jacob @ 2016-12-01 21:56 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: Kulasek, TomaszX, dev, Ananyev,  Konstantin, olivier.matz,
	Richardson, Bruce

On Thu, Dec 01, 2016 at 08:52:22PM +0100, Thomas Monjalon wrote:
> 2016-12-01 19:20, Kulasek, TomaszX:
> > Hi Thomas,
> > 
> > Sorry, I have answered for this question in another thread and I missed about this one. Detailed answer is below.
> 
> Yes you already gave this answer.
> And I will continue asking the question until you understand it.
> 
> > > 2016-11-28 11:54, Thomas Monjalon:
> > > > Hi,
> > > >
> > > > 2016-11-23 18:36, Tomasz Kulasek:
> > > > > --- a/config/common_base
> > > > > +++ b/config/common_base
> > > > > @@ -120,6 +120,7 @@ CONFIG_RTE_MAX_QUEUES_PER_PORT=1024
> > > > >  CONFIG_RTE_LIBRTE_IEEE1588=n
> > > > >  CONFIG_RTE_ETHDEV_QUEUE_STAT_CNTRS=16
> > > > >  CONFIG_RTE_ETHDEV_RXTX_CALLBACKS=y
> > > > > +CONFIG_RTE_ETHDEV_TX_PREPARE=y
> > > >
> > > > Please, remind me why there is a configuration here.
> > > > It should be the responsibility of the application to call tx_prepare
> > > > or not. If the application chooses to use this new API but it is
> > > > disabled, then the packets won't be prepared and there is no error code:
> > > >
> > > > > +#else
> > > > > +
> > > > > +static inline uint16_t
> > > > > +rte_eth_tx_prepare(__rte_unused uint8_t port_id, __rte_unused
> > > uint16_t queue_id,
> > > > > +               __rte_unused struct rte_mbuf **tx_pkts, uint16_t
> > > > > +nb_pkts) {
> > > > > +       return nb_pkts;
> > > > > +}
> > > > > +
> > > > > +#endif
> > > >
> > > > So the application is not aware of the issue and it will not use any
> > > > fallback.
> > 
> > tx_prepare mechanism can be turned off by compilation flag (as discussed with Jerin in http://dpdk.org/dev/patchwork/patch/15770/) to provide real NOOP functionality (e.g. for low-end CPUs, where even unnecessary memory dereference and check can have significant impact on performance).
> > 
> > Jerin observed that on some architectures (e.g. low-end ARM with embedded NIC), just reading and comparing 'dev->tx_pkt_prepare' may cause significant performance drop, so he proposed to introduce this configuration flag to provide real NOOP when tx_prepare functionality is not required, and can be turned on based on the _target_ configuration.
> > 
> > For other cases, when this flag is turned on (by default), and tx_prepare is not implemented, functional NOOP is used based on comparison (dev->tx_pkt_prepare == NULL).
> 
> So if the application calls this function and it is disabled, it simply
> won't work. Packets won't be prepared, checksum won't be computed.
The use case I was referring to was the "integrated NIC" case, where
- DPDK target with no external NW PCI card support
AND
- The "integrated NIC" does not need tx_prepare

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v12 0/6] add Tx preparation
  2016-12-01  8:58                               ` Thomas Monjalon
@ 2016-12-01 22:03                                 ` Jerin Jacob
  0 siblings, 0 replies; 261+ messages in thread
From: Jerin Jacob @ 2016-12-01 22:03 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: Adrien Mazarguil, Ananyev, Konstantin, dev, Rahul Lakkireddy,
	Stephen Hurd, Jan Medala, Jakub Palider, John Daley,
	Alejandro Lucero, Harish Patil, Rasesh Mody, Yuanhan Liu,
	Yong Wang, Kulasek, TomaszX, olivier.matz

On Thu, Dec 01, 2016 at 09:58:31AM +0100, Thomas Monjalon wrote:
> 2016-12-01 08:15, Adrien Mazarguil:
> > I'm perhaps a bit pessimistic mind you, but I do not think tx_prepare() will
> > remain optional for long. Sure, PMDs that do not implement it do not care,
> > I'm focusing on applications, for which the performance impact of calling
> > tx_prepare() followed by tx_burst() is higher than a single tx_burst()
> > performing all the necessary preparation at once.
> 
> I agree that tx_prepare() should become mandatory shortly.

I agree. The tx_prepare has to be mandatory. The application will have no
idea how PMD drivers use this hook to fix up PMD tx side limitations.
On the other side, if it turns out to be mandatory, what real benefit is it
going to have compared to the existing scheme of just tx_burst?

> 
> > [...]
> > > > Following the same logic, why can't such a thing be made part of the TX
> > > > burst function as well (through a direct call to rte_phdr_cksum_fix()
> > > > whenever necessary). From an application standpoint, what are the advantages
> > > > of having to:
> > > > 
> > > >  if (tx_prep()) // iterate and update mbufs as needed
> > > >      tx_burst(); // iterate and send
> > > > 
> > > > Compared to:
> > > > 
> > > >  tx_burst(); // iterate, update as needed and send
> > > 
> > > I think that was discussed extensively quite a lot previously here:
> > > As Thomas already replied - main motivation is to allow user
> > > to execute them on different stages of packet TX pipeline,
> > > and probably on different cores.
> > > I think that provides better flexibility to the user to when/where
> > > do these preparations and hopefully would lead to better performance.
> > 
> > And I agree, I think this use case is valid but does not warrant such a high
> > penalty when your application does not need that much flexibility. Simple
> > (yet conscious) applications need the highest performance. Complex ones as
> > you described already suffer quite a bit from IPCs and won't mind a couple
> > of extra CPU cycles right?
> > 
> > Yes they will, therefore we need a method that satisfies both cases.
> > 
> > As a possible solution, a special mbuf flag could be added to each mbuf
> > having gone through tx_prepare(). That way, tx_burst() could skip some
> > checks and things it would otherwise have done.
> 
> I like this idea!
> 

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v12 1/6] ethdev: add Tx preparation
  2016-12-01 19:52                               ` Thomas Monjalon
  2016-12-01 21:56                                 ` Jerin Jacob
@ 2016-12-01 22:31                                 ` Kulasek, TomaszX
  2016-12-01 23:50                                   ` Thomas Monjalon
  2016-12-02  0:10                                   ` Ananyev, Konstantin
  1 sibling, 2 replies; 261+ messages in thread
From: Kulasek, TomaszX @ 2016-12-01 22:31 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev, Ananyev, Konstantin, olivier.matz, Richardson, Bruce

Hi Thomas,

> -----Original Message-----
> From: Thomas Monjalon [mailto:thomas.monjalon@6wind.com]
> Sent: Thursday, December 1, 2016 20:52
> To: Kulasek, TomaszX <tomaszx.kulasek@intel.com>
> Cc: dev@dpdk.org; Ananyev, Konstantin <konstantin.ananyev@intel.com>;
> olivier.matz@6wind.com; Richardson, Bruce <bruce.richardson@intel.com>
> Subject: Re: [dpdk-dev] [PATCH v12 1/6] ethdev: add Tx preparation
> 
> 2016-12-01 19:20, Kulasek, TomaszX:
> > Hi Thomas,
> >
> > > Sorry, I answered this question in another thread and I missed this
> one. A detailed answer is below.
> 
> Yes you already gave this answer.
> And I will continue asking the question until you understand it.
> 
> > > 2016-11-28 11:54, Thomas Monjalon:
> > > > Hi,
> > > >
> > > > 2016-11-23 18:36, Tomasz Kulasek:
> > > > > --- a/config/common_base
> > > > > +++ b/config/common_base
> > > > > @@ -120,6 +120,7 @@ CONFIG_RTE_MAX_QUEUES_PER_PORT=1024
> > > > >  CONFIG_RTE_LIBRTE_IEEE1588=n
> > > > >  CONFIG_RTE_ETHDEV_QUEUE_STAT_CNTRS=16
> > > > >  CONFIG_RTE_ETHDEV_RXTX_CALLBACKS=y
> > > > > +CONFIG_RTE_ETHDEV_TX_PREPARE=y
> > > >
> > > > Please, remind me why there is a configuration here.
> > > > It should be the responsibility of the application to call
> > > > tx_prepare or not. If the application chooses to use this new API
> > > > but it is disabled, then the packets won't be prepared and there is
> no error code:
> > > >
> > > > > +#else
> > > > > +
> > > > > +static inline uint16_t
> > > > > +rte_eth_tx_prepare(__rte_unused uint8_t port_id, __rte_unused
> > > uint16_t queue_id,
> > > > > +               __rte_unused struct rte_mbuf **tx_pkts, uint16_t
> > > > > +nb_pkts) {
> > > > > +       return nb_pkts;
> > > > > +}
> > > > > +
> > > > > +#endif
> > > >
> > > > So the application is not aware of the issue and it will not use
> > > > any fallback.
> >
> > tx_prepare mechanism can be turned off by compilation flag (as discussed
> with Jerin in http://dpdk.org/dev/patchwork/patch/15770/) to provide real
> NOOP functionality (e.g. for low-end CPUs, where even unnecessary memory
> dereference and check can have significant impact on performance).
> >
> > Jerin observed that on some architectures (e.g. low-end ARM with
> embedded NIC), just reading and comparing 'dev->tx_pkt_prepare' may cause
> significant performance drop, so he proposed to introduce this
> configuration flag to provide real NOOP when tx_prepare functionality is
> not required, and can be turned on based on the _target_ configuration.
> >
> > For other cases, when this flag is turned on (by default), and
> tx_prepare is not implemented, functional NOOP is used based on comparison
> (dev->tx_pkt_prepare == NULL).
> 
> So if the application calls this function and it is disabled, it simply
> won't work. Packets won't be prepared, checksum won't be computed.
> 
> I give up, I just NACK.

It is not to be turned on/off however someone wants, but only for the case when the platform developer knows that his platform doesn't need this callback, so he may turn it off and save some performance (this option is per target).

In this case, the behavior of tx_prepare will be exactly the same whether it is turned on or off. If it is not the same, there's no sense in turning it off. There was a long topic where we tried to convince you that it should be turned on for all devices.

Tomasz

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v12 1/6] ethdev: add Tx preparation
  2016-12-01 22:31                                 ` Kulasek, TomaszX
@ 2016-12-01 23:50                                   ` Thomas Monjalon
  2016-12-09 13:25                                     ` Kulasek, TomaszX
  2016-12-02  0:10                                   ` Ananyev, Konstantin
  1 sibling, 1 reply; 261+ messages in thread
From: Thomas Monjalon @ 2016-12-01 23:50 UTC (permalink / raw)
  To: Kulasek, TomaszX
  Cc: dev, Ananyev, Konstantin, olivier.matz, Richardson, Bruce

2016-12-01 22:31, Kulasek, TomaszX:
> From: Thomas Monjalon [mailto:thomas.monjalon@6wind.com]
> > 2016-12-01 19:20, Kulasek, TomaszX:
> > > Hi Thomas,
> > >
> > > Sorry, I answered this question in another thread and I missed this
> > one. A detailed answer is below.
> > 
> > Yes you already gave this answer.
> > And I will continue asking the question until you understand it.
> > 
> > > > 2016-11-28 11:54, Thomas Monjalon:
> > > > > Hi,
> > > > >
> > > > > 2016-11-23 18:36, Tomasz Kulasek:
> > > > > > --- a/config/common_base
> > > > > > +++ b/config/common_base
> > > > > > @@ -120,6 +120,7 @@ CONFIG_RTE_MAX_QUEUES_PER_PORT=1024
> > > > > >  CONFIG_RTE_LIBRTE_IEEE1588=n
> > > > > >  CONFIG_RTE_ETHDEV_QUEUE_STAT_CNTRS=16
> > > > > >  CONFIG_RTE_ETHDEV_RXTX_CALLBACKS=y
> > > > > > +CONFIG_RTE_ETHDEV_TX_PREPARE=y
> > > > >
> > > > > Please, remind me why there is a configuration here.
> > > > > It should be the responsibility of the application to call
> > > > > tx_prepare or not. If the application chooses to use this new API
> > > > > but it is disabled, then the packets won't be prepared and there is
> > no error code:
> > > > >
> > > > > > +#else
> > > > > > +
> > > > > > +static inline uint16_t
> > > > > > +rte_eth_tx_prepare(__rte_unused uint8_t port_id, __rte_unused
> > > > uint16_t queue_id,
> > > > > > +               __rte_unused struct rte_mbuf **tx_pkts, uint16_t
> > > > > > +nb_pkts) {
> > > > > > +       return nb_pkts;
> > > > > > +}
> > > > > > +
> > > > > > +#endif
> > > > >
> > > > > So the application is not aware of the issue and it will not use
> > > > > any fallback.
> > >
> > > tx_prepare mechanism can be turned off by compilation flag (as discussed
> > with Jerin in http://dpdk.org/dev/patchwork/patch/15770/) to provide real
> > NOOP functionality (e.g. for low-end CPUs, where even unnecessary memory
> > dereference and check can have significant impact on performance).
> > >
> > > Jerin observed that on some architectures (e.g. low-end ARM with
> > embedded NIC), just reading and comparing 'dev->tx_pkt_prepare' may cause
> > significant performance drop, so he proposed to introduce this
> > configuration flag to provide real NOOP when tx_prepare functionality is
> > not required, and can be turned on based on the _target_ configuration.
> > >
> > > For other cases, when this flag is turned on (by default), and
> > tx_prepare is not implemented, functional NOOP is used based on comparison
> > (dev->tx_pkt_prepare == NULL).
> > 
> > So if the application calls this function and it is disabled, it simply
> > won't work. Packets won't be prepared, checksum won't be computed.
> > 
> > I give up, I just NACK.
> 
> It is not to be turned on/off however someone wants, but only for the case when the platform developer knows that his platform doesn't need this callback, so he may turn it off and save some performance (this option is per target).

How may he know? There is no comment in the config file, no documentation.

> In this case, the behavior of tx_prepare will be exactly the same whether it is turned on or off. If it is not the same, there's no sense in turning it off. There was a long topic where we tried to convince you that it should be turned on for all devices.

Really? You tried to convince me to turn it on?
No, you were trying to convince Jerin.
I think it is a wrong idea to allow disabling this function.
I didn't comment in the first discussion because Jerin said it was really
important for small hardware with a fixed NIC, and I thought it would
be implemented in a way the application cannot be misled.
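
To make the trap explicit, the application-side pattern looks roughly like
this (a sketch, assuming the v12 semantics, where tx_prepare() returns the
number of packets successfully prepared and sets rte_errno on failure):

	nb_prep = rte_eth_tx_prepare(port_id, queue_id, bufs, nb_pkts);
	if (unlikely(nb_prep < nb_pkts)) {
		/* bufs[nb_prep] failed preparation, rte_errno says why;
		 * drop it here, or fix it and retry */
		rte_pktmbuf_free(bufs[nb_prep]);
	}
	nb_tx = rte_eth_tx_burst(port_id, queue_id, bufs, nb_prep);

With the option disabled, the NOOP stub always returns nb_pkts, so the error
branch can never fire and the application cannot notice that nothing was
prepared.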

The only solution I see here is to add some comments in the configuration
file, below the #else and in the doc.
Have you checked doc/guides/prog_guide/poll_mode_drv.rst?

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v12 1/6] ethdev: add Tx preparation
  2016-12-01 22:31                                 ` Kulasek, TomaszX
  2016-12-01 23:50                                   ` Thomas Monjalon
@ 2016-12-02  0:10                                   ` Ananyev, Konstantin
  2016-12-22 13:14                                     ` Thomas Monjalon
  1 sibling, 1 reply; 261+ messages in thread
From: Ananyev, Konstantin @ 2016-12-02  0:10 UTC (permalink / raw)
  To: Kulasek, TomaszX, Thomas Monjalon; +Cc: dev, olivier.matz, Richardson, Bruce



> 
> Hi Thomas,
> 
> > -----Original Message-----
> > From: Thomas Monjalon [mailto:thomas.monjalon@6wind.com]
> > Sent: Thursday, December 1, 2016 20:52
> > To: Kulasek, TomaszX <tomaszx.kulasek@intel.com>
> > Cc: dev@dpdk.org; Ananyev, Konstantin <konstantin.ananyev@intel.com>;
> > olivier.matz@6wind.com; Richardson, Bruce <bruce.richardson@intel.com>
> > Subject: Re: [dpdk-dev] [PATCH v12 1/6] ethdev: add Tx preparation
> >
> > 2016-12-01 19:20, Kulasek, TomaszX:
> > > Hi Thomas,
> > >
> > > Sorry, I answered this question in another thread and missed
> > this one. A detailed answer is below.
> >
> > Yes you already gave this answer.
> > And I will continue asking the question until you understand it.
> >
> > > > 2016-11-28 11:54, Thomas Monjalon:
> > > > > Hi,
> > > > >
> > > > > 2016-11-23 18:36, Tomasz Kulasek:
> > > > > > --- a/config/common_base
> > > > > > +++ b/config/common_base
> > > > > > @@ -120,6 +120,7 @@ CONFIG_RTE_MAX_QUEUES_PER_PORT=1024
> > > > > >  CONFIG_RTE_LIBRTE_IEEE1588=n
> > > > > >  CONFIG_RTE_ETHDEV_QUEUE_STAT_CNTRS=16
> > > > > >  CONFIG_RTE_ETHDEV_RXTX_CALLBACKS=y
> > > > > > +CONFIG_RTE_ETHDEV_TX_PREPARE=y
> > > > >
> > > > > Please, remind me why there is a configuration here.
> > > > > It should be the responsibility of the application to call
> > > > > tx_prepare or not. If the application chooses to use this new API
> > > > > but it is disabled, then the packets won't be prepared and there is
> > no error code:
> > > > >
> > > > > > +#else
> > > > > > +
> > > > > > +static inline uint16_t
> > > > > > +rte_eth_tx_prepare(__rte_unused uint8_t port_id, __rte_unused
> > > > uint16_t queue_id,
> > > > > > +               __rte_unused struct rte_mbuf **tx_pkts, uint16_t
> > > > > > +nb_pkts) {
> > > > > > +       return nb_pkts;
> > > > > > +}
> > > > > > +
> > > > > > +#endif
> > > > >
> > > > > So the application is not aware of the issue and it will not use
> > > > > any fallback.
> > >
> > > The tx_prepare mechanism can be turned off by a compilation flag (as discussed
> > with Jerin in http://dpdk.org/dev/patchwork/patch/15770/) to provide real
> > NOOP functionality (e.g. for low-end CPUs, where even an unnecessary memory
> > dereference and check can have a significant impact on performance).
> > >
> > > Jerin observed that on some architectures (e.g. low-end ARM with an
> > embedded NIC), just reading and comparing 'dev->tx_pkt_prepare' may cause a
> > significant performance drop, so he proposed to introduce this
> > configuration flag to provide a real NOOP when the tx_prepare functionality is
> > not required; it can be turned on based on the _target_ configuration.
> > >
> > > For other cases, when this flag is turned on (by default) and
> > tx_prepare is not implemented, a functional NOOP is used based on the comparison
> > (dev->tx_pkt_prepare == NULL).
> >
> > So if the application calls this function and it is disabled, it simply
> > won't work. Packets won't be prepared, checksums won't be computed.
> >
> > I give up, I just NACK.
> 
> It is not meant to be turned on/off at will, but only for the case when the platform developer knows that his platform
> doesn't need this callback, so he may turn it off and save some performance (this option is per target).
> 
> For this case, the behavior of tx_prepare will be exactly the same whether it is turned on or off. If it is not the same, there's no sense in turning it
> off. There was a long topic where we tried to convince you that it should be turned on for all devices.

As Tomasz pointed out, RTE_ETHDEV_TX_PREPARE was introduced to fulfill Jerin's request.
From here:
"Low-end ARMv7,ARMv8 targets may not have PCIE-RC support and it may have
only integrated NIC controller. On those targets/configs, where integrated NIC
controller does not use tx_prep service it can made it as NOOP to save
cycles on following "rte_eth_tx_prep" and associated "if (unlikely(nb_prep
< nb_rx))" checks in the application."
According to the measurements he did, it can save ~7% on some low-end ARM machines.
You can read whole story here:
http://dpdk.org/dev/patchwork/patch/15770/
Though, if you guys now believe that this is not a good enough reason,
I have absolutely no problem removing RTE_ETHDEV_TX_PREPARE and the associated logic.
I personally don't use ARM boxes and don't plan to,
and in theory users can still do conditional compilation at the upper layer, if they want to.
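
As a sketch, such upper-layer conditional compilation could look like this
(APP_USE_TX_PREPARE is a hypothetical application-defined macro, not a DPDK one):

	#ifdef APP_USE_TX_PREPARE
		nb_prep = rte_eth_tx_prepare(port_id, queue_id, bufs, nb_pkts);
	#else
		nb_prep = nb_pkts;	/* skip the call and its return-value check */
	#endif
		nb_tx = rte_eth_tx_burst(port_id, queue_id, bufs, nb_prep);
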
Konstantin

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v12 0/6] add Tx preparation
  2016-12-01  7:15                             ` Adrien Mazarguil
  2016-12-01  8:58                               ` Thomas Monjalon
@ 2016-12-02  1:00                               ` Ananyev, Konstantin
  2016-12-05 15:03                                 ` Adrien Mazarguil
  1 sibling, 1 reply; 261+ messages in thread
From: Ananyev, Konstantin @ 2016-12-02  1:00 UTC (permalink / raw)
  To: Adrien Mazarguil
  Cc: Thomas Monjalon, dev, Rahul Lakkireddy, Stephen Hurd, Jan Medala,
	Jakub Palider, John Daley, Alejandro Lucero, Harish Patil,
	Rasesh Mody, Jerin Jacob, Yuanhan Liu, Yong Wang, Kulasek,
	TomaszX, olivier.matz


Hi Adrien,

> 
> Hi Konstantin,
> 
> On Wed, Nov 30, 2016 at 10:54:50AM +0000, Ananyev, Konstantin wrote:
> [...]
> > > Something is definitely needed here, and only PMDs can provide it. I think
> > > applications should not have to clear checksum fields or initialize them to
> > > some magic value, same goes for any other offload or hardware limitation
> > > that needs to be worked around.
> > >
> > > tx_prep() is one possible answer to this issue, however as mentioned in the
> > > original patch it can be very expensive if exposed by the PMD.
> > >
> > > Another issue I'm more concerned about is the way limitations are managed
> > > (struct rte_eth_desc_lim). While not officially tied to tx_prep(), this
> > > structure contains new fields that are only relevant to a few devices, and I
> > > fear it will keep growing with each new hardware quirk to manage, breaking
> > > ABIs in the process.
> >
> > Well, if some new HW capability/limitation would arise and we'd like to support
> > it in DPDK, then yes we probably would need to think how to incorporate it here.
> > Do you have anything particular in mind here?
> 
> Nothing in particular, so for the sake of the argument, let's suppose that I
> would like to add a field to expose some limitation that only applies to my
> PMD during TX but looks generic enough to make sense, e.g. maximum packet
> size when VLAN tagging is requested.

Hmm, I haven't heard about such limitations so far, but if it is a real case -
sure, feel free to submit the patch.

> PMDs are free to set that field to some
> special value (say, 0) if they do not care.
> 
> Since that field exists however, conscious applications should check its
> value for each packet that needs to be transmitted. This extra code causes a
> slowdown just by sitting in the data path. Since it is not the only field in
> that structure, the performance impact can be significant.
> 
> Even though this code is inside applications, it remains unfair to PMDs for
> which these tests are irrelevant. This problem is identified and addressed
> by tx_prepare().

I suppose the question is why do we need:
uint16_t nb_seg_max;
uint16_t nb_mtu_seg_max;
as we now have tx_prepare(), right?

For two reasons:
1. Some people might feel that tx_prepare() is not good (smart/fast) enough
for them and would prefer to do necessary preparations for TX offloads themselves.
2. Even if people do use tx_prepare(), they still should take this information into account.
As an example, ixgbe can't TX packets with more than 40 segments.
Obviously ixgbe_tx_prep() performs that check and returns an error.
But it wouldn't try to merge/reallocate mbufs for you.
User still has to do it himself, or just prevent creating such long chains somehow.
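
As a rough sketch of how an application could consult those limits (m being
the mbuf to send), assuming they are exposed through the tx_desc_lim field of
struct rte_eth_dev_info as in this series:

	struct rte_eth_dev_info info;
	uint16_t limit;

	rte_eth_dev_info_get(port_id, &info);
	/* TSO chains are bounded by nb_seg_max, non-TSO ones by nb_mtu_seg_max */
	limit = (m->ol_flags & PKT_TX_TCP_SEG) ?
		info.tx_desc_lim.nb_seg_max : info.tx_desc_lim.nb_mtu_seg_max;
	if (m->nb_segs > limit) {
		/* linearize/merge the chain, or drop the packet */
	}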

> 
> Thanks to tx_prepare(), these checks are moved back into PMDs where they
> belong. PMDs that do not need them do not have to provide support for
> tx_prepare() and do not suffer any performance impact as result;
> applications only have to make sure tx_prepare() is always called at some
> point before tx_burst().
> 
> Once you reach this stage, you've effectively made tx_prepare() mandatory
> before tx_burst(). If some bug occurs, then perhaps you forgot to call
> tx_prepare(), you just need to add it. The total cost for doing TX is
> therefore tx_prepare() + tx_burst().
> 
> I'm perhaps a bit pessimistic mind you, but I do not think tx_prepare() will
> remain optional for long. Sure, PMDs that do not implement it do not care,
> I'm focusing on applications, for which the performance impact of calling
> tx_prepare() followed by tx_burst() is higher than a single tx_burst()
> performing all the necessary preparation at once.
> 
> [...]
> > > Following the same logic, why can't such a thing be made part of the TX
> > > burst function as well (through a direct call to rte_phdr_cksum_fix()
> > > whenever necessary). From an application standpoint, what are the advantages
> > > of having to:
> > >
> > >  if (tx_prep()) // iterate and update mbufs as needed
> > >      tx_burst(); // iterate and send
> > >
> > > Compared to:
> > >
> > >  tx_burst(); // iterate, update as needed and send
> >
> > I think that was discussed extensively quite a lot previously here:
> > As Thomas already replied - main motivation is to allow user
> > to execute them on different stages of packet TX pipeline,
> > and probably on different cores.
> > I think that provides better flexibility to the user to when/where
> > do these preparations and hopefully would lead to better performance.
> 
> And I agree, I think this use case is valid but does not warrant such a high
> penalty when your application does not need that much flexibility. Simple
> (yet conscious) applications need the highest performance. Complex ones as
> you described already suffer quite a bit from IPCs and won't mind a couple
> of extra CPU cycles right?

It would mean an extra cache-miss for every packet, so I think the performance hit
would be quite significant.
About the 'simple' case when tx_prep() and tx_burst() are called on the same core:
why do you believe that
tx_prep(); tx_burst(); would be much slower than tx_burst() {tx_prep(), ...}?
tx_prep() itself is quite expensive; let's say for Intel HW it includes:
- read mbuf fields (2 cache-lines),
- read packet header (1/2 cache-lines),
- calculate pseudo-header csum,
- update packet header.
Compared to that, the price of an extra function call seems negligible
(if we TX packets in bursts of course).

> 
> Yes they will, therefore we need a method that satisfies both cases.
> 
> As a possible solution, a special mbuf flag could be added to each mbuf
> having gone through tx_prepare(). That way, tx_burst() could skip some
> checks and things it would otherwise have done.

That's an interesting idea, but it has one drawback:
As I understand, it means that from now on, if a user does preparations on his own,
he has to set this flag, otherwise tx_burst() would do extra unnecessary work.
So any existing applications that use TX offloads and do preparation by themselves
would have to be modified to avoid performance loss.

> 
> Another possibility, telling the PMD first that you always intend to use
> tx_prepare() and getting a simpler/faster tx_burst() callback as a result.

That is what we have right now (at least for Intel HW):
it is the user's responsibility to do the necessary preparations/checks before calling tx_burst().
With tx_prepare() we just remove from the user the headache of implementing tx_prepare() on his own.
Now he can use a 'proper' PMD-provided function.
My vote would still be for that model.

Konstantin

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v12 1/6] ethdev: add Tx preparation
  2016-12-01 16:28                         ` Thomas Monjalon
@ 2016-12-02  1:06                           ` Ananyev, Konstantin
  2016-12-02  8:24                             ` Olivier Matz
  0 siblings, 1 reply; 261+ messages in thread
From: Ananyev, Konstantin @ 2016-12-02  1:06 UTC (permalink / raw)
  To: Thomas Monjalon, Kulasek, TomaszX; +Cc: dev, olivier.matz


> 
> 2016-11-23 18:36, Tomasz Kulasek:
> > +/**
> > + * Process a burst of output packets on a transmit queue of an Ethernet device.
> > + *
> > + * The rte_eth_tx_prepare() function is invoked to prepare output packets to be
> > + * transmitted on the output queue *queue_id* of the Ethernet device designated
> > + * by its *port_id*.
> > + * The *nb_pkts* parameter is the number of packets to be prepared which are
> > + * supplied in the *tx_pkts* array of *rte_mbuf* structures, each of them
> > + * allocated from a pool created with rte_pktmbuf_pool_create().
> > + * For each packet to send, the rte_eth_tx_prepare() function performs
> > + * the following operations:
> > + *
> > + * - Check if packet meets devices requirements for tx offloads.
> > + *
> > + * - Check limitations about number of segments.
> > + *
> > + * - Check additional requirements when debug is enabled.
> > + *
> > + * - Update and/or reset required checksums when tx offload is set for packet.
> > + *
> > + * Since this function can modify packet data, provided mbufs must be safely
> > + * writable (e.g. modified data cannot be in shared segment).
> 
> I think we will have to remove this limitation in next releases.
> As we don't know how it could affect the API, I suggest to declare this
> API EXPERIMENTAL.

While I don't really mind marking it as experimental, I don't really understand the reasoning:
why does "this function can modify packet data, provided mbufs must be safely writable" suddenly become a problem?
That seems like an obvious limitation to me, and let's say tx_burst() has the same one.
Second, I don't see how you are going to remove it without introducing a heavy performance impact.
Konstantin
  

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v12 1/6] ethdev: add Tx preparation
  2016-12-02  1:06                           ` Ananyev, Konstantin
@ 2016-12-02  8:24                             ` Olivier Matz
  2016-12-02 16:17                               ` Ananyev, Konstantin
  0 siblings, 1 reply; 261+ messages in thread
From: Olivier Matz @ 2016-12-02  8:24 UTC (permalink / raw)
  To: Ananyev, Konstantin; +Cc: Thomas Monjalon, Kulasek, TomaszX, dev

Hi Konstantin,

On Fri, 2 Dec 2016 01:06:30 +0000, "Ananyev, Konstantin"
<konstantin.ananyev@intel.com> wrote:
> > 
> > 2016-11-23 18:36, Tomasz Kulasek:  
> > > +/**
> > > + * Process a burst of output packets on a transmit queue of an
> > > Ethernet device.
> > > + *
> > > + * The rte_eth_tx_prepare() function is invoked to prepare
> > > output packets to be
> > > + * transmitted on the output queue *queue_id* of the Ethernet
> > > device designated
> > > + * by its *port_id*.
> > > + * The *nb_pkts* parameter is the number of packets to be
> > > prepared which are
> > > + * supplied in the *tx_pkts* array of *rte_mbuf* structures,
> > > each of them
> > > + * allocated from a pool created with rte_pktmbuf_pool_create().
> > > + * For each packet to send, the rte_eth_tx_prepare() function
> > > performs
> > > + * the following operations:
> > > + *
> > > + * - Check if packet meets devices requirements for tx offloads.
> > > + *
> > > + * - Check limitations about number of segments.
> > > + *
> > > + * - Check additional requirements when debug is enabled.
> > > + *
> > > + * - Update and/or reset required checksums when tx offload is
> > > set for packet.
> > > + *
> > > + * Since this function can modify packet data, provided mbufs
> > > must be safely
> > > + * writable (e.g. modified data cannot be in shared segment).  
> > 
> > I think we will have to remove this limitation in next releases.
> > As we don't know how it could affect the API, I suggest to declare
> > this API EXPERIMENTAL.  
> 
> While I don't really mind marking it as experimental, I don't really
> understand the reasoning: why does "this function can modify packet data,
> provided mbufs must be safely writable" suddenly become a problem?
> That seems like an obvious limitation to me, and let's say tx_burst()
> has the same one. Second, I don't see how you are going to remove it
> without introducing a heavy performance impact. Konstantin
> 

About tx_burst(), I don't think we should force the user to provide a
writable mbuf. There are many use cases where passing a clone
already works as of today and it avoids duplicating the mbuf data. For
instance: traffic generator, multicast, bridging/tap, etc...

Moreover, this requirement would be inconsistent with the model you are
proposing in case of pipeline:
 - tx_prepare() on core X, may update the data
 - tx_burst() on core Y, should not touch the data to avoid cache misses


Regards,
Olivier

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v12 1/6] ethdev: add Tx preparation
  2016-12-02  8:24                             ` Olivier Matz
@ 2016-12-02 16:17                               ` Ananyev, Konstantin
  2016-12-08 17:24                                 ` Olivier Matz
  0 siblings, 1 reply; 261+ messages in thread
From: Ananyev, Konstantin @ 2016-12-02 16:17 UTC (permalink / raw)
  To: Olivier Matz; +Cc: Thomas Monjalon, Kulasek, TomaszX, dev

Hi Olivier,

> -----Original Message-----
> From: Olivier Matz [mailto:olivier.matz@6wind.com]
> Sent: Friday, December 2, 2016 8:24 AM
> To: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> Cc: Thomas Monjalon <thomas.monjalon@6wind.com>; Kulasek, TomaszX <tomaszx.kulasek@intel.com>; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v12 1/6] ethdev: add Tx preparation
> 
> Hi Konstantin,
> 
> On Fri, 2 Dec 2016 01:06:30 +0000, "Ananyev, Konstantin"
> <konstantin.ananyev@intel.com> wrote:
> > >
> > > 2016-11-23 18:36, Tomasz Kulasek:
> > > > +/**
> > > > + * Process a burst of output packets on a transmit queue of an
> > > > Ethernet device.
> > > > + *
> > > > + * The rte_eth_tx_prepare() function is invoked to prepare
> > > > output packets to be
> > > > + * transmitted on the output queue *queue_id* of the Ethernet
> > > > device designated
> > > > + * by its *port_id*.
> > > > + * The *nb_pkts* parameter is the number of packets to be
> > > > prepared which are
> > > > + * supplied in the *tx_pkts* array of *rte_mbuf* structures,
> > > > each of them
> > > > + * allocated from a pool created with rte_pktmbuf_pool_create().
> > > > + * For each packet to send, the rte_eth_tx_prepare() function
> > > > performs
> > > > + * the following operations:
> > > > + *
> > > > + * - Check if packet meets devices requirements for tx offloads.
> > > > + *
> > > > + * - Check limitations about number of segments.
> > > > + *
> > > > + * - Check additional requirements when debug is enabled.
> > > > + *
> > > > + * - Update and/or reset required checksums when tx offload is
> > > > set for packet.
> > > > + *
> > > > + * Since this function can modify packet data, provided mbufs
> > > > must be safely
> > > > + * writable (e.g. modified data cannot be in shared segment).
> > >
> > > I think we will have to remove this limitation in next releases.
> > > As we don't know how it could affect the API, I suggest to declare
> > > this API EXPERIMENTAL.
> >
> > While I don't really mind marking it as experimental, I don't really
> > understand the reasoning: why does "this function can modify packet data,
> > provided mbufs must be safely writable" suddenly become a problem?
> > That seems like an obvious limitation to me, and let's say tx_burst()
> > has the same one. Second, I don't see how you are going to remove it
> > without introducing a heavy performance impact. Konstantin
> >
> 
> About tx_burst(), I don't think we should force the user to provide a
> writable mbuf. There are many use cases where passing a clone
> already works as of today and it avoids duplicating the mbuf data. For
> instance: traffic generator, multicast, bridging/tap, etc...
> 
> Moreover, this requirement would be inconsistent with the model you are
> proposing in case of pipeline:
>  - tx_prepare() on core X, may update the data
>  - tx_burst() on core Y, should not touch the data to avoid cache misses
> 

Probably I wasn't very clear in my previous mail.
I am not saying that we should force the user to pass a writable mbuf.
What I am saying is that for tx_burst() the current expectation is that
after an mbuf is handed to tx_burst(), the user shouldn't try to modify its buffer contents
until the TX engine is done with the buffer (mbuf_free() is called for it by the TX function).
For tx_prep(), I think, it is the same, though the restrictions are a bit stricter:
the user should not try to read/write to the mbuf while tx_prep() is not finished with it.
What puzzles me is why that should be the reason to mark tx_prep() as experimental.
Konstantin

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v12 0/6] add Tx preparation
  2016-11-30 18:26                             ` Thomas Monjalon
  2016-11-30 21:01                               ` Jerin Jacob
  2016-12-01 10:50                               ` Ferruh Yigit
@ 2016-12-02 23:55                               ` Yong Wang
  2016-12-04 12:11                                 ` Ananyev, Konstantin
  2 siblings, 1 reply; 261+ messages in thread
From: Yong Wang @ 2016-12-02 23:55 UTC (permalink / raw)
  To: Thomas Monjalon, Ananyev, Konstantin
  Cc: Harish Patil, dev, Rahul Lakkireddy, Stephen Hurd, Jan Medala,
	Jakub Palider, John Daley, Adrien Mazarguil, Alejandro Lucero,
	Rasesh Mody, Jacob, Jerin, Yuanhan Liu, Kulasek, TomaszX,
	olivier.matz

> -----Original Message-----
> From: Thomas Monjalon [mailto:thomas.monjalon@6wind.com]
> Sent: Wednesday, November 30, 2016 10:27 AM
> To: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> Cc: Harish Patil <harish.patil@qlogic.com>; dev@dpdk.org; Rahul Lakkireddy
> <rahul.lakkireddy@chelsio.com>; Stephen Hurd
> <stephen.hurd@broadcom.com>; Jan Medala <jan@semihalf.com>; Jakub
> Palider <jpa@semihalf.com>; John Daley <johndale@cisco.com>; Adrien
> Mazarguil <adrien.mazarguil@6wind.com>; Alejandro Lucero
> <alejandro.lucero@netronome.com>; Rasesh Mody
> <rasesh.mody@qlogic.com>; Jacob, Jerin <Jerin.Jacob@cavium.com>;
> Yuanhan Liu <yuanhan.liu@linux.intel.com>; Yong Wang
> <yongwang@vmware.com>; Kulasek, TomaszX
> <tomaszx.kulasek@intel.com>; olivier.matz@6wind.com
> Subject: Re: [dpdk-dev] [PATCH v12 0/6] add Tx preparation
> 
> 2016-11-30 17:42, Ananyev, Konstantin:
> > > >Please, we need a comment for each driver saying
> > > >"it is OK, we do not need any checksum preparation for TSO"
> > > >or
> > > >"yes we have to implement tx_prepare or TSO will not work in this
> mode"
> > > >
> > >
> > > qede PMD doesn’t currently support TSO yet, it only supports Tx
> TCP/UDP/IP
> > > csum offloads.
> > > So Tx preparation isn’t applicable. So as of now -
> > > "it is OK, we do not need any checksum preparation for TSO"
> >
> > Thanks for the answer.
> > Though please note that it is not only for TSO.
> 
> Oh yes, sorry, my wording was incorrect.
> We need to know if any checksum preparation is needed prior to
> offloading its final computation to the hardware or driver.
> So the question applies to TSO and simple checksum offload.
> 
> We are still waiting for answers for
> 	bnxt, cxgbe, ena, nfp, thunderx, virtio and vmxnet3.

The case for a virtual device is a little bit more complicated, as packets offloaded from a virtual device can eventually be delivered to another virtual NIC or to different physical NICs that have different offload requirements.  In ESX, the hypervisor will enforce that the packets offloaded will be something that the hardware expects.  The contract for vmxnet3 is that the guest needs to fill in the pseudo-header checksum for both the l4-checksum-only and the TSO + l4-checksum offload cases.

> > This is for any TX offload for which the upper layer SW would have
> > to modify the contents of the packet.
> > Though as I can see for qede neither PKT_TX_IP_CKSUM nor
> PKT_TX_TCP_CKSUM
> > exhibits any extra requirements for the user.
> > Is that correct?


^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v12 0/6] add Tx preparation
  2016-12-02 23:55                               ` Yong Wang
@ 2016-12-04 12:11                                 ` Ananyev, Konstantin
  2016-12-06 18:25                                   ` Yong Wang
  0 siblings, 1 reply; 261+ messages in thread
From: Ananyev, Konstantin @ 2016-12-04 12:11 UTC (permalink / raw)
  To: Yong Wang, Thomas Monjalon
  Cc: Harish Patil, dev, Rahul Lakkireddy, Stephen Hurd, Jan Medala,
	Jakub Palider, John Daley, Adrien Mazarguil, Alejandro Lucero,
	Rasesh Mody, Jacob, Jerin, Yuanhan Liu, Kulasek, TomaszX,
	olivier.matz

Hi 

> >
> > 2016-11-30 17:42, Ananyev, Konstantin:
> > > > >Please, we need a comment for each driver saying
> > > > >"it is OK, we do not need any checksum preparation for TSO"
> > > > >or
> > > > >"yes we have to implement tx_prepare or TSO will not work in this
> > mode"
> > > > >
> > > >
> > > > qede PMD doesn’t currently support TSO yet, it only supports Tx
> > TCP/UDP/IP
> > > > csum offloads.
> > > > So Tx preparation isn’t applicable. So as of now -
> > > > "it is OK, we do not need any checksum preparation for TSO"
> > >
> > > Thanks for the answer.
> > > Though please note that it is not only for TSO.
> >
> > Oh yes, sorry, my wording was incorrect.
> > We need to know if any checksum preparation is needed prior to
> > offloading its final computation to the hardware or driver.
> > So the question applies to TSO and simple checksum offload.
> >
> > We are still waiting for answers for
> > 	bnxt, cxgbe, ena, nfp, thunderx, virtio and vmxnet3.
> 
> The case for a virtual device is a little bit more complicated, as packets offloaded from a virtual device can eventually be delivered to
> another virtual NIC or to different physical NICs that have different offload requirements.  In ESX, the hypervisor will enforce that the packets
> offloaded will be something that the hardware expects.  The contract for vmxnet3 is that the guest needs to fill in the pseudo-header checksum
> for both the l4-checksum-only and the TSO + l4-checksum offload cases.

Ok, so at first glance that looks very similar to the Intel HW requirements.
Could you confirm whether rte_net_intel_cksum_prepare()
would also work for vmxnet3, or whether some extra modifications are required?
You can look at it here: http://dpdk.org/dev/patchwork/patch/17184/.
Note that for Intel HW the rules for pseudo-header csum calculation
differ between the TSO and non-TSO cases.
For TSO, the length inside the pseudo-header is set to 0, while for the non-TSO case
it should be set to the L3 payload length.
Is it the same for vmxnet3 or not?
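
As a sketch, the rule we apply for Intel HW, using the existing helper from
rte_ip.h (m being the mbuf under preparation, IPv4/TCP case only):

	/* Fix up the TCP checksum field before handing the packet to the HW:
	 * rte_ipv4_phdr_cksum() computes the pseudo-header csum with length 0
	 * when PKT_TX_TCP_SEG is set, and with the L3 payload length otherwise. */
	struct ipv4_hdr *ip = rte_pktmbuf_mtod_offset(m, struct ipv4_hdr *,
			m->l2_len);
	struct tcp_hdr *tcp = (struct tcp_hdr *)((char *)ip + m->l3_len);

	tcp->cksum = rte_ipv4_phdr_cksum(ip, m->ol_flags);
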
Thanks
Konstantin

 

> 
> > > This is for any TX offload for which the upper layer SW would have
> > > to modify the contents of the packet.
> > > Though as I can see for qede neither PKT_TX_IP_CKSUM nor
> > PKT_TX_TCP_CKSUM
> > > exhibits any extra requirements for the user.
> > > Is that correct?


^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v12 0/6] add Tx preparation
  2016-12-02  1:00                               ` Ananyev, Konstantin
@ 2016-12-05 15:03                                 ` Adrien Mazarguil
  2016-12-05 16:43                                   ` Ananyev, Konstantin
  0 siblings, 1 reply; 261+ messages in thread
From: Adrien Mazarguil @ 2016-12-05 15:03 UTC (permalink / raw)
  To: Ananyev, Konstantin
  Cc: Thomas Monjalon, dev, Rahul Lakkireddy, Stephen Hurd, Jan Medala,
	Jakub Palider, John Daley, Alejandro Lucero, Harish Patil,
	Rasesh Mody, Jerin Jacob, Yuanhan Liu, Yong Wang, Kulasek,
	TomaszX, olivier.matz

Hi Konstantin,

On Fri, Dec 02, 2016 at 01:00:55AM +0000, Ananyev, Konstantin wrote:
[...]
> > On Wed, Nov 30, 2016 at 10:54:50AM +0000, Ananyev, Konstantin wrote:
> > [...]
> > > > Something is definitely needed here, and only PMDs can provide it. I think
> > > > applications should not have to clear checksum fields or initialize them to
> > > > some magic value, same goes for any other offload or hardware limitation
> > > > that needs to be worked around.
> > > >
> > > > tx_prep() is one possible answer to this issue, however as mentioned in the
> > > > original patch it can be very expensive if exposed by the PMD.
> > > >
> > > > Another issue I'm more concerned about is the way limitations are managed
> > > > (struct rte_eth_desc_lim). While not officially tied to tx_prep(), this
> > > > structure contains new fields that are only relevant to a few devices, and I
> > > > fear it will keep growing with each new hardware quirk to manage, breaking
> > > > ABIs in the process.
> > >
> > > Well, if some new HW capability/limitation would arise and we'd like to support
> > > it in DPDK, then yes we probably would need to think how to incorporate it here.
> > > Do you have anything particular in mind here?
> > 
> > Nothing in particular, so for the sake of the argument, let's suppose that I
> > would like to add a field to expose some limitation that only applies to my
> > PMD during TX but looks generic enough to make sense, e.g. maximum packet
> > size when VLAN tagging is requested.
> 
> Hmm, I haven't heard about such limitations so far, but if it is a real case -
> sure, feel free to submit the patch.

I won't, that was hypothetical.

> > PMDs are free to set that field to some
> > special value (say, 0) if they do not care.
> > 
> > Since that field exists however, conscious applications should check its
> > value for each packet that needs to be transmitted. This extra code causes a
> > slowdown just by sitting in the data path. Since it is not the only field in
> > that structure, the performance impact can be significant.
> > 
> > Even though this code is inside applications, it remains unfair to PMDs for
> > which these tests are irrelevant. This problem is identified and addressed
> > by tx_prepare().
> 
> I suppose the question is why do we need:
> uint16_t nb_seg_max;
> uint16_t nb_mtu_seg_max;
> as we now have tx_prepare(), right?
> 
> For two reasons:
> 1. Some people might feel that tx_prepare() is not good (smart/fast) enough
> for them and would prefer to do necessary preparations for TX offloads themselves.
> 
> > 2. Even if people do use tx_prepare(), they still should take this information into account.
> > As an example, ixgbe can't TX packets with more than 40 segments.
> > Obviously ixgbe_tx_prep() performs that check and returns an error.

Problem is that tx_prepare() also provides safeties which are not part of
tx_burst(), such as not going over nb_mtu_seg_max. Because of this and the
fact struct rte_eth_desc_lim can grow new fields anytime, application
developers will be tempted to just call tx_prepare() and focus on more
useful things.

Put another way, from a user's point of view, tx_prepare() is an opaque
function that greatly increases tx_burst()'s ability to send mbufs as
requested, with extra error checking on top; applications not written to run
on a specific PMD/device (all of them ideally) will thus call tx_prepare()
at some point.

> But it wouldn't try to merge/reallocate mbufs for you.
> User still has to do it himself, or just prevent creating such long chains somehow.

Yes, that's another debate. PMDs could still implement a software fallback
for unlikely slow events like these. The number of PMDs is not going to
decrease, each device having its own set of weird limitations in specific
cases, PMDs should do their best to process mbufs even if that means slowly
due to the lack of preparation.

tx_prepare() has its uses but should really be optional, in the sense that
if that function is not called, tx_burst() should deal with it somehow.

> > Thanks to tx_prepare(), these checks are moved back into PMDs where they
> > belong. PMDs that do not need them do not have to provide support for
> > tx_prepare() and do not suffer any performance impact as result;
> > applications only have to make sure tx_prepare() is always called at some
> > point before tx_burst().
> > 
> > Once you reach this stage, you've effectively made tx_prepare() mandatory
> > before tx_burst(). If some bug occurs, then perhaps you forgot to call
> > tx_prepare(), you just need to add it. The total cost for doing TX is
> > therefore tx_prepare() + tx_burst().
> > 
> > I'm perhaps a bit pessimistic mind you, but I do not think tx_prepare() will
> > remain optional for long. Sure, PMDs that do not implement it do not care,
> > I'm focusing on applications, for which the performance impact of calling
> > tx_prepare() followed by tx_burst() is higher than a single tx_burst()
> > performing all the necessary preparation at once.
> > 
> > [...]
> > > > Following the same logic, why can't such a thing be made part of the TX
> > > > burst function as well (through a direct call to rte_phdr_cksum_fix()
> > > > whenever necessary). From an application standpoint, what are the advantages
> > > > of having to:
> > > >
> > > >  if (tx_prep()) // iterate and update mbufs as needed
> > > >      tx_burst(); // iterate and send
> > > >
> > > > Compared to:
> > > >
> > > >  tx_burst(); // iterate, update as needed and send
> > >
> > > I think that was discussed extensively quite a lot previously here:
> > > As Thomas already replied - main motivation is to allow user
> > > to execute them on different stages of packet TX pipeline,
> > > and probably on different cores.
> > > I think that provides better flexibility to the user to when/where
> > > do these preparations and hopefully would lead to better performance.
> > 
> > And I agree, I think this use case is valid but does not warrant such a high
> > penalty when your application does not need that much flexibility. Simple
> > (yet conscious) applications need the highest performance. Complex ones as
> > you described already suffer quite a bit from IPCs and won't mind a couple
> > of extra CPU cycles right?
> 
> It would mean an extra cache-miss for every packet, so I think the performance hit
> would be quite significant.

A performance hit has to occur somewhere regardless, because something has
to be done in order to send packets that need it. Whether this cost is in
application code or in a PMD function, it remains part of TX.

> About the 'simple' case when tx_prep() and tx_burst() are called on the same core:
> why do you believe that
> tx_prep(); tx_burst(); would be much slower than tx_burst() {tx_prep(), ...}?

I mean instead of two function calls with their own loops:

 tx_prepare() { foreach (pkt) { check(); extra_check(); ... } }

 tx_burst() { foreach (pkt) { check(); stuff(); ... } }

You end up with one:

 tx_burst() { foreach (pkt) { check(); extra_check(); stuff(); ... } }

Which usually is more efficient.

> tx_prep() itself is quite expensive; let's say for Intel HW it includes:
> - read mbuf fields (2 cache-lines),
> - read packet header (1/2 cache-lines),
> - calculate pseudo-header csum,
> - update packet header.
> Compared to that, the price of an extra function call seems negligible
> (if we TX packets in bursts of course).

We agree its performance is a critical issue then, sharing half the read
steps with tx_burst() would make sense to me.

> > Yes they will, therefore we need a method that satisfies both cases.
> > 
> > As a possible solution, a special mbuf flag could be added to each mbuf
> > having gone through tx_prepare(). That way, tx_burst() could skip some
> > checks and things it would otherwise have done.
> 
> That's an interesting idea, but it has one drawback:
> As I understand, it means that from now on, if a user does preparations on his own,
> he has to set this flag, otherwise tx_burst() would do extra unnecessary work.
> So any existing applications that use TX offloads and do preparation by themselves
> would have to be modified to avoid performance loss.

In my opinion, users should not do preparation on their own. If we provide a
generic method, it has to be fast enough to replace theirs. Perhaps not as
fast since it would work with all PMDs (usual trade-off), but acceptably so.

> > Another possibility, telling the PMD first that you always intend to use
> > tx_prepare() and getting a simpler/faster tx_burst() callback as a result.
> 
> That is what we have right now (at least for Intel HW):
> it is the user's responsibility to do the necessary preparations/checks before calling tx_burst().
> With tx_prepare() we just remove from the user the headache of implementing tx_prepare() on his own.
> Now he can use a 'proper' PMD-provided function.
> 
> My vote would still be for that model.

OK, then in a nutshell:

1. Users are not expected to perform preparation/checks themselves anymore,
   if they do, it's their problem.

2. If configured through an API to be defined, tx_burst() can be split in
   two and applications must call tx_prepare() at some point before
   tx_burst().

3. Otherwise tx_burst() should perform the necessary preparation and checks
   on its own by default (when tx_prepare() is not expected).

4. We probably still need some mbuf flag to mark mbufs that cannot be
   modified, the refcount could also serve as a hint.

Anything else?

-- 
Adrien Mazarguil
6WIND

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v12 0/6] add Tx preparation
  2016-12-05 15:03                                 ` Adrien Mazarguil
@ 2016-12-05 16:43                                   ` Ananyev, Konstantin
  2016-12-05 18:10                                     ` Adrien Mazarguil
  0 siblings, 1 reply; 261+ messages in thread
From: Ananyev, Konstantin @ 2016-12-05 16:43 UTC (permalink / raw)
  To: Adrien Mazarguil
  Cc: Thomas Monjalon, dev, Rahul Lakkireddy, Stephen Hurd, Jan Medala,
	Jakub Palider, John Daley, Alejandro Lucero, Harish Patil,
	Rasesh Mody, Jerin Jacob, Yuanhan Liu, Yong Wang, Kulasek,
	TomaszX, olivier.matz

Hi Adrien,

> 
> Hi Konstantin,
> 
> On Fri, Dec 02, 2016 at 01:00:55AM +0000, Ananyev, Konstantin wrote:
> [...]
> > > On Wed, Nov 30, 2016 at 10:54:50AM +0000, Ananyev, Konstantin wrote:
> > > [...]
> > > > > Something is definitely needed here, and only PMDs can provide it. I think
> > > > > applications should not have to clear checksum fields or initialize them to
> > > > > some magic value, same goes for any other offload or hardware limitation
> > > > > that needs to be worked around.
> > > > >
> > > > > tx_prep() is one possible answer to this issue, however as mentioned in the
> > > > > original patch it can be very expensive if exposed by the PMD.
> > > > >
> > > > > Another issue I'm more concerned about is the way limitations are managed
> > > > > (struct rte_eth_desc_lim). While not officially tied to tx_prep(), this
> > > > > structure contains new fields that are only relevant to a few devices, and I
> > > > > fear it will keep growing with each new hardware quirk to manage, breaking
> > > > > ABIs in the process.
> > > >
> > > > Well, if some new HW capability/limitation would arise and we'd like to support
> > > > it in DPDK, then yes we probably would need to think how to incorporate it here.
> > > > Do you have anything particular in mind here?
> > >
> > > Nothing in particular, so for the sake of the argument, let's suppose that I
> > > would like to add a field to expose some limitation that only applies to my
> > > PMD during TX but looks generic enough to make sense, e.g. maximum packet
> > > size when VLAN tagging is requested.
> >
> > Hmm, I haven't heard about such limitations so far, but if it is a real case -
> > sure, feel free to submit the patch.
> 
> I won't, that was hypothetical.

Then why are we discussing it? :)

> 
> > > PMDs are free to set that field to some
> > > special value (say, 0) if they do not care.
> > >
> > > Since that field exists however, conscious applications should check its
> > > value for each packet that needs to be transmitted. This extra code causes a
> > > slowdown just by sitting in the data path. Since it is not the only field in
> > > that structure, the performance impact can be significant.
> > >
> > > Even though this code is inside applications, it remains unfair to PMDs for
> > > which these tests are irrelevant. This problem is identified and addressed
> > > by tx_prepare().
> >
> > I suppose the question is why do we need:
> > uint16_t nb_seg_max;
> > uint16_t nb_mtu_seg_max;
> > as we now have tx_prepare(), right?
> >
> > For two reasons:
> > 1. Some people might feel that tx_prepare() is not good (smart/fast) enough
> > for them and would prefer to do necessary preparations for TX offloads themselves.
> >
> > 2. Even if people do use tx_prepare(), they still should take this information into account.
> > As an example, ixgbe can't TX packets with more than 40 segments.
> > Obviously ixgbe_tx_prep() performs that check and returns an error.
> 
> Problem is that tx_prepare() also provides safeties which are not part of
> tx_burst(), such as not going over nb_mtu_seg_max. Because of this and the
> fact struct rte_eth_desc_lim can grow new fields anytime, application
> developers will be tempted to just call tx_prepare() and focus on more
> useful things.

NP with that, that was the intention behind introducing it.

> 
> Put another way, from a user's point of view, tx_prepare() is an opaque
> function that greatly increases tx_burst()'s ability to send mbufs as
> requested, with extra error checking on top; applications not written to run
> on a specific PMD/device (all of them ideally) will thus call tx_prepare()
> at some point.
> 
> > But it wouldn't try to merge/reallocate mbufs for you.
> > User still has to do it himself, or just prevent creating such long chains somehow.
> 
> Yes, that's another debate. PMDs could still implement a software fallback
> for unlikely slow events like these. The number of PMDs is not going to
> decrease, each device having its own set of weird limitations in specific
> cases, PMDs should do their best to process mbufs even if that means slowly
> due to the lack of preparation.
> 
> tx_prepare() has its uses but should really be optional, in the sense that
> if that function is not called, tx_burst() should deal with it somehow.

As I said before, I don't think it is a good idea to put everything in tx_burst().
If a PMD driver prefers things that way, yes, tx_burst() can deal with each and every
possible offload requirement itself, but it shouldn't be mandatory.

> 
> > > Thanks to tx_prepare(), these checks are moved back into PMDs where they
> > > belong. PMDs that do not need them do not have to provide support for
> > > tx_prepare() and do not suffer any performance impact as result;
> > > applications only have to make sure tx_prepare() is always called at some
> > > point before tx_burst().
> > >
> > > Once you reach this stage, you've effectively made tx_prepare() mandatory
> > > before tx_burst(). If some bug occurs, then perhaps you forgot to call
> > > tx_prepare(), you just need to add it. The total cost for doing TX is
> > > therefore tx_prepare() + tx_burst().
> > >
> > > I'm perhaps a bit pessimistic mind you, but I do not think tx_prepare() will
> > > remain optional for long. Sure, PMDs that do not implement it do not care,
> > > I'm focusing on applications, for which the performance impact of calling
> > > tx_prepare() followed by tx_burst() is higher than a single tx_burst()
> > > performing all the necessary preparation at once.
> > >
> > > [...]
> > > > > Following the same logic, why can't such a thing be made part of the TX
> > > > > burst function as well (through a direct call to rte_phdr_cksum_fix()
> > > > > whenever necessary). From an application standpoint, what are the advantages
> > > > > of having to:
> > > > >
> > > > >  if (tx_prep()) // iterate and update mbufs as needed
> > > > >      tx_burst(); // iterate and send
> > > > >
> > > > > Compared to:
> > > > >
> > > > >  tx_burst(); // iterate, update as needed and send
> > > >
> > > > I think that was discussed extensively quite a lot previously here:
> > > > As Thomas already replied - main motivation is to allow user
> > > > to execute them on different stages of packet TX pipeline,
> > > > and probably on different cores.
> > > > I think that provides better flexibility to the user to when/where
> > > > do these preparations and hopefully would lead to better performance.
> > >
> > > And I agree, I think this use case is valid but does not warrant such a high
> > > penalty when your application does not need that much flexibility. Simple
> > > (yet conscious) applications need the highest performance. Complex ones as
> > > you described already suffer quite a bit from IPCs and won't mind a couple
> > > of extra CPU cycles right?
> >
> > It would mean an extra cache-miss for every packet, so I think the performance hit
> > would be quite significant.
> 
> A performance hit has to occur somewhere regardless, because something has
> to be done in order to send packets that need it. Whether this cost is in
> application code or in a PMD function, it remains part of TX.

Depending on the place, the final cost would differ quite a lot.
If you call tx_prepare() somewhere close to the place where you fill the packet header
contents, then most likely the data that tx_prepare() has to access will already be in the cache,
so the performance penalty will be minimal.
If you try to access the same data later (at tx_burst), then the possibility that it would still
be in cache is much lower.
If you call tx_burst() from another core, then the data would for sure be out of cache,
and even worse, could still be in another core's cache.
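
As a sketch of that pipelined case (ring setup omitted, names are illustrative):

	/* core X: fills the headers, then prepares while the data is still hot */
	nb_prep = rte_eth_tx_prepare(port_id, queue_id, bufs, nb_pkts);
	rte_ring_enqueue_burst(tx_ring, (void **)bufs, nb_prep);

	/* core Y: only transmits, without touching the packet data again */
	n = rte_ring_dequeue_burst(tx_ring, (void **)bufs, BURST_SZ);
	rte_eth_tx_burst(port_id, queue_id, bufs, n);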

> 
> > About the 'simple' case when tx_prep() and tx_burst() are called on the same core:
> > why do you believe that
> > tx_prep(); tx_burst(); would be much slower than tx_burst() {tx_prep(), ...}?
> 
> I mean instead of two function calls with their own loops:
> 
>  tx_prepare() { foreach (pkt) { check(); extra_check(); ... } }
> 
>  tx_burst() { foreach (pkt) { check(); stuff(); ... } }
> 
> You end up with one:
> 
>  tx_burst() { foreach (pkt) { check(); extra_check(); stuff(); ... } }
> 
> Which usually is more efficient.

I really doubt that.
If that were the case, what would be the point of processing packets in bulk?
Usually dividing processing into different stages, and at each stage processing
multiple packets at once, helps to improve performance.
At least for IA.
Look for example at how we had to change l3fwd to improve its performance.

> 
> > tx_prep() itself is quite expensive; let's say for Intel HW it includes:
> > - read mbuf fields (2 cache-lines),
> > - read packet header (1/2 cache-lines),
> > - calculate pseudo-header csum,
> > - update packet header.
> > Compared to that, the price of an extra function call seems negligible
> > (if we TX packets in bursts of course).
> 
> We agree its performance is a critical issue then, sharing half the read
> steps with tx_burst() would make sense to me.

I didn't understand that sentence.

> 
> > > Yes they will, therefore we need a method that satisfies both cases.
> > >
> > > As a possible solution, a special mbuf flag could be added to each mbuf
> > > having gone through tx_prepare(). That way, tx_burst() could skip some
> > > checks and things it would otherwise have done.
> >
> > That's an interesting idea, but it has one drawback:
> > As I understand, it means that from now on, if a user does preparations on his own,
> > he has to set this flag, otherwise tx_burst() would do extra unnecessary work.
> > So any existing applications that use TX offloads and do preparation by themselves
> > would have to be modified to avoid performance loss.
> 
> In my opinion, users should not do preparation on their own.

People already do it now.

> If we provide a
> generic method, it has to be fast enough to replace theirs. Perhaps not as
> fast since it would work with all PMDs (usual trade-off), but acceptably so.
> 
> > > Another possibility, telling the PMD first that you always intend to use
> > > tx_prepare() and getting a simpler/faster tx_burst() callback as a result.
> >
> > That is what we have right now (at least for Intel HW):
> > it is the user's responsibility to do the necessary preparations/checks before calling tx_burst().
> > With tx_prepare() we just remove from the user the headache of implementing tx_prepare() on his own.
> > Now he can use a 'proper' PMD-provided function.
> >
> > My vote would still be for that model.
> 
> OK, then in a nutshell:
> 
> 1. Users are not expected to perform preparation/checks themselves anymore,
>    if they do, it's their problem.

I think we need to be backward compatible here.
If an existing app is doing what tx_prepare() is supposed to do, it should keep working.
 
> 
> 2. If configured through an API to be defined, tx_burst() can be split in
>    two and applications must call tx_prepare() at some point before
>    tx_burst().
> 
> 3. Otherwise tx_burst() should perform the necessary preparation and checks
>    on its own by default (when tx_prepare() is not expected).

As I said before, I don't think it should be mandatory for tx_burst() to do what tx_prepare() does.
If some particular implementation of tx_burst() prefers to do things that way - that's fine.
But it shouldn't be required to.

> 
> 4. We probably still need some mbuf flag to mark mbufs that cannot be
>    modified, the refcount could also serve as a hint.

If the mbuf can't be modified, you probably just wouldn't call the function that is supposed to do that,
tx_prepare() in that case.

Konstantin

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v12 0/6] add Tx preparation
  2016-12-05 16:43                                   ` Ananyev, Konstantin
@ 2016-12-05 18:10                                     ` Adrien Mazarguil
  2016-12-06 10:56                                       ` Ananyev, Konstantin
  0 siblings, 1 reply; 261+ messages in thread
From: Adrien Mazarguil @ 2016-12-05 18:10 UTC (permalink / raw)
  To: Ananyev, Konstantin
  Cc: Thomas Monjalon, dev, Rahul Lakkireddy, Stephen Hurd, Jan Medala,
	Jakub Palider, John Daley, Alejandro Lucero, Harish Patil,
	Rasesh Mody, Jerin Jacob, Yuanhan Liu, Yong Wang, Kulasek,
	TomaszX, olivier.matz

On Mon, Dec 05, 2016 at 04:43:52PM +0000, Ananyev, Konstantin wrote:
[...]
> > On Fri, Dec 02, 2016 at 01:00:55AM +0000, Ananyev, Konstantin wrote:
> > [...]
> > > > On Wed, Nov 30, 2016 at 10:54:50AM +0000, Ananyev, Konstantin wrote:
> > > > [...]
> > > > > Do you have anything particular in mind here?
> > > >
> > > > Nothing in particular, so for the sake of the argument, let's suppose that I
> > > > would like to add a field to expose some limitation that only applies to my
> > > > PMD during TX but looks generic enough to make sense, e.g. maximum packet
> > > > size when VLAN tagging is requested.
> > >
> > > Hmm, I haven't heard about such limitations so far, but if it is a real case -
> > > sure, feel free to submit the patch.
> > 
> > I won't, that was hypothetical.
> 
> Then why are we discussing it? :)

Just to make a point, which is that new limitations may appear anytime and
tx_prepare() can now be used to check for them. The first patch of the series
does it:

 +   uint16_t nb_seg_max;     /**< Max number of segments per whole packet. */
 +   uint16_t nb_mtu_seg_max; /**< Max number of segments per one MTU */

And states that:

 + * For each packet to send, the rte_eth_tx_prepare() function performs
 + * the following operations:
 + *
 + * - Check if packet meets devices requirements for tx offloads.
 + *
 + * - Check limitations about number of segments.
 + *
 + * - Check additional requirements when debug is enabled.
 + *
 + * - Update and/or reset required checksums when tx offload is set for packet.

It's like making this function mandatory IMO.

> > > > PMDs are free to set that field to some
> > > > special value (say, 0) if they do not care.
> > > >
> > > > Since that field exists however, conscious applications should check its
> > > > value for each packet that needs to be transmitted. This extra code causes a
> > > > slowdown just by sitting in the data path. Since it is not the only field in
> > > > that structure, the performance impact can be significant.
> > > >
> > > > Even though this code is inside applications, it remains unfair to PMDs for
> > > > which these tests are irrelevant. This problem is identified and addressed
> > > > by tx_prepare().
> > >
> > > I suppose the question is why do we need:
> > > uint16_t nb_seg_max;
> > > uint16_t nb_mtu_seg_max;
> > > as we now have tx_prepare(), right?
> > >
> > > For two reasons:
> > > 1. Some people might feel that tx_prepare() is not good (smart/fast) enough
> > > for them and would prefer to do necessary preparations for TX offloads themselves.
> > >
> > > 2. Even if people do use tx_prepare(), they still should take this information into account.
> > > As an example, ixgbe can't TX packets with more than 40 segments.
> > > Obviously ixgbe_tx_prep() performs that check and returns an error.
> > 
> > Problem is that tx_prepare() also provides safeties which are not part of
> > tx_burst(), such as not going over nb_mtu_seg_max. Because of this and the
> > fact struct rte_eth_desc_lim can grow new fields anytime, application
> > developers will be tempted to just call tx_prepare() and focus on more
> > useful things.
> 
> NP with that, that was the intention behind introducing it.
> 
> > Put another way, from a user's point of view, tx_prepare() is an opaque
> > function that greatly increases tx_burst()'s ability to send mbufs as
> > requested, with extra error checking on top; applications not written to run
> > on a specific PMD/device (all of them ideally) will thus call tx_prepare()
> > at some point.
> > 
> > > But it wouldn't try to merge/reallocate mbufs for you.
> > > User still has to do it himself, or just prevent creating such long chains somehow.
> > 
> > Yes, that's another debate. PMDs could still implement a software fallback
> > for unlikely slow events like these. The number of PMDs is not going to
> > decrease, each device having its own set of weird limitations in specific
> > cases, PMDs should do their best to process mbufs even if that means slowly
> > due to the lack of preparation.
> > 
> > tx_prepare() has its uses but should really be optional, in the sense that
> > if that function is not called, tx_burst() should deal with it somehow.
> 
> As I said before, I don't think it is a good idea to put everything in tx_burst().
> If a PMD driver prefers things that way, yes, tx_burst() can deal with each and every
> possible offload requirement itself, but it shouldn't be mandatory.

In effect, having to call tx_prepare() otherwise makes this step mandatory
anyway. Looks like we are not going to agree here.

> > > > Thanks to tx_prepare(), these checks are moved back into PMDs where they
> > > > belong. PMDs that do not need them do not have to provide support for
> > > > tx_prepare() and do not suffer any performance impact as result;
> > > > applications only have to make sure tx_prepare() is always called at some
> > > > point before tx_burst().
> > > >
> > > > Once you reach this stage, you've effectively made tx_prepare() mandatory
> > > > before tx_burst(). If some bug occurs, then perhaps you forgot to call
> > > > tx_prepare(), you just need to add it. The total cost for doing TX is
> > > > therefore tx_prepare() + tx_burst().
> > > >
> > > > I'm perhaps a bit pessimistic mind you, but I do not think tx_prepare() will
> > > > remain optional for long. Sure, PMDs that do not implement it do not care,
> > > > I'm focusing on applications, for which the performance impact of calling
> > > > tx_prepare() followed by tx_burst() is higher than a single tx_burst()
> > > > performing all the necessary preparation at once.
> > > >
> > > > [...]
> > > > > > Following the same logic, why can't such a thing be made part of the TX
> > > > > > burst function as well (through a direct call to rte_phdr_cksum_fix()
> > > > > > whenever necessary). From an application standpoint, what are the advantages
> > > > > > of having to:
> > > > > >
> > > > > >  if (tx_prep()) // iterate and update mbufs as needed
> > > > > >      tx_burst(); // iterate and send
> > > > > >
> > > > > > Compared to:
> > > > > >
> > > > > >  tx_burst(); // iterate, update as needed and send
> > > > >
> > > > > I think that was discussed quite extensively here previously:
> > > > > As Thomas already replied - the main motivation is to allow the user
> > > > > to execute them at different stages of the packet TX pipeline,
> > > > > and probably on different cores.
> > > > > I think that provides better flexibility to the user as to when/where
> > > > > to do these preparations and hopefully would lead to better performance.
> > > >
> > > > And I agree, I think this use case is valid but does not warrant such a high
> > > > penalty when your application does not need that much flexibility. Simple
> > > > (yet conscious) applications need the highest performance. Complex ones as
> > > > you described already suffer quite a bit from IPCs and won't mind a couple
> > > > of extra CPU cycles right?
> > >
> > > It would mean an extra cache-miss for every packet, so I think performance hit
> > > would be quite significant.
> > 
> > A performance hit has to occur somewhere regardless, because something has
> > to be done in order to send packets that need it. Whether this cost is in
> > application code or in a PMD function, it remains part of TX.
> 
> Depending on the place the final cost would differ quite a lot.
> If you call tx_prepare() somewhere close to the place where you fill the packet header
> contents, then most likely the data that tx_prepare() has to access will be already in the cache.
> So the performance penalty will be minimal.
> If you try to access the same data later (at tx_burst), then the possibility that it would still
> be in cache is much lower.
> If you call tx_burst() from another core then the data would for sure be out of cache,
> and even worse, could still be in another core's cache.

Well sure, that's why I also think tx_prepare() has its uses, only that
since tx_prepare() is optional, tx_burst() should provide the same
functionality when tx_prepare() is not called.

> > > About the 'simple' case when tx_prep() and tx_burst() are called on the same core,
> > > Why do you believe that:
> > > tx_prep(); tx_burst(); would be much slower than tx_burst() {tx_prep(), ...}?
> > 
> > I mean instead of two function calls with their own loops:
> > 
> >  tx_prepare() { foreach (pkt) { check(); extra_check(); ... } }
> > 
> >  tx_burst() { foreach (pkt) { check(); stuff(); ... } }
> > 
> > You end up with one:
> > 
> >  tx_burst() { foreach (pkt) { check(); extra_check(); stuff(); ... } }
> > 
> > Which usually is more efficient.
> 
> I really doubt that.
> If that were the case, what would be the point of processing packets in bulk?
> Usually dividing processing into different stages and at each stage processing
> multiple packets at once helps to improve performance.
> At least for IA.
> Look for example how we had to change l3fwd to improve its performance.

Depends quite a bit on usage pattern. It is less efficient for applications
that do not modify mbuf contents because of the additional function call and
inner loop.

Note that I'm only pushing for the ability to conveniently address both
cases with maximum performance.

> > > tx_prep() itself is quite expensive, let say for Intel HW it includes:
> > > - read mbuf fields (2 cache-lines),
> > > - read packet header (1/2 cache-lines)
> > > - calculate pseudo-header csum
> > >  - update packet header
> > > Compared to that, the price of an extra function call seems negligible
> > > (if we TX packets in bursts of course).
> > 
> > We agree its performance is a critical issue then, sharing half the read
> > steps with tx_burst() would make sense to me.
> 
> I didn't understand that sentence.

I meant this step can be shared (in addition to loop etc):

 - read mbuf fields (2 cache-lines),

> > > > Yes they will, therefore we need a method that satisfies both cases.
> > > >
> > > > As a possible solution, a special mbuf flag could be added to each mbuf
> > > > having gone through tx_prepare(). That way, tx_burst() could skip some
> > > > checks and things it would otherwise have done.
> > >
> > > That's an interesting idea, but it has one drawback:
> > > As I understand, it means that from now on if the user is doing preparations on his own,
> > > he has to set this flag, otherwise tx_burst() would do extra unnecessary work.
> > > So any existing applications that use TX offloads and do preparation by themselves
> > > would have to be modified to avoid performance loss.
> > 
> > In my opinion, users should not do preparation on their own.
> 
> People already do it now.

But we do not want them to anymore thanks to this new API, for reasons
described in the motivation section of the cover letter, right?

> > If we provide a
> > generic method, it has to be fast enough to replace theirs. Perhaps not as
> > fast since it would work with all PMDs (usual trade-off), but acceptably so.
> > 
> > > > Another possibility, telling the PMD first that you always intend to use
> > > > tx_prepare() and getting a simpler/faster tx_burst() callback as a result.
> > >
> > > That's what we have right now (at least for Intel HW):
> > > it is the user's responsibility to do the necessary preparations/checks before calling tx_burst().
> > > With tx_prepare() we just remove from the user the headache of implementing tx_prepare() on his own.
> > > Now he can use a 'proper' PMD-provided function.
> > >
> > > My vote still would be for that model.
> > 
> > OK, then in a nutshell:
> > 
> > 1. Users are not expected to perform preparation/checks themselves anymore,
> >    if they do, it's their problem.
> 
> I think we need to be backward compatible here.
> If an existing app is doing what tx_prepare() is supposed to do, it should keep working.

It should work; only, if they keep doing it themselves as well as calling tx_burst()
directly, they will likely get lower performance.

> > 2. If configured through an API to be defined, tx_burst() can be split in
> >    two and applications must call tx_prepare() at some point before
> >    tx_burst().
> > 
> > 3. Otherwise tx_burst() should perform the necessary preparation and checks
> >    on its own by default (when tx_prepare() is not expected).
> 
> As I said before, I don't think it should be mandatory for tx_burst() to do what tx_prepare() does.
> If some particular implementation of tx_burst() prefers to do things that way - that's fine.
> But it shouldn't be required to.

You're right, however applications might find it convenient. I think most
will end up with something like the following:

 if (tx_prepare(pkts))
     tx_burst(pkts);
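
Spelled out a bit more (just a sketch against the rte_eth_tx_prepare()/
rte_eth_tx_burst() names from this series; the drop-on-failure policy is
mine, nothing the API mandates):

 #include <rte_ethdev.h>
 #include <rte_mbuf.h>

 /* prepare a burst, drop what failed preparation, send the rest */
 static uint16_t
 xmit_burst(uint8_t port, uint16_t queue,
 	    struct rte_mbuf **pkts, uint16_t nb_pkts)
 {
 	/* number of packets ready to be sent; on failure rte_errno is
 	 * set and preparation stops at the offending packet */
 	uint16_t nb_prep = rte_eth_tx_prepare(port, queue, pkts, nb_pkts);
 	uint16_t i;

 	for (i = nb_prep; i < nb_pkts; i++)
 		rte_pktmbuf_free(pkts[i]);
 	return rte_eth_tx_burst(port, queue, pkts, nb_prep);
 }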

> > 4. We probably still need some mbuf flag to mark mbufs that cannot be
> >    modified, the refcount could also serve as a hint.
> 
> If the mbuf can't be modified, you probably just wouldn't call the function that is supposed to do that,
> tx_prepare() in that case.

I think it would be easier to document what offload flags may cause the
tx_burst() function to modify mbuf contents, so applications have the
ability to set or strip these flags on a mbuf basis. That way there is no
need to call tx_prepare() without knowing exactly what it's going to do.

-- 
Adrien Mazarguil
6WIND

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v12 0/6] add Tx preparation
  2016-12-05 18:10                                     ` Adrien Mazarguil
@ 2016-12-06 10:56                                       ` Ananyev, Konstantin
  2016-12-06 13:59                                         ` Adrien Mazarguil
  0 siblings, 1 reply; 261+ messages in thread
From: Ananyev, Konstantin @ 2016-12-06 10:56 UTC (permalink / raw)
  To: Adrien Mazarguil
  Cc: Thomas Monjalon, dev, Rahul Lakkireddy, Stephen Hurd, Jan Medala,
	Jakub Palider, John Daley, Alejandro Lucero, Harish Patil,
	Rasesh Mody, Jerin Jacob, Yuanhan Liu, Yong Wang, Kulasek,
	TomaszX, olivier.matz


Hi Adrien,

> 
> On Mon, Dec 05, 2016 at 04:43:52PM +0000, Ananyev, Konstantin wrote:
> [...]
> > > On Fri, Dec 02, 2016 at 01:00:55AM +0000, Ananyev, Konstantin wrote:
> > > [...]
> > > > > On Wed, Nov 30, 2016 at 10:54:50AM +0000, Ananyev, Konstantin wrote:
> > > > > [...]
> > > > > > Do you have anything particular in mind here?
> > > > >
> > > > > Nothing in particular, so for the sake of the argument, let's suppose that I
> > > > > would like to add a field to expose some limitation that only applies to my
> > > > > PMD during TX but looks generic enough to make sense, e.g. maximum packet
> > > > > size when VLAN tagging is requested.
> > > >
> > > > Hmm, I didn't hear about such limitations so far, but if it is real case -
> > > > sure, feel free to submit the patch.
> > >
> > > I won't, that was hypothetical.
> >
> > Then why are we discussing it? :)
> 
> Just to make a point, which is that new limitations may appear anytime and
> tx_prepare() can now be used to check for them. First patch of the series
> does it:
> 
>  +   uint16_t nb_seg_max;     /**< Max number of segments per whole packet. */
>  +   uint16_t nb_mtu_seg_max; /**< Max number of segments per one MTU */
> 
> And states that:
> 
>  + * For each packet to send, the rte_eth_tx_prepare() function performs
>  + * the following operations:
>  + *
>  + * - Check if packet meets devices requirements for tx offloads.
>  + *
>  + * - Check limitations about number of segments.
>  + *
>  + * - Check additional requirements when debug is enabled.
>  + *
>  + * - Update and/or reset required checksums when tx offload is set for packet.

I think I already explained in my previous email why I think that
nb_seg_max and nb_mtu_seg_max are not redundant because of tx_prepare().
From my point of view they are a complement to tx_prepare():
Even if people do use tx_prepare() they still should take this information into account.
As an example ixgbe can't TX packets with more than 40 segments.
tx_prepare() for ixgbe will flag that issue, but it can't make a decision on the user's behalf
about what to do in that case: drop the packet, try to coalesce it into a packet with fewer
segments, split the packet into several smaller ones, etc.
That's up to the user to decide, and to make that decision the user might need this information.
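
To illustrate (a sketch only; the reaction to a failed check is entirely
up to the application):

 #include <rte_ethdev.h>

 /* consult the new limit when forming a packet */
 static int
 seg_count_ok(uint8_t port, const struct rte_mbuf *m)
 {
 	struct rte_eth_dev_info dev_info;

 	rte_eth_dev_info_get(port, &dev_info);
 	/* on failure: drop, coalesce into fewer segments, or split
 	 * into several smaller packets - the application's call */
 	return m->nb_segs <= dev_info.tx_desc_lim.nb_seg_max;
 }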

> 
> It's like making this function mandatory IMO.

That's probably where confusion starts: I don't think that
tx_prepare() should be mandatory for the user to call.
Yes, it should be a recommended way.
But the user still should have the ability to by-pass it,
if he believes there is no need for it, or he prefers to implement
the same functionality on his own.
As an example,  if the user knows that he is going to send  a group
of one-segment packets that don't require any tx offloads, he can safely skip
tx_prepare() for them.
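
Something like that (a sketch; TX_OFFLOADS stands for whatever offload
flags the application may set, e.g. PKT_TX_TCP_SEG | PKT_TX_IP_CKSUM |
PKT_TX_L4_MASK):

 	/* one segment, no offloads requested: nothing for
 	 * rte_eth_tx_prepare() to check or fix, so it can be
 	 * skipped and rte_eth_tx_burst() called directly */
 	if (m->nb_segs == 1 && (m->ol_flags & TX_OFFLOADS) == 0)
 		skip_prepare = 1;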

> 
> > > > > PMDs are free to set that field to some
> > > > > special value (say, 0) if they do not care.
> > > > >
> > > > > Since that field exists however, conscious applications should check its
> > > > > value for each packet that needs to be transmitted. This extra code causes a
> > > > > slowdown just by sitting in the data path. Since it is not the only field in
> > > > > that structure, the performance impact can be significant.

A conscious user will probably use this information at the stage of packet formation.
He probably has to do this sort of thing for large packets anyway:
check what the underlying MTU is, to decide whether he needs to split the packet,
or enable TSO for it, etc.
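
Roughly (again just a sketch; the TSO branch is one possible choice,
not the only one):

 	uint16_t mtu;

 	rte_eth_dev_get_mtu(port, &mtu);
 	if (rte_pktmbuf_pkt_len(m) - m->l2_len > mtu) {
 		/* too big for one frame: split it in SW, or mark it
 		 * for TSO and let the HW segment it */
 		m->ol_flags |= PKT_TX_TCP_SEG | PKT_TX_IPV4 |
 			PKT_TX_IP_CKSUM;
 		m->tso_segsz = mtu - m->l3_len - m->l4_len;
 	}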

> > > > >
> > > > > Even though this code is inside applications, it remains unfair to PMDs for
> > > > > which these tests are irrelevant. This problem is identified and addressed
> > > > > by tx_prepare().
> > > >
> > > > I suppose the question is why do we need:
> > > > uint16_t nb_seg_max;
> > > > uint16_t nb_mtu_seg_max;
> > > > as we now have tx_prepare(), right?
> > > >
> > > > For two reasons:
> > > > 1. Some people might feel that tx_prepare() is not good (smart/fast) enough
> > > > for them and would prefer to do necessary preparations for TX offloads themselves.
> > > >
> > > > 2. Even if people do use tx_prepare() they still should take this information into account.
> > > > As an example ixgbe can't TX packets with more than 40 segments.
> > > > Obviously ixgbe_tx_prep() performs that check and returns an error.
> > >
> > > Problem is that tx_prepare() also provides safeties which are not part of
> > > tx_burst(), such as not going over nb_mtu_seg_max. Because of this and the
> > > fact struct rte_eth_desc_lim can grow new fields anytime, application
> > > developers will be tempted to just call tx_prepare() and focus on more
> > > useful things.
> >
> > NP with that, that was the intention behind introducing it.
> >
> > > Put another way, from a user's point of view, tx_prepare() is an opaque
> > > function that greatly increases tx_burst()'s ability to send mbufs as
> > > requested, with extra error checking on top; applications not written to run
> > > on a specific PMD/device (all of them ideally) will thus call tx_prepare()
> > > at some point.
> > >
> > > > But it wouldn't try to merge/reallocate mbufs for you.
> > > > User still has to do it himself, or just prevent creating such long chains somehow.
> > >
> > > Yes, that's another debate. PMDs could still implement a software fallback
> > > for unlikely slow events like these. The number of PMDs is not going to
> > > decrease, each device having its own set of weird limitations in specific
> > > cases, PMDs should do their best to process mbufs even if that means slowly
> > > due to the lack of preparation.
> > >
> > > tx_prepare() has its uses but should really be optional, in the sense that
> > > if that function is not called, tx_burst() should deal with it somehow.
> >
> > As I said before, I don't think it is a good idea to put everything in tx_burst().
> > If a PMD driver prefers things that way, yes, tx_burst() can deal with each and
> > every possible offload requirement itself, but it shouldn't be mandatory.
> 
> In effect, having to call tx_prepare() otherwise makes this step mandatory
> anyway. Looks like we are not going to agree here.
> 
> > > > > Thanks to tx_prepare(), these checks are moved back into PMDs where they
> > > > > belong. PMDs that do not need them do not have to provide support for
> > > > > tx_prepare() and do not suffer any performance impact as result;
> > > > > applications only have to make sure tx_prepare() is always called at some
> > > > > point before tx_burst().
> > > > >
> > > > > Once you reach this stage, you've effectively made tx_prepare() mandatory
> > > > > before tx_burst(). If some bug occurs, then perhaps you forgot to call
> > > > > tx_prepare(), you just need to add it. The total cost for doing TX is
> > > > > therefore tx_prepare() + tx_burst().
> > > > >
> > > > > I'm perhaps a bit pessimistic mind you, but I do not think tx_prepare() will
> > > > > remain optional for long. Sure, PMDs that do not implement it do not care,
> > > > > I'm focusing on applications, for which the performance impact of calling
> > > > > tx_prepare() followed by tx_burst() is higher than a single tx_burst()
> > > > > performing all the necessary preparation at once.
> > > > >
> > > > > [...]
> > > > > > > Following the same logic, why can't such a thing be made part of the TX
> > > > > > > burst function as well (through a direct call to rte_phdr_cksum_fix()
> > > > > > > whenever necessary). From an application standpoint, what are the advantages
> > > > > > > of having to:
> > > > > > >
> > > > > > >  if (tx_prep()) // iterate and update mbufs as needed
> > > > > > >      tx_burst(); // iterate and send
> > > > > > >
> > > > > > > Compared to:
> > > > > > >
> > > > > > >  tx_burst(); // iterate, update as needed and send
> > > > > >
> > > > > > I think that was discussed quite extensively here previously:
> > > > > > As Thomas already replied - the main motivation is to allow the user
> > > > > > to execute them at different stages of the packet TX pipeline,
> > > > > > and probably on different cores.
> > > > > > I think that provides better flexibility to the user as to when/where
> > > > > > to do these preparations and hopefully would lead to better performance.
> > > > >
> > > > > And I agree, I think this use case is valid but does not warrant such a high
> > > > > penalty when your application does not need that much flexibility. Simple
> > > > > (yet conscious) applications need the highest performance. Complex ones as
> > > > > you described already suffer quite a bit from IPCs and won't mind a couple
> > > > > of extra CPU cycles right?
> > > >
> > > > It would mean an extra cache-miss for every packet, so I think performance hit
> > > > would be quite significant.
> > >
> > > A performance hit has to occur somewhere regardless, because something has
> > > to be done in order to send packets that need it. Whether this cost is in
> > > application code or in a PMD function, it remains part of TX.
> >
> > Depending on the place the final cost would differ quite a lot.
> > If you call tx_prepare() somewhere close to the place where you fill the packet header
> > contents, then most likely the data that tx_prepare() has to access will be already in the cache.
> > So the performance penalty will be minimal.
> > If you try to access the same data later (at tx_burst), then the possibility that it would still
> > be in cache is much lower.
> > If you call tx_burst() from another core then the data would for sure be out of cache,
> > and even worse, could still be in another core's cache.
> 
> Well sure, that's why I also think tx_prepare() has its uses, only that
> since tx_prepare() is optional, tx_burst() should provide the same
> functionality when tx_prepare() is not called.

As I understand, to implement what you are proposing (a TX_PREPARED mbuf->ol_flags bit)
it would be required to:

a) Modify all existing applications that do tx_prepare()-like stuff on their own,
otherwise they would hit a performance penalty.
b) Modify at least all Intel PMDs, and maybe some others too (vmxnet3?).

Step b) probably wouldn't cause any significant performance impact straight away,
but it for sure wouldn't make things faster, and would increase tx_burst() code
complexity quite a lot.
From the other side, I can't see any real benefit that we would get in return.
So I am still opposed to that idea.

> 
> > > > About the 'simple' case when tx_prep() and tx_burst() are called on the same core,
> > > > Why do you believe that:
> > > > tx_prep(); tx_burst(); would be much slower than tx_burst() {tx_prep(), ...}?
> > >
> > > I mean instead of two function calls with their own loops:
> > >
> > >  tx_prepare() { foreach (pkt) { check(); extra_check(); ... } }
> > >
> > >  tx_burst() { foreach (pkt) { check(); stuff(); ... } }
> > >
> > > You end up with one:
> > >
> > >  tx_burst() { foreach (pkt) { check(); extra_check(); stuff(); ... } }
> > >
> > > Which usually is more efficient.
> >
> > I really doubt that.
> > If that were the case, what would be the point of processing packets in bulk?
> > Usually dividing processing into different stages and at each stage processing
> > multiple packets at once helps to improve performance.
> > At least for IA.
> > Look for example how we had to change l3fwd to improve its performance.
> 
> Depends quite a bit on usage pattern. It is less efficient for applications
> that do not modify mbuf contents because of the additional function call and
> inner loop.

If the application doesn't modify mbuf contents then it can simply skip calling tx_prepare().

> 
> Note that I'm only pushing for the ability to conveniently address both
> cases with maximum performance.
> 
> > > > tx_prep() itself is quite expensive, let say for Intel HW it includes:
> > > > - read mbuf fields (2 cache-lines),
> > > > - read packet header (1/2 cache-lines)
> > > > - calculate pseudo-header csum
> > > >  - update packet header
> > > > Compared to that, the price of an extra function call seems negligible
> > > > (if we TX packets in bursts of course).
> > >
> > > We agree its performance is a critical issue then, sharing half the read
> > > steps with tx_burst() would make sense to me.
> >
> > I didn't understand that sentence.
> 
> I meant this step can be shared (in addition to loop etc):
> 
> >  - read mbuf fields (2 cache-lines),

Ah ok, you still believe that mixing tx_burst and tx_prepare code together
would give us a noticeable performance benefit.
As I said above, I don't think it would, but you are welcome to try and
prove me wrong.

> 
> > > > > Yes they will, therefore we need a method that satisfies both cases.
> > > > >
> > > > > As a possible solution, a special mbuf flag could be added to each mbuf
> > > > > having gone through tx_prepare(). That way, tx_burst() could skip some
> > > > > checks and things it would otherwise have done.
> > > >
> > > > That's an interesting idea, but it has one drawback:
> > > > As I understand, it means that from now on if the user is doing preparations on his own,
> > > > he has to set this flag, otherwise tx_burst() would do extra unnecessary work.
> > > > So any existing applications that use TX offloads and do preparation by themselves
> > > > would have to be modified to avoid performance loss.
> > >
> > > In my opinion, users should not do preparation on their own.
> >
> > People already do it now.
> 
> But we do not want them to anymore thanks to this new API, for reasons
> described in the motivation section of the cover letter, right?

We probably wouldn't recommend that, but if people would like to use their own stuff,
or shortcuts - I don't want to stop them here.

> 
> > > If we provide a
> > > generic method, it has to be fast enough to replace theirs. Perhaps not as
> > > fast since it would work with all PMDs (usual trade-off), but acceptably so.
> > >
> > > > > Another possibility, telling the PMD first that you always intend to use
> > > > > tx_prepare() and getting a simpler/faster tx_burst() callback as a result.
> > > >
> > > > That's what we have right now (at least for Intel HW):
> > > > it is the user's responsibility to do the necessary preparations/checks before calling tx_burst().
> > > > With tx_prepare() we just remove from the user the headache of implementing tx_prepare() on his own.
> > > > Now he can use a 'proper' PMD-provided function.
> > > >
> > > > My vote still would be for that model.
> > >
> > > OK, then in a nutshell:
> > >
> > > 1. Users are not expected to perform preparation/checks themselves anymore,
> > >    if they do, it's their problem.
> >
> > I think we need to be backward compatible here.
> > If an existing app is doing what tx_prepare() is supposed to do, it should keep working.
> 
> It should work; only, if they keep doing it themselves as well as calling tx_burst()
> directly, they will likely get lower performance.
> 
> > > 2. If configured through an API to be defined, tx_burst() can be split in
> > >    two and applications must call tx_prepare() at some point before
> > >    tx_burst().
> > >
> > > 3. Otherwise tx_burst() should perform the necessary preparation and checks
> > >    on its own by default (when tx_prepare() is not expected).
> >
> > As I said before, I don't think it should be mandatory for tx_burst() to do what tx_prepare() does.
> > If some particular implementation of tx_burst() prefers to do things that way - that's fine.
> > But it shouldn't be required to.
> 
> You're right, however applications might find it convenient. I think most
> will end up with something like the following:
> 
>  if (tx_prepare(pkts))
>      tx_burst(pkts);

Looking at existing DPDK apps - most of them do use some sort of TX buffering.
So, even in a simplistic app it would probably be:
  
tx_prepare(pkts);
tx_buffer(pkts);
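
I.e. with the existing helpers, roughly (a sketch; "buffer" is an
rte_eth_dev_tx_buffer initialised beforehand with rte_eth_tx_buffer_init()):

 	uint16_t i;
 	uint16_t nb_prep = rte_eth_tx_prepare(port, queue, pkts, nb_pkts);

 	/* rte_eth_tx_buffer() takes one packet at a time and flushes
 	 * to tx_burst() internally whenever the buffer fills up */
 	for (i = 0; i < nb_prep; i++)
 		rte_eth_tx_buffer(port, queue, buffer, pkts[i]);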

> 
> > > 4. We probably still need some mbuf flag to mark mbufs that cannot be
> > >    modified, the refcount could also serve as a hint.
> >
> > If the mbuf can't be modified, you probably just wouldn't call the function that is supposed to do that,
> > tx_prepare() in that case.
> 
> I think it would be easier to document what offload flags may cause the
> tx_burst() function to modify mbuf contents, so applications have the
> ability to set or strip these flags on a mbuf basis. That way there is no
> need to call tx_prepare() without knowing exactly what it's going to do.

Not sure I understand what exactly you are proposing in the last paragraph?
Konstantin

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v12 0/6] add Tx preparation
  2016-12-06 10:56                                       ` Ananyev, Konstantin
@ 2016-12-06 13:59                                         ` Adrien Mazarguil
  2016-12-06 20:31                                           ` Ananyev, Konstantin
  0 siblings, 1 reply; 261+ messages in thread
From: Adrien Mazarguil @ 2016-12-06 13:59 UTC (permalink / raw)
  To: Ananyev, Konstantin
  Cc: Thomas Monjalon, dev, Rahul Lakkireddy, Stephen Hurd, Jan Medala,
	Jakub Palider, John Daley, Alejandro Lucero, Harish Patil,
	Rasesh Mody, Jerin Jacob, Yuanhan Liu, Yong Wang, Kulasek,
	TomaszX, olivier.matz

Hi Konstantin,

On Tue, Dec 06, 2016 at 10:56:26AM +0000, Ananyev, Konstantin wrote:
> 
> Hi Adrien,
> 
> > 
> > On Mon, Dec 05, 2016 at 04:43:52PM +0000, Ananyev, Konstantin wrote:
> > [...]
> > > > On Fri, Dec 02, 2016 at 01:00:55AM +0000, Ananyev, Konstantin wrote:
> > > > [...]
> > > > > > On Wed, Nov 30, 2016 at 10:54:50AM +0000, Ananyev, Konstantin wrote:
> > > > > > [...]
> > > > > > > Do you have anything particular in mind here?
> > > > > >
> > > > > > Nothing in particular, so for the sake of the argument, let's suppose that I
> > > > > > would like to add a field to expose some limitation that only applies to my
> > > > > > PMD during TX but looks generic enough to make sense, e.g. maximum packet
> > > > > > size when VLAN tagging is requested.
> > > > >
> > > > > Hmm, I didn't hear about such limitations so far, but if it is real case -
> > > > > sure, feel free to submit the patch.
> > > >
> > > > I won't, that was hypothetical.
> > >
> > > Then why are we discussing it? :)
> > 
> > Just to make a point, which is that new limitations may appear anytime and
> > tx_prepare() can now be used to check for them. First patch of the series
> > does it:
> > 
> >  +   uint16_t nb_seg_max;     /**< Max number of segments per whole packet. */
> >  +   uint16_t nb_mtu_seg_max; /**< Max number of segments per one MTU */
> > 
> > And states that:
> > 
> >  + * For each packet to send, the rte_eth_tx_prepare() function performs
> >  + * the following operations:
> >  + *
> >  + * - Check if packet meets devices requirements for tx offloads.
> >  + *
> >  + * - Check limitations about number of segments.
> >  + *
> >  + * - Check additional requirements when debug is enabled.
> >  + *
> >  + * - Update and/or reset required checksums when tx offload is set for packet.
> 
> I think I already explained in my previous email why I think that
> nb_seg_max and nb_mtu_seg_max are not redundant because of tx_prepare().
> From my point of view they are a complement to tx_prepare():
> Even if people do use tx_prepare() they still should take this information into account.
> As an example ixgbe can't TX packets with more than 40 segments.
> tx_prepare() for ixgbe will flag that issue, but it can't make a decision on the user's behalf
> about what to do in that case: drop the packet, try to coalesce it into a packet with fewer
> segments, split the packet into several smaller ones, etc.
> That's up to the user to decide, and to make that decision the user might need this information.

Yet tx_prepare() already has the ability to update mbuf contents; the issue is
what this function will do in the future, and where it will stop. It is defined
in a way that lets each PMD do what it wants to make mbufs edible for
tx_burst(); because of this, applications will just always call it to be on
the safe side.

> > It's like making this function mandatory IMO.
> 
> That's probably where confusion starts: I don't think that
> tx_prepare() should be mandatory for the user to call.
> Yes, it should be a recommended way.
> But the user still should have the ability to by-pass it,
> if he believes there is no need for it, or he prefers to implement
> the same functionality on his own.
> As an example,  if the user knows that he is going to send  a group
> of one-segment packets that don't require any tx offloads, he can safely skip
> tx_prepare() for them.

I understand your point, and agree with the example you provide. Many
applications do not know what's inside mbufs though, except perhaps that
they contain TCP and may want to perform TSO because of that. Those will
have to call tx_prepare() to be future-proof.

> > > > > > PMDs are free to set that field to some
> > > > > > special value (say, 0) if they do not care.
> > > > > >
> > > > > > Since that field exists however, conscious applications should check its
> > > > > > value for each packet that needs to be transmitted. This extra code causes a
> > > > > > slowdown just by sitting in the data path. Since it is not the only field in
> > > > > > that structure, the performance impact can be significant.
> 
> A conscious user will probably use this information at the stage of packet formation.
> He probably has to do this sort of thing for large packets anyway:
> check what the underlying MTU is, to decide whether he needs to split the packet,
> or enable TSO for it, etc.

There are already too many things to check, applications probably won't mind
a little help from PMDs. If we keep adding fields to this structure, we'll
have to provide some sort of PMD-specific function that checks what is
relevant.

Furthermore, assuming most packets are fine and do not require any extra
processing, what is rejected by tx_burst() could enter some unlikely() path
that attempts to rectify and re-send them. That would at least optimize the
common scenario.
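
Something along these lines (a sketch of that model only; in it tx_burst()
would refuse ill-formed packets instead of sending them, and fix_and_resend()
is hypothetical application code that repairs them out of the hot path):

 	uint16_t nb_sent = rte_eth_tx_burst(port, queue, pkts, nb_pkts);

 	if (unlikely(nb_sent < nb_pkts))
 		nb_sent += fix_and_resend(port, queue,
 				&pkts[nb_sent], nb_pkts - nb_sent);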

> > > > > > Even though this code is inside applications, it remains unfair to PMDs for
> > > > > > which these tests are irrelevant. This problem is identified and addressed
> > > > > > by tx_prepare().
> > > > >
> > > > > I suppose the question is why do we need:
> > > > > uint16_t nb_seg_max;
> > > > > uint16_t nb_mtu_seg_max;
> > > > > as we now have tx_prepare(), right?
> > > > >
> > > > > For two reasons:
> > > > > 1. Some people might feel that tx_prepare() is not good (smart/fast) enough
> > > > > for them and would prefer to do necessary preparations for TX offloads themselves.
> > > > >
> > > > > 2. Even if people do use tx_prepare() they still should take this information into account.
> > > > > As an example ixgbe can't TX packets with more than 40 segments.
> > > > > Obviously ixgbe_tx_prep() performs that check and returns an error.
> > > >
> > > > Problem is that tx_prepare() also provides safeties which are not part of
> > > > tx_burst(), such as not going over nb_mtu_seg_max. Because of this and the
> > > > fact struct rte_eth_desc_lim can grow new fields anytime, application
> > > > developers will be tempted to just call tx_prepare() and focus on more
> > > > useful things.
> > >
> > > NP with that, that was the intention behind introducing it.
> > >
> > > > Put another way, from a user's point of view, tx_prepare() is an opaque
> > > > function that greatly increases tx_burst()'s ability to send mbufs as
> > > > requested, with extra error checking on top; applications not written to run
> > > > on a specific PMD/device (all of them ideally) will thus call tx_prepare()
> > > > at some point.
> > > >
> > > > > But it wouldn't try to merge/reallocate mbufs for you.
> > > > > User still has to do it himself, or just prevent creating such long chains somehow.
> > > >
> > > > Yes, that's another debate. PMDs could still implement a software fallback
> > > > for unlikely slow events like these. The number of PMDs is not going to
> > > > decrease, each device having its own set of weird limitations in specific
> > > > cases, PMDs should do their best to process mbufs even if that means slowly
> > > > due to the lack of preparation.
> > > >
> > > > tx_prepare() has its uses but should really be optional, in the sense that
> > > > if that function is not called, tx_burst() should deal with it somehow.
> > >
> > > As I said before, I don't think it is a good idea to put everything in tx_burst().
> > > If a PMD driver prefers things that way, yes, tx_burst() can deal with each and
> > > every possible offload requirement itself, but it shouldn't be mandatory.
> > 
> > In effect, having to call tx_prepare() otherwise makes this step mandatory
> > anyway. Looks like we are not going to agree here.
> > 
> > > > > > Thanks to tx_prepare(), these checks are moved back into PMDs where they
> > > > > > belong. PMDs that do not need them do not have to provide support for
> > > > > > tx_prepare() and do not suffer any performance impact as result;
> > > > > > applications only have to make sure tx_prepare() is always called at some
> > > > > > point before tx_burst().
> > > > > >
> > > > > > Once you reach this stage, you've effectively made tx_prepare() mandatory
> > > > > > before tx_burst(). If some bug occurs, then perhaps you forgot to call
> > > > > > tx_prepare(), you just need to add it. The total cost for doing TX is
> > > > > > therefore tx_prepare() + tx_burst().
> > > > > >
> > > > > > I'm perhaps a bit pessimistic mind you, but I do not think tx_prepare() will
> > > > > > remain optional for long. Sure, PMDs that do not implement it do not care,
> > > > > > I'm focusing on applications, for which the performance impact of calling
> > > > > > tx_prepare() followed by tx_burst() is higher than a single tx_burst()
> > > > > > performing all the necessary preparation at once.
> > > > > >
> > > > > > [...]
> > > > > > > > Following the same logic, why can't such a thing be made part of the TX
> > > > > > > > burst function as well (through a direct call to rte_phdr_cksum_fix()
> > > > > > > > whenever necessary). From an application standpoint, what are the advantages
> > > > > > > > of having to:
> > > > > > > >
> > > > > > > >  if (tx_prep()) // iterate and update mbufs as needed
> > > > > > > >      tx_burst(); // iterate and send
> > > > > > > >
> > > > > > > > Compared to:
> > > > > > > >
> > > > > > > >  tx_burst(); // iterate, update as needed and send
> > > > > > >
> > > > > > > I think that was discussed quite extensively here previously:
> > > > > > > As Thomas already replied - the main motivation is to allow the user
> > > > > > > to execute them at different stages of the packet TX pipeline,
> > > > > > > and probably on different cores.
> > > > > > > I think that provides better flexibility to the user as to when/where
> > > > > > > to do these preparations and hopefully would lead to better performance.
> > > > > >
> > > > > > And I agree, I think this use case is valid but does not warrant such a high
> > > > > > penalty when your application does not need that much flexibility. Simple
> > > > > > (yet conscious) applications need the highest performance. Complex ones as
> > > > > > you described already suffer quite a bit from IPCs and won't mind a couple
> > > > > > of extra CPU cycles right?
> > > > >
> > > > > It would mean an extra cache-miss for every packet, so I think performance hit
> > > > > would be quite significant.
> > > >
> > > > A performance hit has to occur somewhere regardless, because something has
> > > > to be done in order to send packets that need it. Whether this cost is in
> > > > application code or in a PMD function, it remains part of TX.
> > >
> > > Depending on the place the final cost would differ quite a lot.
> > > If you call tx_prepare() somewhere close to the place where you fill the packet header
> > > contents, then most likely the data that tx_prepare() has to access will be already in the cache.
> > > So the performance penalty will be minimal.
> > > If you try to access the same data later (at tx_burst), then the possibility that it would still
> > > be in cache is much lower.
> > > If you call tx_burst() from another core then the data would for sure be out of cache,
> > > and even worse, could still be in another core's cache.
> > 
> > Well sure, that's why I also think tx_prepare() has its uses, only that
> > since tx_prepare() is optional, tx_burst() should provide the same
> > functionality when tx_prepare() is not called.
> 
> As I understand, to implement what you are proposing (a TX_PREPARED mbuf->ol_flags bit)
> it would be required to:
> 
> a) Modify all existing applications that do tx_prepare()-like stuff on their own,
> otherwise they would hit a performance penalty.
> b) Modify at least all Intel PMDs, and maybe some others too (vmxnet3?).
> 
> Step b) probably wouldn't cause any significant performance impact straight away,
> but it for sure wouldn't make things faster, and would increase tx_burst() code
> complexity quite a lot.
> From the other side, I can't see any real benefit that we would get in return.
> So I am still opposed to that idea.

Applications gain the ability to perform tx_burst() with offloads without
having to prepare anything. Currently these applications either cannot use
offloads at all or need to perform PMD-specific voodoo first. The generic
alternative to this scenario being tx_prepare(), PMDs have to make this step
as cheap as possible.

Yes, that would slow down existing applications; people may find it
acceptable since we're modifying the TX API here.

> > > > > About the 'simple' case when tx_prep() and tx_burst() are called on the same core,
> > > > > Why do you believe that:
> > > > > tx_prep(); tx_burst(); would be much slower than tx_burst() {tx_prep(), ...}?
> > > >
> > > > I mean instead of two function calls with their own loops:
> > > >
> > > >  tx_prepare() { foreach (pkt) { check(); extra_check(); ... } }
> > > >
> > > >  tx_burst() { foreach (pkt) { check(); stuff(); ... } }
> > > >
> > > > You end up with one:
> > > >
> > > >  tx_burst() { foreach (pkt) { check(); extra_check(); stuff(); ... } }
> > > >
> > > > Which usually is more efficient.
> > >
> > > I really doubt that.
> > > If that were the case, what would be the point of processing packets in bulk?
> > > Usually dividing processing into different stages and at each stage processing
> > > multiple packets at once helps to improve performance.
> > > At least for IA.
> > > Look for example how we had to change l3fwd to improve its performance.
> > 
> > Depends quite a bit on usage pattern. It is less efficient for applications
> > that do not modify mbuf contents because of the additional function call and
> > inner loop.
> 
> If the application doesn't modify mbuf contents then it can simply skip calling tx_prepare().

What if that same application wants to enable some offload as well?

> > Note that I'm only pushing for the ability to conveniently address both
> > cases with maximum performance.
> > 
> > > > > tx_prep() itself is quite expensive, let say for Intel HW it includes:
> > > > > - read mbuf fields (2 cache-lines),
> > > > > - read packet header (1/2 cache-lines)
> > > > > - calculate pseudo-header csum
> > > > >  - update packet header
> > > > > Compared to that, the price of an extra function call seems negligible
> > > > > (if we TX packets in bursts of course).
> > > >
> > > > We agree its performance is a critical issue then, sharing half the read
> > > > steps with tx_burst() would make sense to me.
> > >
> > > I didn't understand that sentence.
> > 
> > I meant this step can be shared (in addition to loop etc):
> > 
> >  - read mbuf fields (2 cache-lines),
> 
> Ah ok, you still believe that mixing tx_burst and tx_prepare code together
> would give us a noticeable performance benefit.
> As I said above, I don't think it would, but you are welcome to try and
> prove me wrong.

Depends on what you call noticeable. I guess we can at least agree having
two separate functions and loops causes more instructions to be generated and
executed.

Now for the number of spent CPU cycles, it depends of course on whether mbufs are
still hot in the cache or not, and as I told you, in my opinion we'll
usually see applications calling tx_prepare() just before tx_burst() to
benefit from offloads.

> > > > > > Yes they will, therefore we need a method that satisfies both cases.
> > > > > >
> > > > > > As a possible solution, a special mbuf flag could be added to each mbuf
> > > > > > having gone through tx_prepare(). That way, tx_burst() could skip some
> > > > > > checks and things it would otherwise have done.
> > > > >
> > > > > That's an interesting idea, but it has one drawback:
> > > > > As I understand, it means that from now on if the user is doing preparations on his own,
> > > > > he has to set this flag, otherwise tx_burst() would do extra unnecessary work.
> > > > > So any existing applications that use TX offloads and do preparation by themselves
> > > > > would have to be modified to avoid performance loss.
> > > >
> > > > In my opinion, users should not do preparation on their own.
> > >
> > > People already do it now.
> > 
> > But we do not want them to anymore thanks to this new API, for reasons
> > described in the motivation section of the cover letter, right?
> 
> We probably wouldn't recommend that, but if people would like to use their own stuff,
> or shortcuts - I don't want to stop them here.
> 
> > 
> > > > If we provide a
> > > > generic method, it has to be fast enough to replace theirs. Perhaps not as
> > > > fast since it would work with all PMDs (usual trade-off), but acceptably so.
> > > >
> > > > > > Another possibility, telling the PMD first that you always intend to use
> > > > > > tx_prepare() and getting a simpler/faster tx_burst() callback as a result.
> > > > >
> > > > > That's what we have right now (at least for Intel HW):
> > > > > it is the user's responsibility to do the necessary preparations/checks before calling tx_burst().
> > > > > With tx_prepare() we just remove from the user the headache of implementing tx_prepare() on his own.
> > > > > Now he can use a 'proper' PMD-provided function.
> > > > >
> > > > > My vote still would be for that model.
> > > >
> > > > OK, then in a nutshell:
> > > >
> > > > 1. Users are not expected to perform preparation/checks themselves anymore,
> > > >    if they do, it's their problem.
> > >
> > > I think we need to be backward compatible here.
> > > If an existing app is doing what tx_prepare() is supposed to do, it should keep working.
> > 
> > It should work; only, if they keep doing it themselves as well as calling tx_burst()
> > directly, they will likely get lower performance.
> > 
> > > > 2. If configured through an API to be defined, tx_burst() can be split in
> > > >    two and applications must call tx_prepare() at some point before
> > > >    tx_burst().
> > > >
> > > > 3. Otherwise tx_burst() should perform the necessary preparation and checks
> > > >    on its own by default (when tx_prepare() is not expected).
> > >
> > > As I said before, I don't think it should be mandatory for tx_burst() to do what tx_prepare() does.
> > > If some particular implementation of tx_burst() prefers to do things that way - that's fine.
> > > But it shouldn't be required to.
> > 
> > You're right, however applications might find it convenient. I think most
> > will end up with something like the following:
> > 
> >  if (tx_prepare(pkts))
> >      tx_burst(pkts);
> 
> Looking at existing DPDK apps - most of them do use some sort of TX buffering.
> So, even in a simplistic app it would probably be:
>   
> tx_prepare(pkts);
> tx_buffer(pkts);

We're down to my word against yours here, I guess. To leave the choice to
application developers, we'd need to provide tx_prepare() and a simpler
tx_burst(), as well as the ability to call tx_burst() directly and still get
offloads.

> > > > 4. We probably still need some mbuf flag to mark mbufs that cannot be
> > > >    modified, the refcount could also serve as a hint.
> > >
> > > If the mbuf can't be modified, you probably just wouldn't call the function that is supposed to do that,
> > > tx_prepare() in that case.
> > 
> > I think it would be easier to document what offload flags may cause the
> > tx_burst() function to modify mbuf contents, so applications have the
> > ability to set or strip these flags on a mbuf basis. That way there is no
> > need to call tx_prepare() without knowing exactly what it's going to do.
> 
> Not sure I understand what exactly you are proposing in the last paragraph?

That for each TX offload flag, we document whether preparation might cause a
mbuf to be written to during the tx_prepare()/tx_burst() phase. One of the
reasons for tx_prepare() being:

 4) Fields in packet may require different initialization (like e.g. will
    require pseudo-header checksum precalculation, sometimes in a
    different way depending on packet type, and so on). Now application
    needs to care about it.

If we determine what offloads may cause mbuf contents to change (all of them
perhaps?), then applications can easily strip those flags from outgoing
"const" mbufs. Then it becomes acceptable for tx_burst() to modify mbuf
contents as per user request, which removes one reason to rely on
tx_prepare() for these.
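
For instance (a sketch; which flags end up documented as mbuf-modifying is
precisely what would need to be settled):

 	/* this mbuf must stay untouched (e.g. it is shared): strip
 	 * the offload flags documented as possibly writing to it */
 	if (rte_mbuf_refcnt_read(m) > 1)
 		m->ol_flags &= ~(PKT_TX_TCP_SEG | PKT_TX_IP_CKSUM |
 				PKT_TX_L4_MASK);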

-- 
Adrien Mazarguil
6WIND

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v12 0/6] add Tx preparation
  2016-11-28 11:03                       ` [dpdk-dev] [PATCH v12 0/6] add Tx preparation Thomas Monjalon
                                           ` (4 preceding siblings ...)
  2016-12-01  8:24                         ` Rahul Lakkireddy
@ 2016-12-06 15:53                         ` Ferruh Yigit
  2016-12-07  7:55                           ` Andrew Rybchenko
                                             ` (2 more replies)
  5 siblings, 3 replies; 261+ messages in thread
From: Ferruh Yigit @ 2016-12-06 15:53 UTC (permalink / raw)
  To: Thomas Monjalon, dev, Jan Medala, Jakub Palider,
	Netanel Belgazal, Evgeny Schemeilin, Alejandro Lucero,
	Yuanhan Liu, Yong Wang, Andrew Rybchenko, Hemant Agrawal
  Cc: Tomasz Kulasek, konstantin.ananyev

On 11/28/2016 11:03 AM, Thomas Monjalon wrote:
> We need attention of every PMD developers on this thread.
> 
> Reminder of what Konstantin suggested:
> "
> - if the PMD supports TX offloads AND
> - if to be able use any of these offloads the upper layer SW would have to:
>     * modify the contents of the packet OR
>     * obey HW specific restrictions
> then it is a PMD developer's responsibility to provide tx_prep() that would implement
> expected modifications of the packet contents and restriction checks.
> Otherwise, tx_prep() implementation is not required and can be safely set to NULL.      
> "
> 
> I copy/paste also my previous conclusion:
> 
> Before txprep, there is only one API: the application must prepare the
> packet checksums itself (get_psd_sum in testpmd).
> With txprep, the application has 2 choices: keep doing the job itself
> or call txprep, which calls a PMD-specific function.
> The question is: do non-Intel drivers need a checksum preparation for TSO?
> Will they behave well if txprep does nothing in these drivers?
> 
> When looking at the code, most of the drivers handle the TSO flags.
> But it is hard to know whether they rely on the pseudo checksum or not.
> 
> git grep -l 'PKT_TX_UDP_CKSUM\|PKT_TX_TCP_CKSUM\|PKT_TX_TCP_SEG' drivers/net/
> 
> drivers/net/bnxt/bnxt_txr.c
> drivers/net/cxgbe/sge.c
> drivers/net/e1000/em_rxtx.c
> drivers/net/e1000/igb_rxtx.c
> drivers/net/ena/ena_ethdev.c
> drivers/net/enic/enic_rxtx.c
> drivers/net/fm10k/fm10k_rxtx.c
> drivers/net/i40e/i40e_rxtx.c
> drivers/net/ixgbe/ixgbe_rxtx.c
> drivers/net/mlx4/mlx4.c
> drivers/net/mlx5/mlx5_rxtx.c
> drivers/net/nfp/nfp_net.c
> drivers/net/qede/qede_rxtx.c
> drivers/net/thunderx/nicvf_rxtx.c
> drivers/net/virtio/virtio_rxtx.c
> drivers/net/vmxnet3/vmxnet3_rxtx.c
> 
> Please, we need a comment for each driver saying
> "it is OK, we do not need any checksum preparation for TSO"
> or
> "yes we have to implement tx_prepare or TSO will not work in this mode"
>

Still waiting for a response from the following PMDs:
- ena
- nfp
- virtio

Waiting for clarification on preparation requirements:
- vmxnet3

Also including new PMDs to the thread:
- sfc
- dpaa2

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v12 0/6] add Tx preparation
  2016-12-04 12:11                                 ` Ananyev, Konstantin
@ 2016-12-06 18:25                                   ` Yong Wang
  2016-12-07  9:57                                     ` Ferruh Yigit
  0 siblings, 1 reply; 261+ messages in thread
From: Yong Wang @ 2016-12-06 18:25 UTC (permalink / raw)
  To: Ananyev, Konstantin, Thomas Monjalon
  Cc: Harish Patil, dev, Rahul Lakkireddy, Stephen Hurd, Jan Medala,
	Jakub Palider, John Daley, Adrien Mazarguil, Alejandro Lucero,
	Rasesh Mody, Jacob, Jerin, Yuanhan Liu, Kulasek, TomaszX,
	olivier.matz

> -----Original Message-----
> From: Ananyev, Konstantin [mailto:konstantin.ananyev@intel.com]
> Sent: Sunday, December 4, 2016 4:11 AM
> To: Yong Wang <yongwang@vmware.com>; Thomas Monjalon
> <thomas.monjalon@6wind.com>
> Cc: Harish Patil <harish.patil@qlogic.com>; dev@dpdk.org; Rahul Lakkireddy
> <rahul.lakkireddy@chelsio.com>; Stephen Hurd
> <stephen.hurd@broadcom.com>; Jan Medala <jan@semihalf.com>; Jakub
> Palider <jpa@semihalf.com>; John Daley <johndale@cisco.com>; Adrien
> Mazarguil <adrien.mazarguil@6wind.com>; Alejandro Lucero
> <alejandro.lucero@netronome.com>; Rasesh Mody
> <rasesh.mody@qlogic.com>; Jacob, Jerin <Jerin.Jacob@cavium.com>;
> Yuanhan Liu <yuanhan.liu@linux.intel.com>; Kulasek, TomaszX
> <tomaszx.kulasek@intel.com>; olivier.matz@6wind.com
> Subject: RE: [dpdk-dev] [PATCH v12 0/6] add Tx preparation
> 
> Hi
> 
> > > 2016-11-30 17:42, Ananyev, Konstantin:
> > > > > > Please, we need a comment for each driver saying
> > > > > > "it is OK, we do not need any checksum preparation for TSO"
> > > > > > or
> > > > > > "yes we have to implement tx_prepare or TSO will not work in this mode"
> > > > > >
> > > > >
> > > > > qede PMD doesn’t currently support TSO yet, it only supports Tx
> > > > > TCP/UDP/IP csum offloads.
> > > > > So Tx preparation isn’t applicable. So as of now -
> > > > > "it is OK, we do not need any checksum preparation for TSO"
> > > >
> > > > Thanks for the answer.
> > > > Though please note that it is not only for TSO.
> > >
> > > Oh yes, sorry, my wording was incorrect.
> > > We need to know if any checksum preparation is needed prior to
> > > offloading its final computation to the hardware or driver.
> > > So the question applies to TSO and simple checksum offload.
> > >
> > > We are still waiting for answers for
> > > 	bnxt, cxgbe, ena, nfp, thunderx, virtio and vmxnet3.
> >
> > The case for a virtual device is a little bit more complicated as packets
> > offloaded from a virtual device can eventually be delivered to
> > another virtual NIC or different physical NICs that have different offload
> > requirements.  In ESX, the hypervisor will enforce that the packets
> > offloaded will be something that the hardware expects.  The contract for
> > vmxnet3 is that the guest needs to fill in pseudo header checksum
> > for both l4 checksum only and TSO + l4 checksum offload cases.
> 
> Ok, so at first glance that looks to me very similar to Intel HW requirements.
> Could you confirm whether rte_net_intel_cksum_prepare()
> would also work for vmxnet3 or whether some extra modifications are required?
> You can look at it here: http://dpdk.org/dev/patchwork/patch/17184/
> Note that for Intel HW the rules for pseudo-header csum calculation
> differ for the TSO and non-TSO cases.
> For TSO the length inside the pseudo-header is set to 0, while for the non-TSO
> case it should be set to the L3 payload length.
> Is it the same for vmxnet3 or no?
> 
> Thanks
> Konstantin
> 

Yes and this is the same for vmxnet3.
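
For reference, that rule is what the existing rte_ipv4_phdr_cksum() helper
in rte_ip.h implements, so a common preparation step could look like this
(a sketch only, assuming an IPv4/TCP packet with l2_len/l3_len already set):

 	struct ipv4_hdr *ip = rte_pktmbuf_mtod_offset(m,
 			struct ipv4_hdr *, m->l2_len);
 	struct tcp_hdr *tcp = rte_pktmbuf_mtod_offset(m,
 			struct tcp_hdr *, m->l2_len + m->l3_len);

 	/* the helper leaves the length out of the pseudo-header when
 	 * PKT_TX_TCP_SEG is set and includes it otherwise */
 	tcp->cksum = rte_ipv4_phdr_cksum(ip, m->ol_flags);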

> > > > This is for any TX offload for which the upper layer SW would have
> > > > to modify the contents of the packet.
> > > > Though as I can see for qede neither PKT_TX_IP_CKSUM nor PKT_TX_TCP_CKSUM
> > > > exhibits any extra requirements for the user.
> > > > Is that correct?


^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v12 0/6] add Tx preparation
  2016-12-06 13:59                                         ` Adrien Mazarguil
@ 2016-12-06 20:31                                           ` Ananyev, Konstantin
  2016-12-07 10:08                                             ` Adrien Mazarguil
  0 siblings, 1 reply; 261+ messages in thread
From: Ananyev, Konstantin @ 2016-12-06 20:31 UTC (permalink / raw)
  To: Adrien Mazarguil
  Cc: Thomas Monjalon, dev, Rahul Lakkireddy, Stephen Hurd, Jan Medala,
	Jakub Palider, John Daley, Alejandro Lucero, Harish Patil,
	Rasesh Mody, Jerin Jacob, Yuanhan Liu, Yong Wang, Kulasek,
	TomaszX, olivier.matz



> Hi Konstantin,
> 
> On Tue, Dec 06, 2016 at 10:56:26AM +0000, Ananyev, Konstantin wrote:
> >
> > Hi Adrien,
> >
> > >
> > > On Mon, Dec 05, 2016 at 04:43:52PM +0000, Ananyev, Konstantin wrote:
> > > [...]
> > > > > On Fri, Dec 02, 2016 at 01:00:55AM +0000, Ananyev, Konstantin wrote:
> > > > > [...]
> > > > > > > On Wed, Nov 30, 2016 at 10:54:50AM +0000, Ananyev, Konstantin wrote:
> > > > > > > [...]
> > > > > > > > Do you have anything particular in mind here?
> > > > > > >
> > > > > > > Nothing in particular, so for the sake of the argument, let's suppose that I
> > > > > > > would like to add a field to expose some limitation that only applies to my
> > > > > > > PMD during TX but looks generic enough to make sense, e.g. maximum packet
> > > > > > > size when VLAN tagging is requested.
> > > > > >
> > > > > > Hmm, I didn't hear about such limitations so far, but if it is real case -
> > > > > > sure, feel free to submit the patch.
> > > > >
> > > > > I won't, that was hypothetical.
> > > >
> > > > Then why are we discussing it? :)
> > >
> > > Just to make a point, which is that new limitations may appear anytime and
> > > tx_prepare() can now be used to check for them. First patch of the series
> > > does it:
> > >
> > >  +   uint16_t nb_seg_max;     /**< Max number of segments per whole packet. */
> > >  +   uint16_t nb_mtu_seg_max; /**< Max number of segments per one MTU */
> > >
> > > And states that:
> > >
> > >  + * For each packet to send, the rte_eth_tx_prepare() function performs
> > >  + * the following operations:
> > >  + *
> > >  + * - Check if packet meets devices requirements for tx offloads.
> > >  + *
> > >  + * - Check limitations about number of segments.
> > >  + *
> > >  + * - Check additional requirements when debug is enabled.
> > >  + *
> > >  + * - Update and/or reset required checksums when tx offload is set for packet.
> >
> > I think I already explained in my previous email why I think that
> > nb_seg_max and nb_mtu_seg_max are not redundant because of tx_prepare().
> > From my point of view they are a complement to tx_prepare():
> > Even if people do use tx_prepare() they still should take this information into account.
> > As an example ixgbe can't TX packets with more than 40 segments.
> > tx_prepare() for ixgbe will flag that issue, but it can't make a decision on the user's behalf
> > about what to do in that case: drop the packet, try to coalesce it into a packet with fewer
> > segments, split the packet into several smaller ones, etc.
> > That's up to the user to decide, and to make that decision the user might need this information.
> 
> Yet tx_prepare() already has the ability to update mbuf contents; the issue is
> what this function will do in the future, and where it will stop. It is defined
> in a way that lets each PMD do what it wants to make mbufs edible for
> tx_burst(); because of this, applications will just always call it to be on
> the safe side.
> 
> > > It's like making this function mandatory IMO.
> >
> > That's probably where confusion starts: I don't think that
> > tx_prepare() should be mandatory for the user to call.
> > Yes, it should be a recommended way.
> > But the user still should have the ability to by-pass it,
> > if he believes there is no need for it, or he prefers to implement
> > the same functionality on his own.
> > As an example,  if the user knows that he is going to send  a group
> > of one-segment packets that don't require any tx offloads, he can safely skip
> > tx_prepare() for them.
> 
> I understand your point, and agree with the example you provide. Many
> applications do not know what's inside mbufs though, except perhaps that
> they contain TCP and may want to perform TSO because of that. Those will
> have to call tx_prepare() to be future-proof.
> 
> > > > > > > PMDs are free to set that field to some
> > > > > > > special value (say, 0) if they do not care.
> > > > > > >
> > > > > > > Since that field exists however, conscious applications should check its
> > > > > > > value for each packet that needs to be transmitted. This extra code causes a
> > > > > > > slowdown just by sitting in the data path. Since it is not the only field in
> > > > > > > that structure, the performance impact can be significant.
> >
> > A conscious user will probably use this information at the stage of packet formation.
> > He probably has to do this sort of thing for large packets anyway:
> > check what the underlying mtu is, to decide whether he needs to split the packet,
> > or enable tso for it, etc.
> 
> There are already too many things to check, 

There always have been; this patch exposes them. Before that, the upper layer probably
had to use some hard-coded defines.

> applications probably won't mind
> a little help from PMDs. If we keep adding fields to this structure, we'll
> have to provide some sort of PMD-specific function that checks what is
> relevant.

Why PMD specific?
These fields are generic enough and could be consulted by the upper layer
when the packet is formed.
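
For illustration, a minimal sketch of consulting these limits at
packet-formation time (assuming port_id and an mbuf m are in scope;
tx_desc_lim is the struct rte_eth_dev_info field that the first patch
extends with nb_seg_max):

	struct rte_eth_dev_info dev_info;

	rte_eth_dev_info_get(port_id, &dev_info);
	if (m->nb_segs > dev_info.tx_desc_lim.nb_seg_max) {
		/* too many segments for this device: split the packet,
		 * coalesce its segments or drop it before tx_burst() */
	}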

> 
> Furthermore, assuming most packets are fine and do not require any extra
> processing, what is rejected by tx_burst() could enter some unlikely() path
> that attempts to rectify and re-send them. That would at least optimize the
> common scenario.

It is up to the upper layer to decide what to do with ill-formed packets:
drop/log/try to cure/etc.
Obviously different applications would have different logic and make different decisions here.
If you'd like to introduce a new function (in rte_net or whatever) that would be smart and
generic enough to cure ill-formed packets - you are more than welcome to try.
Though discussion of such a fallback function is far out of the scope of that patch, I believe.
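
As an illustration of the simplest such policy (drop), a minimal sketch
that relies only on the documented semantics of this series - the return
value of rte_eth_tx_prepare() is the number of packets that passed, so on
a short count pkts[ret] is the offending mbuf and rte_errno tells why:

	#include <string.h>
	#include <rte_ethdev.h>
	#include <rte_mbuf.h>

	/* Prepare a burst, drop whatever the PMD rejects, send the rest. */
	static uint16_t
	tx_prepare_drop_bad(uint8_t port, uint16_t queue,
			    struct rte_mbuf **pkts, uint16_t nb)
	{
		uint16_t ofs = 0;

		while (ofs < nb) {
			ofs += rte_eth_tx_prepare(port, queue,
						  pkts + ofs, nb - ofs);
			if (ofs == nb)
				break;
			/* pkts[ofs] violates a device limit (see rte_errno):
			 * drop it and shift the remainder left */
			rte_pktmbuf_free(pkts[ofs]);
			memmove(pkts + ofs, pkts + ofs + 1,
				(nb - ofs - 1) * sizeof(*pkts));
			nb--;
		}
		return rte_eth_tx_burst(port, queue, pkts, nb);
	}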

> 
> > > > > > > Even though this code is inside applications, it remains unfair to PMDs for
> > > > > > > which these tests are irrelevant. This problem is identified and addressed
> > > > > > > by tx_prepare().
> > > > > >
> > > > > > I suppose the question is why do we need:
> > > > > > uint16_t nb_seg_max;
> > > > > > uint16_t nb_mtu_seg_max;
> > > > > > as we now have tx_prepare(), right?
> > > > > >
> > > > > > For two reasons:
> > > > > > 1. Some people might feel that tx_prepare() is not good (smart/fast) enough
> > > > > > for them and would prefer to do necessary preparations for TX offloads themselves.
> > > > > >
> > > > > > 2. Even if people do use tx_prepare() they still should take this information into account.
> > > > > > As an example ixgbe can't TX packets with more than 40 segments.
> > > > > > Obviously ixgbe_tx_prep() performs that check and returns an error.
> > > > >
> > > > > Problem is that tx_prepare() also provides safeties which are not part of
> > > > > tx_burst(), such as not going over nb_mtu_seg_max. Because of this and the
> > > > > fact struct rte_eth_desc_lim can grow new fields anytime, application
> > > > > developers will be tempted to just call tx_prepare() and focus on more
> > > > > useful things.
> > > >
> > > > NP with that, that was the intention behind introducing it.
> > > >
> > > > > Put another way, from a user's point of view, tx_prepare() is an opaque
> > > > > function that greatly increases tx_burst()'s ability to send mbufs as
> > > > > requested, with extra error checking on top; applications not written to run
> > > > > on a specific PMD/device (all of them ideally) will thus call tx_prepare()
> > > > > at some point.
> > > > >
> > > > > > But it wouldn't try to merge/reallocate mbufs for you.
> > > > > > User still has to do it himself, or just prevent creating such long chains somehow.
> > > > >
> > > > > Yes, that's another debate. PMDs could still implement a software fallback
> > > > > for unlikely slow events like these. The number of PMDs is not going to
> > > > > decrease, each device having its own set of weird limitations in specific
> > > > > cases, PMDs should do their best to process mbufs even if that means slowly
> > > > > due to the lack of preparation.
> > > > >
> > > > > tx_prepare() has its uses but should really be optional, in the sense that
> > > > > if that function is not called, tx_burst() should deal with it somehow.
> > > >
> > > > As I said before, I don't think it is a good idea to put everything in tx_burst().
> > > > If a PMD driver prefers things that way, yes, tx_burst() can deal with each and
> > > > every possible offload requirement itself, but it shouldn't be mandatory.
> > >
> > > In effect, having to call tx_prepare() otherwise makes this step mandatory
> > > anyway. Looks like we are not going to agree here.
> > >
> > > > > > > Thanks to tx_prepare(), these checks are moved back into PMDs where they
> > > > > > > belong. PMDs that do not need them do not have to provide support for
> > > > > > > tx_prepare() and do not suffer any performance impact as result;
> > > > > > > applications only have to make sure tx_prepare() is always called at some
> > > > > > > point before tx_burst().
> > > > > > >
> > > > > > > Once you reach this stage, you've effectively made tx_prepare() mandatory
> > > > > > > before tx_burst(). If some bug occurs, then perhaps you forgot to call
> > > > > > > tx_prepare(), you just need to add it. The total cost for doing TX is
> > > > > > > therefore tx_prepare() + tx_burst().
> > > > > > >
> > > > > > > I'm perhaps a bit pessimistic mind you, but I do not think tx_prepare() will
> > > > > > > remain optional for long. Sure, PMDs that do not implement it do not care,
> > > > > > > I'm focusing on applications, for which the performance impact of calling
> > > > > > > tx_prepare() followed by tx_burst() is higher than a single tx_burst()
> > > > > > > performing all the necessary preparation at once.
> > > > > > >
> > > > > > > [...]
> > > > > > > > > Following the same logic, why can't such a thing be made part of the TX
> > > > > > > > > burst function as well (through a direct call to rte_phdr_cksum_fix()
> > > > > > > > > whenever necessary). From an application standpoint, what are the advantages
> > > > > > > > > of having to:
> > > > > > > > >
> > > > > > > > >  if (tx_prep()) // iterate and update mbufs as needed
> > > > > > > > >      tx_burst(); // iterate and send
> > > > > > > > >
> > > > > > > > > Compared to:
> > > > > > > > >
> > > > > > > > >  tx_burst(); // iterate, update as needed and send
> > > > > > > >
> > > > > > > > I think that was discussed extensively quite a lot previously here:
> > > > > > > > As Thomas already replied - main motivation is to allow user
> > > > > > > > to execute them on different stages of packet TX pipeline,
> > > > > > > > and probably on different cores.
> > > > > > > > I think that provides better flexibility to the user as to when/where
> > > > > > > > do these preparations and hopefully would lead to better performance.
> > > > > > >
> > > > > > > And I agree, I think this use case is valid but does not warrant such a high
> > > > > > > penalty when your application does not need that much flexibility. Simple
> > > > > > > (yet conscious) applications need the highest performance. Complex ones as
> > > > > > > you described already suffer quite a bit from IPCs and won't mind a couple
> > > > > > > of extra CPU cycles right?
> > > > > >
> > > > > > It would mean an extra cache-miss for every packet, so I think performance hit
> > > > > > would be quite significant.
> > > > >
> > > > > A performance hit has to occur somewhere regardless, because something has
> > > > > to be done in order to send packets that need it. Whether this cost is in
> > > > > application code or in a PMD function, it remains part of TX.
> > > >
> > > > Depending on the place the final cost would differ quite a lot.
> > > > If you call tx_prepare() somewhere close to the place where you fill the packet header
> > > > contents, then most likely the data that tx_prepare() has to access will be already in the cache.
> > > > So the performance penalty will be minimal.
> > > > If you'll try to access the same data later (at tx_burst), then the possibility that it would still
> > > > be in cache is much less.
> > > > If you call tx_burst() from another core then the data would for sure be out of cache,
> > > > and even worse, could still be in another core's cache.
> > >
> > > Well sure, that's why I also think tx_prepare() has its uses, only that
> > > since tx_prepare() is optional, tx_burst() should provide the same
> > > functionality when tx_prepare() is not called.
> >
> > As I understand, to implement what you are proposing (TX_PREPARED mbuf->ol_flag)
> > it will be required:
> >
> > a) Modify all existing applications that do tx_prepare()-like stuff on their own,
> > otherwise they would hit a performance penalty.
> > b) Modify at least all Intel PMDs and maybe some others too (vmxnet3?).
> >
> > Step b) probably wouldn't cause any significant performance impact straightaway,
> > but it for sure wouldn't make things faster, and would increase tx_burst() code
> > complexity quite a lot.
> > On the other side, I can't see any real benefit that we will have in return.
> > So I am still opposed to that idea.
> 
> Applications gain the ability to perform tx_burst() with offloads without
> having to prepare anything. 

What does 'without preparing anything' mean?
This is just not possible, I think.
One way or another, the application has to decide
what exactly it wants to TX and which HW offloads it wants to use for it.
So at the very least, it still needs to fill the relevant mbuf fields:
pkt_len, data_len, nb_segs, ol_flags, tx_offload, etc.

> Currently these applications either cannot use
> offloads at all or need to perform PMD-specific voodoo first.

That's why tx_prepare() is introduced.

> The generic
> alternative to this scenario being tx_prepare(), PMDs have to make this step
> as cheap as possible.
> 
> Yes that would slow down existing applications, people may find it
> acceptable since we're modifying the TX API here.
> 
> > > > > > About the 'simple' case when tx_prep() and tx_burst() are called on the same core,
> > > > > > Why do you believe that:
> > > > > > tx_prep(); tx_burst(); would be much slower than tx_burst() {tx_prep(), ...}?
> > > > >
> > > > > I mean instead of two function calls with their own loops:
> > > > >
> > > > >  tx_prepare() { foreach (pkt) { check(); extra_check(); ... } }
> > > > >
> > > > >  tx_burst() { foreach (pkt) { check(); stuff(); ... } }
> > > > >
> > > > > You end up with one:
> > > > >
> > > > >  tx_burst() { foreach (pkt) { check(); extra_check(); stuff(); ... } }
> > > > >
> > > > > Which usually is more efficient.
> > > >
> > > > I really doubt that.
> > > > If that were so, what would be the point of processing packets in bulk?
> > > > Usually dividing processing into different stages and at each stage processing
> > > > multiple packets at once helps to improve performance.
> > > > At least for IA.
> > > > Look for example how we had to change l3fwd to improve its performance.
> > >
> > > Depends quite a bit on usage pattern. It is less efficient for applications
> > > that do not modify mbuf contents because of the additional function call and
> > > inner loop.
> >
> > If the application doesn't modify mbuf contents then it can simply skip calling tx_prepare().
> 
> What if that same application wants to enable some offload as well?

Hmm, wasn't that your use-case when no offloads (modifications) are required just 3 lines above?

> 
> > > Note that I'm only pushing for the ability to conveniently address both
> > > cases with maximum performance.
> > >
> > > > > > tx_prep() itself is quite expensive, let's say for Intel HW it includes:
> > > > > > - read mbuf fields (2 cache-lines),
> > > > > > - read packet header (1/2 cache-lines)
> > > > > > - calculate pseudo-header csum
> > > > > >  - update packet header
> > > > > > Compared to that, the price of an extra function call seems negligible
> > > > > > (if we TX packets in bursts of course).
> > > > >
> > > > > We agree its performance is a critical issue then, sharing half the read
> > > > > steps with tx_burst() would make sense to me.
> > > >
> > > > I didn't understand that sentence.
> > >
> > > I meant this step can be shared (in addition to loop etc):
> > >
> > >  - read mbuf fields (2 cache-lines),
> >
> > Ah ok, you still believe that mixing tx_burst and tx_prepare code together
> > would give us noticeable performance benefit.
> > As I said above, I don't think it would, but you are welcome to try and
> > prove me wrong.
> 
> Depends on what you call noticeable. I guess we can at least agree having
> two separate functions and loops cause more instructions to be generated and
> executed.
> 
> Now for the number of spent CPU cycles, depends of course whether mbufs are
> still hot into the cache or not, and as I told you in my opinion we'll
> usually see applications calling tx_prepare() just before tx_burst() to
> benefit from offloads.

Honestly, Adrien, we are going in circles here.
Just to be clear:
The current patch introduces tx_prepare() without affecting in any way:
	1) existing applications
	2) existing PMD code (tx_burst)
Meanwhile I still believe it is useful and provides a big step forward in terms
of generalizing the usage of HW TX offloads.

What you propose requires modifications to both existing applications and existing PMD code
(the full-featured tx_burst() for at least all Intel PMDs and vmxnet3 has to be significantly modified).
You believe that with these modifications the new tx_burst() implementation
would be noticeably faster than just the current: tx_prepare(); tx_burst();
I personally doubt that it really would be (at least on modern IA).
But as I said, you are more than welcome to prove me wrong here.
Let's say, provide a patch for a full-featured ixgbe (or i40e) tx_burst() implementation,
so that it would combine both the tx_prepare() and tx_burst() functionalities in one function.
Then we can run some performance tests with the current patches and yours, and compare results.
Without that, I don't see any point in discussing your proposition any further.
I just won't agree to such a big change in existing PMDs without some solid justification behind it.

> 
> > > > > > > Yes they will, therefore we need a method that satisfies both cases.
> > > > > > >
> > > > > > > As a possible solution, a special mbuf flag could be added to each mbuf
> > > > > > > having gone through tx_prepare(). That way, tx_burst() could skip some
> > > > > > > checks and things it would otherwise have done.
> > > > > >
> > > > > > That's an interesting idea, but it has one drawback:
> > > > > > As I understand, it means that from now on if the user is doing preparations on his own,
> > > > > > he has to set up this flag, otherwise tx_burst() would do extra unnecessary work.
> > > > > > So any existing applications that use TX offloads and do preparation by themselves
> > > > > > would have to be modified to avoid performance loss.
> > > > >
> > > > > In my opinion, users should not do preparation on their own.
> > > >
> > > > People already do it now.
> > >
> > > But we do not want them to anymore thanks to this new API, for reasons
> > > described in the motivation section of the cover letter, right?
> >
> > We probably wouldn't recommend that, but if people would like to use their own stuff,
> > or shortcuts - I don't want to stop them here.
> >
> > >
> > > > > If we provide a
> > > > > generic method, it has to be fast enough to replace theirs. Perhaps not as
> > > > > fast since it would work with all PMDs (usual trade-off), but acceptably so.
> > > > >
> > > > > > > Another possibility, telling the PMD first that you always intend to use
> > > > > > > tx_prepare() and getting a simpler/faster tx_burst() callback as a result.
> > > > > >
> > > > > > That's what we have right now (at least for Intel HW):
> > > > > > it is a user responsibility to do the necessary preparations/checks before calling tx_burst().
> > > > > > With tx_prepare() we just remove from user the headache to implement tx_prepare() on his own.
> > > > > > Now he can use a 'proper' PMD provided function.
> > > > > >
> > > > > > My vote still would be for that model.
> > > > >
> > > > > OK, then in a nutshell:
> > > > >
> > > > > 1. Users are not expected to perform preparation/checks themselves anymore,
> > > > >    if they do, it's their problem.
> > > >
> > > > I think we need to be backward compatible here.
> > > > If an existing app is doing what tx_prepare() is supposed to do, it should keep working.
> > >
> > > It should work, only if they keep doing it as well as call tx_burst()
> > > directly, they will likely get lower performance.
> > >
> > > > > 2. If configured through an API to be defined, tx_burst() can be split in
> > > > >    two and applications must call tx_prepare() at some point before
> > > > >    tx_burst().
> > > > >
> > > > > 3. Otherwise tx_burst() should perform the necessary preparation and checks
> > > > >    on its own by default (when tx_prepare() is not expected).
> > > >
> > > > As I said before, I don't think it should be mandatory for tx_burst() to do what tx_prepare() does.
> > > > If some particular implementation of tx_burst() prefers to do things that way - that's fine.
> > > > But it shouldn't be required to.
> > >
> > > You're right, however applications might find it convenient. I think most
> > > will end up with something like the following:
> > >
> > >  if (tx_prepare(pkts))
> > >      tx_burst(pkts));
> >
> > Looking at existing DPDK apps - most of them do use some sort of TX bufferization.
> > So, even in a simplistic app it would probably be:
> >
> > tx_prepare(pkts);
> > tx_buffer(pkts);
> 
> We're down to my word against yours here I guess, to leave the choice to
> application developers, we'd need to provide tx_prepare() and a simpler
> tx_burst() as well as the ability to call tx_burst() directly and still get
> offloads.

From what I've seen, most DPDK libs/apps do buffer data packets for TX in one way or another:
mtcp, warp17, seastar.
Not to mention the sample apps.
But ok, as said above, if you can prove that the tx_burst() you are proposing is really much faster than
tx_prepare(); tx_burst();
I'll be happy to reconsider.
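
Sketched with the existing rte_eth_tx_buffer() helpers (port_id, queue_id,
an initialized tx_buffer and the error handling for a short
rte_eth_tx_prepare() count are assumed), the buffered pattern mentioned
above would be:

	uint16_t i, nb_prep;

	nb_prep = rte_eth_tx_prepare(port_id, queue_id, pkts, nb);
	/* nb_prep < nb would mean pkts[nb_prep] needs fixing or dropping */
	for (i = 0; i < nb_prep; i++)
		rte_eth_tx_buffer(port_id, queue_id, tx_buffer, pkts[i]);
	/* ... and later, on a timer or watermark: */
	rte_eth_tx_buffer_flush(port_id, queue_id, tx_buffer);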

> 
> > > > > 4. We probably still need some mbuf flag to mark mbufs that cannot be
> > > > >    modified, the refcount could also serve as a hint.
> > > >
> > > > If mbuf can't be modified, you probably just wouldn't call the function that is supposed to do that,
> > > > tx_prepare() in that case.
> > >
> > > I think it would be easier to document what offload flags may cause the
> > > tx_burst() function to modify mbuf contents, so applications have the
> > > ability to set or strip these flags on a mbuf basis. That way there is no
> > > need to call tx_prepare() without knowing exactly what it's going to do.
> >
> > Not sure I understand what exactly do you propose in the last paragraph?
> 
> That for each TX offload flag, we document whether preparation might cause a
> mbuf to be written to during the tx_prepare()/tx_burst() phase. One of the
> reasons for tx_prepare() being:
> 
>  4) Fields in packet may require different initialization (like e.g. will
>     require pseudo-header checksum precalculation, sometimes in a
>     different way depending on packet type, and so on). Now application
>     needs to care about it.
> 
> If we determine what offloads may cause mbuf contents to change (all of them
> perhaps?), then applications can easily strip those flags from outgoing
> "const" mbufs. Then it becomes acceptable for tx_burst() to modify mbuf
> contents as per user request, which removes one reason to rely on
> tx_prepare() for these.

Hmm, I didn't get you here.
If, let's say, I don't need TX TCP cksum offload, why would I set this flag inside the mbuf in the first place?
If I do expect the PMD to do TX TCP cksum offload for me, then I have to set that flag,
otherwise how would the PMD know that I require that offload?
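
For the record, that per-packet contract is just ol_flags plus the header
lengths; e.g. for the non-TSO TCP checksum case, a minimal sketch (m is an
assumed struct rte_mbuf pointer):

	/* request TX TCP checksum offload for one IPv4/TCP mbuf */
	m->ol_flags = PKT_TX_IPV4 | PKT_TX_TCP_CKSUM;
	m->l2_len = sizeof(struct ether_hdr);
	m->l3_len = sizeof(struct ipv4_hdr);
	/* rte_eth_tx_prepare() can then fill the pseudo-header csum */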

Konstantin

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v12 0/6] add Tx preparation
  2016-12-06 15:53                         ` Ferruh Yigit
@ 2016-12-07  7:55                           ` Andrew Rybchenko
  2016-12-07  8:11                           ` Yuanhan Liu
  2016-12-13 11:59                           ` Ferruh Yigit
  2 siblings, 0 replies; 261+ messages in thread
From: Andrew Rybchenko @ 2016-12-07  7:55 UTC (permalink / raw)
  To: Ferruh Yigit, Thomas Monjalon, dev, Jan Medala, Jakub Palider,
	Netanel Belgazal, Evgeny Schemeilin, Alejandro Lucero,
	Yuanhan Liu, Yong Wang, Hemant Agrawal
  Cc: Tomasz Kulasek, konstantin.ananyev

On 12/06/2016 06:53 PM, Ferruh Yigit wrote:
> On 11/28/2016 11:03 AM, Thomas Monjalon wrote:
>> We need attention of every PMD developers on this thread.
>>
>> Reminder of what Konstantin suggested:
>> "
>> - if the PMD supports TX offloads AND
>> - if to be able use any of these offloads the upper layer SW would have to:
>>      * modify the contents of the packet OR
>>      * obey HW specific restrictions
>> then it is a PMD developer's responsibility to provide tx_prep() that would implement
>> expected modifications of the packet contents and restriction checks.
>> Otherwise, tx_prep() implementation is not required and can be safely set to NULL.
>> "
>>
>> I copy/paste also my previous conclusion:
>>
>> Before txprep, there is only one API: the application must prepare the
>> packets' checksum itself (get_psd_sum in testpmd).
>> With txprep, the application has 2 choices: keep doing the job itself
>> or call txprep which calls a PMD-specific function.
>> The question is: do non-Intel drivers need a checksum preparation for TSO?
>> Will it behave well if txprep does nothing in these drivers?
>>
>> When looking at the code, most of drivers handle the TSO flags.
>> But it is hard to know whether they rely on the pseudo checksum or not.
>>
>> git grep -l 'PKT_TX_UDP_CKSUM\|PKT_TX_TCP_CKSUM\|PKT_TX_TCP_SEG' drivers/net/
>>
>> drivers/net/bnxt/bnxt_txr.c
>> drivers/net/cxgbe/sge.c
>> drivers/net/e1000/em_rxtx.c
>> drivers/net/e1000/igb_rxtx.c
>> drivers/net/ena/ena_ethdev.c
>> drivers/net/enic/enic_rxtx.c
>> drivers/net/fm10k/fm10k_rxtx.c
>> drivers/net/i40e/i40e_rxtx.c
>> drivers/net/ixgbe/ixgbe_rxtx.c
>> drivers/net/mlx4/mlx4.c
>> drivers/net/mlx5/mlx5_rxtx.c
>> drivers/net/nfp/nfp_net.c
>> drivers/net/qede/qede_rxtx.c
>> drivers/net/thunderx/nicvf_rxtx.c
>> drivers/net/virtio/virtio_rxtx.c
>> drivers/net/vmxnet3/vmxnet3_rxtx.c
>>
>> Please, we need a comment for each driver saying
>> "it is OK, we do not need any checksum preparation for TSO"
>> or
>> "yes we have to implement tx_prepare or TSO will not work in this mode"
>>
> Still waiting response from PMDs:
> - ena
> - nfp
> - virtio
>
> Waiting clarification for preparation requirements:
> - vmxnet3
>
> Also including new PMDs to the thread:
> - sfc

The patch which adds TSO support is
http://dpdk.org/dev/patchwork/patch/17417/
We use the l2/l3/l4 header lengths. We do NOT use a prepared pseudo-header
checksum and the HW does NOT need it.

Andrew.

> - dpaa2

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v12 0/6] add Tx preparation
  2016-12-06 15:53                         ` Ferruh Yigit
  2016-12-07  7:55                           ` Andrew Rybchenko
@ 2016-12-07  8:11                           ` Yuanhan Liu
  2016-12-07 10:13                             ` Ananyev, Konstantin
  2016-12-13 11:59                           ` Ferruh Yigit
  2 siblings, 1 reply; 261+ messages in thread
From: Yuanhan Liu @ 2016-12-07  8:11 UTC (permalink / raw)
  To: Ferruh Yigit, Olivier Matz
  Cc: Thomas Monjalon, dev, Jan Medala, Jakub Palider,
	Netanel Belgazal, Evgeny Schemeilin, Alejandro Lucero, Yong Wang,
	Andrew Rybchenko, Hemant Agrawal, Tomasz Kulasek,
	konstantin.ananyev

On Tue, Dec 06, 2016 at 03:53:42PM +0000, Ferruh Yigit wrote:
> > Please, we need a comment for each driver saying
> > "it is OK, we do not need any checksum preparation for TSO"
> > or
> > "yes we have to implement tx_prepare or TSO will not work in this mode"
> >

Sorry for the late reply. For virtio, I think it's not a must. The checksum stuff
is handled inside the Tx function. However, we could move it
to tx_prepare, which would actually recover the performance loss
introduced while enabling TSO for the non-TSO case.

	--yliu

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v12 0/6] add Tx preparation
  2016-12-06 18:25                                   ` Yong Wang
@ 2016-12-07  9:57                                     ` Ferruh Yigit
  2016-12-07 10:03                                       ` Ananyev, Konstantin
  0 siblings, 1 reply; 261+ messages in thread
From: Ferruh Yigit @ 2016-12-07  9:57 UTC (permalink / raw)
  To: Yong Wang, Ananyev, Konstantin, Thomas Monjalon
  Cc: Harish Patil, dev, Rahul Lakkireddy, Stephen Hurd, Jan Medala,
	Jakub Palider, John Daley, Adrien Mazarguil, Alejandro Lucero,
	Rasesh Mody, Jacob, Jerin, Yuanhan Liu, Kulasek, TomaszX,
	olivier.matz

On 12/6/2016 6:25 PM, Yong Wang wrote:
>> -----Original Message-----
>> From: Ananyev, Konstantin [mailto:konstantin.ananyev@intel.com]
>> Sent: Sunday, December 4, 2016 4:11 AM
>> To: Yong Wang <yongwang@vmware.com>; Thomas Monjalon
>> <thomas.monjalon@6wind.com>
>> Cc: Harish Patil <harish.patil@qlogic.com>; dev@dpdk.org; Rahul Lakkireddy
>> <rahul.lakkireddy@chelsio.com>; Stephen Hurd
>> <stephen.hurd@broadcom.com>; Jan Medala <jan@semihalf.com>; Jakub
>> Palider <jpa@semihalf.com>; John Daley <johndale@cisco.com>; Adrien
>> Mazarguil <adrien.mazarguil@6wind.com>; Alejandro Lucero
>> <alejandro.lucero@netronome.com>; Rasesh Mody
>> <rasesh.mody@qlogic.com>; Jacob, Jerin <Jerin.Jacob@cavium.com>;
>> Yuanhan Liu <yuanhan.liu@linux.intel.com>; Kulasek, TomaszX
>> <tomaszx.kulasek@intel.com>; olivier.matz@6wind.com
>> Subject: RE: [dpdk-dev] [PATCH v12 0/6] add Tx preparation
>>
>> Hi
>>
>>
>>
>>>>
>>
>>>> 2016-11-30 17:42, Ananyev, Konstantin:
>>
>>>>>>> Please, we need a comment for each driver saying
>>
>>>>>>> "it is OK, we do not need any checksum preparation for TSO"
>>
>>>>>>> or
>>
>>>>>>> "yes we have to implement tx_prepare or TSO will not work in this
>>
>>>> mode"
>>
>>>>>>>
>>
>>>>>>
>>
>>>>>> qede PMD doesn’t currently support TSO yet, it only supports Tx
>>
>>>> TCP/UDP/IP
>>
>>>>>> csum offloads.
>>
>>>>>> So Tx preparation isn’t applicable. So as of now -
>>
>>>>>> "it is OK, we do not need any checksum preparation for TSO"
>>
>>>>>
>>
>>>>> Thanks for the answer.
>>
>>>>> Though please note that it is not only for TSO.
>>
>>>>
>>
>>>> Oh yes, sorry, my wording was incorrect.
>>
>>>> We need to know if any checksum preparation is needed prior
>>
>>>> offloading its final computation to the hardware or driver.
>>
>>>> So the question applies to TSO and simple checksum offload.
>>
>>>>
>>
>>>> We are still waiting answers for
>>
>>>> 	bnxt, cxgbe, ena, nfp, thunderx, virtio and vmxnet3.
>>
>>>
>>
>>> The case for a virtual device is a little bit more complicated as packets
>> offloaded from a virtual device can eventually be delivered to
>>
>>> another virtual NIC or different physical NICs that have different offload
>> requirements.  In ESX, the hypervisor will enforce that the packets
>>
>>> offloaded will be something that the hardware expects.  The contract for
>> vmxnet3 is that the guest needs to fill in pseudo header checksum
>>
>>> for both l4 checksum only and TSO + l4 checksum offload cases.
>>
>>
>>
>> Ok, so at first glance that looks to me very similar to Intel HW requirements.
>>
>> Could you confirm would rte_net_intel_cksum_prepare()
>>
>> also work for vmxnet3 or some extra modifications are required?
>>
>> You can look at it here: https://urldefense.proofpoint.com/v2/url?u=http-
>> 3A__dpdk.org_dev_patchwork_patch_17184_&d=DgIGaQ&c=uilaK90D4TOV
>> oH58JNXRgQ&r=v4BBYIqiDq552fkYnKKFBFyqvMXOR3UXSdFO2plFD1s&m=NS
>> 4zOl2je_tyGhnOJMSnu37HmJxOZf-1KLYcVsu8iYY&s=dL-NOC-
>> 18HclXUURQzuyW5Udw4NY13pKMndYvfgCfbA&e= .
>>
>> Note that for Intel HW the rules for pseudo-header csum calculation
>>
>> differ for TSO and non-TSO case.
>>
>> For TSO, the length inside the pseudo-header is set to 0, while for the non-TSO case
>>
>> it should be set to the L3 payload length.
>>
>> Is it the same for vmxnet3 or no?
>>
>> Thanks
>>
>> Konstantin
>>
> 
> Yes and this is the same for vmxnet3.
> 

This means the vmxnet3 PMD should also be updated, right? Should that update
be part of the tx_prep patchset, or a separate patch?

>>>
>>
>>>>> This is for any TX offload for which the upper layer SW would have
>>
>>>>> to modify the contents of the packet.
>>
>>>>> Though as I can see for qede neither PKT_TX_IP_CKSUM or
>>
>>>> PKT_TX_TCP_CKSUM
>>
>>>>> exhibits any extra requirements for the user.
>>
>>>>> Is that correct?
>>
>>
> 

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v12 0/6] add Tx preparation
  2016-12-07  9:57                                     ` Ferruh Yigit
@ 2016-12-07 10:03                                       ` Ananyev, Konstantin
  2016-12-07 14:31                                         ` Alejandro Lucero
  2016-12-08 18:20                                         ` Yong Wang
  0 siblings, 2 replies; 261+ messages in thread
From: Ananyev, Konstantin @ 2016-12-07 10:03 UTC (permalink / raw)
  To: Yigit, Ferruh, Yong Wang, Thomas Monjalon
  Cc: Harish Patil, dev, Rahul Lakkireddy, Stephen Hurd, Jan Medala,
	Jakub Palider, John Daley, Adrien Mazarguil, Alejandro Lucero,
	Rasesh Mody, Jacob, Jerin, Yuanhan Liu, Kulasek, TomaszX,
	olivier.matz


Hi Ferruh,

> 
> On 12/6/2016 6:25 PM, Yong Wang wrote:
> >> -----Original Message-----
> >> From: Ananyev, Konstantin [mailto:konstantin.ananyev@intel.com]
> >> Sent: Sunday, December 4, 2016 4:11 AM
> >> To: Yong Wang <yongwang@vmware.com>; Thomas Monjalon
> >> <thomas.monjalon@6wind.com>
> >> Cc: Harish Patil <harish.patil@qlogic.com>; dev@dpdk.org; Rahul Lakkireddy
> >> <rahul.lakkireddy@chelsio.com>; Stephen Hurd
> >> <stephen.hurd@broadcom.com>; Jan Medala <jan@semihalf.com>; Jakub
> >> Palider <jpa@semihalf.com>; John Daley <johndale@cisco.com>; Adrien
> >> Mazarguil <adrien.mazarguil@6wind.com>; Alejandro Lucero
> >> <alejandro.lucero@netronome.com>; Rasesh Mody
> >> <rasesh.mody@qlogic.com>; Jacob, Jerin <Jerin.Jacob@cavium.com>;
> >> Yuanhan Liu <yuanhan.liu@linux.intel.com>; Kulasek, TomaszX
> >> <tomaszx.kulasek@intel.com>; olivier.matz@6wind.com
> >> Subject: RE: [dpdk-dev] [PATCH v12 0/6] add Tx preparation
> >>
> >> Hi
> >>
> >>
> >>
> >>>>
> >>
> >>>> 2016-11-30 17:42, Ananyev, Konstantin:
> >>
> >>>>>>> Please, we need a comment for each driver saying
> >>
> >>>>>>> "it is OK, we do not need any checksum preparation for TSO"
> >>
> >>>>>>> or
> >>
> >>>>>>> "yes we have to implement tx_prepare or TSO will not work in this
> >>
> >>>> mode"
> >>
> >>>>>>>
> >>
> >>>>>>
> >>
> >>>>>> qede PMD doesn’t currently support TSO yet, it only supports Tx
> >>
> >>>> TCP/UDP/IP
> >>
> >>>>>> csum offloads.
> >>
> >>>>>> So Tx preparation isn’t applicable. So as of now -
> >>
> >>>>>> "it is OK, we do not need any checksum preparation for TSO"
> >>
> >>>>>
> >>
> >>>>> Thanks for the answer.
> >>
> >>>>> Though please note that it is not only for TSO.
> >>
> >>>>
> >>
> >>>> Oh yes, sorry, my wording was incorrect.
> >>
> >>>> We need to know if any checksum preparation is needed prior
> >>
> >>>> offloading its final computation to the hardware or driver.
> >>
> >>>> So the question applies to TSO and simple checksum offload.
> >>
> >>>>
> >>
> >>>> We are still waiting answers for
> >>
> >>>> 	bnxt, cxgbe, ena, nfp, thunderx, virtio and vmxnet3.
> >>
> >>>
> >>
> >>> The case for a virtual device is a little bit more complicated as packets
> >> offloaded from a virtual device can eventually be delivered to
> >>
> >>> another virtual NIC or different physical NICs that have different offload
> >> requirements.  In ESX, the hypervisor will enforce that the packets
> >>
> >>> offloaded will be something that the hardware expects.  The contract for
> >> vmxnet3 is that the guest needs to fill in pseudo header checksum
> >>
> >>> for both l4 checksum only and TSO + l4 checksum offload cases.
> >>
> >>
> >>
> >> Ok, so at first glance that looks to me very similar to Intel HW requirements.
> >>
> >> Could you confirm would rte_net_intel_cksum_prepare()
> >>
> >> also work for vmxnet3 or some extra modifications are required?
> >>
> >> You can look at it here: https://urldefense.proofpoint.com/v2/url?u=http-
> >> 3A__dpdk.org_dev_patchwork_patch_17184_&d=DgIGaQ&c=uilaK90D4TOV
> >> oH58JNXRgQ&r=v4BBYIqiDq552fkYnKKFBFyqvMXOR3UXSdFO2plFD1s&m=NS
> >> 4zOl2je_tyGhnOJMSnu37HmJxOZf-1KLYcVsu8iYY&s=dL-NOC-
> >> 18HclXUURQzuyW5Udw4NY13pKMndYvfgCfbA&e= .
> >>
> >> Note that for Intel HW the rules for pseudo-header csum calculation
> >>
> >> differ for TSO and non-TSO case.
> >>
> >> For TSO, the length inside the pseudo-header is set to 0, while for the non-TSO case
> >>
> >> it should be set to the L3 payload length.
> >>
> >> Is it the same for vmxnet3 or no?
> >>
> >> Thanks
> >>
> >> Konstantin
> >>
> >
> > Yes and this is the same for vmxnet3.
> >
> 
> This means the vmxnet3 PMD should also be updated, right?

Yes, that's right.

> Should that update
> be part of the tx_prep patchset, or a separate patch?

Another question, I suppose, is who will do the actual patch for vmxnet3.
Yong, are you ok to do the patch for vmxnet3, or would you prefer us to do it?
Please note that in both cases we will need your help in testing/reviewing it.
Konstantin

> 
> >>>
> >>
> >>>>> This is for any TX offload for which the upper layer SW would have
> >>
> >>>>> to modify the contents of the packet.
> >>
> >>>>> Though as I can see for qede neither PKT_TX_IP_CKSUM or
> >>
> >>>> PKT_TX_TCP_CKSUM
> >>
> >>>>> exhibits any extra requirements for the user.
> >>
> >>>>> Is that correct?
> >>
> >>
> >


^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v12 0/6] add Tx preparation
  2016-12-06 20:31                                           ` Ananyev, Konstantin
@ 2016-12-07 10:08                                             ` Adrien Mazarguil
  0 siblings, 0 replies; 261+ messages in thread
From: Adrien Mazarguil @ 2016-12-07 10:08 UTC (permalink / raw)
  To: Ananyev, Konstantin
  Cc: Thomas Monjalon, dev, Rahul Lakkireddy, Stephen Hurd, Jan Medala,
	Jakub Palider, John Daley, Alejandro Lucero, Harish Patil,
	Rasesh Mody, Jerin Jacob, Yuanhan Liu, Yong Wang, Kulasek,
	TomaszX, olivier.matz

On Tue, Dec 06, 2016 at 08:31:35PM +0000, Ananyev, Konstantin wrote:
> > Hi Konstantin,
> > 
> > On Tue, Dec 06, 2016 at 10:56:26AM +0000, Ananyev, Konstantin wrote:
> > >
> > > Hi Adrien,
> > >
> > > >
> > > > On Mon, Dec 05, 2016 at 04:43:52PM +0000, Ananyev, Konstantin wrote:
> > > > [...]
> > > > > > On Fri, Dec 02, 2016 at 01:00:55AM +0000, Ananyev, Konstantin wrote:
> > > > > > [...]
> > > > > > > > On Wed, Nov 30, 2016 at 10:54:50AM +0000, Ananyev, Konstantin wrote:
> > > > > > > > [...]
> > > > > > > > > Do you have anything particular in mind here?
> > > > > > > >
> > > > > > > > Nothing in particular, so for the sake of the argument, let's suppose that I
> > > > > > > > would like to add a field to expose some limitation that only applies to my
> > > > > > > > PMD during TX but looks generic enough to make sense, e.g. maximum packet
> > > > > > > > size when VLAN tagging is requested.
> > > > > > >
> > > > > > > Hmm, I didn't hear about such limitations so far, but if it is real case -
> > > > > > > sure, feel free to submit the patch.
> > > > > >
> > > > > > I won't, that was hypothetical.
> > > > >
> > > > > Then why are we discussing it? :)
> > > >
> > > > Just to make a point, which is that new limitations may appear anytime and
> > > > tx_prepare() can now be used to check for them. First patch of the series
> > > > does it:
> > > >
> > > >  +   uint16_t nb_seg_max;     /**< Max number of segments per whole packet. */
> > > >  +   uint16_t nb_mtu_seg_max; /**< Max number of segments per one MTU */
> > > >
> > > > And states that:
> > > >
> > > >  + * For each packet to send, the rte_eth_tx_prepare() function performs
> > > >  + * the following operations:
> > > >  + *
> > > >  + * - Check if packet meets devices requirements for tx offloads.
> > > >  + *
> > > >  + * - Check limitations about number of segments.
> > > >  + *
> > > >  + * - Check additional requirements when debug is enabled.
> > > >  + *
> > > >  + * - Update and/or reset required checksums when tx offload is set for packet.
> > >
> > > I think I already explained in my previous email why I think that
> > > nb_seg_max and nb_mtu_seg_max are not redundant because of tx_prepare().
> > > From my point of view they are a complement to tx_prepare():
> > > Even if people do use tx_prepare() they still should take this information into account.
> > > As an example, ixgbe can't TX packets with more than 40 segments.
> > > tx_prepare() for ixgbe will flag that issue, but it can't make a decision on the user's behalf
> > > what to do in that case: drop the packet, try to coalesce it into a packet with fewer
> > > segments, split the packet into several smaller ones, etc.
> > > That's up to user to make such decision, and to make it, user might need this information.
> > 
> > Yet tx_prepare() has already the ability to update mbuf contents, issue is
> > what will this function do in the future, where will it stop? It is defined
> > in a way that each PMD does what it wants to make mbufs edible for
> > tx_burst(), because of this applications will just always call it to be on
> > the safe side.
> > 
> > > > It's like making this function mandatory IMO.
> > >
> > > That's probably where confusion starts: I don't think that
> > > tx_prepare() should be mandatory for the user to call.
> > > Yes, it should be a recommended way.
> > > But the user still should have the ability to by-pass it,
> > > if he believes there is no need for it, or he prefers to implement
> > > the same functionality on his own.
> > > As an example,  if the user knows that he is going to send  a group
> > > of one-segment packets that don't require any tx offloads, he can safely skip
> > > tx_prepare() for them.
> > 
> > I understand your point, and agree with the example you provide. Many
> > applications do not know what's inside mbufs though, except perhaps that
> > they contain TCP and may want to perform TSO because of that. Those will
> > have to call tx_prepare() to be future-proof.
> > 
> > > > > > > > PMDs are free to set that field to some
> > > > > > > > special value (say, 0) if they do not care.
> > > > > > > >
> > > > > > > > Since that field exists however, conscious applications should check its
> > > > > > > > value for each packet that needs to be transmitted. This extra code causes a
> > > > > > > > slowdown just by sitting in the data path. Since it is not the only field in
> > > > > > > > that structure, the performance impact can be significant.
> > >
> > > A conscious user will probably use this information at the stage of packet formation.
> > > He probably has to do this sort of thing for large packets anyway:
> > > check what the underlying mtu is, to decide whether he needs to split the packet,
> > > or enable tso for it, etc.
> > 
> > There are already too many things to check, 
> 
> There always have been; this patch exposes them. Before that, the upper layer probably
> had to use some hard-coded defines.
> 
> > applications probably won't mind
> > a little help from PMDs. If we keep adding fields to this structure, we'll
> > have to provide some sort of PMD-specific function that checks what is
> > relevant.
> 
> Why PMD specific?
> These fields are generic enough and could be consulted by the upper layer
> when the packet is formed.

I used the wrong term here; I meant a generic function provided by PMDs,
just like tx_prepare(): some sort of generic tx_check(), whose functionality
is currently partially covered by tx_prepare(), if I'm not mistaken.

> > Furthermore, assuming most packets are fine and do not require any extra
> > processing, what is rejected by tx_burst() could enter some unlikely() path
> > that attempts to rectify and re-send them. That would at least optimize the
> > common scenario.
> 
> It is up to the upper layer to decide what to do with ill-formed packets:
> drop/log/try to cure/etc.
> Obviously different applications would have different logic and make different decisions here.
> If you'd like to introduce a new function (in rte_net or whatever) that would be smart and
> generic enough to cure ill-formed packets - you are more than welcome to try.
> Though discussion of such a fallback function is far out of the scope of that patch, I believe.

I agree it's up to the upper layer, so just to clarify, I meant the
tx_burst() function could fail some check and not transmit the remaining
buffers, leaving the application to do whatever with them in some unlikely()
path:

 /* tx_prepare() has never been called; port_id/queue_id assumed */
 n = rte_eth_tx_burst(port_id, queue_id, pkts, num);
 if (unlikely(n < num)) {
     /* some packets were rejected: recover, drop or re-send them */
     n += recover_here(pkts + n, num - n);
 }

> > > > > > > > Even though this code is inside applications, it remains unfair to PMDs for
> > > > > > > > which these tests are irrelevant. This problem is identified and addressed
> > > > > > > > by tx_prepare().
> > > > > > >
> > > > > > > I suppose the question is why do we need:
> > > > > > > uint16_t nb_seg_max;
> > > > > > > uint16_t nb_mtu_seg_max;
> > > > > > > as we now have tx_prepare(), right?
> > > > > > >
> > > > > > > For two reasons:
> > > > > > > 1. Some people might feel that tx_prepare() is not good (smart/fast) enough
> > > > > > > for them and would prefer to do necessary preparations for TX offloads themselves.
> > > > > > >
> > > > > > > 2. Even if people do use tx_prepare() they still should take this information into account.
> > > > > > > As an example ixgbe can't TX packets with more than 40 segments.
> > > > > > > Obviously ixgbe_tx_prep() performs that check and returns an error.
> > > > > >
> > > > > > Problem is that tx_prepare() also provides safeties which are not part of
> > > > > > tx_burst(), such as not going over nb_mtu_seg_max. Because of this and the
> > > > > > fact struct rte_eth_desc_lim can grow new fields anytime, application
> > > > > > developers will be tempted to just call tx_prepare() and focus on more
> > > > > > useful things.
> > > > >
> > > > > NP with that, that was the intention behind introducing it.
> > > > >
> > > > > > Put another way, from a user's point of view, tx_prepare() is an opaque
> > > > > > function that greatly increases tx_burst()'s ability to send mbufs as
> > > > > > requested, with extra error checking on top; applications not written to run
> > > > > > on a specific PMD/device (all of them ideally) will thus call tx_prepare()
> > > > > > at some point.
> > > > > >
> > > > > > > But it wouldn't try to merge/reallocate mbufs for you.
> > > > > > > User still has to do it himself, or just prevent creating such long chains somehow.
> > > > > >
> > > > > > Yes, that's another debate. PMDs could still implement a software fallback
> > > > > > for unlikely slow events like these. The number of PMDs is not going to
> > > > > > decrease, each device having its own set of weird limitations in specific
> > > > > > cases, PMDs should do their best to process mbufs even if that means slowly
> > > > > > due to the lack of preparation.
> > > > > >
> > > > > > tx_prepare() has its uses but should really be optional, in the sense that
> > > > > > if that function is not called, tx_burst() should deal with it somehow.
> > > > >
> > > > > As I said before, I don't think it is a good idea to put everything in tx_burst().
> > > > > If a PMD driver prefers things that way, yes, tx_burst() can deal with each and
> > > > > every possible offload requirement itself, but it shouldn't be mandatory.
> > > >
> > > > In effect, having to call tx_prepare() otherwise makes this step mandatory
> > > > anyway. Looks like we are not going to agree here.
> > > >
> > > > > > > > Thanks to tx_prepare(), these checks are moved back into PMDs where they
> > > > > > > > belong. PMDs that do not need them do not have to provide support for
> > > > > > > > tx_prepare() and do not suffer any performance impact as result;
> > > > > > > > applications only have to make sure tx_prepare() is always called at some
> > > > > > > > point before tx_burst().
> > > > > > > >
> > > > > > > > Once you reach this stage, you've effectively made tx_prepare() mandatory
> > > > > > > > before tx_burst(). If some bug occurs, then perhaps you forgot to call
> > > > > > > > tx_prepare(), you just need to add it. The total cost for doing TX is
> > > > > > > > therefore tx_prepare() + tx_burst().
> > > > > > > >
> > > > > > > > I'm perhaps a bit pessimistic mind you, but I do not think tx_prepare() will
> > > > > > > > remain optional for long. Sure, PMDs that do not implement it do not care,
> > > > > > > > I'm focusing on applications, for which the performance impact of calling
> > > > > > > > tx_prepare() followed by tx_burst() is higher than a single tx_burst()
> > > > > > > > performing all the necessary preparation at once.
> > > > > > > >
> > > > > > > > [...]
> > > > > > > > > > Following the same logic, why can't such a thing be made part of the TX
> > > > > > > > > > burst function as well (through a direct call to rte_phdr_cksum_fix()
> > > > > > > > > > whenever necessary). From an application standpoint, what are the advantages
> > > > > > > > > > of having to:
> > > > > > > > > >
> > > > > > > > > >  if (tx_prep()) // iterate and update mbufs as needed
> > > > > > > > > >      tx_burst(); // iterate and send
> > > > > > > > > >
> > > > > > > > > > Compared to:
> > > > > > > > > >
> > > > > > > > > >  tx_burst(); // iterate, update as needed and send
> > > > > > > > >
> > > > > > > > > I think that was discussed extensively quite a lot previously here:
> > > > > > > > > As Thomas already replied - main motivation is to allow user
> > > > > > > > > to execute them on different stages of packet TX pipeline,
> > > > > > > > > and probably on different cores.
> > > > > > > > > I think that provides better flexibility to the user as to when/where
> > > > > > > > > do these preparations and hopefully would lead to better performance.
> > > > > > > >
> > > > > > > > And I agree, I think this use case is valid but does not warrant such a high
> > > > > > > > penalty when your application does not need that much flexibility. Simple
> > > > > > > > (yet conscious) applications need the highest performance. Complex ones as
> > > > > > > > you described already suffer quite a bit from IPCs and won't mind a couple
> > > > > > > > of extra CPU cycles right?
> > > > > > >
> > > > > > > It would mean an extra cache-miss for every packet, so I think performance hit
> > > > > > > would be quite significant.
> > > > > >
> > > > > > A performance hit has to occur somewhere regardless, because something has
> > > > > > to be done in order to send packets that need it. Whether this cost is in
> > > > > > application code or in a PMD function, it remains part of TX.
> > > > >
> > > > > Depending on the place the final cost would differ quite a lot.
> > > > > If you call tx_prepare() somewhere close to the place where you fill the packet header
> > > > > contents, then most likely the data that tx_prepare() has to access will be already in the cache.
> > > > > So the performance penalty will be minimal.
> > > > > If you'll try to access the same data later (at tx_burst), then the possibility that it would still
> > > > > be in cache is much less.
> > > > > If you call tx_burst() from another core then the data would for sure be out of cache,
> > > > > and even worse, could still be in another core's cache.
> > > >
> > > > Well sure, that's why I also think tx_prepare() has its uses, only that
> > > > since tx_prepare() is optional, tx_burst() should provide the same
> > > > functionality when tx_prepare() is not called.
> > >
> > > As I understand, to implement what you are proposing (TX_PREPARED mbuf->ol_flag)
> > > it will be required:
> > >
> > > a) Modify all existing applications that do tx_prepare()-like stuff on their own,
> > > otherwise they would hit a performance penalty.
> > > b) Modify at least all Intel PMDs and maybe some others too (vmxnet3?).
> > >
> > > Step b) probably wouldn't cause any significant performance impact straightaway,
> > > but it for sure wouldn't make things faster, and would increase tx_burst() code
> > > complexity quite a lot.
> > > On the other side, I can't see any real benefit that we will have in return.
> > > So I am still opposed to that idea.
> > 
> > Applications gain the ability to perform tx_burst() with offloads without
> > having to prepare anything. 
> 
> What does 'without preparing anything' mean?
> This is just not possible, I think.
> One way or another, the application has to decide
> what exactly it wants to TX and which HW offloads it wants to use for it.
> So at the very least, it still needs to fill the relevant mbuf fields:
> pkt_len, data_len, nb_segs, ol_flags, tx_offload, etc.
> 
> > Currently these applications either cannot use
> > offloads at all or need to perform PMD-specific voodoo first.
> 
> That's why tx_prepare() is introduced.

Yes, and the fact it appeared made me wonder why tx_burst() could not
implement its functionality as well. Why can't tx_burst() call
rte_net_intel_cksum_prepare() directly? I mean, unless offloads are
requested, this additional code should not impact performance.
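
For reference, for the TCP/IPv4 case the preparation in question boils down
to something like the sketch below (not the actual
rte_net_intel_cksum_prepare() body; it leans on the existing rte_ip.h
helper, which applies the zero-length TSO rule or the L3-payload-length
rule for the pseudo-header checksum depending on ol_flags):

	#include <rte_ip.h>
	#include <rte_tcp.h>
	#include <rte_mbuf.h>

	/* Fix up the TCP pseudo-header checksum of an IPv4 packet;
	 * headers are assumed contiguous in the first segment. */
	static void
	phdr_cksum_fix(struct rte_mbuf *m)
	{
		struct ipv4_hdr *ip;
		struct tcp_hdr *tcp;

		ip = rte_pktmbuf_mtod_offset(m, struct ipv4_hdr *, m->l2_len);
		tcp = (struct tcp_hdr *)((char *)ip + m->l3_len);
		/* rte_ipv4_phdr_cksum() checks PKT_TX_TCP_SEG in ol_flags
		 * to pick the TSO vs non-TSO pseudo-header rules */
		tcp->cksum = rte_ipv4_phdr_cksum(ip, m->ol_flags);
	}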

> > The generic
> > alternative to this scenario being tx_prepare(), PMDs have to make this step
> > as cheap as possible.
> > 
> > Yes that would slow down existing applications, people may find it
> > acceptable since we're modifying the TX API here.
> > 
> > > > > > > About the 'simple' case when tx_prep() and tx_burst() are called on the same core,
> > > > > > > Why do you believe that:
> > > > > > > tx_prep(); tx_burst(); would be much slower than tx_burst() {tx_prep(), ...}?
> > > > > >
> > > > > > I mean instead of two function calls with their own loops:
> > > > > >
> > > > > >  tx_prepare() { foreach (pkt) { check(); extra_check(); ... } }
> > > > > >
> > > > > >  tx_burst() { foreach (pkt) { check(); stuff(); ... } }
> > > > > >
> > > > > > You end up with one:
> > > > > >
> > > > > >  tx_burst() { foreach (pkt) { check(); extra_check(); stuff(); ... } }
> > > > > >
> > > > > > Which usually is more efficient.
> > > > >
> > > > > I really doubt that.
> > > > > If it would be that, what is the point to process packet in bulks?
> > > > > Usually dividing processing into different stages and at each stage processing
> > > > > multiple packet at once helps to improve performance.
> > > > > At  least for IA.
> > > > > Look for example how we had to change l3fwd to improve its performance.
> > > >
> > > > Depends quite a bit on usage pattern. It is less efficient for applications
> > > > that do not modify mbuf contents because of the additional function call and
> > > > inner loop.
> > >
> > > If the application doesn't modify mbuf contents then it can simply skip calling tx_prepare().
> > 
> > What if that same application wants to enable some offload as well?
> 
> Hmm, wasn't that your use-case when no offloads (modifications) are required just 3 lines above?

Not exactly, this is a case where an application does not touch mbuf
contents yet wants to perform some HW offload on them. I'll concede this
scenario is likely rare enough for existing offloads.

What is likely not, though, is when an application discovers it can request
a particular offload right before calling tx_burst().

> > > > Note that I'm only pushing for the ability to conveniently address both
> > > > cases with maximum performance.
> > > >
> > > > > > > tx_prep() itself is quite expensive, let's say for Intel HW it includes:
> > > > > > > - read mbuf fields (2 cache-lines),
> > > > > > > - read packet header (1/2 cache-lines)
> > > > > > > - calculate pseudo-header csum
> > > > > > >  - update packet header
> > > > > > > Compared to that, the price of an extra function call seems negligible
> > > > > > > (if we TX packets in bursts of course).
> > > > > >
> > > > > > We agree its performance is a critical issue then, sharing half the read
> > > > > > steps with tx_burst() would make sense to me.
> > > > >
> > > > > I didn't understand that sentence.
> > > >
> > > > I meant this step can be shared (in addition to loop etc):
> > > >
> > > >  - read mbuf fields (2 cache-lines),
> > >
> > > Ah ok, you still believe that mixing tx_burst and tx_prepare code together
> > > would give us noticeable performance benefit.
> > > As I said above, I don't think it would, but you are welcome to try and
> > > prove me wrong.
> > 
> > Depends on what you call noticeable. I guess we can at least agree having
> > two separate functions and loops cause more instructions to be generated and
> > executed.
> > 
> > Now for the number of spent CPU cycles, depends of course whether mbufs are
> > still hot into the cache or not, and as I told you in my opinion we'll
> > usually see applications calling tx_prepare() just before tx_burst() to
> > benefit from offloads.
> 
> Honestly, Adrien, we are going in circles here.
> Just to be clear:
> The current patch introduces tx_prepare() without affecting in any way:
> 	1) existing applications
> 	2) existing PMD code (tx_burst)
> Meanwhile I still believe it is useful and provides a big step forward in terms
> of generalizing the usage of HW TX offloads.

Yes and agreed. I do not oppose this new API on the basis that it's better
than what was available before.

> What you propose requires modifications to both existing applications and existing PMD code
> (the full-featured tx_burst() for at least all Intel PMDs and vmxnet3 has to be significantly modified).
> You believe that with these modifications the new tx_burst() implementation
> would be noticeably faster than just the current: tx_prepare(); tx_burst();
> I personally doubt that it really would be (at least on modern IA).
> But as I said, you are more than welcome to prove me wrong here.
> Let's say, provide a patch for a full-featured ixgbe (or i40e) tx_burst() implementation,
> so that it would combine both the tx_prepare() and tx_burst() functionalities in one function.
> Then we can run some performance tests with the current patches and yours, and compare results.
> Without that, I don't see any point in discussing your proposition any further.
> I just won't agree to such a big change in existing PMDs without some solid justification behind it.

I worry that proving you wrong would make you even stronger :). You raise
good points and I'm also a bit tired of this discussion. Without performance
numbers, my concerns are baseless.

Since non-Intel/vmxnet3 PMDs do not need tx_prepare() (yet), standardizing
on the case where it might be needed, instead of assuming tx_burst() can do
all that work, is still not better from a usability standpoint. It is done
because moving that stuff inside tx_burst() would be expensive even when
offloads are not requested, right?

> > > > > > > > Yes they will, therefore we need a method that satisfies both cases.
> > > > > > > >
> > > > > > > > As a possible solution, a special mbuf flag could be added to each mbuf
> > > > > > > > having gone through tx_prepare(). That way, tx_burst() could skip some
> > > > > > > > checks and things it would otherwise have done.
> > > > > > >
> > > > > > > That's an interesting idea, but it has one drawback:
> > > > > > > As I understand, it means that from now on if the user is doing preparations on his own,
> > > > > > > he has to set up this flag, otherwise tx_burst() would do extra unnecessary work.
> > > > > > > So any existing applications that use TX offloads and do preparation by themselves
> > > > > > > would have to be modified to avoid performance loss.
> > > > > >
> > > > > > In my opinion, users should not do preparation on their own.
> > > > >
> > > > > People already do it now.
> > > >
> > > > But we do not want them to anymore thanks to this new API, for reasons
> > > > described in the motivation section of the cover letter, right?
> > >
> > > We probably wouldn't recommend that, but if people would like to use their own stuff,
> > > or shortcuts - I don't want to stop them here.
> > >
> > > >
> > > > > > If we provide a
> > > > > > generic method, it has to be fast enough to replace theirs. Perhaps not as
> > > > > > fast since it would work with all PMDs (usual trade-off), but acceptably so.
> > > > > >
> > > > > > > > Another possibility, telling the PMD first that you always intend to use
> > > > > > > > tx_prepare() and getting a simpler/faster tx_burst() callback as a result.
> > > > > > >
> > > > > > > That's what we have right now (at least for Intel HW):
> > > > > > > it is the user's responsibility to do the necessary preparations/checks before calling tx_burst().
> > > > > > > With tx_prepare() we just remove from the user the headache of implementing tx_prepare() on his own.
> > > > > > > Now he can use a 'proper' PMD-provided function.
> > > > > > >
> > > > > > > My vote still would be for that model.
> > > > > >
> > > > > > OK, then in a nutshell:
> > > > > >
> > > > > > 1. Users are not expected to perform preparation/checks themselves anymore,
> > > > > >    if they do, it's their problem.
> > > > >
> > > > > I think we need to be backward compatible here.
> > > > > If an existing app is doing what tx_prepare() is supposed to do, it should keep working.
> > > >
> > > > It should keep working; it's just that if they keep doing it themselves
> > > > and call tx_burst() directly, they will likely get lower performance.
> > > >
> > > > > > 2. If configured through an API to be defined, tx_burst() can be split in
> > > > > >    two and applications must call tx_prepare() at some point before
> > > > > >    tx_burst().
> > > > > >
> > > > > > 3. Otherwise tx_burst() should perform the necessary preparation and checks
> > > > > >    on its own by default (when tx_prepare() is not expected).
> > > > >
> > > > > As I said before, I don't think it should be mandatory for tx_burst() to do what tx_prepare() does.
> > > > > If some particular implementation of tx_burst() prefers to do things that way - that's fine.
> > > > > But it shouldn't be required to.
> > > >
> > > > You're right, however applications might find it convenient. I think most
> > > > will end up with something like the following:
> > > >
> > > >  if (tx_prepare(pkts))
> > > >      tx_burst(pkts);
> > >
> > > Looking at existing DPDK apps - most of them do use some sort of TX bufferization.
> > > So, even in a simplistic app it would probably be:
> > >
> > > tx_prepare(pkts);
> > > tx_buffer(pkts);
> > 
> > We're down to my word against yours here, I guess. To leave the choice to
> > application developers, we'd need to provide tx_prepare() and a simpler
> > tx_burst(), as well as the ability to call tx_burst() directly and still get
> > offloads.
> 
> From what I've seen, most DPDK libs/apps do buffer data packets for TX in one or another way:
> mtcp, warp17, seastar.

I was arguing most applications wouldn't call tx_prepare() far from
tx_burst() and that merging them would therefore make sense, which seems to
be the case in these examples. Obviously they're doing bufferization.

- warp17 would call tx_prepare() inside pkt_flush_tx_q() right before
  tx_burst(), it wouldn't make sense elsewhere.

- mtcp has an interesting loop on tx_burst(), however again, tx_prepare()
  would be performed inside dpdk_send_pkts(), not where mbufs are
  populated.

- seastar seems to expose the ability to send bursts as well,
  dpdk_qp::_send() would also call tx_prepare() before tx_burst().

> Not to mention sample apps.
> But ok, as said above, if you can prove that the tx_burst() you are proposing is really much faster than
> tx_prepare(); tx_burst();
> I'll be happy to reconsider.

Nowhere did I write "much" faster, I was just pointing out it would be more
efficient if the work was done in a single function, at least for the use
cases found in the several examples you've provided.

Now if you tell me it cannot be significant, fine, but still.
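
For reference, the combined pattern would look roughly like this (a minimal
sketch; return-value and rte_errno semantics as defined by this series,
port/queue ids and the drop-on-failure policy are placeholder choices):

	#include <stdio.h>
	#include <string.h>

	#include <rte_errno.h>
	#include <rte_ethdev.h>
	#include <rte_mbuf.h>

	/* two-step send: prepare the burst, then transmit the packets
	 * that passed the device checks */
	static uint16_t
	send_prepared_burst(uint8_t port_id, uint16_t queue_id,
			struct rte_mbuf **pkts, uint16_t nb_pkts)
	{
		uint16_t nb_prep = rte_eth_tx_prepare(port_id, queue_id,
				pkts, nb_pkts);

		if (nb_prep < nb_pkts) {
			/* pkts[nb_prep] failed preparation: report and drop
			 * it here for simplicity; a real application could
			 * repair it and retry it and the packets after it */
			printf("tx_prepare: pkt %u rejected: %s\n",
					nb_prep, strerror(rte_errno));
			rte_pktmbuf_free(pkts[nb_prep]);
		}

		return rte_eth_tx_burst(port_id, queue_id, pkts, nb_prep);
	}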

> > > > > > 4. We probably still need some mbuf flag to mark mbufs that cannot be
> > > > > >    modified, the refcount could also serve as a hint.
> > > > >
> > > > > If an mbuf can't be modified, you probably just wouldn't call the function that's supposed to do that,
> > > > > tx_prepare() in that case.
> > > >
> > > > I think it would be easier to document what offload flags may cause the
> > > > tx_burst() function to modify mbuf contents, so applications have the
> > > > ability to set or strip these flags on a mbuf basis. That way there is no
> > > > need to call tx_prepare() without knowing exactly what it's going to do.
> > >
> > > Not sure I understand what exactly do you propose in the last paragraph?
> > 
> > That for each TX offload flag, we document whether preparation might cause a
> > mbuf to be written to during the tx_prepare()/tx_burst() phase. One of the
> > reasons for tx_prepare() being:
> > 
> >  4) Fields in packet may require different initialization (like e.g. will
> >     require pseudo-header checksum precalculation, sometimes in a
> >     different way depending on packet type, and so on). Now application
> >     needs to care about it.
> > 
> > If we determine what offloads may cause mbuf contents to change (all of them
> > perhaps?), then applications can easily strip those flags from outgoing
> > "const" mbufs. Then it becomes acceptable for tx_burst() to modify mbuf
> > contents as per user request, which removes one reason to rely on
> > tx_prepare() for these.
> 
> Hmm, I didn't get you here.
> If let say I don't need TX TCP cksum offload, why would I set this flag inside mbuf at first place?
> If I do expect PMD to do  TX TCP cksum offload for me, then I have to set that flag,
> otherwise how PMD would know that I did require that offload?

Let me take an example: if you assume a cloned mbuf must be sent, the pointed
data is not owned by the TX function, and software preparation for TX
offloads might cause side effects elsewhere. I'm not talking about the
changes performed by HW on the outgoing frame here.

One reason for tx_prepare() according to 4) is to make this clear; by
calling tx_prepare(), applications fully acknowledge and take responsibility
for possible side-effects.

Hence my suggestion, we could make this acceptable for tx_burst() by
documenting that requesting offloads may cause it to modify mbuf contents as
well.
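
Concretely, with such documentation in place an application could strip the
data-modifying offload flags before queueing an mbuf it does not own; a
minimal sketch, given a struct rte_mbuf *m (the exact flag set is only an
example):

	/* shared or cloned mbuf: request no software preparation that
	 * would write into the packet data */
	if (RTE_MBUF_INDIRECT(m) || rte_mbuf_refcnt_read(m) > 1)
		m->ol_flags &= ~(PKT_TX_IP_CKSUM | PKT_TX_L4_MASK |
				PKT_TX_TCP_SEG);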

Before you reply, note that I've neither acked nor nacked that series, and
intend to leave this to other people. So far it's received several acks
already so I guess it's probably ready for inclusion.

I'm not comfortable with it as an application developer because of the extra
API call that I think could be included in tx_burst(), but as a PMD
maintainer I do not mind. Might even prove useful someday if some
corner-case offload too expensive to fully handle in tx_burst() had to be
supported. An optional offload if you will.

-- 
Adrien Mazarguil
6WIND

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v12 0/6] add Tx preparation
  2016-12-07  8:11                           ` Yuanhan Liu
@ 2016-12-07 10:13                             ` Ananyev, Konstantin
  2016-12-07 10:18                               ` Yuanhan Liu
  0 siblings, 1 reply; 261+ messages in thread
From: Ananyev, Konstantin @ 2016-12-07 10:13 UTC (permalink / raw)
  To: Yuanhan Liu, Yigit, Ferruh, Olivier Matz
  Cc: Thomas Monjalon, dev, Jan Medala, Jakub Palider,
	Netanel Belgazal, Evgeny Schemeilin, Alejandro Lucero, Yong Wang,
	Andrew Rybchenko, Hemant Agrawal, Kulasek, TomaszX


Hi Yliu,

> 
> On Tue, Dec 06, 2016 at 03:53:42PM +0000, Ferruh Yigit wrote:
> > > Please, we need a comment for each driver saying
> > > "it is OK, we do not need any checksum preparation for TSO"
> > > or
> > > "yes we have to implement tx_prepare or TSO will not work in this mode"
> > >
> 
> Sorry for the late reply. For virtio, I think it's not a must. The
> checksum stuff has been handled inside the Tx function. However, we could
> move it to tx_prepare, which would actually recover the performance loss
> introduced by enabling TSO for the non-TSO case.
> 

So would you like to provide a patch for it,
or would you like to keep tx_prepare() for virtio as a NOP for now?
Thanks
Konstantin
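
(For reference: a NOP here would presumably just mean leaving the new
callback unset when the virtio ethdev is initialized, so the generic
rte_eth_tx_prepare() wrapper returns nb_pkts untouched; the assignment
below is hypothetical, not from any posted patch.)

	/* hypothetical virtio init: no device-specific preparation */
	eth_dev->tx_pkt_prepare = NULL;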

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v12 0/6] add Tx preparation
  2016-12-07 10:13                             ` Ananyev, Konstantin
@ 2016-12-07 10:18                               ` Yuanhan Liu
  2016-12-07 10:22                                 ` Ananyev, Konstantin
  0 siblings, 1 reply; 261+ messages in thread
From: Yuanhan Liu @ 2016-12-07 10:18 UTC (permalink / raw)
  To: Ananyev, Konstantin
  Cc: Yigit, Ferruh, Olivier Matz, Thomas Monjalon, dev, Jan Medala,
	Jakub Palider, Netanel Belgazal, Evgeny Schemeilin,
	Alejandro Lucero, Yong Wang, Andrew Rybchenko, Hemant Agrawal,
	Kulasek, TomaszX

On Wed, Dec 07, 2016 at 10:13:14AM +0000, Ananyev, Konstantin wrote:
> 
> Hi Yliu,
> 
> > 
> > On Tue, Dec 06, 2016 at 03:53:42PM +0000, Ferruh Yigit wrote:
> > > > Please, we need a comment for each driver saying
> > > > "it is OK, we do not need any checksum preparation for TSO"
> > > > or
> > > > "yes we have to implement tx_prepare or TSO will not work in this mode"
> > > >
> > 
> > Sorry for the late reply. For virtio, I think it's not a must. The
> > checksum stuff has been handled inside the Tx function. However, we could
> > move it to tx_prepare, which would actually recover the performance loss
> > introduced by enabling TSO for the non-TSO case.
> > 
> 
> So would you like to provide a patch for it,
> or would you like to keep tx_prepare() for virtio as a NOP for now?

Hi Konstantin,

I'd keep it as it is for now. It should be a trivial patch after all, so I
can provide it when everything is settled down.

	--yliu

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v12 0/6] add Tx preparation
  2016-12-07 10:18                               ` Yuanhan Liu
@ 2016-12-07 10:22                                 ` Ananyev, Konstantin
  0 siblings, 0 replies; 261+ messages in thread
From: Ananyev, Konstantin @ 2016-12-07 10:22 UTC (permalink / raw)
  To: Yuanhan Liu
  Cc: Yigit, Ferruh, Olivier Matz, Thomas Monjalon, dev, Jan Medala,
	Jakub Palider, Netanel Belgazal, Evgeny Schemeilin,
	Alejandro Lucero, Yong Wang, Andrew Rybchenko, Hemant Agrawal,
	Kulasek, TomaszX



> -----Original Message-----
> From: Yuanhan Liu [mailto:yuanhan.liu@linux.intel.com]
> Sent: Wednesday, December 7, 2016 10:19 AM
> To: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> Cc: Yigit, Ferruh <ferruh.yigit@intel.com>; Olivier Matz <olivier.matz@6wind.com>; Thomas Monjalon <thomas.monjalon@6wind.com>;
> dev@dpdk.org; Jan Medala <jan@semihalf.com>; Jakub Palider <jpa@semihalf.com>; Netanel Belgazal <netanel@amazon.com>; Evgeny
> Schemeilin <evgenys@amazon.com>; Alejandro Lucero <alejandro.lucero@netronome.com>; Yong Wang <yongwang@vmware.com>;
> Andrew Rybchenko <arybchenko@solarflare.com>; Hemant Agrawal <hemant.agrawal@nxp.com>; Kulasek, TomaszX
> <tomaszx.kulasek@intel.com>
> Subject: Re: [dpdk-dev] [PATCH v12 0/6] add Tx preparation
> 
> On Wed, Dec 07, 2016 at 10:13:14AM +0000, Ananyev, Konstantin wrote:
> >
> > Hi Yliu,
> >
> > >
> > > On Tue, Dec 06, 2016 at 03:53:42PM +0000, Ferruh Yigit wrote:
> > > > > Please, we need a comment for each driver saying
> > > > > "it is OK, we do not need any checksum preparation for TSO"
> > > > > or
> > > > > "yes we have to implement tx_prepare or TSO will not work in this mode"
> > > > >
> > >
> > > Sorry for the late reply. For virtio, I think it's not a must. The
> > > checksum stuff has been handled inside the Tx function. However, we
> > > could move it to tx_prepare, which would actually recover the
> > > performance loss introduced by enabling TSO for the non-TSO case.
> > >
> >
> > So would you like to provide a patch for it,
> > or would you like to keep tx_prepare() for virtio as a NOP for now?
> 
> Hi Konstantin,
> 
> I'd keep it as it is for now. It should be a trivial patch after all, so I
> can provide it when everything is settled down.

Ok, thanks for clarification.
Konstantin

> 
> 	--yliu

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v12 6/6] testpmd: use Tx preparation in csum engine
  2016-11-23 17:36                       ` [dpdk-dev] [PATCH v12 6/6] testpmd: use Tx preparation in csum engine Tomasz Kulasek
@ 2016-12-07 11:13                         ` Ferruh Yigit
  2016-12-07 12:00                           ` Mcnamara, John
  2016-12-07 12:00                           ` Kulasek, TomaszX
  0 siblings, 2 replies; 261+ messages in thread
From: Ferruh Yigit @ 2016-12-07 11:13 UTC (permalink / raw)
  To: Tomasz Kulasek, dev; +Cc: konstantin.ananyev, olivier.matz

On 11/23/2016 5:36 PM, Tomasz Kulasek wrote:
> Added "csum txprep (on|off)" command which allows to switch to the
> tx path using Tx preparation API.
> 
> By default unchanged implementation is used.
> 
> Using Tx preparation path, pseudo header calculation for udp/tcp/tso
> packets from application, and used Tx preparation API for
> packet preparation and verification.
> 
> Adding additional step to the csum engine costs about 3-4% of performance
> drop, on my setup with ixgbe driver. It's caused mostly by the need
> of reaccessing and modification of packet data.
> 
> Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> ---
>  app/test-pmd/cmdline.c  |   49 +++++++++++++++++++++++++++++++++++++++++++++++
>  app/test-pmd/csumonly.c |   33 ++++++++++++++++++++++++-------
>  app/test-pmd/testpmd.c  |    5 +++++
>  app/test-pmd/testpmd.h  |    2 ++
>  4 files changed, 82 insertions(+), 7 deletions(-)
> 
> diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
> index 63b55dc..373fc59 100644
<...>
> +cmdline_parse_inst_t cmd_csum_txprep = {
> +	.f = cmd_csum_txprep_parsed,
> +	.data = NULL,
> +	.help_str = "enable/disable tx preparation path for csum engine: "
> +	"csum txprep on|off",

Can you please format help string as:
"cmd fixed_string fixed|string|options <variable>: Description"
see commit 26faac80327f

above becomes:
"csum txprep on|off: Enable/Disable tx preparation path for csum engine"

<...>
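
With the suggested format, the field above would read:

	.help_str = "csum txprep on|off: Enable/Disable tx preparation "
		"path for csum engine",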

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v12 6/6] testpmd: use Tx preparation in csum engine
  2016-12-07 11:13                         ` Ferruh Yigit
@ 2016-12-07 12:00                           ` Mcnamara, John
  2016-12-07 12:12                             ` Kulasek, TomaszX
  2016-12-07 12:00                           ` Kulasek, TomaszX
  1 sibling, 1 reply; 261+ messages in thread
From: Mcnamara, John @ 2016-12-07 12:00 UTC (permalink / raw)
  To: Yigit, Ferruh, Kulasek, TomaszX, dev; +Cc: Ananyev, Konstantin, olivier.matz

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Ferruh Yigit
> Sent: Wednesday, December 7, 2016 11:14 AM
> To: Kulasek, TomaszX <tomaszx.kulasek@intel.com>; dev@dpdk.org
> Cc: Ananyev, Konstantin <konstantin.ananyev@intel.com>;
> olivier.matz@6wind.com
> Subject: Re: [dpdk-dev] [PATCH v12 6/6] testpmd: use Tx preparation in
> csum engine
> 
> ...
> <...>
> > +cmdline_parse_inst_t cmd_csum_txprep = {
> > +	.f = cmd_csum_txprep_parsed,
> > +	.data = NULL,
> > +	.help_str = "enable/disable tx preparation path for csum engine: "
> > +	"csum txprep on|off",
> 
> Can you please format help string as:
> "cmd fixed_string fixed|string|options <variable>: Description"
> see commit 26faac80327f
> 
> above becomes:
> "csum txprep on|off: Enable/Disable tx preparation path for csum engine"
> 


Also, does this require an update to the testpmd docs?

 

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v12 6/6] testpmd: use Tx preparation in csum engine
  2016-12-07 11:13                         ` Ferruh Yigit
  2016-12-07 12:00                           ` Mcnamara, John
@ 2016-12-07 12:00                           ` Kulasek, TomaszX
  1 sibling, 0 replies; 261+ messages in thread
From: Kulasek, TomaszX @ 2016-12-07 12:00 UTC (permalink / raw)
  To: Yigit, Ferruh, dev; +Cc: Ananyev, Konstantin, olivier.matz

Hi,

> -----Original Message-----
> From: Yigit, Ferruh
> Sent: Wednesday, December 7, 2016 12:14
> To: Kulasek, TomaszX <tomaszx.kulasek@intel.com>; dev@dpdk.org
> Cc: Ananyev, Konstantin <konstantin.ananyev@intel.com>;
> olivier.matz@6wind.com
> Subject: Re: [dpdk-dev] [PATCH v12 6/6] testpmd: use Tx preparation in
> csum engine
> 
> On 11/23/2016 5:36 PM, Tomasz Kulasek wrote:
> > Added "csum txprep (on|off)" command which allows to switch to the tx
> > path using Tx preparation API.
> >
> > By default unchanged implementation is used.
> >
> > Using Tx preparation path, pseudo header calculation for udp/tcp/tso
> > packets from application, and used Tx preparation API for packet
> > preparation and verification.
> >
> > Adding additional step to the csum engine costs about 3-4% of
> > performance drop, on my setup with ixgbe driver. It's caused mostly by
> > the need of reaccessing and modification of packet data.
> >
> > Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
> > Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> > ---
> >  app/test-pmd/cmdline.c  |   49
> +++++++++++++++++++++++++++++++++++++++++++++++
> >  app/test-pmd/csumonly.c |   33 ++++++++++++++++++++++++-------
> >  app/test-pmd/testpmd.c  |    5 +++++
> >  app/test-pmd/testpmd.h  |    2 ++
> >  4 files changed, 82 insertions(+), 7 deletions(-)
> >
> > diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c index
> > 63b55dc..373fc59 100644
> <...>
> > +cmdline_parse_inst_t cmd_csum_txprep = {
> > +	.f = cmd_csum_txprep_parsed,
> > +	.data = NULL,
> > +	.help_str = "enable/disable tx preparation path for csum engine: "
> > +	"csum txprep on|off",
> 
> Can you please format help string as:
> "cmd fixed_string fixed|string|options <variable>: Description"
> see commit 26faac80327f
> 
> above becomes:
> "csum txprep on|off: Enable/Disable tx preparation path for csum engine"
> 
> <...>

Sure, thanks.

Tomasz

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v12 6/6] testpmd: use Tx preparation in csum engine
  2016-12-07 12:00                           ` Mcnamara, John
@ 2016-12-07 12:12                             ` Kulasek, TomaszX
  2016-12-07 12:49                               ` Ananyev, Konstantin
  0 siblings, 1 reply; 261+ messages in thread
From: Kulasek, TomaszX @ 2016-12-07 12:12 UTC (permalink / raw)
  To: Mcnamara, John, Yigit, Ferruh, dev; +Cc: Ananyev, Konstantin, olivier.matz

Hi John,

> -----Original Message-----
> From: Mcnamara, John
> Sent: Wednesday, December 7, 2016 13:01
> To: Yigit, Ferruh <ferruh.yigit@intel.com>; Kulasek, TomaszX
> <tomaszx.kulasek@intel.com>; dev@dpdk.org
> Cc: Ananyev, Konstantin <konstantin.ananyev@intel.com>;
> olivier.matz@6wind.com
> Subject: RE: [dpdk-dev] [PATCH v12 6/6] testpmd: use Tx preparation in
> csum engine
> 
> > -----Original Message-----
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Ferruh Yigit
> > Sent: Wednesday, December 7, 2016 11:14 AM
> > To: Kulasek, TomaszX <tomaszx.kulasek@intel.com>; dev@dpdk.org
> > Cc: Ananyev, Konstantin <konstantin.ananyev@intel.com>;
> > olivier.matz@6wind.com
> > Subject: Re: [dpdk-dev] [PATCH v12 6/6] testpmd: use Tx preparation in
> > csum engine
> >
> > ...
> > <...>
> > > +cmdline_parse_inst_t cmd_csum_txprep = {
> > > +	.f = cmd_csum_txprep_parsed,
> > > +	.data = NULL,
> > > +	.help_str = "enable/disable tx preparation path for csum engine: "
> > > +	"csum txprep on|off",
> >
> > Can you please format help string as:
> > "cmd fixed_string fixed|string|options <variable>: Description"
> > see commit 26faac80327f
> >
> > above becomes:
> > "csum txprep on|off: Enable/Disable tx preparation path for csum engine"
> >
> 
> 
> Also, does this require an update to the testpmd docs?
> 
> 

Yes, I think so. I will add an adequate description to the docs.

Tomasz

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v12 6/6] testpmd: use Tx preparation in csum engine
  2016-12-07 12:12                             ` Kulasek, TomaszX
@ 2016-12-07 12:49                               ` Ananyev, Konstantin
  0 siblings, 0 replies; 261+ messages in thread
From: Ananyev, Konstantin @ 2016-12-07 12:49 UTC (permalink / raw)
  To: Kulasek, TomaszX, Mcnamara, John, Yigit, Ferruh, dev; +Cc: olivier.matz

Hi everyone,

> -----Original Message-----
> From: Kulasek, TomaszX
> Sent: Wednesday, December 7, 2016 12:12 PM
> To: Mcnamara, John <john.mcnamara@intel.com>; Yigit, Ferruh <ferruh.yigit@intel.com>; dev@dpdk.org
> Cc: Ananyev, Konstantin <konstantin.ananyev@intel.com>; olivier.matz@6wind.com
> Subject: RE: [dpdk-dev] [PATCH v12 6/6] testpmd: use Tx preparation in csum engine
> 
> Hi John,
> 
> > -----Original Message-----
> > From: Mcnamara, John
> > Sent: Wednesday, December 7, 2016 13:01
> > To: Yigit, Ferruh <ferruh.yigit@intel.com>; Kulasek, TomaszX
> > <tomaszx.kulasek@intel.com>; dev@dpdk.org
> > Cc: Ananyev, Konstantin <konstantin.ananyev@intel.com>;
> > olivier.matz@6wind.com
> > Subject: RE: [dpdk-dev] [PATCH v12 6/6] testpmd: use Tx preparation in
> > csum engine
> >
> > > -----Original Message-----
> > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Ferruh Yigit
> > > Sent: Wednesday, December 7, 2016 11:14 AM
> > > To: Kulasek, TomaszX <tomaszx.kulasek@intel.com>; dev@dpdk.org
> > > Cc: Ananyev, Konstantin <konstantin.ananyev@intel.com>;
> > > olivier.matz@6wind.com
> > > Subject: Re: [dpdk-dev] [PATCH v12 6/6] testpmd: use Tx preparation in
> > > csum engine
> > >
> > > ...
> > > <...>
> > > > +cmdline_parse_inst_t cmd_csum_txprep = {
> > > > +	.f = cmd_csum_txprep_parsed,
> > > > +	.data = NULL,
> > > > +	.help_str = "enable/disable tx preparation path for csum engine: "
> > > > +	"csum txprep on|off",
> > >
> > > Can you please format help string as:
> > > "cmd fixed_string fixed|string|options <variable>: Description"
> > > see commit 26faac80327f
> > >
> > > above becomes:
> > > "csum txprep on|off: Enable/Disable tx preparation path for csum engine"
> > >
> >
> >
> > Also, does this require an update to the testpmd docs?
> >
> >
> 
> Yes, I think. I will add adequate description to the docs.

I suppose once we get tx_prepare() addressed in all the PMDs DPDK supports,
we can safely remove that command.
Konstantin

> 
> Tomasz

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v12 0/6] add Tx preparation
  2016-12-07 10:03                                       ` Ananyev, Konstantin
@ 2016-12-07 14:31                                         ` Alejandro Lucero
  2016-12-08 18:20                                         ` Yong Wang
  1 sibling, 0 replies; 261+ messages in thread
From: Alejandro Lucero @ 2016-12-07 14:31 UTC (permalink / raw)
  To: Ananyev, Konstantin
  Cc: Yigit, Ferruh, Yong Wang, Thomas Monjalon, Harish Patil, dev,
	Rahul Lakkireddy, Stephen Hurd, Jan Medala, Jakub Palider,
	John Daley, Adrien Mazarguil, Rasesh Mody, Jacob, Jerin,
	Yuanhan Liu, Kulasek, TomaszX, olivier.matz

For NFP, we do not have TSO support yet, although it is coming and
hopefully it will land within the next release.

Regarding this email thread, the answer is: "it is OK, we do not need any
checksum preparation for TSO"

On Wed, Dec 7, 2016 at 10:03 AM, Ananyev, Konstantin <
konstantin.ananyev@intel.com> wrote:

>
> Hi Ferruh,
>
> >
> > On 12/6/2016 6:25 PM, Yong Wang wrote:
> > >> -----Original Message-----
> > >> From: Ananyev, Konstantin [mailto:konstantin.ananyev@intel.com]
> > >> Sent: Sunday, December 4, 2016 4:11 AM
> > >> To: Yong Wang <yongwang@vmware.com>; Thomas Monjalon
> > >> <thomas.monjalon@6wind.com>
> > >> Cc: Harish Patil <harish.patil@qlogic.com>; dev@dpdk.org; Rahul
> Lakkireddy
> > >> <rahul.lakkireddy@chelsio.com>; Stephen Hurd
> > >> <stephen.hurd@broadcom.com>; Jan Medala <jan@semihalf.com>; Jakub
> > >> Palider <jpa@semihalf.com>; John Daley <johndale@cisco.com>; Adrien
> > >> Mazarguil <adrien.mazarguil@6wind.com>; Alejandro Lucero
> > >> <alejandro.lucero@netronome.com>; Rasesh Mody
> > >> <rasesh.mody@qlogic.com>; Jacob, Jerin <Jerin.Jacob@cavium.com>;
> > >> Yuanhan Liu <yuanhan.liu@linux.intel.com>; Kulasek, TomaszX
> > >> <tomaszx.kulasek@intel.com>; olivier.matz@6wind.com
> > >> Subject: RE: [dpdk-dev] [PATCH v12 0/6] add Tx preparation
> > >>
> > >> Hi
> > >>
> > >>
> > >>
> > >>>>
> > >>
> > >>>> 2016-11-30 17:42, Ananyev, Konstantin:
> > >>
> > >>>>>>> Please, we need a comment for each driver saying
> > >>
> > >>>>>>> "it is OK, we do not need any checksum preparation for TSO"
> > >>
> > >>>>>>> or
> > >>
> > >>>>>>> "yes we have to implement tx_prepare or TSO will not work in this
> > >>
> > >>>> mode"
> > >>
> > >>>>>>>
> > >>
> > >>>>>>
> > >>
> > >>>>>> qede PMD doesn’t currently support TSO yet, it only supports Tx
> > >>
> > >>>> TCP/UDP/IP
> > >>
> > >>>>>> csum offloads.
> > >>
> > >>>>>> So Tx preparation isn’t applicable. So as of now -
> > >>
> > >>>>>> "it is OK, we do not need any checksum preparation for TSO"
> > >>
> > >>>>>
> > >>
> > >>>>> Thanks for the answer.
> > >>
> > >>>>> Though please note that it not only for TSO.
> > >>
> > >>>>
> > >>
> > >>>> Oh yes, sorry, my wording was incorrect.
> > >>
> > >>>> We need to know if any checksum preparation is needed prior
> > >>
> > >>>> offloading its final computation to the hardware or driver.
> > >>
> > >>>> So the question applies to TSO and simple checksum offload.
> > >>
> > >>>>
> > >>
> > >>>> We are still waiting answers for
> > >>
> > >>>>  bnxt, cxgbe, ena, nfp, thunderx, virtio and vmxnet3.
> > >>
> > >>>
> > >>
> > >>> The case for a virtual device is a little bit more complicated as
> packets
> > >> offloaded from a virtual device can eventually be delivered to
> > >>
> > >>> another virtual NIC or different physical NICs that have different
> offload
> > >> requirements.  In ESX, the hypervisor will enforce that the packets
> > >>
> > >>> offloaded will be something that the hardware expects.  The contract
> for
> > >> vmxnet3 is that the guest needs to fill in pseudo header checksum
> > >>
> > >>> for both l4 checksum only and TSO + l4 checksum offload cases.
> > >>
> > >>
> > >>
> > >> Ok, so at first glance that looks to me very similar to Intel HW
> requirements.
> > >>
> > >> Could you confirm would rte_net_intel_cksum_prepare()
> > >>
> > >> also work for vmxnet3 or some extra modifications are required?
> > >>
> > >> You can look at it here: http://dpdk.org/dev/patchwork/patch/17184/ .
> > >>
> > >> Note that for Intel HW the rules for pseudo-header csum calculation
> > >>
> > >> differ for TSO and non-TSO case.
> > >>
> > >> For TSO length inside pseudo-header are set to 0, while for non-tso
> case
> > >>
> > >> It should be set to L3 payload length.
> > >>
> > >> Is it the same for vmxnet3 or no?
> > >>
> > >> Thanks
> > >>
> > >> Konstantin
> > >>
> > >
> > > Yes and this is the same for vmxnet3.
> > >
> >
> > This means vmxnet3 PMD also should be updated, right?
>
> Yes, that's right.
>
> >Should that update
> > be part of tx_prep patchset? Or separate patch?
>
> Another question I suppose is who will do the actual patch for vmxnet3.
> Yong, are you ok to do the patch for vmxnet3, or prefer us to do that?
> Please note that in both cases we will need your help in testing/reviewing
> it.
> Konstantin
>
> >
> > >>>
> > >>
> > >>>>> This is for any TX offload for which the upper layer SW would have
> > >>
> > >>>>> to modify the contents of the packet.
> > >>
> > >>>>> Though as I can see for qede neither PKT_TX_IP_CKSUM or
> > >>
> > >>>> PKT_TX_TCP_CKSUM
> > >>
> > >>>>> exhibits any extra requirements for the user.
> > >>
> > >>>>> Is that correct?
> > >>
> > >>
> > >
>
>
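
For reference, the pseudo-header rule described above maps onto an existing
librte_net helper; a minimal IPv4/TCP sketch, assuming l2_len/l3_len were
already set by the application:

	#include <rte_ip.h>
	#include <rte_mbuf.h>
	#include <rte_tcp.h>

	/* fill the TCP checksum field with the pseudo-header checksum */
	static void
	tcp_phdr_cksum_set(struct rte_mbuf *m)
	{
		struct ipv4_hdr *ip = rte_pktmbuf_mtod_offset(m,
				struct ipv4_hdr *, m->l2_len);
		struct tcp_hdr *tcp = rte_pktmbuf_mtod_offset(m,
				struct tcp_hdr *, m->l2_len + m->l3_len);

		/* rte_ipv4_phdr_cksum() encodes both cases: with
		 * PKT_TX_TCP_SEG set it uses a zero L4 length, otherwise
		 * it includes the L3 payload length */
		tcp->cksum = rte_ipv4_phdr_cksum(ip, m->ol_flags);
	}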

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v12 1/6] ethdev: add Tx preparation
  2016-12-02 16:17                               ` Ananyev, Konstantin
@ 2016-12-08 17:24                                 ` Olivier Matz
  2016-12-09 17:19                                   ` Kulasek, TomaszX
  0 siblings, 1 reply; 261+ messages in thread
From: Olivier Matz @ 2016-12-08 17:24 UTC (permalink / raw)
  To: Ananyev, Konstantin; +Cc: Thomas Monjalon, Kulasek, TomaszX, dev

Hi Konstantin,

On Fri, 2 Dec 2016 16:17:51 +0000, "Ananyev, Konstantin"
<konstantin.ananyev@intel.com> wrote:
> Hi Olivier,
> 
> > -----Original Message-----
> > From: Olivier Matz [mailto:olivier.matz@6wind.com]
> > Sent: Friday, December 2, 2016 8:24 AM
> > To: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> > Cc: Thomas Monjalon <thomas.monjalon@6wind.com>; Kulasek, TomaszX
> > <tomaszx.kulasek@intel.com>; dev@dpdk.org Subject: Re: [dpdk-dev]
> > [PATCH v12 1/6] ethdev: add Tx preparation
> > 
> > Hi Konstantin,
> > 
> > On Fri, 2 Dec 2016 01:06:30 +0000, "Ananyev, Konstantin"
> > <konstantin.ananyev@intel.com> wrote:  
> > > >
> > > > 2016-11-23 18:36, Tomasz Kulasek:  
> > > > > +/**
> > > > > + * Process a burst of output packets on a transmit queue of
> > > > > an Ethernet device.
> > > > > + *
> > > > > + * The rte_eth_tx_prepare() function is invoked to prepare
> > > > > output packets to be
> > > > > + * transmitted on the output queue *queue_id* of the Ethernet
> > > > > device designated
> > > > > + * by its *port_id*.
> > > > > + * The *nb_pkts* parameter is the number of packets to be
> > > > > prepared which are
> > > > > + * supplied in the *tx_pkts* array of *rte_mbuf* structures,
> > > > > each of them
> > > > > + * allocated from a pool created with
> > > > > rte_pktmbuf_pool_create().
> > > > > + * For each packet to send, the rte_eth_tx_prepare() function
> > > > > performs
> > > > > + * the following operations:
> > > > > + *
> > > > > + * - Check if packet meets devices requirements for tx
> > > > > offloads.
> > > > > + *
> > > > > + * - Check limitations about number of segments.
> > > > > + *
> > > > > + * - Check additional requirements when debug is enabled.
> > > > > + *
> > > > > + * - Update and/or reset required checksums when tx offload
> > > > > is set for packet.
> > > > > + *
> > > > > + * Since this function can modify packet data, provided mbufs
> > > > > must be safely
> > > > > + * writable (e.g. modified data cannot be in shared
> > > > > segment).  
> > > >
> > > > I think we will have to remove this limitation in next releases.
> > > > As we don't know how it could affect the API, I suggest to
> > > > declare this API EXPERIMENTAL.  
> > >
> > > While I don't really mind to mark it as experimental, I don't
> > > really understand the reasoning: why does "this function can modify
> > > packet data, provided mbufs must be safely writable" suddenly
> > > become a problem? That seems like an obvious limitation to me,
> > > and let's say tx_burst() has the same one. Second, I don't see how
> > > you are going to remove it without introducing a heavy
> > > performance impact. Konstantin
> > 
> > About tx_burst(), I don't think we should force the user to provide
> > a writable mbuf. There are many use cases where passing a clone
> > already works as of today and it avoids duplicating the mbuf data.
> > For instance: traffic generator, multicast, bridging/tap, etc...
> > 
> > Moreover, this requirement would be inconsistent with the model you
> > are proposing in case of pipeline:
> >  - tx_prepare() on core X, may update the data
> >  - tx_burst() on core Y, should not touch the data to avoid cache
> > misses 
> 
> Probably I wasn't very clear in my previous mail.
> I am not saying that we should force the user to pass a writable mbuf.
> What I am saying is that for tx_burst() the current expectation is that
> after an mbuf is handed to tx_burst() the user shouldn't try to modify its
> buffer contents till the TX engine is done with the buffer (mbuf_free()
> is called by the TX func for it). For tx_prep(), I think, it is the same
> though the restrictions are a bit stricter: the user should not try to
> read/write the mbuf while tx_prep() is not finished with it. What
> puzzles me is why that should be the reason to mark tx_prep() as
> experimental. Konstantin

To be sure we're on the same page, let me reword:

- mbufs passed to tx_prepare() by the application must have their
  headers (l2_len + l3_len + l4_len) writable because the phdr checksum
  can be replaced. It could be precised in the api comment.

- mbufs passed to tx_burst() must not be modified by the driver/hw, nor
  by the application.


About the API itself, I have one more question. I know you've
already discussed this a bit with Adrien, I don't want to spawn a new
big thread from here ;)

The API provides tx_prepare() to check the packets have the proper
format, and possibly modify them (ex: csum offload / tso) to match hw
requirements. So it does both checks (number of segments) and fixes
(csum/tso). What determines things that should be checked and things
that should be fixed?

The application gets little information from tx_prepare() about what should
be done to make the packet accepted by the hw, and the actions will
probably be different depending on hardware. So could we imagine that
in the future the function also tries to fix the packet? I've seen your
comment saying that it has to be an application decision, so what about
having a parameter saying "fix the packet" or "don't fix it"?

About rte_eth_desc_lim->nb_seg_max and
rte_eth_desc_lim->nb_mtu_seg_max, I'm still quite reserved, especially
for the 2nd one, because I don't see how it can be used by the
application. Well, it does not hurt to have them, but for me it looks a
bit useless.

Last thing, I think this API should become the default in the future.
For instance, it would prevent the application from calculating a phdr csum
that will not be used by the hw. Not calling tx_prepare() would require
the user/application to know exactly the underlying hw and the kind of
packets that are generated. So for me it means we'll need to also update
other examples (other testpmd engines, l2fwd, ...). Do you agree?


Regards,
Olivier

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v12 0/6] add Tx preparation
  2016-12-07 10:03                                       ` Ananyev, Konstantin
  2016-12-07 14:31                                         ` Alejandro Lucero
@ 2016-12-08 18:20                                         ` Yong Wang
  2016-12-09 14:40                                           ` Jan Mędala
  2016-12-12 17:29                                           ` Ananyev, Konstantin
  1 sibling, 2 replies; 261+ messages in thread
From: Yong Wang @ 2016-12-08 18:20 UTC (permalink / raw)
  To: Ananyev, Konstantin, Yigit, Ferruh, Thomas Monjalon
  Cc: Harish Patil, dev, Rahul Lakkireddy, Stephen Hurd, Jan Medala,
	Jakub Palider, John Daley, Adrien Mazarguil, Alejandro Lucero,
	Rasesh Mody, Jacob, Jerin, Yuanhan Liu, Kulasek, TomaszX,
	olivier.matz

> -----Original Message-----
> From: Ananyev, Konstantin [mailto:konstantin.ananyev@intel.com]
> Sent: Wednesday, December 7, 2016 2:03 AM
> To: Yigit, Ferruh <ferruh.yigit@intel.com>; Yong Wang
> <yongwang@vmware.com>; Thomas Monjalon
> <thomas.monjalon@6wind.com>
> Cc: Harish Patil <harish.patil@qlogic.com>; dev@dpdk.org; Rahul Lakkireddy
> <rahul.lakkireddy@chelsio.com>; Stephen Hurd
> <stephen.hurd@broadcom.com>; Jan Medala <jan@semihalf.com>; Jakub
> Palider <jpa@semihalf.com>; John Daley <johndale@cisco.com>; Adrien
> Mazarguil <adrien.mazarguil@6wind.com>; Alejandro Lucero
> <alejandro.lucero@netronome.com>; Rasesh Mody
> <rasesh.mody@qlogic.com>; Jacob, Jerin <Jerin.Jacob@cavium.com>;
> Yuanhan Liu <yuanhan.liu@linux.intel.com>; Kulasek, TomaszX
> <tomaszx.kulasek@intel.com>; olivier.matz@6wind.com
> Subject: RE: [dpdk-dev] [PATCH v12 0/6] add Tx preparation
> 
> 
> Hi Ferruh,
> 
> >
> > On 12/6/2016 6:25 PM, Yong Wang wrote:
> > >> -----Original Message-----
> > >> From: Ananyev, Konstantin [mailto:konstantin.ananyev@intel.com]
> > >> Sent: Sunday, December 4, 2016 4:11 AM
> > >> To: Yong Wang <yongwang@vmware.com>; Thomas Monjalon
> > >> <thomas.monjalon@6wind.com>
> > >> Cc: Harish Patil <harish.patil@qlogic.com>; dev@dpdk.org; Rahul
> Lakkireddy
> > >> <rahul.lakkireddy@chelsio.com>; Stephen Hurd
> > >> <stephen.hurd@broadcom.com>; Jan Medala <jan@semihalf.com>;
> Jakub
> > >> Palider <jpa@semihalf.com>; John Daley <johndale@cisco.com>;
> Adrien
> > >> Mazarguil <adrien.mazarguil@6wind.com>; Alejandro Lucero
> > >> <alejandro.lucero@netronome.com>; Rasesh Mody
> > >> <rasesh.mody@qlogic.com>; Jacob, Jerin <Jerin.Jacob@cavium.com>;
> > >> Yuanhan Liu <yuanhan.liu@linux.intel.com>; Kulasek, TomaszX
> > >> <tomaszx.kulasek@intel.com>; olivier.matz@6wind.com
> > >> Subject: RE: [dpdk-dev] [PATCH v12 0/6] add Tx preparation
> > >>
> > >> Hi
> > >>
> > >>
> > >>
> > >>>>
> > >>
> > >>>> 2016-11-30 17:42, Ananyev, Konstantin:
> > >>
> > >>>>>>> Please, we need a comment for each driver saying
> > >>
> > >>>>>>> "it is OK, we do not need any checksum preparation for TSO"
> > >>
> > >>>>>>> or
> > >>
> > >>>>>>> "yes we have to implement tx_prepare or TSO will not work in
> this
> > >>
> > >>>> mode"
> > >>
> > >>>>>>>
> > >>
> > >>>>>>
> > >>
> > >>>>>> qede PMD doesn’t currently support TSO yet, it only supports Tx
> > >>
> > >>>> TCP/UDP/IP
> > >>
> > >>>>>> csum offloads.
> > >>
> > >>>>>> So Tx preparation isn’t applicable. So as of now -
> > >>
> > >>>>>> "it is OK, we do not need any checksum preparation for TSO"
> > >>
> > >>>>>
> > >>
> > >>>>> Thanks for the answer.
> > >>
> > >>>>> Though please note that it not only for TSO.
> > >>
> > >>>>
> > >>
> > >>>> Oh yes, sorry, my wording was incorrect.
> > >>
> > >>>> We need to know if any checksum preparation is needed prior
> > >>
> > >>>> offloading its final computation to the hardware or driver.
> > >>
> > >>>> So the question applies to TSO and simple checksum offload.
> > >>
> > >>>>
> > >>
> > >>>> We are still waiting answers for
> > >>
> > >>>> 	bnxt, cxgbe, ena, nfp, thunderx, virtio and vmxnet3.
> > >>
> > >>>
> > >>
> > >>> The case for a virtual device is a little bit more complicated as packets
> > >> offloaded from a virtual device can eventually be delivered to
> > >>
> > >>> another virtual NIC or different physical NICs that have different
> offload
> > >> requirements.  In ESX, the hypervisor will enforce that the packets
> > >>
> > >>> offloaded will be something that the hardware expects.  The contract
> for
> > >> vmxnet3 is that the guest needs to fill in pseudo header checksum
> > >>
> > >>> for both l4 checksum only and TSO + l4 checksum offload cases.
> > >>
> > >>
> > >>
> > >> Ok, so at first glance that looks to me very similar to Intel HW
> requirements.
> > >>
> > >> Could you confirm would rte_net_intel_cksum_prepare()
> > >>
> > >> also work for vmxnet3 or some extra modifications are required?
> > >>
> > >> You can look at it here:
> > >> http://dpdk.org/dev/patchwork/patch/17184/ .
> > >>
> > >> Note that for Intel HW the rules for pseudo-header csum calculation
> > >>
> > >> differ for TSO and non-TSO case.
> > >>
> > >> For TSO length inside pseudo-header are set to 0, while for non-tso case
> > >>
> > >> It should be set to L3 payload length.
> > >>
> > >> Is it the same for vmxnet3 or no?
> > >>
> > >> Thanks
> > >>
> > >> Konstantin
> > >>
> > >
> > > Yes and this is the same for vmxnet3.
> > >
> >
> > This means vmxnet3 PMD also should be updated, right?
> 
> Yes, that's right.
> 
> >Should that update
> > be part of tx_prep patchset? Or separate patch?
> 
> Another question I suppose is who will do the actual patch for vmxnet3.
> Yong, are you ok to do the patch for vmxnet3, or prefer us to do that?
> Please note that in both cases we will need your help in testing/reviewing it.
> Konstantin

It will be great if you can put together a patch as part of the entire patchset on tx_prep() for vmxnet3 and I will definitely help review it.

Regarding testing, I can definitely help but I don't have a testing harness to cover the entire matrix (different ESX version, different vmxnet3 device version, VM-VM, VM-physical over different uplinks, etc.) so it will be limited.  Related to this, I have the impression that Intel has some existing coverage for vmxnet3 as well as other NICs.  Do we know if that will cover this use case as well?

> >
> > >>>
> > >>
> > >>>>> This is for any TX offload for which the upper layer SW would have
> > >>
> > >>>>> to modify the contents of the packet.
> > >>
> > >>>>> Though as I can see for qede neither PKT_TX_IP_CKSUM or
> > >>
> > >>>> PKT_TX_TCP_CKSUM
> > >>
> > >>>>> exhibits any extra requirements for the user.
> > >>
> > >>>>> Is that correct?
> > >>
> > >>
> > >


^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v12 1/6] ethdev: add Tx preparation
  2016-12-01 23:50                                   ` Thomas Monjalon
@ 2016-12-09 13:25                                     ` Kulasek, TomaszX
  0 siblings, 0 replies; 261+ messages in thread
From: Kulasek, TomaszX @ 2016-12-09 13:25 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev, Ananyev, Konstantin, olivier.matz, Richardson, Bruce

Hi Thomas,

> -----Original Message-----
> From: Thomas Monjalon [mailto:thomas.monjalon@6wind.com]
> Sent: Friday, December 2, 2016 00:51
> To: Kulasek, TomaszX <tomaszx.kulasek@intel.com>
> Cc: dev@dpdk.org; Ananyev, Konstantin <konstantin.ananyev@intel.com>;
> olivier.matz@6wind.com; Richardson, Bruce <bruce.richardson@intel.com>
> Subject: Re: [dpdk-dev] [PATCH v12 1/6] ethdev: add Tx preparation
> 
> 2016-12-01 22:31, Kulasek, TomaszX:
> > From: Thomas Monjalon [mailto:thomas.monjalon@6wind.com]
> > > 2016-12-01 19:20, Kulasek, TomaszX:
> > > > Hi Thomas,
> > > >
> > > > Sorry, I have answered this question in another thread and I missed
> > > this one. The detailed answer is below.
> > >
> > > Yes you already gave this answer.
> > > And I will continue asking the question until you understand it.
> > >
> > > > > 2016-11-28 11:54, Thomas Monjalon:
> > > > > > Hi,
> > > > > >
> > > > > > 2016-11-23 18:36, Tomasz Kulasek:
> > > > > > > --- a/config/common_base
> > > > > > > +++ b/config/common_base
> > > > > > > @@ -120,6 +120,7 @@ CONFIG_RTE_MAX_QUEUES_PER_PORT=1024
> > > > > > >  CONFIG_RTE_LIBRTE_IEEE1588=n
> > > > > > >  CONFIG_RTE_ETHDEV_QUEUE_STAT_CNTRS=16
> > > > > > >  CONFIG_RTE_ETHDEV_RXTX_CALLBACKS=y
> > > > > > > +CONFIG_RTE_ETHDEV_TX_PREPARE=y
> > > > > >
> > > > > > Please, remind me why is there a configuration here.
> > > > > > It should be the responsibility of the application to call
> > > > > > tx_prepare or not. If the application choose to use this new
> > > > > > API but it is disabled, then the packets won't be prepared and
> > > > > > there is
> > > no error code:
> > > > > >
> > > > > > > +#else
> > > > > > > +
> > > > > > > +static inline uint16_t
> > > > > > > +rte_eth_tx_prepare(__rte_unused uint8_t port_id,
> > > > > > > +__rte_unused
> > > > > uint16_t queue_id,
> > > > > > > +               __rte_unused struct rte_mbuf **tx_pkts,
> > > > > > > +uint16_t
> > > > > > > +nb_pkts) {
> > > > > > > +       return nb_pkts;
> > > > > > > +}
> > > > > > > +
> > > > > > > +#endif
> > > > > >
> > > > > > So the application is not aware of the issue and it will not
> > > > > > use any fallback.
> > > >
> > > > tx_prepare mechanism can be turned off by compilation flag (as
> > > > discussed
> > > with Jerin in http://dpdk.org/dev/patchwork/patch/15770/) to provide
> > > real NOOP functionality (e.g. for low-end CPUs, where even
> > > unnecessary memory dereference and check can have significant impact
> on performance).
> > > >
> > > > Jerin observed that on some architectures (e.g. low-end ARM with
> > > embedded NIC), just reading and comparing 'dev->tx_pkt_prepare' may
> > > cause significant performance drop, so he proposed to introduce this
> > > configuration flag to provide real NOOP when tx_prepare
> > > functionality is not required, and can be turned on based on the
> _target_ configuration.
> > > >
> > > > For other cases, when this flag is turned on (by default), and
> > > tx_prepare is not implemented, functional NOOP is used based on
> > > comparison (dev->tx_pkt_prepare == NULL).
> > >
> > > So if the application call this function and if it is disabled, it
> > > simply won't work. Packets won't be prepared, checksum won't be
> computed.
> > >
> > > I give up, I just NACK.
> >
> > It is not to be turned on/off however someone wants, but only
> for the case when the platform developer knows that his platform doesn't
> need this callback, so he may turn it off and save some performance
> (this option is per target).
> 
> How may he know? There is no comment in the config file, no documentation.
> 
> > For this case, the behavior of tx_prepare will be exactly the same when
> it is turned on or off. If it is not the same, there's no sense in turning it
> off. There was a long topic where we've tried to convince you that it
> should be turned on for all devices.
> 
> Really? You tried to convince me to turn it on?
> No, you were trying to convince Jerin.
> I think it is a wrong idea to allow disabling this function.
> I didn't comment in the first discussion because Jerin said it was really
> important for small hardware with a fixed NIC, and I thought it would be
> implemented in a way the application cannot be misled.
> 
> The only solution I see here is to add some comments in the configuration
> file, below the #else and in the doc.
> Have you checked doc/guides/prog_guide/poll_mode_drv.rst?

I can change the name of CONFIG_RTE_ETHDEV_TX_PREPARE=y to something like CONFIG_RTE_ETHDEV_TX_PREPARE_NOOP=n to make it less confusing, and add comments to describe why it is introduced and how to use it safely.
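
For reference, with the option enabled the wrapper would look roughly like
the following (paraphrased from the series; the disabled NOOP stub is the
one quoted earlier in this thread):

	static inline uint16_t
	rte_eth_tx_prepare(uint8_t port_id, uint16_t queue_id,
			struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
	{
		struct rte_eth_dev *dev = &rte_eth_devices[port_id];

		/* functional NOOP: the PMD implements no callback */
		if (dev->tx_pkt_prepare == NULL)
			return nb_pkts;

		return dev->tx_pkt_prepare(dev->data->tx_queues[queue_id],
				tx_pkts, nb_pkts);
	}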

I can also remove it altogether if you don't like it.

As for doc/guides/prog_guide/poll_mode_drv.rst, do you mean to add a new section describing this feature?

Tomasz

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v12 0/6] add Tx preparation
  2016-12-08 18:20                                         ` Yong Wang
@ 2016-12-09 14:40                                           ` Jan Mędala
  2016-12-12 15:02                                             ` Ananyev, Konstantin
  2016-12-12 17:29                                           ` Ananyev, Konstantin
  1 sibling, 1 reply; 261+ messages in thread
From: Jan Mędala @ 2016-12-09 14:40 UTC (permalink / raw)
  To: Yigit, Ferruh
  Cc: Ananyev, Konstantin, Thomas Monjalon, Harish Patil, dev,
	Rahul Lakkireddy, Stephen Hurd, Jakub Palider, John Daley,
	Adrien Mazarguil, Alejandro Lucero, Rasesh Mody, Jacob, Jerin,
	Yuanhan Liu, Kulasek, TomaszX, olivier.matz, Yong Wang

Hello,

Sorry for the late response.

From the ENA perspective, we need to dig deeper into the requirements and use
cases, but I'm pretty confident (95%) that ena will need to implement the
tx_prep() API. There is at least one scenario where the HW relies on a partial
checksum.


Jan

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v12 1/6] ethdev: add Tx preparation
  2016-12-08 17:24                                 ` Olivier Matz
@ 2016-12-09 17:19                                   ` Kulasek, TomaszX
  2016-12-12 11:51                                     ` Ananyev, Konstantin
  0 siblings, 1 reply; 261+ messages in thread
From: Kulasek, TomaszX @ 2016-12-09 17:19 UTC (permalink / raw)
  To: Olivier Matz, Ananyev, Konstantin; +Cc: Thomas Monjalon, dev

Hi Olivier,

My 5 cents below:

> -----Original Message-----
> From: Olivier Matz [mailto:olivier.matz@6wind.com]
> Sent: Thursday, December 8, 2016 18:24
> To: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> Cc: Thomas Monjalon <thomas.monjalon@6wind.com>; Kulasek, TomaszX
> <tomaszx.kulasek@intel.com>; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v12 1/6] ethdev: add Tx preparation
> 
> Hi Konstantin,
> 
> On Fri, 2 Dec 2016 16:17:51 +0000, "Ananyev, Konstantin"
> <konstantin.ananyev@intel.com> wrote:
> > Hi Olivier,
> >
> > > -----Original Message-----
> > > From: Olivier Matz [mailto:olivier.matz@6wind.com]
> > > Sent: Friday, December 2, 2016 8:24 AM
> > > To: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> > > Cc: Thomas Monjalon <thomas.monjalon@6wind.com>; Kulasek, TomaszX
> > > <tomaszx.kulasek@intel.com>; dev@dpdk.org Subject: Re: [dpdk-dev]
> > > [PATCH v12 1/6] ethdev: add Tx preparation
> > >
> > > Hi Konstantin,
> > >
> > > On Fri, 2 Dec 2016 01:06:30 +0000, "Ananyev, Konstantin"
> > > <konstantin.ananyev@intel.com> wrote:
> > > > >
> > > > > 2016-11-23 18:36, Tomasz Kulasek:
> > > > > > +/**
> > > > > > + * Process a burst of output packets on a transmit queue of
> > > > > > an Ethernet device.
> > > > > > + *
> > > > > > + * The rte_eth_tx_prepare() function is invoked to prepare
> > > > > > output packets to be
> > > > > > + * transmitted on the output queue *queue_id* of the Ethernet
> > > > > > device designated
> > > > > > + * by its *port_id*.
> > > > > > + * The *nb_pkts* parameter is the number of packets to be
> > > > > > prepared which are
> > > > > > + * supplied in the *tx_pkts* array of *rte_mbuf* structures,
> > > > > > each of them
> > > > > > + * allocated from a pool created with
> > > > > > rte_pktmbuf_pool_create().
> > > > > > + * For each packet to send, the rte_eth_tx_prepare() function
> > > > > > performs
> > > > > > + * the following operations:
> > > > > > + *
> > > > > > + * - Check if packet meets devices requirements for tx
> > > > > > offloads.
> > > > > > + *
> > > > > > + * - Check limitations about number of segments.
> > > > > > + *
> > > > > > + * - Check additional requirements when debug is enabled.
> > > > > > + *
> > > > > > + * - Update and/or reset required checksums when tx offload
> > > > > > is set for packet.
> > > > > > + *
> > > > > > + * Since this function can modify packet data, provided mbufs
> > > > > > must be safely
> > > > > > + * writable (e.g. modified data cannot be in shared
> > > > > > segment).
> > > > >
> > > > > I think we will have to remove this limitation in next releases.
> > > > > As we don't know how it could affect the API, I suggest to
> > > > > declare this API EXPERIMENTAL.
> > > >
> > > > While I don't really mind to mark it as experimental, I don't
> > > > really understand the reasoning: why does "this function can modify
> > > > packet data, provided mbufs must be safely writable" suddenly
> > > > become a problem? That seems like an obvious limitation to me,
> > > > and let's say tx_burst() has the same one. Second, I don't see how
> > > > you are going to remove it without introducing a heavy performance
> > > > impact. Konstantin
> > >
> > > About tx_burst(), I don't think we should force the user to provide
> > > a writable mbuf. There are many use cases where passing a clone
> > > already works as of today and it avoids duplicating the mbuf data.
> > > For instance: traffic generator, multicast, bridging/tap, etc...
> > >
> > > Moreover, this requirement would be inconsistent with the model you
> > > are proposing in case of pipeline:
> > >  - tx_prepare() on core X, may update the data
> > >  - tx_burst() on core Y, should not touch the data to avoid cache
> > > misses
> >
> > Probably I wasn't very clear in my previous mail.
> > I am not saying that we should force the user to pass a writable mbuf.
> > What I am saying is that for tx_burst() the current expectation is that
> > after an mbuf is handed to tx_burst() the user shouldn't try to modify its
> > buffer contents till the TX engine is done with the buffer (mbuf_free() is
> > called by the TX func for it). For tx_prep(), I think, it is the same
> > though the restrictions are a bit stricter: the user should not try to
> > read/write the mbuf while tx_prep() is not finished with it. What puzzles
> > me is why that should be the reason to mark tx_prep() as
> > experimental. Konstantin
> 
> To be sure we're on the same page, let me reword:
> 
> - mbufs passed to tx_prepare() by the application must have their
>   headers (l2_len + l3_len + l4_len) writable because the phdr checksum
>   can be replaced. It could be precised in the api comment.
> 
> - mbufs passed to tx_burst() must not be modified by the driver/hw, nor
>   by the application.
> 
> 
> About the API itself, I have one more question. I know you've already
> discussed this a bit with Adrien, I don't want to spawn a new big thread
> from here ;)
> 
> The API provides tx_prepare() to check the packets have the proper format,
> and possibly modify them (ex: csum offload / tso) to match hw
> requirements. So it does both checks (number of segments) and fixes
> (csum/tso). What determines things that should be checked and things that
> should be fixed?
> 

1) Performance -- we may assume that packets are created according to the common rules (e.g. the application doesn't try to compute an IP checksum for an IPv6 packet, sets all required fields, etc.). For now, additional checks are done only in DEBUG mode.

2) Uncommon requirements, such as the number of segments for TSO/non-TSO, where invalid values can cause unexpected results (or even hang the device on burst), so it is critical (at least for ixgbe and i40e) and must be checked.

3) The checksum field must be initialized in a hardware-specific way for it to be computed properly.

4) If a packet is invalid, its content is neither modified nor restored.

> The application gets little information from tx_prepare() about what should
> be done to make the packet accepted by the hw, and the actions will
> probably be different depending on hardware. So could we imagine that in
> the future the function also tries to fix the packet? I've seen your
> comment saying that it has to be an application decision, so what about
> having a parameter saying "fix the packet" or "don't fix it"?
> 

The main question here is how invasive tx_prepare should be with packet data.

If I understand you correctly, you're talking about the situation where tx_prepare will try to repair the packet internally, e.g. linearize data for i40e to meet the number-of-segments requirements, split mbufs when they are too large, maybe someone would want a full software checksum computation fallback for devices which don't support it, and so on? And then this parameter would indicate "do the required work for me", or "just check, initialize what should be initialized, and if that fails, let me decide"?

The implemented model is quite simple and clear:

1) Validate the uncommon requirements which may cause a problem and let the application deal with failures.
2) If the packet is valid, initialize the required fields according to the hardware requirements.

In fact, it doesn't fix anything for now beyond initializing checksums, and the main reason it is done here is that it cannot be done in a trivial way in the application itself.

IMHO, we should keep this API as simple as possible and not let it grow in unexpected ways (also for performance reasons).
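
As a rough driver-side sketch of that model (per packet: validate first,
then initialize; seg_limit_ok() is a hypothetical helper, and the checksum
step reuses the rte_net_intel_cksum_prepare() helper mentioned earlier in
this thread):

	#include <errno.h>
	#include <rte_errno.h>
	#include <rte_mbuf.h>
	#include <rte_net.h>

	static uint16_t
	xxx_prep_pkts(void *txq, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
	{
		uint16_t i;
		int ret;

		(void)txq;
		for (i = 0; i < nb_pkts; i++) {
			struct rte_mbuf *m = tx_pkts[i];

			/* step 1: validate; on failure the packet is left
			 * untouched and the application decides what to do */
			if (!seg_limit_ok(m)) {
				rte_errno = EINVAL;
				return i;
			}

			/* step 2: initialize checksum fields the way the
			 * hardware expects them */
			ret = rte_net_intel_cksum_prepare(m);
			if (ret != 0) {
				rte_errno = -ret;
				return i;
			}
		}
		return i;
	}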

> About rte_eth_desc_lim->nb_seg_max and
> rte_eth_desc_lim->nb_mtu_seg_max, I'm still quite reserved, especially for
> the 2nd one, because I don't see how it can be used by the application.
> Well, it does not hurt to have them, but for me it looks a bit useless.
> 

* "nb_seg_max": Maximum number of segments per whole packet.
* "nb_mtu_seg_max": Maximum number of segments per one MTU.

An application can use the provided values in the following way:

* For a non-TSO packet, a single transmit packet may span up to "nb_mtu_seg_max" buffers.
* For a TSO packet, the total number of data descriptors is limited to "nb_seg_max", and each segment within the TSO may span up to "nb_mtu_seg_max" buffers.
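
A sketch of such an application-side check (nb_seg_max/nb_mtu_seg_max as
introduced by this series; drop_pkt() is a hypothetical helper):

	struct rte_eth_dev_info info;

	rte_eth_dev_info_get(port_id, &info);

	if (m->ol_flags & PKT_TX_TCP_SEG) {
		/* TSO: the whole packet is limited by nb_seg_max; each TSO
		 * segment may additionally span up to nb_mtu_seg_max
		 * buffers (not checked in this sketch) */
		if (m->nb_segs > info.tx_desc_lim.nb_seg_max)
			drop_pkt(m);
	} else if (m->nb_segs > info.tx_desc_lim.nb_mtu_seg_max) {
		/* non-TSO packet spans too many buffers */
		drop_pkt(m);
	}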

> Last thing, I think this API should become the default in the future.
> For instance, it would prevent the application from calculating a phdr csum
> that will not be used by the hw. Not calling tx_prepare() would require
> the user/application to know exactly the underlying hw and the kind of
> packets that are generated. So for me it means we'll need to also update
> other examples (other testpmd engines, l2fwd, ...). Do you agree?
> 

Most sample applications don't use TX offloads at all and don't compute checksums. So the question is: "should tx_prepare be used even if not required?"

> 
> Regards,
> Olivier
> 

Tomasz

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v12 1/6] ethdev: add Tx preparation
  2016-12-09 17:19                                   ` Kulasek, TomaszX
@ 2016-12-12 11:51                                     ` Ananyev, Konstantin
  2016-12-22 13:30                                       ` Thomas Monjalon
  0 siblings, 1 reply; 261+ messages in thread
From: Ananyev, Konstantin @ 2016-12-12 11:51 UTC (permalink / raw)
  To: Kulasek, TomaszX, Olivier Matz; +Cc: Thomas Monjalon, dev

Hi Olivier and Tomasz,

> -----Original Message-----
> From: Kulasek, TomaszX
> Sent: Friday, December 9, 2016 5:19 PM
> To: Olivier Matz <olivier.matz@6wind.com>; Ananyev, Konstantin <konstantin.ananyev@intel.com>
> Cc: Thomas Monjalon <thomas.monjalon@6wind.com>; dev@dpdk.org
> Subject: RE: [dpdk-dev] [PATCH v12 1/6] ethdev: add Tx preparation
> 
> Hi Olivier,
> 
> My 5 cents below:
> 
> > -----Original Message-----
> > From: Olivier Matz [mailto:olivier.matz@6wind.com]
> > Sent: Thursday, December 8, 2016 18:24
> > To: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> > Cc: Thomas Monjalon <thomas.monjalon@6wind.com>; Kulasek, TomaszX
> > <tomaszx.kulasek@intel.com>; dev@dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH v12 1/6] ethdev: add Tx preparation
> >
> > Hi Konstantin,
> >
> > On Fri, 2 Dec 2016 16:17:51 +0000, "Ananyev, Konstantin"
> > <konstantin.ananyev@intel.com> wrote:
> > > Hi Olivier,
> > >
> > > > -----Original Message-----
> > > > From: Olivier Matz [mailto:olivier.matz@6wind.com]
> > > > Sent: Friday, December 2, 2016 8:24 AM
> > > > To: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> > > > Cc: Thomas Monjalon <thomas.monjalon@6wind.com>; Kulasek, TomaszX
> > > > <tomaszx.kulasek@intel.com>; dev@dpdk.org Subject: Re: [dpdk-dev]
> > > > [PATCH v12 1/6] ethdev: add Tx preparation
> > > >
> > > > Hi Konstantin,
> > > >
> > > > On Fri, 2 Dec 2016 01:06:30 +0000, "Ananyev, Konstantin"
> > > > <konstantin.ananyev@intel.com> wrote:
> > > > > >
> > > > > > 2016-11-23 18:36, Tomasz Kulasek:
> > > > > > > +/**
> > > > > > > + * Process a burst of output packets on a transmit queue of
> > > > > > > an Ethernet device.
> > > > > > > + *
> > > > > > > + * The rte_eth_tx_prepare() function is invoked to prepare
> > > > > > > output packets to be
> > > > > > > + * transmitted on the output queue *queue_id* of the Ethernet
> > > > > > > device designated
> > > > > > > + * by its *port_id*.
> > > > > > > + * The *nb_pkts* parameter is the number of packets to be
> > > > > > > prepared which are
> > > > > > > + * supplied in the *tx_pkts* array of *rte_mbuf* structures,
> > > > > > > each of them
> > > > > > > + * allocated from a pool created with
> > > > > > > rte_pktmbuf_pool_create().
> > > > > > > + * For each packet to send, the rte_eth_tx_prepare() function
> > > > > > > performs
> > > > > > > + * the following operations:
> > > > > > > + *
> > > > > > > + * - Check if packet meets devices requirements for tx
> > > > > > > offloads.
> > > > > > > + *
> > > > > > > + * - Check limitations about number of segments.
> > > > > > > + *
> > > > > > > + * - Check additional requirements when debug is enabled.
> > > > > > > + *
> > > > > > > + * - Update and/or reset required checksums when tx offload
> > > > > > > is set for packet.
> > > > > > > + *
> > > > > > > + * Since this function can modify packet data, provided mbufs
> > > > > > > must be safely
> > > > > > > + * writable (e.g. modified data cannot be in shared
> > > > > > > segment).
> > > > > >
> > > > > > I think we will have to remove this limitation in future releases.
> > > > > > As we don't know how it could affect the API, I suggest
> > > > > > declaring this API EXPERIMENTAL.
> > > > >
> > > > > While I don't really mind marking it as experimental, I don't
> > > > > really understand the reasoning: why does "this function can modify
> > > > > packet data, provided mbufs must be safely writable" suddenly
> > > > > become a problem? That seems like an obvious limitation to me,
> > > > > and let's say tx_burst() has the same one. Second, I don't see how
> > > > > you are going to remove it without introducing a heavy performance
> > > > > impact. Konstantin
> > > >
> > > > About tx_burst(), I don't think we should force the user to provide
> > > > a writable mbuf. There are many use cases where passing a clone
> > > > already works as of today and it avoids duplicating the mbuf data.
> > > > For instance: traffic generator, multicast, bridging/tap, etc...
> > > >
> > > > Moreover, this requirement would be inconsistent with the model you
> > > > are proposing in case of pipeline:
> > > >  - tx_prepare() on core X, may update the data
> > > >  - tx_burst() on core Y, should not touch the data to avoid cache
> > > > misses
> > >
> > > Probably I wasn't very clear in my previous mail.
> > > I am not saying that we should force the user to pass a writable mbuf.
> > > What I am saying is that for tx_burst() the current expectation is that
> > > after an mbuf is handed to tx_burst() the user shouldn't try to modify its
> > > buffer contents till the TX engine is done with the buffer (mbuf_free() is
> > > called by the TX func for it). For tx_prep(), I think, it is the same,
> > > though the restrictions are a bit more strict: the user should not try to
> > > read/write the mbuf while tx_prep() is not finished with it. What puzzles
> > > me is why that should be the reason to mark tx_prep() as
> > > experimental. Konstantin
> >
> > To be sure we're on the same page, let me reword:
> >
> > - mbufs passed to tx_prepare() by the application must have their
> >   headers (l2_len + l3_len + l4_len) writable because the phdr checksum
> >   can be replaced. This could be made explicit in the API comment.
> >
> > - mbufs passed to tx_burst() must not be modified by the driver/hw, nor
> >   by the application.

Yes, agree with both.

> >
> >
> > About the API itself, I have one more question. I know you've already
> > discussed this a bit with Adrien, I don't want to spawn a new big thread
> > from here ;)
> >
> > The API provides tx_prepare() to check the packets have the proper format,
> > and possibly modify them (ex: csum offload / tso) to match hw
> > requirements. So it does both checks (number of segments) and fixes
> > (csum/tso). What determines things that should be checked and things that
> > should be fixed?
> >

Just to echo Tomasz's detailed reply below: I think right now tx_prepare()
doesn't do any fixing.
It just checks that the packet satisfies the PMD/HW requirements; if not, it
returns an error without trying to fix anything.
If the packet seems valid, it calculates and fills the L4 cksum to fulfill the
HW requirements.
About the question of what should be checked -- as usual there is a tradeoff
between additional checks and performance.
As Tomasz stated below, right now in non-debug mode we do checks only for
things that can have a critical impact on the HW itself (tx hang, etc.).
All extra checks are implemented only in DEBUG mode right now.

> 
> 1) Performance -- we may assume that packets are created according to the common rules (e.g. the application doesn't try to compute an IP
> checksum for an IPv6 packet, sets all required fields, etc.). For now, additional checks are done only in DEBUG mode.
> 
> 2) Uncommon requirements, such as the number of segments for TSO/non-TSO, where invalid values can cause unexpected results (or
> even hang the device on burst); this is critical (at least for ixgbe and i40e) and must be checked.
> 
> 3) The checksum field must be initialized in a hardware-specific way for the hardware to compute it properly.
> 
> 4) If a packet is invalid, its content is neither modified nor restored.

> 
> > The application gets little information from tx_prepare() about what should
> > be done to make the packet accepted by the hw, and the actions will
> > probably be different depending on the hardware.

That's true.
I am open to suggestions on how to provide extra information to the upper layer in the future:
set rte_errno to different values depending on the type of error,
OR an extra parameter in tx_prepare() that would provide more detailed error information,
OR something else?
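
For context, what the upper layer gets today is the index of the first failing
packet plus rte_errno (negative values, as set by the current patches). A
minimal handling sketch, assuming the usual "bufs"/"nb_pkts" locals -- drop
the offender and re-run preparation on the rest:

	uint16_t nb_prep = rte_eth_tx_prepare(port_id, queue_id, bufs, nb_pkts);

	while (nb_prep < nb_pkts) {
		/* bufs[nb_prep] is the first packet that failed */
		if (rte_errno == -ENOTSUP) {
			/* offload not supported: drop, or do it in SW */
		} else {
			/* e.g. -EINVAL: bad flags or segment count */
		}
		rte_pktmbuf_free(bufs[nb_prep]);
		memmove(&bufs[nb_prep], &bufs[nb_prep + 1],
				sizeof(bufs[0]) * (nb_pkts - nb_prep - 1));
		nb_pkts--;
		/* re-run preparation on the remaining packets */
		nb_prep += rte_eth_tx_prepare(port_id, queue_id,
				&bufs[nb_prep], nb_pkts - nb_prep);
	}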

> So could we imagine that in
> > the future the function also tries to fix the packet? I've seen your
> > comment saying that it has to be an application decision, so what about
> > having a parameter saying "fix the packet" or "don't fix it"?
> >

While I am not directly opposed to that idea, I am a bit skeptical
that it could be implemented in a useful and effective way.
As I said before, it may be fixed in different ways:
coalesce mbufs, split the mbuf chain into multiple ones,
or some sort of combination of these 2 approaches.
Again, it seems a bit inefficient from a performance point of view:
form the packet first, then check and re-adjust it again.
It seems more plausible to have it formed in the right way straight away.
Also, it would mean that we'd need to pass to tx_prepare() information
about which mempool to use for new mbufs, and probably some extra information.
That's why I think it might be better to do a proper (re)formatting of the
packet before passing it down to tx_prepare().
It might be some sort of generic helper function in rte_net that would use
information from rte_eth_desc_lim and could be called by the upper layer before
tx_prepare(); a rough sketch follows below.
But if you think it could be merged into tx_prepare() in some smart and
effective way -- sure thing, let's look at the particular implementation ideas
you have.
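
Purely illustrative -- no such helper exists in this series and the name is
made up. A sketch of what an rte_net-level reshaping helper could look like,
here simply linearizing an over-fragmented packet by copy:

	/* hypothetical helper, NOT part of this patchset */
	static int
	app_linearize_pkt(struct rte_mbuf **m, struct rte_mempool *mp,
			const struct rte_eth_desc_lim *lim)
	{
		struct rte_mbuf *pkt = *m;
		struct rte_mbuf *flat, *seg;
		uint16_t max_segs;
		char *dst;

		max_segs = (pkt->ol_flags & PKT_TX_TCP_SEG) ?
				lim->nb_seg_max : lim->nb_mtu_seg_max;
		if (pkt->nb_segs <= max_segs)
			return 0;

		flat = rte_pktmbuf_alloc(mp);
		if (flat == NULL || rte_pktmbuf_tailroom(flat) < pkt->pkt_len) {
			rte_pktmbuf_free(flat);
			return -1;
		}

		dst = rte_pktmbuf_append(flat, pkt->pkt_len);
		for (seg = pkt; seg != NULL; seg = seg->next) {
			rte_memcpy(dst, rte_pktmbuf_mtod(seg, char *),
					seg->data_len);
			dst += seg->data_len;
		}

		/* carry over the offload metadata */
		flat->ol_flags = pkt->ol_flags;
		flat->l2_len = pkt->l2_len;
		flat->l3_len = pkt->l3_len;
		flat->l4_len = pkt->l4_len;
		flat->tso_segsz = pkt->tso_segsz;

		rte_pktmbuf_free(pkt);
		*m = flat;
		return 0;
	}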

> 
> The main question here is how invasive tx_prepare should be with regard to packet data.
> 
> If I understand you correctly, you're talking about the situation where tx_prepare will try to restore the packet internally, e.g. linearize
> data for i40e to meet the number-of-segments requirements, split mbufs when they are too large, maybe provide a full software
> checksum computation fallback for devices which don't support it, and so on? And then this parameter would indicate either "do
> the required work for me" or "just check, initialize what should be initialized, and if that fails, let me decide"?
> 
> The implemented model is quite simple and clear:
> 
> 1) Validate uncommon requirements which may cause a problem and let the application deal with it.
> 2) If the packet is valid, initialize the required fields according to the hardware requirements.
> 
> In fact, it doesn't fix anything for now, but it initializes checksums, and the main reason why this is done here is that it cannot be
> done in a trivial way in the application itself.
> 
> IMHO, we should keep this API as simple as possible and not let it grow in unexpected ways (also for performance reasons).
> 
> > About rte_eth_desc_lim->nb_seg_max and
> > rte_eth_desc_lim->nb_mtu_seg_max, I'm still quite reserved, especially for
> > the 2nd one, because I don't see how it can be used by the application.
> > Well, it does not hurt to have them, but for me it looks a bit useless.
> >
> 
> * "nb_seg_max": Maximum number of segments per whole packet.
> * "nb_mtu_seg_max": Maximum number of segments per one MTU.
> 
> The application can use the provided values in the following way:
> 
> * For a non-TSO packet, a single transmit packet may span up to "nb_mtu_seg_max" buffers.
> * For a TSO packet, the total number of data descriptors is limited to "nb_seg_max", and each segment within the TSO may span up to
> "nb_mtu_seg_max" buffers.

I suppose these fields will be useful.
Let's say you are about to form and send packets.
Right now, how would you know what the HW limitations on the number of segments per packet are?

> 
> > Last thing, I think this API should become the default in the future.
> > For instance, it would prevent the application to calculate a phdr csum
> > that will not be used by the hw. Not calling tx_prepare() would require
> > the user/application to exactly knows the underlying hw and the kind of
> > packet that are generated. So for me it means we'll need to also update
> > other examples (other testpmd engines, l2fwd, ...). Do you agree?
> >
> 
> Most sample applications don't use TX offloads at all and don't compute checksums. So the question is: "should tx_prepare be
> used even if not required?"

I suppose it wouldn't do much harm to add tx_prepare() into the sample apps where appropriate.
In most cases (when the simple Tx path is used) it will be a NOP anyway.
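
e.g. for a plain forwarding loop the change is minimal (a sketch, assuming the
usual bufs/BURST_SIZE locals of the basic forwarding example):

	nb_rx = rte_eth_rx_burst(port, 0, bufs, BURST_SIZE);

	/* NOP when the driver has no tx_pkt_prepare callback */
	nb_prep = rte_eth_tx_prepare(port ^ 1, 0, bufs, nb_rx);

	nb_tx = rte_eth_tx_burst(port ^ 1, 0, bufs, nb_prep);

	/* free whatever was not sent or failed preparation */
	for (i = nb_tx; i < nb_rx; i++)
		rte_pktmbuf_free(bufs[i]);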

Konstantin

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v12 0/6] add Tx preparation
  2016-12-09 14:40                                           ` Jan Mędala
@ 2016-12-12 15:02                                             ` Ananyev, Konstantin
  2016-12-16  0:15                                               ` Ananyev, Konstantin
  0 siblings, 1 reply; 261+ messages in thread
From: Ananyev, Konstantin @ 2016-12-12 15:02 UTC (permalink / raw)
  To: Jan Medala, Yigit, Ferruh
  Cc: Thomas Monjalon, Harish Patil, dev, Rahul Lakkireddy,
	Stephen Hurd, Jakub Palider, John Daley, Adrien Mazarguil,
	Alejandro Lucero, Rasesh Mody, Jacob, Jerin, Yuanhan Liu,
	Kulasek, TomaszX, olivier.matz, Yong Wang

Hi Jan,

>Hello,

>Sorry for the late response.

>From the ENA perspective, we need to dig deeper into the requirements and use cases, but I'm pretty confident (95%) that ena will need to implement the tx_prep() API. There is at least one scenario where the HW relies on a partial checksum.

Could you let us know as soon as you figure it out?
Hopefully we will still be able to make the 17.02 integration deadline.
Thanks
Konstantin

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v12 0/6] add Tx preparation
  2016-12-08 18:20                                         ` Yong Wang
  2016-12-09 14:40                                           ` Jan Mędala
@ 2016-12-12 17:29                                           ` Ananyev, Konstantin
  1 sibling, 0 replies; 261+ messages in thread
From: Ananyev, Konstantin @ 2016-12-12 17:29 UTC (permalink / raw)
  To: Yong Wang, Yigit, Ferruh, Thomas Monjalon
  Cc: Harish Patil, dev, Rahul Lakkireddy, Stephen Hurd, Jan Medala,
	Jakub Palider, John Daley, Adrien Mazarguil, Alejandro Lucero,
	Rasesh Mody, Jacob, Jerin, Yuanhan Liu, Kulasek, TomaszX,
	olivier.matz



> > > This means vmxnet3 PMD also should be updated, right?
> >
> > Yes, that's right.
> >
> > >Should that update
> > > be part of tx_prep patchset? Or separate patch?
> >
> > Another question I suppose is who will do the actual patch for vmxnet3.
> > Yong, are you ok to do the patch for vmxnet3, or prefer us to do that?
> > Please note that in both cases we will need your help in testing/reviewing it.
> > Konstantin
> 
> It will be great if you can put together a patch as part of the entire patchset on tx_prep() for vmxnet3 and I will definitely help review
> it.

Ok. 

> 
> Regarding testing, I can definitely help but I don't have a testing harness to cover the entire matrix (different ESX version, different
> vmxnet3 device version, VM-VM, VM-physical over different uplinks, etc.) so it will be limited.  Related to this, I have the impression
> that Intel has some existing coverage for vmxnet3 as well as other NICs.  Do we know if that will cover this use case as well?

I'll ask our validation team, but I don't know off-hand what coverage for vmxnet3 we have.
Konstantin


> 
> > >
> > > >>>
> > > >>
> > > >>>>> This is for any TX offload for which the upper layer SW would have
> > > >>
> > > >>>>> to modify the contents of the packet.
> > > >>
> > > >>>>> Though as I can see for qede neither PKT_TX_IP_CKSUM or
> > > >>
> > > >>>> PKT_TX_TCP_CKSUM
> > > >>
> > > >>>>> exhibits any extra requirements for the user.
> > > >>
> > > >>>>> Is that correct?
> > > >>
> > > >>
> > > >


^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v12 0/6] add Tx preparation
  2016-12-06 15:53                         ` Ferruh Yigit
  2016-12-07  7:55                           ` Andrew Rybchenko
  2016-12-07  8:11                           ` Yuanhan Liu
@ 2016-12-13 11:59                           ` Ferruh Yigit
  2 siblings, 0 replies; 261+ messages in thread
From: Ferruh Yigit @ 2016-12-13 11:59 UTC (permalink / raw)
  To: Thomas Monjalon, dev, Jan Medala, Jakub Palider,
	Netanel Belgazal, Evgeny Schemeilin, Yong Wang, Hemant Agrawal
  Cc: Tomasz Kulasek, konstantin.ananyev

On 12/6/2016 3:53 PM, Ferruh Yigit wrote:
> On 11/28/2016 11:03 AM, Thomas Monjalon wrote:
>> We need attention of every PMD developers on this thread.
>>
>> Reminder of what Konstantin suggested:
>> "
>> - if the PMD supports TX offloads AND
>> - if to be able use any of these offloads the upper layer SW would have to:
>>     * modify the contents of the packet OR
>>     * obey HW specific restrictions
>> then it is a PMD developer responsibility to provide tx_prep() that would implement
>> expected modifications of the packet contents and restriction checks.
>> Otherwise, tx_prep() implementation is not required and can be safely set to NULL.      
>> "
>>
>> I copy/paste also my previous conclusion:
>>
>> Before txprep, there is only one API: the application must prepare the
>> packet checksums itself (get_psd_sum in testpmd).
>> With txprep, the application has 2 choices: keep doing the job itself
>> or call txprep, which calls a PMD-specific function.
>> The question is: do non-Intel drivers need checksum preparation for TSO?
>> Will it behave well if txprep does nothing in these drivers?
>>
>> When looking at the code, most of drivers handle the TSO flags.
>> But it is hard to know whether they rely on the pseudo checksum or not.
>>
>> git grep -l 'PKT_TX_UDP_CKSUM\|PKT_TX_TCP_CKSUM\|PKT_TX_TCP_SEG' drivers/net/
>>
>> drivers/net/bnxt/bnxt_txr.c
>> drivers/net/cxgbe/sge.c
>> drivers/net/e1000/em_rxtx.c
>> drivers/net/e1000/igb_rxtx.c
>> drivers/net/ena/ena_ethdev.c
>> drivers/net/enic/enic_rxtx.c
>> drivers/net/fm10k/fm10k_rxtx.c
>> drivers/net/i40e/i40e_rxtx.c
>> drivers/net/ixgbe/ixgbe_rxtx.c
>> drivers/net/mlx4/mlx4.c
>> drivers/net/mlx5/mlx5_rxtx.c
>> drivers/net/nfp/nfp_net.c
>> drivers/net/qede/qede_rxtx.c
>> drivers/net/thunderx/nicvf_rxtx.c
>> drivers/net/virtio/virtio_rxtx.c
>> drivers/net/vmxnet3/vmxnet3_rxtx.c
>>
>> Please, we need a comment for each driver saying
>> "it is OK, we do not need any checksum preparation for TSO"
>> or
>> "yes we have to implement tx_prepare or TSO will not work in this mode"
>>
> 
> Still waiting response from PMDs:
> - ena
> - nfp
> - virtio
> 
> Waiting clarification for preparation requirements:
> - vmxnet3
> 
> Also including new PMDs to the thread:
> - sfc
> - dpaa2

Thanks all for responses, now only remaining ones are following:

Waiting clarification for preparation requirements: (it would be great
if details can be provided before next version of the patch)
- ena

Will support review/testing of the patch:
- vmxnet3

new PMD (still not in repo, so response is good to have)
- dpaa2

^ permalink raw reply	[flat|nested] 261+ messages in thread

* [dpdk-dev] [PATCH v13 0/7] add Tx preparation
  2016-11-23 17:36                     ` [dpdk-dev] [PATCH v12 0/6] add Tx preparation Tomasz Kulasek
                                         ` (6 preceding siblings ...)
  2016-11-28 11:03                       ` [dpdk-dev] [PATCH v12 0/6] add Tx preparation Thomas Monjalon
@ 2016-12-13 17:41                       ` Tomasz Kulasek
  2016-12-13 17:41                         ` [dpdk-dev] [PATCH v13 1/7] ethdev: " Tomasz Kulasek
                                           ` (7 more replies)
  7 siblings, 8 replies; 261+ messages in thread
From: Tomasz Kulasek @ 2016-12-13 17:41 UTC (permalink / raw)
  To: dev

As discussed in that thread:

http://dpdk.org/ml/archives/dev/2015-September/023603.html

Different NIC models depending on HW offload requested might impose
different requirements on packets to be TX-ed in terms of:

 - Max number of fragments per packet allowed
 - Max number of fragments per TSO segments
 - The way pseudo-header checksum should be pre-calculated
 - L3/L4 header fields filling
 - etc.


MOTIVATION:
-----------

1) Some work cannot (and should not) be done in rte_eth_tx_burst.
   However, this work is sometimes required, and for now it is left as
   an application issue.

2) Different hardware may have different requirements for TX offloads;
   a different subset may be supported, and so on.

3) Some parameters (e.g. the number of segments in the ixgbe driver) may
   hang the device. These parameters may vary between devices.

   For example, i40e HW allows 8 fragments per packet, but that is after
   TSO segmentation, while ixgbe has a 38-fragment pre-TSO limit.

4) Fields in the packet may require different initialization (e.g.
   pseudo-header checksum precalculation, sometimes done differently
   depending on the packet type, and so on). Today the application
   needs to take care of this.

5) Using an additional API (rte_eth_tx_prepare) before rte_eth_tx_burst
   lets the application prepare a packet burst in a form acceptable to
   the specific device.

6) Some additional checks may be done in debug mode, keeping the
   tx_burst implementation clean.


PROPOSAL:
---------

To help users deal with all these varieties, we propose to:

1) Introduce the rte_eth_tx_prepare() function to do the necessary
   preparation of a packet burst to be safely transmitted on a device
   for the desired HW offloads (set/reset checksum fields according to
   the hardware requirements) and to check HW constraints (number of
   segments per packet, etc.).

   Since the limitations and requirements may differ between devices,
   this requires extending the rte_eth_dev structure with a new function
   pointer, "tx_pkt_prepare", which can be implemented in the driver to
   prepare and verify packets, in a device-specific way, before the
   burst, and which should prevent the application from sending
   malformed packets.

2) Also, new fields will be introduced in rte_eth_desc_lim:
   nb_seg_max and nb_mtu_seg_max, providing information about the
   maximum number of segments in TSO and non-TSO packets acceptable to
   the device.

   This information is useful for the application to avoid creating,
   or to limit, malicious packets.


APPLICATION (USE CASE):
-----------------------

1) The application should initialize the burst of packets to send, and
   set the required tx offload flags and fields, like l2_len, l3_len,
   l4_len, and tso_segsz.

2) The application passes the burst to rte_eth_tx_prepare to check the
   conditions required to send the packets through the NIC.

3) The result of rte_eth_tx_prepare can be used to send the valid
   packets and/or restore the invalid ones if the function fails.

e.g.

	for (i = 0; i < nb_pkts; i++) {

		/* initialize or process packet */

		bufs[i]->tso_segsz = 800;
		bufs[i]->ol_flags = PKT_TX_TCP_SEG | PKT_TX_IPV4
				| PKT_TX_IP_CKSUM;
		bufs[i]->l2_len = sizeof(struct ether_hdr);
		bufs[i]->l3_len = sizeof(struct ipv4_hdr);
		bufs[i]->l4_len = sizeof(struct tcp_hdr);
	}

	/* Prepare burst of TX packets */
	nb_prep = rte_eth_tx_prepare(port, 0, bufs, nb_pkts);

	if (nb_prep < nb_pkts) {
		printf("Tx prepare failed\n");

		/* nb_prep here indicates the first invalid packet.
		 * rte_eth_tx_prepare can be used on the remaining packets
		 * to find other invalid ones.
		 */

	}

	/* Send burst of TX packets */
	nb_tx = rte_eth_tx_burst(port, 0, bufs, nb_prep);

	/* Free any unsent packets. */
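	/* e.g., a minimal sketch -- this also drops the packets that
	 * failed preparation (bufs[nb_prep] .. bufs[nb_pkts - 1]):
	 */
	for (i = nb_tx; i < nb_pkts; i++)
		rte_pktmbuf_free(bufs[i]);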

v13 changes:
 - added support for vmxnet3
 - reworded help information for "csum txprep" command
 - renamed RTE_ETHDEV_TX_PREPARE to RTE_ETHDEV_TX_PREPARE_NOOP to
   better suit its purpose.

v12 changes:
 - renamed API function from "rte_eth_tx_prep" to "rte_eth_tx_prepare"
   (to be not confused with "prepend")
 - changed "rte_phdr_cksum_fix" to "rte_net_intel_cksum_prepare"
 - added "csum txprep (on|off)" command to the csum engine allowing to
   select txprep path for packet processing

v11 changes:
 - updated comments
 - added information to the API description about packet data
   requirements/limitations.

v10 changes:
 - moved the drivers' tx callback check in rte_eth_tx_prep after the queue_id check

v9 changes:
 - fixed headers structure fragmentation check
 - moved fragmentation check into rte_validate_tx_offload()

v8 changes:
 - mbuf argument in rte_validate_tx_offload declared as const

v7 changes:
 - comments reworded/added
 - changed errno values returned from Tx prep API
 - added a check in rte_phdr_cksum_fix that the headers are in the first
   data segment and can be safely modified
 - moved rte_validate_tx_offload to rte_mbuf
 - moved rte_phdr_cksum_fix to rte_net.h
 - removed rte_pkt.h new file as useless

v6 changes:
 - added performance impact test results to the patch description

v5 changes:
 - rebased csum engine modification
 - added information to the csum engine about performance tests
 - some performance improvements

v4 changes:
 - tx_prep is now set to the default behavior (NULL) for the simple/vector
   paths in the fm10k, i40e and ixgbe drivers to increase performance, since
   Tx offloads are intentionally not available on these paths

v3 changes:
 - reworked the csum testpmd engine instead of adding a new one,
 - fixed checksum initialization procedure to include also outer
   checksum offloads,
 - some minor formatting and optimizations

v2 changes:
 - rte_eth_tx_prep() returns the number of packets when the device doesn't
   support the tx_prep functionality,
 - introduced CONFIG_RTE_ETHDEV_TX_PREP, allowing tx_prep to be turned off


Ananyev, Konstantin (1):
  vmxnet3: add Tx preparation

Tomasz Kulasek (6):
  ethdev: add Tx preparation
  e1000: add Tx preparation
  fm10k: add Tx preparation
  i40e: add Tx preparation
  ixgbe: add Tx preparation
  testpmd: use Tx preparation in csum engine

 app/test-pmd/cmdline.c                      |   49 ++++++++++++
 app/test-pmd/csumonly.c                     |   33 ++++++--
 app/test-pmd/testpmd.c                      |    5 ++
 app/test-pmd/testpmd.h                      |    2 +
 config/common_base                          |    9 +++
 doc/guides/testpmd_app_ug/testpmd_funcs.rst |   13 +++
 drivers/net/e1000/e1000_ethdev.h            |   11 +++
 drivers/net/e1000/em_ethdev.c               |    5 +-
 drivers/net/e1000/em_rxtx.c                 |   48 ++++++++++-
 drivers/net/e1000/igb_ethdev.c              |    4 +
 drivers/net/e1000/igb_rxtx.c                |   52 +++++++++++-
 drivers/net/fm10k/fm10k.h                   |    6 ++
 drivers/net/fm10k/fm10k_ethdev.c            |    5 ++
 drivers/net/fm10k/fm10k_rxtx.c              |   50 +++++++++++-
 drivers/net/i40e/i40e_ethdev.c              |    3 +
 drivers/net/i40e/i40e_rxtx.c                |   74 ++++++++++++++++-
 drivers/net/i40e/i40e_rxtx.h                |    8 ++
 drivers/net/ixgbe/ixgbe_ethdev.c            |    3 +
 drivers/net/ixgbe/ixgbe_ethdev.h            |    5 +-
 drivers/net/ixgbe/ixgbe_rxtx.c              |   56 +++++++++++++
 drivers/net/ixgbe/ixgbe_rxtx.h              |    2 +
 drivers/net/vmxnet3/vmxnet3_ethdev.c        |    4 +
 drivers/net/vmxnet3/vmxnet3_ethdev.h        |    2 +
 drivers/net/vmxnet3/vmxnet3_rxtx.c          |   57 +++++++++++++
 lib/librte_ether/rte_ethdev.h               |  115 +++++++++++++++++++++++++++
 lib/librte_mbuf/rte_mbuf.h                  |   64 +++++++++++++++
 lib/librte_net/rte_net.h                    |   85 ++++++++++++++++++++
 27 files changed, 757 insertions(+), 13 deletions(-)

-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 261+ messages in thread

* [dpdk-dev] [PATCH v13 1/7] ethdev: add Tx preparation
  2016-12-13 17:41                       ` [dpdk-dev] [PATCH v13 0/7] " Tomasz Kulasek
@ 2016-12-13 17:41                         ` Tomasz Kulasek
  2016-12-13 17:41                         ` [dpdk-dev] [PATCH v13 2/7] e1000: " Tomasz Kulasek
                                           ` (6 subsequent siblings)
  7 siblings, 0 replies; 261+ messages in thread
From: Tomasz Kulasek @ 2016-12-13 17:41 UTC (permalink / raw)
  To: dev

Added API for `rte_eth_tx_prepare`

uint16_t rte_eth_tx_prepare(uint8_t port_id, uint16_t queue_id,
	struct rte_mbuf **tx_pkts, uint16_t nb_pkts)

Added fields to the `struct rte_eth_desc_lim`:

	uint16_t nb_seg_max;
		/**< Max number of segments per whole packet. */

	uint16_t nb_mtu_seg_max;
		/**< Max number of segments per one MTU */

Added functions:

int rte_validate_tx_offload(const struct rte_mbuf *m)
	to validate general requirements for the tx offload settings in the
  mbuf of a packet, such as flag completeness. In the current
  implementation this function is called optionally, when
  RTE_LIBRTE_ETHDEV_DEBUG is enabled.

int rte_net_intel_cksum_prepare(struct rte_mbuf *m)
	to fix the pseudo-header checksum for TSO and non-TSO tcp/udp packets
	before hardware tx checksum offload:
	 - for non-TSO tcp/udp packets the full pseudo-header checksum is
	   computed and set,
	 - for TSO the IP payload length is not included.

PERFORMANCE TESTS
-----------------

This feature was tested with modified csum engine from test-pmd.

The packet checksum preparation was moved from the application to the Tx
preparation step placed before the burst.

We may expect some overhead costs caused by:
1) using additional callback before burst,
2) rescanning burst,
3) additional condition checking (packet validation),
4) worse optimization (e.g. packet data access, etc.)

We tested it using ixgbe Tx preparation implementation with some parts
disabled to have comparable information about the impact of different
parts of implementation.

IMPACT:

1) For an unimplemented Tx preparation callback the performance impact is
   negligible,
2) For the packet condition check without checksum modifications (nb_segs,
   available offloads, etc.) it is 14626628/14252168 (~2.62% drop),
3) For full support in the ixgbe driver (point 2 + packet checksum
   initialization) it is 14060924/13588094 (~3.48% drop)

Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
---
 config/common_base            |    9 ++++
 lib/librte_ether/rte_ethdev.h |  115 +++++++++++++++++++++++++++++++++++++++++
 lib/librte_mbuf/rte_mbuf.h    |   64 +++++++++++++++++++++++
 lib/librte_net/rte_net.h      |   85 ++++++++++++++++++++++++++++++
 4 files changed, 273 insertions(+)

diff --git a/config/common_base b/config/common_base
index 652a839..2c5352e 100644
--- a/config/common_base
+++ b/config/common_base
@@ -123,6 +123,15 @@ CONFIG_RTE_ETHDEV_QUEUE_STAT_CNTRS=16
 CONFIG_RTE_ETHDEV_RXTX_CALLBACKS=y
 
 #
+# Use real NOOP to turn off TX preparation stage
+#
+# While the behaviour of ``rte_eth_tx_prepare`` may change after turning on
+# the real NOOP, this configuration should never be enabled globally; it may
+# only be used in an appropriate target configuration file.
+#
+CONFIG_RTE_ETHDEV_TX_PREPARE_NOOP=n
+
+#
 # Support NIC bypass logic
 #
 CONFIG_RTE_NIC_BYPASS=n
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 9678179..b3052db 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -182,6 +182,7 @@
 #include <rte_pci.h>
 #include <rte_dev.h>
 #include <rte_devargs.h>
+#include <rte_errno.h>
 #include "rte_ether.h"
 #include "rte_eth_ctrl.h"
 #include "rte_dev_info.h"
@@ -702,6 +703,8 @@ struct rte_eth_desc_lim {
 	uint16_t nb_max;   /**< Max allowed number of descriptors. */
 	uint16_t nb_min;   /**< Min allowed number of descriptors. */
 	uint16_t nb_align; /**< Number of descriptors should be aligned to. */
+	uint16_t nb_seg_max;     /**< Max number of segments per whole packet. */
+	uint16_t nb_mtu_seg_max; /**< Max number of segments per one MTU */
 };
 
 /**
@@ -1191,6 +1194,11 @@ typedef uint16_t (*eth_tx_burst_t)(void *txq,
 				   uint16_t nb_pkts);
 /**< @internal Send output packets on a transmit queue of an Ethernet device. */
 
+typedef uint16_t (*eth_tx_prep_t)(void *txq,
+				   struct rte_mbuf **tx_pkts,
+				   uint16_t nb_pkts);
+/**< @internal Prepare output packets on a transmit queue of an Ethernet device. */
+
 typedef int (*flow_ctrl_get_t)(struct rte_eth_dev *dev,
 			       struct rte_eth_fc_conf *fc_conf);
 /**< @internal Get current flow control parameter on an Ethernet device */
@@ -1625,6 +1633,7 @@ struct rte_eth_rxtx_callback {
 struct rte_eth_dev {
 	eth_rx_burst_t rx_pkt_burst; /**< Pointer to PMD receive function. */
 	eth_tx_burst_t tx_pkt_burst; /**< Pointer to PMD transmit function. */
+	eth_tx_prep_t tx_pkt_prepare; /**< Pointer to PMD transmit prepare function. */
 	struct rte_eth_dev_data *data;  /**< Pointer to device data */
 	const struct eth_driver *driver;/**< Driver for this device */
 	const struct eth_dev_ops *dev_ops; /**< Functions exported by PMD */
@@ -2819,6 +2828,112 @@ int rte_eth_dev_set_vlan_ether_type(uint8_t port_id,
 	return (*dev->tx_pkt_burst)(dev->data->tx_queues[queue_id], tx_pkts, nb_pkts);
 }
 
+/**
+ * Process a burst of output packets on a transmit queue of an Ethernet device.
+ *
+ * The rte_eth_tx_prepare() function is invoked to prepare output packets to be
+ * transmitted on the output queue *queue_id* of the Ethernet device designated
+ * by its *port_id*.
+ * The *nb_pkts* parameter is the number of packets to be prepared which are
+ * supplied in the *tx_pkts* array of *rte_mbuf* structures, each of them
+ * allocated from a pool created with rte_pktmbuf_pool_create().
+ * For each packet to send, the rte_eth_tx_prepare() function performs
+ * the following operations:
+ *
+ * - Check if the packet meets the device's requirements for tx offloads.
+ *
+ * - Check the limitations on the number of segments.
+ *
+ * - Check additional requirements when debug is enabled.
+ *
+ * - Update and/or reset required checksums when a tx offload is set for the packet.
+ *
+ * Since this function can modify packet data, provided mbufs must be safely
+ * writable (e.g. modified data cannot be in shared segment).
+ *
+ * The rte_eth_tx_prepare() function returns the number of packets ready to be
+ * sent. A return value equal to *nb_pkts* means that all packets are valid and
+ * ready to be sent; otherwise it stops processing at the first invalid packet
+ * and leaves the rest of the packets untouched.
+ *
+ * When this functionality is not implemented in the driver, all packets are
+ * returned untouched.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ *   The value must be a valid port id.
+ * @param queue_id
+ *   The index of the transmit queue through which output packets must be
+ *   sent.
+ *   The value must be in the range [0, nb_tx_queue - 1] previously supplied
+ *   to rte_eth_dev_configure().
+ * @param tx_pkts
+ *   The address of an array of *nb_pkts* pointers to *rte_mbuf* structures
+ *   which contain the output packets.
+ * @param nb_pkts
+ *   The maximum number of packets to process.
+ * @return
+ *   The number of packets correct and ready to be sent. The return value can be
+ *   less than the value of the *nb_pkts* parameter when some packet doesn't
+ *   meet the device's requirements, with rte_errno set appropriately:
+ *   - -EINVAL: offload flags are not correctly set
+ *   - -ENOTSUP: the offload feature is not supported by the hardware
+ *
+ */
+
+#ifndef RTE_ETHDEV_TX_PREPARE_NOOP
+
+static inline uint16_t
+rte_eth_tx_prepare(uint8_t port_id, uint16_t queue_id,
+		struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
+{
+	struct rte_eth_dev *dev;
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+	if (!rte_eth_dev_is_valid_port(port_id)) {
+		RTE_PMD_DEBUG_TRACE("Invalid TX port_id=%d\n", port_id);
+		rte_errno = -EINVAL;
+		return 0;
+	}
+#endif
+
+	dev = &rte_eth_devices[port_id];
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+	if (queue_id >= dev->data->nb_tx_queues) {
+		RTE_PMD_DEBUG_TRACE("Invalid TX queue_id=%d\n", queue_id);
+		rte_errno = -EINVAL;
+		return 0;
+	}
+#endif
+
+	if (!dev->tx_pkt_prepare)
+		return nb_pkts;
+
+	return (*dev->tx_pkt_prepare)(dev->data->tx_queues[queue_id],
+			tx_pkts, nb_pkts);
+}
+
+#else
+
+/*
+ * Native NOOP operation for compilation targets which don't require any
+ * preparation steps, and where a functional NOOP may introduce an unnecessary
+ * performance drop.
+ *
+ * Generally it is not a good idea to turn this on globally, and it should not
+ * be used if the behavior of tx_preparation can change.
+ */
+
+static inline uint16_t
+rte_eth_tx_prepare(__rte_unused uint8_t port_id, __rte_unused uint16_t queue_id,
+		__rte_unused struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
+{
+	return nb_pkts;
+}
+
+#endif
+
 typedef void (*buffer_tx_error_fn)(struct rte_mbuf **unsent, uint16_t count,
 		void *userdata);
 
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index ead7c6e..39ee5ed 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -283,6 +283,19 @@
  */
 #define PKT_TX_OUTER_IPV6    (1ULL << 60)
 
+/**
+ * Bit Mask of all supported packet Tx offload features flags, which can be set
+ * for packet.
+ */
+#define PKT_TX_OFFLOAD_MASK (    \
+		PKT_TX_IP_CKSUM |        \
+		PKT_TX_L4_MASK |         \
+		PKT_TX_OUTER_IP_CKSUM |  \
+		PKT_TX_TCP_SEG |         \
+		PKT_TX_QINQ_PKT |        \
+		PKT_TX_VLAN_PKT |        \
+		PKT_TX_TUNNEL_MASK)
+
 #define __RESERVED           (1ULL << 61) /**< reserved for future mbuf use */
 
 #define IND_ATTACHED_MBUF    (1ULL << 62) /**< Indirect attached mbuf */
@@ -1647,6 +1660,57 @@ static inline int rte_pktmbuf_chain(struct rte_mbuf *head, struct rte_mbuf *tail
 }
 
 /**
+ * Validate general requirements for tx offload in mbuf.
+ *
+ * This function checks correctness and completeness of Tx offload settings.
+ *
+ * @param m
+ *   The packet mbuf to be validated.
+ * @return
+ *   0 if packet is valid
+ */
+static inline int
+rte_validate_tx_offload(const struct rte_mbuf *m)
+{
+	uint64_t ol_flags = m->ol_flags;
+	uint64_t inner_l3_offset = m->l2_len;
+
+	/* Does packet set any of available offloads? */
+	if (!(ol_flags & PKT_TX_OFFLOAD_MASK))
+		return 0;
+
+	if (ol_flags & PKT_TX_OUTER_IP_CKSUM)
+		inner_l3_offset += m->outer_l2_len + m->outer_l3_len;
+
+	/* Headers are fragmented */
+	if (rte_pktmbuf_data_len(m) < inner_l3_offset + m->l3_len + m->l4_len)
+		return -ENOTSUP;
+
+	/* IP checksum can be computed only for an IPv4 packet */
+	if ((ol_flags & PKT_TX_IP_CKSUM) && (ol_flags & PKT_TX_IPV6))
+		return -EINVAL;
+
+	/* IP type not set when required */
+	if (ol_flags & (PKT_TX_L4_MASK | PKT_TX_TCP_SEG))
+		if (!(ol_flags & (PKT_TX_IPV4 | PKT_TX_IPV6)))
+			return -EINVAL;
+
+	/* Check requirements for TSO packet */
+	if (ol_flags & PKT_TX_TCP_SEG)
+		if ((m->tso_segsz == 0) ||
+				((ol_flags & PKT_TX_IPV4) &&
+				!(ol_flags & PKT_TX_IP_CKSUM)))
+			return -EINVAL;
+
+	/* PKT_TX_OUTER_IP_CKSUM set for non outer IPv4 packet. */
+	if ((ol_flags & PKT_TX_OUTER_IP_CKSUM) &&
+			!(ol_flags & PKT_TX_OUTER_IPV4))
+		return -EINVAL;
+
+	return 0;
+}
+
+/**
  * Dump an mbuf structure to a file.
  *
  * Dump all fields for the given packet mbuf and all its associated
diff --git a/lib/librte_net/rte_net.h b/lib/librte_net/rte_net.h
index d4156ae..85f356d 100644
--- a/lib/librte_net/rte_net.h
+++ b/lib/librte_net/rte_net.h
@@ -38,6 +38,11 @@
 extern "C" {
 #endif
 
+#include <rte_ip.h>
+#include <rte_udp.h>
+#include <rte_tcp.h>
+#include <rte_sctp.h>
+
 /**
  * Structure containing header lengths associated to a packet, filled
  * by rte_net_get_ptype().
@@ -86,6 +91,86 @@ struct rte_net_hdr_lens {
 uint32_t rte_net_get_ptype(const struct rte_mbuf *m,
 	struct rte_net_hdr_lens *hdr_lens, uint32_t layers);
 
+/**
+ * Prepare pseudo header checksum
+ *
+ * This function prepares the pseudo-header checksum for TSO and non-TSO
+ * tcp/udp packets in the provided mbuf's packet data.
+ *
+ * - for non-TSO tcp/udp packets the full pseudo-header checksum is computed
+ *   and set in the packet data,
+ * - for TSO the IP payload length is not included in the pseudo header.
+ *
+ * This function expects that the used headers are in the first data segment
+ * of the mbuf, are not fragmented and can be safely modified.
+ *
+ * @param m
+ *   The packet mbuf to be fixed.
+ * @return
+ *   0 if checksum is initialized properly
+ */
+static inline int
+rte_net_intel_cksum_prepare(struct rte_mbuf *m)
+{
+	struct ipv4_hdr *ipv4_hdr;
+	struct ipv6_hdr *ipv6_hdr;
+	struct tcp_hdr *tcp_hdr;
+	struct udp_hdr *udp_hdr;
+	uint64_t ol_flags = m->ol_flags;
+	uint64_t inner_l3_offset = m->l2_len;
+
+	if (ol_flags & PKT_TX_OUTER_IP_CKSUM)
+		inner_l3_offset += m->outer_l2_len + m->outer_l3_len;
+
+	if ((ol_flags & PKT_TX_UDP_CKSUM) == PKT_TX_UDP_CKSUM) {
+		if (ol_flags & PKT_TX_IPV4) {
+			ipv4_hdr = rte_pktmbuf_mtod_offset(m, struct ipv4_hdr *,
+					inner_l3_offset);
+
+			if (ol_flags & PKT_TX_IP_CKSUM)
+				ipv4_hdr->hdr_checksum = 0;
+
+			udp_hdr = (struct udp_hdr *)((char *)ipv4_hdr +
+					m->l3_len);
+			udp_hdr->dgram_cksum = rte_ipv4_phdr_cksum(ipv4_hdr,
+					ol_flags);
+		} else {
+			ipv6_hdr = rte_pktmbuf_mtod_offset(m, struct ipv6_hdr *,
+					inner_l3_offset);
+			/* non-TSO udp */
+			udp_hdr = rte_pktmbuf_mtod_offset(m, struct udp_hdr *,
+					inner_l3_offset + m->l3_len);
+			udp_hdr->dgram_cksum = rte_ipv6_phdr_cksum(ipv6_hdr,
+					ol_flags);
+		}
+	} else if ((ol_flags & PKT_TX_TCP_CKSUM) ||
+			(ol_flags & PKT_TX_TCP_SEG)) {
+		if (ol_flags & PKT_TX_IPV4) {
+			ipv4_hdr = rte_pktmbuf_mtod_offset(m, struct ipv4_hdr *,
+					inner_l3_offset);
+
+			if (ol_flags & PKT_TX_IP_CKSUM)
+				ipv4_hdr->hdr_checksum = 0;
+
+			/* non-TSO tcp or TSO */
+			tcp_hdr = (struct tcp_hdr *)((char *)ipv4_hdr +
+					m->l3_len);
+			tcp_hdr->cksum = rte_ipv4_phdr_cksum(ipv4_hdr,
+					ol_flags);
+		} else {
+			ipv6_hdr = rte_pktmbuf_mtod_offset(m, struct ipv6_hdr *,
+					inner_l3_offset);
+			/* non-TSO tcp or TSO */
+			tcp_hdr = rte_pktmbuf_mtod_offset(m, struct tcp_hdr *,
+					inner_l3_offset + m->l3_len);
+			tcp_hdr->cksum = rte_ipv6_phdr_cksum(ipv6_hdr,
+					ol_flags);
+		}
+	}
+
+	return 0;
+}
+
 #ifdef __cplusplus
 }
 #endif
-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 261+ messages in thread

* [dpdk-dev] [PATCH v13 2/7] e1000: add Tx preparation
  2016-12-13 17:41                       ` [dpdk-dev] [PATCH v13 0/7] " Tomasz Kulasek
  2016-12-13 17:41                         ` [dpdk-dev] [PATCH v13 1/7] ethdev: " Tomasz Kulasek
@ 2016-12-13 17:41                         ` Tomasz Kulasek
  2016-12-13 17:41                         ` [dpdk-dev] [PATCH v13 3/7] fm10k: " Tomasz Kulasek
                                           ` (5 subsequent siblings)
  7 siblings, 0 replies; 261+ messages in thread
From: Tomasz Kulasek @ 2016-12-13 17:41 UTC (permalink / raw)
  To: dev

Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 drivers/net/e1000/e1000_ethdev.h |   11 ++++++++
 drivers/net/e1000/em_ethdev.c    |    5 +++-
 drivers/net/e1000/em_rxtx.c      |   48 ++++++++++++++++++++++++++++++++++-
 drivers/net/e1000/igb_ethdev.c   |    4 +++
 drivers/net/e1000/igb_rxtx.c     |   52 +++++++++++++++++++++++++++++++++++++-
 5 files changed, 117 insertions(+), 3 deletions(-)

diff --git a/drivers/net/e1000/e1000_ethdev.h b/drivers/net/e1000/e1000_ethdev.h
index 6c25c8d..bd0f277 100644
--- a/drivers/net/e1000/e1000_ethdev.h
+++ b/drivers/net/e1000/e1000_ethdev.h
@@ -138,6 +138,11 @@
 #define E1000_MISC_VEC_ID               RTE_INTR_VEC_ZERO_OFFSET
 #define E1000_RX_VEC_START              RTE_INTR_VEC_RXTX_OFFSET
 
+#define IGB_TX_MAX_SEG     UINT8_MAX
+#define IGB_TX_MAX_MTU_SEG UINT8_MAX
+#define EM_TX_MAX_SEG      UINT8_MAX
+#define EM_TX_MAX_MTU_SEG  UINT8_MAX
+
 /* structure for interrupt relative data */
 struct e1000_interrupt {
 	uint32_t flags;
@@ -315,6 +320,9 @@ int eth_igb_tx_queue_setup(struct rte_eth_dev *dev, uint16_t tx_queue_id,
 uint16_t eth_igb_xmit_pkts(void *txq, struct rte_mbuf **tx_pkts,
 		uint16_t nb_pkts);
 
+uint16_t eth_igb_prep_pkts(void *txq, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts);
+
 uint16_t eth_igb_recv_pkts(void *rxq, struct rte_mbuf **rx_pkts,
 		uint16_t nb_pkts);
 
@@ -376,6 +384,9 @@ int eth_em_tx_queue_setup(struct rte_eth_dev *dev, uint16_t tx_queue_id,
 uint16_t eth_em_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		uint16_t nb_pkts);
 
+uint16_t eth_em_prep_pkts(void *txq, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts);
+
 uint16_t eth_em_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
 		uint16_t nb_pkts);
 
diff --git a/drivers/net/e1000/em_ethdev.c b/drivers/net/e1000/em_ethdev.c
index aee3d34..a004ee9 100644
--- a/drivers/net/e1000/em_ethdev.c
+++ b/drivers/net/e1000/em_ethdev.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -300,6 +300,7 @@ static int eth_em_set_mc_addr_list(struct rte_eth_dev *dev,
 	eth_dev->dev_ops = &eth_em_ops;
 	eth_dev->rx_pkt_burst = (eth_rx_burst_t)&eth_em_recv_pkts;
 	eth_dev->tx_pkt_burst = (eth_tx_burst_t)&eth_em_xmit_pkts;
+	eth_dev->tx_pkt_prepare = (eth_tx_prep_t)&eth_em_prep_pkts;
 
 	/* for secondary processes, we don't initialise any further as primary
 	 * has already done this work. Only check we don't need a different
@@ -1079,6 +1080,8 @@ static int eth_em_set_mc_addr_list(struct rte_eth_dev *dev,
 		.nb_max = E1000_MAX_RING_DESC,
 		.nb_min = E1000_MIN_RING_DESC,
 		.nb_align = EM_TXD_ALIGN,
+		.nb_seg_max = EM_TX_MAX_SEG,
+		.nb_mtu_seg_max = EM_TX_MAX_MTU_SEG,
 	};
 
 	dev_info->speed_capa = ETH_LINK_SPEED_10M_HD | ETH_LINK_SPEED_10M |
diff --git a/drivers/net/e1000/em_rxtx.c b/drivers/net/e1000/em_rxtx.c
index 41f51c0..7e271ad 100644
--- a/drivers/net/e1000/em_rxtx.c
+++ b/drivers/net/e1000/em_rxtx.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -66,6 +66,7 @@
 #include <rte_udp.h>
 #include <rte_tcp.h>
 #include <rte_sctp.h>
+#include <rte_net.h>
 #include <rte_string_fns.h>
 
 #include "e1000_logs.h"
@@ -77,6 +78,14 @@
 
 #define E1000_RXDCTL_GRAN	0x01000000 /* RXDCTL Granularity */
 
+#define E1000_TX_OFFLOAD_MASK ( \
+		PKT_TX_IP_CKSUM |       \
+		PKT_TX_L4_MASK |        \
+		PKT_TX_VLAN_PKT)
+
+#define E1000_TX_OFFLOAD_NOTSUP_MASK \
+		(PKT_TX_OFFLOAD_MASK ^ E1000_TX_OFFLOAD_MASK)
+
 /**
  * Structure associated with each descriptor of the RX ring of a RX queue.
  */
@@ -618,6 +627,43 @@ struct em_tx_queue {
 
 /*********************************************************************
  *
+ *  TX prep functions
+ *
+ **********************************************************************/
+uint16_t
+eth_em_prep_pkts(__rte_unused void *tx_queue, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts)
+{
+	int i, ret;
+	struct rte_mbuf *m;
+
+	for (i = 0; i < nb_pkts; i++) {
+		m = tx_pkts[i];
+
+		if (m->ol_flags & E1000_TX_OFFLOAD_NOTSUP_MASK) {
+			rte_errno = -ENOTSUP;
+			return i;
+		}
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+		ret = rte_validate_tx_offload(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+#endif
+		ret = rte_net_intel_cksum_prepare(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+	}
+
+	return i;
+}
+
+/*********************************************************************
+ *
  *  RX functions
  *
  **********************************************************************/
diff --git a/drivers/net/e1000/igb_ethdev.c b/drivers/net/e1000/igb_ethdev.c
index 2fddf0c..015ef46 100644
--- a/drivers/net/e1000/igb_ethdev.c
+++ b/drivers/net/e1000/igb_ethdev.c
@@ -369,6 +369,8 @@ static void eth_igbvf_interrupt_handler(struct rte_intr_handle *handle,
 	.nb_max = E1000_MAX_RING_DESC,
 	.nb_min = E1000_MIN_RING_DESC,
 	.nb_align = IGB_RXD_ALIGN,
+	.nb_seg_max = IGB_TX_MAX_SEG,
+	.nb_mtu_seg_max = IGB_TX_MAX_MTU_SEG,
 };
 
 static const struct eth_dev_ops eth_igb_ops = {
@@ -760,6 +762,7 @@ struct rte_igb_xstats_name_off {
 	eth_dev->dev_ops = &eth_igb_ops;
 	eth_dev->rx_pkt_burst = &eth_igb_recv_pkts;
 	eth_dev->tx_pkt_burst = &eth_igb_xmit_pkts;
+	eth_dev->tx_pkt_prepare = &eth_igb_prep_pkts;
 
 	/* for secondary processes, we don't initialise any further as primary
 	 * has already done this work. Only check we don't need a different
@@ -963,6 +966,7 @@ struct rte_igb_xstats_name_off {
 	eth_dev->dev_ops = &igbvf_eth_dev_ops;
 	eth_dev->rx_pkt_burst = &eth_igb_recv_pkts;
 	eth_dev->tx_pkt_burst = &eth_igb_xmit_pkts;
+	eth_dev->tx_pkt_prepare = &eth_igb_prep_pkts;
 
 	/* for secondary processes, we don't initialise any further as primary
 	 * has already done this work. Only check we don't need a different
diff --git a/drivers/net/e1000/igb_rxtx.c b/drivers/net/e1000/igb_rxtx.c
index dbd37ac..8a3a3db 100644
--- a/drivers/net/e1000/igb_rxtx.c
+++ b/drivers/net/e1000/igb_rxtx.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -65,6 +65,7 @@
 #include <rte_udp.h>
 #include <rte_tcp.h>
 #include <rte_sctp.h>
+#include <rte_net.h>
 #include <rte_string_fns.h>
 
 #include "e1000_logs.h"
@@ -78,6 +79,9 @@
 		PKT_TX_L4_MASK |		 \
 		PKT_TX_TCP_SEG)
 
+#define IGB_TX_OFFLOAD_NOTSUP_MASK \
+		(PKT_TX_OFFLOAD_MASK ^ IGB_TX_OFFLOAD_MASK)
+
 /**
  * Structure associated with each descriptor of the RX ring of a RX queue.
  */
@@ -616,6 +620,51 @@ struct igb_tx_queue {
 
 /*********************************************************************
  *
+ *  TX prep functions
+ *
+ **********************************************************************/
+uint16_t
+eth_igb_prep_pkts(__rte_unused void *tx_queue, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts)
+{
+	int i, ret;
+	struct rte_mbuf *m;
+
+	for (i = 0; i < nb_pkts; i++) {
+		m = tx_pkts[i];
+
+		/* Check some limitations for TSO in hardware */
+		if (m->ol_flags & PKT_TX_TCP_SEG)
+			if ((m->tso_segsz > IGB_TSO_MAX_MSS) || (m->l2_len + m->l3_len +
+					m->l4_len > IGB_TSO_MAX_HDRLEN)) {
+				rte_errno = -EINVAL;
+				return i;
+			}
+
+		if (m->ol_flags & IGB_TX_OFFLOAD_NOTSUP_MASK) {
+			rte_errno = -ENOTSUP;
+			return i;
+		}
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+		ret = rte_validate_tx_offload(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+#endif
+		ret = rte_net_intel_cksum_prepare(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+	}
+
+	return i;
+}
+
+/*********************************************************************
+ *
  *  RX functions
  *
  **********************************************************************/
@@ -1364,6 +1413,7 @@ struct igb_tx_queue {
 
 	igb_reset_tx_queue(txq, dev);
 	dev->tx_pkt_burst = eth_igb_xmit_pkts;
+	dev->tx_pkt_prepare = &eth_igb_prep_pkts;
 	dev->data->tx_queues[queue_idx] = txq;
 
 	return 0;
-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 261+ messages in thread

* [dpdk-dev] [PATCH v13 3/7] fm10k: add Tx preparation
  2016-12-13 17:41                       ` [dpdk-dev] [PATCH v13 0/7] " Tomasz Kulasek
  2016-12-13 17:41                         ` [dpdk-dev] [PATCH v13 1/7] ethdev: " Tomasz Kulasek
  2016-12-13 17:41                         ` [dpdk-dev] [PATCH v13 2/7] e1000: " Tomasz Kulasek
@ 2016-12-13 17:41                         ` Tomasz Kulasek
  2016-12-13 17:41                         ` [dpdk-dev] [PATCH v13 4/7] i40e: " Tomasz Kulasek
                                           ` (4 subsequent siblings)
  7 siblings, 0 replies; 261+ messages in thread
From: Tomasz Kulasek @ 2016-12-13 17:41 UTC (permalink / raw)
  To: dev

Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 drivers/net/fm10k/fm10k.h        |    6 +++++
 drivers/net/fm10k/fm10k_ethdev.c |    5 ++++
 drivers/net/fm10k/fm10k_rxtx.c   |   50 +++++++++++++++++++++++++++++++++++++-
 3 files changed, 60 insertions(+), 1 deletion(-)

diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
index 05aa1a2..c6fed21 100644
--- a/drivers/net/fm10k/fm10k.h
+++ b/drivers/net/fm10k/fm10k.h
@@ -69,6 +69,9 @@
 #define FM10K_MAX_RX_DESC  (FM10K_MAX_RX_RING_SZ / sizeof(union fm10k_rx_desc))
 #define FM10K_MAX_TX_DESC  (FM10K_MAX_TX_RING_SZ / sizeof(struct fm10k_tx_desc))
 
+#define FM10K_TX_MAX_SEG     UINT8_MAX
+#define FM10K_TX_MAX_MTU_SEG UINT8_MAX
+
 /*
  * byte aligment for HW RX data buffer
  * Datasheet requires RX buffer addresses shall either be 512-byte aligned or
@@ -356,6 +359,9 @@ uint16_t fm10k_recv_scattered_pkts(void *rx_queue,
 uint16_t fm10k_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 	uint16_t nb_pkts);
 
+uint16_t fm10k_prep_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
+	uint16_t nb_pkts);
+
 int fm10k_rxq_vec_setup(struct fm10k_rx_queue *rxq);
 int fm10k_rx_vec_condition_check(struct rte_eth_dev *);
 void fm10k_rx_queue_release_mbufs_vec(struct fm10k_rx_queue *rxq);
diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index 923690c..a116822 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -1447,6 +1447,8 @@ static int fm10k_xstats_get_names(__rte_unused struct rte_eth_dev *dev,
 		.nb_max = FM10K_MAX_TX_DESC,
 		.nb_min = FM10K_MIN_TX_DESC,
 		.nb_align = FM10K_MULT_TX_DESC,
+		.nb_seg_max = FM10K_TX_MAX_SEG,
+		.nb_mtu_seg_max = FM10K_TX_MAX_MTU_SEG,
 	};
 
 	dev_info->speed_capa = ETH_LINK_SPEED_1G | ETH_LINK_SPEED_2_5G |
@@ -2755,8 +2757,10 @@ static void __attribute__((cold))
 			fm10k_txq_vec_setup(txq);
 		}
 		dev->tx_pkt_burst = fm10k_xmit_pkts_vec;
+		dev->tx_pkt_prepare = NULL;
 	} else {
 		dev->tx_pkt_burst = fm10k_xmit_pkts;
+		dev->tx_pkt_prepare = fm10k_prep_pkts;
 		PMD_INIT_LOG(DEBUG, "Use regular Tx func");
 	}
 }
@@ -2835,6 +2839,7 @@ static void __attribute__((cold))
 	dev->dev_ops = &fm10k_eth_dev_ops;
 	dev->rx_pkt_burst = &fm10k_recv_pkts;
 	dev->tx_pkt_burst = &fm10k_xmit_pkts;
+	dev->tx_pkt_prepare = &fm10k_prep_pkts;
 
 	/* only initialize in the primary process */
 	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
diff --git a/drivers/net/fm10k/fm10k_rxtx.c b/drivers/net/fm10k/fm10k_rxtx.c
index 32cc7ff..144e5e6 100644
--- a/drivers/net/fm10k/fm10k_rxtx.c
+++ b/drivers/net/fm10k/fm10k_rxtx.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2013-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2013-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -35,6 +35,7 @@
 
 #include <rte_ethdev.h>
 #include <rte_common.h>
+#include <rte_net.h>
 #include "fm10k.h"
 #include "base/fm10k_type.h"
 
@@ -65,6 +66,15 @@ static inline void dump_rxd(union fm10k_rx_desc *rxd)
 }
 #endif
 
+#define FM10K_TX_OFFLOAD_MASK (  \
+		PKT_TX_VLAN_PKT |        \
+		PKT_TX_IP_CKSUM |        \
+		PKT_TX_L4_MASK |         \
+		PKT_TX_TCP_SEG)
+
+#define FM10K_TX_OFFLOAD_NOTSUP_MASK \
+		(PKT_TX_OFFLOAD_MASK ^ FM10K_TX_OFFLOAD_MASK)
+
 /* @note: When this function is changed, make corresponding change to
  * fm10k_dev_supported_ptypes_get()
  */
@@ -597,3 +607,41 @@ static inline void tx_xmit_pkt(struct fm10k_tx_queue *q, struct rte_mbuf *mb)
 
 	return count;
 }
+
+uint16_t
+fm10k_prep_pkts(__rte_unused void *tx_queue, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts)
+{
+	int i, ret;
+	struct rte_mbuf *m;
+
+	for (i = 0; i < nb_pkts; i++) {
+		m = tx_pkts[i];
+
+		if ((m->ol_flags & PKT_TX_TCP_SEG) &&
+				(m->tso_segsz < FM10K_TSO_MINMSS)) {
+			rte_errno = -EINVAL;
+			return i;
+		}
+
+		if (m->ol_flags & FM10K_TX_OFFLOAD_NOTSUP_MASK) {
+			rte_errno = -ENOTSUP;
+			return i;
+		}
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+		ret = rte_validate_tx_offload(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+#endif
+		ret = rte_net_intel_cksum_prepare(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+	}
+
+	return i;
+}
-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 261+ messages in thread

* [dpdk-dev] [PATCH v13 4/7] i40e: add Tx preparation
  2016-12-13 17:41                       ` [dpdk-dev] [PATCH v13 0/7] " Tomasz Kulasek
                                           ` (2 preceding siblings ...)
  2016-12-13 17:41                         ` [dpdk-dev] [PATCH v13 3/7] fm10k: " Tomasz Kulasek
@ 2016-12-13 17:41                         ` Tomasz Kulasek
  2016-12-13 17:41                         ` [dpdk-dev] [PATCH v13 5/7] ixgbe: " Tomasz Kulasek
                                           ` (3 subsequent siblings)
  7 siblings, 0 replies; 261+ messages in thread
From: Tomasz Kulasek @ 2016-12-13 17:41 UTC (permalink / raw)
  To: dev

Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 drivers/net/i40e/i40e_ethdev.c |    3 ++
 drivers/net/i40e/i40e_rxtx.c   |   74 +++++++++++++++++++++++++++++++++++++++-
 drivers/net/i40e/i40e_rxtx.h   |    8 +++++
 3 files changed, 84 insertions(+), 1 deletion(-)

diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
index 67778ba..5761357 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -943,6 +943,7 @@ static inline void i40e_GLQF_reg_init(struct i40e_hw *hw)
 	dev->dev_ops = &i40e_eth_dev_ops;
 	dev->rx_pkt_burst = i40e_recv_pkts;
 	dev->tx_pkt_burst = i40e_xmit_pkts;
+	dev->tx_pkt_prepare = i40e_prep_pkts;
 
 	/* for secondary processes, we don't initialise any further as primary
 	 * has already done this work. Only check we don't need a different
@@ -2645,6 +2646,8 @@ static int i40e_dev_xstats_get_names(__rte_unused struct rte_eth_dev *dev,
 		.nb_max = I40E_MAX_RING_DESC,
 		.nb_min = I40E_MIN_RING_DESC,
 		.nb_align = I40E_ALIGN_RING_DESC,
+		.nb_seg_max = I40E_TX_MAX_SEG,
+		.nb_mtu_seg_max = I40E_TX_MAX_MTU_SEG,
 	};
 
 	if (pf->flags & I40E_FLAG_VMDQ) {
diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
index 7ae7d9f..d248396 100644
--- a/drivers/net/i40e/i40e_rxtx.c
+++ b/drivers/net/i40e/i40e_rxtx.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -50,6 +50,8 @@
 #include <rte_tcp.h>
 #include <rte_sctp.h>
 #include <rte_udp.h>
+#include <rte_ip.h>
+#include <rte_net.h>
 
 #include "i40e_logs.h"
 #include "base/i40e_prototype.h"
@@ -79,6 +81,17 @@
 		PKT_TX_TCP_SEG |		 \
 		PKT_TX_OUTER_IP_CKSUM)
 
+#define I40E_TX_OFFLOAD_MASK (  \
+		PKT_TX_IP_CKSUM |       \
+		PKT_TX_L4_MASK |        \
+		PKT_TX_OUTER_IP_CKSUM | \
+		PKT_TX_TCP_SEG |        \
+		PKT_TX_QINQ_PKT |       \
+		PKT_TX_VLAN_PKT)
+
+#define I40E_TX_OFFLOAD_NOTSUP_MASK \
+		(PKT_TX_OFFLOAD_MASK ^ I40E_TX_OFFLOAD_MASK)
+
 static uint16_t i40e_xmit_pkts_simple(void *tx_queue,
 				      struct rte_mbuf **tx_pkts,
 				      uint16_t nb_pkts);
@@ -1411,6 +1424,63 @@ static inline int __attribute__((always_inline))
 	return nb_tx;
 }
 
+/*********************************************************************
+ *
+ *  TX prep functions
+ *
+ **********************************************************************/
+uint16_t
+i40e_prep_pkts(__rte_unused void *tx_queue, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts)
+{
+	int i, ret;
+	uint64_t ol_flags;
+	struct rte_mbuf *m;
+
+	for (i = 0; i < nb_pkts; i++) {
+		m = tx_pkts[i];
+		ol_flags = m->ol_flags;
+
+		/**
+		 * m->nb_segs is uint8_t, so it can never exceed
+		 * I40E_TX_MAX_SEG (UINT8_MAX); we only need to check
+		 * m->nb_segs > I40E_TX_MAX_MTU_SEG.
+		 */
+		if (!(ol_flags & PKT_TX_TCP_SEG)) {
+			if (m->nb_segs > I40E_TX_MAX_MTU_SEG) {
+				rte_errno = -EINVAL;
+				return i;
+			}
+		} else if ((m->tso_segsz < I40E_MIN_TSO_MSS) ||
+				(m->tso_segsz > I40E_MAX_TSO_MSS)) {
+			/* MSS outside the range (256B - 9674B) are considered
+			 * malicious
+			 */
+			rte_errno = -EINVAL;
+			return i;
+		}
+
+		if (ol_flags & I40E_TX_OFFLOAD_NOTSUP_MASK) {
+			rte_errno = -ENOTSUP;
+			return i;
+		}
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+		ret = rte_validate_tx_offload(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+#endif
+		ret = rte_net_intel_cksum_prepare(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+	}
+	return i;
+}
+
 /*
  * Find the VSI the queue belongs to. 'queue_idx' is the queue index
  * application used, which assume having sequential ones. But from driver's
@@ -2763,9 +2833,11 @@ void __attribute__((cold))
 			PMD_INIT_LOG(DEBUG, "Simple tx finally be used.");
 			dev->tx_pkt_burst = i40e_xmit_pkts_simple;
 		}
+		dev->tx_pkt_prepare = NULL;
 	} else {
 		PMD_INIT_LOG(DEBUG, "Xmit tx finally be used.");
 		dev->tx_pkt_burst = i40e_xmit_pkts;
+		dev->tx_pkt_prepare = i40e_prep_pkts;
 	}
 }
 
diff --git a/drivers/net/i40e/i40e_rxtx.h b/drivers/net/i40e/i40e_rxtx.h
index ecdb13c..9df8a56 100644
--- a/drivers/net/i40e/i40e_rxtx.h
+++ b/drivers/net/i40e/i40e_rxtx.h
@@ -63,6 +63,12 @@
 #define	I40E_MIN_RING_DESC	64
 #define	I40E_MAX_RING_DESC	4096
 
+#define I40E_MIN_TSO_MSS          256
+#define I40E_MAX_TSO_MSS          9674
+
+#define I40E_TX_MAX_SEG     UINT8_MAX
+#define I40E_TX_MAX_MTU_SEG 8
+
 #undef container_of
 #define container_of(ptr, type, member) ({ \
 		typeof(((type *)0)->member)(*__mptr) = (ptr); \
@@ -223,6 +229,8 @@ uint16_t i40e_recv_scattered_pkts(void *rx_queue,
 uint16_t i40e_xmit_pkts(void *tx_queue,
 			struct rte_mbuf **tx_pkts,
 			uint16_t nb_pkts);
+uint16_t i40e_prep_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts);
 int i40e_tx_queue_init(struct i40e_tx_queue *txq);
 int i40e_rx_queue_init(struct i40e_rx_queue *rxq);
 void i40e_free_tx_resources(struct i40e_tx_queue *txq);
-- 
1.7.9.5


* [dpdk-dev] [PATCH v13 5/7] ixgbe: add Tx preparation
  2016-12-13 17:41                       ` [dpdk-dev] [PATCH v13 0/7] " Tomasz Kulasek
                                           ` (3 preceding siblings ...)
  2016-12-13 17:41                         ` [dpdk-dev] [PATCH v13 4/7] i40e: " Tomasz Kulasek
@ 2016-12-13 17:41                         ` Tomasz Kulasek
  2016-12-13 17:41                         ` [dpdk-dev] [PATCH v13 6/7] vmxnet3: " Tomasz Kulasek
                                           ` (2 subsequent siblings)
  7 siblings, 0 replies; 261+ messages in thread
From: Tomasz Kulasek @ 2016-12-13 17:41 UTC (permalink / raw)
  To: dev

Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 drivers/net/ixgbe/ixgbe_ethdev.c |    3 ++
 drivers/net/ixgbe/ixgbe_ethdev.h |    5 +++-
 drivers/net/ixgbe/ixgbe_rxtx.c   |   56 ++++++++++++++++++++++++++++++++++++++
 drivers/net/ixgbe/ixgbe_rxtx.h   |    2 ++
 4 files changed, 65 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
index edc9b22..a75f59d 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -517,6 +517,8 @@ static int ixgbe_dev_udp_tunnel_port_del(struct rte_eth_dev *dev,
 	.nb_max = IXGBE_MAX_RING_DESC,
 	.nb_min = IXGBE_MIN_RING_DESC,
 	.nb_align = IXGBE_TXD_ALIGN,
+	.nb_seg_max = IXGBE_TX_MAX_SEG,
+	.nb_mtu_seg_max = IXGBE_TX_MAX_SEG,
 };
 
 static const struct eth_dev_ops ixgbe_eth_dev_ops = {
@@ -1103,6 +1105,7 @@ struct rte_ixgbe_xstats_name_off {
 	eth_dev->dev_ops = &ixgbe_eth_dev_ops;
 	eth_dev->rx_pkt_burst = &ixgbe_recv_pkts;
 	eth_dev->tx_pkt_burst = &ixgbe_xmit_pkts;
+	eth_dev->tx_pkt_prepare = &ixgbe_prep_pkts;
 
 	/*
 	 * For secondary processes, we don't initialise any further as primary
diff --git a/drivers/net/ixgbe/ixgbe_ethdev.h b/drivers/net/ixgbe/ixgbe_ethdev.h
index 4ff6338..e229cf5 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.h
+++ b/drivers/net/ixgbe/ixgbe_ethdev.h
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -396,6 +396,9 @@ uint16_t ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 uint16_t ixgbe_xmit_pkts_simple(void *tx_queue, struct rte_mbuf **tx_pkts,
 		uint16_t nb_pkts);
 
+uint16_t ixgbe_prep_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts);
+
 int ixgbe_dev_rss_hash_update(struct rte_eth_dev *dev,
 			      struct rte_eth_rss_conf *rss_conf);
 
diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
index b2d9f45..dbe83e7 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx.c
@@ -70,6 +70,7 @@
 #include <rte_string_fns.h>
 #include <rte_errno.h>
 #include <rte_ip.h>
+#include <rte_net.h>
 
 #include "ixgbe_logs.h"
 #include "base/ixgbe_api.h"
@@ -87,6 +88,9 @@
 		PKT_TX_TCP_SEG |		 \
 		PKT_TX_OUTER_IP_CKSUM)
 
+#define IXGBE_TX_OFFLOAD_NOTSUP_MASK \
+		(PKT_TX_OFFLOAD_MASK ^ IXGBE_TX_OFFLOAD_MASK)
+
 #if 1
 #define RTE_PMD_USE_PREFETCH
 #endif
@@ -905,6 +909,56 @@ static inline int __attribute__((always_inline))
 
 /*********************************************************************
  *
+ *  TX prep functions
+ *
+ **********************************************************************/
+uint16_t
+ixgbe_prep_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
+{
+	int i, ret;
+	uint64_t ol_flags;
+	struct rte_mbuf *m;
+	struct ixgbe_tx_queue *txq = (struct ixgbe_tx_queue *)tx_queue;
+
+	for (i = 0; i < nb_pkts; i++) {
+		m = tx_pkts[i];
+		ol_flags = m->ol_flags;
+
+		/**
+		 * Check if the packet meets requirements for the number of segments.
+		 *
+		 * NOTE: for ixgbe the limit is always (40 - WTHRESH) for both TSO and non-TSO.
+		 */
+
+		if (m->nb_segs > IXGBE_TX_MAX_SEG - txq->wthresh) {
+			rte_errno = -EINVAL;
+			return i;
+		}
+
+		if (ol_flags & IXGBE_TX_OFFLOAD_NOTSUP_MASK) {
+			rte_errno = -ENOTSUP;
+			return i;
+		}
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+		ret = rte_validate_tx_offload(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+#endif
+		ret = rte_net_intel_cksum_prepare(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+	}
+
+	return i;
+}
+
+/*********************************************************************
+ *
  *  RX functions
  *
  **********************************************************************/
@@ -2282,6 +2336,7 @@ void __attribute__((cold))
 	if (((txq->txq_flags & IXGBE_SIMPLE_FLAGS) == IXGBE_SIMPLE_FLAGS)
 			&& (txq->tx_rs_thresh >= RTE_PMD_IXGBE_TX_MAX_BURST)) {
 		PMD_INIT_LOG(DEBUG, "Using simple tx code path");
+		dev->tx_pkt_prepare = NULL;
 #ifdef RTE_IXGBE_INC_VECTOR
 		if (txq->tx_rs_thresh <= RTE_IXGBE_TX_MAX_FREE_BUF_SZ &&
 				(rte_eal_process_type() != RTE_PROC_PRIMARY ||
@@ -2302,6 +2357,7 @@ void __attribute__((cold))
 				(unsigned long)txq->tx_rs_thresh,
 				(unsigned long)RTE_PMD_IXGBE_TX_MAX_BURST);
 		dev->tx_pkt_burst = ixgbe_xmit_pkts;
+		dev->tx_pkt_prepare = ixgbe_prep_pkts;
 	}
 }
 
diff --git a/drivers/net/ixgbe/ixgbe_rxtx.h b/drivers/net/ixgbe/ixgbe_rxtx.h
index 2608b36..7bbd9b8 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx.h
+++ b/drivers/net/ixgbe/ixgbe_rxtx.h
@@ -80,6 +80,8 @@
 #define RTE_IXGBE_WAIT_100_US               100
 #define RTE_IXGBE_VMTXSW_REGISTER_COUNT     2
 
+#define IXGBE_TX_MAX_SEG                    40
+
 #define IXGBE_PACKET_TYPE_MASK_82599        0X7F
 #define IXGBE_PACKET_TYPE_MASK_X550         0X10FF
 #define IXGBE_PACKET_TYPE_MASK_TUNNEL       0XFF
-- 
1.7.9.5


* [dpdk-dev] [PATCH v13 6/7] vmxnet3: add Tx preparation
  2016-12-13 17:41                       ` [dpdk-dev] [PATCH v13 0/7] " Tomasz Kulasek
                                           ` (4 preceding siblings ...)
  2016-12-13 17:41                         ` [dpdk-dev] [PATCH v13 5/7] ixgbe: " Tomasz Kulasek
@ 2016-12-13 17:41                         ` Tomasz Kulasek
  2016-12-13 18:15                           ` Yong Wang
  2016-12-20 13:36                           ` Ferruh Yigit
  2016-12-13 17:41                         ` [dpdk-dev] [PATCH v13 7/7] testpmd: use Tx preparation in csum engine Tomasz Kulasek
  2016-12-22 13:05                         ` [dpdk-dev] [PATCH v14 0/8] add Tx preparation Tomasz Kulasek
  7 siblings, 2 replies; 261+ messages in thread
From: Tomasz Kulasek @ 2016-12-13 17:41 UTC (permalink / raw)
  To: dev; +Cc: Ananyev, Konstantin

From: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 drivers/net/vmxnet3/vmxnet3_ethdev.c |    4 +++
 drivers/net/vmxnet3/vmxnet3_ethdev.h |    2 ++
 drivers/net/vmxnet3/vmxnet3_rxtx.c   |   57 ++++++++++++++++++++++++++++++++++
 3 files changed, 63 insertions(+)

diff --git a/drivers/net/vmxnet3/vmxnet3_ethdev.c b/drivers/net/vmxnet3/vmxnet3_ethdev.c
index 8bb13e5..f85be91 100644
--- a/drivers/net/vmxnet3/vmxnet3_ethdev.c
+++ b/drivers/net/vmxnet3/vmxnet3_ethdev.c
@@ -237,6 +237,7 @@ static void vmxnet3_mac_addr_set(struct rte_eth_dev *dev,
 	eth_dev->dev_ops = &vmxnet3_eth_dev_ops;
 	eth_dev->rx_pkt_burst = &vmxnet3_recv_pkts;
 	eth_dev->tx_pkt_burst = &vmxnet3_xmit_pkts;
+	eth_dev->tx_pkt_prepare = vmxnet3_prep_pkts;
 	pci_dev = eth_dev->pci_dev;
 
 	/*
@@ -326,6 +327,7 @@ static void vmxnet3_mac_addr_set(struct rte_eth_dev *dev,
 	eth_dev->dev_ops = NULL;
 	eth_dev->rx_pkt_burst = NULL;
 	eth_dev->tx_pkt_burst = NULL;
+	eth_dev->tx_pkt_prepare = NULL;
 
 	rte_free(eth_dev->data->mac_addrs);
 	eth_dev->data->mac_addrs = NULL;
@@ -728,6 +730,8 @@ static void vmxnet3_mac_addr_set(struct rte_eth_dev *dev,
 		.nb_max = VMXNET3_TX_RING_MAX_SIZE,
 		.nb_min = VMXNET3_DEF_TX_RING_SIZE,
 		.nb_align = 1,
+		.nb_seg_max = UINT8_MAX,
+		.nb_mtu_seg_max = VMXNET3_MAX_TXD_PER_PKT,
 	};
 
 	dev_info->rx_offload_capa =
diff --git a/drivers/net/vmxnet3/vmxnet3_ethdev.h b/drivers/net/vmxnet3/vmxnet3_ethdev.h
index 7d3b11e..469db71 100644
--- a/drivers/net/vmxnet3/vmxnet3_ethdev.h
+++ b/drivers/net/vmxnet3/vmxnet3_ethdev.h
@@ -171,5 +171,7 @@ uint16_t vmxnet3_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
 			   uint16_t nb_pkts);
 uint16_t vmxnet3_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 			   uint16_t nb_pkts);
+uint16_t vmxnet3_prep_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
+			uint16_t nb_pkts);
 
 #endif /* _VMXNET3_ETHDEV_H_ */
diff --git a/drivers/net/vmxnet3/vmxnet3_rxtx.c b/drivers/net/vmxnet3/vmxnet3_rxtx.c
index b109168..0c35738 100644
--- a/drivers/net/vmxnet3/vmxnet3_rxtx.c
+++ b/drivers/net/vmxnet3/vmxnet3_rxtx.c
@@ -69,6 +69,7 @@
 #include <rte_sctp.h>
 #include <rte_string_fns.h>
 #include <rte_errno.h>
+#include <rte_net.h>
 
 #include "base/vmxnet3_defs.h"
 #include "vmxnet3_ring.h"
@@ -76,6 +77,14 @@
 #include "vmxnet3_logs.h"
 #include "vmxnet3_ethdev.h"
 
+#define	VMXNET3_TX_OFFLOAD_MASK	( \
+		PKT_TX_VLAN_PKT | \
+		PKT_TX_L4_MASK |  \
+		PKT_TX_TCP_SEG)
+
+#define	VMXNET3_TX_OFFLOAD_NOTSUP_MASK	\
+	(PKT_TX_OFFLOAD_MASK ^ VMXNET3_TX_OFFLOAD_MASK)
+
 static const uint32_t rxprod_reg[2] = {VMXNET3_REG_RXPROD, VMXNET3_REG_RXPROD2};
 
 static int vmxnet3_post_rx_bufs(vmxnet3_rx_queue_t*, uint8_t);
@@ -350,6 +359,54 @@
 }
 
 uint16_t
+vmxnet3_prep_pkts(__rte_unused void *tx_queue, struct rte_mbuf **tx_pkts,
+	uint16_t nb_pkts)
+{
+	int32_t ret;
+	uint32_t i;
+	uint64_t ol_flags;
+	struct rte_mbuf *m;
+
+	for (i = 0; i != nb_pkts; i++) {
+		m = tx_pkts[i];
+		ol_flags = m->ol_flags;
+
+		/*
+		 * Non-TSO packet cannot occupy more than
+		 * VMXNET3_MAX_TXD_PER_PKT TX descriptors.
+		 */
+		if ((ol_flags & PKT_TX_TCP_SEG) == 0 &&
+				m->nb_segs > VMXNET3_MAX_TXD_PER_PKT) {
+			rte_errno = -EINVAL;
+			return i;
+		}
+
+		/* check that only supported TX offloads are requested. */
+		if ((ol_flags & VMXNET3_TX_OFFLOAD_NOTSUP_MASK) != 0 ||
+				(ol_flags & PKT_TX_L4_MASK) ==
+				PKT_TX_SCTP_CKSUM) {
+			rte_errno = -EINVAL;
+			return i;
+		}
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+		ret = rte_validate_tx_offload(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+#endif
+		ret = rte_net_intel_cksum_prepare(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+	}
+
+	return i;
+}
+
+uint16_t
 vmxnet3_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		  uint16_t nb_pkts)
 {
-- 
1.7.9.5


* [dpdk-dev] [PATCH v13 7/7] testpmd: use Tx preparation in csum engine
  2016-12-13 17:41                       ` [dpdk-dev] [PATCH v13 0/7] " Tomasz Kulasek
                                           ` (5 preceding siblings ...)
  2016-12-13 17:41                         ` [dpdk-dev] [PATCH v13 6/7] vmxnet3: " Tomasz Kulasek
@ 2016-12-13 17:41                         ` Tomasz Kulasek
  2016-12-22 13:05                         ` [dpdk-dev] [PATCH v14 0/8] add Tx preparation Tomasz Kulasek
  7 siblings, 0 replies; 261+ messages in thread
From: Tomasz Kulasek @ 2016-12-13 17:41 UTC (permalink / raw)
  To: dev

Added "csum txprep (on|off)" command which allows to switch to the
tx path using Tx preparation API.

By default unchanged implementation is used.

Using Tx preparation path, pseudo header calculation for udp/tcp/tso
packets from application, and used Tx preparation API for
packet preparation and verification.

Adding additional step to the csum engine costs about 3-4% of performance
drop, on my setup with ixgbe driver. It's caused mostly by the need
of reaccessing and modification of packet data.

Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
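For reference, a minimal testpmd session exercising the new path might look
like this (a sketch; port 0 assumed, "csum set" and "set fwd" are the
pre-existing testpmd checksum and forwarding commands):

	testpmd> set fwd csum
	testpmd> csum set ip hw 0
	testpmd> csum txprep on
	testpmd> start
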
 app/test-pmd/cmdline.c                      |   49 +++++++++++++++++++++++++++
 app/test-pmd/csumonly.c                     |   33 ++++++++++++++----
 app/test-pmd/testpmd.c                      |    5 +++
 app/test-pmd/testpmd.h                      |    2 ++
 doc/guides/testpmd_app_ug/testpmd_funcs.rst |   13 +++++++
 5 files changed, 95 insertions(+), 7 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index d03a592..499a00b 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -366,6 +366,10 @@ static void cmd_help_long_parsed(void *parsed_result,
 			"csum show (port_id)\n"
 			"    Display tx checksum offload configuration\n\n"
 
+			"csum txprep (on|off)"
+			"    Enable tx preparation path in csum forward engine"
+			"\n\n"
+
 			"tso set (segsize) (portid)\n"
 			"    Enable TCP Segmentation Offload in csum forward"
 			" engine.\n"
@@ -3528,6 +3532,50 @@ struct cmd_csum_tunnel_result {
 	},
 };
 
+/* Enable/disable tx preparation path */
+struct cmd_csum_txprep_result {
+	cmdline_fixed_string_t csum;
+	cmdline_fixed_string_t parse;
+	cmdline_fixed_string_t onoff;
+};
+
+static void
+cmd_csum_txprep_parsed(void *parsed_result,
+		       __attribute__((unused)) struct cmdline *cl,
+		       __attribute__((unused)) void *data)
+{
+	struct cmd_csum_txprep_result *res = parsed_result;
+
+	if (!strcmp(res->onoff, "on"))
+		tx_prepare = 1;
+	else
+		tx_prepare = 0;
+
+}
+
+cmdline_parse_token_string_t cmd_csum_txprep_csum =
+	TOKEN_STRING_INITIALIZER(struct cmd_csum_txprep_result,
+				csum, "csum");
+cmdline_parse_token_string_t cmd_csum_txprep_parse =
+	TOKEN_STRING_INITIALIZER(struct cmd_csum_txprep_result,
+				parse, "txprep");
+cmdline_parse_token_string_t cmd_csum_txprep_onoff =
+	TOKEN_STRING_INITIALIZER(struct cmd_csum_txprep_result,
+				onoff, "on#off");
+
+cmdline_parse_inst_t cmd_csum_txprep = {
+	.f = cmd_csum_txprep_parsed,
+	.data = NULL,
+	.help_str = "csum txprep on|off: Enable/Disable tx preparation path "
+			"for csum engine",
+	.tokens = {
+		(void *)&cmd_csum_txprep_csum,
+		(void *)&cmd_csum_txprep_parse,
+		(void *)&cmd_csum_txprep_onoff,
+		NULL,
+	},
+};
+
 /* *** ENABLE HARDWARE SEGMENTATION IN TX NON-TUNNELED PACKETS *** */
 struct cmd_tso_set_result {
 	cmdline_fixed_string_t tso;
@@ -11518,6 +11566,7 @@ struct cmd_set_vf_mac_addr_result {
 	(cmdline_parse_inst_t *)&cmd_csum_set,
 	(cmdline_parse_inst_t *)&cmd_csum_show,
 	(cmdline_parse_inst_t *)&cmd_csum_tunnel,
+	(cmdline_parse_inst_t *)&cmd_csum_txprep,
 	(cmdline_parse_inst_t *)&cmd_tso_set,
 	(cmdline_parse_inst_t *)&cmd_tso_show,
 	(cmdline_parse_inst_t *)&cmd_tunnel_tso_set,
diff --git a/app/test-pmd/csumonly.c b/app/test-pmd/csumonly.c
index 57e6ae2..3afa9ab 100644
--- a/app/test-pmd/csumonly.c
+++ b/app/test-pmd/csumonly.c
@@ -372,8 +372,10 @@ struct simple_gre_hdr {
 			udp_hdr->dgram_cksum = 0;
 			if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_UDP_CKSUM) {
 				ol_flags |= PKT_TX_UDP_CKSUM;
-				udp_hdr->dgram_cksum = get_psd_sum(l3_hdr,
-					info->ethertype, ol_flags);
+				if (!tx_prepare)
+					udp_hdr->dgram_cksum = get_psd_sum(
+							l3_hdr, info->ethertype,
+							ol_flags);
 			} else {
 				udp_hdr->dgram_cksum =
 					get_udptcp_checksum(l3_hdr, udp_hdr,
@@ -385,12 +387,15 @@ struct simple_gre_hdr {
 		tcp_hdr->cksum = 0;
 		if (tso_segsz) {
 			ol_flags |= PKT_TX_TCP_SEG;
-			tcp_hdr->cksum = get_psd_sum(l3_hdr, info->ethertype,
-				ol_flags);
+			if (!tx_prepare)
+				tcp_hdr->cksum = get_psd_sum(l3_hdr,
+						info->ethertype, ol_flags);
+
 		} else if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_TCP_CKSUM) {
 			ol_flags |= PKT_TX_TCP_CKSUM;
-			tcp_hdr->cksum = get_psd_sum(l3_hdr, info->ethertype,
-				ol_flags);
+			if (!tx_prepare)
+				tcp_hdr->cksum = get_psd_sum(l3_hdr,
+						info->ethertype, ol_flags);
 		} else {
 			tcp_hdr->cksum =
 				get_udptcp_checksum(l3_hdr, tcp_hdr,
@@ -648,6 +653,7 @@ struct simple_gre_hdr {
 	void *l3_hdr = NULL, *outer_l3_hdr = NULL; /* can be IPv4 or IPv6 */
 	uint16_t nb_rx;
 	uint16_t nb_tx;
+	uint16_t nb_prep;
 	uint16_t i;
 	uint64_t rx_ol_flags, tx_ol_flags;
 	uint16_t testpmd_ol_flags;
@@ -857,7 +863,20 @@ struct simple_gre_hdr {
 			printf("\n");
 		}
 	}
-	nb_tx = rte_eth_tx_burst(fs->tx_port, fs->tx_queue, pkts_burst, nb_rx);
+
+	if (tx_prepare) {
+		nb_prep = rte_eth_tx_prepare(fs->tx_port, fs->tx_queue,
+				pkts_burst, nb_rx);
+		if (nb_prep != nb_rx)
+			printf("Preparing packet burst to transmit failed: %s\n",
+					rte_strerror(rte_errno));
+
+		nb_tx = rte_eth_tx_burst(fs->tx_port, fs->tx_queue, pkts_burst,
+				nb_prep);
+	} else
+		nb_tx = rte_eth_tx_burst(fs->tx_port, fs->tx_queue, pkts_burst,
+				nb_rx);
+
 	/*
 	 * Retry if necessary
 	 */
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index a0332c2..634f10b 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -180,6 +180,11 @@ struct fwd_engine * fwd_engines[] = {
 enum tx_pkt_split tx_pkt_split = TX_PKT_SPLIT_OFF;
 /**< Split policy for packets to TX. */
 
+/*
+ * Enable Tx preparation path in the "csum" engine.
+ */
+uint8_t tx_prepare;
+
 uint16_t nb_pkt_per_burst = DEF_PKT_BURST; /**< Number of packets per burst. */
 uint16_t mb_mempool_cache = DEF_MBUF_CACHE; /**< Size of mbuf mempool cache. */
 
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index 9c1e703..488a6e1 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -383,6 +383,8 @@ enum tx_pkt_split {
 
 extern enum tx_pkt_split tx_pkt_split;
 
+extern uint8_t tx_prepare;
+
 extern uint16_t nb_pkt_per_burst;
 extern uint16_t mb_mempool_cache;
 extern int8_t rx_pthresh;
diff --git a/doc/guides/testpmd_app_ug/testpmd_funcs.rst b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
index f1c269a..d77336e 100644
--- a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
+++ b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
@@ -750,6 +750,19 @@ Display tx checksum offload configuration::
 
    testpmd> csum show (port_id)
 
+csum txprep
+~~~~~~~~~~~
+
+Select TX preparation path for the ``csum`` forwarding engine::
+
+   testpmd> csum txprep (on|off)
+
+If enabled, the csum forward engine uses the TX preparation API for full packet
+preparation and verification before the TX burst.
+
+If disabled, the csum engine initializes all required fields at the application
+level and the TX preparation stage is not executed.
+
 tso set
 ~~~~~~~
 
-- 
1.7.9.5


* Re: [dpdk-dev] [PATCH v13 6/7] vmxnet3: add Tx preparation
  2016-12-13 17:41                         ` [dpdk-dev] [PATCH v13 6/7] vmxnet3: " Tomasz Kulasek
@ 2016-12-13 18:15                           ` Yong Wang
  2016-12-20 13:36                           ` Ferruh Yigit
  1 sibling, 0 replies; 261+ messages in thread
From: Yong Wang @ 2016-12-13 18:15 UTC (permalink / raw)
  To: Tomasz Kulasek, dev; +Cc: Ananyev, Konstantin

Looks good and two nits below.

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Tomasz Kulasek
> Sent: Tuesday, December 13, 2016 9:42 AM
> To: dev@dpdk.org
> Cc: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> Subject: [dpdk-dev] [PATCH v13 6/7] vmxnet3: add Tx preparation
> 
> From: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>
> 
> Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> ---
>  drivers/net/vmxnet3/vmxnet3_ethdev.c |    4 +++
>  drivers/net/vmxnet3/vmxnet3_ethdev.h |    2 ++
>  drivers/net/vmxnet3/vmxnet3_rxtx.c   |   57
> ++++++++++++++++++++++++++++++++++
>  3 files changed, 63 insertions(+)
> 
> diff --git a/drivers/net/vmxnet3/vmxnet3_ethdev.c b/drivers/net/vmxnet3/vmxnet3_ethdev.c
> index 8bb13e5..f85be91 100644
> --- a/drivers/net/vmxnet3/vmxnet3_ethdev.c
> +++ b/drivers/net/vmxnet3/vmxnet3_ethdev.c
> @@ -237,6 +237,7 @@ static void vmxnet3_mac_addr_set(struct rte_eth_dev *dev,
>  	eth_dev->dev_ops = &vmxnet3_eth_dev_ops;
>  	eth_dev->rx_pkt_burst = &vmxnet3_recv_pkts;
>  	eth_dev->tx_pkt_burst = &vmxnet3_xmit_pkts;
> +	eth_dev->tx_pkt_prepare = vmxnet3_prep_pkts;
>  	pci_dev = eth_dev->pci_dev;
> 
>  	/*
> @@ -326,6 +327,7 @@ static void vmxnet3_mac_addr_set(struct rte_eth_dev *dev,
>  	eth_dev->dev_ops = NULL;
>  	eth_dev->rx_pkt_burst = NULL;
>  	eth_dev->tx_pkt_burst = NULL;
> +	eth_dev->tx_pkt_prepare = NULL;
> 
>  	rte_free(eth_dev->data->mac_addrs);
>  	eth_dev->data->mac_addrs = NULL;
> @@ -728,6 +730,8 @@ static void vmxnet3_mac_addr_set(struct rte_eth_dev *dev,
>  		.nb_max = VMXNET3_TX_RING_MAX_SIZE,
>  		.nb_min = VMXNET3_DEF_TX_RING_SIZE,
>  		.nb_align = 1,
> +		.nb_seg_max = UINT8_MAX,

To be consistent with other drivers, can you define VMXNET3_TX_MAX_SEG as UINT8_MAX and use it here?
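A sketch of the suggested change (VMXNET3_TX_MAX_SEG is the name proposed
above, not something already defined in the tree):

	#define VMXNET3_TX_MAX_SEG	UINT8_MAX

and then, in the dev_info limits quoted above:

	.nb_seg_max = VMXNET3_TX_MAX_SEG,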

> +		.nb_mtu_seg_max = VMXNET3_MAX_TXD_PER_PKT,
>  	};
> 
>  	dev_info->rx_offload_capa =
> diff --git a/drivers/net/vmxnet3/vmxnet3_ethdev.h b/drivers/net/vmxnet3/vmxnet3_ethdev.h
> index 7d3b11e..469db71 100644
> --- a/drivers/net/vmxnet3/vmxnet3_ethdev.h
> +++ b/drivers/net/vmxnet3/vmxnet3_ethdev.h
> @@ -171,5 +171,7 @@ uint16_t vmxnet3_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
>  			   uint16_t nb_pkts);
>  uint16_t vmxnet3_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
>  			   uint16_t nb_pkts);
> +uint16_t vmxnet3_prep_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
> +			uint16_t nb_pkts);
> 
>  #endif /* _VMXNET3_ETHDEV_H_ */
> diff --git a/drivers/net/vmxnet3/vmxnet3_rxtx.c b/drivers/net/vmxnet3/vmxnet3_rxtx.c
> index b109168..0c35738 100644
> --- a/drivers/net/vmxnet3/vmxnet3_rxtx.c
> +++ b/drivers/net/vmxnet3/vmxnet3_rxtx.c
> @@ -69,6 +69,7 @@
>  #include <rte_sctp.h>
>  #include <rte_string_fns.h>
>  #include <rte_errno.h>
> +#include <rte_net.h>
> 
>  #include "base/vmxnet3_defs.h"
>  #include "vmxnet3_ring.h"
> @@ -76,6 +77,14 @@
>  #include "vmxnet3_logs.h"
>  #include "vmxnet3_ethdev.h"
> 
> +#define	VMXNET3_TX_OFFLOAD_MASK	( \
> +		PKT_TX_VLAN_PKT | \
> +		PKT_TX_L4_MASK |  \
> +		PKT_TX_TCP_SEG)
> +
> +#define	VMXNET3_TX_OFFLOAD_NOTSUP_MASK	\
> +	(PKT_TX_OFFLOAD_MASK ^ VMXNET3_TX_OFFLOAD_MASK)
> +
>  static const uint32_t rxprod_reg[2] = {VMXNET3_REG_RXPROD, VMXNET3_REG_RXPROD2};
> 
>  static int vmxnet3_post_rx_bufs(vmxnet3_rx_queue_t*, uint8_t);
> @@ -350,6 +359,54 @@
>  }
> 
>  uint16_t
> +vmxnet3_prep_pkts(__rte_unused void *tx_queue, struct rte_mbuf **tx_pkts,
> +	uint16_t nb_pkts)
> +{
> +	int32_t ret;
> +	uint32_t i;
> +	uint64_t ol_flags;
> +	struct rte_mbuf *m;
> +
> +	for (i = 0; i != nb_pkts; i++) {
> +		m = tx_pkts[i];
> +		ol_flags = m->ol_flags;
> +
> +		/*
> +		 * Non-TSO packet cannot occupy more than
> +		 * VMXNET3_MAX_TXD_PER_PKT TX descriptors.
> +		 */
> +		if ((ol_flags & PKT_TX_TCP_SEG) == 0 &&
> +				m->nb_segs > VMXNET3_MAX_TXD_PER_PKT) {
> +			rte_errno = -EINVAL;
> +			return i;
> +		}
> +
> +		/* check that only supported TX offloads are requested. */
> +		if ((ol_flags & VMXNET3_TX_OFFLOAD_NOTSUP_MASK) != 0 ||
> +				(ol_flags & PKT_TX_L4_MASK) ==
> +				PKT_TX_SCTP_CKSUM) {
> +			rte_errno = -EINVAL;

Return ENOTSUP instead of EINVAL here?

> +			return i;
> +		}
> +
> +#ifdef RTE_LIBRTE_ETHDEV_DEBUG
> +		ret = rte_validate_tx_offload(m);
> +		if (ret != 0) {
> +			rte_errno = ret;
> +			return i;
> +		}
> +#endif
> +		ret = rte_net_intel_cksum_prepare(m);
> +		if (ret != 0) {
> +			rte_errno = ret;
> +			return i;
> +		}
> +	}
> +
> +	return i;
> +}
> +
> +uint16_t
>  vmxnet3_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
>  		  uint16_t nb_pkts)
>  {
> --
> 1.7.9.5


* Re: [dpdk-dev] [PATCH v12 0/6] add Tx preparation
  2016-12-12 15:02                                             ` Ananyev, Konstantin
@ 2016-12-16  0:15                                               ` Ananyev, Konstantin
  2016-12-16 13:53                                                 ` Jan Mędala
  0 siblings, 1 reply; 261+ messages in thread
From: Ananyev, Konstantin @ 2016-12-16  0:15 UTC (permalink / raw)
  To: 'Jan Medala', Yigit, Ferruh
  Cc: 'Thomas Monjalon', 'Harish Patil',
	'dev@dpdk.org', 'Rahul Lakkireddy',
	'Stephen Hurd', 'Jakub Palider',
	'John Daley', 'Adrien Mazarguil',
	'Alejandro Lucero', 'Rasesh Mody',
	'Jacob, Jerin', 'Yuanhan Liu',
	Kulasek, TomaszX, 'olivier.matz@6wind.com',
	'Yong Wang',
	evgenys, netanel


Hi Jan,

> 
> Hi Jan,
> 
> >Hello,
> 
> >Sorry for the late response.
> 
> >From the ENA perspective, we need to dig deeper into the requirements and use cases, but I'm pretty confident (95%) that ena will
> need to implement the tx_prep() API. There is at least one scenario where HW relies on partial checksum.
> 
> Could you let us know as soon as you figure it out.
> Hopefully we will still be able to make the 17.02 integration deadline.

Is there any update on that subject?
There is not much time left before the 17.02 integration deadline, and we are sort of stuck
because there is a lack of response from the ENA device maintainers.
Could you at least point us to some sort of spec for that device,
or maybe to some people who might answer that question?
From what I can see looking at Linux ENA driver, it is not doing anything special for TSO.
So it seems that standard pseudo-header checksum calculation should be enough.
Is that correct?
Thanks
Konstantin  


> Thanks
> Konstantin


* Re: [dpdk-dev] [PATCH v12 0/6] add Tx preparation
  2016-12-16  0:15                                               ` Ananyev, Konstantin
@ 2016-12-16 13:53                                                 ` Jan Mędala
  2016-12-16 15:27                                                   ` Ananyev, Konstantin
  0 siblings, 1 reply; 261+ messages in thread
From: Jan Mędala @ 2016-12-16 13:53 UTC (permalink / raw)
  To: Ananyev, Konstantin
  Cc: Yigit, Ferruh, Thomas Monjalon, Harish Patil, dev,
	Rahul Lakkireddy, Stephen Hurd, Jakub Palider, John Daley,
	Adrien Mazarguil, Alejandro Lucero, Rasesh Mody, Jacob, Jerin,
	Yuanhan Liu, Kulasek, TomaszX, olivier.matz, Yong Wang, evgenys,
	netanel

Hello,


> Is there any update on that subject?
>

At this point we only need the pseudo-header checksum for TSO. Maybe
there will be new requirements, but that's something I cannot predict at
this point.


> So it seems that standard pseudo-header checksum calculation should be
> enough.
> Is that correct?
>

Yes, this will be enough at this point.
We also need to deliver a fix for an issue we found during investigation of
the tx_prep() API.

Thanks,
Jan


* Re: [dpdk-dev] [PATCH v12 0/6] add Tx preparation
  2016-12-16 13:53                                                 ` Jan Mędala
@ 2016-12-16 15:27                                                   ` Ananyev, Konstantin
  2016-12-16 15:37                                                     ` Jan Mędala
  0 siblings, 1 reply; 261+ messages in thread
From: Ananyev, Konstantin @ 2016-12-16 15:27 UTC (permalink / raw)
  To: Jan Medala
  Cc: Yigit, Ferruh, Thomas Monjalon, Harish Patil, dev,
	Rahul Lakkireddy, Stephen Hurd, Jakub Palider, John Daley,
	Adrien Mazarguil, Alejandro Lucero, Rasesh Mody, Jacob, Jerin,
	Yuanhan Liu, Kulasek, TomaszX, olivier.matz, Yong Wang, evgenys,
	netanel

Hi Jan,


>Hello,

>>Is there any update on that subject?

>At this point we only need the pseudo-header checksum for TSO. Maybe there will be new requirements, but that's something I cannot predict at this point.

Ok great, then we'll add a patch for ENA for v14, unless you guys would like to do it yourself.
 
>>So it seems that standard pseudo-header checksum calculation should be enough.
>>Is that correct?

>Yes, this will be enough at this point.
>We also need to deliver a fix for an issue we found during investigation of the tx_prep() API.

Not sure what you are talking about...
Is that some separate issue not related directly to tx_prepare()?

Thanks
Konstantin



* Re: [dpdk-dev] [PATCH v12 0/6] add Tx preparation
  2016-12-16 15:27                                                   ` Ananyev, Konstantin
@ 2016-12-16 15:37                                                     ` Jan Mędala
  0 siblings, 0 replies; 261+ messages in thread
From: Jan Mędala @ 2016-12-16 15:37 UTC (permalink / raw)
  To: Ananyev, Konstantin
  Cc: Yigit, Ferruh, Thomas Monjalon, Harish Patil, dev,
	Rahul Lakkireddy, Stephen Hurd, Jakub Palider, John Daley,
	Adrien Mazarguil, Alejandro Lucero, Rasesh Mody, Jacob, Jerin,
	Yuanhan Liu, Kulasek, TomaszX, olivier.matz, Yong Wang, evgenys,
	netanel

>
> >At this point we only need the pseudo-header checksum for TSO. Maybe
> there will be new requirements, but that's something I cannot predict at
> this point.
>
> Ok great, then we'll add a patch for ENA for v14, unless you guys would
> like to do it yourself.
> ​

That'd be great!

>We also need to deliver a fix for an issue we found during investigation of
> the tx_prep() API.
>
> Not sure what you are talking about...
> Is that some separate issue not related directly to tx_prepare()?
>

Yes, this is something not related, but came up during investigation of
tx_prepare(), so this is an additional benefit of this topic :-)

Thanks
Jan


* Re: [dpdk-dev] [PATCH v13 6/7] vmxnet3: add Tx preparation
  2016-12-13 17:41                         ` [dpdk-dev] [PATCH v13 6/7] vmxnet3: " Tomasz Kulasek
  2016-12-13 18:15                           ` Yong Wang
@ 2016-12-20 13:36                           ` Ferruh Yigit
  2016-12-22 13:10                             ` Thomas Monjalon
  1 sibling, 1 reply; 261+ messages in thread
From: Ferruh Yigit @ 2016-12-20 13:36 UTC (permalink / raw)
  To: Tomasz Kulasek, dev; +Cc: Ananyev, Konstantin

On 12/13/2016 5:41 PM, Tomasz Kulasek wrote:
> From: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>
> 
> Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> ---

<...>

>  
>  uint16_t
> +vmxnet3_prep_pkts(__rte_unused void *tx_queue, struct rte_mbuf **tx_pkts,
> +	uint16_t nb_pkts)
> +{
<...>
> +
> +#ifdef RTE_LIBRTE_ETHDEV_DEBUG
> +		ret = rte_validate_tx_offload(m);
> +		if (ret != 0) {
> +			rte_errno = ret;
> +			return i;
> +		}
> +#endif
> +		ret = rte_net_intel_cksum_prepare(m);

Since this API is used beyond Intel drivers, what do you think about renaming it?
rte_net_generic_cksum_prepare() ?

> +		if (ret != 0) {
> +			rte_errno = ret;
> +			return i;
> +		}
> +	}
> +
> +	return i;
> +}
<...>


* [dpdk-dev] [PATCH v14 0/8] add Tx preparation
  2016-12-13 17:41                       ` [dpdk-dev] [PATCH v13 0/7] " Tomasz Kulasek
                                           ` (6 preceding siblings ...)
  2016-12-13 17:41                         ` [dpdk-dev] [PATCH v13 7/7] testpmd: use Tx preparation in csum engine Tomasz Kulasek
@ 2016-12-22 13:05                         ` Tomasz Kulasek
  2016-12-22 13:05                           ` [dpdk-dev] [PATCH v14 1/8] ethdev: " Tomasz Kulasek
                                             ` (8 more replies)
  7 siblings, 9 replies; 261+ messages in thread
From: Tomasz Kulasek @ 2016-12-22 13:05 UTC (permalink / raw)
  To: dev

As discussed in that thread:

http://dpdk.org/ml/archives/dev/2015-September/023603.html

Different NIC models depending on HW offload requested might impose
different requirements on packets to be TX-ed in terms of:

 - Max number of fragments per packet allowed
 - Max number of fragments per TSO segments
 - The way pseudo-header checksum should be pre-calculated
 - L3/L4 header fields filling
 - etc.


MOTIVATION:
-----------

1) Some work cannot (and should not) be done in rte_eth_tx_burst.
   However, this work is sometimes required, and currently it is left
   to the application.

2) Different hardware may have different requirements for TX offloads;
   a different subset may be supported, and so on.

3) Some parameters (e.g. the number of segments in the ixgbe driver) may
   hang the device. These parameters may vary for different devices.

   For example, i40e HW allows 8 fragments per packet, but that is after
   TSO segmentation, while ixgbe has a 38-fragment pre-TSO limit.

4) Fields in the packet may require different initialization (e.g.
   pseudo-header checksum precalculation, sometimes done in a different
   way depending on packet type, and so on). Currently the application
   needs to take care of this.

5) Using an additional API (rte_eth_tx_prepare) before rte_eth_tx_burst
   lets the application prepare the packet burst in a form acceptable to
   the specific device.

6) Some additional checks may be done in debug mode, keeping the tx_burst
   implementation clean.


PROPOSAL:
---------

To help the user deal with all these varieties, we propose to:

1) Introduce an rte_eth_tx_prepare() function to do the necessary
   preparation of a packet burst so it can be safely transmitted on the
   device with the desired HW offloads (set/reset checksum fields
   according to the hardware requirements) and to check HW constraints
   (number of segments per packet, etc).

   Since the limitations and requirements may differ between devices,
   this requires extending the rte_eth_dev structure with a new function
   pointer, "tx_pkt_prepare", which can be implemented in the driver to
   prepare and verify packets, in a device-specific way, before burst;
   this should prevent the application from sending malformed packets.

2) Also, new fields are introduced in rte_eth_desc_lim:
   nb_seg_max and nb_mtu_seg_max, providing information about the maximum
   number of segments in TSO and non-TSO packets acceptable to the device.

   This information lets the application avoid creating packets the
   device would treat as malicious; see the sketch below.
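
For instance, an application can read these limits as follows (a minimal
sketch; error handling omitted):

	struct rte_eth_dev_info dev_info;

	rte_eth_dev_info_get(port_id, &dev_info);

	/* per-device segment limits exposed by this proposal */
	uint16_t seg_max = dev_info.tx_desc_lim.nb_seg_max;
	uint16_t mtu_seg_max = dev_info.tx_desc_lim.nb_mtu_seg_max;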


APPLICATION (USE CASE):
-----------------------

1) The application should initialize the burst of packets to send and
   set the required tx offload flags and fields, like l2_len, l3_len,
   l4_len, and tso_segsz

2) The application passes the burst to rte_eth_tx_prepare to check the
   conditions required to send packets through the NIC.

3) The result of rte_eth_tx_prepare can be used to send the valid
   packets and/or restore the invalid ones if the function fails.

e.g.

	for (i = 0; i < nb_pkts; i++) {

		/* initialize or process packet */

		bufs[i]->tso_segsz = 800;
		bufs[i]->ol_flags = PKT_TX_TCP_SEG | PKT_TX_IPV4
				| PKT_TX_IP_CKSUM;
		bufs[i]->l2_len = sizeof(struct ether_hdr);
		bufs[i]->l3_len = sizeof(struct ipv4_hdr);
		bufs[i]->l4_len = sizeof(struct tcp_hdr);
	}

	/* Prepare burst of TX packets */
	nb_prep = rte_eth_tx_prepare(port, 0, bufs, nb_pkts);

	if (nb_prep < nb_pkts) {
		printf("Tx prepare failed\n");

		/* nb_prep indicates the first invalid packet here.
		 * rte_eth_tx_prepare can be used on the remaining packets to find more.
		 */

	}

	/* Send burst of TX packets */
	nb_tx = rte_eth_tx_burst(port, 0, bufs, nb_prep);

	/* Free any unsent packets. */

v14 changes:
 - added support for ena
 - introduced rte_net_intel_cksum_flags_prepare(m, ol_flags) function
   in rte_net.h to allow the application to choose the offloads to be
   computed if not all are required
 - all drivers now support the tx preparation API, so the csum txprep
   command was removed from test-pmd as redundant, and Tx preparation
   is used by default

v13 changes:
 - added support for vmxnet3
 - reworded help information for "csum txprep" command
 - renamed RTE_ETHDEV_TX_PREPARE to RTE_ETHDEV_TX_PREPARE_NOOP to
   better suit its purpose.

v12 changes:
 - renamed API function from "rte_eth_tx_prep" to "rte_eth_tx_prepare"
   (so as not to be confused with "prepend")
 - changed "rte_phdr_cksum_fix" to "rte_net_intel_cksum_prepare"
 - added "csum txprep (on|off)" command to the csum engine allowing to
   select txprep path for packet processing

v11 changes:
 - updated comments
 - added information to the API description about packet data
   requirements/limitations.

v10 changes:
 - moved the drivers' tx callback check in rte_eth_tx_prep after the queue_id check

v9 changes:
 - fixed headers structure fragmentation check
 - moved fragmentation check into rte_validate_tx_offload()

v8 changes:
 - mbuf argument in rte_validate_tx_offload declared as const

v7 changes:
 - comments reworded/added
 - changed errno values returned from Tx prep API
 - added check in rte_phdr_cksum_fix if headers are in the first
   data segment and can be safely modified
 - moved rte_validate_tx_offload to rte_mbuf
 - moved rte_phdr_cksum_fix to rte_net.h
 - removed rte_pkt.h new file as useless

v6 changes:
 - added performance impact test results to the patch description

v5 changes:
 - rebased csum engine modification
 - added information to the csum engine about performance tests
 - some performance improvements

v4 changes:
 - tx_prep is now set to default behavior (NULL) for simple/vector path
   in fm10k, i40e and ixgbe drivers to increase performance, since
   Tx offloads are intentionally not available on those paths

v3 changes:
 - reworked the csum testpmd engine instead of adding a new one,
 - fixed checksum initialization procedure to include also outer
   checksum offloads,
 - some minor formatting and optimizations

v2 changes:
 - rte_eth_tx_prep() returns the number of packets when the device doesn't
   support the tx_prep functionality,
 - introduced CONFIG_RTE_ETHDEV_TX_PREP allowing to turn off tx_prep


Konstantin Ananyev (2):
  ena: add Tx preparation
  vmxnet3: add Tx preparation

Tomasz Kulasek (6):
  ethdev: add Tx preparation
  e1000: add Tx preparation
  fm10k: add Tx preparation
  i40e: add Tx preparation
  ixgbe: add Tx preparation
  testpmd: use Tx preparation in csum engine

 app/test-pmd/csumonly.c              |   37 +++++------
 app/test-pmd/testpmd.c               |    5 ++
 app/test-pmd/testpmd.h               |    2 +
 config/common_base                   |    9 +++
 drivers/net/e1000/e1000_ethdev.h     |   11 ++++
 drivers/net/e1000/em_ethdev.c        |    5 +-
 drivers/net/e1000/em_rxtx.c          |   48 +++++++++++++-
 drivers/net/e1000/igb_ethdev.c       |    4 ++
 drivers/net/e1000/igb_rxtx.c         |   53 +++++++++++++++-
 drivers/net/ena/ena_ethdev.c         |   51 +++++++++++++++
 drivers/net/fm10k/fm10k.h            |    6 ++
 drivers/net/fm10k/fm10k_ethdev.c     |    5 ++
 drivers/net/fm10k/fm10k_rxtx.c       |   50 ++++++++++++++-
 drivers/net/i40e/i40e_ethdev.c       |    3 +
 drivers/net/i40e/i40e_rxtx.c         |   74 +++++++++++++++++++++-
 drivers/net/i40e/i40e_rxtx.h         |    8 +++
 drivers/net/ixgbe/ixgbe_ethdev.c     |    3 +
 drivers/net/ixgbe/ixgbe_ethdev.h     |    5 +-
 drivers/net/ixgbe/ixgbe_rxtx.c       |   57 +++++++++++++++++
 drivers/net/ixgbe/ixgbe_rxtx.h       |    2 +
 drivers/net/vmxnet3/vmxnet3_ethdev.c |    6 ++
 drivers/net/vmxnet3/vmxnet3_ethdev.h |    2 +
 drivers/net/vmxnet3/vmxnet3_rxtx.c   |   56 +++++++++++++++++
 lib/librte_ether/rte_ethdev.h        |  115 ++++++++++++++++++++++++++++++++++
 lib/librte_mbuf/rte_mbuf.h           |   64 +++++++++++++++++++
 lib/librte_net/rte_net.h             |  110 ++++++++++++++++++++++++++++++++
 26 files changed, 764 insertions(+), 27 deletions(-)

-- 
1.7.9.5


* [dpdk-dev] [PATCH v14 1/8] ethdev: add Tx preparation
  2016-12-22 13:05                         ` [dpdk-dev] [PATCH v14 0/8] add Tx preparation Tomasz Kulasek
@ 2016-12-22 13:05                           ` Tomasz Kulasek
  2016-12-22 14:24                             ` Thomas Monjalon
  2016-12-22 13:05                           ` [dpdk-dev] [PATCH v14 2/8] e1000: " Tomasz Kulasek
                                             ` (7 subsequent siblings)
  8 siblings, 1 reply; 261+ messages in thread
From: Tomasz Kulasek @ 2016-12-22 13:05 UTC (permalink / raw)
  To: dev

Added API for `rte_eth_tx_prepare`

uint16_t rte_eth_tx_prepare(uint8_t port_id, uint16_t queue_id,
	struct rte_mbuf **tx_pkts, uint16_t nb_pkts)

Added fields to the `struct rte_eth_desc_lim`:

	uint16_t nb_seg_max;
		/**< Max number of segments per whole packet. */

	uint16_t nb_mtu_seg_max;
		/**< Max number of segments per one MTU */

Added functions:

int
rte_validate_tx_offload(struct rte_mbuf *m)

  to validate the general requirements for the tx offloads set in the
  mbuf of a packet, such as flag completeness. In the current
  implementation this function is called optionally, when
  RTE_LIBRTE_ETHDEV_DEBUG is enabled.


int rte_net_intel_cksum_prepare(struct rte_mbuf *m)

  to prepare the pseudo-header checksum for TSO and non-TSO tcp/udp
  packets before hardware tx checksum offload.
   - for non-TSO tcp/udp packets the full pseudo-header checksum is
     computed and set.
   - for TSO the IP payload length is not included.


int
rte_net_intel_cksum_flags_prepare(struct rte_mbuf *m, uint64_t ol_flags)

  this function uses the same logic as rte_net_intel_cksum_prepare, but
  allows the application to choose which offloads should be taken into
  account if full preparation is not required.
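
For illustration, a driver-side prepare callback combines these helpers
roughly as below (a condensed sketch modelled on the fm10k/i40e
implementations in this series; XXX_TX_OFFLOAD_NOTSUP_MASK stands for the
driver-specific mask of unsupported offload flags):

	uint16_t
	xxx_prep_pkts(__rte_unused void *tx_queue, struct rte_mbuf **tx_pkts,
			uint16_t nb_pkts)
	{
		int i, ret;

		for (i = 0; i < nb_pkts; i++) {
			struct rte_mbuf *m = tx_pkts[i];

			/* reject offload combinations the device cannot handle */
			if (m->ol_flags & XXX_TX_OFFLOAD_NOTSUP_MASK) {
				rte_errno = -ENOTSUP;
				return i;
			}

	#ifdef RTE_LIBRTE_ETHDEV_DEBUG
			/* generic offload validation, debug builds only */
			ret = rte_validate_tx_offload(m);
			if (ret != 0) {
				rte_errno = ret;
				return i;
			}
	#endif
			/* fix up pseudo-header checksums in the packet data */
			ret = rte_net_intel_cksum_prepare(m);
			if (ret != 0) {
				rte_errno = ret;
				return i;
			}
		}

		return i;
	}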


PERFORMANCE TESTS
-----------------

This feature was tested with a modified csum engine from test-pmd.

The packet checksum preparation was moved from the application to the Tx
preparation step placed before the burst.

We may expect some overhead costs caused by:
1) using an additional callback before the burst,
2) rescanning the burst,
3) additional condition checking (packet validation),
4) worse optimization (e.g. packet data access, etc.)

We tested it using the ixgbe Tx preparation implementation with some parts
disabled, to have comparable information about the impact of the different
parts of the implementation.

IMPACT:

1) For an unimplemented Tx preparation callback the performance impact is
   negligible,
2) The packet condition check without checksum modifications (nb_segs,
   available offloads, etc.) gives 14626628/14252168 (~2.62% drop),
3) Full support in the ixgbe driver (point 2 + packet checksum
   initialization) gives 14060924/13588094 (~3.48% drop)

Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
---
 config/common_base            |    9 ++++
 lib/librte_ether/rte_ethdev.h |  115 +++++++++++++++++++++++++++++++++++++++++
 lib/librte_mbuf/rte_mbuf.h    |   64 +++++++++++++++++++++++
 lib/librte_net/rte_net.h      |  110 +++++++++++++++++++++++++++++++++++++++
 4 files changed, 298 insertions(+)

diff --git a/config/common_base b/config/common_base
index edb6a54..92c413a 100644
--- a/config/common_base
+++ b/config/common_base
@@ -123,6 +123,15 @@ CONFIG_RTE_ETHDEV_QUEUE_STAT_CNTRS=16
 CONFIG_RTE_ETHDEV_RXTX_CALLBACKS=y
 
 #
+# Use real NOOP to turn off TX preparation stage
+#
+# While the behaviour of ``rte_ethdev_tx_prepare`` may change after turning on
+# real NOOP, this configuration shouldn't be never enabled globaly, and can be
+# used in appropriate target configuration file with a following restrictions
+#
+CONFIG_RTE_ETHDEV_TX_PREPARE_NOOP=n
+
+#
 # Support NIC bypass logic
 #
 CONFIG_RTE_NIC_BYPASS=n
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 52119af..10be095 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -182,6 +182,7 @@
 #include <rte_pci.h>
 #include <rte_dev.h>
 #include <rte_devargs.h>
+#include <rte_errno.h>
 #include "rte_ether.h"
 #include "rte_eth_ctrl.h"
 #include "rte_dev_info.h"
@@ -702,6 +703,8 @@ struct rte_eth_desc_lim {
 	uint16_t nb_max;   /**< Max allowed number of descriptors. */
 	uint16_t nb_min;   /**< Min allowed number of descriptors. */
 	uint16_t nb_align; /**< Number of descriptors should be aligned to. */
+	uint16_t nb_seg_max;  /**< Max number of segments per whole packet. */
+	uint16_t nb_mtu_seg_max; /**< Max number of segments per one MTU */
 };
 
 /**
@@ -1191,6 +1194,11 @@ typedef uint16_t (*eth_tx_burst_t)(void *txq,
 				   uint16_t nb_pkts);
 /**< @internal Send output packets on a transmit queue of an Ethernet device. */
 
+typedef uint16_t (*eth_tx_prep_t)(void *txq,
+				   struct rte_mbuf **tx_pkts,
+				   uint16_t nb_pkts);
+/**< @internal Prepare output packets on a transmit queue of an Ethernet device. */
+
 typedef int (*flow_ctrl_get_t)(struct rte_eth_dev *dev,
 			       struct rte_eth_fc_conf *fc_conf);
 /**< @internal Get current flow control parameter on an Ethernet device */
@@ -1625,6 +1633,7 @@ struct rte_eth_rxtx_callback {
 struct rte_eth_dev {
 	eth_rx_burst_t rx_pkt_burst; /**< Pointer to PMD receive function. */
 	eth_tx_burst_t tx_pkt_burst; /**< Pointer to PMD transmit function. */
+	eth_tx_prep_t tx_pkt_prepare; /**< Pointer to PMD transmit prepare function. */
 	struct rte_eth_dev_data *data;  /**< Pointer to device data */
 	const struct eth_driver *driver;/**< Driver for this device */
 	const struct eth_dev_ops *dev_ops; /**< Functions exported by PMD */
@@ -2832,6 +2841,112 @@ int rte_eth_dev_set_vlan_ether_type(uint8_t port_id,
 	return (*dev->tx_pkt_burst)(dev->data->tx_queues[queue_id], tx_pkts, nb_pkts);
 }
 
+/**
+ * Process a burst of output packets on a transmit queue of an Ethernet device.
+ *
+ * The rte_eth_tx_prepare() function is invoked to prepare output packets to be
+ * transmitted on the output queue *queue_id* of the Ethernet device designated
+ * by its *port_id*.
+ * The *nb_pkts* parameter is the number of packets to be prepared which are
+ * supplied in the *tx_pkts* array of *rte_mbuf* structures, each of them
+ * allocated from a pool created with rte_pktmbuf_pool_create().
+ * For each packet to send, the rte_eth_tx_prepare() function performs
+ * the following operations:
+ *
+ * - Check if the packet meets the device's requirements for tx offloads.
+ *
+ * - Check limitations about number of segments.
+ *
+ * - Check additional requirements when debug is enabled.
+ *
+ * - Update and/or reset required checksums when tx offload is set for packet.
+ *
+ * Since this function can modify packet data, provided mbufs must be safely
+ * writable (e.g. modified data cannot be in shared segment).
+ *
+ * The rte_eth_tx_prepare() function returns the number of packets ready to be
+ * sent. A return value equal to *nb_pkts* means that all packets are valid and
+ * ready to be sent; otherwise it stops processing on the first invalid packet
+ * and leaves the remaining packets untouched.
+ *
+ * When this functionality is not implemented in the driver, all packets
+ * are returned untouched.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ *   The value must be a valid port id.
+ * @param queue_id
+ *   The index of the transmit queue through which output packets must be
+ *   sent.
+ *   The value must be in the range [0, nb_tx_queue - 1] previously supplied
+ *   to rte_eth_dev_configure().
+ * @param tx_pkts
+ *   The address of an array of *nb_pkts* pointers to *rte_mbuf* structures
+ *   which contain the output packets.
+ * @param nb_pkts
+ *   The maximum number of packets to process.
+ * @return
+ *   The number of packets correct and ready to be sent. The return value can
+ *   be less than the value of the *tx_pkts* parameter when a packet doesn't
+ *   meet the device's requirements; rte_errno is set appropriately:
+ *   - -EINVAL: offload flags are not correctly set
+ *   - -ENOTSUP: the offload feature is not supported by the hardware
+ *
+ */
+
+#ifndef RTE_ETHDEV_TX_PREPARE_NOOP
+
+static inline uint16_t
+rte_eth_tx_prepare(uint8_t port_id, uint16_t queue_id,
+		struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
+{
+	struct rte_eth_dev *dev;
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+	if (!rte_eth_dev_is_valid_port(port_id)) {
+		RTE_PMD_DEBUG_TRACE("Invalid TX port_id=%d\n", port_id);
+		rte_errno = -EINVAL;
+		return 0;
+	}
+#endif
+
+	dev = &rte_eth_devices[port_id];
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+	if (queue_id >= dev->data->nb_tx_queues) {
+		RTE_PMD_DEBUG_TRACE("Invalid TX queue_id=%d\n", queue_id);
+		rte_errno = -EINVAL;
+		return 0;
+	}
+#endif
+
+	if (!dev->tx_pkt_prepare)
+		return nb_pkts;
+
+	return (*dev->tx_pkt_prepare)(dev->data->tx_queues[queue_id],
+			tx_pkts, nb_pkts);
+}
+
+#else
+
+/*
+ * Native NOOP operation for compilation targets which don't require any
+ * preparation steps, and where a functional NOOP may introduce an unnecessary
+ * performance drop.
+ *
+ * Generally it is not a good idea to turn this on globally, and it should not
+ * be used if the behavior of tx_prepare can change.
+ */
+
+static inline uint16_t
+rte_eth_tx_prepare(__rte_unused uint8_t port_id, __rte_unused uint16_t queue_id,
+		__rte_unused struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
+{
+	return nb_pkts;
+}
+
+#endif
+
 typedef void (*buffer_tx_error_fn)(struct rte_mbuf **unsent, uint16_t count,
 		void *userdata);
 
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index ead7c6e..39ee5ed 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -283,6 +283,19 @@
  */
 #define PKT_TX_OUTER_IPV6    (1ULL << 60)
 
+/**
+ * Bit Mask of all supported packet Tx offload features flags, which can be set
+ * for packet.
+ */
+#define PKT_TX_OFFLOAD_MASK (    \
+		PKT_TX_IP_CKSUM |        \
+		PKT_TX_L4_MASK |         \
+		PKT_TX_OUTER_IP_CKSUM |  \
+		PKT_TX_TCP_SEG |         \
+		PKT_TX_QINQ_PKT |        \
+		PKT_TX_VLAN_PKT |        \
+		PKT_TX_TUNNEL_MASK)
+
 #define __RESERVED           (1ULL << 61) /**< reserved for future mbuf use */
 
 #define IND_ATTACHED_MBUF    (1ULL << 62) /**< Indirect attached mbuf */
@@ -1647,6 +1660,57 @@ static inline int rte_pktmbuf_chain(struct rte_mbuf *head, struct rte_mbuf *tail
 }
 
 /**
+ * Validate general requirements for tx offload in mbuf.
+ *
+ * This function checks correctness and completeness of Tx offload settings.
+ *
+ * @param m
+ *   The packet mbuf to be validated.
+ * @return
+ *   0 if packet is valid
+ */
+static inline int
+rte_validate_tx_offload(const struct rte_mbuf *m)
+{
+	uint64_t ol_flags = m->ol_flags;
+	uint64_t inner_l3_offset = m->l2_len;
+
+	/* Does packet set any of available offloads? */
+	if (!(ol_flags & PKT_TX_OFFLOAD_MASK))
+		return 0;
+
+	if (ol_flags & PKT_TX_OUTER_IP_CKSUM)
+		inner_l3_offset += m->outer_l2_len + m->outer_l3_len;
+
+	/* Headers are fragmented */
+	if (rte_pktmbuf_data_len(m) < inner_l3_offset + m->l3_len + m->l4_len)
+		return -ENOTSUP;
+
+	/* IP checksum can be counted only for IPv4 packet */
+	if ((ol_flags & PKT_TX_IP_CKSUM) && (ol_flags & PKT_TX_IPV6))
+		return -EINVAL;
+
+	/* IP type not set when required */
+	if (ol_flags & (PKT_TX_L4_MASK | PKT_TX_TCP_SEG))
+		if (!(ol_flags & (PKT_TX_IPV4 | PKT_TX_IPV6)))
+			return -EINVAL;
+
+	/* Check requirements for TSO packet */
+	if (ol_flags & PKT_TX_TCP_SEG)
+		if ((m->tso_segsz == 0) ||
+				((ol_flags & PKT_TX_IPV4) &&
+				!(ol_flags & PKT_TX_IP_CKSUM)))
+			return -EINVAL;
+
+	/* PKT_TX_OUTER_IP_CKSUM set for non outer IPv4 packet. */
+	if ((ol_flags & PKT_TX_OUTER_IP_CKSUM) &&
+			!(ol_flags & PKT_TX_OUTER_IPV4))
+		return -EINVAL;
+
+	return 0;
+}
+
+/**
  * Dump an mbuf structure to a file.
  *
  * Dump all fields for the given packet mbuf and all its associated
diff --git a/lib/librte_net/rte_net.h b/lib/librte_net/rte_net.h
index d4156ae..548eaed 100644
--- a/lib/librte_net/rte_net.h
+++ b/lib/librte_net/rte_net.h
@@ -38,6 +38,11 @@
 extern "C" {
 #endif
 
+#include <rte_ip.h>
+#include <rte_udp.h>
+#include <rte_tcp.h>
+#include <rte_sctp.h>
+
 /**
  * Structure containing header lengths associated to a packet, filled
  * by rte_net_get_ptype().
@@ -86,6 +91,111 @@ struct rte_net_hdr_lens {
 uint32_t rte_net_get_ptype(const struct rte_mbuf *m,
 	struct rte_net_hdr_lens *hdr_lens, uint32_t layers);
 
+/**
+ * Prepare pseudo header checksum
+ *
+ * This function prepares the pseudo-header checksum for TSO and non-TSO tcp/udp
+ * in the provided mbuf's packet data, based on the requested offload flags.
+ *
+ * - for non-TSO tcp/udp packets the full pseudo-header checksum is computed and
+ *   set in the packet data,
+ * - for TSO the IP payload length is not included in pseudo header.
+ *
+ * This function expects that used headers are in the first data segment of
+ * mbuf, are not fragmented and can be safely modified.
+ *
+ * @param m
+ *   The packet mbuf to be fixed.
+ * @param ol_flags
+ *   TX offloads flags to use with this packet.
+ * @return
+ *   0 if checksum is initialized properly
+ */
+static inline int
+rte_net_intel_cksum_flags_prepare(struct rte_mbuf *m, uint64_t ol_flags)
+{
+	struct ipv4_hdr *ipv4_hdr;
+	struct ipv6_hdr *ipv6_hdr;
+	struct tcp_hdr *tcp_hdr;
+	struct udp_hdr *udp_hdr;
+	uint64_t inner_l3_offset = m->l2_len;
+
+	if (ol_flags & PKT_TX_OUTER_IP_CKSUM)
+		inner_l3_offset += m->outer_l2_len + m->outer_l3_len;
+
+	if ((ol_flags & PKT_TX_UDP_CKSUM) == PKT_TX_UDP_CKSUM) {
+		if (ol_flags & PKT_TX_IPV4) {
+			ipv4_hdr = rte_pktmbuf_mtod_offset(m, struct ipv4_hdr *,
+					inner_l3_offset);
+
+			if (ol_flags & PKT_TX_IP_CKSUM)
+				ipv4_hdr->hdr_checksum = 0;
+
+			udp_hdr = (struct udp_hdr *)((char *)ipv4_hdr +
+					m->l3_len);
+			udp_hdr->dgram_cksum = rte_ipv4_phdr_cksum(ipv4_hdr,
+					ol_flags);
+		} else {
+			ipv6_hdr = rte_pktmbuf_mtod_offset(m, struct ipv6_hdr *,
+					inner_l3_offset);
+			/* non-TSO udp */
+			udp_hdr = rte_pktmbuf_mtod_offset(m, struct udp_hdr *,
+					inner_l3_offset + m->l3_len);
+			udp_hdr->dgram_cksum = rte_ipv6_phdr_cksum(ipv6_hdr,
+					ol_flags);
+		}
+	} else if ((ol_flags & PKT_TX_TCP_CKSUM) ||
+			(ol_flags & PKT_TX_TCP_SEG)) {
+		if (ol_flags & PKT_TX_IPV4) {
+			ipv4_hdr = rte_pktmbuf_mtod_offset(m, struct ipv4_hdr *,
+					inner_l3_offset);
+
+			if (ol_flags & PKT_TX_IP_CKSUM)
+				ipv4_hdr->hdr_checksum = 0;
+
+			/* non-TSO tcp or TSO */
+			tcp_hdr = (struct tcp_hdr *)((char *)ipv4_hdr +
+					m->l3_len);
+			tcp_hdr->cksum = rte_ipv4_phdr_cksum(ipv4_hdr,
+					ol_flags);
+		} else {
+			ipv6_hdr = rte_pktmbuf_mtod_offset(m, struct ipv6_hdr *,
+					inner_l3_offset);
+			/* non-TSO tcp or TSO */
+			tcp_hdr = rte_pktmbuf_mtod_offset(m, struct tcp_hdr *,
+					inner_l3_offset + m->l3_len);
+			tcp_hdr->cksum = rte_ipv6_phdr_cksum(ipv6_hdr,
+					ol_flags);
+		}
+	}
+
+	return 0;
+}
+
+/**
+ * Prepare pseudo header checksum
+ *
+ * This function prepares the pseudo-header checksum for TSO and non-TSO
+ * tcp/udp packets in the provided mbuf's packet data.
+ *
+ * - for non-TSO tcp/udp packets the full pseudo-header checksum is computed
+ *   and set in the packet data,
+ * - for TSO the IP payload length is not included in the pseudo-header.
+ *
+ * This function expects that the headers used are in the first data segment
+ * of the mbuf, are not fragmented and can be safely modified.
+ *
+ * @param m
+ *   The packet mbuf to be fixed.
+ * @return
+ *   0 if checksum is initialized properly
+ */
+static inline int
+rte_net_intel_cksum_prepare(struct rte_mbuf *m)
+{
+	return rte_net_intel_cksum_flags_prepare(m, m->ol_flags);
+}
+
 #ifdef __cplusplus
 }
 #endif
-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 261+ messages in thread
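
As a point of reference, here is a simplified sketch of what the IPv4
pseudo-header computation above amounts to (illustrative only; the
authoritative implementation is rte_ipv4_phdr_cksum() in rte_ip.h). When
PKT_TX_TCP_SEG is set, the L4 length is left out of the sum so that the
hardware can patch in the length of each TSO segment:

	static uint16_t
	phdr_cksum_sketch(const struct ipv4_hdr *ip, uint64_t ol_flags)
	{
		struct {
			uint32_t src_addr;
			uint32_t dst_addr;
			uint8_t  zero;
			uint8_t  proto;
			uint16_t len;
		} __attribute__((__packed__)) psd;

		psd.src_addr = ip->src_addr;
		psd.dst_addr = ip->dst_addr;
		psd.zero = 0;
		psd.proto = ip->next_proto_id;
		if (ol_flags & PKT_TX_TCP_SEG)
			psd.len = 0; /* HW fills the per-segment length */
		else
			psd.len = rte_cpu_to_be_16((uint16_t)
				(rte_be_to_cpu_16(ip->total_length) -
				 sizeof(struct ipv4_hdr)));
		/* raw (non-complemented) 16-bit one's complement sum */
		return rte_raw_cksum(&psd, sizeof(psd));
	}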

* [dpdk-dev] [PATCH v14 2/8] e1000: add Tx preparation
  2016-12-22 13:05                         ` [dpdk-dev] [PATCH v14 0/8] add Tx preparation Tomasz Kulasek
  2016-12-22 13:05                           ` [dpdk-dev] [PATCH v14 1/8] ethdev: " Tomasz Kulasek
@ 2016-12-22 13:05                           ` Tomasz Kulasek
  2016-12-22 13:05                           ` [dpdk-dev] [PATCH v14 3/8] fm10k: " Tomasz Kulasek
                                             ` (6 subsequent siblings)
  8 siblings, 0 replies; 261+ messages in thread
From: Tomasz Kulasek @ 2016-12-22 13:05 UTC (permalink / raw)
  To: dev

Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 drivers/net/e1000/e1000_ethdev.h |   11 ++++++++
 drivers/net/e1000/em_ethdev.c    |    5 +++-
 drivers/net/e1000/em_rxtx.c      |   48 +++++++++++++++++++++++++++++++++-
 drivers/net/e1000/igb_ethdev.c   |    4 +++
 drivers/net/e1000/igb_rxtx.c     |   53 +++++++++++++++++++++++++++++++++++++-
 5 files changed, 118 insertions(+), 3 deletions(-)

diff --git a/drivers/net/e1000/e1000_ethdev.h b/drivers/net/e1000/e1000_ethdev.h
index 6c25c8d..bd0f277 100644
--- a/drivers/net/e1000/e1000_ethdev.h
+++ b/drivers/net/e1000/e1000_ethdev.h
@@ -138,6 +138,11 @@
 #define E1000_MISC_VEC_ID               RTE_INTR_VEC_ZERO_OFFSET
 #define E1000_RX_VEC_START              RTE_INTR_VEC_RXTX_OFFSET
 
+#define IGB_TX_MAX_SEG     UINT8_MAX
+#define IGB_TX_MAX_MTU_SEG UINT8_MAX
+#define EM_TX_MAX_SEG      UINT8_MAX
+#define EM_TX_MAX_MTU_SEG  UINT8_MAX
+
 /* structure for interrupt relative data */
 struct e1000_interrupt {
 	uint32_t flags;
@@ -315,6 +320,9 @@ int eth_igb_tx_queue_setup(struct rte_eth_dev *dev, uint16_t tx_queue_id,
 uint16_t eth_igb_xmit_pkts(void *txq, struct rte_mbuf **tx_pkts,
 		uint16_t nb_pkts);
 
+uint16_t eth_igb_prep_pkts(void *txq, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts);
+
 uint16_t eth_igb_recv_pkts(void *rxq, struct rte_mbuf **rx_pkts,
 		uint16_t nb_pkts);
 
@@ -376,6 +384,9 @@ int eth_em_tx_queue_setup(struct rte_eth_dev *dev, uint16_t tx_queue_id,
 uint16_t eth_em_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		uint16_t nb_pkts);
 
+uint16_t eth_em_prep_pkts(void *txq, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts);
+
 uint16_t eth_em_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
 		uint16_t nb_pkts);
 
diff --git a/drivers/net/e1000/em_ethdev.c b/drivers/net/e1000/em_ethdev.c
index 866a5cf..00d5996 100644
--- a/drivers/net/e1000/em_ethdev.c
+++ b/drivers/net/e1000/em_ethdev.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -300,6 +300,7 @@ static int eth_em_set_mc_addr_list(struct rte_eth_dev *dev,
 	eth_dev->dev_ops = &eth_em_ops;
 	eth_dev->rx_pkt_burst = (eth_rx_burst_t)&eth_em_recv_pkts;
 	eth_dev->tx_pkt_burst = (eth_tx_burst_t)&eth_em_xmit_pkts;
+	eth_dev->tx_pkt_prepare = (eth_tx_prep_t)&eth_em_prep_pkts;
 
 	/* for secondary processes, we don't initialise any further as primary
 	 * has already done this work. Only check we don't need a different
@@ -1079,6 +1080,8 @@ static int eth_em_set_mc_addr_list(struct rte_eth_dev *dev,
 		.nb_max = E1000_MAX_RING_DESC,
 		.nb_min = E1000_MIN_RING_DESC,
 		.nb_align = EM_TXD_ALIGN,
+		.nb_seg_max = EM_TX_MAX_SEG,
+		.nb_mtu_seg_max = EM_TX_MAX_MTU_SEG,
 	};
 
 	dev_info->speed_capa = ETH_LINK_SPEED_10M_HD | ETH_LINK_SPEED_10M |
diff --git a/drivers/net/e1000/em_rxtx.c b/drivers/net/e1000/em_rxtx.c
index 41f51c0..7e271ad 100644
--- a/drivers/net/e1000/em_rxtx.c
+++ b/drivers/net/e1000/em_rxtx.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -66,6 +66,7 @@
 #include <rte_udp.h>
 #include <rte_tcp.h>
 #include <rte_sctp.h>
+#include <rte_net.h>
 #include <rte_string_fns.h>
 
 #include "e1000_logs.h"
@@ -77,6 +78,14 @@
 
 #define E1000_RXDCTL_GRAN	0x01000000 /* RXDCTL Granularity */
 
+#define E1000_TX_OFFLOAD_MASK ( \
+		PKT_TX_IP_CKSUM |       \
+		PKT_TX_L4_MASK |        \
+		PKT_TX_VLAN_PKT)
+
+#define E1000_TX_OFFLOAD_NOTSUP_MASK \
+		(PKT_TX_OFFLOAD_MASK ^ E1000_TX_OFFLOAD_MASK)
+
 /**
  * Structure associated with each descriptor of the RX ring of a RX queue.
  */
@@ -618,6 +627,43 @@ struct em_tx_queue {
 
 /*********************************************************************
  *
+ *  TX prep functions
+ *
+ **********************************************************************/
+uint16_t
+eth_em_prep_pkts(__rte_unused void *tx_queue, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts)
+{
+	int i, ret;
+	struct rte_mbuf *m;
+
+	for (i = 0; i < nb_pkts; i++) {
+		m = tx_pkts[i];
+
+		if (m->ol_flags & E1000_TX_OFFLOAD_NOTSUP_MASK) {
+			rte_errno = -ENOTSUP;
+			return i;
+		}
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+		ret = rte_validate_tx_offload(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+#endif
+		ret = rte_net_intel_cksum_prepare(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+	}
+
+	return i;
+}
+
+/*********************************************************************
+ *
  *  RX functions
  *
  **********************************************************************/
diff --git a/drivers/net/e1000/igb_ethdev.c b/drivers/net/e1000/igb_ethdev.c
index 08f2a68..cfe1180 100644
--- a/drivers/net/e1000/igb_ethdev.c
+++ b/drivers/net/e1000/igb_ethdev.c
@@ -369,6 +369,8 @@ static void eth_igbvf_interrupt_handler(struct rte_intr_handle *handle,
 	.nb_max = E1000_MAX_RING_DESC,
 	.nb_min = E1000_MIN_RING_DESC,
 	.nb_align = IGB_RXD_ALIGN,
+	.nb_seg_max = IGB_TX_MAX_SEG,
+	.nb_mtu_seg_max = IGB_TX_MAX_MTU_SEG,
 };
 
 static const struct eth_dev_ops eth_igb_ops = {
@@ -760,6 +762,7 @@ struct rte_igb_xstats_name_off {
 	eth_dev->dev_ops = &eth_igb_ops;
 	eth_dev->rx_pkt_burst = &eth_igb_recv_pkts;
 	eth_dev->tx_pkt_burst = &eth_igb_xmit_pkts;
+	eth_dev->tx_pkt_prepare = &eth_igb_prep_pkts;
 
 	/* for secondary processes, we don't initialise any further as primary
 	 * has already done this work. Only check we don't need a different
@@ -963,6 +966,7 @@ struct rte_igb_xstats_name_off {
 	eth_dev->dev_ops = &igbvf_eth_dev_ops;
 	eth_dev->rx_pkt_burst = &eth_igb_recv_pkts;
 	eth_dev->tx_pkt_burst = &eth_igb_xmit_pkts;
+	eth_dev->tx_pkt_prepare = &eth_igb_prep_pkts;
 
 	/* for secondary processes, we don't initialise any further as primary
 	 * has already done this work. Only check we don't need a different
diff --git a/drivers/net/e1000/igb_rxtx.c b/drivers/net/e1000/igb_rxtx.c
index dbd37ac..5d0d3cd 100644
--- a/drivers/net/e1000/igb_rxtx.c
+++ b/drivers/net/e1000/igb_rxtx.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -65,6 +65,7 @@
 #include <rte_udp.h>
 #include <rte_tcp.h>
 #include <rte_sctp.h>
+#include <rte_net.h>
 #include <rte_string_fns.h>
 
 #include "e1000_logs.h"
@@ -78,6 +79,9 @@
 		PKT_TX_L4_MASK |		 \
 		PKT_TX_TCP_SEG)
 
+#define IGB_TX_OFFLOAD_NOTSUP_MASK \
+		(PKT_TX_OFFLOAD_MASK ^ IGB_TX_OFFLOAD_MASK)
+
 /**
  * Structure associated with each descriptor of the RX ring of a RX queue.
  */
@@ -616,6 +620,52 @@ struct igb_tx_queue {
 
 /*********************************************************************
  *
+ *  TX prep functions
+ *
+ **********************************************************************/
+uint16_t
+eth_igb_prep_pkts(__rte_unused void *tx_queue, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts)
+{
+	int i, ret;
+	struct rte_mbuf *m;
+
+	for (i = 0; i < nb_pkts; i++) {
+		m = tx_pkts[i];
+
+		/* Check some limitations for TSO in hardware */
+		if (m->ol_flags & PKT_TX_TCP_SEG)
+			if ((m->tso_segsz > IGB_TSO_MAX_MSS) ||
+					(m->l2_len + m->l3_len + m->l4_len >
+					IGB_TSO_MAX_HDRLEN)) {
+				rte_errno = -EINVAL;
+				return i;
+			}
+
+		if (m->ol_flags & IGB_TX_OFFLOAD_NOTSUP_MASK) {
+			rte_errno = -ENOTSUP;
+			return i;
+		}
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+		ret = rte_validate_tx_offload(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+#endif
+		ret = rte_net_intel_cksum_prepare(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+	}
+
+	return i;
+}
+
+/*********************************************************************
+ *
  *  RX functions
  *
  **********************************************************************/
@@ -1364,6 +1414,7 @@ struct igb_tx_queue {
 
 	igb_reset_tx_queue(txq, dev);
 	dev->tx_pkt_burst = eth_igb_xmit_pkts;
+	dev->tx_pkt_prepare = &eth_igb_prep_pkts;
 	dev->data->tx_queues[queue_idx] = txq;
 
 	return 0;
-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 261+ messages in thread

* [dpdk-dev] [PATCH v14 3/8] fm10k: add Tx preparation
  2016-12-22 13:05                         ` [dpdk-dev] [PATCH v14 0/8] add Tx preparation Tomasz Kulasek
  2016-12-22 13:05                           ` [dpdk-dev] [PATCH v14 1/8] ethdev: " Tomasz Kulasek
  2016-12-22 13:05                           ` [dpdk-dev] [PATCH v14 2/8] e1000: " Tomasz Kulasek
@ 2016-12-22 13:05                           ` Tomasz Kulasek
  2016-12-22 13:05                           ` [dpdk-dev] [PATCH v14 4/8] i40e: " Tomasz Kulasek
                                             ` (5 subsequent siblings)
  8 siblings, 0 replies; 261+ messages in thread
From: Tomasz Kulasek @ 2016-12-22 13:05 UTC (permalink / raw)
  To: dev

Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 drivers/net/fm10k/fm10k.h        |    6 +++++
 drivers/net/fm10k/fm10k_ethdev.c |    5 ++++
 drivers/net/fm10k/fm10k_rxtx.c   |   50 +++++++++++++++++++++++++++++++++++++-
 3 files changed, 60 insertions(+), 1 deletion(-)

diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
index 05aa1a2..c6fed21 100644
--- a/drivers/net/fm10k/fm10k.h
+++ b/drivers/net/fm10k/fm10k.h
@@ -69,6 +69,9 @@
 #define FM10K_MAX_RX_DESC  (FM10K_MAX_RX_RING_SZ / sizeof(union fm10k_rx_desc))
 #define FM10K_MAX_TX_DESC  (FM10K_MAX_TX_RING_SZ / sizeof(struct fm10k_tx_desc))
 
+#define FM10K_TX_MAX_SEG     UINT8_MAX
+#define FM10K_TX_MAX_MTU_SEG UINT8_MAX
+
 /*
  * byte aligment for HW RX data buffer
  * Datasheet requires RX buffer addresses shall either be 512-byte aligned or
@@ -356,6 +359,9 @@ uint16_t fm10k_recv_scattered_pkts(void *rx_queue,
 uint16_t fm10k_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 	uint16_t nb_pkts);
 
+uint16_t fm10k_prep_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
+	uint16_t nb_pkts);
+
 int fm10k_rxq_vec_setup(struct fm10k_rx_queue *rxq);
 int fm10k_rx_vec_condition_check(struct rte_eth_dev *);
 void fm10k_rx_queue_release_mbufs_vec(struct fm10k_rx_queue *rxq);
diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index fe74f6d..6648468 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -1447,6 +1447,8 @@ static int fm10k_xstats_get_names(__rte_unused struct rte_eth_dev *dev,
 		.nb_max = FM10K_MAX_TX_DESC,
 		.nb_min = FM10K_MIN_TX_DESC,
 		.nb_align = FM10K_MULT_TX_DESC,
+		.nb_seg_max = FM10K_TX_MAX_SEG,
+		.nb_mtu_seg_max = FM10K_TX_MAX_MTU_SEG,
 	};
 
 	dev_info->speed_capa = ETH_LINK_SPEED_1G | ETH_LINK_SPEED_2_5G |
@@ -2755,8 +2757,10 @@ static void __attribute__((cold))
 			fm10k_txq_vec_setup(txq);
 		}
 		dev->tx_pkt_burst = fm10k_xmit_pkts_vec;
+		dev->tx_pkt_prepare = NULL;
 	} else {
 		dev->tx_pkt_burst = fm10k_xmit_pkts;
+		dev->tx_pkt_prepare = fm10k_prep_pkts;
 		PMD_INIT_LOG(DEBUG, "Use regular Tx func");
 	}
 }
@@ -2835,6 +2839,7 @@ static void __attribute__((cold))
 	dev->dev_ops = &fm10k_eth_dev_ops;
 	dev->rx_pkt_burst = &fm10k_recv_pkts;
 	dev->tx_pkt_burst = &fm10k_xmit_pkts;
+	dev->tx_pkt_prepare = &fm10k_prep_pkts;
 
 	/* only initialize in the primary process */
 	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
diff --git a/drivers/net/fm10k/fm10k_rxtx.c b/drivers/net/fm10k/fm10k_rxtx.c
index 32cc7ff..144e5e6 100644
--- a/drivers/net/fm10k/fm10k_rxtx.c
+++ b/drivers/net/fm10k/fm10k_rxtx.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2013-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2013-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -35,6 +35,7 @@
 
 #include <rte_ethdev.h>
 #include <rte_common.h>
+#include <rte_net.h>
 #include "fm10k.h"
 #include "base/fm10k_type.h"
 
@@ -65,6 +66,15 @@ static inline void dump_rxd(union fm10k_rx_desc *rxd)
 }
 #endif
 
+#define FM10K_TX_OFFLOAD_MASK (  \
+		PKT_TX_VLAN_PKT |        \
+		PKT_TX_IP_CKSUM |        \
+		PKT_TX_L4_MASK |         \
+		PKT_TX_TCP_SEG)
+
+#define FM10K_TX_OFFLOAD_NOTSUP_MASK \
+		(PKT_TX_OFFLOAD_MASK ^ FM10K_TX_OFFLOAD_MASK)
+
 /* @note: When this function is changed, make corresponding change to
  * fm10k_dev_supported_ptypes_get()
  */
@@ -597,3 +607,41 @@ static inline void tx_xmit_pkt(struct fm10k_tx_queue *q, struct rte_mbuf *mb)
 
 	return count;
 }
+
+uint16_t
+fm10k_prep_pkts(__rte_unused void *tx_queue, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts)
+{
+	int i, ret;
+	struct rte_mbuf *m;
+
+	for (i = 0; i < nb_pkts; i++) {
+		m = tx_pkts[i];
+
+		if ((m->ol_flags & PKT_TX_TCP_SEG) &&
+				(m->tso_segsz < FM10K_TSO_MINMSS)) {
+			rte_errno = -EINVAL;
+			return i;
+		}
+
+		if (m->ol_flags & FM10K_TX_OFFLOAD_NOTSUP_MASK) {
+			rte_errno = -ENOTSUP;
+			return i;
+		}
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+		ret = rte_validate_tx_offload(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+#endif
+		ret = rte_net_intel_cksum_prepare(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+	}
+
+	return i;
+}
-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 261+ messages in thread

* [dpdk-dev] [PATCH v14 4/8] i40e: add Tx preparation
  2016-12-22 13:05                         ` [dpdk-dev] [PATCH v14 0/8] add Tx preparation Tomasz Kulasek
                                             ` (2 preceding siblings ...)
  2016-12-22 13:05                           ` [dpdk-dev] [PATCH v14 3/8] fm10k: " Tomasz Kulasek
@ 2016-12-22 13:05                           ` Tomasz Kulasek
  2016-12-22 13:05                           ` [dpdk-dev] [PATCH v14 5/8] ixgbe: " Tomasz Kulasek
                                             ` (4 subsequent siblings)
  8 siblings, 0 replies; 261+ messages in thread
From: Tomasz Kulasek @ 2016-12-22 13:05 UTC (permalink / raw)
  To: dev

Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 drivers/net/i40e/i40e_ethdev.c |    3 ++
 drivers/net/i40e/i40e_rxtx.c   |   74 +++++++++++++++++++++++++++++++++++++++-
 drivers/net/i40e/i40e_rxtx.h   |    8 +++++
 3 files changed, 84 insertions(+), 1 deletion(-)

diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
index b0c0fbf..0e20178 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -944,6 +944,7 @@ static inline void i40e_GLQF_reg_init(struct i40e_hw *hw)
 	dev->dev_ops = &i40e_eth_dev_ops;
 	dev->rx_pkt_burst = i40e_recv_pkts;
 	dev->tx_pkt_burst = i40e_xmit_pkts;
+	dev->tx_pkt_prepare = i40e_prep_pkts;
 
 	/* for secondary processes, we don't initialise any further as primary
 	 * has already done this work. Only check we don't need a different
@@ -2646,6 +2647,8 @@ static int i40e_dev_xstats_get_names(__rte_unused struct rte_eth_dev *dev,
 		.nb_max = I40E_MAX_RING_DESC,
 		.nb_min = I40E_MIN_RING_DESC,
 		.nb_align = I40E_ALIGN_RING_DESC,
+		.nb_seg_max = I40E_TX_MAX_SEG,
+		.nb_mtu_seg_max = I40E_TX_MAX_MTU_SEG,
 	};
 
 	if (pf->flags & I40E_FLAG_VMDQ) {
diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
index 7ae7d9f..1c9a6c8 100644
--- a/drivers/net/i40e/i40e_rxtx.c
+++ b/drivers/net/i40e/i40e_rxtx.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -50,6 +50,8 @@
 #include <rte_tcp.h>
 #include <rte_sctp.h>
 #include <rte_udp.h>
+#include <rte_ip.h>
+#include <rte_net.h>
 
 #include "i40e_logs.h"
 #include "base/i40e_prototype.h"
@@ -79,6 +81,17 @@
 		PKT_TX_TCP_SEG |		 \
 		PKT_TX_OUTER_IP_CKSUM)
 
+#define I40E_TX_OFFLOAD_MASK (  \
+		PKT_TX_IP_CKSUM |       \
+		PKT_TX_L4_MASK |        \
+		PKT_TX_OUTER_IP_CKSUM | \
+		PKT_TX_TCP_SEG |        \
+		PKT_TX_QINQ_PKT |       \
+		PKT_TX_VLAN_PKT)
+
+#define I40E_TX_OFFLOAD_NOTSUP_MASK \
+		(PKT_TX_OFFLOAD_MASK ^ I40E_TX_OFFLOAD_MASK)
+
 static uint16_t i40e_xmit_pkts_simple(void *tx_queue,
 				      struct rte_mbuf **tx_pkts,
 				      uint16_t nb_pkts);
@@ -1411,6 +1424,63 @@ static inline int __attribute__((always_inline))
 	return nb_tx;
 }
 
+/*********************************************************************
+ *
+ *  TX prep functions
+ *
+ **********************************************************************/
+uint16_t
+i40e_prep_pkts(__rte_unused void *tx_queue, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts)
+{
+	int i, ret;
+	uint64_t ol_flags;
+	struct rte_mbuf *m;
+
+	for (i = 0; i < nb_pkts; i++) {
+		m = tx_pkts[i];
+		ol_flags = m->ol_flags;
+
+		/**
+		 * m->nb_segs is uint8_t, so nb_segs can never exceed
+		 * I40E_TX_MAX_SEG (UINT8_MAX).
+		 * We only need to check nb_segs > I40E_TX_MAX_MTU_SEG here.
+		 */
+		if (!(ol_flags & PKT_TX_TCP_SEG)) {
+			if (m->nb_segs > I40E_TX_MAX_MTU_SEG) {
+				rte_errno = -EINVAL;
+				return i;
+			}
+		} else if ((m->tso_segsz < I40E_MIN_TSO_MSS) ||
+				(m->tso_segsz > I40E_MAX_TSO_MSS)) {
+			/* An MSS outside the range (256B - 9674B) is
+			 * considered malicious
+			 */
+			rte_errno = -EINVAL;
+			return i;
+		}
+
+		if (ol_flags & I40E_TX_OFFLOAD_NOTSUP_MASK) {
+			rte_errno = -ENOTSUP;
+			return i;
+		}
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+		ret = rte_validate_tx_offload(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+#endif
+		ret = rte_net_intel_cksum_prepare(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+	}
+	return i;
+}
+
 /*
  * Find the VSI the queue belongs to. 'queue_idx' is the queue index
  * application used, which assume having sequential ones. But from driver's
@@ -2763,9 +2833,11 @@ void __attribute__((cold))
 			PMD_INIT_LOG(DEBUG, "Simple tx finally be used.");
 			dev->tx_pkt_burst = i40e_xmit_pkts_simple;
 		}
+		dev->tx_pkt_prepare = NULL;
 	} else {
 		PMD_INIT_LOG(DEBUG, "Xmit tx finally be used.");
 		dev->tx_pkt_burst = i40e_xmit_pkts;
+		dev->tx_pkt_prepare = i40e_prep_pkts;
 	}
 }
 
diff --git a/drivers/net/i40e/i40e_rxtx.h b/drivers/net/i40e/i40e_rxtx.h
index ecdb13c..9df8a56 100644
--- a/drivers/net/i40e/i40e_rxtx.h
+++ b/drivers/net/i40e/i40e_rxtx.h
@@ -63,6 +63,12 @@
 #define	I40E_MIN_RING_DESC	64
 #define	I40E_MAX_RING_DESC	4096
 
+#define I40E_MIN_TSO_MSS          256
+#define I40E_MAX_TSO_MSS          9674
+
+#define I40E_TX_MAX_SEG     UINT8_MAX
+#define I40E_TX_MAX_MTU_SEG 8
+
 #undef container_of
 #define container_of(ptr, type, member) ({ \
 		typeof(((type *)0)->member)(*__mptr) = (ptr); \
@@ -223,6 +229,8 @@ uint16_t i40e_recv_scattered_pkts(void *rx_queue,
 uint16_t i40e_xmit_pkts(void *tx_queue,
 			struct rte_mbuf **tx_pkts,
 			uint16_t nb_pkts);
+uint16_t i40e_prep_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts);
 int i40e_tx_queue_init(struct i40e_tx_queue *txq);
 int i40e_rx_queue_init(struct i40e_rx_queue *rxq);
 void i40e_free_tx_resources(struct i40e_tx_queue *txq);
-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 261+ messages in thread

* [dpdk-dev] [PATCH v14 5/8] ixgbe: add Tx preparation
  2016-12-22 13:05                         ` [dpdk-dev] [PATCH v14 0/8] add Tx preparation Tomasz Kulasek
                                             ` (3 preceding siblings ...)
  2016-12-22 13:05                           ` [dpdk-dev] [PATCH v14 4/8] i40e: " Tomasz Kulasek
@ 2016-12-22 13:05                           ` Tomasz Kulasek
  2016-12-22 13:05                           ` [dpdk-dev] [PATCH v14 6/8] vmxnet3: " Tomasz Kulasek
                                             ` (3 subsequent siblings)
  8 siblings, 0 replies; 261+ messages in thread
From: Tomasz Kulasek @ 2016-12-22 13:05 UTC (permalink / raw)
  To: dev

Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 drivers/net/ixgbe/ixgbe_ethdev.c |    3 ++
 drivers/net/ixgbe/ixgbe_ethdev.h |    5 +++-
 drivers/net/ixgbe/ixgbe_rxtx.c   |   57 ++++++++++++++++++++++++++++++++++++++
 drivers/net/ixgbe/ixgbe_rxtx.h   |    2 ++
 4 files changed, 66 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
index baffc71..d726a2b 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -517,6 +517,8 @@ static int ixgbe_dev_udp_tunnel_port_del(struct rte_eth_dev *dev,
 	.nb_max = IXGBE_MAX_RING_DESC,
 	.nb_min = IXGBE_MIN_RING_DESC,
 	.nb_align = IXGBE_TXD_ALIGN,
+	.nb_seg_max = IXGBE_TX_MAX_SEG,
+	.nb_mtu_seg_max = IXGBE_TX_MAX_SEG,
 };
 
 static const struct eth_dev_ops ixgbe_eth_dev_ops = {
@@ -1103,6 +1105,7 @@ struct rte_ixgbe_xstats_name_off {
 	eth_dev->dev_ops = &ixgbe_eth_dev_ops;
 	eth_dev->rx_pkt_burst = &ixgbe_recv_pkts;
 	eth_dev->tx_pkt_burst = &ixgbe_xmit_pkts;
+	eth_dev->tx_pkt_prepare = &ixgbe_prep_pkts;
 
 	/*
 	 * For secondary processes, we don't initialise any further as primary
diff --git a/drivers/net/ixgbe/ixgbe_ethdev.h b/drivers/net/ixgbe/ixgbe_ethdev.h
index 4ff6338..e229cf5 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.h
+++ b/drivers/net/ixgbe/ixgbe_ethdev.h
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -396,6 +396,9 @@ uint16_t ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 uint16_t ixgbe_xmit_pkts_simple(void *tx_queue, struct rte_mbuf **tx_pkts,
 		uint16_t nb_pkts);
 
+uint16_t ixgbe_prep_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts);
+
 int ixgbe_dev_rss_hash_update(struct rte_eth_dev *dev,
 			      struct rte_eth_rss_conf *rss_conf);
 
diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
index b2d9f45..0bbc583 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx.c
@@ -70,6 +70,7 @@
 #include <rte_string_fns.h>
 #include <rte_errno.h>
 #include <rte_ip.h>
+#include <rte_net.h>
 
 #include "ixgbe_logs.h"
 #include "base/ixgbe_api.h"
@@ -87,6 +88,9 @@
 		PKT_TX_TCP_SEG |		 \
 		PKT_TX_OUTER_IP_CKSUM)
 
+#define IXGBE_TX_OFFLOAD_NOTSUP_MASK \
+		(PKT_TX_OFFLOAD_MASK ^ IXGBE_TX_OFFLOAD_MASK)
+
 #if 1
 #define RTE_PMD_USE_PREFETCH
 #endif
@@ -905,6 +909,57 @@ static inline int __attribute__((always_inline))
 
 /*********************************************************************
  *
+ *  TX prep functions
+ *
+ **********************************************************************/
+uint16_t
+ixgbe_prep_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
+{
+	int i, ret;
+	uint64_t ol_flags;
+	struct rte_mbuf *m;
+	struct ixgbe_tx_queue *txq = (struct ixgbe_tx_queue *)tx_queue;
+
+	for (i = 0; i < nb_pkts; i++) {
+		m = tx_pkts[i];
+		ol_flags = m->ol_flags;
+
+		/**
+		 * Check if packet meets requirements for number of segments
+		 *
+		 * NOTE: for ixgbe it's always (40 - WTHRESH) for both TSO and
+		 *       non-TSO
+		 */
+
+		if (m->nb_segs > IXGBE_TX_MAX_SEG - txq->wthresh) {
+			rte_errno = -EINVAL;
+			return i;
+		}
+
+		if (ol_flags & IXGBE_TX_OFFLOAD_NOTSUP_MASK) {
+			rte_errno = -ENOTSUP;
+			return i;
+		}
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+		ret = rte_validate_tx_offload(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+#endif
+		ret = rte_net_intel_cksum_prepare(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+	}
+
+	return i;
+}
+
+/*********************************************************************
+ *
  *  RX functions
  *
  **********************************************************************/
@@ -2282,6 +2337,7 @@ void __attribute__((cold))
 	if (((txq->txq_flags & IXGBE_SIMPLE_FLAGS) == IXGBE_SIMPLE_FLAGS)
 			&& (txq->tx_rs_thresh >= RTE_PMD_IXGBE_TX_MAX_BURST)) {
 		PMD_INIT_LOG(DEBUG, "Using simple tx code path");
+		dev->tx_pkt_prepare = NULL;
 #ifdef RTE_IXGBE_INC_VECTOR
 		if (txq->tx_rs_thresh <= RTE_IXGBE_TX_MAX_FREE_BUF_SZ &&
 				(rte_eal_process_type() != RTE_PROC_PRIMARY ||
@@ -2302,6 +2358,7 @@ void __attribute__((cold))
 				(unsigned long)txq->tx_rs_thresh,
 				(unsigned long)RTE_PMD_IXGBE_TX_MAX_BURST);
 		dev->tx_pkt_burst = ixgbe_xmit_pkts;
+		dev->tx_pkt_prepare = ixgbe_prep_pkts;
 	}
 }
 
diff --git a/drivers/net/ixgbe/ixgbe_rxtx.h b/drivers/net/ixgbe/ixgbe_rxtx.h
index 2608b36..7bbd9b8 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx.h
+++ b/drivers/net/ixgbe/ixgbe_rxtx.h
@@ -80,6 +80,8 @@
 #define RTE_IXGBE_WAIT_100_US               100
 #define RTE_IXGBE_VMTXSW_REGISTER_COUNT     2
 
+#define IXGBE_TX_MAX_SEG                    40
+
 #define IXGBE_PACKET_TYPE_MASK_82599        0X7F
 #define IXGBE_PACKET_TYPE_MASK_X550         0X10FF
 #define IXGBE_PACKET_TYPE_MASK_TUNNEL       0XFF
-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 261+ messages in thread

* [dpdk-dev] [PATCH v14 6/8] vmxnet3: add Tx preparation
  2016-12-22 13:05                         ` [dpdk-dev] [PATCH v14 0/8] add Tx preparation Tomasz Kulasek
                                             ` (4 preceding siblings ...)
  2016-12-22 13:05                           ` [dpdk-dev] [PATCH v14 5/8] ixgbe: " Tomasz Kulasek
@ 2016-12-22 13:05                           ` Tomasz Kulasek
  2016-12-22 17:59                             ` Yong Wang
  2016-12-22 13:05                           ` [dpdk-dev] [PATCH v14 7/8] ena: " Tomasz Kulasek
                                             ` (2 subsequent siblings)
  8 siblings, 1 reply; 261+ messages in thread
From: Tomasz Kulasek @ 2016-12-22 13:05 UTC (permalink / raw)
  To: dev; +Cc: Ananyev, Konstantin

From: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 drivers/net/vmxnet3/vmxnet3_ethdev.c |    6 ++++
 drivers/net/vmxnet3/vmxnet3_ethdev.h |    2 ++
 drivers/net/vmxnet3/vmxnet3_rxtx.c   |   56 ++++++++++++++++++++++++++++++++++
 3 files changed, 64 insertions(+)

diff --git a/drivers/net/vmxnet3/vmxnet3_ethdev.c b/drivers/net/vmxnet3/vmxnet3_ethdev.c
index 93c9ac9..e31896f 100644
--- a/drivers/net/vmxnet3/vmxnet3_ethdev.c
+++ b/drivers/net/vmxnet3/vmxnet3_ethdev.c
@@ -69,6 +69,8 @@
 
 #define PROCESS_SYS_EVENTS 0
 
+#define	VMXNET3_TX_MAX_SEG	UINT8_MAX
+
 static int eth_vmxnet3_dev_init(struct rte_eth_dev *eth_dev);
 static int eth_vmxnet3_dev_uninit(struct rte_eth_dev *eth_dev);
 static int vmxnet3_dev_configure(struct rte_eth_dev *dev);
@@ -237,6 +239,7 @@ static void vmxnet3_mac_addr_set(struct rte_eth_dev *dev,
 	eth_dev->dev_ops = &vmxnet3_eth_dev_ops;
 	eth_dev->rx_pkt_burst = &vmxnet3_recv_pkts;
 	eth_dev->tx_pkt_burst = &vmxnet3_xmit_pkts;
+	eth_dev->tx_pkt_prepare = vmxnet3_prep_pkts;
 	pci_dev = eth_dev->pci_dev;
 
 	/*
@@ -326,6 +329,7 @@ static void vmxnet3_mac_addr_set(struct rte_eth_dev *dev,
 	eth_dev->dev_ops = NULL;
 	eth_dev->rx_pkt_burst = NULL;
 	eth_dev->tx_pkt_burst = NULL;
+	eth_dev->tx_pkt_prepare = NULL;
 
 	rte_free(eth_dev->data->mac_addrs);
 	eth_dev->data->mac_addrs = NULL;
@@ -728,6 +732,8 @@ static void vmxnet3_mac_addr_set(struct rte_eth_dev *dev,
 		.nb_max = VMXNET3_TX_RING_MAX_SIZE,
 		.nb_min = VMXNET3_DEF_TX_RING_SIZE,
 		.nb_align = 1,
+		.nb_seg_max = VMXNET3_TX_MAX_SEG,
+		.nb_mtu_seg_max = VMXNET3_MAX_TXD_PER_PKT,
 	};
 
 	dev_info->rx_offload_capa =
diff --git a/drivers/net/vmxnet3/vmxnet3_ethdev.h b/drivers/net/vmxnet3/vmxnet3_ethdev.h
index 7d3b11e..469db71 100644
--- a/drivers/net/vmxnet3/vmxnet3_ethdev.h
+++ b/drivers/net/vmxnet3/vmxnet3_ethdev.h
@@ -171,5 +171,7 @@ uint16_t vmxnet3_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
 			   uint16_t nb_pkts);
 uint16_t vmxnet3_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 			   uint16_t nb_pkts);
+uint16_t vmxnet3_prep_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
+			uint16_t nb_pkts);
 
 #endif /* _VMXNET3_ETHDEV_H_ */
diff --git a/drivers/net/vmxnet3/vmxnet3_rxtx.c b/drivers/net/vmxnet3/vmxnet3_rxtx.c
index b109168..3651369 100644
--- a/drivers/net/vmxnet3/vmxnet3_rxtx.c
+++ b/drivers/net/vmxnet3/vmxnet3_rxtx.c
@@ -69,6 +69,7 @@
 #include <rte_sctp.h>
 #include <rte_string_fns.h>
 #include <rte_errno.h>
+#include <rte_net.h>
 
 #include "base/vmxnet3_defs.h"
 #include "vmxnet3_ring.h"
@@ -76,6 +77,14 @@
 #include "vmxnet3_logs.h"
 #include "vmxnet3_ethdev.h"
 
+#define	VMXNET3_TX_OFFLOAD_MASK	( \
+		PKT_TX_VLAN_PKT | \
+		PKT_TX_L4_MASK |  \
+		PKT_TX_TCP_SEG)
+
+#define	VMXNET3_TX_OFFLOAD_NOTSUP_MASK	\
+	(PKT_TX_OFFLOAD_MASK ^ VMXNET3_TX_OFFLOAD_MASK)
+
 static const uint32_t rxprod_reg[2] = {VMXNET3_REG_RXPROD, VMXNET3_REG_RXPROD2};
 
 static int vmxnet3_post_rx_bufs(vmxnet3_rx_queue_t*, uint8_t);
@@ -350,6 +359,53 @@
 }
 
 uint16_t
+vmxnet3_prep_pkts(__rte_unused void *tx_queue, struct rte_mbuf **tx_pkts,
+	uint16_t nb_pkts)
+{
+	int32_t ret;
+	uint32_t i;
+	uint64_t ol_flags;
+	struct rte_mbuf *m;
+
+	for (i = 0; i != nb_pkts; i++) {
+		m = tx_pkts[i];
+		ol_flags = m->ol_flags;
+
+		/* Non-TSO packet cannot occupy more than
+		 * VMXNET3_MAX_TXD_PER_PKT TX descriptors.
+		 */
+		if ((ol_flags & PKT_TX_TCP_SEG) == 0 &&
+				m->nb_segs > VMXNET3_MAX_TXD_PER_PKT) {
+			rte_errno = -EINVAL;
+			return i;
+		}
+
+		/* check that only supported TX offloads are requested. */
+		if ((ol_flags & VMXNET3_TX_OFFLOAD_NOTSUP_MASK) != 0 ||
+				(ol_flags & PKT_TX_L4_MASK) ==
+				PKT_TX_SCTP_CKSUM) {
+			rte_errno = -ENOTSUP;
+			return i;
+		}
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+		ret = rte_validate_tx_offload(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+#endif
+		ret = rte_net_intel_cksum_prepare(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+	}
+
+	return i;
+}
+
+uint16_t
 vmxnet3_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		  uint16_t nb_pkts)
 {
-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 261+ messages in thread

* [dpdk-dev] [PATCH v14 7/8] ena: add Tx preparation
  2016-12-22 13:05                         ` [dpdk-dev] [PATCH v14 0/8] add Tx preparation Tomasz Kulasek
                                             ` (5 preceding siblings ...)
  2016-12-22 13:05                           ` [dpdk-dev] [PATCH v14 6/8] vmxnet3: " Tomasz Kulasek
@ 2016-12-22 13:05                           ` Tomasz Kulasek
  2016-12-22 13:05                           ` [dpdk-dev] [PATCH v14 8/8] testpmd: use Tx preparation in csum engine Tomasz Kulasek
  2016-12-23 18:40                           ` [dpdk-dev] [PATCH v15 0/8] add Tx preparation Tomasz Kulasek
  8 siblings, 0 replies; 261+ messages in thread
From: Tomasz Kulasek @ 2016-12-22 13:05 UTC (permalink / raw)
  To: dev; +Cc: Konstantin Ananyev

From: Konstantin Ananyev <konstantin.ananyev@intel.com>

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 drivers/net/ena/ena_ethdev.c |   51 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 51 insertions(+)

diff --git a/drivers/net/ena/ena_ethdev.c b/drivers/net/ena/ena_ethdev.c
index 555fb31..51af723 100644
--- a/drivers/net/ena/ena_ethdev.c
+++ b/drivers/net/ena/ena_ethdev.c
@@ -39,6 +39,7 @@
 #include <rte_errno.h>
 #include <rte_version.h>
 #include <rte_eal_memconfig.h>
+#include <rte_net.h>
 
 #include "ena_ethdev.h"
 #include "ena_logs.h"
@@ -168,6 +169,14 @@ struct ena_stats {
 #define PCI_DEVICE_ID_ENA_VF	0xEC20
 #define PCI_DEVICE_ID_ENA_LLQ_VF	0xEC21
 
+#define	ENA_TX_OFFLOAD_MASK	(\
+	PKT_TX_L4_MASK |         \
+	PKT_TX_IP_CKSUM |        \
+	PKT_TX_TCP_SEG)
+
+#define	ENA_TX_OFFLOAD_NOTSUP_MASK	\
+	(PKT_TX_OFFLOAD_MASK ^ ENA_TX_OFFLOAD_MASK)
+
 static struct rte_pci_id pci_id_ena_map[] = {
 	{ RTE_PCI_DEVICE(PCI_VENDOR_ID_AMAZON, PCI_DEVICE_ID_ENA_VF) },
 	{ RTE_PCI_DEVICE(PCI_VENDOR_ID_AMAZON, PCI_DEVICE_ID_ENA_LLQ_VF) },
@@ -179,6 +188,8 @@ static int ena_device_init(struct ena_com_dev *ena_dev,
 static int ena_dev_configure(struct rte_eth_dev *dev);
 static uint16_t eth_ena_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 				  uint16_t nb_pkts);
+static uint16_t eth_ena_prep_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts);
 static int ena_tx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_idx,
 			      uint16_t nb_desc, unsigned int socket_id,
 			      const struct rte_eth_txconf *tx_conf);
@@ -1272,6 +1283,7 @@ static int eth_ena_dev_init(struct rte_eth_dev *eth_dev)
 	eth_dev->dev_ops = &ena_dev_ops;
 	eth_dev->rx_pkt_burst = &eth_ena_recv_pkts;
 	eth_dev->tx_pkt_burst = &eth_ena_xmit_pkts;
+	eth_dev->tx_pkt_prepare = &eth_ena_prep_pkts;
 	adapter->rte_eth_dev_data = eth_dev->data;
 	adapter->rte_dev = eth_dev;
 
@@ -1570,6 +1582,45 @@ static uint16_t eth_ena_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
 	return recv_idx;
 }
 
+static uint16_t
+eth_ena_prep_pkts(__rte_unused void *tx_queue, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts)
+{
+	int32_t ret;
+	uint32_t i;
+	struct rte_mbuf *m;
+	uint64_t ol_flags;
+
+	for (i = 0; i != nb_pkts; i++) {
+		m = tx_pkts[i];
+		ol_flags = m->ol_flags;
+
+		if ((ol_flags & ENA_TX_OFFLOAD_NOTSUP_MASK) != 0 ||
+				(ol_flags & PKT_TX_L4_MASK) ==
+				PKT_TX_SCTP_CKSUM) {
+			rte_errno = -ENOTSUP;
+			return i;
+		}
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+		ret = rte_validate_tx_offload(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+#endif
+		/* ENA doesn't need a different phdr cksum for TSO */
+		ret = rte_net_intel_cksum_flags_prepare(m,
+			ol_flags & ~PKT_TX_TCP_SEG);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+	}
+
+	return i;
+}
+
 static uint16_t eth_ena_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 				  uint16_t nb_pkts)
 {
-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 261+ messages in thread
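
A note on the flags variant used above: rte_net_intel_cksum_flags_prepare()
takes the offload flags to act on as an explicit argument, so a caller can
drop flags whose handling differs on its hardware. A minimal sketch (assuming
m is the mbuf being prepared):

	/* With PKT_TX_TCP_SEG masked out, the full pseudo-header checksum
	 * (L4 length included) is written even for TSO frames, which is
	 * what ENA hardware expects. The PKT_TX_IPV4/PKT_TX_IPV6 bits must
	 * stay set -- the helper relies on them to pick the header layout. */
	ret = rte_net_intel_cksum_flags_prepare(m,
			m->ol_flags & ~PKT_TX_TCP_SEG);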

* [dpdk-dev] [PATCH v14 8/8] testpmd: use Tx preparation in csum engine
  2016-12-22 13:05                         ` [dpdk-dev] [PATCH v14 0/8] add Tx preparation Tomasz Kulasek
                                             ` (6 preceding siblings ...)
  2016-12-22 13:05                           ` [dpdk-dev] [PATCH v14 7/8] ena: " Tomasz Kulasek
@ 2016-12-22 13:05                           ` Tomasz Kulasek
  2016-12-22 14:28                             ` Thomas Monjalon
  2016-12-23 18:40                           ` [dpdk-dev] [PATCH v15 0/8] add Tx preparation Tomasz Kulasek
  8 siblings, 1 reply; 261+ messages in thread
From: Tomasz Kulasek @ 2016-12-22 13:05 UTC (permalink / raw)
  To: dev

Since all current drivers support the Tx preparation API, it is used
in the csum forwarding engine by default for all drivers.

Adding this additional step to the csum engine costs about 3-4% of
performance on my setup with the ixgbe driver. The drop is caused mostly
by the need to re-access and modify the packet data.

Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 app/test-pmd/csumonly.c |   37 ++++++++++++++++---------------------
 app/test-pmd/testpmd.c  |    5 +++++
 app/test-pmd/testpmd.h  |    2 ++
 3 files changed, 23 insertions(+), 21 deletions(-)

diff --git a/app/test-pmd/csumonly.c b/app/test-pmd/csumonly.c
index 57e6ae2..806f957 100644
--- a/app/test-pmd/csumonly.c
+++ b/app/test-pmd/csumonly.c
@@ -112,15 +112,6 @@ struct simple_gre_hdr {
 } __attribute__((__packed__));
 
 static uint16_t
-get_psd_sum(void *l3_hdr, uint16_t ethertype, uint64_t ol_flags)
-{
-	if (ethertype == _htons(ETHER_TYPE_IPv4))
-		return rte_ipv4_phdr_cksum(l3_hdr, ol_flags);
-	else /* assume ethertype == ETHER_TYPE_IPv6 */
-		return rte_ipv6_phdr_cksum(l3_hdr, ol_flags);
-}
-
-static uint16_t
 get_udptcp_checksum(void *l3_hdr, void *l4_hdr, uint16_t ethertype)
 {
 	if (ethertype == _htons(ETHER_TYPE_IPv4))
@@ -370,11 +361,9 @@ struct simple_gre_hdr {
 		/* do not recalculate udp cksum if it was 0 */
 		if (udp_hdr->dgram_cksum != 0) {
 			udp_hdr->dgram_cksum = 0;
-			if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_UDP_CKSUM) {
+			if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_UDP_CKSUM)
 				ol_flags |= PKT_TX_UDP_CKSUM;
-				udp_hdr->dgram_cksum = get_psd_sum(l3_hdr,
-					info->ethertype, ol_flags);
-			} else {
+			else {
 				udp_hdr->dgram_cksum =
 					get_udptcp_checksum(l3_hdr, udp_hdr,
 						info->ethertype);
@@ -383,15 +372,11 @@ struct simple_gre_hdr {
 	} else if (info->l4_proto == IPPROTO_TCP) {
 		tcp_hdr = (struct tcp_hdr *)((char *)l3_hdr + info->l3_len);
 		tcp_hdr->cksum = 0;
-		if (tso_segsz) {
+		if (tso_segsz)
 			ol_flags |= PKT_TX_TCP_SEG;
-			tcp_hdr->cksum = get_psd_sum(l3_hdr, info->ethertype,
-				ol_flags);
-		} else if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_TCP_CKSUM) {
+		else if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_TCP_CKSUM)
 			ol_flags |= PKT_TX_TCP_CKSUM;
-			tcp_hdr->cksum = get_psd_sum(l3_hdr, info->ethertype,
-				ol_flags);
-		} else {
+		else {
 			tcp_hdr->cksum =
 				get_udptcp_checksum(l3_hdr, tcp_hdr,
 					info->ethertype);
@@ -648,6 +633,7 @@ struct simple_gre_hdr {
 	void *l3_hdr = NULL, *outer_l3_hdr = NULL; /* can be IPv4 or IPv6 */
 	uint16_t nb_rx;
 	uint16_t nb_tx;
+	uint16_t nb_prep;
 	uint16_t i;
 	uint64_t rx_ol_flags, tx_ol_flags;
 	uint16_t testpmd_ol_flags;
@@ -857,7 +843,16 @@ struct simple_gre_hdr {
 			printf("\n");
 		}
 	}
-	nb_tx = rte_eth_tx_burst(fs->tx_port, fs->tx_queue, pkts_burst, nb_rx);
+
+	nb_prep = rte_eth_tx_prepare(fs->tx_port, fs->tx_queue,
+			pkts_burst, nb_rx);
+	if (nb_prep != nb_rx)
+		printf("Preparing packet burst to transmit failed: %s\n",
+				rte_strerror(rte_errno));
+
+	nb_tx = rte_eth_tx_burst(fs->tx_port, fs->tx_queue, pkts_burst,
+			nb_prep);
+
 	/*
 	 * Retry if necessary
 	 */
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index a0332c2..634f10b 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -180,6 +180,11 @@ struct fwd_engine * fwd_engines[] = {
 enum tx_pkt_split tx_pkt_split = TX_PKT_SPLIT_OFF;
 /**< Split policy for packets to TX. */
 
+/*
+ * Enable Tx preparation path in the "csum" engine.
+ */
+uint8_t tx_prepare;
+
 uint16_t nb_pkt_per_burst = DEF_PKT_BURST; /**< Number of packets per burst. */
 uint16_t mb_mempool_cache = DEF_MBUF_CACHE; /**< Size of mbuf mempool cache. */
 
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index 9c1e703..488a6e1 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -383,6 +383,8 @@ enum tx_pkt_split {
 
 extern enum tx_pkt_split tx_pkt_split;
 
+extern uint8_t tx_prepare;
+
 extern uint16_t nb_pkt_per_burst;
 extern uint16_t mb_mempool_cache;
 extern int8_t rx_pthresh;
-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 261+ messages in thread
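
For completeness, a sketch of how a forwarding loop could skip a packet
rejected by rte_eth_tx_prepare() and continue with the rest (illustrative
only -- handling of mbufs left unsent by rte_eth_tx_burst() is omitted for
brevity; fs, pkts_burst and nb_rx are as in the csum engine above):

	uint16_t ofs = 0;

	while (ofs < nb_rx) {
		uint16_t n = rte_eth_tx_prepare(fs->tx_port, fs->tx_queue,
				pkts_burst + ofs, nb_rx - ofs);

		/* send whatever was prepared successfully */
		rte_eth_tx_burst(fs->tx_port, fs->tx_queue,
				pkts_burst + ofs, n);
		ofs += n;
		if (ofs < nb_rx) {
			/* pkts_burst[ofs] was rejected: drop it and
			 * retry preparation on the following packets */
			rte_pktmbuf_free(pkts_burst[ofs]);
			ofs++;
		}
	}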

* Re: [dpdk-dev] [PATCH v13 6/7] vmxnet3: add Tx preparation
  2016-12-20 13:36                           ` Ferruh Yigit
@ 2016-12-22 13:10                             ` Thomas Monjalon
  0 siblings, 0 replies; 261+ messages in thread
From: Thomas Monjalon @ 2016-12-22 13:10 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: dev, Tomasz Kulasek, Ananyev, Konstantin

2016-12-20 13:36, Ferruh Yigit:
> On 12/13/2016 5:41 PM, Tomasz Kulasek wrote:
> > From: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>
> > 
> > Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> > ---
> 
> <...>
> 
> >  
> >  uint16_t
> > +vmxnet3_prep_pkts(__rte_unused void *tx_queue, struct rte_mbuf **tx_pkts,
> > +	uint16_t nb_pkts)
> > +{
> <...>
> > +
> > +#ifdef RTE_LIBRTE_ETHDEV_DEBUG
> > +		ret = rte_validate_tx_offload(m);
> > +		if (ret != 0) {
> > +			rte_errno = ret;
> > +			return i;
> > +		}
> > +#endif
> > +		ret = rte_net_intel_cksum_prepare(m);
> 
> Since this API used beyond Intel drivers, what do you think renaming it?
> rte_net_generic_cksum_prepare() ?

I think it is good to have Intel in its name because it is where it
comes from.
Hopefully we won't have to care about this specific API once tx_prepare
is well accepted.

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v12 1/6] ethdev: add Tx preparation
  2016-12-02  0:10                                   ` Ananyev, Konstantin
@ 2016-12-22 13:14                                     ` Thomas Monjalon
  2016-12-22 13:37                                       ` Jerin Jacob
  0 siblings, 1 reply; 261+ messages in thread
From: Thomas Monjalon @ 2016-12-22 13:14 UTC (permalink / raw)
  To: Ananyev, Konstantin
  Cc: Kulasek, TomaszX, dev, olivier.matz, Richardson, Bruce

2016-12-02 00:10, Ananyev, Konstantin:
> I have absolutely no problem to remove the RTE_ETHDEV_TX_PREPARE and associated logic.
> I personally don't use ARM boxes and don't plan to,
> and in theory users can still do conditional compilation at the upper layer, if they want to. 

Yes you're right. The application can avoid calling tx_prepare at all.
No need of an ifdef inside DPDK.

^ permalink raw reply	[flat|nested] 261+ messages in thread
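
A sketch of the application-level conditional mentioned above
(APP_USE_TX_PREPARE is a hypothetical application macro, not a DPDK config
option):

	#ifdef APP_USE_TX_PREPARE
		nb_prep = rte_eth_tx_prepare(port, queue, bufs, nb_pkts);
	#else
		nb_prep = nb_pkts;	/* known-good NIC: skip the call */
	#endif
		nb_tx = rte_eth_tx_burst(port, queue, bufs, nb_prep);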

* Re: [dpdk-dev] [PATCH v12 1/6] ethdev: add Tx preparation
  2016-12-12 11:51                                     ` Ananyev, Konstantin
@ 2016-12-22 13:30                                       ` Thomas Monjalon
  2016-12-22 14:11                                         ` Ananyev, Konstantin
  0 siblings, 1 reply; 261+ messages in thread
From: Thomas Monjalon @ 2016-12-22 13:30 UTC (permalink / raw)
  To: Ananyev, Konstantin; +Cc: Kulasek, TomaszX, Olivier Matz, dev

2016-12-12 11:51, Ananyev, Konstantin:
> > > The application gets few information from tx_prepare() about what should
> > > be done to make the packet accepted by the hw, and the actions will
> > > probably be different depending on hardware.
> 
> That's true.
> I am open to suggestions how in future to provide extra information to the upper layer.
> Set rte_errno to different values depending on type of error,
> OR extra parameter in tx_prepare() that will provide more detailed error information,
> OR something else?

That's one of the reasons giving me the feeling that it is safer
to introduce tx_prepare as an experimental API in 17.02.
So the users will know that it can change in the next release.
What do you think?

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v12 1/6] ethdev: add Tx preparation
  2016-12-22 13:14                                     ` Thomas Monjalon
@ 2016-12-22 13:37                                       ` Jerin Jacob
  0 siblings, 0 replies; 261+ messages in thread
From: Jerin Jacob @ 2016-12-22 13:37 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: Ananyev, Konstantin, Kulasek, TomaszX, dev, olivier.matz,
	Richardson, Bruce

On Thu, Dec 22, 2016 at 02:14:45PM +0100, Thomas Monjalon wrote:
> 2016-12-02 00:10, Ananyev, Konstantin:
> > I have absolutely no problem to remove the RTE_ETHDEV_TX_PREPARE and associated logic.
> > I personally don't use ARM boxes and don't plan to,
> > and in theory users can still do conditional compilation at the upper layer, if they want to. 
> 
> Yes you're right. The application can avoid calling tx_prepare at all.

There are applications inside the dpdk repo which will be using tx_prep, so
in that case, IMHO, keep the ifdef inside the DPDK library, disabled by
default, so that if required we can disable tx_prep in one shot on
integrated-controller targets, where the system has only one integrated
controller and that controller does not need tx_prep.


> No need of an ifdef inside DPDK.

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v12 1/6] ethdev: add Tx preparation
  2016-12-22 13:30                                       ` Thomas Monjalon
@ 2016-12-22 14:11                                         ` Ananyev, Konstantin
  0 siblings, 0 replies; 261+ messages in thread
From: Ananyev, Konstantin @ 2016-12-22 14:11 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: Kulasek, TomaszX, Olivier Matz, dev

> 
> 2016-12-12 11:51, Ananyev, Konstantin:
> > > > The application gets few information from tx_prepare() about what should
> > > > be done to make the packet accepted by the hw, and the actions will
> > > > probably be different depending on hardware.
> >
> > That's true.
> > I am open to suggestions how in future to provide extra information to the upper layer.
> > Set rte_errno to different values depending on type of error,
> > OR extra parameter in tx_prepare() that will provide more detailed error information,
> > OR something else?
> 
> That's one of the reason which give me a feeling that it is safer
> to introduce tx_prepare as an experimental API in 17.02.
> So the users will know that it can change in the next release.
> What do you think?

I think that's the good reason and I am ok with it. 
Konstantin

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v14 1/8] ethdev: add Tx preparation
  2016-12-22 13:05                           ` [dpdk-dev] [PATCH v14 1/8] ethdev: " Tomasz Kulasek
@ 2016-12-22 14:24                             ` Thomas Monjalon
  2016-12-23 18:49                               ` Kulasek, TomaszX
  0 siblings, 1 reply; 261+ messages in thread
From: Thomas Monjalon @ 2016-12-22 14:24 UTC (permalink / raw)
  To: Tomasz Kulasek; +Cc: dev

Hi Tomasz,

2016-12-22 14:05, Tomasz Kulasek:
> Added API for `rte_eth_tx_prepare`
> 
> uint16_t rte_eth_tx_prepare(uint8_t port_id, uint16_t queue_id,
> 	struct rte_mbuf **tx_pkts, uint16_t nb_pkts)

As discussed earlier and agreed by Konstantin, please mark this API
as experimental.
We could make some changes in 17.05 to improve error description
or add some flags to modify the behaviour.


> int rte_net_intel_cksum_prepare(struct rte_mbuf *m)
> 
>   to prepare pseudo header checksum for TSO and non-TSO tcp/udp packets
>   before hardware tx checksum offload.
>    - for non-TSO tcp/udp packets full pseudo-header checksum is
>      counted and set.
>    - for TSO the IP payload length is not included.
> 
> 
> int
> rte_net_intel_cksum_flags_prepare(struct rte_mbuf *m, uint64_t ol_flags)
> 
>   this function uses the same logic as rte_net_intel_cksum_prepare, but
>   allows the application to choose which offloads should be taken into
>   account, if full preparation is not required.

How the application knows which offload flag should be taken into account?


>  #
> +# Use real NOOP to turn off TX preparation stage
> +#
> +# While the behaviour of ``rte_ethdev_tx_prepare`` may change after turning on
> +# real NOOP, this configuration shouldn't be never enabled globaly, and can be
> +# used in appropriate target configuration file with a following restrictions
> +#
> +CONFIG_RTE_ETHDEV_TX_PREPARE_NOOP=n

As discussed earlier, it would be easier to not call tx_prepare at all.
However, this option allows an optimization when compiling DPDK for a
known environment without modifying the application.
So it is worth to keep it.

The text explaining the option should be improved.
I suggest this text:

# Turn off Tx preparation stage
#
# Warning: rte_ethdev_tx_prepare() can be safely disabled only if using a
# driver which does not implement any Tx preparation.


> +	uint16_t nb_seg_max;  /**< Max number of segments per whole packet. */
> +	uint16_t nb_mtu_seg_max; /**< Max number of segments per one MTU */

In another mail, you've added this explanation:
* For non-TSO packet, a single transmit packet may span up to "nb_mtu_seg_max" buffers.
* For TSO packet the total number of data descriptors is "nb_seg_max", and each segment within the TSO may span up to "nb_mtu_seg_max".

Maybe you can try to mix these comments to improve the doxygen.

^ permalink raw reply	[flat|nested] 261+ messages in thread
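
One possible merged wording along those lines (illustrative only; not
necessarily the text that was finally committed):

	/**
	 * Max allowed number of segments (data descriptors) per whole
	 * packet. For a TSO packet this is the total descriptor count,
	 * and each TSO segment within it may additionally span up to
	 * nb_mtu_seg_max buffers.
	 */
	uint16_t nb_seg_max;

	/**
	 * Max number of segments per one MTU: a non-TSO packet, or a
	 * single segment within a TSO packet, may span up to this many
	 * buffers.
	 */
	uint16_t nb_mtu_seg_max;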

* Re: [dpdk-dev] [PATCH v14 8/8] testpmd: use Tx preparation in csum engine
  2016-12-22 13:05                           ` [dpdk-dev] [PATCH v14 8/8] testpmd: use Tx preparation in csum engine Tomasz Kulasek
@ 2016-12-22 14:28                             ` Thomas Monjalon
  0 siblings, 0 replies; 261+ messages in thread
From: Thomas Monjalon @ 2016-12-22 14:28 UTC (permalink / raw)
  To: Tomasz Kulasek; +Cc: dev

2016-12-22 14:05, Tomasz Kulasek:
> Since all current drivers supports Tx preparation API, it is used
> in csum forwarding engine by default for all drivers.
[...]
> +/*
> + * Enable Tx preparation path in the "csum" engine.
> + */
> +uint8_t tx_prepare;

It seems this variable is not used.

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v14 6/8] vmxnet3: add Tx preparation
  2016-12-22 13:05                           ` [dpdk-dev] [PATCH v14 6/8] vmxnet3: " Tomasz Kulasek
@ 2016-12-22 17:59                             ` Yong Wang
  0 siblings, 0 replies; 261+ messages in thread
From: Yong Wang @ 2016-12-22 17:59 UTC (permalink / raw)
  To: Tomasz Kulasek, dev; +Cc: Ananyev, Konstantin

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Tomasz Kulasek
> Sent: Thursday, December 22, 2016 5:05 AM
> To: dev@dpdk.org
> Cc: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> Subject: [dpdk-dev] [PATCH v14 6/8] vmxnet3: add Tx preparation
> 
> From: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>
> 
> Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> ---

Acked-by: Yong Wang <yongwang@vmware.com>

^ permalink raw reply	[flat|nested] 261+ messages in thread

* [dpdk-dev] [PATCH v15 0/8] add Tx preparation
  2016-12-22 13:05                         ` [dpdk-dev] [PATCH v14 0/8] add Tx preparation Tomasz Kulasek
                                             ` (7 preceding siblings ...)
  2016-12-22 13:05                           ` [dpdk-dev] [PATCH v14 8/8] testpmd: use Tx preparation in csum engine Tomasz Kulasek
@ 2016-12-23 18:40                           ` Tomasz Kulasek
  2016-12-23 18:40                             ` [dpdk-dev] [PATCH v15 1/8] ethdev: " Tomasz Kulasek
                                               ` (8 more replies)
  8 siblings, 9 replies; 261+ messages in thread
From: Tomasz Kulasek @ 2016-12-23 18:40 UTC (permalink / raw)
  To: dev

As discussed in that thread:

http://dpdk.org/ml/archives/dev/2015-September/023603.html

Different NIC models depending on HW offload requested might impose
different requirements on packets to be TX-ed in terms of:

 - Max number of fragments per packet allowed
 - Max number of fragments per TSO segments
 - The way pseudo-header checksum should be pre-calculated
 - L3/L4 header fields filling
 - etc.


MOTIVATION:
-----------

1) Some work cannot (and should not) be done in rte_eth_tx_burst.
   However, this work is sometimes required, and today it is an
   application issue.

2) Different hardware may have different requirements for TX offloads;
   a different subset may be supported, and so on.

3) Some parameters (e.g. the number of segments in the ixgbe driver) may
   hang the device. These parameters may vary between devices.

   For example, i40e HW allows 8 fragments per packet, but that is after
   TSO segmentation, while ixgbe has a 38-fragment pre-TSO limit.

4) Fields in the packet may require different initialization (e.g.
   pseudo-header checksum precalculation, sometimes done in a different
   way depending on packet type, and so on). Currently the application
   needs to take care of this.

5) Using an additional API (rte_eth_tx_prepare) before rte_eth_tx_burst
   allows preparing the packet burst in a form acceptable to the specific
   device.

6) Some additional checks may be done in debug mode, keeping the tx_burst
   implementation clean.


PROPOSAL:
---------

To help user to deal with all these varieties we propose to:

1) Introduce a rte_eth_tx_prepare() function to do the necessary
   preparations of a packet burst so it can be safely transmitted on the
   device with the desired HW offloads (set/reset the checksum field
   according to the hardware requirements) and to check HW constraints
   (number of segments per packet, etc).

   Since the limitations and requirements may differ between devices,
   this requires extending the rte_eth_dev structure with a new function
   pointer, "tx_pkt_prepare", which can be implemented in the driver to
   prepare and verify packets, in a device-specific way, before the
   burst, in order to prevent the application from sending malformed
   packets.

2) Also, new fields will be introduced in rte_eth_desc_lim:
   nb_seg_max and nb_mtu_seg_max, providing information about the maximum
   number of segments in TSO and non-TSO packets acceptable to the
   device.

   This information is useful for the application to avoid creating, or
   to limit, malicious packets.


APPLICATION (CASE OF USE):
--------------------------

1) The application initializes the burst of packets to send, setting
   the required tx offload flags and fields, like l2_len, l3_len,
   l4_len, and tso_segsz.

2) The application passes the burst to rte_eth_tx_prepare to check the
   conditions required to send the packets through the NIC.

3) The result of rte_eth_tx_prepare can be used to send the valid
   packets and/or to restore the invalid ones if the function fails.

e.g.

	for (i = 0; i < nb_pkts; i++) {

		/* initialize or process packet */

		bufs[i]->tso_segsz = 800;
		bufs[i]->ol_flags = PKT_TX_TCP_SEG | PKT_TX_IPV4
				| PKT_TX_IP_CKSUM;
		bufs[i]->l2_len = sizeof(struct ether_hdr);
		bufs[i]->l3_len = sizeof(struct ipv4_hdr);
		bufs[i]->l4_len = sizeof(struct tcp_hdr);
	}

	/* Prepare burst of TX packets */
	nb_prep = rte_eth_tx_prepare(port, 0, bufs, nb_pkts);

	if (nb_prep < nb_pkts) {
		printf("Tx prepare failed\n");

		/* nb_prep here indicates the first invalid packet.
		 * rte_eth_tx_prepare can be called again on the remaining
		 * packets to find further invalid ones.
		 */

	}

	/* Send burst of TX packets */
	nb_tx = rte_eth_tx_burst(port, 0, bufs, nb_prep);

	/* Free any unsent packets. */
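
As a rough sketch of the failure path (illustrative only, not part of
the patch set, reusing the bufs/nb_pkts/nb_prep variables from the
example above), the application could instead drop each invalid mbuf
and re-run preparation on the remainder until the whole burst is valid:

	while (nb_prep < nb_pkts) {
		/* bufs[nb_prep] is the first invalid packet */
		rte_pktmbuf_free(bufs[nb_prep]);
		memmove(&bufs[nb_prep], &bufs[nb_prep + 1],
			(nb_pkts - nb_prep - 1) * sizeof(bufs[0]));
		nb_pkts--;

		/* rte_eth_tx_prepare returns the count of valid packets
		 * from the start of the array passed to it
		 */
		nb_prep += rte_eth_tx_prepare(port, 0, &bufs[nb_prep],
				nb_pkts - nb_prep);
	}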


v15 changes:
 - marked the rte_eth_tx_prepare API as experimental
 - improved doxygen comments for nb_seg_max and nb_mtu_seg_max fields
 - removed unused "uint8_t tx_prepare" declaration from testpmd

v14 changes:
 - added support for ena
 - introduced the rte_net_intel_cksum_flags_prepare(m, ol_flags) function
   in rte_net.h to allow the application to choose which offloads to
   compute if not all are required
 - since all drivers now support the Tx preparation API, removed the
   csum txprep command from test-pmd and made Tx preparation the default

v13 changes:
 - added support for vmxnet3
 - reworded help information for "csum txprep" command
 - renamed RTE_ETHDEV_TX_PREPARE to RTE_ETHDEV_TX_PREPARE_NOOP to
   better suit its purpose.

v12 changes:
 - renamed API function from "rte_eth_tx_prep" to "rte_eth_tx_prepare"
   (to avoid confusion with "prepend")
 - changed "rte_phdr_cksum_fix" to "rte_net_intel_cksum_prepare"
 - added "csum txprep (on|off)" command to the csum engine allowing to
   select txprep path for packet processing

v11 changes:
 - updated comments
 - added information to the API description about packet data
   requirements/limitations.

v10 changes:
 - moved the driver's tx callback check in rte_eth_tx_prep after the
   queue_id check

v9 changes:
 - fixed headers structure fragmentation check
 - moved fragmentation check into rte_validate_tx_offload()

v8 changes:
 - mbuf argument in rte_validate_tx_offload declared as const

v7 changes:
 - comments reworded/added
 - changed errno values returned from Tx prep API
 - added check in rte_phdr_cksum_fix if headers are in the first
   data segment and can be safely modified
 - moved rte_validate_tx_offload to rte_mbuf
 - moved rte_phdr_cksum_fix to rte_net.h
 - removed rte_pkt.h new file as useless

v6 changes:
 - added performance impact test results to the patch description

v5 changes:
 - rebased csum engine modification
 - added information to the csum engine about performance tests
 - some performance improvements

v4 changes:
 - tx_prep is now set to default behavior (NULL) for simple/vector path
   in fm10k, i40e and ixgbe drivers to increase performance, when
   Tx offloads are intentionally not available

v3 changes:
 - reworked the csum testpmd engine instead of adding a new one,
 - fixed the checksum initialization procedure to also include outer
   checksum offloads,
 - some minor formatting and optimizations

v2 changes:
 - rte_eth_tx_prep() returns the number of packets when the device
   doesn't support tx_prep functionality,
 - introduced CONFIG_RTE_ETHDEV_TX_PREP, allowing tx_prep to be turned
   off


Konstantin Ananyev (2):
  ena: add Tx preparation
  vmxnet3: add Tx preparation

Tomasz Kulasek (6):
  ethdev: add Tx preparation
  e1000: add Tx preparation
  fm10k: add Tx preparation
  i40e: add Tx preparation
  ixgbe: add Tx preparation
  testpmd: use Tx preparation in csum engine

 app/test-pmd/csumonly.c              |   37 ++++-----
 app/test-pmd/testpmd.c               |    5 ++
 config/common_base                   |    8 ++
 drivers/net/e1000/e1000_ethdev.h     |   11 +++
 drivers/net/e1000/em_ethdev.c        |    5 +-
 drivers/net/e1000/em_rxtx.c          |   48 +++++++++++-
 drivers/net/e1000/igb_ethdev.c       |    4 +
 drivers/net/e1000/igb_rxtx.c         |   53 ++++++++++++-
 drivers/net/ena/ena_ethdev.c         |   51 +++++++++++++
 drivers/net/fm10k/fm10k.h            |    6 ++
 drivers/net/fm10k/fm10k_ethdev.c     |    5 ++
 drivers/net/fm10k/fm10k_rxtx.c       |   50 +++++++++++-
 drivers/net/i40e/i40e_ethdev.c       |    3 +
 drivers/net/i40e/i40e_rxtx.c         |   74 +++++++++++++++++-
 drivers/net/i40e/i40e_rxtx.h         |    8 ++
 drivers/net/ixgbe/ixgbe_ethdev.c     |    3 +
 drivers/net/ixgbe/ixgbe_ethdev.h     |    5 +-
 drivers/net/ixgbe/ixgbe_rxtx.c       |   57 ++++++++++++++
 drivers/net/ixgbe/ixgbe_rxtx.h       |    2 +
 drivers/net/vmxnet3/vmxnet3_ethdev.c |    6 ++
 drivers/net/vmxnet3/vmxnet3_ethdev.h |    2 +
 drivers/net/vmxnet3/vmxnet3_rxtx.c   |   56 ++++++++++++++
 lib/librte_ether/rte_ethdev.h        |  139 ++++++++++++++++++++++++++++++++++
 lib/librte_mbuf/rte_mbuf.h           |   64 ++++++++++++++++
 lib/librte_net/rte_net.h             |  110 +++++++++++++++++++++++++++
 25 files changed, 785 insertions(+), 27 deletions(-)

-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 261+ messages in thread

* [dpdk-dev] [PATCH v15 1/8] ethdev: add Tx preparation
  2016-12-23 18:40                           ` [dpdk-dev] [PATCH v15 0/8] add Tx preparation Tomasz Kulasek
@ 2016-12-23 18:40                             ` Tomasz Kulasek
  2016-12-23 18:40                             ` [dpdk-dev] [PATCH v15 2/8] e1000: " Tomasz Kulasek
                                               ` (7 subsequent siblings)
  8 siblings, 0 replies; 261+ messages in thread
From: Tomasz Kulasek @ 2016-12-23 18:40 UTC (permalink / raw)
  To: dev

Added API for `rte_eth_tx_prepare`

uint16_t rte_eth_tx_prepare(uint8_t port_id, uint16_t queue_id,
	struct rte_mbuf **tx_pkts, uint16_t nb_pkts)

Added fields to the `struct rte_eth_desc_lim`:

	uint16_t nb_seg_max;
		/**< Max number of segments per whole packet. */

	uint16_t nb_mtu_seg_max;
		/**< Max number of segments per one MTU */

These fields can be used to create valid packets according to the
following rules:

 * For a non-TSO packet, a single transmit packet may span up to
   "nb_mtu_seg_max" buffers.

 * For a TSO packet, the total number of data descriptors is limited to
   "nb_seg_max", and each segment within the TSO may span up to
   "nb_mtu_seg_max" buffers.
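
For illustration, an application could pre-screen an mbuf chain against
these limits before calling the burst functions. A minimal sketch
(illustrative only; port_id and the mbuf m are assumed to be set up
elsewhere, and the per-TSO-segment span check is omitted for brevity):

	struct rte_eth_dev_info dev_info;
	const struct rte_eth_desc_lim *lim;

	rte_eth_dev_info_get(port_id, &dev_info);
	lim = &dev_info.tx_desc_lim;

	if (m->ol_flags & PKT_TX_TCP_SEG) {
		/* TSO: total descriptor count is bounded by nb_seg_max */
		if (m->nb_segs > lim->nb_seg_max)
			return -EINVAL;
	} else {
		/* non-TSO: one packet may span up to nb_mtu_seg_max buffers */
		if (m->nb_segs > lim->nb_mtu_seg_max)
			return -EINVAL;
	}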


Added functions:

int
rte_validate_tx_offload(struct rte_mbuf *m)

  to validate general requirements for the tx offloads set in the
  packet's mbuf, such as flag completeness. In the current implementation
  this function is called optionally, when RTE_LIBRTE_ETHDEV_DEBUG is
  enabled.


int rte_net_intel_cksum_prepare(struct rte_mbuf *m)

  to prepare the pseudo-header checksum for TSO and non-TSO tcp/udp
  packets before the hardware tx checksum offload.
   - for non-TSO tcp/udp packets the full pseudo-header checksum is
     computed and set.
   - for TSO the IP payload length is not included.


int
rte_net_intel_cksum_flags_prepare(struct rte_mbuf *m, uint64_t ol_flags)

  this function uses the same logic as rte_net_intel_cksum_prepare, but
  allows the application to choose which offloads should be taken into
  account if full preparation is not required.
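
For instance, a caller that wants the pseudo-header checksum prepared
but without the TSO-specific length adjustment could mask out
PKT_TX_TCP_SEG (a sketch of the intended usage; the ena patch later in
this series uses this pattern):

	/* prepare checksums for all requested offloads except TSO */
	ret = rte_net_intel_cksum_flags_prepare(m,
			m->ol_flags & ~PKT_TX_TCP_SEG);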

  
PERFORMANCE TESTS
-----------------

This feature was tested with a modified csum engine from test-pmd.

The packet checksum preparation was moved from the application to the Tx
preparation step placed before the burst.

We may expect some overhead costs caused by:
1) using additional callback before burst,
2) rescanning burst,
3) additional condition checking (packet validation),
4) less optimal code (e.g. additional packet data accesses, etc.)

We tested it using the ixgbe Tx preparation implementation with some
parts disabled, to get comparable information about the impact of the
different parts of the implementation.

IMPACT:

1) When the Tx preparation callback is not implemented, the performance
   impact is negligible.
2) The packet condition check without checksum modifications (nb_segs,
   available offloads, etc.) gives 14626628/14252168 (~2.62% drop).
3) Full support in the ixgbe driver (point 2 plus packet checksum
   initialization) gives 14060924/13588094 (~3.48% drop).

Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
---
 config/common_base            |    8 +++
 lib/librte_ether/rte_ethdev.h |  139 +++++++++++++++++++++++++++++++++++++++++
 lib/librte_mbuf/rte_mbuf.h    |   64 +++++++++++++++++++
 lib/librte_net/rte_net.h      |  110 ++++++++++++++++++++++++++++++++
 4 files changed, 321 insertions(+)

diff --git a/config/common_base b/config/common_base
index edb6a54..8e9dcfa 100644
--- a/config/common_base
+++ b/config/common_base
@@ -123,6 +123,14 @@ CONFIG_RTE_ETHDEV_QUEUE_STAT_CNTRS=16
 CONFIG_RTE_ETHDEV_RXTX_CALLBACKS=y
 
 #
+# Turn off Tx preparation stage
+#
+# Warning: rte_eth_tx_prepare() can be safely disabled only when using a
+# driver which does not implement any Tx preparation.
+#
+CONFIG_RTE_ETHDEV_TX_PREPARE_NOOP=n
+
+#
 # Support NIC bypass logic
 #
 CONFIG_RTE_NIC_BYPASS=n
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 52119af..86c16e0 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -182,6 +182,7 @@
 #include <rte_pci.h>
 #include <rte_dev.h>
 #include <rte_devargs.h>
+#include <rte_errno.h>
 #include "rte_ether.h"
 #include "rte_eth_ctrl.h"
 #include "rte_dev_info.h"
@@ -702,6 +703,29 @@ struct rte_eth_desc_lim {
 	uint16_t nb_max;   /**< Max allowed number of descriptors. */
 	uint16_t nb_min;   /**< Min allowed number of descriptors. */
 	uint16_t nb_align; /**< Number of descriptors should be aligned to. */
+
+	/**
+	 * Max allowed number of segments per whole packet.
+	 *
+	 * - For TSO packet this is the total number of data descriptors allowed
+	 *   by device.
+	 *
+	 * @see nb_mtu_seg_max
+	 */
+	uint16_t nb_seg_max;
+
+	/**
+	 * Max number of segments per one MTU.
+	 *
+	 * - For non-TSO packet, this is the maximum allowed number of segments
+	 *   in a single transmit packet.
+	 *
+	 * - For TSO packet each segment within the TSO may span up to this
+	 *   value.
+	 *
+	 * @see nb_seg_max
+	 */
+	uint16_t nb_mtu_seg_max;
 };
 
 /**
@@ -1191,6 +1215,11 @@ typedef uint16_t (*eth_tx_burst_t)(void *txq,
 				   uint16_t nb_pkts);
 /**< @internal Send output packets on a transmit queue of an Ethernet device. */
 
+typedef uint16_t (*eth_tx_prep_t)(void *txq,
+				   struct rte_mbuf **tx_pkts,
+				   uint16_t nb_pkts);
+/**< @internal Prepare output packets on a transmit queue of an Ethernet device. */
+
 typedef int (*flow_ctrl_get_t)(struct rte_eth_dev *dev,
 			       struct rte_eth_fc_conf *fc_conf);
 /**< @internal Get current flow control parameter on an Ethernet device */
@@ -1625,6 +1654,7 @@ struct rte_eth_rxtx_callback {
 struct rte_eth_dev {
 	eth_rx_burst_t rx_pkt_burst; /**< Pointer to PMD receive function. */
 	eth_tx_burst_t tx_pkt_burst; /**< Pointer to PMD transmit function. */
+	eth_tx_prep_t tx_pkt_prepare; /**< Pointer to PMD transmit prepare function. */
 	struct rte_eth_dev_data *data;  /**< Pointer to device data */
 	const struct eth_driver *driver;/**< Driver for this device */
 	const struct eth_dev_ops *dev_ops; /**< Functions exported by PMD */
@@ -2832,6 +2862,115 @@ int rte_eth_dev_set_vlan_ether_type(uint8_t port_id,
 	return (*dev->tx_pkt_burst)(dev->data->tx_queues[queue_id], tx_pkts, nb_pkts);
 }
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Process a burst of output packets on a transmit queue of an Ethernet device.
+ *
+ * The rte_eth_tx_prepare() function is invoked to prepare output packets to be
+ * transmitted on the output queue *queue_id* of the Ethernet device designated
+ * by its *port_id*.
+ * The *nb_pkts* parameter is the number of packets to be prepared which are
+ * supplied in the *tx_pkts* array of *rte_mbuf* structures, each of them
+ * allocated from a pool created with rte_pktmbuf_pool_create().
+ * For each packet to send, the rte_eth_tx_prepare() function performs
+ * the following operations:
+ *
+ * - Check if the packet meets the device's requirements for tx offloads.
+ *
+ * - Check limitations about number of segments.
+ *
+ * - Check additional requirements when debug is enabled.
+ *
+ * - Update and/or reset required checksums when tx offload is set for packet.
+ *
+ * Since this function can modify packet data, provided mbufs must be safely
+ * writable (e.g. modified data cannot be in shared segment).
+ *
+ * The rte_eth_tx_prepare() function returns the number of packets ready to be
+ * sent. A return value equal to *nb_pkts* means that all packets are valid and
+ * ready to be sent; otherwise it stops processing at the first invalid packet
+ * and leaves the remaining packets untouched.
+ *
+ * When this functionality is not implemented in the driver, all packets are
+ * returned untouched.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ *   The value must be a valid port id.
+ * @param queue_id
+ *   The index of the transmit queue through which output packets must be
+ *   sent.
+ *   The value must be in the range [0, nb_tx_queue - 1] previously supplied
+ *   to rte_eth_dev_configure().
+ * @param tx_pkts
+ *   The address of an array of *nb_pkts* pointers to *rte_mbuf* structures
+ *   which contain the output packets.
+ * @param nb_pkts
+ *   The maximum number of packets to process.
+ * @return
+ *   The number of packets correct and ready to be sent. The return value can be
+ *   less than the value of the *nb_pkts* parameter when a packet doesn't
+ *   meet the device's requirements, with rte_errno set appropriately:
+ *   - -EINVAL: offload flags are not correctly set
+ *   - -ENOTSUP: the offload feature is not supported by the hardware
+ *
+ */
+
+#ifndef RTE_ETHDEV_TX_PREPARE_NOOP
+
+static inline uint16_t
+rte_eth_tx_prepare(uint8_t port_id, uint16_t queue_id,
+		struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
+{
+	struct rte_eth_dev *dev;
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+	if (!rte_eth_dev_is_valid_port(port_id)) {
+		RTE_PMD_DEBUG_TRACE("Invalid TX port_id=%d\n", port_id);
+		rte_errno = -EINVAL;
+		return 0;
+	}
+#endif
+
+	dev = &rte_eth_devices[port_id];
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+	if (queue_id >= dev->data->nb_tx_queues) {
+		RTE_PMD_DEBUG_TRACE("Invalid TX queue_id=%d\n", queue_id);
+		rte_errno = -EINVAL;
+		return 0;
+	}
+#endif
+
+	if (!dev->tx_pkt_prepare)
+		return nb_pkts;
+
+	return (*dev->tx_pkt_prepare)(dev->data->tx_queues[queue_id],
+			tx_pkts, nb_pkts);
+}
+
+#else
+
+/*
+ * Native NOOP operation for compilation targets which do not require any
+ * preparation steps, and where a functional NOOP may introduce an unnecessary
+ * performance drop.
+ *
+ * Generally it is not a good idea to turn this on globally, and it should not
+ * be used if the behavior of tx_preparation can change.
+ */
+
+static inline uint16_t
+rte_eth_tx_prepare(__rte_unused uint8_t port_id, __rte_unused uint16_t queue_id,
+		__rte_unused struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
+{
+	return nb_pkts;
+}
+
+#endif
+
 typedef void (*buffer_tx_error_fn)(struct rte_mbuf **unsent, uint16_t count,
 		void *userdata);
 
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index ead7c6e..39ee5ed 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -283,6 +283,19 @@
  */
 #define PKT_TX_OUTER_IPV6    (1ULL << 60)
 
+/**
+ * Bit Mask of all supported packet Tx offload features flags, which can be set
+ * for packet.
+ */
+#define PKT_TX_OFFLOAD_MASK (    \
+		PKT_TX_IP_CKSUM |        \
+		PKT_TX_L4_MASK |         \
+		PKT_TX_OUTER_IP_CKSUM |  \
+		PKT_TX_TCP_SEG |         \
+		PKT_TX_QINQ_PKT |        \
+		PKT_TX_VLAN_PKT |        \
+		PKT_TX_TUNNEL_MASK)
+
 #define __RESERVED           (1ULL << 61) /**< reserved for future mbuf use */
 
 #define IND_ATTACHED_MBUF    (1ULL << 62) /**< Indirect attached mbuf */
@@ -1647,6 +1660,57 @@ static inline int rte_pktmbuf_chain(struct rte_mbuf *head, struct rte_mbuf *tail
 }
 
 /**
+ * Validate general requirements for tx offload in mbuf.
+ *
+ * This function checks correctness and completeness of Tx offload settings.
+ *
+ * @param m
+ *   The packet mbuf to be validated.
+ * @return
+ *   0 if packet is valid
+ */
+static inline int
+rte_validate_tx_offload(const struct rte_mbuf *m)
+{
+	uint64_t ol_flags = m->ol_flags;
+	uint64_t inner_l3_offset = m->l2_len;
+
+	/* Does packet set any of available offloads? */
+	if (!(ol_flags & PKT_TX_OFFLOAD_MASK))
+		return 0;
+
+	if (ol_flags & PKT_TX_OUTER_IP_CKSUM)
+		inner_l3_offset += m->outer_l2_len + m->outer_l3_len;
+
+	/* Headers are fragmented */
+	if (rte_pktmbuf_data_len(m) < inner_l3_offset + m->l3_len + m->l4_len)
+		return -ENOTSUP;
+
+	/* IP checksum can be counted only for IPv4 packet */
+	if ((ol_flags & PKT_TX_IP_CKSUM) && (ol_flags & PKT_TX_IPV6))
+		return -EINVAL;
+
+	/* IP type not set when required */
+	if (ol_flags & (PKT_TX_L4_MASK | PKT_TX_TCP_SEG))
+		if (!(ol_flags & (PKT_TX_IPV4 | PKT_TX_IPV6)))
+			return -EINVAL;
+
+	/* Check requirements for TSO packet */
+	if (ol_flags & PKT_TX_TCP_SEG)
+		if ((m->tso_segsz == 0) ||
+				((ol_flags & PKT_TX_IPV4) &&
+				!(ol_flags & PKT_TX_IP_CKSUM)))
+			return -EINVAL;
+
+	/* PKT_TX_OUTER_IP_CKSUM set for non outer IPv4 packet. */
+	if ((ol_flags & PKT_TX_OUTER_IP_CKSUM) &&
+			!(ol_flags & PKT_TX_OUTER_IPV4))
+		return -EINVAL;
+
+	return 0;
+}
+
+/**
  * Dump an mbuf structure to a file.
  *
  * Dump all fields for the given packet mbuf and all its associated
diff --git a/lib/librte_net/rte_net.h b/lib/librte_net/rte_net.h
index d4156ae..548eaed 100644
--- a/lib/librte_net/rte_net.h
+++ b/lib/librte_net/rte_net.h
@@ -38,6 +38,11 @@
 extern "C" {
 #endif
 
+#include <rte_ip.h>
+#include <rte_udp.h>
+#include <rte_tcp.h>
+#include <rte_sctp.h>
+
 /**
  * Structure containing header lengths associated to a packet, filled
  * by rte_net_get_ptype().
@@ -86,6 +91,111 @@ struct rte_net_hdr_lens {
 uint32_t rte_net_get_ptype(const struct rte_mbuf *m,
 	struct rte_net_hdr_lens *hdr_lens, uint32_t layers);
 
+/**
+ * Prepare pseudo header checksum
+ *
+ * This function prepares pseudo header checksum for TSO and non-TSO tcp/udp in
+ * provided mbufs packet data and based on the requested offload flags.
+ *
+ * - for non-TSO tcp/udp packets full pseudo-header checksum is counted and set
+ *   in packet data,
+ * - for TSO the IP payload length is not included in pseudo header.
+ *
+ * This function expects that used headers are in the first data segment of
+ * mbuf, are not fragmented and can be safely modified.
+ *
+ * @param m
+ *   The packet mbuf to be fixed.
+ * @param ol_flags
+ *   TX offloads flags to use with this packet.
+ * @return
+ *   0 if checksum is initialized properly
+ */
+static inline int
+rte_net_intel_cksum_flags_prepare(struct rte_mbuf *m, uint64_t ol_flags)
+{
+	struct ipv4_hdr *ipv4_hdr;
+	struct ipv6_hdr *ipv6_hdr;
+	struct tcp_hdr *tcp_hdr;
+	struct udp_hdr *udp_hdr;
+	uint64_t inner_l3_offset = m->l2_len;
+
+	if (ol_flags & PKT_TX_OUTER_IP_CKSUM)
+		inner_l3_offset += m->outer_l2_len + m->outer_l3_len;
+
+	if ((ol_flags & PKT_TX_UDP_CKSUM) == PKT_TX_UDP_CKSUM) {
+		if (ol_flags & PKT_TX_IPV4) {
+			ipv4_hdr = rte_pktmbuf_mtod_offset(m, struct ipv4_hdr *,
+					inner_l3_offset);
+
+			if (ol_flags & PKT_TX_IP_CKSUM)
+				ipv4_hdr->hdr_checksum = 0;
+
+			udp_hdr = (struct udp_hdr *)((char *)ipv4_hdr +
+					m->l3_len);
+			udp_hdr->dgram_cksum = rte_ipv4_phdr_cksum(ipv4_hdr,
+					ol_flags);
+		} else {
+			ipv6_hdr = rte_pktmbuf_mtod_offset(m, struct ipv6_hdr *,
+					inner_l3_offset);
+			/* non-TSO udp */
+			udp_hdr = rte_pktmbuf_mtod_offset(m, struct udp_hdr *,
+					inner_l3_offset + m->l3_len);
+			udp_hdr->dgram_cksum = rte_ipv6_phdr_cksum(ipv6_hdr,
+					ol_flags);
+		}
+	} else if ((ol_flags & PKT_TX_TCP_CKSUM) ||
+			(ol_flags & PKT_TX_TCP_SEG)) {
+		if (ol_flags & PKT_TX_IPV4) {
+			ipv4_hdr = rte_pktmbuf_mtod_offset(m, struct ipv4_hdr *,
+					inner_l3_offset);
+
+			if (ol_flags & PKT_TX_IP_CKSUM)
+				ipv4_hdr->hdr_checksum = 0;
+
+			/* non-TSO tcp or TSO */
+			tcp_hdr = (struct tcp_hdr *)((char *)ipv4_hdr +
+					m->l3_len);
+			tcp_hdr->cksum = rte_ipv4_phdr_cksum(ipv4_hdr,
+					ol_flags);
+		} else {
+			ipv6_hdr = rte_pktmbuf_mtod_offset(m, struct ipv6_hdr *,
+					inner_l3_offset);
+			/* non-TSO tcp or TSO */
+			tcp_hdr = rte_pktmbuf_mtod_offset(m, struct tcp_hdr *,
+					inner_l3_offset + m->l3_len);
+			tcp_hdr->cksum = rte_ipv6_phdr_cksum(ipv6_hdr,
+					ol_flags);
+		}
+	}
+
+	return 0;
+}
+
+/**
+ * Prepare pseudo header checksum
+ *
+ * This function prepares pseudo header checksum for TSO and non-TSO tcp/udp in
+ * provided mbufs packet data.
+ *
+ * - for non-TSO tcp/udp packets full pseudo-header checksum is counted and set
+ *   in packet data,
+ * - for TSO the IP payload length is not included in pseudo header.
+ *
+ * This function expects that used headers are in the first data segment of
+ * mbuf, are not fragmented and can be safely modified.
+ *
+ * @param m
+ *   The packet mbuf to be fixed.
+ * @return
+ *   0 if checksum is initialized properly
+ */
+static inline int
+rte_net_intel_cksum_prepare(struct rte_mbuf *m)
+{
+	return rte_net_intel_cksum_flags_prepare(m, m->ol_flags);
+}
+
 #ifdef __cplusplus
 }
 #endif
-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 261+ messages in thread

* [dpdk-dev] [PATCH v15 2/8] e1000: add Tx preparation
  2016-12-23 18:40                           ` [dpdk-dev] [PATCH v15 0/8] add Tx preparation Tomasz Kulasek
  2016-12-23 18:40                             ` [dpdk-dev] [PATCH v15 1/8] ethdev: " Tomasz Kulasek
@ 2016-12-23 18:40                             ` Tomasz Kulasek
  2016-12-23 18:40                             ` [dpdk-dev] [PATCH v15 3/8] fm10k: " Tomasz Kulasek
                                               ` (6 subsequent siblings)
  8 siblings, 0 replies; 261+ messages in thread
From: Tomasz Kulasek @ 2016-12-23 18:40 UTC (permalink / raw)
  To: dev

Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 drivers/net/e1000/e1000_ethdev.h |   11 ++++++++
 drivers/net/e1000/em_ethdev.c    |    5 +++-
 drivers/net/e1000/em_rxtx.c      |   48 +++++++++++++++++++++++++++++++++-
 drivers/net/e1000/igb_ethdev.c   |    4 +++
 drivers/net/e1000/igb_rxtx.c     |   53 +++++++++++++++++++++++++++++++++++++-
 5 files changed, 118 insertions(+), 3 deletions(-)

diff --git a/drivers/net/e1000/e1000_ethdev.h b/drivers/net/e1000/e1000_ethdev.h
index 6c25c8d..bd0f277 100644
--- a/drivers/net/e1000/e1000_ethdev.h
+++ b/drivers/net/e1000/e1000_ethdev.h
@@ -138,6 +138,11 @@
 #define E1000_MISC_VEC_ID               RTE_INTR_VEC_ZERO_OFFSET
 #define E1000_RX_VEC_START              RTE_INTR_VEC_RXTX_OFFSET
 
+#define IGB_TX_MAX_SEG     UINT8_MAX
+#define IGB_TX_MAX_MTU_SEG UINT8_MAX
+#define EM_TX_MAX_SEG      UINT8_MAX
+#define EM_TX_MAX_MTU_SEG  UINT8_MAX
+
 /* structure for interrupt relative data */
 struct e1000_interrupt {
 	uint32_t flags;
@@ -315,6 +320,9 @@ int eth_igb_tx_queue_setup(struct rte_eth_dev *dev, uint16_t tx_queue_id,
 uint16_t eth_igb_xmit_pkts(void *txq, struct rte_mbuf **tx_pkts,
 		uint16_t nb_pkts);
 
+uint16_t eth_igb_prep_pkts(void *txq, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts);
+
 uint16_t eth_igb_recv_pkts(void *rxq, struct rte_mbuf **rx_pkts,
 		uint16_t nb_pkts);
 
@@ -376,6 +384,9 @@ int eth_em_tx_queue_setup(struct rte_eth_dev *dev, uint16_t tx_queue_id,
 uint16_t eth_em_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		uint16_t nb_pkts);
 
+uint16_t eth_em_prep_pkts(void *txq, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts);
+
 uint16_t eth_em_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
 		uint16_t nb_pkts);
 
diff --git a/drivers/net/e1000/em_ethdev.c b/drivers/net/e1000/em_ethdev.c
index 866a5cf..00d5996 100644
--- a/drivers/net/e1000/em_ethdev.c
+++ b/drivers/net/e1000/em_ethdev.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -300,6 +300,7 @@ static int eth_em_set_mc_addr_list(struct rte_eth_dev *dev,
 	eth_dev->dev_ops = &eth_em_ops;
 	eth_dev->rx_pkt_burst = (eth_rx_burst_t)&eth_em_recv_pkts;
 	eth_dev->tx_pkt_burst = (eth_tx_burst_t)&eth_em_xmit_pkts;
+	eth_dev->tx_pkt_prepare = (eth_tx_prep_t)&eth_em_prep_pkts;
 
 	/* for secondary processes, we don't initialise any further as primary
 	 * has already done this work. Only check we don't need a different
@@ -1079,6 +1080,8 @@ static int eth_em_set_mc_addr_list(struct rte_eth_dev *dev,
 		.nb_max = E1000_MAX_RING_DESC,
 		.nb_min = E1000_MIN_RING_DESC,
 		.nb_align = EM_TXD_ALIGN,
+		.nb_seg_max = EM_TX_MAX_SEG,
+		.nb_mtu_seg_max = EM_TX_MAX_MTU_SEG,
 	};
 
 	dev_info->speed_capa = ETH_LINK_SPEED_10M_HD | ETH_LINK_SPEED_10M |
diff --git a/drivers/net/e1000/em_rxtx.c b/drivers/net/e1000/em_rxtx.c
index 41f51c0..7e271ad 100644
--- a/drivers/net/e1000/em_rxtx.c
+++ b/drivers/net/e1000/em_rxtx.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -66,6 +66,7 @@
 #include <rte_udp.h>
 #include <rte_tcp.h>
 #include <rte_sctp.h>
+#include <rte_net.h>
 #include <rte_string_fns.h>
 
 #include "e1000_logs.h"
@@ -77,6 +78,14 @@
 
 #define E1000_RXDCTL_GRAN	0x01000000 /* RXDCTL Granularity */
 
+#define E1000_TX_OFFLOAD_MASK ( \
+		PKT_TX_IP_CKSUM |       \
+		PKT_TX_L4_MASK |        \
+		PKT_TX_VLAN_PKT)
+
+#define E1000_TX_OFFLOAD_NOTSUP_MASK \
+		(PKT_TX_OFFLOAD_MASK ^ E1000_TX_OFFLOAD_MASK)
+
 /**
  * Structure associated with each descriptor of the RX ring of a RX queue.
  */
@@ -618,6 +627,43 @@ struct em_tx_queue {
 
 /*********************************************************************
  *
+ *  TX prep functions
+ *
+ **********************************************************************/
+uint16_t
+eth_em_prep_pkts(__rte_unused void *tx_queue, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts)
+{
+	int i, ret;
+	struct rte_mbuf *m;
+
+	for (i = 0; i < nb_pkts; i++) {
+		m = tx_pkts[i];
+
+		if (m->ol_flags & E1000_TX_OFFLOAD_NOTSUP_MASK) {
+			rte_errno = -ENOTSUP;
+			return i;
+		}
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+		ret = rte_validate_tx_offload(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+#endif
+		ret = rte_net_intel_cksum_prepare(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+	}
+
+	return i;
+}
+
+/*********************************************************************
+ *
  *  RX functions
  *
  **********************************************************************/
diff --git a/drivers/net/e1000/igb_ethdev.c b/drivers/net/e1000/igb_ethdev.c
index 08f2a68..cfe1180 100644
--- a/drivers/net/e1000/igb_ethdev.c
+++ b/drivers/net/e1000/igb_ethdev.c
@@ -369,6 +369,8 @@ static void eth_igbvf_interrupt_handler(struct rte_intr_handle *handle,
 	.nb_max = E1000_MAX_RING_DESC,
 	.nb_min = E1000_MIN_RING_DESC,
 	.nb_align = IGB_RXD_ALIGN,
+	.nb_seg_max = IGB_TX_MAX_SEG,
+	.nb_mtu_seg_max = IGB_TX_MAX_MTU_SEG,
 };
 
 static const struct eth_dev_ops eth_igb_ops = {
@@ -760,6 +762,7 @@ struct rte_igb_xstats_name_off {
 	eth_dev->dev_ops = &eth_igb_ops;
 	eth_dev->rx_pkt_burst = &eth_igb_recv_pkts;
 	eth_dev->tx_pkt_burst = &eth_igb_xmit_pkts;
+	eth_dev->tx_pkt_prepare = &eth_igb_prep_pkts;
 
 	/* for secondary processes, we don't initialise any further as primary
 	 * has already done this work. Only check we don't need a different
@@ -963,6 +966,7 @@ struct rte_igb_xstats_name_off {
 	eth_dev->dev_ops = &igbvf_eth_dev_ops;
 	eth_dev->rx_pkt_burst = &eth_igb_recv_pkts;
 	eth_dev->tx_pkt_burst = &eth_igb_xmit_pkts;
+	eth_dev->tx_pkt_prepare = &eth_igb_prep_pkts;
 
 	/* for secondary processes, we don't initialise any further as primary
 	 * has already done this work. Only check we don't need a different
diff --git a/drivers/net/e1000/igb_rxtx.c b/drivers/net/e1000/igb_rxtx.c
index dbd37ac..5d0d3cd 100644
--- a/drivers/net/e1000/igb_rxtx.c
+++ b/drivers/net/e1000/igb_rxtx.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -65,6 +65,7 @@
 #include <rte_udp.h>
 #include <rte_tcp.h>
 #include <rte_sctp.h>
+#include <rte_net.h>
 #include <rte_string_fns.h>
 
 #include "e1000_logs.h"
@@ -78,6 +79,9 @@
 		PKT_TX_L4_MASK |		 \
 		PKT_TX_TCP_SEG)
 
+#define IGB_TX_OFFLOAD_NOTSUP_MASK \
+		(PKT_TX_OFFLOAD_MASK ^ IGB_TX_OFFLOAD_MASK)
+
 /**
  * Structure associated with each descriptor of the RX ring of a RX queue.
  */
@@ -616,6 +620,52 @@ struct igb_tx_queue {
 
 /*********************************************************************
  *
+ *  TX prep functions
+ *
+ **********************************************************************/
+uint16_t
+eth_igb_prep_pkts(__rte_unused void *tx_queue, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts)
+{
+	int i, ret;
+	struct rte_mbuf *m;
+
+	for (i = 0; i < nb_pkts; i++) {
+		m = tx_pkts[i];
+
+		/* Check some limitations for TSO in hardware */
+		if (m->ol_flags & PKT_TX_TCP_SEG)
+			if ((m->tso_segsz > IGB_TSO_MAX_MSS) ||
+					(m->l2_len + m->l3_len + m->l4_len >
+					IGB_TSO_MAX_HDRLEN)) {
+				rte_errno = -EINVAL;
+				return i;
+			}
+
+		if (m->ol_flags & IGB_TX_OFFLOAD_NOTSUP_MASK) {
+			rte_errno = -ENOTSUP;
+			return i;
+		}
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+		ret = rte_validate_tx_offload(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+#endif
+		ret = rte_net_intel_cksum_prepare(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+	}
+
+	return i;
+}
+
+/*********************************************************************
+ *
  *  RX functions
  *
  **********************************************************************/
@@ -1364,6 +1414,7 @@ struct igb_tx_queue {
 
 	igb_reset_tx_queue(txq, dev);
 	dev->tx_pkt_burst = eth_igb_xmit_pkts;
+	dev->tx_pkt_prepare = &eth_igb_prep_pkts;
 	dev->data->tx_queues[queue_idx] = txq;
 
 	return 0;
-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 261+ messages in thread

* [dpdk-dev] [PATCH v15 3/8] fm10k: add Tx preparation
  2016-12-23 18:40                           ` [dpdk-dev] [PATCH v15 0/8] add Tx preparation Tomasz Kulasek
  2016-12-23 18:40                             ` [dpdk-dev] [PATCH v15 1/8] ethdev: " Tomasz Kulasek
  2016-12-23 18:40                             ` [dpdk-dev] [PATCH v15 2/8] e1000: " Tomasz Kulasek
@ 2016-12-23 18:40                             ` Tomasz Kulasek
  2016-12-23 18:40                             ` [dpdk-dev] [PATCH v15 4/8] i40e: " Tomasz Kulasek
                                               ` (5 subsequent siblings)
  8 siblings, 0 replies; 261+ messages in thread
From: Tomasz Kulasek @ 2016-12-23 18:40 UTC (permalink / raw)
  To: dev

Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 drivers/net/fm10k/fm10k.h        |    6 +++++
 drivers/net/fm10k/fm10k_ethdev.c |    5 ++++
 drivers/net/fm10k/fm10k_rxtx.c   |   50 +++++++++++++++++++++++++++++++++++++-
 3 files changed, 60 insertions(+), 1 deletion(-)

diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
index 05aa1a2..c6fed21 100644
--- a/drivers/net/fm10k/fm10k.h
+++ b/drivers/net/fm10k/fm10k.h
@@ -69,6 +69,9 @@
 #define FM10K_MAX_RX_DESC  (FM10K_MAX_RX_RING_SZ / sizeof(union fm10k_rx_desc))
 #define FM10K_MAX_TX_DESC  (FM10K_MAX_TX_RING_SZ / sizeof(struct fm10k_tx_desc))
 
+#define FM10K_TX_MAX_SEG     UINT8_MAX
+#define FM10K_TX_MAX_MTU_SEG UINT8_MAX
+
 /*
  * byte aligment for HW RX data buffer
  * Datasheet requires RX buffer addresses shall either be 512-byte aligned or
@@ -356,6 +359,9 @@ uint16_t fm10k_recv_scattered_pkts(void *rx_queue,
 uint16_t fm10k_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 	uint16_t nb_pkts);
 
+uint16_t fm10k_prep_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
+	uint16_t nb_pkts);
+
 int fm10k_rxq_vec_setup(struct fm10k_rx_queue *rxq);
 int fm10k_rx_vec_condition_check(struct rte_eth_dev *);
 void fm10k_rx_queue_release_mbufs_vec(struct fm10k_rx_queue *rxq);
diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index fe74f6d..6648468 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -1447,6 +1447,8 @@ static int fm10k_xstats_get_names(__rte_unused struct rte_eth_dev *dev,
 		.nb_max = FM10K_MAX_TX_DESC,
 		.nb_min = FM10K_MIN_TX_DESC,
 		.nb_align = FM10K_MULT_TX_DESC,
+		.nb_seg_max = FM10K_TX_MAX_SEG,
+		.nb_mtu_seg_max = FM10K_TX_MAX_MTU_SEG,
 	};
 
 	dev_info->speed_capa = ETH_LINK_SPEED_1G | ETH_LINK_SPEED_2_5G |
@@ -2755,8 +2757,10 @@ static void __attribute__((cold))
 			fm10k_txq_vec_setup(txq);
 		}
 		dev->tx_pkt_burst = fm10k_xmit_pkts_vec;
+		dev->tx_pkt_prepare = NULL;
 	} else {
 		dev->tx_pkt_burst = fm10k_xmit_pkts;
+		dev->tx_pkt_prepare = fm10k_prep_pkts;
 		PMD_INIT_LOG(DEBUG, "Use regular Tx func");
 	}
 }
@@ -2835,6 +2839,7 @@ static void __attribute__((cold))
 	dev->dev_ops = &fm10k_eth_dev_ops;
 	dev->rx_pkt_burst = &fm10k_recv_pkts;
 	dev->tx_pkt_burst = &fm10k_xmit_pkts;
+	dev->tx_pkt_prepare = &fm10k_prep_pkts;
 
 	/* only initialize in the primary process */
 	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
diff --git a/drivers/net/fm10k/fm10k_rxtx.c b/drivers/net/fm10k/fm10k_rxtx.c
index 32cc7ff..144e5e6 100644
--- a/drivers/net/fm10k/fm10k_rxtx.c
+++ b/drivers/net/fm10k/fm10k_rxtx.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2013-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2013-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -35,6 +35,7 @@
 
 #include <rte_ethdev.h>
 #include <rte_common.h>
+#include <rte_net.h>
 #include "fm10k.h"
 #include "base/fm10k_type.h"
 
@@ -65,6 +66,15 @@ static inline void dump_rxd(union fm10k_rx_desc *rxd)
 }
 #endif
 
+#define FM10K_TX_OFFLOAD_MASK (  \
+		PKT_TX_VLAN_PKT |        \
+		PKT_TX_IP_CKSUM |        \
+		PKT_TX_L4_MASK |         \
+		PKT_TX_TCP_SEG)
+
+#define FM10K_TX_OFFLOAD_NOTSUP_MASK \
+		(PKT_TX_OFFLOAD_MASK ^ FM10K_TX_OFFLOAD_MASK)
+
 /* @note: When this function is changed, make corresponding change to
  * fm10k_dev_supported_ptypes_get()
  */
@@ -597,3 +607,41 @@ static inline void tx_xmit_pkt(struct fm10k_tx_queue *q, struct rte_mbuf *mb)
 
 	return count;
 }
+
+uint16_t
+fm10k_prep_pkts(__rte_unused void *tx_queue, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts)
+{
+	int i, ret;
+	struct rte_mbuf *m;
+
+	for (i = 0; i < nb_pkts; i++) {
+		m = tx_pkts[i];
+
+		if ((m->ol_flags & PKT_TX_TCP_SEG) &&
+				(m->tso_segsz < FM10K_TSO_MINMSS)) {
+			rte_errno = -EINVAL;
+			return i;
+		}
+
+		if (m->ol_flags & FM10K_TX_OFFLOAD_NOTSUP_MASK) {
+			rte_errno = -ENOTSUP;
+			return i;
+		}
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+		ret = rte_validate_tx_offload(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+#endif
+		ret = rte_net_intel_cksum_prepare(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+	}
+
+	return i;
+}
-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 261+ messages in thread

* [dpdk-dev] [PATCH v15 4/8] i40e: add Tx preparation
  2016-12-23 18:40                           ` [dpdk-dev] [PATCH v15 0/8] add Tx preparation Tomasz Kulasek
                                               ` (2 preceding siblings ...)
  2016-12-23 18:40                             ` [dpdk-dev] [PATCH v15 3/8] fm10k: " Tomasz Kulasek
@ 2016-12-23 18:40                             ` Tomasz Kulasek
  2016-12-23 18:40                             ` [dpdk-dev] [PATCH v15 5/8] ixgbe: " Tomasz Kulasek
                                               ` (4 subsequent siblings)
  8 siblings, 0 replies; 261+ messages in thread
From: Tomasz Kulasek @ 2016-12-23 18:40 UTC (permalink / raw)
  To: dev

Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 drivers/net/i40e/i40e_ethdev.c |    3 ++
 drivers/net/i40e/i40e_rxtx.c   |   74 +++++++++++++++++++++++++++++++++++++++-
 drivers/net/i40e/i40e_rxtx.h   |    8 +++++
 3 files changed, 84 insertions(+), 1 deletion(-)

diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
index b0c0fbf..0e20178 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -944,6 +944,7 @@ static inline void i40e_GLQF_reg_init(struct i40e_hw *hw)
 	dev->dev_ops = &i40e_eth_dev_ops;
 	dev->rx_pkt_burst = i40e_recv_pkts;
 	dev->tx_pkt_burst = i40e_xmit_pkts;
+	dev->tx_pkt_prepare = i40e_prep_pkts;
 
 	/* for secondary processes, we don't initialise any further as primary
 	 * has already done this work. Only check we don't need a different
@@ -2646,6 +2647,8 @@ static int i40e_dev_xstats_get_names(__rte_unused struct rte_eth_dev *dev,
 		.nb_max = I40E_MAX_RING_DESC,
 		.nb_min = I40E_MIN_RING_DESC,
 		.nb_align = I40E_ALIGN_RING_DESC,
+		.nb_seg_max = I40E_TX_MAX_SEG,
+		.nb_mtu_seg_max = I40E_TX_MAX_MTU_SEG,
 	};
 
 	if (pf->flags & I40E_FLAG_VMDQ) {
diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
index 7ae7d9f..1c9a6c8 100644
--- a/drivers/net/i40e/i40e_rxtx.c
+++ b/drivers/net/i40e/i40e_rxtx.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -50,6 +50,8 @@
 #include <rte_tcp.h>
 #include <rte_sctp.h>
 #include <rte_udp.h>
+#include <rte_ip.h>
+#include <rte_net.h>
 
 #include "i40e_logs.h"
 #include "base/i40e_prototype.h"
@@ -79,6 +81,17 @@
 		PKT_TX_TCP_SEG |		 \
 		PKT_TX_OUTER_IP_CKSUM)
 
+#define I40E_TX_OFFLOAD_MASK (  \
+		PKT_TX_IP_CKSUM |       \
+		PKT_TX_L4_MASK |        \
+		PKT_TX_OUTER_IP_CKSUM | \
+		PKT_TX_TCP_SEG |        \
+		PKT_TX_QINQ_PKT |       \
+		PKT_TX_VLAN_PKT)
+
+#define I40E_TX_OFFLOAD_NOTSUP_MASK \
+		(PKT_TX_OFFLOAD_MASK ^ I40E_TX_OFFLOAD_MASK)
+
 static uint16_t i40e_xmit_pkts_simple(void *tx_queue,
 				      struct rte_mbuf **tx_pkts,
 				      uint16_t nb_pkts);
@@ -1411,6 +1424,63 @@ static inline int __attribute__((always_inline))
 	return nb_tx;
 }
 
+/*********************************************************************
+ *
+ *  TX prep functions
+ *
+ **********************************************************************/
+uint16_t
+i40e_prep_pkts(__rte_unused void *tx_queue, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts)
+{
+	int i, ret;
+	uint64_t ol_flags;
+	struct rte_mbuf *m;
+
+	for (i = 0; i < nb_pkts; i++) {
+		m = tx_pkts[i];
+		ol_flags = m->ol_flags;
+
+		/**
+		 * m->nb_segs is uint8_t, so nb_segs is always less than
+		 * I40E_TX_MAX_SEG.
+		 * We only check the nb_segs > I40E_TX_MAX_MTU_SEG condition.
+		 */
+		if (!(ol_flags & PKT_TX_TCP_SEG)) {
+			if (m->nb_segs > I40E_TX_MAX_MTU_SEG) {
+				rte_errno = -EINVAL;
+				return i;
+			}
+		} else if ((m->tso_segsz < I40E_MIN_TSO_MSS) ||
+				(m->tso_segsz > I40E_MAX_TSO_MSS)) {
+			/* An MSS outside the range (256B - 9674B) is
+			 * considered malicious
+			 */
+			rte_errno = -EINVAL;
+			return i;
+		}
+
+		if (ol_flags & I40E_TX_OFFLOAD_NOTSUP_MASK) {
+			rte_errno = -ENOTSUP;
+			return i;
+		}
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+		ret = rte_validate_tx_offload(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+#endif
+		ret = rte_net_intel_cksum_prepare(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+	}
+	return i;
+}
+
 /*
  * Find the VSI the queue belongs to. 'queue_idx' is the queue index
  * application used, which assume having sequential ones. But from driver's
@@ -2763,9 +2833,11 @@ void __attribute__((cold))
 			PMD_INIT_LOG(DEBUG, "Simple tx finally be used.");
 			dev->tx_pkt_burst = i40e_xmit_pkts_simple;
 		}
+		dev->tx_pkt_prepare = NULL;
 	} else {
 		PMD_INIT_LOG(DEBUG, "Xmit tx finally be used.");
 		dev->tx_pkt_burst = i40e_xmit_pkts;
+		dev->tx_pkt_prepare = i40e_prep_pkts;
 	}
 }
 
diff --git a/drivers/net/i40e/i40e_rxtx.h b/drivers/net/i40e/i40e_rxtx.h
index ecdb13c..9df8a56 100644
--- a/drivers/net/i40e/i40e_rxtx.h
+++ b/drivers/net/i40e/i40e_rxtx.h
@@ -63,6 +63,12 @@
 #define	I40E_MIN_RING_DESC	64
 #define	I40E_MAX_RING_DESC	4096
 
+#define I40E_MIN_TSO_MSS          256
+#define I40E_MAX_TSO_MSS          9674
+
+#define I40E_TX_MAX_SEG     UINT8_MAX
+#define I40E_TX_MAX_MTU_SEG 8
+
 #undef container_of
 #define container_of(ptr, type, member) ({ \
 		typeof(((type *)0)->member)(*__mptr) = (ptr); \
@@ -223,6 +229,8 @@ uint16_t i40e_recv_scattered_pkts(void *rx_queue,
 uint16_t i40e_xmit_pkts(void *tx_queue,
 			struct rte_mbuf **tx_pkts,
 			uint16_t nb_pkts);
+uint16_t i40e_prep_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts);
 int i40e_tx_queue_init(struct i40e_tx_queue *txq);
 int i40e_rx_queue_init(struct i40e_rx_queue *rxq);
 void i40e_free_tx_resources(struct i40e_tx_queue *txq);
-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 261+ messages in thread

* [dpdk-dev] [PATCH v15 5/8] ixgbe: add Tx preparation
  2016-12-23 18:40                           ` [dpdk-dev] [PATCH v15 0/8] add Tx preparation Tomasz Kulasek
                                               ` (3 preceding siblings ...)
  2016-12-23 18:40                             ` [dpdk-dev] [PATCH v15 4/8] i40e: " Tomasz Kulasek
@ 2016-12-23 18:40                             ` Tomasz Kulasek
  2016-12-23 18:40                             ` [dpdk-dev] [PATCH v15 6/8] vmxnet3: " Tomasz Kulasek
                                               ` (3 subsequent siblings)
  8 siblings, 0 replies; 261+ messages in thread
From: Tomasz Kulasek @ 2016-12-23 18:40 UTC (permalink / raw)
  To: dev

Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 drivers/net/ixgbe/ixgbe_ethdev.c |    3 ++
 drivers/net/ixgbe/ixgbe_ethdev.h |    5 +++-
 drivers/net/ixgbe/ixgbe_rxtx.c   |   57 ++++++++++++++++++++++++++++++++++++++
 drivers/net/ixgbe/ixgbe_rxtx.h   |    2 ++
 4 files changed, 66 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
index baffc71..d726a2b 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -517,6 +517,8 @@ static int ixgbe_dev_udp_tunnel_port_del(struct rte_eth_dev *dev,
 	.nb_max = IXGBE_MAX_RING_DESC,
 	.nb_min = IXGBE_MIN_RING_DESC,
 	.nb_align = IXGBE_TXD_ALIGN,
+	.nb_seg_max = IXGBE_TX_MAX_SEG,
+	.nb_mtu_seg_max = IXGBE_TX_MAX_SEG,
 };
 
 static const struct eth_dev_ops ixgbe_eth_dev_ops = {
@@ -1103,6 +1105,7 @@ struct rte_ixgbe_xstats_name_off {
 	eth_dev->dev_ops = &ixgbe_eth_dev_ops;
 	eth_dev->rx_pkt_burst = &ixgbe_recv_pkts;
 	eth_dev->tx_pkt_burst = &ixgbe_xmit_pkts;
+	eth_dev->tx_pkt_prepare = &ixgbe_prep_pkts;
 
 	/*
 	 * For secondary processes, we don't initialise any further as primary
diff --git a/drivers/net/ixgbe/ixgbe_ethdev.h b/drivers/net/ixgbe/ixgbe_ethdev.h
index 4ff6338..e229cf5 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.h
+++ b/drivers/net/ixgbe/ixgbe_ethdev.h
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -396,6 +396,9 @@ uint16_t ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 uint16_t ixgbe_xmit_pkts_simple(void *tx_queue, struct rte_mbuf **tx_pkts,
 		uint16_t nb_pkts);
 
+uint16_t ixgbe_prep_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts);
+
 int ixgbe_dev_rss_hash_update(struct rte_eth_dev *dev,
 			      struct rte_eth_rss_conf *rss_conf);
 
diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
index b2d9f45..0bbc583 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx.c
@@ -70,6 +70,7 @@
 #include <rte_string_fns.h>
 #include <rte_errno.h>
 #include <rte_ip.h>
+#include <rte_net.h>
 
 #include "ixgbe_logs.h"
 #include "base/ixgbe_api.h"
@@ -87,6 +88,9 @@
 		PKT_TX_TCP_SEG |		 \
 		PKT_TX_OUTER_IP_CKSUM)
 
+#define IXGBE_TX_OFFLOAD_NOTSUP_MASK \
+		(PKT_TX_OFFLOAD_MASK ^ IXGBE_TX_OFFLOAD_MASK)
+
 #if 1
 #define RTE_PMD_USE_PREFETCH
 #endif
@@ -905,6 +909,57 @@ static inline int __attribute__((always_inline))
 
 /*********************************************************************
  *
+ *  TX prep functions
+ *
+ **********************************************************************/
+uint16_t
+ixgbe_prep_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
+{
+	int i, ret;
+	uint64_t ol_flags;
+	struct rte_mbuf *m;
+	struct ixgbe_tx_queue *txq = (struct ixgbe_tx_queue *)tx_queue;
+
+	for (i = 0; i < nb_pkts; i++) {
+		m = tx_pkts[i];
+		ol_flags = m->ol_flags;
+
+		/**
+		 * Check if packet meets requirements for number of segments
+		 *
+		 * NOTE: for ixgbe it's always (40 - WTHRESH) for both TSO and
+		 *       non-TSO
+		 */
+
+		if (m->nb_segs > IXGBE_TX_MAX_SEG - txq->wthresh) {
+			rte_errno = -EINVAL;
+			return i;
+		}
+
+		if (ol_flags & IXGBE_TX_OFFLOAD_NOTSUP_MASK) {
+			rte_errno = -ENOTSUP;
+			return i;
+		}
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+		ret = rte_validate_tx_offload(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+#endif
+		ret = rte_net_intel_cksum_prepare(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+	}
+
+	return i;
+}
+
+/*********************************************************************
+ *
  *  RX functions
  *
  **********************************************************************/
@@ -2282,6 +2337,7 @@ void __attribute__((cold))
 	if (((txq->txq_flags & IXGBE_SIMPLE_FLAGS) == IXGBE_SIMPLE_FLAGS)
 			&& (txq->tx_rs_thresh >= RTE_PMD_IXGBE_TX_MAX_BURST)) {
 		PMD_INIT_LOG(DEBUG, "Using simple tx code path");
+		dev->tx_pkt_prepare = NULL;
 #ifdef RTE_IXGBE_INC_VECTOR
 		if (txq->tx_rs_thresh <= RTE_IXGBE_TX_MAX_FREE_BUF_SZ &&
 				(rte_eal_process_type() != RTE_PROC_PRIMARY ||
@@ -2302,6 +2358,7 @@ void __attribute__((cold))
 				(unsigned long)txq->tx_rs_thresh,
 				(unsigned long)RTE_PMD_IXGBE_TX_MAX_BURST);
 		dev->tx_pkt_burst = ixgbe_xmit_pkts;
+		dev->tx_pkt_prepare = ixgbe_prep_pkts;
 	}
 }
 
diff --git a/drivers/net/ixgbe/ixgbe_rxtx.h b/drivers/net/ixgbe/ixgbe_rxtx.h
index 2608b36..7bbd9b8 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx.h
+++ b/drivers/net/ixgbe/ixgbe_rxtx.h
@@ -80,6 +80,8 @@
 #define RTE_IXGBE_WAIT_100_US               100
 #define RTE_IXGBE_VMTXSW_REGISTER_COUNT     2
 
+#define IXGBE_TX_MAX_SEG                    40
+
 #define IXGBE_PACKET_TYPE_MASK_82599        0X7F
 #define IXGBE_PACKET_TYPE_MASK_X550         0X10FF
 #define IXGBE_PACKET_TYPE_MASK_TUNNEL       0XFF
-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 261+ messages in thread

* [dpdk-dev] [PATCH v15 6/8] vmxnet3: add Tx preparation
  2016-12-23 18:40                           ` [dpdk-dev] [PATCH v15 0/8] add Tx preparation Tomasz Kulasek
                                               ` (4 preceding siblings ...)
  2016-12-23 18:40                             ` [dpdk-dev] [PATCH v15 5/8] ixgbe: " Tomasz Kulasek
@ 2016-12-23 18:40                             ` Tomasz Kulasek
  2016-12-23 18:40                             ` [dpdk-dev] [PATCH v15 7/8] ena: " Tomasz Kulasek
                                               ` (2 subsequent siblings)
  8 siblings, 0 replies; 261+ messages in thread
From: Tomasz Kulasek @ 2016-12-23 18:40 UTC (permalink / raw)
  To: dev; +Cc: Ananyev, Konstantin

From: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Yong Wang <yongwang@vmware.com>
---
 drivers/net/vmxnet3/vmxnet3_ethdev.c |    6 ++++
 drivers/net/vmxnet3/vmxnet3_ethdev.h |    2 ++
 drivers/net/vmxnet3/vmxnet3_rxtx.c   |   56 ++++++++++++++++++++++++++++++++++
 3 files changed, 64 insertions(+)

diff --git a/drivers/net/vmxnet3/vmxnet3_ethdev.c b/drivers/net/vmxnet3/vmxnet3_ethdev.c
index 93c9ac9..e31896f 100644
--- a/drivers/net/vmxnet3/vmxnet3_ethdev.c
+++ b/drivers/net/vmxnet3/vmxnet3_ethdev.c
@@ -69,6 +69,8 @@
 
 #define PROCESS_SYS_EVENTS 0
 
+#define	VMXNET3_TX_MAX_SEG	UINT8_MAX
+
 static int eth_vmxnet3_dev_init(struct rte_eth_dev *eth_dev);
 static int eth_vmxnet3_dev_uninit(struct rte_eth_dev *eth_dev);
 static int vmxnet3_dev_configure(struct rte_eth_dev *dev);
@@ -237,6 +239,7 @@ static void vmxnet3_mac_addr_set(struct rte_eth_dev *dev,
 	eth_dev->dev_ops = &vmxnet3_eth_dev_ops;
 	eth_dev->rx_pkt_burst = &vmxnet3_recv_pkts;
 	eth_dev->tx_pkt_burst = &vmxnet3_xmit_pkts;
+	eth_dev->tx_pkt_prepare = vmxnet3_prep_pkts;
 	pci_dev = eth_dev->pci_dev;
 
 	/*
@@ -326,6 +329,7 @@ static void vmxnet3_mac_addr_set(struct rte_eth_dev *dev,
 	eth_dev->dev_ops = NULL;
 	eth_dev->rx_pkt_burst = NULL;
 	eth_dev->tx_pkt_burst = NULL;
+	eth_dev->tx_pkt_prepare = NULL;
 
 	rte_free(eth_dev->data->mac_addrs);
 	eth_dev->data->mac_addrs = NULL;
@@ -728,6 +732,8 @@ static void vmxnet3_mac_addr_set(struct rte_eth_dev *dev,
 		.nb_max = VMXNET3_TX_RING_MAX_SIZE,
 		.nb_min = VMXNET3_DEF_TX_RING_SIZE,
 		.nb_align = 1,
+		.nb_seg_max = VMXNET3_TX_MAX_SEG,
+		.nb_mtu_seg_max = VMXNET3_MAX_TXD_PER_PKT,
 	};
 
 	dev_info->rx_offload_capa =
diff --git a/drivers/net/vmxnet3/vmxnet3_ethdev.h b/drivers/net/vmxnet3/vmxnet3_ethdev.h
index 7d3b11e..469db71 100644
--- a/drivers/net/vmxnet3/vmxnet3_ethdev.h
+++ b/drivers/net/vmxnet3/vmxnet3_ethdev.h
@@ -171,5 +171,7 @@ uint16_t vmxnet3_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
 			   uint16_t nb_pkts);
 uint16_t vmxnet3_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 			   uint16_t nb_pkts);
+uint16_t vmxnet3_prep_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
+			uint16_t nb_pkts);
 
 #endif /* _VMXNET3_ETHDEV_H_ */
diff --git a/drivers/net/vmxnet3/vmxnet3_rxtx.c b/drivers/net/vmxnet3/vmxnet3_rxtx.c
index b109168..3651369 100644
--- a/drivers/net/vmxnet3/vmxnet3_rxtx.c
+++ b/drivers/net/vmxnet3/vmxnet3_rxtx.c
@@ -69,6 +69,7 @@
 #include <rte_sctp.h>
 #include <rte_string_fns.h>
 #include <rte_errno.h>
+#include <rte_net.h>
 
 #include "base/vmxnet3_defs.h"
 #include "vmxnet3_ring.h"
@@ -76,6 +77,14 @@
 #include "vmxnet3_logs.h"
 #include "vmxnet3_ethdev.h"
 
+#define	VMXNET3_TX_OFFLOAD_MASK	( \
+		PKT_TX_VLAN_PKT | \
+		PKT_TX_L4_MASK |  \
+		PKT_TX_TCP_SEG)
+
+#define	VMXNET3_TX_OFFLOAD_NOTSUP_MASK	\
+	(PKT_TX_OFFLOAD_MASK ^ VMXNET3_TX_OFFLOAD_MASK)
+
 static const uint32_t rxprod_reg[2] = {VMXNET3_REG_RXPROD, VMXNET3_REG_RXPROD2};
 
 static int vmxnet3_post_rx_bufs(vmxnet3_rx_queue_t*, uint8_t);
@@ -350,6 +359,53 @@
 }
 
 uint16_t
+vmxnet3_prep_pkts(__rte_unused void *tx_queue, struct rte_mbuf **tx_pkts,
+	uint16_t nb_pkts)
+{
+	int32_t ret;
+	uint32_t i;
+	uint64_t ol_flags;
+	struct rte_mbuf *m;
+
+	for (i = 0; i != nb_pkts; i++) {
+		m = tx_pkts[i];
+		ol_flags = m->ol_flags;
+
+		/* Non-TSO packet cannot occupy more than
+		 * VMXNET3_MAX_TXD_PER_PKT TX descriptors.
+		 */
+		if ((ol_flags & PKT_TX_TCP_SEG) == 0 &&
+				m->nb_segs > VMXNET3_MAX_TXD_PER_PKT) {
+			rte_errno = -EINVAL;
+			return i;
+		}
+
+		/* check that only supported TX offloads are requested. */
+		if ((ol_flags & VMXNET3_TX_OFFLOAD_NOTSUP_MASK) != 0 ||
+				(ol_flags & PKT_TX_L4_MASK) ==
+				PKT_TX_SCTP_CKSUM) {
+			rte_errno = -ENOTSUP;
+			return i;
+		}
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+		ret = rte_validate_tx_offload(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+#endif
+		ret = rte_net_intel_cksum_prepare(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+	}
+
+	return i;
+}
+
+uint16_t
 vmxnet3_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		  uint16_t nb_pkts)
 {
-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 261+ messages in thread

* [dpdk-dev] [PATCH v15 7/8] ena: add Tx preparation
  2016-12-23 18:40                           ` [dpdk-dev] [PATCH v15 0/8] add Tx preparation Tomasz Kulasek
                                               ` (5 preceding siblings ...)
  2016-12-23 18:40                             ` [dpdk-dev] [PATCH v15 6/8] vmxnet3: " Tomasz Kulasek
@ 2016-12-23 18:40                             ` Tomasz Kulasek
  2016-12-23 18:40                             ` [dpdk-dev] [PATCH v15 8/8] testpmd: use Tx preparation in csum engine Tomasz Kulasek
  2017-01-04 19:41                             ` [dpdk-dev] [PATCH v15 0/8] add Tx preparation Thomas Monjalon
  8 siblings, 0 replies; 261+ messages in thread
From: Tomasz Kulasek @ 2016-12-23 18:40 UTC (permalink / raw)
  To: dev; +Cc: Konstantin Ananyev

From: Konstantin Ananyev <konstantin.ananyev@intel.com>

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 drivers/net/ena/ena_ethdev.c |   51 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 51 insertions(+)

diff --git a/drivers/net/ena/ena_ethdev.c b/drivers/net/ena/ena_ethdev.c
index 555fb31..51af723 100644
--- a/drivers/net/ena/ena_ethdev.c
+++ b/drivers/net/ena/ena_ethdev.c
@@ -39,6 +39,7 @@
 #include <rte_errno.h>
 #include <rte_version.h>
 #include <rte_eal_memconfig.h>
+#include <rte_net.h>
 
 #include "ena_ethdev.h"
 #include "ena_logs.h"
@@ -168,6 +169,14 @@ struct ena_stats {
 #define PCI_DEVICE_ID_ENA_VF	0xEC20
 #define PCI_DEVICE_ID_ENA_LLQ_VF	0xEC21
 
+#define	ENA_TX_OFFLOAD_MASK	(\
+	PKT_TX_L4_MASK |         \
+	PKT_TX_IP_CKSUM |        \
+	PKT_TX_TCP_SEG)
+
+#define	ENA_TX_OFFLOAD_NOTSUP_MASK	\
+	(PKT_TX_OFFLOAD_MASK ^ ENA_TX_OFFLOAD_MASK)
+
 static struct rte_pci_id pci_id_ena_map[] = {
 	{ RTE_PCI_DEVICE(PCI_VENDOR_ID_AMAZON, PCI_DEVICE_ID_ENA_VF) },
 	{ RTE_PCI_DEVICE(PCI_VENDOR_ID_AMAZON, PCI_DEVICE_ID_ENA_LLQ_VF) },
@@ -179,6 +188,8 @@ static int ena_device_init(struct ena_com_dev *ena_dev,
 static int ena_dev_configure(struct rte_eth_dev *dev);
 static uint16_t eth_ena_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 				  uint16_t nb_pkts);
+static uint16_t eth_ena_prep_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts);
 static int ena_tx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_idx,
 			      uint16_t nb_desc, unsigned int socket_id,
 			      const struct rte_eth_txconf *tx_conf);
@@ -1272,6 +1283,7 @@ static int eth_ena_dev_init(struct rte_eth_dev *eth_dev)
 	eth_dev->dev_ops = &ena_dev_ops;
 	eth_dev->rx_pkt_burst = &eth_ena_recv_pkts;
 	eth_dev->tx_pkt_burst = &eth_ena_xmit_pkts;
+	eth_dev->tx_pkt_prepare = &eth_ena_prep_pkts;
 	adapter->rte_eth_dev_data = eth_dev->data;
 	adapter->rte_dev = eth_dev;
 
@@ -1570,6 +1582,45 @@ static uint16_t eth_ena_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
 	return recv_idx;
 }
 
+static uint16_t
+eth_ena_prep_pkts(__rte_unused void *tx_queue, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts)
+{
+	int32_t ret;
+	uint32_t i;
+	struct rte_mbuf *m;
+	uint64_t ol_flags;
+
+	for (i = 0; i != nb_pkts; i++) {
+		m = tx_pkts[i];
+		ol_flags = m->ol_flags;
+
+		if ((ol_flags & ENA_TX_OFFLOAD_NOTSUP_MASK) != 0 ||
+				(ol_flags & PKT_TX_L4_MASK) ==
+				PKT_TX_SCTP_CKSUM) {
+			rte_errno = -ENOTSUP;
+			return i;
+		}
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+		ret = rte_validate_tx_offload(m);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+#endif
+		/* ENA doesn't need a different phdr cksum for TSO */
+		ret = rte_net_intel_cksum_flags_prepare(m,
+			ol_flags & ~PKT_TX_TCP_SEG);
+		if (ret != 0) {
+			rte_errno = ret;
+			return i;
+		}
+	}
+
+	return i;
+}
+
 static uint16_t eth_ena_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 				  uint16_t nb_pkts)
 {
-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 261+ messages in thread

* [dpdk-dev] [PATCH v15 8/8] testpmd: use Tx preparation in csum engine
  2016-12-23 18:40                           ` [dpdk-dev] [PATCH v15 0/8] add Tx preparation Tomasz Kulasek
                                               ` (6 preceding siblings ...)
  2016-12-23 18:40                             ` [dpdk-dev] [PATCH v15 7/8] ena: " Tomasz Kulasek
@ 2016-12-23 18:40                             ` Tomasz Kulasek
  2017-01-04 19:41                             ` [dpdk-dev] [PATCH v15 0/8] add Tx preparation Thomas Monjalon
  8 siblings, 0 replies; 261+ messages in thread
From: Tomasz Kulasek @ 2016-12-23 18:40 UTC (permalink / raw)
  To: dev

Since all current drivers support the Tx preparation API, it is used
in the csum forwarding engine by default for all drivers.

Adding this extra step to the csum engine costs about a 3-4% performance
drop on my setup with the ixgbe driver. It's caused mostly by the need
to re-access and modify the packet data.

Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 app/test-pmd/csumonly.c |   37 ++++++++++++++++---------------------
 app/test-pmd/testpmd.c  |    5 +++++
 2 files changed, 21 insertions(+), 21 deletions(-)

diff --git a/app/test-pmd/csumonly.c b/app/test-pmd/csumonly.c
index 57e6ae2..806f957 100644
--- a/app/test-pmd/csumonly.c
+++ b/app/test-pmd/csumonly.c
@@ -112,15 +112,6 @@ struct simple_gre_hdr {
 } __attribute__((__packed__));
 
 static uint16_t
-get_psd_sum(void *l3_hdr, uint16_t ethertype, uint64_t ol_flags)
-{
-	if (ethertype == _htons(ETHER_TYPE_IPv4))
-		return rte_ipv4_phdr_cksum(l3_hdr, ol_flags);
-	else /* assume ethertype == ETHER_TYPE_IPv6 */
-		return rte_ipv6_phdr_cksum(l3_hdr, ol_flags);
-}
-
-static uint16_t
 get_udptcp_checksum(void *l3_hdr, void *l4_hdr, uint16_t ethertype)
 {
 	if (ethertype == _htons(ETHER_TYPE_IPv4))
@@ -370,11 +361,9 @@ struct simple_gre_hdr {
 		/* do not recalculate udp cksum if it was 0 */
 		if (udp_hdr->dgram_cksum != 0) {
 			udp_hdr->dgram_cksum = 0;
-			if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_UDP_CKSUM) {
+			if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_UDP_CKSUM)
 				ol_flags |= PKT_TX_UDP_CKSUM;
-				udp_hdr->dgram_cksum = get_psd_sum(l3_hdr,
-					info->ethertype, ol_flags);
-			} else {
+			else {
 				udp_hdr->dgram_cksum =
 					get_udptcp_checksum(l3_hdr, udp_hdr,
 						info->ethertype);
@@ -383,15 +372,11 @@ struct simple_gre_hdr {
 	} else if (info->l4_proto == IPPROTO_TCP) {
 		tcp_hdr = (struct tcp_hdr *)((char *)l3_hdr + info->l3_len);
 		tcp_hdr->cksum = 0;
-		if (tso_segsz) {
+		if (tso_segsz)
 			ol_flags |= PKT_TX_TCP_SEG;
-			tcp_hdr->cksum = get_psd_sum(l3_hdr, info->ethertype,
-				ol_flags);
-		} else if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_TCP_CKSUM) {
+		else if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_TCP_CKSUM)
 			ol_flags |= PKT_TX_TCP_CKSUM;
-			tcp_hdr->cksum = get_psd_sum(l3_hdr, info->ethertype,
-				ol_flags);
-		} else {
+		else {
 			tcp_hdr->cksum =
 				get_udptcp_checksum(l3_hdr, tcp_hdr,
 					info->ethertype);
@@ -648,6 +633,7 @@ struct simple_gre_hdr {
 	void *l3_hdr = NULL, *outer_l3_hdr = NULL; /* can be IPv4 or IPv6 */
 	uint16_t nb_rx;
 	uint16_t nb_tx;
+	uint16_t nb_prep;
 	uint16_t i;
 	uint64_t rx_ol_flags, tx_ol_flags;
 	uint16_t testpmd_ol_flags;
@@ -857,7 +843,16 @@ struct simple_gre_hdr {
 			printf("\n");
 		}
 	}
-	nb_tx = rte_eth_tx_burst(fs->tx_port, fs->tx_queue, pkts_burst, nb_rx);
+
+	nb_prep = rte_eth_tx_prepare(fs->tx_port, fs->tx_queue,
+			pkts_burst, nb_rx);
+	if (nb_prep != nb_rx)
+		printf("Preparing packet burst to transmit failed: %s\n",
+				rte_strerror(rte_errno));
+
+	nb_tx = rte_eth_tx_burst(fs->tx_port, fs->tx_queue, pkts_burst,
+			nb_prep);
+
 	/*
 	 * Retry if necessary
 	 */
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index a0332c2..634f10b 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -180,6 +180,11 @@ struct fwd_engine * fwd_engines[] = {
 enum tx_pkt_split tx_pkt_split = TX_PKT_SPLIT_OFF;
 /**< Split policy for packets to TX. */
 
+/*
+ * Enable Tx preparation path in the "csum" engine.
+ */
+uint8_t tx_prepare;
+
 uint16_t nb_pkt_per_burst = DEF_PKT_BURST; /**< Number of packets per burst. */
 uint16_t mb_mempool_cache = DEF_MBUF_CACHE; /**< Size of mbuf mempool cache. */
 
-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v14 1/8] ethdev: add Tx preparation
  2016-12-22 14:24                             ` Thomas Monjalon
@ 2016-12-23 18:49                               ` Kulasek, TomaszX
  0 siblings, 0 replies; 261+ messages in thread
From: Kulasek, TomaszX @ 2016-12-23 18:49 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev

Hi Thomas,

> -----Original Message-----
> From: Thomas Monjalon [mailto:thomas.monjalon@6wind.com]
> Sent: Thursday, December 22, 2016 15:25
> To: Kulasek, TomaszX <tomaszx.kulasek@intel.com>
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v14 1/8] ethdev: add Tx preparation
> 
> Hi Tomasz,
> 
> 2016-12-22 14:05, Tomasz Kulasek:
> > Added API for `rte_eth_tx_prepare`
> >
> > uint16_t rte_eth_tx_prepare(uint8_t port_id, uint16_t queue_id,
> > 	struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
> 
> As discussed earlier and agreed by Konstantin, please mark this API
> as experimental.
> We could make some changes in 17.05 to improve error description
> or add some flags to modify the behaviour.
> 

Is it enough to add

/**
 * @warning
 * @b EXPERIMENTAL: this API may change without prior notice

?
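
For illustration, the tag would sit on the declaration like the sketch
below (the surrounding doxygen text is abbreviated here, not the final
wording):

/**
 * @warning
 * @b EXPERIMENTAL: this API may change without prior notice
 *
 * Process a burst of output packets, preparing them for transmission
 * on a given queue of an Ethernet device.
 */
uint16_t rte_eth_tx_prepare(uint8_t port_id, uint16_t queue_id,
	struct rte_mbuf **tx_pkts, uint16_t nb_pkts);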

> 
> > int rte_net_intel_cksum_prepare(struct rte_mbuf *m)
> >
> >   to prepare pseudo header checksum for TSO and non-TSO tcp/udp packets
> >   before hardware tx checksum offload.
> >    - for non-TSO tcp/udp packets the full pseudo-header checksum is
> >      computed and set.
> >    - for TSO the IP payload length is not included.
> >
> >
> > int
> > rte_net_intel_cksum_flags_prepare(struct rte_mbuf *m, uint64_t ol_flags)
> >
> >   this function uses the same logic as rte_net_intel_cksum_prepare, but
> >   allows the application to choose which offloads should be taken into
> >   account, if full preparation is not required.
> 
> How does the application know which offload flags should be taken into account?
>

This new API is used in the ena driver:

+		/* ENA doesn't need a different phdr cksum for TSO */
+		ret = rte_net_intel_cksum_flags_prepare(m,
+			ol_flags & ~PKT_TX_TCP_SEG);
+		if (ret != 0) {
+			rte_errno = -ret;
+			return i;
+		}

It's more useful to mask out the offloads which should not be used: the
driver knows its own hardware requirements, so it, rather than the
application, decides which flags to clear.

> 
> >  #
> > +# Use real NOOP to turn off TX preparation stage
> > +#
> > +# While the behaviour of ``rte_ethdev_tx_prepare`` may change after
> turning on
> > +# real NOOP, this configuration shouldn't be never enabled globaly, and
> can be
> > +# used in appropriate target configuration file with a following
> restrictions
> > +#
> > +CONFIG_RTE_ETHDEV_TX_PREPARE_NOOP=n
> 
> As discussed earlier, it would be easier to not call tx_prepare at all.
> However, this option allows an optimization when compiling DPDK for a
> known environment without modifying the application.
> So it is worth keeping.
> 
> The text explaining the option should be improved.
> I suggest this text:
> 
> # Turn off Tx preparation stage
> #
> # Warning: rte_ethdev_tx_prepare() can be safely disabled only if using a
> # driver which does not implement any Tx preparation.
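
For reference, with the option turned on the call is expected to collapse
to a plain pass-through, roughly like the sketch below (the exact guard
name and its placement in rte_ethdev.h are assumed here, not quoted from
the patch):

#ifdef RTE_ETHDEV_TX_PREPARE_NOOP
static inline uint16_t
rte_eth_tx_prepare(__rte_unused uint8_t port_id,
		__rte_unused uint16_t queue_id,
		__rte_unused struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
{
	/* nothing to prepare: report the whole burst as ready */
	return nb_pkts;
}
#endif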
> 
> 
> > +	uint16_t nb_seg_max;  /**< Max number of segments per whole packet.
> */
> > +	uint16_t nb_mtu_seg_max; /**< Max number of segments per one MTU */
> 
> In another mail, you've added this explanation:
> * For non-TSO packet, a single transmit packet may span up to
> "nb_mtu_seg_max" buffers.
> * For TSO packet the total number of data descriptors is "nb_seg_max", and
> each segment within the TSO may span up to "nb_mtu_seg_max".
> 
> Maybe you can try to mix these comments to improve the doxygen.

Ok, I will.
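
Something along these lines, perhaps (a draft merging both remarks; the
wording is mine, not final):

	/**
	 * Max allowed number of segments per whole packet.
	 * For a TSO packet this is the total number of data descriptors
	 * allowed by the device; each segment within the TSO may in turn
	 * span up to nb_mtu_seg_max buffers.
	 */
	uint16_t nb_seg_max;

	/**
	 * Max number of segments per one MTU.
	 * For a non-TSO packet, a single transmit packet may span up to
	 * this many buffers.
	 */
	uint16_t nb_mtu_seg_max;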

Tomasz

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v15 0/8] add Tx preparation
  2016-12-23 18:40                           ` [dpdk-dev] [PATCH v15 0/8] add Tx preparation Tomasz Kulasek
                                               ` (7 preceding siblings ...)
  2016-12-23 18:40                             ` [dpdk-dev] [PATCH v15 8/8] testpmd: use Tx preparation in csum engine Tomasz Kulasek
@ 2017-01-04 19:41                             ` Thomas Monjalon
  2017-01-05 15:43                               ` Avi Kivity
  8 siblings, 1 reply; 261+ messages in thread
From: Thomas Monjalon @ 2017-01-04 19:41 UTC (permalink / raw)
  To: Tomasz Kulasek; +Cc: dev

2016-12-23 19:40, Tomasz Kulasek:
> v15 changes:
>  - marked rte_eth_tx_prepare api as experimental
>  - improved doxygen comments for nb_seg_max and nb_mtu_seg_max fields
>  - removed unused "uint8_t tx_prepare" declaration from testpmd

No, you didn't remove this useless declaration. I did it for you.

This feature is now applied! Thanks and congratulations :)

^ permalink raw reply	[flat|nested] 261+ messages in thread

* Re: [dpdk-dev] [PATCH v15 0/8] add Tx preparation
  2017-01-04 19:41                             ` [dpdk-dev] [PATCH v15 0/8] add Tx preparation Thomas Monjalon
@ 2017-01-05 15:43                               ` Avi Kivity
  0 siblings, 0 replies; 261+ messages in thread
From: Avi Kivity @ 2017-01-05 15:43 UTC (permalink / raw)
  To: Thomas Monjalon, Tomasz Kulasek; +Cc: dev

On 01/04/2017 09:41 PM, Thomas Monjalon wrote:
> 2016-12-23 19:40, Tomasz Kulasek:
>> v15 changes:
>>   - marked rte_eth_tx_prepare api as experimental
>>   - improved doxygen comments for nb_seg_max and nb_mtu_seg_max fields
>>   - removed unused "uint8_t tx_prepare" declaration from testpmd
> No, you didn't remove this useless declaration. I did it for you.
>
> This feature is now applied! Thanks and congratulations :)


Congrats and thanks!  This will allow us to remove some hacks from seastar.

^ permalink raw reply	[flat|nested] 261+ messages in thread

* [dpdk-dev] [PATCH 0/6] add Tx preparation
@ 2016-08-26 16:05 Tomasz Kulasek
  0 siblings, 0 replies; 261+ messages in thread
From: Tomasz Kulasek @ 2016-08-26 16:05 UTC (permalink / raw)
  To: dev

As discussed in the following thread:

http://dpdk.org/ml/archives/dev/2015-September/023603.html

Depending on the HW offloads requested, different NIC models might
impose different requirements on the packets to be transmitted, in
terms of:

 - Max number of fragments per packet allowed
 - Max number of fragments per TSO segment
 - The way pseudo-header checksum should be pre-calculated
 - L3/L4 header fields filling
 - etc.


MOTIVATION:
-----------

1) Some work cannot (and should not) be done in rte_eth_tx_burst.
   However, this work is sometimes required, and today it is left to
   the application.

2) Different hardware may have different requirements for TX offloads;
   each may support a different subset, and so on.

3) Some parameters (e.g. the number of segments in the ixgbe driver)
   may hang the device. These parameters may vary between devices.

   For example, i40e HW allows 8 fragments per packet, but that is after
   TSO segmentation, while ixgbe has a 38-fragment pre-TSO limit.

4) Fields in the packet may require different initialization (e.g.
   pseudo-header checksum precalculation, sometimes done differently
   depending on the packet type, and so on). Today the application
   needs to take care of this.

5) Using an additional API (rte_eth_tx_prep) before rte_eth_tx_burst
   lets the application prepare the packet burst in a form acceptable
   to the specific device.

6) Some additional checks may be done in debug mode while keeping the
   tx_burst implementation clean.


PROPOSAL:
---------

To help the user deal with all this variety, we propose to:

1) Introduce an rte_eth_tx_prep() function to do the necessary
   preparation of a packet burst so it can be safely transmitted on the
   device with the desired HW offloads (set/reset checksum fields
   according to the hardware requirements) and to check HW constraints
   (number of segments per packet, etc).

   Since the limitations and requirements may differ between devices,
   this requires extending the rte_eth_dev structure with a new function
   pointer, "tx_pkt_prep", which can be implemented in the driver to
   prepare and verify packets, in a device-specific way, before the
   burst; this should prevent the application from sending malformed
   packets.

2) Also, new fields will be introduced in rte_eth_desc_lim:
   nb_seg_max and nb_mtu_seg_max, providing information about the
   maximum number of segments in TSO and non-TSO packets acceptable to
   the device.

   This information helps the application avoid creating malformed
   packets; see also the sketch after this list.
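
   An application can read these limits through rte_eth_dev_info_get()
   before building a burst, e.g. (a sketch; "nb_segs" stands in for the
   application's own count of mbuf segments in the packet):

	struct rte_eth_dev_info dev_info;

	rte_eth_dev_info_get(port, &dev_info);

	if (nb_segs > dev_info.tx_desc_lim.nb_mtu_seg_max) {
		/* too many buffers for this device: split,
		 * linearize or drop before rte_eth_tx_burst() */
	}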


APPLICATION (USE CASE):
-----------------------

1) The application initializes the burst of packets to send and sets
   the required tx offload flags and fields, such as l2_len, l3_len,
   l4_len, and tso_segsz.

2) The application passes the burst to rte_eth_tx_prep to check the
   conditions required to send the packets through the NIC.

3) The result of rte_eth_tx_prep can be used to send the valid packets
   and/or to recover the invalid ones if the function fails.

e.g.

	for (i = 0; i < nb_pkts; i++) {

		/* initialize or process packet */

		bufs[i]->tso_segsz = 800;
		bufs[i]->ol_flags = PKT_TX_TCP_SEG | PKT_TX_IPV4
				| PKT_TX_IP_CKSUM;
		bufs[i]->l2_len = sizeof(struct ether_hdr);
		bufs[i]->l3_len = sizeof(struct ipv4_hdr);
		bufs[i]->l4_len = sizeof(struct tcp_hdr);
	}

	/* Prepare burst of TX packets */
	nb_prep = rte_eth_tx_prep(port, 0, bufs, nb_pkts);

	if (nb_prep < nb_pkts) {
		printf("tx_prep failed\n");

		/* nb_prep indicates the first invalid packet here.
		 * rte_eth_tx_prep can be called again on the remaining
		 * packets to find further ones (see the fuller sketch
		 * below).
		 */

	}

	/* Send burst of TX packets */
	nb_tx = rte_eth_tx_burst(port, 0, bufs, nb_prep);

	/* Free any unsent packets. */
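
A fuller recovery loop might look like the sketch below: free each
packet that fails preparation, close the gap in the array and continue
with the rest (error handling simplified; memmove() from <string.h>
is assumed):

	uint16_t done = 0;

	while (done < nb_pkts) {
		uint16_t n = rte_eth_tx_prep(port, 0, bufs + done,
				nb_pkts - done);

		done += n;
		if (done < nb_pkts) {
			/* bufs[done] failed preparation: free it and
			 * compact the array over the dropped slot */
			rte_pktmbuf_free(bufs[done]);
			memmove(&bufs[done], &bufs[done + 1],
				sizeof(bufs[0]) * (nb_pkts - done - 1));
			nb_pkts--;
		}
	}

	/* all nb_pkts packets are now prepared */
	nb_tx = rte_eth_tx_burst(port, 0, bufs, nb_pkts);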


Tomasz Kulasek (6):
  ethdev: add Tx preparation
  e1000: add Tx preparation
  fm10k: add Tx preparation
  i40e: add Tx preparation
  ixgbe: add Tx preparation
  testpmd: add txprep engine

 app/test-pmd/Makefile            |    3 +-
 app/test-pmd/testpmd.c           |    1 +
 app/test-pmd/testpmd.h           |    1 +
 app/test-pmd/txprep.c            |  412 ++++++++++++++++++++++++++++++++++++++
 drivers/net/e1000/e1000_ethdev.h |   11 +
 drivers/net/e1000/em_ethdev.c    |    5 +-
 drivers/net/e1000/em_rxtx.c      |   46 ++++-
 drivers/net/e1000/igb_ethdev.c   |    4 +
 drivers/net/e1000/igb_rxtx.c     |   50 ++++-
 drivers/net/fm10k/fm10k.h        |    9 +
 drivers/net/fm10k/fm10k_ethdev.c |    5 +
 drivers/net/fm10k/fm10k_rxtx.c   |   87 +++++++-
 drivers/net/i40e/i40e_ethdev.c   |    3 +
 drivers/net/i40e/i40e_rxtx.c     |   98 ++++++++-
 drivers/net/i40e/i40e_rxtx.h     |   10 +
 drivers/net/ixgbe/ixgbe_ethdev.c |    3 +
 drivers/net/ixgbe/ixgbe_ethdev.h |    8 +-
 drivers/net/ixgbe/ixgbe_rxtx.c   |   83 +++++++-
 drivers/net/ixgbe/ixgbe_rxtx.h   |    2 +
 lib/librte_ether/rte_ethdev.h    |   74 +++++++
 lib/librte_mbuf/rte_mbuf.h       |    8 +
 lib/librte_net/Makefile          |    2 +-
 lib/librte_net/rte_pkt.h         |  132 ++++++++++++
 23 files changed, 1048 insertions(+), 9 deletions(-)
 create mode 100644 app/test-pmd/txprep.c
 create mode 100644 lib/librte_net/rte_pkt.h

-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 261+ messages in thread

end of thread, other threads:[~2017-01-05 15:43 UTC | newest]

Thread overview: 261+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-08-26 16:22 [dpdk-dev] [PATCH 0/6] add Tx preparation Tomasz Kulasek
2016-08-26 16:22 ` [dpdk-dev] [PATCH 1/6] ethdev: " Tomasz Kulasek
2016-09-08  7:28   ` Jerin Jacob
2016-09-08 16:09     ` Kulasek, TomaszX
2016-09-09  5:58       ` Jerin Jacob
2016-08-26 16:22 ` [dpdk-dev] [PATCH 2/6] e1000: " Tomasz Kulasek
2016-08-26 16:22 ` [dpdk-dev] [PATCH 3/6] fm10k: " Tomasz Kulasek
2016-08-26 16:22 ` [dpdk-dev] [PATCH 4/6] i40e: " Tomasz Kulasek
2016-08-26 16:22 ` [dpdk-dev] [PATCH 5/6] ixgbe: " Tomasz Kulasek
2016-08-26 16:22 ` [dpdk-dev] [PATCH 6/6] testpmd: add txprep engine Tomasz Kulasek
2016-08-26 17:31 ` [dpdk-dev] [PATCH 0/6] add Tx preparation Stephen Hemminger
2016-08-31 12:34   ` Ananyev, Konstantin
2016-09-12 14:44 ` [dpdk-dev] [PATCH v2 " Tomasz Kulasek
2016-09-12 14:44   ` [dpdk-dev] [PATCH v2 1/6] ethdev: " Tomasz Kulasek
2016-09-19 13:03     ` Ananyev, Konstantin
2016-09-19 15:29       ` Kulasek, TomaszX
2016-09-19 16:06         ` Jerin Jacob
2016-09-20  9:06           ` Ananyev, Konstantin
2016-09-21  8:29             ` Jerin Jacob
2016-09-22  9:36               ` Ananyev, Konstantin
2016-09-22  9:59                 ` Jerin Jacob
2016-09-23  9:41                   ` Ananyev, Konstantin
2016-09-23 10:29                     ` Jerin Jacob
2016-09-12 14:44   ` [dpdk-dev] [PATCH v2 2/6] e1000: " Tomasz Kulasek
2016-09-12 14:44   ` [dpdk-dev] [PATCH v2 3/6] fm10k: " Tomasz Kulasek
2016-09-12 14:44   ` [dpdk-dev] [PATCH v2 4/6] i40e: " Tomasz Kulasek
2016-09-12 14:44   ` [dpdk-dev] [PATCH v2 5/6] ixgbe: " Tomasz Kulasek
2016-09-19 12:54     ` Ananyev, Konstantin
2016-09-19 13:58       ` Kulasek, TomaszX
2016-09-19 15:23         ` Ananyev, Konstantin
2016-09-20  7:15           ` Ananyev, Konstantin
2016-09-12 14:44   ` [dpdk-dev] [PATCH v2 6/6] testpmd: add txprep engine Tomasz Kulasek
2016-09-19 12:59     ` Ananyev, Konstantin
2016-09-28 11:10   ` [dpdk-dev] [PATCH v3 0/6] add Tx preparation Tomasz Kulasek
2016-09-28 11:10     ` [dpdk-dev] [PATCH v3 1/6] ethdev: " Tomasz Kulasek
2016-09-29 10:40       ` Ananyev, Konstantin
2016-09-29 13:04         ` Kulasek, TomaszX
2016-09-29 13:57           ` Ananyev, Konstantin
2016-09-28 11:10     ` [dpdk-dev] [PATCH v3 2/6] e1000: " Tomasz Kulasek
2016-09-28 11:10     ` [dpdk-dev] [PATCH v3 3/6] fm10k: " Tomasz Kulasek
2016-09-28 11:10     ` [dpdk-dev] [PATCH v3 4/6] i40e: " Tomasz Kulasek
2016-09-28 11:10     ` [dpdk-dev] [PATCH v3 5/6] ixgbe: " Tomasz Kulasek
2016-09-29 11:09       ` Ananyev, Konstantin
2016-09-29 15:12         ` Kulasek, TomaszX
2016-09-29 17:01           ` Ananyev, Konstantin
2016-09-28 11:10     ` [dpdk-dev] [PATCH v3 6/6] testpmd: use Tx preparation in csum engine Tomasz Kulasek
2016-09-30  9:00     ` [dpdk-dev] [PATCH v4 0/6] add Tx preparation Tomasz Kulasek
2016-09-30  9:00       ` [dpdk-dev] [PATCH v4 1/6] ethdev: " Tomasz Kulasek
2016-10-10 14:08         ` Thomas Monjalon
2016-10-13  7:08           ` Thomas Monjalon
2016-10-13 10:47             ` Kulasek, TomaszX
2016-09-30  9:00       ` [dpdk-dev] [PATCH v4 2/6] e1000: " Tomasz Kulasek
2016-09-30  9:00       ` [dpdk-dev] [PATCH v4 3/6] fm10k: " Tomasz Kulasek
2016-09-30  9:00       ` [dpdk-dev] [PATCH v4 4/6] i40e: " Tomasz Kulasek
2016-10-10 14:02         ` Wu, Jingjing
2016-10-10 17:20           ` Kulasek, TomaszX
2016-09-30  9:00       ` [dpdk-dev] [PATCH v4 5/6] ixgbe: " Tomasz Kulasek
2016-09-30  9:00       ` [dpdk-dev] [PATCH v4 6/6] testpmd: use Tx preparation in csum engine Tomasz Kulasek
2016-09-30  9:55       ` [dpdk-dev] [PATCH v4 0/6] add Tx preparation Ananyev, Konstantin
2016-10-13 17:36       ` [dpdk-dev] [PATCH v5 " Tomasz Kulasek
2016-10-13 17:36         ` [dpdk-dev] [PATCH v5 1/6] ethdev: " Tomasz Kulasek
2016-10-13 19:21           ` Thomas Monjalon
2016-10-14 14:02             ` Kulasek, TomaszX
2016-10-14 14:20               ` Thomas Monjalon
2016-10-17 16:25                 ` Kulasek, TomaszX
2016-10-13 17:36         ` [dpdk-dev] [PATCH v5 2/6] e1000: " Tomasz Kulasek
2016-10-13 17:36         ` [dpdk-dev] [PATCH v5 3/6] fm10k: " Tomasz Kulasek
2016-10-13 17:37         ` [dpdk-dev] [PATCH v5 4/6] i40e: " Tomasz Kulasek
2016-10-13 17:37         ` [dpdk-dev] [PATCH v5 5/6] ixgbe: " Tomasz Kulasek
2016-10-13 17:37         ` [dpdk-dev] [PATCH v5 6/6] testpmd: use Tx preparation in csum engine Tomasz Kulasek
2016-10-14 15:05         ` [dpdk-dev] [PATCH v6 0/6] add Tx preparation Tomasz Kulasek
2016-10-14 15:05           ` [dpdk-dev] [PATCH v6 1/6] ethdev: " Tomasz Kulasek
2016-10-18 14:57             ` Olivier Matz
2016-10-19 15:42               ` Kulasek, TomaszX
2016-10-19 22:07                 ` Ananyev, Konstantin
2016-10-14 15:05           ` [dpdk-dev] [PATCH v6 2/6] e1000: " Tomasz Kulasek
2016-10-14 15:05           ` [dpdk-dev] [PATCH v6 3/6] fm10k: " Tomasz Kulasek
2016-10-14 15:05           ` [dpdk-dev] [PATCH v6 4/6] i40e: " Tomasz Kulasek
2016-10-14 15:05           ` [dpdk-dev] [PATCH v6 5/6] ixgbe: " Tomasz Kulasek
2016-10-14 15:05           ` [dpdk-dev] [PATCH v6 6/6] testpmd: use Tx preparation in csum engine Tomasz Kulasek
2016-10-18 12:28           ` [dpdk-dev] [PATCH v6 0/6] add Tx preparation Ananyev, Konstantin
2016-10-21 13:42           ` [dpdk-dev] [PATCH v7 " Tomasz Kulasek
2016-10-21 13:42             ` [dpdk-dev] [PATCH v7 1/6] ethdev: " Tomasz Kulasek
2016-10-21 13:42             ` [dpdk-dev] [PATCH v7 2/6] e1000: " Tomasz Kulasek
2016-10-21 13:42             ` [dpdk-dev] [PATCH v7 3/6] fm10k: " Tomasz Kulasek
2016-10-21 13:42             ` [dpdk-dev] [PATCH v7 4/6] i40e: " Tomasz Kulasek
2016-10-21 13:42             ` [dpdk-dev] [PATCH v7 5/6] ixgbe: " Tomasz Kulasek
2016-10-21 13:42             ` [dpdk-dev] [PATCH v7 6/6] testpmd: use Tx preparation in csum engine Tomasz Kulasek
2016-10-21 14:46             ` [dpdk-dev] [PATCH v8 0/6] add Tx preparation Tomasz Kulasek
2016-10-21 14:46               ` [dpdk-dev] [PATCH v8 1/6] ethdev: " Tomasz Kulasek
2016-10-24 12:14                 ` Ananyev, Konstantin
2016-10-24 12:49                   ` Kulasek, TomaszX
2016-10-24 12:56                     ` Ananyev, Konstantin
2016-10-24 14:12                       ` Kulasek, TomaszX
2016-10-21 14:46               ` [dpdk-dev] [PATCH v8 2/6] e1000: " Tomasz Kulasek
2016-10-21 14:46               ` [dpdk-dev] [PATCH v8 3/6] fm10k: " Tomasz Kulasek
2016-10-21 14:46               ` [dpdk-dev] [PATCH v8 4/6] i40e: " Tomasz Kulasek
2016-10-21 14:46               ` [dpdk-dev] [PATCH v8 5/6] ixgbe: " Tomasz Kulasek
2016-10-21 14:46               ` [dpdk-dev] [PATCH v8 6/6] testpmd: use Tx preparation in csum engine Tomasz Kulasek
2016-10-24 14:05               ` [dpdk-dev] [PATCH v9 0/6] add Tx preparation Tomasz Kulasek
2016-10-24 14:05                 ` [dpdk-dev] [PATCH v9 1/6] ethdev: " Tomasz Kulasek
2016-10-24 14:05                 ` [dpdk-dev] [PATCH v9 2/6] e1000: " Tomasz Kulasek
2016-10-24 14:05                 ` [dpdk-dev] [PATCH v9 3/6] fm10k: " Tomasz Kulasek
2016-10-24 14:05                 ` [dpdk-dev] [PATCH v9 4/6] i40e: " Tomasz Kulasek
2016-10-24 14:05                 ` [dpdk-dev] [PATCH v9 5/6] ixgbe: " Tomasz Kulasek
2016-10-24 14:05                 ` [dpdk-dev] [PATCH v9 6/6] testpmd: use Tx preparation in csum engine Tomasz Kulasek
2016-10-24 16:51                 ` [dpdk-dev] [PATCH v10 0/6] add Tx preparation Tomasz Kulasek
2016-10-24 16:51                   ` [dpdk-dev] [PATCH v10 1/6] ethdev: " Tomasz Kulasek
2016-10-25 14:41                     ` Olivier Matz
2016-10-25 17:28                       ` Kulasek, TomaszX
2016-10-24 16:51                   ` [dpdk-dev] [PATCH v10 2/6] e1000: " Tomasz Kulasek
2016-10-24 16:51                   ` [dpdk-dev] [PATCH v10 3/6] fm10k: " Tomasz Kulasek
2016-10-24 16:51                   ` [dpdk-dev] [PATCH v10 4/6] i40e: " Tomasz Kulasek
2016-10-24 16:51                   ` [dpdk-dev] [PATCH v10 5/6] ixgbe: " Tomasz Kulasek
2016-10-24 16:51                   ` [dpdk-dev] [PATCH v10 6/6] testpmd: use Tx preparation in csum engine Tomasz Kulasek
2016-10-24 17:26                   ` [dpdk-dev] [PATCH v10 0/6] add Tx preparation Ananyev, Konstantin
2016-10-26 12:56                   ` [dpdk-dev] [PATCH v11 " Tomasz Kulasek
2016-10-26 12:56                     ` [dpdk-dev] [PATCH v11 1/6] ethdev: " Tomasz Kulasek
2016-10-27 12:38                       ` Olivier Matz
2016-10-27 15:01                       ` Thomas Monjalon
2016-10-27 15:52                         ` Ananyev, Konstantin
2016-10-27 16:02                           ` Thomas Monjalon
2016-10-27 16:24                             ` Ananyev, Konstantin
2016-10-27 16:39                               ` Thomas Monjalon
2016-10-28 11:29                                 ` Ananyev, Konstantin
2016-10-28 11:34                                   ` Ananyev, Konstantin
2016-10-28 12:23                                     ` Thomas Monjalon
2016-10-28 12:59                                       ` Ananyev, Konstantin
2016-10-28 13:42                                         ` Thomas Monjalon
2016-11-01 12:57                                           ` Ananyev, Konstantin
2016-11-04 11:35                                             ` Thomas Monjalon
2016-10-27 16:39                               ` Kulasek, TomaszX
2016-10-28 10:15                                 ` Ananyev, Konstantin
2016-10-28 10:22                                   ` Kulasek, TomaszX
2016-10-28 10:22                                   ` Thomas Monjalon
2016-10-28 10:28                                     ` Ananyev, Konstantin
2016-10-28 11:02                                       ` Richardson, Bruce
2016-10-28 11:14                                   ` Jerin Jacob
2016-10-27 16:29                         ` Kulasek, TomaszX
2016-10-26 12:56                     ` [dpdk-dev] [PATCH v11 2/6] e1000: " Tomasz Kulasek
2016-10-26 12:56                     ` [dpdk-dev] [PATCH v11 3/6] fm10k: " Tomasz Kulasek
2016-10-26 12:56                     ` [dpdk-dev] [PATCH v11 4/6] i40e: " Tomasz Kulasek
2016-10-26 12:56                     ` [dpdk-dev] [PATCH v11 5/6] ixgbe: " Tomasz Kulasek
2016-10-26 12:56                     ` [dpdk-dev] [PATCH v11 6/6] testpmd: use Tx preparation in csum engine Tomasz Kulasek
2016-11-23 17:36                     ` [dpdk-dev] [PATCH v12 0/6] add Tx preparation Tomasz Kulasek
2016-11-23 17:36                       ` [dpdk-dev] [PATCH v12 1/6] ethdev: " Tomasz Kulasek
2016-11-28 10:54                         ` Thomas Monjalon
2016-12-01 16:24                           ` Thomas Monjalon
2016-12-01 19:20                             ` Kulasek, TomaszX
2016-12-01 19:52                               ` Thomas Monjalon
2016-12-01 21:56                                 ` Jerin Jacob
2016-12-01 22:31                                 ` Kulasek, TomaszX
2016-12-01 23:50                                   ` Thomas Monjalon
2016-12-09 13:25                                     ` Kulasek, TomaszX
2016-12-02  0:10                                   ` Ananyev, Konstantin
2016-12-22 13:14                                     ` Thomas Monjalon
2016-12-22 13:37                                       ` Jerin Jacob
2016-12-01 16:26                         ` Thomas Monjalon
2016-12-01 16:28                         ` Thomas Monjalon
2016-12-02  1:06                           ` Ananyev, Konstantin
2016-12-02  8:24                             ` Olivier Matz
2016-12-02 16:17                               ` Ananyev, Konstantin
2016-12-08 17:24                                 ` Olivier Matz
2016-12-09 17:19                                   ` Kulasek, TomaszX
2016-12-12 11:51                                     ` Ananyev, Konstantin
2016-12-22 13:30                                       ` Thomas Monjalon
2016-12-22 14:11                                         ` Ananyev, Konstantin
2016-11-23 17:36                       ` [dpdk-dev] [PATCH v12 2/6] e1000: " Tomasz Kulasek
2016-11-23 17:36                       ` [dpdk-dev] [PATCH v12 3/6] fm10k: " Tomasz Kulasek
2016-11-23 17:36                       ` [dpdk-dev] [PATCH v12 4/6] i40e: " Tomasz Kulasek
2016-11-23 17:36                       ` [dpdk-dev] [PATCH v12 5/6] ixgbe: " Tomasz Kulasek
2016-11-23 17:36                       ` [dpdk-dev] [PATCH v12 6/6] testpmd: use Tx preparation in csum engine Tomasz Kulasek
2016-12-07 11:13                         ` Ferruh Yigit
2016-12-07 12:00                           ` Mcnamara, John
2016-12-07 12:12                             ` Kulasek, TomaszX
2016-12-07 12:49                               ` Ananyev, Konstantin
2016-12-07 12:00                           ` Kulasek, TomaszX
2016-11-28 11:03                       ` [dpdk-dev] [PATCH v12 0/6] add Tx preparation Thomas Monjalon
2016-11-30  5:48                         ` John Daley (johndale)
2016-11-30 10:59                           ` Ananyev, Konstantin
2016-11-30  7:40                         ` Adrien Mazarguil
2016-11-30  8:50                           ` Thomas Monjalon
2016-11-30 10:30                             ` Kulasek, TomaszX
2016-12-01  7:19                               ` Adrien Mazarguil
2016-11-30 10:54                           ` Ananyev, Konstantin
2016-12-01  7:15                             ` Adrien Mazarguil
2016-12-01  8:58                               ` Thomas Monjalon
2016-12-01 22:03                                 ` Jerin Jacob
2016-12-02  1:00                               ` Ananyev, Konstantin
2016-12-05 15:03                                 ` Adrien Mazarguil
2016-12-05 16:43                                   ` Ananyev, Konstantin
2016-12-05 18:10                                     ` Adrien Mazarguil
2016-12-06 10:56                                       ` Ananyev, Konstantin
2016-12-06 13:59                                         ` Adrien Mazarguil
2016-12-06 20:31                                           ` Ananyev, Konstantin
2016-12-07 10:08                                             ` Adrien Mazarguil
2016-11-30 16:34                         ` Harish Patil
2016-11-30 17:42                           ` Ananyev, Konstantin
2016-11-30 18:26                             ` Thomas Monjalon
2016-11-30 21:01                               ` Jerin Jacob
2016-12-01 10:50                               ` Ferruh Yigit
2016-12-02 23:55                               ` Yong Wang
2016-12-04 12:11                                 ` Ananyev, Konstantin
2016-12-06 18:25                                   ` Yong Wang
2016-12-07  9:57                                     ` Ferruh Yigit
2016-12-07 10:03                                       ` Ananyev, Konstantin
2016-12-07 14:31                                         ` Alejandro Lucero
2016-12-08 18:20                                         ` Yong Wang
2016-12-09 14:40                                           ` Jan Mędala
2016-12-12 15:02                                             ` Ananyev, Konstantin
2016-12-16  0:15                                               ` Ananyev, Konstantin
2016-12-16 13:53                                                 ` Jan Mędala
2016-12-16 15:27                                                   ` Ananyev, Konstantin
2016-12-16 15:37                                                     ` Jan Mędala
2016-12-12 17:29                                           ` Ananyev, Konstantin
2016-11-30 18:39                             ` Harish Patil
2016-11-30 19:37                         ` Ajit Khaparde
2016-12-01  8:24                         ` Rahul Lakkireddy
2016-12-06 15:53                         ` Ferruh Yigit
2016-12-07  7:55                           ` Andrew Rybchenko
2016-12-07  8:11                           ` Yuanhan Liu
2016-12-07 10:13                             ` Ananyev, Konstantin
2016-12-07 10:18                               ` Yuanhan Liu
2016-12-07 10:22                                 ` Ananyev, Konstantin
2016-12-13 11:59                           ` Ferruh Yigit
2016-12-13 17:41                       ` [dpdk-dev] [PATCH v13 0/7] " Tomasz Kulasek
2016-12-13 17:41                         ` [dpdk-dev] [PATCH v13 1/7] ethdev: " Tomasz Kulasek
2016-12-13 17:41                         ` [dpdk-dev] [PATCH v13 2/7] e1000: " Tomasz Kulasek
2016-12-13 17:41                         ` [dpdk-dev] [PATCH v13 3/7] fm10k: " Tomasz Kulasek
2016-12-13 17:41                         ` [dpdk-dev] [PATCH v13 4/7] i40e: " Tomasz Kulasek
2016-12-13 17:41                         ` [dpdk-dev] [PATCH v13 5/7] ixgbe: " Tomasz Kulasek
2016-12-13 17:41                         ` [dpdk-dev] [PATCH v13 6/7] vmxnet3: " Tomasz Kulasek
2016-12-13 18:15                           ` Yong Wang
2016-12-20 13:36                           ` Ferruh Yigit
2016-12-22 13:10                             ` Thomas Monjalon
2016-12-13 17:41                         ` [dpdk-dev] [PATCH v13 7/7] testpmd: use Tx preparation in csum engine Tomasz Kulasek
2016-12-22 13:05                         ` [dpdk-dev] [PATCH v14 0/8] add Tx preparation Tomasz Kulasek
2016-12-22 13:05                           ` [dpdk-dev] [PATCH v14 1/8] ethdev: " Tomasz Kulasek
2016-12-22 14:24                             ` Thomas Monjalon
2016-12-23 18:49                               ` Kulasek, TomaszX
2016-12-22 13:05                           ` [dpdk-dev] [PATCH v14 2/8] e1000: " Tomasz Kulasek
2016-12-22 13:05                           ` [dpdk-dev] [PATCH v14 3/8] fm10k: " Tomasz Kulasek
2016-12-22 13:05                           ` [dpdk-dev] [PATCH v14 4/8] i40e: " Tomasz Kulasek
2016-12-22 13:05                           ` [dpdk-dev] [PATCH v14 5/8] ixgbe: " Tomasz Kulasek
2016-12-22 13:05                           ` [dpdk-dev] [PATCH v14 6/8] vmxnet3: " Tomasz Kulasek
2016-12-22 17:59                             ` Yong Wang
2016-12-22 13:05                           ` [dpdk-dev] [PATCH v14 7/8] ena: " Tomasz Kulasek
2016-12-22 13:05                           ` [dpdk-dev] [PATCH v14 8/8] testpmd: use Tx preparation in csum engine Tomasz Kulasek
2016-12-22 14:28                             ` Thomas Monjalon
2016-12-23 18:40                           ` [dpdk-dev] [PATCH v15 0/8] add Tx preparation Tomasz Kulasek
2016-12-23 18:40                             ` [dpdk-dev] [PATCH v15 1/8] ethdev: " Tomasz Kulasek
2016-12-23 18:40                             ` [dpdk-dev] [PATCH v15 2/8] e1000: " Tomasz Kulasek
2016-12-23 18:40                             ` [dpdk-dev] [PATCH v15 3/8] fm10k: " Tomasz Kulasek
2016-12-23 18:40                             ` [dpdk-dev] [PATCH v15 4/8] i40e: " Tomasz Kulasek
2016-12-23 18:40                             ` [dpdk-dev] [PATCH v15 5/8] ixgbe: " Tomasz Kulasek
2016-12-23 18:40                             ` [dpdk-dev] [PATCH v15 6/8] vmxnet3: " Tomasz Kulasek
2016-12-23 18:40                             ` [dpdk-dev] [PATCH v15 7/8] ena: " Tomasz Kulasek
2016-12-23 18:40                             ` [dpdk-dev] [PATCH v15 8/8] testpmd: use Tx preparation in csum engine Tomasz Kulasek
2017-01-04 19:41                             ` [dpdk-dev] [PATCH v15 0/8] add Tx preparation Thomas Monjalon
2017-01-05 15:43                               ` Avi Kivity
  -- strict thread matches above, loose matches on Subject: below --
2016-08-26 16:05 [dpdk-dev] [PATCH 0/6] " Tomasz Kulasek

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).