DPDK patches and discussions
* [dpdk-dev] [PATCH 0/2] add support for buffered tx to ethdev
@ 2016-01-15 14:43 Tomasz Kulasek
  2016-01-15 14:43 ` [dpdk-dev] [PATCH 1/2] ethdev: add buffered tx api Tomasz Kulasek
                   ` (3 more replies)
  0 siblings, 4 replies; 43+ messages in thread
From: Tomasz Kulasek @ 2016-01-15 14:43 UTC (permalink / raw)
  To: dev

Many sample apps include internal buffering for single-packet-at-a-time
operation. Since this is such a common paradigm, this functionality is
better suited to being inside the core ethdev API.
The new APIs in the ethdev library are:
* rte_eth_tx_buffer - buffer up a single packet for future transmission
* rte_eth_tx_buffer_flush - flush any unsent buffered packets
* rte_eth_tx_buffer_set_err_callback - set up a callback to be called in
  case transmitting a buffered burst fails. By default, we just free the
  unsent packets.

As well as these, an additional reference callback is provided, which
frees the packets (as the default callback does), as well as updating a
user-provided counter, so that the number of dropped packets can be
tracked.
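
As a rough sketch of the intended usage (illustrative only: the port and
queue ids, the statistics array and the drain timer are application
assumptions, modelled on the l2fwd changes in patch 2/2):

	uint16_t sent;

	/* fast path: buffer one packet for TX; a full burst is sent
	 * automatically once RTE_ETHDEV_TX_BUFSIZE packets are queued */
	sent = rte_eth_tx_buffer(dst_port, 0, m);
	if (sent)
		port_statistics[dst_port].tx += sent;

	/* slow path, on drain-timer expiry: push out anything pending */
	sent = rte_eth_tx_buffer_flush(dst_port, 0);
	if (sent)
		port_statistics[dst_port].tx += sent;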

The internal buffering of packets for TX in sample apps is no longer
needed, so this patchset also replaces this code with calls to the new
rte_eth_tx_buffer* APIs in:

* l2fwd-jobstats
* l2fwd-keepalive
* l2fwd
* l3fwd-acl
* l3fwd-power
* link_status_interrupt
* client_server_mp
* l2fwd_fork
* packet_ordering
* qos_meter

Tomasz Kulasek (2):
  ethdev: add buffered tx api
  examples: sample apps rework to use buffered tx api

 config/common_bsdapp                               |    1 +
 config/common_linuxapp                             |    1 +
 examples/l2fwd-jobstats/main.c                     |   73 ++-----
 examples/l2fwd-keepalive/main.c                    |   79 ++------
 examples/l2fwd/main.c                              |   80 ++------
 examples/l3fwd-acl/main.c                          |   64 +-----
 examples/l3fwd-power/main.c                        |   63 +-----
 examples/link_status_interrupt/main.c              |   83 ++------
 .../client_server_mp/mp_client/client.c            |   77 +++----
 examples/multi_process/l2fwd_fork/main.c           |   81 ++------
 examples/packet_ordering/main.c                    |   62 +++---
 examples/qos_meter/main.c                          |   46 +----
 lib/librte_ether/rte_ethdev.c                      |   63 +++++-
 lib/librte_ether/rte_ethdev.h                      |  211 +++++++++++++++++++-
 lib/librte_ether/rte_ether_version.map             |    8 +
 15 files changed, 445 insertions(+), 547 deletions(-)

-- 
1.7.9.5


* [dpdk-dev] [PATCH 1/2] ethdev: add buffered tx api
  2016-01-15 14:43 [dpdk-dev] [PATCH 0/2] add support for buffered tx to ethdev Tomasz Kulasek
@ 2016-01-15 14:43 ` Tomasz Kulasek
  2016-01-15 18:13   ` Stephen Hemminger
                     ` (2 more replies)
  2016-01-15 14:43 ` [dpdk-dev] [PATCH 2/2] examples: sample apps rework to use " Tomasz Kulasek
                   ` (2 subsequent siblings)
  3 siblings, 3 replies; 43+ messages in thread
From: Tomasz Kulasek @ 2016-01-15 14:43 UTC (permalink / raw)
  To: dev

Many sample apps include internal buffering for single-packet-at-a-time
operation. Since this is such a common paradigm, this functionality is
better suited to being inside the core ethdev API.
The new APIs in the ethdev library are:
* rte_eth_tx_buffer - buffer up a single packet for future transmission
* rte_eth_tx_buffer_flush - flush any unsent buffered packets
* rte_eth_tx_buffer_set_err_callback - set up a callback to be called in
  case transmitting a buffered burst fails. By default, we just free the
  unsent packets.

As well as these, an additional reference callback is provided, which
frees the packets (as the default callback does), as well as updating a
user-provided counter, so that the number of dropped packets can be
tracked.
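
For example, per-port drop accounting can be set up as follows (a sketch
based on the l2fwd changes in patch 2/2; the counter passed as userdata
must be an unsigned long):

	static unsigned long dropped[RTE_MAX_ETHPORTS];
	int ret;

	/* after rte_eth_tx_queue_setup() for queue 0 on each port */
	ret = rte_eth_tx_buffer_set_err_callback(portid, 0,
			rte_eth_count_unsent_packet_callback,
			&dropped[portid]);
	if (ret < 0)
		rte_exit(EXIT_FAILURE, "cannot set TX error callback\n");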

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
---
 config/common_bsdapp                   |    1 +
 config/common_linuxapp                 |    1 +
 lib/librte_ether/rte_ethdev.c          |   63 +++++++++-
 lib/librte_ether/rte_ethdev.h          |  211 +++++++++++++++++++++++++++++++-
 lib/librte_ether/rte_ether_version.map |    8 ++
 5 files changed, 279 insertions(+), 5 deletions(-)

diff --git a/config/common_bsdapp b/config/common_bsdapp
index ed7c31c..8a2e4c5 100644
--- a/config/common_bsdapp
+++ b/config/common_bsdapp
@@ -148,6 +148,7 @@ CONFIG_RTE_MAX_QUEUES_PER_PORT=1024
 CONFIG_RTE_LIBRTE_IEEE1588=n
 CONFIG_RTE_ETHDEV_QUEUE_STAT_CNTRS=16
 CONFIG_RTE_ETHDEV_RXTX_CALLBACKS=y
+CONFIG_RTE_ETHDEV_TX_BUFSIZE=32
 
 #
 # Support NIC bypass logic
diff --git a/config/common_linuxapp b/config/common_linuxapp
index 74bc515..6229cab 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -146,6 +146,7 @@ CONFIG_RTE_MAX_QUEUES_PER_PORT=1024
 CONFIG_RTE_LIBRTE_IEEE1588=n
 CONFIG_RTE_ETHDEV_QUEUE_STAT_CNTRS=16
 CONFIG_RTE_ETHDEV_RXTX_CALLBACKS=y
+CONFIG_RTE_ETHDEV_TX_BUFSIZE=32
 
 #
 # Support NIC bypass logic
diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index ed971b4..27dac1b 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -826,11 +826,42 @@ rte_eth_dev_tx_queue_stop(uint8_t port_id, uint16_t tx_queue_id)
 
 }
 
+void
+rte_eth_count_unsent_packet_callback(struct rte_mbuf **pkts, uint16_t unsent,
+		void *userdata)
+{
+	unsigned long *count = userdata;
+	unsigned i;
+
+	for (i = 0; i < unsent; i++)
+		rte_pktmbuf_free(pkts[i]);
+
+	*count += unsent;
+}
+
+int
+rte_eth_tx_buffer_set_err_callback(uint8_t port_id, uint16_t queue_id,
+		buffer_tx_error_fn cbfn, void *userdata)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+
+	if (!rte_eth_dev_is_valid_port(port_id) ||
+			queue_id >= dev->data->nb_tx_queues) {
+		rte_errno = EINVAL;
+		return -1;
+	}
+
+	dev->tx_buf_err_cb[queue_id].userdata = userdata;
+	dev->tx_buf_err_cb[queue_id].flush_cb = cbfn;
+	return 0;
+}
+
 static int
 rte_eth_dev_tx_queue_config(struct rte_eth_dev *dev, uint16_t nb_queues)
 {
 	uint16_t old_nb_queues = dev->data->nb_tx_queues;
 	void **txq;
+	struct rte_eth_dev_tx_buffer *new_bufs;
 	unsigned i;
 
 	if (dev->data->tx_queues == NULL) { /* first time configuration */
@@ -841,17 +872,40 @@ rte_eth_dev_tx_queue_config(struct rte_eth_dev *dev, uint16_t nb_queues)
 			dev->data->nb_tx_queues = 0;
 			return -(ENOMEM);
 		}
+
+		dev->data->txq_bufs = rte_zmalloc("ethdev->txq_bufs",
+				sizeof(*dev->data->txq_bufs) * nb_queues, 0);
+		if (dev->data->txq_bufs == NULL) {
+			dev->data->nb_tx_queues = 0;
+			rte_free(dev->data->tx_queues);
+			return -(ENOMEM);
+		}
+
 	} else { /* re-configure */
+
+		/* flush the packets queued on all queues */
+		for (i = 0; i < old_nb_queues; i++)
+			rte_eth_tx_buffer_flush(dev->data->port_id, i);
+
 		RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_queue_release, -ENOTSUP);
 
+		/* get new buffer space first, but keep old space around */
+		new_bufs = rte_zmalloc("ethdev->txq_bufs",
+				sizeof(*dev->data->txq_bufs) * nb_queues, 0);
+		if (new_bufs == NULL)
+			return -(ENOMEM);
+
 		txq = dev->data->tx_queues;
 
 		for (i = nb_queues; i < old_nb_queues; i++)
 			(*dev->dev_ops->tx_queue_release)(txq[i]);
 		txq = rte_realloc(txq, sizeof(txq[0]) * nb_queues,
 				  RTE_CACHE_LINE_SIZE);
-		if (txq == NULL)
-			return -ENOMEM;
+		if (txq == NULL) {
+			rte_free(new_bufs);
+			return -(ENOMEM);
+		}
+
 		if (nb_queues > old_nb_queues) {
 			uint16_t new_qs = nb_queues - old_nb_queues;
 
@@ -861,6 +915,9 @@ rte_eth_dev_tx_queue_config(struct rte_eth_dev *dev, uint16_t nb_queues)
 
 		dev->data->tx_queues = txq;
 
+		/* now replace old buffers with new */
+		rte_free(dev->data->txq_bufs);
+		dev->data->txq_bufs = new_bufs;
 	}
 	dev->data->nb_tx_queues = nb_queues;
 	return 0;
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index bada8ad..23faa6a 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -182,6 +182,7 @@ extern "C" {
 #include <rte_pci.h>
 #include <rte_dev.h>
 #include <rte_devargs.h>
+#include <rte_branch_prediction.h>
 #include "rte_ether.h"
 #include "rte_eth_ctrl.h"
 #include "rte_dev_info.h"
@@ -1519,6 +1520,34 @@ enum rte_eth_dev_type {
 	RTE_ETH_DEV_MAX		/**< max value of this enum */
 };
 
+typedef void (*buffer_tx_error_fn)(struct rte_mbuf **unsent, uint16_t count,
+		void *userdata);
+
+/**
+ * @internal
+ * Structure used to buffer packets for future TX.
+ * Used by the rte_eth_tx_buffer and rte_eth_tx_buffer_flush APIs.
+ */
+struct rte_eth_dev_tx_buffer {
+	struct rte_mbuf *pkts[RTE_ETHDEV_TX_BUFSIZE];
+	unsigned nb_pkts;
+	uint64_t errors;
+	/**< Total number of packets dropped because they could not be sent. */
+};
+
+/**
+ * @internal
+ * Structure to hold a callback to be used on error when a tx_buffer_flush
+ * call fails to send all packets.
+ * This needs to be a separate structure, as it must go in the ethdev structure
+ * rather than ethdev_data, due to the use of a function pointer, which is not
+ * multi-process safe.
+ */
+struct rte_eth_dev_tx_buffer_err_cb {
+	buffer_tx_error_fn flush_cb; /* callback for when tx_burst fails */
+	void *userdata;              /* userdata for callback */
+};
+
 /**
  * @internal
  * The generic data structure associated with each ethernet device.
@@ -1550,6 +1579,9 @@ struct rte_eth_dev {
 	struct rte_eth_rxtx_callback *pre_tx_burst_cbs[RTE_MAX_QUEUES_PER_PORT];
 	uint8_t attached; /**< Flag indicating the port is attached */
 	enum rte_eth_dev_type dev_type; /**< Flag indicating the device type */
+
+	/** Callbacks to be used on a tx_buffer_flush error */
+	struct rte_eth_dev_tx_buffer_err_cb tx_buf_err_cb[RTE_MAX_QUEUES_PER_PORT];
 };
 
 struct rte_eth_dev_sriov {
@@ -1610,6 +1642,8 @@ struct rte_eth_dev_data {
 	enum rte_kernel_driver kdrv;    /**< Kernel driver passthrough */
 	int numa_node;  /**< NUMA node connection */
 	const char *drv_name;   /**< Driver name */
+	struct rte_eth_dev_tx_buffer *txq_bufs;
+	/**< space to allow buffered transmits */
 };
 
 /** Device supports hotplug detach */
@@ -2661,8 +2695,181 @@ rte_eth_tx_burst(uint8_t port_id, uint16_t queue_id,
 }
 
 /**
- * The eth device event type for interrupt, and maybe others in the future.
+ * Buffer a single packet for future transmission on a port and queue
+ *
+ * This function takes a single mbuf/packet and buffers it for later
+ * transmission on the particular port and queue specified. Once the buffer is
+ * full of packets, an attempt will be made to transmit all the buffered
+ * packets. In case of error, where not all packets can be transmitted, a
+ * callback is called with the unsent packets as a parameter. If no callback
+ * is explicitly set up, the unsent packets are just freed back to the owning
+ * mempool. The function returns the number of packets actually sent, i.e.
+ * 0 if no buffer flush occurred, otherwise the number of packets successfully
+ * flushed.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param queue_id
+ *   The index of the transmit queue through which output packets must be
+ *   sent.
+ *   The value must be in the range [0, nb_tx_queue - 1] previously supplied
+ *   to rte_eth_dev_configure().
+ * @param tx_pkt
+ *   Pointer to the packet mbuf to be sent.
+ * @return
+ *   0 = packet has been buffered for later transmission
+ *   N > 0 = packet has been buffered, and the buffer was subsequently flushed,
+ *     causing N packets to be sent, and the error callback to be called for
+ *     the rest.
+ */
+static inline uint16_t __attribute__((always_inline))
+rte_eth_tx_buffer(uint8_t port_id, uint16_t queue_id, struct rte_mbuf *tx_pkt)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	struct rte_eth_dev_tx_buffer *qbuf = &dev->data->txq_bufs[queue_id];
+	uint16_t i;
+
+	qbuf->pkts[qbuf->nb_pkts++] = tx_pkt;
+	if (qbuf->nb_pkts < RTE_ETHDEV_TX_BUFSIZE)
+		return 0;
+
+	const uint16_t sent = rte_eth_tx_burst(port_id, queue_id, qbuf->pkts,
+			RTE_ETHDEV_TX_BUFSIZE);
+
+	qbuf->nb_pkts = 0;
+
+	/* All packets sent, or to be dealt with by callback below */
+	if (unlikely(sent != RTE_ETHDEV_TX_BUFSIZE)) {
+		if (dev->tx_buf_err_cb[queue_id].flush_cb)
+			dev->tx_buf_err_cb[queue_id].flush_cb(&qbuf->pkts[sent],
+					RTE_ETHDEV_TX_BUFSIZE - sent,
+					dev->tx_buf_err_cb[queue_id].userdata);
+		else {
+			qbuf->errors += RTE_ETHDEV_TX_BUFSIZE - sent;
+			for (i = sent; i < RTE_ETHDEV_TX_BUFSIZE; i++)
+				rte_pktmbuf_free(qbuf->pkts[i]);
+		}
+	}
+
+	return sent;
+}
+
+/**
+ * Send any packets queued up for transmission on a port and HW queue
+ *
+ * This causes an explicit flush of packets previously buffered via the
+ * rte_eth_tx_buffer() function. It returns the number of packets successfully
+ * sent to the NIC, and calls the error callback for any unsent packets. Unless
+ * explicitly set up otherwise, the default callback simply frees the unsent
+ * packets back to the owning mempool.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param queue_id
+ *   The index of the transmit queue through which output packets must be
+ *   sent.
+ *   The value must be in the range [0, nb_tx_queue - 1] previously supplied
+ *   to rte_eth_dev_configure().
+ * @return
+ *   The number of packets successfully sent to the Ethernet device. The error
+ *   callback is called for any packets which could not be sent.
+ */
+static inline uint16_t
+rte_eth_tx_buffer_flush(uint8_t port_id, uint16_t queue_id)
+{
+	uint16_t i;
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	struct rte_eth_dev_tx_buffer *qbuf = &dev->data->txq_bufs[queue_id];
+
+	if (qbuf->nb_pkts == 0)
+		return 0;
+
+	const uint16_t to_send = qbuf->nb_pkts;
+
+	const uint16_t sent = rte_eth_tx_burst(port_id, queue_id, qbuf->pkts,
+			to_send);
+
+	qbuf->nb_pkts = 0;
+
+	/* All packets sent, or to be dealt with by callback below */
+	if (unlikely(sent != to_send)) {
+		if (dev->tx_buf_err_cb[queue_id].flush_cb)
+			dev->tx_buf_err_cb[queue_id].flush_cb(&qbuf->pkts[sent],
+					to_send - sent,
+					dev->tx_buf_err_cb[queue_id].userdata);
+		else {
+			qbuf->errors += to_send - sent;
+			for (i = sent; i < to_send; i++)
+				rte_pktmbuf_free(qbuf->pkts[i]);
+		}
+	}
+
+	return sent;
+}
+
+/**
+ * Configure a callback for buffered packets which cannot be sent
+ *
+ * Register a specific callback to be called when an attempt is made to send
+ * all packets buffered on an ethernet port, but not all packets can
+ * successfully be sent. The callback registered here will be called only
+ * from the rte_eth_tx_buffer() and rte_eth_tx_buffer_flush() APIs.
+ * The callback configured by default for each queue simply frees the
+ * packets back to the owning mempool. If additional behaviour is required,
+ * for example, to count dropped packets, or to retry transmission of packets
+ * which cannot be sent, this function should be used to register a suitable
+ * callback function to implement the desired behaviour.
+ * The example callback rte_eth_count_unsent_packet_callback() is also
+ * provided for reference.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param queue_id
+ *   The index of the transmit queue through which output packets must be
+ *   sent.
+ *   The value must be in the range [0, nb_tx_queue - 1] previously supplied
+ *   to rte_eth_dev_configure().
+ * @param cbfn
+ *   The function to be used as the callback.
+ * @param userdata
+ *   Arbitrary parameter to be passed to the callback function
+ * @return
+ *   0 on success, or -1 on error with rte_errno set appropriately
  */
+int
+rte_eth_tx_buffer_set_err_callback(uint8_t port_id, uint16_t queue_id,
+		buffer_tx_error_fn cbfn, void *userdata);
+
+/**
+ * Callback function for tracking unsent buffered packets.
+ *
+ * This function can be passed to rte_eth_tx_buffer_set_err_callback() to
+ * adjust the default behaviour when buffered packets cannot be sent. This
+ * function drops any unsent packets, but also updates a user-supplied counter
+ * to track the overall number of packets dropped. The counter should be an
+ * unsigned long variable.
+ *
+ * NOTE: this function should not be called directly; instead, it should
+ *       be used as a callback for packet buffering.
+ *
+ * NOTE: when configuring this function as a callback with
+ *       rte_eth_tx_buffer_set_err_callback(), the final userdata parameter
+ *       should point to an unsigned long value.
+ *
+ * @param pkts
+ *   The previously buffered packets which could not be sent
+ * @param unsent
+ *   The number of unsent packets in the pkts array
+ * @param userdata
+ *   Pointer to an unsigned long value, which will be incremented by unsent
+ */
+void
+rte_eth_count_unsent_packet_callback(struct rte_mbuf **pkts, uint16_t unsent,
+		void *userdata);
+
+/**
+ * The eth device event type for interrupt, and maybe others in the future.
+ */
 enum rte_eth_event_type {
 	RTE_ETH_EVENT_UNKNOWN,  /**< unknown event type */
 	RTE_ETH_EVENT_INTR_LSC, /**< lsc interrupt event */
diff --git a/lib/librte_ether/rte_ether_version.map b/lib/librte_ether/rte_ether_version.map
index d8db24d..c2019d6 100644
--- a/lib/librte_ether/rte_ether_version.map
+++ b/lib/librte_ether/rte_ether_version.map
@@ -117,3 +117,11 @@ DPDK_2.2 {
 
 	local: *;
 };
+
+DPDK_2.3 {
+	global:
+
+	rte_eth_count_unsent_packet_callback;
+	rte_eth_tx_buffer_set_err_callback;
+
+} DPDK_2.2;
-- 
1.7.9.5


* [dpdk-dev] [PATCH 2/2] examples: sample apps rework to use buffered tx api
  2016-01-15 14:43 [dpdk-dev] [PATCH 0/2] add support for buffered tx to ethdev Tomasz Kulasek
  2016-01-15 14:43 ` [dpdk-dev] [PATCH 1/2] ethdev: add buffered tx api Tomasz Kulasek
@ 2016-01-15 14:43 ` Tomasz Kulasek
  2016-01-15 18:12 ` [dpdk-dev] [PATCH 0/2] add support for buffered tx to ethdev Stephen Hemminger
  2016-02-24 17:08 ` [dpdk-dev] [PATCH v2 " Tomasz Kulasek
  3 siblings, 0 replies; 43+ messages in thread
From: Tomasz Kulasek @ 2016-01-15 14:43 UTC (permalink / raw)
  To: dev

The internal buffering of packets for TX in sample apps is no longer
needed, so this patch replaces this code with calls to the new
rte_eth_tx_buffer* APIs in:

* l2fwd-jobstats
* l2fwd-keepalive
* l2fwd
* l3fwd-acl
* l3fwd-power
* link_status_interrupt
* client_server_mp
* l2fwd_fork
* packet_ordering
* qos_meter

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
---
 examples/l2fwd-jobstats/main.c                     |   73 +++++------------
 examples/l2fwd-keepalive/main.c                    |   79 ++++---------------
 examples/l2fwd/main.c                              |   80 ++++---------------
 examples/l3fwd-acl/main.c                          |   64 ++-------------
 examples/l3fwd-power/main.c                        |   63 ++-------------
 examples/link_status_interrupt/main.c              |   83 ++++----------------
 .../client_server_mp/mp_client/client.c            |   77 ++++++++----------
 examples/multi_process/l2fwd_fork/main.c           |   81 ++++---------------
 examples/packet_ordering/main.c                    |   62 +++++++--------
 examples/qos_meter/main.c                          |   46 ++---------
 10 files changed, 166 insertions(+), 542 deletions(-)

diff --git a/examples/l2fwd-jobstats/main.c b/examples/l2fwd-jobstats/main.c
index 7b59f4e..9a6e6ea 100644
--- a/examples/l2fwd-jobstats/main.c
+++ b/examples/l2fwd-jobstats/main.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -99,8 +99,6 @@ static unsigned int l2fwd_rx_queue_per_lcore = 1;
 
 struct mbuf_table {
 	uint64_t next_flush_time;
-	unsigned len;
-	struct rte_mbuf *mbufs[MAX_PKT_BURST];
 };
 
 #define MAX_RX_QUEUE_PER_LCORE 16
@@ -373,58 +371,12 @@ show_stats_cb(__rte_unused void *param)
 	rte_eal_alarm_set(timer_period * US_PER_S, show_stats_cb, NULL);
 }
 
-/* Send the burst of packets on an output interface */
-static void
-l2fwd_send_burst(struct lcore_queue_conf *qconf, uint8_t port)
-{
-	struct mbuf_table *m_table;
-	uint16_t ret;
-	uint16_t queueid = 0;
-	uint16_t n;
-
-	m_table = &qconf->tx_mbufs[port];
-	n = m_table->len;
-
-	m_table->next_flush_time = rte_get_timer_cycles() + drain_tsc;
-	m_table->len = 0;
-
-	ret = rte_eth_tx_burst(port, queueid, m_table->mbufs, n);
-
-	port_statistics[port].tx += ret;
-	if (unlikely(ret < n)) {
-		port_statistics[port].dropped += (n - ret);
-		do {
-			rte_pktmbuf_free(m_table->mbufs[ret]);
-		} while (++ret < n);
-	}
-}
-
-/* Enqueue packets for TX and prepare them to be sent */
-static int
-l2fwd_send_packet(struct rte_mbuf *m, uint8_t port)
-{
-	const unsigned lcore_id = rte_lcore_id();
-	struct lcore_queue_conf *qconf = &lcore_queue_conf[lcore_id];
-	struct mbuf_table *m_table = &qconf->tx_mbufs[port];
-	uint16_t len = qconf->tx_mbufs[port].len;
-
-	m_table->mbufs[len] = m;
-
-	len++;
-	m_table->len = len;
-
-	/* Enough pkts to be sent. */
-	if (unlikely(len == MAX_PKT_BURST))
-		l2fwd_send_burst(qconf, port);
-
-	return 0;
-}
-
 static void
 l2fwd_simple_forward(struct rte_mbuf *m, unsigned portid)
 {
 	struct ether_hdr *eth;
 	void *tmp;
+	int sent;
 	unsigned dst_port;
 
 	dst_port = l2fwd_dst_ports[portid];
@@ -437,7 +389,9 @@ l2fwd_simple_forward(struct rte_mbuf *m, unsigned portid)
 	/* src addr */
 	ether_addr_copy(&l2fwd_ports_eth_addr[dst_port], &eth->s_addr);
 
-	l2fwd_send_packet(m, (uint8_t) dst_port);
+	sent = rte_eth_tx_buffer(dst_port, 0, m);
+	if (sent)
+		port_statistics[dst_port].tx += sent;
 }
 
 static void
@@ -513,6 +467,8 @@ l2fwd_flush_job(__rte_unused struct rte_timer *timer, __rte_unused void *arg)
 	struct lcore_queue_conf *qconf;
 	struct mbuf_table *m_table;
 	uint8_t portid;
+	unsigned i;
+	uint32_t sent;
 
 	lcore_id = rte_lcore_id();
 	qconf = &lcore_queue_conf[lcore_id];
@@ -522,12 +478,19 @@ l2fwd_flush_job(__rte_unused struct rte_timer *timer, __rte_unused void *arg)
 	now = rte_get_timer_cycles();
 	lcore_id = rte_lcore_id();
 	qconf = &lcore_queue_conf[lcore_id];
-	for (portid = 0; portid < RTE_MAX_ETHPORTS; portid++) {
-		m_table = &qconf->tx_mbufs[portid];
-		if (m_table->len == 0 || m_table->next_flush_time <= now)
+
+	for (i = 0; i < qconf->n_rx_port; i++) {
+		m_table = &qconf->tx_mbufs[i];
+
+		if (m_table->next_flush_time <= now)
 			continue;
+		m_table->next_flush_time = rte_get_timer_cycles() + drain_tsc;
 
-		l2fwd_send_burst(qconf, portid);
+		portid = qconf->rx_port_list[i];
+		portid = l2fwd_dst_ports[portid];
+		sent = rte_eth_tx_buffer_flush(portid, 0);
+		if (sent)
+			port_statistics[portid].tx += sent;
 	}
 
 
diff --git a/examples/l2fwd-keepalive/main.c b/examples/l2fwd-keepalive/main.c
index f4d52f2..b59ff6d 100644
--- a/examples/l2fwd-keepalive/main.c
+++ b/examples/l2fwd-keepalive/main.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -97,17 +97,11 @@ static uint32_t l2fwd_dst_ports[RTE_MAX_ETHPORTS];
 
 static unsigned int l2fwd_rx_queue_per_lcore = 1;
 
-struct mbuf_table {
-	unsigned len;
-	struct rte_mbuf *m_table[MAX_PKT_BURST];
-};
-
 #define MAX_RX_QUEUE_PER_LCORE 16
 #define MAX_TX_QUEUE_PER_PORT 16
 struct lcore_queue_conf {
 	unsigned n_rx_port;
 	unsigned rx_port_list[MAX_RX_QUEUE_PER_LCORE];
-	struct mbuf_table tx_mbufs[RTE_MAX_ETHPORTS];
 
 } __rte_cache_aligned;
 struct lcore_queue_conf lcore_queue_conf[RTE_MAX_LCORE];
@@ -132,7 +126,7 @@ struct rte_mempool *l2fwd_pktmbuf_pool = NULL;
 struct l2fwd_port_statistics {
 	uint64_t tx;
 	uint64_t rx;
-	uint64_t dropped;
+	unsigned long dropped;
 } __rte_cache_aligned;
 struct l2fwd_port_statistics port_statistics[RTE_MAX_ETHPORTS];
 
@@ -192,57 +186,12 @@ print_stats(__attribute__((unused)) struct rte_timer *ptr_timer,
 	printf("\n====================================================\n");
 }
 
-/* Send the burst of packets on an output interface */
-static int
-l2fwd_send_burst(struct lcore_queue_conf *qconf, unsigned n, uint8_t port)
-{
-	struct rte_mbuf **m_table;
-	unsigned ret;
-	unsigned queueid = 0;
-
-	m_table = (struct rte_mbuf **)qconf->tx_mbufs[port].m_table;
-
-	ret = rte_eth_tx_burst(port, (uint16_t) queueid, m_table, (uint16_t) n);
-	port_statistics[port].tx += ret;
-	if (unlikely(ret < n)) {
-		port_statistics[port].dropped += (n - ret);
-		do {
-			rte_pktmbuf_free(m_table[ret]);
-		} while (++ret < n);
-	}
-
-	return 0;
-}
-
-/* Enqueue packets for TX and prepare them to be sent */
-static int
-l2fwd_send_packet(struct rte_mbuf *m, uint8_t port)
-{
-	unsigned lcore_id, len;
-	struct lcore_queue_conf *qconf;
-
-	lcore_id = rte_lcore_id();
-
-	qconf = &lcore_queue_conf[lcore_id];
-	len = qconf->tx_mbufs[port].len;
-	qconf->tx_mbufs[port].m_table[len] = m;
-	len++;
-
-	/* enough pkts to be sent */
-	if (unlikely(len == MAX_PKT_BURST)) {
-		l2fwd_send_burst(qconf, MAX_PKT_BURST, port);
-		len = 0;
-	}
-
-	qconf->tx_mbufs[port].len = len;
-	return 0;
-}
-
 static void
 l2fwd_simple_forward(struct rte_mbuf *m, unsigned portid)
 {
 	struct ether_hdr *eth;
 	void *tmp;
+	int sent;
 	unsigned dst_port;
 
 	dst_port = l2fwd_dst_ports[portid];
@@ -255,7 +204,9 @@ l2fwd_simple_forward(struct rte_mbuf *m, unsigned portid)
 	/* src addr */
 	ether_addr_copy(&l2fwd_ports_eth_addr[dst_port], &eth->s_addr);
 
-	l2fwd_send_packet(m, (uint8_t) dst_port);
+	sent = rte_eth_tx_buffer(dst_port, 0, m);
+	if (sent)
+		port_statistics[dst_port].tx += sent;
 }
 
 /* main processing loop */
@@ -265,6 +216,7 @@ l2fwd_main_loop(void)
 	struct rte_mbuf *pkts_burst[MAX_PKT_BURST];
 	struct rte_mbuf *m;
 	unsigned lcore_id;
+	int sent;
 	uint64_t prev_tsc, diff_tsc, cur_tsc;
 	unsigned i, j, portid, nb_rx;
 	struct lcore_queue_conf *qconf;
@@ -312,13 +264,12 @@ l2fwd_main_loop(void)
 		diff_tsc = cur_tsc - prev_tsc;
 		if (unlikely(diff_tsc > drain_tsc)) {
 
-			for (portid = 0; portid < RTE_MAX_ETHPORTS; portid++) {
-				if (qconf->tx_mbufs[portid].len == 0)
-					continue;
-				l2fwd_send_burst(&lcore_queue_conf[lcore_id],
-						 qconf->tx_mbufs[portid].len,
-						 (uint8_t) portid);
-				qconf->tx_mbufs[portid].len = 0;
+			for (i = 0; i < qconf->n_rx_port; i++) {
+				portid = qconf->rx_port_list[i];
+				portid = l2fwd_dst_ports[portid];
+				sent = rte_eth_tx_buffer_flush(portid, 0);
+				if (sent)
+					port_statistics[portid].tx += sent;
 			}
 
 			prev_tsc = cur_tsc;
@@ -713,6 +664,10 @@ main(int argc, char **argv)
 				"rte_eth_tx_queue_setup:err=%d, port=%u\n",
 				ret, (unsigned) portid);
 
+		rte_eth_tx_buffer_set_err_callback(portid, 0,
+				rte_eth_count_unsent_packet_callback,
+				&port_statistics[portid].dropped);
+
 		/* Start device */
 		ret = rte_eth_dev_start(portid);
 		if (ret < 0)
diff --git a/examples/l2fwd/main.c b/examples/l2fwd/main.c
index 720fd5a..e6dce27 100644
--- a/examples/l2fwd/main.c
+++ b/examples/l2fwd/main.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -95,17 +95,11 @@ static uint32_t l2fwd_dst_ports[RTE_MAX_ETHPORTS];
 
 static unsigned int l2fwd_rx_queue_per_lcore = 1;
 
-struct mbuf_table {
-	unsigned len;
-	struct rte_mbuf *m_table[MAX_PKT_BURST];
-};
-
 #define MAX_RX_QUEUE_PER_LCORE 16
 #define MAX_TX_QUEUE_PER_PORT 16
 struct lcore_queue_conf {
 	unsigned n_rx_port;
 	unsigned rx_port_list[MAX_RX_QUEUE_PER_LCORE];
-	struct mbuf_table tx_mbufs[RTE_MAX_ETHPORTS];
 
 } __rte_cache_aligned;
 struct lcore_queue_conf lcore_queue_conf[RTE_MAX_LCORE];
@@ -130,7 +124,7 @@ struct rte_mempool * l2fwd_pktmbuf_pool = NULL;
 struct l2fwd_port_statistics {
 	uint64_t tx;
 	uint64_t rx;
-	uint64_t dropped;
+	unsigned long dropped;
 } __rte_cache_aligned;
 struct l2fwd_port_statistics port_statistics[RTE_MAX_ETHPORTS];
 
@@ -185,57 +179,12 @@ print_stats(void)
 	printf("\n====================================================\n");
 }
 
-/* Send the burst of packets on an output interface */
-static int
-l2fwd_send_burst(struct lcore_queue_conf *qconf, unsigned n, uint8_t port)
-{
-	struct rte_mbuf **m_table;
-	unsigned ret;
-	unsigned queueid =0;
-
-	m_table = (struct rte_mbuf **)qconf->tx_mbufs[port].m_table;
-
-	ret = rte_eth_tx_burst(port, (uint16_t) queueid, m_table, (uint16_t) n);
-	port_statistics[port].tx += ret;
-	if (unlikely(ret < n)) {
-		port_statistics[port].dropped += (n - ret);
-		do {
-			rte_pktmbuf_free(m_table[ret]);
-		} while (++ret < n);
-	}
-
-	return 0;
-}
-
-/* Enqueue packets for TX and prepare them to be sent */
-static int
-l2fwd_send_packet(struct rte_mbuf *m, uint8_t port)
-{
-	unsigned lcore_id, len;
-	struct lcore_queue_conf *qconf;
-
-	lcore_id = rte_lcore_id();
-
-	qconf = &lcore_queue_conf[lcore_id];
-	len = qconf->tx_mbufs[port].len;
-	qconf->tx_mbufs[port].m_table[len] = m;
-	len++;
-
-	/* enough pkts to be sent */
-	if (unlikely(len == MAX_PKT_BURST)) {
-		l2fwd_send_burst(qconf, MAX_PKT_BURST, port);
-		len = 0;
-	}
-
-	qconf->tx_mbufs[port].len = len;
-	return 0;
-}
-
 static void
 l2fwd_simple_forward(struct rte_mbuf *m, unsigned portid)
 {
 	struct ether_hdr *eth;
 	void *tmp;
+	int sent;
 	unsigned dst_port;
 
 	dst_port = l2fwd_dst_ports[portid];
@@ -248,7 +197,9 @@ l2fwd_simple_forward(struct rte_mbuf *m, unsigned portid)
 	/* src addr */
 	ether_addr_copy(&l2fwd_ports_eth_addr[dst_port], &eth->s_addr);
 
-	l2fwd_send_packet(m, (uint8_t) dst_port);
+	sent = rte_eth_tx_buffer(dst_port, 0, m);
+	if (sent)
+		port_statistics[dst_port].tx += sent;
 }
 
 /* main processing loop */
@@ -258,6 +209,7 @@ l2fwd_main_loop(void)
 	struct rte_mbuf *pkts_burst[MAX_PKT_BURST];
 	struct rte_mbuf *m;
 	unsigned lcore_id;
+	int sent;
 	uint64_t prev_tsc, diff_tsc, cur_tsc, timer_tsc;
 	unsigned i, j, portid, nb_rx;
 	struct lcore_queue_conf *qconf;
@@ -277,7 +229,6 @@ l2fwd_main_loop(void)
 	RTE_LOG(INFO, L2FWD, "entering main loop on lcore %u\n", lcore_id);
 
 	for (i = 0; i < qconf->n_rx_port; i++) {
-
 		portid = qconf->rx_port_list[i];
 		RTE_LOG(INFO, L2FWD, " -- lcoreid=%u portid=%u\n", lcore_id,
 			portid);
@@ -293,13 +244,12 @@ l2fwd_main_loop(void)
 		diff_tsc = cur_tsc - prev_tsc;
 		if (unlikely(diff_tsc > drain_tsc)) {
 
-			for (portid = 0; portid < RTE_MAX_ETHPORTS; portid++) {
-				if (qconf->tx_mbufs[portid].len == 0)
-					continue;
-				l2fwd_send_burst(&lcore_queue_conf[lcore_id],
-						 qconf->tx_mbufs[portid].len,
-						 (uint8_t) portid);
-				qconf->tx_mbufs[portid].len = 0;
+			for (i = 0; i < qconf->n_rx_port; i++) {
+				portid = qconf->rx_port_list[i];
+				portid = l2fwd_dst_ports[portid];
+				sent = rte_eth_tx_buffer_flush(portid, 0);
+				if (sent)
+					port_statistics[portid].tx += sent;
 			}
 
 			/* if timer is enabled */
@@ -666,6 +616,10 @@ main(int argc, char **argv)
 			rte_exit(EXIT_FAILURE, "rte_eth_tx_queue_setup:err=%d, port=%u\n",
 				ret, (unsigned) portid);
 
+		rte_eth_tx_buffer_set_err_callback(portid, 0,
+				rte_eth_count_unsent_packet_callback,
+				&port_statistics[portid].dropped);
+
 		/* Start device */
 		ret = rte_eth_dev_start(portid);
 		if (ret < 0)
diff --git a/examples/l3fwd-acl/main.c b/examples/l3fwd-acl/main.c
index f676d14..810cdac 100644
--- a/examples/l3fwd-acl/main.c
+++ b/examples/l3fwd-acl/main.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -119,11 +119,6 @@ static uint32_t enabled_port_mask;
 static int promiscuous_on; /**< Ports set in promiscuous mode off by default. */
 static int numa_on = 1; /**< NUMA is enabled by default. */
 
-struct mbuf_table {
-	uint16_t len;
-	struct rte_mbuf *m_table[MAX_PKT_BURST];
-};
-
 struct lcore_rx_queue {
 	uint8_t port_id;
 	uint8_t queue_id;
@@ -187,7 +182,7 @@ static struct rte_mempool *pktmbuf_pool[NB_SOCKETS];
 static inline int
 is_valid_ipv4_pkt(struct ipv4_hdr *pkt, uint32_t link_len);
 #endif
-static inline int
+static inline void
 send_single_packet(struct rte_mbuf *m, uint8_t port);
 
 #define MAX_ACL_RULE_NUM	100000
@@ -1292,55 +1287,17 @@ struct lcore_conf {
 	uint16_t n_rx_queue;
 	struct lcore_rx_queue rx_queue_list[MAX_RX_QUEUE_PER_LCORE];
 	uint16_t tx_queue_id[RTE_MAX_ETHPORTS];
-	struct mbuf_table tx_mbufs[RTE_MAX_ETHPORTS];
 } __rte_cache_aligned;
 
 static struct lcore_conf lcore_conf[RTE_MAX_LCORE];
 
-/* Send burst of packets on an output interface */
-static inline int
-send_burst(struct lcore_conf *qconf, uint16_t n, uint8_t port)
-{
-	struct rte_mbuf **m_table;
-	int ret;
-	uint16_t queueid;
-
-	queueid = qconf->tx_queue_id[port];
-	m_table = (struct rte_mbuf **)qconf->tx_mbufs[port].m_table;
-
-	ret = rte_eth_tx_burst(port, queueid, m_table, n);
-	if (unlikely(ret < n)) {
-		do {
-			rte_pktmbuf_free(m_table[ret]);
-		} while (++ret < n);
-	}
-
-	return 0;
-}
-
 /* Enqueue a single packet, and send burst if queue is filled */
-static inline int
+static inline void
 send_single_packet(struct rte_mbuf *m, uint8_t port)
 {
-	uint32_t lcore_id;
-	uint16_t len;
-	struct lcore_conf *qconf;
-
-	lcore_id = rte_lcore_id();
-
-	qconf = &lcore_conf[lcore_id];
-	len = qconf->tx_mbufs[port].len;
-	qconf->tx_mbufs[port].m_table[len] = m;
-	len++;
-
-	/* enough pkts to be sent */
-	if (unlikely(len == MAX_PKT_BURST)) {
-		send_burst(qconf, MAX_PKT_BURST, port);
-		len = 0;
-	}
+	uint16_t q = lcore_conf[rte_lcore_id()].tx_queue_id[port];
 
-	qconf->tx_mbufs[port].len = len;
-	return 0;
+	rte_eth_tx_buffer(port, q, m);
 }
 
 #ifdef DO_RFC_1812_CHECKS
@@ -1433,14 +1390,9 @@ main_loop(__attribute__((unused)) void *dummy)
 			 * This could be optimized (use queueid instead of
 			 * portid), but it is not called so often
 			 */
-			for (portid = 0; portid < RTE_MAX_ETHPORTS; portid++) {
-				if (qconf->tx_mbufs[portid].len == 0)
-					continue;
-				send_burst(&lcore_conf[lcore_id],
-					qconf->tx_mbufs[portid].len,
-					portid);
-				qconf->tx_mbufs[portid].len = 0;
-			}
+			for (portid = 0; portid < RTE_MAX_ETHPORTS; portid++)
+				rte_eth_tx_buffer_flush(portid,
+						qconf->tx_queue_id[portid]);
 
 			prev_tsc = cur_tsc;
 		}
diff --git a/examples/l3fwd-power/main.c b/examples/l3fwd-power/main.c
index 828c18a..6f32242 100644
--- a/examples/l3fwd-power/main.c
+++ b/examples/l3fwd-power/main.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -173,11 +173,6 @@ enum freq_scale_hint_t
 	FREQ_HIGHEST  =       2
 };
 
-struct mbuf_table {
-	uint16_t len;
-	struct rte_mbuf *m_table[MAX_PKT_BURST];
-};
-
 struct lcore_rx_queue {
 	uint8_t port_id;
 	uint8_t queue_id;
@@ -348,7 +343,6 @@ struct lcore_conf {
 	uint16_t n_rx_queue;
 	struct lcore_rx_queue rx_queue_list[MAX_RX_QUEUE_PER_LCORE];
 	uint16_t tx_queue_id[RTE_MAX_ETHPORTS];
-	struct mbuf_table tx_mbufs[RTE_MAX_ETHPORTS];
 	lookup_struct_t * ipv4_lookup_struct;
 	lookup_struct_t * ipv6_lookup_struct;
 } __rte_cache_aligned;
@@ -442,50 +436,12 @@ power_timer_cb(__attribute__((unused)) struct rte_timer *tim,
 	stats[lcore_id].sleep_time = 0;
 }
 
-/* Send burst of packets on an output interface */
-static inline int
-send_burst(struct lcore_conf *qconf, uint16_t n, uint8_t port)
-{
-	struct rte_mbuf **m_table;
-	int ret;
-	uint16_t queueid;
-
-	queueid = qconf->tx_queue_id[port];
-	m_table = (struct rte_mbuf **)qconf->tx_mbufs[port].m_table;
-
-	ret = rte_eth_tx_burst(port, queueid, m_table, n);
-	if (unlikely(ret < n)) {
-		do {
-			rte_pktmbuf_free(m_table[ret]);
-		} while (++ret < n);
-	}
-
-	return 0;
-}
-
-/* Enqueue a single packet, and send burst if queue is filled */
-static inline int
+static inline void
 send_single_packet(struct rte_mbuf *m, uint8_t port)
 {
-	uint32_t lcore_id;
-	uint16_t len;
-	struct lcore_conf *qconf;
-
-	lcore_id = rte_lcore_id();
+	uint16_t q = lcore_conf[rte_lcore_id()].tx_queue_id[port];
 
-	qconf = &lcore_conf[lcore_id];
-	len = qconf->tx_mbufs[port].len;
-	qconf->tx_mbufs[port].m_table[len] = m;
-	len++;
-
-	/* enough pkts to be sent */
-	if (unlikely(len == MAX_PKT_BURST)) {
-		send_burst(qconf, MAX_PKT_BURST, port);
-		len = 0;
-	}
-
-	qconf->tx_mbufs[port].len = len;
-	return 0;
+	rte_eth_tx_buffer(port, q, m);
 }
 
 #ifdef DO_RFC_1812_CHECKS
@@ -910,14 +866,9 @@ main_loop(__attribute__((unused)) void *dummy)
 			 * This could be optimized (use queueid instead of
 			 * portid), but it is not called so often
 			 */
-			for (portid = 0; portid < RTE_MAX_ETHPORTS; portid++) {
-				if (qconf->tx_mbufs[portid].len == 0)
-					continue;
-				send_burst(&lcore_conf[lcore_id],
-					qconf->tx_mbufs[portid].len,
-					portid);
-				qconf->tx_mbufs[portid].len = 0;
-			}
+			for (portid = 0; portid < RTE_MAX_ETHPORTS; portid++)
+				rte_eth_tx_buffer_flush(portid,
+						qconf->tx_queue_id[portid]);
 
 			prev_tsc = cur_tsc;
 		}
diff --git a/examples/link_status_interrupt/main.c b/examples/link_status_interrupt/main.c
index c57a08a..ec51cbe 100644
--- a/examples/link_status_interrupt/main.c
+++ b/examples/link_status_interrupt/main.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -96,19 +96,12 @@ static unsigned int lsi_rx_queue_per_lcore = 1;
 /* destination port for L2 forwarding */
 static unsigned lsi_dst_ports[RTE_MAX_ETHPORTS] = {0};
 
-#define MAX_PKT_BURST 32
-struct mbuf_table {
-	unsigned len;
-	struct rte_mbuf *m_table[MAX_PKT_BURST];
-};
-
 #define MAX_RX_QUEUE_PER_LCORE 16
 #define MAX_TX_QUEUE_PER_PORT 16
 struct lcore_queue_conf {
 	unsigned n_rx_port;
 	unsigned rx_port_list[MAX_RX_QUEUE_PER_LCORE];
 	unsigned tx_queue_id;
-	struct mbuf_table tx_mbufs[RTE_MAX_ETHPORTS];
 
 } __rte_cache_aligned;
 struct lcore_queue_conf lcore_queue_conf[RTE_MAX_LCORE];
@@ -136,7 +129,7 @@ struct rte_mempool * lsi_pktmbuf_pool = NULL;
 struct lsi_port_statistics {
 	uint64_t tx;
 	uint64_t rx;
-	uint64_t dropped;
+	unsigned long dropped;
 } __rte_cache_aligned;
 struct lsi_port_statistics port_statistics[RTE_MAX_ETHPORTS];
 
@@ -202,58 +195,12 @@ print_stats(void)
 	printf("\n====================================================\n");
 }
 
-/* Send the packet on an output interface */
-static int
-lsi_send_burst(struct lcore_queue_conf *qconf, unsigned n, uint8_t port)
-{
-	struct rte_mbuf **m_table;
-	unsigned ret;
-	unsigned queueid;
-
-	queueid = (uint16_t) qconf->tx_queue_id;
-	m_table = (struct rte_mbuf **)qconf->tx_mbufs[port].m_table;
-
-	ret = rte_eth_tx_burst(port, (uint16_t) queueid, m_table, (uint16_t) n);
-	port_statistics[port].tx += ret;
-	if (unlikely(ret < n)) {
-		port_statistics[port].dropped += (n - ret);
-		do {
-			rte_pktmbuf_free(m_table[ret]);
-		} while (++ret < n);
-	}
-
-	return 0;
-}
-
-/* Send the packet on an output interface */
-static int
-lsi_send_packet(struct rte_mbuf *m, uint8_t port)
-{
-	unsigned lcore_id, len;
-	struct lcore_queue_conf *qconf;
-
-	lcore_id = rte_lcore_id();
-
-	qconf = &lcore_queue_conf[lcore_id];
-	len = qconf->tx_mbufs[port].len;
-	qconf->tx_mbufs[port].m_table[len] = m;
-	len++;
-
-	/* enough pkts to be sent */
-	if (unlikely(len == MAX_PKT_BURST)) {
-		lsi_send_burst(qconf, MAX_PKT_BURST, port);
-		len = 0;
-	}
-
-	qconf->tx_mbufs[port].len = len;
-	return 0;
-}
-
 static void
 lsi_simple_forward(struct rte_mbuf *m, unsigned portid)
 {
 	struct ether_hdr *eth;
 	void *tmp;
+	unsigned sent;
 	unsigned dst_port = lsi_dst_ports[portid];
 
 	eth = rte_pktmbuf_mtod(m, struct ether_hdr *);
@@ -265,7 +212,9 @@ lsi_simple_forward(struct rte_mbuf *m, unsigned portid)
 	/* src addr */
 	ether_addr_copy(&lsi_ports_eth_addr[dst_port], &eth->s_addr);
 
-	lsi_send_packet(m, (uint8_t) dst_port);
+	sent = rte_eth_tx_buffer(dst_port, 0, m);
+	if (sent)
+		port_statistics[dst_port].tx += sent;
 }
 
 /* main processing loop */
@@ -275,6 +224,7 @@ lsi_main_loop(void)
 	struct rte_mbuf *pkts_burst[MAX_PKT_BURST];
 	struct rte_mbuf *m;
 	unsigned lcore_id;
+	unsigned sent;
 	uint64_t prev_tsc, diff_tsc, cur_tsc, timer_tsc;
 	unsigned i, j, portid, nb_rx;
 	struct lcore_queue_conf *qconf;
@@ -310,15 +260,12 @@ lsi_main_loop(void)
 		diff_tsc = cur_tsc - prev_tsc;
 		if (unlikely(diff_tsc > drain_tsc)) {
 
-			/* this could be optimized (use queueid instead of
-			 * portid), but it is not called so often */
-			for (portid = 0; portid < RTE_MAX_ETHPORTS; portid++) {
-				if (qconf->tx_mbufs[portid].len == 0)
-					continue;
-				lsi_send_burst(&lcore_queue_conf[lcore_id],
-						 qconf->tx_mbufs[portid].len,
-						 (uint8_t) portid);
-				qconf->tx_mbufs[portid].len = 0;
+			for (i = 0; i < qconf->n_rx_port; i++) {
+				portid = qconf->rx_port_list[i];
+				portid = lsi_dst_ports[portid];
+				sent = rte_eth_tx_buffer_flush(portid, 0);
+				if (sent)
+					port_statistics[portid].tx += sent;
 			}
 
 			/* if timer is enabled */
@@ -700,6 +647,10 @@ main(int argc, char **argv)
 		rte_eth_dev_callback_register(portid,
 			RTE_ETH_EVENT_INTR_LSC, lsi_event_callback, NULL);
 
+		rte_eth_tx_buffer_set_err_callback(portid, 0,
+				rte_eth_count_unsent_packet_callback,
+				&port_statistics[portid].dropped);
+
 		rte_eth_macaddr_get(portid,
 				    &lsi_ports_eth_addr[portid]);
 
diff --git a/examples/multi_process/client_server_mp/mp_client/client.c b/examples/multi_process/client_server_mp/mp_client/client.c
index bf049a4..2321550 100644
--- a/examples/multi_process/client_server_mp/mp_client/client.c
+++ b/examples/multi_process/client_server_mp/mp_client/client.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -72,22 +72,12 @@
  * queue to write to. */
 static uint8_t client_id = 0;
 
-struct mbuf_queue {
-#define MBQ_CAPACITY 32
-	struct rte_mbuf *bufs[MBQ_CAPACITY];
-	uint16_t top;
-};
-
 /* maps input ports to output ports for packets */
 static uint8_t output_ports[RTE_MAX_ETHPORTS];
 
-/* buffers up a set of packet that are ready to send */
-static struct mbuf_queue output_bufs[RTE_MAX_ETHPORTS];
-
 /* shared data from server. We update statistics here */
 static volatile struct tx_stats *tx_stats;
 
-
 /*
  * print a usage message
  */
@@ -149,6 +139,23 @@ parse_app_args(int argc, char *argv[])
 }
 
 /*
+ * Tx buffer error callback
+ */
+static void
+flush_tx_error_callback(struct rte_mbuf **unsent, uint16_t count,
+		void *userdata) {
+	int i;
+	uint8_t port = (uintptr_t)userdata;
+
+	tx_stats->tx_drop[port] += count;
+
+	/* free the mbufs that failed to transmit */
+	for (i = 0; i < count; i++)
+		rte_pktmbuf_free(unsent[i]);
+
+}
+
+/*
  * set up output ports so that all traffic on port gets sent out
  * its paired port. Index using actual port numbers since that is
  * what comes in the mbuf structure.
@@ -164,41 +171,14 @@ static void configure_output_ports(const struct port_info *ports)
 		uint8_t p2 = ports->id[i+1];
 		output_ports[p1] = p2;
 		output_ports[p2] = p1;
-	}
-}
-
 
-static inline void
-send_packets(uint8_t port)
-{
-	uint16_t i, sent;
-	struct mbuf_queue *mbq = &output_bufs[port];
+		rte_eth_tx_buffer_set_err_callback(p1, client_id,
+				flush_tx_error_callback, (void *)(intptr_t)p1);
 
-	if (unlikely(mbq->top == 0))
-		return;
+		rte_eth_tx_buffer_set_err_callback(p2, client_id,
+				flush_tx_error_callback, (void *)(intptr_t)p2);
 
-	sent = rte_eth_tx_burst(port, client_id, mbq->bufs, mbq->top);
-	if (unlikely(sent < mbq->top)){
-		for (i = sent; i < mbq->top; i++)
-			rte_pktmbuf_free(mbq->bufs[i]);
-		tx_stats->tx_drop[port] += (mbq->top - sent);
 	}
-	tx_stats->tx[port] += sent;
-	mbq->top = 0;
-}
-
-/*
- * Enqueue a packet to be sent on a particular port, but
- * don't send it yet. Only when the buffer is full.
- */
-static inline void
-enqueue_packet(struct rte_mbuf *buf, uint8_t port)
-{
-	struct mbuf_queue *mbq = &output_bufs[port];
-	mbq->bufs[mbq->top++] = buf;
-
-	if (mbq->top == MBQ_CAPACITY)
-		send_packets(port);
 }
 
 /*
@@ -209,10 +189,13 @@ enqueue_packet(struct rte_mbuf *buf, uint8_t port)
 static void
 handle_packet(struct rte_mbuf *buf)
 {
+	unsigned sent;
 	const uint8_t in_port = buf->port;
 	const uint8_t out_port = output_ports[in_port];
 
-	enqueue_packet(buf, out_port);
+	sent = rte_eth_tx_buffer(out_port, client_id, buf);
+	if (unlikely(sent))
+		tx_stats->tx[out_port] += sent;
 }
 
 /*
@@ -229,6 +212,7 @@ main(int argc, char *argv[])
 	int need_flush = 0; /* indicates whether we have unsent packets */
 	int retval;
 	void *pkts[PKT_READ_SIZE];
+	uint16_t sent;
 
 	if ((retval = rte_eal_init(argc, argv)) < 0)
 		return -1;
@@ -274,8 +258,11 @@ main(int argc, char *argv[])
 
 		if (unlikely(rx_pkts == 0)){
 			if (need_flush)
-				for (port = 0; port < ports->num_ports; port++)
-					send_packets(ports->id[port]);
+				for (port = 0; port < ports->num_ports; port++) {
+					sent = rte_eth_tx_buffer_flush(ports->id[port], client_id);
+					if (unlikely(sent))
+						tx_stats->tx[port] += sent;
+				}
 			need_flush = 0;
 			continue;
 		}
diff --git a/examples/multi_process/l2fwd_fork/main.c b/examples/multi_process/l2fwd_fork/main.c
index f2d7eab..f919e07 100644
--- a/examples/multi_process/l2fwd_fork/main.c
+++ b/examples/multi_process/l2fwd_fork/main.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -117,18 +117,11 @@ static uint32_t l2fwd_dst_ports[RTE_MAX_ETHPORTS];
 
 static unsigned int l2fwd_rx_queue_per_lcore = 1;
 
-struct mbuf_table {
-	unsigned len;
-	struct rte_mbuf *m_table[MAX_PKT_BURST];
-};
-
 #define MAX_RX_QUEUE_PER_LCORE 16
 #define MAX_TX_QUEUE_PER_PORT 16
 struct lcore_queue_conf {
 	unsigned n_rx_port;
 	unsigned rx_port_list[MAX_RX_QUEUE_PER_LCORE];
-	struct mbuf_table tx_mbufs[RTE_MAX_ETHPORTS];
-
 } __rte_cache_aligned;
 struct lcore_queue_conf lcore_queue_conf[RTE_MAX_LCORE];
 
@@ -176,7 +169,7 @@ static struct rte_mempool * l2fwd_pktmbuf_pool[RTE_MAX_ETHPORTS];
 struct l2fwd_port_statistics {
 	uint64_t tx;
 	uint64_t rx;
-	uint64_t dropped;
+	unsigned long dropped;
 } __rte_cache_aligned;
 struct l2fwd_port_statistics *port_statistics;
 /**
@@ -583,57 +576,12 @@ slave_exit_cb(unsigned slaveid, __attribute__((unused))int stat)
 	rte_spinlock_unlock(&res_lock);
 }
 
-/* Send the packet on an output interface */
-static int
-l2fwd_send_burst(struct lcore_queue_conf *qconf, unsigned n, uint8_t port)
-{
-	struct rte_mbuf **m_table;
-	unsigned ret;
-	unsigned queueid =0;
-
-	m_table = (struct rte_mbuf **)qconf->tx_mbufs[port].m_table;
-
-	ret = rte_eth_tx_burst(port, (uint16_t) queueid, m_table, (uint16_t) n);
-	port_statistics[port].tx += ret;
-	if (unlikely(ret < n)) {
-		port_statistics[port].dropped += (n - ret);
-		do {
-			rte_pktmbuf_free(m_table[ret]);
-		} while (++ret < n);
-	}
-
-	return 0;
-}
-
-/* Send the packet on an output interface */
-static int
-l2fwd_send_packet(struct rte_mbuf *m, uint8_t port)
-{
-	unsigned lcore_id, len;
-	struct lcore_queue_conf *qconf;
-
-	lcore_id = rte_lcore_id();
-
-	qconf = &lcore_queue_conf[lcore_id];
-	len = qconf->tx_mbufs[port].len;
-	qconf->tx_mbufs[port].m_table[len] = m;
-	len++;
-
-	/* enough pkts to be sent */
-	if (unlikely(len == MAX_PKT_BURST)) {
-		l2fwd_send_burst(qconf, MAX_PKT_BURST, port);
-		len = 0;
-	}
-
-	qconf->tx_mbufs[port].len = len;
-	return 0;
-}
-
 static void
 l2fwd_simple_forward(struct rte_mbuf *m, unsigned portid)
 {
 	struct ether_hdr *eth;
 	void *tmp;
+	unsigned sent;
 	unsigned dst_port;
 
 	dst_port = l2fwd_dst_ports[portid];
@@ -646,7 +594,9 @@ l2fwd_simple_forward(struct rte_mbuf *m, unsigned portid)
 	/* src addr */
 	ether_addr_copy(&l2fwd_ports_eth_addr[dst_port], &eth->s_addr);
 
-	l2fwd_send_packet(m, (uint8_t) dst_port);
+	sent = rte_eth_tx_buffer(dst_port, 0, m);
+	if (sent)
+		port_statistics[dst_port].tx += sent;
 }
 
 /* main processing loop */
@@ -656,6 +606,7 @@ l2fwd_main_loop(void)
 	struct rte_mbuf *pkts_burst[MAX_PKT_BURST];
 	struct rte_mbuf *m;
 	unsigned lcore_id;
+	unsigned sent;
 	uint64_t prev_tsc, diff_tsc, cur_tsc;
 	unsigned i, j, portid, nb_rx;
 	struct lcore_queue_conf *qconf;
@@ -698,14 +649,12 @@ l2fwd_main_loop(void)
 		 */
 		diff_tsc = cur_tsc - prev_tsc;
 		if (unlikely(diff_tsc > drain_tsc)) {
-
-			for (portid = 0; portid < RTE_MAX_ETHPORTS; portid++) {
-				if (qconf->tx_mbufs[portid].len == 0)
-					continue;
-				l2fwd_send_burst(&lcore_queue_conf[lcore_id],
-						 qconf->tx_mbufs[portid].len,
-						 (uint8_t) portid);
-				qconf->tx_mbufs[portid].len = 0;
+			for (i = 0; i < qconf->n_rx_port; i++) {
+				portid = qconf->rx_port_list[i];
+				portid = l2fwd_dst_ports[portid];
+				sent = rte_eth_tx_buffer_flush(portid, 0);
+				if (sent)
+					port_statistics[portid].tx += sent;
 			}
 		}
 
@@ -1144,6 +1093,10 @@ main(int argc, char **argv)
 			rte_exit(EXIT_FAILURE, "rte_eth_tx_queue_setup:err=%d, port=%u\n",
 				ret, (unsigned) portid);
 
+		rte_eth_tx_buffer_set_err_callback(portid, 0,
+				rte_eth_count_unsent_packet_callback,
+				&port_statistics[portid].dropped);
+
 		/* Start device */
 		ret = rte_eth_dev_start(portid);
 		if (ret < 0)
diff --git a/examples/packet_ordering/main.c b/examples/packet_ordering/main.c
index 1d9a86f..a11d68e 100644
--- a/examples/packet_ordering/main.c
+++ b/examples/packet_ordering/main.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -86,11 +86,6 @@ struct send_thread_args {
 	struct rte_reorder_buffer *buffer;
 };
 
-struct output_buffer {
-	unsigned count;
-	struct rte_mbuf *mbufs[MAX_PKTS_BURST];
-};
-
 volatile struct app_stats {
 	struct {
 		uint64_t rx_pkts;
@@ -235,6 +230,20 @@ parse_args(int argc, char **argv)
 	return 0;
 }
 
+/*
+ * Tx buffer error callback
+ */
+static void
+flush_tx_error_callback(struct rte_mbuf **unsent, uint16_t count,
+		void *userdata __rte_unused) {
+
+	/* free the mbufs that failed to transmit */
+	app_stats.tx.ro_tx_failed_pkts += count;
+	LOG_DEBUG(REORDERAPP, "%s:Packet loss with tx_burst\n", __func__);
+	pktmbuf_free_bulk(unsent, count);
+
+}
+
 static inline int
 configure_eth_port(uint8_t port_id)
 {
@@ -266,6 +275,9 @@ configure_eth_port(uint8_t port_id)
 			return ret;
 	}
 
+	rte_eth_tx_buffer_set_err_callback(port_id, 0, flush_tx_error_callback,
+			NULL);
+
 	ret = rte_eth_dev_start(port_id);
 	if (ret < 0)
 		return ret;
@@ -438,22 +450,6 @@ worker_thread(void *args_ptr)
 	return 0;
 }
 
-static inline void
-flush_one_port(struct output_buffer *outbuf, uint8_t outp)
-{
-	unsigned nb_tx = rte_eth_tx_burst(outp, 0, outbuf->mbufs,
-			outbuf->count);
-	app_stats.tx.ro_tx_pkts += nb_tx;
-
-	if (unlikely(nb_tx < outbuf->count)) {
-		/* free the mbufs which failed from transmit */
-		app_stats.tx.ro_tx_failed_pkts += (outbuf->count - nb_tx);
-		LOG_DEBUG(REORDERAPP, "%s:Packet loss with tx_burst\n", __func__);
-		pktmbuf_free_bulk(&outbuf->mbufs[nb_tx], outbuf->count - nb_tx);
-	}
-	outbuf->count = 0;
-}
-
 /**
  * Dequeue mbufs from the workers_to_tx ring and reorder them before
  * transmitting.
@@ -464,8 +460,8 @@ send_thread(struct send_thread_args *args)
 	int ret;
 	unsigned int i, dret;
 	uint16_t nb_dq_mbufs;
+	uint16_t sent;
 	uint8_t outp;
-	static struct output_buffer tx_buffers[RTE_MAX_ETHPORTS];
 	struct rte_mbuf *mbufs[MAX_PKTS_BURST];
 	struct rte_mbuf *rombufs[MAX_PKTS_BURST] = {NULL};
 
@@ -515,7 +511,6 @@ send_thread(struct send_thread_args *args)
 		dret = rte_reorder_drain(args->buffer, rombufs, MAX_PKTS_BURST);
 		for (i = 0; i < dret; i++) {
 
-			struct output_buffer *outbuf;
 			uint8_t outp1;
 
 			outp1 = rombufs[i]->port;
@@ -525,10 +520,10 @@ send_thread(struct send_thread_args *args)
 				continue;
 			}
 
-			outbuf = &tx_buffers[outp1];
-			outbuf->mbufs[outbuf->count++] = rombufs[i];
-			if (outbuf->count == MAX_PKTS_BURST)
-				flush_one_port(outbuf, outp1);
+			sent = rte_eth_tx_buffer(outp1, 0, rombufs[i]);
+			if (sent)
+				app_stats.tx.ro_tx_pkts += sent;
+
 		}
 	}
 	return 0;
@@ -541,10 +536,9 @@ static int
 tx_thread(struct rte_ring *ring_in)
 {
 	uint32_t i, dqnum;
+	uint16_t sent;
 	uint8_t outp;
-	static struct output_buffer tx_buffers[RTE_MAX_ETHPORTS];
 	struct rte_mbuf *mbufs[MAX_PKTS_BURST];
-	struct output_buffer *outbuf;
 
 	RTE_LOG(INFO, REORDERAPP, "%s() started on lcore %u\n", __func__,
 							rte_lcore_id());
@@ -567,10 +561,10 @@ tx_thread(struct rte_ring *ring_in)
 				continue;
 			}
 
-			outbuf = &tx_buffers[outp];
-			outbuf->mbufs[outbuf->count++] = mbufs[i];
-			if (outbuf->count == MAX_PKTS_BURST)
-				flush_one_port(outbuf, outp);
+			sent = rte_eth_tx_buffer(outp, 0, mbufs[i]);
+			if (sent)
+				app_stats.tx.ro_tx_pkts += sent;
+
 		}
 	}
 
diff --git a/examples/qos_meter/main.c b/examples/qos_meter/main.c
index 0de5e7f..7d901d2 100644
--- a/examples/qos_meter/main.c
+++ b/examples/qos_meter/main.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -118,8 +118,6 @@ static struct rte_eth_conf port_conf = {
 static uint8_t port_rx;
 static uint8_t port_tx;
 static struct rte_mbuf *pkts_rx[PKT_RX_BURST_MAX];
-static struct rte_mbuf *pkts_tx[PKT_TX_BURST_MAX];
-static uint16_t pkts_tx_len = 0;
 
 
 struct rte_meter_srtcm_params app_srtcm_params[] = {
@@ -188,26 +186,9 @@ main_loop(__attribute__((unused)) void *dummy)
 		current_time = rte_rdtsc();
 		time_diff = current_time - last_time;
 		if (unlikely(time_diff > TIME_TX_DRAIN)) {
-			int ret;
 
-			if (pkts_tx_len == 0) {
-				last_time = current_time;
-
-				continue;
-			}
-
-			/* Write packet burst to NIC TX */
-			ret = rte_eth_tx_burst(port_tx, NIC_TX_QUEUE, pkts_tx, pkts_tx_len);
-
-			/* Free buffers for any packets not written successfully */
-			if (unlikely(ret < pkts_tx_len)) {
-				for ( ; ret < pkts_tx_len; ret ++) {
-					rte_pktmbuf_free(pkts_tx[ret]);
-				}
-			}
-
-			/* Empty the output buffer */
-			pkts_tx_len = 0;
+			/* Flush tx buffer */
+			rte_eth_tx_buffer_flush(port_tx, NIC_TX_QUEUE);
 
 			last_time = current_time;
 		}
@@ -222,26 +203,9 @@ main_loop(__attribute__((unused)) void *dummy)
 			/* Handle current packet */
 			if (app_pkt_handle(pkt, current_time) == DROP)
 				rte_pktmbuf_free(pkt);
-			else {
-				pkts_tx[pkts_tx_len] = pkt;
-				pkts_tx_len ++;
-			}
-
-			/* Write packets from output buffer to NIC TX when full burst is available */
-			if (unlikely(pkts_tx_len == PKT_TX_BURST_MAX)) {
-				/* Write packet burst to NIC TX */
-				int ret = rte_eth_tx_burst(port_tx, NIC_TX_QUEUE, pkts_tx, PKT_TX_BURST_MAX);
+			else
+				rte_eth_tx_buffer(port_tx, NIC_TX_QUEUE, pkt);
 
-				/* Free buffers for any packets not written successfully */
-				if (unlikely(ret < PKT_TX_BURST_MAX)) {
-					for ( ; ret < PKT_TX_BURST_MAX; ret ++) {
-						rte_pktmbuf_free(pkts_tx[ret]);
-					}
-				}
-
-				/* Empty the output buffer */
-				pkts_tx_len = 0;
-			}
 		}
 	}
 }
-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [dpdk-dev] [PATCH 0/2] add support for buffered tx to ethdev
  2016-01-15 14:43 [dpdk-dev] [PATCH 0/2] add support for buffered tx to ethdev Tomasz Kulasek
  2016-01-15 14:43 ` [dpdk-dev] [PATCH 1/2] ethdev: add buffered tx api Tomasz Kulasek
  2016-01-15 14:43 ` [dpdk-dev] [PATCH 2/2] examples: sample apps rework to use " Tomasz Kulasek
@ 2016-01-15 18:12 ` Stephen Hemminger
  2016-02-24 17:08 ` [dpdk-dev] [PATCH v2 " Tomasz Kulasek
  3 siblings, 0 replies; 43+ messages in thread
From: Stephen Hemminger @ 2016-01-15 18:12 UTC (permalink / raw)
  To: dev

On Fri, 15 Jan 2016 15:43:56 +0100
Tomasz Kulasek <tomaszx.kulasek@intel.com> wrote:

> Many sample apps include internal buffering for single-packet-at-a-time
> operation. Since this is such a common paradigm, this functionality is
> better suited to being inside the core ethdev API.
> The new APIs in the ethdev library are:
> * rte_eth_tx_buffer - buffer up a single packet for future transmission
> * rte_eth_tx_buffer_flush - flush any unsent buffered packets
> * rte_eth_tx_buffer_set_err_callback - set up a callback to be called in
>   case transmitting a buffered burst fails. By default, we just free the
>   unsent packets.
> 
> As well as these, an additional reference callback is provided, which
> frees the packets (as the default callback does), as well as updating a
> user-provided counter, so that the number of dropped packets can be
> tracked.
> 
> The internal buffering of packets for TX in sample apps is no longer
> needed, so this patchset also replaces this code with calls to the new
> rte_eth_tx_buffer* APIs in:

The pipeline code also has its own implementation of this.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [dpdk-dev] [PATCH 1/2] ethdev: add buffered tx api
  2016-01-15 14:43 ` [dpdk-dev] [PATCH 1/2] ethdev: add buffered tx api Tomasz Kulasek
@ 2016-01-15 18:13   ` Stephen Hemminger
  2016-01-15 18:14   ` Stephen Hemminger
  2016-01-15 18:44   ` Ananyev, Konstantin
  2 siblings, 0 replies; 43+ messages in thread
From: Stephen Hemminger @ 2016-01-15 18:13 UTC (permalink / raw)
  To: Tomasz Kulasek; +Cc: dev

On Fri, 15 Jan 2016 15:43:57 +0100
Tomasz Kulasek <tomaszx.kulasek@intel.com> wrote:

>  static int
>  rte_eth_dev_tx_queue_config(struct rte_eth_dev *dev, uint16_t nb_queues)
>  {
>  	uint16_t old_nb_queues = dev->data->nb_tx_queues;
>  	void **txq;
> +	struct rte_eth_dev_tx_buffer *new_bufs;
>  	unsigned i;
>  
>  	if (dev->data->tx_queues == NULL) { /* first time configuration */
> @@ -841,17 +872,40 @@ rte_eth_dev_tx_queue_config(struct rte_eth_dev *dev, uint16_t nb_queues)
>  			dev->data->nb_tx_queues = 0;
>  			return -(ENOMEM);
>  		}
> +
> +		dev->data->txq_bufs = rte_zmalloc("ethdev->txq_bufs",
> +				sizeof(*dev->data->txq_bufs) * nb_queues, 0);

Shouldn't you use rte_zmalloc_socket() and put the buffering on the same
NUMA node as the device?
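
For example, something like this (rough, untested sketch - reusing the
numa_node field that rte_eth_dev_data already carries):

	/* sketch: place the per-queue tx buffers on the device's NUMA node */
	dev->data->txq_bufs = rte_zmalloc_socket("ethdev->txq_bufs",
			sizeof(*dev->data->txq_bufs) * nb_queues, 0,
			dev->data->numa_node);
	if (dev->data->txq_bufs == NULL) {
		dev->data->nb_tx_queues = 0;
		rte_free(dev->data->tx_queues);
		return -ENOMEM;
	}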

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [dpdk-dev] [PATCH 1/2] ethdev: add buffered tx api
  2016-01-15 14:43 ` [dpdk-dev] [PATCH 1/2] ethdev: add buffered tx api Tomasz Kulasek
  2016-01-15 18:13   ` Stephen Hemminger
@ 2016-01-15 18:14   ` Stephen Hemminger
  2016-01-15 18:44   ` Ananyev, Konstantin
  2 siblings, 0 replies; 43+ messages in thread
From: Stephen Hemminger @ 2016-01-15 18:14 UTC (permalink / raw)
  To: Tomasz Kulasek; +Cc: dev

On Fri, 15 Jan 2016 15:43:57 +0100
Tomasz Kulasek <tomaszx.kulasek@intel.com> wrote:

> +			return -(ENOMEM);

Please don't put () around args to return; it is a BSD stylism

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [dpdk-dev] [PATCH 1/2] ethdev: add buffered tx api
  2016-01-15 14:43 ` [dpdk-dev] [PATCH 1/2] ethdev: add buffered tx api Tomasz Kulasek
  2016-01-15 18:13   ` Stephen Hemminger
  2016-01-15 18:14   ` Stephen Hemminger
@ 2016-01-15 18:44   ` Ananyev, Konstantin
  2016-02-02 10:00     ` Kulasek, TomaszX
  2 siblings, 1 reply; 43+ messages in thread
From: Ananyev, Konstantin @ 2016-01-15 18:44 UTC (permalink / raw)
  To: Kulasek, TomaszX, dev

Hi Tomasz,

>  static int
>  rte_eth_dev_tx_queue_config(struct rte_eth_dev *dev, uint16_t nb_queues)
>  {
>  	uint16_t old_nb_queues = dev->data->nb_tx_queues;
>  	void **txq;
> +	struct rte_eth_dev_tx_buffer *new_bufs;
>  	unsigned i;
> 
>  	if (dev->data->tx_queues == NULL) { /* first time configuration */
> @@ -841,17 +872,40 @@ rte_eth_dev_tx_queue_config(struct rte_eth_dev *dev, uint16_t nb_queues)
>  			dev->data->nb_tx_queues = 0;
>  			return -(ENOMEM);
>  		}
> +
> +		dev->data->txq_bufs = rte_zmalloc("ethdev->txq_bufs",
> +				sizeof(*dev->data->txq_bufs) * nb_queues, 0);
> +		if (dev->data->txq_bufs == NULL) {
> +			dev->data->nb_tx_queues = 0;
> +			rte_free(dev->data->tx_queues);
> +			return -(ENOMEM);
> +		}
> +
>  	} else { /* re-configure */
> +
> +		/* flush the packets queued for all queues*/
> +		for (i = 0; i < old_nb_queues; i++)
> +			rte_eth_tx_buffer_flush(dev->data->port_id, i);
> +

I don't think it is safe to call tx_burst() at the queue config stage.
Instead you need to flush (or just empty) your txq_bufs at the tx_queue_stop stage.
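
Something like this in the tx_queue_stop path would do (rough sketch, just
to show the "empty without transmitting" idea):

	/* sketch: drop whatever is still buffered for the queue being stopped */
	struct rte_eth_dev_tx_buffer *qbuf = &dev->data->txq_bufs[tx_queue_id];
	unsigned i;

	for (i = 0; i < qbuf->nb_pkts; i++)
		rte_pktmbuf_free(qbuf->pkts[i]);
	qbuf->nb_pkts = 0;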

>  		RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_queue_release, -ENOTSUP);
> 
> +		/* get new buffer space first, but keep old space around */
> +		new_bufs = rte_zmalloc("ethdev->txq_bufs",
> +				sizeof(*dev->data->txq_bufs) * nb_queues, 0);
> +		if (new_bufs == NULL)
> +			return -(ENOMEM);
> +


Why not allocate space for txq_bufs together with tx_queues (as one chunk for both)?
As I understand, there is always a one-to-one mapping between them anyway.
Would simplify things a bit.
Or even introduce a new struct to group all related tx queue info together:
struct rte_eth_txq_data {
	void *queue; /*actual pmd  queue*/
	struct rte_eth_dev_tx_buffer buf;
	uint8_t state;
}
And use it inside struct rte_eth_dev_data?
Would probably give better data locality.

>  		txq = dev->data->tx_queues;
> 
>  		for (i = nb_queues; i < old_nb_queues; i++)
>  			(*dev->dev_ops->tx_queue_release)(txq[i]);
>  		txq = rte_realloc(txq, sizeof(txq[0]) * nb_queues,
>  				  RTE_CACHE_LINE_SIZE);
> -		if (txq == NULL)
> -			return -ENOMEM;
> +		if (txq == NULL) {
> +			rte_free(new_bufs);
> +			return -(ENOMEM);
> +		}
> +
>  		if (nb_queues > old_nb_queues) {
>  			uint16_t new_qs = nb_queues - old_nb_queues;
> 
> @@ -861,6 +915,9 @@ rte_eth_dev_tx_queue_config(struct rte_eth_dev *dev, uint16_t nb_queues)
> 
>  		dev->data->tx_queues = txq;
> 
> +		/* now replace old buffers with new */
> +		rte_free(dev->data->txq_bufs);
> +		dev->data->txq_bufs = new_bufs;
>  	}
>  	dev->data->nb_tx_queues = nb_queues;
>  	return 0;
> diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
> index bada8ad..23faa6a 100644
> --- a/lib/librte_ether/rte_ethdev.h
> +++ b/lib/librte_ether/rte_ethdev.h
> @@ -1,7 +1,7 @@
>  /*-
>   *   BSD LICENSE
>   *
> - *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
> + *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
>   *   All rights reserved.
>   *
>   *   Redistribution and use in source and binary forms, with or without
> @@ -182,6 +182,7 @@ extern "C" {
>  #include <rte_pci.h>
>  #include <rte_dev.h>
>  #include <rte_devargs.h>
> +#include <rte_branch_prediction.h>
>  #include "rte_ether.h"
>  #include "rte_eth_ctrl.h"
>  #include "rte_dev_info.h"
> @@ -1519,6 +1520,34 @@ enum rte_eth_dev_type {
>  	RTE_ETH_DEV_MAX		/**< max value of this enum */
>  };
> 
> +typedef void (*buffer_tx_error_fn)(struct rte_mbuf **unsent, uint16_t count,
> +		void *userdata);
> +
> +/**
> + * @internal
> + * Structure used to buffer packets for future TX
> + * Used by APIs rte_eth_tx_buffer and rte_eth_tx_buffer_flush
> + */
> +struct rte_eth_dev_tx_buffer {
> +	struct rte_mbuf *pkts[RTE_ETHDEV_TX_BUFSIZE];

I think it is better to make the size of pkts[] configurable at runtime.
There are a lot of different usage scenarios - hard to predict what would be an
optimal buffer size for all cases.  

> +	unsigned nb_pkts;
> +	uint64_t errors;
> +	/**< Total number of queued packets to send that were dropped. */
> +};
> +
> +/**
> + * @internal
> + * Structure to hold a callback to be used on error when a tx_buffer_flush
> + * call fails to send all packets.
> + * This needs to be a separate structure, as it must go in the ethdev structure
> + * rather than ethdev_data, due to the use of a function pointer, which is not
> + * multi-process safe.
> + */
> +struct rte_eth_dev_tx_buffer_err_cb {
> +	buffer_tx_error_fn flush_cb; /* callback for when tx_burst fails */
> +	void *userdata;              /* userdata for callback */
> +};
> +
>  /**
>   * @internal
>   * The generic data structure associated with each ethernet device.
> @@ -1550,6 +1579,9 @@ struct rte_eth_dev {
>  	struct rte_eth_rxtx_callback *pre_tx_burst_cbs[RTE_MAX_QUEUES_PER_PORT];
>  	uint8_t attached; /**< Flag indicating the port is attached */
>  	enum rte_eth_dev_type dev_type; /**< Flag indicating the device type */
> +
> +	/** Callbacks to be used on a tx_buffer_flush error */
> +	struct rte_eth_dev_tx_buffer_err_cb tx_buf_err_cb[RTE_MAX_QUEUES_PER_PORT];
>  };
> 
>  struct rte_eth_dev_sriov {
> @@ -1610,6 +1642,8 @@ struct rte_eth_dev_data {
>  	enum rte_kernel_driver kdrv;    /**< Kernel driver passthrough */
>  	int numa_node;  /**< NUMA node connection */
>  	const char *drv_name;   /**< Driver name */
> +	struct rte_eth_dev_tx_buffer *txq_bufs;
> +	/**< space to allow buffered transmits */
>  };
> 
>  /** Device supports hotplug detach */
> @@ -2661,8 +2695,181 @@ rte_eth_tx_burst(uint8_t port_id, uint16_t queue_id,
>  }
> 
>  /**
> - * The eth device event type for interrupt, and maybe others in the future.
> + * Buffer a single packet for future transmission on a port and queue
> + *
> + * This function takes a single mbuf/packet and buffers it for later
> + * transmission on the particular port and queue specified. Once the buffer is
> + * full of packets, an attempt will be made to transmit all the buffered
> + * packets. In case of error, where not all packets can be transmitted, a
> + * callback is called with the unsent packets as a parameter. If no callback
> + * is explicitly set up, the unsent packets are just freed back to the owning
> + * mempool. The function returns the number of packets actually sent i.e.
> + * 0 if no buffer flush occurred, otherwise the number of packets successfully
> + * flushed
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param queue_id
> + *   The index of the transmit queue through which output packets must be
> + *   sent.
> + *   The value must be in the range [0, nb_tx_queue - 1] previously supplied
> + *   to rte_eth_dev_configure().
> + * @param tx_pkt
> + *   Pointer to the packet mbuf to be sent.
> + * @return
> + *   0 = packet has been buffered for later transmission
> + *   N > 0 = packet has been buffered, and the buffer was subsequently flushed,
> + *     causing N packets to be sent, and the error callback to be called for
> + *     the rest.
> + */
> +static inline uint16_t __attribute__((always_inline))
> +rte_eth_tx_buffer(uint8_t port_id, uint16_t queue_id, struct rte_mbuf *tx_pkt)
> +{
> +	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
> +	struct rte_eth_dev_tx_buffer *qbuf = &dev->data->txq_bufs[queue_id];
> +	uint16_t i;
> +
> +	qbuf->pkts[qbuf->nb_pkts++] = tx_pkt;
> +	if (qbuf->nb_pkts < RTE_ETHDEV_TX_BUFSIZE)
> +		return 0;
> +

Probably just call rte_eth_tx_buffer_flush() here to avoid duplication.
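
i.e. something like this (sketch, assuming flush() is defined before
buffer() in the header):

	static inline uint16_t __attribute__((always_inline))
	rte_eth_tx_buffer(uint8_t port_id, uint16_t queue_id, struct rte_mbuf *tx_pkt)
	{
		struct rte_eth_dev *dev = &rte_eth_devices[port_id];
		struct rte_eth_dev_tx_buffer *qbuf = &dev->data->txq_bufs[queue_id];

		qbuf->pkts[qbuf->nb_pkts++] = tx_pkt;
		if (qbuf->nb_pkts < RTE_ETHDEV_TX_BUFSIZE)
			return 0;

		/* buffer full: reuse the flush path, error handling included */
		return rte_eth_tx_buffer_flush(port_id, queue_id);
	}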

> +	const uint16_t sent = rte_eth_tx_burst(port_id, queue_id, qbuf->pkts,
> +			RTE_ETHDEV_TX_BUFSIZE);
> +
> +	qbuf->nb_pkts = 0;
> +
> +	/* All packets sent, or to be dealt with by callback below */
> +	if (unlikely(sent != RTE_ETHDEV_TX_BUFSIZE)) {
> +		if (dev->tx_buf_err_cb[queue_id].flush_cb)
> +			dev->tx_buf_err_cb[queue_id].flush_cb(&qbuf->pkts[sent],
> +					RTE_ETHDEV_TX_BUFSIZE - sent,
> +					dev->tx_buf_err_cb[queue_id].userdata);
> +		else {
> +			qbuf->errors += RTE_ETHDEV_TX_BUFSIZE - sent;
> +			for (i = sent; i < RTE_ETHDEV_TX_BUFSIZE; i++)
> +				rte_pktmbuf_free(qbuf->pkts[i]);
> +		}
> +	}
> +
> +	return sent;
> +}
> +
> +/**
> + * Send any packets queued up for transmission on a port and HW queue
> + *
> + * This causes an explicit flush of packets previously buffered via the
> + * rte_eth_tx_buffer() function. It returns the number of packets successfully
> + * sent to the NIC, and calls the error callback for any unsent packets. Unless
> + * explicitly set up otherwise, the default callback simply frees the unsent
> + * packets back to the owning mempool.
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param queue_id
> + *   The index of the transmit queue through which output packets must be
> + *   sent.
> + *   The value must be in the range [0, nb_tx_queue - 1] previously supplied
> + *   to rte_eth_dev_configure().
> + * @return
> + *   The number of packets successfully sent to the Ethernet device. The error
> + *   callback is called for any packets which could not be sent.
> + */
> +static inline uint16_t
> +rte_eth_tx_buffer_flush(uint8_t port_id, uint16_t queue_id)
> +{
> +	uint16_t i;
> +	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
> +	struct rte_eth_dev_tx_buffer *qbuf = &dev->data->txq_bufs[queue_id];
> +
> +	if (qbuf->nb_pkts == 0)
> +		return 0;
> +
> +	const uint16_t to_send = qbuf->nb_pkts;
> +
> +	const uint16_t sent = rte_eth_tx_burst(port_id, queue_id, qbuf->pkts,
> +			to_send);

Try to avoid defining variables in the middle of the code block.
Again, not much value in having these two variables above as 'const'.
Konstantin

> +
> +	qbuf->nb_pkts = 0;
> +
> +	/* All packets sent, or to be dealt with by callback below */
> +	if (unlikely(sent != to_send)) {
> +		if (dev->tx_buf_err_cb[queue_id].flush_cb)
> +			dev->tx_buf_err_cb[queue_id].flush_cb(&qbuf->pkts[sent],
> +					to_send - sent,
> +					dev->tx_buf_err_cb[queue_id].userdata);
> +		else {
> +			qbuf->errors += to_send - sent;
> +			for (i = sent; i < to_send; i++)
> +				rte_pktmbuf_free(qbuf->pkts[i]);
> +		}
> +	}
> +
> +	return sent;
> +}
> +

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [dpdk-dev] [PATCH 1/2] ethdev: add buffered tx api
  2016-01-15 18:44   ` Ananyev, Konstantin
@ 2016-02-02 10:00     ` Kulasek, TomaszX
  2016-02-02 13:49       ` Ananyev, Konstantin
  0 siblings, 1 reply; 43+ messages in thread
From: Kulasek, TomaszX @ 2016-02-02 10:00 UTC (permalink / raw)
  To: Ananyev, Konstantin, dev

Hi Konstantin,

> -----Original Message-----
> From: Ananyev, Konstantin
> Sent: Friday, January 15, 2016 19:45
> To: Kulasek, TomaszX; dev@dpdk.org
> Subject: RE: [dpdk-dev] [PATCH 1/2] ethdev: add buffered tx api
> 
> Hi Tomasz,
> 
> >
> > +		/* get new buffer space first, but keep old space around */
> > +		new_bufs = rte_zmalloc("ethdev->txq_bufs",
> > +				sizeof(*dev->data->txq_bufs) * nb_queues, 0);
> > +		if (new_bufs == NULL)
> > +			return -(ENOMEM);
> > +
> 
> Why not to allocate space for txq_bufs together with tx_queues (as one
> chunk for both)?
> As I understand there is always one to one mapping between them anyway.
> Would simplify things a bit.
> Or even introduce a new struct to group with all related tx queue info
> together: struct rte_eth_txq_data {
> 	void *queue; /*actual pmd  queue*/
> 	struct rte_eth_dev_tx_buffer buf;
> 	uint8_t state;
> }
> And use it inside struct rte_eth_dev_data?
> Would probably give a better data locality.
> 

Introducing such a struct will require a huge rework of the PMD drivers. I don't think it's worth it just for this one feature. 


> > +/**
> > + * @internal
> > + * Structure used to buffer packets for future TX
> > + * Used by APIs rte_eth_tx_buffer and rte_eth_tx_buffer_flush  */
> > +struct rte_eth_dev_tx_buffer {
> > +	struct rte_mbuf *pkts[RTE_ETHDEV_TX_BUFSIZE];
> 
> I think it is better to make size of pkts[] configurable at runtime.
> There are a lot of different usage scenarios - hard to predict what would
> be an optimal buffer size for all cases.
> 

This buffer is allocated in eth_dev shared memory, so there are two scenarios:
1) We have a preallocated buffer of maximal size, and then we can set the threshold level without restarting the device, or
2) We need to set its size before starting the device.

The second one is better, I think.

Tomasz

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [dpdk-dev] [PATCH 1/2] ethdev: add buffered tx api
  2016-02-02 10:00     ` Kulasek, TomaszX
@ 2016-02-02 13:49       ` Ananyev, Konstantin
  2016-02-09 17:02         ` Kulasek, TomaszX
  0 siblings, 1 reply; 43+ messages in thread
From: Ananyev, Konstantin @ 2016-02-02 13:49 UTC (permalink / raw)
  To: Kulasek, TomaszX, dev

Hi Tomasz,

> -----Original Message-----
> From: Kulasek, TomaszX
> Sent: Tuesday, February 02, 2016 10:01 AM
> To: Ananyev, Konstantin; dev@dpdk.org
> Subject: RE: [dpdk-dev] [PATCH 1/2] ethdev: add buffered tx api
> 
> Hi Konstantin,
> 
> > -----Original Message-----
> > From: Ananyev, Konstantin
> > Sent: Friday, January 15, 2016 19:45
> > To: Kulasek, TomaszX; dev@dpdk.org
> > Subject: RE: [dpdk-dev] [PATCH 1/2] ethdev: add buffered tx api
> >
> > Hi Tomasz,
> >
> > >
> > > +		/* get new buffer space first, but keep old space around */
> > > +		new_bufs = rte_zmalloc("ethdev->txq_bufs",
> > > +				sizeof(*dev->data->txq_bufs) * nb_queues, 0);
> > > +		if (new_bufs == NULL)
> > > +			return -(ENOMEM);
> > > +
> >
> > Why not to allocate space for txq_bufs together with tx_queues (as one
> > chunk for both)?
> > As I understand there is always one to one mapping between them anyway.
> > Would simplify things a bit.
> > Or even introduce a new struct to group with all related tx queue info
> > together: struct rte_eth_txq_data {
> > 	void *queue; /*actual pmd  queue*/
> > 	struct rte_eth_dev_tx_buffer buf;
> > 	uint8_t state;
> > }
> > And use it inside struct rte_eth_dev_data?
> > Would probably give a better data locality.
> >
> 
> Introducing such a struct will require a huge rework of pmd drivers. I don't think it's worth only for this one feature.

Why not?
Things are getting more and more messy here: now we have a separate array of pointers to queues,
a separate array of queue states, and you are going to add a separate array of tx buffers.
To me it seems logical to unite all these 3 fields into one sub-struct. 

> 
> 
> > > +/**
> > > + * @internal
> > > + * Structure used to buffer packets for future TX
> > > + * Used by APIs rte_eth_tx_buffer and rte_eth_tx_buffer_flush  */
> > > +struct rte_eth_dev_tx_buffer {
> > > +	struct rte_mbuf *pkts[RTE_ETHDEV_TX_BUFSIZE];
> >
> > I think it is better to make size of pkts[] configurable at runtime.
> > There are a lot of different usage scenarios - hard to predict what would
> > be an optimal buffer size for all cases.
> >
> 
> This buffer is allocated in eth_dev shared memory, so there are two scenarios:
> 1) We have a preallocated buffer with maximal size, and then we can set threshold level without restarting device, or
> 2) We need to set its size before starting device.

> 
> Second one is better, I think.

Yep, I was thinking about 2) too.
It might be an extra parameter in struct rte_eth_txconf.
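
For example (hypothetical new member, just for illustration):

	struct rte_eth_txconf {
		struct rte_eth_thresh tx_thresh; /**< Tx ring threshold registers. */
		uint16_t tx_rs_thresh;
		uint16_t tx_free_thresh;
		uint32_t txq_flags;
		uint8_t tx_deferred_start;
		uint16_t tx_buf_size; /**< hypothetical: tx buffer size for the queue */
	};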

> 
> Tomasz

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [dpdk-dev] [PATCH 1/2] ethdev: add buffered tx api
  2016-02-02 13:49       ` Ananyev, Konstantin
@ 2016-02-09 17:02         ` Kulasek, TomaszX
  2016-02-09 23:56           ` Ananyev, Konstantin
  0 siblings, 1 reply; 43+ messages in thread
From: Kulasek, TomaszX @ 2016-02-09 17:02 UTC (permalink / raw)
  To: Ananyev, Konstantin, dev



> -----Original Message-----
> From: Ananyev, Konstantin
> Sent: Tuesday, February 2, 2016 14:50
> To: Kulasek, TomaszX <tomaszx.kulasek@intel.com>; dev@dpdk.org
> Subject: RE: [dpdk-dev] [PATCH 1/2] ethdev: add buffered tx api
> 
> Hi Tomasz,
> 
> > -----Original Message-----
> > From: Kulasek, TomaszX
> > Sent: Tuesday, February 02, 2016 10:01 AM
> > To: Ananyev, Konstantin; dev@dpdk.org
> > Subject: RE: [dpdk-dev] [PATCH 1/2] ethdev: add buffered tx api
> >
> > Hi Konstantin,
> >
> > > -----Original Message-----
> > > From: Ananyev, Konstantin
> > > Sent: Friday, January 15, 2016 19:45
> > > To: Kulasek, TomaszX; dev@dpdk.org
> > > Subject: RE: [dpdk-dev] [PATCH 1/2] ethdev: add buffered tx api
> > >
> > > Hi Tomasz,
> > >
> > > >
> > > > +		/* get new buffer space first, but keep old space around
> */
> > > > +		new_bufs = rte_zmalloc("ethdev->txq_bufs",
> > > > +				sizeof(*dev->data->txq_bufs) * nb_queues, 0);
> > > > +		if (new_bufs == NULL)
> > > > +			return -(ENOMEM);
> > > > +
> > >
> > > Why not to allocate space for txq_bufs together with tx_queues (as
> > > one chunk for both)?
> > > As I understand there is always one to one mapping between them
> anyway.
> > > Would simplify things a bit.
> > > Or even introduce a new struct to group with all related tx queue
> > > info together: struct rte_eth_txq_data {
> > > 	void *queue; /*actual pmd  queue*/
> > > 	struct rte_eth_dev_tx_buffer buf;
> > > 	uint8_t state;
> > > }
> > > And use it inside struct rte_eth_dev_data?
> > > Would probably give a better data locality.
> > >
> >
> > Introducing such a struct will require a huge rework of pmd drivers. I
> don't think it's worth only for this one feature.
> 
> Why not?
> Things are getting more and more messy here: now we have a separate array
> of pointer to queues, Separate array of queue states, you are going to add
> separate array of tx buffers.
> For me it seems logical to unite all these 3 fields into one sub-struct.
> 

I agree with you, and such a rework would probably be nice for rx queues too, but these two changes impact different parts of DPDK, while the buffered tx API is more of a helper for client applications.

For me these two things are different features and should be made separately because:
1) They are independent and can be done separately,
2) They can (and should) be reviewed, tested and approved separately,
3) They are addressed to different types of people (tx buffering to application developers, rte_eth_dev_data to PMD developers), so different people may be interested in having (or not having) one or the other feature

Even for bug tracking it will be cleaner to separate these two things. And yes, it is logical to unite them, maybe also for rx queues, but that should be discussed separately.

I've made a prototype of this rework, and its impact on code not related to this particular feature is too wide and strong to join them. I would rather provide it as an independent patch for further discussion on it alone, if needed.

> >
> >
> > > > +/**
> > > > + * @internal
> > > > + * Structure used to buffer packets for future TX
> > > > + * Used by APIs rte_eth_tx_buffer and rte_eth_tx_buffer_flush  */
> > > > +struct rte_eth_dev_tx_buffer {
> > > > +	struct rte_mbuf *pkts[RTE_ETHDEV_TX_BUFSIZE];
> > >
> > > I think it is better to make size of pkts[] configurable at runtime.
> > > There are a lot of different usage scenarios - hard to predict what
> > > would be an optimal buffer size for all cases.
> > >
> >
> > This buffer is allocated in eth_dev shared memory, so there are two
> scenarios:
> > 1) We have a preallocated buffer with maximal size, and then we can set
> > threshold level without restarting device, or
> > 2) We need to set its size before starting device.
> 
> >
> > Second one is better, I think.
> 
> Yep, I was thinking about 2) too.
> Might be an extra parameter in struct rte_eth_txconf.
> 

Struct rte_eth_txconf is passed to ethdev after rte_eth_dev_tx_queue_config, so we don't know its value when buffers are allocated.
I'm looking for another solution.

> >
> > Tomasz

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [dpdk-dev] [PATCH 1/2] ethdev: add buffered tx api
  2016-02-09 17:02         ` Kulasek, TomaszX
@ 2016-02-09 23:56           ` Ananyev, Konstantin
  2016-02-12 11:44             ` Ananyev, Konstantin
  0 siblings, 1 reply; 43+ messages in thread
From: Ananyev, Konstantin @ 2016-02-09 23:56 UTC (permalink / raw)
  To: Kulasek, TomaszX, dev



> -----Original Message-----
> From: Kulasek, TomaszX
> Sent: Tuesday, February 09, 2016 5:03 PM
> To: Ananyev, Konstantin; dev@dpdk.org
> Subject: RE: [dpdk-dev] [PATCH 1/2] ethdev: add buffered tx api
> 
> 
> 
> > -----Original Message-----
> > From: Ananyev, Konstantin
> > Sent: Tuesday, February 2, 2016 14:50
> > To: Kulasek, TomaszX <tomaszx.kulasek@intel.com>; dev@dpdk.org
> > Subject: RE: [dpdk-dev] [PATCH 1/2] ethdev: add buffered tx api
> >
> > Hi Tomasz,
> >
> > > -----Original Message-----
> > > From: Kulasek, TomaszX
> > > Sent: Tuesday, February 02, 2016 10:01 AM
> > > To: Ananyev, Konstantin; dev@dpdk.org
> > > Subject: RE: [dpdk-dev] [PATCH 1/2] ethdev: add buffered tx api
> > >
> > > Hi Konstantin,
> > >
> > > > -----Original Message-----
> > > > From: Ananyev, Konstantin
> > > > Sent: Friday, January 15, 2016 19:45
> > > > To: Kulasek, TomaszX; dev@dpdk.org
> > > > Subject: RE: [dpdk-dev] [PATCH 1/2] ethdev: add buffered tx api
> > > >
> > > > Hi Tomasz,
> > > >
> > > > >
> > > > > +		/* get new buffer space first, but keep old space around
> > */
> > > > > +		new_bufs = rte_zmalloc("ethdev->txq_bufs",
> > > > > +				sizeof(*dev->data->txq_bufs) * nb_queues, 0);
> > > > > +		if (new_bufs == NULL)
> > > > > +			return -(ENOMEM);
> > > > > +
> > > >
> > > > Why not to allocate space for txq_bufs together with tx_queues (as
> > > > one chunk for both)?
> > > > As I understand there is always one to one mapping between them
> > anyway.
> > > > Would simplify things a bit.
> > > > Or even introduce a new struct to group with all related tx queue
> > > > info together: struct rte_eth_txq_data {
> > > > 	void *queue; /*actual pmd  queue*/
> > > > 	struct rte_eth_dev_tx_buffer buf;
> > > > 	uint8_t state;
> > > > }
> > > > And use it inside struct rte_eth_dev_data?
> > > > Would probably give a better data locality.
> > > >
> > >
> > > Introducing such a struct will require a huge rework of pmd drivers. I
> > don't think it's worth only for this one feature.
> >
> > Why not?
> > Things are getting more and more messy here: now we have a separate array
> > of pointer to queues, Separate array of queue states, you are going to add
> > separate array of tx buffers.
> > For me it seems logical to unite all these 3 fields into one sub-struct.
> >
> 
> I agree with you, and probably such a work will be nice also for rx queues, but these two changes impacts on another part of dpdk.
> While buffered tx API is more client application helper.
> 
> For me these two thinks are different features and should be made separately because:
> 1) They are independent and can be done separately,
> 2) They can (and should) be reviewed, tested and approved separately,
> 3) They are addressed to another type of people (tx buffering to application developers, rte_eth_dev_data to pmd developers), so
> another people can be interested in having (or not) one or second feature

Such a division seems a bit artificial to me :)
You are making changes in rte_ethdev.[c,h] - I think that field regrouping would make the code cleaner and easier to read/maintain.

> 
> Even for bug tracking it will be cleaner to separate these two things. And yes, it is logical to unite it, maybe also for rx queues, but
> should be discussed separately.
> 
> I've made a prototype with this rework, and the impact on the code not related to this particular feature is too wide and strong to join
> them. I would rather to provide it as independent patch for further discussion only on it, if needed.

Sure, a separate patch is fine.
Why not submit it as an extra one in the series?


> 
> > >
> > >
> > > > > +/**
> > > > > + * @internal
> > > > > + * Structure used to buffer packets for future TX
> > > > > + * Used by APIs rte_eth_tx_buffer and rte_eth_tx_buffer_flush  */
> > > > > +struct rte_eth_dev_tx_buffer {
> > > > > +	struct rte_mbuf *pkts[RTE_ETHDEV_TX_BUFSIZE];
> > > >
> > > > I think it is better to make size of pkts[] configurable at runtime.
> > > > There are a lot of different usage scenarios - hard to predict what
> > > > would be an optimal buffer size for all cases.
> > > >
> > >
> > > This buffer is allocated in eth_dev shared memory, so there are two
> > scenarios:
> > > 1) We have a preallocated buffer with maximal size, and then we can set
> > > threshold level without restarting device, or
> > > 2) We need to set its size before starting device.
> >
> > >
> > > Second one is better, I think.
> >
> > Yep, I was thinking about 2) too.
> > Might be an extra parameter in struct rte_eth_txconf.
> >
> 
> Struct rte_eth_txconf is passed to ethdev after rte_eth_dev_tx_queue_config, so we don't know its value when buffers are
> allocated.

Ok, and why can't the allocation of the tx buffer be done at rte_eth_tx_queue_setup()? 

Actually, I just thought: why not let rte_eth_tx_buffer() accept a struct rte_eth_dev_tx_buffer * as a parameter:
+static inline int __attribute__((always_inline))
+rte_eth_tx_buffer(uint8_t port_id, uint16_t queue_id, struct rte_eth_dev_tx_buffer *txb, struct rte_mbuf *tx_pkt)
?

In that case we don't need to make any changes in rte_ethdev.[h,c] to alloc/free/maintain tx_buffer inside each queue...
It will all be the upper layer's responsibility.
So no need to modify the existing rte_ethdev structures/code.
Again, no need for an error callback - the caller would check the return value and decide what to do with unsent packets in the tx_buffer.
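
Usage could then look like this (rough sketch; assuming unsent packets stay
in the buffer when a triggered flush can't send everything):

	sent = rte_eth_tx_buffer(port_id, queue_id, txb, pkt);
	if (sent > 0 && txb->nb_pkts != 0) {
		/* a flush ran but left unsent packets in txb:
		 * retry later, redirect them to another port/queue,
		 * or free them - the application decides */
	}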

Konstantin

> I'm looking for another solution.
> 
> > >
> > > Tomasz

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [dpdk-dev] [PATCH 1/2] ethdev: add buffered tx api
  2016-02-09 23:56           ` Ananyev, Konstantin
@ 2016-02-12 11:44             ` Ananyev, Konstantin
  2016-02-12 16:40               ` Ivan Boule
  0 siblings, 1 reply; 43+ messages in thread
From: Ananyev, Konstantin @ 2016-02-12 11:44 UTC (permalink / raw)
  To: Ananyev, Konstantin, Kulasek, TomaszX, dev


> 
> > -----Original Message-----
> > From: Kulasek, TomaszX
> > Sent: Tuesday, February 09, 2016 5:03 PM
> > To: Ananyev, Konstantin; dev@dpdk.org
> > Subject: RE: [dpdk-dev] [PATCH 1/2] ethdev: add buffered tx api
> >
> >
> >
> > > -----Original Message-----
> > > From: Ananyev, Konstantin
> > > Sent: Tuesday, February 2, 2016 14:50
> > > To: Kulasek, TomaszX <tomaszx.kulasek@intel.com>; dev@dpdk.org
> > > Subject: RE: [dpdk-dev] [PATCH 1/2] ethdev: add buffered tx api
> > >
> > > Hi Tomasz,
> > >
> > > > -----Original Message-----
> > > > From: Kulasek, TomaszX
> > > > Sent: Tuesday, February 02, 2016 10:01 AM
> > > > To: Ananyev, Konstantin; dev@dpdk.org
> > > > Subject: RE: [dpdk-dev] [PATCH 1/2] ethdev: add buffered tx api
> > > >
> > > > Hi Konstantin,
> > > >
> > > > > -----Original Message-----
> > > > > From: Ananyev, Konstantin
> > > > > Sent: Friday, January 15, 2016 19:45
> > > > > To: Kulasek, TomaszX; dev@dpdk.org
> > > > > Subject: RE: [dpdk-dev] [PATCH 1/2] ethdev: add buffered tx api
> > > > >
> > > > > Hi Tomasz,
> > > > >
> > > > > >
> > > > > > +		/* get new buffer space first, but keep old space around
> > > */
> > > > > > +		new_bufs = rte_zmalloc("ethdev->txq_bufs",
> > > > > > +				sizeof(*dev->data->txq_bufs) * nb_queues, 0);
> > > > > > +		if (new_bufs == NULL)
> > > > > > +			return -(ENOMEM);
> > > > > > +
> > > > >
> > > > > Why not to allocate space for txq_bufs together with tx_queues (as
> > > > > one chunk for both)?
> > > > > As I understand there is always one to one mapping between them
> > > anyway.
> > > > > Would simplify things a bit.
> > > > > Or even introduce a new struct to group with all related tx queue
> > > > > info together: struct rte_eth_txq_data {
> > > > > 	void *queue; /*actual pmd  queue*/
> > > > > 	struct rte_eth_dev_tx_buffer buf;
> > > > > 	uint8_t state;
> > > > > }
> > > > > And use it inside struct rte_eth_dev_data?
> > > > > Would probably give a better data locality.
> > > > >
> > > >
> > > > Introducing such a struct will require a huge rework of pmd drivers. I
> > > don't think it's worth only for this one feature.
> > >
> > > Why not?
> > > Things are getting more and more messy here: now we have a separate array
> > > of pointer to queues, Separate array of queue states, you are going to add
> > > separate array of tx buffers.
> > > For me it seems logical to unite all these 3 fields into one sub-struct.
> > >
> >
> > I agree with you, and probably such a work will be nice also for rx queues, but these two changes impacts on another part of dpdk.
> > While buffered tx API is more client application helper.
> >
> > For me these two thinks are different features and should be made separately because:
> > 1) They are independent and can be done separately,
> > 2) They can (and should) be reviewed, tested and approved separately,
> > 3) They are addressed to another type of people (tx buffering to application developers, rte_eth_dev_data to pmd developers), so
> > another people can be interested in having (or not) one or second feature
> 
> Such division seems a bit artificial to me :)
> You are making changes in rte_ethdev.[c,h]  - I think that field regrouping would make code cleaner and easier to read/maintain.
> 
> >
> > Even for bug tracking it will be cleaner to separate these two things. And yes, it is logical to unite it, maybe also for rx queues, but
> > should be discussed separately.
> >
> > I've made a prototype with this rework, and the impact on the code not related to this particular feature is too wide and strong to
> join
> > them. I would rather to provide it as independent patch for further discussion only on it, if needed.
> 
> Sure, separate patch is fine.
> Why not submit it as an extra one in the series?
> 
> 
> >
> > > >
> > > >
> > > > > > +/**
> > > > > > + * @internal
> > > > > > + * Structure used to buffer packets for future TX
> > > > > > + * Used by APIs rte_eth_tx_buffer and rte_eth_tx_buffer_flush  */
> > > > > > +struct rte_eth_dev_tx_buffer {
> > > > > > +	struct rte_mbuf *pkts[RTE_ETHDEV_TX_BUFSIZE];
> > > > >
> > > > > I think it is better to make size of pkts[] configurable at runtime.
> > > > > There are a lot of different usage scenarios - hard to predict what
> > > > > would be an optimal buffer size for all cases.
> > > > >
> > > >
> > > > This buffer is allocated in eth_dev shared memory, so there are two
> > > scenarios:
> > > > 1) We have a preallocated buffer with maximal size, and then we can set
> > > > threshold level without restarting device, or
> > > > 2) We need to set its size before starting device.
> > >
> > > >
> > > > Second one is better, I think.
> > >
> > > Yep, I was thinking about 2) too.
> > > Might be an extra parameter in struct rte_eth_txconf.
> > >
> >
> > Struct rte_eth_txconf is passed to ethdev after rte_eth_dev_tx_queue_config, so we don't know its value when buffers are
> > allocated.
> 
> Ok, and why allocation of the tx buffer can't be done at rte_eth_tx_queue_setup()?
> 
> Actually just thought why not to let rte_eth_tx_buffer() to accept struct rte_eth_dev_tx_buffer * as a parameter:
> +static inline int __attribute__((always_inline))
> +rte_eth_tx_buffer(uint8_t port_id, uint16_t queue_id, struct rte_eth_dev_tx_buffer *txb, struct rte_mbuf *tx_pkt)
> ?
> 
> In that case we don't need to make any changes at rte_ethdev.[h,c] to alloc/free/maintain tx_buffer inside each queue...
> It all will be upper layer responsibility.
> So no need to modify existing rte_ethdev structures/code.
> Again, no need for error callback - caller would check return value and decide what to do with unsent packets in the tx_buffer.
> 

Just to summarise why I think it is better to have tx buffering managed on the app level:

1. avoid any ABI change.
2. Avoid extra changes in rte_ethdev.c: tx_queue_setup/tx_queue_stop.
3. Provides much more flexibility to the user:
   a) where to allocate space for tx_buffer (stack, heap, hugepages, etc).
   b) user can mix and match plain tx_burst() and tx_buffer()/tx_buffer_flush()
        in any way he feels appropriate.
   c) user can change the size of tx_buffer without stop/re-config/start queue:
        just allocate new larger(smaller) tx_buffer & copy contents to the new one.
   d) user can preserve buffered packets through a device restart cycle:
        i.e. if, let's say, a TX hang happened and the user has to do dev_stop/dev_start,
        the contents of the tx_buffer will stay unchanged and could be
        (re-)transmitted after the device is up again, or through a different port/queue if needed.
 
As for the drawback mentioned - tx error handling becomes less transparent...
But we can add an error handling routine and its user-provided parameter
into struct rte_eth_dev_tx_buffer, something like this:

+struct rte_eth_dev_tx_buffer {
+	buffer_tx_error_fn cbfn;
+	void *userdata;
+	unsigned nb_pkts;
+	uint64_t errors;
+	/**< Total number of queued packets to send that were dropped. */
+	struct rte_mbuf *pkts[];
+};
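
With that, a user who wants the counting behaviour could simply supply a
callback such as (rough sketch):

	static void
	count_and_free_cb(struct rte_mbuf **pkts, uint16_t unsent, void *userdata)
	{
		uint64_t *count = userdata;
		uint16_t i;

		for (i = 0; i < unsent; i++)
			rte_pktmbuf_free(pkts[i]);
		*count += unsent;
	}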

Konstantin

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [dpdk-dev] [PATCH 1/2] ethdev: add buffered tx api
  2016-02-12 11:44             ` Ananyev, Konstantin
@ 2016-02-12 16:40               ` Ivan Boule
  2016-02-12 17:33                 ` Bruce Richardson
  0 siblings, 1 reply; 43+ messages in thread
From: Ivan Boule @ 2016-02-12 16:40 UTC (permalink / raw)
  To: Ananyev, Konstantin, Kulasek, TomaszX, dev

On 02/12/2016 12:44 PM, Ananyev, Konstantin wrote:
>
>>
>>> -----Original Message-----
...
>>
>> In that case we don't need to make any changes at rte_ethdev.[h,c] to alloc/free/maintain tx_buffer inside each queue...
>> It all will be upper layer responsibility.
>> So no need to modify existing rte_ethdev structures/code.
>> Again, no need for error callback - caller would check return value and decide what to do with unsent packets in the tx_buffer.
>>
>
> Just to summarise why I think it is better to have tx buffering managed on the app level:
>
> 1. avoid any ABI change.
> 2. Avoid extra changes in rte_ethdev.c: tx_queue_setup/tx_queue_stop.
> 3. Provides much more flexibility to the user:
>     a) where to allocate space for tx_buffer (stack, heap, hugepages, etc).
>     b) user can mix and match plain tx_burst() and   tx_buffer/tx_buffer_flush()
>          in any way he fills it appropriate.
>     c) user can change the size of tx_buffer without stop/re-config/start queue:
>          just allocate new larger(smaller) tx_buffer & copy contents to the new one.
>     d) user can preserve buffered packets through device restart circle:
>          i.e if let say TX hang happened, and user has to do dev_stop/dev_start -
>          contents of tx_buffer will stay unchanged and its contents could be
>          (re-)transmitted after device is up again, or  through different port/queue if needed.
>
> As a drawbacks mentioned - tx error handling becomes less transparent...
> But we can add error handling routine and it's user provided parameter
> into struct rte_eth_dev_tx_buffer', something like that:
>
> +struct rte_eth_dev_tx_buffer {
> +	buffer_tx_error_fn cbfn;
> +	void *userdata;
> +	unsigned nb_pkts;
> +	uint64_t errors;
> +	/**< Total number of queued packets to send that were dropped. */
> +	struct rte_mbuf *pkts[];
> +};
>
> Konstantin
>

Just to reinforce Konstantin's comments.
As a very basic - not to say fundamental - rule, one should avoid adding 
in the PMD RX/TX API any extra processing that can be handled at a 
higher level.
The only and self-sufficient reason is that we must avoid impacting 
performance on the critical path, in particular for those - usually the 
majority of - applications that do not need such extra operations, or 
better implement them at an upper level.

Maybe in a not-so-far future a proposal will come for forking a new open 
source fast-dpdk project aiming at providing API simplicity, 
zero overhead, modular design, and all those nice properties that 
everyone claims to seek :-)

Ivan

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [dpdk-dev] [PATCH 1/2] ethdev: add buffered tx api
  2016-02-12 16:40               ` Ivan Boule
@ 2016-02-12 17:33                 ` Bruce Richardson
  0 siblings, 0 replies; 43+ messages in thread
From: Bruce Richardson @ 2016-02-12 17:33 UTC (permalink / raw)
  To: Ivan Boule; +Cc: dev

On Fri, Feb 12, 2016 at 05:40:02PM +0100, Ivan Boule wrote:
> On 02/12/2016 12:44 PM, Ananyev, Konstantin wrote:
> >
> >>
> >>>-----Original Message-----
> ...
> >>
> >>In that case we don't need to make any changes at rte_ethdev.[h,c] to alloc/free/maintain tx_buffer inside each queue...
> >>It all will be upper layer responsibility.
> >>So no need to modify existing rte_ethdev structures/code.
> >>Again, no need for error callback - caller would check return value and decide what to do with unsent packets in the tx_buffer.
> >>
> >
> >Just to summarise why I think it is better to have tx buffering managed on the app level:
> >
> >1. avoid any ABI change.
> >2. Avoid extra changes in rte_ethdev.c: tx_queue_setup/tx_queue_stop.
> >3. Provides much more flexibility to the user:
> >    a) where to allocate space for tx_buffer (stack, heap, hugepages, etc).
> >    b) user can mix and match plain tx_burst() and   tx_buffer/tx_buffer_flush()
> >         in any way he fills it appropriate.
> >    c) user can change the size of tx_buffer without stop/re-config/start queue:
> >         just allocate new larger(smaller) tx_buffer & copy contents to the new one.
> >    d) user can preserve buffered packets through device restart circle:
> >         i.e if let say TX hang happened, and user has to do dev_stop/dev_start -
> >         contents of tx_buffer will stay unchanged and its contents could be
> >         (re-)transmitted after device is up again, or  through different port/queue if needed.
> >
> >As a drawbacks mentioned - tx error handling becomes less transparent...
> >But we can add error handling routine and it's user provided parameter
> >into struct rte_eth_dev_tx_buffer', something like that:
> >
> >+struct rte_eth_dev_tx_buffer {
> >+	buffer_tx_error_fn cbfn;
> >+	void *userdata;
> >+	unsigned nb_pkts;
> >+	uint64_t errors;
> >+	/**< Total number of queued packets to send that were dropped. */
> >+	struct rte_mbuf *pkts[];
> >+};
> >
> >Konstantin
> >
> 
> Just to enforce Konstantin's comments.
> As a very basic - not to say fundamental - rule, one should avoid adding in
> the PMD RX/TX API any extra processing that can be handled at a higher
> level.
> The only and self-sufficient reason is that we must avoid impacting
> performances on the critical path, in particular for those - usually the
> majority of - applications that do not need such extra operations, or better
> implement them at upper level.
> 
> Maybe in a not so far future will come a proposal for forking a new open
> source fast-dpdk project aiming at providing API simplicity, zero-overhead,
> modular design, and all those nice properties that every one claims to seek
> :-)
> 
> Ivan
> 
Hi Ivan,

I completely agree with your comments. However, none of the proposals for TX 
buffering would impact the existing fast-path processing paths. They simply add
an optional buffering layer above it - as is done by a very large number of our
sample apps. The point of this patchset is to reduce or eliminate this duplication
of code by centralising it in the libs.

Of the different proposed ways of doing this, my slight preference is for the
original one due to the simplicity of the APIs it provides, but the advantages
in flexibility provided by Konstantin's proposals may outweigh the additional
"ugliness" in the APIs.

Regards,
/Bruce

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [dpdk-dev] [PATCH v2 0/2] add support for buffered tx to ethdev
  2016-01-15 14:43 [dpdk-dev] [PATCH 0/2] add support for buffered tx to ethdev Tomasz Kulasek
                   ` (2 preceding siblings ...)
  2016-01-15 18:12 ` [dpdk-dev] [PATCH 0/2] add support for buffered tx to ethdev Stephen Hemminger
@ 2016-02-24 17:08 ` Tomasz Kulasek
  2016-02-24 17:08   ` [dpdk-dev] [PATCH v2 1/2] ethdev: add buffered tx api Tomasz Kulasek
                     ` (3 more replies)
  3 siblings, 4 replies; 43+ messages in thread
From: Tomasz Kulasek @ 2016-02-24 17:08 UTC (permalink / raw)
  To: dev

Many sample apps include internal buffering for single-packet-at-a-time
operation. Since this is such a common paradigm, this functionality is
better suited to being implemented in the ethdev API.

The new APIs in the ethdev library are:
* rte_eth_tx_buffer_init - initialize buffer
* rte_eth_tx_buffer - buffer up a single packet for future transmission
* rte_eth_tx_buffer_flush - flush any unsent buffered packets
* rte_eth_tx_buffer_set_err_callback - set up a callback to be called in
  case transmitting a buffered burst fails. By default, we just free the
  unsent packets.

As well as these, an additional reference callback is provided, which
frees the packets (as the default callback does), as well as updating a
user-provided counter, so that the number of dropped packets can be
tracked.

Based on feedback from the mailing list that buffer management facilities
in the user application are preferable to API simplicity, we decided to move
the internal buffer table, as well as the callback functions and user data,
from rte_eth_dev/rte_eth_dev_data into the application space.
This prevents ABI breakage and gives more flexibility in buffer management,
such as allocation, dynamic size changes, and reuse of buffers across many
ports or after a failure.


The following steps illustrate how tx buffers can be used in application:

1) Initialization

a) Allocate memory for a buffer

   struct rte_eth_dev_tx_buffer *buffer = rte_zmalloc_socket("tx_buffer",
           RTE_ETH_TX_BUFFER_SIZE(MAX_PKT_BURST), 0, socket_id);

   The RTE_ETH_TX_BUFFER_SIZE(size) macro computes the memory required to
   store "size" packets in the buffer.

b) Initialize allocated memory and set up default values. Threshold level
   must be lower than or equal to the MAX_PKT_BURST from 1a)

   rte_eth_tx_buffer_init(buffer, threshold);


c) Set error callback (optional)

   rte_eth_tx_buffer_set_err_callback(buffer, callback_fn, userdata);


2) Store packet "pkt" in the buffer; all buffered packets are sent to the
   queue_id on port_id when the number of packets reaches the threshold
   level set up in 1b)

   rte_eth_tx_buffer(port_id, queue_id, buffer, pkt);


3) Send all stored packets to the queue_id on port_id

   rte_eth_tx_buffer_flush(port_id, queue_id, buffer);


4) Flush buffer and free memory

   rte_eth_tx_buffer_flush(port_id, queue_id, buffer);
   ...
   rte_free(buffer);
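
Putting the steps together, a minimal setup could look like this (sketch;
MAX_PKT_BURST, socket_id, port_id, queue_id and pkt are application-defined):

   struct rte_eth_dev_tx_buffer *buffer;
   static uint64_t drop_count;

   buffer = rte_zmalloc_socket("tx_buffer",
           RTE_ETH_TX_BUFFER_SIZE(MAX_PKT_BURST), 0, socket_id);
   if (buffer == NULL)
       rte_exit(EXIT_FAILURE, "Cannot allocate tx buffer\n");

   rte_eth_tx_buffer_init(buffer, MAX_PKT_BURST);
   rte_eth_tx_buffer_set_err_callback(buffer,
           rte_eth_count_unsent_packet_callback, &drop_count);

   /* fast path: buffered send, a flush triggers at the threshold */
   rte_eth_tx_buffer(port_id, queue_id, buffer, pkt);

   /* drain timer / shutdown: push out anything still buffered */
   rte_eth_tx_buffer_flush(port_id, queue_id, buffer);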


v2 changes:
 - reworked to use new buffer model
 - buffer data and callbacks are removed from rte_eth_dev/rte_eth_dev_data,
   so this patch doesn't break the ABI anymore
 - introduced the RTE_ETH_TX_BUFFER_SIZE macro and rte_eth_tx_buffer_init
 - buffers are not attached to the port-queue
 - buffers can be allocated dynamically during application work
 - size of buffer can be changed without port restart


Tomasz Kulasek (2):
  ethdev: add buffered tx api
  examples: rework to use buffered tx

 examples/l2fwd-jobstats/main.c                     |  104 +++++------
 examples/l2fwd-keepalive/main.c                    |  100 ++++-------
 examples/l2fwd/main.c                              |  104 +++++------
 examples/l3fwd-acl/main.c                          |   92 ++++------
 examples/l3fwd-power/main.c                        |   89 ++++------
 examples/link_status_interrupt/main.c              |  107 +++++-------
 .../client_server_mp/mp_client/client.c            |  101 ++++++-----
 examples/multi_process/l2fwd_fork/main.c           |   97 +++++------
 examples/packet_ordering/main.c                    |  122 +++++++++----
 examples/qos_meter/main.c                          |   61 ++-----
 lib/librte_ether/rte_ethdev.c                      |   36 ++++
 lib/librte_ether/rte_ethdev.h                      |  182 +++++++++++++++++++-
 lib/librte_ether/rte_ether_version.map             |    9 +
 13 files changed, 662 insertions(+), 542 deletions(-)

-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [dpdk-dev] [PATCH v2 1/2] ethdev: add buffered tx api
  2016-02-24 17:08 ` [dpdk-dev] [PATCH v2 " Tomasz Kulasek
@ 2016-02-24 17:08   ` Tomasz Kulasek
  2016-03-08 22:52     ` Thomas Monjalon
  2016-02-24 17:08   ` [dpdk-dev] [PATCH v2 2/2] examples: rework to use " Tomasz Kulasek
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 43+ messages in thread
From: Tomasz Kulasek @ 2016-02-24 17:08 UTC (permalink / raw)
  To: dev

Many sample apps include internal buffering for single-packet-at-a-time
operation. Since this is such a common paradigm, this functionality is
better suited to being implemented in the ethdev API.

The new APIs in the ethdev library are:
* rte_eth_tx_buffer_init - initialize buffer
* rte_eth_tx_buffer - buffer up a single packet for future transmission
* rte_eth_tx_buffer_flush - flush any unsent buffered packets
* rte_eth_tx_buffer_set_err_callback - set up a callback to be called in
  case transmitting a buffered burst fails. By default, we just free the
  unsent packets.

As well as these, an additional reference callback is provided, which
frees the packets (as the default callback does), as well as updating a
user-provided counter, so that the number of dropped packets can be
tracked.

v2 changes:
 - reworked to use new buffer model
 - buffer data and callbacks are removed from rte_eth_dev/rte_eth_dev_data,
   so this patch doesn't break the ABI anymore
 - introduced the RTE_ETH_TX_BUFFER_SIZE macro and rte_eth_tx_buffer_init
 - buffers are not attached to the port-queue
 - buffers can be allocated dynamically during application work
 - size of buffer can be changed without port restart

Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
---
 lib/librte_ether/rte_ethdev.c          |   36 +++++++
 lib/librte_ether/rte_ethdev.h          |  182 +++++++++++++++++++++++++++++++-
 lib/librte_ether/rte_ether_version.map |    9 ++
 3 files changed, 226 insertions(+), 1 deletion(-)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index 756b234..b8ab747 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -1307,6 +1307,42 @@ rte_eth_tx_queue_setup(uint8_t port_id, uint16_t tx_queue_id,
 }
 
 void
+rte_eth_count_unsent_packet_callback(struct rte_mbuf **pkts, uint16_t unsent,
+		void *userdata)
+{
+	uint64_t *count = userdata;
+	unsigned i;
+
+	for (i = 0; i < unsent; i++)
+		rte_pktmbuf_free(pkts[i]);
+
+	*count += unsent;
+}
+
+int
+rte_eth_tx_buffer_set_err_callback(struct rte_eth_dev_tx_buffer *buffer,
+		buffer_tx_error_fn cbfn, void *userdata)
+{
+	buffer->cbfn = cbfn;
+	buffer->userdata = userdata;
+	return 0;
+}
+
+int
+rte_eth_tx_buffer_init(struct rte_eth_dev_tx_buffer *buffer, uint16_t size)
+{
+	if (buffer == NULL)
+		return -EINVAL;
+
+	buffer->size = size;
+	if (buffer->cbfn == NULL)
+		rte_eth_tx_buffer_set_err_callback(buffer,
+				rte_eth_count_unsent_packet_callback, (void *)&buffer->errors);
+
+	return 0;
+}
+
+void
 rte_eth_promiscuous_enable(uint8_t port_id)
 {
 	struct rte_eth_dev *dev;
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 16da821..b0d4932 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -2655,6 +2655,186 @@ rte_eth_tx_burst(uint8_t port_id, uint16_t queue_id,
 	return (*dev->tx_pkt_burst)(dev->data->tx_queues[queue_id], tx_pkts, nb_pkts);
 }
 
+typedef void (*buffer_tx_error_fn)(struct rte_mbuf **unsent, uint16_t count,
+		void *userdata);
+
+/**
+ * Structure used to buffer packets for future TX
+ * Used by APIs rte_eth_tx_buffer and rte_eth_tx_buffer_flush
+ */
+struct rte_eth_dev_tx_buffer {
+	unsigned nb_pkts;
+	uint64_t errors;
+	/**< Total number of queued packets to send that were dropped. */
+	buffer_tx_error_fn cbfn;
+	void *userdata;
+	uint16_t size;           /**< Size of buffer for buffered tx */
+	struct rte_mbuf *pkts[];
+};
+
+/**
+ * Calculate the size of the tx buffer.
+ *
+ * @param sz
+ *   Number of stored packets.
+ */
+#define RTE_ETH_TX_BUFFER_SIZE(sz) \
+	(sizeof(struct rte_eth_dev_tx_buffer) + (sz) * sizeof(struct rte_mbuf *))
+
+/**
+ * Initialize default values for buffered transmitting
+ *
+ * @param buffer
+ *   Tx buffer to be initialized.
+ * @param size
+ *   Buffer size
+ * @return
+ *   0 if no error
+ */
+int
+rte_eth_tx_buffer_init(struct rte_eth_dev_tx_buffer *buffer, uint16_t size);
+
+/**
+ * Send any packets queued up for transmission on a port and HW queue
+ *
+ * This causes an explicit flush of packets previously buffered via the
+ * rte_eth_tx_buffer() function. It returns the number of packets successfully
+ * sent to the NIC, and calls the error callback for any unsent packets. Unless
+ * explicitly set up otherwise, the default callback simply frees the unsent
+ * packets back to the owning mempool.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param queue_id
+ *   The index of the transmit queue through which output packets must be
+ *   sent.
+ *   The value must be in the range [0, nb_tx_queue - 1] previously supplied
+ *   to rte_eth_dev_configure().
+ * @param buffer
+ *   Buffer of packets to be transmitted.
+ * @return
+ *   The number of packets successfully sent to the Ethernet device. The error
+ *   callback is called for any packets which could not be sent.
+ */
+static inline uint16_t
+rte_eth_tx_buffer_flush(uint8_t port_id, uint16_t queue_id,
+		struct rte_eth_dev_tx_buffer *buffer)
+{
+	uint16_t sent;
+
+	uint16_t to_send = buffer->nb_pkts;
+
+	if (to_send == 0)
+		return 0;
+
+	sent = rte_eth_tx_burst(port_id, queue_id, buffer->pkts, to_send);
+
+	buffer->nb_pkts = 0;
+
+	/* All packets sent, or to be dealt with by callback below */
+	if (unlikely(sent != to_send))
+		buffer->cbfn(&buffer->pkts[sent], to_send - sent,
+				buffer->userdata);
+
+	return sent;
+}
+
+/**
+ * Buffer a single packet for future transmission on a port and queue
+ *
+ * This function takes a single mbuf/packet and buffers it for later
+ * transmission on the particular port and queue specified. Once the buffer is
+ * full of packets, an attempt will be made to transmit all the buffered
+ * packets. In case of error, where not all packets can be transmitted, a
+ * callback is called with the unsent packets as a parameter. If no callback
+ * is explicitly set up, the unsent packets are just freed back to the owning
+ * mempool. The function returns the number of packets actually sent, i.e.
+ * 0 if no buffer flush occurred, otherwise the number of packets successfully
+ * flushed.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param queue_id
+ *   The index of the transmit queue through which output packets must be
+ *   sent.
+ *   The value must be in the range [0, nb_tx_queue - 1] previously supplied
+ *   to rte_eth_dev_configure().
+ * @param buffer
+ *   Buffer used to collect packets to be sent.
+ * @param tx_pkt
+ *   Pointer to the packet mbuf to be sent.
+ * @return
+ *   0 = packet has been buffered for later transmission
+ *   N > 0 = packet has been buffered, and the buffer was subsequently flushed,
+ *     causing N packets to be sent, and the error callback to be called for
+ *     the rest.
+ */
+static inline uint16_t __attribute__((always_inline))
+rte_eth_tx_buffer(uint8_t port_id, uint16_t queue_id,
+		struct rte_eth_dev_tx_buffer *buffer, struct rte_mbuf *tx_pkt)
+{
+	buffer->pkts[buffer->nb_pkts++] = tx_pkt;
+	if (buffer->nb_pkts < buffer->size)
+		return 0;
+
+	return rte_eth_tx_buffer_flush(port_id, queue_id, buffer);
+}
+
+/**
+ * Configure a callback for buffered packets which cannot be sent
+ *
+ * Register a specific callback to be called when an attempt is made to send
+ * all packets buffered on an ethernet port, but not all packets can
+ * successfully be sent. The callback registered here will be called only
+ * from within rte_eth_tx_buffer() and rte_eth_tx_buffer_flush().
+ * The default callback configured for each queue simply frees the packets
+ * back to the owning mempool. If additional behaviour is required, for
+ * example, to count dropped packets, or to retry transmission of packets
+ * which cannot be sent, this function should be used to register a suitable
+ * callback function to implement the desired behaviour.
+ * The example callback "rte_eth_count_unsent_packet_callback()" is
+ * provided for reference.
+ *
+ * @param buffer
+ *   The tx buffer structure for which the callback is to be set.
+ * @param cbfn
+ *   The function to be used as the callback.
+ * @param userdata
+ *   Arbitrary parameter to be passed to the callback function
+ * @return
+ *   0 on success, or -1 on error with rte_errno set appropriately
+ */
+int
+rte_eth_tx_buffer_set_err_callback(struct rte_eth_dev_tx_buffer *buffer,
+		buffer_tx_error_fn cbfn, void *userdata);
+
+/**
+ * Callback function for tracking unsent buffered packets.
+ *
+ * This function can be passed to rte_eth_tx_buffer_set_err_callback() to
+ * adjust the default behaviour when buffered packets cannot be sent. This
+ * function drops any unsent packets, but also updates a user-supplied counter
+ * to track the overall number of packets dropped. The counter should be a
+ * uint64_t variable.
+ *
+ * NOTE: this function should not be called directly; instead, it should be
+ *       used as a callback for packet buffering.
+ *
+ * NOTE: when configuring this function as a callback with
+ *       rte_eth_tx_buffer_set_err_callback(), the final userdata parameter
+ *       should point to a uint64_t value.
+ *
+ * @param pkts
+ *   The previously buffered packets which could not be sent
+ * @param unsent
+ *   The number of unsent packets in the pkts array
+ * @param userdata
+ *   Pointer to a uint64_t value, which will be incremented by unsent
+ */
+void
+rte_eth_count_unsent_packet_callback(struct rte_mbuf **pkts, uint16_t unsent,
+		void *userdata);
+
 /**
  * The eth device event type for interrupt, and maybe others in the future.
  */
diff --git a/lib/librte_ether/rte_ether_version.map b/lib/librte_ether/rte_ether_version.map
index d8db24d..ad11c71 100644
--- a/lib/librte_ether/rte_ether_version.map
+++ b/lib/librte_ether/rte_ether_version.map
@@ -117,3 +117,12 @@ DPDK_2.2 {
 
 	local: *;
 };
+
+DPDK_2.3 {
+	global:
+
+	rte_eth_count_unsent_packet_callback;
+	rte_eth_tx_buffer_init;
+	rte_eth_tx_buffer_set_err_callback;
+
+} DPDK_2.2;
-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 43+ messages in thread
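
Taken together, the calls above compose into a short TX path. The sketch
below is a minimal illustration of the API as documented in this patch;
the port id, queue id and the 32-packet threshold are assumptions for the
example, not values mandated by the patch.

    #include <rte_ethdev.h>
    #include <rte_malloc.h>

    static uint64_t drop_counter; /* incremented by the count callback */
    static struct rte_eth_dev_tx_buffer *tx_buf;

    /* Allocate and set up one TX buffer (assumed threshold: 32 packets) */
    static int
    setup_tx_buffering(uint8_t port_id)
    {
        tx_buf = rte_zmalloc_socket("tx_buffer",
                RTE_ETH_TX_BUFFER_SIZE(32), 0,
                rte_eth_dev_socket_id(port_id));
        if (tx_buf == NULL)
            return -1;

        rte_eth_tx_buffer_init(tx_buf, 32);

        /* optional: count drops instead of silently freeing them */
        return rte_eth_tx_buffer_set_err_callback(tx_buf,
                rte_eth_count_unsent_packet_callback, &drop_counter);
    }

    /* Datapath: queue one packet; a full buffer triggers a burst send */
    static inline void
    tx_one_packet(uint8_t port_id, struct rte_mbuf *m)
    {
        rte_eth_tx_buffer(port_id, 0, tx_buf, m);
    }

    /* Drain timer: push out anything still buffered */
    static inline void
    tx_drain(uint8_t port_id)
    {
        rte_eth_tx_buffer_flush(port_id, 0, tx_buf);
    }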

* [dpdk-dev] [PATCH v2 2/2] examples: rework to use buffered tx api
  2016-02-24 17:08 ` [dpdk-dev] [PATCH v2 " Tomasz Kulasek
  2016-02-24 17:08   ` [dpdk-dev] [PATCH v2 1/2] ethdev: add buffered tx api Tomasz Kulasek
@ 2016-02-24 17:08   ` Tomasz Kulasek
  2016-02-25 16:17   ` [dpdk-dev] [PATCH v2 0/2] add support for buffered tx to ethdev Ananyev, Konstantin
  2016-03-10 10:57   ` [dpdk-dev] [PATCH v3 " Tomasz Kulasek
  3 siblings, 0 replies; 43+ messages in thread
From: Tomasz Kulasek @ 2016-02-24 17:08 UTC (permalink / raw)
  To: dev

The internal buffering of packets for TX in sample apps is no longer
needed, so this patch replaces that code with calls to the new
rte_eth_tx_buffer* APIs in:

* l2fwd-jobstats
* l2fwd-keepalive
* l2fwd
* l3fwd-acl
* l3fwd-power
* link_status_interrupt
* client_server_mp
* l2fwd_fork
* packet_ordering
* qos_meter

v2 changes
 - rework synced with tx buffer API changes

Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
---
 examples/l2fwd-jobstats/main.c                     |  104 +++++++----------
 examples/l2fwd-keepalive/main.c                    |  100 ++++++----------
 examples/l2fwd/main.c                              |  104 +++++++----------
 examples/l3fwd-acl/main.c                          |   92 ++++++---------
 examples/l3fwd-power/main.c                        |   89 ++++++--------
 examples/link_status_interrupt/main.c              |  107 +++++++----------
 .../client_server_mp/mp_client/client.c            |  101 +++++++++-------
 examples/multi_process/l2fwd_fork/main.c           |   97 +++++++---------
 examples/packet_ordering/main.c                    |  122 ++++++++++++++------
 examples/qos_meter/main.c                          |   61 +++-------
 10 files changed, 436 insertions(+), 541 deletions(-)
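
The rework follows the same pattern in each application: a per-port
rte_eth_dev_tx_buffer replaces the local mbuf table, the forwarding path
queues packets with rte_eth_tx_buffer(), and the periodic drain flushes
with rte_eth_tx_buffer_flush(). In sketch form (variable names and the
queue id 0 follow the l2fwd diffs below):

    /* per-port buffers, allocated at init time as in the diffs below */
    struct rte_eth_dev_tx_buffer *tx_buffer[RTE_MAX_ETHPORTS];

    /* forwarding path: queue the packet, count any burst actually sent */
    sent = rte_eth_tx_buffer(dst_port, 0, tx_buffer[dst_port], m);
    if (sent)
        port_statistics[dst_port].tx += sent;

    /* drain path, run when the TSC-based drain timer expires */
    sent = rte_eth_tx_buffer_flush(portid, 0, tx_buffer[portid]);
    if (sent)
        port_statistics[portid].tx += sent;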

diff --git a/examples/l2fwd-jobstats/main.c b/examples/l2fwd-jobstats/main.c
index 7b59f4e..f159168 100644
--- a/examples/l2fwd-jobstats/main.c
+++ b/examples/l2fwd-jobstats/main.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -41,6 +41,7 @@
 #include <rte_alarm.h>
 #include <rte_common.h>
 #include <rte_log.h>
+#include <rte_malloc.h>
 #include <rte_memory.h>
 #include <rte_memcpy.h>
 #include <rte_memzone.h>
@@ -97,18 +98,12 @@ static uint32_t l2fwd_dst_ports[RTE_MAX_ETHPORTS];
 
 static unsigned int l2fwd_rx_queue_per_lcore = 1;
 
-struct mbuf_table {
-	uint64_t next_flush_time;
-	unsigned len;
-	struct rte_mbuf *mbufs[MAX_PKT_BURST];
-};
-
 #define MAX_RX_QUEUE_PER_LCORE 16
 #define MAX_TX_QUEUE_PER_PORT 16
 struct lcore_queue_conf {
 	unsigned n_rx_port;
 	unsigned rx_port_list[MAX_RX_QUEUE_PER_LCORE];
-	struct mbuf_table tx_mbufs[RTE_MAX_ETHPORTS];
+	uint64_t next_flush_time[RTE_MAX_ETHPORTS];
 
 	struct rte_timer rx_timers[MAX_RX_QUEUE_PER_LCORE];
 	struct rte_jobstats port_fwd_jobs[MAX_RX_QUEUE_PER_LCORE];
@@ -123,6 +118,8 @@ struct lcore_queue_conf {
 } __rte_cache_aligned;
 struct lcore_queue_conf lcore_queue_conf[RTE_MAX_LCORE];
 
+struct rte_eth_dev_tx_buffer *tx_buffer[RTE_MAX_ETHPORTS];
+
 static const struct rte_eth_conf port_conf = {
 	.rxmode = {
 		.split_hdr_size = 0,
@@ -373,59 +370,14 @@ show_stats_cb(__rte_unused void *param)
 	rte_eal_alarm_set(timer_period * US_PER_S, show_stats_cb, NULL);
 }
 
-/* Send the burst of packets on an output interface */
-static void
-l2fwd_send_burst(struct lcore_queue_conf *qconf, uint8_t port)
-{
-	struct mbuf_table *m_table;
-	uint16_t ret;
-	uint16_t queueid = 0;
-	uint16_t n;
-
-	m_table = &qconf->tx_mbufs[port];
-	n = m_table->len;
-
-	m_table->next_flush_time = rte_get_timer_cycles() + drain_tsc;
-	m_table->len = 0;
-
-	ret = rte_eth_tx_burst(port, queueid, m_table->mbufs, n);
-
-	port_statistics[port].tx += ret;
-	if (unlikely(ret < n)) {
-		port_statistics[port].dropped += (n - ret);
-		do {
-			rte_pktmbuf_free(m_table->mbufs[ret]);
-		} while (++ret < n);
-	}
-}
-
-/* Enqueue packets for TX and prepare them to be sent */
-static int
-l2fwd_send_packet(struct rte_mbuf *m, uint8_t port)
-{
-	const unsigned lcore_id = rte_lcore_id();
-	struct lcore_queue_conf *qconf = &lcore_queue_conf[lcore_id];
-	struct mbuf_table *m_table = &qconf->tx_mbufs[port];
-	uint16_t len = qconf->tx_mbufs[port].len;
-
-	m_table->mbufs[len] = m;
-
-	len++;
-	m_table->len = len;
-
-	/* Enough pkts to be sent. */
-	if (unlikely(len == MAX_PKT_BURST))
-		l2fwd_send_burst(qconf, port);
-
-	return 0;
-}
-
 static void
 l2fwd_simple_forward(struct rte_mbuf *m, unsigned portid)
 {
 	struct ether_hdr *eth;
 	void *tmp;
+	int sent;
 	unsigned dst_port;
+	struct rte_eth_dev_tx_buffer *buffer;
 
 	dst_port = l2fwd_dst_ports[portid];
 	eth = rte_pktmbuf_mtod(m, struct ether_hdr *);
@@ -437,7 +389,10 @@ l2fwd_simple_forward(struct rte_mbuf *m, unsigned portid)
 	/* src addr */
 	ether_addr_copy(&l2fwd_ports_eth_addr[dst_port], &eth->s_addr);
 
-	l2fwd_send_packet(m, (uint8_t) dst_port);
+	buffer = tx_buffer[dst_port];
+	sent = rte_eth_tx_buffer(dst_port, 0, buffer, m);
+	if (sent)
+		port_statistics[dst_port].tx += sent;
 }
 
 static void
@@ -511,8 +466,10 @@ l2fwd_flush_job(__rte_unused struct rte_timer *timer, __rte_unused void *arg)
 	uint64_t now;
 	unsigned lcore_id;
 	struct lcore_queue_conf *qconf;
-	struct mbuf_table *m_table;
 	uint8_t portid;
+	unsigned i;
+	uint32_t sent;
+	struct rte_eth_dev_tx_buffer *buffer;
 
 	lcore_id = rte_lcore_id();
 	qconf = &lcore_queue_conf[lcore_id];
@@ -522,14 +479,20 @@ l2fwd_flush_job(__rte_unused struct rte_timer *timer, __rte_unused void *arg)
 	now = rte_get_timer_cycles();
 	lcore_id = rte_lcore_id();
 	qconf = &lcore_queue_conf[lcore_id];
-	for (portid = 0; portid < RTE_MAX_ETHPORTS; portid++) {
-		m_table = &qconf->tx_mbufs[portid];
-		if (m_table->len == 0 || m_table->next_flush_time <= now)
+
+	for (i = 0; i < qconf->n_rx_port; i++) {
+		portid = l2fwd_dst_ports[qconf->rx_port_list[i]];
+
+		if (qconf->next_flush_time[portid] <= now)
 			continue;
 
-		l2fwd_send_burst(qconf, portid);
-	}
+		buffer = tx_buffer[portid];
+		sent = rte_eth_tx_buffer_flush(portid, 0, buffer);
+		if (sent)
+			port_statistics[portid].tx += sent;
 
+		qconf->next_flush_time[portid] = rte_get_timer_cycles() + drain_tsc;
+	}
 
 	/* Pass target to indicate that this job is happy with the time
 	 * interval in which it was called. */
@@ -938,6 +901,23 @@ main(int argc, char **argv)
 			rte_exit(EXIT_FAILURE, "rte_eth_tx_queue_setup:err=%d, port=%u\n",
 				ret, (unsigned) portid);
 
+		/* Initialize TX buffers */
+		tx_buffer[portid] = rte_zmalloc_socket("tx_buffer",
+				RTE_ETH_TX_BUFFER_SIZE(MAX_PKT_BURST), 0,
+				rte_eth_dev_socket_id(portid));
+		if (tx_buffer[portid] == NULL)
+			rte_exit(EXIT_FAILURE, "Cannot allocate buffer for tx on port %u\n",
+					(unsigned) portid);
+
+		rte_eth_tx_buffer_init(tx_buffer[portid], MAX_PKT_BURST);
+
+		ret = rte_eth_tx_buffer_set_err_callback(tx_buffer[portid],
+				rte_eth_count_unsent_packet_callback,
+				&port_statistics[portid].dropped);
+		if (ret < 0)
+			rte_exit(EXIT_FAILURE, "Cannot set error callback for "
+					"tx buffer on port %u\n", (unsigned) portid);
+
 		/* Start device */
 		ret = rte_eth_dev_start(portid);
 		if (ret < 0)
diff --git a/examples/l2fwd-keepalive/main.c b/examples/l2fwd-keepalive/main.c
index f4d52f2..3ae4750 100644
--- a/examples/l2fwd-keepalive/main.c
+++ b/examples/l2fwd-keepalive/main.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -47,6 +47,7 @@
 
 #include <rte_common.h>
 #include <rte_log.h>
+#include <rte_malloc.h>
 #include <rte_memory.h>
 #include <rte_memcpy.h>
 #include <rte_memzone.h>
@@ -97,21 +98,16 @@ static uint32_t l2fwd_dst_ports[RTE_MAX_ETHPORTS];
 
 static unsigned int l2fwd_rx_queue_per_lcore = 1;
 
-struct mbuf_table {
-	unsigned len;
-	struct rte_mbuf *m_table[MAX_PKT_BURST];
-};
-
 #define MAX_RX_QUEUE_PER_LCORE 16
 #define MAX_TX_QUEUE_PER_PORT 16
 struct lcore_queue_conf {
 	unsigned n_rx_port;
 	unsigned rx_port_list[MAX_RX_QUEUE_PER_LCORE];
-	struct mbuf_table tx_mbufs[RTE_MAX_ETHPORTS];
-
 } __rte_cache_aligned;
 struct lcore_queue_conf lcore_queue_conf[RTE_MAX_LCORE];
 
+struct rte_eth_dev_tx_buffer *tx_buffer[RTE_MAX_ETHPORTS];
+
 static const struct rte_eth_conf port_conf = {
 	.rxmode = {
 		.split_hdr_size = 0,
@@ -192,58 +188,14 @@ print_stats(__attribute__((unused)) struct rte_timer *ptr_timer,
 	printf("\n====================================================\n");
 }
 
-/* Send the burst of packets on an output interface */
-static int
-l2fwd_send_burst(struct lcore_queue_conf *qconf, unsigned n, uint8_t port)
-{
-	struct rte_mbuf **m_table;
-	unsigned ret;
-	unsigned queueid = 0;
-
-	m_table = (struct rte_mbuf **)qconf->tx_mbufs[port].m_table;
-
-	ret = rte_eth_tx_burst(port, (uint16_t) queueid, m_table, (uint16_t) n);
-	port_statistics[port].tx += ret;
-	if (unlikely(ret < n)) {
-		port_statistics[port].dropped += (n - ret);
-		do {
-			rte_pktmbuf_free(m_table[ret]);
-		} while (++ret < n);
-	}
-
-	return 0;
-}
-
-/* Enqueue packets for TX and prepare them to be sent */
-static int
-l2fwd_send_packet(struct rte_mbuf *m, uint8_t port)
-{
-	unsigned lcore_id, len;
-	struct lcore_queue_conf *qconf;
-
-	lcore_id = rte_lcore_id();
-
-	qconf = &lcore_queue_conf[lcore_id];
-	len = qconf->tx_mbufs[port].len;
-	qconf->tx_mbufs[port].m_table[len] = m;
-	len++;
-
-	/* enough pkts to be sent */
-	if (unlikely(len == MAX_PKT_BURST)) {
-		l2fwd_send_burst(qconf, MAX_PKT_BURST, port);
-		len = 0;
-	}
-
-	qconf->tx_mbufs[port].len = len;
-	return 0;
-}
-
 static void
 l2fwd_simple_forward(struct rte_mbuf *m, unsigned portid)
 {
 	struct ether_hdr *eth;
 	void *tmp;
+	int sent;
 	unsigned dst_port;
+	struct rte_eth_dev_tx_buffer *buffer;
 
 	dst_port = l2fwd_dst_ports[portid];
 	eth = rte_pktmbuf_mtod(m, struct ether_hdr *);
@@ -255,7 +207,10 @@ l2fwd_simple_forward(struct rte_mbuf *m, unsigned portid)
 	/* src addr */
 	ether_addr_copy(&l2fwd_ports_eth_addr[dst_port], &eth->s_addr);
 
-	l2fwd_send_packet(m, (uint8_t) dst_port);
+	buffer = tx_buffer[dst_port];
+	sent = rte_eth_tx_buffer(dst_port, 0, buffer, m);
+	if (sent)
+		port_statistics[dst_port].tx += sent;
 }
 
 /* main processing loop */
@@ -264,12 +219,14 @@ l2fwd_main_loop(void)
 {
 	struct rte_mbuf *pkts_burst[MAX_PKT_BURST];
 	struct rte_mbuf *m;
+	int sent;
 	unsigned lcore_id;
 	uint64_t prev_tsc, diff_tsc, cur_tsc;
 	unsigned i, j, portid, nb_rx;
 	struct lcore_queue_conf *qconf;
 	const uint64_t drain_tsc = (rte_get_tsc_hz() + US_PER_S - 1)
 		/ US_PER_S * BURST_TX_DRAIN_US;
+	struct rte_eth_dev_tx_buffer *buffer;
 
 	prev_tsc = 0;
 
@@ -312,13 +269,15 @@ l2fwd_main_loop(void)
 		diff_tsc = cur_tsc - prev_tsc;
 		if (unlikely(diff_tsc > drain_tsc)) {
 
-			for (portid = 0; portid < RTE_MAX_ETHPORTS; portid++) {
-				if (qconf->tx_mbufs[portid].len == 0)
-					continue;
-				l2fwd_send_burst(&lcore_queue_conf[lcore_id],
-						 qconf->tx_mbufs[portid].len,
-						 (uint8_t) portid);
-				qconf->tx_mbufs[portid].len = 0;
+			for (i = 0; i < qconf->n_rx_port; i++) {
+
+				portid = l2fwd_dst_ports[qconf->rx_port_list[i]];
+				buffer = tx_buffer[portid];
+
+				sent = rte_eth_tx_buffer_flush(portid, 0, buffer);
+				if (sent)
+					port_statistics[portid].tx += sent;
+
 			}
 
 			prev_tsc = cur_tsc;
@@ -713,6 +672,23 @@ main(int argc, char **argv)
 				"rte_eth_tx_queue_setup:err=%d, port=%u\n",
 				ret, (unsigned) portid);
 
+		/* Initialize TX buffers */
+		tx_buffer[portid] = rte_zmalloc_socket("tx_buffer",
+				RTE_ETH_TX_BUFFER_SIZE(MAX_PKT_BURST), 0,
+				rte_eth_dev_socket_id(portid));
+		if (tx_buffer[portid] == NULL)
+			rte_exit(EXIT_FAILURE, "Cannot allocate buffer for tx on port %u\n",
+					(unsigned) portid);
+
+		rte_eth_tx_buffer_init(tx_buffer[portid], MAX_PKT_BURST);
+
+		ret = rte_eth_tx_buffer_set_err_callback(tx_buffer[portid],
+				rte_eth_count_unsent_packet_callback,
+				&port_statistics[portid].dropped);
+		if (ret < 0)
+			rte_exit(EXIT_FAILURE, "Cannot set error callback for "
+					"tx buffer on port %u\n", (unsigned) portid);
+
 		/* Start device */
 		ret = rte_eth_dev_start(portid);
 		if (ret < 0)
diff --git a/examples/l2fwd/main.c b/examples/l2fwd/main.c
index f35d8a1..b42b985 100644
--- a/examples/l2fwd/main.c
+++ b/examples/l2fwd/main.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -49,6 +49,7 @@
 
 #include <rte_common.h>
 #include <rte_log.h>
+#include <rte_malloc.h>
 #include <rte_memory.h>
 #include <rte_memcpy.h>
 #include <rte_memzone.h>
@@ -99,21 +100,16 @@ static uint32_t l2fwd_dst_ports[RTE_MAX_ETHPORTS];
 
 static unsigned int l2fwd_rx_queue_per_lcore = 1;
 
-struct mbuf_table {
-	unsigned len;
-	struct rte_mbuf *m_table[MAX_PKT_BURST];
-};
-
 #define MAX_RX_QUEUE_PER_LCORE 16
 #define MAX_TX_QUEUE_PER_PORT 16
 struct lcore_queue_conf {
 	unsigned n_rx_port;
 	unsigned rx_port_list[MAX_RX_QUEUE_PER_LCORE];
-	struct mbuf_table tx_mbufs[RTE_MAX_ETHPORTS];
-
 } __rte_cache_aligned;
 struct lcore_queue_conf lcore_queue_conf[RTE_MAX_LCORE];
 
+static struct rte_eth_dev_tx_buffer *tx_buffer[RTE_MAX_ETHPORTS];
+
 static const struct rte_eth_conf port_conf = {
 	.rxmode = {
 		.split_hdr_size = 0,
@@ -189,58 +185,14 @@ print_stats(void)
 	printf("\n====================================================\n");
 }
 
-/* Send the burst of packets on an output interface */
-static int
-l2fwd_send_burst(struct lcore_queue_conf *qconf, unsigned n, uint8_t port)
-{
-	struct rte_mbuf **m_table;
-	unsigned ret;
-	unsigned queueid =0;
-
-	m_table = (struct rte_mbuf **)qconf->tx_mbufs[port].m_table;
-
-	ret = rte_eth_tx_burst(port, (uint16_t) queueid, m_table, (uint16_t) n);
-	port_statistics[port].tx += ret;
-	if (unlikely(ret < n)) {
-		port_statistics[port].dropped += (n - ret);
-		do {
-			rte_pktmbuf_free(m_table[ret]);
-		} while (++ret < n);
-	}
-
-	return 0;
-}
-
-/* Enqueue packets for TX and prepare them to be sent */
-static int
-l2fwd_send_packet(struct rte_mbuf *m, uint8_t port)
-{
-	unsigned lcore_id, len;
-	struct lcore_queue_conf *qconf;
-
-	lcore_id = rte_lcore_id();
-
-	qconf = &lcore_queue_conf[lcore_id];
-	len = qconf->tx_mbufs[port].len;
-	qconf->tx_mbufs[port].m_table[len] = m;
-	len++;
-
-	/* enough pkts to be sent */
-	if (unlikely(len == MAX_PKT_BURST)) {
-		l2fwd_send_burst(qconf, MAX_PKT_BURST, port);
-		len = 0;
-	}
-
-	qconf->tx_mbufs[port].len = len;
-	return 0;
-}
-
 static void
 l2fwd_simple_forward(struct rte_mbuf *m, unsigned portid)
 {
 	struct ether_hdr *eth;
 	void *tmp;
 	unsigned dst_port;
+	int sent;
+	struct rte_eth_dev_tx_buffer *buffer;
 
 	dst_port = l2fwd_dst_ports[portid];
 	eth = rte_pktmbuf_mtod(m, struct ether_hdr *);
@@ -252,7 +204,10 @@ l2fwd_simple_forward(struct rte_mbuf *m, unsigned portid)
 	/* src addr */
 	ether_addr_copy(&l2fwd_ports_eth_addr[dst_port], &eth->s_addr);
 
-	l2fwd_send_packet(m, (uint8_t) dst_port);
+	buffer = tx_buffer[dst_port];
+	sent = rte_eth_tx_buffer(dst_port, 0, buffer, m);
+	if (sent)
+		port_statistics[dst_port].tx += sent;
 }
 
 /* main processing loop */
@@ -261,11 +216,14 @@ l2fwd_main_loop(void)
 {
 	struct rte_mbuf *pkts_burst[MAX_PKT_BURST];
 	struct rte_mbuf *m;
+	int sent;
 	unsigned lcore_id;
 	uint64_t prev_tsc, diff_tsc, cur_tsc, timer_tsc;
 	unsigned i, j, portid, nb_rx;
 	struct lcore_queue_conf *qconf;
-	const uint64_t drain_tsc = (rte_get_tsc_hz() + US_PER_S - 1) / US_PER_S * BURST_TX_DRAIN_US;
+	const uint64_t drain_tsc = (rte_get_tsc_hz() + US_PER_S - 1) / US_PER_S *
+			BURST_TX_DRAIN_US;
+	struct rte_eth_dev_tx_buffer *buffer;
 
 	prev_tsc = 0;
 	timer_tsc = 0;
@@ -285,6 +243,7 @@ l2fwd_main_loop(void)
 		portid = qconf->rx_port_list[i];
 		RTE_LOG(INFO, L2FWD, " -- lcoreid=%u portid=%u\n", lcore_id,
 			portid);
+
 	}
 
 	while (!force_quit) {
@@ -297,13 +256,15 @@ l2fwd_main_loop(void)
 		diff_tsc = cur_tsc - prev_tsc;
 		if (unlikely(diff_tsc > drain_tsc)) {
 
-			for (portid = 0; portid < RTE_MAX_ETHPORTS; portid++) {
-				if (qconf->tx_mbufs[portid].len == 0)
-					continue;
-				l2fwd_send_burst(&lcore_queue_conf[lcore_id],
-						 qconf->tx_mbufs[portid].len,
-						 (uint8_t) portid);
-				qconf->tx_mbufs[portid].len = 0;
+			for (i = 0; i < qconf->n_rx_port; i++) {
+
+				portid = l2fwd_dst_ports[qconf->rx_port_list[i]];
+				buffer = tx_buffer[portid];
+
+				sent = rte_eth_tx_buffer_flush(portid, 0, buffer);
+				if (sent)
+					port_statistics[portid].tx += sent;
+
 			}
 
 			/* if timer is enabled */
@@ -688,6 +649,23 @@ main(int argc, char **argv)
 			rte_exit(EXIT_FAILURE, "rte_eth_tx_queue_setup:err=%d, port=%u\n",
 				ret, (unsigned) portid);
 
+		/* Initialize TX buffers */
+		tx_buffer[portid] = rte_zmalloc_socket("tx_buffer",
+				RTE_ETH_TX_BUFFER_SIZE(MAX_PKT_BURST), 0,
+				rte_eth_dev_socket_id(portid));
+		if (tx_buffer[portid] == NULL)
+			rte_exit(EXIT_FAILURE, "Cannot allocate buffer for tx on port %u\n",
+					(unsigned) portid);
+
+		rte_eth_tx_buffer_init(tx_buffer[portid], MAX_PKT_BURST);
+
+		ret = rte_eth_tx_buffer_set_err_callback(tx_buffer[portid],
+				rte_eth_count_unsent_packet_callback,
+				&port_statistics[portid].dropped);
+		if (ret < 0)
+			rte_exit(EXIT_FAILURE, "Cannot set error callback for "
+					"tx buffer on port %u\n", (unsigned) portid);
+
 		/* Start device */
 		ret = rte_eth_dev_start(portid);
 		if (ret < 0)
diff --git a/examples/l3fwd-acl/main.c b/examples/l3fwd-acl/main.c
index f676d14..3a895b7 100644
--- a/examples/l3fwd-acl/main.c
+++ b/examples/l3fwd-acl/main.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -119,11 +119,6 @@ static uint32_t enabled_port_mask;
 static int promiscuous_on; /**< Ports set in promiscuous mode off by default. */
 static int numa_on = 1; /**< NUMA is enabled by default. */
 
-struct mbuf_table {
-	uint16_t len;
-	struct rte_mbuf *m_table[MAX_PKT_BURST];
-};
-
 struct lcore_rx_queue {
 	uint8_t port_id;
 	uint8_t queue_id;
@@ -187,7 +182,7 @@ static struct rte_mempool *pktmbuf_pool[NB_SOCKETS];
 static inline int
 is_valid_ipv4_pkt(struct ipv4_hdr *pkt, uint32_t link_len);
 #endif
-static inline int
+static inline void
 send_single_packet(struct rte_mbuf *m, uint8_t port);
 
 #define MAX_ACL_RULE_NUM	100000
@@ -1291,56 +1286,26 @@ app_acl_init(void)
 struct lcore_conf {
 	uint16_t n_rx_queue;
 	struct lcore_rx_queue rx_queue_list[MAX_RX_QUEUE_PER_LCORE];
+	uint16_t n_tx_port;
+	uint16_t tx_port_id[RTE_MAX_ETHPORTS];
 	uint16_t tx_queue_id[RTE_MAX_ETHPORTS];
-	struct mbuf_table tx_mbufs[RTE_MAX_ETHPORTS];
+	struct rte_eth_dev_tx_buffer *tx_buffer[RTE_MAX_ETHPORTS];
 } __rte_cache_aligned;
 
 static struct lcore_conf lcore_conf[RTE_MAX_LCORE];
 
-/* Send burst of packets on an output interface */
-static inline int
-send_burst(struct lcore_conf *qconf, uint16_t n, uint8_t port)
-{
-	struct rte_mbuf **m_table;
-	int ret;
-	uint16_t queueid;
-
-	queueid = qconf->tx_queue_id[port];
-	m_table = (struct rte_mbuf **)qconf->tx_mbufs[port].m_table;
-
-	ret = rte_eth_tx_burst(port, queueid, m_table, n);
-	if (unlikely(ret < n)) {
-		do {
-			rte_pktmbuf_free(m_table[ret]);
-		} while (++ret < n);
-	}
-
-	return 0;
-}
-
 /* Enqueue a single packet, and send burst if queue is filled */
-static inline int
+static inline void
 send_single_packet(struct rte_mbuf *m, uint8_t port)
 {
 	uint32_t lcore_id;
-	uint16_t len;
 	struct lcore_conf *qconf;
 
 	lcore_id = rte_lcore_id();
 
 	qconf = &lcore_conf[lcore_id];
-	len = qconf->tx_mbufs[port].len;
-	qconf->tx_mbufs[port].m_table[len] = m;
-	len++;
-
-	/* enough pkts to be sent */
-	if (unlikely(len == MAX_PKT_BURST)) {
-		send_burst(qconf, MAX_PKT_BURST, port);
-		len = 0;
-	}
-
-	qconf->tx_mbufs[port].len = len;
-	return 0;
+	rte_eth_tx_buffer(port, qconf->tx_queue_id[port],
+			qconf->tx_buffer[port], m);
 }
 
 #ifdef DO_RFC_1812_CHECKS
@@ -1428,20 +1393,12 @@ main_loop(__attribute__((unused)) void *dummy)
 		 */
 		diff_tsc = cur_tsc - prev_tsc;
 		if (unlikely(diff_tsc > drain_tsc)) {
-
-			/*
-			 * This could be optimized (use queueid instead of
-			 * portid), but it is not called so often
-			 */
-			for (portid = 0; portid < RTE_MAX_ETHPORTS; portid++) {
-				if (qconf->tx_mbufs[portid].len == 0)
-					continue;
-				send_burst(&lcore_conf[lcore_id],
-					qconf->tx_mbufs[portid].len,
-					portid);
-				qconf->tx_mbufs[portid].len = 0;
+			for (i = 0; i < qconf->n_tx_port; ++i) {
+				portid = qconf->tx_port_id[i];
+				rte_eth_tx_buffer_flush(portid,
+						qconf->tx_queue_id[portid],
+						qconf->tx_buffer[portid]);
 			}
-
 			prev_tsc = cur_tsc;
 		}
 
@@ -1936,6 +1893,7 @@ main(int argc, char **argv)
 	unsigned lcore_id;
 	uint32_t n_tx_queue, nb_lcores;
 	uint8_t portid, nb_rx_queue, queue, socketid;
+	uint8_t nb_tx_port;
 
 	/* init EAL */
 	ret = rte_eal_init(argc, argv);
@@ -1968,6 +1926,7 @@ main(int argc, char **argv)
 		rte_exit(EXIT_FAILURE, "app_acl_init failed\n");
 
 	nb_lcores = rte_lcore_count();
+	nb_tx_port = 0;
 
 	/* initialize all ports */
 	for (portid = 0; portid < nb_ports; portid++) {
@@ -2003,6 +1962,22 @@ main(int argc, char **argv)
 		if (ret < 0)
 			rte_exit(EXIT_FAILURE, "init_mem failed\n");
 
+		for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
+			if (rte_lcore_is_enabled(lcore_id) == 0)
+				continue;
+
+			/* Initialize TX buffers */
+			qconf = &lcore_conf[lcore_id];
+			qconf->tx_buffer[portid] = rte_zmalloc_socket("tx_buffer",
+					RTE_ETH_TX_BUFFER_SIZE(MAX_PKT_BURST), 0,
+					rte_eth_dev_socket_id(portid));
+			if (qconf->tx_buffer[portid] == NULL)
+				rte_exit(EXIT_FAILURE, "Can't allocate tx buffer for port %u\n",
+						(unsigned) portid);
+
+			rte_eth_tx_buffer_init(qconf->tx_buffer[portid], MAX_PKT_BURST);
+		}
+
 		/* init one TX queue per couple (lcore,port) */
 		queueid = 0;
 		for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
@@ -2032,8 +2007,13 @@ main(int argc, char **argv)
 			qconf = &lcore_conf[lcore_id];
 			qconf->tx_queue_id[portid] = queueid;
 			queueid++;
+
+			qconf->tx_port_id[nb_tx_port] = portid;
+			qconf->n_tx_port = nb_tx_port + 1;
 		}
 		printf("\n");
+
+		nb_tx_port++;
 	}
 
 	for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
diff --git a/examples/l3fwd-power/main.c b/examples/l3fwd-power/main.c
index 828c18a..2ed106b 100644
--- a/examples/l3fwd-power/main.c
+++ b/examples/l3fwd-power/main.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -47,6 +47,7 @@
 #include <rte_common.h>
 #include <rte_byteorder.h>
 #include <rte_log.h>
+#include <rte_malloc.h>
 #include <rte_memory.h>
 #include <rte_memcpy.h>
 #include <rte_memzone.h>
@@ -173,11 +174,6 @@ enum freq_scale_hint_t
 	FREQ_HIGHEST  =       2
 };
 
-struct mbuf_table {
-	uint16_t len;
-	struct rte_mbuf *m_table[MAX_PKT_BURST];
-};
-
 struct lcore_rx_queue {
 	uint8_t port_id;
 	uint8_t queue_id;
@@ -347,8 +343,10 @@ static lookup_struct_t *ipv4_l3fwd_lookup_struct[NB_SOCKETS];
 struct lcore_conf {
 	uint16_t n_rx_queue;
 	struct lcore_rx_queue rx_queue_list[MAX_RX_QUEUE_PER_LCORE];
+	uint16_t n_tx_port;
+	uint16_t tx_port_id[RTE_MAX_ETHPORTS];
 	uint16_t tx_queue_id[RTE_MAX_ETHPORTS];
-	struct mbuf_table tx_mbufs[RTE_MAX_ETHPORTS];
+	struct rte_eth_dev_tx_buffer *tx_buffer[RTE_MAX_ETHPORTS];
 	lookup_struct_t * ipv4_lookup_struct;
 	lookup_struct_t * ipv6_lookup_struct;
 } __rte_cache_aligned;
@@ -442,49 +440,19 @@ power_timer_cb(__attribute__((unused)) struct rte_timer *tim,
 	stats[lcore_id].sleep_time = 0;
 }
 
-/* Send burst of packets on an output interface */
-static inline int
-send_burst(struct lcore_conf *qconf, uint16_t n, uint8_t port)
-{
-	struct rte_mbuf **m_table;
-	int ret;
-	uint16_t queueid;
-
-	queueid = qconf->tx_queue_id[port];
-	m_table = (struct rte_mbuf **)qconf->tx_mbufs[port].m_table;
-
-	ret = rte_eth_tx_burst(port, queueid, m_table, n);
-	if (unlikely(ret < n)) {
-		do {
-			rte_pktmbuf_free(m_table[ret]);
-		} while (++ret < n);
-	}
-
-	return 0;
-}
-
 /* Enqueue a single packet, and send burst if queue is filled */
 static inline int
 send_single_packet(struct rte_mbuf *m, uint8_t port)
 {
 	uint32_t lcore_id;
-	uint16_t len;
 	struct lcore_conf *qconf;
 
 	lcore_id = rte_lcore_id();
-
 	qconf = &lcore_conf[lcore_id];
-	len = qconf->tx_mbufs[port].len;
-	qconf->tx_mbufs[port].m_table[len] = m;
-	len++;
-
-	/* enough pkts to be sent */
-	if (unlikely(len == MAX_PKT_BURST)) {
-		send_burst(qconf, MAX_PKT_BURST, port);
-		len = 0;
-	}
 
-	qconf->tx_mbufs[port].len = len;
+	rte_eth_tx_buffer(port, qconf->tx_queue_id[port],
+			qconf->tx_buffer[port], m);
+
 	return 0;
 }
 
@@ -905,20 +873,12 @@ main_loop(__attribute__((unused)) void *dummy)
 		 */
 		diff_tsc = cur_tsc - prev_tsc;
 		if (unlikely(diff_tsc > drain_tsc)) {
-
-			/*
-			 * This could be optimized (use queueid instead of
-			 * portid), but it is not called so often
-			 */
-			for (portid = 0; portid < RTE_MAX_ETHPORTS; portid++) {
-				if (qconf->tx_mbufs[portid].len == 0)
-					continue;
-				send_burst(&lcore_conf[lcore_id],
-					qconf->tx_mbufs[portid].len,
-					portid);
-				qconf->tx_mbufs[portid].len = 0;
+			for (i = 0; i < qconf->n_tx_port; ++i) {
+				portid = qconf->tx_port_id[i];
+				rte_eth_tx_buffer_flush(portid,
+						qconf->tx_queue_id[portid],
+						qconf->tx_buffer[portid]);
 			}
-
 			prev_tsc = cur_tsc;
 		}
 
@@ -1579,6 +1539,7 @@ main(int argc, char **argv)
 	uint32_t n_tx_queue, nb_lcores;
 	uint32_t dev_rxq_num, dev_txq_num;
 	uint8_t portid, nb_rx_queue, queue, socketid;
+	uint8_t nb_tx_port;
 
 	/* catch SIGINT and restore cpufreq governor to ondemand */
 	signal(SIGINT, signal_exit_now);
@@ -1614,6 +1575,7 @@ main(int argc, char **argv)
 		rte_exit(EXIT_FAILURE, "check_port_config failed\n");
 
 	nb_lcores = rte_lcore_count();
+	nb_tx_port = 0;
 
 	/* initialize all ports */
 	for (portid = 0; portid < nb_ports; portid++) {
@@ -1657,6 +1619,22 @@ main(int argc, char **argv)
 		if (ret < 0)
 			rte_exit(EXIT_FAILURE, "init_mem failed\n");
 
+		for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
+			if (rte_lcore_is_enabled(lcore_id) == 0)
+				continue;
+
+			/* Initialize TX buffers */
+			qconf = &lcore_conf[lcore_id];
+			qconf->tx_buffer[portid] = rte_zmalloc_socket("tx_buffer",
+				RTE_ETH_TX_BUFFER_SIZE(MAX_PKT_BURST), 0,
+				rte_eth_dev_socket_id(portid));
+			if (qconf->tx_buffer[portid] == NULL)
+				rte_exit(EXIT_FAILURE, "Can't allocate tx buffer for port %u\n",
+						(unsigned) portid);
+
+			rte_eth_tx_buffer_init(qconf->tx_buffer[portid], MAX_PKT_BURST);
+		}
+
 		/* init one TX queue per couple (lcore,port) */
 		queueid = 0;
 		for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
@@ -1689,8 +1667,13 @@ main(int argc, char **argv)
 			qconf = &lcore_conf[lcore_id];
 			qconf->tx_queue_id[portid] = queueid;
 			queueid++;
+
+			qconf->tx_port_id[nb_tx_port] = portid;
+			qconf->n_tx_port = nb_tx_port + 1;
 		}
 		printf("\n");
+
+		nb_tx_port++;
 	}
 
 	for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
diff --git a/examples/link_status_interrupt/main.c b/examples/link_status_interrupt/main.c
index c57a08a..36dbf94 100644
--- a/examples/link_status_interrupt/main.c
+++ b/examples/link_status_interrupt/main.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -48,6 +48,7 @@
 
 #include <rte_common.h>
 #include <rte_log.h>
+#include <rte_malloc.h>
 #include <rte_memory.h>
 #include <rte_memcpy.h>
 #include <rte_memzone.h>
@@ -97,10 +98,6 @@ static unsigned int lsi_rx_queue_per_lcore = 1;
 static unsigned lsi_dst_ports[RTE_MAX_ETHPORTS] = {0};
 
 #define MAX_PKT_BURST 32
-struct mbuf_table {
-	unsigned len;
-	struct rte_mbuf *m_table[MAX_PKT_BURST];
-};
 
 #define MAX_RX_QUEUE_PER_LCORE 16
 #define MAX_TX_QUEUE_PER_PORT 16
@@ -108,11 +105,11 @@ struct lcore_queue_conf {
 	unsigned n_rx_port;
 	unsigned rx_port_list[MAX_RX_QUEUE_PER_LCORE];
 	unsigned tx_queue_id;
-	struct mbuf_table tx_mbufs[RTE_MAX_ETHPORTS];
-
 } __rte_cache_aligned;
 struct lcore_queue_conf lcore_queue_conf[RTE_MAX_LCORE];
 
+struct rte_eth_dev_tx_buffer *tx_buffer[RTE_MAX_ETHPORTS];
+
 static const struct rte_eth_conf port_conf = {
 	.rxmode = {
 		.split_hdr_size = 0,
@@ -202,59 +199,14 @@ print_stats(void)
 	printf("\n====================================================\n");
 }
 
-/* Send the packet on an output interface */
-static int
-lsi_send_burst(struct lcore_queue_conf *qconf, unsigned n, uint8_t port)
-{
-	struct rte_mbuf **m_table;
-	unsigned ret;
-	unsigned queueid;
-
-	queueid = (uint16_t) qconf->tx_queue_id;
-	m_table = (struct rte_mbuf **)qconf->tx_mbufs[port].m_table;
-
-	ret = rte_eth_tx_burst(port, (uint16_t) queueid, m_table, (uint16_t) n);
-	port_statistics[port].tx += ret;
-	if (unlikely(ret < n)) {
-		port_statistics[port].dropped += (n - ret);
-		do {
-			rte_pktmbuf_free(m_table[ret]);
-		} while (++ret < n);
-	}
-
-	return 0;
-}
-
-/* Send the packet on an output interface */
-static int
-lsi_send_packet(struct rte_mbuf *m, uint8_t port)
-{
-	unsigned lcore_id, len;
-	struct lcore_queue_conf *qconf;
-
-	lcore_id = rte_lcore_id();
-
-	qconf = &lcore_queue_conf[lcore_id];
-	len = qconf->tx_mbufs[port].len;
-	qconf->tx_mbufs[port].m_table[len] = m;
-	len++;
-
-	/* enough pkts to be sent */
-	if (unlikely(len == MAX_PKT_BURST)) {
-		lsi_send_burst(qconf, MAX_PKT_BURST, port);
-		len = 0;
-	}
-
-	qconf->tx_mbufs[port].len = len;
-	return 0;
-}
-
 static void
 lsi_simple_forward(struct rte_mbuf *m, unsigned portid)
 {
 	struct ether_hdr *eth;
 	void *tmp;
 	unsigned dst_port = lsi_dst_ports[portid];
+	int sent;
+	struct rte_eth_dev_tx_buffer *buffer;
 
 	eth = rte_pktmbuf_mtod(m, struct ether_hdr *);
 
@@ -265,7 +217,10 @@ lsi_simple_forward(struct rte_mbuf *m, unsigned portid)
 	/* src addr */
 	ether_addr_copy(&lsi_ports_eth_addr[dst_port], &eth->s_addr);
 
-	lsi_send_packet(m, (uint8_t) dst_port);
+	buffer = tx_buffer[dst_port];
+	sent = rte_eth_tx_buffer(dst_port, 0, buffer, m);
+	if (sent)
+		port_statistics[dst_port].tx += sent;
 }
 
 /* main processing loop */
@@ -275,10 +230,13 @@ lsi_main_loop(void)
 	struct rte_mbuf *pkts_burst[MAX_PKT_BURST];
 	struct rte_mbuf *m;
 	unsigned lcore_id;
+	unsigned sent;
 	uint64_t prev_tsc, diff_tsc, cur_tsc, timer_tsc;
 	unsigned i, j, portid, nb_rx;
 	struct lcore_queue_conf *qconf;
-	const uint64_t drain_tsc = (rte_get_tsc_hz() + US_PER_S - 1) / US_PER_S * BURST_TX_DRAIN_US;
+	const uint64_t drain_tsc = (rte_get_tsc_hz() + US_PER_S - 1) / US_PER_S *
+			BURST_TX_DRAIN_US;
+	struct rte_eth_dev_tx_buffer *buffer;
 
 	prev_tsc = 0;
 	timer_tsc = 0;
@@ -310,15 +268,15 @@ lsi_main_loop(void)
 		diff_tsc = cur_tsc - prev_tsc;
 		if (unlikely(diff_tsc > drain_tsc)) {
 
-			/* this could be optimized (use queueid instead of
-			 * portid), but it is not called so often */
-			for (portid = 0; portid < RTE_MAX_ETHPORTS; portid++) {
-				if (qconf->tx_mbufs[portid].len == 0)
-					continue;
-				lsi_send_burst(&lcore_queue_conf[lcore_id],
-						 qconf->tx_mbufs[portid].len,
-						 (uint8_t) portid);
-				qconf->tx_mbufs[portid].len = 0;
+			for (i = 0; i < qconf->n_rx_port; i++) {
+
+				portid = lsi_dst_ports[qconf->rx_port_list[i]];
+				buffer = tx_buffer[portid];
+
+				sent = rte_eth_tx_buffer_flush(portid, 0, buffer);
+				if (sent)
+					port_statistics[portid].tx += sent;
+
 			}
 
 			/* if timer is enabled */
@@ -722,6 +680,23 @@ main(int argc, char **argv)
 			rte_exit(EXIT_FAILURE, "rte_eth_tx_queue_setup: err=%d,port=%u\n",
 				  ret, (unsigned) portid);
 
+		/* Initialize TX buffers */
+		tx_buffer[portid] = rte_zmalloc_socket("tx_buffer",
+				RTE_ETH_TX_BUFFER_SIZE(MAX_PKT_BURST), 0,
+				rte_eth_dev_socket_id(portid));
+		if (tx_buffer[portid] == NULL)
+			rte_exit(EXIT_FAILURE, "Cannot allocate buffer for tx on port %u\n",
+					(unsigned) portid);
+
+		rte_eth_tx_buffer_init(tx_buffer[portid], MAX_PKT_BURST);
+
+		ret = rte_eth_tx_buffer_set_err_callback(tx_buffer[portid],
+				rte_eth_count_unsent_packet_callback,
+				&port_statistics[portid].dropped);
+		if (ret < 0)
+			rte_exit(EXIT_FAILURE, "Cannot set error callback for "
+					"tx buffer on port %u\n", (unsigned) portid);
+
 		/* Start device */
 		ret = rte_eth_dev_start(portid);
 		if (ret < 0)
@@ -729,6 +704,8 @@ main(int argc, char **argv)
 				  ret, (unsigned) portid);
 		printf("done:\n");
 
+		rte_eth_promiscuous_enable(portid);
+
 		printf("Port %u, MAC address: %02X:%02X:%02X:%02X:%02X:%02X\n\n",
 				(unsigned) portid,
 				lsi_ports_eth_addr[portid].addr_bytes[0],
diff --git a/examples/multi_process/client_server_mp/mp_client/client.c b/examples/multi_process/client_server_mp/mp_client/client.c
index bf049a4..d4f9ca3 100644
--- a/examples/multi_process/client_server_mp/mp_client/client.c
+++ b/examples/multi_process/client_server_mp/mp_client/client.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -42,6 +42,7 @@
 #include <string.h>
 
 #include <rte_common.h>
+#include <rte_malloc.h>
 #include <rte_memory.h>
 #include <rte_memzone.h>
 #include <rte_eal.h>
@@ -72,17 +73,13 @@
  * queue to write to. */
 static uint8_t client_id = 0;
 
-struct mbuf_queue {
 #define MBQ_CAPACITY 32
-	struct rte_mbuf *bufs[MBQ_CAPACITY];
-	uint16_t top;
-};
 
 /* maps input ports to output ports for packets */
 static uint8_t output_ports[RTE_MAX_ETHPORTS];
 
 /* buffers up a set of packet that are ready to send */
-static struct mbuf_queue output_bufs[RTE_MAX_ETHPORTS];
+struct rte_eth_dev_tx_buffer *tx_buffer[RTE_MAX_ETHPORTS];
 
 /* shared data from server. We update statistics here */
 static volatile struct tx_stats *tx_stats;
@@ -149,11 +146,51 @@ parse_app_args(int argc, char *argv[])
 }
 
 /*
+ * Tx buffer error callback
+ */
+static void
+flush_tx_error_callback(struct rte_mbuf **unsent, uint16_t count,
+		void *userdata)
+{
+	int i;
+	uint8_t port_id = (uintptr_t)userdata;
+
+	tx_stats->tx_drop[port_id] += count;
+
+	/* free the mbufs which failed to transmit */
+	for (i = 0; i < count; i++)
+		rte_pktmbuf_free(unsent[i]);
+}
+
+static void
+configure_tx_buffer(uint8_t port_id, uint16_t size)
+{
+	int ret;
+
+	/* Initialize TX buffers */
+	tx_buffer[port_id] = rte_zmalloc_socket("tx_buffer",
+			RTE_ETH_TX_BUFFER_SIZE(size), 0,
+			rte_eth_dev_socket_id(port_id));
+	if (tx_buffer[port_id] == NULL)
+		rte_exit(EXIT_FAILURE, "Cannot allocate buffer for tx on port %u\n",
+				(unsigned) port_id);
+
+	rte_eth_tx_buffer_init(tx_buffer[port_id], size);
+
+	ret = rte_eth_tx_buffer_set_err_callback(tx_buffer[port_id],
+			flush_tx_error_callback, (void *)(intptr_t)port_id);
+	if (ret < 0)
+		rte_exit(EXIT_FAILURE, "Cannot set error callback for "
+				"tx buffer on port %u\n", (unsigned) port_id);
+}
+
+/*
  * set up output ports so that all traffic on port gets sent out
  * its paired port. Index using actual port numbers since that is
  * what comes in the mbuf structure.
  */
-static void configure_output_ports(const struct port_info *ports)
+static void
+configure_output_ports(const struct port_info *ports)
 {
 	int i;
 	if (ports->num_ports > RTE_MAX_ETHPORTS)
@@ -164,41 +201,11 @@ static void configure_output_ports(const struct port_info *ports)
 		uint8_t p2 = ports->id[i+1];
 		output_ports[p1] = p2;
 		output_ports[p2] = p1;
-	}
-}
 
+		configure_tx_buffer(p1, MBQ_CAPACITY);
+		configure_tx_buffer(p2, MBQ_CAPACITY);
 
-static inline void
-send_packets(uint8_t port)
-{
-	uint16_t i, sent;
-	struct mbuf_queue *mbq = &output_bufs[port];
-
-	if (unlikely(mbq->top == 0))
-		return;
-
-	sent = rte_eth_tx_burst(port, client_id, mbq->bufs, mbq->top);
-	if (unlikely(sent < mbq->top)){
-		for (i = sent; i < mbq->top; i++)
-			rte_pktmbuf_free(mbq->bufs[i]);
-		tx_stats->tx_drop[port] += (mbq->top - sent);
 	}
-	tx_stats->tx[port] += sent;
-	mbq->top = 0;
-}
-
-/*
- * Enqueue a packet to be sent on a particular port, but
- * don't send it yet. Only when the buffer is full.
- */
-static inline void
-enqueue_packet(struct rte_mbuf *buf, uint8_t port)
-{
-	struct mbuf_queue *mbq = &output_bufs[port];
-	mbq->bufs[mbq->top++] = buf;
-
-	if (mbq->top == MBQ_CAPACITY)
-		send_packets(port);
 }
 
 /*
@@ -209,10 +216,15 @@ enqueue_packet(struct rte_mbuf *buf, uint8_t port)
 static void
 handle_packet(struct rte_mbuf *buf)
 {
+	int sent;
 	const uint8_t in_port = buf->port;
 	const uint8_t out_port = output_ports[in_port];
+	struct rte_eth_dev_tx_buffer *buffer = tx_buffer[out_port];
+
+	sent = rte_eth_tx_buffer(out_port, client_id, buffer, buf);
+	if (sent)
+		tx_stats->tx[out_port] += sent;
 
-	enqueue_packet(buf, out_port);
 }
 
 /*
@@ -229,6 +241,7 @@ main(int argc, char *argv[])
 	int need_flush = 0; /* indicates whether we have unsent packets */
 	int retval;
 	void *pkts[PKT_READ_SIZE];
+	uint16_t sent;
 
 	if ((retval = rte_eal_init(argc, argv)) < 0)
 		return -1;
@@ -274,8 +287,12 @@ main(int argc, char *argv[])
 
 		if (unlikely(rx_pkts == 0)){
 			if (need_flush)
-				for (port = 0; port < ports->num_ports; port++)
-					send_packets(ports->id[port]);
+				for (port = 0; port < ports->num_ports; port++) {
+					sent = rte_eth_tx_buffer_flush(ports->id[port], client_id,
+							tx_buffer[ports->id[port]]);
+					if (unlikely(sent))
+						tx_stats->tx[ports->id[port]] += sent;
+				}
 			need_flush = 0;
 			continue;
 		}
diff --git a/examples/multi_process/l2fwd_fork/main.c b/examples/multi_process/l2fwd_fork/main.c
index f2d7eab..2cab70e 100644
--- a/examples/multi_process/l2fwd_fork/main.c
+++ b/examples/multi_process/l2fwd_fork/main.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -127,11 +127,11 @@ struct mbuf_table {
 struct lcore_queue_conf {
 	unsigned n_rx_port;
 	unsigned rx_port_list[MAX_RX_QUEUE_PER_LCORE];
-	struct mbuf_table tx_mbufs[RTE_MAX_ETHPORTS];
-
 } __rte_cache_aligned;
 struct lcore_queue_conf lcore_queue_conf[RTE_MAX_LCORE];
 
+struct rte_eth_dev_tx_buffer *tx_buffer[RTE_MAX_ETHPORTS];
+
 struct lcore_resource_struct {
 	int enabled;	/* Only set in case this lcore involved into packet forwarding */
 	int flags; 	    /* Set only slave need to restart or recreate */
@@ -583,58 +583,14 @@ slave_exit_cb(unsigned slaveid, __attribute__((unused))int stat)
 	rte_spinlock_unlock(&res_lock);
 }
 
-/* Send the packet on an output interface */
-static int
-l2fwd_send_burst(struct lcore_queue_conf *qconf, unsigned n, uint8_t port)
-{
-	struct rte_mbuf **m_table;
-	unsigned ret;
-	unsigned queueid =0;
-
-	m_table = (struct rte_mbuf **)qconf->tx_mbufs[port].m_table;
-
-	ret = rte_eth_tx_burst(port, (uint16_t) queueid, m_table, (uint16_t) n);
-	port_statistics[port].tx += ret;
-	if (unlikely(ret < n)) {
-		port_statistics[port].dropped += (n - ret);
-		do {
-			rte_pktmbuf_free(m_table[ret]);
-		} while (++ret < n);
-	}
-
-	return 0;
-}
-
-/* Send the packet on an output interface */
-static int
-l2fwd_send_packet(struct rte_mbuf *m, uint8_t port)
-{
-	unsigned lcore_id, len;
-	struct lcore_queue_conf *qconf;
-
-	lcore_id = rte_lcore_id();
-
-	qconf = &lcore_queue_conf[lcore_id];
-	len = qconf->tx_mbufs[port].len;
-	qconf->tx_mbufs[port].m_table[len] = m;
-	len++;
-
-	/* enough pkts to be sent */
-	if (unlikely(len == MAX_PKT_BURST)) {
-		l2fwd_send_burst(qconf, MAX_PKT_BURST, port);
-		len = 0;
-	}
-
-	qconf->tx_mbufs[port].len = len;
-	return 0;
-}
-
 static void
 l2fwd_simple_forward(struct rte_mbuf *m, unsigned portid)
 {
 	struct ether_hdr *eth;
 	void *tmp;
 	unsigned dst_port;
+	int sent;
+	struct rte_eth_dev_tx_buffer *buffer;
 
 	dst_port = l2fwd_dst_ports[portid];
 	eth = rte_pktmbuf_mtod(m, struct ether_hdr *);
@@ -646,7 +602,10 @@ l2fwd_simple_forward(struct rte_mbuf *m, unsigned portid)
 	/* src addr */
 	ether_addr_copy(&l2fwd_ports_eth_addr[dst_port], &eth->s_addr);
 
-	l2fwd_send_packet(m, (uint8_t) dst_port);
+	buffer = tx_buffer[dst_port];
+	sent = rte_eth_tx_buffer(dst_port, 0, buffer, m);
+	if (sent)
+		port_statistics[dst_port].tx += sent;
 }
 
 /* main processing loop */
@@ -655,11 +614,14 @@ l2fwd_main_loop(void)
 {
 	struct rte_mbuf *pkts_burst[MAX_PKT_BURST];
 	struct rte_mbuf *m;
+	int sent;
 	unsigned lcore_id;
 	uint64_t prev_tsc, diff_tsc, cur_tsc;
 	unsigned i, j, portid, nb_rx;
 	struct lcore_queue_conf *qconf;
-	const uint64_t drain_tsc = (rte_get_tsc_hz() + US_PER_S - 1) / US_PER_S * BURST_TX_DRAIN_US;
+	const uint64_t drain_tsc = (rte_get_tsc_hz() + US_PER_S - 1) / US_PER_S *
+			BURST_TX_DRAIN_US;
+	struct rte_eth_dev_tx_buffer *buffer;
 
 	prev_tsc = 0;
 
@@ -699,13 +661,15 @@ l2fwd_main_loop(void)
 		diff_tsc = cur_tsc - prev_tsc;
 		if (unlikely(diff_tsc > drain_tsc)) {
 
-			for (portid = 0; portid < RTE_MAX_ETHPORTS; portid++) {
-				if (qconf->tx_mbufs[portid].len == 0)
-					continue;
-				l2fwd_send_burst(&lcore_queue_conf[lcore_id],
-						 qconf->tx_mbufs[portid].len,
-						 (uint8_t) portid);
-				qconf->tx_mbufs[portid].len = 0;
+			for (i = 0; i < qconf->n_rx_port; i++) {
+
+				portid = l2fwd_dst_ports[qconf->rx_port_list[i]];
+				buffer = tx_buffer[portid];
+
+				sent = rte_eth_tx_buffer_flush(portid, 0, buffer);
+				if (sent)
+					port_statistics[portid].tx += sent;
+
 			}
 		}
 
@@ -1144,6 +1108,23 @@ main(int argc, char **argv)
 			rte_exit(EXIT_FAILURE, "rte_eth_tx_queue_setup:err=%d, port=%u\n",
 				ret, (unsigned) portid);
 
+		/* Initialize TX buffers */
+		tx_buffer[portid] = rte_zmalloc_socket("tx_buffer",
+				RTE_ETH_TX_BUFFER_SIZE(MAX_PKT_BURST), 0,
+				rte_eth_dev_socket_id(portid));
+		if (tx_buffer[portid] == NULL)
+			rte_exit(EXIT_FAILURE, "Cannot allocate buffer for tx on port %u\n",
+					(unsigned) portid);
+
+		rte_eth_tx_buffer_init(tx_buffer[portid], MAX_PKT_BURST);
+
+		ret = rte_eth_tx_buffer_set_err_callback(tx_buffer[portid],
+				rte_eth_count_unsent_packet_callback,
+				&port_statistics[portid].dropped);
+		if (ret < 0)
+			rte_exit(EXIT_FAILURE, "Cannot set error callback for "
+					"tx buffer on port %u\n", (unsigned) portid);
+
 		/* Start device */
 		ret = rte_eth_dev_start(portid);
 		if (ret < 0)
diff --git a/examples/packet_ordering/main.c b/examples/packet_ordering/main.c
index 1d9a86f..15bb900 100644
--- a/examples/packet_ordering/main.c
+++ b/examples/packet_ordering/main.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -39,6 +39,7 @@
 #include <rte_errno.h>
 #include <rte_ethdev.h>
 #include <rte_lcore.h>
+#include <rte_malloc.h>
 #include <rte_mbuf.h>
 #include <rte_mempool.h>
 #include <rte_ring.h>
@@ -54,7 +55,7 @@
 
 #define RING_SIZE 16384
 
-/* uncommnet below line to enable debug logs */
+/* uncomment below line to enable debug logs */
 /* #define DEBUG */
 
 #ifdef DEBUG
@@ -86,11 +87,6 @@ struct send_thread_args {
 	struct rte_reorder_buffer *buffer;
 };
 
-struct output_buffer {
-	unsigned count;
-	struct rte_mbuf *mbufs[MAX_PKTS_BURST];
-};
-
 volatile struct app_stats {
 	struct {
 		uint64_t rx_pkts;
@@ -235,6 +231,68 @@ parse_args(int argc, char **argv)
 	return 0;
 }
 
+/*
+ * Tx buffer error callback
+ */
+static void
+flush_tx_error_callback(struct rte_mbuf **unsent, uint16_t count,
+		void *userdata __rte_unused)
+{
+	/* account for the mbufs which failed to transmit */
+	app_stats.tx.ro_tx_failed_pkts += count;
+	LOG_DEBUG(REORDERAPP, "%s:Packet loss with tx_burst\n", __func__);
+
+	pktmbuf_free_bulk(unsent, count);
+}
+
+static inline int
+free_tx_buffers(struct rte_eth_dev_tx_buffer *tx_buffer[]) {
+	const uint8_t nb_ports = rte_eth_dev_count();
+	unsigned port_id;
+
+	/* initialize buffers for all ports */
+	for (port_id = 0; port_id < nb_ports; port_id++) {
+		/* skip ports that are not enabled */
+		if ((portmask & (1 << port_id)) == 0)
+			continue;
+
+		rte_free(tx_buffer[port_id]);
+	}
+	return 0;
+}
+
+static inline int
+configure_tx_buffers(struct rte_eth_dev_tx_buffer *tx_buffer[])
+{
+	const uint8_t nb_ports = rte_eth_dev_count();
+	unsigned port_id;
+	int ret;
+
+	/* initialize buffers for all ports */
+	for (port_id = 0; port_id < nb_ports; port_id++) {
+		/* skip ports that are not enabled */
+		if ((portmask & (1 << port_id)) == 0)
+			continue;
+
+		/* Initialize TX buffers */
+		tx_buffer[port_id] = rte_zmalloc_socket("tx_buffer",
+				RTE_ETH_TX_BUFFER_SIZE(MAX_PKTS_BURST), 0,
+				rte_eth_dev_socket_id(port_id));
+		if (tx_buffer[port_id] == NULL)
+			rte_exit(EXIT_FAILURE, "Cannot allocate buffer for tx on port %u\n",
+					(unsigned) port_id);
+
+		rte_eth_tx_buffer_init(tx_buffer[port_id], MAX_PKTS_BURST);
+
+		ret = rte_eth_tx_buffer_set_err_callback(tx_buffer[port_id],
+				flush_tx_error_callback, NULL);
+		if (ret < 0)
+			rte_exit(EXIT_FAILURE, "Cannot set error callback for "
+					"tx buffer on port %u\n", (unsigned) port_id);
+	}
+	return 0;
+}
+
 static inline int
 configure_eth_port(uint8_t port_id)
 {
@@ -438,22 +496,6 @@ worker_thread(void *args_ptr)
 	return 0;
 }
 
-static inline void
-flush_one_port(struct output_buffer *outbuf, uint8_t outp)
-{
-	unsigned nb_tx = rte_eth_tx_burst(outp, 0, outbuf->mbufs,
-			outbuf->count);
-	app_stats.tx.ro_tx_pkts += nb_tx;
-
-	if (unlikely(nb_tx < outbuf->count)) {
-		/* free the mbufs which failed from transmit */
-		app_stats.tx.ro_tx_failed_pkts += (outbuf->count - nb_tx);
-		LOG_DEBUG(REORDERAPP, "%s:Packet loss with tx_burst\n", __func__);
-		pktmbuf_free_bulk(&outbuf->mbufs[nb_tx], outbuf->count - nb_tx);
-	}
-	outbuf->count = 0;
-}
-
 /**
  * Dequeue mbufs from the workers_to_tx ring and reorder them before
  * transmitting.
@@ -465,12 +507,15 @@ send_thread(struct send_thread_args *args)
 	unsigned int i, dret;
 	uint16_t nb_dq_mbufs;
 	uint8_t outp;
-	static struct output_buffer tx_buffers[RTE_MAX_ETHPORTS];
+	unsigned sent;
 	struct rte_mbuf *mbufs[MAX_PKTS_BURST];
 	struct rte_mbuf *rombufs[MAX_PKTS_BURST] = {NULL};
+	static struct rte_eth_dev_tx_buffer *tx_buffer[RTE_MAX_ETHPORTS];
 
 	RTE_LOG(INFO, REORDERAPP, "%s() started on lcore %u\n", __func__, rte_lcore_id());
 
+	configure_tx_buffers(tx_buffer);
+
 	while (!quit_signal) {
 
 		/* dequeue the mbufs from workers_to_tx ring */
@@ -515,7 +560,7 @@ send_thread(struct send_thread_args *args)
 		dret = rte_reorder_drain(args->buffer, rombufs, MAX_PKTS_BURST);
 		for (i = 0; i < dret; i++) {
 
-			struct output_buffer *outbuf;
+			struct rte_eth_dev_tx_buffer *outbuf;
 			uint8_t outp1;
 
 			outp1 = rombufs[i]->port;
@@ -525,12 +570,15 @@ send_thread(struct send_thread_args *args)
 				continue;
 			}
 
-			outbuf = &tx_buffers[outp1];
-			outbuf->mbufs[outbuf->count++] = rombufs[i];
-			if (outbuf->count == MAX_PKTS_BURST)
-				flush_one_port(outbuf, outp1);
+			outbuf = tx_buffer[outp1];
+			sent = rte_eth_tx_buffer(outp1, 0, outbuf, rombufs[i]);
+			if (sent)
+				app_stats.tx.ro_tx_pkts += sent;
 		}
 	}
+
+	free_tx_buffers(tx_buffer);
+
 	return 0;
 }
 
@@ -542,12 +590,16 @@ tx_thread(struct rte_ring *ring_in)
 {
 	uint32_t i, dqnum;
 	uint8_t outp;
-	static struct output_buffer tx_buffers[RTE_MAX_ETHPORTS];
+	unsigned sent;
 	struct rte_mbuf *mbufs[MAX_PKTS_BURST];
-	struct output_buffer *outbuf;
+	struct rte_eth_dev_tx_buffer *outbuf;
+	static struct rte_eth_dev_tx_buffer *tx_buffer[RTE_MAX_ETHPORTS];
 
 	RTE_LOG(INFO, REORDERAPP, "%s() started on lcore %u\n", __func__,
 							rte_lcore_id());
+
+	configure_tx_buffers(tx_buffer);
+
 	while (!quit_signal) {
 
 		/* dequeue the mbufs from workers_to_tx ring */
@@ -567,10 +619,10 @@ tx_thread(struct rte_ring *ring_in)
 				continue;
 			}
 
-			outbuf = &tx_buffers[outp];
-			outbuf->mbufs[outbuf->count++] = mbufs[i];
-			if (outbuf->count == MAX_PKTS_BURST)
-				flush_one_port(outbuf, outp);
+			outbuf = tx_buffer[outp];
+			sent = rte_eth_tx_buffer(outp, 0, outbuf, mbufs[i]);
+			if (sent)
+				app_stats.tx.ro_tx_pkts += sent;
 		}
 	}
 
diff --git a/examples/qos_meter/main.c b/examples/qos_meter/main.c
index 0de5e7f..b968b00 100644
--- a/examples/qos_meter/main.c
+++ b/examples/qos_meter/main.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -36,6 +36,7 @@
 
 #include <rte_common.h>
 #include <rte_eal.h>
+#include <rte_malloc.h>
 #include <rte_mempool.h>
 #include <rte_ethdev.h>
 #include <rte_cycles.h>
@@ -118,9 +119,7 @@ static struct rte_eth_conf port_conf = {
 static uint8_t port_rx;
 static uint8_t port_tx;
 static struct rte_mbuf *pkts_rx[PKT_RX_BURST_MAX];
-static struct rte_mbuf *pkts_tx[PKT_TX_BURST_MAX];
-static uint16_t pkts_tx_len = 0;
-
+struct rte_eth_dev_tx_buffer *tx_buffer;
 
 struct rte_meter_srtcm_params app_srtcm_params[] = {
 	{.cir = 1000000 * 46,  .cbs = 2048, .ebs = 2048},
@@ -188,27 +187,8 @@ main_loop(__attribute__((unused)) void *dummy)
 		current_time = rte_rdtsc();
 		time_diff = current_time - last_time;
 		if (unlikely(time_diff > TIME_TX_DRAIN)) {
-			int ret;
-
-			if (pkts_tx_len == 0) {
-				last_time = current_time;
-
-				continue;
-			}
-
-			/* Write packet burst to NIC TX */
-			ret = rte_eth_tx_burst(port_tx, NIC_TX_QUEUE, pkts_tx, pkts_tx_len);
-
-			/* Free buffers for any packets not written successfully */
-			if (unlikely(ret < pkts_tx_len)) {
-				for ( ; ret < pkts_tx_len; ret ++) {
-					rte_pktmbuf_free(pkts_tx[ret]);
-				}
-			}
-
-			/* Empty the output buffer */
-			pkts_tx_len = 0;
-
+			/* Flush tx buffer */
+			rte_eth_tx_buffer_flush(port_tx, NIC_TX_QUEUE, tx_buffer);
 			last_time = current_time;
 		}
 
@@ -222,26 +202,8 @@ main_loop(__attribute__((unused)) void *dummy)
 			/* Handle current packet */
 			if (app_pkt_handle(pkt, current_time) == DROP)
 				rte_pktmbuf_free(pkt);
-			else {
-				pkts_tx[pkts_tx_len] = pkt;
-				pkts_tx_len ++;
-			}
-
-			/* Write packets from output buffer to NIC TX when full burst is available */
-			if (unlikely(pkts_tx_len == PKT_TX_BURST_MAX)) {
-				/* Write packet burst to NIC TX */
-				int ret = rte_eth_tx_burst(port_tx, NIC_TX_QUEUE, pkts_tx, PKT_TX_BURST_MAX);
-
-				/* Free buffers for any packets not written successfully */
-				if (unlikely(ret < PKT_TX_BURST_MAX)) {
-					for ( ; ret < PKT_TX_BURST_MAX; ret ++) {
-						rte_pktmbuf_free(pkts_tx[ret]);
-					}
-				}
-
-				/* Empty the output buffer */
-				pkts_tx_len = 0;
-			}
+			else
+				rte_eth_tx_buffer(port_tx, NIC_TX_QUEUE, tx_buffer, pkt);
 		}
 	}
 }
@@ -397,6 +359,15 @@ main(int argc, char **argv)
 	if (ret < 0)
 		rte_exit(EXIT_FAILURE, "Port %d TX queue setup error (%d)\n", port_tx, ret);
 
+	tx_buffer = rte_zmalloc_socket("tx_buffer",
+			RTE_ETH_TX_BUFFER_SIZE(PKT_TX_BURST_MAX), 0,
+			rte_eth_dev_socket_id(port_tx));
+	if (tx_buffer == NULL)
+		rte_exit(EXIT_FAILURE, "Port %d TX buffer allocation error\n",
+				port_tx);
+
+	rte_eth_tx_buffer_init(tx_buffer, PKT_TX_BURST_MAX);
+
 	ret = rte_eth_dev_start(port_rx);
 	if (ret < 0)
 		rte_exit(EXIT_FAILURE, "Port %d start error (%d)\n", port_rx, ret);
-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 43+ messages in thread
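
The API header in patch 1 mentions retrying transmission as one possible
callback behaviour. A sketch of such a callback, matching the
buffer_tx_error_fn signature used by the examples above, might look as
follows; the single retry on queue 0 and passing the port id through
userdata are assumptions for the example, not part of the patch.

    /* Hypothetical callback: retry once, then free what still fails */
    static void
    retry_tx_error_callback(struct rte_mbuf **unsent, uint16_t count,
            void *userdata)
    {
        uint8_t port_id = (uint8_t)(uintptr_t)userdata;
        uint16_t sent, i;

        /* one further transmit attempt on queue 0 (assumed) */
        sent = rte_eth_tx_burst(port_id, 0, unsent, count);

        /* free whatever the retry still could not send */
        for (i = sent; i < count; i++)
            rte_pktmbuf_free(unsent[i]);
    }

    /* registration, e.g. during port init, with the port id in userdata */
    rte_eth_tx_buffer_set_err_callback(buffer,
            retry_tx_error_callback, (void *)(uintptr_t)port_id);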

* Re: [dpdk-dev] [PATCH v2 0/2] add support for buffered tx to ethdev
  2016-02-24 17:08 ` [dpdk-dev] [PATCH v2 " Tomasz Kulasek
  2016-02-24 17:08   ` [dpdk-dev] [PATCH v2 1/2] ethdev: add buffered tx api Tomasz Kulasek
  2016-02-24 17:08   ` [dpdk-dev] [PATCH v2 2/2] examples: rework to use " Tomasz Kulasek
@ 2016-02-25 16:17   ` Ananyev, Konstantin
  2016-03-10 10:57   ` [dpdk-dev] [PATCH v3 " Tomasz Kulasek
  3 siblings, 0 replies; 43+ messages in thread
From: Ananyev, Konstantin @ 2016-02-25 16:17 UTC (permalink / raw)
  To: Kulasek, TomaszX, dev

> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Tomasz Kulasek
> Sent: Wednesday, February 24, 2016 5:09 PM
> To: dev@dpdk.org
> Subject: [dpdk-dev] [PATCH v2 0/2] add support for buffered tx to ethdev
> 
> Many sample apps include internal buffering for single-packet-at-a-time
> operation. Since this is such a common paradigm, this functionality is
> better suited to being implemented in the ethdev API.
> 
> The new APIs in the ethdev library are:
> * rte_eth_tx_buffer_init - initialize buffer
> * rte_eth_tx_buffer - buffer up a single packet for future transmission
> * rte_eth_tx_buffer_flush - flush any unsent buffered packets
> * rte_eth_tx_buffer_set_err_callback - set up a callback to be called in
>   case transmitting a buffered burst fails. By default, we just free the
>   unsent packets.
> 
> As well as these, an additional reference callback is provided, which
> frees the packets (as the default callback does), as well as updating a
> user-provided counter, so that the number of dropped packets can be
> tracked.
> 
> Due to feedback from the mailing list that buffer management facilities
> in the user application are preferable to API simplicity, we decided
> to move the internal buffer table, as well as the callback functions and
> user data, from rte_eth_dev/rte_eth_dev_data to the application space.
> This prevents ABI breakage and gives more flexibility in buffer
> management, such as allocation, dynamic size changes, and reuse of
> buffers across many ports or after a failed send.
> 
> 
> The following steps illustrate how tx buffers can be used in application:
> 
> 1) Initialization
> 
> a) Allocate memory for a buffer
> 
>    struct rte_eth_dev_tx_buffer *buffer = rte_zmalloc_socket("tx_buffer",
>            RTE_ETH_TX_BUFFER_SIZE(MAX_PKT_BURST), 0, socket_id);
> 
>    RTE_ETH_TX_BUFFER_SIZE(size) macro computes memory required to store
>    "size" packets in buffer.
> 
> b) Initialize allocated memory and set up default values. Threshold level
>    must be lower than or equal to the MAX_PKT_BURST from 1a)
> 
>    rte_eth_tx_buffer_init(buffer, threshold);
> 
> 
> c) Set error callback (optional)
> 
>    rte_eth_tx_buffer_set_err_callback(buffer, callback_fn, userdata);
> 
> 
> 2) Store packet "pkt" in the buffer; all buffered packets are sent to the
>    queue_id on port_id when the number of packets reaches the threshold
>    level set up in 1b)
> 
>    rte_eth_tx_buffer(port_id, queue_id, buffer, pkt);
> 
> 
> 3) Send all stored packets to the queue_id on port_id
> 
>    rte_eth_tx_buffer_flush(port_id, queue_id, buffer);
> 
> 
> 4) Flush buffer and free memory
> 
>    rte_eth_tx_buffer_flush(port_id, queue_id, buffer);
>    ...
>    rte_free(buffer);
> 
> 
> v2 changes:
>  - reworked to use new buffer model
>  - buffer data and callbacks are removed from rte_eth_dev/rte_eth_dev_data,
>    so this patch doesn't break the ABI anymore
>  - introduced RTE_ETH_TX_BUFFER macro and rte_eth_tx_buffer_init
>  - buffers are not attached to the port-queue
>  - buffers can be allocated dynamically during application work
>  - size of buffer can be changed without port restart
> 
> 
> Tomasz Kulasek (2):
>   ethdev: add buffered tx api
>   examples: rework to use buffered tx
> 
>  examples/l2fwd-jobstats/main.c                     |  104 +++++------
>  examples/l2fwd-keepalive/main.c                    |  100 ++++-------
>  examples/l2fwd/main.c                              |  104 +++++------
>  examples/l3fwd-acl/main.c                          |   92 ++++------
>  examples/l3fwd-power/main.c                        |   89 ++++------
>  examples/link_status_interrupt/main.c              |  107 +++++-------
>  .../client_server_mp/mp_client/client.c            |  101 ++++++-----
>  examples/multi_process/l2fwd_fork/main.c           |   97 +++++------
>  examples/packet_ordering/main.c                    |  122 +++++++++----
>  examples/qos_meter/main.c                          |   61 ++-----
>  lib/librte_ether/rte_ethdev.c                      |   36 ++++
>  lib/librte_ether/rte_ethdev.h                      |  182 +++++++++++++++++++-
>  lib/librte_ether/rte_ether_version.map             |    9 +
>  13 files changed, 662 insertions(+), 542 deletions(-)
> 


Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>

> --
> 1.7.9.5

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [dpdk-dev] [PATCH v2 1/2] ethdev: add buffered tx api
  2016-02-24 17:08   ` [dpdk-dev] [PATCH v2 1/2] ethdev: add buffered tx api Tomasz Kulasek
@ 2016-03-08 22:52     ` Thomas Monjalon
  2016-03-09 13:36       ` Ananyev, Konstantin
  2016-03-09 16:35       ` Kulasek, TomaszX
  0 siblings, 2 replies; 43+ messages in thread
From: Thomas Monjalon @ 2016-03-08 22:52 UTC (permalink / raw)
  To: Tomasz Kulasek; +Cc: dev

Hi,

It is an overlay on the tx burst API.
Probably it doesn't hurt to add it but we have to be really cautious
with the API definition to try keeping it stable in the future.

2016-02-24 18:08, Tomasz Kulasek:
> +/**
> + * Structure used to buffer packets for future TX
> + * Used by APIs rte_eth_tx_buffer and rte_eth_tx_buffer_flush
> + */
> +struct rte_eth_dev_tx_buffer {
> +	unsigned nb_pkts;

What about "length"?
Why is it unsigned and the size is uint16_t?

> +	uint64_t errors;
> +	/**< Total number of queue packets to sent that are dropped. */

The errors are passed as userdata to the default callback.
If we really want to have this kind of counter, we can define our
own callback. So why defining this field as standard?
I would like to keep it as simple as possible.

> +	buffer_tx_error_fn cbfn;

Why not simply "callback" as name?

> +	void *userdata;
> +	uint16_t size;           /**< Size of buffer for buffered tx */
> +	struct rte_mbuf *pkts[];
> +};

What is the benefit of exposing this structure in the API,
except that it is used in some inline functions?

> +static inline uint16_t
> +rte_eth_tx_buffer_flush(uint8_t port_id, uint16_t queue_id,
> +		struct rte_eth_dev_tx_buffer *buffer)
> +{
> +	uint16_t sent;
> +
> +	uint16_t to_send = buffer->nb_pkts;
> +
> +	if (to_send == 0)
> +		return 0;

Why this check is done in the lib?
What is the performance gain if we are idle?
It can be done outside if needed.

> +	sent = rte_eth_tx_burst(port_id, queue_id, buffer->pkts, to_send);
> +
> +	buffer->nb_pkts = 0;
> +
> +	/* All packets sent, or to be dealt with by callback below */
> +	if (unlikely(sent != to_send))
> +		buffer->cbfn(&buffer->pkts[sent], to_send - sent,
> +				buffer->userdata);
> +
> +	return sent;
> +}
[...]
> +/**
> + * Callback function for tracking unsent buffered packets.
> + *
> + * This function can be passed to rte_eth_tx_buffer_set_err_callback() to
> + * adjust the default behaviour when buffered packets cannot be sent. This
> + * function drops any unsent packets, but also updates a user-supplied counter
> + * to track the overall number of packets dropped. The counter should be an
> + * uint64_t variable.
> + *
> + * NOTE: this function should not be called directly, instead it should be used
> + *       as a callback for packet buffering.
> + *
> + * NOTE: when configuring this function as a callback with
> + *       rte_eth_tx_buffer_set_err_callback(), the final, userdata parameter
> + *       should point to an uint64_t value.

Please forget this idea of counter in the default callback.

[...]
> +void
> +rte_eth_count_unsent_packet_callback(struct rte_mbuf **pkts, uint16_t unsent,
> +		void *userdata);

What about rte_eth_tx_buffer_default_callback as name?

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [dpdk-dev] [PATCH v2 1/2] ethdev: add buffered tx api
  2016-03-08 22:52     ` Thomas Monjalon
@ 2016-03-09 13:36       ` Ananyev, Konstantin
  2016-03-09 14:25         ` Thomas Monjalon
  2016-03-09 16:35       ` Kulasek, TomaszX
  1 sibling, 1 reply; 43+ messages in thread
From: Ananyev, Konstantin @ 2016-03-09 13:36 UTC (permalink / raw)
  To: Thomas Monjalon, Kulasek, TomaszX; +Cc: dev

Hi Thomas,

> 
> Hi,
> 
> It is an overlay on the tx burst API.
> Probably it doesn't hurt to add it but we have to be really cautious
> with the API definition to try keeping it stable in the future.
> 
> 2016-02-24 18:08, Tomasz Kulasek:
> > +/**
> > + * Structure used to buffer packets for future TX
> > + * Used by APIs rte_eth_tx_buffer and rte_eth_tx_buffer_flush
> > + */
> > +struct rte_eth_dev_tx_buffer {
> > +	unsigned nb_pkts;
> 
> What about "length"?
> Why is it unsigned and the size is uint16_t?

Good point, yes, we need to make it consistent.

> 
> > +	uint64_t errors;
> > +	/**< Total number of queue packets to sent that are dropped. */
> 
> The errors are passed as userdata to the default callback.
> If we really want to have this kind of counter, we can define our
> own callback. So why defining this field as standard?
> I would like to keep it as simple as possible.
> 
> > +	buffer_tx_error_fn cbfn;
> 
> Why not simply "callback" as name?
> 
> > +	void *userdata;
> > +	uint16_t size;           /**< Size of buffer for buffered tx */
> > +	struct rte_mbuf *pkts[];
> > +};
> 
> What is the benefit of exposing this structure in the API,
> except that it is used in some inline functions?
> 

I described the benefits I think it provides here:
http://dpdk.org/ml/archives/dev/2016-February/033058.html

> > +static inline uint16_t
> > +rte_eth_tx_buffer_flush(uint8_t port_id, uint16_t queue_id,
> > +		struct rte_eth_dev_tx_buffer *buffer)
> > +{
> > +	uint16_t sent;
> > +
> > +	uint16_t to_send = buffer->nb_pkts;
> > +
> > +	if (to_send == 0)
> > +		return 0;
> 
> Why this check is done in the lib?
> What is the performance gain if we are idle?
> It can be done outside if needed.

Yes, that could be done outside, but if user has to do it anyway,
why not to put it inside?
I don't expect any performance gain/loss because of that -
just seems a bit more convenient to the user.

Konstantin

> 
> > +	sent = rte_eth_tx_burst(port_id, queue_id, buffer->pkts, to_send);
> > +
> > +	buffer->nb_pkts = 0;
> > +
> > +	/* All packets sent, or to be dealt with by callback below */
> > +	if (unlikely(sent != to_send))
> > +		buffer->cbfn(&buffer->pkts[sent], to_send - sent,
> > +				buffer->userdata);
> > +
> > +	return sent;
> > +}
> [...]
> > +/**
> > + * Callback function for tracking unsent buffered packets.
> > + *
> > + * This function can be passed to rte_eth_tx_buffer_set_err_callback() to
> > + * adjust the default behaviour when buffered packets cannot be sent. This
> > + * function drops any unsent packets, but also updates a user-supplied counter
> > + * to track the overall number of packets dropped. The counter should be an
> > + * uint64_t variable.
> > + *
> > + * NOTE: this function should not be called directly, instead it should be used
> > + *       as a callback for packet buffering.
> > + *
> > + * NOTE: when configuring this function as a callback with
> > + *       rte_eth_tx_buffer_set_err_callback(), the final, userdata parameter
> > + *       should point to an uint64_t value.
> 
> Please forget this idea of counter in the default callback.
> 
> [...]
> > +void
> > +rte_eth_count_unsent_packet_callback(struct rte_mbuf **pkts, uint16_t unsent,
> > +		void *userdata);
> 
> What about rte_eth_tx_buffer_default_callback as name?

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [dpdk-dev] [PATCH v2 1/2] ethdev: add buffered tx api
  2016-03-09 13:36       ` Ananyev, Konstantin
@ 2016-03-09 14:25         ` Thomas Monjalon
  2016-03-09 15:23           ` Ananyev, Konstantin
  0 siblings, 1 reply; 43+ messages in thread
From: Thomas Monjalon @ 2016-03-09 14:25 UTC (permalink / raw)
  To: Ananyev, Konstantin; +Cc: dev

2016-03-09 13:36, Ananyev, Konstantin:
> > > +   if (to_send == 0)
> > > +           return 0;
> > 
> > Why this check is done in the lib?
> > What is the performance gain if we are idle?
> > It can be done outside if needed.
> 
> Yes, that could be done outside, but if user has to do it anyway,
> why not to put it inside?
> I don't expect any performance gain/loss because of that -
> just seems a bit more convenient to the user.

It is handling an idle case so there is no gain obviously.
But the condition branching is surely a loss.
So why would the user want to do this check?

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [dpdk-dev] [PATCH v2 1/2] ethdev: add buffered tx api
  2016-03-09 14:25         ` Thomas Monjalon
@ 2016-03-09 15:23           ` Ananyev, Konstantin
  2016-03-09 15:26             ` Thomas Monjalon
  0 siblings, 1 reply; 43+ messages in thread
From: Ananyev, Konstantin @ 2016-03-09 15:23 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev

> 
> 2016-03-09 13:36, Ananyev, Konstantin:
> > > > +   if (to_send == 0)
> > > > +           return 0;
> > >
> > > Why this check is done in the lib?
> > > What is the performance gain if we are idle?
> > > It can be done outside if needed.
> >
> > Yes, that could be done outside, but if user has to do it anyway,
> > why not to put it inside?
> > I don't expect any performance gain/loss because of that -
> > just seems a bit more convenient to the user.
> 
> It is handling an idle case so there is no gain obviously.
> But the condition branching is surely a loss.

I suppose that condition should always be checked:
either in user code prior to function call or inside the
function call itself.
So don't expect any difference in performance here...
Do you have any particular example when you think it would? 
Or are you talking about rte_eth_tx_buffer() calling
rte_eth_tx_buffer_flush() internally?
For that one - both functions are 'static inline', so I expect the
compiler to be smart enough to remove this redundant check.

> So why would the user want to do this check?
Just for user convenience - to save him doing that manually.

Konstantin

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [dpdk-dev] [PATCH v2 1/2] ethdev: add buffered tx api
  2016-03-09 15:23           ` Ananyev, Konstantin
@ 2016-03-09 15:26             ` Thomas Monjalon
  2016-03-09 15:32               ` Kulasek, TomaszX
  2016-03-09 15:42               ` Ananyev, Konstantin
  0 siblings, 2 replies; 43+ messages in thread
From: Thomas Monjalon @ 2016-03-09 15:26 UTC (permalink / raw)
  To: Ananyev, Konstantin; +Cc: dev

2016-03-09 15:23, Ananyev, Konstantin:
> > 
> > 2016-03-09 13:36, Ananyev, Konstantin:
> > > > > +   if (to_send == 0)
> > > > > +           return 0;
> > > >
> > > > Why this check is done in the lib?
> > > > What is the performance gain if we are idle?
> > > > It can be done outside if needed.
> > >
> > > Yes, that could be done outside, but if user has to do it anyway,
> > > why not to put it inside?
> > > I don't expect any performance gain/loss because of that -
> > > just seems a bit more convenient to the user.
> > 
> > It is handling an idle case so there is no gain obviously.
> > But the condition branching is surely a loss.
> 
> I suppose that condition should always be checked:
> either in user code prior to function call or inside the
> function call itself.
> So don't expect any difference in performance here...
> Do you have any particular example when you think it would? 
> Or are you talking about rte_eth_tx_buffer() calling
> rte_eth_tx_buffer_flush() internally?
> For that one - both functions are 'static inline', so I expect the
> compiler to be smart enough to remove this redundant check.
> 
> > So why would the user want to do this check?
> Just for user convenience - to save him doing that manually.

Probably I've missed something. If we remove this check, the function
will do nothing, right? How is it changing the behaviour?

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [dpdk-dev] [PATCH v2 1/2] ethdev: add buffered tx api
  2016-03-09 15:26             ` Thomas Monjalon
@ 2016-03-09 15:32               ` Kulasek, TomaszX
  2016-03-09 15:37                 ` Thomas Monjalon
  2016-03-09 15:42               ` Ananyev, Konstantin
  1 sibling, 1 reply; 43+ messages in thread
From: Kulasek, TomaszX @ 2016-03-09 15:32 UTC (permalink / raw)
  To: Thomas Monjalon, Ananyev, Konstantin; +Cc: dev



> -----Original Message-----
> From: Thomas Monjalon [mailto:thomas.monjalon@6wind.com]
> Sent: Wednesday, March 9, 2016 16:27
> To: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> Cc: Kulasek, TomaszX <tomaszx.kulasek@intel.com>; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v2 1/2] ethdev: add buffered tx api
> 
> 2016-03-09 15:23, Ananyev, Konstantin:
> > >
> > > 2016-03-09 13:36, Ananyev, Konstantin:
> > > > > > +   if (to_send == 0)
> > > > > > +           return 0;
> > > > >
> > > > > Why this check is done in the lib?
> > > > > What is the performance gain if we are idle?
> > > > > It can be done outside if needed.
> > > >
> > > > Yes, that could be done outside, but if user has to do it anyway,
> > > > why not to put it inside?
> > > > I don't expect any performance gain/loss because of that - just
> > > > seems a bit more convenient to the user.
> > >
> > > It is handling an idle case so there is no gain obviously.
> > > But the condition branching is surely a loss.
> >
> > I suppose that condition should always be checked:
> > either in user code prior to function call or inside the function call
> > itself.
> > So don't expect any difference in performance here...
> > Do you have any particular example when you think it would?
> > Or are you talking about rte_eth_tx_buffer() calling
> > rte_eth_tx_buffer_flush() internally?
> > For that one - both functions are 'static inline', so I expect the
> > compiler to be smart enough to remove this redundant check.
> >
> > > So why would the user want to do this check?
> > Just for user convenience - to save him doing that manually.
> 
> Probably I've missed something. If we remove this check, the function will
> do nothing, right? How is it changing the behaviour?

If we remove this check, the function will try to send 0 packets and check the condition for errors. So we gain nothing by removing that.

Tomasz

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [dpdk-dev] [PATCH v2 1/2] ethdev: add buffered tx api
  2016-03-09 15:32               ` Kulasek, TomaszX
@ 2016-03-09 15:37                 ` Thomas Monjalon
  0 siblings, 0 replies; 43+ messages in thread
From: Thomas Monjalon @ 2016-03-09 15:37 UTC (permalink / raw)
  To: Kulasek, TomaszX; +Cc: dev

2016-03-09 15:32, Kulasek, TomaszX:
> From: Thomas Monjalon [mailto:thomas.monjalon@6wind.com]
> > 2016-03-09 15:23, Ananyev, Konstantin:
> > > >
> > > > 2016-03-09 13:36, Ananyev, Konstantin:
> > > > > > > +   if (to_send == 0)
> > > > > > > +           return 0;
> > > > > >
> > > > > > Why this check is done in the lib?
> > > > > > What is the performance gain if we are idle?
> > > > > > It can be done outside if needed.
> > > > >
> > > > > Yes, that could be done outside, but if user has to do it anyway,
> > > > > why not to put it inside?
> > > > > I don't expect any performance gain/loss because of that - just
> > > > > seems a bit more convenient to the user.
> > > >
> > > > It is handling an idle case so there is no gain obviously.
> > > > But the condition branching is surely a loss.
> > >
> > > I suppose that condition should always be checked:
> > > either in user code prior to function call or inside the function call
> > > itself.
> > > So don't expect any difference in performance here...
> > > Do you have any particular example when you think it would?
> > > Or are you talking about rte_eth_tx_buffer() calling
> > > rte_eth_tx_buffer_flush() internally?
> > > For that one - both functions are 'static inline', so I expect the
> > > compiler to be smart enough to remove this redundant check.
> > >
> > > > So why would the user want to do this check?
> > > Just for user convenience - to save him doing that manually.
> > 
> > Probably I've missed something. If we remove this check, the function will
> > do nothing, right? How is it changing the behaviour?
> 
> If we remove this check, the function will try to send 0 packets and check
> the condition for errors. So we gain nothing by removing that.

Actually I should not be arguing why to remove it,
but you should be arguing why to add it :)

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [dpdk-dev] [PATCH v2 1/2] ethdev: add buffered tx api
  2016-03-09 15:26             ` Thomas Monjalon
  2016-03-09 15:32               ` Kulasek, TomaszX
@ 2016-03-09 15:42               ` Ananyev, Konstantin
  2016-03-09 15:52                 ` Thomas Monjalon
  1 sibling, 1 reply; 43+ messages in thread
From: Ananyev, Konstantin @ 2016-03-09 15:42 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev



> -----Original Message-----
> From: Thomas Monjalon [mailto:thomas.monjalon@6wind.com]
> Sent: Wednesday, March 09, 2016 3:27 PM
> To: Ananyev, Konstantin
> Cc: Kulasek, TomaszX; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v2 1/2] ethdev: add buffered tx api
> 
> 2016-03-09 15:23, Ananyev, Konstantin:
> > >
> > > 2016-03-09 13:36, Ananyev, Konstantin:
> > > > > > +   if (to_send == 0)
> > > > > > +           return 0;
> > > > >
> > > > > Why this check is done in the lib?
> > > > > What is the performance gain if we are idle?
> > > > > It can be done outside if needed.
> > > >
> > > > Yes, that could be done outside, but if user has to do it anyway,
> > > > why not to put it inside?
> > > > I don't expect any performance gain/loss because of that -
> > > > just seems a bit more convenient to the user.
> > >
> > > It is handling an idle case so there is no gain obviously.
> > > But the condition branching is surely a loss.
> >
> > I suppose that condition should always be checked:
> > either in user code prior to function call or inside the
> > function call itself.
> > So don't expect any difference in performance here...
> > Do you have any particular example when you think it would?
> > Or are you talking about rte_eth_tx_buffer() calling
> > rte_eth_tx_buffer_flush() internally?
> > For that one - both functions are 'static inline', so I expect the
> > compiler to be smart enough to remove this redundant check.
> >
> > > So why would the user want to do this check?
> > Just for user convenience - to save him doing that manually.
> 
> Probably I've missed something. If we remove this check, the function
> will do nothing, right? How is it changing the behaviour?

If we'll remove that check, then 
rte_eth_tx_burst(...,nb_pkts=0)->(*dev->tx_pkt_burst)(...,nb_pkts=0)
will be called.
So in that case it might be even slower, as we'll have to do a proper call.
Of course user can avoid it by:

If(tx_buffer->nb_pkts != 0)
	rte_eth_tx_buffer_flush(port, queue, tx_buffer);

But as I said, why force the user to do that?
Why not make this check inside the function?
Konstantin

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [dpdk-dev] [PATCH v2 1/2] ethdev: add buffered tx api
  2016-03-09 15:42               ` Ananyev, Konstantin
@ 2016-03-09 15:52                 ` Thomas Monjalon
  2016-03-09 16:17                   ` Ananyev, Konstantin
  0 siblings, 1 reply; 43+ messages in thread
From: Thomas Monjalon @ 2016-03-09 15:52 UTC (permalink / raw)
  To: Ananyev, Konstantin; +Cc: dev

2016-03-09 15:42, Ananyev, Konstantin:
> From: Thomas Monjalon [mailto:thomas.monjalon@6wind.com]
> > 2016-03-09 15:23, Ananyev, Konstantin:
> > > >
> > > > 2016-03-09 13:36, Ananyev, Konstantin:
> > > > > > > +   if (to_send == 0)
> > > > > > > +           return 0;
> > > > > >
> > > > > > Why this check is done in the lib?
> > > > > > What is the performance gain if we are idle?
> > > > > > It can be done outside if needed.
> > > > >
> > > > > Yes, that could be done outside, but if user has to do it anyway,
> > > > > why not to put it inside?
> > > > > I don't expect any performance gain/loss because of that -
> > > > > just seems a bit more convenient to the user.
> > > >
> > > > It is handling an idle case so there is no gain obviously.
> > > > But the condition branching is surely a loss.
> > >
> > > I suppose that condition should always be checked:
> > > either in user code prior to function call or inside the
> > > function call itself.
> > > So don't expect any difference in performance here...
> > > Do you have any particular example when you think it would?
> > > Or are you talking about rte_eth_tx_buffer() calling
> > > rte_eth_tx_buffer_flush() internally?
> > > For that one - both functions are 'static inline', so I expect the
> > > compiler to be smart enough to remove this redundant check.
> > >
> > > > So why would the user want to do this check?
> > > Just for user convenience - to save him doing that manually.
> > 
> > Probably I've missed something. If we remove this check, the function
> > will do nothing, right? How is it changing the behaviour?
> 
> If we'll remove that check, then 
> rte_eth_tx_burst(...,nb_pkts=0)->(*dev->tx_pkt_burst)(...,nb_pkts=0)
> will be called.
> So in that case it might be even slower, as we'll have to do a proper call.

If there is no packet, we have time to do a useless call.

> Of course user can avoid it by:
> 
> If(tx_buffer->nb_pkts != 0)
> 	rte_eth_tx_buffer_flush(port, queue, tx_buffer);
> 
> But as I said, why force the user to do that?
> Why not make this check inside the function?

Because it may be slower when there are some packets
and will "accelerate" only the no-packet case.

We do not progress in this discussion. It is not a big deal, just nonsense.
So I agree to keep it if we change the website to announce that DPDK
accelerates the idle processing ;)

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [dpdk-dev] [PATCH v2 1/2] ethdev: add buffered tx api
  2016-03-09 15:52                 ` Thomas Monjalon
@ 2016-03-09 16:17                   ` Ananyev, Konstantin
  2016-03-09 16:21                     ` Thomas Monjalon
  0 siblings, 1 reply; 43+ messages in thread
From: Ananyev, Konstantin @ 2016-03-09 16:17 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev



> -----Original Message-----
> From: Thomas Monjalon [mailto:thomas.monjalon@6wind.com]
> Sent: Wednesday, March 09, 2016 3:52 PM
> To: Ananyev, Konstantin
> Cc: Kulasek, TomaszX; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v2 1/2] ethdev: add buffered tx api
> 
> 2016-03-09 15:42, Ananyev, Konstantin:
> > From: Thomas Monjalon [mailto:thomas.monjalon@6wind.com]
> > > 2016-03-09 15:23, Ananyev, Konstantin:
> > > > >
> > > > > 2016-03-09 13:36, Ananyev, Konstantin:
> > > > > > > > +   if (to_send == 0)
> > > > > > > > +           return 0;
> > > > > > >
> > > > > > > Why this check is done in the lib?
> > > > > > > What is the performance gain if we are idle?
> > > > > > > It can be done outside if needed.
> > > > > >
> > > > > > Yes, that could be done outside, but if user has to do it anyway,
> > > > > > why not to put it inside?
> > > > > > I don't expect any performance gain/loss because of that -
> > > > > > just seems a bit more convenient to the user.
> > > > >
> > > > > It is handling an idle case so there is no gain obviously.
> > > > > But the condition branching is surely a loss.
> > > >
> > > > I suppose that condition should always be checked:
> > > > either in user code prior to function call or inside the
> > > > function call itself.
> > > > So don't expect any difference in performance here...
> > > > Do you have any particular example when you think it would?
> > > > Or are you talking about rte_eth_tx_buffer() calling
> > > > rte_eth_tx_buffer_flush() internally?
> > > > For that one - both functions are 'static inline', so I expect the
> > > > compiler to be smart enough to remove this redundant check.
> > > >
> > > > > So why would the user want to do this check?
> > > > Just for user convenience - to save him doing that manually.
> > >
> > > Probably I've missed something. If we remove this check, the function
> > > will do nothing, right? How is it changing the behaviour?
> >
> > If we'll remove that check, then
> > rte_eth_tx_burst(...,nb_pkts=0)->(*dev->tx_pkt_burst)(...,nb_pkts=0)
> > will be called.
> > So in that case it might be even slower, as we'll have to do a proper call.
> 
> If there is no packet, we have time to do a useless call.

One lcore can do TX for several queues/ports.
Let's say we have N queues to handle, but right now traffic is going only through
one of them.
That means we'll have to do N-1 useless calls and reduce the number of cycles
available to send actual traffic.
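
A rough sketch of that polling pattern (illustrative only; assuming one
tx buffer per port and the API from this series):

	for (port = 0; port < nb_ports; port++)
		/* with the length check inside the flush, an idle port
		 * costs one branch instead of a full tx_burst() call */
		rte_eth_tx_buffer_flush(port, 0, tx_buffers[port]);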

> 
> > Of course user can avoid it by:
> >
> > If(tx_buffer->nb_pkts != 0)
> > 	rte_eth_tx_buffer_flush(port, queue, tx_buffer);
> >
> > But as I said, why force the user to do that?
> > Why not make this check inside the function?
> 
> Because it may be slower when there are some packets
> and will "accelerate" only the no-packet case.
> 
> We do not progress in this discussion.
> It is not a big deal, 

Exactly.

>just nonsense.

Look at what most of the current DPDK examples do: they manually check
whether nb_pkts==0, and call tx_burst() only if it is not.
For me it makes sense to move that check into the library function -
so each and every caller doesn't have to do it manually.
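
I.e. today's examples all carry some variant of this (a sketch; pkts
and n_pkts stand for whatever the application has buffered):

	if (n_pkts != 0)
		rte_eth_tx_burst(port, queue, pkts, n_pkts);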

> So I agree to keep it if we change the website to announce that DPDK
> accelerates the idle processing ;)

That's fine by me, but first I suppose you'll have to provide some data
showing that this approach slows things down, right? :)

Konstantin

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [dpdk-dev] [PATCH v2 1/2] ethdev: add buffered tx api
  2016-03-09 16:17                   ` Ananyev, Konstantin
@ 2016-03-09 16:21                     ` Thomas Monjalon
  0 siblings, 0 replies; 43+ messages in thread
From: Thomas Monjalon @ 2016-03-09 16:21 UTC (permalink / raw)
  To: Ananyev, Konstantin; +Cc: dev

2016-03-09 16:17, Ananyev, Konstantin:
> From: Thomas Monjalon [mailto:thomas.monjalon@6wind.com]
> > 2016-03-09 15:42, Ananyev, Konstantin:
> > > From: Thomas Monjalon [mailto:thomas.monjalon@6wind.com]
> > > > 2016-03-09 15:23, Ananyev, Konstantin:
> > > > > >
> > > > > > 2016-03-09 13:36, Ananyev, Konstantin:
> > > > > > > > > +   if (to_send == 0)
> > > > > > > > > +           return 0;
> > > > > > > >
> > > > > > > > Why this check is done in the lib?
> > > > > > > > What is the performance gain if we are idle?
> > > > > > > > It can be done outside if needed.
> > > > > > >
> > > > > > > Yes, that could be done outside, but if user has to do it anyway,
> > > > > > > why not to put it inside?
> > > > > > > I don't expect any performance gain/loss because of that -
> > > > > > > just seems a bit more convenient to the user.
> > > > > >
> > > > > > It is handling an idle case so there is no gain obviously.
> > > > > > But the condition branching is surely a loss.
> > > > >
> > > > > I suppose that condition should always be checked:
> > > > > either in user code prior to function call or inside the
> > > > > function call itself.
> > > > > So don't expect any difference in performance here...
> > > > > Do you have any particular example when you think it would?
> > > > > Or are you talking about rte_eth_tx_buffer() calling
> > > > > rte_eth_tx_buffer_flush() internally?
> > > > > For that one - both functions are 'static inline', so I expect the
> > > > > compiler to be smart enough to remove this redundant check.
> > > > >
> > > > > > So why would the user want to do this check?
> > > > > Just for user convenience - to save him doing that manually.
> > > >
> > > > Probably I've missed something. If we remove this check, the function
> > > > will do nothing, right? How is it changing the behaviour?
> > >
> > > If we'll remove that check, then
> > > rte_eth_tx_burst(...,nb_pkts=0)->(*dev->tx_pkt_burst)(...,nb_pkts=0)
> > > will be called.
> > > So in that case it might be even slower, as we'll have to do a proper call.
> > 
> > If there is no packet, we have time to do a useless call.
> 
> One lcore can do TX for several queues/ports.
> Let's say we have N queues to handle, but right now traffic is going only through
> one of them.
> That means we'll have to do N-1 useless calls and reduce the number of cycles
> available to send actual traffic.

OK, good justification, thanks.

> > > Of course user can avoid it by:
> > >
> > > If(tx_buffer->nb_pkts != 0)
> > > 	rte_eth_tx_buffer_flush(port, queue, tx_buffer);
> > >
> > > But as I said, why force the user to do that?
> > > Why not make this check inside the function?
> > 
> > Because it may be slower when there are some packets
> > and will "accelerate" only the no-packet case.
> > 
> > We do not progress in this discussion.
> > It is not a big deal, 
> 
> Exactly.
> 
> >just a non sense.
> 
> Look at what most of the current DPDK examples do: they manually check
> whether nb_pkts==0, and call tx_burst() only if it is not.
> For me it makes sense to move that check into the library function -
> so each and every caller doesn't have to do it manually.
> 
> > So I agree to keep it if we change the website to announce that DPDK
> > accelerates the idle processing ;)
> 
> That's fine by me, but first I suppose you'll have to provide some data
> showing that this approach slows things down, right? :)

You got me

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [dpdk-dev] [PATCH v2 1/2] ethdev: add buffered tx api
  2016-03-08 22:52     ` Thomas Monjalon
  2016-03-09 13:36       ` Ananyev, Konstantin
@ 2016-03-09 16:35       ` Kulasek, TomaszX
  2016-03-09 17:06         ` Thomas Monjalon
  1 sibling, 1 reply; 43+ messages in thread
From: Kulasek, TomaszX @ 2016-03-09 16:35 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev



> -----Original Message-----
> From: Thomas Monjalon [mailto:thomas.monjalon@6wind.com]
> Sent: Tuesday, March 8, 2016 23:52
> To: Kulasek, TomaszX <tomaszx.kulasek@intel.com>
> Cc: dev@dpdk.org; Ananyev, Konstantin <konstantin.ananyev@intel.com>
> Subject: Re: [dpdk-dev] [PATCH v2 1/2] ethdev: add buffered tx api
> 
> Hi,
> 

[...]

> > +/**
> > + * Callback function for tracking unsent buffered packets.
> > + *
> > + * This function can be passed to
> > +rte_eth_tx_buffer_set_err_callback() to
> > + * adjust the default behaviour when buffered packets cannot be sent.
> > +This
> > + * function drops any unsent packets, but also updates a
> > +user-supplied counter
> > + * to track the overall number of packets dropped. The counter should
> > +be an
> > + * uint64_t variable.
> > + *
> > + * NOTE: this function should not be called directly, instead it should
> be used
> > + *       as a callback for packet buffering.
> > + *
> > + * NOTE: when configuring this function as a callback with
> > + *       rte_eth_tx_buffer_set_err_callback(), the final, userdata
> parameter
> > + *       should point to an uint64_t value.
> 
> Please forget this idea of counter in the default callback.
> 

Ok, I forgot.

> [...]
> > +void
> > +rte_eth_count_unsent_packet_callback(struct rte_mbuf **pkts, uint16_t
> unsent,
> > +		void *userdata);
> 
> What about rte_eth_tx_buffer_default_callback as name?

This function is now used as the default, to count silently dropped packets and update the error counter in the tx_buffer structure. When I remove the error counter and make silent drop the default behavior, it's better to have two callbacks to choose from:

1) silently dropping packets (set as default)
2) dropping with a counter, as defined above.

Maybe it's better to define two default callbacks, since many applications will still want to update an internal error counter.
So IMHO these names are more descriptive:

rte_eth_tx_buffer_drop_callback
rte_eth_tx_buffer_count_callback
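
An application that wants the counter would then register it with
something like this (a sketch; "dropped" is application-defined):

	static uint64_t dropped;

	rte_eth_tx_buffer_set_err_callback(buffer,
			rte_eth_tx_buffer_count_callback, &dropped);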

What do you think?

Tomasz

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [dpdk-dev] [PATCH v2 1/2] ethdev: add buffered tx api
  2016-03-09 16:35       ` Kulasek, TomaszX
@ 2016-03-09 17:06         ` Thomas Monjalon
  2016-03-09 18:12           ` Kulasek, TomaszX
  0 siblings, 1 reply; 43+ messages in thread
From: Thomas Monjalon @ 2016-03-09 17:06 UTC (permalink / raw)
  To: Kulasek, TomaszX; +Cc: dev

2016-03-09 16:35, Kulasek, TomaszX:
> From: Thomas Monjalon [mailto:thomas.monjalon@6wind.com]
> > > +void
> > > +rte_eth_count_unsent_packet_callback(struct rte_mbuf **pkts, uint16_t
> > unsent,
> > > +		void *userdata);
> > 
> > What about rte_eth_tx_buffer_default_callback as name?
> 
> This function is now used as the default, to count silently dropped packets and update the error counter in the tx_buffer structure. When I remove the error counter and make silent drop the default behavior, it's better to have two callbacks to choose from:
> 
> 1) silently dropping packets (set as default)
> 2) dropping with a counter, as defined above.
> 
> Maybe it's better to define two default callbacks, since many applications will still want to update an internal error counter.
> So IMHO these names are more descriptive:
> 
> rte_eth_tx_buffer_drop_callback
> rte_eth_tx_buffer_count_callback
> 
> What do you think?

I think you are right about the name.

Are you sure providing a "count" callback is needed?
Is it just to refactor the examples?

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [dpdk-dev] [PATCH v2 1/2] ethdev: add buffered tx api
  2016-03-09 17:06         ` Thomas Monjalon
@ 2016-03-09 18:12           ` Kulasek, TomaszX
  0 siblings, 0 replies; 43+ messages in thread
From: Kulasek, TomaszX @ 2016-03-09 18:12 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev



> -----Original Message-----
> From: Thomas Monjalon [mailto:thomas.monjalon@6wind.com]
> Sent: Wednesday, March 9, 2016 18:07
> To: Kulasek, TomaszX <tomaszx.kulasek@intel.com>
> Cc: dev@dpdk.org; Ananyev, Konstantin <konstantin.ananyev@intel.com>
> Subject: Re: [dpdk-dev] [PATCH v2 1/2] ethdev: add buffered tx api
> 
> 2016-03-09 16:35, Kulasek, TomaszX:
> > From: Thomas Monjalon [mailto:thomas.monjalon@6wind.com]
> > > > +void
> > > > +rte_eth_count_unsent_packet_callback(struct rte_mbuf **pkts,
> > > > +uint16_t
> > > unsent,
> > > > +		void *userdata);
> > >
> > > What about rte_eth_tx_buffer_default_callback as name?
> >
> > This function is now used as the default, to count silently dropped
> > packets and update the error counter in the tx_buffer structure. When I
> > remove the error counter and make silent drop the default behavior, it's
> > better to have two callbacks to choose from:
> >
> > 1) silently dropping packets (set as default)
> > 2) dropping with a counter, as defined above.
> >
> > Maybe it's better to define two default callbacks, since many
> > applications will still want to update an internal error counter. So
> > IMHO these names are more descriptive:
> >
> > rte_eth_tx_buffer_drop_callback
> > rte_eth_tx_buffer_count_callback
> >
> > What do you think?
> 
> I think you are right about the name.
> 
> Are you sure providing a "count" callback is needed?
> Is it just to refactor the examples?

I think it's useful to have a callback which lets you easily track the overall number of packets dropped. It's handy when you want to drop packets but not leave them untracked.

It's good to have it, but it's not critical.

Changing the examples is not a problem while I've got copy-paste superpower.

Tomasz

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [dpdk-dev] [PATCH v3 0/2] add support for buffered tx to ethdev
  2016-02-24 17:08 ` [dpdk-dev] [PATCH v2 " Tomasz Kulasek
                     ` (2 preceding siblings ...)
  2016-02-25 16:17   ` [dpdk-dev] [PATCH v2 0/2] add support for buffered tx to ethdev Ananyev, Konstantin
@ 2016-03-10 10:57   ` Tomasz Kulasek
  2016-03-10 10:57     ` [dpdk-dev] [PATCH v3 1/2] ethdev: add buffered tx api Tomasz Kulasek
                       ` (3 more replies)
  3 siblings, 4 replies; 43+ messages in thread
From: Tomasz Kulasek @ 2016-03-10 10:57 UTC (permalink / raw)
  To: dev

Many sample apps include internal buffering for single-packet-at-a-time
operation. Since this is such a common paradigm, this functionality is
better suited to being implemented in the ethdev API.

The new APIs in the ethdev library are:
* rte_eth_tx_buffer_init - initialize buffer
* rte_eth_tx_buffer - buffer up a single packet for future transmission
* rte_eth_tx_buffer_flush - flush any unsent buffered packets
* rte_eth_tx_buffer_set_err_callback - set up a callback to be called in
  case transmitting a buffered burst fails. By default, we just free the
  unsent packets.

As well as these, two additional reference callbacks are provided, which
free the packets:

* rte_eth_tx_buffer_drop_callback - silently drop packets (default
  behavior)
* rte_eth_tx_buffer_count_callback - drop and update user-provided counter
  to track the number of dropped packets

Due to feedback from the mailing list that buffer management facilities
in the user application are preferable to API simplicity, we decided
to move the internal buffer table, as well as the callback functions and
user data, from rte_eth_dev/rte_eth_dev_data to the application space.
This prevents ABI breakage and gives more flexibility in buffer
management, such as allocation, dynamic size changes, and reuse of
buffers across many ports or after a failed send.


The following steps illustrate how tx buffers can be used in application:

1) Initialization

a) Allocate memory for a buffer

   struct rte_eth_dev_tx_buffer *buffer = rte_zmalloc_socket("tx_buffer",
           RTE_ETH_TX_BUFFER_SIZE(MAX_PKT_BURST), 0, socket_id);

   RTE_ETH_TX_BUFFER_SIZE(size) macro computes memory required to store
   "size" packets in buffer.

b) Initialize allocated memory and set up default values. Threshold level
   must be lower than or equal to the MAX_PKT_BURST from 1a)

   rte_eth_tx_buffer_init(buffer, threshold);


c) Set error callback (optional)

   rte_eth_tx_buffer_set_err_callback(buffer, callback_fn, userdata);


2) Store packet "pkt" in the buffer; all buffered packets are sent to the
   queue_id on port_id when the number of packets reaches the threshold
   level set up in 1b)

   rte_eth_tx_buffer(port_id, queue_id, buffer, pkt);


3) Send all stored packets to the queue_id on port_id

   rte_eth_tx_buffer_flush(port_id, queue_id, buffer);


4) Flush buffer and free memory

   rte_eth_tx_buffer_flush(port_id, queue_id, buffer);
   ...
   rte_free(buffer);
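
Putting the steps together, in the shape the reworked examples use
(a sketch; MAX_PKT_BURST, socket_id, port_id, queue_id and pkt are
application-defined):

   struct rte_eth_dev_tx_buffer *buffer = rte_zmalloc_socket("tx_buffer",
           RTE_ETH_TX_BUFFER_SIZE(MAX_PKT_BURST), 0, socket_id);

   if (buffer == NULL)
       rte_exit(EXIT_FAILURE, "cannot allocate tx buffer\n");
   rte_eth_tx_buffer_init(buffer, MAX_PKT_BURST);

   /* fast path, per packet: flushes by itself once MAX_PKT_BURST
    * packets have been collected */
   rte_eth_tx_buffer(port_id, queue_id, buffer, pkt);

   /* slow path: drain on a timer so packets don't sit in the buffer
    * for too long, and once more at shutdown */
   rte_eth_tx_buffer_flush(port_id, queue_id, buffer);
   rte_free(buffer);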

v3 changes:
 - error counter removed from tx buffer structure, now default behavior is
   silent drop of unsent packets
 - some names were changed in the tx buffer structure to be more descriptive
 - two default callbacks are provided: rte_eth_tx_buffer_drop_callback and
   rte_eth_tx_buffer_count_callback

v2 changes:
 - reworked to use new buffer model
 - buffer data and callbacks are removed from rte_eth_dev/rte_eth_dev_data,
   so this patch doesn't break the ABI anymore
 - introduced RTE_ETH_TX_BUFFER macro and rte_eth_tx_buffer_init
 - buffers are not attached to the port-queue
 - buffers can be allocated dynamically during application work
 - size of buffer can be changed without port restart

Tomasz Kulasek (2):
  ethdev: add buffered tx api
  examples: rework to use buffered tx

 examples/l2fwd-jobstats/main.c                     |  104 ++++------
 examples/l2fwd-keepalive/main.c                    |  100 ++++------
 examples/l2fwd/main.c                              |  104 ++++------
 examples/l3fwd-acl/main.c                          |   92 ++++-----
 examples/l3fwd-power/main.c                        |   89 ++++-----
 examples/link_status_interrupt/main.c              |  107 ++++------
 .../client_server_mp/mp_client/client.c            |  101 ++++++----
 examples/multi_process/l2fwd_fork/main.c           |   97 ++++-----
 examples/packet_ordering/main.c                    |  122 ++++++++----
 examples/qos_meter/main.c                          |   61 ++----
 lib/librte_ether/rte_ethdev.c                      |   46 +++++
 lib/librte_ether/rte_ethdev.h                      |  205 +++++++++++++++++++-
 lib/librte_ether/rte_ether_version.map             |   10 +
 13 files changed, 696 insertions(+), 542 deletions(-)

-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [dpdk-dev] [PATCH v3 1/2] ethdev: add buffered tx api
  2016-03-10 10:57   ` [dpdk-dev] [PATCH v3 " Tomasz Kulasek
@ 2016-03-10 10:57     ` Tomasz Kulasek
  2016-03-10 16:23       ` Thomas Monjalon
  2016-03-10 10:57     ` [dpdk-dev] [PATCH v3 2/2] examples: rework to use buffered tx Tomasz Kulasek
                       ` (2 subsequent siblings)
  3 siblings, 1 reply; 43+ messages in thread
From: Tomasz Kulasek @ 2016-03-10 10:57 UTC (permalink / raw)
  To: dev

Many sample apps include internal buffering for single-packet-at-a-time
operation. Since this is such a common paradigm, this functionality is
better suited to being implemented in the ethdev API.

The new APIs in the ethdev library are:
* rte_eth_tx_buffer_init - initialize buffer
* rte_eth_tx_buffer - buffer up a single packet for future transmission
* rte_eth_tx_buffer_flush - flush any unsent buffered packets
* rte_eth_tx_buffer_set_err_callback - set up a callback to be called in
  case transmitting a buffered burst fails. By default, we just free the
  unsent packets.

As well as these, two additional reference callbacks are provided, which
free the packets:

* rte_eth_tx_buffer_drop_callback - silently drop packets (default
  behavior)
* rte_eth_tx_buffer_count_callback - drop and update user-provided counter
  to track the number of dropped packets

v3 changes:
 - error counter removed from tx buffer structure, now default behavior is
   silent drop of unsent packets
 - some names were changed in the tx buffer structure to be more descriptive
 - two default callbacks are provided: rte_eth_tx_buffer_drop_callback and
   rte_eth_tx_buffer_count_callback

Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
---
 lib/librte_ether/rte_ethdev.c          |   46 +++++++
 lib/librte_ether/rte_ethdev.h          |  205 +++++++++++++++++++++++++++++++-
 lib/librte_ether/rte_ether_version.map |   10 ++
 3 files changed, 260 insertions(+), 1 deletion(-)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index 5c2b416..b682af4 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -1271,6 +1271,52 @@ rte_eth_tx_queue_setup(uint8_t port_id, uint16_t tx_queue_id,
 }
 
 void
+rte_eth_tx_buffer_drop_callback(struct rte_mbuf **pkts, uint16_t unsent,
+		void *userdata __rte_unused)
+{
+	unsigned i;
+
+	for (i = 0; i < unsent; i++)
+		rte_pktmbuf_free(pkts[i]);
+}
+
+void
+rte_eth_tx_buffer_count_callback(struct rte_mbuf **pkts, uint16_t unsent,
+		void *userdata)
+{
+	uint64_t *count = userdata;
+	unsigned i;
+
+	for (i = 0; i < unsent; i++)
+		rte_pktmbuf_free(pkts[i]);
+
+	*count += unsent;
+}
+
+int
+rte_eth_tx_buffer_set_err_callback(struct rte_eth_dev_tx_buffer *buffer,
+		buffer_tx_error_fn cbfn, void *userdata)
+{
+	buffer->callback = cbfn;
+	buffer->userdata = userdata;
+	return 0;
+}
+
+int
+rte_eth_tx_buffer_init(struct rte_eth_dev_tx_buffer *buffer, uint16_t size)
+{
+	if (buffer == NULL)
+		return -EINVAL;
+
+	buffer->size = size;
+	if (buffer->callback == NULL)
+		rte_eth_tx_buffer_set_err_callback(buffer,
+				rte_eth_tx_buffer_drop_callback, NULL);
+
+	return 0;
+}
+
+void
 rte_eth_promiscuous_enable(uint8_t port_id)
 {
 	struct rte_eth_dev *dev;
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index d53e362..c457728 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -2655,6 +2655,209 @@ rte_eth_tx_burst(uint8_t port_id, uint16_t queue_id,
 	return (*dev->tx_pkt_burst)(dev->data->tx_queues[queue_id], tx_pkts, nb_pkts);
 }
 
+typedef void (*buffer_tx_error_fn)(struct rte_mbuf **unsent, uint16_t count,
+		void *userdata);
+
+/**
+ * Structure used to buffer packets for future TX
+ * Used by APIs rte_eth_tx_buffer and rte_eth_tx_buffer_flush
+ */
+struct rte_eth_dev_tx_buffer {
+	buffer_tx_error_fn callback;
+	void *userdata;
+	uint16_t size;           /**< Size of buffer for buffered tx */
+	uint16_t length;
+	struct rte_mbuf *pkts[];
+};
+
+/**
+ * Calculate the size of the tx buffer.
+ *
+ * @param sz
+ *   Number of stored packets.
+ */
+#define RTE_ETH_TX_BUFFER_SIZE(sz) \
+	(sizeof(struct rte_eth_dev_tx_buffer) + (sz) * sizeof(struct rte_mbuf *))
+
+/**
+ * Initialize default values for buffered transmitting
+ *
+ * @param buffer
+ *   Tx buffer to be initialized.
+ * @param size
+ *   Buffer size
+ * @return
+ *   0 if no error
+ */
+int
+rte_eth_tx_buffer_init(struct rte_eth_dev_tx_buffer *buffer, uint16_t size);
+
+/**
+ * Send any packets queued up for transmission on a port and HW queue
+ *
+ * This causes an explicit flush of packets previously buffered via the
+ * rte_eth_tx_buffer() function. It returns the number of packets successfully
+ * sent to the NIC, and calls the error callback for any unsent packets. Unless
+ * explicitly set up otherwise, the default callback simply frees the unsent
+ * packets back to the owning mempool.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param queue_id
+ *   The index of the transmit queue through which output packets must be
+ *   sent.
+ *   The value must be in the range [0, nb_tx_queue - 1] previously supplied
+ *   to rte_eth_dev_configure().
+ * @param buffer
+ *   Buffer of packets to be transmitted.
+ * @return
+ *   The number of packets successfully sent to the Ethernet device. The error
+ *   callback is called for any packets which could not be sent.
+ */
+static inline uint16_t
+rte_eth_tx_buffer_flush(uint8_t port_id, uint16_t queue_id,
+		struct rte_eth_dev_tx_buffer *buffer)
+{
+	uint16_t sent;
+	uint16_t to_send = buffer->length;
+
+	if (to_send == 0)
+		return 0;
+
+	sent = rte_eth_tx_burst(port_id, queue_id, buffer->pkts, to_send);
+
+	buffer->length = 0;
+
+	/* All packets sent, or to be dealt with by callback below */
+	if (unlikely(sent != to_send))
+		buffer->callback(&buffer->pkts[sent], to_send - sent,
+				buffer->userdata);
+
+	return sent;
+}
+
+/**
+ * Buffer a single packet for future transmission on a port and queue
+ *
+ * This function takes a single mbuf/packet and buffers it for later
+ * transmission on the particular port and queue specified. Once the buffer is
+ * full of packets, an attempt will be made to transmit all the buffered
+ * packets. In case of error, where not all packets can be transmitted, a
+ * callback is called with the unsent packets as a parameter. If no callback
+ * is explicitly set up, the unsent packets are just freed back to the owning
+ * mempool. The function returns the number of packets actually sent, i.e.
+ * 0 if no buffer flush occurred, otherwise the number of packets successfully
+ * flushed.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param queue_id
+ *   The index of the transmit queue through which output packets must be
+ *   sent.
+ *   The value must be in the range [0, nb_tx_queue - 1] previously supplied
+ *   to rte_eth_dev_configure().
+ * @param buffer
+ *   Buffer used to collect packets to be sent.
+ * @param tx_pkt
+ *   Pointer to the packet mbuf to be sent.
+ * @return
+ *   0 = packet has been buffered for later transmission
+ *   N > 0 = packet has been buffered, and the buffer was subsequently flushed,
+ *     causing N packets to be sent, and the error callback to be called for
+ *     the rest.
+ */
+static inline uint16_t __attribute__((always_inline))
+rte_eth_tx_buffer(uint8_t port_id, uint16_t queue_id,
+		struct rte_eth_dev_tx_buffer *buffer, struct rte_mbuf *tx_pkt)
+{
+	buffer->pkts[buffer->length++] = tx_pkt;
+	if (buffer->length < buffer->size)
+		return 0;
+
+	return rte_eth_tx_buffer_flush(port_id, queue_id, buffer);
+}
+
+/**
+ * Configure a callback for buffered packets which cannot be sent
+ *
+ * Register a specific callback to be called when an attempt is made to send
+ * all packets buffered on an ethernet port, but not all packets can
+ * successfully be sent. The callback registered here will be called only
+ * from calls to rte_eth_tx_buffer() and rte_eth_tx_buffer_flush() APIs.
+ * The default callback simply frees the packets back to the owning
+ * mempool. If additional behaviour is required,
+ * for example, to count dropped packets, or to retry transmission of packets
+ * which cannot be sent, this function should be used to register a suitable
+ * callback function to implement the desired behaviour.
+ * The example callback "rte_eth_tx_buffer_count_callback()" is also
+ * provided as reference.
+ *
+ * @param buffer
+ *   The tx buffer in which to register the callback.
+ * @param callback
+ *   The function to be used as the callback.
+ * @param userdata
+ *   Arbitrary parameter to be passed to the callback function
+ * @return
+ *   0 on success, or -1 on error with rte_errno set appropriately
+ */
+int
+rte_eth_tx_buffer_set_err_callback(struct rte_eth_dev_tx_buffer *buffer,
+		buffer_tx_error_fn callback, void *userdata);
+
+/**
+ * Callback function for silently dropping unsent buffered packets.
+ *
+ * This function can be passed to rte_eth_tx_buffer_set_err_callback() to
+ * adjust the default behavior when buffered packets cannot be sent. This
+ * function drops any unsent packets silently and is used by tx buffered
+ * operations as default behavior.
+ *
+ * NOTE: this function should not be called directly, instead it should be used
+ *       as a callback for packet buffering.
+ *
+ * NOTE: when configuring this function as a callback with
+ *       rte_eth_tx_buffer_set_err_callback(), the final, userdata parameter
+ *       is not used and may be NULL.
+ *
+ * @param pkts
+ *   The previously buffered packets which could not be sent
+ * @param unsent
+ *   The number of unsent packets in the pkts array
+ * @param userdata
+ *   Not used
+ */
+void
+rte_eth_tx_buffer_drop_callback(struct rte_mbuf **pkts, uint16_t unsent,
+		void *userdata __rte_unused);
+
+/**
+ * Callback function for tracking unsent buffered packets.
+ *
+ * This function can be passed to rte_eth_tx_buffer_set_err_callback() to
+ * adjust the default behavior when buffered packets cannot be sent. This
+ * function drops any unsent packets, but also updates a user-supplied counter
+ * to track the overall number of packets dropped. The counter should be a
+ * uint64_t variable.
+ *
+ * NOTE: this function should not be called directly, instead it should be used
+ *       as a callback for packet buffering.
+ *
+ * NOTE: when configuring this function as a callback with
+ *       rte_eth_tx_buffer_set_err_callback(), the final, userdata parameter
+ *       should point to a uint64_t value.
+ *
+ * @param pkts
+ *   The previously buffered packets which could not be sent
+ * @param unsent
+ *   The number of unsent packets in the pkts array
+ * @param userdata
+ *   Pointer to a uint64_t value, which will be incremented by unsent
+ */
+void
+rte_eth_tx_buffer_count_callback(struct rte_mbuf **pkts, uint16_t unsent,
+		void *userdata);
+
 /**
  * The eth device event type for interrupt, and maybe others in the future.
  */
diff --git a/lib/librte_ether/rte_ether_version.map b/lib/librte_ether/rte_ether_version.map
index d8db24d..10a4815 100644
--- a/lib/librte_ether/rte_ether_version.map
+++ b/lib/librte_ether/rte_ether_version.map
@@ -117,3 +117,13 @@ DPDK_2.2 {
 
 	local: *;
 };
+
+DPDK_16.04 {
+	global:
+
+	rte_eth_tx_buffer_drop_callback;
+	rte_eth_tx_buffer_count_callback;
+	rte_eth_tx_buffer_init;
+	rte_eth_tx_buffer_set_err_callback;
+
+} DPDK_2.2;
-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [dpdk-dev] [PATCH v3 2/2] examples: rework to use buffered tx
  2016-03-10 10:57   ` [dpdk-dev] [PATCH v3 " Tomasz Kulasek
  2016-03-10 10:57     ` [dpdk-dev] [PATCH v3 1/2] ethdev: add buffered tx api Tomasz Kulasek
@ 2016-03-10 10:57     ` Tomasz Kulasek
  2016-03-10 11:31     ` [dpdk-dev] [PATCH v3 0/2] add support for buffered tx to ethdev Ananyev, Konstantin
  2016-03-10 17:19     ` [dpdk-dev] [PATCH v4 " Tomasz Kulasek
  3 siblings, 0 replies; 43+ messages in thread
From: Tomasz Kulasek @ 2016-03-10 10:57 UTC (permalink / raw)
  To: dev

The internal buffering of packets for TX in sample apps is no longer
needed, so this patchset also replaces this code with calls to the new
rte_eth_tx_buffer* APIs in:

* l2fwd-jobstats
* l2fwd-keepalive
* l2fwd
* l3fwd-acl
* l3fwd-power
* link_status_interrupt
* client_server_mp
* l2fwd_fork
* packet_ordering
* qos_meter

v3 changes
 - updated due to the change of callback name

v2 changes
 - rework synced with tx buffer API changes

Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
---
 examples/l2fwd-jobstats/main.c                     |  104 +++++++----------
 examples/l2fwd-keepalive/main.c                    |  100 ++++++----------
 examples/l2fwd/main.c                              |  104 +++++++----------
 examples/l3fwd-acl/main.c                          |   92 ++++++---------
 examples/l3fwd-power/main.c                        |   89 ++++++--------
 examples/link_status_interrupt/main.c              |  107 +++++++----------
 .../client_server_mp/mp_client/client.c            |  101 +++++++++-------
 examples/multi_process/l2fwd_fork/main.c           |   97 +++++++---------
 examples/packet_ordering/main.c                    |  122 ++++++++++++++------
 examples/qos_meter/main.c                          |   61 +++-------
 10 files changed, 436 insertions(+), 541 deletions(-)

diff --git a/examples/l2fwd-jobstats/main.c b/examples/l2fwd-jobstats/main.c
index 6da60e0..d1e9bf7 100644
--- a/examples/l2fwd-jobstats/main.c
+++ b/examples/l2fwd-jobstats/main.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -41,6 +41,7 @@
 #include <rte_alarm.h>
 #include <rte_common.h>
 #include <rte_log.h>
+#include <rte_malloc.h>
 #include <rte_memory.h>
 #include <rte_memcpy.h>
 #include <rte_memzone.h>
@@ -97,18 +98,12 @@ static uint32_t l2fwd_dst_ports[RTE_MAX_ETHPORTS];
 
 static unsigned int l2fwd_rx_queue_per_lcore = 1;
 
-struct mbuf_table {
-	uint64_t next_flush_time;
-	unsigned len;
-	struct rte_mbuf *mbufs[MAX_PKT_BURST];
-};
-
 #define MAX_RX_QUEUE_PER_LCORE 16
 #define MAX_TX_QUEUE_PER_PORT 16
 struct lcore_queue_conf {
 	unsigned n_rx_port;
 	unsigned rx_port_list[MAX_RX_QUEUE_PER_LCORE];
-	struct mbuf_table tx_mbufs[RTE_MAX_ETHPORTS];
+	uint64_t next_flush_time[RTE_MAX_ETHPORTS];
 
 	struct rte_timer rx_timers[MAX_RX_QUEUE_PER_LCORE];
 	struct rte_jobstats port_fwd_jobs[MAX_RX_QUEUE_PER_LCORE];
@@ -123,6 +118,8 @@ struct lcore_queue_conf {
 } __rte_cache_aligned;
 struct lcore_queue_conf lcore_queue_conf[RTE_MAX_LCORE];
 
+struct rte_eth_dev_tx_buffer *tx_buffer[RTE_MAX_ETHPORTS];
+
 static const struct rte_eth_conf port_conf = {
 	.rxmode = {
 		.split_hdr_size = 0,
@@ -373,59 +370,14 @@ show_stats_cb(__rte_unused void *param)
 	rte_eal_alarm_set(timer_period * US_PER_S, show_stats_cb, NULL);
 }
 
-/* Send the burst of packets on an output interface */
-static void
-l2fwd_send_burst(struct lcore_queue_conf *qconf, uint8_t port)
-{
-	struct mbuf_table *m_table;
-	uint16_t ret;
-	uint16_t queueid = 0;
-	uint16_t n;
-
-	m_table = &qconf->tx_mbufs[port];
-	n = m_table->len;
-
-	m_table->next_flush_time = rte_get_timer_cycles() + drain_tsc;
-	m_table->len = 0;
-
-	ret = rte_eth_tx_burst(port, queueid, m_table->mbufs, n);
-
-	port_statistics[port].tx += ret;
-	if (unlikely(ret < n)) {
-		port_statistics[port].dropped += (n - ret);
-		do {
-			rte_pktmbuf_free(m_table->mbufs[ret]);
-		} while (++ret < n);
-	}
-}
-
-/* Enqueue packets for TX and prepare them to be sent */
-static int
-l2fwd_send_packet(struct rte_mbuf *m, uint8_t port)
-{
-	const unsigned lcore_id = rte_lcore_id();
-	struct lcore_queue_conf *qconf = &lcore_queue_conf[lcore_id];
-	struct mbuf_table *m_table = &qconf->tx_mbufs[port];
-	uint16_t len = qconf->tx_mbufs[port].len;
-
-	m_table->mbufs[len] = m;
-
-	len++;
-	m_table->len = len;
-
-	/* Enough pkts to be sent. */
-	if (unlikely(len == MAX_PKT_BURST))
-		l2fwd_send_burst(qconf, port);
-
-	return 0;
-}
-
 static void
 l2fwd_simple_forward(struct rte_mbuf *m, unsigned portid)
 {
 	struct ether_hdr *eth;
 	void *tmp;
+	int sent;
 	unsigned dst_port;
+	struct rte_eth_dev_tx_buffer *buffer;
 
 	dst_port = l2fwd_dst_ports[portid];
 	eth = rte_pktmbuf_mtod(m, struct ether_hdr *);
@@ -437,7 +389,10 @@ l2fwd_simple_forward(struct rte_mbuf *m, unsigned portid)
 	/* src addr */
 	ether_addr_copy(&l2fwd_ports_eth_addr[dst_port], &eth->s_addr);
 
-	l2fwd_send_packet(m, (uint8_t) dst_port);
+	buffer = tx_buffer[dst_port];
+	sent = rte_eth_tx_buffer(dst_port, 0, buffer, m);
+	if (sent)
+		port_statistics[dst_port].tx += sent;
 }
 
 static void
@@ -511,8 +466,10 @@ l2fwd_flush_job(__rte_unused struct rte_timer *timer, __rte_unused void *arg)
 	uint64_t now;
 	unsigned lcore_id;
 	struct lcore_queue_conf *qconf;
-	struct mbuf_table *m_table;
 	uint8_t portid;
+	unsigned i;
+	uint32_t sent;
+	struct rte_eth_dev_tx_buffer *buffer;
 
 	lcore_id = rte_lcore_id();
 	qconf = &lcore_queue_conf[lcore_id];
@@ -522,14 +479,20 @@ l2fwd_flush_job(__rte_unused struct rte_timer *timer, __rte_unused void *arg)
 	now = rte_get_timer_cycles();
 	lcore_id = rte_lcore_id();
 	qconf = &lcore_queue_conf[lcore_id];
-	for (portid = 0; portid < RTE_MAX_ETHPORTS; portid++) {
-		m_table = &qconf->tx_mbufs[portid];
-		if (m_table->len == 0 || m_table->next_flush_time <= now)
+
+	for (i = 0; i < qconf->n_rx_port; i++) {
+		portid = l2fwd_dst_ports[qconf->rx_port_list[i]];
+
+		if (qconf->next_flush_time[portid] <= now)
 			continue;
 
-		l2fwd_send_burst(qconf, portid);
-	}
+		buffer = tx_buffer[portid];
+		sent = rte_eth_tx_buffer_flush(portid, 0, buffer);
+		if (sent)
+			port_statistics[portid].tx += sent;
 
+		qconf->next_flush_time[portid] = rte_get_timer_cycles() + drain_tsc;
+	}
 
	/* Pass target to indicate that this job is happy with the time
	 * interval in which it was called. */
@@ -945,6 +908,23 @@ main(int argc, char **argv)
 			rte_exit(EXIT_FAILURE, "rte_eth_tx_queue_setup:err=%d, port=%u\n",
 				ret, (unsigned) portid);
 
+		/* Initialize TX buffers */
+		tx_buffer[portid] = rte_zmalloc_socket("tx_buffer",
+				RTE_ETH_TX_BUFFER_SIZE(MAX_PKT_BURST), 0,
+				rte_eth_dev_socket_id(portid));
+		if (tx_buffer[portid] == NULL)
+			rte_exit(EXIT_FAILURE, "Cannot allocate buffer for tx on port %u\n",
+					(unsigned) portid);
+
+		rte_eth_tx_buffer_init(tx_buffer[portid], MAX_PKT_BURST);
+
+		ret = rte_eth_tx_buffer_set_err_callback(tx_buffer[portid],
+				rte_eth_tx_buffer_count_callback,
+				&port_statistics[portid].dropped);
+		if (ret < 0)
+				rte_exit(EXIT_FAILURE, "Cannot set error callback for "
+						"tx buffer on port %u\n", (unsigned) portid);
+
 		/* Start device */
 		ret = rte_eth_dev_start(portid);
 		if (ret < 0)
diff --git a/examples/l2fwd-keepalive/main.c b/examples/l2fwd-keepalive/main.c
index f4d52f2..94b8677 100644
--- a/examples/l2fwd-keepalive/main.c
+++ b/examples/l2fwd-keepalive/main.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -47,6 +47,7 @@
 
 #include <rte_common.h>
 #include <rte_log.h>
+#include <rte_malloc.h>
 #include <rte_memory.h>
 #include <rte_memcpy.h>
 #include <rte_memzone.h>
@@ -97,21 +98,16 @@ static uint32_t l2fwd_dst_ports[RTE_MAX_ETHPORTS];
 
 static unsigned int l2fwd_rx_queue_per_lcore = 1;
 
-struct mbuf_table {
-	unsigned len;
-	struct rte_mbuf *m_table[MAX_PKT_BURST];
-};
-
 #define MAX_RX_QUEUE_PER_LCORE 16
 #define MAX_TX_QUEUE_PER_PORT 16
 struct lcore_queue_conf {
 	unsigned n_rx_port;
 	unsigned rx_port_list[MAX_RX_QUEUE_PER_LCORE];
-	struct mbuf_table tx_mbufs[RTE_MAX_ETHPORTS];
-
 } __rte_cache_aligned;
 struct lcore_queue_conf lcore_queue_conf[RTE_MAX_LCORE];
 
+struct rte_eth_dev_tx_buffer *tx_buffer[RTE_MAX_ETHPORTS];
+
 static const struct rte_eth_conf port_conf = {
 	.rxmode = {
 		.split_hdr_size = 0,
@@ -192,58 +188,14 @@ print_stats(__attribute__((unused)) struct rte_timer *ptr_timer,
 	printf("\n====================================================\n");
 }
 
-/* Send the burst of packets on an output interface */
-static int
-l2fwd_send_burst(struct lcore_queue_conf *qconf, unsigned n, uint8_t port)
-{
-	struct rte_mbuf **m_table;
-	unsigned ret;
-	unsigned queueid = 0;
-
-	m_table = (struct rte_mbuf **)qconf->tx_mbufs[port].m_table;
-
-	ret = rte_eth_tx_burst(port, (uint16_t) queueid, m_table, (uint16_t) n);
-	port_statistics[port].tx += ret;
-	if (unlikely(ret < n)) {
-		port_statistics[port].dropped += (n - ret);
-		do {
-			rte_pktmbuf_free(m_table[ret]);
-		} while (++ret < n);
-	}
-
-	return 0;
-}
-
-/* Enqueue packets for TX and prepare them to be sent */
-static int
-l2fwd_send_packet(struct rte_mbuf *m, uint8_t port)
-{
-	unsigned lcore_id, len;
-	struct lcore_queue_conf *qconf;
-
-	lcore_id = rte_lcore_id();
-
-	qconf = &lcore_queue_conf[lcore_id];
-	len = qconf->tx_mbufs[port].len;
-	qconf->tx_mbufs[port].m_table[len] = m;
-	len++;
-
-	/* enough pkts to be sent */
-	if (unlikely(len == MAX_PKT_BURST)) {
-		l2fwd_send_burst(qconf, MAX_PKT_BURST, port);
-		len = 0;
-	}
-
-	qconf->tx_mbufs[port].len = len;
-	return 0;
-}
-
 static void
 l2fwd_simple_forward(struct rte_mbuf *m, unsigned portid)
 {
 	struct ether_hdr *eth;
 	void *tmp;
+	int sent;
 	unsigned dst_port;
+	struct rte_eth_dev_tx_buffer *buffer;
 
 	dst_port = l2fwd_dst_ports[portid];
 	eth = rte_pktmbuf_mtod(m, struct ether_hdr *);
@@ -255,7 +207,10 @@ l2fwd_simple_forward(struct rte_mbuf *m, unsigned portid)
 	/* src addr */
 	ether_addr_copy(&l2fwd_ports_eth_addr[dst_port], &eth->s_addr);
 
-	l2fwd_send_packet(m, (uint8_t) dst_port);
+	buffer = tx_buffer[dst_port];
+	sent = rte_eth_tx_buffer(dst_port, 0, buffer, m);
+	if (sent)
+		port_statistics[dst_port].tx += sent;
 }
 
 /* main processing loop */
@@ -264,12 +219,14 @@ l2fwd_main_loop(void)
 {
 	struct rte_mbuf *pkts_burst[MAX_PKT_BURST];
 	struct rte_mbuf *m;
+	int sent;
 	unsigned lcore_id;
 	uint64_t prev_tsc, diff_tsc, cur_tsc;
 	unsigned i, j, portid, nb_rx;
 	struct lcore_queue_conf *qconf;
 	const uint64_t drain_tsc = (rte_get_tsc_hz() + US_PER_S - 1)
 		/ US_PER_S * BURST_TX_DRAIN_US;
+	struct rte_eth_dev_tx_buffer *buffer;
 
 	prev_tsc = 0;
 
@@ -312,13 +269,15 @@ l2fwd_main_loop(void)
 		diff_tsc = cur_tsc - prev_tsc;
 		if (unlikely(diff_tsc > drain_tsc)) {
 
-			for (portid = 0; portid < RTE_MAX_ETHPORTS; portid++) {
-				if (qconf->tx_mbufs[portid].len == 0)
-					continue;
-				l2fwd_send_burst(&lcore_queue_conf[lcore_id],
-						 qconf->tx_mbufs[portid].len,
-						 (uint8_t) portid);
-				qconf->tx_mbufs[portid].len = 0;
+			for (i = 0; i < qconf->n_rx_port; i++) {
+
+				portid = l2fwd_dst_ports[qconf->rx_port_list[i]];
+				buffer = tx_buffer[portid];
+
+				sent = rte_eth_tx_buffer_flush(portid, 0, buffer);
+				if (sent)
+					port_statistics[portid].tx += sent;
+
 			}
 
 			prev_tsc = cur_tsc;
@@ -713,6 +672,23 @@ main(int argc, char **argv)
 				"rte_eth_tx_queue_setup:err=%d, port=%u\n",
 				ret, (unsigned) portid);
 
+		/* Initialize TX buffers */
+		tx_buffer[portid] = rte_zmalloc_socket("tx_buffer",
+				RTE_ETH_TX_BUFFER_SIZE(MAX_PKT_BURST), 0,
+				rte_eth_dev_socket_id(portid));
+		if (tx_buffer[portid] == NULL)
+			rte_exit(EXIT_FAILURE, "Cannot allocate buffer for tx on port %u\n",
+					(unsigned) portid);
+
+		rte_eth_tx_buffer_init(tx_buffer[portid], MAX_PKT_BURST);
+
+		ret = rte_eth_tx_buffer_set_err_callback(tx_buffer[portid],
+				rte_eth_tx_buffer_count_callback,
+				&port_statistics[portid].dropped);
+		if (ret < 0)
+				rte_exit(EXIT_FAILURE, "Cannot set error callback for "
+						"tx buffer on port %u\n", (unsigned) portid);
+
 		/* Start device */
 		ret = rte_eth_dev_start(portid);
 		if (ret < 0)
diff --git a/examples/l2fwd/main.c b/examples/l2fwd/main.c
index f35d8a1..e175681 100644
--- a/examples/l2fwd/main.c
+++ b/examples/l2fwd/main.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -49,6 +49,7 @@
 
 #include <rte_common.h>
 #include <rte_log.h>
+#include <rte_malloc.h>
 #include <rte_memory.h>
 #include <rte_memcpy.h>
 #include <rte_memzone.h>
@@ -99,21 +100,16 @@ static uint32_t l2fwd_dst_ports[RTE_MAX_ETHPORTS];
 
 static unsigned int l2fwd_rx_queue_per_lcore = 1;
 
-struct mbuf_table {
-	unsigned len;
-	struct rte_mbuf *m_table[MAX_PKT_BURST];
-};
-
 #define MAX_RX_QUEUE_PER_LCORE 16
 #define MAX_TX_QUEUE_PER_PORT 16
 struct lcore_queue_conf {
 	unsigned n_rx_port;
 	unsigned rx_port_list[MAX_RX_QUEUE_PER_LCORE];
-	struct mbuf_table tx_mbufs[RTE_MAX_ETHPORTS];
-
 } __rte_cache_aligned;
 struct lcore_queue_conf lcore_queue_conf[RTE_MAX_LCORE];
 
+static struct rte_eth_dev_tx_buffer *tx_buffer[RTE_MAX_ETHPORTS];
+
 static const struct rte_eth_conf port_conf = {
 	.rxmode = {
 		.split_hdr_size = 0,
@@ -189,58 +185,14 @@ print_stats(void)
 	printf("\n====================================================\n");
 }
 
-/* Send the burst of packets on an output interface */
-static int
-l2fwd_send_burst(struct lcore_queue_conf *qconf, unsigned n, uint8_t port)
-{
-	struct rte_mbuf **m_table;
-	unsigned ret;
-	unsigned queueid =0;
-
-	m_table = (struct rte_mbuf **)qconf->tx_mbufs[port].m_table;
-
-	ret = rte_eth_tx_burst(port, (uint16_t) queueid, m_table, (uint16_t) n);
-	port_statistics[port].tx += ret;
-	if (unlikely(ret < n)) {
-		port_statistics[port].dropped += (n - ret);
-		do {
-			rte_pktmbuf_free(m_table[ret]);
-		} while (++ret < n);
-	}
-
-	return 0;
-}
-
-/* Enqueue packets for TX and prepare them to be sent */
-static int
-l2fwd_send_packet(struct rte_mbuf *m, uint8_t port)
-{
-	unsigned lcore_id, len;
-	struct lcore_queue_conf *qconf;
-
-	lcore_id = rte_lcore_id();
-
-	qconf = &lcore_queue_conf[lcore_id];
-	len = qconf->tx_mbufs[port].len;
-	qconf->tx_mbufs[port].m_table[len] = m;
-	len++;
-
-	/* enough pkts to be sent */
-	if (unlikely(len == MAX_PKT_BURST)) {
-		l2fwd_send_burst(qconf, MAX_PKT_BURST, port);
-		len = 0;
-	}
-
-	qconf->tx_mbufs[port].len = len;
-	return 0;
-}
-
 static void
 l2fwd_simple_forward(struct rte_mbuf *m, unsigned portid)
 {
 	struct ether_hdr *eth;
 	void *tmp;
 	unsigned dst_port;
+	int sent;
+	struct rte_eth_dev_tx_buffer *buffer;
 
 	dst_port = l2fwd_dst_ports[portid];
 	eth = rte_pktmbuf_mtod(m, struct ether_hdr *);
@@ -252,7 +204,10 @@ l2fwd_simple_forward(struct rte_mbuf *m, unsigned portid)
 	/* src addr */
 	ether_addr_copy(&l2fwd_ports_eth_addr[dst_port], &eth->s_addr);
 
-	l2fwd_send_packet(m, (uint8_t) dst_port);
+	buffer = tx_buffer[dst_port];
+	sent = rte_eth_tx_buffer(dst_port, 0, buffer, m);
+	if (sent)
+		port_statistics[dst_port].tx += sent;
 }
 
 /* main processing loop */
@@ -261,11 +216,14 @@ l2fwd_main_loop(void)
 {
 	struct rte_mbuf *pkts_burst[MAX_PKT_BURST];
 	struct rte_mbuf *m;
+	int sent;
 	unsigned lcore_id;
 	uint64_t prev_tsc, diff_tsc, cur_tsc, timer_tsc;
 	unsigned i, j, portid, nb_rx;
 	struct lcore_queue_conf *qconf;
-	const uint64_t drain_tsc = (rte_get_tsc_hz() + US_PER_S - 1) / US_PER_S * BURST_TX_DRAIN_US;
+	const uint64_t drain_tsc = (rte_get_tsc_hz() + US_PER_S - 1) / US_PER_S *
+			BURST_TX_DRAIN_US;
+	struct rte_eth_dev_tx_buffer *buffer;
 
 	prev_tsc = 0;
 	timer_tsc = 0;
@@ -285,6 +243,7 @@ l2fwd_main_loop(void)
 		portid = qconf->rx_port_list[i];
 		RTE_LOG(INFO, L2FWD, " -- lcoreid=%u portid=%u\n", lcore_id,
 			portid);
+
 	}
 
 	while (!force_quit) {
@@ -297,13 +256,15 @@ l2fwd_main_loop(void)
 		diff_tsc = cur_tsc - prev_tsc;
 		if (unlikely(diff_tsc > drain_tsc)) {
 
-			for (portid = 0; portid < RTE_MAX_ETHPORTS; portid++) {
-				if (qconf->tx_mbufs[portid].len == 0)
-					continue;
-				l2fwd_send_burst(&lcore_queue_conf[lcore_id],
-						 qconf->tx_mbufs[portid].len,
-						 (uint8_t) portid);
-				qconf->tx_mbufs[portid].len = 0;
+			for (i = 0; i < qconf->n_rx_port; i++) {
+
+				portid = l2fwd_dst_ports[qconf->rx_port_list[i]];
+				buffer = tx_buffer[portid];
+
+				sent = rte_eth_tx_buffer_flush(portid, 0, buffer);
+				if (sent)
+					port_statistics[portid].tx += sent;
+
 			}
 
 			/* if timer is enabled */
@@ -688,6 +649,23 @@ main(int argc, char **argv)
 			rte_exit(EXIT_FAILURE, "rte_eth_tx_queue_setup:err=%d, port=%u\n",
 				ret, (unsigned) portid);
 
+		/* Initialize TX buffers */
+		tx_buffer[portid] = rte_zmalloc_socket("tx_buffer",
+				RTE_ETH_TX_BUFFER_SIZE(MAX_PKT_BURST), 0,
+				rte_eth_dev_socket_id(portid));
+		if (tx_buffer[portid] == NULL)
+			rte_exit(EXIT_FAILURE, "Cannot allocate buffer for tx on port %u\n",
+					(unsigned) portid);
+
+		rte_eth_tx_buffer_init(tx_buffer[portid], MAX_PKT_BURST);
+
+		ret = rte_eth_tx_buffer_set_err_callback(tx_buffer[portid],
+				rte_eth_tx_buffer_count_callback,
+				&port_statistics[portid].dropped);
+		if (ret < 0)
+				rte_exit(EXIT_FAILURE, "Cannot set error callback for "
+						"tx buffer on port %u\n", (unsigned) portid);
+
 		/* Start device */
 		ret = rte_eth_dev_start(portid);
 		if (ret < 0)
diff --git a/examples/l3fwd-acl/main.c b/examples/l3fwd-acl/main.c
index f676d14..3a895b7 100644
--- a/examples/l3fwd-acl/main.c
+++ b/examples/l3fwd-acl/main.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -119,11 +119,6 @@ static uint32_t enabled_port_mask;
 static int promiscuous_on; /**< Ports set in promiscuous mode off by default. */
 static int numa_on = 1; /**< NUMA is enabled by default. */
 
-struct mbuf_table {
-	uint16_t len;
-	struct rte_mbuf *m_table[MAX_PKT_BURST];
-};
-
 struct lcore_rx_queue {
 	uint8_t port_id;
 	uint8_t queue_id;
@@ -187,7 +182,7 @@ static struct rte_mempool *pktmbuf_pool[NB_SOCKETS];
 static inline int
 is_valid_ipv4_pkt(struct ipv4_hdr *pkt, uint32_t link_len);
 #endif
-static inline int
+static inline void
 send_single_packet(struct rte_mbuf *m, uint8_t port);
 
 #define MAX_ACL_RULE_NUM	100000
@@ -1291,56 +1286,26 @@ app_acl_init(void)
 struct lcore_conf {
 	uint16_t n_rx_queue;
 	struct lcore_rx_queue rx_queue_list[MAX_RX_QUEUE_PER_LCORE];
+	uint16_t n_tx_port;
+	uint16_t tx_port_id[RTE_MAX_ETHPORTS];
 	uint16_t tx_queue_id[RTE_MAX_ETHPORTS];
-	struct mbuf_table tx_mbufs[RTE_MAX_ETHPORTS];
+	struct rte_eth_dev_tx_buffer *tx_buffer[RTE_MAX_ETHPORTS];
 } __rte_cache_aligned;
 
 static struct lcore_conf lcore_conf[RTE_MAX_LCORE];
 
-/* Send burst of packets on an output interface */
-static inline int
-send_burst(struct lcore_conf *qconf, uint16_t n, uint8_t port)
-{
-	struct rte_mbuf **m_table;
-	int ret;
-	uint16_t queueid;
-
-	queueid = qconf->tx_queue_id[port];
-	m_table = (struct rte_mbuf **)qconf->tx_mbufs[port].m_table;
-
-	ret = rte_eth_tx_burst(port, queueid, m_table, n);
-	if (unlikely(ret < n)) {
-		do {
-			rte_pktmbuf_free(m_table[ret]);
-		} while (++ret < n);
-	}
-
-	return 0;
-}
-
 /* Enqueue a single packet, and send burst if queue is filled */
-static inline int
+static inline void
 send_single_packet(struct rte_mbuf *m, uint8_t port)
 {
 	uint32_t lcore_id;
-	uint16_t len;
 	struct lcore_conf *qconf;
 
 	lcore_id = rte_lcore_id();
 
 	qconf = &lcore_conf[lcore_id];
-	len = qconf->tx_mbufs[port].len;
-	qconf->tx_mbufs[port].m_table[len] = m;
-	len++;
-
-	/* enough pkts to be sent */
-	if (unlikely(len == MAX_PKT_BURST)) {
-		send_burst(qconf, MAX_PKT_BURST, port);
-		len = 0;
-	}
-
-	qconf->tx_mbufs[port].len = len;
-	return 0;
+	rte_eth_tx_buffer(port, qconf->tx_queue_id[port],
+			qconf->tx_buffer[port], m);
 }
 
 #ifdef DO_RFC_1812_CHECKS
@@ -1428,20 +1393,12 @@ main_loop(__attribute__((unused)) void *dummy)
 		 */
 		diff_tsc = cur_tsc - prev_tsc;
 		if (unlikely(diff_tsc > drain_tsc)) {
-
-			/*
-			 * This could be optimized (use queueid instead of
-			 * portid), but it is not called so often
-			 */
-			for (portid = 0; portid < RTE_MAX_ETHPORTS; portid++) {
-				if (qconf->tx_mbufs[portid].len == 0)
-					continue;
-				send_burst(&lcore_conf[lcore_id],
-					qconf->tx_mbufs[portid].len,
-					portid);
-				qconf->tx_mbufs[portid].len = 0;
+			for (i = 0; i < qconf->n_tx_port; ++i) {
+				portid = qconf->tx_port_id[i];
+				rte_eth_tx_buffer_flush(portid,
+						qconf->tx_queue_id[portid],
+						qconf->tx_buffer[portid]);
 			}
-
 			prev_tsc = cur_tsc;
 		}
 
@@ -1936,6 +1893,7 @@ main(int argc, char **argv)
 	unsigned lcore_id;
 	uint32_t n_tx_queue, nb_lcores;
 	uint8_t portid, nb_rx_queue, queue, socketid;
+	uint8_t nb_tx_port;
 
 	/* init EAL */
 	ret = rte_eal_init(argc, argv);
@@ -1968,6 +1926,7 @@ main(int argc, char **argv)
 		rte_exit(EXIT_FAILURE, "app_acl_init failed\n");
 
 	nb_lcores = rte_lcore_count();
+	nb_tx_port = 0;
 
 	/* initialize all ports */
 	for (portid = 0; portid < nb_ports; portid++) {
@@ -2003,6 +1962,22 @@ main(int argc, char **argv)
 		if (ret < 0)
 			rte_exit(EXIT_FAILURE, "init_mem failed\n");
 
+		for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
+			if (rte_lcore_is_enabled(lcore_id) == 0)
+				continue;
+
+			/* Initialize TX buffers */
+			qconf = &lcore_conf[lcore_id];
+			qconf->tx_buffer[portid] = rte_zmalloc_socket("tx_buffer",
+					RTE_ETH_TX_BUFFER_SIZE(MAX_PKT_BURST), 0,
+					rte_eth_dev_socket_id(portid));
+			if (qconf->tx_buffer[portid] == NULL)
+				rte_exit(EXIT_FAILURE, "Can't allocate tx buffer for port %u\n",
+						(unsigned) portid);
+
+			rte_eth_tx_buffer_init(qconf->tx_buffer[portid], MAX_PKT_BURST);
+		}
+
 		/* init one TX queue per couple (lcore,port) */
 		queueid = 0;
 		for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
@@ -2032,8 +2007,13 @@ main(int argc, char **argv)
 			qconf = &lcore_conf[lcore_id];
 			qconf->tx_queue_id[portid] = queueid;
 			queueid++;
+
+			qconf->n_tx_port = nb_tx_port;
+			qconf->tx_port_id[qconf->n_tx_port] = portid;
 		}
 		printf("\n");
+
+		nb_tx_port++;
 	}
 
 	for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
diff --git a/examples/l3fwd-power/main.c b/examples/l3fwd-power/main.c
index 828c18a..2ed106b 100644
--- a/examples/l3fwd-power/main.c
+++ b/examples/l3fwd-power/main.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -47,6 +47,7 @@
 #include <rte_common.h>
 #include <rte_byteorder.h>
 #include <rte_log.h>
+#include <rte_malloc.h>
 #include <rte_memory.h>
 #include <rte_memcpy.h>
 #include <rte_memzone.h>
@@ -173,11 +174,6 @@ enum freq_scale_hint_t
 	FREQ_HIGHEST  =       2
 };
 
-struct mbuf_table {
-	uint16_t len;
-	struct rte_mbuf *m_table[MAX_PKT_BURST];
-};
-
 struct lcore_rx_queue {
 	uint8_t port_id;
 	uint8_t queue_id;
@@ -347,8 +343,10 @@ static lookup_struct_t *ipv4_l3fwd_lookup_struct[NB_SOCKETS];
 struct lcore_conf {
 	uint16_t n_rx_queue;
 	struct lcore_rx_queue rx_queue_list[MAX_RX_QUEUE_PER_LCORE];
+	uint16_t n_tx_port;
+	uint16_t tx_port_id[RTE_MAX_ETHPORTS];
 	uint16_t tx_queue_id[RTE_MAX_ETHPORTS];
-	struct mbuf_table tx_mbufs[RTE_MAX_ETHPORTS];
+	struct rte_eth_dev_tx_buffer *tx_buffer[RTE_MAX_ETHPORTS];
 	lookup_struct_t * ipv4_lookup_struct;
 	lookup_struct_t * ipv6_lookup_struct;
 } __rte_cache_aligned;
@@ -442,49 +440,19 @@ power_timer_cb(__attribute__((unused)) struct rte_timer *tim,
 	stats[lcore_id].sleep_time = 0;
 }
 
-/* Send burst of packets on an output interface */
-static inline int
-send_burst(struct lcore_conf *qconf, uint16_t n, uint8_t port)
-{
-	struct rte_mbuf **m_table;
-	int ret;
-	uint16_t queueid;
-
-	queueid = qconf->tx_queue_id[port];
-	m_table = (struct rte_mbuf **)qconf->tx_mbufs[port].m_table;
-
-	ret = rte_eth_tx_burst(port, queueid, m_table, n);
-	if (unlikely(ret < n)) {
-		do {
-			rte_pktmbuf_free(m_table[ret]);
-		} while (++ret < n);
-	}
-
-	return 0;
-}
-
 /* Enqueue a single packet, and send burst if queue is filled */
 static inline int
 send_single_packet(struct rte_mbuf *m, uint8_t port)
 {
 	uint32_t lcore_id;
-	uint16_t len;
 	struct lcore_conf *qconf;
 
 	lcore_id = rte_lcore_id();
-
 	qconf = &lcore_conf[lcore_id];
-	len = qconf->tx_mbufs[port].len;
-	qconf->tx_mbufs[port].m_table[len] = m;
-	len++;
-
-	/* enough pkts to be sent */
-	if (unlikely(len == MAX_PKT_BURST)) {
-		send_burst(qconf, MAX_PKT_BURST, port);
-		len = 0;
-	}
 
-	qconf->tx_mbufs[port].len = len;
+	rte_eth_tx_buffer(port, qconf->tx_queue_id[port],
+			qconf->tx_buffer[port], m);
+
 	return 0;
 }
 
@@ -905,20 +873,12 @@ main_loop(__attribute__((unused)) void *dummy)
 		 */
 		diff_tsc = cur_tsc - prev_tsc;
 		if (unlikely(diff_tsc > drain_tsc)) {
-
-			/*
-			 * This could be optimized (use queueid instead of
-			 * portid), but it is not called so often
-			 */
-			for (portid = 0; portid < RTE_MAX_ETHPORTS; portid++) {
-				if (qconf->tx_mbufs[portid].len == 0)
-					continue;
-				send_burst(&lcore_conf[lcore_id],
-					qconf->tx_mbufs[portid].len,
-					portid);
-				qconf->tx_mbufs[portid].len = 0;
+			for (i = 0; i < qconf->n_tx_port; ++i) {
+				portid = qconf->tx_port_id[i];
+				rte_eth_tx_buffer_flush(portid,
+						qconf->tx_queue_id[portid],
+						qconf->tx_buffer[portid]);
 			}
-
 			prev_tsc = cur_tsc;
 		}
 
@@ -1579,6 +1539,7 @@ main(int argc, char **argv)
 	uint32_t n_tx_queue, nb_lcores;
 	uint32_t dev_rxq_num, dev_txq_num;
 	uint8_t portid, nb_rx_queue, queue, socketid;
+	uint8_t nb_tx_port;
 
 	/* catch SIGINT and restore cpufreq governor to ondemand */
 	signal(SIGINT, signal_exit_now);
@@ -1614,6 +1575,7 @@ main(int argc, char **argv)
 		rte_exit(EXIT_FAILURE, "check_port_config failed\n");
 
 	nb_lcores = rte_lcore_count();
+	nb_tx_port = 0;
 
 	/* initialize all ports */
 	for (portid = 0; portid < nb_ports; portid++) {
@@ -1657,6 +1619,22 @@ main(int argc, char **argv)
 		if (ret < 0)
 			rte_exit(EXIT_FAILURE, "init_mem failed\n");
 
+		for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
+			if (rte_lcore_is_enabled(lcore_id) == 0)
+				continue;
+
+			/* Initialize TX buffers */
+			qconf = &lcore_conf[lcore_id];
+			qconf->tx_buffer[portid] = rte_zmalloc_socket("tx_buffer",
+				RTE_ETH_TX_BUFFER_SIZE(MAX_PKT_BURST), 0,
+				rte_eth_dev_socket_id(portid));
+			if (qconf->tx_buffer[portid] == NULL)
+				rte_exit(EXIT_FAILURE, "Can't allocate tx buffer for port %u\n",
+						(unsigned) portid);
+
+			rte_eth_tx_buffer_init(qconf->tx_buffer[portid], MAX_PKT_BURST);
+		}
+
 		/* init one TX queue per couple (lcore,port) */
 		queueid = 0;
 		for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
@@ -1689,8 +1667,13 @@ main(int argc, char **argv)
 			qconf = &lcore_conf[lcore_id];
 			qconf->tx_queue_id[portid] = queueid;
 			queueid++;
+
+			qconf->n_tx_port = nb_tx_port;
+			qconf->tx_port_id[qconf->n_tx_port] = portid;
 		}
 		printf("\n");
+
+		nb_tx_port++;
 	}
 
 	for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
diff --git a/examples/link_status_interrupt/main.c b/examples/link_status_interrupt/main.c
index c57a08a..cbc29bc 100644
--- a/examples/link_status_interrupt/main.c
+++ b/examples/link_status_interrupt/main.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -48,6 +48,7 @@
 
 #include <rte_common.h>
 #include <rte_log.h>
+#include <rte_malloc.h>
 #include <rte_memory.h>
 #include <rte_memcpy.h>
 #include <rte_memzone.h>
@@ -97,10 +98,6 @@ static unsigned int lsi_rx_queue_per_lcore = 1;
 static unsigned lsi_dst_ports[RTE_MAX_ETHPORTS] = {0};
 
 #define MAX_PKT_BURST 32
-struct mbuf_table {
-	unsigned len;
-	struct rte_mbuf *m_table[MAX_PKT_BURST];
-};
 
 #define MAX_RX_QUEUE_PER_LCORE 16
 #define MAX_TX_QUEUE_PER_PORT 16
@@ -108,11 +105,11 @@ struct lcore_queue_conf {
 	unsigned n_rx_port;
 	unsigned rx_port_list[MAX_RX_QUEUE_PER_LCORE];
 	unsigned tx_queue_id;
-	struct mbuf_table tx_mbufs[RTE_MAX_ETHPORTS];
-
 } __rte_cache_aligned;
 struct lcore_queue_conf lcore_queue_conf[RTE_MAX_LCORE];
 
+struct rte_eth_dev_tx_buffer *tx_buffer[RTE_MAX_ETHPORTS];
+
 static const struct rte_eth_conf port_conf = {
 	.rxmode = {
 		.split_hdr_size = 0,
@@ -202,59 +199,14 @@ print_stats(void)
 	printf("\n====================================================\n");
 }
 
-/* Send the packet on an output interface */
-static int
-lsi_send_burst(struct lcore_queue_conf *qconf, unsigned n, uint8_t port)
-{
-	struct rte_mbuf **m_table;
-	unsigned ret;
-	unsigned queueid;
-
-	queueid = (uint16_t) qconf->tx_queue_id;
-	m_table = (struct rte_mbuf **)qconf->tx_mbufs[port].m_table;
-
-	ret = rte_eth_tx_burst(port, (uint16_t) queueid, m_table, (uint16_t) n);
-	port_statistics[port].tx += ret;
-	if (unlikely(ret < n)) {
-		port_statistics[port].dropped += (n - ret);
-		do {
-			rte_pktmbuf_free(m_table[ret]);
-		} while (++ret < n);
-	}
-
-	return 0;
-}
-
-/* Send the packet on an output interface */
-static int
-lsi_send_packet(struct rte_mbuf *m, uint8_t port)
-{
-	unsigned lcore_id, len;
-	struct lcore_queue_conf *qconf;
-
-	lcore_id = rte_lcore_id();
-
-	qconf = &lcore_queue_conf[lcore_id];
-	len = qconf->tx_mbufs[port].len;
-	qconf->tx_mbufs[port].m_table[len] = m;
-	len++;
-
-	/* enough pkts to be sent */
-	if (unlikely(len == MAX_PKT_BURST)) {
-		lsi_send_burst(qconf, MAX_PKT_BURST, port);
-		len = 0;
-	}
-
-	qconf->tx_mbufs[port].len = len;
-	return 0;
-}
-
 static void
 lsi_simple_forward(struct rte_mbuf *m, unsigned portid)
 {
 	struct ether_hdr *eth;
 	void *tmp;
 	unsigned dst_port = lsi_dst_ports[portid];
+	int sent;
+	struct rte_eth_dev_tx_buffer *buffer;
 
 	eth = rte_pktmbuf_mtod(m, struct ether_hdr *);
 
@@ -265,7 +217,10 @@ lsi_simple_forward(struct rte_mbuf *m, unsigned portid)
 	/* src addr */
 	ether_addr_copy(&lsi_ports_eth_addr[dst_port], &eth->s_addr);
 
-	lsi_send_packet(m, (uint8_t) dst_port);
+	buffer = tx_buffer[dst_port];
+	sent = rte_eth_tx_buffer(dst_port, 0, buffer, m);
+	if (sent)
+		port_statistics[dst_port].tx += sent;
 }
 
 /* main processing loop */
@@ -275,10 +230,13 @@ lsi_main_loop(void)
 	struct rte_mbuf *pkts_burst[MAX_PKT_BURST];
 	struct rte_mbuf *m;
 	unsigned lcore_id;
+	unsigned sent;
 	uint64_t prev_tsc, diff_tsc, cur_tsc, timer_tsc;
 	unsigned i, j, portid, nb_rx;
 	struct lcore_queue_conf *qconf;
-	const uint64_t drain_tsc = (rte_get_tsc_hz() + US_PER_S - 1) / US_PER_S * BURST_TX_DRAIN_US;
+	const uint64_t drain_tsc = (rte_get_tsc_hz() + US_PER_S - 1) / US_PER_S *
+			BURST_TX_DRAIN_US;
+	struct rte_eth_dev_tx_buffer *buffer;
 
 	prev_tsc = 0;
 	timer_tsc = 0;
@@ -310,15 +268,15 @@ lsi_main_loop(void)
 		diff_tsc = cur_tsc - prev_tsc;
 		if (unlikely(diff_tsc > drain_tsc)) {
 
-			/* this could be optimized (use queueid instead of
-			 * portid), but it is not called so often */
-			for (portid = 0; portid < RTE_MAX_ETHPORTS; portid++) {
-				if (qconf->tx_mbufs[portid].len == 0)
-					continue;
-				lsi_send_burst(&lcore_queue_conf[lcore_id],
-						 qconf->tx_mbufs[portid].len,
-						 (uint8_t) portid);
-				qconf->tx_mbufs[portid].len = 0;
+			for (i = 0; i < qconf->n_rx_port; i++) {
+
+				portid = lsi_dst_ports[qconf->rx_port_list[i]];
+				buffer = tx_buffer[portid];
+
+				sent = rte_eth_tx_buffer_flush(portid, 0, buffer);
+				if (sent)
+					port_statistics[portid].tx += sent;
+
 			}
 
 			/* if timer is enabled */
@@ -722,6 +680,23 @@ main(int argc, char **argv)
 			rte_exit(EXIT_FAILURE, "rte_eth_tx_queue_setup: err=%d,port=%u\n",
 				  ret, (unsigned) portid);
 
+		/* Initialize TX buffers */
+		tx_buffer[portid] = rte_zmalloc_socket("tx_buffer",
+				RTE_ETH_TX_BUFFER_SIZE(MAX_PKT_BURST), 0,
+				rte_eth_dev_socket_id(portid));
+		if (tx_buffer[portid] == NULL)
+			rte_exit(EXIT_FAILURE, "Cannot allocate buffer for tx on port %u\n",
+					(unsigned) portid);
+
+		rte_eth_tx_buffer_init(tx_buffer[portid], MAX_PKT_BURST);
+
+		ret = rte_eth_tx_buffer_set_err_callback(tx_buffer[portid],
+				rte_eth_tx_buffer_count_callback,
+				&port_statistics[portid].dropped);
+		if (ret < 0)
+			rte_exit(EXIT_FAILURE, "Cannot set error callback for "
+					"tx buffer on port %u\n", (unsigned) portid);
+
 		/* Start device */
 		ret = rte_eth_dev_start(portid);
 		if (ret < 0)
@@ -729,6 +704,8 @@ main(int argc, char **argv)
 				  ret, (unsigned) portid);
 		printf("done:\n");
 
+		rte_eth_promiscuous_enable(portid);
+
 		printf("Port %u, MAC address: %02X:%02X:%02X:%02X:%02X:%02X\n\n",
 				(unsigned) portid,
 				lsi_ports_eth_addr[portid].addr_bytes[0],
diff --git a/examples/multi_process/client_server_mp/mp_client/client.c b/examples/multi_process/client_server_mp/mp_client/client.c
index bf049a4..d4f9ca3 100644
--- a/examples/multi_process/client_server_mp/mp_client/client.c
+++ b/examples/multi_process/client_server_mp/mp_client/client.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -42,6 +42,7 @@
 #include <string.h>
 
 #include <rte_common.h>
+#include <rte_malloc.h>
 #include <rte_memory.h>
 #include <rte_memzone.h>
 #include <rte_eal.h>
@@ -72,17 +73,13 @@
  * queue to write to. */
 static uint8_t client_id = 0;
 
-struct mbuf_queue {
 #define MBQ_CAPACITY 32
-	struct rte_mbuf *bufs[MBQ_CAPACITY];
-	uint16_t top;
-};
 
 /* maps input ports to output ports for packets */
 static uint8_t output_ports[RTE_MAX_ETHPORTS];
 
 /* buffers up a set of packet that are ready to send */
-static struct mbuf_queue output_bufs[RTE_MAX_ETHPORTS];
+struct rte_eth_dev_tx_buffer *tx_buffer[RTE_MAX_ETHPORTS];
 
 /* shared data from server. We update statistics here */
 static volatile struct tx_stats *tx_stats;
@@ -149,11 +146,51 @@ parse_app_args(int argc, char *argv[])
 }
 
 /*
+ * Tx buffer error callback
+ */
+static void
+flush_tx_error_callback(struct rte_mbuf **unsent, uint16_t count,
+		void *userdata) {
+	int i;
+	uint8_t port_id = (uintptr_t)userdata;
+
+	tx_stats->tx_drop[port_id] += count;
+
+	/* free the mbufs which failed from transmit */
+	for (i = 0; i < count; i++)
+		rte_pktmbuf_free(unsent[i]);
+
+}
+
+static void
+configure_tx_buffer(uint8_t port_id, uint16_t size)
+{
+	int ret;
+
+	/* Initialize TX buffers */
+	tx_buffer[port_id] = rte_zmalloc_socket("tx_buffer",
+			RTE_ETH_TX_BUFFER_SIZE(size), 0,
+			rte_eth_dev_socket_id(port_id));
+	if (tx_buffer[port_id] == NULL)
+		rte_exit(EXIT_FAILURE, "Cannot allocate buffer for tx on port %u\n",
+				(unsigned) port_id);
+
+	rte_eth_tx_buffer_init(tx_buffer[port_id], size);
+
+	ret = rte_eth_tx_buffer_set_err_callback(tx_buffer[port_id],
+			flush_tx_error_callback, (void *)(intptr_t)port_id);
+	if (ret < 0)
+			rte_exit(EXIT_FAILURE, "Cannot set error callback for "
+					"tx buffer on port %u\n", (unsigned) port_id);
+}
+
+/*
  * set up output ports so that all traffic on port gets sent out
  * its paired port. Index using actual port numbers since that is
  * what comes in the mbuf structure.
  */
-static void configure_output_ports(const struct port_info *ports)
+static void
+configure_output_ports(const struct port_info *ports)
 {
 	int i;
 	if (ports->num_ports > RTE_MAX_ETHPORTS)
@@ -164,41 +201,11 @@ static void configure_output_ports(const struct port_info *ports)
 		uint8_t p2 = ports->id[i+1];
 		output_ports[p1] = p2;
 		output_ports[p2] = p1;
-	}
-}
 
+		configure_tx_buffer(p1, MBQ_CAPACITY);
+		configure_tx_buffer(p2, MBQ_CAPACITY);
 
-static inline void
-send_packets(uint8_t port)
-{
-	uint16_t i, sent;
-	struct mbuf_queue *mbq = &output_bufs[port];
-
-	if (unlikely(mbq->top == 0))
-		return;
-
-	sent = rte_eth_tx_burst(port, client_id, mbq->bufs, mbq->top);
-	if (unlikely(sent < mbq->top)){
-		for (i = sent; i < mbq->top; i++)
-			rte_pktmbuf_free(mbq->bufs[i]);
-		tx_stats->tx_drop[port] += (mbq->top - sent);
 	}
-	tx_stats->tx[port] += sent;
-	mbq->top = 0;
-}
-
-/*
- * Enqueue a packet to be sent on a particular port, but
- * don't send it yet. Only when the buffer is full.
- */
-static inline void
-enqueue_packet(struct rte_mbuf *buf, uint8_t port)
-{
-	struct mbuf_queue *mbq = &output_bufs[port];
-	mbq->bufs[mbq->top++] = buf;
-
-	if (mbq->top == MBQ_CAPACITY)
-		send_packets(port);
 }
 
 /*
@@ -209,10 +216,15 @@ enqueue_packet(struct rte_mbuf *buf, uint8_t port)
 static void
 handle_packet(struct rte_mbuf *buf)
 {
+	int sent;
 	const uint8_t in_port = buf->port;
 	const uint8_t out_port = output_ports[in_port];
+	struct rte_eth_dev_tx_buffer *buffer = tx_buffer[out_port];
+
+	sent = rte_eth_tx_buffer(out_port, client_id, buffer, buf);
+	if (sent)
+		tx_stats->tx[out_port] += sent;
 
-	enqueue_packet(buf, out_port);
 }
 
 /*
@@ -229,6 +241,7 @@ main(int argc, char *argv[])
 	int need_flush = 0; /* indicates whether we have unsent packets */
 	int retval;
 	void *pkts[PKT_READ_SIZE];
+	uint16_t sent;
 
 	if ((retval = rte_eal_init(argc, argv)) < 0)
 		return -1;
@@ -274,8 +287,12 @@ main(int argc, char *argv[])
 
 		if (unlikely(rx_pkts == 0)){
 			if (need_flush)
-				for (port = 0; port < ports->num_ports; port++)
-					send_packets(ports->id[port]);
+				for (port = 0; port < ports->num_ports; port++) {
+					sent = rte_eth_tx_buffer_flush(ports->id[port], client_id,
+							tx_buffer[port]);
+					if (unlikely(sent))
+						tx_stats->tx[port] += sent;
+				}
 			need_flush = 0;
 			continue;
 		}
diff --git a/examples/multi_process/l2fwd_fork/main.c b/examples/multi_process/l2fwd_fork/main.c
index f2d7eab..aebf531 100644
--- a/examples/multi_process/l2fwd_fork/main.c
+++ b/examples/multi_process/l2fwd_fork/main.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -127,11 +127,11 @@ struct mbuf_table {
 struct lcore_queue_conf {
 	unsigned n_rx_port;
 	unsigned rx_port_list[MAX_RX_QUEUE_PER_LCORE];
-	struct mbuf_table tx_mbufs[RTE_MAX_ETHPORTS];
-
 } __rte_cache_aligned;
 struct lcore_queue_conf lcore_queue_conf[RTE_MAX_LCORE];
 
+struct rte_eth_dev_tx_buffer *tx_buffer[RTE_MAX_ETHPORTS];
+
 struct lcore_resource_struct {
 	int enabled;	/* Only set in case this lcore involved into packet forwarding */
 	int flags; 	    /* Set only slave need to restart or recreate */
@@ -583,58 +583,14 @@ slave_exit_cb(unsigned slaveid, __attribute__((unused))int stat)
 	rte_spinlock_unlock(&res_lock);
 }
 
-/* Send the packet on an output interface */
-static int
-l2fwd_send_burst(struct lcore_queue_conf *qconf, unsigned n, uint8_t port)
-{
-	struct rte_mbuf **m_table;
-	unsigned ret;
-	unsigned queueid =0;
-
-	m_table = (struct rte_mbuf **)qconf->tx_mbufs[port].m_table;
-
-	ret = rte_eth_tx_burst(port, (uint16_t) queueid, m_table, (uint16_t) n);
-	port_statistics[port].tx += ret;
-	if (unlikely(ret < n)) {
-		port_statistics[port].dropped += (n - ret);
-		do {
-			rte_pktmbuf_free(m_table[ret]);
-		} while (++ret < n);
-	}
-
-	return 0;
-}
-
-/* Send the packet on an output interface */
-static int
-l2fwd_send_packet(struct rte_mbuf *m, uint8_t port)
-{
-	unsigned lcore_id, len;
-	struct lcore_queue_conf *qconf;
-
-	lcore_id = rte_lcore_id();
-
-	qconf = &lcore_queue_conf[lcore_id];
-	len = qconf->tx_mbufs[port].len;
-	qconf->tx_mbufs[port].m_table[len] = m;
-	len++;
-
-	/* enough pkts to be sent */
-	if (unlikely(len == MAX_PKT_BURST)) {
-		l2fwd_send_burst(qconf, MAX_PKT_BURST, port);
-		len = 0;
-	}
-
-	qconf->tx_mbufs[port].len = len;
-	return 0;
-}
-
 static void
 l2fwd_simple_forward(struct rte_mbuf *m, unsigned portid)
 {
 	struct ether_hdr *eth;
 	void *tmp;
 	unsigned dst_port;
+	int sent;
+	struct rte_eth_dev_tx_buffer *buffer;
 
 	dst_port = l2fwd_dst_ports[portid];
 	eth = rte_pktmbuf_mtod(m, struct ether_hdr *);
@@ -646,7 +602,10 @@ l2fwd_simple_forward(struct rte_mbuf *m, unsigned portid)
 	/* src addr */
 	ether_addr_copy(&l2fwd_ports_eth_addr[dst_port], &eth->s_addr);
 
-	l2fwd_send_packet(m, (uint8_t) dst_port);
+	buffer = tx_buffer[dst_port];
+	sent = rte_eth_tx_buffer(dst_port, 0, buffer, m);
+	if (sent)
+		port_statistics[dst_port].tx += sent;
 }
 
 /* main processing loop */
@@ -655,11 +614,14 @@ l2fwd_main_loop(void)
 {
 	struct rte_mbuf *pkts_burst[MAX_PKT_BURST];
 	struct rte_mbuf *m;
+	int sent;
 	unsigned lcore_id;
 	uint64_t prev_tsc, diff_tsc, cur_tsc;
 	unsigned i, j, portid, nb_rx;
 	struct lcore_queue_conf *qconf;
-	const uint64_t drain_tsc = (rte_get_tsc_hz() + US_PER_S - 1) / US_PER_S * BURST_TX_DRAIN_US;
+	const uint64_t drain_tsc = (rte_get_tsc_hz() + US_PER_S - 1) / US_PER_S *
+			BURST_TX_DRAIN_US;
+	struct rte_eth_dev_tx_buffer *buffer;
 
 	prev_tsc = 0;
 
@@ -699,13 +661,15 @@ l2fwd_main_loop(void)
 		diff_tsc = cur_tsc - prev_tsc;
 		if (unlikely(diff_tsc > drain_tsc)) {
 
-			for (portid = 0; portid < RTE_MAX_ETHPORTS; portid++) {
-				if (qconf->tx_mbufs[portid].len == 0)
-					continue;
-				l2fwd_send_burst(&lcore_queue_conf[lcore_id],
-						 qconf->tx_mbufs[portid].len,
-						 (uint8_t) portid);
-				qconf->tx_mbufs[portid].len = 0;
+			for (i = 0; i < qconf->n_rx_port; i++) {
+
+				portid = l2fwd_dst_ports[qconf->rx_port_list[i]];
+				buffer = tx_buffer[portid];
+
+				sent = rte_eth_tx_buffer_flush(portid, 0, buffer);
+				if (sent)
+					port_statistics[portid].tx += sent;
+
 			}
 		}
 
@@ -1144,6 +1108,23 @@ main(int argc, char **argv)
 			rte_exit(EXIT_FAILURE, "rte_eth_tx_queue_setup:err=%d, port=%u\n",
 				ret, (unsigned) portid);
 
+		/* Initialize TX buffers */
+		tx_buffer[portid] = rte_zmalloc_socket("tx_buffer",
+				RTE_ETH_TX_BUFFER_SIZE(MAX_PKT_BURST), 0,
+				rte_eth_dev_socket_id(portid));
+		if (tx_buffer[portid] == NULL)
+			rte_exit(EXIT_FAILURE, "Cannot allocate buffer for tx on port %u\n",
+					(unsigned) portid);
+
+		rte_eth_tx_buffer_init(tx_buffer[portid], MAX_PKT_BURST);
+
+		ret = rte_eth_tx_buffer_set_err_callback(tx_buffer[portid],
+				rte_eth_tx_buffer_count_callback,
+				&port_statistics[portid].dropped);
+		if (ret < 0)
+				rte_exit(EXIT_FAILURE, "Cannot set error callback for "
+						"tx buffer on port %u\n", (unsigned) portid);
+
 		/* Start device */
 		ret = rte_eth_dev_start(portid);
 		if (ret < 0)
diff --git a/examples/packet_ordering/main.c b/examples/packet_ordering/main.c
index 1d9a86f..15bb900 100644
--- a/examples/packet_ordering/main.c
+++ b/examples/packet_ordering/main.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -39,6 +39,7 @@
 #include <rte_errno.h>
 #include <rte_ethdev.h>
 #include <rte_lcore.h>
+#include <rte_malloc.h>
 #include <rte_mbuf.h>
 #include <rte_mempool.h>
 #include <rte_ring.h>
@@ -54,7 +55,7 @@
 
 #define RING_SIZE 16384
 
-/* uncommnet below line to enable debug logs */
+/* uncomment below line to enable debug logs */
 /* #define DEBUG */
 
 #ifdef DEBUG
@@ -86,11 +87,6 @@ struct send_thread_args {
 	struct rte_reorder_buffer *buffer;
 };
 
-struct output_buffer {
-	unsigned count;
-	struct rte_mbuf *mbufs[MAX_PKTS_BURST];
-};
-
 volatile struct app_stats {
 	struct {
 		uint64_t rx_pkts;
@@ -235,6 +231,68 @@ parse_args(int argc, char **argv)
 	return 0;
 }
 
+/*
+ * Tx buffer error callback
+ */
+static void
+flush_tx_error_callback(struct rte_mbuf **unsent, uint16_t count,
+		void *userdata __rte_unused) {
+
+	/* free the mbufs which failed from transmit */
+	app_stats.tx.ro_tx_failed_pkts += count;
+	LOG_DEBUG(REORDERAPP, "%s:Packet loss with tx_burst\n", __func__);
+	pktmbuf_free_bulk(unsent, count);
+
+}
+
+static inline int
+free_tx_buffers(struct rte_eth_dev_tx_buffer *tx_buffer[]) {
+	const uint8_t nb_ports = rte_eth_dev_count();
+	unsigned port_id;
+
+	/* initialize buffers for all ports */
+	for (port_id = 0; port_id < nb_ports; port_id++) {
+		/* skip ports that are not enabled */
+		if ((portmask & (1 << port_id)) == 0)
+			continue;
+
+		rte_free(tx_buffer[port_id]);
+	}
+	return 0;
+}
+
+static inline int
+configure_tx_buffers(struct rte_eth_dev_tx_buffer *tx_buffer[])
+{
+	const uint8_t nb_ports = rte_eth_dev_count();
+	unsigned port_id;
+	int ret;
+
+	/* initialize buffers for all ports */
+	for (port_id = 0; port_id < nb_ports; port_id++) {
+		/* skip ports that are not enabled */
+		if ((portmask & (1 << port_id)) == 0)
+			continue;
+
+		/* Initialize TX buffers */
+		tx_buffer[port_id] = rte_zmalloc_socket("tx_buffer",
+				RTE_ETH_TX_BUFFER_SIZE(MAX_PKTS_BURST), 0,
+				rte_eth_dev_socket_id(port_id));
+		if (tx_buffer[port_id] == NULL)
+			rte_exit(EXIT_FAILURE, "Cannot allocate buffer for tx on port %u\n",
+					(unsigned) port_id);
+
+		rte_eth_tx_buffer_init(tx_buffer[port_id], MAX_PKTS_BURST);
+
+		ret = rte_eth_tx_buffer_set_err_callback(tx_buffer[port_id],
+				flush_tx_error_callback, NULL);
+		if (ret < 0)
+			rte_exit(EXIT_FAILURE, "Cannot set error callback for "
+					"tx buffer on port %u\n", (unsigned) port_id);
+	}
+	return 0;
+}
+
 static inline int
 configure_eth_port(uint8_t port_id)
 {
@@ -438,22 +496,6 @@ worker_thread(void *args_ptr)
 	return 0;
 }
 
-static inline void
-flush_one_port(struct output_buffer *outbuf, uint8_t outp)
-{
-	unsigned nb_tx = rte_eth_tx_burst(outp, 0, outbuf->mbufs,
-			outbuf->count);
-	app_stats.tx.ro_tx_pkts += nb_tx;
-
-	if (unlikely(nb_tx < outbuf->count)) {
-		/* free the mbufs which failed from transmit */
-		app_stats.tx.ro_tx_failed_pkts += (outbuf->count - nb_tx);
-		LOG_DEBUG(REORDERAPP, "%s:Packet loss with tx_burst\n", __func__);
-		pktmbuf_free_bulk(&outbuf->mbufs[nb_tx], outbuf->count - nb_tx);
-	}
-	outbuf->count = 0;
-}
-
 /**
  * Dequeue mbufs from the workers_to_tx ring and reorder them before
  * transmitting.
@@ -465,12 +507,15 @@ send_thread(struct send_thread_args *args)
 	unsigned int i, dret;
 	uint16_t nb_dq_mbufs;
 	uint8_t outp;
-	static struct output_buffer tx_buffers[RTE_MAX_ETHPORTS];
+	unsigned sent;
 	struct rte_mbuf *mbufs[MAX_PKTS_BURST];
 	struct rte_mbuf *rombufs[MAX_PKTS_BURST] = {NULL};
+	static struct rte_eth_dev_tx_buffer *tx_buffer[RTE_MAX_ETHPORTS];
 
 	RTE_LOG(INFO, REORDERAPP, "%s() started on lcore %u\n", __func__, rte_lcore_id());
 
+	configure_tx_buffers(tx_buffer);
+
 	while (!quit_signal) {
 
 		/* deque the mbufs from workers_to_tx ring */
@@ -515,7 +560,7 @@ send_thread(struct send_thread_args *args)
 		dret = rte_reorder_drain(args->buffer, rombufs, MAX_PKTS_BURST);
 		for (i = 0; i < dret; i++) {
 
-			struct output_buffer *outbuf;
+			struct rte_eth_dev_tx_buffer *outbuf;
 			uint8_t outp1;
 
 			outp1 = rombufs[i]->port;
@@ -525,12 +570,15 @@ send_thread(struct send_thread_args *args)
 				continue;
 			}
 
-			outbuf = &tx_buffers[outp1];
-			outbuf->mbufs[outbuf->count++] = rombufs[i];
-			if (outbuf->count == MAX_PKTS_BURST)
-				flush_one_port(outbuf, outp1);
+			outbuf = tx_buffer[outp1];
+			sent = rte_eth_tx_buffer(outp1, 0, outbuf, rombufs[i]);
+			if (sent)
+				app_stats.tx.ro_tx_pkts += sent;
 		}
 	}
+
+	free_tx_buffers(tx_buffer);
+
 	return 0;
 }
 
@@ -542,12 +590,16 @@ tx_thread(struct rte_ring *ring_in)
 {
 	uint32_t i, dqnum;
 	uint8_t outp;
-	static struct output_buffer tx_buffers[RTE_MAX_ETHPORTS];
+	unsigned sent;
 	struct rte_mbuf *mbufs[MAX_PKTS_BURST];
-	struct output_buffer *outbuf;
+	struct rte_eth_dev_tx_buffer *outbuf;
+	static struct rte_eth_dev_tx_buffer *tx_buffer[RTE_MAX_ETHPORTS];
 
 	RTE_LOG(INFO, REORDERAPP, "%s() started on lcore %u\n", __func__,
 							rte_lcore_id());
+
+	configure_tx_buffers(tx_buffer);
+
 	while (!quit_signal) {
 
 		/* deque the mbufs from workers_to_tx ring */
@@ -567,10 +619,10 @@ tx_thread(struct rte_ring *ring_in)
 				continue;
 			}
 
-			outbuf = &tx_buffers[outp];
-			outbuf->mbufs[outbuf->count++] = mbufs[i];
-			if (outbuf->count == MAX_PKTS_BURST)
-				flush_one_port(outbuf, outp);
+			outbuf = tx_buffer[outp];
+			sent = rte_eth_tx_buffer(outp, 0, outbuf, mbufs[i]);
+			if (sent)
+				app_stats.tx.ro_tx_pkts += sent;
 		}
 	}
 
diff --git a/examples/qos_meter/main.c b/examples/qos_meter/main.c
index 0de5e7f..b968b00 100644
--- a/examples/qos_meter/main.c
+++ b/examples/qos_meter/main.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -36,6 +36,7 @@
 
 #include <rte_common.h>
 #include <rte_eal.h>
+#include <rte_malloc.h>
 #include <rte_mempool.h>
 #include <rte_ethdev.h>
 #include <rte_cycles.h>
@@ -118,9 +119,7 @@ static struct rte_eth_conf port_conf = {
 static uint8_t port_rx;
 static uint8_t port_tx;
 static struct rte_mbuf *pkts_rx[PKT_RX_BURST_MAX];
-static struct rte_mbuf *pkts_tx[PKT_TX_BURST_MAX];
-static uint16_t pkts_tx_len = 0;
-
+struct rte_eth_dev_tx_buffer *tx_buffer;
 
 struct rte_meter_srtcm_params app_srtcm_params[] = {
 	{.cir = 1000000 * 46,  .cbs = 2048, .ebs = 2048},
@@ -188,27 +187,8 @@ main_loop(__attribute__((unused)) void *dummy)
 		current_time = rte_rdtsc();
 		time_diff = current_time - last_time;
 		if (unlikely(time_diff > TIME_TX_DRAIN)) {
-			int ret;
-
-			if (pkts_tx_len == 0) {
-				last_time = current_time;
-
-				continue;
-			}
-
-			/* Write packet burst to NIC TX */
-			ret = rte_eth_tx_burst(port_tx, NIC_TX_QUEUE, pkts_tx, pkts_tx_len);
-
-			/* Free buffers for any packets not written successfully */
-			if (unlikely(ret < pkts_tx_len)) {
-				for ( ; ret < pkts_tx_len; ret ++) {
-					rte_pktmbuf_free(pkts_tx[ret]);
-				}
-			}
-
-			/* Empty the output buffer */
-			pkts_tx_len = 0;
-
+			/* Flush tx buffer */
+			rte_eth_tx_buffer_flush(port_tx, NIC_TX_QUEUE, tx_buffer);
 			last_time = current_time;
 		}
 
@@ -222,26 +202,8 @@ main_loop(__attribute__((unused)) void *dummy)
 			/* Handle current packet */
 			if (app_pkt_handle(pkt, current_time) == DROP)
 				rte_pktmbuf_free(pkt);
-			else {
-				pkts_tx[pkts_tx_len] = pkt;
-				pkts_tx_len ++;
-			}
-
-			/* Write packets from output buffer to NIC TX when full burst is available */
-			if (unlikely(pkts_tx_len == PKT_TX_BURST_MAX)) {
-				/* Write packet burst to NIC TX */
-				int ret = rte_eth_tx_burst(port_tx, NIC_TX_QUEUE, pkts_tx, PKT_TX_BURST_MAX);
-
-				/* Free buffers for any packets not written successfully */
-				if (unlikely(ret < PKT_TX_BURST_MAX)) {
-					for ( ; ret < PKT_TX_BURST_MAX; ret ++) {
-						rte_pktmbuf_free(pkts_tx[ret]);
-					}
-				}
-
-				/* Empty the output buffer */
-				pkts_tx_len = 0;
-			}
+			else
+				rte_eth_tx_buffer(port_tx, NIC_TX_QUEUE, tx_buffer, pkt);
 		}
 	}
 }
@@ -397,6 +359,15 @@ main(int argc, char **argv)
 	if (ret < 0)
 		rte_exit(EXIT_FAILURE, "Port %d TX queue setup error (%d)\n", port_tx, ret);
 
+	tx_buffer = rte_zmalloc_socket("tx_buffer",
+			RTE_ETH_TX_BUFFER_SIZE(PKT_TX_BURST_MAX), 0,
+			rte_eth_dev_socket_id(port_tx));
+	if (tx_buffer == NULL)
+		rte_exit(EXIT_FAILURE, "Port %d TX buffer allocation error\n",
+				port_tx);
+
+	rte_eth_tx_buffer_init(tx_buffer, PKT_TX_BURST_MAX);
+
 	ret = rte_eth_dev_start(port_rx);
 	if (ret < 0)
 		rte_exit(EXIT_FAILURE, "Port %d start error (%d)\n", port_rx, ret);
-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [dpdk-dev] [PATCH v3 0/2] add support for buffered tx to ethdev
  2016-03-10 10:57   ` [dpdk-dev] [PATCH v3 " Tomasz Kulasek
  2016-03-10 10:57     ` [dpdk-dev] [PATCH v3 1/2] ethdev: add buffered tx api Tomasz Kulasek
  2016-03-10 10:57     ` [dpdk-dev] [PATCH v3 2/2] examples: rework to use buffered tx Tomasz Kulasek
@ 2016-03-10 11:31     ` Ananyev, Konstantin
  2016-03-10 16:01       ` Jastrzebski, MichalX K
  2016-03-10 17:19     ` [dpdk-dev] [PATCH v4 " Tomasz Kulasek
  3 siblings, 1 reply; 43+ messages in thread
From: Ananyev, Konstantin @ 2016-03-10 11:31 UTC (permalink / raw)
  To: Kulasek, TomaszX, dev

> Many sample apps include internal buffering for single-packet-at-a-time
> operation. Since this is such a common paradigm, this functionality is
> better suited to being implemented in the ethdev API.
> 
> The new APIs in the ethdev library are:
> * rte_eth_tx_buffer_init - initialize buffer
> * rte_eth_tx_buffer - buffer up a single packet for future transmission
> * rte_eth_tx_buffer_flush - flush any unsent buffered packets
> * rte_eth_tx_buffer_set_err_callback - set up a callback to be called in
>   case transmitting a buffered burst fails. By default, we just free the
>   unsent packets.
> 
> As well as these, additional reference callbacks are provided, which
> free the packets:
> 
> * rte_eth_tx_buffer_drop_callback - silently drop packets (default
>   behavior)
> * rte_eth_tx_buffer_count_callback - drop and update a user-provided counter
>   to track the number of dropped packets
> 
> Due to feedback from the mailing list that buffer management facilities
> in the user application are preferable to API simplicity, we decided
> to move the internal buffer table, as well as the callback functions and
> user data, from rte_eth_dev/rte_eth_dev_data to the application space.
> It prevents ABI breakage and gives more flexibility in buffer management,
> such as allocation, dynamic size changes, and reuse of buffers across
> many ports or after a failure.
> 
> 
> The following steps illustrate how tx buffers can be used in application:
> 
> 1) Initialization
> 
> a) Allocate memory for a buffer
> 
>    struct rte_eth_dev_tx_buffer *buffer = rte_zmalloc_socket("tx_buffer",
>            RTE_ETH_TX_BUFFER_SIZE(MAX_PKT_BURST), 0, socket_id);
> 
>    The RTE_ETH_TX_BUFFER_SIZE(size) macro computes the memory required
>    to store "size" packets in the buffer.
> 
> b) Initialize allocated memory and set up default values. Threshold level
>    must be lower than or equal to the MAX_PKT_BURST from 1a)
> 
>    rte_eth_tx_buffer_init(buffer, threshold);
> 
> 
> c) Set error callback (optional)
> 
>    rte_eth_tx_buffer_set_err_callback(buffer, callback_fn, userdata);
> 
> 
> 2) Store packet "pkt" in the buffer; all buffered packets are sent to the
>    queue_id on port_id when the number of packets reaches the threshold
>    level set up in 1b)
> 
>    rte_eth_tx_buffer(port_id, queue_id, buffer, pkt);
> 
> 
> 3) Send all stored packets to the queue_id on port_id
> 
>    rte_eth_tx_buffer_flush(port_id, queue_id, buffer);
> 
> 
> 4) Flush buffer and free memory
> 
>    rte_eth_tx_buffer_flush(port_id, queue_id, buffer);
>    ...
>    rte_free(buffer);
> 
> v3 changes:
>  - error counter removed from tx buffer structure, now default behavior is
>    silent drop of unsent packets
>  - some names were changed in the tx buffer structure to be more descriptive
>  - two default callbacks are provided: rte_eth_tx_buffer_drop_callback and
>    rte_eth_tx_buffer_count_callback
> 
> v2 changes:
>  - reworked to use new buffer model
>  - buffer data and callbacks are removed from rte_eth_dev/rte_eth_dev_data,
>    so this patch doesn't break the ABI anymore
>  - introduced the RTE_ETH_TX_BUFFER_SIZE macro and rte_eth_tx_buffer_init
>  - buffers are not attached to the port-queue
>  - buffers can be allocated dynamically during application work
>  - size of buffer can be changed without port restart
> 
> Tomasz Kulasek (2):
>   ethdev: add buffered tx api
>   examples: rework to use buffered tx
> 
>  examples/l2fwd-jobstats/main.c                     |  104 ++++------
>  examples/l2fwd-keepalive/main.c                    |  100 ++++------
>  examples/l2fwd/main.c                              |  104 ++++------
>  examples/l3fwd-acl/main.c                          |   92 ++++-----
>  examples/l3fwd-power/main.c                        |   89 ++++-----
>  examples/link_status_interrupt/main.c              |  107 ++++------
>  .../client_server_mp/mp_client/client.c            |  101 ++++++----
>  examples/multi_process/l2fwd_fork/main.c           |   97 ++++-----
>  examples/packet_ordering/main.c                    |  122 ++++++++----
>  examples/qos_meter/main.c                          |   61 ++----
>  lib/librte_ether/rte_ethdev.c                      |   46 +++++
>  lib/librte_ether/rte_ethdev.h                      |  205 +++++++++++++++++++-
>  lib/librte_ether/rte_ether_version.map             |   10 +
>  13 files changed, 696 insertions(+), 542 deletions(-)
> 
> --

Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>

> 1.7.9.5

^ permalink raw reply	[flat|nested] 43+ messages in thread
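
For reference, steps 1)-4) above combine into the following self-contained
sketch. BURST_SZ, drop_count and the buffered_tx_example() helper are
placeholders for this illustration; the rte_eth_tx_buffer* calls are the
ones introduced by this patchset:

    #include <rte_ethdev.h>
    #include <rte_malloc.h>
    #include <rte_mbuf.h>

    #define BURST_SZ 32         /* assumed buffer size and threshold */

    static uint64_t drop_count; /* incremented by the count callback */

    static void
    buffered_tx_example(uint8_t port_id, uint16_t queue_id,
            struct rte_mbuf *pkt)
    {
        struct rte_eth_dev_tx_buffer *buffer;

        /* 1a) allocate space for BURST_SZ packets near the port */
        buffer = rte_zmalloc_socket("tx_buffer",
                RTE_ETH_TX_BUFFER_SIZE(BURST_SZ), 0,
                rte_eth_dev_socket_id(port_id));
        if (buffer == NULL)
            return;

        /* 1b) threshold must not exceed the size used in 1a) */
        rte_eth_tx_buffer_init(buffer, BURST_SZ);

        /* 1c) optional: count drops instead of silently freeing */
        rte_eth_tx_buffer_set_err_callback(buffer,
                rte_eth_tx_buffer_count_callback, &drop_count);

        /* 2) buffer the packet; a burst is transmitted automatically
         *    once BURST_SZ packets have accumulated */
        rte_eth_tx_buffer(port_id, queue_id, buffer, pkt);

        /* 3) and 4) push out any remainder, then release the memory */
        rte_eth_tx_buffer_flush(port_id, queue_id, buffer);
        rte_free(buffer);
    }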

* Re: [dpdk-dev] [PATCH v3 0/2] add support for buffered tx to ethdev
  2016-03-10 11:31     ` [dpdk-dev] [PATCH v3 0/2] add support for buffered tx to ethdev Ananyev, Konstantin
@ 2016-03-10 16:01       ` Jastrzebski, MichalX K
  0 siblings, 0 replies; 43+ messages in thread
From: Jastrzebski, MichalX K @ 2016-03-10 16:01 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Ananyev,
> Konstantin
> Sent: Thursday, March 10, 2016 12:32 PM
> To: Kulasek, TomaszX <tomaszx.kulasek@intel.com>; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v3 0/2] add support for buffered tx to
> ethdev
> 
> > Many sample apps include internal buffering for single-packet-at-a-time
> > operation. Since this is such a common paradigm, this functionality is
> > better suited to being implemented in the ethdev API.
> >
> > The new APIs in the ethdev library are:
> > * rte_eth_tx_buffer_init - initialize buffer
> > * rte_eth_tx_buffer - buffer up a single packet for future transmission
> > * rte_eth_tx_buffer_flush - flush any unsent buffered packets
> > * rte_eth_tx_buffer_set_err_callback - set up a callback to be called in
> >   case transmitting a buffered burst fails. By default, we just free the
> >   unsent packets.
> >
> > In addition to these, reference callbacks are provided which free the
> > packets:
> >
> > * rte_eth_tx_buffer_drop_callback - silently drop packets (default
> >   behavior)
> > * rte_eth_tx_buffer_count_callback - drop and update a user-provided
> >   counter to track the number of dropped packets
> >
> > Based on feedback from the mailing list that buffer management facilities
> > in the user application are preferable to API simplicity, we decided to
> > move the internal buffer table, as well as the callback functions and
> > user data, from rte_eth_dev/rte_eth_dev_data to the application space.
> > This prevents ABI breakage and gives more flexibility in buffer
> > management, such as allocation, dynamic resizing, and reuse of buffers
> > across many ports or after a failure.
> >
> >
> > The following steps illustrate how tx buffers can be used in application:
> >
> > 1) Initialization
> >
> > a) Allocate memory for a buffer
> >
> >    struct rte_eth_dev_tx_buffer *buffer = rte_zmalloc_socket("tx_buffer",
> >            RTE_ETH_TX_BUFFER_SIZE(MAX_PKT_BURST), 0, socket_id);
> >
> >    The RTE_ETH_TX_BUFFER_SIZE(size) macro computes the memory required
> >    to store "size" packets in the buffer.
> >
> > b) Initialize allocated memory and set up default values. Threshold level
> >    must be lower than or equal to the MAX_PKT_BURST from 1a)
> >
> >    rte_eth_tx_buffer_init(buffer, threshold);
> >
> >
> > c) Set error callback (optional)
> >
> >    rte_eth_tx_buffer_set_err_callback(buffer, callback_fn, userdata);
> >
> >
> > 2) Store packet "pkt" in the buffer and send all buffered packets to the
> >    queue_id on port_id when the number of packets reaches the threshold
> >    level set up in 1b)
> >
> >    rte_eth_tx_buffer(port_id, queue_id, buffer, pkt);
> >
> >
> > 3) Send all stored packets to the queue_id on port_id
> >
> >    rte_eth_tx_buffer_flush(port_id, queue_id, buffer);
> >
> >
> > 4) Flush buffer and free memory
> >
> >    rte_eth_tx_buffer_flush(port_id, queue_id, buffer);
> >    ...
> >    rte_free(buffer);
> >
> > v3 changes:
> >  - error counter removed from tx buffer structure, now default behavior is
> >    silent drop of unsent packets
> >  - some names were changed in the tx buffer structure to be more descriptive
> >  - two default callbacks are provided: rte_eth_tx_buffer_drop_callback and
> >    rte_eth_tx_buffer_count_callback
> >
> > v2 changes:
> >  - reworked to use new buffer model
> >  - buffer data and callbacks are removed from
> >    rte_eth_dev/rte_eth_dev_data, so this patch doesn't break the ABI
> >    anymore
> >  - introduced RTE_ETH_TX_BUFFER macro and rte_eth_tx_buffer_init
> >  - buffers are not attached to the port-queue
> >  - buffers can be allocated dynamically during application work
> >  - size of buffer can be changed without port restart
> >
> > Tomasz Kulasek (2):
> >   ethdev: add buffered tx api
> >   examples: rework to use buffered tx
> >
> >  examples/l2fwd-jobstats/main.c                     |  104 ++++------
> >  examples/l2fwd-keepalive/main.c                    |  100 ++++------
> >  examples/l2fwd/main.c                              |  104 ++++------
> >  examples/l3fwd-acl/main.c                          |   92 ++++-----
> >  examples/l3fwd-power/main.c                        |   89 ++++-----
> >  examples/link_status_interrupt/main.c              |  107 ++++------
> >  .../client_server_mp/mp_client/client.c            |  101 ++++++----
> >  examples/multi_process/l2fwd_fork/main.c           |   97 ++++-----
> >  examples/packet_ordering/main.c                    |  122 ++++++++----
> >  examples/qos_meter/main.c                          |   61 ++----
> >  lib/librte_ether/rte_ethdev.c                      |   46 +++++
> >  lib/librte_ether/rte_ethdev.h                      |  205 +++++++++++++++++++-
> >  lib/librte_ether/rte_ether_version.map             |   10 +
> >  13 files changed, 696 insertions(+), 542 deletions(-)
> >
> > --
> 
> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> 
> > 1.7.9.5

Hi Thomas,
Could you please confirm whether this patch meets your requirements and
has the green light to be applied?

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [dpdk-dev] [PATCH v3 1/2] ethdev: add buffered tx api
  2016-03-10 10:57     ` [dpdk-dev] [PATCH v3 1/2] ethdev: add buffered tx api Tomasz Kulasek
@ 2016-03-10 16:23       ` Thomas Monjalon
  2016-03-10 17:15         ` Kulasek, TomaszX
  0 siblings, 1 reply; 43+ messages in thread
From: Thomas Monjalon @ 2016-03-10 16:23 UTC (permalink / raw)
  To: Tomasz Kulasek; +Cc: dev

2016-03-10 11:57, Tomasz Kulasek:
> +struct rte_eth_dev_tx_buffer {
> +	buffer_tx_error_fn callback;
> +	void *userdata;

What about renaming these fields as
- error_callback
- error_userdata ?

> +	uint16_t size;           /**< Size of buffer for buffered tx */
> +	uint16_t length;

Maybe a comment "Number of packets in the array" to be sure?

> +	struct rte_mbuf *pkts[];

A comment? "Pending packets to be sent on explicit flush or when full" ?

[...]
> +DPDK_16.04 {
> +	global:
> +
> +	rte_eth_tx_buffer_drop_callback;
> +	rte_eth_tx_buffer_count_callback;
> +	rte_eth_tx_buffer_init;
> +	rte_eth_tx_buffer_set_err_callback;

Please keep alphabetical order.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [dpdk-dev] [PATCH v3 1/2] ethdev: add buffered tx api
  2016-03-10 16:23       ` Thomas Monjalon
@ 2016-03-10 17:15         ` Kulasek, TomaszX
  0 siblings, 0 replies; 43+ messages in thread
From: Kulasek, TomaszX @ 2016-03-10 17:15 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev


> -----Original Message-----
> From: Thomas Monjalon [mailto:thomas.monjalon@6wind.com]
> Sent: Thursday, March 10, 2016 17:24
> To: Kulasek, TomaszX <tomaszx.kulasek@intel.com>
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v3 1/2] ethdev: add buffered tx api
> 
> 2016-03-10 11:57, Tomasz Kulasek:
> > +struct rte_eth_dev_tx_buffer {
> > +	buffer_tx_error_fn callback;
> > +	void *userdata;
> 
> What about renaming these fields as
> - error_callback
> - error_userdata ?
> 
> > +	uint16_t size;           /**< Size of buffer for buffered tx */
> > +	uint16_t length;
> 
> Maybe a comment "Number of packets in the array" to be sure?
> 
> > +	struct rte_mbuf *pkts[];
> 
> A comment? "Pending packets to be sent on explicit flush or when full" ?
> 
> [...]
> > +DPDK_16.04 {
> > +	global:
> > +
> > +	rte_eth_tx_buffer_drop_callback;
> > +	rte_eth_tx_buffer_count_callback;
> > +	rte_eth_tx_buffer_init;
> > +	rte_eth_tx_buffer_set_err_callback;
> 
> Please keep alphabetical order.
> 

Ok, I'll send v4

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [dpdk-dev] [PATCH v4 0/2] add support for buffered tx to ethdev
  2016-03-10 10:57   ` [dpdk-dev] [PATCH v3 " Tomasz Kulasek
                       ` (2 preceding siblings ...)
  2016-03-10 11:31     ` [dpdk-dev] [PATCH v3 0/2] add support for buffered tx to ethdev Ananyev, Konstantin
@ 2016-03-10 17:19     ` Tomasz Kulasek
  2016-03-10 17:19       ` [dpdk-dev] [PATCH v4 1/2] ethdev: add buffered tx api Tomasz Kulasek
                         ` (2 more replies)
  3 siblings, 3 replies; 43+ messages in thread
From: Tomasz Kulasek @ 2016-03-10 17:19 UTC (permalink / raw)
  To: dev

Many sample apps include internal buffering for single-packet-at-a-time
operation. Since this is such a common paradigm, this functionality is
better suited to being implemented in the ethdev API.

The new APIs in the ethdev library are:
* rte_eth_tx_buffer_init - initialize buffer
* rte_eth_tx_buffer - buffer up a single packet for future transmission
* rte_eth_tx_buffer_flush - flush any unsent buffered packets
* rte_eth_tx_buffer_set_err_callback - set up a callback to be called in
  case transmitting a buffered burst fails. By default, we just free the
  unsent packets.

In addition to these, reference callbacks are provided which free the
packets:

* rte_eth_tx_buffer_drop_callback - silently drop packets (default
  behavior)
* rte_eth_tx_buffer_count_callback - drop and update a user-provided counter
  to track the number of dropped packets
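
A custom callback with the same signature can also be registered. A
minimal sketch (the per-port counter and the use of userdata to carry a
port id are illustrative assumptions, not part of the patch):

   /* hypothetical per-port drop counter kept by the application */
   static uint64_t drops[RTE_MAX_ETHPORTS];

   static void
   custom_drop_callback(struct rte_mbuf **unsent, uint16_t count,
           void *userdata)
   {
       uint8_t port_id = (uint8_t)(uintptr_t)userdata;
       uint16_t i;

       drops[port_id] += count;
       /* unsent mbufs must be freed back to their mempool */
       for (i = 0; i < count; i++)
           rte_pktmbuf_free(unsent[i]);
   }

   rte_eth_tx_buffer_set_err_callback(buffer, custom_drop_callback,
           (void *)(uintptr_t)port_id);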

Based on feedback from the mailing list that buffer management facilities
in the user application are preferable to API simplicity, we decided to
move the internal buffer table, as well as the callback functions and user
data, from rte_eth_dev/rte_eth_dev_data to the application space.
This prevents ABI breakage and gives more flexibility in buffer management,
such as allocation, dynamic resizing, and reuse of buffers across many
ports or after a failure. A resizing sketch is shown below.
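
For example, growing a buffer at runtime, without restarting the port,
might look like the following sketch ("old_buf" and "new_size" are
illustrative names, not part of the patch):

   struct rte_eth_dev_tx_buffer *new_buf = rte_zmalloc_socket("tx_buffer",
           RTE_ETH_TX_BUFFER_SIZE(new_size), 0, socket_id);

   rte_eth_tx_buffer_init(new_buf, new_size);
   /* re-register any custom error callback on new_buf here */

   /* drain packets still pending in the old buffer, then release it */
   rte_eth_tx_buffer_flush(port_id, queue_id, old_buf);
   rte_free(old_buf);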


The following steps illustrate how tx buffers can be used in application:

1) Initialization

a) Allocate memory for a buffer

   struct rte_eth_dev_tx_buffer *buffer = rte_zmalloc_socket("tx_buffer",
           RTE_ETH_TX_BUFFER_SIZE(MAX_PKT_BURST), 0, socket_id);

   The RTE_ETH_TX_BUFFER_SIZE(size) macro computes the memory required to
   store "size" packets in the buffer.

b) Initialize allocated memory and set up default values. Threshold level
   must be lower than or equal to the MAX_PKT_BURST from 1a)

   rte_eth_tx_buffer_init(buffer, threshold);


c) Set error callback (optional)

   rte_eth_tx_buffer_set_err_callback(buffer, callback_fn, userdata);


2) Store packet "pkt" in the buffer and send all buffered packets to the
   queue_id on port_id when the number of packets reaches the threshold
   level set up in 1b)

   rte_eth_tx_buffer(port_id, queue_id, buffer, pkt);


3) Send all stored packets to the queue_id on port_id

   rte_eth_tx_buffer_flush(port_id, queue_id, buffer);


4) Flush buffer and free memory

   rte_eth_tx_buffer_flush(port_id, queue_id, buffer);
   ...
   rte_free(buffer);
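
Putting the steps together, a minimal forwarding loop might look like the
sketch below (flushing only on idle iterations for simplicity; the
reworked sample applications in patch 2/2 flush on a drain timer instead):

   struct rte_mbuf *pkts[MAX_PKT_BURST];
   uint16_t i, nb_rx;

   for (;;) {
       nb_rx = rte_eth_rx_burst(port_id, 0, pkts, MAX_PKT_BURST);
       for (i = 0; i < nb_rx; i++)
           rte_eth_tx_buffer(port_id, queue_id, buffer, pkts[i]);
       if (nb_rx == 0)
           rte_eth_tx_buffer_flush(port_id, queue_id, buffer);
   }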

v4 changes:
 - added comments
 - changed names of error callback and user data
 - changed order of function names in map file

v3 changes:
 - error counter removed from tx buffer structure, now default behavior is
   silent drop of unsent packets
 - some names were changed in the tx buffer structure to be more descriptive
 - two default callbacks are provided: rte_eth_tx_buffer_drop_callback and
   rte_eth_tx_buffer_count_callback

v2 changes:
 - reworked to use new buffer model
 - buffer data and callbacks are removed from rte_eth_dev/rte_eth_dev_data,
   so this patch doesn't break the ABI anymore
 - introduced RTE_ETH_TX_BUFFER macro and rte_eth_tx_buffer_init
 - buffers are not attached to the port-queue
 - buffers can be allocated dynamically during application work
 - size of buffer can be changed without port restart

Tomasz Kulasek (2):
  ethdev: add buffered tx api
  examples: rework to use buffered tx

 examples/l2fwd-jobstats/main.c                     |  104 ++++------
 examples/l2fwd-keepalive/main.c                    |  100 ++++------
 examples/l2fwd/main.c                              |  104 ++++------
 examples/l3fwd-acl/main.c                          |   92 ++++-----
 examples/l3fwd-power/main.c                        |   89 ++++-----
 examples/link_status_interrupt/main.c              |  107 ++++------
 .../client_server_mp/mp_client/client.c            |  101 ++++++----
 examples/multi_process/l2fwd_fork/main.c           |   97 ++++-----
 examples/packet_ordering/main.c                    |  122 ++++++++----
 examples/qos_meter/main.c                          |   61 ++----
 lib/librte_ether/rte_ethdev.c                      |   46 +++++
 lib/librte_ether/rte_ethdev.h                      |  206 +++++++++++++++++++-
 lib/librte_ether/rte_ether_version.map             |   10 +
 13 files changed, 697 insertions(+), 542 deletions(-)

-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [dpdk-dev] [PATCH v4 1/2] ethdev: add buffered tx api
  2016-03-10 17:19     ` [dpdk-dev] [PATCH v4 " Tomasz Kulasek
@ 2016-03-10 17:19       ` Tomasz Kulasek
  2016-03-10 17:19       ` [dpdk-dev] [PATCH v4 2/2] examples: rework to use buffered tx Tomasz Kulasek
  2016-03-11 16:39       ` [dpdk-dev] [PATCH v4 0/2] add support for buffered tx to ethdev Thomas Monjalon
  2 siblings, 0 replies; 43+ messages in thread
From: Tomasz Kulasek @ 2016-03-10 17:19 UTC (permalink / raw)
  To: dev

Many sample apps include internal buffering for single-packet-at-a-time
operation. Since this is such a common paradigm, this functionality is
better suited to being implemented in the ethdev API.

The new APIs in the ethdev library are:
* rte_eth_tx_buffer_init - initialize buffer
* rte_eth_tx_buffer - buffer up a single packet for future transmission
* rte_eth_tx_buffer_flush - flush any unsent buffered packets
* rte_eth_tx_buffer_set_err_callback - set up a callback to be called in
  case transmitting a buffered burst fails. By default, we just free the
  unsent packets.

In addition to these, reference callbacks are provided which free the
packets:

* rte_eth_tx_buffer_drop_callback - silently drop packets (default
  behavior)
* rte_eth_tx_buffer_count_callback - drop and update a user-provided counter
  to track the number of dropped packets

v4 changes:
 - added comments
 - changed names of error callback and user data
 - changed order of function names in map file

v3 changes:
 - error counter removed from tx buffer structure, now default behavior is
   silent drop of unsent packets
 - some names were changed in the tx buffer structure to be more descriptive
 - two default callbacks are provided: rte_eth_tx_buffer_drop_callback and
   rte_eth_tx_buffer_count_callback

Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 lib/librte_ether/rte_ethdev.c          |   46 +++++++
 lib/librte_ether/rte_ethdev.h          |  206 +++++++++++++++++++++++++++++++-
 lib/librte_ether/rte_ether_version.map |   10 ++
 3 files changed, 261 insertions(+), 1 deletion(-)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index 5c2b416..98587e1 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -1271,6 +1271,52 @@ rte_eth_tx_queue_setup(uint8_t port_id, uint16_t tx_queue_id,
 }
 
 void
+rte_eth_tx_buffer_drop_callback(struct rte_mbuf **pkts, uint16_t unsent,
+		void *userdata __rte_unused)
+{
+	unsigned i;
+
+	for (i = 0; i < unsent; i++)
+		rte_pktmbuf_free(pkts[i]);
+}
+
+void
+rte_eth_tx_buffer_count_callback(struct rte_mbuf **pkts, uint16_t unsent,
+		void *userdata)
+{
+	uint64_t *count = userdata;
+	unsigned i;
+
+	for (i = 0; i < unsent; i++)
+		rte_pktmbuf_free(pkts[i]);
+
+	*count += unsent;
+}
+
+int
+rte_eth_tx_buffer_set_err_callback(struct rte_eth_dev_tx_buffer *buffer,
+		buffer_tx_error_fn cbfn, void *userdata)
+{
+	buffer->error_callback = cbfn;
+	buffer->error_userdata = userdata;
+	return 0;
+}
+
+int
+rte_eth_tx_buffer_init(struct rte_eth_dev_tx_buffer *buffer, uint16_t size)
+{
+	if (buffer == NULL)
+		return -EINVAL;
+
+	buffer->size = size;
+	if (buffer->error_callback == NULL)
+		rte_eth_tx_buffer_set_err_callback(buffer,
+				rte_eth_tx_buffer_drop_callback, NULL);
+
+	return 0;
+}
+
+void
 rte_eth_promiscuous_enable(uint8_t port_id)
 {
 	struct rte_eth_dev *dev;
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index d53e362..2062d6c 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -2655,6 +2655,210 @@ rte_eth_tx_burst(uint8_t port_id, uint16_t queue_id,
 	return (*dev->tx_pkt_burst)(dev->data->tx_queues[queue_id], tx_pkts, nb_pkts);
 }
 
+typedef void (*buffer_tx_error_fn)(struct rte_mbuf **unsent, uint16_t count,
+		void *userdata);
+
+/**
+ * Structure used to buffer packets for future TX
+ * Used by APIs rte_eth_tx_buffer and rte_eth_tx_buffer_flush
+ */
+struct rte_eth_dev_tx_buffer {
+	buffer_tx_error_fn error_callback;
+	void *error_userdata;
+	uint16_t size;           /**< Size of buffer for buffered tx */
+	uint16_t length;         /**< Number of packets in the array */
+	struct rte_mbuf *pkts[];
+	/**< Pending packets to be sent on explicit flush or when full */
+};
+
+/**
+ * Calculate the size of the tx buffer.
+ *
+ * @param sz
+ *   Number of stored packets.
+ */
+#define RTE_ETH_TX_BUFFER_SIZE(sz) \
+	(sizeof(struct rte_eth_dev_tx_buffer) + (sz) * sizeof(struct rte_mbuf *))
+
+/**
+ * Initialize default values for buffered transmitting
+ *
+ * @param buffer
+ *   Tx buffer to be initialized.
+ * @param size
+ *   Buffer size
+ * @return
+ *   0 if no error
+ */
+int
+rte_eth_tx_buffer_init(struct rte_eth_dev_tx_buffer *buffer, uint16_t size);
+
+/**
+ * Send any packets queued up for transmission on a port and HW queue
+ *
+ * This causes an explicit flush of packets previously buffered via the
+ * rte_eth_tx_buffer() function. It returns the number of packets successfully
+ * sent to the NIC, and calls the error callback for any unsent packets. Unless
+ * explicitly set up otherwise, the default callback simply frees the unsent
+ * packets back to the owning mempool.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param queue_id
+ *   The index of the transmit queue through which output packets must be
+ *   sent.
+ *   The value must be in the range [0, nb_tx_queue - 1] previously supplied
+ *   to rte_eth_dev_configure().
+ * @param buffer
+ *   Buffer of packets to be transmitted.
+ * @return
+ *   The number of packets successfully sent to the Ethernet device. The error
+ *   callback is called for any packets which could not be sent.
+ */
+static inline uint16_t
+rte_eth_tx_buffer_flush(uint8_t port_id, uint16_t queue_id,
+		struct rte_eth_dev_tx_buffer *buffer)
+{
+	uint16_t sent;
+	uint16_t to_send = buffer->length;
+
+	if (to_send == 0)
+		return 0;
+
+	sent = rte_eth_tx_burst(port_id, queue_id, buffer->pkts, to_send);
+
+	buffer->length = 0;
+
+	/* All packets sent, or to be dealt with by callback below */
+	if (unlikely(sent != to_send))
+		buffer->error_callback(&buffer->pkts[sent], to_send - sent,
+				buffer->error_userdata);
+
+	return sent;
+}
+
+/**
+ * Buffer a single packet for future transmission on a port and queue
+ *
+ * This function takes a single mbuf/packet and buffers it for later
+ * transmission on the particular port and queue specified. Once the buffer is
+ * full of packets, an attempt will be made to transmit all the buffered
+ * packets. In case of error, where not all packets can be transmitted, a
+ * callback is called with the unsent packets as a parameter. If no callback
+ * is explicitly set up, the unsent packets are just freed back to the owning
+ * mempool. The function returns the number of packets actually sent, i.e.
+ * 0 if no buffer flush occurred, otherwise the number of packets
+ * successfully flushed.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param queue_id
+ *   The index of the transmit queue through which output packets must be
+ *   sent.
+ *   The value must be in the range [0, nb_tx_queue - 1] previously supplied
+ *   to rte_eth_dev_configure().
+ * @param buffer
+ *   Buffer used to collect packets to be sent.
+ * @param tx_pkt
+ *   Pointer to the packet mbuf to be sent.
+ * @return
+ *   0 = packet has been buffered for later transmission
+ *   N > 0 = packet has been buffered, and the buffer was subsequently flushed,
+ *     causing N packets to be sent, and the error callback to be called for
+ *     the rest.
+ */
+static inline uint16_t __attribute__((always_inline))
+rte_eth_tx_buffer(uint8_t port_id, uint16_t queue_id,
+		struct rte_eth_dev_tx_buffer *buffer, struct rte_mbuf *tx_pkt)
+{
+	buffer->pkts[buffer->length++] = tx_pkt;
+	if (buffer->length < buffer->size)
+		return 0;
+
+	return rte_eth_tx_buffer_flush(port_id, queue_id, buffer);
+}
+
+/**
+ * Configure a callback for buffered packets which cannot be sent
+ *
+ * Register a specific callback to be called when an attempt is made to send
+ * all packets buffered on an ethernet port, but not all packets can
+ * successfully be sent. The callback registered here will be called only
+ * from calls to rte_eth_tx_buffer() and rte_eth_tx_buffer_flush() APIs.
+ * The default callback simply frees the packets back to the owning
+ * mempool. If additional behaviour is required, for example, to count
+ * dropped packets, or to retry transmission of packets which cannot be
+ * sent, this function should be used to register a suitable callback
+ * function to implement the desired behaviour.
+ * The example callback "rte_eth_tx_buffer_count_callback()" is also
+ * provided as reference.
+ *
+ * @param buffer
+ *   The tx buffer structure for which the callback is being set.
+ * @param callback
+ *   The function to be used as the callback.
+ * @param userdata
+ *   Arbitrary parameter to be passed to the callback function
+ * @return
+ *   0 on success, or -1 on error with rte_errno set appropriately
+ */
+int
+rte_eth_tx_buffer_set_err_callback(struct rte_eth_dev_tx_buffer *buffer,
+		buffer_tx_error_fn callback, void *userdata);
+
+/**
+ * Callback function for silently dropping unsent buffered packets.
+ *
+ * This function can be passed to rte_eth_tx_buffer_set_err_callback() to
+ * adjust the default behavior when buffered packets cannot be sent. This
+ * function drops any unsent packets silently and is used by tx buffered
+ * operations as default behavior.
+ *
+ * NOTE: this function should not be called directly, instead it should be used
+ *       as a callback for packet buffering.
+ *
+ * @param pkts
+ *   The previously buffered packets which could not be sent
+ * @param unsent
+ *   The number of unsent packets in the pkts array
+ * @param userdata
+ *   Not used
+ */
+void
+rte_eth_tx_buffer_drop_callback(struct rte_mbuf **pkts, uint16_t unsent,
+		void *userdata __rte_unused);
+
+/**
+ * Callback function for tracking unsent buffered packets.
+ *
+ * This function can be passed to rte_eth_tx_buffer_set_err_callback() to
+ * adjust the default behavior when buffered packets cannot be sent. This
+ * function drops any unsent packets, but also updates a user-supplied counter
+ * to track the overall number of packets dropped. The counter should be an
+ * uint64_t variable.
+ *
+ * NOTE: this function should not be called directly, instead it should be used
+ *       as a callback for packet buffering.
+ *
+ * NOTE: when configuring this function as a callback with
+ *       rte_eth_tx_buffer_set_err_callback(), the final userdata parameter
+ *       should point to a uint64_t value.
+ *
+ * @param pkts
+ *   The previously buffered packets which could not be sent
+ * @param unsent
+ *   The number of unsent packets in the pkts array
+ * @param userdata
+ *   Pointer to a uint64_t value, which will be incremented by unsent
+ */
+void
+rte_eth_tx_buffer_count_callback(struct rte_mbuf **pkts, uint16_t unsent,
+		void *userdata);
+
 /**
  * The eth device event type for interrupt, and maybe others in the future.
  */
diff --git a/lib/librte_ether/rte_ether_version.map b/lib/librte_ether/rte_ether_version.map
index d8db24d..4ad934c 100644
--- a/lib/librte_ether/rte_ether_version.map
+++ b/lib/librte_ether/rte_ether_version.map
@@ -117,3 +117,13 @@ DPDK_2.2 {
 
 	local: *;
 };
+
+DPDK_16.04 {
+	global:
+
+	rte_eth_tx_buffer_count_callback;
+	rte_eth_tx_buffer_drop_callback;
+	rte_eth_tx_buffer_init;
+	rte_eth_tx_buffer_set_err_callback;
+
+} DPDK_2.2;
-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [dpdk-dev] [PATCH v4 2/2] examples: rework to use buffered tx
  2016-03-10 17:19     ` [dpdk-dev] [PATCH v4 " Tomasz Kulasek
  2016-03-10 17:19       ` [dpdk-dev] [PATCH v4 1/2] ethdev: add buffered tx api Tomasz Kulasek
@ 2016-03-10 17:19       ` Tomasz Kulasek
  2016-03-11 16:39       ` [dpdk-dev] [PATCH v4 0/2] add support for buffered tx to ethdev Thomas Monjalon
  2 siblings, 0 replies; 43+ messages in thread
From: Tomasz Kulasek @ 2016-03-10 17:19 UTC (permalink / raw)
  To: dev

The internal buffering of packets for TX in sample apps is no longer
needed, so this patchset also replaces this code with calls to the new
rte_eth_tx_buffer* APIs in:

* l2fwd-jobstats
* l2fwd-keepalive
* l2fwd
* l3fwd-acl
* l3fwd-power
* link_status_interrupt
* client_server_mp
* l2fwd_fork
* packet_ordering
* qos_meter

v3 changes
 - updated due to the change of callback name

v2 changes
 - rework synced with tx buffer API changes

Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 examples/l2fwd-jobstats/main.c                     |  104 +++++++----------
 examples/l2fwd-keepalive/main.c                    |  100 ++++++----------
 examples/l2fwd/main.c                              |  104 +++++++----------
 examples/l3fwd-acl/main.c                          |   92 ++++++---------
 examples/l3fwd-power/main.c                        |   89 ++++++--------
 examples/link_status_interrupt/main.c              |  107 +++++++----------
 .../client_server_mp/mp_client/client.c            |  101 +++++++++-------
 examples/multi_process/l2fwd_fork/main.c           |   97 +++++++---------
 examples/packet_ordering/main.c                    |  122 ++++++++++++++------
 examples/qos_meter/main.c                          |   61 +++-------
 10 files changed, 436 insertions(+), 541 deletions(-)

diff --git a/examples/l2fwd-jobstats/main.c b/examples/l2fwd-jobstats/main.c
index 6da60e0..d1e9bf7 100644
--- a/examples/l2fwd-jobstats/main.c
+++ b/examples/l2fwd-jobstats/main.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -41,6 +41,7 @@
 #include <rte_alarm.h>
 #include <rte_common.h>
 #include <rte_log.h>
+#include <rte_malloc.h>
 #include <rte_memory.h>
 #include <rte_memcpy.h>
 #include <rte_memzone.h>
@@ -97,18 +98,12 @@ static uint32_t l2fwd_dst_ports[RTE_MAX_ETHPORTS];
 
 static unsigned int l2fwd_rx_queue_per_lcore = 1;
 
-struct mbuf_table {
-	uint64_t next_flush_time;
-	unsigned len;
-	struct rte_mbuf *mbufs[MAX_PKT_BURST];
-};
-
 #define MAX_RX_QUEUE_PER_LCORE 16
 #define MAX_TX_QUEUE_PER_PORT 16
 struct lcore_queue_conf {
 	unsigned n_rx_port;
 	unsigned rx_port_list[MAX_RX_QUEUE_PER_LCORE];
-	struct mbuf_table tx_mbufs[RTE_MAX_ETHPORTS];
+	uint64_t next_flush_time[RTE_MAX_ETHPORTS];
 
 	struct rte_timer rx_timers[MAX_RX_QUEUE_PER_LCORE];
 	struct rte_jobstats port_fwd_jobs[MAX_RX_QUEUE_PER_LCORE];
@@ -123,6 +118,8 @@ struct lcore_queue_conf {
 } __rte_cache_aligned;
 struct lcore_queue_conf lcore_queue_conf[RTE_MAX_LCORE];
 
+struct rte_eth_dev_tx_buffer *tx_buffer[RTE_MAX_ETHPORTS];
+
 static const struct rte_eth_conf port_conf = {
 	.rxmode = {
 		.split_hdr_size = 0,
@@ -373,59 +370,14 @@ show_stats_cb(__rte_unused void *param)
 	rte_eal_alarm_set(timer_period * US_PER_S, show_stats_cb, NULL);
 }
 
-/* Send the burst of packets on an output interface */
-static void
-l2fwd_send_burst(struct lcore_queue_conf *qconf, uint8_t port)
-{
-	struct mbuf_table *m_table;
-	uint16_t ret;
-	uint16_t queueid = 0;
-	uint16_t n;
-
-	m_table = &qconf->tx_mbufs[port];
-	n = m_table->len;
-
-	m_table->next_flush_time = rte_get_timer_cycles() + drain_tsc;
-	m_table->len = 0;
-
-	ret = rte_eth_tx_burst(port, queueid, m_table->mbufs, n);
-
-	port_statistics[port].tx += ret;
-	if (unlikely(ret < n)) {
-		port_statistics[port].dropped += (n - ret);
-		do {
-			rte_pktmbuf_free(m_table->mbufs[ret]);
-		} while (++ret < n);
-	}
-}
-
-/* Enqueue packets for TX and prepare them to be sent */
-static int
-l2fwd_send_packet(struct rte_mbuf *m, uint8_t port)
-{
-	const unsigned lcore_id = rte_lcore_id();
-	struct lcore_queue_conf *qconf = &lcore_queue_conf[lcore_id];
-	struct mbuf_table *m_table = &qconf->tx_mbufs[port];
-	uint16_t len = qconf->tx_mbufs[port].len;
-
-	m_table->mbufs[len] = m;
-
-	len++;
-	m_table->len = len;
-
-	/* Enough pkts to be sent. */
-	if (unlikely(len == MAX_PKT_BURST))
-		l2fwd_send_burst(qconf, port);
-
-	return 0;
-}
-
 static void
 l2fwd_simple_forward(struct rte_mbuf *m, unsigned portid)
 {
 	struct ether_hdr *eth;
 	void *tmp;
+	int sent;
 	unsigned dst_port;
+	struct rte_eth_dev_tx_buffer *buffer;
 
 	dst_port = l2fwd_dst_ports[portid];
 	eth = rte_pktmbuf_mtod(m, struct ether_hdr *);
@@ -437,7 +389,10 @@ l2fwd_simple_forward(struct rte_mbuf *m, unsigned portid)
 	/* src addr */
 	ether_addr_copy(&l2fwd_ports_eth_addr[dst_port], &eth->s_addr);
 
-	l2fwd_send_packet(m, (uint8_t) dst_port);
+	buffer = tx_buffer[dst_port];
+	sent = rte_eth_tx_buffer(dst_port, 0, buffer, m);
+	if (sent)
+		port_statistics[dst_port].tx += sent;
 }
 
 static void
@@ -511,8 +466,10 @@ l2fwd_flush_job(__rte_unused struct rte_timer *timer, __rte_unused void *arg)
 	uint64_t now;
 	unsigned lcore_id;
 	struct lcore_queue_conf *qconf;
-	struct mbuf_table *m_table;
 	uint8_t portid;
+	unsigned i;
+	uint32_t sent;
+	struct rte_eth_dev_tx_buffer *buffer;
 
 	lcore_id = rte_lcore_id();
 	qconf = &lcore_queue_conf[lcore_id];
@@ -522,14 +479,20 @@ l2fwd_flush_job(__rte_unused struct rte_timer *timer, __rte_unused void *arg)
 	now = rte_get_timer_cycles();
 	lcore_id = rte_lcore_id();
 	qconf = &lcore_queue_conf[lcore_id];
-	for (portid = 0; portid < RTE_MAX_ETHPORTS; portid++) {
-		m_table = &qconf->tx_mbufs[portid];
-		if (m_table->len == 0 || m_table->next_flush_time <= now)
+
+	for (i = 0; i < qconf->n_rx_port; i++) {
+		portid = l2fwd_dst_ports[qconf->rx_port_list[i]];
+
+		if (qconf->next_flush_time[portid] <= now)
 			continue;
 
-		l2fwd_send_burst(qconf, portid);
-	}
+		buffer = tx_buffer[portid];
+		sent = rte_eth_tx_buffer_flush(portid, 0, buffer);
+		if (sent)
+			port_statistics[portid].tx += sent;
 
+		qconf->next_flush_time[portid] = rte_get_timer_cycles() + drain_tsc;
+	}
 
 	/* Pass target to indicate that this job is happy with the time
 	 * interval in which it was called. */
@@ -945,6 +908,23 @@ main(int argc, char **argv)
 			rte_exit(EXIT_FAILURE, "rte_eth_tx_queue_setup:err=%d, port=%u\n",
 				ret, (unsigned) portid);
 
+		/* Initialize TX buffers */
+		tx_buffer[portid] = rte_zmalloc_socket("tx_buffer",
+				RTE_ETH_TX_BUFFER_SIZE(MAX_PKT_BURST), 0,
+				rte_eth_dev_socket_id(portid));
+		if (tx_buffer[portid] == NULL)
+			rte_exit(EXIT_FAILURE, "Cannot allocate buffer for tx on port %u\n",
+					(unsigned) portid);
+
+		rte_eth_tx_buffer_init(tx_buffer[portid], MAX_PKT_BURST);
+
+		ret = rte_eth_tx_buffer_set_err_callback(tx_buffer[portid],
+				rte_eth_tx_buffer_count_callback,
+				&port_statistics[portid].dropped);
+		if (ret < 0)
+				rte_exit(EXIT_FAILURE, "Cannot set error callback for "
+						"tx buffer on port %u\n", (unsigned) portid);
+
 		/* Start device */
 		ret = rte_eth_dev_start(portid);
 		if (ret < 0)
diff --git a/examples/l2fwd-keepalive/main.c b/examples/l2fwd-keepalive/main.c
index f4d52f2..94b8677 100644
--- a/examples/l2fwd-keepalive/main.c
+++ b/examples/l2fwd-keepalive/main.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -47,6 +47,7 @@
 
 #include <rte_common.h>
 #include <rte_log.h>
+#include <rte_malloc.h>
 #include <rte_memory.h>
 #include <rte_memcpy.h>
 #include <rte_memzone.h>
@@ -97,21 +98,16 @@ static uint32_t l2fwd_dst_ports[RTE_MAX_ETHPORTS];
 
 static unsigned int l2fwd_rx_queue_per_lcore = 1;
 
-struct mbuf_table {
-	unsigned len;
-	struct rte_mbuf *m_table[MAX_PKT_BURST];
-};
-
 #define MAX_RX_QUEUE_PER_LCORE 16
 #define MAX_TX_QUEUE_PER_PORT 16
 struct lcore_queue_conf {
 	unsigned n_rx_port;
 	unsigned rx_port_list[MAX_RX_QUEUE_PER_LCORE];
-	struct mbuf_table tx_mbufs[RTE_MAX_ETHPORTS];
-
 } __rte_cache_aligned;
 struct lcore_queue_conf lcore_queue_conf[RTE_MAX_LCORE];
 
+struct rte_eth_dev_tx_buffer *tx_buffer[RTE_MAX_ETHPORTS];
+
 static const struct rte_eth_conf port_conf = {
 	.rxmode = {
 		.split_hdr_size = 0,
@@ -192,58 +188,14 @@ print_stats(__attribute__((unused)) struct rte_timer *ptr_timer,
 	printf("\n====================================================\n");
 }
 
-/* Send the burst of packets on an output interface */
-static int
-l2fwd_send_burst(struct lcore_queue_conf *qconf, unsigned n, uint8_t port)
-{
-	struct rte_mbuf **m_table;
-	unsigned ret;
-	unsigned queueid = 0;
-
-	m_table = (struct rte_mbuf **)qconf->tx_mbufs[port].m_table;
-
-	ret = rte_eth_tx_burst(port, (uint16_t) queueid, m_table, (uint16_t) n);
-	port_statistics[port].tx += ret;
-	if (unlikely(ret < n)) {
-		port_statistics[port].dropped += (n - ret);
-		do {
-			rte_pktmbuf_free(m_table[ret]);
-		} while (++ret < n);
-	}
-
-	return 0;
-}
-
-/* Enqueue packets for TX and prepare them to be sent */
-static int
-l2fwd_send_packet(struct rte_mbuf *m, uint8_t port)
-{
-	unsigned lcore_id, len;
-	struct lcore_queue_conf *qconf;
-
-	lcore_id = rte_lcore_id();
-
-	qconf = &lcore_queue_conf[lcore_id];
-	len = qconf->tx_mbufs[port].len;
-	qconf->tx_mbufs[port].m_table[len] = m;
-	len++;
-
-	/* enough pkts to be sent */
-	if (unlikely(len == MAX_PKT_BURST)) {
-		l2fwd_send_burst(qconf, MAX_PKT_BURST, port);
-		len = 0;
-	}
-
-	qconf->tx_mbufs[port].len = len;
-	return 0;
-}
-
 static void
 l2fwd_simple_forward(struct rte_mbuf *m, unsigned portid)
 {
 	struct ether_hdr *eth;
 	void *tmp;
+	int sent;
 	unsigned dst_port;
+	struct rte_eth_dev_tx_buffer *buffer;
 
 	dst_port = l2fwd_dst_ports[portid];
 	eth = rte_pktmbuf_mtod(m, struct ether_hdr *);
@@ -255,7 +207,10 @@ l2fwd_simple_forward(struct rte_mbuf *m, unsigned portid)
 	/* src addr */
 	ether_addr_copy(&l2fwd_ports_eth_addr[dst_port], &eth->s_addr);
 
-	l2fwd_send_packet(m, (uint8_t) dst_port);
+	buffer = tx_buffer[dst_port];
+	sent = rte_eth_tx_buffer(dst_port, 0, buffer, m);
+	if (sent)
+		port_statistics[dst_port].tx += sent;
 }
 
 /* main processing loop */
@@ -264,12 +219,14 @@ l2fwd_main_loop(void)
 {
 	struct rte_mbuf *pkts_burst[MAX_PKT_BURST];
 	struct rte_mbuf *m;
+	int sent;
 	unsigned lcore_id;
 	uint64_t prev_tsc, diff_tsc, cur_tsc;
 	unsigned i, j, portid, nb_rx;
 	struct lcore_queue_conf *qconf;
 	const uint64_t drain_tsc = (rte_get_tsc_hz() + US_PER_S - 1)
 		/ US_PER_S * BURST_TX_DRAIN_US;
+	struct rte_eth_dev_tx_buffer *buffer;
 
 	prev_tsc = 0;
 
@@ -312,13 +269,15 @@ l2fwd_main_loop(void)
 		diff_tsc = cur_tsc - prev_tsc;
 		if (unlikely(diff_tsc > drain_tsc)) {
 
-			for (portid = 0; portid < RTE_MAX_ETHPORTS; portid++) {
-				if (qconf->tx_mbufs[portid].len == 0)
-					continue;
-				l2fwd_send_burst(&lcore_queue_conf[lcore_id],
-						 qconf->tx_mbufs[portid].len,
-						 (uint8_t) portid);
-				qconf->tx_mbufs[portid].len = 0;
+			for (i = 0; i < qconf->n_rx_port; i++) {
+
+				portid = l2fwd_dst_ports[qconf->rx_port_list[i]];
+				buffer = tx_buffer[portid];
+
+				sent = rte_eth_tx_buffer_flush(portid, 0, buffer);
+				if (sent)
+					port_statistics[portid].tx += sent;
+
 			}
 
 			prev_tsc = cur_tsc;
@@ -713,6 +672,23 @@ main(int argc, char **argv)
 				"rte_eth_tx_queue_setup:err=%d, port=%u\n",
 				ret, (unsigned) portid);
 
+		/* Initialize TX buffers */
+		tx_buffer[portid] = rte_zmalloc_socket("tx_buffer",
+				RTE_ETH_TX_BUFFER_SIZE(MAX_PKT_BURST), 0,
+				rte_eth_dev_socket_id(portid));
+		if (tx_buffer[portid] == NULL)
+			rte_exit(EXIT_FAILURE, "Cannot allocate buffer for tx on port %u\n",
+					(unsigned) portid);
+
+		rte_eth_tx_buffer_init(tx_buffer[portid], MAX_PKT_BURST);
+
+		ret = rte_eth_tx_buffer_set_err_callback(tx_buffer[portid],
+				rte_eth_tx_buffer_count_callback,
+				&port_statistics[portid].dropped);
+		if (ret < 0)
+				rte_exit(EXIT_FAILURE, "Cannot set error callback for "
+						"tx buffer on port %u\n", (unsigned) portid);
+
 		/* Start device */
 		ret = rte_eth_dev_start(portid);
 		if (ret < 0)
diff --git a/examples/l2fwd/main.c b/examples/l2fwd/main.c
index f35d8a1..e175681 100644
--- a/examples/l2fwd/main.c
+++ b/examples/l2fwd/main.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -49,6 +49,7 @@
 
 #include <rte_common.h>
 #include <rte_log.h>
+#include <rte_malloc.h>
 #include <rte_memory.h>
 #include <rte_memcpy.h>
 #include <rte_memzone.h>
@@ -99,21 +100,16 @@ static uint32_t l2fwd_dst_ports[RTE_MAX_ETHPORTS];
 
 static unsigned int l2fwd_rx_queue_per_lcore = 1;
 
-struct mbuf_table {
-	unsigned len;
-	struct rte_mbuf *m_table[MAX_PKT_BURST];
-};
-
 #define MAX_RX_QUEUE_PER_LCORE 16
 #define MAX_TX_QUEUE_PER_PORT 16
 struct lcore_queue_conf {
 	unsigned n_rx_port;
 	unsigned rx_port_list[MAX_RX_QUEUE_PER_LCORE];
-	struct mbuf_table tx_mbufs[RTE_MAX_ETHPORTS];
-
 } __rte_cache_aligned;
 struct lcore_queue_conf lcore_queue_conf[RTE_MAX_LCORE];
 
+static struct rte_eth_dev_tx_buffer *tx_buffer[RTE_MAX_ETHPORTS];
+
 static const struct rte_eth_conf port_conf = {
 	.rxmode = {
 		.split_hdr_size = 0,
@@ -189,58 +185,14 @@ print_stats(void)
 	printf("\n====================================================\n");
 }
 
-/* Send the burst of packets on an output interface */
-static int
-l2fwd_send_burst(struct lcore_queue_conf *qconf, unsigned n, uint8_t port)
-{
-	struct rte_mbuf **m_table;
-	unsigned ret;
-	unsigned queueid =0;
-
-	m_table = (struct rte_mbuf **)qconf->tx_mbufs[port].m_table;
-
-	ret = rte_eth_tx_burst(port, (uint16_t) queueid, m_table, (uint16_t) n);
-	port_statistics[port].tx += ret;
-	if (unlikely(ret < n)) {
-		port_statistics[port].dropped += (n - ret);
-		do {
-			rte_pktmbuf_free(m_table[ret]);
-		} while (++ret < n);
-	}
-
-	return 0;
-}
-
-/* Enqueue packets for TX and prepare them to be sent */
-static int
-l2fwd_send_packet(struct rte_mbuf *m, uint8_t port)
-{
-	unsigned lcore_id, len;
-	struct lcore_queue_conf *qconf;
-
-	lcore_id = rte_lcore_id();
-
-	qconf = &lcore_queue_conf[lcore_id];
-	len = qconf->tx_mbufs[port].len;
-	qconf->tx_mbufs[port].m_table[len] = m;
-	len++;
-
-	/* enough pkts to be sent */
-	if (unlikely(len == MAX_PKT_BURST)) {
-		l2fwd_send_burst(qconf, MAX_PKT_BURST, port);
-		len = 0;
-	}
-
-	qconf->tx_mbufs[port].len = len;
-	return 0;
-}
-
 static void
 l2fwd_simple_forward(struct rte_mbuf *m, unsigned portid)
 {
 	struct ether_hdr *eth;
 	void *tmp;
 	unsigned dst_port;
+	int sent;
+	struct rte_eth_dev_tx_buffer *buffer;
 
 	dst_port = l2fwd_dst_ports[portid];
 	eth = rte_pktmbuf_mtod(m, struct ether_hdr *);
@@ -252,7 +204,10 @@ l2fwd_simple_forward(struct rte_mbuf *m, unsigned portid)
 	/* src addr */
 	ether_addr_copy(&l2fwd_ports_eth_addr[dst_port], &eth->s_addr);
 
-	l2fwd_send_packet(m, (uint8_t) dst_port);
+	buffer = tx_buffer[dst_port];
+	sent = rte_eth_tx_buffer(dst_port, 0, buffer, m);
+	if (sent)
+		port_statistics[dst_port].tx += sent;
 }
 
 /* main processing loop */
@@ -261,11 +216,14 @@ l2fwd_main_loop(void)
 {
 	struct rte_mbuf *pkts_burst[MAX_PKT_BURST];
 	struct rte_mbuf *m;
+	int sent;
 	unsigned lcore_id;
 	uint64_t prev_tsc, diff_tsc, cur_tsc, timer_tsc;
 	unsigned i, j, portid, nb_rx;
 	struct lcore_queue_conf *qconf;
-	const uint64_t drain_tsc = (rte_get_tsc_hz() + US_PER_S - 1) / US_PER_S * BURST_TX_DRAIN_US;
+	const uint64_t drain_tsc = (rte_get_tsc_hz() + US_PER_S - 1) / US_PER_S *
+			BURST_TX_DRAIN_US;
+	struct rte_eth_dev_tx_buffer *buffer;
 
 	prev_tsc = 0;
 	timer_tsc = 0;
@@ -285,6 +243,7 @@ l2fwd_main_loop(void)
 		portid = qconf->rx_port_list[i];
 		RTE_LOG(INFO, L2FWD, " -- lcoreid=%u portid=%u\n", lcore_id,
 			portid);
+
 	}
 
 	while (!force_quit) {
@@ -297,13 +256,15 @@ l2fwd_main_loop(void)
 		diff_tsc = cur_tsc - prev_tsc;
 		if (unlikely(diff_tsc > drain_tsc)) {
 
-			for (portid = 0; portid < RTE_MAX_ETHPORTS; portid++) {
-				if (qconf->tx_mbufs[portid].len == 0)
-					continue;
-				l2fwd_send_burst(&lcore_queue_conf[lcore_id],
-						 qconf->tx_mbufs[portid].len,
-						 (uint8_t) portid);
-				qconf->tx_mbufs[portid].len = 0;
+			for (i = 0; i < qconf->n_rx_port; i++) {
+
+				portid = l2fwd_dst_ports[qconf->rx_port_list[i]];
+				buffer = tx_buffer[portid];
+
+				sent = rte_eth_tx_buffer_flush(portid, 0, buffer);
+				if (sent)
+					port_statistics[portid].tx += sent;
+
 			}
 
 			/* if timer is enabled */
@@ -688,6 +649,23 @@ main(int argc, char **argv)
 			rte_exit(EXIT_FAILURE, "rte_eth_tx_queue_setup:err=%d, port=%u\n",
 				ret, (unsigned) portid);
 
+		/* Initialize TX buffers */
+		tx_buffer[portid] = rte_zmalloc_socket("tx_buffer",
+				RTE_ETH_TX_BUFFER_SIZE(MAX_PKT_BURST), 0,
+				rte_eth_dev_socket_id(portid));
+		if (tx_buffer[portid] == NULL)
+			rte_exit(EXIT_FAILURE, "Cannot allocate buffer for tx on port %u\n",
+					(unsigned) portid);
+
+		rte_eth_tx_buffer_init(tx_buffer[portid], MAX_PKT_BURST);
+
+		ret = rte_eth_tx_buffer_set_err_callback(tx_buffer[portid],
+				rte_eth_tx_buffer_count_callback,
+				&port_statistics[portid].dropped);
+		if (ret < 0)
+				rte_exit(EXIT_FAILURE, "Cannot set error callback for "
+						"tx buffer on port %u\n", (unsigned) portid);
+
 		/* Start device */
 		ret = rte_eth_dev_start(portid);
 		if (ret < 0)
diff --git a/examples/l3fwd-acl/main.c b/examples/l3fwd-acl/main.c
index f676d14..3a895b7 100644
--- a/examples/l3fwd-acl/main.c
+++ b/examples/l3fwd-acl/main.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -119,11 +119,6 @@ static uint32_t enabled_port_mask;
 static int promiscuous_on; /**< Ports set in promiscuous mode off by default. */
 static int numa_on = 1; /**< NUMA is enabled by default. */
 
-struct mbuf_table {
-	uint16_t len;
-	struct rte_mbuf *m_table[MAX_PKT_BURST];
-};
-
 struct lcore_rx_queue {
 	uint8_t port_id;
 	uint8_t queue_id;
@@ -187,7 +182,7 @@ static struct rte_mempool *pktmbuf_pool[NB_SOCKETS];
 static inline int
 is_valid_ipv4_pkt(struct ipv4_hdr *pkt, uint32_t link_len);
 #endif
-static inline int
+static inline void
 send_single_packet(struct rte_mbuf *m, uint8_t port);
 
 #define MAX_ACL_RULE_NUM	100000
@@ -1291,56 +1286,26 @@ app_acl_init(void)
 struct lcore_conf {
 	uint16_t n_rx_queue;
 	struct lcore_rx_queue rx_queue_list[MAX_RX_QUEUE_PER_LCORE];
+	uint16_t n_tx_port;
+	uint16_t tx_port_id[RTE_MAX_ETHPORTS];
 	uint16_t tx_queue_id[RTE_MAX_ETHPORTS];
-	struct mbuf_table tx_mbufs[RTE_MAX_ETHPORTS];
+	struct rte_eth_dev_tx_buffer *tx_buffer[RTE_MAX_ETHPORTS];
 } __rte_cache_aligned;
 
 static struct lcore_conf lcore_conf[RTE_MAX_LCORE];
 
-/* Send burst of packets on an output interface */
-static inline int
-send_burst(struct lcore_conf *qconf, uint16_t n, uint8_t port)
-{
-	struct rte_mbuf **m_table;
-	int ret;
-	uint16_t queueid;
-
-	queueid = qconf->tx_queue_id[port];
-	m_table = (struct rte_mbuf **)qconf->tx_mbufs[port].m_table;
-
-	ret = rte_eth_tx_burst(port, queueid, m_table, n);
-	if (unlikely(ret < n)) {
-		do {
-			rte_pktmbuf_free(m_table[ret]);
-		} while (++ret < n);
-	}
-
-	return 0;
-}
-
 /* Enqueue a single packet, and send burst if queue is filled */
-static inline int
+static inline void
 send_single_packet(struct rte_mbuf *m, uint8_t port)
 {
 	uint32_t lcore_id;
-	uint16_t len;
 	struct lcore_conf *qconf;
 
 	lcore_id = rte_lcore_id();
 
 	qconf = &lcore_conf[lcore_id];
-	len = qconf->tx_mbufs[port].len;
-	qconf->tx_mbufs[port].m_table[len] = m;
-	len++;
-
-	/* enough pkts to be sent */
-	if (unlikely(len == MAX_PKT_BURST)) {
-		send_burst(qconf, MAX_PKT_BURST, port);
-		len = 0;
-	}
-
-	qconf->tx_mbufs[port].len = len;
-	return 0;
+	rte_eth_tx_buffer(port, qconf->tx_queue_id[port],
+			qconf->tx_buffer[port], m);
 }
 
 #ifdef DO_RFC_1812_CHECKS
@@ -1428,20 +1393,12 @@ main_loop(__attribute__((unused)) void *dummy)
 		 */
 		diff_tsc = cur_tsc - prev_tsc;
 		if (unlikely(diff_tsc > drain_tsc)) {
-
-			/*
-			 * This could be optimized (use queueid instead of
-			 * portid), but it is not called so often
-			 */
-			for (portid = 0; portid < RTE_MAX_ETHPORTS; portid++) {
-				if (qconf->tx_mbufs[portid].len == 0)
-					continue;
-				send_burst(&lcore_conf[lcore_id],
-					qconf->tx_mbufs[portid].len,
-					portid);
-				qconf->tx_mbufs[portid].len = 0;
+			for (i = 0; i < qconf->n_tx_port; ++i) {
+				portid = qconf->tx_port_id[i];
+				rte_eth_tx_buffer_flush(portid,
+						qconf->tx_queue_id[portid],
+						qconf->tx_buffer[portid]);
 			}
-
 			prev_tsc = cur_tsc;
 		}
 
@@ -1936,6 +1893,7 @@ main(int argc, char **argv)
 	unsigned lcore_id;
 	uint32_t n_tx_queue, nb_lcores;
 	uint8_t portid, nb_rx_queue, queue, socketid;
+	uint8_t nb_tx_port;
 
 	/* init EAL */
 	ret = rte_eal_init(argc, argv);
@@ -1968,6 +1926,7 @@ main(int argc, char **argv)
 		rte_exit(EXIT_FAILURE, "app_acl_init failed\n");
 
 	nb_lcores = rte_lcore_count();
+	nb_tx_port = 0;
 
 	/* initialize all ports */
 	for (portid = 0; portid < nb_ports; portid++) {
@@ -2003,6 +1962,22 @@ main(int argc, char **argv)
 		if (ret < 0)
 			rte_exit(EXIT_FAILURE, "init_mem failed\n");
 
+		for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
+			if (rte_lcore_is_enabled(lcore_id) == 0)
+				continue;
+
+			/* Initialize TX buffers */
+			qconf = &lcore_conf[lcore_id];
+			qconf->tx_buffer[portid] = rte_zmalloc_socket("tx_buffer",
+					RTE_ETH_TX_BUFFER_SIZE(MAX_PKT_BURST), 0,
+					rte_eth_dev_socket_id(portid));
+			if (qconf->tx_buffer[portid] == NULL)
+				rte_exit(EXIT_FAILURE, "Can't allocate tx buffer for port %u\n",
+						(unsigned) portid);
+
+			rte_eth_tx_buffer_init(qconf->tx_buffer[portid], MAX_PKT_BURST);
+		}
+
 		/* init one TX queue per couple (lcore,port) */
 		queueid = 0;
 		for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
@@ -2032,8 +2007,13 @@ main(int argc, char **argv)
 			qconf = &lcore_conf[lcore_id];
 			qconf->tx_queue_id[portid] = queueid;
 			queueid++;
+
+			qconf->n_tx_port = nb_tx_port;
+			qconf->tx_port_id[qconf->n_tx_port] = portid;
 		}
 		printf("\n");
+
+		nb_tx_port++;
 	}
 
 	for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
diff --git a/examples/l3fwd-power/main.c b/examples/l3fwd-power/main.c
index f8a2f1b..d4bb7a3 100644
--- a/examples/l3fwd-power/main.c
+++ b/examples/l3fwd-power/main.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -47,6 +47,7 @@
 #include <rte_common.h>
 #include <rte_byteorder.h>
 #include <rte_log.h>
+#include <rte_malloc.h>
 #include <rte_memory.h>
 #include <rte_memcpy.h>
 #include <rte_memzone.h>
@@ -173,11 +174,6 @@ enum freq_scale_hint_t
 	FREQ_HIGHEST  =       2
 };
 
-struct mbuf_table {
-	uint16_t len;
-	struct rte_mbuf *m_table[MAX_PKT_BURST];
-};
-
 struct lcore_rx_queue {
 	uint8_t port_id;
 	uint8_t queue_id;
@@ -347,8 +343,10 @@ static lookup_struct_t *ipv4_l3fwd_lookup_struct[NB_SOCKETS];
 struct lcore_conf {
 	uint16_t n_rx_queue;
 	struct lcore_rx_queue rx_queue_list[MAX_RX_QUEUE_PER_LCORE];
+	uint16_t n_tx_port;
+	uint16_t tx_port_id[RTE_MAX_ETHPORTS];
 	uint16_t tx_queue_id[RTE_MAX_ETHPORTS];
-	struct mbuf_table tx_mbufs[RTE_MAX_ETHPORTS];
+	struct rte_eth_dev_tx_buffer *tx_buffer[RTE_MAX_ETHPORTS];
 	lookup_struct_t * ipv4_lookup_struct;
 	lookup_struct_t * ipv6_lookup_struct;
 } __rte_cache_aligned;
@@ -442,49 +440,19 @@ power_timer_cb(__attribute__((unused)) struct rte_timer *tim,
 	stats[lcore_id].sleep_time = 0;
 }
 
-/* Send burst of packets on an output interface */
-static inline int
-send_burst(struct lcore_conf *qconf, uint16_t n, uint8_t port)
-{
-	struct rte_mbuf **m_table;
-	int ret;
-	uint16_t queueid;
-
-	queueid = qconf->tx_queue_id[port];
-	m_table = (struct rte_mbuf **)qconf->tx_mbufs[port].m_table;
-
-	ret = rte_eth_tx_burst(port, queueid, m_table, n);
-	if (unlikely(ret < n)) {
-		do {
-			rte_pktmbuf_free(m_table[ret]);
-		} while (++ret < n);
-	}
-
-	return 0;
-}
-
 /* Enqueue a single packet, and send burst if queue is filled */
 static inline int
 send_single_packet(struct rte_mbuf *m, uint8_t port)
 {
 	uint32_t lcore_id;
-	uint16_t len;
 	struct lcore_conf *qconf;
 
 	lcore_id = rte_lcore_id();
-
 	qconf = &lcore_conf[lcore_id];
-	len = qconf->tx_mbufs[port].len;
-	qconf->tx_mbufs[port].m_table[len] = m;
-	len++;
-
-	/* enough pkts to be sent */
-	if (unlikely(len == MAX_PKT_BURST)) {
-		send_burst(qconf, MAX_PKT_BURST, port);
-		len = 0;
-	}
 
-	qconf->tx_mbufs[port].len = len;
+	rte_eth_tx_buffer(port, qconf->tx_queue_id[port],
+			qconf->tx_buffer[port], m);
+
 	return 0;
 }
 
@@ -905,20 +873,12 @@ main_loop(__attribute__((unused)) void *dummy)
 		 */
 		diff_tsc = cur_tsc - prev_tsc;
 		if (unlikely(diff_tsc > drain_tsc)) {
-
-			/*
-			 * This could be optimized (use queueid instead of
-			 * portid), but it is not called so often
-			 */
-			for (portid = 0; portid < RTE_MAX_ETHPORTS; portid++) {
-				if (qconf->tx_mbufs[portid].len == 0)
-					continue;
-				send_burst(&lcore_conf[lcore_id],
-					qconf->tx_mbufs[portid].len,
-					portid);
-				qconf->tx_mbufs[portid].len = 0;
+			for (i = 0; i < qconf->n_tx_port; ++i) {
+				portid = qconf->tx_port_id[i];
+				rte_eth_tx_buffer_flush(portid,
+						qconf->tx_queue_id[portid],
+						qconf->tx_buffer[portid]);
 			}
-
 			prev_tsc = cur_tsc;
 		}
 
@@ -1585,6 +1545,7 @@ main(int argc, char **argv)
 	uint32_t n_tx_queue, nb_lcores;
 	uint32_t dev_rxq_num, dev_txq_num;
 	uint8_t portid, nb_rx_queue, queue, socketid;
+	uint8_t nb_tx_port;
 
 	/* catch SIGINT and restore cpufreq governor to ondemand */
 	signal(SIGINT, signal_exit_now);
@@ -1620,6 +1581,7 @@ main(int argc, char **argv)
 		rte_exit(EXIT_FAILURE, "check_port_config failed\n");
 
 	nb_lcores = rte_lcore_count();
+	nb_tx_port = 0;
 
 	/* initialize all ports */
 	for (portid = 0; portid < nb_ports; portid++) {
@@ -1663,6 +1625,22 @@ main(int argc, char **argv)
 		if (ret < 0)
 			rte_exit(EXIT_FAILURE, "init_mem failed\n");
 
+		for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
+			if (rte_lcore_is_enabled(lcore_id) == 0)
+				continue;
+
+			/* Initialize TX buffers */
+			qconf = &lcore_conf[lcore_id];
+			qconf->tx_buffer[portid] = rte_zmalloc_socket("tx_buffer",
+				RTE_ETH_TX_BUFFER_SIZE(MAX_PKT_BURST), 0,
+				rte_eth_dev_socket_id(portid));
+			if (qconf->tx_buffer[portid] == NULL)
+				rte_exit(EXIT_FAILURE, "Can't allocate tx buffer for port %u\n",
+						(unsigned) portid);
+
+			rte_eth_tx_buffer_init(qconf->tx_buffer[portid], MAX_PKT_BURST);
+		}
+
 		/* init one TX queue per couple (lcore,port) */
 		queueid = 0;
 		for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
@@ -1695,8 +1673,13 @@ main(int argc, char **argv)
 			qconf = &lcore_conf[lcore_id];
 			qconf->tx_queue_id[portid] = queueid;
 			queueid++;
+
+			qconf->n_tx_port = nb_tx_port;
+			qconf->tx_port_id[qconf->n_tx_port] = portid;
 		}
 		printf("\n");
+
+		nb_tx_port++;
 	}
 
 	for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
diff --git a/examples/link_status_interrupt/main.c b/examples/link_status_interrupt/main.c
index c57a08a..cbc29bc 100644
--- a/examples/link_status_interrupt/main.c
+++ b/examples/link_status_interrupt/main.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -48,6 +48,7 @@
 
 #include <rte_common.h>
 #include <rte_log.h>
+#include <rte_malloc.h>
 #include <rte_memory.h>
 #include <rte_memcpy.h>
 #include <rte_memzone.h>
@@ -97,10 +98,6 @@ static unsigned int lsi_rx_queue_per_lcore = 1;
 static unsigned lsi_dst_ports[RTE_MAX_ETHPORTS] = {0};
 
 #define MAX_PKT_BURST 32
-struct mbuf_table {
-	unsigned len;
-	struct rte_mbuf *m_table[MAX_PKT_BURST];
-};
 
 #define MAX_RX_QUEUE_PER_LCORE 16
 #define MAX_TX_QUEUE_PER_PORT 16
@@ -108,11 +105,11 @@ struct lcore_queue_conf {
 	unsigned n_rx_port;
 	unsigned rx_port_list[MAX_RX_QUEUE_PER_LCORE];
 	unsigned tx_queue_id;
-	struct mbuf_table tx_mbufs[RTE_MAX_ETHPORTS];
-
 } __rte_cache_aligned;
 struct lcore_queue_conf lcore_queue_conf[RTE_MAX_LCORE];
 
+struct rte_eth_dev_tx_buffer *tx_buffer[RTE_MAX_ETHPORTS];
+
 static const struct rte_eth_conf port_conf = {
 	.rxmode = {
 		.split_hdr_size = 0,
@@ -202,59 +199,14 @@ print_stats(void)
 	printf("\n====================================================\n");
 }
 
-/* Send the packet on an output interface */
-static int
-lsi_send_burst(struct lcore_queue_conf *qconf, unsigned n, uint8_t port)
-{
-	struct rte_mbuf **m_table;
-	unsigned ret;
-	unsigned queueid;
-
-	queueid = (uint16_t) qconf->tx_queue_id;
-	m_table = (struct rte_mbuf **)qconf->tx_mbufs[port].m_table;
-
-	ret = rte_eth_tx_burst(port, (uint16_t) queueid, m_table, (uint16_t) n);
-	port_statistics[port].tx += ret;
-	if (unlikely(ret < n)) {
-		port_statistics[port].dropped += (n - ret);
-		do {
-			rte_pktmbuf_free(m_table[ret]);
-		} while (++ret < n);
-	}
-
-	return 0;
-}
-
-/* Send the packet on an output interface */
-static int
-lsi_send_packet(struct rte_mbuf *m, uint8_t port)
-{
-	unsigned lcore_id, len;
-	struct lcore_queue_conf *qconf;
-
-	lcore_id = rte_lcore_id();
-
-	qconf = &lcore_queue_conf[lcore_id];
-	len = qconf->tx_mbufs[port].len;
-	qconf->tx_mbufs[port].m_table[len] = m;
-	len++;
-
-	/* enough pkts to be sent */
-	if (unlikely(len == MAX_PKT_BURST)) {
-		lsi_send_burst(qconf, MAX_PKT_BURST, port);
-		len = 0;
-	}
-
-	qconf->tx_mbufs[port].len = len;
-	return 0;
-}
-
 static void
 lsi_simple_forward(struct rte_mbuf *m, unsigned portid)
 {
 	struct ether_hdr *eth;
 	void *tmp;
 	unsigned dst_port = lsi_dst_ports[portid];
+	int sent;
+	struct rte_eth_dev_tx_buffer *buffer;
 
 	eth = rte_pktmbuf_mtod(m, struct ether_hdr *);
 
@@ -265,7 +217,10 @@ lsi_simple_forward(struct rte_mbuf *m, unsigned portid)
 	/* src addr */
 	ether_addr_copy(&lsi_ports_eth_addr[dst_port], &eth->s_addr);
 
-	lsi_send_packet(m, (uint8_t) dst_port);
+	buffer = tx_buffer[dst_port];
+	sent = rte_eth_tx_buffer(dst_port, 0, buffer, m);
+	if (sent)
+		port_statistics[dst_port].tx += sent;
 }
 
 /* main processing loop */
@@ -275,10 +230,13 @@ lsi_main_loop(void)
 	struct rte_mbuf *pkts_burst[MAX_PKT_BURST];
 	struct rte_mbuf *m;
 	unsigned lcore_id;
+	unsigned sent;
 	uint64_t prev_tsc, diff_tsc, cur_tsc, timer_tsc;
 	unsigned i, j, portid, nb_rx;
 	struct lcore_queue_conf *qconf;
-	const uint64_t drain_tsc = (rte_get_tsc_hz() + US_PER_S - 1) / US_PER_S * BURST_TX_DRAIN_US;
+	const uint64_t drain_tsc = (rte_get_tsc_hz() + US_PER_S - 1) / US_PER_S *
+			BURST_TX_DRAIN_US;
+	struct rte_eth_dev_tx_buffer *buffer;
 
 	prev_tsc = 0;
 	timer_tsc = 0;
@@ -310,15 +268,15 @@ lsi_main_loop(void)
 		diff_tsc = cur_tsc - prev_tsc;
 		if (unlikely(diff_tsc > drain_tsc)) {
 
-			/* this could be optimized (use queueid instead of
-			 * portid), but it is not called so often */
-			for (portid = 0; portid < RTE_MAX_ETHPORTS; portid++) {
-				if (qconf->tx_mbufs[portid].len == 0)
-					continue;
-				lsi_send_burst(&lcore_queue_conf[lcore_id],
-						 qconf->tx_mbufs[portid].len,
-						 (uint8_t) portid);
-				qconf->tx_mbufs[portid].len = 0;
+			for (i = 0; i < qconf->n_rx_port; i++) {
+
+				portid = lsi_dst_ports[qconf->rx_port_list[i]];
+				buffer = tx_buffer[portid];
+
+				sent = rte_eth_tx_buffer_flush(portid, 0, buffer);
+				if (sent)
+					port_statistics[portid].tx += sent;
+
 			}
 
 			/* if timer is enabled */
@@ -722,6 +680,23 @@ main(int argc, char **argv)
 			rte_exit(EXIT_FAILURE, "rte_eth_tx_queue_setup: err=%d,port=%u\n",
 				  ret, (unsigned) portid);
 
+		/* Initialize TX buffers */
+		tx_buffer[portid] = rte_zmalloc_socket("tx_buffer",
+				RTE_ETH_TX_BUFFER_SIZE(MAX_PKT_BURST), 0,
+				rte_eth_dev_socket_id(portid));
+		if (tx_buffer[portid] == NULL)
+			rte_exit(EXIT_FAILURE, "Cannot allocate buffer for tx on port %u\n",
+					(unsigned) portid);
+
+		rte_eth_tx_buffer_init(tx_buffer[portid], MAX_PKT_BURST);
+
+		ret = rte_eth_tx_buffer_set_err_callback(tx_buffer[portid],
+				rte_eth_tx_buffer_count_callback,
+				&port_statistics[portid].dropped);
+		if (ret < 0)
+			rte_exit(EXIT_FAILURE, "Cannot set error callback for "
+					"tx buffer on port %u\n", (unsigned) portid);
+
 		/* Start device */
 		ret = rte_eth_dev_start(portid);
 		if (ret < 0)
@@ -729,6 +704,8 @@ main(int argc, char **argv)
 				  ret, (unsigned) portid);
 		printf("done:\n");
 
+		rte_eth_promiscuous_enable(portid);
+
 		printf("Port %u, MAC address: %02X:%02X:%02X:%02X:%02X:%02X\n\n",
 				(unsigned) portid,
 				lsi_ports_eth_addr[portid].addr_bytes[0],
diff --git a/examples/multi_process/client_server_mp/mp_client/client.c b/examples/multi_process/client_server_mp/mp_client/client.c
index bf049a4..d4f9ca3 100644
--- a/examples/multi_process/client_server_mp/mp_client/client.c
+++ b/examples/multi_process/client_server_mp/mp_client/client.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -42,6 +42,7 @@
 #include <string.h>
 
 #include <rte_common.h>
+#include <rte_malloc.h>
 #include <rte_memory.h>
 #include <rte_memzone.h>
 #include <rte_eal.h>
@@ -72,17 +73,13 @@
  * queue to write to. */
 static uint8_t client_id = 0;
 
-struct mbuf_queue {
 #define MBQ_CAPACITY 32
-	struct rte_mbuf *bufs[MBQ_CAPACITY];
-	uint16_t top;
-};
 
 /* maps input ports to output ports for packets */
 static uint8_t output_ports[RTE_MAX_ETHPORTS];
 
 /* buffers up a set of packets that are ready to send */
-static struct mbuf_queue output_bufs[RTE_MAX_ETHPORTS];
+struct rte_eth_dev_tx_buffer *tx_buffer[RTE_MAX_ETHPORTS];
 
 /* shared data from server. We update statistics here */
 static volatile struct tx_stats *tx_stats;
@@ -149,11 +146,51 @@ parse_app_args(int argc, char *argv[])
 }
 
 /*
+ * Tx buffer error callback
+ */
+static void
+flush_tx_error_callback(struct rte_mbuf **unsent, uint16_t count,
+		void *userdata) {
+	int i;
+	uint8_t port_id = (uintptr_t)userdata;
+
+	tx_stats->tx_drop[port_id] += count;
+
+	/* free the mbufs which failed from transmit */
+	for (i = 0; i < count; i++)
+		rte_pktmbuf_free(unsent[i]);
+
+}
+
+static void
+configure_tx_buffer(uint8_t port_id, uint16_t size)
+{
+	int ret;
+
+	/* Initialize TX buffers */
+	tx_buffer[port_id] = rte_zmalloc_socket("tx_buffer",
+			RTE_ETH_TX_BUFFER_SIZE(size), 0,
+			rte_eth_dev_socket_id(port_id));
+	if (tx_buffer[port_id] == NULL)
+		rte_exit(EXIT_FAILURE, "Cannot allocate buffer for tx on port %u\n",
+				(unsigned) port_id);
+
+	rte_eth_tx_buffer_init(tx_buffer[port_id], size);
+
+	ret = rte_eth_tx_buffer_set_err_callback(tx_buffer[port_id],
+			flush_tx_error_callback, (void *)(uintptr_t)port_id);
+	if (ret < 0)
+		rte_exit(EXIT_FAILURE, "Cannot set error callback for "
+				"tx buffer on port %u\n", (unsigned) port_id);
+}
+
+/*
  * set up output ports so that all traffic on port gets sent out
  * its paired port. Index using actual port numbers since that is
  * what comes in the mbuf structure.
  */
-static void configure_output_ports(const struct port_info *ports)
+static void
+configure_output_ports(const struct port_info *ports)
 {
 	int i;
 	if (ports->num_ports > RTE_MAX_ETHPORTS)
@@ -164,41 +201,11 @@ static void configure_output_ports(const struct port_info *ports)
 		uint8_t p2 = ports->id[i+1];
 		output_ports[p1] = p2;
 		output_ports[p2] = p1;
-	}
-}
 
+		configure_tx_buffer(p1, MBQ_CAPACITY);
+		configure_tx_buffer(p2, MBQ_CAPACITY);
 
-static inline void
-send_packets(uint8_t port)
-{
-	uint16_t i, sent;
-	struct mbuf_queue *mbq = &output_bufs[port];
-
-	if (unlikely(mbq->top == 0))
-		return;
-
-	sent = rte_eth_tx_burst(port, client_id, mbq->bufs, mbq->top);
-	if (unlikely(sent < mbq->top)){
-		for (i = sent; i < mbq->top; i++)
-			rte_pktmbuf_free(mbq->bufs[i]);
-		tx_stats->tx_drop[port] += (mbq->top - sent);
 	}
-	tx_stats->tx[port] += sent;
-	mbq->top = 0;
-}
-
-/*
- * Enqueue a packet to be sent on a particular port, but
- * don't send it yet. Only when the buffer is full.
- */
-static inline void
-enqueue_packet(struct rte_mbuf *buf, uint8_t port)
-{
-	struct mbuf_queue *mbq = &output_bufs[port];
-	mbq->bufs[mbq->top++] = buf;
-
-	if (mbq->top == MBQ_CAPACITY)
-		send_packets(port);
 }
 
 /*
@@ -209,10 +216,15 @@ enqueue_packet(struct rte_mbuf *buf, uint8_t port)
 static void
 handle_packet(struct rte_mbuf *buf)
 {
+	int sent;
 	const uint8_t in_port = buf->port;
 	const uint8_t out_port = output_ports[in_port];
+	struct rte_eth_dev_tx_buffer *buffer = tx_buffer[out_port];
+
+	sent = rte_eth_tx_buffer(out_port, client_id, buffer, buf);
+	if (sent)
+		tx_stats->tx[out_port] += sent;
 
-	enqueue_packet(buf, out_port);
 }
 
 /*
@@ -229,6 +241,7 @@ main(int argc, char *argv[])
 	int need_flush = 0; /* indicates whether we have unsent packets */
 	int retval;
 	void *pkts[PKT_READ_SIZE];
+	uint16_t sent;
 
 	if ((retval = rte_eal_init(argc, argv)) < 0)
 		return -1;
@@ -274,8 +287,12 @@ main(int argc, char *argv[])
 
 		if (unlikely(rx_pkts == 0)){
 			if (need_flush)
-				for (port = 0; port < ports->num_ports; port++)
-					send_packets(ports->id[port]);
+				for (port = 0; port < ports->num_ports; port++) {
+					sent = rte_eth_tx_buffer_flush(ports->id[port], client_id,
+							tx_buffer[ports->id[port]]);
+					if (unlikely(sent))
+						tx_stats->tx[ports->id[port]] += sent;
+				}
 			need_flush = 0;
 			continue;
 		}
diff --git a/examples/multi_process/l2fwd_fork/main.c b/examples/multi_process/l2fwd_fork/main.c
index f2d7eab..aebf531 100644
--- a/examples/multi_process/l2fwd_fork/main.c
+++ b/examples/multi_process/l2fwd_fork/main.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -127,11 +127,11 @@ struct mbuf_table {
 struct lcore_queue_conf {
 	unsigned n_rx_port;
 	unsigned rx_port_list[MAX_RX_QUEUE_PER_LCORE];
-	struct mbuf_table tx_mbufs[RTE_MAX_ETHPORTS];
-
 } __rte_cache_aligned;
 struct lcore_queue_conf lcore_queue_conf[RTE_MAX_LCORE];
 
+struct rte_eth_dev_tx_buffer *tx_buffer[RTE_MAX_ETHPORTS];
+
 struct lcore_resource_struct {
 	int enabled;	/* Only set when this lcore is involved in packet forwarding */
 	int flags;	/* Set only when the slave needs to restart or be recreated */
@@ -583,58 +583,14 @@ slave_exit_cb(unsigned slaveid, __attribute__((unused))int stat)
 	rte_spinlock_unlock(&res_lock);
 }
 
-/* Send the packet on an output interface */
-static int
-l2fwd_send_burst(struct lcore_queue_conf *qconf, unsigned n, uint8_t port)
-{
-	struct rte_mbuf **m_table;
-	unsigned ret;
-	unsigned queueid =0;
-
-	m_table = (struct rte_mbuf **)qconf->tx_mbufs[port].m_table;
-
-	ret = rte_eth_tx_burst(port, (uint16_t) queueid, m_table, (uint16_t) n);
-	port_statistics[port].tx += ret;
-	if (unlikely(ret < n)) {
-		port_statistics[port].dropped += (n - ret);
-		do {
-			rte_pktmbuf_free(m_table[ret]);
-		} while (++ret < n);
-	}
-
-	return 0;
-}
-
-/* Send the packet on an output interface */
-static int
-l2fwd_send_packet(struct rte_mbuf *m, uint8_t port)
-{
-	unsigned lcore_id, len;
-	struct lcore_queue_conf *qconf;
-
-	lcore_id = rte_lcore_id();
-
-	qconf = &lcore_queue_conf[lcore_id];
-	len = qconf->tx_mbufs[port].len;
-	qconf->tx_mbufs[port].m_table[len] = m;
-	len++;
-
-	/* enough pkts to be sent */
-	if (unlikely(len == MAX_PKT_BURST)) {
-		l2fwd_send_burst(qconf, MAX_PKT_BURST, port);
-		len = 0;
-	}
-
-	qconf->tx_mbufs[port].len = len;
-	return 0;
-}
-
 static void
 l2fwd_simple_forward(struct rte_mbuf *m, unsigned portid)
 {
 	struct ether_hdr *eth;
 	void *tmp;
 	unsigned dst_port;
+	int sent;
+	struct rte_eth_dev_tx_buffer *buffer;
 
 	dst_port = l2fwd_dst_ports[portid];
 	eth = rte_pktmbuf_mtod(m, struct ether_hdr *);
@@ -646,7 +602,10 @@ l2fwd_simple_forward(struct rte_mbuf *m, unsigned portid)
 	/* src addr */
 	ether_addr_copy(&l2fwd_ports_eth_addr[dst_port], &eth->s_addr);
 
-	l2fwd_send_packet(m, (uint8_t) dst_port);
+	buffer = tx_buffer[dst_port];
+	sent = rte_eth_tx_buffer(dst_port, 0, buffer, m);
+	if (sent)
+		port_statistics[dst_port].tx += sent;
 }
 
 /* main processing loop */
@@ -655,11 +614,14 @@ l2fwd_main_loop(void)
 {
 	struct rte_mbuf *pkts_burst[MAX_PKT_BURST];
 	struct rte_mbuf *m;
+	int sent;
 	unsigned lcore_id;
 	uint64_t prev_tsc, diff_tsc, cur_tsc;
 	unsigned i, j, portid, nb_rx;
 	struct lcore_queue_conf *qconf;
-	const uint64_t drain_tsc = (rte_get_tsc_hz() + US_PER_S - 1) / US_PER_S * BURST_TX_DRAIN_US;
+	const uint64_t drain_tsc = (rte_get_tsc_hz() + US_PER_S - 1) / US_PER_S *
+			BURST_TX_DRAIN_US;
+	struct rte_eth_dev_tx_buffer *buffer;
 
 	prev_tsc = 0;
 
@@ -699,13 +661,15 @@ l2fwd_main_loop(void)
 		diff_tsc = cur_tsc - prev_tsc;
 		if (unlikely(diff_tsc > drain_tsc)) {
 
-			for (portid = 0; portid < RTE_MAX_ETHPORTS; portid++) {
-				if (qconf->tx_mbufs[portid].len == 0)
-					continue;
-				l2fwd_send_burst(&lcore_queue_conf[lcore_id],
-						 qconf->tx_mbufs[portid].len,
-						 (uint8_t) portid);
-				qconf->tx_mbufs[portid].len = 0;
+			for (i = 0; i < qconf->n_rx_port; i++) {
+
+				portid = l2fwd_dst_ports[qconf->rx_port_list[i]];
+				buffer = tx_buffer[portid];
+
+				sent = rte_eth_tx_buffer_flush(portid, 0, buffer);
+				if (sent)
+					port_statistics[portid].tx += sent;
+
 			}
 		}
 
@@ -1144,6 +1108,23 @@ main(int argc, char **argv)
 			rte_exit(EXIT_FAILURE, "rte_eth_tx_queue_setup:err=%d, port=%u\n",
 				ret, (unsigned) portid);
 
+		/* Initialize TX buffers */
+		tx_buffer[portid] = rte_zmalloc_socket("tx_buffer",
+				RTE_ETH_TX_BUFFER_SIZE(MAX_PKT_BURST), 0,
+				rte_eth_dev_socket_id(portid));
+		if (tx_buffer[portid] == NULL)
+			rte_exit(EXIT_FAILURE, "Cannot allocate buffer for tx on port %u\n",
+					(unsigned) portid);
+
+		rte_eth_tx_buffer_init(tx_buffer[portid], MAX_PKT_BURST);
+
+		ret = rte_eth_tx_buffer_set_err_callback(tx_buffer[portid],
+				rte_eth_tx_buffer_count_callback,
+				&port_statistics[portid].dropped);
+		if (ret < 0)
+			rte_exit(EXIT_FAILURE, "Cannot set error callback for "
+					"tx buffer on port %u\n", (unsigned) portid);
+
 		/* Start device */
 		ret = rte_eth_dev_start(portid);
 		if (ret < 0)
diff --git a/examples/packet_ordering/main.c b/examples/packet_ordering/main.c
index 1d9a86f..15bb900 100644
--- a/examples/packet_ordering/main.c
+++ b/examples/packet_ordering/main.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -39,6 +39,7 @@
 #include <rte_errno.h>
 #include <rte_ethdev.h>
 #include <rte_lcore.h>
+#include <rte_malloc.h>
 #include <rte_mbuf.h>
 #include <rte_mempool.h>
 #include <rte_ring.h>
@@ -54,7 +55,7 @@
 
 #define RING_SIZE 16384
 
-/* uncommnet below line to enable debug logs */
+/* uncomment below line to enable debug logs */
 /* #define DEBUG */
 
 #ifdef DEBUG
@@ -86,11 +87,6 @@ struct send_thread_args {
 	struct rte_reorder_buffer *buffer;
 };
 
-struct output_buffer {
-	unsigned count;
-	struct rte_mbuf *mbufs[MAX_PKTS_BURST];
-};
-
 volatile struct app_stats {
 	struct {
 		uint64_t rx_pkts;
@@ -235,6 +231,68 @@ parse_args(int argc, char **argv)
 	return 0;
 }
 
+/*
+ * Tx buffer error callback
+ */
+static void
+flush_tx_error_callback(struct rte_mbuf **unsent, uint16_t count,
+		void *userdata __rte_unused) {
+
+	/* free the mbufs which failed from transmit */
+	app_stats.tx.ro_tx_failed_pkts += count;
+	LOG_DEBUG(REORDERAPP, "%s:Packet loss with tx_burst\n", __func__);
+	pktmbuf_free_bulk(unsent, count);
+
+}
+
+static inline int
+free_tx_buffers(struct rte_eth_dev_tx_buffer *tx_buffer[]) {
+	const uint8_t nb_ports = rte_eth_dev_count();
+	unsigned port_id;
+
+	/* free the buffers for all enabled ports */
+	for (port_id = 0; port_id < nb_ports; port_id++) {
+		/* skip ports that are not enabled */
+		if ((portmask & (1 << port_id)) == 0)
+			continue;
+
+		rte_free(tx_buffer[port_id]);
+	}
+	return 0;
+}
+
+static inline int
+configure_tx_buffers(struct rte_eth_dev_tx_buffer *tx_buffer[])
+{
+	const uint8_t nb_ports = rte_eth_dev_count();
+	unsigned port_id;
+	int ret;
+
+	/* initialize buffers for all ports */
+	for (port_id = 0; port_id < nb_ports; port_id++) {
+		/* skip ports that are not enabled */
+		if ((portmask & (1 << port_id)) == 0)
+			continue;
+
+		/* Initialize TX buffers */
+		tx_buffer[port_id] = rte_zmalloc_socket("tx_buffer",
+				RTE_ETH_TX_BUFFER_SIZE(MAX_PKTS_BURST), 0,
+				rte_eth_dev_socket_id(port_id));
+		if (tx_buffer[port_id] == NULL)
+			rte_exit(EXIT_FAILURE, "Cannot allocate buffer for tx on port %u\n",
+					(unsigned) port_id);
+
+		rte_eth_tx_buffer_init(tx_buffer[port_id], MAX_PKTS_BURST);
+
+		ret = rte_eth_tx_buffer_set_err_callback(tx_buffer[port_id],
+				flush_tx_error_callback, NULL);
+		if (ret < 0)
+			rte_exit(EXIT_FAILURE, "Cannot set error callback for "
+					"tx buffer on port %u\n", (unsigned) port_id);
+	}
+	return 0;
+}
+
 static inline int
 configure_eth_port(uint8_t port_id)
 {
@@ -438,22 +496,6 @@ worker_thread(void *args_ptr)
 	return 0;
 }
 
-static inline void
-flush_one_port(struct output_buffer *outbuf, uint8_t outp)
-{
-	unsigned nb_tx = rte_eth_tx_burst(outp, 0, outbuf->mbufs,
-			outbuf->count);
-	app_stats.tx.ro_tx_pkts += nb_tx;
-
-	if (unlikely(nb_tx < outbuf->count)) {
-		/* free the mbufs which failed from transmit */
-		app_stats.tx.ro_tx_failed_pkts += (outbuf->count - nb_tx);
-		LOG_DEBUG(REORDERAPP, "%s:Packet loss with tx_burst\n", __func__);
-		pktmbuf_free_bulk(&outbuf->mbufs[nb_tx], outbuf->count - nb_tx);
-	}
-	outbuf->count = 0;
-}
-
 /**
  * Dequeue mbufs from the workers_to_tx ring and reorder them before
  * transmitting.
@@ -465,12 +507,15 @@ send_thread(struct send_thread_args *args)
 	unsigned int i, dret;
 	uint16_t nb_dq_mbufs;
 	uint8_t outp;
-	static struct output_buffer tx_buffers[RTE_MAX_ETHPORTS];
+	unsigned sent;
 	struct rte_mbuf *mbufs[MAX_PKTS_BURST];
 	struct rte_mbuf *rombufs[MAX_PKTS_BURST] = {NULL};
+	static struct rte_eth_dev_tx_buffer *tx_buffer[RTE_MAX_ETHPORTS];
 
 	RTE_LOG(INFO, REORDERAPP, "%s() started on lcore %u\n", __func__, rte_lcore_id());
 
+	configure_tx_buffers(tx_buffer);
+
 	while (!quit_signal) {
 
 		/* deque the mbufs from workers_to_tx ring */
@@ -515,7 +560,7 @@ send_thread(struct send_thread_args *args)
 		dret = rte_reorder_drain(args->buffer, rombufs, MAX_PKTS_BURST);
 		for (i = 0; i < dret; i++) {
 
-			struct output_buffer *outbuf;
+			struct rte_eth_dev_tx_buffer *outbuf;
 			uint8_t outp1;
 
 			outp1 = rombufs[i]->port;
@@ -525,12 +570,15 @@ send_thread(struct send_thread_args *args)
 				continue;
 			}
 
-			outbuf = &tx_buffers[outp1];
-			outbuf->mbufs[outbuf->count++] = rombufs[i];
-			if (outbuf->count == MAX_PKTS_BURST)
-				flush_one_port(outbuf, outp1);
+			outbuf = tx_buffer[outp1];
+			sent = rte_eth_tx_buffer(outp1, 0, outbuf, rombufs[i]);
+			if (sent)
+				app_stats.tx.ro_tx_pkts += sent;
 		}
 	}
+
+	free_tx_buffers(tx_buffer);
+
 	return 0;
 }
 
@@ -542,12 +590,16 @@ tx_thread(struct rte_ring *ring_in)
 {
 	uint32_t i, dqnum;
 	uint8_t outp;
-	static struct output_buffer tx_buffers[RTE_MAX_ETHPORTS];
+	unsigned sent;
 	struct rte_mbuf *mbufs[MAX_PKTS_BURST];
-	struct output_buffer *outbuf;
+	struct rte_eth_dev_tx_buffer *outbuf;
+	static struct rte_eth_dev_tx_buffer *tx_buffer[RTE_MAX_ETHPORTS];
 
 	RTE_LOG(INFO, REORDERAPP, "%s() started on lcore %u\n", __func__,
 							rte_lcore_id());
+
+	configure_tx_buffers(tx_buffer);
+
 	while (!quit_signal) {
 
 		/* deque the mbufs from workers_to_tx ring */
@@ -567,10 +619,10 @@ tx_thread(struct rte_ring *ring_in)
 				continue;
 			}
 
-			outbuf = &tx_buffers[outp];
-			outbuf->mbufs[outbuf->count++] = mbufs[i];
-			if (outbuf->count == MAX_PKTS_BURST)
-				flush_one_port(outbuf, outp);
+			outbuf = tx_buffer[outp];
+			sent = rte_eth_tx_buffer(outp, 0, outbuf, mbufs[i]);
+			if (sent)
+				app_stats.tx.ro_tx_pkts += sent;
 		}
 	}
 
diff --git a/examples/qos_meter/main.c b/examples/qos_meter/main.c
index 0de5e7f..b968b00 100644
--- a/examples/qos_meter/main.c
+++ b/examples/qos_meter/main.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -36,6 +36,7 @@
 
 #include <rte_common.h>
 #include <rte_eal.h>
+#include <rte_malloc.h>
 #include <rte_mempool.h>
 #include <rte_ethdev.h>
 #include <rte_cycles.h>
@@ -118,9 +119,7 @@ static struct rte_eth_conf port_conf = {
 static uint8_t port_rx;
 static uint8_t port_tx;
 static struct rte_mbuf *pkts_rx[PKT_RX_BURST_MAX];
-static struct rte_mbuf *pkts_tx[PKT_TX_BURST_MAX];
-static uint16_t pkts_tx_len = 0;
-
+struct rte_eth_dev_tx_buffer *tx_buffer;
 
 struct rte_meter_srtcm_params app_srtcm_params[] = {
 	{.cir = 1000000 * 46,  .cbs = 2048, .ebs = 2048},
@@ -188,27 +187,8 @@ main_loop(__attribute__((unused)) void *dummy)
 		current_time = rte_rdtsc();
 		time_diff = current_time - last_time;
 		if (unlikely(time_diff > TIME_TX_DRAIN)) {
-			int ret;
-
-			if (pkts_tx_len == 0) {
-				last_time = current_time;
-
-				continue;
-			}
-
-			/* Write packet burst to NIC TX */
-			ret = rte_eth_tx_burst(port_tx, NIC_TX_QUEUE, pkts_tx, pkts_tx_len);
-
-			/* Free buffers for any packets not written successfully */
-			if (unlikely(ret < pkts_tx_len)) {
-				for ( ; ret < pkts_tx_len; ret ++) {
-					rte_pktmbuf_free(pkts_tx[ret]);
-				}
-			}
-
-			/* Empty the output buffer */
-			pkts_tx_len = 0;
-
+			/* Flush tx buffer */
+			rte_eth_tx_buffer_flush(port_tx, NIC_TX_QUEUE, tx_buffer);
 			last_time = current_time;
 		}
 
@@ -222,26 +202,8 @@ main_loop(__attribute__((unused)) void *dummy)
 			/* Handle current packet */
 			if (app_pkt_handle(pkt, current_time) == DROP)
 				rte_pktmbuf_free(pkt);
-			else {
-				pkts_tx[pkts_tx_len] = pkt;
-				pkts_tx_len ++;
-			}
-
-			/* Write packets from output buffer to NIC TX when full burst is available */
-			if (unlikely(pkts_tx_len == PKT_TX_BURST_MAX)) {
-				/* Write packet burst to NIC TX */
-				int ret = rte_eth_tx_burst(port_tx, NIC_TX_QUEUE, pkts_tx, PKT_TX_BURST_MAX);
-
-				/* Free buffers for any packets not written successfully */
-				if (unlikely(ret < PKT_TX_BURST_MAX)) {
-					for ( ; ret < PKT_TX_BURST_MAX; ret ++) {
-						rte_pktmbuf_free(pkts_tx[ret]);
-					}
-				}
-
-				/* Empty the output buffer */
-				pkts_tx_len = 0;
-			}
+			else
+				rte_eth_tx_buffer(port_tx, NIC_TX_QUEUE, tx_buffer, pkt);
 		}
 	}
 }
@@ -397,6 +359,15 @@ main(int argc, char **argv)
 	if (ret < 0)
 		rte_exit(EXIT_FAILURE, "Port %d TX queue setup error (%d)\n", port_tx, ret);
 
+	tx_buffer = rte_zmalloc_socket("tx_buffer",
+			RTE_ETH_TX_BUFFER_SIZE(PKT_TX_BURST_MAX), 0,
+			rte_eth_dev_socket_id(port_tx));
+	if (tx_buffer == NULL)
+		rte_exit(EXIT_FAILURE, "Port %d TX buffer allocation error\n",
+				port_tx);
+
+	rte_eth_tx_buffer_init(tx_buffer, PKT_TX_BURST_MAX);
+
 	ret = rte_eth_dev_start(port_rx);
 	if (ret < 0)
 		rte_exit(EXIT_FAILURE, "Port %d start error (%d)\n", port_rx, ret);
-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 43+ messages in thread
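
The example conversions in this patch all follow the same three-step pattern:
allocate and initialize one rte_eth_dev_tx_buffer per output port at setup
time, hand each packet to rte_eth_tx_buffer() on the fast path, and flush any
remainder from the slow-path drain timer. A condensed sketch of that pattern
for a single port on queue 0 (the burst size, variable names and helper names
here are illustrative, not part of the patch):

#include <rte_ethdev.h>
#include <rte_malloc.h>
#include <rte_debug.h>

#define BURST_SIZE 32	/* illustrative; the examples use MAX_PKT_BURST */

static struct rte_eth_dev_tx_buffer *txb;	/* one per (port, queue) pair */
static uint64_t dropped;	/* bumped by the count callback on failed sends */

static void
setup_tx_buffer(uint8_t port)
{
	/* allocate on the port's NUMA socket, sized for BURST_SIZE packets */
	txb = rte_zmalloc_socket("tx_buffer",
			RTE_ETH_TX_BUFFER_SIZE(BURST_SIZE), 0,
			rte_eth_dev_socket_id(port));
	if (txb == NULL)
		rte_exit(EXIT_FAILURE, "cannot allocate tx buffer\n");

	rte_eth_tx_buffer_init(txb, BURST_SIZE);

	/* reference callback: free unsent packets and count them */
	if (rte_eth_tx_buffer_set_err_callback(txb,
			rte_eth_tx_buffer_count_callback, &dropped) < 0)
		rte_exit(EXIT_FAILURE, "cannot set error callback\n");
}

/* fast path: queue m for port; returns packets sent (0 if only buffered) */
static inline uint16_t
buffer_one(uint8_t port, struct rte_mbuf *m)
{
	return rte_eth_tx_buffer(port, 0, txb, m);
}

/* drain timer: transmit whatever is still queued */
static inline uint16_t
drain_tx(uint8_t port)
{
	return rte_eth_tx_buffer_flush(port, 0, txb);
}

Returning the sent count from rte_eth_tx_buffer()/rte_eth_tx_buffer_flush()
is what lets the examples above credit port_statistics[].tx without a
separate send routine.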

* Re: [dpdk-dev] [PATCH v4 0/2] add support for buffered tx to ethdev
  2016-03-10 17:19     ` [dpdk-dev] [PATCH v4 " Tomasz Kulasek
  2016-03-10 17:19       ` [dpdk-dev] [PATCH v4 1/2] ethdev: add buffered tx api Tomasz Kulasek
  2016-03-10 17:19       ` [dpdk-dev] [PATCH v4 2/2] examples: rework to use buffered tx Tomasz Kulasek
@ 2016-03-11 16:39       ` Thomas Monjalon
  2 siblings, 0 replies; 43+ messages in thread
From: Thomas Monjalon @ 2016-03-11 16:39 UTC (permalink / raw)
  To: Tomasz Kulasek; +Cc: dev

> Tomasz Kulasek (2):
>   ethdev: add buffered tx api
>   examples: rework to use buffered tx

Applied, thanks

Note: I've removed __rte_unused from the callback prototype because
it was confusing doxygen.
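
For reference, the prototype in question is the tx buffer error callback; as
merged it should read roughly as below (a sketch based on this thread — the
authoritative text is the buffer_tx_error_fn typedef in rte_ethdev.h):

typedef void (*buffer_tx_error_fn)(struct rte_mbuf **unsent, uint16_t count,
		void *userdata);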

^ permalink raw reply	[flat|nested] 43+ messages in thread

end of thread, other threads:[~2016-03-11 16:41 UTC | newest]

Thread overview: 43+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-01-15 14:43 [dpdk-dev] [PATCH 0/2] add support for buffered tx to ethdev Tomasz Kulasek
2016-01-15 14:43 ` [dpdk-dev] [PATCH 1/2] ethdev: add buffered tx api Tomasz Kulasek
2016-01-15 18:13   ` Stephen Hemminger
2016-01-15 18:14   ` Stephen Hemminger
2016-01-15 18:44   ` Ananyev, Konstantin
2016-02-02 10:00     ` Kulasek, TomaszX
2016-02-02 13:49       ` Ananyev, Konstantin
2016-02-09 17:02         ` Kulasek, TomaszX
2016-02-09 23:56           ` Ananyev, Konstantin
2016-02-12 11:44             ` Ananyev, Konstantin
2016-02-12 16:40               ` Ivan Boule
2016-02-12 17:33                 ` Bruce Richardson
2016-01-15 14:43 ` [dpdk-dev] [PATCH 2/2] examples: sample apps rework to use " Tomasz Kulasek
2016-01-15 18:12 ` [dpdk-dev] [PATCH 0/2] add support for buffered tx to ethdev Stephen Hemminger
2016-02-24 17:08 ` [dpdk-dev] [PATCH v2 " Tomasz Kulasek
2016-02-24 17:08   ` [dpdk-dev] [PATCH v2 1/2] ethdev: add buffered tx api Tomasz Kulasek
2016-03-08 22:52     ` Thomas Monjalon
2016-03-09 13:36       ` Ananyev, Konstantin
2016-03-09 14:25         ` Thomas Monjalon
2016-03-09 15:23           ` Ananyev, Konstantin
2016-03-09 15:26             ` Thomas Monjalon
2016-03-09 15:32               ` Kulasek, TomaszX
2016-03-09 15:37                 ` Thomas Monjalon
2016-03-09 15:42               ` Ananyev, Konstantin
2016-03-09 15:52                 ` Thomas Monjalon
2016-03-09 16:17                   ` Ananyev, Konstantin
2016-03-09 16:21                     ` Thomas Monjalon
2016-03-09 16:35       ` Kulasek, TomaszX
2016-03-09 17:06         ` Thomas Monjalon
2016-03-09 18:12           ` Kulasek, TomaszX
2016-02-24 17:08   ` [dpdk-dev] [PATCH v2 2/2] examples: rework to use " Tomasz Kulasek
2016-02-25 16:17   ` [dpdk-dev] [PATCH v2 0/2] add support for buffered tx to ethdev Ananyev, Konstantin
2016-03-10 10:57   ` [dpdk-dev] [PATCH v3 " Tomasz Kulasek
2016-03-10 10:57     ` [dpdk-dev] [PATCH v3 1/2] ethdev: add buffered tx api Tomasz Kulasek
2016-03-10 16:23       ` Thomas Monjalon
2016-03-10 17:15         ` Kulasek, TomaszX
2016-03-10 10:57     ` [dpdk-dev] [PATCH v3 2/2] examples: rework to use buffered tx Tomasz Kulasek
2016-03-10 11:31     ` [dpdk-dev] [PATCH v3 0/2] add support for buffered tx to ethdev Ananyev, Konstantin
2016-03-10 16:01       ` Jastrzebski, MichalX K
2016-03-10 17:19     ` [dpdk-dev] [PATCH v4 " Tomasz Kulasek
2016-03-10 17:19       ` [dpdk-dev] [PATCH v4 1/2] ethdev: add buffered tx api Tomasz Kulasek
2016-03-10 17:19       ` [dpdk-dev] [PATCH v4 2/2] examples: rework to use buffered tx Tomasz Kulasek
2016-03-11 16:39       ` [dpdk-dev] [PATCH v4 0/2] add support for buffered tx to ethdev Thomas Monjalon
