DPDK patches and discussions
 help / color / mirror / Atom feed
* [dpdk-dev] [PATCH v3 0/3] new API to free consumed buffers in Tx ring
@ 2017-01-20 16:01 Billy McFall
  2017-01-20 16:01 ` [dpdk-dev] [PATCH v3 1/3] ethdev: " Billy McFall
                   ` (4 more replies)
  0 siblings, 5 replies; 54+ messages in thread
From: Billy McFall @ 2017-01-20 16:01 UTC (permalink / raw)
  To: thomas.monjalon, wenzhuo.lu; +Cc: dev, Billy McFall

Based on a request from Damjan Marion and seconded by Keith Wiles, see
dpdk-dev mailing list from 11/21/2016, add a new API to free consumed
buffers on TX ring. This addresses two scenarios:
1) Flooding a packet and want to reuse existing mbuf to avoid a packet
copy. Increment the reference count of the packet and poll new API until
reference count is decremented.
2) Application runs out of mbufs, or resets and is preparing for
additional run, call API to free consumed packets so processing can
continue.

API will return the number of packets freed (0-n) or error code if
feature not supported (-ENOTSUP) or input invalid (-ENODEV).

API for e1000 igb driver and vHost driver have been implemented. Other
drivers can be implemented over time. Some drivers implement a Tx done
flush routine that should be reused where possible. e1000 igb driver
and vHost driver do not have such functions.

---
v2:
* Removed rte_eth_tx_buffer_flush() call and associated parameters
  from new rte_eth_tx_done_cleanup() API.

* Remaining open issue is whether this should be a new API or overload 
  existing rte_eth_tx_buffer() API with input parameter nb_pkts set to
  0. My concern is that overloading existing API will not provided
  enough feedback to application. Application needs to know if feature
  is supported and driver is attempting to free mbufs or not.

* If new API is supported, second open issue is if parameter free_cnt
  should be included. It was in the original feature request.

---
v3:
* Removed extra white space in rte_ethdev.h.
* Removed inline of new API function in rte_ethdev.h.

Billy McFall (3):
  ethdev: new API to free consumed buffers in Tx ring
  net/e1000: e1000 igb support to free consumed buffers
  net/vhost: vHost support to free consumed buffers

 drivers/net/e1000/e1000_ethdev.h  |   2 +
 drivers/net/e1000/igb_ethdev.c    |   1 +
 drivers/net/e1000/igb_rxtx.c      | 126 ++++++++++++++++++++++++++++++++++++++
 drivers/net/vhost/rte_eth_vhost.c |  11 ++++
 lib/librte_ether/rte_ethdev.c     |  14 +++++
 lib/librte_ether/rte_ethdev.h     |  31 ++++++++++
 6 files changed, 185 insertions(+)

-- 
2.9.3

^ permalink raw reply	[flat|nested] 54+ messages in thread

* [dpdk-dev] [PATCH v3 1/3] ethdev: new API to free consumed buffers in Tx ring
  2017-01-20 16:01 [dpdk-dev] [PATCH v3 0/3] new API to free consumed buffers in Tx ring Billy McFall
@ 2017-01-20 16:01 ` Billy McFall
  2017-01-20 16:01 ` [dpdk-dev] [PATCH v3 2/3] net/e1000: e1000 igb support to free consumed buffers Billy McFall
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 54+ messages in thread
From: Billy McFall @ 2017-01-20 16:01 UTC (permalink / raw)
  To: thomas.monjalon, wenzhuo.lu; +Cc: dev, Billy McFall

Add a new API to force free consumed buffers on Tx ring. API will return
the number of packets freed (0-n) or error code if feature not supported
(-ENOTSUP) or input invalid (-ENODEV).

Signed-off-by: Billy McFall <bmcfall@redhat.com>
---
 lib/librte_ether/rte_ethdev.c | 14 ++++++++++++++
 lib/librte_ether/rte_ethdev.h | 31 +++++++++++++++++++++++++++++++
 2 files changed, 45 insertions(+)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index 4790faf..10b907e 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -1253,6 +1253,20 @@ rte_eth_tx_buffer_set_err_callback(struct rte_eth_dev_tx_buffer *buffer,
 }
 
 int
+rte_eth_tx_done_cleanup(uint8_t port_id, uint16_t queue_id,  uint32_t free_cnt)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+
+	/* Validate Input Data. Bail if not valid or not supported. */
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
+	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_done_cleanup, -ENOTSUP);
+
+	/* Call driver to free pending mbufs. */
+	return (*dev->dev_ops->tx_done_cleanup)(dev->data->tx_queues[queue_id],
+			free_cnt);
+}
+
+int
 rte_eth_tx_buffer_init(struct rte_eth_dev_tx_buffer *buffer, uint16_t size)
 {
 	int ret = 0;
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index c17bbda..b23886c 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -1183,6 +1183,9 @@ typedef int (*eth_fw_version_get_t)(struct rte_eth_dev *dev,
 				     char *fw_version, size_t fw_size);
 /**< @internal Get firmware information of an Ethernet device. */
 
+typedef int (*eth_tx_done_cleanup_t)(void *txq, uint32_t free_cnt);
+/**< @internal Force mbufs to be from TX ring. */
+
 typedef void (*eth_rxq_info_get_t)(struct rte_eth_dev *dev,
 	uint16_t rx_queue_id, struct rte_eth_rxq_info *qinfo);
 
@@ -1487,6 +1490,7 @@ struct eth_dev_ops {
 	eth_rx_disable_intr_t      rx_queue_intr_disable; /**< Disable Rx queue interrupt. */
 	eth_tx_queue_setup_t       tx_queue_setup;/**< Set up device TX queue. */
 	eth_queue_release_t        tx_queue_release; /**< Release TX queue. */
+	eth_tx_done_cleanup_t      tx_done_cleanup;/**< Free tx ring mbufs */
 
 	eth_dev_led_on_t           dev_led_on;    /**< Turn on LED. */
 	eth_dev_led_off_t          dev_led_off;   /**< Turn off LED. */
@@ -3091,6 +3095,33 @@ rte_eth_tx_buffer(uint8_t port_id, uint16_t queue_id,
 }
 
 /**
+ * Request the driver to free mbufs currently cached by the driver. The
+ * driver will only free the mbuf if it is no longer in use. It is the
+ * application's responsibity to ensure rte_eth_tx_buffer_flush(..) is
+ * called if needed.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param queue_id
+ *   The index of the transmit queue through which output packets must be
+ *   sent.
+ *   The value must be in the range [0, nb_tx_queue - 1] previously supplied
+ *   to rte_eth_dev_configure().
+ * @param free_cnt
+ *   Maximum number of packets to free. Use 0 to indicate all possible packets
+ *   should be freed. Note that a packet may be using multiple mbufs.
+ * @return
+ *   Failure: < 0
+ *     -ENODEV: Invalid interface
+ *     -ENOTSUP: Driver does not support function
+ *   Success: >= 0
+ *     0-n: Number of packets freed. More packets may still remain in ring that
+ *     are in use.
+ */
+int
+rte_eth_tx_done_cleanup(uint8_t port_id, uint16_t queue_id,  uint32_t free_cnt);
+
+/**
  * Configure a callback for buffered packets which cannot be sent
  *
  * Register a specific callback to be called when an attempt is made to send
-- 
2.9.3

^ permalink raw reply	[flat|nested] 54+ messages in thread

* [dpdk-dev] [PATCH v3 2/3] net/e1000: e1000 igb support to free consumed buffers
  2017-01-20 16:01 [dpdk-dev] [PATCH v3 0/3] new API to free consumed buffers in Tx ring Billy McFall
  2017-01-20 16:01 ` [dpdk-dev] [PATCH v3 1/3] ethdev: " Billy McFall
@ 2017-01-20 16:01 ` Billy McFall
  2017-01-22  3:47   ` Lu, Wenzhuo
  2017-01-20 16:01 ` [dpdk-dev] [PATCH v3 3/3] net/vhost: vHost " Billy McFall
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 54+ messages in thread
From: Billy McFall @ 2017-01-20 16:01 UTC (permalink / raw)
  To: thomas.monjalon, wenzhuo.lu; +Cc: dev, Billy McFall

Add support to the e1000 igb driver for the new API to force free
consumed buffers on Tx ring. e1000 igb driver does not implement a
tx_rs_thresh to free mbufs, it frees a slot in the ring as needed, so
a new function needed to be written.

Signed-off-by: Billy McFall <bmcfall@redhat.com>
---
 drivers/net/e1000/e1000_ethdev.h |   2 +
 drivers/net/e1000/igb_ethdev.c   |   1 +
 drivers/net/e1000/igb_rxtx.c     | 126 +++++++++++++++++++++++++++++++++++++++
 3 files changed, 129 insertions(+)

diff --git a/drivers/net/e1000/e1000_ethdev.h b/drivers/net/e1000/e1000_ethdev.h
index 81a6dbb..39b2f43 100644
--- a/drivers/net/e1000/e1000_ethdev.h
+++ b/drivers/net/e1000/e1000_ethdev.h
@@ -315,6 +315,8 @@ int eth_igb_tx_queue_setup(struct rte_eth_dev *dev, uint16_t tx_queue_id,
 		uint16_t nb_tx_desc, unsigned int socket_id,
 		const struct rte_eth_txconf *tx_conf);
 
+int eth_igb_tx_done_cleanup(void *txq, uint32_t free_cnt);
+
 int eth_igb_rx_init(struct rte_eth_dev *dev);
 
 void eth_igb_tx_init(struct rte_eth_dev *dev);
diff --git a/drivers/net/e1000/igb_ethdev.c b/drivers/net/e1000/igb_ethdev.c
index 8843dd1..12a1a30 100644
--- a/drivers/net/e1000/igb_ethdev.c
+++ b/drivers/net/e1000/igb_ethdev.c
@@ -408,6 +408,7 @@ static const struct eth_dev_ops eth_igb_ops = {
 	.rx_descriptor_done   = eth_igb_rx_descriptor_done,
 	.tx_queue_setup       = eth_igb_tx_queue_setup,
 	.tx_queue_release     = eth_igb_tx_queue_release,
+	.tx_done_cleanup      = eth_igb_tx_done_cleanup,
 	.dev_led_on           = eth_igb_led_on,
 	.dev_led_off          = eth_igb_led_off,
 	.flow_ctrl_get        = eth_igb_flow_ctrl_get,
diff --git a/drivers/net/e1000/igb_rxtx.c b/drivers/net/e1000/igb_rxtx.c
index 45f3f24..00cd2aa 100644
--- a/drivers/net/e1000/igb_rxtx.c
+++ b/drivers/net/e1000/igb_rxtx.c
@@ -1277,6 +1277,132 @@ eth_igb_tx_queue_release(void *txq)
 	igb_tx_queue_release(txq);
 }
 
+static int
+igb_tx_done_cleanup(struct igb_tx_queue *txq, uint32_t free_cnt)
+{
+	struct igb_tx_entry *sw_ring;
+	volatile union e1000_adv_tx_desc *txr;
+	uint16_t tx_first; /* First segment analyzed. */
+	uint16_t tx_id;    /* Current segment being processed. */
+	uint16_t tx_last;  /* Last segment in the current packet. */
+	uint16_t tx_next;  /* First segment of the next packet. */
+	int count;
+
+	if (txq != NULL) {
+		count = 0;
+		sw_ring = txq->sw_ring;
+		txr = txq->tx_ring;
+
+		/*
+		 * tx_tail is the last sent packet on the sw_ring. Goto the end
+		 * of that packet (the last segment in the packet chain) and
+		 * then the next segment will be the start of the oldest segment
+		 * in the sw_ring. This is the first packet that will be
+		 * attempted to be freed.
+		 */
+
+		/* Get last segment in most recently added packet. */
+		tx_first = sw_ring[txq->tx_tail].last_id;
+
+		/* Get the next segment, which is the oldest segment in ring. */
+		tx_first = sw_ring[tx_first].next_id;
+
+		/* Set the current index to the first. */
+		tx_id = tx_first;
+
+		/*
+		 * Loop through each packet. For each packet, verify that an
+		 * mbuf exists and that the last segment is free. If so, free
+		 * it and move on.
+		 */
+		while (1) {
+			tx_last = sw_ring[tx_id].last_id;
+
+			if (sw_ring[tx_last].mbuf) {
+				if (txr[tx_last].wb.status &
+						E1000_TXD_STAT_DD) {
+					/*
+					 * Increment the number of packets
+					 * freed.
+					 */
+					count++;
+
+					/* Get the start of the next packet. */
+					tx_next = sw_ring[tx_last].next_id;
+
+					/*
+					 * Loop through all segments in a
+					 * packet.
+					 */
+					do {
+						rte_pktmbuf_free_seg(sw_ring[tx_id].mbuf);
+						sw_ring[tx_id].mbuf = NULL;
+						sw_ring[tx_id].last_id = tx_id;
+
+						/* Move to next segemnt. */
+						tx_id = sw_ring[tx_id].next_id;
+
+					} while (tx_id != tx_next);
+
+					if (unlikely(count == (int)free_cnt))
+						break;
+				} else
+					/*
+					 * mbuf still in use, nothing left to
+					 * free.
+					 */
+					break;
+			} else {
+				/*
+				 * There are multiple reasons to be here:
+				 * 1) All the packets on the ring have been
+				 *    freed - tx_id is equal to tx_first
+				 *    and some packets have been freed.
+				 *    - Done, exit
+				 * 2) Interfaces has not sent a rings worth of
+				 *    packets yet, so the segment after tail is
+				 *    still empty. Or a previous call to this
+				 *    function freed some of the segments but
+				 *    not all so there is a hole in the list.
+				 *    Hopefully this is a rare case.
+				 *    - Walk the list and find the next mbuf. If
+				 *      there isn't one, then done.
+				 */
+				if (likely((tx_id == tx_first) && (count != 0)))
+					break;
+
+				/*
+				 * Walk the list and find the next mbuf, if any.
+				 */
+				do {
+					/* Move to next segemnt. */
+					tx_id = sw_ring[tx_id].next_id;
+
+					if (sw_ring[tx_id].mbuf)
+						break;
+
+				} while (tx_id != tx_first);
+
+				/*
+				 * Determine why previous loop bailed. If there
+				 * is not an mbuf, done.
+				 */
+				if (sw_ring[tx_id].mbuf == NULL)
+					break;
+			}
+		}
+	} else
+		count = -ENODEV;
+
+	return count;
+}
+
+int
+eth_igb_tx_done_cleanup(void *txq, uint32_t free_cnt)
+{
+	return igb_tx_done_cleanup(txq, free_cnt);
+}
+
 static void
 igb_reset_tx_queue_stat(struct igb_tx_queue *txq)
 {
-- 
2.9.3

^ permalink raw reply	[flat|nested] 54+ messages in thread

* [dpdk-dev] [PATCH v3 3/3] net/vhost: vHost support to free consumed buffers
  2017-01-20 16:01 [dpdk-dev] [PATCH v3 0/3] new API to free consumed buffers in Tx ring Billy McFall
  2017-01-20 16:01 ` [dpdk-dev] [PATCH v3 1/3] ethdev: " Billy McFall
  2017-01-20 16:01 ` [dpdk-dev] [PATCH v3 2/3] net/e1000: e1000 igb support to free consumed buffers Billy McFall
@ 2017-01-20 16:01 ` Billy McFall
  2017-01-23 15:25 ` [dpdk-dev] [PATCH v3 0/3] new API to free consumed buffers in Tx ring Thomas Monjalon
  2017-01-23 21:13 ` [dpdk-dev] [PATCH v4 " Billy McFall
  4 siblings, 0 replies; 54+ messages in thread
From: Billy McFall @ 2017-01-20 16:01 UTC (permalink / raw)
  To: thomas.monjalon, wenzhuo.lu; +Cc: dev, Billy McFall

Add support to the vHostdriver for the new API to force free consumed
buffers on Tx ring. vHost does not cache the mbufs so there is no work
to do.

Signed-off-by: Billy McFall <bmcfall@redhat.com>
---
 drivers/net/vhost/rte_eth_vhost.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/drivers/net/vhost/rte_eth_vhost.c b/drivers/net/vhost/rte_eth_vhost.c
index 848a3da..6674536 100644
--- a/drivers/net/vhost/rte_eth_vhost.c
+++ b/drivers/net/vhost/rte_eth_vhost.c
@@ -934,6 +934,16 @@ eth_queue_release(void *q)
 }
 
 static int
+eth_tx_done_cleanup(void *txq __rte_unused, uint32_t free_cnt __rte_unused)
+{
+	/*
+	 * vHost does not hang onto mbuf. eth_vhost_tx() copies packet data
+	 * and releases mbuf, so nothing to cleanup.
+	 */
+	return 0;
+}
+
+static int
 eth_link_update(struct rte_eth_dev *dev __rte_unused,
 		int wait_to_complete __rte_unused)
 {
@@ -974,6 +984,7 @@ static const struct eth_dev_ops ops = {
 	.tx_queue_setup = eth_tx_queue_setup,
 	.rx_queue_release = eth_queue_release,
 	.tx_queue_release = eth_queue_release,
+	.tx_done_cleanup = eth_tx_done_cleanup,
 	.link_update = eth_link_update,
 	.stats_get = eth_stats_get,
 	.stats_reset = eth_stats_reset,
-- 
2.9.3

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [dpdk-dev] [PATCH v3 2/3] net/e1000: e1000 igb support to free consumed buffers
  2017-01-20 16:01 ` [dpdk-dev] [PATCH v3 2/3] net/e1000: e1000 igb support to free consumed buffers Billy McFall
@ 2017-01-22  3:47   ` Lu, Wenzhuo
  2017-01-23 13:49     ` Billy McFall
  0 siblings, 1 reply; 54+ messages in thread
From: Lu, Wenzhuo @ 2017-01-22  3:47 UTC (permalink / raw)
  To: Billy McFall, thomas.monjalon; +Cc: dev

Hi Billy,

> -----Original Message-----
> From: Billy McFall [mailto:bmcfall@redhat.com]
> Sent: Saturday, January 21, 2017 12:01 AM
> To: thomas.monjalon@6wind.com; Lu, Wenzhuo
> Cc: dev@dpdk.org; Billy McFall
> Subject: [PATCH v3 2/3] net/e1000: e1000 igb support to free consumed
> buffers
> 
> Add support to the e1000 igb driver for the new API to force free consumed
> buffers on Tx ring. e1000 igb driver does not implement a tx_rs_thresh to free
> mbufs, it frees a slot in the ring as needed, so a new function needed to be
> written.
> 
> Signed-off-by: Billy McFall <bmcfall@redhat.com>
> ---
>  drivers/net/e1000/e1000_ethdev.h |   2 +
>  drivers/net/e1000/igb_ethdev.c   |   1 +
>  drivers/net/e1000/igb_rxtx.c     | 126
> +++++++++++++++++++++++++++++++++++++++
>  3 files changed, 129 insertions(+)
> 
> diff --git a/drivers/net/e1000/e1000_ethdev.h
> b/drivers/net/e1000/e1000_ethdev.h
> index 81a6dbb..39b2f43 100644
> --- a/drivers/net/e1000/e1000_ethdev.h
> +++ b/drivers/net/e1000/e1000_ethdev.h
> @@ -315,6 +315,8 @@ int eth_igb_tx_queue_setup(struct rte_eth_dev *dev,
> uint16_t tx_queue_id,
>  		uint16_t nb_tx_desc, unsigned int socket_id,
>  		const struct rte_eth_txconf *tx_conf);
> 
> +int eth_igb_tx_done_cleanup(void *txq, uint32_t free_cnt);
> +
>  int eth_igb_rx_init(struct rte_eth_dev *dev);
> 
>  void eth_igb_tx_init(struct rte_eth_dev *dev); diff --git
> a/drivers/net/e1000/igb_ethdev.c b/drivers/net/e1000/igb_ethdev.c index
> 8843dd1..12a1a30 100644
> --- a/drivers/net/e1000/igb_ethdev.c
> +++ b/drivers/net/e1000/igb_ethdev.c
> @@ -408,6 +408,7 @@ static const struct eth_dev_ops eth_igb_ops = {
>  	.rx_descriptor_done   = eth_igb_rx_descriptor_done,
>  	.tx_queue_setup       = eth_igb_tx_queue_setup,
>  	.tx_queue_release     = eth_igb_tx_queue_release,
> +	.tx_done_cleanup      = eth_igb_tx_done_cleanup,
>  	.dev_led_on           = eth_igb_led_on,
>  	.dev_led_off          = eth_igb_led_off,
>  	.flow_ctrl_get        = eth_igb_flow_ctrl_get,
> diff --git a/drivers/net/e1000/igb_rxtx.c b/drivers/net/e1000/igb_rxtx.c index
> 45f3f24..00cd2aa 100644
> --- a/drivers/net/e1000/igb_rxtx.c
> +++ b/drivers/net/e1000/igb_rxtx.c
> @@ -1277,6 +1277,132 @@ eth_igb_tx_queue_release(void *txq)
>  	igb_tx_queue_release(txq);
>  }
> 
> +static int
> +igb_tx_done_cleanup(struct igb_tx_queue *txq, uint32_t free_cnt) {
> +	struct igb_tx_entry *sw_ring;
> +	volatile union e1000_adv_tx_desc *txr;
> +	uint16_t tx_first; /* First segment analyzed. */
> +	uint16_t tx_id;    /* Current segment being processed. */
> +	uint16_t tx_last;  /* Last segment in the current packet. */
> +	uint16_t tx_next;  /* First segment of the next packet. */
> +	int count;
> +
> +	if (txq != NULL) {
> +		count = 0;
> +		sw_ring = txq->sw_ring;
> +		txr = txq->tx_ring;
> +
> +		/*
> +		 * tx_tail is the last sent packet on the sw_ring. Goto the end
> +		 * of that packet (the last segment in the packet chain) and
> +		 * then the next segment will be the start of the oldest
> segment
> +		 * in the sw_ring. This is the first packet that will be
> +		 * attempted to be freed.
> +		 */
> +
> +		/* Get last segment in most recently added packet. */
> +		tx_first = sw_ring[txq->tx_tail].last_id;
> +
> +		/* Get the next segment, which is the oldest segment in ring.
> */
> +		tx_first = sw_ring[tx_first].next_id;
> +
> +		/* Set the current index to the first. */
> +		tx_id = tx_first;
> +
> +		/*
> +		 * Loop through each packet. For each packet, verify that an
> +		 * mbuf exists and that the last segment is free. If so, free
> +		 * it and move on.
> +		 */
> +		while (1) {
> +			tx_last = sw_ring[tx_id].last_id;
> +
> +			if (sw_ring[tx_last].mbuf) {
> +				if (txr[tx_last].wb.status &
> +						E1000_TXD_STAT_DD) {
> +					/*
> +					 * Increment the number of packets
> +					 * freed.
> +					 */
> +					count++;
> +
> +					/* Get the start of the next packet. */
> +					tx_next = sw_ring[tx_last].next_id;
> +
> +					/*
> +					 * Loop through all segments in a
> +					 * packet.
> +					 */
> +					do {
> +
> 	rte_pktmbuf_free_seg(sw_ring[tx_id].mbuf);
> +						sw_ring[tx_id].mbuf = NULL;
> +						sw_ring[tx_id].last_id = tx_id;
> +
> +						/* Move to next segemnt. */
> +						tx_id = sw_ring[tx_id].next_id;
> +
> +					} while (tx_id != tx_next);
> +
> +					if (unlikely(count == (int)free_cnt))
> +						break;
> +				} else
> +					/*
> +					 * mbuf still in use, nothing left to
> +					 * free.
> +					 */
> +					break;
> +			} else {
> +				/*
> +				 * There are multiple reasons to be here:
> +				 * 1) All the packets on the ring have been
> +				 *    freed - tx_id is equal to tx_first
> +				 *    and some packets have been freed.
> +				 *    - Done, exit
> +				 * 2) Interfaces has not sent a rings worth of
> +				 *    packets yet, so the segment after tail is
> +				 *    still empty. Or a previous call to this
> +				 *    function freed some of the segments but
> +				 *    not all so there is a hole in the list.
> +				 *    Hopefully this is a rare case.
> +				 *    - Walk the list and find the next mbuf. If
> +				 *      there isn't one, then done.
> +				 */
> +				if (likely((tx_id == tx_first) && (count != 0)))
> +					break;
> +
> +				/*
> +				 * Walk the list and find the next mbuf, if any.
> +				 */
> +				do {
> +					/* Move to next segemnt. */
> +					tx_id = sw_ring[tx_id].next_id;
> +
> +					if (sw_ring[tx_id].mbuf)
> +						break;
> +
> +				} while (tx_id != tx_first);
> +
> +				/*
> +				 * Determine why previous loop bailed. If
> there
> +				 * is not an mbuf, done.
> +				 */
> +				if (sw_ring[tx_id].mbuf == NULL)
> +					break;
> +			}
> +		}
> +	} else
> +		count = -ENODEV;
> +
> +	return count;
> +}
> +
> +int
> +eth_igb_tx_done_cleanup(void *txq, uint32_t free_cnt) {
> +	return igb_tx_done_cleanup(txq, free_cnt); }
> +
>  static void
>  igb_reset_tx_queue_stat(struct igb_tx_queue *txq)  {
> --
> 2.9.3
I think the code is good. But I don't understand the whole idea. An API means APP need call it. When should APP call This API? Does APP need to check after sending every packet? Or just call it periodically? 

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [dpdk-dev] [PATCH v3 2/3] net/e1000: e1000 igb support to free consumed buffers
  2017-01-22  3:47   ` Lu, Wenzhuo
@ 2017-01-23 13:49     ` Billy McFall
  2017-01-24  0:42       ` Lu, Wenzhuo
  0 siblings, 1 reply; 54+ messages in thread
From: Billy McFall @ 2017-01-23 13:49 UTC (permalink / raw)
  To: Lu, Wenzhuo; +Cc: thomas.monjalon, dev

There are two scenarios the APP calls this API (there may be others, but
the API was designed with these two in mind):

1) APP receives a packet that needs to be flooded. Instead of making a copy
of the packet (whether it is the whole packet, just the header or a clone
buffer is used, as done today), the APP increments the reference count,
adjusts the header and then sends the packet to one of the destination
ports. The APP then polls this API until the reference count is
decremented, and forwards to the next destination port with any header
modifications needed.

2) An APP like a packet generation completes a run and then wants to reset
for another run. It can call this API to free and mbufs that have been
cache by the driver so it can start another run with the full set of mbufs.
Maybe it is switching interfaces that are on another core and want to
reclaim any mbufs that have been cached.

So this API is intended to be called on an as need basis, not on every
packet. When either a particular mbuf is desired or the mbufs for a
particular interface need to be released.

Hope this helps,
Billy McFall

On Sat, Jan 21, 2017 at 10:47 PM, Lu, Wenzhuo <wenzhuo.lu@intel.com> wrote:

> Hi Billy,
>
> > -----Original Message-----
> > From: Billy McFall [mailto:bmcfall@redhat.com]
> > Sent: Saturday, January 21, 2017 12:01 AM
> > To: thomas.monjalon@6wind.com; Lu, Wenzhuo
> > Cc: dev@dpdk.org; Billy McFall
> > Subject: [PATCH v3 2/3] net/e1000: e1000 igb support to free consumed
> > buffers
> >
> > Add support to the e1000 igb driver for the new API to force free
> consumed
> > buffers on Tx ring. e1000 igb driver does not implement a tx_rs_thresh
> to free
> > mbufs, it frees a slot in the ring as needed, so a new function needed
> to be
> > written.
> >
> > Signed-off-by: Billy McFall <bmcfall@redhat.com>
> > ---
> >  drivers/net/e1000/e1000_ethdev.h |   2 +
> >  drivers/net/e1000/igb_ethdev.c   |   1 +
> >  drivers/net/e1000/igb_rxtx.c     | 126
> > +++++++++++++++++++++++++++++++++++++++
> >  3 files changed, 129 insertions(+)
> >
> > diff --git a/drivers/net/e1000/e1000_ethdev.h
> > b/drivers/net/e1000/e1000_ethdev.h
> > index 81a6dbb..39b2f43 100644
> > --- a/drivers/net/e1000/e1000_ethdev.h
> > +++ b/drivers/net/e1000/e1000_ethdev.h
> > @@ -315,6 +315,8 @@ int eth_igb_tx_queue_setup(struct rte_eth_dev *dev,
> > uint16_t tx_queue_id,
> >               uint16_t nb_tx_desc, unsigned int socket_id,
> >               const struct rte_eth_txconf *tx_conf);
> >
> > +int eth_igb_tx_done_cleanup(void *txq, uint32_t free_cnt);
> > +
> >  int eth_igb_rx_init(struct rte_eth_dev *dev);
> >
> >  void eth_igb_tx_init(struct rte_eth_dev *dev); diff --git
> > a/drivers/net/e1000/igb_ethdev.c b/drivers/net/e1000/igb_ethdev.c index
> > 8843dd1..12a1a30 100644
> > --- a/drivers/net/e1000/igb_ethdev.c
> > +++ b/drivers/net/e1000/igb_ethdev.c
> > @@ -408,6 +408,7 @@ static const struct eth_dev_ops eth_igb_ops = {
> >       .rx_descriptor_done   = eth_igb_rx_descriptor_done,
> >       .tx_queue_setup       = eth_igb_tx_queue_setup,
> >       .tx_queue_release     = eth_igb_tx_queue_release,
> > +     .tx_done_cleanup      = eth_igb_tx_done_cleanup,
> >       .dev_led_on           = eth_igb_led_on,
> >       .dev_led_off          = eth_igb_led_off,
> >       .flow_ctrl_get        = eth_igb_flow_ctrl_get,
> > diff --git a/drivers/net/e1000/igb_rxtx.c b/drivers/net/e1000/igb_rxtx.c
> index
> > 45f3f24..00cd2aa 100644
> > --- a/drivers/net/e1000/igb_rxtx.c
> > +++ b/drivers/net/e1000/igb_rxtx.c
> > @@ -1277,6 +1277,132 @@ eth_igb_tx_queue_release(void *txq)
> >       igb_tx_queue_release(txq);
> >  }
> >
> > +static int
> > +igb_tx_done_cleanup(struct igb_tx_queue *txq, uint32_t free_cnt) {
> > +     struct igb_tx_entry *sw_ring;
> > +     volatile union e1000_adv_tx_desc *txr;
> > +     uint16_t tx_first; /* First segment analyzed. */
> > +     uint16_t tx_id;    /* Current segment being processed. */
> > +     uint16_t tx_last;  /* Last segment in the current packet. */
> > +     uint16_t tx_next;  /* First segment of the next packet. */
> > +     int count;
> > +
> > +     if (txq != NULL) {
> > +             count = 0;
> > +             sw_ring = txq->sw_ring;
> > +             txr = txq->tx_ring;
> > +
> > +             /*
> > +              * tx_tail is the last sent packet on the sw_ring. Goto
> the end
> > +              * of that packet (the last segment in the packet chain)
> and
> > +              * then the next segment will be the start of the oldest
> > segment
> > +              * in the sw_ring. This is the first packet that will be
> > +              * attempted to be freed.
> > +              */
> > +
> > +             /* Get last segment in most recently added packet. */
> > +             tx_first = sw_ring[txq->tx_tail].last_id;
> > +
> > +             /* Get the next segment, which is the oldest segment in
> ring.
> > */
> > +             tx_first = sw_ring[tx_first].next_id;
> > +
> > +             /* Set the current index to the first. */
> > +             tx_id = tx_first;
> > +
> > +             /*
> > +              * Loop through each packet. For each packet, verify that
> an
> > +              * mbuf exists and that the last segment is free. If so,
> free
> > +              * it and move on.
> > +              */
> > +             while (1) {
> > +                     tx_last = sw_ring[tx_id].last_id;
> > +
> > +                     if (sw_ring[tx_last].mbuf) {
> > +                             if (txr[tx_last].wb.status &
> > +                                             E1000_TXD_STAT_DD) {
> > +                                     /*
> > +                                      * Increment the number of packets
> > +                                      * freed.
> > +                                      */
> > +                                     count++;
> > +
> > +                                     /* Get the start of the next
> packet. */
> > +                                     tx_next = sw_ring[tx_last].next_id;
> > +
> > +                                     /*
> > +                                      * Loop through all segments in a
> > +                                      * packet.
> > +                                      */
> > +                                     do {
> > +
> >       rte_pktmbuf_free_seg(sw_ring[tx_id].mbuf);
> > +                                             sw_ring[tx_id].mbuf = NULL;
> > +                                             sw_ring[tx_id].last_id =
> tx_id;
> > +
> > +                                             /* Move to next segemnt. */
> > +                                             tx_id =
> sw_ring[tx_id].next_id;
> > +
> > +                                     } while (tx_id != tx_next);
> > +
> > +                                     if (unlikely(count ==
> (int)free_cnt))
> > +                                             break;
> > +                             } else
> > +                                     /*
> > +                                      * mbuf still in use, nothing left
> to
> > +                                      * free.
> > +                                      */
> > +                                     break;
> > +                     } else {
> > +                             /*
> > +                              * There are multiple reasons to be here:
> > +                              * 1) All the packets on the ring have been
> > +                              *    freed - tx_id is equal to tx_first
> > +                              *    and some packets have been freed.
> > +                              *    - Done, exit
> > +                              * 2) Interfaces has not sent a rings
> worth of
> > +                              *    packets yet, so the segment after
> tail is
> > +                              *    still empty. Or a previous call to
> this
> > +                              *    function freed some of the segments
> but
> > +                              *    not all so there is a hole in the
> list.
> > +                              *    Hopefully this is a rare case.
> > +                              *    - Walk the list and find the next
> mbuf. If
> > +                              *      there isn't one, then done.
> > +                              */
> > +                             if (likely((tx_id == tx_first) && (count
> != 0)))
> > +                                     break;
> > +
> > +                             /*
> > +                              * Walk the list and find the next mbuf,
> if any.
> > +                              */
> > +                             do {
> > +                                     /* Move to next segemnt. */
> > +                                     tx_id = sw_ring[tx_id].next_id;
> > +
> > +                                     if (sw_ring[tx_id].mbuf)
> > +                                             break;
> > +
> > +                             } while (tx_id != tx_first);
> > +
> > +                             /*
> > +                              * Determine why previous loop bailed. If
> > there
> > +                              * is not an mbuf, done.
> > +                              */
> > +                             if (sw_ring[tx_id].mbuf == NULL)
> > +                                     break;
> > +                     }
> > +             }
> > +     } else
> > +             count = -ENODEV;
> > +
> > +     return count;
> > +}
> > +
> > +int
> > +eth_igb_tx_done_cleanup(void *txq, uint32_t free_cnt) {
> > +     return igb_tx_done_cleanup(txq, free_cnt); }
> > +
> >  static void
> >  igb_reset_tx_queue_stat(struct igb_tx_queue *txq)  {
> > --
> > 2.9.3
> I think the code is good. But I don't understand the whole idea. An API
> means APP need call it. When should APP call This API? Does APP need to
> check after sending every packet? Or just call it periodically?
>
>

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [dpdk-dev] [PATCH v3 0/3] new API to free consumed buffers in Tx ring
  2017-01-20 16:01 [dpdk-dev] [PATCH v3 0/3] new API to free consumed buffers in Tx ring Billy McFall
                   ` (2 preceding siblings ...)
  2017-01-20 16:01 ` [dpdk-dev] [PATCH v3 3/3] net/vhost: vHost " Billy McFall
@ 2017-01-23 15:25 ` Thomas Monjalon
  2017-01-23 21:13 ` [dpdk-dev] [PATCH v4 " Billy McFall
  4 siblings, 0 replies; 54+ messages in thread
From: Thomas Monjalon @ 2017-01-23 15:25 UTC (permalink / raw)
  To: Billy McFall; +Cc: wenzhuo.lu, dev

Hi,

Thanks for bringing a new convenient API.

2017-01-20 11:01, Billy McFall:
> Based on a request from Damjan Marion and seconded by Keith Wiles, see
> dpdk-dev mailing list from 11/21/2016,

Generally we use links to ML archives to show prior discussions.

> add a new API to free consumed
> buffers on TX ring. This addresses two scenarios:
> 1) Flooding a packet and want to reuse existing mbuf to avoid a packet
> copy. Increment the reference count of the packet and poll new API until
> reference count is decremented.
> 2) Application runs out of mbufs, or resets and is preparing for
> additional run, call API to free consumed packets so processing can
> continue.
> 
> API will return the number of packets freed (0-n) or error code if
> feature not supported (-ENOTSUP) or input invalid (-ENODEV).
> 
> API for e1000 igb driver and vHost driver have been implemented. Other
> drivers can be implemented over time. Some drivers implement a Tx done
> flush routine that should be reused where possible. e1000 igb driver
> and vHost driver do not have such functions.

Before considering to use such API, user apps may check how well it
is supported by drivers in this table:
	http://dpdk.org/doc/guides/nics/overview.html

Please add this feature in the appropriate files:
	doc/guides/nics/features/default.ini in patch 1
	doc/guides/nics/features/e1000.ini in patch 2
	doc/guides/nics/features/vhost.ini in patch 3

Then it will be introduced in the beginning of the 17.05 cycle.
So more drivers could be implemented before the 17.05 release.
Does it sound good?

PS: please use --in-reply-to when sending new versions to keep track
of versions changes and discussions.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* [dpdk-dev] [PATCH v4 0/3] new API to free consumed buffers in Tx ring
  2017-01-20 16:01 [dpdk-dev] [PATCH v3 0/3] new API to free consumed buffers in Tx ring Billy McFall
                   ` (3 preceding siblings ...)
  2017-01-23 15:25 ` [dpdk-dev] [PATCH v3 0/3] new API to free consumed buffers in Tx ring Thomas Monjalon
@ 2017-01-23 21:13 ` Billy McFall
  2017-01-23 21:13   ` [dpdk-dev] [PATCH v4 1/3] ethdev: " Billy McFall
                     ` (3 more replies)
  4 siblings, 4 replies; 54+ messages in thread
From: Billy McFall @ 2017-01-23 21:13 UTC (permalink / raw)
  To: thomas.monjalon, wenzhuo.lu; +Cc: dev, Billy McFall

See request from 11/21/2016:
  http://dpdk.org/ml/archives/dev/2016-November/050585.html

Add a new API to free consumed buffers on TX ring. This addresses two
scenarios:
1) Flooding a packet and want to reuse existing mbuf to avoid a packet
copy. Increment the reference count of the packet and poll new API until
reference count is decremented.
2) Application runs out of mbufs, or resets and is preparing for
additional run, call API to free consumed packets so processing can
continue.

API will return the number of packets freed (0-n) or error code if
feature not supported (-ENOTSUP) or input invalid (-ENODEV).

API for e1000 igb driver and vHost driver have been implemented. Other
drivers can be implemented over time. Some drivers implement a Tx done
flush routine that should be reused where possible. e1000 igb driver
and vHost driver do not have such functions.

---
v2:
* Removed rte_eth_tx_buffer_flush() call and associated parameters
  from new rte_eth_tx_done_cleanup() API.

* Remaining open issue is whether this should be a new API or overload 
  existing rte_eth_tx_buffer() API with input parameter nb_pkts set to
  0. My concern is that overloading existing API will not provided
  enough feedback to application. Application needs to know if feature
  is supported and driver is attempting to free mbufs or not.

* If new API is supported, second open issue is if parameter free_cnt
  should be included. It was in the original feature request.

---
v3:
* Removed extra white space in rte_ethdev.h.
* Removed inline of new API function in rte_ethdev.h.

---
v4:
* Added new API to documentation of per nic supported features.


Billy McFall (3):
  ethdev: new API to free consumed buffers in Tx ring
  net/e1000: e1000 igb support to free consumed buffers
  net/vhost: vHost support to free consumed buffers

 doc/guides/nics/features/default.ini |   1 +
 doc/guides/nics/features/e1000.ini   |   1 +
 doc/guides/nics/features/vhost.ini   |   1 +
 drivers/net/e1000/e1000_ethdev.h     |   2 +
 drivers/net/e1000/igb_ethdev.c       |   1 +
 drivers/net/e1000/igb_rxtx.c         | 126 +++++++++++++++++++++++++++++++++++
 drivers/net/vhost/rte_eth_vhost.c    |  11 +++
 lib/librte_ether/rte_ethdev.c        |  14 ++++
 lib/librte_ether/rte_ethdev.h        |  31 +++++++++
 9 files changed, 188 insertions(+)

-- 
2.9.3

^ permalink raw reply	[flat|nested] 54+ messages in thread

* [dpdk-dev] [PATCH v4 1/3] ethdev: new API to free consumed buffers in Tx ring
  2017-01-23 21:13 ` [dpdk-dev] [PATCH v4 " Billy McFall
@ 2017-01-23 21:13   ` Billy McFall
  2017-01-23 21:13   ` [dpdk-dev] [PATCH v4 2/3] net/e1000: e1000 igb support to free consumed buffers Billy McFall
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 54+ messages in thread
From: Billy McFall @ 2017-01-23 21:13 UTC (permalink / raw)
  To: thomas.monjalon, wenzhuo.lu; +Cc: dev, Billy McFall

Add a new API to force free consumed buffers on Tx ring. API will return
the number of packets freed (0-n) or error code if feature not supported
(-ENOTSUP) or input invalid (-ENODEV).

Signed-off-by: Billy McFall <bmcfall@redhat.com>
---
 doc/guides/nics/features/default.ini |  1 +
 lib/librte_ether/rte_ethdev.c        | 14 ++++++++++++++
 lib/librte_ether/rte_ethdev.h        | 31 +++++++++++++++++++++++++++++++
 3 files changed, 46 insertions(+)

diff --git a/doc/guides/nics/features/default.ini b/doc/guides/nics/features/default.ini
index 6e4a043..e2d8c83 100644
--- a/doc/guides/nics/features/default.ini
+++ b/doc/guides/nics/features/default.ini
@@ -55,6 +55,7 @@ FW version           =
 EEPROM dump          =
 Registers dump       =
 Multiprocess aware   =
+Free TX ring buffers =
 BSD nic_uio          =
 Linux UIO            =
 Linux VFIO           =
diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index 4790faf..10b907e 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -1253,6 +1253,20 @@ rte_eth_tx_buffer_set_err_callback(struct rte_eth_dev_tx_buffer *buffer,
 }
 
 int
+rte_eth_tx_done_cleanup(uint8_t port_id, uint16_t queue_id,  uint32_t free_cnt)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+
+	/* Validate Input Data. Bail if not valid or not supported. */
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
+	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_done_cleanup, -ENOTSUP);
+
+	/* Call driver to free pending mbufs. */
+	return (*dev->dev_ops->tx_done_cleanup)(dev->data->tx_queues[queue_id],
+			free_cnt);
+}
+
+int
 rte_eth_tx_buffer_init(struct rte_eth_dev_tx_buffer *buffer, uint16_t size)
 {
 	int ret = 0;
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index c17bbda..b23886c 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -1183,6 +1183,9 @@ typedef int (*eth_fw_version_get_t)(struct rte_eth_dev *dev,
 				     char *fw_version, size_t fw_size);
 /**< @internal Get firmware information of an Ethernet device. */
 
+typedef int (*eth_tx_done_cleanup_t)(void *txq, uint32_t free_cnt);
+/**< @internal Force mbufs to be from TX ring. */
+
 typedef void (*eth_rxq_info_get_t)(struct rte_eth_dev *dev,
 	uint16_t rx_queue_id, struct rte_eth_rxq_info *qinfo);
 
@@ -1487,6 +1490,7 @@ struct eth_dev_ops {
 	eth_rx_disable_intr_t      rx_queue_intr_disable; /**< Disable Rx queue interrupt. */
 	eth_tx_queue_setup_t       tx_queue_setup;/**< Set up device TX queue. */
 	eth_queue_release_t        tx_queue_release; /**< Release TX queue. */
+	eth_tx_done_cleanup_t      tx_done_cleanup;/**< Free tx ring mbufs */
 
 	eth_dev_led_on_t           dev_led_on;    /**< Turn on LED. */
 	eth_dev_led_off_t          dev_led_off;   /**< Turn off LED. */
@@ -3091,6 +3095,33 @@ rte_eth_tx_buffer(uint8_t port_id, uint16_t queue_id,
 }
 
 /**
+ * Request the driver to free mbufs currently cached by the driver. The
+ * driver will only free the mbuf if it is no longer in use. It is the
+ * application's responsibity to ensure rte_eth_tx_buffer_flush(..) is
+ * called if needed.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param queue_id
+ *   The index of the transmit queue through which output packets must be
+ *   sent.
+ *   The value must be in the range [0, nb_tx_queue - 1] previously supplied
+ *   to rte_eth_dev_configure().
+ * @param free_cnt
+ *   Maximum number of packets to free. Use 0 to indicate all possible packets
+ *   should be freed. Note that a packet may be using multiple mbufs.
+ * @return
+ *   Failure: < 0
+ *     -ENODEV: Invalid interface
+ *     -ENOTSUP: Driver does not support function
+ *   Success: >= 0
+ *     0-n: Number of packets freed. More packets may still remain in ring that
+ *     are in use.
+ */
+int
+rte_eth_tx_done_cleanup(uint8_t port_id, uint16_t queue_id,  uint32_t free_cnt);
+
+/**
  * Configure a callback for buffered packets which cannot be sent
  *
  * Register a specific callback to be called when an attempt is made to send
-- 
2.9.3

^ permalink raw reply	[flat|nested] 54+ messages in thread

* [dpdk-dev] [PATCH v4 2/3] net/e1000: e1000 igb support to free consumed buffers
  2017-01-23 21:13 ` [dpdk-dev] [PATCH v4 " Billy McFall
  2017-01-23 21:13   ` [dpdk-dev] [PATCH v4 1/3] ethdev: " Billy McFall
@ 2017-01-23 21:13   ` Billy McFall
  2017-01-23 21:13   ` [dpdk-dev] [PATCH v4 3/3] net/vhost: vHost " Billy McFall
  2017-01-27 18:37   ` [dpdk-dev] [PATCH v5 0/3] new API to free consumed buffers in Tx ring Billy McFall
  3 siblings, 0 replies; 54+ messages in thread
From: Billy McFall @ 2017-01-23 21:13 UTC (permalink / raw)
  To: thomas.monjalon, wenzhuo.lu; +Cc: dev, Billy McFall

Add support to the e1000 igb driver for the new API to force free
consumed buffers on Tx ring. e1000 igb driver does not implement a
tx_rs_thresh to free mbufs, it frees a slot in the ring as needed, so
a new function needed to be written.

Signed-off-by: Billy McFall <bmcfall@redhat.com>
---
 doc/guides/nics/features/e1000.ini |   1 +
 drivers/net/e1000/e1000_ethdev.h   |   2 +
 drivers/net/e1000/igb_ethdev.c     |   1 +
 drivers/net/e1000/igb_rxtx.c       | 126 +++++++++++++++++++++++++++++++++++++
 4 files changed, 130 insertions(+)

diff --git a/doc/guides/nics/features/e1000.ini b/doc/guides/nics/features/e1000.ini
index 7f6d55c..d4dc7e5 100644
--- a/doc/guides/nics/features/e1000.ini
+++ b/doc/guides/nics/features/e1000.ini
@@ -21,6 +21,7 @@ QinQ offload         = Y
 L3 checksum offload  = Y
 L4 checksum offload  = Y
 Basic stats          = Y
+Free TX ring buffers = Y
 BSD nic_uio          = Y
 Linux UIO            = Y
 Linux VFIO           = Y
diff --git a/drivers/net/e1000/e1000_ethdev.h b/drivers/net/e1000/e1000_ethdev.h
index 81a6dbb..39b2f43 100644
--- a/drivers/net/e1000/e1000_ethdev.h
+++ b/drivers/net/e1000/e1000_ethdev.h
@@ -315,6 +315,8 @@ int eth_igb_tx_queue_setup(struct rte_eth_dev *dev, uint16_t tx_queue_id,
 		uint16_t nb_tx_desc, unsigned int socket_id,
 		const struct rte_eth_txconf *tx_conf);
 
+int eth_igb_tx_done_cleanup(void *txq, uint32_t free_cnt);
+
 int eth_igb_rx_init(struct rte_eth_dev *dev);
 
 void eth_igb_tx_init(struct rte_eth_dev *dev);
diff --git a/drivers/net/e1000/igb_ethdev.c b/drivers/net/e1000/igb_ethdev.c
index 8843dd1..12a1a30 100644
--- a/drivers/net/e1000/igb_ethdev.c
+++ b/drivers/net/e1000/igb_ethdev.c
@@ -408,6 +408,7 @@ static const struct eth_dev_ops eth_igb_ops = {
 	.rx_descriptor_done   = eth_igb_rx_descriptor_done,
 	.tx_queue_setup       = eth_igb_tx_queue_setup,
 	.tx_queue_release     = eth_igb_tx_queue_release,
+	.tx_done_cleanup      = eth_igb_tx_done_cleanup,
 	.dev_led_on           = eth_igb_led_on,
 	.dev_led_off          = eth_igb_led_off,
 	.flow_ctrl_get        = eth_igb_flow_ctrl_get,
diff --git a/drivers/net/e1000/igb_rxtx.c b/drivers/net/e1000/igb_rxtx.c
index 45f3f24..00cd2aa 100644
--- a/drivers/net/e1000/igb_rxtx.c
+++ b/drivers/net/e1000/igb_rxtx.c
@@ -1277,6 +1277,132 @@ eth_igb_tx_queue_release(void *txq)
 	igb_tx_queue_release(txq);
 }
 
+static int
+igb_tx_done_cleanup(struct igb_tx_queue *txq, uint32_t free_cnt)
+{
+	struct igb_tx_entry *sw_ring;
+	volatile union e1000_adv_tx_desc *txr;
+	uint16_t tx_first; /* First segment analyzed. */
+	uint16_t tx_id;    /* Current segment being processed. */
+	uint16_t tx_last;  /* Last segment in the current packet. */
+	uint16_t tx_next;  /* First segment of the next packet. */
+	int count;
+
+	if (txq != NULL) {
+		count = 0;
+		sw_ring = txq->sw_ring;
+		txr = txq->tx_ring;
+
+		/*
+		 * tx_tail is the last sent packet on the sw_ring. Goto the end
+		 * of that packet (the last segment in the packet chain) and
+		 * then the next segment will be the start of the oldest segment
+		 * in the sw_ring. This is the first packet that will be
+		 * attempted to be freed.
+		 */
+
+		/* Get last segment in most recently added packet. */
+		tx_first = sw_ring[txq->tx_tail].last_id;
+
+		/* Get the next segment, which is the oldest segment in ring. */
+		tx_first = sw_ring[tx_first].next_id;
+
+		/* Set the current index to the first. */
+		tx_id = tx_first;
+
+		/*
+		 * Loop through each packet. For each packet, verify that an
+		 * mbuf exists and that the last segment is free. If so, free
+		 * it and move on.
+		 */
+		while (1) {
+			tx_last = sw_ring[tx_id].last_id;
+
+			if (sw_ring[tx_last].mbuf) {
+				if (txr[tx_last].wb.status &
+						E1000_TXD_STAT_DD) {
+					/*
+					 * Increment the number of packets
+					 * freed.
+					 */
+					count++;
+
+					/* Get the start of the next packet. */
+					tx_next = sw_ring[tx_last].next_id;
+
+					/*
+					 * Loop through all segments in a
+					 * packet.
+					 */
+					do {
+						rte_pktmbuf_free_seg(sw_ring[tx_id].mbuf);
+						sw_ring[tx_id].mbuf = NULL;
+						sw_ring[tx_id].last_id = tx_id;
+
+						/* Move to next segemnt. */
+						tx_id = sw_ring[tx_id].next_id;
+
+					} while (tx_id != tx_next);
+
+					if (unlikely(count == (int)free_cnt))
+						break;
+				} else
+					/*
+					 * mbuf still in use, nothing left to
+					 * free.
+					 */
+					break;
+			} else {
+				/*
+				 * There are multiple reasons to be here:
+				 * 1) All the packets on the ring have been
+				 *    freed - tx_id is equal to tx_first
+				 *    and some packets have been freed.
+				 *    - Done, exit
+				 * 2) Interfaces has not sent a rings worth of
+				 *    packets yet, so the segment after tail is
+				 *    still empty. Or a previous call to this
+				 *    function freed some of the segments but
+				 *    not all so there is a hole in the list.
+				 *    Hopefully this is a rare case.
+				 *    - Walk the list and find the next mbuf. If
+				 *      there isn't one, then done.
+				 */
+				if (likely((tx_id == tx_first) && (count != 0)))
+					break;
+
+				/*
+				 * Walk the list and find the next mbuf, if any.
+				 */
+				do {
+					/* Move to next segemnt. */
+					tx_id = sw_ring[tx_id].next_id;
+
+					if (sw_ring[tx_id].mbuf)
+						break;
+
+				} while (tx_id != tx_first);
+
+				/*
+				 * Determine why previous loop bailed. If there
+				 * is not an mbuf, done.
+				 */
+				if (sw_ring[tx_id].mbuf == NULL)
+					break;
+			}
+		}
+	} else
+		count = -ENODEV;
+
+	return count;
+}
+
+int
+eth_igb_tx_done_cleanup(void *txq, uint32_t free_cnt)
+{
+	return igb_tx_done_cleanup(txq, free_cnt);
+}
+
 static void
 igb_reset_tx_queue_stat(struct igb_tx_queue *txq)
 {
-- 
2.9.3

^ permalink raw reply	[flat|nested] 54+ messages in thread

* [dpdk-dev] [PATCH v4 3/3] net/vhost: vHost support to free consumed buffers
  2017-01-23 21:13 ` [dpdk-dev] [PATCH v4 " Billy McFall
  2017-01-23 21:13   ` [dpdk-dev] [PATCH v4 1/3] ethdev: " Billy McFall
  2017-01-23 21:13   ` [dpdk-dev] [PATCH v4 2/3] net/e1000: e1000 igb support to free consumed buffers Billy McFall
@ 2017-01-23 21:13   ` Billy McFall
  2017-01-27 18:37   ` [dpdk-dev] [PATCH v5 0/3] new API to free consumed buffers in Tx ring Billy McFall
  3 siblings, 0 replies; 54+ messages in thread
From: Billy McFall @ 2017-01-23 21:13 UTC (permalink / raw)
  To: thomas.monjalon, wenzhuo.lu; +Cc: dev, Billy McFall

Add support to the vHostdriver for the new API to force free consumed
buffers on Tx ring. vHost does not cache the mbufs so there is no work
to do.

Signed-off-by: Billy McFall <bmcfall@redhat.com>
---
 doc/guides/nics/features/vhost.ini |  1 +
 drivers/net/vhost/rte_eth_vhost.c  | 11 +++++++++++
 2 files changed, 12 insertions(+)

diff --git a/doc/guides/nics/features/vhost.ini b/doc/guides/nics/features/vhost.ini
index 23166fb..83fea03 100644
--- a/doc/guides/nics/features/vhost.ini
+++ b/doc/guides/nics/features/vhost.ini
@@ -9,5 +9,6 @@ Link status event    = Y
 Queue status event   = Y
 Basic stats          = Y
 Extended stats       = Y
+Free TX ring buffers = Y
 x86-32               = Y
 x86-64               = Y
diff --git a/drivers/net/vhost/rte_eth_vhost.c b/drivers/net/vhost/rte_eth_vhost.c
index 848a3da..6674536 100644
--- a/drivers/net/vhost/rte_eth_vhost.c
+++ b/drivers/net/vhost/rte_eth_vhost.c
@@ -934,6 +934,16 @@ eth_queue_release(void *q)
 }
 
 static int
+eth_tx_done_cleanup(void *txq __rte_unused, uint32_t free_cnt __rte_unused)
+{
+	/*
+	 * vHost does not hang onto mbuf. eth_vhost_tx() copies packet data
+	 * and releases mbuf, so nothing to cleanup.
+	 */
+	return 0;
+}
+
+static int
 eth_link_update(struct rte_eth_dev *dev __rte_unused,
 		int wait_to_complete __rte_unused)
 {
@@ -974,6 +984,7 @@ static const struct eth_dev_ops ops = {
 	.tx_queue_setup = eth_tx_queue_setup,
 	.rx_queue_release = eth_queue_release,
 	.tx_queue_release = eth_queue_release,
+	.tx_done_cleanup = eth_tx_done_cleanup,
 	.link_update = eth_link_update,
 	.stats_get = eth_stats_get,
 	.stats_reset = eth_stats_reset,
-- 
2.9.3

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [dpdk-dev] [PATCH v3 2/3] net/e1000: e1000 igb support to free consumed buffers
  2017-01-23 13:49     ` Billy McFall
@ 2017-01-24  0:42       ` Lu, Wenzhuo
  0 siblings, 0 replies; 54+ messages in thread
From: Lu, Wenzhuo @ 2017-01-24  0:42 UTC (permalink / raw)
  To: Billy McFall; +Cc: thomas.monjalon, dev

Hi Billy,
Thanks for your explanation. It makes a picture to me ☺


Best regards
Wenzhuo Lu

From: Billy McFall [mailto:bmcfall@redhat.com]
Sent: Monday, January 23, 2017 9:49 PM
To: Lu, Wenzhuo
Cc: thomas.monjalon@6wind.com; dev@dpdk.org
Subject: Re: [PATCH v3 2/3] net/e1000: e1000 igb support to free consumed buffers

There are two scenarios the APP calls this API (there may be others, but the API was designed with these two in mind):

1) APP receives a packet that needs to be flooded. Instead of making a copy of the packet (whether it is the whole packet, just the header or a clone buffer is used, as done today), the APP increments the reference count, adjusts the header and then sends the packet to one of the destination ports. The APP then polls this API until the reference count is decremented, and forwards to the next destination port with any header modifications needed.

2) An APP like a packet generation completes a run and then wants to reset for another run. It can call this API to free and mbufs that have been cache by the driver so it can start another run with the full set of mbufs. Maybe it is switching interfaces that are on another core and want to reclaim any mbufs that have been cached.

So this API is intended to be called on an as need basis, not on every packet. When either a particular mbuf is desired or the mbufs for a particular interface need to be released.

Hope this helps,
Billy McFall

On Sat, Jan 21, 2017 at 10:47 PM, Lu, Wenzhuo <wenzhuo.lu@intel.com<mailto:wenzhuo.lu@intel.com>> wrote:
Hi Billy,

> -----Original Message-----
> From: Billy McFall [mailto:bmcfall@redhat.com<mailto:bmcfall@redhat.com>]
> Sent: Saturday, January 21, 2017 12:01 AM
> To: thomas.monjalon@6wind.com<mailto:thomas.monjalon@6wind.com>; Lu, Wenzhuo
> Cc: dev@dpdk.org<mailto:dev@dpdk.org>; Billy McFall
> Subject: [PATCH v3 2/3] net/e1000: e1000 igb support to free consumed
> buffers
>
> Add support to the e1000 igb driver for the new API to force free consumed
> buffers on Tx ring. e1000 igb driver does not implement a tx_rs_thresh to free
> mbufs, it frees a slot in the ring as needed, so a new function needed to be
> written.
>
> Signed-off-by: Billy McFall <bmcfall@redhat.com<mailto:bmcfall@redhat.com>>
> ---
>  drivers/net/e1000/e1000_ethdev.h |   2 +
>  drivers/net/e1000/igb_ethdev.c   |   1 +
>  drivers/net/e1000/igb_rxtx.c     | 126
> +++++++++++++++++++++++++++++++++++++++
>  3 files changed, 129 insertions(+)
>
> diff --git a/drivers/net/e1000/e1000_ethdev.h
> b/drivers/net/e1000/e1000_ethdev.h
> index 81a6dbb..39b2f43 100644
> --- a/drivers/net/e1000/e1000_ethdev.h
> +++ b/drivers/net/e1000/e1000_ethdev.h
> @@ -315,6 +315,8 @@ int eth_igb_tx_queue_setup(struct rte_eth_dev *dev,
> uint16_t tx_queue_id,
>               uint16_t nb_tx_desc, unsigned int socket_id,
>               const struct rte_eth_txconf *tx_conf);
>
> +int eth_igb_tx_done_cleanup(void *txq, uint32_t free_cnt);
> +
>  int eth_igb_rx_init(struct rte_eth_dev *dev);
>
>  void eth_igb_tx_init(struct rte_eth_dev *dev); diff --git
> a/drivers/net/e1000/igb_ethdev.c b/drivers/net/e1000/igb_ethdev.c index
> 8843dd1..12a1a30 100644
> --- a/drivers/net/e1000/igb_ethdev.c
> +++ b/drivers/net/e1000/igb_ethdev.c
> @@ -408,6 +408,7 @@ static const struct eth_dev_ops eth_igb_ops = {
>       .rx_descriptor_done   = eth_igb_rx_descriptor_done,
>       .tx_queue_setup       = eth_igb_tx_queue_setup,
>       .tx_queue_release     = eth_igb_tx_queue_release,
> +     .tx_done_cleanup      = eth_igb_tx_done_cleanup,
>       .dev_led_on           = eth_igb_led_on,
>       .dev_led_off          = eth_igb_led_off,
>       .flow_ctrl_get        = eth_igb_flow_ctrl_get,
> diff --git a/drivers/net/e1000/igb_rxtx.c b/drivers/net/e1000/igb_rxtx.c index
> 45f3f24..00cd2aa 100644
> --- a/drivers/net/e1000/igb_rxtx.c
> +++ b/drivers/net/e1000/igb_rxtx.c
> @@ -1277,6 +1277,132 @@ eth_igb_tx_queue_release(void *txq)
>       igb_tx_queue_release(txq);
>  }
>
> +static int
> +igb_tx_done_cleanup(struct igb_tx_queue *txq, uint32_t free_cnt) {
> +     struct igb_tx_entry *sw_ring;
> +     volatile union e1000_adv_tx_desc *txr;
> +     uint16_t tx_first; /* First segment analyzed. */
> +     uint16_t tx_id;    /* Current segment being processed. */
> +     uint16_t tx_last;  /* Last segment in the current packet. */
> +     uint16_t tx_next;  /* First segment of the next packet. */
> +     int count;
> +
> +     if (txq != NULL) {
> +             count = 0;
> +             sw_ring = txq->sw_ring;
> +             txr = txq->tx_ring;
> +
> +             /*
> +              * tx_tail is the last sent packet on the sw_ring. Goto the end
> +              * of that packet (the last segment in the packet chain) and
> +              * then the next segment will be the start of the oldest
> segment
> +              * in the sw_ring. This is the first packet that will be
> +              * attempted to be freed.
> +              */
> +
> +             /* Get last segment in most recently added packet. */
> +             tx_first = sw_ring[txq->tx_tail].last_id;
> +
> +             /* Get the next segment, which is the oldest segment in ring.
> */
> +             tx_first = sw_ring[tx_first].next_id;
> +
> +             /* Set the current index to the first. */
> +             tx_id = tx_first;
> +
> +             /*
> +              * Loop through each packet. For each packet, verify that an
> +              * mbuf exists and that the last segment is free. If so, free
> +              * it and move on.
> +              */
> +             while (1) {
> +                     tx_last = sw_ring[tx_id].last_id;
> +
> +                     if (sw_ring[tx_last].mbuf) {
> +                             if (txr[tx_last].wb.status &
> +                                             E1000_TXD_STAT_DD) {
> +                                     /*
> +                                      * Increment the number of packets
> +                                      * freed.
> +                                      */
> +                                     count++;
> +
> +                                     /* Get the start of the next packet. */
> +                                     tx_next = sw_ring[tx_last].next_id;
> +
> +                                     /*
> +                                      * Loop through all segments in a
> +                                      * packet.
> +                                      */
> +                                     do {
> +
>       rte_pktmbuf_free_seg(sw_ring[tx_id].mbuf);
> +                                             sw_ring[tx_id].mbuf = NULL;
> +                                             sw_ring[tx_id].last_id = tx_id;
> +
> +                                             /* Move to next segemnt. */
> +                                             tx_id = sw_ring[tx_id].next_id;
> +
> +                                     } while (tx_id != tx_next);
> +
> +                                     if (unlikely(count == (int)free_cnt))
> +                                             break;
> +                             } else
> +                                     /*
> +                                      * mbuf still in use, nothing left to
> +                                      * free.
> +                                      */
> +                                     break;
> +                     } else {
> +                             /*
> +                              * There are multiple reasons to be here:
> +                              * 1) All the packets on the ring have been
> +                              *    freed - tx_id is equal to tx_first
> +                              *    and some packets have been freed.
> +                              *    - Done, exit
> +                              * 2) Interfaces has not sent a rings worth of
> +                              *    packets yet, so the segment after tail is
> +                              *    still empty. Or a previous call to this
> +                              *    function freed some of the segments but
> +                              *    not all so there is a hole in the list.
> +                              *    Hopefully this is a rare case.
> +                              *    - Walk the list and find the next mbuf. If
> +                              *      there isn't one, then done.
> +                              */
> +                             if (likely((tx_id == tx_first) && (count != 0)))
> +                                     break;
> +
> +                             /*
> +                              * Walk the list and find the next mbuf, if any.
> +                              */
> +                             do {
> +                                     /* Move to next segemnt. */
> +                                     tx_id = sw_ring[tx_id].next_id;
> +
> +                                     if (sw_ring[tx_id].mbuf)
> +                                             break;
> +
> +                             } while (tx_id != tx_first);
> +
> +                             /*
> +                              * Determine why previous loop bailed. If
> there
> +                              * is not an mbuf, done.
> +                              */
> +                             if (sw_ring[tx_id].mbuf == NULL)
> +                                     break;
> +                     }
> +             }
> +     } else
> +             count = -ENODEV;
> +
> +     return count;
> +}
> +
> +int
> +eth_igb_tx_done_cleanup(void *txq, uint32_t free_cnt) {
> +     return igb_tx_done_cleanup(txq, free_cnt); }
> +
>  static void
>  igb_reset_tx_queue_stat(struct igb_tx_queue *txq)  {
> --
> 2.9.3
I think the code is good. But I don't understand the whole idea. An API means APP need call it. When should APP call This API? Does APP need to check after sending every packet? Or just call it periodically?


^ permalink raw reply	[flat|nested] 54+ messages in thread

* [dpdk-dev] [PATCH v5 0/3] new API to free consumed buffers in Tx ring
  2017-01-23 21:13 ` [dpdk-dev] [PATCH v4 " Billy McFall
                     ` (2 preceding siblings ...)
  2017-01-23 21:13   ` [dpdk-dev] [PATCH v4 3/3] net/vhost: vHost " Billy McFall
@ 2017-01-27 18:37   ` Billy McFall
  2017-01-27 18:37     ` [dpdk-dev] [PATCH v5 1/3] ethdev: " Billy McFall
                       ` (4 more replies)
  3 siblings, 5 replies; 54+ messages in thread
From: Billy McFall @ 2017-01-27 18:37 UTC (permalink / raw)
  To: thomas.monjalon, wenzhuo.lu, olivier.matz; +Cc: dev, Billy McFall

See request from 11/21/2016:
  http://dpdk.org/ml/archives/dev/2016-November/050585.html

Add a new API to free consumed buffers on TX ring. This addresses two
scenarios:
1) Flooding a packet and want to reuse existing mbuf to avoid a packet
copy. Increment the reference count of the packet and poll new API until
reference count is decremented.
2) Application runs out of mbufs, or resets and is preparing for
additional run, call API to free consumed packets so processing can
continue.

API will return the number of packets freed (0-n) or error code if
feature not supported (-ENOTSUP) or input invalid (-ENODEV).

API for e1000 igb driver and vHost driver have been implemented. Other
drivers can be implemented over time. Some drivers implement a Tx done
flush routine that should be reused where possible. e1000 igb driver
and vHost driver do not have such functions.

---
v2:
* Removed rte_eth_tx_buffer_flush() call and associated parameters
  from new rte_eth_tx_done_cleanup() API.

* Remaining open issue is whether this should be a new API or overload 
  existing rte_eth_tx_buffer() API with input parameter nb_pkts set to
  0. My concern is that overloading existing API will not provided
  enough feedback to application. Application needs to know if feature
  is supported and driver is attempting to free mbufs or not.

* If new API is supported, second open issue is if parameter free_cnt
  should be included. It was in the original feature request.

---
v3:
* Removed extra white space in rte_ethdev.h.
* Removed inline of new API function in rte_ethdev.h.

---
v4:
* Added new API to documentation of per nic supported features.

---
v5:
* Added documentation to the Programmer's Guide.


Billy McFall (3):
  ethdev: new API to free consumed buffers in Tx ring
  net/e1000: e1000 igb support to free consumed buffers
  net/vhost: vHost support to free consumed buffers

 doc/guides/nics/features/default.ini  |   1 +
 doc/guides/nics/features/e1000.ini    |   1 +
 doc/guides/nics/features/vhost.ini    |   1 +
 doc/guides/prog_guide/mempool_lib.rst |  29 ++++++++
 drivers/net/e1000/e1000_ethdev.h      |   2 +
 drivers/net/e1000/igb_ethdev.c        |   1 +
 drivers/net/e1000/igb_rxtx.c          | 126 ++++++++++++++++++++++++++++++++++
 drivers/net/vhost/rte_eth_vhost.c     |  11 +++
 lib/librte_ether/rte_ethdev.c         |  14 ++++
 lib/librte_ether/rte_ethdev.h         |  31 +++++++++
 10 files changed, 217 insertions(+)

-- 
2.9.3

^ permalink raw reply	[flat|nested] 54+ messages in thread

* [dpdk-dev] [PATCH v5 1/3] ethdev: new API to free consumed buffers in Tx ring
  2017-01-27 18:37   ` [dpdk-dev] [PATCH v5 0/3] new API to free consumed buffers in Tx ring Billy McFall
@ 2017-01-27 18:37     ` Billy McFall
  2017-02-27 13:48       ` Thomas Monjalon
  2017-03-07 16:42       ` Mcnamara, John
  2017-01-27 18:37     ` [dpdk-dev] [PATCH v5 2/3] net/e1000: e1000 igb support to free consumed buffers Billy McFall
                       ` (3 subsequent siblings)
  4 siblings, 2 replies; 54+ messages in thread
From: Billy McFall @ 2017-01-27 18:37 UTC (permalink / raw)
  To: thomas.monjalon, wenzhuo.lu, olivier.matz; +Cc: dev, Billy McFall

Add a new API to force free consumed buffers on Tx ring. API will return
the number of packets freed (0-n) or error code if feature not supported
(-ENOTSUP) or input invalid (-ENODEV).

Signed-off-by: Billy McFall <bmcfall@redhat.com>
---
 doc/guides/nics/features/default.ini  |  1 +
 doc/guides/prog_guide/mempool_lib.rst | 29 +++++++++++++++++++++++++++++
 lib/librte_ether/rte_ethdev.c         | 14 ++++++++++++++
 lib/librte_ether/rte_ethdev.h         | 31 +++++++++++++++++++++++++++++++
 4 files changed, 75 insertions(+)

diff --git a/doc/guides/nics/features/default.ini b/doc/guides/nics/features/default.ini
index 6e4a043..e2d8c83 100644
--- a/doc/guides/nics/features/default.ini
+++ b/doc/guides/nics/features/default.ini
@@ -55,6 +55,7 @@ FW version           =
 EEPROM dump          =
 Registers dump       =
 Multiprocess aware   =
+Free TX ring buffers =
 BSD nic_uio          =
 Linux UIO            =
 Linux VFIO           =
diff --git a/doc/guides/prog_guide/mempool_lib.rst b/doc/guides/prog_guide/mempool_lib.rst
index ffdc109..92c6fd5 100644
--- a/doc/guides/prog_guide/mempool_lib.rst
+++ b/doc/guides/prog_guide/mempool_lib.rst
@@ -132,6 +132,35 @@ These user-owned caches can be explicitly passed to ``rte_mempool_generic_put()`
 The ``rte_mempool_default_cache()`` call returns the default internal cache if any.
 In contrast to the default caches, user-owned caches can be used by non-EAL threads too.
 
+
+Driver Cache
+~~~~~~~~~~~~
+
+In addition to the a core’s local cache, many of the drivers don't release the mbuf back to the mempool, or local cache,
+immediately after the packet has been transmitted.
+Instead, they leave the mbuf in their txRing and either perform a bulk release when the tx_rs_thresh has been crossed
+or free the mbuf when a slot in the txRing is needed.
+
+An application can request the driver to release used mbufs with the ``rte_eth_tx_done_cleanup()`` API.
+This API requests the driver to release mbufs that are no longer in use, independent of whether or not the tx_rs_thresh
+has been crossed.
+There are two scenarios when an application may want the mbuf back immediately:
+
+* When a given packet needs to be sent to multiple destination interfaces (either for Layer 2 flooding or Layer 3 multi-cast).
+  One option is to make a copy of the packet or a copy of the header portion that needs to manipulated.
+  A second option is to transmit the packet and then poll the ``rte_eth_tx_done_cleanup()`` API until the reference count
+  on the packet is decremented.
+  Then the same packet can be transmitted to the next destination interface.
+
+* If an application is designed to make multiple runs, like a packet generator, and one run has completed.
+  The application may want to reset to a clean state.
+  In this case, it may want to call the ``rte_eth_tx_done_cleanup()`` API to request each destination interface it has been
+  using to release all of its used mbufs.
+
+To determine if a driver supports this API, check for the *Free TX ring buffers* feature in the *Network Interface
+Controller Drivers* document.
+
+
 Mempool Handlers
 ------------------------
 
diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index 61f44e2..1cbf6d0 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -1253,6 +1253,20 @@ rte_eth_tx_buffer_set_err_callback(struct rte_eth_dev_tx_buffer *buffer,
 }
 
 int
+rte_eth_tx_done_cleanup(uint8_t port_id, uint16_t queue_id,  uint32_t free_cnt)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+
+	/* Validate Input Data. Bail if not valid or not supported. */
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
+	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_done_cleanup, -ENOTSUP);
+
+	/* Call driver to free pending mbufs. */
+	return (*dev->dev_ops->tx_done_cleanup)(dev->data->tx_queues[queue_id],
+			free_cnt);
+}
+
+int
 rte_eth_tx_buffer_init(struct rte_eth_dev_tx_buffer *buffer, uint16_t size)
 {
 	int ret = 0;
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index c17bbda..b23886c 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -1183,6 +1183,9 @@ typedef int (*eth_fw_version_get_t)(struct rte_eth_dev *dev,
 				     char *fw_version, size_t fw_size);
 /**< @internal Get firmware information of an Ethernet device. */
 
+typedef int (*eth_tx_done_cleanup_t)(void *txq, uint32_t free_cnt);
+/**< @internal Force mbufs to be from TX ring. */
+
 typedef void (*eth_rxq_info_get_t)(struct rte_eth_dev *dev,
 	uint16_t rx_queue_id, struct rte_eth_rxq_info *qinfo);
 
@@ -1487,6 +1490,7 @@ struct eth_dev_ops {
 	eth_rx_disable_intr_t      rx_queue_intr_disable; /**< Disable Rx queue interrupt. */
 	eth_tx_queue_setup_t       tx_queue_setup;/**< Set up device TX queue. */
 	eth_queue_release_t        tx_queue_release; /**< Release TX queue. */
+	eth_tx_done_cleanup_t      tx_done_cleanup;/**< Free tx ring mbufs */
 
 	eth_dev_led_on_t           dev_led_on;    /**< Turn on LED. */
 	eth_dev_led_off_t          dev_led_off;   /**< Turn off LED. */
@@ -3091,6 +3095,33 @@ rte_eth_tx_buffer(uint8_t port_id, uint16_t queue_id,
 }
 
 /**
+ * Request the driver to free mbufs currently cached by the driver. The
+ * driver will only free the mbuf if it is no longer in use. It is the
+ * application's responsibity to ensure rte_eth_tx_buffer_flush(..) is
+ * called if needed.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param queue_id
+ *   The index of the transmit queue through which output packets must be
+ *   sent.
+ *   The value must be in the range [0, nb_tx_queue - 1] previously supplied
+ *   to rte_eth_dev_configure().
+ * @param free_cnt
+ *   Maximum number of packets to free. Use 0 to indicate all possible packets
+ *   should be freed. Note that a packet may be using multiple mbufs.
+ * @return
+ *   Failure: < 0
+ *     -ENODEV: Invalid interface
+ *     -ENOTSUP: Driver does not support function
+ *   Success: >= 0
+ *     0-n: Number of packets freed. More packets may still remain in ring that
+ *     are in use.
+ */
+int
+rte_eth_tx_done_cleanup(uint8_t port_id, uint16_t queue_id,  uint32_t free_cnt);
+
+/**
  * Configure a callback for buffered packets which cannot be sent
  *
  * Register a specific callback to be called when an attempt is made to send
-- 
2.9.3

^ permalink raw reply	[flat|nested] 54+ messages in thread

* [dpdk-dev] [PATCH v5 2/3] net/e1000: e1000 igb support to free consumed buffers
  2017-01-27 18:37   ` [dpdk-dev] [PATCH v5 0/3] new API to free consumed buffers in Tx ring Billy McFall
  2017-01-27 18:37     ` [dpdk-dev] [PATCH v5 1/3] ethdev: " Billy McFall
@ 2017-01-27 18:37     ` Billy McFall
  2017-02-27 13:49       ` Thomas Monjalon
  2017-02-28  1:07       ` Lu, Wenzhuo
  2017-01-27 18:38     ` [dpdk-dev] [PATCH v5 3/3] net/vhost: vHost " Billy McFall
                       ` (2 subsequent siblings)
  4 siblings, 2 replies; 54+ messages in thread
From: Billy McFall @ 2017-01-27 18:37 UTC (permalink / raw)
  To: thomas.monjalon, wenzhuo.lu, olivier.matz; +Cc: dev, Billy McFall

Add support to the e1000 igb driver for the new API to force free
consumed buffers on Tx ring. e1000 igb driver does not implement a
tx_rs_thresh to free mbufs, it frees a slot in the ring as needed, so
a new function needed to be written.

Signed-off-by: Billy McFall <bmcfall@redhat.com>
---
 doc/guides/nics/features/e1000.ini |   1 +
 drivers/net/e1000/e1000_ethdev.h   |   2 +
 drivers/net/e1000/igb_ethdev.c     |   1 +
 drivers/net/e1000/igb_rxtx.c       | 126 +++++++++++++++++++++++++++++++++++++
 4 files changed, 130 insertions(+)

diff --git a/doc/guides/nics/features/e1000.ini b/doc/guides/nics/features/e1000.ini
index 7f6d55c..d4dc7e5 100644
--- a/doc/guides/nics/features/e1000.ini
+++ b/doc/guides/nics/features/e1000.ini
@@ -21,6 +21,7 @@ QinQ offload         = Y
 L3 checksum offload  = Y
 L4 checksum offload  = Y
 Basic stats          = Y
+Free TX ring buffers = Y
 BSD nic_uio          = Y
 Linux UIO            = Y
 Linux VFIO           = Y
diff --git a/drivers/net/e1000/e1000_ethdev.h b/drivers/net/e1000/e1000_ethdev.h
index 81a6dbb..39b2f43 100644
--- a/drivers/net/e1000/e1000_ethdev.h
+++ b/drivers/net/e1000/e1000_ethdev.h
@@ -315,6 +315,8 @@ int eth_igb_tx_queue_setup(struct rte_eth_dev *dev, uint16_t tx_queue_id,
 		uint16_t nb_tx_desc, unsigned int socket_id,
 		const struct rte_eth_txconf *tx_conf);
 
+int eth_igb_tx_done_cleanup(void *txq, uint32_t free_cnt);
+
 int eth_igb_rx_init(struct rte_eth_dev *dev);
 
 void eth_igb_tx_init(struct rte_eth_dev *dev);
diff --git a/drivers/net/e1000/igb_ethdev.c b/drivers/net/e1000/igb_ethdev.c
index 8843dd1..12a1a30 100644
--- a/drivers/net/e1000/igb_ethdev.c
+++ b/drivers/net/e1000/igb_ethdev.c
@@ -408,6 +408,7 @@ static const struct eth_dev_ops eth_igb_ops = {
 	.rx_descriptor_done   = eth_igb_rx_descriptor_done,
 	.tx_queue_setup       = eth_igb_tx_queue_setup,
 	.tx_queue_release     = eth_igb_tx_queue_release,
+	.tx_done_cleanup      = eth_igb_tx_done_cleanup,
 	.dev_led_on           = eth_igb_led_on,
 	.dev_led_off          = eth_igb_led_off,
 	.flow_ctrl_get        = eth_igb_flow_ctrl_get,
diff --git a/drivers/net/e1000/igb_rxtx.c b/drivers/net/e1000/igb_rxtx.c
index 45f3f24..00cd2aa 100644
--- a/drivers/net/e1000/igb_rxtx.c
+++ b/drivers/net/e1000/igb_rxtx.c
@@ -1277,6 +1277,132 @@ eth_igb_tx_queue_release(void *txq)
 	igb_tx_queue_release(txq);
 }
 
+static int
+igb_tx_done_cleanup(struct igb_tx_queue *txq, uint32_t free_cnt)
+{
+	struct igb_tx_entry *sw_ring;
+	volatile union e1000_adv_tx_desc *txr;
+	uint16_t tx_first; /* First segment analyzed. */
+	uint16_t tx_id;    /* Current segment being processed. */
+	uint16_t tx_last;  /* Last segment in the current packet. */
+	uint16_t tx_next;  /* First segment of the next packet. */
+	int count;
+
+	if (txq != NULL) {
+		count = 0;
+		sw_ring = txq->sw_ring;
+		txr = txq->tx_ring;
+
+		/*
+		 * tx_tail is the last sent packet on the sw_ring. Goto the end
+		 * of that packet (the last segment in the packet chain) and
+		 * then the next segment will be the start of the oldest segment
+		 * in the sw_ring. This is the first packet that will be
+		 * attempted to be freed.
+		 */
+
+		/* Get last segment in most recently added packet. */
+		tx_first = sw_ring[txq->tx_tail].last_id;
+
+		/* Get the next segment, which is the oldest segment in ring. */
+		tx_first = sw_ring[tx_first].next_id;
+
+		/* Set the current index to the first. */
+		tx_id = tx_first;
+
+		/*
+		 * Loop through each packet. For each packet, verify that an
+		 * mbuf exists and that the last segment is free. If so, free
+		 * it and move on.
+		 */
+		while (1) {
+			tx_last = sw_ring[tx_id].last_id;
+
+			if (sw_ring[tx_last].mbuf) {
+				if (txr[tx_last].wb.status &
+						E1000_TXD_STAT_DD) {
+					/*
+					 * Increment the number of packets
+					 * freed.
+					 */
+					count++;
+
+					/* Get the start of the next packet. */
+					tx_next = sw_ring[tx_last].next_id;
+
+					/*
+					 * Loop through all segments in a
+					 * packet.
+					 */
+					do {
+						rte_pktmbuf_free_seg(sw_ring[tx_id].mbuf);
+						sw_ring[tx_id].mbuf = NULL;
+						sw_ring[tx_id].last_id = tx_id;
+
+						/* Move to next segemnt. */
+						tx_id = sw_ring[tx_id].next_id;
+
+					} while (tx_id != tx_next);
+
+					if (unlikely(count == (int)free_cnt))
+						break;
+				} else
+					/*
+					 * mbuf still in use, nothing left to
+					 * free.
+					 */
+					break;
+			} else {
+				/*
+				 * There are multiple reasons to be here:
+				 * 1) All the packets on the ring have been
+				 *    freed - tx_id is equal to tx_first
+				 *    and some packets have been freed.
+				 *    - Done, exit
+				 * 2) Interfaces has not sent a rings worth of
+				 *    packets yet, so the segment after tail is
+				 *    still empty. Or a previous call to this
+				 *    function freed some of the segments but
+				 *    not all so there is a hole in the list.
+				 *    Hopefully this is a rare case.
+				 *    - Walk the list and find the next mbuf. If
+				 *      there isn't one, then done.
+				 */
+				if (likely((tx_id == tx_first) && (count != 0)))
+					break;
+
+				/*
+				 * Walk the list and find the next mbuf, if any.
+				 */
+				do {
+					/* Move to next segemnt. */
+					tx_id = sw_ring[tx_id].next_id;
+
+					if (sw_ring[tx_id].mbuf)
+						break;
+
+				} while (tx_id != tx_first);
+
+				/*
+				 * Determine why previous loop bailed. If there
+				 * is not an mbuf, done.
+				 */
+				if (sw_ring[tx_id].mbuf == NULL)
+					break;
+			}
+		}
+	} else
+		count = -ENODEV;
+
+	return count;
+}
+
+int
+eth_igb_tx_done_cleanup(void *txq, uint32_t free_cnt)
+{
+	return igb_tx_done_cleanup(txq, free_cnt);
+}
+
 static void
 igb_reset_tx_queue_stat(struct igb_tx_queue *txq)
 {
-- 
2.9.3

^ permalink raw reply	[flat|nested] 54+ messages in thread

* [dpdk-dev] [PATCH v5 3/3] net/vhost: vHost support to free consumed buffers
  2017-01-27 18:37   ` [dpdk-dev] [PATCH v5 0/3] new API to free consumed buffers in Tx ring Billy McFall
  2017-01-27 18:37     ` [dpdk-dev] [PATCH v5 1/3] ethdev: " Billy McFall
  2017-01-27 18:37     ` [dpdk-dev] [PATCH v5 2/3] net/e1000: e1000 igb support to free consumed buffers Billy McFall
@ 2017-01-27 18:38     ` Billy McFall
  2017-02-27 13:50       ` Thomas Monjalon
  2017-03-07 21:59     ` [dpdk-dev] [PATCH v5 0/3] new API to free consumed buffers in Tx ring Thomas Monjalon
  2017-03-09 20:51     ` [dpdk-dev] [PATCH v6 " Billy McFall
  4 siblings, 1 reply; 54+ messages in thread
From: Billy McFall @ 2017-01-27 18:38 UTC (permalink / raw)
  To: thomas.monjalon, wenzhuo.lu, olivier.matz; +Cc: dev, Billy McFall

Add support to the vHostdriver for the new API to force free consumed
buffers on Tx ring. vHost does not cache the mbufs so there is no work
to do.

Signed-off-by: Billy McFall <bmcfall@redhat.com>
---
 doc/guides/nics/features/vhost.ini |  1 +
 drivers/net/vhost/rte_eth_vhost.c  | 11 +++++++++++
 2 files changed, 12 insertions(+)

diff --git a/doc/guides/nics/features/vhost.ini b/doc/guides/nics/features/vhost.ini
index 23166fb..83fea03 100644
--- a/doc/guides/nics/features/vhost.ini
+++ b/doc/guides/nics/features/vhost.ini
@@ -9,5 +9,6 @@ Link status event    = Y
 Queue status event   = Y
 Basic stats          = Y
 Extended stats       = Y
+Free TX ring buffers = Y
 x86-32               = Y
 x86-64               = Y
diff --git a/drivers/net/vhost/rte_eth_vhost.c b/drivers/net/vhost/rte_eth_vhost.c
index 848a3da..6674536 100644
--- a/drivers/net/vhost/rte_eth_vhost.c
+++ b/drivers/net/vhost/rte_eth_vhost.c
@@ -934,6 +934,16 @@ eth_queue_release(void *q)
 }
 
 static int
+eth_tx_done_cleanup(void *txq __rte_unused, uint32_t free_cnt __rte_unused)
+{
+	/*
+	 * vHost does not hang onto mbuf. eth_vhost_tx() copies packet data
+	 * and releases mbuf, so nothing to cleanup.
+	 */
+	return 0;
+}
+
+static int
 eth_link_update(struct rte_eth_dev *dev __rte_unused,
 		int wait_to_complete __rte_unused)
 {
@@ -974,6 +984,7 @@ static const struct eth_dev_ops ops = {
 	.tx_queue_setup = eth_tx_queue_setup,
 	.rx_queue_release = eth_queue_release,
 	.tx_queue_release = eth_queue_release,
+	.tx_done_cleanup = eth_tx_done_cleanup,
 	.link_update = eth_link_update,
 	.stats_get = eth_stats_get,
 	.stats_reset = eth_stats_reset,
-- 
2.9.3

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [dpdk-dev] [PATCH v5 1/3] ethdev: new API to free consumed buffers in Tx ring
  2017-01-27 18:37     ` [dpdk-dev] [PATCH v5 1/3] ethdev: " Billy McFall
@ 2017-02-27 13:48       ` Thomas Monjalon
  2017-03-07 14:29         ` Billy McFall
  2017-03-07 16:42       ` Mcnamara, John
  1 sibling, 1 reply; 54+ messages in thread
From: Thomas Monjalon @ 2017-02-27 13:48 UTC (permalink / raw)
  To: Billy McFall; +Cc: wenzhuo.lu, olivier.matz, dev

2017-01-27 13:37, Billy McFall:
> --- a/doc/guides/nics/features/default.ini
> +++ b/doc/guides/nics/features/default.ini
> @@ -55,6 +55,7 @@ FW version           =
>  EEPROM dump          =
>  Registers dump       =
>  Multiprocess aware   =
> +Free TX ring buffers =

I'm afraid this wording will be confusing, because every drivers
free their buffers :)
What about "Free Tx mbuf on demand" ?

And please, move this line upper, just after "Rx interrupt".

We also need to carefully review the doc you provided (thanks).
First quick comment, please wrap lines shorter in the doc.

About the function prototype, I've seen a double space :)
I think you could use rte_errno (while keeping negative return codes).

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [dpdk-dev] [PATCH v5 2/3] net/e1000: e1000 igb support to free consumed buffers
  2017-01-27 18:37     ` [dpdk-dev] [PATCH v5 2/3] net/e1000: e1000 igb support to free consumed buffers Billy McFall
@ 2017-02-27 13:49       ` Thomas Monjalon
  2017-02-28  1:07       ` Lu, Wenzhuo
  1 sibling, 0 replies; 54+ messages in thread
From: Thomas Monjalon @ 2017-02-27 13:49 UTC (permalink / raw)
  To: wenzhuo.lu; +Cc: Billy McFall, olivier.matz, dev

2017-01-27 13:37, Billy McFall:
> Add support to the e1000 igb driver for the new API to force free
> consumed buffers on Tx ring. e1000 igb driver does not implement a
> tx_rs_thresh to free mbufs, it frees a slot in the ring as needed, so
> a new function needed to be written.
> 
> Signed-off-by: Billy McFall <bmcfall@redhat.com>
> ---
>  doc/guides/nics/features/e1000.ini |   1 +
>  drivers/net/e1000/e1000_ethdev.h   |   2 +
>  drivers/net/e1000/igb_ethdev.c     |   1 +
>  drivers/net/e1000/igb_rxtx.c       | 126 +++++++++++++++++++++++++++++++++++++
>  4 files changed, 130 insertions(+)

Please maintainers, we need review here.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [dpdk-dev] [PATCH v5 3/3] net/vhost: vHost support to free consumed buffers
  2017-01-27 18:38     ` [dpdk-dev] [PATCH v5 3/3] net/vhost: vHost " Billy McFall
@ 2017-02-27 13:50       ` Thomas Monjalon
  2017-02-28  6:41         ` Yuanhan Liu
  0 siblings, 1 reply; 54+ messages in thread
From: Thomas Monjalon @ 2017-02-27 13:50 UTC (permalink / raw)
  To: yuanhan.liu, Maxime Coquelin; +Cc: Billy McFall, wenzhuo.lu, olivier.matz, dev

2017-01-27 13:38, Billy McFall:
> Add support to the vHostdriver for the new API to force free consumed
> buffers on Tx ring. vHost does not cache the mbufs so there is no work
> to do.
> 
> Signed-off-by: Billy McFall <bmcfall@redhat.com>

Yuanhan, Maxime, do you agree with this empty function?

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [dpdk-dev] [PATCH v5 2/3] net/e1000: e1000 igb support to free consumed buffers
  2017-01-27 18:37     ` [dpdk-dev] [PATCH v5 2/3] net/e1000: e1000 igb support to free consumed buffers Billy McFall
  2017-02-27 13:49       ` Thomas Monjalon
@ 2017-02-28  1:07       ` Lu, Wenzhuo
  1 sibling, 0 replies; 54+ messages in thread
From: Lu, Wenzhuo @ 2017-02-28  1:07 UTC (permalink / raw)
  To: Billy McFall, thomas.monjalon, olivier.matz; +Cc: dev

Hi Billy,

> -----Original Message-----
> From: Billy McFall [mailto:bmcfall@redhat.com]
> Sent: Saturday, January 28, 2017 2:38 AM
> To: thomas.monjalon@6wind.com; Lu, Wenzhuo; olivier.matz@6wind.com
> Cc: dev@dpdk.org; Billy McFall
> Subject: [PATCH v5 2/3] net/e1000: e1000 igb support to free consumed buffers
> 
> Add support to the e1000 igb driver for the new API to force free consumed
> buffers on Tx ring. e1000 igb driver does not implement a tx_rs_thresh to free
> mbufs, it frees a slot in the ring as needed, so a new function needed to be
> written.
There's another question about this commit log. As we know tx_rs_thresh is not supported by igb. But I think it can be implemented.
Your idea is to add an API, so APP can free the buffers explicitly.  It's a different idea with threshold, right? Seems this log is a little misleading.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [dpdk-dev] [PATCH v5 3/3] net/vhost: vHost support to free consumed buffers
  2017-02-27 13:50       ` Thomas Monjalon
@ 2017-02-28  6:41         ` Yuanhan Liu
  2017-03-01 10:15           ` Maxime Coquelin
  0 siblings, 1 reply; 54+ messages in thread
From: Yuanhan Liu @ 2017-02-28  6:41 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: Maxime Coquelin, Billy McFall, wenzhuo.lu, olivier.matz, dev

On Mon, Feb 27, 2017 at 02:50:54PM +0100, Thomas Monjalon wrote:
> 2017-01-27 13:38, Billy McFall:
> > Add support to the vHostdriver for the new API to force free consumed
> > buffers on Tx ring. vHost does not cache the mbufs so there is no work
> > to do.
> > 
> > Signed-off-by: Billy McFall <bmcfall@redhat.com>
> 
> Yuanhan, Maxime, do you agree with this empty function?

For this particular (vhost) implementation, I think it's okay.

	--yliu

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [dpdk-dev] [PATCH v5 3/3] net/vhost: vHost support to free consumed buffers
  2017-02-28  6:41         ` Yuanhan Liu
@ 2017-03-01 10:15           ` Maxime Coquelin
  0 siblings, 0 replies; 54+ messages in thread
From: Maxime Coquelin @ 2017-03-01 10:15 UTC (permalink / raw)
  To: Yuanhan Liu, Thomas Monjalon; +Cc: Billy McFall, wenzhuo.lu, olivier.matz, dev



On 02/28/2017 07:41 AM, Yuanhan Liu wrote:
> On Mon, Feb 27, 2017 at 02:50:54PM +0100, Thomas Monjalon wrote:
>> 2017-01-27 13:38, Billy McFall:
>>> Add support to the vHostdriver for the new API to force free consumed
>>> buffers on Tx ring. vHost does not cache the mbufs so there is no work
>>> to do.
>>>
>>> Signed-off-by: Billy McFall <bmcfall@redhat.com>
>>
>> Yuanhan, Maxime, do you agree with this empty function?
>
> For this particular (vhost) implementation, I think it's okay.

Agree with Yuanhan, it looks good for vhost case:
Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>

Thanks,
Maxime

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [dpdk-dev] [PATCH v5 1/3] ethdev: new API to free consumed buffers in Tx ring
  2017-02-27 13:48       ` Thomas Monjalon
@ 2017-03-07 14:29         ` Billy McFall
  2017-03-07 16:03           ` Thomas Monjalon
  2017-03-07 16:35           ` Mcnamara, John
  0 siblings, 2 replies; 54+ messages in thread
From: Billy McFall @ 2017-03-07 14:29 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: wenzhuo.lu, olivier.matz, dev

Thomas,

Thanks for your comments. See inline.

On Mon, Feb 27, 2017 at 8:48 AM, Thomas Monjalon <thomas.monjalon@6wind.com>
wrote:

> 2017-01-27 13:37, Billy McFall:
> > --- a/doc/guides/nics/features/default.ini
> > +++ b/doc/guides/nics/features/default.ini
> > @@ -55,6 +55,7 @@ FW version           =
> >  EEPROM dump          =
> >  Registers dump       =
> >  Multiprocess aware   =
> > +Free TX ring buffers =
>
> I'm afraid this wording will be confusing, because every drivers
> free their buffers :)
> What about "Free Tx mbuf on demand" ?
>
>
I definitely like your wording of the feature better than mine. All the
existing features were under 20 characters and I was trying to stay under
that.


> And please, move this line upper, just after "Rx interrupt".
>
> Done


> We also need to carefully review the doc you provided (thanks).
> First quick comment, please wrap lines shorter in the doc.
>
> Done


> About the function prototype, I've seen a double space :)
>

Done


> I think you could use rte_errno (while keeping negative return codes).
>

I can do that if you want, but if I understand your comment, it will make
the implementation of the function not as clean. I cannot use the existing
RTE_ETH_VALID_PORTID_OR_ERR_RET(..) and RTE_FUNC_PTR_OR_ERR_RET(..) MACROs
because they are handling the return on error. Or am I missing something?

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [dpdk-dev] [PATCH v5 1/3] ethdev: new API to free consumed buffers in Tx ring
  2017-03-07 14:29         ` Billy McFall
@ 2017-03-07 16:03           ` Thomas Monjalon
  2017-03-09 15:49             ` Billy McFall
  2017-03-07 16:35           ` Mcnamara, John
  1 sibling, 1 reply; 54+ messages in thread
From: Thomas Monjalon @ 2017-03-07 16:03 UTC (permalink / raw)
  To: Billy McFall; +Cc: wenzhuo.lu, olivier.matz, dev, adrien.mazarguil

2017-03-07 09:29, Billy McFall:
> On Mon, Feb 27, 2017 at 8:48 AM, Thomas Monjalon <thomas.monjalon@6wind.com>
> wrote:
> > I think you could use rte_errno (while keeping negative return codes).
> >
> 
> I can do that if you want, but if I understand your comment, it will make
> the implementation of the function not as clean. I cannot use the existing
> RTE_ETH_VALID_PORTID_OR_ERR_RET(..) and RTE_FUNC_PTR_OR_ERR_RET(..) MACROs
> because they are handling the return on error. Or am I missing something?

Yes. Maybe we need new macros for basic error management with rte_errno.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [dpdk-dev] [PATCH v5 1/3] ethdev: new API to free consumed buffers in Tx ring
  2017-03-07 14:29         ` Billy McFall
  2017-03-07 16:03           ` Thomas Monjalon
@ 2017-03-07 16:35           ` Mcnamara, John
  1 sibling, 0 replies; 54+ messages in thread
From: Mcnamara, John @ 2017-03-07 16:35 UTC (permalink / raw)
  To: Billy McFall, Thomas Monjalon; +Cc: Lu, Wenzhuo, olivier.matz, dev



> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Billy McFall
> Sent: Tuesday, March 7, 2017 2:30 PM
> To: Thomas Monjalon <thomas.monjalon@6wind.com>
> Cc: Lu, Wenzhuo <wenzhuo.lu@intel.com>; olivier.matz@6wind.com;
> dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v5 1/3] ethdev: new API to free consumed
> buffers in Tx ring
> 
> Thomas,
> 
> Thanks for your comments. See inline.
> 
> On Mon, Feb 27, 2017 at 8:48 AM, Thomas Monjalon
> <thomas.monjalon@6wind.com>
> wrote:
> 
> > 2017-01-27 13:37, Billy McFall:
> > > --- a/doc/guides/nics/features/default.ini
> > > +++ b/doc/guides/nics/features/default.ini
> > > @@ -55,6 +55,7 @@ FW version           =
> > >  EEPROM dump          =
> > >  Registers dump       =
> > >  Multiprocess aware   =
> > > +Free TX ring buffers =
> >
> > I'm afraid this wording will be confusing, because every drivers free
> > their buffers :) What about "Free Tx mbuf on demand" ?
> >
> >
> I definitely like your wording of the feature better than mine. All the
> existing features were under 20 characters and I was trying to stay under
> that.

Hi Billy,

"Free Tx mbuf on demand" looks clearer to me as well. You will probably have to
modify doc/guides/conf.py to support labels longer than 20 characters. Like the
following, or better still a global variable for feature width.


$ git diff
diff --git a/doc/guides/conf.py b/doc/guides/conf.py
index 34c62de..3f3ce00 100644
--- a/doc/guides/conf.py
+++ b/doc/guides/conf.py
@@ -300,7 +300,7 @@ def print_table_body(outfile, num_cols, ini_files, ini_data, default_features):
 def print_table_row(outfile, feature, line):
     """ Print a single row of the table with fixed formatting. """
     line = line.rstrip()
-    print('   {:<20}{}'.format(feature, line), file=outfile)
+    print('   {:<25}{}'.format(feature, line), file=outfile)
 
 
 def print_table_divider(outfile, num_cols):
@@ -309,7 +309,7 @@ def print_table_divider(outfile, num_cols):
     column_dividers = ['='] * num_cols
     line += ' '.join(column_dividers)
 
-    feature = '=' * 20
+    feature = '=' * 25
 
     print_table_row(outfile, feature, line)

John



^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [dpdk-dev] [PATCH v5 1/3] ethdev: new API to free consumed buffers in Tx ring
  2017-01-27 18:37     ` [dpdk-dev] [PATCH v5 1/3] ethdev: " Billy McFall
  2017-02-27 13:48       ` Thomas Monjalon
@ 2017-03-07 16:42       ` Mcnamara, John
  1 sibling, 0 replies; 54+ messages in thread
From: Mcnamara, John @ 2017-03-07 16:42 UTC (permalink / raw)
  To: Billy McFall, thomas.monjalon, Lu, Wenzhuo, olivier.matz; +Cc: dev



> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Billy McFall
> Sent: Friday, January 27, 2017 6:38 PM
> To: thomas.monjalon@6wind.com; Lu, Wenzhuo <wenzhuo.lu@intel.com>;
> olivier.matz@6wind.com
> Cc: dev@dpdk.org; Billy McFall <bmcfall@redhat.com>
> Subject: [dpdk-dev] [PATCH v5 1/3] ethdev: new API to free consumed
> buffers in Tx ring
>

Hi Billy,

Thanks for this. Some minor comments below.


>
>
> +Driver Cache
> +~~~~~~~~~~~~

I think this could be at the same level as the "Local Cache" earlier in
the doc.


> +
> +In addition to the a core’s local cache, many of the drivers don't

The apostrophe after core is non-ascii. That can mess with the PDF output.


> +release the mbuf back to the mempool, or local cache, immediately after
> the packet has been transmitted.
> +Instead, they leave the mbuf in their txRing and either perform a bulk

I'd suggest s/txRing/Tx ring/ here and below.


> +release when the tx_rs_thresh has been crossed or free the mbuf when a

You should fixed width quote ``tx_rs_thresh`` here and below.


> slot in the txRing is needed.
> +
> +An application can request the driver to release used mbufs with the
> ``rte_eth_tx_done_cleanup()`` API.
> +This API requests the driver to release mbufs that are no longer in
> +use, independent of whether or not the tx_rs_thresh has been crossed.
> +There are two scenarios when an application may want the mbuf back
> immediately:

s/back/released/ maybe?

> +
> +* When a given packet needs to be sent to multiple destination interfaces
> (either for Layer 2 flooding or Layer 3 multi-cast).
> +  One option is to make a copy of the packet or a copy of the header
> portion that needs to manipulated.

s/to/to be/


John

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [dpdk-dev] [PATCH v5 0/3] new API to free consumed buffers in Tx ring
  2017-01-27 18:37   ` [dpdk-dev] [PATCH v5 0/3] new API to free consumed buffers in Tx ring Billy McFall
                       ` (2 preceding siblings ...)
  2017-01-27 18:38     ` [dpdk-dev] [PATCH v5 3/3] net/vhost: vHost " Billy McFall
@ 2017-03-07 21:59     ` Thomas Monjalon
  2017-03-09 20:51     ` [dpdk-dev] [PATCH v6 " Billy McFall
  4 siblings, 0 replies; 54+ messages in thread
From: Thomas Monjalon @ 2017-03-07 21:59 UTC (permalink / raw)
  To: Billy McFall; +Cc: wenzhuo.lu, olivier.matz, dev

2017-01-27 13:37, Billy McFall:
> Add a new API to free consumed buffers on TX ring. This addresses two
> scenarios:
> 1) Flooding a packet and want to reuse existing mbuf to avoid a packet
> copy. Increment the reference count of the packet and poll new API until
> reference count is decremented.
> 2) Application runs out of mbufs, or resets and is preparing for
> additional run, call API to free consumed packets so processing can
> continue.

Please add an entry in the release notes.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [dpdk-dev] [PATCH v5 1/3] ethdev: new API to free consumed buffers in Tx ring
  2017-03-07 16:03           ` Thomas Monjalon
@ 2017-03-09 15:49             ` Billy McFall
  2017-03-09 17:11               ` Thomas Monjalon
  0 siblings, 1 reply; 54+ messages in thread
From: Billy McFall @ 2017-03-09 15:49 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: wenzhuo.lu, olivier.matz, dev, adrien.mazarguil

On Tue, Mar 7, 2017 at 11:03 AM, Thomas Monjalon <thomas.monjalon@6wind.com>
wrote:

> 2017-03-07 09:29, Billy McFall:
> > On Mon, Feb 27, 2017 at 8:48 AM, Thomas Monjalon <
> thomas.monjalon@6wind.com>
> > wrote:
> > > I think you could use rte_errno (while keeping negative return codes).
> > >
> >
> > I can do that if you want, but if I understand your comment, it will make
> > the implementation of the function not as clean. I cannot use the
> existing
> > RTE_ETH_VALID_PORTID_OR_ERR_RET(..) and RTE_FUNC_PTR_OR_ERR_RET(..)
> MACROs
> > because they are handling the return on error. Or am I missing something?
>
> Yes. Maybe we need new macros for basic error management with rte_errno.
>
> Looking at the code. Do you think we need new MACROs or just set rte_errno
in the existing? What would be the down side to setting rte_errno for all
the existing calls?

Looks like RTE_ETH_VALID_PORTID_OR_ERR_RET(..) is being called ~135 times.
Most calls are with retval set to either -ENODEV or -EINVAL. A few
instances of 0 and -1, but not many.

Looks like RTE_FUNC_PTR_OR_ERR_RET(..) is being called ~100 times. Most
calls are with retval set to -ENOTSUP. A few instances of 0, but not many.

I was thinking:
/* Macros to check for valid port */
#define RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, retval) do { \
if (!rte_eth_dev_is_valid_port(port_id)) { \
RTE_PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id); \
+ if (retval < 0) \
+ rte_errno = -retval; \
return retval; \
} \
} while (0)

Thanks,
Billy

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [dpdk-dev] [PATCH v5 1/3] ethdev: new API to free consumed buffers in Tx ring
  2017-03-09 15:49             ` Billy McFall
@ 2017-03-09 17:11               ` Thomas Monjalon
  0 siblings, 0 replies; 54+ messages in thread
From: Thomas Monjalon @ 2017-03-09 17:11 UTC (permalink / raw)
  To: Billy McFall; +Cc: wenzhuo.lu, olivier.matz, dev, adrien.mazarguil

2017-03-09 10:49, Billy McFall:
> On Tue, Mar 7, 2017 at 11:03 AM, Thomas Monjalon <thomas.monjalon@6wind.com>
> wrote:
> > 2017-03-07 09:29, Billy McFall:
> > > On Mon, Feb 27, 2017 at 8:48 AM, Thomas Monjalon <
> > thomas.monjalon@6wind.com>
> > > wrote:
> > > > I think you could use rte_errno (while keeping negative return codes).
> > > >
> > >
> > > I can do that if you want, but if I understand your comment, it will make
> > > the implementation of the function not as clean. I cannot use the
> > existing
> > > RTE_ETH_VALID_PORTID_OR_ERR_RET(..) and RTE_FUNC_PTR_OR_ERR_RET(..)
> > MACROs
> > > because they are handling the return on error. Or am I missing something?
> >
> > Yes. Maybe we need new macros for basic error management with rte_errno.
> >
> > Looking at the code. Do you think we need new MACROs or just set rte_errno
> in the existing? What would be the down side to setting rte_errno for all
> the existing calls?
> 
> Looks like RTE_ETH_VALID_PORTID_OR_ERR_RET(..) is being called ~135 times.
> Most calls are with retval set to either -ENODEV or -EINVAL. A few
> instances of 0 and -1, but not many.
> 
> Looks like RTE_FUNC_PTR_OR_ERR_RET(..) is being called ~100 times. Most
> calls are with retval set to -ENOTSUP. A few instances of 0, but not many.
> 
> I was thinking:
> /* Macros to check for valid port */
> #define RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, retval) do { \
> if (!rte_eth_dev_is_valid_port(port_id)) { \
> RTE_PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id); \
> + if (retval < 0) \
> + rte_errno = -retval; \
> return retval; \
> } \
> } while (0)

Yes it seems acceptable.
This rework may be done separately.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* [dpdk-dev] [PATCH v6 0/3] new API to free consumed buffers in Tx ring
  2017-01-27 18:37   ` [dpdk-dev] [PATCH v5 0/3] new API to free consumed buffers in Tx ring Billy McFall
                       ` (3 preceding siblings ...)
  2017-03-07 21:59     ` [dpdk-dev] [PATCH v5 0/3] new API to free consumed buffers in Tx ring Thomas Monjalon
@ 2017-03-09 20:51     ` Billy McFall
  2017-03-09 20:51       ` [dpdk-dev] [PATCH v6 1/3] ethdev: " Billy McFall
                         ` (3 more replies)
  4 siblings, 4 replies; 54+ messages in thread
From: Billy McFall @ 2017-03-09 20:51 UTC (permalink / raw)
  To: thomas.monjalon, wenzhuo.lu, olivier.matz; +Cc: dev, Billy McFall

See request from 11/21/2016:
  http://dpdk.org/ml/archives/dev/2016-November/050585.html

Add a new API to free consumed buffers on TX ring. This addresses two
scenarios:
1) Flooding a packet and want to reuse existing mbuf to avoid a packet
copy. Increment the reference count of the packet and poll new API until
reference count is decremented.
2) Application runs out of mbufs, or resets and is preparing for
additional run, call API to free consumed packets so processing can
continue.

API will return the number of packets freed (0-n) or error code if
feature not supported (-ENOTSUP) or input invalid (-ENODEV).

API for e1000 igb driver and vHost driver have been implemented. Other
drivers can be implemented over time. Some drivers implement a Tx done
flush routine that should be reused where possible. e1000 igb driver
and vHost driver do not have such functions.

---
v2:
* Removed rte_eth_tx_buffer_flush() call and associated parameters
  from new rte_eth_tx_done_cleanup() API.

* Remaining open issue is whether this should be a new API or overload 
  existing rte_eth_tx_buffer() API with input parameter nb_pkts set to
  0. My concern is that overloading existing API will not provided
  enough feedback to application. Application needs to know if feature
  is supported and driver is attempting to free mbufs or not.

* If new API is supported, second open issue is if parameter free_cnt
  should be included. It was in the original feature request.

---
v3:
* Removed extra white space in rte_ethdev.h.
* Removed inline of new API function in rte_ethdev.h.

---
v4:
* Added new API to documentation of per nic supported features.

---
v5:
* Added documentation to the Programmer's Guide.

---
v6:
* Rebase to master.
* Added Release Note.
* Addressed comments on documentation to the Programmer's Guide.
* Renamed feature string to "Free Tx mbuf on demand" and modified
  feature string processing code to extend the maximum string length
  from 20 characters to 25 characters.
* Reworded the commit log for PATCH 2/3 to attempt to clearify that the
  API is independent of tx_rs_thresh.  
* Will address the rte_errno comment in a separate merge.


Billy McFall (3):
  ethdev: new API to free consumed buffers in Tx ring
  net/e1000: e1000 igb support to free consumed buffers
  net/vhost: vHost support to free consumed buffers

 doc/guides/conf.py                     |   7 +-
 doc/guides/nics/features/default.ini   |   4 +-
 doc/guides/nics/features/e1000.ini     |   1 +
 doc/guides/nics/features/vhost.ini     |   1 +
 doc/guides/prog_guide/mempool_lib.rst  |  26 +++++++
 doc/guides/rel_notes/release_17_05.rst |   7 +-
 drivers/net/e1000/e1000_ethdev.h       |   2 +
 drivers/net/e1000/igb_ethdev.c         |   1 +
 drivers/net/e1000/igb_rxtx.c           | 126 +++++++++++++++++++++++++++++++++
 drivers/net/vhost/rte_eth_vhost.c      |  11 +++
 lib/librte_ether/rte_ethdev.c          |  14 ++++
 lib/librte_ether/rte_ethdev.h          |  31 ++++++++
 12 files changed, 227 insertions(+), 4 deletions(-)

-- 
2.9.3

^ permalink raw reply	[flat|nested] 54+ messages in thread

* [dpdk-dev] [PATCH v6 1/3] ethdev: new API to free consumed buffers in Tx ring
  2017-03-09 20:51     ` [dpdk-dev] [PATCH v6 " Billy McFall
@ 2017-03-09 20:51       ` Billy McFall
  2017-03-15 10:29         ` Olivier Matz
  2017-03-15 10:30         ` Thomas Monjalon
  2017-03-09 20:51       ` [dpdk-dev] [PATCH v6 2/3] net/e1000: e1000 igb support to free consumed buffers Billy McFall
                         ` (2 subsequent siblings)
  3 siblings, 2 replies; 54+ messages in thread
From: Billy McFall @ 2017-03-09 20:51 UTC (permalink / raw)
  To: thomas.monjalon, wenzhuo.lu, olivier.matz; +Cc: dev, Billy McFall

Add a new API to force free consumed buffers on Tx ring. API will return
the number of packets freed (0-n) or error code if feature not supported
(-ENOTSUP) or input invalid (-ENODEV).

Please double check my python coding in conf.py and make sure I
implemented 'feature_str_len' properly.

Signed-off-by: Billy McFall <bmcfall@redhat.com>
---
 doc/guides/conf.py                     |  7 +++++--
 doc/guides/nics/features/default.ini   |  4 +++-
 doc/guides/prog_guide/mempool_lib.rst  | 26 ++++++++++++++++++++++++++
 doc/guides/rel_notes/release_17_05.rst |  7 ++++++-
 lib/librte_ether/rte_ethdev.c          | 14 ++++++++++++++
 lib/librte_ether/rte_ethdev.h          | 31 +++++++++++++++++++++++++++++++
 6 files changed, 85 insertions(+), 4 deletions(-)

diff --git a/doc/guides/conf.py b/doc/guides/conf.py
index 34c62de..4cac26d 100644
--- a/doc/guides/conf.py
+++ b/doc/guides/conf.py
@@ -64,6 +64,9 @@
 
 master_doc = 'index'
 
+# Maximum feature description string length
+feature_str_len = 25
+
 # Figures, tables and code-blocks automatically numbered if they have caption
 numfig = True
 
@@ -300,7 +303,7 @@ def print_table_body(outfile, num_cols, ini_files, ini_data, default_features):
 def print_table_row(outfile, feature, line):
     """ Print a single row of the table with fixed formatting. """
     line = line.rstrip()
-    print('   {:<20}{}'.format(feature, line), file=outfile)
+    print('   {:<{}}{}'.format(feature, feature_str_len, line), file=outfile)
 
 
 def print_table_divider(outfile, num_cols):
@@ -309,7 +312,7 @@ def print_table_divider(outfile, num_cols):
     column_dividers = ['='] * num_cols
     line += ' '.join(column_dividers)
 
-    feature = '=' * 20
+    feature = '=' * feature_str_len
 
     print_table_row(outfile, feature, line)
 
diff --git a/doc/guides/nics/features/default.ini b/doc/guides/nics/features/default.ini
index 9e363ff..12a0782 100644
--- a/doc/guides/nics/features/default.ini
+++ b/doc/guides/nics/features/default.ini
@@ -3,7 +3,8 @@
 ;
 ; This file defines the features that are valid for inclusion in
 ; the other driver files and also the order that they appear in
-; the features table in the documentation.
+; the features table in the documentation. The feature description
+; string should not exceed feature_str_len defined in conf.py.
 ;
 [Features]
 Speed capabilities   =
@@ -11,6 +12,7 @@ Link status          =
 Link status event    =
 Queue status event   =
 Rx interrupt         =
+Free Tx mbuf on demand =
 Queue start/stop     =
 MTU update           =
 Jumbo frame          =
diff --git a/doc/guides/prog_guide/mempool_lib.rst b/doc/guides/prog_guide/mempool_lib.rst
index ffdc109..a117881 100644
--- a/doc/guides/prog_guide/mempool_lib.rst
+++ b/doc/guides/prog_guide/mempool_lib.rst
@@ -132,6 +132,32 @@ These user-owned caches can be explicitly passed to ``rte_mempool_generic_put()`
 The ``rte_mempool_default_cache()`` call returns the default internal cache if any.
 In contrast to the default caches, user-owned caches can be used by non-EAL threads too.
 
+In addition to a core's local cache, many of the drivers don't release the mbuf back to the mempool, or local cache,
+immediately after the packet has been transmitted.
+Instead, they leave the mbuf in their Tx ring and either perform a bulk release when the ``tx_rs_thresh`` has been
+crossed or free the mbuf when a slot in the Tx ring is needed.
+
+An application can request the driver to release used mbufs with the ``rte_eth_tx_done_cleanup()`` API.
+This API requests the driver to release mbufs that are no longer in use, independent of whether or not the
+``tx_rs_thresh`` has been crossed.
+There are two scenarios when an application may want the mbuf released immediately:
+
+* When a given packet needs to be sent to multiple destination interfaces (either for Layer 2 flooding or Layer 3
+  multi-cast).
+  One option is to make a copy of the packet or a copy of the header portion that needs to be manipulated.
+  A second option is to transmit the packet and then poll the ``rte_eth_tx_done_cleanup()`` API until the reference
+  count on the packet is decremented.
+  Then the same packet can be transmitted to the next destination interface.
+
+* If an application is designed to make multiple runs, like a packet generator, and one run has completed.
+  The application may want to reset to a clean state.
+  In this case, it may want to call the ``rte_eth_tx_done_cleanup()`` API to request each destination interface it has
+  been using to release all of its used mbufs.
+
+To determine if a driver supports this API, check for the *Free Tx mbuf on demand* feature in the *Network Interface
+Controller Drivers* document.
+
+
 Mempool Handlers
 ------------------------
 
diff --git a/doc/guides/rel_notes/release_17_05.rst b/doc/guides/rel_notes/release_17_05.rst
index 4b90036..7b9c92c 100644
--- a/doc/guides/rel_notes/release_17_05.rst
+++ b/doc/guides/rel_notes/release_17_05.rst
@@ -41,11 +41,16 @@ New Features
      Also, make sure to start the actual text at the margin.
      =========================================================
 
-
 * **Added powerpc support in pci probing for vfio-pci devices.**
 
   sPAPR IOMMU based pci probing enabled for vfio-pci devices.
 
+* **Added free Tx mbuf on demand API.**
+
+  Added a new function ``rte_eth_tx_done_cleanup()`` which allows an application
+  to request the driver to release mbufs from their Tx ring that are no longer
+  in use, independent of whether or not the ``tx_rs_thresh`` has been crossed.
+
 Resolved Issues
 ---------------
 
diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index eb0a94a..203ec21 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -1259,6 +1259,20 @@ rte_eth_tx_buffer_set_err_callback(struct rte_eth_dev_tx_buffer *buffer,
 }
 
 int
+rte_eth_tx_done_cleanup(uint8_t port_id, uint16_t queue_id, uint32_t free_cnt)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+
+	/* Validate Input Data. Bail if not valid or not supported. */
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
+	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_done_cleanup, -ENOTSUP);
+
+	/* Call driver to free pending mbufs. */
+	return (*dev->dev_ops->tx_done_cleanup)(dev->data->tx_queues[queue_id],
+			free_cnt);
+}
+
+int
 rte_eth_tx_buffer_init(struct rte_eth_dev_tx_buffer *buffer, uint16_t size)
 {
 	int ret = 0;
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 4be217c..6455fd5 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -1183,6 +1183,9 @@ typedef int (*eth_fw_version_get_t)(struct rte_eth_dev *dev,
 				     char *fw_version, size_t fw_size);
 /**< @internal Get firmware information of an Ethernet device. */
 
+typedef int (*eth_tx_done_cleanup_t)(void *txq, uint32_t free_cnt);
+/**< @internal Force mbufs to be from TX ring. */
+
 typedef void (*eth_rxq_info_get_t)(struct rte_eth_dev *dev,
 	uint16_t rx_queue_id, struct rte_eth_rxq_info *qinfo);
 
@@ -1488,6 +1491,7 @@ struct eth_dev_ops {
 	eth_rx_disable_intr_t      rx_queue_intr_disable; /**< Disable Rx queue interrupt. */
 	eth_tx_queue_setup_t       tx_queue_setup;/**< Set up device TX queue. */
 	eth_queue_release_t        tx_queue_release; /**< Release TX queue. */
+	eth_tx_done_cleanup_t      tx_done_cleanup;/**< Free tx ring mbufs */
 
 	eth_dev_led_on_t           dev_led_on;    /**< Turn on LED. */
 	eth_dev_led_off_t          dev_led_off;   /**< Turn off LED. */
@@ -3097,6 +3101,33 @@ rte_eth_tx_buffer(uint8_t port_id, uint16_t queue_id,
 }
 
 /**
+ * Request the driver to free mbufs currently cached by the driver. The
+ * driver will only free the mbuf if it is no longer in use. It is the
+ * application's responsibity to ensure rte_eth_tx_buffer_flush(..) is
+ * called if needed.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param queue_id
+ *   The index of the transmit queue through which output packets must be
+ *   sent.
+ *   The value must be in the range [0, nb_tx_queue - 1] previously supplied
+ *   to rte_eth_dev_configure().
+ * @param free_cnt
+ *   Maximum number of packets to free. Use 0 to indicate all possible packets
+ *   should be freed. Note that a packet may be using multiple mbufs.
+ * @return
+ *   Failure: < 0
+ *     -ENODEV: Invalid interface
+ *     -ENOTSUP: Driver does not support function
+ *   Success: >= 0
+ *     0-n: Number of packets freed. More packets may still remain in ring that
+ *     are in use.
+ */
+int
+rte_eth_tx_done_cleanup(uint8_t port_id, uint16_t queue_id, uint32_t free_cnt);
+
+/**
  * Configure a callback for buffered packets which cannot be sent
  *
  * Register a specific callback to be called when an attempt is made to send
-- 
2.9.3

^ permalink raw reply	[flat|nested] 54+ messages in thread

* [dpdk-dev] [PATCH v6 2/3] net/e1000: e1000 igb support to free consumed buffers
  2017-03-09 20:51     ` [dpdk-dev] [PATCH v6 " Billy McFall
  2017-03-09 20:51       ` [dpdk-dev] [PATCH v6 1/3] ethdev: " Billy McFall
@ 2017-03-09 20:51       ` Billy McFall
  2017-03-13  3:17         ` Lu, Wenzhuo
  2017-03-09 20:51       ` [dpdk-dev] [PATCH v6 3/3] net/vhost: vHost " Billy McFall
  2017-03-15 18:02       ` [dpdk-dev] [PATCH v7 0/3] new API to free consumed buffers in Tx ring Billy McFall
  3 siblings, 1 reply; 54+ messages in thread
From: Billy McFall @ 2017-03-09 20:51 UTC (permalink / raw)
  To: thomas.monjalon, wenzhuo.lu, olivier.matz; +Cc: dev, Billy McFall

Add support to the e1000 igb driver for the new API to force free
consumed buffers on Tx ring. This API is independent of the tx_rs_thresh
setting. With this API, buffers should be free even if tx_rs_thresh is
not met.

e1000 igb driver does not implement a tx_rs_thresh to free mbufs, it
frees a slot in the ring as needed. However, it could be implemented at
some future date.

Signed-off-by: Billy McFall <bmcfall@redhat.com>
---
 doc/guides/nics/features/e1000.ini |   1 +
 drivers/net/e1000/e1000_ethdev.h   |   2 +
 drivers/net/e1000/igb_ethdev.c     |   1 +
 drivers/net/e1000/igb_rxtx.c       | 126 +++++++++++++++++++++++++++++++++++++
 4 files changed, 130 insertions(+)

diff --git a/doc/guides/nics/features/e1000.ini b/doc/guides/nics/features/e1000.ini
index 7f6d55c..3aed7d7 100644
--- a/doc/guides/nics/features/e1000.ini
+++ b/doc/guides/nics/features/e1000.ini
@@ -7,6 +7,7 @@
 Link status          = Y
 Link status event    = Y
 Rx interrupt         = Y
+Free Tx mbuf on demand = Y
 MTU update           = Y
 Jumbo frame          = Y
 Scattered Rx         = Y
diff --git a/drivers/net/e1000/e1000_ethdev.h b/drivers/net/e1000/e1000_ethdev.h
index 81a6dbb..39b2f43 100644
--- a/drivers/net/e1000/e1000_ethdev.h
+++ b/drivers/net/e1000/e1000_ethdev.h
@@ -315,6 +315,8 @@ int eth_igb_tx_queue_setup(struct rte_eth_dev *dev, uint16_t tx_queue_id,
 		uint16_t nb_tx_desc, unsigned int socket_id,
 		const struct rte_eth_txconf *tx_conf);
 
+int eth_igb_tx_done_cleanup(void *txq, uint32_t free_cnt);
+
 int eth_igb_rx_init(struct rte_eth_dev *dev);
 
 void eth_igb_tx_init(struct rte_eth_dev *dev);
diff --git a/drivers/net/e1000/igb_ethdev.c b/drivers/net/e1000/igb_ethdev.c
index a112b38..71d05a9 100644
--- a/drivers/net/e1000/igb_ethdev.c
+++ b/drivers/net/e1000/igb_ethdev.c
@@ -408,6 +408,7 @@ static const struct eth_dev_ops eth_igb_ops = {
 	.rx_descriptor_done   = eth_igb_rx_descriptor_done,
 	.tx_queue_setup       = eth_igb_tx_queue_setup,
 	.tx_queue_release     = eth_igb_tx_queue_release,
+	.tx_done_cleanup      = eth_igb_tx_done_cleanup,
 	.dev_led_on           = eth_igb_led_on,
 	.dev_led_off          = eth_igb_led_off,
 	.flow_ctrl_get        = eth_igb_flow_ctrl_get,
diff --git a/drivers/net/e1000/igb_rxtx.c b/drivers/net/e1000/igb_rxtx.c
index 1bb4d85..cba3704 100644
--- a/drivers/net/e1000/igb_rxtx.c
+++ b/drivers/net/e1000/igb_rxtx.c
@@ -1283,6 +1283,132 @@ eth_igb_tx_queue_release(void *txq)
 	igb_tx_queue_release(txq);
 }
 
+static int
+igb_tx_done_cleanup(struct igb_tx_queue *txq, uint32_t free_cnt)
+{
+	struct igb_tx_entry *sw_ring;
+	volatile union e1000_adv_tx_desc *txr;
+	uint16_t tx_first; /* First segment analyzed. */
+	uint16_t tx_id;    /* Current segment being processed. */
+	uint16_t tx_last;  /* Last segment in the current packet. */
+	uint16_t tx_next;  /* First segment of the next packet. */
+	int count;
+
+	if (txq != NULL) {
+		count = 0;
+		sw_ring = txq->sw_ring;
+		txr = txq->tx_ring;
+
+		/*
+		 * tx_tail is the last sent packet on the sw_ring. Goto the end
+		 * of that packet (the last segment in the packet chain) and
+		 * then the next segment will be the start of the oldest segment
+		 * in the sw_ring. This is the first packet that will be
+		 * attempted to be freed.
+		 */
+
+		/* Get last segment in most recently added packet. */
+		tx_first = sw_ring[txq->tx_tail].last_id;
+
+		/* Get the next segment, which is the oldest segment in ring. */
+		tx_first = sw_ring[tx_first].next_id;
+
+		/* Set the current index to the first. */
+		tx_id = tx_first;
+
+		/*
+		 * Loop through each packet. For each packet, verify that an
+		 * mbuf exists and that the last segment is free. If so, free
+		 * it and move on.
+		 */
+		while (1) {
+			tx_last = sw_ring[tx_id].last_id;
+
+			if (sw_ring[tx_last].mbuf) {
+				if (txr[tx_last].wb.status &
+						E1000_TXD_STAT_DD) {
+					/*
+					 * Increment the number of packets
+					 * freed.
+					 */
+					count++;
+
+					/* Get the start of the next packet. */
+					tx_next = sw_ring[tx_last].next_id;
+
+					/*
+					 * Loop through all segments in a
+					 * packet.
+					 */
+					do {
+						rte_pktmbuf_free_seg(sw_ring[tx_id].mbuf);
+						sw_ring[tx_id].mbuf = NULL;
+						sw_ring[tx_id].last_id = tx_id;
+
+						/* Move to next segemnt. */
+						tx_id = sw_ring[tx_id].next_id;
+
+					} while (tx_id != tx_next);
+
+					if (unlikely(count == (int)free_cnt))
+						break;
+				} else
+					/*
+					 * mbuf still in use, nothing left to
+					 * free.
+					 */
+					break;
+			} else {
+				/*
+				 * There are multiple reasons to be here:
+				 * 1) All the packets on the ring have been
+				 *    freed - tx_id is equal to tx_first
+				 *    and some packets have been freed.
+				 *    - Done, exit
+				 * 2) Interfaces has not sent a rings worth of
+				 *    packets yet, so the segment after tail is
+				 *    still empty. Or a previous call to this
+				 *    function freed some of the segments but
+				 *    not all so there is a hole in the list.
+				 *    Hopefully this is a rare case.
+				 *    - Walk the list and find the next mbuf. If
+				 *      there isn't one, then done.
+				 */
+				if (likely((tx_id == tx_first) && (count != 0)))
+					break;
+
+				/*
+				 * Walk the list and find the next mbuf, if any.
+				 */
+				do {
+					/* Move to next segemnt. */
+					tx_id = sw_ring[tx_id].next_id;
+
+					if (sw_ring[tx_id].mbuf)
+						break;
+
+				} while (tx_id != tx_first);
+
+				/*
+				 * Determine why previous loop bailed. If there
+				 * is not an mbuf, done.
+				 */
+				if (sw_ring[tx_id].mbuf == NULL)
+					break;
+			}
+		}
+	} else
+		count = -ENODEV;
+
+	return count;
+}
+
+int
+eth_igb_tx_done_cleanup(void *txq, uint32_t free_cnt)
+{
+	return igb_tx_done_cleanup(txq, free_cnt);
+}
+
 static void
 igb_reset_tx_queue_stat(struct igb_tx_queue *txq)
 {
-- 
2.9.3

^ permalink raw reply	[flat|nested] 54+ messages in thread

* [dpdk-dev] [PATCH v6 3/3] net/vhost: vHost support to free consumed buffers
  2017-03-09 20:51     ` [dpdk-dev] [PATCH v6 " Billy McFall
  2017-03-09 20:51       ` [dpdk-dev] [PATCH v6 1/3] ethdev: " Billy McFall
  2017-03-09 20:51       ` [dpdk-dev] [PATCH v6 2/3] net/e1000: e1000 igb support to free consumed buffers Billy McFall
@ 2017-03-09 20:51       ` Billy McFall
  2017-03-15 10:27         ` Thomas Monjalon
  2017-03-15 18:02       ` [dpdk-dev] [PATCH v7 0/3] new API to free consumed buffers in Tx ring Billy McFall
  3 siblings, 1 reply; 54+ messages in thread
From: Billy McFall @ 2017-03-09 20:51 UTC (permalink / raw)
  To: thomas.monjalon, wenzhuo.lu, olivier.matz; +Cc: dev, Billy McFall

Add support to the vHostdriver for the new API to force free consumed
buffers on Tx ring. vHost does not cache the mbufs so there is no work
to do.

Signed-off-by: Billy McFall <bmcfall@redhat.com>
---
 doc/guides/nics/features/vhost.ini |  1 +
 drivers/net/vhost/rte_eth_vhost.c  | 11 +++++++++++
 2 files changed, 12 insertions(+)

diff --git a/doc/guides/nics/features/vhost.ini b/doc/guides/nics/features/vhost.ini
index 23166fb..dffd1f4 100644
--- a/doc/guides/nics/features/vhost.ini
+++ b/doc/guides/nics/features/vhost.ini
@@ -6,6 +6,7 @@
 [Features]
 Link status          = Y
 Link status event    = Y
+Free Tx mbuf on demand = Y
 Queue status event   = Y
 Basic stats          = Y
 Extended stats       = Y
diff --git a/drivers/net/vhost/rte_eth_vhost.c b/drivers/net/vhost/rte_eth_vhost.c
index e98cffd..abe91c7 100644
--- a/drivers/net/vhost/rte_eth_vhost.c
+++ b/drivers/net/vhost/rte_eth_vhost.c
@@ -960,6 +960,16 @@ eth_queue_release(void *q)
 }
 
 static int
+eth_tx_done_cleanup(void *txq __rte_unused, uint32_t free_cnt __rte_unused)
+{
+	/*
+	 * vHost does not hang onto mbuf. eth_vhost_tx() copies packet data
+	 * and releases mbuf, so nothing to cleanup.
+	 */
+	return 0;
+}
+
+static int
 eth_link_update(struct rte_eth_dev *dev __rte_unused,
 		int wait_to_complete __rte_unused)
 {
@@ -1001,6 +1011,7 @@ static const struct eth_dev_ops ops = {
 	.tx_queue_setup = eth_tx_queue_setup,
 	.rx_queue_release = eth_queue_release,
 	.tx_queue_release = eth_queue_release,
+	.tx_done_cleanup = eth_tx_done_cleanup,
 	.link_update = eth_link_update,
 	.stats_get = eth_stats_get,
 	.stats_reset = eth_stats_reset,
-- 
2.9.3

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [dpdk-dev] [PATCH v6 2/3] net/e1000: e1000 igb support to free consumed buffers
  2017-03-09 20:51       ` [dpdk-dev] [PATCH v6 2/3] net/e1000: e1000 igb support to free consumed buffers Billy McFall
@ 2017-03-13  3:17         ` Lu, Wenzhuo
  0 siblings, 0 replies; 54+ messages in thread
From: Lu, Wenzhuo @ 2017-03-13  3:17 UTC (permalink / raw)
  To: Billy McFall, thomas.monjalon, olivier.matz; +Cc: dev

Hi,

> -----Original Message-----
> From: Billy McFall [mailto:bmcfall@redhat.com]
> Sent: Friday, March 10, 2017 4:51 AM
> To: thomas.monjalon@6wind.com; Lu, Wenzhuo; olivier.matz@6wind.com
> Cc: dev@dpdk.org; Billy McFall
> Subject: [PATCH v6 2/3] net/e1000: e1000 igb support to free consumed buffers
> 
> Add support to the e1000 igb driver for the new API to force free consumed
> buffers on Tx ring. This API is independent of the tx_rs_thresh setting. With this
> API, buffers should be free even if tx_rs_thresh is not met.
> 
> e1000 igb driver does not implement a tx_rs_thresh to free mbufs, it frees a slot
> in the ring as needed. However, it could be implemented at some future date.
> 
> Signed-off-by: Billy McFall <bmcfall@redhat.com>
Acked-by: Wenzhuo Lu <wenzhuo.lu@intel.com>

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [dpdk-dev] [PATCH v6 3/3] net/vhost: vHost support to free consumed buffers
  2017-03-09 20:51       ` [dpdk-dev] [PATCH v6 3/3] net/vhost: vHost " Billy McFall
@ 2017-03-15 10:27         ` Thomas Monjalon
  0 siblings, 0 replies; 54+ messages in thread
From: Thomas Monjalon @ 2017-03-15 10:27 UTC (permalink / raw)
  To: Billy McFall; +Cc: wenzhuo.lu, olivier.matz, dev

2017-03-09 15:51, Billy McFall:
> Add support to the vHostdriver for the new API to force free consumed
> buffers on Tx ring. vHost does not cache the mbufs so there is no work
> to do.
> 
> Signed-off-by: Billy McFall <bmcfall@redhat.com>

Your forgot a acked-by from Maxime here.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [dpdk-dev] [PATCH v6 1/3] ethdev: new API to free consumed buffers in Tx ring
  2017-03-09 20:51       ` [dpdk-dev] [PATCH v6 1/3] ethdev: " Billy McFall
@ 2017-03-15 10:29         ` Olivier Matz
  2017-03-15 15:01           ` Billy McFall
  2017-03-15 10:30         ` Thomas Monjalon
  1 sibling, 1 reply; 54+ messages in thread
From: Olivier Matz @ 2017-03-15 10:29 UTC (permalink / raw)
  To: Billy McFall; +Cc: thomas.monjalon, wenzhuo.lu, dev

Hi Billy,

On Thu,  9 Mar 2017 15:51:17 -0500, Billy McFall <bmcfall@redhat.com> wrote:
> Add a new API to force free consumed buffers on Tx ring. API will return
> the number of packets freed (0-n) or error code if feature not supported
> (-ENOTSUP) or input invalid (-ENODEV).
> 
> Please double check my python coding in conf.py and make sure I
> implemented 'feature_str_len' properly.
> 
> Signed-off-by: Billy McFall <bmcfall@redhat.com>

[...]

> diff --git a/doc/guides/prog_guide/mempool_lib.rst b/doc/guides/prog_guide/mempool_lib.rst
> index ffdc109..a117881 100644
> --- a/doc/guides/prog_guide/mempool_lib.rst
> +++ b/doc/guides/prog_guide/mempool_lib.rst
> @@ -132,6 +132,32 @@ These user-owned caches can be explicitly passed to ``rte_mempool_generic_put()`
>  The ``rte_mempool_default_cache()`` call returns the default internal cache if any.
>  In contrast to the default caches, user-owned caches can be used by non-EAL threads too.
>  
> +In addition to a core's local cache, many of the drivers don't release the mbuf back to the mempool, or local cache,
> +immediately after the packet has been transmitted.
> +Instead, they leave the mbuf in their Tx ring and either perform a bulk release when the ``tx_rs_thresh`` has been
> +crossed or free the mbuf when a slot in the Tx ring is needed.
> +
> +An application can request the driver to release used mbufs with the ``rte_eth_tx_done_cleanup()`` API.
> +This API requests the driver to release mbufs that are no longer in use, independent of whether or not the
> +``tx_rs_thresh`` has been crossed.
> +There are two scenarios when an application may want the mbuf released immediately:
> +
> +* When a given packet needs to be sent to multiple destination interfaces (either for Layer 2 flooding or Layer 3
> +  multi-cast).
> +  One option is to make a copy of the packet or a copy of the header portion that needs to be manipulated.
> +  A second option is to transmit the packet and then poll the ``rte_eth_tx_done_cleanup()`` API until the reference
> +  count on the packet is decremented.
> +  Then the same packet can be transmitted to the next destination interface.
> +
> +* If an application is designed to make multiple runs, like a packet generator, and one run has completed.
> +  The application may want to reset to a clean state.
> +  In this case, it may want to call the ``rte_eth_tx_done_cleanup()`` API to request each destination interface it has
> +  been using to release all of its used mbufs.
> +
> +To determine if a driver supports this API, check for the *Free Tx mbuf on demand* feature in the *Network Interface
> +Controller Drivers* document.

I'm not sure the mempool documentation is the proper place to document
this API. I'll suggest to move it in ethdev documentation (poll_mode_drv.rst).

You can keep a small paragraph in mempool lib, but keep in mind that
mempool is not necessarily a pool of mbuf (it can be any kind of object).


Thanks,
Olivier

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [dpdk-dev] [PATCH v6 1/3] ethdev: new API to free consumed buffers in Tx ring
  2017-03-09 20:51       ` [dpdk-dev] [PATCH v6 1/3] ethdev: " Billy McFall
  2017-03-15 10:29         ` Olivier Matz
@ 2017-03-15 10:30         ` Thomas Monjalon
  1 sibling, 0 replies; 54+ messages in thread
From: Thomas Monjalon @ 2017-03-15 10:30 UTC (permalink / raw)
  To: Billy McFall; +Cc: wenzhuo.lu, olivier.matz, dev

2017-03-09 15:51, Billy McFall:
> @@ -3097,6 +3101,33 @@ rte_eth_tx_buffer(uint8_t port_id, uint16_t queue_id,
>  }
>  
>  /**
> + * Request the driver to free mbufs currently cached by the driver. The
> + * driver will only free the mbuf if it is no longer in use. It is the
> + * application's responsibity to ensure rte_eth_tx_buffer_flush(..) is
> + * called if needed.
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param queue_id
> + *   The index of the transmit queue through which output packets must be
> + *   sent.
> + *   The value must be in the range [0, nb_tx_queue - 1] previously supplied
> + *   to rte_eth_dev_configure().
> + * @param free_cnt
> + *   Maximum number of packets to free. Use 0 to indicate all possible packets
> + *   should be freed. Note that a packet may be using multiple mbufs.
> + * @return
> + *   Failure: < 0
> + *     -ENODEV: Invalid interface
> + *     -ENOTSUP: Driver does not support function
> + *   Success: >= 0
> + *     0-n: Number of packets freed. More packets may still remain in ring that
> + *     are in use.
> + */
> +int
> +rte_eth_tx_done_cleanup(uint8_t port_id, uint16_t queue_id, uint32_t free_cnt);
> +
> +/**
>   * Configure a callback for buffered packets which cannot be sent
>   *
>   * Register a specific callback to be called when an attempt is made to send
> 

Please move this function below the tx_buffer functions (it is currently
between rte_eth_tx_buffer and rte_eth_tx_buffer_set_err_callback).

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [dpdk-dev] [PATCH v6 1/3] ethdev: new API to free consumed buffers in Tx ring
  2017-03-15 10:29         ` Olivier Matz
@ 2017-03-15 15:01           ` Billy McFall
  0 siblings, 0 replies; 54+ messages in thread
From: Billy McFall @ 2017-03-15 15:01 UTC (permalink / raw)
  To: Olivier Matz; +Cc: thomas.monjalon, wenzhuo.lu, dev

On Wed, Mar 15, 2017 at 6:29 AM, Olivier Matz <olivier.matz@6wind.com>
wrote:

> Hi Billy,
>
> On Thu,  9 Mar 2017 15:51:17 -0500, Billy McFall <bmcfall@redhat.com>
> wrote:
> > Add a new API to force free consumed buffers on Tx ring. API will return
> > the number of packets freed (0-n) or error code if feature not supported
> > (-ENOTSUP) or input invalid (-ENODEV).
> >
> > Please double check my python coding in conf.py and make sure I
> > implemented 'feature_str_len' properly.
> >
> > Signed-off-by: Billy McFall <bmcfall@redhat.com>
>
> [...]
>
> > diff --git a/doc/guides/prog_guide/mempool_lib.rst
> b/doc/guides/prog_guide/mempool_lib.rst
> > index ffdc109..a117881 100644
> > --- a/doc/guides/prog_guide/mempool_lib.rst
> > +++ b/doc/guides/prog_guide/mempool_lib.rst
> > @@ -132,6 +132,32 @@ These user-owned caches can be explicitly passed to
> ``rte_mempool_generic_put()`
> >  The ``rte_mempool_default_cache()`` call returns the default internal
> cache if any.
> >  In contrast to the default caches, user-owned caches can be used by
> non-EAL threads too.
> >
> > +In addition to a core's local cache, many of the drivers don't release
> the mbuf back to the mempool, or local cache,
> > +immediately after the packet has been transmitted.
> > +Instead, they leave the mbuf in their Tx ring and either perform a bulk
> release when the ``tx_rs_thresh`` has been
> > +crossed or free the mbuf when a slot in the Tx ring is needed.
> > +
> > +An application can request the driver to release used mbufs with the
> ``rte_eth_tx_done_cleanup()`` API.
> > +This API requests the driver to release mbufs that are no longer in
> use, independent of whether or not the
> > +``tx_rs_thresh`` has been crossed.
> > +There are two scenarios when an application may want the mbuf released
> immediately:
> > +
> > +* When a given packet needs to be sent to multiple destination
> interfaces (either for Layer 2 flooding or Layer 3
> > +  multi-cast).
> > +  One option is to make a copy of the packet or a copy of the header
> portion that needs to be manipulated.
> > +  A second option is to transmit the packet and then poll the
> ``rte_eth_tx_done_cleanup()`` API until the reference
> > +  count on the packet is decremented.
> > +  Then the same packet can be transmitted to the next destination
> interface.
> > +
> > +* If an application is designed to make multiple runs, like a packet
> generator, and one run has completed.
> > +  The application may want to reset to a clean state.
> > +  In this case, it may want to call the ``rte_eth_tx_done_cleanup()``
> API to request each destination interface it has
> > +  been using to release all of its used mbufs.
> > +
> > +To determine if a driver supports this API, check for the *Free Tx mbuf
> on demand* feature in the *Network Interface
> > +Controller Drivers* document.
>
> I'm not sure the mempool documentation is the proper place to document
> this API. I'll suggest to move it in ethdev documentation
> (poll_mode_drv.rst).
>
> You can keep a small paragraph in mempool lib, but keep in mind that
> mempool is not necessarily a pool of mbuf (it can be any kind of object).
>
> Not sure this API needs to be documented in multiple places. I'll move the
content to the poll_mode_drv.rst file.

>
> Thanks,
> Olivier
>
> Billy

^ permalink raw reply	[flat|nested] 54+ messages in thread

* [dpdk-dev] [PATCH v7 0/3] new API to free consumed buffers in Tx ring
  2017-03-09 20:51     ` [dpdk-dev] [PATCH v6 " Billy McFall
                         ` (2 preceding siblings ...)
  2017-03-09 20:51       ` [dpdk-dev] [PATCH v6 3/3] net/vhost: vHost " Billy McFall
@ 2017-03-15 18:02       ` Billy McFall
  2017-03-15 18:02         ` [dpdk-dev] [PATCH v7 1/3] ethdev: " Billy McFall
                           ` (4 more replies)
  3 siblings, 5 replies; 54+ messages in thread
From: Billy McFall @ 2017-03-15 18:02 UTC (permalink / raw)
  To: thomas.monjalon, wenzhuo.lu, olivier.matz; +Cc: dev, Billy McFall

See request from 11/21/2016:
  http://dpdk.org/ml/archives/dev/2016-November/050585.html

Add a new API to free consumed buffers on TX ring. This addresses two
scenarios:
1) Flooding a packet and want to reuse existing mbuf to avoid a packet
copy. Increment the reference count of the packet and poll new API until
reference count is decremented.
2) Application runs out of mbufs, or resets and is preparing for
additional run, call API to free consumed packets so processing can
continue.

API will return the number of packets freed (0-n) or error code if
feature not supported (-ENOTSUP) or input invalid (-ENODEV).

API for e1000 igb driver and vHost driver have been implemented. Other
drivers can be implemented over time. Some drivers implement a Tx done
flush routine that should be reused where possible. e1000 igb driver
and vHost driver do not have such functions.

---
v2:
* Removed rte_eth_tx_buffer_flush() call and associated parameters
  from new rte_eth_tx_done_cleanup() API.

* Remaining open issue is whether this should be a new API or overload 
  existing rte_eth_tx_buffer() API with input parameter nb_pkts set to
  0. My concern is that overloading existing API will not provided
  enough feedback to application. Application needs to know if feature
  is supported and driver is attempting to free mbufs or not.

* If new API is supported, second open issue is if parameter free_cnt
  should be included. It was in the original feature request.

---
v3:
* Removed extra white space in rte_ethdev.h.
* Removed inline of new API function in rte_ethdev.h.

---
v4:
* Added new API to documentation of per nic supported features.

---
v5:
* Added documentation to the Programmer's Guide.

---
v6:
* Rebase to master.
* Added Release Note.
* Addressed comments on documentation to the Programmer's Guide.
* Renamed feature string to "Free Tx mbuf on demand" and modified
  feature string processing code to extend the maximum string length
  from 20 characters to 25 characters.
* Reworded the commit log for PATCH 2/3 to attempt to clearify that the
  API is independent of tx_rs_thresh.  
* Will address the rte_errno comment in a separate merge.

---
v7:
* New API was located between rte_eth_tx_buffer and
  rte_eth_tx_buffer_set_err_callback in the file. Moved this function
  below the tx_buffer functions.
* Added previously acked-by tags to merge comments.
* Moved documentation content from the mempool documentation to the
  ethdev documentation.


Billy McFall (3):
  ethdev: new API to free consumed buffers in Tx ring
  net/e1000: e1000 igb support to free consumed buffers
  net/vhost: vHost support to free consumed buffers

 doc/guides/conf.py                      |   7 +-
 doc/guides/nics/features/default.ini    |   4 +-
 doc/guides/nics/features/e1000.ini      |   1 +
 doc/guides/nics/features/vhost.ini      |   1 +
 doc/guides/prog_guide/poll_mode_drv.rst |  28 +++++++
 doc/guides/rel_notes/release_17_05.rst  |   7 +-
 drivers/net/e1000/e1000_ethdev.h        |   2 +
 drivers/net/e1000/igb_ethdev.c          |   1 +
 drivers/net/e1000/igb_rxtx.c            | 126 ++++++++++++++++++++++++++++++++
 drivers/net/vhost/rte_eth_vhost.c       |  11 +++
 lib/librte_ether/rte_ethdev.c           |  14 ++++
 lib/librte_ether/rte_ethdev.h           |  31 ++++++++
 12 files changed, 229 insertions(+), 4 deletions(-)

-- 
2.9.3

^ permalink raw reply	[flat|nested] 54+ messages in thread

* [dpdk-dev] [PATCH v7 1/3] ethdev: new API to free consumed buffers in Tx ring
  2017-03-15 18:02       ` [dpdk-dev] [PATCH v7 0/3] new API to free consumed buffers in Tx ring Billy McFall
@ 2017-03-15 18:02         ` Billy McFall
  2017-03-23 10:37           ` Olivier MATZ
  2017-03-15 18:02         ` [dpdk-dev] [PATCH v7 2/3] net/e1000: e1000 igb support to free consumed buffers Billy McFall
                           ` (3 subsequent siblings)
  4 siblings, 1 reply; 54+ messages in thread
From: Billy McFall @ 2017-03-15 18:02 UTC (permalink / raw)
  To: thomas.monjalon, wenzhuo.lu, olivier.matz; +Cc: dev, Billy McFall

Add a new API to force free consumed buffers on Tx ring. API will return
the number of packets freed (0-n) or error code if feature not supported
(-ENOTSUP) or input invalid (-ENODEV).

Signed-off-by: Billy McFall <bmcfall@redhat.com>
---
 doc/guides/conf.py                      |  7 +++++--
 doc/guides/nics/features/default.ini    |  4 +++-
 doc/guides/prog_guide/poll_mode_drv.rst | 28 ++++++++++++++++++++++++++++
 doc/guides/rel_notes/release_17_05.rst  |  7 ++++++-
 lib/librte_ether/rte_ethdev.c           | 14 ++++++++++++++
 lib/librte_ether/rte_ethdev.h           | 31 +++++++++++++++++++++++++++++++
 6 files changed, 87 insertions(+), 4 deletions(-)

diff --git a/doc/guides/conf.py b/doc/guides/conf.py
index 34c62de..4cac26d 100644
--- a/doc/guides/conf.py
+++ b/doc/guides/conf.py
@@ -64,6 +64,9 @@
 
 master_doc = 'index'
 
+# Maximum feature description string length
+feature_str_len = 25
+
 # Figures, tables and code-blocks automatically numbered if they have caption
 numfig = True
 
@@ -300,7 +303,7 @@ def print_table_body(outfile, num_cols, ini_files, ini_data, default_features):
 def print_table_row(outfile, feature, line):
     """ Print a single row of the table with fixed formatting. """
     line = line.rstrip()
-    print('   {:<20}{}'.format(feature, line), file=outfile)
+    print('   {:<{}}{}'.format(feature, feature_str_len, line), file=outfile)
 
 
 def print_table_divider(outfile, num_cols):
@@ -309,7 +312,7 @@ def print_table_divider(outfile, num_cols):
     column_dividers = ['='] * num_cols
     line += ' '.join(column_dividers)
 
-    feature = '=' * 20
+    feature = '=' * feature_str_len
 
     print_table_row(outfile, feature, line)
 
diff --git a/doc/guides/nics/features/default.ini b/doc/guides/nics/features/default.ini
index 299078f..0135c0c 100644
--- a/doc/guides/nics/features/default.ini
+++ b/doc/guides/nics/features/default.ini
@@ -3,7 +3,8 @@
 ;
 ; This file defines the features that are valid for inclusion in
 ; the other driver files and also the order that they appear in
-; the features table in the documentation.
+; the features table in the documentation. The feature description
+; string should not exceed feature_str_len defined in conf.py.
 ;
 [Features]
 Speed capabilities   =
@@ -11,6 +12,7 @@ Link status          =
 Link status event    =
 Queue status event   =
 Rx interrupt         =
+Free Tx mbuf on demand =
 Queue start/stop     =
 MTU update           =
 Jumbo frame          =
diff --git a/doc/guides/prog_guide/poll_mode_drv.rst b/doc/guides/prog_guide/poll_mode_drv.rst
index d4c92ea..21f7a9d 100644
--- a/doc/guides/prog_guide/poll_mode_drv.rst
+++ b/doc/guides/prog_guide/poll_mode_drv.rst
@@ -249,6 +249,34 @@ One descriptor in the TX ring is used as a sentinel to avoid a hardware race con
 
     When configuring for DCB operation, at port initialization, both the number of transmit queues and the number of receive queues must be set to 128.
 
+Free Tx mbuf on Demand
+~~~~~~~~~~~~~~~~~~~~~~
+
+Many of the drivers don't release the mbuf back to the mempool, or local cache, immediately after the packet has been
+transmitted.
+Instead, they leave the mbuf in their Tx ring and either perform a bulk release when the ``tx_rs_thresh`` has been
+crossed or free the mbuf when a slot in the Tx ring is needed.
+
+An application can request the driver to release used mbufs with the ``rte_eth_tx_done_cleanup()`` API.
+This API requests the driver to release mbufs that are no longer in use, independent of whether or not the
+``tx_rs_thresh`` has been crossed.
+There are two scenarios when an application may want the mbuf released immediately:
+
+* When a given packet needs to be sent to multiple destination interfaces (either for Layer 2 flooding or Layer 3
+  multi-cast).
+  One option is to make a copy of the packet or a copy of the header portion that needs to be manipulated.
+  A second option is to transmit the packet and then poll the ``rte_eth_tx_done_cleanup()`` API until the reference
+  count on the packet is decremented.
+  Then the same packet can be transmitted to the next destination interface.
+
+* If an application is designed to make multiple runs, like a packet generator, and one run has completed.
+  The application may want to reset to a clean state.
+  In this case, it may want to call the ``rte_eth_tx_done_cleanup()`` API to request each destination interface it has
+  been using to release all of its used mbufs.
+
+To determine if a driver supports this API, check for the *Free Tx mbuf on demand* feature in the *Network Interface
+Controller Drivers* document.
+
 Hardware Offload
 ~~~~~~~~~~~~~~~~
 
diff --git a/doc/guides/rel_notes/release_17_05.rst b/doc/guides/rel_notes/release_17_05.rst
index 4b90036..7b9c92c 100644
--- a/doc/guides/rel_notes/release_17_05.rst
+++ b/doc/guides/rel_notes/release_17_05.rst
@@ -41,11 +41,16 @@ New Features
      Also, make sure to start the actual text at the margin.
      =========================================================
 
-
 * **Added powerpc support in pci probing for vfio-pci devices.**
 
   sPAPR IOMMU based pci probing enabled for vfio-pci devices.
 
+* **Added free Tx mbuf on demand API.**
+
+  Added a new function ``rte_eth_tx_done_cleanup()`` which allows an application
+  to request the driver to release mbufs from their Tx ring that are no longer
+  in use, independent of whether or not the ``tx_rs_thresh`` has been crossed.
+
 Resolved Issues
 ---------------
 
diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index eb0a94a..b796e7d 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -1275,6 +1275,20 @@ rte_eth_tx_buffer_init(struct rte_eth_dev_tx_buffer *buffer, uint16_t size)
 	return ret;
 }
 
+int
+rte_eth_tx_done_cleanup(uint8_t port_id, uint16_t queue_id, uint32_t free_cnt)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+
+	/* Validate Input Data. Bail if not valid or not supported. */
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
+	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_done_cleanup, -ENOTSUP);
+
+	/* Call driver to free pending mbufs. */
+	return (*dev->dev_ops->tx_done_cleanup)(dev->data->tx_queues[queue_id],
+			free_cnt);
+}
+
 void
 rte_eth_promiscuous_enable(uint8_t port_id)
 {
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 4be217c..b3ee872 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -1183,6 +1183,9 @@ typedef int (*eth_fw_version_get_t)(struct rte_eth_dev *dev,
 				     char *fw_version, size_t fw_size);
 /**< @internal Get firmware information of an Ethernet device. */
 
+typedef int (*eth_tx_done_cleanup_t)(void *txq, uint32_t free_cnt);
+/**< @internal Force mbufs to be from TX ring. */
+
 typedef void (*eth_rxq_info_get_t)(struct rte_eth_dev *dev,
 	uint16_t rx_queue_id, struct rte_eth_rxq_info *qinfo);
 
@@ -1488,6 +1491,7 @@ struct eth_dev_ops {
 	eth_rx_disable_intr_t      rx_queue_intr_disable; /**< Disable Rx queue interrupt. */
 	eth_tx_queue_setup_t       tx_queue_setup;/**< Set up device TX queue. */
 	eth_queue_release_t        tx_queue_release; /**< Release TX queue. */
+	eth_tx_done_cleanup_t      tx_done_cleanup;/**< Free tx ring mbufs */
 
 	eth_dev_led_on_t           dev_led_on;    /**< Turn on LED. */
 	eth_dev_led_off_t          dev_led_off;   /**< Turn off LED. */
@@ -3178,6 +3182,33 @@ rte_eth_tx_buffer_count_callback(struct rte_mbuf **pkts, uint16_t unsent,
 		void *userdata);
 
 /**
+ * Request the driver to free mbufs currently cached by the driver. The
+ * driver will only free the mbuf if it is no longer in use. It is the
+ * application's responsibity to ensure rte_eth_tx_buffer_flush(..) is
+ * called if needed.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param queue_id
+ *   The index of the transmit queue through which output packets must be
+ *   sent.
+ *   The value must be in the range [0, nb_tx_queue - 1] previously supplied
+ *   to rte_eth_dev_configure().
+ * @param free_cnt
+ *   Maximum number of packets to free. Use 0 to indicate all possible packets
+ *   should be freed. Note that a packet may be using multiple mbufs.
+ * @return
+ *   Failure: < 0
+ *     -ENODEV: Invalid interface
+ *     -ENOTSUP: Driver does not support function
+ *   Success: >= 0
+ *     0-n: Number of packets freed. More packets may still remain in ring that
+ *     are in use.
+ */
+int
+rte_eth_tx_done_cleanup(uint8_t port_id, uint16_t queue_id, uint32_t free_cnt);
+
+/**
  * The eth device event type for interrupt, and maybe others in the future.
  */
 enum rte_eth_event_type {
-- 
2.9.3

^ permalink raw reply	[flat|nested] 54+ messages in thread

* [dpdk-dev] [PATCH v7 2/3] net/e1000: e1000 igb support to free consumed buffers
  2017-03-15 18:02       ` [dpdk-dev] [PATCH v7 0/3] new API to free consumed buffers in Tx ring Billy McFall
  2017-03-15 18:02         ` [dpdk-dev] [PATCH v7 1/3] ethdev: " Billy McFall
@ 2017-03-15 18:02         ` Billy McFall
  2017-03-15 18:02         ` [dpdk-dev] [PATCH v7 3/3] net/vhost: vHost " Billy McFall
                           ` (2 subsequent siblings)
  4 siblings, 0 replies; 54+ messages in thread
From: Billy McFall @ 2017-03-15 18:02 UTC (permalink / raw)
  To: thomas.monjalon, wenzhuo.lu, olivier.matz; +Cc: dev, Billy McFall

Add support to the e1000 igb driver for the new API to force free
consumed buffers on Tx ring. This API is independent of the tx_rs_thresh
setting. With this API, buffers should be free even if tx_rs_thresh is
not met.

e1000 igb driver does not implement a tx_rs_thresh to free mbufs, it
frees a slot in the ring as needed. However, it could be implemented at
some future date.

Signed-off-by: Billy McFall <bmcfall@redhat.com>
Acked-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
---
 doc/guides/nics/features/e1000.ini |   1 +
 drivers/net/e1000/e1000_ethdev.h   |   2 +
 drivers/net/e1000/igb_ethdev.c     |   1 +
 drivers/net/e1000/igb_rxtx.c       | 126 +++++++++++++++++++++++++++++++++++++
 4 files changed, 130 insertions(+)

diff --git a/doc/guides/nics/features/e1000.ini b/doc/guides/nics/features/e1000.ini
index 7f6d55c..3aed7d7 100644
--- a/doc/guides/nics/features/e1000.ini
+++ b/doc/guides/nics/features/e1000.ini
@@ -7,6 +7,7 @@
 Link status          = Y
 Link status event    = Y
 Rx interrupt         = Y
+Free Tx mbuf on demand = Y
 MTU update           = Y
 Jumbo frame          = Y
 Scattered Rx         = Y
diff --git a/drivers/net/e1000/e1000_ethdev.h b/drivers/net/e1000/e1000_ethdev.h
index 81a6dbb..39b2f43 100644
--- a/drivers/net/e1000/e1000_ethdev.h
+++ b/drivers/net/e1000/e1000_ethdev.h
@@ -315,6 +315,8 @@ int eth_igb_tx_queue_setup(struct rte_eth_dev *dev, uint16_t tx_queue_id,
 		uint16_t nb_tx_desc, unsigned int socket_id,
 		const struct rte_eth_txconf *tx_conf);
 
+int eth_igb_tx_done_cleanup(void *txq, uint32_t free_cnt);
+
 int eth_igb_rx_init(struct rte_eth_dev *dev);
 
 void eth_igb_tx_init(struct rte_eth_dev *dev);
diff --git a/drivers/net/e1000/igb_ethdev.c b/drivers/net/e1000/igb_ethdev.c
index a112b38..71d05a9 100644
--- a/drivers/net/e1000/igb_ethdev.c
+++ b/drivers/net/e1000/igb_ethdev.c
@@ -408,6 +408,7 @@ static const struct eth_dev_ops eth_igb_ops = {
 	.rx_descriptor_done   = eth_igb_rx_descriptor_done,
 	.tx_queue_setup       = eth_igb_tx_queue_setup,
 	.tx_queue_release     = eth_igb_tx_queue_release,
+	.tx_done_cleanup      = eth_igb_tx_done_cleanup,
 	.dev_led_on           = eth_igb_led_on,
 	.dev_led_off          = eth_igb_led_off,
 	.flow_ctrl_get        = eth_igb_flow_ctrl_get,
diff --git a/drivers/net/e1000/igb_rxtx.c b/drivers/net/e1000/igb_rxtx.c
index 1bb4d85..cba3704 100644
--- a/drivers/net/e1000/igb_rxtx.c
+++ b/drivers/net/e1000/igb_rxtx.c
@@ -1283,6 +1283,132 @@ eth_igb_tx_queue_release(void *txq)
 	igb_tx_queue_release(txq);
 }
 
+static int
+igb_tx_done_cleanup(struct igb_tx_queue *txq, uint32_t free_cnt)
+{
+	struct igb_tx_entry *sw_ring;
+	volatile union e1000_adv_tx_desc *txr;
+	uint16_t tx_first; /* First segment analyzed. */
+	uint16_t tx_id;    /* Current segment being processed. */
+	uint16_t tx_last;  /* Last segment in the current packet. */
+	uint16_t tx_next;  /* First segment of the next packet. */
+	int count;
+
+	if (txq != NULL) {
+		count = 0;
+		sw_ring = txq->sw_ring;
+		txr = txq->tx_ring;
+
+		/*
+		 * tx_tail is the last sent packet on the sw_ring. Goto the end
+		 * of that packet (the last segment in the packet chain) and
+		 * then the next segment will be the start of the oldest segment
+		 * in the sw_ring. This is the first packet that will be
+		 * attempted to be freed.
+		 */
+
+		/* Get last segment in most recently added packet. */
+		tx_first = sw_ring[txq->tx_tail].last_id;
+
+		/* Get the next segment, which is the oldest segment in ring. */
+		tx_first = sw_ring[tx_first].next_id;
+
+		/* Set the current index to the first. */
+		tx_id = tx_first;
+
+		/*
+		 * Loop through each packet. For each packet, verify that an
+		 * mbuf exists and that the last segment is free. If so, free
+		 * it and move on.
+		 */
+		while (1) {
+			tx_last = sw_ring[tx_id].last_id;
+
+			if (sw_ring[tx_last].mbuf) {
+				if (txr[tx_last].wb.status &
+						E1000_TXD_STAT_DD) {
+					/*
+					 * Increment the number of packets
+					 * freed.
+					 */
+					count++;
+
+					/* Get the start of the next packet. */
+					tx_next = sw_ring[tx_last].next_id;
+
+					/*
+					 * Loop through all segments in a
+					 * packet.
+					 */
+					do {
+						rte_pktmbuf_free_seg(sw_ring[tx_id].mbuf);
+						sw_ring[tx_id].mbuf = NULL;
+						sw_ring[tx_id].last_id = tx_id;
+
+						/* Move to next segemnt. */
+						tx_id = sw_ring[tx_id].next_id;
+
+					} while (tx_id != tx_next);
+
+					if (unlikely(count == (int)free_cnt))
+						break;
+				} else
+					/*
+					 * mbuf still in use, nothing left to
+					 * free.
+					 */
+					break;
+			} else {
+				/*
+				 * There are multiple reasons to be here:
+				 * 1) All the packets on the ring have been
+				 *    freed - tx_id is equal to tx_first
+				 *    and some packets have been freed.
+				 *    - Done, exit
+				 * 2) Interfaces has not sent a rings worth of
+				 *    packets yet, so the segment after tail is
+				 *    still empty. Or a previous call to this
+				 *    function freed some of the segments but
+				 *    not all so there is a hole in the list.
+				 *    Hopefully this is a rare case.
+				 *    - Walk the list and find the next mbuf. If
+				 *      there isn't one, then done.
+				 */
+				if (likely((tx_id == tx_first) && (count != 0)))
+					break;
+
+				/*
+				 * Walk the list and find the next mbuf, if any.
+				 */
+				do {
+					/* Move to next segemnt. */
+					tx_id = sw_ring[tx_id].next_id;
+
+					if (sw_ring[tx_id].mbuf)
+						break;
+
+				} while (tx_id != tx_first);
+
+				/*
+				 * Determine why previous loop bailed. If there
+				 * is not an mbuf, done.
+				 */
+				if (sw_ring[tx_id].mbuf == NULL)
+					break;
+			}
+		}
+	} else
+		count = -ENODEV;
+
+	return count;
+}
+
+int
+eth_igb_tx_done_cleanup(void *txq, uint32_t free_cnt)
+{
+	return igb_tx_done_cleanup(txq, free_cnt);
+}
+
 static void
 igb_reset_tx_queue_stat(struct igb_tx_queue *txq)
 {
-- 
2.9.3

^ permalink raw reply	[flat|nested] 54+ messages in thread

* [dpdk-dev] [PATCH v7 3/3] net/vhost: vHost support to free consumed buffers
  2017-03-15 18:02       ` [dpdk-dev] [PATCH v7 0/3] new API to free consumed buffers in Tx ring Billy McFall
  2017-03-15 18:02         ` [dpdk-dev] [PATCH v7 1/3] ethdev: " Billy McFall
  2017-03-15 18:02         ` [dpdk-dev] [PATCH v7 2/3] net/e1000: e1000 igb support to free consumed buffers Billy McFall
@ 2017-03-15 18:02         ` Billy McFall
  2017-03-15 20:25         ` [dpdk-dev] [PATCH v7 0/3] new API to free consumed buffers in Tx ring Wiles, Keith
  2017-03-24 18:55         ` [dpdk-dev] [PATCH v8 " Billy McFall
  4 siblings, 0 replies; 54+ messages in thread
From: Billy McFall @ 2017-03-15 18:02 UTC (permalink / raw)
  To: thomas.monjalon, wenzhuo.lu, olivier.matz; +Cc: dev, Billy McFall

Add support to the vHostdriver for the new API to force free consumed
buffers on Tx ring. vHost does not cache the mbufs so there is no work
to do.

Signed-off-by: Billy McFall <bmcfall@redhat.com>
Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 doc/guides/nics/features/vhost.ini |  1 +
 drivers/net/vhost/rte_eth_vhost.c  | 11 +++++++++++
 2 files changed, 12 insertions(+)

diff --git a/doc/guides/nics/features/vhost.ini b/doc/guides/nics/features/vhost.ini
index 23166fb..dffd1f4 100644
--- a/doc/guides/nics/features/vhost.ini
+++ b/doc/guides/nics/features/vhost.ini
@@ -6,6 +6,7 @@
 [Features]
 Link status          = Y
 Link status event    = Y
+Free Tx mbuf on demand = Y
 Queue status event   = Y
 Basic stats          = Y
 Extended stats       = Y
diff --git a/drivers/net/vhost/rte_eth_vhost.c b/drivers/net/vhost/rte_eth_vhost.c
index e98cffd..abe91c7 100644
--- a/drivers/net/vhost/rte_eth_vhost.c
+++ b/drivers/net/vhost/rte_eth_vhost.c
@@ -960,6 +960,16 @@ eth_queue_release(void *q)
 }
 
 static int
+eth_tx_done_cleanup(void *txq __rte_unused, uint32_t free_cnt __rte_unused)
+{
+	/*
+	 * vHost does not hang onto mbuf. eth_vhost_tx() copies packet data
+	 * and releases mbuf, so nothing to cleanup.
+	 */
+	return 0;
+}
+
+static int
 eth_link_update(struct rte_eth_dev *dev __rte_unused,
 		int wait_to_complete __rte_unused)
 {
@@ -1001,6 +1011,7 @@ static const struct eth_dev_ops ops = {
 	.tx_queue_setup = eth_tx_queue_setup,
 	.rx_queue_release = eth_queue_release,
 	.tx_queue_release = eth_queue_release,
+	.tx_done_cleanup = eth_tx_done_cleanup,
 	.link_update = eth_link_update,
 	.stats_get = eth_stats_get,
 	.stats_reset = eth_stats_reset,
-- 
2.9.3

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [dpdk-dev] [PATCH v7 0/3] new API to free consumed buffers in Tx ring
  2017-03-15 18:02       ` [dpdk-dev] [PATCH v7 0/3] new API to free consumed buffers in Tx ring Billy McFall
                           ` (2 preceding siblings ...)
  2017-03-15 18:02         ` [dpdk-dev] [PATCH v7 3/3] net/vhost: vHost " Billy McFall
@ 2017-03-15 20:25         ` Wiles, Keith
  2017-03-24 18:55         ` [dpdk-dev] [PATCH v8 " Billy McFall
  4 siblings, 0 replies; 54+ messages in thread
From: Wiles, Keith @ 2017-03-15 20:25 UTC (permalink / raw)
  To: Billy McFall; +Cc: Thomas Monjalon, Lu, Wenzhuo, olivier.matz, dev


> On Mar 16, 2017, at 2:02 AM, Billy McFall <bmcfall@redhat.com> wrote:
> 
> See request from 11/21/2016:
>  http://dpdk.org/ml/archives/dev/2016-November/050585.html
> 
> Add a new API to free consumed buffers on TX ring. This addresses two
> scenarios:
> 1) Flooding a packet and want to reuse existing mbuf to avoid a packet
> copy. Increment the reference count of the packet and poll new API until
> reference count is decremented.
> 2) Application runs out of mbufs, or resets and is preparing for
> additional run, call API to free consumed packets so processing can
> continue.
> 
> API will return the number of packets freed (0-n) or error code if
> feature not supported (-ENOTSUP) or input invalid (-ENODEV).
> 
> API for e1000 igb driver and vHost driver have been implemented. Other
> drivers can be implemented over time. Some drivers implement a Tx done
> flush routine that should be reused where possible. e1000 igb driver
> and vHost driver do not have such functions.
> 
> ---
> v2:
> * Removed rte_eth_tx_buffer_flush() call and associated parameters
>  from new rte_eth_tx_done_cleanup() API.
> 
> * Remaining open issue is whether this should be a new API or overload 
>  existing rte_eth_tx_buffer() API with input parameter nb_pkts set to
>  0. My concern is that overloading existing API will not provided
>  enough feedback to application. Application needs to know if feature
>  is supported and driver is attempting to free mbufs or not.
> 
> * If new API is supported, second open issue is if parameter free_cnt
>  should be included. It was in the original feature request.
> 
> ---
> v3:
> * Removed extra white space in rte_ethdev.h.
> * Removed inline of new API function in rte_ethdev.h.
> 
> ---
> v4:
> * Added new API to documentation of per nic supported features.
> 
> ---
> v5:
> * Added documentation to the Programmer's Guide.
> 
> ---
> v6:
> * Rebase to master.
> * Added Release Note.
> * Addressed comments on documentation to the Programmer's Guide.
> * Renamed feature string to "Free Tx mbuf on demand" and modified
>  feature string processing code to extend the maximum string length
>  from 20 characters to 25 characters.
> * Reworded the commit log for PATCH 2/3 to attempt to clearify that the
>  API is independent of tx_rs_thresh.  
> * Will address the rte_errno comment in a separate merge.
> 
> ---
> v7:
> * New API was located between rte_eth_tx_buffer and
>  rte_eth_tx_buffer_set_err_callback in the file. Moved this function
>  below the tx_buffer functions.
> * Added previously acked-by tags to merge comments.
> * Moved documentation content from the mempool documentation to the
>  ethdev documentation.
> 
> 
> Billy McFall (3):
>  ethdev: new API to free consumed buffers in Tx ring
>  net/e1000: e1000 igb support to free consumed buffers
>  net/vhost: vHost support to free consumed buffers
> 
> doc/guides/conf.py                      |   7 +-
> doc/guides/nics/features/default.ini    |   4 +-
> doc/guides/nics/features/e1000.ini      |   1 +
> doc/guides/nics/features/vhost.ini      |   1 +
> doc/guides/prog_guide/poll_mode_drv.rst |  28 +++++++
> doc/guides/rel_notes/release_17_05.rst  |   7 +-
> drivers/net/e1000/e1000_ethdev.h        |   2 +
> drivers/net/e1000/igb_ethdev.c          |   1 +
> drivers/net/e1000/igb_rxtx.c            | 126 ++++++++++++++++++++++++++++++++
> drivers/net/vhost/rte_eth_vhost.c       |  11 +++
> lib/librte_ether/rte_ethdev.c           |  14 ++++
> lib/librte_ether/rte_ethdev.h           |  31 ++++++++
> 12 files changed, 229 insertions(+), 4 deletions(-)
> 
> -- 
> 2.9.3
> 

Acked-by: Keith.Wiles@intel.com for the series.

Regards,
Keith

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [dpdk-dev] [PATCH v7 1/3] ethdev: new API to free consumed buffers in Tx ring
  2017-03-15 18:02         ` [dpdk-dev] [PATCH v7 1/3] ethdev: " Billy McFall
@ 2017-03-23 10:37           ` Olivier MATZ
  2017-03-23 13:32             ` Billy McFall
  0 siblings, 1 reply; 54+ messages in thread
From: Olivier MATZ @ 2017-03-23 10:37 UTC (permalink / raw)
  To: Billy McFall; +Cc: thomas.monjalon, wenzhuo.lu, dev

Hi Billy,

On Wed, 15 Mar 2017 14:02:24 -0400, Billy McFall <bmcfall@redhat.com> wrote:
> Add a new API to force free consumed buffers on Tx ring. API will return
> the number of packets freed (0-n) or error code if feature not supported
> (-ENOTSUP) or input invalid (-ENODEV).
> 
> Signed-off-by: Billy McFall <bmcfall@redhat.com>
> ---
>  doc/guides/conf.py                      |  7 +++++--
>  doc/guides/nics/features/default.ini    |  4 +++-
>  doc/guides/prog_guide/poll_mode_drv.rst | 28 ++++++++++++++++++++++++++++
>  doc/guides/rel_notes/release_17_05.rst  |  7 ++++++-
>  lib/librte_ether/rte_ethdev.c           | 14 ++++++++++++++
>  lib/librte_ether/rte_ethdev.h           | 31 +++++++++++++++++++++++++++++++
>  6 files changed, 87 insertions(+), 4 deletions(-)
> 

[...]

> --- a/doc/guides/prog_guide/poll_mode_drv.rst
> +++ b/doc/guides/prog_guide/poll_mode_drv.rst
> @@ -249,6 +249,34 @@ One descriptor in the TX ring is used as a sentinel to avoid a hardware race con
>  
>      When configuring for DCB operation, at port initialization, both the number of transmit queues and the number of receive queues must be set to 128.
>  
> +Free Tx mbuf on Demand
> +~~~~~~~~~~~~~~~~~~~~~~
> +
> +Many of the drivers don't release the mbuf back to the mempool, or local cache, immediately after the packet has been
> +transmitted.
> +Instead, they leave the mbuf in their Tx ring and either perform a bulk release when the ``tx_rs_thresh`` has been
> +crossed or free the mbuf when a slot in the Tx ring is needed.
> +
> +An application can request the driver to release used mbufs with the ``rte_eth_tx_done_cleanup()`` API.
> +This API requests the driver to release mbufs that are no longer in use, independent of whether or not the
> +``tx_rs_thresh`` has been crossed.
> +There are two scenarios when an application may want the mbuf released immediately:
> +
> +* When a given packet needs to be sent to multiple destination interfaces (either for Layer 2 flooding or Layer 3
> +  multi-cast).
> +  One option is to make a copy of the packet or a copy of the header portion that needs to be manipulated.
> +  A second option is to transmit the packet and then poll the ``rte_eth_tx_done_cleanup()`` API until the reference
> +  count on the packet is decremented.
> +  Then the same packet can be transmitted to the next destination interface.

By reading this paragraph, it's not so clear to me that the packet
that will be transmitted on all interfaces will be different from
one port to another.

Maybe it could be reworded to insist on that?


> +
> +* If an application is designed to make multiple runs, like a packet generator, and one run has completed.
> +  The application may want to reset to a clean state.

I'd reword into:

Some applications are designed to make multiple runs, like a packet generator.
Between each run, the application may want to reset to a clean state.

What do you mean by "clean state"? All mbufs returned into the mempools?
Why would a packet generator need that? For performance?

Also, do we want to ensure that all packets are actually transmitted?
Can we do that with this API or should we use another API like
rte_eth_tx_descriptor_status() [1] ?

[1] http://dpdk.org/dev/patchwork/patch/21549/


Thanks,
Olivier

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [dpdk-dev] [PATCH v7 1/3] ethdev: new API to free consumed buffers in Tx ring
  2017-03-23 10:37           ` Olivier MATZ
@ 2017-03-23 13:32             ` Billy McFall
  2017-03-24 12:46               ` Olivier Matz
  0 siblings, 1 reply; 54+ messages in thread
From: Billy McFall @ 2017-03-23 13:32 UTC (permalink / raw)
  To: Olivier MATZ; +Cc: thomas.monjalon, wenzhuo.lu, dev

Thank you for your comments. See inline.

On Thu, Mar 23, 2017 at 6:37 AM, Olivier MATZ <olivier.matz@6wind.com>
wrote:

> Hi Billy,
>
> On Wed, 15 Mar 2017 14:02:24 -0400, Billy McFall <bmcfall@redhat.com>
> wrote:
> > Add a new API to force free consumed buffers on Tx ring. API will return
> > the number of packets freed (0-n) or error code if feature not supported
> > (-ENOTSUP) or input invalid (-ENODEV).
> >
> > Signed-off-by: Billy McFall <bmcfall@redhat.com>
> > ---
> >  doc/guides/conf.py                      |  7 +++++--
> >  doc/guides/nics/features/default.ini    |  4 +++-
> >  doc/guides/prog_guide/poll_mode_drv.rst | 28
> ++++++++++++++++++++++++++++
> >  doc/guides/rel_notes/release_17_05.rst  |  7 ++++++-
> >  lib/librte_ether/rte_ethdev.c           | 14 ++++++++++++++
> >  lib/librte_ether/rte_ethdev.h           | 31
> +++++++++++++++++++++++++++++++
> >  6 files changed, 87 insertions(+), 4 deletions(-)
> >
>
> [...]
>
> > --- a/doc/guides/prog_guide/poll_mode_drv.rst
> > +++ b/doc/guides/prog_guide/poll_mode_drv.rst
> > @@ -249,6 +249,34 @@ One descriptor in the TX ring is used as a sentinel
> to avoid a hardware race con
> >
> >      When configuring for DCB operation, at port initialization, both
> the number of transmit queues and the number of receive queues must be set
> to 128.
> >
> > +Free Tx mbuf on Demand
> > +~~~~~~~~~~~~~~~~~~~~~~
> > +
> > +Many of the drivers don't release the mbuf back to the mempool, or
> local cache, immediately after the packet has been
> > +transmitted.
> > +Instead, they leave the mbuf in their Tx ring and either perform a bulk
> release when the ``tx_rs_thresh`` has been
> > +crossed or free the mbuf when a slot in the Tx ring is needed.
> > +
> > +An application can request the driver to release used mbufs with the
> ``rte_eth_tx_done_cleanup()`` API.
> > +This API requests the driver to release mbufs that are no longer in
> use, independent of whether or not the
> > +``tx_rs_thresh`` has been crossed.
> > +There are two scenarios when an application may want the mbuf released
> immediately:
> > +
> > +* When a given packet needs to be sent to multiple destination
> interfaces (either for Layer 2 flooding or Layer 3
> > +  multi-cast).
> > +  One option is to make a copy of the packet or a copy of the header
> portion that needs to be manipulated.
> > +  A second option is to transmit the packet and then poll the
> ``rte_eth_tx_done_cleanup()`` API until the reference
> > +  count on the packet is decremented.
> > +  Then the same packet can be transmitted to the next destination
> interface.
>
> By reading this paragraph, it's not so clear to me that the packet
> that will be transmitted on all interfaces will be different from
> one port to another.
>
> Maybe it could be reworded to insist on that?
>
>
What if I add the following sentence:

  Then the same packet can be transmitted to the next destination interface.
+ The application is still responsible for managing any packet
manipulations needed between the different destination
+ interfaces, but a packet copy can be avoided.


>
> > +
> > +* If an application is designed to make multiple runs, like a packet
> generator, and one run has completed.
> > +  The application may want to reset to a clean state.
>
> I'd reword into:
>
> Some applications are designed to make multiple runs, like a packet
> generator.
> Between each run, the application may want to reset to a clean state.
>
> What do you mean by "clean state"? All mbufs returned into the mempools?
> Why would a packet generator need that? For performance?
>
> Reworded as you suggested, then attempted to explain a 'clean state'.
Also reworded the last sentence a little.

+ * Some applications are designed to make multiple runs, like a packet
generator.
+   For performance reasons and consistency between runs, the application
may want to reset back to an initial state
+   between each run, where all mbufs are returned to the mempool.
+   In this case, it can call the ``rte_eth_tx_done_cleanup()`` API for
each destination interface it has been using
+   to request it to release of all its used mbufs.


> Also, do we want to ensure that all packets are actually transmitted?
>

Added an additional sentence to indicate that this API doesn't manage
whether or not the packet has been transmitted.

  Then the same packet can be transmitted to the next destination interface.
  The application is still responsible for managing any packet
manipulations needed between the different destination
  interface, but a packet copy can be avoided.
+  This API is independent of whether the packet was transmitted or
dropped, only that the mbuf is no longer in use by
+  the interface.


> Can we do that with this API or should we use another API like
> rte_eth_tx_descriptor_status() [1] ?
>
> [1] http://dpdk.org/dev/patchwork/patch/21549/
>
> I read through this patch. This API doesn't indicate if the packet was
transmitted or dropped (I think that is what you were asking). This API
could be used by the application to determine if the mbuf has been
freed, as opposed to polling the rte_mbuf_refcnt_read() for a change
in value. Did I miss your point?

>
> Thanks,
> Olivier
>

Thanks,
Billy McFall

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [dpdk-dev] [PATCH v7 1/3] ethdev: new API to free consumed buffers in Tx ring
  2017-03-23 13:32             ` Billy McFall
@ 2017-03-24 12:46               ` Olivier Matz
  2017-03-24 13:18                 ` Billy McFall
  0 siblings, 1 reply; 54+ messages in thread
From: Olivier Matz @ 2017-03-24 12:46 UTC (permalink / raw)
  To: Billy McFall; +Cc: thomas.monjalon, wenzhuo.lu, dev

Hi Billy,

On Thu, 23 Mar 2017 09:32:14 -0400, Billy McFall <bmcfall@redhat.com> wrote:
> Thank you for your comments. See inline.
> 
> On Thu, Mar 23, 2017 at 6:37 AM, Olivier MATZ <olivier.matz@6wind.com>
> wrote:
> 
> > Hi Billy,
> >
> > On Wed, 15 Mar 2017 14:02:24 -0400, Billy McFall <bmcfall@redhat.com>
> > wrote:  
> > > Add a new API to force free consumed buffers on Tx ring. API will return
> > > the number of packets freed (0-n) or error code if feature not supported
> > > (-ENOTSUP) or input invalid (-ENODEV).
> > >
> > > Signed-off-by: Billy McFall <bmcfall@redhat.com>
> > > ---
> > >  doc/guides/conf.py                      |  7 +++++--
> > >  doc/guides/nics/features/default.ini    |  4 +++-
> > >  doc/guides/prog_guide/poll_mode_drv.rst | 28  
> > ++++++++++++++++++++++++++++  
> > >  doc/guides/rel_notes/release_17_05.rst  |  7 ++++++-
> > >  lib/librte_ether/rte_ethdev.c           | 14 ++++++++++++++
> > >  lib/librte_ether/rte_ethdev.h           | 31  
> > +++++++++++++++++++++++++++++++  
> > >  6 files changed, 87 insertions(+), 4 deletions(-)
> > >  
> >
> > [...]
> >  
> > > --- a/doc/guides/prog_guide/poll_mode_drv.rst
> > > +++ b/doc/guides/prog_guide/poll_mode_drv.rst
> > > @@ -249,6 +249,34 @@ One descriptor in the TX ring is used as a sentinel  
> > to avoid a hardware race con  
> > >
> > >      When configuring for DCB operation, at port initialization, both  
> > the number of transmit queues and the number of receive queues must be set
> > to 128.  
> > >
> > > +Free Tx mbuf on Demand
> > > +~~~~~~~~~~~~~~~~~~~~~~
> > > +
> > > +Many of the drivers don't release the mbuf back to the mempool, or  
> > local cache, immediately after the packet has been  
> > > +transmitted.
> > > +Instead, they leave the mbuf in their Tx ring and either perform a bulk  
> > release when the ``tx_rs_thresh`` has been  
> > > +crossed or free the mbuf when a slot in the Tx ring is needed.
> > > +
> > > +An application can request the driver to release used mbufs with the  
> > ``rte_eth_tx_done_cleanup()`` API.  
> > > +This API requests the driver to release mbufs that are no longer in  
> > use, independent of whether or not the  
> > > +``tx_rs_thresh`` has been crossed.
> > > +There are two scenarios when an application may want the mbuf released  
> > immediately:  
> > > +
> > > +* When a given packet needs to be sent to multiple destination  
> > interfaces (either for Layer 2 flooding or Layer 3  
> > > +  multi-cast).
> > > +  One option is to make a copy of the packet or a copy of the header  
> > portion that needs to be manipulated.  
> > > +  A second option is to transmit the packet and then poll the  
> > ``rte_eth_tx_done_cleanup()`` API until the reference  
> > > +  count on the packet is decremented.
> > > +  Then the same packet can be transmitted to the next destination  
> > interface.
> >
> > By reading this paragraph, it's not so clear to me that the packet
> > that will be transmitted on all interfaces will be different from
> > one port to another.
> >
> > Maybe it could be reworded to insist on that?
> >
> >  
> What if I add the following sentence:
> 
>   Then the same packet can be transmitted to the next destination interface.
> + The application is still responsible for managing any packet
> manipulations needed between the different destination
> + interfaces, but a packet copy can be avoided.

looks good, thanks.



> > > +
> > > +* If an application is designed to make multiple runs, like a packet  
> > generator, and one run has completed.  
> > > +  The application may want to reset to a clean state.  
> >
> > I'd reword into:
> >
> > Some applications are designed to make multiple runs, like a packet
> > generator.
> > Between each run, the application may want to reset to a clean state.
> >
> > What do you mean by "clean state"? All mbufs returned into the mempools?
> > Why would a packet generator need that? For performance?
> >
> > Reworded as you suggested, then attempted to explain a 'clean state'.  
> Also reworded the last sentence a little.
> 
> + * Some applications are designed to make multiple runs, like a packet
> generator.
> +   For performance reasons and consistency between runs, the application
> may want to reset back to an initial state
> +   between each run, where all mbufs are returned to the mempool.
> +   In this case, it can call the ``rte_eth_tx_done_cleanup()`` API for
> each destination interface it has been using
> +   to request it to release of all its used mbufs.

ok, looks clearer to me, thanks


> > Also, do we want to ensure that all packets are actually transmitted?
> >  
> 
> Added an additional sentence to indicate that this API doesn't manage
> whether or not the packet has been transmitted.
> 
>   Then the same packet can be transmitted to the next destination interface.
>   The application is still responsible for managing any packet
> manipulations needed between the different destination
>   interface, but a packet copy can be avoided.
> +  This API is independent of whether the packet was transmitted or
> dropped, only that the mbuf is no longer in use by
> +  the interface.

ok


> > Can we do that with this API or should we use another API like
> > rte_eth_tx_descriptor_status() [1] ?
> >
> > [1] http://dpdk.org/dev/patchwork/patch/21549/
> >
> > I read through this patch. This API doesn't indicate if the packet was  
> transmitted or dropped (I think that is what you were asking). This API
> could be used by the application to determine if the mbuf has been
> freed, as opposed to polling the rte_mbuf_refcnt_read() for a change
> in value. Did I miss your point?

Maybe my question was not clear :)
Let me try to reword it.

For a traffic generator use-case, a dummy algorithm may be:

1/ send packets in a loop until a condition is met (ex: packet count reached)
2/ call rte_eth_tx_done_cleanup()
3/ read stats for report

I think there is something missing between 1/ and 2/, to ensure that
all packets that were in the tx queue are processed (either transmitted
or dropped). If that's not the case, both steps 2/ and 3/ will not
behave as expected:
- all mbufs won't be returned to the pool
- statistics may be wrong

Maybe a simple wait() could do the job.
Using a combination of rte_eth_tx_done_cleanup() + rte_eth_tx_descriptor_status()
is probably also a solution.

Do you confirm rte_eth_tx_done_cleanup() does not check that?

Thanks
Olivier

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [dpdk-dev] [PATCH v7 1/3] ethdev: new API to free consumed buffers in Tx ring
  2017-03-24 12:46               ` Olivier Matz
@ 2017-03-24 13:18                 ` Billy McFall
  2017-03-24 13:30                   ` Olivier Matz
  0 siblings, 1 reply; 54+ messages in thread
From: Billy McFall @ 2017-03-24 13:18 UTC (permalink / raw)
  To: Olivier Matz; +Cc: thomas.monjalon, wenzhuo.lu, dev

On Fri, Mar 24, 2017 at 8:46 AM, Olivier Matz <olivier.matz@6wind.com>
wrote:

> Hi Billy,
>
> On Thu, 23 Mar 2017 09:32:14 -0400, Billy McFall <bmcfall@redhat.com>
> wrote:
> > Thank you for your comments. See inline.
> >
> > On Thu, Mar 23, 2017 at 6:37 AM, Olivier MATZ <olivier.matz@6wind.com>
> > wrote:
> >
> > > Hi Billy,
> > >
> > > On Wed, 15 Mar 2017 14:02:24 -0400, Billy McFall <bmcfall@redhat.com>
> > > wrote:
> > > > Add a new API to force free consumed buffers on Tx ring. API will
> return
> > > > the number of packets freed (0-n) or error code if feature not
> supported
> > > > (-ENOTSUP) or input invalid (-ENODEV).
> > > >
> > > > Signed-off-by: Billy McFall <bmcfall@redhat.com>
> > > > ---
> > > >  doc/guides/conf.py                      |  7 +++++--
> > > >  doc/guides/nics/features/default.ini    |  4 +++-
> > > >  doc/guides/prog_guide/poll_mode_drv.rst | 28
> > > ++++++++++++++++++++++++++++
> > > >  doc/guides/rel_notes/release_17_05.rst  |  7 ++++++-
> > > >  lib/librte_ether/rte_ethdev.c           | 14 ++++++++++++++
> > > >  lib/librte_ether/rte_ethdev.h           | 31
> > > +++++++++++++++++++++++++++++++
> > > >  6 files changed, 87 insertions(+), 4 deletions(-)
> > > >
> > >
> > > [...]
> > >
> > > > --- a/doc/guides/prog_guide/poll_mode_drv.rst
> > > > +++ b/doc/guides/prog_guide/poll_mode_drv.rst
> > > > @@ -249,6 +249,34 @@ One descriptor in the TX ring is used as a
> sentinel
> > > to avoid a hardware race con
> > > >
> > > >      When configuring for DCB operation, at port initialization, both
> > > the number of transmit queues and the number of receive queues must be
> set
> > > to 128.
> > > >
> > > > +Free Tx mbuf on Demand
> > > > +~~~~~~~~~~~~~~~~~~~~~~
> > > > +
> > > > +Many of the drivers don't release the mbuf back to the mempool, or
> > > local cache, immediately after the packet has been
> > > > +transmitted.
> > > > +Instead, they leave the mbuf in their Tx ring and either perform a
> bulk
> > > release when the ``tx_rs_thresh`` has been
> > > > +crossed or free the mbuf when a slot in the Tx ring is needed.
> > > > +
> > > > +An application can request the driver to release used mbufs with the
> > > ``rte_eth_tx_done_cleanup()`` API.
> > > > +This API requests the driver to release mbufs that are no longer in
> > > use, independent of whether or not the
> > > > +``tx_rs_thresh`` has been crossed.
> > > > +There are two scenarios when an application may want the mbuf
> released
> > > immediately:
> > > > +
> > > > +* When a given packet needs to be sent to multiple destination
> > > interfaces (either for Layer 2 flooding or Layer 3
> > > > +  multi-cast).
> > > > +  One option is to make a copy of the packet or a copy of the header
> > > portion that needs to be manipulated.
> > > > +  A second option is to transmit the packet and then poll the
> > > ``rte_eth_tx_done_cleanup()`` API until the reference
> > > > +  count on the packet is decremented.
> > > > +  Then the same packet can be transmitted to the next destination
> > > interface.
> > >
> > > By reading this paragraph, it's not so clear to me that the packet
> > > that will be transmitted on all interfaces will be different from
> > > one port to another.
> > >
> > > Maybe it could be reworded to insist on that?
> > >
> > >
> > What if I add the following sentence:
> >
> >   Then the same packet can be transmitted to the next destination
> interface.
> > + The application is still responsible for managing any packet
> > manipulations needed between the different destination
> > + interfaces, but a packet copy can be avoided.
>
> looks good, thanks.
>
>
>
> > > > +
> > > > +* If an application is designed to make multiple runs, like a packet
> > > generator, and one run has completed.
> > > > +  The application may want to reset to a clean state.
> > >
> > > I'd reword into:
> > >
> > > Some applications are designed to make multiple runs, like a packet
> > > generator.
> > > Between each run, the application may want to reset to a clean state.
> > >
> > > What do you mean by "clean state"? All mbufs returned into the
> mempools?
> > > Why would a packet generator need that? For performance?
> > >
> > > Reworded as you suggested, then attempted to explain a 'clean state'.
> > Also reworded the last sentence a little.
> >
> > + * Some applications are designed to make multiple runs, like a packet
> > generator.
> > +   For performance reasons and consistency between runs, the application
> > may want to reset back to an initial state
> > +   between each run, where all mbufs are returned to the mempool.
> > +   In this case, it can call the ``rte_eth_tx_done_cleanup()`` API for
> > each destination interface it has been using
> > +   to request it to release of all its used mbufs.
>
> ok, looks clearer to me, thanks
>
>
> > > Also, do we want to ensure that all packets are actually transmitted?
> > >
> >
> > Added an additional sentence to indicate that this API doesn't manage
> > whether or not the packet has been transmitted.
> >
> >   Then the same packet can be transmitted to the next destination
> interface.
> >   The application is still responsible for managing any packet
> > manipulations needed between the different destination
> >   interface, but a packet copy can be avoided.
> > +  This API is independent of whether the packet was transmitted or
> > dropped, only that the mbuf is no longer in use by
> > +  the interface.
>
> ok
>
>
> > > Can we do that with this API or should we use another API like
> > > rte_eth_tx_descriptor_status() [1] ?
> > >
> > > [1] http://dpdk.org/dev/patchwork/patch/21549/
> > >
> > > I read through this patch. This API doesn't indicate if the packet was
> > transmitted or dropped (I think that is what you were asking). This API
> > could be used by the application to determine if the mbuf has been
> > freed, as opposed to polling the rte_mbuf_refcnt_read() for a change
> > in value. Did I miss your point?
>
> Maybe my question was not clear :)
> Let me try to reword it.
>
> For a traffic generator use-case, a dummy algorithm may be:
>
> 1/ send packets in a loop until a condition is met (ex: packet count
> reached)
> 2/ call rte_eth_tx_done_cleanup()
> 3/ read stats for report
>
> I think there is something missing between 1/ and 2/, to ensure that
> all packets that were in the tx queue are processed (either transmitted
> or dropped). If that's not the case, both steps 2/ and 3/ will not
> behave as expected:
> - all mbufs won't be returned to the pool
> - statistics may be wrong
>
> Maybe a simple wait() could do the job.
> Using a combination of rte_eth_tx_done_cleanup() +
> rte_eth_tx_descriptor_status()
> is probably also a solution.
>
> Do you confirm rte_eth_tx_done_cleanup() does not check that?
>
> Confirm.  rte_eth_tx_done_cleanup() does not check that. In the flooding
case,
the applications is expected to poll rte_eth_tx_done_cleanup() until some
condition
is met, like ref_count of given packet is decremented. So on the packetGen
case, the
application would need to wait some time and/or call
rte_eth_tx_descriptor_status()
as you suggested.

My original patch returned RTE_DONE (no more packets pending),
RTE_PROCESSING (freed what I could but there are still packets in the queue)
or -ERRNO for error. Then packets freed count was returned via a pointer in
the param list.
That would have solved what you are asking, but that was shot down as being
overkill.

Should I add another sentence to the packet generator bullet indicating
that it is the
application's job to make sure no more packets are pending? Like:

  In this case, it can call the ``rte_eth_tx_done_cleanup()`` API for each
destination interface it has been using
  to request it to release of all its used mbufs.
+ It is the application's responsibility to ensure all packets have been
processed by the destination interface.
+ Use rte_eth_tx_descriptor_status() to obtain the status of the transmit
queue,

Thanks
> Olivier
>



-- 
*Billy McFall*
SDN Group
Office of Technology
*Red Hat*

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [dpdk-dev] [PATCH v7 1/3] ethdev: new API to free consumed buffers in Tx ring
  2017-03-24 13:18                 ` Billy McFall
@ 2017-03-24 13:30                   ` Olivier Matz
  0 siblings, 0 replies; 54+ messages in thread
From: Olivier Matz @ 2017-03-24 13:30 UTC (permalink / raw)
  To: Billy McFall; +Cc: thomas.monjalon, wenzhuo.lu, dev

On Fri, 24 Mar 2017 09:18:54 -0400, Billy McFall <bmcfall@redhat.com> wrote:
> On Fri, Mar 24, 2017 at 8:46 AM, Olivier Matz <olivier.matz@6wind.com>
> wrote:

[...]

> > > I read through this patch. This API doesn't indicate if the packet was  
> > > transmitted or dropped (I think that is what you were asking). This API
> > > could be used by the application to determine if the mbuf has been
> > > freed, as opposed to polling the rte_mbuf_refcnt_read() for a change
> > > in value. Did I miss your point?  
> >
> > Maybe my question was not clear :)
> > Let me try to reword it.
> >
> > For a traffic generator use-case, a dummy algorithm may be:
> >
> > 1/ send packets in a loop until a condition is met (ex: packet count
> > reached)
> > 2/ call rte_eth_tx_done_cleanup()
> > 3/ read stats for report
> >
> > I think there is something missing between 1/ and 2/, to ensure that
> > all packets that were in the tx queue are processed (either transmitted
> > or dropped). If that's not the case, both steps 2/ and 3/ will not
> > behave as expected:
> > - all mbufs won't be returned to the pool
> > - statistics may be wrong
> >
> > Maybe a simple wait() could do the job.
> > Using a combination of rte_eth_tx_done_cleanup() +
> > rte_eth_tx_descriptor_status()
> > is probably also a solution.
> >
> > Do you confirm rte_eth_tx_done_cleanup() does not check that?
> >
> Confirm.  rte_eth_tx_done_cleanup() does not check that. In the flooding  
> case,
> the applications is expected to poll rte_eth_tx_done_cleanup() until some
> condition
> is met, like ref_count of given packet is decremented. So on the packetGen
> case, the
> application would need to wait some time and/or call
> rte_eth_tx_descriptor_status()
> as you suggested.
> 
> My original patch returned RTE_DONE (no more packets pending),
> RTE_PROCESSING (freed what I could but there are still packets in the queue)
> or -ERRNO for error. Then packets freed count was returned via a pointer in
> the param list.
> That would have solved what you are asking, but that was shot down as being
> overkill.
> 
> Should I add another sentence to the packet generator bullet indicating
> that it is the
> application's job to make sure no more packets are pending? Like:
> 
>   In this case, it can call the ``rte_eth_tx_done_cleanup()`` API for each
> destination interface it has been using
>   to request it to release of all its used mbufs.
> + It is the application's responsibility to ensure all packets have been
> processed by the destination interface.
> + Use rte_eth_tx_descriptor_status() to obtain the status of the transmit
> queue,

Thanks for the clarification.
Not sure the sentence is required, since rte_eth_tx_descriptor_status()
is not included yet.

Regards,
Olivier

^ permalink raw reply	[flat|nested] 54+ messages in thread

* [dpdk-dev] [PATCH v8 0/3] new API to free consumed buffers in Tx ring
  2017-03-15 18:02       ` [dpdk-dev] [PATCH v7 0/3] new API to free consumed buffers in Tx ring Billy McFall
                           ` (3 preceding siblings ...)
  2017-03-15 20:25         ` [dpdk-dev] [PATCH v7 0/3] new API to free consumed buffers in Tx ring Wiles, Keith
@ 2017-03-24 18:55         ` Billy McFall
  2017-03-24 18:55           ` [dpdk-dev] [PATCH v8 1/3] ethdev: " Billy McFall
                             ` (4 more replies)
  4 siblings, 5 replies; 54+ messages in thread
From: Billy McFall @ 2017-03-24 18:55 UTC (permalink / raw)
  To: thomas.monjalon, wenzhuo.lu, olivier.matz; +Cc: dev, Billy McFall

See request from 11/21/2016:
  http://dpdk.org/ml/archives/dev/2016-November/050585.html

Add a new API to free consumed buffers on TX ring. This addresses two
scenarios:
1) Flooding a packet and want to reuse existing mbuf to avoid a packet
copy. Increment the reference count of the packet and poll new API until
reference count is decremented.
2) Application runs out of mbufs, or resets and is preparing for
additional run, call API to free consumed packets so processing can
continue.

API will return the number of packets freed (0-n) or error code if
feature not supported (-ENOTSUP) or input invalid (-ENODEV).

API for e1000 igb driver and vHost driver have been implemented. Other
drivers can be implemented over time. Some drivers implement a Tx done
flush routine that should be reused where possible. e1000 igb driver
and vHost driver do not have such functions.

---
v2:
* Removed rte_eth_tx_buffer_flush() call and associated parameters
  from new rte_eth_tx_done_cleanup() API.

* Remaining open issue is whether this should be a new API or overload 
  existing rte_eth_tx_buffer() API with input parameter nb_pkts set to
  0. My concern is that overloading existing API will not provided
  enough feedback to application. Application needs to know if feature
  is supported and driver is attempting to free mbufs or not.

* If new API is supported, second open issue is if parameter free_cnt
  should be included. It was in the original feature request.

---
v3:
* Removed extra white space in rte_ethdev.h.
* Removed inline of new API function in rte_ethdev.h.

---
v4:
* Added new API to documentation of per nic supported features.

---
v5:
* Added documentation to the Programmer's Guide.

---
v6:
* Rebase to master.
* Added Release Note.
* Addressed comments on documentation to the Programmer's Guide.
* Renamed feature string to "Free Tx mbuf on demand" and modified
  feature string processing code to extend the maximum string length
  from 20 characters to 25 characters.
* Reworded the commit log for PATCH 2/3 to attempt to clearify that the
  API is independent of tx_rs_thresh.  
* Will address the rte_errno comment in a separate merge.

---
v7:
* New API was located between rte_eth_tx_buffer and
  rte_eth_tx_buffer_set_err_callback in the file. Moved this function
  below the tx_buffer functions.
* Added previous acked-by tags to merge comments.
* Moved documentation content from the mempool documentation to the
  ethdev documentation.

---
v8:
* Update Program's Guide documentation per patch comments.


Acked-by: Keith Wiles <Keith.Wiles@intel.com>


Billy McFall (3):
  ethdev: new API to free consumed buffers in Tx ring
  net/e1000: e1000 igb support to free consumed buffers
  net/vhost: vHost support to free consumed buffers

 doc/guides/conf.py                      |   7 +-
 doc/guides/nics/features/default.ini    |   4 +-
 doc/guides/nics/features/e1000.ini      |   1 +
 doc/guides/nics/features/vhost.ini      |   1 +
 doc/guides/prog_guide/poll_mode_drv.rst |  33 +++++++++
 doc/guides/rel_notes/release_17_05.rst  |   6 ++
 drivers/net/e1000/e1000_ethdev.h        |   2 +
 drivers/net/e1000/igb_ethdev.c          |   1 +
 drivers/net/e1000/igb_rxtx.c            | 126 ++++++++++++++++++++++++++++++++
 drivers/net/vhost/rte_eth_vhost.c       |  11 +++
 lib/librte_ether/rte_ethdev.c           |  14 ++++
 lib/librte_ether/rte_ethdev.h           |  31 ++++++++
 12 files changed, 234 insertions(+), 3 deletions(-)

-- 
2.9.3

^ permalink raw reply	[flat|nested] 54+ messages in thread

* [dpdk-dev] [PATCH v8 1/3] ethdev: new API to free consumed buffers in Tx ring
  2017-03-24 18:55         ` [dpdk-dev] [PATCH v8 " Billy McFall
@ 2017-03-24 18:55           ` Billy McFall
  2017-03-24 18:55           ` [dpdk-dev] [PATCH v8 2/3] net/e1000: e1000 igb support to free consumed buffers Billy McFall
                             ` (3 subsequent siblings)
  4 siblings, 0 replies; 54+ messages in thread
From: Billy McFall @ 2017-03-24 18:55 UTC (permalink / raw)
  To: thomas.monjalon, wenzhuo.lu, olivier.matz; +Cc: dev, Billy McFall

Add a new API to force free consumed buffers on Tx ring. API will return
the number of packets freed (0-n) or error code if feature not supported
(-ENOTSUP) or input invalid (-ENODEV).

Signed-off-by: Billy McFall <bmcfall@redhat.com>
Acked-by: Keith Wiles <Keith.Wiles@intel.com>
---
 doc/guides/conf.py                      |  7 +++++--
 doc/guides/nics/features/default.ini    |  4 +++-
 doc/guides/prog_guide/poll_mode_drv.rst | 33 +++++++++++++++++++++++++++++++++
 doc/guides/rel_notes/release_17_05.rst  |  6 ++++++
 lib/librte_ether/rte_ethdev.c           | 14 ++++++++++++++
 lib/librte_ether/rte_ethdev.h           | 31 +++++++++++++++++++++++++++++++
 6 files changed, 92 insertions(+), 3 deletions(-)

diff --git a/doc/guides/conf.py b/doc/guides/conf.py
index 34c62de..4cac26d 100644
--- a/doc/guides/conf.py
+++ b/doc/guides/conf.py
@@ -64,6 +64,9 @@
 
 master_doc = 'index'
 
+# Maximum feature description string length
+feature_str_len = 25
+
 # Figures, tables and code-blocks automatically numbered if they have caption
 numfig = True
 
@@ -300,7 +303,7 @@ def print_table_body(outfile, num_cols, ini_files, ini_data, default_features):
 def print_table_row(outfile, feature, line):
     """ Print a single row of the table with fixed formatting. """
     line = line.rstrip()
-    print('   {:<20}{}'.format(feature, line), file=outfile)
+    print('   {:<{}}{}'.format(feature, feature_str_len, line), file=outfile)
 
 
 def print_table_divider(outfile, num_cols):
@@ -309,7 +312,7 @@ def print_table_divider(outfile, num_cols):
     column_dividers = ['='] * num_cols
     line += ' '.join(column_dividers)
 
-    feature = '=' * 20
+    feature = '=' * feature_str_len
 
     print_table_row(outfile, feature, line)
 
diff --git a/doc/guides/nics/features/default.ini b/doc/guides/nics/features/default.ini
index 299078f..0135c0c 100644
--- a/doc/guides/nics/features/default.ini
+++ b/doc/guides/nics/features/default.ini
@@ -3,7 +3,8 @@
 ;
 ; This file defines the features that are valid for inclusion in
 ; the other driver files and also the order that they appear in
-; the features table in the documentation.
+; the features table in the documentation. The feature description
+; string should not exceed feature_str_len defined in conf.py.
 ;
 [Features]
 Speed capabilities   =
@@ -11,6 +12,7 @@ Link status          =
 Link status event    =
 Queue status event   =
 Rx interrupt         =
+Free Tx mbuf on demand =
 Queue start/stop     =
 MTU update           =
 Jumbo frame          =
diff --git a/doc/guides/prog_guide/poll_mode_drv.rst b/doc/guides/prog_guide/poll_mode_drv.rst
index d4c92ea..e714bbe 100644
--- a/doc/guides/prog_guide/poll_mode_drv.rst
+++ b/doc/guides/prog_guide/poll_mode_drv.rst
@@ -249,6 +249,39 @@ One descriptor in the TX ring is used as a sentinel to avoid a hardware race con
 
     When configuring for DCB operation, at port initialization, both the number of transmit queues and the number of receive queues must be set to 128.
 
+Free Tx mbuf on Demand
+~~~~~~~~~~~~~~~~~~~~~~
+
+Many of the drivers don't release the mbuf back to the mempool, or local cache, immediately after the packet has been
+transmitted.
+Instead, they leave the mbuf in their Tx ring and either perform a bulk release when the ``tx_rs_thresh`` has been
+crossed or free the mbuf when a slot in the Tx ring is needed.
+
+An application can request the driver to release used mbufs with the ``rte_eth_tx_done_cleanup()`` API.
+This API requests the driver to release mbufs that are no longer in use, independent of whether or not the
+``tx_rs_thresh`` has been crossed.
+There are two scenarios when an application may want the mbuf released immediately:
+
+* When a given packet needs to be sent to multiple destination interfaces (either for Layer 2 flooding or Layer 3
+  multi-cast).
+  One option is to make a copy of the packet or a copy of the header portion that needs to be manipulated.
+  A second option is to transmit the packet and then poll the ``rte_eth_tx_done_cleanup()`` API until the reference
+  count on the packet is decremented.
+  Then the same packet can be transmitted to the next destination interface.
+  The application is still responsible for managing any packet manipulations needed between the different destination
+  interface, but a packet copy can be avoided.
+  This API is independent of whether the packet was transmitted or dropped, only that the mbuf is no longer in use by
+  the interface.
+
+* Some applications are designed to make multiple runs, like a packet generator.
+  For performance reasons and consistency between runs, the application may want to reset back to an initial state
+  between each run, where all mbufs are returned to the mempool.
+  In this case, it can call the ``rte_eth_tx_done_cleanup()`` API for each destination interface it has been using
+  to request it to release of all its used mbufs.
+
+To determine if a driver supports this API, check for the *Free Tx mbuf on demand* feature in the *Network Interface
+Controller Drivers* document.
+
 Hardware Offload
 ~~~~~~~~~~~~~~~~
 
diff --git a/doc/guides/rel_notes/release_17_05.rst b/doc/guides/rel_notes/release_17_05.rst
index 918f483..25ae319 100644
--- a/doc/guides/rel_notes/release_17_05.rst
+++ b/doc/guides/rel_notes/release_17_05.rst
@@ -49,6 +49,12 @@ New Features
 
   sPAPR IOMMU based pci probing enabled for vfio-pci devices.
 
+* **Added free Tx mbuf on demand API.**
+
+  Added a new function ``rte_eth_tx_done_cleanup()`` which allows an application
+  to request the driver to release mbufs from their Tx ring that are no longer
+  in use, independent of whether or not the ``tx_rs_thresh`` has been crossed.
+
 Resolved Issues
 ---------------
 
diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index eb0a94a..b796e7d 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -1275,6 +1275,20 @@ rte_eth_tx_buffer_init(struct rte_eth_dev_tx_buffer *buffer, uint16_t size)
 	return ret;
 }
 
+int
+rte_eth_tx_done_cleanup(uint8_t port_id, uint16_t queue_id, uint32_t free_cnt)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+
+	/* Validate Input Data. Bail if not valid or not supported. */
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
+	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_done_cleanup, -ENOTSUP);
+
+	/* Call driver to free pending mbufs. */
+	return (*dev->dev_ops->tx_done_cleanup)(dev->data->tx_queues[queue_id],
+			free_cnt);
+}
+
 void
 rte_eth_promiscuous_enable(uint8_t port_id)
 {
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 4be217c..b3ee872 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -1183,6 +1183,9 @@ typedef int (*eth_fw_version_get_t)(struct rte_eth_dev *dev,
 				     char *fw_version, size_t fw_size);
 /**< @internal Get firmware information of an Ethernet device. */
 
+typedef int (*eth_tx_done_cleanup_t)(void *txq, uint32_t free_cnt);
+/**< @internal Force mbufs to be from TX ring. */
+
 typedef void (*eth_rxq_info_get_t)(struct rte_eth_dev *dev,
 	uint16_t rx_queue_id, struct rte_eth_rxq_info *qinfo);
 
@@ -1488,6 +1491,7 @@ struct eth_dev_ops {
 	eth_rx_disable_intr_t      rx_queue_intr_disable; /**< Disable Rx queue interrupt. */
 	eth_tx_queue_setup_t       tx_queue_setup;/**< Set up device TX queue. */
 	eth_queue_release_t        tx_queue_release; /**< Release TX queue. */
+	eth_tx_done_cleanup_t      tx_done_cleanup;/**< Free tx ring mbufs */
 
 	eth_dev_led_on_t           dev_led_on;    /**< Turn on LED. */
 	eth_dev_led_off_t          dev_led_off;   /**< Turn off LED. */
@@ -3178,6 +3182,33 @@ rte_eth_tx_buffer_count_callback(struct rte_mbuf **pkts, uint16_t unsent,
 		void *userdata);
 
 /**
+ * Request the driver to free mbufs currently cached by the driver. The
+ * driver will only free the mbuf if it is no longer in use. It is the
+ * application's responsibity to ensure rte_eth_tx_buffer_flush(..) is
+ * called if needed.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param queue_id
+ *   The index of the transmit queue through which output packets must be
+ *   sent.
+ *   The value must be in the range [0, nb_tx_queue - 1] previously supplied
+ *   to rte_eth_dev_configure().
+ * @param free_cnt
+ *   Maximum number of packets to free. Use 0 to indicate all possible packets
+ *   should be freed. Note that a packet may be using multiple mbufs.
+ * @return
+ *   Failure: < 0
+ *     -ENODEV: Invalid interface
+ *     -ENOTSUP: Driver does not support function
+ *   Success: >= 0
+ *     0-n: Number of packets freed. More packets may still remain in ring that
+ *     are in use.
+ */
+int
+rte_eth_tx_done_cleanup(uint8_t port_id, uint16_t queue_id, uint32_t free_cnt);
+
+/**
  * The eth device event type for interrupt, and maybe others in the future.
  */
 enum rte_eth_event_type {
-- 
2.9.3

^ permalink raw reply	[flat|nested] 54+ messages in thread

* [dpdk-dev] [PATCH v8 2/3] net/e1000: e1000 igb support to free consumed buffers
  2017-03-24 18:55         ` [dpdk-dev] [PATCH v8 " Billy McFall
  2017-03-24 18:55           ` [dpdk-dev] [PATCH v8 1/3] ethdev: " Billy McFall
@ 2017-03-24 18:55           ` Billy McFall
  2017-03-24 18:55           ` [dpdk-dev] [PATCH v8 3/3] net/vhost: vHost " Billy McFall
                             ` (2 subsequent siblings)
  4 siblings, 0 replies; 54+ messages in thread
From: Billy McFall @ 2017-03-24 18:55 UTC (permalink / raw)
  To: thomas.monjalon, wenzhuo.lu, olivier.matz; +Cc: dev, Billy McFall

Add support to the e1000 igb driver for the new API to force free
consumed buffers on Tx ring. This API is independent of the tx_rs_thresh
setting. With this API, buffers should be free even if tx_rs_thresh is
not met.

e1000 igb driver does not implement a tx_rs_thresh to free mbufs, it
frees a slot in the ring as needed. However, it could be implemented at
some future date.

Signed-off-by: Billy McFall <bmcfall@redhat.com>
Acked-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
Acked-by: Keith Wiles <Keith.Wiles@intel.com>
---
 doc/guides/nics/features/e1000.ini |   1 +
 drivers/net/e1000/e1000_ethdev.h   |   2 +
 drivers/net/e1000/igb_ethdev.c     |   1 +
 drivers/net/e1000/igb_rxtx.c       | 126 +++++++++++++++++++++++++++++++++++++
 4 files changed, 130 insertions(+)

diff --git a/doc/guides/nics/features/e1000.ini b/doc/guides/nics/features/e1000.ini
index 7f6d55c..3aed7d7 100644
--- a/doc/guides/nics/features/e1000.ini
+++ b/doc/guides/nics/features/e1000.ini
@@ -7,6 +7,7 @@
 Link status          = Y
 Link status event    = Y
 Rx interrupt         = Y
+Free Tx mbuf on demand = Y
 MTU update           = Y
 Jumbo frame          = Y
 Scattered Rx         = Y
diff --git a/drivers/net/e1000/e1000_ethdev.h b/drivers/net/e1000/e1000_ethdev.h
index 81a6dbb..39b2f43 100644
--- a/drivers/net/e1000/e1000_ethdev.h
+++ b/drivers/net/e1000/e1000_ethdev.h
@@ -315,6 +315,8 @@ int eth_igb_tx_queue_setup(struct rte_eth_dev *dev, uint16_t tx_queue_id,
 		uint16_t nb_tx_desc, unsigned int socket_id,
 		const struct rte_eth_txconf *tx_conf);
 
+int eth_igb_tx_done_cleanup(void *txq, uint32_t free_cnt);
+
 int eth_igb_rx_init(struct rte_eth_dev *dev);
 
 void eth_igb_tx_init(struct rte_eth_dev *dev);
diff --git a/drivers/net/e1000/igb_ethdev.c b/drivers/net/e1000/igb_ethdev.c
index a112b38..71d05a9 100644
--- a/drivers/net/e1000/igb_ethdev.c
+++ b/drivers/net/e1000/igb_ethdev.c
@@ -408,6 +408,7 @@ static const struct eth_dev_ops eth_igb_ops = {
 	.rx_descriptor_done   = eth_igb_rx_descriptor_done,
 	.tx_queue_setup       = eth_igb_tx_queue_setup,
 	.tx_queue_release     = eth_igb_tx_queue_release,
+	.tx_done_cleanup      = eth_igb_tx_done_cleanup,
 	.dev_led_on           = eth_igb_led_on,
 	.dev_led_off          = eth_igb_led_off,
 	.flow_ctrl_get        = eth_igb_flow_ctrl_get,
diff --git a/drivers/net/e1000/igb_rxtx.c b/drivers/net/e1000/igb_rxtx.c
index 1bb4d85..cba3704 100644
--- a/drivers/net/e1000/igb_rxtx.c
+++ b/drivers/net/e1000/igb_rxtx.c
@@ -1283,6 +1283,132 @@ eth_igb_tx_queue_release(void *txq)
 	igb_tx_queue_release(txq);
 }
 
+static int
+igb_tx_done_cleanup(struct igb_tx_queue *txq, uint32_t free_cnt)
+{
+	struct igb_tx_entry *sw_ring;
+	volatile union e1000_adv_tx_desc *txr;
+	uint16_t tx_first; /* First segment analyzed. */
+	uint16_t tx_id;    /* Current segment being processed. */
+	uint16_t tx_last;  /* Last segment in the current packet. */
+	uint16_t tx_next;  /* First segment of the next packet. */
+	int count;
+
+	if (txq != NULL) {
+		count = 0;
+		sw_ring = txq->sw_ring;
+		txr = txq->tx_ring;
+
+		/*
+		 * tx_tail is the last sent packet on the sw_ring. Goto the end
+		 * of that packet (the last segment in the packet chain) and
+		 * then the next segment will be the start of the oldest segment
+		 * in the sw_ring. This is the first packet that will be
+		 * attempted to be freed.
+		 */
+
+		/* Get last segment in most recently added packet. */
+		tx_first = sw_ring[txq->tx_tail].last_id;
+
+		/* Get the next segment, which is the oldest segment in ring. */
+		tx_first = sw_ring[tx_first].next_id;
+
+		/* Set the current index to the first. */
+		tx_id = tx_first;
+
+		/*
+		 * Loop through each packet. For each packet, verify that an
+		 * mbuf exists and that the last segment is free. If so, free
+		 * it and move on.
+		 */
+		while (1) {
+			tx_last = sw_ring[tx_id].last_id;
+
+			if (sw_ring[tx_last].mbuf) {
+				if (txr[tx_last].wb.status &
+						E1000_TXD_STAT_DD) {
+					/*
+					 * Increment the number of packets
+					 * freed.
+					 */
+					count++;
+
+					/* Get the start of the next packet. */
+					tx_next = sw_ring[tx_last].next_id;
+
+					/*
+					 * Loop through all segments in a
+					 * packet.
+					 */
+					do {
+						rte_pktmbuf_free_seg(sw_ring[tx_id].mbuf);
+						sw_ring[tx_id].mbuf = NULL;
+						sw_ring[tx_id].last_id = tx_id;
+
+						/* Move to next segemnt. */
+						tx_id = sw_ring[tx_id].next_id;
+
+					} while (tx_id != tx_next);
+
+					if (unlikely(count == (int)free_cnt))
+						break;
+				} else
+					/*
+					 * mbuf still in use, nothing left to
+					 * free.
+					 */
+					break;
+			} else {
+				/*
+				 * There are multiple reasons to be here:
+				 * 1) All the packets on the ring have been
+				 *    freed - tx_id is equal to tx_first
+				 *    and some packets have been freed.
+				 *    - Done, exit
+				 * 2) Interfaces has not sent a rings worth of
+				 *    packets yet, so the segment after tail is
+				 *    still empty. Or a previous call to this
+				 *    function freed some of the segments but
+				 *    not all so there is a hole in the list.
+				 *    Hopefully this is a rare case.
+				 *    - Walk the list and find the next mbuf. If
+				 *      there isn't one, then done.
+				 */
+				if (likely((tx_id == tx_first) && (count != 0)))
+					break;
+
+				/*
+				 * Walk the list and find the next mbuf, if any.
+				 */
+				do {
+					/* Move to next segemnt. */
+					tx_id = sw_ring[tx_id].next_id;
+
+					if (sw_ring[tx_id].mbuf)
+						break;
+
+				} while (tx_id != tx_first);
+
+				/*
+				 * Determine why previous loop bailed. If there
+				 * is not an mbuf, done.
+				 */
+				if (sw_ring[tx_id].mbuf == NULL)
+					break;
+			}
+		}
+	} else
+		count = -ENODEV;
+
+	return count;
+}
+
+int
+eth_igb_tx_done_cleanup(void *txq, uint32_t free_cnt)
+{
+	return igb_tx_done_cleanup(txq, free_cnt);
+}
+
 static void
 igb_reset_tx_queue_stat(struct igb_tx_queue *txq)
 {
-- 
2.9.3

^ permalink raw reply	[flat|nested] 54+ messages in thread

* [dpdk-dev] [PATCH v8 3/3] net/vhost: vHost support to free consumed buffers
  2017-03-24 18:55         ` [dpdk-dev] [PATCH v8 " Billy McFall
  2017-03-24 18:55           ` [dpdk-dev] [PATCH v8 1/3] ethdev: " Billy McFall
  2017-03-24 18:55           ` [dpdk-dev] [PATCH v8 2/3] net/e1000: e1000 igb support to free consumed buffers Billy McFall
@ 2017-03-24 18:55           ` Billy McFall
  2017-03-27 15:20           ` [dpdk-dev] [PATCH v8 0/3] new API to free consumed buffers in Tx ring Thomas Monjalon
  2017-04-19 16:25           ` Ferruh Yigit
  4 siblings, 0 replies; 54+ messages in thread
From: Billy McFall @ 2017-03-24 18:55 UTC (permalink / raw)
  To: thomas.monjalon, wenzhuo.lu, olivier.matz; +Cc: dev, Billy McFall

Add support to the vHostdriver for the new API to force free consumed
buffers on Tx ring. vHost does not cache the mbufs so there is no work
to do.

Signed-off-by: Billy McFall <bmcfall@redhat.com>
Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Keith Wiles <Keith.Wiles@intel.com>
---
 doc/guides/nics/features/vhost.ini |  1 +
 drivers/net/vhost/rte_eth_vhost.c  | 11 +++++++++++
 2 files changed, 12 insertions(+)

diff --git a/doc/guides/nics/features/vhost.ini b/doc/guides/nics/features/vhost.ini
index 23166fb..dffd1f4 100644
--- a/doc/guides/nics/features/vhost.ini
+++ b/doc/guides/nics/features/vhost.ini
@@ -6,6 +6,7 @@
 [Features]
 Link status          = Y
 Link status event    = Y
+Free Tx mbuf on demand = Y
 Queue status event   = Y
 Basic stats          = Y
 Extended stats       = Y
diff --git a/drivers/net/vhost/rte_eth_vhost.c b/drivers/net/vhost/rte_eth_vhost.c
index e98cffd..abe91c7 100644
--- a/drivers/net/vhost/rte_eth_vhost.c
+++ b/drivers/net/vhost/rte_eth_vhost.c
@@ -960,6 +960,16 @@ eth_queue_release(void *q)
 }
 
 static int
+eth_tx_done_cleanup(void *txq __rte_unused, uint32_t free_cnt __rte_unused)
+{
+	/*
+	 * vHost does not hang onto mbuf. eth_vhost_tx() copies packet data
+	 * and releases mbuf, so nothing to cleanup.
+	 */
+	return 0;
+}
+
+static int
 eth_link_update(struct rte_eth_dev *dev __rte_unused,
 		int wait_to_complete __rte_unused)
 {
@@ -1001,6 +1011,7 @@ static const struct eth_dev_ops ops = {
 	.tx_queue_setup = eth_tx_queue_setup,
 	.rx_queue_release = eth_queue_release,
 	.tx_queue_release = eth_queue_release,
+	.tx_done_cleanup = eth_tx_done_cleanup,
 	.link_update = eth_link_update,
 	.stats_get = eth_stats_get,
 	.stats_reset = eth_stats_reset,
-- 
2.9.3

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [dpdk-dev] [PATCH v8 0/3] new API to free consumed buffers in Tx ring
  2017-03-24 18:55         ` [dpdk-dev] [PATCH v8 " Billy McFall
                             ` (2 preceding siblings ...)
  2017-03-24 18:55           ` [dpdk-dev] [PATCH v8 3/3] net/vhost: vHost " Billy McFall
@ 2017-03-27 15:20           ` Thomas Monjalon
  2017-04-19 16:25           ` Ferruh Yigit
  4 siblings, 0 replies; 54+ messages in thread
From: Thomas Monjalon @ 2017-03-27 15:20 UTC (permalink / raw)
  To: Billy McFall; +Cc: wenzhuo.lu, olivier.matz, dev, ferruh.yigit

2017-03-24 14:55, Billy McFall:
> See request from 11/21/2016:
>   http://dpdk.org/ml/archives/dev/2016-November/050585.html
> 
> Add a new API to free consumed buffers on TX ring. This addresses two
> scenarios:
> 1) Flooding a packet and want to reuse existing mbuf to avoid a packet
> copy. Increment the reference count of the packet and poll new API until
> reference count is decremented.
> 2) Application runs out of mbufs, or resets and is preparing for
> additional run, call API to free consumed packets so processing can
> continue.
> 
> API will return the number of packets freed (0-n) or error code if
> feature not supported (-ENOTSUP) or input invalid (-ENODEV).
> 
> API for e1000 igb driver and vHost driver have been implemented. Other
> drivers can be implemented over time. Some drivers implement a Tx done
> flush routine that should be reused where possible. e1000 igb driver
> and vHost driver do not have such functions.

Applied, thanks.

Hope the API will be implemented in other drivers.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [dpdk-dev] [PATCH v8 0/3] new API to free consumed buffers in Tx ring
  2017-03-24 18:55         ` [dpdk-dev] [PATCH v8 " Billy McFall
                             ` (3 preceding siblings ...)
  2017-03-27 15:20           ` [dpdk-dev] [PATCH v8 0/3] new API to free consumed buffers in Tx ring Thomas Monjalon
@ 2017-04-19 16:25           ` Ferruh Yigit
  4 siblings, 0 replies; 54+ messages in thread
From: Ferruh Yigit @ 2017-04-19 16:25 UTC (permalink / raw)
  To: Billy McFall, thomas.monjalon, wenzhuo.lu, olivier.matz; +Cc: dev

On 3/24/2017 6:55 PM, Billy McFall wrote:
> See request from 11/21/2016:
>   http://dpdk.org/ml/archives/dev/2016-November/050585.html
> 
> Add a new API to free consumed buffers on TX ring. This addresses two
> scenarios:
> 1) Flooding a packet and want to reuse existing mbuf to avoid a packet
> copy. Increment the reference count of the packet and poll new API until
> reference count is decremented.
> 2) Application runs out of mbufs, or resets and is preparing for
> additional run, call API to free consumed packets so processing can
> continue.
> 
> API will return the number of packets freed (0-n) or error code if
> feature not supported (-ENOTSUP) or input invalid (-ENODEV).
> 
> API for e1000 igb driver and vHost driver have been implemented. Other
> drivers can be implemented over time. Some drivers implement a Tx done
> flush routine that should be reused where possible. e1000 igb driver
> and vHost driver do not have such functions.

Another reminder for PMDs. This is new eth_dev_ops, described above:
"tx_done_cleanup"

Currently only implemented by igb and vhost. Other PMDs feel free to
implement new ops.

Thanks,
ferruh

^ permalink raw reply	[flat|nested] 54+ messages in thread

end of thread, other threads:[~2017-04-19 16:25 UTC | newest]

Thread overview: 54+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-01-20 16:01 [dpdk-dev] [PATCH v3 0/3] new API to free consumed buffers in Tx ring Billy McFall
2017-01-20 16:01 ` [dpdk-dev] [PATCH v3 1/3] ethdev: " Billy McFall
2017-01-20 16:01 ` [dpdk-dev] [PATCH v3 2/3] net/e1000: e1000 igb support to free consumed buffers Billy McFall
2017-01-22  3:47   ` Lu, Wenzhuo
2017-01-23 13:49     ` Billy McFall
2017-01-24  0:42       ` Lu, Wenzhuo
2017-01-20 16:01 ` [dpdk-dev] [PATCH v3 3/3] net/vhost: vHost " Billy McFall
2017-01-23 15:25 ` [dpdk-dev] [PATCH v3 0/3] new API to free consumed buffers in Tx ring Thomas Monjalon
2017-01-23 21:13 ` [dpdk-dev] [PATCH v4 " Billy McFall
2017-01-23 21:13   ` [dpdk-dev] [PATCH v4 1/3] ethdev: " Billy McFall
2017-01-23 21:13   ` [dpdk-dev] [PATCH v4 2/3] net/e1000: e1000 igb support to free consumed buffers Billy McFall
2017-01-23 21:13   ` [dpdk-dev] [PATCH v4 3/3] net/vhost: vHost " Billy McFall
2017-01-27 18:37   ` [dpdk-dev] [PATCH v5 0/3] new API to free consumed buffers in Tx ring Billy McFall
2017-01-27 18:37     ` [dpdk-dev] [PATCH v5 1/3] ethdev: " Billy McFall
2017-02-27 13:48       ` Thomas Monjalon
2017-03-07 14:29         ` Billy McFall
2017-03-07 16:03           ` Thomas Monjalon
2017-03-09 15:49             ` Billy McFall
2017-03-09 17:11               ` Thomas Monjalon
2017-03-07 16:35           ` Mcnamara, John
2017-03-07 16:42       ` Mcnamara, John
2017-01-27 18:37     ` [dpdk-dev] [PATCH v5 2/3] net/e1000: e1000 igb support to free consumed buffers Billy McFall
2017-02-27 13:49       ` Thomas Monjalon
2017-02-28  1:07       ` Lu, Wenzhuo
2017-01-27 18:38     ` [dpdk-dev] [PATCH v5 3/3] net/vhost: vHost " Billy McFall
2017-02-27 13:50       ` Thomas Monjalon
2017-02-28  6:41         ` Yuanhan Liu
2017-03-01 10:15           ` Maxime Coquelin
2017-03-07 21:59     ` [dpdk-dev] [PATCH v5 0/3] new API to free consumed buffers in Tx ring Thomas Monjalon
2017-03-09 20:51     ` [dpdk-dev] [PATCH v6 " Billy McFall
2017-03-09 20:51       ` [dpdk-dev] [PATCH v6 1/3] ethdev: " Billy McFall
2017-03-15 10:29         ` Olivier Matz
2017-03-15 15:01           ` Billy McFall
2017-03-15 10:30         ` Thomas Monjalon
2017-03-09 20:51       ` [dpdk-dev] [PATCH v6 2/3] net/e1000: e1000 igb support to free consumed buffers Billy McFall
2017-03-13  3:17         ` Lu, Wenzhuo
2017-03-09 20:51       ` [dpdk-dev] [PATCH v6 3/3] net/vhost: vHost " Billy McFall
2017-03-15 10:27         ` Thomas Monjalon
2017-03-15 18:02       ` [dpdk-dev] [PATCH v7 0/3] new API to free consumed buffers in Tx ring Billy McFall
2017-03-15 18:02         ` [dpdk-dev] [PATCH v7 1/3] ethdev: " Billy McFall
2017-03-23 10:37           ` Olivier MATZ
2017-03-23 13:32             ` Billy McFall
2017-03-24 12:46               ` Olivier Matz
2017-03-24 13:18                 ` Billy McFall
2017-03-24 13:30                   ` Olivier Matz
2017-03-15 18:02         ` [dpdk-dev] [PATCH v7 2/3] net/e1000: e1000 igb support to free consumed buffers Billy McFall
2017-03-15 18:02         ` [dpdk-dev] [PATCH v7 3/3] net/vhost: vHost " Billy McFall
2017-03-15 20:25         ` [dpdk-dev] [PATCH v7 0/3] new API to free consumed buffers in Tx ring Wiles, Keith
2017-03-24 18:55         ` [dpdk-dev] [PATCH v8 " Billy McFall
2017-03-24 18:55           ` [dpdk-dev] [PATCH v8 1/3] ethdev: " Billy McFall
2017-03-24 18:55           ` [dpdk-dev] [PATCH v8 2/3] net/e1000: e1000 igb support to free consumed buffers Billy McFall
2017-03-24 18:55           ` [dpdk-dev] [PATCH v8 3/3] net/vhost: vHost " Billy McFall
2017-03-27 15:20           ` [dpdk-dev] [PATCH v8 0/3] new API to free consumed buffers in Tx ring Thomas Monjalon
2017-04-19 16:25           ` Ferruh Yigit

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).