DPDK patches and discussions
* [dpdk-dev] [PATCH 0/3] New API to free consumed buffers in TX ring
@ 2016-12-16 12:48 Billy McFall
  2016-12-16 12:48 ` [dpdk-dev] [PATCH 1/3] ethdev: " Billy McFall
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Billy McFall @ 2016-12-16 12:48 UTC (permalink / raw)
  To: thomas.monjalon, wenzhuo.lu; +Cc: dev, Billy McFall

Based on a request from Damjan Marion, seconded by Keith Wiles (see the
dpdk-dev mailing list from 11/21/2016), add a new API to free consumed
buffers on the TX ring. This addresses two scenarios:
1) Flooding a packet while reusing the existing mbuf to avoid a packet
copy. Increment the reference count of the packet and poll the new API
until the reference count is decremented.
2) The application runs out of mbufs, so it calls the API to free
consumed packets so that processing can continue.

The API returns the number of packets freed (0-n), or an error code if
the feature is not supported (-ENOTSUP) or the input is invalid (-ENODEV).
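
As an illustration only (not part of the patches), the polling pattern of
scenario 1 and the return-code handling might be sketched as below. Every
toy_* name is a hypothetical stand-in rather than a DPDK symbol, and the
model assumes that every mbuf held by the queue has already been
transmitted.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_RING_SIZE 8

/* Hypothetical stand-in for an mbuf: only the reference count is modelled. */
struct toy_mbuf {
	uint16_t refcnt;
};

/* Hypothetical TX queue: mbufs the "driver" still holds after transmission. */
struct toy_txq {
	struct toy_mbuf *held[TOY_RING_SIZE];
	unsigned int head;    /* index of the oldest held mbuf */
	unsigned int nb_held; /* number of held mbufs */
	int supported;        /* 0 models a driver without the cleanup op */
};

/*
 * Toy counterpart of the proposed call: release up to free_cnt held mbufs,
 * oldest first (0 means all). Returns the number released or a negative
 * errno value.
 */
static int
toy_tx_done_cleanup(struct toy_txq *txq, uint32_t free_cnt)
{
	int count = 0;

	if (txq == NULL)
		return -ENODEV;
	if (!txq->supported)
		return -ENOTSUP;
	assert(txq->nb_held <= TOY_RING_SIZE);

	while (txq->nb_held > 0 &&
	       (free_cnt == 0 || (uint32_t)count < free_cnt)) {
		struct toy_mbuf *m = txq->held[txq->head];

		txq->head = (txq->head + 1) % TOY_RING_SIZE;
		txq->nb_held--;
		m->refcnt--; /* drop the driver's reference */
		count++;
	}
	return count;
}

/* Scenario 1: poll until the application holds the only reference again. */
static int
toy_wait_for_reuse(struct toy_txq *txq, struct toy_mbuf *m,
		unsigned int max_polls)
{
	unsigned int i;

	for (i = 0; i < max_polls && m->refcnt > 1; i++) {
		int ret = toy_tx_done_cleanup(txq, 0);

		if (ret < 0)
			return ret; /* -ENODEV or -ENOTSUP: polling is pointless */
	}
	return m->refcnt == 1 ? 0 : -EAGAIN;
}
```

A caller would bump the reference count before transmitting, then call
toy_wait_for_reuse() before touching the mbuf again; a negative return
tells it to fall back to copying the packet instead.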

The API has been implemented for the e1000 igb driver and the vHost
driver. Other drivers can be implemented over time. Some drivers
implement a TX done flush routine that should be reused where possible;
the e1000 igb driver and the vHost driver do not have such functions.

Billy McFall (3):
  ethdev: New API to free consumed buffers in TX ring
  driver: e1000 igb support to free consumed buffers
  driver: vHost support to free consumed buffers

 drivers/net/e1000/e1000_ethdev.h  |   2 +
 drivers/net/e1000/igb_ethdev.c    |   1 +
 drivers/net/e1000/igb_rxtx.c      | 126 ++++++++++++++++++++++++++++++++++++++
 drivers/net/vhost/rte_eth_vhost.c |  11 ++++
 lib/librte_ether/rte_ethdev.h     |  56 +++++++++++++++++
 5 files changed, 196 insertions(+)

-- 
2.9.3


* [dpdk-dev] [PATCH 1/3] ethdev: New API to free consumed buffers in TX ring
  2016-12-16 12:48 [dpdk-dev] [PATCH 0/3] New API to free consumed buffers in TX ring Billy McFall
@ 2016-12-16 12:48 ` Billy McFall
  2016-12-16 16:28   ` Stephen Hemminger
  2016-12-20 11:27   ` Adrien Mazarguil
  2016-12-16 12:48 ` [dpdk-dev] [PATCH 2/3] driver: e1000 igb support to free consumed buffers Billy McFall
  2016-12-16 12:48 ` [dpdk-dev] [PATCH 3/3] driver: vHost " Billy McFall
  2 siblings, 2 replies; 12+ messages in thread
From: Billy McFall @ 2016-12-16 12:48 UTC (permalink / raw)
  To: thomas.monjalon, wenzhuo.lu; +Cc: dev, Billy McFall

Add a new API to force-free consumed buffers on the TX ring. The API
returns the number of packets freed (0-n), or an error code if the
feature is not supported (-ENOTSUP) or the input is invalid (-ENODEV).

Because rte_eth_tx_buffer() may be in use, and mbufs may still be held
in the local buffer, the API also accepts *buffer and *sent. Before
attempting to free, rte_eth_tx_buffer_flush() is called to make sure
all mbufs are sent to the TX ring. rte_eth_tx_buffer_flush() is called
even if the threshold is not met.
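
For illustration only, the flush-then-free ordering described above might
be modelled as follows. This is a sketch with hypothetical toy_* types
that only count packets; it assumes every packet in the ring has already
completed, which a real driver cannot assume.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical local buffer, loosely modelled on struct rte_eth_dev_tx_buffer. */
struct toy_tx_buffer {
	uint16_t length; /* packets collected but not yet sent */
};

/* Hypothetical TX ring: a counter of packets held by the driver. */
struct toy_ring {
	uint32_t in_ring; /* all assumed completed in this toy model */
};

/* Push every buffered packet to the ring, regardless of any threshold. */
static uint16_t
toy_buffer_flush(struct toy_ring *r, struct toy_tx_buffer *b)
{
	uint16_t sent = b->length;

	r->in_ring += sent;
	b->length = 0;
	return sent;
}

/* Flush first (only if both buffer and sent are given), then free. */
static int
toy_done_cleanup(struct toy_ring *r, uint32_t free_cnt,
		struct toy_tx_buffer *buffer, uint16_t *sent)
{
	uint32_t n;

	assert(r != NULL);
	if (buffer && sent)
		*sent = toy_buffer_flush(r, buffer);

	/* free_cnt == 0 means "free everything possible". */
	n = (free_cnt == 0 || free_cnt > r->in_ring) ? r->in_ring : free_cnt;
	r->in_ring -= n;
	return (int)n;
}
```

Passing NULL for the buffer skips the flush, so packets still sitting in
the local buffer are neither sent nor counted as freed.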

Signed-off-by: Billy McFall <bmcfall@redhat.com>
---
 lib/librte_ether/rte_ethdev.h | 56 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 56 insertions(+)

diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 9678179..e3f2be4 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -1150,6 +1150,9 @@ typedef uint32_t (*eth_rx_queue_count_t)(struct rte_eth_dev *dev,
 typedef int (*eth_rx_descriptor_done_t)(void *rxq, uint16_t offset);
 /**< @internal Check DD bit of specific RX descriptor */
 
+typedef int (*eth_tx_done_cleanup_t)(void *txq, uint32_t free_cnt);
+/**< @internal Force mbufs to be freed from TX ring. */
+
 typedef void (*eth_rxq_info_get_t)(struct rte_eth_dev *dev,
 	uint16_t rx_queue_id, struct rte_eth_rxq_info *qinfo);
 
@@ -1467,6 +1470,7 @@ struct eth_dev_ops {
 	eth_rx_disable_intr_t      rx_queue_intr_disable;
 	eth_tx_queue_setup_t       tx_queue_setup;/**< Set up device TX queue.*/
 	eth_queue_release_t        tx_queue_release;/**< Release TX queue.*/
+	eth_tx_done_cleanup_t      tx_done_cleanup;/**< Free tx ring mbufs */
 	eth_dev_led_on_t           dev_led_on;    /**< Turn on LED. */
 	eth_dev_led_off_t          dev_led_off;   /**< Turn off LED. */
 	flow_ctrl_get_t            flow_ctrl_get; /**< Get flow control. */
@@ -2943,6 +2947,58 @@ rte_eth_tx_buffer(uint8_t port_id, uint16_t queue_id,
 }
 
 /**
+ * Request the driver to free mbufs currently cached by the driver. The
+ * driver will only free the mbuf if it is no longer in use.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param queue_id
+ *   The index of the transmit queue whose consumed mbufs are to be
+ *   freed.
+ *   The value must be in the range [0, nb_tx_queue - 1] previously supplied
+ *   to rte_eth_dev_configure().
+ * @param free_cnt
+ *   Maximum number of packets to free. Use 0 to indicate all possible packets
+ *   should be freed. Note that a packet may be using multiple mbufs.
+ * @param buffer
+ *   Buffer used to collect packets to be sent. If provided, the buffer will
+ *   be flushed, even if the current length is less than buffer->size. Pass NULL
+ *   if buffer has already been flushed.
+ * @param sent
+ *   Pointer to return number of packets sent if buffer has packets to be sent.
+ *   If *buffer is supplied, *sent must also be supplied.
+ * @return
+ *   Failure: < 0
+ *     -ENODEV: Invalid interface
+ *     -ENOTSUP: Driver does not support function
+ *   Success: >= 0
+ *     0-n: Number of packets freed. More packets may still remain in ring that
+ *     are in use.
+ */
+
+static inline int
+rte_eth_tx_done_cleanup(uint8_t port_id, uint16_t queue_id,  uint32_t free_cnt,
+		struct rte_eth_dev_tx_buffer *buffer, uint16_t *sent)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+
+	/* Validate Input Data. Bail if not valid or not supported. */
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
+	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_done_cleanup, -ENOTSUP);
+
+	/*
+	 * If transmit buffer is provided and there are still packets to be
+	 * sent, then send them before attempting to free pending mbufs.
+	 */
+	if (buffer && sent)
+		*sent = rte_eth_tx_buffer_flush(port_id, queue_id, buffer);
+
+	/* Call driver to free pending mbufs. */
+	return (*dev->dev_ops->tx_done_cleanup)(dev->data->tx_queues[queue_id],
+			free_cnt);
+}
+
+/**
  * Configure a callback for buffered packets which cannot be sent
  *
  * Register a specific callback to be called when an attempt is made to send
-- 
2.9.3


* [dpdk-dev] [PATCH 2/3] driver: e1000 igb support to free consumed buffers
  2016-12-16 12:48 [dpdk-dev] [PATCH 0/3] New API to free consumed buffers in TX ring Billy McFall
  2016-12-16 12:48 ` [dpdk-dev] [PATCH 1/3] ethdev: " Billy McFall
@ 2016-12-16 12:48 ` Billy McFall
  2016-12-16 12:48 ` [dpdk-dev] [PATCH 3/3] driver: vHost " Billy McFall
  2 siblings, 0 replies; 12+ messages in thread
From: Billy McFall @ 2016-12-16 12:48 UTC (permalink / raw)
  To: thomas.monjalon, wenzhuo.lu; +Cc: dev, Billy McFall

Add support to the e1000 igb driver for the new API to force-free
consumed buffers on the TX ring. The e1000 igb driver does not use
tx_rs_thresh to free mbufs; it frees a slot in the ring as needed, so
a new function had to be written.
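
As a rough illustration of the ring walk this patch performs, here is a
simplified model. It is a sketch, not the driver code: the toy_* names are
hypothetical, a boolean replaces the mbuf pointer and the DD status bit,
and the oldest slot is passed in by the caller instead of being derived
from tx_tail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_NB_DESC 8

/* Hypothetical software-ring entry, modelled on struct igb_tx_entry. */
struct toy_entry {
	bool     has_mbuf; /* stands in for a non-NULL mbuf pointer */
	bool     done;     /* stands in for the DD bit of the descriptor */
	uint16_t next_id;  /* next slot in the ring */
	uint16_t last_id;  /* last segment of the packet owning this slot */
};

static void
toy_ring_init(struct toy_entry *ring)
{
	for (uint16_t i = 0; i < TOY_NB_DESC; i++) {
		ring[i].has_mbuf = false;
		ring[i].done = false;
		ring[i].next_id = (uint16_t)((i + 1) % TOY_NB_DESC);
		ring[i].last_id = i;
	}
}

/* Queue one packet of nb_segs segments at 'start'; returns the next slot. */
static uint16_t
toy_ring_add(struct toy_entry *ring, uint16_t start, uint16_t nb_segs,
		bool done)
{
	uint16_t last = (uint16_t)((start + nb_segs - 1) % TOY_NB_DESC);
	uint16_t id = start;

	assert(nb_segs > 0 && nb_segs <= TOY_NB_DESC);
	for (uint16_t i = 0; i < nb_segs; i++) {
		ring[id].has_mbuf = true;
		ring[id].last_id = last;
		id = ring[id].next_id;
	}
	ring[last].done = done; /* only the last segment reports completion */
	return id;
}

/* Free completed packets, oldest first; free_cnt == 0 means no limit. */
static int
toy_ring_cleanup(struct toy_entry *ring, uint16_t first, uint32_t free_cnt)
{
	uint16_t id = first;
	int count = 0;

	do {
		uint16_t last = ring[id].last_id;
		uint16_t next;

		if (!ring[last].has_mbuf) {
			/* Hole or never-used slot: step over it. */
			id = ring[id].next_id;
			continue;
		}
		if (!ring[last].done)
			break; /* hardware still owns this packet */

		/* Release every segment of the packet. */
		next = ring[last].next_id;
		do {
			ring[id].has_mbuf = false;
			ring[id].done = false;
			ring[id].last_id = id;
			id = ring[id].next_id;
		} while (id != next);

		count++;
		if (free_cnt != 0 && (uint32_t)count == free_cnt)
			break;
	} while (id != first);

	return count;
}
```

The walk stops at the first packet whose last segment is not marked done,
so packets are only ever released in transmission order.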

Signed-off-by: Billy McFall <bmcfall@redhat.com>
---
 drivers/net/e1000/e1000_ethdev.h |   2 +
 drivers/net/e1000/igb_ethdev.c   |   1 +
 drivers/net/e1000/igb_rxtx.c     | 126 +++++++++++++++++++++++++++++++++++++++
 3 files changed, 129 insertions(+)

diff --git a/drivers/net/e1000/e1000_ethdev.h b/drivers/net/e1000/e1000_ethdev.h
index 6c25c8d..fce0278 100644
--- a/drivers/net/e1000/e1000_ethdev.h
+++ b/drivers/net/e1000/e1000_ethdev.h
@@ -308,6 +308,8 @@ int eth_igb_tx_queue_setup(struct rte_eth_dev *dev, uint16_t tx_queue_id,
 		uint16_t nb_tx_desc, unsigned int socket_id,
 		const struct rte_eth_txconf *tx_conf);
 
+int eth_igb_tx_done_cleanup(void *txq, uint32_t free_cnt);
+
 int eth_igb_rx_init(struct rte_eth_dev *dev);
 
 void eth_igb_tx_init(struct rte_eth_dev *dev);
diff --git a/drivers/net/e1000/igb_ethdev.c b/drivers/net/e1000/igb_ethdev.c
index 2fddf0c..4010dc4 100644
--- a/drivers/net/e1000/igb_ethdev.c
+++ b/drivers/net/e1000/igb_ethdev.c
@@ -402,6 +402,7 @@ static const struct eth_dev_ops eth_igb_ops = {
 	.rx_descriptor_done   = eth_igb_rx_descriptor_done,
 	.tx_queue_setup       = eth_igb_tx_queue_setup,
 	.tx_queue_release     = eth_igb_tx_queue_release,
+	.tx_done_cleanup      = eth_igb_tx_done_cleanup,
 	.dev_led_on           = eth_igb_led_on,
 	.dev_led_off          = eth_igb_led_off,
 	.flow_ctrl_get        = eth_igb_flow_ctrl_get,
diff --git a/drivers/net/e1000/igb_rxtx.c b/drivers/net/e1000/igb_rxtx.c
index dbd37ac..1d4fbcb 100644
--- a/drivers/net/e1000/igb_rxtx.c
+++ b/drivers/net/e1000/igb_rxtx.c
@@ -1227,6 +1227,132 @@ eth_igb_tx_queue_release(void *txq)
 	igb_tx_queue_release(txq);
 }
 
+static int
+igb_tx_done_cleanup(struct igb_tx_queue *txq, uint32_t free_cnt)
+{
+	struct igb_tx_entry *sw_ring;
+	volatile union e1000_adv_tx_desc *txr;
+	uint16_t tx_first; /* First segment analyzed. */
+	uint16_t tx_id;    /* Current segment being processed. */
+	uint16_t tx_last;  /* Last segment in the current packet. */
+	uint16_t tx_next;  /* First segment of the next packet. */
+	int count;
+
+	if (txq != NULL) {
+		count = 0;
+		sw_ring = txq->sw_ring;
+		txr = txq->tx_ring;
+
+		/*
+		 * tx_tail is the last sent packet on the sw_ring. Go to the end
+		 * of that packet (the last segment in the packet chain) and
+		 * then the next segment will be the start of the oldest segment
+		 * in the sw_ring. This is the first packet that will be
+		 * attempted to be freed.
+		 */
+
+		/* Get last segment in most recently added packet. */
+		tx_first = sw_ring[txq->tx_tail].last_id;
+
+		/* Get the next segment, which is the oldest segment in ring. */
+		tx_first = sw_ring[tx_first].next_id;
+
+		/* Set the current index to the first. */
+		tx_id = tx_first;
+
+		/*
+		 * Loop through each packet. For each packet, verify that an
+		 * mbuf exists and that the last segment is free. If so, free
+		 * it and move on.
+		 */
+		while (1) {
+			tx_last = sw_ring[tx_id].last_id;
+
+			if (sw_ring[tx_last].mbuf) {
+				if (txr[tx_last].wb.status &
+						E1000_TXD_STAT_DD) {
+					/*
+					 * Increment the number of packets
+					 * freed.
+					 */
+					count++;
+
+					/* Get the start of the next packet. */
+					tx_next = sw_ring[tx_last].next_id;
+
+					/*
+					 * Loop through all segments in a
+					 * packet.
+					 */
+					do {
+						rte_pktmbuf_free_seg(sw_ring[tx_id].mbuf);
+						sw_ring[tx_id].mbuf = NULL;
+						sw_ring[tx_id].last_id = tx_id;
+
+						/* Move to next segment. */
+						tx_id = sw_ring[tx_id].next_id;
+
+					} while (tx_id != tx_next);
+
+					if (unlikely(count == (int)free_cnt))
+						break;
+				} else
+					/*
+					 * mbuf still in use, nothing left to
+					 * free.
+					 */
+					break;
+			} else {
+				/*
+				 * There are multiple reasons to be here:
+				 * 1) All the packets on the ring have been
+				 *    freed - tx_id is equal to tx_first
+				 *    and some packets have been freed.
+				 *    - Done, exit
+				 * 2) Interface has not sent a ring's worth of
+				 *    packets yet, so the segment after tail is
+				 *    still empty. Or a previous call to this
+				 *    function freed some of the segments but
+				 *    not all so there is a hole in the list.
+				 *    Hopefully this is a rare case.
+				 *    - Walk the list and find the next mbuf. If
+				 *      there isn't one, then done.
+				 */
+				if (likely((tx_id == tx_first) && (count != 0)))
+					break;
+
+				/*
+				 * Walk the list and find the next mbuf, if any.
+				 */
+				do {
+					/* Move to next segment. */
+					tx_id = sw_ring[tx_id].next_id;
+
+					if (sw_ring[tx_id].mbuf)
+						break;
+
+				} while (tx_id != tx_first);
+
+				/*
+				 * Determine why previous loop bailed. If there
+				 * is not an mbuf, done.
+				 */
+				if (sw_ring[tx_id].mbuf == NULL)
+					break;
+			}
+		}
+	} else
+		count = -ENODEV;
+
+	return count;
+}
+
+int
+eth_igb_tx_done_cleanup(void *txq, uint32_t free_cnt)
+{
+	return igb_tx_done_cleanup(txq, free_cnt);
+}
+
 static void
 igb_reset_tx_queue_stat(struct igb_tx_queue *txq)
 {
-- 
2.9.3


* [dpdk-dev] [PATCH 3/3] driver: vHost support to free consumed buffers
  2016-12-16 12:48 [dpdk-dev] [PATCH 0/3] New API to free consumed buffers in TX ring Billy McFall
  2016-12-16 12:48 ` [dpdk-dev] [PATCH 1/3] ethdev: " Billy McFall
  2016-12-16 12:48 ` [dpdk-dev] [PATCH 2/3] driver: e1000 igb support to free consumed buffers Billy McFall
@ 2016-12-16 12:48 ` Billy McFall
  2016-12-16 16:24   ` Stephen Hemminger
  2 siblings, 1 reply; 12+ messages in thread
From: Billy McFall @ 2016-12-16 12:48 UTC (permalink / raw)
  To: thomas.monjalon, wenzhuo.lu; +Cc: dev, Billy McFall

Add support to the vHost driver for the new API to force-free consumed
buffers on the TX ring. vHost does not cache the mbufs, so there is no
work to do.

Signed-off-by: Billy McFall <bmcfall@redhat.com>
---
 drivers/net/vhost/rte_eth_vhost.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/drivers/net/vhost/rte_eth_vhost.c b/drivers/net/vhost/rte_eth_vhost.c
index 766d4ef..6493d56 100644
--- a/drivers/net/vhost/rte_eth_vhost.c
+++ b/drivers/net/vhost/rte_eth_vhost.c
@@ -939,6 +939,16 @@ eth_queue_release(void *q)
 }
 
 static int
+eth_tx_done_cleanup(void *txq __rte_unused, uint32_t free_cnt __rte_unused)
+{
+	/*
+	 * vHost does not hang onto mbuf. eth_vhost_tx() copies packet data
+	 * and releases mbuf, so nothing to cleanup.
+	 */
+	return 0;
+}
+
+static int
 eth_link_update(struct rte_eth_dev *dev __rte_unused,
 		int wait_to_complete __rte_unused)
 {
@@ -979,6 +989,7 @@ static const struct eth_dev_ops ops = {
 	.tx_queue_setup = eth_tx_queue_setup,
 	.rx_queue_release = eth_queue_release,
 	.tx_queue_release = eth_queue_release,
+	.tx_done_cleanup = eth_tx_done_cleanup,
 	.link_update = eth_link_update,
 	.stats_get = eth_stats_get,
 	.stats_reset = eth_stats_reset,
-- 
2.9.3


* Re: [dpdk-dev] [PATCH 3/3] driver: vHost support to free consumed buffers
  2016-12-16 12:48 ` [dpdk-dev] [PATCH 3/3] driver: vHost " Billy McFall
@ 2016-12-16 16:24   ` Stephen Hemminger
  2017-01-11 19:54     ` Billy McFall
  0 siblings, 1 reply; 12+ messages in thread
From: Stephen Hemminger @ 2016-12-16 16:24 UTC (permalink / raw)
  To: Billy McFall; +Cc: thomas.monjalon, wenzhuo.lu, dev

On Fri, 16 Dec 2016 07:48:51 -0500
Billy McFall <bmcfall@redhat.com> wrote:

> Add support to the vHost driver for the new API to force free consumed
> buffers on TX ring. vHost does not cache the mbufs so there is no work
> to do.
> 
> Signed-off-by: Billy McFall <bmcfall@redhat.com>
> ---
>  drivers/net/vhost/rte_eth_vhost.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/drivers/net/vhost/rte_eth_vhost.c b/drivers/net/vhost/rte_eth_vhost.c
> index 766d4ef..6493d56 100644
> --- a/drivers/net/vhost/rte_eth_vhost.c
> +++ b/drivers/net/vhost/rte_eth_vhost.c
> @@ -939,6 +939,16 @@ eth_queue_release(void *q)
>  }
>  
>  static int
> +eth_tx_done_cleanup(void *txq __rte_unused, uint32_t free_cnt __rte_unused)
> +{
> +	/*
> +	 * vHost does not hang onto mbuf. eth_vhost_tx() copies packet data
> +	 * and releases mbuf, so nothing to cleanup.
> +	 */
> +	return 0;
> +}
> +
> +static int
>  eth_link_update(struct rte_eth_dev *dev __rte_unused,
>  		int wait_to_complete __rte_unused)
>  {
> @@ -979,6 +989,7 @@ static const struct eth_dev_ops ops = {
>  	.tx_queue_setup = eth_tx_queue_setup,
>  	.rx_queue_release = eth_queue_release,
>  	.tx_queue_release = eth_queue_release,
> +	.tx_done_cleanup = eth_tx_done_cleanup,
>  	.link_update = eth_link_update,
>  	.stats_get = eth_stats_get,
>  	.stats_reset = eth_stats_reset,

Rather than having to change every driver, why not make this the default
behavior?


* Re: [dpdk-dev] [PATCH 1/3] ethdev: New API to free consumed buffers in TX ring
  2016-12-16 12:48 ` [dpdk-dev] [PATCH 1/3] ethdev: " Billy McFall
@ 2016-12-16 16:28   ` Stephen Hemminger
  2016-12-20 11:27   ` Adrien Mazarguil
  1 sibling, 0 replies; 12+ messages in thread
From: Stephen Hemminger @ 2016-12-16 16:28 UTC (permalink / raw)
  To: Billy McFall; +Cc: thomas.monjalon, wenzhuo.lu, dev

On Fri, 16 Dec 2016 07:48:49 -0500
Billy McFall <bmcfall@redhat.com> wrote:

> /**
> + * Request the driver to free mbufs currently cached by the driver. The
> + * driver will only free the mbuf if it is no longer in use.
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param queue_id
> + *   The index of the transmit queue through which output packets must be
> + *   sent.
> + *   The value must be in the range [0, nb_tx_queue - 1] previously supplied
> + *   to rte_eth_dev_configure().
> + * @param free_cnt
> + *   Maximum number of packets to free. Use 0 to indicate all possible packets
> + *   should be freed. Note that a packet may be using multiple mbufs.
> + * @param buffer
> + *   Buffer used to collect packets to be sent. If provided, the buffer will
> + *   be flushed, even if the current length is less than buffer->size. Pass NULL
> + *   if buffer has already been flushed.
> + * @param sent
> + *   Pointer to return number of packets sent if buffer has packets to be sent.
> + *   If *buffer is supplied, *sent must also be supplied.
> + * @return
> + *   Failure: < 0
> + *     -ENODEV: Invalid interface
> + *     -ENOTSUP: Driver does not support function
> + *   Success: >= 0
> + *     0-n: Number of packets freed. More packets may still remain in ring that
> + *     are in use.
> + */
> +
> +static inline int
> +rte_eth_tx_done_cleanup(uint8_t port_id, uint16_t queue_id,  uint32_t free_cnt,
> +		struct rte_eth_dev_tx_buffer *buffer, uint16_t *sent)


This API is more complex than it needs to be.
For the typical use case of OOM kind of cleanup, this is overkill.
There is no need for:
  free_cnt - device driver should just free all
  buffer/param - the application should not care.

The DPDK model is that once mbufs are passed to the device, the device "owns"
the mbufs. I think changing that model is just going to break things for
no gain.  It does make sense to have a "please clean up your mbufs" call.
If the application is using special mbufs, then it can use the normal
callback-on-done model.
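
For illustration, the reduced interface suggested here (no free_cnt, no
buffer/sent) might take a shape like the sketch below. The toy_* names and
the counter-based queue are hypothetical and only model the calling
convention, not any real driver.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical queue state: counts only, no real mbufs. */
struct toy_txq {
	unsigned int completed; /* transmitted, still held by the driver */
	unsigned int in_flight; /* not yet transmitted, must not be freed */
};

/* Hypothetical simplified driver op: no free_cnt, no buffer/sent. */
typedef int (*toy_tx_cleanup_t)(struct toy_txq *txq);

struct toy_dev {
	toy_tx_cleanup_t tx_cleanup; /* NULL if the driver lacks the op */
	struct toy_txq txq;
};

/* Driver side: free everything that has completed, leave the rest. */
static int
toy_free_all_completed(struct toy_txq *txq)
{
	int n;

	assert(txq != NULL);
	n = (int)txq->completed;
	txq->completed = 0;
	return n;
}

/* Library side: validate, then delegate to the driver. */
static int
toy_dev_tx_cleanup(struct toy_dev *dev)
{
	if (dev == NULL)
		return -ENODEV;
	if (dev->tx_cleanup == NULL)
		return -ENOTSUP;
	return dev->tx_cleanup(&dev->txq);
}
```

With this shape the application only asks for a cleanup; how much is
freed is entirely the driver's decision.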


* Re: [dpdk-dev] [PATCH 1/3] ethdev: New API to free consumed buffers in TX ring
  2016-12-16 12:48 ` [dpdk-dev] [PATCH 1/3] ethdev: " Billy McFall
  2016-12-16 16:28   ` Stephen Hemminger
@ 2016-12-20 11:27   ` Adrien Mazarguil
  2016-12-20 12:17     ` Ananyev, Konstantin
  1 sibling, 1 reply; 12+ messages in thread
From: Adrien Mazarguil @ 2016-12-20 11:27 UTC (permalink / raw)
  To: Billy McFall; +Cc: thomas.monjalon, wenzhuo.lu, dev, Stephen Hemminger

Hi Billy,

On Fri, Dec 16, 2016 at 07:48:49AM -0500, Billy McFall wrote:
> Add a new API to force free consumed buffers on TX ring. API will return
> the number of packets freed (0-n) or error code if feature not supported
> (-ENOTSUP) or input invalid (-ENODEV).
> 
> Because rte_eth_tx_buffer() may be used, and mbufs may still be held
> in local buffer, the API also accepts *buffer and *sent. Before
> attempting to free, rte_eth_tx_buffer_flush() is called to make sure
> all mbufs are sent to Tx ring. rte_eth_tx_buffer_flush() is called even
> if threshold is not met.
> 
> Signed-off-by: Billy McFall <bmcfall@redhat.com>
> ---
>  lib/librte_ether/rte_ethdev.h | 56 +++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 56 insertions(+)
> 
> diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
> index 9678179..e3f2be4 100644
> --- a/lib/librte_ether/rte_ethdev.h
> +++ b/lib/librte_ether/rte_ethdev.h
> @@ -1150,6 +1150,9 @@ typedef uint32_t (*eth_rx_queue_count_t)(struct rte_eth_dev *dev,
>  typedef int (*eth_rx_descriptor_done_t)(void *rxq, uint16_t offset);
>  /**< @internal Check DD bit of specific RX descriptor */
>  
> +typedef int (*eth_tx_done_cleanup_t)(void *txq, uint32_t free_cnt);
> +/**< @internal Force mbufs to be freed from TX ring. */
> +
>  typedef void (*eth_rxq_info_get_t)(struct rte_eth_dev *dev,
>  	uint16_t rx_queue_id, struct rte_eth_rxq_info *qinfo);
>  
> @@ -1467,6 +1470,7 @@ struct eth_dev_ops {
>  	eth_rx_disable_intr_t      rx_queue_intr_disable;
>  	eth_tx_queue_setup_t       tx_queue_setup;/**< Set up device TX queue.*/
>  	eth_queue_release_t        tx_queue_release;/**< Release TX queue.*/
> +	eth_tx_done_cleanup_t      tx_done_cleanup;/**< Free tx ring mbufs */
>  	eth_dev_led_on_t           dev_led_on;    /**< Turn on LED. */
>  	eth_dev_led_off_t          dev_led_off;   /**< Turn off LED. */
>  	flow_ctrl_get_t            flow_ctrl_get; /**< Get flow control. */
> @@ -2943,6 +2947,58 @@ rte_eth_tx_buffer(uint8_t port_id, uint16_t queue_id,
>  }
>  
>  /**
> + * Request the driver to free mbufs currently cached by the driver. The
> + * driver will only free the mbuf if it is no longer in use.
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param queue_id
> + *   The index of the transmit queue through which output packets must be
> + *   sent.
> + *   The value must be in the range [0, nb_tx_queue - 1] previously supplied
> + *   to rte_eth_dev_configure().
> + * @param free_cnt
> + *   Maximum number of packets to free. Use 0 to indicate all possible packets
> + *   should be freed. Note that a packet may be using multiple mbufs.
> + * @param buffer
> + *   Buffer used to collect packets to be sent. If provided, the buffer will
> + *   be flushed, even if the current length is less than buffer->size. Pass NULL
> + *   if buffer has already been flushed.
> + * @param sent
> + *   Pointer to return number of packets sent if buffer has packets to be sent.
> + *   If *buffer is supplied, *sent must also be supplied.
> + * @return
> + *   Failure: < 0
> + *     -ENODEV: Invalid interface
> + *     -ENOTSUP: Driver does not support function
> + *   Success: >= 0
> + *     0-n: Number of packets freed. More packets may still remain in ring that
> + *     are in use.
> + */
> +
> +static inline int
> +rte_eth_tx_done_cleanup(uint8_t port_id, uint16_t queue_id,  uint32_t free_cnt,
> +		struct rte_eth_dev_tx_buffer *buffer, uint16_t *sent)
> +{
> +	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
> +
> +	/* Validate Input Data. Bail if not valid or not supported. */
> +	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
> +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_done_cleanup, -ENOTSUP);
> +
> +	/*
> +	 * If transmit buffer is provided and there are still packets to be
> +	 * sent, then send them before attempting to free pending mbufs.
> +	 */
> +	if (buffer && sent)
> +		*sent = rte_eth_tx_buffer_flush(port_id, queue_id, buffer);
> +
> +	/* Call driver to free pending mbufs. */
> +	return (*dev->dev_ops->tx_done_cleanup)(dev->data->tx_queues[queue_id],
> +			free_cnt);
> +}
> +
> +/**
>   * Configure a callback for buffered packets which cannot be sent
>   *
>   * Register a specific callback to be called when an attempt is made to send

Just a thought to follow up on Stephen's comment and further simplify this
API: how about not adding any new eth_dev_ops but instead defining what
should happen during an empty TX burst call (tx_burst() with 0 packets)?

Several PMDs already check for this scenario and start by cleaning up
completed packets anyway, so they effectively implement part of this
definition for free already.

The main difference with this API would be that you wouldn't know how many
mbufs were freed and wouldn't collect them into an array. However most
applications have one mbuf pool and/or know where they come from, so they
can just query the pool or attempt to re-allocate from it after doing empty
bursts in case of starvation.
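
A minimal sketch of that pattern, for illustration only: the toy_* names
are hypothetical, the pool is a bare counter, and the toy burst function
is assumed to reclaim completed mbufs unconditionally, which real PMDs do
not all guarantee.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mbuf pool: just a count of free objects. */
struct toy_pool {
	unsigned int avail;
};

/* Hypothetical TX queue holding completed mbufs from one pool. */
struct toy_txq {
	struct toy_pool *pool;
	unsigned int completed; /* transmitted but not yet returned */
};

/* Toy burst: return completed mbufs to the pool, then "send" nb_pkts. */
static uint16_t
toy_tx_burst(struct toy_txq *txq, uint16_t nb_pkts)
{
	assert(txq != NULL && txq->pool != NULL);
	txq->pool->avail += txq->completed;
	txq->completed = 0;
	return nb_pkts; /* actual transmission is not modelled */
}

static bool
toy_alloc(struct toy_pool *p)
{
	if (p->avail == 0)
		return false;
	p->avail--;
	return true;
}

/* On starvation, do one empty burst and retry the allocation once. */
static bool
toy_alloc_with_reclaim(struct toy_pool *p, struct toy_txq *txq)
{
	if (toy_alloc(p))
		return true;
	(void)toy_tx_burst(txq, 0); /* empty burst: cleanup side effect only */
	return toy_alloc(p);
}
```

The caller never learns how many mbufs were freed; it simply observes
whether the retried allocation succeeds.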

[1] http://dpdk.org/ml/archives/dev/2016-December/052469.html

-- 
Adrien Mazarguil
6WIND


* Re: [dpdk-dev] [PATCH 1/3] ethdev: New API to free consumed buffers in TX ring
  2016-12-20 11:27   ` Adrien Mazarguil
@ 2016-12-20 12:17     ` Ananyev, Konstantin
  2016-12-20 12:58       ` Adrien Mazarguil
  0 siblings, 1 reply; 12+ messages in thread
From: Ananyev, Konstantin @ 2016-12-20 12:17 UTC (permalink / raw)
  To: Adrien Mazarguil, Billy McFall
  Cc: thomas.monjalon, Lu, Wenzhuo, dev, Stephen Hemminger



> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Adrien Mazarguil
> Sent: Tuesday, December 20, 2016 11:28 AM
> To: Billy McFall <bmcfall@redhat.com>
> Cc: thomas.monjalon@6wind.com; Lu, Wenzhuo <wenzhuo.lu@intel.com>; dev@dpdk.org; Stephen Hemminger
> <stephen@networkplumber.org>
> Subject: Re: [dpdk-dev] [PATCH 1/3] ethdev: New API to free consumed buffers in TX ring
> 
> Hi Billy,
> 
> Just a thought to follow-up on Stephen's comment to further simplify this
> API, how about not adding any new eth_dev_ops but instead defining what
> should happen during an empty TX burst call (tx_burst() with 0 packets).
> 
> Several PMDs already have a check for this scenario and start by cleaning up
> completed packets anyway, they effectively partially implement this
> definition for free already.

Many PMDs start cleaning up only when the number of free entries
drops below some threshold.
Also, in that case the author would have to modify (and test) all existing TX routines.
So I think a separate API call seems more plausible.
Though I agree with the previous comment from Stephen that the last two parameters
are redundant and would just overcomplicate things.
Konstantin
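
To illustrate the threshold behaviour mentioned above, here is a toy
sketch. The names are hypothetical and the structure is only loosely
inspired by the tx_free_thresh idea; it is not taken from any specific
PMD.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical TX queue with threshold-driven cleanup. */
struct toy_thresh_txq {
	uint16_t nb_free;     /* free descriptors */
	uint16_t nb_done;     /* completed descriptors not yet reclaimed */
	uint16_t free_thresh; /* reclaim only when nb_free drops below this */
};

/* Toy burst: reclaim only under pressure, then consume descriptors. */
static uint16_t
toy_thresh_tx_burst(struct toy_thresh_txq *txq, uint16_t nb_pkts)
{
	assert(txq != NULL);
	if (txq->nb_free < txq->free_thresh) {
		txq->nb_free = (uint16_t)(txq->nb_free + txq->nb_done);
		txq->nb_done = 0;
	}
	if (nb_pkts > txq->nb_free)
		nb_pkts = txq->nb_free;
	txq->nb_free = (uint16_t)(txq->nb_free - nb_pkts);
	return nb_pkts;
}
```

With plenty of free descriptors, an empty burst leaves nb_done untouched,
so the completed mbufs stay with the driver until the queue fills up.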

> 
> The main difference with this API would be that you wouldn't know how many
> mbufs were freed and wouldn't collect them into an array. However most
> applications have one mbuf pool and/or know where they come from, so they
> can just query the pool or attempt to re-allocate from it after doing empty
> bursts in case of starvation.
> 
> [1] http://dpdk.org/ml/archives/dev/2016-December/052469.html
> 
> --
> Adrien Mazarguil
> 6WIND


* Re: [dpdk-dev] [PATCH 1/3] ethdev: New API to free consumed buffers in TX ring
  2016-12-20 12:17     ` Ananyev, Konstantin
@ 2016-12-20 12:58       ` Adrien Mazarguil
  2016-12-20 14:15         ` Billy McFall
  0 siblings, 1 reply; 12+ messages in thread
From: Adrien Mazarguil @ 2016-12-20 12:58 UTC (permalink / raw)
  To: Ananyev, Konstantin
  Cc: Billy McFall, thomas.monjalon, Lu, Wenzhuo, dev, Stephen Hemminger

On Tue, Dec 20, 2016 at 12:17:10PM +0000, Ananyev, Konstantin wrote:
> 
> 
> > -----Original Message-----
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Adrien Mazarguil
> > Sent: Tuesday, December 20, 2016 11:28 AM
> > To: Billy McFall <bmcfall@redhat.com>
> > Cc: thomas.monjalon@6wind.com; Lu, Wenzhuo <wenzhuo.lu@intel.com>; dev@dpdk.org; Stephen Hemminger
> > <stephen@networkplumber.org>
> > Subject: Re: [dpdk-dev] [PATCH 1/3] ethdev: New API to free consumed buffers in TX ring
> > 
> > Hi Billy,
> > 
> > On Fri, Dec 16, 2016 at 07:48:49AM -0500, Billy McFall wrote:
> > > Add a new API to force free consumed buffers on TX ring. API will return
> > > the number of packets freed (0-n) or error code if feature not supported
> > > (-ENOTSUP) or input invalid (-ENODEV).
> > >
> > > Because rte_eth_tx_buffer() may be used, and mbufs may still be held
> > > in local buffer, the API also accepts *buffer and *sent. Before
> > > attempting to free, rte_eth_tx_buffer_flush() is called to make sure
> > > all mbufs are sent to Tx ring. rte_eth_tx_buffer_flush() is called even
> > > if threshold is not met.
> > >
> > > Signed-off-by: Billy McFall <bmcfall@redhat.com>
> > > ---
> > >  lib/librte_ether/rte_ethdev.h | 56 +++++++++++++++++++++++++++++++++++++++++++
> > >  1 file changed, 56 insertions(+)
> > >
> > > diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
> > > index 9678179..e3f2be4 100644
> > > --- a/lib/librte_ether/rte_ethdev.h
> > > +++ b/lib/librte_ether/rte_ethdev.h
> > > @@ -1150,6 +1150,9 @@ typedef uint32_t (*eth_rx_queue_count_t)(struct rte_eth_dev *dev,
> > >  typedef int (*eth_rx_descriptor_done_t)(void *rxq, uint16_t offset);
> > >  /**< @internal Check DD bit of specific RX descriptor */
> > >
> > > +typedef int (*eth_tx_done_cleanup_t)(void *txq, uint32_t free_cnt);
> > > +/**< @internal Force mbufs to be from TX ring. */
> > > +
> > >  typedef void (*eth_rxq_info_get_t)(struct rte_eth_dev *dev,
> > >  	uint16_t rx_queue_id, struct rte_eth_rxq_info *qinfo);
> > >
> > > @@ -1467,6 +1470,7 @@ struct eth_dev_ops {
> > >  	eth_rx_disable_intr_t      rx_queue_intr_disable;
> > >  	eth_tx_queue_setup_t       tx_queue_setup;/**< Set up device TX queue.*/
> > >  	eth_queue_release_t        tx_queue_release;/**< Release TX queue.*/
> > > +	eth_tx_done_cleanup_t      tx_done_cleanup;/**< Free tx ring mbufs */
> > >  	eth_dev_led_on_t           dev_led_on;    /**< Turn on LED. */
> > >  	eth_dev_led_off_t          dev_led_off;   /**< Turn off LED. */
> > >  	flow_ctrl_get_t            flow_ctrl_get; /**< Get flow control. */
> > > @@ -2943,6 +2947,58 @@ rte_eth_tx_buffer(uint8_t port_id, uint16_t queue_id,
> > >  }
> > >
> > >  /**
> > > + * Request the driver to free mbufs currently cached by the driver. The
> > > + * driver will only free the mbuf if it is no longer in use.
> > > + *
> > > + * @param port_id
> > > + *   The port identifier of the Ethernet device.
> > > + * @param queue_id
> > > + *   The index of the transmit queue through which output packets must be
> > > + *   sent.
> > > + *   The value must be in the range [0, nb_tx_queue - 1] previously supplied
> > > + *   to rte_eth_dev_configure().
> > > + * @param free_cnt
> > > + *   Maximum number of packets to free. Use 0 to indicate all possible packets
> > > + *   should be freed. Note that a packet may be using multiple mbufs.
> > > + * @param buffer
> > > + *   Buffer used to collect packets to be sent. If provided, the buffer will
> > > + *   be flushed, even if the current length is less than buffer->size. Pass NULL
> > > + *   if buffer has already been flushed.
> > > + * @param sent
> > > + *   Pointer to return number of packets sent if buffer has packets to be sent.
> > > + *   If *buffer is supplied, *sent must also be supplied.
> > > + * @return
> > > + *   Failure: < 0
> > > + *     -ENODEV: Invalid interface
> > > + *     -ENOTSUP: Driver does not support function
> > > + *   Success: >= 0
> > > + *     0-n: Number of packets freed. More packets may still remain in ring that
> > > + *     are in use.
> > > + */
> > > +
> > > +static inline int
> > > +rte_eth_tx_done_cleanup(uint8_t port_id, uint16_t queue_id,  uint32_t free_cnt,
> > > +		struct rte_eth_dev_tx_buffer *buffer, uint16_t *sent)
> > > +{
> > > +	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
> > > +
> > > +	/* Validate Input Data. Bail if not valid or not supported. */
> > > +	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
> > > +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_done_cleanup, -ENOTSUP);
> > > +
> > > +	/*
> > > +	 * If transmit buffer is provided and there are still packets to be
> > > +	 * sent, then send them before attempting to free pending mbufs.
> > > +	 */
> > > +	if (buffer && sent)
> > > +		*sent = rte_eth_tx_buffer_flush(port_id, queue_id, buffer);
> > > +
> > > +	/* Call driver to free pending mbufs. */
> > > +	return (*dev->dev_ops->tx_done_cleanup)(dev->data->tx_queues[queue_id],
> > > +			free_cnt);
> > > +}
> > > +
> > > +/**
> > >   * Configure a callback for buffered packets which cannot be sent
> > >   *
> > >   * Register a specific callback to be called when an attempt is made to send
> > 
> > Just a thought to follow-up on Stephen's comment to further simplify this
> > API, how about not adding any new eth_dev_ops but instead defining what
> > should happen during an empty TX burst call (tx_burst() with 0 packets).
> > 
> > Several PMDs already have a check for this scenario and start by cleaning up
> > completed packets anyway, they effectively partially implement this
> > definition for free already.
> 
> Many PMDs  start by cleaning up only when number of free entries
> drop below some point.
> Also in that case the author would have to modify (and test) all existing TX routinies.
> So I think a separate API call seems more plausible.

Not necessarily. As I understand it, this API in its current form only suggests
that a PMD should release a few mbufs from a queue if possible, without any
guarantee; PMDs are not forced to comply.

I think the threshold you mention is a valid reason not to release them, and
it wouldn't change a thing in existing tx_burst() implementations in the
meantime (only documentation).

This threshold could also be bypassed rather painlessly in the
"if (unlikely(nb_pkts == 0))" case that all PMDs already check for in
one way or another.

> Though I am agree with previous comment from Stephen that last two parameters
> are redundant and would just overcomplicate things.
> tin
> 
> > 
> > The main difference with this API would be that you wouldn't know how many
> > mbufs were freed and wouldn't collect them into an array. However most
> > applications have one mbuf pool and/or know where they come from, so they
> > can just query the pool or attempt to re-allocate from it after doing empty
> > bursts in case of starvation.
> > 
> > [1] http://dpdk.org/ml/archives/dev/2016-December/052469.html

-- 
Adrien Mazarguil
6WIND


* Re: [dpdk-dev] [PATCH 1/3] ethdev: New API to free consumed buffers in TX ring
  2016-12-20 12:58       ` Adrien Mazarguil
@ 2016-12-20 14:15         ` Billy McFall
  2016-12-23  9:45           ` Adrien Mazarguil
  0 siblings, 1 reply; 12+ messages in thread
From: Billy McFall @ 2016-12-20 14:15 UTC (permalink / raw)
  To: Adrien Mazarguil
  Cc: Ananyev, Konstantin, thomas.monjalon, Lu, Wenzhuo, dev,
	Stephen Hemminger

Thank you for your responses, see inline.

On Tue, Dec 20, 2016 at 7:58 AM, Adrien Mazarguil
<adrien.mazarguil@6wind.com> wrote:
> On Tue, Dec 20, 2016 at 12:17:10PM +0000, Ananyev, Konstantin wrote:
>>
>>
>> > -----Original Message-----
>> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Adrien Mazarguil
>> > Sent: Tuesday, December 20, 2016 11:28 AM
>> > To: Billy McFall <bmcfall@redhat.com>
>> > Cc: thomas.monjalon@6wind.com; Lu, Wenzhuo <wenzhuo.lu@intel.com>; dev@dpdk.org; Stephen Hemminger
>> > <stephen@networkplumber.org>
>> > Subject: Re: [dpdk-dev] [PATCH 1/3] ethdev: New API to free consumed buffers in TX ring
>> >
>> > Hi Billy,
>> >
>> > On Fri, Dec 16, 2016 at 07:48:49AM -0500, Billy McFall wrote:
>> > > Add a new API to force free consumed buffers on TX ring. API will return
>> > > the number of packets freed (0-n) or error code if feature not supported
>> > > (-ENOTSUP) or input invalid (-ENODEV).
>> > >
>> > > Because rte_eth_tx_buffer() may be used, and mbufs may still be held
>> > > in local buffer, the API also accepts *buffer and *sent. Before
>> > > attempting to free, rte_eth_tx_buffer_flush() is called to make sure
>> > > all mbufs are sent to Tx ring. rte_eth_tx_buffer_flush() is called even
>> > > if threshold is not met.
>> > >
>> > > Signed-off-by: Billy McFall <bmcfall@redhat.com>
>> > > ---
>> > >  lib/librte_ether/rte_ethdev.h | 56 +++++++++++++++++++++++++++++++++++++++++++
>> > >  1 file changed, 56 insertions(+)
>> > >
>> > > diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
>> > > index 9678179..e3f2be4 100644
>> > > --- a/lib/librte_ether/rte_ethdev.h
>> > > +++ b/lib/librte_ether/rte_ethdev.h
>> > > @@ -1150,6 +1150,9 @@ typedef uint32_t (*eth_rx_queue_count_t)(struct rte_eth_dev *dev,
>> > >  typedef int (*eth_rx_descriptor_done_t)(void *rxq, uint16_t offset);
>> > >  /**< @internal Check DD bit of specific RX descriptor */
>> > >
>> > > +typedef int (*eth_tx_done_cleanup_t)(void *txq, uint32_t free_cnt);
>> > > +/**< @internal Force mbufs to be from TX ring. */
>> > > +
>> > >  typedef void (*eth_rxq_info_get_t)(struct rte_eth_dev *dev,
>> > >   uint16_t rx_queue_id, struct rte_eth_rxq_info *qinfo);
>> > >
>> > > @@ -1467,6 +1470,7 @@ struct eth_dev_ops {
>> > >   eth_rx_disable_intr_t      rx_queue_intr_disable;
>> > >   eth_tx_queue_setup_t       tx_queue_setup;/**< Set up device TX queue.*/
>> > >   eth_queue_release_t        tx_queue_release;/**< Release TX queue.*/
>> > > + eth_tx_done_cleanup_t      tx_done_cleanup;/**< Free tx ring mbufs */
>> > >   eth_dev_led_on_t           dev_led_on;    /**< Turn on LED. */
>> > >   eth_dev_led_off_t          dev_led_off;   /**< Turn off LED. */
>> > >   flow_ctrl_get_t            flow_ctrl_get; /**< Get flow control. */
>> > > @@ -2943,6 +2947,58 @@ rte_eth_tx_buffer(uint8_t port_id, uint16_t queue_id,
>> > >  }
>> > >
>> > >  /**
>> > > + * Request the driver to free mbufs currently cached by the driver. The
>> > > + * driver will only free the mbuf if it is no longer in use.
>> > > + *
>> > > + * @param port_id
>> > > + *   The port identifier of the Ethernet device.
>> > > + * @param queue_id
>> > > + *   The index of the transmit queue through which output packets must be
>> > > + *   sent.
>> > > + *   The value must be in the range [0, nb_tx_queue - 1] previously supplied
>> > > + *   to rte_eth_dev_configure().
>> > > + * @param free_cnt
>> > > + *   Maximum number of packets to free. Use 0 to indicate all possible packets
>> > > + *   should be freed. Note that a packet may be using multiple mbufs.
>> > > + * @param buffer
>> > > + *   Buffer used to collect packets to be sent. If provided, the buffer will
>> > > + *   be flushed, even if the current length is less than buffer->size. Pass NULL
>> > > + *   if buffer has already been flushed.
>> > > + * @param sent
>> > > + *   Pointer to return number of packets sent if buffer has packets to be sent.
>> > > + *   If *buffer is supplied, *sent must also be supplied.
>> > > + * @return
>> > > + *   Failure: < 0
>> > > + *     -ENODEV: Invalid interface
>> > > + *     -ENOTSUP: Driver does not support function
>> > > + *   Success: >= 0
>> > > + *     0-n: Number of packets freed. More packets may still remain in ring that
>> > > + *     are in use.
>> > > + */
>> > > +
>> > > +static inline int
>> > > +rte_eth_tx_done_cleanup(uint8_t port_id, uint16_t queue_id,  uint32_t free_cnt,
>> > > +         struct rte_eth_dev_tx_buffer *buffer, uint16_t *sent)
>> > > +{
>> > > + struct rte_eth_dev *dev = &rte_eth_devices[port_id];
>> > > +
>> > > + /* Validate Input Data. Bail if not valid or not supported. */
>> > > + RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
>> > > + RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_done_cleanup, -ENOTSUP);
>> > > +
>> > > + /*
>> > > +  * If transmit buffer is provided and there are still packets to be
>> > > +  * sent, then send them before attempting to free pending mbufs.
>> > > +  */
>> > > + if (buffer && sent)
>> > > +         *sent = rte_eth_tx_buffer_flush(port_id, queue_id, buffer);
>> > > +
>> > > + /* Call driver to free pending mbufs. */
>> > > + return (*dev->dev_ops->tx_done_cleanup)(dev->data->tx_queues[queue_id],
>> > > +                 free_cnt);
>> > > +}
>> > > +
>> > > +/**
>> > >   * Configure a callback for buffered packets which cannot be sent
>> > >   *
>> > >   * Register a specific callback to be called when an attempt is made to send
>> >

I will remove the buffer/sent parameters. It will be the application's
responsibility to make sure rte_eth_tx_buffer_flush() is called.

I don't feel strongly about the free_cnt parameter. It was in the original
request so that, if there was a large ring buffer, the API could bail early
without having to go through the entire ring. It might be a little
unrealistic for the application to truly know how many mbufs it wants freed.
Also, as an example, the i40e driver already has an i40e_tx_free_bufs(...)
function, so by dropping the free_cnt parameter, this function could be
reused without having to account for free_cnt.

>> > Just a thought to follow-up on Stephen's comment to further simplify this
>> > API, how about not adding any new eth_dev_ops but instead defining what
>> > should happen during an empty TX burst call (tx_burst() with 0 packets).
>> >

In the original API request thread, see dpdk-dev mailing list from 11/21/2016
with subject "Adding API to force freeing consumed buffers in TX ring",
overloading the existing API with nb_pkts == 0 was suggested and consensus
was to go with new API. I lean towards a new API since this is a special case
most applications won't use, but I will go with the community on whether to
enhance the existing burst functionality or add a new API.

>> > Several PMDs already have a check for this scenario and start by cleaning up
>> > completed packets anyway, they effectively partially implement this
>> > definition for free already.
>>
>> Many PMDs  start by cleaning up only when number of free entries
>> drop below some point.

True, but the original request for this API was for the scenario where packets
are being flooded and the application wanted to reuse mbuf to avoid a packet
copy. So the API was to request the driver to free "done" mbufs outside of any
threshold.

>> Also in that case the author would have to modify (and test) all existing TX routinies.
>> So I think a separate API call seems more plausible.
>
> Not necessarily, as I understand this API in its current form only suggests
> that a PMD should release a few mbufs from a queue if possible, without any
> guarantee, PMDs are not forced to comply.
>
> I think the threshold you mention is a valid reason not to release them, and
> it wouldn't change a thing to existing tx_burst() implementations in the
> meantime (only documentation).
>
> This threshold could also be bypassed rather painlessly in the
> "if (unlikely(nb_pkts == 0))" case that all PMDs already check for in a
> way or another.
>
>> Though I am agree with previous comment from Stephen that last two parameters
>> are redundant and would just overcomplicate things.
>> tin
>>
>> >
>> > The main difference with this API would be that you wouldn't know how many
>> > mbufs were freed and wouldn't collect them into an array. However most
>> > applications have one mbuf pool and/or know where they come from, so they
>> > can just query the pool or attempt to re-allocate from it after doing empty
>> > bursts in case of starvation.
>> >
>> > [1] http://dpdk.org/ml/archives/dev/2016-December/052469.html
>
> --
> Adrien Mazarguil
> 6WIND


* Re: [dpdk-dev] [PATCH 1/3] ethdev: New API to free consumed buffers in TX ring
  2016-12-20 14:15         ` Billy McFall
@ 2016-12-23  9:45           ` Adrien Mazarguil
  0 siblings, 0 replies; 12+ messages in thread
From: Adrien Mazarguil @ 2016-12-23  9:45 UTC (permalink / raw)
  To: Billy McFall
  Cc: Ananyev, Konstantin, thomas.monjalon, Lu, Wenzhuo, dev,
	Stephen Hemminger

Hi Billy,

On Tue, Dec 20, 2016 at 09:15:50AM -0500, Billy McFall wrote:
> Thank you for your responses, see inline.
> 
> On Tue, Dec 20, 2016 at 7:58 AM, Adrien Mazarguil
> <adrien.mazarguil@6wind.com> wrote:
> > On Tue, Dec 20, 2016 at 12:17:10PM +0000, Ananyev, Konstantin wrote:
> >>
> >>
> >> > -----Original Message-----
> >> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Adrien Mazarguil
> >> > Sent: Tuesday, December 20, 2016 11:28 AM
> >> > To: Billy McFall <bmcfall@redhat.com>
> >> > Cc: thomas.monjalon@6wind.com; Lu, Wenzhuo <wenzhuo.lu@intel.com>; dev@dpdk.org; Stephen Hemminger
> >> > <stephen@networkplumber.org>
> >> > Subject: Re: [dpdk-dev] [PATCH 1/3] ethdev: New API to free consumed buffers in TX ring
> >> >
> >> > Hi Billy,
> >> >
> >> > On Fri, Dec 16, 2016 at 07:48:49AM -0500, Billy McFall wrote:
> >> > > Add a new API to force free consumed buffers on TX ring. API will return
> >> > > the number of packets freed (0-n) or error code if feature not supported
> >> > > (-ENOTSUP) or input invalid (-ENODEV).
> >> > >
> >> > > Because rte_eth_tx_buffer() may be used, and mbufs may still be held
> >> > > in local buffer, the API also accepts *buffer and *sent. Before
> >> > > attempting to free, rte_eth_tx_buffer_flush() is called to make sure
> >> > > all mbufs are sent to Tx ring. rte_eth_tx_buffer_flush() is called even
> >> > > if threshold is not met.
> >> > >
> >> > > Signed-off-by: Billy McFall <bmcfall@redhat.com>
> >> > > ---
> >> > >  lib/librte_ether/rte_ethdev.h | 56 +++++++++++++++++++++++++++++++++++++++++++
> >> > >  1 file changed, 56 insertions(+)
> >> > >
> >> > > diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
> >> > > index 9678179..e3f2be4 100644
> >> > > --- a/lib/librte_ether/rte_ethdev.h
> >> > > +++ b/lib/librte_ether/rte_ethdev.h
> >> > > @@ -1150,6 +1150,9 @@ typedef uint32_t (*eth_rx_queue_count_t)(struct rte_eth_dev *dev,
> >> > >  typedef int (*eth_rx_descriptor_done_t)(void *rxq, uint16_t offset);
> >> > >  /**< @internal Check DD bit of specific RX descriptor */
> >> > >
> >> > > +typedef int (*eth_tx_done_cleanup_t)(void *txq, uint32_t free_cnt);
> >> > > +/**< @internal Force mbufs to be from TX ring. */
> >> > > +
> >> > >  typedef void (*eth_rxq_info_get_t)(struct rte_eth_dev *dev,
> >> > >   uint16_t rx_queue_id, struct rte_eth_rxq_info *qinfo);
> >> > >
> >> > > @@ -1467,6 +1470,7 @@ struct eth_dev_ops {
> >> > >   eth_rx_disable_intr_t      rx_queue_intr_disable;
> >> > >   eth_tx_queue_setup_t       tx_queue_setup;/**< Set up device TX queue.*/
> >> > >   eth_queue_release_t        tx_queue_release;/**< Release TX queue.*/
> >> > > + eth_tx_done_cleanup_t      tx_done_cleanup;/**< Free tx ring mbufs */
> >> > >   eth_dev_led_on_t           dev_led_on;    /**< Turn on LED. */
> >> > >   eth_dev_led_off_t          dev_led_off;   /**< Turn off LED. */
> >> > >   flow_ctrl_get_t            flow_ctrl_get; /**< Get flow control. */
> >> > > @@ -2943,6 +2947,58 @@ rte_eth_tx_buffer(uint8_t port_id, uint16_t queue_id,
> >> > >  }
> >> > >
> >> > >  /**
> >> > > + * Request the driver to free mbufs currently cached by the driver. The
> >> > > + * driver will only free the mbuf if it is no longer in use.
> >> > > + *
> >> > > + * @param port_id
> >> > > + *   The port identifier of the Ethernet device.
> >> > > + * @param queue_id
> >> > > + *   The index of the transmit queue through which output packets must be
> >> > > + *   sent.
> >> > > + *   The value must be in the range [0, nb_tx_queue - 1] previously supplied
> >> > > + *   to rte_eth_dev_configure().
> >> > > + * @param free_cnt
> >> > > + *   Maximum number of packets to free. Use 0 to indicate all possible packets
> >> > > + *   should be freed. Note that a packet may be using multiple mbufs.
> >> > > + * @param buffer
> >> > > + *   Buffer used to collect packets to be sent. If provided, the buffer will
> >> > > + *   be flushed, even if the current length is less than buffer->size. Pass NULL
> >> > > + *   if buffer has already been flushed.
> >> > > + * @param sent
> >> > > + *   Pointer to return number of packets sent if buffer has packets to be sent.
> >> > > + *   If *buffer is supplied, *sent must also be supplied.
> >> > > + * @return
> >> > > + *   Failure: < 0
> >> > > + *     -ENODEV: Invalid interface
> >> > > + *     -ENOTSUP: Driver does not support function
> >> > > + *   Success: >= 0
> >> > > + *     0-n: Number of packets freed. More packets may still remain in ring that
> >> > > + *     are in use.
> >> > > + */
> >> > > +
> >> > > +static inline int
> >> > > +rte_eth_tx_done_cleanup(uint8_t port_id, uint16_t queue_id,  uint32_t free_cnt,
> >> > > +         struct rte_eth_dev_tx_buffer *buffer, uint16_t *sent)
> >> > > +{
> >> > > + struct rte_eth_dev *dev = &rte_eth_devices[port_id];
> >> > > +
> >> > > + /* Validate Input Data. Bail if not valid or not supported. */
> >> > > + RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
> >> > > + RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_done_cleanup, -ENOTSUP);
> >> > > +
> >> > > + /*
> >> > > +  * If transmit buffer is provided and there are still packets to be
> >> > > +  * sent, then send them before attempting to free pending mbufs.
> >> > > +  */
> >> > > + if (buffer && sent)
> >> > > +         *sent = rte_eth_tx_buffer_flush(port_id, queue_id, buffer);
> >> > > +
> >> > > + /* Call driver to free pending mbufs. */
> >> > > + return (*dev->dev_ops->tx_done_cleanup)(dev->data->tx_queues[queue_id],
> >> > > +                 free_cnt);
> >> > > +}
> >> > > +
> >> > > +/**
> >> > >   * Configure a callback for buffered packets which cannot be sent
> >> > >   *
> >> > >   * Register a specific callback to be called when an attempt is made to send
> >> >
> 
> I will remove the buffer/sent parameters. It will be the applications
> responsibility
> to make sure rte_eth_tx_buffer_flush() is called.
> 
> I don't feel strongly about the free_cnt parameter. It was in the
> original request
> so that if there was a large ring buffer, the API could bail early
> without having
> to go through all the entire ring. It might be a little unrealistic
> for the application
> to truly know how many mbufs it wants freed. Also, as an example, the I40e
> driver already has a i40e_tx_free_bufs(...) function, so by dropping
> the free_cnt
> parameter, this function could be reused without having to account for
> the free_cnt.
> 
> >> > Just a thought to follow-up on Stephen's comment to further simplify this
> >> > API, how about not adding any new eth_dev_ops but instead defining what
> >> > should happen during an empty TX burst call (tx_burst() with 0 packets).
> >> >
> 
> In the original API request thread, see dpdk-dev mailing list from 11/21/2016
> with subject "Adding API to force freeing consumed buffers in TX ring",
> overloading the existing API with nb_pkts == 0 was suggested and consensus
> was to go with new API. I lean towards a new API since this is a special case
> most applications won't use, but I will go with the community on whether to
> enhance the existing burst functionality or add a new API.

OK, I've just read the original thread.

> >> > Several PMDs already have a check for this scenario and start by cleaning up
> >> > completed packets anyway, they effectively partially implement this
> >> > definition for free already.
> >>
> >> Many PMDs  start by cleaning up only when number of free entries
> >> drop below some point.
> 
> True, but the original request for this API was for the scenario where packets
> are being flooded and the application wanted to reuse mbuf to avoid a packet
> copy. So the API was to request the driver to free "done" mbufs outside of any
> threshold.

Understood, so it's more than just a polite suggestion to PMDs that
implement this call. In my opinion it's still better to avoid adding a new
callback for that purpose, since applications cannot rely on a specific
outcome: it cannot guarantee that any mbuf would be freed, not unlike calling
tx_burst() with 0 packets.

That's a separate discussion; however, perhaps making struct eth_dev_ops part
of the public API was not such a good idea after all. We're unable to
maintain ABI compatibility across releases because of it.

New callbacks would be met with less resistance (at least on my side) if
this whole ABI compat thing was not an issue.

> >> Also in that case the author would have to modify (and test) all existing TX routinies.
> >> So I think a separate API call seems more plausible.
> >
> > Not necessarily, as I understand this API in its current form only suggests
> > that a PMD should release a few mbufs from a queue if possible, without any
> > guarantee, PMDs are not forced to comply.
> >
> > I think the threshold you mention is a valid reason not to release them, and
> > it wouldn't change a thing to existing tx_burst() implementations in the
> > meantime (only documentation).
> >
> > This threshold could also be bypassed rather painlessly in the
> > "if (unlikely(nb_pkts == 0))" case that all PMDs already check for in a
> > way or another.
> >
> >> Though I am agree with previous comment from Stephen that last two parameters
> >> are redundant and would just overcomplicate things.
> >> tin
> >>
> >> >
> >> > The main difference with this API would be that you wouldn't know how many
> >> > mbufs were freed and wouldn't collect them into an array. However most
> >> > applications have one mbuf pool and/or know where they come from, so they
> >> > can just query the pool or attempt to re-allocate from it after doing empty
> >> > bursts in case of starvation.
> >> >
> >> > [1] http://dpdk.org/ml/archives/dev/2016-December/052469.html
> >
> > --
> > Adrien Mazarguil
> > 6WIND

-- 
Adrien Mazarguil
6WIND


* Re: [dpdk-dev] [PATCH 3/3] driver: vHost support to free consumed buffers
  2016-12-16 16:24   ` Stephen Hemminger
@ 2017-01-11 19:54     ` Billy McFall
  0 siblings, 0 replies; 12+ messages in thread
From: Billy McFall @ 2017-01-11 19:54 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: thomas.monjalon, wenzhuo.lu, dev

This new API is attempting to address two scenarios:
1) Application wants to reuse existing mbuf to avoid a packet copy
(example: Flooding a packet to multiple ports). The application increments
the reference count of the packet and then polls new API until the reference
count for the given mbuf is decremented.
2) Application runs out of mbufs, or some application like pktgen finishes
a run and is preparing for an additional run, calls API to free consumed
packets so processing can continue.

With the current design, the application calls the new API. If rval >= 0,
assume mbufs are being freed; the API can be called multiple times if need be
(to either get enough mbufs to continue or to get the specific one freed). If
rval < 0, take some other action, like making a copy of the packet in the
flooding case or whatever is currently done in the application today.

If the default behavior is to return 0, the application can't take any
additional actions.

Submitting V2 of the patch with the rte_eth_tx_buffer_flush() call and
associated parameters removed and to continue the discussion on new API or
not.

Thanks,
Billy McFall


On Fri, Dec 16, 2016 at 11:24 AM, Stephen Hemminger <
stephen@networkplumber.org> wrote:

> On Fri, 16 Dec 2016 07:48:51 -0500
> Billy McFall <bmcfall@redhat.com> wrote:
>
> > Add support to the vHost driver for the new API to force free consumed
> > buffers on TX ring. vHost does not cache the mbufs so there is no work
> > to do.
> >
> > Signed-off-by: Billy McFall <bmcfall@redhat.com>
> > ---
> >  drivers/net/vhost/rte_eth_vhost.c | 11 +++++++++++
> >  1 file changed, 11 insertions(+)
> >
> > diff --git a/drivers/net/vhost/rte_eth_vhost.c
> b/drivers/net/vhost/rte_eth_vhost.c
> > index 766d4ef..6493d56 100644
> > --- a/drivers/net/vhost/rte_eth_vhost.c
> > +++ b/drivers/net/vhost/rte_eth_vhost.c
> > @@ -939,6 +939,16 @@ eth_queue_release(void *q)
> >  }
> >
> >  static int
> > +eth_tx_done_cleanup(void *txq __rte_unused, uint32_t free_cnt
> __rte_unused)
> > +{
> > +     /*
> > +      * vHost does not hang onto mbuf. eth_vhost_tx() copies packet data
> > +      * and releases mbuf, so nothing to cleanup.
> > +      */
> > +     return 0;
> > +}
> > +
> > +static int
> >  eth_link_update(struct rte_eth_dev *dev __rte_unused,
> >               int wait_to_complete __rte_unused)
> >  {
> > @@ -979,6 +989,7 @@ static const struct eth_dev_ops ops = {
> >       .tx_queue_setup = eth_tx_queue_setup,
> >       .rx_queue_release = eth_queue_release,
> >       .tx_queue_release = eth_queue_release,
> > +     .tx_done_cleanup = eth_tx_done_cleanup,
> >       .link_update = eth_link_update,
> >       .stats_get = eth_stats_get,
> >       .stats_reset = eth_stats_reset,
>
> Rather than having to change every driver, why not make this the default
> behavior?
>


end of thread, other threads:[~2017-01-11 19:54 UTC | newest]

Thread overview: 12+ messages
2016-12-16 12:48 [dpdk-dev] [PATCH 0/3] New API to free consumed buffers in TX ring Billy McFall
2016-12-16 12:48 ` [dpdk-dev] [PATCH 1/3] ethdev: " Billy McFall
2016-12-16 16:28   ` Stephen Hemminger
2016-12-20 11:27   ` Adrien Mazarguil
2016-12-20 12:17     ` Ananyev, Konstantin
2016-12-20 12:58       ` Adrien Mazarguil
2016-12-20 14:15         ` Billy McFall
2016-12-23  9:45           ` Adrien Mazarguil
2016-12-16 12:48 ` [dpdk-dev] [PATCH 2/3] driver: e1000 igb support to free consumed buffers Billy McFall
2016-12-16 12:48 ` [dpdk-dev] [PATCH 3/3] driver: vHost " Billy McFall
2016-12-16 16:24   ` Stephen Hemminger
2017-01-11 19:54     ` Billy McFall
