From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <bmcfall@redhat.com>
Received: from mail-oi0-f48.google.com (mail-oi0-f48.google.com
 [209.85.218.48]) by dpdk.org (Postfix) with ESMTP id AFF34FB92
 for <dev@dpdk.org>; Tue, 20 Dec 2016 15:15:51 +0100 (CET)
Received: by mail-oi0-f48.google.com with SMTP id v84so178307749oie.3
 for <dev@dpdk.org>; Tue, 20 Dec 2016 06:15:51 -0800 (PST)
X-Received: by 10.202.213.13 with SMTP id m13mr12694390oig.104.1482243351065; 
 Tue, 20 Dec 2016 06:15:51 -0800 (PST)
MIME-Version: 1.0
Received: by 10.202.75.212 with HTTP; Tue, 20 Dec 2016 06:15:50 -0800 (PST)
In-Reply-To: <20161220125823.GU10340@6wind.com>
References: <20161216124851.2640-1-bmcfall@redhat.com>
 <20161216124851.2640-2-bmcfall@redhat.com>
 <20161220112746.GT10340@6wind.com>
 <2601191342CEEE43887BDE71AB9772583F0F2D8A@irsmsx105.ger.corp.intel.com>
 <20161220125823.GU10340@6wind.com>
From: Billy McFall <bmcfall@redhat.com>
Date: Tue, 20 Dec 2016 09:15:50 -0500
Message-ID: <CAKLkqD4EJcb1=3AinYZd4N9uwvqqxi8YQdoxV1CbG12eeneSTQ@mail.gmail.com>
To: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Cc: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>, 
 "thomas.monjalon@6wind.com" <thomas.monjalon@6wind.com>, "Lu,
 Wenzhuo" <wenzhuo.lu@intel.com>, 
 "dev@dpdk.org" <dev@dpdk.org>, Stephen Hemminger <stephen@networkplumber.org>
Content-Type: text/plain; charset=UTF-8
Subject: Re: [dpdk-dev] [PATCH 1/3] ethdev: New API to free consumed buffers
 in TX ring
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: DPDK patches and discussions <dev.dpdk.org>
List-Unsubscribe: <http://dpdk.org/ml/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://dpdk.org/ml/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <http://dpdk.org/ml/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
X-List-Received-Date: Tue, 20 Dec 2016 14:15:52 -0000

Thank you for your responses, see inline.

On Tue, Dec 20, 2016 at 7:58 AM, Adrien Mazarguil
<adrien.mazarguil@6wind.com> wrote:
> On Tue, Dec 20, 2016 at 12:17:10PM +0000, Ananyev, Konstantin wrote:
>>
>>
>> > -----Original Message-----
>> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Adrien Mazarguil
>> > Sent: Tuesday, December 20, 2016 11:28 AM
>> > To: Billy McFall <bmcfall@redhat.com>
>> > Cc: thomas.monjalon@6wind.com; Lu, Wenzhuo <wenzhuo.lu@intel.com>; dev@dpdk.org; Stephen Hemminger
>> > <stephen@networkplumber.org>
>> > Subject: Re: [dpdk-dev] [PATCH 1/3] ethdev: New API to free consumed buffers in TX ring
>> >
>> > Hi Billy,
>> >
>> > On Fri, Dec 16, 2016 at 07:48:49AM -0500, Billy McFall wrote:
>> > > Add a new API to force free consumed buffers on TX ring. API will return
>> > > the number of packets freed (0-n) or error code if feature not supported
>> > > (-ENOTSUP) or input invalid (-ENODEV).
>> > >
>> > > Because rte_eth_tx_buffer() may be used, and mbufs may still be held
>> > > in local buffer, the API also accepts *buffer and *sent. Before
>> > > attempting to free, rte_eth_tx_buffer_flush() is called to make sure
>> > > all mbufs are sent to Tx ring. rte_eth_tx_buffer_flush() is called even
>> > > if threshold is not met.
>> > >
>> > > Signed-off-by: Billy McFall <bmcfall@redhat.com>
>> > > ---
>> > >  lib/librte_ether/rte_ethdev.h | 56 +++++++++++++++++++++++++++++++++++++++++++
>> > >  1 file changed, 56 insertions(+)
>> > >
>> > > diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
>> > > index 9678179..e3f2be4 100644
>> > > --- a/lib/librte_ether/rte_ethdev.h
>> > > +++ b/lib/librte_ether/rte_ethdev.h
>> > > @@ -1150,6 +1150,9 @@ typedef uint32_t (*eth_rx_queue_count_t)(struct rte_eth_dev *dev,
>> > >  typedef int (*eth_rx_descriptor_done_t)(void *rxq, uint16_t offset);
>> > >  /**< @internal Check DD bit of specific RX descriptor */
>> > >
>> > > +typedef int (*eth_tx_done_cleanup_t)(void *txq, uint32_t free_cnt);
>> > > +/**< @internal Force mbufs to be freed from TX ring. */
>> > > +
>> > >  typedef void (*eth_rxq_info_get_t)(struct rte_eth_dev *dev,
>> > >   uint16_t rx_queue_id, struct rte_eth_rxq_info *qinfo);
>> > >
>> > > @@ -1467,6 +1470,7 @@ struct eth_dev_ops {
>> > >   eth_rx_disable_intr_t      rx_queue_intr_disable;
>> > >   eth_tx_queue_setup_t       tx_queue_setup;/**< Set up device TX queue.*/
>> > >   eth_queue_release_t        tx_queue_release;/**< Release TX queue.*/
>> > > + eth_tx_done_cleanup_t      tx_done_cleanup;/**< Free tx ring mbufs */
>> > >   eth_dev_led_on_t           dev_led_on;    /**< Turn on LED. */
>> > >   eth_dev_led_off_t          dev_led_off;   /**< Turn off LED. */
>> > >   flow_ctrl_get_t            flow_ctrl_get; /**< Get flow control. */
>> > > @@ -2943,6 +2947,58 @@ rte_eth_tx_buffer(uint8_t port_id, uint16_t queue_id,
>> > >  }
>> > >
>> > >  /**
>> > > + * Request the driver to free mbufs currently cached by the driver. The
>> > > + * driver will only free the mbuf if it is no longer in use.
>> > > + *
>> > > + * @param port_id
>> > > + *   The port identifier of the Ethernet device.
>> > > + * @param queue_id
>> > > + *   The index of the transmit queue through which output packets must be
>> > > + *   sent.
>> > > + *   The value must be in the range [0, nb_tx_queue - 1] previously supplied
>> > > + *   to rte_eth_dev_configure().
>> > > + * @param free_cnt
>> > > + *   Maximum number of packets to free. Use 0 to indicate all possible packets
>> > > + *   should be freed. Note that a packet may be using multiple mbufs.
>> > > + * @param buffer
>> > > + *   Buffer used to collect packets to be sent. If provided, the buffer will
>> > > + *   be flushed, even if the current length is less than buffer->size. Pass NULL
>> > > + *   if buffer has already been flushed.
>> > > + * @param sent
>> > > + *   Pointer to return number of packets sent if buffer has packets to be sent.
>> > > + *   If *buffer is supplied, *sent must also be supplied.
>> > > + * @return
>> > > + *   Failure: < 0
>> > > + *     -ENODEV: Invalid interface
>> > > + *     -ENOTSUP: Driver does not support function
>> > > + *   Success: >= 0
>> > > + *     0-n: Number of packets freed. More packets may still remain in ring that
>> > > + *     are in use.
>> > > + */
>> > > +
>> > > +static inline int
>> > > +rte_eth_tx_done_cleanup(uint8_t port_id, uint16_t queue_id,  uint32_t free_cnt,
>> > > +         struct rte_eth_dev_tx_buffer *buffer, uint16_t *sent)
>> > > +{
>> > > + struct rte_eth_dev *dev = &rte_eth_devices[port_id];
>> > > +
>> > > + /* Validate Input Data. Bail if not valid or not supported. */
>> > > + RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
>> > > + RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_done_cleanup, -ENOTSUP);
>> > > +
>> > > + /*
>> > > +  * If transmit buffer is provided and there are still packets to be
>> > > +  * sent, then send them before attempting to free pending mbufs.
>> > > +  */
>> > > + if (buffer && sent)
>> > > +         *sent = rte_eth_tx_buffer_flush(port_id, queue_id, buffer);
>> > > +
>> > > + /* Call driver to free pending mbufs. */
>> > > + return (*dev->dev_ops->tx_done_cleanup)(dev->data->tx_queues[queue_id],
>> > > +                 free_cnt);
>> > > +}
>> > > +
>> > > +/**
>> > >   * Configure a callback for buffered packets which cannot be sent
>> > >   *
>> > >   * Register a specific callback to be called when an attempt is made to send
>> >
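The wrapper in the patch follows the usual ethdev dispatch pattern: validate the port, check that the PMD filled in the optional op, then call it with the queue's private pointer. The self-contained mock below illustrates that pattern only. `mock_dev`, `mock_devices[]`, `MOCK_MAX_PORTS`, `mock_tx_done_cleanup()` and `fake_pmd_cleanup()` are hypothetical stand-ins for `rte_eth_devices[]`, the validation macros and a real PMD. The queue bounds check is an addition of this sketch; the patch itself does not validate queue_id.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MOCK_MAX_PORTS 4 /* hypothetical stand-in for RTE_MAX_ETHPORTS */
#define MOCK_NB_TXQ 1    /* hypothetical number of configured TX queues */

/* Hypothetical driver op, shaped like eth_tx_done_cleanup_t in the patch. */
typedef int (*mock_tx_done_cleanup_t)(void *txq, uint32_t free_cnt);

struct mock_dev {
	int attached;                           /* port is valid */
	mock_tx_done_cleanup_t tx_done_cleanup; /* NULL if the PMD lacks it */
	void *tx_queues[MOCK_NB_TXQ];
};

static struct mock_dev mock_devices[MOCK_MAX_PORTS];

static int
mock_tx_done_cleanup(uint8_t port_id, uint16_t queue_id, uint32_t free_cnt)
{
	struct mock_dev *dev;

	/* Validate the port before using it (-ENODEV, as in the patch). */
	if (port_id >= MOCK_MAX_PORTS || !mock_devices[port_id].attached)
		return -ENODEV;
	dev = &mock_devices[port_id];

	/* The op is optional: report -ENOTSUP if the PMD did not set it. */
	if (dev->tx_done_cleanup == NULL)
		return -ENOTSUP;

	/* Assumption of this sketch: reject out-of-range queues. */
	if (queue_id >= MOCK_NB_TXQ)
		return -EINVAL;

	return dev->tx_done_cleanup(dev->tx_queues[queue_id], free_cnt);
}

/* Hypothetical PMD op that pretends 3 packets were done transmitting. */
static int
fake_pmd_cleanup(void *txq, uint32_t free_cnt)
{
	(void)txq;
	return (free_cnt == 0 || free_cnt > 3) ? 3 : (int)free_cnt;
}
```

A caller treats any negative return as an error and a non-negative one as the number of packets freed.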

I will remove the buffer/sent parameters. It will be the application's
responsibility to make sure rte_eth_tx_buffer_flush() is called.
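Without buffer/sent, the ordering moves to the caller: flush any rte_eth_tx_buffer() buffer first, then request cleanup, because packets still sitting in the application's buffer never reached the ring and cannot be freed by the driver. The sketch below models that ordering with hypothetical in-memory stand-ins (`app_buffer`, `sim_ring`, `sim_flush()`, `sim_complete_all()`, `sim_cleanup()`); it is not the DPDK API.

```c
#include <stdint.h>

#define SIM_RING_SIZE 8
#define SIM_BUF_SIZE 4

/* Hypothetical stand-in for struct rte_eth_dev_tx_buffer. */
struct app_buffer {
	uint16_t length;
	int pkts[SIM_BUF_SIZE];
};

/* Hypothetical TX ring: each slot holds a packet id and a "done" flag. */
struct sim_ring {
	uint16_t count;
	int pkts[SIM_RING_SIZE];
	int done[SIM_RING_SIZE];
};

/* Move buffered packets into the ring, loosely like rte_eth_tx_buffer_flush(). */
static uint16_t
sim_flush(struct app_buffer *buf, struct sim_ring *ring)
{
	uint16_t sent = 0;

	for (uint16_t i = 0; i < buf->length && ring->count < SIM_RING_SIZE; i++) {
		ring->pkts[ring->count] = buf->pkts[i];
		ring->done[ring->count] = 0;
		ring->count++;
		sent++;
	}
	/* Packets that did not fit are simply dropped in this sketch. */
	buf->length = 0;
	return sent;
}

/* Pretend the NIC finished transmitting everything currently in the ring. */
static void
sim_complete_all(struct sim_ring *ring)
{
	for (uint16_t i = 0; i < ring->count; i++)
		ring->done[i] = 1;
}

/* Free completed entries and compact the ring; returns how many were freed. */
static int
sim_cleanup(struct sim_ring *ring)
{
	uint16_t kept = 0;
	int freed = 0;

	for (uint16_t i = 0; i < ring->count; i++) {
		if (ring->done[i]) {
			freed++; /* a real PMD would free the mbuf here */
		} else {
			ring->pkts[kept] = ring->pkts[i];
			ring->done[kept] = 0;
			kept++;
		}
	}
	ring->count = kept;
	return freed;
}
```

Calling sim_cleanup() before sim_flush() frees nothing from the buffer; only after the flush and completion do those packets become releasable.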

I don't feel strongly about the free_cnt parameter. It was in the original
request so that, with a large ring, the API could bail out early without
having to walk the entire ring. It might be a little unrealistic for the
application to know exactly how many mbufs it wants freed. Also, as an
example, the i40e driver already has an i40e_tx_free_bufs(...) function, so
dropping the free_cnt parameter would allow that function to be reused
without having to account for free_cnt.
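The "bail out early" argument for free_cnt can be made concrete with a driver-side sketch. It assumes the NIC completes descriptors in order, so the scan can stop at the first descriptor still in use, and that one descriptor carries one packet (the patch notes that a real packet may span several mbufs). `tx_ring` and `tx_ring_cleanup()` are hypothetical names, not i40e code.

```c
#include <stdint.h>

#define RING_SIZE 8

/* Hypothetical TX ring: slots [tail, tail + used) hold in-flight packets. */
struct tx_ring {
	uint16_t tail;           /* oldest descriptor not yet cleaned */
	uint16_t used;           /* descriptors currently holding mbufs */
	uint8_t done[RING_SIZE]; /* set by the "NIC" once transmission completes */
};

/*
 * Free completed descriptors starting from the oldest one.
 * free_cnt == 0 means "free everything that is done"; otherwise stop after
 * free_cnt packets, bounding the work done on a large ring.
 */
static int
tx_ring_cleanup(struct tx_ring *r, uint32_t free_cnt)
{
	int freed = 0;

	while (r->used > 0 && r->done[r->tail]) {
		if (free_cnt != 0 && (uint32_t)freed == free_cnt)
			break; /* caller asked for a bounded amount of work */
		r->done[r->tail] = 0; /* a real PMD frees the mbuf here */
		r->tail = (uint16_t)((r->tail + 1) % RING_SIZE);
		r->used--;
		freed++;
	}
	return freed;
}
```

Dropping free_cnt amounts to always taking the free_cnt == 0 path, which is what an existing "free everything done" helper already does.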

>> > Just a thought to follow-up on Stephen's comment to further simplify this
>> > API, how about not adding any new eth_dev_ops but instead defining what
>> > should happen during an empty TX burst call (tx_burst() with 0 packets).
>> >

In the original API request thread (dpdk-dev mailing list, 11/21/2016, subject
"Adding API to force freeing consumed buffers in TX ring"), overloading the
existing API with nb_pkts == 0 was suggested, and the consensus was to go with
a new API. I lean towards a new API since this is a special case most
applications won't use, but I will go with the community on whether to
enhance the existing burst functionality or add a new API.
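The two options differ mainly in the entry point; a driver could back both with one internal routine. In the hypothetical sketch below (`pmd_txq`, `pmd_free_done()`, `pmd_tx_burst()` and `pmd_tx_done_cleanup()` are illustrative names, not existing PMD functions), an empty burst bypasses the free threshold, while the dedicated op can also report how many descriptors it released.

```c
#include <stdint.h>

/* Hypothetical queue state, tracked as counters only. */
struct pmd_txq {
	uint16_t nb_done;     /* transmitted, mbufs still attached */
	uint16_t nb_free;     /* descriptors available for new packets */
	uint16_t free_thresh; /* lazy cleanup trigger used by normal bursts */
};

/* Shared helper: release up to max completed descriptors (0 = all). */
static uint16_t
pmd_free_done(struct pmd_txq *q, uint32_t max)
{
	uint16_t n = q->nb_done;

	if (max != 0 && max < n)
		n = (uint16_t)max;
	q->nb_done -= n;
	q->nb_free += n;
	return n;
}

/* Option A: overload tx_burst(); nb_pkts == 0 forces cleanup. */
static uint16_t
pmd_tx_burst(struct pmd_txq *q, uint16_t nb_pkts)
{
	uint16_t n;

	if (nb_pkts == 0) {
		pmd_free_done(q, 0); /* threshold bypassed; count is lost */
		return 0;            /* a burst only reports packets sent */
	}
	if (q->nb_free < q->free_thresh)
		pmd_free_done(q, 0);
	n = nb_pkts < q->nb_free ? nb_pkts : q->nb_free;
	q->nb_free -= n; /* descriptors now hold in-flight packets */
	return n;
}

/* Option B: a dedicated op that can return how many were freed. */
static int
pmd_tx_done_cleanup(struct pmd_txq *q, uint32_t free_cnt)
{
	return pmd_free_done(q, free_cnt);
}
```

The trade-off mirrors the thread: option A needs no new op but cannot tell the caller how much was reclaimed, and every PMD's tx_burst() must honour the convention.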

>> > Several PMDs already have a check for this scenario and start by cleaning up
>> > completed packets anyway, they effectively partially implement this
>> > definition for free already.
>>
>> Many PMDs start cleaning up only when the number of free entries
>> drops below some point.

True, but the original request for this API was for the scenario where packets
are being flooded and the application wants to reuse the mbufs to avoid a
packet copy. So the API is meant to let the application ask the driver to free
"done" mbufs regardless of any threshold.
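One way to picture the flooding case, assuming the application sends the same mbuf out of several ports by bumping its reference count instead of copying it (real code would use rte_mbuf_refcnt_update()): the application can only rewrite the mbuf once every TX ring has dropped its reference. The mock below models just that count; `sim_mbuf`, `sim_flood()`, `sim_ring_release()` and `sim_mbuf_reusable()` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mbuf reduced to its reference count. */
struct sim_mbuf {
	uint16_t refcnt;
};

/* Flooding: add one reference per output port instead of copying. */
static void
sim_flood(struct sim_mbuf *m, unsigned nb_ports)
{
	m->refcnt = (uint16_t)(m->refcnt + nb_ports);
}

/* A TX ring dropping its reference once the descriptor is cleaned up. */
static void
sim_ring_release(struct sim_mbuf *m)
{
	if (m->refcnt > 0)
		m->refcnt--;
}

/* Safe to rewrite only when the application holds the last reference. */
static bool
sim_mbuf_reusable(const struct sim_mbuf *m)
{
	return m->refcnt == 1;
}
```

If one port's ring never crosses its free threshold, its sim_ring_release() step never happens and the mbuf stays pinned; a forced cleanup request is meant to trigger that step.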

>> Also in that case the author would have to modify (and test) all existing TX routines.
>> So I think a separate API call seems more plausible.
>
> Not necessarily. As I understand it, this API in its current form only
> suggests that a PMD should release a few mbufs from a queue if possible,
> without any guarantee; PMDs are not forced to comply.
>
> I think the threshold you mention is a valid reason not to release them, and
> it wouldn't change a thing for existing tx_burst() implementations in the
> meantime (only documentation).
>
> This threshold could also be bypassed rather painlessly in the
> "if (unlikely(nb_pkts == 0))" case that all PMDs already check for in a
> way or another.
>
>> Though I agree with Stephen's previous comment that the last two parameters
>> are redundant and would just overcomplicate things.
>> tin
>>
>> >
>> > The main difference with this API would be that you wouldn't know how many
>> > mbufs were freed and wouldn't collect them into an array. However most
>> > applications have one mbuf pool and/or know where they come from, so they
>> > can just query the pool or attempt to re-allocate from it after doing empty
>> > bursts in case of starvation.
>> >
>> > [1] http://dpdk.org/ml/archives/dev/2016-December/052469.html
>
> --
> Adrien Mazarguil
> 6WIND
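Adrien's alternative, where the application simply retries allocation after nudging its TX queues when the pool runs dry, could look roughly like the mock below. It assumes the proposed semantics (an empty burst returns completed mbufs to their pool); `sim_pool`, `sim_txq`, `sim_alloc()`, `sim_empty_burst()` and `alloc_with_tx_reclaim()` are hypothetical, standing in for a mempool, rte_pktmbuf_alloc() and a zero-packet rte_eth_tx_burst() call.

```c
#include <stdint.h>

/* Hypothetical mempool reduced to a count of free mbufs. */
struct sim_pool {
	unsigned avail;
};

/* Hypothetical TX queue holding completed mbufs that belong to the pool. */
struct sim_txq {
	unsigned done;
	struct sim_pool *pool;
};

/* Take one mbuf from the pool; -1 on starvation. */
static int
sim_alloc(struct sim_pool *p)
{
	if (p->avail == 0)
		return -1;
	p->avail--;
	return 0;
}

/* Empty burst under the proposed semantics: return completed mbufs. */
static void
sim_empty_burst(struct sim_txq *q)
{
	q->pool->avail += q->done;
	q->done = 0;
}

/* Allocate one mbuf, doing empty bursts on every TX queue if starved. */
static int
alloc_with_tx_reclaim(struct sim_pool *p, struct sim_txq *qs, unsigned nb_q)
{
	if (sim_alloc(p) == 0)
		return 0;
	for (unsigned i = 0; i < nb_q; i++)
		sim_empty_burst(&qs[i]);
	return sim_alloc(p); /* still -1 if nothing had completed */
}
```

As Adrien notes, this approach does not tell the caller how many mbufs were freed; it relies on the pool itself as the source of truth.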