DPDK patches and discussions
 help / color / mirror / Atom feed
* [dpdk-dev] [PATCH] ixgbe: fix checking for tx_free_thresh
@ 2015-05-27 20:12 Zoltan Kiss
  2015-05-28 10:50 ` Venkatesan, Venky
  2015-06-01 16:15 ` Zoltan Kiss
  0 siblings, 2 replies; 12+ messages in thread
From: Zoltan Kiss @ 2015-05-27 20:12 UTC (permalink / raw)
  To: dev

This check doesn't do what's required by rte_eth_tx_burst:
"When the number of previously sent packets reached the "minimum transmit
packets to free" threshold"

This can cause problems when txq->tx_free_thresh + [number of elements in the
pool] < txq->nb_tx_desc.

Signed-off-by: Zoltan Kiss <zoltan.kiss@linaro.org>
---
 drivers/net/ixgbe/ixgbe_rxtx.c     | 4 ++--
 drivers/net/ixgbe/ixgbe_rxtx_vec.c | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
index 4f9ab22..b70ed8c 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx.c
@@ -250,10 +250,10 @@ tx_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 
 	/*
 	 * Begin scanning the H/W ring for done descriptors when the
-	 * number of available descriptors drops below tx_free_thresh.  For
+	 * number of in flight descriptors reaches tx_free_thresh. For
 	 * each done descriptor, free the associated buffer.
 	 */
-	if (txq->nb_tx_free < txq->tx_free_thresh)
+	if ((txq->nb_tx_desc - txq->nb_tx_free) > txq->tx_free_thresh)
 		ixgbe_tx_free_bufs(txq);
 
 	/* Only use descriptors that are available */
diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec.c b/drivers/net/ixgbe/ixgbe_rxtx_vec.c
index abd10f6..f91c698 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx_vec.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx_vec.c
@@ -598,7 +598,7 @@ ixgbe_xmit_pkts_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
 	if (unlikely(nb_pkts > RTE_IXGBE_VPMD_TX_BURST))
 		nb_pkts = RTE_IXGBE_VPMD_TX_BURST;
 
-	if (txq->nb_tx_free < txq->tx_free_thresh)
+	if ((txq->nb_tx_desc - txq->nb_tx_free) > txq->tx_free_thresh)
 		ixgbe_tx_free_bufs(txq);
 
 	nb_commit = nb_pkts = (uint16_t)RTE_MIN(txq->nb_tx_free, nb_pkts);
-- 
1.9.1

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [dpdk-dev] [PATCH] ixgbe: fix checking for tx_free_thresh
  2015-05-27 20:12 [dpdk-dev] [PATCH] ixgbe: fix checking for tx_free_thresh Zoltan Kiss
@ 2015-05-28 10:50 ` Venkatesan, Venky
  2015-05-28 11:12   ` Zoltan Kiss
  2015-06-01 16:15 ` Zoltan Kiss
  1 sibling, 1 reply; 12+ messages in thread
From: Venkatesan, Venky @ 2015-05-28 10:50 UTC (permalink / raw)
  To: Zoltan Kiss; +Cc: dev

NAK. This causes more (unsuccessful) cleanup attempts on the descriptor ring. What is motivating this change? 

Regards,
Venky


> On May 28, 2015, at 1:42 AM, Zoltan Kiss <zoltan.kiss@linaro.org> wrote:
> 
> This check doesn't do what's required by rte_eth_tx_burst:
> "When the number of previously sent packets reached the "minimum transmit
> packets to free" threshold"
> 
> This can cause problems when txq->tx_free_thresh + [number of elements in the
> pool] < txq->nb_tx_desc.
> 
> Signed-off-by: Zoltan Kiss <zoltan.kiss@linaro.org>
> ---
> drivers/net/ixgbe/ixgbe_rxtx.c     | 4 ++--
> drivers/net/ixgbe/ixgbe_rxtx_vec.c | 2 +-
> 2 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
> index 4f9ab22..b70ed8c 100644
> --- a/drivers/net/ixgbe/ixgbe_rxtx.c
> +++ b/drivers/net/ixgbe/ixgbe_rxtx.c
> @@ -250,10 +250,10 @@ tx_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
> 
>    /*
>     * Begin scanning the H/W ring for done descriptors when the
> -     * number of available descriptors drops below tx_free_thresh.  For
> +     * number of in flight descriptors reaches tx_free_thresh. For
>     * each done descriptor, free the associated buffer.
>     */
> -    if (txq->nb_tx_free < txq->tx_free_thresh)
> +    if ((txq->nb_tx_desc - txq->nb_tx_free) > txq->tx_free_thresh)
>        ixgbe_tx_free_bufs(txq);
> 
>    /* Only use descriptors that are available */
> diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec.c b/drivers/net/ixgbe/ixgbe_rxtx_vec.c
> index abd10f6..f91c698 100644
> --- a/drivers/net/ixgbe/ixgbe_rxtx_vec.c
> +++ b/drivers/net/ixgbe/ixgbe_rxtx_vec.c
> @@ -598,7 +598,7 @@ ixgbe_xmit_pkts_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
>    if (unlikely(nb_pkts > RTE_IXGBE_VPMD_TX_BURST))
>        nb_pkts = RTE_IXGBE_VPMD_TX_BURST;
> 
> -    if (txq->nb_tx_free < txq->tx_free_thresh)
> +    if ((txq->nb_tx_desc - txq->nb_tx_free) > txq->tx_free_thresh)
>        ixgbe_tx_free_bufs(txq);
> 
>    nb_commit = nb_pkts = (uint16_t)RTE_MIN(txq->nb_tx_free, nb_pkts);
> -- 
> 1.9.1
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [dpdk-dev] [PATCH] ixgbe: fix checking for tx_free_thresh
  2015-05-28 10:50 ` Venkatesan, Venky
@ 2015-05-28 11:12   ` Zoltan Kiss
  0 siblings, 0 replies; 12+ messages in thread
From: Zoltan Kiss @ 2015-05-28 11:12 UTC (permalink / raw)
  To: Venkatesan, Venky; +Cc: dev

The requirements for rte_eth_tx_burst(), which calls a driver specific 
function, in case of ixgbe, these two:

"It is the responsibility of the rte_eth_tx_burst() function to 
transparently free the memory buffers of packets previously sent. This 
feature is driven by the *tx_free_thresh* value supplied to the 
rte_eth_dev_configure() function at device configuration time. When the 
number of previously sent packets reached the "minimum transmit packets 
to free" threshold, the rte_eth_tx_burst() function must [attempt to] 
free the *rte_mbuf*  buffers of those packets whose transmission was 
effectively completed."

Also rte_eth_tx_queue_setup() uses the same description for tx_free_thresh:

"The *tx_free_thresh* value indicates the [minimum] number of network 
buffers that must be pending in the transmit ring to trigger their 
[implicit] freeing by the driver transmit function."

And all the other poll mode drivers are using this formula. Plus I've 
described a possible hang situation in the commit message.

On 28/05/15 11:50, Venkatesan, Venky wrote:
> NAK. This causes more (unsuccessful) cleanup attempts on the descriptor ring. What is motivating this change?
>
> Regards,
> Venky
>
>
>> On May 28, 2015, at 1:42 AM, Zoltan Kiss <zoltan.kiss@linaro.org> wrote:
>>
>> This check doesn't do what's required by rte_eth_tx_burst:
>> "When the number of previously sent packets reached the "minimum transmit
>> packets to free" threshold"
>>
>> This can cause problems when txq->tx_free_thresh + [number of elements in the
>> pool] < txq->nb_tx_desc.
>>
>> Signed-off-by: Zoltan Kiss <zoltan.kiss@linaro.org>
>> ---
>> drivers/net/ixgbe/ixgbe_rxtx.c     | 4 ++--
>> drivers/net/ixgbe/ixgbe_rxtx_vec.c | 2 +-
>> 2 files changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
>> index 4f9ab22..b70ed8c 100644
>> --- a/drivers/net/ixgbe/ixgbe_rxtx.c
>> +++ b/drivers/net/ixgbe/ixgbe_rxtx.c
>> @@ -250,10 +250,10 @@ tx_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
>>
>>     /*
>>      * Begin scanning the H/W ring for done descriptors when the
>> -     * number of available descriptors drops below tx_free_thresh.  For
>> +     * number of in flight descriptors reaches tx_free_thresh. For
>>      * each done descriptor, free the associated buffer.
>>      */
>> -    if (txq->nb_tx_free < txq->tx_free_thresh)
>> +    if ((txq->nb_tx_desc - txq->nb_tx_free) > txq->tx_free_thresh)
>>         ixgbe_tx_free_bufs(txq);
>>
>>     /* Only use descriptors that are available */
>> diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec.c b/drivers/net/ixgbe/ixgbe_rxtx_vec.c
>> index abd10f6..f91c698 100644
>> --- a/drivers/net/ixgbe/ixgbe_rxtx_vec.c
>> +++ b/drivers/net/ixgbe/ixgbe_rxtx_vec.c
>> @@ -598,7 +598,7 @@ ixgbe_xmit_pkts_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
>>     if (unlikely(nb_pkts > RTE_IXGBE_VPMD_TX_BURST))
>>         nb_pkts = RTE_IXGBE_VPMD_TX_BURST;
>>
>> -    if (txq->nb_tx_free < txq->tx_free_thresh)
>> +    if ((txq->nb_tx_desc - txq->nb_tx_free) > txq->tx_free_thresh)
>>         ixgbe_tx_free_bufs(txq);
>>
>>     nb_commit = nb_pkts = (uint16_t)RTE_MIN(txq->nb_tx_free, nb_pkts);
>> --
>> 1.9.1
>>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [dpdk-dev] [PATCH] ixgbe: fix checking for tx_free_thresh
  2015-05-27 20:12 [dpdk-dev] [PATCH] ixgbe: fix checking for tx_free_thresh Zoltan Kiss
  2015-05-28 10:50 ` Venkatesan, Venky
@ 2015-06-01 16:15 ` Zoltan Kiss
  2015-06-02 13:31   ` Ananyev, Konstantin
  1 sibling, 1 reply; 12+ messages in thread
From: Zoltan Kiss @ 2015-06-01 16:15 UTC (permalink / raw)
  To: dev

Hi,

Anyone would like to review this patch? Venky sent a NAK, but I've 
explained to him why it is a bug.

Regards,

Zoltan

On 27/05/15 21:12, Zoltan Kiss wrote:
> This check doesn't do what's required by rte_eth_tx_burst:
> "When the number of previously sent packets reached the "minimum transmit
> packets to free" threshold"
>
> This can cause problems when txq->tx_free_thresh + [number of elements in the
> pool] < txq->nb_tx_desc.
>
> Signed-off-by: Zoltan Kiss <zoltan.kiss@linaro.org>
> ---
>   drivers/net/ixgbe/ixgbe_rxtx.c     | 4 ++--
>   drivers/net/ixgbe/ixgbe_rxtx_vec.c | 2 +-
>   2 files changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
> index 4f9ab22..b70ed8c 100644
> --- a/drivers/net/ixgbe/ixgbe_rxtx.c
> +++ b/drivers/net/ixgbe/ixgbe_rxtx.c
> @@ -250,10 +250,10 @@ tx_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
>
>   	/*
>   	 * Begin scanning the H/W ring for done descriptors when the
> -	 * number of available descriptors drops below tx_free_thresh.  For
> +	 * number of in flight descriptors reaches tx_free_thresh. For
>   	 * each done descriptor, free the associated buffer.
>   	 */
> -	if (txq->nb_tx_free < txq->tx_free_thresh)
> +	if ((txq->nb_tx_desc - txq->nb_tx_free) > txq->tx_free_thresh)
>   		ixgbe_tx_free_bufs(txq);
>
>   	/* Only use descriptors that are available */
> diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec.c b/drivers/net/ixgbe/ixgbe_rxtx_vec.c
> index abd10f6..f91c698 100644
> --- a/drivers/net/ixgbe/ixgbe_rxtx_vec.c
> +++ b/drivers/net/ixgbe/ixgbe_rxtx_vec.c
> @@ -598,7 +598,7 @@ ixgbe_xmit_pkts_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
>   	if (unlikely(nb_pkts > RTE_IXGBE_VPMD_TX_BURST))
>   		nb_pkts = RTE_IXGBE_VPMD_TX_BURST;
>
> -	if (txq->nb_tx_free < txq->tx_free_thresh)
> +	if ((txq->nb_tx_desc - txq->nb_tx_free) > txq->tx_free_thresh)
>   		ixgbe_tx_free_bufs(txq);
>
>   	nb_commit = nb_pkts = (uint16_t)RTE_MIN(txq->nb_tx_free, nb_pkts);
>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [dpdk-dev] [PATCH] ixgbe: fix checking for tx_free_thresh
  2015-06-01 16:15 ` Zoltan Kiss
@ 2015-06-02 13:31   ` Ananyev, Konstantin
  2015-06-02 15:08     ` Zoltan Kiss
  0 siblings, 1 reply; 12+ messages in thread
From: Ananyev, Konstantin @ 2015-06-02 13:31 UTC (permalink / raw)
  To: Zoltan Kiss, dev

Hi Zoltan,

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Zoltan Kiss
> Sent: Monday, June 01, 2015 5:16 PM
> To: dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] ixgbe: fix checking for tx_free_thresh
> 
> Hi,
> 
> Anyone would like to review this patch? Venky sent a NAK, but I've
> explained to him why it is a bug.


Well, I think Venky is right here.
Indeed that fix, will cause more often unsuccessful checks for DD bits and might cause a
slowdown for TX fast-path.  
Anyway, with current PMD implementation, you can't guarantee that at any moment
TX queue wouldn't use more than tx_free_thresh mbufs.
There could be situations (low speed, or link is down for some short period, etc), when
much more than tx_free_thresh TXDs are in use and none of them could be freed by HW right now.
So your app better be prepared, that up to (nb_tx_desc * num_of_TX_queues) could be in use
by TX path at any given moment.

Though yes,  there is an inconsistency how different ixgbe TX functions treat tx_conf->tx_free_thresh parameter. 
That probably creates wrong expectations and confusion.
We might try to unify it's usage one way or another, but I personally don't see much point in it.
After all, tx_free_tresh seems like a driver internal choice (based on the nb_tx_desc and other parameters).
So I think a better way would be:
1. Deprecate tx_conf->tx_free_thresh (and remove it in later releases) and make
each driver to use what it thinks would be the best value.
2. As you suggested in another mail, introduce an new function:
uint16_t rte_eth_tx_free_pkts(port_id, queue_id, nb_to_free).
That would give upper layer a better control of memory usage, and might be called by the upper layer at idle time,
so further tx_burst, don't need to spend time on freeing TXDs/packets.

Konstantin


> 
> Regards,
> 
> Zoltan
> 
> On 27/05/15 21:12, Zoltan Kiss wrote:
> > This check doesn't do what's required by rte_eth_tx_burst:
> > "When the number of previously sent packets reached the "minimum transmit
> > packets to free" threshold"
> >
> > This can cause problems when txq->tx_free_thresh + [number of elements in the
> > pool] < txq->nb_tx_desc.
> >
> > Signed-off-by: Zoltan Kiss <zoltan.kiss@linaro.org>
> > ---
> >   drivers/net/ixgbe/ixgbe_rxtx.c     | 4 ++--
> >   drivers/net/ixgbe/ixgbe_rxtx_vec.c | 2 +-
> >   2 files changed, 3 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
> > index 4f9ab22..b70ed8c 100644
> > --- a/drivers/net/ixgbe/ixgbe_rxtx.c
> > +++ b/drivers/net/ixgbe/ixgbe_rxtx.c
> > @@ -250,10 +250,10 @@ tx_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
> >
> >   	/*
> >   	 * Begin scanning the H/W ring for done descriptors when the
> > -	 * number of available descriptors drops below tx_free_thresh.  For
> > +	 * number of in flight descriptors reaches tx_free_thresh. For
> >   	 * each done descriptor, free the associated buffer.
> >   	 */
> > -	if (txq->nb_tx_free < txq->tx_free_thresh)
> > +	if ((txq->nb_tx_desc - txq->nb_tx_free) > txq->tx_free_thresh)
> >   		ixgbe_tx_free_bufs(txq);
> >
> >   	/* Only use descriptors that are available */
> > diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec.c b/drivers/net/ixgbe/ixgbe_rxtx_vec.c
> > index abd10f6..f91c698 100644
> > --- a/drivers/net/ixgbe/ixgbe_rxtx_vec.c
> > +++ b/drivers/net/ixgbe/ixgbe_rxtx_vec.c
> > @@ -598,7 +598,7 @@ ixgbe_xmit_pkts_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
> >   	if (unlikely(nb_pkts > RTE_IXGBE_VPMD_TX_BURST))
> >   		nb_pkts = RTE_IXGBE_VPMD_TX_BURST;
> >
> > -	if (txq->nb_tx_free < txq->tx_free_thresh)
> > +	if ((txq->nb_tx_desc - txq->nb_tx_free) > txq->tx_free_thresh)
> >   		ixgbe_tx_free_bufs(txq);
> >
> >   	nb_commit = nb_pkts = (uint16_t)RTE_MIN(txq->nb_tx_free, nb_pkts);
> >

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [dpdk-dev] [PATCH] ixgbe: fix checking for tx_free_thresh
  2015-06-02 13:31   ` Ananyev, Konstantin
@ 2015-06-02 15:08     ` Zoltan Kiss
  2015-06-02 17:35       ` Ananyev, Konstantin
  0 siblings, 1 reply; 12+ messages in thread
From: Zoltan Kiss @ 2015-06-02 15:08 UTC (permalink / raw)
  To: Ananyev, Konstantin, dev



On 02/06/15 14:31, Ananyev, Konstantin wrote:
> Hi Zoltan,
>
>> -----Original Message-----
>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Zoltan Kiss
>> Sent: Monday, June 01, 2015 5:16 PM
>> To: dev@dpdk.org
>> Subject: Re: [dpdk-dev] [PATCH] ixgbe: fix checking for tx_free_thresh
>>
>> Hi,
>>
>> Anyone would like to review this patch? Venky sent a NAK, but I've
>> explained to him why it is a bug.
>
>
> Well, I think Venky is right here.
I think the comments above rte_eth_tx_burst() definition are quite clear 
about what tx_free_thresh means, e1000 and i40e use it that way, but not 
ixgbe.

> Indeed that fix, will cause more often unsuccessful checks for DD bits and might cause a
> slowdown for TX fast-path.
Not if the applications set tx_free_thresh according to the definition 
of this value. But we can change the default value from 32 to something 
higher, e.g I'm using nb_desc/2, and it works out well.

> Anyway, with current PMD implementation, you can't guarantee that at any moment
> TX queue wouldn't use more than tx_free_thresh mbufs.


> There could be situations (low speed, or link is down for some short period, etc), when
> much more than tx_free_thresh TXDs are in use and none of them could be freed by HW right now.
> So your app better be prepared, that up to (nb_tx_desc * num_of_TX_queues) could be in use
> by TX path at any given moment.
>
> Though yes,  there is an inconsistency how different ixgbe TX functions treat tx_conf->tx_free_thresh parameter.
> That probably creates wrong expectations and confusion.
Yes, ixgbe_xmit_pkts() use it the way it's defined, this two function 
doesn't.

> We might try to unify it's usage one way or another, but I personally don't see much point in it.
> After all, tx_free_tresh seems like a driver internal choice (based on the nb_tx_desc and other parameters).
> So I think a better way would be:
> 1. Deprecate tx_conf->tx_free_thresh (and remove it in later releases) and make
> each driver to use what it thinks would be the best value.
But how does the driver knows what's the best for the applications 
traffic pattern? I think it's better to leave the possibility for the 
app to fine tune it.
In the meantime we can improve the default selection as well, as I 
suggested above.

> 2. As you suggested in another mail, introduce an new function:
> uint16_t rte_eth_tx_free_pkts(port_id, queue_id, nb_to_free).
> That would give upper layer a better control of memory usage, and might be called by the upper layer at idle time,
> so further tx_burst, don't need to spend time on freeing TXDs/packets.
I agree.

>
> Konstantin
>
>
>>
>> Regards,
>>
>> Zoltan
>>
>> On 27/05/15 21:12, Zoltan Kiss wrote:
>>> This check doesn't do what's required by rte_eth_tx_burst:
>>> "When the number of previously sent packets reached the "minimum transmit
>>> packets to free" threshold"
>>>
>>> This can cause problems when txq->tx_free_thresh + [number of elements in the
>>> pool] < txq->nb_tx_desc.
>>>
>>> Signed-off-by: Zoltan Kiss <zoltan.kiss@linaro.org>
>>> ---
>>>    drivers/net/ixgbe/ixgbe_rxtx.c     | 4 ++--
>>>    drivers/net/ixgbe/ixgbe_rxtx_vec.c | 2 +-
>>>    2 files changed, 3 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
>>> index 4f9ab22..b70ed8c 100644
>>> --- a/drivers/net/ixgbe/ixgbe_rxtx.c
>>> +++ b/drivers/net/ixgbe/ixgbe_rxtx.c
>>> @@ -250,10 +250,10 @@ tx_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
>>>
>>>    	/*
>>>    	 * Begin scanning the H/W ring for done descriptors when the
>>> -	 * number of available descriptors drops below tx_free_thresh.  For
>>> +	 * number of in flight descriptors reaches tx_free_thresh. For
>>>    	 * each done descriptor, free the associated buffer.
>>>    	 */
>>> -	if (txq->nb_tx_free < txq->tx_free_thresh)
>>> +	if ((txq->nb_tx_desc - txq->nb_tx_free) > txq->tx_free_thresh)
>>>    		ixgbe_tx_free_bufs(txq);
>>>
>>>    	/* Only use descriptors that are available */
>>> diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec.c b/drivers/net/ixgbe/ixgbe_rxtx_vec.c
>>> index abd10f6..f91c698 100644
>>> --- a/drivers/net/ixgbe/ixgbe_rxtx_vec.c
>>> +++ b/drivers/net/ixgbe/ixgbe_rxtx_vec.c
>>> @@ -598,7 +598,7 @@ ixgbe_xmit_pkts_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
>>>    	if (unlikely(nb_pkts > RTE_IXGBE_VPMD_TX_BURST))
>>>    		nb_pkts = RTE_IXGBE_VPMD_TX_BURST;
>>>
>>> -	if (txq->nb_tx_free < txq->tx_free_thresh)
>>> +	if ((txq->nb_tx_desc - txq->nb_tx_free) > txq->tx_free_thresh)
>>>    		ixgbe_tx_free_bufs(txq);
>>>
>>>    	nb_commit = nb_pkts = (uint16_t)RTE_MIN(txq->nb_tx_free, nb_pkts);
>>>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [dpdk-dev] [PATCH] ixgbe: fix checking for tx_free_thresh
  2015-06-02 15:08     ` Zoltan Kiss
@ 2015-06-02 17:35       ` Ananyev, Konstantin
  2015-06-03 17:46         ` Zoltan Kiss
  0 siblings, 1 reply; 12+ messages in thread
From: Ananyev, Konstantin @ 2015-06-02 17:35 UTC (permalink / raw)
  To: Zoltan Kiss, dev



> -----Original Message-----
> From: Zoltan Kiss [mailto:zoltan.kiss@linaro.org]
> Sent: Tuesday, June 02, 2015 4:08 PM
> To: Ananyev, Konstantin; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] ixgbe: fix checking for tx_free_thresh
> 
> 
> 
> On 02/06/15 14:31, Ananyev, Konstantin wrote:
> > Hi Zoltan,
> >
> >> -----Original Message-----
> >> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Zoltan Kiss
> >> Sent: Monday, June 01, 2015 5:16 PM
> >> To: dev@dpdk.org
> >> Subject: Re: [dpdk-dev] [PATCH] ixgbe: fix checking for tx_free_thresh
> >>
> >> Hi,
> >>
> >> Anyone would like to review this patch? Venky sent a NAK, but I've
> >> explained to him why it is a bug.
> >
> >
> > Well, I think Venky is right here.
> I think the comments above rte_eth_tx_burst() definition are quite clear
> about what tx_free_thresh means, e1000 and i40e use it that way, but not
> ixgbe.
> 
> > Indeed that fix, will cause more often unsuccessful checks for DD bits and might cause a
> > slowdown for TX fast-path.
> Not if the applications set tx_free_thresh according to the definition
> of this value. But we can change the default value from 32 to something
> higher, e.g I'm using nb_desc/2, and it works out well.

Sure we can, as I said below, we can unify it one way or another.
One way would be to make fast-path TX to free TXDs when number of occupied TXDs raises above tx_free_thresh
(what rte_ethdev.h comments say and what full-featured TX is doing).
Though in that case we have to change default value for tx_free_thresh, and all existing apps that 
using tx_free_thresh==32 and fast-path TX will probably experience a slowdown.
Another way would be to make all TX functions to treat tx_conf->tx_free_thresh as fast-path TX functions do
(free TXDs when number of free TXDs drops below  tx_free_thresh) and update  rte_ethdev.h comments.

Though, I am not sure that it really worth all these changes.
>From one side, whatever tx_free_thresh would be,
the app should still assume that the worst case might happen,
and up to nb_tx_desc mbufs can be consumed by the queue.
>From other side, I think the default value should work well for most cases.
So I am still for graceful deprecation of that config parameter, see below.

> 
> > Anyway, with current PMD implementation, you can't guarantee that at any moment
> > TX queue wouldn't use more than tx_free_thresh mbufs.
> 
> 
> > There could be situations (low speed, or link is down for some short period, etc), when
> > much more than tx_free_thresh TXDs are in use and none of them could be freed by HW right now.
> > So your app better be prepared, that up to (nb_tx_desc * num_of_TX_queues) could be in use
> > by TX path at any given moment.
> >
> > Though yes,  there is an inconsistency how different ixgbe TX functions treat tx_conf->tx_free_thresh parameter.
> > That probably creates wrong expectations and confusion.
> Yes, ixgbe_xmit_pkts() use it the way it's defined, this two function
> doesn't.
> 
> > We might try to unify it's usage one way or another, but I personally don't see much point in it.
> > After all, tx_free_tresh seems like a driver internal choice (based on the nb_tx_desc and other parameters).
> > So I think a better way would be:
> > 1. Deprecate tx_conf->tx_free_thresh (and remove it in later releases) and make
> > each driver to use what it thinks would be the best value.
> But how does the driver knows what's the best for the applications
> traffic pattern? I think it's better to leave the possibility for the
> app to fine tune it.

My understanding is that for most cases the default value should do pretty well.
That default value, shouldn't be too small, so we avoid unnecessary & unsuccessful checks,
and probably shouldn't be too big, to prevent unnecessary mbufs consumption
(something between nb_tx_desc / 2 and 3 * nb_tx_desc / 4 probably).

But might be you have a good example, when such tuning is needed?
For what traffic patterns you would set tx_free_thresh to some different values,
and how will it impact performance?

Again, if there would be tx_free_pkts(), why someone would also need a tx_conf->tx_free_thresh?

Konstantin

> In the meantime we can improve the default selection as well, as I
> suggested above.
> 
> > 2. As you suggested in another mail, introduce an new function:
> > uint16_t rte_eth_tx_free_pkts(port_id, queue_id, nb_to_free).
> > That would give upper layer a better control of memory usage, and might be called by the upper layer at idle time,
> > so further tx_burst, don't need to spend time on freeing TXDs/packets.
> I agree.
> 
> >
> > Konstantin
> >
> >
> >>
> >> Regards,
> >>
> >> Zoltan
> >>
> >> On 27/05/15 21:12, Zoltan Kiss wrote:
> >>> This check doesn't do what's required by rte_eth_tx_burst:
> >>> "When the number of previously sent packets reached the "minimum transmit
> >>> packets to free" threshold"
> >>>
> >>> This can cause problems when txq->tx_free_thresh + [number of elements in the
> >>> pool] < txq->nb_tx_desc.
> >>>
> >>> Signed-off-by: Zoltan Kiss <zoltan.kiss@linaro.org>
> >>> ---
> >>>    drivers/net/ixgbe/ixgbe_rxtx.c     | 4 ++--
> >>>    drivers/net/ixgbe/ixgbe_rxtx_vec.c | 2 +-
> >>>    2 files changed, 3 insertions(+), 3 deletions(-)
> >>>
> >>> diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
> >>> index 4f9ab22..b70ed8c 100644
> >>> --- a/drivers/net/ixgbe/ixgbe_rxtx.c
> >>> +++ b/drivers/net/ixgbe/ixgbe_rxtx.c
> >>> @@ -250,10 +250,10 @@ tx_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
> >>>
> >>>    	/*
> >>>    	 * Begin scanning the H/W ring for done descriptors when the
> >>> -	 * number of available descriptors drops below tx_free_thresh.  For
> >>> +	 * number of in flight descriptors reaches tx_free_thresh. For
> >>>    	 * each done descriptor, free the associated buffer.
> >>>    	 */
> >>> -	if (txq->nb_tx_free < txq->tx_free_thresh)
> >>> +	if ((txq->nb_tx_desc - txq->nb_tx_free) > txq->tx_free_thresh)
> >>>    		ixgbe_tx_free_bufs(txq);
> >>>
> >>>    	/* Only use descriptors that are available */
> >>> diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec.c b/drivers/net/ixgbe/ixgbe_rxtx_vec.c
> >>> index abd10f6..f91c698 100644
> >>> --- a/drivers/net/ixgbe/ixgbe_rxtx_vec.c
> >>> +++ b/drivers/net/ixgbe/ixgbe_rxtx_vec.c
> >>> @@ -598,7 +598,7 @@ ixgbe_xmit_pkts_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
> >>>    	if (unlikely(nb_pkts > RTE_IXGBE_VPMD_TX_BURST))
> >>>    		nb_pkts = RTE_IXGBE_VPMD_TX_BURST;
> >>>
> >>> -	if (txq->nb_tx_free < txq->tx_free_thresh)
> >>> +	if ((txq->nb_tx_desc - txq->nb_tx_free) > txq->tx_free_thresh)
> >>>    		ixgbe_tx_free_bufs(txq);
> >>>
> >>>    	nb_commit = nb_pkts = (uint16_t)RTE_MIN(txq->nb_tx_free, nb_pkts);
> >>>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [dpdk-dev] [PATCH] ixgbe: fix checking for tx_free_thresh
  2015-06-02 17:35       ` Ananyev, Konstantin
@ 2015-06-03 17:46         ` Zoltan Kiss
  2015-06-09 11:18           ` Ananyev, Konstantin
  0 siblings, 1 reply; 12+ messages in thread
From: Zoltan Kiss @ 2015-06-03 17:46 UTC (permalink / raw)
  To: Ananyev, Konstantin, dev



On 02/06/15 18:35, Ananyev, Konstantin wrote:
>
>
>> -----Original Message-----
>> From: Zoltan Kiss [mailto:zoltan.kiss@linaro.org]
>> Sent: Tuesday, June 02, 2015 4:08 PM
>> To: Ananyev, Konstantin; dev@dpdk.org
>> Subject: Re: [dpdk-dev] [PATCH] ixgbe: fix checking for tx_free_thresh
>>
>>
>>
>> On 02/06/15 14:31, Ananyev, Konstantin wrote:
>>> Hi Zoltan,
>>>
>>>> -----Original Message-----
>>>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Zoltan Kiss
>>>> Sent: Monday, June 01, 2015 5:16 PM
>>>> To: dev@dpdk.org
>>>> Subject: Re: [dpdk-dev] [PATCH] ixgbe: fix checking for tx_free_thresh
>>>>
>>>> Hi,
>>>>
>>>> Anyone would like to review this patch? Venky sent a NAK, but I've
>>>> explained to him why it is a bug.
>>>
>>>
>>> Well, I think Venky is right here.
>> I think the comments above rte_eth_tx_burst() definition are quite clear
>> about what tx_free_thresh means, e1000 and i40e use it that way, but not
>> ixgbe.
>>
>>> Indeed that fix, will cause more often unsuccessful checks for DD bits and might cause a
>>> slowdown for TX fast-path.
>> Not if the applications set tx_free_thresh according to the definition
>> of this value. But we can change the default value from 32 to something
>> higher, e.g I'm using nb_desc/2, and it works out well.
>
> Sure we can, as I said below, we can unify it one way or another.
> One way would be to make fast-path TX to free TXDs when number of occupied TXDs raises above tx_free_thresh
> (what rte_ethdev.h comments say and what full-featured TX is doing).
> Though in that case we have to change default value for tx_free_thresh, and all existing apps that
> using tx_free_thresh==32 and fast-path TX will probably experience a slowdown.

They are in trouble already, because i40e and e1000 uses it as defined. 
But I guess most apps are going with 0, which sets the drivers default. 
Others have to change the value to nb_txd - curr_value to have the same 
behaviour

> Another way would be to make all TX functions to treat tx_conf->tx_free_thresh as fast-path TX functions do
> (free TXDs when number of free TXDs drops below  tx_free_thresh) and update  rte_ethdev.h comments.
And i40e and e1000e code as well. I don't see what difference it makes 
which way of definition you use, what I care is that it should be used 
consistently.
>
> Though, I am not sure that it really worth all these changes.
>  From one side, whatever tx_free_thresh would be,
> the app should still assume that the worst case might happen,
> and up to nb_tx_desc mbufs can be consumed by the queue.
>  From other side, I think the default value should work well for most cases.
> So I am still for graceful deprecation of that config parameter, see below.
>
>>
>>> Anyway, with current PMD implementation, you can't guarantee that at any moment
>>> TX queue wouldn't use more than tx_free_thresh mbufs.
>>
>>
>>> There could be situations (low speed, or link is down for some short period, etc), when
>>> much more than tx_free_thresh TXDs are in use and none of them could be freed by HW right now.
>>> So your app better be prepared, that up to (nb_tx_desc * num_of_TX_queues) could be in use
>>> by TX path at any given moment.
>>>
>>> Though yes,  there is an inconsistency how different ixgbe TX functions treat tx_conf->tx_free_thresh parameter.
>>> That probably creates wrong expectations and confusion.
>> Yes, ixgbe_xmit_pkts() use it the way it's defined, this two function
>> doesn't.
>>
>>> We might try to unify it's usage one way or another, but I personally don't see much point in it.
>>> After all, tx_free_tresh seems like a driver internal choice (based on the nb_tx_desc and other parameters).
>>> So I think a better way would be:
>>> 1. Deprecate tx_conf->tx_free_thresh (and remove it in later releases) and make
>>> each driver to use what it thinks would be the best value.
>> But how does the driver knows what's the best for the applications
>> traffic pattern? I think it's better to leave the possibility for the
>> app to fine tune it.
>
> My understanding is that for most cases the default value should do pretty well.
> That default value, shouldn't be too small, so we avoid unnecessary & unsuccessful checks,
> and probably shouldn't be too big, to prevent unnecessary mbufs consumption
> (something between nb_tx_desc / 2 and 3 * nb_tx_desc / 4 probably).
I agree

>
> But might be you have a good example, when such tuning is needed?
> For what traffic patterns you would set tx_free_thresh to some different values,
> and how will it impact performance?
I don't have an actual example, but I think it's worth to keep this 
tuning option if we already have it. Most people probably wouldn't use 
it, but I can imagine that the very enthusiastic wants to try out 
different settings to find the best.
E.g. I was testing odp_l2fwd when I came across the problem, and I found 
it useful to have this option. With its traffic pattern (receive a batch 
of packets then send them out on an another interface) it can happen 
that with different clock speeds you can find different optimums.

>
> Again, if there would be tx_free_pkts(), why someone would also need a tx_conf->tx_free_thresh?
I think about tx_free_pkts as a rainy day option, when you want ALL TX 
completed packets to be released, because you are out of buffers. While 
tx_free_thresh is the fast path way of TX completion, when you have the 
room to wait for more packets to be gathered.

>
> Konstantin
>
>> In the meantime we can improve the default selection as well, as I
>> suggested above.
>>
>>> 2. As you suggested in another mail, introduce an new function:
>>> uint16_t rte_eth_tx_free_pkts(port_id, queue_id, nb_to_free).
>>> That would give upper layer a better control of memory usage, and might be called by the upper layer at idle time,
>>> so further tx_burst, don't need to spend time on freeing TXDs/packets.
>> I agree.
>>
>>>
>>> Konstantin
>>>
>>>
>>>>
>>>> Regards,
>>>>
>>>> Zoltan
>>>>
>>>> On 27/05/15 21:12, Zoltan Kiss wrote:
>>>>> This check doesn't do what's required by rte_eth_tx_burst:
>>>>> "When the number of previously sent packets reached the "minimum transmit
>>>>> packets to free" threshold"
>>>>>
>>>>> This can cause problems when txq->tx_free_thresh + [number of elements in the
>>>>> pool] < txq->nb_tx_desc.
>>>>>
>>>>> Signed-off-by: Zoltan Kiss <zoltan.kiss@linaro.org>
>>>>> ---
>>>>>     drivers/net/ixgbe/ixgbe_rxtx.c     | 4 ++--
>>>>>     drivers/net/ixgbe/ixgbe_rxtx_vec.c | 2 +-
>>>>>     2 files changed, 3 insertions(+), 3 deletions(-)
>>>>>
>>>>> diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
>>>>> index 4f9ab22..b70ed8c 100644
>>>>> --- a/drivers/net/ixgbe/ixgbe_rxtx.c
>>>>> +++ b/drivers/net/ixgbe/ixgbe_rxtx.c
>>>>> @@ -250,10 +250,10 @@ tx_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
>>>>>
>>>>>     	/*
>>>>>     	 * Begin scanning the H/W ring for done descriptors when the
>>>>> -	 * number of available descriptors drops below tx_free_thresh.  For
>>>>> +	 * number of in flight descriptors reaches tx_free_thresh. For
>>>>>     	 * each done descriptor, free the associated buffer.
>>>>>     	 */
>>>>> -	if (txq->nb_tx_free < txq->tx_free_thresh)
>>>>> +	if ((txq->nb_tx_desc - txq->nb_tx_free) > txq->tx_free_thresh)
>>>>>     		ixgbe_tx_free_bufs(txq);
>>>>>
>>>>>     	/* Only use descriptors that are available */
>>>>> diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec.c b/drivers/net/ixgbe/ixgbe_rxtx_vec.c
>>>>> index abd10f6..f91c698 100644
>>>>> --- a/drivers/net/ixgbe/ixgbe_rxtx_vec.c
>>>>> +++ b/drivers/net/ixgbe/ixgbe_rxtx_vec.c
>>>>> @@ -598,7 +598,7 @@ ixgbe_xmit_pkts_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
>>>>>     	if (unlikely(nb_pkts > RTE_IXGBE_VPMD_TX_BURST))
>>>>>     		nb_pkts = RTE_IXGBE_VPMD_TX_BURST;
>>>>>
>>>>> -	if (txq->nb_tx_free < txq->tx_free_thresh)
>>>>> +	if ((txq->nb_tx_desc - txq->nb_tx_free) > txq->tx_free_thresh)
>>>>>     		ixgbe_tx_free_bufs(txq);
>>>>>
>>>>>     	nb_commit = nb_pkts = (uint16_t)RTE_MIN(txq->nb_tx_free, nb_pkts);
>>>>>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [dpdk-dev] [PATCH] ixgbe: fix checking for tx_free_thresh
  2015-06-03 17:46         ` Zoltan Kiss
@ 2015-06-09 11:18           ` Ananyev, Konstantin
  2015-06-09 15:08             ` Zoltan Kiss
  0 siblings, 1 reply; 12+ messages in thread
From: Ananyev, Konstantin @ 2015-06-09 11:18 UTC (permalink / raw)
  To: Zoltan Kiss, dev



> -----Original Message-----
> From: Zoltan Kiss [mailto:zoltan.kiss@linaro.org]
> Sent: Wednesday, June 03, 2015 6:47 PM
> To: Ananyev, Konstantin; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] ixgbe: fix checking for tx_free_thresh
> 
> 
> 
> On 02/06/15 18:35, Ananyev, Konstantin wrote:
> >
> >
> >> -----Original Message-----
> >> From: Zoltan Kiss [mailto:zoltan.kiss@linaro.org]
> >> Sent: Tuesday, June 02, 2015 4:08 PM
> >> To: Ananyev, Konstantin; dev@dpdk.org
> >> Subject: Re: [dpdk-dev] [PATCH] ixgbe: fix checking for tx_free_thresh
> >>
> >>
> >>
> >> On 02/06/15 14:31, Ananyev, Konstantin wrote:
> >>> Hi Zoltan,
> >>>
> >>>> -----Original Message-----
> >>>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Zoltan Kiss
> >>>> Sent: Monday, June 01, 2015 5:16 PM
> >>>> To: dev@dpdk.org
> >>>> Subject: Re: [dpdk-dev] [PATCH] ixgbe: fix checking for tx_free_thresh
> >>>>
> >>>> Hi,
> >>>>
> >>>> Anyone would like to review this patch? Venky sent a NAK, but I've
> >>>> explained to him why it is a bug.
> >>>
> >>>
> >>> Well, I think Venky is right here.
> >> I think the comments above rte_eth_tx_burst() definition are quite clear
> >> about what tx_free_thresh means, e1000 and i40e use it that way, but not
> >> ixgbe.
> >>
> >>> Indeed that fix, will cause more often unsuccessful checks for DD bits and might cause a
> >>> slowdown for TX fast-path.
> >> Not if the applications set tx_free_thresh according to the definition
> >> of this value. But we can change the default value from 32 to something
> >> higher, e.g I'm using nb_desc/2, and it works out well.
> >
> > Sure we can, as I said below, we can unify it one way or another.
> > One way would be to make fast-path TX to free TXDs when number of occupied TXDs raises above tx_free_thresh
> > (what rte_ethdev.h comments say and what full-featured TX is doing).
> > Though in that case we have to change default value for tx_free_thresh, and all existing apps that
> > using tx_free_thresh==32 and fast-path TX will probably experience a slowdown.
> 
> They are in trouble already, because i40e and e1000 uses it as defined.

In fact, i40e has exactly the same problem as ixgbe:
fast-path and full-featured TX  code treat  tx_free_thresh in a different way.
igb just ignores input tx_free_thresh, while em has only full featured path.

What I am saying, existing app that uses TX fast-path and sets tx_free_thresh=32
(as we did in our examples in previous versions) will experience a slowdown,
if we'll make all TX functions to behave like full-featured ones
(txq->nb_tx_desc - txq->nb_tx_free > txq->tx_free_thresh).

>From other side, if app uses TX full-featured TX and sets tx_free_thresh=32,
then it  already has a possible slowdown, because of too often TXDs checking. 
So, if we'll change tx_free_thresh semantics to wht fast-path uses,
It shouldn't see any slowdown, in fact it might see some improvement.

> But I guess most apps are going with 0, which sets the drivers default.
> Others have to change the value to nb_txd - curr_value to have the same
> behaviour
> 
> > Another way would be to make all TX functions to treat tx_conf->tx_free_thresh as fast-path TX functions do
> > (free TXDs when number of free TXDs drops below  tx_free_thresh) and update  rte_ethdev.h comments.
> And i40e and e1000e code as well. I don't see what difference it makes
> which way of definition you use, what I care is that it should be used
> consistently.

Yes, both ways are possible, the concern is - how to minimise the impact for existing apps.
That's why I am leaning to the fast-path way.

> >
> > Though, I am not sure that it really worth all these changes.
> >  From one side, whatever tx_free_thresh would be,
> > the app should still assume that the worst case might happen,
> > and up to nb_tx_desc mbufs can be consumed by the queue.
> >  From other side, I think the default value should work well for most cases.
> > So I am still for graceful deprecation of that config parameter, see below.
> >
> >>
> >>> Anyway, with current PMD implementation, you can't guarantee that at any moment
> >>> TX queue wouldn't use more than tx_free_thresh mbufs.
> >>
> >>
> >>> There could be situations (low speed, or link is down for some short period, etc), when
> >>> much more than tx_free_thresh TXDs are in use and none of them could be freed by HW right now.
> >>> So your app better be prepared, that up to (nb_tx_desc * num_of_TX_queues) could be in use
> >>> by TX path at any given moment.
> >>>
> >>> Though yes,  there is an inconsistency how different ixgbe TX functions treat tx_conf->tx_free_thresh parameter.
> >>> That probably creates wrong expectations and confusion.
> >> Yes, ixgbe_xmit_pkts() use it the way it's defined, this two function
> >> doesn't.
> >>
> >>> We might try to unify it's usage one way or another, but I personally don't see much point in it.
> >>> After all, tx_free_tresh seems like a driver internal choice (based on the nb_tx_desc and other parameters).
> >>> So I think a better way would be:
> >>> 1. Deprecate tx_conf->tx_free_thresh (and remove it in later releases) and make
> >>> each driver to use what it thinks would be the best value.
> >> But how does the driver knows what's the best for the applications
> >> traffic pattern? I think it's better to leave the possibility for the
> >> app to fine tune it.
> >
> > My understanding is that for most cases the default value should do pretty well.
> > That default value, shouldn't be too small, so we avoid unnecessary & unsuccessful checks,
> > and probably shouldn't be too big, to prevent unnecessary mbufs consumption
> > (something between nb_tx_desc / 2 and 3 * nb_tx_desc / 4 probably).
> I agree
> 
> >
> > But might be you have a good example, when such tuning is needed?
> > For what traffic patterns you would set tx_free_thresh to some different values,
> > and how will it impact performance?
> I don't have an actual example, but I think it's worth to keep this
> tuning option if we already have it. Most people probably wouldn't use
> it, but I can imagine that the very enthusiastic wants to try out
> different settings to find the best.
> E.g. I was testing odp_l2fwd when I came across the problem, and I found
> it useful to have this option. With its traffic pattern (receive a batch
> of packets then send them out on an another interface) it can happen
> that with different clock speeds you can find different optimums.

Actually, after thinking about it a bit more -
I think it would more depend on how many RX/TX queues particular  lcore has to manage.
So, as you said, yes we probably better leave it, at least for now.

> 
> >
> > Again, if there would be tx_free_pkts(), why someone would also need a tx_conf->tx_free_thresh?
> I think about tx_free_pkts as a rainy day option, when you want ALL TX
> completed packets to be released, because you are out of buffers.

What do you mean as 'rainy day' here?
App getting short of mbufs?
As I said before, it could be absolutely valid situation when
up to nb_tx_desc mbufs per TX queue can be in use.
So the app better be prepared for such situation anyway.
Either make sure there are enough mbufs in the pool,
or somehow gracefully degrade the service.
Again, if nb_tx_desc is big (let say for ixgbe it could be up to 4K),
freeing all TXDs at once could take a long time.
So I think it should be possible to specify maximum number of mbufs to free per call. 

My thought people will use it to release mbufs when there is an idle period.
Something like:

n = rx_burst(...);
if (n == 0)  { ...; tx_free_mbufs(...); ... }
else {...}

Konstantin

> While
> tx_free_thresh is the fast path way of TX completion, when you have the
> room to wait for more packets to be gathered.
> 
> >
> > Konstantin
> >
> >> In the meantime we can improve the default selection as well, as I
> >> suggested above.
> >>
> >>> 2. As you suggested in another mail, introduce an new function:
> >>> uint16_t rte_eth_tx_free_pkts(port_id, queue_id, nb_to_free).
> >>> That would give upper layer a better control of memory usage, and might be called by the upper layer at idle time,
> >>> so further tx_burst, don't need to spend time on freeing TXDs/packets.
> >> I agree.
> >>
> >>>
> >>> Konstantin
> >>>
> >>>
> >>>>
> >>>> Regards,
> >>>>
> >>>> Zoltan
> >>>>
> >>>> On 27/05/15 21:12, Zoltan Kiss wrote:
> >>>>> This check doesn't do what's required by rte_eth_tx_burst:
> >>>>> "When the number of previously sent packets reached the "minimum transmit
> >>>>> packets to free" threshold"
> >>>>>
> >>>>> This can cause problems when txq->tx_free_thresh + [number of elements in the
> >>>>> pool] < txq->nb_tx_desc.
> >>>>>
> >>>>> Signed-off-by: Zoltan Kiss <zoltan.kiss@linaro.org>
> >>>>> ---
> >>>>>     drivers/net/ixgbe/ixgbe_rxtx.c     | 4 ++--
> >>>>>     drivers/net/ixgbe/ixgbe_rxtx_vec.c | 2 +-
> >>>>>     2 files changed, 3 insertions(+), 3 deletions(-)
> >>>>>
> >>>>> diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
> >>>>> index 4f9ab22..b70ed8c 100644
> >>>>> --- a/drivers/net/ixgbe/ixgbe_rxtx.c
> >>>>> +++ b/drivers/net/ixgbe/ixgbe_rxtx.c
> >>>>> @@ -250,10 +250,10 @@ tx_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
> >>>>>
> >>>>>     	/*
> >>>>>     	 * Begin scanning the H/W ring for done descriptors when the
> >>>>> -	 * number of available descriptors drops below tx_free_thresh.  For
> >>>>> +	 * number of in flight descriptors reaches tx_free_thresh. For
> >>>>>     	 * each done descriptor, free the associated buffer.
> >>>>>     	 */
> >>>>> -	if (txq->nb_tx_free < txq->tx_free_thresh)
> >>>>> +	if ((txq->nb_tx_desc - txq->nb_tx_free) > txq->tx_free_thresh)
> >>>>>     		ixgbe_tx_free_bufs(txq);
> >>>>>
> >>>>>     	/* Only use descriptors that are available */
> >>>>> diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec.c b/drivers/net/ixgbe/ixgbe_rxtx_vec.c
> >>>>> index abd10f6..f91c698 100644
> >>>>> --- a/drivers/net/ixgbe/ixgbe_rxtx_vec.c
> >>>>> +++ b/drivers/net/ixgbe/ixgbe_rxtx_vec.c
> >>>>> @@ -598,7 +598,7 @@ ixgbe_xmit_pkts_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
> >>>>>     	if (unlikely(nb_pkts > RTE_IXGBE_VPMD_TX_BURST))
> >>>>>     		nb_pkts = RTE_IXGBE_VPMD_TX_BURST;
> >>>>>
> >>>>> -	if (txq->nb_tx_free < txq->tx_free_thresh)
> >>>>> +	if ((txq->nb_tx_desc - txq->nb_tx_free) > txq->tx_free_thresh)
> >>>>>     		ixgbe_tx_free_bufs(txq);
> >>>>>
> >>>>>     	nb_commit = nb_pkts = (uint16_t)RTE_MIN(txq->nb_tx_free, nb_pkts);
> >>>>>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [dpdk-dev] [PATCH] ixgbe: fix checking for tx_free_thresh
  2015-06-09 11:18           ` Ananyev, Konstantin
@ 2015-06-09 15:08             ` Zoltan Kiss
  2015-06-09 15:44               ` Ananyev, Konstantin
  0 siblings, 1 reply; 12+ messages in thread
From: Zoltan Kiss @ 2015-06-09 15:08 UTC (permalink / raw)
  To: Ananyev, Konstantin, dev



On 09/06/15 12:18, Ananyev, Konstantin wrote:
>
>
>> -----Original Message-----
>> From: Zoltan Kiss [mailto:zoltan.kiss@linaro.org]
>> Sent: Wednesday, June 03, 2015 6:47 PM
>> To: Ananyev, Konstantin; dev@dpdk.org
>> Subject: Re: [dpdk-dev] [PATCH] ixgbe: fix checking for tx_free_thresh
>>
>>
>>
>> On 02/06/15 18:35, Ananyev, Konstantin wrote:
>>>
>>>
>>>> -----Original Message-----
>>>> From: Zoltan Kiss [mailto:zoltan.kiss@linaro.org]
>>>> Sent: Tuesday, June 02, 2015 4:08 PM
>>>> To: Ananyev, Konstantin; dev@dpdk.org
>>>> Subject: Re: [dpdk-dev] [PATCH] ixgbe: fix checking for tx_free_thresh
>>>>
>>>>
>>>>
>>>> On 02/06/15 14:31, Ananyev, Konstantin wrote:
>>>>> Hi Zoltan,
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Zoltan Kiss
>>>>>> Sent: Monday, June 01, 2015 5:16 PM
>>>>>> To: dev@dpdk.org
>>>>>> Subject: Re: [dpdk-dev] [PATCH] ixgbe: fix checking for tx_free_thresh
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Anyone would like to review this patch? Venky sent a NAK, but I've
>>>>>> explained to him why it is a bug.
>>>>>
>>>>>
>>>>> Well, I think Venky is right here.
>>>> I think the comments above rte_eth_tx_burst() definition are quite clear
>>>> about what tx_free_thresh means, e1000 and i40e use it that way, but not
>>>> ixgbe.
>>>>
>>>>> Indeed that fix, will cause more often unsuccessful checks for DD bits and might cause a
>>>>> slowdown for TX fast-path.
>>>> Not if the applications set tx_free_thresh according to the definition
>>>> of this value. But we can change the default value from 32 to something
>>>> higher, e.g I'm using nb_desc/2, and it works out well.
>>>
>>> Sure we can, as I said below, we can unify it one way or another.
>>> One way would be to make fast-path TX to free TXDs when number of occupied TXDs raises above tx_free_thresh
>>> (what rte_ethdev.h comments say and what full-featured TX is doing).
>>> Though in that case we have to change default value for tx_free_thresh, and all existing apps that
>>> using tx_free_thresh==32 and fast-path TX will probably experience a slowdown.
>>
>> They are in trouble already, because i40e and e1000 uses it as defined.
>
> In fact, i40e has exactly the same problem as ixgbe:
> fast-path and full-featured TX  code treat  tx_free_thresh in a different way.
> igb just ignores input tx_free_thresh, while em has only full featured path.
>
> What I am saying, existing app that uses TX fast-path and sets tx_free_thresh=32
> (as we did in our examples in previous versions) will experience a slowdown,
> if we'll make all TX functions to behave like full-featured ones
> (txq->nb_tx_desc - txq->nb_tx_free > txq->tx_free_thresh).
>
>  From other side, if app uses TX full-featured TX and sets tx_free_thresh=32,
> then it  already has a possible slowdown, because of too often TXDs checking.
> So, if we'll change tx_free_thresh semantics to wht fast-path uses,
> It shouldn't see any slowdown, in fact it might see some improvement.
>
>> But I guess most apps are going with 0, which sets the drivers default.
>> Others have to change the value to nb_txd - curr_value to have the same
>> behaviour
>>
>>> Another way would be to make all TX functions to treat tx_conf->tx_free_thresh as fast-path TX functions do
>>> (free TXDs when number of free TXDs drops below  tx_free_thresh) and update  rte_ethdev.h comments.
>> And i40e and e1000e code as well. I don't see what difference it makes
>> which way of definition you use, what I care is that it should be used
>> consistently.
>
> Yes, both ways are possible, the concern is - how to minimise the impact for existing apps.
> That's why I am leaning to the fast-path way.

Make sense to favour the fast-path way, I'll look into that and try to 
come up with a patch

>
>>>
>>> Though, I am not sure that it really worth all these changes.
>>>   From one side, whatever tx_free_thresh would be,
>>> the app should still assume that the worst case might happen,
>>> and up to nb_tx_desc mbufs can be consumed by the queue.
>>>   From other side, I think the default value should work well for most cases.
>>> So I am still for graceful deprecation of that config parameter, see below.
>>>
>>>>
>>>>> Anyway, with current PMD implementation, you can't guarantee that at any moment
>>>>> TX queue wouldn't use more than tx_free_thresh mbufs.
>>>>
>>>>
>>>>> There could be situations (low speed, or link is down for some short period, etc), when
>>>>> much more than tx_free_thresh TXDs are in use and none of them could be freed by HW right now.
>>>>> So your app better be prepared, that up to (nb_tx_desc * num_of_TX_queues) could be in use
>>>>> by TX path at any given moment.
>>>>>
>>>>> Though yes,  there is an inconsistency how different ixgbe TX functions treat tx_conf->tx_free_thresh parameter.
>>>>> That probably creates wrong expectations and confusion.
>>>> Yes, ixgbe_xmit_pkts() use it the way it's defined, this two function
>>>> doesn't.
>>>>
>>>>> We might try to unify it's usage one way or another, but I personally don't see much point in it.
>>>>> After all, tx_free_tresh seems like a driver internal choice (based on the nb_tx_desc and other parameters).
>>>>> So I think a better way would be:
>>>>> 1. Deprecate tx_conf->tx_free_thresh (and remove it in later releases) and make
>>>>> each driver to use what it thinks would be the best value.
>>>> But how does the driver knows what's the best for the applications
>>>> traffic pattern? I think it's better to leave the possibility for the
>>>> app to fine tune it.
>>>
>>> My understanding is that for most cases the default value should do pretty well.
>>> That default value, shouldn't be too small, so we avoid unnecessary & unsuccessful checks,
>>> and probably shouldn't be too big, to prevent unnecessary mbufs consumption
>>> (something between nb_tx_desc / 2 and 3 * nb_tx_desc / 4 probably).
>> I agree
>>
>>>
>>> But might be you have a good example, when such tuning is needed?
>>> For what traffic patterns you would set tx_free_thresh to some different values,
>>> and how will it impact performance?
>> I don't have an actual example, but I think it's worth to keep this
>> tuning option if we already have it. Most people probably wouldn't use
>> it, but I can imagine that the very enthusiastic wants to try out
>> different settings to find the best.
>> E.g. I was testing odp_l2fwd when I came across the problem, and I found
>> it useful to have this option. With its traffic pattern (receive a batch
>> of packets then send them out on an another interface) it can happen
>> that with different clock speeds you can find different optimums.
>
> Actually, after thinking about it a bit more -
> I think it would more depend on how many RX/TX queues particular  lcore has to manage.
> So, as you said, yes we probably better leave it, at least for now.
>
>>
>>>
>>> Again, if there would be tx_free_pkts(), why someone would also need a tx_conf->tx_free_thresh?
>> I think about tx_free_pkts as a rainy day option, when you want ALL TX
>> completed packets to be released, because you are out of buffers.
>
> What do you mean as 'rainy day' here?
> App getting short of mbufs?
> As I said before, it could be absolutely valid situation when
> up to nb_tx_desc mbufs per TX queue can be in use.
> So the app better be prepared for such situation anyway.
> Either make sure there are enough mbufs in the pool,
> or somehow gracefully degrade the service.
> Again, if nb_tx_desc is big (let say for ixgbe it could be up to 4K),
> freeing all TXDs at once could take a long time.
> So I think it should be possible to specify maximum number of mbufs to free per call.
>
> My thought people will use it to release mbufs when there is an idle period.
> Something like:
>
> n = rx_burst(...);
> if (n == 0)  { ...; tx_free_mbufs(...); ... }
> else {...}

Yes, this is similar to what I'm doing in odp-dpdk:

https://git.linaro.org/lng/odp-dpdk.git/commitdiff/1a8df254e18bb50dbd835729bc3d01fcb87ebc6b

The only problem is when the tx_free_threshold is not reached, the 
interfaces doesn't even start releasing the mbufs. And when you can't 
receive anything, it's likely you won't send out anything as well, which 
would normally trigger the releasing of the completed buffers.

>
> Konstantin
>
>> While
>> tx_free_thresh is the fast path way of TX completion, when you have the
>> room to wait for more packets to be gathered.
>>
>>>
>>> Konstantin
>>>
>>>> In the meantime we can improve the default selection as well, as I
>>>> suggested above.
>>>>
>>>>> 2. As you suggested in another mail, introduce an new function:
>>>>> uint16_t rte_eth_tx_free_pkts(port_id, queue_id, nb_to_free).
>>>>> That would give upper layer a better control of memory usage, and might be called by the upper layer at idle time,
>>>>> so further tx_burst, don't need to spend time on freeing TXDs/packets.
>>>> I agree.
>>>>
>>>>>
>>>>> Konstantin
>>>>>
>>>>>
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Zoltan
>>>>>>
>>>>>> On 27/05/15 21:12, Zoltan Kiss wrote:
>>>>>>> This check doesn't do what's required by rte_eth_tx_burst:
>>>>>>> "When the number of previously sent packets reached the "minimum transmit
>>>>>>> packets to free" threshold"
>>>>>>>
>>>>>>> This can cause problems when txq->tx_free_thresh + [number of elements in the
>>>>>>> pool] < txq->nb_tx_desc.
>>>>>>>
>>>>>>> Signed-off-by: Zoltan Kiss <zoltan.kiss@linaro.org>
>>>>>>> ---
>>>>>>>      drivers/net/ixgbe/ixgbe_rxtx.c     | 4 ++--
>>>>>>>      drivers/net/ixgbe/ixgbe_rxtx_vec.c | 2 +-
>>>>>>>      2 files changed, 3 insertions(+), 3 deletions(-)
>>>>>>>
>>>>>>> diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
>>>>>>> index 4f9ab22..b70ed8c 100644
>>>>>>> --- a/drivers/net/ixgbe/ixgbe_rxtx.c
>>>>>>> +++ b/drivers/net/ixgbe/ixgbe_rxtx.c
>>>>>>> @@ -250,10 +250,10 @@ tx_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
>>>>>>>
>>>>>>>      	/*
>>>>>>>      	 * Begin scanning the H/W ring for done descriptors when the
>>>>>>> -	 * number of available descriptors drops below tx_free_thresh.  For
>>>>>>> +	 * number of in flight descriptors reaches tx_free_thresh. For
>>>>>>>      	 * each done descriptor, free the associated buffer.
>>>>>>>      	 */
>>>>>>> -	if (txq->nb_tx_free < txq->tx_free_thresh)
>>>>>>> +	if ((txq->nb_tx_desc - txq->nb_tx_free) > txq->tx_free_thresh)
>>>>>>>      		ixgbe_tx_free_bufs(txq);
>>>>>>>
>>>>>>>      	/* Only use descriptors that are available */
>>>>>>> diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec.c b/drivers/net/ixgbe/ixgbe_rxtx_vec.c
>>>>>>> index abd10f6..f91c698 100644
>>>>>>> --- a/drivers/net/ixgbe/ixgbe_rxtx_vec.c
>>>>>>> +++ b/drivers/net/ixgbe/ixgbe_rxtx_vec.c
>>>>>>> @@ -598,7 +598,7 @@ ixgbe_xmit_pkts_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
>>>>>>>      	if (unlikely(nb_pkts > RTE_IXGBE_VPMD_TX_BURST))
>>>>>>>      		nb_pkts = RTE_IXGBE_VPMD_TX_BURST;
>>>>>>>
>>>>>>> -	if (txq->nb_tx_free < txq->tx_free_thresh)
>>>>>>> +	if ((txq->nb_tx_desc - txq->nb_tx_free) > txq->tx_free_thresh)
>>>>>>>      		ixgbe_tx_free_bufs(txq);
>>>>>>>
>>>>>>>      	nb_commit = nb_pkts = (uint16_t)RTE_MIN(txq->nb_tx_free, nb_pkts);
>>>>>>>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [dpdk-dev] [PATCH] ixgbe: fix checking for tx_free_thresh
  2015-06-09 15:08             ` Zoltan Kiss
@ 2015-06-09 15:44               ` Ananyev, Konstantin
  2015-06-09 17:46                 ` Zoltan Kiss
  0 siblings, 1 reply; 12+ messages in thread
From: Ananyev, Konstantin @ 2015-06-09 15:44 UTC (permalink / raw)
  To: Zoltan Kiss, dev



> -----Original Message-----
> From: Zoltan Kiss [mailto:zoltan.kiss@linaro.org]
> Sent: Tuesday, June 09, 2015 4:08 PM
> To: Ananyev, Konstantin; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] ixgbe: fix checking for tx_free_thresh
> 
> 
> 
> On 09/06/15 12:18, Ananyev, Konstantin wrote:
> >
> >
> >> -----Original Message-----
> >> From: Zoltan Kiss [mailto:zoltan.kiss@linaro.org]
> >> Sent: Wednesday, June 03, 2015 6:47 PM
> >> To: Ananyev, Konstantin; dev@dpdk.org
> >> Subject: Re: [dpdk-dev] [PATCH] ixgbe: fix checking for tx_free_thresh
> >>
> >>
> >>
> >> On 02/06/15 18:35, Ananyev, Konstantin wrote:
> >>>
> >>>
> >>>> -----Original Message-----
> >>>> From: Zoltan Kiss [mailto:zoltan.kiss@linaro.org]
> >>>> Sent: Tuesday, June 02, 2015 4:08 PM
> >>>> To: Ananyev, Konstantin; dev@dpdk.org
> >>>> Subject: Re: [dpdk-dev] [PATCH] ixgbe: fix checking for tx_free_thresh
> >>>>
> >>>>
> >>>>
> >>>> On 02/06/15 14:31, Ananyev, Konstantin wrote:
> >>>>> Hi Zoltan,
> >>>>>
> >>>>>> -----Original Message-----
> >>>>>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Zoltan Kiss
> >>>>>> Sent: Monday, June 01, 2015 5:16 PM
> >>>>>> To: dev@dpdk.org
> >>>>>> Subject: Re: [dpdk-dev] [PATCH] ixgbe: fix checking for tx_free_thresh
> >>>>>>
> >>>>>> Hi,
> >>>>>>
> >>>>>> Anyone would like to review this patch? Venky sent a NAK, but I've
> >>>>>> explained to him why it is a bug.
> >>>>>
> >>>>>
> >>>>> Well, I think Venky is right here.
> >>>> I think the comments above rte_eth_tx_burst() definition are quite clear
> >>>> about what tx_free_thresh means, e1000 and i40e use it that way, but not
> >>>> ixgbe.
> >>>>
> >>>>> Indeed that fix, will cause more often unsuccessful checks for DD bits and might cause a
> >>>>> slowdown for TX fast-path.
> >>>> Not if the applications set tx_free_thresh according to the definition
> >>>> of this value. But we can change the default value from 32 to something
> >>>> higher, e.g I'm using nb_desc/2, and it works out well.
> >>>
> >>> Sure we can, as I said below, we can unify it one way or another.
> >>> One way would be to make fast-path TX to free TXDs when number of occupied TXDs raises above tx_free_thresh
> >>> (what rte_ethdev.h comments say and what full-featured TX is doing).
> >>> Though in that case we have to change default value for tx_free_thresh, and all existing apps that
> >>> using tx_free_thresh==32 and fast-path TX will probably experience a slowdown.
> >>
> >> They are in trouble already, because i40e and e1000 uses it as defined.
> >
> > In fact, i40e has exactly the same problem as ixgbe:
> > fast-path and full-featured TX  code treat  tx_free_thresh in a different way.
> > igb just ignores input tx_free_thresh, while em has only full featured path.
> >
> > What I am saying, existing app that uses TX fast-path and sets tx_free_thresh=32
> > (as we did in our examples in previous versions) will experience a slowdown,
> > if we'll make all TX functions to behave like full-featured ones
> > (txq->nb_tx_desc - txq->nb_tx_free > txq->tx_free_thresh).
> >
> >  From other side, if app uses TX full-featured TX and sets tx_free_thresh=32,
> > then it  already has a possible slowdown, because of too often TXDs checking.
> > So, if we'll change tx_free_thresh semantics to wht fast-path uses,
> > It shouldn't see any slowdown, in fact it might see some improvement.
> >
> >> But I guess most apps are going with 0, which sets the drivers default.
> >> Others have to change the value to nb_txd - curr_value to have the same
> >> behaviour
> >>
> >>> Another way would be to make all TX functions to treat tx_conf->tx_free_thresh as fast-path TX functions do
> >>> (free TXDs when number of free TXDs drops below  tx_free_thresh) and update  rte_ethdev.h comments.
> >> And i40e and e1000e code as well. I don't see what difference it makes
> >> which way of definition you use, what I care is that it should be used
> >> consistently.
> >
> > Yes, both ways are possible, the concern is - how to minimise the impact for existing apps.
> > That's why I am leaning to the fast-path way.
> 
> Make sense to favour the fast-path way, I'll look into that and try to
> come up with a patch
> 
> >
> >>>
> >>> Though, I am not sure that it really worth all these changes.
> >>>   From one side, whatever tx_free_thresh would be,
> >>> the app should still assume that the worst case might happen,
> >>> and up to nb_tx_desc mbufs can be consumed by the queue.
> >>>   From other side, I think the default value should work well for most cases.
> >>> So I am still for graceful deprecation of that config parameter, see below.
> >>>
> >>>>
> >>>>> Anyway, with current PMD implementation, you can't guarantee that at any moment
> >>>>> TX queue wouldn't use more than tx_free_thresh mbufs.
> >>>>
> >>>>
> >>>>> There could be situations (low speed, or link is down for some short period, etc), when
> >>>>> much more than tx_free_thresh TXDs are in use and none of them could be freed by HW right now.
> >>>>> So your app better be prepared, that up to (nb_tx_desc * num_of_TX_queues) could be in use
> >>>>> by TX path at any given moment.
> >>>>>
> >>>>> Though yes,  there is an inconsistency how different ixgbe TX functions treat tx_conf->tx_free_thresh parameter.
> >>>>> That probably creates wrong expectations and confusion.
> >>>> Yes, ixgbe_xmit_pkts() use it the way it's defined, this two function
> >>>> doesn't.
> >>>>
> >>>>> We might try to unify it's usage one way or another, but I personally don't see much point in it.
> >>>>> After all, tx_free_tresh seems like a driver internal choice (based on the nb_tx_desc and other parameters).
> >>>>> So I think a better way would be:
> >>>>> 1. Deprecate tx_conf->tx_free_thresh (and remove it in later releases) and make
> >>>>> each driver to use what it thinks would be the best value.
> >>>> But how does the driver knows what's the best for the applications
> >>>> traffic pattern? I think it's better to leave the possibility for the
> >>>> app to fine tune it.
> >>>
> >>> My understanding is that for most cases the default value should do pretty well.
> >>> That default value, shouldn't be too small, so we avoid unnecessary & unsuccessful checks,
> >>> and probably shouldn't be too big, to prevent unnecessary mbufs consumption
> >>> (something between nb_tx_desc / 2 and 3 * nb_tx_desc / 4 probably).
> >> I agree
> >>
> >>>
> >>> But might be you have a good example, when such tuning is needed?
> >>> For what traffic patterns you would set tx_free_thresh to some different values,
> >>> and how will it impact performance?
> >> I don't have an actual example, but I think it's worth to keep this
> >> tuning option if we already have it. Most people probably wouldn't use
> >> it, but I can imagine that the very enthusiastic wants to try out
> >> different settings to find the best.
> >> E.g. I was testing odp_l2fwd when I came across the problem, and I found
> >> it useful to have this option. With its traffic pattern (receive a batch
> >> of packets then send them out on an another interface) it can happen
> >> that with different clock speeds you can find different optimums.
> >
> > Actually, after thinking about it a bit more -
> > I think it would more depend on how many RX/TX queues particular  lcore has to manage.
> > So, as you said, yes we probably better leave it, at least for now.
> >
> >>
> >>>
> >>> Again, if there would be tx_free_pkts(), why someone would also need a tx_conf->tx_free_thresh?
> >> I think about tx_free_pkts as a rainy day option, when you want ALL TX
> >> completed packets to be released, because you are out of buffers.
> >
> > What do you mean as 'rainy day' here?
> > App getting short of mbufs?
> > As I said before, it could be absolutely valid situation when
> > up to nb_tx_desc mbufs per TX queue can be in use.
> > So the app better be prepared for such situation anyway.
> > Either make sure there are enough mbufs in the pool,
> > or somehow gracefully degrade the service.
> > Again, if nb_tx_desc is big (let say for ixgbe it could be up to 4K),
> > freeing all TXDs at once could take a long time.
> > So I think it should be possible to specify maximum number of mbufs to free per call.
> >
> > My thought people will use it to release mbufs when there is an idle period.
> > Something like:
> >
> > n = rx_burst(...);
> > if (n == 0)  { ...; tx_free_mbufs(...); ... }
> > else {...}
> 
> Yes, this is similar to what I'm doing in odp-dpdk:
> 
> https://git.linaro.org/lng/odp-dpdk.git/commitdiff/1a8df254e18bb50dbd835729bc3d01fcb87ebc6b
> 
> The only problem is when the tx_free_threshold is not reached, the
> interfaces doesn't even start releasing the mbufs. And when you can't
> receive anything, it's likely you won't send out anything as well, which
> would normally trigger the releasing of the completed buffers.

Yep, that's why I think it is good to add this function:
If you are going to TX something, then you'll probably hit tx_free_thresh
and PMD will release unused mbufs 'automatically'.
If you have nothing to TX (idle period), you can try to release unused mbufs manually:
by calling tx_free_mbufs().

Konstantin

> 
> >
> > Konstantin
> >
> >> While
> >> tx_free_thresh is the fast path way of TX completion, when you have the
> >> room to wait for more packets to be gathered.
> >>
> >>>
> >>> Konstantin
> >>>
> >>>> In the meantime we can improve the default selection as well, as I
> >>>> suggested above.
> >>>>
> >>>>> 2. As you suggested in another mail, introduce an new function:
> >>>>> uint16_t rte_eth_tx_free_pkts(port_id, queue_id, nb_to_free).
> >>>>> That would give upper layer a better control of memory usage, and might be called by the upper layer at idle time,
> >>>>> so further tx_burst, don't need to spend time on freeing TXDs/packets.
> >>>> I agree.
> >>>>
> >>>>>
> >>>>> Konstantin
> >>>>>
> >>>>>
> >>>>>>
> >>>>>> Regards,
> >>>>>>
> >>>>>> Zoltan
> >>>>>>
> >>>>>> On 27/05/15 21:12, Zoltan Kiss wrote:
> >>>>>>> This check doesn't do what's required by rte_eth_tx_burst:
> >>>>>>> "When the number of previously sent packets reached the "minimum transmit
> >>>>>>> packets to free" threshold"
> >>>>>>>
> >>>>>>> This can cause problems when txq->tx_free_thresh + [number of elements in the
> >>>>>>> pool] < txq->nb_tx_desc.
> >>>>>>>
> >>>>>>> Signed-off-by: Zoltan Kiss <zoltan.kiss@linaro.org>
> >>>>>>> ---
> >>>>>>>      drivers/net/ixgbe/ixgbe_rxtx.c     | 4 ++--
> >>>>>>>      drivers/net/ixgbe/ixgbe_rxtx_vec.c | 2 +-
> >>>>>>>      2 files changed, 3 insertions(+), 3 deletions(-)
> >>>>>>>
> >>>>>>> diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
> >>>>>>> index 4f9ab22..b70ed8c 100644
> >>>>>>> --- a/drivers/net/ixgbe/ixgbe_rxtx.c
> >>>>>>> +++ b/drivers/net/ixgbe/ixgbe_rxtx.c
> >>>>>>> @@ -250,10 +250,10 @@ tx_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
> >>>>>>>
> >>>>>>>      	/*
> >>>>>>>      	 * Begin scanning the H/W ring for done descriptors when the
> >>>>>>> -	 * number of available descriptors drops below tx_free_thresh.  For
> >>>>>>> +	 * number of in flight descriptors reaches tx_free_thresh. For
> >>>>>>>      	 * each done descriptor, free the associated buffer.
> >>>>>>>      	 */
> >>>>>>> -	if (txq->nb_tx_free < txq->tx_free_thresh)
> >>>>>>> +	if ((txq->nb_tx_desc - txq->nb_tx_free) > txq->tx_free_thresh)
> >>>>>>>      		ixgbe_tx_free_bufs(txq);
> >>>>>>>
> >>>>>>>      	/* Only use descriptors that are available */
> >>>>>>> diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec.c b/drivers/net/ixgbe/ixgbe_rxtx_vec.c
> >>>>>>> index abd10f6..f91c698 100644
> >>>>>>> --- a/drivers/net/ixgbe/ixgbe_rxtx_vec.c
> >>>>>>> +++ b/drivers/net/ixgbe/ixgbe_rxtx_vec.c
> >>>>>>> @@ -598,7 +598,7 @@ ixgbe_xmit_pkts_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
> >>>>>>>      	if (unlikely(nb_pkts > RTE_IXGBE_VPMD_TX_BURST))
> >>>>>>>      		nb_pkts = RTE_IXGBE_VPMD_TX_BURST;
> >>>>>>>
> >>>>>>> -	if (txq->nb_tx_free < txq->tx_free_thresh)
> >>>>>>> +	if ((txq->nb_tx_desc - txq->nb_tx_free) > txq->tx_free_thresh)
> >>>>>>>      		ixgbe_tx_free_bufs(txq);
> >>>>>>>
> >>>>>>>      	nb_commit = nb_pkts = (uint16_t)RTE_MIN(txq->nb_tx_free, nb_pkts);
> >>>>>>>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [dpdk-dev] [PATCH] ixgbe: fix checking for tx_free_thresh
  2015-06-09 15:44               ` Ananyev, Konstantin
@ 2015-06-09 17:46                 ` Zoltan Kiss
  0 siblings, 0 replies; 12+ messages in thread
From: Zoltan Kiss @ 2015-06-09 17:46 UTC (permalink / raw)
  To: Ananyev, Konstantin, dev



On 09/06/15 16:44, Ananyev, Konstantin wrote:
>
>
>> -----Original Message-----
>> From: Zoltan Kiss [mailto:zoltan.kiss@linaro.org]
>> Sent: Tuesday, June 09, 2015 4:08 PM
>> To: Ananyev, Konstantin; dev@dpdk.org
>> Subject: Re: [dpdk-dev] [PATCH] ixgbe: fix checking for tx_free_thresh
>>
>>
>>
>> On 09/06/15 12:18, Ananyev, Konstantin wrote:
>>>
>>>
>>>> -----Original Message-----
>>>> From: Zoltan Kiss [mailto:zoltan.kiss@linaro.org]
>>>> Sent: Wednesday, June 03, 2015 6:47 PM
>>>> To: Ananyev, Konstantin; dev@dpdk.org
>>>> Subject: Re: [dpdk-dev] [PATCH] ixgbe: fix checking for tx_free_thresh
>>>>
>>>>
>>>>
>>>> On 02/06/15 18:35, Ananyev, Konstantin wrote:
>>>>>
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Zoltan Kiss [mailto:zoltan.kiss@linaro.org]
>>>>>> Sent: Tuesday, June 02, 2015 4:08 PM
>>>>>> To: Ananyev, Konstantin; dev@dpdk.org
>>>>>> Subject: Re: [dpdk-dev] [PATCH] ixgbe: fix checking for tx_free_thresh
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 02/06/15 14:31, Ananyev, Konstantin wrote:
>>>>>>> Hi Zoltan,
>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Zoltan Kiss
>>>>>>>> Sent: Monday, June 01, 2015 5:16 PM
>>>>>>>> To: dev@dpdk.org
>>>>>>>> Subject: Re: [dpdk-dev] [PATCH] ixgbe: fix checking for tx_free_thresh
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Anyone would like to review this patch? Venky sent a NAK, but I've
>>>>>>>> explained to him why it is a bug.
>>>>>>>
>>>>>>>
>>>>>>> Well, I think Venky is right here.
>>>>>> I think the comments above rte_eth_tx_burst() definition are quite clear
>>>>>> about what tx_free_thresh means, e1000 and i40e use it that way, but not
>>>>>> ixgbe.
>>>>>>
>>>>>>> Indeed that fix, will cause more often unsuccessful checks for DD bits and might cause a
>>>>>>> slowdown for TX fast-path.
>>>>>> Not if the applications set tx_free_thresh according to the definition
>>>>>> of this value. But we can change the default value from 32 to something
>>>>>> higher, e.g I'm using nb_desc/2, and it works out well.
>>>>>
>>>>> Sure we can, as I said below, we can unify it one way or another.
>>>>> One way would be to make fast-path TX to free TXDs when number of occupied TXDs raises above tx_free_thresh
>>>>> (what rte_ethdev.h comments say and what full-featured TX is doing).
>>>>> Though in that case we have to change default value for tx_free_thresh, and all existing apps that
>>>>> using tx_free_thresh==32 and fast-path TX will probably experience a slowdown.
>>>>
>>>> They are in trouble already, because i40e and e1000 uses it as defined.
>>>
>>> In fact, i40e has exactly the same problem as ixgbe:
>>> fast-path and full-featured TX  code treat  tx_free_thresh in a different way.
>>> igb just ignores input tx_free_thresh, while em has only full featured path.
>>>
>>> What I am saying, existing app that uses TX fast-path and sets tx_free_thresh=32
>>> (as we did in our examples in previous versions) will experience a slowdown,
>>> if we'll make all TX functions to behave like full-featured ones
>>> (txq->nb_tx_desc - txq->nb_tx_free > txq->tx_free_thresh).
>>>
>>>   From other side, if app uses TX full-featured TX and sets tx_free_thresh=32,
>>> then it  already has a possible slowdown, because of too often TXDs checking.
>>> So, if we'll change tx_free_thresh semantics to wht fast-path uses,
>>> It shouldn't see any slowdown, in fact it might see some improvement.
>>>
>>>> But I guess most apps are going with 0, which sets the drivers default.
>>>> Others have to change the value to nb_txd - curr_value to have the same
>>>> behaviour
>>>>
>>>>> Another way would be to make all TX functions to treat tx_conf->tx_free_thresh as fast-path TX functions do
>>>>> (free TXDs when number of free TXDs drops below  tx_free_thresh) and update  rte_ethdev.h comments.
>>>> And i40e and e1000e code as well. I don't see what difference it makes
>>>> which way of definition you use, what I care is that it should be used
>>>> consistently.
>>>
>>> Yes, both ways are possible, the concern is - how to minimise the impact for existing apps.
>>> That's why I am leaning to the fast-path way.
>>
>> Make sense to favour the fast-path way, I'll look into that and try to
>> come up with a patch
>>
>>>
>>>>>
>>>>> Though, I am not sure that it really worth all these changes.
>>>>>    From one side, whatever tx_free_thresh would be,
>>>>> the app should still assume that the worst case might happen,
>>>>> and up to nb_tx_desc mbufs can be consumed by the queue.
>>>>>    From other side, I think the default value should work well for most cases.
>>>>> So I am still for graceful deprecation of that config parameter, see below.
>>>>>
>>>>>>
>>>>>>> Anyway, with current PMD implementation, you can't guarantee that at any moment
>>>>>>> TX queue wouldn't use more than tx_free_thresh mbufs.
>>>>>>
>>>>>>
>>>>>>> There could be situations (low speed, or link is down for some short period, etc), when
>>>>>>> much more than tx_free_thresh TXDs are in use and none of them could be freed by HW right now.
>>>>>>> So your app better be prepared, that up to (nb_tx_desc * num_of_TX_queues) could be in use
>>>>>>> by TX path at any given moment.
>>>>>>>
>>>>>>> Though yes,  there is an inconsistency how different ixgbe TX functions treat tx_conf->tx_free_thresh parameter.
>>>>>>> That probably creates wrong expectations and confusion.
>>>>>> Yes, ixgbe_xmit_pkts() use it the way it's defined, this two function
>>>>>> doesn't.
>>>>>>
>>>>>>> We might try to unify it's usage one way or another, but I personally don't see much point in it.
>>>>>>> After all, tx_free_tresh seems like a driver internal choice (based on the nb_tx_desc and other parameters).
>>>>>>> So I think a better way would be:
>>>>>>> 1. Deprecate tx_conf->tx_free_thresh (and remove it in later releases) and make
>>>>>>> each driver to use what it thinks would be the best value.
>>>>>> But how does the driver knows what's the best for the applications
>>>>>> traffic pattern? I think it's better to leave the possibility for the
>>>>>> app to fine tune it.
>>>>>
>>>>> My understanding is that for most cases the default value should do pretty well.
>>>>> That default value, shouldn't be too small, so we avoid unnecessary & unsuccessful checks,
>>>>> and probably shouldn't be too big, to prevent unnecessary mbufs consumption
>>>>> (something between nb_tx_desc / 2 and 3 * nb_tx_desc / 4 probably).
>>>> I agree
>>>>
>>>>>
>>>>> But might be you have a good example, when such tuning is needed?
>>>>> For what traffic patterns you would set tx_free_thresh to some different values,
>>>>> and how will it impact performance?
>>>> I don't have an actual example, but I think it's worth to keep this
>>>> tuning option if we already have it. Most people probably wouldn't use
>>>> it, but I can imagine that the very enthusiastic wants to try out
>>>> different settings to find the best.
>>>> E.g. I was testing odp_l2fwd when I came across the problem, and I found
>>>> it useful to have this option. With its traffic pattern (receive a batch
>>>> of packets then send them out on an another interface) it can happen
>>>> that with different clock speeds you can find different optimums.
>>>
>>> Actually, after thinking about it a bit more -
>>> I think it would more depend on how many RX/TX queues particular  lcore has to manage.
>>> So, as you said, yes we probably better leave it, at least for now.
>>>
>>>>
>>>>>
>>>>> Again, if there would be tx_free_pkts(), why someone would also need a tx_conf->tx_free_thresh?
>>>> I think about tx_free_pkts as a rainy day option, when you want ALL TX
>>>> completed packets to be released, because you are out of buffers.
>>>
>>> What do you mean as 'rainy day' here?
>>> App getting short of mbufs?
>>> As I said before, it could be absolutely valid situation when
>>> up to nb_tx_desc mbufs per TX queue can be in use.
>>> So the app better be prepared for such situation anyway.
>>> Either make sure there are enough mbufs in the pool,
>>> or somehow gracefully degrade the service.
>>> Again, if nb_tx_desc is big (let say for ixgbe it could be up to 4K),
>>> freeing all TXDs at once could take a long time.
>>> So I think it should be possible to specify maximum number of mbufs to free per call.
>>>
>>> My thought people will use it to release mbufs when there is an idle period.
>>> Something like:
>>>
>>> n = rx_burst(...);
>>> if (n == 0)  { ...; tx_free_mbufs(...); ... }
>>> else {...}
>>
>> Yes, this is similar to what I'm doing in odp-dpdk:
>>
>> https://git.linaro.org/lng/odp-dpdk.git/commitdiff/1a8df254e18bb50dbd835729bc3d01fcb87ebc6b
>>
>> The only problem is when the tx_free_threshold is not reached, the
>> interfaces doesn't even start releasing the mbufs. And when you can't
>> receive anything, it's likely you won't send out anything as well, which
>> would normally trigger the releasing of the completed buffers.
>
> Yep, that's why I think it is good to add this function:
> If you are going to TX something, then you'll probably hit tx_free_thresh
> and PMD will release unused mbufs 'automatically'.
> If you have nothing to TX (idle period), you can try to release unused mbufs manually:
> by calling tx_free_mbufs().

Ok, so we are on the same page on this. I just wanted to emphasize, that 
both this new function and tx_free_tresh is important: the former is 
when you either need the buffers or you just have spare cycles to do the 
work. The latter is for the usual TX path, when you want to do TX 
completion in the biggest possible batches.

>
> Konstantin
>
>>
>>>
>>> Konstantin
>>>
>>>> While
>>>> tx_free_thresh is the fast path way of TX completion, when you have the
>>>> room to wait for more packets to be gathered.
>>>>
>>>>>
>>>>> Konstantin
>>>>>
>>>>>> In the meantime we can improve the default selection as well, as I
>>>>>> suggested above.
>>>>>>
>>>>>>> 2. As you suggested in another mail, introduce an new function:
>>>>>>> uint16_t rte_eth_tx_free_pkts(port_id, queue_id, nb_to_free).
>>>>>>> That would give upper layer a better control of memory usage, and might be called by the upper layer at idle time,
>>>>>>> so further tx_burst, don't need to spend time on freeing TXDs/packets.
>>>>>> I agree.
>>>>>>
>>>>>>>
>>>>>>> Konstantin
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>>
>>>>>>>> Zoltan
>>>>>>>>
>>>>>>>> On 27/05/15 21:12, Zoltan Kiss wrote:
>>>>>>>>> This check doesn't do what's required by rte_eth_tx_burst:
>>>>>>>>> "When the number of previously sent packets reached the "minimum transmit
>>>>>>>>> packets to free" threshold"
>>>>>>>>>
>>>>>>>>> This can cause problems when txq->tx_free_thresh + [number of elements in the
>>>>>>>>> pool] < txq->nb_tx_desc.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Zoltan Kiss <zoltan.kiss@linaro.org>
>>>>>>>>> ---
>>>>>>>>>       drivers/net/ixgbe/ixgbe_rxtx.c     | 4 ++--
>>>>>>>>>       drivers/net/ixgbe/ixgbe_rxtx_vec.c | 2 +-
>>>>>>>>>       2 files changed, 3 insertions(+), 3 deletions(-)
>>>>>>>>>
>>>>>>>>> diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
>>>>>>>>> index 4f9ab22..b70ed8c 100644
>>>>>>>>> --- a/drivers/net/ixgbe/ixgbe_rxtx.c
>>>>>>>>> +++ b/drivers/net/ixgbe/ixgbe_rxtx.c
>>>>>>>>> @@ -250,10 +250,10 @@ tx_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
>>>>>>>>>
>>>>>>>>>       	/*
>>>>>>>>>       	 * Begin scanning the H/W ring for done descriptors when the
>>>>>>>>> -	 * number of available descriptors drops below tx_free_thresh.  For
>>>>>>>>> +	 * number of in flight descriptors reaches tx_free_thresh. For
>>>>>>>>>       	 * each done descriptor, free the associated buffer.
>>>>>>>>>       	 */
>>>>>>>>> -	if (txq->nb_tx_free < txq->tx_free_thresh)
>>>>>>>>> +	if ((txq->nb_tx_desc - txq->nb_tx_free) > txq->tx_free_thresh)
>>>>>>>>>       		ixgbe_tx_free_bufs(txq);
>>>>>>>>>
>>>>>>>>>       	/* Only use descriptors that are available */
>>>>>>>>> diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec.c b/drivers/net/ixgbe/ixgbe_rxtx_vec.c
>>>>>>>>> index abd10f6..f91c698 100644
>>>>>>>>> --- a/drivers/net/ixgbe/ixgbe_rxtx_vec.c
>>>>>>>>> +++ b/drivers/net/ixgbe/ixgbe_rxtx_vec.c
>>>>>>>>> @@ -598,7 +598,7 @@ ixgbe_xmit_pkts_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
>>>>>>>>>       	if (unlikely(nb_pkts > RTE_IXGBE_VPMD_TX_BURST))
>>>>>>>>>       		nb_pkts = RTE_IXGBE_VPMD_TX_BURST;
>>>>>>>>>
>>>>>>>>> -	if (txq->nb_tx_free < txq->tx_free_thresh)
>>>>>>>>> +	if ((txq->nb_tx_desc - txq->nb_tx_free) > txq->tx_free_thresh)
>>>>>>>>>       		ixgbe_tx_free_bufs(txq);
>>>>>>>>>
>>>>>>>>>       	nb_commit = nb_pkts = (uint16_t)RTE_MIN(txq->nb_tx_free, nb_pkts);
>>>>>>>>>

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2015-06-09 17:53 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-05-27 20:12 [dpdk-dev] [PATCH] ixgbe: fix checking for tx_free_thresh Zoltan Kiss
2015-05-28 10:50 ` Venkatesan, Venky
2015-05-28 11:12   ` Zoltan Kiss
2015-06-01 16:15 ` Zoltan Kiss
2015-06-02 13:31   ` Ananyev, Konstantin
2015-06-02 15:08     ` Zoltan Kiss
2015-06-02 17:35       ` Ananyev, Konstantin
2015-06-03 17:46         ` Zoltan Kiss
2015-06-09 11:18           ` Ananyev, Konstantin
2015-06-09 15:08             ` Zoltan Kiss
2015-06-09 15:44               ` Ananyev, Konstantin
2015-06-09 17:46                 ` Zoltan Kiss

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).