From: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>
To: Zoltan Kiss <zoltan.kiss@linaro.org>, "dev@dpdk.org" <dev@dpdk.org>
Subject: Re: [dpdk-dev] [PATCH] ixgbe: fix checking for tx_free_thresh
Date: Tue, 9 Jun 2015 11:18:12 +0000 [thread overview]
Message-ID: <2601191342CEEE43887BDE71AB97725821436355@irsmsx105.ger.corp.intel.com> (raw)
In-Reply-To: <556F3D80.3070904@linaro.org>
> -----Original Message-----
> From: Zoltan Kiss [mailto:zoltan.kiss@linaro.org]
> Sent: Wednesday, June 03, 2015 6:47 PM
> To: Ananyev, Konstantin; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] ixgbe: fix checking for tx_free_thresh
>
>
>
> On 02/06/15 18:35, Ananyev, Konstantin wrote:
> >
> >
> >> -----Original Message-----
> >> From: Zoltan Kiss [mailto:zoltan.kiss@linaro.org]
> >> Sent: Tuesday, June 02, 2015 4:08 PM
> >> To: Ananyev, Konstantin; dev@dpdk.org
> >> Subject: Re: [dpdk-dev] [PATCH] ixgbe: fix checking for tx_free_thresh
> >>
> >>
> >>
> >> On 02/06/15 14:31, Ananyev, Konstantin wrote:
> >>> Hi Zoltan,
> >>>
> >>>> -----Original Message-----
> >>>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Zoltan Kiss
> >>>> Sent: Monday, June 01, 2015 5:16 PM
> >>>> To: dev@dpdk.org
> >>>> Subject: Re: [dpdk-dev] [PATCH] ixgbe: fix checking for tx_free_thresh
> >>>>
> >>>> Hi,
> >>>>
> >>>> Would anyone like to review this patch? Venky sent a NAK, but I've
> >>>> explained to him why it is a bug.
> >>>
> >>>
> >>> Well, I think Venky is right here.
> >> I think the comments above the rte_eth_tx_burst() definition are quite clear
> >> about what tx_free_thresh means; e1000 and i40e use it that way, but
> >> ixgbe doesn't.
> >>
> >>> Indeed, that fix will cause more frequent unsuccessful checks for DD bits and might cause a
> >>> slowdown for the TX fast path.
> >> Not if the applications set tx_free_thresh according to the definition
> >> of this value. But we can change the default value from 32 to something
> >> higher, e.g. I'm using nb_desc/2, and it works out well.
> >
> > Sure we can; as I said below, we can unify it one way or another.
> > One way would be to make fast-path TX free TXDs when the number of occupied TXDs rises above tx_free_thresh
> > (what the rte_ethdev.h comments say and what full-featured TX is doing).
> > Though in that case we have to change the default value for tx_free_thresh, and all existing apps
> > using tx_free_thresh==32 and fast-path TX will probably experience a slowdown.
>
> They are in trouble already, because i40e and e1000 use it as defined.
In fact, i40e has exactly the same problem as ixgbe:
the fast-path and full-featured TX code treat tx_free_thresh in different ways.
igb simply ignores the input tx_free_thresh, while em has only the full-featured path.
What I am saying is that an existing app that uses the TX fast path and sets tx_free_thresh=32
(as we did in our examples in previous versions) will experience a slowdown
if we make all TX functions behave like the full-featured ones
(txq->nb_tx_desc - txq->nb_tx_free > txq->tx_free_thresh).
From the other side, if an app uses the full-featured TX path and sets tx_free_thresh=32,
then it already has a possible slowdown, because the TXDs are checked too often.
So, if we change the tx_free_thresh semantics to what the fast path uses,
it shouldn't see any slowdown; in fact, it might see some improvement.
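Just to make the two semantics concrete, here are the two checks side by side
(the conditions are taken from the ixgbe code and the patch below; this is only
an illustration, not new driver code):

  /* fast-path semantics: reclaim when the number of *free* TXDs
   * drops below the threshold */
  if (txq->nb_tx_free < txq->tx_free_thresh)
          ixgbe_tx_free_bufs(txq);

  /* full-featured semantics (what the rte_ethdev.h comments describe):
   * reclaim when the number of *used* TXDs exceeds the threshold */
  if ((txq->nb_tx_desc - txq->nb_tx_free) > txq->tx_free_thresh)
          ixgbe_tx_free_bufs(txq);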
> But I guess most apps go with 0, which selects the driver's default.
> Others have to change the value to nb_txd - curr_value to get the same
> behaviour.
>
> > Another way would be to make all TX functions treat tx_conf->tx_free_thresh as the fast-path TX functions do
> > (free TXDs when the number of free TXDs drops below tx_free_thresh) and update the rte_ethdev.h comments.
> And the i40e and e1000e code as well. I don't see what difference it makes
> which definition you use; what I care about is that it is used
> consistently.
Yes, both ways are possible; the concern is how to minimise the impact on existing apps.
That's why I am leaning towards the fast-path way.
> >
> > Though, I am not sure it is really worth all these changes.
> > From one side, whatever tx_free_thresh is,
> > the app should still assume that the worst case might happen,
> > and up to nb_tx_desc mbufs can be consumed by the queue.
> > From the other side, I think the default value should work well for most cases.
> > So I am still for a graceful deprecation of that config parameter, see below.
> >
> >>
> >>> Anyway, with the current PMD implementation, you can't guarantee that at any moment
> >>> the TX queue won't use more than tx_free_thresh mbufs.
> >>
> >>
> >>> There could be situations (low speed, the link being down for some short period, etc.) when
> >>> many more than tx_free_thresh TXDs are in use and none of them can be freed by the HW right now.
> >>> So your app had better be prepared for up to (nb_tx_desc * num_of_TX_queues) mbufs being in use
> >>> by the TX path at any given moment.
> >>>
> >>> Though yes, there is an inconsistency in how the different ixgbe TX functions treat the tx_conf->tx_free_thresh parameter.
> >>> That probably creates wrong expectations and confusion.
> >> Yes, ixgbe_xmit_pkts() uses it the way it's defined; these two functions
> >> don't.
> >>
> >>> We might try to unify its usage one way or another, but I personally don't see much point in it.
> >>> After all, tx_free_thresh seems like a driver-internal choice (based on nb_tx_desc and other parameters).
> >>> So I think a better way would be:
> >>> 1. Deprecate tx_conf->tx_free_thresh (and remove it in later releases) and let
> >>> each driver use what it thinks would be the best value.
> >> But how does the driver know what's best for the application's
> >> traffic pattern? I think it's better to leave the app the possibility
> >> to fine-tune it.
> >
> > My understanding is that for most cases the default value should do pretty well.
> > That default value shouldn't be too small, so we avoid unnecessary & unsuccessful checks,
> > and probably shouldn't be too big, to prevent unnecessary mbuf consumption
> > (something between nb_tx_desc / 2 and 3 * nb_tx_desc / 4 probably).
> I agree
>
> >
> > But maybe you have a good example of when such tuning is needed?
> > For what traffic patterns would you set tx_free_thresh to different values,
> > and how would it impact performance?
> I don't have an actual example, but I think it's worth keeping this
> tuning option if we already have it. Most people probably wouldn't use
> it, but I can imagine that the very enthusiastic will want to try out
> different settings to find the best one.
> E.g. I was testing odp_l2fwd when I came across the problem, and I found
> it useful to have this option. With its traffic pattern (receive a batch
> of packets, then send them out on another interface) it can happen
> that with different clock speeds you find different optimums.
Actually, after thinking about it a bit more,
I think it would depend more on how many RX/TX queues a particular lcore has to manage.
So, as you said, yes, we are probably better off leaving it, at least for now.
>
> >
> > Again, if there were a tx_free_pkts(), why would someone also need tx_conf->tx_free_thresh?
> I think of tx_free_pkts as a rainy-day option, for when you want ALL
> TX-completed packets to be released because you are out of buffers.
What do you mean by 'rainy day' here?
The app running short of mbufs?
As I said before, it can be a perfectly valid situation when
up to nb_tx_desc mbufs per TX queue are in use.
So the app had better be prepared for such a situation anyway:
either make sure there are enough mbufs in the pool,
or somehow gracefully degrade the service.
Again, if nb_tx_desc is big (let's say for ixgbe it could be up to 4K),
freeing all TXDs at once could take a long time.
So I think it should be possible to specify the maximum number of mbufs to free per call.
My thought is that people will use it to release mbufs when there is an idle period.
Something like:
n = rx_burst(...);
if (n == 0) { ...; tx_free_mbufs(...); ... }
else {...}
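To sketch that a bit more concretely (purely illustrative: rte_eth_tx_free_pkts()
is only the function proposed earlier in this thread, and port_id, rx_queue_id,
tx_queue_id, BURST_SIZE and FREE_BUDGET are placeholders):

  nb_rx = rte_eth_rx_burst(port_id, rx_queue_id, pkts, BURST_SIZE);
  if (nb_rx == 0) {
          /* idle moment: reclaim a bounded number of completed TX mbufs now,
           * so a later tx_burst doesn't have to spend time freeing them */
          rte_eth_tx_free_pkts(port_id, tx_queue_id, FREE_BUDGET);
  } else {
          /* process and transmit the received packets as usual */
  }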
Konstantin
> While
> tx_free_thresh is the fast-path way of TX completion, for when you have the
> room to wait for more packets to be gathered.
>
> >
> > Konstantin
> >
> >> In the meantime we can improve the default selection as well, as I
> >> suggested above.
> >>
> >>> 2. As you suggested in another mail, introduce a new function:
> >>> uint16_t rte_eth_tx_free_pkts(port_id, queue_id, nb_to_free).
> >>> That would give the upper layer better control of memory usage, and it might be called by the upper layer at idle time,
> >>> so further tx_burst calls don't need to spend time on freeing TXDs/packets.
> >> I agree.
> >>
> >>>
> >>> Konstantin
> >>>
> >>>
> >>>>
> >>>> Regards,
> >>>>
> >>>> Zoltan
> >>>>
> >>>> On 27/05/15 21:12, Zoltan Kiss wrote:
> >>>>> This check doesn't do what's required by rte_eth_tx_burst:
> >>>>> "When the number of previously sent packets reached the "minimum transmit
> >>>>> packets to free" threshold"
> >>>>>
> >>>>> This can cause problems when txq->tx_free_thresh + [number of elements in the
> >>>>> pool] < txq->nb_tx_desc.
> >>>>>
> >>>>> Signed-off-by: Zoltan Kiss <zoltan.kiss@linaro.org>
> >>>>> ---
> >>>>> drivers/net/ixgbe/ixgbe_rxtx.c | 4 ++--
> >>>>> drivers/net/ixgbe/ixgbe_rxtx_vec.c | 2 +-
> >>>>> 2 files changed, 3 insertions(+), 3 deletions(-)
> >>>>>
> >>>>> diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
> >>>>> index 4f9ab22..b70ed8c 100644
> >>>>> --- a/drivers/net/ixgbe/ixgbe_rxtx.c
> >>>>> +++ b/drivers/net/ixgbe/ixgbe_rxtx.c
> >>>>> @@ -250,10 +250,10 @@ tx_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
> >>>>>
> >>>>> /*
> >>>>> * Begin scanning the H/W ring for done descriptors when the
> >>>>> - * number of available descriptors drops below tx_free_thresh. For
> >>>>> + * number of in flight descriptors reaches tx_free_thresh. For
> >>>>> * each done descriptor, free the associated buffer.
> >>>>> */
> >>>>> - if (txq->nb_tx_free < txq->tx_free_thresh)
> >>>>> + if ((txq->nb_tx_desc - txq->nb_tx_free) > txq->tx_free_thresh)
> >>>>> ixgbe_tx_free_bufs(txq);
> >>>>>
> >>>>> /* Only use descriptors that are available */
> >>>>> diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec.c b/drivers/net/ixgbe/ixgbe_rxtx_vec.c
> >>>>> index abd10f6..f91c698 100644
> >>>>> --- a/drivers/net/ixgbe/ixgbe_rxtx_vec.c
> >>>>> +++ b/drivers/net/ixgbe/ixgbe_rxtx_vec.c
> >>>>> @@ -598,7 +598,7 @@ ixgbe_xmit_pkts_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
> >>>>> if (unlikely(nb_pkts > RTE_IXGBE_VPMD_TX_BURST))
> >>>>> nb_pkts = RTE_IXGBE_VPMD_TX_BURST;
> >>>>>
> >>>>> - if (txq->nb_tx_free < txq->tx_free_thresh)
> >>>>> + if ((txq->nb_tx_desc - txq->nb_tx_free) > txq->tx_free_thresh)
> >>>>> ixgbe_tx_free_bufs(txq);
> >>>>>
> >>>>> nb_commit = nb_pkts = (uint16_t)RTE_MIN(txq->nb_tx_free, nb_pkts);
> >>>>>