From: Zoltan Kiss
Date: Tue, 09 Jun 2015 16:08:23 +0100
Message-ID: <55770167.6080701@linaro.org>
To: "Ananyev, Konstantin" , "dev@dpdk.org"
References: <1432757539-8544-1-git-send-email-zoltan.kiss@linaro.org> <556C853E.8090902@linaro.org> <2601191342CEEE43887BDE71AB977258214346AE@irsmsx105.ger.corp.intel.com> <556DC6D9.3060008@linaro.org> <2601191342CEEE43887BDE71AB977258214348AA@irsmsx105.ger.corp.intel.com> <556F3D80.3070904@linaro.org>
<2601191342CEEE43887BDE71AB97725821436355@irsmsx105.ger.corp.intel.com>
In-Reply-To: <2601191342CEEE43887BDE71AB97725821436355@irsmsx105.ger.corp.intel.com>
Subject: Re: [dpdk-dev] [PATCH] ixgbe: fix checking for tx_free_thresh
List-Id: patches and discussions about DPDK

On 09/06/15 12:18, Ananyev, Konstantin wrote:
>
>
>> -----Original Message-----
>> From: Zoltan Kiss [mailto:zoltan.kiss@linaro.org]
>> Sent: Wednesday, June 03, 2015 6:47 PM
>> To: Ananyev, Konstantin; dev@dpdk.org
>> Subject: Re: [dpdk-dev] [PATCH] ixgbe: fix checking for tx_free_thresh
>>
>>
>>
>> On 02/06/15 18:35, Ananyev, Konstantin wrote:
>>>
>>>
>>>> -----Original Message-----
>>>> From: Zoltan Kiss [mailto:zoltan.kiss@linaro.org]
>>>> Sent: Tuesday, June 02, 2015 4:08 PM
>>>> To: Ananyev, Konstantin; dev@dpdk.org
>>>> Subject: Re: [dpdk-dev] [PATCH] ixgbe: fix checking for tx_free_thresh
>>>>
>>>>
>>>>
>>>> On 02/06/15 14:31, Ananyev, Konstantin wrote:
>>>>> Hi Zoltan,
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Zoltan Kiss
>>>>>> Sent: Monday, June 01, 2015 5:16 PM
>>>>>> To: dev@dpdk.org
>>>>>> Subject: Re: [dpdk-dev] [PATCH] ixgbe: fix checking for tx_free_thresh
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Would anyone like to review this patch? Venky sent a NAK, but I've
>>>>>> explained to him why it is a bug.
>>>>>
>>>>>
>>>>> Well, I think Venky is right here.
>>>> I think the comments above the rte_eth_tx_burst() definition are quite clear
>>>> about what tx_free_thresh means; e1000 and i40e use it that way, but not
>>>> ixgbe.
>>>>
>>>>> Indeed that fix will cause more frequent unsuccessful checks for DD bits and might cause a
>>>>> slowdown for the TX fast-path.
>>>> Not if the applications set tx_free_thresh according to the definition
>>>> of this value. But we can change the default value from 32 to something
>>>> higher, e.g. I'm using nb_desc/2, and it works out well.
>>>
>>> Sure we can; as I said below, we can unify it one way or another.
>>> One way would be to make fast-path TX free TXDs when the number of occupied TXDs rises above tx_free_thresh
>>> (what the rte_ethdev.h comments say and what full-featured TX is doing).
>>> Though in that case we have to change the default value for tx_free_thresh, and all existing apps
>>> using tx_free_thresh==32 and fast-path TX will probably experience a slowdown.
>>
>> They are in trouble already, because i40e and e1000 use it as defined.
>
> In fact, i40e has exactly the same problem as ixgbe:
> the fast-path and full-featured TX code treat tx_free_thresh in different ways.
> igb just ignores the input tx_free_thresh, while em has only the full-featured path.
>
> What I am saying is that an existing app that uses TX fast-path and sets tx_free_thresh=32
> (as we did in our examples in previous versions) will experience a slowdown
> if we make all TX functions behave like the full-featured ones
> (txq->nb_tx_desc - txq->nb_tx_free > txq->tx_free_thresh).
>
> From the other side, if an app uses full-featured TX and sets tx_free_thresh=32,
> then it already has a possible slowdown, because of too-frequent TXD checking.
> So, if we change the tx_free_thresh semantics to what the fast-path uses,
> it shouldn't see any slowdown; in fact it might see some improvement.
>
>> But I guess most apps are going with 0, which selects the driver's default.
>> Others have to change the value to nb_txd - curr_value to have the same
>> behaviour.
>>
>>> Another way would be to make all TX functions treat tx_conf->tx_free_thresh as the fast-path TX functions do
>>> (free TXDs when the number of free TXDs drops below tx_free_thresh) and update the rte_ethdev.h comments.
>> And the i40e and e1000e code as well. I don't see what difference it makes
>> which definition you use; what I care about is that it should be used
>> consistently.
>
> Yes, both ways are possible; the concern is how to minimise the impact on existing apps.
> That's why I am leaning towards the fast-path way.

Makes sense to favour the fast-path way; I'll look into that and try to come up with a patch.

>
>>>
>>> Though, I am not sure that it is really worth all these changes.
>>> From one side, whatever tx_free_thresh would be,
>>> the app should still assume that the worst case might happen,
>>> and up to nb_tx_desc mbufs can be consumed by the queue.
>>> From the other side, I think the default value should work well for most cases.
>>> So I am still for graceful deprecation of that config parameter, see below.
>>>
>>>>
>>>>> Anyway, with the current PMD implementation, you can't guarantee that at any moment
>>>>> the TX queue wouldn't use more than tx_free_thresh mbufs.
>>>>
>>>>
>>>>> There could be situations (low speed, or the link is down for some short period, etc.) when
>>>>> much more than tx_free_thresh TXDs are in use and none of them can be freed by HW right now.
>>>>> So your app had better be prepared that up to (nb_tx_desc * num_of_TX_queues) mbufs could be in use
>>>>> by the TX path at any given moment.
>>>>>
>>>>> Though yes, there is an inconsistency in how different ixgbe TX functions treat the tx_conf->tx_free_thresh parameter.
>>>>> That probably creates wrong expectations and confusion.
>>>> Yes, ixgbe_xmit_pkts() uses it the way it's defined; these two functions
>>>> don't.
>>>>
>>>>> We might try to unify its usage one way or another, but I personally don't see much point in it.
>>>>> After all, tx_free_thresh seems like a driver-internal choice (based on nb_tx_desc and other parameters).
>>>>> So I think a better way would be:
>>>>> 1. Deprecate tx_conf->tx_free_thresh (and remove it in later releases) and make
>>>>> each driver use what it thinks would be the best value.
>>>> But how does the driver know what's best for the application's
>>>> traffic pattern? I think it's better to leave the possibility for the
>>>> app to fine-tune it.
>>>
>>> My understanding is that for most cases the default value should do pretty well.
>>> That default value shouldn't be too small, so we avoid unnecessary & unsuccessful checks,
>>> and probably shouldn't be too big, to prevent unnecessary mbuf consumption
>>> (something between nb_tx_desc / 2 and 3 * nb_tx_desc / 4, probably).
>> I agree.
>>
>>>
>>> But maybe you have a good example of when such tuning is needed?
>>> For what traffic patterns would you set tx_free_thresh to different values,
>>> and how would it impact performance?
>> I don't have an actual example, but I think it's worth keeping this
>> tuning option if we already have it. Most people probably wouldn't use
>> it, but I can imagine that the very enthusiastic would want to try out
>> different settings to find the best one.
>> E.g. I was testing odp_l2fwd when I came across the problem, and I found
>> it useful to have this option. With its traffic pattern (receive a batch
>> of packets, then send them out on another interface) it can happen
>> that with different clock speeds you find different optimums.
>
> Actually, after thinking about it a bit more,
> I think it would depend more on how many RX/TX queues a particular lcore has to manage.
> So, as you said, yes, we'd probably better leave it, at least for now.
>
>>
>>>
>>> Again, if there were a tx_free_pkts(), why would someone also need tx_conf->tx_free_thresh?
>> I think about tx_free_pkts as a rainy-day option, for when you want ALL
>> TX-completed packets to be released, because you are out of buffers.
>
> What do you mean by 'rainy day' here?
> The app getting short of mbufs?
> As I said before, it can be a perfectly valid situation for
> up to nb_tx_desc mbufs per TX queue to be in use.
> So the app had better be prepared for such a situation anyway:
> either make sure there are enough mbufs in the pool,
> or somehow gracefully degrade the service.
> Again, if nb_tx_desc is big (let's say for ixgbe it could be up to 4K),
> freeing all TXDs at once could take a long time.
> So I think it should be possible to specify the maximum number of mbufs to free per call.
>
> My thought is that people will use it to release mbufs when there is an idle period.
> Something like:
>
> n = rx_burst(...);
> if (n == 0) { ...; tx_free_mbufs(...); ... }
> else {...}

Yes, this is similar to what I'm doing in odp-dpdk:
https://git.linaro.org/lng/odp-dpdk.git/commitdiff/1a8df254e18bb50dbd835729bc3d01fcb87ebc6b
The only problem is that when tx_free_thresh is not reached, the interface
doesn't even start releasing the mbufs. And when you can't receive anything,
it's likely you won't send anything either, which is what would normally
trigger the release of the completed buffers.

>
> Konstantin
>
>> While
>> tx_free_thresh is the fast-path way of TX completion, for when you have the
>> room to wait for more packets to be gathered.
>>
>>>
>>> Konstantin
>>>
>>>> In the meantime we can improve the default selection as well, as I
>>>> suggested above.
>>>>
>>>>> 2. As you suggested in another mail, introduce a new function:
>>>>> uint16_t rte_eth_tx_free_pkts(port_id, queue_id, nb_to_free).
>>>>> That would give the upper layer better control of memory usage, and it could be called by the upper layer at idle time,
>>>>> so further tx_burst calls don't need to spend time freeing TXDs/packets.
>>>> I agree.
>>>>
>>>>>
>>>>> Konstantin
>>>>>
>>>>>
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Zoltan
>>>>>>
>>>>>> On 27/05/15 21:12, Zoltan Kiss wrote:
>>>>>>> This check doesn't do what's required by rte_eth_tx_burst:
>>>>>>> "When the number of previously sent packets reached the "minimum transmit
>>>>>>> packets to free" threshold"
>>>>>>>
>>>>>>> This can cause problems when txq->tx_free_thresh + [number of elements in the
>>>>>>> pool] < txq->nb_tx_desc.
>>>>>>>
>>>>>>> Signed-off-by: Zoltan Kiss
>>>>>>> ---
>>>>>>>  drivers/net/ixgbe/ixgbe_rxtx.c     | 4 ++--
>>>>>>>  drivers/net/ixgbe/ixgbe_rxtx_vec.c | 2 +-
>>>>>>>  2 files changed, 3 insertions(+), 3 deletions(-)
>>>>>>>
>>>>>>> diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
>>>>>>> index 4f9ab22..b70ed8c 100644
>>>>>>> --- a/drivers/net/ixgbe/ixgbe_rxtx.c
>>>>>>> +++ b/drivers/net/ixgbe/ixgbe_rxtx.c
>>>>>>> @@ -250,10 +250,10 @@ tx_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
>>>>>>>
>>>>>>>  	/*
>>>>>>>  	 * Begin scanning the H/W ring for done descriptors when the
>>>>>>> -	 * number of available descriptors drops below tx_free_thresh. For
>>>>>>> +	 * number of in flight descriptors reaches tx_free_thresh. For
>>>>>>>  	 * each done descriptor, free the associated buffer.
>>>>>>>  	 */
>>>>>>> -	if (txq->nb_tx_free < txq->tx_free_thresh)
>>>>>>> +	if ((txq->nb_tx_desc - txq->nb_tx_free) > txq->tx_free_thresh)
>>>>>>>  		ixgbe_tx_free_bufs(txq);
>>>>>>>
>>>>>>>  	/* Only use descriptors that are available */
>>>>>>> diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec.c b/drivers/net/ixgbe/ixgbe_rxtx_vec.c
>>>>>>> index abd10f6..f91c698 100644
>>>>>>> --- a/drivers/net/ixgbe/ixgbe_rxtx_vec.c
>>>>>>> +++ b/drivers/net/ixgbe/ixgbe_rxtx_vec.c
>>>>>>> @@ -598,7 +598,7 @@ ixgbe_xmit_pkts_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
>>>>>>>  	if (unlikely(nb_pkts > RTE_IXGBE_VPMD_TX_BURST))
>>>>>>>  		nb_pkts = RTE_IXGBE_VPMD_TX_BURST;
>>>>>>>
>>>>>>> -	if (txq->nb_tx_free < txq->tx_free_thresh)
>>>>>>> +	if ((txq->nb_tx_desc - txq->nb_tx_free) > txq->tx_free_thresh)
>>>>>>>  		ixgbe_tx_free_bufs(txq);
>>>>>>>
>>>>>>>  	nb_commit = nb_pkts = (uint16_t)RTE_MIN(txq->nb_tx_free, nb_pkts);
>>>>>>>