From: Zoltan Kiss
Date: Wed, 03 Jun 2015 18:46:40 +0100
To: "Ananyev, Konstantin", "dev@dpdk.org"
Message-ID: <556F3D80.3070904@linaro.org>
In-Reply-To: <2601191342CEEE43887BDE71AB977258214348AA@irsmsx105.ger.corp.intel.com>
Subject: Re: [dpdk-dev] [PATCH] ixgbe: fix checking for tx_free_thresh

On 02/06/15 18:35, Ananyev, Konstantin wrote:
>
>> -----Original Message-----
>> From: Zoltan Kiss [mailto:zoltan.kiss@linaro.org]
>> Sent: Tuesday, June 02, 2015 4:08 PM
>> To: Ananyev, Konstantin; dev@dpdk.org
>> Subject: Re: [dpdk-dev] [PATCH] ixgbe: fix checking for tx_free_thresh
>>
>> On 02/06/15 14:31, Ananyev, Konstantin wrote:
>>> Hi Zoltan,
>>>
>>>> -----Original Message-----
>>>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Zoltan Kiss
>>>> Sent: Monday, June 01, 2015 5:16 PM
>>>> To: dev@dpdk.org
>>>> Subject: Re: [dpdk-dev] [PATCH] ixgbe: fix checking for tx_free_thresh
>>>>
>>>> Hi,
>>>>
>>>> Would anyone like to review this patch? Venky sent a NAK, but I've
>>>> explained to him why it is a bug.
>>>
>>> Well, I think Venky is right here.
>> I think the comments above the rte_eth_tx_burst() definition are quite clear
>> about what tx_free_thresh means; e1000 and i40e use it that way, but not
>> ixgbe.
>>
>>> Indeed, that fix will cause more frequent unsuccessful checks for DD bits and might cause a
>>> slowdown for TX fast-path.
>> Not if the applications set tx_free_thresh according to the definition
>> of this value. But we can change the default value from 32 to something
>> higher, e.g. I'm using nb_desc/2, and it works out well.
>
> Sure we can; as I said below, we can unify it one way or another.
> One way would be to make fast-path TX free TXDs when the number of occupied TXDs rises above tx_free_thresh
> (what the rte_ethdev.h comments say and what full-featured TX is doing).
> Though in that case we would have to change the default value for tx_free_thresh, and all existing apps
> using tx_free_thresh==32 and fast-path TX will probably experience a slowdown.

They are already in trouble, because i40e and e1000 use it as defined. But I guess most apps pass 0, which selects the driver's default. Others would have to change the value to nb_txd - curr_value to keep the same behaviour.

> Another way would be to make all TX functions treat tx_conf->tx_free_thresh as the fast-path TX functions do
> (free TXDs when the number of free TXDs drops below tx_free_thresh) and update the rte_ethdev.h comments.

And the i40e and e1000e code as well. I don't think it matters which definition we pick; what I care about is that it is used consistently.

>
> Though, I am not sure that it is really worth all these changes.
> On one side, whatever tx_free_thresh would be,
> the app should still assume that the worst case might happen,
> and up to nb_tx_desc mbufs can be consumed by the queue.
> On the other side, I think the default value should work well for most cases.
> So I am still for graceful deprecation of that config parameter, see below.
>
>>
>>> Anyway, with the current PMD implementation, you can't guarantee that at any moment
>>> the TX queue wouldn't use more than tx_free_thresh mbufs.
>>
>>> There could be situations (low speed, or the link is down for some short period, etc.) when
>>> many more than tx_free_thresh TXDs are in use and none of them could be freed by HW right now.
>>> So your app had better be prepared for up to (nb_tx_desc * num_of_TX_queues) mbufs being in use
>>> by the TX path at any given moment.
>>>
>>> Though yes, there is an inconsistency in how different ixgbe TX functions treat the tx_conf->tx_free_thresh parameter.
>>> That probably creates wrong expectations and confusion.
>> Yes, ixgbe_xmit_pkts() uses it the way it's defined; these two functions
>> don't.
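To make the two conventions concrete, here is a minimal C sketch. The struct txq_counters type and the helper names are hypothetical; only the field names (nb_tx_desc, nb_tx_free, tx_free_thresh) and the two comparisons come from the ixgbe code in the patch. It also shows the nb_txd - curr_value conversion: a threshold T under the "free below" rule behaves exactly like nb_tx_desc - T under the "in flight above" rule.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the counters a TX queue keeps; the field names
 * mirror the ixgbe patch, but this struct is illustrative only. */
struct txq_counters {
    uint16_t nb_tx_desc;     /* total descriptors in the ring */
    uint16_t nb_tx_free;     /* descriptors currently available */
    uint16_t tx_free_thresh; /* configured threshold */
};

/* ixgbe fast-path convention before the patch: reclaim when the
 * number of free descriptors drops below the threshold. */
static bool reclaim_when_free_below(const struct txq_counters *q)
{
    assert(q->nb_tx_free <= q->nb_tx_desc);
    return q->nb_tx_free < q->tx_free_thresh;
}

/* Convention from the rte_ethdev.h comments (and ixgbe_xmit_pkts, i40e,
 * e1000): reclaim when the number of in-flight descriptors exceeds the
 * threshold. Same expression as the patched check. */
static bool reclaim_when_in_flight_above(const struct txq_counters *q)
{
    assert(q->nb_tx_free <= q->nb_tx_desc);
    return (q->nb_tx_desc - q->nb_tx_free) > q->tx_free_thresh;
}

/* Translate a "free below" threshold into the equivalent "in flight above"
 * one:  nb_tx_free < T  <=>  nb_tx_desc - nb_tx_free > nb_tx_desc - T. */
static uint16_t convert_thresh(uint16_t nb_tx_desc, uint16_t thresh)
{
    assert(thresh <= nb_tx_desc);
    return (uint16_t)(nb_tx_desc - thresh);
}
```

For example, with nb_tx_desc = 512, an app relying on the fast-path meaning of tx_free_thresh = 32 would pass 480 to get the same behaviour under the rte_ethdev.h definition.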
>>
>>> We might try to unify its usage one way or another, but I personally don't see much point in it.
>>> After all, tx_free_thresh seems like a driver internal choice (based on the nb_tx_desc and other parameters).
>>> So I think a better way would be:
>>> 1. Deprecate tx_conf->tx_free_thresh (and remove it in later releases) and make
>>> each driver use what it thinks would be the best value.
>> But how does the driver know what's best for the application's
>> traffic pattern? I think it's better to leave the possibility for the
>> app to fine tune it.
>
> My understanding is that for most cases the default value should do pretty well.
> That default value shouldn't be too small, so we avoid unnecessary & unsuccessful checks,
> and probably shouldn't be too big, to prevent unnecessary mbuf consumption
> (something between nb_tx_desc / 2 and 3 * nb_tx_desc / 4 probably).

I agree.

>
> But maybe you have a good example of when such tuning is needed?
> For what traffic patterns would you set tx_free_thresh to different values,
> and how would it impact performance?

I don't have an actual example, but I think it's worth keeping this tuning option since we already have it. Most people probably wouldn't use it, but I can imagine the more enthusiastic users wanting to try out different settings to find the best one. E.g. I was testing odp_l2fwd when I came across the problem, and I found this option useful. With its traffic pattern (receive a batch of packets, then send them out on another interface), different clock speeds can lead to different optimal values.

>
> Again, if there were a tx_free_pkts(), why would someone also need tx_conf->tx_free_thresh?

I think of tx_free_pkts as a rainy-day option, for when you want ALL completed TX packets to be released because you are out of buffers, whereas tx_free_thresh is the fast-path way of TX completion, for when you have room to wait for more completed packets to accumulate.
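A minimal sketch of how the two mechanisms could coexist, using a hypothetical counter-only queue model. struct mock_txq, mock_tx_free_pkts() and mock_fast_path_reclaim() are illustrative names, not DPDK API; rte_eth_tx_free_pkts() was only a proposal in this thread. The fast path reclaims only once the in-flight count exceeds tx_free_thresh, while the explicit call releases whatever has completed, regardless of the threshold.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical TX queue model: counters only, no real descriptors or mbufs. */
struct mock_txq {
    uint16_t nb_tx_desc;     /* ring size */
    uint16_t nb_in_flight;   /* descriptors posted and not yet reclaimed */
    uint16_t nb_done;        /* of those, how many HW has completed (DD set) */
    uint16_t tx_free_thresh; /* in-flight count that triggers fast-path reclaim */
};

/* Stand-in for the proposed explicit free call: reclaim up to nb_to_free
 * completed descriptors and return how many were released. */
static uint16_t mock_tx_free_pkts(struct mock_txq *q, uint16_t nb_to_free)
{
    assert(q->nb_done <= q->nb_in_flight);
    uint16_t n = q->nb_done < nb_to_free ? q->nb_done : nb_to_free;
    q->nb_done -= n;
    q->nb_in_flight -= n;
    return n;
}

/* Fast-path reclaim at the start of a TX burst: only scan once enough
 * descriptors are in flight that the scan is likely to find completed ones. */
static uint16_t mock_fast_path_reclaim(struct mock_txq *q)
{
    if (q->nb_in_flight > q->tx_free_thresh)
        return mock_tx_free_pkts(q, q->nb_done);
    return 0;
}
```

In an idle loop, or when the mempool runs dry, an application would call the explicit function with a large nb_to_free (e.g. UINT16_MAX) to release everything the hardware has finished with, even below the threshold.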
>
> Konstantin
>
>> In the meantime we can improve the default selection as well, as I
>> suggested above.
>>
>>> 2. As you suggested in another mail, introduce a new function:
>>> uint16_t rte_eth_tx_free_pkts(port_id, queue_id, nb_to_free).
>>> That would give the upper layer better control of memory usage, and might be called by the upper layer at idle time,
>>> so further tx_burst calls don't need to spend time on freeing TXDs/packets.
>> I agree.
>>
>>>
>>> Konstantin
>>>
>>>>
>>>> Regards,
>>>>
>>>> Zoltan
>>>>
>>>> On 27/05/15 21:12, Zoltan Kiss wrote:
>>>>> This check doesn't do what's required by rte_eth_tx_burst:
>>>>> "When the number of previously sent packets reached the "minimum transmit
>>>>> packets to free" threshold"
>>>>>
>>>>> This can cause problems when txq->tx_free_thresh + [number of elements in the
>>>>> pool] < txq->nb_tx_desc.
>>>>>
>>>>> Signed-off-by: Zoltan Kiss
>>>>> ---
>>>>>  drivers/net/ixgbe/ixgbe_rxtx.c     | 4 ++--
>>>>>  drivers/net/ixgbe/ixgbe_rxtx_vec.c | 2 +-
>>>>>  2 files changed, 3 insertions(+), 3 deletions(-)
>>>>>
>>>>> diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
>>>>> index 4f9ab22..b70ed8c 100644
>>>>> --- a/drivers/net/ixgbe/ixgbe_rxtx.c
>>>>> +++ b/drivers/net/ixgbe/ixgbe_rxtx.c
>>>>> @@ -250,10 +250,10 @@ tx_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
>>>>>
>>>>>  	/*
>>>>>  	 * Begin scanning the H/W ring for done descriptors when the
>>>>> -	 * number of available descriptors drops below tx_free_thresh. For
>>>>> +	 * number of in flight descriptors reaches tx_free_thresh. For
>>>>>  	 * each done descriptor, free the associated buffer.
>>>>>  	 */
>>>>> -	if (txq->nb_tx_free < txq->tx_free_thresh)
>>>>> +	if ((txq->nb_tx_desc - txq->nb_tx_free) > txq->tx_free_thresh)
>>>>>  		ixgbe_tx_free_bufs(txq);
>>>>>
>>>>>  	/* Only use descriptors that are available */
>>>>> diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec.c b/drivers/net/ixgbe/ixgbe_rxtx_vec.c
>>>>> index abd10f6..f91c698 100644
>>>>> --- a/drivers/net/ixgbe/ixgbe_rxtx_vec.c
>>>>> +++ b/drivers/net/ixgbe/ixgbe_rxtx_vec.c
>>>>> @@ -598,7 +598,7 @@ ixgbe_xmit_pkts_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
>>>>>  	if (unlikely(nb_pkts > RTE_IXGBE_VPMD_TX_BURST))
>>>>>  		nb_pkts = RTE_IXGBE_VPMD_TX_BURST;
>>>>>
>>>>> -	if (txq->nb_tx_free < txq->tx_free_thresh)
>>>>> +	if ((txq->nb_tx_desc - txq->nb_tx_free) > txq->tx_free_thresh)
>>>>>  		ixgbe_tx_free_bufs(txq);
>>>>>
>>>>>  	nb_commit = nb_pkts = (uint16_t)RTE_MIN(txq->nb_tx_free, nb_pkts);
>>>>>
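The failure condition in the commit message can be checked with a little arithmetic. The sketch below assumes, for simplicity, that every mbuf in the pool can end up on this one TX queue, one descriptor per mbuf, and that mbufs return to the pool only when the driver reclaims completed descriptors. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified model (assumption): pool_size mbufs, one descriptor each, all
 * on this TX queue, returned to the pool only by descriptor reclaim.
 * Each helper answers: can the reclaim check ever fire before the pool
 * runs dry? */

/* Most descriptors that can be in flight at once. */
static uint16_t max_in_flight(uint16_t nb_tx_desc, uint16_t pool_size)
{
    return pool_size < nb_tx_desc ? pool_size : nb_tx_desc;
}

/* Pre-patch ixgbe fast path: reclaim when nb_tx_free < tx_free_thresh. */
static bool old_check_can_fire(uint16_t nb_tx_desc, uint16_t tx_free_thresh,
                               uint16_t pool_size)
{
    assert(tx_free_thresh <= nb_tx_desc);
    /* nb_tx_free can never drop below this value */
    uint16_t min_free = (uint16_t)(nb_tx_desc - max_in_flight(nb_tx_desc, pool_size));
    return min_free < tx_free_thresh;
}

/* Patched check: reclaim when (nb_tx_desc - nb_tx_free) > tx_free_thresh. */
static bool new_check_can_fire(uint16_t nb_tx_desc, uint16_t tx_free_thresh,
                               uint16_t pool_size)
{
    assert(tx_free_thresh <= nb_tx_desc);
    return max_in_flight(nb_tx_desc, pool_size) > tx_free_thresh;
}
```

With nb_tx_desc = 512, tx_free_thresh = 32 and a 256-mbuf pool, 32 + 256 < 512: at most 256 descriptors are ever in flight, so nb_tx_free never drops below 256 and the pre-patch check (nb_tx_free < 32) never fires, whereas the patched check (in flight > 32) does.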