From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mga14.intel.com (mga14.intel.com [192.55.52.115]) by dpdk.org (Postfix) with ESMTP id 2AD095A54 for ; Tue, 9 Jun 2015 13:20:05 +0200 (CEST) Received: from orsmga002.jf.intel.com ([10.7.209.21]) by fmsmga103.fm.intel.com with ESMTP; 09 Jun 2015 04:18:15 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.13,579,1427785200"; d="scan'208";a="743561504" Received: from irsmsx107.ger.corp.intel.com ([163.33.3.99]) by orsmga002.jf.intel.com with ESMTP; 09 Jun 2015 04:18:14 -0700 Received: from irsmsx105.ger.corp.intel.com ([169.254.7.73]) by IRSMSX107.ger.corp.intel.com ([169.254.10.94]) with mapi id 14.03.0224.002; Tue, 9 Jun 2015 12:18:12 +0100 From: "Ananyev, Konstantin" To: Zoltan Kiss , "dev@dpdk.org" Thread-Topic: [dpdk-dev] [PATCH] ixgbe: fix checking for tx_free_thresh Thread-Index: AQHQmLmPEOJX52mtSEe6r+HqaQm4f52XykYAgAFWiJCAACjagIAAFnEAgAGoLgCAB52ZwA== Date: Tue, 9 Jun 2015 11:18:12 +0000 Message-ID: <2601191342CEEE43887BDE71AB97725821436355@irsmsx105.ger.corp.intel.com> References: <1432757539-8544-1-git-send-email-zoltan.kiss@linaro.org> <556C853E.8090902@linaro.org> <2601191342CEEE43887BDE71AB977258214346AE@irsmsx105.ger.corp.intel.com> <556DC6D9.3060008@linaro.org> <2601191342CEEE43887BDE71AB977258214348AA@irsmsx105.ger.corp.intel.com> <556F3D80.3070904@linaro.org> In-Reply-To: <556F3D80.3070904@linaro.org> Accept-Language: en-IE, en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [163.33.239.181] Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Subject: Re: [dpdk-dev] [PATCH] ixgbe: fix checking for tx_free_thresh X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: patches and discussions about DPDK List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 09 Jun 2015 11:20:06 -0000 > -----Original Message----- > From: Zoltan Kiss [mailto:zoltan.kiss@linaro.org] > Sent: Wednesday, June 03, 2015 6:47 PM > To: Ananyev, Konstantin; dev@dpdk.org > Subject: Re: [dpdk-dev] [PATCH] ixgbe: fix checking for tx_free_thresh >=20 >=20 >=20 > On 02/06/15 18:35, Ananyev, Konstantin wrote: > > > > > >> -----Original Message----- > >> From: Zoltan Kiss [mailto:zoltan.kiss@linaro.org] > >> Sent: Tuesday, June 02, 2015 4:08 PM > >> To: Ananyev, Konstantin; dev@dpdk.org > >> Subject: Re: [dpdk-dev] [PATCH] ixgbe: fix checking for tx_free_thresh > >> > >> > >> > >> On 02/06/15 14:31, Ananyev, Konstantin wrote: > >>> Hi Zoltan, > >>> > >>>> -----Original Message----- > >>>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Zoltan Kiss > >>>> Sent: Monday, June 01, 2015 5:16 PM > >>>> To: dev@dpdk.org > >>>> Subject: Re: [dpdk-dev] [PATCH] ixgbe: fix checking for tx_free_thre= sh > >>>> > >>>> Hi, > >>>> > >>>> Anyone would like to review this patch? Venky sent a NAK, but I've > >>>> explained to him why it is a bug. > >>> > >>> > >>> Well, I think Venky is right here. > >> I think the comments above rte_eth_tx_burst() definition are quite cle= ar > >> about what tx_free_thresh means, e1000 and i40e use it that way, but n= ot > >> ixgbe. > >> > >>> Indeed that fix, will cause more often unsuccessful checks for DD bit= s and might cause a > >>> slowdown for TX fast-path. > >> Not if the applications set tx_free_thresh according to the definition > >> of this value. But we can change the default value from 32 to somethin= g > >> higher, e.g I'm using nb_desc/2, and it works out well. > > > > Sure we can, as I said below, we can unify it one way or another. > > One way would be to make fast-path TX to free TXDs when number of occup= ied TXDs raises above tx_free_thresh > > (what rte_ethdev.h comments say and what full-featured TX is doing). > > Though in that case we have to change default value for tx_free_thresh,= and all existing apps that > > using tx_free_thresh=3D=3D32 and fast-path TX will probably experience = a slowdown. >=20 > They are in trouble already, because i40e and e1000 uses it as defined. In fact, i40e has exactly the same problem as ixgbe: fast-path and full-featured TX code treat tx_free_thresh in a different w= ay. igb just ignores input tx_free_thresh, while em has only full featured path= . What I am saying, existing app that uses TX fast-path and sets tx_free_thre= sh=3D32 (as we did in our examples in previous versions) will experience a slowdown= , if we'll make all TX functions to behave like full-featured ones (txq->nb_tx_desc - txq->nb_tx_free > txq->tx_free_thresh). >>From other side, if app uses TX full-featured TX and sets tx_free_thresh=3D= 32, then it already has a possible slowdown, because of too often TXDs checkin= g.=20 So, if we'll change tx_free_thresh semantics to wht fast-path uses, It shouldn't see any slowdown, in fact it might see some improvement. > But I guess most apps are going with 0, which sets the drivers default. > Others have to change the value to nb_txd - curr_value to have the same > behaviour >=20 > > Another way would be to make all TX functions to treat tx_conf->tx_free= _thresh as fast-path TX functions do > > (free TXDs when number of free TXDs drops below tx_free_thresh) and up= date rte_ethdev.h comments. > And i40e and e1000e code as well. I don't see what difference it makes > which way of definition you use, what I care is that it should be used > consistently. Yes, both ways are possible, the concern is - how to minimise the impact fo= r existing apps. That's why I am leaning to the fast-path way. > > > > Though, I am not sure that it really worth all these changes. > > From one side, whatever tx_free_thresh would be, > > the app should still assume that the worst case might happen, > > and up to nb_tx_desc mbufs can be consumed by the queue. > > From other side, I think the default value should work well for most c= ases. > > So I am still for graceful deprecation of that config parameter, see be= low. > > > >> > >>> Anyway, with current PMD implementation, you can't guarantee that at = any moment > >>> TX queue wouldn't use more than tx_free_thresh mbufs. > >> > >> > >>> There could be situations (low speed, or link is down for some short = period, etc), when > >>> much more than tx_free_thresh TXDs are in use and none of them could = be freed by HW right now. > >>> So your app better be prepared, that up to (nb_tx_desc * num_of_TX_qu= eues) could be in use > >>> by TX path at any given moment. > >>> > >>> Though yes, there is an inconsistency how different ixgbe TX functio= ns treat tx_conf->tx_free_thresh parameter. > >>> That probably creates wrong expectations and confusion. > >> Yes, ixgbe_xmit_pkts() use it the way it's defined, this two function > >> doesn't. > >> > >>> We might try to unify it's usage one way or another, but I personally= don't see much point in it. > >>> After all, tx_free_tresh seems like a driver internal choice (based o= n the nb_tx_desc and other parameters). > >>> So I think a better way would be: > >>> 1. Deprecate tx_conf->tx_free_thresh (and remove it in later releases= ) and make > >>> each driver to use what it thinks would be the best value. > >> But how does the driver knows what's the best for the applications > >> traffic pattern? I think it's better to leave the possibility for the > >> app to fine tune it. > > > > My understanding is that for most cases the default value should do pre= tty well. > > That default value, shouldn't be too small, so we avoid unnecessary & u= nsuccessful checks, > > and probably shouldn't be too big, to prevent unnecessary mbufs consump= tion > > (something between nb_tx_desc / 2 and 3 * nb_tx_desc / 4 probably). > I agree >=20 > > > > But might be you have a good example, when such tuning is needed? > > For what traffic patterns you would set tx_free_thresh to some differen= t values, > > and how will it impact performance? > I don't have an actual example, but I think it's worth to keep this > tuning option if we already have it. Most people probably wouldn't use > it, but I can imagine that the very enthusiastic wants to try out > different settings to find the best. > E.g. I was testing odp_l2fwd when I came across the problem, and I found > it useful to have this option. With its traffic pattern (receive a batch > of packets then send them out on an another interface) it can happen > that with different clock speeds you can find different optimums. Actually, after thinking about it a bit more - I think it would more depend on how many RX/TX queues particular lcore has= to manage. So, as you said, yes we probably better leave it, at least for now. >=20 > > > > Again, if there would be tx_free_pkts(), why someone would also need a = tx_conf->tx_free_thresh? > I think about tx_free_pkts as a rainy day option, when you want ALL TX > completed packets to be released, because you are out of buffers. What do you mean as 'rainy day' here? App getting short of mbufs? As I said before, it could be absolutely valid situation when up to nb_tx_desc mbufs per TX queue can be in use. So the app better be prepared for such situation anyway. Either make sure there are enough mbufs in the pool, or somehow gracefully degrade the service. Again, if nb_tx_desc is big (let say for ixgbe it could be up to 4K), freeing all TXDs at once could take a long time. So I think it should be possible to specify maximum number of mbufs to free= per call.=20 My thought people will use it to release mbufs when there is an idle period= . Something like: n =3D rx_burst(...); if (n =3D=3D 0) { ...; tx_free_mbufs(...); ... } else {...} Konstantin > While > tx_free_thresh is the fast path way of TX completion, when you have the > room to wait for more packets to be gathered. >=20 > > > > Konstantin > > > >> In the meantime we can improve the default selection as well, as I > >> suggested above. > >> > >>> 2. As you suggested in another mail, introduce an new function: > >>> uint16_t rte_eth_tx_free_pkts(port_id, queue_id, nb_to_free). > >>> That would give upper layer a better control of memory usage, and mig= ht be called by the upper layer at idle time, > >>> so further tx_burst, don't need to spend time on freeing TXDs/packets= . > >> I agree. > >> > >>> > >>> Konstantin > >>> > >>> > >>>> > >>>> Regards, > >>>> > >>>> Zoltan > >>>> > >>>> On 27/05/15 21:12, Zoltan Kiss wrote: > >>>>> This check doesn't do what's required by rte_eth_tx_burst: > >>>>> "When the number of previously sent packets reached the "minimum tr= ansmit > >>>>> packets to free" threshold" > >>>>> > >>>>> This can cause problems when txq->tx_free_thresh + [number of eleme= nts in the > >>>>> pool] < txq->nb_tx_desc. > >>>>> > >>>>> Signed-off-by: Zoltan Kiss > >>>>> --- > >>>>> drivers/net/ixgbe/ixgbe_rxtx.c | 4 ++-- > >>>>> drivers/net/ixgbe/ixgbe_rxtx_vec.c | 2 +- > >>>>> 2 files changed, 3 insertions(+), 3 deletions(-) > >>>>> > >>>>> diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixg= be_rxtx.c > >>>>> index 4f9ab22..b70ed8c 100644 > >>>>> --- a/drivers/net/ixgbe/ixgbe_rxtx.c > >>>>> +++ b/drivers/net/ixgbe/ixgbe_rxtx.c > >>>>> @@ -250,10 +250,10 @@ tx_xmit_pkts(void *tx_queue, struct rte_mbuf = **tx_pkts, > >>>>> > >>>>> /* > >>>>> * Begin scanning the H/W ring for done descriptors when the > >>>>> - * number of available descriptors drops below tx_free_thresh. F= or > >>>>> + * number of in flight descriptors reaches tx_free_thresh. For > >>>>> * each done descriptor, free the associated buffer. > >>>>> */ > >>>>> - if (txq->nb_tx_free < txq->tx_free_thresh) > >>>>> + if ((txq->nb_tx_desc - txq->nb_tx_free) > txq->tx_free_thresh) > >>>>> ixgbe_tx_free_bufs(txq); > >>>>> > >>>>> /* Only use descriptors that are available */ > >>>>> diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec.c b/drivers/net/ixgbe= /ixgbe_rxtx_vec.c > >>>>> index abd10f6..f91c698 100644 > >>>>> --- a/drivers/net/ixgbe/ixgbe_rxtx_vec.c > >>>>> +++ b/drivers/net/ixgbe/ixgbe_rxtx_vec.c > >>>>> @@ -598,7 +598,7 @@ ixgbe_xmit_pkts_vec(void *tx_queue, struct rte_= mbuf **tx_pkts, > >>>>> if (unlikely(nb_pkts > RTE_IXGBE_VPMD_TX_BURST)) > >>>>> nb_pkts =3D RTE_IXGBE_VPMD_TX_BURST; > >>>>> > >>>>> - if (txq->nb_tx_free < txq->tx_free_thresh) > >>>>> + if ((txq->nb_tx_desc - txq->nb_tx_free) > txq->tx_free_thresh) > >>>>> ixgbe_tx_free_bufs(txq); > >>>>> > >>>>> nb_commit =3D nb_pkts =3D (uint16_t)RTE_MIN(txq->nb_tx_free, n= b_pkts); > >>>>>