From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mga01.intel.com (mga01.intel.com [192.55.52.88]) by dpdk.org (Postfix) with ESMTP id F34E61288 for ; Tue, 9 Jun 2015 18:18:21 +0200 (CEST) Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by fmsmga101.fm.intel.com with ESMTP; 09 Jun 2015 08:45:17 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.13,581,1427785200"; d="scan'208";a="505570339" Received: from irsmsx105.ger.corp.intel.com ([163.33.3.28]) by FMSMGA003.fm.intel.com with ESMTP; 09 Jun 2015 08:45:13 -0700 Received: from irsmsx104.ger.corp.intel.com ([169.254.5.171]) by irsmsx105.ger.corp.intel.com ([169.254.7.73]) with mapi id 14.03.0224.002; Tue, 9 Jun 2015 16:44:05 +0100 From: "Ananyev, Konstantin" To: Zoltan Kiss , "dev@dpdk.org" Thread-Topic: [dpdk-dev] [PATCH] ixgbe: fix checking for tx_free_thresh Thread-Index: AQHQmLmPEOJX52mtSEe6r+HqaQm4f52XykYAgAFWiJCAACjagIAAFnEAgAGoLgCAB52ZwIABpCqAgAAZtFA= Date: Tue, 9 Jun 2015 15:44:04 +0000 Message-ID: <2601191342CEEE43887BDE71AB97725836A021F8@IRSMSX104.ger.corp.intel.com> References: <1432757539-8544-1-git-send-email-zoltan.kiss@linaro.org> <556C853E.8090902@linaro.org> <2601191342CEEE43887BDE71AB977258214346AE@irsmsx105.ger.corp.intel.com> <556DC6D9.3060008@linaro.org> <2601191342CEEE43887BDE71AB977258214348AA@irsmsx105.ger.corp.intel.com> <556F3D80.3070904@linaro.org> <2601191342CEEE43887BDE71AB97725821436355@irsmsx105.ger.corp.intel.com> <55770167.6080701@linaro.org> In-Reply-To: <55770167.6080701@linaro.org> Accept-Language: en-IE, en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [163.33.239.181] Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Subject: Re: [dpdk-dev] [PATCH] ixgbe: fix checking for tx_free_thresh X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: patches and discussions about DPDK List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 09 Jun 2015 16:18:23 -0000 > -----Original Message----- > From: Zoltan Kiss [mailto:zoltan.kiss@linaro.org] > Sent: Tuesday, June 09, 2015 4:08 PM > To: Ananyev, Konstantin; dev@dpdk.org > Subject: Re: [dpdk-dev] [PATCH] ixgbe: fix checking for tx_free_thresh >=20 >=20 >=20 > On 09/06/15 12:18, Ananyev, Konstantin wrote: > > > > > >> -----Original Message----- > >> From: Zoltan Kiss [mailto:zoltan.kiss@linaro.org] > >> Sent: Wednesday, June 03, 2015 6:47 PM > >> To: Ananyev, Konstantin; dev@dpdk.org > >> Subject: Re: [dpdk-dev] [PATCH] ixgbe: fix checking for tx_free_thresh > >> > >> > >> > >> On 02/06/15 18:35, Ananyev, Konstantin wrote: > >>> > >>> > >>>> -----Original Message----- > >>>> From: Zoltan Kiss [mailto:zoltan.kiss@linaro.org] > >>>> Sent: Tuesday, June 02, 2015 4:08 PM > >>>> To: Ananyev, Konstantin; dev@dpdk.org > >>>> Subject: Re: [dpdk-dev] [PATCH] ixgbe: fix checking for tx_free_thre= sh > >>>> > >>>> > >>>> > >>>> On 02/06/15 14:31, Ananyev, Konstantin wrote: > >>>>> Hi Zoltan, > >>>>> > >>>>>> -----Original Message----- > >>>>>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Zoltan Kiss > >>>>>> Sent: Monday, June 01, 2015 5:16 PM > >>>>>> To: dev@dpdk.org > >>>>>> Subject: Re: [dpdk-dev] [PATCH] ixgbe: fix checking for tx_free_th= resh > >>>>>> > >>>>>> Hi, > >>>>>> > >>>>>> Anyone would like to review this patch? Venky sent a NAK, but I've > >>>>>> explained to him why it is a bug. > >>>>> > >>>>> > >>>>> Well, I think Venky is right here. > >>>> I think the comments above rte_eth_tx_burst() definition are quite c= lear > >>>> about what tx_free_thresh means, e1000 and i40e use it that way, but= not > >>>> ixgbe. > >>>> > >>>>> Indeed that fix, will cause more often unsuccessful checks for DD b= its and might cause a > >>>>> slowdown for TX fast-path. > >>>> Not if the applications set tx_free_thresh according to the definiti= on > >>>> of this value. But we can change the default value from 32 to someth= ing > >>>> higher, e.g I'm using nb_desc/2, and it works out well. > >>> > >>> Sure we can, as I said below, we can unify it one way or another. > >>> One way would be to make fast-path TX to free TXDs when number of occ= upied TXDs raises above tx_free_thresh > >>> (what rte_ethdev.h comments say and what full-featured TX is doing). > >>> Though in that case we have to change default value for tx_free_thres= h, and all existing apps that > >>> using tx_free_thresh=3D=3D32 and fast-path TX will probably experienc= e a slowdown. > >> > >> They are in trouble already, because i40e and e1000 uses it as defined= . > > > > In fact, i40e has exactly the same problem as ixgbe: > > fast-path and full-featured TX code treat tx_free_thresh in a differe= nt way. > > igb just ignores input tx_free_thresh, while em has only full featured = path. > > > > What I am saying, existing app that uses TX fast-path and sets tx_free_= thresh=3D32 > > (as we did in our examples in previous versions) will experience a slow= down, > > if we'll make all TX functions to behave like full-featured ones > > (txq->nb_tx_desc - txq->nb_tx_free > txq->tx_free_thresh). > > > > From other side, if app uses TX full-featured TX and sets tx_free_thre= sh=3D32, > > then it already has a possible slowdown, because of too often TXDs che= cking. > > So, if we'll change tx_free_thresh semantics to wht fast-path uses, > > It shouldn't see any slowdown, in fact it might see some improvement. > > > >> But I guess most apps are going with 0, which sets the drivers default= . > >> Others have to change the value to nb_txd - curr_value to have the sam= e > >> behaviour > >> > >>> Another way would be to make all TX functions to treat tx_conf->tx_fr= ee_thresh as fast-path TX functions do > >>> (free TXDs when number of free TXDs drops below tx_free_thresh) and = update rte_ethdev.h comments. > >> And i40e and e1000e code as well. I don't see what difference it makes > >> which way of definition you use, what I care is that it should be used > >> consistently. > > > > Yes, both ways are possible, the concern is - how to minimise the impac= t for existing apps. > > That's why I am leaning to the fast-path way. >=20 > Make sense to favour the fast-path way, I'll look into that and try to > come up with a patch >=20 > > > >>> > >>> Though, I am not sure that it really worth all these changes. > >>> From one side, whatever tx_free_thresh would be, > >>> the app should still assume that the worst case might happen, > >>> and up to nb_tx_desc mbufs can be consumed by the queue. > >>> From other side, I think the default value should work well for mos= t cases. > >>> So I am still for graceful deprecation of that config parameter, see = below. > >>> > >>>> > >>>>> Anyway, with current PMD implementation, you can't guarantee that a= t any moment > >>>>> TX queue wouldn't use more than tx_free_thresh mbufs. > >>>> > >>>> > >>>>> There could be situations (low speed, or link is down for some shor= t period, etc), when > >>>>> much more than tx_free_thresh TXDs are in use and none of them coul= d be freed by HW right now. > >>>>> So your app better be prepared, that up to (nb_tx_desc * num_of_TX_= queues) could be in use > >>>>> by TX path at any given moment. > >>>>> > >>>>> Though yes, there is an inconsistency how different ixgbe TX funct= ions treat tx_conf->tx_free_thresh parameter. > >>>>> That probably creates wrong expectations and confusion. > >>>> Yes, ixgbe_xmit_pkts() use it the way it's defined, this two functio= n > >>>> doesn't. > >>>> > >>>>> We might try to unify it's usage one way or another, but I personal= ly don't see much point in it. > >>>>> After all, tx_free_tresh seems like a driver internal choice (based= on the nb_tx_desc and other parameters). > >>>>> So I think a better way would be: > >>>>> 1. Deprecate tx_conf->tx_free_thresh (and remove it in later releas= es) and make > >>>>> each driver to use what it thinks would be the best value. > >>>> But how does the driver knows what's the best for the applications > >>>> traffic pattern? I think it's better to leave the possibility for th= e > >>>> app to fine tune it. > >>> > >>> My understanding is that for most cases the default value should do p= retty well. > >>> That default value, shouldn't be too small, so we avoid unnecessary &= unsuccessful checks, > >>> and probably shouldn't be too big, to prevent unnecessary mbufs consu= mption > >>> (something between nb_tx_desc / 2 and 3 * nb_tx_desc / 4 probably). > >> I agree > >> > >>> > >>> But might be you have a good example, when such tuning is needed? > >>> For what traffic patterns you would set tx_free_thresh to some differ= ent values, > >>> and how will it impact performance? > >> I don't have an actual example, but I think it's worth to keep this > >> tuning option if we already have it. Most people probably wouldn't use > >> it, but I can imagine that the very enthusiastic wants to try out > >> different settings to find the best. > >> E.g. I was testing odp_l2fwd when I came across the problem, and I fou= nd > >> it useful to have this option. With its traffic pattern (receive a bat= ch > >> of packets then send them out on an another interface) it can happen > >> that with different clock speeds you can find different optimums. > > > > Actually, after thinking about it a bit more - > > I think it would more depend on how many RX/TX queues particular lcore= has to manage. > > So, as you said, yes we probably better leave it, at least for now. > > > >> > >>> > >>> Again, if there would be tx_free_pkts(), why someone would also need = a tx_conf->tx_free_thresh? > >> I think about tx_free_pkts as a rainy day option, when you want ALL TX > >> completed packets to be released, because you are out of buffers. > > > > What do you mean as 'rainy day' here? > > App getting short of mbufs? > > As I said before, it could be absolutely valid situation when > > up to nb_tx_desc mbufs per TX queue can be in use. > > So the app better be prepared for such situation anyway. > > Either make sure there are enough mbufs in the pool, > > or somehow gracefully degrade the service. > > Again, if nb_tx_desc is big (let say for ixgbe it could be up to 4K), > > freeing all TXDs at once could take a long time. > > So I think it should be possible to specify maximum number of mbufs to = free per call. > > > > My thought people will use it to release mbufs when there is an idle pe= riod. > > Something like: > > > > n =3D rx_burst(...); > > if (n =3D=3D 0) { ...; tx_free_mbufs(...); ... } > > else {...} >=20 > Yes, this is similar to what I'm doing in odp-dpdk: >=20 > https://git.linaro.org/lng/odp-dpdk.git/commitdiff/1a8df254e18bb50dbd8357= 29bc3d01fcb87ebc6b >=20 > The only problem is when the tx_free_threshold is not reached, the > interfaces doesn't even start releasing the mbufs. And when you can't > receive anything, it's likely you won't send out anything as well, which > would normally trigger the releasing of the completed buffers. Yep, that's why I think it is good to add this function: If you are going to TX something, then you'll probably hit tx_free_thresh and PMD will release unused mbufs 'automatically'. If you have nothing to TX (idle period), you can try to release unused mbuf= s manually: by calling tx_free_mbufs(). Konstantin >=20 > > > > Konstantin > > > >> While > >> tx_free_thresh is the fast path way of TX completion, when you have th= e > >> room to wait for more packets to be gathered. > >> > >>> > >>> Konstantin > >>> > >>>> In the meantime we can improve the default selection as well, as I > >>>> suggested above. > >>>> > >>>>> 2. As you suggested in another mail, introduce an new function: > >>>>> uint16_t rte_eth_tx_free_pkts(port_id, queue_id, nb_to_free). > >>>>> That would give upper layer a better control of memory usage, and m= ight be called by the upper layer at idle time, > >>>>> so further tx_burst, don't need to spend time on freeing TXDs/packe= ts. > >>>> I agree. > >>>> > >>>>> > >>>>> Konstantin > >>>>> > >>>>> > >>>>>> > >>>>>> Regards, > >>>>>> > >>>>>> Zoltan > >>>>>> > >>>>>> On 27/05/15 21:12, Zoltan Kiss wrote: > >>>>>>> This check doesn't do what's required by rte_eth_tx_burst: > >>>>>>> "When the number of previously sent packets reached the "minimum = transmit > >>>>>>> packets to free" threshold" > >>>>>>> > >>>>>>> This can cause problems when txq->tx_free_thresh + [number of ele= ments in the > >>>>>>> pool] < txq->nb_tx_desc. > >>>>>>> > >>>>>>> Signed-off-by: Zoltan Kiss > >>>>>>> --- > >>>>>>> drivers/net/ixgbe/ixgbe_rxtx.c | 4 ++-- > >>>>>>> drivers/net/ixgbe/ixgbe_rxtx_vec.c | 2 +- > >>>>>>> 2 files changed, 3 insertions(+), 3 deletions(-) > >>>>>>> > >>>>>>> diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/i= xgbe_rxtx.c > >>>>>>> index 4f9ab22..b70ed8c 100644 > >>>>>>> --- a/drivers/net/ixgbe/ixgbe_rxtx.c > >>>>>>> +++ b/drivers/net/ixgbe/ixgbe_rxtx.c > >>>>>>> @@ -250,10 +250,10 @@ tx_xmit_pkts(void *tx_queue, struct rte_mbu= f **tx_pkts, > >>>>>>> > >>>>>>> /* > >>>>>>> * Begin scanning the H/W ring for done descriptors when th= e > >>>>>>> - * number of available descriptors drops below tx_free_thresh. = For > >>>>>>> + * number of in flight descriptors reaches tx_free_thresh. For > >>>>>>> * each done descriptor, free the associated buffer. > >>>>>>> */ > >>>>>>> - if (txq->nb_tx_free < txq->tx_free_thresh) > >>>>>>> + if ((txq->nb_tx_desc - txq->nb_tx_free) > txq->tx_free_thresh) > >>>>>>> ixgbe_tx_free_bufs(txq); > >>>>>>> > >>>>>>> /* Only use descriptors that are available */ > >>>>>>> diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec.c b/drivers/net/ixg= be/ixgbe_rxtx_vec.c > >>>>>>> index abd10f6..f91c698 100644 > >>>>>>> --- a/drivers/net/ixgbe/ixgbe_rxtx_vec.c > >>>>>>> +++ b/drivers/net/ixgbe/ixgbe_rxtx_vec.c > >>>>>>> @@ -598,7 +598,7 @@ ixgbe_xmit_pkts_vec(void *tx_queue, struct rt= e_mbuf **tx_pkts, > >>>>>>> if (unlikely(nb_pkts > RTE_IXGBE_VPMD_TX_BURST)) > >>>>>>> nb_pkts =3D RTE_IXGBE_VPMD_TX_BURST; > >>>>>>> > >>>>>>> - if (txq->nb_tx_free < txq->tx_free_thresh) > >>>>>>> + if ((txq->nb_tx_desc - txq->nb_tx_free) > txq->tx_free_thresh) > >>>>>>> ixgbe_tx_free_bufs(txq); > >>>>>>> > >>>>>>> nb_commit =3D nb_pkts =3D (uint16_t)RTE_MIN(txq->nb_tx_free= , nb_pkts); > >>>>>>>