From: "Venkatesan, Venky"
To: Olivier Matz
Cc: "Richardson, Bruce", dev@dpdk.org, thomas.monjalon@6wind.com,
 "Ananyev, Konstantin", "Lu, Wenzhuo", "Zhang, Helin", "Wu, Jingjing",
 adrien.mazarguil@6wind.com, nelio.laranjeiro@6wind.com, "Yigit, Ferruh"
Date: Fri, 3 Mar 2017 18:46:52 +0000
Subject: Re: [dpdk-dev] [PATCH 0/6] get status of Rx and Tx descriptors

Hi Olivier,

> -----Original Message-----
> From: Olivier Matz [mailto:olivier.matz@6wind.com]
> Sent: Friday, March 3, 2017 8:45 AM
> To: Venkatesan, Venky
> Cc: Richardson, Bruce; dev@dpdk.org; thomas.monjalon@6wind.com;
> Ananyev, Konstantin; Lu, Wenzhuo; Zhang, Helin; Wu, Jingjing;
> adrien.mazarguil@6wind.com; nelio.laranjeiro@6wind.com; Yigit, Ferruh
> Subject: Re: [dpdk-dev] [PATCH 0/6] get status of Rx and Tx descriptors
>
> Hi Venky,
>
> On Fri, 3 Mar 2017 16:18:52 +0000, "Venkatesan, Venky" wrote:
> > > -----Original Message-----
> > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Olivier Matz
> > > Sent: Thursday, March 2, 2017 8:15 AM
> > > To: Richardson, Bruce
> > > Cc: dev@dpdk.org; thomas.monjalon@6wind.com; Ananyev, Konstantin;
> > > Lu, Wenzhuo; Zhang, Helin; Wu, Jingjing; adrien.mazarguil@6wind.com;
> > > nelio.laranjeiro@6wind.com; Yigit, Ferruh
> > > Subject: Re: [dpdk-dev] [PATCH 0/6] get status of Rx and Tx
> > > descriptors
> > >
> > > On Thu, 2 Mar 2017 15:32:15 +0000, Bruce Richardson wrote:
> > > > On Wed, Mar 01, 2017 at 06:19:06PM +0100, Olivier Matz wrote:
> > > > > This patchset introduces a new ethdev API:
> > > > >  - rte_eth_rx_descriptor_status()
> > > > >  - rte_eth_tx_descriptor_status()
> > > > >
> > > > > The Rx API aims to replace rte_eth_rx_descriptor_done(), which
> > > > > does almost the same thing but does not differentiate the case
> > > > > of a descriptor used by the driver (not returned to the hw).
> > > > >
> > > > > The usage of these functions can be:
> > > > >  - on Rx, anticipate that the cpu is not fast enough to process
> > > > >    all incoming packets, and take dispositions to solve the
> > > > >    problem (add more cpus, drop specific packets, ...)
> > > > >  - on Tx, detect that the link is overloaded, and take
> > > > >    dispositions to solve the problem (notify flow control, drop
> > > > >    specific packets)
> > > > >
> > > > Looking at it from a slightly higher level, are these APIs really
> > > > going to help in these situations? If something is overloaded,
> > > > doing more work by querying the ring status only makes things
> > > > worse. I suspect that in most cases better results can be had by
> > > > just looking at the return values of the RX and TX burst
> > > > functions. For example, if RX burst is always returning a full set
> > > > of packets, chances are you are overloaded, or at least have no
> > > > scope for an unexpected burst or event.
> > > >
> > > > Are these really needed for real applications? I suspect our
> > > > trivial l3fwd power example can be made to work ok without them.
> > >
> > > The l3fwd example uses the rte_eth_rx_descriptor_done() API, which
> > > is very similar to what I'm adding here.
> > > The differences are:
> > >
> > >  - the new lib provides a Tx counterpart
> > >  - it differentiates done/avail/hold descriptors
> > >
> > > The alternative was to update the descriptor_done API, but I think
> > > we agreed to do that in this thread:
> > > http://www.dpdk.org/ml/archives/dev/2017-January/054947.html
> > >
> > > About the usefulness of the API, I confirm it is useful: for
> > > instance, you can detect that your ring is more than half-full, and
> > > take dispositions to increase your processing power or select the
> > > packets you want to drop first.
> > >
> > For either of those cases, you could still implement this in your
> > application without any of these APIs. Simply keep reading rx_burst()
> > till it returns zero. You now have all the packets that you want --
> > look at how many and increase your processing power, or drop them.
>
> In my use case, I may have several thresholds, so it gives fast
> information about the ring status.

The problem is that it costs too much to return that status. Let me
explain it this way: when processing a burst, it takes the IXGBE driver
~20-25 cycles to process a packet. Assuming memory latency is 75 ns, a
miss to memory costs ~150 cycles at 2.0 GHz, or ~225 cycles at 3.0 GHz.
Either way, that is between 7 and 10 packet times if you miss to memory.
In the case you are suggesting, where the CPU isn't able to keep up with
the packets, all we have really done is make it harder for the CPU to
keep up.

Could you use an Rx alternative that returns 0 or 1 depending on whether
more packets remain in the ring? That would be lower cost to implement
(as a separate API, or even as part of the rx_burst API itself). But
touching the ring at a random offset is almost always going to be a
performance problem.

> Repeatedly reading rx_burst() until it returns 0 will not work if the
> packet rate is higher than (or close to) what the cpu is able to eat.
>
If the packet input rate is higher than what the CPU is capable of
handling, reading the packets gives you the option of dropping those
that you consider lower priority. If you have an application that takes
400 cycles to process a packet, but the input rate is a packet every
100 cycles, then what you have to do is read the packets, figure out a
lighter-weight way of determining a drop per packet (easy suggestion:
use the RX filter API to tag packets), and drop them within 10-20
cycles per packet. Ideally, you would do this by draining some
percentage of the descriptor ring, then prioritizing and dropping. A
second, even better alternative is to use NIC facilities to prioritize
packets into different queues. I don't see how adding another 150
cycles to the budget helps you keep up with packets.

> >
> > The issue I have with this newer instantiation of the API is that it
> > essentially picks up a descriptor at a specified offset. In most
> > cases, if you plan to read far enough ahead with the API (say 16 or
> > 32 ahead, or even more), you are almost always guaranteed an L1/L2
> > miss, making it a costly API call. In cases that don't have
> > something like Data Direct I/O (DDIO), you are now going to hit
> > memory and stall the CPU for a long time. In any case, the API
> > becomes pretty useless unless you stay within a small look-ahead
> > offset. The rx_burst() methodology simply works better in most
> > circumstances, and allows application-level control.
> >
> > So, NAK. My suggestion would be to go back to the older API.
>
> I don't understand the reason for your nack. The old API is there (for
> Rx it works the same), and it is illustrated in an example. Your
> arguments also apply to the old API, so why are you saying we should
> keep the older one?
>
I am not a fan of the old API either. In hindsight, it was a mistake
(which we didn't catch in time).
As Bruce suggested, the example should be reworked to work without the
API, and the API deprecated.

> For Tx, I want to know if I have enough room to send my packets before
> doing it. There is no API yet to do that.
>
Yes. This could be a lightweight API if it returned a count
(txq->nb_tx_free) instead of actually touching the ring, which is what
I have a problem with. If the implementation changes to that, it may be
okay to do. The Tx API has more merit than the Rx API, but not coded
around an offset.

> And yes, this could trigger cache misses, but in some situations it's
> preferable to be a bit slower (all tests are not test-iofwd) and be
> able to anticipate that the ring is getting full.
>
> Regards,
> Olivier