From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Polehn, Mike A"
To: "Richardson, Bruce"
Cc: "dev@dpdk.org"
Date: Wed, 28 Oct 2015 21:27:15 +0000
Message-ID: <745DB4B8861F8E4B9849C970520ABBF14974C81E@ORSMSX102.amr.corp.intel.com>
References: <745DB4B8861F8E4B9849C970520ABBF14974C1DF@ORSMSX102.amr.corp.intel.com> <20151028104437.GA8052@bricha3-MOBL3>
In-Reply-To: <20151028104437.GA8052@bricha3-MOBL3>
Subject: Re: [dpdk-dev] [Patch] Eth Driver: Optimization for improved NIC processing rates
List-Id: patches and discussions about DPDK

Hi Bruce!

Thank you for reviewing; sorry I didn't write as clearly as possible. I was trying to say more than "the performance improved". I didn't call out RFC 2544 since many people may not know much about it. I was also trying to convey what was observed and the conclusion derived from the observation without getting too long.

When the NIC processing loop rate is around 400,000/sec, the entry and exit savings are not easily observable when the average data rate variation from test to test is higher than the packet rate gain. If the RFC 2544 zero-loss convergence is set too fine, the time it takes to run a complete test increases substantially (I set my convergence to about 0.25% of line rate) at 60 seconds per measurement point. Unless the current convergence data rate is close to zero loss for the next point, a small improvement is not going to show up as a higher zero-loss rate. However, the test is a series of measurements, each with average latency and packet loss.
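(For readers unfamiliar with this kind of test, a rough sketch of a zero-loss convergence search follows. It is illustrative only, not the tester implementation used here: the binary-search form is an assumption, and run_trial() is a hypothetical placeholder for driving the traffic generator and counting lost packets.)

    /* Illustrative sketch of an RFC 2544-style zero-loss convergence:
     * binary-search the offered rate (as % of line rate) until the search
     * window is narrower than the chosen resolution (0.25% here), with each
     * trial running 60 seconds. */

    /* placeholder - replace with the real traffic-generator driver;
     * returns the number of packets lost during the trial */
    static unsigned long run_trial(double rate_pct, int secs)
    {
        (void)rate_pct;
        (void)secs;
        return 0;
    }

    static double find_zero_loss_rate(void)
    {
        double lo = 0.0, hi = 100.0;    /* % of line rate */
        const double resolution = 0.25; /* convergence granularity, % of line rate */
        const int trial_secs = 60;      /* duration of each measurement point */

        while (hi - lo > resolution) {
            double rate = (lo + hi) / 2.0;
            if (run_trial(rate, trial_secs) == 0)
                lo = rate;  /* no loss: try a higher rate */
            else
                hi = rate;  /* loss observed: back off */
        }
        return lo;          /* highest rate verified as zero loss */
    }

The finer the resolution, the more 60-second trials the search needs, which is why a very fine convergence setting makes a complete test take so much longer.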
Also, since the test equipment uses a predefined sequence algorithm that causes the same data rate, to a high degree of accuracy, to be generated for each test, the results for the same data rates can be compared across tests. If someone repeats the tests, I am pointing to the particular data to look at. One 60-second measurement by itself does not give sufficient accuracy to make a conclusion, but information correlated across multiple measurements gives a basis for a correct conclusion.

For l3fwd to be stable with i40e, the queue sizes need to be increased (I use 2k) and the packet count also needs to be increased. This then gets 100% zero-loss line rate with 64-byte packets for two 10 GbE connections (given the correct Fortville firmware). That makes it good for verifying the correct NIC firmware, but it does not work well for testing since the data is network limited. I have my own stable packet processing code which I used for testing. I have multiple programs, but during the optimization cycle I hit line rate and had to move to a 5-tuple processing program for a higher load to proceed. I have a doc that covers this setup and the optimization results, but it cannot be shared. Someone making their own measurements needs to have made sufficient tests to understand the stability of their test environment.

Mike

-----Original Message-----
From: Richardson, Bruce
Sent: Wednesday, October 28, 2015 3:45 AM
To: Polehn, Mike A
Cc: dev@dpdk.org
Subject: Re: [dpdk-dev] [Patch] Eth Driver: Optimization for improved NIC processing rates

On Tue, Oct 27, 2015 at 08:56:31PM +0000, Polehn, Mike A wrote:
> Prefetch of interface access variables while calling into driver RX and TX subroutines.
>
> For converging zero loss packet task tests, a small drop in latency
> for zero loss measurements and small drop in lost packet counts for
> the lossy measurement points was observed, indicating some savings of execution clock cycles.
>
Hi Mike,

the commit log message above seems a bit awkward to read. If I understand it correctly, would the below suggestion be a shorter, clearer equivalent?

    Prefetch RX and TX queue variables in ethdev before driver function call

    This has been measured to produce higher throughput and reduced latency
    in RFC 2544 throughput tests.

Or perhaps you could suggest yourself some similar wording. It would also be good to clarify with what applications the improvements were seen - was it using testpmd or l3fwd or something else?

Regards,
/Bruce
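(For context, a rough sketch of the kind of prefetch being discussed is shown below. It assumes the DPDK 2.x-era rte_eth_rx_burst() layout, where the per-queue data pointer and the driver burst function are reached through struct rte_eth_dev; it is illustrative only and is not the actual patch.)

    #include <rte_ethdev.h>
    #include <rte_prefetch.h>

    /* Sketch: prefetch the RX queue context before handing control to the
     * driver's burst function, so part of the cache-miss latency is hidden
     * behind the call overhead. */
    static inline uint16_t
    eth_rx_burst_with_prefetch(uint8_t port_id, uint16_t queue_id,
                               struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
    {
        struct rte_eth_dev *dev = &rte_eth_devices[port_id];
        void *rxq = dev->data->rx_queues[queue_id];

        /* pull the queue structure into cache before the driver
         * dereferences it */
        rte_prefetch0(rxq);

        return (*dev->rx_pkt_burst)(rxq, rx_pkts, nb_pkts);
    }

The TX side would follow the same pattern with dev->data->tx_queues[] and dev->tx_pkt_burst. The measurable effect is small per call, which matches the discussion above about needing a stable, repeatable RFC 2544 setup to see it.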