From: "Wiles, Keith"
To: Harsh Patel
CC: Stephen Hemminger, Kyle Larose, users@dpdk.org
Date: Thu, 31 Jan 2019 16:58:25 +0000
Subject: Re: [dpdk-users] Query on handling packets

Sent from my iPhone

On Jan 30, 2019, at 5:36 PM, Harsh Patel wrote:

Hello,

This mail is to inform you that the integration of DPDK with ns-3 is working at a basic level. The model is running.

For UDP traffic we are getting throughput equal to or better than the raw socket version (around 100 Mbps).

Unfortunately, for TCP there are burst packet losses which drastically reduce the throughput after some point in time. The bandwidth of the link used was 100 Mbps.

We have obtained cwnd and ssthresh graphs which show that once the flow gets out of Slow Start, there are so many packet losses that the congestion window and the slow start threshold are not able to grow beyond 4-5 packets.

Can you determine where the packets are being dropped?

We have attached the graphs with this mail.

I do not see the graphs attached, but that's OK.

We would like to know if there is any reason for this, or how we can fix it.

I think we have to find out where the packets are being dropped; that is the only explanation for the behaviour you are referring to.
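As a first step towards locating the drops, here is a minimal sketch (in C, not from the thread) of reading the per-port counters with rte_eth_stats_get(): a growing imissed or rx_nombuf suggests the Rx ring or the mbuf pool is being overrun, while oerrors points at the Tx path. The port id and the place this would be called from are assumptions.

/* Sketch only: dump the DPDK port counters so drops become visible.
 * Assumes the EAL and the port are already initialised. */
#include <inttypes.h>
#include <stdio.h>
#include <rte_ethdev.h>

static void dump_port_stats(uint16_t port_id)
{
    struct rte_eth_stats st;

    if (rte_eth_stats_get(port_id, &st) != 0)
        return;

    printf("port %u: ipackets=%" PRIu64 " opackets=%" PRIu64
           " imissed=%" PRIu64 " ierrors=%" PRIu64
           " rx_nombuf=%" PRIu64 " oerrors=%" PRIu64 "\n",
           port_id,
           st.ipackets, st.opackets,
           st.imissed, st.ierrors,
           st.rx_nombuf, st.oerrors);
}

Calling this periodically (or once at the end of a run) and comparing the counters against what ns-3 believes it sent and received should narrow down where the losses occur.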
Thanks & Regards,
Harsh & Hrishikesh

On Wed, 16 Jan 2019 at 19:25, Harsh Patel wrote:

Hi,

We were able to optimise the DPDK version. There were a couple of things we needed to do.

We were using a Tx timeout of 1s/2048 (about 0.5 ms), which we found to be very small. We then increased the timeout, but were getting a lot of retransmissions.

So we removed the timeout and now send each packet as soon as we get it. This increased the throughput.

Then we used the DPDK feature for launching a function on a core and gave Rx a dedicated core. This increased the throughput further. (A rough sketch of both changes is appended after this message.)

The code is working really well for low bandwidth (< ~50 Mbps) and is outperforming the raw socket version. But for high bandwidth we are getting packet length mismatches for some reason. We are investigating it.

We really thank you for your suggestions and also for your patience over the last couple of months.

Thank you,
Regards,
Harsh & Hrishikesh
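For readers following the thread, here is a rough sketch (in C, assumptions only; this is not the actual ns-3 DpdkNetDevice code) of the two changes described above: the Tx side hands each packet to rte_eth_tx_burst() immediately instead of waiting for a timeout, and a dedicated lcore polls rte_eth_rx_burst() in a loop, started with rte_eal_remote_launch(), which is presumably the "launch a function on a core" feature referred to. The port id, queue ids, burst size and lcore id are all assumptions.

/* Sketch only: a dedicated Rx lcore polling in bursts, plus a Tx helper
 * that sends each packet as soon as it is handed over. */
#include <stdint.h>
#include <rte_ethdev.h>
#include <rte_launch.h>
#include <rte_mbuf.h>

#define BURST_SIZE 32

static volatile int rx_running = 1;

/* Runs on its own lcore; hand_to_stack() stands in for whatever the
 * application does with a received packet. */
static int rx_loop(void *arg)
{
    uint16_t port_id = *(uint16_t *)arg;
    struct rte_mbuf *bufs[BURST_SIZE];

    while (rx_running) {
        uint16_t n = rte_eth_rx_burst(port_id, 0, bufs, BURST_SIZE);
        for (uint16_t i = 0; i < n; i++) {
            /* hand_to_stack(bufs[i]); */
            rte_pktmbuf_free(bufs[i]); /* placeholder for real processing */
        }
    }
    return 0;
}

/* Send one packet immediately; no batching timeout. */
static void tx_one(uint16_t port_id, struct rte_mbuf *m)
{
    if (rte_eth_tx_burst(port_id, 0, &m, 1) == 0)
        rte_pktmbuf_free(m); /* Tx ring full: drop here (or retry) */
}

/* After rte_eal_init() and port setup, something like:
 *     static uint16_t port_id = 0;
 *     rte_eal_remote_launch(rx_loop, &port_id, rx_lcore_id);
 */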
On Fri, Jan 4, 2019, 11:27 Harsh Patel wrote:

Yes, that would be helpful.

It'd be OK for now to use the same DPDK version to overcome the build issues. We will look into updating the code for the latest versions once we get past this problem.

Thank you very much.

Regards,
Harsh & Hrishikesh

On Fri, Jan 4, 2019, 04:13 Wiles, Keith wrote:

> On Jan 3, 2019, at 12:12 PM, Harsh Patel wrote:
>
> Hi
>
> We applied your suggestion of removing the `IsLinkUp()` call, but the performance is even worse. We could only get around 340 kbit/s.
>
> The Top Hotspots are:
>
> Function                            Module                              CPU Time
> eth_em_recv_pkts                    librte_pmd_e1000.so                 15.106s
> rte_delay_us_block                  librte_eal.so.6.1                    7.372s
> ns3::DpdkNetDevice::Read            libns3.28.1-fd-net-device-debug.so   5.080s
> rte_eth_rx_burst                    libns3.28.1-fd-net-device-debug.so   3.558s
> ns3::DpdkNetDeviceReader::DoRead    libns3.28.1-fd-net-device-debug.so   3.364s
> [Others]                                                                 4.760s

Performance reduced by removing that link status check; that is weird.

> Upon checking the callers of `rte_delay_us_block`, we found that most of the time spent in this function (92%) is during initialization.
> This does not waste our processing time during communication, so it's a good start to our optimization.
>
> Callers                                 CPU Time: Total   CPU Time: Self
> rte_delay_us_block                      100.0%            7.372s
> e1000_enable_ulp_lpt_lp                  92.3%            6.804s
> e1000_write_phy_reg_mdic                  1.8%            0.136s
> e1000_reset_hw_ich8lan                    1.7%            0.128s
> e1000_read_phy_reg_mdic                   1.4%            0.104s
> eth_em_link_update                        1.4%            0.100s
> e1000_get_cfg_done_generic                0.7%            0.052s
> e1000_post_phy_reset_ich8lan.part.18      0.7%            0.048s

I guess you are having vTune start your application, and that is why you have init-time items in your log. I normally start my application and then attach vTune to it; one of the options when configuring a vTune project is to attach to a running application. Maybe it would help here.

Looking at the data you provided, it was OK. The problem is it would not load the source files, as I did not have the same build or executable. I tried to build the code, but it failed to build and I did not go further. I guess I would need to see the full source tree and the executable you used to really look at the problem. I have limited time, but I can try if you like.

> Effective CPU Utilization: 21.4% (0.856 out of 4)
>
> Here is the link to the VTune profiling results: https://drive.google.com/open?id=1M6g2iRZq2JGPoDVPwZCxWBo7qzUhvWi5
>
> Thank you
>
> Regards

> On Sun, Dec 30, 2018, 06:00 Wiles, Keith wrote:
>
> On Dec 29, 2018, at 4:03 PM, Harsh Patel wrote:
>
> > Hello,
> > As suggested, we tried profiling the application using Intel VTune Amplifier. We aren't sure how to use these results, so we are attaching them to this email.
> >
> > The things we understood were 'Top Hotspots' and 'Effective CPU utilization'. Following are some of our understandings:
> >
> > Top Hotspots
> >
> > Function                            Module                              CPU Time
> > rte_delay_us_block                  librte_eal.so.6.1                   15.042s
> > eth_em_recv_pkts                    librte_pmd_e1000.so                  9.544s
> > ns3::DpdkNetDevice::Read            libns3.28.1-fd-net-device-debug.so   3.522s
> > ns3::DpdkNetDeviceReader::DoRead    libns3.28.1-fd-net-device-debug.so   2.470s
> > rte_eth_rx_burst                    libns3.28.1-fd-net-device-debug.so   2.456s
> > [Others]                                                                 6.656s
> >
> > We knew about the other functions, but not `rte_delay_us_block`, so we investigated its callers:
> >
> > Callers                                 Effective Time   Spin Time   Overhead Time   Effective Time: Self   Wait Time: Total   Wait Time: Self
> > e1000_enable_ulp_lpt_lp                 45.6%            0.0%        0.0%            6.860s                 0usec              0usec
> > e1000_write_phy_reg_mdic                32.7%            0.0%        0.0%            4.916s                 0usec              0usec
> > e1000_read_phy_reg_mdic                 19.4%            0.0%        0.0%            2.922s                 0usec              0usec
> > e1000_reset_hw_ich8lan                   1.0%            0.0%        0.0%            0.143s                 0usec              0usec
> > eth_em_link_update                       0.7%            0.0%        0.0%            0.100s                 0usec              0usec
> > e1000_post_phy_reset_ich8lan.part.18     0.4%            0.0%        0.0%            0.064s                 0usec              0usec
> > e1000_get_cfg_done_generic               0.2%            0.0%        0.0%            0.037s                 0usec              0usec
> >
> > We lack sufficient knowledge to investigate more than this.
> >
> > Effective CPU utilization
> >
> > Interestingly, the effective CPU utilization was 20.8% (0.832 out of 4 logical CPUs). We thought this was low, so we compared it with the raw-socket version of the code, which was even lower at 8.0% (0.318 out of 4 logical CPUs), and even then it performs much better.
> >
> > It would be helpful if you could give us insights on how to use these results, or point us to some resources for doing so.
> >
> > Thank you
>
> BTW, I was able to build ns-3 with DPDK 18.11; it required a couple of changes in the DPDK init code in ns-3 plus one hack in the rte_mbuf.h file.
>
> I did have a problem including the rte_mbuf.h file in your code. It appears the g++ compiler did not like referencing struct rte_mbuf_sched inside the rte_mbuf structure; rte_mbuf_sched sits inside the big union. As a hack I moved the struct outside of the rte_mbuf structure and replaced the struct in the union with 'struct rte_mbuf_sched sched;', but I am guessing you are missing some compiler options in your build system, as DPDK builds just fine without that hack.
>
> The next place was the rxmode and the txq_flags. The rxmode structure has changed, so I commented out the inits in ns-3, and then commented out the txq_flags init code as these are now the defaults.
>
> Regards,
> Keith

Regards,
Keith
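For reference, a rough sketch (in C, assumptions only; this is not the actual ns-3 patch) of what a DPDK 18.11-style port bring-up looks like once the old rxmode initialisers and txq_flags are dropped: the rte_eth_conf can stay almost empty, and passing NULL for the Tx queue configuration picks up the driver defaults. The queue counts, descriptor counts and the mbuf pool are assumptions.

/* Sketch only: minimal single-queue port setup for DPDK 18.11.
 * "pool" must be a valid mempool, e.g. from rte_pktmbuf_pool_create(). */
#include <rte_ethdev.h>
#include <rte_ether.h>
#include <rte_mempool.h>

static int setup_port(uint16_t port_id, struct rte_mempool *pool)
{
    /* No txq_flags any more, and most rxmode fields default sensibly. */
    struct rte_eth_conf conf = {
        .rxmode = { .max_rx_pkt_len = ETHER_MAX_LEN },
    };
    int ret;

    ret = rte_eth_dev_configure(port_id, 1 /* Rx queues */, 1 /* Tx queues */, &conf);
    if (ret != 0)
        return ret;

    ret = rte_eth_rx_queue_setup(port_id, 0, 1024,
                                 rte_eth_dev_socket_id(port_id), NULL, pool);
    if (ret < 0)
        return ret;

    /* NULL txconf means "use the driver defaults" in 18.11. */
    ret = rte_eth_tx_queue_setup(port_id, 0, 1024,
                                 rte_eth_dev_socket_id(port_id), NULL);
    if (ret < 0)
        return ret;

    return rte_eth_dev_start(port_id);
}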