From mboxrd@z Thu Jan 1 00:00:00 1970
From: Harsh Patel <thadodaharsh10@gmail.com>
Date: Tue, 5 Feb 2019 20:03:14 +0530
To: "Wiles, Keith"
Cc: Stephen Hemminger, Kyle Larose, "users@dpdk.org"
Subject: Re: [dpdk-users] Query on handling packets
List-Id: DPDK usage discussions

Cool. Thanks a lot. We'll do that.

On Tue, Feb 5, 2019, 19:57 Wiles, Keith wrote:
>
> > On Feb 5, 2019, at 8:22 AM, Harsh Patel wrote:
> >
> > Can you help us with those questions we asked you? We need them as
> > parameters for our testing.
>
> I would love to, but I do not know much about what you are asking, sorry.
>
> I hope someone else steps in; maybe the PMD maintainer could help. Look in
> the MAINTAINERS file and message him directly.
>
> > Thanks,
> > Harsh & Hrishikesh
> >
> > On Tue, Feb 5, 2019, 19:42 Wiles, Keith wrote:
> >
> > > On Feb 5, 2019, at 8:00 AM, Harsh Patel wrote:
> > >
> > > Hi,
> > > One of the mistakes was the following: ns-3 frees the packet buffer as
> > > soon as it writes to the socket, and thus we thought that we should do
> > > the same. But DPDK, when transmitting, places the packet buffer on the
> > > Tx descriptor ring and performs the transmission on its own after
> > > that. We were freeing too early, so sometimes packets were lost, i.e.
> > > freed before transmission.
> > >
> > > Another thing was that, as you suggested earlier, we compiled the
> > > whole ns-3 in optimized mode. That improved the performance.
> > >
> > > These 2 things combined got us the desired results.
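A minimal sketch of the mbuf ownership rule described above, in plain DPDK
C (the function name, port/queue ids and burst size are illustrative):
rte_eth_tx_burst() takes ownership of every mbuf it accepts, so the caller
frees only the ones that were not enqueued.

    #include <rte_ethdev.h>
    #include <rte_mbuf.h>

    /* Transmit a burst and free only what the PMD did not accept.
     * Once rte_eth_tx_burst() accepts an mbuf, it sits on the Tx
     * descriptor ring and the PMD frees it after the NIC has sent it;
     * freeing it here as well is the "freed before transmission" bug
     * described above. */
    static void
    tx_burst_and_reclaim(uint16_t port_id, uint16_t queue_id,
                         struct rte_mbuf **pkts, uint16_t nb_pkts)
    {
        uint16_t sent = rte_eth_tx_burst(port_id, queue_id, pkts, nb_pkts);

        /* Free only the mbufs the PMD rejected (e.g. ring full). */
        for (uint16_t i = sent; i < nb_pkts; i++)
            rte_pktmbuf_free(pkts[i]);
    }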
> > Excellent, thanks.
> >
> > > Regards,
> > > Harsh & Hrishikesh
> > >
> > > On Tue, Feb 5, 2019, 18:33 Wiles, Keith wrote:
> > >
> > > > On Feb 5, 2019, at 12:37 AM, Harsh Patel wrote:
> > > >
> > > > Hi,
> > > >
> > > > We would like to inform you that our code is working as expected and
> > > > we are able to obtain a 95-98 Mbps data rate for a 100 Mbps
> > > > application rate. We are now working on the testing of the code.
> > > > Thanks a lot, especially to Keith, for all the help you provided.
> > > >
> > > > We have 2 main queries:
> > > > 1) We wanted to calculate the backlog at the NIC Tx descriptors but
> > > > were not able to find anything in the documentation. Can you help us
> > > > with how to calculate the backlog?
> > > > 2) We searched for how to use Byte Queue Limits (BQL) on the NIC
> > > > queue but couldn't find anything like that in DPDK. Does DPDK
> > > > support BQL? If so, can you help us with how to use it for our
> > > > project?
> > >
> > > What was the last set of problems, if I may ask?
> > >
> > > > Thanks & Regards
> > > > Harsh & Hrishikesh
> > > >
> > > > On Thu, 31 Jan 2019 at 22:28, Wiles, Keith wrote:
> > > >
> > > > Sent from my iPhone
> > > >
> > > > On Jan 30, 2019, at 5:36 PM, Harsh Patel wrote:
> > > >
> > > >> Hello,
> > > >>
> > > >> This mail is to inform you that the integration of DPDK is working
> > > >> with ns-3 on a basic level. The model is running.
> > > >> For UDP traffic we are getting throughput the same as or better
> > > >> than the raw socket (around 100 Mbps).
> > > >> But unfortunately for TCP there are burst packet losses, due to
> > > >> which the throughput is drastically affected after some point of
> > > >> time. The bandwidth of the link used was 100 Mbps.
> > > >> We have obtained cwnd and ssthresh graphs which show that once the
> > > >> flow gets out of Slow Start mode, there are so many packet losses
> > > >> that the congestion window and the slow start threshold are not
> > > >> able to go above 4-5 packets.
> > > >
> > > > Can you determine where the packets are being dropped?
> > > >
> > > >> We have attached the graphs with this mail.
> > > >
> > > > I do not see the graphs attached, but that's OK.
> > > >
> > > >> We would like to know if there is any reason for this or how we can
> > > >> fix it.
> > > >
> > > > I think we have to find out where the packets are being dropped;
> > > > that is the only reason for the case you're referring to.
> > > >
> > > >> Thanks & Regards
> > > >> Harsh & Hrishikesh
> > > >>
> > > >> On Wed, 16 Jan 2019 at 19:25, Harsh Patel wrote:
> > > >> Hi,
> > > >>
> > > >> We were able to optimise the DPDK version. There were a couple of
> > > >> things we needed to do.
> > > >>
> > > >> We were using a Tx timeout of 1s/2048, which we found to be too
> > > >> short. Then we increased the timeout, but we were getting a lot of
> > > >> retransmissions.
> > > >>
> > > >> So we removed the timeout and send each packet as soon as we get
> > > >> it. This increased the throughput.
> > > >>
> > > >> Then we used the DPDK feature to launch a function on a core, and
> > > >> gave Rx a dedicated core. This increased the throughput further.
> > > >>
> > > >> The code is working really well for low bandwidth (<~50 Mbps) and
> > > >> is outperforming the raw socket version.
> > > >> But for high bandwidth, we are getting packet length mismatches for
> > > >> some reason. We are investigating it.
> > > >>
> > > >> We really thank you for your suggestions and also for your patience
> > > >> over the last couple of months.
> > > >>
> > > >> Thank you
> > > >>
> > > >> Regards,
> > > >> Harsh & Hrishikesh
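A minimal sketch of the "launch a function on a core" pattern mentioned
above, using DPDK's EAL threading API (the burst size, queue id and
shutdown flag are illustrative assumptions):

    #include <stdatomic.h>
    #include <rte_eal.h>
    #include <rte_ethdev.h>
    #include <rte_lcore.h>
    #include <rte_mbuf.h>

    static atomic_bool keep_running = true;   /* assumed shutdown flag */

    /* Busy-poll Rx loop pinned to its own lcore. */
    static int
    rx_loop(void *arg)
    {
        uint16_t port_id = *(uint16_t *)arg;
        struct rte_mbuf *pkts[32];

        while (atomic_load(&keep_running)) {
            uint16_t n = rte_eth_rx_burst(port_id, 0, pkts, 32);
            for (uint16_t i = 0; i < n; i++) {
                /* hand the packet to the application here, then */
                rte_pktmbuf_free(pkts[i]);
            }
        }
        return 0;
    }

    /* From the main lcore, after rte_eal_init():
     *     unsigned lcore = rte_get_next_lcore(-1, 1, 0);
     *     rte_eal_remote_launch(rx_loop, &port_id, lcore);
     */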
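On query (1) above: DPDK does not implement BQL (that is a Linux kernel
netdev mechanism), but one way to approximate the Tx descriptor backlog is
to probe the ring with rte_eth_tx_descriptor_status(). A hedged sketch: it
assumes the PMD implements the descriptor-status callback (it may return
-ENOTSUP) and that ring_size is the nb_tx_desc value used at queue setup.

    #include <rte_ethdev.h>

    /* Coarse estimate of the Tx backlog: count descriptors the
     * hardware has not yet completed. The queue is live while we
     * walk it, so this is only a snapshot. */
    static int
    tx_backlog_estimate(uint16_t port_id, uint16_t queue_id,
                        uint16_t ring_size)
    {
        int backlog = 0;

        for (uint16_t off = 0; off < ring_size; off++) {
            int st = rte_eth_tx_descriptor_status(port_id, queue_id, off);

            if (st < 0)                      /* -ENOTSUP, -ENODEV, ... */
                return st;
            if (st == RTE_ETH_TX_DESC_FULL)  /* still owned by the NIC */
                backlog++;
        }
        return backlog;
    }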
> > > >>
> > > >> On Fri, Jan 4, 2019, 11:27 Harsh Patel wrote:
> > > >> Yes, that would be helpful.
> > > >> It'd be OK for now to use the same DPDK version to overcome the
> > > >> build issues.
> > > >> We will look into updating the code for the latest versions once we
> > > >> get past this problem.
> > > >>
> > > >> Thank you very much.
> > > >>
> > > >> Regards,
> > > >> Harsh & Hrishikesh
> > > >>
> > > >> On Fri, Jan 4, 2019, 04:13 Wiles, Keith wrote:
> > > >>
> > > >> > On Jan 3, 2019, at 12:12 PM, Harsh Patel
> > > >> > <thadodaharsh10@gmail.com> wrote:
> > > >> >
> > > >> > Hi,
> > > >> >
> > > >> > We applied your suggestion of removing the `IsLinkUp()` call, but
> > > >> > the performance is even worse. We could only get around
> > > >> > 340 kbit/s.
> > > >> >
> > > >> > The top hotspots are:
> > > >> >
> > > >> > Function                          Module                              CPU Time
> > > >> > eth_em_recv_pkts                  librte_pmd_e1000.so                 15.106s
> > > >> > rte_delay_us_block                librte_eal.so.6.1                   7.372s
> > > >> > ns3::DpdkNetDevice::Read          libns3.28.1-fd-net-device-debug.so  5.080s
> > > >> > rte_eth_rx_burst                  libns3.28.1-fd-net-device-debug.so  3.558s
> > > >> > ns3::DpdkNetDeviceReader::DoRead  libns3.28.1-fd-net-device-debug.so  3.364s
> > > >> > [Others]                                                              4.760s
> > > >>
> > > >> Performance dropped after removing that link status check; that is
> > > >> weird.
> > > >>
> > > >> > Upon checking the callers of `rte_delay_us_block`, we got to know
> > > >> > that most of the time (92%) spent in this function is during
> > > >> > initialization.
> > > >> > This does not waste our processing time during communication. So
> > > >> > it's a good start to our optimization.
> > > >> >
> > > >> > Callers                               CPU Time: Total  CPU Time: Self
> > > >> > rte_delay_us_block                    100.0%           7.372s
> > > >> > e1000_enable_ulp_lpt_lp               92.3%            6.804s
> > > >> > e1000_write_phy_reg_mdic              1.8%             0.136s
> > > >> > e1000_reset_hw_ich8lan                1.7%             0.128s
> > > >> > e1000_read_phy_reg_mdic               1.4%             0.104s
> > > >> > eth_em_link_update                    1.4%             0.100s
> > > >> > e1000_get_cfg_done_generic            0.7%             0.052s
> > > >> > e1000_post_phy_reset_ich8lan.part.18  0.7%             0.048s
> > > >>
> > > >> I guess you are having VTune start your application, and that is
> > > >> why you have init-time items in your log. I normally start my
> > > >> application and then attach VTune to it; one of the options in the
> > > >> VTune project configuration is to attach to a running application.
> > > >> Maybe it would help here.
> > > >>
> > > >> Looking at the data you provided, it was OK. The problem is it
> > > >> would not load the source files, as I did not have the same build
> > > >> or executable. I tried to build the code, but it failed to build
> > > >> and I did not go further. I guess I would need to see the full
> > > >> source tree and the executable you used to really look at the
> > > >> problem. I have limited time, but I can try if you like.
> > > >>
> > > >> > Effective CPU Utilization: 21.4% (0.856 out of 4)
> > > >> >
> > > >> > Here is the link to the VTune profiling results:
> > > >> > https://drive.google.com/open?id=1M6g2iRZq2JGPoDVPwZCxWBo7qzUhvWi5
> > > >> >
> > > >> > Thank you
> > > >> >
> > > >> > Regards
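For reference on the link-status point above, a minimal sketch of caching
the link state so the per-packet path avoids querying the device (API
names are from the DPDK 18.x era discussed in this thread; the refresh
cadence and variable names are assumptions):

    #include <string.h>
    #include <rte_ethdev.h>

    /* Cache the link state instead of checking it on every send.
     * Refresh from a slow path (e.g. a timer or control thread);
     * the Tx fast path then reads only the cached flag. */
    static volatile int cached_link_up;

    static void
    refresh_link_state(uint16_t port_id)
    {
        struct rte_eth_link link;

        memset(&link, 0, sizeof(link));
        rte_eth_link_get_nowait(port_id, &link);   /* non-blocking */
        cached_link_up = (link.link_status == ETH_LINK_UP);
    }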
> > > >> >
> > > >> > On Sun, Dec 30, 2018, 06:00 Wiles, Keith wrote:
> > > >> >
> > > >> > > On Dec 29, 2018, at 4:03 PM, Harsh Patel
> > > >> > > <thadodaharsh10@gmail.com> wrote:
> > > >> > >
> > > >> > > Hello,
> > > >> > > As suggested, we tried profiling the application using Intel
> > > >> > > VTune Amplifier. We aren't sure how to use these results, so we
> > > >> > > are attaching them to this email.
> > > >> > >
> > > >> > > The things we understood were 'Top Hotspots' and 'Effective CPU
> > > >> > > Utilization'. Following are some of our understandings:
> > > >> > >
> > > >> > > Top Hotspots
> > > >> > >
> > > >> > > Function                          Module                              CPU Time
> > > >> > > rte_delay_us_block                librte_eal.so.6.1                   15.042s
> > > >> > > eth_em_recv_pkts                  librte_pmd_e1000.so                 9.544s
> > > >> > > ns3::DpdkNetDevice::Read          libns3.28.1-fd-net-device-debug.so  3.522s
> > > >> > > ns3::DpdkNetDeviceReader::DoRead  libns3.28.1-fd-net-device-debug.so  2.470s
> > > >> > > rte_eth_rx_burst                  libns3.28.1-fd-net-device-debug.so  2.456s
> > > >> > > [Others]                                                              6.656s
> > > >> > >
> > > >> > > We knew about the other methods, except `rte_delay_us_block`.
> > > >> > > So we investigated the callers of this method:
> > > >> > >
> > > >> > > Callers                               Effective Time (%)  Spin (%)  Overhead (%)  Effective Time  Spin Time  Overhead Time
> > > >> > > e1000_enable_ulp_lpt_lp               45.6%               0.0%      0.0%          6.860s          0usec      0usec
> > > >> > > e1000_write_phy_reg_mdic              32.7%               0.0%      0.0%          4.916s          0usec      0usec
> > > >> > > e1000_read_phy_reg_mdic               19.4%               0.0%      0.0%          2.922s          0usec      0usec
> > > >> > > e1000_reset_hw_ich8lan                1.0%                0.0%      0.0%          0.143s          0usec      0usec
> > > >> > > eth_em_link_update                    0.7%                0.0%      0.0%          0.100s          0usec      0usec
> > > >> > > e1000_post_phy_reset_ich8lan.part.18  0.4%                0.0%      0.0%          0.064s          0usec      0usec
> > > >> > > e1000_get_cfg_done_generic            0.2%                0.0%      0.0%          0.037s          0usec      0usec
> > > >> > >
> > > >> > > We lack sufficient knowledge to investigate more than this.
> > > >> > >
> > > >> > > Effective CPU Utilization
> > > >> > >
> > > >> > > Interestingly, the effective CPU utilization was 20.8% (0.832
> > > >> > > out of 4 logical CPUs). We thought this was low, so we compared
> > > >> > > it with the raw-socket version of the code, which was even
> > > >> > > lower, 8.0% (0.318 out of 4 logical CPUs), and even then it
> > > >> > > performs way better.
> > > >> > >
> > > >> > > It would be helpful if you could give us insights on how to use
> > > >> > > these results, or point us to some resources on how to do so.
> > > >> > >
> > > >> > > Thank you
> > > >> >
> > > >> > BTW, I was able to build ns-3 with DPDK 18.11. It required a
> > > >> > couple of changes in the DPDK init code in ns-3, plus one hack in
> > > >> > the rte_mbuf.h file.
> > > >> >
> > > >> > I did have a problem including the rte_mbuf.h file into your
> > > >> > code. It appears the g++ compiler did not like referencing the
> > > >> > struct rte_mbuf_sched inside the rte_mbuf structure. The
> > > >> > rte_mbuf_sched was inside the big union; as a hack I moved the
> > > >> > struct outside of the rte_mbuf structure and replaced the struct
> > > >> > in the union with 'struct rte_mbuf_sched sched;', but I am
> > > >> > guessing you are missing some compiler options in your build
> > > >> > system, as DPDK builds just fine without that hack.
> > > >> >
> > > >> > The next place was the rxmode and the txq_flags. The rxmode
> > > >> > structure has changed, so I commented out the inits in ns-3 and
> > > >> > then commented out the txq_flags init code, as these are now the
> > > >> > defaults.
> > > >> >
> > > >> > Regards,
> > > >> > Keith
> > > >>
> > > >> Regards,
> > > >> Keith
> > >
> > > Regards,
> > > Keith
> >
> > Regards,
> > Keith
>
> Regards,
> Keith
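As an illustration of the 18.11-era changes described in the last message
(queue counts, descriptor count and frame length are illustrative; the
point is that txq_flags is gone and a NULL txconf picks up the driver
defaults):

    #include <rte_ethdev.h>

    /* DPDK 18.11-style port configuration: no txq_flags, and an
     * rxmode that mostly relies on zero-initialized defaults. */
    static int
    configure_port(uint16_t port_id)
    {
        struct rte_eth_conf port_conf = {
            .rxmode = {
                .max_rx_pkt_len = ETHER_MAX_LEN,   /* 1518, illustrative */
            },
        };
        int ret = rte_eth_dev_configure(port_id, 1 /* Rx queues */,
                                        1 /* Tx queues */, &port_conf);
        if (ret < 0)
            return ret;

        /* txconf == NULL uses the driver defaults, replacing the old
         * txq_flags initialization that 18.11 removed. */
        return rte_eth_tx_queue_setup(port_id, 0, 512,
                                      rte_eth_dev_socket_id(port_id), NULL);
    }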