From: Harsh Patel
Date: Fri, 4 Jan 2019 11:27:45 +0530
To: "Wiles, Keith"
Cc: Stephen Hemminger, Kyle Larose, users@dpdk.org
Subject: Re: [dpdk-users] Query on handling packets

Yes, that would be helpful. It would be OK for now to use the same DPDK
version to overcome the build issues. We will look into updating the code
for the latest versions once we get past this problem.

Thank you very much.

Regards,
Harsh & Hrishikesh

On Fri, Jan 4, 2019, 04:13 Wiles, Keith wrote:

> > On Jan 3, 2019, at 12:12 PM, Harsh Patel wrote:
> >
> > Hi
> >
> > We applied your suggestion of removing the `IsLinkUp()` call. But the
> > performance is even worse. We could only get around 340 kbit/s.
> >
> > The Top Hotspots are:
> >
> > Function                          Module                               CPU Time
> > eth_em_recv_pkts                  librte_pmd_e1000.so                  15.106s
> > rte_delay_us_block                librte_eal.so.6.1                    7.372s
> > ns3::DpdkNetDevice::Read          libns3.28.1-fd-net-device-debug.so   5.080s
> > rte_eth_rx_burst                  libns3.28.1-fd-net-device-debug.so   3.558s
> > ns3::DpdkNetDeviceReader::DoRead  libns3.28.1-fd-net-device-debug.so   3.364s
> > [Others]                                                               4.760s
>
> Performance went down after removing that link status check; that is
> weird.
> >
> > Upon checking the callers of `rte_delay_us_block`, we found that most
> > of the time (92%) spent in this function is during initialization.
> > This does not waste our processing time during communication. So, it's
> > a good start to our optimization.
> >
> > Callers                               CPU Time: Total   CPU Time: Self
> > rte_delay_us_block                    100.0%            7.372s
> > e1000_enable_ulp_lpt_lp               92.3%             6.804s
> > e1000_write_phy_reg_mdic              1.8%              0.136s
> > e1000_reset_hw_ich8lan                1.7%              0.128s
> > e1000_read_phy_reg_mdic               1.4%              0.104s
> > eth_em_link_update                    1.4%              0.100s
> > e1000_get_cfg_done_generic            0.7%              0.052s
> > e1000_post_phy_reset_ich8lan.part.18  0.7%              0.048s
>
> I guess you are having VTune start your application, and that is why you
> have init-time items in your log. I normally start my application and
> then attach VTune to it. One of the options in the VTune configuration
> for that project is to attach to the application. Maybe it would help
> here.
>
> Looking at the data you provided, it was OK. The problem is that it would
> not load the source files, as I did not have the same build or
> executable. I tried to build the code, but it failed to build and I did
> not go further. I guess I would need to see the full source tree and the
> executable you used to really look at the problem. I have limited time,
> but I can try if you like.
>
> > Effective CPU Utilization: 21.4% (0.856 out of 4)
> >
> > Here is the link to the VTune profiling results:
> > https://drive.google.com/open?id=1M6g2iRZq2JGPoDVPwZCxWBo7qzUhvWi5
> >
> > Thank you
> >
> > Regards
> >
> > On Sun, Dec 30, 2018, 06:00 Wiles, Keith wrote:
> >
> > > On Dec 29, 2018, at 4:03 PM, Harsh Patel wrote:
> > >
> > > Hello,
> > > As suggested, we tried profiling the application using Intel VTune
> > > Amplifier. We aren't sure how to use these results, so we are
> > > attaching them to this email.
> > >
> > > The things we understood were 'Top Hotspots' and 'Effective CPU
> > > utilization'.
> > > Following are some of our understandings:
> > >
> > > Top Hotspots
> > >
> > > Function                          Module                               CPU Time
> > > rte_delay_us_block                librte_eal.so.6.1                    15.042s
> > > eth_em_recv_pkts                  librte_pmd_e1000.so                  9.544s
> > > ns3::DpdkNetDevice::Read          libns3.28.1-fd-net-device-debug.so   3.522s
> > > ns3::DpdkNetDeviceReader::DoRead  libns3.28.1-fd-net-device-debug.so   2.470s
> > > rte_eth_rx_burst                  libns3.28.1-fd-net-device-debug.so   2.456s
> > > [Others]                                                               6.656s
> > >
> > > We recognized all of these functions except `rte_delay_us_block`, so
> > > we investigated its callers:
> > >
> > > Callers | Effective Time | Spin Time | Overhead Time | Effective Time
> > > | Spin Time | Overhead Time | Wait Time: Total | Wait Time: Self
> > >
> > > e1000_enable_ulp_lpt_lp               45.6%  0.0%  0.0%  6.860s  0usec  0usec
> > > e1000_write_phy_reg_mdic              32.7%  0.0%  0.0%  4.916s  0usec  0usec
> > > e1000_read_phy_reg_mdic               19.4%  0.0%  0.0%  2.922s  0usec  0usec
> > > e1000_reset_hw_ich8lan                1.0%   0.0%  0.0%  0.143s  0usec  0usec
> > > eth_em_link_update                    0.7%   0.0%  0.0%  0.100s  0usec  0usec
> > > e1000_post_phy_reset_ich8lan.part.18  0.4%   0.0%  0.0%  0.064s  0usec  0usec
> > > e1000_get_cfg_done_generic            0.2%   0.0%  0.0%  0.037s  0usec  0usec
> > >
> > > We lack sufficient knowledge to investigate more than this.
> > >
> > > Effective CPU utilization
> > >
> > > Interestingly, the effective CPU utilization was 20.8% (0.832 out of
> > > 4 logical CPUs). We thought this was low, so we compared it with the
> > > raw-socket version of the code, which was even lower at 8.0% (0.318
> > > out of 4 logical CPUs), and yet it performs far better.
> > >
> > > It would be helpful if you could give us insights on how to use these
> > > results or point us to some resources to do so.
> > >
> > > Thank you
> >
> > BTW, I was able to build ns3 with DPDK 18.11; it required a couple of
> > changes in the DPDK init code in ns3 plus one hack in the rte_mbuf.h
> > file.
> >
> > I did have a problem including the rte_mbuf.h file into your code.
> > It appears the g++ compiler did not like referencing the struct
> > rte_mbuf_sched inside the rte_mbuf structure. The rte_mbuf_sched was
> > inside the big union; as a hack, I moved the struct outside of the
> > rte_mbuf structure and replaced the struct in the union with
> > 'struct rte_mbuf_sched sched;'. I am guessing you are missing some
> > compiler options in your build system, as DPDK builds just fine without
> > that hack.
> >
> > The next place was the rxmode and the txq_flags. The rxmode structure
> > has changed, so I commented out the inits in ns3 and then commented out
> > the txq_flags init code, as these are now the defaults.
> >
> > Regards,
> > Keith
>
> Regards,
> Keith
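
The percentages in the Jan 3 caller breakdown quoted above can be
reproduced from the reported self times, which is a quick way to check how
much of the `rte_delay_us_block` time belongs to any one caller. What
follows is only a minimal sketch in Python: the figures are copied from the
quoted VTune summary, while the names `CALLER_SELF_TIME_S`,
`caller_shares` and `effective_cpu_utilization` are illustrative and are
not part of VTune, DPDK or ns3.

```python
# Self time in seconds attributed to each caller of rte_delay_us_block,
# copied from the Jan 3 profile quoted in this thread.
CALLER_SELF_TIME_S = {
    "e1000_enable_ulp_lpt_lp": 6.804,
    "e1000_write_phy_reg_mdic": 0.136,
    "e1000_reset_hw_ich8lan": 0.128,
    "e1000_read_phy_reg_mdic": 0.104,
    "eth_em_link_update": 0.100,
    "e1000_get_cfg_done_generic": 0.052,
    "e1000_post_phy_reset_ich8lan.part.18": 0.048,
}


def caller_shares(times):
    """Return each caller's share of the summed time, in percent (1 decimal)."""
    total = sum(times.values())
    return {name: round(100.0 * t / total, 1) for name, t in times.items()}


def effective_cpu_utilization(busy_cpus, logical_cpus):
    """Return busy CPUs as a percentage of the logical CPUs (1 decimal)."""
    return round(100.0 * busy_cpus / logical_cpus, 1)


if __name__ == "__main__":
    shares = caller_shares(CALLER_SELF_TIME_S)
    # Print callers from largest to smallest share.
    for name, pct in sorted(shares.items(), key=lambda kv: -kv[1]):
        print(f"{name:40s} {pct:5.1f}%")
    print("Effective CPU utilization:",
          effective_cpu_utilization(0.856, 4), "%")
```

The per-caller times sum to the 7.372s reported for `rte_delay_us_block`,
and the share computed for `e1000_enable_ulp_lpt_lp` agrees with the 92.3%
in the quoted table. The same ratio applied to 0.856 busy CPUs out of 4
gives the reported 21.4% effective utilization.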