Date: Fri, 15 Apr 2016 02:51:39 +0300
From: Alexander Kiselev
To: "Hu, Xuekun", Shawn Lewis
Cc: users@dpdk.org
Subject: Re: [dpdk-users] Lcore impact

I found out the cause of the context switches and iTLB misses: I was running
the socket-reading application on the same host as the DPDK app. Sorry guys,
I was a fool :) Thank you for your help. Shawn, you were right from the
beginning; I should have thought more carefully about the other applications
running on the same host.

2016-04-15 1:47 GMT+03:00 Alexander Kiselev:
> Yes. 31% is iTLB-load-misses. My CPU is an Intel(R) Core(TM) i5-2400 CPU @
> 3.10GHz.
> There are no other big differences. I would say there are no other little
> differences either. The numbers are about the same.
>
> I also noticed another strange thing: once I start socket operations, the
> perf context-switches counter increases in ALL the sibling threads
> corresponding to DPDK lcores. But why? Only one thread is doing socket
> operations and invoking system calls, so I expected context switches to
> occur only in that thread, not in all of them.
>
> 2016-04-15 1:09 GMT+03:00 Hu, Xuekun:
>
>> I think 31.09% means iTLB-load-misses, right? To be straightforward, yes,
>> this count means code misses are high, i.e. the code footprint is big.
>> For example, function A calls function B, while the code address of B is
>> far away from A.
>>
>> Is there any other big difference, like L2/L3 cache misses? Actually I
>> don't expect the iTLB-load-misses to have that big an impact (10% packet
>> loss).
>>
>> BTW, what's your CPU?
>>
>> *From:* Alexander Kiselev [mailto:kiselev99@gmail.com]
>> *Sent:* Friday, April 15, 2016 5:33 AM
>> *To:* Hu, Xuekun
>> *Cc:* users@dpdk.org
>> *Subject:* Re: [dpdk-users] Lcore impact
>>
>> I've done my homework with perf and the results show that the
>> iTLB-load-misses value is very high. In the tests without socket
>> operations the processing lcore has 0.87% of all iTLB cache hits and
>> there is no packet loss. In the test WITH socket operations the
>> processing lcore has 31.09% of all iTLB cache hits and there is about
>> 10% packet loss. How should I interpret these results? Google shows
>> little about the iTLB. So far some web pages suggest the following:
>>
>> "Try to minimize the size of the source code and improve locality so
>> that instructions span a minimum number of pages, and so that the
>> instruction span is less than the number of ITLB entries."
>>
>> Any ideas?
>>
>> 2016-04-14 23:43 GMT+03:00 Hu, Xuekun:
>>
>> Perf could. Or PCM, that is also a good tool.
>> https://software.intel.com/en-us/articles/intel-performance-counter-monitor-a-better-way-to-measure-cpu-utilization
>>
>> *From:* Alexander Kiselev [mailto:kiselev99@gmail.com]
>> *Sent:* Friday, April 15, 2016 3:31 AM
>> *To:* Hu, Xuekun
>> *Cc:* Shawn Lewis; users@dpdk.org
>> *Subject:* Re: [dpdk-users] Lcore impact
>>
>> 2016-04-14 20:49 GMT+03:00 Hu, Xuekun:
>>
>> Are the two lcores on one processor, or on two processors?
>> What is the memory footprint of the system-call thread? If the memory
>> footprint is big (larger than the LLC size) and the two lcores are on
>> the same processor, then it could have an impact on the packet-processing
>> thread.
>>
>> Those two lcores belong to one processor, and it's a single-processor
>> machine.
>>
>> Both cores allocate a lot of memory and use the full DPDK arsenal: LPM,
>> mempools, hashes, etc. But during the test the core doing the socket
>> data transfer uses only a small 16k buffer for sending, and sending is
>> all it does during the test. It doesn't use any of the other allocated
>> memory structures. The processing core in turn uses an rte_lpm which is
>> big, but in my test there are only about 10 routes in it, so I think the
>> amount of "hot" memory is not very big. But I can't say whether it's
>> bigger than the L3 CPU cache or not. Should I use a profiler and see if
>> the socket operations cause a lot of cache misses in the processing
>> lcore? Is there some tool that allows me to do that? perf, maybe?
>>
>> -----Original Message-----
>> From: users [mailto:users-bounces@dpdk.org] On Behalf Of Alexander Kiselev
>> Sent: Friday, April 15, 2016 1:19 AM
>> To: Shawn Lewis
>> Cc: users@dpdk.org
>> Subject: Re: [dpdk-users] Lcore impact
>>
>> I've already seen this document and have used these tricks many times.
>> But this time I am sending data locally over localhost. There are not
>> even any NICs bound to Linux on my machine, therefore there are no NIC
>> interrupts I can pin to a CPU. So what do you propose?
>>
>> > On 14 Apr 2016, at 20:06, Shawn Lewis wrote:
>> >
>> > You have to work with IRQBalancer as well.
>> >
>> > http://www.intel.com/content/dam/doc/application-note/82575-82576-82598-82599-ethernet-controllers-interrupts-appl-note.pdf
>> >
>> > is just an example document which discusses this (not so much DPDK
>> > related)...
>> > But the OS will attempt to balance the interrupts when you actually
>> > want to remove or pin them down...
>> >
>> >> On Thu, Apr 14, 2016 at 1:02 PM, Alexander Kiselev
>> >> <kiselev99@gmail.com> wrote:
>> >>
>> >>> On 14 Apr 2016, at 19:35, Shawn Lewis wrote:
>> >>>
>> >>> Lots of things...
>> >>>
>> >>> One: just because you have a process running on an lcore does not
>> >>> mean that's all that runs on it. Unless you have told the kernel at
>> >>> boot NOT to use those specific cores, those cores will be used for
>> >>> many OS-related things.
>> >>
>> >> Generally yes, but unless I start sending data to the socket there
>> >> is no packet loss. I did about 10 test runs in a row and everything
>> >> was ok. And there is no other application running on that test
>> >> machine that uses CPU cores.
>> >>
>> >> So the question is: why do these socket operations influence the
>> >> other lcore?
>> >>
>> >>> IRQBalance.
>> >>> System OS operations.
>> >>> Other applications.
>> >>>
>> >>> So by doing file I/O you are generating interrupts, and where those
>> >>> interrupts get serviced is up to IRQBalancer. So it could be any
>> >>> one of your cores.
>> >>
>> >> That is a good point. I can use the CPU affinity feature to bind the
>> >> interrupt handler to a core not used in my test. But I send data
>> >> locally over localhost. Is it possible to use CPU affinity in that
>> >> case?
>> >>
>> >>>> On Thu, Apr 14, 2016 at 12:31 PM, Alexander Kiselev
>> >>>> <kiselev99@gmail.com> wrote:
>> >>>> Could someone give me any hints about what could cause performance
>> >>>> issues in a situation where one lcore doing a lot of Linux system
>> >>>> calls (read/write on a socket) slows down another lcore doing
>> >>>> packet forwarding? In my test the forwarding lcore doesn't share
>> >>>> any memory structures with the other lcore that sends test data to
>> >>>> the socket. Both lcores are pinned to different processor cores.
>> >>>> So theoretically they shouldn't have any impact on each other, but
>> >>>> they do: once one lcore starts sending data to the socket, the
>> >>>> other lcore starts dropping packets. Why?

--
Best regards,
Alexander Kiselev
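[For readers of the archive: the diagnostics discussed in this thread — per-core perf counters, per-thread context-switch counts, and pinning processes and interrupts away from the forwarding lcores — can be sketched roughly as below. The core IDs, PID variables, and IRQ number are placeholders for illustration, not values taken from the thread.]

```shell
#!/bin/sh
# Sketch of the measurement and pinning steps discussed above.
# Requires root and the perf tool; adjust core IDs, PIDs, and the
# IRQ number to your own setup.

# 1. Per-core counters: iTLB and LLC misses plus context switches on the
#    cores the DPDK lcores are pinned to (here cores 1 and 2), for 10 s.
perf stat -e iTLB-loads,iTLB-load-misses,LLC-load-misses,context-switches \
    -C 1,2 -- sleep 10

# 2. The same counters broken down per thread of the DPDK process, to see
#    which lcore thread the context switches actually land on.
perf stat --per-thread -e context-switches,iTLB-load-misses \
    -p "$DPDK_PID" -- sleep 10

# 3. Pin the socket-handling process onto a core not used by the
#    forwarding lcores (here CPU 0).
taskset -pc 0 "$SOCKET_APP_PID"

# 4. Steer a busy interrupt line to CPU 0. Loopback traffic raises no NIC
#    IRQs, but the same mechanism applies to any other interrupt source.
#    Stop irqbalance first so it does not rewrite the affinity mask.
systemctl stop irqbalance
echo 1 > /proc/irq/42/smp_affinity   # bitmask: 0x1 = CPU 0 only
```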