From: Alexander Kiselev <kiselev99@gmail.com>
To: "Hu, Xuekun"
Cc: users@dpdk.org
Date: Fri, 15 Apr 2016 01:47:05 +0300
Subject: Re: [dpdk-users] Lcore impact

Yes, 31% is iTLB-load-misses. My CPU is an Intel(R) Core(TM) i5-2400 CPU @
3.10GHz.

There are no other big differences. I would say there are no other small
differences either; the numbers are about the same.

I also noticed another strange thing: once I start socket operations, the
perf context-switches counter increases in ALL the sibling threads
corresponding to DPDK lcores. But why? Only one thread performs socket
operations and invokes system calls, so I expected context switches to
occur only in that thread, not in all of them.

2016-04-15 1:09 GMT+03:00 Hu, Xuekun:

> I think 31.09% means iTLB-load-misses, right? To be straightforward, yes,
> this count means code misses are high, i.e. the code footprint is big. For
> example, function A calls function B while the code address of B is far
> away from A.
>
> Is there any other big difference, like L2/L3 cache misses? Actually I
> don't expect the iTLB-load-misses could have that big an impact (10%
> packet loss).
>
> BTW, what's your CPU?
>
> From: Alexander Kiselev [mailto:kiselev99@gmail.com]
> Sent: Friday, April 15, 2016 5:33 AM
> To: Hu, Xuekun
> Cc: users@dpdk.org
> Subject: Re: [dpdk-users] Lcore impact
>
> I've done my homework with perf and the results show that the
> iTLB-load-misses value is very high. In the tests without socket
> operations the processing lcore has 0.87% of all iTLB cache hits and there
> is no packet loss. In the test WITH socket operations the processing lcore
> has 31.09% of all iTLB cache hits and there is about 10% packet loss. How
> should I interpret these results? Google shows little about the iTLB. So
> far some web pages suggest the following:
>
> "Try to minimize the size of the code and improve locality so that
> instructions span a minimum number of pages, and so that the instruction
> span is less than the number of ITLB entries."
>
> Any ideas?
>
> 2016-04-14 23:43 GMT+03:00 Hu, Xuekun:
>
> > Perf could. Or PCM, that is also a good tool.
> > https://software.intel.com/en-us/articles/intel-performance-counter-monitor-a-better-way-to-measure-cpu-utilization
> >
> > From: Alexander Kiselev [mailto:kiselev99@gmail.com]
> > Sent: Friday, April 15, 2016 3:31 AM
> > To: Hu, Xuekun
> > Cc: Shawn Lewis; users@dpdk.org
> > Subject: Re: [dpdk-users] Lcore impact
> >
> > 2016-04-14 20:49 GMT+03:00 Hu, Xuekun:
> >
> > > Are the two lcores on one processor, or on two processors? What is the
> > > memory footprint of the system-call thread? If the memory footprint is
> > > big (> LLC size) and the two lcores are on the same processor, then it
> > > could have an impact on the packet-processing thread.
> >
> > Those two lcores belong to one processor, and it's a single-processor
> > machine.
>
> Both cores allocate a lot of memory and use the full DPDK arsenal: LPM,
> mempools, hashes, etc. But during the test the core doing the socket data
> transfer uses only a small 16K buffer for sending, and sending is all it
> does during the test. It doesn't touch any of the other allocated memory
> structures. The processing core in turn uses rte_lpm, which is big, but in
> my test there are only about 10 routes in it, so I think the amount of
> "hot" memory is not very big. But I can't say whether it is bigger than
> the CPU's L3 cache or not. Should I use a profiler to see if the socket
> operations cause a lot of cache misses in the processing lcore? Is there
> some tool that allows me to do that? perf maybe?
>
> -----Original Message-----
> From: users [mailto:users-bounces@dpdk.org] On Behalf Of Alexander Kiselev
> Sent: Friday, April 15, 2016 1:19 AM
> To: Shawn Lewis
> Cc: users@dpdk.org
> Subject: Re: [dpdk-users] Lcore impact
>
> I've already seen this document and have used these tricks many times.
> But this time I am sending data locally over localhost. There are not
> even any NICs bound to Linux on my machine, therefore there are no NIC
> interrupts I could pin to a CPU. So what do you propose?
>
> > On 14 Apr 2016, at 20:06, Shawn Lewis wrote:
> >
> > You have to work with IRQBalancer as well.
> >
> > http://www.intel.com/content/dam/doc/application-note/82575-82576-82598-82599-ethernet-controllers-interrupts-appl-note.pdf
> >
> > This is just an example document which discusses this (not so much DPDK
> > related)... But the OS will attempt to balance the interrupts when you
> > actually want to remove or pin them down...
> >
> >> On Thu, Apr 14, 2016 at 1:02 PM, Alexander Kiselev wrote:
> >>
> >>> On 14 Apr 2016, at 19:35, Shawn Lewis wrote:
> >>>
> >>> Lots of things...
> >>> One just because you have a process running on an lcore does not mean
> >>> that's all that runs on it. Unless you have told the kernel at boot
> >>> NOT to use those specific cores, those cores will be used for many
> >>> OS-related things.
> >>
> >> Generally yes, but unless I start sending data to the socket there is
> >> no packet loss. I did about 10 test runs in a row and everything was
> >> OK. And there is no other application running on that test machine
> >> that uses the CPU cores.
> >>
> >> So the question is: why do these socket operations influence the other
> >> lcore?
> >>
> >>> IRQBalance
> >>> System OS operations.
> >>> Other applications.
> >>>
> >>> So by doing file I/O you are generating interrupts, and where those
> >>> interrupts get serviced is up to IRQBalancer. So it could be any one
> >>> of your cores.
> >>
> >> That is a good point. I can use the CPU affinity feature to bind an
> >> interrupt handler to a core not used in my test. But I send data
> >> locally over localhost. Is it possible to use CPU affinity in that
> >> case?
> >>
> >>>> On Thu, Apr 14, 2016 at 12:31 PM, Alexander Kiselev wrote:
> >>>>
> >>>> Could someone give me any hints about what could cause performance
> >>>> issues in a situation where one lcore doing a lot of Linux system
> >>>> calls (read/write on a socket) slows down another lcore doing packet
> >>>> forwarding? In my test the forwarding lcore doesn't share any memory
> >>>> structures with the other lcore that sends test data to the socket.
> >>>> The two lcores are pinned to different processor cores, so
> >>>> theoretically they shouldn't have any impact on each other, but they
> >>>> do: once one lcore starts sending data to the socket, the other
> >>>> lcore starts dropping packets. Why?
--
Regards,
Alexander Kiselev