From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <52240466.7050907@cas-well.com>
Date: Mon, 2 Sep 2013 11:22:14 +0800
From: Zachary
Organization: Cas-Well
Subject: [dpdk-dev] DPDK & QPI performance issue in Romley platform.

Hi~

I have a question about DPDK and QPI performance on the Romley platform.
Recently I used the DPDK example l2fwd to test DPDK's performance on my Romley platform.
When the test crosses CPUs (the cores used are on a different CPU from the one that controls the slot), performance decreases dramatically.
Is this expected? Is there any method to verify the phenomenon?

In my opinion, this kind of issue should not occur, because QPI has enough bandwidth to handle this kind of case.
So I am surprised by our results and cannot explain them.
Could someone help me solve this problem?

Thanks a lot!

My testing environment is described below:

Platform:      Romley
CPU:           E5-2643 * 2
RAM:           Transcend 8GB PC3-1600 DDR3 * 8
OS:            Fedora Core 14
DPDK:          v1.3.1r2, example/l2fwd
Slot setting:  SlotA is controlled directly by CPU1.
               SlotB is controlled directly by CPU0.

DPDK pre-setting:
a. BIOS setting:
    HT=disable
b. Kernel parameters:
    isolcpus=2,3,6,7
    default_hugepagesz=1024M
    hugepagesz=1024M
    hugepages=16
c. OS setting:
    service avahi-daemon stop
    service NetworkManager stop
    service iptables stop
    service acpid stop
    selinux disable

Example program commands:
a. SlotB(CPU0) -> CPU1
    #>./l2fwd -c 0xc -n 4 -- -q 1 -p 0xc

b. SlotA(CPU1) -> CPU0
    #>./l2fwd -c 0xc0 -n 4 -- -q 1 -p 0xc0

Results (frame size 128 bytes):

CPU Affinity    Slot A (CPU1)    Slot B (CPU0)
CPU0            15.9%            96.49%
CPU1            90.88%           24.78%

This email may contain confidential information. Please do not use or disclose it in any way and delete it if you are not the intended recipient.
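For readers following the two commands above: the -c option is a hexadecimal core mask and the l2fwd -p option is a hexadecimal port mask, so each set bit selects one lcore or one port. A minimal Python sketch of the decoding is shown here; the helper name mask_to_indices is made up for this illustration and is not part of DPDK.

```python
def mask_to_indices(mask):
    """Return the positions of the set bits in a mask, lowest bit first."""
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


# The two masks used in commands a and b above.
for label, mask in (("command a", 0xC), ("command b", 0xC0)):
    print(label, "selects lcores/ports", mask_to_indices(mask))
# command a selects lcores/ports [2, 3]
# command b selects lcores/ports [6, 7]
```

Which physical CPU lcores 2, 3, 6 and 7 belong to depends on how the kernel numbers the cores on this board, so the mapping has to be checked on the machine itself.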


From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <20130902091012.2e68b88e@nehalam.linuxnetplumber.net>
In-Reply-To: <52240466.7050907@cas-well.com>
References: <52240466.7050907@cas-well.com>
Date: Mon, 2 Sep 2013 09:10:12 -0700
From: Stephen Hemminger
To: Zachary
Cc: dev@dpdk.org
Subject: Re: [dpdk-dev] DPDK & QPI performance issue in Romley platform.

On Mon, 2 Sep 2013 11:22:14 +0800
Zachary wrote:

> Hi~
>
> I have a question about DPDK & QPI performance issue in Romley platform.
> Recently, I use DPDK example, l2fwd, to test DPDK's performance in my Romley platform.
> When I try to do the test, crossing used CPU, I find the performance dramatically decrease.
> Is it true? Or any method can prove the phenomenon?
>
> In my opinion, there should be no this kind of issue here due to QPI have enough bandwidth to deal the kinds of case.
> Thus, I am so amaze in our results and can not explain it.
> Could someone can help me to solve this problem.
>
> Thank a lot!

Many DPDK APIs have the NUMA socket as one of their parameters. To get good
performance, it is up to the application to be NUMA aware and use
socket-local resources.

One thing we do, for example, is to have a packet mbuf pool per socket and
assign each device to the correct pool. Also, you may want to choose which
lcores to assign to which function based on socket locality.
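
Choosing lcores by socket locality can be sketched as follows. This is only an illustration under stated assumptions: the node-to-CPU layout in node_cpulists is hypothetical (on a real Linux system it would be read from /sys/devices/system/node/node*/cpulist, and the NIC's node from /sys/bus/pci/devices/<address>/numa_node), and the helper names are made up for this example. Only the isolated core list 2,3,6,7 comes from the thread.

```python
def parse_cpulist(text):
    """Expand a Linux cpulist string such as '0-3,8' into a list of CPU ids."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus


def coremask_for(cpus):
    """Build the hexadecimal core mask that selects exactly the given CPUs."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return hex(mask)


def local_coremask(nic_node, node_cpulists, isolated):
    """Core mask of the isolated CPUs that sit on the same node as the NIC."""
    local = set(parse_cpulist(node_cpulists[nic_node]))
    return coremask_for(sorted(local & set(isolated)))


# Hypothetical layout: cores 0-3 on node 0, cores 4-7 on node 1.
node_cpulists = {0: "0-3", 1: "4-7"}
isolated = [2, 3, 6, 7]  # from isolcpus=2,3,6,7 in the original post

print(local_coremask(0, node_cpulists, isolated))  # 0xc
print(local_coremask(1, node_cpulists, isolated))  # 0xc0
```

If the kernel instead interleaves core numbers across the two sockets, the same code gives different masks, which is why the real layout should be checked before picking a value for -c.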
For example, threads that are polling a receiver should be on the same socket as
that NIC.

Remember the example applications are demo toys, and don't do all the things
a real application would need to do.

From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <522984E2.8020802@cas-well.com>
In-Reply-To: <52289F01.7010503@cas-well.com>
References: <52289F01.7010503@cas-well.com>
Date: Fri, 6 Sep 2013 15:31:46 +0800
From: Zachary
Organization: Cas-Well
Subject: Re: [dpdk-dev] DPDK & QPI performance issue in Romley platform.

Hi~ Bob,

Thanks for your response!
So, you think it is a memory usage problem rather than a QPI issue?
Does that mean that if I fix the memory usage issue, the performance may rise to what I expect?

BTW, has anyone ever used DPDK on a NUMA system with crossing CPUs, as in my case?
If yes, could you tell me how to solve the problem?
If no, I would like to know whether DPDK allows users to handle this kind of case in their apps.
If that is the case, I need to change the way I use DPDK.

That is a lot of questions. I hope someone can help me answer them.

On 09/04/2013 12:19 AM, Bob Chen wrote:
> QPI bandwidth is definitely large enough, but it seems that QPI is only
> responsible for the communication between separate CPU chips. What you are
> actually doing is accessing memory on the other side, and you are probably
> not even hitting the bandwidth limit. The latency can be caused by many
> factors during a NUMA operation.
>
> /Bob

--
Best Regards,
Zachary Jen

Software RD
CAS-WELL Inc.
8th Floor, No. 242, Bo-Ai St., Shu-Lin City, Taipei County 238, Taiwan
Tel: +886-2-7731-8888#6305
Fax: +886-2-7731-9988