From: "Wiles, Roger Keith"
To: jinho hwang
Cc: dev
Subject: Re: [dpdk-dev] ways to generate 40Gbps with two NICs x two ports?
Date: Tue, 19 Nov 2013 21:38:21 +0000

I do not think a newer version will affect the performance, but you can try it.

    git clone git://github.com/Pktgen/Pktgen-DPDK

This one is Pktgen 2.2.5 with DPDK 1.5.0.

Keith Wiles, Principal Technologist for Networking, member of the CTO office, Wind River
mobile 940.213.5533

On Nov 19, 2013, at 3:33 PM, jinho hwang wrote:

On Tue, Nov 19, 2013 at 4:18 PM, Wiles, Roger Keith wrote:

Give this a try; if that does not work then something else is going on here. I am trying to make sure we do not cross the QPI for any reason, by keeping the RX/TX queues related to a port on the same socket.

    sudo ./app/build/pktgen -c 3ff -n 3 $BLACK_LIST -- -p 0xf0 -P \
         -m "[2:4].0, [6:8].1, [3:5].2, [7:9].3" -f test/forward.lua

    sudo ./app/build/pktgen -c 1ff -n 3 $BLACK_LIST -- -p 0xf0 -P \
         -m "[1:2].0, [3:4].1, [5:6].2, [7:8].3" -f test/forward.lua

    cores = [0, 1, 2, 8, 9, 10]
    sockets = [1, 0]

              Socket 1    Socket 0
              ---------   ---------
    Core 0    [0, 12]     [1, 13]
    Core 1    [2, 14]     [3, 15]
    Core 2    [4, 16]     [5, 17]
    Core 8    [6, 18]     [7, 19]
    Core 9    [8, 20]     [9, 21]
    Core 10   [10, 22]    [11, 23]

On Nov 19, 2013, at 11:35 AM, jinho hwang wrote:

On Tue, Nov 19, 2013 at 12:24 PM, Wiles, Roger Keith wrote:

Normally when I see this problem it means the lcores are not mapped correctly. What can happen is that you have an Rx and a Tx on the same physical core, or two RX/TX pairs on the same physical core. Make sure each Rx or Tx runs on its own physical core: look at the cpu_layout.py output and verify the configuration is correct. If you have 8 physical cores in the system, then you need to make sure only one of the lcores on each physical core is being used.

Let me know what happens.
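A quick way to run that check is a minimal sketch along these lines (not pktgen code; it only assumes the standard Linux sysfs topology files) that prints the socket and physical core behind each lcore, so you can spot two active lcores sharing one physical core:

    /* Sketch: print socket and physical core for each CPU, using the
     * standard Linux sysfs topology files. Compare the output against
     * the -m mapping to confirm no two Rx/Tx lcores share a core. */
    #include <stdio.h>

    static int read_topo(int cpu, const char *leaf)
    {
        char path[128];
        int v = -1;
        snprintf(path, sizeof(path),
                 "/sys/devices/system/cpu/cpu%d/topology/%s", cpu, leaf);
        FILE *f = fopen(path, "r");
        if (!f)
            return -1;            /* CPU id not present, stop scanning */
        if (fscanf(f, "%d", &v) != 1)
            v = -1;
        fclose(f);
        return v;
    }

    int main(void)
    {
        for (int cpu = 0; cpu < 64; cpu++) {
            int pkg  = read_topo(cpu, "physical_package_id");
            int core = read_topo(cpu, "core_id");
            if (pkg < 0)
                break;            /* ran past the last CPU */
            printf("lcore %2d -> socket %d, physical core %d\n",
                   cpu, pkg, core);
        }
        return 0;
    }

Built with cc -std=c99, its output gives the same lcore-to-core pairing that cpu_layout.py reports.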
On Nov 19, 2013, at 11:04 AM, jinho hwang wrote:

On Tue, Nov 19, 2013 at 11:54 AM, Wiles, Roger Keith wrote:

BTW, the configuration looks fine, but you need to make sure the lcores are not split between two different CPU sockets. You can use dpdk/tools/cpu_layout.py to dump out the system configuration.

On Nov 19, 2013, at 10:42 AM, jinho hwang wrote:

On Tue, Nov 19, 2013 at 11:31 AM, Wiles, Roger Keith wrote:

How do you have Pktgen configured in this case? On my Westmere dual-socket 3.4GHz machine I can send 20G on a single 82599 NIC with two ports. My machine has a PCIe bug that does not allow me to send on more than 3 ports at wire rate. I get close to 40G with 64-byte packets, but the fourth port runs at about 70% of wire rate because of the PCIe hardware bottleneck.
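For reference on what wire rate means here in packet terms: a 64-byte frame occupies 84 bytes on the wire once the 7-byte preamble, 1-byte SFD, and 12-byte inter-frame gap are counted, which puts 10 GbE line rate at about 14.88 Mpps per port. A back-of-the-envelope sketch of that arithmetic (standard Ethernet framing constants, nothing measured from this thread):

    /* Packets per second at 10 GbE line rate for minimum-size frames.
     * On the wire each 64-byte frame also carries 7 bytes preamble,
     * 1 byte SFD and a 12-byte inter-frame gap: 84 bytes per packet. */
    #include <stdio.h>

    int main(void)
    {
        const double link_bps = 10e9;       /* 10 Gb/s                  */
        const int    frame    = 64;         /* minimum Ethernet frame   */
        const int    overhead = 7 + 1 + 12; /* preamble + SFD + IFG     */
        double pps = link_bps / ((frame + overhead) * 8);

        printf("10 GbE, %d-byte frames: %.2f Mpps per port\n",
               frame, pps / 1e6);           /* ~14.88 Mpps              */
        printf("four ports at line rate: %.2f Mpps total\n",
               4 * pps / 1e6);              /* ~59.52 Mpps              */
        return 0;
    }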
On Nov 19, 2013, at 10:09 AM, jinho hwang wrote:

Hi All,

I have two NICs (82599) x two ports that are used as packet generators. I want to generate full line-rate packets (40Gbps), but Pktgen-DPDK does not seem to be able to do it when two ports on one NIC are used simultaneously. Does anyone know how to generate 40Gbps without replicating packets in the switch?

Thank you,

Jinho

Hi Keith,

Thank you for the e-mail. I am not sure how to figure out whether my PCIe bus also has problems that prevent me from sending at full line rate. I use an Intel(R) Xeon(R) CPU E5649 @ 2.53GHz. It is hard for me to figure out where the bottleneck is.

My configuration is:

    sudo ./app/build/pktgen -c 1ff -n 3 $BLACK_LIST -- -p 0xf0 -P \
         -m "[1:2].0, [3:4].1, [5:6].2, [7:8].3" -f test/forward.lua

    === port to lcore mapping table (# lcores 9) ===
    lcore:     0     1     2     3     4     5     6     7     8
    port  0:  D: T  1: 0  0: 1  0: 0  0: 0  0: 0  0: 0  0: 0  0: 0 =  1: 1
    port  1:  D: T  0: 0  0: 0  1: 0  0: 1  0: 0  0: 0  0: 0  0: 0 =  1: 1
    port  2:  D: T  0: 0  0: 0  0: 0  0: 0  1: 0  0: 1  0: 0  0: 0 =  1: 1
    port  3:  D: T  0: 0  0: 0  0: 0  0: 0  0: 0  0: 0  1: 0  0: 1 =  1: 1
    Total   :  0: 0  1: 0  0: 1  1: 0  0: 1  1: 0  0: 1  1: 0  0: 1
      Display and Timer on lcore 0, rx:tx counts per port/lcore

    Configuring 4 ports, MBUF Size 1984, MBUF Cache Size 128
    Lcore:
      1, type  RX , rx_cnt 1, tx_cnt 0 private (nil), RX (pid:qid): ( 0: 0) , TX (pid:qid):
      2, type  TX , rx_cnt 0, tx_cnt 1 private (nil), RX (pid:qid): , TX (pid:qid): ( 0: 0)
      3, type  RX , rx_cnt 1, tx_cnt 0 private (nil), RX (pid:qid): ( 1: 0) , TX (pid:qid):
      4, type  TX , rx_cnt 0, tx_cnt 1 private (nil), RX (pid:qid): , TX (pid:qid): ( 1: 0)
      5, type  RX , rx_cnt 1, tx_cnt 0 private (nil), RX (pid:qid): ( 2: 0) , TX (pid:qid):
      6, type  TX , rx_cnt 0, tx_cnt 1 private (nil), RX (pid:qid): , TX (pid:qid): ( 2: 0)
      7, type  RX , rx_cnt 1, tx_cnt 0 private (nil), RX (pid:qid): ( 3: 0) , TX (pid:qid):
      8, type  TX , rx_cnt 0, tx_cnt 1 private (nil), RX (pid:qid): , TX (pid:qid): ( 3: 0)

    Port :
      0, nb_lcores 2, private 0x6fd5a0, lcores:  1  2
      1, nb_lcores 2, private 0x700208, lcores:  3  4
      2, nb_lcores 2, private 0x702e70, lcores:  5  6
      3, nb_lcores 2, private 0x705ad8, lcores:  7  8

    Initialize Port 0 -- TxQ 1, RxQ 1, Src MAC 90:e2:ba:2f:f2:a4
      Create: Default RX  0:0 - Memory used (MBUFs 1024 x (size 1984 + Hdr 64)) + 395392 = 2435 KB
      Create: Default TX  0:0 - Memory used (MBUFs 1024 x (size 1984 + Hdr 64)) + 395392 = 2435 KB
      Create: Range TX    0:0 - Memory used (MBUFs 1024 x (size 1984 + Hdr 64)) + 395392 = 2435 KB
      Create: Sequence TX 0:0 - Memory used (MBUFs 1024 x (size 1984 + Hdr 64)) + 395392 = 2435 KB
      Create: Special TX  0:0 - Memory used (MBUFs   64 x (size 1984 + Hdr 64)) + 395392 =  515 KB
                                                             Port memory used = 10251 KB
    Initialize Port 1 -- TxQ 1, RxQ 1, Src MAC 90:e2:ba:2f:f2:a5
      Create: Default RX  1:0 - Memory used (MBUFs 1024 x (size 1984 + Hdr 64)) + 395392 = 2435 KB
      Create: Default TX  1:0 - Memory used (MBUFs 1024 x (size 1984 + Hdr 64)) + 395392 = 2435 KB
      Create: Range TX    1:0 - Memory used (MBUFs 1024 x (size 1984 + Hdr 64)) + 395392 = 2435 KB
      Create: Sequence TX 1:0 - Memory used (MBUFs 1024 x (size 1984 + Hdr 64)) + 395392 = 2435 KB
      Create: Special TX  1:0 - Memory used (MBUFs   64 x (size 1984 + Hdr 64)) + 395392 =  515 KB
                                                             Port memory used = 10251 KB
    Initialize Port 2 -- TxQ 1, RxQ 1, Src MAC 90:e2:ba:4a:e6:1c
      Create: Default RX  2:0 - Memory used (MBUFs 1024 x (size 1984 + Hdr 64)) + 395392 = 2435 KB
      Create: Default TX  2:0 - Memory used (MBUFs 1024 x (size 1984 + Hdr 64)) + 395392 = 2435 KB
      Create: Range TX    2:0 - Memory used (MBUFs 1024 x (size 1984 + Hdr 64)) + 395392 = 2435 KB
      Create: Sequence TX 2:0 - Memory used (MBUFs 1024 x (size 1984 + Hdr 64)) + 395392 = 2435 KB
      Create: Special TX  2:0 - Memory used (MBUFs   64 x (size 1984 + Hdr 64)) + 395392 =  515 KB
                                                             Port memory used = 10251 KB
    Initialize Port 3 -- TxQ 1, RxQ 1, Src MAC 90:e2:ba:4a:e6:1d
      Create: Default RX  3:0 - Memory used (MBUFs 1024 x (size 1984 + Hdr 64)) + 395392 = 2435 KB
      Create: Default TX  3:0 - Memory used (MBUFs 1024 x (size 1984 + Hdr 64)) + 395392 = 2435 KB
      Create: Range TX    3:0 - Memory used (MBUFs 1024 x (size 1984 + Hdr 64)) + 395392 = 2435 KB
      Create: Sequence TX 3:0 - Memory used (MBUFs 1024 x (size 1984 + Hdr 64)) + 395392 = 2435 KB
      Create: Special TX  3:0 - Memory used (MBUFs   64 x (size 1984 + Hdr 64)) + 395392 =  515 KB
                                                             Port memory used = 10251 KB
                                                            Total memory used = 41003 KB

    Port 0: Link Up - speed 10000 Mbps - full-duplex
    Port 1: Link Up - speed 10000 Mbps - full-duplex
    Port 2: Link Up - speed 10000 Mbps - full-duplex
    Port 3: Link Up - speed 10000 Mbps - full-duplex

    === Display processing on lcore 0
    === RX processing on lcore 1, rxcnt 1, port/qid, 0/0
    === TX processing on lcore 2, txcnt 1, port/qid, 0/0
    === RX processing on lcore 3, rxcnt 1, port/qid, 1/0
    === TX processing on lcore 4, txcnt 1, port/qid, 1/0
    === RX processing on lcore 5, rxcnt 1, port/qid, 2/0
    === TX processing on lcore 6, txcnt 1, port/qid, 2/0
    === RX processing on lcore 7, rxcnt 1, port/qid, 3/0
    === TX processing on lcore 8, txcnt 1, port/qid, 3/0

Please advise me if you have time. Thank you always for your help!

Jinho
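One thing worth checking against the cpu_layout output shown in this thread: with even-numbered lcores on socket 1 and odd-numbered lcores on socket 0, the mapping above pairs an odd RX lcore with an even TX lcore for every port, so each port's queues straddle the two sockets, which is exactly the QPI crossing the [2:4].0-style mapping quoted at the top of the thread avoids. A small sketch that makes the check explicit (the socket_of() rule is hard-coded from the posted layout and is an assumption about this particular box):

    /* Flag any port whose RX and TX lcores land on different sockets.
     * socket_of() encodes the cpu_layout.py output posted in this
     * thread: even lcores on socket 1, odd lcores on socket 0. */
    #include <stdio.h>

    static int socket_of(int lcore) { return (lcore % 2 == 0) ? 1 : 0; }

    int main(void)
    {
        /* -m "[1:2].0, [3:4].1, [5:6].2, [7:8].3" => {rx, tx} per port */
        int map[4][2] = { {1, 2}, {3, 4}, {5, 6}, {7, 8} };

        for (int port = 0; port < 4; port++) {
            int rx = map[port][0], tx = map[port][1];
            printf("port %d: RX lcore %d (socket %d), TX lcore %d (socket %d)%s\n",
                   port, rx, socket_of(rx), tx, socket_of(tx),
                   socket_of(rx) != socket_of(tx) ? "  <- crosses QPI" : "");
        }
        return 0;
    }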
The phenomenon is that when I start one port on one NIC, it reaches 10Gbps. Also, when I start one port per NIC, they achieve 10Gbps each, 20Gbps total. But when I start two ports on one NIC, each drops to 5.8Gbps. This persists no matter how the cores are assigned, whether across sockets or within the same socket. Since the huge page size is fixed, that should not be the problem. Should I conclude that this is a limitation of the NIC or of the bus? The reason I think this may be a hardware limitation is that regardless of packet size, each of the two ports on one NIC tops out at 5.8Gbps. Do you have any way that I can calculate the hardware limit?

Jinho

My cpu configuration is as follows:

    ============================================================
    Core and Socket Information (as reported by '/proc/cpuinfo')
    ============================================================

    cores = [0, 1, 2, 8, 9, 10]
    sockets = [1, 0]

              Socket 1    Socket 0
              ---------   ---------
    Core 0    [0, 12]     [1, 13]
    Core 1    [2, 14]     [3, 15]
    Core 2    [4, 16]     [5, 17]
    Core 8    [6, 18]     [7, 19]
    Core 9    [8, 20]     [9, 21]
    Core 10   [10, 22]    [11, 23]

When I use just two ports for testing, I use this configuration:

    sudo ./app/build/pktgen -c 1ff -n 3 $BLACK_LIST -- -p 0x30 -P \
         -m "[2:4].0, [6:8].1" -f test/forward.lua

As you can see from the core numbers, 2, 4, 6, and 8 are all different physical cores and are assigned separately, so I am not sure the problem comes from the core configuration. Do you have any other thoughts on what we might try?

Thanks,

Jinho

Actually, I tried it before, and to make sure I tried it again just now. Still, it only shows me 5.8Gbps for each port. What other possibilities do you think I have to try? I am losing hope now. Does the version matter? I am using Pktgen Ver:2.1.3 (DPDK-1.3.0) on my system.
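On the question of calculating the hardware limit: a rough ceiling for a dual-port 82599 can be estimated from its slot. The 82599 is a PCIe Gen2 x8 device, so each direction carries 8 lanes x 5 GT/s with 8b/10b encoding, 32 Gb/s raw, and protocol (TLP/DLLP) overhead plus per-packet descriptor and doorbell traffic comes out of that same budget. A sketch of the arithmetic (the 20% protocol-overhead figure is an assumption; the true efficiency depends on platform, payload size, and descriptor batching):

    /* Rough PCIe ceiling sketch for a dual-port 82599.
     * Assumptions (not from the thread): Gen2 x8 slot, 8b/10b
     * encoding, ~20% TLP/DLLP protocol overhead. */
    #include <stdio.h>

    int main(void)
    {
        const double gts_per_lane = 5e9;      /* PCIe Gen2: 5 GT/s/lane */
        const int    lanes        = 8;        /* 82599 uses x8          */
        const double encoding     = 8.0/10.0; /* 8b/10b line code       */
        const double tlp_eff      = 0.80;     /* assumed efficiency     */

        double raw = gts_per_lane * lanes * encoding; /*  32.0 Gb/s */
        double eff = raw * tlp_eff;                   /* ~25.6 Gb/s */

        printf("raw link bandwidth per direction: %4.1f Gb/s\n", raw / 1e9);
        printf("after assumed protocol overhead : %4.1f Gb/s\n", eff / 1e9);
        return 0;
    }

That headroom looks sufficient for 2 x 10 Gb/s of payload, but with small packets the per-packet descriptor reads, writebacks, and doorbells consume a large share of it, so a small-packet rate well below 2 x 10 Gb/s from one dual-port NIC is at least consistent with a bus-side bottleneck rather than a pktgen configuration problem.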