From: jinho hwang
Date: Wed, 20 Nov 2013 15:58:07 -0500
To: "Wiles, Roger Keith"
Cc: dev
Subject: Re: [dpdk-dev] ways to generate 40Gbps with two NICs x two ports?

On Tue, Nov 19, 2013 at 4:38 PM, Wiles, Roger Keith wrote:
>
> I do not think a newer version will affect the performance, but you can try it.
>
>     git clone git://github.com/Pktgen/Pktgen-DPDK
>
> This one is 2.2.5 and DPDK 1.5.0.
>
> Keith Wiles, Principal Technologist for Networking, member of the CTO office, Wind River
> mobile 940.213.5533
>
> On Nov 19, 2013, at 3:33 PM, jinho hwang wrote:
>
> On Tue, Nov 19, 2013 at 4:18 PM, Wiles, Roger Keith wrote:
>
> Give this a try; if that does not work, then something else is going on here.
> I am trying to make sure we do not cross the QPI for any reason by putting the
> RX/TX queues related to a port on the same core.
>
>     sudo ./app/build/pktgen -c 3ff -n 3 $BLACK_LIST -- -p 0xf0 -P -m \
>         "[2:4].0, [6:8].1, [3:5].2, [7:9].3" -f test/forward.lua
>
>     sudo ./app/build/pktgen -c 1ff -n 3 $BLACK_LIST -- -p 0xf0 -P -m \
>         "[1:2].0, [3:4].1, [5:6].2, [7:8].3" -f test/forward.lua
>
> cores = [0, 1, 2, 8, 9, 10]
> sockets = [1, 0]
>
>              Socket 1    Socket 0
>              ---------   ---------
>     Core 0   [0, 12]     [1, 13]
>     Core 1   [2, 14]     [3, 15]
>     Core 2   [4, 16]     [5, 17]
>     Core 8   [6, 18]     [7, 19]
>     Core 9   [8, 20]     [9, 21]
>     Core 10  [10, 22]    [11, 23]
>
> On Nov 19, 2013, at 11:35 AM, jinho hwang wrote:
>
> On Tue, Nov 19, 2013 at 12:24 PM, Wiles, Roger Keith wrote:
>
> Normally when I see this problem it means the lcores are not mapped
> correctly. What can happen is you have an RX and a TX on the same physical
> core, or two RX/TX pairs on the same physical core.
>
> Make sure you have only an RX or a TX running on each single core: look at
> the cpu_layout.py output and verify the configuration is correct. If you
> have 8 physical cores, then you need to make sure only one of the lcores on
> each core is being used.
>
> Let me know what happens.
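(A quick way to double-check that mapping on a Linux host -- this is a minimal
sketch that condenses what dpdk/tools/cpu_layout.py prints, pairing each lcore
with its socket and physical core so an RX and a TX lcore never end up as
hyperthread siblings:)

    # Pull the three relevant fields out of /proc/cpuinfo; two lcores that
    # print the same socket/core pair are hyperthread siblings.
    grep -E 'processor|physical id|core id' /proc/cpuinfo |
      paste - - - |
      awk '{ printf "lcore %-3s socket %-2s core %s\n", $3, $7, $11 }'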
On Nov 19, 2013, at 11:04 AM, jinho hwang wrote:

> On Tue, Nov 19, 2013 at 11:54 AM, Wiles, Roger Keith wrote:
>
> BTW, the configuration looks fine, but you need to make sure the lcores are
> not split between two different CPU sockets. You can use
> dpdk/tools/cpu_layout.py to dump out the system configuration.
>
> On Nov 19, 2013, at 10:42 AM, jinho hwang wrote:
>
> On Tue, Nov 19, 2013 at 11:31 AM, Wiles, Roger Keith wrote:
>
> How do you have Pktgen configured in this case?
>
> On my Westmere dual-socket 3.4GHz machine I can send 20G on a single 82599
> NIC x two ports. My machine has a PCIe bug that does not allow me to send
> on more than 3 ports at wire rate. I get close to 40G with 64-byte packets,
> but the fourth port runs at about 70% of wire rate because of the PCIe
> hardware bottleneck problem.
>
> On Nov 19, 2013, at 10:09 AM, jinho hwang wrote:
>
> Hi All,
>
> I have two NICs (82599) x two ports that are used as packet generators. I
> want to generate full line-rate packets (40Gbps), but Pktgen-DPDK does not
> seem to be able to do it when the two ports on a NIC are used
> simultaneously. Does anyone know how to generate 40Gbps without replicating
> packets in the switch?
>
> Thank you,
>
> Jinho
>
> Hi Keith,
>
> Thank you for the e-mail. I am not sure how to figure out whether my PCIe
> also has any problems that prevent me from sending full line rate. I use an
> Intel(R) Xeon(R) CPU E5649 @ 2.53GHz. It is hard for me to figure out where
> the bottleneck is.
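(One way to check for a PCIe link problem like the one Keith describes,
assuming a Linux host; the PCI address 0000:06:00.0 below is a placeholder
for wherever your 82599 actually sits -- find it with lspci | grep 82599:)

    # Compare the negotiated link (LnkSta) with the capability (LnkCap);
    # a dual-port 82599 wants Gen2 (5 GT/s) x8 to carry 2 x 10G.
    sudo lspci -s 06:00.0 -vv | grep -E 'LnkCap|LnkSta'

    # The same information via sysfs, if your kernel exposes it:
    cat /sys/bus/pci/devices/0000:06:00.0/current_link_speed \
        /sys/bus/pci/devices/0000:06:00.0/current_link_width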
> My configuration is:
>
>     sudo ./app/build/pktgen -c 1ff -n 3 $BLACK_LIST -- -p 0xf0 -P -m \
>         "[1:2].0, [3:4].1, [5:6].2, [7:8].3" -f test/forward.lua
>
> === port to lcore mapping table (# lcores 9) ===
>    lcore:     0     1     2     3     4     5     6     7     8
> port   0:  D: T  1: 0  0: 1  0: 0  0: 0  0: 0  0: 0  0: 0  0: 0 = 1: 1
> port   1:  D: T  0: 0  0: 0  1: 0  0: 1  0: 0  0: 0  0: 0  0: 0 = 1: 1
> port   2:  D: T  0: 0  0: 0  0: 0  0: 0  1: 0  0: 1  0: 0  0: 0 = 1: 1
> port   3:  D: T  0: 0  0: 0  0: 0  0: 0  0: 0  0: 0  1: 0  0: 1 = 1: 1
> Total    :       0: 0  1: 0  0: 1  1: 0  0: 1  1: 0  0: 1  1: 0  0: 1
>   Display and Timer on lcore 0, rx:tx counts per port/lcore
>
> Configuring 4 ports, MBUF Size 1984, MBUF Cache Size 128
> Lcore:
>     1, type RX, rx_cnt 1, tx_cnt 0, private (nil), RX (pid:qid): ( 0: 0), TX (pid:qid):
>     2, type TX, rx_cnt 0, tx_cnt 1, private (nil), RX (pid:qid):, TX (pid:qid): ( 0: 0)
>     3, type RX, rx_cnt 1, tx_cnt 0, private (nil), RX (pid:qid): ( 1: 0), TX (pid:qid):
>     4, type TX, rx_cnt 0, tx_cnt 1, private (nil), RX (pid:qid):, TX (pid:qid): ( 1: 0)
>     5, type RX, rx_cnt 1, tx_cnt 0, private (nil), RX (pid:qid): ( 2: 0), TX (pid:qid):
>     6, type TX, rx_cnt 0, tx_cnt 1, private (nil), RX (pid:qid):, TX (pid:qid): ( 2: 0)
>     7, type RX, rx_cnt 1, tx_cnt 0, private (nil), RX (pid:qid): ( 3: 0), TX (pid:qid):
>     8, type TX, rx_cnt 0, tx_cnt 1, private (nil), RX (pid:qid):, TX (pid:qid): ( 3: 0)
>
> Port:
>     0, nb_lcores 2, private 0x6fd5a0, lcores: 1 2
>     1, nb_lcores 2, private 0x700208, lcores: 3 4
>     2, nb_lcores 2, private 0x702e70, lcores: 5 6
>     3, nb_lcores 2, private 0x705ad8, lcores: 7 8
>
> Initialize Port 0 -- TxQ 1, RxQ 1, Src MAC 90:e2:ba:2f:f2:a4
>     Create: Default RX  0:0 - Memory used (MBUFs 1024 x (size 1984 + Hdr 64)) + 395392 = 2435 KB
>     Create: Default TX  0:0 - Memory used (MBUFs 1024 x (size 1984 + Hdr 64)) + 395392 = 2435 KB
>     Create: Range TX    0:0 - Memory used (MBUFs 1024 x (size 1984 + Hdr 64)) + 395392 = 2435 KB
>     Create: Sequence TX 0:0 - Memory used (MBUFs 1024 x (size 1984 + Hdr 64)) + 395392 = 2435 KB
>     Create: Special TX  0:0 - Memory used (MBUFs   64 x (size 1984 + Hdr 64)) + 395392 =  515 KB
>     Port memory used = 10251 KB
> Initialize Port 1 -- TxQ 1, RxQ 1, Src MAC 90:e2:ba:2f:f2:a5
>     Create: Default RX  1:0 - Memory used (MBUFs 1024 x (size 1984 + Hdr 64)) + 395392 = 2435 KB
>     Create: Default TX  1:0 - Memory used (MBUFs 1024 x (size 1984 + Hdr 64)) + 395392 = 2435 KB
>     Create: Range TX    1:0 - Memory used (MBUFs 1024 x (size 1984 + Hdr 64)) + 395392 = 2435 KB
>     Create: Sequence TX 1:0 - Memory used (MBUFs 1024 x (size 1984 + Hdr 64)) + 395392 = 2435 KB
>     Create: Special TX  1:0 - Memory used (MBUFs   64 x (size 1984 + Hdr 64)) + 395392 =  515 KB
>     Port memory used = 10251 KB
> Initialize Port 2 -- TxQ 1, RxQ 1, Src MAC 90:e2:ba:4a:e6:1c
>     Create: Default RX  2:0 - Memory used (MBUFs 1024 x (size 1984 + Hdr 64)) + 395392 = 2435 KB
>     Create: Default TX  2:0 - Memory used (MBUFs 1024 x (size 1984 + Hdr 64)) + 395392 = 2435 KB
>     Create: Range TX    2:0 - Memory used (MBUFs 1024 x (size 1984 + Hdr 64)) + 395392 = 2435 KB
>     Create: Sequence TX 2:0 - Memory used (MBUFs 1024 x (size 1984 + Hdr 64)) + 395392 = 2435 KB
>     Create: Special TX  2:0 - Memory used (MBUFs   64 x (size 1984 + Hdr 64)) + 395392 =  515 KB
>     Port memory used = 10251 KB
> Initialize Port 3 -- TxQ 1, RxQ 1, Src MAC 90:e2:ba:4a:e6:1d
>     Create: Default RX  3:0 - Memory used (MBUFs 1024 x (size 1984 + Hdr 64)) + 395392 = 2435 KB
>     Create: Default TX  3:0 - Memory used (MBUFs 1024 x (size 1984 + Hdr 64)) + 395392 = 2435 KB
>     Create: Range TX    3:0 - Memory used (MBUFs 1024 x (size 1984 + Hdr 64)) + 395392 = 2435 KB
>     Create: Sequence TX 3:0 - Memory used (MBUFs 1024 x (size 1984 + Hdr 64)) + 395392 = 2435 KB
>     Create: Special TX  3:0 - Memory used (MBUFs   64 x (size 1984 + Hdr 64)) + 395392 =  515 KB
>     Port memory used = 10251 KB
>
> Total memory used = 41003 KB
>
> Port 0: Link Up - speed 10000 Mbps - full-duplex
> Port 1: Link Up - speed 10000 Mbps - full-duplex
> Port 2: Link Up - speed 10000 Mbps - full-duplex
> Port 3: Link Up - speed 10000 Mbps - full-duplex
>
> === Display processing on lcore 0
> === RX processing on lcore 1, rxcnt 1, port/qid, 0/0
> === TX processing on lcore 2, txcnt 1, port/qid, 0/0
> === RX processing on lcore 3, rxcnt 1, port/qid, 1/0
> === TX processing on lcore 4, txcnt 1, port/qid, 1/0
> === RX processing on lcore 5, rxcnt 1, port/qid, 2/0
> === TX processing on lcore 6, txcnt 1, port/qid, 2/0
> === RX processing on lcore 7, rxcnt 1, port/qid, 3/0
> === TX processing on lcore 8, txcnt 1, port/qid, 3/0
>
> Please advise me if you have time.
>
> Thank you always for your help!
>
> Jinho
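(Related to the QPI point above: it also helps to know which socket each NIC
hangs off. A minimal sketch, with hypothetical PCI addresses standing in for
the two 82599s:)

    # -1 means the kernel could not determine the node; otherwise pin the
    # port's RX/TX lcores to this socket so traffic never crosses QPI.
    for dev in 0000:06:00.0 0000:06:00.1 0000:83:00.0 0000:83:00.1; do
        echo -n "$dev -> NUMA node "
        cat /sys/bus/pci/devices/$dev/numa_node
    done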
> The phenomenon is that when I start one port on one NIC, it reaches 10Gbps.
> Also, when I start one port per NIC, they achieve 10Gbps each = 20Gbps.
> But when I start two ports on one NIC, each drops to 5.8Gbps. This persists
> however the cores are assigned---across sockets and on the same socket.
> Since the size of the huge pages is fixed, that should not be the problem.
> Should we say this is a limitation of the NIC or of the bus? The reason I
> think this may be a hardware limitation is that regardless of packet size,
> the two ports on one NIC can only send 5.8Gbps each at maximum.
>
> Do you have any way that I can calculate the hw limitation?
>
> Jinho
>
> My cpu configuration is as follows:
>
> ============================================================
> Core and Socket Information (as reported by '/proc/cpuinfo')
> ============================================================
>
> cores = [0, 1, 2, 8, 9, 10]
> sockets = [1, 0]
>
>              Socket 1    Socket 0
>              ---------   ---------
>     Core 0   [0, 12]     [1, 13]
>     Core 1   [2, 14]     [3, 15]
>     Core 2   [4, 16]     [5, 17]
>     Core 8   [6, 18]     [7, 19]
>     Core 9   [8, 20]     [9, 21]
>     Core 10  [10, 22]    [11, 23]
>
> When I use just two ports for testing, I use this configuration:
>
>     sudo ./app/build/pktgen -c 1ff -n 3 $BLACK_LIST -- -p 0x30 -P -m \
>         "[2:4].0, [6:8].1" -f test/forward.lua
>
> As you can see from the core numbers, 2, 4, 6, and 8 are all on different
> physical cores and are assigned separately, so I am not sure the problem
> comes from the core configuration. Do you have any other thoughts we may try?
>
> Thanks,
>
> Jinho
>
> Actually, I tried it before, and to make sure I tried it again just now.
> Still, it only shows me 5.8Gbps for each port. What other possibilities do
> you think I have to try? I am losing hope now. Does the version matter? I am
> using Pktgen Ver:2.1.3 (DPDK-1.3.0) on my system.

Keith,

Yes, the newer version did not work either. Since I am able to send close to
24Gbps from two NICs, I do not think the limitation comes from the bus or
memory. It may be because of how I use the NICs. I am sticking to this
hypothesis for now, and trying to use more cores/queues for tx.
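(As a back-of-envelope answer to the bus-limit question above -- assumptions:
the 82599 sits in a PCIe Gen2 slot, and ~20% protocol overhead is a rough rule
of thumb, not a measured figure:)

    awk 'BEGIN {
        lane = 5.0 * 8 / 10            # 5 GT/s per lane, 8b/10b encoding -> 4 Gb/s usable
        raw  = lane * 8                # x8 link -> 32 Gb/s before TLP/DLLP overhead
        eff  = raw * 0.80              # ~20% protocol overhead -> ~25.6 Gb/s per dual-port NIC
        pps  = 10e9 / ((64 + 20) * 8)  # 64B frame + 20B preamble/IFG -> 14.88 Mpps per 10G port
        printf "Gen2 x8 effective ~%.1f Gb/s, Gen2 x4 ~%.1f Gb/s, 64B wire rate %.2f Mpps/port\n",
               eff, eff / 2, pps / 1e6
    }'

A Gen2 x8 budget (~25 Gb/s) would not explain 2 x 5.8 Gb/s, but a link trained
at x4 (~12.8 Gb/s) would be suspiciously close, which is why checking LnkSta
as sketched earlier seems worthwhile.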
The problem is that when I tried this:

    sudo ./app/build/pktgen -c 3ff -n 3 $BLACK_LIST -- -p 0x30 -P -m \
        "[2-5].0, [6-9].1" -f test/forward.lua

the log shows 4 tx/rx queues assigned per port, but only about a quarter of
line rate (2.5Gbps) is transmitted. My questions are:

1. It looks like the 4 queues/cores each get 1/4 of the work, but only one
   queue/core is actually working. Do you know the reason for this, and how
   to fix it?
2. Can I make a configuration with one core x multiple queues?
3. Is there any way to see more statistics in the commands?

Thank you,

Jinho
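(On questions 1 and 2: this is my reading of the -m grammar in the
Pktgen-DPDK README, so verify it against your version. Queue counts follow
the lcore counts in the mapping, so one lcore drives one queue on a given
port; a single lcore can, however, drive several ports. On question 1, note
also that RSS hashes a single 5-tuple flow to a single rx queue, which can
make multi-queue counters look idle on the receive side. Hypothetical
masks/mappings:)

    # One lcore per port, handling both rx and tx (one queue each):
    sudo ./app/build/pktgen -c 15 -n 3 $BLACK_LIST -- -p 0x30 -P \
        -m "2.0, 4.1" -f test/forward.lua

    # Two rx lcores and two tx lcores per port (2 rx + 2 tx queues each):
    sudo ./app/build/pktgen -c 3ff -n 3 $BLACK_LIST -- -p 0x30 -P \
        -m "[2-3:4-5].0, [6-7:8-9].1" -f test/forward.lua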