From: "Wiles, Roger Keith"
To: Venky Venkatesan
Cc: "dev@dpdk.org"
Date: Sun, 22 Sep 2013 17:57:15 +0000
Subject: Re: [dpdk-dev] Question regarding throughput number with DPDK l2fwd with Wind River System's pktgen

Hi Chris,

Here is the email I replied to via pktgen@gmail.com to Jun Han, which happens to match Venky's statements as well :-) Let me know if you see anything else that may be wrong with Pktgen, but the numbers are correct :-)

--------------------------------------------------------

Hi Jun,

That does make more sense with 12x10G ports. By other papers, do you have some links, or are you able to share those papers?

From what I can tell, you now have 12x10G ports, which means you have 6x20Gbits, or 120Gbits of bi-directional bandwidth, in the system. My previous email still holds true: Pktgen can send and receive traffic at 10Gbits/s for 64-byte packets, which for a full-duplex port is 20Gbits of bandwidth. Using (PortCnt/2) * 20Gbit = 120Gbit is the way I calculate the performance. You can check with Intel, but the performance looks correct to me.

Only getting 80Gbits of performance for 64-byte packets seems low to me, as I would have expected 120Gbits, the same as for 1500-byte packets. It is possible your system has hit some bottleneck around the number of total packets per second. Normally this is memory bandwidth, PCI bandwidth, or transactions per second on the PCI bus.

Run the system with 10 ports, 8 ports, 6 ports, ... and see if the 64-byte packet rate changes, as this will tell you something about the system's total bandwidth. For 10 ports you should get (10/2) * 20 = 100Gbits, ...

Thank you,
++Keith

-------------------------------
Keith Wiles pktgen.dpdk@gmail.com
Principal Technologist for Networking
Wind River Systems
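[Editor's note: the port-pair arithmetic in the quoted reply above can be re-run with a few lines of C. This is only an illustrative sketch of Keith's (PortCnt/2) * 20Gbit rule of thumb and of the 12/10/8/6-port scaling test he suggests; the helper name expected_gbits and the program structure are chosen here for illustration and are not part of Pktgen or DPDK.]

/*
 * Sketch of the (PortCnt / 2) * 20 Gbit formula from the reply above:
 * each port pair in a forwarding setup carries at most 10 Gbit/s in
 * each direction, i.e. 20 Gbit/s of bi-directional traffic per pair.
 * The port counts follow the suggested retest with 12, 10, 8 and 6 ports.
 */
#include <stdio.h>

#define GBITS_PER_PORT_PAIR 20.0   /* 10 Gbit/s in each direction */

static double expected_gbits(int port_cnt)
{
    return (port_cnt / 2) * GBITS_PER_PORT_PAIR;
}

int main(void)
{
    const int port_counts[] = { 12, 10, 8, 6 };

    for (size_t i = 0; i < sizeof(port_counts) / sizeof(port_counts[0]); i++)
        printf("%2d ports -> %.0f Gbit/s aggregate (bi-directional)\n",
               port_counts[i], expected_gbits(port_counts[i]));

    return 0;   /* prints 120, 100, 80 and 60 Gbit/s respectively */
}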
On Sep 21, 2013, at 9:05 AM, Jun Han wrote:

Hi Keith,

I think you misunderstood my setup. As mentioned in my previous email, I have 6 dual-port 10Gbps NICs, meaning a total of 12 10G ports per machine. They are connected back to back to another machine with an identical setup. Hence, we get a total of 120Gbps for 1500-byte packets, and 80 Gbps for 64-byte packets. We did our theoretical calculation and find that it should theoretically be possible, as it does not hit the PCIe bandwidth of our machine, nor does it exceed the QPI bandwidth when packets are forwarded over the NUMA node. Our machine block diagram is as shown below, with three NICs per riser slot. We were careful to pin the NIC ports appropriately to the cores of the CPU sockets that are directly connected to their riser slots.

Do these numbers make sense to you? As stated in the previous email, we find that these numbers are much higher than those in other papers in this domain, so I wanted to ask for your input or thoughts on this.

Thank you very much,
Jun

On Sat, Sep 21, 2013 at 4:58 AM, Pktgen DPDK wrote:

Hi Jun,

I do not have any numbers with that many ports, as I have a very limited number of machines and 10G NICs. I can tell you that Pktgen, if set up correctly, can send 14.885 Mpps (million packets per second), or wire rate, for 64-byte packets. The DPDK L2FWD code is able to forward wire rate for 64-byte packets. If each port is sending wire-rate traffic and receiving wire-rate traffic, then you could have 10Gbits in each direction, or 20Gbits per port pair. You have 6 ports, or 3 port pairs, doing 20Gbits x 3 = 60Gbits of traffic at 64-byte packets, assuming you do not hit a limit on the PCIe bus or NIC.

On my Westmere machine with a total of 4 10G ports on two NIC cards I cannot get 40Gbits of data; I hit a PCIe bug and can only get about 32Gbits, if I remember correctly. The newer systems do not have this bug.

Sending frames larger than 64 bytes means you send fewer packets per second to obtain 10Gbits of data throughput. You cannot get more than 10Gbits, or 20Gbits of bi-directional traffic, per port.

If Pktgen is reporting more than 60Gbits per second of throughput for 6 ports, then Pktgen has a bug. If Pktgen is reporting more than 10Gbits of traffic Rx or Tx, then Pktgen has a bug. I have never seen Pktgen report more than 10Gbits Rx or Tx.

The most throughput for 6 ports in this forwarding configuration would be 60Gbits (3 x 20Gbits). If you had each port sending and receiving traffic, and not in a forwarding configuration, then you could get 20Gbits per port, or 120Gbits. Does this make sense?

Let's say on a single machine you loop back the Tx/Rx on each port, so the packet sent is received by the same port; then you would have 20Gbits of bi-directional traffic per port. The problem is that this is not how your system is configured: you are consuming two ports per 20Gbits of traffic.

I hope I have the above correct, as it is late for me :-) If you see something wrong with my statements, please let me know what I did wrong in my logic.

Thank you,
++Keith

-------------------------------
Keith Wiles pktgen.dpdk@gmail.com
Principal Technologist for Networking
Wind River Systems
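[Editor's note: the wire-rate figure quoted above (roughly 14.88 Mpps for 64-byte frames on 10 GbE) follows from the standard per-frame overhead of 20 bytes for preamble, start-of-frame delimiter and inter-frame gap. The short C sketch below redoes that arithmetic; treating the quoted packet size as the full Ethernet frame, CRC included, is an assumption of the sketch, and the names used are illustrative only.]

/*
 * 10 GbE wire-rate arithmetic: a frame of N bytes occupies N + 20 bytes
 * on the wire (7 bytes preamble + 1 byte SFD + 12 bytes inter-frame gap),
 * so the maximum packet rate is 10e9 / ((N + 20) * 8) packets per second.
 */
#include <stdio.h>

#define LINK_BPS      10e9    /* 10 Gbit/s link */
#define WIRE_OVERHEAD 20      /* preamble + SFD + inter-frame gap, in bytes */

static double wire_rate_pps(unsigned frame_bytes)
{
    return LINK_BPS / ((frame_bytes + WIRE_OVERHEAD) * 8.0);
}

int main(void)
{
    printf("  64-byte frames: %.3f Mpps per 10G port\n", wire_rate_pps(64) / 1e6);
    printf("1500-byte frames: %.3f Mpps per 10G port\n", wire_rate_pps(1500) / 1e6);
    return 0;   /* about 14.881 Mpps and 0.822 Mpps respectively */
}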
On Sep 20, 2013, at 2:11 PM, Jun Han wrote:

Hi Keith,

Thanks so much for all your prompt replies. Thanks to you, we are now utilizing your packet gen code.

We have a question about the performance numbers we are getting, measured through your packet gen program. The current setup is the following:

We have two machines, each equipped with 6 dual-port 10 GbE NICs. Machine 0 runs DPDK L2FWD code, and Machine 1 runs your packet gen. L2FWD is modified to forward the incoming packets to another statically assigned output port. With this setup, we are getting 120 Gbps throughput measured by your packet gen with packet size 1500 bytes. For 64-byte packets, we are getting around 80 Gbps.

Do these performance numbers make sense? We are reading related papers in this domain, and it seems like our numbers are unusually high. Could you please give us your thoughts on this or share your performance numbers with your setup?

Thank you so much,
Jun

Keith Wiles, Principal Technologist for Networking, member of the CTO office, Wind River
direct 972.434.4136  mobile 940.213.5533

On Sep 22, 2013, at 1:41 PM, Venkatesan, Venky wrote:

Chris,

The numbers you are getting are correct. :)

Practically speaking, most motherboards pin out between 4 and 5 x8 slots to every CPU socket. At PCI-E Gen 2 speeds (5 GT/s), each slot is capable of carrying 20 Gb/s of traffic (limited to ~16 Gb/s of 64B packets). I would have expected the 64-byte traffic capacity to be a bit higher than 80 Gb/s, but either way the numbers you are achieving are well within the capability of the system if you are careful about pinning cores to ports, which you seem to be doing. QPI is not a limiter either for the amount of traffic you are generating currently.

Regards,
-Venky

-----Original Message-----
From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Chris Pappas
Sent: Sunday, September 22, 2013 7:32 AM
To: dev@dpdk.org
Subject: [dpdk-dev] Question regarding throughput number with DPDK l2fwd with Wind River System's pktgen

Hi,

We have a question about the performance numbers we are getting, measured through the pktgen application provided by Wind River Systems. The current setup is the following:

We have two machines, each equipped with 6 dual-port 10 GbE NICs (a total of 12 ports). Machine 0 runs DPDK L2FWD code, and Machine 1 runs Wind River System's pktgen. L2FWD is modified to forward the incoming packets to another statically assigned output port.

Our machines have two Intel Xeon E5-2600 CPUs connected via QPI, and have two riser slots, each holding three 10Gbps NICs. Two NICs in riser slot 1 (NIC0 and NIC1) are connected to CPU 1 via PCIe Gen3, while the remaining NIC2 is connected to CPU 2, also via PCIe Gen3. In riser slot 2, all NICs (NICs 3, 4, and 5) are connected to CPU 2 via PCIe Gen3. We were careful to assign the NIC ports to cores of the CPU sockets that have a direct physical connection, to achieve maximum performance.

With this setup, we are getting 120 Gbps throughput measured by pktgen with packet size 1500 bytes. For 64-byte packets, we are getting around 80 Gbps.

Do these performance numbers make sense? We are reading related papers in this domain, and it seems like our numbers are unusually high. We did our theoretical calculation and find that it should theoretically be possible because it does not hit the PCIe bandwidth of our machine, nor does it exceed the QPI bandwidth when packets are forwarded over the NUMA node. Can you share your thoughts / experience with this?

Thank you,
Chris
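[Editor's note: as a rough cross-check of Venky's per-slot figures against the measured numbers, the sketch below multiplies his ~20 Gb/s per PCIe Gen 2 x8 slot (and ~16 Gb/s of 64-byte packets) by six slots. Assuming one dual-port NIC per x8 slot, and carrying the Gen 2 figures over to a system whose NICs are reported as Gen 3 (which would give more headroom, not less), are assumptions of this sketch only; neither is stated in the thread.]

/*
 * Per-slot consistency check using the figures quoted by Venky:
 * roughly 20 Gb/s of Ethernet traffic per PCIe Gen 2 x8 slot, and
 * roughly 16 Gb/s when the traffic is all 64-byte packets.  With six
 * dual-port NICs assumed to sit in six such slots, the aggregate is
 * close to the 120 Gb/s (1500-byte) and ~80 Gb/s (64-byte) measurements.
 */
#include <stdio.h>

#define NUM_SLOTS            6      /* one dual-port NIC per x8 slot (assumed) */
#define SLOT_GBPS_LARGE_PKT  20.0   /* ~20 Gb/s per Gen 2 x8 slot */
#define SLOT_GBPS_64B_PKT    16.0   /* ~16 Gb/s of 64B packets per slot */

int main(void)
{
    printf("Large frames: %d slots * %.0f Gb/s = %.0f Gb/s\n",
           NUM_SLOTS, SLOT_GBPS_LARGE_PKT, NUM_SLOTS * SLOT_GBPS_LARGE_PKT);
    printf("64B frames:   %d slots * %.0f Gb/s = %.0f Gb/s\n",
           NUM_SLOTS, SLOT_GBPS_64B_PKT, NUM_SLOTS * SLOT_GBPS_64B_PKT);
    return 0;   /* 120 Gb/s and 96 Gb/s; measured 120 Gb/s and ~80 Gb/s */
}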