DPDK patches and discussions
* [dpdk-dev]  DPDK Performance issue with l2fwd
@ 2014-07-10  8:07 Zachary.Jen
  2014-07-10  8:40 ` Alex Markuze
  0 siblings, 1 reply; 7+ messages in thread
From: Zachary.Jen @ 2014-07-10  8:07 UTC (permalink / raw)
  To: dev; +Cc: Alan.Yu

Hey Guys,

Recently I have been using l2fwd to test 160G (82599 10G * 16 ports), but I
ran into a strange phenomenon during my test.

When I use 12 ports to test l2fwd performance, it works fine and
achieves 120G.
But things become abnormal when I use more than 12 ports: some of the
ports seem to go wrong and show no Tx/Rx at all.
Does anyone know about this?

My testing environment:
1. E5-2658 v2 (10 cores) * 2
http://ark.intel.com/zh-tw/products/76160/Intel-Xeon-Processor-E5-2658-v2-25M-Cache-2_40-GHz
2. One core handles one port (in order to get the best performance).
3. No QPI-crossing issue.
4. l2fwd parameters (the masks are decoded in the sketch below):
     4.1 -c 0xF0FF -- -P 0xF00FF  => 120G achieved!
     4.2 -c 0xFF0FF -- -P 0xFF0FF => Failed! Only the first 10 ports
work well.
     4.3 -c 0x3F3FF -- -P 0x3F3FF => Failed! Only the first 10 ports
work well.
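
For reference, a minimal bash sketch (added for illustration; not part of
the original report) that decodes any of the masks above into the
lcores/ports it enables. Note that 0xF0FF and 0xF00FF each set 12 bits,
while 0xFF0FF and 0x3F3FF each set 16:

    mask=0x3F3FF                      # substitute any -c or -P value above
    for bit in $(seq 0 19); do
        if (( (mask >> bit) & 1 )); then
            echo "bit $bit set -> lcore/port $bit enabled"
        fi
    done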

BTW, I have tried lots of parameter sets, and whenever I configure more
than 12 ports, only the first 10 ports work.
Otherwise, everything works well.

Can anyone help me solve this issue? Or can DPDK only be configured with
12 or fewer ports?
Or is DPDK's maximum throughput 120G?


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [dpdk-dev] DPDK Performance issue with l2fwd
  2014-07-10  8:07 [dpdk-dev] DPDK Performance issue with l2fwd Zachary.Jen
@ 2014-07-10  8:40 ` Alex Markuze
  2014-07-10  9:29   ` Zachary.Jen
  0 siblings, 1 reply; 7+ messages in thread
From: Alex Markuze @ 2014-07-10  8:40 UTC (permalink / raw)
  To: Zachary.Jen; +Cc: dev, Alan.Yu

Hi Zachary,
Your issue may be with PCIe 3.0: with 16 lanes, each slot is limited to
128Gb/s [3].
Now, AFAIK [1] the CPU is connected to the I/O through a single PCIe slot.

Several thoughts that may help you:

1. You can figure out the max b/w by running netperf over the kernel
interfaces (w/o DPDK). Each CPU can handle the netperf traffic and the
completion interrupts with grace (64K packets and all offloads on) for
10Gb NICs.
With more than 10 NICs I would disable the IRQ balancer and make sure
interrupts are spread evenly by setting the IRQ affinity manually [2]
(see the sketch after these suggestions).
As long as you have a physical core (no hyperthreading) per NIC port, you
can figure out the max b/w you can get with all the NICs.

2. You can try using 40Gb and 56Gb NICs (Mellanox), if available to you.
In this case, to reach wire speed you will need to steer each netperf
stream and its interrupts to different cores, keeping both cores on the
same NUMA node (lscpu).
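
For illustration, a minimal sketch of the manual IRQ-affinity step in
suggestion 1 (the interface name, IRQ number, and PCI address are
placeholders, not values from this thread):

    service irqbalance stop                          # or your distro's equivalent
    grep eth4 /proc/interrupts                       # find the port's IRQ numbers
    echo 4 > /proc/irq/123/smp_affinity              # pin IRQ 123 to CPU 2 (hex CPU mask)
    cat /sys/bus/pci/devices/0000:03:00.0/numa_node  # pick a CPU on the NIC's NUMA node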

Hope this helps.

[1]
http://komposter.com.ua/documents/PCI_Express_Base_Specification_Revision_3.0.pdf
[2]
http://h50146.www5.hp.com/products/software/oe/linux/mainstream/support/whitepaper/pdfs/4AA4-9294ENW.pdf
[3]http://en.wikipedia.org/wiki/PCI_Express#PCI_Express_3.x



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [dpdk-dev] DPDK Performance issue with l2fwd
  2014-07-10  8:40 ` Alex Markuze
@ 2014-07-10  9:29   ` Zachary.Jen
  2014-07-10 15:53     ` Richardson, Bruce
  0 siblings, 1 reply; 7+ messages in thread
From: Zachary.Jen @ 2014-07-10  9:29 UTC (permalink / raw)
  To: dev; +Cc: Alan.Yu

Hi Alex,

Thanks for your help.

I forgot to describe some criteria in my original post.

First, I have confirmed that my 82599 NICs are connected at PCIe Gen3
(Gen3 x8) speed, so the theoretical bandwidth can support over 160G in total.
Hence, I should get full speed in my test.
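
(For illustration, one way to double-check the negotiated link; the PCI
address is a placeholder:)

    # replace 03:00.0 with the 82599's address from lspci
    lspci -s 03:00.0 -vv | grep -E 'LnkCap|LnkSta'
    # Gen3 x8: 8 GT/s * 8 lanes * 128b/130b encoding ~= 63 Gb/s per
    # direction per slot, comfortably above the 2 x 10 Gb/s that a
    # dual-port 82599 card needs.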

Second, I have checked the performance without DPDK at packet size 1518 in
the same environment, and it can indeed reach 160G in total (using the
IRQ-balance method).
So I was quite surprised to get this kind of result with DPDK (I also use
size 1518 to test DPDK).

BTW, I can already get 120G of throughput with 12 ports. But when I add
more than 12 ports, I can only get 100G.
Why does the performance drop below 120G? Why do only 10 ports work fine,
with no Tx or Rx on the others?
Is this a bug or a limitation in DPDK?

Has anyone ever done a similar or identical test?


--
Best Regards,
Zachary Jen

Software RD
CAS-WELL Inc.
8th Floor, No. 242, Bo-Ai St., Shu-Lin City, Taipei County 238, Taiwan
Tel: +886-2-7705-8888#6305
Fax: +886-2-7731-9988


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [dpdk-dev] DPDK Performance issue with l2fwd
  2014-07-10  9:29   ` Zachary.Jen
@ 2014-07-10 15:53     ` Richardson, Bruce
  2014-07-11 11:04       ` Zachary.Jen
  0 siblings, 1 reply; 7+ messages in thread
From: Richardson, Bruce @ 2014-07-10 15:53 UTC (permalink / raw)
  To: Zachary.Jen, dev; +Cc: Alan.Yu

Hi,

Have you tried running a test with 16 ports using any other applications, for example testpmd?

Regards,
/Bruce


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [dpdk-dev] DPDK Performance issue with l2fwd
  2014-07-10 15:53     ` Richardson, Bruce
@ 2014-07-11 11:04       ` Zachary.Jen
  2014-07-11 14:28         ` Richardson, Bruce
  0 siblings, 1 reply; 7+ messages in thread
From: Zachary.Jen @ 2014-07-11 11:04 UTC (permalink / raw)
  To: dev; +Cc: Alan.Yu

Hi Bruce,

Thanks for your suggestion.
I tried using testpmd to test 16 ports today.
The result is quite interesting: it can work, although some ports get low
performance (only about 80%).
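
(For anyone reproducing this, testpmd's interactive prompt can show which
ports are actually carrying traffic; shown here for illustration, not
taken from the thread:)

    testpmd> show port stats all     # per-port Rx/Tx packet and byte counters
    testpmd> clear port stats all    # reset the counters before the next run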

Besides, I also did another test with l2fwd.
I tried 82580 NICs * 16 ports on the same platform with the l2fwd test.
The same situation happened again.
So it seems there is a big bug hidden in l2fwd.

Has anyone seen a similar case?

BTW, could this issue be related to the DPDK version?



--
Best Regards,
Zachary Jen

Software RD
CAS-WELL Inc.
8th Floor, No. 242, Bo-Ai St., Shu-Lin City, Taipei County 238, Taiwan
Tel: +886-2-7705-8888#6305
Fax: +886-2-7731-9988

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [dpdk-dev] DPDK Performance issue with l2fwd
  2014-07-11 11:04       ` Zachary.Jen
@ 2014-07-11 14:28         ` Richardson, Bruce
  2014-07-12 17:55           ` Zachary.Jen
  0 siblings, 1 reply; 7+ messages in thread
From: Richardson, Bruce @ 2014-07-11 14:28 UTC (permalink / raw)
  To: Zachary.Jen, dev; +Cc: Alan.Yu



> -----Original Message-----
> From: Zachary.Jen@cas-well.com [mailto:Zachary.Jen@cas-well.com]
> Sent: Friday, July 11, 2014 4:05 AM
> To: dev@dpdk.org
> Cc: Richardson, Bruce; Alan.Yu@cas-well.com
> Subject: Re: [dpdk-dev] DPDK Performance issue with l2fwd
> 
> Hi Bruce,
> 
> Thanks for your suggestion.
> I tried using testpmd to test 16 ports today.
> The result is quite interesting: it can work, although some ports get low
> performance (only about 80%).

For small packets, PCI bandwidth can be an issue. Do you get 80% of line rate with 64-byte packets or with larger packet sizes? How many testpmd cores were you using, and what parameters were you setting in testpmd?
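
(For context, a 16-port testpmd run of the kind being discussed might be
launched roughly as follows; the coremask, memory-channel count, and port
mask are assumptions, not values reported in this thread:)

    # one forwarding lcore per port, plus one extra lcore for the prompt
    ./testpmd -c 0x1FFFF -n 4 -- --portmask=0xFFFF --nb-cores=16 -i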

> 
> Besides, I also did another test with l2fwd.
> I tried 82580 NICs * 16 ports on the same platform with the l2fwd test.
> The same situation happened again.
> So it seems there is a big bug hidden in l2fwd.

From what you report, that does indeed seem to be the case. We'll have to take another look at the code to see if there is an issue there.

/Bruce


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [dpdk-dev] DPDK Performance issue with l2fwd
  2014-07-11 14:28         ` Richardson, Bruce
@ 2014-07-12 17:55           ` Zachary.Jen
  0 siblings, 0 replies; 7+ messages in thread
From: Zachary.Jen @ 2014-07-12 17:55 UTC (permalink / raw)
  To: dev; +Cc: Alan.Yu

Hi Bruce,

Thanks very much for your reply.

All my tests use 1518-byte packets and keep to the criteria I described
before, because I need to know how many Gb our platform can handle.
Originally I assumed there would be no such limitation in DPDK, so it was
a little surprising to get 160Gb with the IRQ-balance method while DPDK
could not match it.
After the testpmd experiment, I found it can achieve ~160Gb, although
some ports cannot reach full speed.
At least that confirms one of my assumptions: there is no such limitation
in DPDK itself.
Therefore, it suggests something is wrong in l2fwd.

BTW, could you give me some direction on where the root cause may be
located?
Maybe I can try to trace the code to find where the problem is.
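
(For illustration, a few possible starting points for such a trace; the
symbol names are from memory and may differ between DPDK versions:)

    # per-lcore Rx port list and its size limit
    grep -n "MAX_RX_QUEUE_PER_LCORE\|rx_port_list" examples/l2fwd/main.c
    # shared mbuf-pool size and the port/lcore assignment loop
    grep -n "NB_MBUF\|l2fwd_dst_ports\|rx_lcore_id" examples/l2fwd/main.c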

Anyway, thanks for your great help.




--
Best Regards,
Zachary Jen

Software RD
CAS-WELL Inc.
8th Floor, No. 242, Bo-Ai St., Shu-Lin City, Taipei County 238, Taiwan
Tel: +886-2-7705-8888#6305
Fax: +886-2-7731-9988

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2014-07-12 17:53 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-07-10  8:07 [dpdk-dev] DPDK Performance issue with l2fwd Zachary.Jen
2014-07-10  8:40 ` Alex Markuze
2014-07-10  9:29   ` Zachary.Jen
2014-07-10 15:53     ` Richardson, Bruce
2014-07-11 11:04       ` Zachary.Jen
2014-07-11 14:28         ` Richardson, Bruce
2014-07-12 17:55           ` Zachary.Jen
