DPDK patches and discussions
 help / color / mirror / Atom feed
* [dpdk-dev]  ways to generate 40Gbps with two NICs x two ports?
@ 2013-11-19 16:09 jinho hwang
  2013-11-19 16:31 ` Wiles, Roger Keith
  0 siblings, 1 reply; 12+ messages in thread
From: jinho hwang @ 2013-11-19 16:09 UTC (permalink / raw)
  To: dev

Hi All,

I have two NICs (82599), two ports each, that I use as packet
generators. I want to generate packets at the full line rate (40Gbps),
but Pktgen-DPDK does not seem to be able to do it when both ports on a
NIC are used simultaneously. Does anyone know how to generate 40Gbps
without replicating packets in the switch?

Thank you,

Jinho

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [dpdk-dev] ways to generate 40Gbps with two NICs x two ports?
  2013-11-19 16:09 [dpdk-dev] ways to generate 40Gbps with two NICs x two ports? jinho hwang
@ 2013-11-19 16:31 ` Wiles, Roger Keith
  2013-11-19 16:42   ` jinho hwang
  0 siblings, 1 reply; 12+ messages in thread
From: Wiles, Roger Keith @ 2013-11-19 16:31 UTC (permalink / raw)
  To: jinho hwang; +Cc: dev

How do you have Pktgen configured in this case?

On my Westmere dual-socket 3.4GHz machine I can send 20G on a single 82599 NIC using both ports. My machine has a PCIe bug that does not allow me to send on more than 3 ports at wire rate. I get close to 40G with 64-byte packets, but the fourth port runs at about 70% of wire rate because of the PCIe hardware bottleneck.
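
For reference, the 64-byte wire-rate numbers work out roughly as follows; this is only a quick Python sketch, assuming the standard 8-byte preamble and 12-byte inter-frame gap per frame:

# Back-of-envelope wire rate for 64-byte frames on 10GbE.
# Assumes standard framing overhead: 8B preamble + 12B inter-frame gap.
FRAME = 64
OVERHEAD = 8 + 12
LINE_RATE = 10.0e9                        # bits per second
bits_on_wire = (FRAME + OVERHEAD) * 8     # 672 bits per frame
pps_per_port = LINE_RATE / bits_on_wire   # ~14.88 Mpps
print("per 10G port : %.2f Mpps" % (pps_per_port / 1e6))
print("4 ports total: %.2f Mpps" % (4 * pps_per_port / 1e6))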

Keith Wiles, Principal Technologist for Networking member of the CTO office, Wind River
direct 972.434.4136  mobile 940.213.5533  fax 000.000.0000
[Powering 30 Years of Innovation]<http://www.windriver.com/announces/wr30/>

On Nov 19, 2013, at 10:09 AM, jinho hwang <hwang.jinho@gmail.com<mailto:hwang.jinho@gmail.com>> wrote:

Hi All,

I have two NICs (82599) x two ports that are used as packet generators. I
want to generate full line-rate packets (40Gbps), but Pktgen-DPDK does not
seem to be able to do it when two port in a NIC are used simultaneously.
Does anyone know how to generate 40Gbps without replicating packets in the
switch?

Thank you,

Jinho

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [dpdk-dev] ways to generate 40Gbps with two NICs x two ports?
  2013-11-19 16:31 ` Wiles, Roger Keith
@ 2013-11-19 16:42   ` jinho hwang
  2013-11-19 16:52     ` Wiles, Roger Keith
  2013-11-19 16:54     ` Wiles, Roger Keith
  0 siblings, 2 replies; 12+ messages in thread
From: jinho hwang @ 2013-11-19 16:42 UTC (permalink / raw)
  To: Wiles, Roger Keith; +Cc: dev

On Tue, Nov 19, 2013 at 11:31 AM, Wiles, Roger Keith
<keith.wiles@windriver.com> wrote:
> How do you have Pktgen configured in this case?
>
> On my westmere dual socket 3.4Ghz machine I can send 20G on a single NIC
> 82599x two ports. My machine has a PCIe bug that does not allow me to send
> on more then 3 ports at wire rate. I get close to 40G 64 byte packets, but
> the forth port does is about 70% of wire rate because of the PCIe hardware
> bottle neck problem.
>
> Keith Wiles, Principal Technologist for Networking member of the CTO office,
> Wind River
> direct 972.434.4136  mobile 940.213.5533  fax 000.000.0000
>
> On Nov 19, 2013, at 10:09 AM, jinho hwang <hwang.jinho@gmail.com> wrote:
>
> Hi All,
>
> I have two NICs (82599) x two ports that are used as packet generators. I
> want to generate full line-rate packets (40Gbps), but Pktgen-DPDK does not
> seem to be able to do it when two port in a NIC are used simultaneously.
> Does anyone know how to generate 40Gbps without replicating packets in the
> switch?
>
> Thank you,
>
> Jinho
>
>

Hi Keith,

Thank you for the e-mail. I am not sure how to figure out whether my
PCIe also has a problem that prevents me from sending at full line
rate. I use an Intel(R) Xeon(R) CPU E5649 @ 2.53GHz. It is hard for me
to figure out where the bottleneck is.

My configuration is:

sudo ./app/build/pktgen -c 1ff -n 3 $BLACK-LIST -- -p 0xf0 -P -m
"[1:2].0, [3:4].1, [5:6].2, [7:8].3" -f test/forward.lua


=== port to lcore mapping table (# lcores 9) ===

   lcore:     0     1     2     3     4     5     6     7     8

port   0:  D: T  1: 0  0: 1  0: 0  0: 0  0: 0  0: 0  0: 0  0: 0 =  1: 1

port   1:  D: T  0: 0  0: 0  1: 0  0: 1  0: 0  0: 0  0: 0  0: 0 =  1: 1

port   2:  D: T  0: 0  0: 0  0: 0  0: 0  1: 0  0: 1  0: 0  0: 0 =  1: 1

port   3:  D: T  0: 0  0: 0  0: 0  0: 0  0: 0  0: 0  1: 0  0: 1 =  1: 1

Total   :  0: 0  1: 0  0: 1  1: 0  0: 1  1: 0  0: 1  1: 0  0: 1

    Display and Timer on lcore 0, rx:tx counts per port/lcore


Configuring 4 ports, MBUF Size 1984, MBUF Cache Size 128

Lcore:

    1, type  RX , rx_cnt  1, tx_cnt  0 private (nil), RX (pid:qid): (
0: 0) , TX (pid:qid):

    2, type  TX , rx_cnt  0, tx_cnt  1 private (nil), RX (pid:qid): ,
TX (pid:qid): ( 0: 0)

    3, type  RX , rx_cnt  1, tx_cnt  0 private (nil), RX (pid:qid): (
1: 0) , TX (pid:qid):

    4, type  TX , rx_cnt  0, tx_cnt  1 private (nil), RX (pid:qid): ,
TX (pid:qid): ( 1: 0)

    5, type  RX , rx_cnt  1, tx_cnt  0 private (nil), RX (pid:qid): (
2: 0) , TX (pid:qid):

    6, type  TX , rx_cnt  0, tx_cnt  1 private (nil), RX (pid:qid): ,
TX (pid:qid): ( 2: 0)

    7, type  RX , rx_cnt  1, tx_cnt  0 private (nil), RX (pid:qid): (
3: 0) , TX (pid:qid):

    8, type  TX , rx_cnt  0, tx_cnt  1 private (nil), RX (pid:qid): ,
TX (pid:qid): ( 3: 0)


Port :

    0, nb_lcores  2, private 0x6fd5a0, lcores:  1  2

    1, nb_lcores  2, private 0x700208, lcores:  3  4

    2, nb_lcores  2, private 0x702e70, lcores:  5  6

    3, nb_lcores  2, private 0x705ad8, lcores:  7  8



Initialize Port 0 -- TxQ 1, RxQ 1,  Src MAC 90:e2:ba:2f:f2:a4

    Create: Default RX  0:0  - Memory used (MBUFs 1024 x (size 1984 +
Hdr 64)) + 395392 =   2435 KB


    Create: Default TX  0:0  - Memory used (MBUFs 1024 x (size 1984 +
Hdr 64)) + 395392 =   2435 KB

    Create: Range TX    0:0  - Memory used (MBUFs 1024 x (size 1984 +
Hdr 64)) + 395392 =   2435 KB

    Create: Sequence TX 0:0  - Memory used (MBUFs 1024 x (size 1984 +
Hdr 64)) + 395392 =   2435 KB

    Create: Special TX  0:0  - Memory used (MBUFs   64 x (size 1984 +
Hdr 64)) + 395392 =    515 KB



Port memory used =  10251 KB

Initialize Port 1 -- TxQ 1, RxQ 1,  Src MAC 90:e2:ba:2f:f2:a5

    Create: Default RX  1:0  - Memory used (MBUFs 1024 x (size 1984 +
Hdr 64)) + 395392 =   2435 KB


    Create: Default TX  1:0  - Memory used (MBUFs 1024 x (size 1984 +
Hdr 64)) + 395392 =   2435 KB

    Create: Range TX    1:0  - Memory used (MBUFs 1024 x (size 1984 +
Hdr 64)) + 395392 =   2435 KB

    Create: Sequence TX 1:0  - Memory used (MBUFs 1024 x (size 1984 +
Hdr 64)) + 395392 =   2435 KB

    Create: Special TX  1:0  - Memory used (MBUFs   64 x (size 1984 +
Hdr 64)) + 395392 =    515 KB



Port memory used =  10251 KB

Initialize Port 2 -- TxQ 1, RxQ 1,  Src MAC 90:e2:ba:4a:e6:1c

    Create: Default RX  2:0  - Memory used (MBUFs 1024 x (size 1984 +
Hdr 64)) + 395392 =   2435 KB


    Create: Default TX  2:0  - Memory used (MBUFs 1024 x (size 1984 +
Hdr 64)) + 395392 =   2435 KB

    Create: Range TX    2:0  - Memory used (MBUFs 1024 x (size 1984 +
Hdr 64)) + 395392 =   2435 KB

    Create: Sequence TX 2:0  - Memory used (MBUFs 1024 x (size 1984 +
Hdr 64)) + 395392 =   2435 KB

    Create: Special TX  2:0  - Memory used (MBUFs   64 x (size 1984 +
Hdr 64)) + 395392 =    515 KB



Port memory used =  10251 KB

Initialize Port 3 -- TxQ 1, RxQ 1,  Src MAC 90:e2:ba:4a:e6:1d

    Create: Default RX  3:0  - Memory used (MBUFs 1024 x (size 1984 +
Hdr 64)) + 395392 =   2435 KB


    Create: Default TX  3:0  - Memory used (MBUFs 1024 x (size 1984 +
Hdr 64)) + 395392 =   2435 KB

    Create: Range TX    3:0  - Memory used (MBUFs 1024 x (size 1984 +
Hdr 64)) + 395392 =   2435 KB

    Create: Sequence TX 3:0  - Memory used (MBUFs 1024 x (size 1984 +
Hdr 64)) + 395392 =   2435 KB

    Create: Special TX  3:0  - Memory used (MBUFs   64 x (size 1984 +
Hdr 64)) + 395392 =    515 KB



Port memory used =  10251 KB


Total memory used =  41003 KB

Port  0: Link Up - speed 10000 Mbps - full-duplex <Enable promiscuous mode>

Port  1: Link Up - speed 10000 Mbps - full-duplex <Enable promiscuous mode>

Port  2: Link Up - speed 10000 Mbps - full-duplex <Enable promiscuous mode>

Port  3: Link Up - speed 10000 Mbps - full-duplex <Enable promiscuous mode>


=== Display processing on lcore 0

=== RX processing on lcore  1, rxcnt 1, port/qid, 0/0

=== TX processing on lcore  2, txcnt 1, port/qid, 0/0

=== RX processing on lcore  3, rxcnt 1, port/qid, 1/0

=== TX processing on lcore  4, txcnt 1, port/qid, 1/0

=== RX processing on lcore  5, rxcnt 1, port/qid, 2/0

=== TX processing on lcore  6, txcnt 1, port/qid, 2/0

=== RX processing on lcore  7, rxcnt 1, port/qid, 3/0

=== TX processing on lcore  8, txcnt 1, port/qid, 3/0


Please, advise me if you have time.

Thank you always for your help!

Jinho

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [dpdk-dev] ways to generate 40Gbps with two NICs x two ports?
  2013-11-19 16:42   ` jinho hwang
@ 2013-11-19 16:52     ` Wiles, Roger Keith
  2013-11-19 16:54     ` Wiles, Roger Keith
  1 sibling, 0 replies; 12+ messages in thread
From: Wiles, Roger Keith @ 2013-11-19 16:52 UTC (permalink / raw)
  To: jinho hwang; +Cc: dev

Sorry, I mistyped the speed of my machine; it is 2.4GHz, not 3.4GHz, but that should not change the problem here.

I am not sure how to determine whether your machine has a problem, other than starting up one port at a time and seeing if the rate drops when you start the fourth port.

Keith Wiles, Principal Technologist for Networking member of the CTO office, Wind River
mobile 940.213.5533
[Powering 30 Years of Innovation]<http://www.windriver.com/announces/wr30/>

On Nov 19, 2013, at 10:42 AM, jinho hwang <hwang.jinho@gmail.com<mailto:hwang.jinho@gmail.com>> wrote:

On Tue, Nov 19, 2013 at 11:31 AM, Wiles, Roger Keith
<keith.wiles@windriver.com<mailto:keith.wiles@windriver.com>> wrote:
How do you have Pktgen configured in this case?

On my westmere dual socket 3.4Ghz machine I can send 20G on a single NIC
82599x two ports. My machine has a PCIe bug that does not allow me to send
on more then 3 ports at wire rate. I get close to 40G 64 byte packets, but
the forth port does is about 70% of wire rate because of the PCIe hardware
bottle neck problem.

Keith Wiles, Principal Technologist for Networking member of the CTO office,
Wind River
direct 972.434.4136  mobile 940.213.5533  fax 000.000.0000

On Nov 19, 2013, at 10:09 AM, jinho hwang <hwang.jinho@gmail.com<mailto:hwang.jinho@gmail.com>> wrote:

Hi All,

I have two NICs (82599) x two ports that are used as packet generators. I
want to generate full line-rate packets (40Gbps), but Pktgen-DPDK does not
seem to be able to do it when two port in a NIC are used simultaneously.
Does anyone know how to generate 40Gbps without replicating packets in the
switch?

Thank you,

Jinho



Hi Keith,

Thank you for the e-mail. I am not sure how I figure out whether my
PCIe also has any problems to prevent me from sending full line-rates.
I use Intel(R) Xeon(R) CPU           E5649  @ 2.53GHz. It is hard for
me to figure out where is the bottleneck.

My configuration is:

sudo ./app/build/pktgen -c 1ff -n 3 $BLACK-LIST -- -p 0xf0 -P -m
"[1:2].0, [3:4].1, [5:6].2, [7:8].3" -f test/forward.lua


=== port to lcore mapping table (# lcores 9) ===

  lcore:     0     1     2     3     4     5     6     7     8

port   0:  D: T  1: 0  0: 1  0: 0  0: 0  0: 0  0: 0  0: 0  0: 0 =  1: 1

port   1:  D: T  0: 0  0: 0  1: 0  0: 1  0: 0  0: 0  0: 0  0: 0 =  1: 1

port   2:  D: T  0: 0  0: 0  0: 0  0: 0  1: 0  0: 1  0: 0  0: 0 =  1: 1

port   3:  D: T  0: 0  0: 0  0: 0  0: 0  0: 0  0: 0  1: 0  0: 1 =  1: 1

Total   :  0: 0  1: 0  0: 1  1: 0  0: 1  1: 0  0: 1  1: 0  0: 1

   Display and Timer on lcore 0, rx:tx counts per port/lcore


Configuring 4 ports, MBUF Size 1984, MBUF Cache Size 128

Lcore:

   1, type  RX , rx_cnt  1, tx_cnt  0 private (nil), RX (pid:qid): (
0: 0) , TX (pid:qid):

   2, type  TX , rx_cnt  0, tx_cnt  1 private (nil), RX (pid:qid): ,
TX (pid:qid): ( 0: 0)

   3, type  RX , rx_cnt  1, tx_cnt  0 private (nil), RX (pid:qid): (
1: 0) , TX (pid:qid):

   4, type  TX , rx_cnt  0, tx_cnt  1 private (nil), RX (pid:qid): ,
TX (pid:qid): ( 1: 0)

   5, type  RX , rx_cnt  1, tx_cnt  0 private (nil), RX (pid:qid): (
2: 0) , TX (pid:qid):

   6, type  TX , rx_cnt  0, tx_cnt  1 private (nil), RX (pid:qid): ,
TX (pid:qid): ( 2: 0)

   7, type  RX , rx_cnt  1, tx_cnt  0 private (nil), RX (pid:qid): (
3: 0) , TX (pid:qid):

   8, type  TX , rx_cnt  0, tx_cnt  1 private (nil), RX (pid:qid): ,
TX (pid:qid): ( 3: 0)


Port :

   0, nb_lcores  2, private 0x6fd5a0, lcores:  1  2

   1, nb_lcores  2, private 0x700208, lcores:  3  4

   2, nb_lcores  2, private 0x702e70, lcores:  5  6

   3, nb_lcores  2, private 0x705ad8, lcores:  7  8



Initialize Port 0 -- TxQ 1, RxQ 1,  Src MAC 90:e2:ba:2f:f2:a4

   Create: Default RX  0:0  - Memory used (MBUFs 1024 x (size 1984 +
Hdr 64)) + 395392 =   2435 KB


   Create: Default TX  0:0  - Memory used (MBUFs 1024 x (size 1984 +
Hdr 64)) + 395392 =   2435 KB

   Create: Range TX    0:0  - Memory used (MBUFs 1024 x (size 1984 +
Hdr 64)) + 395392 =   2435 KB

   Create: Sequence TX 0:0  - Memory used (MBUFs 1024 x (size 1984 +
Hdr 64)) + 395392 =   2435 KB

   Create: Special TX  0:0  - Memory used (MBUFs   64 x (size 1984 +
Hdr 64)) + 395392 =    515 KB



Port memory used =  10251 KB

Initialize Port 1 -- TxQ 1, RxQ 1,  Src MAC 90:e2:ba:2f:f2:a5

   Create: Default RX  1:0  - Memory used (MBUFs 1024 x (size 1984 +
Hdr 64)) + 395392 =   2435 KB


   Create: Default TX  1:0  - Memory used (MBUFs 1024 x (size 1984 +
Hdr 64)) + 395392 =   2435 KB

   Create: Range TX    1:0  - Memory used (MBUFs 1024 x (size 1984 +
Hdr 64)) + 395392 =   2435 KB

   Create: Sequence TX 1:0  - Memory used (MBUFs 1024 x (size 1984 +
Hdr 64)) + 395392 =   2435 KB

   Create: Special TX  1:0  - Memory used (MBUFs   64 x (size 1984 +
Hdr 64)) + 395392 =    515 KB



Port memory used =  10251 KB

Initialize Port 2 -- TxQ 1, RxQ 1,  Src MAC 90:e2:ba:4a:e6:1c

   Create: Default RX  2:0  - Memory used (MBUFs 1024 x (size 1984 +
Hdr 64)) + 395392 =   2435 KB


   Create: Default TX  2:0  - Memory used (MBUFs 1024 x (size 1984 +
Hdr 64)) + 395392 =   2435 KB

   Create: Range TX    2:0  - Memory used (MBUFs 1024 x (size 1984 +
Hdr 64)) + 395392 =   2435 KB

   Create: Sequence TX 2:0  - Memory used (MBUFs 1024 x (size 1984 +
Hdr 64)) + 395392 =   2435 KB

   Create: Special TX  2:0  - Memory used (MBUFs   64 x (size 1984 +
Hdr 64)) + 395392 =    515 KB



Port memory used =  10251 KB

Initialize Port 3 -- TxQ 1, RxQ 1,  Src MAC 90:e2:ba:4a:e6:1d

   Create: Default RX  3:0  - Memory used (MBUFs 1024 x (size 1984 +
Hdr 64)) + 395392 =   2435 KB


   Create: Default TX  3:0  - Memory used (MBUFs 1024 x (size 1984 +
Hdr 64)) + 395392 =   2435 KB

   Create: Range TX    3:0  - Memory used (MBUFs 1024 x (size 1984 +
Hdr 64)) + 395392 =   2435 KB

   Create: Sequence TX 3:0  - Memory used (MBUFs 1024 x (size 1984 +
Hdr 64)) + 395392 =   2435 KB

   Create: Special TX  3:0  - Memory used (MBUFs   64 x (size 1984 +
Hdr 64)) + 395392 =    515 KB



Port memory used =  10251 KB


Total memory used =  41003 KB

Port  0: Link Up - speed 10000 Mbps - full-duplex <Enable promiscuous mode>

Port  1: Link Up - speed 10000 Mbps - full-duplex <Enable promiscuous mode>

Port  2: Link Up - speed 10000 Mbps - full-duplex <Enable promiscuous mode>

Port  3: Link Up - speed 10000 Mbps - full-duplex <Enable promiscuous mode>


=== Display processing on lcore 0

=== RX processing on lcore  1, rxcnt 1, port/qid, 0/0

=== TX processing on lcore  2, txcnt 1, port/qid, 0/0

=== RX processing on lcore  3, rxcnt 1, port/qid, 1/0

=== TX processing on lcore  4, txcnt 1, port/qid, 1/0

=== RX processing on lcore  5, rxcnt 1, port/qid, 2/0

=== TX processing on lcore  6, txcnt 1, port/qid, 2/0

=== RX processing on lcore  7, rxcnt 1, port/qid, 3/0

=== TX processing on lcore  8, txcnt 1, port/qid, 3/0


Please, advise me if you have time.

Thank you always for your help!

Jinho

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [dpdk-dev] ways to generate 40Gbps with two NICs x two ports?
  2013-11-19 16:42   ` jinho hwang
  2013-11-19 16:52     ` Wiles, Roger Keith
@ 2013-11-19 16:54     ` Wiles, Roger Keith
  2013-11-19 17:04       ` jinho hwang
  1 sibling, 1 reply; 12+ messages in thread
From: Wiles, Roger Keith @ 2013-11-19 16:54 UTC (permalink / raw)
  To: jinho hwang; +Cc: dev

BTW, the configuration looks fine, but you need to make sure the lcores are not split between two different CPU sockets. You can use dpdk/tools/cpu_layout.py to dump out the system configuration.
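
If that script is not handy, a minimal sketch along the same lines, assuming the usual Linux sysfs topology files, looks like this:

#!/usr/bin/env python
# Sketch: map each logical CPU (lcore) to its socket and physical core
# by reading the Linux sysfs topology files, similar to what
# tools/cpu_layout.py reports.
import glob, os

cpus = sorted(glob.glob("/sys/devices/system/cpu/cpu[0-9]*"),
              key=lambda p: int(os.path.basename(p)[3:]))
for path in cpus:
    lcore = int(os.path.basename(path)[3:])
    socket = int(open(os.path.join(path, "topology/physical_package_id")).read())
    core = int(open(os.path.join(path, "topology/core_id")).read())
    print("lcore %2d -> socket %d, core %2d" % (lcore, socket, core))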


Keith Wiles, Principal Technologist for Networking member of the CTO office, Wind River
mobile 940.213.5533
[Powering 30 Years of Innovation]<http://www.windriver.com/announces/wr30/>

On Nov 19, 2013, at 10:42 AM, jinho hwang <hwang.jinho@gmail.com<mailto:hwang.jinho@gmail.com>> wrote:

On Tue, Nov 19, 2013 at 11:31 AM, Wiles, Roger Keith
<keith.wiles@windriver.com<mailto:keith.wiles@windriver.com>> wrote:
How do you have Pktgen configured in this case?

On my westmere dual socket 3.4Ghz machine I can send 20G on a single NIC
82599x two ports. My machine has a PCIe bug that does not allow me to send
on more then 3 ports at wire rate. I get close to 40G 64 byte packets, but
the forth port does is about 70% of wire rate because of the PCIe hardware
bottle neck problem.

Keith Wiles, Principal Technologist for Networking member of the CTO office,
Wind River
direct 972.434.4136  mobile 940.213.5533  fax 000.000.0000

On Nov 19, 2013, at 10:09 AM, jinho hwang <hwang.jinho@gmail.com<mailto:hwang.jinho@gmail.com>> wrote:

Hi All,

I have two NICs (82599) x two ports that are used as packet generators. I
want to generate full line-rate packets (40Gbps), but Pktgen-DPDK does not
seem to be able to do it when two port in a NIC are used simultaneously.
Does anyone know how to generate 40Gbps without replicating packets in the
switch?

Thank you,

Jinho



Hi Keith,

Thank you for the e-mail. I am not sure how I figure out whether my
PCIe also has any problems to prevent me from sending full line-rates.
I use Intel(R) Xeon(R) CPU           E5649  @ 2.53GHz. It is hard for
me to figure out where is the bottleneck.

My configuration is:

sudo ./app/build/pktgen -c 1ff -n 3 $BLACK-LIST -- -p 0xf0 -P -m
"[1:2].0, [3:4].1, [5:6].2, [7:8].3" -f test/forward.lua


=== port to lcore mapping table (# lcores 9) ===

  lcore:     0     1     2     3     4     5     6     7     8

port   0:  D: T  1: 0  0: 1  0: 0  0: 0  0: 0  0: 0  0: 0  0: 0 =  1: 1

port   1:  D: T  0: 0  0: 0  1: 0  0: 1  0: 0  0: 0  0: 0  0: 0 =  1: 1

port   2:  D: T  0: 0  0: 0  0: 0  0: 0  1: 0  0: 1  0: 0  0: 0 =  1: 1

port   3:  D: T  0: 0  0: 0  0: 0  0: 0  0: 0  0: 0  1: 0  0: 1 =  1: 1

Total   :  0: 0  1: 0  0: 1  1: 0  0: 1  1: 0  0: 1  1: 0  0: 1

   Display and Timer on lcore 0, rx:tx counts per port/lcore


Configuring 4 ports, MBUF Size 1984, MBUF Cache Size 128

Lcore:

   1, type  RX , rx_cnt  1, tx_cnt  0 private (nil), RX (pid:qid): (
0: 0) , TX (pid:qid):

   2, type  TX , rx_cnt  0, tx_cnt  1 private (nil), RX (pid:qid): ,
TX (pid:qid): ( 0: 0)

   3, type  RX , rx_cnt  1, tx_cnt  0 private (nil), RX (pid:qid): (
1: 0) , TX (pid:qid):

   4, type  TX , rx_cnt  0, tx_cnt  1 private (nil), RX (pid:qid): ,
TX (pid:qid): ( 1: 0)

   5, type  RX , rx_cnt  1, tx_cnt  0 private (nil), RX (pid:qid): (
2: 0) , TX (pid:qid):

   6, type  TX , rx_cnt  0, tx_cnt  1 private (nil), RX (pid:qid): ,
TX (pid:qid): ( 2: 0)

   7, type  RX , rx_cnt  1, tx_cnt  0 private (nil), RX (pid:qid): (
3: 0) , TX (pid:qid):

   8, type  TX , rx_cnt  0, tx_cnt  1 private (nil), RX (pid:qid): ,
TX (pid:qid): ( 3: 0)


Port :

   0, nb_lcores  2, private 0x6fd5a0, lcores:  1  2

   1, nb_lcores  2, private 0x700208, lcores:  3  4

   2, nb_lcores  2, private 0x702e70, lcores:  5  6

   3, nb_lcores  2, private 0x705ad8, lcores:  7  8



Initialize Port 0 -- TxQ 1, RxQ 1,  Src MAC 90:e2:ba:2f:f2:a4

   Create: Default RX  0:0  - Memory used (MBUFs 1024 x (size 1984 +
Hdr 64)) + 395392 =   2435 KB


   Create: Default TX  0:0  - Memory used (MBUFs 1024 x (size 1984 +
Hdr 64)) + 395392 =   2435 KB

   Create: Range TX    0:0  - Memory used (MBUFs 1024 x (size 1984 +
Hdr 64)) + 395392 =   2435 KB

   Create: Sequence TX 0:0  - Memory used (MBUFs 1024 x (size 1984 +
Hdr 64)) + 395392 =   2435 KB

   Create: Special TX  0:0  - Memory used (MBUFs   64 x (size 1984 +
Hdr 64)) + 395392 =    515 KB



Port memory used =  10251 KB

Initialize Port 1 -- TxQ 1, RxQ 1,  Src MAC 90:e2:ba:2f:f2:a5

   Create: Default RX  1:0  - Memory used (MBUFs 1024 x (size 1984 +
Hdr 64)) + 395392 =   2435 KB


   Create: Default TX  1:0  - Memory used (MBUFs 1024 x (size 1984 +
Hdr 64)) + 395392 =   2435 KB

   Create: Range TX    1:0  - Memory used (MBUFs 1024 x (size 1984 +
Hdr 64)) + 395392 =   2435 KB

   Create: Sequence TX 1:0  - Memory used (MBUFs 1024 x (size 1984 +
Hdr 64)) + 395392 =   2435 KB

   Create: Special TX  1:0  - Memory used (MBUFs   64 x (size 1984 +
Hdr 64)) + 395392 =    515 KB



Port memory used =  10251 KB

Initialize Port 2 -- TxQ 1, RxQ 1,  Src MAC 90:e2:ba:4a:e6:1c

   Create: Default RX  2:0  - Memory used (MBUFs 1024 x (size 1984 +
Hdr 64)) + 395392 =   2435 KB


   Create: Default TX  2:0  - Memory used (MBUFs 1024 x (size 1984 +
Hdr 64)) + 395392 =   2435 KB

   Create: Range TX    2:0  - Memory used (MBUFs 1024 x (size 1984 +
Hdr 64)) + 395392 =   2435 KB

   Create: Sequence TX 2:0  - Memory used (MBUFs 1024 x (size 1984 +
Hdr 64)) + 395392 =   2435 KB

   Create: Special TX  2:0  - Memory used (MBUFs   64 x (size 1984 +
Hdr 64)) + 395392 =    515 KB



Port memory used =  10251 KB

Initialize Port 3 -- TxQ 1, RxQ 1,  Src MAC 90:e2:ba:4a:e6:1d

   Create: Default RX  3:0  - Memory used (MBUFs 1024 x (size 1984 +
Hdr 64)) + 395392 =   2435 KB


   Create: Default TX  3:0  - Memory used (MBUFs 1024 x (size 1984 +
Hdr 64)) + 395392 =   2435 KB

   Create: Range TX    3:0  - Memory used (MBUFs 1024 x (size 1984 +
Hdr 64)) + 395392 =   2435 KB

   Create: Sequence TX 3:0  - Memory used (MBUFs 1024 x (size 1984 +
Hdr 64)) + 395392 =   2435 KB

   Create: Special TX  3:0  - Memory used (MBUFs   64 x (size 1984 +
Hdr 64)) + 395392 =    515 KB



Port memory used =  10251 KB


Total memory used =  41003 KB

Port  0: Link Up - speed 10000 Mbps - full-duplex <Enable promiscuous mode>

Port  1: Link Up - speed 10000 Mbps - full-duplex <Enable promiscuous mode>

Port  2: Link Up - speed 10000 Mbps - full-duplex <Enable promiscuous mode>

Port  3: Link Up - speed 10000 Mbps - full-duplex <Enable promiscuous mode>


=== Display processing on lcore 0

=== RX processing on lcore  1, rxcnt 1, port/qid, 0/0

=== TX processing on lcore  2, txcnt 1, port/qid, 0/0

=== RX processing on lcore  3, rxcnt 1, port/qid, 1/0

=== TX processing on lcore  4, txcnt 1, port/qid, 1/0

=== RX processing on lcore  5, rxcnt 1, port/qid, 2/0

=== TX processing on lcore  6, txcnt 1, port/qid, 2/0

=== RX processing on lcore  7, rxcnt 1, port/qid, 3/0

=== TX processing on lcore  8, txcnt 1, port/qid, 3/0


Please, advise me if you have time.

Thank you always for your help!

Jinho

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [dpdk-dev] ways to generate 40Gbps with two NICs x two ports?
  2013-11-19 16:54     ` Wiles, Roger Keith
@ 2013-11-19 17:04       ` jinho hwang
  2013-11-19 17:24         ` Wiles, Roger Keith
  0 siblings, 1 reply; 12+ messages in thread
From: jinho hwang @ 2013-11-19 17:04 UTC (permalink / raw)
  To: Wiles, Roger Keith; +Cc: dev

On Tue, Nov 19, 2013 at 11:54 AM, Wiles, Roger Keith
<keith.wiles@windriver.com> wrote:
>
> BTW, the configuration looks fine, but you need to make sure the lcores are not split between two different CPU sockets. You can use the dpdk/tools/cpu_layout.py to do dump out the system configuration.
>
>
> Keith Wiles, Principal Technologist for Networking member of the CTO office, Wind River
> mobile 940.213.5533
>
>
> On Nov 19, 2013, at 10:42 AM, jinho hwang <hwang.jinho@gmail.com> wrote:
>
> On Tue, Nov 19, 2013 at 11:31 AM, Wiles, Roger Keith
> <keith.wiles@windriver.com> wrote:
>
> How do you have Pktgen configured in this case?
>
> On my westmere dual socket 3.4Ghz machine I can send 20G on a single NIC
> 82599x two ports. My machine has a PCIe bug that does not allow me to send
> on more then 3 ports at wire rate. I get close to 40G 64 byte packets, but
> the forth port does is about 70% of wire rate because of the PCIe hardware
> bottle neck problem.
>
> Keith Wiles, Principal Technologist for Networking member of the CTO office,
> Wind River
> direct 972.434.4136  mobile 940.213.5533  fax 000.000.0000
>
> On Nov 19, 2013, at 10:09 AM, jinho hwang <hwang.jinho@gmail.com> wrote:
>
> Hi All,
>
> I have two NICs (82599) x two ports that are used as packet generators. I
> want to generate full line-rate packets (40Gbps), but Pktgen-DPDK does not
> seem to be able to do it when two port in a NIC are used simultaneously.
> Does anyone know how to generate 40Gbps without replicating packets in the
> switch?
>
> Thank you,
>
> Jinho
>
>
>
> Hi Keith,
>
> Thank you for the e-mail. I am not sure how I figure out whether my
> PCIe also has any problems to prevent me from sending full line-rates.
> I use Intel(R) Xeon(R) CPU           E5649  @ 2.53GHz. It is hard for
> me to figure out where is the bottleneck.
>
> My configuration is:
>
> sudo ./app/build/pktgen -c 1ff -n 3 $BLACK-LIST -- -p 0xf0 -P -m
> "[1:2].0, [3:4].1, [5:6].2, [7:8].3" -f test/forward.lua
>
>
> === port to lcore mapping table (# lcores 9) ===
>
>   lcore:     0     1     2     3     4     5     6     7     8
>
> port   0:  D: T  1: 0  0: 1  0: 0  0: 0  0: 0  0: 0  0: 0  0: 0 =  1: 1
>
> port   1:  D: T  0: 0  0: 0  1: 0  0: 1  0: 0  0: 0  0: 0  0: 0 =  1: 1
>
> port   2:  D: T  0: 0  0: 0  0: 0  0: 0  1: 0  0: 1  0: 0  0: 0 =  1: 1
>
> port   3:  D: T  0: 0  0: 0  0: 0  0: 0  0: 0  0: 0  1: 0  0: 1 =  1: 1
>
> Total   :  0: 0  1: 0  0: 1  1: 0  0: 1  1: 0  0: 1  1: 0  0: 1
>
>    Display and Timer on lcore 0, rx:tx counts per port/lcore
>
>
> Configuring 4 ports, MBUF Size 1984, MBUF Cache Size 128
>
> Lcore:
>
>    1, type  RX , rx_cnt  1, tx_cnt  0 private (nil), RX (pid:qid): (
> 0: 0) , TX (pid:qid):
>
>    2, type  TX , rx_cnt  0, tx_cnt  1 private (nil), RX (pid:qid): ,
> TX (pid:qid): ( 0: 0)
>
>    3, type  RX , rx_cnt  1, tx_cnt  0 private (nil), RX (pid:qid): (
> 1: 0) , TX (pid:qid):
>
>    4, type  TX , rx_cnt  0, tx_cnt  1 private (nil), RX (pid:qid): ,
> TX (pid:qid): ( 1: 0)
>
>    5, type  RX , rx_cnt  1, tx_cnt  0 private (nil), RX (pid:qid): (
> 2: 0) , TX (pid:qid):
>
>    6, type  TX , rx_cnt  0, tx_cnt  1 private (nil), RX (pid:qid): ,
> TX (pid:qid): ( 2: 0)
>
>    7, type  RX , rx_cnt  1, tx_cnt  0 private (nil), RX (pid:qid): (
> 3: 0) , TX (pid:qid):
>
>    8, type  TX , rx_cnt  0, tx_cnt  1 private (nil), RX (pid:qid): ,
> TX (pid:qid): ( 3: 0)
>
>
> Port :
>
>    0, nb_lcores  2, private 0x6fd5a0, lcores:  1  2
>
>    1, nb_lcores  2, private 0x700208, lcores:  3  4
>
>    2, nb_lcores  2, private 0x702e70, lcores:  5  6
>
>    3, nb_lcores  2, private 0x705ad8, lcores:  7  8
>
>
>
> Initialize Port 0 -- TxQ 1, RxQ 1,  Src MAC 90:e2:ba:2f:f2:a4
>
>    Create: Default RX  0:0  - Memory used (MBUFs 1024 x (size 1984 +
> Hdr 64)) + 395392 =   2435 KB
>
>
>    Create: Default TX  0:0  - Memory used (MBUFs 1024 x (size 1984 +
> Hdr 64)) + 395392 =   2435 KB
>
>    Create: Range TX    0:0  - Memory used (MBUFs 1024 x (size 1984 +
> Hdr 64)) + 395392 =   2435 KB
>
>    Create: Sequence TX 0:0  - Memory used (MBUFs 1024 x (size 1984 +
> Hdr 64)) + 395392 =   2435 KB
>
>    Create: Special TX  0:0  - Memory used (MBUFs   64 x (size 1984 +
> Hdr 64)) + 395392 =    515 KB
>
>
>
> Port memory used =  10251 KB
>
> Initialize Port 1 -- TxQ 1, RxQ 1,  Src MAC 90:e2:ba:2f:f2:a5
>
>    Create: Default RX  1:0  - Memory used (MBUFs 1024 x (size 1984 +
> Hdr 64)) + 395392 =   2435 KB
>
>
>    Create: Default TX  1:0  - Memory used (MBUFs 1024 x (size 1984 +
> Hdr 64)) + 395392 =   2435 KB
>
>    Create: Range TX    1:0  - Memory used (MBUFs 1024 x (size 1984 +
> Hdr 64)) + 395392 =   2435 KB
>
>    Create: Sequence TX 1:0  - Memory used (MBUFs 1024 x (size 1984 +
> Hdr 64)) + 395392 =   2435 KB
>
>    Create: Special TX  1:0  - Memory used (MBUFs   64 x (size 1984 +
> Hdr 64)) + 395392 =    515 KB
>
>
>
> Port memory used =  10251 KB
>
> Initialize Port 2 -- TxQ 1, RxQ 1,  Src MAC 90:e2:ba:4a:e6:1c
>
>    Create: Default RX  2:0  - Memory used (MBUFs 1024 x (size 1984 +
> Hdr 64)) + 395392 =   2435 KB
>
>
>    Create: Default TX  2:0  - Memory used (MBUFs 1024 x (size 1984 +
> Hdr 64)) + 395392 =   2435 KB
>
>    Create: Range TX    2:0  - Memory used (MBUFs 1024 x (size 1984 +
> Hdr 64)) + 395392 =   2435 KB
>
>    Create: Sequence TX 2:0  - Memory used (MBUFs 1024 x (size 1984 +
> Hdr 64)) + 395392 =   2435 KB
>
>    Create: Special TX  2:0  - Memory used (MBUFs   64 x (size 1984 +
> Hdr 64)) + 395392 =    515 KB
>
>
>
> Port memory used =  10251 KB
>
> Initialize Port 3 -- TxQ 1, RxQ 1,  Src MAC 90:e2:ba:4a:e6:1d
>
>    Create: Default RX  3:0  - Memory used (MBUFs 1024 x (size 1984 +
> Hdr 64)) + 395392 =   2435 KB
>
>
>    Create: Default TX  3:0  - Memory used (MBUFs 1024 x (size 1984 +
> Hdr 64)) + 395392 =   2435 KB
>
>    Create: Range TX    3:0  - Memory used (MBUFs 1024 x (size 1984 +
> Hdr 64)) + 395392 =   2435 KB
>
>    Create: Sequence TX 3:0  - Memory used (MBUFs 1024 x (size 1984 +
> Hdr 64)) + 395392 =   2435 KB
>
>    Create: Special TX  3:0  - Memory used (MBUFs   64 x (size 1984 +
> Hdr 64)) + 395392 =    515 KB
>
>
>
> Port memory used =  10251 KB
>
>
> Total memory used =  41003 KB
>
> Port  0: Link Up - speed 10000 Mbps - full-duplex <Enable promiscuous mode>
>
> Port  1: Link Up - speed 10000 Mbps - full-duplex <Enable promiscuous mode>
>
> Port  2: Link Up - speed 10000 Mbps - full-duplex <Enable promiscuous mode>
>
> Port  3: Link Up - speed 10000 Mbps - full-duplex <Enable promiscuous mode>
>
>
> === Display processing on lcore 0
>
> === RX processing on lcore  1, rxcnt 1, port/qid, 0/0
>
> === TX processing on lcore  2, txcnt 1, port/qid, 0/0
>
> === RX processing on lcore  3, rxcnt 1, port/qid, 1/0
>
> === TX processing on lcore  4, txcnt 1, port/qid, 1/0
>
> === RX processing on lcore  5, rxcnt 1, port/qid, 2/0
>
> === TX processing on lcore  6, txcnt 1, port/qid, 2/0
>
> === RX processing on lcore  7, rxcnt 1, port/qid, 3/0
>
> === TX processing on lcore  8, txcnt 1, port/qid, 3/0
>
>
> Please, advise me if you have time.
>
> Thank you always for your help!
>
> Jinho
>
>

The phenomenon is that when I start one port on one NIC, it reaches
10Gbps. Also, when I start one port on each NIC, they achieve 10Gbps
each = 20Gbps. But when I start both ports on one NIC, each drops to
5.8Gbps. This persists however the cores are assigned---across sockets
or within the same socket. Since the size of the huge pages is fixed,
that should not be the problem. Should we say this is a limitation of
the NIC or of the bus? The reason I think this may be a hardware
limitation is that regardless of packet size, the two ports on one NIC
can each send at most 5.8Gbps.

Is there any way I can calculate the hardware limit?

Jinho

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [dpdk-dev] ways to generate 40Gbps with two NICs x two ports?
  2013-11-19 17:04       ` jinho hwang
@ 2013-11-19 17:24         ` Wiles, Roger Keith
  2013-11-19 17:35           ` jinho hwang
  0 siblings, 1 reply; 12+ messages in thread
From: Wiles, Roger Keith @ 2013-11-19 17:24 UTC (permalink / raw)
  To: jinho hwang; +Cc: dev

Normally when I see this problem it means the lcores are not mapped correctly. What can happen is that you have an RX and a TX on the same physical core, or two RX/TX pairs on the same physical core.

Make sure each RX or TX is running on its own core: look at the cpu_layout.py output and verify the configuration is correct. If you have 8 physical cores in the system, then you need to make sure only one of the lcores on each core is being used.
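
If it helps, here is a quick sanity check that no two of the lcores in your -m map land on the same physical core; it is only a sketch using the Linux sysfs topology files, and the lcore list is an example mirroring your "[1:2].0, [3:4].1, [5:6].2, [7:8].3" map:

# Sketch: warn if any two pktgen lcores share a physical core.
from collections import defaultdict

lcores = [1, 2, 3, 4, 5, 6, 7, 8]   # example list; edit to match your -m map
placement = defaultdict(list)
for lc in lcores:
    base = "/sys/devices/system/cpu/cpu%d/topology/" % lc
    sock = int(open(base + "physical_package_id").read())
    core = int(open(base + "core_id").read())
    placement[(sock, core)].append(lc)

for (sock, core), who in sorted(placement.items()):
    flag = "  <-- shared physical core!" if len(who) > 1 else ""
    print("socket %d core %2d: lcores %s%s" % (sock, core, who, flag))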

Let me know what happens.
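
As for putting a rough number on the hardware limit you asked about: a dual-port 82599 normally sits behind a single PCIe Gen2 x8 link, so a back-of-envelope sketch of the bus capacity looks like the one below. The protocol-efficiency factor is only an assumption, and the per-packet descriptor/doorbell traffic (which hurts most at 64-byte sizes) is not counted at all.

# Back-of-envelope PCIe capacity shared by both ports of a dual-port 82599.
# Assumption: one PCIe Gen2 x8 link; ignores descriptor and doorbell traffic.
LANES = 8
GT_PER_SEC = 5.0e9        # PCIe Gen2: 5 GT/s per lane
ENCODING = 8.0 / 10.0     # Gen2 uses 8b/10b line encoding
raw_gbps = LANES * GT_PER_SEC * ENCODING / 1e9   # ~32 Gbit/s per direction
EFFICIENCY = 0.80         # crude allowance for TLP/DLLP protocol overhead
print("raw payload capacity : %.0f Gbit/s per direction" % raw_gbps)
print("rough usable estimate: ~%.0f Gbit/s for the two 10G ports together"
      % (raw_gbps * EFFICIENCY))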

Keith Wiles, Principal Technologist for Networking member of the CTO office, Wind River
mobile 940.213.5533
[Powering 30 Years of Innovation]<http://www.windriver.com/announces/wr30/>

On Nov 19, 2013, at 11:04 AM, jinho hwang <hwang.jinho@gmail.com<mailto:hwang.jinho@gmail.com>> wrote:

On Tue, Nov 19, 2013 at 11:54 AM, Wiles, Roger Keith
<keith.wiles@windriver.com<mailto:keith.wiles@windriver.com>> wrote:

BTW, the configuration looks fine, but you need to make sure the lcores are not split between two different CPU sockets. You can use the dpdk/tools/cpu_layout.py to do dump out the system configuration.


Keith Wiles, Principal Technologist for Networking member of the CTO office, Wind River
mobile 940.213.5533


On Nov 19, 2013, at 10:42 AM, jinho hwang <hwang.jinho@gmail.com<mailto:hwang.jinho@gmail.com>> wrote:

On Tue, Nov 19, 2013 at 11:31 AM, Wiles, Roger Keith
<keith.wiles@windriver.com<mailto:keith.wiles@windriver.com>> wrote:

How do you have Pktgen configured in this case?

On my westmere dual socket 3.4Ghz machine I can send 20G on a single NIC
82599x two ports. My machine has a PCIe bug that does not allow me to send
on more then 3 ports at wire rate. I get close to 40G 64 byte packets, but
the forth port does is about 70% of wire rate because of the PCIe hardware
bottle neck problem.

Keith Wiles, Principal Technologist for Networking member of the CTO office,
Wind River
direct 972.434.4136  mobile 940.213.5533  fax 000.000.0000

On Nov 19, 2013, at 10:09 AM, jinho hwang <hwang.jinho@gmail.com<mailto:hwang.jinho@gmail.com>> wrote:

Hi All,

I have two NICs (82599) x two ports that are used as packet generators. I
want to generate full line-rate packets (40Gbps), but Pktgen-DPDK does not
seem to be able to do it when two port in a NIC are used simultaneously.
Does anyone know how to generate 40Gbps without replicating packets in the
switch?

Thank you,

Jinho



Hi Keith,

Thank you for the e-mail. I am not sure how I figure out whether my
PCIe also has any problems to prevent me from sending full line-rates.
I use Intel(R) Xeon(R) CPU           E5649  @ 2.53GHz. It is hard for
me to figure out where is the bottleneck.

My configuration is:

sudo ./app/build/pktgen -c 1ff -n 3 $BLACK-LIST -- -p 0xf0 -P -m
"[1:2].0, [3:4].1, [5:6].2, [7:8].3" -f test/forward.lua


=== port to lcore mapping table (# lcores 9) ===

 lcore:     0     1     2     3     4     5     6     7     8

port   0:  D: T  1: 0  0: 1  0: 0  0: 0  0: 0  0: 0  0: 0  0: 0 =  1: 1

port   1:  D: T  0: 0  0: 0  1: 0  0: 1  0: 0  0: 0  0: 0  0: 0 =  1: 1

port   2:  D: T  0: 0  0: 0  0: 0  0: 0  1: 0  0: 1  0: 0  0: 0 =  1: 1

port   3:  D: T  0: 0  0: 0  0: 0  0: 0  0: 0  0: 0  1: 0  0: 1 =  1: 1

Total   :  0: 0  1: 0  0: 1  1: 0  0: 1  1: 0  0: 1  1: 0  0: 1

  Display and Timer on lcore 0, rx:tx counts per port/lcore


Configuring 4 ports, MBUF Size 1984, MBUF Cache Size 128

Lcore:

  1, type  RX , rx_cnt  1, tx_cnt  0 private (nil), RX (pid:qid): (
0: 0) , TX (pid:qid):

  2, type  TX , rx_cnt  0, tx_cnt  1 private (nil), RX (pid:qid): ,
TX (pid:qid): ( 0: 0)

  3, type  RX , rx_cnt  1, tx_cnt  0 private (nil), RX (pid:qid): (
1: 0) , TX (pid:qid):

  4, type  TX , rx_cnt  0, tx_cnt  1 private (nil), RX (pid:qid): ,
TX (pid:qid): ( 1: 0)

  5, type  RX , rx_cnt  1, tx_cnt  0 private (nil), RX (pid:qid): (
2: 0) , TX (pid:qid):

  6, type  TX , rx_cnt  0, tx_cnt  1 private (nil), RX (pid:qid): ,
TX (pid:qid): ( 2: 0)

  7, type  RX , rx_cnt  1, tx_cnt  0 private (nil), RX (pid:qid): (
3: 0) , TX (pid:qid):

  8, type  TX , rx_cnt  0, tx_cnt  1 private (nil), RX (pid:qid): ,
TX (pid:qid): ( 3: 0)


Port :

  0, nb_lcores  2, private 0x6fd5a0, lcores:  1  2

  1, nb_lcores  2, private 0x700208, lcores:  3  4

  2, nb_lcores  2, private 0x702e70, lcores:  5  6

  3, nb_lcores  2, private 0x705ad8, lcores:  7  8



Initialize Port 0 -- TxQ 1, RxQ 1,  Src MAC 90:e2:ba:2f:f2:a4

  Create: Default RX  0:0  - Memory used (MBUFs 1024 x (size 1984 +
Hdr 64)) + 395392 =   2435 KB


  Create: Default TX  0:0  - Memory used (MBUFs 1024 x (size 1984 +
Hdr 64)) + 395392 =   2435 KB

  Create: Range TX    0:0  - Memory used (MBUFs 1024 x (size 1984 +
Hdr 64)) + 395392 =   2435 KB

  Create: Sequence TX 0:0  - Memory used (MBUFs 1024 x (size 1984 +
Hdr 64)) + 395392 =   2435 KB

  Create: Special TX  0:0  - Memory used (MBUFs   64 x (size 1984 +
Hdr 64)) + 395392 =    515 KB



Port memory used =  10251 KB

Initialize Port 1 -- TxQ 1, RxQ 1,  Src MAC 90:e2:ba:2f:f2:a5

  Create: Default RX  1:0  - Memory used (MBUFs 1024 x (size 1984 +
Hdr 64)) + 395392 =   2435 KB


  Create: Default TX  1:0  - Memory used (MBUFs 1024 x (size 1984 +
Hdr 64)) + 395392 =   2435 KB

  Create: Range TX    1:0  - Memory used (MBUFs 1024 x (size 1984 +
Hdr 64)) + 395392 =   2435 KB

  Create: Sequence TX 1:0  - Memory used (MBUFs 1024 x (size 1984 +
Hdr 64)) + 395392 =   2435 KB

  Create: Special TX  1:0  - Memory used (MBUFs   64 x (size 1984 +
Hdr 64)) + 395392 =    515 KB



Port memory used =  10251 KB

Initialize Port 2 -- TxQ 1, RxQ 1,  Src MAC 90:e2:ba:4a:e6:1c

  Create: Default RX  2:0  - Memory used (MBUFs 1024 x (size 1984 +
Hdr 64)) + 395392 =   2435 KB


  Create: Default TX  2:0  - Memory used (MBUFs 1024 x (size 1984 +
Hdr 64)) + 395392 =   2435 KB

  Create: Range TX    2:0  - Memory used (MBUFs 1024 x (size 1984 +
Hdr 64)) + 395392 =   2435 KB

  Create: Sequence TX 2:0  - Memory used (MBUFs 1024 x (size 1984 +
Hdr 64)) + 395392 =   2435 KB

  Create: Special TX  2:0  - Memory used (MBUFs   64 x (size 1984 +
Hdr 64)) + 395392 =    515 KB



Port memory used =  10251 KB

Initialize Port 3 -- TxQ 1, RxQ 1,  Src MAC 90:e2:ba:4a:e6:1d

  Create: Default RX  3:0  - Memory used (MBUFs 1024 x (size 1984 +
Hdr 64)) + 395392 =   2435 KB


  Create: Default TX  3:0  - Memory used (MBUFs 1024 x (size 1984 +
Hdr 64)) + 395392 =   2435 KB

  Create: Range TX    3:0  - Memory used (MBUFs 1024 x (size 1984 +
Hdr 64)) + 395392 =   2435 KB

  Create: Sequence TX 3:0  - Memory used (MBUFs 1024 x (size 1984 +
Hdr 64)) + 395392 =   2435 KB

  Create: Special TX  3:0  - Memory used (MBUFs   64 x (size 1984 +
Hdr 64)) + 395392 =    515 KB



Port memory used =  10251 KB


Total memory used =  41003 KB

Port  0: Link Up - speed 10000 Mbps - full-duplex <Enable promiscuous mode>

Port  1: Link Up - speed 10000 Mbps - full-duplex <Enable promiscuous mode>

Port  2: Link Up - speed 10000 Mbps - full-duplex <Enable promiscuous mode>

Port  3: Link Up - speed 10000 Mbps - full-duplex <Enable promiscuous mode>


=== Display processing on lcore 0

=== RX processing on lcore  1, rxcnt 1, port/qid, 0/0

=== TX processing on lcore  2, txcnt 1, port/qid, 0/0

=== RX processing on lcore  3, rxcnt 1, port/qid, 1/0

=== TX processing on lcore  4, txcnt 1, port/qid, 1/0

=== RX processing on lcore  5, rxcnt 1, port/qid, 2/0

=== TX processing on lcore  6, txcnt 1, port/qid, 2/0

=== RX processing on lcore  7, rxcnt 1, port/qid, 3/0

=== TX processing on lcore  8, txcnt 1, port/qid, 3/0


Please, advise me if you have time.

Thank you always for your help!

Jinho



The phenomenon is that when I start one port in one NIC, it reaches
10Gbps. Also, when I start one port per each NIC, they achieve 10Gbps
each = 20Gbps. But, when I start two port in one NIC, it becomes
5.8Gbps each. This is persistent when cores are assigned
differently---cross sockets and the same sockets. Since the size of
huge pages are fixed, it will not be a problem. Should we say this is
the limitation on NIC or bus? The reason I think this may be a hw
limitation is that regardless of packet sizes, two ports in one NIC
can only send 5.8Gbps maximum.

Do you have any way that I can calculate the hw limitation?

Jinho

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [dpdk-dev] ways to generate 40Gbps with two NICs x two ports?
  2013-11-19 17:24         ` Wiles, Roger Keith
@ 2013-11-19 17:35           ` jinho hwang
  2013-11-19 21:18             ` Wiles, Roger Keith
  0 siblings, 1 reply; 12+ messages in thread
From: jinho hwang @ 2013-11-19 17:35 UTC (permalink / raw)
  To: Wiles, Roger Keith; +Cc: dev

On Tue, Nov 19, 2013 at 12:24 PM, Wiles, Roger Keith
<keith.wiles@windriver.com> wrote:
> Normally when I see this problem it means the the lcores are not mapped
> correctly. What can happen is you have a Rx and a TX on the same physical
> core or two RX/TX on the same physical core.
>
> Make sure you have a Rx or Tx running on a single core look at the
> cpu_layout.py output and verify the configuration is correct. If you have 8
> physical cores in the then you need to make sure on one of the lcores on
> that core is being used.
>
> Let me know what happens.
>
> Keith Wiles, Principal Technologist for Networking member of the CTO office,
> Wind River
> mobile 940.213.5533
>
> On Nov 19, 2013, at 11:04 AM, jinho hwang <hwang.jinho@gmail.com> wrote:
>
> On Tue, Nov 19, 2013 at 11:54 AM, Wiles, Roger Keith
> <keith.wiles@windriver.com> wrote:
>
>
> BTW, the configuration looks fine, but you need to make sure the lcores are
> not split between two different CPU sockets. You can use the
> dpdk/tools/cpu_layout.py to do dump out the system configuration.
>
>
> Keith Wiles, Principal Technologist for Networking member of the CTO office,
> Wind River
> mobile 940.213.5533
>
>
> On Nov 19, 2013, at 10:42 AM, jinho hwang <hwang.jinho@gmail.com> wrote:
>
> On Tue, Nov 19, 2013 at 11:31 AM, Wiles, Roger Keith
> <keith.wiles@windriver.com> wrote:
>
> How do you have Pktgen configured in this case?
>
> On my westmere dual socket 3.4Ghz machine I can send 20G on a single NIC
> 82599x two ports. My machine has a PCIe bug that does not allow me to send
> on more then 3 ports at wire rate. I get close to 40G 64 byte packets, but
> the forth port does is about 70% of wire rate because of the PCIe hardware
> bottle neck problem.
>
> Keith Wiles, Principal Technologist for Networking member of the CTO office,
> Wind River
> direct 972.434.4136  mobile 940.213.5533  fax 000.000.0000
>
> On Nov 19, 2013, at 10:09 AM, jinho hwang <hwang.jinho@gmail.com> wrote:
>
> Hi All,
>
> I have two NICs (82599) x two ports that are used as packet generators. I
> want to generate full line-rate packets (40Gbps), but Pktgen-DPDK does not
> seem to be able to do it when two port in a NIC are used simultaneously.
> Does anyone know how to generate 40Gbps without replicating packets in the
> switch?
>
> Thank you,
>
> Jinho
>
>
>
> Hi Keith,
>
> Thank you for the e-mail. I am not sure how I figure out whether my
> PCIe also has any problems to prevent me from sending full line-rates.
> I use Intel(R) Xeon(R) CPU           E5649  @ 2.53GHz. It is hard for
> me to figure out where is the bottleneck.
>
> My configuration is:
>
> sudo ./app/build/pktgen -c 1ff -n 3 $BLACK-LIST -- -p 0xf0 -P -m
> "[1:2].0, [3:4].1, [5:6].2, [7:8].3" -f test/forward.lua
>
>
> === port to lcore mapping table (# lcores 9) ===
>
>  lcore:     0     1     2     3     4     5     6     7     8
>
> port   0:  D: T  1: 0  0: 1  0: 0  0: 0  0: 0  0: 0  0: 0  0: 0 =  1: 1
>
> port   1:  D: T  0: 0  0: 0  1: 0  0: 1  0: 0  0: 0  0: 0  0: 0 =  1: 1
>
> port   2:  D: T  0: 0  0: 0  0: 0  0: 0  1: 0  0: 1  0: 0  0: 0 =  1: 1
>
> port   3:  D: T  0: 0  0: 0  0: 0  0: 0  0: 0  0: 0  1: 0  0: 1 =  1: 1
>
> Total   :  0: 0  1: 0  0: 1  1: 0  0: 1  1: 0  0: 1  1: 0  0: 1
>
>   Display and Timer on lcore 0, rx:tx counts per port/lcore
>
>
> Configuring 4 ports, MBUF Size 1984, MBUF Cache Size 128
>
> Lcore:
>
>   1, type  RX , rx_cnt  1, tx_cnt  0 private (nil), RX (pid:qid): (
> 0: 0) , TX (pid:qid):
>
>   2, type  TX , rx_cnt  0, tx_cnt  1 private (nil), RX (pid:qid): ,
> TX (pid:qid): ( 0: 0)
>
>   3, type  RX , rx_cnt  1, tx_cnt  0 private (nil), RX (pid:qid): (
> 1: 0) , TX (pid:qid):
>
>   4, type  TX , rx_cnt  0, tx_cnt  1 private (nil), RX (pid:qid): ,
> TX (pid:qid): ( 1: 0)
>
>   5, type  RX , rx_cnt  1, tx_cnt  0 private (nil), RX (pid:qid): (
> 2: 0) , TX (pid:qid):
>
>   6, type  TX , rx_cnt  0, tx_cnt  1 private (nil), RX (pid:qid): ,
> TX (pid:qid): ( 2: 0)
>
>   7, type  RX , rx_cnt  1, tx_cnt  0 private (nil), RX (pid:qid): (
> 3: 0) , TX (pid:qid):
>
>   8, type  TX , rx_cnt  0, tx_cnt  1 private (nil), RX (pid:qid): ,
> TX (pid:qid): ( 3: 0)
>
>
> Port :
>
>   0, nb_lcores  2, private 0x6fd5a0, lcores:  1  2
>
>   1, nb_lcores  2, private 0x700208, lcores:  3  4
>
>   2, nb_lcores  2, private 0x702e70, lcores:  5  6
>
>   3, nb_lcores  2, private 0x705ad8, lcores:  7  8
>
>
>
> Initialize Port 0 -- TxQ 1, RxQ 1,  Src MAC 90:e2:ba:2f:f2:a4
>
>   Create: Default RX  0:0  - Memory used (MBUFs 1024 x (size 1984 +
> Hdr 64)) + 395392 =   2435 KB
>
>
>   Create: Default TX  0:0  - Memory used (MBUFs 1024 x (size 1984 +
> Hdr 64)) + 395392 =   2435 KB
>
>   Create: Range TX    0:0  - Memory used (MBUFs 1024 x (size 1984 +
> Hdr 64)) + 395392 =   2435 KB
>
>   Create: Sequence TX 0:0  - Memory used (MBUFs 1024 x (size 1984 +
> Hdr 64)) + 395392 =   2435 KB
>
>   Create: Special TX  0:0  - Memory used (MBUFs   64 x (size 1984 +
> Hdr 64)) + 395392 =    515 KB
>
>
>
> Port memory used =  10251 KB
>
> Initialize Port 1 -- TxQ 1, RxQ 1,  Src MAC 90:e2:ba:2f:f2:a5
>
>   Create: Default RX  1:0  - Memory used (MBUFs 1024 x (size 1984 +
> Hdr 64)) + 395392 =   2435 KB
>
>
>   Create: Default TX  1:0  - Memory used (MBUFs 1024 x (size 1984 +
> Hdr 64)) + 395392 =   2435 KB
>
>   Create: Range TX    1:0  - Memory used (MBUFs 1024 x (size 1984 +
> Hdr 64)) + 395392 =   2435 KB
>
>   Create: Sequence TX 1:0  - Memory used (MBUFs 1024 x (size 1984 +
> Hdr 64)) + 395392 =   2435 KB
>
>   Create: Special TX  1:0  - Memory used (MBUFs   64 x (size 1984 +
> Hdr 64)) + 395392 =    515 KB
>
>
>
> Port memory used =  10251 KB
>
> Initialize Port 2 -- TxQ 1, RxQ 1,  Src MAC 90:e2:ba:4a:e6:1c
>
>   Create: Default RX  2:0  - Memory used (MBUFs 1024 x (size 1984 +
> Hdr 64)) + 395392 =   2435 KB
>
>
>   Create: Default TX  2:0  - Memory used (MBUFs 1024 x (size 1984 +
> Hdr 64)) + 395392 =   2435 KB
>
>   Create: Range TX    2:0  - Memory used (MBUFs 1024 x (size 1984 +
> Hdr 64)) + 395392 =   2435 KB
>
>   Create: Sequence TX 2:0  - Memory used (MBUFs 1024 x (size 1984 +
> Hdr 64)) + 395392 =   2435 KB
>
>   Create: Special TX  2:0  - Memory used (MBUFs   64 x (size 1984 +
> Hdr 64)) + 395392 =    515 KB
>
>
>
> Port memory used =  10251 KB
>
> Initialize Port 3 -- TxQ 1, RxQ 1,  Src MAC 90:e2:ba:4a:e6:1d
>
>   Create: Default RX  3:0  - Memory used (MBUFs 1024 x (size 1984 +
> Hdr 64)) + 395392 =   2435 KB
>
>
>   Create: Default TX  3:0  - Memory used (MBUFs 1024 x (size 1984 +
> Hdr 64)) + 395392 =   2435 KB
>
>   Create: Range TX    3:0  - Memory used (MBUFs 1024 x (size 1984 +
> Hdr 64)) + 395392 =   2435 KB
>
>   Create: Sequence TX 3:0  - Memory used (MBUFs 1024 x (size 1984 +
> Hdr 64)) + 395392 =   2435 KB
>
>   Create: Special TX  3:0  - Memory used (MBUFs   64 x (size 1984 +
> Hdr 64)) + 395392 =    515 KB
>
>
>
> Port memory used =  10251 KB
>
>
> Total memory used =  41003 KB
>
> Port  0: Link Up - speed 10000 Mbps - full-duplex <Enable promiscuous mode>
>
> Port  1: Link Up - speed 10000 Mbps - full-duplex <Enable promiscuous mode>
>
> Port  2: Link Up - speed 10000 Mbps - full-duplex <Enable promiscuous mode>
>
> Port  3: Link Up - speed 10000 Mbps - full-duplex <Enable promiscuous mode>
>
>
> === Display processing on lcore 0
>
> === RX processing on lcore  1, rxcnt 1, port/qid, 0/0
>
> === TX processing on lcore  2, txcnt 1, port/qid, 0/0
>
> === RX processing on lcore  3, rxcnt 1, port/qid, 1/0
>
> === TX processing on lcore  4, txcnt 1, port/qid, 1/0
>
> === RX processing on lcore  5, rxcnt 1, port/qid, 2/0
>
> === TX processing on lcore  6, txcnt 1, port/qid, 2/0
>
> === RX processing on lcore  7, rxcnt 1, port/qid, 3/0
>
> === TX processing on lcore  8, txcnt 1, port/qid, 3/0
>
>
> Please, advise me if you have time.
>
> Thank you always for your help!
>
> Jinho
>
>
>
> The phenomenon is that when I start one port in one NIC, it reaches
> 10Gbps. Also, when I start one port per each NIC, they achieve 10Gbps
> each = 20Gbps. But, when I start two port in one NIC, it becomes
> 5.8Gbps each. This is persistent when cores are assigned
> differently---cross sockets and the same sockets. Since the size of
> huge pages are fixed, it will not be a problem. Should we say this is
> the limitation on NIC or bus? The reason I think this may be a hw
> limitation is that regardless of packet sizes, two ports in one NIC
> can only send 5.8Gbps maximum.
>
> Do you have any way that I can calculate the hw limitation?
>
> Jinho
>
>

My cpu configuration is as follows:

============================================================

Core and Socket Information (as reported by '/proc/cpuinfo')

============================================================
cores =  [0, 1, 2, 8, 9, 10]
sockets =  [1, 0]
        Socket 1        Socket 0
        ---------       ---------
Core 0  [0, 12]         [1, 13]
Core 1  [2, 14]         [3, 15]
Core 2  [4, 16]         [5, 17]
Core 8  [6, 18]         [7, 19]
Core 9  [8, 20]         [9, 21]
Core 10         [10, 22]        [11, 23]

When I use just two ports for testing, I use this configuration.

sudo ./app/build/pktgen -c 1ff -n 3 $BLACK_LIST -- -p 0x30 -P -m
"[2:4].0, [6:8].1" -f test/forward.lua

As you can see from the core numbers, 2, 4, 6, and 8 are all on
different physical cores and are assigned separately. So I am not sure
the problem comes from the core configuration. Do you have any other
thoughts we may try?

Thanks,

Jinho

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [dpdk-dev] ways to generate 40Gbps with two NICs x two ports?
  2013-11-19 17:35           ` jinho hwang
@ 2013-11-19 21:18             ` Wiles, Roger Keith
  2013-11-19 21:33               ` jinho hwang
  0 siblings, 1 reply; 12+ messages in thread
From: Wiles, Roger Keith @ 2013-11-19 21:18 UTC (permalink / raw)
  To: jinho hwang; +Cc: dev

Give this a try; if that does not work, then something else is going on here. I am trying to make sure we do not cross the QPI for any reason by keeping the RX/TX queues related to a port on the same socket.

sudo ./app/build/pktgen -c 3ff -n 3 $BLACK-LIST -- -p 0xf0 -P -m
"[2:4].0, [6:8].1, [3:5].2, [7:9].3" -f test/forward.lua

sudo ./app/build/pktgen -c 1ff -n 3 $BLACK-LIST -- -p 0xf0 -P -m
"[1:2].0, [3:4].1, [5:6].2, [7:8].3" -f test/forward.lua

cores =  [0, 1, 2, 8, 9, 10]
sockets =  [1, 0]
       Socket 1        Socket 0
       ---------       ---------
Core 0  [0, 12]         [1, 13]
Core 1  [2, 14]         [3, 15]
Core 2  [4, 16]         [5, 17]
Core 8  [6, 18]         [7, 19]
Core 9  [8, 20]         [9, 21]
Core 10         [10, 22]        [11, 23]

Keith Wiles, Principal Technologist for Networking member of the CTO office, Wind River
mobile 940.213.5533
[Powering 30 Years of Innovation]<http://www.windriver.com/announces/wr30/>

On Nov 19, 2013, at 11:35 AM, jinho hwang <hwang.jinho@gmail.com<mailto:hwang.jinho@gmail.com>> wrote:

On Tue, Nov 19, 2013 at 12:24 PM, Wiles, Roger Keith
<keith.wiles@windriver.com<mailto:keith.wiles@windriver.com>> wrote:
Normally when I see this problem it means the the lcores are not mapped
correctly. What can happen is you have a Rx and a TX on the same physical
core or two RX/TX on the same physical core.

Make sure you have a Rx or Tx running on a single core look at the
cpu_layout.py output and verify the configuration is correct. If you have 8
physical cores in the then you need to make sure on one of the lcores on
that core is being used.

Let me know what happens.

Keith Wiles, Principal Technologist for Networking member of the CTO office,
Wind River
mobile 940.213.5533

On Nov 19, 2013, at 11:04 AM, jinho hwang <hwang.jinho@gmail.com<mailto:hwang.jinho@gmail.com>> wrote:

On Tue, Nov 19, 2013 at 11:54 AM, Wiles, Roger Keith
<keith.wiles@windriver.com<mailto:keith.wiles@windriver.com>> wrote:


BTW, the configuration looks fine, but you need to make sure the lcores are
not split between two different CPU sockets. You can use the
dpdk/tools/cpu_layout.py to do dump out the system configuration.


Keith Wiles, Principal Technologist for Networking member of the CTO office,
Wind River
mobile 940.213.5533


On Nov 19, 2013, at 10:42 AM, jinho hwang <hwang.jinho@gmail.com<mailto:hwang.jinho@gmail.com>> wrote:

On Tue, Nov 19, 2013 at 11:31 AM, Wiles, Roger Keith
<keith.wiles@windriver.com<mailto:keith.wiles@windriver.com>> wrote:

How do you have Pktgen configured in this case?

On my westmere dual socket 3.4Ghz machine I can send 20G on a single NIC
82599x two ports. My machine has a PCIe bug that does not allow me to send
on more then 3 ports at wire rate. I get close to 40G 64 byte packets, but
the forth port does is about 70% of wire rate because of the PCIe hardware
bottle neck problem.

Keith Wiles, Principal Technologist for Networking member of the CTO office,
Wind River
direct 972.434.4136  mobile 940.213.5533  fax 000.000.0000

On Nov 19, 2013, at 10:09 AM, jinho hwang <hwang.jinho@gmail.com<mailto:hwang.jinho@gmail.com>> wrote:

Hi All,

I have two NICs (82599) x two ports that are used as packet generators. I
want to generate full line-rate packets (40Gbps), but Pktgen-DPDK does not
seem to be able to do it when two port in a NIC are used simultaneously.
Does anyone know how to generate 40Gbps without replicating packets in the
switch?

Thank you,

Jinho



Hi Keith,

Thank you for the e-mail. I am not sure how I figure out whether my
PCIe also has any problems to prevent me from sending full line-rates.
I use Intel(R) Xeon(R) CPU           E5649  @ 2.53GHz. It is hard for
me to figure out where is the bottleneck.

My configuration is:

sudo ./app/build/pktgen -c 1ff -n 3 $BLACK-LIST -- -p 0xf0 -P -m
"[1:2].0, [3:4].1, [5:6].2, [7:8].3" -f test/forward.lua


=== port to lcore mapping table (# lcores 9) ===

lcore:     0     1     2     3     4     5     6     7     8

port   0:  D: T  1: 0  0: 1  0: 0  0: 0  0: 0  0: 0  0: 0  0: 0 =  1: 1

port   1:  D: T  0: 0  0: 0  1: 0  0: 1  0: 0  0: 0  0: 0  0: 0 =  1: 1

port   2:  D: T  0: 0  0: 0  0: 0  0: 0  1: 0  0: 1  0: 0  0: 0 =  1: 1

port   3:  D: T  0: 0  0: 0  0: 0  0: 0  0: 0  0: 0  1: 0  0: 1 =  1: 1

Total   :  0: 0  1: 0  0: 1  1: 0  0: 1  1: 0  0: 1  1: 0  0: 1

 Display and Timer on lcore 0, rx:tx counts per port/lcore


Configuring 4 ports, MBUF Size 1984, MBUF Cache Size 128

Lcore:

 1, type  RX , rx_cnt  1, tx_cnt  0 private (nil), RX (pid:qid): (
0: 0) , TX (pid:qid):

 2, type  TX , rx_cnt  0, tx_cnt  1 private (nil), RX (pid:qid): ,
TX (pid:qid): ( 0: 0)

 3, type  RX , rx_cnt  1, tx_cnt  0 private (nil), RX (pid:qid): (
1: 0) , TX (pid:qid):

 4, type  TX , rx_cnt  0, tx_cnt  1 private (nil), RX (pid:qid): ,
TX (pid:qid): ( 1: 0)

 5, type  RX , rx_cnt  1, tx_cnt  0 private (nil), RX (pid:qid): (
2: 0) , TX (pid:qid):

 6, type  TX , rx_cnt  0, tx_cnt  1 private (nil), RX (pid:qid): ,
TX (pid:qid): ( 2: 0)

 7, type  RX , rx_cnt  1, tx_cnt  0 private (nil), RX (pid:qid): (
3: 0) , TX (pid:qid):

 8, type  TX , rx_cnt  0, tx_cnt  1 private (nil), RX (pid:qid): ,
TX (pid:qid): ( 3: 0)


Port :

 0, nb_lcores  2, private 0x6fd5a0, lcores:  1  2

 1, nb_lcores  2, private 0x700208, lcores:  3  4

 2, nb_lcores  2, private 0x702e70, lcores:  5  6

 3, nb_lcores  2, private 0x705ad8, lcores:  7  8



Initialize Port 0 -- TxQ 1, RxQ 1,  Src MAC 90:e2:ba:2f:f2:a4

 Create: Default RX  0:0  - Memory used (MBUFs 1024 x (size 1984 +
Hdr 64)) + 395392 =   2435 KB


 Create: Default TX  0:0  - Memory used (MBUFs 1024 x (size 1984 +
Hdr 64)) + 395392 =   2435 KB

 Create: Range TX    0:0  - Memory used (MBUFs 1024 x (size 1984 +
Hdr 64)) + 395392 =   2435 KB

 Create: Sequence TX 0:0  - Memory used (MBUFs 1024 x (size 1984 +
Hdr 64)) + 395392 =   2435 KB

 Create: Special TX  0:0  - Memory used (MBUFs   64 x (size 1984 +
Hdr 64)) + 395392 =    515 KB



Port memory used =  10251 KB

Initialize Port 1 -- TxQ 1, RxQ 1,  Src MAC 90:e2:ba:2f:f2:a5

 Create: Default RX  1:0  - Memory used (MBUFs 1024 x (size 1984 +
Hdr 64)) + 395392 =   2435 KB


 Create: Default TX  1:0  - Memory used (MBUFs 1024 x (size 1984 +
Hdr 64)) + 395392 =   2435 KB

 Create: Range TX    1:0  - Memory used (MBUFs 1024 x (size 1984 +
Hdr 64)) + 395392 =   2435 KB

 Create: Sequence TX 1:0  - Memory used (MBUFs 1024 x (size 1984 +
Hdr 64)) + 395392 =   2435 KB

 Create: Special TX  1:0  - Memory used (MBUFs   64 x (size 1984 +
Hdr 64)) + 395392 =    515 KB



Port memory used =  10251 KB

Initialize Port 2 -- TxQ 1, RxQ 1,  Src MAC 90:e2:ba:4a:e6:1c

 Create: Default RX  2:0  - Memory used (MBUFs 1024 x (size 1984 +
Hdr 64)) + 395392 =   2435 KB


 Create: Default TX  2:0  - Memory used (MBUFs 1024 x (size 1984 +
Hdr 64)) + 395392 =   2435 KB

 Create: Range TX    2:0  - Memory used (MBUFs 1024 x (size 1984 +
Hdr 64)) + 395392 =   2435 KB

 Create: Sequence TX 2:0  - Memory used (MBUFs 1024 x (size 1984 +
Hdr 64)) + 395392 =   2435 KB

 Create: Special TX  2:0  - Memory used (MBUFs   64 x (size 1984 +
Hdr 64)) + 395392 =    515 KB



Port memory used =  10251 KB

Initialize Port 3 -- TxQ 1, RxQ 1,  Src MAC 90:e2:ba:4a:e6:1d

 Create: Default RX  3:0  - Memory used (MBUFs 1024 x (size 1984 +
Hdr 64)) + 395392 =   2435 KB


 Create: Default TX  3:0  - Memory used (MBUFs 1024 x (size 1984 +
Hdr 64)) + 395392 =   2435 KB

 Create: Range TX    3:0  - Memory used (MBUFs 1024 x (size 1984 +
Hdr 64)) + 395392 =   2435 KB

 Create: Sequence TX 3:0  - Memory used (MBUFs 1024 x (size 1984 +
Hdr 64)) + 395392 =   2435 KB

 Create: Special TX  3:0  - Memory used (MBUFs   64 x (size 1984 +
Hdr 64)) + 395392 =    515 KB



Port memory used =  10251 KB


Total memory used =  41003 KB

Port  0: Link Up - speed 10000 Mbps - full-duplex <Enable promiscuous mode>

Port  1: Link Up - speed 10000 Mbps - full-duplex <Enable promiscuous mode>

Port  2: Link Up - speed 10000 Mbps - full-duplex <Enable promiscuous mode>

Port  3: Link Up - speed 10000 Mbps - full-duplex <Enable promiscuous mode>


=== Display processing on lcore 0

=== RX processing on lcore  1, rxcnt 1, port/qid, 0/0

=== TX processing on lcore  2, txcnt 1, port/qid, 0/0

=== RX processing on lcore  3, rxcnt 1, port/qid, 1/0

=== TX processing on lcore  4, txcnt 1, port/qid, 1/0

=== RX processing on lcore  5, rxcnt 1, port/qid, 2/0

=== TX processing on lcore  6, txcnt 1, port/qid, 2/0

=== RX processing on lcore  7, rxcnt 1, port/qid, 3/0

=== TX processing on lcore  8, txcnt 1, port/qid, 3/0


Please advise me if you have time.

Thank you always for your help!

Jinho



The phenomenon is that when I start one port on one NIC, it reaches
10Gbps. Also, when I start one port on each NIC, they achieve 10Gbps
each = 20Gbps. But when I start two ports on one NIC, each port drops
to 5.8Gbps. This persists however the cores are assigned---across
sockets and within the same socket. Since the size of the huge pages
is fixed, that should not be the problem. Should we say this is a
limitation of the NIC or of the bus? The reason I think this may be a
hw limitation is that, regardless of packet size, two ports on one NIC
can only send 5.8Gbps each at most.

Is there any way I can calculate the hardware limitation?
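
A back-of-the-envelope sketch (exact overheads vary by platform, so
these are only rough bounds):

64B frame on the wire   = 64 + 8 preamble/SFD + 12 IFG = 84 bytes
10G line rate at 64B    = 10e9 / (84 * 8) ~= 14.88 Mpps per port
frame data over PCIe    ~= 2 ports * 14.88 Mpps * 64B ~= 15.2 Gbps
                           (plus per-packet descriptor reads/write-backs)
PCIe Gen2 x8            = 5GT/s * 8 lanes * 8b/10b = 32Gbps raw,
                           roughly 26Gbps of usable payload
PCIe Gen2 x4 / Gen1 x8  ~= 16Gbps raw, roughly 13Gbps usable

Comparing the payload the two ports need against the usable PCIe
number for the slot the card sits in gives a first-order bound on what
one dual-port card can be expected to sustain.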

Jinho



My cpu configuration is as follows:

============================================================

Core and Socket Information (as reported by '/proc/cpuinfo')

============================================================
cores =  [0, 1, 2, 8, 9, 10]
sockets =  [1, 0]
       Socket 1        Socket 0
       ---------       ---------
Core 0  [0, 12]         [1, 13]
Core 1  [2, 14]         [3, 15]
Core 2  [4, 16]         [5, 17]
Core 8  [6, 18]         [7, 19]
Core 9  [8, 20]         [9, 21]
Core 10         [10, 22]        [11, 23]

When I use just two ports for testing, I use this configuration.

sudo ./app/build/pktgen -c 1ff -n 3 $BLACK_LIST -- -p 0x30 -P -m
"[2:4].0, [6:8].1" -f test/forward.lua

As you can see from the core numbers, 2, 4, 6, and 8 are all on
different physical cores and are assigned separately. So I am not sure
this comes from the core configuration. Do you have any other thoughts
on what we may try?
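
It may also be worth checking which socket each NIC hangs off of, so
the chosen lcores sit on the same node as the device (the PCI address
below is only a placeholder):

cat /sys/bus/pci/devices/0000:03:00.0/numa_node

A result of 0 or 1 names the socket; -1 means the kernel does not
report it. If the two cards sit on different sockets, the lcore map
should keep each port's RX/TX cores on that card's socket.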

Thanks,

Jinho

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [dpdk-dev] ways to generate 40Gbps with two NICs x two ports?
  2013-11-19 21:18             ` Wiles, Roger Keith
@ 2013-11-19 21:33               ` jinho hwang
  2013-11-19 21:38                 ` Wiles, Roger Keith
  0 siblings, 1 reply; 12+ messages in thread
From: jinho hwang @ 2013-11-19 21:33 UTC (permalink / raw)
  To: Wiles, Roger Keith; +Cc: dev

On Tue, Nov 19, 2013 at 4:18 PM, Wiles, Roger Keith
<keith.wiles@windriver.com> wrote:
> Give this a try, if that does not work then something else is going on here.
> I am trying to make sure we do not cross the QPI for any reason putting the
> RX/TX queues related to a port on the same core.
>
> sudo ./app/build/pktgen -c 3ff -n 3 $BLACK-LIST -- -p 0xf0 -P -m
> "[2:4].0, [6:8].1, [3:5].2, [7:9].3" -f test/forward.lua
>
> sudo ./app/build/pktgen -c 1ff -n 3 $BLACK-LIST -- -p 0xf0 -P -m
> "[1:2].0, [3:4].1, [5:6].2, [7:8].3" -f test/forward.lua
>
>
> cores =  [0, 1, 2, 8, 9, 10]
> sockets =  [1, 0]
>        Socket 1        Socket 0
>        ---------       ---------
> Core 0  [0, 12]         [1, 13]
> Core 1  [2, 14]         [3, 15]
> Core 2  [4, 16]         [5, 17]
> Core 8  [6, 18]         [7, 19]
> Core 9  [8, 20]         [9, 21]
> Core 10         [10, 22]        [11, 23]
>
>
> Keith Wiles, Principal Technologist for Networking member of the CTO office,
> Wind River
> mobile 940.213.5533
>
> On Nov 19, 2013, at 11:35 AM, jinho hwang <hwang.jinho@gmail.com> wrote:
>
> On Tue, Nov 19, 2013 at 12:24 PM, Wiles, Roger Keith
> <keith.wiles@windriver.com> wrote:
>
> Normally when I see this problem it means the the lcores are not mapped
> correctly. What can happen is you have a Rx and a TX on the same physical
> core or two RX/TX on the same physical core.
>
> Make sure you have a Rx or Tx running on a single core look at the
> cpu_layout.py output and verify the configuration is correct. If you have 8
> physical cores in the then you need to make sure on one of the lcores on
> that core is being used.
>
> Let me know what happens.
>
> Keith Wiles, Principal Technologist for Networking member of the CTO office,
> Wind River
> mobile 940.213.5533
>
> On Nov 19, 2013, at 11:04 AM, jinho hwang <hwang.jinho@gmail.com> wrote:
>
> On Tue, Nov 19, 2013 at 11:54 AM, Wiles, Roger Keith
> <keith.wiles@windriver.com> wrote:
>
>
> BTW, the configuration looks fine, but you need to make sure the lcores are
> not split between two different CPU sockets. You can use the
> dpdk/tools/cpu_layout.py to do dump out the system configuration.
>
>
> Keith Wiles, Principal Technologist for Networking member of the CTO office,
> Wind River
> mobile 940.213.5533
>
>
> On Nov 19, 2013, at 10:42 AM, jinho hwang <hwang.jinho@gmail.com> wrote:
>
> On Tue, Nov 19, 2013 at 11:31 AM, Wiles, Roger Keith
> <keith.wiles@windriver.com> wrote:
>
> How do you have Pktgen configured in this case?
>
> On my westmere dual socket 3.4Ghz machine I can send 20G on a single NIC
> 82599x two ports. My machine has a PCIe bug that does not allow me to send
> on more then 3 ports at wire rate. I get close to 40G 64 byte packets, but
> the forth port does is about 70% of wire rate because of the PCIe hardware
> bottle neck problem.
>
> Keith Wiles, Principal Technologist for Networking member of the CTO office,
> Wind River
> direct 972.434.4136  mobile 940.213.5533  fax 000.000.0000
>
> On Nov 19, 2013, at 10:09 AM, jinho hwang <hwang.jinho@gmail.com> wrote:
>
> Hi All,
>
> I have two NICs (82599) x two ports that are used as packet generators. I
> want to generate full line-rate packets (40Gbps), but Pktgen-DPDK does not
> seem to be able to do it when two port in a NIC are used simultaneously.
> Does anyone know how to generate 40Gbps without replicating packets in the
> switch?
>
> Thank you,
>
> Jinho
>
>
>
> Hi Keith,
>
> Thank you for the e-mail. I am not sure how I figure out whether my
> PCIe also has any problems to prevent me from sending full line-rates.
> I use Intel(R) Xeon(R) CPU           E5649  @ 2.53GHz. It is hard for
> me to figure out where is the bottleneck.
>
> My configuration is:
>
> sudo ./app/build/pktgen -c 1ff -n 3 $BLACK-LIST -- -p 0xf0 -P -m
> "[1:2].0, [3:4].1, [5:6].2, [7:8].3" -f test/forward.lua
>
>
> === port to lcore mapping table (# lcores 9) ===
>
> lcore:     0     1     2     3     4     5     6     7     8
>
> port   0:  D: T  1: 0  0: 1  0: 0  0: 0  0: 0  0: 0  0: 0  0: 0 =  1: 1
>
> port   1:  D: T  0: 0  0: 0  1: 0  0: 1  0: 0  0: 0  0: 0  0: 0 =  1: 1
>
> port   2:  D: T  0: 0  0: 0  0: 0  0: 0  1: 0  0: 1  0: 0  0: 0 =  1: 1
>
> port   3:  D: T  0: 0  0: 0  0: 0  0: 0  0: 0  0: 0  1: 0  0: 1 =  1: 1
>
> Total   :  0: 0  1: 0  0: 1  1: 0  0: 1  1: 0  0: 1  1: 0  0: 1
>
>  Display and Timer on lcore 0, rx:tx counts per port/lcore
>
>
> Configuring 4 ports, MBUF Size 1984, MBUF Cache Size 128
>
> Lcore:
>
>  1, type  RX , rx_cnt  1, tx_cnt  0 private (nil), RX (pid:qid): (
> 0: 0) , TX (pid:qid):
>
>  2, type  TX , rx_cnt  0, tx_cnt  1 private (nil), RX (pid:qid): ,
> TX (pid:qid): ( 0: 0)
>
>  3, type  RX , rx_cnt  1, tx_cnt  0 private (nil), RX (pid:qid): (
> 1: 0) , TX (pid:qid):
>
>  4, type  TX , rx_cnt  0, tx_cnt  1 private (nil), RX (pid:qid): ,
> TX (pid:qid): ( 1: 0)
>
>  5, type  RX , rx_cnt  1, tx_cnt  0 private (nil), RX (pid:qid): (
> 2: 0) , TX (pid:qid):
>
>  6, type  TX , rx_cnt  0, tx_cnt  1 private (nil), RX (pid:qid): ,
> TX (pid:qid): ( 2: 0)
>
>  7, type  RX , rx_cnt  1, tx_cnt  0 private (nil), RX (pid:qid): (
> 3: 0) , TX (pid:qid):
>
>  8, type  TX , rx_cnt  0, tx_cnt  1 private (nil), RX (pid:qid): ,
> TX (pid:qid): ( 3: 0)
>
>
> Port :
>
>  0, nb_lcores  2, private 0x6fd5a0, lcores:  1  2
>
>  1, nb_lcores  2, private 0x700208, lcores:  3  4
>
>  2, nb_lcores  2, private 0x702e70, lcores:  5  6
>
>  3, nb_lcores  2, private 0x705ad8, lcores:  7  8
>
>
>
> Initialize Port 0 -- TxQ 1, RxQ 1,  Src MAC 90:e2:ba:2f:f2:a4
>
>  Create: Default RX  0:0  - Memory used (MBUFs 1024 x (size 1984 +
> Hdr 64)) + 395392 =   2435 KB
>
>
>  Create: Default TX  0:0  - Memory used (MBUFs 1024 x (size 1984 +
> Hdr 64)) + 395392 =   2435 KB
>
>  Create: Range TX    0:0  - Memory used (MBUFs 1024 x (size 1984 +
> Hdr 64)) + 395392 =   2435 KB
>
>  Create: Sequence TX 0:0  - Memory used (MBUFs 1024 x (size 1984 +
> Hdr 64)) + 395392 =   2435 KB
>
>  Create: Special TX  0:0  - Memory used (MBUFs   64 x (size 1984 +
> Hdr 64)) + 395392 =    515 KB
>
>
>
> Port memory used =  10251 KB
>
> Initialize Port 1 -- TxQ 1, RxQ 1,  Src MAC 90:e2:ba:2f:f2:a5
>
>  Create: Default RX  1:0  - Memory used (MBUFs 1024 x (size 1984 +
> Hdr 64)) + 395392 =   2435 KB
>
>
>  Create: Default TX  1:0  - Memory used (MBUFs 1024 x (size 1984 +
> Hdr 64)) + 395392 =   2435 KB
>
>  Create: Range TX    1:0  - Memory used (MBUFs 1024 x (size 1984 +
> Hdr 64)) + 395392 =   2435 KB
>
>  Create: Sequence TX 1:0  - Memory used (MBUFs 1024 x (size 1984 +
> Hdr 64)) + 395392 =   2435 KB
>
>  Create: Special TX  1:0  - Memory used (MBUFs   64 x (size 1984 +
> Hdr 64)) + 395392 =    515 KB
>
>
>
> Port memory used =  10251 KB
>
> Initialize Port 2 -- TxQ 1, RxQ 1,  Src MAC 90:e2:ba:4a:e6:1c
>
>  Create: Default RX  2:0  - Memory used (MBUFs 1024 x (size 1984 +
> Hdr 64)) + 395392 =   2435 KB
>
>
>  Create: Default TX  2:0  - Memory used (MBUFs 1024 x (size 1984 +
> Hdr 64)) + 395392 =   2435 KB
>
>  Create: Range TX    2:0  - Memory used (MBUFs 1024 x (size 1984 +
> Hdr 64)) + 395392 =   2435 KB
>
>  Create: Sequence TX 2:0  - Memory used (MBUFs 1024 x (size 1984 +
> Hdr 64)) + 395392 =   2435 KB
>
>  Create: Special TX  2:0  - Memory used (MBUFs   64 x (size 1984 +
> Hdr 64)) + 395392 =    515 KB
>
>
>
> Port memory used =  10251 KB
>
> Initialize Port 3 -- TxQ 1, RxQ 1,  Src MAC 90:e2:ba:4a:e6:1d
>
>  Create: Default RX  3:0  - Memory used (MBUFs 1024 x (size 1984 +
> Hdr 64)) + 395392 =   2435 KB
>
>
>  Create: Default TX  3:0  - Memory used (MBUFs 1024 x (size 1984 +
> Hdr 64)) + 395392 =   2435 KB
>
>  Create: Range TX    3:0  - Memory used (MBUFs 1024 x (size 1984 +
> Hdr 64)) + 395392 =   2435 KB
>
>  Create: Sequence TX 3:0  - Memory used (MBUFs 1024 x (size 1984 +
> Hdr 64)) + 395392 =   2435 KB
>
>  Create: Special TX  3:0  - Memory used (MBUFs   64 x (size 1984 +
> Hdr 64)) + 395392 =    515 KB
>
>
>
> Port memory used =  10251 KB
>
>
> Total memory used =  41003 KB
>
> Port  0: Link Up - speed 10000 Mbps - full-duplex <Enable promiscuous mode>
>
> Port  1: Link Up - speed 10000 Mbps - full-duplex <Enable promiscuous mode>
>
> Port  2: Link Up - speed 10000 Mbps - full-duplex <Enable promiscuous mode>
>
> Port  3: Link Up - speed 10000 Mbps - full-duplex <Enable promiscuous mode>
>
>
> === Display processing on lcore 0
>
> === RX processing on lcore  1, rxcnt 1, port/qid, 0/0
>
> === TX processing on lcore  2, txcnt 1, port/qid, 0/0
>
> === RX processing on lcore  3, rxcnt 1, port/qid, 1/0
>
> === TX processing on lcore  4, txcnt 1, port/qid, 1/0
>
> === RX processing on lcore  5, rxcnt 1, port/qid, 2/0
>
> === TX processing on lcore  6, txcnt 1, port/qid, 2/0
>
> === RX processing on lcore  7, rxcnt 1, port/qid, 3/0
>
> === TX processing on lcore  8, txcnt 1, port/qid, 3/0
>
>
> Please, advise me if you have time.
>
> Thank you always for your help!
>
> Jinho
>
>
>
> The phenomenon is that when I start one port in one NIC, it reaches
> 10Gbps. Also, when I start one port per each NIC, they achieve 10Gbps
> each = 20Gbps. But, when I start two port in one NIC, it becomes
> 5.8Gbps each. This is persistent when cores are assigned
> differently---cross sockets and the same sockets. Since the size of
> huge pages are fixed, it will not be a problem. Should we say this is
> the limitation on NIC or bus? The reason I think this may be a hw
> limitation is that regardless of packet sizes, two ports in one NIC
> can only send 5.8Gbps maximum.
>
> Do you have any way that I can calculate the hw limitation?
>
> Jinho
>
>
>
> My cpu configuration is as follows:
>
> ============================================================
>
> Core and Socket Information (as reported by '/proc/cpuinfo')
>
> ============================================================
> cores =  [0, 1, 2, 8, 9, 10]
> sockets =  [1, 0]
>        Socket 1        Socket 0
>        ---------       ---------
> Core 0  [0, 12]         [1, 13]
> Core 1  [2, 14]         [3, 15]
> Core 2  [4, 16]         [5, 17]
> Core 8  [6, 18]         [7, 19]
> Core 9  [8, 20]         [9, 21]
> Core 10         [10, 22]        [11, 23]
>
> When I use just two ports for testing, I use this configuration.
>
> sudo ./app/build/pktgen -c 1ff -n 3 $BLACK_LIST -- -p 0x30 -P -m
> "[2:4].0, [6:8].1" -f test/forward.lua
>
> As you can see the core numbers, 2, 4, 6, 8 are all in different
> physical cores and are assigned separately. I am not sure that happens
> with core configuration. Do have any other thoughts we may try?
>
> Thanks,
>
> Jinho
>

Actually, I tried that before, and to make sure I tried it again just
now. It still only shows me 5.8Gbps for each port. What other
possibilities do you think I should try? I am losing hope now. Does
the version matter? I am using Pktgen Ver:2.1.3 (DPDK-1.3.0) on my
system.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [dpdk-dev] ways to generate 40Gbps with two NICs x two ports?
  2013-11-19 21:33               ` jinho hwang
@ 2013-11-19 21:38                 ` Wiles, Roger Keith
  2013-11-20 20:58                   ` jinho hwang
  0 siblings, 1 reply; 12+ messages in thread
From: Wiles, Roger Keith @ 2013-11-19 21:38 UTC (permalink / raw)
  To: jinho hwang; +Cc: dev

I do not think a newer version will affect the performance, but you can try it.

git clone git://github.com/Pktgen/Pktgen-DPDK

This one is 2.2.5 and DPDK 1.5.0
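
(A minimal build sketch, assuming the usual RTE_SDK/RTE_TARGET makefile
setup of that era; adjust the path and target name to match how your
DPDK 1.5.0 tree was built:)

cd Pktgen-DPDK
export RTE_SDK=/path/to/dpdk-1.5.0
export RTE_TARGET=x86_64-default-linuxapp-gcc
make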


Keith Wiles, Principal Technologist for Networking member of the CTO office, Wind River
mobile 940.213.5533
[Powering 30 Years of Innovation]<http://www.windriver.com/announces/wr30/>

On Nov 19, 2013, at 3:33 PM, jinho hwang <hwang.jinho@gmail.com<mailto:hwang.jinho@gmail.com>> wrote:

On Tue, Nov 19, 2013 at 4:18 PM, Wiles, Roger Keith
<keith.wiles@windriver.com<mailto:keith.wiles@windriver.com>> wrote:
Give this a try, if that does not work then something else is going on here.
I am trying to make sure we do not cross the QPI for any reason putting the
RX/TX queues related to a port on the same core.

sudo ./app/build/pktgen -c 3ff -n 3 $BLACK-LIST -- -p 0xf0 -P -m
"[2:4].0, [6:8].1, [3:5].2, [7:9].3" -f test/forward.lua

sudo ./app/build/pktgen -c 1ff -n 3 $BLACK-LIST -- -p 0xf0 -P -m
"[1:2].0, [3:4].1, [5:6].2, [7:8].3" -f test/forward.lua


cores =  [0, 1, 2, 8, 9, 10]
sockets =  [1, 0]
      Socket 1        Socket 0
      ---------       ---------
Core 0  [0, 12]         [1, 13]
Core 1  [2, 14]         [3, 15]
Core 2  [4, 16]         [5, 17]
Core 8  [6, 18]         [7, 19]
Core 9  [8, 20]         [9, 21]
Core 10         [10, 22]        [11, 23]


Keith Wiles, Principal Technologist for Networking member of the CTO office,
Wind River
mobile 940.213.5533

On Nov 19, 2013, at 11:35 AM, jinho hwang <hwang.jinho@gmail.com<mailto:hwang.jinho@gmail.com>> wrote:

On Tue, Nov 19, 2013 at 12:24 PM, Wiles, Roger Keith
<keith.wiles@windriver.com<mailto:keith.wiles@windriver.com>> wrote:

Normally when I see this problem it means the the lcores are not mapped
correctly. What can happen is you have a Rx and a TX on the same physical
core or two RX/TX on the same physical core.

Make sure you have a Rx or Tx running on a single core look at the
cpu_layout.py output and verify the configuration is correct. If you have 8
physical cores in the then you need to make sure on one of the lcores on
that core is being used.

Let me know what happens.

Keith Wiles, Principal Technologist for Networking member of the CTO office,
Wind River
mobile 940.213.5533

On Nov 19, 2013, at 11:04 AM, jinho hwang <hwang.jinho@gmail.com<mailto:hwang.jinho@gmail.com>> wrote:

On Tue, Nov 19, 2013 at 11:54 AM, Wiles, Roger Keith
<keith.wiles@windriver.com<mailto:keith.wiles@windriver.com>> wrote:


BTW, the configuration looks fine, but you need to make sure the lcores are
not split between two different CPU sockets. You can use the
dpdk/tools/cpu_layout.py to do dump out the system configuration.


Keith Wiles, Principal Technologist for Networking member of the CTO office,
Wind River
mobile 940.213.5533


On Nov 19, 2013, at 10:42 AM, jinho hwang <hwang.jinho@gmail.com<mailto:hwang.jinho@gmail.com>> wrote:

On Tue, Nov 19, 2013 at 11:31 AM, Wiles, Roger Keith
<keith.wiles@windriver.com<mailto:keith.wiles@windriver.com>> wrote:

How do you have Pktgen configured in this case?

On my westmere dual socket 3.4Ghz machine I can send 20G on a single NIC
82599x two ports. My machine has a PCIe bug that does not allow me to send
on more then 3 ports at wire rate. I get close to 40G 64 byte packets, but
the forth port does is about 70% of wire rate because of the PCIe hardware
bottle neck problem.

Keith Wiles, Principal Technologist for Networking member of the CTO office,
Wind River
direct 972.434.4136  mobile 940.213.5533  fax 000.000.0000

On Nov 19, 2013, at 10:09 AM, jinho hwang <hwang.jinho@gmail.com<mailto:hwang.jinho@gmail.com>> wrote:

Hi All,

I have two NICs (82599) x two ports that are used as packet generators. I
want to generate full line-rate packets (40Gbps), but Pktgen-DPDK does not
seem to be able to do it when two port in a NIC are used simultaneously.
Does anyone know how to generate 40Gbps without replicating packets in the
switch?

Thank you,

Jinho



Hi Keith,

Thank you for the e-mail. I am not sure how I figure out whether my
PCIe also has any problems to prevent me from sending full line-rates.
I use Intel(R) Xeon(R) CPU           E5649  @ 2.53GHz. It is hard for
me to figure out where is the bottleneck.

My configuration is:

sudo ./app/build/pktgen -c 1ff -n 3 $BLACK-LIST -- -p 0xf0 -P -m
"[1:2].0, [3:4].1, [5:6].2, [7:8].3" -f test/forward.lua


=== port to lcore mapping table (# lcores 9) ===

lcore:     0     1     2     3     4     5     6     7     8

port   0:  D: T  1: 0  0: 1  0: 0  0: 0  0: 0  0: 0  0: 0  0: 0 =  1: 1

port   1:  D: T  0: 0  0: 0  1: 0  0: 1  0: 0  0: 0  0: 0  0: 0 =  1: 1

port   2:  D: T  0: 0  0: 0  0: 0  0: 0  1: 0  0: 1  0: 0  0: 0 =  1: 1

port   3:  D: T  0: 0  0: 0  0: 0  0: 0  0: 0  0: 0  1: 0  0: 1 =  1: 1

Total   :  0: 0  1: 0  0: 1  1: 0  0: 1  1: 0  0: 1  1: 0  0: 1

Display and Timer on lcore 0, rx:tx counts per port/lcore


Configuring 4 ports, MBUF Size 1984, MBUF Cache Size 128

Lcore:

1, type  RX , rx_cnt  1, tx_cnt  0 private (nil), RX (pid:qid): (
0: 0) , TX (pid:qid):

2, type  TX , rx_cnt  0, tx_cnt  1 private (nil), RX (pid:qid): ,
TX (pid:qid): ( 0: 0)

3, type  RX , rx_cnt  1, tx_cnt  0 private (nil), RX (pid:qid): (
1: 0) , TX (pid:qid):

4, type  TX , rx_cnt  0, tx_cnt  1 private (nil), RX (pid:qid): ,
TX (pid:qid): ( 1: 0)

5, type  RX , rx_cnt  1, tx_cnt  0 private (nil), RX (pid:qid): (
2: 0) , TX (pid:qid):

6, type  TX , rx_cnt  0, tx_cnt  1 private (nil), RX (pid:qid): ,
TX (pid:qid): ( 2: 0)

7, type  RX , rx_cnt  1, tx_cnt  0 private (nil), RX (pid:qid): (
3: 0) , TX (pid:qid):

8, type  TX , rx_cnt  0, tx_cnt  1 private (nil), RX (pid:qid): ,
TX (pid:qid): ( 3: 0)


Port :

0, nb_lcores  2, private 0x6fd5a0, lcores:  1  2

1, nb_lcores  2, private 0x700208, lcores:  3  4

2, nb_lcores  2, private 0x702e70, lcores:  5  6

3, nb_lcores  2, private 0x705ad8, lcores:  7  8



Initialize Port 0 -- TxQ 1, RxQ 1,  Src MAC 90:e2:ba:2f:f2:a4

Create: Default RX  0:0  - Memory used (MBUFs 1024 x (size 1984 +
Hdr 64)) + 395392 =   2435 KB


Create: Default TX  0:0  - Memory used (MBUFs 1024 x (size 1984 +
Hdr 64)) + 395392 =   2435 KB

Create: Range TX    0:0  - Memory used (MBUFs 1024 x (size 1984 +
Hdr 64)) + 395392 =   2435 KB

Create: Sequence TX 0:0  - Memory used (MBUFs 1024 x (size 1984 +
Hdr 64)) + 395392 =   2435 KB

Create: Special TX  0:0  - Memory used (MBUFs   64 x (size 1984 +
Hdr 64)) + 395392 =    515 KB



Port memory used =  10251 KB

Initialize Port 1 -- TxQ 1, RxQ 1,  Src MAC 90:e2:ba:2f:f2:a5

Create: Default RX  1:0  - Memory used (MBUFs 1024 x (size 1984 +
Hdr 64)) + 395392 =   2435 KB


Create: Default TX  1:0  - Memory used (MBUFs 1024 x (size 1984 +
Hdr 64)) + 395392 =   2435 KB

Create: Range TX    1:0  - Memory used (MBUFs 1024 x (size 1984 +
Hdr 64)) + 395392 =   2435 KB

Create: Sequence TX 1:0  - Memory used (MBUFs 1024 x (size 1984 +
Hdr 64)) + 395392 =   2435 KB

Create: Special TX  1:0  - Memory used (MBUFs   64 x (size 1984 +
Hdr 64)) + 395392 =    515 KB



Port memory used =  10251 KB

Initialize Port 2 -- TxQ 1, RxQ 1,  Src MAC 90:e2:ba:4a:e6:1c

Create: Default RX  2:0  - Memory used (MBUFs 1024 x (size 1984 +
Hdr 64)) + 395392 =   2435 KB


Create: Default TX  2:0  - Memory used (MBUFs 1024 x (size 1984 +
Hdr 64)) + 395392 =   2435 KB

Create: Range TX    2:0  - Memory used (MBUFs 1024 x (size 1984 +
Hdr 64)) + 395392 =   2435 KB

Create: Sequence TX 2:0  - Memory used (MBUFs 1024 x (size 1984 +
Hdr 64)) + 395392 =   2435 KB

Create: Special TX  2:0  - Memory used (MBUFs   64 x (size 1984 +
Hdr 64)) + 395392 =    515 KB



Port memory used =  10251 KB

Initialize Port 3 -- TxQ 1, RxQ 1,  Src MAC 90:e2:ba:4a:e6:1d

Create: Default RX  3:0  - Memory used (MBUFs 1024 x (size 1984 +
Hdr 64)) + 395392 =   2435 KB


Create: Default TX  3:0  - Memory used (MBUFs 1024 x (size 1984 +
Hdr 64)) + 395392 =   2435 KB

Create: Range TX    3:0  - Memory used (MBUFs 1024 x (size 1984 +
Hdr 64)) + 395392 =   2435 KB

Create: Sequence TX 3:0  - Memory used (MBUFs 1024 x (size 1984 +
Hdr 64)) + 395392 =   2435 KB

Create: Special TX  3:0  - Memory used (MBUFs   64 x (size 1984 +
Hdr 64)) + 395392 =    515 KB



Port memory used =  10251 KB


Total memory used =  41003 KB

Port  0: Link Up - speed 10000 Mbps - full-duplex <Enable promiscuous mode>

Port  1: Link Up - speed 10000 Mbps - full-duplex <Enable promiscuous mode>

Port  2: Link Up - speed 10000 Mbps - full-duplex <Enable promiscuous mode>

Port  3: Link Up - speed 10000 Mbps - full-duplex <Enable promiscuous mode>


=== Display processing on lcore 0

=== RX processing on lcore  1, rxcnt 1, port/qid, 0/0

=== TX processing on lcore  2, txcnt 1, port/qid, 0/0

=== RX processing on lcore  3, rxcnt 1, port/qid, 1/0

=== TX processing on lcore  4, txcnt 1, port/qid, 1/0

=== RX processing on lcore  5, rxcnt 1, port/qid, 2/0

=== TX processing on lcore  6, txcnt 1, port/qid, 2/0

=== RX processing on lcore  7, rxcnt 1, port/qid, 3/0

=== TX processing on lcore  8, txcnt 1, port/qid, 3/0


Please, advise me if you have time.

Thank you always for your help!

Jinho



The phenomenon is that when I start one port in one NIC, it reaches
10Gbps. Also, when I start one port per each NIC, they achieve 10Gbps
each = 20Gbps. But, when I start two port in one NIC, it becomes
5.8Gbps each. This is persistent when cores are assigned
differently---cross sockets and the same sockets. Since the size of
huge pages are fixed, it will not be a problem. Should we say this is
the limitation on NIC or bus? The reason I think this may be a hw
limitation is that regardless of packet sizes, two ports in one NIC
can only send 5.8Gbps maximum.

Do you have any way that I can calculate the hw limitation?

Jinho



My cpu configuration is as follows:

============================================================

Core and Socket Information (as reported by '/proc/cpuinfo')

============================================================
cores =  [0, 1, 2, 8, 9, 10]
sockets =  [1, 0]
      Socket 1        Socket 0
      ---------       ---------
Core 0  [0, 12]         [1, 13]
Core 1  [2, 14]         [3, 15]
Core 2  [4, 16]         [5, 17]
Core 8  [6, 18]         [7, 19]
Core 9  [8, 20]         [9, 21]
Core 10         [10, 22]        [11, 23]

When I use just two ports for testing, I use this configuration.

sudo ./app/build/pktgen -c 1ff -n 3 $BLACK_LIST -- -p 0x30 -P -m
"[2:4].0, [6:8].1" -f test/forward.lua

As you can see the core numbers, 2, 4, 6, 8 are all in different
physical cores and are assigned separately. I am not sure that happens
with core configuration. Do have any other thoughts we may try?

Thanks,

Jinho


Actually, I tried it before, and to make sure I tried it again just
now. Still, it only shows me 5.8Gbps for each port. What other
possibilities do you think I have to try? I am losing hopes how. Does
the version matter? I am using Pktgen Ver:2.1.3(DPDK-1.3.0) in my
system.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [dpdk-dev] ways to generate 40Gbps with two NICs x two ports?
  2013-11-19 21:38                 ` Wiles, Roger Keith
@ 2013-11-20 20:58                   ` jinho hwang
  0 siblings, 0 replies; 12+ messages in thread
From: jinho hwang @ 2013-11-20 20:58 UTC (permalink / raw)
  To: Wiles, Roger Keith; +Cc: dev

On Tue, Nov 19, 2013 at 4:38 PM, Wiles, Roger Keith
<keith.wiles@windriver.com> wrote:
>
> I do not think a newer version will effect the performance, but you can try it.
>
> git clone git://github.com/Pktgen/Pktgen-DPDK
>
> This one is 2.2.5 and DPDK 1.5.0
>
>
> Keith Wiles, Principal Technologist for Networking member of the CTO office, Wind River
> mobile 940.213.5533
>
> On Nov 19, 2013, at 3:33 PM, jinho hwang <hwang.jinho@gmail.com> wrote:
>
> On Tue, Nov 19, 2013 at 4:18 PM, Wiles, Roger Keith
> <keith.wiles@windriver.com> wrote:
>
> Give this a try, if that does not work then something else is going on here.
> I am trying to make sure we do not cross the QPI for any reason putting the
> RX/TX queues related to a port on the same core.
>
> sudo ./app/build/pktgen -c 3ff -n 3 $BLACK-LIST -- -p 0xf0 -P -m
> "[2:4].0, [6:8].1, [3:5].2, [7:9].3" -f test/forward.lua
>
> sudo ./app/build/pktgen -c 1ff -n 3 $BLACK-LIST -- -p 0xf0 -P -m
> "[1:2].0, [3:4].1, [5:6].2, [7:8].3" -f test/forward.lua
>
>
> cores =  [0, 1, 2, 8, 9, 10]
> sockets =  [1, 0]
>       Socket 1        Socket 0
>       ---------       ---------
> Core 0  [0, 12]         [1, 13]
> Core 1  [2, 14]         [3, 15]
> Core 2  [4, 16]         [5, 17]
> Core 8  [6, 18]         [7, 19]
> Core 9  [8, 20]         [9, 21]
> Core 10         [10, 22]        [11, 23]
>
>
> Keith Wiles, Principal Technologist for Networking member of the CTO office,
> Wind River
> mobile 940.213.5533
>
> On Nov 19, 2013, at 11:35 AM, jinho hwang <hwang.jinho@gmail.com> wrote:
>
> On Tue, Nov 19, 2013 at 12:24 PM, Wiles, Roger Keith
> <keith.wiles@windriver.com> wrote:
>
> Normally when I see this problem it means the the lcores are not mapped
> correctly. What can happen is you have a Rx and a TX on the same physical
> core or two RX/TX on the same physical core.
>
> Make sure you have a Rx or Tx running on a single core look at the
> cpu_layout.py output and verify the configuration is correct. If you have 8
> physical cores in the then you need to make sure on one of the lcores on
> that core is being used.
>
> Let me know what happens.
>
> Keith Wiles, Principal Technologist for Networking member of the CTO office,
> Wind River
> mobile 940.213.5533
>
> On Nov 19, 2013, at 11:04 AM, jinho hwang <hwang.jinho@gmail.com> wrote:
>
> On Tue, Nov 19, 2013 at 11:54 AM, Wiles, Roger Keith
> <keith.wiles@windriver.com> wrote:
>
>
> BTW, the configuration looks fine, but you need to make sure the lcores are
> not split between two different CPU sockets. You can use the
> dpdk/tools/cpu_layout.py to do dump out the system configuration.
>
>
> Keith Wiles, Principal Technologist for Networking member of the CTO office,
> Wind River
> mobile 940.213.5533
>
>
> On Nov 19, 2013, at 10:42 AM, jinho hwang <hwang.jinho@gmail.com> wrote:
>
> On Tue, Nov 19, 2013 at 11:31 AM, Wiles, Roger Keith
> <keith.wiles@windriver.com> wrote:
>
> How do you have Pktgen configured in this case?
>
> On my westmere dual socket 3.4Ghz machine I can send 20G on a single NIC
> 82599x two ports. My machine has a PCIe bug that does not allow me to send
> on more then 3 ports at wire rate. I get close to 40G 64 byte packets, but
> the forth port does is about 70% of wire rate because of the PCIe hardware
> bottle neck problem.
>
> Keith Wiles, Principal Technologist for Networking member of the CTO office,
> Wind River
> direct 972.434.4136  mobile 940.213.5533  fax 000.000.0000
>
> On Nov 19, 2013, at 10:09 AM, jinho hwang <hwang.jinho@gmail.com> wrote:
>
> Hi All,
>
> I have two NICs (82599) x two ports that are used as packet generators. I
> want to generate full line-rate packets (40Gbps), but Pktgen-DPDK does not
> seem to be able to do it when two port in a NIC are used simultaneously.
> Does anyone know how to generate 40Gbps without replicating packets in the
> switch?
>
> Thank you,
>
> Jinho
>
>
>
> Hi Keith,
>
> Thank you for the e-mail. I am not sure how I figure out whether my
> PCIe also has any problems to prevent me from sending full line-rates.
> I use Intel(R) Xeon(R) CPU           E5649  @ 2.53GHz. It is hard for
> me to figure out where is the bottleneck.
>
> My configuration is:
>
> sudo ./app/build/pktgen -c 1ff -n 3 $BLACK-LIST -- -p 0xf0 -P -m
> "[1:2].0, [3:4].1, [5:6].2, [7:8].3" -f test/forward.lua
>
>
> === port to lcore mapping table (# lcores 9) ===
>
> lcore:     0     1     2     3     4     5     6     7     8
>
> port   0:  D: T  1: 0  0: 1  0: 0  0: 0  0: 0  0: 0  0: 0  0: 0 =  1: 1
>
> port   1:  D: T  0: 0  0: 0  1: 0  0: 1  0: 0  0: 0  0: 0  0: 0 =  1: 1
>
> port   2:  D: T  0: 0  0: 0  0: 0  0: 0  1: 0  0: 1  0: 0  0: 0 =  1: 1
>
> port   3:  D: T  0: 0  0: 0  0: 0  0: 0  0: 0  0: 0  1: 0  0: 1 =  1: 1
>
> Total   :  0: 0  1: 0  0: 1  1: 0  0: 1  1: 0  0: 1  1: 0  0: 1
>
> Display and Timer on lcore 0, rx:tx counts per port/lcore
>
>
> Configuring 4 ports, MBUF Size 1984, MBUF Cache Size 128
>
> Lcore:
>
> 1, type  RX , rx_cnt  1, tx_cnt  0 private (nil), RX (pid:qid): (
> 0: 0) , TX (pid:qid):
>
> 2, type  TX , rx_cnt  0, tx_cnt  1 private (nil), RX (pid:qid): ,
> TX (pid:qid): ( 0: 0)
>
> 3, type  RX , rx_cnt  1, tx_cnt  0 private (nil), RX (pid:qid): (
> 1: 0) , TX (pid:qid):
>
> 4, type  TX , rx_cnt  0, tx_cnt  1 private (nil), RX (pid:qid): ,
> TX (pid:qid): ( 1: 0)
>
> 5, type  RX , rx_cnt  1, tx_cnt  0 private (nil), RX (pid:qid): (
> 2: 0) , TX (pid:qid):
>
> 6, type  TX , rx_cnt  0, tx_cnt  1 private (nil), RX (pid:qid): ,
> TX (pid:qid): ( 2: 0)
>
> 7, type  RX , rx_cnt  1, tx_cnt  0 private (nil), RX (pid:qid): (
> 3: 0) , TX (pid:qid):
>
> 8, type  TX , rx_cnt  0, tx_cnt  1 private (nil), RX (pid:qid): ,
> TX (pid:qid): ( 3: 0)
>
>
> Port :
>
> 0, nb_lcores  2, private 0x6fd5a0, lcores:  1  2
>
> 1, nb_lcores  2, private 0x700208, lcores:  3  4
>
> 2, nb_lcores  2, private 0x702e70, lcores:  5  6
>
> 3, nb_lcores  2, private 0x705ad8, lcores:  7  8
>
>
>
> Initialize Port 0 -- TxQ 1, RxQ 1,  Src MAC 90:e2:ba:2f:f2:a4
>
> Create: Default RX  0:0  - Memory used (MBUFs 1024 x (size 1984 +
> Hdr 64)) + 395392 =   2435 KB
>
>
> Create: Default TX  0:0  - Memory used (MBUFs 1024 x (size 1984 +
> Hdr 64)) + 395392 =   2435 KB
>
> Create: Range TX    0:0  - Memory used (MBUFs 1024 x (size 1984 +
> Hdr 64)) + 395392 =   2435 KB
>
> Create: Sequence TX 0:0  - Memory used (MBUFs 1024 x (size 1984 +
> Hdr 64)) + 395392 =   2435 KB
>
> Create: Special TX  0:0  - Memory used (MBUFs   64 x (size 1984 +
> Hdr 64)) + 395392 =    515 KB
>
>
>
> Port memory used =  10251 KB
>
> Initialize Port 1 -- TxQ 1, RxQ 1,  Src MAC 90:e2:ba:2f:f2:a5
>
> Create: Default RX  1:0  - Memory used (MBUFs 1024 x (size 1984 +
> Hdr 64)) + 395392 =   2435 KB
>
>
> Create: Default TX  1:0  - Memory used (MBUFs 1024 x (size 1984 +
> Hdr 64)) + 395392 =   2435 KB
>
> Create: Range TX    1:0  - Memory used (MBUFs 1024 x (size 1984 +
> Hdr 64)) + 395392 =   2435 KB
>
> Create: Sequence TX 1:0  - Memory used (MBUFs 1024 x (size 1984 +
> Hdr 64)) + 395392 =   2435 KB
>
> Create: Special TX  1:0  - Memory used (MBUFs   64 x (size 1984 +
> Hdr 64)) + 395392 =    515 KB
>
>
>
> Port memory used =  10251 KB
>
> Initialize Port 2 -- TxQ 1, RxQ 1,  Src MAC 90:e2:ba:4a:e6:1c
>
> Create: Default RX  2:0  - Memory used (MBUFs 1024 x (size 1984 +
> Hdr 64)) + 395392 =   2435 KB
>
>
> Create: Default TX  2:0  - Memory used (MBUFs 1024 x (size 1984 +
> Hdr 64)) + 395392 =   2435 KB
>
> Create: Range TX    2:0  - Memory used (MBUFs 1024 x (size 1984 +
> Hdr 64)) + 395392 =   2435 KB
>
> Create: Sequence TX 2:0  - Memory used (MBUFs 1024 x (size 1984 +
> Hdr 64)) + 395392 =   2435 KB
>
> Create: Special TX  2:0  - Memory used (MBUFs   64 x (size 1984 +
> Hdr 64)) + 395392 =    515 KB
>
>
>
> Port memory used =  10251 KB
>
> Initialize Port 3 -- TxQ 1, RxQ 1,  Src MAC 90:e2:ba:4a:e6:1d
>
> Create: Default RX  3:0  - Memory used (MBUFs 1024 x (size 1984 +
> Hdr 64)) + 395392 =   2435 KB
>
>
> Create: Default TX  3:0  - Memory used (MBUFs 1024 x (size 1984 +
> Hdr 64)) + 395392 =   2435 KB
>
> Create: Range TX    3:0  - Memory used (MBUFs 1024 x (size 1984 +
> Hdr 64)) + 395392 =   2435 KB
>
> Create: Sequence TX 3:0  - Memory used (MBUFs 1024 x (size 1984 +
> Hdr 64)) + 395392 =   2435 KB
>
> Create: Special TX  3:0  - Memory used (MBUFs   64 x (size 1984 +
> Hdr 64)) + 395392 =    515 KB
>
>
>
> Port memory used =  10251 KB
>
>
> Total memory used =  41003 KB
>
> Port  0: Link Up - speed 10000 Mbps - full-duplex <Enable promiscuous mode>
>
> Port  1: Link Up - speed 10000 Mbps - full-duplex <Enable promiscuous mode>
>
> Port  2: Link Up - speed 10000 Mbps - full-duplex <Enable promiscuous mode>
>
> Port  3: Link Up - speed 10000 Mbps - full-duplex <Enable promiscuous mode>
>
>
> === Display processing on lcore 0
>
> === RX processing on lcore  1, rxcnt 1, port/qid, 0/0
>
> === TX processing on lcore  2, txcnt 1, port/qid, 0/0
>
> === RX processing on lcore  3, rxcnt 1, port/qid, 1/0
>
> === TX processing on lcore  4, txcnt 1, port/qid, 1/0
>
> === RX processing on lcore  5, rxcnt 1, port/qid, 2/0
>
> === TX processing on lcore  6, txcnt 1, port/qid, 2/0
>
> === RX processing on lcore  7, rxcnt 1, port/qid, 3/0
>
> === TX processing on lcore  8, txcnt 1, port/qid, 3/0
>
>
> Please, advise me if you have time.
>
> Thank you always for your help!
>
> Jinho
>
>
>
> The phenomenon is that when I start one port in one NIC, it reaches
> 10Gbps. Also, when I start one port per each NIC, they achieve 10Gbps
> each = 20Gbps. But, when I start two port in one NIC, it becomes
> 5.8Gbps each. This is persistent when cores are assigned
> differently---cross sockets and the same sockets. Since the size of
> huge pages are fixed, it will not be a problem. Should we say this is
> the limitation on NIC or bus? The reason I think this may be a hw
> limitation is that regardless of packet sizes, two ports in one NIC
> can only send 5.8Gbps maximum.
>
> Do you have any way that I can calculate the hw limitation?
>
> Jinho
>
>
>
> My cpu configuration is as follows:
>
> ============================================================
>
> Core and Socket Information (as reported by '/proc/cpuinfo')
>
> ============================================================
> cores =  [0, 1, 2, 8, 9, 10]
> sockets =  [1, 0]
>       Socket 1        Socket 0
>       ---------       ---------
> Core 0  [0, 12]         [1, 13]
> Core 1  [2, 14]         [3, 15]
> Core 2  [4, 16]         [5, 17]
> Core 8  [6, 18]         [7, 19]
> Core 9  [8, 20]         [9, 21]
> Core 10         [10, 22]        [11, 23]
>
> When I use just two ports for testing, I use this configuration.
>
> sudo ./app/build/pktgen -c 1ff -n 3 $BLACK_LIST -- -p 0x30 -P -m
> "[2:4].0, [6:8].1" -f test/forward.lua
>
> As you can see the core numbers, 2, 4, 6, 8 are all in different
> physical cores and are assigned separately. I am not sure that happens
> with core configuration. Do have any other thoughts we may try?
>
> Thanks,
>
> Jinho
>
>
> Actually, I tried it before, and to make sure I tried it again just
> now. Still, it only shows me 5.8Gbps for each port. What other
> possibilities do you think I have to try? I am losing hopes how. Does
> the version matter? I am using Pktgen Ver:2.1.3(DPDK-1.3.0) in my
> system.
>

Keith,

Yes, the newer version did not work either. Since I am able to send
close to 24Gbps from the two NICs, I do not think the limitation comes
from the bus or memory. It may be because of how I use the NICs. I am
sticking to this hypothesis for now, and trying to use more cores and
queues for tx. The problem is that when I tried this:

sudo ./app/build/pktgen -c 3ff -n 3 $BLACK_LIST -- -p 0x30 -P -m
"[2-5].0, [6-9].1" -f test/forward.lua

the log shows that 4 rx/tx queues are assigned, but it seems that only
1/4 of the traffic (2.5Gbps) is transmitted. My questions are:

1. It seems to me each of the 4 queues/cores should be doing 1/4 of
the work, but only one queue/core is actually working. Do you know the
reason for this, and how can I fix it?
2. Can I make a configuration with one core driving multiple queues?
3. Is there any way to see more statistics in the commands?

Thank you,

Jinho

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2013-11-20 20:57 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-11-19 16:09 [dpdk-dev] ways to generate 40Gbps with two NICs x two ports? jinho hwang
2013-11-19 16:31 ` Wiles, Roger Keith
2013-11-19 16:42   ` jinho hwang
2013-11-19 16:52     ` Wiles, Roger Keith
2013-11-19 16:54     ` Wiles, Roger Keith
2013-11-19 17:04       ` jinho hwang
2013-11-19 17:24         ` Wiles, Roger Keith
2013-11-19 17:35           ` jinho hwang
2013-11-19 21:18             ` Wiles, Roger Keith
2013-11-19 21:33               ` jinho hwang
2013-11-19 21:38                 ` Wiles, Roger Keith
2013-11-20 20:58                   ` jinho hwang
