DPDK patches and discussions
* [dpdk-dev] Unable to get RSS to work in testpmd and load balancing question
From: Dan Kan @ 2014-01-08 23:24 UTC
  To: dev

I'm evaluating DPDK using dpdk-1.5.1r1. I have been playing around with the
test-pmd sample app, but I'm having a hard time getting RSS to work. I have
a 2-port 82599-based Intel X540-DA2 NIC. I'm running the following command
to start the app.

sudo ./testpmd -c 0x1f -n 2 -- -i --portmask=0x3 --nb-cores=4 --rxq=4
--txq=4

I have a packet generator that sends UDP packets with varying source IPs.
According to testpmd, I'm only receiving packets in port 0's queue 0. Packets
are not going into any other queues. I have attached the output from
testpmd below.


  ------- Forward Stats for RX Port= 0/Queue= 0 -> TX Port= 1/Queue= 0 -------
  RX-packets: 1000000        TX-packets: 1000000        TX-dropped: 0

  ---------------------- Forward statistics for port 0  ----------------------
  RX-packets: 1000000        RX-dropped: 0             RX-total: 1000000
  TX-packets: 0              TX-dropped: 0             TX-total: 0
  ----------------------------------------------------------------------------

  ---------------------- Forward statistics for port 1  ----------------------
  RX-packets: 0              RX-dropped: 0             RX-total: 0
  TX-packets: 1000000        TX-dropped: 0             TX-total: 1000000
  ----------------------------------------------------------------------------

  +++++++++++++++ Accumulated forward statistics for all ports +++++++++++++++
  RX-packets: 1000000        RX-dropped: 0             RX-total: 1000000
  TX-packets: 1000000        TX-dropped: 0             TX-total: 1000000
  ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
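
For reference, my understanding is that testpmd's RSS setup boils down to
roughly the ethdev configuration below. This is a minimal sketch using macro
names from recent DPDK releases; the dpdk-1.5-era names (ETH_MQ_RX_RSS,
ETH_RSS_IPV4, ETH_RSS_IPV4_UDP, ...) differ only in spelling.

    /* Minimal sketch of enabling RSS through the ethdev API. */
    #include <rte_ethdev.h>

    static int
    configure_rss(uint16_t port_id, uint16_t nb_queues)
    {
            struct rte_eth_conf port_conf = {
                    .rxmode = { .mq_mode = RTE_ETH_MQ_RX_RSS },
                    .rx_adv_conf = {
                            .rss_conf = {
                                    .rss_key = NULL, /* keep the driver's default key */
                                    /* Hash on IP addresses and UDP ports so
                                     * UDP flows spread across the RX queues. */
                                    .rss_hf = RTE_ETH_RSS_IP | RTE_ETH_RSS_UDP,
                            },
                    },
            };

            /* One TX queue per RX queue, as testpmd does with --rxq/--txq. */
            return rte_eth_dev_configure(port_id, nb_queues, nb_queues,
                                         &port_conf);
    }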

On a separate note, I also find that the CPU utilization using 1 forwarding
core for 2 ports seems better (in the aggregate sense) than using 2
forwarding cores for 2 ports. Running at 10 Gbps line rate with pktlen=400,
with 1 core the core's utilization is 40%. With 2 cores, each core's
utilization is 30%, giving an aggregate of 60%.

I have a use case of doing rxonly packet processing. From my initial test,
it seems more efficient to have a single core read packets from both ports
and distribute them using an rte_ring than to have each core read from its
own port. The rte_eth_rx operations appear to be much more CPU intensive
than rte_ring_dequeue operations.
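
A minimal sketch of the single-reader pattern I'm describing (the ring
array, worker count, and burst size are illustrative; rte_ring_enqueue_burst
is shown with the newer signature that takes a free_space argument, which
older releases lack):

    /* One core polls every port and fans bursts out to worker cores
     * over rte_rings. */
    #include <rte_ethdev.h>
    #include <rte_mbuf.h>
    #include <rte_ring.h>

    #define BURST 32

    static void
    rx_and_distribute(uint16_t nb_ports, struct rte_ring **rings,
                      unsigned int nb_workers)
    {
            struct rte_mbuf *bufs[BURST];
            unsigned int next = 0;

            for (;;) {
                    for (uint16_t port = 0; port < nb_ports; port++) {
                            uint16_t n = rte_eth_rx_burst(port, 0, bufs, BURST);
                            if (n == 0)
                                    continue;
                            /* Round-robin bursts across the worker rings. */
                            unsigned int sent = rte_ring_enqueue_burst(
                                            rings[next], (void **)bufs, n, NULL);
                            next = (next + 1) % nb_workers;
                            /* Drop whatever did not fit; a real app would
                             * apply backpressure instead. */
                            for (unsigned int i = sent; i < n; i++)
                                    rte_pktmbuf_free(bufs[i]);
                    }
            }
    }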

Thanks in advance.

Dan

* [dpdk-dev] Unable to get RSS to work in testpmd and load balancing question
From: Choi, Sy Jong @ 2014-01-10  2:07 UTC
  To: dev


Hi Dan,

I have tested with 6 flows with identical IP addresses but varying UDP port
numbers, and I can see traffic on both queues, using the following command:

sudo ./app/testpmd -c 0x1f -n 4 -- -i --rss-udp --portmask=0x03 --nb-cores=4 --rxq=2 --txq=2


I started with RSS IPv4, which is enabled by default.
The critical part is the traffic: since I have only 2 queues, I am sending 6 flows with different IP addresses in order to see the flows get distributed evenly. Otherwise you might see only 1 queue in use; with just 2 flows, both might hash to the same queue.
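
What decides the queue: the NIC computes a Toeplitz hash over the packet
tuple, and the low-order bits of the hash index a redirection table (128
entries on the 82599). Later DPDK releases ship rte_thash.h (added after
the 1.5 release), which reproduces the hash in software so you can predict
where a flow will land. A minimal sketch, assuming the widely published
default Microsoft RSS key (drivers may program a different one):

    #include <rte_byteorder.h>
    #include <rte_thash.h>

    static const uint8_t default_rss_key[40] = {
            0x6d, 0x5a, 0x56, 0xda, 0x25, 0x5b, 0x0e, 0xc2,
            0x41, 0x67, 0x25, 0x3d, 0x43, 0xa3, 0x8f, 0xb0,
            0xd0, 0xca, 0x2b, 0xcb, 0xae, 0x7b, 0x30, 0xb4,
            0x77, 0xcb, 0x2d, 0xa3, 0x80, 0x30, 0xf2, 0x0c,
            0x6a, 0x42, 0xb7, 0x3b, 0xbe, 0xac, 0x01, 0xfa,
    };

    static uint16_t
    predict_rss_queue(rte_be32_t sip, rte_be32_t dip,
                      rte_be16_t sport, rte_be16_t dport,
                      uint16_t nb_queues)
    {
            struct rte_ipv4_tuple tuple;
            uint32_t hash;

            /* rte_softrss() expects the tuple in host byte order. */
            tuple.src_addr = rte_be_to_cpu_32(sip);
            tuple.dst_addr = rte_be_to_cpu_32(dip);
            tuple.sport = rte_be_to_cpu_16(sport);
            tuple.dport = rte_be_to_cpu_16(dport);

            hash = rte_softrss((uint32_t *)&tuple, RTE_THASH_V4_L4_LEN,
                               default_rss_key);

            /* The 82599 uses the low 7 bits of the hash to index the
             * 128-entry redirection table; with the default round-robin
             * RETA, entry i maps to queue i % nb_queues. */
            return (hash & 127) % nb_queues;
    }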

My command:
sudo ./app/testpmd -c 0x1f -n 4 -- -i --portmask=0x03 --nb-cores=4 --rxq=2 --txq=2
- Using 4 cores
- rxq = 2 for each port, so 4 queues to 4 cores.



testpmd> show port stats all

  ######################## NIC statistics for port 0  ########################
  RX-packets:              6306519648    RX-errors:  757945685    RX-bytes: 309383840254
  TX-packets:               132592678    TX-errors:          0    TX-bytes: 8485925376

  Stats reg  0 RX-packets: 2556150208    RX-errors:          0    RX-bytes: 116477417471
  Stats reg  1 RX-packets: 3750369440    RX-errors:          0    RX-bytes: 192906422783
  Stats reg  2 RX-packets:          0    RX-errors:          0    RX-bytes:          0
.
.
.
  Stats reg 15 RX-packets:          0    RX-errors:          0    RX-bytes:          0
  ############################################################################

  ######################## NIC statistics for port 1  ########################
  RX-packets:               132594048    RX-errors:   13825889    RX-bytes: 8486020288
  TX-packets:              6306522739    TX-errors:          0    TX-bytes: 231983528894

  Stats reg  0 RX-packets:   83615783    RX-errors:          0    RX-bytes: 5351410624
  Stats reg  1 RX-packets:   48978265    RX-errors:          0    RX-bytes: 3134609664
  Stats reg  2 RX-packets:          0    RX-errors:          0    RX-bytes:          0
.
.
.
  Stats reg 15 RX-packets:          0    RX-errors:          0    RX-bytes:          0
  ############################################################################
testpmd>





I used the following commands to map each queue's statistics to a stats register:
testpmd> set stat_qmap rx 0 0 0
testpmd> set stat_qmap rx 0 1 1
testpmd> set stat_qmap rx 1 0 0
testpmd> set stat_qmap rx 1 1 1
testpmd> start
  io packet forwarding - CRC stripping disabled - packets/burst=16
  nb forwarding cores=2 - nb forwarding ports=2
  RX queues=2 - RX desc=128 - RX free threshold=0
  RX threshold registers: pthresh=8 hthresh=8 wthresh=4
  TX queues=2 - TX desc=512 - TX free threshold=0
  TX threshold registers: pthresh=36 hthresh=0 wthresh=0
  TX RS bit threshold=0 - TXQ flags=0x0

testpmd> show port stats all
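
For reference, the stat_qmap commands map each queue onto one of the NIC's
16 per-queue stats registers; the same can be done programmatically with
rte_eth_dev_set_rx_queue_stats_mapping(). A minimal sketch, assuming an
identity queue-to-register mapping:

    #include <rte_ethdev.h>

    static int
    map_queue_stats(uint16_t port_id, uint16_t nb_rx_queues)
    {
            for (uint16_t q = 0; q < nb_rx_queues; q++) {
                    /* Map RX queue q of this port to stats register q. */
                    int ret = rte_eth_dev_set_rx_queue_stats_mapping(
                                    port_id, q, (uint8_t)q);
                    if (ret != 0)
                            return ret; /* queue id or stat index out of range */
            }
            return 0;
    }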



Regards,
Choi, Sy Jong
Platform Application Engineer

