DPDK patches and discussions
* [dpdk-dev] generic load balancing
@ 2013-12-04 17:53 Michael Quicquaro
  2013-12-04 20:48 ` elevran
  2013-12-04 21:04 ` François-Frédéric Ozog
  0 siblings, 2 replies; 16+ messages in thread
From: Michael Quicquaro @ 2013-12-04 17:53 UTC (permalink / raw)
  To: dev

Hi all,
I am writing a dpdk application that will receive packets from one
interface and process them.  It does not forward packets in the traditional
sense.  However, I do need to process them at full line rate and therefore
need more than one core.  The packets can be somewhat generic in nature and
can be nearly identical (especially at the beginning of the packet).  I've
used the rxonly function of testpmd as a model.

I've run into problems in processing a full line rate of data since the
nature of the data causes all the data to be presented to only one core.  I
get a large percentage of dropped packets (shows up as Rx-Errors in "port
stats") because of this.  I've tried modifying the data so that packets
have different UDP ports, and that seems to work when I use --rss-udp.

My questions are:
1) Is there a way to configure RSS so that it alternates packets to all
configured cores regardless of the packet data?

2)  Where is the best place to learn more about RSS and how to configure
it? I have not found much in the DPDK documentation.

Thanks for the help,
- Mike

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [dpdk-dev] generic load balancing
  2013-12-04 17:53 [dpdk-dev] generic load balancing Michael Quicquaro
@ 2013-12-04 20:48 ` elevran
  2013-12-04 21:04 ` François-Frédéric Ozog
  1 sibling, 0 replies; 16+ messages in thread
From: elevran @ 2013-12-04 20:48 UTC (permalink / raw)
  To: Michael Quicquaro; +Cc: dev

Hi Michael,

As far as I know, RSS is used to distribute packets between cores based on
hashing the packets' initial bytes, so round robin distribution is not
possible in hardware. You can configure the hash seed and which fields to
use in the hash. If the input packets have the same or very similar bytes, the
hash distribution won't be good, as you have observed.
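
For example, with the DPDK 1.x API that selection looks roughly like this
(a minimal sketch; the exact ETH_RSS_* flags your PMD accepts are an
assumption to verify against rte_ethdev.h):

#include <string.h>
#include <rte_ethdev.h>

/* sketch: enable RSS and choose which header fields feed the hash */
static int
configure_rss(uint8_t port_id, uint16_t nb_rx_queues, uint8_t *rss_key)
{
        struct rte_eth_conf port_conf;

        memset(&port_conf, 0, sizeof(port_conf));
        port_conf.rxmode.mq_mode = ETH_MQ_RX_RSS;         /* spread RX over queues */
        port_conf.rx_adv_conf.rss_conf.rss_key = rss_key; /* NULL keeps the default seed */
        port_conf.rx_adv_conf.rss_conf.rss_hf =
                ETH_RSS_IPV4 | ETH_RSS_IPV4_UDP;          /* fields fed into the hash */

        /* one RX queue per worker core, one TX queue */
        return rte_eth_dev_configure(port_id, nb_rx_queues, 1, &port_conf);
}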

The Intel network card manuals provide the details.

You may want to consider distributing packets from the PMD queue into
secondary rings/queues using a software policy running on the first core.
Look at the multi-process samples in the DPDK for reference. Performance
depends on your HW.
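
A minimal sketch of that software stage could look as follows (ring names
and sizes are made up; the load_balancer example is the better reference):

#include <errno.h>
#include <rte_ethdev.h>
#include <rte_ring.h>
#include <rte_mbuf.h>

#define NB_WORKERS 4
#define BURST      32

/* one ring per worker core, created elsewhere with rte_ring_create() */
extern struct rte_ring *worker_ring[NB_WORKERS];

static void
rx_and_distribute(uint8_t port_id)
{
        struct rte_mbuf *pkts[BURST];
        unsigned int next = 0;
        uint16_t i, n;

        for (;;) {
                n = rte_eth_rx_burst(port_id, 0, pkts, BURST);
                for (i = 0; i < n; i++) {
                        /* blind per-packet round robin, ignoring contents */
                        if (rte_ring_enqueue(worker_ring[next], pkts[i]) == -ENOBUFS)
                                rte_pktmbuf_free(pkts[i]); /* worker ring full: drop */
                        next = (next + 1) % NB_WORKERS;
                }
        }
}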

Regards,
Etai
On Dec 4, 2013 19:53, "Michael Quicquaro" <michael.quicquaro@gmail.com>
wrote:

> Hi all,
> I am writing a dpdk application that will receive packets from one
> interface and process them.  It does not forward packets in the traditional
> sense.  However, I do need to process them at full line rate and therefore
> need more than one core.  The packets can be somewhat generic in nature and
> can be nearly identical (especially at the beginning of the packet).  I've
> used the rxonly function of testpmd as a model.
>
> I've run into problems in processing a full line rate of data since the
> nature of the data causes all the data to be presented to only one core.  I
> get a large percentage of dropped packets (shows up as Rx-Errors in "port
> stats") because of this.  I've tried modifying the data so that packets
> have different UDP ports and that seems to work when I use --rss-udp
>
> My questions are:
> 1) Is there a way to configure RSS so that it alternates packets to all
> configured cores regardless of the packet data?
>
> 2)  Where is the best place to learn more about RSS and how to configure
> it? I have not found much in the DPDK documentation.
>
> Thanks for the help,
> - Mike
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [dpdk-dev] generic load balancing
  2013-12-04 17:53 [dpdk-dev] generic load balancing Michael Quicquaro
  2013-12-04 20:48 ` elevran
@ 2013-12-04 21:04 ` François-Frédéric Ozog
  2013-12-05  4:31   ` Prashant Upadhyaya
  1 sibling, 1 reply; 16+ messages in thread
From: François-Frédéric Ozog @ 2013-12-04 21:04 UTC (permalink / raw)
  To: 'Michael Quicquaro'; +Cc: dev

Hi,

As far as I can tell, this is really hardware dependent. Some hash functions
allow uplink and downlink packets of the same "session" to go to the same
queue (I know Chelsio can do this).

For the Intel card, you may find what you want in:
http://www.intel.com/content/www/us/en/ethernet-controllers/82599-10-gbe-controller-datasheet.html

Other cards require NDA or other agreements to get details of RSS.

If you have a performance problem, may I suggest you use kernel 3.10 and
then monitor system activity with the "perf" command. For instance, you can
start with "perf top -a"; this will give you nice information. Then your
creativity will do the rest ;-) You may be surprised what comes up as the
top hot points... (the most unexpected hot function I found here was the
Linux syscall gettimeofday!!!)

François-Frédéric

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Michael Quicquaro
> Sent: Wednesday, December 4, 2013 18:53
> To: dev@dpdk.org
> Subject: [dpdk-dev] generic load balancing
> 
> Hi all,
> I am writing a dpdk application that will receive packets from one
> interface and process them.  It does not forward packets in the
traditional
> sense.  However, I do need to process them at full line rate and therefore
> need more than one core.  The packets can be somewhat generic in nature
and
> can be nearly identical (especially at the beginning of the packet).  I've
> used the rxonly function of testpmd as a model.
> 
> I've run into problems in processing a full line rate of data since the
> nature of the data causes all the data to be presented to only one core.
I
> get a large percentage of dropped packets (shows up as Rx-Errors in "port
> stats") because of this.  I've tried modifying the data so that packets
> have different UDP ports and that seems to work when I use --rss-udp
> 
> My questions are:
> 1) Is there a way to configure RSS so that it alternates packets to all
> configured cores regardless of the packet data?
> 
> 2)  Where is the best place to learn more about RSS and how to configure
> it? I have not found much in the DPDK documentation.
> 
> Thanks for the help,
> - Mike

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [dpdk-dev] generic load balancing
  2013-12-04 21:04 ` François-Frédéric Ozog
@ 2013-12-05  4:31   ` Prashant Upadhyaya
  2013-12-05  4:54     ` Stephen Hemminger
  0 siblings, 1 reply; 16+ messages in thread
From: Prashant Upadhyaya @ 2013-12-05  4:31 UTC (permalink / raw)
  To: François-Frédéric Ozog, 'Michael Quicquaro'; +Cc: dev

Hi,

It's a real pity that the Intel 82599 NIC (and possibly others) doesn't have a simple round-robin scheduling of packets on the configured queues.

I have requested this of Intel earlier, and am using this forum to request it again -- please, please put this facility in the NIC: if I drop N queues there and configure the NIC for round-robin scheduling on the queues, then the NIC should simply put the received packets one by one on queue 1, then on queue 2, ..., then on queue N, and then back on queue 1.
The above is very useful in a lot of load-balancing cases.

Regards
-Prashant


-----Original Message-----
From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of François-Frédéric Ozog
Sent: Thursday, December 05, 2013 2:35 AM
To: 'Michael Quicquaro'
Cc: dev@dpdk.org
Subject: Re: [dpdk-dev] generic load balancing

Hi,

As far as I can tell, this is really hardware dependent. Some hash functions allow uplink and downlink packets of the same "session" to go to the same queue (I know Chelsio can do this).

For the Intel card, you may find what you want in:
http://www.intel.com/content/www/us/en/ethernet-controllers/82599-10-gbe-controller-datasheet.html

Other cards require NDA or other agreements to get details of RSS.

If you have a performance problem, may I suggest you use kernel 3.10 then monitor system activity with "perf" command. For instance you can start with "perf top -a" this will give you nice information. Then your creativity will do the rest ;-) You may be surprised what comes on the top hot points...
(the most unexpected hot function I found here was Linux syscall
gettimeofday!!!)

François-Frédéric

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Michael Quicquaro
> Sent: Wednesday, December 4, 2013 18:53
> To: dev@dpdk.org
> Subject: [dpdk-dev] generic load balancing
>
> Hi all,
> I am writing a dpdk application that will receive packets from one
> interface and process them.  It does not forward packets in the
traditional
> sense.  However, I do need to process them at full line rate and
> therefore need more than one core.  The packets can be somewhat
> generic in nature
and
> can be nearly identical (especially at the beginning of the packet).
> I've used the rxonly function of testpmd as a model.
>
> I've run into problems in processing a full line rate of data since
> the nature of the data causes all the data to be presented to only one core.
I
> get a large percentage of dropped packets (shows up as Rx-Errors in
> "port
> stats") because of this.  I've tried modifying the data so that
> packets have different UDP ports and that seems to work when I use
> --rss-udp
>
> My questions are:
> 1) Is there a way to configure RSS so that it alternates packets to
> all configured cores regardless of the packet data?
>
> 2)  Where is the best place to learn more about RSS and how to
> configure it? I have not found much in the DPDK documentation.
>
> Thanks for the help,
> - Mike






^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [dpdk-dev] generic load balancing
  2013-12-05  4:31   ` Prashant Upadhyaya
@ 2013-12-05  4:54     ` Stephen Hemminger
  2013-12-05  5:29       ` Prashant Upadhyaya
  0 siblings, 1 reply; 16+ messages in thread
From: Stephen Hemminger @ 2013-12-05  4:54 UTC (permalink / raw)
  To: Prashant Upadhyaya; +Cc: dev

Round robin would actually be awful for any protocol because it would
cause out-of-order packets.
That is why flow-based algorithms like Flow Director and RSS work much better.

On Wed, Dec 4, 2013 at 8:31 PM, Prashant Upadhyaya
<prashant.upadhyaya@aricent.com> wrote:
> Hi,
>
> It's a real pity that Intel 82599 NIC (and possibly others) don't have a simple round robin scheduling of packets on the configured queues.
>
> I have requested Intel earlier, and using this forum requesting again -- please please put this facility in the NIC that if I drop N queues there and configure  the NIC for some round robin scheduling on queues, then NIC should simply put the received packets one by one on queue 1, then on queue2,....,then on queueN, and then back on queue 1.
> The above is very useful in lot of load balancing cases.
>
> Regards
> -Prashant
>
>
> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of François-Frédéric Ozog
> Sent: Thursday, December 05, 2013 2:35 AM
> To: 'Michael Quicquaro'
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] generic load balancing
>
> Hi,
>
> As far as I can tell, this is really hardware dependent. Some hash functions allow uplink and downlink packets of the same "session" to go to the same queue (I know Chelsio can do this).
>
> For the Intel card, you may find what you want in:
> http://www.intel.com/content/www/us/en/ethernet-controllers/82599-10-gbe-controller-datasheet.html
>
> Other cards require NDA or other agreements to get details of RSS.
>
> If you have a performance problem, may I suggest you use kernel 3.10 then monitor system activity with "perf" command. For instance you can start with "perf top -a" this will give you nice information. Then your creativity will do the rest ;-) You may be surprised what comes on the top hot points...
> (the most unexpected hot function I found here was Linux syscall
> gettimeofday!!!)
>
> François-Frédéric
>
>> -----Original Message-----
>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Michael Quicquaro
>> Sent: Wednesday, December 4, 2013 18:53
>> To: dev@dpdk.org
>> Subject: [dpdk-dev] generic load balancing
>>
>> Hi all,
>> I am writing a dpdk application that will receive packets from one
>> interface and process them.  It does not forward packets in the
> traditional
>> sense.  However, I do need to process them at full line rate and
>> therefore need more than one core.  The packets can be somewhat
>> generic in nature
> and
>> can be nearly identical (especially at the beginning of the packet).
>> I've used the rxonly function of testpmd as a model.
>>
>> I've run into problems in processing a full line rate of data since
>> the nature of the data causes all the data to be presented to only one core.
> I
>> get a large percentage of dropped packets (shows up as Rx-Errors in
>> "port
>> stats") because of this.  I've tried modifying the data so that
>> packets have different UDP ports and that seems to work when I use
>> --rss-udp
>>
>> My questions are:
>> 1) Is there a way to configure RSS so that it alternates packets to
>> all configured cores regardless of the packet data?
>>
>> 2)  Where is the best place to learn more about RSS and how to
>> configure it? I have not found much in the DPDK documentation.
>>
>> Thanks for the help,
>> - Mike

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [dpdk-dev] generic load balancing
  2013-12-05  4:54     ` Stephen Hemminger
@ 2013-12-05  5:29       ` Prashant Upadhyaya
  2013-12-05  7:44         ` Benson, Bryan
  2013-12-05  8:45         ` François-Frédéric Ozog
  0 siblings, 2 replies; 16+ messages in thread
From: Prashant Upadhyaya @ 2013-12-05  5:29 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev

Hi Stephen,

The awfulness depends upon the use case.
I have, e.g., a use case where I want this round-robin behaviour.

I just want the NIC to give me a facility to use this.

Regards
-Prashant


-----Original Message-----
From: Stephen Hemminger [mailto:stephen@networkplumber.org]
Sent: Thursday, December 05, 2013 10:25 AM
To: Prashant Upadhyaya
Cc: François-Frédéric Ozog; Michael Quicquaro; dev@dpdk.org
Subject: Re: [dpdk-dev] generic load balancing

Round robin would actually be awful for any protocol because it would cause out of order packets.
That is why flow based algorithms like flow director and RSS work much better.

On Wed, Dec 4, 2013 at 8:31 PM, Prashant Upadhyaya <prashant.upadhyaya@aricent.com> wrote:
> Hi,
>
> It's a real pity that Intel 82599 NIC (and possibly others) don't have a simple round robin scheduling of packets on the configured queues.
>
> I have requested Intel earlier, and using this forum requesting again -- please please put this facility in the NIC that if I drop N queues there and configure  the NIC for some round robin scheduling on queues, then NIC should simply put the received packets one by one on queue 1, then on queue2,....,then on queueN, and then back on queue 1.
> The above is very useful in lot of load balancing cases.
>
> Regards
> -Prashant
>
>
> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of François-Frédéric
> Ozog
> Sent: Thursday, December 05, 2013 2:35 AM
> To: 'Michael Quicquaro'
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] generic load balancing
>
> Hi,
>
> As far as I can tell, this is really hardware dependent. Some hash functions allow uplink and downlink packets of the same "session" to go to the same queue (I know Chelsio can do this).
>
> For the Intel card, you may find what you want in:
> http://www.intel.com/content/www/us/en/ethernet-controllers/82599-10-gbe-controller-datasheet.html
>
> Other cards require NDA or other agreements to get details of RSS.
>
> If you have a performance problem, may I suggest you use kernel 3.10 then monitor system activity with "perf" command. For instance you can start with "perf top -a" this will give you nice information. Then your creativity will do the rest ;-) You may be surprised what comes on the top hot points...
> (the most unexpected hot function I found here was Linux syscall
> gettimeofday!!!)
>
> François-Frédéric
>
>> -----Original Message-----
>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Michael Quicquaro
>> Sent: Wednesday, December 4, 2013 18:53
>> To: dev@dpdk.org
>> Subject: [dpdk-dev] generic load balancing
>>
>> Hi all,
>> I am writing a dpdk application that will receive packets from one
>> interface and process them.  It does not forward packets in the
> traditional
>> sense.  However, I do need to process them at full line rate and
>> therefore need more than one core.  The packets can be somewhat
>> generic in nature
> and
>> can be nearly identical (especially at the beginning of the packet).
>> I've used the rxonly function of testpmd as a model.
>>
>> I've run into problems in processing a full line rate of data since
>> the nature of the data causes all the data to be presented to only one core.
> I
>> get a large percentage of dropped packets (shows up as Rx-Errors in
>> "port
>> stats") because of this.  I've tried modifying the data so that
>> packets have different UDP ports and that seems to work when I use
>> --rss-udp
>>
>> My questions are:
>> 1) Is there a way to configure RSS so that it alternates packets to
>> all configured cores regardless of the packet data?
>>
>> 2)  Where is the best place to learn more about RSS and how to
>> configure it? I have not found much in the DPDK documentation.
>>
>> Thanks for the help,
>> - Mike

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [dpdk-dev] generic load balancing
  2013-12-05  5:29       ` Prashant Upadhyaya
@ 2013-12-05  7:44         ` Benson, Bryan
  2013-12-05 14:16           ` Prashant Upadhyaya
  2013-12-05  8:45         ` François-Frédéric Ozog
  1 sibling, 1 reply; 16+ messages in thread
From: Benson, Bryan @ 2013-12-05  7:44 UTC (permalink / raw)
  To: Prashant Upadhyaya, Stephen Hemminger; +Cc: dev

Prashant,
I assume your use case is not one of IP/UDP/TCP - or if it is, you are dealing with a single tuple that is not evenly distributed.

You have a few options with the NIC that I can think of.

1) Use a single core to RX each port's frames and use your own software solution to RR to worker rings.  There is an example of this in the Load Balancer sample application.

2) If your packets/frames have an evenly distributed field in the first 64 bytes of the frame, you can use the 2 byte match feature of flow director to send to different queues (with multiple match signatures).  This will give even distribution, but not round robin behavior.

3) Modify the RSS redirection table for the NIC in the order you desire.  I am unsure how often this can happen, or if there are performance issues with reprogramming it.  Definitely would need some experimentation.
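
For option 3, I am thinking of something along these lines (a sketch only -
struct and function names are from the 1.x headers as I remember them, and
I have not measured the cost of reprogramming):

#include <stdint.h>
#include <string.h>
#include <rte_ethdev.h>

/* sketch: spread the 128 redirection-table entries across nb_queues */
static int
program_reta(uint8_t port_id, uint16_t nb_queues)
{
        struct rte_eth_rss_reta reta;
        int i;

        memset(&reta, 0, sizeof(reta));
        reta.mask_lo = UINT64_MAX; /* update entries 0..63 */
        reta.mask_hi = UINT64_MAX; /* update entries 64..127 */
        for (i = 0; i < ETH_RSS_RETA_NUM_ENTRIES; i++)
                reta.reta[i] = i % nb_queues;

        return rte_eth_dev_rss_reta_update(port_id, &reta);
}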

What is it you are trying to achieve with Round Robin?  A distribution of packets to multiple cores for processing, or something else?

Without knowing the use case, my main suggestion is to use the LB sample application - that way you can distribute in any way you please.

Thanks,
Bryan Benson


-------- Original message --------
From: Prashant Upadhyaya
Date:12/04/2013 9:30 PM (GMT-08:00)
To: Stephen Hemminger
Cc: dev@dpdk.org
Subject: Re: [dpdk-dev] generic load balancing

Hi Stephen,

The awfulness depends upon the 'usecase'
I have eg. a usecase where I want this roundrobin behaviour.

I just want the NIC to give me a facility to use this.

Regards
-Prashant


-----Original Message-----
From: Stephen Hemminger [mailto:stephen@networkplumber.org]
Sent: Thursday, December 05, 2013 10:25 AM
To: Prashant Upadhyaya
Cc: François-Frédéric Ozog; Michael Quicquaro; dev@dpdk.org
Subject: Re: [dpdk-dev] generic load balancing

Round robin would actually be awful for any protocol because it would cause out of order packets.
That is why flow based algorithms like flow director and RSS work much better.

On Wed, Dec 4, 2013 at 8:31 PM, Prashant Upadhyaya <prashant.upadhyaya@aricent.com> wrote:
> Hi,
>
> It's a real pity that Intel 82599 NIC (and possibly others) don't have a simple round robin scheduling of packets on the configured queues.
>
> I have requested Intel earlier, and using this forum requesting again -- please please put this facility in the NIC that if I drop N queues there and configure  the NIC for some round robin scheduling on queues, then NIC should simply put the received packets one by one on queue 1, then on queue2,....,then on queueN, and then back on queue 1.
> The above is very useful in lot of load balancing cases.
>
> Regards
> -Prashant
>
>
> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of François-Frédéric
> Ozog
> Sent: Thursday, December 05, 2013 2:35 AM
> To: 'Michael Quicquaro'
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] generic load balancing
>
> Hi,
>
> As far as I can tell, this is really hardware dependent. Some hash functions allow uplink and downlink packets of the same "session" to go to the same queue (I know Chelsio can do this).
>
> For the Intel card, you may find what you want in:
> http://www.intel.com/content/www/us/en/ethernet-controllers/82599-10-gbe-controller-datasheet.html
>
> Other cards require NDA or other agreements to get details of RSS.
>
> If you have a performance problem, may I suggest you use kernel 3.10 then monitor system activity with "perf" command. For instance you can start with "perf top -a" this will give you nice information. Then your creativity will do the rest ;-) You may be surprised what comes on the top hot points...
> (the most unexpected hot function I found here was Linux syscall
> gettimeofday!!!)
>
> François-Frédéric
>
>> -----Original Message-----
>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Michael Quicquaro
>> Sent: Wednesday, December 4, 2013 18:53
>> To: dev@dpdk.org
>> Subject: [dpdk-dev] generic load balancing
>>
>> Hi all,
>> I am writing a dpdk application that will receive packets from one
>> interface and process them.  It does not forward packets in the
> traditional
>> sense.  However, I do need to process them at full line rate and
>> therefore need more than one core.  The packets can be somewhat
>> generic in nature
> and
>> can be nearly identical (especially at the beginning of the packet).
>> I've used the rxonly function of testpmd as a model.
>>
>> I've run into problems in processing a full line rate of data since
>> the nature of the data causes all the data to be presented to only one core.
> I
>> get a large percentage of dropped packets (shows up as Rx-Errors in
>> "port
>> stats") because of this.  I've tried modifying the data so that
>> packets have different UDP ports and that seems to work when I use
>> --rss-udp
>>
>> My questions are:
>> 1) Is there a way to configure RSS so that it alternates packets to
>> all configured cores regardless of the packet data?
>>
>> 2)  Where is the best place to learn more about RSS and how to
>> configure it? I have not found much in the DPDK documentation.
>>
>> Thanks for the help,
>> - Mike

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [dpdk-dev] generic load balancing
  2013-12-05  5:29       ` Prashant Upadhyaya
  2013-12-05  7:44         ` Benson, Bryan
@ 2013-12-05  8:45         ` François-Frédéric Ozog
  2013-12-05 14:29           ` Prashant Upadhyaya
  1 sibling, 1 reply; 16+ messages in thread
From: François-Frédéric Ozog @ 2013-12-05  8:45 UTC (permalink / raw)
  To: 'Prashant Upadhyaya'; +Cc: dev

Hi,

If the traffic you manage is above MPLS or GTP encapsulations, then you can
use cards that provide flexible hash functions. Chelsio cxgb5 provides a
combination of "offset", length and tuple that may help.

The only reason I would have loved to get a pure round-robin feature was to
pass certain "Breaking Point" (http://www.ixiacom.com/breakingpoint) tests
where the traffic was multicast from a single source... But that is not
real-life traffic.

If you could share the use case...

François-Frédéric

> -----Original Message-----
> From: Prashant Upadhyaya [mailto:prashant.upadhyaya@aricent.com]
> Sent: Thursday, December 5, 2013 06:30
> To: Stephen Hemminger
> Cc: François-Frédéric Ozog; Michael Quicquaro; dev@dpdk.org
> Subject: RE: [dpdk-dev] generic load balancing
> 
> Hi Stephen,
> 
> The awfulness depends upon the 'usecase'
> I have eg. a usecase where I want this roundrobin behaviour.
> 
> I just want the NIC to give me a facility to use this.
> 
> Regards
> -Prashant
> 
> 
> -----Original Message-----
> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> Sent: Thursday, December 05, 2013 10:25 AM
> To: Prashant Upadhyaya
> Cc: François-Frédéric Ozog; Michael Quicquaro; dev@dpdk.org
> Subject: Re: [dpdk-dev] generic load balancing
> 
> Round robin would actually be awful for any protocol because it would
cause
> out of order packets.
> That is why flow based algorithms like flow director and RSS work much
> better.
> 
> On Wed, Dec 4, 2013 at 8:31 PM, Prashant Upadhyaya
> <prashant.upadhyaya@aricent.com> wrote:
> > Hi,
> >
> > It's a real pity that Intel 82599 NIC (and possibly others) don't have a
> simple round robin scheduling of packets on the configured queues.
> >
> > I have requested Intel earlier, and using this forum requesting again --
> please please put this facility in the NIC that if I drop N queues there
> and configure  the NIC for some round robin scheduling on queues, then NIC
> should simply put the received packets one by one on queue 1, then on
> queue2,....,then on queueN, and then back on queue 1.
> > The above is very useful in lot of load balancing cases.
> >
> > Regards
> > -Prashant
> >
> >
> > -----Original Message-----
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of François-Frédéric
> > Ozog
> > Sent: Thursday, December 05, 2013 2:35 AM
> > To: 'Michael Quicquaro'
> > Cc: dev@dpdk.org
> > Subject: Re: [dpdk-dev] generic load balancing
> >
> > Hi,
> >
> > As far as I can tell, this is really hardware dependent. Some hash
> functions allow uplink and downlink packets of the same "session" to go to
> the same queue (I know Chelsio can do this).
> >
> > For the Intel card, you may find what you want in:
> > http://www.intel.com/content/www/us/en/ethernet-controllers/82599-10-gbe-controller-datasheet.html
> >
> > Other cards require NDA or other agreements to get details of RSS.
> >
> > If you have a performance problem, may I suggest you use kernel 3.10
then
> monitor system activity with "perf" command. For instance you can start
> with "perf top -a" this will give you nice information. Then your
> creativity will do the rest ;-) You may be surprised what comes on the top
> hot points...
> > (the most unexpected hot function I found here was Linux syscall
> > gettimeofday!!!)
> >
> > François-Frédéric
> >
> >> -----Original Message-----
> >> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Michael Quicquaro
> >> Sent: Wednesday, December 4, 2013 18:53
> >> To: dev@dpdk.org
> >> Subject: [dpdk-dev] generic load balancing
> >>
> >> Hi all,
> >> I am writing a dpdk application that will receive packets from one
> >> interface and process them.  It does not forward packets in the
> > traditional
> >> sense.  However, I do need to process them at full line rate and
> >> therefore need more than one core.  The packets can be somewhat
> >> generic in nature
> > and
> >> can be nearly identical (especially at the beginning of the packet).
> >> I've used the rxonly function of testpmd as a model.
> >>
> >> I've run into problems in processing a full line rate of data since
> >> the nature of the data causes all the data to be presented to only one
> core.
> > I
> >> get a large percentage of dropped packets (shows up as Rx-Errors in
> >> "port
> >> stats") because of this.  I've tried modifying the data so that
> >> packets have different UDP ports and that seems to work when I use
> >> --rss-udp
> >>
> >> My questions are:
> >> 1) Is there a way to configure RSS so that it alternates packets to
> >> all configured cores regardless of the packet data?
> >>
> >> 2)  Where is the best place to learn more about RSS and how to
> >> configure it? I have not found much in the DPDK documentation.
> >>
> >> Thanks for the help,
> >> - Mike

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [dpdk-dev] generic load balancing
  2013-12-05  7:44         ` Benson, Bryan
@ 2013-12-05 14:16           ` Prashant Upadhyaya
  2013-12-05 18:33             ` Benson, Bryan
  0 siblings, 1 reply; 16+ messages in thread
From: Prashant Upadhyaya @ 2013-12-05 14:16 UTC (permalink / raw)
  To: Benson, Bryan, Stephen Hemminger; +Cc: dev

Hi Bryan,

Regarding your 1st point, the single core becomes the RX bottleneck, which is clearly not desirable.

I am not sure how to use the stuff you mentioned in the 2nd point; is there some DPDK API which lets me configure this? Kindly let me know.

Regards
-Prashant


From: Benson, Bryan [mailto:bmbenson@amazon.com]
Sent: Thursday, December 05, 2013 1:14 PM
To: Prashant Upadhyaya; Stephen Hemminger
Cc: dev@dpdk.org
Subject: RE: [dpdk-dev] generic load balancing

Prashant,
I assume your use case is not of one IP/UDP/TCP - or if it is, you are dealing with a single tuple that is not evenly distributed.

You have a few options with the NIC that I can think of.

1) Use a single core to RX each port's frames and use your own software solution to RR to worker rings.  There is an example of this in the Load Balancer sample application.

2) If your packets/frames have an evenly distributed field in the first 64 bytes of the frame, you can use the 2 byte match feature of flow director to send to different queues (with multiple match signatures).  This will give even distribution, but not round robin behavior.

3) Modify the RSS redirection table for the NIC in the order you desire.  I am unsure how often this can happen, or if there are performance issues with reprogramming it.  Definitely would need some experimentation.

What is it you are trying to achieve with Round Robin?  A distribution of packets to multiple cores for processing, or something else?

Without knowing the use case, my main suggestion is to use the LB sample application - that way you can distribute in any way you please.

Thanks,
Bryan Benson


-------- Original message --------
From: Prashant Upadhyaya
Date:12/04/2013 9:30 PM (GMT-08:00)
To: Stephen Hemminger
Cc: dev@dpdk.org
Subject: Re: [dpdk-dev] generic load balancing
Hi Stephen,

The awfulness depends upon the 'usecase'
I have eg. a usecase where I want this roundrobin behaviour.

I just want the NIC to give me a facility to use this.

Regards
-Prashant


-----Original Message-----
From: Stephen Hemminger [mailto:stephen@networkplumber.org]
Sent: Thursday, December 05, 2013 10:25 AM
To: Prashant Upadhyaya
Cc: François-Frédéric Ozog; Michael Quicquaro; dev@dpdk.org
Subject: Re: [dpdk-dev] generic load balancing

Round robin would actually be awful for any protocol because it would cause out of order packets.
That is why flow based algorithms like flow director and RSS work much better.

On Wed, Dec 4, 2013 at 8:31 PM, Prashant Upadhyaya <prashant.upadhyaya@aricent.com> wrote:
> Hi,
>
> It's a real pity that Intel 82599 NIC (and possibly others) don't have a simple round robin scheduling of packets on the configured queues.
>
> I have requested Intel earlier, and using this forum requesting again -- please please put this facility in the NIC that if I drop N queues there and configure  the NIC for some round robin scheduling on queues, then NIC should simply put the received packets one by one on queue 1, then on queue2,....,then on queueN, and then back on queue 1.
> The above is very useful in lot of load balancing cases.
>
> Regards
> -Prashant
>
>
> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of François-Frédéric
> Ozog
> Sent: Thursday, December 05, 2013 2:35 AM
> To: 'Michael Quicquaro'
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] generic load balancing
>
> Hi,
>
> As far as I can tell, this is really hardware dependent. Some hash functions allow uplink and downlink packets of the same "session" to go to the same queue (I know Chelsio can do this).
>
> For the Intel card, you may find what you want in:
> http://www.intel.com/content/www/us/en/ethernet-controllers/82599-10-gbe-controller-datasheet.html
>
> Other cards require NDA or other agreements to get details of RSS.
>
> If you have a performance problem, may I suggest you use kernel 3.10 then monitor system activity with "perf" command. For instance you can start with "perf top -a" this will give you nice information. Then your creativity will do the rest ;-) You may be surprised what comes on the top hot points...
> (the most unexpected hot function I found here was Linux syscall
> gettimeofday!!!)
>
> François-Frédéric
>
>> -----Original Message-----
>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Michael Quicquaro
>> Sent: Wednesday, December 4, 2013 18:53
>> To: dev@dpdk.org
>> Subject: [dpdk-dev] generic load balancing
>>
>> Hi all,
>> I am writing a dpdk application that will receive packets from one
>> interface and process them.  It does not forward packets in the
> traditional
>> sense.  However, I do need to process them at full line rate and
>> therefore need more than one core.  The packets can be somewhat
>> generic in nature
> and
>> can be nearly identical (especially at the beginning of the packet).
>> I've used the rxonly function of testpmd as a model.
>>
>> I've run into problems in processing a full line rate of data since
>> the nature of the data causes all the data to be presented to only one core.
> I
>> get a large percentage of dropped packets (shows up as Rx-Errors in
>> "port
>> stats") because of this.  I've tried modifying the data so that
>> packets have different UDP ports and that seems to work when I use
>> --rss-udp
>>
>> My questions are:
>> 1) Is there a way to configure RSS so that it alternates packets to
>> all configured cores regardless of the packet data?
>>
>> 2)  Where is the best place to learn more about RSS and how to
>> configure it? I have not found much in the DPDK documentation.
>>
>> Thanks for the help,
>> - Mike

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [dpdk-dev] generic load balancing
  2013-12-05  8:45         ` François-Frédéric Ozog
@ 2013-12-05 14:29           ` Prashant Upadhyaya
  2013-12-05 15:42             ` Michael Quicquaro
  0 siblings, 1 reply; 16+ messages in thread
From: Prashant Upadhyaya @ 2013-12-05 14:29 UTC (permalink / raw)
  To: François-Frédéric Ozog; +Cc: dev

Hi,

Well, GTP is the main use case.
We end up with a GTP tunnel between the two machines.
And ordinarily with the 82599, all the data will land on a single queue and therefore must be polled on a single core. Bottleneck.

But in general, if I want to employ all the CPU cores' horsepower simultaneously to pick up the packets from the NIC, then it is natural that I drop a queue for every core into the NIC, and if the NIC does a round robin then the traffic naturally fans out and I can use all the cores to lift packets from the NIC in a load-balanced fashion.

Imagine a theoretical use case where I have to lift the packets from the NIC, inspect them myself in the application, and then switch them to the right core for further processing. So my cores have two jobs: one is to poll the NIC, and the other is to switch the packets to the right core. Here I would simply love to poll the NIC queue and the intercore ring from each core to achieve the processing. No single core will become the bottleneck as far as polling the NIC is concerned. You might argue on what basis I switch to the relevant core for further processing, but that's _my_ use case and my headache to further equally distribute amongst the cores.
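
In rough C, each core's loop would look like this (a sketch only;
owner_core(), process() and core_ring[] are my hypothetical application
hooks, not DPDK APIs):

#include <errno.h>
#include <rte_ethdev.h>
#include <rte_ring.h>
#include <rte_mbuf.h>

#define BURST 32

extern struct rte_ring *core_ring[];                /* one ring per core */
extern unsigned int owner_core(struct rte_mbuf *m); /* e.g. hash of inner src IP */
extern void process(struct rte_mbuf *m);

static void
core_loop(uint8_t port_id, uint16_t my_queue, unsigned int my_core)
{
        struct rte_mbuf *pkts[BURST];
        unsigned int i, n;

        for (;;) {
                /* job 1: lift packets from this core's own NIC queue */
                n = rte_eth_rx_burst(port_id, my_queue, pkts, BURST);
                for (i = 0; i < n; i++) {
                        unsigned int dst = owner_core(pkts[i]);
                        if (dst == my_core)
                                process(pkts[i]);
                        else if (rte_ring_enqueue(core_ring[dst], pkts[i]) == -ENOBUFS)
                                rte_pktmbuf_free(pkts[i]); /* ring full: drop */
                }
                /* job 2: drain packets the other cores switched to us */
                n = rte_ring_dequeue_burst(core_ring[my_core], (void **)pkts, BURST);
                for (i = 0; i < n; i++)
                        process(pkts[i]);
        }
}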

Imagine an LTE use case where I am on the core network side (SGW): the packets come over GTP from thousands of mobiles (via eNBs). I can employ all the cores to pick up the GTP packets (if the NIC gives me round robin) and then, based on the inner IP packet's source IP address (the mobile's IP address), take each one to the relevant core for further processing. This way I will get complete load balancing done, not only for polling from the NIC but also for processing of the inner IP packets.

I have also worked a lot on Cavium processors. Those of you who are familiar with them would know that the POW scheduler gives the packets to whichever core is requesting work, so the packets can go to any core in a Cavium Octeon processor. The only way to achieve similar functionality in DPDK is to drop a queue per core into the NIC and then let the NIC do round robin on those queues blindly. What's the harm if this feature is added? Let those who want to use it, use it, and let those who hate it or think it is useless, ignore it.

Regards
-Prashant

-----Original Message-----
From: François-Frédéric Ozog [mailto:ff@ozog.com]
Sent: Thursday, December 05, 2013 2:16 PM
To: Prashant Upadhyaya
Cc: 'Michael Quicquaro'; 'Stephen Hemminger'; dev@dpdk.org
Subject: RE: [dpdk-dev] generic load balancing

Hi,

If the traffic you manage is above MPLS or GTP encapsulations, then you can use cards that provide flexible hash functions. Chelsio cxgb5 provides combination of "offset", length and tuple that may help.

The only reason I would have loved to get a pure round robin feature was to pass certain "Breaking Point" (http://www.ixiacom.com/breakingpoint)  tests where the traffic issue was multicast from a single source... But that is not real life traffic.

If you could share the use case...

François-Frédéric

> -----Original Message-----
> From: Prashant Upadhyaya [mailto:prashant.upadhyaya@aricent.com]
> Sent: Thursday, December 5, 2013 06:30
> To: Stephen Hemminger
> Cc: François-Frédéric Ozog; Michael Quicquaro; dev@dpdk.org
> Subject: RE: [dpdk-dev] generic load balancing
>
> Hi Stephen,
>
> The awfulness depends upon the 'usecase'
> I have eg. a usecase where I want this roundrobin behaviour.
>
> I just want the NIC to give me a facility to use this.
>
> Regards
> -Prashant
>
>
> -----Original Message-----
> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> Sent: Thursday, December 05, 2013 10:25 AM
> To: Prashant Upadhyaya
> Cc: François-Frédéric Ozog; Michael Quicquaro; dev@dpdk.org
> Subject: Re: [dpdk-dev] generic load balancing
>
> Round robin would actually be awful for any protocol because it would
cause
> out of order packets.
> That is why flow based algorithms like flow director and RSS work much
> better.
>
> On Wed, Dec 4, 2013 at 8:31 PM, Prashant Upadhyaya
> <prashant.upadhyaya@aricent.com> wrote:
> > Hi,
> >
> > It's a real pity that Intel 82599 NIC (and possibly others) don't
> > have a
> simple round robin scheduling of packets on the configured queues.
> >
> > I have requested Intel earlier, and using this forum requesting
> > again --
> please please put this facility in the NIC that if I drop N queues
> there and configure  the NIC for some round robin scheduling on
> queues, then NIC should simply put the received packets one by one on
> queue 1, then on queue2,....,then on queueN, and then back on queue 1.
> > The above is very useful in lot of load balancing cases.
> >
> > Regards
> > -Prashant
> >
> >
> > -----Original Message-----
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of
> > François-Frédéric Ozog
> > Sent: Thursday, December 05, 2013 2:35 AM
> > To: 'Michael Quicquaro'
> > Cc: dev@dpdk.org
> > Subject: Re: [dpdk-dev] generic load balancing
> >
> > Hi,
> >
> > As far as I can tell, this is really hardware dependent. Some hash
> functions allow uplink and downlink packets of the same "session" to
> go to the same queue (I know Chelsio can do this).
> >
> > For the Intel card, you may find what you want in:
> > http://www.intel.com/content/www/us/en/ethernet-controllers/82599-10-gbe-controller-datasheet.html
> >
> > Other cards require NDA or other agreements to get details of RSS.
> >
> > If you have a performance problem, may I suggest you use kernel 3.10
then
> monitor system activity with "perf" command. For instance you can
> start with "perf top -a" this will give you nice information. Then
> your creativity will do the rest ;-) You may be surprised what comes
> on the top hot points...
> > (the most unexpected hot function I found here was Linux syscall
> > gettimeofday!!!)
> >
> > François-Frédéric
> >
> >> -----Original Message-----
> >> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Michael Quicquaro
> >> Sent: Wednesday, December 4, 2013 18:53
> >> To: dev@dpdk.org
> >> Subject: [dpdk-dev] generic load balancing
> >>
> >> Hi all,
> >> I am writing a dpdk application that will receive packets from one
> >> interface and process them.  It does not forward packets in the
> > traditional
> >> sense.  However, I do need to process them at full line rate and
> >> therefore need more than one core.  The packets can be somewhat
> >> generic in nature
> > and
> >> can be nearly identical (especially at the beginning of the packet).
> >> I've used the rxonly function of testpmd as a model.
> >>
> >> I've run into problems in processing a full line rate of data since
> >> the nature of the data causes all the data to be presented to only
> >> one
> core.
> > I
> >> get a large percentage of dropped packets (shows up as Rx-Errors in
> >> "port
> >> stats") because of this.  I've tried modifying the data so that
> >> packets have different UDP ports and that seems to work when I use
> >> --rss-udp
> >>
> >> My questions are:
> >> 1) Is there a way to configure RSS so that it alternates packets to
> >> all configured cores regardless of the packet data?
> >>
> >> 2)  Where is the best place to learn more about RSS and how to
> >> configure it? I have not found much in the DPDK documentation.
> >>
> >> Thanks for the help,
> >> - Mike

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [dpdk-dev] generic load balancing
  2013-12-05 14:29           ` Prashant Upadhyaya
@ 2013-12-05 15:42             ` Michael Quicquaro
  2013-12-05 16:21               ` Thomas Monjalon
  0 siblings, 1 reply; 16+ messages in thread
From: Michael Quicquaro @ 2013-12-05 15:42 UTC (permalink / raw)
  To: Prashant Upadhyaya; +Cc: dev

This is a good discussion and I hope Intel can see and benefit from it.
For my use case, I don't necessarily need round robin on a per-packet
level, but simply some normalized distribution among core queues that has
nothing to do with anything inside the packet.  A good solution perhaps
could be to allow the NIC to switch to another core's queue after a
certain number of packets have been received... perhaps using something
like the burst rate.  I just see this as being among the most fundamental
functionality, and it is lacking in the DPDK.  I'm sure there are many
use cases that don't involve routing/forwarding/switching/etc. but of
course need to maximize throughput.

- Mike


On Thu, Dec 5, 2013 at 9:29 AM, Prashant Upadhyaya <
prashant.upadhyaya@aricent.com> wrote:

> Hi,
>
> Well, GTP is the main usecase.
> We end up with a GTP tunnel between the two machines.
> And ordinarily with 82599, all the data will land up on a single queue and
> therefore must be polled on a single core. Bottleneck.
>
> But in general, if I want to employ all the CPU cores horsepower
> simultaneously to pickup the packets from NIC, then it is natural that I
> drop a queue each for every core into the NIC and if the NIC does a round
> robin then it naturally fans out and I can use all the cores to lift
> packets from NIC in a load balanced fashion.
>
> Imagine a theoretical usecase, where I have to lift the packets from the
> NIC, inspect it myself in the application and then switch them to the right
> core for further processing. So my cores have two jobs, one is to poll the
> NIC and then switch the packets to the right core. Here I would simply love
> to poll the queue and the intercore ring from each core to achieve the
> processing. No single core will become the bottleneck as far as polling the
> NIC is concerned. You might argue on what basis I switch to the relevant
> core for further processing, but that's _my_ usecase and headache to
> further equally distribute amongst the cores.
>
> Imagine an LTE usecase where I am on the core side (SGW), the packets come
> over GTP from thousands of mobiles (via eNB). I can employ all the cores to
> pickup the GTP packets (if NIC gives me round robin) and then based on the
> inner IP packet's src IP address (the mobile IP address), I can take it to
> the further relevant core for processing. This way I will get a complete
> load balancing done not only for polling from NIC but also for processing
> of the inner IP packets.
>
> I have also worked a lot on Cavium processors. Those of you who are
> familiar with that would know that the POW scheduler gives the packets to
> whichever core is requesting for work so the packets can go to any core in
> Cavium Octeon processor. The only way to achieve similar functionality in
> DPDK is to drop a queue per core into the NIC and then let NIC do round
> robin on those queues blindly. What's the harm if this feature is added,
> let those who want to use it, use, and those who hate it or think it is
> useless, ignore.
>
> Regards
> -Prashant
>
> -----Original Message-----
> From: François-Frédéric Ozog [mailto:ff@ozog.com]
> Sent: Thursday, December 05, 2013 2:16 PM
> To: Prashant Upadhyaya
> Cc: 'Michael Quicquaro'; 'Stephen Hemminger'; dev@dpdk.org
> Subject: RE: [dpdk-dev] generic load balancing
>
> Hi,
>
> If the traffic you manage is above MPLS or GTP encapsulations, then you
> can use cards that provide flexible hash functions. Chelsio cxgb5 provides
> combination of "offset", length and tuple that may help.
>
> The only reason I would have loved to get a pure round robin feature was
> to pass certain "Breaking Point" (http://www.ixiacom.com/breakingpoint)
>  tests where the traffic issue was multicast from a single source... But
> that is not real life traffic.
>
> If you could share the use case...
>
> François-Frédéric
>
> > -----Original Message-----
> > From: Prashant Upadhyaya [mailto:prashant.upadhyaya@aricent.com]
> > Sent: Thursday, December 5, 2013 06:30
> > To: Stephen Hemminger
> > Cc: François-Frédéric Ozog; Michael Quicquaro; dev@dpdk.org
> > Subject: RE: [dpdk-dev] generic load balancing
> >
> > Hi Stephen,
> >
> > The awfulness depends upon the 'usecase'
> > I have eg. a usecase where I want this roundrobin behaviour.
> >
> > I just want the NIC to give me a facility to use this.
> >
> > Regards
> > -Prashant
> >
> >
> > -----Original Message-----
> > From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> > Sent: Thursday, December 05, 2013 10:25 AM
> > To: Prashant Upadhyaya
> > Cc: François-Frédéric Ozog; Michael Quicquaro; dev@dpdk.org
> > Subject: Re: [dpdk-dev] generic load balancing
> >
> > Round robin would actually be awful for any protocol because it would
> cause
> > out of order packets.
> > That is why flow based algorithms like flow director and RSS work much
> > better.
> >
> > On Wed, Dec 4, 2013 at 8:31 PM, Prashant Upadhyaya
> > <prashant.upadhyaya@aricent.com> wrote:
> > > Hi,
> > >
> > > It's a real pity that Intel 82599 NIC (and possibly others) don't
> > > have a
> > simple round robin scheduling of packets on the configured queues.
> > >
> > > I have requested Intel earlier, and using this forum requesting
> > > again --
> > please please put this facility in the NIC that if I drop N queues
> > there and configure  the NIC for some round robin scheduling on
> > queues, then NIC should simply put the received packets one by one on
> > queue 1, then on queue2,....,then on queueN, and then back on queue 1.
> > > The above is very useful in lot of load balancing cases.
> > >
> > > Regards
> > > -Prashant
> > >
> > >
> > > -----Original Message-----
> > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of
> > > François-Frédéric Ozog
> > > Sent: Thursday, December 05, 2013 2:35 AM
> > > To: 'Michael Quicquaro'
> > > Cc: dev@dpdk.org
> > > Subject: Re: [dpdk-dev] generic load balancing
> > >
> > > Hi,
> > >
> > > As far as I can tell, this is really hardware dependent. Some hash
> > functions allow uplink and downlink packets of the same "session" to
> > go to the same queue (I know Chelsio can do this).
> > >
> > > For the Intel card, you may find what you want in:
> > > http://www.intel.com/content/www/us/en/ethernet-controllers/82599-10-gbe-controller-datasheet.html
> > >
> > > Other cards require NDA or other agreements to get details of RSS.
> > >
> > > If you have a performance problem, may I suggest you use kernel 3.10
> then
> > monitor system activity with "perf" command. For instance you can
> > start with "perf top -a" this will give you nice information. Then
> > your creativity will do the rest ;-) You may be surprised what comes
> > on the top hot points...
> > > (the most unexpected hot function I found here was Linux syscall
> > > gettimeofday!!!)
> > >
> > > François-Frédéric
> > >
> > >> -----Original Message-----
> > >> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Michael Quicquaro
> > >> Sent: Wednesday, December 4, 2013 18:53
> > >> To: dev@dpdk.org
> > >> Subject: [dpdk-dev] generic load balancing
> > >>
> > >> Hi all,
> > >> I am writing a dpdk application that will receive packets from one
> > >> interface and process them.  It does not forward packets in the
> > > traditional
> > >> sense.  However, I do need to process them at full line rate and
> > >> therefore need more than one core.  The packets can be somewhat
> > >> generic in nature
> > > and
> > >> can be nearly identical (especially at the beginning of the packet).
> > >> I've used the rxonly function of testpmd as a model.
> > >>
> > >> I've run into problems in processing a full line rate of data since
> > >> the nature of the data causes all the data to be presented to only
> > >> one
> > core.
> > > I
> > >> get a large percentage of dropped packets (shows up as Rx-Errors in
> > >> "port
> > >> stats") because of this.  I've tried modifying the data so that
> > >> packets have different UDP ports and that seems to work when I use
> > >> --rss-udp
> > >>
> > >> My questions are:
> > >> 1) Is there a way to configure RSS so that it alternates packets to
> > >> all configured cores regardless of the packet data?
> > >>
> > >> 2)  Where is the best place to learn more about RSS and how to
> > >> configure it? I have not found much in the DPDK documentation.
> > >>
> > >> Thanks for the help,
> > >> - Mike

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [dpdk-dev] generic load balancing
  2013-12-05 15:42             ` Michael Quicquaro
@ 2013-12-05 16:21               ` Thomas Monjalon
  2013-12-06  2:16                 ` 吴亚东
  0 siblings, 1 reply; 16+ messages in thread
From: Thomas Monjalon @ 2013-12-05 16:21 UTC (permalink / raw)
  To: Michael Quicquaro, Prashant Upadhyaya; +Cc: dev

Hello,

05/12/2013 16:42, Michael Quicquaro :
> This is a good discussion and I hope Intel can see and benefit from it.

Don't forget that this project is Open Source.
So you can submit your patches for review.

Thanks for participating
-- 
Thomas

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [dpdk-dev] generic load balancing
  2013-12-05 14:16           ` Prashant Upadhyaya
@ 2013-12-05 18:33             ` Benson, Bryan
  0 siblings, 0 replies; 16+ messages in thread
From: Benson, Bryan @ 2013-12-05 18:33 UTC (permalink / raw)
  To: Prashant Upadhyaya, Stephen Hemminger; +Cc: dev

Prashant,
1) I thought the same, but I was pleasantly surprised at how much a single core can RX and distribute (from a single 10G port). It was a while back, but in my experimentation with well distributed incoming flows, I found nearly identical bottleneck points between polling using one core and using RSS for multiple cores.
2) Here are a few links - The flex byte offset looks like it is global for each port, but the value it matches is part of each signature filter, as is the output queue.
Adding the sig:
http://dpdk.org/doc/api/rte__ethdev_8h.html#a4ef515ffe18b57bed5493bcea90f16d7
Filter definition:
http://dpdk.org/doc/api/structrte__fdir__filter.html

- Now, I have not yet used the flow director features, so YMMV. Also, I vaguely remember hearing a bit about a performance issue when using more than 4 output queues, due to the number of PCI-e lanes.  I don't recall the exact details, so if you see something like that when using more than 4 queues, don't be surprised (but let us know what you find!).
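
For reference, a minimal sketch of this approach against the DPDK 1.x fdir API linked above; the struct layout and return codes should be double-checked against the rte_fdir_filter documentation, and the flex values and queue numbers here are purely illustrative:

    #include <string.h>
    #include <rte_ethdev.h>

    /* Sketch: steer frames whose two flex bytes (taken at the port-global
     * flex offset) equal flex_value to the given RX queue.  One signature
     * filter is added per (value, queue) pair. */
    static int
    add_flex_sig_filter(uint8_t port_id, uint16_t flex_value, uint8_t rx_queue)
    {
        struct rte_fdir_filter f;

        memset(&f, 0, sizeof(f));
        f.flex_bytes = flex_value;

        return rte_eth_dev_fdir_add_signature_filter(port_id, &f, rx_queue);
    }

One call per distinct value, e.g. add_flex_sig_filter(0, 1, 0), add_flex_sig_filter(0, 2, 1), and so on; the flex byte offset itself is configured port-wide through the fdir_conf member of rte_eth_conf at device configure time.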

I hope this helps,
- Bryan

________________________________
From: Prashant Upadhyaya [prashant.upadhyaya@aricent.com]
Sent: Thursday, December 05, 2013 6:16 AM
To: Benson, Bryan; Stephen Hemminger
Cc: dev@dpdk.org
Subject: RE: [dpdk-dev] generic load balancing

Hi Bryan,

Regarding your 1st point, the single core becomes the RX bottleneck, which is clearly not desirable.

I am not sure how to use the stuff you mentioned in your 2nd point; is there some DPDK API which lets me configure this? Kindly let me know.

Regards
-Prashant


From: Benson, Bryan [mailto:bmbenson@amazon.com]
Sent: Thursday, December 05, 2013 1:14 PM
To: Prashant Upadhyaya; Stephen Hemminger
Cc: dev@dpdk.org
Subject: RE: [dpdk-dev] generic load balancing

Prashant,
I assume your use case is not one of varied IP/UDP/TCP traffic - or, if it is, you are dealing with a single tuple that is not evenly distributed.

You have a few options with the NIC that I can think of.

1) Use a single core to RX each port's frames and use your own software solution to RR to worker rings.  There is an example of this in the Load Balancer sample application.

2) If your packets/frames have an evenly distributed field in the first 64 bytes of the frame, you can use the 2 byte match feature of flow director to send to different queues (with multiple match signatures).  This will give even distribution, but not round robin behavior.

3) Modify the RSS redirection table for the NIC in the order you desire.  I am unsure how often this can happen, or if there are performance issues with reprogramming it.  Definitely would need some experimentation.

What is it you are trying to achieve with Round Robin?  A distribution of packets to multiple cores for processing, or something else?

Without knowing the use case, my main suggestion is to use the LB sample application - that way you can distribute in any way you please.
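
To make option 1 concrete, a minimal sketch of an RX core dealing bursts round-robin onto per-worker rings, in the spirit of the Load Balancer sample; ring setup is omitted and the sizing constants are illustrative:

    #include <rte_ethdev.h>
    #include <rte_mbuf.h>
    #include <rte_ring.h>

    #define BURST_SZ   32
    #define NB_WORKERS 4

    /* Per-worker rings, created elsewhere with rte_ring_create(). */
    extern struct rte_ring *worker_rings[NB_WORKERS];

    /* RX core: poll queue 0 of one port and spread packets round-robin
     * across the workers.  This deliberately ignores flow affinity, so
     * packets of one flow may be processed out of order downstream. */
    static void
    rx_loop(uint8_t port_id)
    {
        struct rte_mbuf *burst[BURST_SZ];
        unsigned int next = 0;
        uint16_t n, i;

        for (;;) {
            n = rte_eth_rx_burst(port_id, 0, burst, BURST_SZ);
            for (i = 0; i < n; i++) {
                /* Drop if the chosen worker ring is full. */
                if (rte_ring_enqueue(worker_rings[next], burst[i]) != 0)
                    rte_pktmbuf_free(burst[i]);
                next = (next + 1) % NB_WORKERS;
            }
        }
    }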

Thanks,
Bryan Benson


-------- Original message --------
From: Prashant Upadhyaya
Date: 12/04/2013 9:30 PM (GMT-08:00)
To: Stephen Hemminger
Cc: dev@dpdk.org
Subject: Re: [dpdk-dev] generic load balancing
Hi Stephen,

The awfulness depends upon the 'use case'.
I have, e.g., a use case where I want this round-robin behaviour.

I just want the NIC to give me a facility to use this.

Regards
-Prashant


-----Original Message-----
From: Stephen Hemminger [mailto:stephen@networkplumber.org]
Sent: Thursday, December 05, 2013 10:25 AM
To: Prashant Upadhyaya
Cc: François-Frédéric Ozog; Michael Quicquaro; dev@dpdk.org
Subject: Re: [dpdk-dev] generic load balancing

Round robin would actually be awful for any protocol because it would cause out-of-order packets.
That is why flow based algorithms like flow director and RSS work much better.

On Wed, Dec 4, 2013 at 8:31 PM, Prashant Upadhyaya <prashant.upadhyaya@aricent.com> wrote:
> Hi,
>
> It's a real pity that the Intel 82599 NIC (and possibly others) doesn't have simple round-robin scheduling of packets across the configured queues.
>
> I have requested this from Intel earlier, and I am using this forum to request it again -- please put this facility in the NIC: if I configure N queues and enable round-robin scheduling on them, the NIC should simply place received packets one by one on queue 1, then queue 2, ..., then queue N, and then back on queue 1.
> The above is very useful in a lot of load-balancing cases.
>
> Regards
> -Prashant
>
>
> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of François-Frédéric
> Ozog
> Sent: Thursday, December 05, 2013 2:35 AM
> To: 'Michael Quicquaro'
> Cc: dev@dpdk.org<mailto:dev@dpdk.org>
> Subject: Re: [dpdk-dev] generic load balancing
>
> Hi,
>
> As far as I can tell, this is really hardware dependent. Some hash functions allow uplink and downlink packets of the same "session" to go to the same queue (I know Chelsio can do this).
>
> For the Intel card, you may find what you want in:
> http://www.intel.com/content/www/us/en/ethernet-controllers/82599-10-gbe-controller-datasheet.html
>
> Other cards require NDA or other agreements to get details of RSS.
>
> If you have a performance problem, may I suggest you use kernel 3.10, then monitor system activity with the "perf" command. For instance you can start with "perf top -a"; this will give you nice information. Then your creativity will do the rest ;-) You may be surprised what comes out on top of the hot points...
> (the most unexpected hot function I found here was the Linux syscall
> gettimeofday!!!)
>
> François-Frédéric
>
>> -----Original Message-----
>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Michael Quicquaro
>> Sent: Wednesday, December 4, 2013 18:53
>> To: dev@dpdk.org
>> Subject: [dpdk-dev] generic load balancing
>>
>> Hi all,
>> I am writing a dpdk application that will receive packets from one
>> interface and process them.  It does not forward packets in the
>> traditional sense.  However, I do need to process them at full line
>> rate and therefore need more than one core.  The packets can be
>> somewhat generic in nature and can be nearly identical (especially at
>> the beginning of the packet).
>> I've used the rxonly function of testpmd as a model.
>>
>> I've run into problems in processing a full line rate of data since
>> the nature of the data causes all the data to be presented to only one
>> core.  I get a large percentage of dropped packets (shows up as
>> Rx-Errors in "port stats") because of this.  I've tried modifying the
>> data so that packets have different UDP ports and that seems to work
>> when I use --rss-udp
>>
>> My questions are:
>> 1) Is there a way to configure RSS so that it alternates packets to
>> all configured cores regardless of the packet data?
>>
>> 2)  Where is the best place to learn more about RSS and how to
>> configure it? I have not found much in the DPDK documentation.
>>
>> Thanks for the help,
>> - Mike

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [dpdk-dev] generic load balancing
  2013-12-05 16:21               ` Thomas Monjalon
@ 2013-12-06  2:16                 ` 吴亚东
  2013-12-06  4:03                   ` Prashant Upadhyaya
  0 siblings, 1 reply; 16+ messages in thread
From: 吴亚东 @ 2013-12-06  2:16 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev

RSS is a way to distribute packets across multiple cores while packet
order within a flow is still maintained.

Round-robin distribution of packets may cause out-of-order (OOO) delivery
of packets within the same flow.
We also hit this problem in the IPsec VPN case:
tunneled packets are RSS-hashed to the same queue if they belong to the
same tunnel, but if we then dispatch the packets to other cores for
processing, OOO packets may occur and TCP performance may be greatly hurt.

If you enable RSS on UDP packets and some UDP packets are IP-fragmented,
the RSS hash of the UDP fragments (calculated from the IP addresses only)
may differ from the RSS hash of the non-fragmented UDP packets (calculated
with the UDP port information), so OOO may occur there too.
That is why the kernel driver disables UDP RSS by default.

If Intel supports round-robin distribution of packets within the same
flow, Intel needs to provide some way, like Cavium's SSO (tag switch), to
maintain packet order within the flow. And that is hard to do, because
Intel's CPUs and NICs are decoupled.
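
To illustrate the fragmentation point: hashing on the IP addresses alone keeps fragments and whole datagrams of a flow on the same queue, at the cost of coarser spreading. A minimal port configuration sketch (RSS flag names vary between DPDK versions, so treat these as indicative):

    #include <rte_ethdev.h>

    /* L3-only RSS: UDP fragments (which carry no L4 header) and
     * non-fragmented datagrams hash identically, avoiding the
     * reordering described above. */
    static const struct rte_eth_conf port_conf = {
        .rxmode = {
            .mq_mode = ETH_MQ_RX_RSS,
        },
        .rx_adv_conf = {
            .rss_conf = {
                .rss_key = NULL,               /* default hash key */
                .rss_hf  = ETH_RSS_IPV4 | ETH_RSS_IPV6,
                                               /* no UDP port flags */
            },
        },
    };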





2013/12/6 Thomas Monjalon <thomas.monjalon@6wind.com>

> Hello,
>
> 05/12/2013 16:42, Michael Quicquaro :
> > This is a good discussion and I hope Intel can see and benefit from it.
>
> Don't forget that this project is Open Source.
> So you can submit your patches for review.
>
> Thanks for participating
> --
> Thomas
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [dpdk-dev] generic load balancing
  2013-12-06  2:16                 ` 吴亚东
@ 2013-12-06  4:03                   ` Prashant Upadhyaya
  2013-12-06  7:53                     ` François-Frédéric Ozog
  0 siblings, 1 reply; 16+ messages in thread
From: Prashant Upadhyaya @ 2013-12-06  4:03 UTC (permalink / raw)
  To: 吴亚东, Thomas Monjalon; +Cc: dev


Hi,

Regarding this point –

"If Intel supports round-robin distribution of packets in the same flow, Intel needs to provide some way like Cavium's SSO (tag switch) to maintain packet order in the same flow. And it is hard to do so because Intel's CPU and NIC are decoupled."

My main submission is – I understand there are issues like the above and the OOO stuff you pointed out.
But that is for the use case implementer to solve in software logic. The equivalent of tag switch can be attempted in software if the use case so desires (a minimal sketch of such ordering logic follows at the end of this message).
But at least 'give' the facility in the NIC to fan out round robin on the queues.
Somehow we are trying to find reasons why we should not have it.
I am saying: give it in the NIC and let people use it in innovative ways. People who don't want to use it can always choose not to use it.

Regards
-Prashant


From: 吴亚东 [mailto:ydwoo0722@gmail.com]
Sent: Friday, December 06, 2013 7:47 AM
To: Thomas Monjalon
Cc: Michael Quicquaro; Prashant Upadhyaya; dev@dpdk.org
Subject: Re: [dpdk-dev] generic load balancing

RSS is a way to distribute packets across multiple cores while packet order within a flow is still maintained.

Round-robin distribution of packets may cause out-of-order (OOO) delivery of packets within the same flow.
We also hit this problem in the IPsec VPN case:
tunneled packets are RSS-hashed to the same queue if they belong to the same tunnel, but if we then dispatch the packets to other cores for processing, OOO packets may occur and TCP performance may be greatly hurt.

If you enable RSS on UDP packets and some UDP packets are IP-fragmented, the RSS hash of the UDP fragments (calculated from the IP addresses only) may differ from the RSS hash of the non-fragmented UDP packets (calculated with the UDP port information), so OOO may occur there too.
That is why the kernel driver disables UDP RSS by default.

If Intel supports round-robin distribution of packets within the same flow, Intel needs to provide some way, like Cavium's SSO (tag switch), to maintain packet order within the flow. And that is hard to do, because Intel's CPUs and NICs are decoupled.




2013/12/6 Thomas Monjalon <thomas.monjalon@6wind.com>
Hello,

05/12/2013 16:42, Michael Quicquaro :
> This is a good discussion and I hope Intel can see and benefit from it.
Don't forget that this project is Open Source.
So you can submit your patches for review.

Thanks for participating
--
Thomas
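
A minimal sketch of the kind of software ordering logic argued for above, assuming the distributing core stamps each packet with a sequence number before fan-out and a single ordering core releases them in sequence; every name here is illustrative, not an existing DPDK API:

    #include <stdint.h>
    #include <stddef.h>
    #include <rte_mbuf.h>

    /* Window must be a power-of-two upper bound on in-flight packets,
     * otherwise slots can collide. */
    #define REORDER_WIN 1024

    struct seq_pkt {
        uint64_t seq;          /* stamped by the distributing core */
        struct rte_mbuf *pkt;
    };

    static struct rte_mbuf *window[REORDER_WIN];
    static uint64_t next_out;  /* next sequence number to release */

    /* Ordering core: called once per packet completed by any worker;
     * emits packets strictly in sequence-number order via tx(). */
    static void
    reorder_release(const struct seq_pkt *sp, void (*tx)(struct rte_mbuf *))
    {
        window[sp->seq % REORDER_WIN] = sp->pkt;

        /* Drain the contiguous run starting at next_out. */
        while (window[next_out % REORDER_WIN] != NULL) {
            tx(window[next_out % REORDER_WIN]);
            window[next_out % REORDER_WIN] = NULL;
            next_out++;
        }
    }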






^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [dpdk-dev] generic load balancing
  2013-12-06  4:03                   ` Prashant Upadhyaya
@ 2013-12-06  7:53                     ` François-Frédéric Ozog
  0 siblings, 0 replies; 16+ messages in thread
From: François-Frédéric Ozog @ 2013-12-06  7:53 UTC (permalink / raw)
  To: 'Prashant Upadhyaya', '吴亚东',
	'Thomas Monjalon'
  Cc: dev

Can we (as a community) be leading the way for the NIC vendors?

I mean, a few years ago I had this discussion with Chelsio about solving MPLS and GTP load balancing.
They were happy to integrate the "requirements" into their roadmap....

So could we build a list of such "requirements" and publish it? NIC vendors are looking for ways to differentiate from one another, so I assume this may help us get what we want.

In addition to the NIC requirements, we may polish an API to control those features in a standard way from DPDK.
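
As a strawman for what such a standard control surface might look like (entirely hypothetical; none of these names exist in DPDK):

    /* Hypothetical, for discussion only: a vendor-neutral way to select
     * the RX distribution policy of a port. */
    enum rx_dist_mode {
        RX_DIST_RSS,          /* flow-hash, order-preserving per flow */
        RX_DIST_ROUND_ROBIN,  /* strict per-packet rotation over queues */
        RX_DIST_FLEX_MATCH,   /* flex-byte match -> queue, fdir-style */
    };

    int rte_eth_dev_rx_dist_set(uint8_t port_id, enum rx_dist_mode mode);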


François-Frédéric


> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Prashant Upadhyaya
> Sent: Friday, December 6, 2013 05:04
> To: 吴亚东; Thomas Monjalon
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] generic load balancing
> 
> Hi,
> 
> Regarding this point –
> 
> "If Intel supports round-robin distribution of packets in the same
> flow, Intel needs to provide some way like Cavium's SSO (tag switch)
> to maintain packet order in the same flow. And it is hard to do so
> because Intel's CPU and NIC are decoupled."
> 
> My main submission is – I understand there are issues like the above
> and the OOO stuff you pointed out.
> But that is for the use case implementer to solve in software logic.
> The equivalent of tag switch can be attempted in software if the use
> case so desires.
> But at least 'give' the facility in the NIC to fan out round robin on
> the queues.
> Somehow we are trying to find reasons why we should not have it.
> I am saying: give it in the NIC and let people use it in innovative
> ways. People who don't want to use it can always choose not to use it.
> 
> Regards
> -Prashant
> 
> 
> From: 吴亚东 [mailto:ydwoo0722@gmail.com]
> Sent: Friday, December 06, 2013 7:47 AM
> To: Thomas Monjalon
> Cc: Michael Quicquaro; Prashant Upadhyaya; dev@dpdk.org
> Subject: Re: [dpdk-dev] generic load balancing
> 
> RSS is a way to distribute packets across multiple cores while packet
> order within a flow is still maintained.
> 
> Round-robin distribution of packets may cause out-of-order (OOO)
> delivery of packets within the same flow.
> We also hit this problem in the IPsec VPN case:
> tunneled packets are RSS-hashed to the same queue if they belong to the
> same tunnel, but if we then dispatch the packets to other cores for
> processing, OOO packets may occur and TCP performance may be greatly
> hurt.
> 
> If you enable RSS on UDP packets and some UDP packets are IP-fragmented,
> the RSS hash of the UDP fragments (calculated from the IP addresses
> only) may differ from the RSS hash of the non-fragmented UDP packets
> (calculated with the UDP port information), so OOO may occur there too.
> That is why the kernel driver disables UDP RSS by default.
> 
> If Intel supports round-robin distribution of packets within the same
> flow, Intel needs to provide some way, like Cavium's SSO (tag switch),
> to maintain packet order within the flow. And that is hard to do,
> because Intel's CPUs and NICs are decoupled.
> 
> 
> 
> 
> 2013/12/6 Thomas Monjalon
> <thomas.monjalon@6wind.com>>
> Hello,
> 
> 05/12/2013 16:42, Michael Quicquaro :
> > This is a good discussion and I hope Intel can see and benefit from it.
> Don't forget that this project is Open Source.
> So you can submit your patches for review.
> 
> Thanks for participating
> --
> Thomas

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2013-12-06  7:53 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-12-04 17:53 [dpdk-dev] generic load balancing Michael Quicquaro
2013-12-04 20:48 ` elevran
2013-12-04 21:04 ` François-Frédéric Ozog
2013-12-05  4:31   ` Prashant Upadhyaya
2013-12-05  4:54     ` Stephen Hemminger
2013-12-05  5:29       ` Prashant Upadhyaya
2013-12-05  7:44         ` Benson, Bryan
2013-12-05 14:16           ` Prashant Upadhyaya
2013-12-05 18:33             ` Benson, Bryan
2013-12-05  8:45         ` François-Frédéric Ozog
2013-12-05 14:29           ` Prashant Upadhyaya
2013-12-05 15:42             ` Michael Quicquaro
2013-12-05 16:21               ` Thomas Monjalon
2013-12-06  2:16                 ` 吴亚东
2013-12-06  4:03                   ` Prashant Upadhyaya
2013-12-06  7:53                     ` François-Frédéric Ozog

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).