DPDK patches and discussions
From: Michael Quicquaro <michael.quicquaro@gmail.com>
To: Prashant Upadhyaya <prashant.upadhyaya@aricent.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>
Subject: Re: [dpdk-dev] generic load balancing
Date: Thu, 5 Dec 2013 10:42:49 -0500	[thread overview]
Message-ID: <CAAD-K95OAw2zAhW5J16CD79HBK4dkeofG1CVudyMBx2SoVC48Q@mail.gmail.com> (raw)
In-Reply-To: <C7CE7EEF248E2B48BBA63D0ABEEE700C5353AEF970@GUREXMB01.ASIAN.AD.ARICENT.COM>

This is a good discussion and I hope Intel can see and benefit from it.
For my use case, I don't necessarily need round robin at the per-packet
level, just some reasonably even distribution across the cores' queues that
depends on nothing inside the packet.  One possible solution would be to let
the NIC switch to another core's queue after a certain number of packets
have been received, perhaps tied to the burst size.  I see this as one of
the most fundamental pieces of functionality, and it is lacking in the DPDK.
I'm sure there are many use cases that don't involve
routing/forwarding/switching/etc. but still need to maximize throughput.
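
In the absence of such a NIC feature, the closest software approximation I
can think of is a dedicated RX core that hands whole bursts to per-worker
rings in rotation.  A minimal sketch, assuming a single RX queue and
hypothetical NB_WORKERS / worker_ring[] objects created elsewhere with
rte_ring_create():

    #include <rte_ethdev.h>
    #include <rte_mbuf.h>
    #include <rte_ring.h>

    #define NB_WORKERS 4        /* hypothetical number of worker cores */
    #define BURST      32       /* packets pulled per poll */

    extern struct rte_ring *worker_ring[NB_WORKERS];  /* one ring per worker */

    /* Dedicated RX core: pull a burst from queue 0 and hand the whole burst
     * to the next worker in strict rotation, ignoring packet contents.
     * (Ring calls use the old 3-argument form; newer DPDK releases add a
     * free_space argument to the burst enqueue.) */
    static void
    rx_spray_loop(uint8_t port)
    {
        struct rte_mbuf *pkts[BURST];
        unsigned int next = 0, sent;
        uint16_t nb;

        for (;;) {
            nb = rte_eth_rx_burst(port, 0, pkts, BURST);
            if (nb == 0)
                continue;
            sent = rte_ring_enqueue_burst(worker_ring[next], (void **)pkts, nb);
            while (sent < nb)                 /* ring full: drop the remainder */
                rte_pktmbuf_free(pkts[sent++]);
            next = (next + 1) % NB_WORKERS;
        }
    }

It burns a core and gives up packet order across workers, which is exactly
the objection Stephen raises further down the thread, but it needs nothing
from the NIC.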

- Mike


On Thu, Dec 5, 2013 at 9:29 AM, Prashant Upadhyaya <prashant.upadhyaya@aricent.com> wrote:

> Hi,
>
> Well, GTP is the main use case.
> We end up with a GTP tunnel between the two machines.
> And ordinarily with the 82599, all the data lands on a single queue and
> therefore must be polled by a single core. Bottleneck.
>
> But in general, if I want to employ all the CPU cores' horsepower
> simultaneously to pick up the packets from the NIC, then it is natural that
> I configure one queue per core in the NIC; if the NIC does a round robin
> across those queues, the traffic naturally fans out and I can use all the
> cores to lift packets from the NIC in a load-balanced fashion.
>
> Imagine a theoretical use case where I have to lift the packets from the
> NIC, inspect them myself in the application and then switch them to the right
> core for further processing. So my cores have two jobs: poll the NIC, and
> switch the packets to the right core. Here I would simply love to poll the
> queue and the inter-core ring from each core to achieve the processing. No
> single core becomes the bottleneck as far as polling the NIC is concerned.
> You might argue about the basis on which I switch to the relevant core for
> further processing, but that's _my_ use case, and it is my headache to
> distribute the work equally amongst the cores.
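
A minimal sketch of such a per-core loop, assuming each lcore owns the RX
queue and the ring matching its lcore id, and with pick_worker() and
process_pkt() as hypothetical application hooks:

    #include <rte_ethdev.h>
    #include <rte_lcore.h>
    #include <rte_mbuf.h>
    #include <rte_ring.h>

    #define BURST 32

    extern struct rte_ring *core_ring[RTE_MAX_LCORE]; /* one ring per core */
    unsigned int pick_worker(struct rte_mbuf *m);     /* application policy */
    void process_pkt(struct rte_mbuf *m);             /* the real work */

    /* Every core does both jobs: lift packets from its own NIC queue and
     * switch them to the owning core's ring, then drain its own ring and
     * process.  (Old 3-argument dequeue; newer DPDK adds an extra argument.) */
    static int
    lcore_main(void *arg)
    {
        uint8_t port = *(uint8_t *)arg;
        uint16_t queue = rte_lcore_id();      /* assumes queue id == lcore id */
        struct rte_mbuf *pkts[BURST];
        unsigned int i, nb;

        for (;;) {
            nb = rte_eth_rx_burst(port, queue, pkts, BURST);
            for (i = 0; i < nb; i++) {
                unsigned int dst = pick_worker(pkts[i]);
                if (rte_ring_enqueue(core_ring[dst], pkts[i]) != 0)
                    rte_pktmbuf_free(pkts[i]);        /* ring full: drop */
            }
            nb = rte_ring_dequeue_burst(core_ring[queue], (void **)pkts, BURST);
            for (i = 0; i < nb; i++)
                process_pkt(pkts[i]);
        }
        return 0;
    }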
>
> Imagine an LTE use case where I am on the core side (SGW), and the packets
> come over GTP from thousands of mobiles (via eNB). I can employ all the
> cores to pick up the GTP packets (if the NIC gives me round robin) and then,
> based on the inner IP packet's source IP address (the mobile's IP address),
> pass each packet on to the relevant core for processing. This way I get
> complete load balancing, not only for polling from the NIC but also for
> processing of the inner IP packets.
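
The classification step itself is small.  A sketch, assuming untagged
Ethernet, an outer IPv4 header without options, GTP-U on UDP port 2152 with a
plain 8-byte header and no extension headers, and a hypothetical NB_WORKERS;
struct names are the pre-19.x DPDK ones (newer releases prefix them with
rte_):

    #include <rte_byteorder.h>
    #include <rte_ether.h>
    #include <rte_ip.h>
    #include <rte_mbuf.h>
    #include <rte_udp.h>

    #define GTPU_PORT  2152
    #define NB_WORKERS 4                       /* hypothetical */

    /* Pick a worker core for a GTP-U packet based on the inner (mobile)
     * source IP address. */
    static unsigned int
    gtp_inner_worker(struct rte_mbuf *m)
    {
        struct ether_hdr *eth = rte_pktmbuf_mtod(m, struct ether_hdr *);
        struct ipv4_hdr *oip = (struct ipv4_hdr *)(eth + 1);
        struct udp_hdr *udp = (struct udp_hdr *)
            ((char *)oip + ((oip->version_ihl & 0x0f) << 2));

        if (udp->dst_port != rte_cpu_to_be_16(GTPU_PORT))
            return 0;                          /* not GTP-U: default worker */

        /* inner IP starts right after the 8-byte GTP-U header */
        struct ipv4_hdr *iip = (struct ipv4_hdr *)((char *)(udp + 1) + 8);
        return rte_be_to_cpu_32(iip->src_addr) % NB_WORKERS;
    }

IPv6 inner packets, VLAN tags or GTP extension headers would of course need
more parsing.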
>
> I have also worked a lot on Cavium processors. Those of you who are
> familiar with them will know that the POW scheduler gives packets to
> whichever core asks for work, so packets can go to any core in a Cavium
> Octeon processor. The only way to achieve similar functionality in DPDK is
> to configure a queue per core in the NIC and then let the NIC do round robin
> on those queues blindly. What's the harm in adding this feature? Let those
> who want it use it, and let those who hate it or think it is useless ignore
> it.
>
> Regards
> -Prashant
>
> -----Original Message-----
> From: François-Frédéric Ozog [mailto:ff@ozog.com]
> Sent: Thursday, December 05, 2013 2:16 PM
> To: Prashant Upadhyaya
> Cc: 'Michael Quicquaro'; 'Stephen Hemminger'; dev@dpdk.org
> Subject: RE: [dpdk-dev] generic load balancing
>
> Hi,
>
> If the traffic you manage is carried above MPLS or GTP encapsulation, then
> you can use cards that provide flexible hash functions. Chelsio cxgb5
> provides a combination of "offset", length and tuple that may help.
>
> The only reason I would have loved to get a pure round robin feature was
> to pass certain "Breaking Point" (http://www.ixiacom.com/breakingpoint)
>  tests where the traffic issue was multicast from a single source... But
> that is not real life traffic.
>
> If you could share the use case...
>
> François-Frédéric
>
> > -----Original Message-----
> > From: Prashant Upadhyaya [mailto:prashant.upadhyaya@aricent.com]
> > Sent: Thursday, December 5, 2013 06:30
> > To: Stephen Hemminger
> > Cc: François-Frédéric Ozog; Michael Quicquaro; dev@dpdk.org
> > Subject: RE: [dpdk-dev] generic load balancing
> >
> > Hi Stephen,
> >
> > The awfulness depends upon the use case.
> > I have, e.g., a use case where I want this round-robin behaviour.
> >
> > I just want the NIC to give me a facility to use this.
> >
> > Regards
> > -Prashant
> >
> >
> > -----Original Message-----
> > From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> > Sent: Thursday, December 05, 2013 10:25 AM
> > To: Prashant Upadhyaya
> > Cc: François-Frédéric Ozog; Michael Quicquaro; dev@dpdk.org
> > Subject: Re: [dpdk-dev] generic load balancing
> >
> > Round robin would actually be awful for any protocol because it would
> > cause out-of-order packets.
> > That is why flow based algorithms like flow director and RSS work much
> > better.
> >
> > On Wed, Dec 4, 2013 at 8:31 PM, Prashant Upadhyaya
> > <prashant.upadhyaya@aricent.com> wrote:
> > > Hi,
> > >
> > > It's a real pity that the Intel 82599 NIC (and possibly others) doesn't
> > > have simple round-robin scheduling of packets on the configured queues.
> > >
> > > I have requested this from Intel earlier, and I am using this forum to
> > > request it again -- please put this facility in the NIC: if I configure
> > > N queues and enable round-robin scheduling on them, the NIC should
> > > simply put the received packets one by one on queue 1, then on queue 2,
> > > ..., then on queue N, and then back on queue 1.
> > > The above is very useful in a lot of load-balancing cases.
> > >
> > > Regards
> > > -Prashant
> > >
> > >
> > > -----Original Message-----
> > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of
> > > François-Frédéric Ozog
> > > Sent: Thursday, December 05, 2013 2:35 AM
> > > To: 'Michael Quicquaro'
> > > Cc: dev@dpdk.org
> > > Subject: Re: [dpdk-dev] generic load balancing
> > >
> > > Hi,
> > >
> > > As far as I can tell, this is really hardware dependent. Some hash
> > > functions allow uplink and downlink packets of the same "session" to
> > > go to the same queue (I know Chelsio can do this).
> > >
> > > For the Intel card, you may find what you want in:
> > > http://www.intel.com/content/www/us/en/ethernet-controllers/82599-10-gbe-controller-datasheet.html
> > >
> > > Other cards require NDA or other agreements to get details of RSS.
> > >
> > > If you have a performance problem, may I suggest you use kernel 3.10 and
> > > then monitor system activity with the "perf" command. For instance you
> > > can start with "perf top -a"; this will give you nice information. Then
> > > your creativity will do the rest ;-) You may be surprised what comes out
> > > as the top hot spots...
> > > (the most unexpected hot function I found here was the Linux syscall
> > > gettimeofday!!!)
> > >
> > > François-Frédéric
> > >
> > >> -----Original Message-----
> > >> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Michael Quicquaro
> > >> Sent: Wednesday, December 4, 2013 18:53
> > >> To: dev@dpdk.org
> > >> Subject: [dpdk-dev] generic load balancing
> > >>
> > >> Hi all,
> > >> I am writing a dpdk application that will receive packets from one
> > >> interface and process them.  It does not forward packets in the
> > >> traditional sense.  However, I do need to process them at full line rate
> > >> and therefore need more than one core.  The packets can be somewhat
> > >> generic in nature and can be nearly identical (especially at the
> > >> beginning of the packet).  I've used the rxonly function of testpmd as a
> > >> model.
> > >>
> > >> I've run into problems processing a full line rate of data, since the
> > >> nature of the data causes all of it to be presented to only one core.
> > >> I get a large percentage of dropped packets (shows up as Rx-Errors in
> > >> "port stats") because of this.  I've tried modifying the data so that
> > >> packets have different UDP ports, and that seems to work when I use
> > >> --rss-udp.
> > >>
> > >> My questions are:
> > >> 1) Is there a way to configure RSS so that it alternates packets to
> > >> all configured cores regardless of the packet data?
> > >>
> > >> 2)  Where is the best place to learn more about RSS and how to
> > >> configure it? I have not found much in the DPDK documentation.
> > >>
> > >> Thanks for the help,
> > >> - Mike
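
On question 2, the programmatic side is roughly the sketch below; the exact
ETH_RSS_* flag names vary between DPDK releases, so check rte_ethdev.h for
your version:

    #include <rte_ethdev.h>

    /* Spread RX over 4 queues, hashing on IP addresses and UDP ports. */
    static const struct rte_eth_conf port_conf = {
        .rxmode = {
            .mq_mode = ETH_MQ_RX_RSS,
        },
        .rx_adv_conf = {
            .rss_conf = {
                .rss_key = NULL,               /* use the default key */
                .rss_hf  = ETH_RSS_IP | ETH_RSS_UDP,
            },
        },
    };

    static int
    configure_rss(uint8_t port_id)
    {
        /* 4 RX queues, 1 TX queue; set each RX queue up afterwards with
         * rte_eth_rx_queue_setup() and poll one queue per core. */
        return rte_eth_dev_configure(port_id, 4, 1, &port_conf);
    }

From testpmd, something along the lines of "testpmd -c 0xf -n 4 -- --rxq=4
--txq=4 --rss-udp --forward-mode=rxonly" exercises the same path (options
quoted from memory, so check testpmd --help).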
> > >
> >
>
>

Thread overview: 16+ messages
2013-12-04 17:53 Michael Quicquaro
2013-12-04 20:48 ` elevran
2013-12-04 21:04 ` François-Frédéric Ozog
2013-12-05  4:31   ` Prashant Upadhyaya
2013-12-05  4:54     ` Stephen Hemminger
2013-12-05  5:29       ` Prashant Upadhyaya
2013-12-05  7:44         ` Benson, Bryan
2013-12-05 14:16           ` Prashant Upadhyaya
2013-12-05 18:33             ` Benson, Bryan
2013-12-05  8:45         ` François-Frédéric Ozog
2013-12-05 14:29           ` Prashant Upadhyaya
2013-12-05 15:42             ` Michael Quicquaro [this message]
2013-12-05 16:21               ` Thomas Monjalon
2013-12-06  2:16                 ` 吴亚东
2013-12-06  4:03                   ` Prashant Upadhyaya
2013-12-06  7:53                     ` François-Frédéric Ozog
