From: Michael Quicquaro
Date: Thu, 5 Dec 2013 10:42:49 -0500
To: Prashant Upadhyaya
Cc: "dev@dpdk.org"
Subject: Re: [dpdk-dev] generic load balancing

This is a good discussion and I hope Intel can see and benefit from it.

For my use case, I don't necessarily need round robin at a per-packet
level, but simply some normalized distribution among the core queues that
has nothing to do with anything inside the packet.  A good solution could
perhaps be to allow the NIC to switch to another core's queue after a
certain number of packets have been received... perhaps using something
like the burst rate.

I just see this as being among the most fundamental pieces of
functionality, and it is lacking in the DPDK.  I'm sure there are many use
cases that don't involve routing/forwarding/switching/etc. but that of
course still need to maximize throughput.

- Mike
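
Something close to this can already be approximated in software: dedicate
one core to draining the single RX queue and have it hand each burst to the
next worker ring in strict rotation, independent of packet contents.  Below
is a minimal, untested sketch against the DPDK API of this era; BURST_SIZE,
NB_WORKERS and worker_rings are placeholder names, and note that
rte_ring_enqueue_burst() takes an extra argument in later DPDK releases.

#include <rte_ethdev.h>
#include <rte_mbuf.h>
#include <rte_ring.h>

#define BURST_SIZE 32
#define NB_WORKERS 4

/* One ring per worker core, created at init with rte_ring_create(). */
static struct rte_ring *worker_rings[NB_WORKERS];

static void
distribute_loop(uint8_t port_id)
{
        struct rte_mbuf *pkts[BURST_SIZE];
        unsigned int next_worker = 0;

        for (;;) {
                /* Drain the one queue the NIC steers everything to. */
                uint16_t nb_rx = rte_eth_rx_burst(port_id, 0, pkts,
                                                  BURST_SIZE);
                if (nb_rx == 0)
                        continue;

                /* Hand the whole burst to one worker, then rotate. */
                unsigned int sent =
                        rte_ring_enqueue_burst(worker_rings[next_worker],
                                               (void **)pkts, nb_rx);

                /* Drop whatever the ring could not absorb. */
                while (sent < nb_rx)
                        rte_pktmbuf_free(pkts[sent++]);

                next_worker = (next_worker + 1) % NB_WORKERS;
        }
}

Rotating per burst rather than per packet keeps the enqueue overhead low
and is roughly the "switch after a certain number of packets" behaviour
described above, at the cost of one core spent purely on distribution.
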
On Thu, Dec 5, 2013 at 9:29 AM, Prashant Upadhyaya <
prashant.upadhyaya@aricent.com> wrote:

> Hi,
>
> Well, GTP is the main use case.
> We end up with a GTP tunnel between the two machines.
> And ordinarily with the 82599, all the data will land on a single queue
> and therefore must be polled on a single core.  Bottleneck.
>
> But in general, if I want to employ the horsepower of all the CPU cores
> simultaneously to pick up the packets from the NIC, then it is natural
> that I give every core its own queue on the NIC, and if the NIC does a
> round robin then the traffic naturally fans out and I can use all the
> cores to lift packets from the NIC in a load-balanced fashion.
>
> Imagine a theoretical use case where I have to lift the packets from the
> NIC, inspect them myself in the application and then switch them to the
> right core for further processing.  So my cores have two jobs: one is to
> poll the NIC, and the other is to switch the packets to the right core.
> Here I would simply love to poll the queue and the intercore ring from
> each core to achieve the processing.  No single core will become the
> bottleneck as far as polling the NIC is concerned.  You might argue about
> the basis on which I switch to the relevant core for further processing,
> but that's _my_ use case, and it is my headache to distribute the load
> equally amongst the cores.
>
> Imagine an LTE use case where I am on the core network side (SGW): the
> packets come in over GTP from thousands of mobiles (via the eNB).  I can
> employ all the cores to pick up the GTP packets (if the NIC gives me
> round robin) and then, based on the inner IP packet's source IP address
> (the mobile's IP address), I can take each packet to the relevant core
> for further processing.  This way I get complete load balancing not only
> for polling from the NIC but also for processing of the inner IP packets.
>
> I have also worked a lot on Cavium processors.  Those of you who are
> familiar with them will know that the POW scheduler gives packets to
> whichever core is requesting work, so the packets can go to any core in a
> Cavium Octeon processor.  The only way to achieve similar functionality
> in DPDK is to give each core its own queue on the NIC and then let the
> NIC do round robin on those queues blindly.  What's the harm if this
> feature is added?  Let those who want to use it, use it, and let those
> who hate it or think it is useless ignore it.
>
> Regards
> -Prashant
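
To make the "switch to the relevant core" step concrete for the GTP case,
here is a rough, untested sketch that keys on the inner IPv4 source
address.  It assumes untagged Ethernet, an outer IPv4 header without
options, GTP-U on UDP port 2152 and no GTP sequence or extension fields;
NB_WORKERS and worker_rings are placeholder names.

#include <stdint.h>
#include <rte_byteorder.h>
#include <rte_mbuf.h>
#include <rte_ring.h>

#define NB_WORKERS 4
static struct rte_ring *worker_rings[NB_WORKERS];

static void
dispatch_gtp(struct rte_mbuf *m)
{
        const uint8_t *p = rte_pktmbuf_mtod(m, const uint8_t *);

        /* Outer headers: 14 (Ethernet, untagged) + 20 (IPv4, no options). */
        const uint8_t *udp = p + 14 + 20;

        /* GTP-U uses UDP destination port 2152. */
        uint16_t dport = rte_be_to_cpu_16(*(const uint16_t *)(udp + 2));
        if (dport != 2152) {
                rte_pktmbuf_free(m);    /* not GTP-U, not handled here */
                return;
        }

        /*
         * 8-byte UDP header plus the minimal 8-byte GTP-U header
         * (flags, message type, length, TEID); sequence number or
         * extension headers, if present, would add to this.
         */
        const uint8_t *inner_ip = udp + 8 + 8;

        /*
         * Inner IPv4 source address sits at offset 12 of the inner
         * header.  Unaligned reads like this are fine on x86; use
         * memcpy on stricter architectures.
         */
        uint32_t src = rte_be_to_cpu_32(*(const uint32_t *)(inner_ip + 12));

        /* The same mobile always maps to the same worker core. */
        unsigned int worker = src % NB_WORKERS;

        /* Drop if the worker's ring is full (with the old watermark
         * API, -EDQUOT would need separate handling). */
        if (rte_ring_enqueue(worker_rings[worker], m) != 0)
                rte_pktmbuf_free(m);
}

Because a given mobile always hashes to the same worker, per-flow ordering
is preserved even if the NIC sprayed the packets across the RX queues
blindly.
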
>
> -----Original Message-----
> From: François-Frédéric Ozog [mailto:ff@ozog.com]
> Sent: Thursday, December 05, 2013 2:16 PM
> To: Prashant Upadhyaya
> Cc: 'Michael Quicquaro'; 'Stephen Hemminger'; dev@dpdk.org
> Subject: RE: [dpdk-dev] generic load balancing
>
> Hi,
>
> If the traffic you manage is above MPLS or GTP encapsulations, then you
> can use cards that provide flexible hash functions.  The Chelsio cxgb5
> provides a combination of "offset", length and tuple that may help.
>
> The only reason I would have loved to get a pure round-robin feature was
> to pass certain "Breaking Point" (http://www.ixiacom.com/breakingpoint)
> tests where the traffic at issue was multicast from a single source...
> But that is not real-life traffic.
>
> If you could share the use case...
>
> François-Frédéric
>
> > -----Message d'origine-----
> > De : Prashant Upadhyaya [mailto:prashant.upadhyaya@aricent.com]
> > Envoyé : jeudi 5 décembre 2013 06:30
> > À : Stephen Hemminger
> > Cc : François-Frédéric Ozog; Michael Quicquaro; dev@dpdk.org
> > Objet : RE: [dpdk-dev] generic load balancing
> >
> > Hi Stephen,
> >
> > The awfulness depends upon the use case.
> > I have, for example, a use case where I want this round-robin
> > behaviour.
> >
> > I just want the NIC to give me a facility to use this.
> >
> > Regards
> > -Prashant
> >
> > -----Original Message-----
> > From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> > Sent: Thursday, December 05, 2013 10:25 AM
> > To: Prashant Upadhyaya
> > Cc: François-Frédéric Ozog; Michael Quicquaro; dev@dpdk.org
> > Subject: Re: [dpdk-dev] generic load balancing
> >
> > Round robin would actually be awful for any protocol because it would
> > cause out-of-order packets.
> > That is why flow-based algorithms like Flow Director and RSS work much
> > better.
> >
> > On Wed, Dec 4, 2013 at 8:31 PM, Prashant Upadhyaya
> > wrote:
> > > Hi,
> > >
> > > It's a real pity that the Intel 82599 NIC (and possibly others)
> > > doesn't have simple round-robin scheduling of packets on the
> > > configured queues.
> > >
> > > I have requested this from Intel earlier, and I am using this forum
> > > to request it again -- please, please put this facility in the NIC:
> > > if I configure N queues there and set the NIC for round-robin
> > > scheduling on the queues, then the NIC should simply put the received
> > > packets one by one on queue 1, then on queue 2, ..., then on queue N,
> > > and then back on queue 1.
> > > The above is very useful in a lot of load-balancing cases.
> > >
> > > Regards
> > > -Prashant
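
For reference, the per-core queue layout being asked for looks roughly like
this at the API level; the sketch is untested, most error handling is
trimmed, and NB_QUEUES, RX_DESC and mbuf_pool are placeholder names.  The
missing piece is exactly what the request is about: the 82599 still decides
which of those queues a packet lands on via RSS or Flow Director, since it
offers no blind round-robin mode.

#include <stdint.h>
#include <string.h>
#include <rte_ethdev.h>
#include <rte_lcore.h>
#include <rte_mbuf.h>

#define NB_QUEUES 4     /* one RX queue per worker core */
#define RX_DESC   512
#define BURST     32

/* Configure the port with one RX queue per core and start it. */
static int
setup_port(uint8_t port_id, struct rte_mempool *mbuf_pool)
{
        struct rte_eth_conf conf;
        uint16_t q;

        memset(&conf, 0, sizeof(conf));   /* RSS etc. would be set here */
        if (rte_eth_dev_configure(port_id, NB_QUEUES, 1, &conf) < 0)
                return -1;

        for (q = 0; q < NB_QUEUES; q++) {
                /* NULL rxconf means driver defaults on newer DPDK;
                 * older releases want an explicit rte_eth_rxconf. */
                if (rte_eth_rx_queue_setup(port_id, q, RX_DESC,
                                rte_eth_dev_socket_id(port_id),
                                NULL, mbuf_pool) < 0)
                        return -1;
        }
        return rte_eth_dev_start(port_id);
}

/* Launched on every worker lcore (e.g. via rte_eal_remote_launch());
 * each lcore polls only its own queue. */
static int
rx_loop(void *arg)
{
        uint8_t port_id = *(uint8_t *)arg;
        /* A real application keeps an explicit lcore-to-queue table. */
        uint16_t queue_id = rte_lcore_id() % NB_QUEUES;
        struct rte_mbuf *pkts[BURST];
        uint16_t nb, i;

        for (;;) {
                nb = rte_eth_rx_burst(port_id, queue_id, pkts, BURST);
                for (i = 0; i < nb; i++)
                        rte_pktmbuf_free(pkts[i]);   /* real work goes here */
        }
        return 0;
}
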
> > >
> > > -----Original Message-----
> > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of
> > > François-Frédéric Ozog
> > > Sent: Thursday, December 05, 2013 2:35 AM
> > > To: 'Michael Quicquaro'
> > > Cc: dev@dpdk.org
> > > Subject: Re: [dpdk-dev] generic load balancing
> > >
> > > Hi,
> > >
> > > As far as I can tell, this is really hardware dependent.  Some hash
> > > functions allow uplink and downlink packets of the same "session" to
> > > go to the same queue (I know Chelsio can do this).
> > >
> > > For the Intel card, you may find what you want in:
> > > http://www.intel.com/content/www/us/en/ethernet-controllers/82599-10-gbe-controller-datasheet.html
> > >
> > > Other cards require an NDA or other agreements to get the details of
> > > RSS.
> > >
> > > If you have a performance problem, may I suggest you use kernel 3.10
> > > and then monitor system activity with the "perf" command.  For
> > > instance, you can start with "perf top -a"; this will give you nice
> > > information.  Then your creativity will do the rest ;-)  You may be
> > > surprised by what shows up as the top hot points...
> > > (the most unexpected hot function I found here was the Linux syscall
> > > gettimeofday!!!)
> > >
> > > François-Frédéric
> > >
> > >> -----Message d'origine-----
> > >> De : dev [mailto:dev-bounces@dpdk.org] De la part de Michael
> > >> Quicquaro
> > >> Envoyé : mercredi 4 décembre 2013 18:53
> > >> À : dev@dpdk.org
> > >> Objet : [dpdk-dev] generic load balancing
> > >>
> > >> Hi all,
> > >> I am writing a DPDK application that will receive packets from one
> > >> interface and process them.  It does not forward packets in the
> > >> traditional sense.  However, I do need to process them at full line
> > >> rate and therefore need more than one core.  The packets can be
> > >> somewhat generic in nature and can be nearly identical (especially
> > >> at the beginning of the packet).  I've used the rxonly function of
> > >> testpmd as a model.
> > >>
> > >> I've run into problems processing a full line rate of data, since
> > >> the nature of the data causes all of it to be presented to only one
> > >> core.  I get a large percentage of dropped packets (it shows up as
> > >> Rx-Errors in "port stats") because of this.  I've tried modifying
> > >> the data so that packets have different UDP ports, and that seems to
> > >> work when I use --rss-udp.
> > >>
> > >> My questions are:
> > >> 1) Is there a way to configure RSS so that it alternates packets to
> > >> all configured cores regardless of the packet data?
> > >>
> > >> 2) Where is the best place to learn more about RSS and how to
> > >> configure it?  I have not found much in the DPDK documentation.
> > >>
> > >> Thanks for the help,
> > >> - Mike
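
On question 1): RSS by construction hashes packet fields, so it cannot
alternate queues regardless of the data; the closest workaround is to
include as many varying fields as possible in the hash, which is what
--rss-udp does.  At the API level that corresponds roughly to the untested
sketch below; the exact rss_hf flag names differ between DPDK releases
(e.g. ETH_RSS_IPV4_UDP versus ETH_RSS_NONFRAG_IPV4_UDP), so treat them as
placeholders for whatever your version's rte_ethdev.h defines.

#include <string.h>
#include <rte_ethdev.h>

/* Port configuration that spreads flows by RSS and includes the UDP
 * ports in the hash (the API-level equivalent of testpmd's --rss-udp). */
static void
rss_port_conf(struct rte_eth_conf *conf)
{
        memset(conf, 0, sizeof(*conf));
        conf->rxmode.mq_mode = ETH_MQ_RX_RSS;       /* enable RSS steering */
        conf->rx_adv_conf.rss_conf.rss_key = NULL;  /* keep driver default key */
        /* Flag names vary by DPDK release; include UDP in the hash. */
        conf->rx_adv_conf.rss_conf.rss_hf = ETH_RSS_IPV4 | ETH_RSS_IPV4_UDP;
}

The filled-in struct is then passed to rte_eth_dev_configure().  For
question 2), the 82599 datasheet linked earlier in the thread describes
which header fields the hardware can hash, alongside the rte_eth_rss_conf
comments in rte_ethdev.h.
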