From: Michael Quicquaro
Date: Thu, 5 Dec 2013 10:42:49 -0500
To: Prashant Upadhyaya
Cc: "dev@dpdk.org"
Subject: Re: [dpdk-dev] generic load balancing

This is a good discussion and I hope Intel can see and benefit from it.

For my use case, I don't necessarily need round robin at a per-packet
level, but simply some normalized distribution among the core queues that
has nothing to do with anything inside the packet.  A good solution could
perhaps be to allow the NIC to switch to another core's queue after a
certain number of packets have been received... perhaps using something
like the burst rate.

I just see this as being among the most fundamental pieces of
functionality, and it is lacking in the DPDK.  I'm sure there are many use
cases that don't involve routing/forwarding/switching/etc. but that of
course still need to maximize throughput.

- Mike
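
Something close to this can already be approximated in software: dedicate
one core to draining the single RX queue and have it hand each burst to the
next worker ring in strict rotation, independent of packet contents.  Below
is a minimal, untested sketch against the DPDK API of this era; BURST_SIZE,
NB_WORKERS and worker_rings are placeholder names, and note that
rte_ring_enqueue_burst() takes an extra argument in later DPDK releases.

#include <rte_ethdev.h>
#include <rte_mbuf.h>
#include <rte_ring.h>

#define BURST_SIZE 32
#define NB_WORKERS 4

/* One ring per worker core, created at init with rte_ring_create(). */
static struct rte_ring *worker_rings[NB_WORKERS];

static void
distribute_loop(uint8_t port_id)
{
        struct rte_mbuf *pkts[BURST_SIZE];
        unsigned int next_worker = 0;

        for (;;) {
                /* Drain the one queue the NIC steers everything to. */
                uint16_t nb_rx = rte_eth_rx_burst(port_id, 0, pkts,
                                                  BURST_SIZE);
                if (nb_rx == 0)
                        continue;

                /* Hand the whole burst to one worker, then rotate. */
                unsigned int sent =
                        rte_ring_enqueue_burst(worker_rings[next_worker],
                                               (void **)pkts, nb_rx);

                /* Drop whatever the ring could not absorb. */
                while (sent < nb_rx)
                        rte_pktmbuf_free(pkts[sent++]);

                next_worker = (next_worker + 1) % NB_WORKERS;
        }
}

Rotating per burst rather than per packet keeps the enqueue overhead low
and is roughly the "switch after a certain number of packets" behaviour
described above, at the cost of one core spent purely on distribution.
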
On Thu, Dec 5, 2013 at 9:29 AM, Prashant Upadhyaya <
prashant.upadhyaya@aricent.com> wrote:

> Hi,
>
> Well, GTP is the main use case.
> We end up with a GTP tunnel between the two machines.
> And ordinarily with the 82599, all the data will land on a single queue
> and therefore must be polled on a single core.  Bottleneck.
>
> But in general, if I want to employ the horsepower of all the CPU cores
> simultaneously to pick up the packets from the NIC, then it is natural
> that I give every core its own queue on the NIC, and if the NIC does a
> round robin then the traffic naturally fans out and I can use all the
> cores to lift packets from the NIC in a load-balanced fashion.
>
> Imagine a theoretical use case where I have to lift the packets from the
> NIC, inspect them myself in the application and then switch them to the
> right core for further processing.  So my cores have two jobs: one is to
> poll the NIC, and the other is to switch the packets to the right core.
> Here I would simply love to poll the queue and the intercore ring from
> each core to achieve the processing.  No single core will become the
> bottleneck as far as polling the NIC is concerned.  You might argue about
> the basis on which I switch to the relevant core for further processing,
> but that's _my_ use case, and it is my headache to distribute the load
> equally amongst the cores.
>
> Imagine an LTE use case where I am on the core network side (SGW): the
> packets come in over GTP from thousands of mobiles (via the eNB).  I can
> employ all the cores to pick up the GTP packets (if the NIC gives me
> round robin) and then, based on the inner IP packet's source IP address
> (the mobile's IP address), I can take each packet to the relevant core
> for further processing.  This way I get complete load balancing not only
> for polling from the NIC but also for processing of the inner IP packets.
>
> I have also worked a lot on Cavium processors.  Those of you who are
> familiar with them will know that the POW scheduler gives packets to
> whichever core is requesting work, so the packets can go to any core in a
> Cavium Octeon processor.  The only way to achieve similar functionality
> in DPDK is to give each core its own queue on the NIC and then let the
> NIC do round robin on those queues blindly.  What's the harm if this
> feature is added?  Let those who want to use it, use it, and let those
> who hate it or think it is useless ignore it.
>
> Regards
> -Prashant
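
To make the "switch to the relevant core" step concrete for the GTP case,
here is a rough, untested sketch that keys on the inner IPv4 source
address.  It assumes untagged Ethernet, an outer IPv4 header without
options, GTP-U on UDP port 2152 and no GTP sequence or extension fields;
NB_WORKERS and worker_rings are placeholder names.

#include <stdint.h>
#include <rte_byteorder.h>
#include <rte_mbuf.h>
#include <rte_ring.h>

#define NB_WORKERS 4
static struct rte_ring *worker_rings[NB_WORKERS];

static void
dispatch_gtp(struct rte_mbuf *m)
{
        const uint8_t *p = rte_pktmbuf_mtod(m, const uint8_t *);

        /* Outer headers: 14 (Ethernet, untagged) + 20 (IPv4, no options). */
        const uint8_t *udp = p + 14 + 20;

        /* GTP-U uses UDP destination port 2152. */
        uint16_t dport = rte_be_to_cpu_16(*(const uint16_t *)(udp + 2));
        if (dport != 2152) {
                rte_pktmbuf_free(m);    /* not GTP-U, not handled here */
                return;
        }

        /*
         * 8-byte UDP header plus the minimal 8-byte GTP-U header
         * (flags, message type, length, TEID); sequence number or
         * extension headers, if present, would add to this.
         */
        const uint8_t *inner_ip = udp + 8 + 8;

        /*
         * Inner IPv4 source address sits at offset 12 of the inner
         * header.  Unaligned reads like this are fine on x86; use
         * memcpy on stricter architectures.
         */
        uint32_t src = rte_be_to_cpu_32(*(const uint32_t *)(inner_ip + 12));

        /* The same mobile always maps to the same worker core. */
        unsigned int worker = src % NB_WORKERS;

        /* Drop if the worker's ring is full (with the old watermark
         * API, -EDQUOT would need separate handling). */
        if (rte_ring_enqueue(worker_rings[worker], m) != 0)
                rte_pktmbuf_free(m);
}

Because a given mobile always hashes to the same worker, per-flow ordering
is preserved even if the NIC sprayed the packets across the RX queues
blindly.
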
>
> -----Original Message-----
> From: François-Frédéric Ozog [mailto:ff@ozog.com]
> Sent: Thursday, December 05, 2013 2:16 PM
> To: Prashant Upadhyaya
> Cc: 'Michael Quicquaro'; 'Stephen Hemminger'; dev@dpdk.org
> Subject: RE: [dpdk-dev] generic load balancing
>
> Hi,
>
> If the traffic you manage is above MPLS or GTP encapsulations, then you
> can use cards that provide flexible hash functions.  The Chelsio cxgb5
> provides a combination of "offset", length and tuple that may help.
>
> The only reason I would have loved to get a pure round-robin feature was
> to pass certain "Breaking Point" (http://www.ixiacom.com/breakingpoint)
> tests where the traffic at issue was multicast from a single source...
> But that is not real-life traffic.
>
> If you could share the use case...
>
> François-Frédéric
>
> > -----Message d'origine-----
> > De : Prashant Upadhyaya [mailto:prashant.upadhyaya@aricent.com]
> > Envoyé : jeudi 5 décembre 2013 06:30
> > À : Stephen Hemminger
> > Cc : François-Frédéric Ozog; Michael Quicquaro; dev@dpdk.org
> > Objet : RE: [dpdk-dev] generic load balancing
> >
> > Hi Stephen,
> >
> > The awfulness depends upon the use case.
> > I have, for example, a use case where I want this round-robin
> > behaviour.
> >
> > I just want the NIC to give me a facility to use this.
> >
> > Regards
> > -Prashant
> >
> > -----Original Message-----
> > From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> > Sent: Thursday, December 05, 2013 10:25 AM
> > To: Prashant Upadhyaya
> > Cc: François-Frédéric Ozog; Michael Quicquaro; dev@dpdk.org
> > Subject: Re: [dpdk-dev] generic load balancing
> >
> > Round robin would actually be awful for any protocol because it would
> > cause out-of-order packets.
> > That is why flow-based algorithms like Flow Director and RSS work much
> > better.
> >
> > On Wed, Dec 4, 2013 at 8:31 PM, Prashant Upadhyaya
> > wrote:
> > > Hi,
> > >
> > > It's a real pity that the Intel 82599 NIC (and possibly others)
> > > doesn't have simple round-robin scheduling of packets on the
> > > configured queues.
> > >
> > > I have requested this from Intel earlier, and I am using this forum
> > > to request it again -- please, please put this facility in the NIC:
> > > if I configure N queues there and set the NIC for round-robin
> > > scheduling on the queues, then the NIC should simply put the received
> > > packets one by one on queue 1, then on queue 2, ..., then on queue N,
> > > and then back on queue 1.
> > > The above is very useful in a lot of load-balancing cases.
> > >
> > > Regards
> > > -Prashant
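
For reference, the per-core queue layout being asked for looks roughly like
this at the API level; the sketch is untested, most error handling is
trimmed, and NB_QUEUES, RX_DESC and mbuf_pool are placeholder names.  The
missing piece is exactly what the request is about: the 82599 still decides
which of those queues a packet lands on via RSS or Flow Director, since it
offers no blind round-robin mode.

#include <stdint.h>
#include <string.h>
#include <rte_ethdev.h>
#include <rte_lcore.h>
#include <rte_mbuf.h>

#define NB_QUEUES 4     /* one RX queue per worker core */
#define RX_DESC   512
#define BURST     32

/* Configure the port with one RX queue per core and start it. */
static int
setup_port(uint8_t port_id, struct rte_mempool *mbuf_pool)
{
        struct rte_eth_conf conf;
        uint16_t q;

        memset(&conf, 0, sizeof(conf));   /* RSS etc. would be set here */
        if (rte_eth_dev_configure(port_id, NB_QUEUES, 1, &conf) < 0)
                return -1;

        for (q = 0; q < NB_QUEUES; q++) {
                /* NULL rxconf means driver defaults on newer DPDK;
                 * older releases want an explicit rte_eth_rxconf. */
                if (rte_eth_rx_queue_setup(port_id, q, RX_DESC,
                                rte_eth_dev_socket_id(port_id),
                                NULL, mbuf_pool) < 0)
                        return -1;
        }
        return rte_eth_dev_start(port_id);
}

/* Launched on every worker lcore (e.g. via rte_eal_remote_launch());
 * each lcore polls only its own queue. */
static int
rx_loop(void *arg)
{
        uint8_t port_id = *(uint8_t *)arg;
        /* A real application keeps an explicit lcore-to-queue table. */
        uint16_t queue_id = rte_lcore_id() % NB_QUEUES;
        struct rte_mbuf *pkts[BURST];
        uint16_t nb, i;

        for (;;) {
                nb = rte_eth_rx_burst(port_id, queue_id, pkts, BURST);
                for (i = 0; i < nb; i++)
                        rte_pktmbuf_free(pkts[i]);   /* real work goes here */
        }
        return 0;
}
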
> > >
> > > -----Original Message-----
> > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of
> > > François-Frédéric Ozog
> > > Sent: Thursday, December 05, 2013 2:35 AM
> > > To: 'Michael Quicquaro'
> > > Cc: dev@dpdk.org
> > > Subject: Re: [dpdk-dev] generic load balancing
> > >
> > > Hi,
> > >
> > > As far as I can tell, this is really hardware dependent.  Some hash
> > > functions allow uplink and downlink packets of the same "session" to
> > > go to the same queue (I know Chelsio can do this).
> > >
> > > For the Intel card, you may find what you want in:
> > > http://www.intel.com/content/www/us/en/ethernet-controllers/82599-10-gbe-controller-datasheet.html
> > >
> > > Other cards require an NDA or other agreements to get the details of
> > > RSS.
> > >
> > > If you have a performance problem, may I suggest you use kernel 3.10
> > > and then monitor system activity with the "perf" command.  For
> > > instance, you can start with "perf top -a"; this will give you nice
> > > information.  Then your creativity will do the rest ;-)  You may be
> > > surprised by what shows up as the top hot points...
> > > (the most unexpected hot function I found here was the Linux syscall
> > > gettimeofday!!!)
> > >
> > > François-Frédéric
> > >
> > >> -----Message d'origine-----
> > >> De : dev [mailto:dev-bounces@dpdk.org] De la part de Michael
> > >> Quicquaro
> > >> Envoyé : mercredi 4 décembre 2013 18:53
> > >> À : dev@dpdk.org
> > >> Objet : [dpdk-dev] generic load balancing
> > >>
> > >> Hi all,
> > >> I am writing a DPDK application that will receive packets from one
> > >> interface and process them.  It does not forward packets in the
> > >> traditional sense.  However, I do need to process them at full line
> > >> rate and therefore need more than one core.  The packets can be
> > >> somewhat generic in nature and can be nearly identical (especially
> > >> at the beginning of the packet).  I've used the rxonly function of
> > >> testpmd as a model.
> > >>
> > >> I've run into problems processing a full line rate of data, since
> > >> the nature of the data causes all of it to be presented to only one
> > >> core.  I get a large percentage of dropped packets (it shows up as
> > >> Rx-Errors in "port stats") because of this.  I've tried modifying
> > >> the data so that packets have different UDP ports, and that seems to
> > >> work when I use --rss-udp.
> > >>
> > >> My questions are:
> > >> 1) Is there a way to configure RSS so that it alternates packets to
> > >> all configured cores regardless of the packet data?
> > >>
> > >> 2) Where is the best place to learn more about RSS and how to
> > >> configure it?  I have not found much in the DPDK documentation.
> > >>
> > >> Thanks for the help,
> > >> - Mike
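
On question 1): RSS by construction hashes packet fields, so it cannot
alternate queues regardless of the data; the closest workaround is to
include as many varying fields as possible in the hash, which is what
--rss-udp does.  At the API level that corresponds roughly to the untested
sketch below; the exact rss_hf flag names differ between DPDK releases
(e.g. ETH_RSS_IPV4_UDP versus ETH_RSS_NONFRAG_IPV4_UDP), so treat them as
placeholders for whatever your version's rte_ethdev.h defines.

#include <string.h>
#include <rte_ethdev.h>

/* Port configuration that spreads flows by RSS and includes the UDP
 * ports in the hash (the API-level equivalent of testpmd's --rss-udp). */
static void
rss_port_conf(struct rte_eth_conf *conf)
{
        memset(conf, 0, sizeof(*conf));
        conf->rxmode.mq_mode = ETH_MQ_RX_RSS;       /* enable RSS steering */
        conf->rx_adv_conf.rss_conf.rss_key = NULL;  /* keep driver default key */
        /* Flag names vary by DPDK release; include UDP in the hash. */
        conf->rx_adv_conf.rss_conf.rss_hf = ETH_RSS_IPV4 | ETH_RSS_IPV4_UDP;
}

The filled-in struct is then passed to rte_eth_dev_configure().  For
question 2), the 82599 datasheet linked earlier in the thread describes
which header fields the hardware can hash, alongside the rte_eth_rss_conf
comments in rte_ethdev.h.
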