From: Prashant Upadhyaya <prashant.upadhyaya@aricent.com>
To: François-Frédéric Ozog <ff@ozog.com>
Cc: dev@dpdk.org
Date: Thu, 5 Dec 2013 19:59:34 +0530
Subject: Re: [dpdk-dev] generic load balancing

Hi,

Well, GTP is the main usecase. We end up with a GTP tunnel between the two
machines, and ordinarily, with the 82599, all the data lands on a single
queue and therefore must be polled by a single core. Bottleneck.

But in general, if I want to employ the horsepower of all the CPU cores
simultaneously to pick up packets from the NIC, then it is natural to drop
one queue per core into the NIC; if the NIC does a round robin, the traffic
naturally fans out and I can use all the cores to lift packets from the NIC
in a load-balanced fashion.

Imagine a theoretical usecase where I have to lift the packets from the
NIC, inspect them myself in the application, and then switch them to the
right core for further processing. So my cores have two jobs: poll the NIC,
and switch the packets to the right core. Here I would simply love to poll
both the queue and the inter-core ring from each core. No single core
becomes the bottleneck as far as polling the NIC is concerned. You might
ask on what basis I switch packets to the relevant core, but that is _my_
usecase and _my_ headache to distribute equally amongst the cores.

Imagine an LTE usecase where I am on the core network side (SGW) and the
packets come over GTP from thousands of mobiles (via the eNB). I can employ
all the cores to pick up the GTP packets (if the NIC gives me round robin)
and then, based on the inner IP packet's source address (the mobile's IP),
take each one to the relevant core for processing. This way I get complete
load balancing, not only for polling the NIC but also for processing the
inner IP packets.
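Roughly, each core would run a loop like the one below. This is a sketch
only: core_ring, extract_flow_ip and process_packet are placeholder names,
not DPDK APIs, and it assumes RX queue i on port 0 is owned by lcore i.

    #include <stdint.h>
    #include <rte_ethdev.h>
    #include <rte_ring.h>
    #include <rte_mbuf.h>
    #include <rte_lcore.h>

    #define BURST 32

    extern struct rte_ring *core_ring[RTE_MAX_LCORE]; /* one ring per core */
    extern uint32_t extract_flow_ip(struct rte_mbuf *m); /* e.g. inner GTP IP */
    extern void process_packet(struct rte_mbuf *m);      /* app-specific work */

    static int lcore_main(void *arg)
    {
        const uint16_t queue_id = (uint16_t)(uintptr_t)arg; /* queue i == lcore i */
        const unsigned int nb_cores = rte_lcore_count();
        struct rte_mbuf *bufs[BURST];
        void *obj;

        for (;;) {
            /* job 1: poll this core's own NIC queue and dispatch by flow key */
            uint16_t nb = rte_eth_rx_burst(0, queue_id, bufs, BURST);
            for (uint16_t i = 0; i < nb; i++) {
                unsigned int dst = extract_flow_ip(bufs[i]) % nb_cores;
                if (rte_ring_enqueue(core_ring[dst], bufs[i]) < 0)
                    rte_pktmbuf_free(bufs[i]); /* ring full: drop */
            }
            /* job 2: drain this core's own inter-core ring and do the work */
            while (rte_ring_dequeue(core_ring[queue_id], &obj) == 0)
                process_packet((struct rte_mbuf *)obj);
        }
        return 0;
    }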
I have also worked a lot on Cavium processors. Those of you who are
familiar with them would know that the POW scheduler gives packets to
whichever core asks for work, so packets can go to any core on a Cavium
Octeon processor. The only way to achieve similar functionality in DPDK is
to drop one queue per core into the NIC and then let the NIC round-robin
over those queues blindly. What's the harm in adding this feature? Let
those who want to use it, use it, and let those who think it is useless
ignore it.
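In the meantime, the closest software approximation I can see is to burn
one core on RX and spray the packets over per-worker rings in strict
rotation, something like the sketch below (worker_ring and NB_WORKERS are
placeholder names, not DPDK APIs; and note the rotation gives up per-flow
ordering, which is exactly Stephen's objection):

    #include <rte_ethdev.h>
    #include <rte_ring.h>
    #include <rte_mbuf.h>

    #define NB_WORKERS 4
    #define BURST 32

    extern struct rte_ring *worker_ring[NB_WORKERS];

    static void rx_round_robin(uint16_t port)
    {
        struct rte_mbuf *bufs[BURST];
        unsigned int next = 0;

        for (;;) {
            uint16_t nb = rte_eth_rx_burst(port, 0, bufs, BURST);
            for (uint16_t i = 0; i < nb; i++) {
                /* packet n goes to worker n % NB_WORKERS, blindly */
                if (rte_ring_enqueue(worker_ring[next], bufs[i]) < 0)
                    rte_pktmbuf_free(bufs[i]); /* ring full: drop */
                next = (next + 1) % NB_WORKERS;
            }
        }
    }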
Regards
-Prashant

-----Original Message-----
From: François-Frédéric Ozog [mailto:ff@ozog.com]
Sent: Thursday, December 05, 2013 2:16 PM
To: Prashant Upadhyaya
Cc: 'Michael Quicquaro'; 'Stephen Hemminger'; dev@dpdk.org
Subject: RE: [dpdk-dev] generic load balancing

Hi,

If the traffic you manage is above MPLS or GTP encapsulations, then you can
use cards that provide flexible hash functions. Chelsio cxgb5 provides a
combination of "offset", length and tuple that may help.

The only reason I would have loved a pure round-robin feature was to pass
certain "Breaking Point" (http://www.ixiacom.com/breakingpoint) tests where
the traffic was multicast from a single source... But that is not real-life
traffic.

If you could share the use case...

François-Frédéric

> -----Original Message-----
> From: Prashant Upadhyaya [mailto:prashant.upadhyaya@aricent.com]
> Sent: Thursday, December 05, 2013 06:30
> To: Stephen Hemminger
> Cc: François-Frédéric Ozog; Michael Quicquaro; dev@dpdk.org
> Subject: RE: [dpdk-dev] generic load balancing
>
> Hi Stephen,
>
> The awfulness depends upon the usecase.
> I have, e.g., a usecase where I want this round-robin behaviour.
>
> I just want the NIC to give me a facility to use this.
>
> Regards
> -Prashant
>
>
> -----Original Message-----
> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> Sent: Thursday, December 05, 2013 10:25 AM
> To: Prashant Upadhyaya
> Cc: François-Frédéric Ozog; Michael Quicquaro; dev@dpdk.org
> Subject: Re: [dpdk-dev] generic load balancing
>
> Round robin would actually be awful for any protocol because it would
> cause out-of-order packets.
> That is why flow-based algorithms like Flow Director and RSS work much
> better.
>
> On Wed, Dec 4, 2013 at 8:31 PM, Prashant Upadhyaya
> <prashant.upadhyaya@aricent.com> wrote:
> > Hi,
> >
> > It's a real pity that the Intel 82599 NIC (and possibly others) doesn't
> > have simple round-robin scheduling of packets on the configured queues.
> >
> > I have requested this from Intel earlier, and I am using this forum to
> > request it again -- please put this facility in the NIC: if I drop N
> > queues there and configure the NIC for round-robin scheduling on the
> > queues, then the NIC should simply put the received packets one by one
> > on queue 1, then on queue 2, ..., then on queue N, and then back on
> > queue 1.
> > The above is very useful in a lot of load-balancing cases.
> >
> > Regards
> > -Prashant
> >
> >
> > -----Original Message-----
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of
> > François-Frédéric Ozog
> > Sent: Thursday, December 05, 2013 2:35 AM
> > To: 'Michael Quicquaro'
> > Cc: dev@dpdk.org
> > Subject: Re: [dpdk-dev] generic load balancing
> >
> > Hi,
> >
> > As far as I can tell, this is really hardware dependent. Some hash
> > functions allow uplink and downlink packets of the same "session" to
> > go to the same queue (I know Chelsio can do this).
> >
> > For the Intel card, you may find what you want in:
> > http://www.intel.com/content/www/us/en/ethernet-controllers/82599-10-gbe-controller-datasheet.html
> >
> > Other cards require an NDA or other agreements to get details of RSS.
> >
> > If you have a performance problem, may I suggest you use kernel 3.10
> > and monitor system activity with the "perf" command. For instance, you
> > can start with "perf top -a"; this will give you nice information.
> > Then your creativity will do the rest ;-) You may be surprised what
> > comes out as the top hot spots...
> > (the most unexpected hot function I found that way was the Linux
> > syscall gettimeofday!!!)
> >
> > François-Frédéric
> >
> >> -----Original Message-----
> >> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Michael
> >> Quicquaro
> >> Sent: Wednesday, December 04, 2013 18:53
> >> To: dev@dpdk.org
> >> Subject: [dpdk-dev] generic load balancing
> >>
> >> Hi all,
> >> I am writing a dpdk application that will receive packets from one
> >> interface and process them. It does not forward packets in the
> >> traditional sense. However, I do need to process them at full line
> >> rate and therefore need more than one core. The packets can be
> >> somewhat generic in nature and can be nearly identical (especially at
> >> the beginning of the packet). I've used the rxonly function of
> >> testpmd as a model.
> >>
> >> I've run into problems processing a full line rate of data, since the
> >> nature of the data causes all of it to be presented to only one core.
> >> I get a large percentage of dropped packets (they show up as
> >> Rx-Errors in "port stats") because of this. I've tried modifying the
> >> data so that packets have different UDP ports, and that seems to work
> >> when I use --rss-udp.
> >>
> >> My questions are:
> >> 1) Is there a way to configure RSS so that it alternates packets to
> >> all configured cores regardless of the packet data?
> >>
> >> 2) Where is the best place to learn more about RSS and how to
> >> configure it? I have not found much in the DPDK documentation.
> >>
> >> Thanks for the help,
> >> - Mike
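P.S. On Mike's second question: testpmd's --rss-udp corresponds roughly to
a port configuration like the sketch below. Treat it as an approximation
only; the RSS flag names have varied across DPDK releases (later ones spell
the UDP flag ETH_RSS_NONFRAG_IPV4_UDP, for instance).

    #include <rte_ethdev.h>

    static const struct rte_eth_conf port_conf = {
        .rxmode = {
            .mq_mode = ETH_MQ_RX_RSS,  /* hash-distribute RX across queues */
        },
        .rx_adv_conf = {
            .rss_conf = {
                .rss_key = NULL,       /* NULL keeps the driver's default key */
                .rss_hf  = ETH_RSS_IPV4 | ETH_RSS_IPV4_UDP, /* hash UDP ports too */
            },
        },
    };

    /* one RX queue per worker core; identical flows still hash to one queue */
    static int setup_port(uint16_t port_id, uint16_t nb_rx_queues)
    {
        return rte_eth_dev_configure(port_id, nb_rx_queues, 1, &port_conf);
    }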