From: "Benson, Bryan"
To: Prashant Upadhyaya, Stephen Hemminger
Cc: "dev@dpdk.org"
Date: Thu, 5 Dec 2013 18:33:06 +0000
Subject: Re:
[dpdk-dev] generic load balancing

Prashant,

1) I thought the same, but I was pleasantly surprised at how much a single core can RX and distribute (from a single 10G port). It was a while back, but in my experimentation with well distributed incoming flows, I found nearly identical bottleneck points between polling using one core and using RSS for multiple cores.

2) Here are a few links. The flex byte offset looks like it is global for each port, but the value it matches is part of each filter's signature, as is the output port.

Adding the sig:
http://dpdk.org/doc/api/rte__ethdev_8h.html#a4ef515ffe18b57bed5493bcea90f16d7

Filter definition:
http://dpdk.org/doc/api/structrte__fdir__filter.html

Now, I have not yet used the flow director features, so YMMV. Also, I vaguely remember hearing a bit about a performance issue when using more than 4 output queues, due to the number of PCI-e lanes. I don't recall the exact details, so if you see something like that when using more than 4 queues, don't be surprised (but let us know what you find!).

I hope this helps,
- Bryan

________________________________
From: Prashant Upadhyaya [prashant.upadhyaya@aricent.com]
Sent: Thursday, December 05, 2013 6:16 AM
To: Benson, Bryan; Stephen Hemminger
Cc: dev@dpdk.org
Subject: RE: [dpdk-dev] generic load balancing

Hi Bryan,

Regarding your first point, the single core becomes the RX bottleneck, which is clearly not desirable.

I am not sure how to use what you mentioned in your second point. Is there a DPDK API that lets me configure this? Kindly let me know.

Regards
-Prashant

From: Benson, Bryan [mailto:bmbenson@amazon.com]
Sent: Thursday, December 05, 2013 1:14 PM
To: Prashant Upadhyaya; Stephen Hemminger
Cc: dev@dpdk.org
Subject: RE: [dpdk-dev] generic load balancing

Prashant,

I assume your use case is not of one IP/UDP/TCP - or if it is, you are dealing with a single tuple that is not evenly distributed.

You have a few options with the NIC that I can think of.

1) Use a single core to RX each port's frames and use your own software solution to RR to worker rings. There is an example of this in the Load Balancer sample application.

2) If your packets/frames have an evenly distributed field in the first 64 bytes of the frame, you can use the 2 byte match feature of flow director to send to different queues (with multiple match signatures). This will give even distribution, but not round robin behavior.

3) Modify the RSS redirection table for the NIC in the order you desire. I am unsure how often this can happen, or if there are performance issues with reprogramming it. Definitely would need some experimentation.

What is it you are trying to achieve with Round Robin? A distribution of packets to multiple cores for processing, or something else?

Without knowing the use case, my main suggestion is to use the LB sample application - that way you can distribute in any way you please.

Thanks,
Bryan Benson

-------- Original message --------
From: Prashant Upadhyaya
Date: 12/04/2013 9:30 PM (GMT-08:00)
To: Stephen Hemminger
Cc: dev@dpdk.org
Subject: Re: [dpdk-dev] generic load balancing

Hi Stephen,

The awfulness depends upon the use case.
I have, e.g., a use case where I want this round-robin behaviour.

I just want the NIC to give me a facility to use this.

Regards
-Prashant

-----Original Message-----
From: Stephen Hemminger [mailto:stephen@networkplumber.org]
Sent: Thursday, December 05, 2013 10:25 AM
To: Prashant Upadhyaya
Cc: François-Frédéric Ozog; Michael Quicquaro; dev@dpdk.org
Subject: Re: [dpdk-dev] generic load balancing

Round robin would actually be awful for any protocol because it would cause out of order packets.
That is why flow based algorithms like flow director and RSS work much better.

On Wed, Dec 4, 2013 at 8:31 PM, Prashant Upadhyaya wrote:
> Hi,
>
> It's a real pity that Intel 82599 NIC (and possibly others) don't have a simple round robin scheduling of packets on the configured queues.
>
> I have requested Intel earlier, and using this forum requesting again -- please please put this facility in the NIC that if I drop N queues there and configure the NIC for some round robin scheduling on queues, then NIC should simply put the received packets one by one on queue 1, then on queue2,....,then on queueN, and then back on queue 1.
> The above is very useful in lot of load balancing cases.
>
> Regards
> -Prashant
>
>
> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of François-Frédéric Ozog
> Sent: Thursday, December 05, 2013 2:35 AM
> To: 'Michael Quicquaro'
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] generic load balancing
>
> Hi,
>
> As far as I can tell, this is really hardware dependent. Some hash functions allow uplink and downlink packets of the same "session" to go to the same queue (I know Chelsio can do this).
>
> For the Intel card, you may find what you want in:
> http://www.intel.com/content/www/us/en/ethernet-controllers/82599-10-gbe-controller-datasheet.html
>
> Other cards require NDA or other agreements to get details of RSS.
>
> If you have a performance problem, may I suggest you use kernel 3.10 then monitor system activity with "perf" command.
> For instance you can start with "perf top -a" this will give you nice information. Then your creativity will do the rest ;-) You may be surprised what comes on the top hot points...
> (the most unexpected hot function I found here was Linux syscall gettimeofday!!!)
>
> François-Frédéric
>
>> -----Original Message-----
>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Michael Quicquaro
>> Sent: Wednesday, 4 December 2013 18:53
>> To: dev@dpdk.org
>> Subject: [dpdk-dev] generic load balancing
>>
>> Hi all,
>> I am writing a dpdk application that will receive packets from one interface and process them. It does not forward packets in the traditional sense. However, I do need to process them at full line rate and therefore need more than one core. The packets can be somewhat generic in nature and can be nearly identical (especially at the beginning of the packet). I've used the rxonly function of testpmd as a model.
>>
>> I've run into problems in processing a full line rate of data since the nature of the data causes all the data to be presented to only one core. I get a large percentage of dropped packets (shows up as Rx-Errors in "port stats") because of this. I've tried modifying the data so that packets have different UDP ports and that seems to work when I use --rss-udp
>>
>> My questions are:
>> 1) Is there a way to configure RSS so that it alternates packets to all configured cores regardless of the packet data?
>>
>> 2) Where is the best place to learn more about RSS and how to configure it? I have not found much in the DPDK documentation.
>>
>> Thanks for the help,
>> - Mike
>
> ===============================================================================
> Please refer to
> http://www.aricent.com/legal/email_disclaimer.html
> for important disclosures regarding this electronic communication.
> ===============================================================================
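
The trade-off argued over in this thread (round robin versus flow-based distribution) can be sketched with a small toy model. This is not DPDK code and does not touch a NIC: the packet tuples, the function names, and the use of CRC32 as a stand-in for the NIC's RSS hash are all assumptions made for illustration. It shows three things under those assumptions: round robin balances queue depth but spreads one flow across several queues (which is where reordering can come from), a flow hash keeps each flow on one queue, and a flow hash sends everything to one queue when all packets look alike, which matches the symptom Mike describes.

```python
from collections import defaultdict
import zlib


def round_robin(packets, n_queues):
    """Place packet i on queue i % n_queues, ignoring packet contents."""
    queues = [[] for _ in range(n_queues)]
    for i, pkt in enumerate(packets):
        queues[i % n_queues].append(pkt)
    return queues


def flow_hash(packets, n_queues):
    """Place each packet by a hash of its flow key (CRC32 is a stand-in,
    not the hash a real NIC uses), so one flow always maps to one queue."""
    queues = [[] for _ in range(n_queues)]
    for pkt in packets:
        flow, _seq = pkt
        idx = zlib.crc32(repr(flow).encode()) % n_queues
        queues[idx].append(pkt)
    return queues


def queues_per_flow(queues):
    """For each flow, return the set of queue indices that saw its packets."""
    seen = defaultdict(set)
    for idx, queue in enumerate(queues):
        for flow, _seq in queue:
            seen[flow].add(idx)
    return dict(seen)


# Two hypothetical flows keyed by (src_port, dst_port), interleaved,
# four packets each; the second tuple element is a per-flow sequence number.
packets = [((1000 + (i % 2), 53), i // 2) for i in range(8)]
rr = round_robin(packets, 4)
fh = flow_hash(packets, 4)

# Eight packets that all share one flow key, as in the original question.
identical = [((9, 9), i) for i in range(8)]
fh_identical = flow_hash(identical, 4)
rr_identical = round_robin(identical, 4)

if __name__ == "__main__":
    print("round robin, queues per flow:", queues_per_flow(rr))
    print("flow hash,   queues per flow:", queues_per_flow(fh))
    print("flow hash, identical packets:", [len(q) for q in fh_identical])
    print("round robin, identical packets:", [len(q) for q in rr_identical])
```

In this model the software round-robin function plays the role of the single RX core feeding worker rings that Bryan suggests; whether the resulting cross-queue ordering matters depends on the use case, as Prashant points out.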