From: Ori Kam
To: Ivan Malov
CC: Stephen Hemminger, "NBU-Contact-Thomas Monjalon (EXTERNAL)", "NBU-Contact-Adrien Mazarguil (EXTERNAL)", "dev@dpdk.org", Andrew Rybchenko
Subject: RE: Understanding Flow API action RSS
Date: Mon, 10 Jan 2022 17:18:44 +0000
List-Id: DPDK patches and discussions

Hi Ivan,

> -----Original Message-----
> From: Ivan Malov
> Subject: RE: Understanding Flow API action RSS
>
> Hi Ori,
>
> Many-many thanks for your commentary.
>
> The nature of 'queue' array in flow action RSS is clear now.
> I hope PMD vendors and API users share this vision, too.
> Probably, this should be properly documented.
> We'll see what we can do in that direction.
>
> Please see one more question below.
>
> On Mon, 10 Jan 2022, Ori Kam wrote:
>
> > Hi Ivan,
> >
> >> -----Original Message-----
> >> From: Ivan Malov
> >> Sent: Sunday, January 9, 2022 3:03 PM
> >> Subject: RE: Understanding Flow API action RSS
> >>
> >> Hi Ori,
> >>
> >> On Sun, 9 Jan 2022, Ori Kam wrote:
> >>
> >>> Hi Stephen and Ivan
> >>>
> >>>> -----Original Message-----
> >>>> From: Stephen Hemminger
> >>>> Sent: Tuesday, January 4, 2022 11:56 PM
> >>>> Subject: Re: Understanding Flow API action RSS
> >>>>
> >>>> On Tue, 4 Jan 2022 21:29:14 +0300 (MSK)
> >>>> Ivan Malov wrote:
> >>>>
> >>>>> Hi Stephen,
> >>>>>
> >>>>> On Tue, 4 Jan 2022, Stephen Hemminger wrote:
> >>>>>
> >>>>>> On Tue, 04 Jan 2022 13:41:55 +0100
> >>>>>> Thomas Monjalon wrote:
> >>>>>>
> >>>>>>> +Cc Ori Kam, rte_flow maintainer
> >>>>>>>
> >>>>>>> 29/12/2021 15:34, Ivan Malov:
> >>>>>>>> Hi all,
> >>>>>>>>
> >>>>>>>> In 'rte_flow.h', there is 'struct rte_flow_action_rss'. In it, 'queue' is
> >>>>>>>> to provide "Queue indices to use". But it is unclear whether the order of
> >>>>>>>> elements is meaningful or not. Does that matter? Can queue indices repeat?
> >>>>>>
> >>>>>> The order probably doesn't matter, it is like the RSS indirection table.
> >>>>>
> >>>>> Sorry, but RSS indirection table (RETA) assumes some structure. In it,
> >>>>> queue indices can repeat, and the order is meaningful. In DPDK, RETA
> >>>>> may comprise multiple "groups", each one comprising 64 entries.
> >>>>>
> >>>>> This 'queue' array in flow action RSS does not stick with the same
> >>>>> terminology, it does not reuse the definition of RETA "group", etc.
> >>>>> Just "queue indices to use". No definition of order, no structure.
> >>>>>
> >>>>> The API contract is not clear. Neither to users, nor to PMDs.
> >>>>>
> >>>> From API in RSS the queues are simply the queue ID, order doesn't matter,
> >>> Duplicating the queue may affect the spread based on the HW/PMD.
> >>> In common case each queue should appear only once and the PMD may duplicate
> >>> entries to get the best performance.
> >>
> >> Look. In a DPDK PMD, one has "global" RSS table. Consider the following
> >> example: 0, 0, 1, 1, 2, 2, 3, 3 ... and so on. As you may see, queue
> >> indices may repeat. They may have different order: 1, 1, 0, 0, ... .
> >> The order is of great importance. If you send a packet to a
> >> DPDK-powered server, you can know in advance its hash value.
> >> Hence, you may strictly predict which RSS table entry this
> >> hash will point at. That predicts the target Rx queue.
> >>
> >> So the questions which one should attempt to clarify, are as follows:
> >> 1) Is the 'queue' array ordered? (Does the order of elements matter?)
> >> 2) Can its elements repeat? (*allowed* or *not allowed*?)
> >>
> >> From API point of view the array is ordered, and may have repeating elements.
> >
> >>>
> >>>>>>
> >>>>>>    rx queue = RSS_indirection_table[ RSS_hash_value % RSS_indirection_table_size ]
> >>>>>>
> >>>>>> So you could play with multiple queues matching same hash value, but that
> >>>>>> would be uncommon.
> >>>>>>
> >>>>>>>> An ethdev may have "global" RSS setting with an indirection table of some
> >>>>>>>> fixed size (say, 512). In what comes to flow rules, does that size matter?
> >>>>>>
> >>>>>> Global RSS is only used if the incoming packet does not match any rte_flow
> >>>>>> action. If there is a RTE_FLOW_ACTION_TYPE_QUEUE or RTE_FLOW_ACTION_TYPE_RSS
> >>>>>> these take precedence.
> >>>>>
> >>>>> Yes, I know all of that. The question is how does the PMD select RETA size
> >>>>> for this action? Can it select an arbitrary value? Or should it stick with
> >>>>> the "global" one (eg. 512)? How does the user know the table size?
> >>>>>
> >>>>> If the user simply wants to spread traffic across the given queues,
> >>>>> the effective table size is a don't care to them, and the existing
> >>>>> API contract is fine. But if the user expects that certain packets
> >>>>> hit some precise queues, they need to know the table size for that.
> >>>>>
> >>> Just like you said RSS simply spreads the traffic to the given queues.
> >>
> >> Yes, to the given queues. The question is whether the 'queue' array
> >> has RETA properties (order matters; elements can repeat) or not.
> >>
> >
> > Yes order matters and elements can repeat.
> >
> >>> If application wants to send traffic to some queue it should use the queue action.
> >>
> >> Yes, but that's not what I mean. Consider the following example. The user
> >> generates packets with random IP addresses at machine A. These packets
> >> hit DPDK at machine B. For a given *packet*, the sender (A) can
> >> compute its RSS hash in software. This will point out the RETA
> >> entry index. But, in order to predict the exact *queue* index,
> >> the sender has to know the table (its contents, its size).
> >>
> > Why does the application need this info?
> >
> >> For a "global" DPDK RSS setting, the table can be easily obtained with
> >> an ethdev callback / API. Very simple. Fixed-size table, and it can
> >> be queried. But how does one obtain similar knowledge for RSS action?
> >>
> > The RSS action was designed to allow balanced traffic spread.
> > The size of the reta is PMD dependent; in some PMDs the size will be
> > the number of queues, in others it will be the number of queues but in
> > power of 2, so if the app requested 8 queues the reta will also be 8.
> > In any case PMD should use the given order; if the PMD needs to expand
> > it should cycle on the application requested queues in the order they were given.
> >
> >
> >>>
> >>>>> So, the question is whether the users should or should not build
> >>>>> any expectations of the effective table size and, if they should,
> >>>>> are they supposed to use the "global" table size for that?
> >>>>
> >>>> You are right this area is completely undocumented. Personally would really like
> >>>> it if rte_flow had a reference software implementation and all the HW vendors
> >>>> had to make sure their HW matched the SW reference version. But this is a case
> >>>> where the funding is all on the HW side, and no one has time or resources
> >>>> to do a complete SW version.
> >>>>
> >>>> A sane implementation would configure RSS indirection across all
> >>>> rx queues that were available when the device was started; ie all queues
> >>>> that did not have deferred start set. Then the application would start/stop
> >>>> queues and use rte_flow to reach them.
> >>>>
> >>>> But it doesn't appear the HW follows that model.
> >>>>
> >>>>
> >>>>>>>> When the user selects 'RTE_ETH_HASH_FUNCTION_DEFAULT' in action RSS, does
> >>>>>>>> that allow the PMD to configure an arbitrary, non-Toeplitz hash algorithm?
>
> What do you think about the above question? In my opinion, DEFAULT should
> let the PMD select whatever hash function / algorithm it may want to
> select. Just some vendor-specific optimal choice.
>
> If the user wants exactly Toeplitz / "standard RSS hash" behaviour,
> they can always specify enum TOEPLITZ. And the PMD must either
> comply or reject. What do you think? Are we on the same page?
>

Fully agree with you.
The same goes if the user doesn't supply the key: the PMD should select some default value.

> >>>>>>
> >>>>>> No the default is always Toeplitz. This goes back to the original definition
> >>>>>> of RSS which is in Microsoft NDIS and uses Toeplitz.
> >>>>>
> >>>>> Then why have a dedicated enum named TOEPLITZ? Also, once again, the
> >>>>> documentation should be more specific to say which algorithm exactly
> >>>>> this DEFAULT choice provides. Otherwise, it is very vague.
> >>>>>
> >>>>>>
> >>>>>> DPDK should have more examples of using rte_flow, I have some samples
> >>>>>> but they aren't that useful.
> >>>>>>
> >>>>>
> >>>>> I could not agree more.
> >>>
> >>> Feel free to add/suggest what examples are missing.
> >>>
> >>>>>
> >>>>> Thanks,
> >>>>> Ivan M.
> >>>
> >>> Best,
> >>> Ori
> >>>
> > Best,
> > Ori
> >
>
> Best regards,
> Ivan M.
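
For illustration, the table behaviour discussed in this thread (a per-rule table whose size may be the queue count or a power of 2, filled by cycling over the 'queue' array in the given order, then indexed by hash modulo table size) can be sketched in plain C. This is only a sketch under stated assumptions, not DPDK code: the helpers reta_size_pow2(), reta_fill_cyclic() and reta_lookup() are hypothetical names, rounding up to the next power of 2 is one reading of "in power of 2", and a real PMD may size and populate its table differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a DPDK API): smallest power of 2 that is
 * greater than or equal to nb_queues. Models a PMD that keeps its
 * per-rule table size at a power of 2.
 */
size_t
reta_size_pow2(size_t nb_queues)
{
    size_t size = 1;

    while (size < nb_queues)
        size <<= 1;
    return size;
}

/*
 * Hypothetical helper: fill the table by cycling over the queues in the
 * order the application gave them. Repeated queue indices are kept as-is.
 */
void
reta_fill_cyclic(const uint16_t *queues, size_t nb_queues,
                 uint16_t *reta, size_t reta_size)
{
    assert(nb_queues > 0);
    for (size_t i = 0; i < reta_size; i++)
        reta[i] = queues[i % nb_queues];
}

/*
 * Hypothetical helper: select the Rx queue for a given RSS hash value,
 * following "rx queue = table[hash % table_size]" from the thread.
 */
uint16_t
reta_lookup(const uint16_t *reta, size_t reta_size, uint32_t hash)
{
    assert(reta_size > 0);
    return reta[hash % reta_size];
}
```

With queues {3, 1, 1} and an 8-entry table, the table becomes 3, 1, 1, 3, 1, 1, 3, 1, so a hash value of 11 selects entry 3 and therefore queue 3. A sender can only make this prediction if it knows both the table size and the fill rule, which is exactly the gap in the API contract raised above.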